Re: [Caml-list] [CAML]:: efficient data structure for storing and searching int list list

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

From: simon cruanes <simon.cruanes.2007@m4x.org>
To: caml-list@inria.fr
Subject: Re: [Caml-list] [CAML]:: efficient data structure for storing and searching int list list
Date: Fri, 12 Apr 2013 17:01:47 +0200	[thread overview]
Message-ID: <516821DB.7060600@m4x.org> (raw)
In-Reply-To: <58e60ce1.4599.13dfeacfdb3.Coremail.syshen@nudt.edu.cn>

If the order in the lists does not matter, I would suggest some kind of
Trie (http://en.wikipedia.org/wiki/Trie) to store the *sorted* int
lists; the algorithm for search would recursively explore all the
branches of the trie that can be a superlist of the input list.

Here is a code snippet (not thoroughly tested):

(* ------------------- %< ------ >% ----------------- *)

type trie =
  | Node of bool *  (* end of a list? *)
            (int * trie) list  (* subtries, indexed by their first
element *)

let empty = Node (false, [])

(* add [l] to [trie], assuming [l] is sorted *)
let rec add trie l =
  match trie, l with
  | Node (_, subtries), [] -> Node (true, subtries)
  | Node (b, subtries), x::l' ->
    let subtrie =
      try List.assoc x subtries
      with Not_found -> Node (false, []) in
    (* recursive add *)
    let subtrie = add subtrie l' in
    let subtries = List.remove_assoc x subtries in
    Node (b, (x,subtrie) :: subtries)

(* find whether [l] is a sublist of some list of [trie] *)
let rec find trie l =
  match trie, l with
  | _, [] -> true
  | Node (_, subtries), (x::l') ->
    find_list x subtries l'
and find_list x subtries l' = match subtries with
  | [] -> false
  | (y,subtrie)::subtries' ->
    (if y < x then find subtrie (x::l')
    else if y = x then find subtrie l'
    else false) || find_list x subtries' l'

(* ------------------- %< ------ >% ----------------- *)

Cheers!


Simon

On 12/04/2013 16:36, 沈胜宇 wrote:
> Dear all:
> 
> I have an int list list, whose name is LL
> 
> and I need to frequently decide whether a particular int list, whose
> name is L, is a sublist of an element of LL.
> 
> Is there any efficent data structure to do this?
> 
> At the mean time, I store LL as (int, bool) Hashtbl.t list, that is,
> each element of LL is stored as a hash table.
> 
> So searching L in LL is reduce to decide whether there exist an element
> of LL, such every element of L hit in this element.
> 
> At the mean time, the space is not a big problem, but the run time
> overhead is major concern,
> 
> So if there exist any more faster data structure?
> 
> Thank you
> 
> Shen
> 
>

next prev parent reply	other threads:[~2013-04-12 15:01 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-04-12 14:36 沈胜宇
2013-04-12 15:01 ` simon cruanes [this message]
2013-04-12 15:48 ` Jean-Francois Monin
2013-04-13  6:58   ` 沈胜宇
2013-04-13  7:56     ` Gabriel Scherer
2013-04-12 22:15 ` Toby Kelsey
2013-04-13  6:57   ` 沈胜宇
2013-04-23  9:05     ` Goswin von Brederlow

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=516821DB.7060600@m4x.org \
    --to=simon.cruanes.2007@m4x.org \
    --cc=caml-list@inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).