caml-list - the Caml user's mailing list
From: Goswin von Brederlow <goswin-v-b@web.de>
To: caml-list@inria.fr
Subject: Re: Re: [Caml-list] [CAML]:: efficient data structure for storing and searching int list list
Date: Tue, 23 Apr 2013 11:05:42 +0200
Message-ID: <20130423090542.GE23558@frosties>
In-Reply-To: <3cfc99b5.468b.13e022ef3e8.Coremail.syshen@nudt.edu.cn>

On Sat, Apr 13, 2013 at 02:57:11PM +0800, 沈胜宇 wrote:
> Dear Toby:
> 
> Thank you for your help.
> 
> But my problem is a little different from the substring search problem solved by a suffix tree.
> 
> In my problem, the notion of a list L1 being a sublist of another list L2 is much more general than the substring problem.
> 
> For example, bcd is a substring of abcde, because bcd occurs contiguously in abcde.
> 
> At the same time, bd is not a substring of abcde, because it does not occur contiguously in abcde.
> 
> But in my problem, a list b->d is a sublist of a->b->c->d->e.
> 
> 
> So after reading the suffix tree introduction on the wiki, I think it may not fit my problem.
> 
> I also found that a trie is more general than a suffix tree and can be used to handle my problem, but it is too general in the sense that it does not efficiently handle the case where two lists have multiple (not just one) shared sublists.
> 
> For example, I first insert a list a->b->c->d->e->f into the trie, and then I insert a->b->d->e into the trie.
> 
> The trie cannot store the second shared sublist d->e in the same place; it can only store them like
> a->b->c->d->e->f
>     ->d->e
> 
> So do you have more suggestions on this?
> 
> Shen
> 
> > -----Original Message-----
> > From: "Toby Kelsey" <toby.kelsey@gmail.com>
> > Sent: 2013-04-13 06:15:25 (Saturday)
> > To: caml-list@inria.fr
> > Cc: syshen@nudt.edu.cn
> > Subject: Re: [Caml-list] [CAML]:: efficient data structure for storing and searching int list list
> > 
> > On 12/04/13 15:36, 沈胜宇 wrote:
> > > Dear all:
> > > I have an int list list, whose name is LL
> > > and I need to frequently decide whether a particular int list, whose name is L, is a sublist of an element of LL.
> > > 
> > > Is there any efficient data structure to do this?
> > 
> > A data structure useful for finding substrings quickly is the "suffix tree",
> > this can be built in O(n) - for small alphabets - or O(n log n) time and
> > substring searches take O(length substring) time. The suffix tree takes more
> > space than the original string though. An int list can take the role of the
> > string here.
> > 
> > Toby

Note: A suffix tree can be built in O(n) time and takes O(n) space. In
OCaml it takes something like 48-64 times the space of the string.


Seems like you aren't looking for sublists (in which the order would
matter) but subsets (order doesn't matter and elements are unique).
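
To make the distinction concrete, here is a minimal sketch of both
checks as a naive baseline without any index. The function names are
made up for illustration and do not come from the thread.

```ocaml
(* Order-sensitive check: is [l] a (not necessarily contiguous)
   subsequence of [m]?  E.g. [b; d] within [a; b; c; d; e]. *)
let rec is_subsequence l m =
  match l, m with
  | [], _ -> true
  | _, [] -> false
  | x :: l', y :: m' ->
    if x = y then is_subsequence l' m' else is_subsequence l m'

(* Order-insensitive check: every element of [l] occurs in [m]. *)
let is_subset l m = List.for_all (fun x -> List.mem x m) l

(* Naive baseline: scan every element of [ll] with the given check. *)
let in_some check l ll = List.exists (fun m -> check l m) ll
```

Both scans cost time proportional to the total size of LL per query,
which is what the trees below try to avoid.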

You can build a lookup tree containing all subsets of each set like this:

Tree with {a,b,c,d,e} inserted:

+a+b+c+d-e
| | | \e-d
| | +d+c-e
| | | \e-c
| | \e+c-d
| |   \d-c
| +c+b+d-e
| | | \e-d
| | +d+b-e
| | | \e-b
| | \e+b-d
| |   \d-b
| +d+b+c-e
| | | \e-c
| | +c+b-e
| | | \e-b
| | \e+b-c
| |   \c-b
| ...

That gets rather large. If you need to know not only that L is a
subset of one of the sets in LL but also which sets contain it, then
each node also needs to store a list of the sets containing the subset
expressed so far.
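
A rough sketch of such a tree, under some assumptions of mine: sets
are given as duplicate-free int lists, each set is identified by an
int id, and Map.Make (Int) and Option.value are available (OCaml 4.08
or later). The names node, add and owners_of are hypothetical.

```ocaml
module IntMap = Map.Make (Int)

(* Each node remembers the ids of all sets that contain the subset
   spelled out by the path from the root to this node. *)
type node = { owners : int list; children : node IntMap.t }

let empty = { owners = []; children = IntMap.empty }

(* Insert every ordering of every subset of [s], tagging each node on
   the way with [id].  The size grows factorially with the size of
   [s], so this is only usable for small sets. *)
let rec add id s node =
  let node = { node with owners = id :: node.owners } in
  List.fold_left
    (fun n x ->
      let child =
        Option.value (IntMap.find_opt x n.children) ~default:empty
      in
      let rest = List.filter (fun y -> y <> x) s in
      { n with children = IntMap.add x (add id rest child) n.children })
    node s

(* Follow [l] in whatever order it is given; the node reached lists
   the ids of the sets containing it, or there is no such node. *)
let rec owners_of node = function
  | [] -> node.owners
  | x :: rest ->
    (match IntMap.find_opt x node.children with
     | Some child -> owners_of child rest
     | None -> [])
```

For example, after add 0 [2; 3; 4] and add 1 [1; 2; 3], looking up
[3; 2] yields both ids while [4; 2] yields only id 0.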

If you can get L sorted that reduces the tree quite a bit:

+a+b+c+d-e
| | | \e
| | +d-e
| | \e
| +c+d-e
| | \e
| +d-e
| \e
+b+c+d-e
| | \e
| +d-e
| \e
+c+d-e
| \e
+d-e
\e

Since L is sorted you only need the paths that are sorted. That gives
you a tree of size O(2^n) where n is the number of unique ints in all
sets. Still huge but your n might be small enough. This will give you
O(|L|) lookup.
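
A minimal sketch of the sorted variant, assuming OCaml 4.08 or later
for Map.Make (Int). The names trie, add_subsets, build and mem_subset
are mine. Children are kept in a Map here, which adds a logarithmic
factor per step compared to the O(|L|) above, and the lookup sorts L
itself instead of assuming it arrives sorted.

```ocaml
module IntMap = Map.Make (Int)

(* A node maps the next (larger) element to a child node.  Every node
   reachable from the root spells out one stored sorted subset. *)
type trie = Node of trie IntMap.t

let empty = Node IntMap.empty

(* Insert every subset of the sorted, duplicate-free list [s]. *)
let rec add_subsets (Node children) s =
  match s with
  | [] -> Node children
  | x :: rest ->
    (* Subsets that contain x: extend child x with subsets of rest. *)
    let child =
      match IntMap.find_opt x children with
      | Some c -> c
      | None -> empty
    in
    let children = IntMap.add x (add_subsets child rest) children in
    (* Subsets that skip x: continue at this node with rest. *)
    add_subsets (Node children) rest

let build ll =
  List.fold_left
    (fun t s -> add_subsets t (List.sort_uniq compare s))
    empty ll

(* Walk down one child per element of the sorted query. *)
let mem_subset t l =
  let rec walk (Node children) = function
    | [] -> true
    | x :: rest ->
      (match IntMap.find_opt x children with
       | Some child -> walk child rest
       | None -> false)
  in
  walk t (List.sort_uniq compare l)
```

With build [[1; 2; 3; 4; 5]; [7; 8]], the query [4; 2] is found while
[5; 7] is not, since no single set contains both 5 and 7.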

As an alternative to sorting L you could still use the above tree. Start
at the root and check the first child: a. Is a in L? If so go down
that branch, otherwise check the next child. With L as a list each
membership test would be O(n). As a Set it would be O(log n) and as a
Hashtbl.t it would be O(1).
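
One way to read that walk, as a sketch only: L goes into a Hashtbl,
and at each node the children are scanned in ascending order until one
is found in L. The tree is the sorted-subset tree from above, rebuilt
here so the snippet stands alone; all names are mine, and I assume
OCaml 4.08 or later for Map.Make (Int) and IntMap.to_seq.

```ocaml
module IntMap = Map.Make (Int)

type trie = Node of trie IntMap.t

(* Insert every subset of the sorted, duplicate-free list [s]. *)
let rec add_subsets (Node children) s =
  match s with
  | [] -> Node children
  | x :: rest ->
    let child =
      match IntMap.find_opt x children with
      | Some c -> c
      | None -> Node IntMap.empty
    in
    add_subsets (Node (IntMap.add x (add_subsets child rest) children)) rest

(* Unsorted lookup: descend into the first child whose label is in L
   and stop once as many steps were taken as L has distinct elements. *)
let mem_unsorted t l =
  let in_l = Hashtbl.create 16 in
  List.iter (fun x -> Hashtbl.replace in_l x ()) l;
  let wanted = Hashtbl.length in_l in
  let rec first_in seq =
    match seq () with
    | Seq.Nil -> None
    | Seq.Cons ((x, child), rest) ->
      if Hashtbl.mem in_l x then Some child else first_in rest
  in
  let rec walk (Node children) depth =
    depth = wanted
    || (match first_in (IntMap.to_seq children) with
        | Some child -> walk child (depth + 1)
        | None -> false)
  in
  walk t 0
```

This works because every path in the tree is strictly increasing, so a
path of |L| nodes that are all members of L must be exactly L in
sorted order.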

MfG
	Goswin

Thread overview: 8+ messages
2013-04-12 14:36 沈胜宇
2013-04-12 15:01 ` simon cruanes
2013-04-12 15:48 ` Jean-Francois Monin
2013-04-13  6:58   ` 沈胜宇
2013-04-13  7:56     ` Gabriel Scherer
2013-04-12 22:15 ` Toby Kelsey
2013-04-13  6:57   ` 沈胜宇
2013-04-23  9:05     ` Goswin von Brederlow [this message]