From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by sympa.inria.fr (Postfix) with ESMTPS id 910277EE51 for ; Tue, 23 Apr 2013 11:05:44 +0200 (CEST) Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of goswin-v-b@web.de) identity=pra; client-ip=212.227.17.11; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="goswin-v-b@web.de"; x-sender="goswin-v-b@web.de"; x-conformance=sidf_compatible Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of goswin-v-b@web.de) identity=mailfrom; client-ip=212.227.17.11; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="goswin-v-b@web.de"; x-sender="goswin-v-b@web.de"; x-conformance=sidf_compatible Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of postmaster@mout.web.de) identity=helo; client-ip=212.227.17.11; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="goswin-v-b@web.de"; x-sender="postmaster@mout.web.de"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AkYDADVOdlHU4xELlWdsb2JhbABQvy+FJIECFg4BAQEBBw0JCRIqgh8BAQUnE08LGAklDwUNGzSIAQEDE6xmhX0fKw2IVoxjgl6CZmEDk0aBa4FjhXmGBIZ0gTg X-IPAS-Result: AkYDADVOdlHU4xELlWdsb2JhbABQvy+FJIECFg4BAQEBBw0JCRIqgh8BAQUnE08LGAklDwUNGzSIAQEDE6xmhX0fKw2IVoxjgl6CZmEDk0aBa4FjhXmGBIZ0gTg X-IronPort-AV: E=Sophos;i="4.87,531,1363129200"; d="scan'208";a="14466128" Received: from mout.web.de ([212.227.17.11]) by mail2-smtp-roc.national.inria.fr with ESMTP; 23 Apr 2013 11:05:44 +0200 Received: from frosties.localnet ([95.208.119.3]) by smtp.web.de (mrweb002) with ESMTPSA (Nemesis) id 0MZleQ-1UDQvQ3B54-00Llw3 for ; Tue, 23 Apr 2013 11:05:42 +0200 Received: from mrvn by frosties.localnet with local (Exim 4.80) (envelope-from ) id 1UUZAU-0006La-5p for caml-list@inria.fr; Tue, 23 Apr 2013 11:05:42 +0200 Date: Tue, 23 Apr 2013 11:05:42 +0200 From: Goswin von Brederlow To: caml-list@inria.fr Message-ID: <20130423090542.GE23558@frosties> References: <58e60ce1.4599.13dfeacfdb3.Coremail.syshen@nudt.edu.cn> <5168877D.1090103@gmail.com> <3cfc99b5.468b.13e022ef3e8.Coremail.syshen@nudt.edu.cn> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3cfc99b5.468b.13e022ef3e8.Coremail.syshen@nudt.edu.cn> User-Agent: Mutt/1.5.21 (2010-09-15) X-Provags-ID: V02:K0:mWnz7hTciuAokbqGczG6XAv6XSbCOL5myfTFj+GD3Px 4/InAsNJMC8bLAn5ZwHHWe7rlafYss2ieXsZlQijGqN0bJ/vRr maZJmCzyr7SOAp1KLTprsEqV9QfYRbwJuWrK7tw9tQfxWn7J/Y ymtg6GFCxL9fI/N9U+GJkEwqiDM6o8gQ8uHU7As7y0PUj3QiS4 ofewtNHC8jipnY5lB3GBA== Subject: Re: Re: [Caml-list] [CAML]:: efficient data structure for storing and searching int list list On Sat, Apr 13, 2013 at 02:57:11PM +0800, ?????? wrote: > Dear Toby: > > Thank you for your help. > > But my problem is a little more difference from the substring searching problem with suffix tree. > > In my problem, a list L1 is another list L2's sublist, is much more general that the substring problem. > > For example, bcd is a substring of abcde, because bcd is continuely occur in abcde. > > At the same time, bd is not a substring of abcde, because is is not continuesly in abcde. > > But in my problem, a list b->d is a sub list of a->b->c->d->e. > > > So after reading the suffix tree introduction on wiki, I think it may not fit for my problem. > > I also find that trie is more general than suffix, and can be used to handle my problem. but it is too general in the sense that it di not effiecently handle the case that two list with multiple(not just one) shared sublist. > > For example, I first insert a list a->b->c->d->e->f into trie, and then I insert a->b->d->e into the trie. > > the trie can not store the second shared sublist d->e in the same place, it can only store them like > a->b->c->d->e->f > ->d->e > > So do you have more suggenhion on this ? > > Shen > > > -----????????----- > > ??????: "Toby Kelsey" > > ????????: 2013-04-13 06:15:25 (??????) > > ??????: caml-list@inria.fr > > ????: syshen@nudt.edu.cn > > ????: Re: [Caml-list] [CAML]:: efficient data structure for storing and searching int list list > > > > On 12/04/13 15:36, ?????? wrote: > > > Dear all: > > > I have an int list list, whose name is LL > > > and I need to frequently decide whether a particular int list, whose name is L, is a sublist of an element of LL. > > > > > > Is there any efficent data structure to do this? > > > > A data structure useful for finding substrings quickly is the "suffix tree", > > this can be built in O(n) - for small alphabets - or O(n log n) time and > > substring searches take O(length substring) time. The suffix tree takes more > > space than the original string though. An int list can take the role of the > > string here. > > > > Toby Note: A suffix tree can be build in O(n) and takes O(n) space. Takes something like 48-64 times the space of the string in ocaml. Seems like you aren't looking for sublists (in which the order would matter) but subsets (order doesn't matter and elements are unique). You can build a lookup tree containing all subsets of each set like this: Tree with {a,b,c,d,e} inserted: +a+b+c+d-e | | | \e-d | | +d+c-e | | | \e-c | | \e+c-d | | \d-c | +c+b+d-e | | | \e-d | | +d+b-e | | | \e-b | | \e+b-d | | \d-b | +d+b+c-e | | | \e-c | | +c+b-e | | | \e-b | | \e+b-c | | \c-b | ... That gets rather large. If you not only need to know L is a subset of one of the sets in LL then each node also needs to store a list of sets containing the subset expressed so far. If you can get L sorted that reduces the tree quite a bit: +a+b+c+d-e | | | \e | | +d-e | | \e | +c+d-e | | \e | +d-e | \e +b+c+d-e | | \e | +d-e | \e +c+d-e | \e +d-e \e Since L is sorted you only need the paths that are sorted. That gives you a tree of size O(2^n) where n is the number of unique ints in all sets. Still huge but your n might be small enough. This will give you O(|L|) lookup. Alternatively to sorting L you could still use the above tree. Start at the root and check the first child: a. Is a in L? If so go down that branch, otherwise check the next child. With L as a list each lookup would be O(n). As Set it would be O(log n) and as Hashtbl.t it would O(1). MfG Goswin