From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by sympa.inria.fr (Postfix) with ESMTPS id 803AA7EE51 for ; Sat, 13 Apr 2013 09:57:02 +0200 (CEST) Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of gabriel.scherer@gmail.com) identity=pra; client-ip=209.85.212.52; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="gabriel.scherer@gmail.com"; x-sender="gabriel.scherer@gmail.com"; x-conformance=sidf_compatible Received-SPF: Pass (mail2-smtp-roc.national.inria.fr: domain of gabriel.scherer@gmail.com designates 209.85.212.52 as permitted sender) identity=mailfrom; client-ip=209.85.212.52; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="gabriel.scherer@gmail.com"; x-sender="gabriel.scherer@gmail.com"; x-conformance=sidf_compatible; x-record-type="v=spf1" Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of postmaster@mail-vb0-f52.google.com) identity=helo; client-ip=209.85.212.52; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="gabriel.scherer@gmail.com"; x-sender="postmaster@mail-vb0-f52.google.com"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ArwCAFYPaVHRVdQ0jWdsb2JhbABQDoMugzCsIpIZfggWDgEBAQEHCwsJEgYkgh8BAQUnGQEbEgsBAwwGAwILDQQJHwIiAREBBQEKEgYTEodvAQMPDI55jySLWghNgnuELAoZJwMKWYh+AQUMgRONRTMHgiqBFwOIUIpmg0yBIY4OFimCeHdBOg X-IPAS-Result: ArwCAFYPaVHRVdQ0jWdsb2JhbABQDoMugzCsIpIZfggWDgEBAQEHCwsJEgYkgh8BAQUnGQEbEgsBAwwGAwILDQQJHwIiAREBBQEKEgYTEodvAQMPDI55jySLWghNgnuELAoZJwMKWYh+AQUMgRONRTMHgiqBFwOIUIpmg0yBIY4OFimCeHdBOg X-IronPort-AV: E=Sophos;i="4.87,466,1363129200"; d="scan'208";a="13097597" Received: from mail-vb0-f52.google.com ([209.85.212.52]) by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/RC4-SHA; 13 Apr 2013 09:57:01 +0200 Received: by mail-vb0-f52.google.com with SMTP id w8so2641863vbf.25 for ; Sat, 13 Apr 2013 00:57:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:mime-version:in-reply-to:references:from:date:message-id :subject:to:cc:content-type:content-transfer-encoding; bh=uiJas41TDp7R5CVeEL7qC8lD9NK3Cvue9f0ykxO2fmE=; b=EnrZolME+52sA6jxlikGZ9BZyA7DLUXefY19yyhXx5wRzXnVRcH16wTl7jH/Gi/1WS 1fsG2JsQCLpa0clUepiDdzod8PHTAneZQq44LH1ACMQXP4GJbrkcP5Y8IWlhIODtOUPR IPsuN5jH2OeffEs9IHCKAUeKrOTlltbGKfYYL4/oXaWNuQbm8sldiDxki1OQ4BxgDhiX zm+/T96pZ4UPYlabRJW3n8HdAHCD7cyZN+nP4pxOkQu2nmIybS1LvEPIFY/DV+jrVnoX SE9gS3dLFbQbfD1QukycU8a/VGge7z1XAG8A/OZ+NEFYAZDduoQni7YKauGUYOfg2/b5 aLWg== X-Received: by 10.52.34.168 with SMTP id a8mr9243696vdj.75.1365839820602; Sat, 13 Apr 2013 00:57:00 -0700 (PDT) MIME-Version: 1.0 Received: by 10.58.233.13 with HTTP; Sat, 13 Apr 2013 00:56:20 -0700 (PDT) In-Reply-To: <6936226f.468c.13e02309329.Coremail.syshen@nudt.edu.cn> References: <58e60ce1.4599.13dfeacfdb3.Coremail.syshen@nudt.edu.cn> <20130412154804.GI6877@melezet> <6936226f.468c.13e02309329.Coremail.syshen@nudt.edu.cn> From: Gabriel Scherer Date: Sat, 13 Apr 2013 09:56:20 +0200 Message-ID: To: =?UTF-8?B?5rKI6IOc5a6H?= Cc: Jean-Francois Monin , caml-list Content-Type: text/plain; charset=GB2312 Content-Transfer-Encoding: quoted-printable Subject: Re: Re: [Caml-list] [CAML]:: efficient data structure for storing and searching int list list There is a fairly generic way to get an efficient data structure if you don't mind huge preprocessing costs. You can see your problem as a word recognition problem (you want to accept only words that are sublists of one of the lists in your set), so a natural data representation of this is a finite-state automaton. Getting an efficient automaton out of your data set is easy (but may be extremely costly): you only need to implement a determinization algorithm (and if you want to avoid space explosion, maybe a minimization algorithm as well) and those are well-known. Given an automaton for a list LL, you can add a new list L by creating an automaton recognizing sublists of L, making its union with your LL automaton, and determinizing again. Of course, that is a kind of giant hammer, there are probably more specialized approaches that may be suitable for your problem. I didn't understand whether you're trying to check a subsequence problem ('ac' is a subsequence of 'abcd') or a substring problem ('ab' is not a substring, while 'abc' would be). For the substring problem, a common trick is to add to your trie not only a L, but also the reversed prefixes of L: for the word 'abcd' you would store 'abcd', 'bcd|a', 'cd|ab', 'd|abc'. Checking substring inclusion is then immediate. This results in a multiplication of the memory usage; note that DFA minimization can be seen as an optimal, principled way to introduce sharing in this data structure. On Sat, Apr 13, 2013 at 8:58 AM, =C9=F2=CA=A4=D3=EE wr= ote: > Dear Monin: > > thank you for your help. > > But I think trie is too general in the sense that it did not effiecently = handle the case that two list with multiple(not just one) shared sublist. > > For example, I first insert a list a->b->c->d->e->f into trie, and then I= insert a->b->d->e into the trie. > > the trie can not store the second shared sublist d->e in the same place, = it can only store them like > a->b->c->d->e->f > ->d->e > > do you have more suggesion on this? > > Shen >> -----=D4=AD=CA=BC=D3=CA=BC=FE----- >> =B7=A2=BC=FE=C8=CB: "Jean-Francois Monin" >> =B7=A2=CB=CD=CA=B1=BC=E4: 2013-04-12 23:48:04 (=D0=C7=C6=DA=CE=E5) >> =CA=D5=BC=FE=C8=CB: "=C9=F2=CA=A4=D3=EE" >> =B3=AD=CB=CD: caml-list >> =D6=F7=CC=E2: Re: [Caml-list] [CAML]:: efficient data structure for stor= ing and searching int list list >> >> You may have some total order on the elements of your lists. >> Then consider only sorted lists, and implement LL with tries. >> >> JF >> >> On Fri, Apr 12, 2013 at 10:36:22PM +0800, =C9=F2=CA=A4=D3=EE wrote: >> > Dear all: >> > I have an int list list, whose name is LL >> > and I need to frequently decide whether a particular int list, whos= e name >> > is L, is a sublist of an element of LL. >> > Is there any efficent data structure to do this? >> > At the mean time, I store LL as (int, bool) Hashtbl.t list, that is= , each >> > element of LL is stored as a hash table. >> > So searching L in LL is reduce to decide whether there exist an ele= ment of >> > LL, such every element of L hit in this element. >> > At the mean time, the space is not a big problem, but the run time >> > overhead is major concern, >> > So if there exist any more faster data structure? >> > Thank you >> > Shen >> >> -- >> Jean-Francois Monin >> LIAMA Project FORMES, CNRS & Universite de Grenoble 1 & >> Tsinghua University > > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa.inria.fr/sympa/arc/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs