From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id OAA24980; Fri, 27 Dec 2002 14:07:37 +0100 (MET) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id OAA25005 for ; Fri, 27 Dec 2002 14:07:36 +0100 (MET) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.11.1/8.11.1) with ESMTP id gBRD7TH08206; Fri, 27 Dec 2002 14:07:29 +0100 (MET) Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id OAA25656; Fri, 27 Dec 2002 14:07:29 +0100 (MET) From: Pierre Weis Message-Id: <200212271307.OAA25656@pauillac.inria.fr> Subject: Re: [Caml-list] Cross-platform DBM equivalent? In-Reply-To: from "Joshua D. Guttman" at "Dec 26, 102 12:03:49 pm" To: guttman@mitre.org Date: Fri, 27 Dec 2002 14:07:29 +0100 (MET) Cc: pierre.weis@inria.fr, caml-list@inria.fr, guttman@mitre.org X-Mailer: ELM [version 2.4ME+ PL28 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk > Pierre Weis writes: > > > As far as I know the best (and simpler) way to do this for reasonable > > number of URLs bindings (say thousands but not millions) is to create > > a Hashtlbl.t or Map.t and dump it to file using output_value (then > > read it back with input_value). > > Is there a recommended data structure in case one needs tables for > reasonably fast access to millions or tens of millions of values? > Probably hash tables are no longer providing nearly-constant access > time at those sizes. Is there something better in the standard > library? > > Thanks -- > > Joshua You need to try :) I think that hash table and maps can handle tens of millions of values if you have enough memory available. If your hash tables are big enough and if your keys are reasonably different strings, hash table will still give you nearly-constant access time. If you have not enough memory, you should consider mmap facilities from the Bigarray module (memory mapping of files, accessed from the disk by need when a given page is indeed accessed by the application). All the best for the next year! Pierre Weis INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://pauillac.inria.fr/~weis/ ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners