* Re: [Caml-list] Cross-platform DBM equivalent?
2002-12-26 16:50 ` Pierre Weis
@ 2002-12-26 17:03 ` Joshua D. Guttman
2002-12-27 13:07 ` Pierre Weis
2002-12-26 17:08 ` David Brown
` (3 subsequent siblings)
4 siblings, 1 reply; 16+ messages in thread
From: Joshua D. Guttman @ 2002-12-26 17:03 UTC (permalink / raw)
To: Pierre Weis; +Cc: caml-list, Joshua D. Guttman
Pierre Weis <pierre.weis@inria.fr> writes:
> As far as I know the best (and simpler) way to do this for reasonable
> number of URLs bindings (say thousands but not millions) is to create
> a Hashtbl.t or Map.t and dump it to file using output_value (then
> read it back with input_value).
Is there a recommended data structure in case one needs tables for
reasonably fast access to millions or tens of millions of values?
Hash tables probably no longer provide nearly-constant access
time at those sizes. Is there something better in the standard
library?
Thanks --
Joshua
--
Joshua D. Guttman <guttman@mitre.org>
MITRE, Mail Stop S119 Office: +1 781 271 2654
202 Burlington Rd. Cell: +1 781 526 5713
Bedford, MA 01730-1420 USA Fax: +1 781 271 8953
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
* Re: [Caml-list] Cross-platform DBM equivalent?
2002-12-26 17:03 ` Joshua D. Guttman
@ 2002-12-27 13:07 ` Pierre Weis
0 siblings, 0 replies; 16+ messages in thread
From: Pierre Weis @ 2002-12-27 13:07 UTC (permalink / raw)
To: guttman; +Cc: pierre.weis, caml-list, guttman
> Pierre Weis <pierre.weis@inria.fr> writes:
>
> > As far as I know the best (and simpler) way to do this for reasonable
> > number of URLs bindings (say thousands but not millions) is to create
> > a Hashtbl.t or Map.t and dump it to file using output_value (then
> > read it back with input_value).
>
> Is there a recommended data structure in case one needs tables for
> reasonably fast access to millions or tens of millions of values?
> Probably hash tables are no longer providing nearly-constant access
> time at those sizes. Is there something better in the standard
> library?
>
> Thanks --
>
> Joshua
You need to try :) I think that hash tables and maps can handle
tens of millions of values if you have enough memory available. If
your hash tables are big enough and if your keys are reasonably
different strings, hash tables will still give you nearly-constant
access time.
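
A minimal sketch of how one might try this, assuming one million
string keys in a made-up URL format and a table pre-sized to that
count (both are illustrative choices, not figures from this thread).
Hashtbl.stats shows how evenly the keys spread over the buckets.

```ocaml
(* Sketch: a pre-sized Hashtbl holding one million string keys.
   The key format and the size n are illustrative assumptions. *)
let n = 1_000_000

let table : (string, int) Hashtbl.t = Hashtbl.create n

let () =
  for i = 0 to n - 1 do
    Hashtbl.replace table (Printf.sprintf "http://example.org/page/%d" i) i
  done

(* Short buckets are what keep lookups close to constant time, so the
   longest bucket is the number to watch as the table grows. *)
let () =
  let s = Hashtbl.stats table in
  Printf.printf "bindings=%d buckets=%d longest bucket=%d\n"
    s.Hashtbl.num_bindings s.Hashtbl.num_buckets s.Hashtbl.max_bucket_length
```

Raising n and timing a batch of Hashtbl.find calls is a simple way to
check the behaviour on a given machine.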
If you do not have enough memory, you should consider the mmap
facilities of the Bigarray module (memory mapping of files, loaded
from disk on demand when a given page is actually accessed by the
application).
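
A rough sketch of such a mapping, assuming a recent OCaml where the
mapping function is Unix.map_file (it lived in the Bigarray module in
older releases) and assuming the program is linked with the unix
library. The helper name with_mapped_int64s and the choice of int64
elements are hypothetical.

```ocaml
(* Sketch: map a file as a one-dimensional array of int64 values.
   Assumes linking against the unix library; with_mapped_int64s is a
   hypothetical helper name. *)
let with_mapped_int64s (path : string) (n : int) f =
  let fd = Unix.openfile path [ Unix.O_RDWR; Unix.O_CREAT ] 0o644 in
  let result =
    try
      (* shared = true: writes go back to the file; the file is grown
         to hold n elements if it is smaller than that. *)
      let ga =
        Unix.map_file fd Bigarray.int64 Bigarray.c_layout true [| n |]
      in
      f (Bigarray.array1_of_genarray ga)
    with e ->
      Unix.close fd;
      raise e
  in
  Unix.close fd;
  result
```

A caller would pass a function that reads or writes elements, for
example (fun a -> a.{5} <- 42L). Only the pages actually touched need
to be resident in memory.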
All the best for the next year!
Pierre Weis
INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://pauillac.inria.fr/~weis/
* Re: [Caml-list] Cross-platform DBM equivalent?
2002-12-26 16:50 ` Pierre Weis
2002-12-26 17:03 ` Joshua D. Guttman
@ 2002-12-26 17:08 ` David Brown
2002-12-26 18:23 ` Stefano Zacchiroli
` (2 subsequent siblings)
4 siblings, 0 replies; 16+ messages in thread
From: David Brown @ 2002-12-26 17:08 UTC (permalink / raw)
To: caml-list
On Thu, Dec 26, 2002 at 05:50:26PM +0100, Pierre Weis wrote:
> > On Thu, Dec 26, 2002 at 09:39:33AM +0100, Alessandro Baretta wrote:
> > >
> > > >I am developing an application that needs fast access to persistent
> > > >configuration data, and I thought that DBM might be a good way to
> > > >provide that functionality ...
>
> As far as I know the best (and simpler) way to do this for reasonable
> number of URLs bindings (say thousands but not millions) is to create
> a Hashtbl.t or Map.t and dump it to file using output_value (then
> read it back with input_value). In any case, I would start with this
> solution, since it provides cross-platform persistency with 4 lines of
> Caml code. A fast and easy way to obtain fast and good results!
And, with another few lines of code, you can write the new data to a
second backup file, and then Sys.rename it onto the first. Is
Sys.rename reliable on Windows, though?
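
A minimal sketch of that idea, assuming a table from strings to
strings and a temporary file named by appending ".tmp" to the target
path (save, load, and the suffix are hypothetical names). Whether the
rename replaces an existing file safely depends on the platform and
the OCaml version, so treat that step as an assumption to verify.

```ocaml
(* Sketch: write the table to a temporary file, then rename it over
   the real one, so a crash mid-write leaves the old file intact. *)
let save (path : string) (table : (string, string) Hashtbl.t) : unit =
  let tmp = path ^ ".tmp" in
  let oc = open_out_bin tmp in
  (try output_value oc table
   with e ->
     close_out_noerr oc;
     raise e);
  close_out oc;
  Sys.rename tmp path

(* Read the table back. The type annotation is a promise by the caller:
   input_value does not check that the file really holds this type. *)
let load (path : string) : (string, string) Hashtbl.t =
  let ic = open_in_bin path in
  let table : (string, string) Hashtbl.t =
    try input_value ic
    with e ->
      close_in_noerr ic;
      raise e
  in
  close_in ic;
  table
```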
Dave Brown
* Re: [Caml-list] Cross-platform DBM equivalent?
2002-12-26 16:50 ` Pierre Weis
2002-12-26 17:03 ` Joshua D. Guttman
2002-12-26 17:08 ` David Brown
@ 2002-12-26 18:23 ` Stefano Zacchiroli
2002-12-27 13:11 ` Pierre Weis
2002-12-26 19:20 ` Dmitry Bely
2002-12-27 7:21 ` Matt Gushee
4 siblings, 1 reply; 16+ messages in thread
From: Stefano Zacchiroli @ 2002-12-26 18:23 UTC (permalink / raw)
To: caml-list
On Thu, Dec 26, 2002 at 05:50:26PM +0100, Pierre Weis wrote:
> As far as I know the best (and simpler) way to do this for reasonable
> number of URLs bindings (say thousands but not millions) is to create
> a Hashtbl.t or Map.t and dump it to file using output_value (then
> read it back with input_value). In any case, I would start with this
What about memory consumption?
I know nothing about dbm internals but from my experience dbm doesn't
keep all the data in memory while Hashtbl and Map do.
Am I wrong?
Cheers.
--
Stefano Zacchiroli - Undergraduate Student of CS @ Uni. Bologna, Italy
zack@{cs.unibo.it,debian.org,bononia.it} - http://www.bononia.it/zack/
"I know you believe you understood what you think I said, but I am not
sure you realize that what you heard is not what I meant!" -- G.Romney
* Re: [Caml-list] Cross-platform DBM equivalent?
2002-12-26 18:23 ` Stefano Zacchiroli
@ 2002-12-27 13:11 ` Pierre Weis
2003-01-12 10:13 ` Sven Luther
0 siblings, 1 reply; 16+ messages in thread
From: Pierre Weis @ 2002-12-27 13:11 UTC (permalink / raw)
To: Stefano Zacchiroli; +Cc: caml-list
> On Thu, Dec 26, 2002 at 05:50:26PM +0100, Pierre Weis wrote:
> > As far as I know the best (and simpler) way to do this for reasonable
> > number of URLs bindings (say thousands but not millions) is to create
> > a Hashtbl.t or Map.t and dump it to file using output_value (then
> > read it back with input_value). In any case, I would start with this
>
> What about memory consumption?
>
> I know nothing about dbm internals but from my experience dbm doesn't
> keep all the data in memory while Hashtbl and Map do.
> Am I wrong?
>
> Cheers.
>
> --
> Stefano Zacchiroli - Undergraduate Student of CS @ Uni. Bologna, Italy
You're right. If memory footprint is a concern, you should try to
define your own data structures using big arrays and memory
mapping. However, this is not yet available in the current
implementation; we will be glad to have your contribution on this
topic.
All the best for the next year,
Pierre Weis
INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://pauillac.inria.fr/~weis/
* Re: [Caml-list] Cross-platform DBM equivalent?
2002-12-27 13:11 ` Pierre Weis
@ 2003-01-12 10:13 ` Sven Luther
0 siblings, 0 replies; 16+ messages in thread
From: Sven Luther @ 2003-01-12 10:13 UTC (permalink / raw)
To: Pierre Weis; +Cc: Stefano Zacchiroli, caml-list
On Fri, Dec 27, 2002 at 02:11:45PM +0100, Pierre Weis wrote:
> > On Thu, Dec 26, 2002 at 05:50:26PM +0100, Pierre Weis wrote:
> > > As far as I know the best (and simpler) way to do this for reasonable
> > > number of URLs bindings (say thousands but not millions) is to create
> > > a Hashtbl.t or Map.t and dump it to file using output_value (then
> > > read it back with input_value). In any case, I would start with this
> >
> > What about memory consumption?
> >
> > I know nothing about dbm internals but from my experience dbm doesn't
> > keep all the data in memory while Hashtbl and Map do.
> > Am I wrong?
> >
> > Cheers.
> >
> > --
> > Stefano Zacchiroli - Undergraduate Student of CS @ Uni. Bologna, Italy
>
> You're right. If memory footprint is a concern, you should try to
> define your own data structures using big arrays and memory
> mapping. However, this is not yet available in the current
> implementation; we will be glad to have your contribution on this
> topic.
BTW, does OCaml optimize away two successive writes to the same
bigarray element, or is it possible to use a big array on volatile
memory, such as the MMIO command register of some piece of hardware?
Friendly,
Sven Luther
* Re: [Caml-list] Cross-platform DBM equivalent?
2002-12-26 16:50 ` Pierre Weis
` (2 preceding siblings ...)
2002-12-26 18:23 ` Stefano Zacchiroli
@ 2002-12-26 19:20 ` Dmitry Bely
2002-12-27 13:19 ` Pierre Weis
2002-12-27 7:21 ` Matt Gushee
4 siblings, 1 reply; 16+ messages in thread
From: Dmitry Bely @ 2002-12-26 19:20 UTC (permalink / raw)
To: caml-list
Pierre Weis <pierre.weis@inria.fr> writes:
>> > >I am developing an application that needs fast access to persistent
>> > >configuration data, and I thought that DBM might be a good way to
>> > >provide that functionality ...
>
> As far as I know the best (and simpler) way to do this for reasonable
> number of URLs bindings (say thousands but not millions) is to create
> a Hashtbl.t or Map.t and dump it to file using output_value (then
> read it back with input_value). In any case, I would start with this
> solution, since it provides cross-platform persistency with 4 lines of
> Caml code. A fast and easy way to obtain fast and good results!
Can the output of the marshalling functions differ between versions of
OCaml? If I try to feed incompatible data (e.g. from a previous version)
to input_value, what will I get then: a segfault? If so, they can hardly
be used for saving configuration data.
- Dmitry Bely
* Re: [Caml-list] Cross-platform DBM equivalent?
2002-12-26 19:20 ` Dmitry Bely
@ 2002-12-27 13:19 ` Pierre Weis
2002-12-27 18:03 ` brogoff
0 siblings, 1 reply; 16+ messages in thread
From: Pierre Weis @ 2002-12-27 13:19 UTC (permalink / raw)
To: Dmitry Bely; +Cc: caml-list
> Pierre Weis <pierre.weis@inria.fr> writes:
>
> >> > >I am developing an application that needs fast access to persistent
> >> > >configuration data, and I thought that DBM might be a good way to
> >> > >provide that functionality ...
> >
> > As far as I know the best (and simpler) way to do this for reasonable
> > number of URLs bindings (say thousands but not millions) is to create
> > a Hashtbl.t or Map.t and dump it to file using output_value (then
> > read it back with input_value). In any case, I would start with this
> > solution, since it provides cross-platform persistency with 4 lines of
> > Caml code. A fast and easy way to obtain fast and good results!
>
> Can the output of the marshalling functions differ between versions of
> OCaml? If I try to feed incompatible data (e.g. from a previous version)
> to input_value, what will I get then: a segfault? If so, they can hardly
> be used for saving configuration data.
>
> - Dmitry Bely
There is no warranty from the language definition, nor from the
implementor team, that the marshalling functions will never be
modified: imagine we find a new way to get data files much more
compact, then yes, sure, we will implement this new scheme as soon as
possible!
However, the modification of the output of marshalling functions is a
dramatic event, and in such cases the Caml team very
often provides translation programs for the users. In the unlikely
event that the implementors do not provide such a tool, I'm pretty
sure that somebody in this list will rapidly contribute something
useful to translate from the old to the new format :)
All the best for the next year,
Pierre Weis
INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://pauillac.inria.fr/~weis/
* Re: [Caml-list] Cross-platform DBM equivalent?
2002-12-27 13:19 ` Pierre Weis
@ 2002-12-27 18:03 ` brogoff
0 siblings, 0 replies; 16+ messages in thread
From: brogoff @ 2002-12-27 18:03 UTC (permalink / raw)
To: Pierre Weis; +Cc: Dmitry Bely, caml-list
On Fri, 27 Dec 2002, Pierre Weis wrote:
> > Can the output of the marshalling functions differ between versions of
> > OCaml? If I try to feed incompatible data (e.g. from a previous version)
> > to input_value, what will I get then: a segfault? If so, they can hardly
> > be used for saving configuration data.
> >
> > - Dmitry Bely
>
> There is no warranty from the language definition, nor from the
> implementor team, that the marshalling functions will never be
> modified: imagine we find a new way to get data files much more
> compact, then yes, sure, we will implement this new scheme as soon as
> possible!
One can also write IO routines which tag the file with a header string
containing Sys.ocaml_version, and raise an exception if that header is
incompatible with the version of OCaml you're using. That avoids the
crash.
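
A small sketch of such routines, assuming the header is simply
Sys.ocaml_version followed by a newline and that any mismatch is
treated as incompatible (a deliberately strict choice). The names
save_tagged, load_tagged, and Incompatible_version are hypothetical.

```ocaml
(* Raised when the file was written by a different OCaml version;
   carries the version string found in the file. *)
exception Incompatible_version of string

(* Write a one-line version header, then the marshalled value. *)
let save_tagged (path : string) v : unit =
  let oc = open_out_bin path in
  (try
     output_string oc Sys.ocaml_version;
     output_char oc '\n';
     output_value oc v
   with e ->
     close_out_noerr oc;
     raise e);
  close_out oc

(* Check the header before calling input_value. The result type is
   still unchecked: the caller must annotate it correctly. *)
let load_tagged (path : string) =
  let ic = open_in_bin path in
  try
    let version = input_line ic in
    if version <> Sys.ocaml_version then raise (Incompatible_version version);
    let v = input_value ic in
    close_in ic;
    v
  with e ->
    close_in_noerr ic;
    raise e
```

This guards against a change of compiler version only; it does not
protect against reading a value at the wrong type.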
Wishing for type safe value IO in OCaml for 2003!
-- Brian
* Re: [Caml-list] Cross-platform DBM equivalent?
2002-12-26 16:50 ` Pierre Weis
` (3 preceding siblings ...)
2002-12-26 19:20 ` Dmitry Bely
@ 2002-12-27 7:21 ` Matt Gushee
4 siblings, 0 replies; 16+ messages in thread
From: Matt Gushee @ 2002-12-27 7:21 UTC (permalink / raw)
To: caml-list
On Thu, Dec 26, 2002 at 05:50:26PM +0100, Pierre Weis wrote:
> > On Thu, Dec 26, 2002 at 09:39:33AM +0100, Alessandro Baretta wrote:
> > >
> > > >I am developing an application that needs fast access to persistent
> > > >configuration data, and I thought that DBM might be a good way to
> > > >provide that functionality ...
>
> As far as I know the best (and simpler) way to do this for reasonable
> number of URLs bindings (say thousands but not millions) is to create
> a Hashtbl.t or Map.t and dump it to file using output_value (then
> read it back with input_value). In any case, I would start with this
> solution, since it provides cross-platform persistency with 4 lines of
> Caml code. A fast and easy way to obtain fast and good results!
That sounds good. I'll try it. Thank you very much for the suggestion!
--
Matt Gushee When a nation follows the Way,
Englewood, Colorado, USA Horses bear manure through
mgushee@havenrock.com its fields;
http://www.havenrock.com/ When a nation ignores the Way,
Horses bear soldiers through
its streets.
--Lao Tzu (Peter Merel, trans.)