caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] Hashtbl or Set?
@ 2011-09-19 20:09 Ian Zimmerman
  2011-09-19 20:17 ` Basile Starynkevitch
  2011-09-20 10:13 ` Gerd Stolpmann
  0 siblings, 2 replies; 3+ messages in thread
From: Ian Zimmerman @ 2011-09-19 20:09 UTC (permalink / raw)
  To: caml-list


I need a somewhat large (thousands) set of strings, created once during
startup and never modified after.  What is a better choice, a (string,
unit) Hashtbl.t or the Set module?  If the Set module still uses trees
as it did when I was young :-), access will be logarithmic versus
constant for Hashtbl.  But on the other hand a hash function must
examine all of every string while the comparison of 2 strings stops at
the first nonmatching character.

I am thinking about the time to build the set as well as the probing
time.

-- 
Ian Zimmerman
gpg public key: 1024D/C6FF61AD
fingerprint: 66DC D68F 5C1B 4D71 2EE5  BD03 8A00 786C C6FF 61AD
Rule 420: All persons more than eight miles high to leave the court.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Caml-list] Hashtbl or Set?
  2011-09-19 20:09 [Caml-list] Hashtbl or Set? Ian Zimmerman
@ 2011-09-19 20:17 ` Basile Starynkevitch
  2011-09-20 10:13 ` Gerd Stolpmann
  1 sibling, 0 replies; 3+ messages in thread
From: Basile Starynkevitch @ 2011-09-19 20:17 UTC (permalink / raw)
  To: Ian Zimmerman; +Cc: caml-list

On Mon, 19 Sep 2011 13:09:39 -0700
Ian Zimmerman <itz@buug.org> wrote:

> 
> I need a somewhat large (thousands) set of strings, created once during
> startup and never modified after. 

You didn't tell us why you need such as set, and what kind of strings are inside. 
(I mean that if the strings are file contents -e.g. typically 100kbytes long-, it is
different than if the strings are dictionnary words or source program identifiers - e.g.
typically a dozen of bytes each-)

> What is a better choice, a (string, unit) Hashtbl.t or the Set module? 
> If the Set module still uses trees
> as it did when I was young :-), access will be logarithmic versus
> constant for Hashtbl.  But on the other hand a hash function must
> examine all of every string while the comparison of 2 strings stops at
> the first nonmatching character.

Honestly, I believe that for practical purposes, unless you have very strange
constraints, both approaches are good enough. Don't forget that too early optimization is
evil.


> I am thinking about the time to build the set as well as the probing
> time.

Make it a module, that is, your abstract data type. Then, if you discover later that this
module is a bottleneck, optimize it further.


Honestly, I think you should not care that much in the first place. (the real timing may
also depend of memory cache considerations and sizes).

Cheers.
-- 
Basile STARYNKEVITCH         http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Caml-list] Hashtbl or Set?
  2011-09-19 20:09 [Caml-list] Hashtbl or Set? Ian Zimmerman
  2011-09-19 20:17 ` Basile Starynkevitch
@ 2011-09-20 10:13 ` Gerd Stolpmann
  1 sibling, 0 replies; 3+ messages in thread
From: Gerd Stolpmann @ 2011-09-20 10:13 UTC (permalink / raw)
  To: Ian Zimmerman; +Cc: caml-list

Am Montag, den 19.09.2011, 13:09 -0700 schrieb Ian Zimmerman:
> I need a somewhat large (thousands) set of strings, created once during
> startup and never modified after.  What is a better choice, a (string,
> unit) Hashtbl.t or the Set module?  If the Set module still uses trees
> as it did when I was young :-), access will be logarithmic versus
> constant for Hashtbl.  But on the other hand a hash function must
> examine all of every string while the comparison of 2 strings stops at
> the first nonmatching character.
> 
> I am thinking about the time to build the set as well as the probing
> time.

Nothing has changed so far. Besides naive Hashtbl and Set you have at
least two more options: Hashtbl with self-defined hash function (i.e.
one that stops after the n-th char), and Set with a hash-based
comparison function (i.e. compare first the hashes of the strings, and
only if the hashes are equal, compare the strings). 

In my experience, for a large constant set a Hashtbl usually works best.
Set is usually only better when you can profit from functional updates.

Gerd

> 
> -- 
> Ian Zimmerman
> gpg public key: 1024D/C6FF61AD
> fingerprint: 66DC D68F 5C1B 4D71 2EE5  BD03 8A00 786C C6FF 61AD
> Rule 420: All persons more than eight miles high to leave the court.
> 

-- 
------------------------------------------------------------
Gerd Stolpmann, Darmstadt, Germany    gerd@gerd-stolpmann.de
Creator of GODI and camlcity.org.
Contact details:        http://www.camlcity.org/contact.html
Company homepage:       http://www.gerd-stolpmann.de
*** Searching for new projects! Need consulting for system
*** programming in Ocaml? Gerd Stolpmann can help you.
------------------------------------------------------------


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2011-09-20 10:13 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-19 20:09 [Caml-list] Hashtbl or Set? Ian Zimmerman
2011-09-19 20:17 ` Basile Starynkevitch
2011-09-20 10:13 ` Gerd Stolpmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).