caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Basile Starynkevitch <basile@starynkevitch.net>
To: Ian Zimmerman <itz@buug.org>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Hashtbl or Set?
Date: Mon, 19 Sep 2011 22:17:18 +0200	[thread overview]
Message-ID: <20110919221718.16929df0551299a720ca6def@starynkevitch.net> (raw)
In-Reply-To: <87sjnsl2f0.fsf@foolinux.dyndns.org>

On Mon, 19 Sep 2011 13:09:39 -0700
Ian Zimmerman <itz@buug.org> wrote:

> 
> I need a somewhat large (thousands) set of strings, created once during
> startup and never modified after. 

You didn't tell us why you need such as set, and what kind of strings are inside. 
(I mean that if the strings are file contents -e.g. typically 100kbytes long-, it is
different than if the strings are dictionnary words or source program identifiers - e.g.
typically a dozen of bytes each-)

> What is a better choice, a (string, unit) Hashtbl.t or the Set module? 
> If the Set module still uses trees
> as it did when I was young :-), access will be logarithmic versus
> constant for Hashtbl.  But on the other hand a hash function must
> examine all of every string while the comparison of 2 strings stops at
> the first nonmatching character.

Honestly, I believe that for practical purposes, unless you have very strange
constraints, both approaches are good enough. Don't forget that too early optimization is
evil.


> I am thinking about the time to build the set as well as the probing
> time.

Make it a module, that is, your abstract data type. Then, if you discover later that this
module is a bottleneck, optimize it further.


Honestly, I think you should not care that much in the first place. (the real timing may
also depend of memory cache considerations and sizes).

Cheers.
-- 
Basile STARYNKEVITCH         http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***

  reply	other threads:[~2011-09-19 20:17 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-09-19 20:09 Ian Zimmerman
2011-09-19 20:17 ` Basile Starynkevitch [this message]
2011-09-20 10:13 ` Gerd Stolpmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110919221718.16929df0551299a720ca6def@starynkevitch.net \
    --to=basile@starynkevitch.net \
    --cc=caml-list@inria.fr \
    --cc=itz@buug.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).