caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Berke Durak <berke.durak@exalead.com>
To: "Harrison, John R" <john.r.harrison@intel.com>
Cc: Caml-list List <caml-list@inria.fr>
Subject: Re: [Caml-list] Canonical Set/Map datastructure?
Date: Fri, 07 Mar 2008 11:09:06 +0100	[thread overview]
Message-ID: <47D11442.6090409@exalead.com> (raw)
In-Reply-To: <DCC19446A892D84FBB89AE7C94F0C04E01D98FB2BE@azsmsx501.amr.corp.intel.com>

Harrison, John R a écrit :
> Hi Berke,
> 
> | However, the idea of combining hash-consing and Patricia trees,
> | however elegant, does not suit my problem.  Basically, you are
> | cheating by using an auxiliary data structure, the hashtable (which is
> | also O(n^2) worst-case).
> 
> The code I pointed you to uses hashes, but not hash tables. As for the
> worst-case efficiency, what you say is true, but it's unlikely to
> matter in practice. And I suspect it may be unavoidable in principle,
> which is the interesting thing I learned last time this was discussed.
> See for example the following paper and its references:

Hi John,

Thanks for your code.  I'm sorry I thought you maintained a separate hash table.
It's very interesting code and I'll give it a try.

But it seems to have two properties which make it unsuitable for
the particular application I had in mind:

- The theoretical worst-case performance of hash-based data structures
can be attained if the hash function has bad dispersal.  As the code
relies on Hashtbl.hash, which does not hash deeply, this could
potentially be a proble, in particular if your keys have long "common
prefixes" that are not distinguished by the hash function.  It might
work well in practice but I feel a little uncomfortable.

- Also, it does not preserve the natural order for keys, and that
is particularly bad, because I often use, for instance, float-indexed
maps or sets as a substitute for heaps.

The paper with the inverse cubic lower bound is *very* interesting; we don't
see plausible lower bounds often in complexity theory, especially with such
large assumptions (just bounded out-degree for the graph nodes).

So it seems there is little hope to have a drop-in replacement for Set
or Map that is compatible with the natural order of things, a.k.a.,
Pervasives.compare.

-- 
Berke DURAK


  reply	other threads:[~2008-03-07 10:09 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-03-05 16:49 Berke Durak
2008-03-05 17:16 ` [Caml-list] " Brian Hurt
2008-03-05 17:27 ` Alain Frisch
2008-03-05 19:53   ` Jean-Christophe Filliâtre
2008-03-05 20:03   ` Jon Harrop
2008-03-05 21:56     ` Alain Frisch
2008-03-06  7:45     ` Jean-Christophe Filliâtre
2008-03-05 17:34 ` Harrison, John R
2008-03-06  9:53 ` Berke Durak
2008-03-06 17:36   ` Harrison, John R
2008-03-07 10:09     ` Berke Durak [this message]
2008-03-07 17:13       ` Harrison, John R
2008-03-07 10:19   ` Alain Frisch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47D11442.6090409@exalead.com \
    --to=berke.durak@exalead.com \
    --cc=caml-list@inria.fr \
    --cc=john.r.harrison@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).