caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Gerd Stolpmann <info@gerd-stolpmann.de>
To: Yaron Minsky <yminsky@janestreet.com>
Cc: David Allsopp <dra-news@metastack.com>,
	"caml-list@inria.fr" <caml-list@inria.fr>
Subject: Re: [Caml-list] Hashtbl and security
Date: Fri, 30 Dec 2011 22:54:38 +0100	[thread overview]
Message-ID: <1325282078.5036.128.camel@samsung> (raw)
In-Reply-To: <CACLX4jST3AoKTvsKwNpTU6GWy-dAE-jOEesyJ7kA2c2DgGyrWA@mail.gmail.com>

Am Freitag, den 30.12.2011, 15:52 -0500 schrieb Yaron Minsky:
> It's not clever in that way.  It does try to do a good job of keeping
> the memory impact of the tree low, but you maintain O(1) by having a
> low load factor, and therefore trees of constant size.  You can take a
> look at the code here:
> 
> 
> https://bitbucket.org/yminsky/ocaml-core/src/8e757d8f7309/base/core/lib/core_hashtbl.ml
> 
> (Don't rely on that repo too much yet, btw.  We're probably going to
> blow it away and create a new one in the next couple of days.  But
> going forward, we plan on using bitbucket as a place to work together
> with the community on Core.)

Interesting solution, at least. When I see it right, the Avltree has
special representations for empty and 1-node trees, so with some luck
this covers 99% of the array cells. So, the memory footprint will
usually not be higher than for conventional hash tables.

So, I'd consider Core_hashtbl as a way when you need high protection
against pathological cases, but want to keep close to the performance
pattern of Hashtbl.

However, we are in an area where we need to make compromises. I guess
Core_hashtbl is (for the non-pathological cases) by a factor slower than
the more lightweight Hashtbl. This raises the question how it competes
to other solutions, e.g. a Map where the keys are first compared by
their hash before cmp is called, for instance

let (++) x f = if x<>0 then x else f()

module HString = struct
  type t = int * string
  let compare (h1,s1) (h2,s2) =
    compare h1 h2 ++ (fun () -> compare s1 s2)
end

module HStringMap = Map.Make(HString)

now use it as: HStringMap.add (Hashtbl.hash key, key) value map

This eliminates one of the drawbacks of the normal Map, namely that many
keys need to be compared (which can be costly).

Gerd

> 
> y
> 
> On Fri, Dec 30, 2011 at 2:01 PM, David Allsopp
> <dra-news@metastack.com> wrote:
>         Yaron Minsky wrote:
>         > For just this reason, the hashtables in Core have been
>         reimplemented to use an
>         > AVL tree in the buckets.  That way, even when you have
>         pathological collisions,
>         > you degrade gracefully to O(log n) per operation, instead of
>         O(n), where n is
>         > the number of keys in the hashtable.
>         
>         
>         I'm resisting the temptation to hack-it-and-see: does your
>         implementation do anything clever to maintain Hashtbl's O(1)
>         insertion time (e.g. Hashtbl.add updates a list and then the
>         first call to Hashtbl.find or Hashtbl.mem "moves" any items
>         from the list to the AVL). Or does doing that impact "general"
>         performance too much?
>         
>         In the POST web scenario (or processing HTTP request headers),
>         for example, "degrading" Hashtbl.add from O(1) to O(log n) is
>         only acceptable if you know that you'll query all the fields
>         in the POST (which isn't necessarily true).
>         
>         
>         David
> 
> 



  reply	other threads:[~2011-12-30 21:55 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-12-30 16:44 Gerd Stolpmann
2011-12-30 16:48 ` Yaron Minsky
2011-12-30 19:01   ` David Allsopp
2011-12-30 20:52     ` Yaron Minsky
2011-12-30 21:54       ` Gerd Stolpmann [this message]
2011-12-30 17:06 ` Xavier Leroy
2011-12-30 21:16   ` Gerd Stolpmann
2011-12-31  0:57   ` oliver
2011-12-31  0:59     ` oliver
2012-01-01 12:52   ` Richard W.M. Jones
2012-01-01 17:29     ` Xavier Leroy
2012-01-01 21:04       ` Gerd Stolpmann
2012-01-01 23:24         ` oliver
2012-01-01 23:58           ` Gerd Stolpmann
2012-01-02  1:43             ` oliver
2012-01-04 17:56               ` Damien Doligez
2012-01-04 21:52                 ` oliver
2012-01-02  9:34         ` David MENTRE
2012-01-30 10:54       ` Goswin von Brederlow
2011-12-30 17:40 ` rixed
2011-12-30 17:52   ` Edgar Friendly
2011-12-31  1:02   ` oliver
2011-12-31  0:33 ` oliver
2012-01-02  0:21 ` Shawn Wagner
2012-01-02 14:52   ` Gerd Stolpmann
2012-01-30 10:51 ` Goswin von Brederlow
2012-01-31 14:16   ` Gerd Stolpmann
2012-02-08  9:41     ` Goswin von Brederlow
2012-02-08 10:43       ` Philippe Wang
2012-02-08 10:46       ` AUGER Cédric
2012-02-09 13:22         ` Goswin von Brederlow
2012-02-09 14:48           ` Gerd Stolpmann
2012-02-08 11:12       ` Gerd Stolpmann
2012-02-09 13:11         ` Goswin von Brederlow

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1325282078.5036.128.camel@samsung \
    --to=info@gerd-stolpmann.de \
    --cc=caml-list@inria.fr \
    --cc=dra-news@metastack.com \
    --cc=yminsky@janestreet.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).