From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105])
	by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id pBULt3dY031585
	for <caml-list@sympa-roc.inria.fr>; Fri, 30 Dec 2011 22:55:03 +0100
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AtoBAEEy/k7U4xEJkGdsb2JhbABDhQ+nRCIBAQEBCQkNBxQDIoFyAQEEASMEUhALDgoCAiYCAlcGExQGAYdfAgakDJErBIEvgk6GfIEWBI0ajTeMZg
X-IronPort-AV: E=Sophos;i="4.71,434,1320620400"; 
   d="scan'208";a="125105088"
Received: from moutng.kundenserver.de ([212.227.17.9])
  by mail4-smtp-sop.national.inria.fr with ESMTP; 30 Dec 2011 22:54:57 +0100
Received: from office1.lan.sumadev.de (dslb-094-219-218-084.pools.arcor-ip.net [94.219.218.84])
	by mrelayeu.kundenserver.de (node=mreu3) with ESMTP (Nemesis)
	id 0M6K9X-1QjGuJ2K2V-00yCDx; Fri, 30 Dec 2011 22:54:40 +0100
Received: from [192.168.178.30] (546BF816.cm-12-4d.dynamic.ziggo.nl [84.107.248.22])
	by office1.lan.sumadev.de (Postfix) with ESMTPSA id 3C10DC00C7;
	Fri, 30 Dec 2011 22:54:40 +0100 (CET)
From: Gerd Stolpmann <info@gerd-stolpmann.de>
To: Yaron Minsky <yminsky@janestreet.com>
Cc: David Allsopp <dra-news@metastack.com>,
        "caml-list@inria.fr"
 <caml-list@inria.fr>
In-Reply-To: <CACLX4jST3AoKTvsKwNpTU6GWy-dAE-jOEesyJ7kA2c2DgGyrWA@mail.gmail.com>
References: <1325263446.5036.104.camel@samsung>
	 <CACLX4jQwBZndqwq6AQiaE74Za7wvKumT_PERj6_h0-gLONOEow@mail.gmail.com>
	 <E51C5B015DBD1348A1D85763337FB6D9C25F6345@Remus.metastack.local>
	 <CACLX4jST3AoKTvsKwNpTU6GWy-dAE-jOEesyJ7kA2c2DgGyrWA@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
Date: Fri, 30 Dec 2011 22:54:38 +0100
Message-ID: <1325282078.5036.128.camel@samsung>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.2 
Content-Transfer-Encoding: 7bit
X-Provags-ID: V02:K0:RUWBzy28lC2xZpo+cvMFb45qqSHw1Wf+JuJ3pFFXCTr
 aGOzAUXYHR/6naP/dcoFsl8udTL8Kvy8xRRsY7bIvvtCma2HGX
 ayObU1yQrBQr+dG1y1XlkYi46viEnRU/D8dE9NZdJvTTdyJh4V
 2fL3lTttddcel6+acvTMzhBqC+KS/XtI//NJE0ftXlblZr8mY5
 /tx/+mZL+cl8UeZPzCBS1ppA7ya/v0QHego5bJxBS7JltIn1vK
 tJTpZFtp6HUoSTwPm++5fsuLU2mizX2fSewPXRjAjAQBC6Wik7
 iCA9c/HGhO9vMNi006X1R8AX6dnhEgB+v8BWqaQet3JeisB+vU
 yNXtUIqAuNymj2KDHfgAWPLTzNbrAmEKBpKaANseY
Subject: Re: [Caml-list] Hashtbl and security

Am Freitag, den 30.12.2011, 15:52 -0500 schrieb Yaron Minsky:
> It's not clever in that way.  It does try to do a good job of keeping
> the memory impact of the tree low, but you maintain O(1) by having a
> low load factor, and therefore trees of constant size.  You can take a
> look at the code here:
> 
> 
> https://bitbucket.org/yminsky/ocaml-core/src/8e757d8f7309/base/core/lib/core_hashtbl.ml
> 
> (Don't rely on that repo too much yet, btw.  We're probably going to
> blow it away and create a new one in the next couple of days.  But
> going forward, we plan on using bitbucket as a place to work together
> with the community on Core.)

Interesting solution, at least. When I see it right, the Avltree has
special representations for empty and 1-node trees, so with some luck
this covers 99% of the array cells. So, the memory footprint will
usually not be higher than for conventional hash tables.

So, I'd consider Core_hashtbl as a way when you need high protection
against pathological cases, but want to keep close to the performance
pattern of Hashtbl.

However, we are in an area where we need to make compromises. I guess
Core_hashtbl is (for the non-pathological cases) by a factor slower than
the more lightweight Hashtbl. This raises the question how it competes
to other solutions, e.g. a Map where the keys are first compared by
their hash before cmp is called, for instance

let (++) x f = if x<>0 then x else f()

module HString = struct
  type t = int * string
  let compare (h1,s1) (h2,s2) =
    compare h1 h2 ++ (fun () -> compare s1 s2)
end

module HStringMap = Map.Make(HString)

now use it as: HStringMap.add (Hashtbl.hash key, key) value map

This eliminates one of the drawbacks of the normal Map, namely that many
keys need to be compared (which can be costly).

Gerd

> 
> y
> 
> On Fri, Dec 30, 2011 at 2:01 PM, David Allsopp
> <dra-news@metastack.com> wrote:
>         Yaron Minsky wrote:
>         > For just this reason, the hashtables in Core have been
>         reimplemented to use an
>         > AVL tree in the buckets.  That way, even when you have
>         pathological collisions,
>         > you degrade gracefully to O(log n) per operation, instead of
>         O(n), where n is
>         > the number of keys in the hashtable.
>         
>         
>         I'm resisting the temptation to hack-it-and-see: does your
>         implementation do anything clever to maintain Hashtbl's O(1)
>         insertion time (e.g. Hashtbl.add updates a list and then the
>         first call to Hashtbl.find or Hashtbl.mem "moves" any items
>         from the list to the AVL). Or does doing that impact "general"
>         performance too much?
>         
>         In the POST web scenario (or processing HTTP request headers),
>         for example, "degrading" Hashtbl.add from O(1) to O(log n) is
>         only acceptable if you know that you'll query all the fields
>         in the POST (which isn't necessarily true).
>         
>         
>         David
> 
>