Subject: Re: [Caml-list] Hashtbl and security
From: Yaron Minsky
To: David Allsopp
Cc: Gerd Stolpmann, caml-list@inria.fr
Date: Fri, 30 Dec 2011 15:52:44 -0500

It's not clever in that way.  It does try to do a good job of keeping the
memory impact of the tree low, but you maintain O(1) by having a low load
factor, and therefore trees of constant size.  You can take a look at the
code here:

https://bitbucket.org/yminsky/ocaml-core/src/8e757d8f7309/base/core/lib/core_hashtbl.ml

(Don't rely on that repo too much yet, btw.  We're probably going to blow it
away and create a new one in the next couple of days.  But going forward, we
plan on using bitbucket as a place to work together with the community on
Core.)

y

On Fri, Dec 30, 2011 at 2:01 PM, David Allsopp wrote:

> Yaron Minsky wrote:
> > For just this reason, the hashtables in Core have been reimplemented to
> > use an AVL tree in the buckets.  That way, even when you have
> > pathological collisions, you degrade gracefully to O(log n) per
> > operation, instead of O(n), where n is the number of keys in the
> > hashtable.
>
> I'm resisting the temptation to hack-it-and-see: does your implementation
> do anything clever to maintain Hashtbl's O(1) insertion time (e.g.
> Hashtbl.add updates a list and then the first call to Hashtbl.find or
> Hashtbl.mem "moves" any items from the list to the AVL). Or does doing
> that impact "general" performance too much?
>
> In the POST web scenario (or processing HTTP request headers), for
> example, "degrading" Hashtbl.add from O(1) to O(log n) is only acceptable
> if you know that you'll query all the fields in the POST (which isn't
> necessarily true).
>
> David
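
[Archive note: for readers following the thread, here is a toy sketch of the
scheme Yaron describes above -- balanced trees in the buckets plus a low load
factor -- NOT Core's actual code (see the Bitbucket link for that).  It uses
the stdlib's Map, itself an AVL tree, as the per-bucket structure; the module
and function names, the string keys, and the resize threshold are all made up
for illustration.]

(* Toy sketch, not Core's implementation: each bucket is a balanced tree
   (the stdlib's Map, an AVL tree), so pathological collisions cost
   O(log n) per operation instead of O(n); a low load factor keeps the
   trees small, so ordinary use stays effectively O(1). *)

module Bucket = Map.Make (String)

type 'a t =
  { mutable buckets : 'a Bucket.t array
  ; mutable size : int
  }

let create n = { buckets = Array.make (max 1 n) Bucket.empty; size = 0 }

let index t key = Hashtbl.hash key mod Array.length t.buckets

(* Double the bucket array once the load factor exceeds ~2, so each
   per-bucket tree stays (amortized) constant-sized. *)
let maybe_resize t =
  if t.size > 2 * Array.length t.buckets then begin
    let old = t.buckets in
    t.buckets <- Array.make (2 * Array.length old) Bucket.empty;
    Array.iter
      (Bucket.iter (fun key data ->
           let i = index t key in
           t.buckets.(i) <- Bucket.add key data t.buckets.(i)))
      old
  end

let replace t key data =
  let i = index t key in
  if not (Bucket.mem key t.buckets.(i)) then t.size <- t.size + 1;
  t.buckets.(i) <- Bucket.add key data t.buckets.(i);
  maybe_resize t

let find t key = Bucket.find key t.buckets.(index t key)  (* raises Not_found *)

[This matches Yaron's answer: inserts go straight into the tree, so replace
costs O(log bucket size) rather than strictly O(1), and it is the low load
factor -- not a lazy list-to-tree conversion of the kind David asks about --
that keeps the constant-time behaviour in practice.  The sketch still uses
the default Hashtbl.hash; randomising or seeding the hash itself, which the
rest of the thread discusses, is a separate line of defence.]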
