From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id pBULt3dY031585 for ; Fri, 30 Dec 2011 22:55:03 +0100 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AtoBAEEy/k7U4xEJkGdsb2JhbABDhQ+nRCIBAQEBCQkNBxQDIoFyAQEEASMEUhALDgoCAiYCAlcGExQGAYdfAgakDJErBIEvgk6GfIEWBI0ajTeMZg X-IronPort-AV: E=Sophos;i="4.71,434,1320620400"; d="scan'208";a="125105088" Received: from moutng.kundenserver.de ([212.227.17.9]) by mail4-smtp-sop.national.inria.fr with ESMTP; 30 Dec 2011 22:54:57 +0100 Received: from office1.lan.sumadev.de (dslb-094-219-218-084.pools.arcor-ip.net [94.219.218.84]) by mrelayeu.kundenserver.de (node=mreu3) with ESMTP (Nemesis) id 0M6K9X-1QjGuJ2K2V-00yCDx; Fri, 30 Dec 2011 22:54:40 +0100 Received: from [192.168.178.30] (546BF816.cm-12-4d.dynamic.ziggo.nl [84.107.248.22]) by office1.lan.sumadev.de (Postfix) with ESMTPSA id 3C10DC00C7; Fri, 30 Dec 2011 22:54:40 +0100 (CET) From: Gerd Stolpmann To: Yaron Minsky Cc: David Allsopp , "caml-list@inria.fr" In-Reply-To: References: <1325263446.5036.104.camel@samsung> Content-Type: text/plain; charset="UTF-8" Date: Fri, 30 Dec 2011 22:54:38 +0100 Message-ID: <1325282078.5036.128.camel@samsung> Mime-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 7bit X-Provags-ID: V02:K0:RUWBzy28lC2xZpo+cvMFb45qqSHw1Wf+JuJ3pFFXCTr aGOzAUXYHR/6naP/dcoFsl8udTL8Kvy8xRRsY7bIvvtCma2HGX ayObU1yQrBQr+dG1y1XlkYi46viEnRU/D8dE9NZdJvTTdyJh4V 2fL3lTttddcel6+acvTMzhBqC+KS/XtI//NJE0ftXlblZr8mY5 /tx/+mZL+cl8UeZPzCBS1ppA7ya/v0QHego5bJxBS7JltIn1vK tJTpZFtp6HUoSTwPm++5fsuLU2mizX2fSewPXRjAjAQBC6Wik7 iCA9c/HGhO9vMNi006X1R8AX6dnhEgB+v8BWqaQet3JeisB+vU yNXtUIqAuNymj2KDHfgAWPLTzNbrAmEKBpKaANseY Subject: Re: [Caml-list] Hashtbl and security Am Freitag, den 30.12.2011, 15:52 -0500 schrieb Yaron Minsky: > It's not clever in that way. It does try to do a good job of keeping > the memory impact of the tree low, but you maintain O(1) by having a > low load factor, and therefore trees of constant size. You can take a > look at the code here: > > > https://bitbucket.org/yminsky/ocaml-core/src/8e757d8f7309/base/core/lib/core_hashtbl.ml > > (Don't rely on that repo too much yet, btw. We're probably going to > blow it away and create a new one in the next couple of days. But > going forward, we plan on using bitbucket as a place to work together > with the community on Core.) Interesting solution, at least. When I see it right, the Avltree has special representations for empty and 1-node trees, so with some luck this covers 99% of the array cells. So, the memory footprint will usually not be higher than for conventional hash tables. So, I'd consider Core_hashtbl as a way when you need high protection against pathological cases, but want to keep close to the performance pattern of Hashtbl. However, we are in an area where we need to make compromises. I guess Core_hashtbl is (for the non-pathological cases) by a factor slower than the more lightweight Hashtbl. This raises the question how it competes to other solutions, e.g. a Map where the keys are first compared by their hash before cmp is called, for instance let (++) x f = if x<>0 then x else f() module HString = struct type t = int * string let compare (h1,s1) (h2,s2) = compare h1 h2 ++ (fun () -> compare s1 s2) end module HStringMap = Map.Make(HString) now use it as: HStringMap.add (Hashtbl.hash key, key) value map This eliminates one of the drawbacks of the normal Map, namely that many keys need to be compared (which can be costly). Gerd > > y > > On Fri, Dec 30, 2011 at 2:01 PM, David Allsopp > wrote: > Yaron Minsky wrote: > > For just this reason, the hashtables in Core have been > reimplemented to use an > > AVL tree in the buckets. That way, even when you have > pathological collisions, > > you degrade gracefully to O(log n) per operation, instead of > O(n), where n is > > the number of keys in the hashtable. > > > I'm resisting the temptation to hack-it-and-see: does your > implementation do anything clever to maintain Hashtbl's O(1) > insertion time (e.g. Hashtbl.add updates a list and then the > first call to Hashtbl.find or Hashtbl.mem "moves" any items > from the list to the AVL). Or does doing that impact "general" > performance too much? > > In the POST web scenario (or processing HTTP request headers), > for example, "degrading" Hashtbl.add from O(1) to O(log n) is > only acceptable if you know that you'll query all the fields > in the POST (which isn't necessarily true). > > > David > >