It's not clever in that way.  It does try to do a good job of keeping the memory impact of the tree low, but you maintain O(1) by having a low load factor, and therefore trees of constant size.  You can take a look at the code here:

https://bitbucket.org/yminsky/ocaml-core/src/8e757d8f7309/base/core/lib/core_hashtbl.ml

(Don't rely on that repo too much yet, btw.  We're probably going to blow it away and create a new one in the next couple of days.  But going forward, we plan on using bitbucket as a place to work together with the community on Core.)

y

On Fri, Dec 30, 2011 at 2:01 PM, David Allsopp <dra-news@metastack.com> wrote:
Yaron Minsky wrote:
> For just this reason, the hashtables in Core have been reimplemented to use an
> AVL tree in the buckets.  That way, even when you have pathological collisions,
> you degrade gracefully to O(log n) per operation, instead of O(n), where n is
> the number of keys in the hashtable.

I'm resisting the temptation to hack-it-and-see: does your implementation do anything clever to maintain Hashtbl's O(1) insertion time (e.g. Hashtbl.add updates a list and then the first call to Hashtbl.find or Hashtbl.mem "moves" any items from the list to the AVL). Or does doing that impact "general" performance too much?

In the POST web scenario (or processing HTTP request headers), for example, "degrading" Hashtbl.add from O(1) to O(log n) is only acceptable if you know that you'll query all the fields in the POST (which isn't necessarily true).


David