From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=1.0 required=5.0 tests=AWL,SPF_FAIL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by yquem.inria.fr (Postfix) with ESMTP id 8BFDBBC69 for ; Thu, 4 Oct 2007 18:06:36 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AgAAAM+rBEfAXQInemdsb2JhbACOOAEBCQo X-IronPort-AV: E=Sophos;i="4.21,230,1188770400"; d="scan'208";a="3757866" Received: from concorde.inria.fr ([192.93.2.39]) by mail3-smtp-sop.national.inria.fr with ESMTP; 04 Oct 2007 18:06:36 +0200 Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id l94G6XTM005383 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=OK) for ; Thu, 4 Oct 2007 18:06:35 +0200 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AgAAAIOsBEdQW+UCh2dsb2JhbACOOAEBCQon X-IronPort-AV: E=Sophos;i="4.21,230,1188770400"; d="scan'208";a="2383217" Received: from main.gmane.org (HELO ciao.gmane.org) ([80.91.229.2]) by mail2-smtp-roc.national.inria.fr with ESMTP; 04 Oct 2007 18:06:33 +0200 Received: from list by ciao.gmane.org with local (Exim 4.43) id 1IdTAN-0002LC-J6 for caml-list@inria.fr; Thu, 04 Oct 2007 16:03:11 +0000 Received: from ivr94-8-88-162-26-239.fbx.proxad.net ([88.162.26.239]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 04 Oct 2007 16:03:11 +0000 Received: from li by ivr94-8-88-162-26-239.fbx.proxad.net with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 04 Oct 2007 16:03:11 +0000 X-Injected-Via-Gmane: http://gmane.org/ To: caml-list@inria.fr From: Zheng Li Subject: Re: [ANN] Weaktbl: a weak hash table library Date: Thu, 04 Oct 2007 18:01:13 +0200 Message-ID: <878x6iuawm.fsf@pps.jussieu.fr> References: <87641q3ysp.fsf@pps.jussieu.fr> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Complaints-To: usenet@sea.gmane.org X-Gmane-NNTP-Posting-Host: ivr94-8-88-162-26-239.fbx.proxad.net User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/23.0.50 (gnu/linux) Cancel-Lock: sha1:FCERN7nojYlCEDIbydV2YqDHPcQ= Sender: news X-Miltered: at concorde with ID 47050F89.002 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; hash:01 advices:01 semantics:01 lgpl:01 semantics:01 hashtbl:01 structurally:01 bindings:01 hashtbl:01 hash:01 key-:01 hashtable:01 hashtable:01 bindings:01 gced:01 Hi, Follow some advices, I updated Weaktbl to version 0.02. Notable improvements include: - the implementation is improved based on the clarified semantics - a new weak stack implementation is used internally (I'm not sure whether it's useful to others so I decide not to expose it through interface) - add new sections to README The interface remains the same. Anyone has downloaded the previous version is strongly suggested to upgrade. Btw, the code is released under LGPL + linking exception. Here are the extra sections from the README: ------------------------------------------- ...... == Semantics == Weaktbl have all the same semantics as standard Hashtbl, with the only exception: a binding will be automatically garbage collected, if *its key* is no longer referenced by (hence unreachable from) the rest part your program. Note that - Here, "its key" indicates the exact (physical) key with which the binding was added, not any other keys of its equivalent class even if they may (or may not) be structurally equivalent to this key. - You may use noncollectable values as keys (such as int), but they won't be collected by the Weaktbl (this is the same convention all weak data structures follow). Since the keys won't be collected, according to our principle -- key's aliveness decide its' binding's aliveness, the bindings won't be collected too. You're actually using Weaktbl as a persistent Hashtbl, but that's still not too bad. Finally, the polymorphic hash interface of Weaktbl takes ''equal x y = (compare x y) = 0'' and ''hash = Hashtbl.hash'' as the default setting, the same as the polymorphic interface of standard Hashtbl. == Application == Weaktbl is useful for many kinds of applications, where standard Hashtbl's persistent storage prevents the necessity of automatic garbage collection. We'll illustrate this with a few typical examples: === Dict cache === Hash table is typically used for quick dictionary lookup. Suppose you're working with a huge data set and key->value lookup is the most frequent operations. To have all the data into a hash table in memory is out of consideration; but to have everything external is way too expensive (e.g. looking up in a large file or querying from a external database each time). Fortunately, you can work on one small part of the data set at a time, and the range of this small part evolving gradually, i.e. some data coming into the range of vision, others going out of scope. A typical working mechanics can be written as let query key = try lookup $key in $hashtable (* quick *) with Not_found -> let value = lookup $key in $external_storage in (* slow *) add $key $value to $hashtable; (* we may query it quite often recently *) value You'll soon find the problem -- $hashtable here will growing forever! The situation is, with the evolving of working range, some keys are out of reach and should be collected by the GC, so should the bindings related to them. However due to the Hashtbl's persistence, this won't happen. First, the key itself is persistently referenced by Hashtbl, it wouldn't likely to be GCed, not mention the bindings. For a weak hash table, this is not a problem: keys themselves are *weakly* referenced by the table, besides when a key is forgotten (GCed) by your program, its binding(s) will be forgotten (GCed) by the table too [*]. === User-level GC === Suppose you are working on graph-like data structure. Each node is represented as a id -> {links: id list; info: huge_data} bindings in a hash table, where the links is all the nodes directly reachable from node id, and the info is large blocks of information affiliated to the node id. The graph structure keeps evolving over time, i.e. new nodes being added, new links being established, old links being broken and old nodes being dropped etc. So far it's just normal. Image you break a link, i.e. remove some $id_b from the links field of $id_a, it's possible this is the only bridge between the sub-graph $id_b is located and the outside world, besides there's no active id directly points to this sub-graph. In short, the sub-graph is disconnected, obsoleted and unreachable. Because the graph data structure keeps evolving, it's very important to GC those unreachable parts. How this is done with a standard Hashtbl? Whenever we break a line $id_a -> $id_b, we check all the nodes' links field to see whether this is the last link to $id_b. If so, we first remove $id_b, break each outward link of $id_b and recursively check whether these broken links produce more unreachable nodes... With weak hash table, you do nothing. When the last link to a sub-graph is broken, it simply means the sub-graph is no longer referenced by the rest program, so it's GCed automatically. Zheng Li writes: > Hi, > > I remember weak hash table was discussed on the list not long ago. I once ran > into a situation where weak data structure was desired, and came up with this > small module. Though I didn't really get a chance to make use of it (I turned > to another solution laterly), I'd like to share and hope it would be useful to > others. > > == Description == > > Weaktbl is yet another weak hash table library for OCaml. Its main features > include: > > * Both keys and associated values are weakly stored. A binding exists until > the key is no longer referenced anywhere > * The implementation is built upon the hash table functor of Weak library > rather than implemented from scratch, so it's rather small > * The interface is fully compatible with the standard Hashtbl library instead > of the hash sub-module of the Weak library, so basically you can also use it > as an alternative of the standard Hashtbl > * Its behaviors also follow the standard Hashtbl library's conventions. > E.g. the "binding orders" and the "current binding" concepts all make > sense here (with find/find_all/remove/replace/iter/fold etc.) > > Link: http://www.pps.jussieu.fr/~li/software/index.html#weaktbl -- Zheng Li http://www.pps.jussieu.fr/~li