From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=1.0 required=5.0 tests=AWL,SPF_FAIL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by yquem.inria.fr (Postfix) with ESMTP id 83884BC69 for ; Wed, 3 Oct 2007 12:51:05 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AgAAAHsQA0fAXQInemdsb2JhbACOOAIJCg X-IronPort-AV: E=Sophos;i="4.21,224,1188770400"; d="scan'208";a="17233295" Received: from concorde.inria.fr ([192.93.2.39]) by mail4-smtp-sop.national.inria.fr with ESMTP; 03 Oct 2007 12:51:05 +0200 Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id l93Ap4I6014100 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=OK) for ; Wed, 3 Oct 2007 12:51:04 +0200 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AgAAAHsQA0dQW+UCh2dsb2JhbACOOAIBCAon X-IronPort-AV: E=Sophos;i="4.21,224,1188770400"; d="scan'208";a="17233294" Received: from main.gmane.org (HELO ciao.gmane.org) ([80.91.229.2]) by mail4-smtp-sop.national.inria.fr with ESMTP; 03 Oct 2007 12:51:03 +0200 Received: from list by ciao.gmane.org with local (Exim 4.43) id 1Id1mL-00081K-8r for caml-list@inria.fr; Wed, 03 Oct 2007 10:48:33 +0000 Received: from ivr94-8-88-162-26-239.fbx.proxad.net ([88.162.26.239]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 03 Oct 2007 10:48:33 +0000 Received: from li by ivr94-8-88-162-26-239.fbx.proxad.net with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 03 Oct 2007 10:48:33 +0000 X-Injected-Via-Gmane: http://gmane.org/ To: caml-list@inria.fr From: Zheng Li Subject: Re: [ANN] Weaktbl: a weak hash table library Date: Wed, 03 Oct 2007 12:43:13 +0200 Message-ID: <873awsv5q6.fsf@pps.jussieu.fr> References: <87641q3ysp.fsf@pps.jussieu.fr> <1191323231.28954.27.camel@rosella.wigram> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Complaints-To: usenet@sea.gmane.org X-Gmane-NNTP-Posting-Host: ivr94-8-88-162-26-239.fbx.proxad.net User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/23.0.50 (gnu/linux) Cancel-Lock: sha1:lr+vonq366gdgcLiQ46AjOUBlsc= Sender: news X-Miltered: at concorde with ID 47037418.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; hash:01 variants:01 pointer:01 unreferenced:01 hashtable:01 hashtbl:01 node:01 ocaml:01 hash:01 pointers:01 pointers:01 functor:01 hashtbl:01 hashedtype:01 advices:01 Hi, skaller writes: >> * Both keys and associated values are weakly stored. A binding exists until >> the key is no longer referenced anywhere > > I'm a bit confused what it means.. > > If the key is weak, it is useless except for terms, i.e. variants > or records of another data structure, where you are placing a pointer > in the table as a key. You can't put values (int, float, etc) in the > key field, because they're immediately unreferenced. What happen if you put a noncollectable value (e.g. int) into a weak data structure (e.g. standard weak array/hashtable)? It won't disappear from the data structure unless you manually remove it. It's the same here, if you use noncollectable data as the key, the key won't disappear, hence the binding won't disappear (according to our principle -- key decide whether the binding disappear). So at least it's consistent with the standard Weak module. If you're using noncollectable value as key, you're actually using Weaktbl as normal Hashtbl, that's still not bad at all. > So you could use this thing as a cache, where for example you have > a tree and calculate some property of each node, and cache it > in the table. > > The problem is .. physical comparison in Ocaml is unstable, > so you cannot hash pointers using polymorphic hash, > but pointers are the only kind that persist. > > So I have to conclude you must use the Weaktbl with a functor, > instantiating the hash function to something which does a computation > on the value, and this claim: Weaktbl provides the same interface as Hashtbl, i.e. a polymorphic one and a functorial one. The functorial signature is exactly: module Make (H : Hashtbl.HashedType) : Hashtbl.S with type key = H.t The polymorphic one is equivalent to having "equal x y = (compare x y) = 0" and "hash = Hashtbl.hash", just as the Hashtbl's polymorphic interface did. You don't have to (shouldn't, as in standard Hashtbl) use a hash function related to physical property of data, just a normal hash (as the standard one) based on structural property is enough. Actually, it's the "equal" matters to the semantic. Here I'd like to clarify it a bit, and ask for some *advices*: - If physical equivalence (==) provided as "equal", there should be nothing ambiguous here, because every key is unique. A key disappears, all its bindings disappear. Querying with a key (e.g. using find/find_all/remove etc.), only those bindings added with this exact key will be given. The semantic is clear, every key is only related to binding added by itself --- because in each equivalent class there is just one value. - If other equivalent relation provided as "equal", e.g. "let equal = (=)" or "let equal x y = (fst x) = (fst y)". No doubt, when queering with a key , all bindings being added by its equivalent class should be given --- that's what we define the equivalent relation for. However, on the other problem: when a binding should disappear, we have two kinds of semantics as candidate: * The first one is used in the current library: a binding is physically related to the key with which it was added with. I.e., each key is still physically unique, and its presence decide the aliveness of all the bindings added with it. Still, a key disappear, the bindings added by it disappear. This has nothing to do with the equivalent relationship --- equivalent class relation is only used for querying and manipulation. E.g., if we define "equal = (=)" and $ka and $kb are keys structurally equivalent but physically nonequivalent, the presence of bindings added by $ka and bindings added by $kb is still independently decided by the presence of $ka and $kb. But when doing find_all/find/remove/replace etc., using no matter $ka or $kb should give you the same result. * Another possibility is: a binding will exist until all keys that have contributed to this equivalent class disappear, or say as far as there is at least one key that ever contributed to this class, all bindings of this class will remain; when the last key of this class disappears, the bindings belong to this class disappear all together. The rational behind the second approach is: people can sometimes mix up structurally equivalent but physically nonequivalent keys. E.g. they can use key $ka to add some binding, but *accidentally* switch to $ka's structural equivalence $kb afterwards to add more bindings. (That's normal, for example, you filter $ka through a function which doesn't return $ka itself but its copy among the result, and you don't insist in using the original $ka, but instead using the one in the output.) After a while, $ka is probably no longer used and hence GCed sometime later. In the first approach, those bindings added with $ka will just disappear which sounds mysterious; in the second approach, all bindings remain as far as one of $ka or $kb is alive. So people may think the second approach is usually convenient, because it's more conservative and safe, and usually people don't care much about the more memory occupation as far as it won't leak overall. But the second approach doesn't fully solve its problem and can produce even more mysterious situation: since its rational is assume people can make mistakes, then think about the following situation: after you using $kb as the key for a while, you accidentally (again) switch to $kc, however you hasn't got any chance to use $kc to manipulate bindings yet (so the weaktbl knows nothing about the existence of $kc), you just hold it because you believe it's $k(b) (and $k(a)). After a while, $ka and $kb disappear, all the bindings are gone...you're still holding $kc. I actually implemented both of them, the switch is just one line -- the first approach use a weak stack inside, the second use a common stack. However I don't want to provide both to confuse people. I'd like to hear more suggestions to decide which one is desired. Thanks in advance. -- Zheng Li http://www.pps.jussieu.fr/~li