* Weak hash table for attaching extra data to an object @ 2007-08-14 10:15 Richard Jones 2007-08-14 11:08 ` [Caml-list] " Thomas Fischbacher 2007-08-14 16:22 ` [Caml-list] Weak hash table for attaching extra data to an object Richard Jones 0 siblings, 2 replies; 15+ messages in thread From: Richard Jones @ 2007-08-14 10:15 UTC (permalink / raw) To: caml-list Problem: I have a module which I can't edit[1]. This module, let's call it Document, stores a representation of an XML tree in internal node objects: type intnode What I want to do is -- completely outside Document -- attach my own extra information to these nodes. The key point is that if the garbage collector collects an intnode, then I want my extra information to be garbage collected too. My proposed solution would use a weak hash table similar to / derived from this one: http://remi.vanicat.free.fr/ocaml/hweak/. I would store in this hash table mappings from (weak) intnode pointers to my extra data structure. Remi's Hweak module isn't quite what I need because the data is also weak. Since nothing except the hash table itself is tracking my extra data, the GC can potentially drop all of the bindings, and that's not much help. I tried to modify Hweak so that the data is not weak, but that proved to be quite hard. I wonder if anyone has done this, or can suggest a better way to solve this problem? The issue of attaching extra data to some existing object would seem to be a fairly common one ... Rich. [1] By "can't edit" I really mean "don't want to modify". I want Document to be general purpose. Also, unlike PXP, I want multiple callers to be able to individually attach data to nodes. -- Richard Jones Red Hat ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Caml-list] Weak hash table for attaching extra data to an object 2007-08-14 10:15 Weak hash table for attaching extra data to an object Richard Jones @ 2007-08-14 11:08 ` Thomas Fischbacher 2007-08-14 11:48 ` Till Varoquaux 2007-08-14 20:44 ` Jon Harrop 2007-08-14 16:22 ` [Caml-list] Weak hash table for attaching extra data to an object Richard Jones 1 sibling, 2 replies; 15+ messages in thread From: Thomas Fischbacher @ 2007-08-14 11:08 UTC (permalink / raw) To: Richard Jones; +Cc: caml-list Richard Jones wrote: > I have a module which I can't edit[1]. This module, let's call it > Document, stores a representation of an XML tree in internal node > objects: > > type intnode > > What I want to do is -- completely outside Document -- attach my own > extra information to these nodes. Ah, so the situation really is somewhat common, and not something just I am running into occasionally. (The problem in particular also arises occasionally when writing symbolic algebra code when one wants to attach meta-data (e.g. on typesetting) to terms without touching the underlying term representations in any way.) > The key point is that if the > garbage collector collects an intnode, then I want my extra > information to be garbage collected too. > > I wonder if anyone has done this, or can suggest a better way to solve > this problem? The issue of attaching extra data to some existing > object would seem to be a fairly common one ... You might be able to "fake" a weak hash table by using a trick: for a key-value entry k1 -> v1, you take k1 and v1 and rebuild the outermost data structure to get k2 with k2 = k1 and k2 != k1 (oh hey, now that looks confusing, right?), as well as v2 = v1 and v2 != v1. Now, you enter k2 -> v2 into an ordinary hash table and register finalizers for k1 and v1 which remove this k2 -> v2 entry from your hash. That might do. -- best regards, Thomas Fischbacher tf@functionality.de ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Caml-list] Weak hash table for attaching extra data to an object 2007-08-14 11:08 ` [Caml-list] " Thomas Fischbacher @ 2007-08-14 11:48 ` Till Varoquaux 2007-08-14 20:44 ` Jon Harrop 1 sibling, 0 replies; 15+ messages in thread From: Till Varoquaux @ 2007-08-14 11:48 UTC (permalink / raw) To: Richard Jones; +Cc: caml-list I have a small library that could do the trick. It is built a a hashtable whose keys are weak, you need to recompact it from time to time in order to free associated values. One strategy would be to recompact when n values are freed. I tried to limit the performance cost by storing collision values in a separate table. The code can be browsed online at: http://till.varoquaux.free.fr/projects/mini_js/WeakHt.ml.html There's a very small dependency on another file, i use Option.get (defined as in extlib: let get = function | Some v -> v | None -> raise No_value ) It is licensed under lgpl+special linking clause. If you need a more permissive license let me know. This can be downloaded via darcs: darcs get http://till.varoquaux.free.fr/projects/mini_js/WeakHt.ml.html or directly at: http://till.varoquaux.free.fr/projects/mini_js/WeakHt.ml There no mli file (I'm lazy). Constructive criticism is welcome. If you need automatic collection of values and have a hard time implementing I can work on it. Cheers, Till P.S.: I wasn't aware of HWeak when I did this library, I don't know how it compares performance wise. Having never had to profile this library I actually don't know how bad it really is. This can certainly also be worked on. On 8/14/07, Thomas Fischbacher <tf@functionality.de> wrote: > > Richard Jones wrote: > > > I have a module which I can't edit[1]. This module, let's call it > > Document, stores a representation of an XML tree in internal node > > objects: > > > > type intnode > > > > What I want to do is -- completely outside Document -- attach my own > > extra information to these nodes. > > Ah, so the situation really is somewhat common, and not something > just I am running into occasionally. (The problem in particular > also arises occasionally when writing symbolic algebra code when one > wants to attach meta-data (e.g. on typesetting) to terms without > touching the underlying term representations in any way.) > > > The key point is that if the > > garbage collector collects an intnode, then I want my extra > > information to be garbage collected too. > > > > I wonder if anyone has done this, or can suggest a better way to solve > > this problem? The issue of attaching extra data to some existing > > object would seem to be a fairly common one ... > > You might be able to "fake" a weak hash table by using a trick: for a > key-value entry k1 -> v1, you take k1 and v1 and rebuild the outermost > data structure to get k2 with k2 = k1 and k2 != k1 (oh hey, now that > looks confusing, right?), as well as v2 = v1 and v2 != v1. Now, you > enter k2 -> v2 into an ordinary hash table and register finalizers for > k1 and v1 which remove this k2 -> v2 entry from your hash. That might > do. > > -- > best regards, > Thomas Fischbacher > tf@functionality.de > > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > -- http://till-varoquaux.blogspot.com/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Caml-list] Weak hash table for attaching extra data to an object 2007-08-14 11:08 ` [Caml-list] " Thomas Fischbacher 2007-08-14 11:48 ` Till Varoquaux @ 2007-08-14 20:44 ` Jon Harrop 2007-08-14 23:09 ` Stefano Zacchiroli ` (2 more replies) 1 sibling, 3 replies; 15+ messages in thread From: Jon Harrop @ 2007-08-14 20:44 UTC (permalink / raw) To: caml-list On Tuesday 14 August 2007 12:08:11 Thomas Fischbacher wrote: > Richard Jones wrote: > > I have a module which I can't edit[1]. This module, let's call it > > Document, stores a representation of an XML tree in internal node > > objects: > > > > type intnode > > > > What I want to do is -- completely outside Document -- attach my own > > extra information to these nodes. > > Ah, so the situation really is somewhat common, and not something > just I am running into occasionally. (The problem in particular > also arises occasionally when writing symbolic algebra code when one > wants to attach meta-data (e.g. on typesetting) to terms without > touching the underlying term representations in any way.) I think this is an important design choice that people should be aware of when they first write the code. If you think the product types inside your expr type will be extended: type expr = | Int of int | Var of string | Seq of expr * expr array;; with pattern matches like this: Seq(h, t) -> ... Then a good first step is to turn n-ary constructor arguments into a single record: type expr = | Int of int | Var of string | Seq of seq and seq = {h: expr; t: expr array};; Then you write your pattern matches like this: Seq{h=h; t=t} -> ... and you are free to add more fields to the record without having to update such pattern matches. You also need to replace direct construction with an explicit constructor function: let seq h t = Seq{h=h; t=t} You can also parameterize the type over metadata: type 'a expr = | Int of int | Var of string | Seq of 'a seq and 'a seq = {h: 'a expr; t: 'a expr array; meta: 'a};; Now you can insert arbitrary metadata into Seq nodes. Another possible solution (at design time) is to "untie the recursive knot" of the data structure: type 'expr aexpr = | Int of int | Var of string | Seq of 'expr * 'expr array;; This allows you to bind data structures together and, in particular, lets you inject metadata nodes between levels of exprs in the tree: type 'a expr = Meta of 'a * 'a expr aexpr;; (This can be done more efficiently using -rectypes) If your metadata are heavily used (e.g. hashing or culling information) and you are designing your own data structure then these approaches can all be useful (as well as polymorphic variants, that allow other forms of extension) because these approaches all result in fast code. If your metadata are sporadically used or, in particular, only sparsely present or you do not have access to the library then the weak hash table approach is probably the way to go. -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. OCaml for Scientists http://www.ffconsultancy.com/products/ocaml_for_scientists/?e ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Caml-list] Weak hash table for attaching extra data to an object 2007-08-14 20:44 ` Jon Harrop @ 2007-08-14 23:09 ` Stefano Zacchiroli 2007-08-15 0:33 ` Jon Harrop 2007-08-15 1:28 ` skaller 2007-08-15 19:04 ` Richard Jones 2 siblings, 1 reply; 15+ messages in thread From: Stefano Zacchiroli @ 2007-08-14 23:09 UTC (permalink / raw) To: caml-list On Tue, Aug 14, 2007 at 09:44:15PM +0100, Jon Harrop wrote: > I think this is an important design choice that people should be aware of when > they first write the code. If you think the product types inside your expr Richard's point was precisely about *not* being able to change such a design choice. There can be a lot of reasons for that, for example (even though this was not Richard's situation AFAICT) not being the author of the data structure you want to decorate. IOW: the original question is still valid. -- Stefano Zacchiroli -*- PhD in Computer Science ............... now what? zack@{cs.unibo.it,debian.org,bononia.it} -%- http://www.bononia.it/zack/ (15:56:48) Zack: e la demo dema ? /\ All one has to do is hit the (15:57:15) Bac: no, la demo scema \/ right keys at the right time ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Caml-list] Weak hash table for attaching extra data to an object 2007-08-14 23:09 ` Stefano Zacchiroli @ 2007-08-15 0:33 ` Jon Harrop 2007-08-15 12:33 ` Daniel Bünzli 2007-08-16 14:54 ` Markus Mottl 0 siblings, 2 replies; 15+ messages in thread From: Jon Harrop @ 2007-08-15 0:33 UTC (permalink / raw) To: caml-list On Wednesday 15 August 2007 00:09:40 Stefano Zacchiroli wrote: > On Tue, Aug 14, 2007 at 09:44:15PM +0100, Jon Harrop wrote: > > I think this is an important design choice that people should be aware of > > when they first write the code. If you think the product types inside > > your expr > > Richard's point was precisely about *not* being able to change such a > design choice. There can be a lot of reasons for that, for example (even > though this was not Richard's situation AFAICT) not being the author of > the data structure you want to decorate. Yes. Laterally, the point can be addressed by making library writers aware of the points that I made. I think Markus Mottl once said that all libraries should include phantom types... -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. OCaml for Scientists http://www.ffconsultancy.com/products/ocaml_for_scientists/?e ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Caml-list] Weak hash table for attaching extra data to an object 2007-08-15 0:33 ` Jon Harrop @ 2007-08-15 12:33 ` Daniel Bünzli 2007-08-16 14:54 ` Markus Mottl 1 sibling, 0 replies; 15+ messages in thread From: Daniel Bünzli @ 2007-08-15 12:33 UTC (permalink / raw) To: caml-list Le 15 août 07 à 02:33, Jon Harrop a écrit : > I think Markus Mottl once said that all libraries should include > phantom types... I don't know if he said that however the point of a phantom type is that there is no concrete value associated to it -- the phantom parameter isn't used in the type definition. Hence phantoms are useless if you need to decorate a type. Daniel ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Caml-list] Weak hash table for attaching extra data to an object 2007-08-15 0:33 ` Jon Harrop 2007-08-15 12:33 ` Daniel Bünzli @ 2007-08-16 14:54 ` Markus Mottl 1 sibling, 0 replies; 15+ messages in thread From: Markus Mottl @ 2007-08-16 14:54 UTC (permalink / raw) To: Jon Harrop; +Cc: caml-list On 8/14/07, Jon Harrop <jon@ffconsultancy.com> wrote: > Yes. Laterally, the point can be addressed by making library writers aware of > the points that I made. I think Markus Mottl once said that all libraries > should include phantom types... Well, I don't know in what context I said this, but one can certainly always create modules that are equivalent to another one with the exception of some abstract type taking an extra type parameter. This extra parameter could then capture additional constraints on the use of values of that type. In that sense it is not necessary that every library make use of phantom types. Regards, Markus -- Markus Mottl http://www.ocaml.info markus.mottl@gmail.com ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Caml-list] Weak hash table for attaching extra data to an object 2007-08-14 20:44 ` Jon Harrop 2007-08-14 23:09 ` Stefano Zacchiroli @ 2007-08-15 1:28 ` skaller 2007-08-15 19:04 ` Richard Jones 2 siblings, 0 replies; 15+ messages in thread From: skaller @ 2007-08-15 1:28 UTC (permalink / raw) To: Jon Harrop; +Cc: caml-list On Tue, 2007-08-14 at 21:44 +0100, Jon Harrop wrote: > You can also parameterize the type over metadata: > > type 'a expr = > | Int of int > | Var of string > | Seq of 'a seq > and 'a seq = {h: 'a expr; t: 'a expr array; meta: 'a};; > > Now you can insert arbitrary metadata into Seq nodes. This is not a precise description. You cannot put arbitrary meta-data into the Seq nodes with this design, for a given data structure, you can choose just one type of meta-data: this is the same as for example a Hashtbl. If you wish to *dynamically* vary the meta data, which is often the case when you have a tool like a compiler that performs phased analysis, a property list based on polymorphic variants can be useful. This is one of three ways I can think of to vary the data type dynamically across compilation unit boundaries. A second way is to use mutable field of a class type and change the object to a different implementation of the same abstraction. The third way is quite tricky and equivalent to the first: use a function unit -> unit as the type, and execute it to throw an exception. This is basically dynamic typing. -- John Skaller <skaller at users dot sf dot net> Felix, successor to C++: http://felix.sf.net ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Caml-list] Weak hash table for attaching extra data to an object 2007-08-14 20:44 ` Jon Harrop 2007-08-14 23:09 ` Stefano Zacchiroli 2007-08-15 1:28 ` skaller @ 2007-08-15 19:04 ` Richard Jones 2007-08-16 16:17 ` Some observations and measurements with using Weak Hashtable to annotate a tree (was: Re: [Caml-list] Weak hash table for attaching extra data to an object) Richard Jones 2 siblings, 1 reply; 15+ messages in thread From: Richard Jones @ 2007-08-15 19:04 UTC (permalink / raw) To: caml-list On Tue, Aug 14, 2007 at 09:44:15PM +0100, Jon Harrop wrote: > You can also parameterize the type over metadata: > > type 'a expr = > | Int of int > | Var of string > | Seq of 'a seq > and 'a seq = {h: 'a expr; t: 'a expr array; meta: 'a};; > > Now you can insert arbitrary metadata into Seq nodes. This is basically what PXP does, and what my own PG'OCaml does. Unfortunately it's not very usable for a few practical reasons, as I shall describe. Firstly it only allows a single annotation. You can use a tuple to get around that, but that requires the annotators to cooperate somehow, which is not much good if you're combining code from several places. Secondly if you don't want to annotate, you need to manually cast the type to 'unit t' somewhere in your main program (and that will fail if one of the libraries you're using wants to annotate the type). Also, can you safely annotate the nodes one way in one part of your program, and other way in another part (eg. if you're transforming a tree with annotations, in stages)? Using variants isn't completely safe (but then neither are WHTs in this instance because you can not annotate a node and WHTs don't catch this statically). > If your metadata are sporadically used or, in particular, only sparsely > present or you do not have access to the library then the weak hash table > approach is probably the way to go. I should have said :-) I'm trying to annotate the <img> & <applet> nodes in a typical HTML document, so they are both sparse (compared to all nodes), but there are many of them. Rich. -- Richard Jones Red Hat ^ permalink raw reply [flat|nested] 15+ messages in thread
* Some observations and measurements with using Weak Hashtable to annotate a tree (was: Re: [Caml-list] Weak hash table for attaching extra data to an object) 2007-08-15 19:04 ` Richard Jones @ 2007-08-16 16:17 ` Richard Jones 0 siblings, 0 replies; 15+ messages in thread From: Richard Jones @ 2007-08-16 16:17 UTC (permalink / raw) To: caml-list On Wed, Aug 15, 2007 at 08:04:55PM +0100, Richard Jones wrote: > I should have said :-) I'm trying to annotate the <img> & <applet> > nodes in a typical HTML document, so they are both sparse (compared to > all nodes), but there are many of them. I did some measurements using my WeakMetadata module to annotate a large random HTML document tree. Here are some observations and results, in no particular order. Code: http://annexia.org/tmp/weakMetadata.ml http://annexia.org/tmp/weakMetadata.mli http://annexia.org/tmp/test_weakmetadata.ml Firstly, it works, but I found a few programming pitfalls. In one case I was accidentally copying my tree nodes when annotating them, so I was annotating the copies rather than the nodes themselves. The code was along these lines: | Element (a,b,c,d) -> annotate (a,b,c,d) instead of: | Element ((_,_,_,_) as e) -> annotate e Obviously experienced OCaml programmers wouldn't make this mistake, right?! Another programming problem was that I ended up having to assign a unique identifier to each node which was used to do the hash (otherwise the hash function would have to do a costly recursive descent to determine if two nodes are equal). This is a peculiarity of the Weak hash table itself, namely that physical equality cannot be used. Another programming problem is that the choice of the initial size of the hash table determines performance factors in rather non-obvious ways. Allocating the hash table with 'create 13' causes buckets to have long chains, but that creates much lower memory overhead. Allocating the hash table with 'create 100003' gives short chains and high memory overhead. (Exact numbers below). I wrote a program to create a document (tree) with approximately 560,000 elements and approximately 840,000 text nodes. I annotated every element in the document to get some idea of the overhead. As mentioned before, choice of initial hash table size affects the length of the bucket chain: Initial hash table size: 13 Median chain length per bucket: 33 Overhead per element annotated: 21 bytes (on a 64 bit machine) Initial hash table size: 100003 Median chain length per bucket: 6 Overhead per element annotated: 40 bytes (on a 64 bit machine) Approximately 210,000 elements were <img> or <input> elements (15% of the total nodes) and I am particularly interested in just annotating these nodes. The overhead for annotating sparse nodes like this is similar (again with chain length dependent a lot on initial choice of hash table size): Initial hash table size: 13 Median chain length per bucket: 27 Overhead per element annotated: 26 bytes (on a 64 bit machine) Initial hash table size: 100003 Median chain length per bucket: 3 Overhead per element annotated: 70 bytes (on a 64 bit machine) You have to manually compact the hash table in order to actually regain the memory used by your metadata annotations. (Alternately, wait for the hash table as a whole to become unreachable, or reuse the hash table which causes old annotations to be overwritten). Internally, the Weak module stores a linked list of weak arrays (caml_weak_list_head). The GC has a separate phase to deal with weak arrays, and as far as I can tell from the code the amount of work it will do in each slice is limited. The GC also has special code to remove unreachable weak arrays, so this list does not just grow without bound. However I'm still unclear on the exact O(..) for using huge numbers of weak arrays, which is what the weak hash table does once you start annotating large numbers of nodes. Rich. -- Richard Jones Red Hat ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Caml-list] Weak hash table for attaching extra data to an object 2007-08-14 10:15 Weak hash table for attaching extra data to an object Richard Jones 2007-08-14 11:08 ` [Caml-list] " Thomas Fischbacher @ 2007-08-14 16:22 ` Richard Jones 2007-08-14 23:24 ` Till Varoquaux 2007-08-14 23:35 ` Fernando Alegre 1 sibling, 2 replies; 15+ messages in thread From: Richard Jones @ 2007-08-14 16:22 UTC (permalink / raw) To: caml-list [-- Attachment #1: Type: text/plain, Size: 669 bytes --] Attached is my attempt to make a WeakMetadata module (a weak hash table where only the key is weak and the value is used to store metadata about the key). It is derived from Remi Vanicat's Hweak module, which is itself derived from the weak set in stdlib. Unfortunately it relies on the following being safe. I have no idea if this is safe or not. Array.create 0 (Obj.magic ()) Also it doesn't clean up the extra data (metadata) very aggressively. It basically waits for it to get overwritten when further bindings are added to a bucket. So if the metadata is very large it would be worth adding some sort of "compact" method. Rich. -- Richard Jones Red Hat [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: weakMetadata.ml --] [-- Type: text/plain; charset=utf-8, Size: 7104 bytes --] (***********************************************************************) (* *) (* HWeak *) (* *) (* Remi Vanicat *) (* *) (* Copyright 2002 Rémi Vanicat *) (* All rights reserved. This file is distributed under the terms of *) (* the GNU Library General Public License, with the special exception *) (* on linking described in the LICENCE file of the Objective Caml *) (* distribution *) (* *) (* Most of this file is an adptation of the implentation of Weak *) (* Hastable by Damien Doligez that can be found into Objective Caml *) (* which is Copyright 1997 Institut National de Recherche en *) (* Informatique et en Automatique and is distributed under the same *) (* licence *) (* *) (***********************************************************************) (* RWMJ: Modified so that only the key is weak. *) (** Weak array operations *) (** Weak hash tables *) module type S = sig type key type 'a t val create : int -> 'a t val clear : 'a t -> unit val add : 'a t -> key -> 'a -> unit val replace : 'a t -> key -> 'a -> unit val remove : 'a t -> key -> unit val merge : 'a t -> key -> 'a -> 'a val find : 'a t -> key -> 'a val find_all : 'a t -> key -> 'a list val mem : 'a t -> key -> bool val iter : (key -> 'a -> unit) -> 'a t -> unit val fold : (key -> 'a -> 'b -> 'b) -> 'a t -> 'b -> 'b val count : 'a t -> int val stats : 'a t -> int * int * int * int * int * int end;; module Make (H : Hashtbl.HashedType) : (S with type key = H.t) = struct type key = H.t;; let emptykeybucket : key Weak.t = Weak.create 0;; type 'a t = { emptydatabucket : 'a array; mutable table : (key Weak.t * 'a array) array; mutable totsize : int; (* sum of the bucket sizes *) mutable limit : int; (* max ratio totsize/table length *) };; let get_index t d = (H.hash d land max_int) mod (Array.length t.table);; let create sz = let sz = if sz < 7 then 7 else sz in let sz = if sz > Sys.max_array_length then Sys.max_array_length else sz in let edb = Array.create 0 (Obj.magic ()) in { emptydatabucket = edb; table = Array.create sz (emptykeybucket, edb); totsize = 0; limit = 3; };; let clear t = for i = 0 to Array.length t.table - 1 do t.table.(i) <- (emptykeybucket, t.emptydatabucket); done; t.totsize <- 0; t.limit <- 3; ;; let fold f t init = let rec fold_bucket i ((b1, b2) as cpl) accu = if i >= Weak.length b1 then accu else match (Weak.get b1 i, Array.get b2 i) with | (Some v1, v2) -> fold_bucket (i+1) cpl (f v1 v2 accu) | _ -> fold_bucket (i+1) cpl accu in Array.fold_right (fold_bucket 0) t.table init ;; let iter f t = fold (fun d1 d2 () -> f d1 d2) t ();; let count t = let rec count_bucket i ((b1, b2) as cpl) accu = if i >= Weak.length b1 then accu else count_bucket (i+1) cpl (accu + (if Weak.check b1 i then 1 else 0)) in Array.fold_right (count_bucket 0) t.table 0 ;; let next_sz n = min (3*n/2 + 3) (Sys.max_array_length - 1);; let rec resize t = let oldlen = Array.length t.table in let newlen = next_sz oldlen in if newlen > oldlen then begin let newt = create newlen in newt.limit <- t.limit + 100; (* prevent resizing of newt *) fold (fun f t () -> add newt f t) t (); (* assert Array.length newt.table = newlen; *) t.table <- newt.table; t.limit <- t.limit + 2; end and add_aux t k e index = let bucket1, bucket2 = t.table.(index) in let sz = Weak.length bucket1 in let rec loop i = if i >= sz then begin let newsz = min (sz + 3) (Sys.max_array_length - 1) in if newsz <= sz then failwith "Weak.Make : hash bucket cannot grow more"; let newbucket1 = Weak.create newsz and newbucket2 = Array.create newsz e in Weak.blit bucket1 0 newbucket1 0 sz; Array.blit bucket2 0 newbucket2 0 sz; Weak.set newbucket1 i (Some k); (*Array.set newbucket2 i e; -- implied by Array.create above*) t.table.(index) <- (newbucket1, newbucket2); t.totsize <- t.totsize + (newsz - sz); if t.totsize > t.limit * Array.length t.table then resize t; end else begin if Weak.check bucket1 i then loop (i+1) else begin Weak.set bucket1 i (Some k); Array.set bucket2 i e; end end in loop 0; and add t k e = add_aux t k e (get_index t k) ;; let find_or t d ifnotfound = let index = get_index t d in let (bucket1, bucket2) = t.table.(index) in let sz = Weak.length bucket1 in let rec loop i = if i >= sz then ifnotfound index else begin match Weak.get_copy bucket1 i with | Some v when H.equal v d -> Array.get bucket2 i | _ -> loop (i+1) end in loop 0 ;; let merge t k d = find_or t k (fun index -> add_aux t k d index; d);; let find t d = find_or t d (fun index -> raise Not_found);; let find_shadow t d iffound ifnotfound = let index = get_index t d in let (bucket1, bucket2) = t.table.(index) in let sz = Weak.length bucket1 in let rec loop i = if i >= sz then ifnotfound else begin match Weak.get_copy bucket1 i with | Some v when H.equal v d -> iffound bucket1 bucket2 i | _ -> loop (i+1) end in loop 0 ;; let replace t k d = if (find_shadow t k (fun w1 w2 i -> Weak.set w1 i (Some k); Array.set w2 i d; false ) true) then add t k d let remove t d = find_shadow t d (fun w1 w2 i -> Weak.set w1 i None; (*Array.set w2 i ? -- leave it, it'll be overwritten*)) () let mem t d = find_shadow t d (fun _ _ i -> true) false let find_all t d = let index = get_index t d in let (bucket1, bucket2) = t.table.(index) in let sz = Weak.length bucket1 in let rec loop i accu = if i >= sz then accu else begin match Weak.get_copy bucket1 i with | Some v when H.equal v d -> let v = Array.get bucket2 i in loop (i+1) (v::accu) | _ -> loop (i+1) accu end in loop 0 [] ;; let stats t = let len = Array.length t.table in let lens = Array.map (fun (b,_) -> Weak.length b) t.table in Array.sort compare lens; let totlen = Array.fold_left ( + ) 0 lens in (len, count t, totlen, lens.(0), lens.(len/2), lens.(len-1)) ;; end;; ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Caml-list] Weak hash table for attaching extra data to an object 2007-08-14 16:22 ` [Caml-list] Weak hash table for attaching extra data to an object Richard Jones @ 2007-08-14 23:24 ` Till Varoquaux 2007-08-14 23:35 ` Fernando Alegre 1 sibling, 0 replies; 15+ messages in thread From: Till Varoquaux @ 2007-08-14 23:24 UTC (permalink / raw) To: Richard Jones; +Cc: caml-list On 8/14/07, Richard Jones <rich@annexia.org> wrote: > Attached is my attempt to make a WeakMetadata module (a weak hash > table where only the key is weak and the value is used to store > metadata about the key). It is derived from Remi Vanicat's Hweak > module, which is itself derived from the weak set in stdlib. > > Unfortunately it relies on the following being safe. I have no idea > if this is safe or not. > > Array.create 0 (Obj.magic ()) You seem to be asking for trouble here. From what I read in your code this *should* work. It is however highly dependent on the garbage collector implementation: the first bit contains tells it whether this is a block or not. I would recommend using options instead (and again you might just use my library but that shameless self promotion). Using options will have a performance impact. You can can see your solution as a clever (yet brittle) optimization hack. However if you are hellbent on performances you might also consider having direct access to your keys (fore quality) and collisions stored in another table. Regards, Till > Also it doesn't clean up the extra data (metadata) very aggressively. > It basically waits for it to get overwritten when further bindings are > added to a bucket. So if the metadata is very large it would be worth > adding some sort of "compact" method. > > Rich. > > -- > Richard Jones > Red Hat > > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > > > -- http://till-varoquaux.blogspot.com/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Caml-list] Weak hash table for attaching extra data to an object 2007-08-14 16:22 ` [Caml-list] Weak hash table for attaching extra data to an object Richard Jones 2007-08-14 23:24 ` Till Varoquaux @ 2007-08-14 23:35 ` Fernando Alegre 2007-08-15 7:59 ` Richard Jones 1 sibling, 1 reply; 15+ messages in thread From: Fernando Alegre @ 2007-08-14 23:35 UTC (permalink / raw) To: Richard Jones; +Cc: caml-list On Tue, Aug 14, 2007 at 05:22:06PM +0100, Richard Jones wrote: > if this is safe or not. > > Array.create 0 (Obj.magic ()) > I wonder if this the same as just [||] Fernando ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Caml-list] Weak hash table for attaching extra data to an object 2007-08-14 23:35 ` Fernando Alegre @ 2007-08-15 7:59 ` Richard Jones 0 siblings, 0 replies; 15+ messages in thread From: Richard Jones @ 2007-08-15 7:59 UTC (permalink / raw) To: Fernando Alegre; +Cc: caml-list On Tue, Aug 14, 2007 at 06:35:40PM -0500, Fernando Alegre wrote: > On Tue, Aug 14, 2007 at 05:22:06PM +0100, Richard Jones wrote: > > > if this is safe or not. > > > > Array.create 0 (Obj.magic ()) > > > > I wonder if this the same as just [||] Yes it is. I was looking so long at the details that I didn't see the obvious answer. Rich. -- Richard Jones Red Hat ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2007-08-16 16:17 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2007-08-14 10:15 Weak hash table for attaching extra data to an object Richard Jones 2007-08-14 11:08 ` [Caml-list] " Thomas Fischbacher 2007-08-14 11:48 ` Till Varoquaux 2007-08-14 20:44 ` Jon Harrop 2007-08-14 23:09 ` Stefano Zacchiroli 2007-08-15 0:33 ` Jon Harrop 2007-08-15 12:33 ` Daniel Bünzli 2007-08-16 14:54 ` Markus Mottl 2007-08-15 1:28 ` skaller 2007-08-15 19:04 ` Richard Jones 2007-08-16 16:17 ` Some observations and measurements with using Weak Hashtable to annotate a tree (was: Re: [Caml-list] Weak hash table for attaching extra data to an object) Richard Jones 2007-08-14 16:22 ` [Caml-list] Weak hash table for attaching extra data to an object Richard Jones 2007-08-14 23:24 ` Till Varoquaux 2007-08-14 23:35 ` Fernando Alegre 2007-08-15 7:59 ` Richard Jones
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).