caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Weak hash table for attaching extra data to an object
@ 2007-08-14 10:15 Richard Jones
  2007-08-14 11:08 ` [Caml-list] " Thomas Fischbacher
  2007-08-14 16:22 ` [Caml-list] Weak hash table for attaching extra data to an object Richard Jones
  0 siblings, 2 replies; 15+ messages in thread
From: Richard Jones @ 2007-08-14 10:15 UTC (permalink / raw)
  To: caml-list

Problem:

I have a module which I can't edit[1].  This module, let's call it
Document, stores a representation of an XML tree in internal node
objects:

  type intnode

What I want to do is -- completely outside Document -- attach my own
extra information to these nodes.  The key point is that if the
garbage collector collects an intnode, then I want my extra
information to be garbage collected too.

My proposed solution would use a weak hash table similar to / derived
from this one: http://remi.vanicat.free.fr/ocaml/hweak/.  I would
store in this hash table mappings from (weak) intnode pointers to my
extra data structure.

Remi's Hweak module isn't quite what I need because the data is also
weak.  Since nothing except the hash table itself is tracking my extra
data, the GC can potentially drop all of the bindings, and that's not
much help.

I tried to modify Hweak so that the data is not weak, but that proved
to be quite hard.

I wonder if anyone has done this, or can suggest a better way to solve
this problem?  The issue of attaching extra data to some existing
object would seem to be a fairly common one ...

Rich.

[1] By "can't edit" I really mean "don't want to modify".  I want
Document to be general purpose.  Also, unlike PXP, I want multiple
callers to be able to individually attach data to nodes.

-- 
Richard Jones
Red Hat


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Caml-list] Weak hash table for attaching extra data to an object
  2007-08-14 10:15 Weak hash table for attaching extra data to an object Richard Jones
@ 2007-08-14 11:08 ` Thomas Fischbacher
  2007-08-14 11:48   ` Till Varoquaux
  2007-08-14 20:44   ` Jon Harrop
  2007-08-14 16:22 ` [Caml-list] Weak hash table for attaching extra data to an object Richard Jones
  1 sibling, 2 replies; 15+ messages in thread
From: Thomas Fischbacher @ 2007-08-14 11:08 UTC (permalink / raw)
  To: Richard Jones; +Cc: caml-list


Richard Jones wrote:

> I have a module which I can't edit[1].  This module, let's call it
> Document, stores a representation of an XML tree in internal node
> objects:
> 
>   type intnode
> 
> What I want to do is -- completely outside Document -- attach my own
> extra information to these nodes. 

Ah, so the situation really is somewhat common, and not something
just I am running into occasionally. (The problem in particular
also arises occasionally when writing symbolic algebra code when one
wants to attach meta-data (e.g. on typesetting) to terms without
touching the underlying term representations in any way.)

> The key point is that if the
> garbage collector collects an intnode, then I want my extra
> information to be garbage collected too.
> 
> I wonder if anyone has done this, or can suggest a better way to solve
> this problem?  The issue of attaching extra data to some existing
> object would seem to be a fairly common one ...

You might be able to "fake" a weak hash table by using a trick: for a
key-value entry k1 -> v1, you take k1 and v1 and rebuild the outermost
data structure to get k2 with k2 = k1 and k2 != k1 (oh hey, now that
looks confusing, right?), as well as v2 = v1 and v2 != v1. Now, you
enter k2 -> v2 into an ordinary hash table and register finalizers for
k1 and v1 which remove this k2 -> v2 entry from your hash. That might
do.

-- 
best regards,
Thomas Fischbacher
tf@functionality.de


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Caml-list] Weak hash table for attaching extra data to an object
  2007-08-14 11:08 ` [Caml-list] " Thomas Fischbacher
@ 2007-08-14 11:48   ` Till Varoquaux
  2007-08-14 20:44   ` Jon Harrop
  1 sibling, 0 replies; 15+ messages in thread
From: Till Varoquaux @ 2007-08-14 11:48 UTC (permalink / raw)
  To: Richard Jones; +Cc: caml-list

I have a small library that could do the trick. It is built a a
hashtable whose keys  are weak, you need to recompact it from time to
time in order to free associated values. One strategy would be to
recompact when n values are freed.
I tried to limit the performance cost by storing collision values in a
separate table.
The code can be browsed online at:
http://till.varoquaux.free.fr/projects/mini_js/WeakHt.ml.html
There's a very small dependency on another file, i use Option.get
(defined as in extlib:
  let get = function
   | Some v -> v
   | None -> raise No_value
)
It is licensed under lgpl+special linking clause. If you need a more
permissive license let me know.
This can be downloaded via darcs:
darcs get http://till.varoquaux.free.fr/projects/mini_js/WeakHt.ml.html
or directly at:
http://till.varoquaux.free.fr/projects/mini_js/WeakHt.ml
There no mli file (I'm lazy).
Constructive criticism is welcome.
If you need automatic collection of values and have a hard time
implementing I can work on it.
Cheers,
Till
P.S.: I wasn't aware of HWeak when I did this library, I don't know
how it compares performance wise. Having never had to profile this
library I actually don't know how bad it really is. This can certainly
also be worked on.
On 8/14/07, Thomas Fischbacher <tf@functionality.de> wrote:
>
> Richard Jones wrote:
>
> > I have a module which I can't edit[1].  This module, let's call it
> > Document, stores a representation of an XML tree in internal node
> > objects:
> >
> >   type intnode
> >
> > What I want to do is -- completely outside Document -- attach my own
> > extra information to these nodes.
>
> Ah, so the situation really is somewhat common, and not something
> just I am running into occasionally. (The problem in particular
> also arises occasionally when writing symbolic algebra code when one
> wants to attach meta-data (e.g. on typesetting) to terms without
> touching the underlying term representations in any way.)
>
> > The key point is that if the
> > garbage collector collects an intnode, then I want my extra
> > information to be garbage collected too.
> >
> > I wonder if anyone has done this, or can suggest a better way to solve
> > this problem?  The issue of attaching extra data to some existing
> > object would seem to be a fairly common one ...
>
> You might be able to "fake" a weak hash table by using a trick: for a
> key-value entry k1 -> v1, you take k1 and v1 and rebuild the outermost
> data structure to get k2 with k2 = k1 and k2 != k1 (oh hey, now that
> looks confusing, right?), as well as v2 = v1 and v2 != v1. Now, you
> enter k2 -> v2 into an ordinary hash table and register finalizers for
> k1 and v1 which remove this k2 -> v2 entry from your hash. That might
> do.
>
> --
> best regards,
> Thomas Fischbacher
> tf@functionality.de
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>


-- 
http://till-varoquaux.blogspot.com/


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Caml-list] Weak hash table for attaching extra data to an object
  2007-08-14 10:15 Weak hash table for attaching extra data to an object Richard Jones
  2007-08-14 11:08 ` [Caml-list] " Thomas Fischbacher
@ 2007-08-14 16:22 ` Richard Jones
  2007-08-14 23:24   ` Till Varoquaux
  2007-08-14 23:35   ` Fernando Alegre
  1 sibling, 2 replies; 15+ messages in thread
From: Richard Jones @ 2007-08-14 16:22 UTC (permalink / raw)
  To: caml-list

[-- Attachment #1: Type: text/plain, Size: 669 bytes --]

Attached is my attempt to make a WeakMetadata module (a weak hash
table where only the key is weak and the value is used to store
metadata about the key).  It is derived from Remi Vanicat's Hweak
module, which is itself derived from the weak set in stdlib.

Unfortunately it relies on the following being safe.  I have no idea
if this is safe or not.

  Array.create 0 (Obj.magic ())

Also it doesn't clean up the extra data (metadata) very aggressively.
It basically waits for it to get overwritten when further bindings are
added to a bucket.  So if the metadata is very large it would be worth
adding some sort of "compact" method.

Rich.

-- 
Richard Jones
Red Hat

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: weakMetadata.ml --]
[-- Type: text/plain; charset=utf-8, Size: 7104 bytes --]

(***********************************************************************)
(*                                                                     *)
(*                           HWeak                                     *)
(*                                                                     *)
(*                        Remi Vanicat                                 *)
(*                                                                     *)
(*  Copyright 2002 Rémi Vanicat                                        *)
(*  All rights reserved.  This file is distributed under the terms of  *)
(*  the GNU Library General Public License, with the special exception *)
(*  on linking described in the LICENCE file of the Objective Caml     *)
(*  distribution                                                       *)
(*                                                                     *)
(*  Most of this file is an adptation of the implentation of Weak      *)
(*  Hastable by Damien Doligez that can be found into Objective Caml   *)
(*  which is Copyright 1997 Institut National de Recherche en          *)
(*  Informatique et en Automatique and is distributed under the same   *)
(*  licence                                                            *)
(*                                                                     *)
(***********************************************************************)

(* RWMJ: Modified so that only the key is weak. *)

(** Weak array operations *)

(** Weak hash tables *)

module type S = sig
  type key
  type 'a t
  val create : int -> 'a t
  val clear : 'a t -> unit
  val add : 'a t -> key -> 'a -> unit
  val replace : 'a t -> key -> 'a -> unit
  val remove : 'a t -> key -> unit
  val merge : 'a t -> key -> 'a -> 'a
  val find : 'a t -> key -> 'a
  val find_all : 'a t -> key -> 'a list
  val mem : 'a t -> key -> bool
  val iter : (key -> 'a -> unit) -> 'a t -> unit
  val fold : (key -> 'a -> 'b -> 'b) -> 'a t -> 'b -> 'b
  val count : 'a t -> int
  val stats : 'a t -> int * int * int * int * int * int
end;;

module Make (H : Hashtbl.HashedType) : (S with type key = H.t) = struct

  type key = H.t;;

  let emptykeybucket : key Weak.t = Weak.create 0;;

  type 'a t = {
    emptydatabucket : 'a array;
    mutable table : (key Weak.t * 'a array) array;
    mutable totsize : int;             (* sum of the bucket sizes *)
    mutable limit : int;               (* max ratio totsize/table length *)
  };;

  let get_index t d = (H.hash d land max_int) mod (Array.length t.table);;

  let create sz =
    let sz = if sz < 7 then 7 else sz in
    let sz = if sz > Sys.max_array_length then Sys.max_array_length else sz in
    let edb = Array.create 0 (Obj.magic ()) in
    {
      emptydatabucket = edb;
      table = Array.create sz (emptykeybucket, edb);
      totsize = 0;
      limit = 3;
    };;

  let clear t =
    for i = 0 to Array.length t.table - 1 do
      t.table.(i) <- (emptykeybucket, t.emptydatabucket);
    done;
    t.totsize <- 0;
    t.limit <- 3;
  ;;

  let fold f t init =
    let rec fold_bucket i ((b1, b2) as cpl) accu =
      if i >= Weak.length b1 then accu else
      match (Weak.get b1 i, Array.get b2 i) with
      | (Some v1, v2) -> fold_bucket (i+1) cpl (f v1 v2 accu)
      | _ -> fold_bucket (i+1) cpl accu
    in
    Array.fold_right (fold_bucket 0) t.table init
  ;;

  let iter f t = fold (fun d1 d2 () -> f d1 d2) t ();;

  let count t =
    let rec count_bucket i ((b1, b2) as cpl) accu =
      if i >= Weak.length b1 then accu else
      count_bucket (i+1) cpl (accu + (if Weak.check b1 i then 1 else 0))
    in
    Array.fold_right (count_bucket 0) t.table 0
  ;;

  let next_sz n = min (3*n/2 + 3) (Sys.max_array_length - 1);;

  let rec resize t =
    let oldlen = Array.length t.table in
    let newlen = next_sz oldlen in
    if newlen > oldlen then begin
      let newt = create newlen in
      newt.limit <- t.limit + 100;          (* prevent resizing of newt *)
      fold (fun f t () -> add newt f t) t ();
   (* assert Array.length newt.table = newlen; *)
      t.table <- newt.table;
      t.limit <- t.limit + 2;
    end

  and add_aux t k e index =
    let bucket1, bucket2 = t.table.(index) in
    let sz = Weak.length bucket1 in
    let rec loop i =
      if i >= sz then begin
        let newsz = min (sz + 3) (Sys.max_array_length - 1) in
        if newsz <= sz then failwith "Weak.Make : hash bucket cannot grow more";
        let newbucket1 = Weak.create newsz 
	and newbucket2 = Array.create newsz e in
        Weak.blit bucket1 0 newbucket1 0 sz;
        Array.blit bucket2 0 newbucket2 0 sz;
        Weak.set newbucket1 i (Some k);
	(*Array.set newbucket2 i e; -- implied by Array.create above*)
        t.table.(index) <- (newbucket1, newbucket2);
        t.totsize <- t.totsize + (newsz - sz);
        if t.totsize > t.limit * Array.length t.table then resize t;
      end else begin
        if Weak.check bucket1 i
        then loop (i+1)
        else 
	  begin
	    Weak.set bucket1 i (Some k);
	    Array.set bucket2 i e;
	  end
      end
    in
    loop 0;

  and add t k e = add_aux t k e (get_index t k)
  ;;

  let find_or t d ifnotfound =
    let index = get_index t d in
    let (bucket1, bucket2) = t.table.(index) in
    let sz = Weak.length bucket1 in
    let rec loop i =
      if i >= sz then ifnotfound index
      else begin
        match Weak.get_copy bucket1 i with
        | Some v when H.equal v d -> Array.get bucket2 i
        | _ -> loop (i+1)
      end
    in
    loop 0
  ;;

  let merge t k d = find_or t k (fun index -> add_aux t k d index; d);;

  let find t d = find_or t d (fun index -> raise Not_found);;

  let find_shadow t d iffound ifnotfound =
    let index = get_index t d in
    let (bucket1, bucket2) = t.table.(index) in
    let sz = Weak.length bucket1 in
    let rec loop i =
      if i >= sz then ifnotfound else begin
        match Weak.get_copy bucket1 i with
        | Some v when H.equal v d
	    -> iffound bucket1 bucket2 i
        | _ -> loop (i+1)
      end
    in
    loop 0
  ;;

  let replace t k d = 
    if (find_shadow t k 
	  (fun w1 w2 i -> 
	     Weak.set w1 i (Some k); Array.set w2 i d; false ) true) 
    then
      add t k d

  let remove t d = find_shadow t d
		     (fun w1 w2 i ->
			Weak.set w1 i None;
			(*Array.set w2 i ? -- leave it, it'll be overwritten*))
		     ()

  let mem t d = find_shadow t d (fun _ _ i -> true) false

  let find_all t d =
    let index = get_index t d in
    let (bucket1, bucket2) = t.table.(index) in
    let sz = Weak.length bucket1 in
    let rec loop i accu =
      if i >= sz then accu
      else begin
        match Weak.get_copy bucket1 i with
        | Some v when H.equal v d
	    -> let v = Array.get bucket2 i in loop (i+1) (v::accu)
        | _ -> loop (i+1) accu
      end
    in
    loop 0 []
  ;;

  let stats t =
    let len = Array.length t.table in
    let lens = Array.map (fun (b,_) -> Weak.length b) t.table in
    Array.sort compare lens;
    let totlen = Array.fold_left ( + ) 0 lens in
    (len, count t, totlen, lens.(0), lens.(len/2), lens.(len-1))
  ;;

end;;

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Caml-list] Weak hash table for attaching extra data to an object
  2007-08-14 11:08 ` [Caml-list] " Thomas Fischbacher
  2007-08-14 11:48   ` Till Varoquaux
@ 2007-08-14 20:44   ` Jon Harrop
  2007-08-14 23:09     ` Stefano Zacchiroli
                       ` (2 more replies)
  1 sibling, 3 replies; 15+ messages in thread
From: Jon Harrop @ 2007-08-14 20:44 UTC (permalink / raw)
  To: caml-list

On Tuesday 14 August 2007 12:08:11 Thomas Fischbacher wrote:
> Richard Jones wrote:
> > I have a module which I can't edit[1].  This module, let's call it
> > Document, stores a representation of an XML tree in internal node
> > objects:
> >
> >   type intnode
> >
> > What I want to do is -- completely outside Document -- attach my own
> > extra information to these nodes.
>
> Ah, so the situation really is somewhat common, and not something
> just I am running into occasionally. (The problem in particular
> also arises occasionally when writing symbolic algebra code when one
> wants to attach meta-data (e.g. on typesetting) to terms without
> touching the underlying term representations in any way.)

I think this is an important design choice that people should be aware of when 
they first write the code. If you think the product types inside your expr 
type will be extended:

  type expr =
    | Int of int
    | Var of string
    | Seq of expr * expr array;;

with pattern matches like this:

  Seq(h, t) -> ...

Then a good first step is to turn n-ary constructor arguments into a single 
record:

  type expr =
    | Int of int
    | Var of string
    | Seq of seq
  and seq = {h: expr; t: expr array};;

Then you write your pattern matches like this:

  Seq{h=h; t=t} -> ...

and you are free to add more fields to the record without having to update 
such pattern matches. You also need to replace direct construction with an 
explicit constructor function:

  let seq h t = Seq{h=h; t=t}

You can also parameterize the type over metadata:

  type 'a expr =
    | Int of int
    | Var of string
    | Seq of 'a seq
  and 'a seq = {h: 'a expr; t: 'a expr array; meta: 'a};;

Now you can insert arbitrary metadata into Seq nodes.

Another possible solution (at design time) is to "untie the recursive knot" of 
the data structure:

  type 'expr aexpr =
    | Int of int
    | Var of string
    | Seq of 'expr * 'expr array;;

This allows you to bind data structures together and, in particular, lets you 
inject metadata nodes between levels of exprs in the tree:

  type 'a expr = Meta of 'a * 'a expr aexpr;;

(This can be done more efficiently using -rectypes)

If your metadata are heavily used (e.g. hashing or culling information) and 
you are designing your own data structure then these approaches can all be 
useful (as well as polymorphic variants, that allow other forms of extension) 
because these approaches all result in fast code.

If your metadata are sporadically used or, in particular, only sparsely 
present or you do not have access to the library then the weak hash table 
approach is probably the way to go.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Caml-list] Weak hash table for attaching extra data to an object
  2007-08-14 20:44   ` Jon Harrop
@ 2007-08-14 23:09     ` Stefano Zacchiroli
  2007-08-15  0:33       ` Jon Harrop
  2007-08-15  1:28     ` skaller
  2007-08-15 19:04     ` Richard Jones
  2 siblings, 1 reply; 15+ messages in thread
From: Stefano Zacchiroli @ 2007-08-14 23:09 UTC (permalink / raw)
  To: caml-list

On Tue, Aug 14, 2007 at 09:44:15PM +0100, Jon Harrop wrote:
> I think this is an important design choice that people should be aware of when 
> they first write the code. If you think the product types inside your expr 

Richard's point was precisely about *not* being able to change such a
design choice. There can be a lot of reasons for that, for example (even
though this was not Richard's situation AFAICT) not being the author of
the data structure you want to decorate.

IOW: the original question is still valid.

-- 
Stefano Zacchiroli -*- PhD in Computer Science ............... now what?
zack@{cs.unibo.it,debian.org,bononia.it} -%- http://www.bononia.it/zack/
(15:56:48)  Zack: e la demo dema ?    /\    All one has to do is hit the
(15:57:15)  Bac: no, la demo scema    \/    right keys at the right time


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Caml-list] Weak hash table for attaching extra data to an object
  2007-08-14 16:22 ` [Caml-list] Weak hash table for attaching extra data to an object Richard Jones
@ 2007-08-14 23:24   ` Till Varoquaux
  2007-08-14 23:35   ` Fernando Alegre
  1 sibling, 0 replies; 15+ messages in thread
From: Till Varoquaux @ 2007-08-14 23:24 UTC (permalink / raw)
  To: Richard Jones; +Cc: caml-list

On 8/14/07, Richard Jones <rich@annexia.org> wrote:
> Attached is my attempt to make a WeakMetadata module (a weak hash
> table where only the key is weak and the value is used to store
> metadata about the key).  It is derived from Remi Vanicat's Hweak
> module, which is itself derived from the weak set in stdlib.
>
> Unfortunately it relies on the following being safe.  I have no idea
> if this is safe or not.
>
>   Array.create 0 (Obj.magic ())
You seem to be asking for trouble here. From what I read in your code
this *should* work. It is however highly dependent on the garbage
collector implementation: the first bit contains tells it whether this
is a block or not.

I would recommend using options instead (and again you might just use
my library but that shameless self promotion). Using options will have
a performance impact. You can can see your solution as a clever (yet
brittle) optimization hack.
However if you are hellbent on performances you might also consider
having direct access to your keys (fore quality) and collisions stored
in another table.

Regards,
Till
> Also it doesn't clean up the extra data (metadata) very aggressively.
> It basically waits for it to get overwritten when further bindings are
> added to a bucket.  So if the metadata is very large it would be worth
> adding some sort of "compact" method.
>
> Rich.
>
> --
> Richard Jones
> Red Hat
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
>
>


-- 
http://till-varoquaux.blogspot.com/


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Caml-list] Weak hash table for attaching extra data to an object
  2007-08-14 16:22 ` [Caml-list] Weak hash table for attaching extra data to an object Richard Jones
  2007-08-14 23:24   ` Till Varoquaux
@ 2007-08-14 23:35   ` Fernando Alegre
  2007-08-15  7:59     ` Richard Jones
  1 sibling, 1 reply; 15+ messages in thread
From: Fernando Alegre @ 2007-08-14 23:35 UTC (permalink / raw)
  To: Richard Jones; +Cc: caml-list

On Tue, Aug 14, 2007 at 05:22:06PM +0100, Richard Jones wrote:

> if this is safe or not.
> 
>   Array.create 0 (Obj.magic ())
> 

I wonder if this the same as just [||]

Fernando


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Caml-list] Weak hash table for attaching extra data to an object
  2007-08-14 23:09     ` Stefano Zacchiroli
@ 2007-08-15  0:33       ` Jon Harrop
  2007-08-15 12:33         ` Daniel Bünzli
  2007-08-16 14:54         ` Markus Mottl
  0 siblings, 2 replies; 15+ messages in thread
From: Jon Harrop @ 2007-08-15  0:33 UTC (permalink / raw)
  To: caml-list

On Wednesday 15 August 2007 00:09:40 Stefano Zacchiroli wrote:
> On Tue, Aug 14, 2007 at 09:44:15PM +0100, Jon Harrop wrote:
> > I think this is an important design choice that people should be aware of
> > when they first write the code. If you think the product types inside
> > your expr
>
> Richard's point was precisely about *not* being able to change such a
> design choice. There can be a lot of reasons for that, for example (even
> though this was not Richard's situation AFAICT) not being the author of
> the data structure you want to decorate.

Yes. Laterally, the point can be addressed by making library writers aware of 
the points that I made. I think Markus Mottl once said that all libraries 
should include phantom types...

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Caml-list] Weak hash table for attaching extra data to an object
  2007-08-14 20:44   ` Jon Harrop
  2007-08-14 23:09     ` Stefano Zacchiroli
@ 2007-08-15  1:28     ` skaller
  2007-08-15 19:04     ` Richard Jones
  2 siblings, 0 replies; 15+ messages in thread
From: skaller @ 2007-08-15  1:28 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

On Tue, 2007-08-14 at 21:44 +0100, Jon Harrop wrote:

> You can also parameterize the type over metadata:
> 
>   type 'a expr =
>     | Int of int
>     | Var of string
>     | Seq of 'a seq
>   and 'a seq = {h: 'a expr; t: 'a expr array; meta: 'a};;
> 
> Now you can insert arbitrary metadata into Seq nodes.

This is not a precise description. You cannot put arbitrary
meta-data into the Seq nodes with this design, for a given
data structure, you can choose just one type of meta-data: this
is the same as for example a Hashtbl.

If you wish to *dynamically* vary the meta data, which is often
the case when you have a tool like a compiler that performs
phased analysis, a property list based on polymorphic variants
can be useful.

This is one of three ways I can think of to vary the data 
type dynamically across compilation unit boundaries.

A second way is to use mutable field of a class type
and change the object to a different implementation
of the same abstraction.

The third way is quite tricky and equivalent to the first:
use a function unit -> unit as the type, and execute it
to throw an exception. This is basically dynamic typing.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Caml-list] Weak hash table for attaching extra data to an object
  2007-08-14 23:35   ` Fernando Alegre
@ 2007-08-15  7:59     ` Richard Jones
  0 siblings, 0 replies; 15+ messages in thread
From: Richard Jones @ 2007-08-15  7:59 UTC (permalink / raw)
  To: Fernando Alegre; +Cc: caml-list

On Tue, Aug 14, 2007 at 06:35:40PM -0500, Fernando Alegre wrote:
> On Tue, Aug 14, 2007 at 05:22:06PM +0100, Richard Jones wrote:
> 
> > if this is safe or not.
> > 
> >   Array.create 0 (Obj.magic ())
> > 
> 
> I wonder if this the same as just [||]

Yes it is.  I was looking so long at the details that I didn't
see the obvious answer.

Rich.

-- 
Richard Jones
Red Hat


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Caml-list] Weak hash table for attaching extra data to an object
  2007-08-15  0:33       ` Jon Harrop
@ 2007-08-15 12:33         ` Daniel Bünzli
  2007-08-16 14:54         ` Markus Mottl
  1 sibling, 0 replies; 15+ messages in thread
From: Daniel Bünzli @ 2007-08-15 12:33 UTC (permalink / raw)
  To: caml-list


Le 15 août 07 à 02:33, Jon Harrop a écrit :

> I think Markus Mottl once said that all libraries should include  
> phantom types...

I don't know if he said that however the point of a phantom type is  
that there is no concrete value associated to it -- the phantom  
parameter isn't used in the type definition. Hence phantoms are  
useless if you need to decorate a type.

Daniel


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Caml-list] Weak hash table for attaching extra data to an object
  2007-08-14 20:44   ` Jon Harrop
  2007-08-14 23:09     ` Stefano Zacchiroli
  2007-08-15  1:28     ` skaller
@ 2007-08-15 19:04     ` Richard Jones
  2007-08-16 16:17       ` Some observations and measurements with using Weak Hashtable to annotate a tree (was: Re: [Caml-list] Weak hash table for attaching extra data to an object) Richard Jones
  2 siblings, 1 reply; 15+ messages in thread
From: Richard Jones @ 2007-08-15 19:04 UTC (permalink / raw)
  To: caml-list

On Tue, Aug 14, 2007 at 09:44:15PM +0100, Jon Harrop wrote:
> You can also parameterize the type over metadata:
> 
>   type 'a expr =
>     | Int of int
>     | Var of string
>     | Seq of 'a seq
>   and 'a seq = {h: 'a expr; t: 'a expr array; meta: 'a};;
> 
> Now you can insert arbitrary metadata into Seq nodes.

This is basically what PXP does, and what my own PG'OCaml does.
Unfortunately it's not very usable for a few practical reasons, as I
shall describe.

Firstly it only allows a single annotation.  You can use a tuple to
get around that, but that requires the annotators to cooperate
somehow, which is not much good if you're combining code from several
places.

Secondly if you don't want to annotate, you need to manually cast the
type to 'unit t' somewhere in your main program (and that will fail if
one of the libraries you're using wants to annotate the type).

Also, can you safely annotate the nodes one way in one part of your
program, and other way in another part (eg. if you're transforming a
tree with annotations, in stages)?  Using variants isn't completely
safe (but then neither are WHTs in this instance because you can not
annotate a node and WHTs don't catch this statically).

> If your metadata are sporadically used or, in particular, only sparsely 
> present or you do not have access to the library then the weak hash table 
> approach is probably the way to go.

I should have said :-) I'm trying to annotate the <img> & <applet>
nodes in a typical HTML document, so they are both sparse (compared to
all nodes), but there are many of them.

Rich.

-- 
Richard Jones
Red Hat


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Caml-list] Weak hash table for attaching extra data to an object
  2007-08-15  0:33       ` Jon Harrop
  2007-08-15 12:33         ` Daniel Bünzli
@ 2007-08-16 14:54         ` Markus Mottl
  1 sibling, 0 replies; 15+ messages in thread
From: Markus Mottl @ 2007-08-16 14:54 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

On 8/14/07, Jon Harrop <jon@ffconsultancy.com> wrote:
> Yes. Laterally, the point can be addressed by making library writers aware of
> the points that I made. I think Markus Mottl once said that all libraries
> should include phantom types...

Well, I don't know in what context I said this, but one can certainly
always create modules that are equivalent to another one with the
exception of some abstract type taking an extra type parameter.  This
extra parameter could then capture additional constraints on the use
of values of that type.  In that sense it is not necessary that every
library make use of phantom types.

Regards,
Markus

-- 
Markus Mottl        http://www.ocaml.info        markus.mottl@gmail.com


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Some observations and measurements with using Weak Hashtable to annotate a tree (was: Re: [Caml-list] Weak hash table for attaching extra data to an object)
  2007-08-15 19:04     ` Richard Jones
@ 2007-08-16 16:17       ` Richard Jones
  0 siblings, 0 replies; 15+ messages in thread
From: Richard Jones @ 2007-08-16 16:17 UTC (permalink / raw)
  To: caml-list

On Wed, Aug 15, 2007 at 08:04:55PM +0100, Richard Jones wrote:
> I should have said :-) I'm trying to annotate the <img> & <applet>
> nodes in a typical HTML document, so they are both sparse (compared to
> all nodes), but there are many of them.

I did some measurements using my WeakMetadata module to annotate a
large random HTML document tree.  Here are some observations and
results, in no particular order.

Code:

http://annexia.org/tmp/weakMetadata.ml
http://annexia.org/tmp/weakMetadata.mli
http://annexia.org/tmp/test_weakmetadata.ml

Firstly, it works, but I found a few programming pitfalls.  In one
case I was accidentally copying my tree nodes when annotating them, so
I was annotating the copies rather than the nodes themselves.  The
code was along these lines:

    | Element (a,b,c,d) -> annotate (a,b,c,d)

instead of:

    | Element ((_,_,_,_) as e) -> annotate e

Obviously experienced OCaml programmers wouldn't make this mistake,
right?!

Another programming problem was that I ended up having to assign a
unique identifier to each node which was used to do the hash
(otherwise the hash function would have to do a costly recursive
descent to determine if two nodes are equal).  This is a peculiarity
of the Weak hash table itself, namely that physical equality cannot be
used.

Another programming problem is that the choice of the initial size of
the hash table determines performance factors in rather non-obvious
ways.  Allocating the hash table with 'create 13' causes buckets to
have long chains, but that creates much lower memory overhead.
Allocating the hash table with 'create 100003' gives short chains and
high memory overhead.  (Exact numbers below).

I wrote a program to create a document (tree) with approximately
560,000 elements and approximately 840,000 text nodes.

I annotated every element in the document to get some idea of the
overhead.  As mentioned before, choice of initial hash table size
affects the length of the bucket chain:

  Initial hash table size: 13
  Median chain length per bucket: 33
  Overhead per element annotated: 21 bytes  (on a 64 bit machine)

  Initial hash table size: 100003
  Median chain length per bucket: 6
  Overhead per element annotated: 40 bytes  (on a 64 bit machine)

Approximately 210,000 elements were <img> or <input> elements (15% of
the total nodes) and I am particularly interested in just annotating
these nodes.  The overhead for annotating sparse nodes like this is
similar (again with chain length dependent a lot on initial choice of
hash table size):

  Initial hash table size: 13
  Median chain length per bucket: 27
  Overhead per element annotated: 26 bytes  (on a 64 bit machine)

  Initial hash table size: 100003
  Median chain length per bucket: 3
  Overhead per element annotated: 70 bytes  (on a 64 bit machine)

You have to manually compact the hash table in order to actually
regain the memory used by your metadata annotations.  (Alternately,
wait for the hash table as a whole to become unreachable, or reuse the
hash table which causes old annotations to be overwritten).

Internally, the Weak module stores a linked list of weak arrays
(caml_weak_list_head).  The GC has a separate phase to deal with weak
arrays, and as far as I can tell from the code the amount of work it
will do in each slice is limited.  The GC also has special code to
remove unreachable weak arrays, so this list does not just grow
without bound.  However I'm still unclear on the exact O(..) for using
huge numbers of weak arrays, which is what the weak hash table does
once you start annotating large numbers of nodes.

Rich.

-- 
Richard Jones
Red Hat


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2007-08-16 16:17 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-08-14 10:15 Weak hash table for attaching extra data to an object Richard Jones
2007-08-14 11:08 ` [Caml-list] " Thomas Fischbacher
2007-08-14 11:48   ` Till Varoquaux
2007-08-14 20:44   ` Jon Harrop
2007-08-14 23:09     ` Stefano Zacchiroli
2007-08-15  0:33       ` Jon Harrop
2007-08-15 12:33         ` Daniel Bünzli
2007-08-16 14:54         ` Markus Mottl
2007-08-15  1:28     ` skaller
2007-08-15 19:04     ` Richard Jones
2007-08-16 16:17       ` Some observations and measurements with using Weak Hashtable to annotate a tree (was: Re: [Caml-list] Weak hash table for attaching extra data to an object) Richard Jones
2007-08-14 16:22 ` [Caml-list] Weak hash table for attaching extra data to an object Richard Jones
2007-08-14 23:24   ` Till Varoquaux
2007-08-14 23:35   ` Fernando Alegre
2007-08-15  7:59     ` Richard Jones

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).