caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: "Harrison, John R" <john.r.harrison@intel.com>
To: Berke Durak <berke.durak@exalead.com>
Cc: Caml-list List <caml-list@inria.fr>
Subject: RE: [Caml-list] Canonical Set/Map datastructure?
Date: Fri, 7 Mar 2008 10:13:32 -0700	[thread overview]
Message-ID: <DCC19446A892D84FBB89AE7C94F0C04E01D99539DC@azsmsx501.amr.corp.intel.com> (raw)
In-Reply-To: <47D11442.6090409@exalead.com>

[-- Attachment #1: Type: text/plain, Size: 2308 bytes --]

Hi Berke,

| Thanks for your code.  I'm sorry I thought you maintained a separate
| hash table. It's very interesting code and I'll give it a try.

It would in fact have been more efficient to use Jean-Christophe
Filliatre's implementation of Patricia trees instead of writing my
own. However, it was important to me to release the code under a
BSD-like license. For what it's worth, I attach a version of my code
that's extracted from its context so it can be used independently. (I
replaced a few of my pet functions with standard library ones, and I
hope I didn't screw anything up in the process.)

| - The theoretical worst-case performance of hash-based data structures
| can be attained if the hash function has bad dispersal.  As the code
| relies on Hashtbl.hash, which does not hash deeply, this could
| potentially be a proble, in particular if your keys have long "common
| prefixes" that are not distinguished by the hash function.  It might
| work well in practice but I feel a little uncomfortable.

Yes, sure. My applications are mainly in theorem proving and symbolic
computation where this isn't an issue, and I can imagine it might not
be suitable elsewhere.

| - Also, it does not preserve the natural order for keys, and that
| is particularly bad, because I often use, for instance,
| float-indexed maps or sets as a substitute for heaps.

When I was looking at this area last time (maybe just following the
references from the paper I cited) I came across a kind of mixed
heap/tree structure with distinct "horizontal" and "vertical"
orderings. That might be something to consider. But once again there
is a bad worst-case performance if the two orders happen to be
correlated.

| The paper with the inverse cubic lower bound is *very* interesting; we
| don't see plausible lower bounds often in complexity theory,
| especially with such large assumptions (just bounded out-degree for
| the graph nodes).
|
| So it seems there is little hope to have a drop-in replacement for Set
| or Map that is compatible with the natural order of things, a.k.a.,
| Pervasives.compare.

Yes, I really found it striking that in a fundamental sense, you need
to pay the price of noncanonicality in order to get the guaranteed
O(log n) lookup.

John.


[-- Attachment #2: map.ml --]
[-- Type: application/octet-stream, Size: 10766 bytes --]

(* ------------------------------------------------------------------------- *)
(* Polymorphic finite partial functions via Patricia trees.                  *)
(*                                                                           *)
(* The point of this strange representation is that it is canonical (equal   *)
(* functions have the same encoding) yet reasonably efficient on average.    *)
(*                                                                           *)
(*                  (c) John Harrison 2003.                                  *)
(*                                                                           *)
(*    See http://www.cl.cam.ac.uk/~jrh13/atp/OCaml/LICENSE.txt               *)
(*                                                                           *)
(* Idea due to Diego Olivier Fernandez Pons (OCaml list, 2003/11/10).        *)
(* ------------------------------------------------------------------------- *)

type ('a,'b)func =
   Empty
 | Leaf of int * ('a*'b)list
 | Branch of int * int * ('a,'b)func * ('a,'b)func;;

(* ------------------------------------------------------------------------- *)
(* Undefined function.                                                       *)
(* ------------------------------------------------------------------------- *)

let undefined = Empty;;

(* ------------------------------------------------------------------------- *)
(* In case of equality comparison worries, better use this.                  *)
(* ------------------------------------------------------------------------- *)

let is_undefined f =
  match f with
    Empty -> true
  | _ -> false;;

(* ------------------------------------------------------------------------- *)
(* Operation analogous to "map" for lists.                                   *)
(* ------------------------------------------------------------------------- *)

let mapf =
  let rec map_list f l =
    match l with
      [] -> []
    | (x,y)::t -> (x,f(y))::(map_list f t) in
  let rec mapf f t =
    match t with
      Empty -> Empty
    | Leaf(h,l) -> Leaf(h,map_list f l)
    | Branch(p,b,l,r) -> Branch(p,b,mapf f l,mapf f r) in
  mapf;;

(* ------------------------------------------------------------------------- *)
(* Operations analogous to "fold" for lists.                                 *)
(* ------------------------------------------------------------------------- *)

let foldl =
  let rec foldl_list f a l =
    match l with
      [] -> a
    | (x,y)::t -> foldl_list f (f a x y) t in
  let rec foldl f a t =
    match t with
      Empty -> a
    | Leaf(h,l) -> foldl_list f a l
    | Branch(p,b,l,r) -> foldl f (foldl f a l) r in
  foldl;;

let foldr =
  let rec foldr_list f l a =
    match l with
      [] -> a
    | (x,y)::t -> f x y (foldr_list f t a) in
  let rec foldr f t a =
    match t with
      Empty -> a
    | Leaf(h,l) -> foldr_list f l a
    | Branch(p,b,l,r) -> foldr f l (foldr f r a) in
  foldr;;

(* ------------------------------------------------------------------------- *)
(* Application.                                                              *)
(* ------------------------------------------------------------------------- *)

let applyd =
  let rec apply_listd l d x =
    match l with
      (a,b)::t -> let c = Pervasives.compare x a in
                  if c = 0 then b else if c > 0 then apply_listd t d x else d x
    | [] -> d x in
  fun f d x ->
    let k = Hashtbl.hash x in
    let rec look t =
      match t with
        Leaf(h,l) when h = k -> apply_listd l d x
      | Branch(p,b,l,r) when (k lxor p) land (b - 1) = 0
                -> look (if k land b = 0 then l else r)
      | _ -> d x in
    look f;;

let apply f = applyd f (fun x -> failwith "apply");;

let tryapplyd f a d = applyd f (fun x -> d) a;;

let tryapplyl f x = tryapplyd f x [];;

let defined f x = try apply f x; true with Failure _ -> false;;

(* ------------------------------------------------------------------------- *)
(* Undefinition.                                                             *)
(* ------------------------------------------------------------------------- *)

let undefine =
  let rec undefine_list x l =
    match l with
      (a,b as ab)::t ->
          let c = Pervasives.compare x a in
          if c = 0 then t
          else if c < 0 then l else
          let t' = undefine_list x t in
          if t' == t then l else ab::t'
    | [] -> [] in
  fun x ->
    let k = Hashtbl.hash x in
    let rec und t =
      match t with
        Leaf(h,l) when h = k ->
          let l' = undefine_list x l in
          if l' == l then t
          else if l' = [] then Empty
          else Leaf(h,l')
      | Branch(p,b,l,r) when k land (b - 1) = p ->
          if k land b = 0 then
            let l' = und l in
            if l' == l then t
            else if is_undefined l' then r
            else Branch(p,b,l',r)
          else
            let r' = und r in
            if r' == r then t
            else if is_undefined r' then l
            else Branch(p,b,l,r')
      | _ -> t in
    und;;

(* ------------------------------------------------------------------------- *)
(* Redefinition and combination.                                             *)
(* ------------------------------------------------------------------------- *)

let (|->),combine =
  let ldb x y = let z = x lxor y in z land (-z) in
  let newbranch p1 t1 p2 t2 =
    let b = ldb p1 p2 in
    let p = p1 land (b - 1) in
    if p1 land b = 0 then Branch(p,b,t1,t2)
    else Branch(p,b,t2,t1) in
  let rec define_list (x,y as xy) l =
    match l with
      (a,b as ab)::t ->
          let c = Pervasives.compare x a in
          if c = 0 then xy::t
          else if c < 0 then xy::l
          else ab::(define_list xy t)
    | [] -> [xy]
  and combine_list op z l1 l2 =
    match (l1,l2) with
      [],_ -> l2
    | _,[] -> l1
    | ((x1,y1 as xy1)::t1,(x2,y2 as xy2)::t2) ->
          let c = Pervasives.compare x1 x2 in
          if c < 0 then xy1::(combine_list op z t1 l2)
          else if c > 0 then xy2::(combine_list op z l1 t2) else
          let y = op y1 y2 and l = combine_list op z t1 t2 in
          if z(y) then l else (x1,y)::l in
  let (|->) x y =
    let k = Hashtbl.hash x in
    let rec upd t =
      match t with
        Empty -> Leaf (k,[x,y])
      | Leaf(h,l) ->
           if h = k then Leaf(h,define_list (x,y) l)
           else newbranch h t k (Leaf(k,[x,y]))
      | Branch(p,b,l,r) ->
          if k land (b - 1) <> p then newbranch p t k (Leaf(k,[x,y]))
          else if k land b = 0 then Branch(p,b,upd l,r)
          else Branch(p,b,l,upd r) in
    upd in
  let rec combine op z t1 t2 =
    match (t1,t2) with
      Empty,_ -> t2
    | _,Empty -> t1
    | Leaf(h1,l1),Leaf(h2,l2) ->
          if h1 = h2 then
            let l = combine_list op z l1 l2 in
            if l = [] then Empty else Leaf(h1,l)
          else newbranch h1 t1 h2 t2
    | (Leaf(k,lis) as lf),(Branch(p,b,l,r) as br) ->
          if k land (b - 1) = p then
            if k land b = 0 then
              let l' = combine op z lf l in
              if is_undefined l' then r else Branch(p,b,l',r)
            else
              let r' = combine op z lf r in
              if is_undefined r' then l else Branch(p,b,l,r')
          else
            newbranch k lf p br
    | (Branch(p,b,l,r) as br),(Leaf(k,lis) as lf) ->
          if k land (b - 1) = p then
            if k land b = 0 then
              let l' = combine op z l lf in
              if is_undefined l' then r else Branch(p,b,l',r)
            else
              let r' = combine op z r lf in
              if is_undefined r' then l else Branch(p,b,l,r')
          else
            newbranch p br k lf
    | Branch(p1,b1,l1,r1),Branch(p2,b2,l2,r2) ->
          if b1 < b2 then
            if p2 land (b1 - 1) <> p1 then newbranch p1 t1 p2 t2
            else if p2 land b1 = 0 then
              let l = combine op z l1 t2 in
              if is_undefined l then r1 else Branch(p1,b1,l,r1)
            else
              let r = combine op z r1 t2 in
              if is_undefined r then l1 else Branch(p1,b1,l1,r)
          else if b2 < b1 then
            if p1 land (b2 - 1) <> p2 then newbranch p1 t1 p2 t2
            else if p1 land b2 = 0 then
              let l = combine op z t1 l2 in
              if is_undefined l then r2 else Branch(p2,b2,l,r2)
            else
              let r = combine op z t1 r2 in
              if is_undefined r then l2 else Branch(p2,b2,l2,r)
          else if p1 = p2 then
            let l = combine op z l1 l2 and r = combine op z r1 r2 in
            if is_undefined l then r
            else if is_undefined r then l else Branch(p1,b1,l,r)
          else
            newbranch p1 t1 p2 t2 in
  (|->),combine;;

(* ------------------------------------------------------------------------- *)
(* Special case of point function.                                           *)
(* ------------------------------------------------------------------------- *)

let (|=>) = fun x y -> (x |-> y) undefined;;

(* ------------------------------------------------------------------------- *)
(* Grab an arbitrary element.                                                *)
(* ------------------------------------------------------------------------- *)

let rec choose t =
  match t with
    Empty -> failwith "choose: completely undefined function"
  | Leaf(h,l) -> List.hd l
  | Branch(b,p,t1,t2) -> choose t1;;

(* ------------------------------------------------------------------------- *)
(* Mapping to sorted-list representation of the graph, domain and range.     *)
(* ------------------------------------------------------------------------- *)

let rec uniq l =
  match l with
    x::(y::_ as t) -> let t' = uniq t in
                      if Pervasives.compare x y = 0 then t' else
                      if t'==t then l else x::t'
 | _ -> l;;

let setify l = uniq(List.sort Pervasives.compare l);;

let graph f = setify (foldl (fun a x y -> (x,y)::a) [] f);;

let dom f = setify(foldl (fun a x y -> x::a) [] f);;

let ran f = setify(foldl (fun a x y -> y::a) [] f);;

(* ------------------------------------------------------------------------- *)
(* Idiom for a mapping zipping domain and range lists.                       *)
(* ------------------------------------------------------------------------- *)

let fpf xs ys = List.fold_right2 (|->) xs ys undefined;;

(* ------------------------------------------------------------------------- *)
(* Install a (trivial) printer for finite partial functions.                 *)
(* ------------------------------------------------------------------------- *)

let print_fpf (f:('a,'b)func) = print_string "<func>";;

#install_printer print_fpf;;

  reply	other threads:[~2008-03-07 17:13 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-03-05 16:49 Berke Durak
2008-03-05 17:16 ` [Caml-list] " Brian Hurt
2008-03-05 17:27 ` Alain Frisch
2008-03-05 19:53   ` Jean-Christophe Filliâtre
2008-03-05 20:03   ` Jon Harrop
2008-03-05 21:56     ` Alain Frisch
2008-03-06  7:45     ` Jean-Christophe Filliâtre
2008-03-05 17:34 ` Harrison, John R
2008-03-06  9:53 ` Berke Durak
2008-03-06 17:36   ` Harrison, John R
2008-03-07 10:09     ` Berke Durak
2008-03-07 17:13       ` Harrison, John R [this message]
2008-03-07 10:19   ` Alain Frisch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DCC19446A892D84FBB89AE7C94F0C04E01D99539DC@azsmsx501.amr.corp.intel.com \
    --to=john.r.harrison@intel.com \
    --cc=berke.durak@exalead.com \
    --cc=caml-list@inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).