caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Help with simple ocaml memoization problem
@ 2007-11-29  3:17 Evan Klitzke
  2007-11-29  5:53 ` [Caml-list] " Peng Zang
  0 siblings, 1 reply; 14+ messages in thread
From: Evan Klitzke @ 2007-11-29  3:17 UTC (permalink / raw)
  To: caml-list

Hi everyone,

I'm learning Ocaml and I'm trying to figure out why the code I've
written is so slow (and not working). I'm trying to learn Ocaml by
implementing the Project Euler problems in Ocaml, and I'm stuck on
problem 14 (http://projecteuler.net/index.php?section=problems&id=14).

Basically I'm doing the 3*n+1 problem and I need to memoize values
that I've already computed if the function is going to finish in a
reasonable amount of time. I wrote a really short prototype of what I
want to do in Python, and that code is really short and runs quickly
(11 seconds); if you're interested you can see the prototype at
http://static.eklitzke.org/collatz.py

You can see my Ocaml attempt at
http://static.eklitzke.org/problem14.ml . As you can see, it's a _lot_
more code, which alone leads me to think that I'm probably not doing
this idiomatically. When I run it (compiled with ocamlopt, ocaml 3.10
on Linux), the memory usage climbs and after it reaches about 128 MB
(after about 30 seconds) I get the error:

Fatal error: exception Stack_overflow

I think I could use some help with this problem. Maybe I just need to
increase the stack size (how do I do that?) -- the Python version gets
to about the same size right before finishing, so this seems like a
reasonable amount of memory to use -- but I'm concerned that maybe I'm
doing something else wrong, because I'd expect the Ocaml version to
run much more quickly given that it is compiled to native code.  I
would appreciate any pointers for things that I'm doing wrong or
awkwardly.

Thanks in advance!

I'm going to include the text of my ocaml version  inline below, but
due to the long lines in the program it will likely end up mangled :-/

module type INT = sig
  type t = int
  val compare : t -> t-> int
end;;

module Int : INT = struct
  type t = int
  let compare i j = if i < j then -1 else if i = j then 0 else 1
end;;

open Big_int
open Map

(* A hash table for ints *)
module IntMap = Map.Make(Int);;

(* This is where we're going to store memoized values for find_collatz_len *)
let ann = ref IntMap.empty;;

(* Find the length of the chain for the int n *)
let find_collatz_len (n:int) =

  (* This is the definition of the collatz map for big_ints *)
  let collatz (n:big_int) =
    let two_big_int = succ_big_int unit_big_int in
    if eq_big_int (mod_big_int n two_big_int) zero_big_int
    then
      div_big_int n two_big_int
    else
      succ_big_int(mult_int_big_int 3 n)

  (* Tries to find the value of key `k' in the map m, or 0 if not found *)
  and find_safe m k = try IntMap.find k m with Not_found -> 0 in

  (* Helper function that accepts big_ints; t is the term that we're
finding the length of *)
  let rec aux_collatz_len (t:big_int) =

    (* We use memoization for terms small enough to fit in an int *)
    if is_int_big_int t then
      let t_int = int_of_big_int t in
      let v = find_safe !ann t_int in
      if v <> 0 then
        v (* Cache hit -- we've already computed aux_collatz_len for
this value of t *)
      else
        if eq_big_int t unit_big_int then 1
        else
          let v' = 1 + aux_collatz_len (collatz t) in
          ann := IntMap.add t_int v' !ann ; v'
    else
      1 + aux_collatz_len (collatz t)
  in
  aux_collatz_len (big_int_of_int n);;

(* Makes the list [1..n] *)
let nats n =
  let rec rnats m = if m = 0 then [] else m :: (rnats (m - 1)) in
List.rev (rnats n) ;;

let biggest_int l =
  let rec aux l (num, len) =
    if l = [] then
      num
    else
      let (a, b) = List.hd l and t = List.tl l in if b > len then aux
t (a, b) else aux t (num, len)
  in
aux l (List.hd l) ;;

let solution = biggest_int (List.map (fun x -> (x, find_collatz_len
x)) (nats 999999)) ;;

print_endline (string_of_int solution)


-- 
Evan Klitzke <evan@yelp.com>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Caml-list] Help with simple ocaml memoization problem
  2007-11-29  3:17 Help with simple ocaml memoization problem Evan Klitzke
@ 2007-11-29  5:53 ` Peng Zang
  2007-11-29  6:12   ` Evan Klitzke
  2007-11-29  8:08   ` Jon Harrop
  0 siblings, 2 replies; 14+ messages in thread
From: Peng Zang @ 2007-11-29  5:53 UTC (permalink / raw)
  To: Evan Klitzke; +Cc: caml-list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I don't know how to increase the stack size off the top of my head, but in 
general you want to avoid recursion on the stack anyways.  An easy way is to 
simply make the function tail recursive so the compiler can optimized it into 
a loop for you.  Here's a pretty faithful replication of your python code.  
Note I use Int64 instead of BigInt as I'm running OCaml 3.09.3.

let ( ++ ) = Int64.add;;
let ( ** ) = Int64.mul;;
let ( %% ) = Int64.rem;;
let ( // ) = Int64.div;;
let l1 = Int64.of_int 1;;
let l2 = Int64.of_int 2;;
let l3 = Int64.of_int 3;;
let cache = Hashtbl.create 1000000;;

let collatz = function
  | x when x %% l2 = l1 -> l3 ** x ++ l1
  | x -> x // l2
;;
let rec find_collatz_len = function
  | x when x = l1 -> l1
  | x when Hashtbl.mem cache x -> Lazy.force (Hashtbl.find cache x)
  | x -> let ans = lazy (l1 ++ find_collatz_len (collatz x)) in
      Hashtbl.add cache x ans;
      Lazy.force ans
;;
let lengths = Array.init 999999
  (fun x -> let x = Int64.of_int (x+1) in (x, find_collatz_len x));;
Array.sort (fun (a,b) (c,d) -> compare d b) lengths;;
lengths.(0);;


The use of laziness here is just so I can put the recursive call in the tail 
position -- a quick hack because I'm too lazy to spend time making it into a 
loop myself.

There should really be some easy syntax to write BigInts or Int64s directly in 
the code.  Is there some camlp4 that does this somewhere?  And it would nice 
if Int64 operators were predefined in the module.

Peng


On Wednesday 28 November 2007 10:17:42 pm Evan Klitzke wrote:
> Hi everyone,
>
> I'm learning Ocaml and I'm trying to figure out why the code I've
> written is so slow (and not working). I'm trying to learn Ocaml by
> implementing the Project Euler problems in Ocaml, and I'm stuck on
> problem 14 (http://projecteuler.net/index.php?section=problems&id=14).
>
> Basically I'm doing the 3*n+1 problem and I need to memoize values
> that I've already computed if the function is going to finish in a
> reasonable amount of time. I wrote a really short prototype of what I
> want to do in Python, and that code is really short and runs quickly
> (11 seconds); if you're interested you can see the prototype at
> http://static.eklitzke.org/collatz.py
>
> You can see my Ocaml attempt at
> http://static.eklitzke.org/problem14.ml . As you can see, it's a _lot_
> more code, which alone leads me to think that I'm probably not doing
> this idiomatically. When I run it (compiled with ocamlopt, ocaml 3.10
> on Linux), the memory usage climbs and after it reaches about 128 MB
> (after about 30 seconds) I get the error:
>
> Fatal error: exception Stack_overflow
>
> I think I could use some help with this problem. Maybe I just need to
> increase the stack size (how do I do that?) -- the Python version gets
> to about the same size right before finishing, so this seems like a
> reasonable amount of memory to use -- but I'm concerned that maybe I'm
> doing something else wrong, because I'd expect the Ocaml version to
> run much more quickly given that it is compiled to native code.  I
> would appreciate any pointers for things that I'm doing wrong or
> awkwardly.
>
> Thanks in advance!
>
> I'm going to include the text of my ocaml version  inline below, but
> due to the long lines in the program it will likely end up mangled :-/
>
> module type INT = sig
>   type t = int
>   val compare : t -> t-> int
> end;;
>
> module Int : INT = struct
>   type t = int
>   let compare i j = if i < j then -1 else if i = j then 0 else 1
> end;;
>
> open Big_int
> open Map
>
> (* A hash table for ints *)
> module IntMap = Map.Make(Int);;
>
> (* This is where we're going to store memoized values for find_collatz_len
> *) let ann = ref IntMap.empty;;
>
> (* Find the length of the chain for the int n *)
> let find_collatz_len (n:int) =
>
>   (* This is the definition of the collatz map for big_ints *)
>   let collatz (n:big_int) =
>     let two_big_int = succ_big_int unit_big_int in
>     if eq_big_int (mod_big_int n two_big_int) zero_big_int
>     then
>       div_big_int n two_big_int
>     else
>       succ_big_int(mult_int_big_int 3 n)
>
>   (* Tries to find the value of key `k' in the map m, or 0 if not found *)
>   and find_safe m k = try IntMap.find k m with Not_found -> 0 in
>
>   (* Helper function that accepts big_ints; t is the term that we're
> finding the length of *)
>   let rec aux_collatz_len (t:big_int) =
>
>     (* We use memoization for terms small enough to fit in an int *)
>     if is_int_big_int t then
>       let t_int = int_of_big_int t in
>       let v = find_safe !ann t_int in
>       if v <> 0 then
>         v (* Cache hit -- we've already computed aux_collatz_len for
> this value of t *)
>       else
>         if eq_big_int t unit_big_int then 1
>         else
>           let v' = 1 + aux_collatz_len (collatz t) in
>           ann := IntMap.add t_int v' !ann ; v'
>     else
>       1 + aux_collatz_len (collatz t)
>   in
>   aux_collatz_len (big_int_of_int n);;
>
> (* Makes the list [1..n] *)
> let nats n =
>   let rec rnats m = if m = 0 then [] else m :: (rnats (m - 1)) in
> List.rev (rnats n) ;;
>
> let biggest_int l =
>   let rec aux l (num, len) =
>     if l = [] then
>       num
>     else
>       let (a, b) = List.hd l and t = List.tl l in if b > len then aux
> t (a, b) else aux t (num, len)
>   in
> aux l (List.hd l) ;;
>
> let solution = biggest_int (List.map (fun x -> (x, find_collatz_len
> x)) (nats 999999)) ;;
>
> print_endline (string_of_int solution)


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.7 (GNU/Linux)

iD8DBQFHTlPLfIRcEFL/JewRAuxTAKCr/Pe/xCTW2csXY/PpvYLL1yELKQCgrC4U
ykJ05fhUQIF6JKlwI0F8/cA=
=lLIG
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Caml-list] Help with simple ocaml memoization problem
  2007-11-29  5:53 ` [Caml-list] " Peng Zang
@ 2007-11-29  6:12   ` Evan Klitzke
  2007-11-29  8:16     ` David Allsopp
                       ` (2 more replies)
  2007-11-29  8:08   ` Jon Harrop
  1 sibling, 3 replies; 14+ messages in thread
From: Evan Klitzke @ 2007-11-29  6:12 UTC (permalink / raw)
  To: peng.zang; +Cc: caml-list

On 11/28/07, Peng Zang <peng.zang@gmail.com> wrote:
> I don't know how to increase the stack size off the top of my head, but in
> general you want to avoid recursion on the stack anyways.  An easy way is to
> simply make the function tail recursive so the compiler can optimized it into
> a loop for you.  Here's a pretty faithful replication of your python code.
> Note I use Int64 instead of BigInt as I'm running OCaml 3.09.3.
>
> let ( ++ ) = Int64.add;;
> let ( ** ) = Int64.mul;;
> let ( %% ) = Int64.rem;;
> let ( // ) = Int64.div;;
> let l1 = Int64.of_int 1;;
> let l2 = Int64.of_int 2;;
> let l3 = Int64.of_int 3;;
> let cache = Hashtbl.create 1000000;;
>
> let collatz = function
>   | x when x %% l2 = l1 -> l3 ** x ++ l1
>   | x -> x // l2
> ;;
> let rec find_collatz_len = function
>   | x when x = l1 -> l1
>   | x when Hashtbl.mem cache x -> Lazy.force (Hashtbl.find cache x)
>   | x -> let ans = lazy (l1 ++ find_collatz_len (collatz x)) in
>       Hashtbl.add cache x ans;
>       Lazy.force ans
> ;;
> let lengths = Array.init 999999
>   (fun x -> let x = Int64.of_int (x+1) in (x, find_collatz_len x));;
> Array.sort (fun (a,b) (c,d) -> compare d b) lengths;;
> lengths.(0);;
>
>
> The use of laziness here is just so I can put the recursive call in the tail
> position -- a quick hack because I'm too lazy to spend time making it into a
> loop myself.
>
> There should really be some easy syntax to write BigInts or Int64s directly in
> the code.  Is there some camlp4 that does this somewhere?  And it would nice
> if Int64 operators were predefined in the module.

Thanks Peng. This is much easier to grok than the code that I
originally wrote! One question that I have is what is the difference
between the Map and Hashtbl modules? From the documentation they look
very similar -- why did you use Hashtbl here rather than Map?

-- 
Evan Klitzke <evan@yelp.com>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Caml-list] Help with simple ocaml memoization problem
  2007-11-29  5:53 ` [Caml-list] " Peng Zang
  2007-11-29  6:12   ` Evan Klitzke
@ 2007-11-29  8:08   ` Jon Harrop
  2007-11-29 15:59     ` Peng Zang
  1 sibling, 1 reply; 14+ messages in thread
From: Jon Harrop @ 2007-11-29  8:08 UTC (permalink / raw)
  To: caml-list

On Thursday 29 November 2007 05:53, Peng Zang wrote:
> I don't know how to increase the stack size off the top of my head, but in
> general you want to avoid recursion on the stack anyways.  An easy way is
> to simply make the function tail recursive so the compiler can optimized it
> into a loop for you.  Here's a pretty faithful replication of your python
> code. Note I use Int64 instead of BigInt as I'm running OCaml 3.09.3.
> ...
> let l1 = Int64.of_int 1;;
> let l2 = Int64.of_int 2;;
> let l3 = Int64.of_int 3;;

OCaml has int64 literals built-in:

# 3L;;
- : int64 = 3L

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Caml-list] Help with simple ocaml memoization problem
  2007-11-29  8:16     ` David Allsopp
@ 2007-11-29  8:11       ` Jon Harrop
  2007-11-29  8:58         ` Jean-Christophe Filliâtre
  2007-11-29  8:40       ` Luc Maranget
  1 sibling, 1 reply; 14+ messages in thread
From: Jon Harrop @ 2007-11-29  8:11 UTC (permalink / raw)
  To: caml-list

On Thursday 29 November 2007 08:16, David Allsopp wrote:
> Map is often slower (though hash tables potentially waste a lot of space...

The Map implementation in the OCaml stdlib is also quite inefficient. I did a 
little benchmark once and discovered that Maps actually waste more space than 
Hashtbls.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e


^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [Caml-list] Help with simple ocaml memoization problem
  2007-11-29  6:12   ` Evan Klitzke
@ 2007-11-29  8:16     ` David Allsopp
  2007-11-29  8:11       ` Jon Harrop
  2007-11-29  8:40       ` Luc Maranget
  2007-11-29  8:47     ` Jean-Christophe Filliâtre
  2007-12-04 23:49     ` Peng Zang
  2 siblings, 2 replies; 14+ messages in thread
From: David Allsopp @ 2007-11-29  8:16 UTC (permalink / raw)
  To: caml-list

On 11/28/07, Evan Klitzke <evan@yelp.com> wrote:
> On 11/28/07, Peng Zang <peng.zang@gmail.com> wrote:
> > I don't know how to increase the stack size off the top of my head, but
> > in general you want to avoid recursion on the stack anyways.  An easy
> > way is to...
>
> Thanks Peng. This is much easier to grok than the code that I
> originally wrote! One question that I have is what is the difference
> between the Map and Hashtbl modules? From the documentation they look
> very similar -- why did you use Hashtbl here rather than Map?

Map is often slower (though hash tables potentially waste a lot of space and
will be slower if you get lots of collisions, of course) and the functorial
interface means that when "knocking" something together it's often tempting
to use Hashtbl immediately just to save typing!


David


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Caml-list] Help with simple ocaml memoization problem
  2007-11-29  8:16     ` David Allsopp
  2007-11-29  8:11       ` Jon Harrop
@ 2007-11-29  8:40       ` Luc Maranget
  1 sibling, 0 replies; 14+ messages in thread
From: Luc Maranget @ 2007-11-29  8:40 UTC (permalink / raw)
  To: caml-list

> On 11/28/07, Evan Klitzke <evan@yelp.com> wrote:
> > On 11/28/07, Peng Zang <peng.zang@gmail.com> wrote:
> > > I don't know how to increase the stack size off the top of my head, but
> > > in general you want to avoid recursion on the stack anyways.  An easy
> > > way is to...
> >
> > Thanks Peng. This is much easier to grok than the code that I
> > originally wrote! One question that I have is what is the difference
> > between the Map and Hashtbl modules? From the documentation they look
> > very similar -- why did you use Hashtbl here rather than Map?
> 
> Map is often slower (though hash tables potentially waste a lot of space and
> will be slower if you get lots of collisions, of course) and the functorial
> interface means that when "knocking" something together it's often tempting
> to use Hashtbl immediately just to save typing!
> 
> 
> David

There is another importtant difference, Hashtbl easily offers
a mutable data-structure, while Map offers a non-mutable (and thus
persistent) data structure.

As matter of fact, the OP code defined
let ann = ref IntMap.empty;;
          ^^
In that context, Hashtbl is also more natural.

-- 
Luc Maranget


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Caml-list] Help with simple ocaml memoization problem
  2007-11-29  6:12   ` Evan Klitzke
  2007-11-29  8:16     ` David Allsopp
@ 2007-11-29  8:47     ` Jean-Christophe Filliâtre
  2007-12-04 23:49     ` Peng Zang
  2 siblings, 0 replies; 14+ messages in thread
From: Jean-Christophe Filliâtre @ 2007-11-29  8:47 UTC (permalink / raw)
  To: Evan Klitzke; +Cc: peng.zang, caml-list

Evan Klitzke wrote:
> One question that I have is what is the difference
> between the Map and Hashtbl modules? From the documentation they look
> very similar -- why did you use Hashtbl here rather than Map?

Hashtbl implements an imperative data structure i.e. association tables
which are modified in-place when inserting, removing, etc.

On the contrary, Map implements a persistent data structure i.e. tables
which are not modified when you perform operations; instead, new tables
are returned, the previous ones being unchanged.

(This can be seen in the types

	Hashtbl.add : ('a, 'b) Hashtbl.t -> 'a -> 'b -> unit
	SomeMap.add : key -> 'a -> 'a t -> 'a t

Here you can see the return type unit for hash tables, and the return
type 'a t for the maps.)

There are many cases where persistence can help improving your code:
backtracking, sharing between several data structures, clarity and
correctness of the code, etc.

In the case of this particular example, however, the use of a hash table
is perfectly fine, as demonstrated by Peng. With a good hash function,
insertion in a hash table typically runs in constant time, while
insertion in a map has logarithmic cost. But don't be misleaded by this
performance comparison: persistence has so many advantages that it is
often the case that you want to pay this extra cost.

-- 
Jean-Christophe Filliâtre
http://www.lri.fr/~filliatr/


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Caml-list] Help with simple ocaml memoization problem
  2007-11-29  8:11       ` Jon Harrop
@ 2007-11-29  8:58         ` Jean-Christophe Filliâtre
  2007-11-29 18:57           ` Jon Harrop
  2007-11-29 22:25           ` Jon Harrop
  0 siblings, 2 replies; 14+ messages in thread
From: Jean-Christophe Filliâtre @ 2007-11-29  8:58 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

Jon Harrop wrote:
> The Map implementation in the OCaml stdlib is also quite inefficient. I did a 
> little benchmark once and discovered that Maps actually waste more space than 
> Hashtbls.

I find it unfair to compare an imperative and a persistence data
structure for performances. Of course you are going to use some extra
space if you need to keep old versions of the data stuctures valid.
But you are sharing *a lot* among the various versions. So if you are
manipulating several sets/maps with common ancestors at the same time,
you are saving memory w.r.t. other data structures such as hash tables.

Of course, if you are using a single data structure, in a linear way,
then yes a hash table is probably more efficient (provided you have a
good hash function, which is not always easy to achieve).

Regarding implementation of ocaml maps, I wouldn't say that it is
inefficient: I did my own benchmarls (on sets, but this is the same
code) and found that ocaml AVLs are really efficient, on the contrary.
It usually beats other implementations (e.g. red-black trees from the
SML stdlib), or even specialized structures such as Patricia trees (when
keys are integers) on some operations.

-- 
Jean-Christophe Filliâtre
http://www.lri.fr/~filliatr/


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Caml-list] Help with simple ocaml memoization problem
  2007-11-29  8:08   ` Jon Harrop
@ 2007-11-29 15:59     ` Peng Zang
  0 siblings, 0 replies; 14+ messages in thread
From: Peng Zang @ 2007-11-29 15:59 UTC (permalink / raw)
  To: caml-list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ahhh.. that is helpful.  Thanks Jon,

Peng

On Thursday 29 November 2007 03:08:45 am Jon Harrop wrote:
> On Thursday 29 November 2007 05:53, Peng Zang wrote:
> > I don't know how to increase the stack size off the top of my head, but
> > in general you want to avoid recursion on the stack anyways.  An easy way
> > is to simply make the function tail recursive so the compiler can
> > optimized it into a loop for you.  Here's a pretty faithful replication
> > of your python code. Note I use Int64 instead of BigInt as I'm running
> > OCaml 3.09.3. ...
> > let l1 = Int64.of_int 1;;
> > let l2 = Int64.of_int 2;;
> > let l3 = Int64.of_int 3;;
>
> OCaml has int64 literals built-in:
>
> # 3L;;
> - : int64 = 3L


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.7 (GNU/Linux)

iD8DBQFHTuHxfIRcEFL/JewRAhmoAKDCBN9QvQD5kxQATcOG9Ekj3YDkNQCeLiNl
1aGy4MJSYcMZfpcZY/D2chE=
=WzAJ
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Caml-list] Help with simple ocaml memoization problem
  2007-11-29  8:58         ` Jean-Christophe Filliâtre
@ 2007-11-29 18:57           ` Jon Harrop
  2007-11-29 22:25           ` Jon Harrop
  1 sibling, 0 replies; 14+ messages in thread
From: Jon Harrop @ 2007-11-29 18:57 UTC (permalink / raw)
  To: Jean-Christophe Filliâtre; +Cc: caml-list

On Thursday 29 November 2007 08:58, Jean-Christophe Filliâtre wrote:
> Jon Harrop wrote:
> > The Map implementation in the OCaml stdlib is also quite inefficient. I
> > did a little benchmark once and discovered that Maps actually waste more
> > space than Hashtbls.
>
> I find it unfair to compare an imperative and a persistence data
> structure for performances.

I agree.

> Of course you are going to use some extra 
> space if you need to keep old versions of the data stuctures valid.
> But you are sharing *a lot* among the various versions. So if you are
> manipulating several sets/maps with common ancestors at the same time,
> you are saving memory w.r.t. other data structures such as hash tables.

True, my benchmark was a drop-in replacement with no sharing.

> Of course, if you are using a single data structure, in a linear way,
> then yes a hash table is probably more efficient (provided you have a
> good hash function, which is not always easy to achieve).
>
> Regarding implementation of ocaml maps, I wouldn't say that it is
> inefficient: I did my own benchmarls (on sets, but this is the same
> code) and found that ocaml AVLs are really efficient, on the contrary.
> It usually beats other implementations (e.g. red-black trees from the
> SML stdlib), or even specialized structures such as Patricia trees (when
> keys are integers) on some operations.

I found that by manually unrolling with a Node1 constructor for single-element 
nodes you can reduce GC load and increase performance by ~30%.

Perhaps "badly optimized" would have been a better phrase. For example, the 
Map implementation in the OCaml stdlib manually inlines the height function 
even thought it makes relatively little difference to performance: GC load is 
the real killer for most immutable data structures.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Caml-list] Help with simple ocaml memoization problem
  2007-11-29  8:58         ` Jean-Christophe Filliâtre
  2007-11-29 18:57           ` Jon Harrop
@ 2007-11-29 22:25           ` Jon Harrop
  2007-11-30 11:03             ` Jean-Christophe Filliâtre
  1 sibling, 1 reply; 14+ messages in thread
From: Jon Harrop @ 2007-11-29 22:25 UTC (permalink / raw)
  To: caml-list

On Thursday 29 November 2007 08:58, Jean-Christophe Filliâtre wrote:
> Jon Harrop wrote:
> > The Map implementation in the OCaml stdlib is also quite inefficient. I
> > did a little benchmark once and discovered that Maps actually waste more
> > space than Hashtbls.
>
> I find it unfair to compare an imperative and a persistence data
> structure for performances. Of course you are going to use some extra
> space if you need to keep old versions of the data stuctures valid.
> But you are sharing *a lot* among the various versions. So if you are
> manipulating several sets/maps with common ancestors at the same time,
> you are saving memory w.r.t. other data structures such as hash tables.

Incidentally, what are the pedagogical applications of shared maps?

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Caml-list] Help with simple ocaml memoization problem
  2007-11-29 22:25           ` Jon Harrop
@ 2007-11-30 11:03             ` Jean-Christophe Filliâtre
  0 siblings, 0 replies; 14+ messages in thread
From: Jean-Christophe Filliâtre @ 2007-11-30 11:03 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

Jon Harrop wrote:
> Incidentally, what are the pedagogical applications of shared maps?

Here is an example: when computing the FIRST and FOLLOW sets of a
grammar, you typically make unions of already computed sets for same or
other non-terminals. It potentially results in sharing between the
various sets.

More generally, static analysis traversing AST and computing sets (or
maps) for various nodes (liveness analysis, etc.) are likely to build
data structures which share subparts (when they are tree-based data
structures typically).

I hope it makes my point a little clearer.

-- 
Jean-Christophe Filliâtre
http://www.lri.fr/~filliatr/


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Caml-list] Help with simple ocaml memoization problem
  2007-11-29  6:12   ` Evan Klitzke
  2007-11-29  8:16     ` David Allsopp
  2007-11-29  8:47     ` Jean-Christophe Filliâtre
@ 2007-12-04 23:49     ` Peng Zang
  2 siblings, 0 replies; 14+ messages in thread
From: Peng Zang @ 2007-12-04 23:49 UTC (permalink / raw)
  To: Evan Klitzke; +Cc: caml-list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Oops.  I was looking at this code again, and it actually isn't tail recursive.  
Ie. It's still recursing on the stack.  For one, I still add 1 after I make 
the recursive call and two, wrapping it in a lazy means once the value is 
calculated it must go back to memoize it.  But, it turns out it doesn't 
matter as there's not that many levels of recursion anyways.  The value 
837799 has a collatz lenght of 525 which means about that number of recursive 
calls.  I actually think the problem you had before was overflow from using 
ints.  In any event, here's the code without any laziness to confuse the 
issue.

let ( ++ ) = Int64.add;;
let ( ** ) = Int64.mul;;
let ( %% ) = Int64.rem;;
let ( // ) = Int64.div;;
let cache = Hashtbl.create 1000000;;
let cache2 = Hashtbl.create 1000000;;


let collatz = function
  | x when x %% 2L = 1L -> 3L ** x ++ 1L
  | x -> x // 2L
;;

let rec find_collatz_len x = match x with
  | 1L -> 1L
  | x when Hashtbl.mem cache x -> Hashtbl.find cache x
  | x -> let ans = 1L ++ find_collatz_len (collatz x) in
      Hashtbl.add cache x ans;
      ans
;;

let lengths = Array.init 999999
  (fun x -> let x = Int64.of_int (x + 1) in (x, find_collatz_len x));;
Array.sort (fun (a,b) (c,d) -> compare d b) lengths;;
lengths.(0)


Peng




On Thursday 29 November 2007 01:12:57 am Evan Klitzke wrote:
> On 11/28/07, Peng Zang <peng.zang@gmail.com> wrote:
> > I don't know how to increase the stack size off the top of my head, but
> > in general you want to avoid recursion on the stack anyways.  An easy way
> > is to simply make the function tail recursive so the compiler can
> > optimized it into a loop for you.  Here's a pretty faithful replication
> > of your python code. Note I use Int64 instead of BigInt as I'm running
> > OCaml 3.09.3.
> >
> > let ( ++ ) = Int64.add;;
> > let ( ** ) = Int64.mul;;
> > let ( %% ) = Int64.rem;;
> > let ( // ) = Int64.div;;
> > let l1 = Int64.of_int 1;;
> > let l2 = Int64.of_int 2;;
> > let l3 = Int64.of_int 3;;
> > let cache = Hashtbl.create 1000000;;
> >
> > let collatz = function
> >
> >   | x when x %% l2 = l1 -> l3 ** x ++ l1
> >   | x -> x // l2
> >
> > ;;
> > let rec find_collatz_len = function
> >
> >   | x when x = l1 -> l1
> >   | x when Hashtbl.mem cache x -> Lazy.force (Hashtbl.find cache x)
> >   | x -> let ans = lazy (l1 ++ find_collatz_len (collatz x)) in
> >
> >       Hashtbl.add cache x ans;
> >       Lazy.force ans
> > ;;
> > let lengths = Array.init 999999
> >   (fun x -> let x = Int64.of_int (x+1) in (x, find_collatz_len x));;
> > Array.sort (fun (a,b) (c,d) -> compare d b) lengths;;
> > lengths.(0);;
> >
> >
> > The use of laziness here is just so I can put the recursive call in the
> > tail position -- a quick hack because I'm too lazy to spend time making
> > it into a loop myself.
> >
> > There should really be some easy syntax to write BigInts or Int64s
> > directly in the code.  Is there some camlp4 that does this somewhere? 
> > And it would nice if Int64 operators were predefined in the module.
>
> Thanks Peng. This is much easier to grok than the code that I
> originally wrote! One question that I have is what is the difference
> between the Map and Hashtbl modules? From the documentation they look
> very similar -- why did you use Hashtbl here rather than Map?


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.7 (GNU/Linux)

iD8DBQFHVeeJfIRcEFL/JewRAv64AKC5xEgXeTibAg0BEPNyrFTIMCPprgCgqEYB
UG8FMXoAWBJBzicGk2z3MB8=
=Gw33
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2007-12-04 23:49 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-11-29  3:17 Help with simple ocaml memoization problem Evan Klitzke
2007-11-29  5:53 ` [Caml-list] " Peng Zang
2007-11-29  6:12   ` Evan Klitzke
2007-11-29  8:16     ` David Allsopp
2007-11-29  8:11       ` Jon Harrop
2007-11-29  8:58         ` Jean-Christophe Filliâtre
2007-11-29 18:57           ` Jon Harrop
2007-11-29 22:25           ` Jon Harrop
2007-11-30 11:03             ` Jean-Christophe Filliâtre
2007-11-29  8:40       ` Luc Maranget
2007-11-29  8:47     ` Jean-Christophe Filliâtre
2007-12-04 23:49     ` Peng Zang
2007-11-29  8:08   ` Jon Harrop
2007-11-29 15:59     ` Peng Zang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).