caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] speed, loops vs. hofs
@ 2003-04-28 17:13 Hal Daume III
  2003-04-29 14:39 ` [Caml-list] " Andrzej M. Ostruszka
  2003-04-30  0:45 ` [Caml-list] " Manos Renieris
  0 siblings, 2 replies; 3+ messages in thread
From: Hal Daume III @ 2003-04-28 17:13 UTC (permalink / raw)
  To: Caml Mailing List

One of the primary reasons I use OCaml is because it offers speed,
together with nice functional stuff, like HOFs.  However, it seems that
using these together might be impossible.

I have a module which allows iteration over a particular datatype.  I have
a method of doing this iteration very quickly using for loops.  In
particular, I have a function of type:

  loopPre : hmm -> node -> (node -> int -> bool -> unit) -> unit

which loops over the predecessors of the given node in the HMM.  The
implementation basically looks like:

  let loopPre hmm node action = 
    let ... some definitions ...
      begin
        action {...make the node...} 0 (...make the bool...);
        for i = 0 to num_nodes do
          action {...make the node...} i (...make the bool...);
          for j = 0 to {something} do
            action {...make the node...} j (...make the bool...);
          done;
        done;

basically, there are a few calls to 'action' in a pair of nested
loops.  Then, at the usage point, I do something like:

  ...1...
  loopPre hmm node_j
    (fun nd len null ->
      ...2...
    )
  ...3...

I had a sneaking feeling that if I actually stuck the for loops into the
original code I would get speedups.  However, "...2..." is pretty long, so
there were be (a) a bunch of code duplication and (b) it would be hard to
maintain.

In order to test this, I wrote the following program:

% cat loops.ml
let loop low high f =
  begin
    f 0 true;

    for i=low to high do
      f i false;
    done;

    f (-1) true;
  end

% cat useLoops.ml
open Loops

let top = 500000000

let cnt = ref 0
let pre1 = Sys.time ()
let _ = loop 1 top (fun i b -> if not b then cnt := !cnt + 1)
let pst1 = Sys.time ()

let cnt = ref 0
let pre2 = Sys.time ()
let _ = 
  let f i b = if not b then cnt := !cnt + 1 in
    loop 1 top f
let pst2 = Sys.time ()


let cnt = ref 0
let pre3 = Sys.time ()
let _ = 
  begin
    let i = 0 in let b = true in
      if not b then cnt := !cnt + 1;
      
      for i=1 to top do
        let b = false in
          if not b then cnt := !cnt + 1;
      done;

      let i = -1 in let b = true in
        if not b then cnt := !cnt + 1;
  end
let pst3 = Sys.time ()

let cnt = ref 0
let pre4 = Sys.time ()
let _ = 
  let f i b = if not b then cnt := !cnt + 1 in
    begin
      let i = 0 in let b = true in
        f i b;

        for i=1 to top do
          let b = false in
            f i b;
        done;

        let i = -1 in let b = true in
          f i b;
    end
let pst4 = Sys.time ()

let _ = Printf.printf "Time 1 = %f sec\n" (pst1-.pre1)
let _ = Printf.printf "Time 2 = %f sec\n" (pst2-.pre2)
let _ = Printf.printf "Time 3 = %f sec\n" (pst3-.pre3)
let _ = Printf.printf "Time 4 = %f sec\n" (pst4-.pre4)

%


Here, there are four loops done.  The first two use the HO loop function
from the Loops module (which is very much like my iterPre function); the
last two use loops.  One of them inserts the funciton definition at every
usage; the second does a let-bound definition.

For loops of 500 million elements, the timings come out:

Time 1 = 10.230000 sec
Time 2 = 10.230000 sec
Time 3 = 2.610000 sec
Time 4 = 7.020000 sec

on my x86 linux box.

I find these numbers very disturbing.  First, just looking at the
discrepency between 3 and 4.  I'd imagine this happens because in (3), the
"let b = false in if not b..." gets folded down to just "cnt := !cnt + 1".  
To test, this I also ran:

let _ = 
  let f i b = cnt := !cnt + 1 in
  let g i b = () in
    begin
      let i = 0 in let b = true in
	g i b;
	
	for i=1 to top do
	  let b = false in
	    f i b;
	done;
	
	let i = -1 in let b = true in
	  g i b;
    end

where the unfolding is done manually.  This implementation came in at:

Time 5 = 3.570000 sec

This seems to imply that I will triple the speed of my program by making
it (a) non-functional and (b) unreadable.  Is this true?  Is there another
way to get around this problem?

 - Hal


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [Caml-list] Re: speed, loops vs. hofs
  2003-04-28 17:13 [Caml-list] speed, loops vs. hofs Hal Daume III
@ 2003-04-29 14:39 ` Andrzej M. Ostruszka
  2003-04-30  0:45 ` [Caml-list] " Manos Renieris
  1 sibling, 0 replies; 3+ messages in thread
From: Andrzej M. Ostruszka @ 2003-04-29 14:39 UTC (permalink / raw)
  To: Caml Mailing List

On Mon, Apr 28 (2003), Hal Daume III wrote:
[...]
> I find these numbers very disturbing.  First, just looking at the
> discrepency between 3 and 4.  I'd imagine this happens because in (3), the
> "let b = false in if not b..." gets folded down to just "cnt := !cnt + 1".  

<< Disclaimer: This is an answer of a newbie :)) >>

I haven't checked the assembler output but it looks like the
>   let f i b = if not b then cnt := !cnt + 1 in
is not inlined in the 4th version -- a bit surprising that default
inlining level is that conservative.

I've cut&pasted the 3rd and 4th functions into the test.ml:
[amo@order ostruszk]$ ocamlopt test.ml
[amo@order ostruszk]$ ./a.out
Time 3 = 1.050000 sec
Time 4 = 3.090000 sec
[amo@order ostruszk]$ ocamlopt -inline 2 test.ml
[amo@order ostruszk]$ ./a.out
Time 3 = 1.050000 sec
Time 4 = 1.050000 sec

As to the module version: it is impossible to inline unknown function.
Functions passed in arguments are unknown to the compiler -- unless
you've got some "whole program analyser" that checks every use of your
loop function -- so every call will be done via the function pointer.

I hope I haven't messed up anything and this will be of any help to you
:)
						Best regards
-- 
    ____   _  ___
   /  | \_/ |/ _ \		Andrzej Marek Ostruszka
  / _ |     | (_) | Instytut Fizyki, Uniwersytet Jagiellonski (Cracow)
 /_/ L|_|V|_|\___/	(PGP <-- finger ostruszk@order.if.uj.edu.pl)

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Caml-list] speed, loops vs. hofs
  2003-04-28 17:13 [Caml-list] speed, loops vs. hofs Hal Daume III
  2003-04-29 14:39 ` [Caml-list] " Andrzej M. Ostruszka
@ 2003-04-30  0:45 ` Manos Renieris
  1 sibling, 0 replies; 3+ messages in thread
From: Manos Renieris @ 2003-04-30  0:45 UTC (permalink / raw)
  To: Hal Daume III; +Cc: Caml Mailing List

On Mon, Apr 28, 2003 at 10:13:17AM -0700, Hal Daume III wrote:
> One of the primary reasons I use OCaml is because it offers speed,
> together with nice functional stuff, like HOFs.  However, it seems that
> using these together might be impossible.
> 
> I have a module which allows iteration over a particular datatype.  I have
> a method of doing this iteration very quickly using for loops.  In
> particular, I have a function of type:
> 
>   loopPre : hmm -> node -> (node -> int -> bool -> unit) -> unit
> 
[...]
>  Then, at the usage point, I do something like:
> 
>   ...1...
>   loopPre hmm node_j
>     (fun nd len null ->
>       ...2...
>     )
>   ...3...
> 
> I had a sneaking feeling that if I actually stuck the for loops into the
> original code I would get speedups.  However, "...2..." is pretty long, so

If "...2..." is long, then the cost of the function call is going to 
insignificant wrt the cost of running "...2...". In the examples you
wrote, "...2..." is very short, so its cost is comparable to the cost of 
function calls, except ocamlopt is smarter than that and inlines most of them.

There is a chance that after the inlinining there is enough context to
optimize the inlined function bodies further, but I don't know the
innards of the compiler well enough to say if this happens or not.

-- Manos

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2003-04-30  0:45 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-04-28 17:13 [Caml-list] speed, loops vs. hofs Hal Daume III
2003-04-29 14:39 ` [Caml-list] " Andrzej M. Ostruszka
2003-04-30  0:45 ` [Caml-list] " Manos Renieris

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).