caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] Interesting optimization
@ 2004-07-22 21:07 Daniel Bünzli
  2004-07-22 21:40 ` John Prevost
  0 siblings, 1 reply; 4+ messages in thread
From: Daniel Bünzli @ 2004-07-22 21:07 UTC (permalink / raw)
  To: caml-list

Hello,

I usually try not to be too much obsessed with speed, but I had the 
following interesting experience. While rearanging some checksum code I 
thought that I had rewritten it in a more efficient way. However I 
turned out not to be the case.

I can boil the example to the following (n.b. loops don't compute 
anything usefull). Basically I rewrote update into update'.

--- test.ml ---
let update c =
   let c' = ref !c in
   for n = 0 to max_int do
     c' := !c' land 0xff
   done;
   c := !c'

let update' c =
   for n = 0 to max_int do
     c := !c land 0xff;
   done

let compute use_ref =
   let x = ref 2 in
   if use_ref then update' x else update x;
   print_int !x

let main () =
   let use_ref = ref false in
   let args = [("-ref", (Arg.Set use_ref), "use reference directly")] in
   Arg.parse args (fun _ -> ()) "";
   compute !use_ref

let () = main ()
---------------

 > ocamlopt -o test.opt test.ml
 > time ./test.opt
2
real    0m3.500s
user    0m3.230s
sys     0m0.010s
 > time ./test.opt -ref
2
real    0m7.599s
user    0m7.550s
sys     0m0.030s

The few that I can read of ppc assembly tells me that in update the 
value of c' is directly stored in a register whearas update' accesses 
memory on each iteration.

Note that this is not restricted to int's, it occured to me with an 
int32. I guess it should work with anything that gets into a register.

Cheers,

Daniel

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Caml-list] Interesting optimization
  2004-07-22 21:07 [Caml-list] Interesting optimization Daniel Bünzli
@ 2004-07-22 21:40 ` John Prevost
  2004-07-23  0:10   ` Brian Hurt
  0 siblings, 1 reply; 4+ messages in thread
From: John Prevost @ 2004-07-22 21:40 UTC (permalink / raw)
  To: Ocaml Mailing List

On Thu, 22 Jul 2004 23:07:03 +0200, Daniel Bünzli
<daniel.buenzli@epfl.ch> wrote:
> I usually try not to be too much obsessed with speed, but I had the
> following interesting experience. While rearanging some checksum code I
> thought that I had rewritten it in a more efficient way. However I
> turned out not to be the case.
> 
> I can boil the example to the following (n.b. loops don't compute
> anything usefull). Basically I rewrote update into update'.

Here's a slight addition to your comparison, which may or may not be
of interest to you.  I added this function:

let update'' c =
  let rec loop c' a =
    if a >= 0
      then loop (c' land 0xff) (succ a)
      else c'
  in
  c := (loop !c 0)

Basically, the same thing as your update, only using a tail-recursive
loop with an argument accumulator instead of using a for loop with a
reference accumulator.  Note that I had to dodge a little on the loop
end condition, since a will never be > max_int, which would be the
normal test for the end of the loop.

While I'll pretty much always prefer the tail-call version (which
avoids references altogether), the fact that the reference has so
little impact when used in the way you described is very nice to know.

using update:
real    0m9.707s
user    0m9.710s
sys     0m0.010s

using update':
real    0m16.172s
user    0m16.140s
sys     0m0.030s

using update'':
real    0m8.321s
user    0m8.320s
sys     0m0.000s

John.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Caml-list] Interesting optimization
  2004-07-22 21:40 ` John Prevost
@ 2004-07-23  0:10   ` Brian Hurt
  2004-07-23  1:22     ` John Prevost
  0 siblings, 1 reply; 4+ messages in thread
From: Brian Hurt @ 2004-07-23  0:10 UTC (permalink / raw)
  To: John Prevost; +Cc: Ocaml Mailing List

On Thu, 22 Jul 2004, John Prevost wrote:

> let update'' c =
>   let rec loop c' a =
>     if a >= 0
>       then loop (c' land 0xff) (succ a)
>       else c'
>   in
>   c := (loop !c 0)


let update''' c =
    let rec loop c i =
        let c = c land 0xff in
        if (i < max_int) then
            loop c (i+1)
        else
            c
    in
    loop c 0
;;

Haven't tried timing it.  But the core code on the x86 becomes:
Temp__loop_59:
.L101:
        andl    $511, %eax
        movl    $2147483647, %ecx
        cmpl    %ecx, %ebx
        jge     .L100
        addl    $2, %ebx
        jmp     .L101
        .align  16

I'm slightly disappointed that Ocaml didn't just
	cmpl	$2147483647, %ebx
and not use ecx at all.  Not that this is that big a problem.  The only 
other possible optimizations would be to recognize that the entire loop is 
pointless, and the function can be replaced by:
let update c = c land 0xff;;

I feel comfortable leaving that particular optimization in the hands of 
the programmer, however.

-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford 
Brian

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Caml-list] Interesting optimization
  2004-07-23  0:10   ` Brian Hurt
@ 2004-07-23  1:22     ` John Prevost
  0 siblings, 0 replies; 4+ messages in thread
From: John Prevost @ 2004-07-23  1:22 UTC (permalink / raw)
  To: Ocaml Mailing List

update''' is a bit slower than update'' is on my system--midway
between the two reference versions:

using update''':
real    0m12.940s
user    0m12.930s
sys     0m0.010s

Just for reference, here's the generated intel assembly for the loop
part of each version:

update:
.L106:
        andl    $511, %esi
        movl    %ecx, %edx
        addl    $2, %ecx
        movl    $2147483647, %ebx                    (1)
        cmpl    %ebx, %edx
        jne     .L106

update':
.L109:
        movl    (%eax), %ebx
        andl    $511, %ebx
        movl    %ebx, (%eax)
        movl    %ecx, %edx
        addl    $2, %ecx
        movl    $2147483647, %ebx
        cmpl    %ebx, %edx
        jne     .L109

update'':
camlTest__loop_72:
.L101:
        cmpl    $1, %ebx
        jl      .L100
        addl    $2, %ebx
        andl    $511, %eax
        jmp     .L101

update''':
camlTest__loop_77:
.L103:
        andl    $511, %eax
        movl    $2147483647, %ecx
        cmpl    %ecx, %ebx
        jge     .L102
        addl    $2, %ebx
        jmp     .L103

Looking at the above, and comparing to two more versions:

let update'4 c =
  let rec loop c' a =
    if a = max_int
      then c'
      else loop (c' land 0xff) (succ a)
  in
  c := (loop !c 0)

camlTest__loop_83:
.L105:
        movl    $2147483647, %ecx
        cmpl    %ecx, %ebx
        jne     .L104
        ret
        .align  16
.L104:
        addl    $2, %ebx
        andl    $511, %eax
        jmp     .L105

let update'5 c =
  let c' = ref !c in
  for n = max_int downto 0 do
    c' := !c' land 0xff
  done;
  c := !c'

.L108:
        andl    $511, %edx
        movl    %ebx, %ecx         (3)
        subl    $2, %ebx
        cmpl    $1, %ecx           (2)
        jne     .L108


which has about the same performance as loop''', I feel justified in
saying the following:

The main interest here is what Daniel Bünzli originally noted: if a
reference is defined and then used in the local scope of a function,
you avoid ever actually working with the reference itself, which
removes that level of overhead.


Everything else is really all about squeezing the last ounce of
performance out, by as little as a single instruction.  There's really
not much profit in this, unless the loop in question is very important
and is known to be a bottleneck.  In fact, don't read anything more I
say unless you're really interested in that--unless you plan to look
at your own assembly code by hand, it's better to trust the compiler
than to second-guess it.

Specific notes from the above and from a couple of other exploratory functions:

a) if ... then ... else ... expressions bounding a tail loop should
have the result case in the "else" clause.  Putting the result case in
the "then" clause results in a little skip.  In essence, the compiler
assumes that the true case is the expected case.

b) match expressions seem to be man-handled more by the compiler.  You
can still predict what's going to happen, however.  As a general rule,
pattern matching works well when at least one case has an actual value
to match against.  If every pattern is a don't care and some have
guards, expect it to behave somewhat like the "if" bit above: the
first case is assumed to be the expected case.  When it matters and
you're not sure, look at the assembly.

c) as a special warning:

    match e with true -> e1 | false -> e2

is much less efficient than

    if e then e1 else e2


In short, if you know which branch is more expected, put it in the
"then" clause.  And trust that pattern matching is pretty smart unless
you do something really silly (like case c, or like using only guarded
don't-care patterns.)

John.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2004-07-23  1:23 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-07-22 21:07 [Caml-list] Interesting optimization Daniel Bünzli
2004-07-22 21:40 ` John Prevost
2004-07-23  0:10   ` Brian Hurt
2004-07-23  1:22     ` John Prevost

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).