caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Re: [Caml-list] Re: Binary logarithm of a power of 2
@ 2011-05-27 17:21 Dario Teixeira
  2011-05-28  5:29 ` Gabriel Kerneis
  0 siblings, 1 reply; 13+ messages in thread
From: Dario Teixeira @ 2011-05-27 17:21 UTC (permalink / raw)
  To: caml-list, Sylvain Le Gall

Hi,

> The code is not bug proof, you should check it ;-)

Yeap, there's one bug which manifests itself on whole-byte boundaries
(ex: logbin_array 512l, logbin_array 65536l)...

> 
> You can except at least 18% increase in term of performance
> on amd64 and a 1% decrease on i386. 
 
In my synthetic benchmark it behaves as well as my initial C-based
but "alloc-happy" solution, so congratulations!  (Though it is of
course slower than the C-based "noalloc" version).

Incidentally, I found a web page with lots of cool bit-twiddling hacks,
which includes several solutions to the binary logarithm problem:
http://graphics.stanford.edu/~seander/bithacks.html

Cheers,
Dario



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] Re: Binary logarithm of a power of 2
  2011-05-27 17:21 [Caml-list] Re: Binary logarithm of a power of 2 Dario Teixeira
@ 2011-05-28  5:29 ` Gabriel Kerneis
  2011-05-28 16:17   ` Norman Hardy
  0 siblings, 1 reply; 13+ messages in thread
From: Gabriel Kerneis @ 2011-05-28  5:29 UTC (permalink / raw)
  To: Dario Teixeira; +Cc: caml-list, Sylvain Le Gall

On Fri, May 27, 2011 at 10:21:09AM -0700, Dario Teixeira wrote:
> Incidentally, I found a web page with lots of cool bit-twiddling hacks,
> which includes several solutions to the binary logarithm problem:
> http://graphics.stanford.edu/~seander/bithacks.html

Some more of them:
http://www.cl.cam.ac.uk/~am21/hakmemc.html

Those are the (famous) HAKMEM tricks, translated to C by Alan Mycroft.
Probably of no practical use, but learning what the fixed point of the float
function was on a PDP-10 is priceless...

Best,
-- 
Gabriel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] Re: Binary logarithm of a power of 2
  2011-05-28  5:29 ` Gabriel Kerneis
@ 2011-05-28 16:17   ` Norman Hardy
  0 siblings, 0 replies; 13+ messages in thread
From: Norman Hardy @ 2011-05-28 16:17 UTC (permalink / raw)
  To: Gabriel Kerneis; +Cc: Dario Teixeira, caml-list, Sylvain Le Gall


On 2011 May 27, at 22:29 , Gabriel Kerneis wrote:

> On Fri, May 27, 2011 at 10:21:09AM -0700, Dario Teixeira wrote:
>> Incidentally, I found a web page with lots of cool bit-twiddling hacks,
>> which includes several solutions to the binary logarithm problem:
>> http://graphics.stanford.edu/~seander/bithacks.html
> 
> Some more of them:
> http://www.cl.cam.ac.uk/~am21/hakmemc.html
> 
> Those are the (famous) HAKMEM tricks, translated to C by Alan Mycroft.
> Probably of no practical use, but learning what the fixed point of the float
> function was on a PDP-10 is priceless...
> 
> Best,
> -- 
> Gabriel
> 
> -- 
> Caml-list mailing list.  Subscription management and archives:
> https://sympa-roc.inria.fr/wws/info/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
> 

In the spirit of HAKMEM here is a routine, in C, that counts the low zero bits of an integer less than 2^31:
http://cap-lore.com/code/fb/
In machine language it might require just 7 instructions with no branches.
I think that it cannot be improved to handle input greater than 2^31.
A 63 bit version should be possible.



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] Re: Binary logarithm of a power of 2
  2011-05-28 17:50       ` Matteo Frigo
@ 2011-05-30 11:48         ` Jean-Christophe Filliâtre
  0 siblings, 0 replies; 13+ messages in thread
From: Jean-Christophe Filliâtre @ 2011-05-30 11:48 UTC (permalink / raw)
  To: Matteo Frigo; +Cc: Xavier Leroy, caml-list


> Knuth Vol. 4A remarks that this is an old trick, invented but not
> published by some gentleman at IBM in the late sixties.  (Sorry, I don't
> have Knuth at hand for an exact reference.)  However the 1998 paper by
> Leiserson et al. should be available via google.

Another good reference along these lines: Henry Warren's "Hacker's
Delight" (http://www.hackersdelight.org/). My apologies if already
mentioned in this thread.

This book is wonderful and contains many ways to compute the number of
trailing zeros, among other things. (It is cited in volume 4A.)

-- 
Jean-Christophe

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] Re: Binary logarithm of a power of 2
  2011-05-27 17:15     ` Xavier Leroy
  2011-05-27 18:04       ` Dario Teixeira
@ 2011-05-28 17:50       ` Matteo Frigo
  2011-05-30 11:48         ` Jean-Christophe Filliâtre
  1 sibling, 1 reply; 13+ messages in thread
From: Matteo Frigo @ 2011-05-28 17:50 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: caml-list

A way exists to compute the binary logarithm with one multiplication and
a couple of shifts.  See Charles E. Leiserson, Harald Prokop, and Keith
H. Randall: Using de Bruijn Sequences to Index a 1 in a Computer Word.

The idea is that a magic 64-bit string of zeros and ones exists such
that all possible strings of 6 bits appear as substrings of the string.
Such a sequence is called a "de Bruijn Sequence".  Let K be such a
string interpreted as an int64.  Then K * 2^n shifts K to the left by n
places, leaving a unique encoding of n in the upper 6 bits of the
product.

Knuth Vol. 4A remarks that this is an old trick, invented but not
published by some gentleman at IBM in the late sixties.  (Sorry, I don't
have Knuth at hand for an exact reference.)  However the 1998 paper by
Leiserson et al. should be available via google.

Regards,
Matteo Frigo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] Re: Binary logarithm of a power of 2
  2011-05-28 15:58         ` Richard W.M. Jones
@ 2011-05-28 16:04           ` Edgar Friendly
  0 siblings, 0 replies; 13+ messages in thread
From: Edgar Friendly @ 2011-05-28 16:04 UTC (permalink / raw)
  To: caml-list

[-- Attachment #1: Type: text/plain, Size: 678 bytes --]

On 05/28/2011 11:58 AM, Richard W.M. Jones wrote:
> It's long been on my OCaml wishlist (along with a 'return' statement)
> to be able to use inline asm.
Inline asm might not be here, but there's two similar implementations of 
'return', one in Batteries and one in Jane Street Core.

http://ocaml.janestreet.com/?q=node/91
http://ocaml-batteries-team.github.com/batteries-included/hdoc/BatReturn.html

Example (using BatReturn):

|    open Return
    let find_in_array a e =
     label (fun label ->
     for i = 0 to Array.length a - 1 do
       if Array.get a i = e then return label (Some i)
     done;
     None)|


Sharing this in case anyone didn't know about these.

E.

[-- Attachment #2: Type: text/html, Size: 1339 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] Re: Binary logarithm of a power of 2
  2011-05-27 18:04       ` Dario Teixeira
  2011-05-27 18:37         ` Till Varoquaux
@ 2011-05-28 15:58         ` Richard W.M. Jones
  2011-05-28 16:04           ` Edgar Friendly
  1 sibling, 1 reply; 13+ messages in thread
From: Richard W.M. Jones @ 2011-05-28 15:58 UTC (permalink / raw)
  To: Dario Teixeira; +Cc: caml-list

On Fri, May 27, 2011 at 11:04:16AM -0700, Dario Teixeira wrote:
> Note that __builtin_ctz actually translates into a single opcode where
> available (BSFL in x86_64), and I expect that a modern CPU will do a
> decent job with it.  Therefore, despite that a C-based solution will
> most likely prevent inlining (right?), it may be hard to beat...

It's long been on my OCaml wishlist (along with a 'return' statement)
to be able to use inline asm.

Rich.

-- 
Richard Jones
Red Hat

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [Caml-list] Re: Binary logarithm of a power of 2
  2011-05-27 18:46           ` Till Varoquaux
@ 2011-05-28  0:28             ` Harrison, John R
  0 siblings, 0 replies; 13+ messages in thread
From: Harrison, John R @ 2011-05-28  0:28 UTC (permalink / raw)
  To: Till Varoquaux, Dario Teixeira; +Cc: caml-list

The following are my pet algorithms for essentially the same problem
of counting leading zeros, and the dual problem of counting trailing
zeros. It should be a routine exercise to translate to OCaml and tweak
the numbers for binary logarithm.

I doubt if these are as fast in practice as some of the other
well-known solutions. However they have the interesting feature that
given infinite parallelism and arranging the sums in a balanced tree,
they can be log(log(N)) + 2 in terms of latencies, rather than the
usual log(N), where N is the number of bits in a word. That is, all
the terms in the sum can be evaluated independently. As such, they
are much more like how the operation might be done in hardware.

// Counting leading zeros in an unsigned 32-bit word
// The "_nz" version will return the wrong answer (31) for zero inputs

#define CLZ32_MASK16 0xFFFF0000ul
#define CLZ32_MASK8  0xFF00FF00ul
#define CLZ32_MASK4  0xF0F0F0F0ul
#define CLZ32_MASK2  0xCCCCCCCCul
#define CLZ32_MASK1  0xAAAAAAAAul

#define clz32_nz(n)                                             \
 (((((n) & CLZ32_MASK16) <= ((n) & ~CLZ32_MASK16)) ? 16 : 0) +  \
  ((((n) & CLZ32_MASK8) <= ((n) & ~CLZ32_MASK8)) ? 8 : 0) +     \
  ((((n) & CLZ32_MASK4) <= ((n) & ~CLZ32_MASK4)) ? 4 : 0) +     \
  ((((n) & CLZ32_MASK2) <= ((n) & ~CLZ32_MASK2)) ? 2 : 0) +     \
  ((((n) & CLZ32_MASK1) <= ((n) & ~CLZ32_MASK1)) ? 1 : 0))

#define clz32(n) (((n)==0) ? 32 : clz32_nz(n))

// Counting trailing zeros in an unsigned 32-bit word
// The ctz32_1bit version is for a single bit

#define ctz32_1bit(n)                                           \
 ((((n) & ~CLZ32_MASK16) ? 0 : 16) +                            \
  (((n) & ~CLZ32_MASK8) ? 0 : 8) +                              \
  (((n) & ~CLZ32_MASK4) ? 0 : 4) +                              \
  (((n) & ~CLZ32_MASK2) ? 0 : 2) +                              \
  (((n) & ~CLZ32_MASK1) ? 0 : 1))

#define ctz32(n) (((n) == 0) ? 32 : ctz32_1bit((n) & -(n)))

John.

-----Original Message-----
From: till.varoquaux@gmail.com [mailto:till.varoquaux@gmail.com] On Behalf Of Till Varoquaux
Sent: Friday, May 27, 2011 11:46 AM
To: Dario Teixeira
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Re: Binary logarithm of a power of 2

Nevermind. I am actually running slower, bad benchmark.

Shameful,
Till

On Fri, May 27, 2011 at 2:37 PM, Till Varoquaux <till@pps.jussieu.fr> wrote:
> And you can probably gain on XL by dropping down to ints and not using refs:
>
> let lb5 n i = if n >= 0x2 then i+1 else i
>
> let lb4 n i =
>  if n >= 0x4 then lb5 (n asr 2) (i+2)
>  else lb5 n i
>
> let lb3 n i =
>  if n >= 0x10 then lb4 (n asr 4) (i+4)
>  else lb4 n i
>
> let lb2 n i =
>  if n >= 0x100 then lb3 (n asr 8) (i+8)
>  else lb3 n i
>
> let logbin n =
>  if n >= 0x10000 then lb2 (n asr 16) 16
>  else lb2 n 0
>
> (note that you can drop to ints right in logbin itself if you really
> need your input to be int32s and you aren't on a 32 bit machine).
>
> This is thoroughly untested.
>
> Till
>
> On Fri, May 27, 2011 at 2:04 PM, Dario Teixeira <darioteixeira@yahoo.com> wrote:
>> Hi,
>>
>>> Simpler and maybe faster:
>>>
>>> let logbin (n: int32) =
>>>   let n = ref (Int32.add n 0l) and i = ref 0 in
>>>   if !n >= 0x10000l then (n := Int32.shift_right !n 16; i := 16);
>>>   if !n >= 0x100l then (n := Int32.shift_right !n 8; i := !i + 8);
>>>   if !n >= 0x10l then (n := Int32.shift_right !n 4; i := !i + 4);
>>>   if !n >= 0x4l then (n := Int32.shift_right !n 2; i := !i + 2);
>>>   if !n >= 0x2l then !i + 1 else !i
>>
>> Thanks!  In my synthetic benchmark, this pure Ocaml solution comes very
>> close to the C-based "noalloc" one (the record holder so far), at least
>> on an x86_64.
>>
>> Note that __builtin_ctz actually translates into a single opcode where
>> available (BSFL in x86_64), and I expect that a modern CPU will do a
>> decent job with it.  Therefore, despite that a C-based solution will
>> most likely prevent inlining (right?), it may be hard to beat...
>>
>> Cheers,
>> Dario Teixeira
>>
>>
>>
>> --
>> Caml-list mailing list.  Subscription management and archives:
>> https://sympa-roc.inria.fr/wws/info/caml-list
>> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>> Bug reports: http://caml.inria.fr/bin/caml-bugs
>>
>>
>


-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] Re: Binary logarithm of a power of 2
  2011-05-27 18:37         ` Till Varoquaux
@ 2011-05-27 18:46           ` Till Varoquaux
  2011-05-28  0:28             ` Harrison, John R
  0 siblings, 1 reply; 13+ messages in thread
From: Till Varoquaux @ 2011-05-27 18:46 UTC (permalink / raw)
  To: Dario Teixeira; +Cc: caml-list

Nevermind. I am actually running slower, bad benchmark.

Shameful,
Till

On Fri, May 27, 2011 at 2:37 PM, Till Varoquaux <till@pps.jussieu.fr> wrote:
> And you can probably gain on XL by dropping down to ints and not using refs:
>
> let lb5 n i = if n >= 0x2 then i+1 else i
>
> let lb4 n i =
>  if n >= 0x4 then lb5 (n asr 2) (i+2)
>  else lb5 n i
>
> let lb3 n i =
>  if n >= 0x10 then lb4 (n asr 4) (i+4)
>  else lb4 n i
>
> let lb2 n i =
>  if n >= 0x100 then lb3 (n asr 8) (i+8)
>  else lb3 n i
>
> let logbin n =
>  if n >= 0x10000 then lb2 (n asr 16) 16
>  else lb2 n 0
>
> (note that you can drop to ints right in logbin itself if you really
> need your input to be int32s and you aren't on a 32 bit machine).
>
> This is thoroughly untested.
>
> Till
>
> On Fri, May 27, 2011 at 2:04 PM, Dario Teixeira <darioteixeira@yahoo.com> wrote:
>> Hi,
>>
>>> Simpler and maybe faster:
>>>
>>> let logbin (n: int32) =
>>>   let n = ref (Int32.add n 0l) and i = ref 0 in
>>>   if !n >= 0x10000l then (n := Int32.shift_right !n 16; i := 16);
>>>   if !n >= 0x100l then (n := Int32.shift_right !n 8; i := !i + 8);
>>>   if !n >= 0x10l then (n := Int32.shift_right !n 4; i := !i + 4);
>>>   if !n >= 0x4l then (n := Int32.shift_right !n 2; i := !i + 2);
>>>   if !n >= 0x2l then !i + 1 else !i
>>
>> Thanks!  In my synthetic benchmark, this pure Ocaml solution comes very
>> close to the C-based "noalloc" one (the record holder so far), at least
>> on an x86_64.
>>
>> Note that __builtin_ctz actually translates into a single opcode where
>> available (BSFL in x86_64), and I expect that a modern CPU will do a
>> decent job with it.  Therefore, despite that a C-based solution will
>> most likely prevent inlining (right?), it may be hard to beat...
>>
>> Cheers,
>> Dario Teixeira
>>
>>
>>
>> --
>> Caml-list mailing list.  Subscription management and archives:
>> https://sympa-roc.inria.fr/wws/info/caml-list
>> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>> Bug reports: http://caml.inria.fr/bin/caml-bugs
>>
>>
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] Re: Binary logarithm of a power of 2
  2011-05-27 18:04       ` Dario Teixeira
@ 2011-05-27 18:37         ` Till Varoquaux
  2011-05-27 18:46           ` Till Varoquaux
  2011-05-28 15:58         ` Richard W.M. Jones
  1 sibling, 1 reply; 13+ messages in thread
From: Till Varoquaux @ 2011-05-27 18:37 UTC (permalink / raw)
  To: Dario Teixeira; +Cc: caml-list

And you can probably gain on XL by dropping down to ints and not using refs:

let lb5 n i = if n >= 0x2 then i+1 else i

let lb4 n i =
  if n >= 0x4 then lb5 (n asr 2) (i+2)
  else lb5 n i

let lb3 n i =
  if n >= 0x10 then lb4 (n asr 4) (i+4)
  else lb4 n i

let lb2 n i =
  if n >= 0x100 then lb3 (n asr 8) (i+8)
  else lb3 n i

let logbin n =
  if n >= 0x10000 then lb2 (n asr 16) 16
  else lb2 n 0

(note that you can drop to ints right in logbin itself if you really
need your input to be int32s and you aren't on a 32 bit machine).

This is thoroughly untested.

Till

On Fri, May 27, 2011 at 2:04 PM, Dario Teixeira <darioteixeira@yahoo.com> wrote:
> Hi,
>
>> Simpler and maybe faster:
>>
>> let logbin (n: int32) =
>>   let n = ref (Int32.add n 0l) and i = ref 0 in
>>   if !n >= 0x10000l then (n := Int32.shift_right !n 16; i := 16);
>>   if !n >= 0x100l then (n := Int32.shift_right !n 8; i := !i + 8);
>>   if !n >= 0x10l then (n := Int32.shift_right !n 4; i := !i + 4);
>>   if !n >= 0x4l then (n := Int32.shift_right !n 2; i := !i + 2);
>>   if !n >= 0x2l then !i + 1 else !i
>
> Thanks!  In my synthetic benchmark, this pure Ocaml solution comes very
> close to the C-based "noalloc" one (the record holder so far), at least
> on an x86_64.
>
> Note that __builtin_ctz actually translates into a single opcode where
> available (BSFL in x86_64), and I expect that a modern CPU will do a
> decent job with it.  Therefore, despite that a C-based solution will
> most likely prevent inlining (right?), it may be hard to beat...
>
> Cheers,
> Dario Teixeira
>
>
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa-roc.inria.fr/wws/info/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] Re: Binary logarithm of a power of 2
  2011-05-27 17:15     ` Xavier Leroy
@ 2011-05-27 18:04       ` Dario Teixeira
  2011-05-27 18:37         ` Till Varoquaux
  2011-05-28 15:58         ` Richard W.M. Jones
  2011-05-28 17:50       ` Matteo Frigo
  1 sibling, 2 replies; 13+ messages in thread
From: Dario Teixeira @ 2011-05-27 18:04 UTC (permalink / raw)
  To: caml-list

Hi,

> Simpler and maybe faster:
> 
> let logbin (n: int32) =
>   let n = ref (Int32.add n 0l) and i = ref 0 in
>   if !n >= 0x10000l then (n := Int32.shift_right !n 16; i := 16);
>   if !n >= 0x100l then (n := Int32.shift_right !n 8; i := !i + 8);
>   if !n >= 0x10l then (n := Int32.shift_right !n 4; i := !i + 4);
>   if !n >= 0x4l then (n := Int32.shift_right !n 2; i := !i + 2);
>   if !n >= 0x2l then !i + 1 else !i

Thanks!  In my synthetic benchmark, this pure Ocaml solution comes very
close to the C-based "noalloc" one (the record holder so far), at least
on an x86_64.

Note that __builtin_ctz actually translates into a single opcode where
available (BSFL in x86_64), and I expect that a modern CPU will do a
decent job with it.  Therefore, despite that a C-based solution will
most likely prevent inlining (right?), it may be hard to beat...

Cheers,
Dario Teixeira



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] Re: Binary logarithm of a power of 2
  2011-05-27 13:26   ` [Caml-list] " Sylvain Le Gall
@ 2011-05-27 17:15     ` Xavier Leroy
  2011-05-27 18:04       ` Dario Teixeira
  2011-05-28 17:50       ` Matteo Frigo
  0 siblings, 2 replies; 13+ messages in thread
From: Xavier Leroy @ 2011-05-27 17:15 UTC (permalink / raw)
  To: caml-list

On 05/27/2011 03:26 PM, Sylvain Le Gall wrote:

> Friday afternoon exercice ;-)

Same here :-)

> Here is my code:

[ Recursive & array-based implementation in Caml ]

Simpler and maybe faster:

let logbin (n: int32) =
  let n = ref (Int32.add n 0l) and i = ref 0 in
  if !n >= 0x10000l then (n := Int32.shift_right !n 16; i := 16);
  if !n >= 0x100l then (n := Int32.shift_right !n 8; i := !i + 8);
  if !n >= 0x10l then (n := Int32.shift_right !n 4; i := !i + 4);
  if !n >= 0x4l then (n := Int32.shift_right !n 2; i := !i + 2);
  if !n >= 0x2l then !i + 1 else !i

Have fun,

- Xavier

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Caml-list] Re: Binary logarithm of a power of 2
  2011-05-27 11:50 ` Dario Teixeira
@ 2011-05-27 13:26   ` Sylvain Le Gall
  2011-05-27 17:15     ` Xavier Leroy
  0 siblings, 1 reply; 13+ messages in thread
From: Sylvain Le Gall @ 2011-05-27 13:26 UTC (permalink / raw)
  To: caml-list

Hello,

On 27-05-2011, Dario Teixeira <darioteixeira@yahoo.com> wrote:
> Hi,
>  
>> Still, my question is whether this is indeed the optimal solution.
>> Is there some penalty associated with invoking C functions that may
>> negate the supposed advantage of __builtin_ctz?
>
> Thank you, Till and Goswin, for your suggestions.  Marking the stub
> as "noalloc" and removing the CAML_ boilerplate did result in a very
> noticeable speed up in the synthetic benchmarks I tried.  Great!
>

Friday afternoon exercice ;-)

Here is my code:

external logbin: int32 -> int = "logbin_stub" "noalloc"

let logbin_simple i =
  let rec find_bit n =
    if (i lsr n) = 0 then
      n - 1
    else
      find_bit (n + 1)
  in
    if i = 0 then
      0
    else
      find_bit 0

let precomp =
  Array.init
    256
    logbin_simple

let logbin_array i32 =
  let i = Int32.to_int i32 in
  let rec test shift =
    let i =
      0xFF land (i lsr shift)
    in
      match Array.unsafe_get precomp i with
        | 0 ->
            if shift = 0 then
              0
            else
              test (shift - 8)
        | n -> n + shift
  in

  let test32 shift =
    let i =
      Int32.to_int
        (Int32.logand
           0xFFl
           (Int32.shift_right i32 shift))
    in
      match Array.unsafe_get precomp i with
        | 0 ->
            test (shift - 8)
        | n -> n + shift
  in
    if Sys.word_size = 64 then
      test 24
    else
      test32 24

let () =
  List.iter
    (fun i ->
       Printf.printf "logbin_simple %d -> %d\n" i (logbin_simple i);
       Printf.printf "logbin_array  %d -> %d\n" i (logbin_array (Int32.of_int i));
       Printf.printf "logbin        %d -> %d\n" i (logbin (Int32.of_int i));
       Printf.printf "\n\n";
    )
    [ 0; 1; 2; 4; 1024; 2048; 4096; 4194304]


let () =
  Benchmark.tabulate
    (Benchmark.throughputN
       2
       [
         "logbin", (fun () -> logbin 4096l), ();
         "logbin_simple", (fun () -> logbin_simple 4096), ();
         "logbin_array",  (fun () -> logbin_array 4096l), ();
       ])


And the results:

Throughputs for "logbin", "logbin_simple", "logbin_array" each running for at least 2 CPU seconds:
       logbin:  2.14 WALL ( 2.14 usr +  0.00 sys =  2.14 CPU) @ 74191111.86/s (n=158778921)
logbin_simple:  2.02 WALL ( 2.02 usr +  0.00 sys =  2.02 CPU) @ 27884695.31/s (n=56330598)
 logbin_array:  2.05 WALL ( 2.05 usr +  0.00 sys =  2.05 CPU) @ 87426949.77/s (n=179411379)
                    Rate logbin_simple        logbin  logbin_array
logbin_simple 27884695/s            --          -62%          -68%
       logbin 74191112/s          166%            --          -15%
 logbin_array 87426950/s          214%           18%            --


The code is not bug proof, you should check it ;-)

You can except at least 18% increase in term of performance on amd64 and a 1% decrease on i386. 

The 4096 value is not very good for logbin_array because it have to loop more
times, so on greater value, you can expect better performance.

Cheers,
Sylvain Le Gall
-- 
My company: http://www.ocamlcore.com
Linkedin:   http://fr.linkedin.com/in/sylvainlegall
Start an OCaml project here: http://forge.ocamlcore.org
OCaml blogs:                 http://planet.ocamlcore.org



^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2011-05-30 11:44 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-27 17:21 [Caml-list] Re: Binary logarithm of a power of 2 Dario Teixeira
2011-05-28  5:29 ` Gabriel Kerneis
2011-05-28 16:17   ` Norman Hardy
  -- strict thread matches above, loose matches on Subject: below --
2011-05-26 15:51 [Caml-list] " Dario Teixeira
2011-05-27 11:50 ` Dario Teixeira
2011-05-27 13:26   ` [Caml-list] " Sylvain Le Gall
2011-05-27 17:15     ` Xavier Leroy
2011-05-27 18:04       ` Dario Teixeira
2011-05-27 18:37         ` Till Varoquaux
2011-05-27 18:46           ` Till Varoquaux
2011-05-28  0:28             ` Harrison, John R
2011-05-28 15:58         ` Richard W.M. Jones
2011-05-28 16:04           ` Edgar Friendly
2011-05-28 17:50       ` Matteo Frigo
2011-05-30 11:48         ` Jean-Christophe Filliâtre

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).