From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104])
	by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id p4S0Svuv010588
	for <caml-list@sympa-roc.inria.fr>; Sat, 28 May 2011 02:28:57 +0200
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AicAAK9A4E2GhogYkWdsb2JhbABVhEmTKQ+NUm0UAQEBAQkLCwcUBSC2GJAygSuDbIEHBJUFhB6GMw
X-IronPort-AV: E=Sophos;i="4.65,283,1304287200"; 
   d="scan'208";a="84192893"
Received: from mga09.intel.com ([134.134.136.24])
  by mail3-smtp-sop.national.inria.fr with ESMTP; 28 May 2011 02:28:51 +0200
Received: from orsmga002.jf.intel.com ([10.7.209.21])
  by orsmga102.jf.intel.com with ESMTP; 27 May 2011 17:28:50 -0700
X-ExtLoop1: 1
Received: from orsmsx603.amr.corp.intel.com ([10.22.226.49])
  by orsmga002.jf.intel.com with ESMTP; 27 May 2011 17:28:50 -0700
Received: from orsmsx151.amr.corp.intel.com (10.22.226.38) by
 orsmsx603.amr.corp.intel.com (10.22.226.49) with Microsoft SMTP Server (TLS)
 id 8.2.255.0; Fri, 27 May 2011 17:28:49 -0700
Received: from orsmsx101.amr.corp.intel.com ([169.254.8.114]) by
 ORSMSX151.amr.corp.intel.com ([169.254.5.187]) with mapi id 14.01.0289.001;
 Fri, 27 May 2011 17:28:49 -0700
From: "Harrison, John R" <john.r.harrison@intel.com>
To: Till Varoquaux <till@pps.jussieu.fr>,
        Dario Teixeira
	<darioteixeira@yahoo.com>
CC: "caml-list@inria.fr" <caml-list@inria.fr>
Thread-Topic: [Caml-list] Re: Binary logarithm of a power of 2
Thread-Index: AQHMHHG9UggX4FLc4EO2AEB6uOzh0pShX7qAgAANtwCAAAlJAIAAAmiA///miVA=
Date: Sat, 28 May 2011 00:28:46 +0000
Message-ID: <BD6BC2F330F5FB42AE6C31E84D7B76D701074D@ORSMSX101.amr.corp.intel.com>
References: <4DDFDC1F.5050605@inria.fr>
	<165028.34675.qm@web111509.mail.gq1.yahoo.com>
	<BANLkTi=UPCy+ZErybor6aDUxvk1EkgsfpA@mail.gmail.com>
 <BANLkTim049Ufr2H9E=AKmRmaiXPmZLS_kQ@mail.gmail.com>
In-Reply-To: <BANLkTim049Ufr2H9E=AKmRmaiXPmZLS_kQ@mail.gmail.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [10.9.131.214]
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from base64 to 8bit by walapai.inria.fr id p4S0Svuv010588
Subject: RE: [Caml-list] Re: Binary logarithm of a power of 2

The following are my pet algorithms for essentially the same problem
of counting leading zeros, and the dual problem of counting trailing
zeros. It should be a routine exercise to translate to OCaml and tweak
the numbers for binary logarithm.

I doubt if these are as fast in practice as some of the other
well-known solutions. However they have the interesting feature that
given infinite parallelism and arranging the sums in a balanced tree,
they can be log(log(N)) + 2 in terms of latencies, rather than the
usual log(N), where N is the number of bits in a word. That is, all
the terms in the sum can be evaluated independently. As such, they
are much more like how the operation might be done in hardware.

// Counting leading zeros in an unsigned 32-bit word
// The "_nz" version will return the wrong answer (31) for zero inputs

#define CLZ32_MASK16 0xFFFF0000ul
#define CLZ32_MASK8  0xFF00FF00ul
#define CLZ32_MASK4  0xF0F0F0F0ul
#define CLZ32_MASK2  0xCCCCCCCCul
#define CLZ32_MASK1  0xAAAAAAAAul

#define clz32_nz(n)                                             \
 (((((n) & CLZ32_MASK16) <= ((n) & ~CLZ32_MASK16)) ? 16 : 0) +  \
  ((((n) & CLZ32_MASK8) <= ((n) & ~CLZ32_MASK8)) ? 8 : 0) +     \
  ((((n) & CLZ32_MASK4) <= ((n) & ~CLZ32_MASK4)) ? 4 : 0) +     \
  ((((n) & CLZ32_MASK2) <= ((n) & ~CLZ32_MASK2)) ? 2 : 0) +     \
  ((((n) & CLZ32_MASK1) <= ((n) & ~CLZ32_MASK1)) ? 1 : 0))

#define clz32(n) (((n)==0) ? 32 : clz32_nz(n))

// Counting trailing zeros in an unsigned 32-bit word
// The ctz32_1bit version is for a single bit

#define ctz32_1bit(n)                                           \
 ((((n) & ~CLZ32_MASK16) ? 0 : 16) +                            \
  (((n) & ~CLZ32_MASK8) ? 0 : 8) +                              \
  (((n) & ~CLZ32_MASK4) ? 0 : 4) +                              \
  (((n) & ~CLZ32_MASK2) ? 0 : 2) +                              \
  (((n) & ~CLZ32_MASK1) ? 0 : 1))

#define ctz32(n) (((n) == 0) ? 32 : ctz32_1bit((n) & -(n)))

John.

-----Original Message-----
From: till.varoquaux@gmail.com [mailto:till.varoquaux@gmail.com] On Behalf Of Till Varoquaux
Sent: Friday, May 27, 2011 11:46 AM
To: Dario Teixeira
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Re: Binary logarithm of a power of 2

Nevermind. I am actually running slower, bad benchmark.

Shameful,
Till

On Fri, May 27, 2011 at 2:37 PM, Till Varoquaux <till@pps.jussieu.fr> wrote:
> And you can probably gain on XL by dropping down to ints and not using refs:
>
> let lb5 n i = if n >= 0x2 then i+1 else i
>
> let lb4 n i =
>  if n >= 0x4 then lb5 (n asr 2) (i+2)
>  else lb5 n i
>
> let lb3 n i =
>  if n >= 0x10 then lb4 (n asr 4) (i+4)
>  else lb4 n i
>
> let lb2 n i =
>  if n >= 0x100 then lb3 (n asr 8) (i+8)
>  else lb3 n i
>
> let logbin n =
>  if n >= 0x10000 then lb2 (n asr 16) 16
>  else lb2 n 0
>
> (note that you can drop to ints right in logbin itself if you really
> need your input to be int32s and you aren't on a 32 bit machine).
>
> This is thoroughly untested.
>
> Till
>
> On Fri, May 27, 2011 at 2:04 PM, Dario Teixeira <darioteixeira@yahoo.com> wrote:
>> Hi,
>>
>>> Simpler and maybe faster:
>>>
>>> let logbin (n: int32) =
>>>   let n = ref (Int32.add n 0l) and i = ref 0 in
>>>   if !n >= 0x10000l then (n := Int32.shift_right !n 16; i := 16);
>>>   if !n >= 0x100l then (n := Int32.shift_right !n 8; i := !i + 8);
>>>   if !n >= 0x10l then (n := Int32.shift_right !n 4; i := !i + 4);
>>>   if !n >= 0x4l then (n := Int32.shift_right !n 2; i := !i + 2);
>>>   if !n >= 0x2l then !i + 1 else !i
>>
>> Thanks!  In my synthetic benchmark, this pure Ocaml solution comes very
>> close to the C-based "noalloc" one (the record holder so far), at least
>> on an x86_64.
>>
>> Note that __builtin_ctz actually translates into a single opcode where
>> available (BSFL in x86_64), and I expect that a modern CPU will do a
>> decent job with it.  Therefore, despite that a C-based solution will
>> most likely prevent inlining (right?), it may be hard to beat...
>>
>> Cheers,
>> Dario Teixeira
>>
>>
>>
>> --
>> Caml-list mailing list.  Subscription management and archives:
>> https://sympa-roc.inria.fr/wws/info/caml-list
>> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>> Bug reports: http://caml.inria.fr/bin/caml-bugs
>>
>>
>


-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs