From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id p4S0Svuv010588 for ; Sat, 28 May 2011 02:28:57 +0200 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AicAAK9A4E2GhogYkWdsb2JhbABVhEmTKQ+NUm0UAQEBAQkLCwcUBSC2GJAygSuDbIEHBJUFhB6GMw X-IronPort-AV: E=Sophos;i="4.65,283,1304287200"; d="scan'208";a="84192893" Received: from mga09.intel.com ([134.134.136.24]) by mail3-smtp-sop.national.inria.fr with ESMTP; 28 May 2011 02:28:51 +0200 Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga102.jf.intel.com with ESMTP; 27 May 2011 17:28:50 -0700 X-ExtLoop1: 1 Received: from orsmsx603.amr.corp.intel.com ([10.22.226.49]) by orsmga002.jf.intel.com with ESMTP; 27 May 2011 17:28:50 -0700 Received: from orsmsx151.amr.corp.intel.com (10.22.226.38) by orsmsx603.amr.corp.intel.com (10.22.226.49) with Microsoft SMTP Server (TLS) id 8.2.255.0; Fri, 27 May 2011 17:28:49 -0700 Received: from orsmsx101.amr.corp.intel.com ([169.254.8.114]) by ORSMSX151.amr.corp.intel.com ([169.254.5.187]) with mapi id 14.01.0289.001; Fri, 27 May 2011 17:28:49 -0700 From: "Harrison, John R" To: Till Varoquaux , Dario Teixeira CC: "caml-list@inria.fr" Thread-Topic: [Caml-list] Re: Binary logarithm of a power of 2 Thread-Index: AQHMHHG9UggX4FLc4EO2AEB6uOzh0pShX7qAgAANtwCAAAlJAIAAAmiA///miVA= Date: Sat, 28 May 2011 00:28:46 +0000 Message-ID: References: <4DDFDC1F.5050605@inria.fr> <165028.34675.qm@web111509.mail.gq1.yahoo.com> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.9.131.214] Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by walapai.inria.fr id p4S0Svuv010588 Subject: RE: [Caml-list] Re: Binary logarithm of a power of 2 The following are my pet algorithms for essentially the same problem of counting leading zeros, and the dual problem of counting trailing zeros. It should be a routine exercise to translate to OCaml and tweak the numbers for binary logarithm. I doubt if these are as fast in practice as some of the other well-known solutions. However they have the interesting feature that given infinite parallelism and arranging the sums in a balanced tree, they can be log(log(N)) + 2 in terms of latencies, rather than the usual log(N), where N is the number of bits in a word. That is, all the terms in the sum can be evaluated independently. As such, they are much more like how the operation might be done in hardware. // Counting leading zeros in an unsigned 32-bit word // The "_nz" version will return the wrong answer (31) for zero inputs #define CLZ32_MASK16 0xFFFF0000ul #define CLZ32_MASK8 0xFF00FF00ul #define CLZ32_MASK4 0xF0F0F0F0ul #define CLZ32_MASK2 0xCCCCCCCCul #define CLZ32_MASK1 0xAAAAAAAAul #define clz32_nz(n) \ (((((n) & CLZ32_MASK16) <= ((n) & ~CLZ32_MASK16)) ? 16 : 0) + \ ((((n) & CLZ32_MASK8) <= ((n) & ~CLZ32_MASK8)) ? 8 : 0) + \ ((((n) & CLZ32_MASK4) <= ((n) & ~CLZ32_MASK4)) ? 4 : 0) + \ ((((n) & CLZ32_MASK2) <= ((n) & ~CLZ32_MASK2)) ? 2 : 0) + \ ((((n) & CLZ32_MASK1) <= ((n) & ~CLZ32_MASK1)) ? 1 : 0)) #define clz32(n) (((n)==0) ? 32 : clz32_nz(n)) // Counting trailing zeros in an unsigned 32-bit word // The ctz32_1bit version is for a single bit #define ctz32_1bit(n) \ ((((n) & ~CLZ32_MASK16) ? 0 : 16) + \ (((n) & ~CLZ32_MASK8) ? 0 : 8) + \ (((n) & ~CLZ32_MASK4) ? 0 : 4) + \ (((n) & ~CLZ32_MASK2) ? 0 : 2) + \ (((n) & ~CLZ32_MASK1) ? 0 : 1)) #define ctz32(n) (((n) == 0) ? 32 : ctz32_1bit((n) & -(n))) John. -----Original Message----- From: till.varoquaux@gmail.com [mailto:till.varoquaux@gmail.com] On Behalf Of Till Varoquaux Sent: Friday, May 27, 2011 11:46 AM To: Dario Teixeira Cc: caml-list@inria.fr Subject: Re: [Caml-list] Re: Binary logarithm of a power of 2 Nevermind. I am actually running slower, bad benchmark. Shameful, Till On Fri, May 27, 2011 at 2:37 PM, Till Varoquaux wrote: > And you can probably gain on XL by dropping down to ints and not using refs: > > let lb5 n i = if n >= 0x2 then i+1 else i > > let lb4 n i = >  if n >= 0x4 then lb5 (n asr 2) (i+2) >  else lb5 n i > > let lb3 n i = >  if n >= 0x10 then lb4 (n asr 4) (i+4) >  else lb4 n i > > let lb2 n i = >  if n >= 0x100 then lb3 (n asr 8) (i+8) >  else lb3 n i > > let logbin n = >  if n >= 0x10000 then lb2 (n asr 16) 16 >  else lb2 n 0 > > (note that you can drop to ints right in logbin itself if you really > need your input to be int32s and you aren't on a 32 bit machine). > > This is thoroughly untested. > > Till > > On Fri, May 27, 2011 at 2:04 PM, Dario Teixeira wrote: >> Hi, >> >>> Simpler and maybe faster: >>> >>> let logbin (n: int32) = >>>   let n = ref (Int32.add n 0l) and i = ref 0 in >>>   if !n >= 0x10000l then (n := Int32.shift_right !n 16; i := 16); >>>   if !n >= 0x100l then (n := Int32.shift_right !n 8; i := !i + 8); >>>   if !n >= 0x10l then (n := Int32.shift_right !n 4; i := !i + 4); >>>   if !n >= 0x4l then (n := Int32.shift_right !n 2; i := !i + 2); >>>   if !n >= 0x2l then !i + 1 else !i >> >> Thanks!  In my synthetic benchmark, this pure Ocaml solution comes very >> close to the C-based "noalloc" one (the record holder so far), at least >> on an x86_64. >> >> Note that __builtin_ctz actually translates into a single opcode where >> available (BSFL in x86_64), and I expect that a modern CPU will do a >> decent job with it.  Therefore, despite that a C-based solution will >> most likely prevent inlining (right?), it may be hard to beat... >> >> Cheers, >> Dario Teixeira >> >> >> >> -- >> Caml-list mailing list.  Subscription management and archives: >> https://sympa-roc.inria.fr/wws/info/caml-list >> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners >> Bug reports: http://caml.inria.fr/bin/caml-bugs >> >> > -- Caml-list mailing list. Subscription management and archives: https://sympa-roc.inria.fr/wws/info/caml-list Beginner's list: http://groups.yahoo.com/group/ocaml_beginners Bug reports: http://caml.inria.fr/bin/caml-bugs