From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by sympa.inria.fr (Postfix) with ESMTPS id 0F0087F2AA for ; Fri, 21 Dec 2012 11:02:39 +0100 (CET) X-IronPort-AV: E=Sophos;i="4.84,329,1355094000"; d="scan'208";a="187190310" Received: from estephe.inria.fr (HELO [128.93.11.95]) ([128.93.11.95]) by mail1-relais-roc.national.inria.fr with ESMTP; 21 Dec 2012 11:02:38 +0100 Message-ID: <50D433BE.4010309@inria.fr> Date: Fri, 21 Dec 2012 11:02:38 +0100 From: Xavier Leroy User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Thunderbird/17.0 MIME-Version: 1.0 To: caml-list@inria.fr References: In-Reply-To: X-Enigmail-Version: 1.4.6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Subject: Re: [Caml-list] Hashing failure On 12/20/2012 09:18 PM, Thomas Braibant wrote: > [Reducing a hash value to a [0, N) interval ] > My question is : is there a safe way to bullet-proof my code against > these pathological cases, without too much hinderance (this is some > code whose efficiency is critical...) Some possibilities to reduce an integer k to the interval [0, N): abs (k mod N) (if N > 0, k mod N is guaranteed to be in (-N,N) ) (k land max_int) mod N ("land max_int" masks the sign bit off and preserves all other bits, reducing k to [0, max_int]) The latter is a bit more efficient, as it avoids the conditional in "abs". If N is a power of 2, you can just do k land (N - 1) For hash tables, it is easy to ensure that the size of the array is a power of 2. However, the formula above assumes that the low bits of k have good distribution, which is not always the case if k is computed with a simple hash function. You can also "mix up" some of the high bits before taking the low bits, e.g. (k lxor (k lsr 16)) land (N - 1) Hope this helps, - Xavier