From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <Xavier.Leroy@inria.fr>
X-Original-To: caml-list@sympa.inria.fr
Delivered-To: caml-list@sympa.inria.fr
Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82])
	by sympa.inria.fr (Postfix) with ESMTPS id 0F0087F2AA
	for <caml-list@sympa.inria.fr>; Fri, 21 Dec 2012 11:02:39 +0100 (CET)
X-IronPort-AV: E=Sophos;i="4.84,329,1355094000"; 
   d="scan'208";a="187190310"
Received: from estephe.inria.fr (HELO [128.93.11.95]) ([128.93.11.95])
  by mail1-relais-roc.national.inria.fr with ESMTP; 21 Dec 2012 11:02:38 +0100
Message-ID: <50D433BE.4010309@inria.fr>
Date: Fri, 21 Dec 2012 11:02:38 +0100
From: Xavier Leroy <Xavier.Leroy@inria.fr>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Thunderbird/17.0
MIME-Version: 1.0
To: caml-list@inria.fr
References: <CAHR=VkzBhR5FMsWO-_BVzYV4yhSARP8rg7G3b=jezU6OOOg0jQ@mail.gmail.com> <CAHR=Vkx3hntWcQYV8tptKnrRyKHNBdfA-6KxjfxWEp=j4Q-saQ@mail.gmail.com>
In-Reply-To: <CAHR=Vkx3hntWcQYV8tptKnrRyKHNBdfA-6KxjfxWEp=j4Q-saQ@mail.gmail.com>
X-Enigmail-Version: 1.4.6
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Caml-list] Hashing failure

On 12/20/2012 09:18 PM, Thomas Braibant wrote:

> [Reducing a hash value to a [0, N) interval ]
> My question is : is there a safe way to bullet-proof my code against
> these pathological cases, without too much hinderance (this is some
> code whose efficiency is critical...)

Some possibilities to reduce an integer k to the interval [0, N):

  abs (k mod N)
     (if N > 0, k mod N is guaranteed to be in (-N,N) )

  (k land max_int) mod N
     ("land max_int" masks the sign bit off and preserves all other
      bits, reducing k to [0, max_int])

The latter is a bit more efficient, as it avoids the conditional in "abs".

If N is a power of 2, you can just do

  k land (N - 1)

For hash tables, it is easy to ensure that the size of the array is a
power of 2.  However, the formula above assumes that the low bits of k
have good distribution, which is not always the case if k is computed
with a simple hash function.  You can also "mix up" some of the high
bits before taking the low bits, e.g.

(k lxor (k lsr 16)) land (N - 1)

Hope this helps,

- Xavier