caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] Hashing failure
       [not found] <CAHR=VkzBhR5FMsWO-_BVzYV4yhSARP8rg7G3b=jezU6OOOg0jQ@mail.gmail.com>
@ 2012-12-20 20:18 ` Thomas Braibant
  2012-12-20 20:32   ` [Caml-list] " Thomas Braibant
                     ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Thomas Braibant @ 2012-12-20 20:18 UTC (permalink / raw)
  To: caml-list

Hi list,

I have a piece of code that looks like this.

let hkey = H.hash d in (* a custom hash function *)
let index = hkey mod (Array.length t.table) in
let bucket = t.table.(index) in (* 1 *)

the line marked with 1 was generating an index out of bounds
exception. It appeared that it was because index was negative, a
problem  I knew of when using mod ... So I added an abs around index

let hkey = H.hash d in
let index =  abs (hkey mod (Array.length t.table)) in
let bucket = t.table.(index) in (* 1 *)

But it yields the same exception...

This seemed completely unbelievable, till I realized that
- x mod y may return a negative value if x is less than 0
- abs x may return a negative value if x is min_int

So I tried to reproduce this behavior with the following code:

let t = Array.create 17 0;;
let hkey = min_int in
let index = (abs hkey) mod (Array.length t) in
t.(index);;

which yields an out of bound. Bingo. It turns out that some value of
the length of t will result in an error, and some will not (e.g., 16) ... My
question is : is there a safe way to bullet-proof my code against
these pathological cases, without too much hinderance (this is some
code whose efficiency is critical...)


Thomas

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Caml-list] Re: Hashing failure
  2012-12-20 20:18 ` [Caml-list] Hashing failure Thomas Braibant
@ 2012-12-20 20:32   ` Thomas Braibant
  2012-12-20 21:01     ` Martin Jambon
  2012-12-21  7:58   ` [Caml-list] " Jacques Garrigue
  2012-12-21 10:02   ` Xavier Leroy
  2 siblings, 1 reply; 6+ messages in thread
From: Thomas Braibant @ 2012-12-20 20:32 UTC (permalink / raw)
  To: caml-list

The second piece of code should read
> let hkey = H.hash d in
> let index =  (abs hkey) mod (Array.length t.table) in
> let bucket = t.table.(index) in (* 1 *)

Sorry for the noise.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Re: Hashing failure
  2012-12-20 20:32   ` [Caml-list] " Thomas Braibant
@ 2012-12-20 21:01     ` Martin Jambon
  0 siblings, 0 replies; 6+ messages in thread
From: Martin Jambon @ 2012-12-20 21:01 UTC (permalink / raw)
  To: Thomas Braibant; +Cc: caml-list

On Thu 20 Dec 2012 12:32:07 PM PST, Thomas Braibant wrote:
> The second piece of code should read
>> let hkey = H.hash d in
>> let index =  (abs hkey) mod (Array.length t.table) in
>> let bucket = t.table.(index) in (* 1 *)
>
> Sorry for the noise.

You can use (if hkey = min_int then 0 else abs hkey). It is guaranteed 
to work and won't cost much.

Martin


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Hashing failure
  2012-12-20 20:18 ` [Caml-list] Hashing failure Thomas Braibant
  2012-12-20 20:32   ` [Caml-list] " Thomas Braibant
@ 2012-12-21  7:58   ` Jacques Garrigue
  2012-12-21 10:02   ` Xavier Leroy
  2 siblings, 0 replies; 6+ messages in thread
From: Jacques Garrigue @ 2012-12-21  7:58 UTC (permalink / raw)
  To: Thomas Braibant; +Cc: caml-list

On 2012/12/21, at 5:18, Thomas Braibant <thomas.braibant@gmail.com> wrote:

> Hi list,
> 
> I have a piece of code that looks like this.
> 
> let hkey = H.hash d in (* a custom hash function *)
> let index = hkey mod (Array.length t.table) in
> let bucket = t.table.(index) in (* 1 *)
> 
> the line marked with 1 was generating an index out of bounds
> exception. It appeared that it was because index was negative, a
> problem  I knew of when using mod ... So I added an abs around index
> 
> let hkey = H.hash d in
> let index =  abs (hkey mod (Array.length t.table)) in
> let bucket = t.table.(index) in (* 1 *)
> 
> But it yields the same exception...
> 
> This seemed completely unbelievable, till I realized that
> - x mod y may return a negative value if x is less than 0
> - abs x may return a negative value if x is min_int

What about just ensuring that your hash function returns a positive
value to start with.
An easy way to do that is

   let hkey = hkey land max_int

This avoids using abs altogether.
(I now understand why the standard hash function is 30-bit…)

	Jacques Garrigue

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Hashing failure
  2012-12-20 20:18 ` [Caml-list] Hashing failure Thomas Braibant
  2012-12-20 20:32   ` [Caml-list] " Thomas Braibant
  2012-12-21  7:58   ` [Caml-list] " Jacques Garrigue
@ 2012-12-21 10:02   ` Xavier Leroy
  2012-12-21 19:03     ` Martin Jambon
  2 siblings, 1 reply; 6+ messages in thread
From: Xavier Leroy @ 2012-12-21 10:02 UTC (permalink / raw)
  To: caml-list

On 12/20/2012 09:18 PM, Thomas Braibant wrote:

> [Reducing a hash value to a [0, N) interval ]
> My question is : is there a safe way to bullet-proof my code against
> these pathological cases, without too much hinderance (this is some
> code whose efficiency is critical...)

Some possibilities to reduce an integer k to the interval [0, N):

  abs (k mod N)
     (if N > 0, k mod N is guaranteed to be in (-N,N) )

  (k land max_int) mod N
     ("land max_int" masks the sign bit off and preserves all other
      bits, reducing k to [0, max_int])

The latter is a bit more efficient, as it avoids the conditional in "abs".

If N is a power of 2, you can just do

  k land (N - 1)

For hash tables, it is easy to ensure that the size of the array is a
power of 2.  However, the formula above assumes that the low bits of k
have good distribution, which is not always the case if k is computed
with a simple hash function.  You can also "mix up" some of the high
bits before taking the low bits, e.g.

(k lxor (k lsr 16)) land (N - 1)

Hope this helps,

- Xavier


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Hashing failure
  2012-12-21 10:02   ` Xavier Leroy
@ 2012-12-21 19:03     ` Martin Jambon
  0 siblings, 0 replies; 6+ messages in thread
From: Martin Jambon @ 2012-12-21 19:03 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: caml-list

On Fri 21 Dec 2012 02:02:38 AM PST, Xavier Leroy wrote:
> On 12/20/2012 09:18 PM, Thomas Braibant wrote:
>
>> [Reducing a hash value to a [0, N) interval ]
>> My question is : is there a safe way to bullet-proof my code against
>> these pathological cases, without too much hinderance (this is some
>> code whose efficiency is critical...)
>
> Some possibilities to reduce an integer k to the interval [0, N):
>
>    abs (k mod N)
>       (if N > 0, k mod N is guaranteed to be in (-N,N) )
>
>    (k land max_int) mod N
>       ("land max_int" masks the sign bit off and preserves all other
>        bits, reducing k to [0, max_int])

Is the two's complement representation of integers guaranteed by all 
OCaml implementations? It would be helpful if the manual was explicit 
about it.


Martin


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-12-21 19:03 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAHR=VkzBhR5FMsWO-_BVzYV4yhSARP8rg7G3b=jezU6OOOg0jQ@mail.gmail.com>
2012-12-20 20:18 ` [Caml-list] Hashing failure Thomas Braibant
2012-12-20 20:32   ` [Caml-list] " Thomas Braibant
2012-12-20 21:01     ` Martin Jambon
2012-12-21  7:58   ` [Caml-list] " Jacques Garrigue
2012-12-21 10:02   ` Xavier Leroy
2012-12-21 19:03     ` Martin Jambon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).