caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Hashtbl.create 401
@ 2005-04-22  3:09 Matt Gushee
  2005-04-22  3:21 ` [Caml-list] " Erik de Castro Lopo
  2005-04-22  3:33 ` Radu Grigore
  0 siblings, 2 replies; 5+ messages in thread
From: Matt Gushee @ 2005-04-22  3:09 UTC (permalink / raw)
  To: caml-list

Hello, all--

While browsing through the Labltk source code (and perhaps in some other 
places), I noticed several instances of:

   Hashtbl.create 401

Why 401?

--
Matt Gushee
Englewood, CO, USA


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] Hashtbl.create 401
  2005-04-22  3:09 Hashtbl.create 401 Matt Gushee
@ 2005-04-22  3:21 ` Erik de Castro Lopo
  2005-04-22 12:06   ` padiolea
  2005-04-22  3:33 ` Radu Grigore
  1 sibling, 1 reply; 5+ messages in thread
From: Erik de Castro Lopo @ 2005-04-22  3:21 UTC (permalink / raw)
  To: caml-list

On Thu, 21 Apr 2005 21:09:18 -0600
Matt Gushee <mgushee@havenrock.com> wrote:

> Hello, all--
> 
> While browsing through the Labltk source code (and perhaps in some other 
> places), I noticed several instances of:
> 
>    Hashtbl.create 401
> 
> Why 401?

Hash tables distribute data across a set of hash buckets. If
the number of hash buckets is prime (401 is prime) then their
is a better chance of the data being evenly distributed across
the buckets.

Erik
-- 
+-----------------------------------------------------------+
  Erik de Castro Lopo  nospam@mega-nerd.com (Yes it's valid)
+-----------------------------------------------------------+
"I invented the term Object-Oriented, and I can tell you I
did not have C++ in mind." -- Alan Kay


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] Hashtbl.create 401
  2005-04-22  3:09 Hashtbl.create 401 Matt Gushee
  2005-04-22  3:21 ` [Caml-list] " Erik de Castro Lopo
@ 2005-04-22  3:33 ` Radu Grigore
  1 sibling, 0 replies; 5+ messages in thread
From: Radu Grigore @ 2005-04-22  3:33 UTC (permalink / raw)
  To: Matt Gushee; +Cc: caml-list

On 4/22/05, Matt Gushee <mgushee@havenrock.com> wrote:

>    Hashtbl.create 401
> Why 401?

Probably because it is a prime number and the estimate number of
objects in the hash is "less than 400".

-- 
regards,
 radu
http://rgrig.blogspot.com/


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] Hashtbl.create 401
  2005-04-22  3:21 ` [Caml-list] " Erik de Castro Lopo
@ 2005-04-22 12:06   ` padiolea
  2005-04-23  0:26     ` sejourne_kevin
  0 siblings, 1 reply; 5+ messages in thread
From: padiolea @ 2005-04-22 12:06 UTC (permalink / raw)
  To: Erik de Castro Lopo; +Cc: caml-list

> On Thu, 21 Apr 2005 21:09:18 -0600
> Matt Gushee <mgushee@havenrock.com> wrote:
>
>> Hello, all--
>>
>> While browsing through the Labltk source code (and perhaps in some other
>> places), I noticed several instances of:
>>
>>    Hashtbl.create 401
>>
>> Why 401?
>
> Hash tables distribute data across a set of hash buckets. If
> the number of hash buckets is prime (401 is prime) then their
> is a better chance of the data being evenly distributed across
> the buckets.

but it seems that this number is not that much important since,
as said in the ml source:
(* We do dynamic hashing, and resize the table and rehash the elements
   when buckets become too long. *)


>
> Erik
> --
> +-----------------------------------------------------------+
>   Erik de Castro Lopo  nospam@mega-nerd.com (Yes it's valid)
> +-----------------------------------------------------------+
> "I invented the term Object-Oriented, and I can tell you I
> did not have C++ in mind." -- Alan Kay
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] Hashtbl.create 401
  2005-04-22 12:06   ` padiolea
@ 2005-04-23  0:26     ` sejourne_kevin
  0 siblings, 0 replies; 5+ messages in thread
From: sejourne_kevin @ 2005-04-23  0:26 UTC (permalink / raw)
  To: padiolea; +Cc: Erik de Castro Lopo, caml-list

padiolea@irisa.fr a écrit :
>>On Thu, 21 Apr 2005 21:09:18 -0600
>>Matt Gushee <mgushee@havenrock.com> wrote:
>>
>>
>>>Hello, all--
>>>
>>>While browsing through the Labltk source code (and perhaps in some other
>>>places), I noticed several instances of:
>>>
>>>   Hashtbl.create 401
>>>
>>>Why 401?
>>
>>Hash tables distribute data across a set of hash buckets. If
>>the number of hash buckets is prime (401 is prime) then their
>>is a better chance of the data being evenly distributed across
>>the buckets.
> 
> 
> but it seems that this number is not that much important since,
> as said in the ml source:
> (* We do dynamic hashing, and resize the table and rehash the elements
>    when buckets become too long. *)
If you want to have a HashTbl with only a prime number of hash buckets,
you can do things like that (with code stolen from the stdlib):

let table = (* list of (next_primes 2n+1) of root 3 *)
[| 3; 11; 29; 61; 127; 257; 521; 1049; 2111; 4229; 8461; 16927; 33857;
67723; 135449; 270913; 541831; 1083689; 2167393;(*max 32 bits*) 4334791;
8669593; 17339197; 34678421; 69356857; 138713717; 277427441; 554854889
|]
;;
type ('a, 'b) t =
  {
    mutable size: int;                        (* number of elements *)
    mutable index: int;                       (* index of the next array
size*)
    mutable data: ('a, 'b) bucketlist array   (* the buckets *)
  }

and ...
let create initial_size =
  let i_m, _ =
      fold_lefti (fun i ((_,v_m) as u) x ->
		    let v = abs (x - initial_size) in
		      if v < v_m  then (i,v)  else u
		 ) (0,max_int) table
  in
  let s = min (max 1 table.(i_m)) Sys.max_array_length
  in { size = 0; index = i_m; data = Array.make s Empty }


let resize hashfun tbl =
  let _ = tbl.index <- succ tbl.index in
  let osize = Array.length tbl.data in
  let nsize = try min (table.(tbl.index)) Sys.max_array_length with |_->
osize
  in
    if nsize <> osize then
      begin
	let ndata = Array.create nsize Empty in
	let rec insert_bucket = function
            Empty -> ()
	  | Cons(key, data, rest) ->
              insert_bucket rest; (* preserve original order of elements *)
              let nidx = (hashfun key) mod nsize in
		ndata.(nidx) <- Cons(key, data, ndata.(nidx)) in
	  for i = 0 to osize - 1 do
	    insert_bucket tbl.data.(i)
	  done;
	  tbl.data <- ndata
      end

let copy h =
  { size = h.size;
    index = h.index;
    data = Array.copy h.data }

....



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2005-04-22 22:16 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-04-22  3:09 Hashtbl.create 401 Matt Gushee
2005-04-22  3:21 ` [Caml-list] " Erik de Castro Lopo
2005-04-22 12:06   ` padiolea
2005-04-23  0:26     ` sejourne_kevin
2005-04-22  3:33 ` Radu Grigore

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).