caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] What is the default hash function of string in 3.12?
@ 2012-06-12 20:23 Jianzhou Zhao
  2012-06-13  8:25 ` rixed
  0 siblings, 1 reply; 2+ messages in thread
From: Jianzhou Zhao @ 2012-06-12 20:23 UTC (permalink / raw)
  To: caml users

Hi,

I have a table (with size > 30000) implemented by both Hashtbl and
Map. I found their performance are almost the same. But I think
Hashtbl should be much faster than Map. So, I was wondering what hash
function of String OCaml 3.12 uses, and if I can define my own.

Thanks!
-- 
Jianzhou

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [Caml-list] What is the default hash function of string in 3.12?
  2012-06-12 20:23 [Caml-list] What is the default hash function of string in 3.12? Jianzhou Zhao
@ 2012-06-13  8:25 ` rixed
  0 siblings, 0 replies; 2+ messages in thread
From: rixed @ 2012-06-13  8:25 UTC (permalink / raw)
  To: caml users

-[ Tue, Jun 12, 2012 at 04:23:46PM -0400, Jianzhou Zhao ]----
> So, I was wondering what hash
> function of String OCaml 3.12 uses, and if I can define my own.

Here is the relevant part of the code:

--[ byterun/hash.c ]--

    switch (tag) {
    case String_tag:
      hash_univ_count--;
      i = caml_string_length(obj);
      for (p = &Byte_u(obj, 0); i > 0; i--, p++)
        Combine_small(*p);
      break;

----

If I can read C (which sometime I can't) then it seams hashing a string
requires an amount of CPU proportional to the length of the string, and that a
string count as one value only for the depth limit.  We could imagine for
instance that each char decrease the hash_univ_count, but then you need to
choose the chars of the string you consider very wisely.  For instance if you
choose to hash only the first of last N chars then you risk hashing many
strings that ends/starts with the same substrings to the same value. If you
choose one char every N you'r still at risk (for instance strings that differs
only by a few chars may all end with the same hash).  So all in all its not
easy to _not_ process all chars.

As for changing the hash function, you can use the Hashtbl.Make functor to
build your own hash out of the hash function you want.  But there is no way to
change the default hash function for strings AFAICT.


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2012-06-13  8:26 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-12 20:23 [Caml-list] What is the default hash function of string in 3.12? Jianzhou Zhao
2012-06-13  8:25 ` rixed

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).