caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Thomas Fischbacher <Thomas.Fischbacher@Physik.Uni-Muenchen.DE>
To: "Correnson Loïc" <Loic.Correnson@trusted-logic.fr>
Cc: Ocaml <caml-list@inria.fr>
Subject: Re: [Caml-list] Hashtbl and destructive operations on keys
Date: Tue, 23 Mar 2004 13:12:23 +0100 (CET)	[thread overview]
Message-ID: <Pine.LNX.4.58.0403231215540.2953@seekar.cip.physik.uni-muenchen.de> (raw)
In-Reply-To: <00b901c410b8$0c1adb10$0203a8c0@hoedic>


On Tue, 23 Mar 2004, Correnson Loïc wrote:

> Notice also that implementing your own hashtbl would not be so far
> difficult.

Yes, I know, and I did so in lisp before. But one thing you must not 
forget: when doing research, your primary constraint is time. You want to 
get problem X solved with as little effort as possible (since everyone 
looks at your scientific throughput - already assuming your research is of 
reasonable quality). Note that I am not talking about 
compiler/language/other CS research here! Every day, you have a certain 
amount of mental energy which you can use. And you do not want to spend 
too much of it on low-level details like implementing your own hashing 
functions when you have more pressing problems in your mind, like 
devising a new and contrived algorithm to solve a new and contrived 
problem, or like the actual background of the problem you are 
investigating.

> For your dedicated use, the code for hashing and replacing this key to that
> value would not exceed 10 lines,
> assuming you can reuse Ocaml's hashing function and implements hash
> collision with a simple array and associative lists.

This suggestion is not far away from "why don't you use C right away?" - I 
do not want to, I want to make use of available powerful tools to I do not 
have to spend too much time on uninteresting low-level details.

> Instead, your *codes* to work around Hashtbl with or without references may
> never answer your question and will remain actually under specified and
> fragile for your own purpose.

That's precisely why I asked for additional specification.

> Moreover, it is clear in your case that the
> capability for ocaml hash tables to register several values for one key is
> useless, since your always use replace and never add for existing key.

This is in fact true, but then I must ask whether the "other", more usual 
type of hashes provided e.g. by LISP isn't the one that is more widely 
used, and if there really is so much of an overhead in this feature, 
why ocaml does not provide the other type of hashes?

> To
> finish with, your work-around code uses 3 times the hashtbl functions, hence
> you hash the key 3 times. This is incompatible with looking for performance!

Before I talk long... look at the following example:

let ht = Hashtbl.create 1000000;;

for i = 0 to 1000000 do
  let s = String.create 5 in
  for k = 0 to 5-1 do String.set s k (Char.chr ((i lsr (k lsl 2)) land 
15)) done;
  Hashtbl.add ht s i;
done;;

(* Linux-specific... *)

let h = open_in "/proc/self/status" in
let rec scan () =
  let line = input_line h in
  if String.sub line 0 7 = "VmSize:"
  then (close_in h; line)
  else scan()
in Printf.printf "MEM: %s" (scan())
;;

Try it. Then substitute the Hashtbl.add s i by Hashtbl.add s (ref i). 
You'll see that there is roughly 25% difference in RAM consumption. Now, 
if you are working close to the limit of virtual address space, or even 
only if this 25% makes the difference whether your program has to page out 
or not, this is indeed relevant.

But of course, it would be terrific if there were officially 
sanctioned variants of the hashtable functions that take a pre-computed 
hash key as an extra arg.

-- 
regards,               tf@cip.physik.uni-muenchen.de              (o_
 Thomas Fischbacher -  http://www.cip.physik.uni-muenchen.de/~tf  //\
(lambda (n) ((lambda (p q r) (p p q r)) (lambda (g x y)           V_/_
(if (= x 0) y (g g (- x 1) (* x y)))) n 1))                  (Debian GNU)

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


  reply	other threads:[~2004-03-23 12:12 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-03-22 18:16 Thomas Fischbacher
2004-03-22 19:09 ` Remi Vanicat
2004-03-22 23:13   ` Thomas Fischbacher
2004-03-23  0:59     ` Jacques Garrigue
2004-03-23  9:20     ` Correnson Loïc
2004-03-23 12:12       ` Thomas Fischbacher [this message]
2004-03-23  9:45     ` Oliver Bandel
2004-03-22 23:44 Oliver Bandel
2004-03-22 23:44 Oliver Bandel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.58.0403231215540.2953@seekar.cip.physik.uni-muenchen.de \
    --to=thomas.fischbacher@physik.uni-muenchen.de \
    --cc=Loic.Correnson@trusted-logic.fr \
    --cc=caml-list@inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).