caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* symbol managment in Caml
@ 2000-02-25 12:16 Stephan Houben
  2000-02-25 16:21 ` skaller
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Stephan Houben @ 2000-02-25 12:16 UTC (permalink / raw)
  To: caml-list

Hello list,

I am writing an interpreter for a simple language in O'Caml.
The interpreter often needs to search a hash table with a string
as key. A common optimization for this is to use pointer identity
instead of string equality, and "intern" every string before using
it as a key to the hash table.

Unfortunately, I don't see how to do this with the current O'Caml
libs. You can do identity checks with ==, but there seems no way
to get a good hash corresponding to ==.

Has anyone already written some code for this task?

Thanks in advance,

Stephan



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: symbol managment in Caml
  2000-02-25 12:16 symbol managment in Caml Stephan Houben
@ 2000-02-25 16:21 ` skaller
  2000-02-25 16:25 ` Judicael Courant
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: skaller @ 2000-02-25 16:21 UTC (permalink / raw)
  To: Stephan Houben; +Cc: caml-list

Stephan Houben wrote:
> 
> Hello list,
> 
> I am writing an interpreter for a simple language in O'Caml.
> The interpreter often needs to search a hash table with a string
> as key. A common optimization for this is to use pointer identity
> instead of string equality, and "intern" every string before using
> it as a key to the hash table.
> 
> Unfortunately, I don't see how to do this with the current O'Caml
> libs. You can do identity checks with ==, but there seems no way
> to get a good hash corresponding to ==.
> 
> Has anyone already written some code for this task?

No, but may I suggest using a hashtable of references to strings?
The compiler will also need to keep a separate set of strings,
which the hash table entries reference. I'd be interested in a
performance
comparison (since I have a similar interpreter :-)

-- 
John (Max) Skaller, mailto:skaller@maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
checkout Vyper http://Vyper.sourceforge.net
download Interscript http://Interscript.sourceforge.net



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: symbol managment in Caml
  2000-02-25 12:16 symbol managment in Caml Stephan Houben
  2000-02-25 16:21 ` skaller
@ 2000-02-25 16:25 ` Judicael Courant
  2000-02-25 17:10 ` Claudio Sacerdoti Coen
  2000-02-28  9:49 ` Sven LUTHER
  3 siblings, 0 replies; 6+ messages in thread
From: Judicael Courant @ 2000-02-25 16:25 UTC (permalink / raw)
  To: Stephan Houben; +Cc: caml-list


Hi,

In his message of Fri February 25, 2000, Stephan Houben writes: 
> 
> The interpreter often needs to search a hash table with a string
> as key. A common optimization for this is to use pointer identity
> instead of string equality, and "intern" every string before using
> it as a key to the hash table.
> 
> Unfortunately, I don't see how to do this with the current O'Caml
> libs. You can do identity checks with ==, but there seems no way
> to get a good hash corresponding to ==.
> 
> Has anyone already written some code for this task?
> 

I guess it is very difficult to provide a good hash function
corresponding to == because the GC may move the block the string has
been allocated to at any time.

However, in older versions of Coq (http://coq.inria.fr) there was some
code achieving what you are looking for : roughly, identifiers were
internally represented as (unique) integers and we had two conversion
functions ident_of_string and string_of_ident (each time a new string
was interned through ident_of_string, a new integer was
allocated). Therefore, ident comparison was integer comparison, and we
had a hash function over idents.

NB : after a while, the code implementing this was removed from Coq
(we spent more time (re)interning terms than we saved...)

Judicaël Courant.
-- 
Judicael.Courant@lri.fr, http://www.lri.fr/~jcourant/
(+33) (0)1 69 15 64 85
"Montre moi des morceaux de ton monde, et je te montrerai le mien"
Tim, matricule #929, condamné à mort.
http://rozenn.picard.free.fr/tim.html



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: symbol managment in Caml
  2000-02-25 12:16 symbol managment in Caml Stephan Houben
  2000-02-25 16:21 ` skaller
  2000-02-25 16:25 ` Judicael Courant
@ 2000-02-25 17:10 ` Claudio Sacerdoti Coen
  2000-02-26  6:13   ` Frank A. Christoph
  2000-02-28  9:49 ` Sven LUTHER
  3 siblings, 1 reply; 6+ messages in thread
From: Claudio Sacerdoti Coen @ 2000-02-25 17:10 UTC (permalink / raw)
  To: caml-list

 Hello,

 I have had the same problem and I was about to write the
 new function when I stopped: the time taken by the hash
 function was not worth the extra coding.

 But I think that it's really a great pity that the
 standard library lacks such a function. Moreover, it
 should be really easy to implement it. Could we expect
 it implemented?

					Cheers,
					C.S.C.


On Fri, Feb 25, 2000 at 13:16:22 +0100, Stephan Houben wrote:
> Hello list,
> 
> I am writing an interpreter for a simple language in O'Caml.
> The interpreter often needs to search a hash table with a string
> as key. A common optimization for this is to use pointer identity
> instead of string equality, and "intern" every string before using
> it as a key to the hash table.
> 
> Unfortunately, I don't see how to do this with the current O'Caml
> libs. You can do identity checks with ==, but there seems no way
> to get a good hash corresponding to ==.
> 
> Has anyone already written some code for this task?
> 
> Thanks in advance,
> 
> Stephan

-- 
-----------------------------------------
Real Name: Claudio Sacerdoti Coen
Address: via del Colle n.6
	 S. Lazzaro di Savena (BO)
	 Italy
e-mail:  sacerdot@cs.unibo.it
-----------------------------------------




^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: symbol managment in Caml
  2000-02-25 17:10 ` Claudio Sacerdoti Coen
@ 2000-02-26  6:13   ` Frank A. Christoph
  0 siblings, 0 replies; 6+ messages in thread
From: Frank A. Christoph @ 2000-02-26  6:13 UTC (permalink / raw)
  To: Claudio Sacerdoti Coen, caml-list

[re: hashing identifiers]
>  I have had the same problem and I was about to write the
>  new function when I stopped: the time taken by the hash
>  function was not worth the extra coding.

Even if it takes more, or as much, time as comparing identifiers directly,
there is still an advantage to interning strings: you can share their
representations. Of course, if you don't expect much sharing, or you don't
expect to allocate many identifiers during any single run, it may still not
be worht it.

--fac



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: symbol managment in Caml
  2000-02-25 12:16 symbol managment in Caml Stephan Houben
                   ` (2 preceding siblings ...)
  2000-02-25 17:10 ` Claudio Sacerdoti Coen
@ 2000-02-28  9:49 ` Sven LUTHER
  3 siblings, 0 replies; 6+ messages in thread
From: Sven LUTHER @ 2000-02-28  9:49 UTC (permalink / raw)
  To: caml-list

On Fri, Feb 25, 2000 at 01:16:22PM +0100, Stephan Houben wrote:
> Hello list,
> 
> I am writing an interpreter for a simple language in O'Caml.
> The interpreter often needs to search a hash table with a string
> as key. A common optimization for this is to use pointer identity
> instead of string equality, and "intern" every string before using
> it as a key to the hash table.
> 
> Unfortunately, I don't see how to do this with the current O'Caml
> libs. You can do identity checks with ==, but there seems no way
> to get a good hash corresponding to ==.
> 
> Has anyone already written some code for this task?

Be carefull about this, i had the same kind of idea, but found out that using
references in ocaml is not necessarily a good way of increasing speed, since
some kind of references operation are expensive, and can even trigger a minor
garbage collection wich can in turn start a major garbage collection, thus
making it very expensive.

i was using a small associative list (~15 entries) and it resulted that it was
faster than the reference using version.

Friendly,

Sven LUTHER



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2000-02-28 23:10 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2000-02-25 12:16 symbol managment in Caml Stephan Houben
2000-02-25 16:21 ` skaller
2000-02-25 16:25 ` Judicael Courant
2000-02-25 17:10 ` Claudio Sacerdoti Coen
2000-02-26  6:13   ` Frank A. Christoph
2000-02-28  9:49 ` Sven LUTHER

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).