caml-list - the Caml user's mailing list
* [Caml-list] Specialized dictionaries
@ 2001-11-05 10:06 Marcin 'Qrczak' Kowalczyk
  2001-11-05 10:19 ` Xavier Leroy
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Marcin 'Qrczak' Kowalczyk @ 2001-11-05 10:06 UTC
  To: caml-list

I need dictionaries indexed by ints which must be very fast. I'm
afraid that using Hashtbl.t has an overhead: the generic hash function
must first recognize that the value is immediate instead of using it
as a hash directly.

Is it worth doing something about it? What should I do? I could copy
the first half of hashtbl.ml and replace every call to the hash
function by land'ing the key with 0x3FFFFFFF (so the value is
nonnegative and mod gives nonnegative results). Any better idea?

-- 
 __("<  Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
 \__/
  ^^
QRCZAK

-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr  Archives: http://caml.inria.fr

* Re: [Caml-list] Specialized dictionaries
  2001-11-05 10:06 [Caml-list] Specialized dictionaries Marcin 'Qrczak' Kowalczyk
@ 2001-11-05 10:19 ` Xavier Leroy
  2001-11-05 10:32 ` Jean-Christophe Filliatre
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Xavier Leroy @ 2001-11-05 10:19 UTC
  To: Marcin 'Qrczak' Kowalczyk; +Cc: caml-list

> I need dictionaries indexed by ints which must be very fast. I'm
> afraid that using Hashtbl.t has an overhead: the generic hash function
> must first recognize that the value is immediate instead of using it
> as a hash directly.
> 
> Is it worth doing something about it? What should I do? I could copy
> the first half of hashtbl.ml and replace every call to the hash
> function by land'ing the key with 0x3FFFFFFF (so the value is
> nonnegative and mod gives nonnegative results). Any better idea?

No, no need to copy anything, just unleash the power of functors!

module IntHashtbl = Hashtbl.Make(struct type t = int
                                        let equal = (==)
                                        let hash x = x land 0x3FFFFFFF
                                 end)

I'm not sure the performance gain is significant, but it's worth a try.
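For reference, here is the same functor instantiation as a self-contained sketch with a small usage example. Note the capitalized functor name Hashtbl.Make. The usage code relies on find_opt, which assumes a newer standard library (OCaml 4.05 or later) than the one available in 2001.

```ocaml
(* Hash table specialized to int keys via the Hashtbl.Make functor. *)
module IntHashtbl = Hashtbl.Make (struct
  type t = int
  (* On ints, physical equality coincides with structural equality. *)
  let equal = ( == )
  (* The key is its own hash; the mask keeps it nonnegative. *)
  let hash x = x land 0x3FFFFFFF
end)

let () =
  let tbl = IntHashtbl.create 16 in
  IntHashtbl.replace tbl 42 "answer";
  IntHashtbl.replace tbl 7 "seven";
  print_endline (IntHashtbl.find tbl 42);
  (* find_opt returns None instead of raising Not_found. *)
  match IntHashtbl.find_opt tbl 3 with
  | None -> print_endline "3 is not bound"
  | Some v -> print_endline v
```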

- Xavier Leroy

* Re: [Caml-list] Specialized dictionaries
  2001-11-05 10:06 [Caml-list] Specialized dictionaries Marcin 'Qrczak' Kowalczyk
  2001-11-05 10:19 ` Xavier Leroy
@ 2001-11-05 10:32 ` Jean-Christophe Filliatre
  2001-11-05 17:36   ` Florian Hars
       [not found]   ` <9s6j7c$i6r$1@qrnik.zagroda>
       [not found] ` <9s5pe7$5k6$1@qrnik.zagroda>
  2001-11-05 23:40 ` Julian Assange
  3 siblings, 2 replies; 12+ messages in thread
From: Jean-Christophe Filliatre @ 2001-11-05 10:32 UTC
  To: Marcin 'Qrczak' Kowalczyk; +Cc: caml-list


Marcin 'Qrczak' Kowalczyk writes:
 > I need dictionaries indexed by ints which must be very fast. I'm
 > afraid that using Hashtbl.t has an overhead: the generic hash function
 > must first recognize that the value is immediate instead of using it
 > as a hash directly.
 > 
 > Is it worth doing something about it? What should I do? I could copy
 > the first half of hashtbl.ml and replace every call to the hash
 > function by land'ing the key with 0x3FFFFFFF (so the value is
 > nonnegative and mod gives nonnegative results). Any better idea?

As suggested by Xavier regarding your other question, you can
instantiate Hashtbl.Make accordingly:

======================================================================
module IntHashtbl = Hashtbl.Make(struct
  type t = int
  let equal = (==)
  let hash n = n land 0x3FFFFFFF
end)
======================================================================

To be even more efficient, I'm afraid you have to follow your idea,
that is, to inline this hash function in your own copy of hashtbl.ml.
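A minimal sketch of what such a hand-specialized table might look like, assuming int keys and chained buckets. It is not a copy of hashtbl.ml: the names (create, replace, find) and the resizing policy (double the bucket array once there are more than two bindings per bucket on average) are illustrative choices. The key itself serves as the hash, masked with max_int rather than 0x3FFFFFFF so the whole native int range stays nonnegative.

```ocaml
(* Int-keyed chained hash table with the hash inlined (hypothetical sketch). *)
type 'a bucket = Empty | Cons of int * 'a * 'a bucket

type 'a t = {
  mutable size : int;              (* number of bindings *)
  mutable data : 'a bucket array;  (* buckets; length is always >= 1 *)
}

let create n = { size = 0; data = Array.make (max n 1) Empty }

(* The key is its own hash: clear the sign bit, then reduce. *)
let index t key = (key land max_int) mod Array.length t.data

(* Double the number of buckets and redistribute all bindings. *)
let resize t =
  let odata = t.data in
  let ndata = Array.make (2 * Array.length odata) Empty in
  t.data <- ndata;
  let rec reinsert = function
    | Empty -> ()
    | Cons (k, v, rest) ->
        let i = index t k in
        ndata.(i) <- Cons (k, v, ndata.(i));
        reinsert rest
  in
  Array.iter reinsert odata

let find t key =
  let rec scan = function
    | Empty -> raise Not_found
    | Cons (k, v, rest) -> if k = key then v else scan rest
  in
  scan t.data.(index t key)

(* Replace the binding of [key] if present, otherwise add it. *)
let replace t key v =
  let rec update = function
    | Empty -> raise Not_found
    | Cons (k, w, rest) ->
        if k = key then Cons (k, v, rest) else Cons (k, w, update rest)
  in
  let i = index t key in
  let b = t.data.(i) in
  try t.data.(i) <- update b
  with Not_found ->
    t.data.(i) <- Cons (key, v, b);
    t.size <- t.size + 1;
    if t.size > 2 * Array.length t.data then resize t
```

Because k and key are statically ints here, the comparisons compile to plain integer tests and no hash function is called through a functor argument.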

-- 
Jean-Christophe Filliatre (http://www.lri.fr/~filliatr)

* Re: [Caml-list] Specialized dictionaries
       [not found] ` <9s5pe7$5k6$1@qrnik.zagroda>
@ 2001-11-05 11:49   ` Marcin 'Qrczak' Kowalczyk
  0 siblings, 0 replies; 12+ messages in thread
From: Marcin 'Qrczak' Kowalczyk @ 2001-11-05 11:49 UTC
  To: caml-list

Mon, 5 Nov 2001 11:19:56 +0100, Xavier Leroy <xavier.leroy@inria.fr> wrote:

> No, no need to copy anything, just unleash the power of functors!
> 
> module IntHashtbl = Hashtbl.Make(struct type t = int
>                                         let equal = (==)
>                                         let hash x = x land 0x3FFFFFFF
>                                  end)

Ok, I tried:

 Implementation                  | Test1 | Test2
---------------------------------+-------+-------
 Hashtbl.t                       | 7.40s | 6.45s
 Hashtbl.Make(...)               | 3.62s | 5.35s
 hashtbl.ml specialized for ints | 2.37s | 5.00s

Test1 is a small program which does nothing but dictionary lookups.
Test2 is a real program where I use dictionaries.

It happens that

    let equal = (==)

    let equal (x : int) (y : int) = x = y

are fast, whereas

    let equal x y = x = y
      (* module constrained by Hashtbl.HashedType with type t = int *)

    let equal : int -> int -> bool = (=)

are slow. The compiler does not use the specialized integer equality
if (=) is not immediately applied, or if its type is constrained only
by a module signature.
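The two kinds of key modules described above can be contrasted side by side. In this sketch FastKey annotates the arguments of equal, while SlowKey gets its int type only from the Hashtbl.HashedType signature. Both behave identically; the speed difference is the one measured in this message with the compiler of that time, and newer OCaml versions may optimize differently.

```ocaml
(* Equality annotated at int: the comparison can be specialized to ints. *)
module FastKey = struct
  type t = int
  let equal (x : int) (y : int) = x = y
  let hash x = x land 0x3FFFFFFF
end

(* Here [equal] is first generalized to 'a -> 'a -> bool; the signature
   only restricts its type afterwards, so the generic comparison is used. *)
module SlowKey : Hashtbl.HashedType with type t = int = struct
  type t = int
  let equal x y = x = y
  let hash x = x land 0x3FFFFFFF
end

module FastTbl = Hashtbl.Make (FastKey)
module SlowTbl = Hashtbl.Make (SlowKey)
```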

I'm going to use the functorial version: a 7% loss of performance
relative to the specialized version (5.35s vs 5.00s in Test2) is acceptable.

-- 
 __("<  Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
 \__/
  ^^
QRCZAK

* Re: [Caml-list] Specialized dictionaries
  2001-11-05 10:32 ` Jean-Christophe Filliatre
@ 2001-11-05 17:36   ` Florian Hars
  2001-11-05 17:54     ` Sven
       [not found]   ` <9s6j7c$i6r$1@qrnik.zagroda>
  1 sibling, 1 reply; 12+ messages in thread
From: Florian Hars @ 2001-11-05 17:36 UTC
  To: Jean-Christophe Filliatre; +Cc: Marcin 'Qrczak' Kowalczyk, caml-list

On Mon, Nov 05, 2001 at 11:32:51AM +0100, Jean-Christophe Filliatre wrote:
> Marcin 'Qrczak' Kowalczyk writes:
>  > I need dictionaries indexed by ints which must be very fast.
> 
> To be even more efficient, I'm afraid you have to follow your idea,
> that is, to inline this hash function in your own copy of hashtbl.ml.

Wouldn't the Patricia Trees (from the 
"the-name-of-the-author-currently-escapes-me"-department :-)) mentioned on 
http://www.lri.fr/~filliatr/software.en.html be useful in this case (unless 
the problem needs the in-place update available with Hashtbl)? 
The documentation claims that "The
    performances are always better than the standard library's module
    [Set], except for linear insertion (building a set by insertion of
    consecutive integers)."
Or is Hashtbl faster still?
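To make the suggestion concrete, here is a minimal sketch of a little-endian Patricia tree keyed by ints, following the scheme described by Okasaki and Gill ("Fast Mergeable Integer Maps", 1998). It is not the Ptmap code from the page above; the type and function names are illustrative. Lookup follows one branching bit per level and compares keys only at the leaf, and add returns a new tree, leaving the old one intact.

```ocaml
(* Little-endian Patricia tree with int keys (illustrative sketch). *)
type 'a t =
  | Empty
  | Leaf of int * 'a
  | Branch of int * int * 'a t * 'a t
      (* prefix, branching bit, subtree where that bit is 0, where it is 1 *)

let zero_bit k m = k land m = 0

(* Keep only the bits of [k] strictly below the branching bit [m]. *)
let mask k m = k land (m - 1)
let match_prefix k p m = mask k m = p
let lowest_bit x = x land (-x)

(* Combine two trees whose prefixes p0 and p1 differ. *)
let join p0 t0 p1 t1 =
  let m = lowest_bit (p0 lxor p1) in
  if zero_bit p0 m then Branch (mask p0 m, m, t0, t1)
  else Branch (mask p0 m, m, t1, t0)

let rec find k = function
  | Empty -> raise Not_found
  | Leaf (j, v) -> if j = k then v else raise Not_found
  | Branch (_, m, l, r) -> find k (if zero_bit k m then l else r)

let mem k t = try ignore (find k t); true with Not_found -> false

(* Persistent insertion: returns a new tree, the old one is unchanged. *)
let add k v t =
  let rec ins = function
    | Empty -> Leaf (k, v)
    | Leaf (j, _) as leaf ->
        if j = k then Leaf (k, v) else join k (Leaf (k, v)) j leaf
    | Branch (p, m, t0, t1) as br ->
        if match_prefix k p m then
          (if zero_bit k m then Branch (p, m, ins t0, t1)
           else Branch (p, m, t0, ins t1))
        else join k (Leaf (k, v)) p br
  in
  ins t
```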

Yours, Florian Hars.

* Re: [Caml-list] Specialized dictionaries
  2001-11-05 17:36   ` Florian Hars
@ 2001-11-05 17:54     ` Sven
  0 siblings, 0 replies; 12+ messages in thread
From: Sven @ 2001-11-05 17:54 UTC
  To: Florian Hars
  Cc: Jean-Christophe Filliatre, Marcin 'Qrczak' Kowalczyk, caml-list

On Mon, Nov 05, 2001 at 06:36:54PM +0100, Florian Hars wrote:
> On Mon, Nov 05, 2001 at 11:32:51AM +0100, Jean-Christophe Filliatre wrote:
> > Marcin 'Qrczak' Kowalczyk writes:
> >  > I need dictionaries indexed by ints which must be very fast.
> > 
> > To be even more efficient, I'm afraid you have to follow your idea,
> > that is, to inline this hash function in your own copy of hashtbl.ml.
> 
> Wouldn't the Patricia Trees (from the 
> "the-name-of-the-author-currently-escapes-me"-department :-)) mentioned on 
> http://www.lri.fr/~filliatr/software.en.html be useful in this case (unless 
> the problem needs the in-place update available with Hashtbl)? 
> The documentation claims that "The
>     performances are always better than the standard library's module
>     [Set], except for linear insertion (building a set by insertion of
>     consecutive integers)."

The standard library [Set] is a functional balanced binary tree (not a
B-tree). It is quite fast, but depending on the application it will not be
faster than a hash table; that's why we have hash tables.

Sven Luther

* Re: [Caml-list] Specialized dictionaries
       [not found]   ` <9s6j7c$i6r$1@qrnik.zagroda>
@ 2001-11-05 18:18     ` Marcin 'Qrczak' Kowalczyk
  2001-11-05 18:24       ` Nicolas George
       [not found]       ` <9s6m53$k16$1@qrnik.zagroda>
  0 siblings, 2 replies; 12+ messages in thread
From: Marcin 'Qrczak' Kowalczyk @ 2001-11-05 18:18 UTC
  To: caml-list

Mon, 5 Nov 2001 18:36:54 +0100, Florian Hars <florian@hars.de> wrote:

> Wouldn't the Patricia Trees (from the 
> "the-name-of-the-author-currently-escapes-me"-department :-)) mentioned on 
> http://www.lri.fr/~filliatr/software.en.html be useful in this case (unless
> the problem needs the in-place update available with Hashtbl)? 

Hey, it's faster! One program runs in 4.4s instead of 5.3s. Thanks!

I'm using these dictionaries for dispatching on types in a dynamically
typed language compiled to OCaml. So updates are rare, dictionaries are
small and they contain small integers, but lookups are very frequent.

There are also rarely used dictionaries indexed by pairs of integers
and Hashtbl should be OK for them.
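If the pair-keyed dictionaries ever turn out to matter, the same functor trick applies. A sketch, assuming keys of type int * int; the combining constant 65599 is an arbitrary choice for illustration, and the generic Hashtbl.t accepts such pair keys directly as well.

```ocaml
(* Hash table specialized to pairs of ints (hypothetical sketch). *)
module PairHashtbl = Hashtbl.Make (struct
  type t = int * int
  (* Compare components as ints, avoiding the generic structural equality. *)
  let equal ((a1, b1) : t) ((a2, b2) : t) = a1 = a2 && b1 = b2
  (* Mix the components; 65599 is an arbitrary odd multiplier. *)
  let hash ((a, b) : t) = (a * 65599 + b) land max_int
end)
```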

-- 
 __("<  Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
 \__/
  ^^
QRCZAK

* Re: [Caml-list] Specialized dictionaries
  2001-11-05 18:18     ` Marcin 'Qrczak' Kowalczyk
@ 2001-11-05 18:24       ` Nicolas George
       [not found]       ` <9s6m53$k16$1@qrnik.zagroda>
  1 sibling, 0 replies; 12+ messages in thread
From: Nicolas George @ 2001-11-05 18:24 UTC
  To: caml-list

On quintidi 15 Brumaire, Year CCX, Marcin 'Qrczak' Kowalczyk wrote:
>				    So updates are rare, dictionaries are
> small and they contain small integers, but lookups are very frequent.

What about using a simple array for that?

* Re: [Caml-list] Specialized dictionaries
       [not found]       ` <9s6m53$k16$1@qrnik.zagroda>
@ 2001-11-05 20:56         ` Marcin 'Qrczak' Kowalczyk
  2001-11-06  6:53           ` Sven
  2001-11-06  0:35         ` Marcin 'Qrczak' Kowalczyk
  1 sibling, 1 reply; 12+ messages in thread
From: Marcin 'Qrczak' Kowalczyk @ 2001-11-05 20:56 UTC
  To: caml-list

Mon, 5 Nov 2001 19:24:06 +0100, Nicolas George <nicolas.george@ens.fr> wrote:

>>				    So updates are rare, dictionaries are
>> small and they contain small integers, but lookups are very frequent.
> 
> What about using a simple array for that?

They usually contain small integers, but in theory these integers
can become large. If many types are created in a program, it would
be wasteful to allocate a large array for each dispatched function
that is used with only a single type with a large number.

Perhaps some heuristic could use an array for the initial segment
of numbers (which correspond to types created earlier) and another
dictionary for the rest, but that would complicate something I'm
doing purely for fun and to keep simple. More importantly, small
differences such as loading modules in a different order could have
large effects; I don't like treating old types and young types in
very different ways.
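A minimal sketch of the heuristic just described, under stated assumptions: keys below a hypothetical threshold live in an array of options, and every other key goes into an int-specialized Hashtbl. The names create, replace, find and threshold are illustrative.

```ocaml
(* Hybrid table (hypothetical sketch): an array for keys in
   [0, threshold), an int-specialized hash table for all other keys. *)
module IntHashtbl = Hashtbl.Make (struct
  type t = int
  let equal (x : int) (y : int) = x = y
  let hash x = x land max_int
end)

type 'a t = {
  small : 'a option array;    (* direct slots, indexed by the key *)
  large : 'a IntHashtbl.t;    (* overflow for negative or large keys *)
}

let create threshold =
  { small = Array.make threshold None; large = IntHashtbl.create 16 }

let in_small t k = k >= 0 && k < Array.length t.small

let replace t k v =
  if in_small t k then t.small.(k) <- Some v
  else IntHashtbl.replace t.large k v

let find t k =
  if in_small t k then
    (match t.small.(k) with Some v -> v | None -> raise Not_found)
  else IntHashtbl.find t.large k
```

The drawback mentioned above is visible here: whether a key takes the fast path depends only on its numeric value, so types numbered late always pay for the hash lookup.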

I've heard about packing multiple dispatch tables in a large array.
Well, it's complicated, and it's hard to perform dynamic updates if
slots are used by different functions. Updates are rare but they do
occur - for example if a dispatched function is used at a type for
the first time and the implementation was found at its supertype.

I don't know...

-- 
 __("<  Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
 \__/
  ^^
QRCZAK

* Re: [Caml-list] Specialized dictionaries
  2001-11-05 10:06 [Caml-list] Specialized dictionaries Marcin 'Qrczak' Kowalczyk
                   ` (2 preceding siblings ...)
       [not found] ` <9s5pe7$5k6$1@qrnik.zagroda>
@ 2001-11-05 23:40 ` Julian Assange
  3 siblings, 0 replies; 12+ messages in thread
From: Julian Assange @ 2001-11-05 23:40 UTC
  To: Marcin 'Qrczak' Kowalczyk; +Cc: caml-list

Do these dictionaries change? If not, you could consider searching for
a perfect hash function.
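A minimal sketch of one simple way to get a perfect hash for a fixed set of keys, assuming distinct nonnegative ints: search for the smallest modulus m such that k mod m differs for every key, then store the bindings in arrays of size m. This is an illustrative scheme, not a particular published algorithm, and in the worst case m grows to the largest key plus one.

```ocaml
(* Smallest m >= number of keys such that [k mod m] is injective on [keys].
   Assumes the keys are distinct and nonnegative, so m = max key + 1
   always works and the search terminates. *)
let find_modulus keys =
  let rec try_m m =
    let seen = Array.make m false in
    let injective =
      List.for_all
        (fun k ->
          let i = k mod m in
          if seen.(i) then false
          else (seen.(i) <- true; true))
        keys
    in
    if injective then m else try_m (m + 1)
  in
  try_m (max (List.length keys) 1)

type 'a table = {
  m : int;
  slot_key : int array;          (* -1 marks an empty slot *)
  slot_val : 'a option array;
}

let build (bindings : (int * 'a) list) : 'a table =
  let keys = List.sort_uniq compare (List.map fst bindings) in
  if List.exists (fun k -> k < 0) keys then invalid_arg "build: negative key";
  let m = find_modulus keys in
  let slot_key = Array.make m (-1) and slot_val = Array.make m None in
  List.iter
    (fun (k, v) ->
      slot_key.(k mod m) <- k;
      slot_val.(k mod m) <- Some v)
    bindings;
  { m; slot_key; slot_val }

(* One modulo and one comparison: no collision chain to walk. *)
let find t k =
  if k < 0 then raise Not_found
  else
    let i = k mod t.m in
    match t.slot_val.(i) with
    | Some v when t.slot_key.(i) = k -> v
    | _ -> raise Not_found
```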

--
 Julian Assange        |If you want to build a ship, don't drum up people
                       |together to collect wood or assign them tasks and
 proff@iq.org          |work, but rather teach them to long for the endless
 proff@gnu.ai.mit.edu  |immensity of the sea. -- Antoine de Saint Exupery

* Re: [Caml-list] Specialized dictionaries
       [not found]       ` <9s6m53$k16$1@qrnik.zagroda>
  2001-11-05 20:56         ` Marcin 'Qrczak' Kowalczyk
@ 2001-11-06  0:35         ` Marcin 'Qrczak' Kowalczyk
  1 sibling, 0 replies; 12+ messages in thread
From: Marcin 'Qrczak' Kowalczyk @ 2001-11-06  0:35 UTC
  To: caml-list

Mon, 5 Nov 2001 19:24:06 +0100, Nicolas George <nicolas.george@ens.fr> wrote:

>>				    So updates are rare, dictionaries are
>> small and they contain small integers, but lookups are very frequent.
> 
> What about using a simple array for that?

I tested how fast an 'a option array is, allocated big enough for the
test and with bounds checking disabled. Surprisingly, it's only 5%
faster than the Patricia tree, measured on the whole program, which does
many lookups but also other things like computing Fibonacci numbers.

The difference would obviously be larger if the actual dictionaries had
more entries (the ones I used happened to have 10, 2 and 3), but now
I feel that this part is optimized enough.

Here is the mutable version of Ptmap I'm using:

module Typetbl =
  struct
    type 'a t = 'a Ptmap.t ref
    let create _ = ref Ptmap.empty
    let add dict k v = dict := Ptmap.add k v (!dict)
    let find dict k = Ptmap.find k (!dict)
    let mem dict k = Ptmap.mem k (!dict)
    let replace dict k v = dict := Ptmap.add k v (!dict)
    let clear dict = dict := Ptmap.empty
  end

And here is the quick & dirty array wrapper:

module Typetbl =
  struct
    type 'a t = 'a option array
    let create _ = Array.make 100 None
    let add dict k v = dict.(k) <- Some v
    let find dict k = match dict.(k) with
      | Some v -> v
      | None -> raise Not_found
    let mem dict k = match dict.(k) with
      | Some _ -> true
      | None -> false
    let replace dict k v = dict.(k) <- Some v
    let clear dict = for i = 0 to 99 do dict.(i) <- None done
  end

-- 
 __("<  Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
 \__/
  ^^
QRCZAK

* Re: [Caml-list] Specialized dictionaries
  2001-11-05 20:56         ` Marcin 'Qrczak' Kowalczyk
@ 2001-11-06  6:53           ` Sven
  0 siblings, 0 replies; 12+ messages in thread
From: Sven @ 2001-11-06  6:53 UTC
  To: Marcin 'Qrczak' Kowalczyk; +Cc: caml-list

On Mon, Nov 05, 2001 at 08:56:40PM +0000, Marcin 'Qrczak' Kowalczyk wrote:
> Mon, 5 Nov 2001 19:24:06 +0100, Nicolas George <nicolas.george@ens.fr> wrote:
> 
> >>				    So updates are rare, dictionaries are
> >> small and they contain small integers, but lookups are very frequent.
> > 
> > What about using a simple array for that?
> 
> They usually contain small integers, but in theory these integers
> can become large. If many types are created in a program, it would
> be wasteful to allocate a large array for each dispatched function
> that is used with only a single type with a large number.
> 
> Perhaps some heuristic could use an array for the initial segment
> of numbers (which correspond to types created earlier) and another
> dictionary for the rest, but that would complicate something I'm
> doing purely for fun and to keep simple. More importantly, small
> differences such as loading modules in a different order could have
> large effects; I don't like treating old types and young types in
> very different ways.
> 
> I've heard about packing multiple dispatch tables in a large array.
> Well, it's complicated, and it's hard to perform dynamic updates if
> slots are used by different functions. Updates are rare but they do
> occur - for example if a dispatched function is used at a type for
> the first time and the implementation was found at its supertype.

What about using a datatype with several arrays, each holding a maximum
number of entries, and then having a series of such arrays, or an array of
arrays? You would just need a division and a modulo operation to find the
right array and get the value. If you pick the right maximum (a power of
two), you could even get away with bit shifts, which are not so expensive,
and two indirections instead of one.

If you do it right, you could even have the datatype grow incrementally based
on your needs. That will only work well if your numbers are contiguous, though.
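A minimal sketch of this two-level scheme, assuming nonnegative int keys: with 64-entry chunks, the chunk index is k lsr 6 and the offset is k land 63, so a lookup costs two array indirections and no division. Chunks are allocated lazily and the directory doubles when a key falls beyond it; chunk_size and the function names are illustrative choices.

```ocaml
(* Two-level array table (hypothetical sketch); keys must be >= 0. *)
let bits = 6
let chunk_size = 1 lsl bits          (* 64 slots per chunk *)

(* Directory of lazily allocated chunks; each chunk is an option array. *)
type 'a t = { mutable dir : 'a option array option array }

let create () = { dir = Array.make 4 None }

let set t k v =
  if k < 0 then invalid_arg "set: negative key";
  let c = k lsr bits in
  if c >= Array.length t.dir then begin
    (* Grow the directory by doubling until chunk c fits. *)
    let len = ref (Array.length t.dir) in
    while c >= !len do len := 2 * !len done;
    let ndir = Array.make !len None in
    Array.blit t.dir 0 ndir 0 (Array.length t.dir);
    t.dir <- ndir
  end;
  let chunk =
    match t.dir.(c) with
    | Some ch -> ch
    | None ->
        let ch = Array.make chunk_size None in
        t.dir.(c) <- Some ch;
        ch
  in
  chunk.(k land (chunk_size - 1)) <- Some v

let find t k =
  if k < 0 || k lsr bits >= Array.length t.dir then raise Not_found
  else
    match t.dir.(k lsr bits) with
    | None -> raise Not_found
    | Some ch ->
        (match ch.(k land (chunk_size - 1)) with
         | Some v -> v
         | None -> raise Not_found)
```

Lazy chunk allocation softens the contiguity requirement somewhat, but the directory still grows with the largest key.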

That said, I had the impression that, as OCaml is optimized for functional
datatypes, using a functional datatype would be friendlier to the GC, and
thus maybe faster.

Friendly,

Sven Luther

end of thread

Thread overview: 12+ messages
2001-11-05 10:06 [Caml-list] Specialized dictionaries Marcin 'Qrczak' Kowalczyk
2001-11-05 10:19 ` Xavier Leroy
2001-11-05 10:32 ` Jean-Christophe Filliatre
2001-11-05 17:36   ` Florian Hars
2001-11-05 17:54     ` Sven
     [not found]   ` <9s6j7c$i6r$1@qrnik.zagroda>
2001-11-05 18:18     ` Marcin 'Qrczak' Kowalczyk
2001-11-05 18:24       ` Nicolas George
     [not found]       ` <9s6m53$k16$1@qrnik.zagroda>
2001-11-05 20:56         ` Marcin 'Qrczak' Kowalczyk
2001-11-06  6:53           ` Sven
2001-11-06  0:35         ` Marcin 'Qrczak' Kowalczyk
     [not found] ` <9s5pe7$5k6$1@qrnik.zagroda>
2001-11-05 11:49   ` Marcin 'Qrczak' Kowalczyk
2001-11-05 23:40 ` Julian Assange