caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Xavier Leroy <xavier.leroy@inria.fr>
To: "Marcin 'Qrczak' Kowalczyk" <qrczak@knm.org.pl>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Hashtbl and max_len
Date: Mon, 19 Nov 2001 16:34:39 +0100	[thread overview]
Message-ID: <20011119163439.C7102@pauillac.inria.fr> (raw)
In-Reply-To: <slrn9vfiul.rkm.qrczak@qrnik.zagroda>; from qrczak@knm.org.pl on Sun, Nov 18, 2001 at 02:55:17PM +0000

> The initial_size argument of Hashtbl.create has much bigger impact on
> efficiency than I thought.
> [...]
> IMHO max_len shouldn't grow that much. Why does it grow at all when
> the current size is not near Sys.max_array_length?

The original strategy was indeed to bound the length of buckets by a
constant, and resize when a bucket exceeds this length.  However, this
results in excessive resizing when the hashing causes collisions, or
(worse!) the same key is repeatedly inserted in the hash table.
Hence, as a stopgap, we decided to increase the maximal bucket length
at each resizing.

There are two problems with this strategy:
- as you noticed, this causes the initial size to impact performance
  significantly;
- checking for long buckets becomes linear in the size of the table,
  resulting in quadratic behavior for long sequences of insertions.

The latter problem was the last nail in the coffin of this
implementation.  In the working sources, we now keep track of the size
(total number of elements inserted in the table) and base resizing
decisions on this.  In case the hashing is good (all buckets have
approximately the same length), this results in constant length for
buckets.  If the hashing is not good, the performances degrade more
gracefully than with the old scheme.

- Xavier Leroy

PS.  Thanks for pointing out the typos in the working implementation.
The dangers of cut-and-paste programming...
-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr  Archives: http://caml.inria.fr


      parent reply	other threads:[~2001-11-19 15:34 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2001-11-18 14:55 Marcin 'Qrczak' Kowalczyk
2001-11-18 17:01 ` [Caml-list] " Marcin 'Qrczak' Kowalczyk
2001-11-18 17:19   ` Marcin 'Qrczak' Kowalczyk
2001-11-19 15:34 ` Xavier Leroy [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20011119163439.C7102@pauillac.inria.fr \
    --to=xavier.leroy@inria.fr \
    --cc=caml-list@inria.fr \
    --cc=qrczak@knm.org.pl \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).