caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Simon Cruanes <simon.cruanes.2007@m4x.org>
To: "Neuhaeusser, Martin" <martin.neuhaeusser@siemens.com>
Cc: "caml-list@inria.fr" <caml-list@inria.fr>
Subject: Re: [Caml-list] Hash consed Patricia trees
Date: Mon, 23 May 2016 16:49:44 +0200	[thread overview]
Message-ID: <20160523144943.GV25494@nunchakus.loria.fr> (raw)
In-Reply-To: <965631B03C670145ABB9F693E51622530D1279D7@DENBGAT9EK5MSX.ww902.siemens.net>

[-- Attachment #1: Type: text/plain, Size: 2501 bytes --]

Hi,

I toyed with an example of that some time ago, implementing a set of
integers with hashconsing of every sub-tree:
https://github.com/c-cube/ocaml-containers/blob/master/src/data/CCHashconsedSet.mli
https://github.com/c-cube/ocaml-containers/blob/master/src/data/CCHashconsedSet.ml
(there are some tests, but for serious use it might be nice to have more
tests).

As far as I can tell after years spent using hashconsed terms,
hashconsing is only useful for quite specific cases. You need to "read"
(typically, use the unique integer, physically compare, etc.) a LOT more
than you "write" (create new structures); or you might want the
reduction in memory if many similar objects are built. In this
particular case, I think it would be worth it if set comparison was
extremely frequent.

Cheers!

Le Mon, 23 May 2016, Neuhaeusser, Martin a écrit :
> Dear all,
> 
> during some experiments with integer set implementations, I came across a discussion on that list that proposed to use Patricia trees and hash consing on the tree nodes' constructors to achieve maximal sharing:
> http://caml.inria.fr/pub/ml-archives/caml-list/2008/03/5be97d51e2e8aab16b9e7e369a5a5533.en.html
> 
> Is anyone aware of a corresponding implementation that also has a performance benefit (or, at least, no negative performance impact) compared to standard sets or to non-hash consed Patricia trees? Or is anyone aware of a paper on that matter? 
> 
> Sadly, in all my experiments, the combination of Patricia trees with hash consing applied to the terms representing the tree has a horrible impact on performance (a slowdown by an order of magnitude). After spending some thoughts, this seems to be reasonable given the structure of a Patricia tree. In particular, we found no way to make significand use of the reflexivity properties obtained by hash consing in set operations like subset or union. In our benchmarks, the time for constructing hash-consed subtrees during set operations outweighs any gains obtained by the "physical equality = set equality" property. Or is the whole point in the earlier discussion the possibility to use hash consing tags for memoization of set operations? 
> 
> Any hints and comments are highly appreciated. It would really be great if some of the participants from the 2008 discussion could perhaps share their experience.


-- 
Simon Cruanes

http://weusepgp.info/
key 49AA62B6, fingerprint 949F EB87 8F06 59C6 D7D3  7D8D 4AC0 1D08 49AA 62B6

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

  reply	other threads:[~2016-05-23 14:49 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-23 14:33 Neuhaeusser, Martin
2016-05-23 14:49 ` Simon Cruanes [this message]
2016-05-25 12:29 ` Boris Yakobowski
2016-05-25 13:20   ` Francois Berenger
2016-05-25 19:25     ` Boris Yakobowski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160523144943.GV25494@nunchakus.loria.fr \
    --to=simon.cruanes.2007@m4x.org \
    --cc=caml-list@inria.fr \
    --cc=martin.neuhaeusser@siemens.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).