Re: [Caml-list] Hash consed Patricia trees

From: Francois Berenger <francois.berenger@inria.fr>
To: caml-list@inria.fr
Subject: Re: [Caml-list] Hash consed Patricia trees
Date: Wed, 25 May 2016 15:20:03 +0200
Message-ID: <5745A683.2050108@inria.fr>
In-Reply-To: <CABbVA-Bn7CVFv77r6dMedVKTPR7HJEJ9pSGXJh_PwjEPMnq4gQ@mail.gmail.com>

On 25/05/2016 14:29, Boris Yakobowski wrote:
> Hi,
>
> The Value Analysis plugin of Frama-C uses hash-consing of Patricia
> trees extensively. In fact, some analyses would not run without it at
> all. See Section 9 of
> http://cristal.inria.fr/~doligez/publications/cuoq-doligez-mlw-2008.ps
> for more details. Unfortunately, as mentioned there, no figures
> comparing runs with and without hash-consing exist -- but most of the
> examples would have failed without it.
>
> Although I'm not sure what was implemented exactly at the time, one
> important feature when using hash-consed Patricia trees is the
> possibility of using caches. Alain mentioned this in this mail:
>
>> Also, you get a nice unique integer for each tree. This allows you to
>> memoize efficiently set operations (like union, intersection, for which
>> you can use memoization in the inner loop, not only at toplevel), and to
>> build sets of sets (and so on).
>
> I should stress that the possibility of memoizing *in the inner loop*
> is crucial. When performing e.g. unions or map2 operations, it is
> possible to return a result in constant time when either:
> - the two trees are equivalent (because e.g. union s s == s)
> - the two trees have already been merged, and the result is in the cache.
> In practice, most operations become O(D ln D), where D is the number of
> differences between the two trees, or even O(1) if the cache is big
> enough and the operations repetitive enough.
>
> If this kind of caching may be useful to you, the files hptmap*.ml* of
> Frama-C provide very nice iterators and abstractions.

It might even be useful to have this data structure provided in opam
as a standalone library.
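
For readers following the thread, here is a minimal sketch of the
inner-loop memoization Boris describes, applied to integer sets. It is
not the Frama-C hptmap code: all names (hashcons, cache, union_nodes,
...) are made up for illustration, the tree layout follows the usual
little-endian Patricia trees, and plain Hashtbl tables stand in for the
weak tables and bounded caches a real implementation would likely want.

```ocaml
(* Hash-consed little-endian Patricia trees over ints (illustrative sketch). *)
type t = { node : node; tag : int }      (* tag: unique integer per distinct tree *)
and node =
  | Empty
  | Leaf of int
  | Branch of int * int * t * t          (* prefix, branching bit, left, right *)

(* Hash-consing key: subtrees are identified by their tags. *)
type key = KEmpty | KLeaf of int | KBranch of int * int * int * int

(* Assumption: a strong table, so nodes are never reclaimed. *)
let table : (key, t) Hashtbl.t = Hashtbl.create 1024
let next_tag = ref 0

let hashcons key node =
  match Hashtbl.find_opt table key with
  | Some t -> t
  | None ->
    let t = { node; tag = !next_tag } in
    incr next_tag;
    Hashtbl.add table key t;
    t

let empty = hashcons KEmpty Empty
let leaf k = hashcons (KLeaf k) (Leaf k)
let branch p m l r =
  if l == empty then r
  else if r == empty then l
  else hashcons (KBranch (p, m, l.tag, r.tag)) (Branch (p, m, l, r))

let zero_bit k m = k land m = 0
let mask k m = k land (m - 1)
let match_prefix k p m = mask k m = p
let branching_bit p0 p1 = let x = p0 lxor p1 in x land (-x)

let join p0 t0 p1 t1 =
  let m = branching_bit p0 p1 in
  if zero_bit p0 m then branch (mask p0 m) m t0 t1
  else branch (mask p0 m) m t1 t0

let rec add k t =
  match t.node with
  | Empty -> leaf k
  | Leaf j -> if j = k then t else join k (leaf k) j t
  | Branch (p, m, l, r) ->
    if match_prefix k p m then
      if zero_bit k m then branch p m (add k l) r
      else branch p m l (add k r)
    else join k (leaf k) p t

(* Cache of already computed unions, keyed by the (ordered) pair of tags. *)
let cache : (int * int, t) Hashtbl.t = Hashtbl.create 1024

let rec union s t =
  if s == t then s                       (* union s s == s, constant time *)
  else if s == empty then t
  else if t == empty then s
  else
    let key = if s.tag <= t.tag then (s.tag, t.tag) else (t.tag, s.tag) in
    match Hashtbl.find_opt cache key with
    | Some r -> r                        (* this pair was merged before *)
    | None ->
      let r = union_nodes s t in
      Hashtbl.add cache key r;
      r

(* Recursive calls go back through [union], so the physical-equality test
   and the cache are consulted at every level, not only at the top. *)
and union_nodes s t =
  match s.node, t.node with
  | Empty, _ -> t
  | _, Empty -> s
  | Leaf k, _ -> add k t
  | _, Leaf k -> add k s
  | Branch (p, m, s0, s1), Branch (q, n, t0, t1) ->
    if m = n && p = q then branch p m (union s0 t0) (union s1 t1)
    else if m < n && match_prefix q p m then
      if zero_bit q m then branch p m (union s0 t) s1
      else branch p m s0 (union s1 t)
    else if m > n && match_prefix p q n then
      if zero_bit p n then branch q n (union s t0) t1
      else branch q n t0 (union s t1)
    else join p s q t

let of_list l = List.fold_left (fun acc k -> add k acc) empty l

let rec mem k t =
  match t.node with
  | Empty -> false
  | Leaf j -> j = k
  | Branch (_, m, l, r) -> mem k (if zero_bit k m then l else r)
```

With this sketch, of_list [1; 5; 9] and of_list [9; 5; 1] return the
same physical value, so set equality is just (==), and a second call to
union on the same pair of trees (in either order) is a single cache
lookup. Whether this pays for the cost of the hash-consing table will
depend on how repetitive the workload is.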

> HTH,
>
>
> On Mon, May 23, 2016 at 4:33 PM, Neuhaeusser, Martin
> <martin.neuhaeusser@siemens.com> wrote:
>
>     Dear all,
>
>     during some experiments with integer set implementations, I came
>     across a discussion on this list that proposed to use Patricia trees
>     and hash consing on the tree nodes' constructors to achieve maximal
>     sharing:
>     http://caml.inria.fr/pub/ml-archives/caml-list/2008/03/5be97d51e2e8aab16b9e7e369a5a5533.en.html
>
>     Is anyone aware of a corresponding implementation that also has a
>     performance benefit (or, at least, no negative performance impact)
>     compared to standard sets or to non-hash consed Patricia trees? Or
>     is anyone aware of a paper on that matter?
>
>     Sadly, in all my experiments, the combination of Patricia trees with
>     hash consing applied to the terms representing the tree has a
>     horrible impact on performance (a slowdown by an order of
>     magnitude). After some thought, this seems to be
>     reasonable given the structure of a Patricia tree. In particular, we
>     found no way to make significant use of the reflexivity properties
>     obtained by hash consing in set operations like subset or union. In
>     our benchmarks, the time for constructing hash-consed subtrees
>     during set operations outweighs any gains obtained by the "physical
>     equality = set equality" property. Or is the whole point in the
>     earlier discussion the possibility to use hash consing tags for
>     memoization of set operations?
>
>     Any hints and comments are highly appreciated. It would really be
>     great if some of the participants from the 2008 discussion could
>     perhaps share their experience.
>
>     Best regards,
>     Martin
>
> --
> Boris

-- 
Regards,
Francois.
"When in doubt, use more types"

Thread overview: 5+ messages
2016-05-23 14:33 Neuhaeusser, Martin
2016-05-23 14:49 ` Simon Cruanes
2016-05-25 12:29 ` Boris Yakobowski
2016-05-25 13:20   ` Francois Berenger [this message]
2016-05-25 19:25     ` Boris Yakobowski