caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] Hash consed Patricia trees
@ 2016-05-23 14:33 Neuhaeusser, Martin
  2016-05-23 14:49 ` Simon Cruanes
  2016-05-25 12:29 ` Boris Yakobowski
  0 siblings, 2 replies; 5+ messages in thread
From: Neuhaeusser, Martin @ 2016-05-23 14:33 UTC (permalink / raw)
  To: caml-list

Dear all,

during some experiments with integer set implementations, I came across a discussion on that list that proposed to use Patricia trees and hash consing on the tree nodes' constructors to achieve maximal sharing:
http://caml.inria.fr/pub/ml-archives/caml-list/2008/03/5be97d51e2e8aab16b9e7e369a5a5533.en.html

Is anyone aware of a corresponding implementation that also has a performance benefit (or, at least, no negative performance impact) compared to standard sets or to non-hash consed Patricia trees? Or is anyone aware of a paper on that matter? 

Sadly, in all my experiments, the combination of Patricia trees with hash consing applied to the terms representing the tree has a horrible impact on performance (a slowdown by an order of magnitude). After spending some thoughts, this seems to be reasonable given the structure of a Patricia tree. In particular, we found no way to make significand use of the reflexivity properties obtained by hash consing in set operations like subset or union. In our benchmarks, the time for constructing hash-consed subtrees during set operations outweighs any gains obtained by the "physical equality = set equality" property. Or is the whole point in the earlier discussion the possibility to use hash consing tags for memoization of set operations? 

Any hints and comments are highly appreciated. It would really be great if some of the participants from the 2008 discussion could perhaps share their experience.

Best regards,
Martin

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] Hash consed Patricia trees
  2016-05-23 14:33 [Caml-list] Hash consed Patricia trees Neuhaeusser, Martin
@ 2016-05-23 14:49 ` Simon Cruanes
  2016-05-25 12:29 ` Boris Yakobowski
  1 sibling, 0 replies; 5+ messages in thread
From: Simon Cruanes @ 2016-05-23 14:49 UTC (permalink / raw)
  To: Neuhaeusser, Martin; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 2501 bytes --]

Hi,

I toyed with an example of that some time ago, implementing a set of
integers with hashconsing of every sub-tree:
https://github.com/c-cube/ocaml-containers/blob/master/src/data/CCHashconsedSet.mli
https://github.com/c-cube/ocaml-containers/blob/master/src/data/CCHashconsedSet.ml
(there are some tests, but for serious use it might be nice to have more
tests).

As far as I can tell after years spent using hashconsed terms,
hashconsing is only useful for quite specific cases. You need to "read"
(typically, use the unique integer, physically compare, etc.) a LOT more
than you "write" (create new structures); or you might want the
reduction in memory if many similar objects are built. In this
particular case, I think it would be worth it if set comparison was
extremely frequent.

Cheers!

Le Mon, 23 May 2016, Neuhaeusser, Martin a écrit :
> Dear all,
> 
> during some experiments with integer set implementations, I came across a discussion on that list that proposed to use Patricia trees and hash consing on the tree nodes' constructors to achieve maximal sharing:
> http://caml.inria.fr/pub/ml-archives/caml-list/2008/03/5be97d51e2e8aab16b9e7e369a5a5533.en.html
> 
> Is anyone aware of a corresponding implementation that also has a performance benefit (or, at least, no negative performance impact) compared to standard sets or to non-hash consed Patricia trees? Or is anyone aware of a paper on that matter? 
> 
> Sadly, in all my experiments, the combination of Patricia trees with hash consing applied to the terms representing the tree has a horrible impact on performance (a slowdown by an order of magnitude). After spending some thoughts, this seems to be reasonable given the structure of a Patricia tree. In particular, we found no way to make significand use of the reflexivity properties obtained by hash consing in set operations like subset or union. In our benchmarks, the time for constructing hash-consed subtrees during set operations outweighs any gains obtained by the "physical equality = set equality" property. Or is the whole point in the earlier discussion the possibility to use hash consing tags for memoization of set operations? 
> 
> Any hints and comments are highly appreciated. It would really be great if some of the participants from the 2008 discussion could perhaps share their experience.


-- 
Simon Cruanes

http://weusepgp.info/
key 49AA62B6, fingerprint 949F EB87 8F06 59C6 D7D3  7D8D 4AC0 1D08 49AA 62B6

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] Hash consed Patricia trees
  2016-05-23 14:33 [Caml-list] Hash consed Patricia trees Neuhaeusser, Martin
  2016-05-23 14:49 ` Simon Cruanes
@ 2016-05-25 12:29 ` Boris Yakobowski
  2016-05-25 13:20   ` Francois Berenger
  1 sibling, 1 reply; 5+ messages in thread
From: Boris Yakobowski @ 2016-05-25 12:29 UTC (permalink / raw)
  To: Neuhaeusser, Martin; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 3393 bytes --]

Hi,

The Value Analysis plugin of Frama-C uses hash-consing of Patricial trees
extensively. In fact, some analyses would not run without it at all. See
Section 9 of cristal.inria.fr/~doligez/publications/cuoq-doligez-mlw-2008.ps
for more details. Unfortunately, as mentioned there, no figures exist for
with hash-consing vs. without hash-consing -- but most of the examples
would have failed without it.

Although I'm not sure what was implemented exactly at the time, one
important feature when using hash-consed Patricia trees is the possibility
of using caches. Alain mentioned this in this mail:

> Also, you get a nice unique integer for each tree. This allow you to
> memoize efficiently set operations (like union, intersection, for which
> you can use memoization in the inner loop, not only at toplevel), and to
> build sets of sets (and so on).

I should stress that the possibility of memoizing *in the inner loop*, is
crucial. When performing e.g. unions or map2 operations, it is possible to
return a result in constant time when either
- the two trees is equivalent (because e.g. union s s == s)
- the two trees have already been merged, and the result is in the cache.
In practice, most operations become O(D ln D), where D is the number of
differences between the two trees, or even O(1) if the cache is big enough
and the operations repetitive enough.

If this kind of caching may be useful to you, the files hptmap*.ml* of
Frama-C provides very nice iterators and abstractions.

HTH,


On Mon, May 23, 2016 at 4:33 PM, Neuhaeusser, Martin <
martin.neuhaeusser@siemens.com> wrote:

> Dear all,
>
> during some experiments with integer set implementations, I came across a
> discussion on that list that proposed to use Patricia trees and hash
> consing on the tree nodes' constructors to achieve maximal sharing:
>
> http://caml.inria.fr/pub/ml-archives/caml-list/2008/03/5be97d51e2e8aab16b9e7e369a5a5533.en.html
>
> Is anyone aware of a corresponding implementation that also has a
> performance benefit (or, at least, no negative performance impact) compared
> to standard sets or to non-hash consed Patricia trees? Or is anyone aware
> of a paper on that matter?
>
> Sadly, in all my experiments, the combination of Patricia trees with hash
> consing applied to the terms representing the tree has a horrible impact on
> performance (a slowdown by an order of magnitude). After spending some
> thoughts, this seems to be reasonable given the structure of a Patricia
> tree. In particular, we found no way to make significand use of the
> reflexivity properties obtained by hash consing in set operations like
> subset or union. In our benchmarks, the time for constructing hash-consed
> subtrees during set operations outweighs any gains obtained by the
> "physical equality = set equality" property. Or is the whole point in the
> earlier discussion the possibility to use hash consing tags for memoization
> of set operations?
>
> Any hints and comments are highly appreciated. It would really be great if
> some of the participants from the 2008 discussion could perhaps share their
> experience.
>
> Best regards,
> Martin
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa.inria.fr/sympa/arc/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs




-- 
Boris

[-- Attachment #2: Type: text/html, Size: 4589 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] Hash consed Patricia trees
  2016-05-25 12:29 ` Boris Yakobowski
@ 2016-05-25 13:20   ` Francois Berenger
  2016-05-25 19:25     ` Boris Yakobowski
  0 siblings, 1 reply; 5+ messages in thread
From: Francois Berenger @ 2016-05-25 13:20 UTC (permalink / raw)
  To: caml-list

On 25/05/2016 14:29, Boris Yakobowski wrote:
> Hi,
>
> The Value Analysis plugin of Frama-C uses hash-consing of Patricial
> trees extensively. In fact, some analyses would not run without it at
> all. See Section 9 of
> cristal.inria.fr/~doligez/publications/cuoq-doligez-mlw-2008.ps
> <http://cristal.inria.fr/~doligez/publications/cuoq-doligez-mlw-2008.ps>
> for more details. Unfortunately, as mentioned there, no figures exist
> for with hash-consing vs. without hash-consing -- but most of the
> examples would have failed without it.
>
> Although I'm not sure what was implemented exactly at the time, one
> important feature when using hash-consed Patricia trees is the
> possibility of using caches. Alain mentioned this in this mail:
>
>> Also, you get a nice unique integer for each tree. This allow you to
>> memoize efficiently set operations (like union, intersection, for which
>> you can use memoization in the inner loop, not only at toplevel), and to
>> build sets of sets (and so on).
>
> I should stress that the possibility of memoizing *in the inner loop*,
> is crucial. When performing e.g. unions or map2 operations, it is
> possible to return a result in constant time when either
> - the two trees is equivalent (because e.g. union s s == s)
> - the two trees have already been merged, and the result is in the cache.
> In practice, most operations become O(D ln D), where D is the number of
> differences between the two trees, or even O(1) if the cache is big
> enough and the operations repetitive enough.
>
> If this kind of caching may be useful to you, the files hptmap*.ml* of
> Frama-C provides very nice iterators and abstractions.

It might even be useful to have this data structure in opam provided as 
a standalone library.

> HTH,
>
>
> On Mon, May 23, 2016 at 4:33 PM, Neuhaeusser, Martin
> <martin.neuhaeusser@siemens.com <mailto:martin.neuhaeusser@siemens.com>>
> wrote:
>
>     Dear all,
>
>     during some experiments with integer set implementations, I came
>     across a discussion on that list that proposed to use Patricia trees
>     and hash consing on the tree nodes' constructors to achieve maximal
>     sharing:
>     http://caml.inria.fr/pub/ml-archives/caml-list/2008/03/5be97d51e2e8aab16b9e7e369a5a5533.en.html
>
>     Is anyone aware of a corresponding implementation that also has a
>     performance benefit (or, at least, no negative performance impact)
>     compared to standard sets or to non-hash consed Patricia trees? Or
>     is anyone aware of a paper on that matter?
>
>     Sadly, in all my experiments, the combination of Patricia trees with
>     hash consing applied to the terms representing the tree has a
>     horrible impact on performance (a slowdown by an order of
>     magnitude). After spending some thoughts, this seems to be
>     reasonable given the structure of a Patricia tree. In particular, we
>     found no way to make significand use of the reflexivity properties
>     obtained by hash consing in set operations like subset or union. In
>     our benchmarks, the time for constructing hash-consed subtrees
>     during set operations outweighs any gains obtained by the "physical
>     equality = set equality" property. Or is the whole point in the
>     earlier discussion the possibility to use hash consing tags for
>     memoization of set operations?
>
>     Any hints and comments are highly appreciated. It would really be
>     great if some of the participants from the 2008 discussion could
>     perhaps share their experience.
>
>     Best regards,
>     Martin
>
>     --
>     Caml-list mailing list.  Subscription management and archives:
>     https://sympa.inria.fr/sympa/arc/caml-list
>     Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>     Bug reports: http://caml.inria.fr/bin/caml-bugs
>
>
>
>
> --
> Boris

-- 
Regards,
Francois.
"When in doubt, use more types"

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] Hash consed Patricia trees
  2016-05-25 13:20   ` Francois Berenger
@ 2016-05-25 19:25     ` Boris Yakobowski
  0 siblings, 0 replies; 5+ messages in thread
From: Boris Yakobowski @ 2016-05-25 19:25 UTC (permalink / raw)
  To: Francois Berenger; +Cc: The Caml Mailing List

[-- Attachment #1: Type: text/plain, Size: 1255 bytes --]

On Wed, May 25, 2016 at 3:20 PM, Francois Berenger <
francois.berenger@inria.fr> wrote:

> On 25/05/2016 14:29, Boris Yakobowski wrote:
>
>> If this kind of caching may be useful to you, the files hptmap*.ml* of
>> Frama-C provides very nice iterators and abstractions.
>>
>
> It might even be useful to have this data structure in opam provided as a
> standalone library.
>

I agree, but the devil lurks in the details. Currently, these modules
depend on a Frama-C abstraction for (OCaml) types, which would need to be
exported as a library first. This is turns requires more things: Frama-C
logging infrastructure, Frama-C's notion of "projects", etc. Simply
parameterizing the library by all those modules would result in a functor
with maybe 8 or 9 arguments -- with only 3 being really "useful".

If there is a strong interest in this library, we can try to parameterize
the current functor by the few _functions_ that are really required, and
use a wrapper above this version in Frama-C. People at OCamlPro are
currently looking at the functionalities of the library for the
Secure-OCaml project. This may give us a further incentive.

(There is also the fact that we cannot resist slightly changing the APIs at
least once per version!)


-- 
Boris

[-- Attachment #2: Type: text/html, Size: 1930 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-05-25 19:26 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-23 14:33 [Caml-list] Hash consed Patricia trees Neuhaeusser, Martin
2016-05-23 14:49 ` Simon Cruanes
2016-05-25 12:29 ` Boris Yakobowski
2016-05-25 13:20   ` Francois Berenger
2016-05-25 19:25     ` Boris Yakobowski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).