caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Re: [Caml-list] Why AVL-tree?
@ 2014-06-02 13:21 Damien Guichard
  2014-06-02 13:34 ` Andrew Herron
  0 siblings, 1 reply; 17+ messages in thread
From: Damien Guichard @ 2014-06-02 13:21 UTC (permalink / raw)
  To: Caml List

[-- Attachment #1: Type: text/plain, Size: 545 bytes --]

 
Red-black tree would spare a machine word per node, because a red-black tree
doesn't need depth information. 
Hence the reason is either historical or a space/speed trade-off (comparing
two depths may be faster than pattern matching). 
 
Regards,
 
damien guichard 
Hi, list, 


Just from the curiosity, why balanced binary trees used in Set and Map are
AVL-trees, not their alternative, say, red-black trees?  Is there a deep
reason for it, or just a historical one?


Best,
-- 

Yoriyuki Yamagata
http://yoriyuki.info/

[-- Attachment #2: Type: text/html, Size: 1539 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Why AVL-tree?
  2014-06-02 13:21 [Caml-list] Why AVL-tree? Damien Guichard
@ 2014-06-02 13:34 ` Andrew Herron
  2014-06-02 15:06   ` Gabriel Scherer
  2014-06-02 16:57   ` Xavier Leroy
  0 siblings, 2 replies; 17+ messages in thread
From: Andrew Herron @ 2014-06-02 13:34 UTC (permalink / raw)
  To: Damien Guichard; +Cc: Caml List

[-- Attachment #1: Type: text/plain, Size: 1244 bytes --]

Wikipedia has some notes on the difference:


http://en.wikipedia.org/wiki/AVL_tree




AVL has faster lookup, so maybe they decided to optimise for that.




It's different to some other languages I've seen, but then so is their decision to not use a tail recursive List.map. Each to their own, it's not hard to implement the alternative :)

On Mon, Jun 2, 2014 at 11:21 PM, Damien Guichard <alphablock@orange.fr>
wrote:

>  
> Red-black tree would spare a machine word per node, because a red-black tree
> doesn't need depth information. 
> Hence the reason is either historical or a space/speed trade-off (comparing
> two depths may be faster than pattern matching). 
>  
> Regards,
>  
> damien guichard 
> Hi, list, 
> Just from the curiosity, why balanced binary trees used in Set and Map are
> AVL-trees, not their alternative, say, red-black trees?  Is there a deep
> reason for it, or just a historical one?
> Best,
> -- 
> Yoriyuki Yamagata
> http://yoriyuki.info/
> -- 
> Caml-list mailing list.  Subscription management and archives:
> https://sympa.inria.fr/sympa/arc/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs

[-- Attachment #2: Type: text/html, Size: 2110 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Why AVL-tree?
  2014-06-02 13:34 ` Andrew Herron
@ 2014-06-02 15:06   ` Gabriel Scherer
  2014-06-03 12:48     ` Yaron Minsky
  2014-06-02 16:57   ` Xavier Leroy
  1 sibling, 1 reply; 17+ messages in thread
From: Gabriel Scherer @ 2014-06-02 15:06 UTC (permalink / raw)
  To: Andrew Herron; +Cc: Damien Guichard, Caml List

Note that OCaml's balanced trees are not exactly what is usually
called AVL, as the imbalance between different branches can be at most
2 (+1 on one side and -1 on the other) instead of just 1 as the
traditional definition assumes.

On Mon, Jun 2, 2014 at 3:34 PM, Andrew Herron <andrew.herron@gmail.com> wrote:
> Wikipedia has some notes on the difference:
>
> http://en.wikipedia.org/wiki/AVL_tree
>
> AVL has faster lookup, so maybe they decided to optimise for that.
>
> It's different to some other languages I've seen, but then so is their
> decision to not use a tail recursive List.map. Each to their own, it's not
> hard to implement the alternative :)
>
>
> On Mon, Jun 2, 2014 at 11:21 PM, Damien Guichard <alphablock@orange.fr>
> wrote:
>>
>>
>> Red-black tree would spare a machine word per node, because a red-black
>> tree doesn't need depth information.
>> Hence the reason is either historical or a space/speed trade-off
>> (comparing two depths may be faster than pattern matching).
>>
>> Regards,
>>
>> damien guichard
>>
>> Hi, list,
>>
>> Just from the curiosity, why balanced binary trees used in Set and Map are
>> AVL-trees, not their alternative, say, red-black trees?  Is there a deep
>> reason for it, or just a historical one?
>>
>> Best,
>> --
>> Yoriyuki Yamagata
>> http://yoriyuki.info/
>>
>>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Why AVL-tree?
  2014-06-02 13:34 ` Andrew Herron
  2014-06-02 15:06   ` Gabriel Scherer
@ 2014-06-02 16:57   ` Xavier Leroy
  2014-06-02 21:16     ` Andrew Herron
                       ` (2 more replies)
  1 sibling, 3 replies; 17+ messages in thread
From: Xavier Leroy @ 2014-06-02 16:57 UTC (permalink / raw)
  To: caml-list

On 02/06/14 15:34, Andrew Herron wrote:
> It's different to some other languages I've seen, but then so is their
> decision to not use a tail recursive List.map. Each to their own, it's not
> hard to implement the alternative :)

Yes, we're stupid, of course.

Yoriyuki Yamagata originally asked:

>         Just from the curiosity, why balanced binary trees used in Set and
>         Map are AVL-trees, not their alternative, say, red-black trees?  Is
>         there a deep reason for it, or just a historical one?

At the time Set was written, no efficient algorithms for whole-set
operations (union, intersection, difference, etc) were known for
red-black trees.  I'm not sure they are known today.

As for performance of insert/lookup operations, Jean-Christophe
Filliâtre has measurements showing that OCaml's 2-imbalance AVL trees
perform better than red-black trees.  It all depends on your ratio of
insertions to lookups, of course.  But the 2-imbalance trick makes a
big difference with textbook AVL trees.

- Xavier Leroy

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Why AVL-tree?
  2014-06-02 16:57   ` Xavier Leroy
@ 2014-06-02 21:16     ` Andrew Herron
  2014-06-10 18:19     ` jonikelee
  2014-08-03 21:25     ` Diego Olivier Fernandez Pons
  2 siblings, 0 replies; 17+ messages in thread
From: Andrew Herron @ 2014-06-02 21:16 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 566 bytes --]

On Tue, Jun  3, 2014 at 02:58 AM, Xavier Leroy<xavier.leroy@inria.fr>, wrote:
On 02/06/14 15:34, Andrew Herron wrote:

> It's different to some other languages I've seen, but then so is their

> decision to not use a tail recursive List.map. Each to their own, it's not

> hard to implement the alternative :)


Yes, we're stupid, of course. 
​I didn't mean to imply that. My apologies if it came across that way.




I'm supportive of doing things differently; if everyone did the same thing the world would never evolve.




Cheers,

Andy

[-- Attachment #2: Type: text/html, Size: 1164 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Why AVL-tree?
  2014-06-02 15:06   ` Gabriel Scherer
@ 2014-06-03 12:48     ` Yaron Minsky
  2014-06-03 13:12       ` Gabriel Scherer
  2014-06-03 13:41       ` Yoriyuki Yamagata
  0 siblings, 2 replies; 17+ messages in thread
From: Yaron Minsky @ 2014-06-03 12:48 UTC (permalink / raw)
  To: Gabriel Scherer; +Cc: Andrew Herron, Damien Guichard, Caml List, David Powers

The following summary of what we do with respect to Maps and Sets in
Core was written by David Powers (who isn't yet subscribe to the list,
so he asked me to forward it on.)

In Core we use a slight modification of the AVL tree found in the
standard library.  I think the biggest change (other than the
interface) is that we add a specialized constructor (Leaf of 'key *
'value) as a specialization of Node (left * key * value * right) to
limit allocation.  It's a nice speed bump and doesn't do too much
damage to the readability of the code.

We also spent a bunch of time last summer working through the research
papers of the last 10 years to see if we could find an implementation
we liked better.  I'd have to pull up the full history of the project
to give real details, but we tried at least all of the following:

- red-black trees
- left-leaning red-black trees
- treaps (including a variant that stored entropy in the spare bits in
the variant tag)
- splay trees
- weight balanced trees
- AVL trees with GADT enforcement of the invariants
- 1-2 brother trees

I'll lead with the caveat that benchmarking is hard, and these
structures shine in different ways depending on the type of workload
you throw at them.  Each implementation below was also mostly a
first-pass to understand the structure and do simple tests, so there
may be more speed gold in the hills.  Your mileage may vary.

That said, our conclusions at the end:

- red black trees are hard to code and understand (mostly due to
remove), and don't show a real performance win.

- treaps are a wonderful structure in terms of code simplicity, but
getting enough randomness quickly enough is too costly to make them a
win over AVL trees (you need to allocate just as much and you need to
generate randomness)

- splay trees are in our tree, but are too special purpose to be a general win.

- Weight balanced trees are a nice structure, and are used in other
languages/libraries.  They were neither better or worse than AVL
trees.

- AVL trees with GADT enforcement work, but were actually slower than
straightforward AVL trees at the time we tested them.  There is some
extra matching due to the variant having more cases, so perhaps this
isn't surprising.  It's also likely that we didn't carry the
2-imbalance trick into the GADT version, which might have skewed the
result.

- 1-2 brother trees were the best of the lot, and we actually produced
a version of the code that we felt was an overall win (or tie) for all
workloads.  Unfortunately, the optimizations we needed to get us there
made the code much longer and harder to understand than the AVL tree
code.  We just couldn't convince ourselves that it was worth it.

Probably the most important point is that nothing we did above gave a
general win of more than 10-20% in the tight loop case.  Given that,
we kept our tweaked AVL tree implementation.  If you want to be very
very fast, you probably can't get away with a map, and if you just
want to be "fast enough" the AVL tree we have is a nice set of
tradeoffs for code complexity.

On Mon, Jun 2, 2014 at 11:06 AM, Gabriel Scherer
<gabriel.scherer@gmail.com> wrote:
> Note that OCaml's balanced trees are not exactly what is usually
> called AVL, as the imbalance between different branches can be at most
> 2 (+1 on one side and -1 on the other) instead of just 1 as the
> traditional definition assumes.
>
> On Mon, Jun 2, 2014 at 3:34 PM, Andrew Herron <andrew.herron@gmail.com> wrote:
>> Wikipedia has some notes on the difference:
>>
>> http://en.wikipedia.org/wiki/AVL_tree
>>
>> AVL has faster lookup, so maybe they decided to optimise for that.
>>
>> It's different to some other languages I've seen, but then so is their
>> decision to not use a tail recursive List.map. Each to their own, it's not
>> hard to implement the alternative :)
>>
>>
>> On Mon, Jun 2, 2014 at 11:21 PM, Damien Guichard <alphablock@orange.fr>
>> wrote:
>>>
>>>
>>> Red-black tree would spare a machine word per node, because a red-black
>>> tree doesn't need depth information.
>>> Hence the reason is either historical or a space/speed trade-off
>>> (comparing two depths may be faster than pattern matching).
>>>
>>> Regards,
>>>
>>> damien guichard
>>>
>>> Hi, list,
>>>
>>> Just from the curiosity, why balanced binary trees used in Set and Map are
>>> AVL-trees, not their alternative, say, red-black trees?  Is there a deep
>>> reason for it, or just a historical one?
>>>
>>> Best,
>>> --
>>> Yoriyuki Yamagata
>>> http://yoriyuki.info/
>>>
>>>
>>
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa.inria.fr/sympa/arc/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Why AVL-tree?
  2014-06-03 12:48     ` Yaron Minsky
@ 2014-06-03 13:12       ` Gabriel Scherer
  2014-06-03 13:37         ` Yaron Minsky
  2014-06-03 13:41       ` Yoriyuki Yamagata
  1 sibling, 1 reply; 17+ messages in thread
From: Gabriel Scherer @ 2014-06-03 13:12 UTC (permalink / raw)
  To: Yaron Minsky; +Cc: Andrew Herron, Damien Guichard, Caml List, David Powers

Thanks Yaron, that is very interesting feedback.

Would you happen to have the same kind of information about your
experiments with balanced tree buckets for Hashtable? I'm quite
interested in their good worst-case behavior, and considered
experimenting with such a structure for Batteries, but didn't have
time to look at it so far.

On Tue, Jun 3, 2014 at 2:48 PM, Yaron Minsky <yminsky@janestreet.com> wrote:
> The following summary of what we do with respect to Maps and Sets in
> Core was written by David Powers (who isn't yet subscribe to the list,
> so he asked me to forward it on.)
>
> In Core we use a slight modification of the AVL tree found in the
> standard library.  I think the biggest change (other than the
> interface) is that we add a specialized constructor (Leaf of 'key *
> 'value) as a specialization of Node (left * key * value * right) to
> limit allocation.  It's a nice speed bump and doesn't do too much
> damage to the readability of the code.
>
> We also spent a bunch of time last summer working through the research
> papers of the last 10 years to see if we could find an implementation
> we liked better.  I'd have to pull up the full history of the project
> to give real details, but we tried at least all of the following:
>
> - red-black trees
> - left-leaning red-black trees
> - treaps (including a variant that stored entropy in the spare bits in
> the variant tag)
> - splay trees
> - weight balanced trees
> - AVL trees with GADT enforcement of the invariants
> - 1-2 brother trees
>
> I'll lead with the caveat that benchmarking is hard, and these
> structures shine in different ways depending on the type of workload
> you throw at them.  Each implementation below was also mostly a
> first-pass to understand the structure and do simple tests, so there
> may be more speed gold in the hills.  Your mileage may vary.
>
> That said, our conclusions at the end:
>
> - red black trees are hard to code and understand (mostly due to
> remove), and don't show a real performance win.
>
> - treaps are a wonderful structure in terms of code simplicity, but
> getting enough randomness quickly enough is too costly to make them a
> win over AVL trees (you need to allocate just as much and you need to
> generate randomness)
>
> - splay trees are in our tree, but are too special purpose to be a general win.
>
> - Weight balanced trees are a nice structure, and are used in other
> languages/libraries.  They were neither better or worse than AVL
> trees.
>
> - AVL trees with GADT enforcement work, but were actually slower than
> straightforward AVL trees at the time we tested them.  There is some
> extra matching due to the variant having more cases, so perhaps this
> isn't surprising.  It's also likely that we didn't carry the
> 2-imbalance trick into the GADT version, which might have skewed the
> result.
>
> - 1-2 brother trees were the best of the lot, and we actually produced
> a version of the code that we felt was an overall win (or tie) for all
> workloads.  Unfortunately, the optimizations we needed to get us there
> made the code much longer and harder to understand than the AVL tree
> code.  We just couldn't convince ourselves that it was worth it.
>
> Probably the most important point is that nothing we did above gave a
> general win of more than 10-20% in the tight loop case.  Given that,
> we kept our tweaked AVL tree implementation.  If you want to be very
> very fast, you probably can't get away with a map, and if you just
> want to be "fast enough" the AVL tree we have is a nice set of
> tradeoffs for code complexity.
>
> On Mon, Jun 2, 2014 at 11:06 AM, Gabriel Scherer
> <gabriel.scherer@gmail.com> wrote:
>> Note that OCaml's balanced trees are not exactly what is usually
>> called AVL, as the imbalance between different branches can be at most
>> 2 (+1 on one side and -1 on the other) instead of just 1 as the
>> traditional definition assumes.
>>
>> On Mon, Jun 2, 2014 at 3:34 PM, Andrew Herron <andrew.herron@gmail.com> wrote:
>>> Wikipedia has some notes on the difference:
>>>
>>> http://en.wikipedia.org/wiki/AVL_tree
>>>
>>> AVL has faster lookup, so maybe they decided to optimise for that.
>>>
>>> It's different to some other languages I've seen, but then so is their
>>> decision to not use a tail recursive List.map. Each to their own, it's not
>>> hard to implement the alternative :)
>>>
>>>
>>> On Mon, Jun 2, 2014 at 11:21 PM, Damien Guichard <alphablock@orange.fr>
>>> wrote:
>>>>
>>>>
>>>> Red-black tree would spare a machine word per node, because a red-black
>>>> tree doesn't need depth information.
>>>> Hence the reason is either historical or a space/speed trade-off
>>>> (comparing two depths may be faster than pattern matching).
>>>>
>>>> Regards,
>>>>
>>>> damien guichard
>>>>
>>>> Hi, list,
>>>>
>>>> Just from the curiosity, why balanced binary trees used in Set and Map are
>>>> AVL-trees, not their alternative, say, red-black trees?  Is there a deep
>>>> reason for it, or just a historical one?
>>>>
>>>> Best,
>>>> --
>>>> Yoriyuki Yamagata
>>>> http://yoriyuki.info/
>>>>
>>>>
>>>
>>
>> --
>> Caml-list mailing list.  Subscription management and archives:
>> https://sympa.inria.fr/sympa/arc/caml-list
>> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>> Bug reports: http://caml.inria.fr/bin/caml-bugs

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Why AVL-tree?
  2014-06-03 13:12       ` Gabriel Scherer
@ 2014-06-03 13:37         ` Yaron Minsky
  0 siblings, 0 replies; 17+ messages in thread
From: Yaron Minsky @ 2014-06-03 13:37 UTC (permalink / raw)
  To: Gabriel Scherer
  Cc: caml-list, Andrew Herron, David Powers, Damien Guichard, Eric Stokes

[-- Attachment #1: Type: text/plain, Size: 5753 bytes --]

Looping in Eric Stokes, who did the benchmarking on that.
On Jun 3, 2014 9:13 AM, "Gabriel Scherer" <gabriel.scherer@gmail.com> wrote:

> Thanks Yaron, that is very interesting feedback.
>
> Would you happen to have the same kind of information about your
> experiments with balanced tree buckets for Hashtable? I'm quite
> interested in their good worst-case behavior, and considered
> experimenting with such a structure for Batteries, but didn't have
> time to look at it so far.
>
> On Tue, Jun 3, 2014 at 2:48 PM, Yaron Minsky <yminsky@janestreet.com>
> wrote:
> > The following summary of what we do with respect to Maps and Sets in
> > Core was written by David Powers (who isn't yet subscribe to the list,
> > so he asked me to forward it on.)
> >
> > In Core we use a slight modification of the AVL tree found in the
> > standard library.  I think the biggest change (other than the
> > interface) is that we add a specialized constructor (Leaf of 'key *
> > 'value) as a specialization of Node (left * key * value * right) to
> > limit allocation.  It's a nice speed bump and doesn't do too much
> > damage to the readability of the code.
> >
> > We also spent a bunch of time last summer working through the research
> > papers of the last 10 years to see if we could find an implementation
> > we liked better.  I'd have to pull up the full history of the project
> > to give real details, but we tried at least all of the following:
> >
> > - red-black trees
> > - left-leaning red-black trees
> > - treaps (including a variant that stored entropy in the spare bits in
> > the variant tag)
> > - splay trees
> > - weight balanced trees
> > - AVL trees with GADT enforcement of the invariants
> > - 1-2 brother trees
> >
> > I'll lead with the caveat that benchmarking is hard, and these
> > structures shine in different ways depending on the type of workload
> > you throw at them.  Each implementation below was also mostly a
> > first-pass to understand the structure and do simple tests, so there
> > may be more speed gold in the hills.  Your mileage may vary.
> >
> > That said, our conclusions at the end:
> >
> > - red black trees are hard to code and understand (mostly due to
> > remove), and don't show a real performance win.
> >
> > - treaps are a wonderful structure in terms of code simplicity, but
> > getting enough randomness quickly enough is too costly to make them a
> > win over AVL trees (you need to allocate just as much and you need to
> > generate randomness)
> >
> > - splay trees are in our tree, but are too special purpose to be a
> general win.
> >
> > - Weight balanced trees are a nice structure, and are used in other
> > languages/libraries.  They were neither better or worse than AVL
> > trees.
> >
> > - AVL trees with GADT enforcement work, but were actually slower than
> > straightforward AVL trees at the time we tested them.  There is some
> > extra matching due to the variant having more cases, so perhaps this
> > isn't surprising.  It's also likely that we didn't carry the
> > 2-imbalance trick into the GADT version, which might have skewed the
> > result.
> >
> > - 1-2 brother trees were the best of the lot, and we actually produced
> > a version of the code that we felt was an overall win (or tie) for all
> > workloads.  Unfortunately, the optimizations we needed to get us there
> > made the code much longer and harder to understand than the AVL tree
> > code.  We just couldn't convince ourselves that it was worth it.
> >
> > Probably the most important point is that nothing we did above gave a
> > general win of more than 10-20% in the tight loop case.  Given that,
> > we kept our tweaked AVL tree implementation.  If you want to be very
> > very fast, you probably can't get away with a map, and if you just
> > want to be "fast enough" the AVL tree we have is a nice set of
> > tradeoffs for code complexity.
> >
> > On Mon, Jun 2, 2014 at 11:06 AM, Gabriel Scherer
> > <gabriel.scherer@gmail.com> wrote:
> >> Note that OCaml's balanced trees are not exactly what is usually
> >> called AVL, as the imbalance between different branches can be at most
> >> 2 (+1 on one side and -1 on the other) instead of just 1 as the
> >> traditional definition assumes.
> >>
> >> On Mon, Jun 2, 2014 at 3:34 PM, Andrew Herron <andrew.herron@gmail.com>
> wrote:
> >>> Wikipedia has some notes on the difference:
> >>>
> >>> http://en.wikipedia.org/wiki/AVL_tree
> >>>
> >>> AVL has faster lookup, so maybe they decided to optimise for that.
> >>>
> >>> It's different to some other languages I've seen, but then so is their
> >>> decision to not use a tail recursive List.map. Each to their own, it's
> not
> >>> hard to implement the alternative :)
> >>>
> >>>
> >>> On Mon, Jun 2, 2014 at 11:21 PM, Damien Guichard <alphablock@orange.fr
> >
> >>> wrote:
> >>>>
> >>>>
> >>>> Red-black tree would spare a machine word per node, because a
> red-black
> >>>> tree doesn't need depth information.
> >>>> Hence the reason is either historical or a space/speed trade-off
> >>>> (comparing two depths may be faster than pattern matching).
> >>>>
> >>>> Regards,
> >>>>
> >>>> damien guichard
> >>>>
> >>>> Hi, list,
> >>>>
> >>>> Just from the curiosity, why balanced binary trees used in Set and
> Map are
> >>>> AVL-trees, not their alternative, say, red-black trees?  Is there a
> deep
> >>>> reason for it, or just a historical one?
> >>>>
> >>>> Best,
> >>>> --
> >>>> Yoriyuki Yamagata
> >>>> http://yoriyuki.info/
> >>>>
> >>>>
> >>>
> >>
> >> --
> >> Caml-list mailing list.  Subscription management and archives:
> >> https://sympa.inria.fr/sympa/arc/caml-list
> >> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> >> Bug reports: http://caml.inria.fr/bin/caml-bugs
>

[-- Attachment #2: Type: text/html, Size: 7694 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Why AVL-tree?
  2014-06-03 12:48     ` Yaron Minsky
  2014-06-03 13:12       ` Gabriel Scherer
@ 2014-06-03 13:41       ` Yoriyuki Yamagata
  1 sibling, 0 replies; 17+ messages in thread
From: Yoriyuki Yamagata @ 2014-06-03 13:41 UTC (permalink / raw)
  To: Caml List; +Cc: David Powers

[-- Attachment #1: Type: text/plain, Size: 5697 bytes --]

Thank you everyone for answering my questions.

By the way, in Batteries Included, AVL-trees used in BatISet and BatIMap
allow height difference by 1 (?).  I think the code is originated from my
Camomile.  When I wrote them, I'm afraid of "deviating" from the text book
definition of AVL tree.

Maybe the change is order?

Best,

2014-06-03 21:48 GMT+09:00 Yaron Minsky <yminsky@janestreet.com>:

> The following summary of what we do with respect to Maps and Sets in
> Core was written by David Powers (who isn't yet subscribe to the list,
> so he asked me to forward it on.)
>
> In Core we use a slight modification of the AVL tree found in the
> standard library.  I think the biggest change (other than the
> interface) is that we add a specialized constructor (Leaf of 'key *
> 'value) as a specialization of Node (left * key * value * right) to
> limit allocation.  It's a nice speed bump and doesn't do too much
> damage to the readability of the code.
>
> We also spent a bunch of time last summer working through the research
> papers of the last 10 years to see if we could find an implementation
> we liked better.  I'd have to pull up the full history of the project
> to give real details, but we tried at least all of the following:
>
> - red-black trees
> - left-leaning red-black trees
> - treaps (including a variant that stored entropy in the spare bits in
> the variant tag)
> - splay trees
> - weight balanced trees
> - AVL trees with GADT enforcement of the invariants
> - 1-2 brother trees
>
> I'll lead with the caveat that benchmarking is hard, and these
> structures shine in different ways depending on the type of workload
> you throw at them.  Each implementation below was also mostly a
> first-pass to understand the structure and do simple tests, so there
> may be more speed gold in the hills.  Your mileage may vary.
>
> That said, our conclusions at the end:
>
> - red black trees are hard to code and understand (mostly due to
> remove), and don't show a real performance win.
>
> - treaps are a wonderful structure in terms of code simplicity, but
> getting enough randomness quickly enough is too costly to make them a
> win over AVL trees (you need to allocate just as much and you need to
> generate randomness)
>
> - splay trees are in our tree, but are too special purpose to be a general
> win.
>
> - Weight balanced trees are a nice structure, and are used in other
> languages/libraries.  They were neither better or worse than AVL
> trees.
>
> - AVL trees with GADT enforcement work, but were actually slower than
> straightforward AVL trees at the time we tested them.  There is some
> extra matching due to the variant having more cases, so perhaps this
> isn't surprising.  It's also likely that we didn't carry the
> 2-imbalance trick into the GADT version, which might have skewed the
> result.
>
> - 1-2 brother trees were the best of the lot, and we actually produced
> a version of the code that we felt was an overall win (or tie) for all
> workloads.  Unfortunately, the optimizations we needed to get us there
> made the code much longer and harder to understand than the AVL tree
> code.  We just couldn't convince ourselves that it was worth it.
>
> Probably the most important point is that nothing we did above gave a
> general win of more than 10-20% in the tight loop case.  Given that,
> we kept our tweaked AVL tree implementation.  If you want to be very
> very fast, you probably can't get away with a map, and if you just
> want to be "fast enough" the AVL tree we have is a nice set of
> tradeoffs for code complexity.
>
> On Mon, Jun 2, 2014 at 11:06 AM, Gabriel Scherer
> <gabriel.scherer@gmail.com> wrote:
> > Note that OCaml's balanced trees are not exactly what is usually
> > called AVL, as the imbalance between different branches can be at most
> > 2 (+1 on one side and -1 on the other) instead of just 1 as the
> > traditional definition assumes.
> >
> > On Mon, Jun 2, 2014 at 3:34 PM, Andrew Herron <andrew.herron@gmail.com>
> wrote:
> >> Wikipedia has some notes on the difference:
> >>
> >> http://en.wikipedia.org/wiki/AVL_tree
> >>
> >> AVL has faster lookup, so maybe they decided to optimise for that.
> >>
> >> It's different to some other languages I've seen, but then so is their
> >> decision to not use a tail recursive List.map. Each to their own, it's
> not
> >> hard to implement the alternative :)
> >>
> >>
> >> On Mon, Jun 2, 2014 at 11:21 PM, Damien Guichard <alphablock@orange.fr>
> >> wrote:
> >>>
> >>>
> >>> Red-black tree would spare a machine word per node, because a red-black
> >>> tree doesn't need depth information.
> >>> Hence the reason is either historical or a space/speed trade-off
> >>> (comparing two depths may be faster than pattern matching).
> >>>
> >>> Regards,
> >>>
> >>> damien guichard
> >>>
> >>> Hi, list,
> >>>
> >>> Just from the curiosity, why balanced binary trees used in Set and Map
> are
> >>> AVL-trees, not their alternative, say, red-black trees?  Is there a
> deep
> >>> reason for it, or just a historical one?
> >>>
> >>> Best,
> >>> --
> >>> Yoriyuki Yamagata
> >>> http://yoriyuki.info/
> >>>
> >>>
> >>
> >
> > --
> > Caml-list mailing list.  Subscription management and archives:
> > https://sympa.inria.fr/sympa/arc/caml-list
> > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> > Bug reports: http://caml.inria.fr/bin/caml-bugs
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa.inria.fr/sympa/arc/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>



-- 
Yoriyuki Yamagata
*http://yoriyuki.info/ <http://yoriyuki.info/>*

[-- Attachment #2: Type: text/html, Size: 7759 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Why AVL-tree?
  2014-06-02 16:57   ` Xavier Leroy
  2014-06-02 21:16     ` Andrew Herron
@ 2014-06-10 18:19     ` jonikelee
  2014-06-10 18:51       ` Florian Hars
  2014-06-15  4:51       ` Lukasz Stafiniak
  2014-08-03 21:25     ` Diego Olivier Fernandez Pons
  2 siblings, 2 replies; 17+ messages in thread
From: jonikelee @ 2014-06-10 18:19 UTC (permalink / raw)
  To: Xavier.Leroy, caml-list

I am coming here from the Coq mailing list, because of a link posted there
about this e-mail thread.  I have been working on Coq-verified and fully
proof-erased versions of AVL and red-black trees, as well as a variant I
didn't know existed before, until I read about "2-imbalance AVL trees" in this
thread.

The github address for my development is:

https://github.com/jonleivent/mindless-coding

and the "gaptree" development there may be the same as your "2-imbalance" AVL
trees, though I can't be sure without more of an explanation (or maybe just a
pointer to the source code) for the 2-imbalance AVL trees.

Any further information or source pointers you can give me would be greatly
appreciated.

Thanks
--Jonathan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Why AVL-tree?
  2014-06-10 18:19     ` jonikelee
@ 2014-06-10 18:51       ` Florian Hars
  2014-06-10 19:52         ` Jonathan
  2014-06-15  4:51       ` Lukasz Stafiniak
  1 sibling, 1 reply; 17+ messages in thread
From: Florian Hars @ 2014-06-10 18:51 UTC (permalink / raw)
  To: caml-list

Am 10.06.2014 20:19, schrieb jonikelee@gmail.com:
> Any further information or source pointers you can give me would be greatly
> appreciated.

This may prove to be of interest to you:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.6890

(Or the version on one of the authors' page: 
https://www.lri.fr/~filliatr/ftp/publis/fpp.ps.gz )

- Florian.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Why AVL-tree?
  2014-06-10 18:51       ` Florian Hars
@ 2014-06-10 19:52         ` Jonathan
  0 siblings, 0 replies; 17+ messages in thread
From: Jonathan @ 2014-06-10 19:52 UTC (permalink / raw)
  To: caml-list

On 06/10/2014 02:51 PM, Florian Hars wrote:
> Am 10.06.2014 20:19, schrieb jonikelee@gmail.com:
>> Any further information or source pointers you can give me would be 
>> greatly
>> appreciated.
>
> This may prove to be of interest to you:
>
> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.6890
>
> (Or the version on one of the authors' page: 
> https://www.lri.fr/~filliatr/ftp/publis/fpp.ps.gz )
>
> - Florian.
>

Thanks for the links.  I see that those are the origin of the FSets in Coq.

The code I generated for AVL trees and "gap" trees doesn't carry around 
the height for each node at runtime (although the proofs do - but it 
gets fully erased in the extracted OCaml code, leaving just a small 
enumeration type), nor perform any height computations at runtime.  
There is a claim in the above paper that they carry around the height on 
each node for "greater efficiency".  I wonder what that claim refers 
to.  Possibly the fact that they relax the AVL "delta" (the difference 
between sibling subtree heights) to be greater than 1.  Is that what was 
meant by "2-imbalance AVL trees" in this thread?

-- Jonathan


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Why AVL-tree?
  2014-06-10 18:19     ` jonikelee
  2014-06-10 18:51       ` Florian Hars
@ 2014-06-15  4:51       ` Lukasz Stafiniak
  2014-06-15 14:01         ` Jonathan
  1 sibling, 1 reply; 17+ messages in thread
From: Lukasz Stafiniak @ 2014-06-15  4:51 UTC (permalink / raw)
  To: Caml, jonikelee

[-- Attachment #1: Type: text/plain, Size: 1314 bytes --]

Here is a 2-imbalance AVL trees example from InvarGenT:
https://github.com/lukstafi/invargent/blob/master/examples/avl_tree.gadt
and the (inferred) types:
https://github.com/lukstafi/invargent/blob/master/examples/avl_tree.gadti.target

Regards,
Lukasz

On Tue, Jun 10, 2014 at 11:19 AM, <jonikelee@gmail.com> wrote:

> I am coming here from the Coq mailing list, because of a link posted there
> about this e-mail thread.  I have been working on Coq-verified and fully
> proof-erased versions of AVL and red-black trees, as well as a variant I
> didn't know existed before, until I read about "2-imbalance AVL trees" in
> this
> thread.
>
> The github address for my development is:
>
> https://github.com/jonleivent/mindless-coding
>
> and the "gaptree" development there may be the same as your "2-imbalance"
> AVL
> trees, though I can't be sure without more of an explanation (or maybe
> just a
> pointer to the source code) for the 2-imbalance AVL trees.
>
> Any further information or source pointers you can give me would be greatly
> appreciated.
>
> Thanks
> --Jonathan
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa.inria.fr/sympa/arc/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>

[-- Attachment #2: Type: text/html, Size: 2418 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Why AVL-tree?
  2014-06-15  4:51       ` Lukasz Stafiniak
@ 2014-06-15 14:01         ` Jonathan
  0 siblings, 0 replies; 17+ messages in thread
From: Jonathan @ 2014-06-15 14:01 UTC (permalink / raw)
  To: Lukasz Stafiniak, Caml

On 06/15/2014 12:51 AM, Lukasz Stafiniak wrote:
> Here is a 2-imbalance AVL trees example from InvarGenT:
> https://github.com/lukstafi/invargent/blob/master/examples/avl_tree.gadt
> and the (inferred) types:
> https://github.com/lukstafi/invargent/blob/master/examples/avl_tree.gadti.target
>
> Regards,
> Lukasz

Thank you for those links, Lukasz.  I see from them that there is a 
difference between 2-imbalance AVL trees and gap trees.  In 2-imbalance 
AVL trees, as with standard AVL trees, the parent height is always 1 + 
the maximum child height.  In gap trees, a parent's height can in some 
cases be 2 above the height of both children, even though the childrens' 
heights never differ by more than 1.

I am wondering how important this difference is.  For example, with gap 
trees, there is at most one rebalancing needed per insertion or deletion 
operation.  My understanding of standard AVL trees is that they may 
require O(logN) rebalancings per insertion or deletion.  Do you know how 
the 2-imbalance property addition changes this rebalancing requirement 
for AVL trees?

Thanks again,
Jonathan

>
> On Tue, Jun 10, 2014 at 11:19 AM, <jonikelee@gmail.com> wrote:
>
>> I am coming here from the Coq mailing list, because of a link posted there
>> about this e-mail thread.  I have been working on Coq-verified and fully
>> proof-erased versions of AVL and red-black trees, as well as a variant I
>> didn't know existed before, until I read about "2-imbalance AVL trees" in
>> this
>> thread.
>>
>> The github address for my development is:
>>
>> https://github.com/jonleivent/mindless-coding
>>
>> and the "gaptree" development there may be the same as your "2-imbalance"
>> AVL
>> trees, though I can't be sure without more of an explanation (or maybe
>> just a
>> pointer to the source code) for the 2-imbalance AVL trees.
>>
>> Any further information or source pointers you can give me would be greatly
>> appreciated.
>>
>> Thanks
>> --Jonathan
>>
>> --
>> Caml-list mailing list.  Subscription management and archives:
>> https://sympa.inria.fr/sympa/arc/caml-list
>> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>> Bug reports: http://caml.inria.fr/bin/caml-bugs
>>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Why AVL-tree?
  2014-06-02 16:57   ` Xavier Leroy
  2014-06-02 21:16     ` Andrew Herron
  2014-06-10 18:19     ` jonikelee
@ 2014-08-03 21:25     ` Diego Olivier Fernandez Pons
  2 siblings, 0 replies; 17+ messages in thread
From: Diego Olivier Fernandez Pons @ 2014-08-03 21:25 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 5202 bytes --]

    Hi


> At the time Set was written, no efficient algorithms for whole-set
> operations (union, intersection, difference, etc) were known for
> red-black trees.  I'm not sure they are known today.
>

The answer is no and I will try to explain.

When doing a set union, at some point you will have to merge two disjoint
sets represented each by a tree

[1 .. 10] union [15 .. 22]

At that point you need to know the "size" of each tree because if both
trees are the same size you just need to append them

    return Node ([1..10], [15..22])

while if one tree is bigger than the other, you will need to cut that tree
into smaller pieces and reorganize them (a "rotation").

    divide [1..10] into [1..5], [6..10]
    return Node([1..5], Node ([6..10], [15..22]))

The AVL implementation of OCaml gives you the "size" of each tree, its the
extra int in the constructor

    'a tree = Empty | Node 'a tree * 'a * 'a tree * int <--- here

it is equal to the longest path in the tree

    f = function E -> 0 | Node (l, _, r, h) -> 1 + max (f l, fr)

If you implement Weight-balanced trees, that value is the cardinal of the
set represented by the tree.

If you use red-black or "traditional AVL" representation, that extra bit
just tells you if the tree is leaning towards the left or towards the
right, but doesn't tell you any information about the global "size" of the
tree. Which means you cannot implement set union directly. You have to
traverse full tree in O(n) to recompute that information.

There is at least one library in Haskell that implements AVL trees that
way, it uses -1 | 0 | 1 information for most operations and recomputes the
longest path for all trees when doing set operations, then discards that
info.

There is also a way to implement red black trees to do set operations just
like (non-traditional) AVLs. It is described in Olivié H.J A new class of
balanced search trees : half balanced search trees, RAIRO Informatique
Théorique 16, 51 71 - also in his PhD Thesis.

Basically a tree is red-black when shortest-branch * 2 >= longest-branch
(cf. Constructing Red-Black trees, Ralf Hinze 1999 - easier to find than
Olivié's works)

So you can implement a tree structure that memorizes both the shortest
and the longest branch in each node

    'a tree = E | N of 'a tree * 'a * 'a tree * int * int <---- shortest
and longest

From there you just follow the same scheme than the AVL implementation in
the OCaml lib, you just need to cope with a couple of triple rotation cases.



> As for performance of insert/lookup operations, Jean-Christophe
> Filliâtre has measurements showing that OCaml's 2-imbalance AVL trees
> perform better than red-black trees.  It all depends on your ratio of
> insertions to lookups, of course.  But the 2-imbalance trick makes a
> big difference with textbook AVL trees.
>


This is a bit more difficult but in short "red-black trees do not implement
truly red-black trees" and "all implementations of red-black trees
implement a different approximation of red-black trees".

Any implementation of red-black trees that uses the 1 bit trick and where
insertions are in log (n) is not surjective. You can understand that as
meaning
(1) - some red-black trees will never be built by the balancing algorithm
(2) - sometimes you will give the balancing algorithm a red-black tree (=
has a provable red-black coloring) and it will still rebalance it instead
of noticing it's already balanced and returning it untouched.

The reason is insertion in red-black trees using the 1 bit trick is linear.
Meaning if I want a balancing algorithm that never falls into (1) or (2) I
cannot avoid linear time insertion. The result is all algorithms found in
the literature are log(n) approximations of the full algorithm. The only
algorithm that is truly logarithmic is again Olivié's thanks to it's
implicit representation of the red-black coloring.

From there, comparing performance of AVLs with "red-black trees" in general
makes no sense as the person that implemented them may have implemented any
arbitrary approximation of the full algorithm. This translates in the code
as a random number of "cases" to check for balancing : 4 for Okasaki, 27 in
the CLR, etc. With double, triple, quadruple rotations, according to how
"deep" the author was willing to go. Because what these guys are doing is
"unfolding" the linear behavior of the algorithm into a constant and
falling back into rebuilding any tree they are given for all other cases.

In any case, some benchmarks I did very, very, very longtime ago showed AVL
and Weight balanced are good all-purpose implementations. The advantage of
WB being it gives you the cardinality of the set in O(1). In some sense,
AVL is the log of WB (in the same way the height of a tree is the log of
its size).

        Diego Olivier


>
> - Xavier Leroy
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa.inria.fr/sympa/arc/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>

[-- Attachment #2: Type: text/html, Size: 7020 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Caml-list] Why AVL-tree?
@ 2014-06-02 18:23 Damien Guichard
  0 siblings, 0 replies; 17+ messages in thread
From: Damien Guichard @ 2014-06-02 18:23 UTC (permalink / raw)
  To: Caml List

[-- Attachment #1: Type: text/plain, Size: 184 bytes --]

Worth to be mentionned : OCaml Reins has red-black trees featuring whole-set
operations.
 
http://caml.inria.fr/cgi-bin/hump.en.cgi?contrib=599
 
Regards,
 
damien guichard
 

[-- Attachment #2: Type: text/html, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Caml-list] Why AVL-tree?
@ 2014-06-02 11:48 Yoriyuki Yamagata
  0 siblings, 0 replies; 17+ messages in thread
From: Yoriyuki Yamagata @ 2014-06-02 11:48 UTC (permalink / raw)
  To: Caml List

[-- Attachment #1: Type: text/plain, Size: 277 bytes --]

Hi, list,

Just from the curiosity, why balanced binary trees used in Set and Map are
AVL-trees, not their alternative, say, red-black trees?  Is there a deep
reason for it, or just a historical one?

Best,
-- 
Yoriyuki Yamagata
*http://yoriyuki.info/ <http://yoriyuki.info/>*

[-- Attachment #2: Type: text/html, Size: 401 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2014-08-03 21:25 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-02 13:21 [Caml-list] Why AVL-tree? Damien Guichard
2014-06-02 13:34 ` Andrew Herron
2014-06-02 15:06   ` Gabriel Scherer
2014-06-03 12:48     ` Yaron Minsky
2014-06-03 13:12       ` Gabriel Scherer
2014-06-03 13:37         ` Yaron Minsky
2014-06-03 13:41       ` Yoriyuki Yamagata
2014-06-02 16:57   ` Xavier Leroy
2014-06-02 21:16     ` Andrew Herron
2014-06-10 18:19     ` jonikelee
2014-06-10 18:51       ` Florian Hars
2014-06-10 19:52         ` Jonathan
2014-06-15  4:51       ` Lukasz Stafiniak
2014-06-15 14:01         ` Jonathan
2014-08-03 21:25     ` Diego Olivier Fernandez Pons
  -- strict thread matches above, loose matches on Subject: below --
2014-06-02 18:23 Damien Guichard
2014-06-02 11:48 Yoriyuki Yamagata

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).