Re: [Caml-list] Why AVL-tree? - Yoriyuki Yamagata

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

From: Yoriyuki Yamagata <yoriyuki.y@gmail.com>
To: Caml List <caml-list@inria.fr>
Cc: David Powers <dpowers@janestreet.com>
Subject: Re: [Caml-list] Why AVL-tree?
Date: Tue, 3 Jun 2014 22:41:48 +0900	[thread overview]
Message-ID: <CALdQWQ7B3x4hSELLbZGO5TT4chVe3t_+dLo4H2r90fahd6CoEA@mail.gmail.com> (raw)
In-Reply-To: <CACLX4jS0Hn04JZnaxB2eh=4Tr3c73drKS0EsFP1oJfMUZ+qkeg@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5697 bytes --]

Thank you everyone for answering my questions.

By the way, in Batteries Included, AVL-trees used in BatISet and BatIMap
allow height difference by 1 (?).  I think the code is originated from my
Camomile.  When I wrote them, I'm afraid of "deviating" from the text book
definition of AVL tree.

Maybe the change is order?

Best,

2014-06-03 21:48 GMT+09:00 Yaron Minsky <yminsky@janestreet.com>:

> The following summary of what we do with respect to Maps and Sets in
> Core was written by David Powers (who isn't yet subscribe to the list,
> so he asked me to forward it on.)
>
> In Core we use a slight modification of the AVL tree found in the
> standard library.  I think the biggest change (other than the
> interface) is that we add a specialized constructor (Leaf of 'key *
> 'value) as a specialization of Node (left * key * value * right) to
> limit allocation.  It's a nice speed bump and doesn't do too much
> damage to the readability of the code.
>
> We also spent a bunch of time last summer working through the research
> papers of the last 10 years to see if we could find an implementation
> we liked better.  I'd have to pull up the full history of the project
> to give real details, but we tried at least all of the following:
>
> - red-black trees
> - left-leaning red-black trees
> - treaps (including a variant that stored entropy in the spare bits in
> the variant tag)
> - splay trees
> - weight balanced trees
> - AVL trees with GADT enforcement of the invariants
> - 1-2 brother trees
>
> I'll lead with the caveat that benchmarking is hard, and these
> structures shine in different ways depending on the type of workload
> you throw at them.  Each implementation below was also mostly a
> first-pass to understand the structure and do simple tests, so there
> may be more speed gold in the hills.  Your mileage may vary.
>
> That said, our conclusions at the end:
>
> - red black trees are hard to code and understand (mostly due to
> remove), and don't show a real performance win.
>
> - treaps are a wonderful structure in terms of code simplicity, but
> getting enough randomness quickly enough is too costly to make them a
> win over AVL trees (you need to allocate just as much and you need to
> generate randomness)
>
> - splay trees are in our tree, but are too special purpose to be a general
> win.
>
> - Weight balanced trees are a nice structure, and are used in other
> languages/libraries.  They were neither better or worse than AVL
> trees.
>
> - AVL trees with GADT enforcement work, but were actually slower than
> straightforward AVL trees at the time we tested them.  There is some
> extra matching due to the variant having more cases, so perhaps this
> isn't surprising.  It's also likely that we didn't carry the
> 2-imbalance trick into the GADT version, which might have skewed the
> result.
>
> - 1-2 brother trees were the best of the lot, and we actually produced
> a version of the code that we felt was an overall win (or tie) for all
> workloads.  Unfortunately, the optimizations we needed to get us there
> made the code much longer and harder to understand than the AVL tree
> code.  We just couldn't convince ourselves that it was worth it.
>
> Probably the most important point is that nothing we did above gave a
> general win of more than 10-20% in the tight loop case.  Given that,
> we kept our tweaked AVL tree implementation.  If you want to be very
> very fast, you probably can't get away with a map, and if you just
> want to be "fast enough" the AVL tree we have is a nice set of
> tradeoffs for code complexity.
>
> On Mon, Jun 2, 2014 at 11:06 AM, Gabriel Scherer
> <gabriel.scherer@gmail.com> wrote:
> > Note that OCaml's balanced trees are not exactly what is usually
> > called AVL, as the imbalance between different branches can be at most
> > 2 (+1 on one side and -1 on the other) instead of just 1 as the
> > traditional definition assumes.
> >
> > On Mon, Jun 2, 2014 at 3:34 PM, Andrew Herron <andrew.herron@gmail.com>
> wrote:
> >> Wikipedia has some notes on the difference:
> >>
> >> http://en.wikipedia.org/wiki/AVL_tree
> >>
> >> AVL has faster lookup, so maybe they decided to optimise for that.
> >>
> >> It's different to some other languages I've seen, but then so is their
> >> decision to not use a tail recursive List.map. Each to their own, it's
> not
> >> hard to implement the alternative :)
> >>
> >>
> >> On Mon, Jun 2, 2014 at 11:21 PM, Damien Guichard <alphablock@orange.fr>
> >> wrote:
> >>>
> >>>
> >>> Red-black tree would spare a machine word per node, because a red-black
> >>> tree doesn't need depth information.
> >>> Hence the reason is either historical or a space/speed trade-off
> >>> (comparing two depths may be faster than pattern matching).
> >>>
> >>> Regards,
> >>>
> >>> damien guichard
> >>>
> >>> Hi, list,
> >>>
> >>> Just from the curiosity, why balanced binary trees used in Set and Map
> are
> >>> AVL-trees, not their alternative, say, red-black trees?  Is there a
> deep
> >>> reason for it, or just a historical one?
> >>>
> >>> Best,
> >>> --
> >>> Yoriyuki Yamagata
> >>> http://yoriyuki.info/
> >>>
> >>>
> >>
> >
> > --
> > Caml-list mailing list.  Subscription management and archives:
> > https://sympa.inria.fr/sympa/arc/caml-list
> > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> > Bug reports: http://caml.inria.fr/bin/caml-bugs
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa.inria.fr/sympa/arc/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>



-- 
Yoriyuki Yamagata
*http://yoriyuki.info/ <http://yoriyuki.info/>*

[-- Attachment #2: Type: text/html, Size: 7759 bytes --]

next prev parent reply	other threads:[~2014-06-03 13:42 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-02 13:21 Damien Guichard
2014-06-02 13:34 ` Andrew Herron
2014-06-02 15:06   ` Gabriel Scherer
2014-06-03 12:48     ` Yaron Minsky
2014-06-03 13:12       ` Gabriel Scherer
2014-06-03 13:37         ` Yaron Minsky
2014-06-03 13:41       ` Yoriyuki Yamagata [this message]
2014-06-02 16:57   ` Xavier Leroy
2014-06-02 21:16     ` Andrew Herron
2014-06-10 18:19     ` jonikelee
2014-06-10 18:51       ` Florian Hars
2014-06-10 19:52         ` Jonathan
2014-06-15  4:51       ` Lukasz Stafiniak
2014-06-15 14:01         ` Jonathan
2014-08-03 21:25     ` Diego Olivier Fernandez Pons
  -- strict thread matches above, loose matches on Subject: below --
2014-06-02 18:23 Damien Guichard
2014-06-02 11:48 Yoriyuki Yamagata

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALdQWQ7B3x4hSELLbZGO5TT4chVe3t_+dLo4H2r90fahd6CoEA@mail.gmail.com \
    --to=yoriyuki.y@gmail.com \
    --cc=caml-list@inria.fr \
    --cc=dpowers@janestreet.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).