caml-list - the Caml user's mailing list
From: Yaron Minsky <yminsky@janestreet.com>
To: Gabriel Scherer <gabriel.scherer@gmail.com>
Cc: caml-list@inria.fr, Andrew Herron <andrew.herron@gmail.com>,
	 David Powers <dpowers@janestreet.com>,
	Damien Guichard <alphablock@orange.fr>,
	 Eric Stokes <estokes@janestreet.com>
Subject: Re: [Caml-list] Why AVL-tree?
Date: Tue, 3 Jun 2014 09:37:38 -0400
Message-ID: <CACLX4jRac8nqnX2_eYz_tEaSWxsNhLHO6V4O_3SEkq=G6_dufw@mail.gmail.com>
In-Reply-To: <CAPFanBFrnP+eM8EETEejCrCPO_FLWU6toR4O9xgAA-qZJ_pAhg@mail.gmail.com>

Looping in Eric Stokes, who did the benchmarking on that.
On Jun 3, 2014 9:13 AM, "Gabriel Scherer" <gabriel.scherer@gmail.com> wrote:

> Thanks Yaron, that is very interesting feedback.
>
> Would you happen to have the same kind of information about your
> experiments with balanced tree buckets for Hashtable? I'm quite
> interested in their good worst-case behavior, and considered
> experimenting with such a structure for Batteries, but didn't have
> time to look at it so far.
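For readers unfamiliar with the idea Gabriel mentions: the stdlib's Hashtbl chains colliding keys in a linked list, so a badly hashed bucket degrades to O(n) lookups, whereas a bucket that is a balanced tree keeps the worst case at O(log n) comparisons. The following is a minimal sketch under stated assumptions, not Core's or Batteries' implementation: string keys only, a fixed number of buckets with no resizing, and the stdlib's Map.Make as the bucket type. The names `create`, `index`, `replace`, `find_opt` and `remove` are hypothetical, chosen for illustration.

```ocaml
module SMap = Map.Make (String)

(* Hypothetical table with a fixed number of buckets (no resizing); each
   bucket is a balanced tree, so a bucket holding n colliding keys costs
   O(log n) comparisons per lookup instead of O(n). *)
type 'v t = { buckets : 'v SMap.t array }

(* n must be positive. *)
let create n = { buckets = Array.make n SMap.empty }

(* Hashtbl.hash is non-negative, so the index is always in range. *)
let index t k = Hashtbl.hash k mod Array.length t.buckets

let replace t k v =
  let i = index t k in
  t.buckets.(i) <- SMap.add k v t.buckets.(i)

let find_opt t k = SMap.find_opt k t.buckets.(index t k)

let remove t k =
  let i = index t k in
  t.buckets.(i) <- SMap.remove k t.buckets.(i)
```

Creating the table with `create 1` forces every key into a single tree, which is exactly the all-collisions worst case this design is meant to tame.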
>
> On Tue, Jun 3, 2014 at 2:48 PM, Yaron Minsky <yminsky@janestreet.com>
> wrote:
> > The following summary of what we do with respect to Maps and Sets in
> > Core was written by David Powers (who isn't yet subscribed to the list,
> > so he asked me to forward it on.)
> >
> > In Core we use a slight modification of the AVL tree found in the
> > standard library.  I think the biggest change (other than the
> > interface) is that we add a specialized constructor (Leaf of 'key *
> > 'value) as a specialization of Node (left * key * value * right) to
> > limit allocation.  It's a nice speed boost and doesn't do too much
> > damage to the readability of the code.
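To make the allocation argument concrete, here is a hypothetical sketch of such a representation. The constructor names follow the description above, but the field order, the stored height, and the helpers `height`, `create` and `find_opt` are assumptions, not Core's source. A node whose two children are both empty is allocated as a two-field `Leaf` block instead of a five-field `Node` block; the extra case in every pattern match is the modest readability cost mentioned above.

```ocaml
(* Hypothetical sketch, not Core's actual code. *)
type ('k, 'v) t =
  | Empty
  | Leaf of 'k * 'v                                  (* 2 fields *)
  | Node of ('k, 'v) t * 'k * 'v * ('k, 'v) t * int  (* left, key, value, right, height *)

let height = function
  | Empty -> 0
  | Leaf _ -> 1
  | Node (_, _, _, _, h) -> h

(* Smart constructor: a node with two empty children becomes a Leaf. *)
let create l k v r =
  match l, r with
  | Empty, Empty -> Leaf (k, v)
  | _ -> Node (l, k, v, r, 1 + max (height l) (height r))

(* Lookup has to handle the extra Leaf case explicitly. *)
let rec find_opt ~compare k = function
  | Empty -> None
  | Leaf (k', v) -> if compare k k' = 0 then Some v else None
  | Node (l, k', v, r, _) ->
    let c = compare k k' in
    if c = 0 then Some v
    else if c < 0 then find_opt ~compare k l
    else find_opt ~compare k r
```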
> >
> > We also spent a bunch of time last summer working through the research
> > papers of the last 10 years to see if we could find an implementation
> > we liked better.  I'd have to pull up the full history of the project
> > to give real details, but we tried at least all of the following:
> >
> > - red-black trees
> > - left-leaning red-black trees
> > - treaps (including a variant that stored entropy in the spare bits in
> > the variant tag)
> > - splay trees
> > - weight-balanced trees
> > - AVL trees with GADT enforcement of the invariants
> > - 1-2 brother trees
> >
> > I'll lead with the caveat that benchmarking is hard, and these
> > structures shine in different ways depending on the type of workload
> > you throw at them.  Each implementation below was also mostly a
> > first pass to understand the structure and do simple tests, so there
> > may be more speed gold in the hills.  Your mileage may vary.
> >
> > That said, our conclusions at the end:
> >
> > - red-black trees are hard to code and understand (mostly due to
> > remove), and don't show a real performance win.
> >
> > - treaps are a wonderful structure in terms of code simplicity, but
> > getting enough randomness quickly enough is too costly to make them a
> > win over AVL trees (you need to allocate just as much and you need to
> > generate randomness)
> >
> > - splay trees are in our tree, but are too special-purpose to be a
> > general win.
> >
> > - Weight-balanced trees are a nice structure, and are used in other
> > languages/libraries.  They were neither better nor worse than AVL
> > trees.
> >
> > - AVL trees with GADT enforcement work, but were actually slower than
> > straightforward AVL trees at the time we tested them.  There is some
> > extra matching due to the variant having more cases, so perhaps this
> > isn't surprising.  It's also likely that we didn't carry the
> > 2-imbalance trick into the GADT version, which might have skewed the
> > result.
> >
> > - 1-2 brother trees were the best of the lot, and we actually produced
> > a version of the code that we felt was an overall win (or tie) for all
> > workloads.  Unfortunately, the optimizations we needed to get us there
> > made the code much longer and harder to understand than the AVL tree
> > code.  We just couldn't convince ourselves that it was worth it.
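The treap point above is easier to judge with code in hand. Below is a minimal, hypothetical treap insertion over int keys, not the code that was benchmarked: each new node draws a random priority from `Random.bits`, the tree is a binary search tree on keys and a max-heap on priorities, and a single rotation on the way back up restores heap order. It illustrates both halves of the trade-off described: the code is short, but every new node pays for a call to the random number generator on top of the same path-copying allocation an AVL insertion does.

```ocaml
(* Hypothetical treap: BST on keys, max-heap on priorities. *)
type t = Empty | Node of t * int * int * t  (* left, key, priority, right *)

(* Lift the left child above its parent. *)
let rotate_right = function
  | Node (Node (a, x, px, b), y, py, c) -> Node (a, x, px, Node (b, y, py, c))
  | t -> t

(* Lift the right child above its parent. *)
let rotate_left = function
  | Node (a, x, px, Node (b, y, py, c)) -> Node (Node (a, x, px, b), y, py, c)
  | t -> t

let rec insert k = function
  | Empty -> Node (Empty, k, Random.bits (), Empty)  (* a new node draws a random priority *)
  | Node (l, x, p, r) as t ->
    if k = x then t
    else if k < x then
      let l' = insert k l in
      let t' = Node (l', x, p, r) in
      (* Rotate if the new subtree root outranks this node. *)
      (match l' with Node (_, _, pl, _) when pl > p -> rotate_right t' | _ -> t')
    else
      let r' = insert k r in
      let t' = Node (l, x, p, r') in
      (match r' with Node (_, _, pr, _) when pr > p -> rotate_left t' | _ -> t')
```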
> >
> > Probably the most important point is that nothing we did above gave a
> > general win of more than 10-20% in the tight loop case.  Given that,
> > we kept our tweaked AVL tree implementation.  If you want to be very
> > very fast, you probably can't get away with a map, and if you just
> > want to be "fast enough" the AVL tree we have is a nice set of
> > tradeoffs for code complexity.
> >
> > On Mon, Jun 2, 2014 at 11:06 AM, Gabriel Scherer
> > <gabriel.scherer@gmail.com> wrote:
> >> Note that OCaml's balanced trees are not exactly what is usually
> >> called AVL, as the imbalance between different branches can be at most
> >> 2 (+1 on one side and -1 on the other) instead of just 1 as the
> >> traditional definition assumes.
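The relaxed balance condition can be seen in a simplified, int-only version of the stdlib's rebalancing step, reconstructed here as a sketch rather than copied from the stdlib sources: a rotation is triggered only when the two subtree heights differ by more than 2. The names `create`, `bal` and `add` mirror those in the stdlib's Set implementation, but the node type is simplified to a tuple.

```ocaml
(* Simplified int set in the style of the stdlib's Set (a sketch). *)
type t = Empty | Node of t * int * t * int  (* left, element, right, height *)

let height = function Empty -> 0 | Node (_, _, _, h) -> h

let create l v r = Node (l, v, r, 1 + max (height l) (height r))

(* Rebalance only when the subtree heights differ by more than 2. *)
let bal l v r =
  let hl = height l and hr = height r in
  if hl > hr + 2 then
    (match l with
     | Node (ll, lv, lr, _) when height ll >= height lr ->
       create ll lv (create lr v r)                    (* single rotation *)
     | Node (ll, lv, Node (lrl, lrv, lrr, _), _) ->
       create (create ll lv lrl) lrv (create lrr v r)  (* double rotation *)
     | _ -> invalid_arg "bal")
  else if hr > hl + 2 then
    (match r with
     | Node (rl, rv, rr, _) when height rr >= height rl ->
       create (create l v rl) rv rr
     | Node (Node (rll, rlv, rlr, _), rv, rr, _) ->
       create (create l v rll) rlv (create rlr rv rr)
     | _ -> invalid_arg "bal")
  else create l v r

let rec add x = function
  | Empty -> Node (Empty, x, Empty, 1)
  | Node (l, v, r, _) as t ->
    if x = v then t
    else if x < v then bal (add x l) v r
    else bal l v (add x r)
```

Presumably the looser bound trades some rotations for a slightly less tight height guarantee; inserting keys in sorted order, the classic worst case for an unbalanced binary tree, still yields a tree whose height stays logarithmic.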
> >>
> >> On Mon, Jun 2, 2014 at 3:34 PM, Andrew Herron <andrew.herron@gmail.com>
> >> wrote:
> >>> Wikipedia has some notes on the difference:
> >>>
> >>> http://en.wikipedia.org/wiki/AVL_tree
> >>>
> >>> AVL has faster lookup, so maybe they decided to optimise for that.
> >>>
> >>> It's different to some other languages I've seen, but then so is their
> >>> decision to not use a tail recursive List.map. Each to their own, it's
> >>> not hard to implement the alternative :)
> >>>
> >>>
> >>> On Mon, Jun 2, 2014 at 11:21 PM, Damien Guichard <alphablock@orange.fr>
> >>> wrote:
> >>>>
> >>>>
> >>>> A red-black tree would spare a machine word per node, because a
> >>>> red-black tree doesn't need depth information.
> >>>> Hence the reason is either historical or a space/speed trade-off
> >>>> (comparing two depths may be faster than pattern matching).
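Damien's space argument rests on encoding the color in the constructor tag rather than in a field. A minimal sketch over int keys, using Okasaki's well-known purely functional insertion (deletion, which the Jane Street summary singles out as the hard part, is omitted): each node here is a three-field block, whereas a comparable AVL set node also stores its height in a fourth field.

```ocaml
(* Color lives in the constructor tag: R and B each carry exactly 3 fields. *)
type t = E | R of t * int * t | B of t * int * t

(* Rewrite a black node with a red child that itself has a red child
   into a red node with two black children. *)
let balance = function
  | B (R (R (a, x, b), y, c), z, d)
  | B (R (a, x, R (b, y, c)), z, d)
  | B (a, x, R (R (b, y, c), z, d))
  | B (a, x, R (b, y, R (c, z, d))) -> R (B (a, x, b), y, B (c, z, d))
  | t -> t

let insert x s =
  let rec ins = function
    | E -> R (E, x, E)
    | R (a, y, b) as n ->
      if x < y then R (ins a, y, b)
      else if x > y then R (a, y, ins b)
      else n
    | B (a, y, b) as n ->
      if x < y then balance (B (ins a, y, b))
      else if x > y then balance (B (a, y, ins b))
      else n
  in
  (* The root is always repainted black. *)
  match ins s with
  | R (a, y, b) -> B (a, y, b)
  | t -> t
```

The or-pattern in `balance` handles all four red-red configurations in one case, which is why insertion stays short; it is removal that makes red-black trees hard to code.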
> >>>>
> >>>> Regards,
> >>>>
> >>>> damien guichard
> >>>>
> >>>> Hi, list,
> >>>>
> >>>> Just out of curiosity, why are the balanced binary trees used in Set and
> >>>> Map AVL trees, rather than an alternative such as red-black trees?  Is
> >>>> there a deep reason for it, or just a historical one?
> >>>>
> >>>> Best,
> >>>> --
> >>>> Yoriyuki Yamagata
> >>>> http://yoriyuki.info/
> >>>>
> >>>>
> >>>
> >>
> >> --
> >> Caml-list mailing list.  Subscription management and archives:
> >> https://sympa.inria.fr/sympa/arc/caml-list
> >> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> >> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
