Looping in Eric Stokes, who did the benchmarking on that.
On Jun 3, 2014 9:13 AM, "Gabriel Scherer" <gabriel.scherer@gmail.com> wrote:

> Thanks Yaron, that is very interesting feedback.
>
> Would you happen to have the same kind of information about your
> experiments with balanced tree buckets for Hashtable? I'm quite
> interested in their good worst-case behavior, and considered
> experimenting with such a structure for Batteries, but didn't have
> time to look at it so far.
>
> On Tue, Jun 3, 2014 at 2:48 PM, Yaron Minsky <yminsky@janestreet.com>
> wrote:
> > The following summary of what we do with respect to Maps and Sets in
> > Core was written by David Powers (who isn't yet subscribe to the list,
> > so he asked me to forward it on.)
> >
> > In Core we use a slight modification of the AVL tree found in the
> > standard library.  I think the biggest change (other than the
> > interface) is that we add a specialized constructor (Leaf of 'key *
> > 'value) as a specialization of Node (left * key * value * right) to
> > limit allocation.  It's a nice speed bump and doesn't do too much
> > damage to the readability of the code.
> >
> > We also spent a bunch of time last summer working through the research
> > papers of the last 10 years to see if we could find an implementation
> > we liked better.  I'd have to pull up the full history of the project
> > to give real details, but we tried at least all of the following:
> >
> > - red-black trees
> > - left-leaning red-black trees
> > - treaps (including a variant that stored entropy in the spare bits in
> > the variant tag)
> > - splay trees
> > - weight balanced trees
> > - AVL trees with GADT enforcement of the invariants
> > - 1-2 brother trees
> >
> > I'll lead with the caveat that benchmarking is hard, and these
> > structures shine in different ways depending on the type of workload
> > you throw at them.  Each implementation below was also mostly a
> > first-pass to understand the structure and do simple tests, so there
> > may be more speed gold in the hills.  Your mileage may vary.
> >
> > That said, our conclusions at the end:
> >
> > - red black trees are hard to code and understand (mostly due to
> > remove), and don't show a real performance win.
> >
> > - treaps are a wonderful structure in terms of code simplicity, but
> > getting enough randomness quickly enough is too costly to make them a
> > win over AVL trees (you need to allocate just as much and you need to
> > generate randomness)
> >
> > - splay trees are in our tree, but are too special purpose to be a
> general win.
> >
> > - Weight balanced trees are a nice structure, and are used in other
> > languages/libraries.  They were neither better or worse than AVL
> > trees.
> >
> > - AVL trees with GADT enforcement work, but were actually slower than
> > straightforward AVL trees at the time we tested them.  There is some
> > extra matching due to the variant having more cases, so perhaps this
> > isn't surprising.  It's also likely that we didn't carry the
> > 2-imbalance trick into the GADT version, which might have skewed the
> > result.
> >
> > - 1-2 brother trees were the best of the lot, and we actually produced
> > a version of the code that we felt was an overall win (or tie) for all
> > workloads.  Unfortunately, the optimizations we needed to get us there
> > made the code much longer and harder to understand than the AVL tree
> > code.  We just couldn't convince ourselves that it was worth it.
> >
> > Probably the most important point is that nothing we did above gave a
> > general win of more than 10-20% in the tight loop case.  Given that,
> > we kept our tweaked AVL tree implementation.  If you want to be very
> > very fast, you probably can't get away with a map, and if you just
> > want to be "fast enough" the AVL tree we have is a nice set of
> > tradeoffs for code complexity.
> >
> > On Mon, Jun 2, 2014 at 11:06 AM, Gabriel Scherer
> > <gabriel.scherer@gmail.com> wrote:
> >> Note that OCaml's balanced trees are not exactly what is usually
> >> called AVL, as the imbalance between different branches can be at most
> >> 2 (+1 on one side and -1 on the other) instead of just 1 as the
> >> traditional definition assumes.
> >>
> >> On Mon, Jun 2, 2014 at 3:34 PM, Andrew Herron <andrew.herron@gmail.com>
> wrote:
> >>> Wikipedia has some notes on the difference:
> >>>
> >>> http://en.wikipedia.org/wiki/AVL_tree
> >>>
> >>> AVL has faster lookup, so maybe they decided to optimise for that.
> >>>
> >>> It's different to some other languages I've seen, but then so is their
> >>> decision to not use a tail recursive List.map. Each to their own, it's
> not
> >>> hard to implement the alternative :)
> >>>
> >>>
> >>> On Mon, Jun 2, 2014 at 11:21 PM, Damien Guichard <alphablock@orange.fr
> >
> >>> wrote:
> >>>>
> >>>>
> >>>> Red-black tree would spare a machine word per node, because a
> red-black
> >>>> tree doesn't need depth information.
> >>>> Hence the reason is either historical or a space/speed trade-off
> >>>> (comparing two depths may be faster than pattern matching).
> >>>>
> >>>> Regards,
> >>>>
> >>>> damien guichard
> >>>>
> >>>> Hi, list,
> >>>>
> >>>> Just from the curiosity, why balanced binary trees used in Set and
> Map are
> >>>> AVL-trees, not their alternative, say, red-black trees?  Is there a
> deep
> >>>> reason for it, or just a historical one?
> >>>>
> >>>> Best,
> >>>> --
> >>>> Yoriyuki Yamagata
> >>>> http://yoriyuki.info/
> >>>>
> >>>>
> >>>
> >>
> >> --
> >> Caml-list mailing list.  Subscription management and archives:
> >> https://sympa.inria.fr/sympa/arc/caml-list
> >> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> >> Bug reports: http://caml.inria.fr/bin/caml-bugs
>