Looping in Eric Stokes, who did the benchmarking on that. On Jun 3, 2014 9:13 AM, "Gabriel Scherer" wrote: > Thanks Yaron, that is very interesting feedback. > > Would you happen to have the same kind of information about your > experiments with balanced tree buckets for Hashtable? I'm quite > interested in their good worst-case behavior, and considered > experimenting with such a structure for Batteries, but didn't have > time to look at it so far. > > On Tue, Jun 3, 2014 at 2:48 PM, Yaron Minsky > wrote: > > The following summary of what we do with respect to Maps and Sets in > > Core was written by David Powers (who isn't yet subscribe to the list, > > so he asked me to forward it on.) > > > > In Core we use a slight modification of the AVL tree found in the > > standard library. I think the biggest change (other than the > > interface) is that we add a specialized constructor (Leaf of 'key * > > 'value) as a specialization of Node (left * key * value * right) to > > limit allocation. It's a nice speed bump and doesn't do too much > > damage to the readability of the code. > > > > We also spent a bunch of time last summer working through the research > > papers of the last 10 years to see if we could find an implementation > > we liked better. I'd have to pull up the full history of the project > > to give real details, but we tried at least all of the following: > > > > - red-black trees > > - left-leaning red-black trees > > - treaps (including a variant that stored entropy in the spare bits in > > the variant tag) > > - splay trees > > - weight balanced trees > > - AVL trees with GADT enforcement of the invariants > > - 1-2 brother trees > > > > I'll lead with the caveat that benchmarking is hard, and these > > structures shine in different ways depending on the type of workload > > you throw at them. Each implementation below was also mostly a > > first-pass to understand the structure and do simple tests, so there > > may be more speed gold in the hills. Your mileage may vary. > > > > That said, our conclusions at the end: > > > > - red black trees are hard to code and understand (mostly due to > > remove), and don't show a real performance win. > > > > - treaps are a wonderful structure in terms of code simplicity, but > > getting enough randomness quickly enough is too costly to make them a > > win over AVL trees (you need to allocate just as much and you need to > > generate randomness) > > > > - splay trees are in our tree, but are too special purpose to be a > general win. > > > > - Weight balanced trees are a nice structure, and are used in other > > languages/libraries. They were neither better or worse than AVL > > trees. > > > > - AVL trees with GADT enforcement work, but were actually slower than > > straightforward AVL trees at the time we tested them. There is some > > extra matching due to the variant having more cases, so perhaps this > > isn't surprising. It's also likely that we didn't carry the > > 2-imbalance trick into the GADT version, which might have skewed the > > result. > > > > - 1-2 brother trees were the best of the lot, and we actually produced > > a version of the code that we felt was an overall win (or tie) for all > > workloads. Unfortunately, the optimizations we needed to get us there > > made the code much longer and harder to understand than the AVL tree > > code. We just couldn't convince ourselves that it was worth it. > > > > Probably the most important point is that nothing we did above gave a > > general win of more than 10-20% in the tight loop case. Given that, > > we kept our tweaked AVL tree implementation. If you want to be very > > very fast, you probably can't get away with a map, and if you just > > want to be "fast enough" the AVL tree we have is a nice set of > > tradeoffs for code complexity. > > > > On Mon, Jun 2, 2014 at 11:06 AM, Gabriel Scherer > > wrote: > >> Note that OCaml's balanced trees are not exactly what is usually > >> called AVL, as the imbalance between different branches can be at most > >> 2 (+1 on one side and -1 on the other) instead of just 1 as the > >> traditional definition assumes. > >> > >> On Mon, Jun 2, 2014 at 3:34 PM, Andrew Herron > wrote: > >>> Wikipedia has some notes on the difference: > >>> > >>> http://en.wikipedia.org/wiki/AVL_tree > >>> > >>> AVL has faster lookup, so maybe they decided to optimise for that. > >>> > >>> It's different to some other languages I've seen, but then so is their > >>> decision to not use a tail recursive List.map. Each to their own, it's > not > >>> hard to implement the alternative :) > >>> > >>> > >>> On Mon, Jun 2, 2014 at 11:21 PM, Damien Guichard > > >>> wrote: > >>>> > >>>> > >>>> Red-black tree would spare a machine word per node, because a > red-black > >>>> tree doesn't need depth information. > >>>> Hence the reason is either historical or a space/speed trade-off > >>>> (comparing two depths may be faster than pattern matching). > >>>> > >>>> Regards, > >>>> > >>>> damien guichard > >>>> > >>>> Hi, list, > >>>> > >>>> Just from the curiosity, why balanced binary trees used in Set and > Map are > >>>> AVL-trees, not their alternative, say, red-black trees? Is there a > deep > >>>> reason for it, or just a historical one? > >>>> > >>>> Best, > >>>> -- > >>>> Yoriyuki Yamagata > >>>> http://yoriyuki.info/ > >>>> > >>>> > >>> > >> > >> -- > >> Caml-list mailing list. Subscription management and archives: > >> https://sympa.inria.fr/sympa/arc/caml-list > >> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > >> Bug reports: http://caml.inria.fr/bin/caml-bugs >