Thank you everyone for answering my questions. By the way, in Batteries Included, AVL-trees used in BatISet and BatIMap allow height difference by 1 (?). I think the code is originated from my Camomile. When I wrote them, I'm afraid of "deviating" from the text book definition of AVL tree. Maybe the change is order? Best, 2014-06-03 21:48 GMT+09:00 Yaron Minsky : > The following summary of what we do with respect to Maps and Sets in > Core was written by David Powers (who isn't yet subscribe to the list, > so he asked me to forward it on.) > > In Core we use a slight modification of the AVL tree found in the > standard library. I think the biggest change (other than the > interface) is that we add a specialized constructor (Leaf of 'key * > 'value) as a specialization of Node (left * key * value * right) to > limit allocation. It's a nice speed bump and doesn't do too much > damage to the readability of the code. > > We also spent a bunch of time last summer working through the research > papers of the last 10 years to see if we could find an implementation > we liked better. I'd have to pull up the full history of the project > to give real details, but we tried at least all of the following: > > - red-black trees > - left-leaning red-black trees > - treaps (including a variant that stored entropy in the spare bits in > the variant tag) > - splay trees > - weight balanced trees > - AVL trees with GADT enforcement of the invariants > - 1-2 brother trees > > I'll lead with the caveat that benchmarking is hard, and these > structures shine in different ways depending on the type of workload > you throw at them. Each implementation below was also mostly a > first-pass to understand the structure and do simple tests, so there > may be more speed gold in the hills. Your mileage may vary. > > That said, our conclusions at the end: > > - red black trees are hard to code and understand (mostly due to > remove), and don't show a real performance win. > > - treaps are a wonderful structure in terms of code simplicity, but > getting enough randomness quickly enough is too costly to make them a > win over AVL trees (you need to allocate just as much and you need to > generate randomness) > > - splay trees are in our tree, but are too special purpose to be a general > win. > > - Weight balanced trees are a nice structure, and are used in other > languages/libraries. They were neither better or worse than AVL > trees. > > - AVL trees with GADT enforcement work, but were actually slower than > straightforward AVL trees at the time we tested them. There is some > extra matching due to the variant having more cases, so perhaps this > isn't surprising. It's also likely that we didn't carry the > 2-imbalance trick into the GADT version, which might have skewed the > result. > > - 1-2 brother trees were the best of the lot, and we actually produced > a version of the code that we felt was an overall win (or tie) for all > workloads. Unfortunately, the optimizations we needed to get us there > made the code much longer and harder to understand than the AVL tree > code. We just couldn't convince ourselves that it was worth it. > > Probably the most important point is that nothing we did above gave a > general win of more than 10-20% in the tight loop case. Given that, > we kept our tweaked AVL tree implementation. If you want to be very > very fast, you probably can't get away with a map, and if you just > want to be "fast enough" the AVL tree we have is a nice set of > tradeoffs for code complexity. > > On Mon, Jun 2, 2014 at 11:06 AM, Gabriel Scherer > wrote: > > Note that OCaml's balanced trees are not exactly what is usually > > called AVL, as the imbalance between different branches can be at most > > 2 (+1 on one side and -1 on the other) instead of just 1 as the > > traditional definition assumes. > > > > On Mon, Jun 2, 2014 at 3:34 PM, Andrew Herron > wrote: > >> Wikipedia has some notes on the difference: > >> > >> http://en.wikipedia.org/wiki/AVL_tree > >> > >> AVL has faster lookup, so maybe they decided to optimise for that. > >> > >> It's different to some other languages I've seen, but then so is their > >> decision to not use a tail recursive List.map. Each to their own, it's > not > >> hard to implement the alternative :) > >> > >> > >> On Mon, Jun 2, 2014 at 11:21 PM, Damien Guichard > >> wrote: > >>> > >>> > >>> Red-black tree would spare a machine word per node, because a red-black > >>> tree doesn't need depth information. > >>> Hence the reason is either historical or a space/speed trade-off > >>> (comparing two depths may be faster than pattern matching). > >>> > >>> Regards, > >>> > >>> damien guichard > >>> > >>> Hi, list, > >>> > >>> Just from the curiosity, why balanced binary trees used in Set and Map > are > >>> AVL-trees, not their alternative, say, red-black trees? Is there a > deep > >>> reason for it, or just a historical one? > >>> > >>> Best, > >>> -- > >>> Yoriyuki Yamagata > >>> http://yoriyuki.info/ > >>> > >>> > >> > > > > -- > > Caml-list mailing list. Subscription management and archives: > > https://sympa.inria.fr/sympa/arc/caml-list > > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > > Bug reports: http://caml.inria.fr/bin/caml-bugs > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa.inria.fr/sympa/arc/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > -- Yoriyuki Yamagata *http://yoriyuki.info/ *