From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by sympa.inria.fr (Postfix) with ESMTPS id 560277F8F2 for ; Tue, 3 Jun 2014 15:42:13 +0200 (CEST) Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of yoriyuki.y@gmail.com) identity=pra; client-ip=209.85.192.46; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="yoriyuki.y@gmail.com"; x-sender="yoriyuki.y@gmail.com"; x-conformance=sidf_compatible Received-SPF: Pass (mail2-smtp-roc.national.inria.fr: domain of yoriyuki.y@gmail.com designates 209.85.192.46 as permitted sender) identity=mailfrom; client-ip=209.85.192.46; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="yoriyuki.y@gmail.com"; x-sender="yoriyuki.y@gmail.com"; x-conformance=sidf_compatible; x-record-type="v=spf1" Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of postmaster@mail-qg0-f46.google.com) identity=helo; client-ip=209.85.192.46; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="yoriyuki.y@gmail.com"; x-sender="postmaster@mail-qg0-f46.google.com"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: An0BAEDQjVPRVcAulGdsb2JhbABZg1lYgmyqDIxriHsBgQUIFg4BAQEBBwsLCRIqgiUBAQEDARIRHQEbCQkLAQMBCwYFCw0MAR0CAiEBAREBBQEKEgYTEgwEiAsBAwkIDaAraosngXKDDZktChknAwpkhF4RAQUMiSeDCYEuEVMEB4J1gUsEhF8FhUGLWYINgXiBPot/hAMYKYR3LoEx X-IPAS-Result: An0BAEDQjVPRVcAulGdsb2JhbABZg1lYgmyqDIxriHsBgQUIFg4BAQEBBwsLCRIqgiUBAQEDARIRHQEbCQkLAQMBCwYFCw0MAR0CAiEBAREBBQEKEgYTEgwEiAsBAwkIDaAraosngXKDDZktChknAwpkhF4RAQUMiSeDCYEuEVMEB4J1gUsEhF8FhUGLWYINgXiBPot/hAMYKYR3LoEx X-IronPort-AV: E=Sophos;i="4.98,965,1392159600"; d="scan'208";a="77805852" Received: from mail-qg0-f46.google.com ([209.85.192.46]) by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/RC4-SHA; 03 Jun 2014 15:42:09 +0200 Received: by mail-qg0-f46.google.com with SMTP id q108so12986170qgd.33 for ; Tue, 03 Jun 2014 06:42:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=Y88xCfWLTJLzsezj3Lpis9tcnzi5HlZsYNmB5uhcl68=; b=Fx7Jf/ERFetnQo5aNs4VukmZsR8D6YJgpN3XkVqTUvbndY+puq88UxVAF2hP3GIGOK D0he6aVvlOATV0i03KfAY0MRn4QEe0FGbHOD3rTBpfKWGNSH703uvVBnYjYJnV8ZKs4+ fRf11cz5LiPaVNTFTpNLt+8AO9U+NhZiIT6byCvu4MGIFbLnlIX3E1/n15Zl3FOCcpD6 6ux5QowE1wz7sqdT/Ba4Kj/OUUM2DchGmtXDzLybyUlE9GsmQa/oAGxmwdWA2QLU69oN NS00YCo3cv2+Hi+KXHiIgabziw5v6wOktJvqPpuZWPBu/2oNXimy/vHd7HvLJbPjVMkl lfMA== X-Received: by 10.140.91.113 with SMTP id y104mr58420591qgd.3.1401802928720; Tue, 03 Jun 2014 06:42:08 -0700 (PDT) MIME-Version: 1.0 Received: by 10.140.41.228 with HTTP; Tue, 3 Jun 2014 06:41:48 -0700 (PDT) In-Reply-To: References: <543099239773658961@orange.fr> <1401716071976.85ecb8da@Nodemailer> From: Yoriyuki Yamagata Date: Tue, 3 Jun 2014 22:41:48 +0900 Message-ID: To: Caml List Cc: David Powers Content-Type: multipart/alternative; boundary=001a113b57429c911904faeeada4 Subject: Re: [Caml-list] Why AVL-tree? --001a113b57429c911904faeeada4 Content-Type: text/plain; charset=UTF-8 Thank you everyone for answering my questions. By the way, in Batteries Included, AVL-trees used in BatISet and BatIMap allow height difference by 1 (?). I think the code is originated from my Camomile. When I wrote them, I'm afraid of "deviating" from the text book definition of AVL tree. Maybe the change is order? Best, 2014-06-03 21:48 GMT+09:00 Yaron Minsky : > The following summary of what we do with respect to Maps and Sets in > Core was written by David Powers (who isn't yet subscribe to the list, > so he asked me to forward it on.) > > In Core we use a slight modification of the AVL tree found in the > standard library. I think the biggest change (other than the > interface) is that we add a specialized constructor (Leaf of 'key * > 'value) as a specialization of Node (left * key * value * right) to > limit allocation. It's a nice speed bump and doesn't do too much > damage to the readability of the code. > > We also spent a bunch of time last summer working through the research > papers of the last 10 years to see if we could find an implementation > we liked better. I'd have to pull up the full history of the project > to give real details, but we tried at least all of the following: > > - red-black trees > - left-leaning red-black trees > - treaps (including a variant that stored entropy in the spare bits in > the variant tag) > - splay trees > - weight balanced trees > - AVL trees with GADT enforcement of the invariants > - 1-2 brother trees > > I'll lead with the caveat that benchmarking is hard, and these > structures shine in different ways depending on the type of workload > you throw at them. Each implementation below was also mostly a > first-pass to understand the structure and do simple tests, so there > may be more speed gold in the hills. Your mileage may vary. > > That said, our conclusions at the end: > > - red black trees are hard to code and understand (mostly due to > remove), and don't show a real performance win. > > - treaps are a wonderful structure in terms of code simplicity, but > getting enough randomness quickly enough is too costly to make them a > win over AVL trees (you need to allocate just as much and you need to > generate randomness) > > - splay trees are in our tree, but are too special purpose to be a general > win. > > - Weight balanced trees are a nice structure, and are used in other > languages/libraries. They were neither better or worse than AVL > trees. > > - AVL trees with GADT enforcement work, but were actually slower than > straightforward AVL trees at the time we tested them. There is some > extra matching due to the variant having more cases, so perhaps this > isn't surprising. It's also likely that we didn't carry the > 2-imbalance trick into the GADT version, which might have skewed the > result. > > - 1-2 brother trees were the best of the lot, and we actually produced > a version of the code that we felt was an overall win (or tie) for all > workloads. Unfortunately, the optimizations we needed to get us there > made the code much longer and harder to understand than the AVL tree > code. We just couldn't convince ourselves that it was worth it. > > Probably the most important point is that nothing we did above gave a > general win of more than 10-20% in the tight loop case. Given that, > we kept our tweaked AVL tree implementation. If you want to be very > very fast, you probably can't get away with a map, and if you just > want to be "fast enough" the AVL tree we have is a nice set of > tradeoffs for code complexity. > > On Mon, Jun 2, 2014 at 11:06 AM, Gabriel Scherer > wrote: > > Note that OCaml's balanced trees are not exactly what is usually > > called AVL, as the imbalance between different branches can be at most > > 2 (+1 on one side and -1 on the other) instead of just 1 as the > > traditional definition assumes. > > > > On Mon, Jun 2, 2014 at 3:34 PM, Andrew Herron > wrote: > >> Wikipedia has some notes on the difference: > >> > >> http://en.wikipedia.org/wiki/AVL_tree > >> > >> AVL has faster lookup, so maybe they decided to optimise for that. > >> > >> It's different to some other languages I've seen, but then so is their > >> decision to not use a tail recursive List.map. Each to their own, it's > not > >> hard to implement the alternative :) > >> > >> > >> On Mon, Jun 2, 2014 at 11:21 PM, Damien Guichard > >> wrote: > >>> > >>> > >>> Red-black tree would spare a machine word per node, because a red-black > >>> tree doesn't need depth information. > >>> Hence the reason is either historical or a space/speed trade-off > >>> (comparing two depths may be faster than pattern matching). > >>> > >>> Regards, > >>> > >>> damien guichard > >>> > >>> Hi, list, > >>> > >>> Just from the curiosity, why balanced binary trees used in Set and Map > are > >>> AVL-trees, not their alternative, say, red-black trees? Is there a > deep > >>> reason for it, or just a historical one? > >>> > >>> Best, > >>> -- > >>> Yoriyuki Yamagata > >>> http://yoriyuki.info/ > >>> > >>> > >> > > > > -- > > Caml-list mailing list. Subscription management and archives: > > https://sympa.inria.fr/sympa/arc/caml-list > > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > > Bug reports: http://caml.inria.fr/bin/caml-bugs > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa.inria.fr/sympa/arc/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > -- Yoriyuki Yamagata *http://yoriyuki.info/ * --001a113b57429c911904faeeada4 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Thank you everyone for answering my questions.

By the way, in Batteries Included, AVL-trees used in BatISet and Ba= tIMap allow height difference by 1 (?). =C2=A0I think the code is originate= d from my Camomile. =C2=A0When I wrote them, I'm afraid of "deviat= ing" from the text book definition of AVL tree.

Maybe the change is order?

Bes= t,

2014-06-0= 3 21:48 GMT+09:00 Yaron Minsky <yminsky@janestreet.com>= :
The following summary of what we do with res= pect to Maps and Sets in
Core was written by David Powers (who isn't yet subscribe to the list,<= br> so he asked me to forward it on.)

In Core we use a slight modification of the AVL tree found in the
standard library. =C2=A0I think the biggest change (other than the
interface) is that we add a specialized constructor (Leaf of 'key *
'value) as a specialization of Node (left * key * value * right) to
limit allocation. =C2=A0It's a nice speed bump and doesn't do too m= uch
damage to the readability of the code.

We also spent a bunch of time last summer working through the research
papers of the last 10 years to see if we could find an implementation
we liked better. =C2=A0I'd have to pull up the full history of the proj= ect
to give real details, but we tried at least all of the following:

- red-black trees
- left-leaning red-black trees
- treaps (including a variant that stored entropy in the spare bits in
the variant tag)
- splay trees
- weight balanced trees
- AVL trees with GADT enforcement of the invariants
- 1-2 brother trees

I'll lead with the caveat that benchmarking is hard, and these
structures shine in different ways depending on the type of workload
you throw at them. =C2=A0Each implementation below was also mostly a
first-pass to understand the structure and do simple tests, so there
may be more speed gold in the hills. =C2=A0Your mileage may vary.

That said, our conclusions at the end:

- red black trees are hard to code and understand (mostly due to
remove), and don't show a real performance win.

- treaps are a wonderful structure in terms of code simplicity, but
getting enough randomness quickly enough is too costly to make them a
win over AVL trees (you need to allocate just as much and you need to
generate randomness)

- splay trees are in our tree, but are too special purpose to be a general = win.

- Weight balanced trees are a nice structure, and are used in other
languages/libraries. =C2=A0They were neither better or worse than AVL
trees.

- AVL trees with GADT enforcement work, but were actually slower than
straightforward AVL trees at the time we tested them. =C2=A0There is some extra matching due to the variant having more cases, so perhaps this
isn't surprising. =C2=A0It's also likely that we didn't carry t= he
2-imbalance trick into the GADT version, which might have skewed the
result.

- 1-2 brother trees were the best of the lot, and we actually produced
a version of the code that we felt was an overall win (or tie) for all
workloads. =C2=A0Unfortunately, the optimizations we needed to get us there=
made the code much longer and harder to understand than the AVL tree
code. =C2=A0We just couldn't convince ourselves that it was worth it.
Probably the most important point is that nothing we did above gave a
general win of more than 10-20% in the tight loop case. =C2=A0Given that, we kept our tweaked AVL tree implementation. =C2=A0If you want to be very very fast, you probably can't get away with a map, and if you just
want to be "fast enough" the AVL tree we have is a nice set of
tradeoffs for code complexity.

On Mon, Jun 2, 2014 at 11:06 AM, Gabriel Scherer
<gabriel.= scherer@gmail.com> wrote:
> Note that OCaml's balanced trees are not exactly what is usually > called AVL, as the imbalance between different branches can be at most=
> 2 (+1 on one side and -1 on the other) instead of just 1 as the
> traditional definition assumes.
>
> On Mon, Jun 2, 2014 at 3:34 PM, Andrew Herron <andrew.herron@gmail.com> wr= ote:
>> Wikipedia has some notes on the difference:
>>
>> http://en.wikipedia.org/wiki/AVL_tree
>>
>> AVL has faster lookup, so maybe they decided to optimise for that.=
>>
>> It's different to some other languages I've seen, but then= so is their
>> decision to not use a tail recursive List.map. Each to their own, = it's not
>> hard to implement the alternative :)
>>
>>
>> On Mon, Jun 2, 2014 at 11:21 PM, Damien Guichard <alphablock@orange.fr> >> wrote:
>>>
>>>
>>> Red-black tree would spare a machine word per node, because a = red-black
>>> tree doesn't need depth information.
>>> Hence the reason is either historical or a space/speed trade-o= ff
>>> (comparing two depths may be faster than pattern matching).
>>>
>>> Regards,
>>>
>>> damien guichard
>>>
>>> Hi, list,
>>>
>>> Just from the curiosity, why balanced binary trees used in Set= and Map are
>>> AVL-trees, not their alternative, say, red-black trees? =C2=A0= Is there a deep
>>> reason for it, or just a historical one?
>>>
>>> Best,
>>> --
>>> Yoriyuki Yamagata
>>> http://yor= iyuki.info/
>>>
>>>
>>
>
> --
> Caml-list mailing list. =C2=A0Subscription management and archives:
> https://sympa.inria.fr/sympa/arc/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports:
http://caml.inria.fr/bin/caml-bugs

--
Caml-list mailing list. =C2=A0Subscription management and archives:
ht= tps://sympa.inria.fr/sympa/arc/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



--
=
Yoriyuki Yamagata
http://yoriyuki.info/
--001a113b57429c911904faeeada4--