Re: [Caml-list] Why AVL-tree? - Diego Olivier Fernandez Pons

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

From: Diego Olivier Fernandez Pons <dofp.ocaml@gmail.com>
To: Xavier Leroy <Xavier.Leroy@inria.fr>
Cc: caml-list <caml-list@inria.fr>
Subject: Re: [Caml-list] Why AVL-tree?
Date: Sun, 3 Aug 2014 23:25:20 +0200	[thread overview]
Message-ID: <CAHqiZ-+0QiatZwq7CTUmBGV6CtsfZw-eqR=QmdyqCL83T=EF4Q@mail.gmail.com> (raw)
In-Reply-To: <538CAD12.2040104@inria.fr>

[-- Attachment #1: Type: text/plain, Size: 5202 bytes --]

    Hi

> At the time Set was written, no efficient algorithms for whole-set
> operations (union, intersection, difference, etc) were known for
> red-black trees.  I'm not sure they are known today.
>

The answer is no and I will try to explain.

When doing a set union, at some point you will have to merge two disjoint
sets represented each by a tree

[1 .. 10] union [15 .. 22]

At that point you need to know the "size" of each tree because if both
trees are the same size you just need to append them

    return Node ([1..10], [15..22])

while if one tree is bigger than the other, you will need to cut that tree
into smaller pieces and reorganize them (a "rotation").

    divide [1..10] into [1..5], [6..10]
    return Node([1..5], Node ([6..10], [15..22]))

The AVL implementation of OCaml gives you the "size" of each tree, its the
extra int in the constructor

    'a tree = Empty | Node 'a tree * 'a * 'a tree * int <--- here

it is equal to the longest path in the tree

    f = function E -> 0 | Node (l, _, r, h) -> 1 + max (f l, fr)

If you implement Weight-balanced trees, that value is the cardinal of the
set represented by the tree.

If you use red-black or "traditional AVL" representation, that extra bit
just tells you if the tree is leaning towards the left or towards the
right, but doesn't tell you any information about the global "size" of the
tree. Which means you cannot implement set union directly. You have to
traverse full tree in O(n) to recompute that information.

There is at least one library in Haskell that implements AVL trees that
way, it uses -1 | 0 | 1 information for most operations and recomputes the
longest path for all trees when doing set operations, then discards that
info.

There is also a way to implement red black trees to do set operations just
like (non-traditional) AVLs. It is described in Olivié H.J A new class of
balanced search trees : half balanced search trees, RAIRO Informatique
Théorique 16, 51 71 - also in his PhD Thesis.

Basically a tree is red-black when shortest-branch * 2 >= longest-branch
(cf. Constructing Red-Black trees, Ralf Hinze 1999 - easier to find than
Olivié's works)

So you can implement a tree structure that memorizes both the shortest
and the longest branch in each node

    'a tree = E | N of 'a tree * 'a * 'a tree * int * int <---- shortest
and longest

From there you just follow the same scheme than the AVL implementation in
the OCaml lib, you just need to cope with a couple of triple rotation cases.

> As for performance of insert/lookup operations, Jean-Christophe
> Filliâtre has measurements showing that OCaml's 2-imbalance AVL trees
> perform better than red-black trees.  It all depends on your ratio of
> insertions to lookups, of course.  But the 2-imbalance trick makes a
> big difference with textbook AVL trees.
>

This is a bit more difficult but in short "red-black trees do not implement
truly red-black trees" and "all implementations of red-black trees
implement a different approximation of red-black trees".

Any implementation of red-black trees that uses the 1 bit trick and where
insertions are in log (n) is not surjective. You can understand that as
meaning
(1) - some red-black trees will never be built by the balancing algorithm
(2) - sometimes you will give the balancing algorithm a red-black tree (=
has a provable red-black coloring) and it will still rebalance it instead
of noticing it's already balanced and returning it untouched.

The reason is insertion in red-black trees using the 1 bit trick is linear.
Meaning if I want a balancing algorithm that never falls into (1) or (2) I
cannot avoid linear time insertion. The result is all algorithms found in
the literature are log(n) approximations of the full algorithm. The only
algorithm that is truly logarithmic is again Olivié's thanks to it's
implicit representation of the red-black coloring.

From there, comparing performance of AVLs with "red-black trees" in general
makes no sense as the person that implemented them may have implemented any
arbitrary approximation of the full algorithm. This translates in the code
as a random number of "cases" to check for balancing : 4 for Okasaki, 27 in
the CLR, etc. With double, triple, quadruple rotations, according to how
"deep" the author was willing to go. Because what these guys are doing is
"unfolding" the linear behavior of the algorithm into a constant and
falling back into rebuilding any tree they are given for all other cases.

In any case, some benchmarks I did very, very, very longtime ago showed AVL
and Weight balanced are good all-purpose implementations. The advantage of
WB being it gives you the cardinality of the set in O(1). In some sense,
AVL is the log of WB (in the same way the height of a tree is the log of
its size).

        Diego Olivier

>
> - Xavier Leroy
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa.inria.fr/sympa/arc/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>

[-- Attachment #2: Type: text/html, Size: 7020 bytes --]

next prev parent reply	other threads:[~2014-08-03 21:25 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-02 13:21 Damien Guichard
2014-06-02 13:34 ` Andrew Herron
2014-06-02 15:06   ` Gabriel Scherer
2014-06-03 12:48     ` Yaron Minsky
2014-06-03 13:12       ` Gabriel Scherer
2014-06-03 13:37         ` Yaron Minsky
2014-06-03 13:41       ` Yoriyuki Yamagata
2014-06-02 16:57   ` Xavier Leroy
2014-06-02 21:16     ` Andrew Herron
2014-06-10 18:19     ` jonikelee
2014-06-10 18:51       ` Florian Hars
2014-06-10 19:52         ` Jonathan
2014-06-15  4:51       ` Lukasz Stafiniak
2014-06-15 14:01         ` Jonathan
2014-08-03 21:25     ` Diego Olivier Fernandez Pons [this message]
  -- strict thread matches above, loose matches on Subject: below --
2014-06-02 18:23 Damien Guichard
2014-06-02 11:48 Yoriyuki Yamagata

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHqiZ-+0QiatZwq7CTUmBGV6CtsfZw-eqR=QmdyqCL83T=EF4Q@mail.gmail.com' \
    --to=dofp.ocaml@gmail.com \
    --cc=Xavier.Leroy@inria.fr \
    --cc=caml-list@inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).