caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: skaller <skaller@users.sourceforge.net>
To: Brian Hurt <bhurt@spnz.org>
Cc: Oliver Bandel <oliver@first.in-berlin.de>, caml-list@inria.fr
Subject: Re: [Caml-list] [1/2 OT] Indexing (and mergeable Index-algorithms)
Date: Fri, 18 Nov 2005 05:57:11 +1100	[thread overview]
Message-ID: <1132253831.9668.116.camel@rosella> (raw)
In-Reply-To: <Pine.LNX.4.63.0511171205570.24132@localhost.localdomain>

On Thu, 2005-11-17 at 12:08 -0600, Brian Hurt wrote:

> > I'm not sure what it is we disagree on.
> 
> They don't ever need global rebalancing.

Yup, they do not need it to maintain the invariant
you stated. But performance can still benefit from it
significantly.

Two quite distinct BTrees can contain the same data exactly,
it depends on the order of insertion of keys. If you are
clever, you can fill up a BTree so every block is exactly
full .. however you don't need to be clever to get the worst
possible BTree -- just fill an empty tree with sorted
data and almost all the blocks are guaranteed to be half
full (the worst possible case for storage use and
access time). Yet, this is a common case in practice.
** all the blocks will be half full except those on
the right edge of the tree -- for a tree of depth 5
that's millions of half full blocks, and only 5 that 
can possibly be more than half full :)

And in between, blocks can be filled to different extents.
Rebalancing adjusts this. 

The basic algorithm you describe has MANY tweaks which
attempt to rebalance the tree.

The version I played with did 'scrolling', where
instead of splitting an overfull block, you scroll
a key (and child) across to one of its siblings 
via the parent. This is quite tricky ..  but it
fixes the sorted insertion problem nicely --
with this modification, all the blocks are
always FULL .. except those on the right most
edge and their left siblings: they grow until
they're both full, then the rightmost one (only)
is split. So at worst, you waste 5 blocks
out of millions.. basically this doubles the 
capacity of the tree (built from a sorted list).

The same tweak can be used on random insertions
the same way, however the extra reads and write
may or may not be worth it, depending on how
valuable your disk storage is (etc).

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


  reply	other threads:[~2005-11-17 18:57 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-11-16 23:42 Oliver Bandel
2005-11-17  8:15 ` [Caml-list] " skaller
2005-11-17 15:09   ` Brian Hurt
2005-11-17 17:31     ` skaller
2005-11-17 18:08       ` Brian Hurt
2005-11-17 18:57         ` skaller [this message]
2005-11-17 22:15           ` Brian Hurt
2005-11-18  1:49             ` skaller
2005-11-17  8:35 ` Florian Hars
2005-11-17  9:24   ` Oliver Bandel
2005-11-17 12:39     ` Florian Weimer
2005-11-17 20:57       ` Oliver Bandel
2005-11-17 22:02         ` Florian Weimer
2005-11-17 11:49 ` Florian Weimer
2005-11-17 13:55   ` Richard Jones
2005-11-18 14:54   ` Jonathan Bryant
2005-11-18 14:22     ` Oliver Bandel
2005-11-18 14:37       ` Florian Weimer
2005-11-18 15:05         ` Thomas Fischbacher
2005-11-18 15:14           ` Florian Weimer
2005-11-18 16:03             ` Thomas Fischbacher
2005-11-18 20:03               ` Gerd Stolpmann
2005-11-18 20:01             ` Gerd Stolpmann
2005-11-18 21:12               ` Florian Weimer
2005-11-18 16:13         ` Oliver Bandel
2005-11-18 14:45     ` Florian Weimer
     [not found] ` <437CD0E5.8080503@yahoo.fr>
2005-11-17 20:02   ` Oliver Bandel
     [not found]     ` <437CE8EC.1070109@yahoo.fr>
2005-11-17 20:41       ` Oliver Bandel
2005-11-18 15:06         ` Florian Hars
     [not found] ` <437BD5F5.6010307@1969web.com>
2005-11-17 20:10   ` Oliver Bandel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1132253831.9668.116.camel@rosella \
    --to=skaller@users.sourceforge.net \
    --cc=bhurt@spnz.org \
    --cc=caml-list@inria.fr \
    --cc=oliver@first.in-berlin.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).