caml-list - the Caml user's mailing list
From: skaller <skaller@users.sourceforge.net>
To: Brian Hurt <bhurt@spnz.org>
Cc: Oliver Bandel <oliver@first.in-berlin.de>, caml-list@inria.fr
Subject: Re: [Caml-list] [1/2 OT] Indexing (and mergeable Index-algorithms)
Date: Fri, 18 Nov 2005 12:49:55 +1100
Message-ID: <1132278595.9668.127.camel@rosella>
In-Reply-To: <Pine.LNX.4.63.0511171459120.24132@localhost.localdomain>

On Thu, 2005-11-17 at 16:15 -0600, Brian Hurt wrote:

> 
> This is the worst possible case: every block is half full.  Which 
> means that instead of log_k(N) blocks, you're having to touch log_{k/2}(N) 
> blocks.  This means that if N=2^32 and k=256, that you need to read 5 
> blocks instead of 4 (128^5 = 2^35).  And the number of blocks you need has 
> about doubled.  Also note that the binary search per block is now cheaper 
> (by one step), and the cost of inserting elements is half.
> 
> So the question becomes: is the performance advantage gained by 
> rebalancing worth the cost?

Yes, that's the question. And there is no single answer :)

Note, it is not 5 reads instead of 4, it is 3 reads instead of 2
(assuming the first two levels are cached).
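The arithmetic above can be checked with a small OCaml sketch. It
assumes a 64-bit OCaml (so 2^32 fits in an int), treats a half-full
block as a fan-out of k/2 = 128, and the names levels and reads are
made up for illustration only.

```ocaml
(* Smallest number of levels h such that fanout^h >= n.  Integer
   arithmetic avoids rounding trouble with floating-point logarithms. *)
let levels ~fanout ~n =
  let rec go h capacity =
    if capacity >= n then h else go (h + 1) (capacity * fanout)
  in
  go 0 1

(* Disk reads per lookup when the top [cached] levels stay in memory. *)
let reads ~fanout ~n ~cached = max 0 ((levels ~fanout ~n) - cached)

let () =
  let n = 1 lsl 32 in
  (* prints: full blocks (k=256): 4 levels, 2 reads *)
  Printf.printf "full blocks (k=256): %d levels, %d reads\n"
    (levels ~fanout:256 ~n) (reads ~fanout:256 ~n ~cached:2);
  (* prints: half-full blocks (k=128): 5 levels, 3 reads *)
  Printf.printf "half-full blocks (k=128): %d levels, %d reads\n"
    (levels ~fanout:128 ~n) (reads ~fanout:128 ~n ~cached:2)
```

With two levels cached, the extra level costs one more read out of
two, which is why a tree fixed at 3 levels makes the fill factor
matter so much.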

A BTree system I used once was fixed at 3 levels. So it could
be kind of critical :)

> If I was worried about it, I'd be inclined to be more aggressive on merging 
> and splitting nodes.  Basically, if the node is under 5/8th full, I'd look 
> to steal some children from siblings.  If the node is over 7/8th full, I'd 
> look to share some child with siblings.  Note that if you have three nodes 
> each 1/2 full, you can combine the three into two nodes, each 3/4th full. 
> You want to keep nodes about 3/4th full, as that makes it cheaper to add 
> and delete elements.

Yup. There are lots of possible tweaks :)
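One of those tweaks, the 5/8 and 7/8 thresholds quoted above, might
look roughly like this in OCaml. This is only a sketch: nodes are
modelled as plain lists of children, and the names action,
rebalance_action, split_at and merge_three_into_two are hypothetical,
not taken from any real BTree library.

```ocaml
type action = Steal_from_sibling | Share_with_sibling | Leave_alone

(* Decide what to do with a node holding [fill] children out of
   [capacity].  Comparisons are cross-multiplied to stay in integers. *)
let rebalance_action ~capacity ~fill =
  if 8 * fill < 5 * capacity then Steal_from_sibling       (* under 5/8 *)
  else if 8 * fill > 7 * capacity then Share_with_sibling  (* over 7/8 *)
  else Leave_alone

(* Split a list into its first [n] elements and the rest. *)
let rec split_at n xs =
  match xs with
  | x :: rest when n > 0 ->
      let (l, r) = split_at (n - 1) rest in
      (x :: l, r)
  | _ -> ([], xs)

(* Redistribute the children of three adjacent siblings over two nodes,
   keeping their order.  Three half-full nodes give two 3/4-full ones. *)
let merge_three_into_two a b c =
  let all = a @ b @ c in
  split_at ((List.length all + 1) / 2) all

let () =
  let half_full start = List.init 128 (fun i -> start + i) in
  let (left, right) =
    merge_three_into_two (half_full 0) (half_full 128) (half_full 256) in
  (* prints: 192 192 *)
  Printf.printf "%d %d\n" (List.length left) (List.length right)
```

With a capacity of 256, each merged node holds 192 children, i.e. 3/4
full, which falls between the two thresholds and so needs no further
action.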

> Two problems with this: first, what happens when the sibling is full too, 
> you can get into a case where an insert is O(N) cost, and second, this is 
> assuming inserts only (I can still get to worst-case with deletes).

Depends precisely on the algorithm -- mine only looked once.
If the sibling was full, you just split as usual. It's a cheap 
hack :)
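
A rough OCaml sketch of the "look once" idea, not the original code:
nodes are modelled as sorted int lists, only the right sibling is
examined, and updating the separator key in the parent is left out.
The names outcome, insert_sorted, split_at and insert_look_once are
invented for this illustration.

```ocaml
type outcome =
  | Fits of int list                 (* the node had room *)
  | Shifted of int list * int list   (* overflow moved to the sibling *)
  | Split of int list * int list     (* sibling full too: ordinary split *)

(* Insert [k] into an ascending list, keeping it sorted. *)
let rec insert_sorted k = function
  | [] -> [k]
  | x :: rest as l -> if k <= x then k :: l else x :: insert_sorted k rest

(* Split a list into its first [n] elements and the rest. *)
let rec split_at n xs =
  match xs with
  | x :: rest when n > 0 ->
      let (l, r) = split_at (n - 1) rest in
      (x :: l, r)
  | _ -> ([], xs)

(* Insert [k] into [node].  On overflow, look at the right sibling
   exactly once: if it has room, push the largest key over to it;
   otherwise split the node in two and leave the sibling untouched. *)
let insert_look_once ~capacity ~node ~right_sibling k =
  let node' = insert_sorted k node in
  let len = List.length node' in
  if len <= capacity then Fits node'
  else if List.length right_sibling < capacity then
    let (keep, moved) = split_at (len - 1) node' in
    Shifted (keep, moved @ right_sibling)
  else
    let (left, right) = split_at (len / 2) node' in
    Split (left, right)
```

Because the sibling is consulted at most once, an insert never
cascades sideways through a run of full nodes, so the O(N) case
raised above does not arise; the price is that some splits happen
that a more thorough search could have avoided.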

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

