9fans - fans of the OS Plan 9 from Bell Labs
From: wb.kloke@gmail.com
To: 9fans <9fans@9fans.net>
Subject: [9fans] yet another try to fixup venti
Date: Tue, 11 Jun 2024 16:52:30 -0400
Message-ID: <17181391500.35F5.93227@composer.9fans.topicbox.com>

After studying Steve Stallion's SSD venti disaster, I decided to make my own attempt at fixing venti's issues.

Despite my reservations about the lasting wisdom of some of the design choices, I am trying to keep the traditional arena disk layout.
Only the on-disk index is replaced, by a trie-based in-memory structure.

Each trie node is either a leaf holding the score and its IAddr data, or an inner node holding 16 indices, one per value of the next nibble of the score, to continue the search. There is no need for a Bloom filter, as the trie search is no slower for negative results. The trie node size is currently 64 bytes, but can probably be shortened to 48 bytes.
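
A minimal sketch of such a node and of lookup/insert over a contiguous node array, under stated assumptions. None of the names below come from buildtrie: TrieNode, trielookup, trieinsert, and in particular the LEAFBIT tag (the top bit of a 32-bit child index marking the child as a leaf) are hypothetical, and the IAddr here is a simplified stand-in. The array is kept tiny so the sketch runs anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCORESIZE 20            /* SHA-1 score, 40 nibbles */
#define FANOUT    16            /* one child per nibble value */
#define LEAFBIT   0x80000000u   /* assumption: top bit tags a leaf index */
#define MAXNODES  1024          /* tiny, for illustration only */

typedef struct IAddr IAddr;
struct IAddr {                  /* simplified stand-in */
	uint64_t addr;
	uint16_t size;
	uint8_t  type;
	uint8_t  blocks;
};

typedef union TrieNode TrieNode;
union TrieNode {
	uint32_t child[FANOUT];     /* inner node: 0 means "no child" */
	struct {
		uint8_t score[SCORESIZE];
		IAddr   ia;
	} leaf;
	uint8_t pad[64];            /* keep the node at 64 bytes */
};

static TrieNode nodes[MAXNODES];    /* slot 0 is the root (inner node) */
static uint32_t nnodes = 1;

/* depth-th nibble of the score, high nibble of each byte first */
static int
nibble(const uint8_t *score, int depth)
{
	uint8_t b = score[depth/2];
	return (depth & 1) ? (b & 0xF) : (b >> 4);
}

static uint32_t
newnode(void)
{
	assert(nnodes < MAXNODES);
	memset(&nodes[nnodes], 0, sizeof nodes[0]);
	return nnodes++;
}

/* Returns the IAddr stored for score, or NULL if it is absent. */
static const IAddr*
trielookup(const uint8_t *score)
{
	uint32_t n = 0, c;
	int depth;

	for(depth = 0; depth < 2*SCORESIZE; depth++){
		c = nodes[n].child[nibble(score, depth)];
		if(c == 0)
			return NULL;            /* negative result: one walk, no filter */
		if(c & LEAFBIT){
			TrieNode *l = &nodes[c & ~LEAFBIT];
			return memcmp(l->leaf.score, score, SCORESIZE) == 0 ? &l->leaf.ia : NULL;
		}
		n = c;
	}
	return NULL;
}

static void
trieinsert(const uint8_t *score, IAddr ia)
{
	uint32_t n = 0, c, old, inner, l;
	uint32_t *slot;
	int depth;

	for(depth = 0; depth < 2*SCORESIZE; depth++){
		slot = &nodes[n].child[nibble(score, depth)];
		c = *slot;
		if(c == 0){                 /* free slot: hang a new leaf here */
			l = newnode();
			memcpy(nodes[l].leaf.score, score, SCORESIZE);
			nodes[l].leaf.ia = ia;
			*slot = l | LEAFBIT;
			return;
		}
		if(c & LEAFBIT){
			old = c & ~LEAFBIT;
			if(memcmp(nodes[old].leaf.score, score, SCORESIZE) == 0)
				return;             /* score already present */
			/* collision: push the old leaf one level down and retry there */
			inner = newnode();
			nodes[inner].child[nibble(nodes[old].leaf.score, depth+1)] = c;
			*slot = inner;
			n = inner;
			continue;
		}
		n = c;
	}
}
```

In this layout a miss ends at the first empty slot on the path, which is why a negative lookup costs no more than a positive one.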

So far, I have managed to convert buildindex into buildtrie. If the -v option is used, the contents of the trie are printed in lexical order of the scores.

The results of my experiments so far:

I used my 4 arena files, each 20GB, containing about 10 million clumps in standard 500MB arenas. The data from the arena directories are read in about one and a half minutes. (There is one error in one of the arenas.) IMHO this is acceptable as startup time for a venti server.

The trie has about 14 million nodes, which are stored in a contiguous array. The trie currently uses 32-bit indices; since 14 million is below 2^24 (about 16.7 million), 24-bit indices would suffice for the current amount of data.
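
The arithmetic behind this can be checked with a short sketch. The helper names indexbits and triebytes are hypothetical, and the node count is only the rounded figure of about 14 million given above.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest b such that indices 0..nnodes-1 fit in b bits (2^b >= nnodes). */
static int
indexbits(uint64_t nnodes)
{
	int b = 0;

	while(b < 64 && ((uint64_t)1 << b) < nnodes)
		b++;
	return b;
}

/* Memory taken by the contiguous node array. */
static uint64_t
triebytes(uint64_t nnodes, unsigned nodesize)
{
	assert(nodesize > 0);
	return nnodes * nodesize;
}
```

For 14,000,000 nodes, indexbits gives 24, and triebytes gives 896,000,000 bytes with 64-byte nodes against 672,000,000 bytes with 48-byte nodes.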

For larger storage, there is a design choice: either use 24-bit indices and 48-byte trie nodes in 256 trie arrays, or use 32-bit indices and 64-byte trie nodes in a single array.
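
A sketch of the first option, to show where the 48 bytes come from: 16 child indices of 3 bytes each. Everything here is an assumption for illustration, not code from buildtrie: the Node48 layout, the little-endian packing, and the idea of choosing one of the 256 arrays by the first byte of the score (so that each per-array trie would start at the third nibble).

```c
#include <assert.h>
#include <stdint.h>

#define FANOUT 16

typedef struct Node48 Node48;
struct Node48 {
	uint8_t child[FANOUT][3];   /* 16 packed 24-bit indices = 48 bytes */
};

/* Read the 24-bit child index for nibble value nib. */
static uint32_t
getchild(const Node48 *n, int nib)
{
	const uint8_t *p;

	assert(nib >= 0 && nib < FANOUT);
	p = n->child[nib];
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16;
}

/* Store a 24-bit child index for nibble value nib. */
static void
putchild(Node48 *n, int nib, uint32_t idx)
{
	assert(nib >= 0 && nib < FANOUT);
	assert(idx < (1u << 24));
	n->child[nib][0] = idx & 0xFF;
	n->child[nib][1] = (idx >> 8) & 0xFF;
	n->child[nib][2] = (idx >> 16) & 0xFF;
}

/* Assumption: the first score byte selects one of the 256 arrays. */
static unsigned
whicharray(const uint8_t *score)
{
	return score[0];
}
```

With 256 arrays of up to 2^24 nodes each, the total addressable node count is 2^32, the same as a single array with 32-bit indices; the trade-off is 16 bytes saved per node against the extra packing and unpacking on every step.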

Once I manage to push my data to a plan9port fork on GitHub, you will hear more.
------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T21878aa53884911b-Mb074534433ed9a094542eef4

