From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=0.2 required=5.0 tests=DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED,DKIM_VALID,FORGED_GMAIL_RCVD,FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE, MAILING_LIST_MULTI,RCVD_IN_DNSWL_NONE,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.4 Received: from tb-ob1.topicbox.com (tb-ob1.topicbox.com [64.147.108.173]) by inbox.vuxu.org (Postfix) with ESMTP id 623C32234B for ; Thu, 13 Jun 2024 17:52:49 +0200 (CEST) Received: from tb-mx0.topicbox.com (tb-mx0.nyi.icgroup.com [10.90.30.73]) by tb-ob1.topicbox.com (Postfix) with ESMTP id F2D5024E7D for ; Thu, 13 Jun 2024 11:52:48 -0400 (EDT) (envelope-from bounce.mMf21f7a13249d0b37cde75c53.r522be890-2105-11eb-b15e-8d699134e1fa@9fans.bounce.topicbox.com) Received: by tb-mx0.topicbox.com (Postfix, from userid 1132) id F0CE31BD83DD; Thu, 13 Jun 2024 11:52:48 -0400 (EDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=9fans.net; h=from:to :subject:message-id:in-reply-to:references:date:mime-version :content-type:content-transfer-encoding:list-help:list-id :list-post:list-subscribe:reply-to:list-unsubscribe; s=dkim-1; t=1718293968; x=1718380368; bh=MixHnlhSCcscK+0wKtQbQgydoxB4IBzD 8hH2QhvmATE=; b=rHt2AWZ3hQMCUwop+PlY5Q9/vyV/XDFQDVbDxwdLvNIKiWqj puqxIjh3derCKOMCh0NYyrHsAPiFcocnk4Aros2y23BSy0uxi1s8asg1hxxI1lA3 +pd98EXgfgurcwLCsariMUz9iMl2/KCYxq2qm1yxh3O7NvTKuiafy0NW6Qo= From: wb.kloke@gmail.com To: 9fans <9fans@9fans.net> Subject: Re: [9fans] yet another try to fixup venti Message-Id: <17182939550.C8ecf.211654@composer.9fans.topicbox.com> In-Reply-To: <9C832F275135C5830776F4656A698D43@eigenstate.org> References: <9C832F275135C5830776F4656A698D43@eigenstate.org> Date: Thu, 13 Jun 2024 11:52:35 -0400 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary=17182939551.eaDF.211654 Content-Transfer-Encoding: 7bit Topicbox-Policy-Reasoning: allow: sender is a member Topicbox-Message-UUID: f44a253a-299c-11ef-84c3-c256242d11b0 Archived-At: =?UTF-8?B?PGh0dHBzOi8vOWZhbnMudG9waWNib3guY29tL2dyb3Vwcy85?= =?UTF-8?B?ZmFucy9UMjE4NzhhYTUzODg0OTExYi1NZjIxZjdhMTMyNDlkMGIzN2NkZTc1?= =?UTF-8?B?YzUzPg==?= List-Help: List-Id: "9fans" <9fans.9fans.net> List-Post: List-Software: Topicbox v0 List-Subscribe: Precedence: list Reply-To: 9fans <9fans@9fans.net> List-Unsubscribe: , Topicbox-Delivery-ID: 2:9fans:437d30aa-c441-11e9-8a57-d036212d11b0:522be890-2105-11eb-b15e-8d699134e1fa:Mf21f7a13249d0b37cde75c53:1:BzPcUGH0H2q33VgHQNn-bOkpZcr3jjGiys8KbJCTSJc --17182939551.eaDF.211654 Date: Thu, 13 Jun 2024 11:52:35 -0400 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On Thursday, 13 June 2024, at 6:08 AM, ori wrote: > Sounds fairly interesting, though I'm curious how it compares; my guess was that because of the lack of locality due to using hashes for the score, a trie wouldn't be that different from a hash table. You are right. Lack of locality is a main issue.=C2=A0 But, as Noam Preil has shown in his workshop presentation, it is nearly imp= ossible to work on the current implementation. Example: index.c has 19202 b= ytes, whereas trie.c has 3232, doing roughly the same work. The main drawback of the current venti is the disk based index. A fair comp= arison will have to put this on a ramdisk. The index section of my data set= now is 5GB, but 2GB would be surely sufficient. When I have my implementio= n ready to work (this is a Herculian task, but I intend just to disable ind= ex disk writing without changing the rest), I shall publish the performance= data. One idea to improve locality is avoiding=C2=A0 the storage of the full scor= e in the trie nodes. Only the nodes referring to the first 6 nibbles tend t= o be densely populated. The leaves=C2=A0 need not really to contain the sco= re, as the index addr stored there also points to a place where the score i= s found in the the arena partition. This should not really be detrimental f= or performance as the block is needed anyway. ------------------------------------------ 9fans: 9fans Permalink: https://9fans.topicbox.com/groups/9fans/T21878aa53884911b-Mf21f7= a13249d0b37cde75c53 Delivery options: https://9fans.topicbox.com/groups/9fans/subscription --17182939551.eaDF.211654 Date: Thu, 13 Jun 2024 11:52:35 -0400 MIME-Version: 1.0 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
On Thursday, 13 June 2024, at 6:08 AM, ori wro= te:
Sounds fairly interesting, though = I'm curious how it compares; my guess was that because of the lack of locality due to using hashes for the score, a trie wouldn't be that different from a hash table.
You are right. Lack of locality is = a main issue. 

But, as Noam Prei= l has shown in his workshop presentation, it is nearly impossible to work o= n the current implementation. Example: index.c has 19202 bytes, whereas tri= e.c has 3232, doing roughly the same work.

The main drawback of the current venti is the disk based index. A fair com= parison will have to put this on a ramdisk. The index section of my data se= t now is 5GB, but 2GB would be surely sufficient. When I have my implementi= on ready to work (this is a Herculian task, but I intend just to disable in= dex disk writing without changing the rest), I shall publish the performanc= e data.

One idea to improve locality is av= oiding  the storage of the full score in the trie nodes. Only the node= s referring to the first 6 nibbles tend to be densely populated. The leaves=   need not really to contain the score, as the index addr stored there= also points to a place where the score is found in the the arena partition= . This should not really be detrimental for performance as the block is nee= ded anyway.


<= div id=3D"topicbox-footer" style=3D"margin:10px 0 0;border-top:1px solid #d= dd;border-color:rgba(0,0,0,.15);padding:7px 0;"> 9fans / 9fans / see discussions + participants + delivery&n= bsp;options Permalink = --17182939551.eaDF.211654--