9fans - fans of the OS Plan 9 from Bell Labs
From: hruodr@gmail.com
To: 9fans@9fans.net
Subject: Re: [9fans] thoughs about venti+fossil
Date: Thu, 23 Apr 2015 07:21:38 +0000
Message-ID: <55389d82.M9BOTEwLjblmxdW/%hruodr@gmail.com>


On Tue, 21 Apr 2015, Russ Cox wrote:

> My paper with Sean Rhea and Alex Pesterev documents the performance 
> effect of double-checking the equality in some detail.
> https://www.usenix.org/legacy/event/usenix08/tech/full_papers/rhea/rhea.pdf

Very nice paper, especially from chapter 3 onward.

Reading it, my original question arises again:

****
  Foundation’s CAS layer is modeled on the Venti [34]
  content-addressed storage server, but we have adapted the
  Venti algorithms for use in a single-disk system and also
  optionally eliminated the assumption that SHA-1 is free
  of collisions, producing two operating modes for Foundation:
  compare-by-hash and compare-by-value. (Page 4)
****


And the question is answered in the paper:

****
While we originally investigated this mode [Compare-by-value] 
due to (in our opinion, unfounded) concerns about cryptographic
hash collisions (see [5, 16] for a lively debate), we
were surprised to find that its overall write performance
was close to that of compare-by-hash mode, despite the
added comparisons. Moreover, compare-by-value is always
faster for reads, as naming blocks by their log offsets
completely eliminates index lookups during reads. (page 7)
****


> (Caveat: in the usual academic tradition, the paper uses "Venti" to mean
> the system described in the original paper, not the system in Plan 9 today.
> The current Plan 9 implementation is much closer to what the paper calls 
> "Foundation: Compare by Hash".)

New question: if I compile it with "int verifywrites = 1", is it
closer to "Compare-by-Value"?

I mean, with the offset used as the handle.
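
For illustration only, the two modes quoted above can be sketched in a few
lines of Python. This is not Venti or Foundation code, and it says nothing
about what "verifywrites" actually does: the class names, the in-memory list
standing in for the on-disk log, and the deliberately weak hash are all
assumptions made for the example. In the first mode the handle is the hash
and every read goes through the index; in the second the handle is the log
offset and the hash only narrows down which stored blocks to compare.

```python
import hashlib


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def weak_hash(data: bytes) -> bytes:
    # Deliberately bad hash (first byte only) so collisions are easy to provoke.
    return data[:1]


class CompareByHash:
    """Handle = hash. Blocks with equal hashes are assumed to be identical."""

    def __init__(self, hashfn=sha1):
        self.hashfn = hashfn
        self.log = []      # append-only list of blocks (stand-in for the disk log)
        self.index = {}    # hash -> offset in the log

    def write(self, data: bytes) -> bytes:
        score = self.hashfn(data)
        if score not in self.index:          # unseen hash: store the block
            self.log.append(data)
            self.index[score] = len(self.log) - 1
        return score                         # seen hash: trust it, store nothing

    def read(self, score: bytes) -> bytes:
        return self.log[self.index[score]]   # every read goes through the index


class CompareByValue:
    """Handle = log offset. The hash is only a hint for finding duplicates."""

    def __init__(self, hashfn=sha1):
        self.hashfn = hashfn
        self.log = []
        self.index = {}    # hash -> list of candidate offsets

    def write(self, data: bytes) -> int:
        candidates = self.index.setdefault(self.hashfn(data), [])
        for offset in candidates:
            if self.log[offset] == data:     # full comparison, not just the hash
                return offset
        self.log.append(data)
        candidates.append(len(self.log) - 1)
        return len(self.log) - 1

    def read(self, offset: int) -> bytes:
        return self.log[offset]              # no index lookup needed
```

With weak_hash, writing b"apple" and then b"avocado" to CompareByHash yields
the same handle, and reading it back returns b"apple". CompareByValue hands
out two different offsets and returns each block intact.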


> Hope this helps.

I wanted to know whether the option of compiling with the full check was
added in consideration of people who have concerns about the (in)correctness
of compare-by-hash. One can disagree about the risk of using compare-by-hash,
but one cannot disagree about the fact that one disagrees. :)

I think everyone should decide for themselves whether and where to use
"compare-by-hash". In some applications I would even take much more
risk than compare-by-hash. And I find the experiments with hash functions,
including compare-by-hash, interesting.

I appreciate that the option of "compare-by-value" is there. Documentation
of where "compare-by-hash" is used is important so that people
can decide for themselves.

It would also be interesting to be able to change the hash function
easily. As you note in the paper, this is important in "compare-by-value"
for increasing performance.

The problem I have with "compare-by-hash" is not only the probability of
hash collisions, but that it seems to rely on empirical knowledge about
the hash function used. People used to analytical arguments may find
empirical arguments and empirical programming gruesome. If the empirical
knowledge changes, if one discovers that the hash function used does not
distribute its domain homogeneously enough over its range, then one will
want (especially in the case of compare-by-hash) to replace the hash
function with a better one. Trial and error is the empirical method
of solving (and making) problems.

Rodrigo.



