9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] compare-by-hash
@ 2003-07-31 17:24 Joel Salomon
  2003-07-31 17:50 ` andrey mirtchovski
                   ` (5 more replies)
  0 siblings, 6 replies; 44+ messages in thread
From: Joel Salomon @ 2003-07-31 17:24 UTC (permalink / raw)
  To: 9fans

This is a bit late, but...
A 160-bit hash (assuming "strong" hashing) has a 50% probability of
collisions after 2^80 entries. Google "birthday paradox" for the math.

--Joel
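
A rough check of the figure above, offered as a sketch and not as part of the original post: for n entries hashed into N = 2^bits values, the first-order birthday estimate is p ~ n^2/(2N). The helper names (twoto, collisionprob) are made up for illustration, and entry counts are restricted to powers of two so that no math library is needed.

```c
#include <assert.h>

/* 2^e by repeated multiplication, so libm is not required. */
static double
twoto(int e)
{
	double x = 1.0;

	for(; e > 0; e--)
		x *= 2.0;
	for(; e < 0; e++)
		x *= 0.5;
	return x;
}

/*
 * First-order birthday estimate p ~ n^2/(2N) for n = 2^log2n entries
 * and N = 2^bits hash values.  This is an upper bound on the true
 * collision probability and is only tight while p is small, so the
 * result is clamped to 1.
 */
double
collisionprob(int log2n, int bits)
{
	double p;

	assert(log2n >= 0 && bits > 0);
	p = twoto(2*log2n - bits - 1);
	return p > 1.0 ? 1.0 : p;
}
```

With log2n = 80 and bits = 160 the estimate is 0.5, which is presumably where the 50% figure comes from. The closer approximation 1 - e^(-n^2/(2N)) gives about 39% at that point, so the true 50% mark sits slightly above 2^80 entries.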



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [9fans] compare-by-hash
  2003-07-31 17:24 [9fans] compare-by-hash Joel Salomon
@ 2003-07-31 17:50 ` andrey mirtchovski
  2003-07-31 17:51 ` Sape Mullender
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 44+ messages in thread
From: andrey mirtchovski @ 2003-07-31 17:50 UTC (permalink / raw)
  To: 9fans

Solution: increase the hash size to 320 bits to achieve approx 1 collision after
2^160 entries :P

Theorem:
                                       ┌───┐
Applying a random mapping β: ʃ ⇒ ʃ to  ⎷|ʃ|  values is expected to produce about 
1 collision. 

andrey

ps: you may need the 10646 fonts to see the unicode typesetting above in
its full glory ;)

On Thu, 31 Jul 2003, Joel Salomon wrote:

> This is a bit late, but...
> A 160-bit hash (assuming "strong" hashing) has a 50% probability of
> collisions after 2^80 entries. Google "birthday paradox" for the math.
> 
> --Joel
> 





* Re: [9fans] compare-by-hash
  2003-07-31 17:24 [9fans] compare-by-hash Joel Salomon
  2003-07-31 17:50 ` andrey mirtchovski
@ 2003-07-31 17:51 ` Sape Mullender
  2003-07-31 17:53 ` rog
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 44+ messages in thread
From: Sape Mullender @ 2003-07-31 17:51 UTC (permalink / raw)
  To: 9fans

> A 160-bit hash (assuming "strong" hashing) has a 50% probability of
> collisions after 2^80 entries. Google "birthday paradox" for the math.

While you make the 2^80 entries, we'll keep ourselves busy worrying about something else.

	Sape




* Re: [9fans] compare-by-hash
  2003-07-31 17:24 [9fans] compare-by-hash Joel Salomon
  2003-07-31 17:50 ` andrey mirtchovski
  2003-07-31 17:51 ` Sape Mullender
@ 2003-07-31 17:53 ` rog
  2003-07-31 18:03 ` jmk
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 44+ messages in thread
From: rog @ 2003-07-31 17:53 UTC (permalink / raw)
  To: 9fans

> A 160-bit hash (assuming "strong" hashing) has a 50% probability of
> collisions after 2^80 entries. Google "birthday paradox" for the math.

2^80 is fairly high... after how many entries does the probability
reach, say, 0.1%?




* Re: [9fans] compare-by-hash
  2003-07-31 17:24 [9fans] compare-by-hash Joel Salomon
                   ` (2 preceding siblings ...)
  2003-07-31 17:53 ` rog
@ 2003-07-31 18:03 ` jmk
  2003-08-01  2:45 ` Joel Salomon
  2003-08-01  9:02 ` Douglas A. Gwyn
  5 siblings, 0 replies; 44+ messages in thread
From: jmk @ 2003-07-31 18:03 UTC (permalink / raw)
  To: 9fans

On Thu Jul 31 13:20:30 EDT 2003, salomo3@cooper.edu wrote:
> This is a bit late, but...
> A 160-bit hash (assuming "strong" hashing) has a 50% probability of
> collisions after 2^80 entries. Google "birthday paradox" for the math.
>
> --Joel

I assume you think Venti does compare-by-hash, but it doesn't.



* [9fans] compare-by-hash
  2003-07-31 17:24 [9fans] compare-by-hash Joel Salomon
                   ` (3 preceding siblings ...)
  2003-07-31 18:03 ` jmk
@ 2003-08-01  2:45 ` Joel Salomon
  2003-08-01  2:52   ` Geoff Collyer
  2003-08-01  9:02 ` Douglas A. Gwyn
  5 siblings, 1 reply; 44+ messages in thread
From: Joel Salomon @ 2003-08-01  2:45 UTC (permalink / raw)
  To: 9fans

> I assume you think Venti does compare-by-hash, but it doesn't.

Actually, I was responding to the unanswered question of a month ago -
how concerned do we need to be about hash collisions? Answer: not very.
Even at the petabyte and exabyte levels, these would be vanishingly unlikely.

(BTW, are there 'standard' prefixes for higher ranges than exa- ? *Very*
important to be able to say when SHA-1 will start having problems :-)

--Joel




* Re: [9fans] compare-by-hash
  2003-08-01  2:45 ` Joel Salomon
@ 2003-08-01  2:52   ` Geoff Collyer
  2003-08-01  3:42     ` jmk
                       ` (2 more replies)
  0 siblings, 3 replies; 44+ messages in thread
From: Geoff Collyer @ 2003-08-01  2:52 UTC (permalink / raw)
  To: 9fans

Some new prefixes were blessed in 1991.  Yotta (abbreviated `Y') is
10²⁴, Zetta (`Z') is 10²¹.  In the other direction, beyond
pico, femto and atto, zepto (`z') is 10⁻²¹, yocto (`y') is
10⁻²⁴.  http://www.unc.edu/~rowlett/units/prefixes.html has the
whole story.




* Re: [9fans] compare-by-hash
  2003-08-01  2:52   ` Geoff Collyer
@ 2003-08-01  3:42     ` jmk
  2003-08-01  4:12       ` Geoff Collyer
  2003-08-01 15:23       ` Jack Johnson
  2003-08-01  9:03     ` Anthony Mandic
  2003-08-01 15:11     ` Jack Johnson
  2 siblings, 2 replies; 44+ messages in thread
From: jmk @ 2003-08-01  3:42 UTC (permalink / raw)
  To: 9fans

On Thu Jul 31 22:53:21 EDT 2003, geoff@collyer.net wrote:
> Some new prefixes were blessed in 1991.  Yotta (abbreviated `Y') is
> 10²⁴, Zetta (`Z') is 10²¹.  In the other direction, beyond
> pico, femto and atto, zepto (`z') is 10⁻²¹, yocto (`y') is
> 10⁻²⁴.  http://www.unc.edu/~rowlett/units/prefixes.html has the
> whole story.

So, that would make 2^80 a yobibyte (1YiB) in the International
Electrotechnical Commission's scheme for prefixes of powers of 2?



* Re: [9fans] compare-by-hash
  2003-08-01  3:42     ` jmk
@ 2003-08-01  4:12       ` Geoff Collyer
  2003-08-01 15:23       ` Jack Johnson
  1 sibling, 0 replies; 44+ messages in thread
From: Geoff Collyer @ 2003-08-01  4:12 UTC (permalink / raw)
  To: 9fans

> So, that would make 2^80 a yobibyte (1YiB) in the International
> Electrotechnical Commission's scheme for prefixes of powers of 2?

I believe so, if anyone used that scheme.




* Re: [9fans] compare-by-hash
  2003-07-31 17:24 [9fans] compare-by-hash Joel Salomon
                   ` (4 preceding siblings ...)
  2003-08-01  2:45 ` Joel Salomon
@ 2003-08-01  9:02 ` Douglas A. Gwyn
  5 siblings, 0 replies; 44+ messages in thread
From: Douglas A. Gwyn @ 2003-08-01  9:02 UTC (permalink / raw)
  To: 9fans

Joel Salomon wrote:
> A 160-bit hash (assuming "strong" hashing) has a 50% probability of
> collisions after 2^80 entries. Google "birthday paradox" for the math.

Of course storing that many messages with their hash indexes
is somewhat of a problem for the foreseeable future.



* Re: [9fans] compare-by-hash
  2003-08-01  2:52   ` Geoff Collyer
  2003-08-01  3:42     ` jmk
@ 2003-08-01  9:03     ` Anthony Mandic
  2003-08-01 15:11     ` Jack Johnson
  2 siblings, 0 replies; 44+ messages in thread
From: Anthony Mandic @ 2003-08-01  9:03 UTC (permalink / raw)
  To: 9fans

Geoff Collyer wrote:
> 
> In the other direction, beyond pico, femto and atto,
> zepto (`z') is 10⁻²¹, yocto (`y') is 10⁻²⁴.

	These sound like the obscure Marx brothers I haven't yet heard about.

-am	© 2003



* Re: [9fans] compare-by-hash
  2003-08-01  2:52   ` Geoff Collyer
  2003-08-01  3:42     ` jmk
  2003-08-01  9:03     ` Anthony Mandic
@ 2003-08-01 15:11     ` Jack Johnson
  2 siblings, 0 replies; 44+ messages in thread
From: Jack Johnson @ 2003-08-01 15:11 UTC (permalink / raw)
  To: 9fans

Geoff Collyer wrote:

> Some new prefixes were blessed in 1991.  Yotta (abbreviated `Y') is
> 10²⁴, Zetta (`Z') is 10²¹.  In the other direction, beyond
> pico, femto and atto, zepto (`z') is 10⁻²¹, yocto (`y') is
> 10⁻²⁴.  http://www.unc.edu/~rowlett/units/prefixes.html has the
> whole story.

I notice they left out harpo and chico.

-J




* Re: [9fans] compare-by-hash
  2003-08-01  3:42     ` jmk
  2003-08-01  4:12       ` Geoff Collyer
@ 2003-08-01 15:23       ` Jack Johnson
  1 sibling, 0 replies; 44+ messages in thread
From: Jack Johnson @ 2003-08-01 15:23 UTC (permalink / raw)
  To: 9fans

jmk@plan9.bell-labs.com wrote:
> So, that would make 2^80 a yobibyte (1YiB) in the International
> Electrotechnical Commission's scheme for prefixes of powers of 2?

Interesting followup:

	"there is a difference of more than two hundred
	 sextillion (208 925 819 614 629 174 706 176) bytes
	 between 1 yobibyte (YiB) and 1 yottabyte (YB)"

	- http://jack.p5.org.uk/byte-me.en.html

-Jack
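
The quoted difference can be checked by hand or with a few lines of code. The sketch below is only an illustration (the function name yobiminusyotta and the digit-array approach are my own choices, not from the cited page): it builds 2^80 in decimal by repeated doubling, since the value does not fit in a 64-bit integer, and then subtracts 10^24.

```c
#include <assert.h>
#include <string.h>

enum { NDIGIT = 32 };	/* plenty: 2^80 has 25 decimal digits */

/*
 * Store 2^80 - 10^24 in buf as a decimal string.
 * buf must hold at least NDIGIT+1 bytes.
 */
void
yobiminusyotta(char *buf)
{
	int d[NDIGIT];	/* decimal digits, least significant first */
	int i, j, v, carry;

	memset(d, 0, sizeof d);
	d[0] = 1;

	/* double 80 times to reach 2^80 */
	for(i = 0; i < 80; i++){
		carry = 0;
		for(j = 0; j < NDIGIT; j++){
			v = 2*d[j] + carry;
			d[j] = v % 10;
			carry = v / 10;
		}
		assert(carry == 0);
	}

	/* subtract 10^24: take 1 from digit 24, borrowing upward if needed */
	for(j = 24; d[j] == 0; j++){
		assert(j < NDIGIT-1);
		d[j] = 9;
	}
	d[j]--;

	/* emit the most significant digit first, skipping leading zeros */
	for(j = NDIGIT-1; j > 0 && d[j] == 0; j--)
		;
	for(i = 0; j >= 0; j--)
		buf[i++] = '0' + d[j];
	buf[i] = '\0';
}
```

The resulting string is 208925819614629174706176, matching the figure in the quotation.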




* Re: [9fans] compare-by-hash
@ 2003-08-03  4:39 Andrew Simmons
  0 siblings, 0 replies; 44+ messages in thread
From: Andrew Simmons @ 2003-08-03  4:39 UTC (permalink / raw)
  To: 9fans

> 2^80 is fairly high... after how many entries does the probability
> reach, say, 0.1%?

To a reasonable approximation, the number of entries at which the
probability is reached is 2^80 times the square root of twice the
probability, which in the case of 0.1% is between 2^75 and 2^76.
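
The rule of thumb above follows from the first-order birthday estimate p ~ n^2/(2N): with N = 2^160, solving for n gives n ~ sqrt(2pN) = 2^80 * sqrt(2p). As a hedged sketch (helper names are hypothetical, and only power-of-two entry counts are tried so no math library is needed), the code below finds the smallest power of two at which the estimate reaches a target probability.

```c
#include <assert.h>

/* 2^e by repeated multiplication, so libm is not required. */
static double
twoto(int e)
{
	double x = 1.0;

	for(; e > 0; e--)
		x *= 2.0;
	for(; e < 0; e++)
		x *= 0.5;
	return x;
}

/*
 * Smallest k such that the first-order estimate n^2/(2N), with
 * n = 2^k entries and N = 2^bits hash values, is at least target.
 * The loop ends by k = (bits+1)/2 at the latest, where the
 * estimate reaches 1.
 */
int
minlog2entries(double target, int bits)
{
	int k;

	assert(target > 0.0 && target <= 1.0 && bits > 0);
	for(k = 0; twoto(2*k - bits - 1) < target; k++)
		;
	return k;
}
```

For a target of 0.1% and a 160-bit hash this returns 76: the estimate is still below 0.1% at 2^75 entries and above it at 2^76, consistent with the range given above (the formula itself puts the crossover near 2^75.5).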




* Re: [9fans] compare-by-hash
  2003-06-02 11:37     ` Sam
@ 2003-06-02 12:41       ` boyd, rounin
  0 siblings, 0 replies; 44+ messages in thread
From: boyd, rounin @ 2003-06-02 12:41 UTC (permalink / raw)
  To: 9fans

say it once 'cryptographic hash'.




* Re: [9fans] compare-by-hash
  2003-06-01  1:15   ` northern snowfall
                       ` (2 preceding siblings ...)
  2003-06-01  0:28     ` William Josephson
@ 2003-06-02 11:37     ` Sam
  2003-06-02 12:41       ` boyd, rounin
  3 siblings, 1 reply; 44+ messages in thread
From: Sam @ 2003-06-02 11:37 UTC (permalink / raw)
  To: 9fans

> Still, the knowledge that collisions have a probability
> of occurring is slightly unsettling. What are the statistics
> regarding probability of collisions in the venti, anyways?

I do believe that Rob's answer to this question at FAST
last year was,

	``The probability that all the atoms on one side
	of the universe will up and move to the other.''

Now, whether that's scientifically accurate or simply good
for dramatic effect is another issue. :)

I still laugh when I think about the seat shifting that
occurred when he said (approx.), ``look, you just have to get used
to not deleting anything.  It's not a problem.''

Sam





* Re: [9fans] compare-by-hash
  2003-06-01 16:24                   ` David Presotto
@ 2003-06-02  4:14                     ` lucio
  0 siblings, 0 replies; 44+ messages in thread
From: lucio @ 2003-06-02  4:14 UTC (permalink / raw)
  To: 9fans

> If you have something else, it ignores it.  This was a
> failure of vision on my part, i.e., I used the same
> routine that parses the message headers and it does that
> filtering.  Should I be more liberal for /mail/box/$user/headers?

Yes.  Although you've included X- which covers a lot of sins.  I think
"Organization:" ought to be in there, at the very least, but the user
who sets this up oughtn't to need nannying (well, it's a user,
unfortunately).

++L




* Re: [9fans] compare-by-hash
  2003-06-01 15:36                 ` David Presotto
  2003-06-01 16:24                   ` David Presotto
@ 2003-06-02  4:03                   ` lucio
  1 sibling, 0 replies; 44+ messages in thread
From: lucio @ 2003-06-02  4:03 UTC (permalink / raw)
  To: 9fans

> Looks like a bug to me.  I'll look at it.

The very message you replied to had the additional headers in it.  I'm
sure it was a mistake on my part.

++L




* Re: [9fans] compare-by-hash
  2003-06-01  8:57           ` Charles Forsyth
  2003-06-01  9:06             ` lucio
@ 2003-06-01 22:55             ` Geoff Collyer
  1 sibling, 0 replies; 44+ messages in thread
From: Geoff Collyer @ 2003-06-01 22:55 UTC (permalink / raw)
  To: 9fans

I got jackpots from V6 diff often enough that I can still remember
getting them.  I can't recall any since then, and just looked at the
V7 diff sources to see why:

/*
	if(jackpot)
		mesg("jackpot",empty);
*/

Here's the explanation of what a jackpot is, from
/sys/src/cmd/diff/diffreg.c:53,58 even though Plan 9 diff no longer
contains any other references to jackpots:

*	With J in hand, the matches there recorded are
*	check'ed against reality to assure that no spurious
*	matches have crept in due to hashing. If they have,
*	they are broken, and "jackpot " is recorded--a harmless
*	matter except that a true match for a spuriously
*	mated line may now be unnecessarily reported as a change.
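
The comment quoted above describes a two-stage comparison: lines are first matched by hash, and the matches are then checked against the real text, with a hash-only match counted as a jackpot. A minimal sketch of that idea follows. It is not diff's code; the byte-sum hash is a deliberately weak stand-in chosen so that jackpots are easy to provoke, and the names are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum { Differ, Match, Jackpot };

/* Toy line hash (byte sum); weak on purpose so collisions are easy to show. */
static unsigned
linehash(const char *s)
{
	unsigned sum = 0;

	while(*s != '\0')
		sum += (unsigned char)*s++;
	return sum;
}

/*
 * Compare two lines the way a hash-based matcher might:
 * cheap hash test first, then a check against reality.
 */
int
compareline(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);
	if(linehash(a) != linehash(b))
		return Differ;
	if(strcmp(a, b) != 0)
		return Jackpot;	/* hashes agree but the text does not */
	return Match;
}
```

With this toy hash, "ab" and "ba" produce a jackpot, while "ab" and "ac" are rejected by the hash test alone.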




* Re: [9fans] compare-by-hash
  2003-06-01 15:36                 ` David Presotto
@ 2003-06-01 16:24                   ` David Presotto
  2003-06-02  4:14                     ` lucio
  2003-06-02  4:03                   ` lucio
  1 sibling, 1 reply; 44+ messages in thread
From: David Presotto @ 2003-06-01 16:24 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 583 bytes --]

it only allows headers starting with the following:

[Hfrom]		"from:",
[Hto]		"to:",
[Hcc]		"cc:",
[Hbcc]		"bcc:",
[Hreplyto]	"reply-to:",
[Hinreplyto]	"in-reply-to:",
[Hsender]	"sender:",
[Hdate]		"date:",
[Hsubject]	"subject:",
[Hpriority]	"priority:",
[Hmsgid]	"message-id:",
[Hmime]		"mime-",
[Hcontent]	"content-",
[Hx]		"x-",

If you have something else, it ignores it.  This was a
failure of vision on my part, i.e., I used the same
routine that parses the message headers and it does that
filtering.  Should I be more liberal for /mail/box/$user/headers?
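
For readers who want to see the effect of such a filter, here is a small sketch of prefix-based header matching using the list above. It is not the actual mail code: the function names are hypothetical, and case-insensitive matching is an assumption on my part.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Prefixes taken from the list above, all lower case. */
static const char *allowed[] = {
	"from:", "to:", "cc:", "bcc:", "reply-to:", "in-reply-to:",
	"sender:", "date:", "subject:", "priority:", "message-id:",
	"mime-", "content-", "x-",
};

/* Return 1 if line starts with the lower-case prefix, ignoring ASCII case. */
static int
hasprefix(const char *line, const char *prefix)
{
	for(; *prefix != '\0'; line++, prefix++)
		if(tolower((unsigned char)*line) != *prefix)
			return 0;
	return 1;
}

/* Return 1 if the header line would pass the filter, 0 if it is ignored. */
int
headerallowed(const char *line)
{
	size_t i;

	assert(line != NULL);
	for(i = 0; i < sizeof allowed / sizeof allowed[0]; i++)
		if(hasprefix(line, allowed[i]))
			return 1;
	return 0;
}
```

Under these assumptions "Subject: hi" and "X-Mailer: acme" pass, while a line such as "Organization: foo" is dropped because no listed prefix matches it.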

[-- Attachment #2: Type: message/rfc822, Size: 3421 bytes --]

[-- Attachment #2.1.1: Type: text/plain, Size: 41 bytes --]

Looks like a bug to me.  I'll look at it.

[-- Attachment #2.1.2: Type: message/rfc822, Size: 1787 bytes --]

From: lucio@proxima.alt.za
To: 9fans@cse.psu.edu
Subject: Re: [9fans] compare-by-hash
Date: Sun, 1 Jun 2003 11:25:35 +0200
Message-ID: <2c612cc517a9935d33cae128875323ff@proxima.alt.za>

> PS: What's /acme/mail/Mail's secret location of user headers?  I can't
> seem to find a reference.

I thought it was /mail/box/$user/headers but I could not get it to
work.  I see that marshal(1) does specify it.  I must have had the
wrong permissions.

Sorry about the noise.

++L


* Re: [9fans] compare-by-hash
  2003-06-01  9:25               ` lucio
@ 2003-06-01 15:36                 ` David Presotto
  2003-06-01 16:24                   ` David Presotto
  2003-06-02  4:03                   ` lucio
  0 siblings, 2 replies; 44+ messages in thread
From: David Presotto @ 2003-06-01 15:36 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 41 bytes --]

Looks like a bug to me.  I'll look at it.

[-- Attachment #2: Type: message/rfc822, Size: 1787 bytes --]

From: lucio@proxima.alt.za
To: 9fans@cse.psu.edu
Subject: Re: [9fans] compare-by-hash
Date: Sun, 1 Jun 2003 11:25:35 +0200
Message-ID: <2c612cc517a9935d33cae128875323ff@proxima.alt.za>

> PS: What's /acme/mail/Mail's secret location of user headers?  I can't
> seem to find a reference.

I thought it was /mail/box/$user/headers but I could not get it to
work.  I see that marshal(1) does specify it.  I must have had the
wrong permissions.

Sorry about the noise.

++L


* Re: [9fans] compare-by-hash
  2003-06-01  4:07               ` northern snowfall
@ 2003-06-01 15:11                 ` boyd, rounin
  0 siblings, 0 replies; 44+ messages in thread
From: boyd, rounin @ 2003-06-01 15:11 UTC (permalink / raw)
  To: 9fans

> Get a grip

yup, cool Aerosmith CD.




* Re: [9fans] compare-by-hash
  2003-06-01  9:06             ` lucio
@ 2003-06-01  9:25               ` lucio
  2003-06-01 15:36                 ` David Presotto
  0 siblings, 1 reply; 44+ messages in thread
From: lucio @ 2003-06-01  9:25 UTC (permalink / raw)
  To: 9fans

> PS: What's /acme/mail/Mail's secret location of user headers?  I can't
> seem to find a reference.

I thought it was /mail/box/$user/headers but I could not get it to
work.  I see that marshal(1) does specify it.  I must have had the
wrong permissions.

Sorry about the noise.

++L




* Re: [9fans] compare-by-hash
  2003-06-01  8:57           ` Charles Forsyth
@ 2003-06-01  9:06             ` lucio
  2003-06-01  9:25               ` lucio
  2003-06-01 22:55             ` Geoff Collyer
  1 sibling, 1 reply; 44+ messages in thread
From: lucio @ 2003-06-01  9:06 UTC (permalink / raw)
  To: 9fans

> what's the prize for the first person who finds a clash?
> fame, of course, but i was hoping for a more concrete `and fortune'.
>
It strikes me that reserving a byte or two in a Venti block would
allow for as many as 256 or 65536 collisions before having to panic.
Is it too high a price to pay, or am I missing something fundamental?

++L

PS: What's /acme/mail/Mail's secret location of user headers?  I can't
seem to find a reference.




* Re: [9fans] compare-by-hash
  2003-06-01  5:52         ` Dan Cross
@ 2003-06-01  8:57           ` Charles Forsyth
  2003-06-01  9:06             ` lucio
  2003-06-01 22:55             ` Geoff Collyer
  0 siblings, 2 replies; 44+ messages in thread
From: Charles Forsyth @ 2003-06-01  8:57 UTC (permalink / raw)
  To: 9fans

what's the prize for the first person who finds a clash?
fame, of course, but i was hoping for a more concrete `and fortune'.

by the way, how many of you ever had `Jackpot'?




* Re: [9fans] compare-by-hash
  2003-05-31 23:49       ` Skip Tavakkolian
@ 2003-06-01  5:52         ` Dan Cross
  2003-06-01  8:57           ` Charles Forsyth
  0 siblings, 1 reply; 44+ messages in thread
From: Dan Cross @ 2003-06-01  5:52 UTC (permalink / raw)
  To: 9fans

> A finite improbability then?

That's a good way to put it.  The more I think about the numbers
(admittedly, very little; I'm thinking more about the wrist I sprained
yesterday), I really do believe the disk would levitate before the
average venti would encounter a score collision (statistically
speaking, of course).

	- Dan C.




* Re: [9fans] compare-by-hash
  2003-06-01  2:57             ` W. Josephson
@ 2003-06-01  4:07               ` northern snowfall
  2003-06-01 15:11                 ` boyd, rounin
  0 siblings, 1 reply; 44+ messages in thread
From: northern snowfall @ 2003-06-01  4:07 UTC (permalink / raw)
  To: 9fans

>
>
>Get a grip: I'm not condemning anyone.
>
lol

>




* Re: [9fans] compare-by-hash
  2003-06-01  1:38         ` William K. Josephson
  2003-06-01  1:57           ` Scott Schwartz
@ 2003-06-01  2:58           ` northern snowfall
  2003-06-01  2:57             ` W. Josephson
  1 sibling, 1 reply; 44+ messages in thread
From: northern snowfall @ 2003-06-01  2:58 UTC (permalink / raw)
  To: 9fans

>
>
>It isn't a rant.
>
It's a rant when you assume things that are not proven.

>I'm just amused that people in general
>are often so worried about hash collisions but willing to
>tolerate common software systems which it is painfully
>obvious are far less reliable.
>
Maybe you should be less amused and more realistic, understanding
that most people are secure in what techniques are proven over
time. File systems bound by hashing aren't well known or used by
the majority of people. I think people are just being safe
with their data. When you put people in a situation where their
critical data is placed in an unproved containment facility,
they tend to have questions and doubts. That's nothing more
than survival.

Imagine if the CDC was introduced to a new technique for solidifying
critical germs in vacuum facilities. Don't you think they would
have a lot of questions, doubts and theories as to why the introduced
technique may not work?

>For instance, where I have
>been working recently, people will trust important data to
>file systems that are easily crashed and corrupted, but they
>worry more about sha1 hash collisions.  I find fsck and
>failed disks much scarier since they can and do burn me
>with some regularity :-)
>
Instead of condemning them for their 'worries', maybe you should
sit down with them and discuss in detail the pros and cons of
a hash based system. Allowing for open commentary and assertion
of facts will help you win over most people. They're just looking
for a sense of security and reliability. There isn't anything
wrong with that.

Work with people, not against them :)

Don

>





* Re: [9fans] compare-by-hash
  2003-06-01  2:58           ` northern snowfall
@ 2003-06-01  2:57             ` W. Josephson
  2003-06-01  4:07               ` northern snowfall
  0 siblings, 1 reply; 44+ messages in thread
From: W. Josephson @ 2003-06-01  2:57 UTC (permalink / raw)
  To: 9fans

On Sat, May 31, 2003 at 09:58:12PM -0500, northern snowfall wrote:
> >It isn't a rant.
>
> It's a rant when you assume things that are not proven.

Stop wearing your heart on your sleeve.

> Instead of condemning them for their 'worries', maybe you should
> sit down with them and discuss in detail the pros and cons of

Get a grip: I'm not condemning anyone.




* Re: [9fans] compare-by-hash
  2003-06-01  1:38         ` William K. Josephson
@ 2003-06-01  1:57           ` Scott Schwartz
  2003-06-01  2:58           ` northern snowfall
  1 sibling, 0 replies; 44+ messages in thread
From: Scott Schwartz @ 2003-06-01  1:57 UTC (permalink / raw)
  To: 9fans

| I'm just amused that people in general
| are often so worried about hash collisions but willing to
| tolerate common software systems which it is painfully
| obvious are far less reliable.

Agreed.  Andrew Hume has written some nice usenix papers on the topic.

In my experience, it's not at all uncommon for a big cluster to flip
some bits when you aren't looking.




* Re: [9fans] compare-by-hash
  2003-06-01  0:28     ` William Josephson
@ 2003-06-01  1:46       ` northern snowfall
  2003-06-01  1:38         ` William K. Josephson
  0 siblings, 1 reply; 44+ messages in thread
From: northern snowfall @ 2003-06-01  1:46 UTC (permalink / raw)
  To: 9fans

>
>
>I'm not sure I see why: are you equally paranoid that the
>bits in core will be flipped by a passing alpha particle?
>
I find that to be an odd comparison. Being aware of what variables
are present in your environment and how they can possibly affect
your work is imperative. I find it hard to see that as paranoia.
I simply did not know the facts or the statistics. That's why I
ask questions ;).

>I have to confess I don't see why people are so afraid of
>randomization.
>
Who said I was afraid of randomization? I just want to know
the facts so that I am aware of the possibilities I must face
when trusting a given environment.

>For something like venti it is worth working
>out the numbers and probably worth detecting collisions, but
>the chances of silently losing/corrupting data due to disk
>firmware or driver bugs, for instance, seems much worse.
>
That may be true, but, how can I know that without any facts?
If you admit that venti is worth working out the numbers, why
even make this rant?

Don

>




* Re: [9fans] compare-by-hash
  2003-06-01  1:46       ` northern snowfall
@ 2003-06-01  1:38         ` William K. Josephson
  2003-06-01  1:57           ` Scott Schwartz
  2003-06-01  2:58           ` northern snowfall
  0 siblings, 2 replies; 44+ messages in thread
From: William K. Josephson @ 2003-06-01  1:38 UTC (permalink / raw)
  To: 9fans

On Sat, May 31, 2003 at 08:46:44PM -0500, northern snowfall wrote:
> >For something like venti it is worth working
> >out the numbers and probably worth detecting collisions, but
> >the chances of silently losing/corrupting data due to disk
> >firmware or driver bugs, for instance, seems much worse.
>
> That may be true, but, how can I know that without any facts?
> If you admit that venti is worth working out the numbers, why
> even make this rant?

It isn't a rant.  I'm just amused that people in general
are often so worried about hash collisions but willing to
tolerate common software systems which it is painfully
obvious are far less reliable.  For instance, where I have
been working recently, people will trust important data to
filesystems that are easily crashed and corrupted, but they
worry more about sha1 hash collisions.  I find fsck and
failed disks much scarier since they can and do burn me
with some regularity :-)




* Re: [9fans] compare-by-hash
  2003-06-01  0:23     ` Russ Cox
@ 2003-06-01  1:23       ` northern snowfall
  0 siblings, 0 replies; 44+ messages in thread
From: northern snowfall @ 2003-06-01  1:23 UTC (permalink / raw)
  To: 9fans

>
>
>Read the Venti paper and the link that was just posted.
>
Done

>




* Re: [9fans] compare-by-hash
  2003-06-01  0:00 ` Russ Cox
@ 2003-06-01  1:15   ` northern snowfall
  2003-06-01  0:23     ` Russ Cox
                       ` (3 more replies)
  0 siblings, 4 replies; 44+ messages in thread
From: northern snowfall @ 2003-06-01  1:15 UTC (permalink / raw)
  To: 9fans

>
>
>And then you're out of luck, but at least you're
>not blissfully using the wrong data.
>
Still, the knowledge that collisions have a probability
of occurring is slightly unsettling. What are the statistics
regarding probability of collisions in the venti, anyways?

>




* Re: [9fans] compare-by-hash
  2003-06-01  1:15   ` northern snowfall
  2003-06-01  0:23     ` Russ Cox
  2003-06-01  0:24     ` Dan Cross
@ 2003-06-01  0:28     ` William Josephson
  2003-06-01  1:46       ` northern snowfall
  2003-06-02 11:37     ` Sam
  3 siblings, 1 reply; 44+ messages in thread
From: William Josephson @ 2003-06-01  0:28 UTC (permalink / raw)
  To: 9fans

On Sat, May 31, 2003 at 08:15:20PM -0500, northern snowfall wrote:
> >And then you're out of luck, but at least you're
> >not blissfully using the wrong data.
>
> Still, the knowledge that collisions have a probability
> of occurring is slightly unsettling. What are the statistics

I'm not sure I see why: are you equally paranoid that the
bits in core will be flipped by a passing alpha particle?
I have to confess I don't see why people are so afraid of
randomization.  For something like venti it is worth working
out the numbers and probably worth detecting collisions, but
the chances of silently losing/corrupting data due to disk
firmware or driver bugs, for instance, seems much worse.



* Re: [9fans] compare-by-hash
  2003-06-01  1:15   ` northern snowfall
  2003-06-01  0:23     ` Russ Cox
@ 2003-06-01  0:24     ` Dan Cross
  2003-05-31 23:49       ` Skip Tavakkolian
  2003-06-01  0:28     ` William Josephson
  2003-06-02 11:37     ` Sam
  3 siblings, 1 reply; 44+ messages in thread
From: Dan Cross @ 2003-06-01  0:24 UTC (permalink / raw)
  To: 9fans

> >And then you're out of luck, but at least you're
> >not blissfully using the wrong data.
>
> Still, the knowledge that collisions have a probability
> of occurring is slightly unsettling. What are the statistics
> regarding probability of collisions in the venti, anyways?

Extremely low.  It's much more likely the disk will spontaneously
levitate first.

	- Dan C.




* Re: [9fans] compare-by-hash
  2003-06-01  1:15   ` northern snowfall
@ 2003-06-01  0:23     ` Russ Cox
  2003-06-01  1:23       ` northern snowfall
  2003-06-01  0:24     ` Dan Cross
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 44+ messages in thread
From: Russ Cox @ 2003-06-01  0:23 UTC (permalink / raw)
  To: 9fans

> Still, the knowledge that collisions have a probability
> of occurring is slightly unsettling. What are the statistics
> regarding probability of collisions in the venti, anyways?

Read the Venti paper and the link that was just posted.




* RE: [9fans] compare-by-hash
  2003-05-31 23:58 philw
@ 2003-06-01  0:00 ` Russ Cox
  2003-06-01  1:15   ` northern snowfall
  0 siblings, 1 reply; 44+ messages in thread
From: Russ Cox @ 2003-06-01  0:00 UTC (permalink / raw)
  To: 9fans

And then you're out of luck, but at least you're
not blissfully using the wrong data.




* RE: [9fans] compare-by-hash
@ 2003-05-31 23:58 philw
  2003-06-01  0:00 ` Russ Cox
  0 siblings, 1 reply; 44+ messages in thread
From: philw @ 2003-05-31 23:58 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 664 bytes --]

and then what?

	-----Original Message----- 
	From: Russ Cox [mailto:rsc@plan9.bell-labs.com] 
	Sent: Sat 5/31/2003 4:54 PM 
	To: 9fans@cse.psu.edu 
	Cc: 
	Subject: Re: [9fans] compare-by-hash
	
	

	The paper seems correct on most things, but is unfair to Venti.
	
	Venti is closer to hashing than compare-by-hash.
	Venti does look for SHA1 hash collisions -- once a block with a
	particular SHA1 hash has been written, you can't write any
	others.  Therefore you can't possibly end up thinking there are
	two different blocks stored on the same server and represented
	by the same SHA1 hash -- the store of the second will fail!
	
	Russ
	




* Re: [9fans] compare-by-hash
  2003-05-31 23:48 Taj Khattra
  2003-05-31 23:48 ` Charles Forsyth
  2003-05-31 23:50 ` Charles Forsyth
@ 2003-05-31 23:54 ` Russ Cox
  2 siblings, 0 replies; 44+ messages in thread
From: Russ Cox @ 2003-05-31 23:54 UTC (permalink / raw)
  To: 9fans

The paper seems correct on most things, but is unfair to Venti.

Venti is closer to hashing than compare-by-hash.
Venti does look for SHA1 hash collisions -- once a block with a
particular SHA1 hash has been written, you can't write any
others.  Therefore you can't possibly end up thinking there are
two different blocks stored on the same server and represented
by the same SHA1 hash -- the store of the second will fail!

Russ
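
To make the behaviour described above concrete, here is a toy sketch of a write-once, score-addressed store that refuses a second block with the same score but different contents. It is not Venti's code: the 8-bit byte-sum score is a stand-in for SHA-1 so that a collision can actually be demonstrated, and all names (Store, storewrite, toyscore) are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum {
	NSCORE = 256,	/* toy score space; SHA-1 has 2^160 values */
	MAXBLOCK = 64
};

enum { Stored, Duplicate, Collision };

typedef struct Store Store;
struct Store {
	int used[NSCORE];
	size_t len[NSCORE];
	unsigned char data[NSCORE][MAXBLOCK];
};

/* Toy stand-in for SHA-1: byte sum mod 256, so collisions are easy to provoke. */
static unsigned
toyscore(const void *buf, size_t n)
{
	const unsigned char *p = buf;
	unsigned sum = 0;

	while(n-- > 0)
		sum += *p++;
	return sum % NSCORE;
}

/*
 * Write a block.  A new score is stored; an identical block is a
 * harmless duplicate; a different block with the same score is
 * reported as a collision and not stored.
 */
int
storewrite(Store *s, const void *buf, size_t n)
{
	unsigned score;

	assert(s != NULL && n <= MAXBLOCK);
	score = toyscore(buf, n);
	if(!s->used[score]){
		memcpy(s->data[score], buf, n);
		s->len[score] = n;
		s->used[score] = 1;
		return Stored;
	}
	if(s->len[score] == n && memcmp(s->data[score], buf, n) == 0)
		return Duplicate;
	return Collision;
}
```

In this sketch, writing "ab" twice yields Stored and then Duplicate, while a later write of "ba" (same byte sum, different bytes) yields Collision, so the caller learns about the clash instead of silently reading back the wrong data.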



* Re: [9fans] compare-by-hash
  2003-05-31 23:48 Taj Khattra
  2003-05-31 23:48 ` Charles Forsyth
@ 2003-05-31 23:50 ` Charles Forsyth
  2003-05-31 23:54 ` Russ Cox
  2 siblings, 0 replies; 44+ messages in thread
From: Charles Forsyth @ 2003-05-31 23:50 UTC (permalink / raw)
  To: 9fans

i did also think it was rich that someone from Sun suggests
``keep some state!'' to avoid undetected errors when people
have had to suffer `stateless' NFS in various ways for years.




* Re: [9fans] compare-by-hash
  2003-06-01  0:24     ` Dan Cross
@ 2003-05-31 23:49       ` Skip Tavakkolian
  2003-06-01  5:52         ` Dan Cross
  0 siblings, 1 reply; 44+ messages in thread
From: Skip Tavakkolian @ 2003-05-31 23:49 UTC (permalink / raw)
  To: 9fans

> Extremely low.  It's much more likely the disk will spontaneously
> levitate first.

A finite improbability then?




* Re: [9fans] compare-by-hash
  2003-05-31 23:48 Taj Khattra
@ 2003-05-31 23:48 ` Charles Forsyth
  2003-05-31 23:50 ` Charles Forsyth
  2003-05-31 23:54 ` Russ Cox
  2 siblings, 0 replies; 44+ messages in thread
From: Charles Forsyth @ 2003-05-31 23:48 UTC (permalink / raw)
  To: 9fans

i suspect venti does actually do the extra check:

	packetSha1(p, score);

	u = lookupLump(score, type);
	if(u->data != nil){
		ok = 1;
		if(packetCmp(p, u->data) != 0){
			setErr(EStrange, "score collision");
			ok = 0;
		}
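
The check in the excerpt above can be sketched outside of Venti roughly as follows. This is only an illustration under stated assumptions, not Venti's code or API: `BlockStore`, `ScoreCollision`, and the `score_fn` parameter are hypothetical names, and `score_fn` exists only so that a collision can be simulated with a deliberately weak hash. The idea is the one in the excerpt: compute the SHA1 score, look it up, and on a hit compare the stored bytes with the incoming bytes instead of trusting the score alone.

```python
import hashlib


class ScoreCollision(Exception):
    """Two different blocks produced the same score."""


class BlockStore:
    """Toy write-once block store addressed by score (hypothetical, not Venti's API)."""

    def __init__(self, score_fn=None):
        # score_fn is an assumption added for illustration: it lets a caller
        # swap in a weak hash to force a collision. The default is SHA1.
        self._score = score_fn or (lambda data: hashlib.sha1(data).digest())
        self._blocks = {}

    def write(self, data: bytes) -> bytes:
        score = self._score(data)
        existing = self._blocks.get(score)
        if existing is None:
            # First block seen with this score: store it.
            self._blocks[score] = data
        elif existing != data:
            # Same score, different contents: refuse the write.
            raise ScoreCollision(score.hex())
        # Same score, same contents: nothing to do, the block is already stored.
        return score

    def read(self, score: bytes) -> bytes:
        return self._blocks[score]
```

Writing the same block twice returns the same score and stores one copy; writing a different block that maps to an existing score raises `ScoreCollision` and leaves the first block in place, which matches the behaviour described in the thread (the store of the second will fail).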




* [9fans] compare-by-hash
@ 2003-05-31 23:48 Taj Khattra
  2003-05-31 23:48 ` Charles Forsyth
                   ` (2 more replies)
  0 siblings, 3 replies; 44+ messages in thread
From: Taj Khattra @ 2003-05-31 23:48 UTC (permalink / raw)
  To: 9fans

do the venti/fossil folks have any comments on the
'An Analysis of Compare-by-hash' paper at HotOS-IX

	http://www.usenix.org/events/hotos03/tech/henson.html

or is it crying wolf ?

-taj



end of thread, other threads:[~2003-08-03  4:39 UTC | newest]

Thread overview: 44+ messages
2003-07-31 17:24 [9fans] compare-by-hash Joel Salomon
2003-07-31 17:50 ` andrey mirtchovski
2003-07-31 17:51 ` Sape Mullender
2003-07-31 17:53 ` rog
2003-07-31 18:03 ` jmk
2003-08-01  2:45 ` Joel Salomon
2003-08-01  2:52   ` Geoff Collyer
2003-08-01  3:42     ` jmk
2003-08-01  4:12       ` Geoff Collyer
2003-08-01 15:23       ` Jack Johnson
2003-08-01  9:03     ` Anthony Mandic
2003-08-01 15:11     ` Jack Johnson
2003-08-01  9:02 ` Douglas A. Gwyn
  -- strict thread matches above, loose matches on Subject: below --
2003-08-03  4:39 Andrew Simmons
2003-05-31 23:58 philw
2003-06-01  0:00 ` Russ Cox
2003-06-01  1:15   ` northern snowfall
2003-06-01  0:23     ` Russ Cox
2003-06-01  1:23       ` northern snowfall
2003-06-01  0:24     ` Dan Cross
2003-05-31 23:49       ` Skip Tavakkolian
2003-06-01  5:52         ` Dan Cross
2003-06-01  8:57           ` Charles Forsyth
2003-06-01  9:06             ` lucio
2003-06-01  9:25               ` lucio
2003-06-01 15:36                 ` David Presotto
2003-06-01 16:24                   ` David Presotto
2003-06-02  4:14                     ` lucio
2003-06-02  4:03                   ` lucio
2003-06-01 22:55             ` Geoff Collyer
2003-06-01  0:28     ` William Josephson
2003-06-01  1:46       ` northern snowfall
2003-06-01  1:38         ` William K. Josephson
2003-06-01  1:57           ` Scott Schwartz
2003-06-01  2:58           ` northern snowfall
2003-06-01  2:57             ` W. Josephson
2003-06-01  4:07               ` northern snowfall
2003-06-01 15:11                 ` boyd, rounin
2003-06-02 11:37     ` Sam
2003-06-02 12:41       ` boyd, rounin
2003-05-31 23:48 Taj Khattra
2003-05-31 23:48 ` Charles Forsyth
2003-05-31 23:50 ` Charles Forsyth
2003-05-31 23:54 ` Russ Cox
