9fans - fans of the OS Plan 9 from Bell Labs
 help / color / mirror / Atom feed
From: David Presotto <presotto@closedmind.org>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] Qid path generation
Date: Mon,  5 May 2003 22:29:25 -0400	[thread overview]
Message-ID: <b4460a1b789c3d712983b272f0488dbb@plan9.bell-labs.com> (raw)
In-Reply-To: <01c901c3136e$16616dc0$3f00a8c0@MERCURY>

[-- Attachment #1: Type: text/plain, Size: 1375 bytes --]

Actually I'm glad you asked since its made me think a
bit more about it.

I wanted something that would
notice a file being created, deleted, and a new one created
with the same name.  Making the qid.version the modification time
might be good enough.  However, in the case of files being
used only for appends, I wanted to distinguish a new version of
the same file from one that just had something appended to
it.  Hence I used the creation time as part of the qid.path.

Of course, that means that there's more likelyhood for the
qid.paths of two files created in the same second to collide.

If you assume we're using a sha1-like hash, the likelyhood of a
collision between 2 randomly chosen files using a 64 bit hash
is about 2^-32; using a 32 bit hash, 2^-16.  That means that we're
making it 65000 times more likely of a collision between two
random files created the same second.

I've now convinced myself that I'ld be better off taking
64 bits of the sha1 hash of the full path appended to the
creation time.  Then the likelyhood of any two files colliding
is still < 2^-32 and I still distinguish two files created
with the same path but at different times.  Then I can also
use the full 64 bit time available in NTFS.

Of course, SHA1 is a lot slower than the hash I currently use
but processors are getting pretty damned fast anyways.

[-- Attachment #2: Type: message/rfc822, Size: 1927 bytes --]

From: "Andrew Simmons" <andrew.simmons@monitorbm.co.nz>
To: <9fans@cse.psu.edu>
Subject: Re: [9fans] Qid path generation
Date: Tue, 6 May 2003 13:23:23 +1200
Message-ID: <01c901c3136e$16616dc0$3f00a8c0@MERCURY>


> By the way, here's my hash function.  I pass it the full path
> name and creation time:
>

Thanks, I'll give that a go. I don't know much about the theory of hash
keys. Does the addition of the creation date to the path significantly
reduce the likelihood of collisions?

  reply	other threads:[~2003-05-06  2:29 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-05-05 21:29 Andrew Simmons
2003-05-05 21:30 ` W. Josephson
2003-05-05 21:33 ` rsc
2003-05-05 21:45   ` Andrew Simmons
2003-05-06 11:29     ` Aharon Robbins
2003-05-06  1:09   ` David Presotto
2003-05-06  1:10 ` David Presotto
2003-05-06  1:23   ` Andrew Simmons
2003-05-06  2:29     ` David Presotto [this message]
2003-05-06  3:05       ` Russ Cox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b4460a1b789c3d712983b272f0488dbb@plan9.bell-labs.com \
    --to=presotto@closedmind.org \
    --cc=9fans@cse.psu.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).