9fans - fans of the OS Plan 9 from Bell Labs
 help / color / mirror / Atom feed
* [9fans] Qid path generation
@ 2003-05-05 21:29 Andrew Simmons
  2003-05-05 21:30 ` W. Josephson
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Andrew Simmons @ 2003-05-05 21:29 UTC (permalink / raw)
  To: 9fans

I'm pondering how to generate unique qid paths for NTFS/FAT32 files. The 2nd
ed toolset for Windows has a function called "pathqid" which generates a 4
byte number from the filename. I'd appreciate any hints on the workings of
this function, particularly the choice of the magic number 1000003, the
reason for masking with ~CHDIR, and the role of the first parameter, which
always seems to have 0 passed in - plus any ideas on extending the method to
8 bytes, or a better method.

Also, I would guess that a method such as pathqid could occasionally
generate the same number from two different file names. Any guesses as to
the impact of this on a client (eg 9fs)?



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [9fans] Qid path generation
  2003-05-05 21:29 [9fans] Qid path generation Andrew Simmons
@ 2003-05-05 21:30 ` W. Josephson
  2003-05-05 21:33 ` rsc
  2003-05-06  1:10 ` David Presotto
  2 siblings, 0 replies; 10+ messages in thread
From: W. Josephson @ 2003-05-05 21:30 UTC (permalink / raw)
  To: 9fans

On Tue, May 06, 2003 at 09:29:23AM +1200, Andrew Simmons wrote:
> Also, I would guess that a method such as pathqid could occasionally
> generate the same number from two different file names. Any guesses as to
> the impact of this on a client (eg 9fs)?

It'll bugger bind and most anything that tries to cache.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [9fans] Qid path generation
  2003-05-05 21:29 [9fans] Qid path generation Andrew Simmons
  2003-05-05 21:30 ` W. Josephson
@ 2003-05-05 21:33 ` rsc
  2003-05-05 21:45   ` Andrew Simmons
  2003-05-06  1:09   ` David Presotto
  2003-05-06  1:10 ` David Presotto
  2 siblings, 2 replies; 10+ messages in thread
From: rsc @ 2003-05-05 21:33 UTC (permalink / raw)
  To: 9fans

> I'm pondering how to generate unique qid paths for NTFS/FAT32 files. The 2nd
> ed toolset for Windows has a function called "pathqid" which generates a 4
> byte number from the filename. I'd appreciate any hints on the workings of
> this function, particularly the choice of the magic number 1000003, the
> reason for masking with ~CHDIR, and the role of the first parameter, which
> always seems to have 0 passed in - plus any ideas on extending the method to
> 8 bytes, or a better method.

It's just a hash function.

In old Plan 9, the CHDIR bit indicated that the file was a directory.

The extra parameter let you start the hash at a different value,
so that if you knew the hash of a directory, you could compute
the hash of a file within that directory without rehashing the
directory name.

> Also, I would guess that a method such as pathqid could occasionally
> generate the same number from two different file names. Any guesses as to
> the impact of this on a client (eg 9fs)?

It matters when you try to bind or mount on those paths.
Mounts on top of one would also appear on top of the other.
Other programs like cp also use the qids to avoid copying
a file onto itself.  Also, as wkj pointed out, cfs and mount -C
cache based on qids.

I think there might be better ways to get a unique identifier
out of Windows these days.  U9fs, for example, uses the inode
number and device number.  Maybe Windows will let you get at
similar information now.

Russ



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [9fans] Qid path generation
  2003-05-05 21:33 ` rsc
@ 2003-05-05 21:45   ` Andrew Simmons
  2003-05-06 11:29     ` Aharon Robbins
  2003-05-06  1:09   ` David Presotto
  1 sibling, 1 reply; 10+ messages in thread
From: Andrew Simmons @ 2003-05-05 21:45 UTC (permalink / raw)
  To: 9fans

> I think there might be better ways to get a unique identifier
> out of Windows these days.

You'd think so, but I'm buggered if I can find them. I'll keep looking.
Thanks.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [9fans] Qid path generation
  2003-05-05 21:33 ` rsc
  2003-05-05 21:45   ` Andrew Simmons
@ 2003-05-06  1:09   ` David Presotto
  1 sibling, 0 replies; 10+ messages in thread
From: David Presotto @ 2003-05-06  1:09 UTC (permalink / raw)
  To: 9fans

>
> I think there might be better ways to get a unique identifier
> out of Windows these days.  U9fs, for example, uses the inode
> number and device number.  Maybe Windows will let you get at
> similar information now.
>

I've been using a hash of the full windows path name and the
creation time as the qid.  It has the advantage of working
both for NTFS and FAT file systems.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [9fans] Qid path generation
  2003-05-05 21:29 [9fans] Qid path generation Andrew Simmons
  2003-05-05 21:30 ` W. Josephson
  2003-05-05 21:33 ` rsc
@ 2003-05-06  1:10 ` David Presotto
  2003-05-06  1:23   ` Andrew Simmons
  2 siblings, 1 reply; 10+ messages in thread
From: David Presotto @ 2003-05-06  1:10 UTC (permalink / raw)
  To: 9fans

By the way, here's my hash function.  I pass it the full path
name and creation time:


static uvlong
hash(Rune *p, ulong seed)
{
	ulong x;
	uvlong xx;

	x = seed;
	while(*p)
		x = x*1103515245 + 12345 + *p++;
	xx = x;
	xx <<= 32;
	xx |= seed;
	return xx;
}


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [9fans] Qid path generation
  2003-05-06  1:10 ` David Presotto
@ 2003-05-06  1:23   ` Andrew Simmons
  2003-05-06  2:29     ` David Presotto
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Simmons @ 2003-05-06  1:23 UTC (permalink / raw)
  To: 9fans


> By the way, here's my hash function.  I pass it the full path
> name and creation time:
>

Thanks, I'll give that a go. I don't know much about the theory of hash
keys. Does the addition of the creation date to the path significantly
reduce the likelihood of collisions?



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [9fans] Qid path generation
  2003-05-06  1:23   ` Andrew Simmons
@ 2003-05-06  2:29     ` David Presotto
  2003-05-06  3:05       ` Russ Cox
  0 siblings, 1 reply; 10+ messages in thread
From: David Presotto @ 2003-05-06  2:29 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 1375 bytes --]

Actually I'm glad you asked since its made me think a
bit more about it.

I wanted something that would
notice a file being created, deleted, and a new one created
with the same name.  Making the qid.version the modification time
might be good enough.  However, in the case of files being
used only for appends, I wanted to distinguish a new version of
the same file from one that just had something appended to
it.  Hence I used the creation time as part of the qid.path.

Of course, that means that there's more likelyhood for the
qid.paths of two files created in the same second to collide.

If you assume we're using a sha1-like hash, the likelyhood of a
collision between 2 randomly chosen files using a 64 bit hash
is about 2^-32; using a 32 bit hash, 2^-16.  That means that we're
making it 65000 times more likely of a collision between two
random files created the same second.

I've now convinced myself that I'ld be better off taking
64 bits of the sha1 hash of the full path appended to the
creation time.  Then the likelyhood of any two files colliding
is still < 2^-32 and I still distinguish two files created
with the same path but at different times.  Then I can also
use the full 64 bit time available in NTFS.

Of course, SHA1 is a lot slower than the hash I currently use
but processors are getting pretty damned fast anyways.

[-- Attachment #2: Type: message/rfc822, Size: 1927 bytes --]

From: "Andrew Simmons" <andrew.simmons@monitorbm.co.nz>
To: <9fans@cse.psu.edu>
Subject: Re: [9fans] Qid path generation
Date: Tue, 6 May 2003 13:23:23 +1200
Message-ID: <01c901c3136e$16616dc0$3f00a8c0@MERCURY>


> By the way, here's my hash function.  I pass it the full path
> name and creation time:
>

Thanks, I'll give that a go. I don't know much about the theory of hash
keys. Does the addition of the creation date to the path significantly
reduce the likelihood of collisions?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [9fans] Qid path generation
  2003-05-06  2:29     ` David Presotto
@ 2003-05-06  3:05       ` Russ Cox
  0 siblings, 0 replies; 10+ messages in thread
From: Russ Cox @ 2003-05-06  3:05 UTC (permalink / raw)
  To: 9fans

The 64-bit hash for qids is a good idea
except that exportfs and programs like it assume that
the top 16 bits are fairly low entropy so they can be
renamed as necessary to present many 9P servers
to a single client.  Taking 48 bits of it will give you
2^24, which is halfway between the two and might
be enough to appease both camps.

Russ



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [9fans] Qid path generation
  2003-05-05 21:45   ` Andrew Simmons
@ 2003-05-06 11:29     ` Aharon Robbins
  0 siblings, 0 replies; 10+ messages in thread
From: Aharon Robbins @ 2003-05-06 11:29 UTC (permalink / raw)
  To: 9fans

In article <018e01c3134f$af81cf50$3f00a8c0@MERCURY>,
Andrew Simmons <9fans@cse.psu.edu> wrote:
>> I think there might be better ways to get a unique identifier
>> out of Windows these days.
>
>You'd think so, but I'm buggered if I can find them. I'll keep looking.
>Thanks.

There is some low-level windows function, something like _realpath() that
takes a filename and returns the full path to the file; this is guaranteed
to be unique.  I don't know if that helps any.

Arnold
--
Aharon (Arnold) Robbins --- Pioneer Consulting Ltd.	arnold@skeeve.com
P.O. Box 354		Home Phone: +972  8 979-0381	Fax: +1 928 569 9018
Nof Ayalon		Cell Phone: +972 51  297-545
D.N. Shimshon 99785	ISRAEL


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2003-05-06 11:29 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-05-05 21:29 [9fans] Qid path generation Andrew Simmons
2003-05-05 21:30 ` W. Josephson
2003-05-05 21:33 ` rsc
2003-05-05 21:45   ` Andrew Simmons
2003-05-06 11:29     ` Aharon Robbins
2003-05-06  1:09   ` David Presotto
2003-05-06  1:10 ` David Presotto
2003-05-06  1:23   ` Andrew Simmons
2003-05-06  2:29     ` David Presotto
2003-05-06  3:05       ` Russ Cox

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).