Re: [9fans] 9P2000

9fans - fans of the OS Plan 9 from Bell Labs

* Re: [9fans] 9P2000
@ 2001-01-31  2:16 rob pike
  2001-01-31  8:56 ` Mike Haertel
  0 siblings, 1 reply; 10+ messages in thread
From: rob pike @ 2001-01-31  2:16 UTC (permalink / raw)
  To: 9fans

Thanks for your thorough reading and commentary.  As you can
imagine, there were endless weeks of debate over many of the
issues in the protocol.

	* restrict allowable contents of file, owner, and group names
	  at the protocol level to be equivalent to the restrictions
	  imposed at the Plan 9 kernel level.

It's unwise to prohibit things in a protocol when you don't know for
certain that they aren't useful.  We have changed the legal character
set for file names in Plan 9 several times, but have not yet had to
make any change to the character set for 9P.  This is evidence that
we're making prohibitions in the right places.

	* eliminate the useless special case of ~0 tags.

Only useless if you don't allow Tsession to reset a connection.

	* eliminate multiple Tsessions from the protocol; require
	  that each connection begin with exactly one Tversion,
	  exactly one Tsession, and disallow any further occurrences
	  of Tsession and Tversion in the conversation.  then the
	  funny "aborts all transactions" semantics of Tsession can
	  also be eliminated.

The requirement for Tsession to reset a connection is there so 9P can
be used on point-to-point wired networks such as the old BIT32 or
other back-to-back bus devices.  If every connection to the server
begins as an IP call, I admit Tsession is less useful, but there are
setups for which Tsession provides the necessary reset capability.  ~0
is a reserved tag so one may always issue a `reset' this way.

	* specify a "minimum maximum" msize that a client can request
	  in such a way that the client can always read() any stat
	  structure that the server might need to return for any
	  possible directory entry.

The issue of going variable-sized in this protocol is by far the most
subtle, difficult, and pervasive issue.  There are still rough edges
around all this area.  The new kernel does only very preliminary stuff
here.  There's no question we need to be careful, but it's all so new
(and hard!  dirread was a bear!)  I want practice to guide our design.
The basics are in the protocol spec, but lots of details remain to be
filled in.

The particular case you raise here is nasty, but only superficially.
You get an error back on the stat or read; that can happen anyway.

The problem with setting a minimum is that it cascades into other size
issues.  How big is the biggest stat?  How does the server know a
priori?  Etc. Etc. Also, if you set a minimum, it means you can't use
that connection to encapsulate.  That feels ugly.

Again, I want to see how this plays out before defining more
explicitly.

	* expand time stamps to 64 bits for posterity.

2038 is coming, but the clock actually overflows in 2106.  This is
one we talked to death.  The earliest drafts had 64-bit clocks but
we backed down for several reasons:
1) Clocks are used everywhere and vlongs are large, slow, and
	painful to work with.
2) Too many other fields have doubled in size; we wanted to
	keep the overall size of the protocol small.  Much of the
	purpose of this revision is performance.   (For instance,
	we kept Qid.version at 32 bits.)
3) All the existing interface software uses 32 bit clocks.
4) 2106 is a long time from now.
5) Mk's clock resolution may be an issue, but mk is enough of
	a crock we'd rather force people to think about how to
	build software than make everything slower to support
	mk.
6) Retrofitting existing timestamps into 64-bit resolution is
	a nasty issue for servers.
7) Finally, no design for how the 64-bit clock should be set
	up looked good in practice.  Nanoseconds? Microseconds?
	Milliseconds?  If we choose (say) microseconds, what does
	it mean to say a file has that time stamp?  The bits may
	be there but they're meaningless in practice. And whatever
	you choose, it's wrong for some future technology if you
	depend on the precision of the clock to make critical
	decisions, as does mk.  Better to face the real issue some
	other way and keep times around mostly for humans,
	as in ls -l output.
In short, leaving clocks alone, as 32 bits of seconds from 1970.0, is
compatible with every other system out there, including our own.
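
[Editorial sketch, not part of the original message.]  The two years
mentioned above can be checked with a few lines of arithmetic.  The
function name below is hypothetical; the only assumption is that the
clock counts whole seconds from 1970-01-01 00:00 UTC, read either as a
signed or as an unsigned 32-bit value.

```python
from datetime import datetime, timedelta, timezone

# Epoch used by the 9P time fields: seconds since Jan 1 00:00 1970 GMT.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def last_representable(bits, signed):
    """Last UTC instant a seconds-since-1970 counter of this width can hold."""
    max_value = 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1
    return EPOCH + timedelta(seconds=max_value)

print(last_representable(32, signed=True))   # 2038-01-19 03:14:07+00:00
print(last_representable(32, signed=False))  # 2106-02-07 06:28:15+00:00
```

Read as signed, the counter runs out in 2038; read as unsigned, the
same 32 bits last until 2106.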

	* forbid attempts in wstat to alter the length of a directory.

This may make sense, but I hardly think it's worth the time to
specify.  And again, that issue about forbidding things too soon...

	* remove discussion of Plan 9 group leader semantics and
	  other weird stuff from the protocol specification.
	  similarly remove the claim that wstat cannot change file
	  ownership from the specification.  instead say that
	  allowable owner, group, and permission changes are
	  determined at the discretion of whatever security policy
	  the server chooses to implement.  (the discussion of Plan 9
	  group semantics would presumably migrate to the man page
	  for the specific file server.)

I have some sympathy with this one.  The protocol is a peculiar
place to write down all this permission stuff.

	* ensure that walks to .. are reliable by explicitly requiring
	  at the protocol level that the hierarchy is always a strict tree.

The hierarchy is not a strict tree in many of our existing servers.

	* disallow walks to "" (the zero length name) in addition to the
	  already-disallowed walks to "."

Existing practice depends on "" meaning ".", particularly within
file names such as "#e".  I'm not sure empty strings go across the wire,
but I'm also not sure they don't.  This one may be worth clarifying.

	* in walk operations that fail, newfid should be implicitly clunked
	  unless it was equal to fid.

The manual says that newfid is unaffected in that case, and that
newfid must not be in use.  I think this is correct and sufficient.

-rob


* Re: [9fans] 9P2000
  2001-01-31  2:16 [9fans] 9P2000 rob pike
@ 2001-01-31  8:56 ` Mike Haertel
  0 siblings, 0 replies; 10+ messages in thread
From: Mike Haertel @ 2001-01-31  8:56 UTC (permalink / raw)
  To: 9fans

>	* eliminate multiple Tsessions from the protocol; require
>	  that each connection begin with exactly one Tversion,
>	  exactly one Tsession, and disallow any further occurrences
>	  of Tsession and Tversion in the conversation.  then the
>	  funny "aborts all transactions" semantics of Tsession can
>	  also be eliminated.
>
>The requirement for Tsession to reset a connection is there so 9P can
>be used on point-to-point wired networks such as the old BIT32 or
>other back-to-back bus devices.  If every connection to the server
>begins as an IP call, I admit Tsession is less useful, but there are
>setups for which Tsession provides the necessary reset capability.  ~0
>is a reserved tag so one may always issue a `reset' this way.

Ok, that's what I thought.  Did you take the time to read my
argument (in the longer discussion) that this is a bogus
consideration?

Here's why: imagine you have a hard-wired connection.  Suppose
moreover that your client crashes *in the middle* of sending
a message.  Then the client reboots and starts by sending a
new Tversion message, but the server still thinks it is in
the middle of whatever message the client was previously
sending.  So the server never resynchronizes with the client,
and the client ends up thinking the server is being stubborn
and never responding, or else sending back garbage.

In order to avoid this scenario you need some kind of markers
that the server can look for once it realizes it has become
desynchronized.  One simple approach might be simply to prefix
every 9P message with a particular magic byte that the server
can look for.  As long as that magic byte is seen whenever
the server is about to begin reading a new message, it knows
it is (probably) synchronized.  If it becomes desynchronized
it can hunt for the magic byte in an attempt to become
resynchronized.

This is the sort of service an underlying transport protocol
provides robustly.  You could include this functionality in
9P, but why?  The Tsession stuff has to be one of the most
non-robust ways of doing this that I have ever seen.  If you've
had no problem with unencapsulated 9P on hard-wired links so
far, it's only because you've been lucky.  Better by far to
assume a real underlying transport layer.  It could be as
simple as a trivial wrapper that puts delimiter bytes on
messages before sending them on your hardwired connection.
Even something as simple as that will do a better job of crash
recovery than 9P by itself.
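
[Editorial sketch, not part of the original message.]  A minimal
version of the "trivial wrapper" described above might look like the
following.  It swaps in SLIP-style byte stuffing (the delimiter and
escape values are borrowed from SLIP and are an assumption, not
anything in 9P), and it assumes each payload starts with a
little-endian size[4] that counts the whole message, so that a frame
cut short by a crash can be recognised and dropped.  All function
names are hypothetical.

```python
END = 0xC0      # frame delimiter (value borrowed from SLIP; an assumption)
ESC = 0xDB      # escape byte
ESC_END = 0xDC  # ESC ESC_END stands for a literal END byte in the payload
ESC_ESC = 0xDD  # ESC ESC_ESC stands for a literal ESC byte in the payload

def frame(message):
    """Wrap one message in END delimiters, escaping END and ESC inside it."""
    out = bytearray([END])
    for b in message:
        if b == END:
            out += bytes([ESC, ESC_END])
        elif b == ESC:
            out += bytes([ESC, ESC_ESC])
        else:
            out.append(b)
    out.append(END)
    return bytes(out)

def deframe(stream):
    """Split a byte stream into the payloads found between END delimiters."""
    frames, current, escaped = [], bytearray(), False
    for b in stream:
        if b == END:
            # A delimiter always ends the current frame, even a partial one,
            # which is what lets the receiver resynchronise.
            if current:
                frames.append(bytes(current))
            current, escaped = bytearray(), False
        elif escaped:
            # Anything other than ESC_END after ESC is treated as a literal ESC.
            current.append(END if b == ESC_END else ESC)
            escaped = False
        elif b == ESC:
            escaped = True
        else:
            current.append(b)
    return frames

def well_formed(payload):
    """Accept a payload only if its leading size[4] matches its real length."""
    return len(payload) >= 4 and int.from_bytes(payload[:4], "little") == len(payload)

def receive(stream):
    """Return the intact messages in a stream, dropping truncated ones."""
    return [p for p in deframe(stream) if well_formed(p)]
```

If the sender dies part-way through a frame and then starts a new one,
the receiver sees a short payload whose size field does not match,
discards it, and picks up cleanly at the next delimiter.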


* Re: [9fans] 9P2000
  2001-01-31  9:31 Russ Cox
@ 2001-01-31 17:46 ` Mike Haertel
  0 siblings, 0 replies; 10+ messages in thread
From: Mike Haertel @ 2001-01-31 17:46 UTC (permalink / raw)
  To: 9fans

>Aren't there two issues here?  One is resynchronizing
>the message stream, so that both sides agree on the 
>message boundaries.  The other is resynchronizing 
>the 9P conversation state, so that both sides agree
>on which tags and fids are in use and what they mean.

Yes.

>Something (an underlying transport protocol, say) needs
>to provide the first capability, but without the second
>you're still hosed.  In an IP environment, you can drop
>and redial the connection,

Yes, in an IP environment, the connection gets closed, and the
other end of the conversation detects that *independently of
the 9P byte stream*, when attempting to read or write the connection
returns some kind of "EOF" indication.

>but if you've got a hard-wired
>link, you need an explicit restart within the protocol,
>hence Tsession, no?

Nope.  Because we've already established that in a hard-wired
environment 9P cannot reliably be the lowest level protocol.

Therefore, we know we already NEED a lower level below 9P,
just to delimit message boundaries.  Why not just make that
lower level also know how to "return EOF"?

Then, to the higher 9P level, the hard wired link would look
*just like* an IP connection.  So the higher level would have
only one execution environment to cope with, instead of two
subtly different ones.

Let
	A = total_complexity_of(9P + Tsession abort and ~0 tags)
	B = total_complexity_of(encapsulation layer that doesn't "return EOF")
	C = total_complexity_of(9P with those features removed)
	D = total_complexity_of(encapsulation layer that does return EOF)

My argument is simply that

	A + B > C + D

But, if you guys aren't comfortable with this, I guess it's
not worth arguing about further.


* Re: [9fans] 9P2000
@ 2001-01-31  9:31 Russ Cox
  2001-01-31 17:46 ` Mike Haertel
  0 siblings, 1 reply; 10+ messages in thread
From: Russ Cox @ 2001-01-31  9:31 UTC (permalink / raw)
  To: 9fans

[I'm just confused trying to follow the argument.  Feel free to ignore.]

Aren't there two issues here?  One is resynchronizing
the message stream, so that both sides agree on the 
message boundaries.  The other is resynchronizing 
the 9P conversation state, so that both sides agree
on which tags and fids are in use and what they mean.

Something (an underlying transport protocol, say) needs
to provide the first capability, but without the second
you're still hosed.  In an IP environment, you can drop
and redial the connection, but if you've got a hard-wired
link, you need an explicit restart within the protocol,
hence Tsession, no?

Russ


* Re: [9fans] 9P2000
@ 2001-01-31  2:18 rob pike
  0 siblings, 0 replies; 10+ messages in thread
From: rob pike @ 2001-01-31  2:18 UTC (permalink / raw)
  To: 9fans

	Consider changing the format of this table to read:
		size[4] Tversion[1] tag[2] msize[4] version[s]

Right. This was just a mistake that others have pointed out
and has been fixed.

	By the way, one thing I really like about the new encoding is that
	emulating the old "fcall" streams module becomes trivial.

There is in fact no fcall any more; the mount driver gets it right.
But this only supports your observation.

-rob




* Re: [9fans] 9P2000
  2001-01-30 12:09 rog
@ 2001-01-30 18:04 ` Mike Haertel
  0 siblings, 0 replies; 10+ messages in thread
From: Mike Haertel @ 2001-01-30 18:04 UTC (permalink / raw)
  To: 9fans; +Cc: rog

>there's one case where the client has to be a bit careful about the
>size of messages it generates. a Twalk message can itself fit inside
>the negotiated msize, but require an Rwalk that will not do so. (e.g.
>walking down several short pathname elements).  it might be worth
>requiring a minimum message size of 1+4+2+2+MAXWELEM*13 = 217 which
>would avoid this problem.

This is not nearly as bad as the directory entry situation.  A
client that specified a very small msize could be held responsible
for not producing Twalks whose corresponding Rwalks would exceed
the msize; this would be under control of the client and so is
an avoidable situation.

But the client has no control whatever over the size of a directory
entry it is about to read.  It is helpless.


* Re: [9fans] 9P2000
@ 2001-01-30 12:09 rog
  2001-01-30 18:04 ` Mike Haertel
  0 siblings, 1 reply; 10+ messages in thread
From: rog @ 2001-01-30 12:09 UTC (permalink / raw)
  To: 9fans

ducky.net!mike wrote:
> (B) the byte count of the directory entry might result
> in the required size of the Rread message exceeding the negotiated
> maximum transaction size between the 9P client and server.
[...]
> Scenario (B) is bad.  There is no easy way for the client to recover.
> Certainly the client application can do nothing about it: the protocol
> connection is already established and the msize is fixed in stone.

my understanding was that if a client tried to negotiate a message size
that was too small for the server's maximum filename size, the server
would yield an Rerror "message size too small" or somesuch.

this, perhaps, would be one reason to allow multiple Tversions - if a
Tversion has resulted in an Rerror, then surely it should be possible
to negotiate another version or msize.

there's one case where the client has to be a bit careful about the
size of messages it generates. a Twalk message can itself fit inside
the negotiated msize, but require an Rwalk that will not do so. (e.g.
walking down several short pathname elements).  it might be worth
requiring a minimum message size of 1+4+2+2+MAXWELEM*13 = 217 which
would avoid this problem.
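
[Editorial sketch, not part of the original message.]  The arithmetic
above can be spelled out as follows.  The field widths are taken from
the sum in the text for Rwalk and, for Twalk, assumed to be size[4]
type[1] tag[2] fid[4] newfid[4] nwname[2] followed by
length-prefixed names; MAXWELEM = 16 is inferred from 217 - 9 = 16*13.
The function names are hypothetical.

```python
MAXWELEM = 16   # maximum path elements in one walk (inferred from the sum above)
QID_SIZE = 13   # bytes per qid, as in MAXWELEM*13

def twalk_size(names):
    """Bytes in a Twalk, assuming the field layout described in the lead-in."""
    header = 4 + 1 + 2 + 4 + 4 + 2
    return header + sum(2 + len(n.encode("utf-8")) for n in names)

def rwalk_size(nqid):
    """Bytes in an Rwalk carrying nqid qids: the 1+4+2+2 header plus the qids."""
    return 1 + 4 + 2 + 2 + nqid * QID_SIZE

names = ["a"] * MAXWELEM
print(twalk_size(names))     # 65
print(rwalk_size(MAXWELEM))  # 217
```

With sixteen one-character names the request is 65 bytes but the
successful reply is 217, so any msize between the two admits the
Twalk and cannot carry its Rwalk.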

> Assuming the server does *not* reject truncation of a directory to
> length 0, should a client assume that all files under the directory
> have been removed? This is another one of those possible complications
> that I think should be eliminated by specifying them out of the
> protocol: always reject attempts by wstat to change the length of
> a directory.

a directory has a conventional length of 0 anyway, so it would make
sense if setting the length of a directory to zero was a no-op.

> If the walk operation fails, does newfid exist (and point to the
> same qid as fid), or is it implicitly clunked?

and quoted previously:

> > Also, nqid will always
> > be less than or equal to nwname.  Only if it is equal, how-
> > ever, will newfid be affected, in which case it will repre-
> > sent the file reached by the final elementwise walk
> > requested in the message.

i.e. if the walk operation fails, newfid is not affected (created or
walked).
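
[Editorial sketch, not part of the original message.]  The rule quoted
above can be illustrated with a toy model in which directories are
nested dicts and a fid table maps fid numbers to nodes.  Everything
here (the function name, the dict representation) is hypothetical and
ignores permissions, qids, and error replies.

```python
def walk(fids, fid, newfid, names):
    """Walk names elementwise from fid; bind newfid only if every step succeeds.

    Returns the list of nodes reached, so len(result) plays the role of nqid.
    """
    node = fids[fid]
    reached = []
    for name in names:
        if not isinstance(node, dict) or name not in node:
            break                      # stop at the first element that fails
        node = node[name]
        reached.append(node)
    if len(reached) == len(names):     # nqid == nwname: newfid now names the result
        fids[newfid] = node
    return reached                     # otherwise newfid is left untouched

root = {"usr": {"rog": {}}}
fids = {0: root}
print(len(walk(fids, 0, 1, ["usr", "rog"])), 1 in fids)     # 2 True
print(len(walk(fids, 0, 2, ["usr", "nobody"])), 2 in fids)  # 1 False
```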

  cheers,
    rog.


* Re: [9fans] 9P2000
  2001-01-27 21:58 rob pike
  2001-01-28 16:29 ` Sam Ducksworth
@ 2001-01-30  9:21 ` Mike Haertel
  1 sibling, 0 replies; 10+ messages in thread
From: Mike Haertel @ 2001-01-30  9:21 UTC (permalink / raw)
  To: 9fans; +Cc: mike, rob

Here are some reactions.

They mostly boil down to suggested changes that will make the
specification of the protocol both simpler and more bulletproof.

Here is a concise summary of the proposed protocol changes:

	* restrict allowable contents of file, owner, and group names
	  at the protocol level to be equivalent to the restrictions
	  imposed at the Plan 9 kernel level.

	* eliminate the useless special case of ~0 tags.

	* eliminate multiple Tsessions from the protocol; require
	  that each connection begin with exactly one Tversion,
	  exactly one Tsession, and disallow any further occurrences
	  of Tsession and Tversion in the conversation.  then the
	  funny "aborts all transactions" semantics of Tsession can
	  also be eliminated.

	* specify a "minimum maximum" msize that a client can request
	  in such a way that the client can always read() any stat
	  structure that the server might need to return for any
	  possible directory entry.

	* expand time stamps to 64 bits for posterity.

	* forbid attempts in wstat to alter the length of a directory.

	* remove discussion of Plan 9 group leader semantics and
	  other weird stuff from the protocol specification.
	  similarly remove the claim that wstat cannot change file
	  ownership from the specification.  instead say that
	  allowable owner, group, and permission changes are
	  determined at the discretion of whatever security policy
	  the server chooses to implement.  (the discussion of Plan 9
	  group semantics would presumably migrate to the man page
	  for the specific file server.)

	* ensure that walks to .. are reliable by explicitly requiring
	  at the protocol level that the hierarchy is always a strict tree.

	* disallow walks to "" (the zero length name) in addition to the
	  already-disallowed walks to "."

	* in walk operations that fail, newfid should be implicitly clunked
	  unless it was equal to fid.

Long discussion follows...

(There are also a few stylistic comments.)

>     INTRO(5)                                                 INTRO(5)

>               Tversion  size[4] tag[2] msize[4] version[s]
>               Rversion  size[4] tag[2] msize[4] version[s]
>		[...]

Consider changing the format of this table to read:

	size[4] Tversion[1] tag[2] msize[4] version[s]
	size[4] Rversion[1] tag[2] msize[4] version[s]
	[...]

to explicitly show the placement of the type byte in each message.

The old format was appropriate when the type byte was the first
byte of the message, but with the new protocol the table was
confusing until I reread the preceding paragraph that described
the placement of the message type byte after the size[4].  This
proposed format is self documenting at a glance.
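
[Editorial sketch, not part of the original message.]  Read literally,
the proposed table row translates directly into a packing routine.
The sketch below assumes little-endian integers, a string encoded as a
two-byte length followed by UTF-8 bytes, and a type code of 100 for
Tversion; the type code and the helper names are assumptions for
illustration, not taken from the draft under discussion.

```python
import struct

TVERSION = 100   # assumed message type code
NOTAG = 0xFFFF   # the ~0 tag, as a 16-bit value

def pack_tversion(msize, version, tag=NOTAG):
    """Encode size[4] Tversion[1] tag[2] msize[4] version[s]."""
    v = version.encode("utf-8")
    body = struct.pack("<BHI", TVERSION, tag, msize) + struct.pack("<H", len(v)) + v
    return struct.pack("<I", 4 + len(body)) + body   # size counts itself

def unpack_header(msg):
    """Decode the leading size[4] type[1] tag[2] shared by every message."""
    return struct.unpack_from("<IBH", msg, 0)

msg = pack_tversion(8192, "9P2000")
print(len(msg), unpack_header(msg))   # 19 (19, 100, 65535)
```

The type byte sits at offset 4, immediately after size[4], which is
the point the reformatted table makes visible.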

By the way, one thing I really like about the new encoding is that
emulating the old "fcall" streams module becomes trivial.

>[...]
>          (Systems may choose
>          to reduce the set of legal characters to reduce syntactic
>          problems, for example to remove slashes from name compo-
>          nents, but the protocol has no such restriction.  Plan 9
>          names may contain any printable character (that is, any
>          character outside hexadecimal 00-1F and 80-9F) except
>          slash.)

I think it is a huge mistake to say "the protocol has no
such restriction".

One of the big problems with Unix was that you could have nearly
arbitrary characters in filenames, but a lot of programs (notably
things like cpio, xargs, and the shell itself) did not take this
possibility seriously.  It takes a lot more experience than it
should to write reliable scripts for dealing with files in Unix.

Admittedly rc has much cleaner quoting than the Bourne shell, and
Plan 9 has helped by outlawing newlines in file names.  However,
why reincarnate the same problem in a different guise?  If 9P2000
servers can export arbitrary strings as file name components, but
the clients (e.g. the Plan 9 kernel) can't handle some of those
strings, then it will be impossible to write reliable client
programs.  Consider u9fs.  Since Unix allows nearly arbitrary file
names, it is quite easy now to create Unix files that you can't access
from a Plan 9 client.  If the protocol explicitly forbids funny
characters, then u9fs will have to be fixed to map those characters
in some way, or it can't claim to be a 9P implementation.  I think
that would be a desirable state of affairs.

Another way of putting this: in theory the protocol has no such
restriction, but in practice it does, and always will; therefore,
why not fix the theory to admit the practical restrictions?

>          An exception is the tag ~0, meaning `no tag': the
>          client can use it, when establishing a connection, to over-
>          ride tag matching in version and session messages.

This is poorly worded.  The Tsession page states that the tag must
be ~0; saying here that the client "can" use it makes it sound
optional.  Also, can a client use a tag of ~0 on a Tversion
transaction that is not the first transaction on a new connection?

This whole ~0 feature is undesirable and adds useless complexity.
Tsession could just as well require a tag of 0, since it flushes
all tags.  It could guarantee that the reply tag is 0.  And there
is no reason that Tversion can't or shouldn't be required to just
have a normal tag.  Then you can completely eliminate special cases
in servers (treatment of ~0 tags) and clients (the need to avoid
accidentally generating ~0 tags).
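
[Editorial sketch, not part of the original message.]  The client-side
special case amounts to a tag allocator that never hands out the
reserved value.  The class below is hypothetical and assumes 16-bit
tags with ~0 equal to 0xFFFF.

```python
NOTAG = 0xFFFF   # the reserved ~0 tag, assuming 16-bit tags

class TagPool:
    """Hand out tags in 0..0xFFFE, skipping NOTAG and tags still in use."""

    def __init__(self):
        self.in_use = set()
        self.next = 0

    def alloc(self):
        for _ in range(NOTAG):                      # at most one full sweep
            tag = self.next
            self.next = (self.next + 1) % NOTAG     # wraps 0xFFFE -> 0, never NOTAG
            if tag not in self.in_use:
                self.in_use.add(tag)
                return tag
        raise RuntimeError("no free tags")

    def free(self, tag):
        self.in_use.discard(tag)
```

The modulus is the whole of the special case: counting modulo 0xFFFF
rather than 0x10000 is what keeps the reserved value out of
circulation.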

>          The version message identifies the version of the protocol
>          and indicates the maximum message size the system is pre-
>          pared to handle.  A session request initializes a connection
>          and aborts all outstanding I/O on the connection.  The set
>          of messages between session requests is called a session.

I've always wondered why the session transaction has these abort
semantics.  It is easy to see the exchange of authentication data
as a justification for a required Tsession style message at the
beginning of a session, and it is also easy to see different protocol
versions might require different session messages, hence justifying
the existence of Tversion as a separate transaction required before
Tsession.

But I don't understand the point of the abort semantics.  My guess
is that it is intended to support some kind of persistent channel
to a server, analogous to a hard-wired serial port, where there is
no out-of-band concept of channel setup or teardown, unlike TCP,
where connection setup and teardown can be detected independently
of the bytes transmitted on the virtual circuit.  So, for example,
a client reboot could result in a new Tsession on such an imaginary
hardwired connection.

The problem with this idea is that it is insufficient to give
reliable behavior in the face of arbitrary client or server crashes
or misbehavior, since the 9P encoding (and the proposed 9P2000
encoding) provides no easy way to resynchronize with the byte stream
if you somehow lose track of transaction boundaries.

(Ok, can you tell that I've been implementing SONET recently? :-)

I would like to suggest that the "abort" semantics be removed from
Tsession.  Admit that an underlying transport protocol will always
need to exist, specify that only one Tsession message can ever be
sent during the lifetime of a connection, and specify that outstanding
transactions are aborted when the underlying protocol's connection
is shut down.

If I have misunderstood the point of the "abort" semantics of
Tsession, please explain why it's there.

>          The stat transaction retrieves information about the file.
>          The stat field in the reply includes the file's name, access
>          permissions (read, write and execute for owner, group and
>          public), access and modification times, and owner and group
>          identifications (see stat(2)). The owner and group identifi-
>          cations are textual names.  The wstat transaction allows
>          some of a file's properties to be changed.

Again, I would like to lobby for protocol-imposed restrictions on the
legal contents of owner and group names.

>     DIRECTORIES
>          Directories are created by create with DMDIR set in the per-
>          missions argument (see stat(5)). The members of a directory
>          can be found with read(5). All directories must support
>          walks to the directory .. (dot-dot) meaning parent direc-
>          tory, although by convention directories contain no explicit
>          entry for .. or . (dot).  The parent of the root directory
>          of a server's tree is itself.

If I walk to foo/bar/.. does the protocol require that I return to bar?
I.e. is the file hierarchy required to be strictly a tree?  It looks to me
like another one of those restrictions the protocol should impose for the
sanity of the client: without such a restriction, the "lexical names"
feature in the Plan 9 kernel could get hopelessly confused.

>     CLUNK(5)                                                 CLUNK(5)
>
>          Even if the clunk returns an error, the fid is no longer
>          valid.

What are plausible errors associated with Tclunk (other than
attempting to clunk an invalid fid)?  The only thing I could think
of would be deferred errors associated with earlier transactions
that were not detected until later, like media errors associated
with deferred writes.

I assume the intent of allowing errors on Tclunk is that the error
returned is returned as the result of the close() system call?
(Of course, not every Tclunk corresponds to a close()...)

>     ERROR(5)                                                 ERROR(5)
>
>          By convention, clients may truncate error messages after 255
>          bytes, defined as ERRMAX in <libc.h>.

Translation: the server ought to make sure the meat of the error message
fits into the first 255 bytes, otherwise the user of the client might
not see it.

>     READ(5)                                                   READ(5)

>          For directories, read returns an integral number of direc-
>          tory entries exactly as in stat (see stat(5)), one for each
>          member of the directory.  The read request message must have
>          offset equal to zero or the value of offset in the previous
>          read on the directory, plus the number of bytes returned in
>          the previous read.  In other words, seeking other than to
>          the beginning is illegal in a directory (see seek(2)).
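
[Editorial sketch, not part of the original message.]  The offset rule
in the excerpt above can be written as a small per-fid check.  The
class and method names are hypothetical; the only state needed is the
offset and byte count of the previous directory read.

```python
class DirReadState:
    """Track the one non-zero offset a directory read may legally use."""

    def __init__(self):
        self.next_offset = 0

    def is_legal(self, offset):
        # Legal offsets: zero (rewind) or exactly where the last read ended.
        return offset == 0 or offset == self.next_offset

    def record(self, offset, count):
        # Remember previous offset plus the number of bytes returned.
        self.next_offset = offset + count

state = DirReadState()
state.record(0, 130)          # first read returned 130 bytes of entries
print(state.is_legal(130))    # True
print(state.is_legal(64))     # False
print(state.is_legal(0))      # True
```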

What happens if I have a directory entry SomeReallyLongStupidFileNameFromJava
and I attempt to read() fewer than the bytes required for the
associated stat structure?  This could happen two ways: (A) the
client application might have just issued a really small read
request, or (B) the byte count of the directory entry might result
in the required size of the Rread message exceeding the negotiated
maximum transaction size between the 9P client and server.

Scenario (A) can always be handled at the client application level
by executing a seek to the beginning of the directory and rescanning
with a larger buffer.  (Or by just always using a 64K+1 read in the
first place, darnit.)  So scenario (A) is not a serious threat to
the integrity of the underlying protocol design.

Scenario (B) is bad.  There is no easy way for the client to recover.
Certainly the client application can do nothing about it: the protocol
connection is already established and the msize is fixed in stone.

At the protocol level one hypothetical solution might be for the server
to return some kind of error cookie that:

	1) Indicates there was a really long directory entry.

	2) Returns a new offset that the client can use to read beyond
	   the directory entry that didn't fit.  This is important--we
	   wouldn't want it to be possible to "hide" files behind
	   ReallyLongDirectoryEntries.

Another hypothetical solution: the server could have a notion of
a "truncated stat structure" that returns as much as will fit, plus
the real offset to the next directory entry.

Both of these possibilities are needlessly complex.  Better if
scenario (B) could never happen.

It would be easy for the server to ensure this by preventing such
files from ever being created in the first place -- except for one
tiny hitch.  That is that the server cannot exceed the client's
requested msize that was previously specified in Tversion.  So a
client that negotiates a too-small msize can make scenario (B)
possible.

Rather than adding a complex special case response to the server's
repertoire that all clients would have to know about, I'd prefer
to legislate this situation out of existence: add a "minimum maximum"
to Tversion: require that the smallest allowable msize that a client
can request is 64K + some slop, enough to hold an Rread containing
one worst-case stat structure.  Then the need for a way to recover
from scenario (B) is removed from the protocol.

If 64K+slop is unpalatably large, consider specifying a smaller
maximum possible stat record, say 8K-slop, so that the minimum msize
becomes 8K exactly.
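
[Editorial sketch, not part of the original message.]  The "slop" can
be made concrete under one assumption: that an Rread carries
size[4] type[1] tag[2] count[4] ahead of its data, 11 bytes in all.
Whether the worst-case stat entry is 65535 bytes or slightly more is
left as a parameter; the function names are hypothetical.

```python
RREAD_HEADER = 4 + 1 + 2 + 4   # size[4] type[1] tag[2] count[4], assumed layout

def min_msize_for_stat(max_stat_bytes):
    """Smallest msize whose Rread can carry one stat entry of this size."""
    return RREAD_HEADER + max_stat_bytes

def max_stat_for_msize(msize):
    """Largest stat entry that fits in a single Rread at this msize."""
    return msize - RREAD_HEADER

print(min_msize_for_stat(65535))   # 65546
print(max_stat_for_msize(8192))    # 8181
```

Under that assumption, an 8K minimum msize corresponds to capping stat
entries at 8181 bytes.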

>     STAT(5)                                                   STAT(5)

>          name[ s ]
>               file name; must be / if the file is the root directory
>               of the server

Not to beat on a dead horse, but other than this one exception, *please*
outlaw /'s in file names throughout the protocol.

>          Servers may implement a time-
>          out on the lock on an exclusive use file: if the fid holding
>          the file open has been unused for an extended period (of
>          order at least minutes), it is reasonable to break the lock
>          and deny the initial fid further I/O.

Consider an allowable minimum and a required maximum timeout?  This is
one of those situations where you know that whatever you specify will
be wrong, but it's still better to have a specification so that all
implementations will be broken in exactly the same way.

>          The two time fields are measured in seconds since the epoch
>          (Jan 1 00:00 1970 GMT).  The mtime field reflects the time
>          of the last change of content (except when later changed by
>          wstat).  For a plain file, mtime is the time of the most
>          recent create, open with truncation, or write; for a direc-
>          tory it is the time of the most recent remove, create, or
>          wstat of a file in the directory.  Similarly, the atime
>          field records the last read of the contents; also it is set
>          whenever mtime is set.  In addition, for a directory, it is
>          set by an attach, walk, or create, all whether successful or
>          not.

Consider changing the time fields to 64 bits.  2038 is not so far away.
Also for the benefit of programs like mk it would arguably be desirable
for timestamps to have finer granularity than 1 second in today's world
of very fast computers (although I suppose mk could detect "instantaneous"
commands by looking for changed qid.versions).  Say 1 microsecond?
64 bits offers a lot of room...

>          The wstat request can change some of the file status infor-
>          mation.  [...]  The length can be
>          changed (affecting the actual length of the file) by anyone
>          with write permission on the file.  It is an error to
>          attempt to set the length of a directory to a non-zero
>          value, and servers may decide to reject length changes for
>          other reasons.

Assuming the server does *not* reject truncation of a directory to
length 0, should a client assume that all files under the directory
have been removed? This is another one of those possible complications
that I think should be eliminated by specifying them out of the
protocol: always reject attempts by wstat to change the length of
a directory.

>	   None
>          of the other data can be altered by a wstat.  In particular,
>          there is no way to change the owner of a file.

This is not true in existing implementations: for example, with
"disk/kfscmd allow", I can change file ownership.  Moreover this
is a necessary feature for system administration to ensure that
system files have the right owners.  I would argue that the protocol
allows you to request a change of ownership, and that it is at the
server's discretion whether to allow or reject, according to the
security policy of the server, which should not be considered part
of the protocol.

In fact, I would go a bit further: the whole concept of "group leaders"
is a weird Plan 9 thing that is not true on, say, a Unix-based server.
So it should also be at the server's discretion whether to accept
or reject group changes, again according to a security policy that
is considered outside the scope of the protocol.

Changes in ownership, group, or permissions that are refused should
always result in an Rerror.  (Alright, I see you've covered that
later in the "all or nothing" clause...)

(And the discussion of the main Plan 9 file server's security policy
should really be on some other manual pages than the definition of 9P.)

Now at this point I suppose you'll jump on me and argue that here
I am arguing for server-dependent variations in behavior, whereas
above (on file names, owner/group names, and the meaning of ..) I
was arguing for required uniform behavior across all servers.  The
reason is that here I consider implementation-dependent variations
less harmful, since relatively few programs normally want to mess
with file ownership, and those that do have a reasonable expectation
of the operations failing anyway.  In contrast, non-uniform rules
for allowable file, owner, and group names or the meaning of ..
would pervasively break a whole lot of programs, like any script
that wants to parse the output of "ls -l" or expects "cd .." to go
somewhere reliable.

>          Note that since the stat information is sent as a 9P
>          variable-length datum, it is limited to a maximum of 65535
>          bytes.

So what should happen if I use Tcreate to create a file name that
is so long that the stat structure associated with the file would
exceed 64k-1 bytes?

I would argue that the Tcreate man page should explicitly say such
requests must always fail.

>     VERSION(5)                                             VERSION(5)
>
>     NAME
>          version - negotiate protocol version
>
>     SYNOPSIS
>          Tversion size[4] tag[2] msize[4] version[s]
>          Rversion size[4] tag[2] msize[4] version[s]
>
>     DESCRIPTION
>          The version request negotiates the protocol version and mes-
>          sage size to be used on the connection.  Tversion must be
>          the first message sent on the 9P connection, and the client
>          cannot issue any further requests until it has received the
>          Rversion reply.

Can you issue another Tversion later?  I would argue that it should
be explicitly prohibited, even more strongly than I previously argued
that multiple Tsessions should be prohibited.

>          The client suggests a maximum message size, msize, that is
>          the maximum length, in bytes, it will ever generate or
>          expect to receive in a single 9P message.

As previously mentioned, please specify a minimum msize that a client
is allowed to request, and make the largest possible stat record
consistent with the value of this minimal msize.

>     WALK(5)                                                   WALK(5)

Interesting: this subsumes the old "clwalk", and also subsumes the old
"clone" via the subterfuge of zero-element walks.

>          The element ``..''  (dot-dot) represents the parent direc-
>          tory.  The name ``.''  (dot), meaning the current directory,
>          is not used in the protocol.
>
>          It is legal for nwname to be zero, in which case newfid will
>          represent the same file as fid and the walk will usually
>          succeed; this is equivalent to walking to dot.  The rest of
>          this discussion assumes nwname is greater than zero.

Do these two paragraphs taken together mean that when the mnt(3) device
sees the name "foo/./bar", it is expected to generate
walk("foo", "", "bar")?  Or is it expected to generate walk("foo", "bar")?

I would argue that walk("") should be simply disallowed: if the
mnt(3) device needs to elide walks to ".", it might as well
elide walks to "" too; that way you can eliminate a special
case that would otherwise need to be explicitly coded in all servers.
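
A minimal sketch of that elision, in Python for brevity.  This is a
hypothetical client-side helper, not a description of what mnt(3) actually
does: it turns a slash-separated name into the list of walk elements and
drops both "." and "" so the server never sees either.

```python
def walk_elements(path):
    """Split a slash-separated path into walk elements, dropping "." and ""."""
    return [elem for elem in path.split("/") if elem not in ("", ".")]

print(walk_elements("foo/./bar"))    # ['foo', 'bar']
print(walk_elements("foo//bar/."))   # ['foo', 'bar']
print(walk_elements("."))            # []  (a zero-element walk)
```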

>          If the first element cannot be walked for any reason, Rerror
>          is returned.  Otherwise, the walk will return an Rwalk mes-
>          sage containing nqid qids corresponding, in order, to the
>          files that are visited by the nqid successful elementwise
>          walks; nqid is therefore either nwname or the index of the
>          first elementwise walk that failed.  The value of nqid can-
>          not be zero unless nwname is zero.  Also, nqid will always
>          be less than or equal to nwname.  Only if it is equal, how-
>          ever, will newfid be affected, in which case it will repre-
>          sent the file reached by the final elementwise walk
>          requested in the message.

If the walk operation fails, does newfid exist (and point to the
same qid as fid), or is it implicitly clunked?

My suggestion: If the walk fails, newfid should be implicitly
clunked unless it was equal to fid.
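
To pin down what is being asked, here is a toy model in Python of the quoted
rules plus the suggestion above.  Everything in it is an assumption made for
illustration: the file tree is a dict from path tuples to integer qids, fids
are a dict from fid numbers to path tuples, and LookupError stands in for
Rerror.

```python
def walk(tree, fids, fid, newfid, names):
    """Walk names one element at a time, starting from the file fid refers to.

    Returns the qids of the elements walked successfully.  newfid is set only
    when every element was walked; on any failure it is dropped unless it is
    the same as fid (the suggested rule, not something the draft states).
    """
    path = fids[fid]
    qids = []
    for name in names:
        # ".." steps to the parent; the parent of the root () is itself.
        step = path[:-1] if name == ".." else path + (name,)
        if step not in tree:
            break
        path = step
        qids.append(tree[path])
    if len(qids) == len(names):
        fids[newfid] = path
    else:
        if newfid != fid:
            fids.pop(newfid, None)
        if not qids:
            raise LookupError("first element cannot be walked")
    return qids

tree = {(): 0, ("usr",): 1, ("usr", "rob"): 2}
fids = {0: ()}
print(walk(tree, fids, 0, 1, ["usr", "rob"]))     # [1, 2]
print(walk(tree, fids, 0, 2, ["usr", "nobody"]))  # [1]
print(2 in fids)                                  # False
```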

* Re: [9fans] 9P2000
  2001-01-27 21:58 rob pike
@ 2001-01-28 16:29 ` Sam Ducksworth
  2001-01-30  9:21 ` Mike Haertel
  1 sibling, 0 replies; 10+ messages in thread
From: Sam Ducksworth @ 2001-01-28 16:29 UTC (permalink / raw)
  To: 9fans

rob pike wrote:

>
> distribution.  That is the real reason things have seemed quiet
> lately, not the Lucent announcements.
>
> -rob
>

rob,

thanks for the update. sorry for being so paranoid.

sam




* [9fans] 9P2000
@ 2001-01-27 21:58 rob pike
  2001-01-28 16:29 ` Sam Ducksworth
  2001-01-30  9:21 ` Mike Haertel
  0 siblings, 2 replies; 10+ messages in thread
From: rob pike @ 2001-01-27 21:58 UTC (permalink / raw)
  To: 9fans

I've been thinking of sending this information to 9fans for a while.
Since the cat is out of the bag, now is as good a time as any.  We
have reworked 9P to address many of its failings, most important:

1)	Nesting and encapsulation: exportfs embeds 9P within 9P,
	which can make reads and writes not fit within the 8K limit.

2)	Walk performance: it takes too many walks to evaluate a
	name.

3)	Sizes fixed and too small: read/write sizes and, most
	important, path name elements have limited, too-small sizes.

4)	Authentication too rigid: the authentication protocols were
	defined in the protocol and so impossible to change.

And a host of other lesser things.

We have a file server and kernel running this protocol now and have
adapted much but not all of our stuff; it's not yet the system we live
with.  Comments on the following man pages are welcome.  I've included
all of section 5 (9P itself, now 9P2000) and some relevant parts of
section 2.  Directory handling is very different, for example.

There may be many errors in these pages and many details are sure to
change before we're done.

Until this stuff gets installed and a lot of shaking down has
happened, there won't be much in the way of updates to the existing
distribution.  That is the real reason things have seemed quiet
lately, not the Lucent announcements.

-rob

     INTRO(5)                                                 INTRO(5)

     NAME
          intro - introduction to the Plan 9 File Protocol, 9P

     SYNOPSIS
          #include <fcall.h>

     DESCRIPTION
          A Plan 9 server is an agent that provides one or more hier-
          archical file systems - file trees - that may be accessed by
          Plan 9 processes.  A server responds to requests by clients
          to navigate the hierarchy, and to create, remove, read, and
          write files.  The prototypical server is a separate machine
          that stores large numbers of user files on permanent media;
          such a machine is called, somewhat confusingly, a file
          server. Another possibility for a server is to synthesize
          files on demand, perhaps based on information on data struc-
          tures inside the kernel; the proc(3) kernel device is a part
          of the Plan 9 kernel that does this.  User programs can also
          act as servers.

          A connection to a server is a bidirectional communication
          path from the client to the server.  There may be a single
          client or multiple clients sharing the same connection.  A
          server's file tree is attached to a process group's name
          space by bind(2) and mount calls; see intro(2). Processes in
          the group are then clients of the server: system calls oper-
          ating on files are translated into requests and responses
          transmitted on the connection to the appropriate service.

          The Plan 9 File Protocol, 9P, is used for messages between
          clients and servers. A client transmits requests (T-
          messages) to a server, which subsequently returns replies
          (R-messages) to the client.  The combined acts of transmit-
          ting (receiving) a request of a particular type, and receiv-
          ing (transmitting) its reply is called a transaction of that
          type.

          Each message consists of a sequence of bytes.  Two-, four-,
          and eight-byte fields hold unsigned integers represented in
          little-endian order (least significant byte first).  Data
          items of larger or variable lengths are represented by a
          two-byte field specifying a count, n, followed by n bytes of
          data.  Text strings are represented this way, with the text
          itself stored as a UTF-8 encoded sequence of Unicode charac-
          ters (see utf(6)). Text strings in 9P messages are not NUL-
          terminated: n counts the bytes of UTF-8 data, which include
          no final zero byte.  The NUL character is illegal in all
          text strings in 9P, and is therefore excluded from file
          names, user names, and so on.
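
The encoding rules in this paragraph can be sketched in a few lines of
Python (chosen only for brevity; the helper names put_string and get_string
are made up for this illustration and are not the fcall(2) routines):

```python
import struct

def put_string(text):
    """Encode a 9P text string: two-byte little-endian count, then UTF-8 bytes."""
    data = text.encode("utf-8")
    if b"\x00" in data:
        raise ValueError("NUL is illegal in 9P text strings")
    if len(data) > 0xFFFF:
        raise ValueError("string too long for a two-byte count")
    return struct.pack("<H", len(data)) + data

def get_string(buf, offset=0):
    """Decode a string at offset; return (text, offset just past the string)."""
    (n,) = struct.unpack_from("<H", buf, offset)
    start = offset + 2
    return buf[start:start + n].decode("utf-8"), start + n

print(put_string("9P2000"))             # b'\x06\x009P2000'
print(struct.pack("<I", 8192).hex())    # 00200000 (least significant byte first)
```

Note that the count is in bytes of UTF-8, not characters, and that no
terminating zero byte is written.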

     Page 1                       Plan 9             (printed 1/27/00)

          Each 9P message begins with a four-byte size field specify-
          ing the length in bytes of the complete message including
          the four bytes of the size field itself.  The next byte is
          the message type, one of the constants in the enumeration in
          the include file <fcall.h>.  The remaining bytes are parame-
          ters of different sizes.  In the message descriptions below,
          the number of bytes in a field is given in brackets after
          the field name.  The notation parameter[n] where n is not a
          constant represents a variable-length parameter: n[2] fol-
          lowed by n bytes of data forming the parameter. The notation
          string[s] (using a literal s character) is shorthand for
          s[2] followed by s bytes of UTF-8 text.  (Systems may choose
          to reduce the set of legal characters to reduce syntactic
          problems, for example to remove slashes from name compo-
          nents, but the protocol has no such restriction.  Plan 9
          names may contain any printable character (that is, any
          character outside hexadecimal 00-1F and 80-9F) except
          slash.)  Messages are transported in byte form to allow for
          machine independence; fcall(2) describes routines that con-
          vert to and from this form into a machine-dependent C struc-
          ture.
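
As an illustration of the framing rule, the following Python sketch splits a
byte stream into messages using only the size field.  It assumes the two-byte
tag follows the type byte directly; the helper names are made up here and the
type number used in any test data is arbitrary, since the numeric constants
live in <fcall.h> and are not listed in this page.

```python
import struct

def split_messages(stream):
    """Split a byte buffer into complete 9P messages using the size field.

    Returns (messages, leftover); leftover holds any trailing partial message.
    """
    messages = []
    pos = 0
    while len(stream) - pos >= 4:
        (size,) = struct.unpack_from("<I", stream, pos)
        if size < 4:
            raise ValueError("size must count its own four bytes")
        if len(stream) - pos < size:
            break                      # rest of this message not yet received
        messages.append(stream[pos:pos + size])
        pos += size
    return messages, stream[pos:]

def header(msg):
    """Return (size, type, tag) from the first seven bytes of a message."""
    return struct.unpack_from("<IBH", msg, 0)
```

A reader loop would append newly received bytes to the leftover and call
split_messages again.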

     MESSAGES
               Tversion  size[4] tag[2] msize[4] version[s]
               Rversion  size[4] tag[2] msize[4] version[s]

               Tsession  size[4] tag[2] chal[n]
               Rsession  size[4] tag[2] chal[n] authid[s] authdom[s]

               Rerror    size[4] tag[2] ename[s]

               Tflush    size[4] tag[2] oldtag[4]
               Rflush    size[4] tag[2]

               Tattach   size[4] tag[2] fid[4] uname[s] aname[s]
                         auth[n]
               Rattach   size[4] tag[2] qid[13] rauth[n]

               Twalk     size[4] tag[2] fid[4] newfid[4] nwname[2]
                         nwname*(wname[s])
               Rwalk     size[4] tag[2] nwqid[2] nwqid*(wqid[13])

               Topen     size[4] tag[2] fid[4] mode[1]
               Ropen     size[4] tag[2] qid[13] iounit[4]

               Tcreate   size[4] tag[2] fid[4] name[s] perm[4] mode[1]
               Rcreate   size[4] tag[2] qid[13] iounit[4]

               Tread     size[4] tag[2] fid[4] offset[8] count[4]
               Rread     size[4] tag[2] count[4] data[count]

               Twrite    size[4] tag[2] fid[4] offset[8] count[4]
                         data[count]
               Rwrite    size[4] tag[2] count[4]

               Tclunk    size[4] tag[2] fid[4]
               Rclunk    size[4] tag[2]

               Tremove   size[4] tag[2] fid[4]
               Rremove   size[4] tag[2]

               Tstat     size[4] tag[2] fid[4]
               Rstat     size[4] tag[2] stat[n]

               Twstat    size[4] tag[2] fid[4] stat[n]
               Rwstat    size[4] tag[2]

          Each T-message has a tag field, chosen and used by the
          client to identify the message.  The reply to the message
          will have the same tag.  Clients must arrange that no two
          outstanding messages on the same connection have the same
          tag.  An exception is the tag ~0, meaning `no tag': the
          client can use it, when establishing a connection, to over-
          ride tag matching in version and session messages.
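
One way a client might meet the tag rule is sketched below in Python.  The
TagPool class is hypothetical; the only facts taken from the text are that
tags are two bytes wide and that ~0 (0xFFFF in two bytes) is reserved.

```python
NOTAG = 0xFFFF  # ~0 in a two-byte tag field

class TagPool:
    """Hand out tags so that no two outstanding requests share one."""

    def __init__(self):
        self.in_use = set()

    def alloc(self):
        for tag in range(NOTAG):       # 0 .. 0xFFFE; NOTAG itself is reserved
            if tag not in self.in_use:
                self.in_use.add(tag)
                return tag
        raise RuntimeError("all tags are outstanding")

    def release(self, tag):
        """Call once the reply carrying this tag has arrived."""
        self.in_use.remove(tag)
```

A linear scan is the simplest choice; a busy client would more likely keep a
free list.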

          The type of an R-message will either be one greater than the
          type of the corresponding T-message or Rerror, indicating
          that the request failed.  In the latter case, the ename
          field contains a string describing the reason for failure.

          The version message identifies the version of the protocol
          and indicates the maximum message size the system is pre-
          pared to handle.  A session request initializes a connection
          and aborts all outstanding I/O on the connection.  The set
          of messages between session requests is called a session.

          Most T-messages contain a fid, a 32-bit unsigned integer
          that the client uses to identify a ``current file'' on the
          server.  Fids are somewhat like file descriptors in a user
          process, but they are not restricted to files open for I/O:
          directories being examined, files being accessed by stat(2)
          calls, and so on - all files being manipulated by the oper-
          ating system - are identified by fids.  Fids are chosen by
          the client.  All requests on a connection share the same fid
          space; when several clients share a connection, the agent
          managing the sharing must arrange that no two clients choose
          the same fid.

          The first fid supplied (in an attach message) will be taken
          by the server to refer to the root of the served file tree.
          The attach identifies the user to the server and may specify
          a particular file tree served by the server (for those that
          supply more than one).  A walk message causes the server to
          change the current file associated with a fid to be a file
          in the directory that is the old current file, or one of its
          subdirectories.  Walk returns a new fid that refers to the
          resulting file.  Usually, a client maintains a fid for the
          root, and navigates by walks from the root fid.

          A client can send multiple T-messages without waiting for
          the corresponding R-messages, but all outstanding T-messages
          must specify different tags.  The server may delay the
          response to a request on one fid and respond to later
          requests on other fids; this is sometimes necessary, for
          example when the client reads from a file that the server
          synthesizes from external events such as keyboard charac-
          ters.

          Replies (R-messages) to attach, walk, open, and create
          requests convey a qid field back to the client.  The qid
          represents the server's unique identification for the file
          being accessed: two files on the same server hierarchy are
          the same if and only if their qids are the same.  (The
          client may have multiple fids pointing to a single file on a
          server and hence having a single qid.)  The seventeen-byte
          qid fields hold a one-byte type, specifying whether the file
          is a directory, append-only file, etc., and two eight-byte
          unsigned integers: first the qid path, then the qid version.
          The path is an integer unique among all files in the hierar-
          chy.  If a file is deleted and recreated with the same name
          in the same directory, the old and new path components of
          the qids should be different.  The version is a version num-
          ber for a file; typically, it is incremented every time the
          file is modified.

          An existing file can be opened, or a new file may be created
          in the current (directory) file.  I/O of a given number of
          bytes at a given offset on an open file is done by read and
          write.

          A client should clunk any fid that is no longer needed.  The
          remove transaction deletes files.

          The stat transaction retrieves information about the file.
          The stat field in the reply includes the file's name, access
          permissions (read, write and execute for owner, group and
          public), access and modification times, and owner and group
          identifications (see stat(2)). The owner and group identifi-
          cations are textual names.  The wstat transaction allows
          some of a file's properties to be changed.

          A request can be aborted with a Tflush request.  When a
          server receives a Tflush, it should not reply to the message
          with tag oldtag (unless it has already replied), and it
          should immediately send an Rflush.  The client must wait
          until it gets the Rflush (even if the reply to the original
          message arrives in the interim), at which point oldtag may
          be reused.

          Most programs do not see the 9P protocol directly; instead
          calls to library routines that access files are translated
          by the mount driver, mnt(3), into 9P messages.

     DIRECTORIES
          Directories are created by create with DMDIR set in the per-
          missions argument (see stat(5)). The members of a directory
          can be found with read(5). All directories must support
          walks to the directory .. (dot-dot) meaning parent direc-
          tory, although by convention directories contain no explicit
          entry for .. or . (dot).  The parent of the root directory
          of a server's tree is itself.

     ACCESS PERMISSIONS
          Each file server maintains a set of user and group names.
          Each user can be a member of any number of groups.  Each
          group has a group leader who has special privileges (see
          stat(5) and users(6)). Every file request has an implicit
          user id (copied from the original attach) and an implicit
          set of groups (every group of which the user is a member).

          Each file has an associated owner and group id and three
          sets of permissions: those of the owner, those of the group,
          and those of ``other'' users.  When the owner attempts to do
          something to a file, the owner, group, and other permissions
          are consulted, and if any of them grant the requested per-
          mission, the operation is allowed.  For someone who is not
          the owner, but is a member of the file's group, the group
          and other permissions are consulted.  For everyone else, the
          other permissions are used.  Each set of permissions says
          whether reading is allowed, whether writing is allowed, and
          whether executing is allowed.  A walk in a directory is
          regarded as executing the directory, not reading it.  Per-
          missions are kept in the low-order bits of the file mode:
          owner read/write/execute permission represented as 1 in bits
          8, 7, and 6 respectively (using 0 to number the low order).
          The group permissions are in bits 5, 4, and 3, and the other
          permissions are in bits 2, 1, and 0.
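
The checking rule above might be sketched as follows in Python.  The function
and its arguments are made up for illustration; it checks one permission at a
time and reads the text literally, so the owner is also granted whatever the
group or other triples allow.

```python
READ, WRITE, EXEC = 4, 2, 1   # positions within one rwx triple

def allowed(mode, want, user, owner, group, user_groups):
    """Return True if user may do want (one of READ, WRITE, EXEC) to the file."""
    triples = [mode & 7]                       # other: bits 2, 1, 0
    if user == owner or group in user_groups:
        triples.append((mode >> 3) & 7)        # group: bits 5, 4, 3
    if user == owner:
        triples.append((mode >> 6) & 7)        # owner: bits 8, 7, 6
    return any(t & want for t in triples)

# Mode 0640: owner may read and write, group may read, others get nothing.
print(allowed(0o640, WRITE, "rob", "rob", "sys", {"sys"}))   # True
print(allowed(0o640, WRITE, "ken", "rob", "sys", {"sys"}))   # False
print(allowed(0o640, READ, "guest", "rob", "sys", set()))    # False
```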

          The file mode contains some additional attributes besides
          the permissions.  If bit 31 is set, the file is a directory;
          if bit 30 is set, the file is append-only (offset is ignored
          in writes); if bit 29 is set, the file is exclusive-use
          (only one client may have it open at a time).  These bits
          are reproduced, from the top bit down, in the type byte of
          the Qid.
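
Reading "from the top bit down" as "bit 31 of the mode becomes bit 7 of the
type byte, and so on" (an interpretation, not something this page spells
out), the mapping is a shift by 24.  DMDIR is named elsewhere in these pages;
DMAPPEND and DMEXCL are the corresponding names as I recall them from the
Plan 9 headers.

```python
DMDIR    = 1 << 31   # directory
DMAPPEND = 1 << 30   # append-only
DMEXCL   = 1 << 29   # exclusive-use

def qid_type(mode):
    """Copy mode bits 31, 30, 29 into bits 7, 6, 5 of a one-byte qid type."""
    return (mode >> 24) & 0xE0

print(hex(qid_type(DMDIR | 0o775)))              # 0x80
print(hex(qid_type(DMAPPEND | DMEXCL | 0o600)))  # 0x60
print(hex(qid_type(0o644)))                      # 0x0
```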

     ATTACH(5)                                               ATTACH(5)

     NAME
          attach, session - messages to initiate activity

     SYNOPSIS
          Tsession  size[4] tag[2] chal[n]
          Rsession  size[4] tag[2] chal[n] authid[s] authdom[s]

          Tattach   size[4] tag[2] fid[4] uid[s] aname[s] auth[n]
          Rattach   size[4] tag[2] qid[13] rauth[n]

     DESCRIPTION
          The session request initializes a connection between a
          client and a server and exchanges authentication informa-
          tion.  All outstanding I/O on the connection is aborted.
          The set of messages between session requests is called a
          session. The host's user name (authid) and its authentica-
          tion domain (authdom) identify the key to be used when
          authenticating to this host.  The exchanged challenges
          (chal) are used in the authentication algorithm.  If authid
          is an empty string no authentication is performed in this
          session.

          The tag should be NOTAG (value ~0) for a session message.

          The attach message serves as a fresh introduction from a
          user on the client machine to the server.  The message iden-
          tifies the user (uid) and may select the file tree to access
          (aname).  The auth argument contains authorization data
          derived from the exchanged challenges of the session mes-
          sage; see auth(6).

          As a result of the attach transaction, the client will have
          a connection to the root directory of the desired file tree,
          represented by fid. An error is returned if fid is already
          in use.  The server's idea of the root of the file tree is
          represented by the returned qid.

     ENTRY POINTS
          An attach transaction will be generated for kernel devices
          (see intro(3)) when a system call evaluates a file name
          beginning with `#'.  Pipe(2) generates an attach on the ker-
          nel device pipe(3). The mount system call (see bind(2)) gen-
          erates an attach message to the remote file server.  When
          the kernel boots, an attach is made to the root device,
          root(3), and then an attach is made to the requested file
          server machine.

     SEE ALSO
          version(5), auth(6)

     CLUNK(5)                                                 CLUNK(5)

     NAME
          clunk - forget about a fid

     SYNOPSIS
          Tclunk  size[4] tag[2] fid[4]
          Rclunk  size[4] tag[2]

     DESCRIPTION
          The clunk request informs the file server that the current
          file represented by fid is no longer needed by the client.
          The actual file is not removed on the server unless the fid
          had been opened with ORCLOSE.

          Once a fid has been clunked, the same fid can be reused in a
          new walk or attach request.

          Even if the clunk returns an error, the fid is no longer
          valid.

     ENTRY POINTS
          A clunk message is generated by close and indirectly by
          other actions such as failed open calls.

     ERROR(5)                                                 ERROR(5)

     NAME
          error - return an error

     SYNOPSIS
          Rerror  size[4] tag[2] ename[s]

     DESCRIPTION
          The Rerror message (there is no Terror) is used to return an
          error string describing the failure of a transaction.  It
          replaces the corresponding reply message that would accom-
          pany a successful call; its tag is that of the request.

          By convention, clients may truncate error messages after 255
          bytes, defined as ERRMAX in <libc.h>.

     FLUSH(5)                                                 FLUSH(5)

     NAME
          flush - abort a message

     SYNOPSIS
          Tflush  size[4] tag[2] oldtag[4]
          Rflush  size[4] tag[2]

     DESCRIPTION
          When the response to a request is no longer needed, such as
          when a user interrupts a process doing a read(2), a Tflush
          request is sent to the server to purge the pending response.
          The message being flushed is identified by oldtag. The
          semantics of flush depends on messages arriving in order.

          The server must answer the flush message immediately.  If it
          recognizes oldtag as the tag of a pending transaction, it
          should abort any pending response and discard that tag.  In
          either case, it should respond with an Rflush echoing the
          tag (not oldtag) of the Tflush message.  A Tflush can never
          be responded to by an Rerror message.

          When the client sends a Tflush, it must wait to receive the
          corresponding Rflush before reusing oldtag for subsequent
          messages.  If a response to the flushed request is received
          before the Rflush, the client must honor the response as if
          it had not been flushed, since the completed request may
          signify a state change in the server.  For instance, Tcreate
          will have created a file and Twalk may have allocated a fid.
          If no response is received before the Rflush, the flushed
          transaction is considered to have been canceled, and should
          be treated as though it had never been sent.
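
The client's side of this rule can be sketched as a small decision function
in Python.  The representation of replies as (type name, tag) pairs and the
function name are assumptions made for the example; replies for unrelated
tags are simply skipped here, whereas a real client would dispatch them.

```python
def settle_flush(replies, oldtag, flushtag):
    """Decide the fate of a flushed request from the replies that follow.

    replies yields (type_name, tag) pairs in arrival order.  Returns
    "completed" if the reply to oldtag arrives before the Rflush, otherwise
    "canceled".  Once this returns, oldtag may be reused.
    """
    outcome = "canceled"
    for type_name, tag in replies:
        if tag == flushtag and type_name == "Rflush":
            return outcome
        if tag == oldtag:
            outcome = "completed"   # honor it: the server did the work
    raise ConnectionError("connection ended before Rflush arrived")

print(settle_flush([("Rcreate", 5), ("Rflush", 6)], oldtag=5, flushtag=6))  # completed
print(settle_flush([("Rread", 9), ("Rflush", 6)], oldtag=5, flushtag=6))    # canceled
```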

          Several exceptional conditions are handled correctly by the
          above specification: sending multiple flushes for a single
          tag, flushing after a transaction is completed, flushing a
          Tflush, and flushing an invalid tag.

     OPEN(5)                                                   OPEN(5)

     NAME
          open, create - prepare a fid for I/O on an existing or new
          file

     SYNOPSIS
          Topen    size[4] tag[2] fid[4] mode[1]
          Ropen    size[4] tag[2] qid[13] iounit[4]

          Tcreate  size[4] tag[2] fid[4] name[s] perm[4] mode[1]
          Rcreate  size[4] tag[2] qid[13] iounit[4]

     DESCRIPTION
          The open request asks the file server to check permissions
          and prepare a fid for I/O with subsequent read and write
          messages.  The mode field determines the type of I/O: 0, 1,
          2, and 3 mean read access, write access, read and write
          access, and execute access, to be checked against the per-
          missions for the file.  In addition, if mode has the OTRUNC
          (0x10) bit set, the file is to be truncated, which requires
          write permission (if the file is append-only, and permission
          is granted, the open succeeds but the file will not be trun-
          cated); if the mode has the ORCLOSE (0x40) bit set, the file
          is to be removed when the fid is clunked, which requires
          permission to remove the file from its directory.  If other
          bits are set in mode they will be ignored.  It is illegal to
          write a directory, truncate it, or attempt to remove it on
          close.  If the file is marked for exclusive use (see
          stat(5)), only one client can have the file open at any
          time.  That is, after such a file has been opened, further
          opens will fail until fid has been clunked.  All these per-
          missions are checked at the time of the open request; subse-
          quent changes to the permissions of files do not affect the
          ability to read, write, or remove an open file.
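
A short Python sketch of how the mode byte described above breaks down.
OTRUNC and ORCLOSE and their values come from the text; the names OREAD,
OWRITE, ORDWR and OEXEC for the values 0 to 3 are the usual ones from
open(2), and the helper functions are made up for the example.

```python
OREAD, OWRITE, ORDWR, OEXEC = 0, 1, 2, 3   # low two bits: type of I/O
OTRUNC  = 0x10                             # truncate the file on open
ORCLOSE = 0x40                             # remove when the fid is clunked

def decode_mode(mode):
    """Split a one-byte open mode into (access, truncate, remove_on_close)."""
    access = ("read", "write", "read and write", "execute")[mode & 3]
    return access, bool(mode & OTRUNC), bool(mode & ORCLOSE)

def needs_write_permission(mode):
    """Write permission is needed for write access and for truncation."""
    return (mode & 3) in (OWRITE, ORDWR) or bool(mode & OTRUNC)

print(decode_mode(OWRITE | OTRUNC))             # ('write', True, False)
print(needs_write_permission(OREAD | OTRUNC))   # True
```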

          The create request asks the file server to create a new file
          with the name supplied, in the directory (dir) represented
          by fid, and requires write permission in the directory.  The
          owner of the file is the implied user id of the request, the
          group of the file is the same as dir, and the permissions
          are the value of
                    perm & (~0666 | (dir.perm & 0666))
          if a regular file is being created and
                    perm & (~0777 | (dir.perm & 0777))
          if a directory is being created.  This means, for example,
          that if the create allows read permission to others, but the
          containing directory does not, then the created file will
          not allow others to read the file.
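
The two formulas can be tried out directly; the Python below is only a
transcription of them, with create_perm a made-up name.  The directory case
is chosen by testing the DMDIR bit (0x80000000) in perm.

```python
DMDIR = 0x80000000

def create_perm(perm, dir_perm):
    """Permissions given to a newly created file or directory."""
    mask = 0o777 if perm & DMDIR else 0o666
    return perm & (~mask | (dir_perm & mask))

print(oct(create_perm(0o666, 0o775)))           # 0o664
print(oct(create_perm(0o666, 0o750)))           # 0o640
print(hex(create_perm(DMDIR | 0o777, 0o750)))   # 0x800001e8, i.e. DMDIR|0750
```

In the second call the request allows others to read, the directory does
not, and the result drops that bit, as the paragraph above describes.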

          Finally, the newly created file is opened according to mode,
          and fid will represent the newly opened file.  Mode is not
          checked against the permissions in perm. The qid for the new
          file is returned with the create reply message.

          Directories are created by setting the DMDIR bit
          (0x80000000) in the perm.

          The names . and .. are special; it is illegal to create
          files with these names.

          It is an error for either of these messages if the fid is
          already the product of a successful open or create message.

          An attempt to create a file in a directory where the given
          name already exists will be rejected; in this case, the
          create system call (see open(2)) uses open with truncation.
          The algorithm used by the create system call is: first walk
          to the directory to contain the file.  If that fails, return
          an error.  Next walk to the specified file.  If the walk
          succeeds, send a request to open and truncate the file and
          return the result, successful or not.  If the walk fails,
          send a create message.  If that fails, it may be because the
          file was created by another process after the previous walk
          failed, so (once) try the walk and open again.
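
          The create algorithm just described can be sketched as
          follows.  The four callbacks stand in for the 9P
          transactions and are hypothetical, as are the names
          CreateOps, create_call, and the toy FakeDir server used to
          exercise the logic; none of this is kernel source.

```c
/* Hypothetical callbacks standing in for the 9P transactions named in the
 * text; each returns 0 on success and -1 on error. */
typedef struct CreateOps {
    int (*walk_dir)(void *ctx);    /* walk to the containing directory */
    int (*walk_file)(void *ctx);   /* walk to the named file */
    int (*open_trunc)(void *ctx);  /* open the file with truncation */
    int (*create)(void *ctx);      /* send a create message */
} CreateOps;

int create_call(const CreateOps *ops, void *ctx)
{
    if (ops->walk_dir(ctx) < 0)
        return -1;
    if (ops->walk_file(ctx) == 0)
        return ops->open_trunc(ctx);   /* file exists: open and truncate */
    if (ops->create(ctx) == 0)
        return 0;
    /* Another process may have created the file after the failed walk,
     * so try the walk and open once more. */
    if (ops->walk_file(ctx) == 0)
        return ops->open_trunc(ctx);
    return -1;
}

/* Toy in-memory "server" used only to exercise the sketch. */
typedef struct FakeDir {
    int dir_ok;     /* containing directory can be walked */
    int exists;     /* file currently exists */
    int racer;      /* another process creates the file just before us */
    int created;    /* set when our create succeeded */
    int truncated;  /* set when open-with-truncate ran */
} FakeDir;

static int fake_walk_dir(void *c)   { return ((FakeDir *)c)->dir_ok ? 0 : -1; }
static int fake_walk_file(void *c)  { return ((FakeDir *)c)->exists ? 0 : -1; }
static int fake_open_trunc(void *c) { ((FakeDir *)c)->truncated = 1; return 0; }

static int fake_create(void *c)
{
    FakeDir *d = c;
    if (d->racer) {            /* simulate losing the race */
        d->racer = 0;
        d->exists = 1;
    }
    if (d->exists)
        return -1;             /* name already exists: create is rejected */
    d->exists = 1;
    d->created = 1;
    return 0;
}

const CreateOps fake_ops = {
    fake_walk_dir, fake_walk_file, fake_open_trunc, fake_create
};
```

          With racer set, the create message fails, the second walk
          succeeds, and the call falls back to open with truncation,
          which is the single retry the text describes.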

          For the behavior of create on a union directory, see
          bind(2).

          The iounit field returned by open and create may be zero.
          If it is not, it is the maximum number of bytes that are
          guaranteed to be read from or written to the file without
          breaking the I/O transfer into multiple 9P messages; see
          read(5).

     ENTRY POINTS
          Open and create both generate open messages; only create
          generates a create message.

          For programs that need atomic file creation, without the
          race that exists in the open-create sequence described
          above, the kernel does the following.  If the OEXCL (0x1000)
          bit is set in the mode for a create system call, the open
          message is not sent; the kernel issues only the create.
          Thus, if the file exists, create will draw an error, but if
          it doesn't and the create system call succeeds, the process
          issuing the create is guaranteed to be the one that created
          the file.

     READ(5)                                                   READ(5)

     NAME
          read, write - transfer data from and to a file

     SYNOPSIS
          Tread   size[4] tag[2] fid[4] offset[8] count[4]
          Rread   size[4] tag[2] count[4] data[count]

          Twrite  size[4] tag[2] fid[4] offset[8] count[4] data[count]
          Rwrite  size[4] tag[2] count[4]

     DESCRIPTION
          The read request asks for count bytes of data from the file
          identified by fid, which must be opened for reading, start-
          ing offset bytes after the beginning of the file.  The bytes
          are returned with the read reply message.

          The count field in the reply indicates the number of bytes
          returned.  This may be less than the requested amount.  If
          the offset field is greater than or equal to the number of
          bytes in the file, a count of zero will be returned.

          For directories, read returns an integral number of direc-
          tory entries exactly as in stat (see stat(5)), one for each
          member of the directory.  The read request message must have
          offset equal to zero or the value of offset in the previous
          read on the directory, plus the number of bytes returned in
          the previous read.  In other words, seeking other than to
          the beginning is illegal in a directory (see seek(2)).
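
          A server might check the directory offset rule along these
          lines.  This is a minimal sketch; the function name and the
          idea of remembering the previous read per fid are
          assumptions made for illustration.

```c
#include <stdint.h>

/* Returns 1 if offset is legal for a read on a directory: it must be
 * zero, or the previous read's offset plus the number of bytes that
 * read returned.  Returns 0 otherwise. */
int dir_read_offset_ok(uint64_t offset, uint64_t prev_offset,
                       uint32_t prev_count)
{
    return offset == 0 || offset == prev_offset + prev_count;
}
```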

          The write request asks that count bytes of data be recorded
          in the file identified by fid, which must be opened for
          writing, starting offset bytes after the beginning of the
          file.  If the file has been opened append only, the data
          will be placed at the end of the file regardless of offset.
          Directories may not be written.

          The write reply records the number of bytes actually writ-
          ten.  It is usually an error if this is not the same as
          requested.

          Because 9P implementations may limit the size of individual
          messages, more than one message may be produced by a single
          read or write call.  The iounit field returned by open(5),
          if non-zero, reports the maximum size that is guaranteed to
          be transferred atomically.

     ENTRY POINTS
          Read and write messages are generated by the corresponding
          calls.  Although seek(2) affects the offset, it does not
          generate a message.

     REMOVE(5)                                               REMOVE(5)

     NAME
          remove - remove a file from a server

     SYNOPSIS
          Tremove  size[4] tag[2] fid[4]
          Rremove  size[4] tag[2]

     DESCRIPTION
          The remove request asks the file server both to remove the
          file represented by fid and to clunk the fid, even if the
          remove fails.  This request will fail if the client does not
          have write permission in the parent directory.

          It is correct to consider remove to be a clunk with the side
          effect of removing the file if permissions allow.

     ENTRY POINTS
          Remove messages are generated by remove.

     STAT(5)                                                   STAT(5)

     NAME
          stat, wstat - inquire or change file attributes

     SYNOPSIS
          Tstat   size[4] tag[2] fid[4]
          Rstat   size[4] tag[2] stat[n]

          Twstat  size[4] tag[2] fid[4] stat[n]
          Rwstat  size[4] tag[2]

     DESCRIPTION
          The stat transaction inquires about the file identified by
          fid. The reply will contain a machine-independent directory
          entry, stat, laid out as follows:

          type[2]
               for kernel use

          dev[4]
               for kernel use

          qid.type[1]
               the type of the file (directory, etc.), represented as
               a bit vector corresponding to the high 8 bits of the
               file's mode word.

          qid.vers[4]
               version number for given path

          qid.path[8]
               the file server's unique identification for the file

          mode[4]
               permissions and flags

          atime[4]
               last access time

          mtime[4]
               last modification time

          length[8]
               length of file in bytes

          name[ s ]
               file name; must be / if the file is the root directory
               of the server

          uid[ s ]
               owner name

          gid[ s ]
               group name

          muid[ s ]
               name of the user who last modified the file

          Integers in this encoding are in little-endian order (least
          significant byte first).  The convM2D and convD2M routines
          (see fcall(2)) convert between directory entries and C
          structs.

          This encoding may be turned into a machine dependent Dir
          structure (see stat(2)) using routines defined in fcall(2).
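
          To make the layout concrete, here is a sketch of decoding
          the fixed-width fields in the order listed above, in
          portable C rather than with convM2D.  The names StatFixed,
          getle, and decode_stat_fixed are hypothetical.  The
          trailing strings (name, uid, gid, muid) are not decoded
          here; each is assumed to carry a 2-byte count as described
          for strings in intro(5).

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed-width part of a stat entry, in wire order. */
typedef struct StatFixed {
    uint16_t type;
    uint32_t dev;
    uint8_t  qid_type;
    uint32_t qid_vers;
    uint64_t qid_path;
    uint32_t mode;
    uint32_t atime;
    uint32_t mtime;
    uint64_t length;
} StatFixed;

enum { STATFIXLEN = 2 + 4 + 1 + 4 + 8 + 4 + 4 + 4 + 8 };  /* 39 bytes */

/* Read an n-byte little-endian integer (least significant byte first). */
static uint64_t getle(const uint8_t *p, int n)
{
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Decode the fixed fields from p[0..n).  Returns the number of bytes
 * consumed, or -1 if the buffer is too short. */
int decode_stat_fixed(const uint8_t *p, size_t n, StatFixed *s)
{
    if (n < (size_t)STATFIXLEN)
        return -1;
    s->type     = (uint16_t)getle(p, 2); p += 2;
    s->dev      = (uint32_t)getle(p, 4); p += 4;
    s->qid_type = (uint8_t)getle(p, 1);  p += 1;
    s->qid_vers = (uint32_t)getle(p, 4); p += 4;
    s->qid_path = getle(p, 8);           p += 8;
    s->mode     = (uint32_t)getle(p, 4); p += 4;
    s->atime    = (uint32_t)getle(p, 4); p += 4;
    s->mtime    = (uint32_t)getle(p, 4); p += 4;
    s->length   = getle(p, 8);
    return STATFIXLEN;
}
```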

          The mode contains permission bits as described in intro(5)
          and the following: 0x80000000 (this file is a directory),
          0x40000000 (append only), 0x20000000 (exclusive use); these
          are echoed in Qid.type.  Writes to append-only files always
          place their data at the end of the file; the offset in the
          write message is ignored, as is the OTRUNC bit in an open.
          Exclusive use files may be open for I/O by only one fid at a
          time across all clients of the server.  If a second open is
          attempted, it draws an error.  Servers may implement a time-
          out on the lock on an exclusive use file: if the fid holding
          the file open has been unused for an extended period (of
          order at least minutes), it is reasonable to break the lock
          and deny the initial fid further I/O.
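
          Since qid.type mirrors the high 8 bits of the mode word,
          the relation between the two can be sketched as below.  The
          function name is hypothetical; the constant names follow
          those listed in stat(2).

```c
#include <stdint.h>

#define DMDIR    0x80000000u  /* directory */
#define DMAPPEND 0x40000000u  /* append only */
#define DMEXCL   0x20000000u  /* exclusive use */

/* The qid type byte that corresponds to a mode word: its high 8 bits. */
uint8_t qid_type_from_mode(uint32_t mode)
{
    return (uint8_t)(mode >> 24);
}
```

          A directory with mode DMDIR|0755 therefore has qid.type
          0x80, and a plain file with no flag bits has qid.type 0.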

          The two time fields are measured in seconds since the epoch
          (Jan 1 00:00 1970 GMT).  The mtime field reflects the time
          of the last change of content (except when later changed by
          wstat).  For a plain file, mtime is the time of the most
          recent create, open with truncation, or write; for a direc-
          tory it is the time of the most recent remove, create, or
          wstat of a file in the directory.  Similarly, the atime
          field records the last read of the contents; also it is set
          whenever mtime is set.  In addition, for a directory, it is
          set by an attach, walk, or create, all whether successful or
          not.

          The muid field names the user whose actions most recently
          changed the mtime of the file.

          The length records the number of bytes in the file.  Direc-
          tories and most files representing devices have a conven-
          tional length of 0.

          The stat request requires no special permissions.

          The wstat request can change some of the file status infor-
          mation.  The name can be changed by anyone with write per-
          mission in the parent directory; it is an error to change
          the name to that of an existing file.  The length can be
          changed (affecting the actual length of the file) by anyone
          with write permission on the file.  It is an error to
          attempt to set the length of a directory to a non-zero
          value, and servers may decide to reject length changes for
          other reasons.  The mode and mtime can be changed by the
          owner of the file or the group leader of the file's current
          group.  The directory bit cannot be changed by a wstat; the
          other defined permission and mode bits can.  The gid can be
          changed: by the owner if also a member of the new group; or
          by the group leader of the file's current group if also
          leader of the new group (see intro(5) for more information
          about permissions and users(6) for users and groups).  None
          of the other data can be altered by a wstat.  In particular,
          there is no way to change the owner of a file.

          Either all the changes in a wstat request happen, or none of
          them does: if the request succeeds, all changes were made;
          if it fails, none were.

          A wstat request can explicitly avoid modifying some proper-
          ties of the file by providing explicit ``don't touch'' val-
          ues in the stat data that is sent: zero-length strings for
          text values and ~0 for integral values.
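
          A client can start from a request in which every field is
          ``don't touch'' and then fill in only what it wants to
          change.  The sketch below assumes a hypothetical WstatReq
          structure and wstat_null helper; it plays the role that
          nulldir (see stat(2)) plays for Dir structures.

```c
#include <stdint.h>

/* Hypothetical in-memory form of the stat data sent in a wstat. */
typedef struct WstatReq {
    uint16_t type;
    uint32_t dev;
    uint8_t  qid_type;
    uint32_t qid_vers;
    uint64_t qid_path;
    uint32_t mode;
    uint32_t atime;
    uint32_t mtime;
    uint64_t length;
    const char *name;
    const char *uid;
    const char *gid;
    const char *muid;
} WstatReq;

/* Mark every field "don't touch": ~0 for integers, "" for strings. */
void wstat_null(WstatReq *r)
{
    r->type     = (uint16_t)~0u;
    r->dev      = ~(uint32_t)0;
    r->qid_type = (uint8_t)~0u;
    r->qid_vers = ~(uint32_t)0;
    r->qid_path = ~(uint64_t)0;
    r->mode     = ~(uint32_t)0;
    r->atime    = ~(uint32_t)0;
    r->mtime    = ~(uint32_t)0;
    r->length   = ~(uint64_t)0;
    r->name = r->uid = r->gid = r->muid = "";
}
```

          To rename a file and leave everything else alone, call
          wstat_null and then set only the name field before
          encoding the request.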

          A read of a directory yields an integral number of directory
          entries in the machine independent encoding given above (see
          read(5)).

          Note that since the stat information is sent as a 9P
          variable-length datum, it is limited to a maximum of 65535
          bytes.

     ENTRY POINTS
          Stat messages are generated by fstat and stat.

          Wstat messages are generated by fwstat and wstat.

     VERSION(5)                                             VERSION(5)

     NAME
          version - negotiate protocol version

     SYNOPSIS
          Tversion size[4] tag[2] msize[4] version[s]
          Rversion size[4] tag[2] msize[4] version[s]

     DESCRIPTION
          The version request negotiates the protocol version and mes-
          sage size to be used on the connection.  Tversion must be
          the first message sent on the 9P connection, and the client
          cannot issue any further requests until it has received the
          Rversion reply.

          The client suggests a maximum message size, msize, that is
          the maximum length, in bytes, it will ever generate or
          expect to receive in a single 9P message.  This count
          includes all 9P protocol data, starting from the size field
          and extending through the message, but excludes enveloping
          transport protocols.  The server responds with its own maxi-
          mum, msize, which must be less than or equal to the client's
          value.  Thenceforth, both sides of the connection must honor
          this limit.
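
          One way a server could choose its reply is to take the
          smaller of the client's suggestion and its own limit.  The
          function name and the server_max parameter are assumptions
          for illustration.

```c
#include <stdint.h>

/* The msize a server returns in Rversion: never more than the client
 * suggested, and never more than the server itself can handle. */
uint32_t negotiate_msize(uint32_t client_msize, uint32_t server_max)
{
    return client_msize < server_max ? client_msize : server_max;
}
```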

          The version string identifies the level of the protocol.
          The string must always begin with the two characters ``9P''.
          If the server does not understand the client's version
          string, it should respond with an Rversion message (not
          Rerror) with the version string the 7 characters
          ``unknown''.

          The server may respond with the client's version string, or
          a version string identifying an earlier defined protocol
          version.  Currently, the only defined version is the 6 char-
          acters ``9P2000''.  Version strings will be defined such
          that, if the client string contains one or more period char-
          acters, the initial substring up to but not including any
          single period in the version string defines a version of the
          protocol.  Other version strings may also be valid, however.
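
          A sketch of how a server that knows only ``9P2000'' might
          pick its reply: take the client's string up to the first
          period and compare it with the one defined version.  The
          function name is hypothetical, and a real server is free
          to accept other strings.

```c
#include <stddef.h>
#include <string.h>

/* Version string for the Rversion reply, for a server that understands
 * only "9P2000".  A client string such as "9P2000.x" is reduced to the
 * part before the first period; anything else draws "unknown". */
const char *reply_version(const char *client)
{
    size_t base = strcspn(client, ".");  /* length up to first period */

    if (base == 6 && strncmp(client, "9P2000", 6) == 0)
        return "9P2000";
    return "unknown";
}
```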

          The client and server will use the protocol version defined
          by the server's response for all subsequent communication on
          the connection.

     ENTRY POINTS
          The version message is generated by the kernel by the first
          mount system call on the connection.

     WALK(5)                                                   WALK(5)

     NAME
          walk - descend a directory hierarchy

     SYNOPSIS
          Twalk  size[4] tag[2] fid[4] newfid[4] nwname[2]
          nwname*(wname[s])
          Rwalk  size[4] tag[2] nqid[2] nqid*(qid[13])

     DESCRIPTION
          The walk request carries as arguments an existing fid, which
          must represent a directory, and a proposed newfid (which
          must not be in use unless it is the same as fid) that the
          client wishes to associate with the result of descending the
          directory hierarchy by `walking' the hierarchy using the
          successive path name elements wname.

          The fid must be valid in the current session and must not
          have been opened for I/O by an open or create message.  If
          the full sequence of nwname elements is walked successfully,
          newfid will represent the file that results.  If not, newfid
          (and fid) will be unaffected.  However, if newfid is in use
          or otherwise illegal, an Rerror is returned.

          The element ``..''  (dot-dot) represents the parent direc-
          tory.  The name ``.''  (dot), meaning the current directory,
          is not used in the protocol.

          It is legal for nwname to be zero, in which case newfid will
          represent the same file as fid and the walk will usually
          succeed; this is equivalent to walking to dot.  The rest of
          this discussion assumes nwname is greater than zero.

          The nwname path name elements wname are walked in order,
          ``elementwise''.  For the first elementwise walk to succeed,
          the file identified by fid must be a directory, and the
          implied user of the request must have permission to search
          the directory (see intro(5)). Subsequent elementwise walks
          have equivalent restrictions applied to the implicit fid
          that results from the preceding elementwise walk.

          If the first element cannot be walked for any reason, Rerror
          is returned.  Otherwise, the walk will return an Rwalk mes-
          sage containing nqid qids corresponding, in order, to the
          files that are visited by the nqid successful elementwise
          walks; nqid is therefore either nwname or the index of the
          first elementwise walk that failed.  The value of nqid can-
          not be zero unless nwname is zero.  Also, nqid will always
          be less than or equal to nwname.  Only if it is equal, how-
          ever, will newfid be affected, in which case it will repre-
          sent the file reached by the final elementwise walk
          requested in the message.
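
          The three possible outcomes of a walk can be modelled as
          below.  The can_walk array is a stand-in for the server's
          per-element checks, and all names are hypothetical.

```c
#include <stddef.h>

enum WalkOutcome {
    WALK_ERROR,     /* first element failed: Rerror is returned */
    WALK_PARTIAL,   /* Rwalk with nqid < nwname: newfid unaffected */
    WALK_COMPLETE   /* Rwalk with nqid == nwname: newfid is set */
};

/* can_walk[i] is nonzero if the i-th elementwise walk would succeed.
 * Stores the number of successful elementwise walks in *nqid. */
enum WalkOutcome walk_outcome(const int *can_walk, size_t nwname,
                              size_t *nqid)
{
    size_t i = 0;

    while (i < nwname && can_walk[i])
        i++;
    *nqid = i;
    if (nwname > 0 && i == 0)
        return WALK_ERROR;
    return i == nwname ? WALK_COMPLETE : WALK_PARTIAL;
}
```

          Note that a walk with nwname equal to zero is reported as
          complete with nqid zero, matching the special case
          described earlier.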

          A walk of the name ``..''  in the root directory of a server
          is equivalent to a walk with no name elements.

          If newfid is the same as fid, the above discussion applies,
          with the obvious difference that if the walk changes the
          state of newfid, it also changes the state of fid; and if
          newfid is unaffected, then fid is also unaffected.

          To simplify the implementation of the servers, a maximum of
          sixteen name elements or qids may be packed in a single mes-
          sage.  This constant is called MAXWELEM in fcall(2). Despite
          this restriction, the system imposes no limit on the number
          of elements in a file name, only the number that may be
          transmitted in a single message.
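
          As a small illustration of the limit, the number of walk
          messages needed for a path of a given length can be
          computed as follows.  The function name is hypothetical; a
          path with no elements still takes one message.

```c
#include <stddef.h>

#define MAXWELEM 16  /* maximum name elements per walk message */

/* Number of Twalk messages needed to walk nelem path elements. */
size_t walk_messages(size_t nelem)
{
    if (nelem == 0)
        return 1;
    return (nelem + MAXWELEM - 1) / MAXWELEM;
}
```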

     ENTRY POINTS
          A call to chdir(2) causes a walk.  One or more walk messages
          may be generated by any of the following calls, which evalu-
          ate file names: bind, create, exec, mount, open, remove,
          stat, unmount, wstat. The file name element . (dot) is
          interpreted locally and is not transmitted in walk messages.

     DIRREAD(2)                                             DIRREAD(2)

     NAME
          dirread, dirreadall - read directory

     SYNOPSIS
          #include <u.h>
          #include <libc.h>

          long dirread(int fd, Dir **buf)

          long dirreadall(int fd, Dir **buf)

          #define   STATMAX   65535U

          #define   DIRMAX    (sizeof(Dir)+STATMAX)

     DESCRIPTION
          The data returned by a read(2) on a directory is a set of
          complete directory entries in a machine-independent format,
          exactly equivalent to the result of a stat(2) on each file
          or subdirectory in the directory.  Dirread decodes the
          directory entries into a machine-dependent form.  It reads
          from fd and unpacks the data into an array of Dir structures
          whose address is returned in *buf (see stat(2) for the lay-
          out of a Dir).  The array is allocated with malloc(2) each
          time dirread is called.

          Dirreadall is like dirread, but reads in the entire direc-
          tory; by contrast, dirread steps through a directory one
          read(2) at a time.

          Directory entries have variable length.  A successful read
          of a directory always returns an integral number of complete
          directory entries; dirread always returns complete Dir
          structures.  See read(5) for more information.

          The constant STATMAX is the maximum size that a directory
          entry can occupy.  The constant DIRMAX is an upper limit on
          the size necessary to hold a Dir structure and all the asso-
          ciated data.

          Dirread returns the number of Dir structures filled in buf.
          The file offset is advanced by the number of bytes actually
          read.

     SOURCE
          /sys/src/libc/9sys/dirread.c

     SEE ALSO
          intro(2), open(2), read(2)

     DIAGNOSTICS
          Sets errstr.

     FCALL(2)                                                 FCALL(2)

     NAME
          Fcall, convS2M, convD2M, convM2S, convM2D, getS, fcallconv,
          dirconv, dirmodeconv, read9pmsg - interface to Plan 9 File
          protocol

     SYNOPSIS
          #include <u.h>
          #include <libc.h>
          #include <auth.h>
          #include <fcall.h>

          uint convS2M(Fcall *f, uchar *ap, uint nap)

          uint convD2M(Dir *d, uchar *ap, uint nap)

          uint convM2S(uchar *ap, uint nap, Fcall *f)

          uint convM2D(uchar *ap, uint nap, Dir *d, char *strs)

          int dirconv(void *o, Fconv*)

          int fcallconv(void *o, Fconv*)

          int dirmodeconv(void *o, Fconv*)

          int read9pmsg(int fd, uchar *buf, uint nbuf);

     DESCRIPTION
          These routines convert messages in the machine-independent
          format of the Plan 9 file protocol, 9P, to and from a more
          convenient form, an Fcall structure:

          #define MAXWELEM 16

          typedef
          struct Fcall
          {
              uchar   type;
              u32int  fid;
              ushort  tag;
              union {
                    struct {
                         u32int  msize;            /* Tversion, Rversion */
                         char    *version;         /* Tversion, Rversion */
                    };
                    struct {
                         u32int  oldtag;           /* Tflush */
                    };
                    struct {
                         char    *ename;           /* Rerror */
                    };
                    struct {
                         Qid     qid;              /* Rattach, Ropen, Rcreate */
                         u32int  iounit;           /* Ropen, Rcreate */
                         ushort  nrauth;           /* Rattach */
                         uchar   *rauth;           /* Rattach */
                    };
                    struct {
                         char    *uname;           /* Tattach */
                         char    *aname;           /* Tattach */
                         ushort  nauth;            /* Tattach */
                         uchar   *auth;            /* Tattach */
                    };
                    struct {
                         char    *authid;          /* Rsession */
                         char    *authdom;         /* Rsession */
                         ushort  nchal;            /* Tsession/Rsession */
                         uchar   *chal;            /* Tsession/Rsession */
                    };
                    struct {
                         u32int  perm;             /* Tcreate */
                         char    *name;            /* Tcreate */
                         uchar   mode;             /* Tcreate, Topen */
                    };
                    struct {
                         u32int  newfid;           /* Twalk */
                         ushort  nwname;           /* Twalk */
                         char    *wname[MAXWELEM]; /* Twalk */
                    };
                    struct {
                         ushort  nwqid;            /* Rwalk */
                         Qid     wqid[MAXWELEM];   /* Rwalk */
                    };
                    struct {
                         vlong   offset;           /* Tread, Twrite */
                         u32int  count;            /* Tread, Twrite, Rread */
                         char    *data;            /* Twrite, Rread */
                    };
                    struct {
                         ushort  nstat;            /* Twstat, Rstat */
                         uchar   *stat;            /* Twstat, Rstat */
                    };
              };
          } Fcall;

          /* these are implemented as macros */

          uchar     GBIT8(uchar*)
          ushort    GBIT16(uchar*)
          ulong     GBIT32(uchar*)
          vlong     GBIT64(uchar*)

          void      PBIT8(uchar*, uchar)
          void      PBIT16(uchar*, ushort)
          void      PBIT32(uchar*, ulong)
          void      PBIT64(uchar*, vlong)

          #define   BIT8SZ     1
          #define   BIT16SZ    2
          #define   BIT32SZ    4
          #define   BIT64SZ    8

          This structure is defined in <fcall.h>.  See section 5 for a
          full description of 9P messages and their encoding.  For all
          message types, the type field of an Fcall holds one of Tnop,
          Rnop, Tsession, Rsession, etc. (defined in an enumerated
          type in <fcall.h>).  Fid is used by most messages, and tag
          is used by all messages.  The other fields are used selec-
          tively by the message types given in comments.

          ConvM2S takes a 9P message at ap of length nap, and uses it
          to fill in Fcall structure f. If the passed message includ-
          ing any data for Twrite and Rread messages is formatted
          properly, the return value is the number of bytes the mes-
          sage occupied in the buffer ap, which will always be less
          than or equal to nap; otherwise it is 0.  For Twrite and
          Rread messages, data is set to a pointer into the argument
          message, not a copy.

          ConvS2M does the reverse conversion, turning f into a mes-
          sage starting at ap. The length of the resulting message is
          returned.  For Twrite and Rread messages, count bytes start-
          ing at data are copied into the message.

          The constant IOHDRSZ is a suitable amount of buffer to
          reserve for storing the 9P header; the data portion of a
          Twrite or Rread will be no more than the buffer size nego-
          tiated in the Tversion/Rversion exchange, minus IOHDRSZ.
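
          The arithmetic can be sketched as follows.  The text names
          IOHDRSZ but does not give its size, so the value 24 used
          here is an assumption, as are the function names.

```c
#include <stdint.h>

#define IOHDRSZ 24  /* assumed value; not stated in the text */

/* Largest data payload that fits in one Twrite or Rread for a given
 * negotiated msize. */
uint32_t max_data(uint32_t msize)
{
    return msize > IOHDRSZ ? msize - IOHDRSZ : 0;
}

/* Number of messages needed to move count bytes, or 0 if no data can
 * be carried at all. */
uint64_t data_messages(uint64_t count, uint32_t msize)
{
    uint32_t chunk = max_data(msize);

    if (chunk == 0)
        return 0;
    return (count + chunk - 1) / chunk;
}
```

          With an msize of 8216 bytes, each message carries at most
          8192 bytes of data under this assumption, so a 20000-byte
          transfer takes three messages.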

          Another structure is Dir, used by the routines described in
          stat(2). ConvM2D converts the machine-independent form
          starting at ap into d and returns the length of the
          machine-independent, input encoding.  The strings in the
          returned Dir structure are stored at successive locations
          starting at strs; if strs is nil they are ignored; however,
          the return value still includes their length.

          ConvD2M does the reverse translation, also returning the
          length of the encoding.  If the buffer is too short, the
          return value will be BIT16SZ and the correct size will be
          returned in the first BIT16SZ bytes.  The macro GBIT16 can
          be used to extract the correct value.  The related macros
          with different sizes retrieve the corresponding-sized quan-
          tities.  PBIT16 and its brethren place values in messages.
          With the exception of handling short buffers in convD2M,
          these macros are not usually needed except by internal rou-
          tines.

          GetS reads a message from file descriptor fd into ap and
          converts the message using convM2S into the Fcall structure
          f. The lp argument must point to a long holding the size of
          the ap buffer.  It is somewhat resilient to transient read
          errors.  If convM2S succeeds, its return value is stored in
          *lp, and getS returns zero.  Otherwise getS returns a string
          identifying the error.

          Dirconv, fcallconv, and dirmodeconv are formatting routines,
          suitable for fmtinstall (see print(2)). They convert Dir*,
          Fcall*, and long values into string representations of the
          directory buffer, Fcall buffer, or file mode value.
          Fcallconv assumes that dirconv has been installed with for-
          mat letter `D' and dirmodeconv with format letter `M'.

          Read9pmsg calls read(2) multiple times, if necessary, to
          read an entire 9P message into buf.  The return value is 0
          for end of file, or -1 for error; it does not return partial
          messages.
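
          The idea behind read9pmsg can be sketched with POSIX
          read(2) in place of the Plan 9 call.  This is not the
          library source: the names readn and read_msg are
          hypothetical, and returning the message length on success
          is an assumption, since the text specifies only the 0 and
          -1 cases.  The size field is taken to count itself, as
          described in version(5).

```c
#include <stdint.h>
#include <unistd.h>

/* Read exactly n bytes.  Returns n on success, 0 on end of file before
 * any byte arrived, and -1 on error or end of file part way through. */
static long readn(int fd, uint8_t *p, long n)
{
    long got = 0;

    while (got < n) {
        ssize_t m = read(fd, p + got, (size_t)(n - got));
        if (m < 0)
            return -1;
        if (m == 0)
            return got == 0 ? 0 : -1;
        got += m;
    }
    return got;
}

/* Read one whole 9P message into buf.  Returns its length, 0 on end of
 * file, or -1 on error; a partial message is never returned. */
long read_msg(int fd, uint8_t *buf, uint32_t nbuf)
{
    uint32_t size;
    long m;

    if (nbuf < 4)
        return -1;
    m = readn(fd, buf, 4);              /* little-endian size[4] */
    if (m <= 0)
        return m;
    size = (uint32_t)buf[0] | (uint32_t)buf[1] << 8 |
           (uint32_t)buf[2] << 16 | (uint32_t)buf[3] << 24;
    if (size < 4 || size > nbuf)
        return -1;                      /* malformed or too large */
    if (size > 4 && readn(fd, buf + 4, (long)(size - 4)) <= 0)
        return -1;                      /* truncated message */
    return (long)size;
}
```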

     SOURCE
          /sys/src/libc/9sys

     SEE ALSO
          intro(2), stat(2), intro(5)

     STAT(2)                                                   STAT(2)

     NAME
          stat, fstat, wstat, fwstat, dirstat, dirfstat, dirwstat,
          dirfwstat, nulldir - get and put file status

     SYNOPSIS
          #include <u.h>
          #include <libc.h>

          int stat(char *name, uchar *edir, int nedir)

          int fstat(int fd, uchar *edir, int nedir)

          int wstat(char *name, uchar *edir, int nedir)

          int fwstat(int fd, uchar *edir, int nedir)

          Dir* dirstat(char *name)

          Dir* dirfstat(int fd)

          int dirwstat(char *name, Dir *dir)

          int dirfwstat(int fd, Dir *dir)

          void nulldir(Dir *d)

     DESCRIPTION
          Given a file's name, or an open file descriptor fd, these
          routines retrieve or modify file status information.  Stat,
          fstat, wstat, and fwstat are the system calls; they deal
          with machine-independent directory entries. Their format is
          defined by stat(5). Stat and fstat retrieve information
          about name or fd into edir, a buffer of length nedir,
          defined in <libc.h>.  Wstat and fwstat write information
          back, thus changing file attributes according to the con-
          tents of edir. The data returned from the kernel includes
          its leading 16-bit length field as described in intro(5).
          For symmetry, this field must also be present when passing
          data to the kernel in a call to wstat and fwstat, but its
          value is ignored.

          Dirstat, dirfstat, dirwstat, and dirfwstat are similar to
          their counterparts, except that they operate on Dir struc-
          tures:

               typedef
               struct Dir {
                     /* system-modified data */
                     uint  type;    /* server type */
                     uint  dev;     /* server subtype */

                     /* file data */
                     Qid   qid;     /* unique id from server */
                     ulong mode;    /* permissions */
                     ulong atime;   /* last read time */
                     ulong mtime;   /* last write time */
                     vlong length;  /* file length: see <u.h> */
                     char  *name;   /* last element of path */
                     char  *uid;    /* owner name */
                     char  *gid;    /* group name */
                     char  *muid;   /* last modifier name */
               } Dir;

          The returned structure is allocated by malloc(2); freeing it
          also frees the associated strings.

          This structure and the Qid structure are defined in
          <libc.h>.  If the file resides on permanent storage and is
          not a directory, the length returned by stat is the number
          of bytes in the file.  For directories, the length returned
          is zero.  For files that are streams (e.g., pipes and net-
          work connections), the length is the number of bytes that
          can be read without blocking.

          Each file is the responsibility of some server: it could be
          a file server, a kernel device, or a user process.  Type
          identifies the server type, and dev says which of a group of
          servers of the same type is the one responsible for this
          file.  Qid is a structure containing path and vers fields:
          path is guaranteed to be unique among all path names
          currently on the file server, and vers changes each time
          the file is modified.  The path is a long long (64 bits,
          vlong) and the vers is an unsigned long (32 bits, ulong).
          Thus, if two files have the same type, dev, and qid, they
          are the same file.
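
          The identity test described above can be sketched in
          portable C as follows.  SketchQid, SketchId, and samefile
          are hypothetical names rather than the <libc.h>
          definitions, and the field widths are illustrative.  The
          sketch compares only the path part of the qid, on the
          assumption that vers records modification rather than
          identity.

```c
#include <stdint.h>

/* Simplified stand-ins for the fields named above (assumed layout). */
typedef struct {
	uint64_t path;	/* unique among files on the server */
	uint32_t vers;	/* changes each time the file is modified */
} SketchQid;

typedef struct {
	unsigned int type;	/* server type */
	unsigned int dev;	/* server subtype */
	SketchQid qid;
} SketchId;

/*
 * Return 1 if a and b describe the same file, 0 otherwise.
 * vers is deliberately ignored (assumption): two snapshots of one
 * file taken before and after a write differ only in vers.
 */
int
samefile(const SketchId *a, const SketchId *b)
{
	return a->type == b->type
		&& a->dev == b->dev
		&& a->qid.path == b->qid.path;
}
```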

          The bits in mode are defined by

                0x80000000   directory
                0x40000000   append only
                0x20000000   exclusive use (locked)

                      0400   read permission by owner
                      0200   write permission by owner
                      0100   execute permission (search on directory) by owner
                      0070   read, write, execute (search) by group
                      0007   read, write, execute (search) by others

          There are constants defined in <libc.h> for these bits:
          DMDIR, DMAPPEND, and DMEXCL for the first three; and DMREAD,
          DMWRITE, and DMEXEC for the read, write, and execute bits
          for others.
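
          As an illustration of the bit layout, the following
          portable C sketch formats a mode word as text.  The
          constant values are taken from the table above; rwx and
          modestr are hypothetical helpers, and the output format
          (three flag characters followed by three rwx triples) is
          an assumption made for this example, not the format of any
          particular program.

```c
#include <stdint.h>

/* Values as listed in the table above. */
#define DMDIR    0x80000000u	/* directory */
#define DMAPPEND 0x40000000u	/* append only */
#define DMEXCL   0x20000000u	/* exclusive use */
#define DMREAD   0x4u		/* read bit for others */
#define DMWRITE  0x2u		/* write bit for others */
#define DMEXEC   0x1u		/* execute bit for others */

/* Write one rwx triple for the low three bits of perm into s[0..2]. */
static void
rwx(uint32_t perm, char *s)
{
	s[0] = (perm & DMREAD)  ? 'r' : '-';
	s[1] = (perm & DMWRITE) ? 'w' : '-';
	s[2] = (perm & DMEXEC)  ? 'x' : '-';
}

/*
 * Format mode into buf, which must hold at least 13 bytes:
 * flags d, a, l, then owner, group, and other triples, then NUL.
 */
void
modestr(uint32_t mode, char *buf)
{
	buf[0] = (mode & DMDIR)    ? 'd' : '-';
	buf[1] = (mode & DMAPPEND) ? 'a' : '-';
	buf[2] = (mode & DMEXCL)   ? 'l' : '-';
	rwx(mode >> 6, buf + 3);	/* owner bits, 0700 */
	rwx(mode >> 3, buf + 6);	/* group bits, 0070 */
	rwx(mode,      buf + 9);	/* other bits, 0007 */
	buf[12] = '\0';
}
```

          Shifting the owner and group fields down lets the same
          three constants test all three classes.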

          The two time fields are measured in seconds since the epoch
          (Jan 1 00:00 1970 GMT).  Mtime is the time of the last
          change of content.  Similarly, atime is set whenever the
          contents are accessed; also, it is set whenever mtime is
          set.

          Uid and gid are the names of the owner and group of the
          file; muid is the name of the user that last modified the
          file (setting mtime).  Groups are also users, but each
          server is free to associate a list of users with any user
          name g, and that list is the set of users in the group g.
          When an initial attachment is made to a server, the user
          string in the process group is communicated to the server.
          Thus, the server knows, for any given file access, whether
          the accessing process is the owner of, or in the group of,
          the file.  This selects which set of three bits in mode is
          used to check permissions.
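
          The selection step can be sketched as below.  This is a
          simplified model in portable C, assuming the server has
          already decided whether the caller is the owner or a group
          member; permitted and the Want constants are hypothetical
          names, and a real server may combine the three classes
          differently.

```c
#include <stdint.h>

/* Requested access, using the same bit values as one rwx triple. */
enum { WantRead = 4, WantWrite = 2, WantExec = 1 };

/*
 * Pick the owner, group, or other triple from mode and return 1 if
 * it grants every bit in want, 0 otherwise.
 */
int
permitted(uint32_t mode, int isowner, int ingroup, uint32_t want)
{
	uint32_t bits;

	if(isowner)
		bits = (mode >> 6) & 7;	/* 0700 */
	else if(ingroup)
		bits = (mode >> 3) & 7;	/* 0070 */
	else
		bits = mode & 7;	/* 0007 */
	return (bits & want) == want;
}
```

          With mode 0640, for example, the owner may read and write,
          a group member may only read, and others get nothing.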

          Only some of the fields may be changed with the wstat calls.
          The name can be changed by anyone with write permission in
          the parent directory.  The mode and mtime can be changed by
          the owner or the group leader of the file's current group.
          The gid can be changed by the owner if he or she is a member
          of the new group.  The gid can be changed by the group
          leader of the file's current group if he or she is the
          leader of the new group.  The length can be changed by
          anyone with write permission, provided the operation is
          implemented by the server.  (See intro(5) for permission
          information, and users(6) for user and group information.)

          Special values in the fields of the Dir passed to wstat
          indicate that the field is not intended to be changed by the
          call.  The values are ~0 for integral values and the empty
          string for string values.  The routine nulldir initializes a
          Dir to all `ignore' values.  Thus one may change the mode,
          for example, by using nulldir to initialize a Dir, then
          setting the mode, and then doing wstat; it is not necessary
          to use stat to retrieve the initial values first.
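
          The `ignore' convention can be modeled in portable C as
          follows.  This is a sketch, not the libc source: SketchDir
          covers only some of the fields, and sketch_nulldir and the
          changes_* helpers are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Subset of the Dir fields, with fixed-width stand-in types. */
typedef struct {
	uint32_t mode;
	uint32_t atime;
	uint32_t mtime;
	int64_t length;
	const char *name;
	const char *uid;
	const char *gid;
	const char *muid;
} SketchDir;

/* Set every field to its "do not change" value. */
void
sketch_nulldir(SketchDir *d)
{
	memset(d, 0xFF, sizeof *d);	/* ~0 in every integral field */
	d->name = d->uid = d->gid = d->muid = "";	/* empty strings */
}

/* Each helper returns 1 if the field would be changed by a wstat. */
int
changes_mode(const SketchDir *d)
{
	return d->mode != (uint32_t)~0;
}

int
changes_length(const SketchDir *d)
{
	return d->length != ~(int64_t)0;
}

int
changes_name(const SketchDir *d)
{
	return d->name[0] != '\0';
}
```

          After sketch_nulldir(&d) and d.mode = 0644, only
          changes_mode reports a change, which mirrors the
          set-one-field pattern described above.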

     SOURCE
          /sys/src/libc/9syscall  for the non-dir routines
          /sys/src/libc/9sys      for the routines prefixed dir

     SEE ALSO
          intro(2), fcall(2), dirread(2), stat(5)

     DIAGNOSTICS
          All these functions return the number of bytes copied on
          success, -1 on error, and set errstr.

          If the buffer for stat or fstat is too short for the
          returned data, the return value will be BIT16SZ (see
          fcall(2)) and the two bytes returned will contain the
          initial count field of the returned data; retrying with
          nedir equal to that value plus BIT16SZ (for the count
          itself) should succeed.
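
          The retry described above might look like the following
          portable C sketch.  StatFn is a hypothetical callback type
          modeled on the stat call, and fake_stat is a stand-in
          server written only so the example can run; neither is
          part of libc.  The count is decoded as a little-endian
          16-bit value, as in 9P.

```c
#include <stdlib.h>
#include <string.h>

enum { BIT16SZ = 2 };

/* Hypothetical: fill buf with up to nbuf bytes; return count or -1. */
typedef int (*StatFn)(const char *name, unsigned char *buf, int nbuf);

/* Decode a little-endian 16-bit integer. */
static int
get16(const unsigned char *p)
{
	return p[0] | (p[1] << 8);
}

/*
 * Try with a guessed size; if only the count comes back, retry with
 * count + BIT16SZ.  Returns a malloc'd buffer and stores its length
 * in *np, or returns NULL on error.
 */
unsigned char*
stat_retry(StatFn fn, const char *name, int guess, int *np)
{
	unsigned char *buf, *bigger;
	int n;

	if(guess < BIT16SZ)
		guess = BIT16SZ;
	buf = malloc(guess);
	if(buf == NULL)
		return NULL;
	n = fn(name, buf, guess);
	if(n == BIT16SZ){
		/* too short: the two bytes hold the record's count */
		guess = get16(buf) + BIT16SZ;
		bigger = realloc(buf, guess);
		if(bigger == NULL){
			free(buf);
			return NULL;
		}
		buf = bigger;
		n = fn(name, buf, guess);
	}
	if(n <= BIT16SZ){
		free(buf);
		return NULL;
	}
	*np = n;
	return buf;
}

/* Demo stand-in: the record is a 2-byte count followed by name. */
int
fake_stat(const char *name, unsigned char *buf, int nbuf)
{
	int len = (int)strlen(name);

	if(nbuf < BIT16SZ)
		return -1;
	buf[0] = len & 0xFF;
	buf[1] = (len >> 8) & 0xFF;
	if(nbuf < len + BIT16SZ)
		return BIT16SZ;	/* short buffer: count only */
	memcpy(buf + BIT16SZ, name, len);
	return len + BIT16SZ;
}
```

          Calling stat_retry(fake_stat, "hello", 4, &n) takes the
          short-buffer path once and then succeeds; the caller frees
          the returned buffer.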
