From: Mike Haertel
Message-Id: <200101300921.f0U9LpW00742@ducky.net>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] 9P2000
Cc: mike@ducky.net, rob@plan9.bell-labs.com
In-Reply-To: <20010127215828.D7A06199D5@mail.cse.psu.edu>
Date: Tue, 30 Jan 2001 01:21:51 -0800

Here are some reactions. They mostly boil down to suggested changes that will make the specification of the protocol both simpler and more bulletproof.

Here is a concise summary of the proposed protocol changes:

* restrict allowable contents of file, owner, and group names at the protocol level to be equivalent to the restrictions imposed at the Plan 9 kernel level.

* eliminate the useless special case of ~0 tags.

* eliminate multiple Tsessions from the protocol; require that each connection begin with exactly one Tversion and exactly one Tsession, and disallow any further occurrences of Tsession and Tversion in the conversation. Then the funny "aborts all transactions" semantics of Tsession can also be eliminated.

* specify a "minimum maximum" msize that a client can request, chosen so that the client can always read() any stat structure that the server might need to return for any possible directory entry.

* expand time stamps to 64 bits for posterity.

* forbid attempts in wstat to alter the length of a directory.

* remove discussion of Plan 9 group leader semantics and other weird stuff from the protocol specification. Similarly, remove from the specification the claim that wstat cannot change file ownership. Instead, say that allowable owner, group, and permission changes are determined at the discretion of whatever security policy the server chooses to implement. (The discussion of Plan 9 group semantics would presumably migrate to the man page for the specific file server.)

* ensure that walks to .. are reliable by explicitly requiring at the protocol level that the hierarchy is always a strict tree.
* disallow walks to "" (the zero-length name) in addition to the already-disallowed walks to ".".

* in walk operations that fail, newfid should be implicitly clunked unless it was equal to fid.

Long discussion follows... (There are also a few stylistic comments.)

> INTRO(5)                                                 INTRO(5)
> Tversion size[4] tag[2] msize[4] version[s]
> Rversion size[4] tag[2] msize[4] version[s]
> [...]

Consider changing the format of this table to read:

        size[4] Tversion[1] tag[2] msize[4] version[s]
        size[4] Rversion[1] tag[2] msize[4] version[s]
        [...]

to explicitly show the placement of the type byte in each message. The old format was appropriate when the type byte was the first byte of the message, but with the new protocol the table was confusing until I reread the preceding paragraph that described the placement of the message type byte after the size[4]. The proposed format is self-documenting at a glance.

By the way, one thing I really like about the new encoding is that emulating the old "fcall" streams module becomes trivial.

> [...]
> (Systems may choose
> to reduce the set of legal characters to reduce syntactic
> problems, for example to remove slashes from name compo-
> nents, but the protocol has no such restriction.  Plan 9
> names may contain any printable character (that is, any
> character outside hexadecimal 00-1F and 80-9F) except
> slash.)

I think it is a huge mistake to say "the protocol has no such restriction". One of the big problems with Unix was that you could have nearly arbitrary characters in filenames, but a lot of programs (notably things like cpio, xargs, and the shell itself) did not take this possibility seriously. It takes a lot more experience than it should to write reliable scripts for dealing with files in Unix. Admittedly rc has much cleaner quoting than the Bourne shell, and Plan 9 has helped by outlawing newlines in file names. However, why reincarnate the same problem in a different guise?
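A protocol-level restriction of this kind would be cheap to enforce. Below is a minimal C sketch, assuming the Plan 9 rule quoted above (no bytes 0x00-0x1F, no characters U+0080-U+009F, no slash), plus rejection of "", "." and ".." as names. The function `valid_name` is hypothetical, and the sketch assumes its input is already well-formed UTF-8; it does not validate the encoding itself.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical check for one name component under the proposed rule:
 * reject "", "." and "..", any slash, any byte 0x00-0x1F, and any
 * character U+0080-U+009F. Assumes well-formed UTF-8, in which
 * U+0080-U+009F are exactly the byte pairs 0xC2 0x80 .. 0xC2 0x9F.
 */
int
valid_name(const char *name)
{
	const unsigned char *p = (const unsigned char *)name;

	if (*p == '\0' || strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
		return 0;
	for (; *p != '\0'; p++) {
		if (*p < 0x20 || *p == '/')
			return 0;	/* control character or slash */
		if (*p == 0xC2 && p[1] >= 0x80 && p[1] <= 0x9F)
			return 0;	/* C1 control character */
	}
	return 1;
}
```

The same function could be applied unchanged to owner and group names, which is the point: one rule, checked once, in every server.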
If 9P2000 servers can export arbitrary strings as file name components, but the clients (e.g. the Plan 9 kernel) can't handle some of those strings, then it will be impossible to write reliable client programs.

Consider u9fs. Since Unix allows nearly arbitrary file names, it is quite easy now to create Unix files that you can't access from a Plan 9 client. If the protocol explicitly forbids funny characters, then u9fs will have to be fixed to map those characters in some way, or it can't claim to be a 9P implementation. I think that would be a desirable state of affairs.

Another way of putting this: in theory the protocol has no such restriction, but in practice it does, and always will; therefore, why not fix the theory to admit the practical restrictions?

> An exception is the tag ~0, meaning `no tag': the
> client can use it, when establishing a connection, to over-
> ride tag matching in version and session messages.

This is poorly worded. The Tsession page states that the tag must be ~0; saying here that the client "can" use it makes it sound optional. Also, can a client use a tag of ~0 on a Tversion transaction that is not the first transaction on a new connection?

This whole ~0 feature is undesirable and adds useless complexity. Tsession could just as well require a tag of 0, since it flushes all tags. It could guarantee that the reply tag is 0. And there is no reason that Tversion can't or shouldn't be required to just have a normal tag. Then you can completely eliminate special cases in servers (treatment of ~0 tags) and clients (the need to avoid accidentally generating ~0 tags).

> The version message identifies the version of the protocol
> and indicates the maximum message size the system is pre-
> pared to handle.  A session request initializes a connection
> and aborts all outstanding I/O on the connection.  The set
> of messages between session requests is called a session.
I've always wondered why the session transaction has these abort semantics. It is easy to see the exchange of authentication data as a justification for a required Tsession-style message at the beginning of a session, and it is also easy to see that different protocol versions might require different session messages, hence justifying the existence of Tversion as a separate transaction required before Tsession. But I don't understand the point of the abort semantics.

My guess is that they are intended to support some kind of persistent channel to a server, analogous to a hard-wired serial port, where there is no out-of-band concept of channel setup or teardown (unlike TCP, where connection setup and teardown can be detected independently of the bytes transmitted on the virtual circuit). So, for example, a client reboot could result in a new Tsession on such an imaginary hard-wired connection.

The problem with this idea is that it is insufficient to give reliable behavior in the face of arbitrary client or server crashes or misbehavior, since the 9P encoding (and the proposed 9P2000 encoding) provides no easy way to resynchronize with the byte stream if you somehow lose track of transaction boundaries. (OK, can you tell that I've been implementing SONET recently? :-)

I would like to suggest that the "abort" semantics be removed from Tsession. Admit that an underlying transport protocol will always need to exist, specify that only one Tsession message can ever be sent during the lifetime of a connection, and specify that outstanding transactions are aborted when the underlying protocol's connection is shut down.

If I have misunderstood the point of the "abort" semantics of Tsession, please explain why it's there.

> The stat transaction retrieves information about the file.
> The stat field in the reply includes the file's name, access
> permissions (read, write and execute for owner, group and
> public), access and modification times, and owner and group
> identifications (see stat(2)).  The owner and group identifi-
> cations are textual names.  The wstat transaction allows
> some of a file's properties to be changed.

Again, I would like to lobby for protocol-imposed restrictions on the legal contents of owner and group names.

> DIRECTORIES
> Directories are created by create with DMDIR set in the per-
> missions argument (see stat(5)).  The members of a directory
> can be found with read(5).  All directories must support
> walks to the directory .. (dot-dot) meaning parent direc-
> tory, although by convention directories contain no explicit
> entry for .. or . (dot).  The parent of the root directory
> of a server's tree is itself.

If I walk to foo/bar/.. does the protocol require that I return to foo? I.e. is the file hierarchy required to be strictly a tree? It looks to me like another one of those restrictions the protocol should impose for the sanity of the client: without such a restriction, the "lexical names" feature in the Plan 9 kernel could get hopelessly confused.

> CLUNK(5)                                                 CLUNK(5)
>
> Even if the clunk returns an error, the fid is no longer
> valid.

What are plausible errors associated with Tclunk (other than attempting to clunk an invalid fid)? The only thing I could think of would be deferred errors associated with earlier transactions that were not detected until later, like media errors associated with deferred writes. I assume the intent of allowing errors on Tclunk is that the error is returned as the result of the close() system call? (Of course, not every Tclunk corresponds to a close()...)

> ERROR(5)                                                 ERROR(5)
>
> By convention, clients may truncate error messages after 255
> bytes, defined as ERRMAX in .
Translation: the server ought to make sure the meat of the error message fits into the first 255 bytes; otherwise the user of the client might not see it.

> READ(5)                                                   READ(5)
>
> For directories, read returns an integral number of direc-
> tory entries exactly as in stat (see stat(5)), one for each
> member of the directory.  The read request message must have
> offset equal to zero or the value of offset in the previous
> read on the directory, plus the number of bytes returned in
> the previous read.  In other words, seeking other than to
> the beginning is illegal in a directory (see seek(2)).

What happens if I have a directory entry SomeReallyLongStupidFileNameFromJava and I attempt to read() fewer than the bytes required for the associated stat structure? This could happen two ways: (A) the client application might have just issued a really small read request, or (B) the byte count of the directory entry might result in the required size of the Rread message exceeding the negotiated maximum transaction size between the 9P client and server.

Scenario (A) can always be handled at the client application level by seeking to the beginning of the directory and rescanning with a larger buffer. (Or by just always using a 64K+1 read in the first place, darnit.) So scenario (A) is not a serious threat to the integrity of the underlying protocol design.

Scenario (B) is bad. There is no easy way for the client to recover. Certainly the client application can do nothing about it: the protocol connection is already established and the msize is fixed in stone. At the protocol level, one hypothetical solution might be for the server to return some kind of error cookie that:

1) Indicates there was a really long directory entry.
2) Returns a new offset that the client can use to read beyond the directory entry that didn't fit. This is important--we wouldn't want it to be possible to "hide" files behind ReallyLongDirectoryEntries.
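The offset rule and the two scenarios can be sketched as server-side bookkeeping. Everything below is hypothetical: the struct, the function, and the result codes are invented for illustration, and the sketch assumes an Rread carries an 11-byte header (size[4] Rread[1] tag[2] count[4]) ahead of its data. It shows that a server can tell scenario (A) from scenario (B), but in case (B) the current draft gives it nothing useful to send back.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed Rread header: size[4] Rread[1] tag[2] count[4]. */
enum { RREAD_HDR = 4 + 1 + 2 + 4 };

enum dirread_result { DR_OK, DR_EOF, DR_BAD_OFFSET, DR_SHORT_COUNT, DR_EXCEEDS_MSIZE };

/* Hypothetical per-fid state for reading one directory. */
struct dirstate {
	uint64_t next_offset;	/* the only legal nonzero offset for the next read */
	size_t next_entry;	/* index of the next entry to return */
};

/*
 * Decide how many bytes of whole entries to return. entry_len[i] is the
 * marshalled size of entry i's stat record. On DR_OK, *nbytes is set
 * and the state advances; on any other result the state is untouched
 * (except that offset 0 always restarts the scan).
 */
enum dirread_result
dirread_plan(struct dirstate *st, uint64_t offset, uint32_t count,
	uint32_t msize, const uint32_t *entry_len, size_t nentries,
	uint32_t *nbytes)
{
	uint32_t limit, used = 0;
	size_t i;

	*nbytes = 0;
	if (offset == 0) {
		st->next_offset = 0;	/* rereading from the start is allowed */
		st->next_entry = 0;
	} else if (offset != st->next_offset)
		return DR_BAD_OFFSET;	/* any other seek in a directory is illegal */
	if (st->next_entry >= nentries)
		return DR_EOF;

	/* Room for data in one Rread under the negotiated msize. */
	limit = msize > RREAD_HDR ? msize - RREAD_HDR : 0;
	if (entry_len[st->next_entry] > limit)
		return DR_EXCEEDS_MSIZE;	/* scenario (B): can never be sent */
	if (count < limit)
		limit = count;

	for (i = st->next_entry; i < nentries && entry_len[i] <= limit - used; i++)
		used += entry_len[i];
	if (used == 0)
		return DR_SHORT_COUNT;	/* scenario (A): client buffer too small */

	st->next_entry = i;
	st->next_offset += used;
	*nbytes = used;
	return DR_OK;
}
```

In case (A) the client can retry from offset 0 with a bigger count; in case (B) no retry on the same connection can ever succeed, which is why the message argues for making (B) impossible.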
Another hypothetical solution: the server could have a notion of a "truncated stat structure" that returns as much as will fit, plus the real offset to the next directory entry.

Both of these possibilities are needlessly complex. Better if scenario (B) could never happen. It would be easy for the server to ensure this by preventing such files from ever being created in the first place -- except for one tiny hitch: the server cannot exceed the msize that the client previously requested in Tversion. So a client that negotiates a too-small msize can make scenario (B) possible.

Rather than adding a complex special-case response to the server's repertoire that all clients would have to know about, I'd prefer to legislate this situation out of existence: add a "minimum maximum" to Tversion. Require that the smallest msize a client may request is 64K plus some slop, enough to hold an Rread containing one worst-case stat structure. Then the need for a way to recover from scenario (B) is removed from the protocol.

If 64K+slop is unpalatably large, consider specifying a smaller maximum possible stat record, say 8K-slop, so that the minimum msize becomes exactly 8K.

> STAT(5)                                                   STAT(5)
>
> name[ s ]
>      file name; must be / if the file is the root directory
>      of the server

Not to beat a dead horse, but other than this one exception, *please* outlaw /'s in file names throughout the protocol.

> Servers may implement a time-
> out on the lock on an exclusive use file: if the fid holding
> the file open has been unused for an extended period (of
> order at least minutes), it is reasonable to break the lock
> and deny the initial fid further I/O.

Consider specifying an allowable minimum and a required maximum timeout? This is one of those situations where you know that whatever you specify will be wrong, but it's still better to have a specification so that all implementations will be broken in exactly the same way.
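The "minimum maximum" msize proposal above reduces to two checks: refuse any create whose stat record could not be sent, and refuse any Tversion whose msize could not carry one such record in an Rread. A minimal C sketch follows. The constant and function names are hypothetical; MAXSTAT takes the 65535-byte limit quoted from stat(5), and the fixed-field layout behind STATFIXLEN (size[2] type[2] dev[4] qid[13] mode[4] atime[4] mtime[4] length[8], then four strings with 2-byte counts) is an assumption for illustration that may not match the draft exactly.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical constants. MAXSTAT assumes a whole stat record,
 * including its own size[2] field, must fit in 65535 bytes.
 * STATFIXLEN assumes the fixed fields listed in the lead-in plus
 * a 2-byte count for each of name, uid, gid and muid.
 */
enum {
	MAXSTAT = 65535,
	RREAD_HDR = 4 + 1 + 2 + 4,	/* size[4] Rread[1] tag[2] count[4] */
	STATFIXLEN = 2 + 2 + 4 + 13 + 4 + 4 + 4 + 8 + 4 * 2,
	MIN_MSIZE = RREAD_HDR + MAXSTAT	/* one worst-case entry per Rread */
};

/* Marshalled stat size for a file with the given string lengths. */
size_t
stat_size(size_t name, size_t uid, size_t gid, size_t muid)
{
	return STATFIXLEN + name + uid + gid + muid;
}

/* Tcreate side: refuse a file whose stat record could never be sent. */
int
create_allowed(size_t name, size_t uid, size_t gid, size_t muid)
{
	return stat_size(name, uid, gid, muid) <= MAXSTAT;
}

/* Tversion side: refuse an msize below the proposed minimum. */
int
msize_acceptable(uint32_t msize)
{
	return msize >= MIN_MSIZE;
}
```

With these assumptions MIN_MSIZE works out to 65546 bytes. For the 8K variant, MAXSTAT would shrink to 8192 - RREAD_HDR so that MIN_MSIZE is exactly 8192.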
> The two time fields are measured in seconds since the epoch
> (Jan 1 00:00 1970 GMT).  The mtime field reflects the time
> of the last change of content (except when later changed by
> wstat).  For a plain file, mtime is the time of the most
> recent create, open with truncation, or write; for a direc-
> tory it is the time of the most recent remove, create, or
> wstat of a file in the directory.  Similarly, the atime
> field records the last read of the contents; also it is set
> whenever mtime is set.  In addition, for a directory, it is
> set by an attach, walk, or create, all whether successful or
> not.

Consider changing the time fields to 64 bits. 2038 is not so far away. Also, for the benefit of programs like mk, it would arguably be desirable for timestamps to have finer granularity than 1 second in today's world of very fast computers (although I suppose mk could detect "instantaneous" commands by looking for changed qid.versions). Say 1 microsecond? 64 bits offers a lot of room...

> The wstat request can change some of the file status infor-
> mation.  [...]  The length can be
> changed (affecting the actual length of the file) by anyone
> with write permission on the file.  It is an error to
> attempt to set the length of a directory to a non-zero
> value, and servers may decide to reject length changes for
> other reasons.

Assuming the server does *not* reject truncation of a directory to length 0, should a client assume that all files under the directory have been removed? This is another one of those possible complications that I think should be eliminated by specifying them out of the protocol: always reject attempts by wstat to change the length of a directory.

> None
> of the other data can be altered by a wstat.  In particular,
> there is no way to change the owner of a file.

This is not true in existing implementations: for example, with "disk/kfscmd allow", I can change file ownership.
Moreover, this is a necessary feature for system administration, to ensure that system files have the right owners. I would argue that the protocol allows you to request a change of ownership, and that it is at the server's discretion whether to allow or reject it, according to the security policy of the server, which should not be considered part of the protocol.

In fact, I would go a bit further: the whole concept of "group leaders" is a weird Plan 9 thing that is not true on, say, a Unix-based server. So it should also be at the server's discretion whether to accept or reject group changes, again according to a security policy that is considered outside the scope of the protocol.

Changes in ownership, group, or permissions that are refused should always result in an Rerror. (All right, I see you've covered that later in the "all or nothing" clause...)

(And the discussion of the main Plan 9 file server's security policy should really be on some other manual page, not in the definition of 9P.)

Now at this point I suppose you'll jump on me and argue that here I am arguing for server-dependent variations in behavior, whereas above (on file names, owner/group names, and the meaning of ..) I was arguing for required uniform behavior across all servers. The reason is that here I consider implementation-dependent variations less harmful, since relatively few programs normally want to mess with file ownership, and those that do have a reasonable expectation of the operations failing anyway. In contrast, non-uniform rules for allowable file, owner, and group names or the meaning of .. would pervasively break a whole lot of programs, like any script that wants to parse the output of "ls -l" or expects "cd .." to go somewhere reliable.

> Note that since the stat information is sent as a 9P
> variable-length datum, it is limited to a maximum of 65535
> bytes.
So what should happen if I use Tcreate to create a file whose name is so long that the associated stat structure would exceed 65535 bytes? I would argue that the Tcreate man page should explicitly say such requests must always fail.

> VERSION(5)                                             VERSION(5)
>
> NAME
>      version - negotiate protocol version
>
> SYNOPSIS
>      Tversion size[4] tag[2] msize[4] version[s]
>      Rversion size[4] tag[2] msize[4] version[s]
>
> DESCRIPTION
>      The version request negotiates the protocol version and mes-
>      sage size to be used on the connection.  Tversion must be
>      the first message sent on the 9P connection, and the client
>      cannot issue any further requests until it has received the
>      Rversion reply.

Can you issue another Tversion later? I would argue that it should be explicitly prohibited, even more strongly than I previously argued that multiple Tsessions should be prohibited.

>      The client suggests a maximum message size, msize, that is
>      the maximum length, in bytes, it will ever generate or
>      expect to receive in a single 9P message.

As previously mentioned, please specify a minimum msize that a client is allowed to request, and make the largest possible stat record consistent with the value of this minimal msize.

> WALK(5)                                                   WALK(5)

Interesting: this subsumes the old "clwalk", and also subsumes the old "clone" via the subterfuge of zero-element walks.

>      The element ``..'' (dot-dot) represents the parent direc-
>      tory.  The name ``.'' (dot), meaning the current directory,
>      is not used in the protocol.
>
>      It is legal for nwname to be zero, in which case newfid will
>      represent the same file as fid and the walk will usually
>      succeed; this is equivalent to walking to dot.  The rest of
>      this discussion assumes nwname is greater than zero.

What do these two paragraphs, taken together, mean for the mnt(3) device? When mnt(3) sees the name "foo/./bar", is it expected to generate walk("foo", "", "bar"), or is it expected to generate walk("foo", "bar")?
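For comparison, here is a minimal client-side C sketch of the second option: split a relative path into walk elements, dropping both "." and empty components (from "//" or a trailing slash) so the server never sees either. The function `walk_elements` is hypothetical and is not how the Plan 9 kernel's mnt(3) actually builds walks; ".." is kept because it is a legal walk element.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split a relative path such as "foo/./bar" into
 * walk elements, eliding "." and empty components. buf is modified
 * in place (each '/' becomes a NUL) and elems[] points into it.
 * Returns the element count, or -1 if more than maxel are needed.
 */
int
walk_elements(char *buf, char **elems, int maxel)
{
	int n = 0;
	char *p = buf, *slash;

	for (;;) {
		slash = strchr(p, '/');
		if (slash != NULL)
			*slash = '\0';
		if (*p != '\0' && strcmp(p, ".") != 0) {
			if (n == maxel)
				return -1;
			elems[n++] = p;
		}
		if (slash == NULL)
			return n;
		p = slash + 1;
	}
}
```

Applied to "foo/./bar", this yields the two elements "foo" and "bar", i.e. walk("foo", "bar").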
I would argue that walk("") should be simply disallowed: if the mnt(3) device needs to elide walks to ".", it might as well elide walks to "" too; that way you can eliminate a special case that would otherwise need to be explicitly coded in all servers.

>      If the first element cannot be walked for any reason, Rerror
>      is returned.  Otherwise, the walk will return an Rwalk mes-
>      sage containing nqid qids corresponding, in order, to the
>      files that are visited by the nqid successful elementwise
>      walks; nqid is therefore either nwname or the index of the
>      first elementwise walk that failed.  The value of nqid can-
>      not be zero unless nwname is zero.  Also, nqid will always
>      be less than or equal to nwname.  Only if it is equal, how-
>      ever, will newfid be affected, in which case it will repre-
>      sent the file reached by the final elementwise walk
>      requested in the message.

If the walk operation fails, does newfid exist (and point to the same qid as fid), or is it implicitly clunked? My suggestion: if the walk fails, newfid should be implicitly clunked unless it was equal to fid.
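To make the suggested failure rule concrete, here is a toy server-side walk over an in-memory tree, written in C. The tree, the fid table, and `walk` are all hypothetical, and rejecting "" and "." as elements follows this message's proposal rather than the draft text. Because every node has exactly one parent, ".." is unambiguous, which is the strict-tree property argued for earlier.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory tree: one parent per node, root is its own parent. */
struct node {
	const char *name;
	struct node *parent;
	struct node **kids;	/* NULL-terminated list, or NULL for a plain file */
};

enum { NFID = 16 };
static struct node *fids[NFID];	/* NULL means the fid is not in use */

static struct node *
lookup(struct node *dir, const char *elem)
{
	struct node **k;

	if (strcmp(elem, "..") == 0)
		return dir->parent;
	if (dir->kids == NULL)
		return NULL;	/* cannot walk out of a plain file */
	for (k = dir->kids; *k != NULL; k++)
		if (strcmp((*k)->name, elem) == 0)
			return *k;
	return NULL;
}

/*
 * Walk fid through wname[0..nwname-1]. Returns nqid, or -1 for Rerror.
 * Proposed rules: "" and "." always fail, and unless the walk fully
 * succeeds, newfid ends up unallocated (clunked) when it differs from fid.
 */
int
walk(int fid, int newfid, const char **wname, int nwname)
{
	struct node *n, *next;
	int i;

	if (fid < 0 || fid >= NFID || newfid < 0 || newfid >= NFID || fids[fid] == NULL)
		return -1;
	if (newfid != fid && fids[newfid] != NULL)
		return -1;	/* newfid must not already be in use */

	n = fids[fid];
	for (i = 0; i < nwname; i++) {
		if (wname[i][0] == '\0' || strcmp(wname[i], ".") == 0)
			next = NULL;	/* never legal walk elements */
		else
			next = lookup(n, wname[i]);
		if (next == NULL)
			break;
		n = next;
	}
	if (i == nwname) {
		fids[newfid] = n;	/* full success (or zero-element clone) */
		return i;
	}
	if (newfid != fid)
		fids[newfid] = NULL;	/* failure: newfid implicitly clunked */
	return i == 0 ? -1 : i;	/* first element failed: Rerror */
}
```

A partial walk still reports nqid so the client learns how far it got, but it never leaves a half-walked newfid behind.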