From mboxrd@z Thu Jan 1 00:00:00 1970
From: "rob pike"
To: 9fans@cse.psu.edu
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Message-Id: <20010127215828.D7A06199D5@mail.cse.psu.edu>
Subject: [9fans] 9P2000
Date: Sat, 27 Jan 2001 16:58:26 -0500
Topicbox-Message-UUID: 53af3f14-eac9-11e9-9e20-41e7f4b1d025

I've been thinking of sending this information to 9fans for a while. Since the cat is out of the bag, now is as good a time as any.

We have reworked 9P to address many of its failings, most important:

1) Nesting and encapsulation: exportfs embeds 9P within 9P, which can make reads and writes not fit within the 8K limit.
2) Walk performance: it takes too many walks to evaluate a name.
3) Sizes fixed and too small: read/write sizes and, most important, path name elements have limited, too-small sizes.
4) Authentication too rigid: the authentication protocols were defined in the protocol and so impossible to change.

And a host of other lesser things.

We have a file server and kernel running this protocol now and have adapted much but not all of our stuff; it's not yet the system we live with.

Comments on the following man pages are welcome. I've included all of section 5 (9P itself, now 9P2000) and some relevant parts of section 2. Directory handling is very different, for example. There may be many errors in these pages and many details are sure to change before we're done.

Until this stuff gets installed and a lot of shaking down has happened, there won't be much in the way of updates to the existing distribution. That is the real reason things have seemed quiet lately, not the Lucent announcements.

-rob

INTRO(5)

NAME
    intro - introduction to the Plan 9 File Protocol, 9P

SYNOPSIS
    #include <fcall.h>

DESCRIPTION
    A Plan 9 server is an agent that provides one or more hierarchical file systems - file trees - that may be accessed by Plan 9 processes.
    A server responds to requests by clients to navigate the hierarchy, and to create, remove, read, and write files. The prototypical server is a separate machine that stores large numbers of user files on permanent media; such a machine is called, somewhat confusingly, a file server. Another possibility for a server is to synthesize files on demand, perhaps based on information on data structures inside the kernel; the proc(3) kernel device is a part of the Plan 9 kernel that does this. User programs can also act as servers.

    A connection to a server is a bidirectional communication path from the client to the server. There may be a single client or multiple clients sharing the same connection. A server's file tree is attached to a process group's name space by bind(2) and mount calls; see intro(2). Processes in the group are then clients of the server: system calls operating on files are translated into requests and responses transmitted on the connection to the appropriate service.

    The Plan 9 File Protocol, 9P, is used for messages between clients and servers. A client transmits requests (T-messages) to a server, which subsequently returns replies (R-messages) to the client. The combined acts of transmitting (receiving) a request of a particular type, and receiving (transmitting) its reply is called a transaction of that type.

    Each message consists of a sequence of bytes. Two-, four-, and eight-byte fields hold unsigned integers represented in little-endian order (least significant byte first). Data items of larger or variable lengths are represented by a two-byte field specifying a count, n, followed by n bytes of data. Text strings are represented this way, with the text itself stored as a UTF-8 encoded sequence of Unicode characters (see utf(6)). Text strings in 9P messages are not NUL-terminated: n counts the bytes of UTF-8 data, which include no final zero byte.
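The integer and string encodings just described can be sketched in portable C. This is only an illustration under assumptions: the function names put16, put32, and putstr are made up here and are not the routines of fcall(2), and the code is ordinary C99 rather than Plan 9 C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a two-byte unsigned integer, least significant byte first. */
size_t put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
    return 2;
}

/* Store a four-byte unsigned integer, least significant byte first. */
size_t put32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
    return 4;
}

/*
 * Store a text string as a two-byte count followed by that many
 * bytes of UTF-8. No terminating NUL is written. Returns the number
 * of bytes stored, or 0 if the string does not fit in the two-byte
 * count or in the room available.
 */
size_t putstr(uint8_t *p, size_t room, const char *s)
{
    size_t n;

    assert(p != NULL && s != NULL);
    n = strlen(s);
    if (n > 0xffff || room < 2 + n)
        return 0;
    put16(p, (uint16_t)n);
    memcpy(p + 2, s, n);
    return 2 + n;
}
```

Packing the version string 9P2000 this way would produce the count bytes 06 00 followed by the six text bytes, eight bytes in all.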
    The NUL character is illegal in all text strings in 9P, and is therefore excluded from file names, user names, and so on.

Page 1 Plan 9 (printed 1/27/00)

    Each 9P message begins with a four-byte size field specifying the length in bytes of the complete message including the four bytes of the size field itself. The next byte is the message type, one of the constants in the enumeration in the include file <fcall.h>. The remaining bytes are parameters of different sizes. In the message descriptions below, the number of bytes in a field is given in brackets after the field name. The notation parameter[n] where n is not a constant represents a variable-length parameter: n[2] followed by n bytes of data forming the parameter. The notation string[s] (using a literal s character) is shorthand for s[2] followed by s bytes of UTF-8 text. (Systems may choose to reduce the set of legal characters to reduce syntactic problems, for example to remove slashes from name components, but the protocol has no such restriction. Plan 9 names may contain any printable character (that is, any character outside hexadecimal 00-1F and 80-9F) except slash.) Messages are transported in byte form to allow for machine independence; fcall(2) describes routines that convert to and from this form into a machine-dependent C structure.
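As a rough sketch of the framing described above, the fragment below decodes the fixed part of a message. It assumes the field order size[4], then the one-byte type, then tag[2]; the text gives size and type in that order, and the placement of the tag directly after the type is an assumption made for illustration. The names struct hdr, HDRSZ, get32, and gethdr are hypothetical and do not come from fcall(2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { HDRSZ = 4 + 1 + 2 };     /* size[4] type[1] tag[2], assumed order */

struct hdr {
    uint32_t size;              /* whole message, size field included */
    uint8_t  type;              /* message type constant */
    uint16_t tag;               /* chosen by the client */
};

/* Fetch a four-byte little-endian unsigned integer. */
uint32_t get32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Decode the fixed header from the first n bytes of buf.
 * Returns 0 on success, or -1 if fewer than HDRSZ bytes are
 * available or the size field is too small to cover the header
 * that it is required to include.
 */
int gethdr(const uint8_t *buf, size_t n, struct hdr *h)
{
    assert(buf != NULL && h != NULL);
    if (n < (size_t)HDRSZ)
        return -1;
    h->size = get32(buf);
    if (h->size < (uint32_t)HDRSZ)
        return -1;
    h->type = buf[4];
    h->tag = (uint16_t)(buf[5] | buf[6] << 8);
    return 0;
}
```

Because size counts its own four bytes, a reader that has consumed the size field still has size-4 bytes of the message left to read from the connection.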
MESSAGES
    Tversion size[4] tag[2] msize[4] version[s]
    Rversion size[4] tag[2] msize[4] version[s]
    Tsession size[4] tag[2] chal[n]
    Rsession size[4] tag[2] chal[n] authid[s] authdom[s]
    Rerror size[4] tag[2] ename[s]
    Tflush size[4] tag[2] oldtag[4]
    Rflush size[4] tag[2]
    Tattach size[4] tag[2] fid[4] uname[s] aname[s] auth[n]
    Rattach size[4] tag[2] qid[13] rauth[n]
    Twalk size[4] tag[2] fid[4] newfid[4] nwname[2] nwname*(wname[s])
    Rwalk size[4] tag[2] nwqid[2] nwqid*(wqid[13])
    Topen size[4] tag[2] fid[4] mode[1]
    Ropen size[4] tag[2] qid[13] iounit[4]
    Tcreate size[4] tag[2] fid[4] name[s] perm[4] mode[1]
    Rcreate size[4] tag[2] qid[13] iounit[4]
    Tread size[4] tag[2] fid[4] offset[8] count[4]
    Rread size[4] tag[2] count[4] data[count]
    Twrite size[4] tag[2] fid[4] offset[8] count[4] data[count]
    Rwrite size[4] tag[2] count[4]
    Tclunk size[4] tag[2] fid[4]
    Rclunk size[4] tag[2]
    Tremove size[4] tag[2] fid[4]
    Rremove size[4] tag[2]
    Tstat size[4] tag[2] fid[4]
    Rstat size[4] tag[2] stat[n]
    Twstat size[4] tag[2] fid[4] stat[n]
    Rwstat size[4] tag[2]

    Each T-message has a tag field, chosen and used by the client to identify the message. The reply to the message will have the same tag. Clients must arrange that no two outstanding messages on the same connection have the same tag. An exception is the tag ~0, meaning `no tag': the client can use it, when establishing a connection, to override tag matching in version and session messages.

    The type of an R-message will either be one greater than the type of the corresponding T-message or Rerror, indicating that the request failed. In the latter case, the ename field contains a string describing the reason for failure.

    The version message identifies the version of the protocol and indicates the maximum message size the system is prepared to handle. A session request initializes a connection and aborts all outstanding I/O on the connection.
    The set of messages between session requests is called a session.

    Most T-messages contain a fid, a 32-bit unsigned integer that the client uses to identify a ``current file'' on the server. Fids are somewhat like file descriptors in a user process, but they are not restricted to files open for I/O: directories being examined, files being accessed by stat(2) calls, and so on - all files being manipulated by the operating system - are identified by fids. Fids are chosen by the client. All requests on a connection share the same fid space; when several clients share a connection, the agent managing the sharing must arrange that no two clients choose the same fid.

    The first fid supplied (in an attach message) will be taken by the server to refer to the root of the served file tree. The attach identifies the user to the server and may specify a particular file tree served by the server (for those that supply more than one). A walk message causes the server to change the current file associated with a fid to be a file in the directory that is the old current file, or one of its subdirectories. Walk returns a new fid that refers to the resulting file. Usually, a client maintains a fid for the root, and navigates by walks from the root fid.

    A client can send multiple T-messages without waiting for the corresponding R-messages, but all outstanding T-messages must specify different tags. The server may delay the response to a request on one fid and respond to later requests on other fids; this is sometimes necessary, for example when the client reads from a file that the server synthesizes from external events such as keyboard characters.

    Replies (R-messages) to attach, walk, open, and create requests convey a qid field back to the client. The qid represents the server's unique identification for the file being accessed: two files on the same server hierarchy are the same if and only if their qids are the same.
    (The client may have multiple fids pointing to a single file on a server and hence having a single qid.) The thirteen-byte qid fields hold a one-byte type, specifying whether the file is a directory, append-only file, etc., and two unsigned integers: first the four-byte qid version, then the eight-byte qid path. The path is an integer unique among all files in the hierarchy. If a file is deleted and recreated with the same name in the same directory, the old and new path components of the qids should be different. The version is a version number for a file; typically, it is incremented every time the file is modified.

    An existing file can be opened, or a new file may be created in the current (directory) file. I/O of a given number of bytes at a given offset on an open file is done by read and write.

    A client should clunk any fid that is no longer needed. The remove transaction deletes files.

    The stat transaction retrieves information about the file. The stat field in the reply includes the file's name, access permissions (read, write and execute for owner, group and public), access and modification times, and owner and group identifications (see stat(2)). The owner and group identifications are textual names. The wstat transaction allows some of a file's properties to be changed.

    A request can be aborted with a Tflush request. When a server receives a Tflush, it should not reply to the message with tag oldtag (unless it has already replied), and it should immediately send an Rflush. The client must wait until it gets the Rflush (even if the reply to the original message arrives in the interim), at which point oldtag may be reused.

    Most programs do not see the 9P protocol directly; instead calls to library routines that access files are translated by the mount driver, mnt(3), into 9P messages.

DIRECTORIES
    Directories are created by create with DMDIR set in the permissions argument (see stat(5)).
    The members of a directory can be found with read(5). All directories must support walks to the directory .. (dot-dot) meaning parent directory, although by convention directories contain no explicit entry for .. or . (dot). The parent of the root directory of a server's tree is itself.

ACCESS PERMISSIONS
    Each file server maintains a set of user and group names. Each user can be a member of any number of groups. Each group has a group leader who has special privileges (see stat(5) and users(6)). Every file request has an implicit user id (copied from the original attach) and an implicit set of groups (every group of which the user is a member).

    Each file has an associated owner and group id and three sets of permissions: those of the owner, those of the group, and those of ``other'' users. When the owner attempts to do something to a file, the owner, group, and other permissions are consulted, and if any of them grant the requested permission, the operation is allowed. For someone who is not the owner, but is a member of the file's group, the group and other permissions are consulted. For everyone else, the other permissions are used. Each set of permissions says whether reading is allowed, whether writing is allowed, and whether executing is allowed. A walk in a directory is regarded as executing the directory, not reading it. Permissions are kept in the low-order bits of the file mode: owner read/write/execute permission represented as 1 in bits 8, 7, and 6 respectively (using 0 to number the low order). The group permissions are in bits 5, 4, and 3, and the other permissions are in bits 2, 1, and 0.

    The file mode contains some additional attributes besides the permissions. If bit 31 is set, the file is a directory; if bit 30 is set, the file is append-only (offset is ignored in writes); if bit 29 is set, the file is exclusive-use (only one client may have it open at a time). These bits are reproduced, from the top bit down, in the type byte of the Qid.
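The permission rule above can be written down in a few lines of C. This is a sketch, not a server's actual code: the names allowed, qidtype, PREAD, PWRITE, and PEXEC are invented for the example, and the caller is assumed to have already decided whether the requesting user is the owner or a member of the file's group.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the bits of a single rwx triple. */
enum { PREAD = 4, PWRITE = 2, PEXEC = 1 };

/*
 * Decide whether a request needing the permissions in want is
 * allowed. The owner is granted the union of the owner, group, and
 * other triples; a non-owner group member the union of group and
 * other; everyone else only other.
 */
int allowed(uint32_t mode, int isowner, int ingroup, unsigned want)
{
    uint32_t granted;

    assert((want & ~7u) == 0);
    granted = mode & 7;                 /* other: bits 2, 1, 0 */
    if (isowner || ingroup)
        granted |= (mode >> 3) & 7;     /* group: bits 5, 4, 3 */
    if (isowner)
        granted |= (mode >> 6) & 7;     /* owner: bits 8, 7, 6 */
    return (granted & want) == want;
}

/* The qid type byte repeats the top eight bits of the mode word. */
uint8_t qidtype(uint32_t mode)
{
    return (uint8_t)(mode >> 24);
}
```

With mode 0640, for example, the owner may write, a group member may read but not write, and other users may do neither. A directory with mode bit 31 set yields a qid type of 0x80.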
ATTACH(5)

NAME
    attach, session - messages to initiate activity

SYNOPSIS
    Tsession size[4] tag[2] chal[n]
    Rsession size[4] tag[2] chal[n] authid[s] authdom[s]
    Tattach size[4] tag[2] fid[4] uid[s] aname[s] auth[n]
    Rattach size[4] tag[2] qid[13] rauth[n]

DESCRIPTION
    The session request initializes a connection between a client and a server and exchanges authentication information. All outstanding I/O on the connection is aborted. The set of messages between session requests is called a session. The host's user name (authid) and its authentication domain (authdom) identify the key to be used when authenticating to this host. The exchanged challenges (chal) are used in the authentication algorithm. If authid is an empty string no authentication is performed in this session. The tag should be NOTAG (value ~0) for a session message.

    The attach message serves as a fresh introduction from a user on the client machine to the server. The message identifies the user (uid) and may select the file tree to access (aname). The auth argument contains authorization data derived from the exchanged challenges of the session message; see auth(6).

    As a result of the attach transaction, the client will have a connection to the root directory of the desired file tree, represented by fid. An error is returned if fid is already in use. The server's idea of the root of the file tree is represented by the returned qid.

ENTRY POINTS
    An attach transaction will be generated for kernel devices (see intro(3)) when a system call evaluates a file name beginning with `#'. Pipe(2) generates an attach on the kernel device pipe(3). The mount system call (see bind(2)) generates an attach message to the remote file server. When the kernel boots, an attach is made to the root device, root(3), and then an attach is made to the requested file server machine.
SEE ALSO
    version(5), auth(6)

CLUNK(5)

NAME
    clunk - forget about a fid

SYNOPSIS
    Tclunk size[4] tag[2] fid[4]
    Rclunk size[4] tag[2]

DESCRIPTION
    The clunk request informs the file server that the current file represented by fid is no longer needed by the client. The actual file is not removed on the server unless the fid had been opened with ORCLOSE.

    Once a fid has been clunked, the same fid can be reused in a new walk or attach request.

    Even if the clunk returns an error, the fid is no longer valid.

ENTRY POINTS
    A clunk message is generated by close and indirectly by other actions such as failed open calls.

ERROR(5)

NAME
    error - return an error

SYNOPSIS
    Rerror size[4] tag[2] ename[s]

DESCRIPTION
    The Rerror message (there is no Terror) is used to return an error string describing the failure of a transaction. It replaces the corresponding reply message that would accompany a successful call; its tag is that of the request.

    By convention, clients may truncate error messages after 255 bytes, defined as ERRMAX in .

FLUSH(5)

NAME
    flush - abort a message

SYNOPSIS
    Tflush size[4] tag[2] oldtag[4]
    Rflush size[4] tag[2]

DESCRIPTION
    When the response to a request is no longer needed, such as when a user interrupts a process doing a read(2), a Tflush request is sent to the server to purge the pending response. The message being flushed is identified by oldtag. The semantics of flush depends on messages arriving in order.

    The server must answer the flush message immediately. If it recognizes oldtag as the tag of a pending transaction, it should abort any pending response and discard that tag. In either case, it should respond with an Rflush echoing the tag (not oldtag) of the Tflush message. A Tflush can never be responded to by an Rerror message.
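The server's side of this rule might look roughly like the following. It is a sketch under simplifying assumptions: pending transactions are kept in a small fixed table, tags are held as 16-bit values as in the tag[2] field, and the names struct pending, addpending, tflush, and MAXPENDING are all hypothetical. Aborting the work behind a pending request is reduced here to forgetting its tag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAXPENDING = 64 };       /* arbitrary table size for this sketch */

struct pending {
    uint16_t tag[MAXPENDING];   /* tags of requests not yet answered */
    size_t n;
};

/* Record a newly arrived request; -1 if the table is full or the tag is in use. */
int addpending(struct pending *p, uint16_t tag)
{
    assert(p != NULL);
    if (p->n == MAXPENDING)
        return -1;
    for (size_t i = 0; i < p->n; i++)
        if (p->tag[i] == tag)
            return -1;
    p->tag[p->n++] = tag;
    return 0;
}

/*
 * Handle a Tflush carrying tag and oldtag. If oldtag names a pending
 * request, it is discarded so that no reply will be sent for it.
 * Whether or not oldtag was recognized, the return value is the tag
 * to place in the Rflush: the tag of the Tflush itself, never oldtag.
 */
uint16_t tflush(struct pending *p, uint16_t tag, uint16_t oldtag)
{
    assert(p != NULL);
    for (size_t i = 0; i < p->n; i++) {
        if (p->tag[i] == oldtag) {
            p->n--;
            p->tag[i] = p->tag[p->n];   /* fill the hole with the last entry */
            break;
        }
    }
    return tag;
}
```

Note that there is no failure path: flushing an unknown or already answered tag simply finds nothing to discard and still produces an Rflush.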
    When the client sends a Tflush, it must wait to receive the corresponding Rflush before reusing oldtag for subsequent messages. If a response to the flushed request is received before the Rflush, the client must honor the response as if it had not been flushed, since the completed request may signify a state change in the server. For instance, Tcreate will have created a file and Twalk may have allocated a fid. If no response is received before the Rflush, the flushed transaction is considered to have been canceled, and should be treated as though it had never been sent.

    Several exceptional conditions are handled correctly by the above specification: sending multiple flushes for a single tag, flushing after a transaction is completed, flushing a Tflush, and flushing an invalid tag.

OPEN(5)

NAME
    open, create - prepare a fid for I/O on an existing or new file

SYNOPSIS
    Topen size[4] tag[2] fid[4] mode[1]
    Ropen size[4] tag[2] qid[13] iounit[4]
    Tcreate size[4] tag[2] fid[4] name[s] perm[4] mode[1]
    Rcreate size[4] tag[2] qid[13] iounit[4]

DESCRIPTION
    The open request asks the file server to check permissions and prepare a fid for I/O with subsequent read and write messages. The mode field determines the type of I/O: 0, 1, 2, and 3 mean read access, write access, read and write access, and execute access, to be checked against the permissions for the file. In addition, if mode has the OTRUNC (0x10) bit set, the file is to be truncated, which requires write permission (if the file is append-only, and permission is granted, the open succeeds but the file will not be truncated); if the mode has the ORCLOSE (0x40) bit set, the file is to be removed when the fid is clunked, which requires permission to remove the file from its directory. If other bits are set in mode they will be ignored. It is illegal to write a directory, truncate it, or attempt to remove it on close.
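A small C sketch of how the mode byte might be interpreted follows. OTRUNC and ORCLOSE and their values are from the text above; the names OREAD, OWRITE, ORDWR, and OEXEC for the values 0 to 3 follow the usual open(2) spelling but are not given in this page, and needperm and baddirmode are hypothetical helpers. The extra permission that ORCLOSE requires concerns the parent directory and is not modelled.

```c
#include <stdint.h>

enum {
    OREAD = 0, OWRITE = 1, ORDWR = 2, OEXEC = 3,   /* low two bits of mode */
    OTRUNC = 0x10,                                 /* truncate on open */
    ORCLOSE = 0x40                                 /* remove when clunked */
};

/*
 * Permissions on the file itself that an open with this mode must
 * be granted, expressed as an rwx triple (read 4, write 2, execute 1).
 * Truncation additionally requires write permission.
 */
unsigned needperm(uint8_t mode)
{
    static const unsigned access[4] = { 4, 2, 4 | 2, 1 };
    unsigned need = access[mode & 3];

    if (mode & OTRUNC)
        need |= 2;
    return need;
}

/*
 * Return 1 if the mode asks for something that is illegal on a
 * directory: writing it, truncating it, or removing it on close.
 */
int baddirmode(uint8_t mode)
{
    unsigned acc = mode & 3;

    return acc == OWRITE || acc == ORDWR ||
           (mode & (OTRUNC | ORCLOSE)) != 0;
}
```

For instance, a mode of OREAD|OTRUNC needs both read and write permission, which is why truncating a file one cannot write fails even when the open is nominally for reading.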
    If the file is marked for exclusive use (see stat(5)), only one client can have the file open at any time. That is, after such a file has been opened, further opens will fail until fid has been clunked. All these permissions are checked at the time of the open request; subsequent changes to the permissions of files do not affect the ability to read, write, or remove an open file.

    The create request asks the file server to create a new file with the name supplied, in the directory (dir) represented by fid, and requires write permission in the directory. The owner of the file is the implied user id of the request, the group of the file is the same as dir, and the permissions are the value of

        perm & (~0666 | (dir.perm & 0666))

    if a regular file is being created and

        perm & (~0777 | (dir.perm & 0777))

    if a directory is being created. This means, for example, that if the create allows read permission to others, but the containing directory does not, then the created file will not allow others to read the file.

    Finally, the newly created file is opened according to mode, and fid will represent the newly opened file. Mode is not checked against the permissions in perm. The qid for the new file is returned with the create reply message.

    Directories are created by setting the DMDIR bit (0x80000000) in the perm.

    The names . and .. are special; it is illegal to create files with these names.

    It is an error for either of these messages if the fid is already the product of a successful open or create message.

    An attempt to create a file in a directory where the given name already exists will be rejected; in this case, the create system call (see open(2)) uses open with truncation. The algorithm used by the create system call is: first walk to the directory to contain the file. If that fails, return an error. Next walk to the specified file.
    If the walk succeeds, send a request to open and truncate the file and return the result, successful or not. If the walk fails, send a create message. If that fails, it may be because the file was created by another process after the previous walk failed, so (once) try the walk and open again.

    For the behavior of create on a union directory, see bind(2).

    The iounit field returned by open and create may be zero. If it is not, it is the maximum number of bytes that are guaranteed to be read from or written to the file without breaking the I/O transfer into multiple 9P messages; see read(5).

ENTRY POINTS
    Open and create both generate open messages; only create generates a create message.

    For programs that need atomic file creation, without the race that exists in the open-create sequence described above, the kernel does the following. If the OEXCL (0x1000) bit is set in the mode for a create system call, the open message is not sent; the kernel issues only the create. Thus, if the file exists, create will draw an error, but if it doesn't and the create system call succeeds, the process issuing the create is guaranteed to be the one that created the file.

READ(5)

NAME
    read, write - transfer data from and to a file

SYNOPSIS
    Tread size[4] tag[2] fid[4] offset[8] count[4]
    Rread size[4] tag[2] count[4] data[count]
    Twrite size[4] tag[2] fid[4] offset[8] count[4] data[count]
    Rwrite size[4] tag[2] count[4]

DESCRIPTION
    The read request asks for count bytes of data from the file identified by fid, which must be opened for reading, starting offset bytes after the beginning of the file. The bytes are returned with the read reply message.

    The count field in the reply indicates the number of bytes returned. This may be less than the requested amount. If the offset field is greater than or equal to the number of bytes in the file, a count of zero will be returned.
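For an ordinary file of known length, the count in the reply can be bounded as sketched below. The function name readcount is made up for the example, and it computes only the largest count a server could return; a real server is free to return fewer bytes.

```c
#include <stdint.h>

/*
 * Upper bound on the count in an Rread for a plain file of the
 * given length: zero at or beyond end of file, otherwise the
 * smaller of the bytes requested and the bytes remaining.
 */
uint32_t readcount(uint64_t length, uint64_t offset, uint32_t count)
{
    uint64_t left;

    if (offset >= length)
        return 0;
    left = length - offset;
    return left < count ? (uint32_t)left : count;
}
```

A client reading sequentially can therefore treat a zero count as end of file and a short count as a reason to issue another read at the advanced offset.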
    For directories, read returns an integral number of directory entries exactly as in stat (see stat(5)), one for each member of the directory. The read request message must have offset equal to zero or the value of offset in the previous read on the directory, plus the number of bytes returned in the previous read. In other words, seeking other than to the beginning is illegal in a directory (see seek(2)).

    The write request asks that count bytes of data be recorded in the file identified by fid, which must be opened for writing, starting offset bytes after the beginning of the file. If the file has been opened append only, the data will be placed at the end of the file regardless of offset. Directories may not be written.

    The write reply records the number of bytes actually written. It is usually an error if this is not the same as requested.

    Because 9P implementations may limit the size of individual messages, more than one message may be produced by a single read or write call. The iounit field returned by open(5), if non-zero, reports the maximum size that is guaranteed to be transferred atomically.

ENTRY POINTS
    Read and write messages are generated by the corresponding calls. Although seek(2) affects the offset, it does not generate a message.

REMOVE(5)

NAME
    remove - remove a file from a server

SYNOPSIS
    Tremove size[4] tag[2] fid[4]
    Rremove size[4] tag[2]

DESCRIPTION
    The remove request asks the file server both to remove the file represented by fid and to clunk the fid, even if the remove fails. This request will fail if the client does not have write permission in the parent directory.

    It is correct to consider remove to be a clunk with the side effect of removing the file if permissions allow.

ENTRY POINTS
    Remove messages are generated by remove.
STAT(5)

NAME
    stat, wstat - inquire or change file attributes

SYNOPSIS
    Tstat size[4] tag[2] fid[4]
    Rstat size[4] tag[2] stat[n]
    Twstat size[4] tag[2] fid[4] stat[n]
    Rwstat size[4] tag[2]

DESCRIPTION
    The stat transaction inquires about the file identified by fid. The reply will contain a machine-independent directory entry, stat, laid out as follows:

        type[2]       for kernel use
        dev[4]        for kernel use
        qid.type[1]   the type of the file (directory, etc.), represented as a bit vector corresponding to the high 8 bits of the file's mode word
        qid.vers[4]   version number for given path
        qid.path[8]   the file server's unique identification for the file
        mode[4]       permissions and flags
        atime[4]      last access time
        mtime[4]      last modification time
        length[8]     length of file in bytes
        name[s]       file name; must be / if the file is the root directory of the server
        uid[s]        owner name
        gid[s]        group name
        muid[s]       name of the user who last modified the file

    Integers in this encoding are in little-endian order (least significant byte first). The convM2D and convD2M routines (see fcall(2)) convert between directory entries and C structs.

    This encoding may be turned into a machine dependent Dir structure (see stat(2)) using routines defined in fcall(2).

    The mode contains permission bits as described in intro(5) and the following: 0x80000000 (this file is a directory), 0x40000000 (append only), 0x20000000 (exclusive use); these are echoed in Qid.type. Writes to append-only files always place their data at the end of the file; the offset in the write message is ignored, as is the OTRUNC bit in an open. Exclusive use files may be open for I/O by only one fid at a time across all clients of the server. If a second open is attempted, it draws an error.
    Servers may implement a timeout on the lock on an exclusive use file: if the fid holding the file open has been unused for an extended period (of order at least minutes), it is reasonable to break the lock and deny the initial fid further I/O.

    The two time fields are measured in seconds since the epoch (Jan 1 00:00 1970 GMT). The mtime field reflects the time of the last change of content (except when later changed by wstat). For a plain file, mtime is the time of the most recent create, open with truncation, or write; for a directory it is the time of the most recent remove, create, or wstat of a file in the directory. Similarly, the atime field records the last read of the contents; also it is set whenever mtime is set. In addition, for a directory, it is set by an attach, walk, or create, all whether successful or not.

    The muid field names the user whose actions most recently changed the mtime of the file.

    The length records the number of bytes in the file. Directories and most files representing devices have a conventional length of 0.

    The stat request requires no special permissions.

    The wstat request can change some of the file status information. The name can be changed by anyone with write permission in the parent directory; it is an error to change the name to that of an existing file. The length can be changed (affecting the actual length of the file) by anyone with write permission on the file. It is an error to attempt to set the length of a directory to a non-zero value, and servers may decide to reject length changes for other reasons. The mode and mtime can be changed by the owner of the file or the group leader of the file's current group. The directory bit cannot be changed by a wstat; the other defined permission and mode bits can.
    The gid can be changed: by the owner if also a member of the new group; or by the group leader of the file's current group if also leader of the new group (see intro(5) for more information about permissions and users(6) for users and groups). None of the other data can be altered by a wstat. In particular, there is no way to change the owner of a file.

    Either all the changes in a wstat request happen, or none of them does: if the request succeeds, all changes were made; if it fails, none were.

    A wstat request can explicitly avoid modifying some properties of the file by providing explicit ``don't touch'' values in the stat data that is sent: zero-length strings for text values and ~0 for integral values.

    A read of a directory yields an integral number of directory entries in the machine independent encoding given above (see read(5)).

    Note that since the stat information is sent as a 9P variable-length datum, it is limited to a maximum of 65535 bytes.

ENTRY POINTS
    Stat messages are generated by fstat and stat.

    Wstat messages are generated by fwstat and wstat.

VERSION(5)

NAME
    version - negotiate protocol version

SYNOPSIS
    Tversion size[4] tag[2] msize[4] version[s]
    Rversion size[4] tag[2] msize[4] version[s]

DESCRIPTION
    The version request negotiates the protocol version and message size to be used on the connection. Tversion must be the first message sent on the 9P connection, and the client cannot issue any further requests until it has received the Rversion reply. The client suggests a maximum message size, msize, that is the maximum length, in bytes, it will ever generate or expect to receive in a single 9P message. This count includes all 9P protocol data, starting from the size field and extending through the message, but excludes enveloping transport protocols. The server responds with its own maximum, msize, which must be less than or equal to the client's value.
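The size negotiation amounts to taking the smaller of two limits, and the result bounds how much data one message can carry. The sketch below assumes a server that simply answers with the smaller value, and it counts a one-byte type field in the Twrite header (the type byte is described in intro(5) but not shown in the synopsis lines). The names negotiate, maxwrite, and TWRITEHDR are hypothetical.

```c
#include <stdint.h>

/* Fixed part of a Twrite: size[4] type[1] tag[2] fid[4] offset[8] count[4]. */
enum { TWRITEHDR = 4 + 1 + 2 + 4 + 8 + 4 };

/* The msize a server places in Rversion: never more than the client asked for. */
uint32_t negotiate(uint32_t client_msize, uint32_t server_max)
{
    return client_msize < server_max ? client_msize : server_max;
}

/*
 * Largest data[count] that fits in a single Twrite under the agreed
 * msize, or 0 if msize cannot even hold the fixed fields.
 */
uint32_t maxwrite(uint32_t msize)
{
    if (msize <= (uint32_t)TWRITEHDR)
        return 0;
    return msize - (uint32_t)TWRITEHDR;
}
```

Under these assumptions, an agreed msize of 8192 leaves 8192 - 23 = 8169 bytes of data per Twrite.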
    Thenceforth, both sides of the connection must honor this limit.

    The version string identifies the level of the protocol. The string must always begin with the two characters ``9P''. If the server does not understand the client's version string, it should respond with an Rversion message (not Rerror) with the version string the 7 characters ``unknown''.

    The server may respond with the client's version string, or a version string identifying an earlier defined protocol version. Currently, the only defined version is the 6 characters ``9P2000''. Version strings will be defined such that, if the client string contains one or more period characters, the initial substring up to but not including any single period in the version string defines a version of the protocol. Other version strings may also be valid, however.

    The client and server will use the protocol version defined by the server's response for all subsequent communication on the connection.

ENTRY POINTS
    The version message is generated by the kernel by the first mount system call on the connection.

WALK(5)

NAME
    walk - descend a directory hierarchy

SYNOPSIS
    Twalk size[4] tag[2] fid[4] newfid[4] nwname[2] nwname*(wname[s])
    Rwalk size[4] tag[2] nqid[2] nqid*(qid[13])

DESCRIPTION
    The walk request carries as arguments an existing fid, which must represent a directory, and a proposed newfid (which must not be in use unless it is the same as fid) that the client wishes to associate with the result of descending the directory hierarchy by `walking' the hierarchy using the successive path name elements wname. The fid must be valid in the current session and must not have been opened for I/O by an open or create message. If the full sequence of nwname elements is walked successfully, newfid will represent the file that results. If not, newfid (and fid) will be unaffected. However, if newfid is in use or otherwise illegal, an Rerror is returned.
The element ``..'' (dot-dot) represents the parent directory. The name ``.'' (dot), meaning the current directory, is not used in the protocol.

It is legal for nwname to be zero, in which case newfid will represent the same file as fid and the walk will usually succeed; this is equivalent to walking to dot. The rest of this discussion assumes nwname is greater than zero.

The nwname path name elements wname are walked in order, ``elementwise''. For the first elementwise walk to succeed, the file identified by fid must be a directory, and the implied user of the request must have permission to search the directory (see intro(5)). Subsequent elementwise walks have equivalent restrictions applied to the implicit fid that results from the preceding elementwise walk.

If the first element cannot be walked for any reason, Rerror is returned. Otherwise, the walk will return an Rwalk message containing nqid qids corresponding, in order, to the files that are visited by the nqid successful elementwise walks; nqid is therefore either nwname or the index of the first elementwise walk that failed. The value of nqid cannot be zero unless nwname is zero. Also, nqid will always be less than or equal to nwname. Only if it is equal, however, will newfid be affected, in which case it will represent the file reached by the final elementwise walk requested in the message.

A walk of the name ``..'' in the root directory of a server is equivalent to a walk with no name elements.

If newfid is the same as fid, the above discussion applies, with the obvious difference that if the walk changes the state of newfid, it also changes the state of fid; and if newfid is unaffected, then fid is also unaffected.

To simplify the implementation of the servers, a maximum of sixteen name elements or qids may be packed in a single message. This constant is called MAXWELEM in fcall(2).
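
A small sketch may help with the two rules above: the per-message limit of MAXWELEM elements, and the test nqid == nwname that decides whether newfid was affected. The helper names next_batch, count_twalks and walk_complete are made up for this example. How a real client carries fids from one batch to the next is not specified on this page, so the code only does the arithmetic.

```c
#include <assert.h>

enum { MAXWELEM = 16 };	/* limit quoted from fcall(2) */

/* Number of name elements to put in the next Twalk, given how many remain. */
unsigned
next_batch(unsigned remaining)
{
	return remaining > MAXWELEM ? MAXWELEM : remaining;
}

/*
 * Count the Twalk messages needed for a name of nelem elements,
 * assuming every batch is walked in full.  A name with no elements
 * still takes one message (a walk with nwname zero).
 */
unsigned
count_twalks(unsigned nelem)
{
	unsigned n, msgs;

	msgs = 0;
	do{
		n = next_batch(nelem);
		nelem -= n;
		msgs++;
	}while(nelem > 0);
	return msgs;
}

/*
 * Interpret an Rwalk: newfid is affected only when every element
 * requested was walked, that is, when nqid equals nwname.
 */
int
walk_complete(unsigned nwname, unsigned nqid)
{
	return nqid == nwname;
}
```

With these definitions a 40-element name needs three messages (16, 16 and 8 elements), and a reply carrying two qids to a three-element request leaves newfid untouched.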
Despite this restriction, the system imposes no limit on the number of elements in a file name, only the number that may be transmitted in a single message.

ENTRY POINTS
A call to chdir(2) causes a walk. One or more walk messages may be generated by any of the following calls, which evaluate file names: bind, create, exec, mount, open, remove, stat, unmount, wstat. The file name element . (dot) is interpreted locally and is not transmitted in walk messages.


DIRREAD(2)                                                DIRREAD(2)

NAME
dirread, dirreadall - read directory

SYNOPSIS
    #include
    #include

    long dirread(int fd, Dir **buf)

    long dirreadall(int fd, Dir **buf)

    #define STATMAX 65535U

    #define DIRMAX (sizeof(Dir)+STATMAX)

DESCRIPTION
The data returned by a read(2) on a directory is a set of complete directory entries in a machine-independent format, exactly equivalent to the result of a stat(2) on each file or subdirectory in the directory. Dirread decodes the directory entries into a machine-dependent form. It reads from fd and unpacks the data into an array of Dir structures whose address is returned in *buf (see stat(2) for the layout of a Dir). The array is allocated with malloc(2) each time dirread is called.

Dirreadall is like dirread, but reads in the entire directory; by contrast, dirread steps through a directory one read(2) at a time.

Directory entries have variable length. A successful read of a directory always returns an integral number of complete directory entries; dirread always returns complete Dir structures. See read(5) for more information.

The constant STATMAX is the maximum size that a directory entry can occupy. The constant DIRMAX is an upper limit on the size necessary to hold a Dir structure and all the associated data.

Dirread returns the number of Dir structures filled in buf. The file offset is advanced by the number of bytes actually read.
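
To make the "integral number of complete entries" property concrete, here is a sketch that steps through the raw bytes of a directory read and counts the entries. It assumes each entry starts with a two-byte little-endian count of the bytes that follow it, which is how stat(2) describes the data returned by the kernel. The names get16 and count_entries are invented for the example, and the code does not decode the fields inside an entry the way dirread does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { BIT16SZ = 2 };

/* Read a two-byte little-endian integer, the job GBIT16 does in fcall(2). */
unsigned
get16(const uint8_t *p)
{
	return p[0] | (p[1] << 8);
}

/*
 * Count the complete entries in buf[0..n).  Assumes each entry is a
 * two-byte count followed by that many bytes.  Returns -1 if the
 * buffer ends in the middle of an entry.
 */
long
count_entries(const uint8_t *buf, size_t n)
{
	size_t off, len;
	long nent;

	nent = 0;
	for(off = 0; off < n; off += BIT16SZ + len){
		if(n - off < BIT16SZ)
			return -1;	/* truncated count */
		len = get16(buf + off);
		if(n - off - BIT16SZ < len)
			return -1;	/* truncated entry body */
		nent++;
	}
	return nent;
}
```

A buffer returned by a successful directory read should never hit either -1 case; the checks are there for data from an untrusted or truncated source.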
SOURCE
    /sys/src/libc/9sys/dirread.c

SEE ALSO
intro(2), open(2), read(2)

DIAGNOSTICS
Sets errstr.


FCALL(2)                                                    FCALL(2)

NAME
Fcall, convS2M, convD2M, convM2S, convM2D, getS, fcallconv, dirconv, dirmodeconv, read9pmsg - interface to Plan 9 File protocol

SYNOPSIS
    #include
    #include
    #include
    #include

    uint convS2M(Fcall *f, uchar *ap, uint nap)

    uint convD2M(Dir *d, uchar *ap, uint nap)

    uint convM2S(uchar *ap, uint nap, Fcall *f)

    uint convM2D(uchar *ap, uint nap, Dir *d, char *strs)

    int dirconv(void *o, Fconv*)

    int fcallconv(void *o, Fconv*)

    int dirmodeconv(void *o, Fconv*)

    int read9pmsg(int fd, uchar *buf, uint nbuf);

DESCRIPTION
These routines convert messages in the machine-independent format of the Plan 9 file protocol, 9P, to and from a more convenient form, an Fcall structure:

    #define MAXWELEM 16

    typedef struct Fcall {
        uchar   type;
        u32int  fid;
        ushort  tag;
        union {
            struct {
                u32int  msize;      /* Tversion, Rversion */
                char    *version;   /* Tversion, Rversion */
            };
            struct {
                u32int  oldtag;     /* Tflush */
            };
            struct {
                char    *ename;     /* Rerror */
            };
            struct {
                Qid     qid;        /* Rattach, Ropen, Rcreate */
                u32int  iounit;     /* Ropen, Rcreate */
                ushort  nrauth;     /* Rattach */
                uchar   *rauth;     /* Rattach */
            };
            struct {
                char    *uname;     /* Tattach */
                char    *aname;     /* Tattach */
                ushort  nauth;      /* Tattach */
                uchar   *auth;      /* Tattach */
            };
            struct {
                char    *authid;    /* Rsession */
                char    *authdom;   /* Rsession */
                ushort  nchal;      /* Tsession/Rsession */
                uchar   *chal;      /* Tsession/Rsession */
            };
            struct {
                u32int  perm;       /* Tcreate */
                char    *name;      /* Tcreate */
                uchar   mode;       /* Tcreate, Topen */
            };
            struct {
                u32int  newfid;     /* Twalk */
                ushort  nwname;     /* Twalk */
                char    *wname[MAXWELEM];   /* Twalk */
            };
            struct {
                ushort  nwqid;      /* Rwalk */
                Qid     wqid[MAXWELEM];     /* Rwalk */
            };
            struct {
                vlong   offset;     /* Tread, Twrite */
                u32int  count;      /* Tread, Twrite, Rread */
                char    *data;      /* Twrite, Rread */
            };
            struct {
                ushort  nstat;      /* Twstat, Rstat */
                uchar   *stat;      /* Twstat, Rstat */
            };
        };
    } Fcall;

    /* these are implemented as macros */

    uchar   GBIT8(uchar*)
    ushort  GBIT16(uchar*)
    ulong   GBIT32(uchar*)
    vlong   GBIT64(uchar*)

    void    PBIT8(uchar*, uchar)
    void    PBIT16(uchar*, ushort)
    void    PBIT32(uchar*, ulong)
    void    PBIT64(uchar*, vlong)

    #define BIT8SZ  1
    #define BIT16SZ 2
    #define BIT32SZ 4
    #define BIT64SZ 8

This structure is defined in . See section 5 for a full description of 9P messages and their encoding. For all message types, the type field of an Fcall holds one of Tnop, Rnop, Tsession, Rsession, etc. (defined in an enumerated type in ). Fid is used by most messages, and tag is used by all messages. The other fields are used selectively by the message types given in comments.

ConvM2S takes a 9P message at ap of length nap, and uses it to fill in Fcall structure f. If the passed message, including any data for Twrite and Rread messages, is formatted properly, the return value is the number of bytes the message occupied in the buffer ap, which will always be less than or equal to nap; otherwise it is 0. For Twrite and Rread messages, data is set to a pointer into the argument message, not a copy.

ConvS2M does the reverse conversion, turning f into a message starting at ap. The length of the resulting message is returned. For Twrite and Rread messages, count bytes starting at data are copied into the message.

The constant IOHDRSZ is a suitable amount of buffer to reserve for storing the 9P header; the data portion of a Twrite or Rread will be no more than the buffer size negotiated in the Tversion/Rversion exchange, minus IOHDRSZ.

Another structure is Dir, used by the routines described in stat(2). ConvM2D converts the machine-independent form starting at ap into d and returns the length of the machine-independent input encoding.
The strings in the returned Dir structure are stored at successive locations starting at strs; if strs is nil they are ignored; however, the return value still includes their length. ConvD2M does the reverse translation, also returning the length of the encoding. If the buffer is too short, the return value will be BIT16SZ and the correct size will be returned in the first BIT16SZ bytes.

The macro GBIT16 can be used to extract the correct value. The related macros with different sizes retrieve the corresponding-sized quantities. PBIT16 and its brethren place values in messages. With the exception of handling short buffers in convD2M, these macros are not usually needed except by internal routines.

GetS reads a message from file descriptor fd into ap and converts the message using convM2S into the Fcall structure f. The lp argument must point to a long holding the size of the ap buffer. It is somewhat resilient to transient read errors. If convM2S succeeds, its return value is stored in *lp, and getS returns zero. Otherwise getS returns a string identifying the error.

Dirconv, fcallconv, and dirmodeconv are formatting routines, suitable for fmtinstall (see print(2)). They convert Dir*, Fcall*, and long values into string representations of the directory buffer, Fcall buffer, or file mode value. Fcallconv assumes that dirconv has been installed with format letter `D' and dirmodeconv with format letter `M'.

Read9pmsg calls read(2) multiple times, if necessary, to read an entire 9P message into buf. The return value is 0 for end of file, or -1 for error; it does not return partial messages.
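
For readers without the Plan 9 headers to hand, the following sketch shows what the GBIT and PBIT families amount to: storing and fetching fixed-width integers least significant byte first, as intro(5) requires. The functions putbits and getbits are stand-ins written for this note, not the macros from the library, and they take the width as an argument instead of encoding it in the name.

```c
#include <assert.h>
#include <stdint.h>

enum { BIT8SZ = 1, BIT16SZ = 2, BIT32SZ = 4, BIT64SZ = 8 };

/* Store the low n bytes of v at p, least significant byte first. */
void
putbits(uint8_t *p, uint64_t v, int n)
{
	int i;

	for(i = 0; i < n; i++)
		p[i] = (uint8_t)(v >> (8*i));
}

/* Reassemble an n-byte little-endian integer starting at p. */
uint64_t
getbits(const uint8_t *p, int n)
{
	uint64_t v;
	int i;

	v = 0;
	for(i = 0; i < n; i++)
		v |= (uint64_t)p[i] << (8*i);
	return v;
}
```

Because the bytes are written one at a time with shifts, the result is the same on big-endian and little-endian hosts, which is the point of a machine-independent encoding.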
SOURCE
    /sys/src/libc/9sys

SEE ALSO
intro(2), stat(2), intro(5)


STAT(2)                                                      STAT(2)

NAME
stat, fstat, wstat, fwstat, dirstat, dirfstat, dirwstat, dirfwstat, nulldir - get and put file status

SYNOPSIS
    #include
    #include

    int stat(char *name, uchar *edir, int nedir)

    int fstat(int fd, uchar *edir, int nedir)

    int wstat(char *name, uchar *edir, int nedir)

    int fwstat(int fd, uchar *edir, int nedir)

    Dir* dirstat(char *name)

    Dir* dirfstat(int fd)

    int dirwstat(char *name, Dir *dir)

    int dirfwstat(int fd, Dir *dir)

    void nulldir(Dir *d)

DESCRIPTION
Given a file's name, or an open file descriptor fd, these routines retrieve or modify file status information. Stat, fstat, wstat, and fwstat are the system calls; they deal with machine-independent directory entries. Their format is defined by stat(5). Stat and fstat retrieve information about name or fd into edir, a buffer of length nedir, defined in . Wstat and fwstat write information back, thus changing file attributes according to the contents of edir. The data returned from the kernel includes its leading 16-bit length field as described in intro(5). For symmetry, this field must also be present when passing data to the kernel in a call to wstat and fwstat, but its value is ignored.

Dirstat, dirfstat, dirwstat, and dirfwstat are similar to their counterparts, except that they operate on Dir structures:

    typedef struct Dir {
        /* system-modified data */
        uint    type;   /* server type */
        uint    dev;    /* server subtype */
        /* file data */
        Qid     qid;    /* unique id from server */
        ulong   mode;   /* permissions */
        ulong   atime;  /* last read time */
        ulong   mtime;  /* last write time */
        vlong   length; /* file length: see */
        char    *name;  /* last element of path */
        char    *uid;   /* owner name */
        char    *gid;   /* group name */
        char    *muid;  /* last modifier name */
    } Dir;

The returned structure is allocated by malloc(2); freeing it also frees the associated strings.
This structure and the Qid structure are defined in . If the file resides on permanent storage and is not a directory, the length returned by stat is the number of bytes in the file. For directories, the length returned is zero. For files that are streams (e.g., pipes and network connections), the length is the number of bytes that can be read without blocking.

Each file is the responsibility of some server: it could be a file server, a kernel device, or a user process. Type identifies the server type, and dev says which of a group of servers of the same type is the one responsible for this file. Qid is a structure containing path and vers fields: path is guaranteed to be unique among all path names currently on the file server, and vers changes each time the file is modified. The path is a long long (64 bits, vlong) and the vers is an unsigned long (32 bits, ulong). Thus, if two files have the same type, dev, and qid they are the same file.

The bits in mode are defined by

    0x80000000   directory
    0x40000000   append only
    0x20000000   exclusive use (locked)

          0400   read permission by owner
          0200   write permission by owner
          0100   execute permission (search on directory) by owner
          0070   read, write, execute (search) by group
          0007   read, write, execute (search) by others

There are constants defined in for these bits: DMDIR, DMAPPEND, and DMEXCL for the first three; and DMREAD, DMWRITE, and DMEXEC for the read, write, and execute bits for others.

The two time fields are measured in seconds since the epoch (Jan 1 00:00 1970 GMT). Mtime is the time of the last change of content. Similarly, atime is set whenever the contents are accessed; also, it is set whenever mtime is set.

Uid and gid are the names of the owner and group of the file; muid is the name of the user that last modified the file (setting mtime).
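
The mode bit table above can be exercised with a short formatter. The sketch below turns a mode word into an 11-character summary; the bit values are copied from the table, while the function name modestr and the output layout are invented here and are not claimed to match what dirmodeconv prints.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit values copied from the mode table in stat(2). */
#define DMDIR    0x80000000u
#define DMAPPEND 0x40000000u
#define DMEXCL   0x20000000u

/*
 * Hypothetical formatter: write an 11-character summary of mode,
 * plus a terminating NUL, into out (which needs room for 12 bytes).
 * Character 0 is d (directory), a (append only) or -; character 1
 * is l (exclusive use) or -; the rest are rwx triples for owner,
 * group and others.
 */
void
modestr(uint32_t mode, char *out)
{
	static const char rwx[] = "rwx";
	int i;

	out[0] = (mode & DMDIR) ? 'd' : (mode & DMAPPEND) ? 'a' : '-';
	out[1] = (mode & DMEXCL) ? 'l' : '-';
	for(i = 0; i < 9; i++)
		out[2+i] = (mode & (0400u >> i)) ? rwx[i%3] : '-';
	out[11] = '\0';
}
```

For a directory with permissions 0755 this yields d-rwxr-xr-x; for an append-only, exclusive-use file with permissions 0640 it yields alrw-r-----.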
Groups are also users, but each server is free to associate a list of users with any user name g, and that list is the set of users in the group g. When an initial attachment is made to a server, the user string in the process group is communicated to the server. Thus, the server knows, for any given file access, whether the accessing process is the owner of, or in the group of, the file. This selects which set of three bits in mode is used to check permissions.

Only some of the fields may be changed with the wstat calls. The name can be changed by anyone with write permission in the parent directory. The mode and mtime can be changed by the owner or the group leader of the file's current group. The gid can be changed by the owner if he or she is a member of the new group. The gid can be changed by the group leader of the file's current group if he or she is the leader of the new group. The length can be changed by anyone with write permission, provided the operation is implemented by the server. (See intro(5) for permission information, and users(6) for user and group information.)

Special values in the fields of the Dir passed to wstat indicate that the field is not intended to be changed by the call. The values are ~0 for integral values and the empty string for string values. The routine nulldir initializes a Dir to all `ignore' values. Thus one may change the mode, for example, by using nulldir to initialize a Dir, then setting the mode, and then doing wstat; it is not necessary to use stat to retrieve the initial values first.

SOURCE
    /sys/src/libc/9syscall for the non-dir routines
    /sys/src/libc/9sys for the routines prefixed dir

SEE ALSO
intro(2), fcall(2), dirread(2), stat(5)

DIAGNOSTICS
All these functions return the number of bytes copied on success, -1 on error, and set errstr.
If the buffer for stat or fstat is too short for the returned data, the return value will be BIT16SZ (see fcall(2)) and the two bytes returned will contain the initial count field of the returned data; retrying with nedir equal to that value plus BIT16SZ (for the count itself) should succeed.
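
The retry rule in DIAGNOSTICS can be sketched as follows. Since stat(2) itself is only available on Plan 9, the example uses a stand-in called fakestat that returns a fixed seven-byte record and follows the short-buffer convention described above; fakestat, statretry and the record contents are all invented for illustration. Only the retry logic is the point.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { BIT16SZ = 2 };

/*
 * Stand-in for stat(2), for illustration only: "returns" a fixed
 * 7-byte record (a two-byte count of 5, then 5 data bytes).  When
 * the buffer is too short it stores just the count and returns
 * BIT16SZ, following the convention in DIAGNOSTICS.
 */
int
fakestat(uint8_t *edir, int nedir)
{
	static const uint8_t rec[] = { 5, 0, 'h', 'e', 'l', 'l', 'o' };

	if(nedir < BIT16SZ)
		return -1;
	if(nedir < (int)sizeof rec){
		memcpy(edir, rec, BIT16SZ);
		return BIT16SZ;
	}
	memcpy(edir, rec, sizeof rec);
	return (int)sizeof rec;
}

/*
 * Try a deliberately small buffer first; on a BIT16SZ return, read
 * the count and retry with count+BIT16SZ bytes.  Returns a malloc'ed
 * buffer with its length in *np, or NULL on failure.
 */
uint8_t*
statretry(int *np)
{
	uint8_t *buf;
	int n, need;

	need = BIT16SZ;
	buf = malloc(need);
	if(buf == NULL)
		return NULL;
	n = fakestat(buf, need);
	if(n == BIT16SZ){
		need = (buf[0] | (buf[1] << 8)) + BIT16SZ;
		free(buf);
		buf = malloc(need);
		if(buf == NULL)
			return NULL;
		n = fakestat(buf, need);
	}
	if(n <= BIT16SZ){
		free(buf);
		return NULL;
	}
	*np = n;
	return buf;
}
```

A real caller would more likely start with a buffer of a reasonable size and fall back to the retry only when needed; the two-byte first attempt here just forces the short-buffer path.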