From mboxrd@z Thu Jan  1 00:00:00 1970
From: "rob pike" <rob@plan9.bell-labs.com>
To: 9fans@cse.psu.edu
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Message-Id: <20010127215828.D7A06199D5@mail.cse.psu.edu>
Subject: [9fans] 9P2000
Date: Sat, 27 Jan 2001 16:58:26 -0500
Topicbox-Message-UUID: 53af3f14-eac9-11e9-9e20-41e7f4b1d025

I've been thinking of sending this information to 9fans for a while.
Since the cat is out of the bag, now is as good a time as any.  We
have reworked 9P to address many of its failings, most important:

1)	Nesting and encapsulation: exportfs embeds 9P within 9P,
	which can make reads and writes not fit within the 8K limit.

2)	Walk performance: it takes too many walks to evaluate a
	name.

3)	Sizes fixed and too small: read/write sizes and, most
	important, path name elements have limited, too-small sizes.

4)	Authentication too rigid: the authentication protocols were
	defined in the protocol and so impossible to change.

And a host of other lesser things.

We have a file server and kernel running this protocol now and have
adapted much but not all of our stuff; it's not yet the system we live
with.  Comments on the following man pages are welcome.  I've included
all of section 5 (9P itself, now 9P2000) and some relevant parts of
section 2.  Directory handling is very different, for example.

There may be many errors in these pages and many details are sure to
change before we're done.

Until this stuff gets installed and a lot of shaking down has
happened, there won't be much in the way of updates to the existing
distribution.  That is the real reason things have seemed quiet
lately, not the Lucent announcements.

-rob

     INTRO(5)                                                 INTRO(5)

     NAME
          intro - introduction to the Plan 9 File Protocol, 9P

     SYNOPSIS
          #include <fcall.h>

     DESCRIPTION
          A Plan 9 server is an agent that provides one or more hier-
          archical file systems - file trees - that may be accessed by
          Plan 9 processes.  A server responds to requests by clients
          to navigate the hierarchy, and to create, remove, read, and
          write files.  The prototypical server is a separate machine
          that stores large numbers of user files on permanent media;
          such a machine is called, somewhat confusingly, a file
          server. Another possibility for a server is to synthesize
          files on demand, perhaps based on information in data struc-
          tures inside the kernel; the proc(3) kernel device is a part
          of the Plan 9 kernel that does this.  User programs can also
          act as servers.

          A connection to a server is a bidirectional communication
          path from the client to the server.  There may be a single
          client or multiple clients sharing the same connection.  A
          server's file tree is attached to a process group's name
          space by bind(2) and mount calls; see intro(2). Processes in
          the group are then clients of the server: system calls oper-
          ating on files are translated into requests and responses
          transmitted on the connection to the appropriate service.

          The Plan 9 File Protocol, 9P, is used for messages between
          clients and servers. A client transmits requests (T-
          messages) to a server, which subsequently returns replies
          (R-messages) to the client.  The combined acts of transmit-
          ting (receiving) a request of a particular type, and receiv-
          ing (transmitting) its reply, are called a transaction of
          that type.

          Each message consists of a sequence of bytes.  Two-, four-,
          and eight-byte fields hold unsigned integers represented in
          little-endian order (least significant byte first).  Data
          items of larger or variable lengths are represented by a
          two-byte field specifying a count, n, followed by n bytes of
          data.  Text strings are represented this way, with the text
          itself stored as a UTF-8 encoded sequence of Unicode charac-
          ters (see utf(6)). Text strings in 9P messages are not NUL-
          terminated: n counts the bytes of UTF-8 data, which include
          no final zero byte.  The NUL character is illegal in all
          text strings in 9P, and is therefore excluded from file
          names, user names, and so on.
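
          As an illustration only, the integer and string encodings
          described above might be sketched in portable C as follows.
          The names put16, put32, get16, get32, and putstr are
          hypothetical; they are not the routines of fcall(2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 16-bit value, least significant byte first; return the next free byte. */
static unsigned char *put16(unsigned char *p, uint16_t v)
{
	p[0] = (unsigned char)(v & 0xFF);
	p[1] = (unsigned char)(v >> 8);
	return p + 2;
}

/* Store a 32-bit value, least significant byte first. */
static unsigned char *put32(unsigned char *p, uint32_t v)
{
	int i;

	for (i = 0; i < 4; i++)
		p[i] = (unsigned char)((v >> (8 * i)) & 0xFF);
	return p + 4;
}

static uint16_t get16(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
		((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Store a string as count[2] followed by count bytes of UTF-8.
 * No zero byte is appended.  Returns NULL if the text is too long
 * for a two-byte count.
 */
static unsigned char *putstr(unsigned char *p, const char *s)
{
	size_t n;

	assert(p != NULL && s != NULL);
	n = strlen(s);
	if (n > 0xFFFF)
		return NULL;
	p = put16(p, (uint16_t)n);
	memcpy(p, s, n);
	return p + n;
}
```

          The caller is assumed to have checked that the buffer is
          large enough; a real implementation would carry a limit
          pointer as well.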

     Page 1                       Plan 9             (printed 1/27/00)

          Each 9P message begins with a four-byte size field specify-
          ing the length in bytes of the complete message including
          the four bytes of the size field itself.  The next byte is
          the message type, one of the constants in the enumeration in
          the include file <fcall.h>.  The remaining bytes are parame-
          ters of different sizes.  In the message descriptions below,
          the number of bytes in a field is given in brackets after
          the field name.  The notation parameter[n] where n is not a
          constant represents a variable-length parameter: n[2] fol-
          lowed by n bytes of data forming the parameter. The notation
          string[s] (using a literal s character) is shorthand for
          s[2] followed by s bytes of UTF-8 text.  (Systems may choose
          to reduce the set of legal characters to reduce syntactic
          problems, for example to remove slashes from name compo-
          nents, but the protocol has no such restriction.  Plan 9
          names may contain any printable character (that is, any
          character outside hexadecimal 00-1F and 80-9F) except
          slash.)  Messages are transported in byte form to allow for
          machine independence; fcall(2) describes routines that con-
          vert to and from this form into a machine-dependent C struc-
          ture.
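
          A minimal sketch of the framing just described, assuming
          the type byte sits directly after size[4] and before
          tag[2].  The function names are hypothetical, and the type
          value is left as a parameter because the constants of
          <fcall.h> are not listed here.

```c
#include <stddef.h>
#include <stdint.h>

enum { HDRSZ = 4 + 1 + 2 };	/* size[4] type[1] tag[2] */

/* Store the low nbytes bytes of v, least significant byte first. */
static void le_put(unsigned char *p, uint32_t v, int nbytes)
{
	int i;

	for (i = 0; i < nbytes; i++)
		p[i] = (unsigned char)((v >> (8 * i)) & 0xFF);
}

/* Read the size[4] field at the start of a message. */
static uint32_t msgsize(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
		((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Build a message whose only parameter is fid[4] (the shape of
 * Tclunk, Tremove, and Tstat).  Returns the total length, or 0 if
 * the buffer is too small.
 */
static size_t fidmsg(unsigned char *buf, size_t cap,
	uint8_t type, uint16_t tag, uint32_t fid)
{
	size_t size = HDRSZ + 4;

	if (cap < size)
		return 0;
	le_put(buf, (uint32_t)size, 4);	/* size counts its own four bytes */
	buf[4] = type;
	le_put(buf + 5, tag, 2);
	le_put(buf + 7, fid, 4);
	return size;
}
```

          A receiver would read four bytes, call msgsize, and then
          read the remaining size-4 bytes before decoding.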

     MESSAGES
               Tversion  size[4] tag[2] msize[4] version[s]
               Rversion  size[4] tag[2] msize[4] version[s]

               Tsession  size[4] tag[2] chal[n]
               Rsession  size[4] tag[2] chal[n] authid[s] authdom[s]

               Rerror    size[4] tag[2] ename[s]

               Tflush    size[4] tag[2] oldtag[2]
               Rflush    size[4] tag[2]

               Tattach   size[4] tag[2] fid[4] uid[s] aname[s]
               auth[n]
               Rattach   size[4] tag[2] qid[13] rauth[n]

               Twalk     size[4] tag[2] fid[4] newfid[4] nwname[2]
               nwname*(wname[s])
               Rwalk     size[4] tag[2] nwqid[2] nwqid*(wqid[13])

               Topen     size[4] tag[2] fid[4] mode[1]
               Ropen     size[4] tag[2] qid[13] iounit[4]

               Tcreate   size[4] tag[2] fid[4] name[s] perm[4] mode[1]
               Rcreate   size[4] tag[2] qid[13] iounit[4]

               Tread     size[4] tag[2] fid[4] offset[8] count[4]
               Rread     size[4] tag[2] count[4] data[count]

               Twrite    size[4] tag[2] fid[4] offset[8] count[4]
               data[count]
               Rwrite    size[4] tag[2] count[4]

               Tclunk    size[4] tag[2] fid[4]
               Rclunk    size[4] tag[2]

               Tremove   size[4] tag[2] fid[4]
               Rremove   size[4] tag[2]

               Tstat     size[4] tag[2] fid[4]
               Rstat     size[4] tag[2] stat[n]

               Twstat    size[4] tag[2] fid[4] stat[n]
               Rwstat    size[4] tag[2]

          Each T-message has a tag field, chosen and used by the
          client to identify the message.  The reply to the message
          will have the same tag.  Clients must arrange that no two
          outstanding messages on the same connection have the same
          tag.  An exception is the tag ~0, meaning `no tag': the
          client can use it, when establishing a connection, to over-
          ride tag matching in version and session messages.

          The type of an R-message will either be one greater than the
          type of the corresponding T-message or Rerror, indicating
          that the request failed.  In the latter case, the ename
          field contains a string describing the reason for failure.
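
          The tag and type rules above suggest how a client might
          classify an incoming R-message.  This is a sketch under
          stated assumptions: the enumeration and function names are
          hypothetical, and the numeric value of Rerror is passed in
          because it is not given in these pages.

```c
#include <stdint.h>

enum Reply { ReplyStray, ReplyOk, ReplyError };

/*
 * Compare a received R-message against the T-message that was sent.
 * rerror is the numeric type of Rerror, supplied by the caller.
 */
static enum Reply classify(uint8_t ttype, uint16_t ttag,
	uint8_t rtype, uint16_t rtag, uint8_t rerror)
{
	if (rtag != ttag)
		return ReplyStray;	/* answers some other request */
	if (rtype == rerror)
		return ReplyError;	/* ename[s] gives the reason */
	if (rtype == ttype + 1)
		return ReplyOk;		/* one greater than the T type */
	return ReplyStray;		/* right tag, impossible type */
}
```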

          The version message identifies the version of the protocol
          and indicates the maximum message size the system is pre-
          pared to handle.  A session request initializes a connection
          and aborts all outstanding I/O on the connection.  The set
          of messages between session requests is called a session.

          Most T-messages contain a fid, a 32-bit unsigned integer
          that the client uses to identify a ``current file'' on the
          server.  Fids are somewhat like file descriptors in a user
          process, but they are not restricted to files open for I/O:
          directories being examined, files being accessed by stat(2)
          calls, and so on - all files being manipulated by the oper-
          ating system - are identified by fids.  Fids are chosen by
          the client.  All requests on a connection share the same fid
          space; when several clients share a connection, the agent
          managing the sharing must arrange that no two clients choose
          the same fid.

          The first fid supplied (in an attach message) will be taken
          by the server to refer to the root of the served file tree.
          The attach identifies the user to the server and may specify
          a particular file tree served by the server (for those that
          supply more than one).  A walk message causes the server to
          change the current file associated with a fid to be a file
          in the directory that is the old current file, or one of its
          subdirectories.  Walk returns a new fid that refers to the
          resulting file.  Usually, a client maintains a fid for the
          root, and navigates by walks from the root fid.

          A client can send multiple T-messages without waiting for
          the corresponding R-messages, but all outstanding T-messages
          must specify different tags.  The server may delay the
          response to a request on one fid and respond to later
          requests on other fids; this is sometimes necessary, for
          example when the client reads from a file that the server
          synthesizes from external events such as keyboard charac-
          ters.

          Replies (R-messages) to attach, walk, open, and create
          requests convey a qid field back to the client.  The qid
          represents the server's unique identification for the file
          being accessed: two files on the same server hierarchy are
          the same if and only if their qids are the same.  (The
          client may have multiple fids pointing to a single file on a
          server and hence having a single qid.)  The thirteen-byte
          qid fields hold a one-byte type, specifying whether the file
          is a directory, append-only file, etc., and two unsigned
          integers: first the four-byte qid version, then the
          eight-byte qid path.
          The path is an integer unique among all files in the hierar-
          chy.  If a file is deleted and recreated with the same name
          in the same directory, the old and new path components of
          the qids should be different.  The version is a version num-
          ber for a file; typically, it is incremented every time the
          file is modified.

          An existing file can be opened, or a new file may be created
          in the current (directory) file.  I/O of a given number of
          bytes at a given offset on an open file is done by read and
          write.

          A client should clunk any fid that is no longer needed.  The
          remove transaction deletes files.

          The stat transaction retrieves information about the file.
          The stat field in the reply includes the file's name, access
          permissions (read, write and execute for owner, group and
          public), access and modification times, and owner and group
          identifications (see stat(2)). The owner and group identifi-
          cations are textual names.  The wstat transaction allows
          some of a file's properties to be changed.

          A request can be aborted with a Tflush request.  When a
          server receives a Tflush, it should not reply to the message
          with tag oldtag (unless it has already replied), and it
          should immediately send an Rflush.  The client must wait
          until it gets the Rflush (even if the reply to the original
          message arrives in the interim), at which point oldtag may
          be reused.

          Most programs do not see the 9P protocol directly; instead
          calls to library routines that access files are translated
          by the mount driver, mnt(3), into 9P messages.

     DIRECTORIES
          Directories are created by create with DMDIR set in the per-
          missions argument (see stat(5)). The members of a directory
          can be found with read(5). All directories must support
          walks to the directory .. (dot-dot) meaning parent direc-
          tory, although by convention directories contain no explicit
          entry for .. or . (dot).  The parent of the root directory
          of a server's tree is itself.

     ACCESS PERMISSIONS
          Each file server maintains a set of user and group names.
          Each user can be a member of any number of groups.  Each
          group has a group leader who has special privileges (see
          stat(5) and users(6)). Every file request has an implicit
          user id (copied from the original attach) and an implicit
          set of groups (every group of which the user is a member).

          Each file has an associated owner and group id and three
          sets of permissions: those of the owner, those of the group,
          and those of ``other'' users.  When the owner attempts to do
          something to a file, the owner, group, and other permissions
          are consulted, and if any of them grant the requested per-
          mission, the operation is allowed.  For someone who is not
          the owner, but is a member of the file's group, the group
          and other permissions are consulted.  For everyone else, the
          other permissions are used.  Each set of permissions says
          whether reading is allowed, whether writing is allowed, and
          whether executing is allowed.  A walk in a directory is
          regarded as executing the directory, not reading it.  Per-
          missions are kept in the low-order bits of the file mode:
          owner read/write/execute permission represented as 1 in bits
          8, 7, and 6 respectively (using 0 to number the low order).
          The group permissions are in bits 5, 4, and 3, and the other
          permissions are in bits 2, 1, and 0.
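
          The permission rule above can be sketched as a small C
          function.  The names are hypothetical, and treating a
          request for several permissions at once as the union of
          the applicable sets is an assumption of this sketch, not
          something the text prescribes.

```c
#include <stdint.h>

enum { PermRead = 4, PermWrite = 2, PermExec = 1 };

/*
 * Return nonzero if every permission in want is granted by mode.
 * isowner and ingroup describe the requesting user's relation to
 * the file.
 */
static int permitted(uint32_t mode, int isowner, int ingroup, unsigned want)
{
	unsigned granted = mode & 7;		/* other: bits 2, 1, 0 */

	if (isowner || ingroup)
		granted |= (mode >> 3) & 7;	/* group: bits 5, 4, 3 */
	if (isowner)
		granted |= (mode >> 6) & 7;	/* owner: bits 8, 7, 6 */
	return (granted & want) == want;
}
```

          A walk through a directory would be checked with PermExec,
          not PermRead.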

          The file mode contains some additional attributes besides
          the permissions.  If bit 31 is set, the file is a directory;
          if bit 30 is set, the file is append-only (offset is ignored
          in writes); if bit 29 is set, the file is exclusive-use
          (only one client may have it open at a time).  These bits
          are reproduced, from the top bit down, in the type byte of
          the Qid.
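
          Since the attribute bits occupy the top of the 32-bit mode
          and are reproduced from the top bit down, the type byte is
          presumably just the high eight bits of the mode.  A sketch,
          with hypothetical macro and function names:

```c
#include <stdint.h>

#define MODE_DIR	0x80000000u	/* bit 31: directory */
#define MODE_APPEND	0x40000000u	/* bit 30: append-only */
#define MODE_EXCL	0x20000000u	/* bit 29: exclusive-use */

/* Derive the qid type byte from a file mode. */
static uint8_t qidtype(uint32_t mode)
{
	return (uint8_t)(mode >> 24);
}
```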

     ATTACH(5)                                               ATTACH(5)

     NAME
          attach, session - messages to initiate activity

     SYNOPSIS
          Tsession  size[4] tag[2] chal[n]
          Rsession  size[4] tag[2] chal[n] authid[s] authdom[s]

          Tattach   size[4] tag[2] fid[4] uid[s] aname[s] auth[n]
          Rattach   size[4] tag[2] qid[13] rauth[n]

     DESCRIPTION
          The session request initializes a connection between a
          client and a server and exchanges authentication informa-
          tion.  All outstanding I/O on the connection is aborted.
          The set of messages between session requests is called a
          session. The host's user name (authid) and its authentica-
          tion domain (authdom) identify the key to be used when
          authenticating to this host.  The exchanged challenges
          (chal) are used in the authentication algorithm.  If authid
          is an empty string no authentication is performed in this
          session.

          The tag should be NOTAG (value ~0) for a session message.

          The attach message serves as a fresh introduction from a
          user on the client machine to the server.  The message iden-
          tifies the user (uid) and may select the file tree to access
          (aname).  The auth argument contains authorization data
          derived from the exchanged challenges of the session mes-
          sage; see auth(6).

          As a result of the attach transaction, the client will have
          a connection to the root directory of the desired file tree,
          represented by fid. An error is returned if fid is already
          in use.  The server's idea of the root of the file tree is
          represented by the returned qid.

     ENTRY POINTS
          An attach transaction will be generated for kernel devices
          (see intro(3)) when a system call evaluates a file name
          beginning with `#'.  Pipe(2) generates an attach on the ker-
          nel device pipe(3). The mount system call (see bind(2)) gen-
          erates an attach message to the remote file server.  When
          the kernel boots, an attach is made to the root device,
          root(3), and then an attach is made to the requested file
          server machine.

     SEE ALSO
          version(5), auth(6)

     CLUNK(5)                                                 CLUNK(5)

     NAME
          clunk - forget about a fid

     SYNOPSIS
          Tclunk  size[4] tag[2] fid[4]
          Rclunk  size[4] tag[2]

     DESCRIPTION
          The clunk request informs the file server that the current
          file represented by fid is no longer needed by the client.
          The actual file is not removed on the server unless the fid
          had been opened with ORCLOSE.

          Once a fid has been clunked, the same fid can be reused in a
          new walk or attach request.

          Even if the clunk returns an error, the fid is no longer
          valid.

     ENTRY POINTS
          A clunk message is generated by close and indirectly by
          other actions such as failed open calls.

     ERROR(5)                                                 ERROR(5)

     NAME
          error - return an error

     SYNOPSIS
          Rerror  size[4] tag[2] ename[s]

     DESCRIPTION
          The Rerror message (there is no Terror) is used to return an
          error string describing the failure of a transaction.  It
          replaces the corresponding reply message that would accom-
          pany a successful call; its tag is that of the request.

          By convention, clients may truncate error messages after 255
          bytes, defined as ERRMAX in <libc.h>.

     FLUSH(5)                                                 FLUSH(5)

     NAME
          flush - abort a message

     SYNOPSIS
          Tflush  size[4] tag[2] oldtag[2]
          Rflush  size[4] tag[2]

     DESCRIPTION
          When the response to a request is no longer needed, such as
          when a user interrupts a process doing a read(2), a Tflush
          request is sent to the server to purge the pending response.
          The message being flushed is identified by oldtag. The
          semantics of flush depends on messages arriving in order.

          The server must answer the flush message immediately.  If it
          recognizes oldtag as the tag of a pending transaction, it
          should abort any pending response and discard that tag.  In
          either case, it should respond with an Rflush echoing the
          tag (not oldtag) of the Tflush message.  A Tflush can never
          be responded to by an Rerror message.

          When the client sends a Tflush, it must wait to receive the
          corresponding Rflush before reusing oldtag for subsequent
          messages.  If a response to the flushed request is received
          before the Rflush, the client must honor the response as if
          it had not been flushed, since the completed request may
          signify a state change in the server.  For instance, Tcreate
          will have created a file and Twalk may have allocated a fid.
          If no response is received before the Rflush, the flushed
          transaction is considered to have been canceled, and should
          be treated as though it had never been sent.
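
          The client-side rules above might be tracked with two
          flags per flushed request.  This is only a sketch: the
          structure and function names are hypothetical, and
          ignoring a reply that turns up after the Rflush is an
          assumption based on messages arriving in order.

```c
/* What the client knows about one request it has flushed. */
struct Flushed {
	int replied;	/* the original R-message arrived before the Rflush */
	int rflush;	/* the Rflush for our Tflush arrived */
};

enum Outcome { Waiting, Honored, Canceled };

/* Record the arrival of the reply to the original request. */
static void got_reply(struct Flushed *f)
{
	if (!f->rflush)
		f->replied = 1;
}

/* Record the arrival of the Rflush. */
static void got_rflush(struct Flushed *f)
{
	f->rflush = 1;
}

/* Honor a reply that beat the Rflush; otherwise the request never happened. */
static enum Outcome outcome(const struct Flushed *f)
{
	if (f->replied)
		return Honored;
	if (f->rflush)
		return Canceled;
	return Waiting;
}

/* oldtag may be reused only once the Rflush has been seen. */
static int tag_reusable(const struct Flushed *f)
{
	return f->rflush;
}
```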

          Several exceptional conditions are handled correctly by the
          above specification: sending multiple flushes for a single
          tag, flushing after a transaction is completed, flushing a
          Tflush, and flushing an invalid tag.

     OPEN(5)                                                   OPEN(5)

     NAME
          open, create - prepare a fid for I/O on an existing or new
          file

     SYNOPSIS
          Topen    size[4] tag[2] fid[4] mode[1]
          Ropen    size[4] tag[2] qid[13] iounit[4]

          Tcreate  size[4] tag[2] fid[4] name[s] perm[4] mode[1]
          Rcreate  size[4] tag[2] qid[13] iounit[4]

     DESCRIPTION
          The open request asks the file server to check permissions
          and prepare a fid for I/O with subsequent read and write
          messages.  The mode field determines the type of I/O: 0, 1,
          2, and 3 mean read access, write access, read and write
          access, and execute access, to be checked against the per-
          missions for the file.  In addition, if mode has the OTRUNC
          (0x10) bit set, the file is to be truncated, which requires
          write permission (if the file is append-only, and permission
          is granted, the open succeeds but the file will not be trun-
          cated); if the mode has the ORCLOSE (0x40) bit set, the file
          is to be removed when the fid is clunked, which requires
          permission to remove the file from its directory.  If other
          bits are set in mode they will be ignored.  It is illegal to
          write a directory, truncate it, or attempt to remove it on
          close.  If the file is marked for exclusive use (see
          stat(5)), only one client can have the file open at any
          time.  That is, after such a file has been opened, further
          opens will fail until fid has been clunked.  All these per-
          missions are checked at the time of the open request; subse-
          quent changes to the permissions of files do not affect the
          ability to read, write, or remove an open file.
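
          A sketch of how a server might pick apart the mode byte.
          OTRUNC and ORCLOSE carry the values given above; the Acc*
          names and the functions are hypothetical, and reading the
          access type from the low two bits is an assumption drawn
          from the values 0 through 3.

```c
#include <stdint.h>

enum {
	AccRead = 0, AccWrite = 1, AccRdwr = 2, AccExec = 3,
	OTRUNC = 0x10,
	ORCLOSE = 0x40
};

/* The kind of I/O requested: one of the Acc* values. */
static int access_of(uint8_t mode)
{
	return mode & 3;
}

/* Write permission is needed to write or to truncate. */
static int needs_write(uint8_t mode)
{
	int a = access_of(mode);

	return a == AccWrite || a == AccRdwr || (mode & OTRUNC) != 0;
}

/* A directory may not be written, truncated, or removed on close. */
static int legal_on_dir(uint8_t mode)
{
	return !needs_write(mode) && (mode & ORCLOSE) == 0;
}
```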

          The create request asks the file server to create a new file
          with the name supplied, in the directory (dir) represented
          by fid, and requires write permission in the directory.  The
          owner of the file is the implied user id of the request, the
          group of the file is the same as dir, and the permissions
          are the value of
                    perm & (~0666 | (dir.perm & 0666))
          if a regular file is being created and
                    perm & (~0777 | (dir.perm & 0777))
          if a directory is being created.  This means, for example,
          that if the create allows read permission to others, but the
          containing directory does not, then the created file will
          not allow others to read the file.
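
          The two expressions above differ only in the mask, so they
          fold into one function.  The name createperm is
          hypothetical; DMDIR has the value given in this page.

```c
#include <stdint.h>

#define DMDIR 0x80000000u

/* Permissions of a new file, given the requested perm and the directory's. */
static uint32_t createperm(uint32_t perm, uint32_t dirperm)
{
	uint32_t mask = (perm & DMDIR) ? 0777 : 0666;

	return perm & (~mask | (dirperm & mask));
}
```

          For example, creating a regular file with perm 0666 in a
          directory whose permissions are 0750 yields 0640: the
          read and write bits for others are dropped because the
          directory does not grant them.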

          Finally, the newly created file is opened according to mode,
          and fid will represent the newly opened file.  Mode is not
          checked against the permissions in perm. The qid for the new
          file is returned with the create reply message.

          Directories are created by setting the DMDIR bit
          (0x80000000) in the perm.

          The names . and .. are special; it is illegal to create
          files with these names.

          It is an error for either of these messages if the fid is
          already the product of a successful open or create message.

          An attempt to create a file in a directory where the given
          name already exists will be rejected; in this case, the
          create system call (see open(2)) uses open with truncation.
          The algorithm used by the create system call is: first walk
          to the directory to contain the file.  If that fails, return
          an error.  Next walk to the specified file.  If the walk
          succeeds, send a request to open and truncate the file and
          return the result, successful or not.  If the walk fails,
          send a create message.  If that fails, it may be because the
          file was created by another process after the previous walk
          failed, so (once) try the walk and open again.

          For the behavior of create on a union directory, see
          bind(2).

          The iounit field returned by open and create may be zero.
          If it is not, it is the maximum number of bytes that are
          guaranteed to be read from or written to the file without
          breaking the I/O transfer into multiple 9P messages; see
          read(5).

     ENTRY POINTS
          Open and create both generate open messages; only create
          generates a create message.

          For programs that need atomic file creation, without the
          race that exists in the open-create sequence described
          above, the kernel does the following.  If the OEXCL (0x1000)
          bit is set in the mode for a create system call, the open
          message is not sent; the kernel issues only the create.
          Thus, if the file exists, create will draw an error, but if
          it doesn't and the create system call succeeds, the process
          issuing the create is guaranteed to be the one that created
          the file.

     READ(5)                                                   READ(5)

     NAME
          read, write - transfer data from and to a file

     SYNOPSIS
          Tread   size[4] tag[2] fid[4] offset[8] count[4]
          Rread   size[4] tag[2] count[4] data[count]

          Twrite  size[4] tag[2] fid[4] offset[8] count[4] data[count]
          Rwrite  size[4] tag[2] count[4]

     DESCRIPTION
          The read request asks for count bytes of data from the file
          identified by fid, which must be opened for reading, start-
          ing offset bytes after the beginning of the file.  The bytes
          are returned with the read reply message.

          The count field in the reply indicates the number of bytes
          returned.  This may be less than the requested amount.  If
          the offset field is greater than or equal to the number of
          bytes in the file, a count of zero will be returned.

          For directories, read returns an integral number of direc-
          tory entries exactly as in stat (see stat(5)), one for each
          member of the directory.  The read request message must have
          offset equal to zero or the value of offset in the previous
          read on the directory, plus the number of bytes returned in
          the previous read.  In other words, seeking other than to
          the beginning is illegal in a directory (see seek(2)).
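
          The offset rule for directory reads is simple enough to
          state as a predicate.  A sketch, with a hypothetical
          function name:

```c
#include <stdint.h>

/*
 * A directory read may start at zero, or exactly where the previous
 * read on the same fid left off.
 */
static int dir_offset_ok(uint64_t offset, uint64_t prevoffset, uint32_t prevcount)
{
	return offset == 0 || offset == prevoffset + prevcount;
}
```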

          The write request asks that count bytes of data be recorded
          in the file identified by fid, which must be opened for
          writing, starting offset bytes after the beginning of the
          file.  If the file has been opened append only, the data
          will be placed at the end of the file regardless of offset.
          Directories may not be written.

          The write reply records the number of bytes actually writ-
          ten.  It is usually an error if this is not the same as
          requested.

          Because 9P implementations may limit the size of individual
          messages, more than one message may be produced by a single
          read or write call.  The iounit field returned by open(5),
          if non-zero, reports the maximum size that is guaranteed to
          be transferred atomically.
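
          A sketch of how a single large transfer might be cut into
          messages.  The function names are hypothetical, and so is
          the fallback limit used when iounit is zero; these pages
          do not say what a client should do in that case.  A read
          may also return fewer bytes than requested, which this
          sketch does not model.

```c
#include <assert.h>
#include <stdint.h>

/* Size of the next message's count field, given what is left to move. */
static uint32_t nextchunk(uint64_t remaining, uint32_t iounit, uint32_t fallback)
{
	uint32_t limit = iounit != 0 ? iounit : fallback;

	assert(limit > 0);
	return remaining < limit ? (uint32_t)remaining : limit;
}

/* Number of Tread or Twrite messages needed to move total bytes. */
static uint64_t nmessages(uint64_t total, uint32_t iounit, uint32_t fallback)
{
	uint64_t n = 0;

	while (total > 0) {
		total -= nextchunk(total, iounit, fallback);
		n++;
	}
	return n;
}
```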

     ENTRY POINTS
          Read and write messages are generated by the corresponding
          calls.  Although seek(2) affects the offset, it does not
          generate a message.

     REMOVE(5)                                               REMOVE(5)

     NAME
          remove - remove a file from a server

     SYNOPSIS
          Tremove  size[4] tag[2] fid[4]
          Rremove  size[4] tag[2]

     DESCRIPTION
          The remove request asks the file server both to remove the
          file represented by fid and to clunk the fid, even if the
          remove fails.  This request will fail if the client does not
          have write permission in the parent directory.

          It is correct to consider remove to be a clunk with the side
          effect of removing the file if permissions allow.

     ENTRY POINTS
          Remove messages are generated by remove.

     STAT(5)                                                   STAT(5)

     NAME
          stat, wstat - inquire or change file attributes

     SYNOPSIS
          Tstat   size[4] tag[2] fid[4]
          Rstat   size[4] tag[2] stat[n]

          Twstat  size[4] tag[2] fid[4] stat[n]
          Rwstat  size[4] tag[2]

     DESCRIPTION
          The stat transaction inquires about the file identified by
          fid. The reply will contain a machine-independent directory
          entry, stat, laid out as follows:

          type[2]
               for kernel use

          dev[4]
               for kernel use

          qid.type[1]
               the type of the file (directory, etc.), represented as
               a bit vector corresponding to the high 8 bits of the
               file's mode word.

          qid.vers[4]
               version number for given path

          qid.path[8]
               the file server's unique identification for the file

          mode[4]
               permissions and flags

          atime[4]
               last access time

          mtime[4]
               last modification time

          length[8]
               length of file in bytes

          name[s]
               file name; must be / if the file is the root directory
               of the server

          uid[s]
               owner name

          gid[s]
               group name

          muid[s]
               name of the user who last modified the file

          Integers in this encoding are in little-endian order (least
          significant byte first).  The convM2D and convD2M routines
          (see fcall(2)) convert between this encoding and the
          machine-dependent Dir structure (see stat(2)).
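
          As an illustration of the little-endian layout listed
          above, the following portable C99 sketch packs the
          fixed-width fields (type through length) into a byte
          buffer.  It is not convD2M: the StatFixed structure and
          the function names are invented for this example, and the
          string fields are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the fixed-width stat fields; not the Plan 9 Dir. */
typedef struct StatFixed {
	uint16_t type;
	uint32_t dev;
	uint8_t  qid_type;
	uint32_t qid_vers;
	uint64_t qid_path;
	uint32_t mode;
	uint32_t atime;
	uint32_t mtime;
	uint64_t length;
} StatFixed;

/* Sum of the field widths given in the layout: 39 bytes. */
enum { STATFIXSZ = 2+4+1+4+8+4+4+4+8 };

/* Store the low n bytes of v at p, least significant byte first. */
static uint8_t*
putle(uint8_t *p, uint64_t v, int n)
{
	int i;

	for(i = 0; i < n; i++)
		*p++ = (uint8_t)(v >> (8*i));
	return p;
}

/* Pack s into buf in layout order; returns bytes written, or 0 if buf is too small. */
size_t
packstatfixed(const StatFixed *s, uint8_t *buf, size_t nbuf)
{
	uint8_t *p = buf;

	if(nbuf < STATFIXSZ)
		return 0;
	p = putle(p, s->type, 2);
	p = putle(p, s->dev, 4);
	p = putle(p, s->qid_type, 1);
	p = putle(p, s->qid_vers, 4);
	p = putle(p, s->qid_path, 8);
	p = putle(p, s->mode, 4);
	p = putle(p, s->atime, 4);
	p = putle(p, s->mtime, 4);
	p = putle(p, s->length, 8);
	return (size_t)(p - buf);
}
```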

          The mode contains permission bits as described in intro(5)
          and the following: 0x80000000 (this file is a directory),
          0x40000000 (append only), 0x20000000 (exclusive use); these
          are echoed in Qid.type.  Writes to append-only files always
          place their data at the end of the file; the offset in the
          write message is ignored, as is the OTRUNC bit in an open.
          Exclusive use files may be open for I/O by only one fid at a
          time across all clients of the server.  If a second open is
          attempted, it draws an error.  Servers may implement a time-
          out on the lock on an exclusive use file: if the fid holding
          the file open has been unused for an extended period (of
          order at least minutes), it is reasonable to break the lock
          and deny the initial fid further I/O.
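
          Two of the rules above lend themselves to a short sketch
          in portable C99: Qid.type echoes the high 8 bits of the
          mode word, and a write to an append-only file lands at the
          end of the file whatever offset was requested.  The names
          below are invented for the example; only the bit values
          come from the text.

```c
#include <stdint.h>

/* Mode bits quoted from the text; the SK_ names belong to this sketch only. */
#define SK_DMDIR	0x80000000u
#define SK_DMAPPEND	0x40000000u
#define SK_DMEXCL	0x20000000u

/* Qid.type is the high 8 bits of the mode word. */
uint8_t
qidtypeofmode(uint32_t mode)
{
	return (uint8_t)(mode >> 24);
}

/* Offset at which a write takes effect: append-only files ignore the request. */
uint64_t
writeoffset(uint32_t mode, uint64_t filelen, uint64_t reqoffset)
{
	if(mode & SK_DMAPPEND)
		return filelen;
	return reqoffset;
}
```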

          The two time fields are measured in seconds since the epoch
          (Jan 1 00:00 1970 GMT).  The mtime field reflects the time
          of the last change of content (except when later changed by
          wstat).  For a plain file, mtime is the time of the most
          recent create, open with truncation, or write; for a direc-
          tory it is the time of the most recent remove, create, or
          wstat of a file in the directory.  Similarly, the atime
          field records the last read of the contents; also it is set
          whenever mtime is set.  In addition, for a directory, it is
          set by an attach, walk, or create, all whether successful or
          not.

          The muid field names the user whose actions most recently
          changed the mtime of the file.

          The length records the number of bytes in the file.  Direc-
          tories and most files representing devices have a conven-
          tional length of 0.

          The stat request requires no special permissions.

          The wstat request can change some of the file status infor-
          mation.  The name can be changed by anyone with write per-
          mission in the parent directory; it is an error to change
          the name to that of an existing file.  The length can be
          changed (affecting the actual length of the file) by anyone
          with write permission on the file.  It is an error to
          attempt to set the length of a directory to a non-zero
          value, and servers may decide to reject length changes for
          other reasons.  The mode and mtime can be changed by the
          owner of the file or the group leader of the file's current
          group.  The directory bit cannot be changed by a wstat; the
          other defined permission and mode bits can.  The gid can be
          changed: by the owner if also a member of the new group; or
          by the group leader of the file's current group if also
          leader of the new group (see intro(5) for more information
          about permissions and users(6) for users and groups).  None
          of the other data can be altered by a wstat.  In particular,
          there is no way to change the owner of a file.

          Either all the changes in a wstat request happen, or none
          of them does: if the request succeeds, all changes were
          made; if it fails, none were.

          A wstat request can explicitly avoid modifying some proper-
          ties of the file by providing explicit ``don't touch'' val-
          ues in the stat data that is sent: zero-length strings for
          text values and ~0 for integral values.

          A read of a directory yields an integral number of directory
          entries in the machine independent encoding given above (see
          read(5)).

          Note that since the stat information is sent as a 9P
          variable-length datum, it is limited to a maximum of 65535
          bytes.

     ENTRY POINTS
          Stat messages are generated by fstat and stat.

          Wstat messages are generated by fwstat and wstat.

     Page 16                      Plan 9             (printed 1/27/00)

     VERSION(5)                                             VERSION(5)

     NAME
          version - negotiate protocol version

     SYNOPSIS
          Tversion size[4] tag[2] msize[4] version[s]
          Rversion size[4] tag[2] msize[4] version[s]

     DESCRIPTION
          The version request negotiates the protocol version and mes-
          sage size to be used on the connection.  Tversion must be
          the first message sent on the 9P connection, and the client
          cannot issue any further requests until it has received the
          Rversion reply.

          The client suggests a maximum message size, msize, that is
          the maximum length, in bytes, it will ever generate or
          expect to receive in a single 9P message.  This count
          includes all 9P protocol data, starting from the size field
          and extending through the message, but excludes enveloping
          transport protocols.  The server responds with its own maxi-
          mum, msize, which must be less than or equal to the client's
          value.  Thenceforth, both sides of the connection must honor
          this limit.

          The version string identifies the level of the protocol.
          The string must always begin with the two characters ``9P''.
          If the server does not understand the client's version
          string, it should respond with an Rversion message (not
          Rerror) with the version string the 7 characters
          ``unknown''.

          The server may respond with the client's version string, or
          a version string identifying an earlier defined protocol
          version.  Currently, the only defined version is the 6 char-
          acters ``9P2000''.  Version strings will be defined such
          that, if the client string contains one or more period char-
          acters, the initial substring up to but not including any
          single period in the version string defines a version of the
          protocol.  Other version strings may also be valid, however.

          The client and server will use the protocol version defined
          by the server's response for all subsequent communication on
          the connection.
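
          The negotiation rules above might be sketched on the
          server side as follows, in portable C99.  The VersionReply
          type and the negotiate function are hypothetical.  As a
          simplification, only the text before the first period is
          examined, and 9P2000 is the only version recognized.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical contents of an Rversion reply; names invented for this sketch. */
typedef struct VersionReply {
	uint32_t msize;
	char version[16];
} VersionReply;

VersionReply
negotiate(uint32_t clientmsize, const char *clientversion, uint32_t servermax)
{
	VersionReply r;
	size_t n;

	/* The server's msize must be less than or equal to the client's. */
	r.msize = clientmsize < servermax ? clientmsize : servermax;

	/* Look only at the part before the first period, e.g. "9P2000.x" gives "9P2000". */
	n = strcspn(clientversion, ".");
	if(n == 6 && strncmp(clientversion, "9P2000", 6) == 0)
		strcpy(r.version, "9P2000");
	else
		strcpy(r.version, "unknown");	/* sent in an Rversion, not an Rerror */
	return r;
}
```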

     ENTRY POINTS
          The kernel generates the version message during the first
          mount system call on the connection.

     Page 17                      Plan 9             (printed 1/27/00)

     WALK(5)                                                   WALK(5)

     NAME
          walk - descend a directory hierarchy

     SYNOPSIS
          Twalk  size[4] tag[2] fid[4] newfid[4] nwname[2]
          nwname*(wname[s])
          Rwalk  size[4] tag[2] nqid[2] nqid*(qid[13])

     DESCRIPTION
          The walk request carries as arguments an existing fid, which
          must represent a directory, and a proposed newfid (which
          must not be in use unless it is the same as fid) that the
          client wishes to associate with the result of descending the
          directory hierarchy by `walking' the hierarchy using the
          successive path name elements wname.

          The fid must be valid in the current session and must not
          have been opened for I/O by an open or create message.  If
          the full sequence of nwname elements is walked successfully,
          newfid will represent the file that results.  If not, newfid
          (and fid) will be unaffected.  However, if newfid is in use
          or otherwise illegal, an Rerror is returned.

          The element ``..''  (dot-dot) represents the parent direc-
          tory.  The name ``.''  (dot), meaning the current directory,
          is not used in the protocol.

          It is legal for nwname to be zero, in which case newfid will
          represent the same file as fid and the walk will usually
          succeed; this is equivalent to walking to dot.  The rest of
          this discussion assumes nwname is greater than zero.

          The nwname path name elements wname are walked in order,
          ``elementwise''.  For the first elementwise walk to succeed,
          the file identified by fid must be a directory, and the
          implied user of the request must have permission to search
          the directory (see intro(5)). Subsequent elementwise walks
          have equivalent restrictions applied to the implicit fid
          that results from the preceding elementwise walk.

          If the first element cannot be walked for any reason, Rerror
          is returned.  Otherwise, the walk will return an Rwalk mes-
          sage containing nqid qids corresponding, in order, to the
          files that are visited by the nqid successful elementwise
          walks; nqid is therefore either nwname or the index of the
          first elementwise walk that failed.  The value of nqid can-
          not be zero unless nwname is zero.  Also, nqid will always
          be less than or equal to nwname.  Only if it is equal, how-
          ever, will newfid be affected, in which case it will repre-
          sent the file reached by the final elementwise walk
          requested in the message.
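
          A client might classify an Rwalk along these lines.  This
          is a sketch in portable C99 with invented names; it encodes
          only the relations between nwname and nqid stated above.

```c
/* Possible readings of an Rwalk; names are local to this sketch. */
enum WalkResult {
	WalkOk,		/* every element walked; newfid names the final file */
	WalkPartial,	/* some elements walked; newfid is unaffected */
	WalkBad		/* reply contradicts the rules in the text */
};

/* Classify an Rwalk given the nwname that was sent and the nqid received. */
enum WalkResult
classifyrwalk(unsigned nwname, unsigned nqid)
{
	if(nqid > nwname)
		return WalkBad;		/* nqid never exceeds nwname */
	if(nqid == 0 && nwname != 0)
		return WalkBad;		/* a first-element failure draws Rerror instead */
	if(nqid == nwname)
		return WalkOk;
	return WalkPartial;		/* wname[nqid] is the element that failed */
}
```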

          A walk of the name ``..''  in the root directory of a server
          is equivalent to a walk with no name elements.

          If newfid is the same as fid, the above discussion applies,
          with the obvious difference that if the walk changes the
          state of newfid, it also changes the state of fid; and if
          newfid is unaffected, then fid is also unaffected.

          To simplify the implementation of the servers, a maximum of
          sixteen name elements or qids may be packed in a single mes-
          sage.  This constant is called MAXWELEM in fcall(2). Despite
          this restriction, the system imposes no limit on the number
          of elements in a file name, only the number that may be
          transmitted in a single message.
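
          Since a single message carries at most MAXWELEM elements, a
          longer name has to be spread over several walks.  The
          arithmetic could look like this in portable C99; the two
          function names are hypothetical.

```c
#include <stddef.h>

#define MAXWELEM 16

/* Number of Twalk messages needed to carry nelem name elements. */
size_t
walkmsgs(size_t nelem)
{
	if(nelem == 0)
		return 1;	/* a walk with no elements is still one message */
	return (nelem + MAXWELEM - 1) / MAXWELEM;
}

/* Number of elements carried by message i (counting from 0) of that series. */
size_t
walkchunk(size_t nelem, size_t i)
{
	size_t done = i * (size_t)MAXWELEM;

	if(done >= nelem)
		return 0;
	if(nelem - done < (size_t)MAXWELEM)
		return nelem - done;
	return MAXWELEM;
}
```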

     ENTRY POINTS
          A call to chdir(2) causes a walk.  One or more walk messages
          may be generated by any of the following calls, which evalu-
          ate file names: bind, create, exec, mount, open, remove,
          stat, unmount, wstat. The file name element . (dot) is
          interpreted locally and is not transmitted in walk messages.

     Page 19                      Plan 9             (printed 1/27/00)

     DIRREAD(2)                                             DIRREAD(2)

     NAME
          dirread, dirreadall - read directory

     SYNOPSIS
          #include <u.h>
          #include <libc.h>

          long dirread(int fd, Dir **buf)

          long dirreadall(int fd, Dir **buf)

          #define   STATMAX   65535U

          #define   DIRMAX    (sizeof(Dir)+STATMAX)

     DESCRIPTION
          The data returned by a read(2) on a directory is a set of
          complete directory entries in a machine-independent format,
          exactly equivalent to the result of a stat(2) on each file
          or subdirectory in the directory.  Dirread decodes the
          directory entries into a machine-dependent form.  It reads
          from fd and unpacks the data into an array of Dir structures
          whose address is returned in *buf (see stat(2) for the
          layout of a Dir).  The array is allocated with malloc(2)
          each time dirread is called.

          Dirreadall is like dirread, but reads in the entire
          directory; by contrast, dirread steps through a directory
          one read(2) at a time.

          Directory entries have variable length.  A successful read
          of a directory always returns an integral number of complete
          directory entries; dirread always returns complete Dir
          structures.  See read(5) for more information.
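
          Because entries vary in length, code that walks the raw
          bytes of a directory read has to step from one entry to
          the next.  The portable C99 sketch below counts the
          entries in a buffer.  It assumes, as stat(2) in this
          message describes, that each entry begins with a 16-bit
          little-endian count of the bytes that follow it; the
          function name is invented.

```c
#include <stddef.h>
#include <stdint.h>

enum { BIT16SZ = 2 };

/* Count the complete entries in buf[0..n); returns -1 if the data ends mid-entry. */
long
countentries(const uint8_t *buf, size_t n)
{
	size_t off = 0, len;
	long count = 0;

	while(off < n){
		if(n - off < (size_t)BIT16SZ)
			return -1;	/* not even room for the count */
		len = (size_t)buf[off] | ((size_t)buf[off+1] << 8);
		if(n - off - BIT16SZ < len)
			return -1;	/* entry runs past the end of the buffer */
		off += BIT16SZ + len;
		count++;
	}
	return count;
}
```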

          The constant STATMAX is the maximum size that a directory
          entry can occupy.  The constant DIRMAX is an upper limit on
          the size necessary to hold a Dir structure and all the asso-
          ciated data.

          Dirread returns the number of Dir structures filled in buf.
          The file offset is advanced by the number of bytes actually
          read.

     SOURCE
          /sys/src/libc/9sys/dirread.c

     SEE ALSO
          intro(2), open(2), read(2)

     DIAGNOSTICS
          Sets errstr.

     Page 21                      Plan 9             (printed 1/27/00)

     FCALL(2)                                                 FCALL(2)

     NAME
          Fcall, convS2M, convD2M, convM2S, convM2D, getS, fcallconv,
          dirconv, dirmodeconv, read9pmsg - interface to Plan 9 File
          protocol

     SYNOPSIS
          #include <u.h>
          #include <libc.h>
          #include <auth.h>
          #include <fcall.h>

          uint convS2M(Fcall *f, uchar *ap, uint nap)

          uint convD2M(Dir *d, uchar *ap, uint nap)

          uint convM2S(uchar *ap, uint nap, Fcall *f)

          uint convM2D(uchar *ap, uint nap, Dir *d, char *strs)

          int dirconv(void *o, Fconv*)

          int fcallconv(void *o, Fconv*)

          int dirmodeconv(void *o, Fconv*)

          int read9pmsg(int fd, uchar *buf, uint nbuf)

     DESCRIPTION
          These routines convert messages in the machine-independent
          format of the Plan 9 file protocol, 9P, to and from a more
          convenient form, an Fcall structure:

          #define MAXWELEM 16

          typedef
          struct Fcall
          {
              uchar   type;
              u32int  fid;
              ushort  tag;
              union {
                    struct {
                         u32int  msize;          /* Tversion, Rversion */
                         char    *version;       /* Tversion, Rversion */
                    };
                    struct {
                         u32int  oldtag;         /* Tflush */
                    };
                    struct {
                         char    *ename;         /* Rerror */
                    };
                    struct {
                         Qid     qid;            /* Rattach, Ropen, Rcreate */
                         u32int  iounit;         /* Ropen, Rcreate */
                         ushort  nrauth;         /* Rattach */
                         uchar   *rauth;         /* Rattach */
                    };
                    struct {
                         char    *uname;         /* Tattach */
                         char    *aname;         /* Tattach */
                         ushort  nauth;          /* Tattach */
                         uchar   *auth;          /* Tattach */
                    };
                    struct {
                         char    *authid;        /* Rsession */
                         char    *authdom;       /* Rsession */
                         ushort  nchal;          /* Tsession/Rsession */
                         uchar   *chal;          /* Tsession/Rsession */
                    };
                    struct {
                         u32int  perm;           /* Tcreate */
                         char    *name;          /* Tcreate */
                         uchar   mode;           /* Tcreate, Topen */
                    };
                    struct {
                         u32int  newfid;         /* Twalk */
                         ushort  nwname;         /* Twalk */
                         char    *wname[MAXWELEM]; /* Twalk */
                    };
                    struct {
                         ushort  nwqid;          /* Rwalk */
                         Qid     wqid[MAXWELEM]; /* Rwalk */
                    };
                    struct {
                         vlong   offset;         /* Tread, Twrite */
                         u32int  count;          /* Tread, Twrite, Rread */
                         char    *data;          /* Twrite, Rread */
                    };
                    struct {
                         ushort  nstat;          /* Twstat, Rstat */
                         uchar   *stat;          /* Twstat, Rstat */
                    };
              };
          } Fcall;

          /* these are implemented as macros */

          uchar     GBIT8(uchar*)
          ushort    GBIT16(uchar*)
          ulong     GBIT32(uchar*)
          vlong     GBIT64(uchar*)

          void      PBIT8(uchar*, uchar)
          void      PBIT16(uchar*, ushort)
          void      PBIT32(uchar*, ulong)
          void      PBIT64(uchar*, vlong)

          #define   BIT8SZ     1
          #define   BIT16SZ    2
          #define   BIT32SZ    4
          #define   BIT64SZ    8

          This structure is defined in <fcall.h>.  See section 5 for a
          full description of 9P messages and their encoding.  For all
          message types, the type field of an Fcall holds one of Tnop,
          Rnop, Tsession, Rsession, etc. (defined in an enumerated
          type in <fcall.h>).  Fid is used by most messages, and tag
          is used by all messages.  The other fields are used selec-
          tively by the message types given in comments.

          ConvM2S takes a 9P message at ap of length nap, and uses it
          to fill in Fcall structure f. If the passed message includ-
          ing any data for Twrite and Rread messages is formatted
          properly, the return value is the number of bytes the mes-
          sage occupied in the buffer ap, which will always be less
          than or equal to nap; otherwise it is 0.  For Twrite and
          Rread messages, data is set to a pointer into the argument
          message, not a copy.

          ConvS2M does the reverse conversion, turning f into a mes-
          sage starting at ap. The length of the resulting message is
          returned.  For Twrite and Rread messages, count bytes start-
          ing at data are copied into the message.

          The constant IOHDRSZ is a suitable amount of buffer to
          reserve for storing the 9P header; the data portion of a
          Twrite or Rread will be no more than the buffer size
          negotiated in the Tversion/Rversion exchange, minus
          IOHDRSZ.

          Another structure is Dir, used by the routines described in
          stat(2). ConvM2D converts the machine-independent form
          starting at ap into d and returns the length of the
          machine-independent, input encoding.  The strings in the
          returned Dir structure are stored at successive locations
          starting at strs; if strs is nil they are ignored; however,
          the return value still includes their length.

          ConvD2M does the reverse translation, also returning the
          length of the encoding.  If the buffer is too short, the
          return value will be BIT16SZ and the correct size will be
          returned in the first BIT16SZ bytes.  The macro GBIT16 can
          be used to extract the correct value.  The related macros
          with different sizes retrieve the corresponding-sized quan-
          tities.  PBIT16 and its brethren place values in messages.
          With the exception of handling short buffers in convD2M,
          these macros are not usually needed except by internal rou-
          tines.
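
          For readers without <fcall.h> at hand, the 16- and 32-bit
          get and put operations could be written in portable C99
          roughly as below.  These are stand-in functions with
          invented lower-case names, not the macros themselves; they
          assume only the little-endian order that stat(5) specifies.

```c
#include <stdint.h>

/* Fetch a 16-bit little-endian value from p. */
uint16_t
gbit16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Fetch a 32-bit little-endian value from p. */
uint32_t
gbit32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
		((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Store v at p as 16 bits, least significant byte first. */
void
pbit16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
}

/* Store v at p as 32 bits, least significant byte first. */
void
pbit32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}
```

          After a convD2M call that returns BIT16SZ, a caller would
          apply the 16-bit fetch to the start of its buffer to learn
          the size reported there.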

          GetS reads a message from file descriptor fd into ap and
          converts the message using convM2S into the Fcall structure
          f. The lp argument must point to a long holding the size of
          the ap buffer.  It is somewhat resilient to transient read
          errors.  If convM2S succeeds, its return value is stored in
          *lp, and getS returns zero.  Otherwise getS returns a string
          identifying the error.

          Dirconv, fcallconv, and dirmodeconv are formatting routines,
          suitable for fmtinstall (see print(2)). They convert Dir*,
          Fcall*, and long values into string representations of the
          directory buffer, Fcall buffer, or file mode value.
          Fcallconv assumes that dirconv has been installed with for-
          mat letter `D' and dirmodeconv with format letter `M'.

          Read9pmsg calls read(2) multiple times, if necessary, to
          read an entire 9P message into buf.  The return value is 0
          for end of file, or -1 for error; it does not return partial
          messages.
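
          The idea of assembling one whole message from several
          short reads can be sketched in portable C99 as follows.
          This is not the source of read9pmsg.  It assumes that the
          leading size[4] field is little-endian and counts itself,
          and that a successful call returns the message length,
          which the text above does not state.  The Reader callback
          stands in for read(2), and MemSrc and memread exist only
          to exercise the sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reader: like read(2), returns bytes read, 0 at end of file, -1 on error. */
typedef long (*Reader)(void *ctx, uint8_t *buf, size_t n);

/* Keep reading until n bytes arrive; returns the number obtained, or -1 on error. */
static long
readn(Reader rd, void *ctx, uint8_t *buf, size_t n)
{
	size_t got = 0;
	long m;

	while(got < n){
		m = rd(ctx, buf+got, n-got);
		if(m < 0)
			return -1;
		if(m == 0)
			break;
		got += (size_t)m;
	}
	return (long)got;
}

/* Returns the message length, 0 at a clean end of file, -1 on error or truncation. */
long
readmsg(Reader rd, void *ctx, uint8_t *buf, size_t nbuf)
{
	long m;
	uint32_t size;

	if(nbuf < 4)
		return -1;
	m = readn(rd, ctx, buf, 4);
	if(m == 0)
		return 0;
	if(m != 4)
		return -1;
	size = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
		((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
	if(size < 4 || size > nbuf)
		return -1;	/* malformed, or too big for the caller's buffer */
	if(readn(rd, ctx, buf+4, size-4) != (long)(size-4))
		return -1;	/* never hand back a partial message */
	return (long)size;
}

/* Test source: serves bytes from memory, at most 3 per call, to force short reads. */
typedef struct MemSrc {
	const uint8_t *p;
	size_t n;
} MemSrc;

long
memread(void *ctx, uint8_t *buf, size_t n)
{
	MemSrc *s = ctx;
	size_t k = n < 3 ? n : 3;

	if(k > s->n)
		k = s->n;
	memcpy(buf, s->p, k);
	s->p += k;
	s->n -= k;
	return (long)k;
}
```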

     SOURCE
          /sys/src/libc/9sys

     SEE ALSO
          intro(2), stat(2), intro(5)

     Page 25                      Plan 9             (printed 1/27/00)

     STAT(2)                                                   STAT(2)

     NAME
          stat, fstat, wstat, fwstat, dirstat, dirfstat, dirwstat,
          dirfwstat, nulldir - get and put file status

     SYNOPSIS
          #include <u.h>
          #include <libc.h>

          int stat(char *name, uchar *edir, int nedir)

          int fstat(int fd, uchar *edir, int nedir)

          int wstat(char *name, uchar *edir, int nedir)

          int fwstat(int fd, uchar *edir, int nedir)

          Dir* dirstat(char *name)

          Dir* dirfstat(int fd)

          int dirwstat(char *name, Dir *dir)

          int dirfwstat(int fd, Dir *dir)

          void nulldir(Dir *d)

     DESCRIPTION
          Given a file's name, or an open file descriptor fd, these
          routines retrieve or modify file status information.  Stat,
          fstat, wstat, and fwstat are the system calls; they deal
          with machine-independent directory entries. Their format is
          defined by stat(5). Stat and fstat retrieve information
          about name or fd into edir, a buffer of length nedir,
          defined in <libc.h>.  Wstat and fwstat write information
          back, thus changing file attributes according to the con-
          tents of edir. The data returned from the kernel includes
          its leading 16-bit length field as described in intro(5).
          For symmetry, this field must also be present when passing
          data to the kernel in a call to wstat or fwstat, but its
          value is ignored.

          Dirstat, dirfstat, dirwstat, and dirfwstat are similar to
          their counterparts, except that they operate on Dir struc-
          tures:

               typedef
               struct Dir {
                     /* system-modified data */
                     uint  type;    /* server type */
                     uint  dev;     /* server subtype */

                     /* file data */
                     Qid   qid;     /* unique id from server */
                     ulong mode;    /* permissions */
                     ulong atime;   /* last read time */
                     ulong mtime;   /* last write time */
                     vlong length;  /* file length: see <u.h> */
                     char  *name;   /* last element of path */
                     char  *uid;    /* owner name */
                     char  *gid;    /* group name */
                     char  *muid;   /* last modifier name */
               } Dir;

          The returned structure is allocated by malloc(2); freeing it
          also frees the associated strings.

          This structure and the Qid structure are defined in
          <libc.h>.  If the file resides on permanent storage and is
          not a directory, the length returned by stat is the number
          of bytes in the file.  For directories, the length returned
          is zero.  For files that are streams (e.g., pipes and net-
          work connections), the length is the number of bytes that
          can be read without blocking.

          Each file is the responsibility of some server: it could be
          a file server, a kernel device, or a user process.  Type
          identifies the server type, and dev says which of a group of
          servers of the same type is the one responsible for this
          file.  Qid is a structure containing path and vers fields:
          path is guaranteed to be unique among all path names cur-
          rently on the file server, and vers changes each time the
          file is modified.  The path is a long long (64 bits, vlong)
          and the vers is an unsigned long (32 bits, ulong).  Thus, if
          two files have the same type, dev, and qid they are the same
          file.

          The bits in mode are defined by

                0x80000000   directory
                0x40000000   append only
                0x20000000   exclusive use (locked)

                      0400   read permission by owner
                      0200   write permission by owner
                      0100   execute permission (search on directory) by owner
                      0070   read, write, execute (search) by group
                      0007   read, write, execute (search) by others

          There are constants defined in <libc.h> for these bits:
          DMDIR, DMAPPEND, and DMEXCL for the first three; and DMREAD,
          DMWRITE, and DMEXEC for the read, write, and execute bits
          for others.

          The two time fields are measured in seconds since the epoch
          (Jan 1 00:00 1970 GMT).  Mtime is the time of the last
          change of content.  Similarly, atime is set whenever the
          contents are accessed; also, it is set whenever mtime is
          set.

          Uid and gid are the names of the owner and group of the
          file; muid is the name of the user that last modified the
          file (setting mtime).  Groups are also users, but each
          server is free to associate a list of users with any user
          name g, and that list is the set of users in the group g.
          When an initial attachment is made to a server, the user
          string in the process group is communicated to the server.
          Thus, the server knows, for any given file access, whether
          the accessing process is the owner of, or in the group of,
          the file.  This selects which set of three bits in mode is
          used to check permissions.
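
          Read literally, that selection could be sketched in
          portable C99 as below.  The function names are invented,
          and an actual server is free to apply a different policy
          (for instance, also honoring the bits for others when the
          accessor is the owner).

```c
#include <stdint.h>

/* Return the three permission bits (read 4, write 2, execute 1) that apply. */
unsigned
permbits(uint32_t mode, int isowner, int ingroup)
{
	if(isowner)
		return (mode >> 6) & 7;
	if(ingroup)
		return (mode >> 3) & 7;
	return mode & 7;
}

/* Nonzero if every bit in want is granted to this accessor. */
int
mayaccess(uint32_t mode, int isowner, int ingroup, unsigned want)
{
	return (permbits(mode, isowner, ingroup) & want) == want;
}
```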

          Only some of the fields may be changed with the wstat calls.
          The name can be changed by anyone with write permission in
          the parent directory.  The mode and mtime can be changed by
          the owner or the group leader of the file's current group.
          The gid can be changed by the owner if he or she is a member
          of the new group.  The gid can be changed by the group
          leader of the file's current group if he or she is the
          leader of the new group.  The length can be changed by
          anyone with write permission, provided the operation is
          implemented by the server.  (See intro(5) for permission
          information, and users(6) for user and group information.)

          Special values in the fields of the Dir passed to wstat
          indicate that the field is not intended to be changed by the
          call.  The values are ~0 for integral values and the empty
          string for string values.  The routine nulldir initializes a
          Dir to all `ignore' values.  Thus one may change the mode,
          for example, by using nulldir to initialize a Dir, then set-
          ting the mode, and then doing wstat; it is not necessary to
          use stat to retrieve the initial values first.
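
          The change-only-the-mode recipe might look like this in
          portable C99.  SketchDir is a reduced, hypothetical
          stand-in for Dir, and sketchnulldir only imitates what
          nulldir is described as doing; neither is the library
          code.

```c
#include <stdint.h>

/* Hypothetical, reduced stand-in for Dir; type, dev and qid are omitted. */
typedef struct SketchDir {
	uint32_t mode;
	uint32_t atime;
	uint32_t mtime;
	int64_t  length;
	const char *name;
	const char *uid;
	const char *gid;
	const char *muid;
} SketchDir;

/* Set every field to its "don't touch" value: ~0 for integers, "" for strings. */
void
sketchnulldir(SketchDir *d)
{
	d->mode = ~(uint32_t)0;
	d->atime = ~(uint32_t)0;
	d->mtime = ~(uint32_t)0;
	d->length = ~(int64_t)0;
	d->name = d->uid = d->gid = d->muid = "";
}

/* Prepare a structure that asks wstat to change the mode and nothing else. */
void
onlymode(SketchDir *d, uint32_t mode)
{
	sketchnulldir(d);
	d->mode = mode;
}
```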

     SOURCE
          /sys/src/libc/9syscall  for the non-dir routines
          /sys/src/libc/9sys      for the routines prefixed dir

     SEE ALSO
          intro(2), fcall(2), dirread(2), stat(5)

     DIAGNOSTICS
          All these functions return the number of bytes copied on
          success, -1 on error, and set errstr.

          If the buffer for stat or fstat is too short for the
          returned data, the return value will be BIT16SZ (see
          fcall(2)) and the two bytes returned will contain the ini-
          tial count field of the returned data; retrying with nedir
          equal to that value plus BIT16SZ (for the count itself)
          should succeed.
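
          The retry described above could be wrapped up as in this
          portable C99 sketch.  StatFn is a hypothetical stand-in
          for stat or fstat so that the logic can run outside Plan
          9; FakeEntry and fakestat exist only to exercise it, and
          statretry is an invented name.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { BIT16SZ = 2 };

/* Hypothetical stand-in for stat(2): fills edir, returns bytes copied or -1. */
typedef int (*StatFn)(void *ctx, uint8_t *edir, int nedir);

/* Try with guess bytes; on a BIT16SZ reply, retry with the reported size.
 * Returns a malloc'ed buffer and sets *np, or NULL on failure. */
uint8_t*
statretry(StatFn st, void *ctx, int guess, int *np)
{
	uint8_t *buf;
	int n, need;

	if(guess < BIT16SZ)
		guess = BIT16SZ;
	buf = malloc((size_t)guess);
	if(buf == NULL)
		return NULL;
	n = st(ctx, buf, guess);
	if(n == BIT16SZ){
		/* too short: the two bytes hold the count of the data after them */
		need = (buf[0] | (buf[1] << 8)) + BIT16SZ;
		free(buf);
		buf = malloc((size_t)need);
		if(buf == NULL)
			return NULL;
		n = st(ctx, buf, need);
	}
	if(n <= BIT16SZ){
		free(buf);
		return NULL;
	}
	*np = n;
	return buf;
}

/* Fake entry for demonstration: data already carries its 16-bit count. */
typedef struct FakeEntry {
	const uint8_t *data;
	int n;
} FakeEntry;

int
fakestat(void *ctx, uint8_t *edir, int nedir)
{
	FakeEntry *e = ctx;

	if(nedir < BIT16SZ)
		return -1;
	if(nedir < e->n){
		memcpy(edir, e->data, BIT16SZ);	/* only the count fits */
		return BIT16SZ;
	}
	memcpy(edir, e->data, (size_t)e->n);
	return e->n;
}
```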

     Page 29                      Plan 9             (printed 1/27/00)