The prime functional characteristics of pipes on Unix are:

1) they are a pseudo-device that can be created on demand by programs
2) communication is one-way, from one or more write channels to one or more read channels
3) if either all read channels or all write channels are deassigned, the other communicating partners are notified via an error condition (Unix calls this "broken pipe").

The last characteristic, detection of broken pipes, is the key.

The restriction of pipes to one-way communication is required for broken pipe detection. If there were exactly one reader and one writer, then it would be easy to detect a broken pipe: if the channel count drops to 1, then the pipe is broken. This is how DECnet-VAX detects "network partner exited" conditions. However, because fork(2) can cause the cloning of open I/O channels, this isn't sufficient in the Unix pipe case. One can have multiple readers and multiple writers. It therefore is necessary to restrict each individual I/O channel to be either read-only or write-only. With that restriction in place, the I/O system can detect "broken pipe"--the condition exists when there are readers but no writers, or writers but no readers.

VMS mailboxes have most of the characteristics required for pipes. There is a service to create the pseudo-devices from user-mode code. You can have multiple read and write channels. The only problem is that you cannot detect the broken pipe condition. One cannot do this because channels assigned to a mailbox can be used for both reading and writing. There is no way to tell which is which, so one cannot tell when all readers have gone away or when all writers have gone away.

This is why I had to write a new driver to support pipes on VMS. The driver was written from scratch, but follows the design of the mailbox driver very closely. Here is a functional summary of the VMS pipe driver.

- You create a pipe by assigning a channel to the template device PIPE0:.
  You get back a single I/O channel to a newly-cloned pipe pseudo-device. You can use $GETDVI to find out the name of the new device so that you can assign other channels to it. The device is created with the characteristics bits MBX, REC, SHR, IDV, ODV, device class DC$_MAILBOX, device type DC$_PIPE. RMS treats it as a mailbox.

  I opted for dynamic UCB cloning rather than building a $CREPIPE system service to do the cloning. Since I was not in the VMS group, I could not add a system service easily. Also, using dynamic cloning is more flexible. It makes it easy to create pipes from DCL level, for example, whereas it is nearly impossible to create and use mailboxes from DCL level because there is no way to get at $CREMBX from command level.

  This does mean, though, that one cannot control buffer quota, max. message size, device protection, or assignment of a logical name as part of the device creation call. Max. message size (the UCB device buffer size) and device protection can be set by IO$_SETMODE calls. The pipe driver does not allow for user control of buffer quota (more about that later, though). The device protection set upon cloning is (S,O:RWLP,G,W). This is what you want for communication among child processes in the same job tree, which is the usual application of pipes.

- When one initially assigns a channel to a pipe, it is "untyped"--the driver does not know if it is a read or a write channel. The first I/O operation to the channel determines which type of channel it is. If the first operation is IO$_READxBLK (virtual, logical, and physical are all the same for pipes) or IO$_SETMODE!IO$M_WRTATTN, then the channel is a read-only channel. If the first operation is IO$_WRITExBLK or IO$_SETMODE!IO$M_READATTN then the channel becomes a write-only channel. Once the type of a channel has been set, attempts to do the opposite operation (a write to a read channel or a read to a write channel) generate SS$_ILLIOFUNC errors.
  Sometimes it is desirable to declare the type of a channel before you actually do any I/O to it. The I/O functions IO$_SETMODE!IO$M_READCHAN and IO$_SETMODE!IO$M_WRITECHAN exist for this purpose.

- I/O to a pipe is done via the usual IO$_READxBLK, IO$_WRITExBLK, and IO$_WRITEOF function codes. These behave the same way that they do with mailboxes, except that there is no IO$M_NOW modifier (see the I/O COMPLETION item below). The IO$M_READATTN and IO$M_WRTATTN modifiers to IO$_SETMODE are available and function identically to those of the mailbox driver. There is an IO$M_STREAM modifier to IO$_READxBLK and IO$_WRITExBLK that implements true Unix-style stream-mode I/O (this is the only device in VMS to do this on the $QIO level, in fact). The default is record mode. More about stream mode later.

- BUFFERING: Both mailboxes and pipes buffer all in-transit records in non-paged pool. The drivers differ in the quota bookkeeping, however.

  The mailbox driver grabs a fixed amount of pool quota from the creator of the device at $CREMBX time. The buffer quota is set in the $CREMBX call or defaulted from a SYSGEN parameter. When a program writes to a mailbox, the data are copied into non-paged pool and the buffer quota is decremented accordingly. If the quota reaches zero, then the writing process is put into RWMBX state (if system service resource wait mode is on), or the write fails with SS$_MBFULL status (if system service resource wait mode is off). RWMBX state is quite nasty because it prevents the process from being deleted until and unless somebody empties the mailbox enough to let the write operation complete. However, it does mean that writes to a mailbox never use up process BYTLM quota (the quota having been previously deducted from the creator of the mailbox). I asked several old-time VMS developers and was never able to get a satisfactory explanation or rationalization for the existence of RWMBX state in the system.
  It seems to cause more harm than good, so I left it out of the pipe driver design.

  Pipes have a fixed 4096-byte lien on non-paged pool. When a process writes to a pipe, the driver allocates a buffer of the appropriate size from non-paged pool and moves the data into it. If the 4096-byte lien has not been fully used up, then the driver does not deduct any process BYTLM quota from the writer. If the 4096-byte lien has been used up, then the pipe driver deducts the difference from process BYTLM quota--in other words, it is like any run-of-the-mill buffered I/O operation. There is no use of RWMBX state for buffer quota control.

- I/O COMPLETION: The normal operation of the mailbox driver is for write operations not to complete until the record has been read from the mailbox. One must specify IO$M_NOW if one requires the operation to complete as soon as the data are moved into the mailbox buffer in non-paged pool. Likewise, the mailbox driver provides an IO$M_NOW function for reads to allow a reader to poll the mailbox for the presence of messages.

  The pipe driver does not provide an IO$M_NOW function. A write operation always completes immediately if the message is covered by the 4096-byte lien on non-paged pool. If user process BYTLM was required to cover all or part of the message, then the write operation does not complete until all of the message is covered by the lien, or until the message is read from the pipe. Thus, a write of a message longer than 4096 bytes will not complete until all but 4096 bytes of it have been read (if mixed record and stream mode I/O is being done, it is possible for messages to be partially read--more about this below). Reads never complete until something has been read.

  I could have implemented IO$M_NOW, but I chose not to do so.
  The chosen implementation offers proper quota management (without RWMBX state), and allows for asynchronous operation of the readers and writers in the default I/O case (e.g., RMS I/O), something that doesn't happen with mailboxes. A writer doing RMS I/O to a mailbox stalls until the message is read. With pipes, he does not stall until he gets over 4096 bytes ahead of the reader. The pipe driver design allows efficient processing overlap in pipelines without the need for any special programming.

- MAILBOX MODE VS. ULTRIX MODE: In a Unix-style pipeline, several images may be run in succession with their output going down the pipe. RMS thinks the pipe is a mailbox, and so it writes an EOF record to the pipe every time one of the images closes its file on the pipe. In this case, we want to ignore the EOF records and treat the breakage of the pipe at the end of the whole I/O sequence as the EOF condition. However, in "normal" VMS usage, one wants to pass EOF records just as the mailbox driver does.

  I invented the concept of "Ultrix mode" versus "mailbox mode" to handle this. Pipes are created in mailbox mode. The $QIO call IO$_SETMODE!IO$M_ULTRIX places the pipe device in Ultrix mode. IO$_SETMODE!IO$M_MAILBOX will put it back. There are two differences between the modes:

  a) In mailbox mode, an IO$_WRITEOF operation puts an EOF record in the pipe. In Ultrix mode, IO$_WRITEOF completes successfully but is a no-op.
  b) In mailbox mode, reads to a broken pipe terminate with SS$_LINKDISCON status. In Ultrix mode, reads to a broken pipe terminate with SS$_ENDOFFILE status.

- BROKEN PIPE NOTIFICATION: The pipe driver keeps counts of the numbers of read and write channels assigned, and two additional state bits: readers-have-existed and writers-have-existed. The readers-have-existed bit is set when the first read channel is assigned. The writers-have-existed bit is set when the first write channel is assigned.
  A broken pipe condition exists whenever:

  a) a write operation is pending on the pipe, readers-have-existed is set, but the current count of read channels is zero
  b) a read operation is pending on the pipe, the pipe is empty, writers-have-existed is set, but the current count of write channels is zero

  The two "have-existed" bits exist to coordinate startup of the pipe communication. Without those bits, there would be a race condition between the first write to the pipe and the first read to the pipe. It is not an error to be writing to a pipe that has no readers and has never had readers, or to be reading from a pipe that has no writers and has never had writers. There is the potential for a hang condition here if, for example, the reader process dies before it ever gets a chance to open its channel to the pipe. The same potential exists on Unix. In practice, it is not a problem, especially since writes to a pipe cannot put you in a resource-wait state from which there is no exit (RWMBX state).

  If all writers have exited, readers can continue to read from the pipe without error until they have emptied it. This is necessary so that writers don't have to wait around for all of their data to be read.

  Reads to a broken pipe complete with SS$_ENDOFFILE status if the pipe has been set in Ultrix mode, or with SS$_LINKDISCON (network partner disconnected logical link) status if the pipe is in mailbox mode (the default). Writes to a broken pipe complete with SS$_LINKDISCON status regardless of mode.

- STREAM MODE: The pipe driver provides a modifier, IO$M_STREAM, for both IO$_READxBLK and IO$_WRITExBLK. The presence of the modifier indicates that the I/O operation is to be performed in stream mode rather than in record mode.

  A stream mode read operation always reads the requested number of bytes from the pipe. It ignores record boundaries.
  For example, if there are three 10-byte records in the pipe, and a $QIO specifies a 15-byte stream mode read, the first record and the first 5 bytes of the second record will be read and put in the user's buffer as one chunk of data. That will leave the remaining 5 bytes of record 2 and all of record 3 in the pipe. A subsequent $QIO read in record mode will read the 5 bytes of record 2.

  There is one case where a stream mode read doesn't read exactly the number of bytes that the user specified. That is if end-of-pipe is detected. End-of-pipe is either an EOF record in the pipe, or a broken pipe. In both of these cases, the read in progress terminates with a short byte count. The next read issued picks up the EOF or LINKDISCON condition. This is exactly the Unix semantics for reads from pipes.

  A stream mode write operation is the same as a record mode write operation except that the write doesn't imply a record boundary. For example, suppose there are three $QIOs specifying stream mode write, each for 5 bytes, followed by two record mode writes, each for 10 bytes. A record mode reader will see two records. The first is 25 bytes long, the second is 10 bytes long.

I will send you a copy of the complete pipe driver specification so you can see how this looks in toto.

IMPLICATIONS FOR P.TBD INTERPROCESS COMMUNICATION

Clearly P.ULTRIX requires Unix-compatible pipes. The driver design outlined above accomplishes this in a way that is compatible with simultaneous use by a record-oriented access method such as RMS. IO$M_STREAM probably isn't necessary: the Ultrix read(2) and write(2) facilities, which are what present a stream mode interface to programs, have to deal with block-oriented devices such as disks and tapes anyway, so they are capable of doing the necessary record blocking and deblocking to make anything appear to be stream-oriented regardless of its underlying record-oriented characteristics.

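The deblocking that a read(2)-style layer would do over a record-oriented pipe follows the same rules as the stream mode examples given earlier. Here is a minimal sketch in Python of a toy in-memory model of those rules. It is not the driver: the ToyPipe class and its write, read_record, and read_stream methods are names made up for illustration, nothing ever stalls (a read that finds too little data just returns short), and EOF records and broken-pipe handling are left out.

```python
from collections import deque


class ToyPipe:
    """Toy model of record-mode and stream-mode I/O (hypothetical, not the driver)."""

    def __init__(self):
        self._closed = deque()    # complete records, oldest first
        self._open = bytearray()  # stream-written bytes with no record boundary yet

    def write(self, data, stream=False):
        """A stream write adds bytes only; a record write also ends the record."""
        self._open += data
        if not stream:
            self._closed.append(bytes(self._open))
            self._open = bytearray()

    def read_record(self):
        """Record-mode read: return whatever is left of the oldest complete record."""
        return self._closed.popleft()

    def read_stream(self, count):
        """Stream-mode read: take up to `count` bytes, ignoring record boundaries."""
        out = bytearray()
        while len(out) < count and self._closed:
            head = self._closed.popleft()
            take = count - len(out)
            out += head[:take]
            if len(head) > take:  # put the unread tail back at the front
                self._closed.appendleft(head[take:])
        if len(out) < count and self._open:
            take = count - len(out)
            out += self._open[:take]
            del self._open[:take]
        return bytes(out)


def demo():
    # Three 10-byte records, then a 15-byte stream read and a record read.
    p = ToyPipe()
    for _ in range(3):
        p.write(b"x" * 10)
    first = p.read_stream(15)   # record 1 plus the first 5 bytes of record 2
    second = p.read_record()    # the remaining 5 bytes of record 2

    # Three 5-byte stream writes followed by two 10-byte record writes.
    q = ToyPipe()
    for _ in range(3):
        q.write(b"s" * 5, stream=True)
    for _ in range(2):
        q.write(b"r" * 10)
    return len(first), len(second), len(q.read_record()), len(q.read_record())


if __name__ == "__main__":
    print(demo())  # (15, 5, 25, 10)
```

The four lengths match the two worked examples: a 15-byte stream read followed by a 5-byte record read, and a record mode reader seeing records of 25 and 10 bytes.
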
One thing that you get for free with the VMS pipe driver design is that pipes have names. "Named pipes" are a relatively recent innovation in the Unix world and are all the rage these days.

It is not as clear that P.VMS needs VAX/VMS-compatible mailboxes. In the vast majority of cases, channels assigned to mailboxes are used either exclusively for reading or exclusively for writing, and therefore pipes would suffice. In fact, use of pipes in place of mailboxes would relieve implementors of all the defensive programming you need with mailboxes to get around the fact that with a mailbox there's no way to tell that the other end of the communications link has gone away. To be conservative, though, it's probably a good idea to provide a mailbox-compatible facility.

I think that all of the needs in this space could be addressed by a single I/O facility for interprocess communication. The desirable characteristics are:

1) The pseudo-device object is created by the P.TBD equivalent of dynamic UCB cloning on VAX/VMS. That is, the object is created when an I/O channel is assigned to a template object. It should be possible to get one of these devices without doing an explicit system service call such as SYS$CREMBX or pipe(2). Using dynamic UCB cloning allows the pseudo-devices to be created from command level without any special support (such as a lexical function).

2) The object has two major modes of operation: mailbox mode and pipe mode. Upon creation, the object is in pipe mode. The P.TBD equivalent of an IO$_SETMODE $QIO operation switches the device between mailbox mode and pipe mode.

   a) In mailbox mode, the device acts like a VAX/VMS mailbox. Channels can be used for either reading or writing. You get no "broken pipe" notification. Operations stall unless an IO$M_NOW modifier is present.
   b) In pipe mode, the device acts like a Unix pipe. Each channel can be used only for reading or only for writing, but not both.
      The first operation to a channel determines its type (read-only or write-only). Type can be explicitly declared via an IO$_SETMODE-like call. Broken pipe notification semantics are as for the VAX/VMS pipe driver when operating in Ultrix mode. IO$_WRITEOF is a no-op. IO$M_NOW is ignored--the device always behaves like the VAX/VMS pipe driver as regards stalling.
   c) The equivalents of the IO$M_READATTN and IO$M_WRTATTN operations should operate the way they do for VAX/VMS mailboxes regardless of mode. These two operations set the mode of the I/O channel.

3) The VMS compatibility library would supply a SYS$CREMBX call that would assign a channel to the pseudo-device template object, put the object in mailbox mode, do IO$_SETMODE-equivalent calls to set the protection and buffer size characteristics the way the user wanted them, then return the assigned channel to the caller.

4) The P.Ultrix library would supply a pipe(2) call that would assign a channel to the pseudo-device template object, assign another channel to the cloned pipe object, then do an IO$_SETMODE!IO$M_READCHAN-equivalent I/O call to set the first channel read-only, and an IO$M_WRITECHAN-equivalent call to set the second channel write-only. Then it would return both channels to the caller.

5) No equivalent of the RWMBX resource wait state should be provided. If a pipe or mailbox fills, the current I/O operation merely should be stalled.

I hope all this stuff is helpful. If there are any specific questions about pipes or mailboxes that I can answer, just ask.

--PSW