To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
In-reply-to: Your message of "Sat, 05 Dec 2009 08:24:45 -1000."
References: <20091205031747.GA8759@nipl.net>
From: Bakul Shah
Date: Sat, 5 Dec 2009 11:47:40 -0800
Message-Id: <20091205194741.0697D5B76@mail.bitblocks.com>
Subject: Re: [9fans] ideas for helpful system io functions

On Sat, 05 Dec 2009 08:24:45 -1000 Tim Newsham wrote:
> >> I can see two possible solutions for this, both of which
> >> would be useful in my opinion:
> >>
> >> - an "unread" function, like ungetc, which allows a program
> >>   to put back some data that was already read to the OS
> >>   stdin buffer (not the stdio buffer). This might be
> >>   problematic if there is a limit to the size of the
> >>   buffers.
> >
> > Wouldn't it be a lot easier to change the convention of the
> > program you're forking and execing to take 1) a buffer of
> > data (passed via cmd line, or fd, or whatever) and 2) the fd
> > with the unconsumed part of the data? The only data that
> > would have to be copied would be the preconsumed data that
> > you would have wanted to "unget".
>
> ps. if you wanted to hide this ugliness of passing a buffer
> and fd to a child process instead of just passing an fd, you
> could still solve it in userland without a syscall. Write a
> library that does buffered IO. Include unget() if you like.
> Write the library in a way that you can initialize it after a
> fork/exec to pick up state from the parent (ie. by taking two
> fds, reading the buffer from the first, and continuing on
> with the 2nd when it is exhausted).
>
> Is there much benefit in doing this in the kernel instead?

Some OS support will help... but first let me provide some
motivation!

A useful abstraction for this sort of thing is "streams", as in
functional programming languages, where the tail of a stream is
computed as needed and the already computed prefix can be reread
as many times as you wish (anything no one can reference any
more gets garbage collected). So, for example, if I define a
"primes" stream in Haskell, I can evaluate 100 `take` primes any
number of times and always get the first 100 primes. If I wanted
to pass the entire primes stream *minus* the first 100 to a
function, I'd use 100 `drop` primes to get a new stream.

In the example given, you'd represent your http data as a stream
(its tail is "computed" as you read from the socket/fd), do any
preprocessing you want, and then pass the whole stream on. Data
already read is buffered, and you can reread it from the stream.

Unix and Plan 9 sort of do this for files, but not when an fd
refers to a fifo of some sort. For an open file, after a fork
both the parent and the child start off at the same place in
the file, but from then on they can read at different rates. IO
to fifos/sockets doesn't share this behavior.

The OS support I am talking about:

a) The fork behavior on an open file should be available
   *without* forking. dup() doesn't cut it (both fds share the
   same offset on the underlying file). I'd call the new
   syscall fdfork(). That is, if I do

	int newfd = fdfork(oldfd);

   then reading N bytes each from newfd and oldfd will return
   identical data (there is a sketch after this list).

b) There should be a way to implement the same semantics for
   fifos or communication end points (or any synthetic file).
   In the above example the same N bytes must be returned even
   if the underlying object is not a file.

c) There should be a way to pass the fd (really, a capability)
   to another process.
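To make the contract of a) concrete, here is a rough sketch in
Plan 9 C. fdfork() is of course hypothetical (it is the syscall
being proposed), and "/tmp/data" is just a made-up input file;
the assert states the property I'm after:

	#include <u.h>
	#include <libc.h>

	void
	main(void)
	{
		char a[100], b[100];
		int oldfd, newfd;

		oldfd = open("/tmp/data", OREAD);	/* any ordinary file */
		newfd = fdfork(oldfd);	/* hypothetical: same data, independent offset */

		readn(oldfd, a, sizeof a);	/* advances only oldfd's offset */
		readn(newfd, b, sizeof b);	/* newfd still starts where oldfd was at fdfork time */
		assert(memcmp(a, b, sizeof a) == 0);	/* the contract: identical bytes */
		exits(nil);
	}

For a plain file this is easy; b) asks for the same observable
behavior when the fd is a pipe or network connection, which is
where the kernel has to buffer.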
Given these, what the OP wants can be implemented cleanly: you
fdfork() first, do all your analysis using one fd, close it, and
then pass the other fd on to a helper process.

Implementing b) ideally requires the OS to store a potentially
arbitrary amount of data, so an implementation must set some
practical limit (like the existing limit on fifo buffering).
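For concreteness, the OP's http case might look roughly like the
sketch below. Again, fdfork() is the proposed syscall; the
"capability passing" of c) is reduced here to plain fd
inheritance across rfork/exec (the simplest case, where the
helper is a child); and analyze() and /bin/helper are made-up
stand-ins:

	#include <u.h>
	#include <libc.h>

	/* stand-in: read and inspect a prefix of the data */
	static void
	analyze(int fd)
	{
		char buf[512];

		read(fd, buf, sizeof buf);
	}

	void
	handoff(int fd)	/* fd: the http connection */
	{
		int peekfd;

		peekfd = fdfork(fd);	/* hypothetical: independent cursor on the same stream */
		analyze(peekfd);	/* consumes from peekfd only */
		close(peekfd);	/* fd itself has consumed nothing */

		switch(rfork(RFPROC|RFFDG)){
		case -1:
			sysfatal("rfork: %r");
		case 0:
			dup(fd, 0);	/* helper reads the whole stream on stdin */
			execl("/bin/helper", "helper", nil);
			sysfatal("exec: %r");
		default:
			close(fd);
		}
	}

If the helper is not a child, c) needs a real fd-passing
mechanism instead of inheritance, but the shape of the code
stays the same; in either case the replayable prefix is bounded
by the practical limit mentioned above.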