9fans - fans of the OS Plan 9 from Bell Labs
From: Bakul Shah <bakul+plan9@bitblocks.com>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Subject: Re: [9fans] ideas for helpful system io functions
Date: Sat,  5 Dec 2009 11:47:40 -0800
Message-ID: <20091205194741.0697D5B76@mail.bitblocks.com>
In-Reply-To: Your message of "Sat, 05 Dec 2009 08:24:45 -1000." <Pine.BSI.4.64.0912050822560.13404@malasada.lava.net>

On Sat, 05 Dec 2009 08:24:45 -1000 Tim Newsham <newsham@lava.net>  wrote:
> >> I can see two possible solutions for this, both of which would be useful
> >> in my opinion:
> >>
> >>  - an "unread" function, like ungetc, which allows a program to put back
> >>    some data that was already read to the OS stdin buffer (not the stdio
> >>    buffer).  This might be problematic if there is a limit to the size of
> >>    the buffers.
> >
> > Wouldn't it be a lot easier to change the convention of the
> > program you're forking and execing to take 1) a buffer of data
> > (passed via cmd line, or fd, or whatever) and 2) the fd with
> > the unconsumed part of the data?  The only data that would have
> > to be copied would be the preconsumed data that you would have
> > wanted to "unget".
>
> ps. if you wanted to hide this ugliness of passing a buffer and
> fd to a child process instead of just passing an fd, you could
> still solve it in userland without a syscall.  Write a library
> that does buffered IO.  Include unget() if you like.  Write the
> library in a way that you can initialize it after a fork/exec
> to pick up state from the parent (ie. by taking two fds,
> reading the buffer from the first, and continuing on with the
> 2nd when it is exhausted).
>
> Is there much benefit in doing this in the kernel instead?

Some OS support will help... but first let me provide some
motivation!

A useful abstraction for this sort of thing is "streams" as
in functional programming languages, where the tail of a
stream is computed as needed and the computed prefix of the
stream can be reread as many times as you wish (stuff no one
can reference any more will be garbage collected).  So for
example, if I define a "primes" stream, I can do

    100 `take` primes

in Haskell any number of times and always get the first 100
primes. If I wanted to pass the entire primes stream *minus* the
first 100 to a function, I'd use "100 `drop` primes" to get
a new stream.

In the example given you'd represent your http data as a
stream (its tail is "computed" as you read from the
socket/fd), do any preprocessing you want and then pass the
whole stream on.  Data already read is buffered and you can
reread it from the stream.

Now unix/plan9 sort of do this for files but not when an fd
refers to a fifo of some sort. For an open file, after a fork
both the parent and the child start off at the same place in
the file but then they can read at different rates. But I/O on
fifos/sockets doesn't share this behavior.

The OS support I am talking about:
a) the fork behavior on an open file should be available
   *without* forking.  dup() doesn't cut it (both fds share
   the same offset on the underlying file). I'd call the new
   syscall fdfork().  That is, if I do

       int newfd = fdfork(oldfd);

   reading N bytes each from newfd and oldfd will return
   identical data.

b) there should be a way to implement the same semantics for
   fifos or communication end points (or any synthetic file).
   In the above example the same N bytes must be returned even if
   the underlying object is not a file.

c) there should be a way to pass the fd (really, a capability)
   to another process.

Given these, what the OP wants can be implemented cleanly.
You fdfork() first, do all your analysis using one fd, close
it and then pass on the other fd to a helper process.
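
For regular (seekable) files, the semantics asked for in a) can be
approximated in userland. The following is only a sketch under stated
assumptions: fdfork() is a proposal and not an existing syscall, and the
names FdView, fdview_open, fdview_fork, fdview_read and fdview_demo are
hypothetical. Each view keeps a private offset and reads with POSIX
pread(), which leaves the shared fd offset untouched.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the proposed fdfork(): a handle that
 * shares the underlying fd but carries its own read offset. */
typedef struct {
    int   fd;   /* shared descriptor; its own offset is never used */
    off_t off;  /* private read position of this view */
} FdView;

static FdView fdview_open(int fd, off_t start) {
    FdView v = { fd, start };
    return v;
}

/* "Fork" a view: the copy starts where the original currently is. */
static FdView fdview_fork(const FdView *v) {
    return *v;
}

/* Read through a view; pread() does not move the fd's shared offset. */
static ssize_t fdview_read(FdView *v, void *buf, size_t n) {
    ssize_t r = pread(v->fd, buf, n, v->off);
    if (r > 0)
        v->off += r;
    return r;
}

/* Usage: two views over one temporary file return identical bytes. */
static int fdview_demo(void) {
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    fputs("hello, world", f);
    fflush(f);                       /* make the data visible to pread() */

    FdView a = fdview_open(fileno(f), 0);
    FdView b = fdview_fork(&a);
    char x[5], y[5];
    int ok = fdview_read(&a, x, 5) == 5
          && fdview_read(&b, y, 5) == 5
          && memcmp(x, y, 5) == 0;
    fclose(f);
    return ok ? 0 : -1;
}
```

This only works where pread() does; pipes and sockets are not seekable,
so they need the buffering described in b).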

Implementing b) ideally requires the OS to store a potentially
arbitrary amount of data.  But an implementation must set
some practical limit (like that on fifo buffering).
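
The buffering needed for b) can be sketched in userland for a single
process. This is an illustration under assumptions, not a kernel
design: Replay, Cursor, the cursor_* functions, replay_demo and the
4096-byte REPLAY_CAP are all hypothetical. Data pulled from the fd is
kept in a bounded buffer, each cursor has its own position, and a read
fails with ENOBUFS once the cap is reached.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

enum { REPLAY_CAP = 4096 };  /* hypothetical practical limit */

typedef struct {
    int    fd;               /* non-seekable source, e.g. a pipe */
    size_t len;              /* bytes buffered so far */
    char   buf[REPLAY_CAP];  /* everything read from fd, kept for replay */
} Replay;

typedef struct {
    Replay *src;
    size_t  pos;             /* this cursor's private position */
} Cursor;

static void replay_init(Replay *r, int fd) { r->fd = fd; r->len = 0; }

static Cursor cursor_open(Replay *r) { Cursor c = { r, 0 }; return c; }

/* The copy starts at the original's current position. */
static Cursor cursor_fork(const Cursor *c) { return *c; }

static ssize_t cursor_read(Cursor *c, void *out, size_t n) {
    Replay *r = c->src;
    if (c->pos == r->len) {              /* nothing buffered ahead of us */
        if (r->len == REPLAY_CAP) {      /* the practical limit was hit */
            errno = ENOBUFS;
            return -1;
        }
        ssize_t got = read(r->fd, r->buf + r->len, REPLAY_CAP - r->len);
        if (got <= 0)
            return got;                  /* EOF or error */
        r->len += (size_t)got;
    }
    size_t avail = r->len - c->pos;
    if (n > avail)
        n = avail;
    memcpy(out, r->buf + c->pos, n);     /* serve from the shared buffer */
    c->pos += n;
    return (ssize_t)n;
}

/* Usage: two cursors over one pipe both see the same first bytes. */
static int replay_demo(void) {
    int p[2];
    const char msg[] = "GET / HTTP/1.0\r\n";
    static Replay r;

    if (pipe(p) != 0)
        return -1;
    if (write(p[1], msg, sizeof msg - 1) != (ssize_t)(sizeof msg - 1))
        return -1;
    close(p[1]);

    replay_init(&r, p[0]);
    Cursor a = cursor_open(&r);
    Cursor b = cursor_fork(&a);
    char x[3], y[3];
    int ok = cursor_read(&a, x, 3) == 3
          && cursor_read(&b, y, 3) == 3
          && memcmp(x, y, 3) == 0
          && memcmp(x, "GET", 3) == 0;
    close(p[0]);
    return ok ? 0 : -1;
}
```

The sketch never discards the prefix that all cursors have already
consumed; a fuller version would, which is the garbage collection the
stream analogy relies on. It also does not cover c), since the buffer
lives in one process.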


