Re: [9fans] ideas for helpful system io functions

9fans - fans of the OS Plan 9 from Bell Labs

* Re: [9fans] ideas for helpful system io functions
       [not found] <<alpine.BSF.2.00.0912042210290.81688@legolas.yyc.orthanc.ca>
@ 2009-12-05 13:26 ` erik quanstrom
  2009-12-05 14:22   ` Sam Watkins
  0 siblings, 1 reply; 46+ messages in thread
From: erik quanstrom @ 2009-12-05 13:26 UTC (permalink / raw)
  To: 9fans

On Sat Dec  5 00:12:53 EST 2009, lyndon@orthanc.ca wrote:
> > Where FD passing is useful is to avoid that fork/exec overhead.
>
> Sorry -- brain in neutral. Where FD passing wins BIG is that the front-end
> process doesn't have to do copy-through of all the data between the
> network and the back-end process.

if you don't need to modify the data further, then exec the guy who
does.

by the way, during some recent testing, i was able to move ~100k
packets-per-second and create 25 million new processes / day with
a load of 0 on a lowly 1.6ghz woodcrest.

if you were to get 25m http requests/day and each did only 4k of
i/o, that's about 100gb/day, which averages a little under 10mbit/s.
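
A quick check of that arithmetic, as a minimal C sketch. The function names are illustrative and not from the thread, and "4k" is assumed to mean 4096 bytes. With 25 million requests the total comes to 102,400,000,000 bytes per day, which averages roughly 9.5 Mbit/s over 86400 seconds.

```c
#include <assert.h>
#include <stdint.h>

/* Total bytes moved per day for a request count and per-request I/O size. */
uint64_t bytes_per_day(uint64_t requests, uint64_t bytes_per_request)
{
	/* guard against overflow of the 64-bit product */
	assert(bytes_per_request == 0 || requests <= UINT64_MAX / bytes_per_request);
	return requests * bytes_per_request;
}

/* Average line rate in bits per second, spread over one 86400-second day. */
double avg_bits_per_second(uint64_t bytes_in_day)
{
	return (double)bytes_in_day * 8.0 / 86400.0;
}
```

This is only the daily average; real traffic is bursty, so peak rates would be higher.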

i think for many net-facing applications, you'll easily be able
to fork/exec fast enough and eliminating the fork/exec would
be a premature optimization that would cost tons in development,
debugging and maintenance time.

- erik

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [9fans] ideas for helpful system io functions
  2009-12-05 13:26 ` [9fans] ideas for helpful system io functions erik quanstrom
@ 2009-12-05 14:22   ` Sam Watkins
  2009-12-05 17:47     ` Skip Tavakkolian
  0 siblings, 1 reply; 46+ messages in thread
From: Sam Watkins @ 2009-12-05 14:22 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sat, Dec 05, 2009 at 08:26:20AM -0500, erik quanstrom wrote:
> if you don't need to modify the data further, then exec the guy who
> does.

This is my issue - when I want to exec, too much of the request data has
already been read.  I don't want to be calling read(fd, buf, 1) in a loop.
I would like to pass the extra buffered data to the guy I am execing then let
him read the rest directly from the socket, but I see no existing way to do
that.  hence my suggestions for alternative ways.  my "join" suggestion is the
most versatile so probably the best.

> by the way, during some recent testing, i was able to move ~100k
> packets-per-second and create 25 million new processes / day with
> a load of 0 on a lowly 1.6ghz woodcrest.

nice.

Sam


* Re: [9fans] ideas for helpful system io functions
  2009-12-05 14:22   ` Sam Watkins
@ 2009-12-05 17:47     ` Skip Tavakkolian
  2009-12-05 17:56       ` Skip Tavakkolian
  0 siblings, 1 reply; 46+ messages in thread
From: Skip Tavakkolian @ 2009-12-05 17:47 UTC (permalink / raw)
  To: 9fans

> I would like to pass the extra buffered data to the guy I am execing then let
> him read the rest directly from the socket, but I see no existing way to do
> that.

httpd passes the headers and any left over buffer it has already read to /magic
apps through a command line param. there's a function for parsing and placing
the "unread" stuff in the input buffer that's tied to the fd.

/sys/src/libhttpd/hio.c:233: hunload(Hio *h)





* Re: [9fans] ideas for helpful system io functions
  2009-12-05 17:47     ` Skip Tavakkolian
@ 2009-12-05 17:56       ` Skip Tavakkolian
  0 siblings, 0 replies; 46+ messages in thread
From: Skip Tavakkolian @ 2009-12-05 17:56 UTC (permalink / raw)
  To: 9fans

>> I would like to pass the extra buffered data to the guy I am execing then let
>> him read the rest directly from the socket, but I see no existing way to do
>> that.
>
> httpd passes the headers and any left over buffer it has already read to /magic
> apps through a command line param. there's a function for parsing and placing
> the "unread" stuff in the input buffer that's tied to the fd.
>
> /sys/src/libhttpd/hio.c:233: hunload(Hio *h)

also:
/sys/src/libhttpd/hio.c:274: hload(Hio *h, char *buf)
/sys/src/cmd/ip/httpd/init.c:23: init(int argc, char **argv)





* Re: [9fans] ideas for helpful system io functions
  2010-01-05 13:48     ` Enrico Weigelt
@ 2010-01-05 15:53       ` Steve Simon
  0 siblings, 0 replies; 46+ messages in thread
From: Steve Simon @ 2010-01-05 15:53 UTC (permalink / raw)
  To: weigelt, 9fans

> I'm thinking about whether it's worth changing uclibc so
> that it allows plugging in userland VFSes.

Nothing new under the sun I'm afraid, however if you go ahead
this might be interesting.

http://www.cs.ncl.ac.uk/publications/articles/papers/399.pdf

-Steve




* Re: [9fans] ideas for helpful system io functions
  2009-12-05 18:24   ` Tim Newsham
  2009-12-05 19:47     ` Bakul Shah
  2009-12-07 12:06     ` Mechiel Lukkien
@ 2010-01-05 13:48     ` Enrico Weigelt
  2010-01-05 15:53       ` Steve Simon
  2 siblings, 1 reply; 46+ messages in thread
From: Enrico Weigelt @ 2010-01-05 13:48 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

* Tim Newsham <newsham@lava.net> wrote:

> ps. if you wanted to hide this ugliness of passing a buffer and
> fd to a child process instead of just passing an fd, you could
> still solve it in userland without a syscall.  Write a library
> that does buffered IO.  Include unget() if you like.  Write the
> library in a way that you can initialize it after a fork/exec
> to pick up state from the parent (ie. by taking two fds,
> reading the buffer from the first, and continuing on with the
> 2nd when it is exhausted).

Not sure how things work on Plan9, but on GNU/Linux you could
even use LD_PRELOAD to overlay the read() libc function to
hide that magic, or even tweak libc for that.

BTW: what do you think in general about having tweaked libc's
instead of all these "cross-platform libraries"? For example,
I'm thinking about whether it's worth changing uclibc so
that it allows plugging in userland VFSes.


cu
--
----------------------------------------------------------------------
 Enrico Weigelt, metux IT service -- http://www.metux.de/

 phone:  +49 36207 519931  email: weigelt@metux.de
 mobile: +49 174 7066481   icq:   210169427         skype: nekrad666
----------------------------------------------------------------------
 Embedded-Linux / Portierung / Opensource-QM / Verteilte Systeme
----------------------------------------------------------------------




* Re: [9fans] ideas for helpful system io functions
  2009-12-07 14:13         ` Sam Watkins
  2009-12-07 14:36           ` roger peppe
@ 2009-12-08 12:51           ` matt
  1 sibling, 0 replies; 46+ messages in thread
From: matt @ 2009-12-08 12:51 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


>
>That's not what I meant by joining two fds.
>
>
>
back seat coding, nice




* Re: [9fans] ideas for helpful system io functions
  2009-12-07 19:11             ` Nathaniel W Filardo
@ 2009-12-07 21:03               ` roger peppe
  0 siblings, 0 replies; 46+ messages in thread
From: roger peppe @ 2009-12-07 21:03 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

2009/12/7 Nathaniel W Filardo <nwf@cs.jhu.edu>:
>> fd1 := open("/foo1", ORDWR);
>> fd2 := open("/foo2", ORDWR);
>> fd3 := fdjoin(fd1, fd2);
>>
>> what is going to happen?
>> something has got to initiate the requests to actually
>> shift the data, and it's not clear which direction the
>> data will flow.
>
> "file to file" joins like that are not the typical case and might even be an
> error to attempt.

in plan 9, everything is a file - whether it's generated by opening '#p/data1'
or '/foo1'.

>  Linux's equivalent APIs (yes, plural, sigh) always hook
> an "active" component somewhere...  sendfile() for example is typically
> employed as a crude hook on the TCP stack's "I could accept some bytes from
> a write() from userland now" "event" and turn that into a read() of the
> sendfile()d thing (which must be a pagecacheable thing... wtf.  splice()
> fixes at least some or perhaps all of that).  splice()d file descriptors
> just forward read()s and write()s across the splice.

i see why you might want to send file descriptors around
the place, (for instance, one could theoretically add a control
request to /net/tcp that said "treat this fd as your source of data",
though it wouldn't work across the network),
but i still don't see how "splice" could work in general.

>> this is an optimisation, right? what parts of the current system
>> could be speeded up by the use of this primitive?
>
> A typical *nix use case is sending a prefix and static file to a socket
> (e.g. nonencrypting, nonchunked httpds, ftpds, etc.).

well, that case is easily dealt with with something like devjoin.

> Of note, more generally, splice() and friends are approximating something
> possible and (relatively) easy in the capability kernel world: some process
> has capabilities to two objects and wishes to introduce those objects to
> each other (and further wishes that those objects would stop bothering it.
> :) ).  i.e. "Please resend all outstanding and forward all future requests
> to this other capability."

i see that.
but i think that fdjoin(fd1, fd2) is more like introducing two capabilities
to each other, which doesn't really make sense, than talking to the
objects behind the scenes.
the objects behind the scenes in plan 9 are servers and device drivers.

it might be interesting to provide a nice (not /srv based) way of passing
file descriptors between unrelated processes. the challenge comes when
you want to make it work on a remote file server...


* Re: [9fans] ideas for helpful system io functions
  2009-12-07 14:36           ` roger peppe
@ 2009-12-07 19:11             ` Nathaniel W Filardo
  2009-12-07 21:03               ` roger peppe
  0 siblings, 1 reply; 46+ messages in thread
From: Nathaniel W Filardo @ 2009-12-07 19:11 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


On Mon, Dec 07, 2009 at 02:36:36PM +0000, roger peppe wrote:
> 2009/12/7 Sam Watkins <sam@nipl.net>:
> > I meant for example if a process is reading from its stdin an open file 'A' and
> > writing to stdout the input of a pipe 'B', rather than looping and forwarding
> > data it may simply "join" these two fds, and exit.  The OS will then do what is
> > necessary to make sure the data can travel from A to B (and/or vice versa) with
> > the minimum effort needed.
> 
> i'm not sure how you think this would work.

The pipe would have to be a bit smarter than Plan 9's pipes currently are,
or the attempts to join to a pipe would have to skip over the pipe and join
with the other descriptor.  It's certainly _possible_ to do, and AFAIK the
Linux guys do so with abandon. ;)

> a file descriptor is essentially a passive object - it responds
> to read, write, etc requests on it, but it doesn't do anything
> of its own accord.
> 
> if i do:
> 
> fd1 := open("/foo1", ORDWR);
> fd2 := open("/foo2", ORDWR);
> fd3 := fdjoin(fd1, fd2);
> 
> what is going to happen?
> something has got to initiate the requests to actually
> shift the data, and it's not clear which direction the
> data will flow.

"file to file" joins like that are not the typical case and might even be an
error to attempt.  Linux's equivalent APIs (yes, plural, sigh) always hook
an "active" component somewhere...  sendfile() for example is typically
employed as a crude hook on the TCP stack's "I could accept some bytes from
a write() from userland now" "event" and turn that into a read() of the
sendfile()d thing (which must be a pagecacheable thing... wtf.  splice()
fixes at least some or perhaps all of that).  splice()d file descriptors
just forward read()s and write()s across the splice.

> this is an optimisation, right? what parts of the current system
> could be speeded up by the use of this primitive?

A typical *nix use case is sending a prefix and static file to a socket
(e.g. nonencrypting, nonchunked httpds, ftpds, etc.).

Of note, more generally, splice() and friends are approximating something
possible and (relatively) easy in the capability kernel world: some process
has capabilities to two objects and wishes to introduce those objects to
each other (and further wishes that those objects would stop bothering it.
:) ).  i.e. "Please resend all outstanding and forward all future requests
to this other capability."

--nwf;


* Re: [9fans] ideas for helpful system io functions
  2009-12-07 16:24 ` erik quanstrom
@ 2009-12-07 16:48   ` Francisco J Ballesteros
  0 siblings, 0 replies; 46+ messages in thread
From: Francisco J Ballesteros @ 2009-12-07 16:48 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

the idea is that if fs knows enough to handle partitions like
we are accustomed to, then partitioning code can be removed
from everywhere else (but for compat) and existing tools used
to handle partitions (e.g., fdisk) very much like they are used now.

Either way, It's not standalone, in one case you require loop, in the other fs.
both can load their configs and both require help to learn which
partitions to use at boot time.

On Mon, Dec 7, 2009 at 5:24 PM, erik quanstrom <quanstro@quanstro.net> wrote:
> On Mon Dec  7 11:16:04 EST 2009, nemo@lsub.org wrote:
>> It seems that changing fs(3) a bit can suffice and is generic
>> enough for all usages required. In the end it might result in code
>> removed instead of adding code, but time will tell. As of today, it's
>> only an experiment.
>
> not everyone who uses usb disks also uses fs.
> i also like the idea of a stand-alone device.
> i can also put the entire configuration of a
> loop device in plan9.ini.
>
> - erik
>
>


* Re: [9fans] ideas for helpful system io functions
       [not found] <<8ccc8ba40912070814o2f2c7eb9s5887a31810eab12e@mail.gmail.com>
@ 2009-12-07 16:24 ` erik quanstrom
  2009-12-07 16:48   ` Francisco J Ballesteros
  0 siblings, 1 reply; 46+ messages in thread
From: erik quanstrom @ 2009-12-07 16:24 UTC (permalink / raw)
  To: 9fans

On Mon Dec  7 11:16:04 EST 2009, nemo@lsub.org wrote:
> It seems that changing fs(3) a bit can suffice and is generic
> enough for all usages required. In the end it might result in code
> removed instead of adding code, but time will tell. As of today, it's
> only an experiment.

not everyone who uses usb disks also uses fs.
i also like the idea of a stand-alone device.
i can also put the entire configuration of a
loop device in plan9.ini.

- erik




* Re: [9fans] ideas for helpful system io functions
  2009-12-07 16:10             ` erik quanstrom
@ 2009-12-07 16:14               ` Francisco J Ballesteros
  0 siblings, 0 replies; 46+ messages in thread
From: Francisco J Ballesteros @ 2009-12-07 16:14 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

It seems that changing fs(3) a bit can suffice and is generic
enough for all usages required. In the end it might result in code
removed instead of adding code, but time will tell. As of today, it's
only an experiment.

On Mon, Dec 7, 2009 at 5:10 PM, erik quanstrom <quanstro@coraid.com> wrote:
>> fs is already larger than it was, there's an experimental
>> ongoing version that knows enough of partitioning to help
>> usb and others in that respect.
>
> why not just use sdloop(3)?
>
> - erik
>
>




* Re: [9fans] ideas for helpful system io functions
  2009-12-07 12:35           ` Francisco J Ballesteros
  2009-12-07 13:42             ` Charles Forsyth
@ 2009-12-07 16:10             ` erik quanstrom
  2009-12-07 16:14               ` Francisco J Ballesteros
  1 sibling, 1 reply; 46+ messages in thread
From: erik quanstrom @ 2009-12-07 16:10 UTC (permalink / raw)
  To: 9fans

> fs is already larger than it was, there's an experimental
> ongoing version that knows enough of partitioning to help
> usb and others in that respect.

why not just use sdloop(3)?

- erik




* Re: [9fans] ideas for helpful system io functions
  2009-12-07 14:41 Francisco J Ballesteros
@ 2009-12-07 15:11 ` roger peppe
  0 siblings, 0 replies; 46+ messages in thread
From: roger peppe @ 2009-12-07 15:11 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

2009/12/7 Francisco J Ballesteros <nemo@lsub.org>:
> I think he wants copyfile + a kproc.

yup, i was thinking of inferno's sys->stream().

but neither is in a position to do the kind of redundancy
optimisation that sam was talking about, AFAICS.
at least it can avoid copying by calling bread and bwrite.


* Re: [9fans] ideas for helpful system io functions
@ 2009-12-07 14:41 Francisco J Ballesteros
  2009-12-07 15:11 ` roger peppe
  0 siblings, 1 reply; 46+ messages in thread
From: Francisco J Ballesteros @ 2009-12-07 14:41 UTC (permalink / raw)
  To: 9fans

I think he wants copyfile + a kproc.

On 07/12/2009, at 15:37, rogpeppe@gmail.com wrote:

> 2009/12/7 Sam Watkins <sam@nipl.net>:
>> I meant for example if a process is reading from its stdin an open
>> file 'A' and
>> writing to stdout the input of a pipe 'B', rather than looping and
>> forwarding
>> data it may simply "join" these two fds, and exit.  The OS will
>> then do what is
>> necessary to make sure the data can travel from A to B (and/or vice
>> versa) with
>> the minimum effort needed.
>
> i'm not sure how you think this would work.
>
> a file descriptor is essentially a passive object - it responds
> to read, write, etc requests on it, but it doesn't do anything
> of its own accord.
>
> if i do:
>
> fd1 := open("/foo1", ORDWR);
> fd2 := open("/foo2", ORDWR);
> fd3 := fdjoin(fd1, fd2);
>
> what is going to happen?
> something has got to initiate the requests to actually
> shift the data, and it's not clear which direction the
> data will flow.
>
> this is an optimisation, right? what parts of the current system
> could be speeded up by the use of this primitive?
>
> [/mail/box/nemo/msgs/200912/452]




* Re: [9fans] ideas for helpful system io functions
  2009-12-07 14:13         ` Sam Watkins
@ 2009-12-07 14:36           ` roger peppe
  2009-12-07 19:11             ` Nathaniel W Filardo
  2009-12-08 12:51           ` matt
  1 sibling, 1 reply; 46+ messages in thread
From: roger peppe @ 2009-12-07 14:36 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

2009/12/7 Sam Watkins <sam@nipl.net>:
> I meant for example if a process is reading from its stdin an open file 'A' and
> writing to stdout the input of a pipe 'B', rather than looping and forwarding
> data it may simply "join" these two fds, and exit.  The OS will then do what is
> necessary to make sure the data can travel from A to B (and/or vice versa) with
> the minimum effort needed.

i'm not sure how you think this would work.

a file descriptor is essentially a passive object - it responds
to read, write, etc requests on it, but it doesn't do anything
of its own accord.

if i do:

fd1 := open("/foo1", ORDWR);
fd2 := open("/foo2", ORDWR);
fd3 := fdjoin(fd1, fd2);

what is going to happen?
something has got to initiate the requests to actually
shift the data, and it's not clear which direction the
data will flow.

this is an optimisation, right? what parts of the current system
could be speeded up by the use of this primitive?


* Re: [9fans] ideas for helpful system io functions
  2009-12-07 12:24       ` roger peppe
  2009-12-07 12:32         ` Charles Forsyth
@ 2009-12-07 14:13         ` Sam Watkins
  2009-12-07 14:36           ` roger peppe
  2009-12-08 12:51           ` matt
  1 sibling, 2 replies; 46+ messages in thread
From: Sam Watkins @ 2009-12-07 14:13 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Mon, Dec 07, 2009 at 12:24:05PM +0000, roger peppe wrote:
> if you wanted it, an "fd join" driver could be simply
> implemented in a similar way:
>
> bind '#j4.5' /mnt/joined
> open /mnt/joined/data to get a (read-only) fd that satisfies reads from fd 4
> until eof, then fd 5.

That's not what I meant by joining two fds.

I meant for example if a process is reading from its stdin an open file 'A' and
writing to stdout the input of a pipe 'B', rather than looping and forwarding
data it may simply "join" these two fds, and exit.  The OS will then do what is
necessary to make sure the data can travel from A to B (and/or vice versa) with
the minimum effort needed.

Supposing another process 'foo' is reading the other end of the pipe 'C', the
OS will simply remove the pipe 'B-C' entirely, and reroute 'foo's stdin to come
directly from 'A'.  In other circumstances the OS might need to effectively
exec 'cat' (or a 2-way socket-cat) to take over the task of copying data, but
often it will be able to remove a pipe, reducing the amount of unnecessary
copying that will take place.
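
The fallback case, where something has to take over the copying much as 'cat' would, amounts to a loop like the one below. This is a minimal sketch in portable POSIX C rather than Plan 9 code, and `copyfd` is a hypothetical name, not an existing call. It shows only the one-way copy; the pipe-removal shortcut described above would need kernel support and is not modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Copy everything from descriptor in to descriptor out until EOF.
 * Returns the number of bytes copied, or -1 on a read or write error.
 */
long copyfd(int in, int out)
{
	char buf[8192];
	long total = 0;
	ssize_t n, w, done;

	assert(in >= 0 && out >= 0);
	for (;;) {
		n = read(in, buf, sizeof buf);
		if (n == 0)
			return total;		/* EOF on the source */
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted: retry the read */
			return -1;
		}
		/* write() may accept fewer bytes than asked, so loop */
		for (done = 0; done < n; done += w) {
			w = write(out, buf + done, (size_t)(n - done));
			if (w < 0) {
				if (errno != EINTR)
					return -1;
				w = 0;		/* interrupted: retry the write */
			}
		}
		total += n;
	}
}
```

Every byte crosses into user space and back here, which is exactly the copying the "join" idea is meant to avoid when it can.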

Where I have said "stdin" I mean the fds not stdio / buffered IO FILEs.

I hope I've cleared up what I meant now, seems I'm not very good at explaining
it.

Sam


* Re: [9fans] ideas for helpful system io functions
  2009-12-07 12:35           ` Francisco J Ballesteros
@ 2009-12-07 13:42             ` Charles Forsyth
  2009-12-07 16:10             ` erik quanstrom
  1 sibling, 0 replies; 46+ messages in thread
From: Charles Forsyth @ 2009-12-07 13:42 UTC (permalink / raw)
  To: 9fans

> i wonder if there's a way of perverting fs(3)

i made the comment fairly idly, so i shouldn't take it too seriously.




* Re: [9fans] ideas for helpful system io functions
  2009-12-07 12:32         ` Charles Forsyth
@ 2009-12-07 12:35           ` Francisco J Ballesteros
  2009-12-07 13:42             ` Charles Forsyth
  2009-12-07 16:10             ` erik quanstrom
  0 siblings, 2 replies; 46+ messages in thread
From: Francisco J Ballesteros @ 2009-12-07 12:35 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Hmmm. That's what a cat device does, only that
it does so by looking at the sizes and not at eof
indications. Also, it depends on the seek position, which
won't work for streams.

Perhaps a streamcat, although I don't like to have
cats and streamcats.  Perhaps yet another option.

fs is already larger than it was, there's an experimental
ongoing version that knows enough of partitioning to help
usb and others in that respect.

Trying is fun, anyway.

On Mon, Dec 7, 2009 at 1:32 PM, Charles Forsyth <forsyth@terzarima.net> wrote:
>>bind '#j4.5' /mnt/joined
>> ... to get a (read-only) fd that satisfies reads from fd 4
>>until eof, then fd 5.
>
> i wonder if there's a way of perverting fs(3)
>
>


* Re: [9fans] ideas for helpful system io functions
  2009-12-07 12:24       ` roger peppe
@ 2009-12-07 12:32         ` Charles Forsyth
  2009-12-07 12:35           ` Francisco J Ballesteros
  2009-12-07 14:13         ` Sam Watkins
  1 sibling, 1 reply; 46+ messages in thread
From: Charles Forsyth @ 2009-12-07 12:32 UTC (permalink / raw)
  To: 9fans

>bind '#j4.5' /mnt/joined
> ... to get a (read-only) fd that satisfies reads from fd 4
>until eof, then fd 5.

i wonder if there's a way of perverting fs(3)




* Re: [9fans] ideas for helpful system io functions
  2009-12-07 12:06     ` Mechiel Lukkien
@ 2009-12-07 12:31       ` roger peppe
  0 siblings, 0 replies; 46+ messages in thread
From: roger peppe @ 2009-12-07 12:31 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

2009/12/7 Mechiel Lukkien <mechiel@xs4all.nl>:
> i've attached devbuf.c and devjoin.c, as example (for inferno).

[saw this just after i'd posted]
that's funny - you even chose the same device character for
devbuf!

to be honest, your devbuf.c is almost synonymous with a pipe.
for buffer sizes of <64K, writes on a pipe don't block.
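
Using a pipe as a one-shot buffer might look like this on a POSIX system. It is a sketch under assumptions: `stashfd` and `STASH_MAX` are made-up names, and the 4096-byte cap is simply assumed to be below the pipe's capacity (the <64K figure comes from this thread's context; other systems have their own limits).

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

enum { STASH_MAX = 4096 };	/* assumed to be below the pipe's capacity */

/*
 * Put n leftover bytes into a fresh pipe and return its read end.
 * The write end is closed, so a reader sees EOF right after the data.
 * Returns -1 if n is too large or the pipe cannot be set up.
 */
int stashfd(const void *data, size_t n)
{
	int p[2];

	assert(data != NULL || n == 0);
	if (n > STASH_MAX || pipe(p) < 0)
		return -1;
	if (n > 0 && write(p[1], data, n) != (ssize_t)n) {
		close(p[0]);
		close(p[1]);
		return -1;
	}
	close(p[1]);
	return p[0];
}
```

The returned descriptor can be handed to a child like any other fd, which is the property a buffer device would offer as well.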


* Re: [9fans] ideas for helpful system io functions
  2009-12-05 19:47     ` Bakul Shah
@ 2009-12-07 12:24       ` roger peppe
  2009-12-07 12:32         ` Charles Forsyth
  2009-12-07 14:13         ` Sam Watkins
  0 siblings, 2 replies; 46+ messages in thread
From: roger peppe @ 2009-12-07 12:24 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

2009/12/5 Bakul Shah <bakul+plan9@bitblocks.com>:
>       int newfd = fdfork(oldfd);

i'm not sure that there needs to be a new syscall to enable
this. a driver would be adequate.

here's one possibility:

the driver implements "buffered streams" - i.e. reads
are lazy, but previous reads can be re-read.

bind '#β4.8192' /mnt/bufstream to get a buffered, read-only stream of
fd 4, with an 8K buffer.

open /mnt/bufstream/data to get a new window
on the stream. if you read at an offset beyond
anything previously read, it triggers a read on the
underlying fd, which may block. if the offset isn't within the buffer size,
then the read returns -1; otherwise the read is satisfied
from the buffered data.

the underlying assumption is that the fd is stream-,
not message-oriented - as with tcp; message boundaries
are not preserved.
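
The read rules of such a buffered stream can be tried out in user space before writing a driver. The sketch below is POSIX C, not a Plan 9 or Inferno device; `Bufstream`, `bsopen` and `bsread` are hypothetical names. Reads at or past the high-water mark pull from the underlying fd, reads inside the last `cap` bytes are replayed from a ring buffer, and reads at older offsets return -1.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

typedef struct Bufstream {
	int fd;			/* underlying stream, read lazily */
	unsigned char *buf;	/* ring holding the most recent cap bytes */
	long cap;
	long long end;		/* stream offset just past the last byte read */
	int eof;
} Bufstream;

Bufstream *bsopen(int fd, long cap)
{
	Bufstream *s;

	assert(fd >= 0 && cap > 0);
	s = malloc(sizeof *s);
	if (s == NULL)
		return NULL;
	s->buf = malloc((size_t)cap);
	if (s->buf == NULL) {
		free(s);
		return NULL;
	}
	s->fd = fd;
	s->cap = cap;
	s->end = 0;
	s->eof = 0;
	return s;
}

/*
 * Read up to n bytes at stream offset off.  Returns the byte count,
 * 0 at EOF, or -1 if off has fallen out of the buffer or a read fails.
 */
long bsread(Bufstream *s, void *va, long n, long long off)
{
	unsigned char tmp[512], *out = va;
	size_t want;
	long long start;
	long i;
	ssize_t r;

	if (off < 0 || n < 0)
		return -1;
	/* never pull more than cap at once, so off stays inside the window */
	want = s->cap < (long)sizeof tmp ? (size_t)s->cap : sizeof tmp;
	while (off >= s->end && !s->eof) {
		r = read(s->fd, tmp, want);	/* may block, as described above */
		if (r < 0)
			return -1;
		if (r == 0) {
			s->eof = 1;
			break;
		}
		for (i = 0; i < r; i++)
			s->buf[(s->end + i) % s->cap] = tmp[i];
		s->end += r;
	}
	start = s->end > s->cap ? s->end - s->cap : 0;
	if (off < start)
		return -1;		/* too old: already overwritten */
	if (off >= s->end)
		return 0;		/* EOF */
	if (n > s->end - off)
		n = (long)(s->end - off);
	for (i = 0; i < n; i++)
		out[i] = s->buf[(off + i) % s->cap];
	return n;
}
```

Several readers can share one `Bufstream` and keep their own offsets, which is the "new window on the stream" idea; locking is left out of the sketch.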

if you wanted it, an "fd join" driver could be simply
implemented in a similar way:

bind '#j4.5' /mnt/joined
open /mnt/joined/data to get a (read-only) fd that satisfies reads from fd 4
until eof, then fd 5.

both of these might make a fun exercise for a rainy day.


* Re: [9fans] ideas for helpful system io functions
       [not found] <<20091207120652.GB16320@knaagkever.ueber.net>
@ 2009-12-07 12:19 ` erik quanstrom
  0 siblings, 0 replies; 46+ messages in thread
From: erik quanstrom @ 2009-12-07 12:19 UTC (permalink / raw)
  To: 9fans

> since file descriptors are so essential, it may help to have "tools"
> to use them.  yesterday evening i hacked up devbuf.c and devjoin.c
> after reading this thread.   both offer a file "new".  for devbuf.c
> you can write data to it, then later consume it (yes, you could just
> use a pipe instead).

why can't you use ramfs instead of devbuf?

- erik




* Re: [9fans] ideas for helpful system io functions
  2009-12-05 18:24   ` Tim Newsham
  2009-12-05 19:47     ` Bakul Shah
@ 2009-12-07 12:06     ` Mechiel Lukkien
  2009-12-07 12:31       ` roger peppe
  2010-01-05 13:48     ` Enrico Weigelt
  2 siblings, 1 reply; 46+ messages in thread
From: Mechiel Lukkien @ 2009-12-07 12:06 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs


On Sat, Dec 05, 2009 at 08:24:45AM -1000, Tim Newsham wrote:
> ps. if you wanted to hide this ugliness of passing a buffer and
> fd to a child process instead of just passing an fd, you could
> still solve it in userland without a syscall.  Write a library
> that does buffered IO.  Include unget() if you like.  Write the
> library in a way that you can initialize it after a fork/exec
> to pick up state from the parent (ie. by taking two fds,
> reading the buffer from the first, and continuing on with the
> 2nd when it is exhausted).
>
> Is there much benefit in doing this in the kernel instead?
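
For reference, the userland approach quoted above (two fds, drain the first, then continue on the second) is small. The following is a minimal POSIX C sketch; `Chain`, `chaininit` and `chainread` are hypothetical names, not part of any existing library.

```c
#include <assert.h>
#include <unistd.h>

typedef struct Chain {
	int fd[2];	/* fd[0]: saved buffer, fd[1]: the live stream */
	int cur;	/* index of the fd currently being read */
} Chain;

void chaininit(Chain *c, int first, int second)
{
	assert(c != NULL && first >= 0 && second >= 0);
	c->fd[0] = first;
	c->fd[1] = second;
	c->cur = 0;
}

/*
 * Read up to n bytes, taking data from the first fd until it reports
 * EOF and from the second fd after that.  Returns what read() returns:
 * a byte count, 0 once both fds are exhausted, or -1 on error.
 */
long chainread(Chain *c, void *buf, long n)
{
	ssize_t r;

	if (n <= 0)
		return 0;
	while (c->cur < 2) {
		r = read(c->fd[c->cur], buf, (size_t)n);
		if (r != 0)
			return (long)r;	/* data, or an error to report */
		c->cur++;		/* EOF here: move to the next fd */
	}
	return 0;
}
```

The state lives in the `Chain` struct inside one process, so it cannot be handed to another program the way a plain file descriptor can.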

it's all library code, and it loses the "everything is a file
(descriptor)" advantage.  you cannot pass that library state to another
program.  you could if the state was a file descriptor.

for inferno i wrote an http client library that turns a request into an
fd to read the data from.  that fd has http chunking,gzip,ssl peeled
off.  now i can pass the fd with the http response to other programs,
do buffered i/o on it, etc.  this is implemented in user-space btw,
with inferno's sys->file2chan (as opposed to pipes, you can do error
message propagation over file2chan's).

since file descriptors are so essential, it may help to have "tools"
to use them.  yesterday evening i hacked up devbuf.c and devjoin.c
after reading this thread.   both offer a file "new".  for devbuf.c
you can write data to it, then later consume it (yes, you could just
use a pipe instead).  for devjoin.c, you can write fd numbers (of open
files) to register an fd, then later reads will get data from the first
registered file, when that returns 0 it continues on the next, and so on.
so fd's can be chained for reading (not writing).  i know this "join"
functionality is different from what sam originally described.

i've attached devbuf.c and devjoin.c, as example (for inferno).
they have bugs (don't assign qid.path, probably *walk is broken too).
testbufjoin.b is an example of how the dev's can be used.  it creates a
new fd that has a buffer at the front (e.g. leftovers from http header
reading), then continues on stdin (where the leftover may have come from).
then it reads the new fd and writes its data to stdout.

these devices are not for performance.  perhaps they make working with
one of the most basic OS concepts (fd's) a bit easier.  but perhaps this
problem is not common enough, or can be handled (with fd's preferably)
in a better way.

mjl

[-- Attachment #2: devbuf.c --]
[-- Type: text/x-csrc, Size: 1729 bytes --]

#include	"dat.h"
#include	"fns.h"
#include	"error.h"

typedef struct Buffile Buffile;
struct Buffile
{
	uchar	*p;
	int	s;
	int	e;
};

enum
{
	Qdir,
	Qbuffile,
};

Dirtab bufdir[] =
{
	".",		{Qdir,0,QTDIR},	0,		DMDIR|0500,
	"new",		{Qbuffile},	0,		0660,
};

static Buffile*
buffilealloc(uchar *p, int n)
{
	Buffile *b;

	b = malloc(sizeof b[0]+n);
	if(b == nil)
		error(Enomem);
	/* the data lives directly after the header, in the same allocation */
	b->p = (uchar*)b+sizeof b[0];
	memmove(b->p, p, n);
	b->s = 0;
	b->e = n;
	return b;
}

static Chan*
bufattach(char *spec)
{
	return devattach('β', spec);
}

static Walkqid*
bufwalk(Chan *c, Chan *nc, char **name, int nname)
{
	return devwalk(c, nc, name, nname, bufdir, nelem(bufdir), devgen);
}

static int
bufstat(Chan *c, uchar *db, int n)
{
	return devstat(c, db, n, bufdir, nelem(bufdir), devgen);
}

static Chan*
bufopen(Chan *c, int omode)
{
	return devopen(c, omode, bufdir, nelem(bufdir), devgen);
}

static void
bufclose(Chan *c)
{
	free(c->aux);
	c->aux = nil;
}

static long
bufread(Chan *c, void *va, long n, vlong off)
{
	Buffile *b;
	int have;

	if(c->qid.type == QTDIR)
		return devdirread(c, va, n, bufdir, nelem(bufdir), devgen);

	b = c->aux;
	if(b == nil)
		return 0;
	
	USED(off);
	have = b->e - b->s;
	if(have < n || n < 0)
		n = have;
	/* copy from the unread position, not from the start of the buffer */
	memmove(va, b->p + b->s, n);
	b->s += n;
	return n;
}

static long
bufwrite(Chan *c, void *va, long n, vlong off)
{
	if(c->qid.type == QTDIR)
		error(Eisdir);

	free(c->aux);
	c->aux = buffilealloc(va, n);
	return n;
}


Dev bufdevtab = {
	'β',
	"buf",

	devinit,
	bufattach,
	bufwalk,
	bufstat,
	bufopen,
	devcreate,
	bufclose,
	bufread,
	devbread,
	bufwrite,
	devbwrite,
	devremove,
	devwstat,
};

[-- Attachment #3: devjoin.c --]
[-- Type: text/x-csrc, Size: 2037 bytes --]

#include	"dat.h"
#include	"fns.h"
#include	"error.h"

typedef struct Join Join;
struct Join
{
	Chan *c;
	Join *next;
};

enum
{
	Qdir,
	Qjoinfile,
};

Dirtab joindir[] =
{
	".",		{Qdir,0,QTDIR},	0,		DMDIR|0500,
	"new",		{Qjoinfile},	0,		0660,
};

static void
joinfree(Join *j)
{
	if(j == nil)
		return;
	joinfree(j->next);
	cclose(j->c);
	free(j);
}

static Chan*
joinattach(char *spec)
{
	return devattach('δ', spec);
}

static Walkqid*
joinwalk(Chan *c, Chan *nc, char **name, int nname)
{
	return devwalk(c, nc, name, nname, joindir, nelem(joindir), devgen);
}

static int
joinstat(Chan *c, uchar *db, int n)
{
	return devstat(c, db, n, joindir, nelem(joindir), devgen);
}

static Chan*
joinopen(Chan *c, int omode)
{
	return devopen(c, omode, joindir, nelem(joindir), devgen);
}

static void
joinclose(Chan *c)
{
	joinfree(c->aux);
	c->aux = nil;
}

static long
joinread(Chan *c, void *va, long n, vlong off)
{
	Join *j;
	long l;


	if(c->qid.type == QTDIR)
		return devdirread(c, va, n, joindir, nelem(joindir), devgen);

	l = 0;
	while(c->aux != nil) {
		j = c->aux;
		l = devtab[j->c->type]->read(j->c, va, n, off);
		if(l != 0)
			break;

		c->aux = j->next;
		cclose(j->c);
		free(j);
	}
	return l;
}

static long
joinwrite(Chan *c, void *va, long n, vlong off)
{
	char buf[32];
	int fd;
	Chan *jc;
	Join *j;
	Join *nj;

	if(c->qid.type == QTDIR)
		error(Eisdir);

	if(n >= sizeof buf)	/* leave room for the terminating NUL */
		error(Ebadarg);
	memmove(buf, va, n);
	buf[n] = 0;
	fd = atoi(buf);
	jc = fdtochan(up->env->fgrp, fd, -1, 0, 1);
	nj = malloc(sizeof nj[0]);
	nj->c = jc;
	nj->next = nil;
	if(c->aux == nil) {
		c->aux = nj;
	} else {
		for(j = c->aux; j->next != nil; j = j->next)
			{}
		j->next = nj;
	}
	return n;
}


Dev joindevtab = {
	'δ',
	"join",

	devinit,
	joinattach,
	joinwalk,
	joinstat,
	joinopen,
	devcreate,
	joinclose,
	joinread,
	devbread,
	joinwrite,
	devbwrite,
	devremove,
	devwstat,
};

[-- Attachment #4: testbufjoin.b --]
[-- Type: chemical/x-molconn-Z, Size: 1303 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [9fans] ideas for helpful system io functions
  2009-12-05 20:59   ` Bakul Shah
@ 2009-12-06  7:45     ` Sam Watkins
  0 siblings, 0 replies; 46+ messages in thread
From: Sam Watkins @ 2009-12-06  7:45 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sat, Dec 05, 2009 at 12:59:34PM -0800, Bakul Shah wrote:
> You cut out the bit about buffering where I explained what I meant.

Your idea seems good: so long as the OS buffers data and keeps it around until
all readers have consumed it, there would be no problem.  This would be another
possible solution to my problem: you could fork the fd before reading the http
headers, read the headers on one fd, find how long they are, and seek forward
over the exact length of the headers in the other fd before execing the script.

The only problem would be that the OS might be required to keep an arbitrarily
large buffer for the fd, if one forked fd reads a long way ahead, but the other
stays still.

I do think it is a good idea to have shared pipes / input streams though, there
are many cases where two or more processes need to read the same input; and
it's inefficient to have multiple pipes for this purpose, when they could
easily share a single buffered "multi-pipe".  Perhaps a limitation could be
that if a process tries to read too far ahead from the other processes, it may
block.  This limit might be configurable as the "pipe size".  An httpd
application might (should) reject requests with over-large headers, so this
limitation would be okay.

I still like my "join" function.  It can be used for other cases, such as when
you have to prepend a header before connecting the input stream to your CGI
script (or whatever it is).  It would make it easy to implement zero-copy, as
all plain in-to-out copying can be delegated to the OS and in many cases will
not require the OS to do any work, just to collapse two pipes together or
something simple like that.
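
A rough user-space sketch of the join semantics, for illustration only.  It
assumes POSIX read() and close(); the Join struct and joinread() are
hypothetical names, and unlike a kernel-level join this still copies every
byte through the calling process, so it shows the ordering behaviour but not
the zero-copy benefit.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the proposed join: an ordered list of
 * descriptors that is read as if it were one stream. */
typedef struct Join Join;
struct Join {
	int *fds;	/* descriptors, in the order they should be drained */
	int nfds;	/* number of entries in fds */
	int cur;	/* index of the descriptor currently being read */
};

/* Read up to n bytes.  On EOF from the current descriptor, close it and
 * move on to the next.  Returns 0 only when every descriptor is exhausted,
 * and -1 if the underlying read fails. */
ssize_t
joinread(Join *j, void *buf, size_t n)
{
	ssize_t r;

	assert(j != NULL && buf != NULL);
	while(j->cur < j->nfds){
		r = read(j->fds[j->cur], buf, n);
		if(r != 0)
			return r;	/* data, or -1 on error */
		close(j->fds[j->cur]);
		j->cur++;
	}
	return 0;
}
```

With a pipe holding the already-read prefix as the first descriptor and the
original connection as the second, the reader sees the prefix followed by
the rest of the stream.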

Sam

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [9fans] ideas for helpful system io functions
  2009-12-05 20:27 ` erik quanstrom
@ 2009-12-05 20:59   ` Bakul Shah
  2009-12-06  7:45     ` Sam Watkins
  0 siblings, 1 reply; 46+ messages in thread
From: Bakul Shah @ 2009-12-05 20:59 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sat, 05 Dec 2009 15:27:02 EST erik quanstrom <quanstro@quanstro.net>  wrote:
> > To be precise, both fds have their own pointer (or offset)
> > and reading N bytes from some offset O must return the same
> > bytes.
>
> wrong.  /dev/random is my example.

You cut out the bit about buffering where I explained what I
meant.  As I said, those are the semantics I would choose so
by definition it is not "wrong"! Though it may not do what
you expect.  As a matter of fact I do see a use case for
/dev/random for getting repeatable random numbers! If you
want an independent stream of random numbers, just open
/dev/random again (or dup()), and not use fdfork().

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [9fans] ideas for helpful system io functions
       [not found] <<20091205202420.855AD5B77@mail.bitblocks.com>
  2009-12-05 20:27 ` erik quanstrom
@ 2009-12-05 20:30 ` erik quanstrom
  1 sibling, 0 replies; 46+ messages in thread
From: erik quanstrom @ 2009-12-05 20:30 UTC (permalink / raw)
  To: 9fans

> For disk based files and fifos there should be no
> problem.

there is no such distinction in plan 9.

- erik



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [9fans] ideas for helpful system io functions
       [not found] <<20091205202420.855AD5B77@mail.bitblocks.com>
@ 2009-12-05 20:27 ` erik quanstrom
  2009-12-05 20:59   ` Bakul Shah
  2009-12-05 20:30 ` erik quanstrom
  1 sibling, 1 reply; 46+ messages in thread
From: erik quanstrom @ 2009-12-05 20:27 UTC (permalink / raw)
  To: 9fans

> To be precise, both fds have their own pointer (or offset)
> and reading N bytes from some offset O must return the same
> bytes.

wrong.  /dev/random is my example.

- erik



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [9fans] ideas for helpful system io functions
  2009-12-05 20:03 ` erik quanstrom
@ 2009-12-05 20:24   ` Bakul Shah
  0 siblings, 0 replies; 46+ messages in thread
From: Bakul Shah @ 2009-12-05 20:24 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sat, 05 Dec 2009 15:03:44 EST erik quanstrom <quanstro@quanstro.net>  wrote:
> > The OS support I am talking about:
> > a) the fork behavior on an open file should be available
> >    *without* forking.  dup() doesn't cut it (both fds share
> >    the same offset on the underlying file). I'd call the new
> >    syscall fdfork().  That is, if I do
> >
> >        int newfd = fdfork(oldfd);
> >
> >    reading N bytes each from newfd and oldfd will return
> >    identical data.
>
> i can't think of a way to do this correctly.  buffering in the
> kernel would only work if each process issued exactly the
> same set of reads.  there is no requirement that the data
> from 2 reads of 100 bytes each be the same as the data
> returned by one 200-byte read.

To be precise, both fds have their own pointer (or offset)
and reading N bytes from some offset O must return the same
bytes.  The semantics I'd choose is that the first read gets buffered,
and later reads get satisfied first from buffered data and only
then from the underlying object. Same with writes.  They are
"write through".  If synthetic files do weird things at
different offsets or for different read/write counts, I'd
consider them uncacheable (and you shouldn't use fdfork with
them).  For disk based files and fifos there should be no
problem.
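
The buffering rule described above can be sketched in user space as follows.
This is only an illustration under stated assumptions: fdfork() does not
exist, Shared, Fork and forkread() are hypothetical names, the underlying
object is read with POSIX read(), and the buffer grows without bound where a
real implementation would have to set a limit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* State shared by all forks of one descriptor (hypothetical). */
typedef struct Shared Shared;
struct Shared {
	int fd;			/* underlying object; each byte is read from it once */
	unsigned char *buf;	/* every byte read so far, kept for slower readers */
	size_t len;		/* bytes held in buf */
	int eof;		/* set once the underlying read returns 0 */
};

/* One reader with its own offset into the shared stream. */
typedef struct Fork Fork;
struct Fork {
	Shared *s;
	size_t off;
};

/* Satisfy the read from buffered data if this reader is behind;
 * otherwise pull new bytes from the underlying object and buffer them. */
ssize_t
forkread(Fork *f, void *dst, size_t n)
{
	Shared *s;
	unsigned char *p;
	ssize_t r;
	size_t avail;

	assert(f != NULL && f->s != NULL && dst != NULL);
	s = f->s;
	if(n == 0)
		return 0;
	if(f->off == s->len && !s->eof){
		p = realloc(s->buf, s->len + n);
		if(p == NULL)
			return -1;
		s->buf = p;
		r = read(s->fd, s->buf + s->len, n);
		if(r < 0)
			return -1;
		if(r == 0)
			s->eof = 1;
		s->len += (size_t)r;
	}
	avail = s->len - f->off;
	if(avail == 0)
		return 0;
	if(n > avail)
		n = avail;
	memcpy(dst, s->buf + f->off, n);
	f->off += n;
	return (ssize_t)n;
}
```

Two Fork values pointing at the same Shared see the same bytes at the same
offsets regardless of how each one sizes its reads, which is the property
asked for; it says nothing about objects whose contents depend on the read
count.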

Note that Haskell streams are basically cacheable!

> before you bother with "but that's a weird case", remember
> that the success of unix and plan 9 has been built on the
> fact that there aren't syscalls that fail in "weird" cases.

I completely agree. But hey, I just came up with the idea and
haven't worked out all the design bugs (and may never)!  It
seemed worth sharing to elicit exactly the kind of feedback
you are giving.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [9fans] ideas for helpful system io functions
       [not found] <<20091205194741.0697D5B76@mail.bitblocks.com>
@ 2009-12-05 20:03 ` erik quanstrom
  2009-12-05 20:24   ` Bakul Shah
  0 siblings, 1 reply; 46+ messages in thread
From: erik quanstrom @ 2009-12-05 20:03 UTC (permalink / raw)
  To: 9fans

> The OS support I am talking about:
> a) the fork behavior on an open file should be available
>    *without* forking.  dup() doesn't cut it (both fds share
>    the same offset on the underlying file). I'd call the new
>    syscall fdfork().  That is, if I do
>
>        int newfd = fdfork(oldfd);
>
>    reading N bytes each from newfd and oldfd will return
>    identical data.

i can't think of a way to do this correctly.  buffering in the
kernel would only work if each process issued exactly the
same set of reads.  there is no requirement that the data
from 2 reads of 100 bytes each be the same as the data
returned by one 200-byte read.
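
A small illustration of how read sizes can change the data returned.  This is
an analogy rather than a Plan 9 example: it assumes a POSIX system and uses
an AF_UNIX datagram socketpair, where a read shorter than the message
normally discards the rest of that message.  first200() is a hypothetical
helper.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send two 200-byte messages (100 'A' + 100 'B', then 200 'C') over a fresh
 * datagram socketpair, then fill out[0..200) using reads of chunk bytes.
 * Returns 0 on success, -1 on error. */
int
first200(size_t chunk, char out[200])
{
	int sv[2];
	char m1[200], m2[200];
	size_t got;
	ssize_t r;

	assert(chunk > 0 && 200 % chunk == 0);
	memset(m1, 'A', 100);
	memset(m1 + 100, 'B', 100);
	memset(m2, 'C', 200);
	if(socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;
	if(write(sv[0], m1, 200) != 200 || write(sv[0], m2, 200) != 200){
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	for(got = 0; got < 200; got += (size_t)r){
		r = read(sv[1], out + got, chunk);
		if(r <= 0){
			close(sv[0]);
			close(sv[1]);
			return -1;
		}
	}
	close(sv[0]);
	close(sv[1]);
	return 0;
}
```

Calling first200(200, ...) returns the first message intact, while
first200(100, ...) returns the first half of each message, so a kernel buffer
that replayed one reader's results to another would hand back different bytes
than the object itself would have.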

before you bother with "but that's a weird case", remember
that the success of unix and plan 9 has been built on the
fact that there aren't syscalls that fail in "weird" cases.

- erik

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [9fans] ideas for helpful system io functions
  2009-12-05 18:24   ` Tim Newsham
@ 2009-12-05 19:47     ` Bakul Shah
  2009-12-07 12:24       ` roger peppe
  2009-12-07 12:06     ` Mechiel Lukkien
  2010-01-05 13:48     ` Enrico Weigelt
  2 siblings, 1 reply; 46+ messages in thread
From: Bakul Shah @ 2009-12-05 19:47 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sat, 05 Dec 2009 08:24:45 -1000 Tim Newsham <newsham@lava.net>  wrote:
> >> I can see two possible solutions for this, both of which would be
> >> useful in my opinion:
> >>
> >>  - an "unread" function, like ungetc, which allows a program to put
> >>    back some data that was already read to the OS stdin buffer (not
> >>    the stdio buffer).  This might be problematic if there is a limit
> >>    to the size of the buffers.
> >
> > Wouldn't it be a lot easier to change the convention of the
> > program you're forking and execing to take 1) a buffer of data
> > (passed via cmd line, or fd, or whatever) and 2) the fd with
> > the unconsumed part of the data?  The only data that would have
> > to be copied would be the preconsumed data that you would have
> > wanted to "unget".
>
> ps. if you wanted to hide this ugliness of passing a buffer and
> fd to a child process instead of just passing an fd, you could
> still solve it in userland without a syscall.  Write a library
> that does buffered IO.  Include unget() if you like.  Write the
> library in a way that you can initialize it after a fork/exec
> to pick up state from the parent (ie. by taking two fds,
> reading the buffer from the first, and continuing on with the
> 2nd when it is exhausted).
>
> Is there much benefit in doing this in the kernel instead?

Some OS support will help... but first let me provide some
motivation!

A useful abstraction for this sort of thing is "streams" as
in functional programming languages, where the tail of a
stream is computed as needed and the computed prefix of the
stream can be reread as many times as you wish (stuff no one
can reference any more will be garbage collected).  So for
example, if I define a "primes" stream, I can do

    100 `take` primes

in Haskell any number of times and always get the first 100
primes. If I wanted to pass the entire primes stream *minus* the
first 100 to a function, I'd use "100 `drop` primes" to get
a new stream.

In the example given you'd represent your http data as a
stream (its tail is "computed" as you read from the
socket/fd), do any preprocessing you want and then pass the
whole stream on.  Data already read is buffered and you can
reread it from the stream.

Now unix/plan9 sort of do this for files but not when an fd
refers to a fifo of some sort. For an open file, after a fork
both the parent and the child start off at the same place in
the file but then they can read at different rates. But io to
fifos/sockets doesn't share this behavior.

The OS support I am talking about:
a) the fork behavior on an open file should be available
   *without* forking.  dup() doesn't cut it (both fds share
   the same offset on the underlying file). I'd call the new
   syscall fdfork().  That is, if I do

       int newfd = fdfork(oldfd);

   reading N bytes each from newfd and oldfd will return
   identical data.

b) there should be a way to implement the same semantics for
   fifos or communication end points (or any synthetic file).
   In the above example same N bytes must be returned even if
   the underlying object is not a file.

c) there should be a way to pass the fd (really, a capability)
   to another process.

Given these, what the OP wants can be implemented cleanly.
You fdfork() first, do all your analysis using one fd, close
it and then pass on the other fd to a helper process.

Implementing b) ideally requires the OS to store a potentially
arbitrary amount of data.  But an implementation must set
some practical limit (like that on fifo buffering).

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [9fans] ideas for helpful system io functions
  2009-12-05 18:16 ` Tim Newsham
@ 2009-12-05 18:24   ` Tim Newsham
  2009-12-05 19:47     ` Bakul Shah
                       ` (2 more replies)
  0 siblings, 3 replies; 46+ messages in thread
From: Tim Newsham @ 2009-12-05 18:24 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

>> I can see two possible solutions for this, both of which would be
>> useful in my opinion:
>>
>>  - an "unread" function, like ungetc, which allows a program to put
>>    back some data that was already read to the OS stdin buffer (not
>>    the stdio buffer).  This might be problematic if there is a limit
>>    to the size of the buffers.
>
> Wouldn't it be a lot easier to change the convention of the
> program you're forking and execing to take 1) a buffer of data
> (passed via cmd line, or fd, or whatever) and 2) the fd with
> the unconsumed part of the data?  The only data that would have
> to be copied would be the preconsumed data that you would have
> wanted to "unget".

ps. if you wanted to hide this ugliness of passing a buffer and
fd to a child process instead of just passing an fd, you could
still solve it in userland without a syscall.  Write a library
that does buffered IO.  Include unget() if you like.  Write the
library in a way that you can initialize it after a fork/exec
to pick up state from the parent (ie. by taking two fds,
reading the buffer from the first, and continuing on with the
2nd when it is exhausted).
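
A minimal sketch of the unget() half of such a library, assuming POSIX
read().  Rbuf, rbufinit(), rbufunget() and rbufread() are hypothetical names
and the 4096-byte pushback area is an arbitrary choice.  Handing the pending
bytes to an exec'd child is left out; only the in-process pushback is shown.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* A descriptor plus a pushback area (hypothetical). */
typedef struct Rbuf Rbuf;
struct Rbuf {
	int fd;			/* descriptor read once the pushback is empty */
	unsigned char pb[4096];	/* pushed-back bytes live in pb[off..sizeof pb) */
	size_t off;
};

void
rbufinit(Rbuf *r, int fd)
{
	assert(r != NULL);
	r->fd = fd;
	r->off = sizeof r->pb;	/* empty pushback */
}

/* Put n bytes back in front of whatever is still unread.
 * Returns 0 on success, -1 if the pushback area is too small. */
int
rbufunget(Rbuf *r, const void *data, size_t n)
{
	assert(r != NULL && data != NULL);
	if(n > r->off)
		return -1;
	r->off -= n;
	memcpy(r->pb + r->off, data, n);
	return 0;
}

/* Serve pushed-back bytes first, then fall through to the descriptor. */
ssize_t
rbufread(Rbuf *r, void *dst, size_t n)
{
	size_t have;

	assert(r != NULL && dst != NULL);
	have = sizeof r->pb - r->off;
	if(have == 0)
		return read(r->fd, dst, n);
	if(n > have)
		n = have;
	memcpy(dst, r->pb + r->off, n);
	r->off += n;
	return (ssize_t)n;
}
```

Storing the pushback at the tail of the array means a later unget lands in
front of an earlier one, which matches ungetc-style ordering.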

Is there much benefit in doing this in the kernel instead?

Tim Newsham | www.thenewsh.com/~newsham | thenewsh.blogspot.com

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [9fans] ideas for helpful system io functions
  2009-12-05  3:17 Sam Watkins
  2009-12-05  3:36 ` Lyndon Nerenberg
@ 2009-12-05 18:16 ` Tim Newsham
  2009-12-05 18:24   ` Tim Newsham
  1 sibling, 1 reply; 46+ messages in thread
From: Tim Newsham @ 2009-12-05 18:16 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> I can see two possible solutions for this, both of which would be useful in my
> opinion:
>
>  - an "unread" function, like ungetc, which allows a program to put back some
>    data that was already read to the OS stdin buffer (not the stdio buffer).
>    This might be problematic if there is a limit to the size of the buffers.

Wouldn't it be a lot easier to change the convention of the
program you're forking and execing to take 1) a buffer of data
(passed via cmd line, or fd, or whatever) and 2) the fd with
the unconsumed part of the data?  The only data that would have
to be copied would be the preconsumed data that you would have
wanted to "unget".

> Sam

Tim Newsham | www.thenewsh.com/~newsham | thenewsh.blogspot.com



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [9fans] ideas for helpful system io functions
  2009-12-05 17:01         ` Francisco J Ballesteros
@ 2009-12-05 17:09           ` ron minnich
  0 siblings, 0 replies; 46+ messages in thread
From: ron minnich @ 2009-12-05 17:09 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sat, Dec 5, 2009 at 9:01 AM, Francisco J Ballesteros <nemo@lsub.org> wrote:
> I mostly agree, but, if you read one char at a time it's likely you'll
> become quite slow, in general.

Absolutely right. It's very application dependent. But for an httpd, I
doubt that this slowness would matter.

Anyway, I think Sam has something to work on, namely, try several
things out and let us know what he ends up liking best :-)

ron



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [9fans] ideas for helpful system io functions
  2009-12-05 16:32       ` ron minnich
@ 2009-12-05 17:01         ` Francisco J Ballesteros
  2009-12-05 17:09           ` ron minnich
  0 siblings, 1 reply; 46+ messages in thread
From: Francisco J Ballesteros @ 2009-12-05 17:01 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I mostly agree, but, if you read one char at a time it's likely you'll
become quite slow, in general. An external process providing `buffering' so
you can seek back if you want seems to me like a more general solution that
does not require a kernel change.

In any case, if I gave the impression that it's not worth experimenting,
I apologize. That's not what I tried to say.


On Sat, Dec 5, 2009 at 5:32 PM, ron minnich <rminnich@gmail.com> wrote:
> On Sat, Dec 5, 2009 at 3:44 AM, Francisco J Ballesteros <nemo@lsub.org> wrote:
>
>> If you insist on 'unreading', you could just put a front-end process that
>> keeps per-request data so that your external process can ask the
>> front-end for all the data again.
>
> The easiest way to implement unread is not to read in the first place.
>
> If you're only reading small amounts of data, say less than 1024
> bytes, and then forking a process to handle the rest, then by all
> means don't use IO that reads in lots of data you may not want.
> Instead:
>
> read(fd, &c, 1);
>
> and then there's no "overread" to deal with.
>
> That said, you can prototype unread() so why not?
> unread(fd, data, size);
>
> Attach the "unread" data to the open file struct, modify read so that
> if it sees this data it reads it first, try it out. Why not? Plan 9 is
> there to be hacked on, so hack it.
>
> Sam, the rule is, just do it. This hackability is one thing that makes
> Plan 9 so attractive.
>
> ron
>
>



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [9fans] ideas for helpful system io functions
  2009-12-05 11:44     ` Francisco J Ballesteros
@ 2009-12-05 16:32       ` ron minnich
  2009-12-05 17:01         ` Francisco J Ballesteros
  0 siblings, 1 reply; 46+ messages in thread
From: ron minnich @ 2009-12-05 16:32 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Sat, Dec 5, 2009 at 3:44 AM, Francisco J Ballesteros <nemo@lsub.org> wrote:

> If you insist on 'unreading', you could just put a front-end process that
> keeps per-request data so that your external process can ask the
> front-end for all the data again.

The easiest way to implement unread is not to read in the first place.

If you're only reading small amounts of data, say less than 1024
bytes, and then forking a process to handle the rest, then by all
means don't use IO that reads in lots of data you may not want.
Instead:

read(fd, &c, 1);

and then there's no "overread" to deal with.
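
Spelled out for the httpd case, the one-byte-at-a-time approach might look
like the sketch below.  It assumes POSIX read() and that the header block
ends with "\r\n\r\n"; readheader() is a hypothetical name.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Read from fd one byte at a time until the blank line that ends an HTTP
 * header block, so that nothing past it is consumed.  Returns the header
 * length including the terminator, or -1 on error, on EOF before the
 * terminator, or if the header does not fit in cap bytes. */
ssize_t
readheader(int fd, char *buf, size_t cap)
{
	size_t n;
	ssize_t r;

	assert(buf != NULL);
	n = 0;
	while(n < cap){
		r = read(fd, buf + n, 1);
		if(r <= 0)
			return -1;
		n++;
		if(n >= 4 && memcmp(buf + n - 4, "\r\n\r\n", 4) == 0)
			return (ssize_t)n;
	}
	return -1;
}
```

Everything after the blank line is still sitting in the descriptor, so the
process can exec the handler directly.  The cost is one read call per header
byte.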

That said, you can prototype unread() so why not?
unread(fd, data, size);

Attach the "unread" data to the open file struct, modify read so that
if it sees this data it reads it first, try it out. Why not? Plan 9 is
there to be hacked on, so hack it.

Sam, the rule is, just do it. This hackability is one thing that makes
Plan 9 so attractive.

ron

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [9fans] ideas for helpful system io functions
       [not found] <<20091205081032.GJ8759@nipl.net>
@ 2009-12-05 13:51 ` erik quanstrom
  0 siblings, 0 replies; 46+ messages in thread
From: erik quanstrom @ 2009-12-05 13:51 UTC (permalink / raw)
  To: 9fans

On Sat Dec  5 03:11:09 EST 2009, sam@nipl.net wrote:
> > the standard way of passing file descriptors is by fork/exec.
> > this allows security to be handled by the normal means.
>
> Erik/others, would you please give some feedback on my idea (a join call which
> connects two fds together and disowns them from the process).  Passing fds
> around does not solve the same problems and has nothing to do with what I
> suggested.
>
> Perhaps this list is not the right place to air "new" or different ideas
> related to the implementation of operating systems?

the problem with syscalls is (as we see in linux and before it
berkeley) that it is relatively easy to think of a special case for which
a specialized system call would be just the ticket.

the set of all these special cases is quite large. and since the goal
of plan 9 is to be a (relatively) general purpose operating system
that can be understood by a single person, and well-maintained
by a small group, one needs a pretty compelling case for a new
system call.

further, system calls are by definition tied to the machine the call
was made on.  system calls live outside the namespace.  i would
first think about doing this as a kernel file server.  but it seems
to me there are security concerns.

i don't yet see that a compelling case has been made for a new
system call or even a kernel fileserver.  a real world (working)
example and a demonstration of why existing mechanisms
fall short would be helpful.

- erik

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [9fans] ideas for helpful system io functions
  2009-12-05  8:10   ` Sam Watkins
@ 2009-12-05 11:44     ` Francisco J Ballesteros
  2009-12-05 16:32       ` ron minnich
  0 siblings, 1 reply; 46+ messages in thread
From: Francisco J Ballesteros @ 2009-12-05 11:44 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I guess the question is, is this the easy way to address the problem
you are trying to solve? Or is it a solution looking for a problem?
You could just forward the data to the new process. Is there a
performance problem here?

If you insist on 'unreading', you could just put a front-end process that
keeps per-request data so that your external process can ask the
front-end for all the data again.

Or I'm missing something.

On Sat, Dec 5, 2009 at 9:10 AM, Sam Watkins <sam@nipl.net> wrote:
>> the standard way of passing file descriptors is by fork/exec.
>> this allows security to be handled by the normal means.
>
> Erik/others, would you please give some feedback on my idea (a join call which
> connects two fds together and disowns them from the process).  Passing fds
> around does not solve the same problems and has nothing to do with what I
> suggested.
>
> Perhaps this list is not the right place to air "new" or different ideas
> related to the implementation of operating systems?
>
> Sam
>
>



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [9fans] ideas for helpful system io functions
  2009-12-05  4:47 ` erik quanstrom
  2009-12-05  5:09   ` Lyndon Nerenberg
@ 2009-12-05  8:10   ` Sam Watkins
  2009-12-05 11:44     ` Francisco J Ballesteros
  1 sibling, 1 reply; 46+ messages in thread
From: Sam Watkins @ 2009-12-05  8:10 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> the standard way of passing file descriptors is by fork/exec.
> this allows security to be handled by the normal means.

Erik/others, would you please give some feedback on my idea (a join call which
connects two fds together and disowns them from the process).  Passing fds
around does not solve the same problems and has nothing to do with what I
suggested.

Perhaps this list is not the right place to air "new" or different ideas
related to the implementation of operating systems?

Sam

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [9fans] ideas for helpful system io functions
  2009-12-05  5:09   ` Lyndon Nerenberg
@ 2009-12-05  5:11     ` Lyndon Nerenberg
  0 siblings, 0 replies; 46+ messages in thread
From: Lyndon Nerenberg @ 2009-12-05  5:11 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> Where FD passing is useful is to avoid that fork/exec overhead.

Sorry -- brain in neutral. Where FD passing wins BIG is that the front-end
process doesn't have to do copy-through of all the data between the
network and the back-end process.



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [9fans] ideas for helpful system io functions
  2009-12-05  4:47 ` erik quanstrom
@ 2009-12-05  5:09   ` Lyndon Nerenberg
  2009-12-05  5:11     ` Lyndon Nerenberg
  2009-12-05  8:10   ` Sam Watkins
  1 sibling, 1 reply; 46+ messages in thread
From: Lyndon Nerenberg @ 2009-12-05  5:09 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> the standard way of passing file descriptors is by fork/exec.
> this allows security to be handled by the normal means.

Where FD passing is useful is to avoid that fork/exec overhead. The apps I
was working on had a relatively simple front-end process that would field
requests that required data to be crunched in various ways. Some of this
crunching had *very* high overhead relative to the volume of requests
coming in. Fork/exec simply would not scale. Instead we wrote long-lived
backend processors, and let the front-end act as a connection multiplexor,
handing the FDs from the incoming requests around as required to crunch
the data. This significantly reduced the system-related overhead, and also
made it very easy to chain filters together with the front-end managing
the whole thing from a single configuration file.

> this case would be handled by fork/exec.  the general case is
> handled by srv(3).

Well, srv(3) in reverse ... sort of.  I've been thinking about doing
something like this for a while now, specifically for httpd.  What I've
been scratching my head over is if the handoff between httpd and the
backends should be a raw file descriptor, or a 9P interface.  I need to
scratch together a prototype to experiment with but there's too much on my
plate right now.

--lyndon

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [9fans] ideas for helpful system io functions
       [not found] <<alpine.BSF.2.00.0912042029370.66255@legolas.yyc.orthanc.ca>
@ 2009-12-05  4:47 ` erik quanstrom
  2009-12-05  5:09   ` Lyndon Nerenberg
  2009-12-05  8:10   ` Sam Watkins
  0 siblings, 2 replies; 46+ messages in thread
From: erik quanstrom @ 2009-12-05  4:47 UTC (permalink / raw)
  To: 9fans

On Fri Dec  4 22:39:59 EST 2009, lyndon@orthanc.ca wrote:
> > Another example, a little server that allows connections on a single port 443
> > for https and ssh.  Ideally after reading the "GET" or ssh banner, it can just
> > exec whichever server is needed (or fork and exec something like netcat).  but
> > in fact due to this "already read some data" problem, it has to stay alive and
> > copy the data in and out from the other server.
>
> It shouldn't be too difficult to write a device that allows file
> descriptors to be passed from one process to another.
>
> The functionality is quite useful. BSD has supported this since the dawn
> of time (SCM_RIGHTS), and I have used it in a few commercial network
> server products over the years. (Later System Vs have it as well, and
> Solaris supports it through their "doors" API. Stevens Vol. 2 describes
> the various APIs.)

the standard way of passing file descriptors is by fork/exec.
this allows security to be handled by the normal means.

this case would be handled by fork/exec.  the general case is
handled by srv(3).

no sockets need apply.

- erik



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [9fans] ideas for helpful system io functions
  2009-12-05  3:56   ` Sam Watkins
@ 2009-12-05  4:03     ` Lyndon Nerenberg
  0 siblings, 0 replies; 46+ messages in thread
From: Lyndon Nerenberg @ 2009-12-05  4:03 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> My proposed type of CGI would have an advantage (?) that it presents a
> bidirectional socket to the script, rather than a file that was already read
> and saved to disk and a write-only socket.  CGI chat over a single http
> connection for example would be possible (if the browser/client also supported
> it).

Your CGI scripts aren't going to run on Plan 9 anyway. For the work it
will take to port stuff you're better off inventing a better method that
takes advantage of Plan 9's facilities.

I'm interested in what can be done on Plan 9, not on mythical utopias for
other OSes' intractable problems.

Ditch CGI, replace the execed scripts with long-running servers, and turn
httpd into a dispatcher that hands off FDs based on a URL matching scheme.
You could probably even hack up the plumber as the dispatcher.

--lyndon

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [9fans] ideas for helpful system io functions
  2009-12-05  3:36 ` Lyndon Nerenberg
@ 2009-12-05  3:56   ` Sam Watkins
  2009-12-05  4:03     ` Lyndon Nerenberg
  0 siblings, 1 reply; 46+ messages in thread
From: Sam Watkins @ 2009-12-05  3:56 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Fri, Dec 04, 2009 at 08:36:29PM -0700, Lyndon Nerenberg wrote:
> >Another example, a little server that allows connections on a single port
> >443 for https and ssh.  Ideally after reading the "GET" or ssh banner, it
> >can just exec whichever server is needed (or fork and exec something like
> >netcat).  but in fact due to this "already read some data" problem, it has
> >to stay alive and copy the data in and out from the other server.
>
> It shouldn't be too difficult to write a device that allows file descriptors
> to be passed from one process to another.

You can do that with unix-domain sockets (or fork, sort of), but I don't think
it solves the problem.  "fork" also shares fds, but sharing or sending fds does
not let me send some extra prefix data to a CGI script's stdin fd then exit and
let the CGI script take over reading from my old stdin fd, if that makes any
sense.  Also, obviously I don't want to have to hack every CGI script in
existence to make it work.

Another possible solution, which would only work with http (so it's not a real
solution) would be a function like "read_until" where it would stop reading
just before a delimiter "\r\n\r\n" in the case of http.  That would not help
with the ssh/https and similar multiplexing problems.  I think the best way
would be my proposed "join" system call.

My proposed type of CGI would have an advantage (?) that it presents a
bidirectional socket to the script, rather than a file that was already read
and saved to disk and a write-only socket.  CGI chat over a single http
connection for example would be possible (if the browser/client also supported
it).

Maybe I need to draw some ascii-art.

This may be off topic here since it's not specific to plan 9 but I suppose
people here may be interested in topics like this.  and I don't think I want to
brave the lkml right now!  "splice" and "sendfile" on Linux are similar
concepts to my "join" I guess.  I think join is better though!

Sam

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [9fans] ideas for helpful system io functions
  2009-12-05  3:17 Sam Watkins
@ 2009-12-05  3:36 ` Lyndon Nerenberg
  2009-12-05  3:56   ` Sam Watkins
  2009-12-05 18:16 ` Tim Newsham
  1 sibling, 1 reply; 46+ messages in thread
From: Lyndon Nerenberg @ 2009-12-05  3:36 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> Another example, a little server that allows connections on a single port 443
> for https and ssh.  Ideally after reading the "GET" or ssh banner, it can just
> exec whichever server is needed (or fork and exec something like netcat).  but
> in fact due to this "already read some data" problem, it has to stay alive and
> copy the data in and out from the other server.

It shouldn't be too difficult to write a device that allows file
descriptors to be passed from one process to another.

The functionality is quite useful. BSD has supported this since the dawn
of time (SCM_RIGHTS), and I have used it in a few commercial network
server products over the years. (Later System Vs have it as well, and
Solaris supports it through their "doors" API. Stevens Vol. 2 describes
the various APIs.)
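
For readers who have not used it, a minimal sketch of SCM_RIGHTS descriptor
passing over a unix-domain socket with the BSD sockets API (not Plan 9) might
look like this. The helper names send_fd and recv_fd are made up for the
example, and real code would want more careful error handling.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Control buffer big enough for one descriptor, suitably aligned. */
union fd_ctl {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Send descriptor fd over the unix-domain socket sock. Returns 0 or -1. */
int send_fd(int sock, int fd)
{
    char byte = 0;  /* one byte of ordinary data; some systems require it */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *c;

    memset(&msg, 0, sizeof msg);
    memset(&ctl, 0, sizeof ctl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    /* The descriptor travels as ancillary data of type SCM_RIGHTS. */
    c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from sock. Returns the new descriptor or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *c;
    int fd;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    c = CMSG_FIRSTHDR(&msg);
    if (c == NULL || c->cmsg_level != SOL_SOCKET ||
        c->cmsg_type != SCM_RIGHTS || c->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(c), sizeof fd);
    return fd;
}
```

The receiver gets a new descriptor number referring to the same open file, so
the sender can close its copy and exit once sendmsg() has returned.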

--lyndon


* [9fans] ideas for helpful system io functions
@ 2009-12-05  3:17 Sam Watkins
  2009-12-05  3:36 ` Lyndon Nerenberg
  2009-12-05 18:16 ` Tim Newsham
  0 siblings, 2 replies; 46+ messages in thread
From: Sam Watkins @ 2009-12-05  3:17 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

I have two ideas for io functions that I think would be helpful; they are
alternative options to solve a simple problem really.  I don't know if plan 9
has any functions like these already.

For example, when starting a CGI script for a POST request, a httpd reads the
http headers but typically also the first little bit of the POST data.  I would
like to be able to simply fork and exec the CGI script, but this missing POST
data means this will not work.  The httpd has to write the POST data to a
temporary file, or else use a temporary "socketpair" or similar to communicate
with the CGI script.  Hopefully you know what I mean.
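
To make the workaround concrete, here is a rough C sketch of the copy-through
approach. The helper name relay_with_prefix is hypothetical and not taken from
any real httpd. It assumes "out" is the write end of a pipe or socketpair
whose other end is the CGI script's stdin, and "prefix" holds the POST data
that was read along with the headers.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write exactly n bytes to fd, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, buf, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += w;
        n -= (size_t)w;
    }
    return 0;
}

/*
 * Hypothetical helper: first replay the bytes the server already consumed
 * (prefix), then copy everything else from in to out until end of file.
 * Returns 0 on success, -1 on error.
 */
int relay_with_prefix(int in, int out, const char *prefix, size_t len)
{
    char buf[8192];

    if (write_all(out, prefix, len) < 0)
        return -1;
    for (;;) {
        ssize_t r = read(in, buf, sizeof buf);
        if (r == 0)
            return 0;               /* end of input */
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(out, buf, (size_t)r) < 0)
            return -1;
    }
}
```

The drawback is that the server process has to stay alive for the whole
request, copying every byte, instead of simply exec'ing the script.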

Another example, a little server that allows connections on a single port 443
for https and ssh.  Ideally after reading the "GET" or ssh banner, it can just
exec whichever server is needed (or fork and exec something like netcat).  but
in fact due to this "already read some data" problem, it has to stay alive and
copy the data in and out from the other server.

I can see two possible solutions for this, both of which would be useful in my
opinion:

  - an "unread" function, like ungetc, which allows a program to put back some
    data that was already read to the OS stdin buffer (not the stdio buffer).
    This might be problematic if there is a limit to the size of the buffers.
  - a "join" function (or something) which allows a process to unify/join its
    file descriptors (e.g. before exiting).  For example join(0, 1) would
    connect STDIN directly to STDOUT.  The OS might need to interpose a
    "sendfile"-like copy mechanism, or collapse a pipe or socket, to make this
    work nicely.  This would allow a process to fork, write some data to
    STDOUT, join(0, 1) and exit, solving this problem I mentioned.

what do you think?  I doubt these ideas are original, but I think they would be
useful and I don't know of any implementation in unix or any other OS.

Sam


end of thread, other threads:[~2010-01-05 15:53 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <<alpine.BSF.2.00.0912042210290.81688@legolas.yyc.orthanc.ca>
2009-12-05 13:26 ` [9fans] ideas for helpful system io functions erik quanstrom
2009-12-05 14:22   ` Sam Watkins
2009-12-05 17:47     ` Skip Tavakkolian
2009-12-05 17:56       ` Skip Tavakkolian
     [not found] <<8ccc8ba40912070814o2f2c7eb9s5887a31810eab12e@mail.gmail.com>
2009-12-07 16:24 ` erik quanstrom
2009-12-07 16:48   ` Francisco J Ballesteros
2009-12-07 14:41 Francisco J Ballesteros
2009-12-07 15:11 ` roger peppe
     [not found] <<20091207120652.GB16320@knaagkever.ueber.net>
2009-12-07 12:19 ` erik quanstrom
     [not found] <<20091205202420.855AD5B77@mail.bitblocks.com>
2009-12-05 20:27 ` erik quanstrom
2009-12-05 20:59   ` Bakul Shah
2009-12-06  7:45     ` Sam Watkins
2009-12-05 20:30 ` erik quanstrom
     [not found] <<20091205194741.0697D5B76@mail.bitblocks.com>
2009-12-05 20:03 ` erik quanstrom
2009-12-05 20:24   ` Bakul Shah
     [not found] <<20091205081032.GJ8759@nipl.net>
2009-12-05 13:51 ` erik quanstrom
     [not found] <<alpine.BSF.2.00.0912042029370.66255@legolas.yyc.orthanc.ca>
2009-12-05  4:47 ` erik quanstrom
2009-12-05  5:09   ` Lyndon Nerenberg
2009-12-05  5:11     ` Lyndon Nerenberg
2009-12-05  8:10   ` Sam Watkins
2009-12-05 11:44     ` Francisco J Ballesteros
2009-12-05 16:32       ` ron minnich
2009-12-05 17:01         ` Francisco J Ballesteros
2009-12-05 17:09           ` ron minnich
  -- strict thread matches above, loose matches on Subject: below --
2009-12-05  3:17 Sam Watkins
2009-12-05  3:36 ` Lyndon Nerenberg
2009-12-05  3:56   ` Sam Watkins
2009-12-05  4:03     ` Lyndon Nerenberg
2009-12-05 18:16 ` Tim Newsham
2009-12-05 18:24   ` Tim Newsham
2009-12-05 19:47     ` Bakul Shah
2009-12-07 12:24       ` roger peppe
2009-12-07 12:32         ` Charles Forsyth
2009-12-07 12:35           ` Francisco J Ballesteros
2009-12-07 13:42             ` Charles Forsyth
2009-12-07 16:10             ` erik quanstrom
2009-12-07 16:14               ` Francisco J Ballesteros
2009-12-07 14:13         ` Sam Watkins
2009-12-07 14:36           ` roger peppe
2009-12-07 19:11             ` Nathaniel W Filardo
2009-12-07 21:03               ` roger peppe
2009-12-08 12:51           ` matt
2009-12-07 12:06     ` Mechiel Lukkien
2009-12-07 12:31       ` roger peppe
2010-01-05 13:48     ` Enrico Weigelt
2010-01-05 15:53       ` Steve Simon
