* Re: [9fans] ideas for helpful system io functions [not found] <<20091207120652.GB16320@knaagkever.ueber.net> @ 2009-12-07 12:19 ` erik quanstrom 0 siblings, 0 replies; 46+ messages in thread From: erik quanstrom @ 2009-12-07 12:19 UTC (permalink / raw) To: 9fans > since file descriptors are so essential, it may help to have "tools" > to use them. yesterday evening i hacked up devbuf.c and devjoin.c > after reading this thread. both offer a file "new". for devbuf.c > you can write data to it, then later consume it (yes, you could just > use a pipe instead). why can't you use ramfs instead of devbuf? - erik ^ permalink raw reply [flat|nested] 46+ messages in thread
[parent not found: <<8ccc8ba40912070814o2f2c7eb9s5887a31810eab12e@mail.gmail.com>]
* Re: [9fans] ideas for helpful system io functions [not found] <<8ccc8ba40912070814o2f2c7eb9s5887a31810eab12e@mail.gmail.com> @ 2009-12-07 16:24 ` erik quanstrom 2009-12-07 16:48 ` Francisco J Ballesteros 0 siblings, 1 reply; 46+ messages in thread From: erik quanstrom @ 2009-12-07 16:24 UTC (permalink / raw) To: 9fans On Mon Dec 7 11:16:04 EST 2009, nemo@lsub.org wrote: > It seems that changing a bit fs(3) can suffice and is generic > enough for all usages required. In the end it might result in code > removed instead of adding code, but time will tell. As of today, it's > only an experiment. not everyone who uses usb disks also uses fs. i also like the idea of a stand-alone device. i can also put the entire configuration of a loop device in plan9.ini. - erik ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions 2009-12-07 16:24 ` erik quanstrom @ 2009-12-07 16:48 ` Francisco J Ballesteros 0 siblings, 0 replies; 46+ messages in thread From: Francisco J Ballesteros @ 2009-12-07 16:48 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs the idea is that if fs knows enough to handle partitions like we are accustomed to, then partitioning code can be removed from everywhere else (but for compat) and existing tools used to handle partitions (e.g., fdisk) very much like they are used now. Either way, It's not standalone, in one case you require loop, in the other fs. both can load their configs and both require help to learn which partitions to use at boot time. On Mon, Dec 7, 2009 at 5:24 PM, erik quanstrom <quanstro@quanstro.net> wrote: > On Mon Dec 7 11:16:04 EST 2009, nemo@lsub.org wrote: >> It seems that changing a bit fs(3) can suffice and is generic >> enough for all usages required. In the end it might result in code >> removed instead of adding code, but time will tell. As of today, it's >> only an experiment. > > not everyone who uses usb disks also uses fs. > i also like the idea of a stand-alone device. > i can also put the entire configuration of a > loop device in plan9.ini. > > - erik > > ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions
@ 2009-12-07 14:41 Francisco J Ballesteros
2009-12-07 15:11 ` roger peppe
0 siblings, 1 reply; 46+ messages in thread
From: Francisco J Ballesteros @ 2009-12-07 14:41 UTC (permalink / raw)
To: 9fans
I think he wants copyfile + a kproc.
On 07/12/2009, at 15:37, rogpeppe@gmail.com wrote:
> 2009/12/7 Sam Watkins <sam@nipl.net>:
>> I meant for example if a process is reading from its stdin a open
>> file 'A' and
>> writing to stdout the input of a pipe 'B', rather than looping and
>> forwarding
>> data it may simply "join" these two fds, and exit. The OS will
>> then do what is
>> necessary to make sure the data can travel from A to B (and/or vice
>> versa) with
>> the minimum effort needed.
>
> i'm not sure how you think this would work.
>
> a file descriptor is essentially a passive object - it responds
> to read, write, etc requests on it, but it doesn't do anything
> of its own accord.
>
> if i do:
>
> fd1 := open("/foo1", ORDWR);
> fd2 := open("/foo2", ORDWR);
> fd3 := fdjoin(fd1, fd2);
>
> what is going to happen?
> something has got to initiate the requests to actually
> shift the data, and it's not clear which direction the
> data will flow.
>
> this is an optimisation, right? what parts of the current system
> could be speeded up by the use of this primitive?
>
> [/mail/box/nemo/msgs/200912/452]
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions 2009-12-07 14:41 Francisco J Ballesteros @ 2009-12-07 15:11 ` roger peppe 0 siblings, 0 replies; 46+ messages in thread From: roger peppe @ 2009-12-07 15:11 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs 2009/12/7 Francisco J Ballesteros <nemo@lsub.org>: > I think he wants copyfile + a kproc. yup, i was thinking of inferno's sys->stream(). but neither is in a position to do the kind of redundancy optimisation that sam was talking about, AFAICS. at least it can avoid copying by calling bread and bwrite. ^ permalink raw reply [flat|nested] 46+ messages in thread
[parent not found: <<20091205202420.855AD5B77@mail.bitblocks.com>]
* Re: [9fans] ideas for helpful system io functions [not found] <<20091205202420.855AD5B77@mail.bitblocks.com> @ 2009-12-05 20:27 ` erik quanstrom 2009-12-05 20:59 ` Bakul Shah 2009-12-05 20:30 ` erik quanstrom 1 sibling, 1 reply; 46+ messages in thread From: erik quanstrom @ 2009-12-05 20:27 UTC (permalink / raw) To: 9fans > To be precise, both fds have their own pointer (or offset) > and reading N bytes from some offset O must return the same > bytes. wrong. /dev/random is my example. - erik ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions
2009-12-05 20:27 ` erik quanstrom
@ 2009-12-05 20:59 ` Bakul Shah
2009-12-06 7:45 ` Sam Watkins
0 siblings, 1 reply; 46+ messages in thread
From: Bakul Shah @ 2009-12-05 20:59 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On Sat, 05 Dec 2009 15:27:02 EST erik quanstrom <quanstro@quanstro.net> wrote:
> > To be precise, both fds have their own pointer (or offset)
> > and reading N bytes from some offset O must return the same
> > bytes.
>
> wrong. /dev/random is my example.
You cut out the bit about buffering where I explained what I meant.
As I said, those are the semantics I would choose so by definition it is not "wrong"! Though it may not do what you expect.
As a matter of fact I do see a use case for /dev/random for getting repeatable random numbers!
If you want an independent stream of random numbers, just open /dev/random again (or dup()), and don't use fdfork().
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions
2009-12-05 20:59 ` Bakul Shah
@ 2009-12-06 7:45 ` Sam Watkins
0 siblings, 0 replies; 46+ messages in thread
From: Sam Watkins @ 2009-12-06 7:45 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On Sat, Dec 05, 2009 at 12:59:34PM -0800, Bakul Shah wrote:
> You cut out the bit about buffering where I explained what I meant.
Your idea seems good, so long as the OS buffers data and keeps it around until all readers have consumed it there would be no problem.
This would be another possible solution to my problem, you could fork the fd before reading the http headers, read the headers on one fd, find how long they are, and seek forward over the exact length of the headers in the other fd before execing the script.
The only problem would be that the OS might be required to keep an arbitrarily large buffer for the fd, if one forked fd reads a long way ahead, but the other stays still.
I do think it is a good idea to have shared pipes / input streams though, there are many cases where two or more processes need to read the same input; and it's inefficient to have multiple pipes for this purpose, when they could easily share a single buffered "multi-pipe".
Perhaps a limitation could be that if a process tries to read too far ahead from the other processes, it may block. This limit might be configurable as the "pipe size". An httpd application might (should) reject requests with over-large headers, so this limitation would be okay.
I still like my "join" function. It can be used for other cases, such as when you have to prepend a header before connecting the input stream to your CGI script (or whatever it is). It would make it easy to implement zero-copy, as all plain in-to-out copying can be delegated to the OS and in many cases will not require the OS to do any work, just to collapse two pipes together or something simple like that.
Sam
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions [not found] <<20091205202420.855AD5B77@mail.bitblocks.com> 2009-12-05 20:27 ` erik quanstrom @ 2009-12-05 20:30 ` erik quanstrom 1 sibling, 0 replies; 46+ messages in thread From: erik quanstrom @ 2009-12-05 20:30 UTC (permalink / raw) To: 9fans > For disk based files and fifos there should be no > problem. there is no such distinction in plan 9. - erik ^ permalink raw reply [flat|nested] 46+ messages in thread
[parent not found: <<20091205194741.0697D5B76@mail.bitblocks.com>]
* Re: [9fans] ideas for helpful system io functions
[not found] <<20091205194741.0697D5B76@mail.bitblocks.com>
@ 2009-12-05 20:03 ` erik quanstrom
2009-12-05 20:24 ` Bakul Shah
0 siblings, 1 reply; 46+ messages in thread
From: erik quanstrom @ 2009-12-05 20:03 UTC (permalink / raw)
To: 9fans
> The OS support I am talking about:
> a) the fork behavior on an open file should be available
> *without* forking. dup() doesn't cut it (both fds share
> the same offset on the underlying file). I'd call the new
> syscall fdfork(). That is, if I do
>
> int newfd = fdfork(oldfd);
>
> reading N bytes each from newfd and oldfd will return
> identical data.
i can't think of a way to do this correctly. buffering in the kernel would only work if each process issued exactly the same set of reads. there is no requirement that the data from 2 reads of 100 bytes each be the same as the data returned by 1 200 byte read.
before you bother with "but that's a weird case", remember that the success of unix and plan 9 has been built on the fact that there aren't syscalls that fail in "weird" cases.
- erik
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions
2009-12-05 20:03 ` erik quanstrom
@ 2009-12-05 20:24 ` Bakul Shah
0 siblings, 0 replies; 46+ messages in thread
From: Bakul Shah @ 2009-12-05 20:24 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On Sat, 05 Dec 2009 15:03:44 EST erik quanstrom <quanstro@quanstro.net> wrote:
> > The OS support I am talking about:
> > a) the fork behavior on an open file should be available
> > *without* forking. dup() doesn't cut it (both fds share
> > the same offset on the underlying file). I'd call the new
> > syscall fdfork(). That is, if I do
> >
> > int newfd = fdfork(oldfd);
> >
> > reading N bytes each from newfd and oldfd will return
> > identical data.
>
> i can't think of a way to do this correctly. buffering in the
> kernel would only work if each process issued exactly the
> same set of reads. there is no requirement that the data
> from 2 reads of 100 bytes each be the same as the data
> returned by 1 200 byte read.
To be precise, both fds have their own pointer (or offset) and reading N bytes from some offset O must return the same bytes.
The semantics I'd choose is that the first read gets buffered, and reads are satisfied first from buffered data and only then from the underlying object. Same with writes. They are "write through".
If synthetic files do weird things at different offsets or for different read/write counts, I'd consider them uncacheable (and you shouldn't use fdfork with them). For disk based files and fifos there should be no problem.
Note that Haskell streams are basically cacheable!
> before you bother with "but that's a weird case", remember
> that the success of unix and plan 9 has been built on the
> fact that there aren't syscalls that fail in "weird" cases.
I completely agree. But hey, I just came up with the idea and haven't worked out all the design bugs (and may never)! It seemed worth sharing to elicit exactly the kind of feedback you are giving.
^ permalink raw reply [flat|nested] 46+ messages in thread
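For seekable files, the per-fd offset semantics described above can be approximated in user space. The following is only a rough sketch in POSIX C (an assumption made so it compiles on a stock unix; it is not Plan 9 C and not a kernel fdfork()). Each reader keeps a private offset and reads with pread(), so two readers over the same fd see identical bytes. The names FdReader, fr_open, fr_fork and fr_read are hypothetical. It does nothing for pipes or for synthetic files such as /dev/random, which is exactly the case objected to in the thread.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical user-space stand-in for the proposed fdfork():
 * a reader is an fd plus a private offset. */
typedef struct {
    int   fd;
    off_t off;
} FdReader;

/* Open path read-only and start a reader at offset 0; fd is -1 on failure. */
FdReader fr_open(const char *path)
{
    FdReader r;

    assert(path != NULL);
    r.fd = open(path, O_RDONLY);
    r.off = 0;
    return r;
}

/* "Fork" a reader: same underlying fd, independent copy of the offset. */
FdReader fr_fork(const FdReader *parent)
{
    assert(parent != NULL);
    return *parent;
}

/* Read up to n bytes at the reader's own offset.
 * Works only on seekable files; pread() fails on pipes. */
ssize_t fr_read(FdReader *r, void *buf, size_t n)
{
    ssize_t got;

    assert(r != NULL && buf != NULL);
    got = pread(r->fd, buf, n, r->off);
    if (got > 0)
        r->off += got;
    return got;
}
```

Because the shared fd's own offset is never moved, this sidesteps the dup() problem (shared offset) without any kernel buffering, at the cost of only working where pread() does.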
[parent not found: <<20091205081032.GJ8759@nipl.net>]
* Re: [9fans] ideas for helpful system io functions
[not found] <<20091205081032.GJ8759@nipl.net>
@ 2009-12-05 13:51 ` erik quanstrom
0 siblings, 0 replies; 46+ messages in thread
From: erik quanstrom @ 2009-12-05 13:51 UTC (permalink / raw)
To: 9fans
On Sat Dec 5 03:11:09 EST 2009, sam@nipl.net wrote:
> > the standard way of passing file descriptors is by fork/exec.
> > this allows security to be handled by the normal means.
>
> Erik/others, would you please give some feedback on my idea (a join call which
> connects two fds together and disowns them from the process). Passing fds
> around does not solve the same problems and has nothing to do with what I
> suggested.
>
> Perhaps this list is not the right place to air "new" or different ideas
> related to the implementation of operating systems?
the problem with syscalls is (as we see in linux and before them berkeley), it is relatively easy to think of a special case for which a specialized system call would be just the ticket. the set of all these special cases is quite large. and since the goal of plan 9 is to be a (relatively) general purpose operating system that can be understood by a single person, and well-maintained by a small group, one needs a pretty compelling case for a new system call.
further, system calls are by definition tied to the machine the call was made on. system calls live outside the namespace. i would first think about doing this as a kernel file server. but it seems to me there are security concerns.
i don't yet see that a compelling case has been made for a new system call or even a kernel fileserver. a real world (working) example and a demonstration of why existing mechanisms fall short would be helpful.
- erik
^ permalink raw reply [flat|nested] 46+ messages in thread
[parent not found: <<alpine.BSF.2.00.0912042210290.81688@legolas.yyc.orthanc.ca>]
* Re: [9fans] ideas for helpful system io functions
[not found] <<alpine.BSF.2.00.0912042210290.81688@legolas.yyc.orthanc.ca>
@ 2009-12-05 13:26 ` erik quanstrom
2009-12-05 14:22 ` Sam Watkins
0 siblings, 1 reply; 46+ messages in thread
From: erik quanstrom @ 2009-12-05 13:26 UTC (permalink / raw)
To: 9fans
On Sat Dec 5 00:12:53 EST 2009, lyndon@orthanc.ca wrote:
> > Where FD passing is useful is to avoid that fork/exec overhead.
>
> Sorry -- brain in neutral. Where FD passing wins BIG is that the front-end
> process doesn't have to do copy-through of all the data between the
> network and the back-end process.
if you don't need to modify the data further, then exec the guy who does.
by the way, during some recent testing, i was able to move ~100k packets-per-second and create 25 million new processes / day with a load of 0 on a lowly 1.6ghz woodcrest. if you were to get 25m http requests/day and each did only 4k of i/o that's 97gb/day which is > 10mbit.
i think for many net-facing applications, you'll easily be able to fork/exec fast enough and eliminating the fork/exec would be a premature optimization that would cost tons in development, debugging and maintenance time.
- erik
^ permalink raw reply [flat|nested] 46+ messages in thread
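A quick check of the bandwidth figure, as a small C helper. The function name sustained_mbit is made up for illustration, and two assumptions are baked in: "4k" is taken as 4096 bytes and a megabit as 10^6 bits. Under those assumptions 25 million requests a day is roughly 100 GB per day and a little under 9.5 Mbit/s sustained, so on the order of the 10mbit mentioned above.

```c
#include <assert.h>

/* Average bit rate in Mbit/s (10^6 bits per second) for a steady
 * stream of requests spread over one 86400-second day. */
double sustained_mbit(double requests_per_day, double bytes_per_request)
{
    assert(requests_per_day >= 0.0 && bytes_per_request >= 0.0);
    return requests_per_day * bytes_per_request * 8.0 / 86400.0 / 1e6;
}
```

Calling sustained_mbit(25e6, 4096.0) gives about 9.48. This is an average; real traffic is bursty, so peak rates would be higher.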
* Re: [9fans] ideas for helpful system io functions
2009-12-05 13:26 ` erik quanstrom
@ 2009-12-05 14:22 ` Sam Watkins
2009-12-05 17:47 ` Skip Tavakkolian
0 siblings, 1 reply; 46+ messages in thread
From: Sam Watkins @ 2009-12-05 14:22 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On Sat, Dec 05, 2009 at 08:26:20AM -0500, erik quanstrom wrote:
> if you don't need to modify the data further, then exec the guy who
> does.
This is my issue - when I want to exec, too much of the request data has already been read. I don't want to be calling read(fd, buf, 1) in a loop.
I would like to pass the extra buffered data to the guy I am execing then let him read the rest directly from the socket, but I see no existing way to do that. hence my suggestions for alternative ways. my "join" suggestion is the most versatile so probably the best.
> by the way, during some recent testing, i was able to move ~100k
> packets-per-second and create 25 million new processes / day with
> a load of 0 on a lowly 1.6ghz woodcrest.
nice.
Sam
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions 2009-12-05 14:22 ` Sam Watkins @ 2009-12-05 17:47 ` Skip Tavakkolian 2009-12-05 17:56 ` Skip Tavakkolian 0 siblings, 1 reply; 46+ messages in thread From: Skip Tavakkolian @ 2009-12-05 17:47 UTC (permalink / raw) To: 9fans > I would like to pass the extra buffered data to the guy I am execing then let > him read the rest directly from the socket, but I see no existing way to do > that. httpd passes the headers and any left over buffer it has already read to /magic apps through a command line param. there's a function for parsing and placing the "unread" stuff in the input buffer that's tied to the fd. /sys/src/libhttpd/hio.c:233: hunload(Hio *h) ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions 2009-12-05 17:47 ` Skip Tavakkolian @ 2009-12-05 17:56 ` Skip Tavakkolian 0 siblings, 0 replies; 46+ messages in thread From: Skip Tavakkolian @ 2009-12-05 17:56 UTC (permalink / raw) To: 9fans >> I would like to pass the extra buffered data to the guy I am execing then let >> him read the rest directly from the socket, but I see no existing way to do >> that. > > httpd passes the headers and any left over buffer it has already read to /magic > apps through a command line param. there's a function for parsing and placing > the "unread" stuff in the input buffer that's tied to the fd. > > /sys/src/libhttpd/hio.c:233: hunload(Hio *h) also: /sys/src/libhttpd/hio.c:274: hload(Hio *h, char *buf) /sys/src/cmd/ip/httpd/init.c:23: init(int argc, char **argv) ^ permalink raw reply [flat|nested] 46+ messages in thread
[parent not found: <<alpine.BSF.2.00.0912042029370.66255@legolas.yyc.orthanc.ca>]
* Re: [9fans] ideas for helpful system io functions
[not found] <<alpine.BSF.2.00.0912042029370.66255@legolas.yyc.orthanc.ca>
@ 2009-12-05 4:47 ` erik quanstrom
2009-12-05 5:09 ` Lyndon Nerenberg
2009-12-05 8:10 ` Sam Watkins
0 siblings, 2 replies; 46+ messages in thread
From: erik quanstrom @ 2009-12-05 4:47 UTC (permalink / raw)
To: 9fans
On Fri Dec 4 22:39:59 EST 2009, lyndon@orthanc.ca wrote:
> > Another example, a little server that allows connections on a single port 443
> > for https and ssh. Ideally after reading the "GET" or ssh banner, it can just
> > exec whichever server is needed (or fork and exec something like netcat). but
> > in fact due to this "already read some data" problem, it has to stay alive and
> > copy the data in and out from the other server.
>
> It shouldn't be too difficult to write a device that allows file
> descriptors to be passed from one process to another.
>
> The functionality is quite useful. BSD has supported this since the dawn
> of time (SCM_RIGHTS), and I have used it in a few commercial network
> server products over the years. (Later System Vs have it as well, and
> Solaris supports it through their "doors" API. Stevens Vol. 2 describes
> the various APIs.)
the standard way of passing file descriptors is by fork/exec. this allows security to be handled by the normal means.
this case would be handled by fork/exec. the general case is handled by srv(3).
no sockets need apply.
- erik
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions
2009-12-05 4:47 ` erik quanstrom
@ 2009-12-05 5:09 ` Lyndon Nerenberg
2009-12-05 5:11 ` Lyndon Nerenberg
2009-12-05 8:10 ` Sam Watkins
1 sibling, 1 reply; 46+ messages in thread
From: Lyndon Nerenberg @ 2009-12-05 5:09 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
> the standard way of passing file descriptors is by fork/exec.
> this allows security to be handled by the normal means.
Where FD passing is useful is to avoid that fork/exec overhead. The apps I was working on had a relatively simple front-end process that would field requests that required data to be crunched in various ways. Some of this crunching had *very* high overhead relative to the volume of requests coming in. Fork/exec simply would not scale.
Instead we wrote long-lived backend processors, and let the front-end act as a connection multiplexor, handing the FDs from the incoming requests around as required to crunch the data. This significantly reduced the system-related overhead, and also made it very easy to chain filters together with the front-end managing the whole thing from a single configuration file.
> this case would be handled by fork/exec. the general case is
> handled by srv(3).
Well, srv(3) in reverse ... sort of.
I've been thinking about doing something like this for a while now, specifically for httpd. What I've been scratching my head over is if the handoff between httpd and the backends should be a raw file descriptor, or a 9P interface. I need to scratch together a prototype to experiment with but there's too much on my plate right now.
--lyndon
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions 2009-12-05 5:09 ` Lyndon Nerenberg @ 2009-12-05 5:11 ` Lyndon Nerenberg 0 siblings, 0 replies; 46+ messages in thread From: Lyndon Nerenberg @ 2009-12-05 5:11 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs > Where FD passing is useful is to avoid that fork/exec overhead. Sorry -- brain in neutral. Where FD passing wins BIG is that the front-end process doesn't have to do copy-through of all the data between the network and the back-end process. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions
2009-12-05 4:47 ` erik quanstrom
2009-12-05 5:09 ` Lyndon Nerenberg
@ 2009-12-05 8:10 ` Sam Watkins
2009-12-05 11:44 ` Francisco J Ballesteros
1 sibling, 1 reply; 46+ messages in thread
From: Sam Watkins @ 2009-12-05 8:10 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
> the standard way of passing file descriptors is by fork/exec.
> this allows security to be handled by the normal means.
Erik/others, would you please give some feedback on my idea (a join call which connects two fds together and disowns them from the process). Passing fds around does not solve the same problems and has nothing to do with what I suggested.
Perhaps this list is not the right place to air "new" or different ideas related to the implementation of operating systems?
Sam
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions
2009-12-05 8:10 ` Sam Watkins
@ 2009-12-05 11:44 ` Francisco J Ballesteros
2009-12-05 16:32 ` ron minnich
0 siblings, 1 reply; 46+ messages in thread
From: Francisco J Ballesteros @ 2009-12-05 11:44 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
I guess the question is, is this the easy way to address the problem you are trying to solve? Or is it a solution looking for a problem?
You could just forward the data to the new process. Is there a performance problem here?
If you insist on 'unreading', you could just put a front-end process that keeps per-request data so that your external process can ask the front-end for all the data again.
Or am I missing something?
On Sat, Dec 5, 2009 at 9:10 AM, Sam Watkins <sam@nipl.net> wrote:
>> the standard way of passing file descriptors is by fork/exec.
>> this allows security to be handled by the normal means.
>
> Erik/others, would you please give some feedback on my idea (a join call which
> connects two fds together and disowns them from the process). Passing fds
> around does not solve the same problems and has nothing to do with what I
> suggested.
>
> Perhaps this list is not the right place to air "new" or different ideas
> related to the implementation of operating systems?
>
> Sam
>
>
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions
2009-12-05 11:44 ` Francisco J Ballesteros
@ 2009-12-05 16:32 ` ron minnich
2009-12-05 17:01 ` Francisco J Ballesteros
0 siblings, 1 reply; 46+ messages in thread
From: ron minnich @ 2009-12-05 16:32 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On Sat, Dec 5, 2009 at 3:44 AM, Francisco J Ballesteros <nemo@lsub.org> wrote:
> If you insist on 'unreading', you could just put a front-end process that
> keeps per-request data so that your external process can ask the
> front-end for all the data again.
The easiest way to implement unread is not to read in the first place.
If you're only reading small amounts of data, say less than 1024 bytes, and then forking a process to handle the rest, then by all means don't use IO that reads in lots of data you may not want. Instead:
read(fd, &c, 1);
and then there's no "overread" to deal with.
That said, you can prototype unread() so why not?
unread(fd, data, size);
Attach the "unread" data to the open file struct, modify read so that if it sees this data it reads it first, try it out. Why not? Plan 9 is there to be hacked on, so hack it.
Sam, the rule is, just do it. This hackability is one thing that makes Plan 9 so attractive.
ron
^ permalink raw reply [flat|nested] 46+ messages in thread
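The "don't read it in the first place" approach can be sketched as below. This is POSIX C rather than Plan 9 C (an assumption, so that it builds on a stock unix), and read_header, its return convention and the "\r\n\r\n" terminator are illustrative choices for the http case, not anything from httpd. It reads one byte per call until the blank line that ends the headers, so whatever follows is still unread on the fd when a child is exec'd.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Read from fd one byte at a time until buf ends with "\r\n\r\n".
 * Returns the header length including the terminator, or -1 on
 * read error, EOF before the terminator, or a full buffer.
 * Nothing past the terminator is consumed from fd. */
long read_header(int fd, char *buf, size_t cap)
{
    size_t n = 0;

    assert(buf != NULL && cap > 0);
    while (n < cap) {
        char c;
        ssize_t r = read(fd, &c, 1);

        if (r <= 0)
            return -1;          /* error, or EOF with no terminator */
        buf[n++] = c;
        if (n >= 4 && memcmp(buf + n - 4, "\r\n\r\n", 4) == 0)
            return (long)n;
    }
    return -1;                  /* header larger than cap */
}
```

The price is one system call per header byte, which is the slowness discussed in the replies; for a few hundred bytes of headers per request that may well be acceptable.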
* Re: [9fans] ideas for helpful system io functions
2009-12-05 16:32 ` ron minnich
@ 2009-12-05 17:01 ` Francisco J Ballesteros
2009-12-05 17:09 ` ron minnich
0 siblings, 1 reply; 46+ messages in thread
From: Francisco J Ballesteros @ 2009-12-05 17:01 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
I mostly agree, but, if you read one char at a time it's likely you'll become quite slow, in general.
An external process providing `buffering' so you can seek back if you want, seems to me like a more general solution that does not require a kernel change.
In any case, if I gave the impression that it's not worth experimenting, I apologize. that's not what I tried to say.
On Sat, Dec 5, 2009 at 5:32 PM, ron minnich <rminnich@gmail.com> wrote:
> On Sat, Dec 5, 2009 at 3:44 AM, Francisco J Ballesteros <nemo@lsub.org> wrote:
>
>> If you insist on 'unreading', you could just put a front-end process that
>> keeps per-request data so that your external process can ask the
>> front-end for all the data again.
>
> The easiest way to implement unread is not to read in the first place.
>
> If you're only reading small amounts of data, say less than 1024
> bytes, and then forking a process to handle the rest, then by all
> means don't use IO that reads in lots of data you may not want.
> Instead:
>
> read(fd, &c, 1);
>
> and then there's no "overread" to deal with.
>
> That said, you can prototype unread() so why not?
> unread(fd, data, size);
>
> Attach the "unread" data to the open file struct, modify read so that
> if it sees this data it reads it first, try it out. Why not? Plan 9 is
> there to be hacked on, so hack it.
>
> Sam, the rule is, just do it. This hackability is one thing that makes
> Plan 9 so attractive.
>
> ron
>
>
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions 2009-12-05 17:01 ` Francisco J Ballesteros @ 2009-12-05 17:09 ` ron minnich 0 siblings, 0 replies; 46+ messages in thread From: ron minnich @ 2009-12-05 17:09 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Sat, Dec 5, 2009 at 9:01 AM, Francisco J Ballesteros <nemo@lsub.org> wrote: > I mostly agree, but, if you read one char at a time it's likely you'll > become quite > slow, in general. Absolutely right. It's very application dependent. But for an httpd, I doubt that this slowness would matter. Anyway, I think Sam has something to work on, namely, try several things out and let us know what he ends up liking best :-) ron ^ permalink raw reply [flat|nested] 46+ messages in thread
* [9fans] ideas for helpful system io functions @ 2009-12-05 3:17 Sam Watkins 2009-12-05 3:36 ` Lyndon Nerenberg 2009-12-05 18:16 ` Tim Newsham 0 siblings, 2 replies; 46+ messages in thread From: Sam Watkins @ 2009-12-05 3:17 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs I have two ideas for io functions that I think would be helpful, they are alternative options to solve a simple problem really. I don't know if plan 9 has any functions like these already. For example, when starting a CGI script for a POST request, a httpd reads the http headers but typically also the first little bit of the POST data. I would like to be able to simply fork and exec the CGI script, but this missing POST data means this will not work. The httpd has to write the POST data to a temporary file, or else use a temporary "socketpair" or similar to communicate with the CGI script. Hopefully you know what I mean. Another example, a little server that allows connections on a single port 443 for https and ssh. Ideally after reading the "GET" or ssh banner, it can just exec whichever server is needed (or fork and exec something like netcat). but in fact due to this "already read some data" problem, it has to stay alive and copy the data in and out from the other server. I can see two possible solutions for this, both of which would be useful in my opinion: - an "unread" function, like ungetc, which allows a program to put back some data that was already read to the OS stdin buffer (not the stdio buffer). This might be problematic if there is a limit to the size of the buffers. - a "join" function (or something) which allows a process to unify/join its file descriptors (e.g. before exiting). For example join(0, 1) would connect STDIN directly to STDOUT. The OS might need to interpose a "sendfile"-like copy mechanism, or collapse a pipe or socket, to make this work nicely. 
This would allow a process to fork, write some data to STDOUT, join(0, 1) and exit, solving this problem I mentioned. what do you think? I doubt these ideas are original, but I think they would be useful and I don't know of any implementation in unix or any other OS. Sam ^ permalink raw reply [flat|nested] 46+ messages in thread
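For comparison, this is roughly the copy-through loop a front-end has to stay alive for today, and which the proposed join(0, 1) would hand over to the OS. It is a minimal sketch in POSIX C (an assumption; not Plan 9 C), one direction only, and copy_through is a made-up name.

```c
#include <assert.h>
#include <unistd.h>

/* Copy everything from fd `in` to fd `out` until EOF on `in`.
 * Returns the number of bytes copied, or -1 on a read or write error.
 * Short writes are retried until the whole chunk has gone out. */
long long copy_through(int in, int out)
{
    char buf[8192];
    long long total = 0;

    assert(in >= 0 && out >= 0);
    for (;;) {
        ssize_t n = read(in, buf, sizeof buf);
        ssize_t off = 0;

        if (n < 0)
            return -1;
        if (n == 0)
            return total;       /* EOF: done */
        while (off < n) {
            ssize_t w = write(out, buf + off, (size_t)(n - off));

            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
}
```

A bidirectional relay (the https/ssh multiplexer case) needs two of these, typically one per process or thread, which is the overhead the join idea is trying to remove.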
* Re: [9fans] ideas for helpful system io functions 2009-12-05 3:17 Sam Watkins @ 2009-12-05 3:36 ` Lyndon Nerenberg 2009-12-05 3:56 ` Sam Watkins 2009-12-05 18:16 ` Tim Newsham 1 sibling, 1 reply; 46+ messages in thread From: Lyndon Nerenberg @ 2009-12-05 3:36 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs > Another example, a little server that allows connections on a single port 443 > for https and ssh. Ideally after reading the "GET" or ssh banner, it can just > exec whichever server is needed (or fork and exec something like netcat). but > in fact due to this "already read some data" problem, it has to stay alive and > copy the data in and out from the other server. It shouldn't be too difficult to write a device that allows file descriptors to be passed from one process to another. The functionality is quite useful. BSD has supported this since the dawn of time (SCM_RIGHTS), and I have used it in a few commercial network server products over the years. (Later System Vs have it as well, and Solaris supports it through their "doors" API. Stevens Vol. 2 describes the various APIs.) --lyndon ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions
2009-12-05 3:36 ` Lyndon Nerenberg
@ 2009-12-05 3:56 ` Sam Watkins
2009-12-05 4:03 ` Lyndon Nerenberg
0 siblings, 1 reply; 46+ messages in thread
From: Sam Watkins @ 2009-12-05 3:56 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
On Fri, Dec 04, 2009 at 08:36:29PM -0700, Lyndon Nerenberg wrote:
> >Another example, a little server that allows connections on a single port
> >443 for https and ssh. Ideally after reading the "GET" or ssh banner, it
> >can just exec whichever server is needed (or fork and exec something like
> >netcat). but in fact due to this "already read some data" problem, it has
> >to stay alive and copy the data in and out from the other server.
>
> It shouldn't be too difficult to write a device that allows file descriptors
> to be passed from one process to another.
You can do that with unix-domain sockets (or fork, sort of), but I don't think it solves the problem. "fork" also shares fds, but sharing or sending fds does not let me send some extra prefix data to a CGI script's stdin fd then exit and let the CGI script take over reading from my old stdin fd, if that makes any sense. also, obviously I don't want to have to hack every CGI script in existence to make it work.
Another possible solution, which would only work with http (so it's not a real solution) would be a function like "read_until" where it would stop reading just before a delimiter, "\r\n\r\n" in the case of http. That would not help with the ssh/https and similar multiplexing problems. I think the best way would be my proposed "join" system call.
My proposed type of CGI would have an advantage (?) that it presents a bidirectional socket to the script, rather than a file that was already read and saved to disk and a write-only socket. CGI chat over a single http connection for example would be possible (if the browser/client also supported it).
Maybe I need to draw some ascii-art.
This may be off topic here since it's not specific to plan 9 but I suppose people here may be interested in topics like this. and I don't think I want to brave the lkml right now!
"splice" and "sendfile" on Linux are similar concepts to my "join" I guess. I think join is better though!
Sam
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions 2009-12-05 3:56 ` Sam Watkins @ 2009-12-05 4:03 ` Lyndon Nerenberg 0 siblings, 0 replies; 46+ messages in thread From: Lyndon Nerenberg @ 2009-12-05 4:03 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs > My proposed type of CGI would have an advantage (?) that it presents a > bidirectional socket to the script, rather than a file that was already read > and saved to disk and a write-only socket. CGI chat over a single http > connection for example would be possible (if the browser/client also supported > it). Your CGI scripts aren't going to run on Plan 9 anyway. For the work it will take to port stuff you're better off inventing a better method that takes advantage of Plan 9's facilities. I'm interested in what can be done on Plan 9, not on mythical utopias for other OSes intractable problems. Ditch CGI, replace the execed scripts with long-running servers, and turn httpd into a dispatcher that hands off FDs based on a URL matching scheme. You could probably even hack up the plumber as the dispatcher. --lyndon ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions 2009-12-05 3:17 Sam Watkins 2009-12-05 3:36 ` Lyndon Nerenberg @ 2009-12-05 18:16 ` Tim Newsham 2009-12-05 18:24 ` Tim Newsham 1 sibling, 1 reply; 46+ messages in thread From: Tim Newsham @ 2009-12-05 18:16 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs > I can see two possible solutions for this, both of which would be useful in my > opinion: > > - an "unread" function, like ungetc, which allows a program to put back some > data that was already read to the OS stdin buffer (not the stdio buffer). > This might be problematic if there is a limit to the size of the buffers. Wouldn't it be a lot easier to change the convention of the program you're forking and execing to take 1) a buffer of data (passed via cmd line, or fd, or whatever) and 2) the fd with the unconsumed part of the data? The only data that would have to be copied would be the preconsumed data that you would have wanted to "unget". > Sam Tim Newsham | www.thenewsh.com/~newsham | thenewsh.blogspot.com ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions 2009-12-05 18:16 ` Tim Newsham @ 2009-12-05 18:24 ` Tim Newsham 2009-12-05 19:47 ` Bakul Shah ` (2 more replies) 0 siblings, 3 replies; 46+ messages in thread From: Tim Newsham @ 2009-12-05 18:24 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs >> I can see two possible solutions for this, both of which would be useful in >> my >> opinion: >> >> - an "unread" function, like ungetc, which allows a program to put back >> some >> data that was already read to the OS stdin buffer (not the stdio >> buffer). >> This might be problematic if there is a limit to the size of the >> buffers. > > Wouldn't it be a lot easier to change the convention of the > program you're forking and execing to take 1) a buffer of data > (passed via cmd line, or fd, or whatever) and 2) the fd with > the unconsumed part of the data? The only data that would have > to be copied would be the preconsumed data that you would have > wanted to "unget". ps. if you wanted to hide this ugliness of passing a buffer and fd to a child process instead of just passing an fd, you could still solve it in userland without a syscall. Write a library that does buffered IO. Include unget() if you like. Write the library in a way that you can initialize it after a fork/exec to pick up state from the parent (ie. by taking two fds, reading the buffer from the first, and continuing on with the 2nd when it is exhausted). Is there much benefit in doing this in the kernel instead? Tim Newsham | www.thenewsh.com/~newsham | thenewsh.blogspot.com ^ permalink raw reply [flat|nested] 46+ messages in thread
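The two-fd library idea in the postscript above can be sketched in a few lines of POSIX C. This is only an illustration under stated assumptions: the names Chain, chaininit and chainread are invented here, the first descriptor is assumed to hold the already-consumed prefix (for example the read end of a pipe the parent filled and closed), and the second is the original connection.

```c
#include <sys/types.h>
#include <unistd.h>

/* A reader that drains a "prefix" fd first, then continues on the
 * fd holding the unconsumed rest. (All names are hypothetical.) */
typedef struct Chain Chain;
struct Chain {
	int fd[2];	/* fd[0]: saved prefix, fd[1]: unconsumed rest */
	int cur;	/* index of the fd currently being read */
};

void
chaininit(Chain *c, int prefixfd, int restfd)
{
	c->fd[0] = prefixfd;
	c->fd[1] = restfd;
	c->cur = 0;
}

/* Read up to n bytes. Returns the byte count, 0 only once both
 * descriptors have reported end of file, or -1 on a read error. */
ssize_t
chainread(Chain *c, void *buf, size_t n)
{
	ssize_t r;

	if(n == 0)
		return 0;
	while(c->cur < 2){
		r = read(c->fd[c->cur], buf, n);
		if(r != 0)
			return r;	/* data, or an error */
		c->cur++;		/* EOF on this fd: move to the next */
	}
	return 0;
}
```

A child process would call chaininit once after exec, with the two descriptor numbers agreed by convention, and then use chainread wherever it would have called read on its input.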
* Re: [9fans] ideas for helpful system io functions 2009-12-05 18:24 ` Tim Newsham @ 2009-12-05 19:47 ` Bakul Shah 2009-12-07 12:24 ` roger peppe 2009-12-07 12:06 ` Mechiel Lukkien 2010-01-05 13:48 ` Enrico Weigelt 2 siblings, 1 reply; 46+ messages in thread From: Bakul Shah @ 2009-12-05 19:47 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs

On Sat, 05 Dec 2009 08:24:45 -1000 Tim Newsham <newsham@lava.net> wrote:
> >> I can see two possible solutions for this, both of which would be useful
> >> in my opinion:
> >>
> >> - an "unread" function, like ungetc, which allows a program to put back
> >>   some data that was already read to the OS stdin buffer (not the stdio
> >>   buffer). This might be problematic if there is a limit to the size of
> >>   the buffers.
> >
> > Wouldn't it be a lot easier to change the convention of the
> > program you're forking and execing to take 1) a buffer of data
> > (passed via cmd line, or fd, or whatever) and 2) the fd with
> > the unconsumed part of the data? The only data that would have
> > to be copied would be the preconsumed data that you would have
> > wanted to "unget".
>
> ps. if you wanted to hide this ugliness of passing a buffer and
> fd to a child process instead of just passing an fd, you could
> still solve it in userland without a syscall. Write a library
> that does buffered IO. Include unget() if you like. Write the
> library in a way that you can initialize it after a fork/exec
> to pick up state from the parent (ie. by taking two fds,
> reading the buffer from the first, and continuing on with the
> 2nd when it is exhausted).
>
> Is there much benefit in doing this in the kernel instead?

Some OS support will help... but first let me provide some
motivation!

A useful abstraction for this sort of thing is "streams" as
in functional programming languages, where the tail of a
stream is computed as needed and the computed prefix of the
stream can be reread as many times as you wish (stuff no one
can reference any more will be garbage collected). So for
example, if I define a "primes" stream, I can do

	100 `take` primes

in Haskell any number of times and always get the first 100
primes. If I wanted to pass entire primes stream *minus* the
first 100 to a function, I'd use "100 `drop` primes" to get a
new stream.

In the example given you'd represent your http data as a
stream (its tail is "computed" as you read from the
socket/fd), do any preprocessing you want and then pass the
whole stream on. Data already read is buffered and you can
reread it from the stream.

Now unix/plan9 sort of do this for files but not when an fd
refers to a fifo of some sort. For an open file, after a fork
both the parent and the child start off at the same place in
the file but then they can read at different rates. But io
to fifos/sockets don't share this behavior.

The OS support I am talking about:

a) the fork behavior on an open file should be available
   *without* forking. dup() doesn't cut it (both fds share
   the same offset on the underlying file). I'd call the new
   syscall fdfork(). That is, if I do

	int newfd = fdfork(oldfd);

   reading N bytes each from newfd and oldfd will return
   identical data.

b) there should be a way to implement the same semantics for
   fifos or communication end points (or any synthetic file).
   In the above example same N bytes must be returned even if
   the underlying object is not a file.

c) there should be a way to pass the fd (really, a capability)
   to another process.

Given these, what the OP wants can be implemented cleanly.
You fdfork() first, do all your analysis using one fd, close
it and then pass on the other fd to a helper process.

Implementing b) ideally requires the OS to store potentially
arbitrary amount of data. But an implementation must set
some practical limit (like that on fifo buffering).

^ permalink raw reply [flat|nested] 46+ messages in thread
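To make the fdfork() semantics in points a) and b) concrete, here is a rough user-space model in POSIX C. It is a sketch only, not the proposed syscall: the names Stream, Cursor, cursoropen, cursorfork and cursorread are invented for this illustration, and the fixed STREAMMAX buffer stands in for the "practical limit" on retained data mentioned above.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

enum { STREAMMAX = 8192 };	/* practical limit on retained data (assumed value) */

/* One lazily filled buffer per underlying fd. (Hypothetical names.) */
typedef struct Stream Stream;
struct Stream {
	int fd;
	char buf[STREAMMAX];
	size_t len;	/* bytes pulled from fd so far */
};

/* Each cursor is an independent read offset into the shared stream. */
typedef struct Cursor Cursor;
struct Cursor {
	Stream *s;
	size_t off;
};

void
streaminit(Stream *s, int fd)
{
	s->fd = fd;
	s->len = 0;
}

Cursor
cursoropen(Stream *s)
{
	Cursor c = { s, 0 };
	return c;
}

/* Model of fdfork(): the copy starts at the same offset as the
 * original, and the two then advance independently. */
Cursor
cursorfork(const Cursor *c)
{
	return *c;
}

/* Read up to n bytes at this cursor. Already buffered data is
 * replayed; otherwise more is pulled from the fd. Returns 0 at end
 * of file and -1 on error or when the retention limit is reached. */
ssize_t
cursorread(Cursor *c, void *dst, size_t n)
{
	Stream *s = c->s;
	ssize_t r;

	if(n == 0)
		return 0;
	if(c->off == s->len){
		if(s->len == STREAMMAX)
			return -1;
		r = read(s->fd, s->buf + s->len, STREAMMAX - s->len);
		if(r <= 0)
			return r;
		s->len += (size_t)r;
	}
	if(n > s->len - c->off)
		n = s->len - c->off;
	memcpy(dst, s->buf + c->off, n);
	c->off += n;
	return (ssize_t)n;
}
```

Because the buffer lives in one process, this model covers a) and b) but not c): a cursor cannot be handed to an unrelated process, which is the part that needs help from the OS.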
* Re: [9fans] ideas for helpful system io functions 2009-12-05 19:47 ` Bakul Shah @ 2009-12-07 12:24 ` roger peppe 2009-12-07 12:32 ` Charles Forsyth 2009-12-07 14:13 ` Sam Watkins 0 siblings, 2 replies; 46+ messages in thread From: roger peppe @ 2009-12-07 12:24 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs

2009/12/5 Bakul Shah <bakul+plan9@bitblocks.com>:
>        int newfd = fdfork(oldfd);

i'm not sure that there needs to be a new syscall to enable this.
a driver would be adequate.

here's one possibility:

the driver implements "buffered streams" - i.e. reads are lazy,
but previous reads can be re-read.

	bind '#β4.8192' /mnt/bufstream

to get a buffered, read-only stream of fd 4, with an 8K buffer.

open /mnt/bufstream/data to get a new window on the stream.
if you read at an offset beyond anything previously read,
it triggers a read on the underlying fd, which may block.

if the offset isn't within the buffer size, then the read returns -1;
otherwise the read is satisfied from the buffered data.

the underlying assumption is that the fd is stream-, not
message-oriented - as with tcp; message boundaries are not preserved.

if you wanted it, an "fd join" driver could be simply
implemented in a similar way:

	bind '#j4.5' /mnt/joined

open /mnt/joined/data to get a (read-only) fd that satisfies reads
from fd 4 until eof, then fd 5.

both of these might make a fun exercise for a rainy day.

^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions 2009-12-07 12:24 ` roger peppe @ 2009-12-07 12:32 ` Charles Forsyth 2009-12-07 12:35 ` Francisco J Ballesteros 2009-12-07 14:13 ` Sam Watkins 1 sibling, 1 reply; 46+ messages in thread From: Charles Forsyth @ 2009-12-07 12:32 UTC (permalink / raw) To: 9fans >bind '#j4.5' /mnt/joined > ... to get a (read-only) fd that satisfies reads from fd 4 >until eof, then fd 5. i wonder if there's a way of perverting fs(3) ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions 2009-12-07 12:32 ` Charles Forsyth @ 2009-12-07 12:35 ` Francisco J Ballesteros 2009-12-07 13:42 ` Charles Forsyth 2009-12-07 16:10 ` erik quanstrom 0 siblings, 2 replies; 46+ messages in thread From: Francisco J Ballesteros @ 2009-12-07 12:35 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs

Hmmm. That's what a cat device does, only that it does so by looking at the
sizes and not at eof indications. Also, it depends on seek pos., which won't
work for streams.

Perhaps a streamcat, although I don't like to have cats and streamcats.
Perhaps yet another option.

fs is already larger than it was, there's an experimental
ongoing version that knows enough of partitioning to help
usb and others in that respect.

Trying is fun, anyway.

On Mon, Dec 7, 2009 at 1:32 PM, Charles Forsyth <forsyth@terzarima.net> wrote:
>>bind '#j4.5' /mnt/joined
>> ... to get a (read-only) fd that satisfies reads from fd 4
>>until eof, then fd 5.
>
> i wonder if there's a way of perverting fs(3)
>
>

^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions 2009-12-07 12:35 ` Francisco J Ballesteros @ 2009-12-07 13:42 ` Charles Forsyth 2009-12-07 16:10 ` erik quanstrom 1 sibling, 0 replies; 46+ messages in thread From: Charles Forsyth @ 2009-12-07 13:42 UTC (permalink / raw) To: 9fans > i wonder if there's a way of perverting fs(3) i made the comment fairly idly, so i shouldn't take it too seriously. ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions 2009-12-07 12:35 ` Francisco J Ballesteros 2009-12-07 13:42 ` Charles Forsyth @ 2009-12-07 16:10 ` erik quanstrom 2009-12-07 16:14 ` Francisco J Ballesteros 1 sibling, 1 reply; 46+ messages in thread From: erik quanstrom @ 2009-12-07 16:10 UTC (permalink / raw) To: 9fans > fs is already larger than it was, there's an experimental > ongoing version that knows enough of partitioning to help > usb and others on that respect. why not just use sdloop(3)? - erik ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions 2009-12-07 16:10 ` erik quanstrom @ 2009-12-07 16:14 ` Francisco J Ballesteros 0 siblings, 0 replies; 46+ messages in thread From: Francisco J Ballesteros @ 2009-12-07 16:14 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs It seems that changing a bit fs(3) can suffice and is generic enough for all usages required. In the end it might result in code removed instead of adding code, but time will tell. As of today, it's only an experiment. On Mon, Dec 7, 2009 at 5:10 PM, erik quanstrom <quanstro@coraid.com> wrote: >> fs is already larger than it was, there's an experimental >> ongoing version that knows enough of partitioning to help >> usb and others on that respect. > > why not just use sdloop(3)? > > - erik > > ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions 2009-12-07 12:24 ` roger peppe 2009-12-07 12:32 ` Charles Forsyth @ 2009-12-07 14:13 ` Sam Watkins 2009-12-07 14:36 ` roger peppe 2009-12-08 12:51 ` matt 1 sibling, 2 replies; 46+ messages in thread From: Sam Watkins @ 2009-12-07 14:13 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Mon, Dec 07, 2009 at 12:24:05PM +0000, roger peppe wrote: > if you wanted it, an "fd join" driver could be simply > implemented in a similar way: > > bind '#j4.5' /mnt/joined > open /mnt/joined/data to get a (read-only) fd that satisfies reads from fd 4 > until eof, then fd 5. That's not what I meant by joining two fds. I meant for example if a process is reading from its stdin a open file 'A' and writing to stdout the input of a pipe 'B', rather than looping and forwarding data it may simply "join" these two fds, and exit. The OS will then do what is necessary to make sure the data can travel from A to B (and/or vice versa) with the minimum effort needed. Supposing another process 'foo' is reading the other end of the pipe 'C', the OS will simply remove the pipe 'B-C' entirely, and reroute 'foo's stdin to come directly from 'A'. In other circumstances the OS might need to effectively exec 'cat' (or a 2-way socket-cat) to take over the task of copying data, but often it will be able to remove a pipe, reducing the amount of unnecessary copying that will take place. Where I have said "stdin" I mean the fds not stdio / buffered IO FILEs. I hope I've cleared up what I meant now, seems I'm not very good at explaining it. Sam ^ permalink raw reply [flat|nested] 46+ messages in thread
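For contrast with the proposed "join", this is roughly the loop a process has to stay alive to run today when it sits between two descriptors. It is a minimal POSIX C sketch; the name copyfd is made up for this illustration, and it copies in one direction only.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy everything from descriptor in to descriptor out until end of
 * file. Returns the number of bytes copied, or -1 on error.
 * (Hypothetical helper name.) */
long long
copyfd(int in, int out)
{
	char buf[8192];
	ssize_t n, w, off;
	long long total = 0;

	for(;;){
		n = read(in, buf, sizeof buf);
		if(n == 0)
			return total;	/* end of file on the input */
		if(n < 0){
			if(errno == EINTR)
				continue;	/* interrupted: retry the read */
			return -1;
		}
		/* write() may accept fewer bytes than asked, so loop */
		for(off = 0; off < n; off += w){
			w = write(out, buf + off, (size_t)(n - off));
			if(w < 0){
				if(errno == EINTR){
					w = 0;	/* interrupted: retry the write */
					continue;
				}
				return -1;
			}
		}
		total += n;
	}
}
```

Every byte passes through the user-space buffer here; the request in the message above is for the OS to take over this job once the process no longer needs to look at the data.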
* Re: [9fans] ideas for helpful system io functions 2009-12-07 14:13 ` Sam Watkins @ 2009-12-07 14:36 ` roger peppe 2009-12-07 19:11 ` Nathaniel W Filardo 2009-12-08 12:51 ` matt 1 sibling, 1 reply; 46+ messages in thread From: roger peppe @ 2009-12-07 14:36 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs

2009/12/7 Sam Watkins <sam@nipl.net>:
> I meant for example if a process is reading from its stdin a open file 'A' and
> writing to stdout the input of a pipe 'B', rather than looping and forwarding
> data it may simply "join" these two fds, and exit. The OS will then do what is
> necessary to make sure the data can travel from A to B (and/or vice versa) with
> the minimum effort needed.

i'm not sure how you think this would work.

a file descriptor is essentially a passive object - it responds
to read, write, etc requests on it, but it doesn't do anything
of its own accord.

if i do:

	fd1 := open("/foo1", ORDWR);
	fd2 := open("/foo2", ORDWR);
	fd3 := fdjoin(fd1, fd2);

what is going to happen?
something has got to initiate the requests to actually
shift the data, and it's not clear which direction the
data will flow.

this is an optimisation, right? what parts of the current system
could be speeded up by the use of this primitive?

^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions 2009-12-07 14:36 ` roger peppe @ 2009-12-07 19:11 ` Nathaniel W Filardo 2009-12-07 21:03 ` roger peppe 0 siblings, 1 reply; 46+ messages in thread From: Nathaniel W Filardo @ 2009-12-07 19:11 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs [-- Attachment #1: Type: text/plain, Size: 2478 bytes --] On Mon, Dec 07, 2009 at 02:36:36PM +0000, roger peppe wrote: > 2009/12/7 Sam Watkins <sam@nipl.net>: > > I meant for example if a process is reading from its stdin a open file 'A' and > > writing to stdout the input of a pipe 'B', rather than looping and forwarding > > data it may simply "join" these two fds, and exit. The OS will then do what is > > necessary to make sure the data can travel from A to B (and/or vice versa) with > > the minimum effort needed. > > i'm not sure how you think this would work. The pipe would have to be a bit smarter than Plan 9's pipes currently are, or the attempts to join to a pipe would have to skip over the pipe and join with the other descriptor. It's certainly _possible_ to do, and AFAIK the Linux guys do so with abandon. ;) > a file descriptor is essentially a passive object - it responds > to read, write, etc requests on it, but it doesn't do anything > of its own accord. > > if i do: > > fd1 := open("/foo1", ORDWR); > fd2 := open("/foo2", ORDWR); > fd3 := fdjoin(fd1, fd2); > > what is going to happen? > something has got to initiate the requests to actually > shift the data, and it's not clear which direction the > data will flow. "file to file" joins like that are not the typical case and might even be an error to attempt. Linux's equivalent APIs (yes, plural, sigh) always hook an "active" component somewhere... sendfile() for example is typically employed as a crude hook on the TCP stack's "I could accept some bytes from a write() from userland now" "event" and turn that into a read() of the sendfile()d thing (which must be a pagecacheable thing... wtf. 
splice() fixes at least some or perhaps all of that). splice()d file descriptors just forward read()s and write()s across the splice. > this is an optimisation, right? what parts of the current system > could be speeded up by the use of this primitive? A typical *nix use case is sending a prefix and static file to a socket (e.g. nonencrypting, nonchunked httpds, ftpds, etc.). Of note, more generally, splice() and friends are approximating something possible and (relatively) easy in the capability kernel world: some process has capabilities to two objects and wishes to introduce those objects to each other (and further wishes that those objects would stop bothering it. :) ). i.e. "Please resend all outstanding and forward all future requests to this other capability." --nwf; [-- Attachment #2: Type: application/pgp-signature, Size: 204 bytes --] ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions 2009-12-07 19:11 ` Nathaniel W Filardo @ 2009-12-07 21:03 ` roger peppe 0 siblings, 0 replies; 46+ messages in thread From: roger peppe @ 2009-12-07 21:03 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs 2009/12/7 Nathaniel W Filardo <nwf@cs.jhu.edu>: >> fd1 := open("/foo1", ORDWR); >> fd2 := open("/foo2", ORDWR); >> fd3 := fdjoin(fd1, fd2); >> >> what is going to happen? >> something has got to initiate the requests to actually >> shift the data, and it's not clear which direction the >> data will flow. > > "file to file" joins like that are not the typical case and might even be an > error to attempt. in plan 9, everything is a file - whether it's generated by opening '#p/data1' or '/foo1'. > Linux's equivalent APIs (yes, plural, sigh) always hook > an "active" component somewhere... sendfile() for example is typically > employed as a crude hook on the TCP stack's "I could accept some bytes from > a write() from userland now" "event" and turn that into a read() of the > sendfile()d thing (which must be a pagecacheable thing... wtf. splice() > fixes at least some or perhaps all of that). splice()d file descriptors > just forward read()s and write()s across the splice. i see why you might want to send file descriptors around the place, (for instance, one could theoretically add a control request to /net/tcp that said "treat this fd as your source of data", though it wouldn't work across the network), but i still don't see how "splice" could work in general. >> this is an optimisation, right? what parts of the current system >> could be speeded up by the use of this primitive? > > A typical *nix use case is sending a prefix and static file to a socket > (e.g. nonencrypting, nonchunked httpds, ftpds, etc.). well, that case is easily dealt with with something like devjoin. 
> Of note, more generally, splice() and friends are approximating something > possible and (relatively) easy in the capability kernel world: some process > has capabilities to two objects and wishes to introduce those objects to > each other (and further wishes that those objects would stop bothering it. > :) ). i.e. "Please resend all outstanding and forward all future requests > to this other capability." i see that. but i think that fdjoin(fd1, fd2) is more like introducing two capabilities to each other, which doesn't really make sense, than talking to the objects behind the scenes. the objects behind the scenes in plan 9 are servers and device drivers. it might be interesting to provide a nice (not /srv based) way of passing file descriptors between unrelated processes. the challenge comes when you want to make it work on a remote file server... ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions 2009-12-07 14:13 ` Sam Watkins 2009-12-07 14:36 ` roger peppe @ 2009-12-08 12:51 ` matt 1 sibling, 0 replies; 46+ messages in thread From: matt @ 2009-12-08 12:51 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs > >That's not what I meant by joining two fds. > > > back seat coding, nice ^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions 2009-12-05 18:24 ` Tim Newsham 2009-12-05 19:47 ` Bakul Shah @ 2009-12-07 12:06 ` Mechiel Lukkien 2009-12-07 12:31 ` roger peppe 2010-01-05 13:48 ` Enrico Weigelt 2 siblings, 1 reply; 46+ messages in thread From: Mechiel Lukkien @ 2009-12-07 12:06 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 2499 bytes --]

On Sat, Dec 05, 2009 at 08:24:45AM -1000, Tim Newsham wrote:
> ps. if you wanted to hide this ugliness of passing a buffer and
> fd to a child process instead of just passing an fd, you could
> still solve it in userland without a syscall. Write a library
> that does buffered IO. Include unget() if you like. Write the
> library in a way that you can initialize it after a fork/exec
> to pick up state from the parent (ie. by taking two fds,
> reading the buffer from the first, and continuing on with the
> 2nd when it is exhausted).
>
> Is there much benefit in doing this in the kernel instead?

it's all library code, and it loses the "everything is a file
(descriptor)" advantage. you cannot pass that library state to
another program. you could if the state was a file descriptor.

for inferno i wrote an http client library that turns a request
into an fd to read the data from. that fd has http chunking,
gzip, ssl peeled off. now i can pass the fd with the http
response to other programs, do buffered i/o on it, etc. this is
implemented in user-space btw, with inferno's sys->file2chan (as
opposed to pipes, you can do error message propagation over
file2chan's).

since file descriptors are so essential, it may help to have "tools"
to use them. yesterday evening i hacked up devbuf.c and devjoin.c
after reading this thread. both offer a file "new". for devbuf.c
you can write data to it, then later consume it (yes, you could just
use a pipe instead). for devjoin.c, you can write fd numbers (of
open files) to register an fd, then later reads will get data from
the first registered file, when that returns 0 it continues on the
next, and so on. so fd's can be chained for reading (not writing).
i know this "join" functionality is different from what sam
originally described.

i've attached devbuf.c and devjoin.c, as example (for inferno).
they have bugs (don't assign qid.path, probably *walk is broken
too). testbufjoin.b is an example of how the dev's can be used.
it creates a new fd that has a buffer at the front (e.g. leftovers
from http header reading), then continues on stdin (where the
leftover may have come from). then it reads the new fd and writes
its data to stdout.

these devices are not for performance. perhaps they make working
with one of the most basic OS concepts (fd's) a bit easier. but
perhaps this problem is not common enough, or can be handled (with
fd's preferably) in a better way.

mjl

[-- Attachment #2: devbuf.c --]
[-- Type: text/x-csrc, Size: 1729 bytes --]

#include "dat.h"
#include "fns.h"
#include "error.h"

/* data written to "new", with s..e marking the part not yet read */
typedef struct Buffile Buffile;
struct Buffile
{
	uchar	*p;
	int	s;
	int	e;
};

enum {
	Qdir,
	Qbuffile,
};

Dirtab bufdir[] = {
	".",	{Qdir,0,QTDIR},	0,	DMDIR|0500,
	"new",	{Qbuffile},	0,	0660,
};

static Buffile*
buffilealloc(uchar *p, int n)
{
	Buffile *b;

	/* header and data share one allocation */
	b = malloc(sizeof b[0]+n);
	b->p = (uchar*)b+sizeof b[0];
	memmove(b->p, p, n);
	b->s = 0;
	b->e = n;
	return b;
}

static Chan*
bufattach(char *spec)
{
	return devattach('β', spec);
}

static Walkqid*
bufwalk(Chan *c, Chan *nc, char **name, int nname)
{
	return devwalk(c, nc, name, nname, bufdir, nelem(bufdir), devgen);
}

static int
bufstat(Chan *c, uchar *db, int n)
{
	return devstat(c, db, n, bufdir, nelem(bufdir), devgen);
}

static Chan*
bufopen(Chan *c, int omode)
{
	return devopen(c, omode, bufdir, nelem(bufdir), devgen);
}

static void
bufclose(Chan *c)
{
	free(c->aux);
	c->aux = nil;
}

static long
bufread(Chan *c, void *va, long n, vlong off)
{
	Buffile *b;
	int have;

	if(c->qid.type == QTDIR)
		return devdirread(c, va, n, bufdir, nelem(bufdir), devgen);

	b = c->aux;
	if(b == nil)
		return 0;
	USED(off);
	have = b->e - b->s;
	if(have < n || n < 0)
		n = have;
	/* copy from the unread position, not from the start of the buffer */
	memmove(va, b->p + b->s, n);
	b->s += n;
	return n;
}

static long
bufwrite(Chan *c, void *va, long n, vlong off)
{
	if(c->qid.type == QTDIR)
		error(Eisdir);

	/* each write replaces whatever was buffered before */
	free(c->aux);
	c->aux = buffilealloc(va, n);
	return n;
}

Dev bufdevtab = {
	'β',
	"buf",

	devinit,
	bufattach,
	bufwalk,
	bufstat,
	bufopen,
	devcreate,
	bufclose,
	bufread,
	devbread,
	bufwrite,
	devbwrite,
	devremove,
	devwstat,
};

[-- Attachment #3: devjoin.c --]
[-- Type: text/x-csrc, Size: 2037 bytes --]

#include "dat.h"
#include "fns.h"
#include "error.h"

/* linked list of registered channels, read in order */
typedef struct Join Join;
struct Join
{
	Chan	*c;
	Join	*next;
};

enum {
	Qdir,
	Qjoinfile,
};

Dirtab joindir[] = {
	".",	{Qdir,0,QTDIR},	0,	DMDIR|0500,
	"new",	{Qjoinfile},	0,	0660,
};

static void
joinfree(Join *j)
{
	if(j == nil)
		return;
	joinfree(j->next);
	cclose(j->c);
	free(j);
}

static Chan*
joinattach(char *spec)
{
	return devattach('δ', spec);
}

static Walkqid*
joinwalk(Chan *c, Chan *nc, char **name, int nname)
{
	return devwalk(c, nc, name, nname, joindir, nelem(joindir), devgen);
}

static int
joinstat(Chan *c, uchar *db, int n)
{
	return devstat(c, db, n, joindir, nelem(joindir), devgen);
}

static Chan*
joinopen(Chan *c, int omode)
{
	return devopen(c, omode, joindir, nelem(joindir), devgen);
}

static void
joinclose(Chan *c)
{
	joinfree(c->aux);
	c->aux = nil;
}

static long
joinread(Chan *c, void *va, long n, vlong off)
{
	Join *j;
	long l;

	if(c->qid.type == QTDIR)
		return devdirread(c, va, n, joindir, nelem(joindir), devgen);

	/* read from the first registered chan; on eof drop it and try the next */
	l = 0;
	while(c->aux != nil) {
		j = c->aux;
		l = devtab[j->c->type]->read(j->c, va, n, off);
		if(l != 0)
			break;
		c->aux = j->next;
		cclose(j->c);
		free(j);
	}
	return l;
}

static long
joinwrite(Chan *c, void *va, long n, vlong off)
{
	char buf[32];
	int fd;
	Chan *jc;
	Join *j;
	Join *nj;

	if(c->qid.type == QTDIR)
		error(Eisdir);

	/* the write holds an fd number as text; leave room for the terminating NUL */
	if(n >= sizeof buf)
		error(Ebadarg);
	memmove(buf, va, n);
	buf[n] = 0;
	fd = atoi(buf);
	jc = fdtochan(up->env->fgrp, fd, -1, 0, 1);

	/* append the chan to the end of the list */
	nj = malloc(sizeof nj[0]);
	nj->c = jc;
	nj->next = nil;
	if(c->aux == nil) {
		c->aux = nj;
	} else {
		for(j = c->aux; j->next != nil; j = j->next)
			{}
		j->next = nj;
	}
	return n;
}

Dev joindevtab = {
	'δ',
	"join",

	devinit,
	joinattach,
	joinwalk,
	joinstat,
	joinopen,
	devcreate,
	joinclose,
	joinread,
	devbread,
	joinwrite,
	devbwrite,
	devremove,
	devwstat,
};

[-- Attachment #4: testbufjoin.b --]
[-- Type: chemical/x-molconn-Z, Size: 1303 bytes --]

^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions 2009-12-07 12:06 ` Mechiel Lukkien @ 2009-12-07 12:31 ` roger peppe 0 siblings, 0 replies; 46+ messages in thread From: roger peppe @ 2009-12-07 12:31 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs

2009/12/7 Mechiel Lukkien <mechiel@xs4all.nl>:
> i've attached devbuf.c and devjoin.c, as example (for inferno).

[saw this just after i'd posted]

that's funny - you even chose the same device character for devbuf!

to be honest, your devbuf.c is almost synonymous with a pipe.
for buffer sizes of <64K, writes on a pipe don't block.

^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions 2009-12-05 18:24 ` Tim Newsham 2009-12-05 19:47 ` Bakul Shah 2009-12-07 12:06 ` Mechiel Lukkien @ 2010-01-05 13:48 ` Enrico Weigelt 2010-01-05 15:53 ` Steve Simon 2 siblings, 1 reply; 46+ messages in thread From: Enrico Weigelt @ 2010-01-05 13:48 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs

* Tim Newsham <newsham@lava.net> wrote:

> ps. if you wanted to hide this ugliness of passing a buffer and
> fd to a child process instead of just passing an fd, you could
> still solve it in userland without a syscall. Write a library
> that does buffered IO. Include unget() if you like. Write the
> library in a way that you can initialize it after a fork/exec
> to pick up state from the parent (ie. by taking two fds,
> reading the buffer from the first, and continuing on with the
> 2nd when it is exhausted).

Not sure how things work on Plan9, but on GNU/Linux you could
even use LD_PRELOAD to overlay the read() libc function to
hide that magic, or even tweak libc for that.

BTW: what do you in general think about having tweaked libc's
instead of all these "cross-platform libraries"? For example,
I'm thinking about whether it's worth changing uclibc in a way
that allows plugging in userland-vfs'es.

cu
--
----------------------------------------------------------------------
 Enrico Weigelt, metux IT service -- http://www.metux.de/

 phone:  +49 36207 519931  email: weigelt@metux.de
 mobile: +49 174 7066481   icq:   210169427         skype: nekrad666
----------------------------------------------------------------------
 Embedded-Linux / Porting / Opensource-QM / Distributed Systems
----------------------------------------------------------------------

^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [9fans] ideas for helpful system io functions 2010-01-05 13:48 ` Enrico Weigelt @ 2010-01-05 15:53 ` Steve Simon 0 siblings, 0 replies; 46+ messages in thread From: Steve Simon @ 2010-01-05 15:53 UTC (permalink / raw) To: weigelt, 9fans

> I'm thinking about whether it's worth to change uclibc in a way
> that it allows to plug-in userland-vfs'es.

Nothing new under the sun I'm afraid, however if you go ahead
this might be interesting.

	http://www.cs.ncl.ac.uk/publications/articles/papers/399.pdf

-Steve

^ permalink raw reply [flat|nested] 46+ messages in thread