From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <dd6fe68a0901042220y278aa283g87f42d2eb458e617@mail.gmail.com>
Date: Sun,  4 Jan 2009 22:20:55 -0800
From: "Russ Cox" <rsc@swtch.com>
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@9fans.net>
Subject: Re: [9fans] Why do we need syspipe() ?
In-Reply-To: <1231130372.11463.433.camel@goose.sun.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <1231045486.11463.245.camel@goose.sun.com>
	 <dd6fe68a0901032256i7a40c196odcc522a35202caea@mail.gmail.com>
	 <1231130372.11463.433.camel@goose.sun.com>
Topicbox-Message-UUID: 799aaeba-ead4-11e9-9d60-3106f5b1d025

>> I don't believe you can write a race-free implementation of
>> the pipe system call using #|.
>
> Could you, please, elaborate on what particular race do you have
> in mind? Indeed, I ran into a problem with devpipe implementation,
> but it isn't a race, its a dreaded implicit ->attach that namec()
> does when it evaluates names with the first character being #.

The closest you can come in user space to implementing pipe is:

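#include <u.h>
#include <libc.h>

int
pipe(int *fd)
{
	if(bind("#|", "/mnt", MREPL) < 0)	/* each attach of #| is a new pipe */
		return -1;
	fd[0] = open("/mnt/data", ORDWR);
	fd[1] = open("/mnt/data1", ORDWR);
	unmount(nil, "/mnt");
	if(fd[0] < 0 || fd[1] < 0){
		close(fd[0]);
		close(fd[1]);
		return -1;
	}
	return 0;
}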

but if there are multiple processes running pipe()
in the same name space, the binds will step
on each other and the pipes might get crossed.
Even if not, maybe something else was already
mounted on /mnt (or whatever mount point you
choose), and now there's nothing there.

>> I also don't believe you can implement the dup system call
>> (remember, it has two arguments) using #d.
>
> Agreed. That's why I mentioned that a more feature-rich devdup
> is needed. Of course, now I've also discovered that the current
> implementation of devpipe is also not sufficient enough for me
> to be able to produce a 100% user-space version of pipe(2).

Sorry, I thought you were saying that #d was already more
feature rich than dup (it is, in a way, since it has the ctl files now).
I was trying to say that although that is true, it doesn't have the
dup features.

There are some devices in Plan 9 that simply don't "virtualize",
because at a deep level they are tied to process state that
doesn't go through the file system.  Dup manipulates the file
descriptor table, not files themselves.  Pipe accesses files that
have no name in the file system.  The pid returned by getpid
needs to match the pid returned by the parent's fork; it really
needs to be the process's actual pid.  For example, if getpid
read from /dev/pid instead of #c/pid, then running
"iostats rc -c 'echo $pid'"
would show iostats's pid, not rc's.  What then if rc wants to send
itself (or, more likely, its note group) a note, or fiddle with
one of its /proc files?  It would be manipulating iostats, not
itself.
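
To make the getpid requirement concrete, here is a minimal
sketch in POSIX C.  It is the Unix analogue, not Plan 9 code,
and pid_matches_fork is a name made up for this example.  The
child reports what getpid tells it over a pipe, and the parent
compares that with what fork returned:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * POSIX analogue, not Plan 9 code; the function name is hypothetical.
 * Returns 1 if the pid the child gets from getpid() equals the pid
 * that fork() returned to the parent, 0 if not, and -1 on error.
 */
int
pid_matches_fork(void)
{
	int p[2];
	pid_t child, reported;
	ssize_t n;

	if(pipe(p) < 0)
		return -1;
	child = fork();
	if(child < 0){
		close(p[0]);
		close(p[1]);
		return -1;
	}
	if(child == 0){
		/* child: tell the parent what getpid says about us */
		pid_t self = getpid();

		close(p[0]);
		if(write(p[1], &self, sizeof self) != (ssize_t)sizeof self)
			_exit(1);
		_exit(0);
	}
	/* parent: read the child's answer, then reap the child */
	close(p[1]);
	n = read(p[0], &reported, sizeof reported);
	close(p[0]);
	waitpid(child, NULL, 0);
	if(n != (ssize_t)sizeof reported)
		return -1;
	return reported == child;
}
```

If something interposed on the source of the pid and answered
with its own, this comparison is the check that would fail.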

A write to devsrv is even more magical: when you write "23"
to #s/newfile, your process's fd 23 gets taken over by the
kernel.  For this reason you can't use iostats on any program
that writes to /srv/newfile instead of #s/newfile--when the program
writes "23", the kernel sees the request come from iostats
instead of the original program, and it takes over the wrong fd.
(Most of those programs are 9P servers that fork into the
background, and iostats isn't too useful on those anyway,
so no one has bothered to address this.)

The tls and ssl devices, #a and #D, use the same trick,
so you can't interpose on traffic destined to them.  Happily,
the libraries use #a and #D directly, so using iostats on them
simply misses that i/o rather than causing the program to
execute incorrectly.

You could add a special message to #d to make the dup
system call unnecessary, but it wouldn't be any cleaner than
having the system call, since you'd have to hard code #d
instead of using /fd, or else you'd have the same problems
as #s, #a, and #D already do.
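
For reference, this is the kind of file descriptor table
manipulation the two-argument dup does, sketched in POSIX C.
The assumptions: dup2(old, new) stands in for Plan 9's
dup(old, new), and dup_into_slot is a name made up for this
example.  The caller picks the slot, and the duplicate keeps
working after the original descriptor is closed, because the
call rearranged the table rather than touching any file:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

/*
 * POSIX analogue, not Plan 9 code; the function name is hypothetical.
 * Installs the write end of a fresh pipe in slot newfd, closes the
 * original descriptor, and checks that a byte written through newfd
 * arrives at the read end.  Returns 1 on success, 0 if the byte did
 * not arrive, and -1 on error.
 */
int
dup_into_slot(int newfd)
{
	int p[2];
	char c = 0;
	int ok;

	if(pipe(p) < 0)
		return -1;
	/* refuse slots that collide with the pipe's own descriptors */
	if(p[0] == newfd || p[1] == newfd || dup2(p[1], newfd) != newfd){
		close(p[0]);
		close(p[1]);
		return -1;
	}
	close(p[1]);	/* the duplicate keeps the write end open */
	ok = write(newfd, "x", 1) == 1
		&& read(p[0], &c, 1) == 1
		&& c == 'x';
	close(newfd);
	close(p[0]);
	return ok;
}
```

Calling dup_into_slot(23) exercises slot 23.  Note that dup2
silently closes whatever was already open in the chosen slot.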

The # device syntax is very useful to mean the kernel device
and none other in these situations.  There's definitely
something unsatisfactory about it, but it works.

Russ