Re: [9fans] Potential race in syspipe()

9fans - fans of the OS Plan 9 from Bell Labs
 help / color / mirror / Atom feed

* Re: [9fans] Potential race in syspipe()
@ 2000-08-14  4:30 presotto
  2000-08-14  4:50 ` Alexander Viro
  0 siblings, 1 reply; 8+ messages in thread
From: presotto @ 2000-08-14  4:30 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 2423 bytes --]

If the user is doing a dup2 unsynchronized with the syspipe(),
even if everything works, the results are wierd with both solutions.
Both the pipe() and the dup2() can succeed, except the pipe()
can be returning fd's that have nothing to do with a pipe.
As you said, dup2 is an dangerous thing when the fd's are shared.

However, you're right.  Your solution yields a more explainable
world in the unlikely event that newfd() fails in syspipe().
I put in a newfd2().  The exact code follows.  I added the
routine findfreefd() since I want to keep that piece of
magic in one place.  I noticed your snippet didn't check the
need to growfd after getting i, perhaps a good reason for
findfreefd().

/*
 *  this assumes that the fgrp is locked
 */
int
findfreefd(Fgrp *f, int start)
{
	int fd;

	for(fd=start; fd<f->nfd; fd++)
		if(f->fd[fd] == 0)
			break;
	if(fd >= f->nfd && growfd(f, fd) < 0)
		return -1;
	return fd;
}

int
newfd(Chan *c)
{
	int fd;
	Fgrp *f;

	f = up->fgrp;
	lock(f);
	fd = findfreefd(f, 0);
	if(fd < 0){
		unlock(f);
		return -1;
	}
	if(fd > f->maxfd)
		f->maxfd = fd;
	f->fd[fd] = c;
	unlock(f);
	return fd;
}

int
newfd2(int fd[2], Chan *c[2])
{
	Fgrp *f;

	f = up->fgrp;
	lock(f);
	fd[0] = findfreefd(f, 0);
	if(fd[0] < 0){
		unlock(f);
		return -1;
	}
	fd[1] = findfreefd(f, fd[0]+1);
	if(fd[1] < 0){
		unlock(f);
		return -1;
	}
	if(fd[1] > f->maxfd)
		f->maxfd = fd[1];
	f->fd[fd[0]] = c[0];
	f->fd[fd[1]] = c[1];
	unlock(f);

	return 0;
}

...

long
syspipe(ulong *arg)
{
	int fd[2];
	Chan *c[2];
	Dev *d;

	validaddr(arg[0], 2*BY2WD, 1);
	evenaddr(arg[0]);
	d = devtab[devno('|', 0)];
	c[0] = namec("#|", Atodir, 0, 0);
	c[1] = 0;
	fd[0] = -1;
	fd[1] = -1;

	if(waserror()){
		cclose(c[0]);
		if(c[1])
			cclose(c[1]);
		nexterror();
	}
	c[1] = cclone(c[0], 0);
	if(walk(&c[0], "data", 1) < 0)
		error(Egreg);
	if(walk(&c[1], "data1", 1) < 0)
		error(Egreg);
	c[0] = d->open(c[0], ORDWR);
	c[1] = d->open(c[1], ORDWR);
	if(newfd2(fd, c) < 0)
		error(Enofd);
	poperror();

	((long*)arg[0])[0] = fd[0];
	((long*)arg[0])[1] = fd[1];
	return 0;
}

As for create(), I'ld rather change the man page.  Otherwise we'ld
have to reserve an fd in some way before the open but not have that
fd visible to other processes until it was assinged a channel
or let it be step on-able by dup2...

[-- Attachment #2: Type: message/rfc822, Size: 4041 bytes --]

From: Alexander Viro <viro@math.psu.edu>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] Potential race in syspipe()
Date: Sun, 13 Aug 2000 22:42:05 -0400 (EDT)
Message-ID: <Pine.GSO.4.10.10008132154210.194-100000@weyl.math.psu.edu>



On Sun, 13 Aug 2000 presotto@plan9.bell-labs.com wrote:

> You are right, it is rookie programming, probably mine.  Your statement:
> 
> 	_anything_ that goes into ->fd[] must be treated as final
> 	as soon as you release the lock on ->fgrp
> 
> is the crux of the matter.
> 
> Your solution is reasonable.  I would probably do something a little
> dirtier to avoid duplicating the functionality of newfd().  It would be
> just to make the waserror() a little smarter in syspipe():
> 
> 	if(waserror()){
> 		if(fd[0] >= 0)
> 			fdclose(fd[0])
> 		else
> 			cclose(c[0]);
> 		if(fd[1] >= 0)
> 			fdclose(fd[1]);
> 		else if(c[1] != nil)
> 			cclose(c[1]);
> 		nexterror();
> 	}
> 
> and leave the rest alone.

Umm... You are handling that race to userland - effect of the race
scenario will be
Process A: pipe() returns -1
Process B: dup(foo, bar) returns bar, but fd[bar]==NULL

OTOH the race _was_ in the userland from the very beginning, so it may be
OK. Sigh... dup2() semantics seems to be ill-suited for systems that can
share descriptor table between several processes. Probably Plan 9 was the
first system that had to deal with that (IIRC Linux got clone() in '94 or
'95 and *BSD got rfork() pretty recently). Were there any attempts to
provide something, erm, less race-prone than dup2()?

	BTW, looks like create(2) (and creat() in APE) have an interesting
misfeature/bug/whatever: suppose that file creation succeeds but newfd()
fails. Then you are getting Enofd, but file is already created (and stays
alive, indeed). For APE it's a bug - directly violates POSIX. For the
native code... Hell knows, but it seems to be at least worth mentioning in
the manpage.

	I realize very well that changing that would be painful - the best
strategy is probably to ask for space at the very beginning and keep the
number of pending allocations (i.e. make sure that newfd() never fails).
Yuck. I'll probably try to see what this approach gives on our side, but
I'm not sure that it will be any better than our current code (right now
we are using an equivalent of falloc(9); since we keep a bitmap of
allocated descriptors we can see if dup2() tries to stick its nose into
the pending slot). newfd() equivalent had been tried some time ago, but
back then it was rejected due to problems with creat(2).
	At least you don't have to deal with bloody SCM_RIGHTS
file-passing...

								Al

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [9fans] Potential race in syspipe()
  2000-08-14  4:30 [9fans] Potential race in syspipe() presotto
@ 2000-08-14  4:50 ` Alexander Viro
  0 siblings, 0 replies; 8+ messages in thread
From: Alexander Viro @ 2000-08-14  4:50 UTC (permalink / raw)
  To: 9fans



On Mon, 14 Aug 2000 presotto@plan9.bell-labs.com wrote:

> magic in one place.  I noticed your snippet didn't check the
> need to growfd after getting i, perhaps a good reason for
> findfreefd().

	Didn't have to, actually - if the first for() would reach the end
of array we would have i==f->nfd and the second for() would immediately
terminate with j==f->nfd+1. After that we would get the right call of
growfd() and correct values at once.

	But yes, findfreefd() makes the things more obviously correct.

[snip]

> As for create(), I'ld rather change the man page.  Otherwise we'ld
> have to reserve an fd in some way before the open but not have that
> fd visible to other processes until it was assinged a channel
> or let it be step on-able by dup2...

<nods>

							Al



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [9fans] Potential race in syspipe()
  2000-08-16 11:58     ` Alexander Viro
@ 2000-08-16 12:36       ` Boyd Roberts
  0 siblings, 0 replies; 8+ messages in thread
From: Boyd Roberts @ 2000-08-16 12:36 UTC (permalink / raw)
  To: 9fans

>  Unix sucks
> now days anyway (damn, I'm starting to sound like Oleg....)

``Not only is UNIX dead, it's starting to smell really bad''.

    -- rob, circa 1992




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [9fans] Potential race in syspipe()
  2000-08-15 19:42   ` Dan Cross
@ 2000-08-16 11:58     ` Alexander Viro
  2000-08-16 12:36       ` Boyd Roberts
  0 siblings, 1 reply; 8+ messages in thread
From: Alexander Viro @ 2000-08-16 11:58 UTC (permalink / raw)
  To: 9fans



On Tue, 15 Aug 2000, Dan Cross wrote:

> So why is RedHat interested in Plan 9?  :-)

Redhat is interested? I didn't know that...

> btw- BSD has had rfork() since at least '96, if not longer.  It
> was just never used.  Scott Schwartz and I were at one point going
> to look into make it more robust, but never bothered.

Ugh... I've looked at the *BSD side. FreeBSD-CURRENT handling of shared
descriptor tables is choke-full of panicable races. OpenBSD is more or
less free of the obvious ones, but they still have open()/dup2() races.

FreeBSD is roughly at the stage we were in Spring '99 before the large
VFS threading - I had to go through the upper layers anyway, so cleaning
them up and closing the races was an obvious idea. Several races survived
- the last group had been hunted down and shot quite recently (sitting
down to write description gave usual results). OpenBSD is somewhat between
FreeBSD and Linux in that respect. I didn't do the full analysis of Plan 9
handling of the same area, but the quick check showed only a dubious place
in syspipe() - everything else seems to be fine. Constructing a race once
you know what place needs attention... <shrug> SOP.

>  Unix sucks
> now days anyway (damn, I'm starting to sound like Oleg....)

psu.flame is ---> that way. Umm... Maybe a good idea - place seems to be
quite dead lately...



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [9fans] Potential race in syspipe()
  2000-08-14  2:42 ` Alexander Viro
@ 2000-08-15 19:42   ` Dan Cross
  2000-08-16 11:58     ` Alexander Viro
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Cross @ 2000-08-15 19:42 UTC (permalink / raw)
  To: 9fans

In article <Pine.GSO.4.10.10008132154210.194-100000@weyl.math.psu.edu> you write:
>	I realize very well that changing that would be painful - the best
>strategy is probably to ask for space at the very beginning and keep the
>number of pending allocations (i.e. make sure that newfd() never fails).
>Yuck. I'll probably try to see what this approach gives on our side, but
>I'm not sure that it will be any better than our current code (right now
>we are using an equivalent of falloc(9); since we keep a bitmap of
>allocated descriptors we can see if dup2() tries to stick its nose into
>the pending slot). newfd() equivalent had been tried some time ago, but
>back then it was rejected due to problems with creat(2).
>	At least you don't have to deal with bloody SCM_RIGHTS
>file-passing...

So why is RedHat interested in Plan 9?  :-)

btw- BSD has had rfork() since at least '96, if not longer.  It
was just never used.  Scott Schwartz and I were at one point going
to look into make it more robust, but never bothered.  Unix sucks
now days anyway (damn, I'm starting to sound like Oleg....)

	- Dan C.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [9fans] Potential race in syspipe()
  2000-08-14  1:47 presotto
@ 2000-08-14  2:42 ` Alexander Viro
  2000-08-15 19:42   ` Dan Cross
  0 siblings, 1 reply; 8+ messages in thread
From: Alexander Viro @ 2000-08-14  2:42 UTC (permalink / raw)
  To: 9fans



On Sun, 13 Aug 2000 presotto@plan9.bell-labs.com wrote:

> You are right, it is rookie programming, probably mine.  Your statement:
> 
> 	_anything_ that goes into ->fd[] must be treated as final
> 	as soon as you release the lock on ->fgrp
> 
> is the crux of the matter.
> 
> Your solution is reasonable.  I would probably do something a little
> dirtier to avoid duplicating the functionality of newfd().  It would be
> just to make the waserror() a little smarter in syspipe():
> 
> 	if(waserror()){
> 		if(fd[0] >= 0)
> 			fdclose(fd[0])
> 		else
> 			cclose(c[0]);
> 		if(fd[1] >= 0)
> 			fdclose(fd[1]);
> 		else if(c[1] != nil)
> 			cclose(c[1]);
> 		nexterror();
> 	}
> 
> and leave the rest alone.

Umm... You are handling that race to userland - effect of the race
scenario will be
Process A: pipe() returns -1
Process B: dup(foo, bar) returns bar, but fd[bar]==NULL

OTOH the race _was_ in the userland from the very beginning, so it may be
OK. Sigh... dup2() semantics seems to be ill-suited for systems that can
share descriptor table between several processes. Probably Plan 9 was the
first system that had to deal with that (IIRC Linux got clone() in '94 or
'95 and *BSD got rfork() pretty recently). Were there any attempts to
provide something, erm, less race-prone than dup2()?

	BTW, looks like create(2) (and creat() in APE) have an interesting
misfeature/bug/whatever: suppose that file creation succeeds but newfd()
fails. Then you are getting Enofd, but file is already created (and stays
alive, indeed). For APE it's a bug - directly violates POSIX. For the
native code... Hell knows, but it seems to be at least worth mentioning in
the manpage.

	I realize very well that changing that would be painful - the best
strategy is probably to ask for space at the very beginning and keep the
number of pending allocations (i.e. make sure that newfd() never fails).
Yuck. I'll probably try to see what this approach gives on our side, but
I'm not sure that it will be any better than our current code (right now
we are using an equivalent of falloc(9); since we keep a bitmap of
allocated descriptors we can see if dup2() tries to stick its nose into
the pending slot). newfd() equivalent had been tried some time ago, but
back then it was rejected due to problems with creat(2).
	At least you don't have to deal with bloody SCM_RIGHTS
file-passing...

								Al



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [9fans] Potential race in syspipe()
@ 2000-08-14  1:47 presotto
  2000-08-14  2:42 ` Alexander Viro
  0 siblings, 1 reply; 8+ messages in thread
From: presotto @ 2000-08-14  1:47 UTC (permalink / raw)
  To: 9fans

[-- Attachment #1: Type: text/plain, Size: 627 bytes --]

You are right, it is rookie programming, probably mine.  Your statement:

	_anything_ that goes into ->fd[] must be treated as final
	as soon as you release the lock on ->fgrp

is the crux of the matter.

Your solution is reasonable.  I would probably do something a little
dirtier to avoid duplicating the functionality of newfd().  It would be
just to make the waserror() a little smarter in syspipe():

	if(waserror()){
		if(fd[0] >= 0)
			fdclose(fd[0])
		else
			cclose(c[0]);
		if(fd[1] >= 0)
			fdclose(fd[1]);
		else if(c[1] != nil)
			cclose(c[1]);
		nexterror();
	}

and leave the rest alone.

[-- Attachment #2: Type: message/rfc822, Size: 3555 bytes --]

From: Alexander Viro <viro@math.psu.edu>
To: 9fans@cse.psu.edu
Subject: [9fans] Potential race in syspipe()
Date: Fri, 11 Aug 2000 07:20:14 -0400 (EDT)
Message-ID: <Pine.GSO.4.10.10008110640100.194-100000@weyl.math.psu.edu>


	Umm... Unless I'm seriously misreading the code, there is a race
between syspipe() and sysdup(). Scenario:

CPU1 (in syspipe()):
        c[0] = d->open(c[0], ORDWR);
        c[1] = d->open(c[1], ORDWR);
        fd[0] = newfd(c[0]);
/* got so far */
        fd[1] = newfd(c[1]);
/* failed on that one for whatever reason */

CPU2 (in sysdup()):
		...
                lock(f);
		...
                oc = f->fd[fd];
                f->fd[fd] = c;
                unlock(f);
                if(oc)
                        cclose(oc);
/* where fd happens to be equal to fd[0] on the CPU1 */
/* oc == CPU1's c[0], indeed */

CPU1:
        if(fd[0] < 0 || fd[1] < 0)
                error(Enofd);
/* there we go */
                cclose(c[0]);
/* woops - extra cclose on c[0] and we are welcome to... */
        if(c->flag&CFREE)
                panic("cclose %lux", getcallerpc(&c));


Giving user enough rope to hang himself is a good idea, but panic() goes
somewhat beyond "himself", especially when triggered on CPU server...
AFAICS the cleanish solution would be along the lines of

int
newfd2(int fd[2], Chan *c[2])
{
        Fgrp *f;
	int i,j;

        f = up->fgrp;
        lock(f);
        for(i=0; i<f->nfd; i++)
                if(f->fd[i] == 0)
                        break;
	for(j=i+1; j<f->nfd; j++)
		if (f->fd[j] == 0)
			break;
      
        if(j >= f->nfd && growfd(f, j) < 0){
                unlock(f);
                return -1;
        }
        if(j > f->maxfd)
                f->maxfd = j;
        f->fd[fd[0]=i] = c[0];
        f->fd[fd[1]=j] = c[1];
        unlock(f);
        return 0;
}

and using it in syspipe() (instead of two newfd() calls). That way the
error recovery in syspipe() doesn't have to touch anything that might be
anywhere near the ->fd[].

Comments? AFAICS _anything_ that goes into ->fd[] must be treated as final
as soon as you release the lock on ->fgrp (due to the dup2() issues).

							Al

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [9fans] Potential race in syspipe()
@ 2000-08-11 11:20 Alexander Viro
  0 siblings, 0 replies; 8+ messages in thread
From: Alexander Viro @ 2000-08-11 11:20 UTC (permalink / raw)
  To: 9fans


	Umm... Unless I'm seriously misreading the code, there is a race
between syspipe() and sysdup(). Scenario:

CPU1 (in syspipe()):
        c[0] = d->open(c[0], ORDWR);
        c[1] = d->open(c[1], ORDWR);
        fd[0] = newfd(c[0]);
/* got so far */
        fd[1] = newfd(c[1]);
/* failed on that one for whatever reason */

CPU2 (in sysdup()):
		...
                lock(f);
		...
                oc = f->fd[fd];
                f->fd[fd] = c;
                unlock(f);
                if(oc)
                        cclose(oc);
/* where fd happens to be equal to fd[0] on the CPU1 */
/* oc == CPU1's c[0], indeed */

CPU1:
        if(fd[0] < 0 || fd[1] < 0)
                error(Enofd);
/* there we go */
                cclose(c[0]);
/* woops - extra cclose on c[0] and we are welcome to... */
        if(c->flag&CFREE)
                panic("cclose %lux", getcallerpc(&c));


Giving user enough rope to hang himself is a good idea, but panic() goes
somewhat beyond "himself", especially when triggered on CPU server...
AFAICS the cleanish solution would be along the lines of

int
newfd2(int fd[2], Chan *c[2])
{
        Fgrp *f;
	int i,j;

        f = up->fgrp;
        lock(f);
        for(i=0; i<f->nfd; i++)
                if(f->fd[i] == 0)
                        break;
	for(j=i+1; j<f->nfd; j++)
		if (f->fd[j] == 0)
			break;
      
        if(j >= f->nfd && growfd(f, j) < 0){
                unlock(f);
                return -1;
        }
        if(j > f->maxfd)
                f->maxfd = j;
        f->fd[fd[0]=i] = c[0];
        f->fd[fd[1]=j] = c[1];
        unlock(f);
        return 0;
}

and using it in syspipe() (instead of two newfd() calls). That way the
error recovery in syspipe() doesn't have to touch anything that might be
anywhere near the ->fd[].

Comments? AFAICS _anything_ that goes into ->fd[] must be treated as final
as soon as you release the lock on ->fgrp (due to the dup2() issues).

							Al



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2000-08-16 12:36 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2000-08-14  4:30 [9fans] Potential race in syspipe() presotto
2000-08-14  4:50 ` Alexander Viro
  -- strict thread matches above, loose matches on Subject: below --
2000-08-14  1:47 presotto
2000-08-14  2:42 ` Alexander Viro
2000-08-15 19:42   ` Dan Cross
2000-08-16 11:58     ` Alexander Viro
2000-08-16 12:36       ` Boyd Roberts
2000-08-11 11:20 Alexander Viro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).