9fans - fans of the OS Plan 9 from Bell Labs
 help / color / mirror / Atom feed
* Re: [9fans] wait hang
@ 2006-12-20  3:55 erik quanstrom
  0 siblings, 0 replies; 8+ messages in thread
From: erik quanstrom @ 2006-12-20  3:55 UTC (permalink / raw)
  To: rsc, quanstro, 9fans

there's a version of rc on sources (/n/sources/contrib/quanstro/rc-wait.tbz)
that keeps careful track of waits so "wait $pid" should be dependable,
at least in the basic approach, i did minimal minimal testing of the details.
there is somewhat verbose information printed along these lines:

	; sleep & sleep & sleep & sleep
	waitqadd(772665)
	waitqadd(772666)
	waitqadd(772667)
	waitqadd(772668)
	waitfor 772668
	  w->pid 772668
	waitqrm(772668)
	Xreturn i=0; p 205256, i=1
	broken! wait 772665
	waitfor 772665
	  w->pid 772667
	  w->pid 772666
	  w->pid 772665
	waitqrm(772665)
	Xreturn i=0; p 205256, i=1
	broken! wait 772666
	waitfor 772666
	Xreturn i=0; p 205256, i=1
	broken! 

this output can be supressed by redefining the wdebug macro in exec.h.

the previous case of
	; sleep $longtime &
	; echo&echo&echo&
	; wait $apid

should illustrate the difference.

- erik



^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: [9fans] wait hang
@ 2006-12-18 20:45 erik quanstrom
  0 siblings, 0 replies; 8+ messages in thread
From: erik quanstrom @ 2006-12-18 20:45 UTC (permalink / raw)
  To: rsc, quanstro, 9fans

if that scan through the runq fails (it always will, since runq->ret 
is never set). then the wait message will be lost by Waitfor unless the
first wait message returned from the kernel happens to be the one we're
looking for.

perhaps what i wrote wasn't clear.

it's not so bad dropping the kernel's wait message, but in order for "wait $pid"
to work as advertised, Xasync would need to add $apid to a list and Waitfor
would need to remove it.  this way, if the waitfor list is nil, we just complain
there's nothing to wait for instead of hanging.  if it's not nil, then if the wait
completed, we return otherwise we find the right member of the list and note
that it completed.

the list scan that's there i'm pretty sure is for non-Xasync processes only.  p->ret
is referenced in Xreturn and Xstart and Waitfor only.

- erik

On Mon Dec 18 15:06:38 EST 2006, rsc@swtch.com wrote:
> > since 20 is much less than 128, this can't be the kernel.  i think the
> > problem is in the discarded wait messages here: rc/plan9.c/^Waitfor
> 
> Where do you see discarded wait messages?
> 
> > int
> > Waitfor(int pid, int)
> > {
> >         thread *p;
> >         Waitmsg *w;
> >         char errbuf[ERRMAX];
> >
> >         while((w = wait()) != nil){
> >                 if(w->pid==pid){
> >                         setstatus(w->msg);
> >                         free(w);
> >                         return 0;
> >                 }
> >                 for(p = runq->ret;p;p = p->ret)
> >                         if(p->pid==w->pid){
> >                                 p->pid=-1;
> >                                 strcpy(p->status, w->msg);
> >                         }
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: [9fans] wait hang
@ 2006-12-18 20:00 erik quanstrom
  0 siblings, 0 replies; 8+ messages in thread
From: erik quanstrom @ 2006-12-18 20:00 UTC (permalink / raw)
  To: rsc, 9fans

On Mon Dec 18 14:27:31 EST 2006, rsc@swtch.com wrote:
> I'm not really convinced this part of rc ever worked right to
> begin with, so this is no big loss.

you're right.  this feature of rc doesn't work as expected.  and your example 
illustrates this well.  for example if we try 20 wait records instead of 128:

	; sleep 1000000 &
	; pids=()
	; for(i in `{seq 20}){
		  echo foo >/dev/null &
		  pids=($pids $apid)
		}
	; for(p in $pids){
			echo $p
		  	wait $p
		}
	4627
	[hang]

since 20 is much less than 128, this can't be the kernel.  i think the
problem is in the discarded wait messages here: rc/plan9.c/^Waitfor

int
Waitfor(int pid, int)
{
	thread *p;
	Waitmsg *w;
	char errbuf[ERRMAX];

	while((w = wait()) != nil){
		if(w->pid==pid){
			setstatus(w->msg);
			free(w);
			return 0;
		}
		for(p = runq->ret;p;p = p->ret)
			if(p->pid==w->pid){
				p->pid=-1;
				strcpy(p->status, w->msg);
			}
		free(w);
	}

	errstr(errbuf, sizeof errbuf);
	if(strcmp(errbuf, "interrupted")==0) return -1;
	return 0;
}

byron's rc solves this problem by maintaining it's own waitlist in $apids.

	; for(i in 1 2 3 4) { echo fu>/dev/null&}
	31427
	31428
	31429
	31430
	; echo $apids
	31427 31428 31429 31430
	; echo $apids
	31427 31428 31429 31430
	; wait $apids(1)
	; echo $apids
	31428 31429 31430
	; wait $apids(3)
	; echo $apids
	
	; wait 31428
	;

i would think the right way to go may be to either remove
"wait $pid" from rc's vocabulary or do something like byron's rc.

- erik


^ permalink raw reply	[flat|nested] 8+ messages in thread
* [9fans] wait hang
@ 2006-12-18 16:01 erik quanstrom
  2006-12-18 16:40 ` Russ Cox
  0 siblings, 1 reply; 8+ messages in thread
From: erik quanstrom @ 2006-12-18 16:01 UTC (permalink / raw)
  To: 9fans

the kernel fix does indeed keep
	; for(i in `{seq 1 $n}){>/dev/null echo fu&}
	; for(i in `{seq 1 $n}){>/dev/null echo fu&}
from hanging.

however, i think the fix to rc might be in error.  if i do this

	#!/bin/rc
	cmd1&
	cmd2&
	wait

shouldn't that wait for both processes?  for example, on an older
rc, from this script

	#/bin/rc
	date
	sleep 15 &
	sleep 30 &
	wait
	date

i get

	Mon Dec 18 10:57:43 EST 2006
	Mon Dec 18 10:58:13 EST 2006

whereas with the rc from friday, i get

	; 8.out
	broken! date ; sleep 15 & sleep 30 & wait ; date
	Mon Dec 18 11:00:33 EST 2006
	Mon Dec 18 11:00:33 EST 2006

- erik


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2006-12-20  3:55 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <db450a2ec40aeb8284b66f00407cb1cb@coraid.com>
2006-12-18 20:06 ` [9fans] wait hang Russ Cox
2006-12-20  3:55 erik quanstrom
  -- strict thread matches above, loose matches on Subject: below --
2006-12-18 20:45 erik quanstrom
2006-12-18 20:00 erik quanstrom
2006-12-18 16:01 erik quanstrom
2006-12-18 16:40 ` Russ Cox
2006-12-18 19:05   ` erik quanstrom
2006-12-18 19:23     ` Russ Cox

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).