* Re: [9fans] wait hang
@ 2006-12-18 20:00 erik quanstrom
0 siblings, 0 replies; 8+ messages in thread
From: erik quanstrom @ 2006-12-18 20:00 UTC (permalink / raw)
To: rsc, 9fans
On Mon Dec 18 14:27:31 EST 2006, rsc@swtch.com wrote:
> I'm not really convinced this part of rc ever worked right to
> begin with, so this is no big loss.
you're right. this feature of rc doesn't work as expected. and your example
illustrates this well. for example if we try 20 wait records instead of 128:
; sleep 1000000 &
; pids=()
; for(i in `{seq 20}){
echo foo >/dev/null &
pids=($pids $apid)
}
; for(p in $pids){
echo $p
wait $p
}
4627
[hang]
since 20 is much less than 128, this can't be the kernel. i think the
problem is in the discarded wait messages here: rc/plan9.c/^Waitfor
int
Waitfor(int pid, int)
{
thread *p;
Waitmsg *w;
char errbuf[ERRMAX];
while((w = wait()) != nil){
if(w->pid==pid){
setstatus(w->msg);
free(w);
return 0;
}
for(p = runq->ret;p;p = p->ret)
if(p->pid==w->pid){
p->pid=-1;
strcpy(p->status, w->msg);
}
free(w);
}
errstr(errbuf, sizeof errbuf);
if(strcmp(errbuf, "interrupted")==0) return -1;
return 0;
}
byron's rc solves this problem by maintaining it's own waitlist in $apids.
; for(i in 1 2 3 4) { echo fu>/dev/null&}
31427
31428
31429
31430
; echo $apids
31427 31428 31429 31430
; echo $apids
31427 31428 31429 31430
; wait $apids(1)
; echo $apids
31428 31429 31430
; wait $apids(3)
; echo $apids
; wait 31428
;
i would think the right way to go may be to either remove
"wait $pid" from rc's vocabulary or do something like byron's rc.
- erik
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [9fans] wait hang
@ 2006-12-20 3:55 erik quanstrom
0 siblings, 0 replies; 8+ messages in thread
From: erik quanstrom @ 2006-12-20 3:55 UTC (permalink / raw)
To: rsc, quanstro, 9fans
there's a version of rc on sources (/n/sources/contrib/quanstro/rc-wait.tbz)
that keeps careful track of waits so "wait $pid" should be dependable,
at least in the basic approach, i did minimal minimal testing of the details.
there is somewhat verbose information printed along these lines:
; sleep & sleep & sleep & sleep
waitqadd(772665)
waitqadd(772666)
waitqadd(772667)
waitqadd(772668)
waitfor 772668
w->pid 772668
waitqrm(772668)
Xreturn i=0; p 205256, i=1
broken! wait 772665
waitfor 772665
w->pid 772667
w->pid 772666
w->pid 772665
waitqrm(772665)
Xreturn i=0; p 205256, i=1
broken! wait 772666
waitfor 772666
Xreturn i=0; p 205256, i=1
broken!
this output can be supressed by redefining the wdebug macro in exec.h.
the previous case of
; sleep $longtime &
; echo&echo&echo&
; wait $apid
should illustrate the difference.
- erik
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [9fans] wait hang
@ 2006-12-18 20:45 erik quanstrom
0 siblings, 0 replies; 8+ messages in thread
From: erik quanstrom @ 2006-12-18 20:45 UTC (permalink / raw)
To: rsc, quanstro, 9fans
if that scan through the runq fails (it always will, since runq->ret
is never set). then the wait message will be lost by Waitfor unless the
first wait message returned from the kernel happens to be the one we're
looking for.
perhaps what i wrote wasn't clear.
it's not so bad dropping the kernel's wait message, but in order for "wait $pid"
to work as advertised, Xasync would need to add $apid to a list and Waitfor
would need to remove it. this way, if the waitfor list is nil, we just complain
there's nothing to wait for instead of hanging. if it's not nil, then if the wait
completed, we return otherwise we find the right member of the list and note
that it completed.
the list scan that's there i'm pretty sure is for non-Xasync processes only. p->ret
is referenced in Xreturn and Xstart and Waitfor only.
- erik
On Mon Dec 18 15:06:38 EST 2006, rsc@swtch.com wrote:
> > since 20 is much less than 128, this can't be the kernel. i think the
> > problem is in the discarded wait messages here: rc/plan9.c/^Waitfor
>
> Where do you see discarded wait messages?
>
> > int
> > Waitfor(int pid, int)
> > {
> > thread *p;
> > Waitmsg *w;
> > char errbuf[ERRMAX];
> >
> > while((w = wait()) != nil){
> > if(w->pid==pid){
> > setstatus(w->msg);
> > free(w);
> > return 0;
> > }
> > for(p = runq->ret;p;p = p->ret)
> > if(p->pid==w->pid){
> > p->pid=-1;
> > strcpy(p->status, w->msg);
> > }
>
^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <db450a2ec40aeb8284b66f00407cb1cb@coraid.com>]
* Re: [9fans] wait hang
[not found] <db450a2ec40aeb8284b66f00407cb1cb@coraid.com>
@ 2006-12-18 20:06 ` Russ Cox
0 siblings, 0 replies; 8+ messages in thread
From: Russ Cox @ 2006-12-18 20:06 UTC (permalink / raw)
To: erik quanstrom; +Cc: 9fans
> since 20 is much less than 128, this can't be the kernel. i think the
> problem is in the discarded wait messages here: rc/plan9.c/^Waitfor
Where do you see discarded wait messages?
> int
> Waitfor(int pid, int)
> {
> thread *p;
> Waitmsg *w;
> char errbuf[ERRMAX];
>
> while((w = wait()) != nil){
> if(w->pid==pid){
> setstatus(w->msg);
> free(w);
> return 0;
> }
> for(p = runq->ret;p;p = p->ret)
> if(p->pid==w->pid){
> p->pid=-1;
> strcpy(p->status, w->msg);
> }
This loop is trying to save the information about the
wait message. But it doesn't seem to get consulted
anywhere, and I have never understood why that particular
linked list is the one to be looking through.
Russ
^ permalink raw reply [flat|nested] 8+ messages in thread
* [9fans] wait hang
@ 2006-12-18 16:01 erik quanstrom
2006-12-18 16:40 ` Russ Cox
0 siblings, 1 reply; 8+ messages in thread
From: erik quanstrom @ 2006-12-18 16:01 UTC (permalink / raw)
To: 9fans
the kernel fix does indeed keep
; for(i in `{seq 1 $n}){>/dev/null echo fu&}
; for(i in `{seq 1 $n}){>/dev/null echo fu&}
from hanging.
however, i think the fix to rc might be in error. if i do this
#!/bin/rc
cmd1&
cmd2&
wait
shouldn't that wait for both processes? for example, on an older
rc, from this script
#/bin/rc
date
sleep 15 &
sleep 30 &
wait
date
i get
Mon Dec 18 10:57:43 EST 2006
Mon Dec 18 10:58:13 EST 2006
whereas with the rc from friday, i get
; 8.out
broken! date ; sleep 15 & sleep 30 & wait ; date
Mon Dec 18 11:00:33 EST 2006
Mon Dec 18 11:00:33 EST 2006
- erik
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [9fans] wait hang
2006-12-18 16:01 erik quanstrom
@ 2006-12-18 16:40 ` Russ Cox
2006-12-18 19:05 ` erik quanstrom
0 siblings, 1 reply; 8+ messages in thread
From: Russ Cox @ 2006-12-18 16:40 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
> the kernel fix does indeed keep
> ; for(i in `{seq 1 $n}){>/dev/null echo fu&}
> ; for(i in `{seq 1 $n}){>/dev/null echo fu&}
> from hanging.
>
> however, i think the fix to rc might be in error. if i do this
>
> #!/bin/rc
> cmd1&
> cmd2&
> wait
>
> shouldn't that wait for both processes? for example, on an older
> rc, from this script
>
> #/bin/rc
> date
> sleep 15 &
> sleep 30 &
> wait
> date
>
> i get
>
> Mon Dec 18 10:57:43 EST 2006
> Mon Dec 18 10:58:13 EST 2006
>
> whereas with the rc from friday, i get
>
> ; 8.out
> broken! date ; sleep 15 & sleep 30 & wait ; date
> Mon Dec 18 11:00:33 EST 2006
> Mon Dec 18 11:00:33 EST 2006
You're right. I forgot about "wait".
I reverted the change to rc.
Note that if you add "wait" to your hanging example
it will not necessarily wait for all of the echos, only
the first 128 or so.
Russ
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [9fans] wait hang
2006-12-18 16:40 ` Russ Cox
@ 2006-12-18 19:05 ` erik quanstrom
2006-12-18 19:23 ` Russ Cox
0 siblings, 1 reply; 8+ messages in thread
From: erik quanstrom @ 2006-12-18 19:05 UTC (permalink / raw)
To: 9fans
On Mon Dec 18 11:49:41 EST 2006, rsc@swtch.com wrote:
> You're right. I forgot about "wait".
> I reverted the change to rc.
>
> Note that if you add "wait" to your hanging example
> it will not necessarily wait for all of the echos, only
> the first 128 or so.
>
> Russ
if i'm reading pwait correctly, it will actually wait for all if them because nwait
is now properly incremented and decremented, but the wait message will be
discarded if the waitlist is >= 128 when the child exits.
do i have this wrong?
- erik
from port/proc.c
1195 lock(&up->exl);
1196 if(up->nchild == 0 && up->waitq == 0) {
1197 unlock(&up->exl);
1198 error(Enochild);
1199 }
1200 unlock(&up->exl);
1201
1202 sleep(&up->waitr, haswaitq, up);
1203
1204 lock(&up->exl);
1205 wq = up->waitq;
1206 up->waitq = wq->next;
1207 up->nwait--;
1208 unlock(&up->exl);
1209
1210 qunlock(&up->qwaitr);
1211 poperror();
1212
1213 if(w)
1214 memmove(w, &wq->w, sizeof(Waitmsg));
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [9fans] wait hang
2006-12-18 19:05 ` erik quanstrom
@ 2006-12-18 19:23 ` Russ Cox
0 siblings, 0 replies; 8+ messages in thread
From: Russ Cox @ 2006-12-18 19:23 UTC (permalink / raw)
To: Fans of the OS Plan 9 from Bell Labs
You're right again. The "wait" command will work correctly --
it won't return until all children have finished. What might not
work is waiting for a particular pid. That is, if you do something like
sleep 1000000 &
pids=()
for(i in `{seq 200}){
echo foo >/dev/null &
pids=($pids $apid)
}
for(p in $pids)
wait $p
then since the kernel tosses away some of the wait records,
the first wait $p for a tossed-away record will just loop calling
wait and never getting the expected record. It will not give up
because there is still the one child (sleep 1000000) left.
I'm not really convinced this part of rc ever worked right to
begin with, so this is no big loss.
Plain "wait", with no argument, does work correctly now.
Russ
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2006-12-20 3:55 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-12-18 20:00 [9fans] wait hang erik quanstrom
-- strict thread matches above, loose matches on Subject: below --
2006-12-20 3:55 erik quanstrom
2006-12-18 20:45 erik quanstrom
[not found] <db450a2ec40aeb8284b66f00407cb1cb@coraid.com>
2006-12-18 20:06 ` Russ Cox
2006-12-18 16:01 erik quanstrom
2006-12-18 16:40 ` Russ Cox
2006-12-18 19:05 ` erik quanstrom
2006-12-18 19:23 ` Russ Cox
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).