9fans - fans of the OS Plan 9 from Bell Labs
 help / color / mirror / Atom feed
* [9fans] devmnt crash, fix.
@ 2012-08-22 18:53 erik quanstrom
  2012-08-22 19:09 ` cinap_lenrek
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: erik quanstrom @ 2012-08-22 18:53 UTC (permalink / raw)
  To: 9fans

i'm probablly wrong again, but we had a crash with this back trace.
the process doing the i/o was given a note. ...

src(0xe010a6b9); // dumpstack+0x10
src(0xe0112653); // panic+0x112
src(0xe010a8d7); // fault386+0x17d
src(0xe0109dc2); // trap+0x15d
src(0xe010066f); // forkret
//passing interrupt frame; last pc found at sp=0xf4ea5d94
src(0xe01babbd); // lockloop+0x34
src(0xe01bacf6); // lock+0x103
src(0xe01aeed8); // wakeup+0x18
src(0xe0111a91); // mountmux+0xf1
src(0xe01115b6); // mountio+0x21d
src(0xe011127a); // mountrpc+0x27
src(0xe011116f); // mntrdwr+0xfb
src(0xe0110faa); // mntread+0x126
src(0xe01b90e3); // sysexec+0x172
src(0xe010abbb); // syscall+0x238
src(0xe0100cb5); // _syscallintr+0x18
...

i believe the unlock(m) needs to be moved to just before the return
because mntflushfree -> mntqrm can run after we drop the lock and
before the wakeup.

- erik

----

void
mountmux(Mnt *m, Mntrpc *r)
{
	Mntrpc **l, *q;

	lock(m);
	l = &m->queue;
	for(q = *l; q; q = q->list) {
		/* look for a reply to a message */
		if(q->request.tag == r->reply.tag) {
			*l = q->list;
			if(q != r) {
				/*
				 * Completed someone else.
				 * Trade pointers to receive buffer.
				 */
				q->reply = r->reply;
				q->b = r->b;
				r->b = nil;
			}
			q->done = 1;
			unlock(m);
			if(mntstats != nil)
				(*mntstats)(q->request.type,
					m->c, q->stime,
					q->reqlen + r->replen);
			if(q != r)
mountmux+0xf1			wakeup(&q->r);
			return;
		}
		l = &q->list;
	}
	unlock(m);
	print("unexpected reply tag %ud; type %d\n", r->reply.tag, r->reply.type);
}



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] devmnt crash, fix.
  2012-08-22 18:53 [9fans] devmnt crash, fix erik quanstrom
@ 2012-08-22 19:09 ` cinap_lenrek
  2012-08-22 19:12 ` cinap_lenrek
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 21+ messages in thread
From: cinap_lenrek @ 2012-08-22 19:09 UTC (permalink / raw)
  To: 9fans

the mntrpc m->queue is protected by the mnt lock. when
mountmux() unlocks m, it has already removed the rpc from
the queue so how would mntqrm() get to the rpc in that case?

any other cases where the rpc is taken somewhere and then
the mnt lock is (maybe just temporarily) released and then
used again?

--
cinap



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] devmnt crash, fix.
  2012-08-22 18:53 [9fans] devmnt crash, fix erik quanstrom
  2012-08-22 19:09 ` cinap_lenrek
@ 2012-08-22 19:12 ` cinap_lenrek
  2012-08-22 19:53 ` cinap_lenrek
  2012-08-22 23:30 ` cinap_lenrek
  3 siblings, 0 replies; 21+ messages in thread
From: cinap_lenrek @ 2012-08-22 19:12 UTC (permalink / raw)
  To: 9fans

ohhh! i see :)

--
cinap



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] devmnt crash, fix.
  2012-08-22 18:53 [9fans] devmnt crash, fix erik quanstrom
  2012-08-22 19:09 ` cinap_lenrek
  2012-08-22 19:12 ` cinap_lenrek
@ 2012-08-22 19:53 ` cinap_lenrek
  2012-08-22 22:26   ` erik quanstrom
  2012-08-22 23:30 ` cinap_lenrek
  3 siblings, 1 reply; 21+ messages in thread
From: cinap_lenrek @ 2012-08-22 19:53 UTC (permalink / raw)
  To: 9fans

i think this is not so mutch mntflushfree() / mntqrm(), but when we
complete a request for someone else (q != r) and we release m,
the other proc could indeed be woken up already freeing the rpc
while we are are trying to wakeup on that "done with" rpc?

--
cinap



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] devmnt crash, fix.
  2012-08-22 19:53 ` cinap_lenrek
@ 2012-08-22 22:26   ` erik quanstrom
  0 siblings, 0 replies; 21+ messages in thread
From: erik quanstrom @ 2012-08-22 22:26 UTC (permalink / raw)
  To: cinap_lenrek, 9fans

On Wed Aug 22 15:56:43 EDT 2012, cinap_lenrek@gmx.de wrote:
> i think this is not so mutch mntflushfree() / mntqrm(), but when we
> complete a request for someone else (q != r) and we release m,
> the other proc could indeed be woken up already freeing the rpc
> while we are are trying to wakeup on that "done with" rpc?

that's what i ment.

- erik



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] devmnt crash, fix.
  2012-08-22 18:53 [9fans] devmnt crash, fix erik quanstrom
                   ` (2 preceding siblings ...)
  2012-08-22 19:53 ` cinap_lenrek
@ 2012-08-22 23:30 ` cinap_lenrek
  2012-08-23  1:53   ` erik quanstrom
  3 siblings, 1 reply; 21+ messages in thread
From: cinap_lenrek @ 2012-08-22 23:30 UTC (permalink / raw)
  To: 9fans

i think we'r seeing exactly what russ described on 9fans here:

http://9fans.net/archive/2011/02/358

after we set q->done = 1; (the unlock of m probably doesnt even
matter) it might be possible for mountio()'s sleep() call to return
immidiately and return, freeing the rpc before mountmux()
on another proc/cpu even call wakeup() and potentialy hitting freed
memory.

if this theory is true (anyone seeing what prevents this from
happening?) one trick could be to use a Rendez not embedded in the
rpc structure. If we make the Rendez into a Rendez* which we
take a local copy of before we set q->done = 1; in mountmux(),
and in mountio assign r->r = &up->sleep; (or maybe even dedicate
a new Rendez in the proc structure?) then its safe to call
wakeup even if the rpc got already freed. This might produce
spurious wakeups on that rendez but thats not a big problem
(for devmnt's case at least) as we always check rpc->done and
repeat the sleep on spurious wakeup. Doing interlocked refcounting
to synchronize the wakeup seems like overkill and would me mutch
more complicated i think.

suggestions?

--
cinap



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] devmnt crash, fix.
  2012-08-22 23:30 ` cinap_lenrek
@ 2012-08-23  1:53   ` erik quanstrom
  2012-08-23  9:47     ` cinap_lenrek
  0 siblings, 1 reply; 21+ messages in thread
From: erik quanstrom @ 2012-08-23  1:53 UTC (permalink / raw)
  To: 9fans

On Wed Aug 22 19:33:52 EDT 2012, cinap_lenrek@gmx.de wrote:
> i think we'r seeing exactly what russ described on 9fans here:
>
> http://9fans.net/archive/2011/02/358
>
> after we set q->done = 1; (the unlock of m probably doesnt even
> matter) it might be possible for mountio()'s sleep() call to return
> immidiately and return, freeing the rpc before mountmux()
> on another proc/cpu even call wakeup() and potentialy hitting freed
> memory.

devaoe (9atom version) deals with a similar problem.  see strategy().

- erik



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] devmnt crash, fix.
  2012-08-23  1:53   ` erik quanstrom
@ 2012-08-23  9:47     ` cinap_lenrek
  2012-08-23 13:27       ` erik quanstrom
  0 siblings, 1 reply; 21+ messages in thread
From: cinap_lenrek @ 2012-08-23  9:47 UTC (permalink / raw)
  To: 9fans

can you provide the source file, dont have current 9atom iso handy
and the stuff i found on the net doesnt seem to cut it. however,
i found http://www.quanstro.net/plan9/sleepwake.html which at the
end gives a solution to the problem:

void
dowake(Io *i)
{
	i->cond = 1;
	wakeup(i);
	i->cond = 2;		/* atomic */
	/* hands off i now */
}
void
doio(void)
{
	Io *i;

	dispatch(i = malloc(sizeof *i));
	while(waserror())
		/* eat note */;
	sleep(i, iodone, i);
	poperror();
	while(i->cond != 2)
		/* rendez not done */;
	free(i);
}

that would be relatively easy to apply to devmnt and doesnt have the
problem of spurious wakeups with abusing up->sleep as rendez.

--
cinap



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] devmnt crash, fix.
  2012-08-23  9:47     ` cinap_lenrek
@ 2012-08-23 13:27       ` erik quanstrom
  2012-08-23 15:02         ` cinap_lenrek
  2012-08-23 15:39         ` Bakul Shah
  0 siblings, 2 replies; 21+ messages in thread
From: erik quanstrom @ 2012-08-23 13:27 UTC (permalink / raw)
  To: 9fans

On Thu Aug 23 05:50:45 EDT 2012, cinap_lenrek@gmx.de wrote:
> can you provide the source file, dont have current 9atom iso handy

here's the full kernel source: http://ftp.quanstro.net/other/kernel.mkfs.bz2

> and the stuff i found on the net doesnt seem to cut it. however,
> i found http://www.quanstro.net/plan9/sleepwake.html which at the
> end gives a solution to the problem:

this was actually distilled from the solution that devaoe uses.  the
main simplification is in the case of a note.  which is key in this case,
since it would be very annoying for devmnt to ignore notes.

> that would be relatively easy to apply to devmnt and doesnt have the
> problem of spurious wakeups with abusing up->sleep as rendez.

any time you use sleep/wakeup, you're going to have to deal with
spurious wakeups.  it does not matter which rendezvous structure
you use.  it doesn't matter if it's shared or not.  it's perfectly valid
for sleep to return for no reason at all!

- erik



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] devmnt crash, fix.
  2012-08-23 13:27       ` erik quanstrom
@ 2012-08-23 15:02         ` cinap_lenrek
  2012-08-23 15:39         ` Bakul Shah
  1 sibling, 0 replies; 21+ messages in thread
From: cinap_lenrek @ 2012-08-23 15:02 UTC (permalink / raw)
  To: 9fans

why? are you counting notes as spurious wakeups? i would not count
them as such because sleep will not return in that case but do a
error jump out of it instead no? what other spurious wakeup sources
are there?

--
cinap



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] devmnt crash, fix.
  2012-08-23 13:27       ` erik quanstrom
  2012-08-23 15:02         ` cinap_lenrek
@ 2012-08-23 15:39         ` Bakul Shah
  2012-08-23 15:48           ` erik quanstrom
  2012-08-23 15:53           ` cinap_lenrek
  1 sibling, 2 replies; 21+ messages in thread
From: Bakul Shah @ 2012-08-23 15:39 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Aug 23, 2012, at 6:27 AM, erik quanstrom <quanstro@quanstro.net> wrote:
> any time you use sleep/wakeup, you're going to have to deal with
> spurious wakeups.  it does not matter which rendezvous structure
> you use.  it doesn't matter if it's shared or not.  it's perfectly valid
> for sleep to return for no reason at all!

Indeed. One way to catch sleep() errors  is to change wakeup() to wake *everyone* up.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] devmnt crash, fix.
  2012-08-23 15:39         ` Bakul Shah
@ 2012-08-23 15:48           ` erik quanstrom
  2012-08-23 15:53           ` cinap_lenrek
  1 sibling, 0 replies; 21+ messages in thread
From: erik quanstrom @ 2012-08-23 15:48 UTC (permalink / raw)
  To: 9fans

>
> Indeed. One way to catch sleep() errors  is to change wakeup() to wake *everyone* up.

i was thinking on this and it occurred to me ....

to make the point by absurdity, consider this approximation
of sleep and wakeup without notes.

	void
	sleep(Rendez*, void (*)(void*), void*)
	{
	}

	void
	wakeup(Rendez*)
	{
	}

these have the property that you may wake without the condition
but you may not miss a wakeup.  it also doesn't matter if the condition
happens while sleep is executing.  and there is no hazard between
the two.  so i think this impemenation does satisfy the all
the requirements!

i didn't say it was efficient.  :-)

i think what cinap is thinking is, how could devmnt's particular
use of sleep/wakeup with this particular implementation of sleep/wakeup
evade handling spurious wakeups.  if that's what the thought is,
i would resist it.  it may be that someone thinks of a very efficient
way of re-implementing sleep/wakeup that may have slightly different
properties.  as long as the client and sleep/wakeup play by the rules,
this should not be a problem.  (it's possible to do plan 9-style sleep
wakeup within, say, the windows kernel.)

- erik



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] devmnt crash, fix.
  2012-08-23 15:39         ` Bakul Shah
  2012-08-23 15:48           ` erik quanstrom
@ 2012-08-23 15:53           ` cinap_lenrek
  2012-08-23 16:06             ` erik quanstrom
  1 sibling, 1 reply; 21+ messages in thread
From: cinap_lenrek @ 2012-08-23 15:53 UTC (permalink / raw)
  To: 9fans

pwait() in port/proc.c would crash in that case if you wake it up
with up->waitq == nil.

or was this ment as a joke?

--
cinap



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] devmnt crash, fix.
  2012-08-23 15:53           ` cinap_lenrek
@ 2012-08-23 16:06             ` erik quanstrom
  2012-08-23 16:18               ` cinap_lenrek
  0 siblings, 1 reply; 21+ messages in thread
From: erik quanstrom @ 2012-08-23 16:06 UTC (permalink / raw)
  To: 9fans

On Thu Aug 23 11:56:42 EDT 2012, cinap_lenrek@gmx.de wrote:
> pwait() in port/proc.c would crash in that case if you wake it up
> with up->waitq == nil.

how would that happen?  with the approperiate locks held (waitqr),
up->waitq == nil is tested.

> or was this ment as a joke?

no joke.

- erik



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] devmnt crash, fix.
  2012-08-23 16:06             ` erik quanstrom
@ 2012-08-23 16:18               ` cinap_lenrek
  2012-08-23 16:24                 ` erik quanstrom
  0 siblings, 1 reply; 21+ messages in thread
From: cinap_lenrek @ 2012-08-23 16:18 UTC (permalink / raw)
  To: 9fans

pwait():

...

	sleep(&up->waitr, haswaitq, up);

	lock(&up->exl);
	wq = up->waitq;
	up->waitq = wq->next;	<-- wq == nil, boom! its gone!
	up->nwait--;
	unlock(&up->exl);


if sleep returns or is spuriously woken up even tho up->waitq == nil.

--
cinap



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] devmnt crash, fix.
  2012-08-23 16:18               ` cinap_lenrek
@ 2012-08-23 16:24                 ` erik quanstrom
  2012-08-23 16:47                   ` Charles Forsyth
       [not found]                   ` <CAOw7k5ii5PoYZnEbDeUuhR9GsgEEGt7i+XTQm7FR9+r=ENg=xw@mail.gmail.c>
  0 siblings, 2 replies; 21+ messages in thread
From: erik quanstrom @ 2012-08-23 16:24 UTC (permalink / raw)
  To: cinap_lenrek, 9fans

> 	sleep(&up->waitr, haswaitq, up);
>
> 	lock(&up->exl);
> 	wq = up->waitq;
> 	up->waitq = wq->next;	<-- wq == nil, boom! its gone!
> 	up->nwait--;
> 	unlock(&up->exl);
>
>
> if sleep returns or is spuriously woken up even tho up->waitq == nil.

ah, right.  looks like a bug.

- erik



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] devmnt crash, fix.
  2012-08-23 16:24                 ` erik quanstrom
@ 2012-08-23 16:47                   ` Charles Forsyth
  2012-08-23 17:01                     ` cinap_lenrek
  2012-08-23 17:53                     ` Bakul Shah
       [not found]                   ` <CAOw7k5ii5PoYZnEbDeUuhR9GsgEEGt7i+XTQm7FR9+r=ENg=xw@mail.gmail.c>
  1 sibling, 2 replies; 21+ messages in thread
From: Charles Forsyth @ 2012-08-23 16:47 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

[-- Attachment #1: Type: text/plain, Size: 577 bytes --]

No. It's not true that all sleeps might return spuriously (it might have
been true in Unix, I can't remember).
It's not true in Plan 9.

On 23 August 2012 12:24, erik quanstrom <quanstro@quanstro.net> wrote:

> >       sleep(&up->waitr, haswaitq, up);
> >
> >       lock(&up->exl);
> >       wq = up->waitq;
> >       up->waitq = wq->next;   <-- wq == nil, boom! its gone!
> >       up->nwait--;
> >       unlock(&up->exl);
> >
> >
> > if sleep returns or is spuriously woken up even tho up->waitq == nil.
>
> ah, right.  looks like a bug.
>
> - erik
>
>

[-- Attachment #2: Type: text/html, Size: 1067 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] devmnt crash, fix.
  2012-08-23 16:47                   ` Charles Forsyth
@ 2012-08-23 17:01                     ` cinap_lenrek
  2012-08-23 17:53                     ` Bakul Shah
  1 sibling, 0 replies; 21+ messages in thread
From: cinap_lenrek @ 2012-08-23 17:01 UTC (permalink / raw)
  To: 9fans

yes, this is what i would'v expected. so who made up that rule
that sleep might spuriously wakeup for no reason or that kernel
code should be written as if sleep/wakeup being troll implementations?

--
cinap



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] devmnt crash, fix.
       [not found]                   ` <CAOw7k5ii5PoYZnEbDeUuhR9GsgEEGt7i+XTQm7FR9+r=ENg=xw@mail.gmail.c>
@ 2012-08-23 17:09                     ` erik quanstrom
  0 siblings, 0 replies; 21+ messages in thread
From: erik quanstrom @ 2012-08-23 17:09 UTC (permalink / raw)
  To: 9fans

On Thu Aug 23 12:48:17 EDT 2012, charles.forsyth@gmail.com wrote:

> No. It's not true that all sleeps might return spuriously (it might have
> been true in Unix, I can't remember).
> It's not true in Plan 9.

is this in /sys/doc/sleep.ps?  i think i'm missing it.

- erik



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] devmnt crash, fix.
  2012-08-23 16:47                   ` Charles Forsyth
  2012-08-23 17:01                     ` cinap_lenrek
@ 2012-08-23 17:53                     ` Bakul Shah
  2012-08-23 18:03                       ` erik quanstrom
  1 sibling, 1 reply; 21+ messages in thread
From: Bakul Shah @ 2012-08-23 17:53 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Thu, 23 Aug 2012 12:47:10 EDT Charles Forsyth <charles.forsyth@gmail.com>  wrote:
> No. It's not true that all sleeps might return spuriously (it might have
> been true in Unix, I can't remember).
> It's not true in Plan 9.

It is not that sleep might return spuriously (that is, the
wakeup condition is still false) but that it may have *become*
false once again due to an action by another process before
the sleeper has had a chance to run.  To handle this one would
use a pattern like this

	while (!wakeup_condition(&foo))
		sleep(&foo);

I haven't looked at this in ages but this used to be the case
in unix.  So here awakening everyone *spuriously* is /safe/
(but not efficient).

doc/sleep.ps says
    We assume that a particular call to sleep corresponds to a
    particular call to wakeup, that is, at most one process is
    asleep waiting for a particular event.

But it is not clear to me that another process can't get in to
change the condition between the time wakeup() is called and
the sleeper actually runs -- the above condition is not
violated as the new process won't be sleeping. But if yet
another process tries to access the same condition, it will be
blocked on getting the rendezvous structure lock?



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [9fans] devmnt crash, fix.
  2012-08-23 17:53                     ` Bakul Shah
@ 2012-08-23 18:03                       ` erik quanstrom
  0 siblings, 0 replies; 21+ messages in thread
From: erik quanstrom @ 2012-08-23 18:03 UTC (permalink / raw)
  To: 9fans

> the sleeper actually runs -- the above condition is not
> violated as the new process won't be sleeping. But if yet
> another process tries to access the same condition, it will be
> blocked on getting the rendezvous structure lock?

in plan 9 sleep, only 1 sleeper is allowed.

- erik



^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2012-08-23 18:03 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-22 18:53 [9fans] devmnt crash, fix erik quanstrom
2012-08-22 19:09 ` cinap_lenrek
2012-08-22 19:12 ` cinap_lenrek
2012-08-22 19:53 ` cinap_lenrek
2012-08-22 22:26   ` erik quanstrom
2012-08-22 23:30 ` cinap_lenrek
2012-08-23  1:53   ` erik quanstrom
2012-08-23  9:47     ` cinap_lenrek
2012-08-23 13:27       ` erik quanstrom
2012-08-23 15:02         ` cinap_lenrek
2012-08-23 15:39         ` Bakul Shah
2012-08-23 15:48           ` erik quanstrom
2012-08-23 15:53           ` cinap_lenrek
2012-08-23 16:06             ` erik quanstrom
2012-08-23 16:18               ` cinap_lenrek
2012-08-23 16:24                 ` erik quanstrom
2012-08-23 16:47                   ` Charles Forsyth
2012-08-23 17:01                     ` cinap_lenrek
2012-08-23 17:53                     ` Bakul Shah
2012-08-23 18:03                       ` erik quanstrom
     [not found]                   ` <CAOw7k5ii5PoYZnEbDeUuhR9GsgEEGt7i+XTQm7FR9+r=ENg=xw@mail.gmail.c>
2012-08-23 17:09                     ` erik quanstrom

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).