9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] sleep/wakeup bug?
@ 2011-02-24 22:01 erik quanstrom
  2011-02-25  4:46 ` Russ Cox
  2011-02-25  9:46 ` Richard Miller
  0 siblings, 2 replies; 15+ messages in thread
From: erik quanstrom @ 2011-02-24 22:01 UTC
  To: 9fans

/sys/doc/sleep.ps says that sleep/wakeup are atomic.
in concrete terms, i take this to mean that if sleep
has returned, wakeup will no longer be in its critical
section.

unfortunately, this does not seem to be the case.
the woken process can continue before the rendezvous
lock is dropped.  this means that dynamic allocation of
structures containing a rendezvous is not possible, because
the structure can be free'd before the rendezvous lock is
dropped by the waking process.
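
a minimal sketch of that hazard, written as a single-threaded model
rather than kernel code.  everything here is an assumption made for
illustration: Req stands in for a dynamically allocated structure
such as an Srb, the lock is a plain flag, free() is replaced by a
flag so the model can count late accesses, and the sleeper is assumed
to run the instant it is readied.

```c
#include <assert.h>

/* hypothetical stand-ins; not the real Plan 9 definitions */
typedef struct Rendez Rendez;
typedef struct Req Req;

struct Rendez {
	int locked;	/* models the rendezvous lock */
};

struct Req {
	Rendez r;	/* rendezvous embedded in the allocation */
	int freed;	/* set instead of calling free() */
	int late;	/* accesses made after the request was released */
};

/* every access to the request goes through here */
static void
touch(Req *q)
{
	if(q->freed)
		q->late++;
}

/* what the sleeper does once sleep() has returned */
static void
sleeper_resume(Req *q)
{
	assert(!q->freed);
	touch(q);
	q->freed = 1;	/* the real code would free(q) here */
}

/*
 * wakeup with the ready inside the locked region.
 * the sleeper is assumed to run on another cpu at once.
 */
int
wake_inside_lock(Req *q)
{
	touch(q);
	q->r.locked = 1;	/* lock(r) */
	sleeper_resume(q);	/* ready(p); sleeper runs and releases q */
	touch(q);
	q->r.locked = 0;	/* unlock(r) lands on released memory */
	return q->late;
}
```

under those assumptions wake_inside_lock() returns 1: the waker's
unlock is the one access that arrives after the sleeper has already
released the request.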

this was biting me in the aoe driver on a high-end mp system
with improved lapic arbitration, faulting in the pool
library.  the memory in question was an Srb.  after i zeroed
the memory before free'ing, i observed an unlock: not locked:
pc x, where x was the splx in wakeup().  this led directly to
the observation that the ready'd process could never know
when it would be safe to free the rendezvous-containing
structure.

here's my suggested correction

Proc*
wakeup(Rendez *r)
{
	Proc *p;
	int s;

	s = splhi();

	lock(r);
	p = r->p;

	if(p != nil){
		lock(&p->rlock);
		if(p->state != Wakeme || p->r != r){
			iprint("%p %p %d\n", p->r, r, p->state);
			panic("wakeup: state");
		}
		r->p = nil;
		p->r = nil;
	}
	unlock(r);
	/* ready p only after the rendezvous lock has been dropped */
	if(p != nil){
		ready(p);
		unlock(&p->rlock);
	}
	splx(s);

	return p;
}

the handling of p->rlock looks weird, but i haven't
investigated.

- erik



* Re: [9fans] sleep/wakeup bug?
@ 2011-02-25  5:26 erik quanstrom
  2011-02-25  5:47 ` Russ Cox
  0 siblings, 1 reply; 15+ messages in thread
From: erik quanstrom @ 2011-02-25  5:26 UTC
  To: 9fans

> assuming a tight 1:1 coupling between sleep and
> wakeup is a recipe for trouble.  even if your change
> fixes one possible race (i didn't bother to see what changed),
> you still have to deal with

the point of sleep/rendezvous is tight coupling, no?

the change was to move the ready() to after the rendezvous
lock was dropped.  therefore the sleeper knows the rendezvous
is not locked by the event that woke him.  if one can assert
that each sleep has exactly one wakeup (as is often the case
for rpc-style programming), then that is enough to know
the rendezvous can be retired.
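
the difference between the two orderings can be sketched with a
small single-threaded model.  this is not the kernel code: the lock
is a flag, p->rlock and splhi/splx are left out, and the names
rlock, runlock, ready_model, wakeup_model and lock_held_at_ready are
invented for the example.  the model only records whether the
rendezvous lock was still held at the moment the sleeper became
runnable.

```c
#include <assert.h>

/* hypothetical stand-ins; not the real Plan 9 definitions */
typedef struct Rendez Rendez;
typedef struct Proc Proc;

struct Proc {
	Rendez *r;
	int runnable;
	int lock_held_when_readied;
};

struct Rendez {
	int locked;	/* models the rendezvous lock */
	Proc *p;	/* sleeping process, if any */
};

enum { ReadyInside, ReadyAfter };

static void rlock(Rendez *r) { assert(!r->locked); r->locked = 1; }
static void runlock(Rendez *r) { assert(r->locked); r->locked = 0; }

/* make p runnable and note the state of the lock at that instant */
static void
ready_model(Proc *p, Rendez *r)
{
	p->runnable = 1;
	p->lock_held_when_readied = r->locked;
}

/* order selects where the ready happens relative to the unlock */
Proc*
wakeup_model(Rendez *r, int order)
{
	Proc *p;

	rlock(r);
	p = r->p;
	if(p != 0){
		r->p = 0;
		p->r = 0;
		if(order == ReadyInside)
			ready_model(p, r);
	}
	runlock(r);
	if(p != 0 && order == ReadyAfter)
		ready_model(p, r);
	return p;
}

/* returns 1 if the sleeper could run while r was still locked */
int
lock_held_at_ready(int order)
{
	Rendez r = {0, 0};
	Proc p = {0, 0, 0};

	r.p = &p;
	p.r = &r;
	wakeup_model(&r, order);
	return p.lock_held_when_readied;
}
```

in this model lock_held_at_ready(ReadyInside) returns 1 and
lock_held_at_ready(ReadyAfter) returns 0.  that only covers the
waker that actually readied the sleeper; it says nothing about a
second wakeup on the same rendezvous, which is why the argument
leans on the one-wakeup-per-sleep assumption.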

> these races are inherent to the definition of sleep and
> wakeup.  it doesn't mean what you need it to mean
> to free memory immediately after sleeping on it.

if not a tight coupling, what kind of coupling would you
think is appropriate?  when would you think it would be
fair to recycle the rendezvous?  10s?  :-)  what idiom do
you think would be appropriate for such a case?

- erik




Thread overview: 15+ messages
2011-02-24 22:01 [9fans] sleep/wakeup bug? erik quanstrom
2011-02-25  4:46 ` Russ Cox
2011-02-25  9:46 ` Richard Miller
2011-02-25  5:26 erik quanstrom
2011-02-25  5:47 ` Russ Cox
2011-02-25  5:53   ` erik quanstrom
2011-02-25  6:01     ` Russ Cox
2011-02-25  6:12       ` erik quanstrom
     [not found]       ` <2808a9fa079bea86380a8d52be67b980@coraid.com>
     [not found]         ` <AANLkTi=4_=++Tm2a9Jq9jSzqUSexkW-ZjM-38oD_bS1y@mail.gmail.com>
     [not found]           ` <40925e8f64489665bd5bd6ca743400ea@coraid.com>
2011-02-25  6:51             ` Russ Cox
2011-02-25  7:13               ` erik quanstrom
2011-02-25 14:44                 ` Russ Cox
2011-02-25  8:37               ` Sape Mullender
2011-02-25  9:18                 ` Bakul Shah
2011-02-25 14:57               ` Charles Forsyth
2011-02-25 16:09               ` Venkatesh Srinivas
