From mboxrd@z Thu Jan  1 00:00:00 1970
To: 9fans@9fans.net
Date: Fri, 25 Feb 2011 09:37:39 +0100
From: Sape Mullender <sape@plan9.bell-labs.com>
Message-ID: <9e5ef05b5cfedd0faa831cc7c0d57f74@plan9.cs.bell-labs.com>
In-Reply-To: <AANLkTi=FabYqOd3ozUEXi9_Ua8S5DujfUjhzCYxPF2TA@mail.gmail.com>
References: <11da45046fa8267e7445128ed00724cd@ladd.quanstro.net>
	<AANLkTinuyGgidt_Lz9rQ8VyVtTVdnU2TABmbZG7NWt8g@mail.gmail.com>
	<24bb48f61c5eab87a133b82a9ef32474@coraid.com>
	<AANLkTim7W5rkgP00_M0AprNGpFhYgP=UUtWtOytJDKHO@mail.gmail.com>
	<2808a9fa079bea86380a8d52be67b980@coraid.com>
	<AANLkTi=4_=++Tm2a9Jq9jSzqUSexkW-ZjM-38oD_bS1y@mail.gmail.com>
	<40925e8f64489665bd5bd6ca743400ea@coraid.com>
	<AANLkTi=FabYqOd3ozUEXi9_Ua8S5DujfUjhzCYxPF2TA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Cc: sape@plan9.bell-labs.com
Subject: Re: [9fans] sleep/wakeup bug?
Topicbox-Message-UUID: b48243ba-ead6-11e9-9d60-3106f5b1d025

I suppose the use of counting semaphores in sleep/wakeup could
help in cases like this (but I'm sure there are still plenty of
other scenarios where they might not help).  The value of the
semaphore would represent something like "number of things to
do", so acquire(sema) would (atomically) wait until the value
of sema is greater than zero, then (using compare&swap, or
doing the whole thing inside an ilock) decrement the semaphore
and continue.
Release(sema) will (atomically) increment the semaphore and, if the
old value was zero, wake up any waiters.

Now, at first glance that looks like a vast improvement over sleep/
wakeup, but *inside* acquire and release, you'd still have sleep/wakeup
and you'd still run the risk of waking up just when something else
managed to grab the semaphore, or waking up something that hasn't
actually gone to sleep yet.

So, I think you can think of semaphores as a wrapper for sleep/wakeup
that can be used in some case to make sure that you can indeed safely
do a free() of some memory (this was, I think what started the whole
discussion).

It's taken a long time to get sleep/wakeup bugfree in Plan 9 and
some of the greatest minds in code verification (formerly at Bell Labs)
have been called upon to help get it right.

Russ is perfectly correct in the explanations below and it's a good
exercise to read through it.  This stuff is really tricky.  Many
optimization, all of them seemingly correct, failed because of subtle
race conditions, some of them involving three or more processes.

	Sape

>> your layout in your first email (i think) assumes that wakeup
>> can be called twice.
>
> it doesn't.  the scenario in my first email has exactly one
> sleep and one wakeup.
>
> the canonical problem you have to avoid when implementing
> sleep and wakeup is that the wakeup might happen before
> the sleep has gotten around to sleeping.  to be concrete,
> you might do something like:
>
> cpu1:
>     kick off disk i/o operation
>     sleep(r)
>
> cpu2:
>     interrupt happens
>     mark operation completed
>     wakeup(r)
>
> the problem is what happens if the interrupt is so fast
> that cpu2 runs all that before sleep(&r) starts.
> a wakeup without a sleep is defined to be a no-op, so
> if the wakeup runs first the sleep never wakes up:
>
> cpu1:
>     kick off disk i/o operation
>
> cpu2:
>     interrupt happens
>     mark operation completed
>     wakeup(r)
>
> cpu1:
>     sleep(r)  // never returns
>
> to avoid that problem there is this extra f, arg passed
> to sleep along with some locks to make sure sleep
> and wakeup are not running their coordination code
> simultaneously.  with f(arg), the last scenario becomes:
>
> cpu1:
>     kick off disk i/o operation
>
> cpu2:
>     interrupt happens
>     mark operation completed
>     wakeup(r)
>
> cpu1:
>     sleep(r)
>         calls f(arg), which sees op marked completed, returns 1
>         sleep returns immediately
>
> avoiding the missed wakeup.
>
> unfortunately the f(arg) check means that now sleep can
> sometimes return before wakeup (kind of a missed sleep):
>
> cpu1:
>     kick off disk i/o operation
>
> cpu2:
>     interrupt happens
>     mark operation completed
>
> cpu1:
>     sleep(r)
>         calls f(arg), which checks completed, returns 1
>         sleep returns immediately
>
> cpu2:
>     wakeup(r)
>         finds nothing sleeping on r, no-op.
>
> there's no second wakeup involved here.  this is just sleep
> figuring out that there's nothing to sleep for, before wakeup
> comes along.  f(arg) == true means that wakeup is either
> on its way or already passed by, and sleep doesn't know which,
> so it has to be conservative and not sleep.
>
> if r is allocated memory and cpu1 calls free(r) when sleep
> returns, that's not okay, because cpu2 has already decided
> to call wakeup(r), which will now be scribbling on or at least
> looking at freed memory.
>
> as i said originally, it's simply not 1:1.
> if you need 1:1, you need a semaphore.
>
> russ
>
>
> p.s. not relevant to your "only one sleep and one wakeup"
> constraint, but that last scenario also means that if you
> are doing repeated sleep + wakeup on a single r, that pending
> wakeup call left over on cpu2 might not happen until cpu1 has
> gone back to sleep (a second time).  that is, the first wakeup
> can wake the second sleep, intending to wake the first sleep.
> so in general you have to handle the case where sleep
> wakes up for no good reason.  it doesn't happen all the time,
> but it does happen.