In-Reply-To: <40925e8f64489665bd5bd6ca743400ea@coraid.com>
Date: Fri, 25 Feb 2011 01:51:10 -0500
Subject: Re: [9fans] sleep/wakeup bug?
From: Russ Cox
To: erik quanstrom
Cc: 9fans <9fans@9fans.net>

> your layout in your first email (i think) assumes that wakeup
> can be called twice.

it doesn't. the scenario in my first email has exactly one sleep and one wakeup.

the canonical problem you have to avoid when implementing sleep and wakeup is that the wakeup might happen before the sleep has gotten around to sleeping. to be concrete, you might do something like:

	cpu1:
	kick off disk i/o operation
	sleep(r)

	cpu2:
	interrupt happens
	mark operation completed
	wakeup(r)

the problem is what happens if the interrupt is so fast that cpu2 runs all that before sleep(r) starts. a wakeup without a sleep is defined to be a no-op, so if the wakeup runs first the sleep never wakes up:

	cpu1:
	kick off disk i/o operation

	cpu2:
	interrupt happens
	mark operation completed
	wakeup(r)

	cpu1:
	sleep(r) // never returns

to avoid that problem, sleep takes an extra f, arg pair, and some locks make sure sleep and wakeup are not running their coordination code simultaneously. with f(arg), the last scenario becomes:

	cpu1:
	kick off disk i/o operation

	cpu2:
	interrupt happens
	mark operation completed
	wakeup(r)

	cpu1:
	sleep(r)
	calls f(arg), which sees op marked completed, returns 1
	sleep returns immediately

avoiding the missed wakeup.
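A minimal user-space sketch of the f(arg) check, assuming POSIX threads. Rendez, rinit, rsleep, rwakeup, Op, opdone and interrupt are hypothetical names loosely modeled on the Plan 9 kernel's sleep(Rendez*, int(*)(void*), void*) and wakeup(Rendez*); this is not the kernel's code, which works with spinlocks and the scheduler rather than a mutex and condition variable. What it shows: f(arg) is evaluated under the same lock that rwakeup takes, so a wakeup that ran first is observed through the completed flag instead of being lost.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space analogue of a Plan 9 Rendez: at most one sleeper. */
typedef struct Rendez {
	pthread_mutex_t lk;   /* serializes the sleep/wakeup coordination code */
	pthread_cond_t cv;
	bool asleep;          /* someone is blocked in rsleep on this Rendez */
	bool woken;           /* rwakeup has released the current sleeper */
} Rendez;

void rinit(Rendez *r)
{
	pthread_mutex_init(&r->lk, NULL);
	pthread_cond_init(&r->cv, NULL);
	r->asleep = false;
	r->woken = false;
}

/* Block until rwakeup, unless f(arg) is already true. f runs under r->lk,
 * so a wakeup that happened first is seen through f rather than missed. */
void rsleep(Rendez *r, int (*f)(void *), void *arg)
{
	pthread_mutex_lock(&r->lk);
	assert(!r->asleep);              /* only one sleeper per Rendez */
	if (f(arg)) {                    /* nothing to sleep for */
		pthread_mutex_unlock(&r->lk);
		return;
	}
	r->asleep = true;
	r->woken = false;
	while (!r->woken)                /* pthread_cond_wait may wake spuriously */
		pthread_cond_wait(&r->cv, &r->lk);
	r->asleep = false;
	pthread_mutex_unlock(&r->lk);
}

/* Release the sleeper, if any; returns 1 if one was woken, 0 for a no-op. */
int rwakeup(Rendez *r)
{
	int woke = 0;

	pthread_mutex_lock(&r->lk);
	if (r->asleep && !r->woken) {
		r->woken = true;
		pthread_cond_signal(&r->cv);
		woke = 1;
	}
	pthread_mutex_unlock(&r->lk);
	return woke;
}

/* Hypothetical i/o operation: the "completed" mark plus its Rendez. */
typedef struct Op {
	Rendez r;
	atomic_int done;
} Op;

/* The f passed to rsleep: has the operation completed? */
int opdone(void *arg)
{
	Op *op = arg;
	return atomic_load(&op->done);
}

/* Plays cpu2's interrupt handler: mark completed, then wake. */
void *interrupt(void *arg)
{
	Op *op = arg;
	atomic_store(&op->done, 1);
	rwakeup(&op->r);    /* a no-op if cpu1 has not gone to sleep yet */
	return NULL;
}
```

Note that rsleep returns as soon as f(arg) is true, even if the matching rwakeup has not run yet, so rwakeup may still touch r after rsleep has returned.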
unfortunately the f(arg) check means that now sleep can sometimes return before wakeup (kind of a missed sleep):

	cpu1:
	kick off disk i/o operation

	cpu2:
	interrupt happens
	mark operation completed

	cpu1:
	sleep(r)
	calls f(arg), which checks completed, returns 1
	sleep returns immediately

	cpu2:
	wakeup(r)
	finds nothing sleeping on r, no-op.

there's no second wakeup involved here. this is just sleep figuring out that there's nothing to sleep for, before wakeup comes along. f(arg) == true means that wakeup is either on its way or already passed by, and sleep doesn't know which, so it has to be conservative and not sleep.

if r is allocated memory and cpu1 calls free(r) when sleep returns, that's not okay, because cpu2 has already decided to call wakeup(r), which will now be scribbling on or at least looking at freed memory.

as i said originally, it's simply not 1:1. if you need 1:1, you need a semaphore.

russ

p.s. not relevant to your "only one sleep and one wakeup" constraint, but that last scenario also means that if you are doing repeated sleep + wakeup on a single r, that pending wakeup call left over on cpu2 might not happen until cpu1 has gone back to sleep (a second time). that is, the first wakeup can wake the second sleep, intending to wake the first sleep. so in general you have to handle the case where sleep wakes up for no good reason. it doesn't happen all the time, but it does happen.
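A sketch of the semaphore alternative, again assuming POSIX threads. Sema, seminit, semdown, semup and poster are hypothetical names, not an existing Plan 9 or POSIX API. Because the count records every up, each semup is consumed by exactly one semdown: an up that arrives early is remembered rather than dropped, and a down never returns without an up to match it. The while loop around pthread_cond_wait is the same "wakes up for no good reason" defense described in the p.s., applied at the condition-variable level.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical counting semaphore built on a mutex and condition variable. */
typedef struct Sema {
	pthread_mutex_t lk;
	pthread_cond_t cv;
	long count;           /* ups not yet consumed by a down */
} Sema;

void seminit(Sema *s, long n)
{
	assert(n >= 0);
	pthread_mutex_init(&s->lk, NULL);
	pthread_cond_init(&s->cv, NULL);
	s->count = n;
}

/* Wait for an up, consuming exactly one. An up that happened earlier is
 * still recorded in count, so it cannot be missed. */
void semdown(Sema *s)
{
	pthread_mutex_lock(&s->lk);
	while (s->count == 0)     /* recheck: condition-variable wakeups can be spurious */
		pthread_cond_wait(&s->cv, &s->lk);
	s->count--;
	pthread_mutex_unlock(&s->lk);
}

/* Record one event and wake one waiter, if any. */
void semup(Sema *s)
{
	pthread_mutex_lock(&s->lk);
	s->count++;
	pthread_cond_signal(&s->cv);
	pthread_mutex_unlock(&s->lk);
}

/* Plays the interrupt side of a 1:1 handoff. */
void *poster(void *arg)
{
	semup(arg);
	return NULL;
}
```

In the disk example, the interrupt handler would call semup and the process would call semdown; each completion then releases exactly one down, and an early completion is counted instead of racing with the sleeper.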