From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: <4C1242CD.5020202@bouyapop.org> Date: Fri, 11 Jun 2010 16:06:05 +0200 From: Philippe Anel User-Agent: Thunderbird 2.0.0.24 (X11/20100318) MIME-Version: 1.0 To: 9fans@9fans.net Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: [9fans] 9vx, kproc and *double sleep* Topicbox-Message-UUID: 314280e6-ead6-11e9-9d60-3106f5b1d025 Dear 9fans, I think I maybe have found the reason why some of us have a *double sleep* error while running 9vx. I think this would even explain some segmentation faults seen in kprocs. Please forgive me for the length of this post. I'm trying to be as explicit as possible. I have no certainty about this, mostly because the part of the code I think involved in the bug has been checked and read several times by several programmers. But I instrumented my own version of 9vx with a circular log buffer and have the same results each time I encounter this bug. While reading the a/proc.c source code, I wondered what would happen if two cpus (Machs) try to call the function ready() on the same process at almost the same time. I know the function ready() queues a proc into run queue so the scheduler (or schedulers as one is executed per Mach) can execute it. Because of this, if a process can be readied twice, two Machs would execute the same code with the same stack and hence the whole kernel would crash. Then I immediatly wondered if this could be the reason why we have the function sleep called twice on the same Rendez address (typically a "*double sleep*"). The function queueproc() called by the function ready() does not check the if process about to be added to the run queue has already been added. The reason is not only because it takes time, but most of all, this case is not supposed to happen. If a process A in running (ie in Running state), it is out of the run queue. Same if it is waiting (ie in Wakeme state), in which case, only one other process is expected to have a ref on A (through the Rendez.p member). The thing is I think this last point is not totally true, because process A also holds a ref to itself. Let's think about the following thought experiment (here again I apologize because I suspect I could have made this simpler), which would requires 3 cpus (or Machs): Step 1, on Mach 0: A proc X is asking for a worker process (kproc/kserver in 9vx) to execute some task. In order to do so, it calls wakeup on a Rendez pointer held by a proc A, typically sleeping on it (we will see how a few lines below). Step 2, on Mach 1: Proc A is awakened. It serves the call (Kcall in 9vx) and then signals the client (proc X) the work is done. In order to do so, it calls wakeup on the Rendez pointer Proc.rsleep of X. Then proc A waits for another request. It sleeps again on the Rendez pointer assigned to the server. Let see how it works. It first locks the Rendez, then locks itself. After this it checks if the condition happened of if a note is pending. Lets assume this did not happened. It thus change its own state to Wakeme and initialize the schedule point which be called by the scheduler when the Rendez will be awakened. Then, before going to sleep (or giving control to scheduler), it unlocks itself, then unlocks the Rendez. Sleeping means giving control the scheduler so that another proc (or coroutine in kernel) can execute. In order to do that, the sleep() function just calls "gotolabel(&m->sched)". At this moment, no one has a lock on either the Rendez or proc A. Step 3, on Mach 0: Proc X, which has been awakened by proc A, ask proc A for another request. It calls wakeup on the Rendez on which A is trying to sleep. I insist on the fact proc A is 'trying' because the scheduler has still not switched Mach.up and thus Mach 1 has still a ref on Proc A. In the function wakeup(), proc X locks r (the Rendez), then locks r->p which is pointer to proc A, and then call ready on it before unlocking p and r. At this point, we have proc A in the run queue, and still in the scheduler of Mach 1. We then can imagine that a third cpu (Mach 2) schedules this proc (ie dequeue it from run queue, change its state to Running and calls gotolabel(&p->sched) ... executing proc A code on proc A stack ... This is the bug I think. Because in the meantime, Mach 1 can continue its execution in schedinit() function, with proc A state set to Running. And Mach 1 would thus calls ready() on proc A and even schedules and executes it. I know this is unlikely to happen because sleep() goes to the scheduler with splhi and there is no reason why it could not process schedinit() "if (up) {...}" statement before another cpu/Mach calls the function wakeup which itself calls ready() on proc A. But the problem is that 9vx splhi() is empty ... and cpu/Machs which are really pthreads are scheduled by the operating system (Linux, BSD, ...). In fact, there is a 0.000001% (I cannot calculate this to be honnest) (non-)chance this can happen on real hardware. I updated the functions sleep() and schedinit() so the function schedinit() unlocks both p and p->r in "if (up) { ... }" statement. Since this change, I no longer have the *double sleep* error, nor segmentation fault. What do you think ? Phil; /* * sleep if a condition is not true. Another process will * awaken us after it sets the condition. When we awaken * the condition may no longer be true. * * we lock both the process and the rendezvous to keep r->p * and p->r synchronized. */ void sleep(Rendez *r, int (*f)(void*), void *arg) { int s; s = splhi(); lock(r); lock(&up->rlock); if(r->p){ print("double sleep called from %#p, %lud %lud\n", getcallerpc(&r), r->p->pid, up->pid); dumpstack(); } /* * Wakeup only knows there may be something to do by testing * r->p in order to get something to lock on. * Flush that information out to memory in case the sleep is * committed. */ r->p = up; if((*f)(arg) || up->notepending){ /* * if condition happened or a note is pending * never mind */ r->p = nil; unlock(&up->rlock); unlock(r); } else { /* * now we are committed to * change state and call scheduler */ up->state = Wakeme; up->r = r; procsave(up); if(setlabel(&up->sched)) { /* * here when the process is awakened */ procrestore(up); spllo(); } else { /* * here to go to sleep (i.e. stop Running) */ // xigh: move unlocking to schedinit() // unlock(&up->rlock); // unlock(r); gotolabel(&m->sched); } } if(up->notepending) { up->notepending = 0; splx(s); if(up->procctl == Proc_exitme && up->closingfgrp) forceclosefgrp(); error(Eintr); } splx(s); } /* * Always splhi()'ed. */ void schedinit(void) /* never returns */ { Edf *e; setlabel(&m->sched); if(up) { if((e = up->edf) && (e->flags & Admitted)) edfrecord(up); m->proc = 0; switch(up->state) { case Running: ready(up); break; case Wakeme: unlock(&up->rlock); unlock(up->r); break; case Moribund: // ... break; } up->mach = nil; updatecpu(up); up = nil; } sched(); }