From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <4C1242CD.5020202@bouyapop.org>
Date: Fri, 11 Jun 2010 16:06:05 +0200
From: Philippe Anel <xigh@bouyapop.org>
User-Agent: Thunderbird 2.0.0.24 (X11/20100318)
MIME-Version: 1.0
To: 9fans@9fans.net
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: [9fans] 9vx, kproc and *double sleep*
Topicbox-Message-UUID: 314280e6-ead6-11e9-9d60-3106f5b1d025


Dear 9fans,

I think I may have found the reason why some of us get a *double
sleep* error while running 9vx. I think this would even explain some
segmentation faults seen in kprocs.

Please forgive me for the length of this post. I'm trying to be as
explicit as possible.

I am not certain about this, mostly because the part of the code I
think is involved in the bug has been checked and read several times by
several programmers. But I instrumented my own version of 9vx with a
circular log buffer and have the same results each time I encounter
this bug.

While reading the a/proc.c source code, I wondered what would happen
if two cpus (Machs) try to call the function ready() on the same
process at almost the same time.

I know the function ready() queues a proc into the run queue so the
scheduler (or schedulers as one is executed per Mach) can execute
it. Because of this, if a process can be readied twice, two Machs
would execute the same code with the same stack and hence the whole
kernel would crash.

Then I immediately wondered if this could be the reason why we have the
function sleep called twice on the same Rendez address (typically a
"*double sleep*").

The function queueproc() called by the function ready() does not check
whether the process about to be added to the run queue has already
been added. The reason is not only that the check takes time but, most
of all, that this case is not supposed to happen. If a process A is
running (i.e. in Running state), it is out of the run queue. The same
holds if it is waiting (i.e. in Wakeme state), in which case only one
other process is expected to have a ref on A (through the Rendez.p
member).
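
To make the missing check concrete, here is a minimal sketch of a run
queue that appends without looking for duplicates. The types and the
names toyqueueproc() and toydequeueproc() are hypothetical, simplified
stand-ins, not the 9vx code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's Proc and run queue. */
typedef struct Proc Proc;
struct Proc {
    int   pid;
    Proc *rnext;    /* next proc in the run queue */
};

typedef struct {
    Proc *head;
    Proc *tail;
    int   n;        /* number of entries */
} Schedq;

/*
 * Append p to the queue without checking whether it is
 * already there, as described above.
 */
void
toyqueueproc(Schedq *rq, Proc *p)
{
    p->rnext = NULL;
    if(rq->tail)
        rq->tail->rnext = p;
    else
        rq->head = p;
    rq->tail = p;
    rq->n++;
}

/* Remove and return the head of the queue, or NULL if it is empty. */
Proc*
toydequeueproc(Schedq *rq)
{
    Proc *p = rq->head;

    if(p == NULL)
        return NULL;
    rq->head = p->rnext;
    if(rq->head == NULL)
        rq->tail = NULL;
    rq->n--;
    return p;
}
```

If the same proc is queued twice on an empty queue, the second call
makes its rnext point to itself, so two successive dequeues (think of
two Machs) both return that proc.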

The thing is I think this last point is not totally true, because
process A also holds a ref to itself. Let's think about the following
thought experiment (here again I apologize because I suspect I could
have made this simpler), which would require 3 cpus (or Machs):

Step 1, on Mach 0:

   A proc X is asking for a worker process (kproc/kserver in 9vx) to
   execute some task. In order to do so, it calls wakeup on a Rendez
   pointer held by a proc A, typically sleeping on it (we will see how
   a few lines below).

Step 2, on Mach 1:

   Proc A is awakened. It serves the call (Kcall in 9vx) and then
   signals the client (proc X) the work is done. In order to do so, it
   calls wakeup on the Rendez pointer Proc.rsleep of X.

   Then proc A waits for another request. It sleeps again on the
   Rendez pointer assigned to the server.

   Let's see how it works. It first locks the Rendez, then locks
   itself. After this it checks if the condition happened or if a note
   is pending. Let's assume neither happened. It thus changes its
   own state to Wakeme and initializes the schedule point which will
   be called by the scheduler when the Rendez is awakened. Then,
   before going to sleep (or giving control to the scheduler), it
   unlocks itself, then unlocks the Rendez.

   Sleeping means giving control to the scheduler so that another proc
   (or coroutine in kernel) can execute. In order to do that, the
   sleep() function just calls "gotolabel(&m->sched)".

   At this moment, no one has a lock on either the Rendez or proc A.

Step 3, on Mach 0:

   Proc X, which has been awakened by proc A, sends proc A another
   request. It calls wakeup on the Rendez on which A is trying to
   sleep. I insist on the fact proc A is 'trying' because the
   scheduler has still not switched Mach.up and thus Mach 1 still has
   a ref on Proc A.

   In the function wakeup(), proc X locks r (the Rendez), then locks
   r->p which is a pointer to proc A, and then calls ready() on it
   before unlocking p and r.
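
The locking order just described can be sketched as follows. This is
only an illustration: the types are simplified, a "lock" is a plain
flag, toyready() merely counts calls, and clearing r->p and p->r is an
assumption about what the real wakeup() does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types; a "lock" is just a flag here. */
enum { Running, Ready, Wakeme };

typedef struct Proc Proc;
typedef struct Rendez Rendez;

struct Rendez {
    int   locked;
    Proc *p;        /* proc sleeping on this Rendez, if any */
};

struct Proc {
    int     rlocked;
    int     state;
    Rendez *r;      /* Rendez the proc sleeps on, if any */
    int     nready; /* how many times it has been readied */
};

static void toylock(int *l)   { assert(*l == 0); *l = 1; }
static void toyunlock(int *l) { assert(*l == 1); *l = 0; }

/* Stand-in for ready(): mark the proc runnable and count the call. */
static void
toyready(Proc *p)
{
    p->state = Ready;
    p->nready++;
}

/* Lock r, then r->p, ready the sleeper, then unlock p and r. */
Proc*
toywakeup(Rendez *r)
{
    Proc *p;

    toylock(&r->locked);
    p = r->p;
    if(p != NULL){
        toylock(&p->rlocked);
        r->p = NULL;    /* assumption: the sleeper is detached here */
        p->r = NULL;
        toyready(p);
        toyunlock(&p->rlocked);
    }
    toyunlock(&r->locked);
    return p;
}
```

Note that nothing in this path asks whether the Mach that put the proc
to sleep has finished with it. The two locks are the only barrier.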

At this point, we have proc A in the run queue, and still in the
scheduler of Mach 1. We can then imagine that a third cpu (Mach 2)
schedules this proc (i.e. dequeues it from the run queue, changes its
state to Running and calls gotolabel(&p->sched)), executing proc A
code on proc A stack ...

This is the bug I think. Because in the meantime, Mach 1 can continue
its execution in the schedinit() function, with proc A state now set
to Running (by Mach 2). And Mach 1 would thus call ready() on proc A
and even schedule and execute it.
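
The interleaving above can be replayed deterministically in a small
model. This is a sketch under simplifying assumptions: ToyProc and all
the function names are hypothetical, the run queue is reduced to a
counter, and each call stands for one Mach taking one step.

```c
#include <assert.h>

enum { Running, Ready, Wakeme };

/* Hypothetical bookkeeping for proc A; not the kernel's Proc. */
typedef struct {
    int state;
    int nqueued;    /* entries for A in the run queue */
    int nrunning;   /* Machs executing A's code on A's stack */
} ToyProc;

/* Mach 1: sleep() commits and jumps to m->sched, locks released. */
static void
sleepcommit(ToyProc *a)
{
    a->state = Wakeme;
    a->nrunning--;
}

/* Mach 0: wakeup() readies the sleeper. */
static void
wakeupready(ToyProc *a)
{
    if(a->state == Wakeme){
        a->state = Ready;
        a->nqueued++;
    }
}

/* Any Mach: the scheduler dequeues A and runs it. */
static void
runproc(ToyProc *a)
{
    if(a->nqueued > 0){
        a->nqueued--;
        a->state = Running;
        a->nrunning++;
    }
}

/* Mach 1: schedinit() finally looks at up->state. */
static void
schedinitcheck(ToyProc *a)
{
    if(a->state == Running)
        a->nqueued++;       /* the ready(up) call */
}

/* Replay the interleaving; returns how many Machs end up running A. */
int
replayrace(void)
{
    ToyProc a = { Running, 0, 1 };  /* A is running on Mach 1 */

    sleepcommit(&a);        /* Mach 1, end of step 2 */
    wakeupready(&a);        /* Mach 0, step 3 */
    runproc(&a);            /* Mach 2 picks A up */
    schedinitcheck(&a);     /* Mach 1 sees Running, readies A again */
    runproc(&a);            /* A is scheduled a second time */
    return a.nrunning;
}
```

In this model replayrace() returns 2: proc A ends up executing on two
Machs at once.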

I know this is unlikely to happen because sleep() goes to the
scheduler with splhi and there is no reason why it could not process
the schedinit() "if (up) {...}" statement before another cpu/Mach calls
the function wakeup() which itself calls ready() on proc A. But the
problem is that 9vx splhi() is empty ... and cpu/Machs, which are
really pthreads, are scheduled by the host operating system (Linux,
BSD, ...). In fact, there is even a 0.000001% (I cannot calculate
this, to be honest) (non-)chance this can happen on real hardware.

I updated the functions sleep() and schedinit() so the function
schedinit() unlocks both p and p->r in "if (up) { ... }" statement.

Since this change, I no longer get the *double sleep* error, nor
the segmentation faults.
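
The effect of the change can be checked in a small model of the same
interleaving. This is a sketch under simplifying assumptions: ToyProc
and the function names are hypothetical, a single flag stands for both
r and up->rlock, and a blocked wakeup() is modelled as a call that
returns 0 instead of spinning.

```c
#include <assert.h>

enum { Running, Ready, Wakeme };

/* Hypothetical bookkeeping for proc A; not the kernel's Proc. */
typedef struct {
    int locked;     /* stands for both r and up->rlock */
    int state;
    int nqueued;    /* entries for A in the run queue */
    int nrunning;   /* Machs executing A's code on A's stack */
} ToyProc;

/* Mach 1: sleep() commits but keeps the locks held. */
static void
sleepcommit(ToyProc *a)
{
    a->locked = 1;
    a->state = Wakeme;
    a->nrunning--;
}

/* Mach 0: wakeup(); returns 0 while it would still be spinning. */
static int
trywakeup(ToyProc *a)
{
    if(a->locked)
        return 0;
    if(a->state == Wakeme){
        a->state = Ready;
        a->nqueued++;
    }
    return 1;
}

/* Mach 1: schedinit() releases the locks in the Wakeme case. */
static void
schedinitcheck(ToyProc *a)
{
    switch(a->state){
    case Running:
        a->nqueued++;       /* the ready(up) call */
        break;
    case Wakeme:
        a->locked = 0;
        break;
    }
}

/* Any Mach: the scheduler dequeues A and runs it. */
static void
runproc(ToyProc *a)
{
    if(a->nqueued > 0){
        a->nqueued--;
        a->state = Running;
        a->nrunning++;
    }
}

/* Returns how many Machs end up running A, or -1 if wakeup got in early. */
int
replayfixed(void)
{
    ToyProc a = { 0, Running, 0, 1 };   /* A is running on Mach 1 */
    int early;

    sleepcommit(&a);            /* Mach 1 */
    early = trywakeup(&a);      /* Mach 0 arrives early and is blocked */
    schedinitcheck(&a);         /* Mach 1 sees Wakeme and unlocks */
    trywakeup(&a);              /* Mach 0 now readies A */
    runproc(&a);                /* Mach 2 runs A */
    runproc(&a);                /* nothing left in the run queue */
    return early == 0 ? a.nrunning : -1;
}
```

In this model replayfixed() returns 1: wakeup() cannot ready proc A
until schedinit() has seen the Wakeme state, so A runs on one Mach
only.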

What do you think ?

Phil;


/*
 *  sleep if a condition is not true.  Another process will
 *  awaken us after it sets the condition.  When we awaken
 *  the condition may no longer be true.
 *
 *  we lock both the process and the rendezvous to keep r->p
 *  and p->r synchronized.
 */
void
sleep(Rendez *r, int (*f)(void*), void *arg)
{
    int s;

    s = splhi();
    lock(r);
    lock(&up->rlock);
    if(r->p){
        print("double sleep called from %#p, %lud %lud\n",
             getcallerpc(&r), r->p->pid, up->pid);
        dumpstack();
    }
    /*
     *  Wakeup only knows there may be something to do by testing
     *  r->p in order to get something to lock on.
     *  Flush that information out to memory in case the sleep is
     *  committed.
     */
    r->p = up;
    if((*f)(arg) || up->notepending){
        /*
         *  if condition happened or a note is pending
         *  never mind
         */
        r->p = nil;
        unlock(&up->rlock);
        unlock(r);
    } else {
        /*
         *  now we are committed to
         *  change state and call scheduler
         */
        up->state = Wakeme;
        up->r = r;

        procsave(up);
        if(setlabel(&up->sched)) {
            /*
             *  here when the process is awakened
             */
            procrestore(up);
            spllo();
        } else {
            /*
             *  here to go to sleep (i.e. stop Running)
             */
            // xigh: move unlocking to schedinit()
            // unlock(&up->rlock);
            // unlock(r);
            gotolabel(&m->sched);
        }
    }
    if(up->notepending) {
        up->notepending = 0;
        splx(s);
        if(up->procctl == Proc_exitme && up->closingfgrp)
            forceclosefgrp();
        error(Eintr);
    }
    splx(s);
}

/*
 * Always splhi()'ed.
 */
void
schedinit(void)        /* never returns */
{
    Edf *e;

    setlabel(&m->sched);
    if(up) {
        if((e = up->edf) && (e->flags & Admitted))
            edfrecord(up);
        m->proc = 0;
        switch(up->state) {
        case Running:
            ready(up);
            break;
        case Wakeme:
            unlock(&up->rlock);
            unlock(up->r);
            break;
        case Moribund:
            // ...
            break;
        }
        up->mach = nil;
        updatecpu(up);
        up = nil;
    }
    sched();
}