From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4C125582.4050703@bouyapop.org>
Date: Fri, 11 Jun 2010 17:25:54 +0200
From: Philippe Anel
User-Agent: Thunderbird 2.0.0.24 (X11/20100318)
MIME-Version: 1.0
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
References: <4C1242CD.5020202@bouyapop.org> <4C124E2C.7010008@bouyapop.org> <70d80f50da355772daa7d21f195c7b4b@kw.quanstro.net> <4C12549C.4070701@bouyapop.org>
In-Reply-To: <4C12549C.4070701@bouyapop.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [9fans] 9vx, kproc and *double sleep*
Topicbox-Message-UUID: 31971408-ead6-11e9-9d60-3106f5b1d025

Oops ... sorry for the double copy :)

The post was supposed to be:

I have never seen it on real hardware, but I think that does not mean
it cannot happen. The problem in 9vx comes from the fact that 9vx Machs
are simulated by pthreads, which can be preempted just before calling
gotolabel() in sleep(). This gives another Mach (or pthread) time to
ready proc A.

I think it does not happen on real hardware because the cpu does not
stop while calling gotolabel(); it goes straight on to execute the
scheduler. It does not happen because the cpu is not interrupted
(thanks to splhi). But still, I feel the problem is there, and we can
imagine ... why not, the cpu running proc A blocking on a bus request
or something else.

I don't know if the model is good or not ... and as I wrote, this is
only a thought experiment ... with my poor brain :)

Phil;

Philippe Anel wrote:
>
> I never seen it on real hardware but I think it does not mean it
> cannot happen. The problem in 9vx comes from the fact 9vx Mach are
> simulated by pthreads which can be scheduled just before calling
> gotolabel in sleep(). This gives the time to another Mach (or pthread)
> to 'readies' the proc A.
>
> I think it does not happen on real hardware because the cpu just don't
> stop while calling gotolabel() and executes the scheduler. It does not
> happen because the cpu is not interrupted (thanks to splhi). But still,
> I feel the problem is here, and we can imagine ... why not, the cpu
> running proc A blocking on a bus request or something else.
>
> I don't know if the model is good or not ... and I wrote this is only
> a thought experiment ... with my poor brain :)
>
> Phil;
>
>
> erik quanstrom wrote:
>> On Fri Jun 11 10:54:40 EDT 2010, xigh@bouyapop.org wrote:
>>
>>> I don't think either splhi fixes the problem ... it only hides it
>>> for the 99.999999999% cases.
>>>
>>
>> on a casual reading, i agree. unfortunately,
>> the current simplified promela model disagrees,
>> and coraid has run millions of cpu-hrs on quad
>> processor machines running near 100% load
>> with up to 1500 procs, and never seen this.
>>
>> unless you have a good reason why we've never
>> seen such a deadlock, i'm inclined to believe
>> we're missing something. we need better reasons
>> for sticking locks in than guesswork.
>> multiple locks can easily lead to deadlock.
>>
>> have you tried your solution with a single Mach?
>>
>>
>>> No ... I don't think so.
>>> I think the problem comes from the fact the
>>> process is no longer exclusively tied to the current Mach when going
>>> (back) to schedinit() ... hence the change I did.
>>>
>>
>> have you tried? worst case is you'll have more
>> information on the problem.
>>
>> - erik
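
The interleaving described at the top of this message can be written
down as a small toy model. This is only a sketch of the thought
experiment, not Plan 9 or 9vx code: Proc, machs_on_stack and simulate
are hypothetical names, the state names are borrowed loosely from the
kernel, and the model assumes that the only thing that varies is
whether Mach 0 is delayed between marking proc A as sleeping and
reaching gotolabel().

```python
# Toy model of the interleaving; not Plan 9 or 9vx code.
# All names here (Proc, machs_on_stack, simulate) are hypothetical.

class Proc:
    def __init__(self):
        self.state = "Running"
        # Mach 0 is executing on this proc's kernel stack.
        self.machs_on_stack = 1


def simulate(delayed_before_gotolabel):
    """Return the largest number of Machs ever on proc A's stack at once."""
    a = Proc()
    worst = a.machs_on_stack

    # Mach 0, in sleep(): proc A marks itself as waiting.
    a.state = "Wakeme"

    if not delayed_before_gotolabel:
        # Mach 0 reaches gotolabel() and leaves A's stack straight away.
        a.machs_on_stack -= 1

    # Mach 1, in wakeup(): readies proc A.
    if a.state == "Wakeme":
        a.state = "Ready"

    # Mach 1, in its scheduler: picks the ready proc and runs it.
    if a.state == "Ready":
        a.state = "Running"
        a.machs_on_stack += 1
    worst = max(worst, a.machs_on_stack)

    if delayed_before_gotolabel:
        # Mach 0 finally resumes and performs the switch, too late.
        a.machs_on_stack -= 1

    return worst


print(simulate(False))  # 1
print(simulate(True))   # 2
```

A result of 2 means that, in this model, two Machs were on proc A's
stack at the same moment, which is the situation suspected in 9vx when
the pthread is preempted just before gotolabel(). The model says
nothing about how likely that delay is on real hardware.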