From mboxrd@z Thu Jan  1 00:00:00 1970
From: erik quanstrom <quanstro@quanstro.net>
Date: Fri, 11 Jun 2010 11:03:19 -0400
To: 9fans@9fans.net
Message-ID: <70d80f50da355772daa7d21f195c7b4b@kw.quanstro.net>
In-Reply-To: <4C124E2C.7010008@bouyapop.org>
References: <4C1242CD.5020202@bouyapop.org>
	<AANLkTilQUS_JD8CWkcSJt_boJ1tBTG4i_xjZTfYfeyUl@mail.gmail.com>
	<f398e344c84e5946e0189ebb69638d57@kw.quanstro.net>
	<4C124E2C.7010008@bouyapop.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: Re: [9fans] 9vx, kproc and *double sleep*
Topicbox-Message-UUID: 317e4054-ead6-11e9-9d60-3106f5b1d025

On Fri Jun 11 10:54:40 EDT 2010, xigh@bouyapop.org wrote:
> I don't think splhi fixes the problem either ... it only hides it in
> 99.999999999% of cases.

on a casual reading, i agree.  unfortunately,
the current simplified promela model disagrees,
and coraid has run millions of cpu-hours on
quad-processor machines running near 100% load
with up to 1500 procs, and has never seen this.

unless you have a good reason why we've never
seen such a deadlock, i'm inclined to believe
we're missing something.  we need better reasons
for sticking locks in than guesswork.
multiple locks can easily lead to deadlock.
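
a toy illustration of that last point, assuming two procs
that each need the same two locks.  this is not 9vx or
plan 9 kernel code: Toylock, toytrylock and deadlocks are
names made up for the sketch, and it replays one fixed
interleaving by hand rather than running real procs.

```c
/* toy lock: not the plan 9 Lock, just enough state to trace by hand */
typedef struct Toylock Toylock;
struct Toylock {
	int held;
};

/* take the lock if free; return 1 on success, 0 if we would have to wait */
static int
toytrylock(Toylock *l)
{
	if(l->held)
		return 0;
	l->held = 1;
	return 1;
}

/*
 * replay one fixed interleaving of two procs that each need both locks:
 *   step 1: proc 1 takes its first lock
 *   step 2: proc 2 takes its first lock
 *   step 3: proc 1 tries its second lock
 *   step 4: proc 2 tries its second lock
 * proc 1 always takes a then b.  proc 2 takes a then b if sameorder
 * is set, otherwise b then a.
 * returns 1 if both procs end up holding one lock and waiting on the other.
 */
int
deadlocks(int sameorder)
{
	Toylock a = {0}, b = {0};
	Toylock *first1 = &a, *second1 = &b;
	Toylock *first2 = sameorder ? &a : &b;
	Toylock *second2 = sameorder ? &b : &a;
	int has1, has2, done1, done2;

	has1 = toytrylock(first1);
	has2 = toytrylock(first2);
	done1 = has1 && toytrylock(second1);
	done2 = has2 && toytrylock(second2);

	return has1 && has2 && !done1 && !done2;
}
```

with opposite orders each proc holds one lock and waits
forever on the other.  with a single agreed order proc 2
simply waits, holding nothing, until proc 1 is done.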

have you tried your solution with a single Mach?

> No ... I don't think so. I think the problem comes from the fact that
> the process is no longer exclusively tied to the current Mach when going
> (back) to schedinit() ... hence the change I made.

have you tried?  worst case is you'll have more
information on the problem.

- erik