9fans - fans of the OS Plan 9 from Bell Labs
* calling sleep() while holding lock() (fwd)
From: G.David @ 1997-05-20 19:14 UTC


Sorry for letting this sit, but life happens...

Back to our story.

From ivan@ncube.com (Eivind Sarto):

>> ---------- Forwarded message ----------
>> From: "G. David Butler" <gdb@dbSystems.com>
>> 
>> A summary of the changes:
>> 
>
>Wow!  That was quick.

Thanks.

>As I mentioned in my previous mail, there are some locking violations
>in the VM code that only show up under heavy load.

Or not so heavy load....

snip....

>These are caused by uncachepage() being called with one or more locks
>held.  uncachepage() can call putimage(), which may close the chan.

Yes, you are very right!

>The solution I implemented was to make putimage not close the chan.  If
>it was the last reference to the image, it returns the Chan*, otherwise
>it returns NULL.  uncachepage needs to be slightly recoded and it must
>also return the Chan* pointer back to its caller.  Whoever called
>uncachepage can then close the Chan after locks have been released (if
>it was the last reference to the image).

I agree with your approach.  The only place that is a little hard
is the end of duppage() in page.c.  For the moment I have a panic
to guard the double Chan return.  If I ever see one I will have
to fix it right.  A reading of the code says a panic is possible.
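
As a rough illustration of the approach described above, here is a
user-space sketch, not the actual page.c code.  Every name in it is a
hypothetical stand-in: spinlock()/spinunlock() and the nheld counter
model spin locks, chanclose() models closing a Chan (which may sleep),
and the _deferred functions model the recoded putimage and uncachepage.
The real structures carry far more state.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the kernel types. */
typedef struct Lock { int held; } Lock;
typedef struct Chan { int closed; } Chan;
typedef struct Image { Lock lk; int ref; Chan *c; } Image;
typedef struct Page { Lock lk; Image *image; } Page;

static int nheld;	/* number of simulated spin locks currently held */

static void spinlock(Lock *l) { assert(!l->held); l->held = 1; nheld++; }
static void spinunlock(Lock *l) { assert(l->held); l->held = 0; nheld--; }

/* Stand-in for closing a chan, which may sleep: never legal under a spin lock. */
static void
chanclose(Chan *c)
{
	assert(nheld == 0);
	c->closed = 1;
}

/* Drop a reference; hand back the Chan instead of closing it here. */
static Chan*
putimage_deferred(Image *i)
{
	Chan *c = NULL;

	spinlock(&i->lk);
	if(--i->ref == 0){
		c = i->c;
		i->c = NULL;
	}
	spinunlock(&i->lk);
	return c;
}

/* Detach the page from its image and pass any Chan up to the caller. */
static Chan*
uncachepage_deferred(Page *p)
{
	Chan *c;

	if(p->image == NULL)
		return NULL;
	c = putimage_deferred(p->image);
	p->image = NULL;
	return c;
}

/* A caller: close the Chan only after its own lock has been released. */
static void
releasepage(Page *p)
{
	Chan *c;

	spinlock(&p->lk);
	c = uncachepage_deferred(p);
	spinunlock(&p->lk);
	if(c != NULL)
		chanclose(c);
}
```

The assert in chanclose() plays the role of a check for sleeping with
a spin lock held.  A caller that could receive two Chans back, as at
the end of duppage(), would need to remember both and close each after
unlocking; the sketch does not cover that case.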

>These changes prevented me from making radical changes to the page code.
>Just some minor changes to putimage, uncachepage and whoever calls them.

Are radical changes necessary, at least from this perspective?

>I think we fixed a couple of locking violations in the streams code, too.
>One place even has a comment about holding spin-lock and sleeping.

I don't see that comment.  I've been using the patches I sent before
to find violations, but haven't seen any there.

>I'll be happy to answer any questions.

Ok, here is the next big one.  What happens when an interrupt handler
needs a lock that the non-interrupt code holds at spllo()?  Can you
say "lock loop"?  It seems some care is needed to find every spin
lock that can be acquired in an interrupt handler and make sure the
base code uses ilock, which guards the lock with splhi().  Now that
I have fixed most of the sleeps under spin locks, I can put
considerable load on the system, and the next thing that breaks is
lock loops.

This problem exists on all platforms, both uniprocessor and
multiprocessor.  If an interrupt handler preempts the critical code
on the cpu that holds the lock, and then needs that same lock, you
get a lock loop.  It is ok for the interrupt to occur on another cpu,
since the locked critical code will continue and release the lock
while the interrupt handler spins.
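
To make the single-cpu case concrete, here is a small user-space
simulation, a sketch under stated assumptions rather than kernel code.
The intson flag stands in for the cpu's interrupt-enable state,
interrupt() is a hypothetical model of a device interrupt whose handler
wants the lock, and the ilock()/iunlock() shown are simplified to
"splhi(), take the lock, remember the old level".  A real ilock would
spin on the lock; the simulation asserts instead because nothing else
can run.

```c
#include <assert.h>

typedef struct Lock { int held; int saved; } Lock;

enum { Deferred, Ran, LockLoop };

static int intson = 1;	/* simulated interrupt-enable state of the one cpu */

/* Disable interrupts and return the previous state. */
static int splhi(void) { int old = intson; intson = 0; return old; }
/* Restore a previously saved interrupt state. */
static void splx(int old) { intson = old; }

/* Plain spin lock: interrupts stay as they were. */
static void spinlock(Lock *l) { assert(!l->held); l->held = 1; }
static void spinunlock(Lock *l) { assert(l->held); l->held = 0; }

/* Interrupt-safe lock: raise priority first, then take the lock. */
static void
ilock(Lock *l)
{
	int s = splhi();

	assert(!l->held);	/* a real ilock would spin here */
	l->held = 1;
	l->saved = s;
}

static void
iunlock(Lock *l)
{
	int s = l->saved;

	l->held = 0;
	splx(s);
}

/* Model an interrupt arriving now, with a handler that needs l. */
static int
interrupt(Lock *l)
{
	if(!intson)
		return Deferred;	/* masked: delivered after splx() */
	if(l->held)
		return LockLoop;	/* handler spins; the holder never resumes */
	l->held = 1;		/* handler's critical section */
	l->held = 0;
	return Ran;
}

/* Base code using a plain lock, interrupted mid critical section. */
static int
basewithlock(Lock *l)
{
	int r;

	spinlock(l);
	r = interrupt(l);
	spinunlock(l);
	return r;
}

/* Same base code, but guarded with ilock. */
static int
basewithilock(Lock *l)
{
	int r;

	ilock(l);
	r = interrupt(l);
	iunlock(l);
	return r;
}
```

In this model basewithlock() reports LockLoop and basewithilock()
reports Deferred, which is the distinction drawn above: the interrupt
cannot preempt the holder, so the handler never spins on a lock held
by the code it interrupted.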

Before I start this journey, can you lend further pointers?

>ivan@ncube.com

Thanks for the help.

David Butler
gdb@dbSystems.com



