From mboxrd@z Thu Jan  1 00:00:00 1970
From: presotto@plan9.bell-labs.com
To: 9fans@cse.psu.edu
Subject: Re: [9fans] sleep(), sched() and ilock()
MIME-Version: 1.0
Content-Type: multipart/mixed;
	boundary="upas-qmrztztwbtehutpkjwtkjwbfbj"
Message-Id: <20001110130656.CD862199EA@mail.cse.psu.edu>
Date: Fri, 10 Nov 2000 08:06:55 -0500
Topicbox-Message-UUID: 273f5108-eac9-11e9-9e20-41e7f4b1d025

This is a multi-part message in MIME format.
--upas-qmrztztwbtehutpkjwtkjwbfbj
Content-Disposition: inline
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit

I can't figure out what you're asking.  ilock() keeps interrupts from happening.
Therefore, unless you explicitly call sched() in the ilock'd code, you can't
move to a new processor because nothing else can run to switch the context.
Therefore, the ilock() will complete on the same processor.

If you do call sched() you're making a mistake.  You will give up the processor
with the lock still held and any interrupt that needs the lock will deadlock.
That's the reason ilock exists as opposed to lock.
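
As a rough illustration of the point above, here is a toy single-CPU model in
C.  It is only a sketch, not the kernel source: toy_lock, toy_ilock,
toy_iunlock, raise_interrupt and the counters are all made-up names, and a
"deadlock" is merely counted instead of spinning forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model; hypothetical names, not the Plan 9 kernel source. */
static bool intr_enabled = true;	/* models the processor's spl level */
static bool lock_held = false;		/* one lock shared with an interrupt routine */
static int pending = 0;			/* interrupts that arrived while masked */
static int handled = 0;			/* handler runs that got the lock */
static int deadlocks = 0;		/* handler runs that found the lock held */

/* The interrupt routine needs the same lock as the process-level code. */
static void handler(void)
{
	if (lock_held) {
		/* A real handler would spin here forever on a uniprocessor. */
		deadlocks++;
		return;
	}
	lock_held = true;
	handled++;
	lock_held = false;
}

/* The device raises an interrupt; it is deferred while interrupts are off. */
static void raise_interrupt(void)
{
	if (intr_enabled)
		handler();
	else
		pending++;
}

/* Plain lock: interrupts stay enabled inside the critical section. */
static void toy_lock(void) { lock_held = true; }
static void toy_unlock(void) { lock_held = false; }

/* ilock-style: mask interrupts first, then take the lock; return old state. */
static bool toy_ilock(void)
{
	bool old = intr_enabled;
	intr_enabled = false;
	lock_held = true;
	return old;
}

/* iunlock-style: drop the lock, restore the state, run deferred interrupts. */
static void toy_iunlock(bool old)
{
	lock_held = false;
	intr_enabled = old;
	while (intr_enabled && pending > 0) {
		pending--;
		handler();
	}
}

/* An interrupt hits a plain-locked section: returns the deadlock count. */
static int demo_plain_lock(void)
{
	deadlocks = 0;
	toy_lock();
	raise_interrupt();
	toy_unlock();
	return deadlocks;
}

/* The same interrupt hits an ilock'd section: it waits until iunlock. */
static int demo_ilock(void)
{
	bool old;

	deadlocks = 0;
	handled = 0;
	old = toy_ilock();
	raise_interrupt();
	toy_iunlock(old);
	return deadlocks;
}
```

In this model demo_plain_lock() reports one would-be deadlock, while
demo_ilock() reports none because the handler only runs after toy_iunlock().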

The same is true of the splhi and the locks around sleep.  The splhi is
there because wakeup can be called from interrupt level and hence the
process can't give up the processor until the locks are released.  The
splx is necessary because we may not sleep and we need to return in the
same state we arrived.  An spllo there would probably be good enough since
we don't expect sleep to be called splhi except by accident.
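
The difference between splx and spllo on the path that doesn't sleep can be
sketched with another toy model.  Again this is an assumption-laden
illustration, not the real sleep(): the spl variable, toy_splhi, toy_splx and
toy_sleep are invented for the example, and the locks appear only as comments.

```c
#include <assert.h>
#include <stdbool.h>

enum { LO = 0, HI = 1 };
static int spl = LO;		/* models the current interrupt level */

/* Raise to HI and hand back the previous level, like splhi(). */
static int toy_splhi(void)
{
	int old = spl;
	spl = HI;
	return old;
}

/* Restore a level previously returned by toy_splhi(), like splx(). */
static void toy_splx(int old)
{
	spl = old;
}

/*
 * Returns true if the caller would have given up the processor.
 * Either way the caller leaves at the level it arrived with.
 */
static bool toy_sleep(bool condition_already_true)
{
	int x = toy_splhi();	/* wakeup may run at interrupt level */

	/* the two locks would be taken here */
	if (condition_already_true) {
		/* the locks would be released here; no sleep needed */
		toy_splx(x);	/* return in the state we arrived in */
		return false;
	}
	/* the locks would be released and the processor given up here; */
	/* a process coming out of rescheduling comes out LO */
	spl = LO;
	toy_splx(x);
	return true;
}
```

If toy_sleep() is entered at HI by accident, toy_splx(x) leaves the caller at
HI; an unconditional spllo in the same place would leave it at LO instead.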

--upas-qmrztztwbtehutpkjwtkjwbfbj
Content-Type: message/rfc822
Content-Disposition: inline

Received: from plan9.cs.bell-labs.com ([135.104.9.2]) by plan9; Thu Nov  9 18:29:22 EST 2000
Received: from mail.cse.psu.edu ([130.203.4.6]) by plan9; Thu Nov  9 18:29:20 EST 2000
Received: from psuvax1.cse.psu.edu (psuvax1.cse.psu.edu [130.203.30.6])
	by mail.cse.psu.edu (CSE Mail Server) with ESMTP
	id 6E88E199EA; Thu,  9 Nov 2000 18:29:09 -0500 (EST)
Received: from alfa.comberg.cz (web.comberg.cz [212.24.142.98])
	by mail.cse.psu.edu (CSE Mail Server) with ESMTP id C95C8199E4
	for <9fans@cse.psu.edu>; Thu,  9 Nov 2000 18:28:27 -0500 (EST)
Received: from beta (datela-1-4-19.dialup.vol.cz [212.20.98.53])
	by alfa.comberg.cz (8.9.3/8.9.3/Debian/GNU) with SMTP id AAA11613
	for <9fans@cse.psu.edu>; Fri, 10 Nov 2000 00:45:58 +0100
Message-ID: <001901c022bc$58778dc0$356214d4@cz99.cz>
From: "Jakub Jermar" <jj@comberg.cz>
To: <9fans@cse.psu.edu>
References: <20001109211530.9CCDB199E4@mail.cse.psu.edu>
Subject: Re: [9fans] sleep(), sched() and ilock()
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-2"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2314.1300
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
Sender: 9fans-admin@cse.psu.edu
Errors-To: 9fans-admin@cse.psu.edu
X-BeenThere: 9fans@cse.psu.edu
X-Mailman-Version: 2.0rc1
Precedence: bulk
Reply-To: 9fans@cse.psu.edu
List-Id: Fans of the O/S Plan 9 from Bell Labs <9fans.cse.psu.edu>
List-Archive: <http://lists.cse.psu.edu/archives/9fans/>
Date: Wed, 20 Sep 2000 06:35:33 +0200

>> First. Am I right in supposing that the reason ilock() doesn't call the
>> scheduler is that it has to be sure that iunlock() executes on the same cpu?
>
> In general, if you are doing an ilock, it should be for something that
> can be done quickly.  ilock is used when you're locking something that
> an interrupt routine also uses and scheduling in the middle of it would be
> deadly.

I understand that there can be a deadlock if both a process and an interrupt
race for one lock - therefore the concept of ilock(). I also understand why
ilock() dies when it finds the ilock locked on a uni-processor - because no
one can unlock it and, seen from the other side, ilock() must have been
called twice without first calling iunlock().
But I still don't know whether iunlock() executing on the same cpu is a
goal (perhaps in order to restore the processor state as it was before
ilock()?) or a mere consequence of the fact that there was no rescheduling
in between.

>> Second. Why doesn't sleep() simply call sched() right after it releases both
>> locks? Am I unaware of some possible race condition?
>
> Off hand, I can't see any particular reason other than tunnel vision
> on our part.  Looks like all we're doing is duplicating code with the
> net effect of saving 2 subroutine calls (splhi and sched).  I may try
> it the other way and have Gerard Holzmann verify it again and see if
> I'm missing something myself.

That's interesting. Sleep is an example of a routine that goes to the
scheduler splhi()'ed (as you refer to in your comment to my third point).
Furthermore, it emulates the behaviour of normal sched() after the process
resumes its p->sched: it spllo()'s, but immediately afterwards there is
splx(x), which cancels the effect of spllo(), no matter what x was.
Actually, this function makes me think there must be a reason for it to
duplicate a piece of sched(), but I can't figure out what it is. I thought it
was because of the two locks being held, but releasing them before
gotolabel(&m->sched) is as necessary as it would have been if sleep() had
called the normal sched().

>> Third. It seems to me that sleep() might disable interrupts on a different
>> cpu than the one that enables them again before sleep() finishes. Is it
>> ok when a process brings the old processor state to the new current
>> processor and leaves the old one in (say) the splhi()'d state?
>
> Once you go frolicking through 'gotolabel(&m->sched)', the spl level
> is lost, i.e., schedinit always starts hi and returns low: any
> process coming out of a rescheduling comes out spllo().  That
> means that calling sched() when you are splhi() is wrong.  This
> refers back to your first point.  The processor that you did the sleep
> on will go spllo while it's looking for a new process to run and will
> start that process up spllo.

But when sched() does gotolabel(&up->sched) after runproc() picks a new
process, it restores that process's PC and SP. The SP points to the process's
stack, which holds the process's x (the variable from sleep()) with the
previous processor state, and sleep() then restores that state on the
current cpu.
I believe this is how sleeping and waking processes carry processor state
with them. It doesn't matter if the two processors are not the same - I
realized the functionality must be the same - it works just as though there
was no processor switch (the emphasis must be on the code and not the
processor). If the process just gets rescheduled, it leaves sched()
spllo()'ed - just as you said.
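
To make the picture concrete, here is a small C sketch of the idea. It is only
my toy model, not the kernel code: struct cpu, struct proc, saved_x and the
three functions are hypothetical names, with saved_x standing for the local x
that lives on the process's stack.

```c
#include <assert.h>

enum { LO = 0, HI = 1 };

struct cpu { int spl; };	/* per-processor interrupt level */
struct proc { int saved_x; };	/* stands for x on the process's stack */

/* Entry to sleep(): remember the caller's level, then go HI on this cpu. */
static void sleep_enter(struct cpu *c, struct proc *p)
{
	p->saved_x = c->spl;
	c->spl = HI;
}

/* The cpu that was given up goes LO while it looks for another process. */
static void scheduler_idle(struct cpu *c)
{
	c->spl = LO;
}

/* The process resumes, possibly on another cpu: LO first, then splx(x). */
static void sleep_resume(struct cpu *c, struct proc *p)
{
	c->spl = LO;
	c->spl = p->saved_x;
}
```

With two cpus a and b, a process that enters on a and resumes on b leaves b at
whatever level it had when it called sleep_enter(), while a ends up LO.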

Jakub Jermar

--upas-qmrztztwbtehutpkjwtkjwbfbj--