Pentium Pro and coherence

9fans - fans of the OS Plan 9 from Bell Labs

* Pentium Pro and coherence
@ 1997-04-21 21:18 miller
From: miller @ 1997-04-21 21:18 UTC


I'm sorry if my clumsy editing gave the impression that my summary
of the content of a message from presotto@plan9.bell-labs.com:

	> [a fascinating account of how the Pentium Pro's out-of-order
	> instruction execution breaks the Plan 9 sleep/wakeup code on
	> a multi-CPU system]

was meant to be a direct quote.  The words in editorial square brackets
are mine, not his.  I haven't quite got the hang of this new-fangled
electronical mailing business ...

Anyway, it is certainly plausible that speculative reads are the
problem.  The essential bit of code seems to be the equivalent of the 
following fragment of Alef:

	par {
		{ a = 1; x = b; }
		{ b = 1; y = a; }
	}

It is `obvious' that this establishes the postcondition (x==1 || y==1).
Informal proof: if not (x==1), then (x=b) must have been executed before
(b=1), and therefore (a=1) must have been executed before (y=a), so that
we must have (y==1).  But this reasoning depends on the left-to-right
ordering of sequential assignments.  If that's not guaranteed on the
Pentium Pro, then shared-variable concurrency without locks becomes
very scary indeed.

presotto@plan9.bell-labs.com writes [his words this time]:

> Forcing an interlock
> at both the beginning and end of a locked section seems to be
> pretty conservative to me [...] The
> truth is that we have no way of knowing whether we're conservative
> enough.

I hope this is overly pessimistic.  In order for locked critical
sections to work at all, the hardware must be able to guarantee
that the ordering of effects before and after the locking instruction
will be preserved.  Or is this too naive?

-- Richard Miller





* Pentium Pro and coherence
@ 1997-04-21 14:33 presotto
From: presotto @ 1997-04-21 14:33 UTC

Sorry for yet another long message...

→	From: hamnavoe.demon.co.uk!miller
	To: cse.psu.edu!9fans
	Subject: Re: porting linux programs and drivers to plan9

	presotto@plan9.bell-labs.com writes:

	> [a fascinating account of how the Pentium Pro's out-of-order
	> instruction execution breaks the Plan 9 sleep/wakeup code on
	> a multi-CPU system]

I didn't write those words.  I may have written what
accompanied them but not having seen the message, I don't
know.

The exact ordering I gave in my last mail was impossible
because of the locks.  An equally illustrative
(and this time actually possible) version follows.  

	/* left column: wakeup() on one CPU; right column: sleep() on another */
	wakeup_condition = 1;

			p = u->p;
			lock(&p->rlock);
			r->p = p;		/* put myself in the rendezvous structure */
		A:	if(wakeup_condition){
				r->p = 0;		/* no need to sleep */
				unlock(&p->rlock);
				return;
			} else {

				/* go to sleep */
				p->state = Wakeme;
				p->r = r;
				unlock(&p->rlock);

	p = r->p;
B:	if(p == 0)
		return;
	lock(&p->rlock);
	if(r->p == p && p->r == r){
		r->p = 0;
		p->r = 0;
		ready(p);
	}
	unlock(&p->rlock);

				sched();
			}

The ordering of the critical instructions is the same but at least
this time I got the ordering of the locked pieces right.  The critical
points are A and B.  With speculative reads, both r->p and
wakeup_condition may appear to be 0 (depending on what lock()
does or doesn't do).

→	It appears that the slightly different version of sleep/wakeup
	given in the Volume 2 paper `Process Sleep and Wakeup on a
	Shared-memory Multiprocessor' should be immune to the effects
	of weak memory coherency, because the shared variables are
	referenced only inside a lock/unlock pair.  Is this right?

I'm not sure.  It depends a bit on what we believe fixes
the coherence.  We don't really know what's happening inside the
Pro; we're just guessing.  We're not even certain that speculative
reads are the problem.  The Pro people have remained silent
on the subject (we've sent email).

Assuming that it was indeed speculative reads, the simplest mechanism
that I can imagine Intel providing is to cancel speculative
reads whenever an interlocking instruction is encountered.
If this is indeed the case, then keeping everything between locks
would be sufficient.

( Unfortunately, we don't do that
  because of the interaction between postnote and sleep/wakeup.  Postnote
  doesn't know what r is without first looking at p->r outside of any
  possible lock.  We could fix sleep/wakeup by moving the problem so
  as to be between sleep and postnote.  However, it'd be the same
  problem.  This is perhaps another story. )

Of course, I could be totally wrong about the speculative reads and
it may be the interlock instruction on the writer and not the
reader that causes the processors to become coherent.  In that case, at the
very least, we'd have to make unlock() end with an interlocking
instruction.  The released version just sets 'l->val = 0'.

We have discovered empirically that performing an interlock instruction
between setting one shared variable and looking at the other seems
sufficient.  Nothing less seemed to work for us.  Putting everything
back inside the locks might have worked but we didn't because of
postnote().

Since we're paranoid, we now perform an interlocking instruction
before checking the state variables in sleep() and wakeup() AND
at the end of unlock().  Everywhere else, we seem to be following
a strict policy of changing or looking at shared things only inside
lock/unlock.

→	Perhaps the moral is that it's better to be conservative with
	locks than to trust hardware designers to do what we expect.

I certainly agree.  We are going to encounter more relaxed ordering
in multiprocessors.  The question is, what do the hardware
designers consider conservative?  Forcing an interlock
at both the beginning and end of a locked section seems to be
pretty conservative to me, but I clearly am not imaginative
enough.  The Pro manuals go into excruciating detail in describing
the caches and what keeps them coherent but don't seem to care
to say anything detailed about execution or read ordering.  The
truth is that we have no way of knowing whether we're conservative
enough.

