9fans - fans of the OS Plan 9 from Bell Labs
* Re: [9fans] trouble with /dev/reboot and mp irqs (fix)
From: erik quanstrom @ 2009-12-10 14:31 UTC (permalink / raw)
  To: 9fans

On Tue Dec  8 13:01:13 EST 2009, nemo@lsub.org wrote:
> the shutdown code is new and might be wrong. worked here.
> I'll double check.
>

what i have been seeing is that a random machno can
execute the reboot.  this can cause the setting up of
the lapics to hang the machine.  (the mp spec doesn't
guarantee that any old processor can be the bsp, and
i'm not sure we do enough setup anyway.)  i have also
seen the clock interrupt getting in the way.  it seems
that while reboot() is running, exit() can be called
again from hzclock on the processor that's doing the
shutdown.  this causes a warm reboot rather than
a jump to the new kernel.

another problem is that there are lots of delays in the
shutdown code.  these become fiddly if we're not going
through a bios reset.  this is because if the timing is
off, an ap can survive the reset and the old kernel
can print "cpu%d: exiting" after the new kernel has
started.  fun!

i can't prove this also explains the pull-the-power
hangs i've seen ~5% of the time, but it seems likely.

so what's the fix?  i'm not sure this is the best fix, but
what i decided to do is to try to get a case that's easy
to understand and already works, the uniprocessor
case.  this seems to be working pretty well.

so i wrote a kproc to park a processor.  it
1.  procwired()s (sic) itself to the processor in question;
2.  turns off interrupts and splhi's;
3.  calls idle.

then reboot was modified to
1.  wire itself to the bsp (machno==0);
2.  halt mach 1..n;
3.  turn off interrupts.

if there's something here i have missed or apparently
don't understand, please let me know.

- erik

----

typedef struct {
	int i;
} Apshut;

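void
apshut(void *v)
{
	Apshut *a;

	a = v;
	/* wire to the target cpu, then reschedule so we are running on it */
	procwired(up, a->i);
	sched();
	/* no more interrupts on this processor */
	splhi();
	if (arch)
		arch->introff();
	else
		i8259off();
	/* tell reboot() this cpu is done */
	active.machs &= ~(1<<a->i);
	print("cpu%d: halt %.2ux\n", m->machno, active.machs);
	/* park here; with interrupts off this is not expected to return */
	idle();
}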

void
reboot(void *entry, void *code, ulong size)
{
	int i;
	Apshut a[MAXMACH];
	void (*f)(ulong, ulong, ulong);
	ulong *pdb;

	writeconf();

	procwired(up, 0);
	sched();

	for(i = 1; i < MAXMACH; i++){
		a[i].i = i;
		if(active.machs & 1<<i)
			kproc("apshutdown", apshut, a + i);
	}

	while(active.machs != 1)
		sched();

	print("cpu%d: thunderbirdsarestop %d\n", m->machno, active.machs);
	splhi();
	if (arch)
		arch->introff();
	else
		i8259off();
	print("cpu%d: shutting down...\n", m->machno);

	/* turn off buffered serial console */
	serialoq = nil;

	/* shutdown devices */
	chandevshutdown();

	/*
	 * Modify the machine page table to directly map the low 4MB of memory
	 * This allows the reboot code to turn off the page mapping
	 */
	pdb = m->pdb;
	pdb[PDX(0)] = pdb[PDX(KZERO)];
	mmuflushtlb(PADDR(pdb));

	/* setup reboot trampoline function */
	f = (void*)REBOOTADDR;
	memmove(f, rebootcode, sizeof(rebootcode));

	print("cpu%d: rebooting... %p [%p %p %lux]\n", m->machno, PADDR(reboot), PADDR(entry), PADDR(code), size);

	/* off we go - never to return */
	(*f)(PADDR(entry), PADDR(code), size);
}


