9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] dual cpu fs/auth/cpu server contd.
From: Steve Simon @ 2004-11-22  9:52 UTC (permalink / raw)
  To: 9fans

Supermicro motherboard dual P3 update:

I have replaced the RAM and PSU (and the disks, though these were
just upgraded to zippier items) but still get problems.

First off, when the board is booted in single cpu mode all is fine,
it appears to be quite reliable.

When both cpus are booted it works for a while but then I get "no procs"
scrolling up the console. The timing of this relative to boot seems random:
the board has run for 2 days, sometimes only a few minutes.

I tried a while() loop rebuilding kernels and couldn't force it
to break, and likewise when idle (apart from email arrival and
cron jobs) the machine seems stable. The problems seem to occur
when I am drawtermed to this machine and doing some work.

When I get the "no procs" scrolling, ^T^T^P shows hundreds of cron
processes running. I also noticed the clock in my drawterm session
attached to this server shows the wrong time, very wrong: 17.30 on
22nd Nov has jumped to 14.30 on 3rd Nov.

If I kill cron when I boot the machine (dual cpu) then I no longer get
"no procs", but the CPU load does go up to 100% (and it gets a bit
sluggish). So far I have only done this once, however.

jmk said they use dual P3s a lot at the labs. I wonder if my problem
has something to do with my dual P3 being a file, auth, and cpu server?
Do cron or timesync run on a dual-cpu machine at the labs? (Clutching
at straws.)
 
There were some problems reported a few months ago, to do with time
passing too fast after a suspend and resume cycle; perhaps this is related?

My next step is to try to crash the machine a few more times with cron disabled
and look for a more definite pattern - I saw the fp exception in fossil again
last night but that may be spurious.

It all smacks of andrey's problem of a year or two ago...

http://groups.google.com/groups?selm=Pine.LNX.4.44.0307171223310.7478-100000%40fbsd.cpsc.ucalgary.ca&output=gplain

but this was a boot time problem and was fixed in timesync(8), wasn't it?

Thoughts anyone?

-Steve

