9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] Random SATA errors with SMP on a dual core machine.
@ 2009-06-02 20:22 Dan Cross
  2009-06-02 21:01 ` Steve Simon
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Dan Cross @ 2009-06-02 20:22 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Has anyone else seen this?  I am experiencing random SATA errors when
I turn on SMP on a dual core machine.

After a several-year hiatus, I just got some new hardware to build a
Plan 9 network at home.  My file server is a 1U rackmount machine with
the following hardware:

1. SuperMicro PDSML-LN2+ motherboard
   - builtin ICH7R SATA controller
   - builtin Intel 82573L Gigabit Ethernet adapter
2. 1.8GHz Dual-core Intel Core2 Duo processor
3. 2GB RAM
4. 2 x 750GB SATA drives
5. 1 x 2GB Compact Flash removable disk.

Note that this machine has neither a CD nor DVD drive.  This is
because I misread the vendor's quote: they could not fit a slimline CD
or DVD drive into the 1U chassis along with two hard drives but I
didn't realize that until I pulled the machine out of the box.  I got
around this by installing Plan 9 onto the compact flash card on
another machine that did have a CD drive, then bringing it up on this
machine.

The first problem I had was using the SATA drives; the SATA drivers in
the distributed kernel had problems, so I updated them to the latest
from Erik's directory on sources.  Specifically:

% 9fs sources
% cd /n/sources/contrib/quanstro/root/sys/src/9/pc
% cp sdata.c sdiahci.c ahci.h /sys/src/9/pc
% cd ../port
% cp devsd.c sd.h sdloop.c /sys/src/9/port
% cd ../../libfis
% mkdir /sys/src/libfis
% cp fis.h mkfile /sys/src/libfis
% cd /sys/src/libfis
% mk install

I then edited the appropriate mkfile to refer to /386/lib/libfis.a and
built the 'pcf' kernel, copied it to 9fat (on the CF card) and
rebooted.  I'm not sure that I didn't miss any steps, but I was able
to fdisk, prep and flfmt the SATA drives and load the operating system
by running the (slightly edited) installation scripts from
/sys/lib/dist/pc/inst, choosing a fossil+venti configuration.  Up to
this point, I'd only been using one core, as '*nomp=1' was set in
plan9.ini.  Everything is still running as a terminal.
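
For reference, a minimal sketch of the plan9.ini involved at this
stage.  Only the '*nomp=1' line comes from the description above; the
bootfile line is an assumed example for a kernel named 9pcf on the CF
card's 9fat partition and will differ per machine.

```
# assumed example: the locally built pcf kernel on the CF card's 9fat partition
bootfile=sdC0!9fat!9pcf
# restrict the kernel to a single processor; remove this line to enable both cores
*nomp=1
```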

Now the problem that I am seeing is that, if I boot the machine up
with both cores enabled, I get some relatively small amount of use out
of the SATA drives, then I get a (seemingly) random i/o error and then
all further access to the drives fails.  I am still booting from the
CF disk, but using the fossil on the SATA drives as the root.  I was
also having problems with rio; upon further investigation, I see that
there are known issues with VESA and MP.  But even if I don't load the
VGA registers and stay in CGA mode, things still behave strangely (for
instance, my venti got corrupted and all of /sys/include disappeared).
However, if I set '*nomp=1' in plan9.ini, everything works fine.

Has anyone seen this before?
Is this a known issue?
Even better, is there a fix?

Btw: my long term intention is to use the fs driver to mirror fossil
and venti across both of the SATA drives, keep a small fossil on the
CF card for emergencies, and keep a partition there for secstore data.
But I haven't gotten to that stage yet.

        - Dan C.




* Re: [9fans] Random SATA errors with SMP on a dual core machine.
  2009-06-02 20:22 [9fans] Random SATA errors with SMP on a dual core machine Dan Cross
@ 2009-06-02 21:01 ` Steve Simon
  2009-06-03  5:15 ` erik quanstrom
  2009-06-04 19:33 ` Venkatesh Srinivas
  2 siblings, 0 replies; 6+ messages in thread
From: Steve Simon @ 2009-06-02 21:01 UTC (permalink / raw)
  To: 9fans

Oh yes, I would suggest you use the contrib package to install Erik's
sd driver.  If you haven't played with it, you should just need:

9fs sources
/n/sources/contrib/fgb/root/rc/bin/contrib/install fgb/contrib
contrib/list -v quanstro/sd
contrib/install quanstro/sd
contrib/install quanstro/fis

you will probably need to overwrite a couple of the files you
copied by hand, but this is just things like:

contrib/pull -s sys/src/9/pc/sdata.c sd

you don't need the quanstro/ once you have done the install.

-Steve




* Re: [9fans] Random SATA errors with SMP on a dual core machine.
  2009-06-02 20:22 [9fans] Random SATA errors with SMP on a dual core machine Dan Cross
  2009-06-02 21:01 ` Steve Simon
@ 2009-06-03  5:15 ` erik quanstrom
  2009-06-04 16:03   ` Dan Cross
  2009-06-04 19:33 ` Venkatesh Srinivas
  2 siblings, 1 reply; 6+ messages in thread
From: erik quanstrom @ 2009-06-03  5:15 UTC (permalink / raw)
  To: 9fans

sorry to hear things aren't working.

> % 9fs sources
> % cd /n/sources/contrib/quanstro/root/sys/src/9/pc
> % cp sdata.c sdiahci.c ahci.h /sys/src/9/pc
> % cd ../port
> % cp devsd.c sd.h sdloop.c /sys/src/9/port
> % cd ../../libfis
> % mkdir /sys/src/libfis
> % cp fis.h mkfile /sys/src/libfis
> % cd /sys/src/libfis
> % mk install

steve is correct.  using contrib is the easy way to do this,
but the full list of files is in /n/sources/contrib/quanstro/replica/sd/,
if you don't want to use the contrib stuff.  pc/sdscsi.c
jumps out as missing.

> Now the problem that I am seeing is that, if I boot the machine up
> with both cores enabled, I get some relatively small amount of use out
> of the SATA drives, then I get a (seemingly) random i/o error and then
> all further access to the drives fails.  I am still booting from the
> CF disk, but using the fossil on the SATA drives as the root.  I was

(i'm assuming that you are not running in combined mode and you have
configured the sata ports in ahci mode.)

could you send me the errors you are seeing?  it may be helpful to
turn on debugging.  i'd suggest recompiling the kernel with debug=1
in sdiahci.c.  i'd be interested in the output (including all the boot-time
noise) if you could send it off list.  i am running the same intel 3000
mukilteo-2 chipset in a few machines but don't see any trouble.  i am
running all of them with mp interrupts enabled.

- erik




* Re: [9fans] Random SATA errors with SMP on a dual core machine.
  2009-06-03  5:15 ` erik quanstrom
@ 2009-06-04 16:03   ` Dan Cross
  2009-06-04 17:54     ` erik quanstrom
  0 siblings, 1 reply; 6+ messages in thread
From: Dan Cross @ 2009-06-04 16:03 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Wed, Jun 3, 2009 at 1:15 AM, erik quanstrom<quanstro@quanstro.net> wrote:
> sorry to hear things aren't working.
>
>> % 9fs sources
>> % cd /n/sources/contrib/quanstro/root/sys/src/9/pc
>> % cp sdata.c sdiahci.c ahci.h /sys/src/9/pc
>> % cd ../port
>> % cp devsd.c sd.h sdloop.c /sys/src/9/port
>> % cd ../../libfis
>> % mkdir /sys/src/libfis
>> % cp fis.h mkfile /sys/src/libfis
>> % cd /sys/src/libfis
>> % mk install
>
> steve is correct.  using contrib is the easy way to do this,
> but the full list of files is in /n/sources/contrib/quanstro/replica/sd/,
> if you don't want to use the contrib stuff.  pc/sdscsi.c
> jumps out as missing.

Sorry, the commands in the message were typed from memory.  I had
correctly copied in pc/sdscsi.c (and /sys/include/fis.h).  Btw: these
drivers seem much more complete and robust than those in the
distribution; why haven't they been integrated into /sys/src and made
the default?  Not a complaint, just a question.

The contrib tool sounds neat; I'd never heard of it before.

>> Now the problem that I am seeing is that, if I boot the machine up
>> with both cores enabled, I get some relatively small amount of use out
>> of the SATA drives, then I get a (seemingly) random i/o error and then
>> all further access to the drives fails.  I am still booting from the
>> CF disk, but using the fossil on the SATA drives as the root.  I was
>
> (i'm assuming that you are not running in combined mode and you have
> configured the sata ports in ahci mode.)

Yes, that's correct.

> could you send me the errors you are seeing?  it may be helpful to
> turn on debugging.  i'd suggest recompiling the kernel with debug=1
> in sdiahci.c.  i'd be interested in the output (including all the boot-time
> noise) if you could send it off list.  i am running the same intel 3000
> mukilteo-2 chipset in a few machines but don't see any trouble.  i am
> running all of them with mp interrupts enabled.

I'll see if I can't capture them; they seem to have largely
disappeared and now I'm wondering if I had accidentally enabled VESA
and MP together and somehow gotten the system into such an odd state
that it told the disks to do something weird that corrupted venti.

Things seem to be working okay, except for some leaked blocks (which
may have been due to an unclean shutdown), so perhaps it was a fluke.
Regardless, I'll see about getting you those boot messages.

        - Dan C.




* Re: [9fans] Random SATA errors with SMP on a dual core machine.
  2009-06-04 16:03   ` Dan Cross
@ 2009-06-04 17:54     ` erik quanstrom
  0 siblings, 0 replies; 6+ messages in thread
From: erik quanstrom @ 2009-06-04 17:54 UTC (permalink / raw)
  To: 9fans

> correctly copied in pc/sdscsi.c (and /sys/include/fis.h).  Btw: these
> drivers seem much more complete and robust than those in the
> distribution

thank you.  my hope is that it will be included in the distribution.

i'm not sure if i've recapped this on the list or not.  there are a
couple of main goals of the sd/libfis work:
- support for raw ata commands allowing, e.g., smart, atazz.
- generic lba/nblocks <-> sata fis / sas frame translation.
- support for 64-bit (scsi) or 48-bit (ata) lbas in all sd drivers.
- support for combined-mode sata/sas hardware; e.g. sdorion.
- cleanup; e.g. removing the translation from lba/nblocks ->
scsi cdb -> lba/nblocks for non-scsi devices.
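
An illustration of the lba/nblocks translation named in the list
above, as a minimal Python sketch rather than anything taken from
libfis itself.  The function names are hypothetical; the byte layouts
follow the Register Host-to-Device FIS in the Serial ATA specification
and the SCSI READ(16) CDB, and READ DMA EXT (0x25) is used as the
example command.

```python
import struct

READ_DMA_EXT = 0x25  # ATA opcode for a 48-bit DMA read
READ_16 = 0x88       # SCSI READ(16) opcode


def ata_h2d_fis(command, lba, nblocks):
    """Pack a command, 48-bit LBA and sector count into a 20-byte
    Register Host-to-Device FIS (hypothetical helper)."""
    if not 0 <= lba < 1 << 48:
        raise ValueError("lba does not fit in 48 bits")
    if not 1 <= nblocks <= 1 << 16:
        raise ValueError("nblocks must be in 1..65536")
    count = nblocks & 0xFFFF       # a count field of 0 means 65536 sectors
    raw = lba.to_bytes(6, "little")
    fis = bytearray(20)
    fis[0] = 0x27                  # FIS type: Register Host to Device
    fis[1] = 0x80                  # C bit set: this FIS carries a command
    fis[2] = command
    fis[4:7] = raw[:3]             # LBA bits 0..23
    fis[7] = 0x40                  # device register: LBA mode
    fis[8:11] = raw[3:]            # LBA bits 24..47
    fis[12:14] = count.to_bytes(2, "little")
    return bytes(fis)


def fis_to_lba(fis):
    """Recover (lba, nblocks) from a FIS built by ata_h2d_fis."""
    lba = int.from_bytes(fis[4:7] + fis[8:11], "little")
    count = int.from_bytes(fis[12:14], "little") or 1 << 16
    return lba, count


def scsi_read16_cdb(lba, nblocks):
    """Pack a 64-bit LBA and 32-bit block count into a READ(16) CDB."""
    return struct.pack(">BBQIBB", READ_16, 0, lba, nblocks, 0, 0)


def cdb_to_lba(cdb):
    """Recover (lba, nblocks) from a READ(16) CDB."""
    _, _, lba, nblocks, _, _ = struct.unpack(">BBQIBB", cdb)
    return lba, nblocks


if __name__ == "__main__":
    print(ata_h2d_fis(READ_DMA_EXT, 0x123456789ABC, 8).hex())
    print(scsi_read16_cdb(0x123456789ABC, 8).hex())
```

The cleanup item in the list corresponds to skipping the
scsi_read16_cdb/cdb_to_lba round trip for an ata device and building
the fis directly from lba/nblocks.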

other small things have been done as well like conformance
with acs-2 (the latest ata command set proposal) and updating
to ahci 1.3 and 3rd generation/6gbps sata.

> Things seem to be working okay, except for some leaked blocks (which
> may have been due to an unclean shutdown), so perhaps it was a fluke.
> Regardless, I'll see about getting you those boot messages.

it occurred to me that i have also seen some funnies when
"legacy usb" is enabled in bios with a usb keyboard or mouse.
this feature enables smm (system mgmt mode) and can cause
interrupt problems.

- erik




* Re: [9fans] Random SATA errors with SMP on a dual core machine.
  2009-06-02 20:22 [9fans] Random SATA errors with SMP on a dual core machine Dan Cross
  2009-06-02 21:01 ` Steve Simon
  2009-06-03  5:15 ` erik quanstrom
@ 2009-06-04 19:33 ` Venkatesh Srinivas
  2 siblings, 0 replies; 6+ messages in thread
From: Venkatesh Srinivas @ 2009-06-04 19:33 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

> I see that there are known issues with VESA and MP...

Shouldn't the vesa driver check whether MP is enabled and warn pretty
loudly if so? Same for the apm driver (both use realmode()).

-- vs




