9fans - fans of the OS Plan 9 from Bell Labs
From: Robert Raschke <rrplan9@tombob.com>
To: 9fans@cse.psu.edu
Subject: Re: [9fans] venti wrarena i/o errors
Date: Tue,  4 Dec 2007 23:02:46 +0000
Message-ID: <8151ba5cb4868bc8265e2bd66552dad6@tombob.com>
In-Reply-To: <9AC90B7C-FEE5-4FA0-A60A-6049CE79E322@utopian.net>

Hi,

I'm still having issues with Venti on a VIA EPIA-5000.  I am getting
write i/o errors from icachewritesect, as well as lock errors.

My disk is a Samsung SP1613N, and is fine according to Samsung's
HUTIL.  I also ran memtest86 in case my memory is sick, but that is
fine too.

I booted using PXE and my disk was completely clean.  My startup info
looks like this:

Plan 9
E820: 00000000 0009fc00 memory	E820: 0009fc00 000a0000 reserved
E820: 000f0000 00100000 reserved	E820: 00100000 0f7f0000 memory
E820: 0f7f0000 0f7f3000 acpi nvs	E820: 0f7f3000 0f800000 acpi reclaim
E820: ffff0000 100000000 reserved	126 holes free
00054000 0008a000 221184
002ee000 064cf000 102633472
102854656 bytes free
cpu0: 533MHz CentaurHauls Via C3 Samuel 2 or Ezra (cpuid: AX 0x0673 DX 0x803035)
ELCR: 0E08
pcirouting: ignoring south bridge PCI.0.0.0 1106/0601
#l0: vt6102: 100Mbps port 0xE800 irq 11: 004063e24ea8
#U/usb0: uhci: port 0xD400 irq 3
#U/usb1: uhci: port 0xD800 irq 3
248M memory: 101M kernel data, 147M user, 561M swap
root is from (tcp)[tcp]: 
user[none]: rtr
secstore password: 
version...time...

init: starting /bin/rc

Next I set up ~30GB on the disk:

term% disk/mbr -m /386/mbr /dev/sdC0/data
term% disk/fdisk /dev/sdC0/data
[...]
>>> p
'  p1                      0 3911         (3911 cylinders, 29.95 GB) PLAN9
   empty                3911 19457        (15546 cylinders, 119.08 GB) 
>>> w
>>> q
term% disk/prep /dev/sdC0/data
>>> p
' 9fat                   0 204800      (204800 sectors, 100.00 MB)
' nvram             204800 204801      (1 sectors, 512 B )
' fossil            204801 10192089    (9987288 sectors, 4.76 GB)
' bloom           10192089 10224857    (32768 sectors, 16.00 MB)
' arenas          10224857 60325137    (50100280 sectors, 23.88 GB)
' isect           60325137 62830152    (2505015 sectors, 1.19 GB)
>>> w
>>> q
term% disk/format -b /386/pbs -d -r 2 /dev/sdC0/9fat /386/9load /386/9pcf
add 9load at clust 2
add 9pcf at clust a7
Initializing FAT file system
type hard, 12 tracks, 255 heads, 63 sectors/track, 512 bytes/sec
Adding file /386/9load, length 337588
add 9load at clust 2
Adding file /386/9pcf, length 2874011
add 9pcf at clust a7
used 3215360 bytes
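
As a cross-check of the disk/prep table above, here is a small Python
sketch (not part of the original session). It only re-derives sizes
from the sector numbers printed by disk/prep and confirms the
partitions are contiguous; the names PARTS and check_layout are made up
for illustration.

```python
SECTOR = 512  # bytes per sector, as reported by disk/format above

# (name, start sector, end sector) copied from the disk/prep listing.
PARTS = [
    ("9fat", 0, 204800),
    ("nvram", 204800, 204801),
    ("fossil", 204801, 10192089),
    ("bloom", 10192089, 10224857),
    ("arenas", 10224857, 60325137),
    ("isect", 60325137, 62830152),
]


def check_layout(parts):
    """Return {name: size in bytes}; raise ValueError on a gap or overlap."""
    sizes = {}
    prev_end = parts[0][1]
    for name, start, end in parts:
        if start != prev_end:
            raise ValueError(f"{name} starts at {start}, expected {prev_end}")
        if end <= start:
            raise ValueError(f"{name} is empty or reversed")
        sizes[name] = (end - start) * SECTOR
        prev_end = end
    return sizes


sizes = check_layout(PARTS)
for name, nbytes in sizes.items():
    print(f"{name:8s} {nbytes:>14,d} bytes")
```

The table passes: there are no gaps or overlaps, and the bloom
partition comes out at exactly 16 MB, matching what disk/prep printed.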

And now I set up venti:

term% cat /tmp/venti.conf
index main
isect /dev/sdC0/isect
arenas /dev/sdC0/arenas
bloom /dev/sdC0/bloom
mem 2M
bcmem 4M
icmem 6M
addr tcp!*!17034
httpaddr tcp!*!8000
term% venti/fmtisect isect /dev/sdC0/isect
fmtisect /dev/sdC0/isect: 156,466 buckets of 215 entries, 524,288 bytes for index map
term% venti/fmtarenas arenas /dev/sdC0/arenas
fmtarenas /dev/sdC0/arenas: 48 arenas, 25,650,544,640 bytes storage, 524,288 bytes for index map
term% venti/fmtbloom /dev/sdC0/bloom
fmtbloom: using 16MB, 32 hashes/score, best up to 2,982,616 blocks
term% venti/conf -w /dev/sdC0/arenas </tmp/venti.conf
term% venti/fmtindex /dev/sdC0/arenas
fmtindex: 48 arenas, 156,466 index buckets, 25,649,758,208 bytes storage
term% venti/venti -c /dev/sdC0/arenas
2007/1204 21:43:15 venti: conf...httpd tcp!*!8000...init...icache 6,291,456 bytes = 98,304 entries; 4 scache
sync...announce tcp!*!17034...serving.
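
The icache figure in the startup line can be checked against the
icmem setting in venti.conf. This is a minimal Python sketch, not
anything venti itself runs. The helper parse_size is hypothetical, and
treating K and G like M (powers of 1024) is an assumption; only the M
case is confirmed by the log. The bytes-per-entry value is simply what
the division implies, not a documented constant.

```python
# Suffix multipliers; only "M" is confirmed by the log line above
# (6M -> 6,291,456 bytes). K and G are assumed to follow the same rule.
SUFFIX = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text):
    """Parse a size such as '6M' into bytes (hypothetical helper)."""
    text = text.strip()
    if text and text[-1].upper() in SUFFIX:
        return int(text[:-1]) * SUFFIX[text[-1].upper()]
    return int(text)


icmem_bytes = parse_size("6M")         # icmem line from venti.conf
icache_entries = 98304                 # from the venti startup line
bytes_per_entry = icmem_bytes // icache_entries

print(icmem_bytes, icache_entries, bytes_per_entry)
```

The byte count agrees with the 6,291,456 that venti printed, and it
divides evenly by the entry count, so the config was read as written.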

So far so good.  I previously saved my arenas on my old
auth/cpu/fossil/venti server (running the old venti in a cpuf kernel)
using the new venti/rdarena.  I now use the new venti/wrarena on my
old server to load the first arena into my new running venti.

The first error turns up on my screen a little while later:

lock 0xf0c77cf8 loop key 0xdeaddead pc 0xf01c846f held by pc 0xf01c846f proc 307
295:     venti pc f01da773 dbgpc    203db     Pread (Running) ut 1 st 537 bss 4342000 qpc f01be14f nl 0 nd 0 lpc f01c57a1 pri 3
307:     venti pc f01cded7 dbgpc    203db     Pread (Ready) ut 332 st 1137 bss 4342000 qpc f013ea9a nl 2 nd 0 lpc f01c1026 pri 0
lock 0xf0c77cf8 loop key 0xdeaddead pc 0xf01c846f held by pc 0xf01c846f proc 307
297:     venti pc f01cda6c dbgpc    203db     Pread (Running) ut 79 st 553 bss 4342000 qpc f01c8d59 nl 0 nd 0 lpc f01c57a1 pri 3
307:     venti pc f01cded7 dbgpc    203db     Pread (Ready) ut 332 st 1137 bss 4342000 qpc f013ea9a nl 2 nd 0 lpc f01c108e pri 0

Once the wrarena has finished and all activity has ceased on my new
machine, I sync, then kill and restart venti with debug output turned
on (I had been reading the man pages in the meantime and thought it
might be a good idea; I don't know whether the sync does anything at
all with the new venti):

term% venti/sync -h P9VIA
term% kill venti |rc
term% venti/venti -d -c /dev/sdC0/arenas
2007/1204 22:12:09 venti: conf...httpd tcp!*!8000...init...icache 6,291,456 bytes = 98,304 entries; 4 scache
sync...2007/1204 22:12:20 arenas0: indexing 96523 clumps...
announce tcp!*!17034...serving.

Now I get a sawtooth pattern in the load section of my stats window,
and after about 20 minutes of that, I get this in the window where I
started venti:

2007/1204 22:24:20 err 4: write /dev/sdC0/isect offset 0x293ae000 count 65536 buf 337e000 returned -1: i/o error
venti/venti: part /dev/sdC0/isect addr 0x2922e000: icachewritesect writepart: write /dev/sdC0/isect offset 0x293ae000 count 65536 buf 337e000 returned -1: i/o error
2007/1204 22:24:21 err 4: read /dev/sdC0/isect offset 0x29a2e000 count 65536 buf 31fe000 returned -1: i/o error
venti/venti: part /dev/sdC0/isect addr 0x29a2e000: icachewritesect readpart: read /dev/sdC0/isect offset 0x29a2e000 count 65536 buf 31fe000 returned -1: i/o error
2007/1204 22:24:21 err 4: read /dev/sdC0/isect offset 0x2a22e000 count 65536 buf 31fe000 returned -1: i/o error
venti/venti: part /dev/sdC0/isect addr 0x2a22e000: icachewritesect readpart: read /dev/sdC0/isect offset 0x2a22e000 count 65536 buf 31fe000 returned -1: i/o error
2007/1204 22:24:21 err 4: read /dev/sdC0/isect offset 0x2aa32000 count 65536 buf 31fe000 returned -1: i/o error
venti/venti: part /dev/sdC0/isect addr 0x2aa32000: icachewritesect readpart: read /dev/sdC0/isect offset 0x2aa32000 count 65536 buf 31fe000 returned -1: i/o error
2007/1204 22:24:28 err 4: read /dev/sdC0/isect offset 0x2b234000 count 65536 buf 31fe000 returned -1: i/o error
venti/venti: part /dev/sdC0/isect addr 0x2b234000: icachewritesect readpart: read /dev/sdC0/isect offset 0x2b234000 count 65536 buf 31fe000 returned -1: i/o error
2007/1204 22:24:29 err 4: read /dev/sdC0/isect offset 0x2ba36000 count 65536 buf 31fe000 returned -1: i/o error
venti/venti: part /dev/sdC0/isect addr 0x2ba36000: icachewritesect readpart: read /dev/sdC0/isect offset 0x2ba36000 count 65536 buf 31fe000 returned -1: i/o error
2007/1204 22:24:29 err 4: read /dev/sdC0/isect offset 0x2c236000 count 65536 buf 31fe000 returned -1: i/o error
venti/venti: part /dev/sdC0/isect addr 0x2c236000: icachewritesect readpart: read /dev/sdC0/isect offset 0x2c236000 count 65536 buf 31fe000 returned -1: i/o error

This looks like it will continue pretty much until I kill venti.

My drive info is:

term% cat /dev/sdC0/ctl
inquiry SAMSUNG SP1613N                         
config 0040 capabilities 2F00 dma 00550004 dmactl 00000000 rwm 16 rwmctl 0 lba48always off
geometry 312581808 512 16383 16 63
part data 0 312581808
part plan9 63 62830215
part 9fat 63 204863
part nvram 204863 204864
part fossil 204864 10192152
part bloom 10192152 10224920
part arenas 10224920 60325200
part isect 60325200 62830215

And attempting to turn on dma results in:

atagenioretry: disabling dma
sdC0: retry: dma 00000000 rwm 0000

Which I guess means that I won't be using dma. That's ok.

The disk offsets into the isect partition look fine to me.
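
For anyone who wants to repeat that check, here is a rough Python
sketch (not part of the original session). The partition bounds are
copied from the ctl listing and the offsets from the venti errors
above; the helper in_partition is made up for illustration. It only
tests sector alignment and whether each 64K transfer fits inside
isect, so it says nothing about why the driver reports an i/o error.

```python
SECTOR = 512  # bytes per sector, from the geometry line in /dev/sdC0/ctl

# Absolute sector bounds copied from the ctl listing above.
PLAN9_START = 63
ISECT_START, ISECT_END = 60325200, 62830215

# disk/prep reported isect relative to the start of the plan9 partition.
PREP_ISECT = (60325137, 62830152)

# Transfer size and byte offsets from the venti i/o errors above.
COUNT = 65536
FAILED_OFFSETS = [
    0x293ae000, 0x29a2e000, 0x2a22e000, 0x2aa32000,
    0x2b234000, 0x2ba36000, 0x2c236000,
]


def in_partition(offset, count, part_bytes):
    """True if the transfer is sector-aligned and fits in the partition."""
    return offset % SECTOR == 0 and offset >= 0 and offset + count <= part_bytes


isect_bytes = (ISECT_END - ISECT_START) * SECTOR

for off in FAILED_OFFSETS:
    lba = ISECT_START + off // SECTOR  # absolute sector on the disk
    ok = in_partition(off, COUNT, isect_bytes)
    print(f"{off:#010x} -> absolute sector {lba:,d} in_range={ok}")
```

Every failing offset is sector-aligned and lies well inside the
partition, and the prep offsets plus 63 match the ctl bounds.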

I guess I will try to cobble together the old venti from my archives
and give that a try, just to see whether it behaves any differently.
But if anyone has any ideas about what to try next, I'd love to hear
suggestions.

Thanks,
Robby


--
replace my plan9 mail alias with r.raschke for direct emails


