* [9fans] venti+fossil woes @ 2003-11-14 23:18 Christopher Nielsen 2003-11-14 23:23 ` Geoff Collyer ` (2 more replies) 0 siblings, 3 replies; 26+ messages in thread From: Christopher Nielsen @ 2003-11-14 23:18 UTC (permalink / raw) To: 9fans

fossil crashed in the middle of an archival snapshot.
now, i'm getting

    err 4: no space left in arenas
    failed to write lump for <vac score>: no space left in arenas

there's plenty of space left in the arenas. a whole other
167G disc, in fact.

i've run venti/checkarenas and venti/checkindex to fix any
inconsistencies. they were both successful according to the
output.

any ideas about what is going on and how to fix it?

also, is there any way to tell fossil to stop trying to do
the snapshot?

--
Christopher Nielsen
"They who can give up essential liberty for temporary
safety, deserve neither liberty nor safety." --Benjamin Franklin

^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [9fans] venti+fossil woes 2003-11-14 23:18 [9fans] venti+fossil woes Christopher Nielsen @ 2003-11-14 23:23 ` Geoff Collyer 2003-11-14 23:34 ` Christopher Nielsen 2003-11-14 23:37 ` Charles Forsyth [not found] ` <20031116013757.GO834@cassie.foobarbaz.net> 2 siblings, 1 reply; 26+ messages in thread From: Geoff Collyer @ 2003-11-14 23:23 UTC (permalink / raw) To: 9fans

Venti may be fine, but fossil need not be. What does

    flchk -f /dev/sdXX/fossil    # where sdXX is your fossil disk

say?

^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [9fans] venti+fossil woes 2003-11-14 23:23 ` Geoff Collyer @ 2003-11-14 23:34 ` Christopher Nielsen 0 siblings, 0 replies; 26+ messages in thread From: Christopher Nielsen @ 2003-11-14 23:34 UTC (permalink / raw) To: 9fans On Fri, Nov 14, 2003 at 03:23:03PM -0800, Geoff Collyer wrote: > Venti may be fine, but fossil need not be. What does > > flchk -f /dev/sdXX/fossil # where sdXX is your fossil disk > > say? gives me a huge list of bfree commands -- Christopher Nielsen "They who can give up essential liberty for temporary safety, deserve neither liberty nor safety." --Benjamin Franklin ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [9fans] venti+fossil woes 2003-11-14 23:18 [9fans] venti+fossil woes Christopher Nielsen 2003-11-14 23:23 ` Geoff Collyer @ 2003-11-14 23:37 ` Charles Forsyth 2003-11-14 23:43 ` Christopher Nielsen 2003-11-14 23:43 ` boyd, rounin [not found] ` <20031116013757.GO834@cassie.foobarbaz.net> 2 siblings, 2 replies; 26+ messages in thread From: Charles Forsyth @ 2003-11-14 23:37 UTC (permalink / raw) To: 9fans >>there's plenty of space left in the arenas. a whole other >>167G disc, in fact. i wonder whether something in the mix can't cope correctly with a disc that size ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [9fans] venti+fossil woes 2003-11-14 23:37 ` Charles Forsyth @ 2003-11-14 23:43 ` Christopher Nielsen 2003-11-15 0:17 ` Charles Forsyth 2003-11-14 23:43 ` boyd, rounin 1 sibling, 1 reply; 26+ messages in thread From: Christopher Nielsen @ 2003-11-14 23:43 UTC (permalink / raw) To: 9fans On Fri, Nov 14, 2003 at 11:37:55PM +0000, Charles Forsyth wrote: > > i wonder whether something in the mix can't cope > correctly with a disc that size it's been coping just fine for the last three months. the first disc of arenas is 167G, too. before i wrote the 48-bit lba support for the ata driver, it wouldn't recognise the upper part of the disc. -- Christopher Nielsen "They who can give up essential liberty for temporary safety, deserve neither liberty nor safety." --Benjamin Franklin ^ permalink raw reply [flat|nested] 26+ messages in thread
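For readers wondering why a disc of this size needed 48-bit LBA support at all, the arithmetic can be sketched as follows. This is only an illustration: it assumes 512-byte sectors and reads "167G" as GiB, neither of which is stated in the thread, and the names are made up for the example.

```python
# Sketch: why a 167G disc is not fully reachable with 28-bit LBA.
# Assumptions (not from the thread): 512-byte sectors, "167G" read as GiB.
SECTOR_BYTES = 512
GIB = 1 << 30


def max_bytes(lba_bits: int) -> int:
    """Largest capacity addressable with the given number of LBA bits."""
    return (1 << lba_bits) * SECTOR_BYTES


disc_bytes = 167 * GIB
limit28 = max_bytes(28)

print(f"28-bit LBA limit: {limit28 // GIB} GiB")
print(f"beyond the 28-bit limit: {(disc_bytes - limit28) // GIB} GiB")
```

Under these assumptions the upper part of the disc lies past the 28-bit boundary, which matches the remark that it was not recognised before the driver change.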
* Re: [9fans] venti+fossil woes 2003-11-14 23:43 ` Christopher Nielsen @ 2003-11-15 0:17 ` Charles Forsyth 2003-11-15 1:00 ` Christopher Nielsen 0 siblings, 1 reply; 26+ messages in thread From: Charles Forsyth @ 2003-11-15 0:17 UTC (permalink / raw) To: 9fans

i know you did that, but
i suppose i meant: in the code of one or both of them,
somewhere it stores a value shorter than it ought to be,
and it's only noticed when allocation moves into the given region.
how big are the various partitions?

^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [9fans] venti+fossil woes 2003-11-15 0:17 ` Charles Forsyth @ 2003-11-15 1:00 ` Christopher Nielsen 0 siblings, 0 replies; 26+ messages in thread From: Christopher Nielsen @ 2003-11-15 1:00 UTC (permalink / raw) To: 9fans

On Sat, Nov 15, 2003 at 12:17:38AM +0000, Charles Forsyth wrote:
> i know you did that, but

right. sorry.

> i suppose i meant: in the code of one or both of them,
> somewhere it stores a value shorter than it ought to be,
> and it's only noticed when allocation moves into the given region.
> how big are the various partitions?

i understand what you're saying, now.

the arena partitions take up the whole of both 167G
discs. the arenas are (the default) 512M. fossil is on
a 74G partition, and the venti index is on a 18G partition.

the arena it's trying to write to is 312 out of 670.

the amount of data currently in the venti is ~155G.

--
Christopher Nielsen
"They who can give up essential liberty for temporary
safety, deserve neither liberty nor safety." --Benjamin Franklin

^ permalink raw reply [flat|nested] 26+ messages in thread
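The figures in this message can be cross-checked with a little arithmetic. The sketch below is not venti's own accounting: it assumes binary units for "G" and "M", ignores any per-arena or partition overhead, and uses illustrative names.

```python
# Sketch: sanity-check the arena numbers quoted in the message.
# Assumptions: "G"/"M" are binary units; per-arena overhead is ignored.
MIB = 1 << 20
GIB = 1 << 30

ARENA_BYTES = 512 * MIB   # default arena size, per the message
DISC_BYTES = 167 * GIB    # size of each arena disc, per the message
N_DISCS = 2

arenas_per_disc = DISC_BYTES // ARENA_BYTES
total_arenas = arenas_per_disc * N_DISCS
filled_gib = 312 * ARENA_BYTES / GIB  # arenas filled before arena 312

print("arenas per disc:", arenas_per_disc)
print("total arenas:", total_arenas)
print("data below arena 312 (GiB):", filled_gib)
```

Under these assumptions the totals land close to the reported 670 arenas and ~155G of data, and arena 312 would still sit on the first disc, so the second disc would not yet have been touched. The small differences presumably come from the exact partition sizes.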
* Re: [9fans] venti+fossil woes 2003-11-14 23:37 ` Charles Forsyth 2003-11-14 23:43 ` Christopher Nielsen @ 2003-11-14 23:43 ` boyd, rounin 1 sibling, 0 replies; 26+ messages in thread From: boyd, rounin @ 2003-11-14 23:43 UTC (permalink / raw) To: 9fans > i wonder whether something in the mix can't cope > correctly with a disc that size that could be it. any extensible system must cope with stuff that's larger than it expects. in that case it should say _immediately_: i can't cope with that ^ permalink raw reply [flat|nested] 26+ messages in thread
[parent not found: <20031116013757.GO834@cassie.foobarbaz.net>]
* Re: [9fans] venti+fossil woes [not found] ` <20031116013757.GO834@cassie.foobarbaz.net> @ 2003-11-18 12:40 ` Christopher Nielsen 2003-11-18 14:08 ` Russ Cox 2003-11-18 15:36 ` SPAM: " jmk 0 siblings, 2 replies; 26+ messages in thread From: Christopher Nielsen @ 2003-11-18 12:40 UTC (permalink / raw) To: 9fans

Here's an update for anyone interested, since I can't
manage to get to sleep for some reason.

I bought some better quality ata cables yesterday. That
helped to the point that I thought my troubles were over.
No such luck.

Now, what I am seeing is whenever a venti arena becomes
full and is in the process of being sealed, the screen
becomes filled with IBsy+ repeated ad infinitum, which
I know is from the ata driver. Eventually, fossil gives
an error from diskReadRaw() saying something like:

    archive(0, <block addr>): cannot find block: i/o error

followed by a dump that I presume could be useful for
diagnostics.

What I am guessing is happening is that there is so much
contention in the controller that it's causing reads and
sometimes writes to timeout. This eventually causes fossil
to just fall over dead. At which point, I reboot from a CD,
run venti/checkarenas -vf on the arena partition and then
reboot so that fossil can continue where it left off with
the snapshot. Wash, rinse, repeat.

Anyway, the saga continues. We'll see if I end up losing
data. I'm still guessing not. My only comment is that it
would be nice if fossil would handle such error conditions
more gracefully.

Regardless, I am going to dig around for another ata
controller to spread the disks across.

On Sat, Nov 15, 2003 at 05:37:57PM -0800, Christopher Nielsen wrote:
> this is looking more and more like it was a hardware
> problem. reseating all the connections eliminated most
> of the errors i was seeing. now i am getting errors
> from diskRawWrite, which leads me to believe that one
> of the disks is going bad. i can't really tell which
> one, though. the error message from diskRawWrite gives
> some diagnostic info, but i don't know how to interpret
> it. admittedly, i haven't dived into the source as much
> as i could, but maybe someone can provide some insight
> before i go ahead and do that.
>
> thanks to everyone that has provided input so far.
>
> i have to say, it doesn't look like i'm going to lose
> any data. it's not certain yet, but it's looking good.
> the paranoia in fossil and venti are good.
>
> On Fri, Nov 14, 2003 at 03:18:42PM -0800, Christopher Nielsen wrote:
> > fossil crashed in the middle of an archival snapshot.
> > now, i'm getting
> >
> > err 4: no space left in arenas
> > failed to write lump for <vac score>: no space left in arenas
> >
> > there's plenty of space left in the arenas. a whole other
> > 167G disc, in fact.
> >
> > i've run venti/checkarenas and venti/checkindex to fix any
> > inconsistencies. they were both successful according to the
> > output.
> >
> > any ideas about what is going on and how to fix it?
> >
> > also, is there any way to tell fossil to stop trying to do
> > the snapshot?
>
> --
> Christopher Nielsen
> "They who can give up essential liberty for temporary
> safety, deserve neither liberty nor safety." --Benjamin Franklin

--
Christopher Nielsen
"They who can give up essential liberty for temporary
safety, deserve neither liberty nor safety." --Benjamin Franklin

^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [9fans] venti+fossil woes 2003-11-18 12:40 ` Christopher Nielsen @ 2003-11-18 14:08 ` Russ Cox 2003-11-18 15:27 ` Charles Forsyth 2003-11-18 22:35 ` Christopher Nielsen 2003-11-18 15:36 ` SPAM: " jmk 1 sibling, 2 replies; 26+ messages in thread From: Russ Cox @ 2003-11-18 14:08 UTC (permalink / raw) To: 9fans Have you tried replacing the disk? Since only one disk I/O is done at a time, I have a hard time believing that contention is causing them to time out. Contention might cause them to take a while before they acquire the lock around the disk, but that isn't timed. Russ ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [9fans] venti+fossil woes 2003-11-18 14:08 ` Russ Cox @ 2003-11-18 15:27 ` Charles Forsyth 2003-11-18 22:35 ` Christopher Nielsen 1 sibling, 0 replies; 26+ messages in thread From: Charles Forsyth @ 2003-11-18 15:27 UTC (permalink / raw) To: 9fans it's probably just me, but i nearly always suspect software rather than hardware, until i'm absolutely sure (i have seen some botched hardware though). i wonder whether the iBsy+ isn't a clue: it has got an interrupt, but the device is busy when the interrupt function starts, and it seems to happen (after some point) under load. is the interrupt shared? are rwm and dma set? ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [9fans] venti+fossil woes 2003-11-18 14:08 ` Russ Cox 2003-11-18 15:27 ` Charles Forsyth @ 2003-11-18 22:35 ` Christopher Nielsen 2003-11-18 23:10 ` jmk 1 sibling, 1 reply; 26+ messages in thread From: Christopher Nielsen @ 2003-11-18 22:35 UTC (permalink / raw) To: 9fans

On Tue, Nov 18, 2003 at 09:08:49AM -0500, Russ Cox wrote:
> Have you tried replacing the disk? Since only one disk I/O
> is done at a time, I have a hard time believing that contention
> is causing them to time out. Contention might cause them to
> take a while before they acquire the lock around the disk,
> but that isn't timed.

Changing the disk, currently, is not an option. There's still
data being written from fossil to venti from an archival
snapshot that starts up every time the machine boots, albeit
slowly and interspersed with fossil crashes.

On Tue, Nov 18, 2003 at 03:27:00PM +0000, Charles Forsyth wrote:
> it's probably just me, but i nearly always
> suspect software rather than hardware, until
> i'm absolutely sure (i have seen some botched
> hardware though). i wonder whether
> the iBsy+ isn't a clue: it has got an interrupt,
> but the device is busy when the interrupt function
> starts, and it seems to happen (after some point)
> under load. is the interrupt shared?
> are rwm and dma set?

I don't think the interrupt is shared. Both dma and rwm
are set for all the drives.

On Tue, Nov 18, 2003 at 10:36:55AM -0500, jmk@plan9.bell-labs.com wrote:
>
> I'd say that still points to a hardware problem. Can you tell
> us what
> ATA controller (or which chipset it is in)
> drive model numbers
> the 'dev' lines printed at kernel boot for the drives are?

The controller is a Promise 100TX2 (PDC20268).

The drives are all Western Digital:

    2x WD1800JB   venti arenas
    1x WD800JB    fossil
    1x WD400JB    venti index

The 'dev' lines from boot:

    dev A0 port CCD8 config 427A capabilities 2F00 mwdma 0007 udma 203F
    dev B0 port CCD8 config 427A capabilities 2F00 mwdma 0007 udma 203F
    dev A0 port CCC0 config 427A capabilities 2F00 mwdma 0007 udma 203F
    dev B0 port CCC0 config 427A capabilities 2F00 mwdma 0007 udma 203F

Thanks for the help/input, guys.

--
Christopher Nielsen
"They who can give up essential liberty for temporary
safety, deserve neither liberty nor safety." --Benjamin Franklin

^ permalink raw reply [flat|nested] 26+ messages in thread
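The mwdma and udma fields in the 'dev' lines can be read as bitmasks. The sketch below assumes the driver prints the raw ATA IDENTIFY DEVICE words for multiword DMA and Ultra DMA (an assumption, not something confirmed in the thread), where the low byte lists supported modes and the high byte flags the selected mode. The function name is illustrative.

```python
# Sketch: decode the mwdma/udma values from the 'dev' lines.
# Assumption: these are raw IDENTIFY DEVICE words, low byte = supported
# modes, high byte = currently selected mode (one bit per mode).


def decode_dma(word: int) -> tuple[list[int], list[int]]:
    """Split a DMA mode word into (supported, selected) mode lists."""
    supported = [m for m in range(8) if word & (1 << m)]
    selected = [m for m in range(8) if word & (1 << (m + 8))]
    return supported, selected


for name, word in (("mwdma", 0x0007), ("udma", 0x203F)):
    supported, selected = decode_dma(word)
    print(name, "supported:", supported, "selected:", selected)
```

Read this way, all four drives report Ultra DMA modes 0 through 5 with mode 5 selected, which would correspond to ATA/100 operation on the Promise card, and no multiword DMA mode selected.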
* Re: [9fans] venti+fossil woes 2003-11-18 22:35 ` Christopher Nielsen @ 2003-11-18 23:10 ` jmk 2003-11-18 23:18 ` mirtchov 2003-11-18 23:30 ` Christopher Nielsen 0 siblings, 2 replies; 26+ messages in thread From: jmk @ 2003-11-18 23:10 UTC (permalink / raw) To: 9fans On Tue Nov 18 17:36:41 EST 2003, cnielsen@pobox.com wrote: > ... > I don't think the interrupt is shared. Both dma and rwm > are set for all the drives. > ... cat '#P/irqalloc' should tell you. > ... > The controller is a Promise 100TX2 (PDC20268). > ... Does anyone know if this works properly? I know the driver will recognise it, but are there any bugs/workarounds for this chip we are missing? ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [9fans] venti+fossil woes 2003-11-18 23:10 ` jmk @ 2003-11-18 23:18 ` mirtchov 2003-11-18 23:30 ` Christopher Nielsen 1 sibling, 0 replies; 26+ messages in thread From: mirtchov @ 2003-11-18 23:18 UTC (permalink / raw) To: 9fans > Does anyone know if this works properly? I know the driver > will recognise it, but are there any bugs/workarounds for this > chip we are missing? Mine is a Promise 20378. I've had no problems of the sort Chris is reporting. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [9fans] venti+fossil woes 2003-11-18 23:10 ` jmk 2003-11-18 23:18 ` mirtchov @ 2003-11-18 23:30 ` Christopher Nielsen 2003-11-18 23:59 ` Geoff Collyer 2003-11-19 0:03 ` jmk 1 sibling, 2 replies; 26+ messages in thread From: Christopher Nielsen @ 2003-11-18 23:30 UTC (permalink / raw) To: 9fans

On Tue, Nov 18, 2003 at 06:10:06PM -0500, jmk@plan9.bell-labs.com wrote:
> cat '#P/irqalloc'
> should tell you.

ah great. thanks.

    #nidaba cat '#P/irqalloc'
     3   0 debugpt
     7   0 mathemu
     8   0 doublefault
     9   0 mathover
    14   0 fault386
    16   0 matherror
    32   0 clock
    33   1 kbd
    36   4 COM1
    38   6 floppy
    41   9 sdF (ata)
    41   9 sdE (ata)
    43  11 usb0
    43  11 ether0
    47  15 sdD (ata)

So it looks like the controllers for the four disks are
sharing an interrupt. That could be a problem...

> > ...
> > The controller is a Promise 100TX2 (PDC20268).
> > ...
> Does anyone know if this works properly? I know the driver
> will recognise it, but are there any bugs/workarounds for this
> chip we are missing?

That was one of the first things I started to look into.
I didn't see anything unusual in the FreeBSD ata driver that
would indicate it has troubles. But I did see some possible
problems when I did a google search. I haven't had the time
to dive into it in enough detail. I'll see if I can find
anything useful.

--
Christopher Nielsen
"They who can give up essential liberty for temporary
safety, deserve neither liberty nor safety." --Benjamin Franklin

^ permalink raw reply [flat|nested] 26+ messages in thread
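Spotting shared interrupts in a listing like this can be done mechanically. The sketch below is a hypothetical helper, not a Plan 9 tool: it assumes each row is "vector irq name" and that vectors below 32 are processor traps rather than device interrupts. The data is the listing transcribed in the message.

```python
# Sketch: find IRQs claimed by more than one device in irqalloc-style text.
# Assumptions: rows are "<vector> <irq> <name>"; vectors < 32 are traps.
from collections import defaultdict

IRQALLOC = """\
3 0 debugpt
7 0 mathemu
8 0 doublefault
9 0 mathover
14 0 fault386
16 0 matherror
32 0 clock
33 1 kbd
36 4 COM1
38 6 floppy
41 9 sdF (ata)
41 9 sdE (ata)
43 11 usb0
43 11 ether0
47 15 sdD (ata)
"""


def shared_irqs(text: str) -> dict[int, list[str]]:
    """Map each IRQ used by more than one device to its device names."""
    by_irq = defaultdict(list)
    for line in text.splitlines():
        fields = line.split(maxsplit=2)
        if len(fields) != 3:
            continue  # skip blank or malformed rows
        vector, irq, name = int(fields[0]), int(fields[1]), fields[2]
        if vector < 32:
            continue  # processor traps, not device interrupts
        by_irq[irq].append(name)
    return {irq: names for irq, names in by_irq.items() if len(names) > 1}


for irq, names in shared_irqs(IRQALLOC).items():
    print(f"irq {irq}: {', '.join(names)}")
```

On this listing the helper reports irq 9 (sdF and sdE) and irq 11 (usb0 and ether0) as shared, which is the doubling-up discussed in the replies.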
* Re: [9fans] venti+fossil woes 2003-11-18 23:30 ` Christopher Nielsen @ 2003-11-18 23:59 ` Geoff Collyer 2003-11-19 0:35 ` Christopher Nielsen 2003-11-19 0:03 ` jmk 1 sibling, 1 reply; 26+ messages in thread From: Geoff Collyer @ 2003-11-18 23:59 UTC (permalink / raw) To: 9fans

Wow, your irqalloc output is already sorted; I have to sort mine by
hand. It seems odd that sdC doesn't show up (at irq 14). I only have
one CPU server with IDE disk(s), but its irqs look like this:

    ; sort +1nb '#P/irqalloc'
     3   0 debugpt
     7   0 mathemu
     8   0 doublefault
     9   0 mathover
    14   0 fault386
    16   0 matherror
    32   0 clock
    33   1 kbd
    38   6 floppy
    42  10 ether1
    44  12 ether0
    46  14 sdC (ata)
    47  15 sdD (ata)

You've got sdE and sdF sharing irq 9 and usb0 and ether0 sharing irq
11 (yet irqs 10 and 12 are unused). This could be due to a buggy BIOS
(I've got a few of those), but more likely your BIOS is running out of
irqs that it knows to be free and thus doubles up devices. It would
be worthwhile to go into BIOS setup and disable any devices you aren't
using, to reclaim their IRQs. Also visit your PCI/PNP assignment
screen and let PCI/PNP have all free irqs, or turn off manual
selection of free irqs.

On a CPU server, you can probably disable all LPT (parallel) ports at
irqs 5 and 7, COM2 (second serial port) at irq 3, and PS/2 mouse at
irq 12. If you're not using USB, don't assign it an irq. If there's
an option to assign your vga card an irq, disable it. I'm not sure
what's sitting on irq 10; maybe vga. Make sure your first IDE
controller is enabled (in your `integrated peripherals' screen); it
should appear at irq 14.

^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [9fans] venti+fossil woes 2003-11-18 23:59 ` Geoff Collyer @ 2003-11-19 0:35 ` Christopher Nielsen 2003-11-19 1:12 ` okamoto 2003-11-19 4:51 ` Dan Cross 0 siblings, 2 replies; 26+ messages in thread From: Christopher Nielsen @ 2003-11-19 0:35 UTC (permalink / raw) To: 9fans

On Tue, Nov 18, 2003 at 03:59:15PM -0800, Geoff Collyer wrote:
> Wow, your irqalloc output is already sorted; I have to sort mine by
> hand.

Well, it was transcribed by hand, but it was transcribed
exactly as I saw it.

> It seems odd that sdC doesn't show up (at irq 14).

When you consider that there isn't a drive on sdC, it's
not so odd.

> I only have one CPU server with IDE disk(s), but its irqs look like this:
>
> ; sort +1nb '#P/irqalloc'
>  3   0 debugpt
>  7   0 mathemu
>  8   0 doublefault
>  9   0 mathover
> 14   0 fault386
> 16   0 matherror
> 32   0 clock
> 33   1 kbd
> 38   6 floppy
> 42  10 ether1
> 44  12 ether0
> 46  14 sdC (ata)
> 47  15 sdD (ata)
>
> You've got sdE and sdF sharing irq 9 and usb0 and ether0 sharing irq
> 11 (yet irqs 10 and 12 are unused). This could be due to a buggy BIOS
> (I've got a few of those), but more likely your BIOS is running out of
> irqs that it knows to be free and thus doubles up devices. It would
> be worthwhile to go into BIOS setup and disable any devices you aren't
> using, to reclaim their IRQs. Also visit your PCI/PNP assignment
> screen and let PCI/PNP have all free irqs, or turn off manual
> selection of free irqs.

sdE and sdF are where all the drives doing all the i/o are
living, so it's no wonder there are problems, since they're
all sharing an interrupt.

Unfortunately, the BIOS setup for this particular machine
(an old Dell optiplex) seems mostly neutered. You can't do
anything you describe. Considering it's a Dell, I am leaning
very heavily toward buggy BIOS. I have an extra case and
(decent) motherboard that I can move the entire system to,
so I think I will try that and see if I get more reasonable
behavior.

Never send a desktop to do a server's work. *sigh*

> On a CPU server, you can probably disable all LPT (parallel) ports at
> irqs 5 and 7, COM2 (second serial port) at irq 3, and PS/2 mouse at
> irq 12. If you're not using USB, don't assign it an irq. If there's
> an option to assign your vga card an irq, disable it. I'm not sure
> what's sitting on irq 10; maybe vga. Make sure your first IDE
> controller is enabled (in your `integrated peripherals' screen); it
> should appear at irq 14.

Yeah, I usually disable any devices that I have no use for
on a server system, pretty much including everything you've
listed.

Thanks!

--
Christopher Nielsen
"They who can give up essential liberty for temporary
safety, deserve neither liberty nor safety." --Benjamin Franklin

^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [9fans] venti+fossil woes 2003-11-19 0:35 ` Christopher Nielsen @ 2003-11-19 1:12 ` okamoto 2003-11-19 4:51 ` Dan Cross 1 sibling, 0 replies; 26+ messages in thread From: okamoto @ 2003-11-19 1:12 UTC (permalink / raw) To: 9fans > anything you describe. Considering it's a Dell, I am leaning > very heavily toward buggy BIOS. I cannot set the quantity of shared VGA memory on an old DELL Optiplex either. I'm sick... ☺ Kenji ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [9fans] venti+fossil woes 2003-11-19 0:35 ` Christopher Nielsen 2003-11-19 1:12 ` okamoto @ 2003-11-19 4:51 ` Dan Cross 2003-11-19 5:59 ` Christopher Nielsen 1 sibling, 1 reply; 26+ messages in thread From: Dan Cross @ 2003-11-19 4:51 UTC (permalink / raw) To: 9fans Christopher Nielsen <cnielsen@pobox.com> writes: > Unfortunately, the BIOS setup for this particular machine > (an old Dell optiplex) seems mostly neutered. You can't do > anything you describe. Considering it's a Dell, I am leaning > very heavily toward buggy BIOS. I have an extra case and > (decent) motherboard that I can move the entire system to, > so I think I will try that and see if I get more reasonable > behavior. > > Never send a desktop to do a server's work. *sigh* Dude! You got screwed by a Dell!! - Dan C. (Okay, sorry; I realize this must be frustrating, and I mean no animosity, but I couldn't resist the allusion to the infamous ``Dell Dude.'') ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [9fans] venti+fossil woes 2003-11-19 4:51 ` Dan Cross @ 2003-11-19 5:59 ` Christopher Nielsen 0 siblings, 0 replies; 26+ messages in thread From: Christopher Nielsen @ 2003-11-19 5:59 UTC (permalink / raw) To: 9fans On Tue, Nov 18, 2003 at 11:51:50PM -0500, Dan Cross wrote: > Christopher Nielsen <cnielsen@pobox.com> writes: > > Unfortunately, the BIOS setup for this particular machine > > (an old Dell optiplex) seems mostly neutered. You can't do > > anything you describe. Considering it's a Dell, I am leaning > > very heavily toward buggy BIOS. I have an extra case and > > (decent) motherboard that I can move the entire system to, > > so I think I will try that and see if I get more reasonable > > behavior. > > > > Never send a desktop to do a server's work. *sigh* > > Dude! You got screwed by a Dell!! Hah! > (Okay, sorry; I realize this must be frustrating, and I mean no > animosity, but I couldn't resist the allusion to the infamous > ``Dell Dude.'') Yes, very frustrating, and I know you mean no animosity. Humor is good at times like these. Thanks. :) -- Christopher Nielsen "They who can give up essential liberty for temporary safety, deserve neither liberty nor safety." --Benjamin Franklin ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [9fans] venti+fossil woes 2003-11-18 23:30 ` Christopher Nielsen 2003-11-18 23:59 ` Geoff Collyer @ 2003-11-19 0:03 ` jmk 2003-11-19 0:20 ` Charles Forsyth 2003-11-20 6:04 ` Christopher Nielsen 1 sibling, 2 replies; 26+ messages in thread From: jmk @ 2003-11-19 0:03 UTC (permalink / raw) To: 9fans On Tue Nov 18 18:31:43 EST 2003, cnielsen@pobox.com wrote: > ... > So it looks like the controllers for the four disks are > sharing an interrupt. That could be a problem... > ... Sharing the interrupt should be OK, but spitting out the Ibsy message might confuse things enough to be a problem. Disable that debug message in the driver. Getting the controllers onto separate interrupts would help in any case. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [9fans] venti+fossil woes 2003-11-19 0:03 ` jmk @ 2003-11-19 0:20 ` Charles Forsyth 2003-11-20 6:04 ` Christopher Nielsen 1 sibling, 0 replies; 26+ messages in thread From: Charles Forsyth @ 2003-11-19 0:20 UTC (permalink / raw) To: 9fans >>Sharing the interrupt should be OK, but spitting out >>the Ibsy message might confuse things enough to be a problem. it might delay the response to the IO request on the other controller (which presumably caused the interrupt) long enough for it to time out on occasion. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [9fans] venti+fossil woes 2003-11-19 0:03 ` jmk 2003-11-19 0:20 ` Charles Forsyth @ 2003-11-20 6:04 ` Christopher Nielsen 2003-11-20 9:26 ` C H Forsyth 1 sibling, 1 reply; 26+ messages in thread From: Christopher Nielsen @ 2003-11-20 6:04 UTC (permalink / raw) To: 9fans Well, it appears to be fixed. For reasons that aren't important, I wasn't able to put the system on a different motherboard, so i did the next best thing. i put fossil and isect on the onboard controller so all disks wouldn't be sharing an interrupt. So far, no errors whatsoever. the transfer rate isn't as high because the onboard ata controller isn't ata100, but at least it appears stable. I'm going to let it run for a while to make sure it's happy, and if anything changes, I'll let you know. The moral of this story, don't share interrupts with fossil and venti on multiple disks. Thank you very much to everyone that offered up advice and help. -- Christopher Nielsen "They who can give up essential liberty for temporary safety, deserve neither liberty nor safety." --Benjamin Franklin ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [9fans] venti+fossil woes 2003-11-20 6:04 ` Christopher Nielsen @ 2003-11-20 9:26 ` C H Forsyth 2003-11-20 10:14 ` Christopher Nielsen 0 siblings, 1 reply; 26+ messages in thread From: C H Forsyth @ 2003-11-20 9:26 UTC (permalink / raw) To: 9fans >>The moral of this story, don't share interrupts with >>fossil and venti on multiple disks. i think the moral is that something is wrong that prevents that. i think it ought to work. it's not an uncommon arrangement. did you try disabling the IBsy+ print before you moved the drives, and if so did it make no difference. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [9fans] venti+fossil woes 2003-11-20 9:26 ` C H Forsyth @ 2003-11-20 10:14 ` Christopher Nielsen 0 siblings, 0 replies; 26+ messages in thread From: Christopher Nielsen @ 2003-11-20 10:14 UTC (permalink / raw) To: 9fans On Thu, Nov 20, 2003 at 09:26:07AM +0000, C H Forsyth wrote: > > i think the moral is that something is wrong that > prevents that. i think it ought to work. > it's not an uncommon arrangement. that was my point, even if i didn't state it directly. i think it ought to work too, but for some reason it does not. i'm willing to investigate. > did you try disabling the IBsy+ print before you > moved the drives, and if so did it make no difference. i didn't. mostly because i didn't have the time. for the sake of experiment and improving plan 9, i may have some time to try that on friday. as it stands, i have still not seen any errors. -- Christopher Nielsen "They who can give up essential liberty for temporary safety, deserve neither liberty nor safety." --Benjamin Franklin ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: SPAM: Re: [9fans] venti+fossil woes 2003-11-18 12:40 ` Christopher Nielsen 2003-11-18 14:08 ` Russ Cox @ 2003-11-18 15:36 ` jmk 1 sibling, 0 replies; 26+ messages in thread From: jmk @ 2003-11-18 15:36 UTC (permalink / raw) To: 9fans On Tue Nov 18 07:41:46 EST 2003, cnielsen@pobox.com wrote: > ... > > Now, what I am seeing is whenever a venti arena becomes > full and is in the process of being sealed, the screen > becomes filled with IBsy+ repeated ad infinitum, which > I know is from the ata driver. Eventually, fossil gives > an error from diskReadRaw() saying something like: > > ... > I'd say that still points to a hardware problem. Can you tell us what ATA controller (or which chipset it is in) drive model numbers the 'dev' lines printed at kernel boot for the drives are? ^ permalink raw reply [flat|nested] 26+ messages in thread