From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 29 Dec 1997 19:20:46 -0600
From: G. David Butler gdb@dbSystems.com
Subject: [9fans] Re: etherelnk3.c
Topicbox-Message-UUID: 6ec74d6a-eac8-11e9-9e20-41e7f4b1d025
Message-ID: <19971230012046._o6X-syQmm9KyIQeTu04PH8j-s0Z_uA3Db01brTOcA4@z>

From: eld@jewel.ucsd.edu (Eric Dorman)

>Definately worth a look; I've got a 9pcfs that allows the use
>of IDE disks for filesystems

Why?  I agree IDE disks are very attractive from a $$/MB point
of view, but you can't get enough of them on a machine.  The
overall $$/MB is less with SCSI since you can aggregate the cost
of the CPU/RAM over more disks.  Also, IDE is PIO and that
is a bad place to waste CPU resources.

>                             and have found that the network
>is far and away the limiting factor (10Mbit/sec 10BaseT on NE2000s)

If you use two transmit buffers, sending one and filling the other,
you will find much more "network" in your NE2000.  [I don't use
the NE2000 anymore now that I have the 3c515 working! Not for the
100bT, but for the 64k of RAM and busmaster transfers!]
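
To make the two-buffer idea concrete, here is a minimal sketch of
the bookkeeping only.  It is not taken from any real driver: the
names (Txring, txqueue, txdone), the six 256-byte pages per buffer
and the 0x40 start page are assumptions for illustration, and the
actual card access is left as comments.

```c
#include <assert.h>

enum { NTXBUF = 2, TXPAGES = 6, TXSTART = 0x40 };	/* assumed card layout */
enum txstate { TxFree, TxFilled, TxSending };

typedef struct Txring Txring;
struct Txring {
	enum txstate state[NTXBUF];
	int sending;	/* buffer currently on the wire, or -1 */
	int nstarted;	/* transmit commands issued (illustration only) */
};

void
txinit(Txring *r)
{
	int i;

	for(i = 0; i < NTXBUF; i++)
		r->state[i] = TxFree;
	r->sending = -1;
	r->nstarted = 0;
}

/* start the transmitter on a buffer that has already been filled */
static void
txstart(Txring *r, int i)
{
	assert(r->state[i] == TxFilled);
	/* real driver: point the card at page TXSTART + i*TXPAGES, set length, go */
	r->state[i] = TxSending;
	r->sending = i;
	r->nstarted++;
}

/* claim a free buffer for a packet; returns its index, or -1 if both are busy */
int
txqueue(Txring *r)
{
	int i;

	for(i = 0; i < NTXBUF; i++)
		if(r->state[i] == TxFree){
			/* real driver: copy the packet into card RAM here */
			r->state[i] = TxFilled;
			if(r->sending < 0)
				txstart(r, i);
			return i;
		}
	return -1;
}

/* transmit-complete interrupt: free the buffer, start the other if filled */
void
txdone(Txring *r)
{
	int i;

	i = r->sending;
	if(i < 0)
		return;
	r->state[i] = TxFree;
	r->sending = -1;
	i = (i + 1) % NTXBUF;
	if(r->state[i] == TxFilled)
		txstart(r, i);
}
```

The point is that txqueue can copy the next packet into the card
while the previous one is still being sent, so the transmitter is
restarted from txdone without waiting for a copy.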

>and 100BaseT is practically free these days; I'd love to use 100BaseT.

Not when you look at the price/performance of 10bT full duplex switches
and 100bT hubs.  100bT full duplex switches are nice, and expensive.
(I'm looking at many nodes on the network, all *very* busy.)

If you follow the thread a while ago about the 3Com cards and the
big packet problem, you will see it's pretty easy to get the
3C509 PCI 10/100 card up and running (in PIO mode).  I still haven't
ported the Brazil driver because of the ongoing discussion about
ringbufs/blocks/msgbufs.  (I'm still leaning towards using the
ringbufs.)  But, once that is done, you'll have that option. 

>            Might even be able to interleave across primary and 
>secondary IDEs (if the braindead chipsets will support it..).

No need; "filsys main [h0h1]" will interleave for you.
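
As a rough sketch of what interleaving across a set like [h0h1]
amounts to, assuming simple round-robin placement of blocks (the
Place type and function names are hypothetical, not the fs
source):

```c
#include <assert.h>

typedef struct Place Place;
struct Place {
	int dev;	/* index of the device within the interleaved set */
	long blk;	/* block address on that device */
};

/* map logical block b onto one of ndev interleaved devices */
Place
interleave(long b, int ndev)
{
	Place p;

	assert(ndev > 0 && b >= 0);
	p.dev = (int)(b % ndev);
	p.blk = b / ndev;
	return p;
}

/* usable size of the set: the smallest member limits every device */
long
interleavesize(const long *size, int ndev)
{
	long min;
	int i;

	assert(ndev > 0);
	min = size[0];
	for(i = 1; i < ndev; i++)
		if(size[i] < min)
			min = size[i];
	return min * ndev;
}
```

With two devices, consecutive logical blocks alternate between
them, so a sequential read keeps both disks busy.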

>So far I've had to scramble around in the fs code changing 'long's
>to 'ulong's in bytewise size computations and changing the type
>of disk block addresses to ulongs; the matched-pair of 3.5G disks
>breaks 'long's.  I'm worrying though that this may have caused
>my block tag bug. 

Very possible.  I looked at doing a global long-to-ulong change
but found some places that weren't easy to fix (I don't remember
where now), so I left it alone.  I was thinking of reducing the
block size and needed more blocks.  If you leave the block size
at 4K, and handle the multiplier like devwren does, then you
shouldn't need to make that change.
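
A small sketch of why block addressing avoids the overflow.  It
assumes a "3.5G" disk is 3.5e9 bytes with 512-byte sectors and
models the 32-bit long as int32_t; the function names are made up
for illustration and are not the devwren code.

```c
#include <assert.h>
#include <stdint.h>

enum { RBUFSIZE = 4096, SECSIZE = 512 };

/* sectors per file server block: the "multiplier" */
static int32_t
mult(void)
{
	return RBUFSIZE / SECSIZE;
}

/* number of whole fs blocks on a disk of nsec sectors */
int32_t
fsblocks(int32_t nsec)
{
	assert(nsec >= 0);
	return nsec / mult();
}

/* first sector of fs block b; no byte offset is ever formed */
int32_t
blocktosector(int32_t b)
{
	assert(b >= 0);
	return b * mult();
}

/* does the byte offset of block b still fit in a signed 32-bit long? */
int
bytefits(int32_t b)
{
	return (int64_t)b * RBUFSIZE <= INT32_MAX;
}
```

Under those assumptions one disk has 6835937 sectors, which is
854492 blocks of 4K.  The last block starts at sector 6835928,
which fits a long comfortably, while its byte offset (about
3.5e9) does not: byte offsets stop fitting at block 524288.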

[snip]
>reads a block, expecting it to have tag 3 (IND1 block) but instead
>got a file data block (tag 5, err DFile); I'm pretty sure the block 

What is the path set to?  Is it the first file or the second?
If it is the second, then you are overwriting the block.

>in question *should* have been an IND1 block but I haven't actually 
>seen the fs scribble on that particular block any time after it 
>gets flushed for the first time.  (Grr) It appears the right 

Was it in memory correctly before being flushed?

>I would like some comments on changing the way the fs knows
>about available physical disks.  As it stands the fs knows
>about 'Devwren', 'Devworm' and etc. which is fine.  The choices
>for adding an IDE interface were to utilize a new dev type
>(Devide) and codeletter (h) for IDE disks, (requires rewiring
>stuff in fs/port/sub.c)

Yes!

>                        or somehow patching into the scsi
>stuff below the Devwren level.

No!

>                                I chose the former as an
>easier solution but it's, well, icky; changing stuff in
>fs/port is evil since I'd have to stub out 'ideread/idewrite'
>in all other architectures.  

Why?  Use different names.  You need to handle the translation
of RBUFSIZE blocks to real sectors somewhere.

>Seems to me a better solution would be to have the hardware-specific 
>initialization stuff build a table describing the disks connected
>to the box (complete with codeletters, size, traps into the
>hardware driver, etc) and have fs/port/sub.c go indirect
>through the table to the hardware.

Maybe.  I like the way it is now because my mirror code likes
to know that a disk is missing (for whatever reason) and then
know that it is available again later (a reboot cleared the error,
or the drive was replaced).  The config block is written to a
mirror set with "config {w0.0w1.0}" and it tells you what the system
is supposed to look like even if it doesn't look like that now.
I have caused drive and controller failures and the system
just takes the drive (or all the drives on a failed controller)
off line and keeps running.  When the system is fixed and rebooted,
it finds the mirrors needing recovery and does it.  Even if it
is booted with the drives sick, it does the right thing.

When I added log support to the system, I thought about going
directly to scsiio so I could write less data to the log (you
have to go down to that level before 512-byte blocks are
visible).  But I also wanted to use the striping [] and
mirror {} capabilities, so I created another special file
system, "log".  (Special in the way that "main" is special.)
This decision made many other things easier, e.g. I can use
the buffer cache for recovery (getbuf/putbuf) and bypass it
otherwise (devwrite).

I would recommend using what is there; it works pretty well.

David Butler
gdb@dbSystems.com