9fans - fans of the OS Plan 9 from Bell Labs
* [9fans] DPT PM2041W observations
From: Eric @ 1998-01-12 20:38 UTC



> Date: Mon, 12 Jan 1998 13:58:43 -0600
> From: "G. David Butler" <gdb@dbSystems.com>
> To: 9fans@cse.psu.edu
> Subject: Re: [9fans] DPT PM2041W observations
> 
> >If the answer to that is "yes, with Tag Queueing", then I
> >need some kind soul to help me get that to happen on the
> >DPT adapter.  (The 154x can't do that.)  None of the
> >documentation or source I have enable that feature.
> 
> I found and read the relevant parts of the SCSI-2 spec.
> I now have tag queueing working.  I also found the DPO flag
> which helps both cards.  Now the 154x does 1460K/s (2% isn't
> much, but anything helps.)

Cool.  Patch?

> So why does this 4th generation card with tag queueing support
> run slower than the 1st generation design that is over 10 years
> old?

Perhaps the driver for the 1540 is better optimized: it's been
around longer and it's had more work put into it.  There might
be something in the DPT documentation that one can do with the
card that one can't do with a 1540 (bus on/off?).  The 1540 is a
well-developed and understood card, whereas most other stuff
isn't nearly as common.

> David Butler
> gdb@dbSystems.com

Regards,

Eric Dorman
edorman@ucsd.edu





* [9fans] DPT PM2041W observations
From: Mark @ 1998-01-14 22:56 UTC


In article <199801120019.SAA14049@ns.dbSystems.com>, you wrote:
>I ordered a PM2041W and it arrived Friday.  I started with
>the dpt driver in the update on plan9.bell-labs.com, the driver
>kit from DPT and source to the BSDi driver from DPT and modified
>my 154x/174x driver to talk to the 2041.  I finished Saturday
>afternoon and ran some performance tests.
>
>The 2041 is about 10%-15% *slower* than the 154x!  My main test,
>write performance, gets 1260K/s on the 2041 and 1440K/s on the
>154x to a single drive.

Ick! <an unofficial DPT response> ;->

On the BSDi system with a 2041, I found I could get something
like 2MB/s to a Barracuda, but that doesn't mean much as it's not
a controlled test.  That performance was 30% less than the 1540
before I started optimizing the driver (so yes, you can make a
sucky driver for the DPT controller!), but I would never admit
to that publicly ;->

Oops

The key to high performance with the ISA cards is to minimize
DMA activity, as it takes 150us to set up the DMA.  I went to
some trouble coalescing physically adjacent scatter/gather
entries.  There is a DMA setup for every command, for the
scatter/gather table, and then for each element of the scatter/
gather table.  EISA and PCI cards, on the other hand, only got
a 0.2% improvement from coalescing scatter/gather entries (their
DMA setup time is sub-microsecond), and in some cases actually
got slower (a direct result of using a slower host processor to
do the optimization).
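
To make that concrete, here is a minimal sketch of such a
coalescing pass; the Sgentry type and its field names are
invented for illustration, not taken from any DPT or Adaptec
header:

    /* invented scatter/gather entry; a real driver uses the
     * adapter's own layout */
    typedef struct Sgentry {
        unsigned long addr;    /* physical address */
        unsigned long len;     /* byte count */
    } Sgentry;

    /* Merge physically adjacent entries in place and return the
     * new count.  Fewer entries means fewer DMA setups, which
     * matters on ISA where each setup costs ~150us. */
    int
    sgcoalesce(Sgentry *sg, int n)
    {
        int i, j;

        if(n == 0)
            return 0;
        for(i = 0, j = 1; j < n; j++){
            if(sg[i].addr + sg[i].len == sg[j].addr)
                sg[i].len += sg[j].len;    /* adjacent: merge */
            else
                sg[++i] = sg[j];           /* gap: keep separate */
        }
        return i+1;
    }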

Adapter cache makes a difference, but only after you ensure your
DMA activities are minimized. The things I could tell you about DMA
but can't ... ;-<

>The computer is a P5/66 EISA bus computer with Micropolis 4421
>hard drives.  I used the cable that came with the DPT.  On this
>same computer and disk with a 1740 in enhanced mode I can get
>2680K/s write performance, so the computer and the drive don't
>seem to be the problem.  The same drive connected to a 2940
>on a P5/133 PCI machine running Linux (yuck) can write ~4000K/s!

Isn't caching wonderful!  A large buffer arena is *great*!

Now, try that `sustained' ...

>So, what gives?  I read in this group that the Smartcache IV
>ISA controller was timed 40% faster than the 2940.  With that
>kind of endorsement, I had to give it a try.  I wonder what
>stupid driver was used on the 2940?

It was me; this test occurred with BSDi 2.1.  The test was done
with a modified, microsecond-accurate version of the dd command
doing a raw sequential read over the entire (2G) drive.  Sequential
reads on the raw device are not real life.  BSDi 2.1 did no command
queueing, BTW.  Plan 9 appears to be able to do this.

>The way my driver works is a call to scsiio creates a scsi_queue
>entry ordered by r/w, unit, lun, scsi block address with all the
>data copied for any >16M buffering and calls the routine that
>creates ccb's.  The ccb routine gathers multiple scsi_queue
>elements into a scatter/gather list of no more than 32k and
>queues it to the adapter.

Are you ensuring that adjacent Scatter Gather entries get coalesced?

>On interrupt I get the adapter status, target status and ccb
>address from the status packet then acknowledge the interrupt.
>I then wake up each process waiting on its scsi_queue entry.
>If there are no more active ccb's (in other words, I shouldn't
>expect another interrupt), I then try to launch more ccb's before
>I return from the interrupt routine.

As soon as you read the status register, the status packet for
the next command gets filled and then the interrupt is
reasserted.  You read the aux status register to keep the next
interrupt from being set up.
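
As a sketch, one plausible ordering in the interrupt routine
(every name below, registers included, is a placeholder rather
than an actual DPT definition):

    sp = *statuspacket;    /* copy the whole packet out first */
    inb(port+Auxstatus);   /* per the above, reading aux status
                            * keeps the next interrupt from being
                            * set up underneath you */
    inb(port+Status);      /* last: this read lets the board fill
                            * the packet for the next command and
                            * reassert the interrupt */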

>I understand that the ISA bus is the problem I'm running into.
>I'm just trying to get to all of it.  In PIO mode, the ISA
>bus should be good for ~2MB/s. (8MHz, 16-bit, 8 clocks/transfer.)
>In DMA burst mode with 12us bus on time and 2us bus off time,
>I would expect ~6MB/s. (Burst at 8MB/s 80% of the time.)
>I'm only getting about ~2MB/s throughput to the drives, so is
>the drive idle for some time?  I know the adapter can accept
>multiple outstanding commands, can the drives?

You have to ask the drives that :->  There is a mode page
for that information.  If a drive cannot take multiple commands,
the adapter will hold them and sort them for you instead.
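
In SCSI-2 that page is presumably the control mode page (0Ah),
whose DQue bit says whether the drive has tagged queueing
disabled.  A sketch of the MODE SENSE(6) CDB to fetch it (buffer
handling and the actual I/O call omitted):

    unsigned char cmd[6] = {
        0x1a,    /* MODE SENSE(6) */
        0,       /* reserved/lun */
        0x0a,    /* PC=00 (current values), page code 0Ah:
                  * the control mode page */
        0,       /* reserved */
        255,     /* allocation length */
        0,       /* control */
    };
    /* in the returned control page, byte 3 bit 0 is DQue:
     * 0 means the drive will accept tagged commands */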

Highest performance usually comes when you have one command on
the drive, one more primed on the drive (two outstanding to
the controller), and the rest residing on the HOST machine to be
sorted and coalesced into larger requests (elevator optimization).

If there is a large number of sequential requests, then send them
off to the drive instead of holding back (a sequential mode).  The
controller will do the elevator optimizations, and the drive will
gain from the `zero relative access' latency improvement.

If the OS does not do elevator optimization (plan 9?), then it
is best to let the controller do the elevator optimization.
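
For reference, the heart of an elevator is just keeping the
pending queue sorted by block address.  A minimal sketch, with an
invented request type:

    typedef struct Req Req;
    struct Req {
        unsigned long lba;    /* starting block address */
        Req *next;
    };

    /* insert r in ascending-lba order; adjacent requests land
     * next to each other, ready to be coalesced into larger
     * transfers */
    void
    elevatorinsert(Req **q, Req *r)
    {
        while(*q != 0 && (*q)->lba < r->lba)
            q = &(*q)->next;
        r->next = *q;
        *q = r;
    }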

>If the answer to that is "yes, with Tag Queueing", then I
>need some kind soul to help me get that to happen on the
>DPT adapter.

It's on by default.  You can turn it off by doing a <CTRL-D>
during the BIOS setup phase and changing the command
queueing parameter to off.

>  (The 154x can't do that.) 

It can ... but it is some work ...

I hope my ramblings help.

Sincerely -- Mark Salyzyn





* [9fans] DPT PM2041W observations
From: G.David @ 1998-01-14 14:57 UTC


Some further questions.

How do I abort a command?

If a driver needs to time out on a command and reclaim the
memory, how do I tell the DPT that not only am I no longer
interested in the command, but that if it returns with the
address of a ccb the driver has reused, bad things will happen?

If there is a way to abort the command to the adapter,
how long does it wait for the queued command to come back
from the target?  What happens if that is not long enough?
Or does the adapter tell the target to abort the queued
operation?

In my Adaptec 154x driver, I send a ccb abort message and the
controller acknowledges it before I reuse the memory.
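
For comparison, that 154x flow amounts to posting the ccb's
address back through a mailbox-out entry with the abort action
code; the values below are from memory of the 1542 documentation
and should be treated as assumptions, as should the mb/ccb/port
names:

    enum {
        Mbfree  = 0x00,    /* mailbox-out entry available */
        Mbstart = 0x01,    /* start this ccb */
        Mbabort = 0x02,    /* abort this ccb */
    };

    mb->ccbaddr = paddr(ccb);     /* physical address of the ccb */
    mb->code = Mbabort;
    outb(port+Command, 0x02);     /* Start Mailbox Command */
    /* don't reuse the ccb memory until a mailbox-in entry naming
     * this ccb comes back */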

Is there a current technical reference manual?

David Butler
gdb@dbSystems.com





* [9fans] DPT PM2041W observations
From: G.David @ 1998-01-12 23:18 UTC


>So why does this 4th generation card with tag queueing support
>run slower than the 1st generation design that is over 10 years
>old?

Some further info.  Having enabled tag queueing I went back to
do other benchmarks since the sequential write performance is
now in the same ballpark.  Tag queueing makes a big difference
everywhere!  Random reads and writes are much faster than without
tags.  In fact, the overall result is that the DPT, with tag
queueing, is quite a bit faster than the 154x.  Without tag
queueing the 154x is faster.

Now should I investigate the cache/raid add-on card?  I currently
do "software" mirroring across controllers to handle controller
failure.  Most drives now have advertised MTBFs of 1,000,000 hours.
The 154x has an MTBF of 300,000 hours.  What is the DPT's?

I like the idea of mirroring on the controller so the data only
passes over the ISA bus once.  Is this true?  What is the minimum
configuration for the 2041W to do RAID-1?  What is the MTBF of
that configuration?  In that configuration is there any caching?
Does the cache honor DPO and FUA in the extended read/write
SCSI commands?

Is there a current technical reference manual available?

Thanks for any info.

David Butler
gdb@dbSystems.com





* [9fans] DPT PM2041W observations
From: Eric @ 1998-01-12 23:17 UTC



Gdb wrote:
> Date: Mon, 12 Jan 1998 14:52:37 -0600
> From: "G. David Butler" <gdb@dbSystems.com>
 
> >> So why does this 4th generation card with tag queueing support
> >> run slower than the 1st generation design that is over 10 years
> >> old?
> >Perhaps the driver for the 1540 is better optimized as it's been
> >around longer, and it's had more work put into it.  There might
> Remember, I have written both drivers from scratch and each
> is optimized for the controller it is talking to.

  Is it?  I gathered from your original post that your guide was
the BSDi DPT driver, and that you taught your 154x driver to talk
to the DPT; the DPT docs weren't mentioned.  Is the BSDi driver
considered definitive?

  Historically I vaguely recall the DPT people claiming their card
was 'faster' than a 154x, but that may be only for certain subsets
of problems.  DPT may be interested in discovering why, for this
test, it apparently loses out.  Then again, as PCI has supplanted
ISA for this application, maybe not.

  Do you have a caching module?  It may be that without one the
card isn't a win.  The caching might also be tuned for a request
distribution like the usual Micro$sloth stuff, which isn't valid
for Plan 9.
 
> David Butler
> gdb@dbSystems.com

Regards,

Eric Dorman
edorman@ucsd.edu






* [9fans] DPT PM2041W observations
From: G.David @ 1998-01-12 20:52 UTC


>Cool.  Patch?

For DPO?  Change wren.c to encode DPO in cmd[1] (cmd[1] = 0x10).
For command queueing?  It won't help the current DPT driver,
since it only sends one command to the card at a time, just like
the 154x driver that came with the system.  If you want to try,
set msg[1] to 0x20 and msg[2] to 0.  To use command queueing you
need to be able to send multiple commands to each *target*!
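
A sketch of both changes, per SCSI-2 (the cmd and msg arrays
stand in for whatever wren.c and the drivers actually use):

    unsigned char cmd[10], msg[3];

    /* DPO is bit 4 of byte 1 in the 10-byte read/write CDB */
    cmd[0] = 0x2a;    /* WRITE(10); 0x28 for READ(10) */
    cmd[1] = 0x10;    /* DPO: don't let this data displace the
                       * drive's cache */

    /* tagged queueing: two message bytes follow IDENTIFY */
    msg[1] = 0x20;    /* SIMPLE QUEUE TAG */
    msg[2] = 0;       /* the tag itself; it must be unique among
                       * the commands outstanding to the target */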

>> So why does this 4th generation card with tag queueing support
>> run slower than the 1st generation design that is over 10 years
>> old?
>
>Perhaps the driver for the 1540 is better optimized as it's been
>around longer, and it's had more work put into it.  There might

Remember, I have written both drivers from scratch and each
is optimized for the controller it is talking to.

>be something in the DPT documentation that one can do with the 
>card that one can't do with a 1540 ( bus on/off? ).  1540 is a

Yes, I know.  For the comparison I turned the 154x back to 12us
on and 2us off, to match the DPT.  I usually run the 154x at
13us on, 1us off.

David Butler
gdb@dbSystems.com





* [9fans] DPT PM2041W observations
From: G.David @ 1998-01-12 19:58 UTC


>If the answer to that is "yes, with Tag Queueing", then I
>need some kind soul to help me get that to happen on the
>DPT adapter.  (The 154x can't do that.)  None of the
>documentation or source I have enable that feature.

I found and read the relevant parts of the SCSI-2 spec.
I now have tag queueing working.  I also found the DPO flag,
which helps both cards.  Now the 154x does 1460K/s (2% isn't
much, but anything helps.)

With tag queueing and 4 outstanding commands the DPT now gets
1370K/s, up from 1260K/s.  At 8 outstanding commands it only
gets 1310K/s.  At 2 outstanding commands it gets 1430K/s.

So why does this 4th generation card with tag queueing support
run slower than the 1st generation design that is over 10 years
old?

David Butler
gdb@dbSystems.com





* [9fans] DPT PM2041W observations
From: G.David @ 1998-01-12 0:19 UTC


I ordered a PM2041W and it arrived Friday.  I started with
the dpt driver in the update on plan9.bell-labs.com, the driver
kit from DPT and source to the BSDi driver from DPT and modified
my 154x/174x driver to talk to the 2041.  I finished Saturday
afternoon and ran some performance tests.

The 2041 is about 10%-15% *slower* than the 154x!  My main test,
write performance, gets 1260K/s on the 2041 and 1440K/s on the
154x to a single drive.  During the test the BUSY and WRITE LEDs
are on *solid*, and the DATA FROM HOST and IRQ lights are on but
very dim.  The light on the drive is on solid too, but that never
means much.  (BTW, my test is real world.  The write data is
generated by my logging subsystem to one of the drives.  I just
cd to a directory and copy files to it.)

The computer is a P5/66 EISA bus computer with Micropolis 4421
hard drives.  I used the cable that came with the DPT.  On this
same computer and disk with a 1740 in enhanced mode I can get
2680K/s write performance, so the computer and the drive don't
seem to be the problem.  The same drive connected to a 2940
on a P5/133 PCI machine running Linux (yuck) can write ~4000K/s!

So, what gives?  I read in this group that the Smartcache IV
ISA controller was timed 40% faster than the 2940.  With that
kind of endorsement, I had to give it a try.  I wonder what
stupid driver was used on the 2940?

The way my driver works: a call to scsiio creates a scsi_queue
entry, ordered by r/w, unit, lun, and scsi block address, with all
the data copied for any >16M buffering, and calls the routine that
creates ccb's.  The ccb routine gathers multiple scsi_queue
elements into a scatter/gather list of no more than 32k and
queues it to the adapter.  It stops after 4 ccb's per target.
On interrupt I get the adapter status, target status and ccb
address from the status packet, then acknowledge the interrupt.
I then wake up each process waiting on its scsi_queue entry.
If there are no more active ccb's (in other words, if I shouldn't
expect another interrupt), I try to launch more ccb's before
I return from the interrupt routine.  Once a process is awake,
it gets the status that was saved and, if all is well, copies
any >16M data and returns to the calling routine.  It is very
fast and keeps the adapter very busy.
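
The ordering key amounts to a comparison like this; the Sreq type
and field names are invented for illustration:

    typedef struct Sreq {
        int write;            /* 0 read, 1 write */
        int unit;
        int lun;
        unsigned long lba;    /* scsi block address */
    } Sreq;

    int
    sreqcmp(Sreq *a, Sreq *b)
    {
        if(a->write != b->write)
            return a->write - b->write;
        if(a->unit != b->unit)
            return a->unit - b->unit;
        if(a->lun != b->lun)
            return a->lun - b->lun;
        if(a->lba != b->lba)
            return a->lba < b->lba ? -1 : 1;
        return 0;
    }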

I understand that the ISA bus is the problem I'm running into;
I'm just trying to get all of it.  In PIO mode, the ISA
bus should be good for ~2MB/s (8MHz, 16-bit, 8 clocks/transfer).
In DMA burst mode with 12us bus on time and 2us bus off time,
I would expect ~6MB/s (burst at 8MB/s 80% of the time).
I'm only getting ~2MB/s throughput to the drives, so is
the drive idle for some time?  I know the adapter can accept
multiple outstanding commands; can the drives?
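
As an aside, spelling out that arithmetic as a quick check
(figures as in the paragraph above):

    #include <stdio.h>

    int
    main(void)
    {
        /* PIO: 8MHz bus, 8 clocks per 16-bit (2-byte) transfer */
        double pio = 8e6 / 8 * 2;
        /* DMA: 8MB/s burst, on the bus 12us out of every 14us;
         * the 80% figure above rounds the 86% duty cycle down */
        double dma = 8e6 * 12.0 / (12.0 + 2.0);

        printf("PIO ~%.1fMB/s, DMA burst ~%.1fMB/s\n",
            pio/1e6, dma/1e6);
        return 0;
    }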

If the answer to that is "yes, with Tag Queueing", then I
need some kind soul to help me get that to happen on the
DPT adapter.  (The 154x can't do that.)  None of the
documentation or source I have enable that feature.

Thanks for any help.

David Butler
gdb@dbSystems.com



