From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 14 Jan 1998 17:56:24 -0500
From: Mark Salyzyn <salyzyn_mark@dpt.com>
Subject: [9fans] DPT PM2041W observations
Topicbox-Message-UUID: 7013e7d2-eac8-11e9-9e20-41e7f4b1d025
Message-ID: <19980114225624.SB_0uA9zhktVO0wKXrRts4Z7Zrq_VqvF6NJfiYd8UjY@z>

In article <199801120019.SAA14049@ns.dbSystems.com>, you wrote:

>I ordered a PM2041W and it arrived Friday. I started with
>the dpt driver in the update on plan9.bell-labs.com, the driver
>kit from DPT and source to the BSDi driver from DPT and modified
>my 154x/174x driver to talk to the 2041. I finished Saturday
>afternoon and ran some performance tests.
>
>The 2041 is about 10%-15% *slower* than the 154x! My main test,
>write performance, gets 1260K/s on the 2041 and 1440K/s on the
>154x to a single drive.

Ick! ;->

On the BSDi system, I found I could get something like 2MB/s to a
Barracuda with a 2041, but that doesn't mean much as it's not a
controlled test. That performance was 30% less than the 1540's before
I started optimizing the driver (so yes, you can make a sucky driver
for the DPT controller!), but I would never admit to that publicly
;-> Ooopps.

The key to high performance with the ISA cards is to ensure that DMA
activity is minimized, as it takes 150us to set up the DMA. I went to
some trouble coalescing physically adjacent scatter/gather entries.
There is a DMA setup for every command, for the scatter/gather table,
and then for each element of the scatter/gather table. The EISA and
PCI cards, on the other hand, only got a 0.2% improvement from
coalescing scatter/gather entries (their DMA setup time is
sub-microsecond), and in some cases actually got slower (a direct
result of using a slower host processor to do the optimization).
Adapter cache makes a difference, but only after you ensure your DMA
activities are minimized. The things I could tell you about DMA but
can't ... ;-<
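The coalescing itself is just one pass over the scatter/gather list,
merging any entry that begins where the previous one ends. A rough
sketch, with invented types and names rather than anything from the
DPT kit or a real driver:

    /* illustrative only -- not from the DPT kit or any shipping driver */
    typedef struct Sgentry Sgentry;
    struct Sgentry {
        unsigned long addr;   /* physical address of the piece */
        unsigned long len;    /* length in bytes */
    };

    /*
     * Merge physically adjacent scatter/gather entries in place and
     * return the new count.  On the ISA cards every entry removed is
     * one less 150us DMA setup.
     */
    int
    sgcoalesce(Sgentry *sg, int n)
    {
        int i, m;

        if(n <= 0)
            return n;
        m = 0;
        for(i = 1; i < n; i++){
            if(sg[m].addr + sg[m].len == sg[i].addr)
                sg[m].len += sg[i].len;   /* contiguous: extend previous */
            else
                sg[++m] = sg[i];          /* hole: keep as a new entry */
        }
        return m+1;
    }

On EISA and PCI the walk itself can cost more than it saves, which is
where the 0.2%-and-sometimes-slower numbers above come from.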
>The computer is a P5/66 EISA bus computer with Micropolis 4421
>hard drives. I used the cable that came with the DPT. On this
>same computer and disk with a 1740 in enhanced mode I can get
>2680K/s write performance, so the computer and the drive don't
>seem to be the problem. The same drive connected to a 2940
>on a P5/133 PCI machine running Linux(yuck) can write ~4000K/s!

Isn't caching wonderful! A large buffer arena is *great*! Now, try
that `sustained' ...

>So, what gives? I read in this group that the Smartcache IV
>ISA controller was timed 40% faster than the 2940. With that
>kind of endorsement, I had to give it a try. I wonder what
>stupid driver was used on the 2940?

It was me; that test occurred with BSDi 2.1. The test was done with a
modified, microsecond-accurate version of the dd command doing a raw
sequential read over the entire (2G) drive. Sequential reads on the
raw device are not real life. BSDi 2.1 did no command queueing, BTW.
Plan 9 appears to be able to do this.

>The way my driver works is a call to scsiio creates a scsi_queue
>entry ordered by r/w, unit, lun, scsi block address with all the
>data copied for any >16M buffering and calls the routine that
>creates ccb's. The ccb routine gathers multiple scsi_queue
>elements into a scatter/gather list of no more than 32k and
>queues it to the adapter.

Are you ensuring that adjacent scatter/gather entries get coalesced?

>On interrupt I get the adapter status, target status and ccb
>address from the status packet then acknowledge the interrupt.
>I then wake up each process waiting on its scsi_queue entry.
>If there are no more active ccb's (in other words, I shouldn't
>expect another interrupt), I then try to launch more ccb's before
>I return from the interrupt routine.

As soon as you read the status register, the status packet for the
next command gets filled in and the interrupt is reasserted. You read
the aux status register to prevent the next interrupt from being set
up.

>I understand that the ISA bus is the problem I'm running into.
>I'm just trying to get to all of it. In PIO mode, the ISA
>bus should be good for ~2MB/s. (8Mhz, 16bit, 8 clocks/transfer.)
>In DMA burst mode with 12us bus on time and 2us bus off time,
>I would expect ~6MB/s. (Burst at 8MB/s 80% of the time.)
>I'm only getting about ~2MB/s throughput to the drives, so is
>the drive idle for some time? I know the adapter can accept
>multiple outstanding commands, can the drives?

You have to ask the drives that :-> There is a mode page for that
information. If a drive cannot take it, the adapter will hold the
commands and sort them for you instead.

Highest performance is usually when you have one command active on
the drive, one more primed on the drive (two outstanding to the
controller), and the remainder residing on the HOST machine to be
sorted and coalesced into larger requests (elevator optimization). If
there is a large number of sequential requests, then send them off to
the drive instead of holding back (a sequential mode). The controller
will do the elevator optimization, and the drive will gain from the
`zero relative access' latency improvement. If the OS does not do
elevator optimization (Plan 9?), then it is best to let the
controller do the elevator optimization.

>If the answer to that is "yes, with Tag Queueing", then I
>need some kind soul to help me get that to happen on the
>DPT adapter.

It's on by default. You can turn it off during the BIOS setup phase
by changing the command queueing parameter to off.

> (The 154x can't do that.)

It can ... but it is some work ...

I hope my ramblings help.

Sincerely -- Mark Salyzyn
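P.S. The host-side ordering that the elevator depends on is nothing
exotic: an ordered insert into the pending queue keyed exactly the
way you describe, r/w, unit, lun, block address. A rough sketch, with
invented names rather than code from any actual driver:

    #include <stddef.h>

    /* illustrative only -- not from any shipping driver */
    typedef struct Req Req;
    struct Req {
        int           write;  /* 0 = read, 1 = write */
        int           unit;
        int           lun;
        unsigned long block;  /* starting SCSI block address */
        Req           *next;
    };

    /* compare in (r/w, unit, lun, block) order */
    static int
    reqcmp(Req *a, Req *b)
    {
        if(a->write != b->write)
            return a->write - b->write;
        if(a->unit != b->unit)
            return a->unit - b->unit;
        if(a->lun != b->lun)
            return a->lun - b->lun;
        if(a->block < b->block)
            return -1;
        return a->block > b->block;
    }

    /* keep the pending queue sorted so adjacent requests become neighbors */
    void
    reqinsert(Req **head, Req *r)
    {
        Req **l;

        for(l = head; *l != NULL; l = &(*l)->next)
            if(reqcmp(r, *l) < 0)
                break;
        r->next = *l;
        *l = r;
    }

With the queue kept in that order, runs of adjacent blocks sit next
to each other and can be folded into one larger command before the
ccb is built.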