* [TUHS] /dev/drum
From: Johnny Billquist @ 2018-04-23 23:44 UTC

On 2018-04-24 01:30, Grant Taylor <gtaylor at tnetconsulting.net> wrote:
> On 04/23/2018 04:15 PM, Warner Losh wrote:
>> It's weird. These days lower LBAs perform better on spinning drives.
>> We're seeing about 1.5x better performance on the first 30% of a drive
>> than on the last 30%, at least for read speeds for video streaming....
>
> I think manufacturers have switched things around on us. I'm used to
> higher LBA numbers being on the outside of the disk. But I've seen
> anecdotal indicators that the opposite is now true.

That switch must have happened somewhere in the middle of disk history, in
that case. Old (proper) drives had/have track 0 at the outer edge. The
drive loaded the heads after spin-up, which happened at the outer edge, and
then it just locked on to track 0, which would be nearby. Heads had to be
retracted for the disk pack to be replaced.

But this whole optimization of swap placement around transfer speed makes
no sense to me. The dominating factor on spinning rust is seek time, not
transfer speed. If you place the swap at one end of the disk, it won't
matter much that transfers are faster there, because seeks will on average
be much longer, and that will eat up any transfer-rate gain many times
over. (Unless all your disk ever does is swapping, in which case the heads
can stay around the swap area the whole time.)

Which is also why the file system for RSX (ODS-1) placed the index file
(the equivalent of the inode table) at the middle of the disk by default.

Not sure if Unix did that optimization, but I would hope so. (I never dug
into that part of the code.)

  Johnny

--
Johnny Billquist                  || "I'm on a bus
                                  ||  on a psychedelic trip
email: bqt at softjar.se           ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol
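A back-of-the-envelope model makes the seek-distance point concrete. If
accesses land uniformly at random across an N-cylinder pack, the expected
seek to a hot region is about N/2 cylinders when the region sits at an
edge and about N/4 when it sits in the middle. A minimal sketch, with an
arbitrary illustrative cylinder count (not modeled on any particular
drive):

    /* Rough model of average seek distance to a hot region (swap, or
     * an index file) placed at cylinder "hot", assuming the heads sit
     * at a uniformly random cylinder between requests.  Illustrative
     * only; the cylinder count is made up.
     */
    #include <stdio.h>

    #define NCYL 400

    static double avg_seek(int hot)
    {
        double sum = 0;
        int c;

        for (c = 0; c < NCYL; c++)
            sum += (c > hot) ? c - hot : hot - c;
        return sum / NCYL;
    }

    int main(void)
    {
        printf("hot region at edge:   avg seek %.1f cylinders\n",
            avg_seek(0));
        printf("hot region at middle: avg seek %.1f cylinders\n",
            avg_seek(NCYL / 2));
        return 0;
    }

This prints roughly 199.5 versus 100.0 cylinders: halving the average
seek distance is exactly the win behind ODS-1's default index-file
placement.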
* [TUHS] /dev/drum
From: Steve Nickolas @ 2018-04-23 23:57 UTC

On Tue, 24 Apr 2018, Johnny Billquist wrote:

> Which is also why the file system for RSX (ODS-1) placed the index file
> (the equivalent of the inode table) at the middle of the disk by default.
>
> Not sure if Unix did that optimization, but I would hope so. (I never dug
> into that part of the code.)

That reminds me of the Apple ][, whose original DOS put the directory and
bitmap table on track 17 (0-indexed) of a 35-track floppy disk, i.e.
right in the middle.

-uso.
* [TUHS] /dev/drum
From: Ronald Natalie @ 2018-04-24 0:24 UTC

It also makes no sense because disks of the day were constant angular
density (unlike CDs, for example, which are constant linear density).
There's no difference in transfer rate anywhere on the disk; each track
has the same number of sectors.

We spent a lot of time in those days on "elevator" algorithms and on
clustering inodes to try to minimize seek time. The other thing that was
done on the fancier devices (not often found on PDP-11s) was optimizing
for rotational position.

The original UNIX filesystems were dumb. The layout was
<boot block><superblock><inodes><datablocks>. It wasn't until later, with
things like the Berkeley file systems, that layouts started to get more
clever.
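The elevator idea fits in a few lines of C. A toy sketch, assuming an
unsorted request list scanned per pick; real drivers kept the queue
sorted at insert time instead (as the later BSD disksort() does):

    /* Toy elevator (SCAN) scheduler: pick the next request in the
     * current sweep direction; reverse the sweep when nothing lies
     * ahead.  Illustrative only.
     */
    #include <stddef.h>

    struct request { int cyl; struct request *next; };

    struct request *
    next_request(struct request *queue, int headpos, int *dir)
    {
        struct request *best = NULL, *r;

        for (r = queue; r != NULL; r = r->next) {
            /* only consider requests in the direction of travel */
            if (*dir > 0 ? r->cyl >= headpos : r->cyl <= headpos)
                if (best == NULL ||
                    (*dir > 0 ? r->cyl < best->cyl
                              : r->cyl > best->cyl))
                    best = r;
        }
        if (best == NULL && queue != NULL) {   /* nothing ahead: reverse */
            *dir = -*dir;
            return next_request(queue, headpos, dir);
        }
        return best;
    }

The payoff is that the heads sweep smoothly across the cylinders instead
of thrashing back and forth between distant requests.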
* [TUHS] /dev/drum
From: Warren Toomey @ 2018-04-24 0:25 UTC

On Tue, Apr 24, 2018 at 01:44:06AM +0200, Johnny Billquist wrote:
> Which is also why the file system for RSX (ODS-1) placed the index file
> (the equivalent of the inode table) at the middle of the disk by default.
>
> Not sure if Unix did that optimization, but I would hope so. (I never dug
> into that part of the code.)

The Boston Children's Museum RK05 driver for 6th Ed springs to mind!

Cheers, Warren
* [TUHS] i-nodes in middle of disk
From: Warren Toomey @ 2018-04-24 0:27 UTC

On Tue, Apr 24, 2018 at 10:25:17AM +1000, Warren Toomey wrote:
> The Boston Children's Museum RK05 driver for 6th Ed springs to mind!

See the blurb for the UNSW 01 image here:
http://www.tuhs.org/Archive/Distributions/UNSW

UNSW 01
-------
Tape label:   System Source Disk
              DD format URK? BS=24B count=203 800bpi 9track
              UNIX System Source 1 of 1
              25/1/78

A distribution of UNIX source from UNSW, with several changes. record0.gz
is an RK05 image laid out according to the `Boston Children's Museum'
format (i-nodes in the middle). The latest file timestamp is Jan 24 1978.
There is only kernel source, plus a `unswbatch' directory. The latter
seems to hold the source to a UNIX batch system developed by Ian Johnstone
and others at the School of Electrical Engineering at UNSW.

record0.tar.gz is a tar archive of the RK05 image.

Cheers, Warren
* [TUHS] i-nodes in middle of disk
From: Dave Horsfall @ 2018-04-24 10:19 UTC

On Tue, 24 Apr 2018, Warren Toomey wrote:

> UNSW 01
> -------
> Tape label:   System Source Disk
>               DD format URK? BS=24B count=203 800bpi 9track
> [...]
> seems to hold the source to a UNIX batch system developed by Ian
> Johnstone and others at the School of Electrical Engineering at UNSW.

Odd; I could've sworn that "URK" meant un-rotated (i.e. traditional)
format; are you sure about that?

And "unswbatch"... Ahhh... I must take a look at that distribution some
time, to see whether it has my fingerprints on it. Kevin Hill and I
totally rewrote the system, throwing out IanJ's rubbish, with him doing
the application stuff ("submit" etc. [*]) and me doing the driver. After
that, it actually worked (once I'd figured out a nasty bug in KRONOS'
UT-200 driver that led to a POLL/REJECT loop).

[*] I did a memorable hack to "submit" once. You see, we had an old VT-05
in the fishbowl, facing outwards for the sheeple, and it displayed the
batch queue (by title) in real time. Well, me being me, I hacked up the
aforesaid "submit" command to take multiple arbitrary files as input with
a specified title, so for a while it displayed "xxx... LLAMAS ARE BIGGER
THAN FROGS" (job names were 8 characters, and I have no idea how that
will display in people's MUAs).

(I *think* I was actually working for them at that time.)

--
Dave Horsfall DTM (VK2KFU)  "Those who don't understand security will
suffer."
* [TUHS] /dev/drum
From: Dave Horsfall @ 2018-04-24 0:31 UTC

On Tue, 24 Apr 2018, Warren Toomey wrote:

>> Not sure if Unix did that optimization, but I would hope so. (I never
>> dug into that part of the code.)
>
> The Boston Children's Museum RK05 driver for 6th Ed springs to mind!

Yep, with the inodes in the middle of the disk! It also helped that we
(UNSW) put the superblock in the middle of said slice...

--
Dave Horsfall BSc DTM (VK2KFU) -- FuglySoft -- Gosford IT -- Unix/C/Perl (AbW)
* [TUHS] /dev/drum
From: Lyndon Nerenberg @ 2018-04-24 1:02 UTC

> But this whole optimization of swap placement around transfer speed
> makes no sense to me. The dominating factor on spinning rust is seek
> time, not transfer speed.

And thus were born cylinder groups.
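Cylinder groups attack the seek problem by giving up on a single,
centrally placed inode table: 4.2BSD's FFS scatters a superblock copy,
inode table, and free maps into each group, and allocates a file's inode
and its data blocks in the same group. The addressing is just a pair of
integer divisions. A pared-down sketch, where the struct is a stand-in
holding only the relevant fields (the macros follow the shape of the
real ones in FFS's fs.h):

    /* Pared-down sketch of FFS cylinder-group addressing.  The real
     * struct fs has many more fields; fs_ipg and fs_fpg are the ones
     * that matter for locality.
     */
    struct fs {
        int fs_ncg;     /* number of cylinder groups */
        int fs_ipg;     /* inodes per cylinder group */
        int fs_fpg;     /* fragment blocks per cylinder group */
    };

    /* group holding inode i */
    #define ino_to_cg(fs, i)   ((i) / (fs)->fs_ipg)
    /* group holding data block d */
    #define dtog(fs, d)        ((d) / (fs)->fs_fpg)

Because an inode and the blocks allocated near it map to the same group,
most seeks stay within a group's few cylinders instead of crossing half
the disk.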
* [TUHS] /dev/drum
From: Grant Taylor @ 2018-04-24 4:32 UTC

On 04/23/2018 05:44 PM, Johnny Billquist wrote:
> But this whole optimization of swap placement around transfer speed
> makes no sense to me. The dominating factor on spinning rust is seek
> time, not transfer speed. If you place the swap at one end of the disk,
> it won't matter much that transfers are faster there, because seeks
> will on average be much longer, and that will eat up any transfer-rate
> gain many times over. (Unless all your disk ever does is swapping, in
> which case the heads can stay around the swap area the whole time.)

I wonder if part of the (perceived?) performance gain came from the
likelihood that swap at one end of the drive could be contiguous: seek
once, lay down or pick up a large (or at least not small) number of
sectors, and seek back.

I had always assumed that the outer edge (what I thought was the end of
the disk) was faster than the inner edge (what I thought was the
beginning) because of geometry. However, as Ronald stated, those drives
had constant angular density, which negates what I originally thought
about speed.

--
Grant. . . .
unix || die
* [TUHS] /dev/drum
From: Bakul Shah @ 2018-04-24 4:49 UTC

On Mon, 23 Apr 2018 22:32:26 -0600 Grant Taylor via TUHS wrote:
>
> I had always assumed that the outer edge (what I thought was the end of
> the disk) was faster than the inner edge (what I thought was the
> beginning) because of geometry. However, as Ronald stated, those drives
> had constant angular density, which negates what I originally thought
> about speed.

Constant angular velocity means a faster linear velocity for tracks
further from the center. Since 1990 or so, disk tracks have been divided
into 16 or so "zones", where outer zones have more blocks per track.
This translates to higher throughput in the outer zones.

A modern Seagate Exos SAS disk may range from 279MB/s (outermost) to
136MB/s (innermost), or 300MB/s to 210MB/s for faster (15Krpm) disks.
Disk vendors don't seem to publish this range for consumer drives, but
you can measure it with tools like diskinfo on FreeBSD. For example:

# diskinfo -t /dev/ada4   # this is a 5-year-old 1TB WD "Black" disk.
/dev/ada4
        ...
        Not_Zoned       # Zone Mode  <<== this seems wrong.
        ...
Transfer rates:
        outside:  102400 kbytes in 0.972176 sec = 105331 kbytes/sec
        middle:   102400 kbytes in 1.088977 sec =  94033 kbytes/sec
        inside:   102400 kbytes in 1.804460 sec =  56748 kbytes/sec
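That three-point measurement is easy to reproduce with nothing but
lseek() and read() against the raw device, which is roughly what
diskinfo -t does. A rough sketch; the default device path, the sample
size, and sizing the device via SEEK_END are illustrative assumptions:

    /* Crude zone-throughput probe: time a sequential read at the
     * start, middle, and end of a raw disk.  Run against the raw
     * device, not a mounted filesystem.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    #define CHUNK  (1 << 20)        /* 1 MiB per read */
    #define TOTAL  (100 * CHUNK)    /* 100 MiB per sample */

    static void probe(int fd, off_t disksize, double frac)
    {
        static char buf[CHUNK];
        off_t off = (off_t)(disksize * frac) & ~(off_t)(CHUNK - 1);
        struct timespec t0, t1;
        long long done = 0;
        ssize_t n;

        lseek(fd, off, SEEK_SET);
        clock_gettime(CLOCK_MONOTONIC, &t0);
        while (done < TOTAL && (n = read(fd, buf, CHUNK)) > 0)
            done += n;
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double secs = (t1.tv_sec - t0.tv_sec) +
                      (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("offset %3.0f%%: %.0f MB/s\n", frac * 100,
            done / 1e6 / secs);
    }

    int main(int argc, char **argv)
    {
        int fd = open(argc > 1 ? argv[1] : "/dev/ada4", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }
        off_t disksize = lseek(fd, 0, SEEK_END);  /* device size */
        probe(fd, disksize, 0.0);
        probe(fd, disksize, 0.5);
        probe(fd, disksize, 0.99);
        return 0;
    }

On a zoned drive like the one above, the 0% sample should come out
roughly twice as fast as the 99% sample.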
* [TUHS] /dev/drum
From: Warner Losh @ 2018-04-24 4:59 UTC

On Mon, Apr 23, 2018 at 10:49 PM, Bakul Shah <bakul at bitblocks.com> wrote:
> Constant angular velocity means a faster linear velocity for tracks
> further from the center. Since 1990 or so, disk tracks have been
> divided into 16 or so "zones", where outer zones have more blocks per
> track. This translates to higher throughput in the outer zones.
>
> # diskinfo -t /dev/ada4   # this is a 5-year-old 1TB WD "Black" disk.
> /dev/ada4
>         ...
>         Not_Zoned       # Zone Mode  <<== this seems wrong.

That's right. This is the BIO_ZONE stuff, which has to do with
host-managed and host-aware SMR drive zones. That's different from the
zones you are talking about.

> Transfer rates:
>         outside:  102400 kbytes in 0.972176 sec = 105331 kbytes/sec
>         middle:   102400 kbytes in 1.088977 sec =  94033 kbytes/sec
>         inside:   102400 kbytes in 1.804460 sec =  56748 kbytes/sec

Yes. This matches our experience, where we get 1.5x better throughput on
the low LBAs than the high LBAs. We're looking to "short stroke" the
drive, using only the first part of it, to get better performance...
Toss a filesystem on top, run a more random workload, and it's down to
about 30% better than using the whole drive....

Warner
* [TUHS] /dev/drum
From: Bakul Shah @ 2018-04-24 6:22 UTC

On Mon, 23 Apr 2018 22:59:19 -0600 Warner Losh <imp at bsdimp.com> wrote:
> That's right. This is the BIO_ZONE stuff, which has to do with
> host-managed and host-aware SMR drive zones. That's different from the
> zones you are talking about.

Ah. Thanks! Does host management of SMR zones provide better throughput
for sequential writes? Enough to make it worth it? [I guess this may be
something you guys care about?] I haven't had a chance to work on
storage for ages. [The last time I played with Ceph was 5 years ago, and
at a higher level than disks.]

> Yes. This matches our experience, where we get 1.5x better throughput
> on the low LBAs than the high LBAs. We're looking to "short stroke"
> the drive, using only the first part of it, to get better
> performance... Toss a filesystem on top, run a more random workload,
> and it's down to about 30% better than using the whole drive....

Is the tradeoff worth it? Now you have choices like SATA vs SAS vs SSD
vs PCIe....

We've come a long way from /dev/drum :-)
* [TUHS] /dev/drum
From: Warner Losh @ 2018-04-24 14:57 UTC

On Tue, Apr 24, 2018 at 12:22 AM, Bakul Shah <bakul at bitblocks.com> wrote:
> Ah. Thanks! Does host management of SMR zones provide better throughput
> for sequential writes? Enough to make it worth it? [I guess this may
> be something you guys care about?]

Right now, I don't think we do anything in stock FreeBSD with the zones.
I've looked at creating some kind of filesystem that copes with the
large-granularity writes that host-managed SMR drives need, and at the
changes we'd have to make for a write-in-place FS to take advantage of
them. It's possible, but it would turn UFS from a fully write-in-place
system into one that writes in place only for metadata, with different
free-block allocation methods. So far it hasn't been enough of a win to
be worth bothering with for our application (e.g., we could get 10-20%
more storage, but that delta is likely to remain constant, and the effort
to make it happen is high enough that the savings wouldn't pay for the
development).

> Is the tradeoff worth it? Now you have choices like SATA vs SAS vs SSD
> vs PCIe....

We have a multi-tiered storage architecture. When you want to play a
video from our service, we see if any of the close, fast boxes has a
copy we can use. If they are too busy, we go back to slower but more
complete tiers. The last tier is made up of machines with lots of
spinning disks. Some catalogs are small enough that using only half the
drive, but getting 30% better throughput, is the right engineering
decision, since it improves network utilization without needing to
deploy more servers.

We use all those technologies in the different tiers: our fastest 100G
boxes are NVMe, the 40G boxes are JBODs of SSDs, and the 10G storage
boxes are spinning rust. We use SATA for SSDs, since SAS SSDs are super
pricey, but we use SAS HDDs, since we need the deeper queues and other
features of SAS that are absent from SATA. We also sometimes
oversubscribe PCIe lanes to get better storage density at a cheaper
price point. There are lots of tradeoffs that can be made....

Warner
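The constraint driving all of that filesystem work is simple to state: a
host-managed SMR zone only accepts writes at its current write pointer,
so every data write becomes an append. A toy model of the rule, with
hypothetical types (the real interface is the ZBC/ZAC command set that
BIO_ZONE exposes):

    /* Toy model of the host-managed SMR write rule: each zone has a
     * write pointer, and a write is legal only if it starts exactly
     * there and fits in the zone.  Hypothetical structures; the real
     * rules live in the ZBC/ZAC specs.
     */
    #include <stdbool.h>
    #include <stdint.h>

    struct zone {
        uint64_t start;   /* first LBA of the zone */
        uint64_t len;     /* zone length in blocks */
        uint64_t wp;      /* write pointer: next writable LBA */
    };

    static bool write_ok(const struct zone *z, uint64_t lba,
        uint64_t nblocks)
    {
        return lba == z->wp &&                      /* append only */
            lba + nblocks <= z->start + z->len;     /* stay in zone */
    }

    static void advance(struct zone *z, uint64_t nblocks)
    {
        z->wp += nblocks;   /* a reset-zone op sets wp back to start */
    }

Funneling every data write through a check like this is why retrofitting
a write-in-place design such as UFS is so much work.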
* [TUHS] /dev/drum
From: Lars Brinkhoff @ 2018-04-24 6:46 UTC

Johnny Billquist wrote:
> Which is also why the file system for RSX (ODS-1) placed the index file
> (the equivalent of the inode table) at the middle of the disk by
> default. Not sure if Unix did that optimization, but I would hope so.

I know of an operating system predating Unix which has that optimization.