* [TUHS] /dev/drum
From: Johnny Billquist @ 2018-04-23 23:44 UTC

On 2018-04-24 01:30, Grant Taylor <gtaylor at tnetconsulting.net> wrote:
> On 04/23/2018 04:15 PM, Warner Losh wrote:
>> It's weird. These days lower LBAs perform better on spinning drives.
>> We're seeing about 1.5x better performance on the first 30% of a drive
>> than on the last 30%, at least for read speeds for video streaming....
>
> I think manufacturers have switched things around on us. I'm used to
> higher LBA numbers being on the outside of the disk. But I've seen
> anecdotal indicators that the opposite is now true.

That switch must have happened somewhere in the middle of disk history, in
that case. Old (proper) drives had/have track 0 at the outer edge. The
drive loaded the heads after spin-up, which happened at the outer edge, and
then it just locked on to track 0, which would be nearby. Heads had to be
retracted for the disk pack to be replaced.

But this whole optimization of swap placement around transfer speed makes
no sense to me. The dominating factor on spinning rust is seek time, not
transfer speed. If you place the swap at one end of the disk, it won't
matter much that transfers are faster there, because seeks will on average
be much longer, and that will eat up any transfer-rate gain many times
over. (Unless all your disk ever does is swapping, in which case the heads
can stay around the swap area the whole time.)

Which is also why the file system for RSX (ODS-1) placed the index file
(the equivalent of the inode table) at the middle of the disk by default.

Not sure if Unix did that optimization, but I would hope so. (I never dug
into that part of the code.)

  Johnny

--
Johnny Billquist                  || "I'm on a bus
                                  ||  on a psychedelic trip
email: bqt at softjar.se           ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol
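A back-of-the-envelope model makes the seek-distance point concrete. If
accesses land uniformly at random across an N-cylinder pack, the expected
seek to a hot region is about N/2 cylinders when the region sits at an
edge and about N/4 when it sits in the middle. A minimal sketch, with an
arbitrary illustrative cylinder count (not modeled on any particular
drive):

    /* Rough model of average seek distance to a hot region (swap, or
     * an index file) placed at cylinder "hot", assuming the heads sit
     * at a uniformly random cylinder between requests.  Illustrative
     * only; the cylinder count is made up.
     */
    #include <stdio.h>

    #define NCYL 400

    static double avg_seek(int hot)
    {
        double sum = 0;
        int c;

        for (c = 0; c < NCYL; c++)
            sum += (c > hot) ? c - hot : hot - c;
        return sum / NCYL;
    }

    int main(void)
    {
        printf("hot region at edge:   avg seek %.1f cylinders\n",
            avg_seek(0));
        printf("hot region at middle: avg seek %.1f cylinders\n",
            avg_seek(NCYL / 2));
        return 0;
    }

This prints roughly 199.5 versus 100.0 cylinders: halving the average
seek distance is exactly the win behind ODS-1's default index-file
placement.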
* [TUHS] /dev/drum
From: Steve Nickolas @ 2018-04-23 23:57 UTC

On Tue, 24 Apr 2018, Johnny Billquist wrote:

> Which is also why the file system for RSX (ODS-1) placed the index file
> (the equivalent of the inode table) at the middle of the disk by default.
>
> Not sure if Unix did that optimization, but I would hope so. (I never dug
> into that part of the code.)

That reminds me of the Apple ][, whose original DOS put the directory and
bitmap table on track 17 (0-indexed) of a 35-track floppy disk, i.e.
right in the middle.

-uso.
* [TUHS] /dev/drum
From: Ronald Natalie @ 2018-04-24 0:24 UTC

It also makes no sense because disks of the day were constant angular
density (unlike CDs, for example, which are constant linear density).
There's no difference in transfer rate anywhere on the disk; each track
has the same number of sectors.

We spent a lot of time in those days on "elevator" algorithms and on
clustering inodes to try to minimize seek time. The other thing that was
done on the fancier devices (not often found on PDP-11s) was optimizing
for rotational position.

The original UNIX filesystems were dumb. The layout was
<boot block><superblock><inodes><datablocks>. It wasn't until later, with
things like the Berkeley file systems, that layouts started to get more
clever.
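The elevator idea fits in a few lines of C. A toy sketch, assuming an
unsorted request list scanned per pick; real drivers kept the queue
sorted at insert time instead (as the later BSD disksort() does):

    /* Toy elevator (SCAN) scheduler: pick the next request in the
     * current sweep direction; reverse the sweep when nothing lies
     * ahead.  Illustrative only.
     */
    #include <stddef.h>

    struct request { int cyl; struct request *next; };

    struct request *
    next_request(struct request *queue, int headpos, int *dir)
    {
        struct request *best = NULL, *r;

        for (r = queue; r != NULL; r = r->next) {
            /* only consider requests in the direction of travel */
            if (*dir > 0 ? r->cyl >= headpos : r->cyl <= headpos)
                if (best == NULL ||
                    (*dir > 0 ? r->cyl < best->cyl
                              : r->cyl > best->cyl))
                    best = r;
        }
        if (best == NULL && queue != NULL) {   /* nothing ahead: reverse */
            *dir = -*dir;
            return next_request(queue, headpos, dir);
        }
        return best;
    }

The payoff is that the heads sweep smoothly across the cylinders instead
of thrashing back and forth between distant requests.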
* [TUHS] /dev/drum
From: Warren Toomey @ 2018-04-24 0:25 UTC

On Tue, Apr 24, 2018 at 01:44:06AM +0200, Johnny Billquist wrote:
> Which is also why the file system for RSX (ODS-1) placed the index file
> (the equivalent of the inode table) at the middle of the disk by default.
>
> Not sure if Unix did that optimization, but I would hope so. (I never dug
> into that part of the code.)

The Boston Children's Museum RK05 driver for 6th Ed springs to mind!

Cheers, Warren
* [TUHS] i-nodes in middle of disk
From: Warren Toomey @ 2018-04-24 0:27 UTC

On Tue, Apr 24, 2018 at 10:25:17AM +1000, Warren Toomey wrote:
> The Boston Children's Museum RK05 driver for 6th Ed springs to mind!

See the blurb for the UNSW 01 image here:
http://www.tuhs.org/Archive/Distributions/UNSW

UNSW 01
-------
Tape label:   System Source Disk
              DD format URK? BS=24B count=203 800bpi 9track
              UNIX System Source 1 of 1
              25/1/78

A distribution of UNIX source from UNSW, with several changes. record0.gz
is an RK05 image laid out according to the `Boston Children's Museum'
format (i-nodes in the middle). The latest file timestamp is Jan 24 1978.
There is only kernel source, plus a `unswbatch' directory. The latter
seems to hold the source to a UNIX batch system developed by Ian Johnstone
and others at the School of Electrical Engineering at UNSW.

record0.tar.gz is a tar archive of the RK05 image.

Cheers, Warren
* [TUHS] i-nodes in middle of disk
From: Dave Horsfall @ 2018-04-24 10:19 UTC

On Tue, 24 Apr 2018, Warren Toomey wrote:

> UNSW 01
> -------
> Tape label:   System Source Disk
>               DD format URK? BS=24B count=203 800bpi 9track
> [...]
> seems to hold the source to a UNIX batch system developed by Ian
> Johnstone and others at the School of Electrical Engineering at UNSW.

Odd; I could've sworn that "URK" meant un-rotated (i.e. traditional)
format; are you sure about that?

And "unswbatch"... Ahhh... I must take a look at that distribution some
time, to see whether it has my fingerprints on it. Kevin Hill and I
totally rewrote the system, throwing out IanJ's rubbish, with him doing
the application stuff ("submit" etc. [*]) and me doing the driver. After
that, it actually worked (once I'd figured out a nasty bug in KRONOS'
UT-200 driver that led to a POLL/REJECT loop).

[*] I did a memorable hack to "submit" once. You see, we had an old VT-05
in the fishbowl, facing outwards for the sheeple, and it displayed the
batch queue (by title) in real time. Well, me being me, I hacked up the
aforesaid "submit" command to take multiple arbitrary files as input with
a specified title, so for a while it displayed "xxx... LLAMAS ARE BIGGER
THAN FROGS" (job names were 8 characters, and I have no idea how that
will display in people's MUAs).

(I *think* I was actually working for them at that time.)

--
Dave Horsfall DTM (VK2KFU)  "Those who don't understand security will
suffer."
* [TUHS] /dev/drum
From: Dave Horsfall @ 2018-04-24 0:31 UTC

On Tue, 24 Apr 2018, Warren Toomey wrote:

>> Not sure if Unix did that optimization, but I would hope so. (I never
>> dug into that part of the code.)
>
> The Boston Children's Museum RK05 driver for 6th Ed springs to mind!

Yep, with the inodes in the middle of the disk! It also helped that we
(UNSW) put the superblock in the middle of said slice...

--
Dave Horsfall BSc DTM (VK2KFU) -- FuglySoft -- Gosford IT -- Unix/C/Perl (AbW)
* [TUHS] /dev/drum
From: Lyndon Nerenberg @ 2018-04-24 1:02 UTC

> But this whole optimization of swap placement around transfer speed
> makes no sense to me. The dominating factor on spinning rust is seek
> time, not transfer speed.

And thus were born cylinder groups.
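Cylinder groups attack the seek problem by giving up on a single,
centrally placed inode table: 4.2BSD's FFS scatters a superblock copy,
inode table, and free maps into each group, and allocates a file's inode
and its data blocks in the same group. The addressing is just a pair of
integer divisions. A pared-down sketch, where the struct is a stand-in
holding only the relevant fields (the macros follow the shape of the
real ones in FFS's fs.h):

    /* Pared-down sketch of FFS cylinder-group addressing.  The real
     * struct fs has many more fields; fs_ipg and fs_fpg are the ones
     * that matter for locality.
     */
    struct fs {
        int fs_ncg;     /* number of cylinder groups */
        int fs_ipg;     /* inodes per cylinder group */
        int fs_fpg;     /* fragment blocks per cylinder group */
    };

    /* group holding inode i */
    #define ino_to_cg(fs, i)   ((i) / (fs)->fs_ipg)
    /* group holding data block d */
    #define dtog(fs, d)        ((d) / (fs)->fs_fpg)

Because an inode and the blocks allocated near it map to the same group,
most seeks stay within a group's few cylinders instead of crossing half
the disk.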
* [TUHS] /dev/drum
From: Grant Taylor @ 2018-04-24 4:32 UTC

On 04/23/2018 05:44 PM, Johnny Billquist wrote:
> But this whole optimization of swap placement around transfer speed
> makes no sense to me. The dominating factor on spinning rust is seek
> time, not transfer speed. If you place the swap at one end of the disk,
> it won't matter much that transfers are faster there, because seeks
> will on average be much longer, and that will eat up any transfer-rate
> gain many times over. (Unless all your disk ever does is swapping, in
> which case the heads can stay around the swap area the whole time.)

I wonder if part of the (perceived?) performance gain came from the
likelihood that swap at one end of the drive could be contiguous: seek
once, lay down or pick up a large (or at least not small) number of
sectors, and seek back.

I had always assumed that the outer edge (what I thought was the end of
the disk) was faster than the inner edge (what I thought was the
beginning) because of geometry. However, as Ronald stated, those drives
had constant angular density, which negates what I originally thought
about speed.

--
Grant. . . .
unix || die
* [TUHS] /dev/drum
From: Bakul Shah @ 2018-04-24 4:49 UTC

On Mon, 23 Apr 2018 22:32:26 -0600 Grant Taylor via TUHS wrote:
>
> I had always assumed that the outer edge (what I thought was the end of
> the disk) was faster than the inner edge (what I thought was the
> beginning) because of geometry. However, as Ronald stated, those drives
> had constant angular density, which negates what I originally thought
> about speed.

Constant angular velocity means a faster linear velocity for tracks
further from the center. Since 1990 or so, disk tracks have been divided
into 16 or so "zones", where outer zones have more blocks per track.
This translates to higher throughput in the outer zones.

A modern Seagate Exos SAS disk may range from 279MB/s (outermost) to
136MB/s (innermost), or 300MB/s to 210MB/s for faster (15Krpm) disks.
Disk vendors don't seem to publish this range for consumer drives, but
you can measure it with tools like diskinfo on FreeBSD. For example:

# diskinfo -t /dev/ada4   # this is a 5-year-old 1TB WD "Black" disk.
/dev/ada4
        ...
        Not_Zoned       # Zone Mode  <<== this seems wrong.
        ...
Transfer rates:
        outside:  102400 kbytes in 0.972176 sec = 105331 kbytes/sec
        middle:   102400 kbytes in 1.088977 sec =  94033 kbytes/sec
        inside:   102400 kbytes in 1.804460 sec =  56748 kbytes/sec
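That three-point measurement is easy to reproduce with nothing but
lseek() and read() against the raw device, which is roughly what
diskinfo -t does. A rough sketch; the default device path, the sample
size, and sizing the device via SEEK_END are illustrative assumptions:

    /* Crude zone-throughput probe: time a sequential read at the
     * start, middle, and end of a raw disk.  Run against the raw
     * device, not a mounted filesystem.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    #define CHUNK  (1 << 20)        /* 1 MiB per read */
    #define TOTAL  (100 * CHUNK)    /* 100 MiB per sample */

    static void probe(int fd, off_t disksize, double frac)
    {
        static char buf[CHUNK];
        off_t off = (off_t)(disksize * frac) & ~(off_t)(CHUNK - 1);
        struct timespec t0, t1;
        long long done = 0;
        ssize_t n;

        lseek(fd, off, SEEK_SET);
        clock_gettime(CLOCK_MONOTONIC, &t0);
        while (done < TOTAL && (n = read(fd, buf, CHUNK)) > 0)
            done += n;
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double secs = (t1.tv_sec - t0.tv_sec) +
                      (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("offset %3.0f%%: %.0f MB/s\n", frac * 100,
            done / 1e6 / secs);
    }

    int main(int argc, char **argv)
    {
        int fd = open(argc > 1 ? argv[1] : "/dev/ada4", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }
        off_t disksize = lseek(fd, 0, SEEK_END);  /* device size */
        probe(fd, disksize, 0.0);
        probe(fd, disksize, 0.5);
        probe(fd, disksize, 0.99);
        return 0;
    }

On a zoned drive like the one above, the 0% sample should come out
roughly twice as fast as the 99% sample.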
* [TUHS] /dev/drum
From: Warner Losh @ 2018-04-24 4:59 UTC

On Mon, Apr 23, 2018 at 10:49 PM, Bakul Shah <bakul at bitblocks.com> wrote:
> Constant angular velocity means a faster linear velocity for tracks
> further from the center. Since 1990 or so, disk tracks have been
> divided into 16 or so "zones", where outer zones have more blocks per
> track. This translates to higher throughput in the outer zones.
>
> # diskinfo -t /dev/ada4   # this is a 5-year-old 1TB WD "Black" disk.
> /dev/ada4
>         ...
>         Not_Zoned       # Zone Mode  <<== this seems wrong.

That's right. This is the BIO_ZONE stuff, which has to do with
host-managed and host-aware SMR drive zones. That's different from the
zones you are talking about.

> Transfer rates:
>         outside:  102400 kbytes in 0.972176 sec = 105331 kbytes/sec
>         middle:   102400 kbytes in 1.088977 sec =  94033 kbytes/sec
>         inside:   102400 kbytes in 1.804460 sec =  56748 kbytes/sec

Yes. This matches our experience, where we get 1.5x better throughput on
the low LBAs than the high LBAs. We're looking to "short stroke" the
drive, using only the first part of it, to get better performance...
Toss a filesystem on top, run a more random workload, and it's down to
about 30% better than using the whole drive....

Warner
* [TUHS] /dev/drum
From: Bakul Shah @ 2018-04-24 6:22 UTC

On Mon, 23 Apr 2018 22:59:19 -0600 Warner Losh <imp at bsdimp.com> wrote:
> That's right. This is the BIO_ZONE stuff, which has to do with
> host-managed and host-aware SMR drive zones. That's different from the
> zones you are talking about.

Ah. Thanks! Does host management of SMR zones provide better throughput
for sequential writes? Enough to make it worth it? [I guess this may be
something you guys care about?] I haven't had a chance to work on
storage for ages. [The last time I played with Ceph was 5 years ago, and
at a higher level than disks.]

> Yes. This matches our experience, where we get 1.5x better throughput
> on the low LBAs than the high LBAs. We're looking to "short stroke"
> the drive, using only the first part of it, to get better
> performance... Toss a filesystem on top, run a more random workload,
> and it's down to about 30% better than using the whole drive....

Is the tradeoff worth it? Now you have choices like SATA vs SAS vs SSD
vs PCIe....

We've come a long way from /dev/drum :-)
* [TUHS] /dev/drum
From: Warner Losh @ 2018-04-24 14:57 UTC

On Tue, Apr 24, 2018 at 12:22 AM, Bakul Shah <bakul at bitblocks.com> wrote:
> Ah. Thanks! Does host management of SMR zones provide better throughput
> for sequential writes? Enough to make it worth it? [I guess this may
> be something you guys care about?]

Right now, I don't think we do anything in stock FreeBSD with the zones.
I've looked at creating some kind of filesystem that copes with the
large-granularity writes that host-managed SMR drives need, and at the
changes we'd have to make for a write-in-place FS to take advantage of
them. It's possible, but it would turn UFS from a fully write-in-place
system into one that writes in place only for metadata, with different
free-block allocation methods. So far it hasn't been enough of a win to
be worth bothering with for our application (e.g., we could get 10-20%
more storage, but that delta is likely to remain constant, and the effort
to make it happen is high enough that the savings wouldn't pay for the
development).

> Is the tradeoff worth it? Now you have choices like SATA vs SAS vs SSD
> vs PCIe....

We have a multi-tiered storage architecture. When you want to play a
video from our service, we see if any of the close, fast boxes has a
copy we can use. If they are too busy, we go back to slower but more
complete tiers. The last tier is made up of machines with lots of
spinning disks. Some catalogs are small enough that using only half the
drive, but getting 30% better throughput, is the right engineering
decision, since it improves network utilization without needing to
deploy more servers.

We use all those technologies in the different tiers: our fastest 100G
boxes are NVMe, the 40G boxes are JBODs of SSDs, and the 10G storage
boxes are spinning rust. We use SATA for SSDs, since SAS SSDs are super
pricey, but we use SAS HDDs, since we need the deeper queues and other
features of SAS that are absent from SATA. We also sometimes
oversubscribe PCIe lanes to get better storage density at a cheaper
price point. There are lots of tradeoffs that can be made....

Warner
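The constraint driving all of that filesystem work is simple to state: a
host-managed SMR zone only accepts writes at its current write pointer,
so every data write becomes an append. A toy model of the rule, with
hypothetical types (the real interface is the ZBC/ZAC command set that
BIO_ZONE exposes):

    /* Toy model of the host-managed SMR write rule: each zone has a
     * write pointer, and a write is legal only if it starts exactly
     * there and fits in the zone.  Hypothetical structures; the real
     * rules live in the ZBC/ZAC specs.
     */
    #include <stdbool.h>
    #include <stdint.h>

    struct zone {
        uint64_t start;   /* first LBA of the zone */
        uint64_t len;     /* zone length in blocks */
        uint64_t wp;      /* write pointer: next writable LBA */
    };

    static bool write_ok(const struct zone *z, uint64_t lba,
        uint64_t nblocks)
    {
        return lba == z->wp &&                      /* append only */
            lba + nblocks <= z->start + z->len;     /* stay in zone */
    }

    static void advance(struct zone *z, uint64_t nblocks)
    {
        z->wp += nblocks;   /* a reset-zone op sets wp back to start */
    }

Funneling every data write through a check like this is why retrofitting
a write-in-place design such as UFS is so much work.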
* [TUHS] /dev/drum
From: Lars Brinkhoff @ 2018-04-24 6:46 UTC

Johnny Billquist wrote:
> Which is also why the file system for RSX (ODS-1) placed the index file
> (the equivalent of the inode table) at the middle of the disk by
> default. Not sure if Unix did that optimization, but I would hope so.

I know of an operating system predating Unix which has that optimization.