The Unix Heritage Society mailing list
* [TUHS] Perkin-Elmer Sort/Merge II vs Unix sort(1)
@ 2025-01-17 17:23 Diomidis Spinellis
  2025-01-17 19:10 ` [TUHS] " Bakul Shah via TUHS
  2025-01-17 20:07 ` John Levine
  0 siblings, 2 replies; 17+ messages in thread
From: Diomidis Spinellis @ 2025-01-17 17:23 UTC (permalink / raw)
  To: TUHS main list

I chanced upon a brochure describing the Perkin-Elmer Series 3200 
(previously Interdata, later Concurrent Computer Corporation) Sort/Merge 
II utility [1].  It is instructive to compare its design against that of 
the contemporary Unix sort(1) program [2].

- Sort/Merge II appears to be marketed as a separate product (P/N 
S90-408), whereas sort(1) was/is an integral part of Unix, used 
throughout the system.

- Sort/Merge II provides interactive and batch command input modes; 
sort(1) relies on the shell to support both usages.

- Sort/Merge II appears to be able to also sort binary files; sort(1) 
can only handle text.

- Sort/Merge II can recover from run-time errors by interactively 
prompting for user corrections and additional files.  In Unix this is 
delegated to shell scripts.

- Sort/Merge II has built-in support for tape handling and blocking; 
sort(1) relies on pipes from/to dd(1) for this.

- Sort/Merge II supports user-coded decision subroutines written in 
FORTRAN, COBOL, or CAL.  Sort(1) doesn't have such support to this day. 
One could construct a synthetic key with awk(1) if needed.

- Sort/Merge II can automatically "allocate" its temporary file.  For 
sort(1) file allocation is handled by the Unix kernel.
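
The synthetic-key remark in the list above can be illustrated with a small sketch.  It is written in Python rather than with awk(1) and sort(1), and the key rule (order by surname, ignoring case) and the function names are hypothetical, chosen only for illustration.  The idea is to prefix each record with a computed key, sort the decorated lines as plain text, and then strip the key again.

```python
# Decorate-sort-undecorate: stand in for a user-coded comparison routine by
# computing a synthetic key per record, sorting on it, then discarding it.

def synthetic_key(record: str) -> str:
    """Hypothetical rule: order records by their last word, ignoring case."""
    return record.split()[-1].lower()

def sort_with_synthetic_key(records):
    # Decorate: prefix each record with its key and a tab separator,
    # roughly what an awk one-liner in front of sort(1) could emit.
    decorated = [f"{synthetic_key(r)}\t{r}" for r in records]
    # Sort: plain lexicographic ordering of whole lines.
    decorated.sort()
    # Undecorate: drop the key field again (the job cut -f2- would do).
    return [line.split("\t", 1)[1] for line in decorated]

if __name__ == "__main__":
    for name in sort_with_synthetic_key(
            ["Ken Thompson", "dennis ritchie", "Doug McIlroy"]):
        print(name)
```

The sort step itself stays generic; all record-specific logic lives in the key function, which is the part a FORTRAN, COBOL, or CAL decision subroutine would otherwise supply.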

To me this list is a real-life demonstration of the difference between 
the approach prevalent at the time, a thoughtless agglomeration of 
features into a monolith, and Unix's careful separation of concerns and 
modularization via small tools.  The same contrast appears in a more 
contrived setting in J. Bentley's CACM Programming Pearls column, where 
Doug McIlroy critiques a unique word counting literate program written 
by Don Knuth [3].  (I slightly suspect that the initial program 
specification was a trap set up for Knuth.)

I also think that the design of Perkin-Elmer's Sort/Merge II shows the 
influence of salespeople forcing developers to tack on whatever features 
were required by important customers.  Maybe the clean design of Unix 
owes a lot to AT&T's operation under the 1956 consent decree that 
prevented it from entering the computer market.  This may have shielded 
the system's design from unhealthy market pressures during its critical 
gestation years.


[1] 
https://bitsavers.computerhistory.org/pdf/interdata/32bit/brochures/Sort_Merge_II.pdf
[2] https://s3.amazonaws.com/plan9-bell-labs/7thEdMan/v7vol1.pdf#page=166
[3] https://doi.org/10.1145/5948.315654

Diomidis - https://www.spinellis.gr


* [TUHS] Re: Perkin-Elmer Sort/Merge II vs Unix sort(1)
  2025-01-17 17:23 [TUHS] Perkin-Elmer Sort/Merge II vs Unix sort(1) Diomidis Spinellis
@ 2025-01-17 19:10 ` Bakul Shah via TUHS
  2025-01-17 19:35   ` Marc Rochkind
  2025-01-17 20:07 ` John Levine
  1 sibling, 1 reply; 17+ messages in thread
From: Bakul Shah via TUHS @ 2025-01-17 19:10 UTC (permalink / raw)
  To: Diomidis Spinellis; +Cc: TUHS main list

On Jan 17, 2025, at 9:23 AM, Diomidis Spinellis <dds@aueb.gr> wrote:
> 
> I also think that the design of Perkin-Elmer's Sort/Merge II shows the influence of salespeople forcing developers to tack-on whatever features were required by important customers.  Maybe the clean design of Unix owes a lot to AT&T's operation under the 1956 consent decree that prevented it from entering the computer market.  This may have shielded the system's design from unhealthy market pressures during its critical gestation years.

IIRC sort/merge was/is a pretty major thing on IBM mainframes, with
products from multiple companies. Maybe Perkin-Elmer was trying
to compete with mainframe sort/merge products? Also, I suspect that
for sorting terabytes of data Unix sort likely won't work as fast as
mainframe sorts....


* [TUHS] Re: Perkin-Elmer Sort/Merge II vs Unix sort(1)
  2025-01-17 19:10 ` [TUHS] " Bakul Shah via TUHS
@ 2025-01-17 19:35   ` Marc Rochkind
  2025-01-18 14:51     ` Diomidis Spinellis
  0 siblings, 1 reply; 17+ messages in thread
From: Marc Rochkind @ 2025-01-17 19:35 UTC (permalink / raw)
  To: Bakul Shah; +Cc: TUHS main list


Why did you say "thoughtless agglomeration of features?"

Do you know anything about the design of the P-E S/M, or is it just a biased
guess? Have you ever tried a large external sort with UNIX commands?

Marc



* [TUHS] Re: Perkin-Elmer Sort/Merge II vs Unix sort(1)
  2025-01-17 17:23 [TUHS] Perkin-Elmer Sort/Merge II vs Unix sort(1) Diomidis Spinellis
  2025-01-17 19:10 ` [TUHS] " Bakul Shah via TUHS
@ 2025-01-17 20:07 ` John Levine
  2025-01-18  4:46   ` Dave Horsfall
  1 sibling, 1 reply; 17+ messages in thread
From: John Levine @ 2025-01-17 20:07 UTC (permalink / raw)
  To: tuhs

It appears that Diomidis Spinellis <dds@aueb.gr> said:
>I chanced upon a brochure describing the Perkin-Elmer Series 3200 / 
>(previously Interdata, later Concurrent Computer Corporation) Sort/Merge 
>II utility [1].  It is instructive to compare its design against that of 
>the contemporary Unix sort(1) program [2].

That's not a reasonable comparison. In the 1960s and 1970s computers spent more
time doing sort/merge than anything else, perhaps than everything else. Computer
manufacturers tried really hard to make sorting fast, with clever hacks like
compiling the comparison rules into machine code so they don't have to be
reinterpreted for each record, scheduling their own I/O to keep devices busy,
and reading intermediate tapes backward so they didn't have to rewind between
passes. 

They also handle really big files, tape files that span more than one tape reel
or sometimes disk files that span more than one removable pack. In that era a
tape held about 150MB and a 3330 disk pack was about 100MB. If you had big
files, you had to keep them on tape and that meant a lot of sorting and merging
to do updates. Even on disk, databases were nothing like they are now and what
would now be in a SQL database was more likely in sorted files that were
rewritten periodically with changes merged in.
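
The periodic rewrite of a sorted master file with changes merged in can be sketched as a single sequential pass over two sorted inputs.  This is a minimal Python illustration under stated assumptions, not a description of any particular mainframe utility: the record layout (key, value), the use of None to mean "delete", and the function name are all hypothetical, and real jobs would stream records from tape rather than hold lists in memory.

```python
def apply_updates(master, updates):
    """Merge sorted (key, value) updates into a sorted master in one pass.

    Both inputs must be sorted by key, with at most one update per key.
    An update replaces the master record with the same key; a value of
    None deletes it (a hypothetical convention for this sketch).
    """
    new_master = []
    i = j = 0
    while i < len(master) or j < len(updates):
        master_is_next = j == len(updates) or (
            i < len(master) and master[i][0] < updates[j][0])
        if master_is_next:
            # No pending change for this key: copy the old record through.
            new_master.append(master[i])
            i += 1
        else:
            key, value = updates[j]
            if i < len(master) and master[i][0] == key:
                i += 1  # the update supersedes the old record
            if value is not None:
                new_master.append((key, value))
            j += 1
    return new_master

if __name__ == "__main__":
    old = [(1, "a"), (3, "c"), (5, "e")]
    changes = [(2, "b"), (3, "C"), (5, None)]
    print(apply_updates(old, changes))
```

Because neither input is ever revisited, the whole update needs only sequential reads and writes, which is why it suited tape and why the change file had to be sorted first.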

The P-E sort is a mainframe sort.  Compare it to this IBM DOS VS sort and you'll
see many of the same features, I am sure not by coincidence.

https://bitsavers.computerhistory.org/pdf/ibm/370/DOS_VS/SC33-4044-2_DOS_VS_Sort_Merge_Version_2_Programmers_Guide_Nov79.pdf

The unix sort program is fine for what it does which is sorting toy sized files on
small disks.  There's nothing wrong with that, I still use it all the time, but
other than the name it doesn't have much in common with mainframe sort/merge.


* [TUHS] Re: Perkin-Elmer Sort/Merge II vs Unix sort(1)
  2025-01-17 20:07 ` John Levine
@ 2025-01-18  4:46   ` Dave Horsfall
  0 siblings, 0 replies; 17+ messages in thread
From: Dave Horsfall @ 2025-01-18  4:46 UTC (permalink / raw)
  To: The Eunuchs Hysterical Society

On Sat, 17 Jan 2025, John Levine wrote:

> The unix sort program is fine for what it does which is sorting toy 
> sized files on small disks.  There's nothing wrong with that, I still 
> use it all the time, but other than the name it doesn't have much in 
> common with mainframe sort/merge.

Hands up all those who remember SORMG...

-- Dave


* [TUHS] Re: Perkin-Elmer Sort/Merge II vs Unix sort(1)
  2025-01-17 19:35   ` Marc Rochkind
@ 2025-01-18 14:51     ` Diomidis Spinellis
  2025-01-18 15:16       ` Larry McVoy
  0 siblings, 1 reply; 17+ messages in thread
From: Diomidis Spinellis @ 2025-01-18 14:51 UTC (permalink / raw)
  To: Marc Rochkind, Bakul Shah; +Cc: TUHS main list

I gave specific examples of facilities offered by the Perkin-Elmer 
Sort/Merge (file allocation, blocking, interactive and batch modes) 
that on Unix systems are handled in a way that allows all programs to 
benefit from them.  The Unix way reduces duplication and makes the 
system more versatile by offering the facilities to all programs.

I based my comparison on the documented facilities of the two programs. 
I also have some first hand experience with Perkin-Elmer's OS/32.  In 
the 1990s I was involved in servicing and transferring some 
record-keeping applications from a Perkin-Elmer running OS/32 and 
RELIANCE to a Unix system running Ingres.  I found I was a lot more 
productive in Unix's shell than in Perkin-Elmer's MTM.  (Admittedly, 
this could also be a matter of experience.)

In 2018 I used the Unix sort and join commands to speed up a MariaDB 
relational join of a five billion row table with an 847 million row table 
(108 GB in total) from 380 hours to 12 hours [1], so I'm very happy with 
how Unix sort can handle moderately large data sets.  The GNU version 
will even recursively merge intermediate files when it runs out of file 
descriptors.  Even the Seventh Edition sort would overflow to temporary 
files and merge them [2].
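
The overflow-and-merge behaviour mentioned above can be sketched in a few lines of Python.  This is only an illustration of the general external merge sort idea, not the Seventh Edition or GNU algorithm: the function name and the run_size parameter are hypothetical, and the sketch opens every run at once, so it does not model GNU sort's recursive merging when file descriptors run out.

```python
import heapq
import os
import tempfile

def external_sort(lines, run_size=3):
    """Yield newline-terminated lines in sorted order using bounded memory.

    At most run_size lines are held in memory; each full buffer is sorted
    and spilled to a temporary file (a "run"), and the runs are then merged.
    """
    run_paths = []
    buffer = []

    def spill():
        # Sort the in-memory buffer and write it out as one sorted run.
        buffer.sort()
        fd, path = tempfile.mkstemp(text=True)
        with os.fdopen(fd, "w") as run:
            run.writelines(buffer)
        run_paths.append(path)
        buffer.clear()

    for line in lines:
        buffer.append(line)
        if len(buffer) >= run_size:
            spill()
    if buffer:
        spill()

    runs = [open(path) for path in run_paths]
    try:
        # heapq.merge reads each run lazily, one line at a time.
        yield from heapq.merge(*runs)
    finally:
        for run in runs:
            run.close()
        for path in run_paths:
            os.remove(path)

if __name__ == "__main__":
    data = ["pear\n", "apple\n", "fig\n", "kiwi\n", "date\n"]
    print("".join(external_sort(data, run_size=2)), end="")
```

Memory use is bounded by run_size plus one line per open run, so the size of the data set is limited by temporary file space rather than by memory.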

I'm sure the mainframe sort programs did some pretty amazing things and 
could run circles around the puny 830 line Unix Seventh Edition sort 
program.  The 215 page IBM DOS VS sort documentation that John Levine 
posted here is particularly impressive.  But I can't stop thinking that, 
in common with the mainframes these programs were running on, they 
represent a mindset that has been surpassed by superior ideas.

[1] https://www.spinellis.gr/blog/20180805/
[2] 
https://github.com/dspinellis/unix-history-repo/blob/Research-V7/usr/src/cmd/sort.c#L350

Diomidis

On 17-Jan-25 21:35, Marc Rochkind wrote:
> Why did you say "thoughtless agglomeration of features?"
> 
> Do you know anything about the design of the P-E S/M, or is just a 
> biased guess? Have you ever tried a large external sort with UNIX commands?
> 
> Marc


* [TUHS] Re: Perkin-Elmer Sort/Merge II vs Unix sort(1)
  2025-01-18 14:51     ` Diomidis Spinellis
@ 2025-01-18 15:16       ` Larry McVoy
  2025-01-18 15:40         ` Paul Winalski
  2025-01-18 16:00         ` Bakul Shah via TUHS
  0 siblings, 2 replies; 17+ messages in thread
From: Larry McVoy @ 2025-01-18 15:16 UTC (permalink / raw)
  To: Diomidis Spinellis; +Cc: Marc Rochkind, Bakul Shah, TUHS main list

On Sat, Jan 18, 2025 at 04:51:15PM +0200, Diomidis Spinellis wrote:
> I'm sure the mainframe sort programs did some pretty amazing things and
> could run circles around the puny 830 line Unix Seventh Edition sort
> program.  The 215 page IBM DOS VS sort documentation that John Levine posted
> here is particularly impressive.  But I can't stop thinking that, in common
> with the mainframes these programs were running on, they represent a mindset
> that has been surpassed by superior ideas.

I disagree.  Go back and read the reply where someone was talking about
sorting datasets that spanned multiple tapes, each of which was much
larger than local disk.  sort(1) can't begin to think about handling
something like that.

I have a lot of respect for how Unix does things, if the problem fits
then the Unix answer is more simple, more flexible, it's better.  If
the problem doesn't fit, the Unix answer is awful.

cmd < data | cmd2 | cmd3

is a LOT of data copying.  A custom answer that did all of that in
one address space is a lot more efficient but also a lot more special
purpose.  Unix wins on flexibility and simplicity, special purpose
wins on performance.


* [TUHS] Re: Perkin-Elmer Sort/Merge II vs Unix sort(1)
  2025-01-18 15:16       ` Larry McVoy
@ 2025-01-18 15:40         ` Paul Winalski
  2025-01-18 16:54           ` Marc Rochkind
  2025-01-19  3:45           ` sjenkin
  2025-01-18 16:00         ` Bakul Shah via TUHS
  1 sibling, 2 replies; 17+ messages in thread
From: Paul Winalski @ 2025-01-18 15:40 UTC (permalink / raw)
  To: Larry McVoy; +Cc: TUHS main list


On Sat, Jan 18, 2025 at 10:17 AM Larry McVoy <lm@mcvoy.com> wrote:

> On Sat, Jan 18, 2025 at 04:51:15PM +0200, Diomidis Spinellis wrote:
> > But I can't stop thinking that, in common
> > with the mainframes these programs were running on, they represent a
> mindset
> > that has been surpassed by superior ideas.
>
> I disagree.  Go back and read the reply where someone was talking about
> sorting datasets that spanned multiple tapes, each of which was much
> larger than local disk.  sort(1) can't begin to think about handling
> something like that.
>
> I have a lot of respect for how Unix does things, if the problem fits
> then the Unix answer is more simple, more flexible, it's better.  If
> the problem doesn't fit, the Unix answer is awful.
>
> cmd < data | cmd2 | cmd3
>
> is a LOT of data copying.  A custom answer that did all of that in
> one address space is a lot more efficient but also a lot more special
> purpose.  Unix wins on flexibility and simplicity, special purpose
> wins on performance.
>

Another consideration:  the smaller System/360 mainframes ran DOS (Disk
Operating System) or TOS (Tape Operating System, for shops that didn't have
disks).  These were both single-process operating systems.  There is no way
that the Unix method of chaining programs together could have been done.

OS MFT (Multiprogramming with a Fixed number of Tasks) and MVT
(Multiprogramming with a Variable number of Tasks) were multiprocess
systems, but they lacked any interprocess communication system (such as
Unix pipes).

True databases in those days were rare, expensive, slow, and of limited
capacity.  The usual way to, say, produce a list of customers who owed
money, sorted by how much they owed would be:

[1] scan the data set for customers who owed money and write that out to
tape(s)

[2] use sort/merge to sort the data on tape(s) in the desired order

[3] run a program to print the sorted data in the desired format

It is important in step [2] to keep the tapes moving.  Start/stop
operations waste a ton of time.  Most of the complexity of the mainframe
sort/merge programs was in I/O management to keep the devices busy to the
maximum extent.  The gold standard for sort/merge in the IBM world was a
third-party program called SyncSort.  It cost a fortune but was well worth
it for the big shops.

So the short, bottom line answer is that the Unix way wasn't even possible
on the smaller mainframes and was too inefficient for the large ones.

-Paul W.


* [TUHS] Re: Perkin-Elmer Sort/Merge II vs Unix sort(1)
  2025-01-18 15:16       ` Larry McVoy
  2025-01-18 15:40         ` Paul Winalski
@ 2025-01-18 16:00         ` Bakul Shah via TUHS
  2025-01-18 16:25           ` Tom Lyon
  1 sibling, 1 reply; 17+ messages in thread
From: Bakul Shah via TUHS @ 2025-01-18 16:00 UTC (permalink / raw)
  To: Larry McVoy; +Cc: Marc Rochkind, TUHS main list

On Jan 18, 2025, at 7:16 AM, Larry McVoy <lm@mcvoy.com> wrote:
> 
> On Sat, Jan 18, 2025 at 04:51:15PM +0200, Diomidis Spinellis wrote:
>> I'm sure the mainframe sort programs did some pretty amazing things and
>> could run circles around the puny 830 line Unix Seventh Edition sort
>> program.  The 215 page IBM DOS VS sort documentation that John Levine posted
>> here is particularly impressive.  But I can't stop thinking that, in common
>> with the mainframes these programs were running on, they represent a mindset
>> that has been surpassed by superior ideas.
> 
> I disagree.  Go back and read the reply where someone was talking about
> sorting datasets that spanned multiple tapes, each of which was much
> larger than local disk.  sort(1) can't begin to think about handling
> something like that.
> 
> I have a lot of respect for how Unix does things, if the problem fits
> then the Unix answer is more simple, more flexible, it's better.  If
> the problem doesn't fit, the Unix answer is awful.
> 
> cmd < data | cmd2 | cmd3
> 
> is a LOT of data copying.  A custom answer that did all of that in
> one address space is a lot more efficient but also a lot more special
> purpose.  Unix wins on flexibility and simplicity, special purpose
> wins on performance.

Mainframes had usage based pricing, not unlike what you pay for renting
resources in the cloud, so performance really mattered. Also note that
users use whatever computing resources they have available to get their
job done, ideally at the lowest cost. Elegance of any OS architecture
is secondary, if that.



* [TUHS] Re: Perkin-Elmer Sort/Merge II vs Unix sort(1)
  2025-01-18 16:00         ` Bakul Shah via TUHS
@ 2025-01-18 16:25           ` Tom Lyon
  2025-01-18 17:07             ` ron minnich
  0 siblings, 1 reply; 17+ messages in thread
From: Tom Lyon @ 2025-01-18 16:25 UTC (permalink / raw)
  To: TUHS main list


Related to the sort discussion, there's an oral history of Duane Whitlow,
founder of SyncSort, which was a big deal in IBM shops in the 70s. (and
perhaps later; I lost track)
https://archive.computerhistory.org/resources/access/text/2013/05/102702251-05-01-acc.pdf



* [TUHS] Re: Perkin-Elmer Sort/Merge II vs Unix sort(1)
  2025-01-18 15:40         ` Paul Winalski
@ 2025-01-18 16:54           ` Marc Rochkind
  2025-01-19  3:45           ` sjenkin
  1 sibling, 0 replies; 17+ messages in thread
From: Marc Rochkind @ 2025-01-18 16:54 UTC (permalink / raw)
  Cc: TUHS main list


Another problem with arrangements of small UNIX commands in pipelines is
that the actual arrangement in use suffers from reliability and usability
problems:

1. No way to test the whole, since in general each application has a unique
structure with a potentially different choice of components. (A shell
program executes whatever commands are on the system, not those it might
have been tested with.)
2. No comprehensive error reporting (at best, reporting from individual
commands), and
3. No way to provide support.

On a much smaller scale, imagine a component stereo setup that is
delivering bad sound. You have a turntable, an arm, a cartridge, a pre-amp,
an amp, speakers, and cables and wires, typically from seven or more
different manufacturers. Not one of them would be able to help you with
support. The dealer would, if you bought the whole lot from them. Or you
could pay a consultant. This is one reason why in the 1960s so-called
console stereos were popular. Generally, console stereos delivered inferior
sound.

This isn't a criticism of sorting with UNIX commands, it's a broader
criticism of the UNIX software tools approach for serious application
development.

Of course, one could build a single system out of components, and package
it all together as a tested and supported product. That's exactly what
object-oriented programming does, and very successfully.

Marc


-- 
Subscribe to my Photo-of-the-Week emails at my website mrochkind.com.


* [TUHS] Re: Perkin-Elmer Sort/Merge II vs Unix sort(1)
  2025-01-18 16:25           ` Tom Lyon
@ 2025-01-18 17:07             ` ron minnich
  2025-01-18 19:39               ` Marc Rochkind
  0 siblings, 1 reply; 17+ messages in thread
From: ron minnich @ 2025-01-18 17:07 UTC (permalink / raw)
  To: Tom Lyon; +Cc: TUHS main list


I checked and SyncSort is still out there, doing their thing. Fifty years
of sorting! Sort of amazing.



* [TUHS] Re: Perkin-Elmer Sort/Merge II vs Unix sort(1)
  2025-01-18 17:07             ` ron minnich
@ 2025-01-18 19:39               ` Marc Rochkind
  0 siblings, 0 replies; 17+ messages in thread
From: Marc Rochkind @ 2025-01-18 19:39 UTC (permalink / raw)
  To: ron minnich; +Cc: TUHS main list


On Sat, Jan 18, 2025 at 12:30 PM ron minnich <rminnich@gmail.com> wrote:

> I checked and syncsort is still  out there, doing their thing. Fifty years
> of sorting! Sort of amazing.
>

You mean the product has been on the market that long, or that a sort is
still running? ;-)

Marc

-- 
Subscribe to my Photo-of-the-Week emails at my website mrochkind.com.


* [TUHS] Re: Perkin-Elmer Sort/Merge II vs Unix sort(1)
  2025-01-18 15:40         ` Paul Winalski
  2025-01-18 16:54           ` Marc Rochkind
@ 2025-01-19  3:45           ` sjenkin
  1 sibling, 0 replies; 17+ messages in thread
From: sjenkin @ 2025-01-19  3:45 UTC (permalink / raw)
  To: TUHS

I’d like to challenge the “Big Iron” hypothesis, having worked with IBM/370 systems early on: DOS-VS, VM/CMS and some OS/MVS.
The system design and standard tools forced considerable complexity & waste in CPU time & storage compared to the Unix I'd used at UNSW.

Probably the harshest criticism is the lack of O/S & tool development forced by IBM’s “backwards compatibility” model - at least while I had to battle it.

 [ Ken Robinson at UNSW had used OS/360 since ~1965. In 1975 he warned me about a pernicious batch job error message, ]
 [  “No space” - except it didn’t say on _which_ ‘DD’ (data definition == file). The O/S _knew_ exactly what was wrong, but didn’t say.]
 [ I hit this problem at work ~1985, costing me a week or two of time, plus considerable ‘chargeback’ expenses for wasted CPU & disk usage. ]
 [ The problem would have been a trivial one if I’d had Unix pipelines available. ]

Just because mainframes are still used for the majority of business critical online “transaction” systems, doesn’t mean they are great, even good, solutions.
It only means the “cost of exit” is more than the owners wish to pay; it’s cheaper to keep old designs running than to change.

To achieve the perceived ‘high performance’ of mainframes required considerable SysProg, programmer/analyst & Operations work/ time.
Simple things such as the optimum ‘block size’ for a particular disk drive caused months of work for our operations team when we changed drives.
(2314 removable to 3350 sealed HDA’s)

Andrew Hume’s “Project Gecko” is worth reading for those who don’t know it.
I’m sure that if Andrew & team had tried to build a similar system a decade before, they’d have figured out a way to stream data between tape drives,
the initial use-case for the ‘SyncSort’ discussed.

Andrew used the standard Unix tools, a small amount of C, flat files and intelligent ’streaming processing’ from one disk to another, then back,
to push a SUN system to its limits, and handsomely beat Oracle.

We’ve already had the Knuth / McIlroy ‘literate programming’ vs ’shell one-liner’ example in this thread.

It comes down to the same thing:

	Unix’s philosophy is good design and “Tools to Build Tools”,
	allowing everyone to Stand On the Shoulders of Giants, 
	not _have_ to endlessly reinvent the wheel for themselves, 
	which the mainframe world forces on everyone.

============

Gecko: tracking a very large billing system
	Andrew Hume, Scott Daniels, Angus MacLellan
	2000
	<https://www.usenix.org/legacy/event/usenix2000/general/full_papers/hume/hume.pdf>

============

> On 19 Jan 2025, at 02:40, Paul Winalski <paul.winalski@gmail.com> wrote:
> 
> Another consideration:  the smaller System/360 mainframes ran DOS (Disk Operating System) or TOS (Tape Operating System, for shops that didn't have disks).  These were both single-process operating systems.  There is no way that the Unix method of chaining programs together could have been done.
> 
> OS MFT (Multiprogramming with a Fixed number of Tasks) and MVT (Multiprogramming with a Variable number of Tasks) were multiprocess systems, but they lacked any interprocess communication system (such as Unix pipes).
> 
> True databases in those days were rare, expensive, slow, and of limited capacity.  The usual way to, say, produce a list of customers who owed money, sorted by how much they owed would be:
> 
> [1] scan the data set for customers who owed money and write that out to tape(s)
> 
> [2] use sort/merge to sort the data on tape(s) in the desired order
> 
> [3] run a program to print the sorted data in the desired format
> 
> It is important in step [2] to keep the tapes moving.  Start/stop operations waste a ton of time.  Most of the complexity of the mainframe sort/merge programs was in I/O management to keep the devices busy to the maximum extent.  The gold standard for sort/merge in the IBM world was a third-party program called SyncSort.  It cost a fortune but was well worth it for the big shops.
> 
> So the short, bottom line answer is that the Unix way wasn't even possible on the smaller mainframes and was too inefficient for the large ones.
> 
> -Paul W.
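
For contrast, here is a sketch of the same three steps as a single Unix pipeline on a system that does have pipes. The customer file, its colon-separated name:balance layout, and the report format are all assumptions made up for this example:

```shell
# Hypothetical data: name:balance, one customer per line.
tmp=$(mktemp -d)
cat > "$tmp/customers" <<'EOF'
alice:0
bob:250
carol:75
dave:1200
EOF

# [1] scan for customers who owe money,
# [2] sort by amount owed, largest first,
# [3] format the report.
# No intermediate tapes or files; the pipes carry the data between steps.
report=$(awk -F: '$2 > 0' "$tmp/customers" |
    sort -t: -k2,2nr |
    awk -F: '{ printf "%-10s %8d\n", $1, $2 }')
printf '%s\n' "$report"
rm -rf "$tmp"
```

Each stage runs as its own process, which is exactly what the single-process systems described above could not offer.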

============

Gecko: tracking a very large billing system
	Andrew Hume, Scott Daniels, Angus MacLellan
	1999/2000
	<https://www.usenix.org/legacy/event/usenix2000/general/full_papers/hume/hume.pdf>

	This paper describes Gecko, a system for tracking the state of every call in a very large billing system,
	 which uses sorted flat files to implement a database of about 60G records occupying 2.6TB.

	After a team at Research, including two interns from Consumer Billing, built a successful prototype in 1996, 
	the decision was made to build a production version. 
	A team of six people (within Consumer Billing) started in March 1997 and the system went live in December 1997.

	The design we implemented to solve the database problem does not use conventional database technology; 
	as described in [Hum99], we experimented with an Oracle-based implementation, but it was unsatisfactory.

	Instead, we used sorted flat files and relied on the speed and I/O capacity of modern high-end Unix systems, such as large SGI and Sun systems.

	The system supporting the datastore is a Sun E10000, with 32 processors and 6GB of memory, running Solaris 2.6. 
	The datastore disk storage is provided by 16 A3000 (formerly RSM2000) RAID cabinets, 
		which provides about 3.6TB of RAID-5 disk storage. 
	For backup purposes, we have a StorageTek 9310 Powderhorn tape silo with 8 Redwood tape drives.

	The datastore is organised as 93 filesystems, each with 52 directories; each directory contains a partition of the datastore…

	We can characterise Gecko’s performance by two measures. 
		The first is how long it takes to achieve the report and cycle end gates. 
		The second is how fast we can scan the datastore performing an ad hoc search/extract.

	Over the last 12 cycles, the report gate ranged between 6.1 and 9.9 wall clock hours, with an average time of 7.6 hours. 

	The cycle end gate is reached after the updated datastore has been backed up and any other housekeeping chores have been completed. 
	Over the last 12 cycles, the cycle end gate ranged between 11.1 and 15.1 wall clock hours, 
		with an average time of 11.5 hours. 
	Both these averages comfortably beat the original requirements.

	The implementation of Gecko relies heavily on a modest number of tools in the implementation of its processing and the management of that processing. 
	Nearly all of these have application beyond Gecko and so we describe them here.

	Most of the code is written in C and ksh; the remainder is in awk.

	The Gecko scripts make extensive use of grep, and in particular, fgrep for searching for many fixed strings in a file. 
	Solaris’s fgrep has an unacceptably low limit on the number of strings (we routinely search for 5-6000 strings, and sometimes 20000 or so). 

	The XPG4 version has much higher limits, but runs unacceptably slowly with large lists. 

	We finally switched to gre, developed by Andrew Hume in 1986. 
	For our larger lists, it runs about 200 times faster, cutting run times from 45 minutes down to 15 seconds or so.
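
For readers who have not used fgrep this way, a minimal sketch of searching for many fixed strings at once follows. It uses grep -F (the current spelling of fgrep) with -f to read the strings from a file; the file names and contents are invented, and it says nothing about how gre achieves its speed:

```shell
# Hypothetical inputs: a file of fixed strings and a file to search.
dir=$(mktemp -d)
printf '555-0101\n555-0199\n' > "$dir/strings"
printf 'call 555-0101 ok\ncall 555-0142 ok\ncall 555-0199 dropped\n' > "$dir/calls"

# -F treats every pattern as a literal string; -f reads one pattern per line.
# A line is printed if it contains any of the strings.
hits=$(grep -F -f "$dir/strings" "$dir/calls")
printf '%s\n' "$hits"
rm -rf "$dir"
```

In the paper's setting the strings file would hold thousands of entries rather than two.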

============

--
Steve Jenkin, IT Systems and Design 
0412 786 915 (+61 412 786 915)
PO Box 38, Kippax ACT 2615, AUSTRALIA

mailto:sjenkin@canb.auug.org.au http://members.tip.net.au/~sjenkin



* [TUHS] Re: Perkin-Elmer Sort/Merge II vs Unix sort(1)
@ 2025-01-21 21:53 Douglas McIlroy
  0 siblings, 0 replies; 17+ messages in thread
From: Douglas McIlroy @ 2025-01-21 21:53 UTC (permalink / raw)
  To: TUHS main list

All-in-one vs pipelined sorts brought to mind NSA's undeservedly obscure
dataflow language, POGOL, https://doi.org/10.1145/512927.512948 (POPL
1973). In POGOL one wrote programs as collections of routines that
communicated via named files, which the compiler did its best to optimize
away. Often this amounted to loop jamming or to the distributive law for
map over function composition. POGOL could, however, handle general
dataflow programming including feedback loops.

One can imagine a program for pulling the POGOL trick on a shell pipeline.
That could accomplish--at negligible cost--the conversion of a cheap demo
into a genuine candidate for intensive production use.
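
A hand-worked instance of that trick, as a sketch only: two adjacent sed stages are each a map over lines, so they can be jammed into one process that gives the same result. No automatic optimizer is implied here, and the input string is made up:

```shell
input='Hello World'

# Two processes connected by a pipe; every line is copied between them.
piped=$(printf '%s\n' "$input" | sed 's/World/Unix/' | sed 's/Hello/Goodbye/')

# The same two maps composed inside one process: no pipe, one pass over the data.
fused=$(printf '%s\n' "$input" | sed -e 's/World/Unix/' -e 's/Hello/Goodbye/')

printf '%s\n%s\n' "$piped" "$fused"
```

This is map f over map g rewritten as map of (f composed with g); a tool doing it mechanically would have to prove that each stage really is a per-line map.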

This consideration spurs another thought. Despite Unix's claim to build
tools to make tools, only a relatively narrow scope of higher-order tools
that take programs as data ever arose. After the bootstrapping B, there
were a number of compilers, most notably C, plus f77, bc, ratfor, and
struct. A slight variant on the idea of compiling was the suite of troff
preprocessors.

The shell also manipulates programs by composing them into larger programs.

Aside from such examples, only one other category of higher-order Unix
program comes to mind: Peter Weinberger's lcomp for instrumenting C
programs with instruction counts.

An offshoot of Unix were Gerard Holzmann's tools for extracting
model-checker models from C programs. These saw use at Indian Hill and most
notably at JPL, but never appeared among mainstream Unix offerings. Similar
tools exist in-house at Microsoft and elsewhere. But generally speaking we
have very few kinds of programs that manipulate programs.

What are the prospects for computer science advancing to a stage where
higher-level programs become commonplace? What might be in one's standard
vocabulary of functions that operate on programs?

Doug


* [TUHS] Re: Perkin-Elmer Sort/Merge II vs Unix sort(1)
  2025-01-17 18:12 Douglas McIlroy
@ 2025-01-18  4:29 ` G. Branden Robinson
  0 siblings, 0 replies; 17+ messages in thread
From: G. Branden Robinson @ 2025-01-18  4:29 UTC (permalink / raw)
  To: TUHS main list

At 2025-01-17T13:12:23-0500, Douglas McIlroy wrote:
> It wasn't a setup. Although Jon's introduction seems to imply that he
> had invited both Don and me to participate, I actually was moved to
> write the critique when I proofread the 2-author column, as I did for
> many of Jon's Programming Pearls. That led to the 3-author
> arrangement. Knuth and I are still friends; he even reprinted the
> critique. It is also memorably depicted at
> https://comic.browserling.com/tag/douglas-mcilroy.

Can an episode of Epic Rap Battles of History be far behind?

Regards,
Branden


* [TUHS] Re: Perkin-Elmer Sort/Merge II vs Unix sort(1)
@ 2025-01-17 18:12 Douglas McIlroy
  2025-01-18  4:29 ` G. Branden Robinson
  0 siblings, 1 reply; 17+ messages in thread
From: Douglas McIlroy @ 2025-01-17 18:12 UTC (permalink / raw)
  To: TUHS main list

> To me this list is a real-life demonstration of the differences between
> the, prevalent at the time, thoughtless agglomeration of features into a
> monolith approach against Unix's careful separation of concerns and
> modularization via small tools.  The same contrast appears in a more
> contrived setting in J. Bentley's CACM Programming Pearl's column where
> Doug McIlroy critiques a unique word counting literate program written
> by Don Knuth [3].  (I slightly suspect that the initial program
> specification was a trap set up for Knuth.)

It wasn't a setup. Although Jon's introduction seems to imply that he had
invited both Don and me to participate, I actually was moved to write the
critique when I proofread the 2-author column, as I did for many of Jon's
Programming Pearls. That led to the 3-author arrangement. Knuth and
I are still friends; he even reprinted the critique. It is also memorably
depicted at https://comic.browserling.com/tag/douglas-mcilroy.

Doug


Thread overview: 17+ messages
2025-01-17 17:23 [TUHS] Perkin-Elmer Sort/Merge II vs Unix sort(1) Diomidis Spinellis
2025-01-17 19:10 ` [TUHS] " Bakul Shah via TUHS
2025-01-17 19:35   ` Marc Rochkind
2025-01-18 14:51     ` Diomidis Spinellis
2025-01-18 15:16       ` Larry McVoy
2025-01-18 15:40         ` Paul Winalski
2025-01-18 16:54           ` Marc Rochkind
2025-01-19  3:45           ` sjenkin
2025-01-18 16:00         ` Bakul Shah via TUHS
2025-01-18 16:25           ` Tom Lyon
2025-01-18 17:07             ` ron minnich
2025-01-18 19:39               ` Marc Rochkind
2025-01-17 20:07 ` John Levine
2025-01-18  4:46   ` Dave Horsfall
2025-01-17 18:12 Douglas McIlroy
2025-01-18  4:29 ` G. Branden Robinson
2025-01-21 21:53 Douglas McIlroy
