The Unix Heritage Society mailing list
* [TUHS] Re: Nice video with Brian Kernighan
@ 2022-08-20 15:48 Douglas McIlroy
  2022-08-20 16:17 ` Clem Cole
  2022-08-21  9:36 ` Mohamed Akram
  0 siblings, 2 replies; 10+ messages in thread
From: Douglas McIlroy @ 2022-08-20 15:48 UTC (permalink / raw)
  To: TUHS main list

Brian's tribute to the brilliant regex mechanism that awk borrowed
from egrep spurred memories.

For more than forty years I claimed credit for stimulating Ken to
liberate grep from ed. Then, thanks to TUHS, I learned that I had
merely caused Ken to spring from the closet a program he had already
made for his own use.

There's a related story for egrep. Al Aho made a deterministic
regular-expression recognizer as a faster replacement for the
non-deterministic recognizer in grep. He also extended the domain of
patterns to full regular expressions, including alternation; thus the
"e" in egrep.

About the same time, I built on Norm Shryer's personal calendar
utility. I wanted to generalize Norm's strict syntax for dates to
cover most any (American) representation of dates, and to warn about
tomorrow's calendar as well as today's--where "tomorrow" could extend
across a weekend or holiday.

Egrep was just the tool I needed for picking the dates out of a
free-form calendar file. I wrote a little program that built an egrep
pattern based on today's date. The following mouthful for Saturday,
August 20 covers Sunday and Monday, too. (Note that, in egrep, newline
is a synonym for |, the alternation operator.)

        (^|[ (,;])(([Aa]ug[^ ]* *|(08|8)/)0*20)([^0123456789]|$)
        (^|[ (,;])(([Aa]ug[^ ]* *|(08|8)/)0*21)([^0123456789]|$)
        (^|[ (,;])(([Aa]ug[^ ]* *|(08|8)/)0*22)([^0123456789]|$)
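
A minimal sketch of such a pattern generator, written in Python purely for illustration; it is not the original program. The function names are hypothetical, and the "next working day" rule below skips only weekends and ignores holidays, which is an assumption.

```python
import datetime

# Lowercase three-letter month abbreviations; the pattern accepts either case
# for the first letter and any non-space suffix (e.g. "Aug", "aug", "August").
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def days_to_cover(today):
    """Today plus every following day up to and including the next weekday.

    Assumption: only weekends are skipped; holidays are ignored here.
    """
    days = [today]
    while True:
        nxt = days[-1] + datetime.timedelta(days=1)
        days.append(nxt)
        if nxt.weekday() < 5:  # Monday=0 ... Friday=4
            return days


def day_pattern(d):
    """One alternative in the style of the patterns quoted above."""
    abbr = MONTHS[d.month - 1]
    name = f"[{abbr[0].upper()}{abbr[0]}]{abbr[1:]}[^ ]* *"
    number = f"(0{d.month}|{d.month})/" if d.month < 10 else f"{d.month}/"
    return f"(^|[ (,;])(({name}|{number})0*{d.day})([^0123456789]|$)"


def build_pattern(today):
    """Join the alternatives with newlines, which egrep treats like |."""
    return "\n".join(day_pattern(d) for d in days_to_cover(today))


if __name__ == "__main__":
    print(build_pattern(datetime.date.today()))
```

For Saturday, 2022-08-20, build_pattern yields the three lines quoted above.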

It worked like a charm, except that it took a good part of a minute to
handle even a tiny calendar file. The reason: the state count of the
deterministic automaton was exponentially larger than the
regular expression; and egrep had to build the automaton before it
could run it. Al was mortified that an early serious use of egrep
should be such a turkey.

But Al was undaunted. He replaced the automaton construction with an
equivalent lazy algorithm that constructed a state only when the
recognizer was about to visit it. This made egrep into the brilliant
tool that Brian praised.
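
The lazy idea can be sketched as follows. This is an illustration of on-demand subset construction under stated assumptions, not Al's actual code: the class name, the transition-table representation, and the example automaton ("the k-th symbol from the end is 'a'", a textbook case whose full deterministic automaton needs 2^k states) are all made up for the example.

```python
class LazyDFA:
    """Simulate an NFA, building each deterministic state only when visited."""

    def __init__(self, transitions, start, accepting):
        # transitions: dict mapping (nfa_state, symbol) -> set of nfa_states
        self.transitions = transitions
        self.start = frozenset([start])
        self.accepting = frozenset(accepting)
        self.cache = {}  # (dfa_state, symbol) -> dfa_state, filled on demand

    def step(self, state, symbol):
        key = (state, symbol)
        if key not in self.cache:
            # First visit: compute the successor set once and remember it.
            nxt = set()
            for s in state:
                nxt |= self.transitions.get((s, symbol), set())
            self.cache[key] = frozenset(nxt)
        return self.cache[key]

    def matches(self, text):
        state = self.start
        for ch in text:
            state = self.step(state, ch)
        return bool(state & self.accepting)

    def states_built(self):
        return len({self.start} | set(self.cache.values()))


def kth_from_end_nfa(k):
    """NFA over {a, b} accepting strings whose k-th symbol from the end is 'a'."""
    t = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, k):
        t[(i, "a")] = {i + 1}
        t[(i, "b")] = {i + 1}
    return t, 0, {k}


if __name__ == "__main__":
    dfa = LazyDFA(*kth_from_end_nfa(10))
    print(dfa.matches("a" + "b" * 9), dfa.states_built())
```

An eager construction for k = 10 would build all 1024 deterministic states before reading any input; the lazy version builds only the states that the input actually reaches.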

What I don't know is whether the calendar program stimulated the idea
of lazy implementation, or whether Al, like Ken before him with grep,
already had the idea up his sleeve.

Doug

* [TUHS] Re: Nice video with Brian Kernighan
  2022-08-20 15:48 [TUHS] Re: Nice video with Brian Kernighan Douglas McIlroy
@ 2022-08-20 16:17 ` Clem Cole
  2022-08-20 18:24   ` Rich Morin
  2022-08-21  9:36 ` Mohamed Akram
  1 sibling, 1 reply; 10+ messages in thread
From: Clem Cole @ 2022-08-20 16:17 UTC (permalink / raw)
  To: Douglas McIlroy; +Cc: TUHS main list

Doug,

No matter.  I have often thought about what an amazing muse you were to so
many people regarding so many different ideas that have panned out.  It
seems to me that you were always there at the right time.  Your powers to
get the best from everyone around you are unlike those of anyone else I have
ever been lucky enough to meet.  I have also often told students that the
one really hard thing to teach directly, but which you can learn by looking
at the people who came before you, is 'good taste.'  We all owe you as much
thanks for being there and inspiring your peers as for their brilliance in
implementing the concepts with style and taste.

Clem

* [TUHS] Re: Nice video with Brian Kernighan
  2022-08-20 16:17 ` Clem Cole
@ 2022-08-20 18:24   ` Rich Morin
  2022-08-21 15:09     ` John Cowan
  0 siblings, 1 reply; 10+ messages in thread
From: Rich Morin @ 2022-08-20 18:24 UTC (permalink / raw)
  To: TUHS main list

After watching the video, I remain curious about a couple of things.

Q: What were Brian's main contributions to AWK? (aside from the book :-)

Q: Where did the idea for AWK originate?

FWIW, my spouse (Vicki Brown) used AWK to support her Master's thesis.  She:

- defined a common, human-friendly data format
- used AWK to convert it for submission to IBM and Univac programs
- used AWK to boil down the output (printer plots of dendrograms)
- used AWK to convert the data for use with my SunCore interpreter

This all worked very well, but some real pain was involved when her advisor asked her to convert her scripts to Fortran...

-r



* [TUHS] Re: Nice video with Brian Kernighan
  2022-08-20 15:48 [TUHS] Re: Nice video with Brian Kernighan Douglas McIlroy
  2022-08-20 16:17 ` Clem Cole
@ 2022-08-21  9:36 ` Mohamed Akram
  1 sibling, 0 replies; 10+ messages in thread
From: Mohamed Akram @ 2022-08-21  9:36 UTC (permalink / raw)
  To: Douglas McIlroy; +Cc: TUHS main list

Hi folks,

This is my first time posting on this list; it’s been such a joy to read about so many little-known yet enduring and consequential aspects of UNIX. I had written a short post [1] about the calendar utility some time ago, with a brief glimpse at its history. Seeing its implementation certainly made me scratch my head a bit: who would think to create a program whose sole purpose was to dynamically generate a regular expression that would then be fed into another program? (It doesn’t stop there either, as my post goes into.) I found it to be perhaps the most illustrative and comprehensive example of UNIX composition that I had come across. I was unaware that it was Douglas McIlroy who had written this program, which in hindsight should not come as a surprise at all, him being the exemplar of composing simple, orthogonal, yet robust tools to get a job done quickly and efficiently.

Thank you Doug for your reply, I thoroughly enjoyed learning more about the origins and history of the calendar program. That it was the impetus to turn egrep into the performant and viable tool that we know today further colors the picture of this unassuming utility.

[1] https://akr.am/blog/posts/today-in-history-brought-to-you-by-unix

Regards,
Mohamed

* [TUHS] Re: Nice video with Brian Kernighan
  2022-08-20 18:24   ` Rich Morin
@ 2022-08-21 15:09     ` John Cowan
  2022-08-21 15:41       ` Clem Cole
  2022-08-21 20:07       ` Rich Morin
  0 siblings, 2 replies; 10+ messages in thread
From: John Cowan @ 2022-08-21 15:09 UTC (permalink / raw)
  To: Rich Morin; +Cc: TUHS main list

On Sat, Aug 20, 2022 at 2:26 PM Rich Morin <rdm@cfcl.com> wrote:


> - used AWK to convert the data for use with my SunCore interpreter
>

What is this SunCore of which you speak?  Dr. Google reports too many
confounds.

> This all worked very well, but some real pain was involved when her
> advisor asked her to convert her scripts to Fortran...
>

Nowadays, of course, awk is actually more readily available than Fortran.

* [TUHS] Re: Nice video with Brian Kernighan
  2022-08-21 15:09     ` John Cowan
@ 2022-08-21 15:41       ` Clem Cole
  2022-08-22  2:49         ` John Cowan
  2022-08-21 20:07       ` Rich Morin
  1 sibling, 1 reply; 10+ messages in thread
From: Clem Cole @ 2022-08-21 15:41 UTC (permalink / raw)
  To: John Cowan; +Cc: TUHS main list

On Sun, Aug 21, 2022 at 11:11 AM John Cowan <cowan@ccil.org> wrote:

> Nowadays, of course, awk is actually more readily available than Fortran.
>

Beware of a statement/thinking like that.  While you and I might not
program in it, I can show you some interesting usage graphs.  Simply put,
over 90% of all supercomputer cycles are still Fortran (why? because the
math has not changed - *a.k.a.* Cole's law).  Plus, Fortran 2018 is not the
language Rich and I learned in the 1960s and 1970s.  Also remember that
there are multiple extremely good commercial (production-quality)
Fortran 2018 implementations that are freely available for download for
everything from Windows to Linux to macOS [as I like to say - I don't
program in it, but FTN has paid my salary for pretty much my entire 55+
years in the biz, and I make damned sure my OS and my systems run programs
compiled with it really well].  If you are interested, here is a pointer to
the Intel one: HPC Toolkit Download
<https://streaklinks.com/BK93i3_EhBDjUuXGPgr-7jLH/https%3A%2F%2Fwww.intel.com%2Fcontent%2Fwww%2Fus%2Fen%2Fdeveloper%2Ftools%2Foneapi%2Fhpc-toolkit-download.html>
which has the DNA from the old DEC compilers ground up and injected into
it, BTW [note you will need to download the free C/C++ compiler too, which
contains the runtime libraries that Fortran uses and shares].  While it's
Fortran 2018, it will even compile 'dusty deck' Fortran IV - fixed format
too.  Programs like Adventure 'just work' (and are actually part of the test
suite).  FWIW: I believe the Portland Group's compilers were/are also
freely available, and maybe IBM's also, but I have not tried to get them in
a few years.

BTW:  I have a young Mech E professor friend teaching/doing research @ an
infamous engineering school here in the Boston area.  He got his PhD about
5-6 years ago at another infamous school in the midwest.  What are all his
students using for their research? (which is thermal properties of
materials - trying to get the heat out of our Si so we can run chips faster
without them melting). It is all Fortran, with a little bit of NumPy
(running on their Macs) to prep the data, but anything that matters runs on
the clusters and is in Fortran.

* [TUHS] Re: Nice video with Brian Kernighan
  2022-08-21 15:09     ` John Cowan
  2022-08-21 15:41       ` Clem Cole
@ 2022-08-21 20:07       ` Rich Morin
  1 sibling, 0 replies; 10+ messages in thread
From: Rich Morin @ 2022-08-21 20:07 UTC (permalink / raw)
  To: TUHS main list

> On Aug 21, 2022, at 08:09, John Cowan <cowan@ccil.org> wrote:
> 
> On Sat, Aug 20, 2022 at 2:26 PM Rich Morin <rdm@cfcl.com> wrote:
>  
> - used AWK to convert the data for use with my SunCore interpreter
> 
> What is this SunCore of which you speak?  Dr. Google reports too many confounds. ...

I was able to find some web mentions of the relevant SunCore.  I've put a set of links below, which others may well be able to improve upon.  Anyway, the SunCore Graphics Package shipped with early versions of SunOS. It was a set of C libraries which allowed programs to draw on the bitmapped display.  My interpreter read simple text commands (eg, "fn_name arg_1 ..."), parsed them, and made the specified library calls.
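
For illustration only, the command loop described above might look like the following sketch. It is in Python rather than C, the command names move_to and line_to are hypothetical stand-ins (not actual SunCore routines), and treating every argument as a number is an assumption.

```python
def run_commands(lines, library):
    """Parse "fn_name arg_1 ..." lines and call the named library function."""
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name, *args = fields
        func = library.get(name)
        if func is None:
            raise ValueError(f"line {lineno}: unknown command {name!r}")
        # Assumption: all arguments are numeric coordinates.
        func(*(float(a) for a in args))


# Hypothetical stand-ins for drawing routines: they only record the calls.
calls = []
library = {
    "move_to": lambda x, y: calls.append(("move_to", x, y)),
    "line_to": lambda x, y: calls.append(("line_to", x, y)),
}

run_commands(["move_to 0 0", "", "line_to 10 5"], library)
```

Keeping the dispatch in a name-to-function table means a generator such as an AWK script only has to emit one text line per library call.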

-r

P.S. For the curious...

The dendrogram plotting software, which ran on U of MD's IBM and Univac mainframes, generated line printer plot files.  These used characters such as dashes and vertical bars to draw the dendrogram "trees".  So, Vicki's code needed to scan the files, extract the shape of each tree, and generate plotting commands for my interpreter.

The production process for that part of Vicki's thesis was roughly as follows:

- hand-code data files in a common, human-friendly format (vi)
- convert into formats for the IBM and Univac software (AWK)
- upload and process the files, then download the results
- analyze the line printer plot files of dendrograms (AWK)
- generate commands for the SunCore interpreter (AWK)
- run the interpreter, generating diagrams on the display
- dump bitmap images of the displayed diagrams
- print the images, using a dot-matrix printer

The text portion of the thesis was generated using a different tool chain:

- create and/or edit the thesis text (vi)
- format the text for printing (nroff)
- print on an IBM I/O Selectric (Datel 30)

Printing on the Datel 30 was complicated by several factors.  It wanted BCDIC correspondence code, rather than ASCII.  Also, it needed null characters to provide enough time for various activities (eg, print ball rotation, carriage returns, line feeds).  And, given that paper feeding was a manual process, we needed a way to initiate printing of a new page, reprint botched pages, etc.  So, I wrote a small utility program that handled all of this.

# Links

https://en.wikipedia.org/wiki/Dendrogram

http://vtda.org/docs/computing/Sun/software/800-1115-01%20-%20SunOS%201.1%20Programmer's%20Reference%20Manual%20for%20SunCore.pdf

http://vtda.org/docs/computing/Sun/software/800-1787-10_SunCoreReferenceManual_RevA_9May88.pdf

http://www-lehre.inf.uos.de/~sp/Man/_Man_SunOS_4.1.3_html/html6/suncoredemos.6.html



* [TUHS] Re: Nice video with Brian Kernighan
  2022-08-21 15:41       ` Clem Cole
@ 2022-08-22  2:49         ` John Cowan
  0 siblings, 0 replies; 10+ messages in thread
From: John Cowan @ 2022-08-22  2:49 UTC (permalink / raw)
  To: Clem Cole; +Cc: TUHS main list

On Sun, Aug 21, 2022 at 11:41 AM Clem Cole <clemc@ccc.com> wrote:


> On Sun, Aug 21, 2022 at 11:11 AM John Cowan <cowan@ccil.org> wrote:
>
>> Nowadays, of course, awk is actually more readily available than Fortran.
>
> Beware of a statement/thinking like that.
>

What I meant was that if you walk up to a (non-Windows) computer and type
`awk`, it starts up (and prints a help message, if it's gawk).  But if you
type f90 or whatever, you probably get "f90: command not found".  That's
not to say that you can't install it easily enough if you have root, but it
does mean Fortran is less widely available.

* [TUHS] Re: Nice video with Brian Kernighan
  2022-08-18 21:58 ` [TUHS] " Clem Cole
@ 2022-08-18 22:48   ` Pete Wright via TUHS
  0 siblings, 0 replies; 10+ messages in thread
From: Pete Wright via TUHS @ 2022-08-18 22:48 UTC (permalink / raw)
  To: tuhs


On 8/18/22 14:58, Clem Cole wrote:
> Thanks Arnold -- always fun.

agreed!  i shared this with my data engineering team - i hope historical 
context like this helps them in how they approach their data processing 
tasks.

-pete

-- 
Pete Wright
pete@nomadlogic.org
@nomadlogicLA

* [TUHS] Re: Nice video with Brian Kernighan
  2022-08-18 10:01 [TUHS] " Arnold Robbins
@ 2022-08-18 21:58 ` Clem Cole
  2022-08-18 22:48   ` Pete Wright via TUHS
  0 siblings, 1 reply; 10+ messages in thread
From: Clem Cole @ 2022-08-18 21:58 UTC (permalink / raw)
  To: Arnold Robbins; +Cc: tuhs

Thanks Arnold -- always fun.

On Thu, Aug 18, 2022 at 6:01 AM Arnold Robbins <arnold@skeeve.com> wrote:

> https://www.youtube.com/watch?v=GNyQxXw_oMQ
>
> Not quite 30 minutes long. Mostly about the history of awk but some
> other stuff, including a nice plug for TUHS at the end.
>
> Arnold
>
