The Unix Heritage Society mailing list
 help / color / mirror / Atom feed
From: Mohamed Akram <mohd.akram@outlook.com>
To: Douglas McIlroy <douglas.mcilroy@dartmouth.edu>
Cc: TUHS main list <tuhs@tuhs.org>
Subject: [TUHS] Re: Nice video with Brian Kernighan
Date: Sun, 21 Aug 2022 09:36:03 +0000	[thread overview]
Message-ID: <A7E20292-66B3-467E-9507-C69B39BA9B73@outlook.com> (raw)
In-Reply-To: <CAKH6PiWx5jUsr7B65ZXnJW8vvJTWWvZVb5Xfkxqp1SnOL7dUkg@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3648 bytes --]

Hi folks,

This is my first time posting on this list, it’s been such a joy to read about so many little-known yet enduring and consequential aspects of UNIX. I had written a short post [1] about the calendar utility some time ago, with a brief glimpse at its history. Seeing its implementation, it certainly made me scratch my head a bit - who would think to create a program whose sole purpose was to dynamically generate a regular expression that would then be fed into another program (it doesn’t stop there either, as my post goes into). I found it to be perhaps the most illustrative and comprehensive example of UNIX composition that I had come across. Unbeknownst to me that it was Douglas McIlroy who had written this program, which in hindsight should not come as a surprise at all, him being the exemplar of composing simple, orthogonal, yet robust tools to get a job done quickly and efficiently.

Thank you Doug for your reply, I thoroughly enjoyed learning more about the origins and history of the calendar program. That it was the impetus to turn egrep into the performant and viable tool that we know today further colors the picture of this unassuming utility.

[1] https://akr.am/blog/posts/today-in-history-brought-to-you-by-unix

Regards,
Mohamed

On Aug 20, 2022, at 7:48 PM, Douglas McIlroy <douglas.mcilroy@dartmouth.edu<mailto:douglas.mcilroy@dartmouth.edu>> wrote:

Brian's tribute to the brilliant regex mechanism that awk borrowed
from egrep  spurred memories.

For more than forty years I claimed credit for stimulating Ken to
liberate grep from ed. Then, thanks to TUHS, I learned that I had
merely caused Ken to spring from the closet a program he had already
made for his own use.

There's a related story for egrep. Al Aho made a deterministic
regular-expression recognizer as a faster replacement for the
non-deterministic recognizer in grep. He also extended the domain of
patterns to full regular expressions, including alternation; thus the
"e" in egrep.

About the same time, I built on Norm Shryer's personal calendar
utility. I wanted to generalize Norm's strict syntax for dates to
cover most any (American) representation of dates, and to warn about
tomorrow's calendar as well as today's--where "tomorrow" could extend
across a weekend or holiday.

Egrep was just the tool I needed for picking the dates out of a
free-form calendar file. I wrote a little program that built an egrep
pattern based on today's date. The following mouthful for Saturday,
August 20 covers Sunday and Monday, too. (Note that, in egrep, newline
is a synonym for |, the alternation operator.)

       (^|[ (,;])(([Aa]ug[^ ]* *|(08|8)/)0*20)([^0123456789]|$)
       (^|[ (,;])(([Aa]ug[^ ]* *|(08|8)/)0*21)([^0123456789]|$)
       (^|[ (,;])(([Aa]ug[^ ]* *|(08|8)/)0*22)([^0123456789]|$)

It worked like a charm, except that it took a good part of a minute to
handle even a tiny calendar file. The reason: the state count of the
deterministic automaton was exponentially larger than the regular
regular expression; and egrep had to build the automaton before it
could run it. Al was mortified that an early serious use of egrep
should be such a turkey.

But Al was undaunted. He replaced the automaton construction with an
equivalent lazy algorithm that constructed a state only when the
recognizer was about to visit it. This made egrep into the brilliant
tool that Brian praised.

What I don't know is whether the calendar program stimulated the idea
of lazy implementation, or whether Al, like Ken before him with grep,
already had the idea up his sleeve.

Doug


[-- Attachment #2: Type: text/html, Size: 5148 bytes --]

  parent reply	other threads:[~2022-08-21  9:36 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-20 15:48 Douglas McIlroy
2022-08-20 16:17 ` Clem Cole
2022-08-20 18:24   ` Rich Morin
2022-08-21 15:09     ` John Cowan
2022-08-21 15:41       ` Clem Cole
2022-08-22  2:49         ` John Cowan
2022-08-21 20:07       ` Rich Morin
2022-08-21  9:36 ` Mohamed Akram [this message]
  -- strict thread matches above, loose matches on Subject: below --
2022-08-18 10:01 [TUHS] " Arnold Robbins
2022-08-18 21:58 ` [TUHS] " Clem Cole
2022-08-18 22:48   ` Pete Wright via TUHS

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=A7E20292-66B3-467E-9507-C69B39BA9B73@outlook.com \
    --to=mohd.akram@outlook.com \
    --cc=douglas.mcilroy@dartmouth.edu \
    --cc=tuhs@tuhs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).