* [TUHS] Re: regex early discussions
2024-03-04 2:03 ` [TUHS] " Marc Rochkind
@ 2024-03-04 3:38 ` Larry McVoy
2024-03-04 4:18 ` Rich Salz
` (3 more replies)
2024-03-04 7:10 ` Otto Moerbeek via TUHS
` (2 subsequent siblings)
3 siblings, 4 replies; 24+ messages in thread
From: Larry McVoy @ 2024-03-04 3:38 UTC (permalink / raw)
To: Marc Rochkind; +Cc: Will Senn, TUHS
Marc is right. I'll add that I grew up in terminal rooms, a bunch of
kids connected to a VAX 780, like 40 or more. I have no idea how the
kids ahead of me learned but I learned by looking at their terminal
and going "what did you just do?".
My real understanding of regex is from Henry Spencer's regex.
On Sun, Mar 03, 2024 at 07:03:39PM -0700, Marc Rochkind wrote:
> Will, here's my recollection, when I got to UNIX in late 1972 or
> thereabouts:
>
> First, there was ed. grep and sed were derived from ed, so came along
> later. awk came along way later.
>
> There were only manual pages. You typed "man ed" and there it was. The man
> pages were very accurate, very clear, and very authoritative. Many found
> them too succinct, especially as UNIX got more popular, but all of us back
> in the day found them perfect. Maybe you had to read the man page a few
> times to understand it, but at least that's all you had to read. No need to
> hunt around for more documentation!
>
> (Well, there was more documentation: The source code, which was all online.
> But reading the ed source to understand regular expressions was impossible.
> It was in assembler, and Ken was generating code on the fly as the
> expression was compiled.)
>
> Also, it should be noted that ed produced a single error message: a
> question mark. No wasting of teletype paper!
>
> The motivation for learning regular expressions was that that's how you
> edited files. ed was the only game in town.
>
> (sh used a greatly restricted form of regular expressions, which were
> documented on the sh man page.)
>
> Marc Rochkind
>
> On Sun, Mar 3, 2024 at 6:31 PM Will Senn <will.senn@gmail.com> wrote:
>
> > Hi All,
> >
> > I was wondering, what were the best early sources of information for
> > regexes and why did folks need to know them to use unix? In my recent
> > explorations, I have needed to have a better understanding of them, so I'm
> > digging in... awk's my most recent thing and it's deeply associated with
> > them, so here we are. I went to the bookshelf to find something appropriate
> > and as usual, I've traced to primary sources to some extent. I started with
> > Mastering Regular Expressions by Friedl, and I won't knock it (it's one of
> > the bestsellers in our field), but it's much too long for my personal taste
> > and it's not quite as systematic as I would like (the author himself notes
> > that his interests are less technical than authors preceding him on the
> > subject). So, back to the shelves... Bourne's, The Unix Environment, and
> > Kernighan & Pike's, The Unix Programming Environment both talk about them in
> > the context of grep, ed, sed, and awk. Going further back, the Unix
> > Programmer's Manual v7 - ed, grep, sed, awk...
> >
> > After digging around it seems like folks needed regexes for ed, grep, sed
> > and awk... and any other utility that leveraged the wonderful nature of
> > these handy expressions. Fine. Where did folks go learn them? Was there a
> > particularly good (succinct and accurate) source of information that folks
> > kept handy? I'm imagining (based on what I've seen) that someone might cut
> > out the ed discussion or the grep pages of the manual and tape them to
> > their monitors, but maybe I'm stooopid and they didn't need no stinkin'
> > memory device for regexes - surely they're intuitive enough that even a
> > simpleton could pick them up after seeing a few examples... but if that
> > were really the case, Friedl's book would have been a flop and it wasn't
> > :). So seriously, if you remember that far back - what was the definitive
> > source of your regex knowledge and what were the first motivators for
> > learning them?
> >
> > Thanks,
> >
> > Will
> >
>
>
> --
> *My new email address is mrochkind@gmail.com <mrochkind@gmail.com>*
--
---
Larry McVoy Retired to fishing http://www.mcvoy.com/lm/boat
^ permalink raw reply [flat|nested] 24+ messages in thread
* [TUHS] Re: regex early discussions
2024-03-04 3:38 ` Larry McVoy
@ 2024-03-04 4:18 ` Rich Salz
2024-03-04 7:51 ` Alec Muffett
` (2 subsequent siblings)
3 siblings, 0 replies; 24+ messages in thread
From: Rich Salz @ 2024-03-04 4:18 UTC (permalink / raw)
To: Larry McVoy; +Cc: Marc Rochkind, Will Senn, TUHS
I remember being given a copy of grep source and seeing a char pointer
written as "p[-1]" and it was like a thunderbolt of understanding about
C pointers.
^ permalink raw reply [flat|nested] 24+ messages in thread
* [TUHS] Re: regex early discussions
2024-03-04 3:38 ` Larry McVoy
2024-03-04 4:18 ` Rich Salz
@ 2024-03-04 7:51 ` Alec Muffett
2024-03-04 8:17 ` Rob Pike
2024-03-04 14:34 ` Larry McVoy
3 siblings, 0 replies; 24+ messages in thread
From: Alec Muffett @ 2024-03-04 7:51 UTC (permalink / raw)
To: Larry McVoy; +Cc: Marc Rochkind, Will Senn, TUHS
On Mon, 4 Mar 2024, 03:38 Larry McVoy, <lm@mcvoy.com> wrote:
> Marc is right. I'll add that I grew up in terminal rooms, a bunch of
> kids connected to a VAX 780, like 40 or more. I have no idea how the
> kids ahead of me learned but I learned by looking at their terminal
> and going "what did you just do?".
>
> My real understanding of regex is from Henry Spencer's regex.
>
I have a similar story; I landed in Unix circa 1987 because the computer
science students at UCL were all raving about Unix / the Pyramid (Did we
pass unused cspyr accounts around the college nerd undergraduate
underground? Nooooooo, we would never have done that, that would be
"hacking…") and finally the physics department got some Suns to play with.
I bought the Bourne book to navigate the basic shell utilities and of
course there was source code (which we also weren't meant to have access
to, etc etc) - but in my world "regexps" were a fuzzy concept defined by
sed and grep (and various grep reimplementations) until Perl arrived and
sedimented (crowned?) Henry's implementation.
And that's why we call it PCRE.
-a
^ permalink raw reply [flat|nested] 24+ messages in thread
* [TUHS] Re: regex early discussions
2024-03-04 3:38 ` Larry McVoy
2024-03-04 4:18 ` Rich Salz
2024-03-04 7:51 ` Alec Muffett
@ 2024-03-04 8:17 ` Rob Pike
2024-03-04 8:43 ` Alec Muffett
2024-03-04 10:21 ` Bakul Shah via TUHS
2024-03-04 14:34 ` Larry McVoy
3 siblings, 2 replies; 24+ messages in thread
From: Rob Pike @ 2024-03-04 8:17 UTC (permalink / raw)
To: Larry McVoy; +Cc: Marc Rochkind, Will Senn, TUHS
If that's really true, that you learned from Spencer's library, then you
didn't learn the most important thing about them, which is the automata
theory that guarantees their performance is always linear. Not to take
anything away from Henry, who admitted at the time that it could be slow
for bad expressions, but we're still paying the price for refusing to
connect "regex" with the theory that created them, ignoring it in fact.
Background: https://swtch.com/~rsc/regexp/regexp1.html
-rob
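A minimal sketch of the linear-time point above, and not Spencer's or
Thompson's actual code: the matcher below tracks the set of NFA states
reachable after each input character instead of backtracking. The tiny
pattern syntax (literal characters, '.', and the postfix operators '*'
and '?', matched against the whole string) and the names parse, closure
and nfa_match are assumptions made for illustration only.

```python
def parse(pattern):
    """Split a pattern into (char, modifier) items; modifier is '', '*' or '?'."""
    items = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        mod = ''
        if i + 1 < len(pattern) and pattern[i + 1] in ('*', '?'):
            mod = pattern[i + 1]
            i += 1
        items.append((ch, mod))
        i += 1
    return items


def closure(items, states):
    """Extend a state set with states reachable by skipping optional items."""
    out = set()
    for s in states:
        # Stop as soon as we hit a state whose skip chain is already recorded.
        while s not in out:
            out.add(s)
            if s < len(items) and items[s][1] in ('*', '?'):
                s += 1
            else:
                break
    return out


def nfa_match(pattern, text):
    """Return True if the whole of text matches pattern."""
    items = parse(pattern)
    states = closure(items, {0})
    for c in text:
        nxt = set()
        for s in states:
            if s == len(items):
                continue  # accepting state has no outgoing transitions
            ch, mod = items[s]
            if ch == '.' or ch == c:
                # A starred item loops on itself; anything else advances.
                nxt.add(s if mod == '*' else s + 1)
        states = closure(items, nxt)
        if not states:
            return False
    return len(items) in states
```

With n = 30, the pattern 'a?' * n + 'a' * n against the text 'a' * n is
the case used in the article linked above, where a backtracking matcher
can take exponential time. Here each input character costs at most one
pass over the pattern, so the total work grows linearly with the text.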
^ permalink raw reply [flat|nested] 24+ messages in thread
* [TUHS] Re: regex early discussions
2024-03-04 8:17 ` Rob Pike
@ 2024-03-04 8:43 ` Alec Muffett
2024-03-04 14:25 ` Jan Schaumann via TUHS
2024-03-04 10:21 ` Bakul Shah via TUHS
1 sibling, 1 reply; 24+ messages in thread
From: Alec Muffett @ 2024-03-04 8:43 UTC (permalink / raw)
To: Rob Pike; +Cc: Marc Rochkind, Will Senn, TUHS
On Mon, 4 Mar 2024, 08:27 Rob Pike, <robpike@gmail.com> wrote [to Larry]
Oh happy days. Hi Rob, loved the book.
> If that's really true, that you learned from Spencer's library, then you
> didn't learn the most important thing about them, which is the automata
> theory that guarantees their performance is always linear. Not to take
> anything away from Henry, who admitted at the time that it could be slow
> for bad expressions, but we're still paying the price for refusing to
> connect "regex" with the theory that created them, ignoring it in fact.
>
I once got into a bunfight with a Googler on the topic of coding interview
questions, on a related matter. He was promulgating a regular expression to
correctly match/parse-out legitimate dotted-quad IPv4 addresses, including
bounds-checking the octets to be in the range 0..255, and arguing that
since it was going to be run through a DFA, it was a sunk cost for
efficiency and therefore perfect.
The result looked like line noise, and he was perturbed that I said I would
prefer to take a much simpler (NFA?) RE, parse out the ints and
bounds-check them, just to reduce cognitive load and increase
maintainability of code.
We didn't really come to an agreement.
-a
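A rough sketch of the simpler approach described above, under the
assumption that "simpler" means a loose pattern plus an explicit range
check; the names DOTTED_QUAD and parse_ipv4 are hypothetical and the
Googler's original expression is not reproduced here.

```python
import re

# Deliberately loose: four runs of one to three ASCII digits separated by dots.
DOTTED_QUAD = re.compile(r'([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})')


def parse_ipv4(s):
    """Return the four octets as a tuple of ints, or None if s is not valid."""
    m = DOTTED_QUAD.fullmatch(s)
    if m is None:
        return None
    octets = tuple(int(g) for g in m.groups())
    # The bounds check lives in ordinary code rather than in the pattern.
    if any(o > 255 for o in octets):
        return None
    return octets
```

Note that this sketch accepts leading zeros ("01.2.3.4"); whether that
is acceptable is a policy decision that is easy to change in the code
and hard to read out of a single large expression.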
^ permalink raw reply [flat|nested] 24+ messages in thread
* [TUHS] Re: regex early discussions
2024-03-04 8:43 ` Alec Muffett
@ 2024-03-04 14:25 ` Jan Schaumann via TUHS
0 siblings, 0 replies; 24+ messages in thread
From: Jan Schaumann via TUHS @ 2024-03-04 14:25 UTC (permalink / raw)
To: Alec Muffett; +Cc: Marc Rochkind, Will Senn, TUHS
Alec Muffett <alec.muffett@gmail.com> wrote:
> I once got into a bunfight with a Googler on the topic of coding interview
> questions, on a related matter. He was promulgating a regular expression to
> correctly match/parse-out legitimate dotted-quad IPv4 addresses
That seems an excellent illustration of "now they have
two problems." (And now do IPv6.)
If you need to pull IP addresses from text, the most
liberal regex will generally be "good enough"; if you
must be certain, feed the string to inet_aton(3).
:-)
-Jan
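As an illustration of handing validation to the library rather than to
a regex, here is a small sketch using Python's standard socket and
ipaddress modules in place of calling inet_aton(3) from C. The helper
names accepts_aton and ip_version are made up for this example.

```python
import ipaddress
import socket


def accepts_aton(s):
    """True if the platform's inet_aton, via Python's socket module, accepts s."""
    try:
        socket.inet_aton(s)
    except OSError:
        return False
    return True


def ip_version(s):
    """Return 4 or 6 for a valid address, or None; stricter than inet_aton."""
    try:
        return ipaddress.ip_address(s).version
    except ValueError:
        return None
```

One caveat: inet_aton also accepts shorthand forms with fewer than
three dots, such as "127.1", so "certain" here means "whatever the
resolver library would accept". The ipaddress variant insists on a full
dotted quad and covers IPv6 as well.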
^ permalink raw reply [flat|nested] 24+ messages in thread
* [TUHS] Re: regex early discussions
2024-03-04 8:17 ` Rob Pike
2024-03-04 8:43 ` Alec Muffett
@ 2024-03-04 10:21 ` Bakul Shah via TUHS
1 sibling, 0 replies; 24+ messages in thread
From: Bakul Shah via TUHS @ 2024-03-04 10:21 UTC (permalink / raw)
To: Rob Pike; +Cc: Marc Rochkind, Will Senn, TUHS
[-- Attachment #1: Type: text/html, Size: 7348 bytes --]
^ permalink raw reply [flat|nested] 24+ messages in thread
* [TUHS] Re: regex early discussions
2024-03-04 3:38 ` Larry McVoy
` (2 preceding siblings ...)
2024-03-04 8:17 ` Rob Pike
@ 2024-03-04 14:34 ` Larry McVoy
3 siblings, 0 replies; 24+ messages in thread
From: Larry McVoy @ 2024-03-04 14:34 UTC (permalink / raw)
To: Marc Rochkind; +Cc: Will Senn, TUHS
On Sun, Mar 03, 2024 at 07:38:45PM -0800, Larry McVoy wrote:
> Marc is right. I'll add that I grew up in terminal rooms, a bunch of
> kids connected to a VAX 780, like 40 or more. I have no idea how the
> kids ahead of me learned but I learned by looking at their terminal
> and going "what did you just do?".
>
> My real understanding of regex is from Henry Spencer's regex.
And this little implementation, I've used this one a lot.
http://www.cs.yorku.ca/~oz/regex.bun
^ permalink raw reply [flat|nested] 24+ messages in thread
* [TUHS] Re: regex early discussions
2024-03-04 2:03 ` [TUHS] " Marc Rochkind
2024-03-04 3:38 ` Larry McVoy
@ 2024-03-04 7:10 ` Otto Moerbeek via TUHS
2024-03-04 7:19 ` Dave Long
2024-03-04 7:25 ` Otto Moerbeek via TUHS
2024-03-04 12:00 ` Peter Weinberger (温博格) via TUHS
2024-03-04 17:05 ` Will Senn
3 siblings, 2 replies; 24+ messages in thread
From: Otto Moerbeek via TUHS @ 2024-03-04 7:10 UTC (permalink / raw)
To: Marc Rochkind; +Cc: Will Senn, TUHS
On Sun, Mar 03, 2024 at 07:03:39PM -0700, Marc Rochkind wrote:
> Will, here's my recollection, when I got to UNIX in late 1972 or
> thereabouts:
>
> First, there was ed. grep and sed were derived from ed, so came along
> later. awk came along way later.
>
> There were only manual pages. You typed "man ed" and there it was. The man
> pages were very accurate, very clear, and very authoritative. Many found
> them too succinct, especially as UNIX got more popular, but all of us back
> in the day found them perfect. Maybe you had to read the man page a few
> times to understand it, but at least that's all you had to read. No need to
> hunt around for more documentation!
>
> (Well, there was more documentation: The source code, which was all online.
> But reading the ed source to understand regular expressions was impossible.
> It was in assembler, and Ken was generating code on the fly as the
> expression was compiled.)
I'd like to add that there was also quite a large set of additional
documentation (Volume 2; Volume 1 was the man pages), which
includes "Advanced Editing on UNIX", giving many examples of the use of
regexes in ed(1).
I do remember reading a lot from Volume 2, as CS students in Amsterdam
we received printed and bound copies of both Volume 1 and 2. So in my
case, "only man pages or source" is not true. Having paper versions
was important, because access to terminals for students was limited
(until I became a teaching assistant, which came with privileges,
including 24h access to terminals).
-Otto
^ permalink raw reply [flat|nested] 24+ messages in thread
* [TUHS] Re: regex early discussions
2024-03-04 7:10 ` Otto Moerbeek via TUHS
@ 2024-03-04 7:19 ` Dave Long
2024-03-04 7:25 ` arnold
2024-03-04 7:25 ` Otto Moerbeek via TUHS
1 sibling, 1 reply; 24+ messages in thread
From: Dave Long @ 2024-03-04 7:19 UTC (permalink / raw)
To: Otto Moerbeek; +Cc: Marc Rochkind, Will Senn, TUHS
Did `learn` have a regex module? (my memory* does not suffice, and I didn't even manage to get google to tell me if it were learn(1) or learn(6), so please forgive the imprecision of this response)
-Dave
* although I do recall this was how I learned one of ed(1) or vi(1)
^ permalink raw reply [flat|nested] 24+ messages in thread
* [TUHS] Re: regex early discussions
2024-03-04 7:19 ` Dave Long
@ 2024-03-04 7:25 ` arnold
2024-03-04 12:05 ` Ralph Corderoy
0 siblings, 1 reply; 24+ messages in thread
From: arnold @ 2024-03-04 7:25 UTC (permalink / raw)
To: otto, dave.long; +Cc: will.senn, tuhs, mrochkind
I learned regular expressions from Kernighan & Plauger's book
"Software Tools". I was exposed to that book, Unix (v6 on a PDP-11)
and C programming (via K&R's book) all at the same time. This was in
the fall of 1980.
"Software Tools" changed my life.
Arnold
Dave Long <dave.long@bluewin.ch> wrote:
> Did `learn` have a regex module? (my memory* does not suffice, and
> I didn't even manage to get google to tell me if it were learn(1) or
> learn(6), so please forgive the imprecision of this response)
>
> -Dave
>
> * although I do recall this was how I learned one of ed(1) or vi(1)
^ permalink raw reply [flat|nested] 24+ messages in thread
* [TUHS] Re: regex early discussions
2024-03-04 7:25 ` arnold
@ 2024-03-04 12:05 ` Ralph Corderoy
2024-03-04 13:01 ` arnold
0 siblings, 1 reply; 24+ messages in thread
From: Ralph Corderoy @ 2024-03-04 12:05 UTC (permalink / raw)
To: tuhs; +Cc: will.senn, mrochkind
Hi Arnold,
> I learned regular expressions from Kernighan & Plauger's book
> "Software Tools". I was exposed to that book, Unix (v6 on a PDP-11)
> and C programming (via K&R's book) all at the same time. This was in
> the fall of 1980.
An excellent book. What I think you've not mentioned is that it
implements regular expressions. Being inside the black box can aid
understanding, including the performance of the matcher and the way the
regexp is best written for a particular matcher.
Kernighan and Pike's ‘The practice of programming’ also briefly
implements some regexp functionality when talking about the power of
notation.
--
Cheers, Ralph.
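To give a flavour of what being inside the black box looks like, here
is a Python transliteration of the kind of small backtracking matcher
found in "The Practice of Programming". It is a sketch from memory and
not the book's C text; it is assumed to handle only literal characters,
'.', '*', '^' and '$', and the function names are this example's own.

```python
def match(regexp, text):
    """Search for regexp anywhere in text."""
    if regexp.startswith('^'):
        return match_here(regexp[1:], text)
    # Try every starting position, including the empty suffix.
    for i in range(len(text) + 1):
        if match_here(regexp, text[i:]):
            return True
    return False


def match_here(regexp, text):
    """Match regexp at the beginning of text."""
    if not regexp:
        return True
    if len(regexp) >= 2 and regexp[1] == '*':
        return match_star(regexp[0], regexp[2:], text)
    if regexp == '$':
        return not text
    if text and (regexp[0] == '.' or regexp[0] == text[0]):
        return match_here(regexp[1:], text[1:])
    return False


def match_star(c, regexp, text):
    """Match zero or more occurrences of c, followed by regexp."""
    i = 0
    while True:
        if match_here(regexp, text[i:]):
            return True
        if i < len(text) and (c == '.' or c == text[i]):
            i += 1
        else:
            return False
```

For example, match('^a.*z$', 'abcz') succeeds and match('^b', 'abc')
does not. Reading match_star makes the cost model visible: each star
retries the rest of the pattern at every possible length, which is where
a backtracking matcher's running time can grow for awkward expressions.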
^ permalink raw reply [flat|nested] 24+ messages in thread
* [TUHS] Re: regex early discussions
2024-03-04 12:05 ` Ralph Corderoy
@ 2024-03-04 13:01 ` arnold
0 siblings, 0 replies; 24+ messages in thread
From: arnold @ 2024-03-04 13:01 UTC (permalink / raw)
To: tuhs, ralph; +Cc: will.senn, mrochkind
Hi Ralph.
Ralph Corderoy <ralph@inputplus.co.uk> wrote:
> Hi Arnold,
>
> > I learned regular expressions from Kernighan & Plauger's book
> > "Software Tools". I was exposed to that book, Unix (v6 on a PDP-11)
> > and C programming (via K&R's book) all at the same time. This was in
> > the fall of 1980.
>
> An excellent book. What I think you've not mentioned is that it
> implements regular expressions. Being inside the black box can aid
> understanding, including the performance of the matcher and the way the
> regexp is best written for a particular matcher.
Quite true.
> Kernighan and Pike's ‘The Practice of Programming’ also briefly
> implements some regexp functionality when talking about the power of
> notation.
What I didn't quite remember when I wrote the earlier note was that
at the same time as I was learning C, Unix and software tools, I took
a compiler course, using the first edition of the dragon book, which
covered regular expressions, NFAs and DFAs.
It all came together at the same time.
Arnold
* [TUHS] Re: regex early discussions
2024-03-04 7:10 ` Otto Moerbeek via TUHS
2024-03-04 7:19 ` Dave Long
@ 2024-03-04 7:25 ` Otto Moerbeek via TUHS
1 sibling, 0 replies; 24+ messages in thread
From: Otto Moerbeek via TUHS @ 2024-03-04 7:25 UTC (permalink / raw)
To: Otto Moerbeek via TUHS; +Cc: Marc Rochkind, Will Senn
On Mon, Mar 04, 2024 at 08:10:26AM +0100, Otto Moerbeek via TUHS wrote:
> On Sun, Mar 03, 2024 at 07:03:39PM -0700, Marc Rochkind wrote:
>
> > Will, here's my recollection, when I got to UNIX in late 1972 or
> > thereabouts:
> >
> > First, there was ed. grep and sed were derived from ed, so came along
> > later. awk came along way later.
> >
> > There were only manual pages. You typed "man ed" and there it was. The man
> > pages were very accurate, very clear, and very authoritative. Many found
> > them too succinct, especially as UNIX got more popular, but all of us back
> > in the day found them perfect. Maybe you had to read the man page a few
> > times to understand it, but at least that's all you had to read. No need to
> > hunt around for more documentation!
> >
> > (Well, there was more documentation: The source code, which was all online.
> > But reading the ed source to understand regular expressions was impossible.
> > It was in assembler, and Ken was generating code on the fly as the
> > expression was compiled.)
>
> I'd like to add that there was also quite a large set of additional
> documentation (Volume 2; Volume 1 was the man pages), which
> includes "Advanced Editing on UNIX", giving many examples of the use of
> regexes in ed(1).
>
> I do remember reading a lot from Volume 2; as CS students in Amsterdam
> we received printed and bound copies of both Volume 1 and 2. So in my
> case, "only man pages or source" is not true. Having paper versions
> was important, because access to terminals for students was limited
> (until I became a teaching assistant, which came with privileges,
> including 24h access to terminals).
https://wolfram.schneider.org/bsd/7thEdManVol2/ shows the contents of
Volume 2 (the level ranges from introductory tutorials to the internals of
the compiler).
>
> -Otto
>
> >
> > Also, it should be noted that ed produced a single error message: a
> > question mark. No wasting of teletype paper!
> >
> > The motivation for learning regular expressions was that that's how you
> > edited files. ed was the only game in town.
> >
> > (sh used a greatly restricted form of regular expressions, which were
> > documented on the sh man page.)
> >
> > Marc Rochkind
> >
> > On Sun, Mar 3, 2024 at 6:31 PM Will Senn <will.senn@gmail.com> wrote:
> >
> > > Hi All,
> > >
> > > I was wondering, what were the best early sources of information for
> > > regexes and why did folks need to know them to use unix? In my recent
> > > explorations, I have needed to have a better understanding of them, so I'm
> > > digging in... awk's my most recent thing and it's deeply associated with
> > > them, so here we are. I went to the bookshelf to find something appropriate
> > > and as usual, I've traced to primary sources to some extent. I started with
> > > Mastering Regular Expressions by Friedl, and I won't knock it (it's one of
> > > > the bestsellers in our field), but it's much too long for my personal taste
> > > > and it's not quite as systematic as I would like (the author himself notes
> > > > that his interests are less technical than authors preceding him on the
> > > > subject). So, back to the shelves... Bourne's, The Unix Environment, and
> > > > Kernighan & Pike's, The Unix Programming Environment both talk about them in
> > > the context of grep, ed, sed, and awk. Going further back, the Unix
> > > Programmer's Manual v7 - ed, grep, sed, awk...
> > >
> > > After digging around it seems like folks needed regexes for ed, grep, sed
> > > and awk... and any other utility that leveraged the wonderful nature of
> > > these handy expressions. Fine. Where did folks go learn them? Was there a
> > > particularly good (succinct and accurate) source of information that folks
> > > kept handy? I'm imagining (based on what I've seen) that someone might cut
> > > out the ed discussion or the grep pages of the manual and tape them to
> > > their monitors, but maybe I'm stooopid and they didn't need no stinkin'
> > > memory device for regexes - surely they're intuitive enough that even a
> > > simpleton could pick them up after seeing a few examples... but if that
> > > were really the case, Friedl's book would have been a flop and it wasn't
> > > :). So seriously, if you remember that far back - what was the definitive
> > > source of your regex knowledge and what were the first motivators for
> > > learning them?
> > >
> > > Thanks,
> > >
> > > Will
> > >
> >
> >
> > --
> > *My new email address is mrochkind@gmail.com <mrochkind@gmail.com>*
* [TUHS] Re: regex early discussions
2024-03-04 2:03 ` [TUHS] " Marc Rochkind
2024-03-04 3:38 ` Larry McVoy
2024-03-04 7:10 ` Otto Moerbeek via TUHS
@ 2024-03-04 12:00 ` Peter Weinberger (温博格) via TUHS
2024-03-04 17:05 ` Will Senn
3 siblings, 0 replies; 24+ messages in thread
From: Peter Weinberger (温博格) via TUHS @ 2024-03-04 12:00 UTC (permalink / raw)
To: Marc Rochkind; +Cc: Will Senn, TUHS
My recollection is that awk and sed were contemporaneous.
On Sun, Mar 3, 2024 at 9:04 PM Marc Rochkind <mrochkind@gmail.com> wrote:
>
> Will, here's my recollection, when I got to UNIX in late 1972 or thereabouts:
>
> First, there was ed. grep and sed were derived from ed, so came along later. awk came along way later.
>
> There were only manual pages. You typed "man ed" and there it was. The man pages were very accurate, very clear, and very authoritative. Many found them too succinct, especially as UNIX got more popular, but all of us back in the day found them perfect. Maybe you had to read the man page a few times to understand it, but at least that's all you had to read. No need to hunt around for more documentation!
>
> (Well, there was more documentation: The source code, which was all online. But reading the ed source to understand regular expressions was impossible. It was in assembler, and Ken was generating code on the fly as the expression was compiled.)
>
> Also, it should be noted that ed produced a single error message: a question mark. No wasting of teletype paper!
>
> The motivation for learning regular expressions was that that's how you edited files. ed was the only game in town.
>
> (sh used a greatly restricted form of regular expressions, which were documented on the sh man page.)
>
> Marc Rochkind
>
> On Sun, Mar 3, 2024 at 6:31 PM Will Senn <will.senn@gmail.com> wrote:
>>
>> Hi All,
>>
>> I was wondering, what were the best early sources of information for regexes and why did folks need to know them to use unix? In my recent explorations, I have needed to have a better understanding of them, so I'm digging in... awk's my most recent thing and it's deeply associated with them, so here we are. I went to the bookshelf to find something appropriate and as usual, I've traced to primary sources to some extent. I started with Mastering Regular Expressions by Friedl, and I won't knock it (it's one of the bestsellers in our field), but it's much too long for my personal taste and it's not quite as systematic as I would like (the author himself notes that his interests are less technical than authors preceding him on the subject). So, back to the shelves... Bourne's, The Unix Environment, and Kernighan & Pike's, The Unix Programming Environment both talk about them in the context of grep, ed, sed, and awk. Going further back, the Unix Programmer's Manual v7 - ed, grep, sed, awk...
>>
>> After digging around it seems like folks needed regexes for ed, grep, sed and awk... and any other utility that leveraged the wonderful nature of these handy expressions. Fine. Where did folks go learn them? Was there a particularly good (succinct and accurate) source of information that folks kept handy? I'm imagining (based on what I've seen) that someone might cut out the ed discussion or the grep pages of the manual and tape them to their monitors, but maybe I'm stooopid and they didn't need no stinkin' memory device for regexes - surely they're intuitive enough that even a simpleton could pick them up after seeing a few examples... but if that were really the case, Friedl's book would have been a flop and it wasn't :). So seriously, if you remember that far back - what was the definitive source of your regex knowledge and what were the first motivators for learning them?
>>
>> Thanks,
>>
>> Will
>
>
>
> --
> My new email address is mrochkind@gmail.com
* [TUHS] Re: regex early discussions
2024-03-04 2:03 ` [TUHS] " Marc Rochkind
` (2 preceding siblings ...)
2024-03-04 12:00 ` Peter Weinberger (温博格) via TUHS
@ 2024-03-04 17:05 ` Will Senn
2024-03-04 18:43 ` Rich Salz
3 siblings, 1 reply; 24+ messages in thread
From: Will Senn @ 2024-03-04 17:05 UTC (permalink / raw)
To: TUHS
To close the loop a bit...
I really appreciate the anecdotes and background. It's helpful to those
of us who didn't live it.
On the best resources front:
The Unix Programmer's Manual for v7 contains:
"A Tutorial Introduction to the UNIX Text Editor" by B. W. Kernighan -
excellent coverage of Context Searching using a limited subset of regex.
"Advanced Editing on UNIX" by B. W. Kernighan - lots of examples.
"ed(1)" by authors of the manpages - super concise but thorough coverage
of the regex rules (great followup to the tutorial).
Articles:
"Regular Expression Search Algorithm", by K. Thompson - an Algol-60
implementation of regex described in 4 pages... in 1968... I was 2 1/2.
"Regular Expression Matching Can Be Simple and Fast", by Russ Cox - how
can an article be both simple and deep? Great concision.
Other Books:
"The AWK Programming Language" by A. V. Aho, B. W. Kernighan, & P. J.
Weinberger - the discussion on pp. 28-31, Regular Expressions, is the
best I've seen.
"Chapter 9. Regular Expressions" in the XBD section of the SUS (IEEE
Std 1003.1-2017) - Comprehensive presentation of the spec (good stuff,
even if nobody perfectly implements it).
There are plenty more, but with the tutorial, ed(1), and AWK book in
hand, I think a beginner is covered.
BTW, awk is awesome (particularly with the new csv additions) - I don't
"need" the new unicode support, but it's nice. I didn't get awk, but
when I figured out you could do this:
awk '/SYS.*\(write\,/, /\)/' */*
SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf,
size_t, count)
in the kernel source, I was sold. I've never really wrapped my head
around how to efficiently search over multiple lines, awk's range
patterns... just make sense :). Even if it looks crazy, it works.
ranges bounded by regexes... who'd of thunk it?
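
For anyone who finds the range pattern above easier to follow in a general-purpose language, its behaviour can be modelled roughly as below. This is a Python sketch of the semantics only, not how any awk is implemented; `range_lines` and the sample input are hypothetical names and data made up for illustration.

```python
import re
from typing import Iterable, Iterator


def range_lines(lines: Iterable[str], start: str, end: str) -> Iterator[str]:
    """Yield the lines an awk-style range pattern `/start/, /end/` would select."""
    start_re, end_re = re.compile(start), re.compile(end)
    in_range = False
    for line in lines:
        if not in_range:
            if not start_re.search(line):
                continue  # outside any range and no start match: skip
            in_range = True
        yield line
        # The end pattern is also tested on the line that opened the range,
        # so a range can begin and finish on a single line.
        if end_re.search(line):
            in_range = False


# Hypothetical sample input resembling the kernel source searched above.
SAMPLE = [
    "static int helper;",
    "SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf,",
    "\t\tsize_t, count)",
    "{",
    "SYSCALL_DEFINE1(close, unsigned int, fd)",
]

if __name__ == "__main__":
    for selected in range_lines(SAMPLE, r"SYS.*\(write,", r"\)"):
        print(selected)
```

The selection is inclusive at both ends: the line matching the first regex opens the range, and the first line matching the second regex closes it.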
Will
On 3/3/24 8:03 PM, Marc Rochkind wrote:
> Will, here's my recollection, when I got to UNIX in late 1972 or
> thereabouts:
>
> First, there was ed. grep and sed were derived from ed, so came along
> later. awk came along way later.
>
> There were only manual pages. You typed "man ed" and there it was. The
> man pages were very accurate, very clear, and very authoritative. Many
> found them too succinct, especially as UNIX got more popular, but all
> of us back in the day found them perfect. Maybe you had to read the
> man page a few times to understand it, but at least that's all you had
> to read. No need to hunt around for more documentation!
>
> (Well, there was more documentation: The source code, which was all
> online. But reading the ed source to understand regular expressions
> was impossible. It was in assembler, and Ken was generating code on
> the fly as the expression was compiled.)
>
> Also, it should be noted that ed produced a single error message: a
> question mark. No wasting of teletype paper!
>
> The motivation for learning regular expressions was that that's how
> you edited files. ed was the only game in town.
>
> (sh used a greatly restricted form of regular expressions, which were
> documented on the sh man page.)
>
> Marc Rochkind
>
> On Sun, Mar 3, 2024 at 6:31 PM Will Senn <will.senn@gmail.com> wrote:
>
> Hi All,
>
> I was wondering, what were the best early sources of information
> for regexes and why did folks need to know them to use unix? In my
> recent explorations, I have needed to have a better understanding
> of them, so I'm digging in... awk's my most recent thing and it's
> deeply associated with them, so here we are. I went to the
> bookshelf to find something appropriate and as usual, I've traced
> to primary sources to some extent. I started with Mastering
> Regular Expressions by Friedl, and I won't knock it (it's one of
> the bestsellers in our field), but it's much too long for my
> personal taste and it's not quite as systematic as I would like
> (the author himself notes that his interests are less technical
> than authors preceding him on the subject). So, back to the
> shelves... Bourne's, The Unix Environment, and Kernighan & Pike's,
> The Unix Programming Environment both talk about them in the
> context of grep, ed, sed, and awk. Going further back, the Unix
> Programmer's Manual v7 - ed, grep, sed, awk...
>
> After digging around it seems like folks needed regexes for ed,
> grep, sed and awk... and any other utility that leveraged the
> wonderful nature of these handy expressions. Fine. Where did folks
> go learn them? Was there a particularly good (succinct and
> accurate) source of information that folks kept handy? I'm
> imagining (based on what I've seen) that someone might cut out the
> ed discussion or the grep pages of the manual and tape them to
> their monitors, but maybe I'm stooopid and they didn't need no
> stinkin' memory device for regexes - surely they're intuitive
> enough that even a simpleton could pick them up after seeing a few
> examples... but if that were really the case, Friedl's book would
> have been a flop and it wasn't :). So seriously, if you remember
> that far back - what was the definitive source of your regex
> knowledge and what were the first motivators for learning them?
>
> Thanks,
>
> Will
>
>
>
> --
> /My new email address is mrochkind@gmail.com/