From: Will Senn <will.senn@gmail.com>
To: TUHS main list <tuhs@minnie.tuhs.org>
Subject: [TUHS] Regular Expressions
Date: Fri, 31 Jul 2020 17:57:37 -0500
Message-ID: <6e9ca056-dfb0-376d-effd-e41c9ed3ef2a@gmail.com>
I've always been intrigued by regexes. When I was first exposed to
them, I was mystified and lost in the greediness of matches. Now I use
them regularly, but I still have trouble with them. I think it is because
I don't really understand how they work.
My question for y'all has to do with early unix. I have a copy of
Thompson, K. (1968). Regular expression search algorithm. Communications
of the ACM, 11(6), 419-422. It is interesting as an example of
Thompson's thinking about regexes. In this paper, he presents an
efficient, non-backtracking algorithm for converting a regex into an
IBM 7094 (whatever that is) program that, when run against text input,
reports matches. It's cool. It got me thinking that maybe the way
to understand the unix regex lies in a careful investigation into how it
is implemented (original thought, right?). So, here I am again to ask
your indulgence as the latecomer wannabe unix apprentice. My thought is
that ed is where it begins and might be a good starting point, but I'm
not sure - what say y'all?
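To make the "non-backtracking" idea concrete, here is a minimal sketch in
Python. It is not Thompson's 7094 code and not the ed source. It assumes a
toy syntax (literal characters, `.`, and `*` applied to a single character),
and the names `parse`, `closure`, and `matches` are made up for this
illustration. The point is only that the matcher keeps a set of possible
positions in the pattern and advances all of them together, one input
character at a time, so it never has to back up in the text.

```python
def parse(pattern):
    """Turn 'ab*c' into [('a', False), ('b', True), ('c', False)]."""
    items = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        starred = i + 1 < len(pattern) and pattern[i + 1] == '*'
        items.append((ch, starred))
        i += 2 if starred else 1
    return items


def closure(items, states):
    """Add every state reachable by skipping starred items (zero repeats)."""
    result = set()
    stack = list(states)
    while stack:
        s = stack.pop()
        if s in result:
            continue
        result.add(s)
        if s < len(items) and items[s][1]:
            stack.append(s + 1)
    return result


def matches(pattern, text):
    """Return True if the toy pattern matches the whole of text."""
    items = parse(pattern)
    current = closure(items, {0})          # all states possible before input
    for ch in text:                        # each input character is read once
        nxt = set()
        for s in current:
            if s == len(items):            # already past the last item
                continue
            sym, starred = items[s]
            if sym == '.' or sym == ch:
                # A starred item may repeat, so it stays in the same state.
                nxt.add(s if starred else s + 1)
        current = closure(items, nxt)
        if not current:                    # no live states: cannot match
            return False
    return len(items) in current


print(matches("ab*c", "abbc"))      # True
print(matches("ab*c", "abx"))       # False
print(matches("a*a*a*b", "aaaa"))   # False
```

The work per input character is bounded by the number of pattern items,
which is what keeps a pattern like `a*a*a*b` cheap even when it fails.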
I also have a copy of the O'Reilly Mastering Regular Expressions book,
but that's not really the kind of thing I'm talking about. My question
is more basic than how to use regexes practically. I would like to
understand them at a parsing level/state change level (not sure that's
the correct way to say it, but I'm really new to this kind of lingo).
When I'm done stepping through the source, I want to be able to
reason that this is why that search matched that text and not this text,
and why the search was greedy or not greedy, because of this logic here...
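As an illustration of the kind of "this logic here" reasoning meant above,
the following is a small backtracking matcher in Python. Note that it is a
different technique from the set-of-states approach in Thompson's paper, and
it is not taken from any unix source. The toy syntax (literals, `.`, and `*`
on a single character) and the names `match_here` and `search` are
assumptions for this sketch. Greediness comes down to one line: the order in
which the candidate stopping points for a starred item are tried.

```python
def match_here(pattern, text, pos, greedy=True):
    """Return the end index of a match starting at text[pos], or None."""
    if not pattern:
        return pos
    ch, rest = pattern[0], pattern[1:]
    if rest.startswith('*'):
        rest = rest[1:]
        # Find how far the starred item could possibly extend.
        limit = pos
        while limit < len(text) and ch in ('.', text[limit]):
            limit += 1
        # Greedy: try the longest run first. Non-greedy: the shortest first.
        if greedy:
            candidates = range(limit, pos - 1, -1)
        else:
            candidates = range(pos, limit + 1)
        for stop in candidates:
            end = match_here(rest, text, stop, greedy)
            if end is not None:
                return end
        return None
    if pos < len(text) and ch in ('.', text[pos]):
        return match_here(rest, text, pos + 1, greedy)
    return None


def search(pattern, text, greedy=True):
    """Return the leftmost matching substring, or None if there is none."""
    for start in range(len(text) + 1):
        end = match_here(pattern, text, start, greedy)
        if end is not None:
            return text[start:end]
    return None


print(search("a.*c", "xabcabcx"))                # abcabc
print(search("a.*c", "xabcabcx", greedy=False))  # abc
```

Both calls find a match starting at the first `a`; they differ only in
whether `.*` gives characters back from the long end or takes them on from
the short end.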
If my question above isn't focused or on topic enough, here's an
alternative set to ruminate on and hopefully discuss:
1. What's the provenance of regex in unix (when did it appear, in what
form, etc)?
2. What are the 'best' implementations throughout unix (keep it pre-1980s)?
3. What are some of the milestones along the way (major changes, forks,
disagreements)?
4. Where, in the source or in a paper, would you point someone
wanting to better understand the mechanics of regex?
Thanks!
Will
--
GPG Fingerprint: 68F4 B3BD 1730 555A 4462 7D45 3EAA 5B6D A982 BAAF