Computer Old Farts Forum
 help / color / mirror / Atom feed
From: Ralph Corderoy <ralph@inputplus.co.uk>
To: coff@tuhs.org
Subject: [COFF] Re: Requesting thoughts on extended regular expressions in grep.
Date: Fri, 03 Mar 2023 13:42:15 +0000	[thread overview]
Message-ID: <20230303134215.3ED63215AA@orac.inputplus.co.uk> (raw)
In-Reply-To: <CAEoi9W7D49pdoKXMruSLUp7QO7468FkmwL2E715Nu=DfSHWhmQ@mail.gmail.com>

Hi Dan,

> > If you want to understand:
> >
> > - the maths of regular expressions,
> > - the syntax of regexps which these days expresses more than REs, and
> > - the regexp engines in programs, the differences in how they work
> >   and what they match, and
> > - how to efficiently steer an engine's internals
> >
> > then I recommend Jeffrey Friedl's Mastering Regular Expressions.
> > http://regex.info/book.html
>
> I'm afraid I must sound a note of caution about Friedl's book.  Russ
> Cox alludes to some of the problems in the "History and References"
> section of his page (https://swtch.com/~rsc/regexp/regexp1.html), that
> was linked earlier

Russ says:

 1 ‘Finally, any discussion of regular expressions would be incomplete
    without mentioning Jeffrey Friedl's book Mastering Regular Expressions,
    perhaps the most popular reference among today's programmers.
 2  Friedl's book teaches programmers how best to use today's regular
    expression implementations, but not how best to implement them.
 3  What little text it devotes to implementation issues perpetuates the
    widespread belief that recursive backtracking is the only way to
    simulate an NFA.
 4  Friedl makes it clear that he [neither understands nor respects] the
    underlying theory.’  http://regex.info/blog/2006-09-15/248

I think Grant is after what Russ addresses in sentence 2.  :-)

> The impression is that Friedl shows wonderfully how to _use_ regular
> expressions, but does not understand the theory behind their
> implementation.

Yes, Friedl does show that wonderfully.  From long-ago memory, Friedl
understands enough to have diagrams of NFAs and DFAs clocking through
their inputs, showing the differences in number of states, etc.

Yes, Friedl says an NFA must recursively backtrack.  As Russ says in #3,
it was a ‘widespread belief’.  Friedl didn't originate it; I ‘knew’ it
before reading his book.  Friedl was at the sharp end of regexps,
needing to process large amounts of text, at Yahoo! IIRC.  He
investigated how the programs available behaved; he didn't start at the
theory and come up with a new program best suited to his needs.

> Personally, I'd stick with Russ's stuff, especially as `egrep` is the
> target here.

Russ's stuff is great.  He refuted that widespread belief, for one
thing.  But Russ isn't trying to teach a programmer how to best use the
regexp engine in sed, grep, egrep, Perl, PCRE, ... whereas Friedl takes
the many pages needed to do this.

It depends what one wants to learn first.

As Friedl says in the post Russ linked to:

   ‘As a user, you don't care if it's regular, nonregular, unregular,
    irregular, or incontinent.  So long as you know what you can expect
    from it (something this chapter will show you), you know all you need
    to care about.

   ‘For those wishing to learn more about the theory of regular expressions,
    the classic computer-science text is chapter 3 of Aho, Sethi, and
    Ullman's Compilers — Principles, Techniques, and Tools (Addison-Wesley,
    1986), commonly called “The Dragon Book” due to the cover design.
    More specifically, this is the “red dragon”.  The “green dragon”
    is its predecessor, Aho and Ullman's Principles of Compiler Design.’

In addition to the Dragon Book, Hopcroft and Ullman's ‘Automata Theory,
Languages, and Computation’ goes further into the subject.  Chapter two
has DFA, NFA, epsilon transitions, and uses searching text as an
example.  Chapter three is regular expressions, four is regular
languages.  Pushdown automata is chapter six.

Too many books, not enough time to read.  :-)

-- 
Cheers, Ralph.

  reply	other threads:[~2023-03-03 13:42 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-02 18:54 [COFF] " Grant Taylor via COFF
2023-03-02 19:23 ` [COFF] " Clem Cole
2023-03-02 19:38   ` Grant Taylor via COFF
2023-03-02 23:01   ` Stuff Received
2023-03-02 23:46     ` Steffen Nurpmeso
2023-03-03  1:08     ` Grant Taylor via COFF
2023-03-03  2:10       ` Dave Horsfall
2023-03-03  3:34         ` Grant Taylor via COFF
2023-03-02 21:53 ` Dan Cross
2023-03-03  1:05   ` Grant Taylor via COFF
2023-03-03  3:04     ` Dan Cross
2023-03-03  3:53       ` Grant Taylor via COFF
2023-03-03 13:47         ` Dan Cross
2023-03-03 19:26           ` Grant Taylor via COFF
2023-03-03 10:59 ` Ralph Corderoy
2023-03-03 13:11   ` Dan Cross
2023-03-03 13:42     ` Ralph Corderoy [this message]
2023-03-03 19:19       ` Grant Taylor via COFF
2023-03-04 10:15         ` [COFF] Reading PDFs on a mobile. (Was: Requesting thoughts on extended regular expressions in grep.) Ralph Corderoy
2023-03-07 21:49           ` [COFF] " Tomasz Rola
2023-03-07 22:46             ` Tomasz Rola
2023-06-20 16:02           ` Michael Parson
2023-06-20 21:26             ` Tomasz Rola
2023-06-22 15:45               ` Michael Parson
2023-07-10  9:08                 ` [COFF] Re: Reader, paper, tablet, phone (was: Re: Reading PDFs on a mobile. (Was: Requesting thoughts on extended regular expressions in grep.)) Tomasz Rola
2023-03-03 16:12   ` [COFF] Re: Requesting thoughts on extended regular expressions in grep Dave Horsfall
2023-03-03 17:13     ` Dan Cross
2023-03-03 17:38       ` Ralph Corderoy
2023-03-03 19:09         ` Dan Cross
2023-03-03 19:36     ` Grant Taylor via COFF
2023-03-04 10:26       ` Ralph Corderoy
2023-03-03 19:06 ` Grant Taylor via COFF
2023-03-03 19:31   ` Dan Cross
2023-03-04 10:07   ` Ralph Corderoy
2023-03-06 10:01 ` Ed Bradford
2023-03-06 21:01   ` Dan Cross
2023-03-06 21:49     ` Steffen Nurpmeso
2023-03-07  1:43     ` Larry McVoy
2023-03-07  4:01       ` Ed Bradford
2023-03-07 11:39         ` [COFF] " Ralph Corderoy
2023-03-07 18:31           ` [COFF] " Grant Taylor via COFF
2023-03-08 11:22           ` Ed Bradford
2023-03-07 16:14         ` Dan Cross
2023-03-07 17:34           ` [COFF] " Ralph Corderoy
2023-03-07 18:33             ` [COFF] " Dan Cross
2023-03-07  4:19     ` Ed Bradford

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230303134215.3ED63215AA@orac.inputplus.co.uk \
    --to=ralph@inputplus.co.uk \
    --cc=coff@tuhs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).