Computer Old Farts Forum
From: Grant Taylor via COFF <coff@tuhs.org>
To: coff@tuhs.org
Subject: [COFF] Re: Requesting thoughts on extended regular expressions in grep.
Date: Fri, 3 Mar 2023 12:19:29 -0700
Message-ID: <21e8477c-c388-7b90-ed10-21c7f76f0892@spamtrap.tnetconsulting.net>
In-Reply-To: <20230303134215.3ED63215AA@orac.inputplus.co.uk>

On 3/3/23 6:42 AM, Ralph Corderoy wrote:
> I think Grant is after what Russ addresses in sentence 2.  :-)

You are mostly correct.  The motivation for this thread is very much
wanting to learn "how best to use today's regular expression
implementations".  However, part of me also wants a little bit of
understanding of why that is the best way.

> Yes, Friedl does show that wonderfully.  From long-ago memory, Friedl
> understands enough to have diagrams of NFAs and DFAs clocking through
> their inputs, showing the differences in number of states, etc.

It seems like I need to find another copy of Friedl's book.  --  My 
current copy is boxed up for a move nearly 1k miles away.  :-/

> Yes, Friedl says an NFA must recursively backtrack.  As Russ says in #3,
> it was a ‘widespread belief’.  Friedl didn't originate it; I ‘knew’ it
> before reading his book.  Friedl was at the sharp end of regexps,
> needing to process large amounts of text, at Yahoo! IIRC.  He
> investigated how the programs available behaved; he didn't start at the
> theory and come up with a new program best suited to his needs.

It sounds like I'm coming from a similar position of "what is the best* 
way to process this corpus" more than "what is the underlying theory 
behind what I'm wanting to do".

> Russ's stuff is great.  He refuted that widespread belief, for one
> thing.  But Russ isn't trying to teach a programmer how to best use the
> regexp engine in sed, grep, egrep, Perl, PCRE, ... whereas Friedl takes
> the many pages needed to do this.

:-)

> It depends what one wants to learn first.

I'm learning that I'm more of a technician who wants to know how to use
the existing tools to the best of their ability, while having some
interest in the theory behind things.

> As Friedl says in the post Russ linked to:
> 
>     ‘As a user, you don't care if it's regular, nonregular, unregular,
>      irregular, or incontinent.  So long as you know what you can expect
>      from it (something this chapter will show you), you know all you need
>      to care about.

Yep.  That's the position that I would be in if someone were paying me 
to write the REs that I'm writing.

>     ‘For those wishing to learn more about the theory of regular expressions,
>      the classic computer-science text is chapter 3 of Aho, Sethi, and
>      Ullman's Compilers — Principles, Techniques, and Tools (Addison-Wesley,
>      1986), commonly called “The Dragon Book” due to the cover design.
>      More specifically, this is the “red dragon”.  The “green dragon”
>      is its predecessor, Aho and Ullman's Principles of Compiler Design.’

This all sounds interesting to me, and like something I might add to my
collection of books.  But it also sounds like something that will be an
uphill read and a vast learning opportunity.

> In addition to the Dragon Book, Hopcroft and Ullman's ‘Automata Theory,
> Languages, and Computation’ goes further into the subject.  Chapter two
> has DFA, NFA, epsilon transitions, and uses searching text as an
> example.  Chapter three is regular expressions, four is regular
> languages.  Pushdown automata is chapter six.
> 
> Too many books, not enough time to read.  :-)

Yep.  Even inventorying and keeping track of the books can be
time-consuming.  --  Thankfully I took some time to do exactly that and
have access to that information on the supercomputer in my pocket.



-- 
Grant. . . .
unix || die

