On 3/3/23 6:42 AM, Ralph Corderoy wrote:
> I think Grant is after what Russ addresses in sentence 2.  :-)

You are mostly correct.

The motivation for this thread is very much wanting to learn "how best
to use today's regular expression implementations".  However, there is
also a part of me that wants a little understanding of why that is the
case.

> Yes, Friedl does show that wonderfully.  From long-ago memory, Friedl
> understands enough to have diagrams of NFAs and DFAs clocking through
> their inputs, showing the differences in number of states, etc.

It seems like I need to find another copy of Friedl's book.  My current
copy is boxed up for a move nearly 1k miles away.  :-/

> Yes, Friedl says an NFA must recursively backtrack.  As Russ says in
> #3, it was a ‘widespread belief’.  Friedl didn't originate it; I
> ‘knew’ it before reading his book.  Friedl was at the sharp end of
> regexps, needing to process large amounts of text, at Yahoo! IIRC.
> He investigated how the programs available behaved; he didn't start
> at the theory and come up with a new program best suited to his
> needs.

It sounds like I'm coming from a similar position of "what is the best*
way to process this corpus" more than "what is the underlying theory
behind what I want to do".

> Russ's stuff is great.  He refuted that widespread belief, for one
> thing.  But Russ isn't trying to teach a programmer how best to use
> the regexp engine in sed, grep, egrep, Perl, PCRE, ...  whereas
> Friedl takes the many pages needed to do this.  :-)
>
> It depends what one wants to learn first.

I'm learning that I'm more of a technician who wants to know how to use
the existing tools to the best of his / their ability, while having
some interest in the theory behind things.

> As Friedl says in the post Russ linked to:
>
>    ‘As a user, you don't care if it's regular, nonregular, unregular,
>    irregular, or incontinent.
>    So long as you know what you can expect from it (something this
>    chapter will show you), you know all you need to care about.’

Yep.  That's the position I would be in if someone were paying me to
write the REs that I'm writing.

>    ‘For those wishing to learn more about the theory of regular
>    expressions, the classic computer-science text is chapter 3 of
>    Aho, Sethi, and Ullman's Compilers — Principles, Techniques, and
>    Tools (Addison-Wesley, 1986), commonly called “The Dragon Book”
>    due to the cover design.  More specifically, this is the “red
>    dragon”.  The “green dragon” is its predecessor, Aho and Ullman's
>    Principles of Compiler Design.’

This all sounds interesting to me, and like something I might add to my
collection of books.  But it also sounds like an uphill read and a vast
learning opportunity.

> In addition to the Dragon Book, Hopcroft and Ullman's ‘Automata
> Theory, Languages, and Computation’ goes further into the subject.
> Chapter two has DFA, NFA, epsilon transitions, and uses searching
> text as an example.  Chapter three is regular expressions, four is
> regular languages.  Pushdown automata is chapter six.
>
> Too many books, not enough time to read.  :-)

Yep.  Even inventorying and keeping track of the books can be time
consuming.  Thankfully I took some time to do exactly that, and I have
access to that information on the supercomputer in my pocket.

-- 
Grant. . . .
unix || die
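P.S.  For anyone following along, here is a minimal sketch of the
backtracking behavior under discussion.  The pattern and sizes are just
illustrative; I'm using Python because its `re` module is a classic
backtracking engine of the kind Friedl documents, whereas a
Thompson-NFA engine (grep, RE2, Go's regexp) handles the same input in
time linear in the length of the text.

```python
import re
import time

# A pathological pattern for a backtracking engine: the nested
# quantifiers let the engine try exponentially many ways to split the
# run of 'a's between the inner and outer '+' before giving up.
pattern = re.compile(r'(a+)+$')

for n in (18, 20, 22):
    text = 'a' * n + 'b'        # the trailing 'b' forces a failed match
    start = time.perf_counter()
    pattern.match(text)          # returns None, but only after
    elapsed = time.perf_counter() - start  # exploring ~2**n paths
    print(f'n={n}: {elapsed:.3f}s')
```

Each two-character increase in the input roughly quadruples the match
time, which is the exponential blow-up that the "an NFA must
recursively backtrack" belief treats as unavoidable.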