From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 27 Oct 2008 13:13:57 +0000
From: Eris Discordia <eris.discordia@gmail.com>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Message-ID: <7F426112C3E059066F0024AE@[192.168.1.2]>
In-Reply-To: <45432371edb2231a8b90975a0efdf7bb@terzarima.net>
References: <45432371edb2231a8b90975a0efdf7bb@terzarima.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
Subject: Re: [9fans] non greedy regular expressions
Topicbox-Message-UUID: 27f10c26-ead4-11e9-9d60-3106f5b1d025

> practical application.  now there are big books on `regular expressions'
> mainly because they are no longer regular but a big collection of ad-hoc

I thought they were "regular" because they "regularly" occurred in the=20
target text. Turns out other interpretations are possible. Though, mine has =

the advantage of being correct :-D

The set of "big books on regular expressions" includes Jeffrey Friedl's=20
"Mastering Regular Expressions" that happens to contain a chapter by the=20
title "NFA, DFA, and POSIX" wherein he says:

> DFA Speed with NFA Capabilities: Regex Nirvana?
>
> I've said several times that a DFA can't provide capturing parentheses or
> backreferences. This is quite true, but it certainly doesn't preclude
> hybrid approaches that mix technologies in an attempt to reach regex
> nirvana. The sidebar in Section 4.6.3.1 told how NFAs have diverged from
> the theoretical straight and narrow in search of more power, and it's
> only natural that the same happens with DFAs. A DFA's construction makes
> it more difficult, but that doesn't mean impossible.
>
> GNU grep takes a simple but effective approach. It uses a DFA when
> possible, reverting to an NFA when backreferences are used. GNU awk does
> something similar=E2=80=94it uses GNU grep's fast shortest-leftmost DFA =
engine
> for simple "does it match" checks, and reverts to a different engine for
> checks where the actual extent of the match must be known. Since that
> other engine is an NFA, GNU awk can conveniently offer capturing
> parentheses, and it does via its special gensub function.
>
> Tcl's regex engine is a true hybrid, custom built by Henry Spencer (whom
> you may remember having played an important part in the early development
> and popularization of regular expressions see Section 3.1.1.6). The Tcl
> engine sometimes appears to be an NFA=E2=80=94 it has lookaround, =
capturing
> parentheses, backreferences, and lazy quantifiers. Yet, it has true POSIX
> longest-leftmost match (see Section 4.6.1), and doesn't suffer from some
> of the NFA problems that we'll see in Chapter 6. It really seems quite
> wonderful.
>
> Currently, this engine is available only to Tcl, but Henry tells me that
> it's on his to-do list to break it out into a separate package that can
> be used by others.

Again, turns out the "big books on regular expressions" can give the=20
lowlife--that's me--things "hackers" deny them.

--On Monday, October 27, 2008 10:18 AM +0000 Charles Forsyth=20
<forsyth@terzarima.net> wrote:

>> Both systems are complex enough that essentially no one completely
>> understands them.
>
> this touches on an important point. the first introduction of regular
> expressions to editors was great, because it took some formal language
> theory and made it useful in an `every day' way. conversely, the theory
> provided significant underpinning (in a structural sense) to that
> practical application.  now there are big books on `regular expressions'
> mainly because they are no longer regular but a big collection of ad-hoc
> additions. (i think there is a similarity to Perl's relationship to the
> history of programming languages.) the result really is hard to reason
> about ... because it hasn't got that underpinning and the algebra has
> gone. it's worse than that: the code is a mess too, i supposed in
> consequence.
>