From: Eris Discordia <eris.discordia@gmail.com>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Subject: Re: [9fans] non greedy regular expressions
Date: Mon, 27 Oct 2008 13:13:57 +0000 [thread overview]
Message-ID: <7F426112C3E059066F0024AE@[192.168.1.2]> (raw)
In-Reply-To: <45432371edb2231a8b90975a0efdf7bb@terzarima.net>
> practical application. now there are big books on `regular expressions'
> mainly because they are no longer regular but a big collection of ad-hoc
I thought they were "regular" because they "regularly" occurred in the
target text. Turns out other interpretations are possible. Though, mine has
the advantage of being correct :-D
The set of "big books on regular expressions" includes Jeffrey Friedl's
"Mastering Regular Expressions" that happens to contain a chapter by the
title "NFA, DFA, and POSIX" wherein he says:
> DFA Speed with NFA Capabilities: Regex Nirvana?
>
> I've said several times that a DFA can't provide capturing parentheses or
> backreferences. This is quite true, but it certainly doesn't preclude
> hybrid approaches that mix technologies in an attempt to reach regex
> nirvana. The sidebar in Section 4.6.3.1 told how NFAs have diverged from
> the theoretical straight and narrow in search of more power, and it's
> only natural that the same happens with DFAs. A DFA's construction makes
> it more difficult, but that doesn't mean impossible.
>
> GNU grep takes a simple but effective approach. It uses a DFA when
> possible, reverting to an NFA when backreferences are used. GNU awk does
> something similar—it uses GNU grep's fast shortest-leftmost DFA engine
> for simple "does it match" checks, and reverts to a different engine for
> checks where the actual extent of the match must be known. Since that
> other engine is an NFA, GNU awk can conveniently offer capturing
> parentheses, and it does via its special gensub function.
>
> Tcl's regex engine is a true hybrid, custom built by Henry Spencer (whom
> you may remember having played an important part in the early development
> and popularization of regular expressions see Section 3.1.1.6). The Tcl
> engine sometimes appears to be an NFA— it has lookaround, capturing
> parentheses, backreferences, and lazy quantifiers. Yet, it has true POSIX
> longest-leftmost match (see Section 4.6.1), and doesn't suffer from some
> of the NFA problems that we'll see in Chapter 6. It really seems quite
> wonderful.
>
> Currently, this engine is available only to Tcl, but Henry tells me that
> it's on his to-do list to break it out into a separate package that can
> be used by others.
Again, turns out the "big books on regular expressions" can give the
lowlife--that's me--things "hackers" deny them.
--On Monday, October 27, 2008 10:18 AM +0000 Charles Forsyth
<forsyth@terzarima.net> wrote:
>> Both systems are complex enough that essentially no one completely
>> understands them.
>
> this touches on an important point. the first introduction of regular
> expressions to editors was great, because it took some formal language
> theory and made it useful in an `every day' way. conversely, the theory
> provided significant underpinning (in a structural sense) to that
> practical application. now there are big books on `regular expressions'
> mainly because they are no longer regular but a big collection of ad-hoc
> additions. (i think there is a similarity to Perl's relationship to the
> history of programming languages.) the result really is hard to reason
> about ... because it hasn't got that underpinning and the algebra has
> gone. it's worse than that: the code is a mess too, i supposed in
> consequence.
>
next prev parent reply other threads:[~2008-10-27 13:13 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-10-23 18:58 Rudolf Sykora
2008-10-23 19:05 ` erik quanstrom
2008-10-24 8:08 ` Rudolf Sykora
2008-10-24 12:23 ` erik quanstrom
2008-10-24 16:11 ` Rudolf Sykora
2008-10-24 16:54 ` erik quanstrom
2008-10-24 17:02 ` John Stalker
2008-10-24 17:15 ` Rob Pike
2008-10-24 17:41 ` Rudolf Sykora
2008-10-24 18:01 ` Russ Cox
2008-10-24 19:56 ` Rudolf Sykora
2008-10-24 21:10 ` Russ Cox
2008-10-24 21:40 ` Rudolf Sykora
2008-10-24 21:47 ` erik quanstrom
2008-10-24 22:04 ` Rudolf Sykora
2008-10-24 22:38 ` Gabriel Diaz Lopez de la Llave
2008-10-24 22:54 ` Charles Forsyth
2008-10-24 22:59 ` Charles Forsyth
2008-10-24 23:52 ` Tom Simons
2008-10-25 22:35 ` Rudolf Sykora
2008-10-25 23:02 ` Steve Simon
2008-10-26 8:57 ` John Stalker
2008-10-26 18:36 ` Eris Discordia
2008-10-27 4:55 ` Russ Cox
2008-10-27 8:28 ` Rudolf Sykora
2008-10-27 10:18 ` Charles Forsyth
2008-10-27 13:13 ` Eris Discordia [this message]
2008-10-27 13:23 ` erik quanstrom
2008-10-27 19:42 ` Eris Discordia
2008-10-27 16:13 ` Brian L. Stuart
2008-11-30 8:29 ` Yard Ape
2008-12-11 16:32 ` Rudolf Sykora
2008-10-24 18:02 ` John Stalker
2008-10-24 17:10 ` Uriel
2008-10-24 19:56 ` Charles Forsyth
2008-10-24 19:56 ` Rudolf Sykora
2008-10-26 21:23 ` Rob Pike
2008-10-24 11:27 Aharon Robbins
2008-10-27 19:23 Aharon Robbins
2008-10-27 20:15 ` Eris Discordia
2008-10-27 20:00 Eris Discordia
2008-10-28 14:51 ` Brian L. Stuart
2008-10-28 15:07 ` Eris Discordia
2008-10-27 21:08 Aharon Robbins
2008-10-28 14:53 ` Eris Discordia
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='7F426112C3E059066F0024AE@[192.168.1.2]' \
--to=eris.discordia@gmail.com \
--cc=9fans@9fans.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).