9fans - fans of the OS Plan 9 from Bell Labs
 help / color / mirror / Atom feed
From: Eris Discordia <eris.discordia@gmail.com>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Subject: Re: [9fans] non greedy regular expressions
Date: Mon, 27 Oct 2008 13:13:57 +0000	[thread overview]
Message-ID: <7F426112C3E059066F0024AE@[192.168.1.2]> (raw)
In-Reply-To: <45432371edb2231a8b90975a0efdf7bb@terzarima.net>

> practical application.  now there are big books on `regular expressions'
> mainly because they are no longer regular but a big collection of ad-hoc

I thought they were "regular" because they "regularly" occurred in the 
target text. Turns out other interpretations are possible. Though, mine has 
the advantage of being correct :-D

The set of "big books on regular expressions" includes Jeffrey Friedl's 
"Mastering Regular Expressions" that happens to contain a chapter by the 
title "NFA, DFA, and POSIX" wherein he says:

> DFA Speed with NFA Capabilities: Regex Nirvana?
>
> I've said several times that a DFA can't provide capturing parentheses or
> backreferences. This is quite true, but it certainly doesn't preclude
> hybrid approaches that mix technologies in an attempt to reach regex
> nirvana. The sidebar in Section 4.6.3.1 told how NFAs have diverged from
> the theoretical straight and narrow in search of more power, and it's
> only natural that the same happens with DFAs. A DFA's construction makes
> it more difficult, but that doesn't mean impossible.
>
> GNU grep takes a simple but effective approach. It uses a DFA when
> possible, reverting to an NFA when backreferences are used. GNU awk does
> something similar—it uses GNU grep's fast shortest-leftmost DFA engine
> for simple "does it match" checks, and reverts to a different engine for
> checks where the actual extent of the match must be known. Since that
> other engine is an NFA, GNU awk can conveniently offer capturing
> parentheses, and it does via its special gensub function.
>
> Tcl's regex engine is a true hybrid, custom built by Henry Spencer (whom
> you may remember having played an important part in the early development
> and popularization of regular expressions see Section 3.1.1.6). The Tcl
> engine sometimes appears to be an NFA— it has lookaround, capturing
> parentheses, backreferences, and lazy quantifiers. Yet, it has true POSIX
> longest-leftmost match (see Section 4.6.1), and doesn't suffer from some
> of the NFA problems that we'll see in Chapter 6. It really seems quite
> wonderful.
>
> Currently, this engine is available only to Tcl, but Henry tells me that
> it's on his to-do list to break it out into a separate package that can
> be used by others.

Again, turns out the "big books on regular expressions" can give the 
lowlife--that's me--things "hackers" deny them.

--On Monday, October 27, 2008 10:18 AM +0000 Charles Forsyth 
<forsyth@terzarima.net> wrote:

>> Both systems are complex enough that essentially no one completely
>> understands them.
>
> this touches on an important point. the first introduction of regular
> expressions to editors was great, because it took some formal language
> theory and made it useful in an `every day' way. conversely, the theory
> provided significant underpinning (in a structural sense) to that
> practical application.  now there are big books on `regular expressions'
> mainly because they are no longer regular but a big collection of ad-hoc
> additions. (i think there is a similarity to Perl's relationship to the
> history of programming languages.) the result really is hard to reason
> about ... because it hasn't got that underpinning and the algebra has
> gone. it's worse than that: the code is a mess too, i supposed in
> consequence.
>







  reply	other threads:[~2008-10-27 13:13 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-23 18:58 Rudolf Sykora
2008-10-23 19:05 ` erik quanstrom
2008-10-24  8:08   ` Rudolf Sykora
2008-10-24 12:23     ` erik quanstrom
2008-10-24 16:11       ` Rudolf Sykora
2008-10-24 16:54         ` erik quanstrom
2008-10-24 17:02         ` John Stalker
2008-10-24 17:15           ` Rob Pike
2008-10-24 17:41           ` Rudolf Sykora
2008-10-24 18:01             ` Russ Cox
2008-10-24 19:56               ` Rudolf Sykora
2008-10-24 21:10                 ` Russ Cox
2008-10-24 21:40                   ` Rudolf Sykora
2008-10-24 21:47                     ` erik quanstrom
2008-10-24 22:04                       ` Rudolf Sykora
2008-10-24 22:38                         ` Gabriel Diaz Lopez de la Llave
2008-10-24 22:54                         ` Charles Forsyth
2008-10-24 22:59                           ` Charles Forsyth
2008-10-24 23:52                         ` Tom Simons
2008-10-25 22:35                           ` Rudolf Sykora
2008-10-25 23:02                             ` Steve Simon
2008-10-26  8:57                             ` John Stalker
2008-10-26 18:36                               ` Eris Discordia
2008-10-27  4:55                             ` Russ Cox
2008-10-27  8:28                               ` Rudolf Sykora
2008-10-27 10:18                               ` Charles Forsyth
2008-10-27 13:13                                 ` Eris Discordia [this message]
2008-10-27 13:23                                   ` erik quanstrom
2008-10-27 19:42                                     ` Eris Discordia
2008-10-27 16:13                                   ` Brian L. Stuart
2008-11-30  8:29                             ` Yard Ape
2008-12-11 16:32                               ` Rudolf Sykora
2008-10-24 18:02             ` John Stalker
2008-10-24 17:10         ` Uriel
2008-10-24 19:56         ` Charles Forsyth
2008-10-24 19:56           ` Rudolf Sykora
2008-10-26 21:23             ` Rob Pike
2008-10-24 11:27 Aharon Robbins
2008-10-27 19:23 Aharon Robbins
2008-10-27 20:15 ` Eris Discordia
2008-10-27 20:00 Eris Discordia
2008-10-28 14:51 ` Brian L. Stuart
2008-10-28 15:07   ` Eris Discordia
2008-10-27 21:08 Aharon Robbins
2008-10-28 14:53 ` Eris Discordia

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='7F426112C3E059066F0024AE@[192.168.1.2]' \
    --to=eris.discordia@gmail.com \
    --cc=9fans@9fans.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).