9fans - fans of the OS Plan 9 from Bell Labs
From: Eris Discordia <eris.discordia@gmail.com>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Subject: Re: [9fans] non greedy regular expressions
Date: Mon, 27 Oct 2008 20:15:11 +0000
Message-ID: <0F714DF8E3B2E24FC9148DBC@[192.168.1.2]>
In-Reply-To: <200810271923.m9RJNqSQ004211@skeeve.com>

> As other mails have pointed out, anything that isn't leftmost longest
> has weird semantics.  Non-greedy operators are mostly syntactic sugar.

Is (leftmost-longest + all-greedy operators) syntactic salt then?

> Not in the least. The Plan 9 regexp library in fact gives you close to
> the same nirvana; an automaton that has DFA speed characteristics with
> the NFA's ability to capture sub texts.

Does regexp(n) also give the lowlife any hint of why it should behave
differently from Perl? Friedl's book doesn't, but it has good reason.

> Friedl's book is good for what it aims to be: an introduction to
> regular expressions.  But scientifically rigid (as in, on the same
> order as the dragon book) it's definitely not.

That's good. No problem with "big books on regular expressions," then.
Introductory ones, or scientific ones.

--On Monday, October 27, 2008 9:23 PM +0200 Aharon Robbins
<arnold@skeeve.com> wrote:

>> > GNU grep takes a simple but effective approach. It uses a DFA when
>> > possible, reverting to an NFA when backreferences are used. GNU awk
>> > does something similar---it uses GNU grep's fast shortest-leftmost DFA
>> > engine for simple "does it match" checks, and reverts to a different
>> > engine for checks where the actual extent of the match must be known.
>> > Since that other engine is an NFA, GNU awk can conveniently offer
>> > capturing parentheses, and it does via its special gensub function.
>
> I'll be the first to admit that gawk's behavior is something of a hack.
> I'd much rather be able to use a single matcher. (At least I was able to
> get Friedl to describe gawk's implementation correctly. :-)
>
>> > Tcl's regex engine is a true hybrid, custom built by Henry Spencer ....
>> >
>> > Currently, this engine is available only to Tcl, but Henry tells me
>> > that it's on his to-do list to break it out into a separate package
>> > that can be used by others.
>
> It's been on that TODO list for well over a decade, IIRC. More vaporware
> as far as a broader community is concerned.
>
>> Again, turns out the "big books on regular expressions" can give the
>> lowlife--that's me--things "hackers" deny them.
>
> Not in the least. The Plan 9 regexp library in fact gives you close to
> the same nirvana; an automaton that has DFA speed characteristics with
> the NFA's ability to capture sub texts.
>
> As other mails have pointed out, anything that isn't leftmost longest
> has weird semantics.  Non-greedy operators are mostly syntactic sugar.
> I'll agree that in practice they're likely to be useful, but as I'm pretty
> much an awk guy (<grin>) I've never felt the pain of their lack, either.
>
> Gawk is stuck with its current two-matcher approach since it provides
> the option for different syntaxes, but that approach also has lots of
> problems; more than once I've come across cases where they interpret the
> same syntax differently, or don't handle multibyte characters the same.
> (Not to mention that the syntax bits API in the GNU matchers is a
> real nightmare. Another thing that's too late to change.)
>
> If I could start over again, I'd start with either the Plan 9 lib or
> ripping out Henry's library from Tcl, but the former is probably easier.
>
> Friedl's book is good for what it aims to be: an introduction to
> regular expressions.  But scientifically rigid (as in, on the same
> order as the dragon book) it's definitely not.
>
> Russ's paper describes the state of the world pretty well.
>
> Arnold
>
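
For readers following the leftmost-longest versus non-greedy argument, here is a minimal sketch of the difference. It assumes Python's re module as a stand-in for a Perl-style backtracking matcher (leftmost-first); the helper leftmost_longest is hypothetical, a brute-force emulation of the POSIX rule written for illustration, and is not how the Plan 9 library, GNU grep, or gawk implement it.

```python
import re

def leftmost_longest(pattern, text):
    """Brute-force emulation of leftmost-longest matching (illustrative only).

    Tries each start position left to right; at the first start that can
    match at all, returns the longest substring the pattern fully matches.
    """
    rx = re.compile(pattern)
    for start in range(len(text) + 1):
        # Try the longest candidate end first, then shrink.
        for end in range(len(text), start - 1, -1):
            if rx.fullmatch(text, start, end):
                return text[start:end]
    return None

# Leftmost-first (Perl-style): the first alternative that works wins.
first = re.search(r"a|ab", "ab").group()

# Leftmost-longest (POSIX-style): the longest match at that position wins.
longest = leftmost_longest(r"a|ab", "ab")

# Greedy versus non-greedy quantifiers under a backtracking matcher.
greedy = re.search(r"<.+>", "<a><b>").group()
lazy = re.search(r"<.+?>", "<a><b>").group()

print(first, longest, greedy, lazy)
```

Under leftmost-first rules the order of alternatives changes the result, which is the kind of semantics the quoted message calls weird; under leftmost-longest it does not. The brute-force helper is quadratic in the number of match attempts and is meant only to show the rule, not to be used as a matcher.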


