From: "Rudolf Sykora" <rudolf.sykora@gmail.com>
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@9fans.net>
Subject: Re: [9fans] non greedy regular expressions
Date: Fri, 24 Oct 2008 21:56:57 +0200 [thread overview]
Message-ID: <a560a5d00810241256q6ca59f46n4f0d70cf63475f51@mail.gmail.com> (raw)
In-Reply-To: <dd6fe68a0810241101s6d8a9e57n80e28550ca80255@mail.gmail.com>
> In that model, it is not accurate to describe the * + ?
> operators as greedy or not. None of them is working
> toward any goal other than the overall longest match
> at the leftmost position.
So then I must be mistaken about the terminology.
I thought greedy=leftmost-longest, while non-greedy=leftmost-first:
Having
bla bla (AB)(CDEF)(GH)
/\(.*\)/
matches the whole (AB)(CDEF)(GH), 'greedy', while if '*' were
'non-greedy', I'd expect a match with (AB). This is now not in Plan9.
This mentioned example is easy to solve, if I want to go in 'bracket'
steps with
/\([^)]*\)/
but there are harder examples, eg. where the_interesting_part is
delimited with a more complicated structure ('ABC' here):
ABCthe_interesting_partABC blabla bla ABCthe_interesting_partABC etc.
Now, how to parse it? With non-greedy '*', no problem. With 'greedy'
and no negative lookahead assertion? Ok, maybe 'y' in sam could help
(don't know). What about
ABCthe_interesting_partCBA blabla bla ABCthe_interesting_partCBA etc.?
All the thinking about this is simply removed with 'non-greedy' ops.
Ruda
> In Perl and its imitators, the match starts at the leftmost
> position but is otherwise the first one that is found,
> not necessarily the longest. In that context, words like
> "greedy" and "non-greedy" start to make sense,
> because the behavior of any one operator influences which
> match is encountered first.
>
> Either approach -- leftmost-longest or leftmost-first --
> can be implemented using finite automata.
>
> Russ
>
>
next prev parent reply other threads:[~2008-10-24 19:56 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-10-23 18:58 Rudolf Sykora
2008-10-23 19:05 ` erik quanstrom
2008-10-24 8:08 ` Rudolf Sykora
2008-10-24 12:23 ` erik quanstrom
2008-10-24 16:11 ` Rudolf Sykora
2008-10-24 16:54 ` erik quanstrom
2008-10-24 17:02 ` John Stalker
2008-10-24 17:15 ` Rob Pike
2008-10-24 17:41 ` Rudolf Sykora
2008-10-24 18:01 ` Russ Cox
2008-10-24 19:56 ` Rudolf Sykora [this message]
2008-10-24 21:10 ` Russ Cox
2008-10-24 21:40 ` Rudolf Sykora
2008-10-24 21:47 ` erik quanstrom
2008-10-24 22:04 ` Rudolf Sykora
2008-10-24 22:38 ` Gabriel Diaz Lopez de la Llave
2008-10-24 22:54 ` Charles Forsyth
2008-10-24 22:59 ` Charles Forsyth
2008-10-24 23:52 ` Tom Simons
2008-10-25 22:35 ` Rudolf Sykora
2008-10-25 23:02 ` Steve Simon
2008-10-26 8:57 ` John Stalker
2008-10-26 18:36 ` Eris Discordia
2008-10-27 4:55 ` Russ Cox
2008-10-27 8:28 ` Rudolf Sykora
2008-10-27 10:18 ` Charles Forsyth
2008-10-27 13:13 ` Eris Discordia
2008-10-27 13:23 ` erik quanstrom
2008-10-27 19:42 ` Eris Discordia
2008-10-27 16:13 ` Brian L. Stuart
2008-11-30 8:29 ` Yard Ape
2008-12-11 16:32 ` Rudolf Sykora
2008-10-24 18:02 ` John Stalker
2008-10-24 17:10 ` Uriel
2008-10-24 19:56 ` Charles Forsyth
2008-10-24 19:56 ` Rudolf Sykora
2008-10-26 21:23 ` Rob Pike
2008-10-24 11:27 Aharon Robbins
2008-10-27 19:23 Aharon Robbins
2008-10-27 20:15 ` Eris Discordia
2008-10-27 20:00 Eris Discordia
2008-10-28 14:51 ` Brian L. Stuart
2008-10-28 15:07 ` Eris Discordia
2008-10-27 21:08 Aharon Robbins
2008-10-28 14:53 ` Eris Discordia
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a560a5d00810241256q6ca59f46n4f0d70cf63475f51@mail.gmail.com \
--to=rudolf.sykora@gmail.com \
--cc=9fans@9fans.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).