9fans - fans of the OS Plan 9 from Bell Labs
 help / color / mirror / Atom feed
From: "Rudolf Sykora" <rudolf.sykora@gmail.com>
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@9fans.net>
Subject: Re: [9fans] non greedy regular expressions
Date: Fri, 24 Oct 2008 21:56:57 +0200	[thread overview]
Message-ID: <a560a5d00810241256q6ca59f46n4f0d70cf63475f51@mail.gmail.com> (raw)
In-Reply-To: <dd6fe68a0810241101s6d8a9e57n80e28550ca80255@mail.gmail.com>

> In that model, it is not accurate to describe the * + ?
> operators as greedy or not.  None of them is working
> toward any goal other than the overall longest match
> at the leftmost position.

So then I must be mistaken about the terminology.
I thought greedy=leftmost-longest, while non-greedy=leftmost-first:
Having
bla bla (AB)(CDEF)(GH)
/\(.*\)/
matches the whole (AB)(CDEF)(GH), 'greedy', while if '*' were
'non-greedy', I'd expect a match with (AB). This is now not in Plan9.
This mentioned example is easy to solve, if I want to go in 'bracket'
steps with
/\([^)]*\)/
but there are harder examples, eg. where the_interesting_part is
delimited with a more complicated structure ('ABC' here):
ABCthe_interesting_partABC blabla bla ABCthe_interesting_partABC etc.
Now, how to parse it? With non-greedy '*', no problem. With 'greedy'
and no negative lookahead assertion? Ok, maybe 'y' in sam could help
(don't know). What about
ABCthe_interesting_partCBA blabla bla ABCthe_interesting_partCBA etc.?
All the thinking about this is simply removed with 'non-greedy' ops.

Ruda



> In Perl and its imitators, the match starts at the leftmost
> position but is otherwise the first one that is found,
> not necessarily the longest.  In that context, words like
> "greedy" and "non-greedy" start to make sense,
> because the behavior of any one operator influences which
> match is encountered first.
>
> Either approach -- leftmost-longest or leftmost-first --
> can be implemented using finite automata.
>
> Russ
>
>



  reply	other threads:[~2008-10-24 19:56 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-23 18:58 Rudolf Sykora
2008-10-23 19:05 ` erik quanstrom
2008-10-24  8:08   ` Rudolf Sykora
2008-10-24 12:23     ` erik quanstrom
2008-10-24 16:11       ` Rudolf Sykora
2008-10-24 16:54         ` erik quanstrom
2008-10-24 17:02         ` John Stalker
2008-10-24 17:15           ` Rob Pike
2008-10-24 17:41           ` Rudolf Sykora
2008-10-24 18:01             ` Russ Cox
2008-10-24 19:56               ` Rudolf Sykora [this message]
2008-10-24 21:10                 ` Russ Cox
2008-10-24 21:40                   ` Rudolf Sykora
2008-10-24 21:47                     ` erik quanstrom
2008-10-24 22:04                       ` Rudolf Sykora
2008-10-24 22:38                         ` Gabriel Diaz Lopez de la Llave
2008-10-24 22:54                         ` Charles Forsyth
2008-10-24 22:59                           ` Charles Forsyth
2008-10-24 23:52                         ` Tom Simons
2008-10-25 22:35                           ` Rudolf Sykora
2008-10-25 23:02                             ` Steve Simon
2008-10-26  8:57                             ` John Stalker
2008-10-26 18:36                               ` Eris Discordia
2008-10-27  4:55                             ` Russ Cox
2008-10-27  8:28                               ` Rudolf Sykora
2008-10-27 10:18                               ` Charles Forsyth
2008-10-27 13:13                                 ` Eris Discordia
2008-10-27 13:23                                   ` erik quanstrom
2008-10-27 19:42                                     ` Eris Discordia
2008-10-27 16:13                                   ` Brian L. Stuart
2008-11-30  8:29                             ` Yard Ape
2008-12-11 16:32                               ` Rudolf Sykora
2008-10-24 18:02             ` John Stalker
2008-10-24 17:10         ` Uriel
2008-10-24 19:56         ` Charles Forsyth
2008-10-24 19:56           ` Rudolf Sykora
2008-10-26 21:23             ` Rob Pike
2008-10-24 11:27 Aharon Robbins
2008-10-27 19:23 Aharon Robbins
2008-10-27 20:15 ` Eris Discordia
2008-10-27 20:00 Eris Discordia
2008-10-28 14:51 ` Brian L. Stuart
2008-10-28 15:07   ` Eris Discordia
2008-10-27 21:08 Aharon Robbins
2008-10-28 14:53 ` Eris Discordia

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a560a5d00810241256q6ca59f46n4f0d70cf63475f51@mail.gmail.com \
    --to=rudolf.sykora@gmail.com \
    --cc=9fans@9fans.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).