From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <a560a5d00810240911gee79793h966c6f325ce19f97@mail.gmail.com>
Date: Fri, 24 Oct 2008 18:11:10 +0200
From: "Rudolf Sykora" <rudolf.sykora@gmail.com>
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@9fans.net>
In-Reply-To: <765ef13a653652d5fcef9001ff70f814@quanstro.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <a560a5d00810240108q62854ec1w16614c90c7071436@mail.gmail.com>
	<765ef13a653652d5fcef9001ff70f814@quanstro.net>
Subject: Re: [9fans] non greedy regular expressions
Topicbox-Message-UUID: 262a703a-ead4-11e9-9d60-3106f5b1d025

> well reading the code would be a travesty.  it's curious
> that neither the sam paper nor regexp(6) mentions
> submatches.  maybe i missed them.
>
> sed -n 's:.*(KRAK[A-Z]+*) +([a-zA-Z]+).*:\2, \1:gp' </lib/volcanoes
> - erik

Ok, so despite the documentation, some submatch tracking is there.
But in all (?) of your examples, as well as in the scripts you mentioned,
this tracking is used exclusively with the s command (which is said to
be unnecessary, at least in sam/acme). If I try something like
/( b(.)b)/a/\1\2/
on
bla blb 56
I get
bla blb\1\2 56
which is not quite what I want... How do I do it, then? (I'd like to
get 'bla blblblb 56')
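
For comparison, a minimal sketch of the same idea in Python's re module
(not sam; the variable names are only illustrative). It appends text built
from the submatches after the match. Taken literally, \1\2 with the groups
above expands to ' blbl'; the variant that drops the leading space from the
group and uses \2\1 is an assumption about what was intended, and it does
yield 'bla blblblb 56'.

```python
import re

text = "bla blb 56"

# \g<0> is the whole match, so the replacement re-emits the match and
# then appends text built from the submatches (an "append" via substitute).
literal = re.sub(r"( b(.)b)", r"\g<0>\1\2", text)

# Assumed variant: no leading space inside the group, submatches as \2\1.
wanted = re.sub(r"(b(.)b)", r"\g<0>\2\1", text)

print(literal)  # bla blb blbl 56
print(wanted)   # bla blblblb 56
```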

Further, in R. Cox's text (http://swtch.com/~rsc/regexp/regexp1.html)
he claims that all the nice features except backreferences can be
implemented with Thompson's NFA algorithm, and that even
backreferences can be handled gracefully somehow. That is, ALL of
non-greedy operators, generalized assertions, counted repetitions and
character classes CAN be processed using the fast algorithm. Why, then,
don't we have them? I once wrote a program in Python and was pretty
happy to have non-greedy operators and lookahead assertions at hand.
Had I not had them, I probably wouldn't have been able to write it
(nicely).
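
To make the two Python features concrete, here is a small sketch with
made-up input strings. Note that Python's re is a backtracking engine, so
this only shows what the operators do, not the NFA approach from the Cox
text.

```python
import re

tags = "<a><b>"

# Greedy: .+ grabs as much as possible, so both tags come back as one match.
greedy = re.findall(r"<.+>", tags)

# Non-greedy: .+? stops at the first '>' that lets the match succeed.
lazy = re.findall(r"<.+?>", tags)

# Lookahead: match a word only if '(' follows, without consuming the '('.
calls = re.findall(r"\w+(?=\()", "foo(bar) baz(qux)")

print(greedy)  # ['<a><b>']
print(lazy)    # ['<a>', '<b>']
print(calls)   # ['foo', 'baz']
```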

Ruda