From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID:
Date: Fri, 24 Oct 2008 18:11:10 +0200
From: "Rudolf Sykora"
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@9fans.net>
In-Reply-To: <765ef13a653652d5fcef9001ff70f814@quanstro.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <765ef13a653652d5fcef9001ff70f814@quanstro.net>
Subject: Re: [9fans] non greedy regular expressions
Topicbox-Message-UUID: 262a703a-ead4-11e9-9d60-3106f5b1d025

> well reading the code would be a travesty. it's curious
> that neither the sam paper nor regexp(6) mentions
> submatches. maybe i missed them.
>
> sed -n 's:.*(KRAK[A-Z]+*) +([a-zA-Z]+).*:\2, \1:gp'

- erik

Ok, so despite the documentation, some submatch tracking is there.
But in all (?) of your examples, as well as in the scripts you mentioned,
this tracking is used exclusively with the s command (which is said to be
unnecessary, at least in sam/acme).

If I try something like
/( b(.)b)/a/\1\2/
on
bla blb 56
I get
bla blb\1\2 56
which is not quite what I want... How then?
(I'd like to get 'bla blblblb 56')

Further, in R. Cox's text (http://swtch.com/~rsc/regexp/regexp1.html)
he claims that all the nice features except for backreferences can be
implemented with Thompson's NFA algorithm, and that even backreferences
can be handled gracefully somehow. That is, ALL of non-greedy operators,
generalized assertions, counted repetitions, and character classes CAN be
processed using the fast algorithm.

Why, then, don't we have it?

I once wrote a program in python and was pretty happy to have non-greedy
operators and lookahead assertions on hand. Had I not had them, I probably
wouldn't have been able to write it (nicely).

Ruda
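
For illustration only, here is a minimal sketch using Python's re module (not sam or sed) of the three things asked about in the message above: submatches reused in appended text, non-greedy operators, and lookahead assertions. The sample string "<a><b>" and all variable names are made up for the example. The second substitution is only a guess at a grouping and order that yields the target string quoted in the message.

```python
import re

text = "bla blb 56"

# What /( b(.)b)/a/\1\2/ would give if the appended text expanded
# submatches: \g<0> keeps the whole match, then \1 and \2 follow it.
literal = re.sub(r"( b(.)b)", r"\g<0>\1\2", text)
print(literal)   # bla blb blbl 56

# Assumed variant: no leading space in the group, and \2 before \1.
# This one produces the string quoted in the message.
variant = re.sub(r"(b(.)b)", r"\g<0>\2\1", text)
print(variant)   # bla blblblb 56

# Greedy vs non-greedy repetition on a hypothetical sample string.
sample = "<a><b>"
greedy = re.search(r"<.+>", sample).group(0)    # longest match
lazy = re.search(r"<.+?>", sample).group(0)     # shortest match
print(greedy, lazy)   # <a><b> <a>

# Lookahead: "bl" only where a "b" follows, without consuming that "b".
ahead = [m.start() for m in re.finditer(r"bl(?=b)", text)]
print(ahead)   # [4]
```

Note that Python's re is a backtracking matcher, so this shows only the semantics of these operators, not the NFA-based implementation discussed in R. Cox's text.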