9fans - fans of the OS Plan 9 from Bell Labs
 help / color / mirror / Atom feed
* [9fans] Recursive structural expressions?
@ 2009-08-20 17:52 J.R. Mauro
  2009-08-20 19:53 ` erik quanstrom
  2009-08-21 14:24 ` Russ Cox
  0 siblings, 2 replies; 15+ messages in thread
From: J.R. Mauro @ 2009-08-20 17:52 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Has anyone given thought to recursive structural regular expressions?
I didn't see a way to do recursion from reading the paper and
experimenting.

The reason for recursion would be that instead of having hopped-up
regexes, we'd now have context-free expressions. Yacc needing tokens
makes doing CF things a pain, but I think if grep, sed, sam, etc
understood a simple CF expression language, we'd have something that
was (mathematically) more powerful.

One (perhaps silly) example is that you could see if your source code
had balanced parens/braces/brackets/tags in a jiffy, but other
examples include searching through records with complex structure.

I suppose it would likely be too slow, but would it cost anything when
it's not used?



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [9fans] Recursive structural expressions?
  2009-08-20 17:52 [9fans] Recursive structural expressions? J.R. Mauro
@ 2009-08-20 19:53 ` erik quanstrom
  2009-08-20 20:33   ` J.R. Mauro
  2009-08-21 14:24 ` Russ Cox
  1 sibling, 1 reply; 15+ messages in thread
From: erik quanstrom @ 2009-08-20 19:53 UTC (permalink / raw)
  To: 9fans

On Thu Aug 20 13:54:35 EDT 2009, jrm8005@gmail.com wrote:
> Has anyone given thought to recursive structural regular expressions?

i'm sure i've done them.  this is a lame example, since i can't remember
where i've used these techniques off the top of my head

,x:.*: g/#pragma/ x:[^ 	]+[ 	]: g/print/p

- erik



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [9fans] Recursive structural expressions?
  2009-08-20 19:53 ` erik quanstrom
@ 2009-08-20 20:33   ` J.R. Mauro
  2009-08-20 20:39     ` erik quanstrom
  0 siblings, 1 reply; 15+ messages in thread
From: J.R. Mauro @ 2009-08-20 20:33 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Thu, Aug 20, 2009 at 3:53 PM, erik quanstrom<quanstro@quanstro.net> wrote:
> On Thu Aug 20 13:54:35 EDT 2009, jrm8005@gmail.com wrote:
>> Has anyone given thought to recursive structural regular expressions?
>
> i'm sure i've done them.  this is a lame example, since i can't remember
> where i've used these techniques off the top of my head
>
> ,x:.*: g/#pragma/ x:[^  ]+[     ]: g/print/p

How is this recursive?

>
> - erik
>
>



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [9fans] Recursive structural expressions?
  2009-08-20 20:33   ` J.R. Mauro
@ 2009-08-20 20:39     ` erik quanstrom
  2009-08-20 21:14       ` J.R. Mauro
  0 siblings, 1 reply; 15+ messages in thread
From: erik quanstrom @ 2009-08-20 20:39 UTC (permalink / raw)
  To: 9fans

> > i'm sure i've done them.  this is a lame example, since i can't remember
> > where i've used these techniques off the top of my head
> >
> > ,x:.*: g/#pragma/ x:[^  ]+[     ]: g/print/p
>
> How is this recursive?

in the sense that 'x' is recursively applied to the output of 'x'.
i have no idea what you would mean by applying the same 'x'
expression to the output of said 'x' expression, as regular expressions
are greedy and can't count.  wouldn't they all loop forever?

can you outline somehow what you're thinking of?

- erik



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [9fans] Recursive structural expressions?
  2009-08-20 20:39     ` erik quanstrom
@ 2009-08-20 21:14       ` J.R. Mauro
  2009-08-20 21:26         ` erik quanstrom
  0 siblings, 1 reply; 15+ messages in thread
From: J.R. Mauro @ 2009-08-20 21:14 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Thu, Aug 20, 2009 at 4:39 PM, erik quanstrom<quanstro@quanstro.net> wrote:
>> > i'm sure i've done them.  this is a lame example, since i can't remember
>> > where i've used these techniques off the top of my head
>> >
>> > ,x:.*: g/#pragma/ x:[^  ]+[     ]: g/print/p
>>
>> How is this recursive?
>
> in the sense that 'x' is recursively applied to the output of 'x'.
> i have no idea what you would mean by applying the same 'x'
> expression to the output of said 'x' expression, as regular expressions
> are greedy and can't count.  wouldn't they all loop forever?

No, I know you can apply them `recursively', I mean something more
like an expression in a CFG or yacc.

>
> can you outline somehow what you're thinking of?

Basically, if you could take a bracketed expression in sam and then
name it, and then call it recursively.

All the problems with CFGs (shift/reduce problems, ambiguities, etc)
would possibly apply.

>
> - erik
>
>



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [9fans] Recursive structural expressions?
  2009-08-20 21:14       ` J.R. Mauro
@ 2009-08-20 21:26         ` erik quanstrom
  2009-08-21  0:21           ` J.R. Mauro
  0 siblings, 1 reply; 15+ messages in thread
From: erik quanstrom @ 2009-08-20 21:26 UTC (permalink / raw)
  To: 9fans

> No, I know you can apply them `recursively', I mean something more
> like an expression in a CFG or yacc.
>
> >
> > can you outline somehow what you're thinking of?
>
> Basically, if you could take a bracketed expression in sam and then
> name it, and then call it recursively.
>
> All the problems with CFGs (shift/reduce problems, ambiguities, etc)
> would possibly apply.

could you give an example in your proposed language?
i don't see how greedy regular expressions wouldn't kill you.
example.  let's say you have a SRE that breaks text into lines,
you couldn't apply that recursively.  in fact i think you'd have
the same problem with any SRE, since they are greedy and
can't count.

maybe i'm just confused because 'x' goes from a blob of text
to a stream of tokens.  where as grammars go from a stream
of tokens to productions.  maybe you mean to replace the
traditional tokenizer with named SREs?

- erik



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [9fans] Recursive structural expressions?
  2009-08-20 21:26         ` erik quanstrom
@ 2009-08-21  0:21           ` J.R. Mauro
  2009-08-21  1:01             ` erik quanstrom
  0 siblings, 1 reply; 15+ messages in thread
From: J.R. Mauro @ 2009-08-21  0:21 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Thu, Aug 20, 2009 at 5:26 PM, erik quanstrom<quanstro@quanstro.net> wrote:
>> No, I know you can apply them `recursively', I mean something more
>> like an expression in a CFG or yacc.
>>
>> >
>> > can you outline somehow what you're thinking of?
>>
>> Basically, if you could take a bracketed expression in sam and then
>> name it, and then call it recursively.
>>
>> All the problems with CFGs (shift/reduce problems, ambiguities, etc)
>> would possibly apply.
>
> could you give an example in your proposed language?
> i don't see how greedy regular expressions wouldn't kill you.
> example.  let's say you have a SRE that breaks text into lines,
> you couldn't apply that recursively.  in fact i think you'd have
> the same problem with any SRE, since they are greedy and
> can't count.
>
> maybe i'm just confused because 'x' goes from a blob of text
> to a stream of tokens.  where as grammars go from a stream
> of tokens to productions.  maybe you mean to replace the
> traditional tokenizer with named SREs?

Here's an example. Let's make the syntax extra pukey: @#, where # is
1-9, defines a `named procedure', which is the same thing as putting
something in braces in Sam.

x/.*\n/ @1{ ( @1 ) | @1 ( @1 ) ( ) | }

Spaces have been added to reduce ugliness, but in the ``real'' syntax,
the spaces would have to be left out. The last pipe followed by a
space represents the null string. The entire expression should,
despite me being really rusty at CS Theory, match balanced strings of
parentheses.

The problems that would arise with this scheme include:
0) Infinite recursion
1) Shift/reduce errors
2) Ambiguities

I'm not sure if this is as powerful as CFGs, but it is at least more
powerful than regular expressions. Again, I'm not great at theory and
it's been a while since I did any. If we call this syntax S, I have
proven (hopefully correctly) only this:

L(Reg) < L(S) <= L(CFG)

Newsham was thinking about it more lucidly than I was on the IRC
channel about what it's really equivalent to.

Hopefully this makes it clear.

>
> - erik
>
>



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [9fans] Recursive structural expressions?
  2009-08-21  0:21           ` J.R. Mauro
@ 2009-08-21  1:01             ` erik quanstrom
  2009-08-21  1:14               ` Charles Forsyth
  2009-08-21 13:58               ` J. R. Mauro
  0 siblings, 2 replies; 15+ messages in thread
From: erik quanstrom @ 2009-08-21  1:01 UTC (permalink / raw)
  To: 9fans

> Here's an example. Let's make the syntax extra pukey: @#, where # is
> 1-9, defines a `named procedure', which is the same thing as putting
> something in braces in Sam.
>
> x/.*\n/ @1{ ( @1 ) | @1 ( @1 ) ( ) | }
>

x/re/ repeatedly sets . with matches until the input
is exhausted.  so i don't think i understand your example.
how does /.*\n/{(.*\n)|.*\n(.*\n)|} match anything
but the tautological .*\n|?  for example, if the input
is "line1\nline2\n", then . is in turn "line1\n" and "line2\n",
and both match .*\n|.

where do the actions go in your example?

- erik



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [9fans] Recursive structural expressions?
  2009-08-21  1:01             ` erik quanstrom
@ 2009-08-21  1:14               ` Charles Forsyth
  2009-08-22  1:10                 ` J.R. Mauro
  2009-08-21 13:58               ` J. R. Mauro
  1 sibling, 1 reply; 15+ messages in thread
From: Charles Forsyth @ 2009-08-21  1:14 UTC (permalink / raw)
  To: 9fans

qed allowed naming of regular expressions using `e' and their recursive invocation
using \E, with results suggested earlier.

http://cm.bell-labs.com/cm/cs/who/dmr/qedman.html
http://cm.bell-labs.com/cm/cs/who/dmr/qedman.pdf

``It should be noted that the ability to define regular expressions recursively makes the term "regular expression" a misnomer: it is not hard to see that expressions can be constructed to match exactly the members of any given context-free language.''



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [9fans] Recursive structural expressions?
  2009-08-21  1:01             ` erik quanstrom
  2009-08-21  1:14               ` Charles Forsyth
@ 2009-08-21 13:58               ` J. R. Mauro
  1 sibling, 0 replies; 15+ messages in thread
From: J. R. Mauro @ 2009-08-21 13:58 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs; +Cc: 9fans





On Aug 20, 2009, at 21:01, erik quanstrom <quanstro@quanstro.net> wrote:

>> Here's an example. Let's make the syntax extra pukey: @#, where # is
>> 1-9, defines a `named procedure', which is the same thing as putting
>> something in braces in Sam.
>>
>> x/.*\n/ @1{ ( @1 ) | @1 ( @1 ) ( ) | }
>>
>
> x/re/ repeatedly sets . with matches until the input
> is exhausted.  so i don't think i understand your example.
> how does /.*\n/{(.*\n)|.*\n(.*\n)|} match anything
> but the tautological .*\n|?  for example, if the input
> is "line1\nline2\n", then . is in turn "line1\n" and "line2\n",
> and both match .*\n|.
>
> where do the actions go in your example?

I guess I botched the example a bit. The whole part after the @ should
be in an x//, and the actions would come after. Something more
complicated would allow you to match any parenthetical expression on a
line and alter it, which could be useful.

>
> - erik
>



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [9fans] Recursive structural expressions?
  2009-08-20 17:52 [9fans] Recursive structural expressions? J.R. Mauro
  2009-08-20 19:53 ` erik quanstrom
@ 2009-08-21 14:24 ` Russ Cox
  2009-08-21 14:57   ` J. R. Mauro
  1 sibling, 1 reply; 15+ messages in thread
From: Russ Cox @ 2009-08-21 14:24 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

Regular expressions, in the strict formal sense, have
an important property: they completely express the set
of patterns that can be searched for in a single linear-time
pass through the text.  That is, they have an associated
linear-time performance guarantee.

In your particular case, adding recursion would produce
context-free expressions, and general context-free parsing as
typically implemented is cubic as opposed to linear.
(It turns out to be the same problem as matrix multiplication,
so with a lot more effort you could get it down to n^2.38 or
perhaps even lower, but you can't possibly get sub-quadratic.)

As Charles pointed out, there is a long, distinguished
history of editors and other tools making regular expressions
more expressive.  It is important to remember that this
is always done at the cost of giving up the performance
guarantee, at least in some cases.  And unless a general
matcher can distinguish the different cases and implement
separate algorithms for each, it usually gives up the
performance guarantee in all cases.

Russ


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [9fans] Recursive structural expressions?
  2009-08-21 14:24 ` Russ Cox
@ 2009-08-21 14:57   ` J. R. Mauro
  0 siblings, 0 replies; 15+ messages in thread
From: J. R. Mauro @ 2009-08-21 14:57 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs; +Cc: Fans of the OS Plan 9 from Bell Labs





On Aug 21, 2009, at 10:24, Russ Cox <rsc@swtch.com> wrote:

> Regular expressions, in the strict formal sense, have
> an important property: they completely express the set
> of patterns that can be searched for in a single linear-time
> pass through the text.  That is, they have an associated
> linear-time performance guarantee.
>
> In your particular case, adding recursion would produce
> context-free expressions, and general context-free parsing as
> typically implemented is cubic as opposed to linear.
> (It turns out to be the same problem as matrix multiplication,
> so with a lot more effort you could get it down to n^2.38 or
> perhaps even lower, but you can't possibly get sub-quadratic.)
>
> As Charles pointed out, there is a long, distinguished
> history of editors and other tools making regular expressions
> more expressive.  It is important to remember that this
> is always done at the cost of giving up the performance
> guarantee, at least in some cases.  And unless a general
> matcher can distinguish the different cases and implement
> separate algorithms for each, it usually gives up the
> performance guarantee in all cases.
>
> Russ
>

This, and complexity/ambiguity are almost certainly why they didn't
carry from qed into ed.

Perhaps at least a context-free grep would be a useful thing to have,
considering the limitations CF expressions would place on a text editor.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [9fans] Recursive structural expressions?
  2009-08-21  1:14               ` Charles Forsyth
@ 2009-08-22  1:10                 ` J.R. Mauro
  2009-08-23 19:59                   ` Aharon Robbins
  0 siblings, 1 reply; 15+ messages in thread
From: J.R. Mauro @ 2009-08-22  1:10 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

On Thu, Aug 20, 2009 at 9:14 PM, Charles Forsyth<forsyth@terzarima.net> wrote:
> qed allowed naming of regular expressions using `e' and their recursive invocation
> using \E, with results suggested earlier.
>
> http://cm.bell-labs.com/cm/cs/who/dmr/qedman.html
> http://cm.bell-labs.com/cm/cs/who/dmr/qedman.pdf
>
> ``It should be noted that the ability to define regular expressions recursively makes the term "regular expression" a misnomer: it is not hard to see that expressions can be constructed to match exactly the members of any given context-free language.''
>
>

I guess I missed this when I last read that paper. Do you know how qed
dealt with infinite recursion or ambiguous CF expressions?



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [9fans] Recursive structural expressions?
  2009-08-22  1:10                 ` J.R. Mauro
@ 2009-08-23 19:59                   ` Aharon Robbins
  2009-08-23 21:27                     ` Rob Pike
  0 siblings, 1 reply; 15+ messages in thread
From: Aharon Robbins @ 2009-08-23 19:59 UTC (permalink / raw)
  To: jrm8005; +Cc: 9fans

I have a file dated October 30 1987 which is the source for a qed
clone written at U of Toronto.  A quick glance at the man page
seems to indicate that it doesn't have recursive regular expressions.

I'm not sure who I got it from - possibly Henry Spencer.

Anyone know anymore about it?  I can put it up where people can get
to it if there's interest.

Thanks,

Arnold

In article <3aaafc130908211810u3afbd969ne259a57162d07325@mail.gmail.com> you write:
>On Thu, Aug 20, 2009 at 9:14 PM, Charles Forsyth<forsyth@terzarima.net> wrote:
>> qed allowed naming of regular expressions using `e' and their
>recursive invocation
>> using \E, with results suggested earlier.
>>
>> http://cm.bell-labs.com/cm/cs/who/dmr/qedman.html
>> http://cm.bell-labs.com/cm/cs/who/dmr/qedman.pdf
>>
>> ``It should be noted that the ability to define regular expressions
>recursively makes the term "regular expression" a misnomer: it is not
>hard to see that expressions can be constructed to match exactly the
>members of any given context-free language.''
>>
>>
>
>I guess I missed this when I last read that paper. Do you know how qed
>dealt with infinite recursion or ambiguous CF expressions?
>


--
Aharon (Arnold) Robbins 				arnold AT skeeve DOT com
P.O. Box 354		Home Phone: +972  8 979-0381
Nof Ayalon		Cell Phone: +972 50  729-7545
D.N. Shimshon 99785	ISRAEL



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [9fans] Recursive structural expressions?
  2009-08-23 19:59                   ` Aharon Robbins
@ 2009-08-23 21:27                     ` Rob Pike
  0 siblings, 0 replies; 15+ messages in thread
From: Rob Pike @ 2009-08-23 21:27 UTC (permalink / raw)
  To: Fans of the OS Plan 9 from Bell Labs

That qed was done in the 1970s by Tom Duff, Hugh Redelmeier, David
Tilbrook and myself.  The regular expressions are from ed, plus simple
forms of parentheses, alternation, and backreferences and a couple of
minor tweaks that later appeared in vi.

It's essentially ed with some original qed features put back in, but
with our spin.  These include multiple files (buffers), string
registers, and a couple of related commands.

http://cm.bell-labs.com/cm/cs/who/dmr/qed.html

-rob



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2009-08-23 21:27 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-08-20 17:52 [9fans] Recursive structural expressions? J.R. Mauro
2009-08-20 19:53 ` erik quanstrom
2009-08-20 20:33   ` J.R. Mauro
2009-08-20 20:39     ` erik quanstrom
2009-08-20 21:14       ` J.R. Mauro
2009-08-20 21:26         ` erik quanstrom
2009-08-21  0:21           ` J.R. Mauro
2009-08-21  1:01             ` erik quanstrom
2009-08-21  1:14               ` Charles Forsyth
2009-08-22  1:10                 ` J.R. Mauro
2009-08-23 19:59                   ` Aharon Robbins
2009-08-23 21:27                     ` Rob Pike
2009-08-21 13:58               ` J. R. Mauro
2009-08-21 14:24 ` Russ Cox
2009-08-21 14:57   ` J. R. Mauro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).