* [9fans] Recursive structural expressions? @ 2009-08-20 17:52 J.R. Mauro 2009-08-20 19:53 ` erik quanstrom 2009-08-21 14:24 ` Russ Cox 0 siblings, 2 replies; 15+ messages in thread From: J.R. Mauro @ 2009-08-20 17:52 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs Has anyone given thought to recursive structural regular expressions? I didn't see a way to do recursion from reading the paper and experimenting. The reason for recursion would be that instead of having hopped-up regexes, we'd now have context-free expressions. Yacc needing tokens makes doing CF things a pain, but I think if grep, sed, sam, etc understood a simple CF expression language, we'd have something that was (mathematically) more powerful. One (perhaps silly) example is that you could see if your source code had balanced parens/braces/brackets/tags in a jiffy, but other examples include searching through records with complex structure. I suppose it would likely be too slow, but would it cost anything when it's not used? ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [9fans] Recursive structural expressions? 2009-08-20 17:52 [9fans] Recursive structural expressions? J.R. Mauro @ 2009-08-20 19:53 ` erik quanstrom 2009-08-20 20:33 ` J.R. Mauro 2009-08-21 14:24 ` Russ Cox 1 sibling, 1 reply; 15+ messages in thread From: erik quanstrom @ 2009-08-20 19:53 UTC (permalink / raw) To: 9fans On Thu Aug 20 13:54:35 EDT 2009, jrm8005@gmail.com wrote: > Has anyone given thought to recursive structural regular expressions? i'm sure i've done them. this is a lame example, since i can't remember where i've used these techniques off the top of my head ,x:.*: g/#pragma/ x:[^ ]+[ ]: g/print/p - erik ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [9fans] Recursive structural expressions? 2009-08-20 19:53 ` erik quanstrom @ 2009-08-20 20:33 ` J.R. Mauro 2009-08-20 20:39 ` erik quanstrom 0 siblings, 1 reply; 15+ messages in thread From: J.R. Mauro @ 2009-08-20 20:33 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Thu, Aug 20, 2009 at 3:53 PM, erik quanstrom<quanstro@quanstro.net> wrote: > On Thu Aug 20 13:54:35 EDT 2009, jrm8005@gmail.com wrote: >> Has anyone given thought to recursive structural regular expressions? > > i'm sure i've done them. this is a lame example, since i can't remember > where i've used these techniques off the top of my head > > ,x:.*: g/#pragma/ x:[^ ]+[ ]: g/print/p How is this recursive? > > - erik > > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [9fans] Recursive structural expressions? 2009-08-20 20:33 ` J.R. Mauro @ 2009-08-20 20:39 ` erik quanstrom 2009-08-20 21:14 ` J.R. Mauro 0 siblings, 1 reply; 15+ messages in thread From: erik quanstrom @ 2009-08-20 20:39 UTC (permalink / raw) To: 9fans > > i'm sure i've done them. this is a lame example, since i can't remember > > where i've used these techniques off the top of my head > > > > ,x:.*: g/#pragma/ x:[^ ]+[ ]: g/print/p > > How is this recursive? in the sense that 'x' is recursively applied to the output of 'x'. i have no idea what you would mean by applying the same 'x' expression to the output of said 'x' expression, as regular expressions are greedy and can't count. wouldn't they all loop forever? can you outline somehow what you're thinking of? - erik ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [9fans] Recursive structural expressions? 2009-08-20 20:39 ` erik quanstrom @ 2009-08-20 21:14 ` J.R. Mauro 2009-08-20 21:26 ` erik quanstrom 0 siblings, 1 reply; 15+ messages in thread From: J.R. Mauro @ 2009-08-20 21:14 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Thu, Aug 20, 2009 at 4:39 PM, erik quanstrom<quanstro@quanstro.net> wrote: >> > i'm sure i've done them. this is a lame example, since i can't remember >> > where i've used these techniques off the top of my head >> > >> > ,x:.*: g/#pragma/ x:[^ ]+[ ]: g/print/p >> >> How is this recursive? > > in the sense that 'x' is recursively applied to the output of 'x'. > i have no idea what you would mean by applying the same 'x' > expression to the output of said 'x' expression, as regular expressions > are greedy and can't count. wouldn't they all loop forever? No, I know you can apply them `recursively', I mean something more like an expression in a CFG or yacc. > > can you outline somehow what you're thinking of? Basically, if you could take a bracketed expression in sam and then name it, and then call it recursively. All the problems with CFGs (shift/reduce problems, ambiguities, etc) would possibly apply. > > - erik > > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [9fans] Recursive structural expressions? 2009-08-20 21:14 ` J.R. Mauro @ 2009-08-20 21:26 ` erik quanstrom 2009-08-21 0:21 ` J.R. Mauro 0 siblings, 1 reply; 15+ messages in thread From: erik quanstrom @ 2009-08-20 21:26 UTC (permalink / raw) To: 9fans > No, I know you can apply them `recursively', I mean something more > like an expression in a CFG or yacc. > > > > > can you outline somehow what you're thinking of? > > Basically, if you could take a bracketed expression in sam and then > name it, and then call it recursively. > > All the problems with CFGs (shift/reduce problems, ambiguities, etc) > would possibly apply. could you give an example in your proposed language? i don't see how greedy regular expressions wouldn't kill you. example. let's say you have a SRE that breaks text into lines, you couldn't apply that recursively. in fact i think you'd have the same problem with any SRE, since they are greedy and can't count. maybe i'm just confused because 'x' goes from a blob of text to a stream of tokens. where as grammars go from a stream of tokens to productions. maybe you mean to replace the traditional tokenizer with named SREs? - erik ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [9fans] Recursive structural expressions? 2009-08-20 21:26 ` erik quanstrom @ 2009-08-21 0:21 ` J.R. Mauro 2009-08-21 1:01 ` erik quanstrom 0 siblings, 1 reply; 15+ messages in thread From: J.R. Mauro @ 2009-08-21 0:21 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Thu, Aug 20, 2009 at 5:26 PM, erik quanstrom<quanstro@quanstro.net> wrote: >> No, I know you can apply them `recursively', I mean something more >> like an expression in a CFG or yacc. >> >> > >> > can you outline somehow what you're thinking of? >> >> Basically, if you could take a bracketed expression in sam and then >> name it, and then call it recursively. >> >> All the problems with CFGs (shift/reduce problems, ambiguities, etc) >> would possibly apply. > > could you give an example in your proposed language? > i don't see how greedy regular expressions wouldn't kill you. > example. let's say you have a SRE that breaks text into lines, > you couldn't apply that recursively. in fact i think you'd have > the same problem with any SRE, since they are greedy and > can't count. > > maybe i'm just confused because 'x' goes from a blob of text > to a stream of tokens. where as grammars go from a stream > of tokens to productions. maybe you mean to replace the > traditional tokenizer with named SREs? Here's an example. Let's make the syntax extra pukey: @#, where # is 1-9, defines a `named procedure', which is the same thing as putting something in braces in Sam. x/.*\n/ @1{ ( @1 ) | @1 ( @1 ) ( ) | } Spaces have been added to reduce ugliness, but in the ``real'' syntax, the spaces would have to be left out. The last pipe followed by a space represents the null string. The entire expression should, despite me being really rusty at CS Theory, match balanced strings of parentheses. The problems that would arise with this scheme include: 0) Infinite recursion 1) Shift/reduce errors 2) Ambiguities I'm not sure if this is as powerful as CFGs, but it is at least more powerful than regular expressions. Again, I'm not great at theory and it's been a while since I did any. If we call this syntax S, I have proven (hopefully correctly) only this: L(Reg) < L(S) <= L(CFG) Newsham was thinking about it more lucidly than I was on the IRC channel about what it's really equivalent to. Hopefully this makes it clear. > > - erik > > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [9fans] Recursive structural expressions? 2009-08-21 0:21 ` J.R. Mauro @ 2009-08-21 1:01 ` erik quanstrom 2009-08-21 1:14 ` Charles Forsyth 2009-08-21 13:58 ` J. R. Mauro 0 siblings, 2 replies; 15+ messages in thread From: erik quanstrom @ 2009-08-21 1:01 UTC (permalink / raw) To: 9fans > Here's an example. Let's make the syntax extra pukey: @#, where # is > 1-9, defines a `named procedure', which is the same thing as putting > something in braces in Sam. > > x/.*\n/ @1{ ( @1 ) | @1 ( @1 ) ( ) | } > x/re/ repeatedly sets . with matches until the input is exhausted. so i don't think i understand your example. how does /.*\n/{(.*\n)|.*\n(.*\n)|} match anything but the tautological .*\n|? for example, if the input is "line1\nline2\n", then . is in turn "line1\n" and "line2\n", and both match .*\n|. where do the actions go in your example? - erik ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [9fans] Recursive structural expressions? 2009-08-21 1:01 ` erik quanstrom @ 2009-08-21 1:14 ` Charles Forsyth 2009-08-22 1:10 ` J.R. Mauro 2009-08-21 13:58 ` J. R. Mauro 1 sibling, 1 reply; 15+ messages in thread From: Charles Forsyth @ 2009-08-21 1:14 UTC (permalink / raw) To: 9fans qed allowed naming of regular expressions using `e' and their recursive invocation using \E, with results suggested earlier. http://cm.bell-labs.com/cm/cs/who/dmr/qedman.html http://cm.bell-labs.com/cm/cs/who/dmr/qedman.pdf ``It should be noted that the ability to define regular expressions recursively makes the term "regular expression" a misnomer: it is not hard to see that expressions can be constructed to match exactly the members of any given context-free language.'' ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [9fans] Recursive structural expressions? 2009-08-21 1:14 ` Charles Forsyth @ 2009-08-22 1:10 ` J.R. Mauro 2009-08-23 19:59 ` Aharon Robbins 0 siblings, 1 reply; 15+ messages in thread From: J.R. Mauro @ 2009-08-22 1:10 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Thu, Aug 20, 2009 at 9:14 PM, Charles Forsyth<forsyth@terzarima.net> wrote: > qed allowed naming of regular expressions using `e' and their recursive invocation > using \E, with results suggested earlier. > > http://cm.bell-labs.com/cm/cs/who/dmr/qedman.html > http://cm.bell-labs.com/cm/cs/who/dmr/qedman.pdf > > ``It should be noted that the ability to define regular expressions recursively makes the term "regular expression" a misnomer: it is not hard to see that expressions can be constructed to match exactly the members of any given context-free language.'' > > I guess I missed this when I last read that paper. Do you know how qed dealt with infinite recursion or ambiguous CF expressions? ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [9fans] Recursive structural expressions? 2009-08-22 1:10 ` J.R. Mauro @ 2009-08-23 19:59 ` Aharon Robbins 2009-08-23 21:27 ` Rob Pike 0 siblings, 1 reply; 15+ messages in thread From: Aharon Robbins @ 2009-08-23 19:59 UTC (permalink / raw) To: jrm8005; +Cc: 9fans I have a file dated October 30 1987 which is the source for a qed clone written at U of Toronto. A quick glance at the man page seems to indicate that it doesn't have recursive regular expressions. I'm not sure who I got it from - possibly Henry Spencer. Anyone know anymore about it? I can put it up where people can get to it if there's interest. Thanks, Arnold In article <3aaafc130908211810u3afbd969ne259a57162d07325@mail.gmail.com> you write: >On Thu, Aug 20, 2009 at 9:14 PM, Charles Forsyth<forsyth@terzarima.net> wrote: >> qed allowed naming of regular expressions using `e' and their >recursive invocation >> using \E, with results suggested earlier. >> >> http://cm.bell-labs.com/cm/cs/who/dmr/qedman.html >> http://cm.bell-labs.com/cm/cs/who/dmr/qedman.pdf >> >> ``It should be noted that the ability to define regular expressions >recursively makes the term "regular expression" a misnomer: it is not >hard to see that expressions can be constructed to match exactly the >members of any given context-free language.'' >> >> > >I guess I missed this when I last read that paper. Do you know how qed >dealt with infinite recursion or ambiguous CF expressions? > -- Aharon (Arnold) Robbins arnold AT skeeve DOT com P.O. Box 354 Home Phone: +972 8 979-0381 Nof Ayalon Cell Phone: +972 50 729-7545 D.N. Shimshon 99785 ISRAEL ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [9fans] Recursive structural expressions? 2009-08-23 19:59 ` Aharon Robbins @ 2009-08-23 21:27 ` Rob Pike 0 siblings, 0 replies; 15+ messages in thread From: Rob Pike @ 2009-08-23 21:27 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs That qed was done in the 1970s by Tom Duff, Hugh Redelmeier, David Tilbrook and myself. The regular expressions are from ed, plus simple forms of parentheses, alternation, and backreferences and a couple of minor tweaks that later appeared in vi. It's essentially ed with some original qed features put back in, but with our spin. These include multiple files (buffers), string registers, and a couple of related commands. http://cm.bell-labs.com/cm/cs/who/dmr/qed.html -rob ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [9fans] Recursive structural expressions? 2009-08-21 1:01 ` erik quanstrom 2009-08-21 1:14 ` Charles Forsyth @ 2009-08-21 13:58 ` J. R. Mauro 1 sibling, 0 replies; 15+ messages in thread From: J. R. Mauro @ 2009-08-21 13:58 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs; +Cc: 9fans On Aug 20, 2009, at 21:01, erik quanstrom <quanstro@quanstro.net> wrote: >> Here's an example. Let's make the syntax extra pukey: @#, where # is >> 1-9, defines a `named procedure', which is the same thing as putting >> something in braces in Sam. >> >> x/.*\n/ @1{ ( @1 ) | @1 ( @1 ) ( ) | } >> > > x/re/ repeatedly sets . with matches until the input > is exhausted. so i don't think i understand your example. > how does /.*\n/{(.*\n)|.*\n(.*\n)|} match anything > but the tautological .*\n|? for example, if the input > is "line1\nline2\n", then . is in turn "line1\n" and "line2\n", > and both match .*\n|. > > where do the actions go in your example? I guess I botched the example a bit. The whole part after the @ should be in an x//, and the actions would come after. Something more complicated would allow you to match any parenthetical expression on a line and alter it, which could be useful. > > - erik > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [9fans] Recursive structural expressions? 2009-08-20 17:52 [9fans] Recursive structural expressions? J.R. Mauro 2009-08-20 19:53 ` erik quanstrom @ 2009-08-21 14:24 ` Russ Cox 2009-08-21 14:57 ` J. R. Mauro 1 sibling, 1 reply; 15+ messages in thread From: Russ Cox @ 2009-08-21 14:24 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs Regular expressions, in the strict formal sense, have an important property: they completely express the set of patterns that can be searched for in a single linear-time pass through the text. That is, they have an associated linear-time performance guarantee. In your particular case, adding recursion would produce context-free expressions, and general context-free parsing as typically implemented is cubic as opposed to linear. (It turns out to be the same problem as matrix multiplication, so with a lot more effort you could get it down to n^2.38 or perhaps even lower, but you can't possibly get sub-quadratic.) As Charles pointed out, there is a long, distinguished history of editors and other tools making regular expressions more expressive. It is important to remember that this is always done at the cost of giving up the performance guarantee, at least in some cases. And unless a general matcher can distinguish the different cases and implement separate algorithms for each, it usually gives up the performance guarantee in all cases. Russ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [9fans] Recursive structural expressions? 2009-08-21 14:24 ` Russ Cox @ 2009-08-21 14:57 ` J. R. Mauro 0 siblings, 0 replies; 15+ messages in thread From: J. R. Mauro @ 2009-08-21 14:57 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs; +Cc: Fans of the OS Plan 9 from Bell Labs On Aug 21, 2009, at 10:24, Russ Cox <rsc@swtch.com> wrote: > Regular expressions, in the strict formal sense, have > an important property: they completely express the set > of patterns that can be searched for in a single linear-time > pass through the text. That is, they have an associated > linear-time performance guarantee. > > In your particular case, adding recursion would produce > context-free expressions, and general context-free parsing as > typically implemented is cubic as opposed to linear. > (It turns out to be the same problem as matrix multiplication, > so with a lot more effort you could get it down to n^2.38 or > perhaps even lower, but you can't possibly get sub-quadratic.) > > As Charles pointed out, there is a long, distinguished > history of editors and other tools making regular expressions > more expressive. It is important to remember that this > is always done at the cost of giving up the performance > guarantee, at least in some cases. And unless a general > matcher can distinguish the different cases and implement > separate algorithms for each, it usually gives up the > performance guarantee in all cases. > > Russ > This, and complexity/ambiguity are almost certainly why they didn't carry from qed into ed. Perhaps at least a context-free grep would be a useful thing to have, considering the limitations CF expressions would place on a text editor. ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2009-08-23 21:27 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2009-08-20 17:52 [9fans] Recursive structural expressions? J.R. Mauro 2009-08-20 19:53 ` erik quanstrom 2009-08-20 20:33 ` J.R. Mauro 2009-08-20 20:39 ` erik quanstrom 2009-08-20 21:14 ` J.R. Mauro 2009-08-20 21:26 ` erik quanstrom 2009-08-21 0:21 ` J.R. Mauro 2009-08-21 1:01 ` erik quanstrom 2009-08-21 1:14 ` Charles Forsyth 2009-08-22 1:10 ` J.R. Mauro 2009-08-23 19:59 ` Aharon Robbins 2009-08-23 21:27 ` Rob Pike 2009-08-21 13:58 ` J. R. Mauro 2009-08-21 14:24 ` Russ Cox 2009-08-21 14:57 ` J. R. Mauro
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).