* bufferwords() lexes a subshell in a shortloop repeat as a string
From: Daniel Shahaf @ 2016-01-15 6:26 UTC
To: zsh-workers

The ${(z)} modifier gives me a subshell as a single unit:

    % pz() { print -rl - ${(qq)${(z)1}} }
    % pz 'repeat 3 (echo this is a subshell)'
    'repeat'
    '3'
    '(echo this is a subshell)'

I expected the subshell to be broken into '(', 'echo', …, ')' tokens, as
per usual.

Looking at it in gdb, I see (after the third call to ctxtlex()):

    tok == STRING
    tokstr == "(echo this is a subshell)"

Cheers,

Daniel

* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
From: Peter Stephenson @ 2016-01-15 9:41 UTC
To: zsh-workers

On Fri, 15 Jan 2016 06:26:48 +0000
Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> The ${(z)} modifier gives me a subshell as a single unit:

It doesn't know it's a subshell; it doesn't know you want the first
character to be in command position.  It's not parsing the whole thing
as a command expression, it's just splitting words, and (... ...) indeed
works as a complete word:

    % noglob print -l one (two three) four
    one
    (two three)
    four

The noglob hints at why parentheses not in command position are treated
like that --- it's convenient for glob qualifiers.

Having the parenthesised expressions in the strings

    'one (two three) four'

and

    '(two three)'

split in different ways by the same function typically would be
confusing, though it depends what you're doing with the result.

It might be possible to add a flag to cause an expression you pass in to
be split as if it were a complete command line, not just an arbitrary
set of arguments, but that's a whole new ball game.

If you're trying to make the (z) work as a kind of eval without
execution, I think you're expecting too much.

pws

* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
From: Bart Schaefer @ 2016-01-15 19:25 UTC
To: zsh-workers

On Jan 15,  9:41am, Peter Stephenson wrote:
} Subject: Re: bufferwords() lexes a subshell in a shortloop repeat as a str
}
} it's just splitting words, and (... ...) indeed works as a complete word

Urk.  That probably ought to be disabled, at least in shell emulation,
e.g. here is bash:

    schaefer@burner$ echo one (two three) four
    bash: syntax error near unexpected token `('

Here's an interesting side effect:

    torch% touch "two three"
    torch% unsetopt bareglobqual
    torch% ls -l (two three)
    -rw-rw-r-- 1 schaefer schaefer 0 Jan 15 11:22 two three

It becomes another way to quote spaces in file names, but only if the
file already exists.

    torch% touch (one two)
    zsh: no matches found: (one two)

* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
From: Peter Stephenson @ 2016-01-17 18:16 UTC
To: zsh-workers

On Fri, 15 Jan 2016 11:25:16 -0800
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Jan 15,  9:41am, Peter Stephenson wrote:
> } Subject: Re: bufferwords() lexes a subshell in a shortloop repeat as a str
> }
> } it's just splitting words, and (... ...) indeed works as a complete word
>
> Urk.  That probably ought to be disabled, at least in shell emulation,
> e.g. here is bash:

shwordsplit does this.  (I thought it would be shglob, but it isn't.)

By the way, I was too glib before: if you have a string that *starts*
with "(", it *does* get split as a complete command line that starts
with a subshell, even in native mode, as you might expect.  So I think
(z) is behaving basically rationally, but with the caveat I mentioned
that it's a fairly brutal tool in comparison with real context
sensitivity.

pws

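The splitting behaviour described in the messages so far can be imitated with a deliberately tiny splitter. What follows is only a sketch under loud assumptions: toy_split(), MAXWORD and the two state variables are invented for illustration, they are not taken from zsh's lex.c, and the real lexer handles quoting, tokens and much more. The sketch knows just two things: a "(" seen in command position becomes its own token, and a "(" anywhere else is swallowed, spaces included, into the current word up to the matching ")".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAXWORD 64  /* hypothetical fixed word size, for illustration only */

/* Toy word splitter (NOT zsh code).  Writes up to `max` words into `out`
 * and returns how many were written.  Overlong words are simply cut. */
size_t toy_split(const char *s, char out[][MAXWORD], size_t max)
{
    size_t n = 0;
    int cmdpos = 1;    /* the first word is in command position */
    int subshell = 0;  /* number of "(" tokens still open */

    assert(s != NULL);
    while (*s && n < max) {
        size_t len = 0;
        int depth = 0;  /* parentheses nested inside the current word */

        while (isspace((unsigned char)*s))
            s++;
        if (!*s)
            break;

        if (*s == '(' && cmdpos) {          /* "(" opening a subshell */
            strcpy(out[n++], "(");
            s++;
            subshell++;                     /* still in command position */
            continue;
        }
        if (*s == ')' && subshell > 0) {    /* ")" closing a subshell */
            strcpy(out[n++], ")");
            s++;
            subshell--;
            cmdpos = 0;
            continue;
        }

        /* Ordinary word; a "(" here is kept as part of the word. */
        while (*s && len < MAXWORD - 1) {
            if (*s == '(') {
                depth++;
            } else if (*s == ')') {
                if (depth > 0)
                    depth--;
                else if (subshell > 0)
                    break;                  /* leave ")" for the next round */
            } else if (isspace((unsigned char)*s) && depth == 0) {
                break;
            }
            out[n][len++] = *s++;
        }
        out[n][len] = '\0';
        n++;
        cmdpos = 0;
    }
    return n;
}
```

Under these assumptions "one (two three) four" comes back as three words, "(two three)" as four, and "repeat 3 (x)" as three with "(x)" kept whole, which is the shape of the original report: nothing in the toy tells it that the word after "repeat 3" is a command position.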
* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
From: Bart Schaefer @ 2016-01-17 22:46 UTC
To: zsh-workers

On Jan 17,  6:16pm, Peter Stephenson wrote:
} Subject: Re: bufferwords() lexes a subshell in a shortloop repeat as a str
}
} On Fri, 15 Jan 2016 11:25:16 -0800
} Bart Schaefer <schaefer@brasslantern.com> wrote:
} > On Jan 15,  9:41am, Peter Stephenson wrote:
} > } Subject: Re: bufferwords() lexes a subshell in a shortloop repeat as a str
} > }
} > } it's just splitting words, and (... ...) indeed works as a complete word
} >
} > Urk.  That probably ought to be disabled, at least in shell emulation,
} > e.g. here is bash:
}
} shwordsplit does this.  (I thought it would be shglob, but it isn't.)

Hrm.  I see no evidence of that.

    Src/zsh -f
    torch% emulate sh
    torch% print -l one (two three) four
    one
    (two three)
    four
    torch% ARGV0=sh Src/zsh
    $ print -l one (two three) four
    one
    (two three)
    four
    $

It appears that parens are still parsed as grouping, even though they
thereafter are considered a literal pattern character.  E.g., I expected
"|" to be treated as a pipe in the following:

    $ print -l one (two three|foo) four
    one
    (two three|foo)
    four

* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
From: Peter Stephenson @ 2016-01-18 9:41 UTC
To: zsh-workers

On Sun, 17 Jan 2016 14:46:35 -0800
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Jan 17,  6:16pm, Peter Stephenson wrote:
> } Subject: Re: bufferwords() lexes a subshell in a shortloop repeat as a str
> }
> } On Fri, 15 Jan 2016 11:25:16 -0800
> } Bart Schaefer <schaefer@brasslantern.com> wrote:
> } > On Jan 15,  9:41am, Peter Stephenson wrote:
> } > } Subject: Re: bufferwords() lexes a subshell in a shortloop repeat as a str
> } > }
> } > } it's just splitting words, and (... ...) indeed works as a complete word
> } >
> } > Urk.  That probably ought to be disabled, at least in shell emulation,
> } > e.g. here is bash:
> }
> } shwordsplit does this.  (I thought it would be shglob, but it isn't.)
>
> Hrm.  I see no evidence of that.

I'm not sure what it is you're not doing, but that's not what I get
with ARGV0=sh...

pws

* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
From: Bart Schaefer @ 2016-01-18 16:36 UTC
To: zsh-workers

On Jan 18,  9:41am, Peter Stephenson wrote:
}
} I'm not sure what it is you're not doing, but that's not what I get
} with ARGV0=sh...

    schaefer[564] ARGV0=sh Src/zsh -f
    $ print -l one (two three|foo) four
    one
    (two three|foo)
    four
    $ print $ZSH_PATCHLEVEL
    zsh-5.2-82-g0194b4a

What are YOU seeing?  Do we need to diff "set -o" output or something?

* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
From: Peter Stephenson @ 2016-01-18 16:52 UTC
To: zsh-workers

On Mon, 18 Jan 2016 08:36:58 -0800
Bart Schaefer <schaefer@brasslantern.com> wrote:
> print -l one (two three|foo) four

I was trying it with the (z) flag, which does cause the word to be
split up, not directly at the command line, where I get what you get.

I don't know why they'd be different, offhand.

pws

* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
From: Bart Schaefer @ 2016-01-19 0:32 UTC
To: zsh-workers

On Jan 18,  4:52pm, Peter Stephenson wrote:
} Subject: Re: bufferwords() lexes a subshell in a shortloop repeat as a str
}
} On Mon, 18 Jan 2016 08:36:58 -0800
} Bart Schaefer <schaefer@brasslantern.com> wrote:
} > print -l one (two three|foo) four
}
} I was trying it with the (z) flag, which does cause the word to be
} split up, not directly at the command line, where I get what you get.
}
} I don't know why they'd be different, offhand.

Seems to be that parameter substitution applies shwordsplit before (z)
gets involved, so we have separate calls to bufferwords() for each of
"one", "(two", "three|foo)" and "four".

Directly at command line, gettok() returns "\210two three\216four\212".

Does the below look correct?

    schaefer[573] ARGV0=sh Src/zsh -f
    $ print one (two three|four) five
    zsh: parse error near `('
    $

diff --git a/Src/lex.c b/Src/lex.c
index 0f260d0..c21ef2d 100644
--- a/Src/lex.c
+++ b/Src/lex.c
@@ -801,7 +801,7 @@ gettok(void)
             return INOUTPAR;
         hungetc(d);
         lexstop = 0;
-        if (!(incond == 1 || incmdpos))
+        if (!(isset(SHGLOB) || incond == 1 || incmdpos))
             break;
         return INPAR;
     case LX1_OUTPAR:

Aside:  "emulate sh" does the equivalent of

    setopt shglob noglob nokshglob

In order to make kshglob work, one must

    setopt glob kshglob

Is that correct, or should only kshglob be needed?

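The one-line change to gettok() in the message above is easier to check when the condition is pulled out on its own. This is a sketch only: struct lexstate, enum paren_action and classify_open_paren() are hypothetical names introduced here, and the three fields merely stand in for isset(SHGLOB), incond and incmdpos from the patch. It assumes, as the diff suggests, that "break" leaves the "(" to be lexed as part of a string while "return INPAR" makes it a token of its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the lexer state the patched condition reads.
 * These mirror names in the diff but are NOT zsh's own declarations. */
struct lexstate {
    int shglob;    /* stands in for isset(SHGLOB) */
    int incond;    /* 1 while inside [[ ... ]] */
    int incmdpos;  /* non-zero when the next word is in command position */
};

enum paren_action {
    PAREN_PART_OF_WORD,  /* the "break" branch in the diff */
    PAREN_IS_INPAR       /* the "return INPAR" branch in the diff */
};

/* Same boolean expression as the "+" line of the patch. */
enum paren_action classify_open_paren(const struct lexstate *s)
{
    assert(s != NULL);
    if (!(s->shglob || s->incond == 1 || s->incmdpos))
        return PAREN_PART_OF_WORD;
    return PAREN_IS_INPAR;
}
```

With shglob clear the function gives the same answer as the unpatched "-" line; setting shglob turns every "(" into INPAR regardless of position, which would be consistent with the parse error shown in the transcript for sh emulation.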
* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
From: Bart Schaefer @ 2016-01-19 3:29 UTC
To: zsh-workers

On Jan 18,  4:32pm, Bart Schaefer wrote:
}
} Aside:  "emulate sh" does the equivalent of
}
}     setopt shglob noglob nokshglob

My bad.  That's "ARGV0=sh zsh -f", not emulate.  I should not have used
-f there; mistaken history edit.

However, @(foo) is *parsed* as a pattern with only kshglob, but then
does NOT *match* as a pattern unless glob is also set.

To get back on the original topic of this thread, here's more oddness
from bufferwords():

    torch% print -l ${(z):-repeat 3 (echo foo;echo bar)}
    repeat
    3
    echo
    foo
    ;
    echo
    bar
    torch%

Where did the parens go?  I suspect something is failing to set tokstr.

Without the (z) flag, the parens are interpreted as delimiting glob
qualifiers.

* kshglob + noglob (was Re: bufferwords() lexes ....)
From: Bart Schaefer @ 2016-01-19 4:07 UTC
To: zsh-workers

On Jan 18,  7:29pm, Bart Schaefer wrote:
}
} However, @(foo) is *parsed* as a pattern with only kshglob, but then
} does NOT *match* as a pattern unless glob is also set.

This seems to correspond to the behavior of actual ksh, so I'm going
to drop this line of inquiry.

* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
From: Peter Stephenson @ 2016-01-19 9:36 UTC
To: zsh-workers

On Mon, 18 Jan 2016 16:32:55 -0800
Bart Schaefer <schaefer@brasslantern.com> wrote:
> Directly at command line, gettok() returns "\210two three\216four\212".
>
> Does the below look correct?

It's certainly plausible.

> Aside:  "emulate sh" does the equivalent of
>
>     setopt shglob noglob nokshglob
>
> In order to make kshglob work, one must
>
>     setopt glob kshglob
>
> Is that correct, or should only kshglob be needed?

Do you really mean "glob"/"noglob"?  I thought that meant what it says,
controlling all globbing.  I can't see any evidence it's related to
emulation --- it's got the "emulate" attribute, so is affected by
"emulate" without -R but it's on in all emulations.  (Not sure what use
that combination is...  Oh, I see, if you turned it off yourself for
some reason, then as soon as you try to set up for any standard
emulation it goes back on again.)

I think the real question is whether kshglob should actually be on in
sh emulation nowadays.

pws

* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
From: Bart Schaefer @ 2016-01-19 18:59 UTC
To: zsh-workers

On Jan 19,  9:36am, Peter Stephenson wrote:
}
} Do you really mean "glob"/"noglob"?

I did not, see subsequent email.

} I thought that meant what it says, controlling all globbing.

It turns off globbing, but it doesn't change the parsing of patterns.
Which temporarily surprised me, until I was reminded that it always has
worked that way.

* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
From: Daniel Shahaf @ 2016-01-18 2:25 UTC
To: Peter Stephenson; +Cc: zsh-workers

Peter Stephenson wrote on Fri, Jan 15, 2016 at 09:41:17 +0000:
> Having the parenthesised expressions in the strings
>
> 'one (two three) four'
>
> and
>
> '(two three)'
>
> split in different ways by the same function typically would be confusing,
> though it depends what you're doing with the result.

What confuses me is that 'repeat 3 (x)' and 'repeat 3; do (x); done' are
split differently. ;-)

Shouldn't both of them treat the "(x)" the same way [either both of
them considering it one unit, or both of them considering it three units]?

> It might be possible to add a flag to cause an expression you pass in to
> be split as if it were a complete command line, not just an arbitrary
> set of arguments, but that's a whole new ball game.
>
> If you're trying to make the (z) work as a kind of eval without
> execution, I think you're expecting too much.

Even after reading your other reply, I still don't understand what
distinction you're trying to draw here, what case you say isn't expected
to work.  Could you clarify, please?

If you're asking whether I expect

    setopt NO_shortloops
    print -rl - ${(z):-"setopt shortloops; repeat 3 foo"}

to parse the "repeat 3 foo" part with shortloops set, the answer is no,
I don't expect that.

Thanks,

Daniel

> pws

* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
From: Peter Stephenson @ 2016-01-18 10:45 UTC
To: zsh-workers

On Mon, 18 Jan 2016 02:25:58 +0000
Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> Even after reading your other reply, I still don't understand what
> distinction you're trying to draw here, what case you say isn't expected
> to work.  Could you clarify, please?

No, the whole point is I *can't* say what case isn't going to work, just
that there will be a vast heap of them if you attempt to split arbitrary
strings and prod the result in enough detail.

The underlying splitting is executing the raw lexer with various
squiggles on top to fix up some special cases (but only some).  It's
doing it in a way which is sort-of helpful to completion, but it's doing
it thoroughly inconsistently, given that in the case of (z) all it's
been told is "here, have this string which has got some bits of command
line in".

So you just have to see what actually works and work round it.  (Or, of
course, rewrite the whole thing, which would be nice, but I don't think
is ever going to happen.)

There is some special casing in bufferwords() for loops, though, so
maybe the case you want isn't far off working.

pws

* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
From: Daniel Shahaf @ 2016-01-20 7:47 UTC
To: Peter Stephenson; +Cc: zsh-workers

Peter Stephenson wrote on Mon, Jan 18, 2016 at 10:45:48 +0000:
> There is some special casing in bufferwords() for loops, though, so
> maybe the case you want isn't far off working.

bufferwords() received the «(x)» as a STRING token, so I looked further
down, into gettok().  The attached patch seems to do the trick [see the
added tests].  However, to paraphrase Knuth, I only tested this code,
not proved it correct.  I'd appreciate a review.

Thanks for the (snipped) clarifications.

Cheers,

Daniel

diff --git a/Test/D04parameter.ztst b/Test/D04parameter.ztst
index bcea980..b64c76e 100644
--- a/Test/D04parameter.ztst
+++ b/Test/D04parameter.ztst
@@ -479,6 +479,8 @@
     '(( 3 + 1 == 8 / 2 ))'
     'for (( i = 1 ; i < 10 ; i++ ))'
     '((0.25542 * 60) - 15)*60'
+    'repeat 3 (x)'
+    'repeat 3 (echo foo; echo bar)'
   )
   for string in $strings; do
     array=(${(z)string})
@@ -514,6 +516,20 @@
 >8:15:
 >9:):
 >10:*60:
+>1:repeat:
+>2:3:
+>3:(:
+>4:x:
+>5:):
+>1:repeat:
+>2:3:
+>3:(:
+>4:echo:
+>5:foo:
+>6:;:
+>7:echo:
+>8:bar:
+>9:):

   line=$'A line with # someone\'s comment\nanother line # (1 more\nanother one'

diff --git a/Src/lex.c b/Src/lex.c
index 0f260d0..2505dd6 100644
--- a/Src/lex.c
+++ b/Src/lex.c
@@ -267,9 +267,13 @@ zshlex(void)
 {
     if (tok == LEXERR)
         return;
-    do
+    do {
+        if (inrepeat_)
+            ++inrepeat_;
+        if (inrepeat_ == 3 && isset(SHORTLOOPS))
+            incmdpos = 1;
         tok = gettok();
-    while (tok != ENDINPUT && exalias());
+    } while (tok != ENDINPUT && exalias());
     nocorrect &= 1;
     if (tok == NEWLIN || tok == ENDINPUT) {
         while (hdocs) {
@@ -1870,6 +1874,7 @@ exalias(void)
                   zshlextext[0] == '}' && !zshlextext[1])) &&
                 (rw = (Reswd) reswdtab->getnode(reswdtab, zshlextext))) {
                 tok = rw->token;
+                inrepeat_ = (tok == REPEAT);
                 if (tok == DINBRACK)
                     incond = 1;
             } else if (incond && !strcmp(zshlextext, "]]")) {
diff --git a/Src/parse.c b/Src/parse.c
index 4829e3a..49c1ac0 100644
--- a/Src/parse.c
+++ b/Src/parse.c
@@ -63,6 +63,12 @@ int isnewlin;
 /**/
 int infor;
 
+/* != 0 if we are after a repeat keyword; if it's nonzero it's a 1-based index
+ * of the current token from the last-seen command position */
+
+/**/
+int inrepeat_;
+
 /* != 0 if parsing arguments of typeset etc. */
 
 /**/
@@ -271,6 +277,7 @@ parse_context_save(struct parse_stack *ps, int toplevel)
     ps->incasepat = incasepat;
     ps->isnewlin = isnewlin;
     ps->infor = infor;
+    ps->inrepeat_ = inrepeat_;
     ps->intypeset = intypeset;
 
     ps->hdocs = hdocs;
@@ -305,6 +312,7 @@ parse_context_restore(const struct parse_stack *ps, int toplevel)
     incasepat = ps->incasepat;
     isnewlin = ps->isnewlin;
     infor = ps->infor;
+    inrepeat_ = ps->inrepeat_;
     intypeset = ps->intypeset;
 
     hdocs = ps->hdocs;
@@ -447,6 +455,7 @@ init_parse_status(void)
      * using the lexical analyser for strings as well as here.
      */
     incasepat = incond = inredir = infor = intypeset = 0;
+    inrepeat_ = 0;
     incmdpos = 1;
 }
 
@@ -1482,6 +1491,7 @@ par_while(int *cmplx)
 static void
 par_repeat(int *cmplx)
 {
+    /* ### what to do about inrepeat_ here? */
     int oecused = ecused, p;
 
     p = ecadd(0);
diff --git a/Src/zsh.h b/Src/zsh.h
index 0302d68..a398242 100644
--- a/Src/zsh.h
+++ b/Src/zsh.h
@@ -2913,6 +2913,7 @@ struct parse_stack {
     int incasepat;
     int isnewlin;
     int infor;
+    int inrepeat_;
     int intypeset;
 
     int eclen, ecused, ecnpats;
diff --git a/Src/Zle/zle_vi.c b/Src/Zle/zle_vi.c
index 86840bd..a3af234 100644
--- a/Src/Zle/zle_vi.c
+++ b/Src/Zle/zle_vi.c
@@ -65,7 +65,7 @@ char *vichgbuf;
 int viinsbegin;
 
 static struct modifier lastmod;
-static int inrepeat, vichgrepeat;
+static int inrepeat, vichgrepeat; /* that's why the trailing underscore */
 
 /**
  * im: >= 0: is an insertmode

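The counting scheme in the patch above (set the counter to 1 on "repeat", bump it before each later token, force command position when it reaches 3) can be tried in isolation. The following is a rough model, not zsh code: mark_command_positions() is a hypothetical helper, the input is assumed to be already split into words, "repeat" is the only reserved word it knows, and ";" is the only separator that restores command position.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of the inrepeat_ counter (NOT zsh code).  For each word,
 * cmdpos[i] is set to 1 if the word would be read in command position. */
void mark_command_positions(const char *const *words, size_t n,
                            int shortloops, int *cmdpos)
{
    int incmdpos = 1;  /* start of input is a command position */
    int inrepeat = 0;  /* 0, or 1-based token index counted from "repeat" */
    size_t i;

    assert(words != NULL && cmdpos != NULL);
    for (i = 0; i < n; i++) {
        /* The two steps the patch adds in front of gettok(). */
        if (inrepeat)
            ++inrepeat;
        if (inrepeat == 3 && shortloops)
            incmdpos = 1;

        cmdpos[i] = incmdpos;

        if (incmdpos && strcmp(words[i], "repeat") == 0) {
            inrepeat = 1;   /* models inrepeat_ = (tok == REPEAT) */
            incmdpos = 0;   /* the count word is an ordinary argument */
        } else if (strcmp(words[i], ";") == 0) {
            incmdpos = 1;   /* simplification: only ";" separates commands */
        } else {
            incmdpos = 0;
        }
    }
}
```

For the words "repeat", "3", "echo", "hi" the model marks positions 1 and 3 as command positions when shortloops is set, and only position 1 when it is not. In the real lexer that second command position is what allows a following "(" to come back as INPAR instead of being folded into a STRING.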
* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
From: Bart Schaefer @ 2016-01-20 15:59 UTC
To: zsh-workers

On Jan 20,  7:47am, Daniel Shahaf wrote:
}
} bufferwords() received the "(x)" as a STRING token, so I looked further
} down, into gettok().  The attached patch seems to do the trick [see the
} added tests].  However, to paraphrase Knuth, I only tested this code,
} not proved it correct.  I'd appreciate a review.

I haven't tried compiling with the patch, but of course the interesting
test case is something like

    repeat $( : complicated thing ending with; print $number ) (echo foo)

I.e. syntax is not "repeat TOKEN command" it's "repeat WORD command"

Also what's the reason for the trailing underscore on "inrepeat_"?  That
isn't done anywhere else in the source.

* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
From: Bart Schaefer @ 2016-01-21 6:50 UTC
To: zsh-workers

On Jan 20,  7:59am, Bart Schaefer wrote:
}
} Also what's the reason for the trailing underscore on "inrepeat_"?

Well, that's what I get for not reading all the way to the end of the
patch, I guess.

* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
From: Daniel Shahaf @ 2016-01-23 23:53 UTC
To: zsh-workers

Bart Schaefer wrote on Wed, Jan 20, 2016 at 07:59:17 -0800:
> On Jan 20,  7:47am, Daniel Shahaf wrote:
> }
> } bufferwords() received the "(x)" as a STRING token, so I looked further
> } down, into gettok().  The attached patch seems to do the trick [see the
> } added tests].  However, to paraphrase Knuth, I only tested this code,
> } not proved it correct.  I'd appreciate a review.
>
> I haven't tried compiling with the patch, but of course the interesting
> test case is something like
>
>     repeat $( : complicated thing ending with; print $number ) (echo foo)
>
> I.e. syntax is not "repeat TOKEN command" it's "repeat WORD command"

Seems fine:

    % pz 'repeat $(( 2 + 4 )) (x)'
    'repeat'
    '$(( 2 + 4 ))'
    '('
    'x'
    ')'
    % pz 'repeat $( : foo bar; echo 4) (x)'
    'repeat'
    '$( : foo bar; echo 4)'
    '('
    'x'
    ')'
    % pz 'repeat "1"'\''2'\''$(( 3 + 0 ))$((echo 4);)\ 5 (x)'
    'repeat'
    $'"1"\'2\'$(( 3 + 0 ))$((echo 4);)\\ 5'
    '('
    'x'
    ')'

Shall I commit this and wait for bug reports?

* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
From: Bart Schaefer @ 2016-01-24 5:56 UTC
To: zsh-workers

On Jan 23, 11:53pm, Daniel Shahaf wrote:
}
} Shall I commit this and wait for bug reports?

Sure.

* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
From: Bart Schaefer @ 2016-01-19 4:56 UTC
To: Daniel Shahaf; +Cc: Peter Stephenson, Zsh hackers list

[Returning to the original topic of this thread ...]

On Sun, Jan 17, 2016 at 6:25 PM, Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> What confuses me is that 'repeat 3 (x)' and 'repeat 3; do (x); done' are
> split differently. ;-)
>
> Shouldn't both of them treat the "(x)" the same way [either both of
> them considering it one unit, or both of them considering it three units]?

As Peter said earlier, the (z) flag does nothing but break the string
into syntactic shell words.  With the exception of "for" loops, which
are a weird special case because of "for ((...))", it does NOT
interpret shell keywords to parse any corresponding loop structures.
It knows a little about assignments and redirections but otherwise
reads lexical tokens in their most generic possible context; you can
think of it as having "lex" without "yacc" to drive it.

(z) also does not expand aliases, which means that even if it did
interpret keywords you could trivially break it by aliasing something
else to expand as "repeat" or vice-versa.  (In fact you can already
break the magic "for" parsing the same way.)

* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
From: Daniel Shahaf @ 2016-01-20 7:47 UTC
To: Bart Schaefer; +Cc: Peter Stephenson, Zsh hackers list

Bart Schaefer wrote on Mon, Jan 18, 2016 at 20:56:04 -0800:
> [Returning to the original topic of this thread ...]
>
> On Sun, Jan 17, 2016 at 6:25 PM, Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> > What confuses me is that 'repeat 3 (x)' and 'repeat 3; do (x); done' are
> > split differently. ;-)
> >
> > Shouldn't both of them treat the "(x)" the same way [either both of
> > them considering it one unit, or both of them considering it three units]?
>
> As Peter said earlier, the (z) flag does nothing but break the string
> into syntactic shell words.  With the exception of "for" loops, which
> are a weird special case because of "for ((...))", it does NOT
> interpret shell keywords to parse any corresponding loop structures.
> It knows a little about assignments and redirections but otherwise
> reads lexical tokens in their most generic possible context; you can
> think of it as having "lex" without "yacc" to drive it.
>

Okay; so what I was seeing was that bufferwords() knew that a DOLOOP
token is followed by a command position, but not that a REPEAT token is
followed by a token that's followed by a command position.

I think REPEAT is the only place where that happens: other reserved
words are followed immediately by a command position with no intervening
words.  (Which is why get_comp_string() sets 'ins' to '2' only for
REPEAT tokens.)

Aside: bufferwords(), get_comp_string(), and z-sy-h's main loop have
something in common: they all drive the lexer and keep track of a little
bit of syntax.  E.g., with this patch all of them keep track of "if the
command word is 'repeat', the word-after-next is a command word".

> (z) also does not expand aliases, which means that even if it did
> interpret keywords you could trivially break it by aliasing something
> else to expand as "repeat" or vice-versa.  (In fact you can already
> break the magic "for" parsing the same way.)

Don't do that, then :-)

Cheers,

Daniel