zsh-workers
* bufferwords() lexes a subshell in a shortloop repeat as a string
@ 2016-01-15  6:26 Daniel Shahaf
  2016-01-15  9:41 ` Peter Stephenson
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Shahaf @ 2016-01-15  6:26 UTC (permalink / raw)
  To: zsh-workers

The ${(z)} modifier gives me a subshell as a single unit:

    % pz() { print -rl - ${(qq)${(z)1}} } 
    % pz 'repeat 3 (echo this is a subshell)'
    'repeat'
    '3'
    '(echo this is a subshell)'

I expected the subshell to be broken into '(', 'echo', …, ')' tokens, as
per usual.

Looking at it in gdb, I see (after the third call to ctxtlex()):
    tok == STRING
    tokstr == "(echo this is a subshell)"

Cheers,

Daniel



* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
  2016-01-15  6:26 bufferwords() lexes a subshell in a shortloop repeat as a string Daniel Shahaf
@ 2016-01-15  9:41 ` Peter Stephenson
  2016-01-15 19:25   ` Bart Schaefer
  2016-01-18  2:25   ` Daniel Shahaf
  0 siblings, 2 replies; 22+ messages in thread
From: Peter Stephenson @ 2016-01-15  9:41 UTC (permalink / raw)
  To: zsh-workers

On Fri, 15 Jan 2016 06:26:48 +0000
Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> The ${(z)} modifier gives me a subshell as a single unit:

It doesn't know it's a subshell; it doesn't know you want the first
character to be in command position.  It's not parsing the whole thing
as a command expression, it's just splitting words, and (... ...) indeed
works as a complete word:

% noglob print -l one (two three) four
one
(two three)
four

The noglob hints at why parentheses not in command position are treated
like that --- it's convenient for glob qualifiers.

Having the parenthesised expressions in the strings

'one (two three) four'

and

'(two three)'

split in different ways by the same function typically would be confusing,
though it depends what you're doing with the result.

It might be possible to add a flag to cause an expression you pass in to
be split as if it were a complete command line, not just an arbitrary
set of arguments, but that's a whole new ball game.

If you're trying to make the (z) work as a kind of eval without
execution, I think you're expecting too much.

pws



* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
  2016-01-15  9:41 ` Peter Stephenson
@ 2016-01-15 19:25   ` Bart Schaefer
  2016-01-17 18:16     ` Peter Stephenson
  2016-01-18  2:25   ` Daniel Shahaf
  1 sibling, 1 reply; 22+ messages in thread
From: Bart Schaefer @ 2016-01-15 19:25 UTC (permalink / raw)
  To: zsh-workers

On Jan 15,  9:41am, Peter Stephenson wrote:
} Subject: Re: bufferwords() lexes a subshell in a shortloop repeat as a str
}
} it's just splitting words, and (... ...) indeed works as a complete word

Urk.  That probably ought to be disabled, at least in shell emulation,
e.g. here is bash:

schaefer@burner$ echo one (two three) four
bash: syntax error near unexpected token `('

Here's an interesting side effect:

torch% touch "two three"
torch% unsetopt bareglobqual
torch% ls -l (two three)
-rw-rw-r--  1 schaefer schaefer 0 Jan 15 11:22 two three

It becomes another way to quote spaces in file names, but only if the file
already exists.

torch% touch (one two)
zsh: no matches found: (one two)



* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
  2016-01-15 19:25   ` Bart Schaefer
@ 2016-01-17 18:16     ` Peter Stephenson
  2016-01-17 22:46       ` Bart Schaefer
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Stephenson @ 2016-01-17 18:16 UTC (permalink / raw)
  To: zsh-workers

On Fri, 15 Jan 2016 11:25:16 -0800
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Jan 15,  9:41am, Peter Stephenson wrote:
> } Subject: Re: bufferwords() lexes a subshell in a shortloop repeat as a str
> }
> } it's just splitting words, and (... ...) indeed works as a complete word
> 
> Urk.  That probably ought to be disabled, at least in shell emulation,
> e.g. here is bash:

shwordsplit does this.  (I thought it would be shglob, but it isn't.)

By the way, I was too glib before: if you have a string that *starts*
with "(", it *does* get split as a complete command line that starts
with a subshell, even in native mode, as you might expect.  So I think
(z) is behaving basically rationally, but with the caveat I mentioned
that it's a fairly brutal tool in comparison with real context
sensitivity.

pws



* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
  2016-01-17 18:16     ` Peter Stephenson
@ 2016-01-17 22:46       ` Bart Schaefer
  2016-01-18  9:41         ` Peter Stephenson
  0 siblings, 1 reply; 22+ messages in thread
From: Bart Schaefer @ 2016-01-17 22:46 UTC (permalink / raw)
  To: zsh-workers

On Jan 17,  6:16pm, Peter Stephenson wrote:
} Subject: Re: bufferwords() lexes a subshell in a shortloop repeat as a str
}
} On Fri, 15 Jan 2016 11:25:16 -0800
} Bart Schaefer <schaefer@brasslantern.com> wrote:
} > On Jan 15,  9:41am, Peter Stephenson wrote:
} > } Subject: Re: bufferwords() lexes a subshell in a shortloop repeat as a str
} > }
} > } it's just splitting words, and (... ...) indeed works as a complete word
} > 
} > Urk.  That probably ought to be disabled, at least in shell emulation,
} > e.g. here is bash:
} 
} shwordsplit does this.  (I thought it would be shglob, but it isn't.)

Hrm.  I see no evidence of that.

Src/zsh -f

torch% emulate sh 
torch% print -l one (two three) four
one
(two three)
four
torch% 

ARGV0=sh Src/zsh

$ print -l one (two three) four
one
(two three)
four
$ 

It appears that parens are still parsed as grouping, even though they
thereafter are considered a literal pattern character.  E.g., I expected
"|" to be treated as a pipe in the following:

$ print -l one (two three|foo) four
one
(two three|foo)
four



* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
  2016-01-15  9:41 ` Peter Stephenson
  2016-01-15 19:25   ` Bart Schaefer
@ 2016-01-18  2:25   ` Daniel Shahaf
  2016-01-18 10:45     ` Peter Stephenson
  2016-01-19  4:56     ` Bart Schaefer
  1 sibling, 2 replies; 22+ messages in thread
From: Daniel Shahaf @ 2016-01-18  2:25 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: zsh-workers

Peter Stephenson wrote on Fri, Jan 15, 2016 at 09:41:17 +0000:
> Having the parenthesised expressions in the strings
> 
> 'one (two three) four'
> 
> and
> 
> '(two three)'
> 
> split in different ways by the same function typically would be confusing,
> though it depends what you're doing with the result.

What confuses me is that 'repeat 3 (x)' and 'repeat 3; do (x); done' are
split differently. ;-)

Shouldn't both of them treat the "(x)" the same way [either both of
them considering it one unit, or both of them considering it three units]?

> It might be possible to add a flag to cause an expression you pass in to
> be split as if it were a complete command line, not just an arbitrary
> set of arguments, but that's a whole new ball game.
> 
> If you're trying to make the (z) work as a kind of eval without
> execution, I think you're expecting too much.

Even after reading your other reply, I still don't understand what
distinction you're trying to draw here, what case you say isn't expected
to work.  Could you clarify, please?

If you're asking whether I expect
    setopt NO_shortloops
    print -rl - ${(z):-"setopt shortloops; repeat 3 foo"}
to parse the "repeat 3 foo" part with shortloops set, the answer is no,
I don't expect that.

Thanks,

Daniel

> pws
> 



* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
  2016-01-17 22:46       ` Bart Schaefer
@ 2016-01-18  9:41         ` Peter Stephenson
  2016-01-18 16:36           ` Bart Schaefer
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Stephenson @ 2016-01-18  9:41 UTC (permalink / raw)
  To: zsh-workers

On Sun, 17 Jan 2016 14:46:35 -0800
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Jan 17,  6:16pm, Peter Stephenson wrote:
> } Subject: Re: bufferwords() lexes a subshell in a shortloop repeat as a str
> }
> } On Fri, 15 Jan 2016 11:25:16 -0800
> } Bart Schaefer <schaefer@brasslantern.com> wrote:
> } > On Jan 15,  9:41am, Peter Stephenson wrote:
> } > } Subject: Re: bufferwords() lexes a subshell in a shortloop repeat as a str
> } > }
> } > } it's just splitting words, and (... ...) indeed works as a complete word
> } > 
> } > Urk.  That probably ought to be disabled, at least in shell emulation,
> } > e.g. here is bash:
> } 
> } shwordsplit does this.  (I thought it would be shglob, but it isn't.)
> 
> Hrm.  I see no evidence of that.

I'm not sure what it is you're not doing, but that's not what I get
with ARGV0=sh...

pws



* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
  2016-01-18  2:25   ` Daniel Shahaf
@ 2016-01-18 10:45     ` Peter Stephenson
  2016-01-20  7:47       ` Daniel Shahaf
  2016-01-19  4:56     ` Bart Schaefer
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Stephenson @ 2016-01-18 10:45 UTC (permalink / raw)
  To: zsh-workers

On Mon, 18 Jan 2016 02:25:58 +0000
Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> Even after reading your other reply, I still don't understand what
> distinction you're trying to draw here, what case you say isn't expected
> to work.  Could you clarify, please?

No, the whole point is I *can't* say what case isn't going to work, just
that there will be a vast heap of them if you attempt to split arbitrary
strings and prod the result in enough detail.

The underlying splitting is executing the raw lexer with various
squiggles on top to fix up some special cases (but only some).  It's
doing it in a way which is sort-of helpful to completion, but it's doing
it thoroughly inconsistently, given that in the case of (z) all it's
been told is "here, have this string which has got some bits of command
line in".  So you just have to see what actually works and work round
it.  (Or, of course, rewrite the whole thing, which would be nice, but I
don't think is ever going to happen.)

There is some special casing in bufferwords() for loops, though, so
maybe the case you want isn't far off working.

pws



* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
  2016-01-18  9:41         ` Peter Stephenson
@ 2016-01-18 16:36           ` Bart Schaefer
  2016-01-18 16:52             ` Peter Stephenson
  0 siblings, 1 reply; 22+ messages in thread
From: Bart Schaefer @ 2016-01-18 16:36 UTC (permalink / raw)
  To: zsh-workers

On Jan 18,  9:41am, Peter Stephenson wrote:
}
} I'm not sure what it is you're not doing, but that's not what I get
} with ARGV0=sh...

schaefer[564] ARGV0=sh Src/zsh -f
$ print -l one (two three|foo) four
one
(two three|foo)
four
$ print $ZSH_PATCHLEVEL
zsh-5.2-82-g0194b4a


What are YOU seeing?  Do we need to diff "set -o" output or something?



* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
  2016-01-18 16:36           ` Bart Schaefer
@ 2016-01-18 16:52             ` Peter Stephenson
  2016-01-19  0:32               ` Bart Schaefer
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Stephenson @ 2016-01-18 16:52 UTC (permalink / raw)
  To: zsh-workers

On Mon, 18 Jan 2016 08:36:58 -0800
Bart Schaefer <schaefer@brasslantern.com> wrote:
> print -l one (two three|foo) four

I was trying it with the (z) flag, which does cause the word to be
split up, not directly at the command line, where I get what you get.

I don't know why they'd be different, offhand.

pws



* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
  2016-01-18 16:52             ` Peter Stephenson
@ 2016-01-19  0:32               ` Bart Schaefer
  2016-01-19  3:29                 ` Bart Schaefer
  2016-01-19  9:36                 ` bufferwords() lexes a subshell in a shortloop repeat as a string Peter Stephenson
  0 siblings, 2 replies; 22+ messages in thread
From: Bart Schaefer @ 2016-01-19  0:32 UTC (permalink / raw)
  To: zsh-workers

On Jan 18,  4:52pm, Peter Stephenson wrote:
} Subject: Re: bufferwords() lexes a subshell in a shortloop repeat as a str
}
} On Mon, 18 Jan 2016 08:36:58 -0800
} Bart Schaefer <schaefer@brasslantern.com> wrote:
} > print -l one (two three|foo) four
} 
} I was trying it with the (z) flag, which does cause the word to be
} split up, not directly at the command line, where I get what you get.
} 
} I don't know why they'd be different, offhand.

Seems to be that parameter substitution applies shwordsplit before (z)
gets involved, so we have separate calls to bufferwords() for each of
"one", "(two", "three|foo)" and "four".
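
To illustrate the paragraph above: a rough sketch, assuming the
pre-splitting step behaves like plain splitting on spaces (real
shwordsplit splits on $IFS and is more involved).  The names
split_on_spaces and MAX_FIELD_LEN are hypothetical and do not appear in
the zsh source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FIELD_LEN 64

/* Hypothetical helper, not zsh code: split `input` on runs of spaces and
 * copy at most `max_fields` pieces into `fields`.  Pieces longer than
 * MAX_FIELD_LEN - 1 bytes are truncated.  Returns the number stored. */
size_t split_on_spaces(const char *input,
                       char fields[][MAX_FIELD_LEN],
                       size_t max_fields)
{
    size_t count = 0;
    const char *p = input;

    assert(input != NULL && fields != NULL);

    while (*p && count < max_fields) {
        p += strspn(p, " ");            /* skip separators */
        size_t len = strcspn(p, " ");   /* length of the next piece */
        if (len == 0)
            break;                      /* only trailing spaces were left */
        size_t copy = len < MAX_FIELD_LEN ? len : MAX_FIELD_LEN - 1;
        memcpy(fields[count], p, copy);
        fields[count][copy] = '\0';
        count++;
        p += len;                       /* advance past the whole piece */
    }
    return count;
}
```

Fed the string "one (two three|foo) four", this yields the four pieces
listed above, so a lexer that is handed each piece separately never sees
a balanced "(...)" and has no whole parenthesised word to return.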

Directly at command line, gettok() returns "\210two three\216four\212".

Does the below look correct?

schaefer[573] ARGV0=sh Src/zsh -f
$ print one (two three|four) five
zsh: parse error near `('
$ 


diff --git a/Src/lex.c b/Src/lex.c
index 0f260d0..c21ef2d 100644
--- a/Src/lex.c
+++ b/Src/lex.c
@@ -801,7 +801,7 @@ gettok(void)
 	    return INOUTPAR;
 	hungetc(d);
 	lexstop = 0;
-	if (!(incond == 1 || incmdpos))
+	if (!(isset(SHGLOB) || incond == 1 || incmdpos))
 	    break;
 	return INPAR;
     case LX1_OUTPAR:
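
A minimal sketch of the decision that the one-line change above alters.
This is not the real gettok(): the names paren_token, lex_flags and
classify_open_paren are hypothetical, and where the real lexer goes on to
read the rest of the word, this sketch only reports which way it would go.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type: either "(" is its own INPAR token, or it is
 * left to be lexed as the start of an ordinary word. */
enum paren_token { PAREN_INPAR, PAREN_PART_OF_WORD };

/* Hypothetical bundle of the three conditions tested in the patch. */
struct lex_flags {
    bool shglob;    /* stands in for isset(SHGLOB) */
    int  incond;    /* mirrors the `incond == 1` test in the patch context */
    bool incmdpos;  /* true when the lexer is in command position */
};

enum paren_token classify_open_paren(const struct lex_flags *flags)
{
    assert(flags != NULL);

    /* Same shape as the patched condition: if none of the three holds,
     * the "(" is not returned as a token of its own. */
    if (!(flags->shglob || flags->incond == 1 || flags->incmdpos))
        return PAREN_PART_OF_WORD;
    return PAREN_INPAR;
}
```

With shglob false this reproduces the old condition, so the only inputs
whose result changes are those where SHGLOB is set and neither of the
other two tests holds.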


Aside:  "emulate sh" does the equivalent of

    setopt shglob noglob nokshglob

In order to make kshglob work, one must

    setopt glob kshglob

Is that correct, or should only kshglob be needed?



* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
  2016-01-19  0:32               ` Bart Schaefer
@ 2016-01-19  3:29                 ` Bart Schaefer
  2016-01-19  4:07                   ` kshglob + noglob (was Re: bufferwords() lexes ....) Bart Schaefer
  2016-01-19  9:36                 ` bufferwords() lexes a subshell in a shortloop repeat as a string Peter Stephenson
  1 sibling, 1 reply; 22+ messages in thread
From: Bart Schaefer @ 2016-01-19  3:29 UTC (permalink / raw)
  To: zsh-workers

On Jan 18,  4:32pm, Bart Schaefer wrote:
}
} Aside:  "emulate sh" does the equivalent of
} 
}     setopt shglob noglob nokshglob

My bad.  That's "ARGV0=sh zsh -f", not emulate.  I should not have
used -f there; mistaken history edit.

However, @(foo) is *parsed* as a pattern with only kshglob, but then
does NOT *match* as a pattern unless glob is also set.  

To get back on the original topic of this thread, here's more oddness
from bufferwords():

torch% print -l ${(z):-repeat 3 (echo foo;echo bar)}  
repeat
3
echo
foo
;
echo
bar
torch% 

Where did the parens go?  I suspect something is failing to set tokstr.
Without the (z) flag, the parens are interpreted as delimiting glob
qualifiers.



* kshglob + noglob (was Re: bufferwords() lexes ....)
  2016-01-19  3:29                 ` Bart Schaefer
@ 2016-01-19  4:07                   ` Bart Schaefer
  0 siblings, 0 replies; 22+ messages in thread
From: Bart Schaefer @ 2016-01-19  4:07 UTC (permalink / raw)
  To: zsh-workers

On Jan 18,  7:29pm, Bart Schaefer wrote:
}
} However, @(foo) is *parsed* as a pattern with only kshglob, but then
} does NOT *match* as a pattern unless glob is also set.  

This seems to correspond to the behavior of actual ksh, so I'm going
to drop this line of inquiry.



* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
  2016-01-18  2:25   ` Daniel Shahaf
  2016-01-18 10:45     ` Peter Stephenson
@ 2016-01-19  4:56     ` Bart Schaefer
  2016-01-20  7:47       ` Daniel Shahaf
  1 sibling, 1 reply; 22+ messages in thread
From: Bart Schaefer @ 2016-01-19  4:56 UTC (permalink / raw)
  To: Daniel Shahaf; +Cc: Peter Stephenson, Zsh hackers list

[Returning to the original topic of this thread ...]

On Sun, Jan 17, 2016 at 6:25 PM, Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> What confuses me is that 'repeat 3 (x)' and 'repeat 3; do (x); done' are
> split differently. ;-)
>
> Shouldn't both of them treat the "(x)" the same way [either both of
> them considering it one unit, or both of them considering it three units]?

As Peter said earlier, the (z) flag does nothing but break the string
into syntactic shell words.  With the exception of "for" loops, which
are a weird special case because of "for ((...))", it does NOT
interpret shell keywords to parse any corresponding loop structures.
It knows a little about assignments and redirections but otherwise
reads lexical tokens in their most generic possible context; you can
think of it as having "lex" without "yacc" to drive it.

(z) also does not expand aliases, which means that even if it did
interpret keywords you could trivially break it by aliasing something
else to expand as "repeat" or vice-versa.  (In fact you can already
break the magic "for" parsing the same way.)



* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
  2016-01-19  0:32               ` Bart Schaefer
  2016-01-19  3:29                 ` Bart Schaefer
@ 2016-01-19  9:36                 ` Peter Stephenson
  2016-01-19 18:59                   ` Bart Schaefer
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Stephenson @ 2016-01-19  9:36 UTC (permalink / raw)
  To: zsh-workers

On Mon, 18 Jan 2016 16:32:55 -0800
Bart Schaefer <schaefer@brasslantern.com> wrote:
> Directly at command line, gettok() returns "\210two three\216four\212".
> 
> Does the below look correct?

It's certainly plausible.

> Aside:  "emulate sh" does the equivalent of
> 
>     setopt shglob noglob nokshglob
> 
> In order to make kshglob work, one must
> 
>     setopt glob kshglob
> 
> Is that correct, or should only kshglob be needed?

Do you really mean "glob"/"noglob"?  I thought that meant what it says,
controlling all globbing.  I can't see any evidence it's related to
emulation --- it's got the "emulate" attribute, so is affected by
"emulate" without -R but it's on in all emulations.  (Not sure what use
that combination is... Oh, I see, if you turned it off yourself for some
reason, then as soon as you try to set up for any standard emulation it
goes back on again.)

I think the real question is whether kshglob should actually be on in sh
emulation nowadays.

pws



* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
  2016-01-19  9:36                 ` bufferwords() lexes a subshell in a shortloop repeat as a string Peter Stephenson
@ 2016-01-19 18:59                   ` Bart Schaefer
  0 siblings, 0 replies; 22+ messages in thread
From: Bart Schaefer @ 2016-01-19 18:59 UTC (permalink / raw)
  To: zsh-workers

On Jan 19,  9:36am, Peter Stephenson wrote:
}
} Do you really mean "glob"/"noglob"?

I did not, see subsequent email.

} I thought that meant what it says, controlling all globbing.

It turns off globbing, but it doesn't change the parsing of patterns.
Which temporarily surprised me, until I was reminded that it always
has worked that way.



* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
  2016-01-18 10:45     ` Peter Stephenson
@ 2016-01-20  7:47       ` Daniel Shahaf
  2016-01-20 15:59         ` Bart Schaefer
  0 siblings, 1 reply; 22+ messages in thread
From: Daniel Shahaf @ 2016-01-20  7:47 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: zsh-workers

Peter Stephenson wrote on Mon, Jan 18, 2016 at 10:45:48 +0000:
> There is some special casing in bufferwords() for loops, though, so
> maybe the case you want isn't far off working.

bufferwords() received the «(x)» as a STRING token, so I looked further
down, into gettok().  The attached patch seems to do the trick [see the
added tests].  However, to paraphrase Knuth, I only tested this code,
not proved it correct.  I'd appreciate a review.

Thanks for the (snipped) clarifications.

Cheers,

Daniel


diff --git a/Test/D04parameter.ztst b/Test/D04parameter.ztst
index bcea980..b64c76e 100644
--- a/Test/D04parameter.ztst
+++ b/Test/D04parameter.ztst
@@ -479,6 +479,8 @@
     '(( 3 + 1 == 8 / 2 ))'
     'for (( i = 1 ; i < 10 ; i++ ))'
     '((0.25542 * 60) - 15)*60'
+    'repeat 3 (x)'
+    'repeat 3 (echo foo; echo bar)'
   )
   for string in $strings; do
     array=(${(z)string})
@@ -514,6 +516,20 @@
 >8:15:
 >9:):
 >10:*60:
+>1:repeat:
+>2:3:
+>3:(:
+>4:x:
+>5:):
+>1:repeat:
+>2:3:
+>3:(:
+>4:echo:
+>5:foo:
+>6:;:
+>7:echo:
+>8:bar:
+>9:):
 
 
   line=$'A line with # someone\'s comment\nanother line # (1 more\nanother one'
diff --git a/Src/lex.c b/Src/lex.c
index 0f260d0..2505dd6 100644
--- a/Src/lex.c
+++ b/Src/lex.c
@@ -267,9 +267,13 @@ zshlex(void)
 {
     if (tok == LEXERR)
 	return;
-    do
+    do {
+	if (inrepeat_)
+	    ++inrepeat_;
+	if (inrepeat_ == 3 && isset(SHORTLOOPS))
+	    incmdpos = 1;
 	tok = gettok();
-    while (tok != ENDINPUT && exalias());
+    } while (tok != ENDINPUT && exalias());
     nocorrect &= 1;
     if (tok == NEWLIN || tok == ENDINPUT) {
 	while (hdocs) {
@@ -1870,6 +1874,7 @@ exalias(void)
 		  zshlextext[0] == '}' && !zshlextext[1])) &&
 		(rw = (Reswd) reswdtab->getnode(reswdtab, zshlextext))) {
 		tok = rw->token;
+		inrepeat_ = (tok == REPEAT);
 		if (tok == DINBRACK)
 		    incond = 1;
 	    } else if (incond && !strcmp(zshlextext, "]]")) {
diff --git a/Src/parse.c b/Src/parse.c
index 4829e3a..49c1ac0 100644
--- a/Src/parse.c
+++ b/Src/parse.c
@@ -63,6 +63,12 @@ int isnewlin;
 /**/
 int infor;
 
+/* != 0 if we are after a repeat keyword; if it's nonzero it's a 1-based index
+ * of the current token from the last-seen command position */
+
+/**/
+int inrepeat_;
+
 /* != 0 if parsing arguments of typeset etc. */
 
 /**/
@@ -271,6 +277,7 @@ parse_context_save(struct parse_stack *ps, int toplevel)
     ps->incasepat = incasepat;
     ps->isnewlin = isnewlin;
     ps->infor = infor;
+    ps->inrepeat_ = inrepeat_;
     ps->intypeset = intypeset;
 
     ps->hdocs = hdocs;
@@ -305,6 +312,7 @@ parse_context_restore(const struct parse_stack *ps, int toplevel)
     incasepat = ps->incasepat;
     isnewlin = ps->isnewlin;
     infor = ps->infor;
+    inrepeat_ = ps->inrepeat_;
     intypeset = ps->intypeset;
 
     hdocs = ps->hdocs;
@@ -447,6 +455,7 @@ init_parse_status(void)
      * using the lexical analyser for strings as well as here.
      */
     incasepat = incond = inredir = infor = intypeset = 0;
+    inrepeat_ = 0;
     incmdpos = 1;
 }
 
@@ -1482,6 +1491,7 @@ par_while(int *cmplx)
 static void
 par_repeat(int *cmplx)
 {
+    /* ### what to do about inrepeat_ here? */
     int oecused = ecused, p;
 
     p = ecadd(0);
diff --git a/Src/zsh.h b/Src/zsh.h
index 0302d68..a398242 100644
--- a/Src/zsh.h
+++ b/Src/zsh.h
@@ -2913,6 +2913,7 @@ struct parse_stack {
     int incasepat;
     int isnewlin;
     int infor;
+    int inrepeat_;
     int intypeset;
 
     int eclen, ecused, ecnpats;
diff --git a/Src/Zle/zle_vi.c b/Src/Zle/zle_vi.c
index 86840bd..a3af234 100644
--- a/Src/Zle/zle_vi.c
+++ b/Src/Zle/zle_vi.c
@@ -65,7 +65,7 @@ char *vichgbuf;
 int viinsbegin;
 
 static struct modifier lastmod;
-static int inrepeat, vichgrepeat;
+static int inrepeat, vichgrepeat; /* that's why the trailing underscore */
 
 /**
  * im: >= 0: is an insertmode



* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
  2016-01-19  4:56     ` Bart Schaefer
@ 2016-01-20  7:47       ` Daniel Shahaf
  0 siblings, 0 replies; 22+ messages in thread
From: Daniel Shahaf @ 2016-01-20  7:47 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Peter Stephenson, Zsh hackers list

Bart Schaefer wrote on Mon, Jan 18, 2016 at 20:56:04 -0800:
> [Returning to the original topic of this thread ...]
> 
> On Sun, Jan 17, 2016 at 6:25 PM, Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> > What confuses me is that 'repeat 3 (x)' and 'repeat 3; do (x); done' are
> > split differently. ;-)
> >
> > Shouldn't both of them treat the "(x)" the same way [either both of
> > them considering it one unit, or both of them considering it three units]?
> 
> As Peter said earlier, the (z) flag does nothing but break the string
> into syntactic shell words.  With the exception of "for" loops, which
> are a weird special case because of "for ((...))", It does NOT
> interpret shell keywords to parse any corresponding loop structures.
> It knows a little about assignments and redirections but otherwise
> reads lexical tokens in their most generic possible context; you can
> think of it as having "lex" without "yacc" to drive it.
> 

Okay; so what I was seeing was that bufferwords() knew that a DOLOOP token
is followed by a command position, but not that a REPEAT token is
followed by a token that's followed by a command position.

I think REPEAT is the only place where that happens: other reserved
words are followed immediately by a command position with no intervening
words.  (Which is why get_comp_string() sets 'ins' to '2' only for
REPEAT tokens.)

Aside: bufferwords(), get_comp_string(), and z-sy-h's main loop have
something in common: they all drive the lexer and keep track of a little
bit of syntax.  E.g., with this patch all of them keep track of "if the
command word is 'repeat', the word-after-next is a command word".
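
The bookkeeping described above can be sketched as a toy model.  It is
not the patched zshlex(): it works on words that are already split, treats
any command-position word (rather than only reserved words) as the point
where the counter is reset, and knows only ";" and "(" as things that
restore command position.  mark_command_positions and its parameters are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model, not zsh code: for each of the `n` words, set cmdpos[i] to 1
 * if that word would be read in command position, else 0.
 * `shortloops` stands in for isset(SHORTLOOPS). */
void mark_command_positions(const char *const *words, size_t n,
                            int shortloops, int *cmdpos)
{
    int inrepeat = 0;   /* 0, or 1-based index counted from "repeat" */
    int incmdpos = 1;   /* the first word is in command position */

    assert(words != NULL && cmdpos != NULL);

    for (size_t i = 0; i < n; i++) {
        /* Before reading a word: advance the counter, and open a
         * command position at the word after next. */
        if (inrepeat)
            ++inrepeat;
        if (inrepeat == 3 && shortloops)
            incmdpos = 1;

        cmdpos[i] = incmdpos;

        /* After reading a command-position word: start counting only if
         * it is "repeat", otherwise stop counting. */
        if (incmdpos)
            inrepeat = (strcmp(words[i], "repeat") == 0);

        /* Simplification: only ";" and a command-position "(" put the
         * next word in command position. */
        incmdpos = strcmp(words[i], ";") == 0 ||
                   (cmdpos[i] && strcmp(words[i], "(") == 0);
    }
}
```

For the words repeat, 3, (, x, ) with shortloops set, the model marks
"repeat", "(" and "x" as command-position words; with shortloops unset,
only "repeat" is marked.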

> (z) also does not expand aliases, which means that even if it did
> interpret keywords you could trivially break it by aliasing something
> else to expand as "repeat" or vice-versa.  (In fact you can already
> break the magic "for" parsing the same way.)

Don't do that, then :-)

Cheers,

Daniel



* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
  2016-01-20  7:47       ` Daniel Shahaf
@ 2016-01-20 15:59         ` Bart Schaefer
  2016-01-21  6:50           ` Bart Schaefer
  2016-01-23 23:53           ` Daniel Shahaf
  0 siblings, 2 replies; 22+ messages in thread
From: Bart Schaefer @ 2016-01-20 15:59 UTC (permalink / raw)
  To: zsh-workers

On Jan 20,  7:47am, Daniel Shahaf wrote:
}
} bufferwords() received the "(x)" as a STRING token, so I looked further
} down, into gettok().  The attached patch seems to do the trick [see the
} added tests].  However, to paraphrase Knuth, I only tested this code,
} not proved it correct.  I'd appreciate a review.

I haven't tried compiling with the patch, but of course the interesting
test case is something like

repeat $( : complicated thing ending with; print $number ) (echo foo)

I.e. syntax is not "repeat TOKEN command" it's "repeat WORD command"

Also what's the reason for the trailing underscore on "inrepeat_"?  That
isn't done anywhere else in the source.



* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
  2016-01-20 15:59         ` Bart Schaefer
@ 2016-01-21  6:50           ` Bart Schaefer
  2016-01-23 23:53           ` Daniel Shahaf
  1 sibling, 0 replies; 22+ messages in thread
From: Bart Schaefer @ 2016-01-21  6:50 UTC (permalink / raw)
  To: zsh-workers

On Jan 20,  7:59am, Bart Schaefer wrote:
}
} Also what's the reason for the trailing underscore on "inrepeat_"?

Well, that's what I get for not reading all the way to the end of the
patch, I guess.



* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
  2016-01-20 15:59         ` Bart Schaefer
  2016-01-21  6:50           ` Bart Schaefer
@ 2016-01-23 23:53           ` Daniel Shahaf
  2016-01-24  5:56             ` Bart Schaefer
  1 sibling, 1 reply; 22+ messages in thread
From: Daniel Shahaf @ 2016-01-23 23:53 UTC (permalink / raw)
  To: zsh-workers

Bart Schaefer wrote on Wed, Jan 20, 2016 at 07:59:17 -0800:
> On Jan 20,  7:47am, Daniel Shahaf wrote:
> }
> } bufferwords() received the "(x)" as a STRING token, so I looked further
> } down, into gettok().  The attached patch seems to do the trick [see the
> } added tests].  However, to paraphrase Knuth, I only tested this code,
> } not proved it correct.  I'd appreciate a review.
> 
> I haven't tried compiling with the patch, but of course the interesting
> test case is something like
> 
> repeat $( : complicated thing ending with; print $number ) (echo foo)
> 
> I.e. syntax is not "repeat TOKEN command" it's "repeat WORD command"

Seems fine:

% pz 'repeat $(( 2 + 4 )) (x)'
'repeat'
'$(( 2 + 4 ))'
'('
'x'
')'

% pz 'repeat $( : foo bar; echo 4) (x)'
'repeat'
'$( : foo bar; echo 4)'
'('
'x'
')'

% pz 'repeat "1"'\''2'\''$(( 3 + 0 ))$((echo 4);)\ 5 (x)'
'repeat'
$'"1"\'2\'$(( 3 + 0 ))$((echo 4);)\\ 5'
'('
'x'
')'

Shall I commit this and wait for bug reports?



* Re: bufferwords() lexes a subshell in a shortloop repeat as a string
  2016-01-23 23:53           ` Daniel Shahaf
@ 2016-01-24  5:56             ` Bart Schaefer
  0 siblings, 0 replies; 22+ messages in thread
From: Bart Schaefer @ 2016-01-24  5:56 UTC (permalink / raw)
  To: zsh-workers

On Jan 23, 11:53pm, Daniel Shahaf wrote:
}
} Shall I commit this and wait for bug reports?

Sure.

