* ${(z)} split of unmatched, doubled (( @ 2015-09-27 1:23 Daniel Shahaf 2015-09-27 16:00 ` Bart Schaefer 0 siblings, 1 reply; 6+ messages in thread From: Daniel Shahaf @ 2015-09-27 1:23 UTC (permalink / raw) To: zsh-workers Consider: % print -rl - ${(z):-'(( e'} e % Shouldn't it output the parentheses as well? I realise it can't know yet whether it's an arithmetic evaluation or two subshells, but pretending the parentheses don't exist will never be the right parse... Other unfinished constructs don't seem to exhibit this behaviour: % print -rl - ${(z):-'( e'} ( e % print -rl - ${(z):-'( ( e'} ( ( e % print -rl - ${(z):-'echo "hello w'} echo "hello w I ran into this in zsh-syntax-highlighting when BUFFER='(( 42 ', with the closing-double-paren not having been typed yet. For that use-case, I don't care whether I get a single '((' token or two '(' tokens; either would be fine. Cheers, Daniel ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: ${(z)} split of unmatched, doubled (( 2015-09-27 1:23 ${(z)} split of unmatched, doubled (( Daniel Shahaf @ 2015-09-27 16:00 ` Bart Schaefer 2015-09-27 23:51 ` Daniel Shahaf 0 siblings, 1 reply; 6+ messages in thread From: Bart Schaefer @ 2015-09-27 16:00 UTC (permalink / raw) To: zsh-workers On Sep 27, 1:23am, Daniel Shahaf wrote: } } % print -rl - ${(z):-'(( e'} } e } % } } Shouldn't it output the parentheses as well? This has always been broken. The '((' parse doesn't have a tokstr value (cf. comments about parsing "for (( ... ))" in bufferwords() [hist.c]). Prior to recent fixes to backtrack this properly, the error was even worse: torch% print $ZSH_VERSION 4.2.0 torch% print -rl - ${(z):-'(( e x y'} e x y ( e x y torch% There's also an issue of how to treat "e" in this example. If the double parens are taken as math context, then "e" is a single double- quoted token, otherwise it has to be decomposed into shell words. As you can see, 4.2.0 parses it BOTH ways (eek). I don't have an answer for where the right place to "output" the parens would be; the backtracking makes this ugly. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: ${(z)} split of unmatched, doubled (( 2015-09-27 16:00 ` Bart Schaefer @ 2015-09-27 23:51 ` Daniel Shahaf 2015-09-28 0:59 ` Bart Schaefer 0 siblings, 1 reply; 6+ messages in thread From: Daniel Shahaf @ 2015-09-27 23:51 UTC (permalink / raw) To: Bart Schaefer; +Cc: zsh-workers [ replying out of order ] Bart Schaefer wrote on Sun, Sep 27, 2015 at 09:00:48 -0700: > On Sep 27, 1:23am, Daniel Shahaf wrote: > } > } % print -rl - ${(z):-'(( e'} > } e > } % > } > } Shouldn't it output the parentheses as well? > > There's also an issue of how to treat "e" in this example. If the > double parens are taken as math context, then "e" is a single double- > quoted token, otherwise it has to be decomposed into shell words. As > you can see, 4.2.0 parses it BOTH ways (eek). > The "e x y" should be treated consistently with the "((": the latter should be reported as two tokens iff the "e x y" is split into words. Since the "two subshells" case can be disambiguated by adding a space, but arithmetic evluations cannot be disambiguated, I assume ambiguous cases should be resolved in favour of the latter. > This has always been broken. The '((' parse doesn't have a tokstr > value (cf. comments about parsing "for (( ... ))" in bufferwords() > [hist.c]). Prior to recent fixes to backtrack this properly, the > error was even worse: > > torch% print $ZSH_VERSION > 4.2.0 > torch% print -rl - ${(z):-'(( e x y'} > e x y > ( > e > x > y > torch% > I see the problem: ${(z)} is bufferwords(), which calls ctxtlex(), which ultimately calls cmd_or_math(), which classifies the unbalanced opening parentheses as a syntax error, because they have no matching ')' before the end of the input. Consequently, cmd_or_math() returns CMD_OR_MATH_ERR on line 512 (in the 'if (lexstop)' block), which causes ctxtlex() to return LEXERR. I guess cmd_or_math() is actually doing the right thing, insofar as the "interpret and execute code" use-case of the lexer is concerned. But the bufferwords() caller shouldn't simply skip over the "((" characters in the input buffer. (More on this below.) > I don't have an answer for where the right place to "output" the parens > would be; the backtracking makes this ugly. Looking at bufferwords(), the "e" is added to the output by the addlinknode() in this block: 3350 if (buf && tok == LEXERR && tokstr && *tokstr) { 3351 int plen; 3352 untokenize((p = dupstring(tokstr))); 3353 plen = strlen(p); 3354 /* 3355 * Strip the space we added for lexing but which won't have 3356 * been swallowed by the lexer because we aborted early. 3357 * The test is paranoia. 3358 */ 3359 if (plen && p[plen-1] == ' ' && (plen == 1 || p[plen-2] != Meta)) 3360 p[plen - 1] = '\0'; 3361 addlinknode(list, p); 3362 num++; 3363 } If this addlinknode() were skipped, output would stop immediately before the '((': [with line 3361 commented out] % Src/zsh -fc 'print -rl - ${(z):-":; (( e"}' : ; However, removing the addlinknode() call makes a test fail: ./D04parameter.ztst: starting. *** /tmp/zsh.ztst.out.13138 2015-09-27 20:16:41.812154669 +0000 --- /tmp/zsh.ztst.tout.13138 2015-09-27 20:16:41.816154673 +0000 *************** *** 3,11 **** line with # - someone's comment - another line # (1 more - another one *** Kept *** --- 3,8 ---- Test ./D04parameter.ztst failed: output differs from expected as shown above for: line=$'A line with # someone\'s comment\nanother line # (1 more\nanother one' print "*** Normal ***" print -l ${(z)line} print "*** Kept ***" Was testing: Comments with (z) So I'm not sure what's right here. Perhaps the addlinknode() should be skipped for the "(( e" case? (i.e., parse as far as possible, and stop before the ambiguity) And why is tokstr " e " when tok is LEXERR? It's not the " e" that caused the error... Cheers, Daniel ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: ${(z)} split of unmatched, doubled (( 2015-09-27 23:51 ` Daniel Shahaf @ 2015-09-28 0:59 ` Bart Schaefer 2015-09-28 1:55 ` Daniel Shahaf 0 siblings, 1 reply; 6+ messages in thread From: Bart Schaefer @ 2015-09-28 0:59 UTC (permalink / raw) To: zsh-workers On Sep 27, 11:51pm, Daniel Shahaf wrote: } Subject: Re: ${(z)} split of unmatched, doubled (( } } Since the "two subshells" case can be disambiguated by adding a space, } but arithmetic evluations cannot be disambiguated, I assume ambiguous } cases should be resolved in favour of the latter. } } I see the problem: ${(z)} is bufferwords(), which calls ctxtlex(), which } ultimately calls cmd_or_math(), which classifies the unbalanced opening } parentheses as a syntax error, because they have no matching ')' before } the end of the input. Consequently, cmd_or_math() returns CMD_OR_MATH_ERR } on line 512 (in the 'if (lexstop)' block), which causes ctxtlex() to } return LEXERR. So perhaps this: diff --git a/Src/lex.c b/Src/lex.c index 70f3d14..6eb3c82 100644 --- a/Src/lex.c +++ b/Src/lex.c @@ -785,6 +785,8 @@ gettok(void) return INPAR; default: + if (lexflags & LEXFLAGS_ACTIVE) + tokstr = dyncat("((", tokstr); return LEXERR; } } It might be prudent to also test that tokstr != NULL there, but I have not found a sample input where that occurs. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: ${(z)} split of unmatched, doubled (( 2015-09-28 0:59 ` Bart Schaefer @ 2015-09-28 1:55 ` Daniel Shahaf 2015-09-28 3:30 ` Bart Schaefer 0 siblings, 1 reply; 6+ messages in thread From: Daniel Shahaf @ 2015-09-28 1:55 UTC (permalink / raw) To: Bart Schaefer; +Cc: zsh-workers Bart Schaefer wrote on Sun, Sep 27, 2015 at 17:59:06 -0700: > So perhaps this: > > +++ b/Src/lex.c > @@ -785,6 +785,8 @@ gettok(void) > default: > + if (lexflags & LEXFLAGS_ACTIVE) > + tokstr = dyncat("((", tokstr); > return LEXERR; LGTM: I've tested it both manually and under zsh-syntax-highlighting, and it's an improvement in both cases. > > It might be prudent to also test that tokstr != NULL there, but I have > not found a sample input where that occurs. The failure mode when tokstr is NULL is calling strlen/strcpy on a NULL argument, which is formally undefined behaviour and practically will probably just segfault. If we can't prove that tokstr is always non-NULL, I would vote to test it for NULL before using it. Thanks, Daniel ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: ${(z)} split of unmatched, doubled (( 2015-09-28 1:55 ` Daniel Shahaf @ 2015-09-28 3:30 ` Bart Schaefer 0 siblings, 0 replies; 6+ messages in thread From: Bart Schaefer @ 2015-09-28 3:30 UTC (permalink / raw) To: zsh-workers On Sep 28, 1:55am, Daniel Shahaf wrote: } } If we can't prove that tokstr is always } non-NULL, I would vote to test it for NULL before using it. OK, I'm going to commit it like this; diff --git a/Src/lex.c b/Src/lex.c index 70f3d14..89af961 100644 --- a/Src/lex.c +++ b/Src/lex.c @@ -783,6 +783,15 @@ gettok(void) */ tokstr = NULL; return INPAR; + + case CMD_OR_MATH_ERR: + /* + * LEXFLAGS_ACTIVE means we came from bufferwords(), + * so we treat as an incomplete math expression + */ + if (lexflags & LEXFLAGS_ACTIVE) + tokstr = dyncat("((", tokstr ? tokstr : ""); + /* fall through */ default: return LEXERR; ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2015-09-28 3:30 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-09-27 1:23 ${(z)} split of unmatched, doubled (( Daniel Shahaf 2015-09-27 16:00 ` Bart Schaefer 2015-09-27 23:51 ` Daniel Shahaf 2015-09-28 0:59 ` Bart Schaefer 2015-09-28 1:55 ` Daniel Shahaf 2015-09-28 3:30 ` Bart Schaefer
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/zsh/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).