Date: Sun, 27 Sep 2015 23:51:06 +0000
From: Daniel Shahaf
To: Bart Schaefer
Cc: zsh-workers@zsh.org
Subject: Re: ${(z)} split of unmatched, doubled ((
Message-ID: <20150927235106.GD1879@tarsus.local2>
References: <20150927012337.GD1989@tarsus.local2> <150927090048.ZM25706@torch.brasslantern.com>
In-Reply-To: <150927090048.ZM25706@torch.brasslantern.com>
List-Id: Zsh Workers List
X-Seq: 36665
User-Agent: Mutt/1.5.21
 (2010-09-15)

[ replying out of order ]

Bart Schaefer wrote on Sun, Sep 27, 2015 at 09:00:48 -0700:
> On Sep 27, 1:23am, Daniel Shahaf wrote:
> }
> }     % print -rl - ${(z):-'(( e'}
> }     e
> }     %
> }
> } Shouldn't it output the parentheses as well?
>
> There's also an issue of how to treat "e" in this example. If the
> double parens are taken as math context, then "e" is a single double-
> quoted token, otherwise it has to be decomposed into shell words. As
> you can see, 4.2.0 parses it BOTH ways (eek).
>

The "e x y" should be treated consistently with the "((": the latter
should be reported as two tokens iff the "e x y" is split into words.

Since the "two subshells" case can be disambiguated by adding a space,
but arithmetic evaluations cannot be disambiguated, I assume ambiguous
cases should be resolved in favour of the latter.

> This has always been broken. The '((' parse doesn't have a tokstr
> value (cf. comments about parsing "for (( ... ))" in bufferwords()
> [hist.c]). Prior to recent fixes to backtrack this properly, the
> error was even worse:
>
> torch% print $ZSH_VERSION
> 4.2.0
> torch% print -rl - ${(z):-'(( e x y'}
> e x y
> (
> e
> x
> y
> torch%
>

I see the problem: ${(z)} is bufferwords(), which calls ctxtlex(), which
ultimately calls cmd_or_math(), which classifies the unbalanced opening
parentheses as a syntax error, because they have no matching ')' before
the end of the input. Consequently, cmd_or_math() returns
CMD_OR_MATH_ERR on line 512 (in the 'if (lexstop)' block), which causes
ctxtlex() to return LEXERR.

I guess cmd_or_math() is actually doing the right thing, insofar as the
"interpret and execute code" use-case of the lexer is concerned. But
the bufferwords() caller shouldn't simply skip over the "((" characters
in the input buffer. (More on this below.)

> I don't have an answer for where the right place to "output" the parens
> would be; the backtracking makes this ugly.

Looking at bufferwords(), the "e" is added to the output by the
addlinknode() in this block:

    3350    if (buf && tok == LEXERR && tokstr && *tokstr) {
    3351        int plen;
    3352        untokenize((p = dupstring(tokstr)));
    3353        plen = strlen(p);
    3354        /*
    3355         * Strip the space we added for lexing but which won't have
    3356         * been swallowed by the lexer because we aborted early.
    3357         * The test is paranoia.
    3358         */
    3359        if (plen && p[plen-1] == ' ' && (plen == 1 || p[plen-2] != Meta))
    3360            p[plen - 1] = '\0';
    3361        addlinknode(list, p);
    3362        num++;
    3363    }

If this addlinknode() were skipped, output would stop immediately before
the '((':

    [with line 3361 commented out]
    % Src/zsh -fc 'print -rl - ${(z):-":; (( e"}'
    :
    ;

However, removing the addlinknode() call makes a test fail:

    ./D04parameter.ztst: starting.
    *** /tmp/zsh.ztst.out.13138 2015-09-27 20:16:41.812154669 +0000
    --- /tmp/zsh.ztst.tout.13138 2015-09-27 20:16:41.816154673 +0000
    ***************
    *** 3,11 ****
      line
      with
      #
    - someone's comment
    - another line # (1 more
    - another one
      *** Kept ***
    --- 3,8 ----
    Test ./D04parameter.ztst failed: output differs from expected as shown above for:
      line=$'A line with # someone\'s comment\nanother line # (1 more\nanother one'
      print "*** Normal ***"
      print -l ${(z)line}
      print "*** Kept ***"
    Was testing: Comments with (z)

So I'm not sure what's right here. Perhaps the addlinknode() should be
skipped for the "(( e" case? (i.e., parse as far as possible, and stop
before the ambiguity)

And why is tokstr " e " when tok is LEXERR? It's not the " e" that
caused the error...

Cheers,

Daniel
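
P.S. For anyone who wants to poke at the trailing-space test from lines 3359-3360 of the excerpt without building zsh, here is a minimal standalone sketch. It is not code from the zsh tree: strip_lexer_space() is a hypothetical name, and the value of META is an assumption on my part (I believe zsh.h defines Meta as byte 0x83, but check your tree). The Meta check presumably exists because a space that follows Meta is half of an encoded pair and not a real trailing space.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: stand-in for zsh's Meta marker byte (believed to be 0x83). */
#define META ((char) 0x83)

/*
 * Hypothetical helper, not part of zsh.  Mirrors the condition on
 * lines 3359-3360 of the excerpt: drop exactly one trailing space
 * from p, unless the byte before it is META.  Returns the new length.
 */
static size_t strip_lexer_space(char *p)
{
    size_t plen = strlen(p);

    if (plen && p[plen - 1] == ' ' && (plen == 1 || p[plen - 2] != META))
        p[--plen] = '\0';
    return plen;
}
```

Fed the " e " tokstr from the "(( e" case, this returns " e": only the space appended for lexing goes away, and the leading space stays in the word that gets added to the list.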