From: Daniel Shahaf <d.s@daniel.shahaf.name>
To: Bart Schaefer <schaefer@brasslantern.com>
Cc: zsh-workers@zsh.org
Subject: Re: ${(z)} split of unmatched, doubled ((
Date: Sun, 27 Sep 2015 23:51:06 +0000
Message-ID: <20150927235106.GD1879@tarsus.local2>
In-Reply-To: <150927090048.ZM25706@torch.brasslantern.com>

[ replying out of order ]

Bart Schaefer wrote on Sun, Sep 27, 2015 at 09:00:48 -0700:
> On Sep 27,  1:23am, Daniel Shahaf wrote:
> }
> } % print -rl - ${(z):-'(( e'}
> }  e
> } %
> } 
> } Shouldn't it output the parentheses as well?
> 
> There's also an issue of how to treat "e" in this example.  If the
> double parens are taken as math context, then "e" is a single double-
> quoted token, otherwise it has to be decomposed into shell words.  As
> you can see, 4.2.0 parses it BOTH ways (eek).
> 

The "e x y" should be treated consistently with the "((": the latter
should be reported as two tokens iff the "e x y" is split into words.

Since the "two subshells" case can be disambiguated by adding a space,
but arithmetic evaluations cannot be disambiguated, I assume ambiguous
cases should be resolved in favour of the latter.

> This has always been broken.  The '((' parse doesn't have a tokstr
> value (cf. comments about parsing "for (( ... ))" in bufferwords()
> [hist.c]).  Prior to recent fixes to backtrack this properly, the
> error was even worse:
> 
> torch% print $ZSH_VERSION
> 4.2.0
> torch% print -rl - ${(z):-'(( e x y'}
>  e x y 
> (
> e
> x
> y
> torch% 
> 

I see the problem: ${(z)} is bufferwords(), which calls ctxtlex(), which
ultimately calls cmd_or_math(), which classifies the unbalanced opening
parentheses as a syntax error, because they have no matching ')' before
the end of the input.  Consequently, cmd_or_math() returns CMD_OR_MATH_ERR
on line 512 (in the 'if (lexstop)' block), which causes ctxtlex() to
return LEXERR.

I guess cmd_or_math() is actually doing the right thing, insofar as the
"interpret and execute code" use-case of the lexer is concerned.  But
the bufferwords() caller shouldn't simply skip over the "((" characters
in the input buffer.  (More on this below.)

> I don't have an answer for where the right place to "output" the parens
> would be; the backtracking makes this ugly.

Looking at bufferwords(), the "e" is added to the output by the
addlinknode() in this block:

  3350	    if (buf && tok == LEXERR && tokstr && *tokstr) {
  3351		int plen;
  3352		untokenize((p = dupstring(tokstr)));
  3353		plen = strlen(p);
  3354		/*
  3355		 * Strip the space we added for lexing but which won't have
  3356		 * been swallowed by the lexer because we aborted early.
  3357		 * The test is paranoia.
  3358		 */
  3359		if (plen && p[plen-1] == ' ' && (plen == 1 || p[plen-2] != Meta))
  3360		    p[plen - 1] = '\0';
  3361		addlinknode(list, p);
  3362		num++;
  3363	    }

If this addlinknode() were skipped, output would stop immediately before
the '((':

    [with line 3361 commented out]
    % Src/zsh -fc 'print -rl - ${(z):-":; (( e"}' 
    :
    ;

However, removing the addlinknode() call makes a test fail:

    ./D04parameter.ztst: starting.
    *** /tmp/zsh.ztst.out.13138	2015-09-27 20:16:41.812154669 +0000
    --- /tmp/zsh.ztst.tout.13138	2015-09-27 20:16:41.816154673 +0000
    ***************
    *** 3,11 ****
      line
      with
      #
    - someone's comment
    - another line # (1 more
    - another one
      *** Kept ***
    --- 3,8 ----
    Test ./D04parameter.ztst failed: output differs from expected as shown above for:
      line=$'A line with # someone\'s comment\nanother line # (1 more\nanother one'
      print "*** Normal ***"
      print -l ${(z)line}
      print "*** Kept ***"
    Was testing: Comments with (z)

So I'm not sure what's right here.  Perhaps the addlinknode() should be
skipped for the "(( e" case?  (i.e., parse as far as possible, and stop
before the ambiguity)  And why is tokstr " e " when tok is LEXERR?  It's
not the " e" that caused the error...

Cheers,

Daniel

