From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 15154 invoked by alias); 13 May 2014 05:01:26 -0000
Mailing-List: contact zsh-workers-help@zsh.org; run by ezmlm
Precedence: bulk
X-No-Archive: yes
List-Id: Zsh Workers List
List-Post:
List-Help:
X-Seq: 32609
Received: (qmail 11079 invoked from network); 13 May 2014 05:01:20 -0000
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on f.primenet.com.au
X-Spam-Level:
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_NONE
    autolearn=ham version=3.3.2
From: Bart Schaefer
Message-id: <140512220112.ZM20283@torch.brasslantern.com>
Date: Mon, 12 May 2014 22:01:12 -0700
In-reply-to: <140511111200.ZM20625@torch.brasslantern.com>
Comments: In reply to Bart Schaefer
    "Re: Parser issues and and [[ $var ]]" (May 11, 11:12am)
References: <140416102727.ZM19090@torch.brasslantern.com>
    <534FE710.3020601@eastlink.ca>
    <140417123722.ZM22179@torch.brasslantern.com>
    <20140423165024.1480528a@pws-pc.ntlworld.com>
    <20140425172112.7bf50606@pwslap01u.europe.root.pri>
    <140426133019.ZM29630@torch.brasslantern.com>
    <140510140932.ZM32668@torch.brasslantern.com>
    <140510180144.ZM26488@torch.brasslantern.com>
    <20140511180148.3b614054@pws-pc.ntlworld.com>
    <140511111200.ZM20625@torch.brasslantern.com>
X-Mailer: OpenZMail Classic (0.9.2 24April2005)
To: zsh-workers@zsh.org
Subject: [PATCH] Re: Parser issues and and [[ $var ]]
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii

On May 11, 11:12am, Bart Schaefer wrote:
}
} } > Does anyone know why the lexer
} } > sometimes sets (tok = DOUTBRACK, tokstr = "\220\220") and other times
} } > (tok = DOUTBRACK, tokstr = NULL) ?
} }
} } It must surely be an oversight.

I've left the lexer unchanged and accounted for it in par_cond_2().

} } > Or, we can decree that any string that
} } > starts with a "-" is treated as an operator
} }
} } I think the latter is probably acceptable

The patch below does this, which means a lot of runtime "unknown" errors
rather than parse errors.

} } but maybe there should be an
} } interface saying a condition code defined by a module has been called with
} } zero arguments (which might on failure alternatively confirm that there
} } was no such condition code, allowing a different interpretation).

I haven't done anything specific about this yet except to allow conditions
to be called with zero arguments.  We could change the "unknown condition"
handling in cond.c if we want this to behave differently.

} Isn't the following an intractable syntax problem?
}
}   [[ -uptofour -a -n $foo ]]
}
} Or do the "arguments" have to be things that don't look like operators?
} What if another module defines an infix operator "-both" and you write
}
}   [[ -uptofour -both -n $foo ]]
}
} ?  Is -both an operator or an argument?  Before the module is loaded, how
} do you tell?

This problem already exists -- infix operators that aren't tokens like &&
and || already have lower precedence than prefix operators.  E.g.:

zsh% [[ -n 1 -eq 1 ]]
zsh: unknown condition: -n

So I don't think the patch below changes any of that.

I've included PWS's test from users/18775.  Some more tests of *wrong*
syntax might be helpful; for example, Test/A01grammar.ztst helped find a
crash on tokstr == NULL with the "echo '[[' >bad_syntax" test, though the
current testing regime makes it pretty difficult to determine which test
failed when the failure was a segmentation fault.

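For anyone reading along without Src/parse.c open, here is a rough
standalone sketch of the "does this look like a prefix operator" check
that the patch computes into dble.  The function names and the in_test
flag are hypothetical stand-ins (the real code compares condlex against
testlex), and the s[1] guard is my own addition for the sketch, so that
a bare "-" is never read past its terminator; treat it as an
illustration, not as the code in the patch.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, NOT part of Src/parse.c.
 * Returns 1 if s looks like a two-character prefix operator ("-x").
 *   in_test != 0: [ ... ] / test, where only the known letters count
 *   in_test == 0: [[ ... ]], where any "-x" counts
 * Assumption: a bare "-" is rejected up front (s[1] guard).
 */
int looks_like_unary_op(const char *s, int in_test)
{
    return s && s[0] == '-' && s[1] != '\0'
        && (!in_test
            || strspn(s + 1, "abcdefghknoprstuwxzLONGS") == 1)
        && s[2] == '\0';
}

/* Spot checks; call this from a main() of your own. */
void check_looks_like_unary_op(void)
{
    assert(looks_like_unary_op("-n", 1));          /* known to test */
    assert(!looks_like_unary_op("-q", 1));         /* unknown to test */
    assert(looks_like_unary_op("-q", 0));          /* any -x inside [[ ]] */
    assert(!looks_like_unary_op("-uptofour", 0));  /* longer than two chars */
    assert(!looks_like_unary_op("foo", 0));        /* no leading '-' */
    assert(!looks_like_unary_op(NULL, 0));         /* tokstr may be NULL */
}
```

Under these assumptions "-n" qualifies in both modes, "-q" qualifies
only inside [[ ... ]], and a longer name such as "-uptofour" never
qualifies as a two-character operator.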
diff --git a/Src/parse.c b/Src/parse.c
index 530a070..1af98b5 100644
--- a/Src/parse.c
+++ b/Src/parse.c
@@ -2068,6 +2068,9 @@ par_cond_2(void)
             /* one argument: [ foo ] is equivalent to [ -n foo ] */
             s1 = tokstr;
             condlex();
+            /* ksh behavior: [ -t ] means [ -t 1 ]; bash disagrees */
+            if (unset(POSIXBUILTINS) && !strcmp(s1, "-t"))
+                return par_cond_double(s1, dupstring("1"));
             return par_cond_double(dupstring("-n"), s1);
         }
         if (testargs[1]) {
@@ -2086,6 +2089,10 @@ par_cond_2(void)
                 return par_cond_triple(s1, s2, s3);
             }
         }
+        /*
+         * We fall through here on any non-numeric infix operator
+         * or any other time there are at least two arguments.
+         */
     }
     if (tok == BANG) {
         /*
@@ -2114,18 +2121,20 @@ par_cond_2(void)
         condlex();
         return r;
     }
+    s1 = tokstr;
+    dble = (s1 && *s1 == '-'
+            && (condlex != testlex
+                || strspn(s1+1, "abcdefghknoprstuwxzLONGS") == 1)
+            && !s1[2]);
     if (tok != STRING) {
-        if (tok && tok != LEXERR && condlex == testlex) {
-            s1 = tokstr;
+        /* Check first argument for [[ STRING ]] re-interpretation */
+        if (s1 /* tok != DOUTBRACK && tok != DAMPER && tok != DBAR */
+            && tok != LEXERR && (!dble || condlex == testlex)) {
             condlex();
-            return par_cond_double("-n", s1);
+            return par_cond_double(dupstring("-n"), s1);
         } else
             YYERROR(ecused);
     }
-    s1 = tokstr;
-    if (condlex == testlex)
-        dble = (*s1 == '-' && strspn(s1+1, "abcdefghknoprstuwxzLONGS") == 1
-                && !s1[2]);
     condlex();
     if (tok == INANG || tok == OUTANG) {
         enum lextok xtok = tok;
@@ -2140,15 +2149,21 @@ par_cond_2(void)
         return 1;
     }
     if (tok != STRING) {
-        if (tok != LEXERR && condlex == testlex) {
-            if (!dble)
-                return par_cond_double("-n", s1);
-            else if (!strcmp(s1, "-t"))
-                return par_cond_double(s1, "1");
+        /*
+         * Check second argument in case semantics e.g. [ = -a = ]
+         * mean we have to go back and fix up the first one
+         */
+        if (tok != LEXERR) {
+            if (!dble || condlex == testlex)
+                return par_cond_double(dupstring("-n"), s1);
+            else
+                return par_cond_multi(s1, newlinklist());
         } else
             YYERROR(ecused);
     }
-    s2 = tokstr;
+    s2 = tokstr;
+    if (condlex != testlex)
+        dble = (s2 && *s2 == '-' && !s2[2]);
     incond++;                   /* parentheses do globbing */
     condlex();
     incond--;                   /* parentheses do grouping */
diff --git a/Test/C02cond.ztst b/Test/C02cond.ztst
index 94fca8b..6900147 100644
--- a/Test/C02cond.ztst
+++ b/Test/C02cond.ztst
@@ -349,6 +349,14 @@ F:Failures in these cases do not indicate a problem in the shell.
 >0
 >1
 
+  foo=''
+  [[ $foo ]] || print foo is empty
+  foo=full
+  [[ $foo ]] && print foo is full
+0:bash compatibility with single [[ ... ]] argument
+>foo is empty
+>foo is full
+
 %clean
   # This works around a bug in rm -f in some versions of Cygwin
   chmod 644 unmodish