Date: Thu, 21 May 2015 09:54:19 +0100
From: Peter Stephenson
To: Zsh Hackers' List
Subject: PATCH: get off my case
Message-id: <20150521095419.168809c8@pwslap01u.europe.root.pri>
Organization: Samsung Cambridge Solution Centre
List-Id: Zsh Workers List
X-Seq: 35248

On closer examination, it turns out the previous "case" featurama could be divided cleanly (if that's an appropriate word in this context) into two parts.

First, there was code to handle sh-style parentheses (what you get when you set SH_GLOB) by sticking words back together with a "|" and treating the result as a single pattern. This is now redundant with the alternations being handled separately. So sh-style cases can be left alone and continue to work properly without having to force initial parentheses to match singly with SH_GLOB off.

Second, there was code to handle zsh patterns when a fully parenthesised expression turned up with SH_GLOB off. This is where the hackfest on the internal whitespace took place. This part is still needed to reconcile the two worlds when we revert to only matching separate opening parentheses in the SH_GLOB case. However, it can be resurrected without damage to the other part.

Note there are no further changes to wordcode here.

Case solved.

pws

diff --git a/Src/lex.c b/Src/lex.c
index 87b0cd3..841fb0b 100644
--- a/Src/lex.c
+++ b/Src/lex.c
@@ -761,8 +761,6 @@ gettok(void)
         lexstop = 0;
         return BAR;
     case LX1_INPAR:
-        if (incasepat == 2)
-            return INPAR;
         d = hgetc();
         if (d == '(') {
             if (infor) {
diff --git a/Src/parse.c b/Src/parse.c
index c486699..7d618cd 100644
--- a/Src/parse.c
+++ b/Src/parse.c
@@ -1152,7 +1152,7 @@ par_case(int *cmplx)
         YYERRORV(oecused);
     }
     brflag = (tok == INBRACE);
-    incasepat = 2;
+    incasepat = 1;
     incmdpos = 0;
     noaliases = ona;
     nocorrect = onc;
@@ -1165,10 +1165,8 @@ par_case(int *cmplx)
             zshlex();
         if (tok == OUTBRACE)
             break;
-        if (tok == INPAR) {
-            incasepat = 1;
+        if (tok == INPAR)
             zshlex();
-        }
         if (tok != STRING)
             YYERRORV(oecused);
         if (!strcmp(tokstr, "esac"))
@@ -1178,19 +1176,96 @@ par_case(int *cmplx)
         pp = ecadd(0);
         palts = ecadd(0);
         nalts = 0;
+        /*
+         * Hack here.
+         *
+         * [Pause for astonished hubbub to subside.]
+         *
+         * The next token we get may be
+         * - ")" or "|" if we're looking at an honest-to-god
+         *   "case" patten, either because there's no opening
+         *   parenthesis, or because SH_GLOB is set and we
+         *   managed to grab an initial "(" to mark the start
+         *   of the case pattern.
+         * - Something else --- we don't care what --- because
+         *   we're parsing a complete "(...)" as a complete
+         *   zsh pattern.  In that case, we treat this as a
+         *   single instance of a case pattern but we pretend
+         *   we're doing proper case parsing --- in which the
+         *   parentheses and bar are in different words from
+         *   the string, so may be separated by whitespace.
+         *   So we quietly massage the whitespace and hope
+         *   no one noticed.  This is horrible, but it's
+         *   unfortunately too difficult to comine traditional
+         *   zsh patterns with a properly parsed case pattern
+         *   without generating incompatibilities which aren't
+         *   all that popular (I've discovered).
+         * - We can also end up with something other than ")" or "|"
+         *   just because we're looking at garbage.
+         *
+         * Because of the second case, what happens next might
+         * be the start of the command after the pattern, so we
+         * need to treat it as in command position.  Luckily
+         * this doesn't affect our ability to match a | or ) as
+         * these are valid on command lines.
+         */
+        incasepat = 0;
+        incmdpos = 1;
         for (;;) {
-            ecstr(str);
-            ecadd(ecnpats++);
-            nalts++;
-            zshlex();
             if (tok == OUTPAR) {
+                ecstr(str);
+                ecadd(ecnpats++);
+                nalts++;
+
                 incasepat = 0;
                 incmdpos = 1;
                 zshlex();
                 break;
-            } else if (tok != BAR)
+            } else if (tok == BAR) {
+                ecstr(str);
+                ecadd(ecnpats++);
+                nalts++;
+
+                incasepat = 1;
+                incmdpos = 0;
+            } else {
+                if (!nalts && str[0] == Inpar) {
+                    int pct = 0, sl;
+                    char *s;
+
+                    for (s = str; *s; s++) {
+                        if (*s == Inpar)
+                            pct++;
+                        if (!pct)
+                            break;
+                        if (pct == 1) {
+                            if (*s == Bar || *s == Inpar)
+                                while (iblank(s[1]))
+                                    chuck(s+1);
+                            if (*s == Bar || *s == Outpar)
+                                while (iblank(s[-1]) &&
+                                       (s < str + 1 || s[-2] != Meta))
+                                    chuck(--s);
+                        }
+                        if (*s == Outpar)
+                            pct--;
+                    }
+                    if (*s || pct || s == str)
+                        YYERRORV(oecused);
+                    /* Simplify pattern by removing surrounding (...) */
+                    sl = strlen(str);
+                    DPUTS(*str != Inpar || str[sl - 1] != Outpar,
+                          "BUG: strange case pattern");
+                    str[sl - 1] = '\0';
+                    chuck(str);
+                    ecstr(str);
+                    ecadd(ecnpats++);
+                    nalts++;
+                    break;
+                }
                 YYERRORV(oecused);
+            }
 
             zshlex();
             if (tok != STRING)
@@ -1208,7 +1283,7 @@ par_case(int *cmplx)
             break;
         if (tok != DSEMI && tok != SEMIAMP && tok != SEMIBAR)
             YYERRORV(oecused);
-        incasepat = 2;
+        incasepat = 1;
         incmdpos = 0;
         zshlex();
     }
diff --git a/Test/A01grammar.ztst b/Test/A01grammar.ztst
index 41fb486..50058e2 100644
--- a/Test/A01grammar.ztst
+++ b/Test/A01grammar.ztst
@@ -614,7 +614,8 @@
 >mytrue
 >END
 
-  fn() {
+  (emulate sh -c '
+   fn() {
     case $1 in
       ( one | two | three )
       print Matched $1
@@ -627,6 +628,7 @@
       ;;
     esac
   }
+  '
   which fn
   fn one
   fn two
@@ -635,8 +637,8 @@
   fn five
   fn six
   fn abecedinarian
-  fn xylophone
-0: case word handling
+  fn xylophone)
+0: case word handling in sh emulation (SH_GLOB parentheses)
 >fn () {
 >	case $1 in
 >		(one | two | three) print Matched $1 ;;
@@ -665,3 +667,31 @@
 0: case patterns within words
 >1 OK
 >2 OK
+
+  case horrible in
+    ([a-m])(|[n-z])rr(|ib(um|le|ah)))
+    print It worked
+    ;;
+  esac
+  case "a string with separate words" in
+    (*with separate*))
+    print That worked, too
+    ;;
+  esac
+0:Unbalanced parentheses and spaces with zsh pattern
+>It worked
+>That worked, too
+
+  case horrible in
+    (([a-m])(|[n-z])rr(|ib(um|le|ah)))
+    print It worked
+    ;;
+  esac
+  case "a string with separate words" in
+    (*with separate*)
+    print That worked, too
+    ;;
+  esac
+0:Balanced parentheses and spaces with zsh pattern
+>It worked
+>That worked, too
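
For anyone who wants to poke at the whitespace massaging outside the shell, the following is a minimal standalone sketch in C that loosely mirrors the loop added to par_case(). It is not the code from the patch. It works on plain '(', '|' and ')' characters rather than zsh's internal Inpar, Bar and Outpar tokens, it ignores Meta-quoted characters entirely, and the names massage_case_pattern(), remove_char_at() and is_blank() are hypothetical stand-ins (the second and third play the roles of chuck() and iblank()).

```c
#include <assert.h>
#include <string.h>

/* Delete the single character at p by shifting the tail left
 * (hypothetical stand-in for zsh's chuck()). */
static void remove_char_at(char *p)
{
    memmove(p, p + 1, strlen(p + 1) + 1);
}

/* Space or tab only (simplified stand-in for zsh's iblank()). */
static int is_blank(char c)
{
    return c == ' ' || c == '\t';
}

/*
 * Treat str as one fully parenthesised pattern.  At nesting depth 1,
 * drop blanks that follow '(' or '|' and blanks that precede '|' or ')',
 * then strip the surrounding parentheses.  The string is edited in place.
 * Returns 0 on success and -1 if str is not of the expected form (in
 * which case it may have been partly modified).
 */
int massage_case_pattern(char *str)
{
    int depth = 0;
    size_t len;
    char *s;

    if (str[0] != '(')
        return -1;

    for (s = str; *s; s++) {
        if (*s == '(')
            depth++;
        if (!depth)
            break;              /* text after the group was closed */
        if (depth == 1) {
            if (*s == '|' || *s == '(')
                while (is_blank(s[1]))
                    remove_char_at(s + 1);
            if (*s == '|' || *s == ')')
                while (s > str && is_blank(s[-1]))
                    remove_char_at(--s);
        }
        if (*s == ')')
            depth--;
    }
    if (*s || depth)
        return -1;              /* trailing text or unbalanced parentheses */

    /* Simplify the pattern by removing the surrounding (...). */
    len = strlen(str);
    assert(len >= 2 && str[0] == '(' && str[len - 1] == ')');
    str[len - 1] = '\0';
    remove_char_at(str);
    return 0;
}
```

Under these assumptions, passing a writable buffer holding "( one | two | three )" should leave "one|two|three" in it, while "(*with separate*)" becomes "*with separate*": blanks that are not next to a top-level parenthesis or bar are left alone, which is the behaviour the new tests rely on.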