* [BUG] quoting within bracket patterns has no effect
From: Martijn Dekker @ 2016-01-18 4:23 UTC
To: zsh-workers, ast-developers

Quotes should disable the special meaning of characters in glob
patterns[*]. So this:

    case b in
    ( ['a-c'] )  echo 'false match' ;;
    ( [a-c] )    echo 'correct match' ;;
    esac

should output "correct match". But on zsh and AT&T ksh93 (and only
those), it outputs "false match". Meaning, quoting the characters within
the bracket pattern does not disable the special meaning of '-' in the
bracket pattern.

This hinders a realistic use case: the ability to pass a series of
arbitrary characters in a variable for use within a bracket pattern.
Quoting the variable does not have any effect; if the series contains a
'-', the result is unexpected. For example:

    mychars='abs$ad-f3ra'  # arbitrary series of characters containing '-'
    somevar=qezm           # this contains none of the characters above
    case $somevar in
    ( *["$mychars"]* )  echo "$somevar contains one of $mychars" ;;
    esac

produces a false positive on zsh and ksh93.

A workaround is to make sure the '-', if any, is always last in the
string of characters to match against.

The same thing also affects glob patterns in other contexts, e.g.
removing characters using parameter substitution.

Other shells, at least bash, (d)ash variants, pdksh, mksh and yash, all
act like POSIX says they should, according to my tests.

Thanks,

- Martijn

[*] "If any character (ordinary, shell special, or pattern special) is
quoted, that pattern shall match the character itself."
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13_01
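The following is a minimal C sketch of the POSIX rule Martijn quotes. It uses a hypothetical pattern representation, not zsh's own, in which every bracket member carries a flag saying whether it was quoted. Only single quotes are modelled, and the helper names lex_bracket_body() and bracket_match() are invented for illustration. The point is that a quoted '-' never forms a range:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One pattern character, plus whether the shell saw it quoted. */
typedef struct {
    char c;
    bool quoted;
} pchar;

/*
 * Hypothetical helper: turn the text between "[" and "]" into pchars.
 * Only single quotes are modelled; the quote characters themselves are
 * stripped and everything between them is marked as quoted.
 */
static size_t lex_bracket_body(const char *src, pchar *out, size_t max)
{
    size_t n = 0;
    bool in_sq = false;

    assert(src != NULL && out != NULL);
    for (; *src && n < max; src++) {
        if (*src == '\'') {
            in_sq = !in_sq;
            continue;
        }
        out[n].c = *src;
        out[n].quoted = in_sq;
        n++;
    }
    return n;
}

/*
 * POSIX rule: an unquoted '-' between two members forms a range.
 * A quoted '-', or one in first or last position, matches itself.
 */
static bool bracket_match(const pchar *set, size_t n, char c)
{
    for (size_t i = 0; i < n; i++) {
        if (i + 2 < n && set[i + 1].c == '-' && !set[i + 1].quoted) {
            if (set[i].c <= c && c <= set[i + 2].c)
                return true;
            i += 2;             /* skip the '-' and the range end */
            continue;
        }
        if (set[i].c == c)
            return true;
    }
    return false;
}
```

In this model, "['a-c']" does not match "b". Likewise, "*["$mychars"]*" with mychars='abs$ad-f3ra' matches none of the characters of "qezm". That is the behaviour Martijn reports for bash, dash, mksh and yash.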
* Re: [BUG] quoting within bracket patterns has no effect
From: Peter Stephenson @ 2016-01-18 17:24 UTC
To: Zsh Hackers' List

On Mon, 18 Jan 2016 05:23:07 +0100
Martijn Dekker <martijn@inlv.org> wrote:
> Quotes should disable the special meaning of characters in glob
> patterns[*].
>
> [*] "If any character (ordinary, shell special, or pattern special) is
> quoted, that pattern shall match the character itself."
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13_01

Dash is a pattern special, but not shell special, character.  These
haven't had much attention --- I have a vague memory, which could be
fallacious, that some time ago the state of the art (whether or not the
standard) was that it wasn't actually possible to quote these (other
than by putting them in special positions) in most shells.

There could therefore be others like this.

Prior art for characters that are only sometimes special, but need to be
first class tokens when they are, exists in the case of "," as used in
subscripts, suggesting the following isn't hopelessly optimistic.

One thing that makes me think there's something I've missed is that
activating "-" in the case of a "[" within a brace parameter --- which I
did in case there was a pattern inside --- caused three tests to fall
over, and I can't see why.  However, it seems the case ${foo#[a-z]} does
work without that (again I don't know why), so it looks like that tweak
isn't needed.

You can read the code and the tests for the various gotchas I did manage
to think about.  "[]a-z]" being a valid range was one.
pws

diff --git a/Src/glob.c b/Src/glob.c
index 8bd2fc4..e5d8956 100644
--- a/Src/glob.c
+++ b/Src/glob.c
@@ -3476,7 +3476,7 @@ static void
 zshtokenize(char *s, int flags)
 {
     char *t;
-    int bslash = 0;
+    int bslash = 0, seen_brct = 0;
 
     for (; *s; s++) {
       cont:
@@ -3507,21 +3507,35 @@ zshtokenize(char *s, int flags)
             *t = Inang;
             *s = Outang;
             break;
+        case '[':
+            if (bslash)
+                s[-1] = (flags & ZSHTOK_SUBST) ? Bnullkeep : Bnull;
+            else {
+                seen_brct = 1;
+                *s = Inbrack;
+            }
+            break;
+        case '-':
+            if (bslash)
+                s[-1] = (flags & ZSHTOK_SUBST) ? Bnullkeep : Bnull;
+            else if (seen_brct) /* see corresonding code in lex.c */
+                *s = Dash;
+            break;
         case '(':
         case '|':
         case ')':
             if (flags & ZSHTOK_SHGLOB)
                 break;
+            /*FALLTHROUGH*/
         case '>':
         case '^':
         case '#':
         case '~':
-        case '[':
         case ']':
         case '*':
         case '?':
         case '=':
-            for (t = ztokens; *t; t++)
+            for (t = ztokens; *t; t++) {
                 if (*t == *s) {
                     if (bslash)
                         s[-1] = (flags & ZSHTOK_SUBST) ? Bnullkeep : Bnull;
@@ -3529,6 +3543,8 @@ zshtokenize(char *s, int flags)
                         *s = (t - ztokens) + Pound;
                     break;
                 }
+            }
+            break;
         }
         bslash = 0;
     }
diff --git a/Src/lex.c b/Src/lex.c
index 0f260d0..9a7e3b8 100644
--- a/Src/lex.c
+++ b/Src/lex.c
@@ -35,7 +35,7 @@
 /* tokens */
 
 /**/
-mod_export char ztokens[] = "#$^*(())$=|{}[]`<>>?~`,'\"\\\\";
+mod_export char ztokens[] = "#$^*(())$=|{}[]`<>>?~`,-'\"\\\\";
 
 /* parts of the current token */
 
@@ -394,8 +394,9 @@ ctxtlex(void)
 #define LX2_DQUOTE 15
 #define LX2_BQUOTE 16
 #define LX2_COMMA 17
-#define LX2_OTHER 18
-#define LX2_META 19
+#define LX2_DASH 18
+#define LX2_OTHER 19
+#define LX2_META 20
 
 static unsigned char lexact1[256], lexact2[256], lextok2[256];
 
@@ -405,7 +406,7 @@ initlextabs(void)
 {
     int t0;
     static char *lx1 = "\\q\n;!&|(){}[]<>";
-    static char *lx2 = ";)|$[]~({}><=\\\'\"`,";
+    static char *lx2 = ";)|$[]~({}><=\\\'\"`,-";
 
     for (t0 = 0; t0 != 256; t0++) {
         lexact1[t0] = LX1_OTHER;
@@ -919,7 +920,7 @@ gettok(void)
 static enum lextok
 gettokstr(int c, int sub)
 {
-    int bct = 0, pct = 0, brct = 0, fdpar = 0;
+    int bct = 0, pct = 0, brct = 0, seen_brct = 0, fdpar = 0;
     int intpos = 1, in_brace_param = 0;
     int inquote, unmatched = 0;
     enum lextok peek;
@@ -1033,8 +1034,10 @@ gettokstr(int c, int sub)
             }
             break;
         case LX2_INBRACK:
-            if (!in_brace_param)
+            if (!in_brace_param) {
                 brct++;
+                seen_brct = 1;
+            }
             c = Inbrack;
             break;
         case LX2_OUTBRACK:
@@ -1346,6 +1349,21 @@ gettokstr(int c, int sub)
             c = Tick;
             SETPAREND
             break;
+        case LX2_DASH:
+            /*
+             * - shouldn't be treated as a special character unless
+             * we're in a pattern.  Howeve,simply counting "[" doesn't
+             * work as []a-z] is a valid expression and we don't know
+             * down here what this "[" is for as $foo[stuff] is valid
+             * in zsh.  So just detect an opening [, which is enough
+             * to turn this into a pattern; the Dash will be harmlessly
+             * untokenised if not wanted.
+             */
+            if (seen_brct)
+                c = Dash;
+            else
+                c = '-';
+            break;
         }
         add(c);
         c = hgetc();
diff --git a/Src/pattern.c b/Src/pattern.c
index 9e8a80a..d2b8c59 100644
--- a/Src/pattern.c
+++ b/Src/pattern.c
@@ -1459,7 +1459,7 @@ patcomppiece(int *flagp, int paren)
                 charstart = patparse;
                 METACHARINC(patparse);
 
-                if (*patparse == '-' && patparse[1] &&
+                if (*patparse == Dash && patparse[1] &&
                     patparse[1] != Outbrack) {
                     patadd(NULL, STOUC(Meta)+PP_RANGE, 1, PA_NOALIGN);
                     if (itok(*charstart)) {
@@ -1468,7 +1468,7 @@ patcomppiece(int *flagp, int paren)
                     } else {
                         patadd(charstart, 0, patparse-charstart, PA_NOALIGN);
                     }
-                    charstart = ++patparse;     /* skip ASCII '-' */
+                    charstart = ++patparse;     /* skip Dash token */
                     METACHARINC(patparse);
                 }
                 if (itok(*charstart)) {
diff --git a/Src/utils.c b/Src/utils.c
index 788eba9..fd0bab3 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -3888,7 +3888,7 @@ inittyptab(void)
     typtab['\0'] |= IMETA;
     typtab[STOUC(Meta)  ] |= IMETA;
     typtab[STOUC(Marker)] |= IMETA;
-    for (t0 = (int)STOUC(Pound); t0 <= (int)STOUC(Comma); t0++)
+    for (t0 = (int)STOUC(Pound); t0 <= (int)STOUC(LAST_NORMAL_TOK); t0++)
         typtab[t0] |= ITOK | IMETA;
     for (t0 = (int)STOUC(Snull); t0 <= (int)STOUC(Nularg); t0++)
         typtab[t0] |= ITOK | IMETA | INULL;
diff --git a/Src/zsh.h b/Src/zsh.h
index 0302d68..6ee2a9c 100644
--- a/Src/zsh.h
+++ b/Src/zsh.h
@@ -192,24 +192,30 @@ struct mathfunc {
 #define Tilde           ((char) 0x98)
 #define Qtick           ((char) 0x99)
 #define Comma           ((char) 0x9a)
+#define Dash            ((char) 0x9b) /* Only in patterns */
+/*
+ * Marks the last of the group above.
+ * Remaining tokens are even more special.
+ */
+#define LAST_NORMAL_TOK Dash
 /*
  * Null arguments: placeholders for single and double quotes
  * and backslashes.
  */
-#define Snull           ((char) 0x9b)
-#define Dnull           ((char) 0x9c)
-#define Bnull           ((char) 0x9d)
+#define Snull           ((char) 0x9c)
+#define Dnull           ((char) 0x9d)
+#define Bnull           ((char) 0x9e)
 /*
  * Backslash which will be returned to "\" instead of being stripped
  * when we turn the string into a printable format.
  */
-#define Bnullkeep       ((char) 0x9e)
+#define Bnullkeep       ((char) 0x9f)
 /*
  * Null argument that does not correspond to any character.
  * This should be last as it does not appear in ztokens and
  * is used to initialise the IMETA type in inittyptab().
  */
-#define Nularg          ((char) 0x9f)
+#define Nularg          ((char) 0xa0)
 
 /*
  * Take care to update the use of IMETA appropriately when adding
@@ -220,7 +226,7 @@ struct mathfunc {
  * Also used in pattern character arrays as guaranteed not to
  * mark a character in a string.
  */
-#define Marker          ((char) 0xa0)
+#define Marker          ((char) 0xa1)
 
 /* chars that need to be quoted if meant literally */
 
diff --git a/Test/D02glob.ztst b/Test/D02glob.ztst
index f944a4f..86133b0 100644
--- a/Test/D02glob.ztst
+++ b/Test/D02glob.ztst
@@ -582,3 +582,43 @@
 >1 OK
 >2 OK
 >3 OK
+
+  [[ foo = 'f'\o"o" ]]
+0:Stripping of quotes from patterns (1)
+
+  [[ foo = 'f'('o'|'a')('o'|'b') ]]
+0:Stripping of quotes from patterns (2)
+
+  [[ fob = 'f'('o'|'a')('o'|'b') ]]
+0:Stripping of quotes from patterns (3)
+
+  [[ fab = 'f'('o'|'a')('o'|'b') ]]
+0:Stripping of quotes from patterns (4)
+
+  [[ fib != 'f'('o'|'a')('o'|'b') ]]
+0:Stripping of quotes from patterns (4)
+
+  [[ - != [a-z] ]]
+0:- is a special character in ranges
+
+  [[ - = ['a-z'] ]]
+0:- is not a special character in ranges if quoted
+
+  [[ b-1 = [a-z]-[0-9] ]]
+0:- untokenized following a bracketed subexpression
+
+  [[ b-1 = []a-z]-[]0-9] ]]
+0:- "]" after "[" is normal range character and - still works
+
+  headremove="bcdef"
+  print ${headremove#[a-z]}
+0:active - works in pattern in parameter
+>cdef
+
+  headremove="bcdef"
+  print ${headremove#['a-z']}
+  headremove="-cdef"
+  print ${headremove#['a-z']}
+0:quoted - works in pattern in parameter
+>bcdef
+>cdef
* Re: [BUG] quoting within bracket patterns has no effect
From: Jun T. @ 2016-01-19 15:57 UTC
To: zsh-workers

The patch causes the following error:

    % /usr/local/bin/zsh -f
    zsh% autoload -U compinit
    zsh% compinit
    compaudit:151: unknown file attribute: -

It seems the error is actually from line 153 of compaudit:

    _i_wdirs=( $_i_wdirs ${^fpath}.zwc^([^_]*|*~)(N-^${_i_owners}) )

Simpler example is

    zsh% echo a[b](-/)
    zsh: unknown file attribute: -

In glob.c, switch statement from line 1298,
*s is not '-' but Dash, and not handled by
    case '-':
at line 1316.
* Re: [BUG] quoting within bracket patterns has no effect
From: Peter Stephenson @ 2016-01-19 17:35 UTC
To: zsh-workers

On Wed, 20 Jan 2016 00:57:30 +0900
Jun T. <takimoto-j@kba.biglobe.ne.jp> wrote:
> In glob.c, switch statement from line 1298,
> *s is not '-' but Dash, and not handled by
>     case '-':
> at line 1316.

It looks like glob qualifiers are a grey area for tokenisation, with
arguments that should be untokenised but aren't (so far as I can see),
so it's not necessarily just that case.

Short of rewriting it, this is about the best I can see, and ought to
be OK for -, but I suspect this is just the tip of the iceberg.
However, maybe it gets tidied up later somewhere I haven't noticed so
just handling the case statement would actually work.

pws

diff --git a/Src/glob.c b/Src/glob.c
index c799281..69de155 100644
--- a/Src/glob.c
+++ b/Src/glob.c
@@ -1230,7 +1230,7 @@ zglob(LinkList list, LinkNode np, int nountok)
             char *s;
             int sense, qualsfound;
             off_t data;
-            char *sdata, *newcolonmod;
+            char *sdata, *newcolonmod, *ptr;
             int (*func) _((char *, Statptr, off_t, char *));
 
             /*
@@ -1273,6 +1273,9 @@ zglob(LinkList list, LinkNode np, int nountok)
             *s++ = 0;
             if (qualsfound == 2)
                 s += 2;
+            for (ptr = s; *ptr; ptr++)
+                if (*ptr == Dash)
+                    *ptr = '-';
             while (*s && !newcolonmod) {
                 func = (int (*) _((char *, Statptr, off_t, char *)))0;
                 if (idigit(*s)) {
* Re: [BUG] quoting within bracket patterns has no effect
From: Bart Schaefer @ 2016-01-19 18:54 UTC
To: zsh-workers

On Jan 19,  5:35pm, Peter Stephenson wrote:
} Subject: Re: [BUG] quoting within bracket patterns has no effect
}
} It looks like glob qualifiers are a grey area for tokenisation, with
} arguments that should be untokenised but aren't (so far as I can see),

Note that a zsh "tip" I posted back in August depends on tokenization
in glob qualifiers.  See users/20439

Not that such a trick is a reason to keep tokenizing if other things
work better untoken'd.
* Re: [BUG] quoting within bracket patterns has no effect
From: Jun T. @ 2016-01-20 10:48 UTC
To: zsh-workers

vcs_info doesn't work due to the '-'/Dash problem.
It fails at line 6 of VCS_INFO_get_cmd:

    vcs_comm[cmd]=${cmd:-$vcs}

In gettokstr(), seen_brct is set to 1 by the '[' and
never reset to 0, and the '-' is converted to Dash.

    % x=
    % y=yes
    % echo ${x:-$y}
    yes
    % a[1]=${x:-$y}
    % echo '<'$a[1]'>'
    <>
* Re: [BUG] quoting within bracket patterns has no effect
From: Peter Stephenson @ 2016-01-20 11:04 UTC
To: zsh-workers

On Wed, 20 Jan 2016 19:48:00 +0900
Jun T. <takimoto-j@kba.biglobe.ne.jp> wrote:
> In gettokstr(), seen_brct is set to 1 by the '[' and
> never reset to 0, and the '-' is converted to Dash.
>
> % x=
> % y=yes
> % echo ${x:-$y}
> yes
> % a[1]=${x:-$y}
> % echo '<'$a[1]'>'
> <>

There could well be more of these --- as we don't parse patterns until
late (we don't know it's a pattern) and quote handling is done early I
don't see a more general fix.

diff --git a/Src/lex.c b/Src/lex.c
index 3ea878c..23b0a1c 100644
--- a/Src/lex.c
+++ b/Src/lex.c
@@ -1026,8 +1026,10 @@ gettokstr(int c, int sub)
                 c = Inbrace;
                 ++bct;
                 cmdpush(CS_BRACEPAR);
-                if (!in_brace_param)
-                    in_brace_param = bct;
+                if (!in_brace_param) {
+                    if ((in_brace_param = bct))
+                        seen_brct = 0;
+                }
             } else {
                 hungetc(e);
                 lexstop = 0;
diff --git a/Test/D04parameter.ztst b/Test/D04parameter.ztst
index bcea980..a6817fe 100644
--- a/Test/D04parameter.ztst
+++ b/Test/D04parameter.ztst
@@ -1880,3 +1880,9 @@
 >'two words'
 >'three so-called '\''words'\'
 >'three so-called ''words'''
+
+  array=(one two three)
+  array[1]=${nonexistent:-foo}
+  print $array
+0:"-" works after "[" in same expression (Dash problem)
+>foo two three
* Re: [BUG] quoting within bracket patterns has no effect
From: Peter Stephenson @ 2016-01-19 16:03 UTC
To: Zsh Hackers' List

On Mon, 18 Jan 2016 17:24:34 +0000
Peter Stephenson <p.stephenson@samsung.com> wrote:
> Dash is a pattern special, but not shell special, character.  These
> haven't had much attention --- I have a vague memory, which could be
> fallacious, that some time ago the state of the art (whether or not the
> standard) was that it wasn't actually possible to quote these (other
> than by putting them in special positions) in most shells.
>
> There could therefore be others like this.

We need to change "^" and "!" for negation of character sets.

"^" is easy; just remove a special case that it checks for the
non-tokenised version.

"!" needs a new token along the same lines as "-".  It's not used very
much in zsh for this purpose, being inconvenient with history
substitution at the command line.

The changes make the following code behave differently:

  seq="a-z"
  [[ $char = [$seq] ]]

(except with GLOBSUBST in sh emulation where pattern characters from
unquoted variables are active).  Now you need

  [[ $char = [$~seq] ]]

I took a brief look at the completion code to see if anything would be
affected by this, but nothing stood out.

I had to change the code behind $~ so that it always tokenized "!" and
"-", not just after a "[", to get the above case to work.
I don't think this actually makes much difference --- haswilds(),
which looks to see if globbing is needed, already does a more careful
check than just looking for tokens, so the only difference I can think
of is optimisation of a pattern to a pure string match, which could be
optimised to ignore Dash and Bang as they're only active if we have
Inbrack.

Another not very pleasant case is kshglob where "!(...)" expressions
now may have an untokenised or tokenised "!" --- the unquoted
parentheses are what triggers it to be a glob expression.  However,
unless you start fiddling with "disable -p" to turn off this form of
globbing, which you don't (please), no one's going to notice.

pws

diff --git a/README b/README
index 2e2ebce..8ec148e 100644
--- a/README
+++ b/README
@@ -29,17 +29,43 @@ Zsh is a shell with lots of features. For a list of some of these, see the
 file FEATURES, and for the latest changes see NEWS. For more
 details, see the documentation.
 
-Incompatibilities between 5.1 and 5.2
+Incompatibilities between 5.2 and 5.3
 -------------------------------------
 
+In character classes delimited by "[" and "]" within patterns, whether
+used for filename generation (globbing) or other forms of pattern
+matching, it used not to be possible to quote "-" when used for a range,
+or "^" and "!" when used for negating a character set.  The chracters can
+now be quoted by any of the standard shell means, but note that
+the "[" and "]" must not be quoted.  For example,
+
+  [[ $a = ['a-z'] ]]
+
+matches if the variable a contains just one of the characters "a", "-"
+or "z" only.  Previously this would have matched any lower case ASCII
+letter.  Note therefore the useful fact that
+
+  [[ $a = ["$cset"] ]]
+
+matches any chracter contained in the variable "cset".  A consequence
+of this change is that variables that should have active ranges need
+(with default zsh options) to be indicated explicitly, e.g.
+
+  cset="a-z"
+  [[ b = [${~cset}] ]]
+
+The "~" causes the "-" character to be active.  In sh emulation the
+"~" is unncessary in this example and double quotes must be used to
+suppress the range behaviour of the "-".
+
+Incompatibilities between 5.0.8 and 5.2
+---------------------------------------
+
 The behaviour of the parameter flag (P) has changed when it appears
 in a nested parameter group, in order to make it more useful in
 such cases.  A (P) in the outermost parameter group behaves as before.
 See NEWS for more.
 
-Incompatibilities between 5.0.8 and 5.1
----------------------------------------
-
 The default behaviour when text is pasted into an X Windows terminal has
 changed significantly (unless you are using a very old terminal emulator
 that doesn't support this mode).  Now, the new "bracketed paste mode"
diff --git a/Src/glob.c b/Src/glob.c
index e5d8956..c799281 100644
--- a/Src/glob.c
+++ b/Src/glob.c
@@ -3476,7 +3476,7 @@ static void
 zshtokenize(char *s, int flags)
 {
     char *t;
-    int bslash = 0, seen_brct = 0;
+    int bslash = 0;
 
     for (; *s; s++) {
       cont:
@@ -3507,20 +3507,6 @@ zshtokenize(char *s, int flags)
             *t = Inang;
             *s = Outang;
             break;
-        case '[':
-            if (bslash)
-                s[-1] = (flags & ZSHTOK_SUBST) ? Bnullkeep : Bnull;
-            else {
-                seen_brct = 1;
-                *s = Inbrack;
-            }
-            break;
-        case '-':
-            if (bslash)
-                s[-1] = (flags & ZSHTOK_SUBST) ? Bnullkeep : Bnull;
-            else if (seen_brct) /* see corresonding code in lex.c */
-                *s = Dash;
-            break;
         case '(':
         case '|':
         case ')':
@@ -3531,10 +3517,13 @@ zshtokenize(char *s, int flags)
         case '^':
         case '#':
         case '~':
+        case '[':
         case ']':
         case '*':
         case '?':
         case '=':
+        case '-':
+        case '!':
             for (t = ztokens; *t; t++) {
                 if (*t == *s) {
                     if (bslash)
diff --git a/Src/lex.c b/Src/lex.c
index 9a7e3b8..0202d25 100644
--- a/Src/lex.c
+++ b/Src/lex.c
@@ -35,7 +35,7 @@
 /* tokens */
 
 /**/
-mod_export char ztokens[] = "#$^*(())$=|{}[]`<>>?~`,-'\"\\\\";
+mod_export char ztokens[] = "#$^*(())$=|{}[]`<>>?~`,-!'\"\\\\";
 
 /* parts of the current token */
 
@@ -395,8 +395,9 @@ ctxtlex(void)
 #define LX2_BQUOTE 16
 #define LX2_COMMA 17
 #define LX2_DASH 18
-#define LX2_OTHER 19
-#define LX2_META 20
+#define LX2_BANG 19
+#define LX2_OTHER 20
+#define LX2_META 21
 
 static unsigned char lexact1[256], lexact2[256], lextok2[256];
 
@@ -406,10 +407,10 @@ initlextabs(void)
 {
     int t0;
     static char *lx1 = "\\q\n;!&|(){}[]<>";
-    static char *lx2 = ";)|$[]~({}><=\\\'\"`,-";
+    static char *lx2 = ";)|$[]~({}><=\\\'\"`,-!";
 
     for (t0 = 0; t0 != 256; t0++) {
-	lexact1[t0] = LX1_OTHER;
+        lexact1[t0] = LX1_OTHER;
         lexact2[t0] = LX2_OTHER;
         lextok2[t0] = t0;
     }
@@ -1361,12 +1362,20 @@ gettokstr(int c, int sub)
              */
             if (seen_brct)
                 c = Dash;
-            else
-                c = '-';
-            break;
-        }
-        add(c);
-        c = hgetc();
+           else
+               c = '-';
+           break;
+       case LX2_BANG:
+           /*
+            * Same logic as Dash, for ! to perform negation in range.
+            */
+           if (seen_brct)
+               c = Bang;
+           else
+               c = '!';
+       }
+       add(c);
+       c = hgetc();
         if (intpos)
             intpos--;
         if (lexstop)
diff --git a/Src/pattern.c b/Src/pattern.c
index d2b8c59..72c7d97 100644
--- a/Src/pattern.c
+++ b/Src/pattern.c
@@ -247,7 +247,7 @@ typedef unsigned long zrange_t;
  */
 static const char zpc_chars[ZPC_COUNT] = {
     '/', '\0', Bar, Outpar, Tilde, Inpar, Quest, Star, Inbrack, Inang,
-    Hat, Pound, Bnullkeep, Quest, Star, '+', '!', '@'
+    Hat, Pound, Bnullkeep, Quest, Star, '+', Bang, '!', '@'
 };
 
 /*
@@ -257,7 +257,7 @@ static const char zpc_chars[ZPC_COUNT] = {
 /**/
 mod_export const char *zpc_strings[ZPC_COUNT] = {
     NULL, NULL, "|", NULL, "~", "(", "?", "*", "[", "<",
-    "^", "#", NULL, "?(", "*(", "+(", "!(", "@("
+    "^", "#", NULL, "?(", "*(", "+(", "!(", "\\!(", "@("
 };
 
 /*
@@ -481,7 +481,7 @@ patcompcharsset(void)
          */
         zpc_special[ZPC_KSH_QUEST] = zpc_special[ZPC_KSH_STAR] =
             zpc_special[ZPC_KSH_PLUS] = zpc_special[ZPC_KSH_BANG] =
-            zpc_special[ZPC_KSH_AT] = Marker;
+            zpc_special[ZPC_KSH_BANG2] = zpc_special[ZPC_KSH_AT] = Marker;
     }
     /*
      * Note that if we are using KSHGLOB, then we test for a following
@@ -1268,6 +1268,8 @@ patcomppiece(int *flagp, int paren)
                 kshchar = STOUC('+');
             else if (*patparse == zpc_special[ZPC_KSH_BANG])
                 kshchar = STOUC('!');
+            else if (*patparse == zpc_special[ZPC_KSH_BANG2])
+                kshchar = STOUC('!');
             else if (*patparse == zpc_special[ZPC_KSH_AT])
                 kshchar = STOUC('@');
             else if (*patparse == zpc_special[ZPC_KSH_STAR])
@@ -1424,7 +1426,7 @@ patcomppiece(int *flagp, int paren)
         DPUTS(zpc_special[ZPC_INBRACK] == Marker,
               "Treating '[' as pattern character although disabled");
         flags |= P_SIMPLE;
-        if (*patparse == Hat || *patparse == '^' || *patparse == '!') {
+        if (*patparse == Hat || *patparse == Bang) {
             patparse++;
             starter = patnode(P_ANYBUT);
         } else
@@ -4245,7 +4247,8 @@ haswilds(char *str)
                 ((str[-1] == Quest && !zpc_disables[ZPC_KSH_QUEST]) ||
                  (str[-1] == Star && !zpc_disables[ZPC_KSH_STAR]) ||
                  (str[-1] == '+' && !zpc_disables[ZPC_KSH_PLUS]) ||
-                 (str[-1] == '!' && !zpc_disables[ZPC_KSH_BANG]) ||
+                 (str[-1] == Bang && !zpc_disables[ZPC_KSH_BANG]) ||
+                 (str[-1] == '!' && !zpc_disables[ZPC_KSH_BANG2]) ||
                  (str[-1] == '@' && !zpc_disables[ZPC_KSH_AT]))))
             return 1;
         break;
diff --git a/Src/zsh.h b/Src/zsh.h
index 6ee2a9c..0120ad7 100644
--- a/Src/zsh.h
+++ b/Src/zsh.h
@@ -193,29 +193,30 @@ struct mathfunc {
 #define Qtick           ((char) 0x99)
 #define Comma           ((char) 0x9a)
 #define Dash            ((char) 0x9b) /* Only in patterns */
+#define Bang            ((char) 0x9c) /* Only in patterns */
 /*
  * Marks the last of the group above.
  * Remaining tokens are even more special.
  */
-#define LAST_NORMAL_TOK Dash
+#define LAST_NORMAL_TOK Bang
 /*
  * Null arguments: placeholders for single and double quotes
  * and backslashes.
  */
-#define Snull           ((char) 0x9c)
-#define Dnull           ((char) 0x9d)
-#define Bnull           ((char) 0x9e)
+#define Snull           ((char) 0x9d)
+#define Dnull           ((char) 0x9e)
+#define Bnull           ((char) 0x9f)
 /*
  * Backslash which will be returned to "\" instead of being stripped
  * when we turn the string into a printable format.
  */
-#define Bnullkeep       ((char) 0x9f)
+#define Bnullkeep       ((char) 0xa0)
 /*
  * Null argument that does not correspond to any character.
  * This should be last as it does not appear in ztokens and
  * is used to initialise the IMETA type in inittyptab().
  */
-#define Nularg          ((char) 0xa0)
+#define Nularg          ((char) 0xa1)
 
 /*
  * Take care to update the use of IMETA appropriately when adding
@@ -226,7 +227,7 @@ struct mathfunc {
  * Also used in pattern character arrays as guaranteed not to
  * mark a character in a string.
  */
-#define Marker          ((char) 0xa1)
+#define Marker          ((char) 0xa2)
 
 /* chars that need to be quoted if meant literally */
 
@@ -1549,6 +1550,7 @@ enum zpc_chars {
     ZPC_KSH_STAR,               /* * for *(...) in KSH_GLOB */
     ZPC_KSH_PLUS,               /* + for +(...) in KSH_GLOB */
     ZPC_KSH_BANG,               /* ! for !(...) in KSH_GLOB */
+    ZPC_KSH_BANG2,              /* ! for !(...) in KSH_GLOB, untokenised */
     ZPC_KSH_AT,                 /* @ for @(...) in KSH_GLOB */
     ZPC_COUNT                   /* Number of special chararacters */
 };
diff --git a/Test/D02glob.ztst b/Test/D02glob.ztst
index 89256e3..a6b704a 100644
--- a/Test/D02glob.ztst
+++ b/Test/D02glob.ztst
@@ -622,3 +622,36 @@
 0:quoted - works in pattern in parameter
 >bcdef
 >cdef
+
+  [[ a != [^a] ]]
+0:^ active in character class if not quoted
+
+  [[ a = ['^a'] ]]
+0:^ not active in character class if quoted
+
+  [[ a != [!a] ]]
+0:! active in character class if not quoted
+
+  [[ a = ['!a'] ]]
+0:! not active in character class if quoted
+
+  # Actually, we don't need the quoting here,
+  # c.f. the next test.  This just makes it look
+  # more standard.
+  cset="^a-z"
+  [[ "^" = ["$cset"] ]] || print Fail 1
+  [[ "a" = ["$cset"] ]] || print Fail 2
+  [[ "-" = ["$cset"] ]] || print Fail 3
+  [[ "z" = ["$cset"] ]] || print Fail 4
+  [[ "1" != ["$cset"] ]] || print Fail 5
+  [[ "b" != ["$cset"] ]] || print Fail 6
+0:character set specified as quoted variable
+
+  cset="^a-z"
+  [[ "^" = [$~cset] ]] || print Fail 1
+  [[ "a" != [$~cset] ]] || print Fail 2
+  [[ "-" = [$~cset] ]] || print Fail 3
+  [[ "z" != [$~cset] ]] || print Fail 4
+  [[ "1" = [$~cset] ]] || print Fail 5
+  [[ "b" != [$~cset] ]] || print Fail 6
+0:character set specified as active variabe
* Re: [BUG] quoting within bracket patterns has no effect
From: Mikael Magnusson @ 2016-01-19 16:25 UTC
To: Peter Stephenson; +Cc: Zsh Hackers' List

On Tue, Jan 19, 2016 at 5:03 PM, Peter Stephenson
<p.stephenson@samsung.com> wrote:
> diff --git a/README b/README
> index 2e2ebce..8ec148e 100644
> --- a/README
> +++ b/README
> @@ -29,17 +29,43 @@ Zsh is a shell with lots of features. For a list of some of these, see the
>  file FEATURES, and for the latest changes see NEWS. For more
>  details, see the documentation.
>
> -Incompatibilities between 5.1 and 5.2
> +Incompatibilities between 5.2 and 5.3
>  -------------------------------------
>
> +In character classes delimited by "[" and "]" within patterns, whether
> +used for filename generation (globbing) or other forms of pattern
> +matching, it used not to be possible to quote "-" when used for a range,
> +or "^" and "!" when used for negating a character set.  The chracters can
> +now be quoted by any of the standard shell means, but note that
> +the "[" and "]" must not be quoted.  For example,
> +
> +  [[ $a = ['a-z'] ]]
> +
> +matches if the variable a contains just one of the characters "a", "-"
> +or "z" only.  Previously this would have matched any lower case ASCII
> +letter.  Note therefore the useful fact that
> +
> +  [[ $a = ["$cset"] ]]
> +
> +matches any chracter contained in the variable "cset".  A consequence
> +of this change is that variables that should have active ranges need
> +(with default zsh options) to be indicated explicitly, e.g.
> +
> +  cset="a-z"
> +  [[ b = [${~cset}] ]]
> +
> +The "~" causes the "-" character to be active.  In sh emulation the
> +"~" is unncessary in this example and double quotes must be used to
> +suppress the range behaviour of the "-".
Does this mean [$cset] and ["$cset"] work the same way in zsh
emulation, and [$cset] and [$~cset] work the same in sh emulation?

(character is also somewhat consistently typoed as chracters in two or
three places).

-- 
Mikael Magnusson
* Re: [BUG] quoting within bracket patterns has no effect
From: Peter Stephenson @ 2016-01-19 16:34 UTC
To: Zsh Hackers' List

On Tue, 19 Jan 2016 17:25:58 +0100
Mikael Magnusson <mikachu@gmail.com> wrote:
> Does this mean [$cset] and ["$cset"] work the same way in zsh
> emulation, and [$cset] and [$~cset] work the same in sh emulation?

Yes, that's the upshot.

> (character is also somewhat consistently typoed as chracters in two or
> three places).

waste of a syllable.

pws
* Re: [BUG] quoting within bracket patterns has no effect
From: Bart Schaefer @ 2016-01-19 18:41 UTC
To: Zsh Hackers' List

On Jan 19,  4:03pm, Peter Stephenson wrote:
} Subject: Re: [BUG] quoting within bracket patterns has no effect
}
} -Incompatibilities between 5.1 and 5.2
} +Incompatibilities between 5.2 and 5.3
}  -------------------------------------
}
} -mod_export char ztokens[] = "#$^*(())$=|{}[]`<>>?~`,-'\"\\\\";
} +mod_export char ztokens[] = "#$^*(())$=|{}[]`<>>?~`,-!'\"\\\\";

Just a reminder that this needs a bump in the -dev suffix in version.mk
-- otherwise, zcompile'd functions will misbehave.
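Bart's reminder makes more sense once you see how token bytes are assigned. zshtokenize() computes "(t - ztokens) + Pound", so a token's value is its position in ztokens. Inserting "-" and "!" shifts every later entry, which is why Snull and the following tokens are renumbered in the zsh.h hunks.

My assumption, not stated in the thread, is that zcompile'd wordcode stores these already-tokenised bytes. If so, a file compiled by one build would be decoded with different meanings by another.

The sketch takes Pound = 0x84 from zsh.h. token_for() is a hypothetical helper that only reproduces the position-to-byte mapping.

```c
#include <assert.h>
#include <string.h>

/* Value of the first token, Pound, in zsh.h. */
#define POUND 0x84

/*
 * Token byte for ch under a given ztokens table: Pound plus the index
 * of the first match, as in "(t - ztokens) + Pound" in zshtokenize().
 * Returns -1 if ch is not in the table.
 */
static int token_for(const char *ztokens, char ch)
{
    const char *t;

    if (ch == '\0')
        return -1;
    t = strchr(ztokens, ch);
    return t ? POUND + (int)(t - ztokens) : -1;
}

/* The table before either patch, and after both of them. */
static const char old_ztokens[] = "#$^*(())$=|{}[]`<>>?~`,'\"\\\\";
static const char new_ztokens[] = "#$^*(())$=|{}[]`<>>?~`,-!'\"\\\\";
```

The single quote maps to 0x9b with the old table and to 0x9d with the new one. These values match Snull before and after the patches. Entries up to "," (Comma, 0x9a) keep their values.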
* Re: [BUG] quoting within bracket patterns has no effect
From: Martijn Dekker @ 2016-01-23 0:17 UTC
To: zsh-workers; +Cc: Peter Stephenson

> On Mon, 18 Jan 2016 05:23:07 +0100
> Martijn Dekker <martijn@inlv.org> wrote:
>> > Quotes should disable the special meaning of characters in glob
>> > patterns[*].
>> >
>> > [*] "If any character (ordinary, shell special, or pattern special) is
>> > quoted, that pattern shall match the character itself."
>> > http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13_01

In the current git version of zsh (zsh-5.2-97-g1aec003), this bug is
now fixed for literal patterns:

    case b in
    ( ['a-c'] )  echo 'false match' ;;
    ( [a-c] )    echo 'correct match' ;;
    esac

outputs "correct match" as expected.

However, for variables there is now a new problem, the opposite of the
old one: unlike in every other shell, a range is not recognised even if
the variable is *not* quoted. So quoting a variable still has no effect,
it's just that range parsing from variables was disabled altogether. The
following code:

    myrange='a-z'
    somevar='c'
    case $somevar in
    ( *[$myrange]* )  echo "$somevar is part of $myrange" ;;
    esac

outputs "c is part of a-z" on every shell except current zsh.

Quoting the variable ( *["$myrange"]* ) should make it output nothing.

Thanks,

- M.
* Re: [BUG] quoting within bracket patterns has no effect
From: Bart Schaefer @ 2016-01-23 1:49 UTC
To: zsh-workers

On Jan 23, 12:17am, Martijn Dekker wrote:
}
} However, for variables there is now a new problem, the opposite of the
} old one: unlike in every other shell, a range is not recognised even if
} the variable is *not* quoted. So quoting a variable still has no effect,
} it's just that range parsing from variables was disabled altogether. The
} following code:
}
} myrange='a-z'
} somevar='c'
} case $somevar in
}   ( *[$myrange]* ) echo "$somevar is part of $myrange" ;;
} esac
}
} outputs "c is part of a-z" on every shell except current zsh.

This is related to long-standing behavior for zsh in native mode.

In as-close-as-zsh-has-to-POSIX mode:

schaefer[655] ARGV0=sh Src/zsh
$ myrange='a-z'
$ somevar='c'
$ case $somevar in
> ( *[$myrange]* ) echo "$somevar is part of $myrange" ;;
> esac
c is part of a-z
$

In native mode you need to use $~param to activate pattern characters
in the expanded value:

schaefer[656] Src/zsh -f
torch% myrange='a-z'
torch% somevar='c'
torch% case $somevar in
case> ( *[$~myrange]* ) echo "$somevar is part of $myrange" ;;
case> esac
c is part of a-z
torch%

It's true that needing this inside a character class now differs from
previous versions of zsh for native mode. I'm not sure it's possible
to have it both ways.
* Re: [BUG] quoting within bracket patterns has no effect
From: Martijn Dekker @ 2016-01-26 4:03 UTC
To: zsh-workers; +Cc: Bart Schaefer

Bart Schaefer wrote on 23-01-16 at 01:49:
> This is related to long-standing behavior for zsh in native mode.
>
> In as-close-as-zsh-has-to-POSIX mode:

(Which, by the way, is very close now.)

> In native mode you need to use $~param to activate pattern characters
> in the expanded value:
>
> schaefer[656] Src/zsh -f
> torch% myrange='a-z'
> torch% somevar='c'
> torch% case $somevar in
> case> ( *[$~myrange]* ) echo "$somevar is part of $myrange" ;;
> case> esac
> c is part of a-z
> torch%
>
>
> It's true that needing this inside a character class now differs from
> previous versions of zsh for native mode. I'm not sure it's possible
> to have it both ways.

In normal variable expansion, setting the option SH_WORD_SPLIT causes
unquoted $var to be equivalent to ${~var} in variable expansion.
Wouldn't it make sense to have SH_WORD_SPLIT activate pattern characters
in unquoted variables in range expressions as well?

This would make zsh under 'emulate sh' exactly compatible with bash,
(d)ash, {pd,m}ksh and yash.

- Martijn
* Re: [BUG] quoting within bracket patterns has no effect
From: Bart Schaefer @ 2016-01-26 4:48 UTC
To: zsh-workers

On Jan 26, 4:03am, Martijn Dekker wrote:
}
} In normal variable expansion, setting the option SH_WORD_SPLIT causes
} unquoted $var to be equivalent to ${~var} in variable expansion.
} Wouldn't it make sense to have SH_WORD_SPLIT activate pattern characters
} in unquoted variables in range expressions as well?

Look again at my first example from the previous message:

schaefer[691] Src/zsh -f
torch% emulate sh
torch% myrange='a-z'
torch% somevar='c'
torch% case $somevar in
case> ( *[$myrange]* ) echo "$somevar is part of $myrange" ;;
case> esac
c is part of a-z
torch% print $ZSH_PATCHLEVEL
zsh-5.2-103-g69c86cd

What about that is incorrect? You need $~myrange for "emulate zsh" but
NOT for "emulate sh", unless I'm missing something.

Also it's never been "setopt shwordsplit" that enables patterns in a
parameter expansion, rather it's "setopt globsubst":

schaefer[692] Src/zsh -f
torch% x='c*h'
torch% print $x
c*h
torch% setopt shwordsplit
torch% print $x
c*h
torch% setopt globsubst
torch% print $x
config.h config.modules.sh

What I was pointing out when I said "I'm not sure it's possible to have
it both ways" has ONLY to do with "emulate zsh".

The problem is that the parsing happens at two different places -- at
the time $myrange is expanded, I don't believe the parameter substitution
code knows it's inside a character set in an active pattern; so there's
no way to temporarily activate globsubst except by explicitly doing so.

This may be a case where native zsh is incompatible with POSIX at a
fairly fundamental level; old working zsh scripts are potentially going
to break, and I don't think we can do anything about it unless we want
to tell POSIX to go pound sand.
* Re: [BUG] quoting within bracket patterns has no effect
From: Martijn Dekker @ 2016-01-26 14:07 UTC
To: zsh-workers; +Cc: Bart Schaefer

Bart Schaefer wrote on 26-01-16 at 04:48:
> What about that is incorrect? You need $~myrange for "emulate zsh" but
> NOT for "emulate sh", unless I'm missing something.
[...]
> What I was pointing out when I said "I'm not sure it's possible to have
> it both ways" has ONLY to do with "emulate zsh".

Indeed, I got the wrong end of the stick there. Sorry about that.

> This may be a case where native zsh is incompatible with POSIX at a
> fairly fundamental level;

Sure, but native zsh is its own entity, which is fine.

Thanks,

- M.
* Re: [BUG] quoting within bracket patterns has no effect
From: Bart Schaefer @ 2016-01-27 3:05 UTC
To: zsh-workers

On Jan 26, 2:07pm, Martijn Dekker wrote:
}
} > This may be a case where native zsh is incompatible with POSIX at a
} > fairly fundamental level;
}
} Sure, but native zsh is its own entity, which is fine.

Yes, but the point is this incompatibility is so fundamental that the
same code can't even support both alternatives. To support POSIX we
have to "break" native zsh.
end of thread

Thread overview: 17+ messages
2016-01-18 4:23 [BUG] quoting within bracket patterns has no effect Martijn Dekker
2016-01-18 17:24 ` Peter Stephenson
2016-01-19 15:57 ` Jun T.
2016-01-19 17:35 ` Peter Stephenson
2016-01-19 18:54 ` Bart Schaefer
2016-01-20 10:48 ` Jun T.
2016-01-20 11:04 ` Peter Stephenson
2016-01-19 16:03 ` Peter Stephenson
2016-01-19 16:25 ` Mikael Magnusson
2016-01-19 16:34 ` Peter Stephenson
2016-01-19 18:41 ` Bart Schaefer
2016-01-23 0:17 ` Martijn Dekker
2016-01-23 1:49 ` Bart Schaefer
2016-01-26 4:03 ` Martijn Dekker
2016-01-26 4:48 ` Bart Schaefer
2016-01-26 14:07 ` Martijn Dekker
2016-01-27 3:05 ` Bart Schaefer