Bart Schaefer wrote:
> } Another question is what to do with user names. Currently these are
> } just the ASCII identifier characters plus "-". Is it useful to extend
> } these to include alphanumeric characters from the local character set?
>
> My impression is that it would be, but some non-English-speakers should
> weigh in.

I've allowed it; it's particularly useful if you're using the extended
parameter naming rules and the reference is to a named directory.

> } Finally, I failed to interpret this code from math.c:
> }
> }     if (*ptr == '+' && (unary || !ialnum(*ptr))) {
> }         ptr++;
>
> I suspect that's a bug. It probably originally said
>
>     if (*ptr++ == '+' && (unary || !ialnum(*ptr))) {
>
> but someone realized it was wrong to increment the pointer if it was NOT
> equal to plus, and made an incomplete fix.

I've just assumed this is "&& 1" which is how it's evaluated as far back
as the CVS archive goes without any obvious problems, and hence removed it.

Here is the patch. The MULTIBYTE and POSIX_IDENTIFIERS options should be
respected whenever necessary when testing character types.

There's one remaining big job: I have not yet fixed up IFS to handle
multibyte characters (also isep() and ISEP macros). That looks a little
messy in places.

The other remaining cases I'm aware of where we still don't test for
multibyte characters should be harmless:

- Some idigit()s. I don't see any good reason for allowing active
  multibyte digit characters in numerical expressions (for example,
  extra width digits), so anywhere a real digit (rather than just a
  printable character that happens to look like a digit) is required it
  must be 0 to 9 from the portable character set.
- Likewise some iblank()s when inputting text. Whitespace has to be
  portable whitespace.
- One ialpha() when checking options to builtins, since all option
  letters come from the portable character set.
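For readers who want the gist of the scanning logic before reading the patch, here is a minimal standalone sketch in C of the idea behind the new itype_end() helper. It is not the zsh implementation. The names ident_end(), ascii_ident() and the posix_only parameter are hypothetical, it uses the standard mbrtowc() and iswalnum() in place of zsh's metafied strings and type table, and it handles only the identifier case.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: portable identifier characters only. */
static int ascii_ident(unsigned char c)
{
    return c == '_' || (c >= '0' && c <= '9') ||
           (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/*
 * Hypothetical stand-in for the idea behind itype_end(ptr, IIDENT, once).
 * Returns a pointer just past the leading run of identifier characters
 * in s, or s itself if the first character is not one.
 *
 * posix_only != 0 models POSIX_IDENTIFIERS being set: non-ASCII bytes
 *                 always end the identifier.
 * once != 0       tests the first character only.
 *
 * As in the patch, a leading digit is accepted here; callers that care
 * have to reject it separately.
 */
const char *ident_end(const char *s, int posix_only, int once)
{
    mbstate_t st;
    memset(&st, 0, sizeof st);

    while (*s) {
        unsigned char c = (unsigned char)*s;
        size_t len = 1;

        if (c < 128) {
            /* ASCII: use the plain table-style test. */
            if (!ascii_ident(c))
                break;
        } else {
            wchar_t wc;

            if (posix_only)
                break;
            /* Decode one multibyte character in the current locale. */
            len = mbrtowc(&wc, s, strlen(s), &st);
            if (len == (size_t)-1 || len == (size_t)-2 || len == 0)
                break;          /* invalid or incomplete sequence */
            if (!iswalnum((wint_t)wc))
                break;          /* valid, but not alphanumeric */
        }
        s += len;
        if (once)
            break;
    }
    return s;
}
```

A caller would test "is this an identifier start?" with ident_end(p, posix_only, 1) != p, which mirrors the itype_end(t, IIDENT, 1) != t idiom used in the patch. Non-ASCII alphanumerics are only recognised after the program has called setlocale(LC_CTYPE, "") in a suitable locale; without that, the sketch falls back to ASCII-only behaviour.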
Index: README
===================================================================
RCS file: /cvsroot/zsh/zsh/README,v
retrieving revision 1.33
diff -u -r1.33 README
--- README 26 Jun 2006 09:57:17 -0000 1.33
+++ README 10 Jul 2006 12:49:17 -0000
@@ -50,11 +50,23 @@
 subsequently by the user. It is valid for the variable to be unset.
 
 Zsh has previously been lax about whether it allows octets with the
-top bit set to be part of a shell identifier. With --enable-multibyte set,
-this is now completely disabled. This is a temporary fix until the main
-shell handles multibyte characters properly and the appropriate library
-tests can be used. This change may be reviewed if no such permanent fix
-is forthcoming.
+top bit set to be part of a shell identifier. Older versions of the shell
+assumed all such octets were allowed in identifiers, however the POSIX
+standard does not allow such characters in identifiers. The older
+behaviour is still obtained with --disable-multibyte in effect.
+With --enable-multibyte set there are three possible cases:
+ MULTIBYTE option unset: only ASCII characters are allowed; the
+ shell does not attempt to identify non-ASCII characters at all.
+ MULTIBYTE option set, POSIX_IDENTIFIERS option unset: in addition
+ to the POSIX characters, any alphanumeric characters in the
+ local character set are allowed. Note that scripts and functions that
+ take advantage of this are non-portable; however, this is in the spirit
+ of previous versions of the shell. Note also that the options must
+ be set before the shell parses the script or function; setting
+ them during execution is not sufficient.
+ MULTIBYTE option set, POSIX_IDENTIFIERS set: only ASCII characters
+ are allowed in identifiers even though the shell will recognise
+ alphanumeric multibyte characters.
 
 The completion style pine-directory must now be set to use completion
 for PINE mailbox folders; previously it had the default ~/mail.
This Index: Doc/Zsh/options.yo =================================================================== RCS file: /cvsroot/zsh/zsh/Doc/Zsh/options.yo,v retrieving revision 1.46 diff -u -r1.46 options.yo --- Doc/Zsh/options.yo 9 Apr 2006 21:47:22 -0000 1.46 +++ Doc/Zsh/options.yo 10 Jul 2006 12:49:18 -0000 @@ -1204,6 +1204,27 @@ tt(trap) and tt(unset). ) +pindex(POSIX_IDENTIFIERS) +cindex(identifiers, non-portable characters in) +cindex(parameter names, non-portable characters in) +item(tt(POSIX_IDENTIFIERS) )( +When this option is set, only the ASCII characters tt(a) to tt(z), tt(A) to +tt(Z), tt(0) to tt(9) and tt(_) may be used in identifiers (names +of shell parameters and modules). + +When the option is unset and multibyte character support is enabled (i.e. it +is compiled in and the option tt(MULTIBYTE) is set), then additionally any +alphanumeric characters in the local character set may be used in +identifiers. Note that scripts and functions written with this feature are +not portable, and also that both options must be set before the script +or function is parsed; setting them during execution is not sufficient +as the syntax var(variable)tt(=)var(value) has already been parsed as +a command rather than an assignment. + +If multibyte character support is not compiled into the shell this option is +ignored; all octets with the top bit set may be used in identifiers. +This is non-standard but is the traditional zsh behaviour. 
+) pindex(SH_FILE_EXPANSION) cindex(sh, expansion style) cindex(expansion style, sh) Index: Src/builtin.c =================================================================== RCS file: /cvsroot/zsh/zsh/Src/builtin.c,v retrieving revision 1.158 diff -u -r1.158 builtin.c --- Src/builtin.c 30 May 2006 22:35:03 -0000 1.158 +++ Src/builtin.c 10 Jul 2006 12:49:20 -0000 @@ -2629,9 +2629,7 @@ char *modname = NULL; char *ptr; - for (ptr = funcname; *ptr; ptr++) - if (!iident(*ptr)) - break; + ptr = itype_end(funcname, IIDENT, 0); if (idigit(*funcname) || funcname == ptr || *ptr) { zwarnnam(name, "-M %s: bad math function name", funcname); return 1; Index: Src/glob.c =================================================================== RCS file: /cvsroot/zsh/zsh/Src/glob.c,v retrieving revision 1.51 diff -u -r1.51 glob.c --- Src/glob.c 30 May 2006 22:35:03 -0000 1.51 +++ Src/glob.c 10 Jul 2006 12:49:21 -0000 @@ -1443,9 +1443,7 @@ if (s[-1] == '+') { plus = 0; - tt = s; - while (iident(*tt)) - tt++; + tt = itype_end(s, IIDENT, 0); if (tt == s) { zerr("missing identifier after `+'"); Index: Src/lex.c =================================================================== RCS file: /cvsroot/zsh/zsh/Src/lex.c,v retrieving revision 1.33 diff -u -r1.33 lex.c --- Src/lex.c 30 May 2006 22:35:03 -0000 1.33 +++ Src/lex.c 10 Jul 2006 12:49:22 -0000 @@ -1135,10 +1135,13 @@ if (idigit(*t)) while (++t < bptr && idigit(*t)); else { - while (iident(*t) && ++t < bptr); + int sav = *bptr; + *bptr = '\0'; + t = itype_end(t, IIDENT, 0); if (t < bptr) { - *bptr = '\0'; skipparens(Inbrack, Outbrack, &t); + } else { + *bptr = sav; } } if (*t == '+') Index: Src/math.c =================================================================== RCS file: /cvsroot/zsh/zsh/Src/math.c,v retrieving revision 1.25 diff -u -r1.25 math.c --- Src/math.c 30 Jun 2006 09:41:35 -0000 1.25 +++ Src/math.c 10 Jul 2006 12:49:22 -0000 @@ -265,11 +265,12 @@ { int cct = 0; yyval.type = MN_INTEGER; + char *ie; for (;; cct = 0) switch 
(*ptr++) { case '+': - if (*ptr == '+' && (unary || !ialnum(*ptr))) { + if (*ptr == '+') { ptr++; return (unary) ? PREPLUS : POSTPLUS; } @@ -279,7 +280,7 @@ } return (unary) ? UPLUS : PLUS; case '-': - if (*ptr == '-' && (unary || !ialnum(*ptr))) { + if (*ptr == '-') { ptr++; return (unary) ? PREMINUS : POSTMINUS; } @@ -469,12 +470,12 @@ } cct = 1; } - if (iident(*ptr)) { + if ((ie = itype_end(ptr, IIDENT, 0)) != ptr) { int func = 0; char *p; p = ptr; - while (iident(*++ptr)); + ptr = ie; if (*ptr == '[' || (!cct && *ptr == '(')) { char op = *ptr, cp = ((*ptr == '[') ? ']' : ')'); int l; Index: Src/module.c =================================================================== RCS file: /cvsroot/zsh/zsh/Src/module.c,v retrieving revision 1.22 diff -u -r1.22 module.c --- Src/module.c 30 May 2006 22:35:03 -0000 1.22 +++ Src/module.c 10 Jul 2006 12:49:23 -0000 @@ -734,12 +734,8 @@ modname_ok(char const *p) { do { - if(*p != '_' && !ialnum(*p)) - return 0; - do { - p++; - } while(*p == '_' || ialnum(*p)); - if(!*p) + p = itype_end(p, IIDENT, 0); + if (!*p) return 1; } while(*p++ == '/'); return 0; Index: Src/options.c =================================================================== RCS file: /cvsroot/zsh/zsh/Src/options.c,v retrieving revision 1.28 diff -u -r1.28 options.c --- Src/options.c 30 May 2006 22:35:03 -0000 1.28 +++ Src/options.c 10 Jul 2006 12:49:23 -0000 @@ -176,6 +176,7 @@ {{NULL, "overstrike", 0}, OVERSTRIKE}, {{NULL, "pathdirs", OPT_EMULATE}, PATHDIRS}, {{NULL, "posixbuiltins", OPT_EMULATE|OPT_BOURNE}, POSIXBUILTINS}, +{{NULL, "posixidentifiers", OPT_EMULATE|OPT_BOURNE}, POSIXIDENTIFIERS}, {{NULL, "printeightbit", 0}, PRINTEIGHTBIT}, {{NULL, "printexitvalue", 0}, PRINTEXITVALUE}, {{NULL, "privileged", OPT_SPECIAL}, PRIVILEGED}, Index: Src/params.c =================================================================== RCS file: /cvsroot/zsh/zsh/Src/params.c,v retrieving revision 1.116 diff -u -r1.116 params.c --- Src/params.c 27 Jun 2006 16:28:46 -0000 1.116 
+++ Src/params.c 10 Jul 2006 12:49:24 -0000 @@ -899,9 +899,7 @@ break; } else { /* Find the first character in `s' not in the iident type table */ - for (ss = s; *ss; ss++) - if (!iident(*ss)) - break; + ss = itype_end(s, IIDENT, 0); } /* If the next character is not [, then it is * @@ -1653,7 +1651,7 @@ mod_export Value fetchvalue(Value v, char **pptr, int bracks, int flags) { - char *s, *t; + char *s, *t, *ie; char sav, c; int ppar = 0; @@ -1665,9 +1663,8 @@ else ppar = *s++ - '0'; } - else if (iident(c)) - while (iident(*s)) - s++; + else if ((ie = itype_end(s, IIDENT, 0)) != s) + s = ie; else if (c == Quest) *s++ = '?'; else if (c == Pound) @@ -1732,7 +1729,7 @@ return v; } } else if (!(flags & SCANPM_ASSIGNING) && v->isarr && - iident(*t) && isset(KSHARRAYS)) + itype_end(t, IIDENT, 1) != t && isset(KSHARRAYS)) v->end = 1, v->isarr = 0; } if (!bracks && *s) Index: Src/parse.c =================================================================== RCS file: /cvsroot/zsh/zsh/Src/parse.c,v retrieving revision 1.56 diff -u -r1.56 parse.c --- Src/parse.c 9 Jul 2006 14:47:22 -0000 1.56 +++ Src/parse.c 10 Jul 2006 12:49:26 -0000 @@ -1603,10 +1603,7 @@ if (*ptr == Outbrace && ptr > tokstr + 1) { - while (--ptr > tokstr) - if (!iident(*ptr)) - break; - if (ptr == tokstr) + if (itype_end(tokstr, IIDENT, 0) >= ptr - 1) { char *toksave = tokstr; char *idstring = dupstrpfx(tokstr+1, eptr-tokstr-1); Index: Src/subst.c =================================================================== RCS file: /cvsroot/zsh/zsh/Src/subst.c,v retrieving revision 1.53 diff -u -r1.53 subst.c --- Src/subst.c 28 Jun 2006 14:34:27 -0000 1.53 +++ Src/subst.c 10 Jul 2006 12:49:28 -0000 @@ -475,15 +475,14 @@ return 0; *namptr = dyncat(ds, ptr); return 1; - } else if (iuser(str[1])) { /* ~foo */ - char *ptr, *hom, save; + } else if ((ptr = itype_end(str+1, IUSER, 0)) != str+1) { /* ~foo */ + char *hom, save; - for (ptr = ++str; *ptr && iuser(*ptr); ptr++); save = *ptr; if (!isend(save)) return 0; *ptr = 
0; - if (!(hom = getnameddir(str))) { + if (!(hom = getnameddir(++str))) { if (isset(NOMATCH)) zerr("no such user or named directory: %s", str); *ptr = save; @@ -1146,9 +1145,10 @@ * Shouldn't this be a table or something? We test for all * these later on, too. */ - if (!ialnum(c = *s) && c != '#' && c != Pound && c != '-' && - c != '!' && c != '$' && c != String && c != Qstring && - c != '?' && c != Quest && c != '_' && + c = *s; + if (itype_end(s, IIDENT, 1) == s && *s != '#' && c != Pound && + c != '-' && c != '!' && c != '$' && c != String && c != Qstring && + c != '?' && c != Quest && c != '*' && c != Star && c != '@' && c != '{' && c != Inbrace && c != '=' && c != Equals && c != Hat && c != '^' && c != '~' && c != Tilde && c != '+') { @@ -1446,8 +1446,8 @@ } else spbreak = 2; } else if ((c == '#' || c == Pound) && - (iident(cc = s[1]) - || cc == '*' || cc == Star || cc == '@' + (itype_end(s+1, IIDENT, 0) != s + 1 + || (cc = s[1]) == '*' || cc == Star || cc == '@' || cc == '-' || (cc == ':' && s[2] == '-') || (isstring(cc) && (s[2] == Inbrace || s[2] == Inpar)))) { getlen = 1 + whichlen, s++; @@ -1471,7 +1471,7 @@ * Try to handle this when parameter is named * by (P) (second part of test). */ - if (iident(s[1]) || (aspar && isstring(s[1]) && + if (itype_end(s+1, IIDENT, 0) != s+1 || (aspar && isstring(s[1]) && (s[2] == Inbrace || s[2] == Inpar))) chkset = 1, s++; else if (!inbrace) { Index: Src/utils.c =================================================================== RCS file: /cvsroot/zsh/zsh/Src/utils.c,v retrieving revision 1.126 diff -u -r1.126 utils.c --- Src/utils.c 30 Jun 2006 09:41:35 -0000 1.126 +++ Src/utils.c 10 Jul 2006 12:49:29 -0000 @@ -1921,7 +1921,7 @@ return; if (**s == String && !*t) { guess = *s + 1; - if (*t || !ialpha(*guess)) + if (itype_end(guess, IIDENT, 1) == guess) return; ic = String; d = 100; @@ -2750,11 +2750,8 @@ * iident() macro extended to support wide characters. 
* * The macro is intended to test if a character is allowed in an - * internal zsh identifier. Until the main shell handles multibyte - * characters it's not a good idea to allow characters other than - * ASCII characters; it would cause zle to allow characters that - * the main shell would reject. Eventually we should be able - * to allow all alphanumerics. + * internal zsh identifier. We allow all alphanumerics outside + * the ASCII range unless POSIXIDENTIFIERS is set. * * Otherwise similar to wcsiword. */ @@ -2774,14 +2771,90 @@ } else if (len == 1 && iascii(*outstr)) { return iident(*outstr); } else { - /* TODO: not currently allowed, see above */ - return 0; + return !isset(POSIXIDENTIFIERS) && iswalnum(c); } } /**/ #endif +/* + * Find the end of a set of characters in the set specified by itype; + * one of IALNUM, IIDENT, IWORD or IUSER. For non-ASCII characters, we assume + * alphanumerics are part of the set, with the exception that + * identifiers are not treated that way if POSIXIDENTIFIERS is set. + * + * See notes above for identifiers. + * Returns the same pointer as passed if not on an identifier character. + * If "once" is set, just test the first character, i.e. (outptr != + * inptr) tests whether the first character is valid in an identifier. + * + * Currently this is only called with itype IIDENT or IUSER. + */ + +/**/ +mod_export char * +itype_end(const char *ptr, int itype, int once) +{ +#ifdef MULTIBYTE_SUPPORT + if (isset(MULTIBYTE) && + (itype != IIDENT || !isset(POSIXIDENTIFIERS))) { + mb_metacharinit(); + while (*ptr) { + wint_t wc; + int len = mb_metacharlenconv(ptr, &wc); + + if (!len) + break; + + if (wc == WEOF) { + /* invalid, treat as single character */ + int chr = STOUC(*ptr == Meta ? 
ptr[1] ^ 32 : *ptr); + /* in this case non-ASCII characters can't match */ + if (chr > 127 || !zistype(chr,itype)) + break; + } else if (len == 1 && iascii(*ptr)) { + /* ASCII: can't be metafied, use standard test */ + if (!zistype(*ptr,itype)) + break; + } else { + /* + * Valid non-ASCII character. Allow all alphanumerics; + * if testing for words, allow all wordchars. + */ + if (!(iswalnum(wc) || + (itype == IWORD && wcschr(wordchars_wide, wc)))) + break; + } + ptr += len; + + if (once) + break; + } + } else +#endif + for (;;) { + int chr = STOUC(*ptr == Meta ? ptr[1] ^ 32 : *ptr); + if (!zistype(chr,itype)) + break; + ptr += (*ptr == Meta) ? 2 : 1; + + if (once) + break; + } + + /* + * Nasty. The first argument is const char * because we + * don't modify it here. However, we really want to pass + * back the same type as was passed down, to allow idioms like + * p = itype_end(p, IIDENT, 0); + * So returning a const char * isn't really the right thing to do. + * Without having two different functions the following seems + * to be the best we can do. 
+ */ + return (char *)ptr; +} + /**/ mod_export char ** arrdup(char **s) @@ -3710,9 +3783,10 @@ /**/ int -mb_metacharlenconv(char *s, wint_t *wcp) +mb_metacharlenconv(const char *s, wint_t *wcp) { - char inchar, *ptr; + char inchar; + const char *ptr; size_t ret; wchar_t wc; Index: Src/zsh.h =================================================================== RCS file: /cvsroot/zsh/zsh/Src/zsh.h,v retrieving revision 1.92 diff -u -r1.92 zsh.h --- Src/zsh.h 9 Jul 2006 14:47:22 -0000 1.92 +++ Src/zsh.h 10 Jul 2006 12:49:30 -0000 @@ -1610,6 +1610,7 @@ OVERSTRIKE, PATHDIRS, POSIXBUILTINS, + POSIXIDENTIFIERS, PRINTEIGHTBIT, PRINTEXITVALUE, PRIVILEGED, Index: Src/ztype.h =================================================================== RCS file: /cvsroot/zsh/zsh/Src/ztype.h,v retrieving revision 1.3 diff -u -r1.3 ztype.h --- Src/ztype.h 1 Nov 2005 02:50:22 -0000 1.3 +++ Src/ztype.h 10 Jul 2006 12:49:30 -0000 @@ -42,22 +42,22 @@ #define IMETA (1 << 12) #define IWSEP (1 << 13) #define INULL (1 << 14) -#define _icom(X,Y) (typtab[STOUC(X)] & Y) -#define idigit(X) _icom(X,IDIGIT) -#define ialnum(X) _icom(X,IALNUM) -#define iblank(X) _icom(X,IBLANK) /* blank, not including \n */ -#define inblank(X) _icom(X,INBLANK) /* blank or \n */ -#define itok(X) _icom(X,ITOK) -#define isep(X) _icom(X,ISEP) -#define ialpha(X) _icom(X,IALPHA) -#define iident(X) _icom(X,IIDENT) -#define iuser(X) _icom(X,IUSER) /* username char */ -#define icntrl(X) _icom(X,ICNTRL) -#define iword(X) _icom(X,IWORD) -#define ispecial(X) _icom(X,ISPECIAL) -#define imeta(X) _icom(X,IMETA) -#define iwsep(X) _icom(X,IWSEP) -#define inull(X) _icom(X,INULL) +#define zistype(X,Y) (typtab[STOUC(X)] & Y) +#define idigit(X) zistype(X,IDIGIT) +#define ialnum(X) zistype(X,IALNUM) +#define iblank(X) zistype(X,IBLANK) /* blank, not including \n */ +#define inblank(X) zistype(X,INBLANK) /* blank or \n */ +#define itok(X) zistype(X,ITOK) +#define isep(X) zistype(X,ISEP) +#define ialpha(X) zistype(X,IALPHA) +#define iident(X) 
zistype(X,IIDENT) +#define iuser(X) zistype(X,IUSER) /* username char */ +#define icntrl(X) zistype(X,ICNTRL) +#define iword(X) zistype(X,IWORD) +#define ispecial(X) zistype(X,ISPECIAL) +#define imeta(X) zistype(X,IMETA) +#define iwsep(X) zistype(X,IWSEP) +#define inull(X) zistype(X,INULL) #define iascii(X) isascii(STOUC(X)) #define ilower(X) islower(STOUC(X)) Index: Src/Zle/compcore.c =================================================================== RCS file: /cvsroot/zsh/zsh/Src/Zle/compcore.c,v retrieving revision 1.83 diff -u -r1.83 compcore.c --- Src/Zle/compcore.c 7 Mar 2006 12:52:28 -0000 1.83 +++ Src/Zle/compcore.c 10 Jul 2006 12:49:31 -0000 @@ -1081,7 +1081,7 @@ } if ((*p == String || *p == Qstring) && p[1] != Inpar && p[1] != Inbrack) { /* This is really a parameter expression (not $(...) or $[...]). */ - char *b = p + 1, *e = b; + char *b = p + 1, *e = b, *ie; int n = 0, br = 1, nest = 0; if (*b == Inbrace) { @@ -1124,10 +1124,16 @@ else if (idigit(*e)) while (idigit(*e)) e++; - else if (iident(*e)) - while (iident(*e) || - (comppatmatch && *comppatmatch && (*e == Star || *e == Quest))) - e++; + else if ((ie = itype_end(e, IIDENT, 0)) != e) { + do { + e = ie; + if (comppatmatch && *comppatmatch && + (*e == Star || *e == Quest)) + ie = e + 1; + else + ie = itype_end(e, IIDENT, 0); + } while (ie != e); + } /* Now make sure that the cursor is inside the name. */ if (offs <= e - s && offs >= b - s && n <= 0) { Index: Src/Zle/zle_tricky.c =================================================================== RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_tricky.c,v retrieving revision 1.67 diff -u -r1.67 zle_tricky.c --- Src/Zle/zle_tricky.c 30 May 2006 22:35:04 -0000 1.67 +++ Src/Zle/zle_tricky.c 10 Jul 2006 12:49:32 -0000 @@ -551,9 +551,8 @@ else if (idigit(*e)) while (idigit(*e)) e++; - else if (iident(*e)) - while (iident(*e)) - e++; + else + e = itype_end(e, IIDENT, 0); /* Now make sure that the cursor is inside the name. 
*/ if (offs <= e - s && offs >= b - s && n <= 0) { @@ -740,8 +739,7 @@ else if (idigit(*q)) do q++; while (idigit(*q)); else - while (iident(*q)) - q++; + q = itype_end(q, IIDENT, 0); sav = *q; *q = '\0'; if (zlemetacs - wb == q - s && @@ -1293,7 +1291,7 @@ if (varq) tt = clwords[clwpos]; - for (s = tt; iident(*s); s++); + s = itype_end(tt, IIDENT, 0); sav = *s; *s = '\0'; zsfree(varname); @@ -1360,17 +1358,29 @@ * as being in math. */ if (inwhat != IN_MATH) { int i = 0; - char *nnb = (iident(*s) ? s : s + 1), *nb = NULL, *ne = NULL; - - for (tt = s; ++tt < s + zlemetacs - wb;) + char *nnb, *nb = NULL, *ne = NULL; + + MB_METACHARINIT(); + if (itype_end(s, IIDENT, 1) == s) + nnb = s + MB_METACHARLEN(s); + else + nnb = s; + for (tt = s; tt < s + zlemetacs - wb;) { if (*tt == Inbrack) { i++; nb = nnb; ne = tt; - } else if (i && *tt == Outbrack) + tt++; + } else if (i && *tt == Outbrack) { i--; - else if (!iident(*tt)) - nnb = tt + 1; + tt++; + } else { + int nclen = MB_METACHARLEN(tt); + if (itype_end(tt, IIDENT, 1) == tt) + nnb = tt + nclen; + tt += nclen; + } + } if (i) { inwhat = IN_MATH; insubscr = 1; @@ -1415,33 +1425,59 @@ /* In mathematical expression, we complete parameter names * * (even if they don't have a `$' in front of them). So we * * have to find that name. */ - for (we = zlemetacs; iident(zlemetaline[we]); we++); - for (wb = zlemetacs; --wb >= 0 && iident(zlemetaline[wb]);); - wb++; + char *cspos = zlemetaline + zlemetacs, *wptr, *cptr; + we = itype_end(cspos, IIDENT, 0) - cspos; + + /* + * With multibyte characters we need to go forwards, + * so start at the beginning of the line and continue + * until cspos. 
+ */ + wptr = cptr = zlemetaline; + for (;;) { + cptr = itype_end(wptr, IIDENT, 0); + if (cptr == wptr) { + /* not an ident character */ + wptr = (cptr += MB_METACHARLEN(cptr)); + } + if (cptr >= cspos) { + wb = wptr - zlemetaline; + break; + } + } } zsfree(s); s = zalloc(we - wb + 1); strncpy(s, zlemetaline + wb, we - wb); s[we - wb] = '\0'; - if (wb > 2 && zlemetaline[wb - 1] == '[' && - iident(zlemetaline[wb - 2])) { - int i = wb - 3; - char sav = zlemetaline[wb - 1]; - while (i >= 0 && iident(zlemetaline[i])) - i--; + if (wb > 2 && zlemetaline[wb - 1] == '[') { + char *sqbr = zlemetaline + wb - 1, *cptr, *wptr; - zlemetaline[wb - 1] = '\0'; - zsfree(varname); - varname = ztrdup(zlemetaline + i + 1); - zlemetaline[wb - 1] = sav; - if ((keypm = (Param) paramtab->getnode(paramtab, varname)) && - (keypm->node.flags & PM_HASHED)) { - if (insubscr != 3) - insubscr = 2; - } else - insubscr = 1; + /* Need to search forward for word characters */ + cptr = wptr = zlemetaline; + for (;;) { + cptr = itype_end(wptr, IIDENT, 0); + if (cptr == wptr) { + /* not an ident character */ + wptr = (cptr += MB_METACHARLEN(cptr)); + } + if (cptr >= sqbr) + break; + } + + if (wptr < sqbr) { + zsfree(varname); + varname = ztrduppfx(wptr, sqbr - wptr); + if ((keypm = (Param) paramtab->getnode(paramtab, varname)) && + (keypm->node.flags & PM_HASHED)) { + if (insubscr != 3) + insubscr = 2; + } else + insubscr = 1; + } } + parse_subst_string(s); } /* This variable will hold the current word in quoted form. 
*/ @@ -1562,12 +1598,12 @@ *tp == '@') p++, i++; else { + char *ie; if (idigit(*tp)) while (idigit(*tp)) tp++; - else if (iident(*tp)) - while (iident(*tp)) - tp++; + else if ((ie = itype_end(tp, IIDENT, 0)) != tp) + tp = ie; else { tt = NULL; break; Index: Test/D07multibyte.ztst =================================================================== RCS file: /cvsroot/zsh/zsh/Test/D07multibyte.ztst,v retrieving revision 1.4 diff -u -r1.4 D07multibyte.ztst --- Test/D07multibyte.ztst 30 Jun 2006 09:41:35 -0000 1.4 +++ Test/D07multibyte.ztst 10 Jul 2006 12:49:32 -0000 @@ -165,3 +165,12 @@ >165 >163 >945 945 + + unsetopt posix_identifiers + expr='hähä=3 || exit 1; print $hähä' + eval $expr + setopt posix_identifiers + (eval $expr) +1:POSIX_IDENTIFIERS option +>3 +?(eval):1: command not found: hähä=3 -- Peter Stephenson Software Engineer CSR PLC, Churchill House, Cambridge Business Park, Cowley Road Cambridge, CB4 0WZ, UK Tel: +44 (0)1223 692070