From: Jun T
To: zsh-workers@zsh.org
Subject: Re: [bug] busyloop upon $=var with NULs when $IFS contains both NUL and a byte > 0x7f
Date: Tue, 13 Dec 2022 18:51:30 +0900
References: <20221118142717.t4elzrigjeizjm6w@chazelas.org>
Message-Id: <1D770986-46C3-4A8C-A66A-5DA661AC5C27@kba.biglobe.ne.jp>
X-Seq: 51205

> 2022/11/29 23:27, Jun. T wrote:
>
> So the basic question is:
> What should we do if IFS contains invalid character(s)?
>
> I think, at least if MULTIBYTE option is ON, it would be better to
> force reset IFS to the default, rather than leaving ifs_wide empty.

So far this is the only simple solution I can think of.
With the patch below, if the MULTIBYTE option is on and IFS contains
invalid characters, IFS is reset to the default.
Is this OK? Should we issue a warning when resetting IFS?

This patch also includes the patch in workers/51087 (which fixes the
behavior when the MULTIBYTE option is off).

When LC_CTYPE changes (directly or via LC_ALL or LANG), a character
that was valid may become invalid in the new locale, so I added a call
to inittyptab() in setlang(), lc_allsetfn() and lcsetfn().

A simple test is included. On macOS, in the C locale, every byte is a
valid character, so the test does not reset IFS there.

 Doc/Zsh/params.yo      |  7 +++++--
 Src/params.c           |  3 +++
 Src/utils.c            | 42 ++++++++++++++++++++++++++----------------
 Test/D04parameter.ztst | 12 ++++++++++++
 4 files changed, 46 insertions(+), 18 deletions(-)

diff --git a/Doc/Zsh/params.yo b/Doc/Zsh/params.yo
index 2a30085a8..91201616a 100644
--- a/Doc/Zsh/params.yo
+++ b/Doc/Zsh/params.yo
@@ -1251,15 +1251,18 @@ Internal field separators (by default space, tab, newline and NUL), that
 are used to separate words which result from
 command or parameter expansion and words read by
 the tt(read) builtin. Any characters from the set space, tab and
-newline that appear in the IFS are called em(IFS white space).
+newline that appear in the tt(IFS) are called em(IFS white space).
 One or more IFS white space characters or one non-IFS white space
 character together with any adjacent IFS white space character delimit
 a field. If an IFS white space character appears twice consecutively
-in the IFS, this character is treated as if it were not an IFS white
+in the tt(IFS), this character is treated as if it were not an IFS white
 space character.

 If the parameter is unset, the default is used. Note this has
 a different effect from setting the parameter to an empty string.
+
+If the tt(MULTIBYTE) option is on and tt(IFS) contains characters that are
+invalid in the current locale, it is reset to the default.
 )
 vindex(KEYBOARD_HACK)
 item(tt(KEYBOARD_HACK))(
diff --git a/Src/params.c b/Src/params.c
index f1fe38955..81f0e5015 100644
--- a/Src/params.c
+++ b/Src/params.c
@@ -4639,6 +4639,7 @@ setlang(char *x)
         if ((x = getsparam_u(ln->name)) && *x)
             setlocale(ln->category, x);
     unqueue_signals();
+    inittyptab();
 }

 /**/
@@ -4662,6 +4663,7 @@ lc_allsetfn(Param pm, char *x)
     else {
         setlocale(LC_ALL, unmeta(x));
         clear_mbstate();
+        inittyptab();
     }
 }

@@ -4700,6 +4702,7 @@ lcsetfn(Param pm, char *x)
     }
     unqueue_signals();
     clear_mbstate();    /* LC_CTYPE may have changed */
+    inittyptab();
 }
 #endif /* USE_LOCALE */

diff --git a/Src/utils.c b/Src/utils.c
index edf5d3df7..a874851cc 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -74,9 +74,6 @@ set_widearray(char *mb_array, Widechar_array wca)
     }
     wca->len = 0;

-    if (!isset(MULTIBYTE))
-        return;
-
     if (mb_array) {
         VARARR(wchar_t, tmpwcs, strlen(mb_array));
         wchar_t *wcptr = tmpwcs;
@@ -87,8 +84,7 @@ set_widearray(char *mb_array, Widechar_array wca)
             int mblen;

             if (STOUC(*mb_array) <= 0x7f) {
-                mb_array++;
-                *wcptr++ = (wchar_t)*mb_array;
+                *wcptr++ = (wchar_t)*mb_array++;
                 continue;
             }

@@ -4118,8 +4114,9 @@ inittyptab(void)
      * having IIDENT here is a good idea at all, but this code
      * should disappear into history...
      */
-    for (t0 = 0240; t0 != 0400; t0++)
-        typtab[t0] = IALPHA | IALNUM | IIDENT | IUSER | IWORD;
+    if (isset(MULTIBYTE))
+        for (t0 = 0240; t0 != 0400; t0++)
+            typtab[t0] = IALPHA | IALNUM | IIDENT | IUSER | IWORD;
 #endif
     /* typtab['.'] |= IIDENT; */ /* Allow '.' in variable names - broken */
     typtab['_'] = IIDENT | IUSER;
@@ -4134,11 +4131,24 @@ inittyptab(void)
         typtab[t0] |= ITOK | IMETA;
     for (t0 = (int)STOUC(Snull); t0 <= (int)STOUC(Nularg); t0++)
         typtab[t0] |= ITOK | IMETA | INULL;
-    for (s = ifs ? ifs : EMULATION(EMULATE_KSH|EMULATE_SH) ?
-        DEFAULT_IFS_SH : DEFAULT_IFS; *s; s++) {
+    /* ifs */
+#define CURRENT_DEFAULT_IFS (EMULATION(EMULATE_KSH|EMULATE_SH) ? \
+                             DEFAULT_IFS_SH : DEFAULT_IFS)
+#ifdef MULTIBYTE_SUPPORT
+    if (isset(MULTIBYTE)) {
+        set_widearray(ifs ? ifs : CURRENT_DEFAULT_IFS, &ifs_wide);
+        if (ifs && !ifs_wide.chars) {
+            /* IFS has invalid character(s). Reset it to default */
+            zsfree(ifs);
+            ifs = ztrdup(CURRENT_DEFAULT_IFS);
+            set_widearray(ifs, &ifs_wide);
+        }
+    }
+#endif
+    for (s = ifs ? ifs : CURRENT_DEFAULT_IFS; *s; s++) {
         int c = STOUC(*s == Meta ? *++s ^ 32 : *s);
 #ifdef MULTIBYTE_SUPPORT
-        if (!isascii(c)) {
+        if (isset(MULTIBYTE) && !isascii(c)) {
             /* see comment for wordchars below */
             continue;
         }
@@ -4151,10 +4161,15 @@ inittyptab(void)
         }
         typtab[c] |= ISEP;
     }
+    /* wordchars */
+#ifdef MULTIBYTE_SUPPORT
+    if (isset(MULTIBYTE))
+        set_widearray(wordchars, &wordchars_wide);
+#endif
     for (s = wordchars ? wordchars : DEFAULT_WORDCHARS; *s; s++) {
         int c = STOUC(*s == Meta ? *++s ^ 32 : *s);
 #ifdef MULTIBYTE_SUPPORT
-        if (!isascii(c)) {
+        if (isset(MULTIBYTE) && !isascii(c)) {
             /*
              * If we have support for multibyte characters, we don't
              * handle non-ASCII characters here; instead, we turn
@@ -4167,11 +4182,6 @@ inittyptab(void)
 #endif
         typtab[c] |= IWORD;
     }
-#ifdef MULTIBYTE_SUPPORT
-    set_widearray(wordchars, &wordchars_wide);
-    set_widearray(ifs ? ifs : EMULATION(EMULATE_KSH|EMULATE_SH) ?
-        DEFAULT_IFS_SH : DEFAULT_IFS, &ifs_wide);
-#endif
     for (s = SPECCHARS; *s; s++)
         typtab[STOUC(*s)] |= ISPECIAL;
     if (typtab_flags & ZTF_SP_COMMA)
diff --git a/Test/D04parameter.ztst b/Test/D04parameter.ztst
index 6bf55b4db..d9f81f66d 100644
--- a/Test/D04parameter.ztst
+++ b/Test/D04parameter.ztst
@@ -2275,6 +2275,18 @@ F:We do not care what $OLDPWD is, as long as it does not cause an error
 F:As of this writing, var=$@ and var="$@" with null IFS have unspecified
 F:behavior, see http://austingroupbugs.net/view.php?id=888

+  (
+    IFS=$'\x80'
+    if [[ $IFS = $' \t\n\0' ]]; then
+      echo OK     # if $'\x80' is illegal
+    else          # otherwise, it should work as a separator
+      s=$'foo\x80bar'
+      [[ ${${=s}[1]} = foo ]] && echo OK
+    fi
+  )
+0:reset IFS to default if it contains illegal character
+>OK
+
   () {
     setopt localoptions extendedglob
     [[ $- = [[:alnum:]]## ]] || print Failed 1
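A note for reviewers who have not looked at the locale side of this: the patch detects an invalid IFS by checking whether set_widearray() left ifs_wide.chars empty, which amounts to asking whether IFS decodes as a sequence of characters in the current LC_CTYPE. The standalone sketch below illustrates that policy with the standard mbrtowc(); it is not zsh code. valid_in_locale(), checked_ifs() and SKETCH_DEFAULT_IFS are hypothetical names, and the stand-in default " \t\n" omits the NUL that zsh's real default contains.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in default for this sketch only; zsh's real
 * default IFS also contains NUL, which a plain C string cannot hold. */
#define SKETCH_DEFAULT_IFS " \t\n"

/* Return 1 if s decodes as a sequence of characters in the current
 * LC_CTYPE locale, 0 if it contains an invalid or truncated sequence. */
int valid_in_locale(const char *s)
{
    mbstate_t st;
    size_t len;

    assert(s != NULL);
    memset(&st, 0, sizeof st);      /* start in the initial shift state */
    len = strlen(s);
    while (len > 0) {
        /* pwc == NULL: only the byte count of the next character is needed */
        size_t n = mbrtowc(NULL, s, len, &st);
        /* The error returns (size_t)-1 and (size_t)-2 exceed any valid count;
         * n == 0 cannot occur because s[0] is never NUL inside this loop. */
        if (n > len)
            return 0;
        s += n;
        len -= n;
    }
    return 1;
}

/* Hypothetical helper mirroring the patch's policy: keep ifs if it is
 * valid in the current locale, otherwise fall back to the default. */
const char *checked_ifs(const char *ifs)
{
    return valid_in_locale(ifs) ? ifs : SKETCH_DEFAULT_IFS;
}
```

For example, after setlocale(LC_CTYPE, "C.UTF-8") succeeds (where that locale is available), checked_ifs("\x80") returns the default, whereas in a locale where every byte is a valid character, such as the macOS C locale mentioned above, the same string is kept. Because the result depends on the locale, the check has to be repeated after every LC_CTYPE change, which is what the added inittyptab() calls do.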
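The one-line change in set_widearray() is easy to skim past: the old code advanced mb_array before reading it, so for ASCII input each slot received the byte after the current one (and the last ASCII byte became the terminating NUL). A simplified, ASCII-only model of the two loops follows; copy_ascii_old() and copy_ascii_new() are hypothetical and leave out the mbrtowc() path that the real function uses for non-ASCII bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Old order (removed by the patch): advance first, then read,
 * so the stored value is the byte after the current one. */
size_t copy_ascii_old(const char *src, wchar_t *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (*src) {
        src++;
        dst[n++] = (wchar_t)*src;
    }
    return n;
}

/* New order (added by the patch): read the current byte, then advance. */
size_t copy_ascii_new(const char *src, wchar_t *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (*src)
        dst[n++] = (wchar_t)*src++;
    return n;
}
```

For "ab" the old loop stores L'b', L'\0' and the new one L'a', L'b'; both store one wide character per input byte, so only the values differ.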