Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0
Subject: Re: [bug] busyloop upon $=var with NULs when $IFS contains both NUL and a byte > 0x7f
From: Jun T
In-Reply-To: <946835209.7082170.1670932555405@mail.virginmedia.com>
Date: Wed, 21 Jun 2023 13:49:54 +0900
References: <20221118142717.t4elzrigjeizjm6w@chazelas.org> <351204342.6213761.1669732685914@mail.virginmedia.com> <98F2CEB4-691A-4DA3-9B41-5341EA3E8B9B@kba.biglobe.ne.jp> <985975587.7151691.1670926401043@mail.virginmedia.com> <7910F067-9694-432B-9890-2BA25692C2C9@kba.biglobe.ne.jp>
 <946835209.7082170.1670932555405@mail.virginmedia.com>
To: zsh-workers@zsh.org
X-Seq: 51884

Sorry, I've forgotten to finish this.

The patch below is the same as 51205 (Dec. 13, 2022), except:
- zwarn() is added when resetting IFS.
- a test is added to confirm that IFS can contain any bytes if
  the multibyte option is off.


diff --git a/Doc/Zsh/params.yo b/Doc/Zsh/params.yo
index 57d10b8bd..e0410d673 100644
--- a/Doc/Zsh/params.yo
+++ b/Doc/Zsh/params.yo
@@ -1325,15 +1325,18 @@ Internal field separators (by default space, tab, newline and NUL), that
 are used to separate words which result from
 command or parameter expansion and words read by
 the tt(read) builtin.  Any characters from the set space, tab and
-newline that appear in the IFS are called em(IFS white space).
+newline that appear in the tt(IFS) are called em(IFS white space).
 One or more IFS white space characters or one non-IFS white space
 character together with any adjacent IFS white space character delimit
 a field.  If an IFS white space character appears twice consecutively
-in the IFS, this character is treated as if it were not an IFS white
+in the tt(IFS), this character is treated as if it were not an IFS white
 space character.
 
 If the parameter is unset, the default is used.  Note this has
 a different effect from setting the parameter to an empty string.
+
+If the tt(MULTIBYTE) option is on and tt(IFS) contains invalid characters in
+the current locale, it is reset to the default.
 )
 vindex(KEYBOARD_HACK)
 item(tt(KEYBOARD_HACK))(
diff --git a/Src/params.c b/Src/params.c
index 021d341e8..e25c1286c 100644
--- a/Src/params.c
+++ b/Src/params.c
@@ -4723,6 +4723,7 @@ setlang(char *x)
 	if ((x = getsparam_u(ln->name)) && *x)
 	    setlocale(ln->category, x);
     unqueue_signals();
+    inittyptab();
 }
 
 /**/
@@ -4746,6 +4747,7 @@ lc_allsetfn(Param pm, char *x)
     else {
 	setlocale(LC_ALL, unmeta(x));
 	clear_mbstate();
+	inittyptab();
     }
 }
 
@@ -4784,6 +4786,7 @@ lcsetfn(Param pm, char *x)
     }
     unqueue_signals();
     clear_mbstate();	/* LC_CTYPE may have changed */
+    inittyptab();
 }
 #endif /* USE_LOCALE */
 
diff --git a/Src/utils.c b/Src/utils.c
index f13e3a79d..58b4f7149 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -74,9 +74,6 @@ set_widearray(char *mb_array, Widechar_array wca)
     }
     wca->len = 0;
 
-    if (!isset(MULTIBYTE))
-	return;
-
     if (mb_array) {
 	VARARR(wchar_t, tmpwcs, strlen(mb_array));
 	wchar_t *wcptr = tmpwcs;
@@ -87,8 +84,7 @@ set_widearray(char *mb_array, Widechar_array wca)
 	    int mblen;
 
 	    if ((unsigned char) *mb_array <= 0x7f) {
-		mb_array++;
-		*wcptr++ = (wchar_t)*mb_array;
+		*wcptr++ = (wchar_t)*mb_array++;
 		continue;
 	    }
 
@@ -4121,8 +4117,9 @@ inittyptab(void)
      * having IIDENT here is a good idea at all, but this code
      * should disappear into history...
      */
-    for (t0 = 0240; t0 != 0400; t0++)
-	typtab[t0] = IALPHA | IALNUM | IIDENT | IUSER | IWORD;
+    if (isset(MULTIBYTE))
+	for (t0 = 0240; t0 != 0400; t0++)
+	    typtab[t0] = IALPHA | IALNUM | IIDENT | IUSER | IWORD;
 #endif
 /*    typtab['.'] |= IIDENT; */ /* Allow '.' in variable names - broken */
     typtab['_'] = IIDENT | IUSER;
@@ -4137,11 +4134,24 @@ inittyptab(void)
 	typtab[t0] |= ITOK | IMETA;
     for (t0 = (int) (unsigned char) Snull; t0 <= (int) (unsigned char) Nularg; t0++)
 	typtab[t0] |= ITOK | IMETA | INULL;
-    for (s = ifs ? ifs : EMULATION(EMULATE_KSH|EMULATE_SH) ?
-	DEFAULT_IFS_SH : DEFAULT_IFS; *s; s++) {
+    /* ifs */
+#define CURRENT_DEFAULT_IFS (EMULATION(EMULATE_KSH|EMULATE_SH) ? \
+			    DEFAULT_IFS_SH : DEFAULT_IFS)
+#ifdef MULTIBYTE_SUPPORT
+    if (isset(MULTIBYTE)) {
+	set_widearray(ifs ? ifs : CURRENT_DEFAULT_IFS, &ifs_wide);
+	if (ifs && !ifs_wide.chars) {
+	    zwarn("IFS has an invalid character; resetting IFS to default");
+	    zsfree(ifs);
+	    ifs = ztrdup(CURRENT_DEFAULT_IFS);
+	    set_widearray(ifs, &ifs_wide);
+	}
+    }
+#endif
+    for (s = ifs ? ifs : CURRENT_DEFAULT_IFS; *s; s++) {
 	int c = (unsigned char) (*s == Meta ? *++s ^ 32 : *s);
 #ifdef MULTIBYTE_SUPPORT
-	if (!isascii(c)) {
+	if (isset(MULTIBYTE) && !isascii(c)) {
 	    /* see comment for wordchars below */
 	    continue;
 	}
@@ -4154,10 +4164,15 @@ inittyptab(void)
 	}
 	typtab[c] |= ISEP;
     }
+    /* wordchars */
+#ifdef MULTIBYTE_SUPPORT
+    if (isset(MULTIBYTE))
+	set_widearray(wordchars, &wordchars_wide);
+#endif
     for (s = wordchars ? wordchars : DEFAULT_WORDCHARS; *s; s++) {
 	int c = (unsigned char) (*s == Meta ? *++s ^ 32 : *s);
 #ifdef MULTIBYTE_SUPPORT
-	if (!isascii(c)) {
+	if (isset(MULTIBYTE) && !isascii(c)) {
 	    /*
 	     * If we have support for multibyte characters, we don't
 	     * handle non-ASCII characters here; instead, we turn
@@ -4170,11 +4185,6 @@ inittyptab(void)
 #endif
 	typtab[c] |= IWORD;
     }
-#ifdef MULTIBYTE_SUPPORT
-    set_widearray(wordchars, &wordchars_wide);
-    set_widearray(ifs ? ifs : EMULATION(EMULATE_KSH|EMULATE_SH) ?
-	DEFAULT_IFS_SH : DEFAULT_IFS, &ifs_wide);
-#endif
     for (s = SPECCHARS; *s; s++)
 	typtab[(unsigned char) *s] |= ISPECIAL;
     if (typtab_flags & ZTF_SP_COMMA)
diff --git a/Test/D04parameter.ztst b/Test/D04parameter.ztst
index 2fd2f975f..0d44558a7 100644
--- a/Test/D04parameter.ztst
+++ b/Test/D04parameter.ztst
@@ -2280,6 +2280,27 @@ F:We do not care what $OLDPWD is, as long as it does not cause an error
 F:As of this writing, var=$@ and var="$@" with null IFS have unspecified
 F:behavior, see http://austingroupbugs.net/view.php?id=888
 
+  (
+    IFS=$'\x80'
+    if [[ $IFS = $' \t\n\0' ]]; then
+      echo OK     # if $'\x80' is illegal (e.g. Linux)
+    else          # otherwise (e.g. macOS), it should work as a separator
+      s=$'foo\x80\bar'
+      [[ ${${=s}[1]} = foo ]] && echo OK
+    fi
+  )
+0D:reset IFS to default if it contains illegal character
+>OK
+
+  (
+    unsetopt multibyte
+    IFS=$'\xc3\xa9'
+    s=$'foo\xc3bar\xa9boo'
+    echo ${${=s}[2]}
+  )
+0:eight bit chars in IFS should work if multibyte option is off
+>bar
+
   () {
     setopt localoptions extendedglob
     [[ $- = [[:alnum:]]## ]] || print Failed 1