From: "Jun. T" <takimoto-j@kba.biglobe.ne.jp>
To: zsh-workers@zsh.org
Subject: Re: [bug] busyloop upon $=var with NULs when $IFS contains both NUL and a byte > 0x7f
Date: Wed, 30 Nov 2022 23:56:50 +0900
Message-ID: <6581C482-5535-43D6-A784-CF16B5289B80@kba.biglobe.ne.jp>
In-Reply-To: <20221118142717.t4elzrigjeizjm6w@chazelas.org>


> 2022/11/18 23:27, Stephane Chazelas <stephane@chazelas.org> wrote:
> 
> With +o multibyte, no busy loop, but splitting doesn't work properly:
> 
> $ LC_ALL=C zsh +o multibyte -c 'IFS=é$IFS; printf "<%q>\n" $=IFS'
> <$'\303'$'\251'>
> <''>

It seems this can be fixed by the following patch, which uses the
multibyte code only if the MULTIBYTE option is on.

With the patch, the test script above gives
<''>
<''>
<''>
<''>

I guess this is the expected result (the description of IFS in the
zshparam(1) man page is not easy to understand).
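
A toy model may help in seeing why four empty fields are plausible. The
sketch below is not the code in Src/utils.c; count_fields() and split_demo()
are hypothetical names, and the splitting rule it implements (each
non-whitespace separator delimits a field on its own, while IFS whitespace
next to it is absorbed into the same delimiter) is an assumed reading of
zshparam(1). It splits the six bytes of the test IFS by themselves, once
with the bytes above 0x7f registered as separators and once with them
skipped, as the old !isascii() check did.

```c
#include <assert.h>
#include <stddef.h>

/* IFS "whitespace" in this toy model: space, tab, newline. */
static int is_ifs_space(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Count the fields of s[0..len) given a 256-entry separator table.
 * Simplified model only; zsh's real splitter handles more cases. */
size_t count_fields(const unsigned char *s, size_t len,
                    const unsigned char sep[256])
{
    size_t i = 0, n = 0;

    assert(s != NULL && sep != NULL);

    /* Leading IFS whitespace never starts a field. */
    while (i < len && sep[s[i]] && is_ifs_space(s[i]))
        i++;
    if (i == len)
        return 0;

    for (;;) {
        int saw_nonspace_sep = 0;

        /* Field body: everything up to the next separator. */
        while (i < len && !sep[s[i]])
            i++;
        n++;
        if (i == len)
            break;

        /* One delimiter: whitespace, at most one other separator, whitespace. */
        while (i < len && sep[s[i]] && is_ifs_space(s[i]))
            i++;
        if (i < len && sep[s[i]] && !is_ifs_space(s[i])) {
            i++;
            saw_nonspace_sep = 1;
        }
        while (i < len && sep[s[i]] && is_ifs_space(s[i]))
            i++;

        /* Trailing whitespace alone does not create a final empty field. */
        if (i == len && !saw_nonspace_sep)
            break;
    }
    return n;
}

/* Split the test IFS (the two bytes of UTF-8 "é", then space, tab,
 * newline, NUL) by itself. */
size_t split_demo(int high_bytes_are_separators)
{
    static const unsigned char ifs[] = { 0303, 0251, ' ', '\t', '\n', '\0' };
    unsigned char sep[256] = { 0 };
    size_t i;

    for (i = 0; i < sizeof ifs; i++) {
        /* Mirrors the old behaviour: non-ASCII bytes were skipped. */
        if (ifs[i] >= 0x80 && !high_bytes_are_separators)
            continue;
        sep[ifs[i]] = 1;
    }
    return count_fields(ifs, sizeof ifs, sep);
}
```

In this model split_demo(1) yields 4 fields and split_demo(0) yields 2,
which matches the field counts of the patched and unpatched outputs quoted
above.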

If this works OK, then I think we can forcibly reset IFS when an invalid
character is found in it while the multibyte option is on, because
a user who wants (in the C locale) to include arbitrary bytes in IFS
can unset the multibyte option.
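
One possible shape for such a check, as a sketch only: ifs_is_valid_mb()
and choose_ifs() are hypothetical helpers, not functions in zsh, and the
sketch ignores zsh's metafied string representation. It uses the standard
mbrtowc() to decide whether a byte string is a sequence of complete, valid
multibyte characters in the current locale, and falls back to a default
IFS only when the multibyte flag is set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Return 1 if s[0..len) consists only of complete, valid multibyte
 * characters in the current locale, 0 otherwise. */
int ifs_is_valid_mb(const char *s, size_t len)
{
    mbstate_t st;

    assert(s != NULL);
    memset(&st, 0, sizeof st);

    while (len > 0) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, s, len, &st);

        if (r == (size_t)-1 || r == (size_t)-2)
            return 0;           /* invalid or truncated sequence */
        if (r == 0)
            r = 1;              /* an embedded NUL occupies one byte */
        s += r;
        len -= r;
    }
    return 1;
}

/* Assumed default for illustration: space, tab, newline, NUL. */
static const char default_ifs[] = { ' ', '\t', '\n', '\0' };

/* Pick the IFS to use: keep the user's value unless multibyte mode is on
 * and the value does not decode cleanly. */
const char *choose_ifs(const char *ifs, size_t len, int multibyte,
                       size_t *outlen)
{
    if (multibyte && !ifs_is_valid_mb(ifs, len)) {
        *outlen = sizeof default_ifs;
        return default_ifs;
    }
    *outlen = len;
    return ifs;
}
```

Whether a given byte above 0x7f counts as invalid depends on the locale and
on the C library, so the result of ifs_is_valid_mb() for such bytes is
deliberately not stated here.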



diff --git a/Src/utils.c b/Src/utils.c
index edf5d3df7..a182553e7 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -74,9 +74,6 @@ set_widearray(char *mb_array, Widechar_array wca)
     }
     wca->len = 0;
 
-    if (!isset(MULTIBYTE))
-	return;
-
     if (mb_array) {
 	VARARR(wchar_t, tmpwcs, strlen(mb_array));
 	wchar_t *wcptr = tmpwcs;
@@ -4118,8 +4115,9 @@ inittyptab(void)
      * having IIDENT here is a good idea at all, but this code
      * should disappear into history...
      */
-    for (t0 = 0240; t0 != 0400; t0++)
-	typtab[t0] = IALPHA | IALNUM | IIDENT | IUSER | IWORD;
+    if (isset(MULTIBYTE))
+	for (t0 = 0240; t0 != 0400; t0++)
+	    typtab[t0] = IALPHA | IALNUM | IIDENT | IUSER | IWORD;
 #endif
     /* typtab['.'] |= IIDENT; */ /* Allow '.' in variable names - broken */
     typtab['_'] = IIDENT | IUSER;
@@ -4138,7 +4136,7 @@ inittyptab(void)
 	DEFAULT_IFS_SH : DEFAULT_IFS; *s; s++) {
 	int c = STOUC(*s == Meta ? *++s ^ 32 : *s);
 #ifdef MULTIBYTE_SUPPORT
-	if (!isascii(c)) {
+	if (isset(MULTIBYTE) && !isascii(c)) {
 	    /* see comment for wordchars below */
 	    continue;
 	}
@@ -4154,7 +4152,7 @@ inittyptab(void)
     for (s = wordchars ? wordchars : DEFAULT_WORDCHARS; *s; s++) {
 	int c = STOUC(*s == Meta ? *++s ^ 32 : *s);
 #ifdef MULTIBYTE_SUPPORT
-	if (!isascii(c)) {
+	if (isset(MULTIBYTE) && !isascii(c)) {
 	    /*
 	     * If we have support for multibyte characters, we don't
 	     * handle non-ASCII characters here; instead, we turn
@@ -4168,9 +4166,11 @@ inittyptab(void)
 	typtab[c] |= IWORD;
     }
 #ifdef MULTIBYTE_SUPPORT
-    set_widearray(wordchars, &wordchars_wide);
-    set_widearray(ifs ? ifs : EMULATION(EMULATE_KSH|EMULATE_SH) ?
-	DEFAULT_IFS_SH : DEFAULT_IFS, &ifs_wide);
+    if (isset(MULTIBYTE)) {
+	set_widearray(wordchars, &wordchars_wide);
+	set_widearray(ifs ? ifs : EMULATION(EMULATE_KSH|EMULATE_SH) ?
+	    DEFAULT_IFS_SH : DEFAULT_IFS, &ifs_wide);
+    }
 #endif
     for (s = SPECCHARS; *s; s++)
 	typtab[STOUC(*s)] |= ISPECIAL;



