From: "Jun. T" <takimoto-j@kba.biglobe.ne.jp>
To: zsh-workers@zsh.org
Subject: Re: [bug] busyloop upon $=var with NULs when $IFS contains both NUL and a byte > 0x7f
Date: Tue, 29 Nov 2022 23:27:10 +0900
Message-ID: <AD3B3716-F348-4C93-981F-8B8DD624C37D@kba.biglobe.ne.jp>
In-Reply-To: <20221118142717.t4elzrigjeizjm6w@chazelas.org>
> 2022/11/18 23:27, Stephane Chazelas <stephane@chazelas.org> wrote:
>
> $ LC_ALL=C zsh -c 'IFS=é$IFS; echo $=IFS'
> ^C
>
> (busy loop had to be interrupted with ^C).
This is not simple to solve. The basic question is:
What should we do if IFS contains invalid characters?
When IFS changes, ifssetfn() calls inittyptab(), and it then calls
set_widearray() (at line 4172 in utils.c) to set the structure
ifs_wide. The origin of the problem seems to be in this function
(also in utils.c):
 95         mblen = mb_metacharlenconv(mb_array, &wci);
 ..
 99         /* No good unless all characters are convertible */
100         if (wci == WEOF)
101             return;
mb_array is the current IFS (metafied), and it contains
é = \xc3\xa9. In the C locale (and at least on Linux), \xc3 is
an invalid character, and wci is set to WEOF. Then the function
returns without setting ifs_wide (ifs_wide.chars=NULL and
ifs_wide.len=0).
The comment at line 99 may look reasonable, but leaving ifs_wide
empty is equally 'no good', I think.
Due to this empty ifs_wide, itype_end() (and wcsitype()) do not
work as expected (for characters >= \x80).
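To illustrate the failure mode described above, here is a minimal model of it. This is not the zsh code: model_wide_array, model_set_widearray() and model_is_sep() are hypothetical names, and the rule "any byte above 0x7f is invalid" is only an assumption standing in for mb_metacharlenconv() returning WEOF in the C locale. The model also ignores that zsh consults ifs_wide only for non-ASCII characters.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

#define MODEL_MAX 32

/* Hypothetical stand-in for the ifs_wide structure. */
struct model_wide_array {
    wchar_t chars[MODEL_MAX];
    size_t len;
};

/*
 * All-or-nothing conversion: if any byte is "invalid" (modelled here
 * as > 0x7f, an assumption), give up and leave the array empty.
 */
static void model_set_widearray(const char *mb, struct model_wide_array *wa)
{
    wchar_t tmp[MODEL_MAX];
    size_t n = 0;

    assert(mb != NULL && wa != NULL);
    wa->len = 0;
    for (; *mb && n < MODEL_MAX; mb++) {
        if ((unsigned char)*mb > 0x7f)
            return;             /* "no good": the array stays empty */
        tmp[n++] = (wchar_t)(unsigned char)*mb;
    }
    for (size_t i = 0; i < n; i++)
        wa->chars[i] = tmp[i];
    wa->len = n;
}

/* Membership test; with len == 0 nothing can ever match. */
static int model_is_sep(const struct model_wide_array *wa, wchar_t c)
{
    assert(wa != NULL);
    for (size_t i = 0; i < wa->len; i++)
        if (wa->chars[i] == c)
            return 1;
    return 0;
}
```

Calling model_set_widearray() with " \t\n" stores three separators, while prepending \xc3\xa9 leaves len at 0, so every later model_is_sep() lookup fails.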
The 'busy loop' is in wordcount() (utils.c):
3834     for (; *s; r++) {
3835         char *ie = itype_end(s, ISEP, 1);
3836         if (ie != s) {
3837             s = ie;
....
3840         }
3841         (void)findsep(&s, NULL, 0);
....
3845     }
Here, the pointer s already points to an ISEP (\x83\x20 = metafied NUL),
but itype_end() can't find the next ISEP (ie == s) due to the empty
ifs_wide, and findsep() does not move s because *s is already ISEP,
resulting in an infinite loop with the same s.
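The no-progress condition can be reproduced with a small model of the loop. It is only a sketch: stub_itype_end(), stub_findsep() and model_wordcount() are hypothetical stand-ins, ':' plays the role of the separator, the `broken` flag models the empty ifs_wide, and the iteration cap and the early break exist only so that the model terminates.

```c
#include <assert.h>
#include <stddef.h>

/* Skips a run of separators; when `broken`, it recognises none (ie == s). */
static const char *stub_itype_end(const char *s, int broken)
{
    if (broken)
        return s;
    while (*s == ':')
        s++;
    return s;
}

/* Advances to the next separator; does nothing if already on one. */
static void stub_findsep(const char **s)
{
    while (**s && **s != ':')
        (*s)++;
}

/* Returns the word count, or -1 if the loop stopped making progress. */
static int model_wordcount(const char *s, int broken, int max_iter)
{
    int r = 0, iter = 0;

    assert(s != NULL && max_iter > 0);
    for (; *s; r++) {
        if (++iter > max_iter)
            return -1;          /* same s on every pass: busy loop */
        const char *ie = stub_itype_end(s, broken);
        if (ie != s) {
            s = ie;
            if (!*s)
                break;          /* only trailing separators were left */
        }
        stub_findsep(&s);
    }
    return r;
}
```

With broken set to 0 the string "a:b" counts as two words. With broken set to 1 the pointer stops on ':' and never moves again, so the cap is hit and -1 is returned.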
So the basic question is:
What should we do if IFS contains invalid character(s)?
I think, at least if the MULTIBYTE option is on, it would be better to
forcibly reset IFS to the default, rather than leaving ifs_wide empty.
Or store only the valid characters in ifs_wide.chars?
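The two alternatives can be sketched side by side. Nothing here is from the zsh source: model_valid(), model_keep_valid() and model_reset_if_invalid() are hypothetical, the default IFS is passed in by the caller, and "bytes above 0x7f are invalid" is again just an assumption for the sake of the example.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Assumed validity rule for this sketch only. */
static int model_valid(unsigned char b)
{
    return b <= 0x7f;
}

/* Alternative A: if anything is invalid, fall back to the default. */
static const char *model_reset_if_invalid(const char *mb, const char *dflt)
{
    assert(mb != NULL && dflt != NULL);
    for (const char *p = mb; *p; p++)
        if (!model_valid((unsigned char)*p))
            return dflt;
    return mb;
}

/* Alternative B: keep only the valid characters; returns the count stored. */
static size_t model_keep_valid(const char *mb, wchar_t *out, size_t cap)
{
    size_t n = 0;

    assert(mb != NULL && out != NULL);
    for (; *mb && n < cap; mb++)
        if (model_valid((unsigned char)*mb))
            out[n++] = (wchar_t)(unsigned char)*mb;
    return n;
}
```

Either way the separator set is never left empty just because one character failed to convert. Alternative A discards the user's setting entirely, while alternative B silently drops part of it.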
BTW, in set_widearray():
 89         if (STOUC(*mb_array) <= 0x7f) {
 90             mb_array++;
 91             *wcptr++ = (wchar_t)*mb_array;
I think lines 90 and 91 should be
                *wcptr++ = (wchar_t)*mb_array++;
But fixing this does not solve the current problem.
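The difference between the two forms is easy to see in isolation. The sketch below assumes ASCII-only input and a simple loop that runs until the terminating NUL. Both function names are hypothetical and the surrounding loop is not copied from utils.c.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Increment first, then read: stores the byte AFTER the one just tested. */
static size_t model_copy_as_written(const char *src, wchar_t *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (*src) {
        src++;
        dst[n++] = (wchar_t)*src;
    }
    return n;
}

/* Read, then increment: stores the byte that was tested. */
static size_t model_copy_as_proposed(const char *src, wchar_t *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (*src)
        dst[n++] = (wchar_t)*src++;
    return n;
}
```

For the input "ab", the first form stores 'b' followed by the terminating NUL, while the second stores 'a' and 'b'.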