From: "Jun. T" <takimoto-j@kba.biglobe.ne.jp>
To: zsh-workers@zsh.org
Subject: Re: [bug] busyloop upon $=var with NULs when $IFS contains both NUL and a byte > 0x7f
Date: Tue, 29 Nov 2022 23:27:10 +0900
Message-ID: <AD3B3716-F348-4C93-981F-8B8DD624C37D@kba.biglobe.ne.jp>
In-Reply-To: <20221118142717.t4elzrigjeizjm6w@chazelas.org>


> 2022/11/18 23:27, Stephane Chazelas <stephane@chazelas.org> wrote:
> 
> $ LC_ALL=C zsh -c 'IFS=é$IFS; echo $=IFS'
> ^C
> 
> (busy loop had to be interrupted with ^C).

This is not simple to solve. The basic question is:
   What should we do if IFS contains invalid characters?

When IFS changes, ifssetfn() calls inittyptab(), which then calls
set_widearray() (at line 4172 in utils.c) to set the structure
ifs_wide. The origin of the problem seems to be in this function
(also in utils.c):

  95             mblen = mb_metacharlenconv(mb_array, &wci);
  ..
  99             /* No good unless all characters are convertible */
 100             if (wci == WEOF)
 101                 return;

mb_array is the current IFS (metafied), and it contains
é = \xc3\xa9. In the C locale (and at least on Linux), \xc3 is
an invalid character, and wci is set to WEOF. Then the function
returns without setting ifs_wide (ifs_wide.chars=NULL and
ifs_wide.len=0).
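
As a rough illustration of the "all characters must be convertible"
check, here is a minimal standalone sketch. It is not zsh code: it
calls the libc function mbrtowc() directly instead of
mb_metacharlenconv(), works on plain (non-metafied) strings, and the
helper name first_unconvertible() is made up for this example. Whether
a byte such as \xc3 is rejected depends on the current locale and on
the C library.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not part of zsh): return the byte offset of the
 * first sequence that mbrtowc() cannot convert in the current locale,
 * or -1 if the whole string converts.
 */
long first_unconvertible(const char *s)
{
    mbstate_t st;
    size_t left = strlen(s);
    size_t off = 0;

    memset(&st, 0, sizeof st);
    while (left > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s + off, left, &st);

        /* (size_t)-1: invalid sequence, (size_t)-2: incomplete sequence */
        if (n == (size_t)-1 || n == (size_t)-2)
            return (long)off;
        if (n == 0)     /* converted a null wide character; step one byte */
            n = 1;
        off += n;
        left -= n;
    }
    return -1;
}
```

After setlocale(LC_ALL, "C"), an ASCII-only string such as " \t\n"
gives -1. For "\xc3\xa9" the result is platform dependent; on a system
where the C locale rejects bytes above 0x7f it would be 0, which
corresponds to the early return at line 101.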

The comment at line 99 may look reasonable, but leaving ifs_wide
empty is equally 'no good', I think.

Due to this empty ifs_wide, itype_end() (and wcsitype()) don't
work as expected (for characters >= \x80).

The 'busy loop' is in wordcount() (utils.c):

3834         for (; *s; r++) {
3835             char *ie = itype_end(s, ISEP, 1);
3836             if (ie != s) {
3837                 s = ie;
....
3840             }
3841             (void)findsep(&s, NULL, 0);
....
3845         }

Here, the pointer s already points to an ISEP (\x83\x20 = metafied NUL),
but itype_end() can't find the next ISEP (ie == s) due to the empty
ifs_wide, and findsep() does not move s because *s is already an ISEP,
resulting in an infinite loop with the same s.
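
The mechanism can be reproduced in a small standalone model. This is
only a sketch, not the zsh code: skip_sep_wide(), find_sep_bytes() and
count_words_model() are hypothetical stand-ins for itype_end(),
findsep() and wordcount(), the two separator sets are plain C strings,
and the "no progress" check exists only so that the model terminates
instead of spinning.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for itype_end(s, ISEP, 1): skip one separator, consulting
 * only the "wide" separator set. */
static const char *skip_sep_wide(const char *s, const char *wide_seps)
{
    return (*s && strchr(wide_seps, *s)) ? s + 1 : s;
}

/* Stand-in for findsep(): advance to the next separator according to
 * the byte table. Does not move if *s is already a separator. */
static const char *find_sep_bytes(const char *s, const char *byte_seps)
{
    while (*s && !strchr(byte_seps, *s))
        s++;
    return s;
}

/* Count words in s. Returns -1 if an iteration leaves s unchanged,
 * which is the situation where the real loop would never end. */
int count_words_model(const char *s, const char *byte_seps,
                      const char *wide_seps)
{
    int r = 0;

    while (*s) {
        const char *start = s;
        const char *ie = skip_sep_wide(s, wide_seps);

        if (ie != s) {
            s = ie;
            if (!*s)
                break;
        }
        s = find_sep_bytes(s, byte_seps);
        if (s == start)
            return -1;  /* neither helper advanced the pointer */
        r++;
    }
    return r;
}
```

With both sets equal to ":", count_words_model("a:b", ":", ":")
returns 2. With an empty wide set, count_words_model("a:b", ":", "")
returns -1: the first helper does not recognise ':' as a separator,
the second one does and therefore stays put.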

So the basic question is:
What should we do if IFS contains invalid character(s)?

I think, at least if MULTIBYTE option is ON, it would be better to
force reset IFS to the default, rather than leaving ifs_wide empty.

Or store only valid characters in ifs_wide.chars?
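
To make the difference between the current all-or-nothing behaviour
and the "store only valid characters" idea concrete, here is a toy
sketch. It is not a patch: the function names are invented, and
toy_convertible() simply assumes that only ASCII bytes are convertible
(roughly what the C locale on Linux does, as described above) in place
of a real call to mb_metacharlenconv().

```c
#include <stddef.h>
#include <wchar.h>

/* ASSUMPTION for this sketch: a byte converts only if it is ASCII. */
static int toy_convertible(unsigned char c)
{
    return c <= 0x7f;
}

/* Models the current behaviour: give up (length 0) as soon as one
 * character cannot be converted. */
size_t fill_all_or_nothing(const char *ifs, wchar_t *out)
{
    size_t n = 0;

    for (; *ifs; ifs++) {
        if (!toy_convertible((unsigned char)*ifs))
            return 0;
        out[n++] = (wchar_t)(unsigned char)*ifs;
    }
    return n;
}

/* Models the alternative: skip unconvertible bytes, keep the rest. */
size_t fill_valid_only(const char *ifs, wchar_t *out)
{
    size_t n = 0;

    for (; *ifs; ifs++)
        if (toy_convertible((unsigned char)*ifs))
            out[n++] = (wchar_t)(unsigned char)*ifs;
    return n;
}
```

For the input "\xc3\xa9 \t" the first function returns 0, while the
second stores the space and the tab and returns 2. The caller must
provide an output buffer with at least as many elements as the input
has bytes.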

BTW, in set_widearray():

  89             if (STOUC(*mb_array) <= 0x7f) {
  90                 mb_array++;
  91                 *wcptr++ = (wchar_t)*mb_array;

I think lines 90 and 91 should be
	*wcptr++ = (wchar_t)*mb_array++;
But fixing this does not solve the current problem.
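
The effect of the two variants can be checked with a small standalone
model. This is only a sketch: the function names are made up, and the
loops handle ASCII-only input, i.e. just the branch quoted above.

```c
#include <stddef.h>
#include <wchar.h>

/* Models lines 90-91 as quoted: the pointer is advanced before the
 * byte is read, so the value stored is the byte after the one that
 * was tested. */
size_t copy_ascii_as_quoted(const char *mb_array, wchar_t *out)
{
    wchar_t *wcptr = out;

    while (*mb_array) {
        mb_array++;
        *wcptr++ = (wchar_t)*mb_array;
    }
    return (size_t)(wcptr - out);
}

/* Models the suggested one-line form: read the byte, then advance. */
size_t copy_ascii_fixed(const char *mb_array, wchar_t *out)
{
    wchar_t *wcptr = out;

    while (*mb_array)
        *wcptr++ = (wchar_t)*mb_array++;
    return (size_t)(wcptr - out);
}
```

For the input " \t\n", the first version stores '\t', '\n', '\0'
(the leading space is lost and the terminating NUL is copied), while
the second stores ' ', '\t', '\n'.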
