zsh-workers
 help / color / mirror / code / Atom feed
* [bug] busyloop upon $=var with NULs when $IFS contains both NUL and a byte > 0x7f
@ 2022-11-18 14:27 Stephane Chazelas
  2022-11-29 14:27 ` Jun. T
  2022-11-30 14:56 ` Jun. T
  0 siblings, 2 replies; 6+ messages in thread
From: Stephane Chazelas @ 2022-11-18 14:27 UTC (permalink / raw)
  To: Zsh hackers list

$ LC_ALL=C zsh -c 'IFS=é$IFS; echo $=IFS'
^C

(busy loop had to be interrupted with ^C).

Call trace:

(gdb) bt
#0  0x00005555556249ad in mb_metacharlenconv (s=0x555555675185 "\203 ", wcp=0x7fffffffc760) at utils.c:5541
#1  0x0000555555622791 in itype_end (ptr=0x555555675185 "\203 ", itype=32, once=1) at utils.c:4332
#2  0x000055555562121e in wordcount (s=0x555555675185 "\203 ", sep=0x0, mul=-1) at utils.c:3835
#3  0x0000555555620b29 in spacesplit (s=0x555555675180 "\303\251 \t\n\203 ", allownull=0, heap=1, quote=0) at utils.c:3650
#4  0x0000555555621495 in sepsplit (s=0x555555675180 "\303\251 \t\n\203 ", sep=0x0, allownull=0, heap=1) at utils.c:3908
#5  0x0000555555612f55 in paramsubst (l=0x7ffff7fbf560, n=0x7ffff7fbf590, str=0x7fffffffce00, qt=0, pf_flags=0, ret_flags=0x7fffffffcfcc) at subst.c:3660
#6  0x000055555560bb37 in stringsubst (list=0x7ffff7fbf560, node=0x7ffff7fbf590, pf_flags=0, ret_flags=0x7fffffffcfcc, asssub=0) at subst.c:322
#7  0x000055555560abbc in prefork (list=0x7ffff7fbf560, flags=0, ret_flags=0x7fffffffcfcc) at subst.c:142
#8  0x0000555555595cfd in execcmd_exec (state=0x7fffffffd940, eparams=0x7fffffffd540, input=0, output=0, how=18, last1=1, close_if_forked=-1) at exec.c:3232
#9  0x0000555555592757 in execpline2 (state=0x7fffffffd940, pcode=131, how=18, input=0, output=0, last1=1) at exec.c:1966
#10 0x0000555555590f30 in execpline (state=0x7fffffffd940, slcode=4098, how=18, last1=1) at exec.c:1691
#11 0x000055555559009a in execlist (state=0x7fffffffd940, dont_change_job=0, exiting=1) at exec.c:1444
#12 0x000055555558f735 in execode (p=0x7ffff7fbf448, dont_change_job=0, exiting=1, context=0x55555562e108 "cmdarg") at exec.c:1221
#13 0x000055555558f60c in execstring (s=0x7fffffffdedc "IFS=\303\251$IFS; echo $=IFS", dont_change_job=0, exiting=1, context=0x55555562e108 "cmdarg")
    at exec.c:1187
#14 0x00005555555ba5f9 in init_misc (cmd=0x7fffffffdedc "IFS=\303\251$IFS; echo $=IFS", zsh_name=0x7fffffffded5 "zsh") at init.c:1389
#15 0x00005555555bbd2d in zsh_main (argc=3, argv=0x7fffffffdb58) at init.c:1780
#16 0x000055555556ad29 in main (argc=3, argv=0x7fffffffdb58) at ./main.c:93


With +o multibyte, no busy loop, but splitting doesn't work properly:

$ LC_ALL=C zsh +o multibyte -c 'IFS=é$IFS; printf "<%q>\n" $=IFS'
<$'\303'$'\251'>
<''>

That's triggered when IFS contains both NUL and a byte over 0x7f
(in any order) and when the variable to split contains NUL.

In UTF-8 locales, that's triggered when IFS contains NUL and
bytes or byte sequences not forming parts of valid characters.

"read" doesn't seem to be affected:

$ print 'foo\0bar' | LC_ALL=C zsh -c 'IFS=é$IFS read -rA a; typeset a'
a=( foo bar )

(that's on Debian GNU/Linux amd64 with zsh git HEAD)

-- 
Stephane



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [bug] busyloop upon $=var with NULs when $IFS contains both NUL and a byte > 0x7f
  2022-11-18 14:27 [bug] busyloop upon $=var with NULs when $IFS contains both NUL and a byte > 0x7f Stephane Chazelas
@ 2022-11-29 14:27 ` Jun. T
  2022-11-29 14:38   ` Peter Stephenson
  2022-11-30 14:56 ` Jun. T
  1 sibling, 1 reply; 6+ messages in thread
From: Jun. T @ 2022-11-29 14:27 UTC (permalink / raw)
  To: zsh-workers


> 2022/11/18 23:27, Stephane Chazelas <stephane@chazelas.org> wrote:
> 
> $ LC_ALL=C zsh -c 'IFS=é$IFS; echo $=IFS'
> ^C
> 
> (busy loop had to be interrupted with ^C).

This not simple to solve. The basic question is:
   What should we do if IFS contains invalid characters?

When IFS changes, ifssetfn() calls inittyptab(), and it then calls
set_widearay() (at line 4172 in utils.c) to set the structure
ifs_wide. The origin of the problem seems to be in this function
(also in utils.c):

  95             mblen = mb_metacharlenconv(mb_array, &wci);
  ..
  99             /* No good unless all characters are convertible */
 100             if (wci == WEOF)
 101                 return;

mb_array is the current IFS (metafied), and it contains
é = \xc3\xa9. In the C locale (and at least on Linux), \xc3 is
an invalid character, and wci is set to WEOF. Then the function
returns without setting ifs_wide (ifs_wide.chars=NULL and
ifs_wide.len=0).

The comment at line 99 may look reasonable, but leaving ifs_wide
empty is equally 'no good', I think.

Due to this empty ifs_wide, itype_end() (and wcsitype()) doesn't
work as expected (for character >= \x80).

The 'busy loop' is in wordcount() (utils.c):

3834         for (; *s; r++) {                                                 
3835             char *ie = itype_end(s, ISEP, 1);               
3836             if (ie != s) {                                             
3837                 s = ie;                                                
....                          
3840             }                                          
3841             (void)findsep(&s, NULL, 0);
....
3845         }

Here, the pointer s already points to a ISEP (\x83\x20 = metafied Nul),
but itype_end() can't find the next ISEP (ie == s) due to the empty
ifs_wide, and findsep() does not move s because *s is already ISEP,
resulting in infinite-loop with the same s.

So the basic question is:
What should we do if IFS contains invalid character(s)?

I think, at least if MULTIBYTE option is ON, it would be better to
force reset IFS to the default, rather than leaving ifs_wide empty.

Or store only valid characters in ifs_side.chars?

BTW, in set_widearay():

  89             if (STOUC(*mb_array) <= 0x7f) {
  90                 mb_array++;
  91                 *wcptr++ = (wchar_t)*mb_array;

I think the lines 90,91 should be
	*wcptr++ = (wchar_t)*mb_array++;
But fixing this does not solve the current problem.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [bug] busyloop upon $=var with NULs when $IFS contains both NUL and a byte > 0x7f
  2022-11-29 14:27 ` Jun. T
@ 2022-11-29 14:38   ` Peter Stephenson
  2022-11-30  4:20     ` Bart Schaefer
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Stephenson @ 2022-11-29 14:38 UTC (permalink / raw)
  To: zsh-workers

> On 29/11/2022 14:27 Jun. T <takimoto-j@kba.biglobe.ne.jp> wrote:
> 3834         for (; *s; r++) {                                                 
> 3835             char *ie = itype_end(s, ISEP, 1);               
> 3836             if (ie != s) {                                             
> 3837                 s = ie;                                                
> ....                          
> 3840             }                                          
> 3841             (void)findsep(&s, NULL, 0);
> ....
> 3845         }
> 
> Here, the pointer s already points to a ISEP (\x83\x20 = metafied Nul),
> but itype_end() can't find the next ISEP (ie == s) due to the empty
> ifs_wide, and findsep() does not move s because *s is already ISEP,
> resulting in infinite-loop with the same s.

I guess the obvious thing to do here is any time s doesn't move
give up splitting at that point.

pws


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [bug] busyloop upon $=var with NULs when $IFS contains both NUL and a byte > 0x7f
  2022-11-29 14:38   ` Peter Stephenson
@ 2022-11-30  4:20     ` Bart Schaefer
  2022-11-30  9:21       ` Peter Stephenson
  0 siblings, 1 reply; 6+ messages in thread
From: Bart Schaefer @ 2022-11-30  4:20 UTC (permalink / raw)
  To: zsh-workers

On Tue, Nov 29, 2022 at 6:41 AM Peter Stephenson
<p.w.stephenson@ntlworld.com> wrote:
>
> I guess the obvious thing to do here is any time s doesn't move
> give up splitting at that point.

I reported the infinite loop back in workers/50472 ("out of memory
error" thread), although the cause doesn't appear to be exactly the
same.

I suppose if s doesn't move we could forcibly advance to see if
another possible split occurs later.  Unless there's clearly a way to
determine that the lack of movement is due to upstream user error
(such as invalid IFS value in this case)?


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [bug] busyloop upon $=var with NULs when $IFS contains both NUL and a byte > 0x7f
  2022-11-30  4:20     ` Bart Schaefer
@ 2022-11-30  9:21       ` Peter Stephenson
  0 siblings, 0 replies; 6+ messages in thread
From: Peter Stephenson @ 2022-11-30  9:21 UTC (permalink / raw)
  To: zsh-workers

> On 30/11/2022 04:20 Bart Schaefer <schaefer@brasslantern.com> wrote
> On Tue, Nov 29, 2022 at 6:41 AM Peter Stephenson
> <p.w.stephenson@ntlworld.com> wrote:
> >
> > I guess the obvious thing to do here is any time s doesn't move
> > give up splitting at that point.
> 
> I reported the infinite loop back in workers/50472 ("out of memory
> error" thread), although the cause doesn't appear to be exactly the
> same.
> 
> I suppose if s doesn't move we could forcibly advance to see if
> another possible split occurs later.  Unless there's clearly a way to
> determine that the lack of movement is due to upstream user error
> (such as invalid IFS value in this case)?

Engaging finger-in-the-air mode suggests in a case like this just trying
to be safe is the best we've got, unless we have proof positive we can
do something better in some likely scenario --- Jun may have a feel
for whether there's a potential good (as opposed to not completely
broken) outcome of the case he's looking at, in which case we can be
clever, otherwise I don't think it's worth it.

pws


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [bug] busyloop upon $=var with NULs when $IFS contains both NUL and a byte > 0x7f
  2022-11-18 14:27 [bug] busyloop upon $=var with NULs when $IFS contains both NUL and a byte > 0x7f Stephane Chazelas
  2022-11-29 14:27 ` Jun. T
@ 2022-11-30 14:56 ` Jun. T
  1 sibling, 0 replies; 6+ messages in thread
From: Jun. T @ 2022-11-30 14:56 UTC (permalink / raw)
  To: zsh-workers


> 2022/11/18 23:27, Stephane Chazelas <stephane@chazelas.org> wrote:
> 
> With +o multibyte, no busy loop, but splitting doesn't work properly:
> 
> $ LC_ALL=C zsh +o multibyte -c 'IFS=é$IFS; printf "<%q>\n" $=IFS'
> <$'\303'$'\251'>
> <''>

It seems this can be fixed by the following patch
(use the multibyte code only if MULTIBYTE option is on).

The test script above gives
<''>
<''>
<''>
<''>

I gess this is the expected result (the description of IFS in man
zshparam(1) is not easy to understand).

If this works OK, then I think we can force reset IFS if an invalid
character is found in it when multibyte option is on, because
if a user wants (in C locale) to include any byte in IFS then she/he
can unset multibyte option.



diff --git a/Src/utils.c b/Src/utils.c
index edf5d3df7..a182553e7 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -74,9 +74,6 @@ set_widearray(char *mb_array, Widechar_array wca)
     }
     wca->len = 0;
 
-    if (!isset(MULTIBYTE))
-	return;
-
     if (mb_array) {
 	VARARR(wchar_t, tmpwcs, strlen(mb_array));
 	wchar_t *wcptr = tmpwcs;
@@ -4118,8 +4115,9 @@ inittyptab(void)
      * having IIDENT here is a good idea at all, but this code
      * should disappear into history...
      */
-    for (t0 = 0240; t0 != 0400; t0++)
-	typtab[t0] = IALPHA | IALNUM | IIDENT | IUSER | IWORD;
+    if isset(MULTIBYTE)
+	for (t0 = 0240; t0 != 0400; t0++)
+	    typtab[t0] = IALPHA | IALNUM | IIDENT | IUSER | IWORD;
 #endif
     /* typtab['.'] |= IIDENT; */ /* Allow '.' in variable names - broken */
     typtab['_'] = IIDENT | IUSER;
@@ -4138,7 +4136,7 @@ inittyptab(void)
 	DEFAULT_IFS_SH : DEFAULT_IFS; *s; s++) {
 	int c = STOUC(*s == Meta ? *++s ^ 32 : *s);
 #ifdef MULTIBYTE_SUPPORT
-	if (!isascii(c)) {
+	if (isset(MULTIBYTE) && !isascii(c)) {
 	    /* see comment for wordchars below */
 	    continue;
 	}
@@ -4154,7 +4152,7 @@ inittyptab(void)
     for (s = wordchars ? wordchars : DEFAULT_WORDCHARS; *s; s++) {
 	int c = STOUC(*s == Meta ? *++s ^ 32 : *s);
 #ifdef MULTIBYTE_SUPPORT
-	if (!isascii(c)) {
+	if (isset(MULTIBYTE) && !isascii(c)) {
 	    /*
 	     * If we have support for multibyte characters, we don't
 	     * handle non-ASCII characters here; instead, we turn
@@ -4168,9 +4166,11 @@ inittyptab(void)
 	typtab[c] |= IWORD;
     }
 #ifdef MULTIBYTE_SUPPORT
-    set_widearray(wordchars, &wordchars_wide);
-    set_widearray(ifs ? ifs : EMULATION(EMULATE_KSH|EMULATE_SH) ?
-	DEFAULT_IFS_SH : DEFAULT_IFS, &ifs_wide);
+    if (isset(MULTIBYTE)) {
+	set_widearray(wordchars, &wordchars_wide);
+	set_widearray(ifs ? ifs : EMULATION(EMULATE_KSH|EMULATE_SH) ?
+	    DEFAULT_IFS_SH : DEFAULT_IFS, &ifs_wide);
+    }
 #endif
     for (s = SPECCHARS; *s; s++)
 	typtab[STOUC(*s)] |= ISPECIAL;



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-11-30 14:57 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-18 14:27 [bug] busyloop upon $=var with NULs when $IFS contains both NUL and a byte > 0x7f Stephane Chazelas
2022-11-29 14:27 ` Jun. T
2022-11-29 14:38   ` Peter Stephenson
2022-11-30  4:20     ` Bart Schaefer
2022-11-30  9:21       ` Peter Stephenson
2022-11-30 14:56 ` Jun. T

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).