zsh-workers
* [bug] busy loop and memory exhaustion on {x..$'\80'} with nomultibyte
@ 2022-09-23 10:54 Stephane Chazelas
  2022-09-24 11:04 ` Jun. T
  0 siblings, 1 reply; 7+ messages in thread
From: Stephane Chazelas @ 2022-09-23 10:54 UTC (permalink / raw)
  To: Zsh hackers list

$ (limit cputime 10; TIMEFMT='%MMiB %U user %S sys'; time zsh +o multibyte -c ": {z..$'\x80'}")
3980MiB 8.84s user 1.40s sys

{$'\x80'..$'\xfe'} doesn't have the problem, but the expansion is:

$ zsh +o multibyte -c "printf %s {$'\x80'..$'\xfe'}" | hexdump -C
00000000  5c 4d 2d 40 5c 4d 2d 41  5c 4d 2d 42 5c 4d 2d 43  |\M-@\M-A\M-B\M-C|
00000010  5c 4d 2d 44 5c 4d 2d 45  5c 4d 2d 46 5c 4d 2d 47  |\M-D\M-E\M-F\M-G|
00000020  5c 4d 2d 48 5c 4d 2d 49  5c 4d 2d 4a 5c 4d 2d 4b  |\M-H\M-I\M-J\M-K|
00000030  5c 4d 2d 4c 5c 4d 2d 4d  5c 4d 2d 4e 5c 4d 2d 4f  |\M-L\M-M\M-N\M-O|
00000040  5c 4d 2d 50 5c 4d 2d 51  5c 4d 2d 52 5c 4d 2d 53  |\M-P\M-Q\M-R\M-S|
00000050  5c 4d 2d 54 5c 4d 2d 55  5c 4d 2d 56 5c 4d 2d 57  |\M-T\M-U\M-V\M-W|
00000060  5c 4d 2d 58 5c 4d 2d 59  5c 4d 2d 5a 5c 4d 2d 5b  |\M-X\M-Y\M-Z\M-[|
00000070  5c 4d 2d 5c 5c 4d 2d 5d  5c 4d 2d 5e 5c 4d 2d 5f  |\M-\\M-]\M-^\M-_|
00000080  5c 4d 2d 60 5c 4d 2d 61  5c 4d 2d 62 5c 4d 2d 63  |\M-`\M-a\M-b\M-c|
00000090  5c 4d 2d 64 5c 4d 2d 65  5c 4d 2d 66 5c 4d 2d 67  |\M-d\M-e\M-f\M-g|
000000a0  5c 4d 2d 68 5c 4d 2d 69  5c 4d 2d 6a 5c 4d 2d 6b  |\M-h\M-i\M-j\M-k|
000000b0  5c 4d 2d 6c 5c 4d 2d 6d  5c 4d 2d 6e 5c 4d 2d 6f  |\M-l\M-m\M-n\M-o|
000000c0  5c 4d 2d 70 5c 4d 2d 71  5c 4d 2d 72 5c 4d 2d 73  |\M-p\M-q\M-r\M-s|
000000d0  5c 4d 2d 74 5c 4d 2d 75  5c 4d 2d 76 5c 4d 2d 77  |\M-t\M-u\M-v\M-w|
000000e0  5c 4d 2d 78 5c 4d 2d 79  5c 4d 2d 7a 5c 4d 2d 7b  |\M-x\M-y\M-z\M-{|
000000f0  5c 4d 2d 7c 5c 4d 2d 7d  5c 4d 2d 7e 5c 4d 2d 5e  |\M-|\M-}\M-~\M-^|
00000100  3f 5e 00 5e 01 5e 02 5e  03 5e 04 5e 05 5e 06 5e  |?^.^.^.^.^.^.^.^|
00000110  07 5e 08 5e 09 5e 0a 5e  0b 5e 0c 5e 0d 5e 0e 5e  |.^.^.^.^.^.^.^.^|
00000120  0f 5e 10 5e 11 5e 12 5e  13 5e 14 5e 15 5e 16 5e  |.^.^.^.^.^.^.^.^|
00000130  17 5e 18 5e 19 5e 1a 5e  1b 5e 1c 5e 1d 5e 1e 5e  |.^.^.^.^.^.^.^.^|
00000140  1f 5e 20 5e 21 5e 22 5e  23 5e 24 5e 25 5e 26 5e  |.^ ^!^"^#^$^%^&^|
00000150  27 5e 28 5e 29 5e 2a 5e  2b 5e 2c 5e 2d 5e 2e 5e  |'^(^)^*^+^,^-^.^|
00000160  2f 5e 30 5e 31 5e 32 5e  33 5e 34 5e 35 5e 36 5e  |/^0^1^2^3^4^5^6^|
00000170  37 5e 38 5e 39 5e 3a 5e  3b 5e 3c 5e 3d 5e 3e     |7^8^9^:^;^<^=^>|
0000017f

With {$'\x80'..$'\xff'}, we get:

$ zsh +o multibyte -c "printf %s {$'\x80'..$'\xff'}" | hd
00000000  7b 80 2e 2e ff 7d                                 |{....}|
00000006

One can always use:

() {set -o localoptions +o multibyte; bytes=(${(#)@}); } {0..255}

and then

printf %s $^bytes[##x+1,0x81]

to get byte values from x to 0x80 in a {x..y} fashion as a workaround.

(BTW, the fact that it's MiB above instead of documented KiB on
systems other than Darwin/macOS is a separate bug that has
already been reported at least a couple of times in the past).

-- 
Stephane



* Re: [bug] busy loop and memory exhaustion on {x..$'\80'} with nomultibyte
  2022-09-23 10:54 [bug] busy loop and memory exhaustion on {x..$'\80'} with nomultibyte Stephane Chazelas
@ 2022-09-24 11:04 ` Jun. T
  2022-09-25 10:34   ` Stephane Chazelas
  0 siblings, 1 reply; 7+ messages in thread
From: Jun. T @ 2022-09-24 11:04 UTC (permalink / raw)
  To: zsh-workers


> 2022/09/23 19:54, Stephane Chazelas <stephane@chazelas.org> wrote:
> 
> $ (limit cputime 10; TIMEFMT='%MMiB %U user %S sys'; time zsh +o multibyte -c ": {z..$'\x80'}")
> 3980MiB 8.84s user 1.40s sys
> 
> {$'\x80'..$'\xfe'} doesn't have the problem, but the expansion is:
(snip)
> With {$'\x80'..$'\xff'}, we get:
> 
> $ zsh +o multibyte -c "printf %s {$'\x80'..$'\xff'}" | hd
> 00000000  7b 80 2e 2e ff 7d                                 |{....}|


Does this solve the problem?

diff --git a/Src/utils.c b/Src/utils.c
index 62bd3e602..edf5d3df7 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -5519,7 +5519,7 @@ mb_metacharlenconv(const char *s, wint_t *wcp)
     if (!isset(MULTIBYTE) || STOUC(*s) <= 0x7f) {
 	/* treat as single byte, possibly metafied */
 	if (wcp)
-	    *wcp = (wint_t)(*s == Meta ? s[1] ^ 32 : *s);
+	    *wcp = (wint_t)STOUC(*s == Meta ? s[1] ^ 32 : *s);
 	return 1 + (*s == Meta);
     }
     /*





* Re: [bug] busy loop and memory exhaustion on {x..$'\80'} with nomultibyte
  2022-09-24 11:04 ` Jun. T
@ 2022-09-25 10:34   ` Stephane Chazelas
  2022-09-26  5:49     ` Jun T
  0 siblings, 1 reply; 7+ messages in thread
From: Stephane Chazelas @ 2022-09-25 10:34 UTC (permalink / raw)
  To: Jun. T; +Cc: zsh-workers

On 2022-09-24 12:04, Jun. T wrote:
[...]
> Does this solve the problem?
[...]

Thanks that's better, but now:

$ echo $options[multibyte]
off
$ printf %s {$'\x80'..$'\xff'} | hexdump -C
00000000  5c 4d 2d 5e 40 5c 4d 2d  5e 41 5c 4d 2d 5e 42 5c  |\M-^@\M-^A\M-^B\|
00000010  4d 2d 5e 43 5c 4d 2d 5e  44 5c 4d 2d 5e 45 5c 4d  |M-^C\M-^D\M-^E\M|
00000020  2d 5e 46 5c 4d 2d 5e 47  5c 4d 2d 5e 48 5c 4d 2d  |-^F\M-^G\M-^H\M-|
00000030  5c 74 5c 4d 2d 5c 6e 5c  4d 2d 5e 4b 5c 4d 2d 5e  |\t\M-\n\M-^K\M-^|
00000040  4c 5c 4d 2d 5e 4d 5c 4d  2d 5e 4e 5c 4d 2d 5e 4f  |L\M-^M\M-^N\M-^O|
00000050  5c 4d 2d 5e 50 5c 4d 2d  5e 51 5c 4d 2d 5e 52 5c  |\M-^P\M-^Q\M-^R\|
00000060  4d 2d 5e 53 5c 4d 2d 5e  54 5c 4d 2d 5e 55 5c 4d  |M-^S\M-^T\M-^U\M|
00000070  2d 5e 56 5c 4d 2d 5e 57  5c 4d 2d 5e 58 5c 4d 2d  |-^V\M-^W\M-^X\M-|
00000080  5e 59 5c 4d 2d 5e 5a 5c  4d 2d 5e 5b 5c 4d 2d 5e  |^Y\M-^Z\M-^[\M-^|
00000090  5c 5c 4d 2d 5e 5d 5c 4d  2d 5e 5e 5c 4d 2d 5e 5f  |\\M-^]\M-^^\M-^_|
000000a0  c2 a0 c2 a1 c2 a2 c2 a3  c2 a4 c2 a5 c2 a6 c2 a7  |................|
000000b0  c2 a8 c2 a9 c2 aa c2 ab  c2 ac c2 ad c2 ae c2 af  |................|
000000c0  c2 b0 c2 b1 c2 b2 c2 b3  c2 b4 c2 b5 c2 b6 c2 b7  |................|
000000d0  c2 b8 c2 b9 c2 ba c2 bb  c2 bc c2 bd c2 be c2 bf  |................|
000000e0  c3 80 c3 81 c3 82 c3 83  c3 84 c3 85 c3 86 c3 87  |................|
000000f0  c3 88 c3 89 c3 8a c3 8b  c3 8c c3 8d c3 8e c3 8f  |................|
00000100  c3 90 c3 91 c3 92 c3 93  c3 94 c3 95 c3 96 c3 97  |................|
00000110  c3 98 c3 99 c3 9a c3 9b  c3 9c c3 9d c3 9e c3 9f  |................|
00000120  c3 a0 c3 a1 c3 a2 c3 a3  c3 a4 c3 a5 c3 a6 c3 a7  |................|
00000130  c3 a8 c3 a9 c3 aa c3 ab  c3 ac c3 ad c3 ae c3 af  |................|
00000140  c3 b0 c3 b1 c3 b2 c3 b3  c3 b4 c3 b5 c3 b6 c3 b7  |................|
00000150  c3 b8 c3 b9 c3 ba c3 bb  c3 bc c3 bd c3 be c3 bf  |................|
00000160


That's bytes 0x80 to 0x9f with their \M-^X representation followed by UTF-8
encoded (in my locale using UTF-8 as charmap) characters U+00A0 to U+00FF
instead of bytes 0x80 to 0xff which I'd expect with nomultibyte.

In any case, that (documented) transliteration of unprintable characters means
I can't use it for what I initially intended to (get a range of arbitrary byte
values). It seems braceccl's {$'\0'-$'\xff'} works for that though (though the
documentation suggests it may not be future proof):

> unchanged, unless the option BRACE_CCL (an abbreviation for 'brace
> character class') is set.  In that case, it is expanded to a list of the
> individual characters between the braces sorted into the order of the
> characters in the ASCII character set (multibyte characters are not
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> currently handled).  The syntax is similar to a [...]  expression in
   ^^^^^^^^^^^^^^^^^^

-- 
Stephane



* Re: [bug] busy loop and memory exhaustion on {x..$'\80'} with nomultibyte
  2022-09-25 10:34   ` Stephane Chazelas
@ 2022-09-26  5:49     ` Jun T
  2022-09-28  6:03       ` Stephane Chazelas
  0 siblings, 1 reply; 7+ messages in thread
From: Jun T @ 2022-09-26  5:49 UTC (permalink / raw)
  To: zsh-workers


> 2022/09/25 19:34, Stephane Chazelas <stephane@chazelas.org> wrote:
> 
> Thanks that's better, but now:
> 
> $ echo $options[multibyte]
> off
> $ printf %s {$'\x80'..$'\xff'} | hexdump -C
> 00000000  5c 4d 2d 5e 40 5c 4d 2d  5e 41 5c 4d 2d 5e 42 5c  |\M-^@\M-^A\M-^B\|
> 00000010  4d 2d 5e 43 5c 4d 2d 5e  44 5c 4d 2d 5e 45 5c 4d  |M-^C\M-^D\M-^E\M|
(snip)
> 
> That's bytes 0x80 to 0x9f with their \M-^X representation followed by UTF-8
> encoded (in my locale using UTF-8 as charmap) characters U+00A0 to U+00FF
> instead of bytes 0x80 to 0xff which I'd expect with nomultibyte.

Did you try 'LANG=C; setopt print_eight_bit' ?

Anyway, I will push the patch (included below again) with a test.

diff --git a/Src/utils.c b/Src/utils.c
index 62bd3e602..edf5d3df7 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -5519,7 +5519,7 @@ mb_metacharlenconv(const char *s, wint_t *wcp)
     if (!isset(MULTIBYTE) || STOUC(*s) <= 0x7f) {
 	/* treat as single byte, possibly metafied */
 	if (wcp)
-	    *wcp = (wint_t)(*s == Meta ? s[1] ^ 32 : *s);
+	    *wcp = (wint_t)STOUC(*s == Meta ? s[1] ^ 32 : *s);
 	return 1 + (*s == Meta);
     }
     /*
diff --git a/Test/D09brace.ztst b/Test/D09brace.ztst
index 580ed430f..c289be949 100644
--- a/Test/D09brace.ztst
+++ b/Test/D09brace.ztst
@@ -116,3 +116,10 @@
   print -r {1..10}{..
 0:Unmatched braces after matched braces are left alone.
 >1{.. 2{.. 3{.. 4{.. 5{.. 6{.. 7{.. 8{.. 9{.. 10{..
+
+  () {
+    setopt localoptions no_multibyte
+    echo -E {$'\x80'..$'\x81'}
+  }
+0:range of 8bit chars, multibyte option unset
+>\M-^@ \M-^A




* Re: [bug] busy loop and memory exhaustion on {x..$'\80'} with nomultibyte
  2022-09-26  5:49     ` Jun T
@ 2022-09-28  6:03       ` Stephane Chazelas
  2022-09-28  7:53         ` Jun T
  0 siblings, 1 reply; 7+ messages in thread
From: Stephane Chazelas @ 2022-09-28  6:03 UTC (permalink / raw)
  To: Jun T; +Cc: zsh-workers

2022-09-26 14:49:08 +0900, Jun T:
> 
> > 2022/09/25 19:34, Stephane Chazelas <stephane@chazelas.org> wrote:
> > 
> > Thanks that's better, but now:
> > 
> > $ echo $options[multibyte]
> > off
> > $ printf %s {$'\x80'..$'\xff'} | hexdump -C
> > 00000000  5c 4d 2d 5e 40 5c 4d 2d  5e 41 5c 4d 2d 5e 42 5c  |\M-^@\M-^A\M-^B\|
> > 00000010  4d 2d 5e 43 5c 4d 2d 5e  44 5c 4d 2d 5e 45 5c 4d  |M-^C\M-^D\M-^E\M|
> (snip)
> > 
> > That's bytes 0x80 to 0x9f with their \M-^X representation followed by UTF-8
> > encoded (in my locale using UTF-8 as charmap) characters U+00A0 to U+00FF
> > instead of bytes 0x80 to 0xff which I'd expect with nomultibyte.
> 
> Did you try 'LANG=C; setopt print_eight_bit' ?

Thanks for the print_eight_bit clue, though it doesn't seem to make a difference:

$ zsh -o printeightbit +o multibyte -c $'printf %s {\xe8..\xea}' | hd
00000000  5e 28 5e 29 5e 2a                                 |^(^)^*|
00000006
$ LC_ALL=C zsh -o printeightbit +o multibyte -c $'printf %s {\xe8..\xea}' | hd
00000000  5e 28 5e 29 5e 2a                                 |^(^)^*|
00000006

(here in 5.8, hd being the same as hexdump -C on my system).

> 
> Anyway, I will push the patch (included below again) with a test.
> 
> diff --git a/Src/utils.c b/Src/utils.c
> index 62bd3e602..edf5d3df7 100644
> --- a/Src/utils.c
> +++ b/Src/utils.c
> @@ -5519,7 +5519,7 @@ mb_metacharlenconv(const char *s, wint_t *wcp)
>      if (!isset(MULTIBYTE) || STOUC(*s) <= 0x7f) {
>  	/* treat as single byte, possibly metafied */
>  	if (wcp)
> -	    *wcp = (wint_t)(*s == Meta ? s[1] ^ 32 : *s);
> +	    *wcp = (wint_t)STOUC(*s == Meta ? s[1] ^ 32 : *s);
>  	return 1 + (*s == Meta);
>      }
>      /*
[...]

That can't be right. The comment says "treat as single byte", yet the result is
now multibyte characters instead of bytes:

$ ./Src/zsh -o printeightbit +o multibyte -c $'printf %s {\xe8..\xea}' | hd
00000000  c3 a8 c3 a9 c3 aa                                 |......|
00000006

I asked for bytes 0xE8 to 0xEA and got UTF-8 encoded characters U+00E8 to U+00EA.

Though at least now, I can get what I want in this case with:

$ LC_ALL=C ./Src/zsh -o printeightbit +o multibyte -c $'printf %s {\xe8..\xea}' | hd
00000000  e8 e9 ea                                          |...|
00000003

(the control characters are still expanded in ^? fashion, so I
still can't get ranges of bytes with that, but that's as
documented).

-- 
Stephane



* Re: [bug] busy loop and memory exhaustion on {x..$'\80'} with nomultibyte
  2022-09-28  6:03       ` Stephane Chazelas
@ 2022-09-28  7:53         ` Jun T
  2022-09-28  9:52           ` Stephane Chazelas
  0 siblings, 1 reply; 7+ messages in thread
From: Jun T @ 2022-09-28  7:53 UTC (permalink / raw)
  To: zsh-workers


> 2022/09/28 15:03, Stephane Chazelas <stephane@chazelas.org> wrote:
> 
> (here in 5.8, hd being the same as hexdump -C on my system).

Please test with my patch applied.

% LC_ALL=C /usr/local/bin/zsh -o printeightbit +o multibyte -c $'printf %s {\xe8..\xea}' | hexdump -C
00000000  e8 e9 ea                                          |...|





* Re: [bug] busy loop and memory exhaustion on {x..$'\80'} with nomultibyte
  2022-09-28  7:53         ` Jun T
@ 2022-09-28  9:52           ` Stephane Chazelas
  0 siblings, 0 replies; 7+ messages in thread
From: Stephane Chazelas @ 2022-09-28  9:52 UTC (permalink / raw)
  To: Jun T; +Cc: zsh-workers

2022-09-28 16:53:58 +0900, Jun T:
> 
> > 2022/09/28 15:03, Stephane Chazelas <stephane@chazelas.org> wrote:
> > 
> > (here in 5.8, hd being the same as hexdump -C on my system).
> 
> Please test with my patch applied.
[...]

I think you missed the part of my email at the bottom (where
./Src/zsh is with the current git HEAD with your patch applied).

-- 
Stephane


