* [bug] busy loop and memory exhaustion on {x..$'\80'} with nomultibyte
@ 2022-09-23 10:54 Stephane Chazelas
2022-09-24 11:04 ` Jun. T
0 siblings, 1 reply; 7+ messages in thread
From: Stephane Chazelas @ 2022-09-23 10:54 UTC (permalink / raw)
To: Zsh hackers list
$ (limit cputime 10; TIMEFMT='%MMiB %U user %S sys'; time zsh +o multibyte -c ": {z..$'\x80'}")
3980MiB 8.84s user 1.40s sys
{$'\x80'..$'\xfe'} doesn't have the problem, but the expansion is:
$ zsh +o multibyte -c "printf %s {$'\x80'..$'\xfe'}" | hexdump -C
00000000 5c 4d 2d 40 5c 4d 2d 41 5c 4d 2d 42 5c 4d 2d 43 |\M-@\M-A\M-B\M-C|
00000010 5c 4d 2d 44 5c 4d 2d 45 5c 4d 2d 46 5c 4d 2d 47 |\M-D\M-E\M-F\M-G|
00000020 5c 4d 2d 48 5c 4d 2d 49 5c 4d 2d 4a 5c 4d 2d 4b |\M-H\M-I\M-J\M-K|
00000030 5c 4d 2d 4c 5c 4d 2d 4d 5c 4d 2d 4e 5c 4d 2d 4f |\M-L\M-M\M-N\M-O|
00000040 5c 4d 2d 50 5c 4d 2d 51 5c 4d 2d 52 5c 4d 2d 53 |\M-P\M-Q\M-R\M-S|
00000050 5c 4d 2d 54 5c 4d 2d 55 5c 4d 2d 56 5c 4d 2d 57 |\M-T\M-U\M-V\M-W|
00000060 5c 4d 2d 58 5c 4d 2d 59 5c 4d 2d 5a 5c 4d 2d 5b |\M-X\M-Y\M-Z\M-[|
00000070 5c 4d 2d 5c 5c 4d 2d 5d 5c 4d 2d 5e 5c 4d 2d 5f |\M-\\M-]\M-^\M-_|
00000080 5c 4d 2d 60 5c 4d 2d 61 5c 4d 2d 62 5c 4d 2d 63 |\M-`\M-a\M-b\M-c|
00000090 5c 4d 2d 64 5c 4d 2d 65 5c 4d 2d 66 5c 4d 2d 67 |\M-d\M-e\M-f\M-g|
000000a0 5c 4d 2d 68 5c 4d 2d 69 5c 4d 2d 6a 5c 4d 2d 6b |\M-h\M-i\M-j\M-k|
000000b0 5c 4d 2d 6c 5c 4d 2d 6d 5c 4d 2d 6e 5c 4d 2d 6f |\M-l\M-m\M-n\M-o|
000000c0 5c 4d 2d 70 5c 4d 2d 71 5c 4d 2d 72 5c 4d 2d 73 |\M-p\M-q\M-r\M-s|
000000d0 5c 4d 2d 74 5c 4d 2d 75 5c 4d 2d 76 5c 4d 2d 77 |\M-t\M-u\M-v\M-w|
000000e0 5c 4d 2d 78 5c 4d 2d 79 5c 4d 2d 7a 5c 4d 2d 7b |\M-x\M-y\M-z\M-{|
000000f0 5c 4d 2d 7c 5c 4d 2d 7d 5c 4d 2d 7e 5c 4d 2d 5e |\M-|\M-}\M-~\M-^|
00000100 3f 5e 00 5e 01 5e 02 5e 03 5e 04 5e 05 5e 06 5e |?^.^.^.^.^.^.^.^|
00000110 07 5e 08 5e 09 5e 0a 5e 0b 5e 0c 5e 0d 5e 0e 5e |.^.^.^.^.^.^.^.^|
00000120 0f 5e 10 5e 11 5e 12 5e 13 5e 14 5e 15 5e 16 5e |.^.^.^.^.^.^.^.^|
00000130 17 5e 18 5e 19 5e 1a 5e 1b 5e 1c 5e 1d 5e 1e 5e |.^.^.^.^.^.^.^.^|
00000140 1f 5e 20 5e 21 5e 22 5e 23 5e 24 5e 25 5e 26 5e |.^ ^!^"^#^$^%^&^|
00000150 27 5e 28 5e 29 5e 2a 5e 2b 5e 2c 5e 2d 5e 2e 5e |'^(^)^*^+^,^-^.^|
00000160 2f 5e 30 5e 31 5e 32 5e 33 5e 34 5e 35 5e 36 5e |/^0^1^2^3^4^5^6^|
00000170 37 5e 38 5e 39 5e 3a 5e 3b 5e 3c 5e 3d 5e 3e |7^8^9^:^;^<^=^>|
0000017f
With {$'\x80'..$'\xff'}, we get:
$ zsh +o multibyte -c "printf %s {$'\x80'..$'\xff'}" | hd
00000000 7b 80 2e 2e ff 7d |{....}|
00000006
One can always use:
() { set -o localoptions +o multibyte; bytes=(${(#)@}); } {0..255}
and then
printf %s $^bytes[##x+1,0x81]
to get byte values from x to 0x80 in a {x..y} fashion, as a workaround.
(BTW, the fact that it's MiB above instead of the documented KiB on
systems other than Darwin/macOS is a separate bug that has
already been reported at least a couple of times in the past).
--
Stephane
* Re: [bug] busy loop and memory exhaustion on {x..$'\80'} with nomultibyte
2022-09-23 10:54 [bug] busy loop and memory exhaustion on {x..$'\80'} with nomultibyte Stephane Chazelas
@ 2022-09-24 11:04 ` Jun. T
2022-09-25 10:34 ` Stephane Chazelas
0 siblings, 1 reply; 7+ messages in thread
From: Jun. T @ 2022-09-24 11:04 UTC (permalink / raw)
To: zsh-workers
> 2022/09/23 19:54, Stephane Chazelas <stephane@chazelas.org> wrote:
>
> $ (limit cputime 10; TIMEFMT='%MMiB %U user %S sys'; time zsh +o multibyte -c ": {z..$'\x80'}")
> 3980MiB 8.84s user 1.40s sys
>
> {$'\x80'..$'\xfe'} doesn't have the problem, but the expansion is:
(snip)
> With {$'\x80'..$'\xff'}, we get:
>
> $ zsh +o multibyte -c "printf %s {$'\x80'..$'\xff'}" | hd
> 00000000 7b 80 2e 2e ff 7d |{....}|
Does this solve the problem?
diff --git a/Src/utils.c b/Src/utils.c
index 62bd3e602..edf5d3df7 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -5519,7 +5519,7 @@ mb_metacharlenconv(const char *s, wint_t *wcp)
     if (!isset(MULTIBYTE) || STOUC(*s) <= 0x7f) {
         /* treat as single byte, possibly metafied */
         if (wcp)
-            *wcp = (wint_t)(*s == Meta ? s[1] ^ 32 : *s);
+            *wcp = (wint_t)STOUC(*s == Meta ? s[1] ^ 32 : *s);
         return 1 + (*s == Meta);
     }
     /*
* Re: [bug] busy loop and memory exhaustion on {x..$'\80'} with nomultibyte
2022-09-24 11:04 ` Jun. T
@ 2022-09-25 10:34 ` Stephane Chazelas
2022-09-26 5:49 ` Jun T
0 siblings, 1 reply; 7+ messages in thread
From: Stephane Chazelas @ 2022-09-25 10:34 UTC (permalink / raw)
To: Jun. T; +Cc: zsh-workers
On 2022-09-24 12:04, Jun. T wrote:
[...]
> Does this solve the problem?
[...]
Thanks that's better, but now:
$ echo $options[multibyte]
off
$ printf %s {$'\x80'..$'\xff'} | hexdump -C
00000000 5c 4d 2d 5e 40 5c 4d 2d 5e 41 5c 4d 2d 5e 42 5c |\M-^@\M-^A\M-^B\|
00000010 4d 2d 5e 43 5c 4d 2d 5e 44 5c 4d 2d 5e 45 5c 4d |M-^C\M-^D\M-^E\M|
00000020 2d 5e 46 5c 4d 2d 5e 47 5c 4d 2d 5e 48 5c 4d 2d |-^F\M-^G\M-^H\M-|
00000030 5c 74 5c 4d 2d 5c 6e 5c 4d 2d 5e 4b 5c 4d 2d 5e |\t\M-\n\M-^K\M-^|
00000040 4c 5c 4d 2d 5e 4d 5c 4d 2d 5e 4e 5c 4d 2d 5e 4f |L\M-^M\M-^N\M-^O|
00000050 5c 4d 2d 5e 50 5c 4d 2d 5e 51 5c 4d 2d 5e 52 5c |\M-^P\M-^Q\M-^R\|
00000060 4d 2d 5e 53 5c 4d 2d 5e 54 5c 4d 2d 5e 55 5c 4d |M-^S\M-^T\M-^U\M|
00000070 2d 5e 56 5c 4d 2d 5e 57 5c 4d 2d 5e 58 5c 4d 2d |-^V\M-^W\M-^X\M-|
00000080 5e 59 5c 4d 2d 5e 5a 5c 4d 2d 5e 5b 5c 4d 2d 5e |^Y\M-^Z\M-^[\M-^|
00000090 5c 5c 4d 2d 5e 5d 5c 4d 2d 5e 5e 5c 4d 2d 5e 5f |\\M-^]\M-^^\M-^_|
000000a0 c2 a0 c2 a1 c2 a2 c2 a3 c2 a4 c2 a5 c2 a6 c2 a7 |................|
000000b0 c2 a8 c2 a9 c2 aa c2 ab c2 ac c2 ad c2 ae c2 af |................|
000000c0 c2 b0 c2 b1 c2 b2 c2 b3 c2 b4 c2 b5 c2 b6 c2 b7 |................|
000000d0 c2 b8 c2 b9 c2 ba c2 bb c2 bc c2 bd c2 be c2 bf |................|
000000e0 c3 80 c3 81 c3 82 c3 83 c3 84 c3 85 c3 86 c3 87 |................|
000000f0 c3 88 c3 89 c3 8a c3 8b c3 8c c3 8d c3 8e c3 8f |................|
00000100 c3 90 c3 91 c3 92 c3 93 c3 94 c3 95 c3 96 c3 97 |................|
00000110 c3 98 c3 99 c3 9a c3 9b c3 9c c3 9d c3 9e c3 9f |................|
00000120 c3 a0 c3 a1 c3 a2 c3 a3 c3 a4 c3 a5 c3 a6 c3 a7 |................|
00000130 c3 a8 c3 a9 c3 aa c3 ab c3 ac c3 ad c3 ae c3 af |................|
00000140 c3 b0 c3 b1 c3 b2 c3 b3 c3 b4 c3 b5 c3 b6 c3 b7 |................|
00000150 c3 b8 c3 b9 c3 ba c3 bb c3 bc c3 bd c3 be c3 bf |................|
00000160
That's bytes 0x80 to 0x9f with their \M-^X representation followed by UTF-8
encoded (in my locale using UTF-8 as charmap) characters U+00A0 to U+00FF
instead of bytes 0x80 to 0xff which I'd expect with nomultibyte.
In any case, that (documented) transliteration of unprintable characters means
I can't use it for what I initially intended to (get a range of arbitrary byte
values). It seems braceccl's {$'\0'-$'\xff'} works for that though (though the
documentation suggests it may not be future proof):
> unchanged, unless the option BRACE_CCL (an abbreviation for 'brace
> character class') is set. In that case, it is expanded to a list of the
> individual characters between the braces sorted into the order of the
> characters in the ASCII character set (multibyte characters are not
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> currently handled). The syntax is similar to a [...] expression in
  ^^^^^^^^^^^^^^^^^^
--
Stephane
* Re: [bug] busy loop and memory exhaustion on {x..$'\80'} with nomultibyte
2022-09-25 10:34 ` Stephane Chazelas
@ 2022-09-26 5:49 ` Jun T
2022-09-28 6:03 ` Stephane Chazelas
0 siblings, 1 reply; 7+ messages in thread
From: Jun T @ 2022-09-26 5:49 UTC (permalink / raw)
To: zsh-workers
> 2022/09/25 19:34, Stephane Chazelas <stephane@chazelas.org> wrote:
>
> Thanks that's better, but now:
>
> $ echo $options[multibyte]
> off
> $ printf %s {$'\x80'..$'\xff'} | hexdump -C
> 00000000 5c 4d 2d 5e 40 5c 4d 2d 5e 41 5c 4d 2d 5e 42 5c |\M-^@\M-^A\M-^B\|
> 00000010 4d 2d 5e 43 5c 4d 2d 5e 44 5c 4d 2d 5e 45 5c 4d |M-^C\M-^D\M-^E\M|
(snip)
>
> That's bytes 0x80 to 0x9f with their \M-^X representation followed by UTF-8
> encoded (in my locale using UTF-8 as charmap) characters U+00A0 to U+00FF
> instead of bytes 0x80 to 0xff which I'd expect with nomultibyte.
Did you try 'LANG=C; setopt print_eight_bit' ?
Anyway, I will push the patch (included below again) with a test.
diff --git a/Src/utils.c b/Src/utils.c
index 62bd3e602..edf5d3df7 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -5519,7 +5519,7 @@ mb_metacharlenconv(const char *s, wint_t *wcp)
     if (!isset(MULTIBYTE) || STOUC(*s) <= 0x7f) {
         /* treat as single byte, possibly metafied */
         if (wcp)
-            *wcp = (wint_t)(*s == Meta ? s[1] ^ 32 : *s);
+            *wcp = (wint_t)STOUC(*s == Meta ? s[1] ^ 32 : *s);
         return 1 + (*s == Meta);
     }
     /*
diff --git a/Test/D09brace.ztst b/Test/D09brace.ztst
index 580ed430f..c289be949 100644
--- a/Test/D09brace.ztst
+++ b/Test/D09brace.ztst
@@ -116,3 +116,10 @@
   print -r {1..10}{..
 0:Unmatched braces after matched braces are left alone.
 >1{.. 2{.. 3{.. 4{.. 5{.. 6{.. 7{.. 8{.. 9{.. 10{..
+
+  () {
+    setopt localoptions no_multibyte
+    echo -E {$'\x80'..$'\x81'}
+  }
+0:range of 8bit chars, multibyte option unset
+>\M-^@ \M-^A
* Re: [bug] busy loop and memory exhaustion on {x..$'\80'} with nomultibyte
2022-09-26 5:49 ` Jun T
@ 2022-09-28 6:03 ` Stephane Chazelas
2022-09-28 7:53 ` Jun T
0 siblings, 1 reply; 7+ messages in thread
From: Stephane Chazelas @ 2022-09-28 6:03 UTC (permalink / raw)
To: Jun T; +Cc: zsh-workers
2022-09-26 14:49:08 +0900, Jun T:
>
> > 2022/09/25 19:34, Stephane Chazelas <stephane@chazelas.org> wrote:
> >
> > Thanks that's better, but now:
> >
> > $ echo $options[multibyte]
> > off
> > $ printf %s {$'\x80'..$'\xff'} | hexdump -C
> > 00000000 5c 4d 2d 5e 40 5c 4d 2d 5e 41 5c 4d 2d 5e 42 5c |\M-^@\M-^A\M-^B\|
> > 00000010 4d 2d 5e 43 5c 4d 2d 5e 44 5c 4d 2d 5e 45 5c 4d |M-^C\M-^D\M-^E\M|
> (snip)
> >
> > That's bytes 0x80 to 0x9f with their \M-^X representation followed by UTF-8
> > encoded (in my locale using UTF-8 as charmap) characters U+00A0 to U+00FF
> > instead of bytes 0x80 to 0xff which I'd expect with nomultibyte.
>
> Did you try 'LANG=C; setopt print_eight_bit' ?
Thanks for the print_eight_bit clue, though it doesn't seem to make a difference:
$ zsh -o printeightbit +o multibyte -c $'printf %s {\xe8..\xea}' | hd
00000000 5e 28 5e 29 5e 2a |^(^)^*|
00000006
$ LC_ALL=C zsh -o printeightbit +o multibyte -c $'printf %s {\xe8..\xea}' | hd
00000000 5e 28 5e 29 5e 2a |^(^)^*|
00000006
(here in 5.8, hd being the same as hexdump -C on my system).
>
> Anyway, I will push the patch (included below again) with a test.
>
> diff --git a/Src/utils.c b/Src/utils.c
> index 62bd3e602..edf5d3df7 100644
> --- a/Src/utils.c
> +++ b/Src/utils.c
> @@ -5519,7 +5519,7 @@ mb_metacharlenconv(const char *s, wint_t *wcp)
>      if (!isset(MULTIBYTE) || STOUC(*s) <= 0x7f) {
>          /* treat as single byte, possibly metafied */
>          if (wcp)
> -            *wcp = (wint_t)(*s == Meta ? s[1] ^ 32 : *s);
> +            *wcp = (wint_t)STOUC(*s == Meta ? s[1] ^ 32 : *s);
>          return 1 + (*s == Meta);
>      }
>      /*
[...]
That can't be right. The comment says "treat as single byte", yet the result is
now multibyte characters instead of bytes:
$ ./Src/zsh -o printeightbit +o multibyte -c $'printf %s {\xe8..\xea}' | hd
00000000 c3 a8 c3 a9 c3 aa |......|
00000006
I asked for bytes 0xE8 to 0xEA and got UTF-8 encoded characters U+00E8 to U+00EA.
Though at least now, I can get what I want in this case with:
$ LC_ALL=C ./Src/zsh -o printeightbit +o multibyte -c $'printf %s {\xe8..\xea}' | hd
00000000 e8 e9 ea |...|
00000003
(the control characters are still expanded in ^? fashion, so I
still can't get ranges of bytes with that, but that's as
documented).
--
Stephane
* Re: [bug] busy loop and memory exhaustion on {x..$'\80'} with nomultibyte
2022-09-28 6:03 ` Stephane Chazelas
@ 2022-09-28 7:53 ` Jun T
2022-09-28 9:52 ` Stephane Chazelas
0 siblings, 1 reply; 7+ messages in thread
From: Jun T @ 2022-09-28 7:53 UTC (permalink / raw)
To: zsh-workers
> 2022/09/28 15:03, Stephane Chazelas <stephane@chazelas.org> wrote:
>
> (here in 5.8, hd being the same as hexdump -C on my system).
Please test with my patch applied.
% LC_ALL=C /usr/local/bin/zsh -o printeightbit +o multibyte -c $'printf %s {\xe8..\xea}' | hexdump -C
00000000 e8 e9 ea |...|
* Re: [bug] busy loop and memory exhaustion on {x..$'\80'} with nomultibyte
2022-09-28 7:53 ` Jun T
@ 2022-09-28 9:52 ` Stephane Chazelas
0 siblings, 0 replies; 7+ messages in thread
From: Stephane Chazelas @ 2022-09-28 9:52 UTC (permalink / raw)
To: Jun T; +Cc: zsh-workers
2022-09-28 16:53:58 +0900, Jun T:
>
> > 2022/09/28 15:03, Stephane Chazelas <stephane@chazelas.org> wrote:
> >
> > (here in 5.8, hd being the same as hexdump -C on my system).
>
> Please test with my patch applied.
[...]
I think you missed the part of my email at the bottom (where
./Src/zsh is with the current git HEAD with your patch applied).
--
Stephane