From: Stephane Chazelas
To: Jun T
Cc: zsh-workers@zsh.org
Date: Wed, 28 Sep 2022 07:03:06 +0100
Subject: Re: [bug] busy loop and memory exhaustion on {x..$'\80'} with nomultibyte
Message-ID: <20220928060306.yjyrbspbitxyg5sn@chazelas.org>
X-Seq: 50677

2022-09-26 14:49:08 +0900, Jun T:
>
> > 2022/09/25 19:34, Stephane Chazelas wrote:
> >
> > Thanks that's better, but now:
> >
> > $ echo $options[multibyte]
> > off
> > $ printf %s {$'\x80'..$'\xff'} | hexdump -C
> > 00000000  5c 4d 2d 5e 40 5c 4d 2d  5e 41 5c 4d 2d 5e 42 5c  |\M-^@\M-^A\M-^B\|
> > 00000010  4d 2d 5e 43 5c 4d 2d 5e  44 5c 4d 2d 5e 45 5c 4d  |M-^C\M-^D\M-^E\M|
> (snip)
> >
> > That's bytes 0x80 to 0x9f with their \M-^X representation followed by UTF-8
> > encoded (in my locale using UTF-8 as charmap) characters U+00A0 to U+00FF
> > instead of bytes 0x80 to 0xff which I'd expect with nomultibyte.
>
> Did you try 'LANG=C; setopt print_eight_bit' ?

Thanks for the print_eight_bit clue, though it doesn't seem to
make a difference:

$ zsh -o printeightbit +o multibyte -c $'printf %s {\xe8..\xea}' | hd
00000000  5e 28 5e 29 5e 2a                                 |^(^)^*|
00000006
$ LC_ALL=C zsh -o printeightbit +o multibyte -c $'printf %s {\xe8..\xea}' | hd
00000000  5e 28 5e 29 5e 2a                                 |^(^)^*|
00000006

(here in 5.8, hd being the same as hexdump -C on my system).

>
> Anyway, I will push the patch (included below again) with a test.
>
> diff --git a/Src/utils.c b/Src/utils.c
> index 62bd3e602..edf5d3df7 100644
> --- a/Src/utils.c
> +++ b/Src/utils.c
> @@ -5519,7 +5519,7 @@ mb_metacharlenconv(const char *s, wint_t *wcp)
>      if (!isset(MULTIBYTE) || STOUC(*s) <= 0x7f) {
>          /* treat as single byte, possibly metafied */
>          if (wcp)
> -            *wcp = (wint_t)(*s == Meta ? s[1] ^ 32 : *s);
> +            *wcp = (wint_t)STOUC(*s == Meta ? s[1] ^ 32 : *s);
>          return 1 + (*s == Meta);
>      }
>      /*
[...]

That can't be right.
The comment says "treat as single byte", yet the result is now
multibyte characters instead of bytes:

$ ./Src/zsh -o printeightbit +o multibyte -c $'printf %s {\xe8..\xea}' | hd
00000000  c3 a8 c3 a9 c3 aa                                 |......|
00000006

I asked for bytes 0xE8 to 0xEA and got UTF-8 encoded characters
U+00E8 to U+00EA.

Though at least now, I can get what I want in this case with:

$ LC_ALL=C ./Src/zsh -o printeightbit +o multibyte -c $'printf %s {\xe8..\xea}' | hd
00000000  e8 e9 ea                                          |...|
00000003

(the control characters are still expanded in ^? fashion, so I
still can't get ranges of bytes with that, but that's as
documented).

-- 
Stephane
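
For readers following the thread, here is a rough, standalone sketch of
the two effects discussed above. It is not zsh code. TO_UCHAR is a
stand-in that assumes STOUC() amounts to a cast to unsigned char, and
widen_without_cast, widen_with_cast and utf8_encode_small are
hypothetical helpers made up for illustration. The first pair shows why
widening a byte >= 0x80 to wint_t without the cast does not give a value
in 0x80..0xff when char is signed. The last one shows that treating 0xE8
as code point U+00E8 and encoding it as UTF-8 gives the c3 a8 seen in
the hd output.

```c
#include <assert.h>
#include <wchar.h>

/* Assumption: stands in for zsh's STOUC(), taken here to be a plain
 * cast to unsigned char. */
#define TO_UCHAR(x) ((unsigned char)(x))

/* Widen a byte the way the pre-patch expression does when plain char
 * is signed: a byte such as 0xE8 (value -24) is sign-extended, so the
 * result is not 0xE8. */
static wint_t widen_without_cast(signed char c)
{
    return (wint_t)c;
}

/* Widen a byte the way the patched expression does: going through
 * unsigned char first keeps the result in the range 0..0xff. */
static wint_t widen_with_cast(signed char c)
{
    return (wint_t)TO_UCHAR(c);
}

/* Hypothetical helper: encode a code point below 0x800 as UTF-8.
 * Returns the number of bytes written to out (1 or 2). */
static int utf8_encode_small(unsigned int cp, unsigned char out[2])
{
    assert(cp < 0x800);             /* only 1- and 2-byte forms handled */
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (cp >> 6));    /* 110xxxxx */
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));  /* 10xxxxxx */
    return 2;
}
```

Calling widen_with_cast(-24) gives 0xE8, and feeding that value to
utf8_encode_small() produces the two bytes c3 a8. Whether bytes or
UTF-8 sequences should come out under nomultibyte is the open question
in this thread; the sketch only illustrates the arithmetic.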