From: Peter Stephenson
To: zsh workers
Date: Mon, 7 Feb 2022 12:15:44 +0000 (GMT)
Subject: Re: clarification on (#U) in pattern matching.
Message-ID: <1692212664.579212.1644236144924@mail2.virginmedia.com>
In-Reply-To: <1071890479.577225.1644233454174@mail2.virginmedia.com>
References: <20220206084255.tn3dgitvpr7qdjig@chazelas.org> <1071890479.577225.1644233454174@mail2.virginmedia.com>
X-Seq: 49745

Sorry, this just went to Stephane.
pws

> On 07 February 2022 at 11:30 Peter Stephenson wrote:
>
>
> > On 06 February 2022 at 08:42 Stephane Chazelas wrote:
> > $ set -o extendedglob
> > $ a='Stéphane€'
> > $ print -rn -- ${a//(#U)?} | hd
> > 00000000  a9 82 ac                                          |...|
> > 00000003
> >
> > It seems that with (#U) (and here in a locale using UTF-8 as
> > charmap), ? with (#U) matches only on the first byte of
> > multibyte characters. Is that how it's meant to be?
>
> I think what you're hitting is probably, as you suspected, a
> difference between the pattern matching code and the substitution
> code. The underlying pattern matching really is byte by byte,
> but this doesn't force any substitution such as // to behave
> in the same way. As far as I know, the MULTIBYTE option is
> the only higher level consistency measure we have.
>
> I think there might be a parameter matching flag that you can
> also set that would help. I'd have to look in more detail.
>
> pws
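
For reference, the three bytes left in the quoted hd output are exactly the UTF-8 continuation bytes of é (c3 a9) and € (e2 82 ac). The following is only a rough sketch in portable shell, not zsh's own code path: it does not use (#U) at all, it just reproduces the byte arithmetic with od and tr. The helper name hex and the variable names are made up for illustration.

```shell
# Bytes of 'Stéphane€' in UTF-8, written as octal escapes so the example
# does not depend on the encoding of the terminal or of this file.
s='St\303\251phane\342\202\254'

# Hypothetical helper: print stdin as space-separated hex bytes on one line.
hex() { od -An -tx1 | tr '\n' ' ' | tr -s ' ' | sed 's/^ //; s/ $//'; }

# The full byte sequence of the string.
all_bytes=$(printf "$s" | hex)

# Delete every ASCII byte (0x00-0x7f) and every UTF-8 lead byte (0xc0-0xff),
# which leaves only the continuation bytes (0x80-0xbf).
left_over=$(printf "$s" | LC_ALL=C tr -d '\000-\177\300-\377' | hex)

echo "$all_bytes"   # 53 74 c3 a9 70 68 61 6e 65 e2 82 ac
echo "$left_over"   # a9 82 ac
```

The second line of output matches the hd dump above, which fits the reading that each ? match consumed a single byte (an ASCII character or the lead byte of a multibyte one) while the substitution then moved on to the next character.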