Date: Wed, 23 Mar 2022 10:38:35 +0000
From: Stephane Chazelas
To: zsh-workers@zsh.org
Subject: Re: Test ./E03posix.ztst was expected to fail, but passed.
Message-ID: <20220323103835.hpoprdgt45iyqqgt@chazelas.org>
References: <20220315163347.GA617047@zira.vinc17.org> <082447B3-C6A4-44A1-A3D3-7FD89D707480@kba.biglobe.ne.jp> <20220323022644.GA349036@zira.vinc17.org>
In-Reply-To: <20220323022644.GA349036@zira.vinc17.org>
X-Seq: 49884

2022-03-23 03:26:44 +0100, Vincent Lefevre:
> On 2022-03-22 14:04:30 -0700, Bart Schaefer wrote:
> > Specifically in this instance, we consider it a POSIX bug that '%s'
> > always counts byte positions and that zsh has fixed this when it
> > counts character positions.
>
> But, AFAIK, on the POSIX side, it has never been regarded as a bug
> (I haven't seen any bug report).
[...]

It's been raised several times on the POSIX mailing list, and my
understanding is that the Open Group doesn't consider it a bug; they
have made it clear that they would not address it. They may consider
specifying ksh93's %Ls (which pads based on display width, not on
byte or character count) if enough implementations start to support
it.

That's why I didn't bother raising it as a bug personally, but to me,
that position (where printf(1) is meant to be an interface to
printf(3) without decoding those bytes into characters) does not make
sense. printf is for printing formatted text, not for padding binary
strings. printf(3) was extended with wprintf(3) to handle wide
characters; printf(1) should have been enhanced to switch to that or
an equivalent, just like every other text utility is now specified to
be able to cope with wide characters.

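To make the byte-versus-character difference concrete, here is a
minimal sketch in portable sh. It is only an illustration, not taken
from the thread, and the variable names are made up. It forces the C
locale, where a byte and a character are the same thing, so the
padding comes out byte-based whichever shell runs it:

```shell
# "Stéphane" written with octal escapes so the script itself stays ASCII:
# é is the two bytes 0xc3 0xa9 in UTF-8, so the name is 8 characters
# but 9 bytes.
name=$(printf 'St\303\251phane')

# In the C locale every byte is its own character, so %10s pads by bytes,
# which is what POSIX specifies for %s in every locale.
LC_ALL=C
export LC_ALL

padded=$(printf '%10s' "$name")

# Strip the name off the end to leave only the padding that was added.
pad=${padded%"$name"}
printf 'padding bytes: %d\n' "${#pad}"   # prints "padding bytes: 1"
```

With character counting, as zsh does with multibyte on in a UTF-8
locale, the same %10s would add two spaces, since the name is 8
characters long.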
printf(1) needs to decode its arguments into text anyway, if only
because in the format or in %b arguments, the "\" character (also "%"
in the format) is interpreted specially. zsh doesn't, btw (which may
be considered a bug, but then again those non-UTF-8 multibyte charsets
are poorly supported throughout, and to me it doesn't seem worth the
effort given that hardly anybody uses multibyte charsets other than
UTF-8 these days):

$ LC_ALL=zh_HK SHELL=/bin/zsh luit
zsh$ locale charmap
BIG5-HKSCS
zsh$ printf 'αb' | hd
00000000  a3 08                                             |..|
00000002

(as α is encoded as 0xa3 0x5c in BIG5-HKSCS as used in that locale,
0x5c being also \, so the second byte of α plus the following b is
taken as the \b escape and becomes 0x08)

Yash is probably the only shell that implements the POSIX spec as
POSIX likely intends it:

~$ LC_ALL=zh_HK SHELL=yash luit
yash$ printf 'αb' | hd
00000000  a3 5c 62                                          |.\b|
00000003
yash$ printf %5s 'αb' | hd
00000000  20 20 a3 5c 62                                    |  .\b|
00000005
yash$ printf %5b 'αb' | hd
00000000  20 20 a3 5c 62                                    |  .\b|
00000005

That is, bytes are decoded into characters so that those backslashes
are interpreted "correctly" (yash decodes everything, it's not
specific to printf¹), and then encoded back so that it behaves as if
they were passed to printf(3), as POSIX requires.

I've not verified it, but I've read somewhere that the C standard was
considering enhancing printf("%.3s") so that it doesn't break
characters in the middle (or maybe it's already the case?). So
printf '%.3s\n' Stéphane, where é is UTF-8 encoded in a locale using
UTF-8, would output "St" instead of "St<0xc3>".

My opinion would be:

- not change how %5s works in zsh. To me, zsh made an effort to fix
  that, and I don't expect anyone to be relying on the POSIX
  behaviour, which to me is a bug. One can always do

    printf() {
      # byte-based widths for this call only
      set -o localoptions +o multibyte
      builtin printf "$@"
    }

  if they want the POSIX behaviour.
- no need to fix the problems with backslashes in those messed-up
  multibyte encodings, as I'd expect they're being phased out.
- maybe implement ksh93's %Ls (zsh does have a ${(ml[5])param}
  alternative, though it does both padding and truncation).

---
¹ That approach is not tenable IMO, as it means yash can't cope with
  arbitrary file paths, arguments, or environment variables.

-- 
Stephane
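PS: a rough sketch, in portable sh, of what purely byte-wise
processing does to the two examples above. The byte values for α and
é are the ones quoted above; the hex helper and the variable names
are made up for illustration. Forcing LC_ALL=C is only a stand-in for
"no decoding into characters", not a claim about how any of the
shells is implemented.

```shell
LC_ALL=C
export LC_ALL

# Hex-dump stdin as one unbroken string of digits
# (od's column spacing varies between systems, so strip it).
hex() { od -An -tx1 | tr -d ' \t\n'; }

# 'αb' in BIG5-HKSCS: 0xa3 0x5c 0x62 (octal 243 134, then b).
ab=$(printf '\243\134b')
# "Stéphane" in UTF-8: é is 0xc3 0xa9 (octal 303 251).
name=$(printf 'St\303\251phane')

# %b scans bytes for escapes: 0x5c 0x62 reads as \b, giving 0x08.
escaped=$(printf '%b' "$ab" | hex)
# %s does no escape processing: the three bytes pass through.
plain=$(printf '%s' "$ab" | hex)
# %.3s keeps 3 bytes: S, t, and only the first half of é.
cut=$(printf '%.3s' "$name" | hex)

printf '%%b   -> %s\n%%s   -> %s\n%%.3s -> %s\n' "$escaped" "$plain" "$cut"
# %b   -> a308
# %s   -> a35c62
# %.3s -> 5374c3
```

The first line matches the a3 08 seen in the zsh transcript, and the
last one is the "St<0xc3>" case.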