From: "Jason C. Kwan" <jasonckwan@yahoo.com>
To: "zsh-workers@zsh.org" <zsh-workers@zsh.org>
Subject: bug report : printf %.1s outputting more than 1 character
Date: Wed, 15 Mar 2023 02:38:39 +0000 (UTC)
Message-ID: <1621619253.265114.1678847919086@mail.yahoo.com>
In-Reply-To: <1621619253.265114.1678847919086.ref@mail.yahoo.com>
I'm using the zsh that ships with macOS 13.2.1, version 5.8.1. I understand that isn't the latest release (5.9), so perhaps this bug has already been addressed.
Given the 4-byte sequence below (written with explicit octal escapes), printf %.1s should never output all 4 bytes, under any Unicode interpretation:
- The first byte, \377 (0xFF), is invalid in UTF-8, even under the older RFC 2279 definition, which allowed sequences of up to 6 bytes.
- Read as a single 32-bit value, the sequence is too large to be a UTF-32 character in either byte order.
- It is not a UTF-16 surrogate pair (a character beyond the BMP) in either byte order.
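The first two points can be checked mechanically. Below is a minimal sketch that assumes a POSIX sh and an od accepting the POSIX `-A n -t u1` options, which both the macOS od and GNU od support. It treats 0x10FFFF as the largest Unicode code point.

```shell
#!/bin/sh
# Decimal value of each byte of the test string.
set -- $(printf '\377\210\234\256' | od -A n -t u1)
b1=$1 b2=$2 b3=$3 b4=$4

# UTF-8 (RFC 3629): a lead byte is 0x00-0x7F or 0xC2-0xF4.
# 0xFE and 0xFF were not lead bytes under RFC 2279 either.
if [ "$b1" -le 127 ] || { [ "$b1" -ge 194 ] && [ "$b1" -le 244 ]; }; then
  echo "UTF-8: byte $b1 can start a character"
else
  echo "UTF-8: byte $b1 cannot start a character"
fi

# UTF-32: read the 4 bytes as one big-endian and one little-endian value
# and compare each against the largest code point, 0x10FFFF (1114111).
be=$(( (b1 << 24) | (b2 << 16) | (b3 << 8) | b4 ))
le=$(( (b4 << 24) | (b3 << 16) | (b2 << 8) | b1 ))
for v in "$be" "$le"; do
  if [ $(( v > 1114111 )) -eq 1 ]; then
    echo "UTF-32: $v is above 0x10FFFF"
  else
    echo "UTF-32: $v is within the code point range"
  fi
done
```

The sketch reports that byte 255 cannot start a character. It also reports that both 32-bit readings (4287143086 and 2929494271) are above 0x10FFFF.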
At best, read as UTF-16 in either byte order, the sequence is 2 code points. In that case only 2 bytes should be printed, not 4.
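The third point and the UTF-16 reading can be sketched the same way, under the same assumptions. The sketch pairs the bytes into 16-bit code units in each byte order and checks whether any unit falls in the surrogate range 0xD800-0xDFFF. If no unit is a surrogate, each 2-byte unit is a code point on its own, so the first character is 2 bytes long. The helper `count_surrogates` is hypothetical, written only for this illustration.

```shell
#!/bin/sh
set -- $(printf '\377\210\234\256' | od -A n -t u1)
b1=$1 b2=$2 b3=$3 b4=$4

# Hypothetical helper: print how many of the given 16-bit code units are
# surrogates, i.e. fall in 0xD800-0xDFFF (55296-57343).
count_surrogates() {
  n=0
  for u in "$@"; do
    if [ "$u" -ge 55296 ] && [ "$u" -le 57343 ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Pair the bytes into two code units, in each byte order.
be1=$(( (b1 << 8) | b2 )) be2=$(( (b3 << 8) | b4 ))
le1=$(( (b2 << 8) | b1 )) le2=$(( (b4 << 8) | b3 ))
printf 'UTF-16BE: %04X %04X, surrogates: %s\n' "$be1" "$be2" "$(count_surrogates "$be1" "$be2")"
printf 'UTF-16LE: %04X %04X, surrogates: %s\n' "$le1" "$le2" "$(count_surrogates "$le1" "$le2")"
```

The units come out as FF88 9CAE (big-endian) and 88FF AE9C (little-endian), and none of them is a surrogate.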
My understanding is that printf %.1s should output the first character of the argument that is valid in the current locale or, failing that, the first byte, if any. In the C or POSIX locale every byte is a character, so setting LC_ALL=C or LC_ALL=POSIX would hide the very behavior this report is about.
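The rule above can be illustrated for UTF-8 only. The sketch below defines two hypothetical helpers, `first_char_len` and `first_char`, which are not part of zsh or any printf. They compute and print what %.1s would emit under that rule. The sketch is simplified: it checks lead-byte and continuation-byte ranges, but it does not reject every ill-formed sequence. For example, it accepts overlong 3- and 4-byte forms, encoded surrogates, and values above 0x10FFFF.

```shell
#!/bin/sh
# Hypothetical helper: number of bytes in the first character of "$1",
# assuming UTF-8; 1 if the string does not start with a (roughly)
# well-formed sequence, 0 if it is empty.
first_char_len() {
  set -- $(printf '%s' "$1" | od -A n -t u1)
  if [ "$#" -eq 0 ]; then echo 0; return; fi
  lead=$1
  if [ "$lead" -le 127 ]; then need=1
  elif [ "$lead" -ge 194 ] && [ "$lead" -le 223 ]; then need=2
  elif [ "$lead" -ge 224 ] && [ "$lead" -le 239 ]; then need=3
  elif [ "$lead" -ge 240 ] && [ "$lead" -le 244 ]; then need=4
  else echo 1; return    # invalid lead byte: fall back to one byte
  fi
  shift
  k=1
  while [ "$k" -lt "$need" ]; do
    # Each continuation byte must be 0x80-0xBF (128-191).
    if [ "$#" -eq 0 ] || [ "$1" -lt 128 ] || [ "$1" -gt 191 ]; then
      echo 1; return     # truncated or broken sequence: one byte
    fi
    shift
    k=$((k + 1))
  done
  echo "$need"
}

# Hypothetical helper: print those bytes, as %.1s would under this rule.
first_char() {
  printf '%s' "$1" | dd bs=1 count="$(first_char_len "$1")" 2>/dev/null
}
```

For the test string, `first_char_len "$(printf '\377\210\234\256')"` gives 1, because 0xFF cannot start a character. For `"$(printf '\303\251x')"`, which begins with U+00E9 encoded as two bytes, it gives 2.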
The reproducible command below shows the output of the zsh builtin printf, the macOS printf, and GNU printf (installed as gprintf), with everything else kept equal. The test shell was invoked via
zsh --restricted --no-rcs --nologin --verbose -xtrace -f -c
In all 3 tests, LC_ALL is explicitly set to the empty string, while LANG is set to a widely used value, en_US.UTF-8.
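The next sketch shows the POSIX precedence rule that makes this setup meaningful. A set but empty LC_ALL is ignored, so LANG determines the character encoding unless LC_CTYPE is set. The function `effective_lc_ctype` is hypothetical. Its final fallback to C is an assumption, because POSIX leaves the default implementation-defined.

```shell
#!/bin/sh
# Hypothetical helper: the locale name that governs character encoding
# (the LC_CTYPE category), following the POSIX precedence order.
effective_lc_ctype() {
  if [ -n "${LC_ALL:-}" ]; then
    echo "$LC_ALL"          # a non-empty LC_ALL overrides everything
  elif [ -n "${LC_CTYPE:-}" ]; then
    echo "$LC_CTYPE"
  elif [ -n "${LANG:-}" ]; then
    echo "$LANG"
  else
    echo C                  # assumption: the real default is implementation-defined
  fi
}

# An empty LC_ALL is ignored, so LANG wins here.
(LC_ALL= LC_CTYPE= LANG=en_US.UTF-8 effective_lc_ctype)
```

Because only the empty LC_ALL and LANG are involved, every test above runs with an en_US.UTF-8 LC_CTYPE, provided LC_CTYPE itself is unset.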
The od used is the macOS one, not the GNU one.
To the best of my knowledge, the other two printfs produced the correct output.
Thanks for your time.
====================================================================
echo; echo "$ZSH_VERSION"; echo; uname -a; echo; LC_ALL= LANG="en_US.UTF-8" builtin printf '\n\n\t[%.1s]\n\n' $'\377\210\234\256' | od -bacx ; echo; LC_ALL= LANG="en_US.UTF-8" command printf '\n\n\t[%.1s]\n\n' $'\377\210\234\256' | od -bacx ; echo; LC_ALL= LANG="en_US.UTF-8" gprintf '\n\n\t[%.1s]\n\n' $'\377\210\234\256' | od -bacx ; echo;
+zsh:1> echo

+zsh:1> echo 5.8.1
5.8.1
+zsh:1> echo

+zsh:1> uname -a
Darwin m1mx4CT 22.3.0 Darwin Kernel Version 22.3.0: Mon Jan 30 20:38:37 PST 2023; root:xnu-8792.81.3~2/RELEASE_ARM64_T6000 arm64
+zsh:1> echo

+zsh:1> LC_ALL='' LANG=en_US.UTF-8 
+zsh:1> printf '\n\n\t[%.1s]\n\n' $'\M-\C-?\M-\C-H\M-\C-\\M-.'
+zsh:1> od -bacx
0000000  012 012 011 133 377 210 234 256 135 012 012
          nl  nl  ht   [   ?  88  9c   ?   ]  nl  nl
          \n  \n  \t   [ 377 210 234 256   ]  \n  \n
            0a0a    5b09    88ff    ae9c    0a5d    000a
0000013
+zsh:1> echo

+zsh:1> LC_ALL='' LANG=en_US.UTF-8 printf '\n\n\t[%.1s]\n\n' $'\M-\C-?\M-\C-H\M-\C-\\M-.'
+zsh:1> od -bacx
0000000  012 012 011 133 377 135 012 012
          nl  nl  ht   [   ?   ]  nl  nl
          \n  \n  \t   [ 377   ]  \n  \n
            0a0a    5b09    5dff    0a0a
0000010
+zsh:1> echo

+zsh:1> LC_ALL='' LANG=en_US.UTF-8 gprintf '\n\n\t[%.1s]\n\n' $'\M-\C-?\M-\C-H\M-\C-\\M-.'
+zsh:1> od -bacx
0000000  012 012 011 133 377 135 012 012
          nl  nl  ht   [   ?   ]  nl  nl
          \n  \n  \t   [ 377   ]  \n  \n
            0a0a    5b09    5dff    0a0a
0000010
+zsh:1> echo

zsh 5.8.1 (x86_64-apple-darwin22.0)
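The od results can be cross-checked. The last offset od prints is the total byte count in octal, so 0000013 means 11 bytes and 0000010 means 8 bytes. The sketch below assumes a POSIX sh. It rebuilds both outputs from the dumps and counts their bytes; the 3 extra bytes in the zsh builtin's output are \210\234\256.

```shell
#!/bin/sh
# Rebuild both outputs from the od dumps and count their bytes.
# tr strips the padding that macOS wc adds to its count.
expected=$(printf '\n\n\t[\377]\n\n' | wc -c | tr -d ' ')
observed=$(printf '\n\n\t[\377\210\234\256]\n\n' | wc -c | tr -d ' ')
printf 'macOS/GNU printf: %d bytes (octal %o)\n' "$expected" "$expected"
printf 'zsh builtin:      %d bytes (octal %o)\n' "$observed" "$observed"
```

The sketch prints 8 bytes (octal 10) and 11 bytes (octal 13), matching the final od offsets.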