From: "Jason C. Kwan" <jasonckwan@yahoo.com>
To: "zsh-workers@zsh.org" <zsh-workers@zsh.org>
Subject: bug report : printf %.1s outputting more than 1 character
Date: Wed, 15 Mar 2023 02:38:39 +0000 (UTC)
Message-ID: <1621619253.265114.1678847919086@mail.yahoo.com>
In-Reply-To: <1621619253.265114.1678847919086.ref@mail.yahoo.com>

I'm using the zsh that ships with macOS 13.2.1, version 5.8.1. I understand this isn't the latest release (5.9), so perhaps this bug has already been addressed.
For the 4-byte sequence shown below (defined via explicit octal escapes), printf %.1s should not print all 4 bytes under any Unicode interpretation:
 - The first byte, \377 (\xFF), is explicitly invalid in UTF-8 (even allowing up to 7-byte sequences, as in the oldest definitions).
 - The 4-byte value is too large to constitute a single character in UTF-32 of either endianness.
 - It is not a pair of UTF-16 surrogates for a beyond-BMP character either, regardless of endianness.
At best, if treated as UTF-16 of either endianness, this 4-byte sequence represents 2 code points, in which case only 2 bytes should be printed, not 4.
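The UTF-8 part of this argument can be checked mechanically. The following is a minimal sketch, not anything taken from zsh itself: the helper name utf8_byte_class is made up for illustration, and the ranges assume the current UTF-8 definition (RFC 3629), under which the old 5- and 6-byte lead bytes are simply invalid. The hex values are the four bytes of the test string as od reports them.

```shell
# Hypothetical helper: classify one byte (decimal 0-255) by its role in UTF-8.
utf8_byte_class() {
    b=$1
    if   [ "$b" -lt 128 ]; then echo ascii          # 0xxxxxxx
    elif [ "$b" -lt 192 ]; then echo continuation   # 10xxxxxx, cannot start a character
    elif [ "$b" -lt 194 ]; then echo invalid        # 0xC0, 0xC1: overlong encodings only
    elif [ "$b" -lt 224 ]; then echo lead2          # 110xxxxx
    elif [ "$b" -lt 240 ]; then echo lead3          # 1110xxxx
    elif [ "$b" -lt 245 ]; then echo lead4          # 0xF0..0xF4
    else                        echo invalid        # 0xF5..0xFF
    fi
}

# The test string \377\210\234\256 is ff 88 9c ae in hex.
for hex in ff 88 9c ae; do
    printf '0x%s -> %s\n' "$hex" "$(utf8_byte_class "$((0x$hex))")"
done
```

None of the four bytes can start a UTF-8 character: the first is an invalid lead byte and the other three are continuation bytes.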

My high-level understanding of printf %.1s is that it should output the first locale-valid character of the input string and, if there is none, the first byte instead (if any). Setting LC_ALL=C or POSIX would therefore defeat the purpose of this bug report.
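To make that rule concrete, here is a rough sketch of it for a UTF-8 locale. It is only an illustration of the behaviour described above, not zsh's actual logic: the function name expected_prefix_len is hypothetical, the input is passed as hex bytes to keep the example portable, and the check is simplified (it does not reject overlong or surrogate encodings that start with a valid lead byte).

```shell
# Hypothetical helper: given a string's bytes as hex arguments, print how many
# bytes %.1s would emit under the rule "first valid character, else first byte".
expected_prefix_len() {
    [ "$#" -gt 0 ] || { echo 0; return; }           # empty string: nothing to print
    first=$((0x$1)); shift
    if   [ "$first" -ge 194 ] && [ "$first" -lt 224 ]; then need=1
    elif [ "$first" -ge 224 ] && [ "$first" -lt 240 ]; then need=2
    elif [ "$first" -ge 240 ] && [ "$first" -lt 245 ]; then need=3
    else echo 1; return                             # ASCII, stray continuation, or invalid lead
    fi
    len=1
    while [ "$need" -gt 0 ]; do
        # Every following byte must be a continuation byte (0x80..0xBF).
        [ "$#" -gt 0 ] || { echo 1; return; }       # truncated sequence: fall back to 1 byte
        c=$((0x$1)); shift
        if [ "$c" -lt 128 ] || [ "$c" -ge 192 ]; then echo 1; return; fi
        len=$((len + 1)); need=$((need - 1))
    done
    echo "$len"
}

expected_prefix_len ff 88 9c ae   # the bytes from this report
expected_prefix_len e2 82 ac      # U+20AC EURO SIGN, a valid 3-byte character
```

Under this reading the test string yields 1 byte, while a valid multibyte character yields its full length.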
The reproducible shell command below shows the output of the zsh built-in printf, the macOS-provided printf, and the GNU printf, all else being equal. The testing shell was invoked via
    zsh --restricted --no-rcs --nologin --verbose -xtrace -f -c
In all 3 test scenarios, LC_ALL is explicitly cleared, while LANG is explicitly set to a widely used locale.
The od used is the macOS one, not the GNU one.
To the best of my knowledge, the other two printfs produce the correct output.
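For a quicker check than reading od dumps, the comparison can be reduced to byte counts. This is a sketch under a few assumptions: the variable names are made up, gprintf is left out because it may not be installed, and `command printf` only reaches the external printf in shells where `command` bypasses builtins (as in zsh); in other shells it may run the shell's own printf.

```shell
# Build the 4-byte test string from octal escapes.
s=$(printf '\377\210\234\256')

# Count the bytes emitted by %.1s, with the same locale settings as in this report.
n_builtin=$(LC_ALL= LANG=en_US.UTF-8 builtin printf '%.1s' "$s" | wc -c | tr -d ' ')
n_command=$(LC_ALL= LANG=en_US.UTF-8 command printf '%.1s' "$s" | wc -c | tr -d ' ')

echo "builtin printf: $n_builtin byte(s)"
echo "command printf: $n_command byte(s)"
```

In the od dumps of this report, the zsh builtin emitted 4 bytes between the brackets, while the other two printfs emitted 1.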
Thanks for your time.
    echo; echo "$ZSH_VERSION"; echo; uname -a; echo; LC_ALL= LANG="en_US.UTF-8" builtin printf '\n\n\t[%.1s]\n\n' $'\377\210\234\256' | od -bacx ;  echo; LC_ALL= LANG="en_US.UTF-8" command printf '\n\n\t[%.1s]\n\n' $'\377\210\234\256' | od -bacx ;  echo; LC_ALL= LANG="en_US.UTF-8" gprintf '\n\n\t[%.1s]\n\n' $'\377\210\234\256' | od -bacx ;  echo;

    +zsh:1> echo
    +zsh:1> echo> echo
    +zsh:1> uname -a
    Darwin m1mx4CT 22.3.0 Darwin Kernel Version 22.3.0: Mon Jan 30 20:38:37 PST 2023; root:xnu-8792.81.3~2/RELEASE_ARM64_T6000 arm64
    +zsh:1> echo
    +zsh:1> LC_ALL='' LANG=en_US.UTF-8
    +zsh:1> printf '\n\n\t[%.1s]\n\n' $'\M-\C-?\M-\C-H\M-\C-\\M-.'
    +zsh:1> od -bacx
    0000000   012 012 011 133 377 210 234 256 135 012 012
              nl  nl  ht   [   ?  88  9c   ?   ]  nl  nl
              \n  \n  \t   [ 377 210 234 256   ]  \n  \n
                 0a0a    5b09    88ff    ae9c    0a5d    000a
    0000013
    +zsh:1> echo
    +zsh:1> LC_ALL='' LANG=en_US.UTF-8 printf '\n\n\t[%.1s]\n\n' $'\M-\C-?\M-\C-H\M-\C-\\M-.'
    +zsh:1> od -bacx
    0000000   012 012 011 133 377 135 012 012
              nl  nl  ht   [   ?   ]  nl  nl
              \n  \n  \t   [ 377   ]  \n  \n
                 0a0a    5b09    5dff    0a0a
    0000010
    +zsh:1> echo
    +zsh:1> LC_ALL='' LANG=en_US.UTF-8 gprintf '\n\n\t[%.1s]\n\n' $'\M-\C-?\M-\C-H\M-\C-\\M-.'
    +zsh:1> od -bacx
    0000000   012 012 011 133 377 135 012 012
              nl  nl  ht   [   ?   ]  nl  nl
              \n  \n  \t   [ 377   ]  \n  \n
                 0a0a    5b09    5dff    0a0a
    0000010
    +zsh:1> echo
zsh 5.8.1 (x86_64-apple-darwin22.0)


