zsh-workers
From: "Jason C. Kwan" <jasonckwan@yahoo.com>
To: "zsh-workers@zsh.org" <zsh-workers@zsh.org>
Subject: bug report : printf %.1s outputting more than 1 character
Date: Wed, 15 Mar 2023 02:38:39 +0000 (UTC)
Message-ID: <1621619253.265114.1678847919086@mail.yahoo.com>
In-Reply-To: <1621619253.265114.1678847919086.ref@mail.yahoo.com>


I'm using the zsh that ships with macOS 13.2.1, version 5.8.1. I understand this isn't the latest release (5.9), so perhaps this bug has already been addressed.
For the 4-byte sequence shown below (defined via explicit octal escapes), printf %.1s should not print all 4 bytes under any Unicode encoding:
 - The first byte, \377 (0xFF), is explicitly invalid under UTF-8 (even allowing up to 7-byte sequences, as in the oldest definitions).
 - The 4-byte value is too large to constitute a single character in UTF-32 of either endianness.
 - It is not a pair of beyond-BMP UTF-16 surrogates either, regardless of endianness.
At best, if treated as UTF-16 of either endianness, this 4-byte sequence represents 2 code points, in which case only 2 bytes should be printed, not 4.
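
The three encoding claims above can be checked with plain shell arithmetic. This is only a sketch: the helper names (utf8_lead_ok, utf32_ok, is_surrogate) are made up for illustration and are not part of zsh or any printf implementation, and the UTF-8 test looks at the lead byte only.

```shell
# Bytes from the report: octal 377 210 234 256 is hex FF 88 9C AE.
b1=0xFF; b2=0x88; b3=0x9C; b4=0xAE

# UTF-8: a valid lead byte is 0x00-0x7F or 0xC2-0xF4 (hypothetical helper).
utf8_lead_ok() {
    v=$(( $1 ))
    [ "$v" -le 127 ] || { [ "$v" -ge 194 ] && [ "$v" -le 244 ]; }
}

# UTF-32: a code point may not exceed 0x10FFFF (1114111).
utf32_ok() {
    [ "$(( $1 ))" -le 1114111 ]
}

# UTF-16: surrogate code units lie in 0xD800-0xDFFF (55296-57343).
is_surrogate() {
    v=$(( $1 ))
    [ "$v" -ge 55296 ] && [ "$v" -le 57343 ]
}

# Assemble the candidate 32-bit and 16-bit values in both byte orders.
be32=$(( ($b1 << 24) | ($b2 << 16) | ($b3 << 8) | $b4 ))
le32=$(( ($b4 << 24) | ($b3 << 16) | ($b2 << 8) | $b1 ))
be16_hi=$(( ($b1 << 8) | $b2 )); be16_lo=$(( ($b3 << 8) | $b4 ))
le16_hi=$(( ($b2 << 8) | $b1 )); le16_lo=$(( ($b4 << 8) | $b3 ))

utf8_lead_ok "$b1" && echo "UTF-8 lead: ok" || echo "UTF-8 lead: invalid"
for val in "$be32" "$le32"; do
    utf32_ok "$val" && echo "UTF-32 $val: ok" || echo "UTF-32 $val: out of range"
done
for val in "$be16_hi" "$be16_lo" "$le16_hi" "$le16_lo"; do
    is_surrogate "$val" && echo "UTF-16 $val: surrogate" || echo "UTF-16 $val: not a surrogate"
done
```

None of the four 16-bit units falls in the surrogate range, which is why the UTF-16 reading yields two separate BMP code points rather than one supplementary character.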

My high-level understanding of printf %.1s is that it should output the first locale-valid character of the input string and, in its absence, the first byte instead, if any. Setting LC_ALL=C or POSIX would therefore defeat the purpose of this bug report.
The reproducible shell command below shows the output of the zsh built-in printf, the macOS printf, and the GNU printf (gprintf), all else being equal. The testing shell was invoked via
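
As a rough illustration of that reading of %.1s (not how zsh or any other printf is actually implemented), the sketch below computes how many bytes would be emitted for a UTF-8 locale. The function name first_unit_len is hypothetical, bytes are passed as numbers, and the check is simplified: it validates the lead byte and the count of continuation bytes but not overlong forms or encoded surrogates.

```shell
# Print the number of bytes '%.1s' would emit under the reading above:
# the length of a valid leading UTF-8 sequence, else 1 for a stray byte.
first_unit_len() {
    [ "$#" -gt 0 ] || { echo 0; return; }
    lead=$(( $1 )); shift
    if   [ "$lead" -le 127 ]; then need=0
    elif [ "$lead" -ge 194 ] && [ "$lead" -le 223 ]; then need=1
    elif [ "$lead" -ge 224 ] && [ "$lead" -le 239 ]; then need=2
    elif [ "$lead" -ge 240 ] && [ "$lead" -le 244 ]; then need=3
    else
        # Invalid lead byte: fall back to a single byte.
        echo 1; return
    fi
    n=0
    while [ "$n" -lt "$need" ]; do
        # Each continuation byte must exist and lie in 0x80-0xBF.
        [ "$#" -gt 0 ] || { echo 1; return; }
        c=$(( $1 )); shift
        { [ "$c" -ge 128 ] && [ "$c" -le 191 ]; } || { echo 1; return; }
        n=$(( n + 1 ))
    done
    echo $(( need + 1 ))
}

first_unit_len 0xFF 0x88 0x9C 0xAE   # the bytes from this report
first_unit_len 0xE2 0x82 0xAC        # U+20AC EURO SIGN
```

For the bytes in this report the sketch falls back to a single byte, because 0xFF cannot start a UTF-8 sequence.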
    zsh --restricted --no-rcs --nologin --verbose -xtrace -f -c
In all 3 test scenarios, LC_ALL is explicitly cleared, while LANG is explicitly set to a widely used locale, en_US.UTF-8.
The od used is the macOS one, not the GNU one.
To the best of my knowledge, the other two printfs have produced the correct output.
Thanks for your time.
====================================================================
    echo; echo "$ZSH_VERSION"; echo; uname -a; echo; LC_ALL= LANG="en_US.UTF-8" builtin printf '\n\n\t[%.1s]\n\n' $'\377\210\234\256' | od -bacx ;  echo; LC_ALL= LANG="en_US.UTF-8" command printf '\n\n\t[%.1s]\n\n' $'\377\210\234\256' | od -bacx ;  echo; LC_ALL= LANG="en_US.UTF-8" gprintf '\n\n\t[%.1s]\n\n' $'\377\210\234\256' | od -bacx ;  echo;

    +zsh:1> echo
    +zsh:1> echo 5.8.1
    5.8.1
    +zsh:1> echo
    +zsh:1> uname -a
    Darwin m1mx4CT 22.3.0 Darwin Kernel Version 22.3.0: Mon Jan 30 20:38:37 PST 2023; root:xnu-8792.81.3~2/RELEASE_ARM64_T6000 arm64
    +zsh:1> echo
    +zsh:1> LC_ALL='' LANG=en_US.UTF-8 +zsh:1> printf '\n\n\t[%.1s]\n\n' $'\M-\C-?\M-\C-H\M-\C-\\M-.'
    +zsh:1> od -bacx
    0000000   012 012 011 133 377 210 234 256 135 012 012
              nl  nl  ht   [   ?  88  9c   ?   ]  nl  nl
              \n  \n  \t   [ 377 210 234 256   ]  \n  \n
                 0a0a    5b09    88ff    ae9c    0a5d    000a
    0000013
    +zsh:1> echo
    +zsh:1> LC_ALL='' LANG=en_US.UTF-8 printf '\n\n\t[%.1s]\n\n' $'\M-\C-?\M-\C-H\M-\C-\\M-.'
    +zsh:1> od -bacx
    0000000   012 012 011 133 377 135 012 012
              nl  nl  ht   [   ?   ]  nl  nl
              \n  \n  \t   [ 377   ]  \n  \n
                 0a0a    5b09    5dff    0a0a
    0000010
    +zsh:1> echo
    +zsh:1> LC_ALL='' LANG=en_US.UTF-8 gprintf '\n\n\t[%.1s]\n\n' $'\M-\C-?\M-\C-H\M-\C-\\M-.'
    +zsh:1> od -bacx
    0000000   012 012 011 133 377 135 012 012
              nl  nl  ht   [   ?   ]  nl  nl
              \n  \n  \t   [ 377   ]  \n  \n
                 0a0a    5b09    5dff    0a0a
    0000010
    +zsh:1> echo
zsh 5.8.1 (x86_64-apple-darwin22.0)
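
A shorter probe of the same behaviour, offered as a sketch only: it counts the bytes that a given printf emits for '[%.1s]' with the four bytes from this report. The function name probe is made up for illustration. Its arguments are the printf command to test, so `probe printf` exercises the current shell's own printf, and `probe command printf` or `probe gprintf` exercise the others where they exist.

```shell
# Count the bytes emitted for '[%.1s]' by the printf given as "$@".
probe() {
    # The inner printf only builds the 4-byte argument FF 88 9C AE.
    "$@" '[%.1s]' "$(printf '\377\210\234\256')" | wc -c | tr -d ' '
}

# Same locale setup as in the report; the result depends on the shell used.
LC_ALL= LANG=en_US.UTF-8 probe printf
```

A count of 3 means one byte was printed between the brackets, matching the macOS and GNU printf output above. A count of 6 corresponds to the zsh 5.8.1 builtin output above, where all four bytes appear.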


Thread overview: 6+ messages
     [not found] <1621619253.265114.1678847919086.ref@mail.yahoo.com>
2023-03-15  2:38 ` Jason C. Kwan [this message]
2023-03-15  3:46   ` Bart Schaefer
2023-03-15  4:56     ` Jason C. Kwan
2023-03-15 15:31       ` Bart Schaefer
2023-03-15 15:50         ` Roman Perepelitsa
2023-03-18 16:56         ` Peter Stephenson
