zsh-workers
From: Peter Stephenson <p.w.stephenson@ntlworld.com>
To: zsh-workers@zsh.org
Subject: Re: bug report : printf %.1s outputting more than 1 character
Date: Sat, 18 Mar 2023 16:56:37 +0000
Message-ID: <38f2432dcd7006355592f6e0e7dc15e420d3d7f3.camel@ntlworld.com>
In-Reply-To: <CAH+w=7ZsHMF_x=O3iE_dqtyaV7FpVK7Sgurf2=5=hbC=fSCLdw@mail.gmail.com>

On Wed, 2023-03-15 at 08:31 -0700, Bart Schaefer wrote:
> On Tue, Mar 14, 2023 at 9:56 PM Jason C. Kwan <jasonckwan@yahoo.com> wrote:
> >
> > does the following ( below the "====" line ) behavior look even
> > reasonable at all, regardless of your spec ? Because what the spec ends
> > up doing is treating the rest of the input string as 1 byte and printing
> > everything out, even though there are valid code points further down the
> > input string.
> 
> I'm not the resident expert on multibyte character sets, so I'm just
> reporting the situation and waiting for e.g. PWS to respond.  However,
> as far as my understanding of the multibyte library goes, once you've
> "desynchronized" the input by encountering an invalid byte, you're not
> guaranteed that anything further that you see can be correctly
> interpreted as a code point.  I agree that it's not ideal to just dump
> everything else "raw".

Elsewhere, we mostly treat invalid codes as if they're single octets, so
this is a bit inconsistent.  I think the current behaviour is really just
an attempt to avoid overcomplicating %s output.  However, it would
probably be more consistent just to treat everything that doesn't make
sense as single bytes until we get back on track.  There doesn't seem to
be any point in doing anything different with incomplete characters here,
either: we've already got all the characters we're going to get.
Something like the patch below, but feel free to tweak further; I don't
have any motivation to do so myself.

This is probably good enough for the obvious simple case of "just
output the next thing you see whatever the heck it looks like".

pws

diff --git a/Src/builtin.c b/Src/builtin.c
index 70a950666..9719d26d1 100644
--- a/Src/builtin.c
+++ b/Src/builtin.c
@@ -5222,20 +5222,21 @@ bin_print(char *name, char **args, Options ops, int func)
 #ifdef MULTIBYTE_SUPPORT
 			if (isset(MULTIBYTE)) {
 			    chars = mbrlen(ptr, lleft, &mbs);
-			    if (chars < 0) {
-				/*
-				 * Invalid/incomplete character at this
-				 * point.  Assume all the rest are a
-				 * single byte.  That's about the best we
-				 * can do.
-				 */
-				lchars += lleft;
-				lbytes = (ptr - b) + lleft;
-				break;
-			    } else if (chars == 0) {
-				/* NUL, handle as real character */
+			    /*
+			     * chars <= 0 means one of
+			     *
+			     * 0: NUL, handle as real character
+			     *
+			     * -1: MB_INVALID: Assume this is
+			     *     a single character as we do
+			     *     elsewhere in the code.
+			     *
+			     * -2: MB_INCOMPLETE: We're not waiting
+			     *     for input on this occasion, so
+			     *     just treat this as invalid.
+			     */
+			    if (chars <= 0)
 				chars = 1;
-			    }
 			}
 			else	/* use the non-multibyte code below */
 #endif
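
For illustration only, the behaviour the patch aims for can be sketched
outside zsh as a standalone function.  This is a minimal sketch, not the
code in Src/builtin.c: the name prefix_bytes() and its parameters are
hypothetical, and resetting the mbstate_t after a bad sequence is an
assumption of the sketch rather than something the patch does.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper, not part of zsh: return how many bytes of
 * s[0..len) are covered by at most maxchars characters in the
 * current locale.  A NUL, an invalid sequence and an incomplete
 * sequence each count as one single-byte character.
 */
static size_t
prefix_bytes(const char *s, size_t len, size_t maxchars)
{
    mbstate_t mbs;
    size_t used = 0, nchars = 0;

    memset(&mbs, 0, sizeof(mbs));
    while (used < len && nchars < maxchars) {
        size_t n = mbrlen(s + used, len - used, &mbs);

        if (n == 0 || n == (size_t)-1 || n == (size_t)-2) {
            /*
             * 0: NUL, (size_t)-1: invalid, (size_t)-2: incomplete.
             * Consume exactly one byte and carry on with the rest.
             */
            n = 1;
            /*
             * Assumption of this sketch: start from a clean shift
             * state, since the state is unspecified after an error.
             */
            memset(&mbs, 0, sizeof(mbs));
        }
        used += n;
        nchars++;
    }
    return used;
}
```

With a precision of 1 and an input that starts with a byte that is not
valid in the current locale, this selects only that first byte rather
than the whole remainder of the string, which is the old behaviour the
bug report complained about.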



