From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <38f2432dcd7006355592f6e0e7dc15e420d3d7f3.camel@ntlworld.com>
Subject: Re: bug report : printf %.1s outputting more than 1 character
From: Peter Stephenson
To: zsh-workers@zsh.org
Date: Sat, 18 Mar 2023 16:56:37 +0000
In-Reply-To:
References: <1621619253.265114.1678847919086.ref@mail.yahoo.com>
<1621619253.265114.1678847919086@mail.yahoo.com>
<478761809.298180.1678856216911@mail.yahoo.com>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.28.5-0ubuntu0.18.04.2
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Seq: 51586
Archived-At:
X-Loop: zsh-workers@zsh.org
Errors-To: zsh-workers-owner@zsh.org
Precedence: list
Precedence: bulk
Sender: zsh-workers-request@zsh.org
X-no-archive: yes

On Wed, 2023-03-15 at 08:31 -0700, Bart Schaefer wrote:
> On Tue, Mar 14, 2023 at 9:56 PM Jason C. Kwan wrote:
> >
> > does the following ( below the "====" line ) behavior look even
> > reasonable at all, regardless of your spec ? Because what the spec ends
> > up doing is treating the rest of the input string as 1 byte and printing
> > everything out, even though there are valid code points further down the
> > input string.
>
> I'm not the resident expert on multibyte character sets, so I'm just
> reporting the situation and waiting for e.g. PWS to respond. However,
> as far as my understanding of the multibyte library goes, once you've
> "desynchronized" the input by encountering an invalid byte, you're not
> guaranteed that anything further that you see can be correctly
> interpreted as a code point. I agree that it's not ideal to just dump
> everything else "raw".

Elsewhere, we mostly treat invalid codes as if they're single octets, so
this is a bit inconsistent. I think the current behaviour is really just
an attempt to avoid overcomplicating %s output. However, it would
probably be more consistent to treat everything that doesn't make sense
as single bytes until we get back on track. There doesn't seem to be any
point in doing anything different with incomplete characters here,
either --- we've already got all the characters we're going to get.
Something like this, but feel free to tweak further --- I don't have any
motivation to do so myself.

This is probably good enough for the obvious simple case of "just
output the next thing you see whatever the heck it looks like".

pws

diff --git a/Src/builtin.c b/Src/builtin.c
index 70a950666..9719d26d1 100644
--- a/Src/builtin.c
+++ b/Src/builtin.c
@@ -5222,20 +5222,21 @@ bin_print(char *name, char **args, Options ops, int func)
#ifdef MULTIBYTE_SUPPORT
if (isset(MULTIBYTE)) {
chars = mbrlen(ptr, lleft, &mbs);
- if (chars < 0) {
- /*
- * Invalid/incomplete character at this
- * point. Assume all the rest are a
- * single byte. That's about the best we
- * can do.
- */
- lchars += lleft;
- lbytes = (ptr - b) + lleft;
- break;
- } else if (chars == 0) {
- /* NUL, handle as real character */
+ /*
+ * chars <= 0 means one of
+ *
+ * 0: NUL, handle as real character
+ *
+ * -1: MB_INVALID: Assume this is
+ * a single character as we do
+ * elsewhere in the code.
+ *
+ * -2: MB_INCOMPLETE: We're not waiting
+ * for input on this occasion, so
+ * just treat this as invalid.
+ */
+ if (chars <= 0)
chars = 1;
- }
}
else /* use the non-multibyte code below */
#endif
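
For anyone who wants to try the idea outside the shell, here is a minimal
standalone sketch of the same rule. It is not the zsh code: prefix_bytes()
is a hypothetical helper invented for illustration, and resetting the
conversion state after an error is an assumption of this sketch rather
than something the patch does. It relies only on the standard mbrlen()
return values: 0 for NUL, (size_t)-1 for an invalid sequence and
(size_t)-2 for an incomplete one.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not part of zsh): return how many bytes of
 * s[0..len) are covered by the first maxchars characters, in the
 * spirit of a "%.Ns" precision.  NUL, invalid and incomplete
 * sequences each count as one single-byte character.
 */
static size_t
prefix_bytes(const char *s, size_t len, size_t maxchars)
{
    mbstate_t mbs;
    size_t used = 0, nchars = 0;

    memset(&mbs, 0, sizeof mbs);
    while (used < len && nchars < maxchars) {
        size_t n = mbrlen(s + used, len - used, &mbs);

        if (n == 0 || n == (size_t)-1 || n == (size_t)-2) {
            /* NUL, invalid or incomplete: consume exactly one byte */
            n = 1;
            /* Assumption of this sketch: start from a clean state again */
            memset(&mbs, 0, sizeof mbs);
        }
        used += n;
        nchars++;
    }
    return used;
}
```

With this rule, a precision of 1 applied to a string that starts with a
stray byte consumes only that byte, and the valid characters after it are
still counted one at a time instead of being dumped along with it.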