Message-ID: <38f2432dcd7006355592f6e0e7dc15e420d3d7f3.camel@ntlworld.com>
Subject: Re: bug report : printf %.1s outputting more than 1 character
From: Peter Stephenson
To: zsh-workers@zsh.org
Date: Sat, 18 Mar 2023 16:56:37 +0000
References: <1621619253.265114.1678847919086.ref@mail.yahoo.com>
 <1621619253.265114.1678847919086@mail.yahoo.com>
 <478761809.298180.1678856216911@mail.yahoo.com>
X-Seq: 51586

On Wed, 2023-03-15 at 08:31 -0700, Bart Schaefer wrote:
> On Tue, Mar 14, 2023 at 9:56 PM Jason C. Kwan wrote:
> >
>> does the following ( below the "====" line ) behavior look even
>> reasonable at all, regardless of your spec ?  Because what the spec ends
>> up doing is treating the rest of the input string as 1 byte and printing
>> everything out, even though there are valid code points further down the
>> input string.
>
> I'm not the resident expert on multibyte character sets, so I'm just
> reporting the situation and waiting for e.g. PWS to respond.  However,
> as far as my understanding of the multibyte library goes, once you've
> "desynchronized" the input by encountering an invalid byte, you're not
> guaranteed that anything further that you see can be correctly
> interpreted as a code point.  I agree that it's not ideal to just dump
> everything else "raw".

Elsewhere, we mostly treat invalid codes as if they're single octets, so
this is a bit inconsistent.  I think it's really just to try to avoid
overcomplicating %s output.  However, it would probably be more
consistent just to treat everything that doesn't make sense as single
bytes until we get back on track.

There doesn't seem to be any point in doing anything different with
incomplete characters here, either --- we've already got all the
characters we're going to get.

Something like this, but feel free to tweak further --- I don't have any
motivation to do so myself.  This is probably good enough for the
obvious simple case of "just output the next thing you see whatever the
heck it looks like".

pws

diff --git a/Src/builtin.c b/Src/builtin.c
index 70a950666..9719d26d1 100644
--- a/Src/builtin.c
+++ b/Src/builtin.c
@@ -5222,20 +5222,21 @@ bin_print(char *name, char **args, Options ops, int func)
 #ifdef MULTIBYTE_SUPPORT
 			    if (isset(MULTIBYTE)) {
 				chars = mbrlen(ptr, lleft, &mbs);
-				if (chars < 0) {
-				    /*
-				     * Invalid/incomplete character at this
-				     * point.  Assume all the rest are a
-				     * single byte.  That's about the best we
-				     * can do.
-				     */
-				    lchars += lleft;
-				    lbytes = (ptr - b) + lleft;
-				    break;
-				} else if (chars == 0) {
-				    /* NUL, handle as real character */
+				/*
+				 * chars <= 0 means one of
+				 *
+				 * 0: NUL, handle as real character
+				 *
+				 * -1: MB_INVALID: Assume this is
+				 * a single character as we do
+				 * elsewhere in the code.
+				 *
+				 * -2: MB_INCOMPLETE: We're not waiting
+				 * for input on this occasion, so
+				 * just treat this as invalid.
+				 */
+				if (chars <= 0)
 				    chars = 1;
-				}
 			    } else	/* use the non-multibyte code below */
 #endif
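
For anyone who wants to try the idea outside the shell, here is a
minimal standalone sketch of the same rule: walk the string with
mbrlen() and count a NUL, an invalid byte or an incomplete sequence as
one single-byte character.  It is not zsh code.  The helper name
prefix_bytes and its parameters are made up for illustration, and
resetting the mbstate_t after a bad sequence is an assumption of the
sketch rather than something the patch above does.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper, not zsh code: return how many bytes of the
 * len-byte buffer s are covered by its first maxchars characters.
 * A NUL, an invalid sequence or an incomplete sequence each count
 * as one single-byte character, mirroring "if (chars <= 0) chars = 1".
 */
size_t
prefix_bytes(const char *s, size_t len, size_t maxchars)
{
    mbstate_t mbs;
    size_t used = 0;

    assert(s != NULL);
    memset(&mbs, 0, sizeof(mbs));
    for (; maxchars > 0 && used < len; maxchars--) {
        size_t n = mbrlen(s + used, len - used, &mbs);

        if (n == 0 || n == (size_t)-1 || n == (size_t)-2) {
            /*
             * Assumption of this sketch: reset the conversion state
             * so the next call starts cleanly, since the state is
             * not reliable after an error.
             */
            memset(&mbs, 0, sizeof(mbs));
            n = 1;
        }
        used += n;
    }
    return used;
}
```

After setlocale(LC_ALL, "") in a UTF-8 locale, asking for one character
of a string that starts with a two-byte sequence should cover both
bytes, while a stray 0xff at the front should cover only that byte and
leave the rest of the string to be decoded normally.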