From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-3.4 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FROM,HTML_MESSAGE,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 5429 invoked from network); 15 Mar 2023 04:57:42 -0000 Received: from zero.zsh.org (2a02:898:31:0:48:4558:7a:7368) by inbox.vuxu.org with ESMTPUTF8; 15 Mar 2023 04:57:42 -0000 ARC-Seal: i=1; cv=none; a=rsa-sha256; d=zsh.org; s=rsa-20210803; t=1678856262; b=VS86RhtrXEdFD77E99oPDwyfWpgKA7RlWQNmlh7BPv+PXl2EPndU91JRg5Ag3oigukVW+C96/s TXkKg0TBgq49S16FNfJLReAdgAQcnjbW4KRUcwr39fSq9W5UPizbKqCrZYCDy6QhxcbjfAwY9h 7umcxwGTs9qEgFdbCj2lX85UKGSHd8re0z+9TC3j+TvuH3m09nFnzyYjFHgcw2vbxW4H4Yh3xh 5mmaWs/MAO/7lzUrxSi8VpfOSKooHj3BV4R3KKzhWdNMVnynK83aR6dHMYiuCparjFXdb+lsb3 1a5Fsoij/hqFX2ltvCwqcnGFcXH4bsJKqsjrLTeLn7I+bg==; ARC-Authentication-Results: i=1; zsh.org; iprev=pass (sonic303-20.consmr.mail.ne1.yahoo.com) smtp.remote-ip=66.163.188.146; dkim=pass header.d=yahoo.com header.s=s2048 header.a=rsa-sha256; dmarc=pass header.from=yahoo.com; arc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed; d=zsh.org; s=rsa-20210803; t=1678856262; bh=a53V4kVqON6vWgMf0IwXDpiBLLcYlV+VfMWx72lrkY0=; h=List-Archive:List-Owner:List-Post:List-Unsubscribe:List-Subscribe:List-Help: List-Id:Sender:Content-Type:MIME-Version:Subject:References:In-Reply-To: Message-ID:Cc:To:From:Date:DKIM-Signature:DKIM-Signature; b=ZSHNSoYH9QM0t+bG5McluPizdDqOnkt9e/XBAO6Du3pVJnThrLUPa3k7O9rOXEDYRxEgH7D/V2 evUbvgqfzg9DmJgqCoeaVWOKMGoGN71vriWaOcxHlYgzLgXuRVcalMK20r2T3JJedIVJBcD4eK kyvPeZHl/0fmqm9lRJGWjFTi48NvrOTQusdH9n6PRulbO8foK3gEu+IblUVu/1As0KBVsldgRo rdWQhPAgR+ZfpOIuIszmBwa1QTgCuYQTOIRF+n3I1B5OBXkETTlxvUQiPRbsy7w7zdJ9o6CW4d m5KK0EvbtIAzIDSLPPJSBsVNVpji5qSSovpXp/9Yf38bTQ==; DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=zsh.org; s=rsa-20210803; 
h=List-Archive:List-Owner:List-Post:List-Unsubscribe: List-Subscribe:List-Help:List-Id:Sender:Content-Type:MIME-Version:Subject: References:In-Reply-To:Message-ID:Cc:To:From:Date:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID; bh=38epYAYIFLFBlxc8cGcI2vTe9k+rswOjOPV6dlkNJ/k=; b=Ksvl/mm8mmqMdDnYlpU/PwpeHO 6zEUUVThQbrqrMElWbO1xK9xc7mEiOyfsilzSlZV71stXs7d9/R3nh5IA8C1Sa+bu3Zn1nbo9oLgH lKxnv4EWWpCIwFZT04Ay5fkXF/jdd9lH1MCLNjAGgTAS5qC1VIjo2r7VysB0SDNDvLU6MZ1NgteVg xshzxfWQ8WzBo1Qwt+S7W7Y5RaTPzrtQYNAuSYBviOkqxqes6FrtZOVr3A5+MQHC4PHuOnoEizp6F w0KJpaQ5LRFl38stJgUdUY3p2KP9TJbqozxvQm/aqhhDZyzKkZ7VkhQRPEAjc1xgErb2cpBw2/NYQ ShkHHCnw==; Received: by zero.zsh.org with local id 1pcJCn-000Et4-Bo; Wed, 15 Mar 2023 04:57:41 +0000 Authentication-Results: zsh.org; iprev=pass (sonic303-20.consmr.mail.ne1.yahoo.com) smtp.remote-ip=66.163.188.146; dkim=pass header.d=yahoo.com header.s=s2048 header.a=rsa-sha256; dmarc=pass header.from=yahoo.com; arc=none Received: from sonic303-20.consmr.mail.ne1.yahoo.com ([66.163.188.146]:45838) by zero.zsh.org with esmtps (TLS1.3:TLS_AES_128_GCM_SHA256:128) id 1pcJC8-000EYT-4K; Wed, 15 Mar 2023 04:57:01 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s2048; t=1678856218; bh=38epYAYIFLFBlxc8cGcI2vTe9k+rswOjOPV6dlkNJ/k=; h=Date:From:To:Cc:In-Reply-To:References:Subject:From:Subject:Reply-To; b=W9RhtOdxVm8YtW17/NU9hPtRFGIUjLiir4Q0HGfZfvZYCxIAaYO+xNk+sb2lbdmLN9tymKiHeiAPulFp32m/C3G0e68OaHJBMm0+armsMcCYVZGS8yFfLhZKG9HqMgU1FfoAewbHK0giotaXnqIxcNLE6oAfBp5o9QNHB49LBLJaDjpSqMJuDyeVf5ADObC3fhf975xpnv7eXbCxGo9vMkU6kEbu4gpw02AscXQy7I7OJ1KdZO6rprsxZ1mgV4y4qX4JL+f0padvWtWjhzKtqacbCj/hWewrCw/mw7i7nAvpH2sWEc1s0LM7FdSSg6GUB2znEoiQ9HDDJXmQYWXzMQ== X-SONIC-DKIM-SIGN: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s2048; t=1678856218; bh=u5j3kY4r7jh6sZcVzoiUQXEYmlOUhktOrQBNsTD01p0=; h=X-Sonic-MF:Date:From:To:Subject:From:Subject; 
b=RnAEKL6lFBaYiXF9+0ghUXCCaTUVYvNOEJENU3ns6HXRnLmQE6N5Ii4k1wgZab2CxSdV6HulehAgawbOSKvH9oijaOZ3VrTK4qdtNNCuouZJAbE7zYFzwDcwMHPsDd9Mlh+91GPNexSu9x8UWyJTtmSnv1rbqeSTl+yDv77CLUR4Xs5jUWw8x51XwSJhteWzCpRd9M0146awQ68RKxcvl3p+jtmIG4/DFoT5jT/8DMFK6rSEgovdrZSe8Ki3xyH9BCnNN42yKgjs0FASdV3pCt9owLzZ08HCA1O5e550O6xJMz7pRDcx3WO657oFLce+Ef2qVmx0iwawffFYMLS51A== X-YMail-OSG: yFkwI3gVM1kdpTK.SifC1m7LM_67tgS6NqbGDaaflj2Ie47e1M6zc3v1OSAD34u zU5mgLPKjrXHXWg4YCqqDAe8woe3Cs.Oh.c3K4ONNFSdGMBNXffMXt7Bn79kgjIhEwDe7m.ICQi0 v3WCIl.3_6SMb3xIOA9fK9GeBukemh6vzBxbnoaDTRBkDbVK9r5y4frd885htsU44RuwzsKC88MN 9kdN8cBR65V_DVODoosBrzjMg.7SJvPtDT.as6RXd_z9ULHQ3QoP9fV8GhKlsXcPkQuNzGjFw30j aCooicRv2iecSpnvOYWJBp47FNbj.11kQi4t.irX3BTK4BKGUZDzqn.fgeGGDGiNHDq8.qML6kEk pYkEPcuG1.1k7fTnnvxUWhMInxDzKLsiO1h4zyaYolUuN6yL0bPekNq5kaFKGQgjpDG4eSheI55J 5zPaTvOSU86Y8MfdyBd0PqwQaa6MmYKbmyXWcLfBNezqDvHwcLjA.Kex9UDPeJvAlHbgAta_KsHw gEZqx.YOlcImvVOTtNZQavZNbNG3FEr6QP5vQmIXcNlOcXUVid9v9mIXIGQEK8qa4P42QFQnK1uJ jTPdhQNSMb7gkiknaAqk5r2fkvJhw1.OORpCh4DzPnZZPOTg6zS5EwDQbmLlq1htnDnFjMr4Pd0P D1zlTniIomjDtOwE7AxWqVdoQNTmilbpUGseWfFRAsQ1i78g4C9.MFwmL_JWhixnwtIzO9LsfUEZ 4_jefGy4lW2wzP2aPm0UwOeR30ITz4WOiPTVwHkMVJ6ozvOMbPNpaa1_KJQ3sBe7Zpcsf7v8g3VG _IBSiPgeWSehyEE4RW3JJNkcmfWVOZq1IPGFEcjJ_0DzZXPdujzFQEBECuRFty8Cv27I8_ofuTZ5 oJHTOczzTBDoPSaFpR7Crxp4xVNwZxlU3wuedwQEfsq8_uWAh.r1BvAqmp6YoZ1mBHg9LckTuHfF Y4baR5xP5lzskPHvyFapvWMavP0KIT68R.FLFBIAqmnH8wqUmqf6Cw5xrfSI_GECbx1PMqZLoWv_ QuWfmOYySVc2BkXEX7.60JWeMEhwI4hFQ_.Ta1YnI6Cde89838kJZyaFbcuYf3sVS6WGoKuWOcQf EJoOn9UHcTVKR7lu8Zl22V37MqTE0zMpC5L_Fa3hjPQU8YOAGZWsbzDmuCaqRDOw1Ilu6gYiMmw9 ZvZNGd47P0alhB8oxgo4c4QXLEHDDT5oH.kv68N9idWaPBMY_ESOf5zkrozBbW.J5e7_jT8rq4_M UWJlJJPwaxMNzVa.uxIoYZ2mrPqNzxIdAZnv_xI_YojXXgrV4mC0wMTV76lgG16.dBLkDqkqCNV4 7_3D9I1IBQ6vZ9MzI_qF.Y9eHgjwHFumWz9JQ9IBEnPxhjTwdGv.r492b9XVtT73vmw7MANp2d_F s9bT4nnyb4rJfL2xnHOkcemOqYIaGWIG79NlIGJQirAv2f_8yUeIGGUQwZ7M60g_zX_ost0jFiK. 
XjUX4WFbq.84Y500zyAZVHVrItfo0._g.Bci1AWCOmuSBnygcruBz4wWgcL9JVZbKSWjbzTRQcMk bfMzBgCjoyrCLLPDFsqpwulynj_Ux9iccKm3XPZy8XsgPpVp5iDncOKyGGA_e.H8b4G.gMzCwKKV x6zr81yrOW.YdUyzfDBHppAau.nUK36k2ey4ldA9X3a.NEeYb6IzA3bPprQbyH7JLztxk1CeWRPM QNWU054nRr.oxSgN0oglKHawU0gfyvlf.9a.iVX_BkJzqArhgIOeQh6UdSKVASR3ywbcJruAuQVJ 77HRw5Ogy4TLOfCsReNwdPkDqApiEzj6cb5ruSzi06nYyxf2WDEuEiEvfbYfXj0oDrL4nhW59Chz WAAxeOhRTL_IBO84BDxSg0MtYR.Ajcf1DAOrxzIe22H45Cf4FAh.KlxM5TQmHDnjm_URR7ZkTN_A KTIeMtJmDAqwIVjyX_REDGs9DHM6hnS2eu7ScjS7.DaKnX.iYcV3iwkD6KQ.LX240IRjZvTBlkRs Tfw.yismC5AkPLgYYOhY2nP7wGzOhne2I_Zu_WQy561UJO0tMjDHHxDLF4cfp7vMy.tAftrWysXa rpY13UgeBnkqt5OBOzj7M7GZ7Y3WaFibAQM32f91EA2je6rv7rn8evVQZS9uX30rW6uGk45D2ASA D5zJekERzxKFEycyL_LB5CH.45swR2_WLaALG0sE2i9a1byzYer.Khmv72jjN8qZq_R2Ylvufyco Ea5czLT2_7_48hA82GCPfArASKTWKkvncm92H88G2ayjjpHJq9POrIQ8.HPfYzekMZ_i7GGPVEpN WnoJc..Uowlpzkw-- X-Sonic-MF: X-Sonic-ID: d4b173c8-a8d6-44f8-92b3-2577e74ac855 Received: from sonic.gate.mail.ne1.yahoo.com by sonic303.consmr.mail.ne1.yahoo.com with HTTP; Wed, 15 Mar 2023 04:56:58 +0000 Date: Wed, 15 Mar 2023 04:56:56 +0000 (UTC) From: "Jason C. 
Kwan" To: Bart Schaefer Cc: "zsh-workers@zsh.org" Message-ID: <478761809.298180.1678856216911@mail.yahoo.com> In-Reply-To: References: <1621619253.265114.1678847919086.ref@mail.yahoo.com> <1621619253.265114.1678847919086@mail.yahoo.com> Subject: Re: bug report : printf %.1s outputting more than 1 character MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_298179_610053472.1678856216906" X-Mailer: WebService/1.1.21311 YMailNorrin X-Seq: 51578 Archived-At: X-Loop: zsh-workers@zsh.org Errors-To: zsh-workers-owner@zsh.org Precedence: list Precedence: bulk Sender: zsh-workers-request@zsh.org X-no-archive: yes List-Id: List-Help: , List-Subscribe: , List-Unsubscribe: , List-Post: List-Owner: List-Archive: ------=_Part_298179_610053472.1678856216906 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable quote : =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3DThis triggers a branch of the printf co= de introduced by this comment: =C2=A0 =C2=A0 /* =C2=A0 =C2=A0 * Invalid/incomplete character at this =C2=A0 =C2=A0 * point.=C2=A0 Assume all the rest are a =C2=A0 =C2=A0 * single byte.=C2=A0 That's about the best we =C2=A0 =C2=A0 * can do. =C2=A0 =C2=A0 */=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D does the following ( below the "=3D=3D=3D=3D" line ) behavior look even rea= sonable at all, regardless of your spec ? Because what the spec ends up doi= ng is treating the rest of the input string as 1 byte and printing everythi= ng out, even though there are valid code points further down the input stri= ng.=C2=A0 The behavior is correct when LC_ALL=3DC is set, meaning zsh already has the= codes needed to generate the correct output. My point was that instead of = treating the rest of the input string, regardless of size, as 1 byte/charac= ter, why not have it behave "as if" LC_ALL=3DC is in effect whenever it ent= ers this branch : if (chars < 0) {/*=C2=A0* Invalid/incomplete character at this=C2=A0* point= . 
=C2=A0Assume all the rest are a=C2=A0* single byte. =C2=A0That's about th= e best we=C2=A0* can do.=C2=A0*/lchars +=3D lleft;lbytes =3D (ptr - b) + ll= eft;break; and continue in this mode until a locale-valid character is found, then rev= ert back to multi-byte behavior ? wouldn't that be a more logical behavior = ? If that's too complex to implement, then perhaps treat rest of input string= as a collection of individual bytes instead of just 1 byte ? I just find printf '%.3s'=C2=A0outputting a 179 KB string rather odd. =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =C2=A0zsh --restricted --no-rcs --nologin --verbose -xtrace -f -c '___=3D$'= \''=3D\343\276\255#\377\210\234\256A\301B\354\210\264_'\''; command printf = "%s" "$___" | gwc -lcm; for __ in {1..16}; do builtin printf "%.${__}s" "$_= __" | gwc -lcm; done '___=3D$'=3D\343\276\255#\377\210\234\256A\301B\354\21= 0\264_'; command printf "%s" "$___" | gwc -lcm; for __ in {1..16}; do built= in printf "%.${__}s" "$___" | gwc -lcm; done+zsh:1> ___=3D$'=3D=E3=BE=AD#\M= -\C-?\M-\C-H\M-\C-\\M-.A\M-AB=EC=88=B4_'+zsh:1> printf %s $'=3D=E3=BE=AD#\M= -\C-?\M-\C-H\M-\C-\\M-.A\M-AB=EC=88=B4_'+zsh:1> gwc -lcm=C2=A0 =C2=A0 =C2= =A0 0 =C2=A0 =C2=A0 =C2=A0 7 =C2=A0 =C2=A0 =C2=A016+zsh:1> __=3D1+zsh:1> pr= intf %.1s $'=3D=E3=BE=AD#\M-\C-?\M-\C-H\M-\C-\\M-.A\M-AB=EC=88=B4_'+zsh:1> = gwc -lcm=C2=A0 =C2=A0 =C2=A0 0 =C2=A0 =C2=A0 =C2=A0 1 =C2=A0 =C2=A0 =C2=A0 = 1+zsh:1> __=3D2+zsh:1> printf %.2s $'=3D=E3=BE=AD#\M-\C-?\M-\C-H\M-\C-\\M-.= A\M-AB=EC=88=B4_'+zsh:1> gwc -lcm=C2=A0 =C2=A0 =C2=A0 0 =C2=A0 =C2=A0 =C2= =A0 2 =C2=A0 =C2=A0 =C2=A0 4+zsh:1> __=3D3+zsh:1> printf %.3s $'=3D=E3=BE= =AD#\M-\C-?\M-\C-H\M-\C-\\M-.A\M-AB=EC=88=B4_'+zsh:1> gwc -lcm=C2=A0 =C2=A0= =C2=A0 0 =C2=A0 =C2=A0 =C2=A0 3 =C2=A0 =C2=A0 =C2=A0 5+zsh:1> __=3D4+zsh:1= > printf %.4s $'=3D=E3=BE=AD#\M-\C-?\M-\C-H\M-\C-\\M-.A\M-AB=EC=88=B4_'+zsh= :1> gwc -lcm=C2=A0 =C2=A0 =C2=A0 0 =C2=A0 =C2=A0 =C2=A0 7 =C2=A0 =C2=A0 =C2= =A016+zsh:1> 
__=3D5+zsh:1> printf %.5s $'=3D=E3=BE=AD#\M-\C-?\M-\C-H\M-\C-\= \M-.A\M-AB=EC=88=B4_'+zsh:1> gwc -lcm=C2=A0 =C2=A0 =C2=A0 0 =C2=A0 =C2=A0 = =C2=A0 7 =C2=A0 =C2=A0 =C2=A016+zsh:1> __=3D6+zsh:1> printf %.6s $'=3D=E3= =BE=AD#\M-\C-?\M-\C-H\M-\C-\\M-.A\M-AB=EC=88=B4_'+zsh:1> gwc -lcm=C2=A0 =C2= =A0 =C2=A0 0 =C2=A0 =C2=A0 =C2=A0 7 =C2=A0 =C2=A0 =C2=A016+zsh:1> __=3D7+zs= h:1> printf %.7s $'=3D=E3=BE=AD#\M-\C-?\M-\C-H\M-\C-\\M-.A\M-AB=EC=88=B4_'+= zsh:1> gwc -lcm=C2=A0 =C2=A0 =C2=A0 0 =C2=A0 =C2=A0 =C2=A0 7 =C2=A0 =C2=A0 = =C2=A016+zsh:1> __=3D8+zsh:1> printf %.8s $'=3D=E3=BE=AD#\M-\C-?\M-\C-H\M-\= C-\\M-.A\M-AB=EC=88=B4_'+zsh:1> gwc -lcm=C2=A0 =C2=A0 =C2=A0 0 =C2=A0 =C2= =A0 =C2=A0 7 =C2=A0 =C2=A0 =C2=A016+zsh:1> __=3D9+zsh:1> printf %.9s $'=3D= =E3=BE=AD#\M-\C-?\M-\C-H\M-\C-\\M-.A\M-AB=EC=88=B4_'+zsh:1> gwc -lcm=C2=A0 = =C2=A0 =C2=A0 0 =C2=A0 =C2=A0 =C2=A0 7 =C2=A0 =C2=A0 =C2=A016+zsh:1> __=3D1= 0+zsh:1> printf %.10s $'=3D=E3=BE=AD#\M-\C-?\M-\C-H\M-\C-\\M-.A\M-AB=EC=88= =B4_'+zsh:1> gwc -lcm=C2=A0 =C2=A0 =C2=A0 0 =C2=A0 =C2=A0 =C2=A0 7 =C2=A0 = =C2=A0 =C2=A016+zsh:1> __=3D11+zsh:1> printf %.11s $'=3D=E3=BE=AD#\M-\C-?\M= -\C-H\M-\C-\\M-.A\M-AB=EC=88=B4_'+zsh:1> gwc -lcm=C2=A0 =C2=A0 =C2=A0 0 =C2= =A0 =C2=A0 =C2=A0 7 =C2=A0 =C2=A0 =C2=A016+zsh:1> __=3D12+zsh:1> printf %.1= 2s $'=3D=E3=BE=AD#\M-\C-?\M-\C-H\M-\C-\\M-.A\M-AB=EC=88=B4_'+zsh:1> gwc -lc= m=C2=A0 =C2=A0 =C2=A0 0 =C2=A0 =C2=A0 =C2=A0 7 =C2=A0 =C2=A0 =C2=A016+zsh:1= > __=3D13+zsh:1> printf %.13s $'=3D=E3=BE=AD#\M-\C-?\M-\C-H\M-\C-\\M-.A\M-A= B=EC=88=B4_'+zsh:1> gwc -lcm=C2=A0 =C2=A0 =C2=A0 0 =C2=A0 =C2=A0 =C2=A0 7 = =C2=A0 =C2=A0 =C2=A016+zsh:1> __=3D14+zsh:1> printf %.14s $'=3D=E3=BE=AD#\M= -\C-?\M-\C-H\M-\C-\\M-.A\M-AB=EC=88=B4_'+zsh:1> gwc -lcm=C2=A0 =C2=A0 =C2= =A0 0 =C2=A0 =C2=A0 =C2=A0 7 =C2=A0 =C2=A0 =C2=A016+zsh:1> __=3D15+zsh:1> p= rintf %.15s $'=3D=E3=BE=AD#\M-\C-?\M-\C-H\M-\C-\\M-.A\M-AB=EC=88=B4_'+zsh:1= > gwc -lcm=C2=A0 =C2=A0 =C2=A0 0 =C2=A0 =C2=A0 =C2=A0 7 =C2=A0 =C2=A0 =C2= =A016+zsh:1> 
__=3D16+zsh:1> printf %.16s $'=3D=E3=BE=AD#\M-\C-?\M-\C-H\M-\C= -\\M-.A\M-AB=EC=88=B4_'+zsh:1> gwc -lcm=C2=A0 =C2=A0 =C2=A0 0 =C2=A0 =C2=A0= =C2=A0 7 =C2=A0 =C2=A0 =C2=A016 +zsh:1> ___=3D$'=3D=E3=BE=AD#\M-\C-?\M-\C-H\M-\C-\\M-.A\M-AB=EC=88=B4_'+zsh= :1> LC_ALL=3DC printf %s $'=3D=E3=BE=AD#\M-\C-?\M-\C-H\M-\C-\\M-.A\M-AB=EC= =88=B4_'+zsh:1> gwc -lcm=C2=A0 =C2=A0 =C2=A0 0 =C2=A0 =C2=A0 =C2=A0 7 =C2= =A0 =C2=A0 =C2=A016+zsh:1> __=3D1+zsh:1> LC_ALL=3DC +zsh:1> printf %.1s '= =3D=E3=BE=AD#????A?B=EC=88=B4_'+zsh:1> gwc -lcm=C2=A0 =C2=A0 =C2=A0 0 =C2= =A0 =C2=A0 =C2=A0 1 =C2=A0 =C2=A0 =C2=A0 1+zsh:1> __=3D2+zsh:1> LC_ALL=3DC = +zsh:1> printf %.2s '=3D=E3=BE=AD#????A?B=EC=88=B4_'+zsh:1> gwc -lcm=C2=A0 = =C2=A0 =C2=A0 0 =C2=A0 =C2=A0 =C2=A0 1 =C2=A0 =C2=A0 =C2=A0 2+zsh:1> __=3D3= +zsh:1> LC_ALL=3DC +zsh:1> printf %.3s '=3D=E3=BE=AD#????A?B=EC=88=B4_'+zsh= :1> gwc -lcm=C2=A0 =C2=A0 =C2=A0 0 =C2=A0 =C2=A0 =C2=A0 1 =C2=A0 =C2=A0 =C2= =A0 3+zsh:1> __=3D4+zsh:1> LC_ALL=3DC +zsh:1> printf %.4s '=3D=E3=BE=AD#???= ?A?B=EC=88=B4_'+zsh:1> gwc -lcm=C2=A0 =C2=A0 =C2=A0 0 =C2=A0 =C2=A0 =C2=A0 = 2 =C2=A0 =C2=A0 =C2=A0 4+zsh:1> __=3D5+zsh:1> LC_ALL=3DC +zsh:1> printf %.5= s '=3D=E3=BE=AD#????A?B=EC=88=B4_'+zsh:1> gwc -lcm=C2=A0 =C2=A0 =C2=A0 0 = =C2=A0 =C2=A0 =C2=A0 3 =C2=A0 =C2=A0 =C2=A0 5+zsh:1> __=3D6+zsh:1> LC_ALL= =3DC +zsh:1> printf %.6s '=3D=E3=BE=AD#????A?B=EC=88=B4_'+zsh:1> gwc -lcm= =C2=A0 =C2=A0 =C2=A0 0 =C2=A0 =C2=A0 =C2=A0 3 =C2=A0 =C2=A0 =C2=A0 6+zsh:1>= __=3D7+zsh:1> LC_ALL=3DC +zsh:1> printf %.7s '=3D=E3=BE=AD#????A?B=EC=88= =B4_'+zsh:1> gwc -lcm=C2=A0 =C2=A0 =C2=A0 0 =C2=A0 =C2=A0 =C2=A0 3 =C2=A0 = =C2=A0 =C2=A0 7+zsh:1> __=3D8+zsh:1> LC_ALL=3DC +zsh:1> printf %.8s '=3D=E3= =BE=AD#????A?B=EC=88=B4_'+zsh:1> gwc -lcm=C2=A0 =C2=A0 =C2=A0 0 =C2=A0 =C2= =A0 =C2=A0 3 =C2=A0 =C2=A0 =C2=A0 8+zsh:1> __=3D9+zsh:1> LC_ALL=3DC +zsh:1>= printf %.9s '=3D=E3=BE=AD#????A?B=EC=88=B4_'+zsh:1> gwc -lcm=C2=A0 =C2=A0 = =C2=A0 0 =C2=A0 =C2=A0 =C2=A0 3 =C2=A0 =C2=A0 =C2=A0 9+zsh:1> 
__=3D10+zsh:1= > LC_ALL=3DC +zsh:1> printf %.10s '=3D=E3=BE=AD#????A?B=EC=88=B4_'+zsh:1> g= wc -lcm=C2=A0 =C2=A0 =C2=A0 0 =C2=A0 =C2=A0 =C2=A0 4 =C2=A0 =C2=A0 =C2=A010= +zsh:1> __=3D11+zsh:1> LC_ALL=3DC +zsh:1> printf %.11s '=3D=E3=BE=AD#????A?= B=EC=88=B4_'+zsh:1> gwc -lcm=C2=A0 =C2=A0 =C2=A0 0 =C2=A0 =C2=A0 =C2=A0 4 = =C2=A0 =C2=A0 =C2=A011+zsh:1> __=3D12+zsh:1> LC_ALL=3DC +zsh:1> printf %.12= s '=3D=E3=BE=AD#????A?B=EC=88=B4_'+zsh:1> gwc -lcm=C2=A0 =C2=A0 =C2=A0 0 = =C2=A0 =C2=A0 =C2=A0 5 =C2=A0 =C2=A0 =C2=A012+zsh:1> __=3D13+zsh:1> LC_ALL= =3DC +zsh:1> printf %.13s '=3D=E3=BE=AD#????A?B=EC=88=B4_'+zsh:1> gwc -lcm= =C2=A0 =C2=A0 =C2=A0 0 =C2=A0 =C2=A0 =C2=A0 5 =C2=A0 =C2=A0 =C2=A013+zsh:1>= __=3D14+zsh:1> LC_ALL=3DC +zsh:1> printf %.14s '=3D=E3=BE=AD#????A?B=EC=88= =B4_'+zsh:1> gwc -lcm=C2=A0 =C2=A0 =C2=A0 0 =C2=A0 =C2=A0 =C2=A0 5 =C2=A0 = =C2=A0 =C2=A014+zsh:1> __=3D15+zsh:1> LC_ALL=3DC +zsh:1> printf %.15s '=3D= =E3=BE=AD#????A?B=EC=88=B4_'+zsh:1> gwc -lcm=C2=A0 =C2=A0 =C2=A0 0 =C2=A0 = =C2=A0 =C2=A0 6 =C2=A0 =C2=A0 =C2=A015+zsh:1> __=3D16+zsh:1> LC_ALL=3DC +zs= h:1> printf %.16s '=3D=E3=BE=AD#????A?B=EC=88=B4_'+zsh:1> gwc -lcm=C2=A0 = =C2=A0 =C2=A0 0 =C2=A0 =C2=A0 =C2=A0 7 =C2=A0 =C2=A0 =C2=A016 On Tuesday, March 14, 2023 at 11:46:14 PM EDT, Bart Schaefer wrote: =20 =20 On Tue, Mar 14, 2023 at 7:40=E2=80=AFPM Jason C. Kwan wrote: > > I'm using the macOS 13.2.1 OS-provided zsh, version 5.8.1, which I unders= tand isn't the latest and greatest of 5.9, so perhaps this bug has already = been addressed. A related case been addressed by declaring it an intentional divergence from POSIX, see https://www.zsh.org/mla/workers/2022/msg00240.html However ... > In the 4-byte sequence as seen below ( defined via explicit octal codes )= , under no Unicode scenario should 4 bytes be printed out via a command of = printf %.1s, by design. > >=C2=A0 - The first byte of \377 \xFF is explicitly invalid under UTF-8 (ev= en allowing up to 7-byte in the oldest of definitions). 
This triggers a branch of the printf code introduced by this comment: =C2=A0 =C2=A0 /* =C2=A0 =C2=A0 * Invalid/incomplete character at this =C2=A0 =C2=A0 * point.=C2=A0 Assume all the rest are a =C2=A0 =C2=A0 * single byte.=C2=A0 That's about the best we =C2=A0 =C2=A0 * can do. =C2=A0 =C2=A0 */ Thus, you've deliberately invoked a case where zsh's response to invalid input is to punt.=C2=A0 This dates back to the original implementation in workers/23098, https://www.zsh.org/mla/workers/2007/msg00019.html, January 2007. =20 ------=_Part_298179_610053472.1678856216906 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
------=_Part_298179_610053472.1678856216906--
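
The fallback suggested in the message (count each invalid byte as one character, then go back to multibyte decoding at the next byte) can be sketched in standalone C. This is only an illustration under stated assumptions, not zsh code and not a proposed patch: prefix_bytes() is a hypothetical helper name, and it relies on nothing beyond the standard mbrtowc() interface. It returns how many bytes of a string correspond to at most max_chars characters in the current locale.

```c
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not part of zsh): return the number of bytes of
 * s[0..len) that make up at most max_chars characters in the current
 * locale.  A byte that cannot start a valid character is counted as one
 * character of its own, and decoding resumes at the following byte.
 */
static size_t prefix_bytes(const char *s, size_t len, size_t max_chars)
{
    mbstate_t st;
    size_t off = 0, chars = 0;

    memset(&st, 0, sizeof st);
    while (off < len && chars < max_chars) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s + off, len - off, &st);

        if (n == (size_t)-1 || n == (size_t)-2) {
            /*
             * Invalid or truncated sequence: the conversion state is
             * unusable now, so reset it and consume exactly one byte.
             */
            memset(&st, 0, sizeof st);
            n = 1;
        } else if (n == 0) {
            /* Embedded NUL: mbrtowc reports 0, but it occupies one byte. */
            n = 1;
        }
        off += n;
        chars++;
    }
    return off;
}
```

A caller would select the locale with setlocale(LC_ALL, "") and then write the first prefix_bytes(s, len, precision) bytes of the argument. In a UTF-8 locale the 16-byte test string from the message should count as 12 characters under this scheme, so a precision of 4 would stop after 6 bytes rather than emitting all 16. In the C locale every byte counts as one character, which matches the LC_ALL=C byte counts in the trace.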