From: Roman Perepelitsa
Date: Wed, 15 Mar 2023 16:50:43 +0100
Subject: Re: bug report : printf %.1s outputting more than 1 character
To: Bart Schaefer
Cc: "Jason C. Kwan", zsh-workers@zsh.org

On Wed, Mar 15, 2023 at 4:32 PM Bart Schaefer wrote:
>
> On Tue, Mar 14, 2023 at 9:56 PM Jason C. Kwan wrote:
> >
> > Does the following (below the "====" line) behavior look even reasonable at all, regardless of your spec? Because what the spec ends up doing is treating the rest of the input string as 1 byte and printing everything out, even though there are valid code points further down the input string.
>
> I'm not the resident expert on multibyte character sets, so I'm just
> reporting the situation and waiting for e.g. PWS to respond. However,
> as far as my understanding of the multibyte library goes, once you've
> "desynchronized" the input by encountering an invalid byte, you're not
> guaranteed that anything further that you see can be correctly
> interpreted as a code point. I agree that it's not ideal to just dump
> everything else "raw".

UTF-8 has a nice property: you can jump to an arbitrary byte position in the stream and quickly find the start of the next character. A byte starts a character if its most significant bit is 0 (an ASCII byte) or its two most significant bits are both 1 (the lead byte of a multibyte sequence); continuation bytes always have the form 10xxxxxx. The same rule can be used to recover after an invalid character: skip forward to the next byte that can start a character and resume decoding there.

Roman.
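A minimal Python sketch of the resynchronization idea described above, assuming UTF-8 input and using only the standard library. The helper names (is_char_start, next_char_start, expected_len, split_chars, take_chars) are hypothetical and are not part of zsh. take_chars only illustrates how a %.1s-style precision could count an invalid run as one unit instead of dumping the rest of the string raw. Bytes such as 0xC0 or 0xF8 match the 11xxxxxx pattern but are never valid in UTF-8, so the sketch leaves their rejection to Python's strict decoder.

```python
# Hypothetical sketch of UTF-8 resynchronization; not zsh's implementation.


def is_char_start(b: int) -> bool:
    """A byte can begin a character if it is 0xxxxxxx (ASCII) or 11xxxxxx (lead byte).
    Continuation bytes always look like 10xxxxxx."""
    return (b & 0x80) == 0 or (b & 0xC0) == 0xC0


def next_char_start(data: bytes, pos: int) -> int:
    """Smallest index >= pos holding a possible character start, or len(data)."""
    while pos < len(data) and not is_char_start(data[pos]):
        pos += 1
    return pos


def expected_len(lead: int) -> int:
    """Sequence length implied by a lead byte (2 to 4); 1 for ASCII and stray bytes."""
    if lead >= 0xF0:
        return 4
    if lead >= 0xE0:
        return 3
    if lead >= 0xC0:
        return 2
    return 1  # ASCII, or a continuation byte that will fail to decode on its own


def split_chars(data: bytes) -> list[tuple[bytes, bool]]:
    """Split data into (chunk, is_valid) pairs. An invalid sequence is reported as one
    chunk running up to the next possible character start, where decoding resumes."""
    out: list[tuple[bytes, bool]] = []
    pos = 0
    while pos < len(data):
        chunk = data[pos:pos + expected_len(data[pos])]
        try:
            chunk.decode("utf-8")  # strict: rejects truncated, overlong, surrogate forms
            out.append((chunk, True))
            pos += len(chunk)
        except UnicodeDecodeError:
            resync = next_char_start(data, pos + 1)  # always advances by at least 1
            out.append((data[pos:resync], False))
            pos = resync
    return out


def take_chars(data: bytes, count: int) -> bytes:
    """Hypothetical %.Ns-style truncation: keep the first `count` chunks,
    counting each invalid chunk as a single unit."""
    return b"".join(chunk for chunk, _ in split_chars(data)[:count])
```

For example, split_chars(b"\xe9t\xc3\xa9") (a Latin-1 "é", then "t", then a UTF-8 "é") returns [(b'\xe9', False), (b't', True), (b'\xc3\xa9', True)], so take_chars(b"\xe9t\xc3\xa9", 1) yields only the single invalid byte and the valid characters after it are still recognized.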