Date: Sat, 9 Mar 2024 09:40:19 +0000
From: Stephane Chazelas
To: Zsh hackers list
Subject: quoting of bytes >= 0x80 with set +o multibyte in multibyte locales
Message-ID: <20240309094019.w524x42frygvqmxd@chazelas.org>

$ locale charmap
UTF-8
$ set +o multibyte
$ a=$'\xe9|\x80'
$ printf '%q\n' $a
�\|$'\200'
$ printf '%s\n' ${(q)a}
�\|$'\200'

Byte 0xe9 was sent as-is to the terminal (which in my case
rendered it as the � replacement character) even though it's not
printable in UTF-8 (presumably because U+00E9 is printable, but
that shouldn't be relevant).

It arguably gets worse in other multibyte charsets. For example:

$ LC_ALL=zh_HK.big5hkscs luit
$ printf '%s\n' ${(q)a}
ㄍ$'\200'
$ set +o multibyte
$ printf '%s\n' ${(q)a}
α|$'\200'

\xa3\x7c (0x7c being |) is the encoding of U+310D there, while
\xa3\x5c (0x5c being \) is the encoding of U+03B1.

So we ended up with something printable, but only because the \
escaping the | was munched up to form a printable character that
was not there in the input in the first place.

IMO, when the multibyte option is off in a locale with a
multibyte charset, all bytes above 0x7F should be escaped (as
\200..\377).

-- 
Stephane
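
To make the proposal concrete, here is a rough sketch in portable sh of quoting a string one byte at a time, so that every byte above 0x7F comes out as an octal escape regardless of the locale's charset. It is only an illustration: quote_bytes is a made-up helper, not zsh code, and unlike ${(q)a} it wraps the whole result in a single $'...' instead of backslash-escaping individual characters.

```shell
# Hypothetical helper (not part of zsh): quote "$1" byte by byte as $'...'.
# Printable ASCII is emitted as-is, backslash and single quote get a
# backslash prefix, and every other byte (control characters and all
# bytes >= 0x80) is written as a three-digit octal escape.
quote_bytes() {
    printf "\$'"
    # od prints each input byte as a three-digit octal number
    for o in $(printf '%s' "$1" | od -An -v -t o1); do
        case $o in
            134|047)                      # backslash, single quote
                printf '\\%b' "\\0$o" ;;
            0[4-7]?|1[0-6]?|17[0-6])      # printable ASCII, 040..176
                printf '%b' "\\0$o" ;;
            *)                            # controls and bytes >= 0x80
                printf '\\%s' "$o" ;;
        esac
    done
    printf "'\n"
}

# The value from the UTF-8 transcript above, built with octal escapes
quote_bytes "$(printf '\351|\200')"
```

For that input the sketch prints $'\351|\200': neither high byte reaches the terminal raw, and no backslash is inserted after a lead byte, so nothing can be absorbed into a two-byte Big5-HKSCS character.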