Date: Tue, 20 Sep 2022 14:54:04 +0100
From: Stephane Chazelas
To: Zsh hackers list
Subject: [bug] locale ctype not always honoured properly in pcre matching
Message-ID: <20220920135404.64r5fnrlgmgxogye@chazelas.org>

$ locale charmap
UTF-8
$ set -o rematchpcre
$ LC_ALL=C [ $'\xc3\xa9' '=~' '^..\z' ] && echo yes
yes

OK, in C locale, those two bytes are considered as two characters.

$ [ $'\xc3\xa9' '=~' '^..\z' ] && echo yes
$

OK, in UTF-8, those two bytes form one é character.

$ LC_ALL=C [ $'\xc3\xa9' '=~' '^..\z' ] && echo yes
$

Same command as above, but now it doesn't match (?!) and instead:

$ LC_ALL=C [ $'\xc3\xa9' '=~' '^.\z' ] && echo yes
yes

Behaves as if doing a match in UTF-8.

Same goes with:

$ PS1='$ ' zsh -f
$ set -o rematchpcre
$ (LC_ALL=C; [[ $'\xc3\xa9' =~ '^..\z' ]] && echo yes )
yes
$ [[ $'\xc3\xa9' =~ '^..\z' ]] && echo yes
$ (LC_ALL=C; [[ $'\xc3\xa9' =~ '^..\z' ]] && echo yes )
$

-- 
Stephane
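
For anyone wanting to check the input independently of zsh, here is a minimal POSIX sh sketch. It is not part of the original session, and the helper name e_acute is made up for illustration. It only confirms that the test string is exactly the two bytes 0xC3 0xA9; whether those count as one character or two is what the locale's ctype is supposed to decide.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): emit the test string,
# i.e. the two bytes 0xC3 0xA9, which are the UTF-8 encoding of "é".
# Octal escapes are used because they are portable in POSIX printf.
e_acute() { printf '\303\251'; }

# wc -c counts bytes regardless of the current locale.
byte_count=$(e_acute | wc -c | tr -d '[:space:]')
echo "bytes: $byte_count"

# Hex dump of the same string, to confirm the exact byte values.
e_acute | od -An -tx1
```

A pattern such as '^..\z' can only match this string if the regex engine treats each byte as a character, which is the C-locale behaviour shown in the first command of the report.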