From: "Jun. T"
Subject: Re: [bug] locale ctype not always honoured properly in pcre matching
Date: Thu, 22 Sep 2022 02:41:39 +0900
To: zsh-workers@zsh.org
References: <20220920135404.64r5fnrlgmgxogye@chazelas.org>
Message-Id: <689789F5-F64A-4692-9973-8045A777CD2F@kba.biglobe.ne.jp>
X-Seq: 50658

> 2022/09/21 8:08, Bart Schaefer wrote:
>
> I'm not 100% sure, but I think this is because in Src/Modules/pcre.c
> the state of UTF-8 parsing is cached and only changes when the
> MULTIBYTE option is different upon re-entry. Changing the locale
> doesn't have that effect.

Yes. The following patch seems to solve the problem.

With this patch strcmp(nl_langinfo(CODESET),..) is called every time
pcre matching is used, but I think the overhead is negligible.
For example, I tried

time (repeat 1000000; do [[ 'a' =~ '^.\z' ]]; done)

before and after the patch, but the time difference was negligible
at least on my Mac (both are about 3 seconds).

diff --git a/Src/Modules/pcre.c b/Src/Modules/pcre.c
index 6289e003e..46875a59b 100644
--- a/Src/Modules/pcre.c
+++ b/Src/Modules/pcre.c
@@ -47,8 +47,6 @@ zpcre_utf8_enabled(void)
 #if defined(MULTIBYTE_SUPPORT) && defined(HAVE_NL_LANGINFO) && defined(CODESET)
     static int have_utf8_pcre = -1;
 
-    /* value can toggle based on MULTIBYTE, so don't
-     * be too eager with caching */
     if (have_utf8_pcre < -1)
 	return 0;
 
@@ -56,15 +54,11 @@ zpcre_utf8_enabled(void)
 	return 0;
 
     if ((have_utf8_pcre == -1) &&
-        (!strcmp(nl_langinfo(CODESET), "UTF-8"))) {
-
-	if (pcre_config(PCRE_CONFIG_UTF8, &have_utf8_pcre))
+	(pcre_config(PCRE_CONFIG_UTF8, &have_utf8_pcre))) {
 	    have_utf8_pcre = -2; /* erk, failed to ask */
     }
 
-    if (have_utf8_pcre < 0)
-	return 0;
-    return have_utf8_pcre;
+    return (have_utf8_pcre == 1) && (!strcmp(nl_langinfo(CODESET), "UTF-8"));
 
 #else
     return 0;
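
For reference, the idea behind the patch can be sketched as a standalone function: cache only the answer that cannot change at run time (whether the library was built with UTF-8 support) and check the locale's codeset on every call. This is only an illustration, not the zsh code. query_lib_utf8() and lib_query_calls are hypothetical names; the former stands in for pcre_config(PCRE_CONFIG_UTF8, ...) so that the sketch builds without libpcre, and the MULTIBYTE option check is left out.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Counts how often the (hypothetical) library query runs, so the
 * caching can be observed from outside. */
static int lib_query_calls = 0;

/* Hypothetical stand-in for pcre_config(PCRE_CONFIG_UTF8, &out):
 * stores 1 in *out (UTF-8 supported) and returns 0 for success. */
static int query_lib_utf8(int *out)
{
    lib_query_calls++;
    *out = 1;
    return 0;
}

/* Returns 1 only if the library supports UTF-8 AND the current
 * LC_CTYPE codeset is UTF-8. */
int utf8_enabled(void)
{
    /* -1: not asked yet, -2: query failed, 0 or 1: library's answer */
    static int lib_utf8 = -1;

    /* The library capability is fixed, so ask only once. */
    if (lib_utf8 == -1 && query_lib_utf8(&lib_utf8))
        lib_utf8 = -2;
    assert(lib_utf8 != -1);

    /* The locale can change between calls, so this part is not cached. */
    return lib_utf8 == 1 && !strcmp(nl_langinfo(CODESET), "UTF-8");
}
```

Calling setlocale(LC_CTYPE, ...) between two calls to utf8_enabled() changes the result as soon as the codeset changes, while lib_query_calls stays at 1.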