From: Daniel Shahaf
To: 정누리
Cc: zsh-workers@zsh.org
Date: Mon, 13 Jul 2020 14:02:38 +0000
Subject: Re: zsh/pcre has errors with unicode bytes
Message-ID: <20200713140238.1d6f4ebf@tarpaulin.shahaf.local2>
List-Id: Zsh Workers List
X-Seq: 46243

정누리 wrote on Mon, 13 Jul 2020 11:53 +0900:
> $ LC_ALL='C'
> $ str='Hi😊'
> $ for (( i = 1; i <= ${#str}; ++i )); do
>     byte="$str[i]"
>     [[ $byte -pcre-match [a-zA-Z0-9] ]] && echo $byte || echo 'no match'
>   done
>
>> H
> i
> zsh: pcre_exec() error [-10]

From /usr/include/pcre.h on my system:

#define PCRE_ERROR_BADUTF8         (-10)  /* Same for 8/16/32 */
#define PCRE_ERROR_BADUTF16        (-10)  /* Same for 8/16/32 */
#define PCRE_ERROR_BADUTF32        (-10)  /* Same for 8/16/32 */

So pcre expects the pattern to be a Unicode string, despite the locale.

Actually, wait.  We don't know what the locale is.
I don't build PCRE, but could you try that again with «export LC_ALL='C'» at
the start?  If that doesn't force it to use ASCII, try unsetting the
MULTIBYTE option.  See zpcre_utf8_enabled() (in Src/Modules/pcre.c).

Cheers,

Daniel

> no match
> zsh: pcre_exec() error [-10]
> no match
> zsh: pcre_exec() error [-10]
> no match
> zsh: pcre_exec() error [-10]
> no match
>
> Thanks for reading.
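
For illustration only, here is a rough sketch of why a UTF-8-mode matcher would reject each of the emoji's bytes when they are fed to it one at a time, which is what the loop in the quoted transcript does. It does not call PCRE or the zsh/pcre module at all. The helper name classify_bytes is hypothetical, and the check is deliberately simplistic: it assumes a lone byte is a complete UTF-8 sequence only if it is below 0x80.

```shell
# classify_bytes is a hypothetical helper, not part of zsh or PCRE.
# It prints one line per byte of its argument: the byte in hex, then
# "valid" if that byte alone is a complete UTF-8 sequence (0x00-0x7f),
# or "invalid" if it is a lead or continuation byte of a longer sequence.
classify_bytes() {
    printf '%s' "$1" | od -An -v -tx1 | tr -s ' ' '\n' | while read -r hex; do
        # od pads its output with spaces, so skip the empty fields
        [ -n "$hex" ] || continue
        if [ "$((0x$hex))" -lt 128 ]; then
            echo "$hex valid"
        else
            echo "$hex invalid"
        fi
    done
}

# Same string as in the transcript above
classify_bytes 'Hi😊'
```

The two ASCII bytes come out as valid and the four bytes of the emoji as invalid, which lines up with the two matches followed by four "error [-10]" lines in the transcript. Whether the module ends up in UTF-8 mode in that session is a separate question that this sketch does not answer.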