From: Phil Pennock <zsh-workers+phil.pennock@spodhuis.org>
To: zsh-workers@zsh.org
Subject: UTF-8 and PCRE and metafy
Date: Tue, 8 Mar 2011 01:52:16 -0500
Message-ID: <20110308065216.GB79682@redoubt.spodhuis.org>
zsh 4.3.11 with rematch_pcre set:
% [[ 'foo→bar' =~ ^f.* ]]
zsh: pcre_exec() error: -10
Same with -pcre-match
% locale charmap
UTF-8
Error -10 is PCRE_ERROR_BADUTF8.
In the pcre.c module, we explicitly enable PCRE_UTF8 if UTF8 is in
effect and supported.
Next to this line:
zwarn("pcre_exec() error: %d", r);
I shoved in a couple more zwarn()s to confirm that the string is in
non-meta form:
zwarn("pcre_exec() error: %d", r);
zwarn("lhstr: %s", lhstr);
zwarn("rhre: /%s/", rhre);
Result:
zsh: pcre_exec() error: -10
zsh: lhstr: foo→bar
zsh: rhre: /^f.*/
pcretest(1):
% pcretest
PCRE version 8.12 2011-01-15
re> /^f.*/
data> foo→bar
0: foo\xe2\x86\x92bar
Okay, so as long as the char is making it through intact as UTF-8 then
PCRE should be handling it.
Dumping each char of lhstr as an int shows that it is *not* in non-meta
form after all. Why does it print just fine, then? :(
% [[ 'foo→bar' =~ ^f.* ]]
zsh: pcre_exec() error: -10
zsh: lhstr: foo→bar
zsh: lhstr/%l: foo→bar
zsh: rhre: /^f.*/
zsh: utf-8 enabled? 1
zsh: lhstr char* item: 102
zsh: lhstr char* item: 111
zsh: lhstr char* item: 111
zsh: lhstr char* item: -30
zsh: lhstr char* item: -125
zsh: lhstr char* item: -90
zsh: lhstr char* item: -125
zsh: lhstr char* item: -78
zsh: lhstr char* item: 98
zsh: lhstr char* item: 97
zsh: lhstr char* item: 114
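To make the dump easier to read, here is a minimal sketch of a UTF-8 well-formedness check run over those bytes. It is not PCRE's own validator, and first_bad_utf8_offset is a hypothetical helper name; the check is simplified (overlong and surrogate encodings are not rejected). Read as unsigned values, the dump is 66 6f 6f e2 83 a6 83 b2 62 61 72. The sequence e2 83 a6 happens to be well-formed on its own, so the first offending byte is the second 0x83, which is consistent with PCRE reporting PCRE_ERROR_BADUTF8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of zsh or PCRE: returns the offset of the
 * first byte that cannot belong to well-formed UTF-8, or -1 if there is
 * none.  Simplified: overlong and surrogate encodings are not rejected. */
long first_bad_utf8_offset(const unsigned char *s, size_t n)
{
    size_t i = 0;

    assert(s != NULL);
    while (i < n) {
        unsigned char c = s[i];
        size_t extra;          /* continuation bytes expected after c */

        if (c < 0x80)
            extra = 0;
        else if (c >= 0xC2 && c <= 0xDF)
            extra = 1;
        else if (c >= 0xE0 && c <= 0xEF)
            extra = 2;
        else if (c >= 0xF0 && c <= 0xF4)
            extra = 3;
        else
            return (long)i;    /* stray continuation byte or invalid lead */

        if (extra > n - i - 1)
            return (long)i;    /* sequence truncated by end of input */
        for (size_t k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return (long)(i + k);
        i += extra + 1;
    }
    return -1;
}

/* The bytes from the zwarn() dump, converted to unsigned values. */
static const unsigned char dumped_bytes[] = {
    0x66, 0x6f, 0x6f, 0xe2, 0x83, 0xa6, 0x83, 0xb2, 0x62, 0x61, 0x72
};

/* The plain UTF-8 encoding of "foo→bar", as displayed by pcretest. */
static const unsigned char plain_bytes[] = {
    0x66, 0x6f, 0x6f, 0xe2, 0x86, 0x92, 0x62, 0x61, 0x72
};
```

Calling first_bad_utf8_offset(dumped_bytes, sizeof dumped_bytes) returns 6, while the same call on plain_bytes returns -1.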
So after line 336 of pcre.c I add:
unmetafy(lhstr, NULL);
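For readers unfamiliar with the encoding, the following is a simplified stand-in for what that call does to the bytes; it is not zsh's implementation. It assumes that the Meta byte is 0x83 and that an escaped byte is stored as Meta followed by the byte XOR 32, which matches the dump (-125 is 0x83, and -90 is 0xa6, i.e. 0x86 ^ 0x20). The names sketch_unmetafy and SKETCH_META are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_META ((char)0x83)   /* assumed value of zsh's Meta byte */

/* Illustrative stand-in only, not zsh's unmetafy().  Decodes s in place,
 * NUL-terminates the result and returns s.  If len is non-NULL, the
 * decoded length is stored there; it is needed because a decoded string
 * may itself contain NUL bytes (Meta followed by 0x20 decodes to 0). */
char *sketch_unmetafy(char *s, int *len)
{
    char *r = s;   /* read position in the metafied input */
    char *w = s;   /* write position for the decoded output */

    assert(s != NULL);
    while (*r) {
        if (*r == SKETCH_META && r[1] != '\0') {
            *w++ = (char)(r[1] ^ 32);   /* undo the XOR-32 escape */
            r += 2;
        } else {
            *w++ = *r++;                /* ordinary byte, copy as is */
        }
    }
    *w = '\0';
    if (len)
        *len = (int)(w - s);
    return s;
}
```

Applied to the 11 dumped bytes, this yields the 9-byte sequence 66 6f 6f e2 86 92 62 61 72, the same UTF-8 that pcretest accepted.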
Test:
% unset preexec_functions ; unfunction precmd
% [[ 'foo→bar' =~ ^f.* ]] ; print -l $? $MATCH foo $match
pattern.c:1403: BUG: - missing from numeric glob
0
foo?^<bar
foo
zefram
I'm guessing I need a bunch of calls to metafy() to process the results
of extraction in zpcre_get_substrings()? Where does the string
"zefram" come from? I mean, Andrew is capable and all, but springing
into existence like that was surprising.
Is there guidance on correct API usage here for calling metafy() and
having lengths all match up?
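As background to the length question, and not as an answer taken from the zsh sources: metafying grows a string by one byte for every escaped byte, whereas the offsets PCRE reports are byte offsets into the subject it was given. If the subject is the unmetafied string, those offsets and lengths no longer line up with the metafied form. The sketch below shows the growth. The names sketch_metafy, sketch_imeta and SKETCH_META_BYTE are hypothetical, and the set of escaped bytes (NUL plus 0x83 through 0xa2) is an assumption that only approximates zsh's imeta(); the exact range depends on the zsh version.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_META_BYTE 0x83   /* assumed value of zsh's Meta byte */

/* ASSUMPTION: approximates which bytes zsh escapes; the real token range
 * is defined in zsh.h and may differ between versions. */
static int sketch_imeta(unsigned char c)
{
    return c == 0 || (c >= 0x83 && c <= 0xa2);
}

/* Illustrative stand-in only, not zsh's metafy().  Encodes inlen raw
 * bytes from in into out (capacity outcap, including the terminating
 * NUL).  Returns the metafied length, or (size_t)-1 if out is too small. */
size_t sketch_metafy(const char *in, size_t inlen, char *out, size_t outcap)
{
    size_t o = 0;

    assert(in != NULL && out != NULL);
    if (outcap == 0)
        return (size_t)-1;
    for (size_t i = 0; i < inlen; i++) {
        unsigned char c = (unsigned char)in[i];
        size_t need = sketch_imeta(c) ? 2 : 1;

        if (o + need + 1 > outcap)       /* keep room for the NUL */
            return (size_t)-1;
        if (need == 2) {
            out[o++] = (char)SKETCH_META_BYTE;
            out[o++] = (char)(c ^ 32);   /* escaped byte is XORed with 32 */
        } else {
            out[o++] = (char)c;
        }
    }
    out[o] = '\0';
    return o;
}
```

With the 9 raw bytes of "foo→bar" as input this returns 11 and reproduces the byte sequence from the dump; for a pure-ASCII capture such as "foo" the length is unchanged at 3. A worst-case output buffer is 2 * inlen + 1 bytes.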
-Phil