From: Phil Pennock <zsh-workers+phil.pennock@spodhuis.org>
To: zsh-workers@zsh.org
Subject: UTF-8 and PCRE and metafy
Date: Tue, 8 Mar 2011 01:52:16 -0500
Message-ID: <20110308065216.GB79682@redoubt.spodhuis.org>

zsh 4.3.11 with the rematch_pcre option set:

% [[ 'foo→bar' =~ ^f.* ]]
zsh: pcre_exec() error: -10

The same happens with -pcre-match.

% locale charmap
UTF-8

Error -10 is PCRE_ERROR_BADUTF8.

In the pcre.c module, we explicitly enable PCRE_UTF8 if UTF8 is in
effect and supported.

Next to the existing:
  zwarn("pcre_exec() error: %d", r);
I shoved in a couple more zwarn()s to confirm that the string is in
non-meta form:
  zwarn("pcre_exec() error: %d", r);
  zwarn("lhstr: %s", lhstr);
  zwarn("rhre: /%s/", rhre);
which gives:
  zsh: pcre_exec() error: -10
  zsh: lhstr: foo→bar
  zsh: rhre: /^f.*/

pcretest(1):
% pcretest
PCRE version 8.12 2011-01-15

  re> /^f.*/
data> foo→bar
 0: foo\xe2\x86\x92bar

Okay, so as long as the char is making it through intact as UTF-8 then
PCRE should be handling it.

Dumping each char in lhstr as an int shows that it is *not* in non-meta
form after all: the string is still metafied.  Why does it print just
fine, then?  :(

% [[ 'foo→bar' =~ ^f.* ]]
zsh: pcre_exec() error: -10
zsh: lhstr: foo→bar
zsh: lhstr/%l: foo→bar
zsh: rhre: /^f.*/
zsh: utf-8 enabled?  1
zsh: lhstr char* item: 102
zsh: lhstr char* item: 111
zsh: lhstr char* item: 111
zsh: lhstr char* item: -30
zsh: lhstr char* item: -125
zsh: lhstr char* item: -90
zsh: lhstr char* item: -125
zsh: lhstr char* item: -78
zsh: lhstr char* item: 98
zsh: lhstr char* item: 97
zsh: lhstr char* item: 114
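
To see why a UTF-8 check would reject these bytes, here is a minimal
sketch of a structural UTF-8 scan run over the dump above.  The function
name first_bad_utf8() is made up for illustration, and PCRE's own
validation is stricter than this (overlong forms, surrogates), so treat
it as an approximation.  Interestingly, e2 83 a6 is itself a well-formed
three-byte sequence; it is the second 0x83, a stray continuation byte,
that breaks the string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Return the offset of the first byte that breaks UTF-8 structure, or -1
 * if the buffer is structurally valid.  Simplified: only lead and
 * continuation bytes are checked, not overlong forms or surrogates.
 */
static long first_bad_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t need;

        if (c < 0x80)
            need = 0;                    /* ASCII */
        else if ((c & 0xE0) == 0xC0)
            need = 1;                    /* 2-byte lead */
        else if ((c & 0xF0) == 0xE0)
            need = 2;                    /* 3-byte lead */
        else if ((c & 0xF8) == 0xF0)
            need = 3;                    /* 4-byte lead */
        else
            return (long)i;              /* stray continuation or bad lead */

        if (need > len - i - 1)
            return (long)i;              /* sequence truncated by end of buffer */

        for (size_t k = 1; k <= need; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return (long)(i + k);    /* expected a continuation byte */

        i += need + 1;
    }
    return -1;
}

#ifdef SKETCH_MAIN
int main(void)
{
    /* The bytes from the zwarn() dump, written as unsigned values. */
    const unsigned char dumped[] = { 0x66, 0x6f, 0x6f, 0xe2, 0x83, 0xa6,
                                     0x83, 0xb2, 0x62, 0x61, 0x72 };
    /* What pcretest saw: foo\xe2\x86\x92bar */
    const unsigned char raw[] = { 0x66, 0x6f, 0x6f, 0xe2, 0x86, 0x92,
                                  0x62, 0x61, 0x72 };

    assert(first_bad_utf8(dumped, sizeof dumped) == 6);
    assert(first_bad_utf8(raw, sizeof raw) == -1);
    printf("dumped: first bad byte at offset %ld\n",
           first_bad_utf8(dumped, sizeof dumped));
    return 0;
}
#endif
```

Build with -DSKETCH_MAIN to run the two checks standalone.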

So after line 336 of pcre.c I add:

    unmetafy(lhstr, NULL);
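
For reference, a rough sketch of what that decoding does to the dumped
bytes.  This is not zsh's unmetafy() itself: sketch_unmetafy() and
SKETCH_META are hypothetical names, and the 0x83 value is an assumption
inferred from the -125 entries in the dump.  Each Meta byte is dropped
and the byte after it is XORed with 0x20, so the string shrinks in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* ASSUMPTION: zsh's Meta byte is 0x83, i.e. the -125 seen in the dump. */
#define SKETCH_META 0x83u

/*
 * Decode a metafied buffer in place and return the new, shorter length.
 * Mirrors the idea behind unmetafy(), not its actual code.
 */
static size_t sketch_unmetafy(unsigned char *s, size_t len)
{
    size_t r = 0, w = 0;

    while (r < len) {
        if (s[r] == SKETCH_META && r + 1 < len) {
            /* Meta + (byte ^ 0x20) decodes to the original byte. */
            s[w++] = (unsigned char)(s[r + 1] ^ 0x20u);
            r += 2;
        } else {
            s[w++] = s[r++];
        }
    }
    return w;
}

#ifdef SKETCH_MAIN
int main(void)
{
    unsigned char buf[] = { 0x66, 0x6f, 0x6f, 0xe2, 0x83, 0xa6,
                            0x83, 0xb2, 0x62, 0x61, 0x72 };
    size_t n = sketch_unmetafy(buf, sizeof buf);

    assert(n == 9);
    assert(memcmp(buf, "foo\xe2\x86\x92" "bar", 9) == 0);
    printf("decoded length: %zu\n", n);
    return 0;
}
#endif
```

Because the decode overwrites the buffer and changes its length, any
later code that still treats the same pointer as metafied will misread
it, which may be related to the odd output in the test that follows.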

Test:
% unset preexec_functions ; unfunction precmd
% [[ 'foo→bar' =~ ^f.* ]] ; print -l $? $MATCH foo $match
 pattern.c:1403: BUG: - missing from numeric glob
0
foo?^<bar
foo
zefram

I'm guessing I need a bunch of calls to metafy() to process the results
of extraction in zpcre_get_substrings()?  Where does the string
"zefram" come from?  I mean, Andrew is capable and all, but springing
into existence like that was surprising.

Is there guidance on correct API usage here for calling metafy() and
having lengths all match up?
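
To make the length question concrete, here is a sketch of the direction
I have in mind, written as standalone C rather than against the zsh
API.  The offsets pcre_exec() reports refer to the unmetafied subject,
so each captured slice would be cut from the raw bytes and only then
re-encoded, and the re-encoded length can be larger than the raw one.
Everything named sketch_* is hypothetical, and the set of bytes that
need escaping (NUL and 0x83 through 0xa2) is an assumption that should
be checked against zsh.h and the real metafy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* ASSUMPTION: Meta is 0x83, as suggested by the dump. */
#define SKETCH_META 0x83u

/* ASSUMPTION: NUL and 0x83..0xa2 are the bytes zsh escapes. */
static int sketch_needs_meta(unsigned char c)
{
    return c == 0 || (c >= 0x83u && c <= 0xa2u);
}

/*
 * Copy raw[start..end) into a freshly malloc'd, NUL-terminated buffer,
 * escaping bytes as Meta + (byte ^ 0x20).  start/end are raw byte
 * offsets, as a regex engine would report them.  The encoded length is
 * stored in *out_len when out_len is non-NULL.  Caller frees the result.
 */
static unsigned char *sketch_metafy_slice(const unsigned char *raw,
                                          size_t start, size_t end,
                                          size_t *out_len)
{
    size_t n = 0, w = 0;
    unsigned char *out;

    for (size_t i = start; i < end; i++)
        n += sketch_needs_meta(raw[i]) ? 2 : 1;

    out = malloc(n + 1);
    if (!out)
        return NULL;

    for (size_t i = start; i < end; i++) {
        if (sketch_needs_meta(raw[i])) {
            out[w++] = (unsigned char)SKETCH_META;
            out[w++] = (unsigned char)(raw[i] ^ 0x20u);
        } else {
            out[w++] = raw[i];
        }
    }
    out[w] = '\0';
    if (out_len)
        *out_len = w;
    return out;
}

#ifdef SKETCH_MAIN
int main(void)
{
    /* Raw subject, 9 bytes; ^f.* would match offsets 0..9. */
    const unsigned char raw[] = { 0x66, 0x6f, 0x6f, 0xe2, 0x86, 0x92,
                                  0x62, 0x61, 0x72 };
    size_t mlen = 0;
    unsigned char *m = sketch_metafy_slice(raw, 0, sizeof raw, &mlen);

    assert(m != NULL);
    assert(mlen == 11);   /* two bytes escaped, so 9 grows to 11 */
    printf("raw length %zu, metafied length %zu\n", sizeof raw, mlen);
    free(m);
    return 0;
}
#endif
```

The point of the sketch is only that raw offsets and metafied lengths
are different quantities, so they have to be kept apart on the way in
and on the way out.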

-Phil


Thread overview (8+ messages):
2011-03-08  6:52 Phil Pennock [this message]
2011-03-08  8:19 ` Bart Schaefer
2011-03-08  9:58 ` Peter Stephenson
2011-10-21  9:56   ` [patch] " Phil Pennock
2011-10-21 10:35     ` Phil Pennock
2011-10-23 16:32     ` [patch] " Peter Stephenson
2011-10-24 11:35       ` Phil Pennock
2011-10-24 11:43         ` Peter Stephenson
