On 2011-03-08 at 09:58 +0000, Peter Stephenson wrote:
> On Tue, 8 Mar 2011 01:52:16 -0500
> Phil Pennock <zsh-workers+phil.pennock@spodhuis.org> wrote:
> > I'm guessing I need a bunch of calls to metafy() to process the
> > results of extraction in zpcre_get_substrings() ?
> 
> You'll need to unmetafy any string getting passed into
> pcre_get_substring_list() and metafy() the resulting captures coming
> out.  You should duplicate any string that needs unmetafying, since
> otherwise it's in place and you may need the metafied form later (you do
> for the string passed in as the first argument).

Okay, it took me far too long to get back around to this, sorry. :(

Attached is what looks to me to be a correct patch.  With bash_rematch
set, I can do:
  % [[ 'foo→bar' =~ .([^[:ascii:]]). ]]
  % echo $BASH_REMATCH
  o→b →
  % [[ 'foo→bar' =~ .(→.). ]]
  % echo $BASH_REMATCH
  o→ba →b


I'm not sure when I should be using the wcs_strdup() functions and
the like; what I've got appears to work.  None of what I've added
appears to be specific to UTF-8.
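
To make the round trip Peter describes concrete, here is a minimal
standalone sketch of the encoding; it is not the code from the patch.
It assumes Meta is the byte 0x83 and that an escaped byte is stored as
Meta followed by the byte XORed with 32.  The set of bytes that need
escaping (NUL and 0x83 through 0xa2 here) is an assumption and may
differ between zsh versions.  toy_metafy() and toy_unmetafy_dup() are
hypothetical names standing in for zsh's metafy() and unmetafy(); the
second returns a copy so the metafied original stays usable.

```c
#include <stdlib.h>
#include <string.h>

#define META 0x83  /* zsh's Meta marker byte */

/* Assumption: NUL and 0x83..0xa2 need escaping; the exact range
 * depends on the zsh version's token table. */
static int needs_meta(unsigned char c)
{
    return c == 0 || (c >= 0x83 && c <= 0xa2);
}

/* Hypothetical helper: return a malloc'd, NUL-terminated metafied
 * copy of the len raw bytes at raw, or NULL on allocation failure. */
char *toy_metafy(const char *raw, size_t len)
{
    char *out = malloc(2 * len + 1);  /* worst case: every byte escaped */
    char *p = out;
    size_t i;

    if (!out)
        return NULL;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)raw[i];
        if (needs_meta(c)) {
            *p++ = (char)META;
            *p++ = (char)(c ^ 32);
        } else {
            *p++ = (char)c;
        }
    }
    *p = '\0';
    return out;
}

/* Hypothetical helper: return a malloc'd raw copy of a metafied
 * string and store its length in *len_out.  The input is left
 * untouched, so the metafied form can still be used afterwards. */
char *toy_unmetafy_dup(const char *meta, size_t *len_out)
{
    const unsigned char *s = (const unsigned char *)meta;
    char *out = malloc(strlen(meta) + 1);  /* output is never longer */
    char *p = out;

    if (!out)
        return NULL;
    while (*s) {
        if (*s == META && s[1]) {
            *p++ = (char)(s[1] ^ 32);  /* undo the XOR, drop the marker */
            s += 2;
        } else {
            *p++ = (char)*s++;
        }
    }
    *p = '\0';
    if (len_out)
        *len_out = (size_t)(p - out);
    return out;
}
```

Under these assumptions the 9 raw bytes of 'foo→bar' become 11
metafied bytes, because two of the three bytes encoding → (0x86 and
0x92) fall in the escaped range.  Byte offsets reported by the regex
library refer to the subject it was handed, so they only line up with
the unmetafied copy.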

Is it reasonable to add tests to D07multibyte.ztst for this, with the
zsh/pcre dependency?

Can anyone spot any cases I've missed in zsh/pcre?

Does anyone know of a system extended regexp library which supports
multibyte characters?  I think I should be making the same changes to
zsh/regex but am not sure where to actually test those changes.

Regards,
-Phil