On 2011-03-08 at 09:58 +0000, Peter Stephenson wrote:
> On Tue, 8 Mar 2011 01:52:16 -0500
> Phil Pennock wrote:
> > I'm guessing I need a bunch of calls to metafy() to process the
> > results of extraction in zpcre_get_substrings() ?
>
> You'll need to unmetafy any string getting passed into
> pcre_get_substring_list() and metafy() the resulting captures coming
> out.  You should duplicate any string that needs unmetafying, since
> otherwise it's in place and you may need the metafied form later (you do
> for the string passed in as the first argument).

Okay, it took me far too long to get back around to this, sorry.  :(

Attached is what looks to me to be a correct patch.  With bash_rematch
set, I can do:

  % [[ 'foo→bar' =~ .([^[:ascii:]]). ]]
  % echo $BASH_REMATCH
  o→b →
  % [[ 'foo→bar' =~ .(→.). ]]
  % echo $BASH_REMATCH
  o→ba →b

I'm not sure when I should be using the wcs_strdup() functions and the
like; what I've got appears to work.

None of what I've added appears to be specific to UTF-8.

Is it reasonable to add tests to D07multibyte.ztst for this, with the
zsh/pcre dependency?

Can anyone spot any cases I've missed in zsh/pcre?

Does anyone know of a system extended regexp library which supports
multibyte characters?  I think I should be making the same changes to
zsh/regex, but I am not sure where to actually test those changes.

Regards,
-Phil
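The workflow Peter describes (copy the metafied string, unmetafy the copy, let the regex engine work on raw bytes, then metafy each capture) can be illustrated with a small, self-contained toy model in C. This is a hypothetical sketch, not zsh's code: it does not call zsh's real metafy()/unmetafy() or the PCRE API, and the names toy_metafy, toy_unmetafy and toy_capture are invented for illustration. It assumes zsh's escape convention of a Meta byte (0x83) followed by the original byte XOR 32, and, purely as a simplifying assumption, treats NUL and 0x83 through 0x9F as the bytes that need escaping; zsh's actual imeta() table differs. The start/end arguments stand in for the byte offsets a regex engine would report against the unmetafied subject.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model of zsh-style metafication (illustration only).
 * A byte is escaped as TOY_META followed by the byte XOR 32.  The set of
 * escaped bytes below (NUL and 0x83..0x9F) is an ASSUMPTION, not zsh's
 * real imeta() table. */
#define TOY_META 0x83

static int toy_needs_meta(unsigned char c)
{
    return c == 0 || (c >= 0x83 && c <= 0x9f);
}

/* Metafy len raw bytes into a newly allocated NUL-terminated string. */
static char *toy_metafy(const char *raw, size_t len)
{
    char *out = malloc(2 * len + 1);   /* worst case: every byte escaped */
    size_t j = 0;

    if (!out)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)raw[i];
        if (toy_needs_meta(c)) {
            out[j++] = (char)TOY_META;
            out[j++] = (char)(c ^ 32);
        } else {
            out[j++] = (char)c;
        }
    }
    out[j] = '\0';
    return out;
}

/* Unmetafy s IN PLACE (overwriting the caller's string) and return the
 * raw length, which may count embedded NUL bytes. */
static size_t toy_unmetafy(char *s)
{
    char *src = s, *dst = s;

    while (*src) {
        if ((unsigned char)*src == TOY_META && src[1]) {
            *dst++ = (char)((unsigned char)src[1] ^ 32);
            src += 2;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return (size_t)(dst - s);
}

/* Duplicate the metafied subject so the caller's copy survives, unmetafy
 * the duplicate, cut out raw bytes [start, end) as a regex engine's
 * offsets would describe them, and return that capture metafied again.
 * Returns NULL on allocation failure or out-of-range offsets. */
static char *toy_capture(const char *metafied_subject, size_t start, size_t end)
{
    size_t n = strlen(metafied_subject);
    char *raw = malloc(n + 1);
    char *cap = NULL;
    size_t rawlen;

    if (!raw)
        return NULL;
    memcpy(raw, metafied_subject, n + 1);
    rawlen = toy_unmetafy(raw);
    if (start <= end && end <= rawlen)
        cap = toy_metafy(raw + start, end - start);
    free(raw);
    return cap;
}
```

In this toy model, the UTF-8 encoding of "→" (0xE2 0x86 0x92) contains two bytes in the assumed escape range, so "foo→bar" grows from 9 raw bytes to 11 metafied ones. Offsets computed against the metafied form would therefore not line up with the raw bytes a regex engine sees, which is why the matching and offset arithmetic happen on the unmetafied duplicate.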