UTF-8 and PCRE and metafy

zsh-workers

* UTF-8 and PCRE and metafy
@ 2011-03-08  6:52 Phil Pennock
  2011-03-08  8:19 ` Bart Schaefer
  2011-03-08  9:58 ` Peter Stephenson
  0 siblings, 2 replies; 8+ messages in thread
From: Phil Pennock @ 2011-03-08  6:52 UTC (permalink / raw)
  To: zsh-workers

4.3.11 with rematch_pcre:

% [[ 'foo→bar' =~ ^f.* ]]
zsh: pcre_exec() error: -10

Same with -pcre-match

% locale charmap
UTF-8

Error -10 is PCRE_ERROR_BADUTF8.

In the pcre.c module, we explicitly enable PCRE_UTF8 if UTF8 is in
effect and supported.

Next to the existing
  zwarn("pcre_exec() error: %d", r);
I shoved in a couple more zwarn()s to confirm that the string is in
non-meta form:
  zwarn("pcre_exec() error: %d", r);
  zwarn("lhstr: %s", lhstr);
  zwarn("rhre: /%s/", rhre);
→
  zsh: pcre_exec() error: -10
  zsh: lhstr: foo→bar
  zsh: rhre: /^f.*/

pcretest(1):
% pcretest
PCRE version 8.12 2011-01-15

  re> /^f.*/
data> foo→bar
 0: foo\xe2\x86\x92bar

Okay, so as long as the char is making it through intact as UTF-8 then
PCRE should be handling it.

Dumping each char of lhstr as an int shows that it is *not* in non-meta
form.  So why does it print just fine?  :(

% [[ 'foo→bar' =~ ^f.* ]]
zsh: pcre_exec() error: -10
zsh: lhstr: foo→bar
zsh: lhstr/%l: foo→bar
zsh: rhre: /^f.*/
zsh: utf-8 enabled?  1
zsh: lhstr char* item: 102
zsh: lhstr char* item: 111
zsh: lhstr char* item: 111
zsh: lhstr char* item: -30
zsh: lhstr char* item: -125
zsh: lhstr char* item: -90
zsh: lhstr char* item: -125
zsh: lhstr char* item: -78
zsh: lhstr char* item: 98
zsh: lhstr char* item: 97
zsh: lhstr char* item: 114

So after line 336 of pcre.c I add:

    unmetafy(lhstr, NULL);

Test:
% unset preexec_functions ; unfunction precmd
% [[ 'foo→bar' =~ ^f.* ]] ; print -l $? $MATCH foo $match
 pattern.c:1403: BUG: - missing from numeric glob
0
foo?^<bar
foo
zefram

I'm guessing I need a bunch of calls to metafy() to process the results
of extraction in zpcre_get_substrings() ?  Where does the string
"zefram" come from?  I mean, Andrew is capable and all, but springing
into existence like that was surprising.

Is there guidance on correct API usage here for calling metafy() and
having lengths all match up?

-Phil

* Re: UTF-8 and PCRE and metafy
  2011-03-08  6:52 UTF-8 and PCRE and metafy Phil Pennock
@ 2011-03-08  8:19 ` Bart Schaefer
  2011-03-08  9:58 ` Peter Stephenson
  1 sibling, 0 replies; 8+ messages in thread
From: Bart Schaefer @ 2011-03-08  8:19 UTC (permalink / raw)
  To: zsh-workers

On Mar 8,  1:52am, Phil Pennock wrote:
}
} I'm guessing I need a bunch of calls to metafy() to process the results
} of extraction in zpcre_get_substrings() ?  Where does the string
} "zefram" come from?  I mean, Andrew is capable and all, but springing
} into existence like that was surprising.

I'm not really the one to answer the metafy() question, but "zefram" is
almost certainly coming from matching prompt_(#b)(*)_setup in promptinit.

* Re: UTF-8 and PCRE and metafy
  2011-03-08  6:52 UTF-8 and PCRE and metafy Phil Pennock
  2011-03-08  8:19 ` Bart Schaefer
@ 2011-03-08  9:58 ` Peter Stephenson
  2011-10-21  9:56   ` [patch] " Phil Pennock
  1 sibling, 1 reply; 8+ messages in thread
From: Peter Stephenson @ 2011-03-08  9:58 UTC (permalink / raw)
  To: zsh-workers

On Tue, 8 Mar 2011 01:52:16 -0500
Phil Pennock <zsh-workers+phil.pennock@spodhuis.org> wrote:
> I'm guessing I need a bunch of calls to metafy() to process the
> results of extraction in zpcre_get_substrings() ?

You'll need to unmetafy any string getting passed into
pcre_get_substring_list() and metafy() the resulting captures coming
out.  You should duplicate any string that needs unmetafying, since
otherwise it's in place and you may need the metafied form later (you do
for the string passed in as the first argument).

-- 
Peter Stephenson <pws@csr.com>            Software Engineer
Tel: +44 (0)1223 692070                   Cambridge Silicon Radio Limited
Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, UK

Member of the CSR plc group of companies. CSR plc registered in England and Wales, registered number 4187346, registered office Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, United Kingdom

* [patch] Re: UTF-8 and PCRE and metafy
  2011-03-08  9:58 ` Peter Stephenson
@ 2011-10-21  9:56   ` Phil Pennock
  2011-10-21 10:35     ` Phil Pennock
  2011-10-23 16:32     ` [patch] " Peter Stephenson
  0 siblings, 2 replies; 8+ messages in thread
From: Phil Pennock @ 2011-10-21  9:56 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 1441 bytes --]

On 2011-03-08 at 09:58 +0000, Peter Stephenson wrote:
> On Tue, 8 Mar 2011 01:52:16 -0500
> Phil Pennock <zsh-workers+phil.pennock@spodhuis.org> wrote:
> > I'm guessing I need a bunch of calls to metafy() to process the
> > results of extraction in zpcre_get_substrings() ?
> 
> You'll need to unmetafy any string getting passed into
> pcre_get_substring_list() and metafy() the resulting captures coming
> out.  You should duplicate any string that needs unmetafying, since
> otherwise it's in place and you may need the metafied form later (you do
> for the string passed in as the first argument).

Okay, it took me far too long to get back around to this, sorry. :(

Attached is what looks to me to be a correct patch.  With bash_rematch
set, I can do:
  % [[ 'foo→bar' =~ .([^[:ascii:]]). ]]
  % echo $BASH_REMATCH 
  o→b →
  % [[ 'foo→bar' =~ .(→.). ]]
  % echo $BASH_REMATCH
  o→ba →b


I'm not sure on when I should be using the wcs_strdup() functions and
the like; what I've got appears to work.  None of what I've added
appears to be specific to UTF-8.

Is it reasonable to add tests to D07multibyte.ztst for this, with the
zsh/pcre dependency?

Can anyone spot any cases I've missed in zsh/pcre ?

Does anyone know of a system extended regexp library which supports
multibyte characters?  I think I should be making the same changes to
zsh/regex but am not sure where to actually test those changes.

Regards,
-Phil

[-- Attachment #2: pcre-utf8.patch --]
[-- Type: text/x-diff, Size: 5439 bytes --]

Index: Src/Modules/pcre.c
===================================================================
RCS file: /home/cvsroot/zsh/Src/Modules/pcre.c,v
retrieving revision 1.18
diff -a -u -p -r1.18 pcre.c
--- Src/Modules/pcre.c	20 Jan 2010 11:17:11 -0000	1.18
+++ Src/Modules/pcre.c	21 Oct 2011 09:43:29 -0000
@@ -77,6 +77,7 @@ bin_pcre_compile(char *nam, char **args,
 {
     int pcre_opts = 0, pcre_errptr;
     const char *pcre_error;
+    char *target;
     
     if(OPT_ISSET(ops,'a')) pcre_opts |= PCRE_ANCHORED;
     if(OPT_ISSET(ops,'i')) pcre_opts |= PCRE_CASELESS;
@@ -92,8 +93,13 @@ bin_pcre_compile(char *nam, char **args,
     if (pcre_pattern)
 	pcre_free(pcre_pattern);
 
-    pcre_pattern = pcre_compile(*args, pcre_opts, &pcre_error, &pcre_errptr, NULL);
+    target = ztrdup(*args);
+    unmetafy(target, NULL);
+
+    pcre_pattern = pcre_compile(target, pcre_opts, &pcre_error, &pcre_errptr, NULL);
     
+    free(target);
+
     if (pcre_pattern == NULL)
     {
 	zwarnnam(nam, "error in regex: %s", pcre_error);
@@ -161,7 +167,7 @@ zpcre_get_substrings(char *arg, int *ove
 	    sprintf(offset_all, "%d %d", ovec[0], ovec[1]);
 	    setsparam("ZPCRE_OP", ztrdup(offset_all));
 	}
-	match_all = ztrdup(captures[0]);
+	match_all = metafy(captures[0], -1, META_DUP);
 	setsparam(matchvar, match_all);
 	/*
 	 * If we're setting match, mbegin, mend we only do
@@ -169,7 +175,15 @@ zpcre_get_substrings(char *arg, int *ove
 	 * (c.f. regex.c).
 	 */
 	if (!want_begin_end || nelem) {
-	    matches = zarrdup(&captures[capture_start]);
+	    char **x, **y;
+	    y = &captures[capture_start];
+	    matches = x = (char **) zalloc(sizeof(char *) * (arrlen(y) + 1));
+	    do {
+		if (*y)
+		    *x++ = metafy(*y, -1, META_DUP);
+		else
+		    *x++ = NULL;
+	    } while (*y++);
 	    setaparam(substravar, matches);
 	}
 
@@ -255,6 +269,7 @@ bin_pcre_match(char *nam, char **args, O
 {
     int ret, capcount, *ovec, ovecsize, c;
     char *matched_portion = NULL;
+    char *plaintext = NULL;
     char *receptacle = NULL;
     int return_value = 1;
     /* The subject length and offset start are both int values in pcre_exec */
@@ -292,22 +307,23 @@ bin_pcre_match(char *nam, char **args, O
     ovecsize = (capcount+1)*3;
     ovec = zalloc(ovecsize*sizeof(int));
     
-    subject_len = (int)strlen(*args);
+    plaintext = ztrdup(*args);
+    subject_len = (int)strlen(plaintext);
 
     if (offset_start < 0 || offset_start >= subject_len)
 	ret = PCRE_ERROR_NOMATCH;
     else
-	ret = pcre_exec(pcre_pattern, pcre_hints, *args, subject_len, offset_start, 0, ovec, ovecsize);
+	ret = pcre_exec(pcre_pattern, pcre_hints, plaintext, subject_len, offset_start, 0, ovec, ovecsize);
 
     if (ret==0) return_value = 0;
     else if (ret==PCRE_ERROR_NOMATCH) /* no match */;
     else if (ret>0) {
-	zpcre_get_substrings(*args, ovec, ret, matched_portion, receptacle,
+	zpcre_get_substrings(plaintext, ovec, ret, matched_portion, receptacle,
 			     want_offset_pair, 0, 0);
 	return_value = 0;
     }
     else {
-	zwarnnam(nam, "error in pcre_exec");
+	zwarnnam(nam, "error in pcre_exec [%d]", ret);
     }
     
     if (ovec)
@@ -322,7 +338,8 @@ cond_pcre_match(char **a, int id)
 {
     pcre *pcre_pat;
     const char *pcre_err;
-    char *lhstr, *rhre, *avar=NULL;
+    char *lhstr, *rhre, *lhstr_plain, *rhre_plain, *avar=NULL;
+    char *p;
     int r = 0, pcre_opts = 0, pcre_errptr, capcnt, *ov, ovsize;
     int return_value = 0;
 
@@ -331,6 +348,10 @@ cond_pcre_match(char **a, int id)
 
     lhstr = cond_str(a,0,0);
     rhre = cond_str(a,1,0);
+    lhstr_plain = ztrdup(lhstr);
+    rhre_plain = ztrdup(rhre);
+    unmetafy(lhstr_plain, NULL);
+    unmetafy(rhre_plain, NULL);
     pcre_pat = NULL;
     ov = NULL;
 
@@ -339,7 +360,7 @@ cond_pcre_match(char **a, int id)
 
     switch(id) {
 	 case CPCRE_PLAIN:
-		pcre_pat = pcre_compile(rhre, pcre_opts, &pcre_err, &pcre_errptr, NULL);
+		pcre_pat = pcre_compile(rhre_plain, pcre_opts, &pcre_err, &pcre_errptr, NULL);
 		if (pcre_pat == NULL) {
 		    zwarn("failed to compile regexp /%s/: %s", rhre, pcre_err);
 		    break;
@@ -347,7 +368,7 @@ cond_pcre_match(char **a, int id)
                 pcre_fullinfo(pcre_pat, NULL, PCRE_INFO_CAPTURECOUNT, &capcnt);
     		ovsize = (capcnt+1)*3;
 		ov = zalloc(ovsize*sizeof(int));
-    		r = pcre_exec(pcre_pat, NULL, lhstr, strlen(lhstr), 0, 0, ov, ovsize);
+    		r = pcre_exec(pcre_pat, NULL, lhstr_plain, strlen(lhstr_plain), 0, 0, ov, ovsize);
 		/* r < 0 => error; r==0 match but not enough size in ov
 		 * r > 0 => (r-1) substrings found; r==1 => no substrings
 		 */
@@ -356,13 +377,16 @@ cond_pcre_match(char **a, int id)
 		    return_value = 1;
 		    break;
 		}
-	        else if (r==PCRE_ERROR_NOMATCH) return 0; /* no match */
+	        else if (r==PCRE_ERROR_NOMATCH) {
+		    return_value = 0; /* no match */
+		    break;
+		}
 		else if (r<0) {
-		    zwarn("pcre_exec() error: %d", r);
+		    zwarn("pcre_exec() error [%d]", r);
 		    break;
 		}
                 else if (r>0) {
-		    zpcre_get_substrings(lhstr, ov, r, NULL, avar, 0,
+		    zpcre_get_substrings(lhstr_plain, ov, r, NULL, avar, 0,
 					 isset(BASHREMATCH),
 					 !isset(BASHREMATCH));
 		    return_value = 1;
@@ -371,6 +395,10 @@ cond_pcre_match(char **a, int id)
 		break;
     }
 
+    if (lhstr_plain)
+	free(lhstr_plain);
+    if(rhre_plain)
+	free(rhre_plain);
     if (pcre_pat)
 	pcre_free(pcre_pat);
     if (ov)

* Re: UTF-8 and PCRE and metafy
  2011-10-21  9:56   ` [patch] " Phil Pennock
@ 2011-10-21 10:35     ` Phil Pennock
  2011-10-23 16:32     ` [patch] " Peter Stephenson
  1 sibling, 0 replies; 8+ messages in thread
From: Phil Pennock @ 2011-10-21 10:35 UTC (permalink / raw)
  To: zsh-workers; +Cc: Peter Stephenson

[-- Attachment #1: Type: text/plain, Size: 308 bytes --]

On 2011-10-21 at 05:56 -0400, Phil Pennock wrote:
> Attached is what looks to me to be a correct patch.  With bash_rematch

Oops, for the change around line 292, I switched to a new variable but
never called unmetafy().

One line added in the attached patch, which replaces the previous patch.

Sorry,
-Phil

[-- Attachment #2: pcre-utf8.patch --]
[-- Type: text/x-diff, Size: 5837 bytes --]

Index: Src/Modules/pcre.c
===================================================================
RCS file: /home/cvsroot/zsh/Src/Modules/pcre.c,v
retrieving revision 1.18
diff -a -u -p -r1.18 pcre.c
--- Src/Modules/pcre.c	20 Jan 2010 11:17:11 -0000	1.18
+++ Src/Modules/pcre.c	21 Oct 2011 10:29:14 -0000
@@ -77,6 +77,7 @@ bin_pcre_compile(char *nam, char **args,
 {
     int pcre_opts = 0, pcre_errptr;
     const char *pcre_error;
+    char *target;
     
     if(OPT_ISSET(ops,'a')) pcre_opts |= PCRE_ANCHORED;
     if(OPT_ISSET(ops,'i')) pcre_opts |= PCRE_CASELESS;
@@ -92,8 +93,13 @@ bin_pcre_compile(char *nam, char **args,
     if (pcre_pattern)
 	pcre_free(pcre_pattern);
 
-    pcre_pattern = pcre_compile(*args, pcre_opts, &pcre_error, &pcre_errptr, NULL);
+    target = ztrdup(*args);
+    unmetafy(target, NULL);
+
+    pcre_pattern = pcre_compile(target, pcre_opts, &pcre_error, &pcre_errptr, NULL);
     
+    free(target);
+
     if (pcre_pattern == NULL)
     {
 	zwarnnam(nam, "error in regex: %s", pcre_error);
@@ -161,7 +167,7 @@ zpcre_get_substrings(char *arg, int *ove
 	    sprintf(offset_all, "%d %d", ovec[0], ovec[1]);
 	    setsparam("ZPCRE_OP", ztrdup(offset_all));
 	}
-	match_all = ztrdup(captures[0]);
+	match_all = metafy(captures[0], -1, META_DUP);
 	setsparam(matchvar, match_all);
 	/*
 	 * If we're setting match, mbegin, mend we only do
@@ -169,7 +175,15 @@ zpcre_get_substrings(char *arg, int *ove
 	 * (c.f. regex.c).
 	 */
 	if (!want_begin_end || nelem) {
-	    matches = zarrdup(&captures[capture_start]);
+	    char **x, **y;
+	    y = &captures[capture_start];
+	    matches = x = (char **) zalloc(sizeof(char *) * (arrlen(y) + 1));
+	    do {
+		if (*y)
+		    *x++ = metafy(*y, -1, META_DUP);
+		else
+		    *x++ = NULL;
+	    } while (*y++);
 	    setaparam(substravar, matches);
 	}
 
@@ -255,6 +269,7 @@ bin_pcre_match(char *nam, char **args, O
 {
     int ret, capcount, *ovec, ovecsize, c;
     char *matched_portion = NULL;
+    char *plaintext = NULL;
     char *receptacle = NULL;
     int return_value = 1;
     /* The subject length and offset start are both int values in pcre_exec */
@@ -278,7 +293,7 @@ bin_pcre_match(char *nam, char **args, O
     }
     /* For the entire match, 'Return' the offset byte positions instead of the matched string */
     if(OPT_ISSET(ops,'b')) want_offset_pair = 1; 
-    
+
     if(!*args) {
 	zwarnnam(nam, "not enough arguments");
     }
@@ -288,26 +303,28 @@ bin_pcre_match(char *nam, char **args, O
 	zwarnnam(nam, "error %d in fullinfo", ret);
 	return 1;
     }
-    
+
     ovecsize = (capcount+1)*3;
     ovec = zalloc(ovecsize*sizeof(int));
-    
-    subject_len = (int)strlen(*args);
+
+    plaintext = ztrdup(*args);
+    unmetafy(plaintext, NULL);
+    subject_len = (int)strlen(plaintext);
 
     if (offset_start < 0 || offset_start >= subject_len)
 	ret = PCRE_ERROR_NOMATCH;
     else
-	ret = pcre_exec(pcre_pattern, pcre_hints, *args, subject_len, offset_start, 0, ovec, ovecsize);
+	ret = pcre_exec(pcre_pattern, pcre_hints, plaintext, subject_len, offset_start, 0, ovec, ovecsize);
 
     if (ret==0) return_value = 0;
     else if (ret==PCRE_ERROR_NOMATCH) /* no match */;
     else if (ret>0) {
-	zpcre_get_substrings(*args, ovec, ret, matched_portion, receptacle,
+	zpcre_get_substrings(plaintext, ovec, ret, matched_portion, receptacle,
 			     want_offset_pair, 0, 0);
 	return_value = 0;
     }
     else {
-	zwarnnam(nam, "error in pcre_exec");
+	zwarnnam(nam, "error in pcre_exec [%d]", ret);
     }
     
     if (ovec)
@@ -322,7 +339,8 @@ cond_pcre_match(char **a, int id)
 {
     pcre *pcre_pat;
     const char *pcre_err;
-    char *lhstr, *rhre, *avar=NULL;
+    char *lhstr, *rhre, *lhstr_plain, *rhre_plain, *avar=NULL;
+    char *p;
     int r = 0, pcre_opts = 0, pcre_errptr, capcnt, *ov, ovsize;
     int return_value = 0;
 
@@ -331,6 +349,10 @@ cond_pcre_match(char **a, int id)
 
     lhstr = cond_str(a,0,0);
     rhre = cond_str(a,1,0);
+    lhstr_plain = ztrdup(lhstr);
+    rhre_plain = ztrdup(rhre);
+    unmetafy(lhstr_plain, NULL);
+    unmetafy(rhre_plain, NULL);
     pcre_pat = NULL;
     ov = NULL;
 
@@ -339,7 +361,7 @@ cond_pcre_match(char **a, int id)
 
     switch(id) {
 	 case CPCRE_PLAIN:
-		pcre_pat = pcre_compile(rhre, pcre_opts, &pcre_err, &pcre_errptr, NULL);
+		pcre_pat = pcre_compile(rhre_plain, pcre_opts, &pcre_err, &pcre_errptr, NULL);
 		if (pcre_pat == NULL) {
 		    zwarn("failed to compile regexp /%s/: %s", rhre, pcre_err);
 		    break;
@@ -347,7 +369,7 @@ cond_pcre_match(char **a, int id)
                 pcre_fullinfo(pcre_pat, NULL, PCRE_INFO_CAPTURECOUNT, &capcnt);
     		ovsize = (capcnt+1)*3;
 		ov = zalloc(ovsize*sizeof(int));
-    		r = pcre_exec(pcre_pat, NULL, lhstr, strlen(lhstr), 0, 0, ov, ovsize);
+    		r = pcre_exec(pcre_pat, NULL, lhstr_plain, strlen(lhstr_plain), 0, 0, ov, ovsize);
 		/* r < 0 => error; r==0 match but not enough size in ov
 		 * r > 0 => (r-1) substrings found; r==1 => no substrings
 		 */
@@ -356,13 +378,16 @@ cond_pcre_match(char **a, int id)
 		    return_value = 1;
 		    break;
 		}
-	        else if (r==PCRE_ERROR_NOMATCH) return 0; /* no match */
+	        else if (r==PCRE_ERROR_NOMATCH) {
+		    return_value = 0; /* no match */
+		    break;
+		}
 		else if (r<0) {
-		    zwarn("pcre_exec() error: %d", r);
+		    zwarn("pcre_exec() error [%d]", r);
 		    break;
 		}
                 else if (r>0) {
-		    zpcre_get_substrings(lhstr, ov, r, NULL, avar, 0,
+		    zpcre_get_substrings(lhstr_plain, ov, r, NULL, avar, 0,
 					 isset(BASHREMATCH),
 					 !isset(BASHREMATCH));
 		    return_value = 1;
@@ -371,6 +396,10 @@ cond_pcre_match(char **a, int id)
 		break;
     }
 
+    if (lhstr_plain)
+	free(lhstr_plain);
+    if(rhre_plain)
+	free(rhre_plain);
     if (pcre_pat)
 	pcre_free(pcre_pat);
     if (ov)

* Re: [patch] Re: UTF-8 and PCRE and metafy
  2011-10-21  9:56   ` [patch] " Phil Pennock
  2011-10-21 10:35     ` Phil Pennock
@ 2011-10-23 16:32     ` Peter Stephenson
  2011-10-24 11:35       ` Phil Pennock
  1 sibling, 1 reply; 8+ messages in thread
From: Peter Stephenson @ 2011-10-23 16:32 UTC (permalink / raw)
  To: zsh-workers

On Fri, 21 Oct 2011 05:56:25 -0400
Phil Pennock <zsh-workers+phil.pennock@spodhuis.org> wrote:
> On 2011-03-08 at 09:58 +0000, Peter Stephenson wrote:
> > On Tue, 8 Mar 2011 01:52:16 -0500
> > Phil Pennock <zsh-workers+phil.pennock@spodhuis.org> wrote:
> > > I'm guessing I need a bunch of calls to metafy() to process the
> > > results of extraction in zpcre_get_substrings() ?
> > 
> > You'll need to unmetafy any string getting passed into
> > pcre_get_substring_list() and metafy() the resulting captures coming
> > out.  You should duplicate any string that needs unmetafying, since
> > otherwise it's in place and you may need the metafied form later (you do
> > for the string passed in as the first argument).
> 
> Okay, it took me far too long to get back around to this, sorry. :(
> 
> Attached is what looks to me to be a correct patch.

I didn't look through in great detail, so I haven't validated the
structure, but certainly what I saw looked fine.
 
> I'm not sure on when I should be using the wcs_strdup() functions and
> the like; what I've got appears to work.  None of what I've added
> appears to be specific to UTF-8.

You probably don't need wcs_strdup().  When metafied, normal str*
functions work because NUL bytes are converted to Meta + space; when not
metafied you should keep the length around in a variable and use mem*
functions.
 
> Is it reasonable to add tests to D07multibyte.ztst for this, with the
> zsh/pcre dependency?

You'd probably need to encapsulate it within a test for loading the
library, given which it's probably easier just to copy D07multibyte.ztst
and add to the prerequisites a test for loading zsh/pcre at that point,
which ought to be easy.
 
-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/


* Re: [patch] Re: UTF-8 and PCRE and metafy
  2011-10-23 16:32     ` [patch] " Peter Stephenson
@ 2011-10-24 11:35       ` Phil Pennock
  2011-10-24 11:43         ` Peter Stephenson
  0 siblings, 1 reply; 8+ messages in thread
From: Phil Pennock @ 2011-10-24 11:35 UTC (permalink / raw)
  To: zsh-workers

On 2011-10-23 at 17:32 +0100, Peter Stephenson wrote:
> I didn't look through in great detail, so I haven't validated the
> structure, but certainly what I saw looked fine

Committed; 1.19 of pcre.c.

> You'd probably need to encapsulate it within a test for loading the
> library, given which it's probably easier just to copy D07multibyte.ztst
> and add to the prerequisites a test for loading zsh/pcre at that point,
> which ought to be easy.

I even tested the test before commit!  *cough*  Which shows that I still
have Y01completion.ztst hanging forever for me; need to figure that out.

Added "V07pcre.ztst", added a few tests.

At some point, there was a discussion about whether $MATCH/$match should
be _unset_ when regex tests fail: at present we don't.  Did anyone have
any strong opinions on this?  At the moment, I unset manually after each
print of results, for a batch of simple tests I added.

-Phil

* Re: [patch] Re: UTF-8 and PCRE and metafy
  2011-10-24 11:35       ` Phil Pennock
@ 2011-10-24 11:43         ` Peter Stephenson
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Stephenson @ 2011-10-24 11:43 UTC (permalink / raw)
  To: zsh-workers

On Mon, 24 Oct 2011 07:35:29 -0400
Phil Pennock <zsh-workers+phil.pennock@spodhuis.org> wrote:
> At some point, there was a discussion about whether $MATCH/$match should
> be _unset_ when regex tests fail: at present we don't.  Did anyone have
> any strong opinions on this?  At the moment, I unset manually after each
> print of results, for a batch of simple tests I added.

I doubt anyone's dependent on which way this is written; it would be an
odd way to write code.  However, it's sort of semi-standard that if what
you were doing didn't work you don't touch the variables you would have
set.  It's possible there are already cases where this isn't true.

-- 
Peter Stephenson <pws@csr.com>            Software Engineer
