zsh-workers
 help / color / mirror / code / Atom feed
* 5.4.1 regression: PCRE with bash_rematch
@ 2017-08-13 20:49 Phil Pennock
  2017-08-13 21:12 ` Phil Pennock
  0 siblings, 1 reply; 3+ messages in thread
From: Phil Pennock @ 2017-08-13 20:49 UTC (permalink / raw)
  To: zsh-workers

I'm still trying to track this down, but getting what I have out there
because in checking just now, I saw that a 5.4.2 might be imminent and I
think this should block.

% /usr/local/Cellar/zsh/5.4.1/bin/zsh -f
osmium% setopt rematchpcre bash_rematch
osmium% [[ "server" =~ ^[^@:/]+$ ]]
zsh: segmentation fault (core dumped)  /usr/local/Cellar/zsh/5.4.1/bin/zsh -f

One of my git utility scripts broke on me when I tried to use it today,
with 5.4.1 on macOS installed by brew.  `brew switch zsh 5.3.1_1` got it
working again.  With a local git-head install built with memory
debugging, I get more details.

zsh(29859,0x7fffe51263c0) malloc: *** mach_vm_map(size=18446744071562072064) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
+./git-clone-each-pdp:49> [[ "$Remote" -pcre-match ^[^@:/]+$./git-clone-each-pdp:49: fatal error: out of memory

The line is:

	elif [[ "$Remote" =~ ^[^@:/]+$ ]]; then

The value of Remote is just "server" (without quotes).

rematchpcre bash_rematch and ksh_arrays are set, only the first two are
needed to reproduce under -f; actually only bash_rematch if -pcre-match
is explicit:

% /opt/zsh-devel/bin/zsh -f
osmium% setopt bash_rematch
osmium% zmodload zsh/pcre
osmium% [[ "server" -pcre-match ^[^@:/]+$ ]]
zsh: segmentation fault (core dumped)  /opt/zsh-devel/bin/zsh -f

With that last one:
-------------------------------8< lldb >8-------------------------------
% lldb -c /cores/core.29982 zsh
(lldb) target create "zsh" --core "/cores/core.29982"
warning: (x86_64) /cores/core.29982 load command 367 LC_SEGMENT_64 has a fileoff + filesize (0x2d11f000) that extends beyond the end of the file (0x2d11e000), the segment will be truncated to match
warning: (x86_64) /cores/core.29982 load command 368 LC_SEGMENT_64 has a fileoff (0x2d11f000) that extends beyond the end of the file (0x2d11e000), ignoring this section
Core file '/cores/core.29982' (x86_64) was loaded.
(lldb) bt
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x0000000108abeef0 zsh`metafy + 64
    frame #1: 0x0000000108c185c8 pcre.so`___lldb_unnamed_symbol4$$pcre.so + 360
    frame #2: 0x0000000108c18b2b pcre.so`___lldb_unnamed_symbol5$$pcre.so + 619
    frame #3: 0x0000000108a4df86 zsh`evalcond + 1254
    frame #4: 0x0000000108a55992 zsh`___lldb_unnamed_symbol30$$zsh + 98
    frame #5: 0x0000000108a50a61 zsh`___lldb_unnamed_symbol18$$zsh + 497
    frame #6: 0x0000000108a50325 zsh`execlist + 869
    frame #7: 0x0000000108a4ff93 zsh`execode + 195
    frame #8: 0x0000000108a6dcfc zsh`loop + 812
    frame #9: 0x0000000108a70ee4 zsh`zsh_main + 1476
    frame #10: 0x00007fffdc212235 libdyld.dylib`start + 1
    frame #11: 0x00007fffdc212235 libdyld.dylib`start + 1
-------------------------------8< lldb >8-------------------------------


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: 5.4.1 regression: PCRE with bash_rematch
  2017-08-13 20:49 5.4.1 regression: PCRE with bash_rematch Phil Pennock
@ 2017-08-13 21:12 ` Phil Pennock
  2017-08-13 22:18   ` [PATCH] Repair BASH_REMATCH with no substrings Phil Pennock
  0 siblings, 1 reply; 3+ messages in thread
From: Phil Pennock @ 2017-08-13 21:12 UTC (permalink / raw)
  To: zsh-workers

On 2017-08-13 at 16:49 -0400, Phil Pennock wrote:
> I'm still trying to track this down, but getting what I have out there
> because in checking just now, I saw that a 5.4.2 might be imminent and I
> think this should block.

Definitely; this is a regression from my NUL fixing and trying to
correctly meta/unmeta all parameters going through.  Change 41308 in
commit 825f84c77 exposed a bug introduced in 2011 in commit 2f3c16d40f.

> % /opt/zsh-devel/bin/zsh -f
> osmium% setopt bash_rematch
> osmium% zmodload zsh/pcre
> osmium% [[ "server" -pcre-match ^[^@:/]+$ ]]

186         if (!want_begin_end || nelem) {
187             char **x, **y;
188             int vec_off;
189             y = &captures[capture_start];
190             matches = x = (char **) zalloc(sizeof(char *) * (arrlen(y) + 1));
191             vec_off = 2;
192             do {
193                 if (*y)
194                     *x++ = metafy(*y, ovec[vec_off+1]-ovec[vec_off], META_DUP);
195                 else
196                     *x++ = NULL;
197                 vec_off += 2;
198             } while (*y++);
199             setaparam(substravar, matches);
200         }

We hit problems in line 194, we're capturing for sub-expressions, but
there are no sub-expressions.  We have a regexp with no sub-captures but
through the bash_rematch path we're hitting this which assumes that we
are.


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH] Repair BASH_REMATCH with no substrings
  2017-08-13 21:12 ` Phil Pennock
@ 2017-08-13 22:18   ` Phil Pennock
  0 siblings, 0 replies; 3+ messages in thread
From: Phil Pennock @ 2017-08-13 22:18 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 7018 bytes --]

On 2017-08-13 at 17:12 -0400, Phil Pennock wrote:
> Definitely; this is a regression from my NUL fixing and trying to
> correctly meta/unmeta all parameters going through.  Change 41308 in
> commit 825f84c77 exposed a bug introduced in 2011 in commit 2f3c16d40f.

From 41815a3b5e7324fa188d24f558ea9b0026a1a110 Mon Sep 17 00:00:00 2001
From: Phil Pennock <phil@pennock-tech.com>
Date: Sun, 13 Aug 2017 18:13:41 -0400
Subject: [PATCH] Repair BASH_REMATCH with no substrings

Change 41308 in commit 825f84c77 exposed a bug introduced in 2011 in
commit 2f3c16d40f.  (Both mine).

When we went off the end of the array but measured the length
implicitly, we got lucky before.  After 41308 we were looking up lengths
in stale memory.

Rename some variables, clean up the logic, be easier to understand.

Add tests.
---
 Src/Modules/pcre.c | 68 ++++++++++++++++++++++++++++++++++--------------------
 Test/V07pcre.ztst  | 24 +++++++++++++++++++
 2 files changed, 67 insertions(+), 25 deletions(-)

diff --git a/Src/Modules/pcre.c b/Src/Modules/pcre.c
index 27191d709..659fd22d5 100644
--- a/Src/Modules/pcre.c
+++ b/Src/Modules/pcre.c
@@ -148,7 +148,7 @@ bin_pcre_study(char *nam, UNUSED(char **args), UNUSED(Options ops), UNUSED(int f
 
 /**/
 static int
-zpcre_get_substrings(char *arg, int *ovec, int ret, char *matchvar,
+zpcre_get_substrings(char *arg, int *ovec, int captured_count, char *matchvar,
 		     char *substravar, int want_offset_pair, int matchedinarr,
 		     int want_begin_end)
 {
@@ -156,15 +156,13 @@ zpcre_get_substrings(char *arg, int *ovec, int ret, char *matchvar,
     char offset_all[50];
     int capture_start = 1;
 
-    if (matchedinarr)
+    if (matchedinarr) {
+	/* bash-style captures[0] entire-matched string in the array */
 	capture_start = 0;
-    if (matchvar == NULL)
-	matchvar = "MATCH";
-    if (substravar == NULL)
-	substravar = "match";
-    
+    }
+
     /* captures[0] will be entire matched string, [1] first substring */
-    if (!pcre_get_substring_list(arg, ovec, ret, (const char ***)&captures)) {
+    if (!pcre_get_substring_list(arg, ovec, captured_count, (const char ***)&captures)) {
 	int nelem = arrlen(captures)-1;
 	/* Set to the offsets of the complete match */
 	if (want_offset_pair) {
@@ -176,30 +174,43 @@ zpcre_get_substrings(char *arg, int *ovec, int ret, char *matchvar,
 	 * difference between the two values in each paired entry in ovec.
 	 * ovec is length 2*(1+capture_list_length)
 	 */
-	match_all = metafy(captures[0], ovec[1] - ovec[0], META_DUP);
-	setsparam(matchvar, match_all);
+	if (matchvar) {
+	    match_all = metafy(captures[0], ovec[1] - ovec[0], META_DUP);
+	    setsparam(matchvar, match_all);
+	}
 	/*
 	 * If we're setting match, mbegin, mend we only do
 	 * so if there were parenthesised matches, for consistency
-	 * (c.f. regex.c).
+	 * (c.f. regex.c).  That's the next block after this one.
+	 * Here we handle the simpler case where we don't worry about
+	 * Unicode lengths, etc.
+	 * Either !want_begin_end (ie, this is bash) or nelem; if bash
+	 * then we're invoked always, even without nelem results, to
+	 * set the array variable with one element in it, the complete match.
 	 */
-	if (!want_begin_end || nelem) {
+	if (substravar &&
+	    (!want_begin_end || nelem)) {
 	    char **x, **y;
-	    int vec_off;
+	    int vec_off, i;
 	    y = &captures[capture_start];
-	    matches = x = (char **) zalloc(sizeof(char *) * (arrlen(y) + 1));
-	    vec_off = 2;
-	    do {
+	    matches = x = (char **) zalloc(sizeof(char *) * (captured_count+1-capture_start));
+	    for (i = capture_start; i < captured_count; i++, y++) {
+		vec_off = 2*i;
 		if (*y)
 		    *x++ = metafy(*y, ovec[vec_off+1]-ovec[vec_off], META_DUP);
 		else
 		    *x++ = NULL;
-		vec_off += 2;
-	    } while (*y++);
+	    }
+	    *x = NULL;
 	    setaparam(substravar, matches);
 	}
 
 	if (want_begin_end) {
+	    /*
+	     * cond-infix rather than builtin; also not bash; so we set a bunch
+	     * of variables and arrays to values which require handling Unicode
+	     * lengths
+	     */
 	    char *ptr = arg;
 	    zlong offs = 0;
 	    int clen, leftlen;
@@ -306,7 +317,9 @@ bin_pcre_match(char *nam, char **args, Options ops, UNUSED(int func))
 	zwarnnam(nam, "no pattern has been compiled");
 	return 1;
     }
-    
+
+    matched_portion = "MATCH";
+    receptacle = "match";
     if(OPT_HASARG(ops,c='a')) {
 	receptacle = OPT_ARG(ops,c);
     }
@@ -318,8 +331,8 @@ bin_pcre_match(char *nam, char **args, Options ops, UNUSED(int func))
 	    return 1;
     }
     /* For the entire match, 'Return' the offset byte positions instead of the matched string */
-    if(OPT_ISSET(ops,'b')) want_offset_pair = 1; 
-    
+    if(OPT_ISSET(ops,'b')) want_offset_pair = 1;
+
     if ((ret = pcre_fullinfo(pcre_pattern, pcre_hints, PCRE_INFO_CAPTURECOUNT, &capcount)))
     {
 	zwarnnam(nam, "error %d in fullinfo", ret);
@@ -360,7 +373,7 @@ cond_pcre_match(char **a, int id)
 {
     pcre *pcre_pat;
     const char *pcre_err;
-    char *lhstr, *rhre, *lhstr_plain, *rhre_plain, *avar=NULL;
+    char *lhstr, *rhre, *lhstr_plain, *rhre_plain, *avar, *svar;
     int r = 0, pcre_opts = 0, pcre_errptr, capcnt, *ov, ovsize;
     int lhstr_plain_len, rhre_plain_len;
     int return_value = 0;
@@ -380,8 +393,13 @@ cond_pcre_match(char **a, int id)
     ov = NULL;
     ovsize = 0;
 
-    if (isset(BASHREMATCH))
-	avar="BASH_REMATCH";
+    if (isset(BASHREMATCH)) {
+	svar = NULL;
+	avar = "BASH_REMATCH";
+    } else {
+	svar = "MATCH";
+	avar = "match";
+    }
 
     switch(id) {
 	 case CPCRE_PLAIN:
@@ -414,7 +432,7 @@ cond_pcre_match(char **a, int id)
 		    break;
 		}
                 else if (r>0) {
-		    zpcre_get_substrings(lhstr_plain, ov, r, NULL, avar, 0,
+		    zpcre_get_substrings(lhstr_plain, ov, r, svar, avar, 0,
 					 isset(BASHREMATCH),
 					 !isset(BASHREMATCH));
 		    return_value = 1;
diff --git a/Test/V07pcre.ztst b/Test/V07pcre.ztst
index ab41d33dc..9feeb47fb 100644
--- a/Test/V07pcre.ztst
+++ b/Test/V07pcre.ztst
@@ -142,9 +142,33 @@
   print $?
   [[ foo -pcre-match ^g..$ ]]
   print $?
+  [[ ! foo -pcre-match ^g..$ ]]
+  print $?
 0:infix -pcre-match works
 >0
 >1
+>0
+
+# Bash mode; note zsh documents that variables not updated on match failure,
+# which remains different from bash
+  setopt bash_rematch
+  [[ "goo" -pcre-match ^f.+$ ]] ; print $?
+  [[ "foo" -pcre-match ^f.+$ ]] ; print -l $? _${^BASH_REMATCH[@]}
+  [[ "foot" -pcre-match ^f([aeiou]+)(.)$ ]]; print -l $? _${^BASH_REMATCH[@]}
+  [[ "foo" -pcre-match ^f.+$ ]] ; print -l $? _${^BASH_REMATCH[@]}
+  [[ ! "goo" -pcre-match ^f.+$ ]] ; print $?
+  unsetopt bash_rematch
+0:bash-compatibility works
+>1
+>0
+>_foo
+>0
+>_foot
+>_oo
+>_t
+>0
+>_foo
+>0
 
 # Subshell because crash on failure
   ( setopt re_match_pcre
-- 
2.14.1


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 996 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-08-13 22:18 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-13 20:49 5.4.1 regression: PCRE with bash_rematch Phil Pennock
2017-08-13 21:12 ` Phil Pennock
2017-08-13 22:18   ` [PATCH] Repair BASH_REMATCH with no substrings Phil Pennock

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).