Strange behavior of [[

zsh-workers

* Strange behavior of [[
@ 2015-06-10  3:27 Maxime Arthaud
  2015-06-10  5:31 ` Bart Schaefer
  2016-01-08 13:09 ` Jun T.
  0 siblings, 2 replies; 9+ messages in thread
From: Maxime Arthaud @ 2015-06-10  3:27 UTC
  To: zsh-workers

Hi everybody!

I just found a very strange behavior in zsh (v5.0.8).

% [[ " X" =~ "X" ]]
where in " X" the first character is a non-breaking space (0xa0).
My shell gets stuck, and Ctrl-C is not working. With bash, no problem.

Does anyone have an explanation? I think it's a bug.

Regards,

-- 
Maxime



* Re: Strange behavior of [[
  2015-06-10  3:27 Strange behavior of [[ Maxime Arthaud
@ 2015-06-10  5:31 ` Bart Schaefer
  2015-06-10  8:55   ` Peter Stephenson
  2016-01-08 13:09 ` Jun T.
  1 sibling, 1 reply; 9+ messages in thread
From: Bart Schaefer @ 2015-06-10  5:31 UTC
  To: zsh-workers

On Jun 9,  8:27pm, Maxime Arthaud wrote:
} Subject: Strange behavior of [[
}
} Hi everybody!
} 
} I just found a very strange behavior in zsh (v5.0.8).
} 
} % [[ " X" =~ "X" ]]
} where in " X" the first character is a non-breaking space (0xa0).
} My shell gets stuck, and Ctrl-C is not working. With bash, no problem.
} 
} Does anyone have an explanation? I think it's a bug.

MB_METACHARLEN() is returning that 0xa0 is a zero-width character, so
"ptr" in the "while (ptr < lhstr + m->rm_so)" loop in regex.c never
advances.  That macro ultimately resolves to mb_metacharlenconv_r()
from utils.c, which returns zero here:

4861		return 0;		/* Probably shouldn't happen */

This means that imeta() is (incorrectly?) returning true for 0xa0, which
might mean that we're passing an unmetafied string where a metafied
string is expected.
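
As a toy illustration of the hang described above (this is not zsh code: toy_charlen() is a hypothetical stand-in for a length function that reports zero for the byte 0xa0), a loop of this shape makes no progress once the length comes back as zero. The sketch returns -1 at the point where the unguarded loop would spin:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: reports length 0 for 0xa0, 1 for any other byte. */
static size_t toy_charlen(const char *p)
{
    return ((unsigned char)*p == 0xa0) ? 0 : 1;
}

/*
 * Count the characters in s[0..n).  Returns -1 instead of spinning
 * when the length function makes no progress.
 */
static long count_or_fail(const char *s, size_t n)
{
    const char *ptr = s;
    long offs = 0;

    assert(s != NULL);
    while (ptr < s + n) {
        size_t clen = toy_charlen(ptr);
        if (clen == 0)
            return -1;  /* an unguarded "ptr += clen" would loop forever here */
        ptr += clen;
        offs++;
    }
    return offs;
}
```

Calling count_or_fail("\xa0X", 1) hits the zero-length case immediately, which is the situation the reported command puts the real loop in.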


* Re: Strange behavior of [[
  2015-06-10  5:31 ` Bart Schaefer
@ 2015-06-10  8:55   ` Peter Stephenson
  2015-06-11 16:59     ` Peter Stephenson
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Stephenson @ 2015-06-10  8:55 UTC
  To: zsh-workers

On Tue, 9 Jun 2015 22:31:56 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Jun 9,  8:27pm, Maxime Arthaud wrote:
> } Subject: Strange behavior of [[
> }
> } Hi everybody!
> } 
> } I just found a very strange behavior in zsh (v5.0.8).
> } 
> } % [[ " X" =~ "X" ]]
> } where in " X" the first character is a non-breaking space (0xa0).
> } My shell gets stuck, and Ctrl-C is not working. With bash, no problem.
> } 
> } Does anyone have an explanation? I think it's a bug.
> 
> MB_METACHARLEN() is returning that 0xa0 is a zero-width character, so
> "ptr" in the "while (ptr < lhstr + m->rm_so)" loop in regex.c never
> advances.  That macro ultimately resolves to mb_metacharlenconv_r()
> from utils.c, which returns zero here:
> 
> 4861		return 0;		/* Probably shouldn't happen */
>
> This means that imeta() is (incorrectly?) returning true for 0xa0, which
> might mean that we're passing an unmetafied string where a metafied
> string is expected.

Yes, that's obvious from the context.  You can see lhstr being metafied
above to go into a variable, but the unmetafied variant is then handled
as if it were metafied.  The problem is that the match offsets from the
regexp library are all in unmetafied form.  Rather than attempt to metafy
with those, it probably needs to change to use mbrtowc() based on
unmetafied characters, with simple code for the case of no
MULTIBYTE_SUPPORT.
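
A minimal sketch of that idea, assuming nothing about zsh internals: count the characters in an unmetafied byte range of known length using mbrtowc(), and fall back to treating a byte as one character when the sequence is invalid or incomplete. The name count_chars() and the fallback policy are illustrative assumptions, not the code that was eventually committed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Count the characters in the first len bytes of s.  The range need
 * not be NUL-terminated; its extent is given only by len.
 */
static long count_chars(const char *s, size_t len)
{
    mbstate_t st;
    long n = 0;

    assert(s != NULL);
    memset(&st, 0, sizeof st);
    while (len > 0) {
        size_t r = mbrtowc(NULL, s, len, &st);

        if (r == (size_t)-1 || r == (size_t)-2 || r == 0) {
            /* Invalid, incomplete, or NUL: consume one byte and reset. */
            memset(&st, 0, sizeof st);
            r = 1;
        }
        s += r;
        len -= r;
        n++;
    }
    return n;
}
```

Because every iteration consumes at least one byte, the loop always terminates. For multibyte input to be decoded as such, the program would first have to select a suitable locale with setlocale(LC_ALL, ""); in the default "C" locale each byte simply counts as one character.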

pws



* Re: Strange behavior of [[
  2015-06-10  8:55   ` Peter Stephenson
@ 2015-06-11 16:59     ` Peter Stephenson
  2015-06-22 15:56       ` m0viefreak
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Stephenson @ 2015-06-11 16:59 UTC
  To: zsh-workers

The change from mb_metacharinit() to mb_charinit() is a bit unsightly
but the name had got just plain confusing --- there's nothing meta about
it.  I never metacharacter I couldn't parse.

(No multibyte characters were harmed in the preparation of this email;
I've used $'\ua0'.)

pws

diff --git a/Src/Modules/curses.c b/Src/Modules/curses.c
index 41ad2c6..62dbd55 100644
--- a/Src/Modules/curses.c
+++ b/Src/Modules/curses.c
@@ -765,7 +765,7 @@ zccmd_string(const char *nam, char **args)
     w = (ZCWin)getdata(node);
 
 #ifdef HAVE_WADDWSTR
-    mb_metacharinit();
+    mb_charinit();
     wptr = wstr = zhalloc((strlen(str)+1) * sizeof(wchar_t));
 
     while (*str && (clen = mb_metacharlenconv(str, &wc))) {
diff --git a/Src/Modules/regex.c b/Src/Modules/regex.c
index ce57de9..94f523f 100644
--- a/Src/Modules/regex.c
+++ b/Src/Modules/regex.c
@@ -115,6 +115,7 @@ zcond_regex_match(char **a, int id)
 	    } else {
 		zlong offs;
 		char *ptr;
+		int clen, leftlen;
 
 		m = matches;
 		s = metafy(lhstr + m->rm_so, m->rm_eo - m->rm_so, META_DUP);
@@ -123,19 +124,25 @@ zcond_regex_match(char **a, int id)
 		 * Count the characters before the match.
 		 */
 		ptr = lhstr;
+		leftlen = m->rm_so;
 		offs = 0;
-		MB_METACHARINIT();
-		while (ptr < lhstr + m->rm_so) {
+		MB_CHARINIT();
+		while (leftlen) {
 		    offs++;
-		    ptr += MB_METACHARLEN(ptr);
+		    clen = MB_CHARLEN(ptr, leftlen);
+		    ptr += clen;
+		    leftlen -= clen;
 		}
 		setiparam("MBEGIN", offs + !isset(KSHARRAYS));
 		/*
 		 * Add on the characters in the match.
 		 */
-		while (ptr < lhstr + m->rm_eo) {
+		leftlen = m->rm_eo - m->rm_so;
+		while (leftlen) {
 		    offs++;
-		    ptr += MB_METACHARLEN(ptr);
+		    clen = MB_CHARLEN(ptr, leftlen);
+		    ptr += clen;
+		    leftlen -= clen;
 		}
 		setiparam("MEND", offs + !isset(KSHARRAYS) - 1);
 		if (nelem) {
@@ -149,19 +156,25 @@ zcond_regex_match(char **a, int id)
 		    {
 			char buf[DIGBUFSIZE];
 			ptr = lhstr;
+			leftlen = m->rm_so;
 			offs = 0;
 			/* Find the start offset */
-			MB_METACHARINIT();
-			while (ptr < lhstr + m->rm_so) {
+			MB_CHARINIT();
+			while (leftlen) {
 			    offs++;
-			    ptr += MB_METACHARLEN(ptr);
+			    clen = MB_CHARLEN(ptr, leftlen);
+			    ptr += clen;
+			    leftlen -= clen;
 			}
 			convbase(buf, offs + !isset(KSHARRAYS), 10);
 			*bptr = ztrdup(buf);
 			/* Continue to the end offset */
-			while (ptr < lhstr + m->rm_eo) {
+			leftlen = m->rm_eo - m->rm_so;
+			while (leftlen ) {
 			    offs++;
-			    ptr += MB_METACHARLEN(ptr);
+			    clen = MB_CHARLEN(ptr, leftlen);
+			    ptr += clen;
+			    leftlen -= clen;
 			}
 			convbase(buf, offs + !isset(KSHARRAYS) - 1, 10);
 			*eptr = ztrdup(buf);
diff --git a/Src/Zle/complist.c b/Src/Zle/complist.c
index f542066..a02a5c3 100644
--- a/Src/Zle/complist.c
+++ b/Src/Zle/complist.c
@@ -728,7 +728,7 @@ clnicezputs(int do_colors, char *s, int ml)
     if (do_colors)
 	initiscol();
 
-    mb_metacharinit();
+    mb_charinit();
     while (umleft > 0) {
 	size_t cnt = eol ? MB_INVALID : mbrtowc(&cc, uptr, umleft, &mbs);
 
diff --git a/Src/Zle/zle_utils.c b/Src/Zle/zle_utils.c
index e4ab97a..06e4581 100644
--- a/Src/Zle/zle_utils.c
+++ b/Src/Zle/zle_utils.c
@@ -1288,7 +1288,7 @@ showmsg(char const *msg)
     p = unmetafy(umsg, &ulen);
     memset(&mbs, 0, sizeof mbs);
 
-    mb_metacharinit();
+    mb_charinit();
     while (ulen > 0) {
 	char const *n;
 	if (*p == '\n') {
diff --git a/Src/builtin.c b/Src/builtin.c
index a3d847f..0edc070 100644
--- a/Src/builtin.c
+++ b/Src/builtin.c
@@ -4582,7 +4582,7 @@ bin_print(char *name, char **args, Options ops, int func)
 		    convchar_t cc;
 #ifdef MULTIBYTE_SUPPORT
 		    if (isset(MULTIBYTE)) {
-			mb_metacharinit();
+			mb_charinit();
 			(void)mb_metacharlenconv(metafy(curarg+1, curlen-1,
 							META_USEHEAP), &cc);
 		    }
@@ -5557,7 +5557,7 @@ bin_read(char *name, char **args, Options ops, UNUSED(int func))
 	wint_t wi;
 
 	if (isset(MULTIBYTE)) {
-	    mb_metacharinit();
+	    mb_charinit();
 	    (void)mb_metacharlenconv(delimstr, &wi);
 	}
 	else
diff --git a/Src/glob.c b/Src/glob.c
index 057d44a..eff34a2 100644
--- a/Src/glob.c
+++ b/Src/glob.c
@@ -2237,7 +2237,7 @@ xpandbraces(LinkList list, LinkNode *np)
 #ifdef MULTIBYTE_SUPPORT
 		char *ncptr;
 		int nclen;
-		mb_metacharinit();
+		mb_charinit();
 		ncptr = wcs_nicechar(cend, NULL, NULL);
 		nclen = strlen(ncptr);
 		p = zhalloc(lenalloc + nclen);
@@ -2805,7 +2805,7 @@ igetmatch(char **sp, Patprog p, int fl, int n, char *replstr,
 		     * ... now we know whether it's worth looking for the
 		     * shortest, which we do by brute force.
 		     */
-		    mb_metacharinit();
+		    mb_charinit();
 		    for (t = s, umlen = 0; t < s + mlen; ) {
 			set_pat_end(p, *t);
 			if (pattrylen(p, s, t - s, umlen, 0)) {
@@ -2831,7 +2831,7 @@ igetmatch(char **sp, Patprog p, int fl, int n, char *replstr,
 	     * so that match, mbegin, mend and MATCH, MBEGIN, MEND are
 	     * correct.
 	     */
-	    mb_metacharinit();
+	    mb_charinit();
 	    tmatch = NULL;
 	    for (ioff = 0, t = s, umlen = umltot; t < s + l; ioff++) {
 		set_pat_start(p, t-s);
@@ -2855,7 +2855,7 @@ igetmatch(char **sp, Patprog p, int fl, int n, char *replstr,
 	    /* Largest possible match at tail of string:       *
 	     * move forward along string until we get a match. *
 	     * Again there's no optimisation.                  */
-	    mb_metacharinit();
+	    mb_charinit();
 	    for (ioff = 0, t = s, umlen = umltot; t < s + l; ioff++) {
 		set_pat_start(p, t-s);
 		if (pattrylen(p, t, s + l - t, umlen, ioff)) {
@@ -2889,7 +2889,7 @@ igetmatch(char **sp, Patprog p, int fl, int n, char *replstr,
 	    }
 	    ioff = 0;		/* offset into string */
 	    umlen = umltot;
-	    mb_metacharinit();
+	    mb_charinit();
 	    do {
 		/* loop over all matches for global substitution */
 		matched = 0;
@@ -2986,7 +2986,7 @@ igetmatch(char **sp, Patprog p, int fl, int n, char *replstr,
 	     */
 	    nmatches = 0;
 	    tmatch = NULL;
-	    mb_metacharinit();
+	    mb_charinit();
 	    for (ioff = 0, t = s, umlen = umltot; t < s + l; ioff++) {
 		set_pat_start(p, t-s);
 		if (pattrylen(p, t, s + l - t, umlen, ioff)) {
@@ -3002,7 +3002,7 @@ igetmatch(char **sp, Patprog p, int fl, int n, char *replstr,
 		     * We need to find the n'th last match.
 		     */
 		    n = nmatches - n;
-		    mb_metacharinit();
+		    mb_charinit();
 		    for (ioff = 0, t = s, umlen = umltot; t < s + l; ioff++) {
 			set_pat_start(p, t-s);
 			if (pattrylen(p, t, s + l - t, umlen, ioff) &&
diff --git a/Src/hist.c b/Src/hist.c
index bd03c4f..6725313 100644
--- a/Src/hist.c
+++ b/Src/hist.c
@@ -2000,7 +2000,7 @@ casemodify(char *str, int how)
 	VARARR(char, mbstr, MB_CUR_MAX);
 	mbstate_t ps;
 
-	mb_metacharinit();
+	mb_charinit();
 	memset(&ps, 0, sizeof(ps));
 	while (*str) {
 	    wint_t wc;
diff --git a/Src/prompt.c b/Src/prompt.c
index ffc1d0d..9e8589d 100644
--- a/Src/prompt.c
+++ b/Src/prompt.c
@@ -964,7 +964,7 @@ stradd(char *d)
 	    /* FALL THROUGH */
 	default:
 	    /* Take full wide character in one go */
-	    mb_metacharinit();
+	    mb_charinit();
 	    pc = wcs_nicechar(cc, NULL, NULL);
 	    break;
 	}
diff --git a/Src/utils.c b/Src/utils.c
index c33c16d..13fc96a 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -82,7 +82,7 @@ set_widearray(char *mb_array, Widechar_array wca)
 	wchar_t *wcptr = tmpwcs;
 	wint_t wci;
 
-	mb_metacharinit();
+	mb_charinit();
 	while (*mb_array) {
 	    int mblen = mb_metacharlenconv(mb_array, &wci);
 
@@ -332,7 +332,7 @@ zerrmsg(FILE *file, const char *fmt, va_list ap)
 	    case 'c':
 		num = va_arg(ap, int);
 #ifdef MULTIBYTE_SUPPORT
-		mb_metacharinit();
+		mb_charinit();
 		zputs(wcs_nicechar(num, NULL, NULL), file);
 #else
 		zputs(nicechar(num), file);
@@ -461,12 +461,13 @@ static mbstate_t mb_shiftstate;
 
 /*
  * Initialise multibyte state: called before a sequence of
- * wcs_nicechar() or mb_metacharlenconv().
+ * wcs_nicechar(), mb_metacharlenconv(), or
+ * mb_charlenconv().
  */
 
 /**/
 mod_export void
-mb_metacharinit(void)
+mb_charinit(void)
 {
     memset(&mb_shiftstate, 0, sizeof(mb_shiftstate));
 }
@@ -500,7 +501,7 @@ mb_metacharinit(void)
  * (but not both).  (Note the complication that the wide character
  * part may contain metafied characters.)
  *
- * The caller needs to call mb_metacharinit() before the first call, to
+ * The caller needs to call mb_charinit() before the first call, to
  * set up the multibyte shift state for a range of characters.
  */
 
@@ -3832,7 +3833,7 @@ itype_end(const char *ptr, int itype, int once)
 #ifdef MULTIBYTE_SUPPORT
     if (isset(MULTIBYTE) &&
 	(itype != IIDENT || !isset(POSIXIDENTIFIERS))) {
-	mb_metacharinit();
+	mb_charinit();
 	while (*ptr) {
 	    wint_t wc;
 	    int len = mb_metacharlenconv(ptr, &wc);
@@ -4972,6 +4973,65 @@ mb_metastrlenend(char *ptr, int width, char *eptr)
     return num + num_in_char;
 }
 
+/*
+ * The equivalent of mb_metacharlenconv_r() for
+ * strings that aren't metafied and hence have
+ * explicit lengths.
+ */
+
+/**/
+mod_export int
+mb_charlenconv_r(const char *s, int slen, wint_t *wcp, mbstate_t *mbsp)
+{
+    size_t ret = MB_INVALID;
+    char inchar;
+    const char *ptr;
+    wchar_t wc;
+
+    for (ptr = s; slen;  ) {
+	inchar = *ptr;
+	ptr++;
+	slen--;
+	ret = mbrtowc(&wc, &inchar, 1, mbsp);
+
+	if (ret == MB_INVALID)
+	    break;
+	if (ret == MB_INCOMPLETE)
+	    continue;
+	if (wcp)
+	    *wcp = wc;
+	return ptr - s;
+    }
+
+    if (wcp)
+	*wcp = WEOF;
+    /* No valid multibyte sequence */
+    memset(mbsp, 0, sizeof(*mbsp));
+    if (ptr > s) {
+	return 1;	/* Treat as single byte character */
+    } else
+	return 0;		/* Probably shouldn't happen */
+}
+
+/*
+ * The equivalent of mb_metacharlenconv() for
+ * strings that aren't metafied and hence have
+ * explicit lengths.
+ */
+
+/**/
+mod_export int
+mb_charlenconv(const char *s, int slen, wint_t *wcp)
+{
+    if (!isset(MULTIBYTE)) {
+	if (wcp)
+	    *wcp = (wint_t)*s;
+	return 1;
+    }
+
+    return mb_charlenconv_r(s, slen, wcp, &mb_shiftstate);
+}
+
 /**/
 #else
 
@@ -4996,6 +5056,23 @@ metacharlenconv(const char *x, int *c)
     return 1;
 }
 
+/* Simple replacement for mb_charlenconv */
+
+/**/
+mod_export int
+charlenconv(const char *x, int len, int *c)
+{
+    if (!len) {
+	if (c)
+	    *c = '\0';
+	return 0;
+    }
+
+    if (c)
+	*c = (char)*x;
+    return 1;
+}
+
 /**/
 #endif /* MULTIBYTE_SUPPORT */
 
diff --git a/Src/zsh.h b/Src/zsh.h
index c88c2e7..fb04929 100644
--- a/Src/zsh.h
+++ b/Src/zsh.h
@@ -2921,8 +2921,9 @@ enum {
 #define AFTERTRAPHOOK  (zshhooks + 2)
 
 #ifdef MULTIBYTE_SUPPORT
+/* Metafied input */
 #define nicezputs(str, outs)	(void)mb_niceformat((str), (outs), NULL, 0)
-#define MB_METACHARINIT()	mb_metacharinit()
+#define MB_METACHARINIT()	mb_charinit()
 typedef wint_t convchar_t;
 #define MB_METACHARLENCONV(str, cp)	mb_metacharlenconv((str), (cp))
 #define MB_METACHARLEN(str)	mb_metacharlenconv(str, NULL)
@@ -2932,6 +2933,11 @@ typedef wint_t convchar_t;
 #define MB_METASTRLEN2END(str, widthp, eptr)	\
     mb_metastrlenend(str, widthp, eptr)
 
+/* Unmetafied input */
+#define MB_CHARINIT()		mb_charinit()
+#define MB_CHARLENCONV(str, len, cp)	mb_charlenconv((str), (len), (cp))
+#define MB_CHARLEN(str, len)	mb_charlenconv((str), (len), NULL)
+
 /*
  * We replace broken implementations with one that uses Unicode
  * characters directly as wide characters.  In principle this is only
@@ -3015,6 +3021,10 @@ typedef int convchar_t;
 #define MB_METASTRLEN2(str, widthp)	ztrlen(str)
 #define MB_METASTRLEN2END(str, widthp, eptr)	ztrlenend(str, eptr)
 
+#define MB_CHARINIT()
+#define MB_CHARLENCONV(str, len, cp) charlenconv((str), (len), (cp))
+#define MB_CHARLEN(str, len) ((len) ? 1 : 0)
+
 #define WCWIDTH_WINT(c)	(1)
 
 /* Leave character or string as is. */
diff --git a/Test/D07multibyte.ztst b/Test/D07multibyte.ztst
index c9ecb78..7fc07cc 100644
--- a/Test/D07multibyte.ztst
+++ b/Test/D07multibyte.ztst
@@ -484,3 +484,16 @@
 # This doesn't look aligned in my editor because actually the characters
 # aren't quite double width, but the arithmetic is correct.
 # It appears just to be an effect of the font.
+
+  if zmodload -i zsh/regex 2>/dev/null; then
+    [[ $'\ua0' =~ '^.$' ]] && print OK
+    [[ $'\ua0' =~ $'^\ua0$' ]] && print OK
+    [[ $'\ua0'X =~ '^X$' ]] || print OK
+  else
+    print -u$ZTST_fd "Regexp test skipped, regexp library not found."
+    print -l OK OK OK
+  fi
+0:Ensure no confusion on metafied input to regex module
+>OK
+>OK
+>OK



* Re: Strange behavior of [[
  2015-06-11 16:59     ` Peter Stephenson
@ 2015-06-22 15:56       ` m0viefreak
  2015-06-22 16:29         ` Peter Stephenson
  0 siblings, 1 reply; 9+ messages in thread
From: m0viefreak @ 2015-06-22 15:56 UTC
  To: zsh-workers



On 11.06.2015 18:59, Peter Stephenson wrote:
> The change from mb_metacharinit() to mb_charinit() is a bit unsightly
> but the name had got just plain confusing --- there's nothing meta about
> it.  I never metacharacter I couldn't parse.
> 
> (No multibyte characters were harmed in the preparation of this email;
> I've used $'\ua0'.)
> 
> pws
> 
> <path diff>
> ...
> </path diff>

This patch (f1923bdfa6300a0d32e3329eb2488447f76b8970) introduces another
issue for me:

Regex evaluation using a conditional capture group crashes zsh when the
pattern is not found:

$ [[ "foo" =~ (foo)? ]] # pattern found, all fine
$ [[ "foo" =~ (bar)? ]] # pattern not found, crash
zsh: segmentation fault (core dumped)  zsh -f



* Re: Strange behavior of [[
  2015-06-22 15:56       ` m0viefreak
@ 2015-06-22 16:29         ` Peter Stephenson
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Stephenson @ 2015-06-22 16:29 UTC
  To: zsh-workers

On Mon, 22 Jun 2015 17:56:06 +0200
m0viefreak <m0viefreak.cm@googlemail.com> wrote:
> On 11.06.2015 18:59, Peter Stephenson wrote:
> > The change from mb_metacharinit() to mb_charinit() is a bit unsightly
> > but the name had got just plain confusing --- there's nothing meta about
> > it.  I never metacharacter I couldn't parse.
> > 
> > (No multibyte characters were harmed in the preparation of this email;
> > I've used $'\ua0'.)
> > 
> > pws
> > 
> > <path diff>
> > ...
> > </path diff>
> 
> This patch (f1923bdfa6300a0d32e3329eb2488447f76b8970) introduces another
> issue for me:
> 
> Regex evaluation using a conditional capture group crashes zsh when the
> pattern is not found:

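This isn't fundamentally new, it's just a different response to an
unhandled condition: when an optional group takes no part in the match,
regexec() reports its rm_so and rm_eo as -1, and the offset-counting
loops never allowed for that.

Reporting -1 for the group's begin and end offsets is consistent with
what glob matches do.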
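
For anyone who wants to see the unhandled condition in isolation, here is a small sketch against the plain POSIX regex API (no zsh involved). The helper name group_offsets() and its return convention are made up for illustration; the -1 offsets for a group that did not participate are what POSIX specifies for regexec().

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

#define MAXGROUPS 8

/*
 * Match subject against an extended regex and fetch the byte offsets
 * of group grp.  Returns 1 if the group took part in the match, 0 if
 * the pattern matched but the group did not, -1 on error or no match.
 */
static int group_offsets(const char *pattern, const char *subject,
                         size_t grp, long *so, long *eo)
{
    regex_t re;
    regmatch_t m[MAXGROUPS];
    int rc;

    assert(so != NULL && eo != NULL);
    if (grp >= MAXGROUPS)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, subject, MAXGROUPS, m, 0);
    regfree(&re);
    if (rc != 0)
        return -1;

    *so = (long)m[grp].rm_so;   /* -1 when the group was not used */
    *eo = (long)m[grp].rm_eo;
    return m[grp].rm_so >= 0;
}
```

With pattern "(bar)?" and subject "foo" the overall match succeeds as an empty match, but group 1 comes back with both offsets set to -1, so any loop that uses rm_so as a byte count has to check for it first.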

pws

diff --git a/Src/Modules/regex.c b/Src/Modules/regex.c
index 94f523f..16cc77f 100644
--- a/Src/Modules/regex.c
+++ b/Src/Modules/regex.c
@@ -155,6 +155,11 @@ zcond_regex_match(char **a, int id)
 			 ++n, ++m, ++bptr, ++eptr)
 		    {
 			char buf[DIGBUFSIZE];
+			if (m->rm_so < 0 || m->rm_eo < 0) {
+			    *bptr = ztrdup("-1");
+			    *eptr = ztrdup("-1");
+			    continue;
+			}
 			ptr = lhstr;
 			leftlen = m->rm_so;
 			offs = 0;



* Re: Strange behavior of [[
  2015-06-10  3:27 Strange behavior of [[ Maxime Arthaud
  2015-06-10  5:31 ` Bart Schaefer
@ 2016-01-08 13:09 ` Jun T.
  2016-04-23  9:51   ` Segfault with PCRE (Re: Strange behavior of [[) Mikael Berthe
  1 sibling, 1 reply; 9+ messages in thread
From: Jun T. @ 2016-01-08 13:09 UTC
  To: zsh-workers

pcre.c has the same problem:

% setopt re_match_pcre
% [[ $'\ua0' =~ . ]] && echo OK
(zsh hangs; 100% CPU usage)

The following is a copy of the patch to regex.c in workers/35448.
Also added a simple test in V07pcre.ztst.

diff --git a/Src/Modules/pcre.c b/Src/Modules/pcre.c
index 2393cd1..aa5c8ed 100644
--- a/Src/Modules/pcre.c
+++ b/Src/Modules/pcre.c
@@ -190,18 +190,25 @@ zpcre_get_substrings(char *arg, int *ovec, int ret, char *matchvar,
 	if (want_begin_end) {
 	    char *ptr = arg;
 	    zlong offs = 0;
+	    int clen, leftlen;
 
 	    /* Count the characters before the match */
-	    MB_METACHARINIT();
-	    while (ptr < arg + ovec[0]) {
+	    MB_CHARINIT();
+	    leftlen = ovec[0];
+	    while (leftlen) {
 		offs++;
-		ptr += MB_METACHARLEN(ptr);
+		clen = MB_CHARLEN(ptr, leftlen);
+		ptr += clen;
+		leftlen -= clen;
 	    }
 	    setiparam("MBEGIN", offs + !isset(KSHARRAYS));
 	    /* Add on the characters in the match */
-	    while (ptr < arg + ovec[1]) {
+	    leftlen = ovec[1] - ovec[0];
+	    while (leftlen) {
 		offs++;
-		ptr += MB_METACHARLEN(ptr);
+		clen = MB_CHARLEN(ptr, leftlen);
+		ptr += clen;
+		leftlen -= clen;
 	    }
 	    setiparam("MEND", offs + !isset(KSHARRAYS) - 1);
 	    if (nelem) {
@@ -219,17 +226,23 @@ zpcre_get_substrings(char *arg, int *ovec, int ret, char *matchvar,
 		    ptr = arg;
 		    offs = 0;
 		    /* Find the start offset */
-		    MB_METACHARINIT();
-		    while (ptr < arg + ipair[0]) {
+		    MB_CHARINIT();
+		    leftlen = ipair[0];
+		    while (leftlen) {
 			offs++;
-			ptr += MB_METACHARLEN(ptr);
+			clen = MB_CHARLEN(ptr, leftlen);
+			ptr += clen;
+			leftlen -= clen;
 		    }
 		    convbase(buf, offs + !isset(KSHARRAYS), 10);
 		    *bptr = ztrdup(buf);
 		    /* Continue to the end offset */
-		    while (ptr < arg + ipair[1]) {
+		    leftlen = ipair[1] - ipair[0];
+		    while (leftlen) {
 			offs++;
-			ptr += MB_METACHARLEN(ptr);
+			clen = MB_CHARLEN(ptr, leftlen);
+			ptr += clen;
+			leftlen -= clen;
 		    }
 		    convbase(buf, offs + !isset(KSHARRAYS) - 1, 10);
 		    *eptr = ztrdup(buf);
diff --git a/Test/V07pcre.ztst b/Test/V07pcre.ztst
index ddfd3f5..3907756 100644
--- a/Test/V07pcre.ztst
+++ b/Test/V07pcre.ztst
@@ -37,6 +37,17 @@
 >o→b
 >→
 
+  unset match mend
+  s=$'\u00a0'
+  [[ $s =~ '^.$' ]] && print OK
+  [[ A${s}B =~ .(.). && $match[1] == $s ]] && print OK
+  [[ A${s}${s}B =~ A([^[:ascii:]]*)B && $mend[1] == 3 ]] && print OK
+  unset s
+0:Raw IMETA characters in input string
+>OK
+>OK
+>OK
+
   [[ foo =~ f.+ ]] ; print $?
   [[ foo =~ x.+ ]] ; print $?
   [[ ! foo =~ f.+ ]] ; print $?




* Segfault with PCRE   (Re: Strange behavior of [[)
  2016-01-08 13:09 ` Jun T.
@ 2016-04-23  9:51   ` Mikael Berthe
  2016-04-23 21:22     ` Bart Schaefer
  0 siblings, 1 reply; 9+ messages in thread
From: Mikael Berthe @ 2016-04-23  9:51 UTC
  To: zsh-workers

Hello,

The following lines cause a segfault in zsh on my machines:

  setopt re_match_pcre
  s=test.txt
  [[ $s =~ '^(.*_)?(test)' ]] && echo $match[2]

This occurs only when the first group doesn't match (works fine with
s=1_test.txt).

I think this is related to the patch 37515 (commit 5eae5b58b1b99946),
see below:

* Jun T. <takimoto-j@kba.biglobe.ne.jp> [2016-01-08 14:09 +0100]:
> pcre.c has the same problem:
> 
> % setopt re_match_pcre
> % [[ $'\ua0' =~ . ]] && echo OK
> (zsh hangs; 100% CPU usage)
> 
> The following is a copy of the patch to regex.c in workers/35448.
> Also added a simple test in V07pcre.ztst.
> 
> diff --git a/Src/Modules/pcre.c b/Src/Modules/pcre.c

(...)

Here ovec[2] = -1 (because there was no match for '(.*_)?'),
and we have ipair = ovec + 2, so leftlen is set to -1 below.

> @@ -219,17 +226,23 @@ zpcre_get_substrings(char *arg, int *ovec, int ret, char *matchvar,
>  		    ptr = arg;
>  		    offs = 0;
>  		    /* Find the start offset */
> -		    MB_METACHARINIT();
> -		    while (ptr < arg + ipair[0]) {
> +		    MB_CHARINIT();
> +		    leftlen = ipair[0];
> +		    while (leftlen) {
>  			offs++;
> -			ptr += MB_METACHARLEN(ptr);
> +			clen = MB_CHARLEN(ptr, leftlen);
> +			ptr += clen;
> +			leftlen -= clen;
>  		    }

The following patch fixes the segfault for me:

diff --git a/Src/Modules/pcre.c b/Src/Modules/pcre.c
index e23ab57..5fd6796 100644
--- a/Src/Modules/pcre.c
+++ b/Src/Modules/pcre.c
@@ -228,7 +228,7 @@ zpcre_get_substrings(char *arg, int *ovec, int ret, char *matchvar,
 		    /* Find the start offset */
 		    MB_CHARINIT();
 		    leftlen = ipair[0];
-		    while (leftlen) {
+		    while (leftlen > 0) {
 			offs++;
 			clen = MB_CHARLEN(ptr, leftlen);
 			ptr += clen;
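
To make the ovec layout concrete, here is a rough sketch that needs no PCRE library at all. It assumes the usual pairwise layout, where group i owns ovec[2*i] and ovec[2*i+1] and an unset group has -1 in both slots; the helper names are hypothetical and the vectors in the usage note are written out by hand rather than produced by a real match.

```c
#include <assert.h>
#include <stddef.h>

/* Group i owns the pair (ovec[2*i], ovec[2*i + 1]). */
static const int *group_pair(const int *ovec, int group)
{
    assert(ovec != NULL && group >= 0);
    return ovec + 2 * group;
}

/*
 * Number of bytes before the start of the group, or -1 if the group
 * did not take part in the match.  Checking the sign here keeps a
 * negative offset from ever being used as a loop count.
 */
static int bytes_before_group(const int *ovec, int group)
{
    const int *ipair = group_pair(ovec, group);

    if (ipair[0] < 0 || ipair[1] < 0)
        return -1;
    return ipair[0];
}
```

For s=test.txt the vector would be { 0, 4, -1, -1, 0, 4 }, so bytes_before_group(ovec, 1) reports -1 while group 2 starts at byte 0. For s=1_test.txt it would be { 0, 6, 0, 2, 2, 6 } and both groups are set.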

Regards,
-- 
Mikael



* Re: Segfault with PCRE   (Re: Strange behavior of [[)
  2016-04-23  9:51   ` Segfault with PCRE (Re: Strange behavior of [[) Mikael Berthe
@ 2016-04-23 21:22     ` Bart Schaefer
  0 siblings, 0 replies; 9+ messages in thread
From: Bart Schaefer @ 2016-04-23 21:22 UTC
  To: zsh-workers

On Apr 23, 11:51am, Mikael Berthe wrote:
}
} The following patch fixes the segfault for me

Thanks, applied.  I was hoping there was a way to reproduce this with
pcre_compile / pcre_match, but I can only get it to happen in the =~
conditional.

