From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact zsh-workers-help@zsh.org; run by ezmlm
List-Id: Zsh Workers List
X-Seq: 41542
Date: Sun, 13 Aug 2017 18:18:06 -0400
From: Phil Pennock
To: zsh-workers@zsh.org
Subject: [PATCH] Repair BASH_REMATCH with no substrings
Message-ID: <20170813221806.GA19107@breadbox.private.spodhuis.org>
References: <20170813204949.GA98824@tower.spodhuis.org>
 <20170813211225.GB98824@tower.spodhuis.org>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
 protocol="application/pgp-signature"; boundary="GvXjxJ+pjyke8COw"
Content-Disposition: inline
In-Reply-To: <20170813211225.GB98824@tower.spodhuis.org>
OpenPGP: url=https://www.security.spodhuis.org/PGP/keys/0x4D1E900E14C1CC04.asc

--GvXjxJ+pjyke8COw
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On 2017-08-13 at 17:12 -0400, Phil Pennock wrote:
> Definitely; this is a regression from my NUL fixing and trying to
> correctly meta/unmeta all parameters going through.  Change 41308 in
> commit 825f84c77 exposed a bug introduced in 2011 in commit 2f3c16d40f.

From 41815a3b5e7324fa188d24f558ea9b0026a1a110 Mon Sep 17 00:00:00 2001
From: Phil Pennock
Date: Sun, 13 Aug 2017 18:13:41 -0400
Subject: [PATCH] Repair BASH_REMATCH with no substrings

Change 41308 in commit 825f84c77 exposed a bug introduced in 2011 in
commit 2f3c16d40f.  (Both mine).

When we went off the end of the array but measured the length implicitly, we
got lucky before.  After 41308 we were looking up lengths in stale memory.

Rename some variables, clean up the logic, be easier to understand.

Add tests.
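
For readers who want the failure mode in isolation: the core of the fix is to bound the copy loop by the capture count reported by the match, instead of walking the capture list until a NULL sentinel turns up while indexing the offset vector alongside it. Below is a minimal standalone sketch of that bounded copy. The names copy_captures and free_captures are hypothetical and not part of zsh; the sketch uses plain malloc/calloc where zsh uses zalloc and metafy, and it assumes (as a simplification) that an unset group, signalled by negative offsets, becomes an empty string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Free a NULL-terminated array produced by copy_captures(). */
void free_captures(char **captures)
{
    char **p;

    if (captures == NULL)
        return;
    for (p = captures; *p != NULL; p++)
        free(*p);
    free(captures);
}

/*
 * Hypothetical helper (not a zsh function).
 *
 *   subject         the string that was matched
 *   ovec            captured_count pairs of byte offsets:
 *                   ovec[2*i] is the start, ovec[2*i+1] the end of capture i
 *   captured_count  number of valid pairs; pair 0 is the entire match
 *   capture_start   0 to include the entire match (bash style),
 *                   1 to start at the first parenthesised group
 *
 * Returns a NULL-terminated array of newly allocated strings,
 * or NULL if allocation fails.
 */
char **copy_captures(const char *subject, const int *ovec,
                     int captured_count, int capture_start)
{
    char **out, **x;
    int i;

    assert(subject != NULL && ovec != NULL);
    assert(capture_start == 0 || capture_start == 1);
    assert(captured_count >= 1);

    /* One slot per copied capture plus one for the terminator. */
    out = x = calloc((size_t)(captured_count - capture_start) + 1,
                     sizeof(char *));
    if (out == NULL)
        return NULL;

    /* The bound is the count, never a sentinel read from memory. */
    for (i = capture_start; i < captured_count; i++) {
        int start = ovec[2 * i];
        int end = ovec[2 * i + 1];
        /* Assumption of this sketch: unset groups become "". */
        size_t len = (start >= 0 && end >= start) ? (size_t)(end - start) : 0;
        char *copy = malloc(len + 1);

        if (copy == NULL) {
            free_captures(out);   /* array is still NULL-terminated */
            return NULL;
        }
        if (len > 0)
            memcpy(copy, subject + start, len);
        copy[len] = '\0';
        *x++ = copy;
    }
    *x = NULL;
    return out;
}
```

With ovec = {0, 3}, captured_count = 1 and capture_start = 0 this yields a one-element array holding the entire match, which corresponds to the BASH_REMATCH case with no substrings. With capture_start = 1 and the same inputs the loop body never runs and the result is an empty, correctly terminated array.
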
---
 Src/Modules/pcre.c | 68 ++++++++++++++++++++++++++++++++++--------------------
 Test/V07pcre.ztst  | 24 +++++++++++++++++++
 2 files changed, 67 insertions(+), 25 deletions(-)

diff --git a/Src/Modules/pcre.c b/Src/Modules/pcre.c
index 27191d709..659fd22d5 100644
--- a/Src/Modules/pcre.c
+++ b/Src/Modules/pcre.c
@@ -148,7 +148,7 @@ bin_pcre_study(char *nam, UNUSED(char **args), UNUSED(Options ops), UNUSED(int f
 
 /**/
 static int
-zpcre_get_substrings(char *arg, int *ovec, int ret, char *matchvar,
+zpcre_get_substrings(char *arg, int *ovec, int captured_count, char *matchvar,
         char *substravar, int want_offset_pair, int matchedinarr,
         int want_begin_end)
 {
@@ -156,15 +156,13 @@ zpcre_get_substrings(char *arg, int *ovec, int ret, char *matchvar,
     char offset_all[50];
     int capture_start = 1;
 
-    if (matchedinarr)
+    if (matchedinarr) {
+        /* bash-style captures[0] entire-matched string in the array */
         capture_start = 0;
-    if (matchvar == NULL)
-        matchvar = "MATCH";
-    if (substravar == NULL)
-        substravar = "match";
-    
+    }
+
     /* captures[0] will be entire matched string, [1] first substring */
-    if (!pcre_get_substring_list(arg, ovec, ret, (const char ***)&captures)) {
+    if (!pcre_get_substring_list(arg, ovec, captured_count, (const char ***)&captures)) {
         int nelem = arrlen(captures)-1;
         /* Set to the offsets of the complete match */
         if (want_offset_pair) {
@@ -176,30 +174,43 @@ zpcre_get_substrings(char *arg, int *ovec, int ret, char *matchvar,
          * difference between the two values in each paired entry in ovec.
          * ovec is length 2*(1+capture_list_length)
          */
-        match_all = metafy(captures[0], ovec[1] - ovec[0], META_DUP);
-        setsparam(matchvar, match_all);
+        if (matchvar) {
+            match_all = metafy(captures[0], ovec[1] - ovec[0], META_DUP);
+            setsparam(matchvar, match_all);
+        }
         /*
          * If we're setting match, mbegin, mend we only do
          * so if there were parenthesised matches, for consistency
-         * (c.f. regex.c).
+         * (c.f. regex.c).  That's the next block after this one.
+         * Here we handle the simpler case where we don't worry about
+         * Unicode lengths, etc.
+         * Either !want_begin_end (ie, this is bash) or nelem; if bash
+         * then we're invoked always, even without nelem results, to
+         * set the array variable with one element in it, the complete match.
          */
-        if (!want_begin_end || nelem) {
+        if (substravar &&
+            (!want_begin_end || nelem)) {
             char **x, **y;
-            int vec_off;
+            int vec_off, i;
             y = &captures[capture_start];
-            matches = x = (char **) zalloc(sizeof(char *) * (arrlen(y) + 1));
-            vec_off = 2;
-            do {
+            matches = x = (char **) zalloc(sizeof(char *) * (captured_count+1-capture_start));
+            for (i = capture_start; i < captured_count; i++, y++) {
+                vec_off = 2*i;
                 if (*y)
                     *x++ = metafy(*y, ovec[vec_off+1]-ovec[vec_off], META_DUP);
                 else
                     *x++ = NULL;
-                vec_off += 2;
-            } while (*y++);
+            }
+            *x = NULL;
             setaparam(substravar, matches);
         }
 
         if (want_begin_end) {
+            /*
+             * cond-infix rather than builtin; also not bash; so we set a bunch
+             * of variables and arrays to values which require handling Unicode
+             * lengths
+             */
             char *ptr = arg;
             zlong offs = 0;
             int clen, leftlen;
@@ -306,7 +317,9 @@ bin_pcre_match(char *nam, char **args, Options ops, UNUSED(int func))
         zwarnnam(nam, "no pattern has been compiled");
         return 1;
     }
-    
+
+    matched_portion = "MATCH";
+    receptacle = "match";
     if(OPT_HASARG(ops,c='a')) {
         receptacle = OPT_ARG(ops,c);
     }
@@ -318,8 +331,8 @@ bin_pcre_match(char *nam, char **args, Options ops, UNUSED(int func))
         return 1;
     }
     /* For the entire match, 'Return' the offset byte positions instead of the matched string */
-    if(OPT_ISSET(ops,'b')) want_offset_pair = 1; 
-    
+    if(OPT_ISSET(ops,'b')) want_offset_pair = 1;
+
     if ((ret = pcre_fullinfo(pcre_pattern, pcre_hints, PCRE_INFO_CAPTURECOUNT, &capcount)))
     {
         zwarnnam(nam, "error %d in fullinfo", ret);
@@ -360,7 +373,7 @@ cond_pcre_match(char **a, int id)
 {
     pcre *pcre_pat;
     const char *pcre_err;
-    char *lhstr, *rhre, *lhstr_plain, *rhre_plain, *avar=NULL;
+    char *lhstr, *rhre, *lhstr_plain, *rhre_plain, *avar, *svar;
     int r = 0, pcre_opts = 0, pcre_errptr, capcnt, *ov, ovsize;
     int lhstr_plain_len, rhre_plain_len;
     int return_value = 0;
@@ -380,8 +393,13 @@ cond_pcre_match(char **a, int id)
     ov = NULL;
     ovsize = 0;
 
-    if (isset(BASHREMATCH))
-        avar="BASH_REMATCH";
+    if (isset(BASHREMATCH)) {
+        svar = NULL;
+        avar = "BASH_REMATCH";
+    } else {
+        svar = "MATCH";
+        avar = "match";
+    }
 
     switch(id) {
          case CPCRE_PLAIN:
@@ -414,7 +432,7 @@ cond_pcre_match(char **a, int id)
                     break;
                 }
                 else if (r>0) {
-                    zpcre_get_substrings(lhstr_plain, ov, r, NULL, avar, 0,
+                    zpcre_get_substrings(lhstr_plain, ov, r, svar, avar, 0,
                                          isset(BASHREMATCH),
                                          !isset(BASHREMATCH));
                     return_value = 1;
diff --git a/Test/V07pcre.ztst b/Test/V07pcre.ztst
index ab41d33dc..9feeb47fb 100644
--- a/Test/V07pcre.ztst
+++ b/Test/V07pcre.ztst
@@ -142,9 +142,33 @@
   print $?
   [[ foo -pcre-match ^g..$ ]]
   print $?
+  [[ ! foo -pcre-match ^g..$ ]]
+  print $?
 0:infix -pcre-match works
 >0
 >1
+>0
+
+# Bash mode; note zsh documents that variables not updated on match failure,
+# which remains different from bash
+  setopt bash_rematch
+  [[ "goo" -pcre-match ^f.+$ ]] ; print $?
+  [[ "foo" -pcre-match ^f.+$ ]] ; print -l $? _${^BASH_REMATCH[@]}
+  [[ "foot" -pcre-match ^f([aeiou]+)(.)$ ]]; print -l $? _${^BASH_REMATCH[@]}
+  [[ "foo" -pcre-match ^f.+$ ]] ; print -l $? _${^BASH_REMATCH[@]}
+  [[ ! "goo" -pcre-match ^f.+$ ]] ; print $?
+  unsetopt bash_rematch
+0:bash-compatibility works
+>1
+>0
+>_foo
+>0
+>_foot
+>_oo
+>_t
+>0
+>_foo
+>0
 
 # Subshell because crash on failure
   ( setopt re_match_pcre
-- 
2.14.1

--GvXjxJ+pjyke8COw
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature

-----BEGIN PGP SIGNATURE-----

iHUEARYIAB0WIQROXBeef/xNv45sEy9REE5mjdBEgQUCWZDQEwAKCRBREE5mjdBE
gS/RAP9PXquzoupCT+G9gujtf7zPL78X010x/cFTQkvgUukuFQEA4QXkZBzz7oTO
quhY0MWjakxFsTGU2iF7hFj1ZiAagg2JAjMEAQEIAB0WIQTGk6A04e1u6VTK4toT
2tmcfkFRnAUCWZDQHQAKCRAT2tmcfkFRnB0GEACEIskw765qy3FBywX517y3BEVh
5BhIlTR1wc2//FEtNrzF0Sf2ESZBfAxLDsilNlwfjckY1El7eq42C2FOPiEjsQHg
hcKNGT9I5vtiBE2K8VZ9CHyFWQchyKeoSCQ6iaSZJrvsNJGXn0m3/PemBhhPT9bo
qSvtwYwCn6ihvWbW887SsjqSO1cXees5TyAVvPIKdgJPvEMxQjbrNy1nw97nB0fb
RHchIHy4+PxeIV+9x4gopeftQv7iGT69Chb6SdMqj4II4VyTMAMTpUstsmfcJigH
O2KVENwEgCsXAGwgM0mAJBljP2LhVcoDUw0q69/5kDgIrTeoOu9iZh/xic+he9kB
2el9hs4CeyGmxskZH7iHMj1IzBG4dOV3FZ5Vg81XmyOsrw3MFUotv7IOJ2xfgwcu
+9CAdZ9oKbREcvb4aWnWm9m2wfidg7v+6EUwpLED0WD9X7nVUVIFux+D0SbTpvdi
RHgycmDJz0VHGytrC6BP2eZoCgrCvq8Iq46YyoAr6uvxo6YJcqjgl2Id4G0d+LAK
n+LG1NxPgwB1IRXdIdWqEZ72/qw0M7AWEo0WTQ0iBSH9V6GTb7acHicxxccfKIWC
q1MWdRRIMT2EmDne8iaD9C1iK/hFk4GPh9cWBs+txHlxeO7f6Lq+SB0qYUG5DaLn
nJ2Vz58nHzJ69hHAGA==
=PKXL
-----END PGP SIGNATURE-----

--GvXjxJ+pjyke8COw--