mailing list of musl libc
From: Szabolcs Nagy <nsz@port70.net>
To: musl-mailinglist <musl@lists.openwall.com>,
	mailinglist <openwrt-devel@lists.openwrt.org>
Subject: Re: regex issue / asterisk / musl / sed
Date: Mon, 29 Feb 2016 20:18:43 +0100
Message-ID: <20160229191843.GG29662@port70.net>
In-Reply-To: <20160229135348.GF29662@port70.net>

[-- Attachment #1: Type: text/plain, Size: 1008 bytes --]

* Szabolcs Nagy <nsz@port70.net> [2016-02-29 14:53:48 +0100]:
> * Bastian Bittorf <bittorf@bluebottle.com> [2016-02-29 13:57:36 +0100]:
> > root@box:~ echo 'o*o' | sed -e 's/*/asterisk/g'
> > sed: bad regex '*': Invalid regexp
> > root@box:~ echo 'o*o' | sed -e 's/\*/asterisk/g'
> > oasterisko
> > 
> > it's musl 1.1.14 on OpenWrt / r48814
> > both commands are working fine with glibc and uclibc
> > but the first invocation fails with musl 1.1.14 but
> > works with musl 1.1.13. unsure if the problem is on my
> > side, maybe $you have an idea...
> 
> yes, i introduced this regression in
> http://git.musl-libc.org/cgit/musl/commit/?id=7eaa76fc2e7993582989d3838b1ac32dd8abac09
> 
> because i missed the special * behaviour for BRE,
> but even before that ^* was broken so just reverting
> the patch is not enough, handling * after an anchor
> or assertion correctly needs more code changes.

a possible fix is attached as two patches. the handling
of ^ and $ in BRE is still suboptimal, but fixing that
will need a bigger refactoring.
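
a minimal sketch for checking the behaviour discussed above against
whatever libc is installed. it only uses the standard regcomp(),
regexec() and regfree() calls. bre_matches is a hypothetical helper
name, not something from musl, and passing no REG_EXTENDED flag is
what selects BRE.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of musl): compile `pattern` as a POSIX BRE
   (no REG_EXTENDED in cflags) and search `text` for it.
   Returns 1 on a match, 0 if regexec() reports no match or fails,
   and -1 if regcomp() rejects the pattern. */
int bre_matches(const char *pattern, const char *text)
{
	regex_t re;
	int rc;

	if (regcomp(&re, pattern, REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0 ? 1 : 0;
}
```

with this helper, bre_matches("*", "o*o") corresponds to the failing
sed invocation from the report: a return value of -1 reproduces the
regression, while 1 means the leading * was accepted as a literal.
the patterns "^*" and "\\(*\\)" (as C string literals) exercise the
other two cases mentioned in the patches.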


[-- Attachment #2: 0001-fix-at-the-start-of-a-BRE-subexpression.patch --]
[-- Type: text/x-diff, Size: 1136 bytes --]

From b4abe263b2bc0c183274d1aec70cc586e4a46ba1 Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <nsz@port70.net>
Date: Mon, 29 Feb 2016 15:04:46 +0000
Subject: [PATCH 1/2] fix * at the start of a BRE subexpression

commit 7eaa76fc2e7993582989d3838b1ac32dd8abac09 made * invalid at
the start of a BRE subexpression, but it should be accepted as
literal * there according to the standard.

This patch does not fix subexpressions starting with ^*.
---
 src/regex/regcomp.c |    4 ----
 1 file changed, 4 deletions(-)

diff --git a/src/regex/regcomp.c b/src/regex/regcomp.c
index da6abd1..7a2864c 100644
--- a/src/regex/regcomp.c
+++ b/src/regex/regcomp.c
@@ -889,7 +889,6 @@ static reg_errcode_t parse_atom(tre_parse_ctx_t *ctx, const char *s)
 		s++;
 		break;
 	case '*':
-		return REG_BADPAT;
 	case '{':
 	case '+':
 	case '?':
@@ -978,9 +977,6 @@ static reg_errcode_t tre_parse(tre_parse_ctx_t *ctx)
 		}
 
 	parse_iter:
-		/* extension: repetitions are rejected after an empty node
-		   eg. (+), |*, {2}, but assertions are not treated as empty
-		   so ^* or $? are accepted currently. */
 		for (;;) {
 			int min, max;
 
-- 
1.7.9.5


[-- Attachment #3: 0002-fix-at-the-start-of-a-complete-BRE.patch --]
[-- Type: text/x-diff, Size: 1223 bytes --]

From d24223c8b344ab3c58f1b9200379bd5349bb8cee Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <nsz@port70.net>
Date: Mon, 29 Feb 2016 16:36:25 +0000
Subject: [PATCH 2/2] fix ^* at the start of a complete BRE

This is a workaround to treat * as literal * at the start of a BRE.

Ideally ^ would be treated as an anchor at the start of any BRE
subexpression and similarly $ would be an anchor at the end of any
subexpression.  This is not required by the standard and hard to do
with the current code, but it's the existing practice.  If it is
changed, * should be treated as literal after such anchor as well.
---
 src/regex/regcomp.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/regex/regcomp.c b/src/regex/regcomp.c
index 7a2864c..5fad98b 100644
--- a/src/regex/regcomp.c
+++ b/src/regex/regcomp.c
@@ -994,6 +994,10 @@ static reg_errcode_t tre_parse(tre_parse_ctx_t *ctx)
 			if (*s=='\\')
 				s++;
 
+			/* handle ^* at the start of a complete BRE. */
+			if (!ere && s==ctx->re+1 && s[-1]=='^')
+				break;
+
 			/* extension: multiple consecutive *+?{,} is unspecified,
 			   but (a+)+ has to be supported so accepting a++ makes
 			   sense, note however that the RE_DUP_MAX limit can be
-- 
1.7.9.5
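
the positional test added by the second patch can be illustrated
outside of regcomp.c. this is only a sketch under assumptions:
star_is_literal_at_bre_start is a hypothetical standalone function,
it works on an index instead of the parser's ctx->re and s pointers,
and it deliberately ignores subexpressions and bracket expressions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not musl code: report whether re[i] is a '*' that is
   literal because it is the first character of the whole BRE, optionally
   after one leading '^'. The caller must pass an index inside the string.
   Subexpressions and bracket expressions are not considered. */
bool star_is_literal_at_bre_start(const char *re, size_t i)
{
	if (re[i] != '*')
		return false;
	if (i == 0)
		return true;               /* "*abc": the star opens the BRE */
	return i == 1 && re[0] == '^';     /* "^*abc": the star follows the initial anchor */
}
```

the second return mirrors the s==ctx->re+1 && s[-1]=='^' condition
in the patch. the i == 0 case is the one the first patch restores,
by letting parse_atom accept * again at that position.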


Thread overview: 5+ messages
2016-02-29 12:57 Bastian Bittorf
2016-02-29 13:53 ` Szabolcs Nagy
2016-02-29 14:05   ` Bastian Bittorf
2016-02-29 19:18   ` Szabolcs Nagy [this message]
2016-03-01 13:20     ` Bastian Bittorf
