mailing list of musl libc
* regex issue / asterisk / musl / sed
@ 2016-02-29 12:57 Bastian Bittorf
  2016-02-29 13:53 ` Szabolcs Nagy
  0 siblings, 1 reply; 5+ messages in thread
From: Bastian Bittorf @ 2016-02-29 12:57 UTC
  To: musl-mailinglist, mailinglist

dear hackers,

i cannot find any earlier report of this,
but want to document it here. I spotted
an issue in one of our scripts and it boils down to:

root@box:~ echo 'o*o' | sed -e 's/*/asterisk/g'
sed: bad regex '*': Invalid regexp
root@box:~ echo 'o*o' | sed -e 's/\*/asterisk/g'
oasterisko

it's musl 1.1.14 on OpenWrt / r48814.
both commands work fine with glibc and uclibc,
but the first invocation fails with musl 1.1.14, although
it works with musl 1.1.13. unsure if the problem is on my
side, maybe $you have an idea...

bye, bastian



* Re: regex issue / asterisk / musl / sed
  2016-02-29 12:57 regex issue / asterisk / musl / sed Bastian Bittorf
@ 2016-02-29 13:53 ` Szabolcs Nagy
  2016-02-29 14:05   ` Bastian Bittorf
  2016-02-29 19:18   ` Szabolcs Nagy
  0 siblings, 2 replies; 5+ messages in thread
From: Szabolcs Nagy @ 2016-02-29 13:53 UTC
  To: musl-mailinglist, mailinglist

* Bastian Bittorf <bittorf@bluebottle.com> [2016-02-29 13:57:36 +0100]:
> root@box:~ echo 'o*o' | sed -e 's/*/asterisk/g'
> sed: bad regex '*': Invalid regexp
> root@box:~ echo 'o*o' | sed -e 's/\*/asterisk/g'
> oasterisko
> 
> it's musl 1.1.14 on OpenWrt / r48814.
> both commands work fine with glibc and uclibc,
> but the first invocation fails with musl 1.1.14, although
> it works with musl 1.1.13. unsure if the problem is on my
> side, maybe $you have an idea...

yes, i introduced this regression in
http://git.musl-libc.org/cgit/musl/commit/?id=7eaa76fc2e7993582989d3838b1ac32dd8abac09

because i missed the special * behaviour for BRE,
but even before that ^* was broken so just reverting
the patch is not enough, handling * after an anchor
or assertion correctly needs more code changes.
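
until a fixed libc is deployed, a script can sidestep the leading-* rule
altogether. a minimal sketch, assuming a POSIX sed: both \* and [*] denote
a literal asterisk anywhere in a BRE, so neither depends on how regcomp
treats a bare * at the start of the pattern. the function names are made
up for illustration.

```sh
# hypothetical helpers: replace every literal asterisk on stdin
# with the word "asterisk" without using a bare leading *.
star_escaped() { sed -e 's/\*/asterisk/g'; }   # escaped form, as in the report
star_bracket() { sed -e 's/[*]/asterisk/g'; }  # * is literal inside [ ]

printf '%s\n' 'o*o' | star_escaped
printf '%s\n' 'o*o' | star_bracket
```

the report above already shows the escaped form printing oasterisko on the
affected build; the bracket form is an alternative spelling of the same
match.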



* Re: regex issue / asterisk / musl / sed
  2016-02-29 13:53 ` Szabolcs Nagy
@ 2016-02-29 14:05   ` Bastian Bittorf
  2016-02-29 19:18   ` Szabolcs Nagy
  1 sibling, 0 replies; 5+ messages in thread
From: Bastian Bittorf @ 2016-02-29 14:05 UTC
  To: musl

* Szabolcs Nagy <nsz@port70.net> [29.02.2016 15:04]:
> the patch is not enough, handling * after an anchor
> or assertion correctly needs more code changes.

thanks for your fast response - take your time...

bye, bastian



* Re: regex issue / asterisk / musl / sed
  2016-02-29 13:53 ` Szabolcs Nagy
  2016-02-29 14:05   ` Bastian Bittorf
@ 2016-02-29 19:18   ` Szabolcs Nagy
  2016-03-01 13:20     ` Bastian Bittorf
  1 sibling, 1 reply; 5+ messages in thread
From: Szabolcs Nagy @ 2016-02-29 19:18 UTC
  To: musl-mailinglist, mailinglist

[-- Attachment #1: Type: text/plain, Size: 1008 bytes --]

* Szabolcs Nagy <nsz@port70.net> [2016-02-29 14:53:48 +0100]:
> * Bastian Bittorf <bittorf@bluebottle.com> [2016-02-29 13:57:36 +0100]:
> > root@box:~ echo 'o*o' | sed -e 's/*/asterisk/g'
> > sed: bad regex '*': Invalid regexp
> > root@box:~ echo 'o*o' | sed -e 's/\*/asterisk/g'
> > oasterisko
> > 
> > it's musl 1.1.14 on OpenWrt / r48814.
> > both commands work fine with glibc and uclibc,
> > but the first invocation fails with musl 1.1.14, although
> > it works with musl 1.1.13. unsure if the problem is on my
> > side, maybe $you have an idea...
> 
> yes, i introduced this regression in
> http://git.musl-libc.org/cgit/musl/commit/?id=7eaa76fc2e7993582989d3838b1ac32dd8abac09
> 
> because i missed the special * behaviour for BRE,
> but even before that ^* was broken so just reverting
> the patch is not enough, handling * after an anchor
> or assertion correctly needs more code changes.

a possible fix is attached, the handling of ^ and $
in BRE is suboptimal, but that will need a bigger
refactoring.
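
the cases the two attached patches address can be exercised from the
shell. a rough sketch, not part of the patches: check() is a hypothetical
helper, and it assumes the system sed hands its pattern to the libc
regcomp() as a BRE. each call replaces the first match with X and compares
the result with what a literal * would give.

```sh
# check PATTERN INPUT EXPECTED
# hypothetical helper: PATTERN must not contain a slash.
check() {
    out=$(printf '%s\n' "$2" | sed -e "s/$1/X/" 2>/dev/null)
    if [ "$out" = "$3" ]; then
        echo "ok:   $1"
    else
        echo "FAIL: $1"
        return 1
    fi
}

check '*'     'o*o' 'oXo'   # * at the start of a complete BRE
check '\(*\)' 'o*o' 'oXo'   # * at the start of a BRE subexpression
check '^*'    '*oo' 'Xoo'   # * directly after a leading ^
```

a libc that rejects one of these patterns makes sed fail, so the
corresponding line reports FAIL instead of ok.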


[-- Attachment #2: 0001-fix-at-the-start-of-a-BRE-subexpression.patch --]
[-- Type: text/x-diff, Size: 1136 bytes --]

From b4abe263b2bc0c183274d1aec70cc586e4a46ba1 Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <nsz@port70.net>
Date: Mon, 29 Feb 2016 15:04:46 +0000
Subject: [PATCH 1/2] fix * at the start of a BRE subexpression

commit 7eaa76fc2e7993582989d3838b1ac32dd8abac09 made * invalid at
the start of a BRE subexpression, but it should be accepted as
literal * there according to the standard.

This patch does not fix subexpressions starting with ^*.
---
 src/regex/regcomp.c |    4 ----
 1 file changed, 4 deletions(-)

diff --git a/src/regex/regcomp.c b/src/regex/regcomp.c
index da6abd1..7a2864c 100644
--- a/src/regex/regcomp.c
+++ b/src/regex/regcomp.c
@@ -889,7 +889,6 @@ static reg_errcode_t parse_atom(tre_parse_ctx_t *ctx, const char *s)
 		s++;
 		break;
 	case '*':
-		return REG_BADPAT;
 	case '{':
 	case '+':
 	case '?':
@@ -978,9 +977,6 @@ static reg_errcode_t tre_parse(tre_parse_ctx_t *ctx)
 		}
 
 	parse_iter:
-		/* extension: repetitions are rejected after an empty node
-		   eg. (+), |*, {2}, but assertions are not treated as empty
-		   so ^* or $? are accepted currently. */
 		for (;;) {
 			int min, max;
 
-- 
1.7.9.5


[-- Attachment #3: 0002-fix-at-the-start-of-a-complete-BRE.patch --]
[-- Type: text/x-diff, Size: 1223 bytes --]

From d24223c8b344ab3c58f1b9200379bd5349bb8cee Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <nsz@port70.net>
Date: Mon, 29 Feb 2016 16:36:25 +0000
Subject: [PATCH 2/2] fix ^* at the start of a complete BRE

This is a workaround to treat * as literal * at the start of a BRE.

Ideally ^ would be treated as an anchor at the start of any BRE
subexpression and similarly $ would be an anchor at the end of any
subexpression.  This is not required by the standard and hard to do
with the current code, but it's the existing practice.  If it is
changed, * should be treated as literal after such anchor as well.
---
 src/regex/regcomp.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/regex/regcomp.c b/src/regex/regcomp.c
index 7a2864c..5fad98b 100644
--- a/src/regex/regcomp.c
+++ b/src/regex/regcomp.c
@@ -994,6 +994,10 @@ static reg_errcode_t tre_parse(tre_parse_ctx_t *ctx)
 			if (*s=='\\')
 				s++;
 
+			/* handle ^* at the start of a complete BRE. */
+			if (!ere && s==ctx->re+1 && s[-1]=='^')
+				break;
+
 			/* extension: multiple consecutive *+?{,} is unspecified,
 			   but (a+)+ has to be supported so accepting a++ makes
 			   sense, note however that the RE_DUP_MAX limit can be
-- 
1.7.9.5



* Re: regex issue / asterisk / musl / sed
  2016-02-29 19:18   ` Szabolcs Nagy
@ 2016-03-01 13:20     ` Bastian Bittorf
  0 siblings, 0 replies; 5+ messages in thread
From: Bastian Bittorf @ 2016-03-01 13:20 UTC
  To: musl-mailinglist, mailinglist

* Szabolcs Nagy <nsz@port70.net> [29.02.2016 20:35]:
> a possible fix is attached, the handling of ^ and $
> in BRE is suboptimal, but that will need a bigger
> refactoring.

thank you, fixes it for me on x86/UML and MIPS/ar71xx.

bye, bastian


Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/