From: "liheng (P)" <liheng40@huawei.com>
To: Rich Felker <dalias@libc.org>
Cc: "musl@lists.openwall.com" <musl@lists.openwall.com>,
"Xiangrui (Euler)" <rui.xiang@huawei.com>,
Lizefan <lizefan@huawei.com>
Subject: [musl] regex Back reference matching result not same as glibc and tre.
Date: Sat, 18 Apr 2020 08:44:50 +0000
Message-ID: <6D612B6AC5DCDA4580AF97B1068118AD2DC49A@DGGEML501-MBX.china.huawei.com>
Rich Felker:
Hello, I've noticed that musl's regex matching result differs from that of glibc and tre.
Back references do not seem to be well supported in the latest version.
Here is a simple test case:
#include <regex.h>
#include <stdio.h>
#include <string.h>

#define str "aba"
#define N 2

static const char *expected[N] =
{
    str, "a"
};

static const char pat[] = "(.?).?\\1";

int test_regex(void)
{
    regex_t rbuf;
    int err = regcomp(&rbuf, pat, REG_EXTENDED);
    if (err != 0) {
        char errstr[300];
        regerror(err, &rbuf, errstr, sizeof (errstr));
        puts (errstr);
        return err;
    }
    regmatch_t m[N];
    err = regexec(&rbuf, str, N, m, 0);
    if (err != 0) {
        puts ("regexec failed");
        return 1;
    }
    int result = 0;
    int i;
    for (i = 0; i < N; ++i) {
        if (m[i].rm_so == -1) {
            printf ("m[%d] unused\n", i);
            result = 1;
        }
        else {
            int len = m[i].rm_eo - m[i].rm_so;
            printf ("m[%d] = \"%.*s\"\n", i, len, str + m[i].rm_so);
            if (strlen (expected[i]) != len
                || memcmp (expected[i], str + m[i].rm_so, len) != 0)
                result = 1;
        }
    }
    return result;
}

int main (void)
{
    int result = 0;
    result = test_regex();
    if (result != 0) {
        printf("test regex failed\n");
    } else {
        printf("test regex success\n");
    }
    return result;
}
musl:
# ./test
regexec failed
test regex failed
glibc:
# ./test
m[0] = "aba"
m[1] = "a"
m[2] = ""
test regex success
tre:
# ./test
m[0] = "aba"
m[1] = "a"
m[2] = ""
test regex success
I noticed that Rich Felker changed the back reference handling in the commit below, which suppresses back reference processing in ERE regcomp.
commit 7c8c86f6308c7e0816b9638465a5917b12159e8f
Author: Rich Felker <dalias@aerifal.cx>
Date: Fri Mar 20 18:25:01 2015 -0400
suppress backref processing in ERE regcomp
one of the features of ERE is that it's actually a regular language
and does not admit expressions which cannot be matched in linear time.
introduction of \n backref support into regcomp's ERE parsing was
unintentional.
diff --git a/src/regex/regcomp.c b/src/regex/regcomp.c
index bce6bc15..4d80cb1c 100644
--- a/src/regex/regcomp.c
+++ b/src/regex/regcomp.c
@@ -839,7 +839,7 @@ static reg_errcode_t parse_atom(tre_parse_ctx_t *ctx, const char *s)
break;
default:
- if (isdigit(*s)) {
+ if (!ere && isdigit(*s)) {
/* back reference */
This commit suggests that if I want to use back references I should not pass REG_EXTENDED, but the test case still fails to match.
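For comparison, the non-ERE route could be sketched roughly as follows. This is only an illustration under stated assumptions: the pattern is rewritten in BRE syntax, using \( \) for the group and the interval \{0,1\} as the BRE counterpart of "?", and the helper name bre_backref_match is hypothetical (it is not part of musl or of the test case above). The exact match offsets may differ between implementations, so none are claimed here.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper, for illustration only: compile a BRE
 * (no REG_EXTENDED) that uses a back reference, and report the
 * offsets of the whole match (m[0]) and of group 1 (m[1]). */
static int bre_backref_match(const char *subject, regmatch_t m[2])
{
    /* Assumed BRE spelling of the ERE "(.?).?\1": groups are written
     * \( \), and the interval \{0,1\} stands in for "?". */
    static const char bre_pat[] = "\\(.\\{0,1\\}\\).\\{0,1\\}\\1";
    regex_t re;
    int err;

    assert(subject != NULL && m != NULL);

    err = regcomp(&re, bre_pat, 0);     /* cflags 0 selects BRE */
    if (err != 0) {
        char errstr[300];
        regerror(err, &re, errstr, sizeof errstr);
        fprintf(stderr, "regcomp: %s\n", errstr);
        return err;
    }

    err = regexec(&re, subject, 2, m, 0);
    regfree(&re);
    return err;                         /* 0 on match, REG_NOMATCH otherwise */
}
```

Calling bre_backref_match("aba", m) with a regmatch_t m[2] array and then printing the m[0] and m[1] ranges would give a result to compare against the ERE runs above.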
I then tried enabling back references in ERE regcomp with the modification below, after which musl's regex matching succeeds, the same as glibc and tre.
--- a/src/regex/regcomp.c
+++ b/src/regex/regcomp.c
default:
- if (!ere && isdigit(*s)) {
+ if (ere && isdigit(*s)) {
/* back reference */
Thank you for considering this.
Li Heng