From: "liheng (P)"
To: Rich Felker
CC: "musl@lists.openwall.com", "Xiangrui (Euler)", Lizefan
Date: Sat, 18 Apr 2020 08:44:50 +0000
Message-ID: <6D612B6AC5DCDA4580AF97B1068118AD2DC49A@DGGEML501-MBX.china.huawei.com>
Subject: [musl] regex Back reference matching result not same as glibc and tre.

Rich Felker:

Hello,

I've noticed that musl's regex matching result differs from that of glibc
and TRE. Back references may not be well supported in the latest version.

Here is a simple test case:

#include <stdio.h>
#include <string.h>
#include <regex.h>

#define str "aba"
#define N 2

/* expected[0] is the whole match, expected[1] is group 1 */
static const char *expected[N] = { str, "a" };
static const char pat[] = "(.?).?\\1";

int test_regex(void)
{
    regex_t rbuf;
    int err = regcomp(&rbuf, pat, REG_EXTENDED);
    if (err != 0) {
        char errstr[300];
        regerror(err, &rbuf, errstr, sizeof (errstr));
        puts (errstr);
        return err;
    }

    regmatch_t m[N];
    err = regexec(&rbuf, str, N, m, 0);
    if (err != 0) {
        puts ("regexec failed");
        regfree(&rbuf);
        return 1;
    }

    int result = 0;
    int i;
    for (i = 0; i < N; ++i) {
        if (m[i].rm_so == -1) {
            printf ("m[%d] unused\n", i);
            result = 1;
        } else {
            int len = (int)(m[i].rm_eo - m[i].rm_so);
            printf ("m[%d] = \"%.*s\"\n", i, len, str + m[i].rm_so);
            /* compare the matched span against the expected text */
            if (strlen (expected[i]) != (size_t)len
                || memcmp (expected[i], str + m[i].rm_so, len) != 0)
                result = 1;
        }
    }

    regfree(&rbuf);
    return result;
}

int main (void)
{
    int result = 0;
    result = test_regex();
    if (result != 0) {
        printf("test regex failed\n");
    } else {
        printf("test regex success\n");
    }
    return result;
}

musl:
# ./test
regexec failed
test regex failed

glibc:
# ./test
m[0] = "aba"
m[1] = "a"
m[2] = ""
test regex success

tre:
# ./test
m[0] = "aba"
m[1] = "a"
m[2] = ""
test regex success

I noticed that Rich Felker changed the back reference handling in the
commit below, which suppresses back reference processing in ERE regcomp.

commit 7c8c86f6308c7e0816b9638465a5917b12159e8f
Author: Rich Felker
Date:   Fri Mar 20 18:25:01 2015 -0400

    suppress backref processing in ERE regcomp

    one of the features of ERE is that it's actually a regular language
    and does not admit expressions which cannot be matched in linear
    time. introduction of \n backref support into regcomp's ERE parsing
    was unintentional.

diff --git a/src/regex/regcomp.c b/src/regex/regcomp.c
index bce6bc15..4d80cb1c 100644
--- a/src/regex/regcomp.c
+++ b/src/regex/regcomp.c
@@ -839,7 +839,7 @@ static reg_errcode_t parse_atom(tre_parse_ctx_t *ctx, const char *s)
 			break;
 		default:
-			if (isdigit(*s)) {
+			if (!ere && isdigit(*s)) {
 				/* back reference */

This commit suggests that if I want to use back references I should not
set REG_EXTENDED, but the test case still fails to match without it.

I then tried to support back references in ERE regcomp with the
modification below, after which musl's regex matching succeeds, the same
as glibc and TRE.

--- a/src/regex/regcomp.c
+++ b/src/regex/regcomp.c
 		default:
-			if (!ere && isdigit(*s)) {
+			if (ere && isdigit(*s)) {
 				/* back reference */

Thank you for considering this.

Li Heng
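
For reference, simply dropping REG_EXTENDED is probably not enough for
this pattern: in POSIX BRE syntax "(", ")" and "?" are ordinary
characters, so "(.?).?\1" no longer defines a group for \1 to refer to.
Below is a minimal sketch of the same pattern respelled as a BRE, with
\( \) for grouping and the interval \{0,1\} in place of "?". The helper
name match_bre_backref and the constant bre_pat are hypothetical and are
not part of musl or of the test above; I have not confirmed what each
libc reports for it.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* BRE spelling of the ERE "(.?).?\1": groups are written \( \) and
 * the optional quantifier "?" becomes the interval \{0,1\}. */
static const char bre_pat[] = "\\(.\\{0,1\\}\\).\\{0,1\\}\\1";

/* Hypothetical helper (an assumption for illustration only):
 * compiles bre_pat as a BRE (cflags == 0, i.e. without REG_EXTENDED)
 * and matches it against subject, filling m[0..nmatch-1].
 * Returns 0 on a match, a nonzero regex error code otherwise. */
static int match_bre_backref(const char *subject, regmatch_t *m, size_t nmatch)
{
    regex_t re;
    int err;

    /* need room for the whole match and for group 1 */
    assert(m != NULL && nmatch >= 2);

    err = regcomp(&re, bre_pat, 0);
    if (err != 0) {
        char errstr[300];
        regerror(err, &re, errstr, sizeof errstr);
        fprintf(stderr, "regcomp: %s\n", errstr);
        return err;
    }

    err = regexec(&re, subject, nmatch, m, 0);
    regfree(&re);
    return err;
}
```

Calling match_bre_backref("aba", m, 2) from main with a regmatch_t m[2]
and printing m[0] and m[1] in the same way as the test above would allow
a like-for-like comparison between musl, glibc and TRE in BRE mode.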