mailing list of musl libc
* [musl] regex Back reference matching result not same as glibc and tre.
@ 2020-04-18  8:44 liheng (P)
  2020-04-18 10:28 ` Florian Weimer
  0 siblings, 1 reply; 9+ messages in thread
From: liheng (P) @ 2020-04-18  8:44 UTC
  To: Rich Felker; +Cc: musl, Xiangrui (Euler), Lizefan

Rich Felker:

Hello, I've noticed that musl's regex matching result differs from glibc and tre.
Back references do not seem to be well supported in the latest version.

Here is a simple test case:

#include <regex.h>
#include <stdio.h>
#include <string.h>

#define str "aba"
#define N 2
static const char *expected[N] =
{
        str, "a"
};

static const char pat[] = "(.?).?\\1";

int test_regex(void)
{
        regex_t rbuf;

        int err = regcomp(&rbuf, pat, REG_EXTENDED);
        if (err != 0) {
                char errstr[300];
                regerror(err, &rbuf, errstr, sizeof (errstr));
                puts (errstr);
                return err;
        }

        regmatch_t m[N];
        err = regexec(&rbuf, str, N, m, 0);
        if (err != 0) {
                puts ("regexec failed");
                return 1;
        }

        int result = 0;
        int i;
        for (i = 0; i < N; ++i) {
                if (m[i].rm_so == -1) {
                        printf ("m[%d] unused\n", i);
                        result = 1;
                }
                else {
                        int len = m[i].rm_eo - m[i].rm_so;
                        printf ("m[%d] = \"%.*s\"\n", i, len, str + m[i].rm_so);
                        if (strlen (expected[i]) != (size_t) len
                                || memcmp (expected[i], str + m[i].rm_so, len) != 0)
                                result = 1;
                }
        }

        regfree(&rbuf);
        return result;
}

int main (void)
{
        int result = 0;

        result = test_regex();

        if (result != 0) {
                printf("test regex failed\n");
        } else {
                printf("test regex success\n");
        }

        return result;
}

musl: 
# ./test
regexec failed
test regex failed

glibc:
# ./test
m[0] = "aba"
m[1] = "a"
m[2] = ""
test regex success

tre:
# ./test
m[0] = "aba"
m[1] = "a"
m[2] = ""
test regex success


I noticed that Rich Felker changed back reference handling in the commit below, which suppresses back reference processing in ERE regcomp.

commit 7c8c86f6308c7e0816b9638465a5917b12159e8f
Author: Rich Felker <dalias@aerifal.cx>
Date:   Fri Mar 20 18:25:01 2015 -0400

    suppress backref processing in ERE regcomp

    one of the features of ERE is that it's actually a regular language
    and does not admit expressions which cannot be matched in linear time.
    introduction of \n backref support into regcomp's ERE parsing was
    unintentional.

diff --git a/src/regex/regcomp.c b/src/regex/regcomp.c
index bce6bc15..4d80cb1c 100644
--- a/src/regex/regcomp.c
+++ b/src/regex/regcomp.c
@@ -839,7 +839,7 @@ static reg_errcode_t parse_atom(tre_parse_ctx_t *ctx, const char *s)
                        break;
                default:
-                       if (isdigit(*s)) {
+                       if (!ere && isdigit(*s)) {
                                /* back reference */


This commit suggests that if I want to use back references I should not set REG_EXTENDED, but matching in this test case still failed.
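
For reference, here is a minimal sketch of how the same pattern could be tried without REG_EXTENDED. The helper name bre_match and the BRE spelling of the pattern are assumptions for illustration and are not part of the original report. In POSIX BRE, groups are written \( \) and "?" is an ordinary character, so the optional quantifier is spelled as the interval \{0,1\}.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not from the original report): compile `pat` as a
 * POSIX basic regular expression (cflags = 0, i.e. no REG_EXTENDED) and
 * match it against `s`, filling up to `n` entries of `m`.
 * Returns -1 if regcomp fails, otherwise the regexec result
 * (0 on a match, REG_NOMATCH when nothing matches). */
int bre_match(const char *pat, const char *s, size_t n, regmatch_t *m)
{
        regex_t re;
        int rc;

        if (regcomp(&re, pat, 0) != 0)
                return -1;

        rc = regexec(&re, s, n, m, 0);
        regfree(&re);
        return rc;
}

/* Assumed BRE spelling of the ERE pattern "(.?).?\\1" used in the test. */
const char bre_pat[] = "\\(.\\{0,1\\}\\).\\{0,1\\}\\1";
```

Calling bre_match(bre_pat, "aba", 2, m) and printing m[0] and m[1] the same way as the loop in the test case gives the BRE counterpart of the test. What it prints depends on the libc in use, so no output is shown here.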

I then tried to support back references in ERE regcomp with the modification below, after which musl's regex matching succeeds, the same as glibc and tre.

--- a/src/regex/regcomp.c
+++ b/src/regex/regcomp.c
                default:
-                       if (!ere && isdigit(*s)) {
+                       if (ere && isdigit(*s)) {
                                /* back reference */


Thank you for considering this.

Li Heng


Thread overview: 9+ messages
2020-04-18  8:44 [musl] regex Back reference matching result not same as glibc and tre liheng (P)
2020-04-18 10:28 ` Florian Weimer
2020-04-18 11:07   ` liheng (P)
2020-04-18 11:13     ` Szabolcs Nagy
2020-04-18 11:37       ` liheng (P)
2020-04-18 14:07         ` Szabolcs Nagy
2020-04-19 12:26           ` liheng (P)
2020-04-19 13:10             ` Florian Weimer
2020-04-20  1:26             ` Rich Felker
