From: "liheng (P)" <liheng40@huawei.com>
To: Rich Felker <dalias@libc.org>
Cc: "musl@lists.openwall.com" <musl@lists.openwall.com>,
	"Xiangrui (Euler)" <rui.xiang@huawei.com>,
	Lizefan <lizefan@huawei.com>
Subject: [musl] regex Back reference matching result not same as glibc and tre.
Date: Sat, 18 Apr 2020 08:44:50 +0000
Message-ID: <6D612B6AC5DCDA4580AF97B1068118AD2DC49A@DGGEML501-MBX.china.huawei.com>

Rich Felker:

Hello, I've noticed that musl's regex matching result differs from that of glibc and tre.
Back references may not be well supported in the latest version.

Here is a simple test case:

#include <regex.h>
#include <stdio.h>
#include <string.h>

#define str "aba"
#define N 2
static const char *expected[N] =
{
        str, "a"
};

static const char pat[] = "(.?).?\\1";

int test_regex(void)
{
        regex_t rbuf;

        int err = regcomp(&rbuf, pat, REG_EXTENDED);
        if (err != 0) {
                char errstr[300];
                regerror(err, &rbuf, errstr, sizeof (errstr));
                puts (errstr);
                return err;
        }

        regmatch_t m[N];
        err = regexec(&rbuf, str, N, m, 0);
        if (err != 0) {
                puts ("regexec failed");
                regfree (&rbuf);
                return 1;
        }

        int result = 0;
        int i;
        for (i = 0; i < N; ++i) {
                if (m[i].rm_so == -1) {
                        printf ("m[%d] unused\n", i);
                        result = 1;
                }
                else {
                        /* Length of the matched substring. */
                        int len = m[i].rm_eo - m[i].rm_so;
                        printf ("m[%d] = \"%.*s\"\n", i, len, str + m[i].rm_so);
                        /* Cast avoids a signed/unsigned comparison. */
                        if (strlen (expected[i]) != (size_t) len
                                || memcmp (expected[i], str + m[i].rm_so, len) != 0)
                                result = 1;
                }
        }

        /* Release the compiled pattern. */
        regfree (&rbuf);
        return result;
}

int main (void)
{
        int result = 0;

        result = test_regex();

        if (result != 0) {
                printf("test regex failed\n");
        } else {
                printf("test regex success\n");
        }

        return result;
}

musl: 
# ./test
regexec failed
test regex failed

glibc:
# ./test
m[0] = "aba"
m[1] = "a"
m[2] = ""
test regex success

tre:
# ./test
m[0] = "aba"
m[1] = "a"
m[2] = ""
test regex success
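
The difference can be reduced to a smaller probe. The following is only a sketch: the helper names pattern_matches and ere_backref_supported are hypothetical and not part of any libc. It assumes that an implementation treating "\1" in an ERE as a back reference will match "^(a)\1$" against "aa" but not against "ab".

```c
#include <regex.h>

/*
 * Hypothetical helper (not part of any libc): returns 1 if the pattern
 * compiles with the given cflags and matches the subject, 0 otherwise.
 */
static int pattern_matches(const char *pattern, int cflags, const char *subject)
{
        regex_t re;
        regmatch_t m[1];
        int matched;

        if (regcomp(&re, pattern, cflags) != 0)
                return 0;
        matched = regexec(&re, subject, 1, m, 0) == 0;
        regfree(&re);
        return matched;
}

/*
 * Returns 1 if "\1" in an ERE behaves as a back reference:
 * "^(a)\1$" must then match "aa" and must not match "ab".
 */
static int ere_backref_supported(void)
{
        return pattern_matches("^(a)\\1$", REG_EXTENDED, "aa")
                && !pattern_matches("^(a)\\1$", REG_EXTENDED, "ab");
}
```

Calling ere_backref_supported() from main and printing its return value gives a one-line indication of which behaviour the installed libc has.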


I noticed that Rich Felker changed back reference handling in the commit below, which suppresses back reference processing in ERE regcomp.

commit 7c8c86f6308c7e0816b9638465a5917b12159e8f
Author: Rich Felker <dalias@aerifal.cx>
Date:   Fri Mar 20 18:25:01 2015 -0400

    suppress backref processing in ERE regcomp

    one of the features of ERE is that it's actually a regular language
    and does not admit expressions which cannot be matched in linear time.
    introduction of \n backref support into regcomp's ERE parsing was
    unintentional.

diff --git a/src/regex/regcomp.c b/src/regex/regcomp.c
index bce6bc15..4d80cb1c 100644
--- a/src/regex/regcomp.c
+++ b/src/regex/regcomp.c
@@ -839,7 +839,7 @@ static reg_errcode_t parse_atom(tre_parse_ctx_t *ctx, const char *s)
                        break;
                default:
-                       if (isdigit(*s)) {
+                       if (!ere && isdigit(*s)) {
                                /* back reference */


This commit suggests that if I want to use back references I should not pass REG_EXTENDED, but matching in this test case still fails.
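
For comparison, note that simply dropping REG_EXTENDED is not enough for this pattern, because in a POSIX basic regular expression "(", ")" and "?" are ordinary characters. A BRE spelling of the same expression uses \( \) for the group and the interval \{0,1\} in place of "?". The sketch below shows that form; the names bre_pat and test_bre_backref are hypothetical, and no particular output is claimed for any libc.

```c
#include <regex.h>
#include <stdio.h>

/* BRE spelling of the ERE used above: \( \) groups, \{0,1\} replaces "?". */
static const char bre_pat[] = "\\(.\\{0,1\\}\\).\\{0,1\\}\\1";

/*
 * Hypothetical helper: compiles bre_pat without REG_EXTENDED, matches it
 * against subject and prints the whole match and group 1.
 * Returns 0 on a match, nonzero otherwise.
 */
static int test_bre_backref(const char *subject)
{
        regex_t re;
        regmatch_t m[2];
        int err, i;

        err = regcomp(&re, bre_pat, 0);
        if (err != 0) {
                char errstr[300];
                regerror(err, &re, errstr, sizeof errstr);
                puts(errstr);
                return err;
        }

        err = regexec(&re, subject, 2, m, 0);
        if (err == 0) {
                for (i = 0; i < 2; ++i) {
                        if (m[i].rm_so == -1)
                                continue;
                        printf("m[%d] = \"%.*s\"\n", i,
                               (int)(m[i].rm_eo - m[i].rm_so),
                               subject + m[i].rm_so);
                }
        }

        regfree(&re);
        return err;
}
```

Calling test_bre_backref("aba") exercises the back reference path without relying on ERE back reference support.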

I then tried to support back references in ERE regcomp with the modification below, after which musl's regex matching succeeds, the same as glibc and tre.

--- a/src/regex/regcomp.c
+++ b/src/regex/regcomp.c
                default:
-                       if (!ere && isdigit(*s)) {
+                       if (ere && isdigit(*s)) {
                                /* back reference */


Thank you for considering this.

Li Heng

Thread overview: 9+ messages
2020-04-18  8:44 liheng (P) [this message]
2020-04-18 10:28 ` Florian Weimer
2020-04-18 11:07   ` liheng (P)
2020-04-18 11:13     ` Szabolcs Nagy
2020-04-18 11:37       ` liheng (P)
2020-04-18 14:07         ` Szabolcs Nagy
2020-04-19 12:26           ` liheng (P)
2020-04-19 13:10             ` Florian Weimer
2020-04-20  1:26             ` Rich Felker
