From: Szabolcs Nagy <nsz@port70.net>
To: Mike Beattie <mike@ethernal.org>
Cc: musl@lists.openwall.com
Subject: Re: [musl] Bug: BOL/EOL anchors in regex capture groups won't match EOL
Date: Thu, 4 Aug 2022 00:43:42 +0200
Message-ID: <20220803224342.GF1320090@port70.net>
In-Reply-To: <20220721060819.GB9838@prometheus.ethernal.org>
* Mike Beattie <mike@ethernal.org> [2022-07-21 18:08:19 +1200]:
> FRRouting uses musl-libc in its docker container build, and it also appears
> to be in use in the GNS3 appliances for frr available online.
>
> BGP as-path matching is regex powered, and usage of a special token of '_'
> allows for the easy matching of the boundary of an ASN in an as-path.
> Internally, it's translated into the regex capture group of:
>
> (^|[,{}() ]|$)
>
> A valid as-path is a sequence of integers such as:
>
> 100 200 300
>
> A BGP as-path filter might be specified as so:
>
> bgp as-path access-list foo seq 20 permit _300_
>
> which would get expanded to:
>
> (^|[,{}() ]|$)300(^|[,{}() ]|$)
>
> when checking for a match. The usage of the pattern "(^|$)" in musl's regex
> implementation will never match EOL, but it does match BOL. Removal of the
> circumflex will let the match succeed.
thanks for the report.
it seems to me regcomp does not handle assertions correctly if there is
a union (|) of multiple subexpressions that match the empty string.
it simply takes the assertion of the leftmost subexpression, so e.g.
'(|$)a' matches 'a' but
'($|)a' does not, because it matches as '$a' and the $ assertion fails.
since posix does not allow an empty alternative such as (| in the syntax,
a conforming example is e.g.
'(b*|$)a' vs '($|b*)a' (on 'a' the first matches but the second does not).
all supported assertions are affected (^, $, \b, \B, \<, \>).
the fix is not obvious: there is a regcomp step like

    tags, assertions = leftmost_empty_match(subexpr)
    process(tags, assertions)

which should be

    list = all_empty_match(subexpr)
    for tags, assertions in list:
        if assertions are weaker than previous ones:
            process(tags, assertions)
i think this can increase storage and computation requirements
significantly unless the algorithm is further optimized.
>
> Here is the output of a test program I've written to confirm this:
>
> $ musl-gcc -o r r.c
>
> $ ./r "_300_" "100 200 300"
> regex: (^|[,{}() ]|$)300(^|[,{}() ]|$)
> regexec on [100 200 300]: NOT Found
>
> Removal of "^|" from the beginning of the trailing capture group:
>
> $ ./r "(^|[,{}() ]|$)300([,{}() ]|$)" "100 200 300"
> regex: (^|[,{}() ]|$)300([,{}() ]|$)
> regexec on [100 200 300]: Found
>
> Thanks,
> Mike.
> --
> Mike Beattie <mike@ethernal.org>