mailing list of musl libc
From: Szabolcs Nagy <nsz@port70.net>
To: Mike Beattie <mike@ethernal.org>
Cc: musl@lists.openwall.com
Subject: Re: [musl] Bug: BOL/EOL anchors in regex capture groups won't match EOL
Date: Thu, 4 Aug 2022 00:43:42 +0200
Message-ID: <20220803224342.GF1320090@port70.net>
In-Reply-To: <20220721060819.GB9838@prometheus.ethernal.org>

* Mike Beattie <mike@ethernal.org> [2022-07-21 18:08:19 +1200]:
> FRRouting uses musl-libc in its docker container build, and it also appears
> to be in use in the GNS3 appliances for frr available online.
> 
> BGP as-path matching is regex powered, and usage of a special token of '_'
> allows for the easy matching of the boundary of an ASN in an as-path.
> Internally, it's translated into the regex capture group of:
> 
>    (^|[,{}() ]|$)
> 
> A valid as-path is a sequence of integers such as:
> 
>    100 200 300
> 
> A BGP as-path filter might be specified as so:
> 
>    bgp as-path access-list foo seq 20 permit _300_
> 
> which would get expanded to:
> 
>    (^|[,{}() ]|$)300(^|[,{}() ]|$)
> 
> when checking for a match. The usage of the pattern "(^|$)" in musl's regex
> implementation will never match EOL, but it does match BOL. Removal of the
> circumflex will let the match succeed.

thanks for the report.

it seems to me regcomp does not handle assertions correctly if there is
a union (|) of multiple subexpressions that match the empty string.

it simply takes the assertions of the leftmost such subexpression, so e.g.

'(|$)a' matches 'a' but
'($|)a' does not, because it is matched as '$a' and the $ assertion fails.

since posix does not allow an empty pattern such as (| in the syntax, a
conforming example is e.g.

'(b*|$)a' vs '($|b*)a'

all supported assertions are affected (^, $, \b, \B, \<, \>).
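
a minimal sketch for checking the examples above against the local libc.
it only uses the posix regcomp/regexec/regfree interface; the helper name
ere_matches is made up for this illustration and is not part of musl.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not a musl function).
 * Returns 1 if `text` matches the POSIX extended regex `pattern`,
 * 0 if it does not (or regexec reports an error),
 * and -1 if the pattern fails to compile. */
int ere_matches(const char *pattern, const char *text)
{
	regex_t re;
	int rc;

	assert(pattern != NULL && text != NULL);

	/* REG_NOSUB: only a yes/no answer is needed, no submatch offsets. */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0 ? 1 : 0;
}
```

calling ere_matches("(b*|$)a", "a") and ere_matches("($|b*)a", "a") should
both return 1 on an implementation without this bug; with the behaviour
described above the second call is expected to return 0.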

the fix is not obvious: there is a regcomp step like

	tags, assertions = leftmost_empty_match(subexpr)
	process(tags, assertions)

which should be

	list = all_empty_match(subexpr)
	for tags, assertions in list:
		if assertions are weaker than previous ones:
			process(tags, assertions)

i think this can increase storage and computation requirements
significantly unless the algorithm is further optimized.
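
a rough sketch of the filtering loop above, under several assumptions:
assertions are modelled as a plain bitmask, tags are ignored, and "weaker
than previous ones" is read as "not already implied by an earlier kept
entry". the names A_BOL, A_EOL, A_WB, covered_by and select_empty_matches
are hypothetical and do not correspond to the actual regcomp code.

```c
#include <stddef.h>

/* Hypothetical assertion bits; these are not musl's definitions. */
enum {
	A_BOL = 1 << 0, /* ^  */
	A_EOL = 1 << 1, /* $  */
	A_WB  = 1 << 2  /* \b */
};

/* Nonzero if assertion set `a` is covered by set `p`: `p` requires a
 * subset of what `a` requires, so whenever `a` holds, `p` holds too. */
static int covered_by(unsigned p, unsigned a)
{
	return (p & a) == p;
}

/* `in[0..n)` holds the assertion sets of all alternatives that can match
 * the empty string, in left-to-right priority order. Copies to `out`
 * (capacity >= n) only the sets not covered by an earlier kept set and
 * returns how many were kept. */
size_t select_empty_matches(const unsigned *in, size_t n, unsigned *out)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		int redundant = 0;

		for (size_t j = 0; j < kept; j++) {
			if (covered_by(out[j], in[i])) {
				redundant = 1;
				break;
			}
		}
		if (!redundant)
			out[kept++] = in[i];
	}
	return kept;
}
```

with this reading, '(|$)' keeps only the empty assertion set, while '($|)'
and '(^|$)' each keep two entries. the inner loop is quadratic in the
number of empty-matching alternatives, which is one way to see the extra
cost mentioned above.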


> 
> Here is the output of a test program I've written to confirm this:
> 
>    $ musl-gcc -o r r.c
> 
>    $ ./r "_300_" "100 200 300"
>    regex: (^|[,{}() ]|$)300(^|[,{}() ]|$)
>    regexec on [100 200 300]: NOT Found
> 
> Removal of "^|" from the beginning of the trailing capture group:
> 
>    $ ./r "(^|[,{}() ]|$)300([,{}() ]|$)" "0000 1111 2222"
>    regex: (^|[,{}() ]|$)300([,{}() ]|$)
>    regexec on [100 200 300]: Found
> 
> Thanks,
> Mike.
> -- 
> Mike Beattie <mike@ethernal.org>
