mailing list of musl libc
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Cc: "Robert Högberg" <robert.hogberg@gmail.com>
Subject: Re: Unexpected regex behaviour
Date: Mon, 29 Oct 2018 18:59:57 -0400
Message-ID: <20181029225957.GR5150@brightrain.aerifal.cx>
In-Reply-To: <CAFYbUHMxwFOzW9f_T0etsi9efH3RoSJpBhCVigBXB9LM-ANE-A@mail.gmail.com>

On Mon, Oct 29, 2018 at 11:26:19PM +0100, Robert Högberg wrote:
> Hi,
> 
> I've noticed that the musl regex implementation behaves slightly
> differently than the glibc implementation. I'm attaching a short program
> showing the behaviour.
> 
> The difference makes yate (http://yate.null.ro) misbehave when running with
> musl (reported here: https://github.com/openwrt/telephony/issues/378).
> 
> Yate uses a regexp like this:
> "^\\([[:alpha:]][[:alnum:]]\\+:\\)\\?/\\?/\\?\\([^[:space:][:cntrl:]@]\\+@\\)\\?\\([[:alnum:]._+-]\\+\\|[[][[:xdigit:].:]\\+[]]\\)\\(:[0-9]\\+\\)\\?"
> 
> ... to parse strings like:
> "sip:012345678@11.111.11.111:5060;user=phone"
> 
> ... and the matches produced by musl are:
> Match 0:  0 - 32        sip:012345678@11.111.11.111:5060
> Match 1: -1 - -1
> Match 2:  0 - 14        sip:012345678@
> Match 3: 14 - 27        11.111.11.111
> Match 4: 27 - 32        :5060
> 
> ... while glibc produces:
> Match 0:  0 - 32        sip:012345678@11.111.11.111:5060
> Match 1:  0 -  4        sip:
> Match 2:  4 - 14        012345678@
> Match 3: 14 - 27        11.111.11.111
> Match 4: 27 - 32        :5060
> 
> What do you think?
> 
> I've only tested musl 1.1.19. Sorry if this is not valid for later
> releases. I skimmed the 1.1.20 release notes and didn't find anything regex
> related.

I haven't checked which of the extensions you're using are supported
in musl, but the above is not a conforming POSIX BRE. It would be a
lot more readable and portable to use POSIX ERE (REG_EXTENDED) which
has the +, ?, and | operators as standard features. This looks like it
should work:

"^([[:alpha:]][[:alnum:]]+:)?/?/?([^[:space:][:cntrl:]@]+@)?([[:alnum:]._+-]+|[[][[:xdigit:].:]+[]])(:[0-9]+)?"

The only reason to use POSIX BRE is if you need backreferences, which
are not regular and explicitly not supported in ERE.
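
As a rough illustration of the suggestion above, here is a minimal sketch of
compiling the ERE form with REG_EXTENDED and reading back the subexpression
offsets. The helper names match_uri and print_matches and the NGROUPS constant
are hypothetical, chosen only for this example; they are not taken from yate or
from the attached test program. The calls used are the standard regcomp,
regexec, regerror and regfree from <regex.h>.

```c
#include <regex.h>
#include <stdio.h>

/* ERE form of the pattern, split across two literals for readability. */
const char uri_ere[] =
    "^([[:alpha:]][[:alnum:]]+:)?/?/?([^[:space:][:cntrl:]@]+@)?"
    "([[:alnum:]._+-]+|[[][[:xdigit:].:]+[]])(:[0-9]+)?";

#define NGROUPS 5 /* whole match plus four parenthesized subexpressions */

/* Hypothetical helper: compile the ERE, match s, fill m[].
 * Returns 0 on a match, a nonzero regcomp/regexec code otherwise. */
int match_uri(const char *s, regmatch_t m[NGROUPS])
{
    regex_t re;
    int rc = regcomp(&re, uri_ere, REG_EXTENDED);
    if (rc != 0) {
        char buf[128];
        regerror(rc, &re, buf, sizeof buf);
        fprintf(stderr, "regcomp: %s\n", buf);
        return rc;
    }
    rc = regexec(&re, s, NGROUPS, m, 0);
    regfree(&re);
    return rc;
}

/* Hypothetical helper: print offsets in the same layout as the report. */
void print_matches(const char *s, const regmatch_t m[NGROUPS])
{
    for (int i = 0; i < NGROUPS; i++) {
        if (m[i].rm_so < 0) {
            /* Subexpression did not participate in the match. */
            printf("Match %d: -1 - -1\n", i);
            continue;
        }
        printf("Match %d: %2d - %2d\t%.*s\n", i,
               (int)m[i].rm_so, (int)m[i].rm_eo,
               (int)(m[i].rm_eo - m[i].rm_so), s + m[i].rm_so);
    }
}
```

Calling match_uri on "sip:012345678@11.111.11.111:5060;user=phone" from a small
main and then print_matches on the result should give output that can be
compared directly against the two listings quoted above.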

Rich


Thread overview: 3+ messages
2018-10-29 22:26 Robert Högberg
2018-10-29 22:59 ` Rich Felker [this message]
2018-10-30 11:05   ` Szabolcs Nagy
