mailing list of musl libc
* Unexpected regex behaviour
@ 2018-10-29 22:26 Robert Högberg
  2018-10-29 22:59 ` Rich Felker
From: Robert Högberg @ 2018-10-29 22:26 UTC (permalink / raw)
  To: musl



Hi,

I've noticed that the musl regex implementation behaves slightly
differently than the glibc implementation. I'm attaching a short program
showing the behaviour.

The difference makes yate (http://yate.null.ro) misbehave when running with
musl (reported here: https://github.com/openwrt/telephony/issues/378).

Yate uses a regexp like this:
"^\\([[:alpha:]][[:alnum:]]\\+:\\)\\?/\\?/\\?\\([^[:space:][:cntrl:]@]\\+@\\)\\?\\([[:alnum:]._+-]\\+\\|[[][[:xdigit:].:]\\+[]]\\)\\(:[0-9]\\+\\)\\?"

.. to parse strings like:
"sip:012345678@11.111.11.111:5060;user=phone"

.. and the matches produced by musl are:
Match 0:  0 - 32        sip:012345678@11.111.11.111:5060
Match 1: -1 - -1
Match 2:  0 - 14        sip:012345678@
Match 3: 14 - 27        11.111.11.111
Match 4: 27 - 32        :5060

.. while glibc produces:
Match 0:  0 - 32        sip:012345678@11.111.11.111:5060
Match 1:  0 -  4        sip:
Match 2:  4 - 14        012345678@
Match 3: 14 - 27        11.111.11.111
Match 4: 27 - 32        :5060

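So musl leaves group 1 unmatched and lets group 2 absorb "sip:", while
glibc assigns "sip:" to group 1. What do you think?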
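For anyone untangling the pattern, here is a sketch of the same expression split by capture group. The pattern text is unchanged and only spread over adjacent string literals; the names `YATE_URI_RE` and `match_uri` are made up for this illustration and do not come from yate.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical name; same BRE as in the report, split by literal concatenation. */
static const char YATE_URI_RE[] =
    "^\\([[:alpha:]][[:alnum:]]\\+:\\)\\?"   /* group 1: optional scheme, e.g. "sip:" */
    "/\\?/\\?"                               /* up to two optional slashes */
    "\\([^[:space:][:cntrl:]@]\\+@\\)\\?"    /* group 2: optional user part ending in '@' */
    "\\([[:alnum:]._+-]\\+\\|[[][[:xdigit:].:]\\+[]]\\)"
                                             /* group 3: host name or bracketed address */
    "\\(:[0-9]\\+\\)\\?";                    /* group 4: optional ":port" */

/* Hypothetical helper: fills rm[0..n-1]; returns 0 on a match, nonzero otherwise. */
int match_uri(const char *s, regmatch_t *rm, size_t n)
{
    regex_t re;
    int rc = regcomp(&re, YATE_URI_RE, 0);   /* cflags 0 = basic regex syntax */
    if (rc != 0)
        return rc;
    rc = regexec(&re, s, n, rm, 0);
    regfree(&re);
    return rc;
}
```

On the sample string, the two outputs above agree on groups 0, 3 and 4 and differ only in how groups 1 and 2 divide "sip:012345678@".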

I've only tested musl 1.1.19, so apologies if this no longer applies to
later releases. I skimmed the 1.1.20 release notes and didn't find anything
regex-related.

Regards
Robert


[-- Attachment #2: yate_regexp.c --]
[-- Type: text/x-csrc, Size: 1402 bytes --]

#include <regex.h>
#include <stdio.h>
#include <string.h>

#define MAX_MATCH 9

int main(void)
{
  const char* s = "sip:012345678@11.111.11.111:5060;user=phone";
  const char* re = "^\\([[:alpha:]][[:alnum:]]\\+:\\)\\?/\\?/\\?\\([^[:space:][:cntrl:]@]\\+@\\)\\?\\([[:alnum:]._+-]\\+\\|[[][[:xdigit:].:]\\+[]]\\)\\(:[0-9]\\+\\)\\?";

  regex_t data;
  regmatch_t rmatch[MAX_MATCH];

  /* cflags 0 selects basic regular expression (BRE) syntax. */
  if (regcomp(&data, re, 0) != 0) {
    fprintf(stderr, "regcomp failed\n");
    return 1;
  }

  if (regexec(&data, s, MAX_MATCH, rmatch, 0) != 0) {
    fprintf(stderr, "no match\n");
    regfree(&data);
    return 1;
  }

  for (int i = 0; i < MAX_MATCH; i++) {
    char substr[256] = "";
    /* Unmatched groups have rm_so == rm_eo == -1; do not copy from s - 1. */
    if (rmatch[i].rm_so >= 0) {
      size_t substr_len = (size_t)(rmatch[i].rm_eo - rmatch[i].rm_so);
      memcpy(substr, s + rmatch[i].rm_so, substr_len);
      substr[substr_len] = '\0';
    }
    /* regoff_t is not necessarily int, so cast before printing with %d. */
    printf("Match %d: %2d - %2d \t%s\n",
           i, (int)rmatch[i].rm_so, (int)rmatch[i].rm_eo, substr);
  }

  regfree(&data);
  return 0;
}


/*
glibc:

Match 0:  0 - 32        sip:012345678@11.111.11.111:5060
Match 1:  0 -  4        sip:
Match 2:  4 - 14        012345678@
Match 3: 14 - 27        11.111.11.111
Match 4: 27 - 32        :5060
Match 5: -1 - -1
Match 6: -1 - -1
Match 7: -1 - -1
Match 8: -1 - -1


musl 1.1.19:
Match 0:  0 - 32        sip:012345678@11.111.11.111:5060
Match 1: -1 - -1
Match 2:  0 - 14        sip:012345678@
Match 3: 14 - 27        11.111.11.111
Match 4: 27 - 32        :5060
Match 5: -1 - -1
Match 6: -1 - -1
Match 7: -1 - -1
Match 8: -1 - -1

*/
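A side note on the pattern itself: `\+`, `\?` and `\|` are not defined by POSIX for basic regular expressions, although both libcs evidently accept them here. A sketch of the same pattern in extended syntax (`REG_EXTENDED`), where `+`, `?` and `|` are standard operators, might look like this. The helper name `match_uri_ere` is hypothetical, and nothing in this report says whether the extended form changes how musl 1.1.19 assigns groups 1 and 2.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: ERE spelling of the pattern from the report.
 * Fills rm[0..n-1]; returns 0 on a match, nonzero otherwise. */
int match_uri_ere(const char *s, regmatch_t *rm, size_t n)
{
    static const char pattern[] =
        "^([[:alpha:]][[:alnum:]]+:)?"               /* group 1: optional scheme */
        "/?/?"                                       /* optional slashes */
        "([^[:space:][:cntrl:]@]+@)?"                /* group 2: optional user@ */
        "([[:alnum:]._+-]+|[[][[:xdigit:].:]+[]])"   /* group 3: host or [address] */
        "(:[0-9]+)?";                                /* group 4: optional :port */
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;
    rc = regexec(&re, s, n, rm, 0);
    regfree(&re);
    return rc;
}
```

Calling it with the sample string and an array of five `regmatch_t` gives the same kind of offsets as the program above, so the two spellings can be compared group by group on each libc.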

Thread overview: 3+ messages
2018-10-29 22:26 Unexpected regex behaviour Robert Högberg
2018-10-29 22:59 ` Rich Felker
2018-10-30 11:05   ` Szabolcs Nagy