From: Robert Högberg
Newsgroups: gmane.linux.lib.musl.general
Subject: Unexpected regex behaviour
Date: Mon, 29 Oct 2018 23:26:19 +0100
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

Hi,

I've noticed that the musl regex implementation behaves slightly differently than the glibc implementation. I'm attaching a short program showing the behaviour.

The difference makes yate (http://yate.null.ro) misbehave when running with musl (reported here: https://github.com/openwrt/telephony/issues/378).

Yate uses a regexp like this:
"^\\([[:alpha:]][[:alnum:]]\\+:\\)\\?/\\?/\\?\\([^[:space:][:cntrl:]@]\\+@\\)\\?\\([[:alnum:]._+-]\\+\\|[[][[:xdigit:].:]\\+[]]\\)\\(:[0-9]\\+\\)\\?"

.. to parse strings like:
"sip:012345678@11.111.11.111:5060;user=phone"

.. and the matches produced by musl are:
Match 0:  0 - 32        sip:012345678@11.111.11.111:5060
Match 1: -1 - -1
Match 2:  0 - 14        sip:012345678@
Match 3: 14 - 27        11.111.11.111
Match 4: 27 - 32        :5060

.. while glibc produces:
Match 0:  0 - 32        sip:012345678@11.111.11.111:5060
Match 1:  0 -  4        sip:
Match 2:  4 - 14        012345678@
Match 3: 14 - 27        11.111.11.111
Match 4: 27 - 32        :5060

What do you think?

I've only tested musl 1.1.19. Sorry if this is not valid for later releases. I skimmed the 1.1.20 release notes and didn't find anything regex related.

Regards
Robert
Attachment: yate_regexp.c (text/x-csrc)

#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
  const char* s = "sip:012345678@11.111.11.111:5060;user=phone";
  const char* re = "^\\([[:alpha:]][[:alnum:]]\\+:\\)\\?/\\?/\\?\\([^[:space:][:cntrl:]@]\\+@\\)\\?\\([[:alnum:]._+-]\\+\\|[[][[:xdigit:].:]\\+[]]\\)\\(:[0-9]\\+\\)\\?";

  regex_t* data = (regex_t*)malloc(sizeof(regex_t));
  regcomp(data, re, 0);

  const int MAX_MATCH = 9;
  regmatch_t rmatch[MAX_MATCH];
  regexec(data, s, MAX_MATCH, rmatch, 0);

  for (int i = 0; i < MAX_MATCH; i++) {
    char substr[256];
    unsigned substr_len = rmatch[i].rm_eo - rmatch[i].rm_so;
    memcpy(substr, s + rmatch[i].rm_so, substr_len);
    substr[substr_len] = '\0';
    printf("Match %u: %2d - %2d \t%s\n",
           i, rmatch[i].rm_so, rmatch[i].rm_eo,
           substr_len > 0? substr : "");
  }

  return 0;
}


/*
glibc:

Match 0:  0 - 32        sip:012345678@11.111.11.111:5060
Match 1:  0 -  4        sip:
Match 2:  4 - 14        012345678@
Match 3: 14 - 27        11.111.11.111
Match 4: 27 - 32        :5060
Match 5: -1 - -1
Match 6: -1 - -1
Match 7: -1 - -1
Match 8: -1 - -1


musl 1.1.19:
Match 0:  0 - 32        sip:012345678@11.111.11.111:5060
Match 1: -1 - -1
Match 2:  0 - 14        sip:012345678@
Match 3: 14 - 27        11.111.11.111
Match 4: 27 - 32        :5060
Match 5: -1 - -1
Match 6: -1 - -1
Match 7: -1 - -1
Match 8: -1 - -1

*/