From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/13212
Path: news.gmane.org!.POSTED!not-for-mail
From: Steffen Nurpmeso
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Regex: behaviour of ? after () atom
Date: Fri, 07 Sep 2018 18:00:46 +0200
Message-ID: <20180907160046.zZvDF%steffen@sdaoden.eu>
References: <20180907133805.FZif_%steffen@sdaoden.eu> <20180907151821.GL1878@brightrain.aerifal.cx> <20180907152517.QGi3S%steffen@sdaoden.eu> <20180907153302.GM1878@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: blaine.gmane.org
X-Trace: blaine.gmane.org 1536335861 533 195.159.176.226 (7 Sep 2018 15:57:41 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Fri, 7 Sep 2018 15:57:41 +0000 (UTC)
User-Agent: s-nail v14.9.11-35-ge359e701
Cc: musl@lists.openwall.com
To: Rich Felker
Original-X-From: musl-return-13228-gllmg-musl=m.gmane.org@lists.openwall.com Fri Sep 07 17:57:37 2018
Return-path:
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200]) by blaine.gmane.org with smtp (Exim 4.84_2) (envelope-from ) id 1fyJ8a-0008Th-No for gllmg-musl@m.gmane.org; Fri, 07 Sep 2018 17:57:36 +0200
Original-Received: (qmail 9606 invoked by uid 550); 7 Sep 2018 15:59:43 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
List-ID:
Original-Received: (qmail 9586 invoked from network); 7 Sep 2018 15:59:42 -0000
In-Reply-To: <20180907153302.GM1878@brightrain.aerifal.cx>
Mail-Followup-To: Rich Felker, musl@lists.openwall.com, Steffen Nurpmeso
OpenPGP: id=EE19E1C1F2F7054F8D3954D8308964B51883A0DD; url=https://ftp.sdaoden.eu/steffen.asc; preference=signencrypt
BlahBlahBlah: Any stupid boy can crush a beetle. But all the professors in the world can make no bugs.
Xref: news.gmane.org gmane.linux.lib.musl.general:13212
Archived-At:

Rich Felker wrote in <20180907153302.GM1878@brightrain.aerifal.cx>:
|On Fri, Sep 07, 2018 at 05:25:17PM +0200, Steffen Nurpmeso wrote:
|> Rich Felker wrote in <20180907151821.GL1878@brightrain.aerifal.cx>:
|>|On Fri, Sep 07, 2018 at 03:38:05PM +0200, Steffen Nurpmeso wrote:
|>|> Hello.
|>|>
|>|> In perl this is
|>|>
|>|>   $x="print 1 2";
|>|>   if($x =~ /^(:[[:space:]]+)?([^[:space:]]+)(.*)$/){
|>|>     print "<$0> -> <$1> <$2> <$3>\n"
|>|>   }
|>|>
|>|> and the result is
|>|>
|>|>   -> <> < 1 2>
|>|>
|>|> Now the same on AlpineLinux edge and musl-1.1.19-r10 with the MUA
|>|> i maintain, which uses the normal regex stuff and calls it via
|>|>
|>|>   echo eins=$3
|>|>   vput vexpr i regex "${3}" \
|>|>     '^(:[[:space:]]+)?([^[:space:]]+)(.*)$' \
|>|>     '<\$0> -> <\$1> <\$2> <\$3>'
|>|>   echo i=$i
|>|>
|>|> which in C code does
|>|>
|>|>   if((reflrv = regcomp(&re, argv[2], reflrv))){
|>|>     ...
|>|>     goto jestr;
|>|>   }
|>|>   fprintf(stderr, "GOING for <%s> -> <%s> %u\n",
|>|>     argv[1],argv[2],n_NELEM(rema));
|>|>   reflrv = regexec(&re, argv[1], n_NELEM(rema), rema, 0);
|>|>
|>|> and overall prints
|>|>
|>|>   eins=print 1 2
|>|>   GOING for -> <^(:[[:space:]]+)?([^[:space:]]+)(.*)$> 17
|>|>   i= -> <> <> <>
|>|>
|>|> It works correctly if i remove the ()? atom, so i thought i should
|>|> report that.
|>|
|>|What is the value of the flags argument you passed to regcomp?
|>|
|>
|> REG_EXTENDED, optional REG_ICASE:
|>
|>   reflrv = REG_EXTENDED;
|>   if(f & a_ICASE)
|>     reflrv |= REG_ICASE;
|>   if((reflrv = regcomp(&re, argv[2], reflrv))){
|
|OK, it looks like that should work, and seemed to work here when I
|passed the regex to grep -E linked with musl's regex. Can you provide
|a minimal self-contained C program to demonstrate the issue you're
|having?

Happy user that i am, here is something for tests/:

  #include <regex.h>

  int main(void){
     regmatch_t rema[1 + 21];
     regex_t re;
     int i;

     i = REG_EXTENDED;
     if((i = regcomp(&re, "^(:[[:space:]]+)?([^[:space:]]+)(.*)$", i)))
        return 2;

     i = regexec(&re, "print 1 2", 21, rema, 0);
     regfree(&re);
     if(i == REG_NOMATCH)
        return 3;

     /* Walk the groups until the first one with rm_so == -1 */
     for(i = 1; i < 21 && rema[i].rm_so != -1; ++i)
        ;
     return (i == 3) ? 0 : 4;
  }

i is 1 here.

|BTW which "()?" are you talking about? The whole first parenthesized
|subexpression and the ? after it? I wouldn't call that an atom, but
|nothing seems wrong with it.

I have read regex(7) first just in case something intellectual had
to be said. Otherwise i am all for Finnish tango.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)