From: Rich Felker
To: musl@lists.openwall.com
Cc: Steffen Nurpmeso
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Regex: behaviour of ? after () atom
Date: Fri, 7 Sep 2018 12:08:47 -0400
Message-ID: <20180907160847.GN1878@brightrain.aerifal.cx>
In-Reply-To: <20180907160046.zZvDF%steffen@sdaoden.eu>
References: <20180907133805.FZif_%steffen@sdaoden.eu> <20180907151821.GL1878@brightrain.aerifal.cx> <20180907152517.QGi3S%steffen@sdaoden.eu> <20180907153302.GM1878@brightrain.aerifal.cx> <20180907160046.zZvDF%steffen@sdaoden.eu>

On Fri, Sep 07, 2018 at 06:00:46PM +0200, Steffen Nurpmeso wrote:
> Rich Felker wrote in <20180907153302.GM1878@brightrain.aerifal.cx>:
> |On Fri, Sep 07, 2018 at 05:25:17PM +0200, Steffen Nurpmeso wrote:
> |> Rich Felker wrote in <20180907151821.GL1878@brightrain.aerifal.cx>:
> |>|On Fri, Sep 07, 2018 at 03:38:05PM +0200, Steffen Nurpmeso wrote:
> |>|> Hello.
> |>|>
> |>|> In perl this is
> |>|>
> |>|>   $x="print 1 2";
> |>|>   if($x =~ /^(:[[:space:]]+)?([^[:space:]]+)(.*)$/){
> |>|>     print "<$0> -> <$1> <$2> <$3>\n"
> |>|>   }
> |>|>
> |>|> and the result is
> |>|>
> |>|>   -> <> <print> < 1 2>
> |>|>
> |>|> Now the same on AlpineLinux edge and musl-1.1.19-r10 with the MUA
> |>|> i maintain, which uses the normal regex stuff and calls it via
> |>|>
> |>|>   echo eins=$3
> |>|>   vput vexpr i regex "${3}" \
> |>|>     '^(:[[:space:]]+)?([^[:space:]]+)(.*)$' \
> |>|>     '<\$0> -> <\$1> <\$2> <\$3>'
> |>|>   echo i=$i
> |>|>
> |>|> which in C code does
> |>|>
> |>|>   if((reflrv = regcomp(&re, argv[2], reflrv))){
> |>|>     ...
> |>|>     goto jestr;
> |>|>   }
> |>|>   fprintf(stderr, "GOING for <%s> -> <%s> %u\n",
> |>|>     argv[1],argv[2],n_NELEM(rema));
> |>|>   reflrv = regexec(&re, argv[1], n_NELEM(rema), rema, 0);
> |>|>
> |>|> and overall prints
> |>|>
> |>|>   eins=print 1 2
> |>|>   GOING for <print 1 2> -> <^(:[[:space:]]+)?([^[:space:]]+)(.*)$> 17
> |>|>   i=<print 1 2> -> <> <> <>
> |>|>
> |>|> It works correctly if i remove the ()? atom, so i thought i should
> |>|> report that.
> |>|
> |>|What is the value of the flags argument you passed to regcomp?
> |>|
> |>
> |> REG_EXTENDED, optional REG_ICASE:
> |>
> |>   reflrv = REG_EXTENDED;
> |>   if(f & a_ICASE)
> |>     reflrv |= REG_ICASE;
> |>   if((reflrv = regcomp(&re, argv[2], reflrv))){
> |
> |OK, it looks like that should work, and seemed to work here when I
> |passed the regex to grep -E linked with musl's regex. Can you provide
> |a minimal self-contained C program to demonstrate the issue you're
> |having?
>
> Happy user that i am, here something for tests/:
>
>   #include <sys/types.h>
>   #include <regex.h>
>   int main(void){
>     regmatch_t rema[1 + 21];
>     regex_t re;
>     int i;
>
>     i = REG_EXTENDED;
>     if((i = regcomp(&re, "^(:[[:space:]]+)?([^[:space:]]+)(.*)$", i)))
>       return 2;
>     i = regexec(&re, "print 1 2", 21, rema, 0);
>     regfree(&re);
>     if(i == REG_NOMATCH)
>       return 3;
>     for(i = 1; i < 21 && rema[i].rm_so != -1; ++i)
>       ;
>     return (i == 3) ? 0 : 4;
>   }
>
> i is 1 here.
>
> |BTW which "()?" are you talking about? The whole first parenthesized
> |subexpression and the ? after it? I wouldn't call that an atom, but
> |nothing seems wrong with it.
>
> I have read regex(7) first just in case something intellectual had
> to be said. Otherwise i am all for Finnish tango.

Your stopping condition is just wrong -- you're stopping after seeing
that the first subexpression does not match anything, and failing to
inspect the others. If you get rid of that stopping condition and add
code to print the rest, you'll see (each line is i, rm_so, rm_eo):

1 -1 -1
2 0 5
3 5 9
4 -1 -1
5 -1 -1
6 -1 -1
...

Also, for what it's worth, there's no reason to store expressions
temporarily in variables like this:

>     i = REG_EXTENDED;
>     if((i = regcomp(&re, "^(:[[:space:]]+)?([^[:space:]]+)(.*)$", i)))

Just do:

	if((i = regcomp(&re, "^(:[[:space:]]+)?([^[:space:]]+)(.*)$", REG_EXTENDED)))

etc.

Rich