mailing list of musl libc
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Cc: Steffen Nurpmeso <steffen@sdaoden.eu>
Subject: Re: Regex: behaviour of ? after () atom
Date: Fri, 7 Sep 2018 12:08:47 -0400
Message-ID: <20180907160847.GN1878@brightrain.aerifal.cx>
In-Reply-To: <20180907160046.zZvDF%steffen@sdaoden.eu>

On Fri, Sep 07, 2018 at 06:00:46PM +0200, Steffen Nurpmeso wrote:
> Rich Felker wrote in <20180907153302.GM1878@brightrain.aerifal.cx>:
>  |On Fri, Sep 07, 2018 at 05:25:17PM +0200, Steffen Nurpmeso wrote:
>  |> Rich Felker wrote in <20180907151821.GL1878@brightrain.aerifal.cx>:
>  |>|On Fri, Sep 07, 2018 at 03:38:05PM +0200, Steffen Nurpmeso wrote:
>  |>|> Hello.
>  |>|> 
>  |>|> In perl this is
>  |>|> 
>  |>|>   $x="print 1 2";
>  |>|>   if($x =~ /^(:[[:space:]]+)?([^[:space:]]+)(.*)$/){ 
>  |>|>     print "<$0> -> <$1> <$2> <$3>\n"
>  |>|>   }
>  |>|> 
>  |>|> and the result is
>  |>|> 
>  |>|>   </tmp/t.pl> -> <> <print> < 1 2>
>  |>|> 
>  |>|> Now the same on AlpineLinux edge and musl-1.1.19-r10 with the MUA
>  |>|> i maintain, which uses the normal regex stuff and calls it via
>  |>|> 
>  |>|>   echo eins=$3
>  |>|>          vput vexpr i regex "${3}" \
>  |>|>             '^(:[[:space:]]+)?([^[:space:]]+)(.*)$'  \
>  |>|>             '<\$0> -> <\$1> <\$2> <\$3>'
>  |>|>   echo i=$i
>  |>|> 
>  |>|> which in C code does 
>  |>|> 
>  |>|>       if((reflrv = regcomp(&re, argv[2], reflrv))){
>  |>|>           ...
>  |>|>          goto jestr;
>  |>|>       }
>  |>|>   fprintf(stderr, "GOING for <%s> -> <%s> %u\n",
>  |>|>   argv[1],argv[2],n_NELEM(rema));
>  |>|>       reflrv = regexec(&re, argv[1], n_NELEM(rema), rema, 0);
>  |>|> 
>  |>|> and overall prints
>  |>|> 
>  |>|>   eins=print 1 2
>  |>|>   GOING for <print 1 2> -> <^(:[[:space:]]+)?([^[:space:]]+)(.*)$> 17
>  |>|>   i=<print 1 2> -> <> <> <>
>  |>|> 
>  |>|> It works correctly if i remove the ()? atom, so i thought i should
>  |>|> report that.
>  |>|
>  |>|What is the value of the flags argument you passed to regcomp?
>  |>|
>  |> 
>  |> REG_EXTENDED, optional REG_ICASE:
>  |> 
>  |>       reflrv = REG_EXTENDED;
>  |>       if(f & a_ICASE)
>  |>          reflrv |= REG_ICASE;
>  |>       if((reflrv = regcomp(&re, argv[2], reflrv))){
>  |
>  |OK, it looks like that should work, and seemed to work here when I
>  |passed the regex to grep -E linked with musl's regex. Can you provide
>  |a minimal self-contained C program to demonstrate the issue you're
>  |having?
> 
> Happy user that i am, here something for tests/:
> 
>   #include <stdio.h>
>   #include <regex.h>
>   int main(void){
>           regmatch_t rema[1 + 21];
>           regex_t re;
>           int i;
>           
>           i = REG_EXTENDED;
>           if((i = regcomp(&re, "^(:[[:space:]]+)?([^[:space:]]+)(.*)$", i)))
>                   return 2;
>           i = regexec(&re, "print 1 2", 21, rema, 0);
>           regfree(&re);
>           if(i == REG_NOMATCH)
>                   return 3;
>           for(i = 1; i < 21 && rema[i].rm_so != -1; ++i)
>                   ;
>           return (i == 3) ? 0 : 4;
>   }       
> 
> i is 1 here.
> 
>  |BTW which "()?" are you talking about? The whole first parenthesized
>  |subexpression and the ? after it? I wouldn't call that an atom, but
>  |nothing seems wrong with it.
> 
> I have read regex(7) first just in case something intellectual had
> to be said.  Otherwise i am all for Finnish tango.

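Your stopping condition is just wrong -- rm_so == -1 only means that a
particular subexpression did not participate in the match, not that
there are no further entries to look at. You're stopping after seeing
that the first subexpression does not match anything, and failing to
inspect the others. If you get rid of that stopping condition and
add code to print the rest, you'll see (each line is i, rm_so, rm_eo):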

1 -1 -1
2 0 5
3 5 9
4 -1 -1
5 -1 -1
6 -1 -1
...

Also, for what it's worth, there's no reason to store expressions
temporarily in variables like this:

>           i = REG_EXTENDED;
>           if((i = regcomp(&re, "^(:[[:space:]]+)?([^[:space:]]+)(.*)$", i)))

Just do:

          if((i = regcomp(&re, "^(:[[:space:]]+)?([^[:space:]]+)(.*)$", REG_EXTENDED)))

etc.

Rich


Thread overview: 7+ messages
2018-09-07 13:38 Steffen Nurpmeso
2018-09-07 15:18 ` Rich Felker
2018-09-07 15:25   ` Steffen Nurpmeso
2018-09-07 15:33     ` Rich Felker
2018-09-07 16:00       ` Steffen Nurpmeso
2018-09-07 16:08         ` Rich Felker [this message]
2018-09-07 16:22           ` Steffen Nurpmeso
