mailing list of musl libc
* Regex: behaviour of ? after () atom
From: Steffen Nurpmeso @ 2018-09-07 13:38 UTC (permalink / raw)
  To: musl

Hello.

In perl this is

  $x="print 1 2";
  if($x =~ /^(:[[:space:]]+)?([^[:space:]]+)(.*)$/){ 
    print "<$0> -> <$1> <$2> <$3>\n"
  }

and the result is

  </tmp/t.pl> -> <> <print> < 1 2>

Now the same on AlpineLinux edge and musl-1.1.19-r10 with the MUA
i maintain, which uses the normal regex stuff and calls it via

  echo eins=$3
         vput vexpr i regex "${3}" \
            '^(:[[:space:]]+)?([^[:space:]]+)(.*)$'  \
            '<\$0> -> <\$1> <\$2> <\$3>'
  echo i=$i

which in C code does 

      if((reflrv = regcomp(&re, argv[2], reflrv))){
          ...
         goto jestr;
      }
  fprintf(stderr, "GOING for <%s> -> <%s> %u\n",
  argv[1],argv[2],n_NELEM(rema));
      reflrv = regexec(&re, argv[1], n_NELEM(rema), rema, 0);

and overall prints

  eins=print 1 2
  GOING for <print 1 2> -> <^(:[[:space:]]+)?([^[:space:]]+)(.*)$> 17
  i=<print 1 2> -> <> <> <>

It works correctly if i remove the ()? atom, so i thought i should
report that.
Ciao,

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)



* Re: Regex: behaviour of ? after () atom
From: Rich Felker @ 2018-09-07 15:18 UTC (permalink / raw)
  To: musl; +Cc: Steffen Nurpmeso

On Fri, Sep 07, 2018 at 03:38:05PM +0200, Steffen Nurpmeso wrote:
> Hello.
> 
> In perl this is
> 
>   $x="print 1 2";
>   if($x =~ /^(:[[:space:]]+)?([^[:space:]]+)(.*)$/){ 
>     print "<$0> -> <$1> <$2> <$3>\n"
>   }
> 
> and the result is
> 
>   </tmp/t.pl> -> <> <print> < 1 2>
> 
> Now the same on AlpineLinux edge and musl-1.1.19-r10 with the MUA
> i maintain, which uses the normal regex stuff and calls it via
> 
>   echo eins=$3
>          vput vexpr i regex "${3}" \
>             '^(:[[:space:]]+)?([^[:space:]]+)(.*)$'  \
>             '<\$0> -> <\$1> <\$2> <\$3>'
>   echo i=$i
> 
> which in C code does 
> 
>       if((reflrv = regcomp(&re, argv[2], reflrv))){
>           ...
>          goto jestr;
>       }
>   fprintf(stderr, "GOING for <%s> -> <%s> %u\n",
>   argv[1],argv[2],n_NELEM(rema));
>       reflrv = regexec(&re, argv[1], n_NELEM(rema), rema, 0);
> 
> and overall prints
> 
>   eins=print 1 2
>   GOING for <print 1 2> -> <^(:[[:space:]]+)?([^[:space:]]+)(.*)$> 17
>   i=<print 1 2> -> <> <> <>
> 
> It works correctly if i remove the ()? atom, so i thought i should
> report that.

What is the value of the flags argument you passed to regcomp?

Rich



* Re: Regex: behaviour of ? after () atom
From: Steffen Nurpmeso @ 2018-09-07 15:25 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

Rich Felker wrote in <20180907151821.GL1878@brightrain.aerifal.cx>:
 |On Fri, Sep 07, 2018 at 03:38:05PM +0200, Steffen Nurpmeso wrote:
 ...
 |
 |What is the value of the flags argument you passed to regcomp?
 |

REG_EXTENDED, optional REG_ICASE:

      reflrv = REG_EXTENDED;
      if(f & a_ICASE)
         reflrv |= REG_ICASE;
      if((reflrv = regcomp(&re, argv[2], reflrv))){


--steffen



* Re: Regex: behaviour of ? after () atom
From: Rich Felker @ 2018-09-07 15:33 UTC (permalink / raw)
  To: musl; +Cc: Steffen Nurpmeso

On Fri, Sep 07, 2018 at 05:25:17PM +0200, Steffen Nurpmeso wrote:
> Rich Felker wrote in <20180907151821.GL1878@brightrain.aerifal.cx>:
>  ...
>  |What is the value of the flags argument you passed to regcomp?
>  |
> 
> REG_EXTENDED, optional REG_ICASE:
> 
>       reflrv = REG_EXTENDED;
>       if(f & a_ICASE)
>          reflrv |= REG_ICASE;
>       if((reflrv = regcomp(&re, argv[2], reflrv))){

OK, it looks like that should work, and seemed to work here when I
passed the regex to grep -E linked with musl's regex. Can you provide
a minimal self-contained C program to demonstrate the issue you're
having?

BTW which "()?" are you talking about? The whole first parenthesized
subexpression and the ? after it? I wouldn't call that an atom, but
nothing seems wrong with it.

Rich



* Re: Regex: behaviour of ? after () atom
From: Steffen Nurpmeso @ 2018-09-07 16:00 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

Rich Felker wrote in <20180907153302.GM1878@brightrain.aerifal.cx>:
 |On Fri, Sep 07, 2018 at 05:25:17PM +0200, Steffen Nurpmeso wrote:
 ...
 |
 |OK, it looks like that should work, and seemed to work here when I
 |passed the regex to grep -E linked with musl's regex. Can you provide
 |a minimal self-contained C program to demonstrate the issue you're
 |having?

Happy user that i am, here is something for tests/:

  #include <stdio.h>
  #include <regex.h>
  int main(void){
          regmatch_t rema[1 + 21];
          regex_t re;
          int i;
          
          i = REG_EXTENDED;
          if((i = regcomp(&re, "^(:[[:space:]]+)?([^[:space:]]+)(.*)$", i)))
                  return 2;
          i = regexec(&re, "print 1 2", 21, rema, 0);
          regfree(&re);
          if(i == REG_NOMATCH)
                  return 3;
          for(i = 1; i < 21 && rema[i].rm_so != -1; ++i)
                  ;
          return (i == 3) ? 0 : 4;
  }       

i is 1 here.

 |BTW which "()?" are you talking about? The whole first parenthesized
 |subexpression and the ? after it? I wouldn't call that an atom, but
 |nothing seems wrong with it.

I have read regex(7) first just in case something intellectual had
to be said.  Otherwise i am all for Finnish tango.

--steffen



* Re: Regex: behaviour of ? after () atom
From: Rich Felker @ 2018-09-07 16:08 UTC (permalink / raw)
  To: musl; +Cc: Steffen Nurpmeso

On Fri, Sep 07, 2018 at 06:00:46PM +0200, Steffen Nurpmeso wrote:
> Rich Felker wrote in <20180907153302.GM1878@brightrain.aerifal.cx>:
>  ...
>  |
>  |OK, it looks like that should work, and seemed to work here when I
>  |passed the regex to grep -E linked with musl's regex. Can you provide
>  |a minimal self-contained C program to demonstrate the issue you're
>  |having?
> 
> Happy user that i am, here is something for tests/:
> 
>   #include <stdio.h>
>   #include <regex.h>
>   int main(void){
>           regmatch_t rema[1 + 21];
>           regex_t re;
>           int i;
>           
>           i = REG_EXTENDED;
>           if((i = regcomp(&re, "^(:[[:space:]]+)?([^[:space:]]+)(.*)$", i)))
>                   return 2;
>           i = regexec(&re, "print 1 2", 21, rema, 0);
>           regfree(&re);
>           if(i == REG_NOMATCH)
>                   return 3;
>           for(i = 1; i < 21 && rema[i].rm_so != -1; ++i)
>                   ;
>           return (i == 3) ? 0 : 4;
>   }       
> 
> i is 1 here.
> 
>  |BTW which "()?" are you talking about? The whole first parenthesized
>  |subexpression and the ? after it? I wouldn't call that an atom, but
>  |nothing seems wrong with it.
> 
> I have read regex(7) first just in case something intellectual had
> to be said.  Otherwise i am all for Finnish tango.

Your stopping condition is just wrong -- you're stopping after seeing
that the first subexpression does not match anything, and failing to
inspect the others. If you get rid of that stopping condition and
add code to print the rest, you'll see (each line is i, rm_so, rm_eo):

1 -1 -1
2 0 5
3 5 9
4 -1 -1
5 -1 -1
6 -1 -1
...
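
A minimal sketch of a loop without that stopping condition; this is
not code from the thread, and the helper name count_matched_groups()
and the NMATCH constant are made up for illustration. It visits every
subexpression slot up to re_nsub and skips, rather than stops at, any
slot whose rm_so is -1:

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper (not from the thread): match `subject` against the
 * ERE `pattern`, print "i rm_so rm_eo" for every parenthesized
 * subexpression, and return how many of them took part in the match.
 * Returns -1 if the pattern does not compile or the subject does not match. */
int count_matched_groups(const char *pattern, const char *subject)
{
	enum { NMATCH = 22 };           /* slot 0 holds the whole match */
	regmatch_t rema[NMATCH];
	regex_t re;
	size_t i, last;
	int n = 0;

	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;
	if (regexec(&re, subject, NMATCH, rema, 0) != 0) {
		regfree(&re);
		return -1;
	}
	/* re_nsub is the number of parenthesized subexpressions in the
	 * pattern; never look past the end of the array. */
	last = re.re_nsub < (size_t)(NMATCH - 1) ? re.re_nsub : (size_t)(NMATCH - 1);
	for (i = 1; i <= last; ++i) {
		printf("%zu %ld %ld\n", i, (long)rema[i].rm_so, (long)rema[i].rm_eo);
		if (rema[i].rm_so != -1)    /* unmatched group: skip it, keep going */
			++n;
	}
	regfree(&re);
	return n;
}
```

Called with the pattern and the subject "print 1 2" from the test
program, this should print the lines for i = 1..3 listed above and
return 2, since only the optional first group did not participate.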

Also, for what it's worth, there's no reason to store expressions
temporarily in variables like this:

>           i = REG_EXTENDED;
>           if((i = regcomp(&re, "^(:[[:space:]]+)?([^[:space:]]+)(.*)$", i)))

Just do:

          if((i = regcomp(&re, "^(:[[:space:]]+)?([^[:space:]]+)(.*)$", REG_EXTENDED)))

etc.
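
The same works when REG_ICASE is optional, as in the MUA code quoted
earlier: the flags expression can be written in the call itself. A
sketch under that assumption; compile_ere() and its icase parameter
are hypothetical names, not part of musl or of the MUA:

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: compile `pattern` as an ERE, case-insensitively
 * if `icase` is nonzero, passing the flags directly to regcomp().
 * Returns 0 on success; otherwise reports the error via regerror()
 * on stderr and returns the regcomp() error code. */
int compile_ere(regex_t *re, const char *pattern, int icase)
{
	char msg[128];
	int rv = regcomp(re, pattern, REG_EXTENDED | (icase ? REG_ICASE : 0));

	if (rv != 0) {
		regerror(rv, re, msg, sizeof msg);
		fprintf(stderr, "regcomp: %s\n", msg);
	}
	return rv;
}
```

On success the caller owns the compiled expression and should release
it with regfree() once matching is done.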

Rich



* Re: Regex: behaviour of ? after () atom
From: Steffen Nurpmeso @ 2018-09-07 16:22 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

Rich Felker wrote in <20180907160847.GN1878@brightrain.aerifal.cx>:
 |On Fri, Sep 07, 2018 at 06:00:46PM +0200, Steffen Nurpmeso wrote:
 |> Rich Felker wrote in <20180907153302.GM1878@brightrain.aerifal.cx>:
 |>|On Fri, Sep 07, 2018 at 05:25:17PM +0200, Steffen Nurpmeso wrote:
 |>|> Rich Felker wrote in <20180907151821.GL1878@brightrain.aerifal.cx>:
 |>|>|On Fri, Sep 07, 2018 at 03:38:05PM +0200, Steffen Nurpmeso wrote:
 ...
 |Your stopping condition is just wrong -- you're stopping after seeing
 |that the first subexpression does not match anything, and failing to
 |inspect the others. If you get rid of that stopping condition and
 |add code to print the rest, you'll see (each line is i, rm_so, rm_eo):
 |
 |1 -1 -1
 |2 0 5
 |3 5 9
 |4 -1 -1
 |5 -1 -1
 |6 -1 -1
 |...

I see.  Indeed.  And no other bug to report somewhere else just as
last time, sorry for the noise.

--steffen



Thread overview: 7+ messages
2018-09-07 13:38 Regex: behaviour of ? after () atom Steffen Nurpmeso
2018-09-07 15:18 ` Rich Felker
2018-09-07 15:25   ` Steffen Nurpmeso
2018-09-07 15:33     ` Rich Felker
2018-09-07 16:00       ` Steffen Nurpmeso
2018-09-07 16:08         ` Rich Felker
2018-09-07 16:22           ` Steffen Nurpmeso

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/
