* Regex: behaviour of ? after () atom
From: Steffen Nurpmeso @ 2018-09-07 13:38 UTC
To: musl
Hello.

In Perl this is

  $x="print 1 2";
  if($x =~ /^(:[[:space:]]+)?([^[:space:]]+)(.*)$/){
    print "<$0> -> <$1> <$2> <$3>\n"
  }

and the result is

  </tmp/t.pl> -> <> <print> < 1 2>

Now the same on Alpine Linux edge with musl-1.1.19-r10, using the MUA
I maintain, which uses the standard POSIX regex interface and calls it via

  echo eins=$3
  vput vexpr i regex "${3}" \
    '^(:[[:space:]]+)?([^[:space:]]+)(.*)$' \
    '<\$0> -> <\$1> <\$2> <\$3>'
  echo i=$i

which in C code does

  if((reflrv = regcomp(&re, argv[2], reflrv))){
    ...
    goto jestr;
  }
  fprintf(stderr, "GOING for <%s> -> <%s> %u\n",
    argv[1],argv[2],n_NELEM(rema));
  reflrv = regexec(&re, argv[1], n_NELEM(rema), rema, 0);

and overall prints

  eins=print 1 2
  GOING for <print 1 2> -> <^(:[[:space:]]+)?([^[:space:]]+)(.*)$> 17
  i=<print 1 2> -> <> <> <>

It works correctly if I remove the ()? atom, so I thought I should
report it.
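The thread never shows how the MUA turns the regmatch_t array back into the `<\$1>`-style text. For illustration, a minimal sketch of such a substitution, assuming plain single-digit `$N` references and a hypothetical helper named expand_template() (this is not the MUA's actual code). The key point is that a slot whose rm_so is -1 belongs to a group that did not participate in the match: it expands to an empty string, and the remaining groups are still substituted.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the MUA's code): copy tmpl into out, replacing
 * each "$N" (N = 0..9) with the text of subexpression N of subject.
 * A slot whose rm_so is -1 did not take part in the match, so it expands
 * to nothing, and the remaining references are still processed.
 * Returns 0 on success, -1 if out (outsz bytes) is too small. */
int expand_template(const char *tmpl, const char *subject,
                    const regmatch_t *m, size_t nmatch,
                    char *out, size_t outsz)
{
    size_t o = 0;

    if (outsz == 0)
        return -1;
    for (const char *p = tmpl; *p != '\0'; ++p) {
        if (p[0] == '$' && p[1] >= '0' && p[1] <= '9') {
            size_t g = (size_t)(p[1] - '0');

            ++p; /* step onto the digit; the loop's ++p moves past it */
            if (g < nmatch && m[g].rm_so != -1) {
                size_t len = (size_t)(m[g].rm_eo - m[g].rm_so);

                if (o + len >= outsz)
                    return -1;
                memcpy(out + o, subject + m[g].rm_so, len);
                o += len;
            }
            continue;
        }
        if (o + 1 >= outsz)
            return -1;
        out[o++] = *p;
    }
    out[o] = '\0';
    return 0;
}
```

With the pattern above and the subject "print 1 2", expanding "<$0> -> <$1> <$2> <$3>" this way should yield "<print 1 2> -> <> <print> < 1 2>", i.e. the Perl result, except that here $0 is the whole match rather than Perl's script name.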
Ciao,
--steffen
|
|Der Kragenbaer, The moon bear,
|der holt sich munter he cheerfully and one by one
|einen nach dem anderen runter wa.ks himself off
|(By Robert Gernhardt)
* Re: Regex: behaviour of ? after () atom
From: Rich Felker @ 2018-09-07 15:18 UTC
To: musl; +Cc: Steffen Nurpmeso
On Fri, Sep 07, 2018 at 03:38:05PM +0200, Steffen Nurpmeso wrote:
> It works correctly if I remove the ()? atom, so I thought I should
> report it.
What is the value of the flags argument you passed to regcomp?
Rich
* Re: Regex: behaviour of ? after () atom
From: Steffen Nurpmeso @ 2018-09-07 15:25 UTC
To: Rich Felker; +Cc: musl
Rich Felker wrote in <20180907151821.GL1878@brightrain.aerifal.cx>:
|What is the value of the flags argument you passed to regcomp?
|
REG_EXTENDED, optional REG_ICASE:

  reflrv = REG_EXTENDED;
  if(f & a_ICASE)
    reflrv |= REG_ICASE;
  if((reflrv = regcomp(&re, argv[2], reflrv))){

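The flag handling above can be sketched as a small self-contained helper. The name compile_pattern() and its icase/err parameters are hypothetical; regcomp() and regerror() are the standard POSIX calls, and regerror() produces the kind of message an error path like the elided one could report.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper mirroring the flag handling quoted above:
 * always REG_EXTENDED, plus REG_ICASE when icase is nonzero.
 * On failure, regerror() writes a human-readable message into err.
 * Returns regcomp()'s result: 0 on success, in which case the caller
 * must later release re with regfree(). */
int compile_pattern(regex_t *re, const char *pattern, int icase,
                    char *err, size_t errsz)
{
    int flags = REG_EXTENDED;
    int rc;

    if (icase)
        flags |= REG_ICASE;

    rc = regcomp(re, pattern, flags);
    if (rc != 0 && err != NULL && errsz > 0)
        regerror(rc, re, err, errsz);
    return rc;
}
```

Passing the flags straight into regcomp() this way also avoids reusing one variable for both the flags and the return code.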
--steffen
* Re: Regex: behaviour of ? after () atom
From: Rich Felker @ 2018-09-07 15:33 UTC
To: musl; +Cc: Steffen Nurpmeso
On Fri, Sep 07, 2018 at 05:25:17PM +0200, Steffen Nurpmeso wrote:
> Rich Felker wrote in <20180907151821.GL1878@brightrain.aerifal.cx>:
> |What is the value of the flags argument you passed to regcomp?
> |
>
> REG_EXTENDED, optional REG_ICASE:
>
> reflrv = REG_EXTENDED;
> if(f & a_ICASE)
> reflrv |= REG_ICASE;
> if((reflrv = regcomp(&re, argv[2], reflrv))){
OK, it looks like that should work, and seemed to work here when I
passed the regex to grep -E linked with musl's regex. Can you provide
a minimal self-contained C program to demonstrate the issue you're
having?
BTW, which "()?" are you talking about? The whole first parenthesized
subexpression and the ? after it? I wouldn't call that an atom, but
nothing seems wrong with it.
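That a parenthesized subexpression followed by ? is unremarkable can be checked directly. When the optional group is skipped, regexec() reports its slot with rm_so and rm_eo both -1; when the subject does start with a colon and whitespace, the group matches as usual. A sketch, with group_span() as a hypothetical helper written for this illustration:

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: match subject against the ERE pattern and store
 * the offsets of subexpression `group` (0..9 in this sketch) in *so and
 * *eo. Both come back as -1 when the group took no part in the match,
 * e.g. an optional "( ... )?" that was skipped. Returns 0 on a match,
 * -1 for an out-of-range group, otherwise the regcomp()/regexec() code
 * (REG_NOMATCH if nothing matched). */
int group_span(const char *pattern, const char *subject, size_t group,
               regoff_t *so, regoff_t *eo)
{
    regex_t re;
    regmatch_t m[10];
    int rc;

    if (group >= 10)
        return -1;
    if ((rc = regcomp(&re, pattern, REG_EXTENDED)) != 0)
        return rc;
    rc = regexec(&re, subject, 10, m, 0);
    regfree(&re);
    if (rc != 0)
        return rc;
    *so = m[group].rm_so;
    *eo = m[group].rm_eo;
    return 0;
}
```

With the thread's pattern, group 1 should come back as -1/-1 for "print 1 2" and as 0/2 for ": print 1 2"; in both cases the overall match succeeds.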
Rich
* Re: Regex: behaviour of ? after () atom
From: Steffen Nurpmeso @ 2018-09-07 16:00 UTC
To: Rich Felker; +Cc: musl
Rich Felker wrote in <20180907153302.GM1878@brightrain.aerifal.cx>:
|OK, it looks like that should work, and seemed to work here when I
|passed the regex to grep -E linked with musl's regex. Can you provide
|a minimal self-contained C program to demonstrate the issue you're
|having?
Happy user that I am, here is something for tests/:

  #include <stdio.h>
  #include <regex.h>
  int main(void){
    regmatch_t rema[1 + 21];
    regex_t re;
    int i;

    i = REG_EXTENDED;
    if((i = regcomp(&re, "^(:[[:space:]]+)?([^[:space:]]+)(.*)$", i)))
      return 2;
    i = regexec(&re, "print 1 2", 21, rema, 0);
    regfree(&re);
    if(i == REG_NOMATCH)
      return 3;
    for(i = 1; i < 21 && rema[i].rm_so != -1; ++i)
      ;
    return (i == 3) ? 0 : 4;
  }

Here the loop stops with i == 1, so the program returns 4.
|BTW which "()?" are you talking about? The whole first parenthesized
|subexpression and the ? after it? I wouldn't call that an atom, but
|nothing seems wrong with it.
I read regex(7) first, just in case something intellectual needed to
be said. Otherwise I am all for Finnish tango.
--steffen
* Re: Regex: behaviour of ? after () atom
From: Rich Felker @ 2018-09-07 16:08 UTC
To: musl; +Cc: Steffen Nurpmeso
On Fri, Sep 07, 2018 at 06:00:46PM +0200, Steffen Nurpmeso wrote:
> Rich Felker wrote in <20180907153302.GM1878@brightrain.aerifal.cx>:
> |OK, it looks like that should work, and seemed to work here when I
> |passed the regex to grep -E linked with musl's regex. Can you provide
> |a minimal self-contained C program to demonstrate the issue you're
> |having?
>
> Happy user that i am, here something for tests/:
>
> #include <stdio.h>
> #include <regex.h>
> int main(void){
> regmatch_t rema[1 + 21];
> regex_t re;
> int i;
>
> i = REG_EXTENDED;
> if((i = regcomp(&re, "^(:[[:space:]]+)?([^[:space:]]+)(.*)$", i)))
> return 2;
> i = regexec(&re, "print 1 2", 21, rema, 0);
> regfree(&re);
> if(i == REG_NOMATCH)
> return 3;
> for(i = 1; i < 21 && rema[i].rm_so != -1; ++i)
> ;
> return (i == 3) ? 0 : 4;
> }
>
> i is 1 here.
>
> |BTW which "()?" are you talking about? The whole first parenthesized
> |subexpression and the ? after it? I wouldn't call that an atom, but
> |nothing seems wrong with it.
>
> I have read regex(7) first just in case something intellectual had
> to be said. Otherwise i am all for Finnish tango.
Your stopping condition is just wrong -- you're stopping after seeing
that the first subexpression does not match anything, and failing to
inspect the others. If you get rid of that stopping condition and
add code to print the rest, you'll see (each line is i, rm_so, rm_eo):
1 -1 -1
2 0 5
3 5 9
4 -1 -1
5 -1 -1
6 -1 -1
...
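A loop that inspects every slot, rather than stopping at the first rm_so == -1, might look like the following sketch. The name dump_groups() is hypothetical; it prints each slot in the "i rm_so rm_eo" format used above and counts how many subexpressions actually matched.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: print "i rm_so rm_eo" for every slot 1..nmatch-1.
 * Unlike the loop in the test program, it does not stop at the first
 * slot whose rm_so is -1: a skipped optional group may well be followed
 * by groups that did match. Returns the number of slots that matched. */
size_t dump_groups(FILE *fp, const regmatch_t *m, size_t nmatch)
{
    size_t matched = 0;

    for (size_t i = 1; i < nmatch; ++i) {
        fprintf(fp, "%zu %ld %ld\n", i, (long)m[i].rm_so, (long)m[i].rm_eo);
        if (m[i].rm_so != -1)
            ++matched;
    }
    return matched;
}
```

Called on the result of regexec() with the thread's pattern and "print 1 2", it should print the table above and return 2, since POSIX requires unused pmatch elements to be set to -1.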
Also, for what it's worth, there's no reason to store expressions
temporarily in variables like this:
> i = REG_EXTENDED;
> if((i = regcomp(&re, "^(:[[:space:]]+)?([^[:space:]]+)(.*)$", i)))
Just do:
if((i = regcomp(&re, "^(:[[:space:]]+)?([^[:space:]]+)(.*)$", REG_EXTENDED)))
etc.
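Putting both suggestions together, the tests/ program could be reworked roughly as follows. This is a sketch, not code from the thread: the function name regex_optional_group_test() is hypothetical, REG_EXTENDED is passed to regcomp() directly, and the expected submatch offsets are the ones listed above.

```c
#include <regex.h>

/* Hypothetical rework of the test program above: REG_EXTENDED goes
 * straight into regcomp(), and every slot is checked instead of stopping
 * at the first unmatched one. Status convention as in the original:
 * 2 = regcomp() failed, 3 = regexec() failed (including no match),
 * 4 = wrong submatches, 0 = success. */
int regex_optional_group_test(void)
{
    regmatch_t rema[1 + 21];
    regex_t re;
    int i;

    if (regcomp(&re, "^(:[[:space:]]+)?([^[:space:]]+)(.*)$", REG_EXTENDED))
        return 2;
    i = regexec(&re, "print 1 2", 1 + 21, rema, 0);
    regfree(&re);
    if (i != 0)
        return 3;

    /* Expected: group 1 skipped, group 2 = "print", group 3 = " 1 2". */
    if (rema[1].rm_so != -1 ||
        rema[2].rm_so != 0 || rema[2].rm_eo != 5 ||
        rema[3].rm_so != 5 || rema[3].rm_eo != 9)
        return 4;
    /* Slots beyond the pattern's three subexpressions must be unused. */
    for (i = 4; i < 1 + 21; ++i)
        if (rema[i].rm_so != -1)
            return 4;
    return 0;
}
```

A tests/ driver would then just do `int main(void){ return regex_optional_group_test(); }`, keeping the original convention that 0 means success.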
Rich
* Re: Regex: behaviour of ? after () atom
From: Steffen Nurpmeso @ 2018-09-07 16:22 UTC
To: Rich Felker; +Cc: musl
Rich Felker wrote in <20180907160847.GN1878@brightrain.aerifal.cx>:
|Your stopping condition is just wrong -- you're stopping after seeing
|that the first subexpression does not match anything, and failing to
|inspect the others. If you get rid of that stopping condition and
|add code to print the rest, you'll see (each line is i, rm_so, rm_eo):
|
|1 -1 -1
|2 0 5
|3 5 9
|4 -1 -1
|5 -1 -1
|6 -1 -1
|...
I see, indeed. And, just as last time, there is no other bug to report
elsewhere; sorry for the noise.
--steffen