From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: musl@lists.openwall.com, Andreas Schwab <schwab@linux-m68k.org>,
"A. Wilcox" <awilfox@adelielinux.org>
Subject: Re: [musl] Re: Test failures when Git is built with libpcre and grep is built without it
Date: Wed, 11 Jan 2017 05:04:01 -0500
Message-ID: <20170111100400.vhd5ytarqpujigbn@sigill.intra.peff.net>
In-Reply-To: <20170110113959.GL17692@port70.net>
On Tue, Jan 10, 2017 at 12:40:00PM +0100, Szabolcs Nagy wrote:
> > > I'm not sure if musl is wrong for failing to complain about a
> > > bogus regex. Generally making something that would break into
> > > something that works is an OK way to extend the standard. So our
> > > test is at fault for assuming that the regex will fail. I guess
>
> \x is undefined in posix and musl is based on tre which
> supports \x{hexdigits} in ere.
Thanks for confirming; I figured it was something like that.
> > > we'd need to find some more exotic syntax that pcre supports, but
> > > that ERE doesn't. Maybe "(?:)" or something.
>
> i think you would have to use something that's invalid
> in posix ere, ? after empty expression is undefined,
> not an error so "(?:)" is a valid ere extension.
Reading through POSIX[1], hardly anything is explicitly labeled as
"invalid". Most things are just "undefined", which leaves room for
implementations to do what they like.

That's a good thing for a standard to do, but a bad thing when you are
trying to find behavior that differs reliably between PCRE and ERE. :)
In most cases, PCRE constructs could be viable extensions to ERE.
> since most syntax is either defined or undefined in ere
> instead of being invalid, distinguishing pcre using
> syntax is not easy.
>
> there are semantic differences in subexpression matching:
> leftmost match has higher priority in pcre, longest match
> has higher priority in ere.
>
> $ echo ab | grep -o -E '(a|ab)'
> ab
> $ echo ab | grep -o -P '(a|ab)'
> a
>
> unfortunately grep -o is not portable.
In this case we're testing whether Git has internally fed the regex to
pcre or to regcomp(), not how the system grep behaves. So we'd need
something like "-o" for "git grep", which I don't think exists.
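What "-o" would reveal is the extent of the match. Inside a C program
that information comes straight from regexec()'s regmatch_t array, which
makes the leftmost-longest rule quoted above easy to observe. A minimal
sketch, assuming a POSIX <regex.h>; ere_first_match() is a hypothetical
helper, not a Git function:

```c
#include <assert.h>
#include <regex.h>

/*
 * Hypothetical helper: find the first match of the ERE `pattern` in
 * `subject` and store its byte offsets in *start and *end (end is
 * exclusive). Returns 0 on a match, REG_NOMATCH if there is none, or
 * another nonzero regcomp()/regexec() error code otherwise.
 *
 * POSIX requires the longest of the leftmost matches, so "(a|ab)"
 * against "ab" should give [0, 2), i.e. "ab". PCRE tries alternatives
 * left to right and stops at the first that matches, giving "a".
 */
static int ere_first_match(const char *pattern, const char *subject,
                           regoff_t *start, regoff_t *end)
{
    regex_t re;
    regmatch_t m[1];
    int rc;

    assert(pattern && subject && start && end);
    rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;
    rc = regexec(&re, subject, 1, m, 0);
    if (rc == 0) {
        *start = m[0].rm_so;
        *end = m[0].rm_eo;
    }
    regfree(&re);
    return rc;
}
```

With a POSIX regexec(), calling ere_first_match("(a|ab)", "ab", &s, &e)
should leave s == 0 and e == 2, matching the grep -E output above.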
Another difference I found is that "[\d]" matches a literal "\" or "d"
in ERE, but behaves like "[0-9]" in PCRE. I'll work up a patch based on
that.
Thanks for your answer. I'll drop the musl list from the cc when I
follow up, as this is most definitely not a musl problem, but a git one.

-Peff