From: Stephane Chazelas <stephane.chazelas@gmail.com>
To: Phil Pennock <zsh-workers+phil.pennock@spodhuis.org>
Cc: zsh-workers@zsh.org
Subject: Re: zsh/bash behavior variance: regex ERE matching
Date: Wed, 14 Mar 2018 14:37:24 +0000
Message-ID: <20180314143724.GB10404@chaz.gmail.com>
In-Reply-To: <20180314024032.GB32722@tower.spodhuis.org>
2018-03-13 22:40:33 -0400, Phil Pennock:
[...]
> So: we ask for ERE, we get ERE+nonstandard.
>
> On the same platform, Bash 4.4.19(1)-release from Homebrew does _NOT_
> match with `REG_ENHANCED` features.
[...]
An important note about how bash's =~ has worked since 3.2 (in 3.1
or with the compat31 option, it works more like zsh):
In bash (and to some extent in ksh93 as well, though it's very
buggy there), the shell quoting operators influence regex
matching the same way they do for shell wildcards.
[[ a =~ "." ]] or [[ a =~ \. ]]
actually call regcomp() with a "\." regexp.
To do that, bash needs to parse the regexp, and it does so using
the POSIX ERE syntax.
[[ a =~ \d ]] is the same as [[ a =~ "d" ]] and calls
regcomp() with "d", while [[ a =~ '\d' ]] calls it with
"\\d" (the "\" being shell-quoted results in it being
regexp-escaped).
That means that if you want to use extensions, you need to use
variables or other expansions there (which you leave unquoted).
Like:
re='\d'
[[ a =~ $re ]]
for regcomp() to be called with "\d".
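
As a rough illustration of the quoting rules described above, here is a
small bash sketch. It assumes bash 3.2 or later with compat31 off, and it
sticks to plain "." so the results don't depend on any regex library
extensions. The variable names are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch only: assumes bash >= 3.2 with the compat31 option off.
# All variable names below are illustrative, not from the original message.

# Unquoted: "." is a regex operator and matches any character.
[[ a =~ . ]] && unquoted=match || unquoted=nomatch

# Quoted or backslash-escaped: "." is regexp-escaped, so it only
# matches a literal dot.
[[ a =~ "." ]] && quoted=match || quoted=nomatch
[[ a =~ \. ]] && escaped=match || escaped=nomatch
[[ . =~ "." ]] && literal=match || literal=nomatch

# Through a variable: an unquoted expansion is used as a regex,
# a quoted expansion is matched as a literal string.
re='a.c'
[[ abc =~ $re ]] && var_unquoted=match || var_unquoted=nomatch
[[ abc =~ "$re" ]] && var_quoted=match || var_quoted=nomatch
[[ a.c =~ "$re" ]] && var_quoted_literal=match || var_quoted_literal=nomatch

printf '%s\n' \
  "unquoted=$unquoted quoted=$quoted escaped=$escaped literal=$literal" \
  "var_unquoted=$var_unquoted var_quoted=$var_quoted var_quoted_literal=$var_quoted_literal"
```

The same pattern text gives different results depending only on whether
the shell sees it quoted, which is why extensions such as \d have to go
through an unquoted expansion.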
Note that (?:...) and \d are fine. We're not breaking EREs by
supporting them, as the behaviour of (?:...) and \d is unspecified
in the POSIX ERE specification.
Other regexp implementations have other backward-compatible
extensions. For instance, GNU EREs support \b, \<, \>...
Some incompatibilities I'm aware of between ERE and PCRE (I
don't know if that also applies to those macOS REs):
- In POSIX ERE, [\d] matches on \ and d while it matches on a
digit in PCRE (see also [\]] and co)
- In POSIX ERE, alternation looks for the longest match, while
PCRE picks the first (leftmost) alternative that matches.
$ echo abc | grep -oE 'a|ab'
ab
$ echo abc | grep -oP 'a|ab'
a
$ [[ abc =~ '(a|ab)' ]]; echo $match
ab
$ setopt rematchpcre
$ [[ abc =~ '(a|ab)' ]]; echo $match
a
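
For comparison, the alternation case above can be sketched in bash as
well, where the whole match ends up in BASH_REMATCH[0]. This assumes the
system regcomp()/regexec() follow the POSIX longest-match rule. The
variable names are only illustrative.

```shell
#!/usr/bin/env bash
# Sketch only: the result depends on the C library's regex implementation
# following the POSIX ERE longest-match rule for alternation.

re='a|ab'   # kept in a variable and expanded unquoted, so it is used as a regex
alt=
[[ abc =~ $re ]] && alt=${BASH_REMATCH[0]}

# With POSIX semantics the longer alternative is expected to win here,
# as in the grep -E example; a PCRE-style engine would stop at "a".
printf 'matched: %s\n' "$alt"
```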
As long as the regex library does what is required for
POSIX-compliant regular expressions, and since we document that =~
does POSIX ERE, I'd say it doesn't matter which extensions are
implemented on top of the standard.
--
Stephane