From: Stephane Chazelas <stephane.chazelas@gmail.com>
To: Phil Pennock <zsh-workers+phil.pennock@spodhuis.org>
Cc: zsh-workers@zsh.org
Subject: Re: zsh/bash behavior variance: regex ERE matching
Date: Wed, 14 Mar 2018 14:37:24 +0000
Message-ID: <20180314143724.GB10404@chaz.gmail.com>
In-Reply-To: <20180314024032.GB32722@tower.spodhuis.org>

2018-03-13 22:40:33 -0400, Phil Pennock:
[...]
> So: we ask for ERE, we get ERE+nonstandard.
> 
> On the same platform, Bash 4.4.19(1)-release from Homebrew does _NOT_
> match with `REG_ENHANCED` features.
[...]

An important note about how bash's =~ works since 3.2 (in 3.1
or with the compat31 option it works more like zsh):

In bash (and to some extent in ksh93 as well, though it's very
buggy there), the shell quoting operators have an influence on
the regex matching, like they do for shell wildcards.

[[ a =~ "." ]] or [[ a =~ \. ]]

actually call regcomp() with a "\." regexp.

To do that, bash needs to parse the regexp itself, and does so
using the POSIX ERE syntax.

[[ a =~ \d ]] is the same as [[ a =~ "d" ]] and calls
regcomp() with "d", while [[ a =~ '\d' ]] calls it with
"\\d" (the "\" being shell-quoted results in it being
regexp-escaped).

That means that if you want to use extensions, you need to use
variables or other expansions there (which you leave unquoted).

Like:

re='\d'
[[ a =~ $re ]]

for regcomp() to be called with "\d".
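
To make the quoting rules above concrete, here is a small sketch
for bash 3.2 or later (without compat31). The variable names are
only for illustration. It sticks to "." and a quoted '\d' because
what an unquoted \d does once it reaches regcomp() depends on the
regex library.

```shell
#!/usr/bin/env bash
# Sketch: how shell quoting changes what bash (3.2+) hands to regcomp().

# Quoted dot: bash regexp-escapes it, so only a literal "." would match.
if [[ a =~ "." ]]; then quoted_dot=match; else quoted_dot=nomatch; fi

# Unquoted dot: the ERE "any character" operator.
if [[ a =~ . ]]; then bare_dot=match; else bare_dot=nomatch; fi

# Regex kept in a variable: an unquoted expansion is used as a regex,
# a quoted expansion is matched as a literal string.
re='.'
if [[ a =~ $re ]]; then unquoted_var=match; else unquoted_var=nomatch; fi
if [[ a =~ "$re" ]]; then quoted_var=match; else quoted_var=nomatch; fi

# Quoted '\d': the backslash is regexp-escaped, so the pattern matches
# the two characters \ and d, and a lone d does not match.
if [[ '\d' =~ '\d' ]]; then literal_bs_d=match; else literal_bs_d=nomatch; fi
if [[ d =~ '\d' ]]; then lone_d=match; else lone_d=nomatch; fi

printf '%s\n' \
  "quoted_dot=$quoted_dot" "bare_dot=$bare_dot" \
  "unquoted_var=$unquoted_var" "quoted_var=$quoted_var" \
  "literal_bs_d=$literal_bs_d" "lone_d=$lone_d"
```

Only the unquoted forms (bare_dot, unquoted_var) treat "." as an
operator; every quoted form ends up as a literal match.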

Note that (?:...) and \d are fine. We're not breaking EREs by
supporting them, as the behaviour for (?:...) and \d is
unspecified in the POSIX ERE specification.

Other regexp implementations have other backward-compatible
extensions. For instance, GNU EREs support \b, \<, \>...

Some incompatibilities I'm aware of between ERE and PCRE (I
don't know if that also applies to those macOS REs):

- In POSIX ERE, [\d] matches \ or d, while in PCRE it matches a
  digit (see also [\]] and co).
- In POSIX ERE, alternation looks for the longest match, while
  PCRE takes the leftmost alternative that matches.

  $ echo abc | grep -oE 'a|ab'
  ab
  $ echo abc | grep -oP 'a|ab'
  a

  $ [[ abc =~ '(a|ab)' ]]; echo $match
  ab
  $ setopt rematchpcre
  $ [[ abc =~ '(a|ab)' ]]; echo $match
  a
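
The two ERE behaviours listed above can also be checked from bash,
which only ever asks regcomp() for plain EREs. This is a sketch
that assumes bash 3.2+ and a POSIX-conformant regex library; the
variable names are made up for the example, and the regexps are
kept in variables so that bash passes them through untouched.

```shell
#!/usr/bin/env bash
# Sketch: POSIX ERE semantics as seen through bash's =~ operator.
# Assumes the system regcomp()/regexec() follow POSIX for these cases.

# Inside a bracket expression a backslash is an ordinary character,
# so [\d] matches "\" or "d" and does not match a digit.
re_bracket='[\d]'
if [[ '\' =~ $re_bracket ]]; then backslash=match; else backslash=nomatch; fi
if [[ d =~ $re_bracket ]]; then letter_d=match; else letter_d=nomatch; fi
if [[ 5 =~ $re_bracket ]]; then digit=match; else digit=nomatch; fi

# Alternation: among the leftmost matches, POSIX asks for the longest one.
re_alt='a|ab'
longest=
if [[ abc =~ $re_alt ]]; then longest=${BASH_REMATCH[0]}; fi

printf '%s\n' \
  "backslash=$backslash" "letter_d=$letter_d" "digit=$digit" \
  "longest=$longest"
```

With a PCRE-style engine the digit would match and the alternation
would stop at "a", which is the difference the list describes.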

As long as the regex library does what is required for
POSIX-compliant regular expressions, and since we document that
=~ does POSIX ERE, I'd say it doesn't matter which extensions are
implemented on top of the standard.

-- 
Stephane

