From: Phil Pennock <zsh-workers+phil.pennock@spodhuis.org>
To: zsh-workers@zsh.org
Subject: zsh/bash behavior variance: regex ERE matching
Date: Tue, 13 Mar 2018 22:40:33 -0400
Message-ID: <20180314024032.GB32722@tower.spodhuis.org>
This is just to note that I have observed a behavior variance. My
proposed solution is to do absolutely nothing, and accept the variance
as "sane in an insane world".
Note that, per my standing practice, I do not put at risk a code-base
which does not belong to me by reading GPL code of a related code-base,
so I still have not read the bash code. (I like the GPL and use it
elsewhere, but Zsh isn't GPL and it's not my call to take that risk, so I
stubbornly refuse to.) Descriptions of bash are based on surmise from
observed behavior.
Background: when bash copied the Perl-ish `=~` syntax, they declared it
to be an ERE match. When I saw that Bash had added the `=~` comparison
infix operator, I went "that's a good idea" and did likewise for Zsh;
during on-list discussion at the time, the core maintainers expressed a
preference for closer compatibility with Bash, so I wrote the
`zsh/regex` module to do ERE matching and introduced the `re_match_pcre`
option to let folks map `=~` onto our long-standing `-pcre-match` infix
operator. (I think Peter chose to make zsh/regex the default always,
which was very sane.)
Situation: on macOS (10.12.6, Sierra), the regex library is based on
TRE, not on Henry Spencer's library or any other. Further, re_format(7)
documents a number of features for `REG_ENHANCED` mode, as distinct from
`REG_EXTENDED`. These are Perl-ish/PCRE-ish features such as `\d` for
`[[:digit:]]` and `(?:whatever)` for non-capturing grouping.
Using Zsh 5.4.2 built from Homebrew, which has no relevant patches, the
`=~` operator in Zsh is picking up features documented as `REG_ENHANCED`
when we only ask for `REG_EXTENDED`. Homebrew reports that zsh is:
Built from source on 2018-01-07 at 18:10:37 with: --with-unicode9 --with-gdbm --with-pcre
Specifically, the added features are the two features cited above,
`\d` and `(?:...)`.
So: we ask for ERE, we get ERE+nonstandard.
On the same platform, Bash 4.4.19(1)-release from Homebrew does _NOT_
honor those `REG_ENHANCED` features in `=~` patterns.
Best operating hypothesis is:
* Darwin userland bug
* Bash build process has logic to detect broken ERE in system libraries
and use a GNU ERE implementation (or ships with such always?) so that
it's immune from bugs like this
Proposed action: nothing
Reason: most folks aren't familiar enough with regexps to know the
variances, and I suspect a non-trivial number of macOS users are
unwittingly relying upon TRE's REG_ENHANCED features. Fixing the
incompatibility (1) risks breaking working user scripts and (2) requires
shipping our own reliable ERE regexp library, and really I just don't
want to go there.
FWIW, I also have, lying around somewhere, a zsh/re2 module using Russ
Cox's RE2 engine (as popularized by Go). I suspect that this would cause
more confusion than it would solve. I think I dropped it part-way through
converting RE_MATCH_PCRE into a compatibility shim over a zsh-specific
parameter that selects the engine to use, which can be set to any of
(regex, re2, pcre). If any of the core team express interest, I can
probably dust that off.
-Phil