Date: Tue, 13 Mar 2018 22:40:33 -0400
From: Phil Pennock
To: zsh-workers@zsh.org
Subject: zsh/bash behavior variance: regex ERE matching
Message-ID: <20180314024032.GB32722@tower.spodhuis.org>
X-Seq: 42458

This is just to note that I have observed a behavior variance.  My
proposed solution is to do absolutely nothing, and accept the variance
as "sane in an insane world".

Note that, per my standing practice, I do not cause risk to a code-base
which does not belong to me by reading GPL code of a related code-base,
so I still have not read the bash code.  (I like the GPL and use it
elsewhere, but Zsh isn't GPL and it's not my call to risk that, so I
stubbornly refuse to risk it.)  Descriptions of bash are based on
surmise from observed behavior.

Background: when bash copied the Perl-ish `=~` syntax, they declared it
to be an ERE match.  When I saw that Bash had added the `=~` comparison
infix operator, I went "that's a good idea" and did likewise for Zsh;
during on-list discussion at the time, the core maintainers expressed a
preference for closer compatibility with Bash, so I wrote the
`zsh/regex` module to do ERE matching and introduced the `re_match_pcre`
option to let folks map `=~` onto our long-standing `-pcre-match` infix
operator.  (I think Peter chose to make zsh/regex the default always,
which was very sane.)

Situation: on macOS (10.12.6, Sierra), the regex library is based on
TRE, not on Henry Spencer's library or any other.  Further, re_format(7)
documents a number of features for `REG_ENHANCED` mode, as distinct from
`REG_EXTENDED`.  These are Perl-ish/PCRE-ish features such as `\d` for
`[[:digit:]]` and `(?:whatever)` for non-capturing grouping.

Using Zsh 5.4.2 built from Homebrew, which has no relevant patches, the
`=~` operator in Zsh is picking up features documented as `REG_ENHANCED`
when we only ask for `REG_EXTENDED`.  Homebrew reports that zsh is:

    Built from source on 2018-01-07 at 18:10:37 with: --with-unicode9 --with-gdbm --with-pcre

Specifically, the added features are the two features cited above, `\d`
and `(?:...)`.

So: we ask for ERE, we get ERE+nonstandard.
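
For anyone wanting to reproduce this, here is a minimal sketch of a
probe, assuming it is pasted into an interactive bash or zsh.  `probe`
is a hypothetical helper name of my choosing, not something either
shell provides.  Only the first call has a platform-independent result;
the last two exercise the two `REG_ENHANCED` features named above, so
their result depends on the shell and on the regex library it was built
against.

```shell
# Hypothetical helper (an illustration, not part of zsh or bash):
# prints "match" if the shell's `=~` operator matches, else "no match".
probe() {
  local subject=$1 re=$2
  # The pattern is kept in an unquoted variable so that both bash and
  # zsh treat it as a regex.  A pattern that the regex library rejects
  # is expected to give a non-zero status, reported here as "no match";
  # any diagnostic on stderr is discarded.
  if [[ $subject =~ $re ]] 2>/dev/null; then
    printf 'match\n'
  else
    printf 'no match\n'
  fi
}

probe 123  '^[[:digit:]]+$'   # plain POSIX ERE: prints "match"
probe 5    '^\d$'             # \d digit escape: result varies
probe abab '^(?:ab)+$'        # non-capturing group: result varies
```

Running the same three calls under each shell on the same machine shows
whether the two shells agree on what `=~` accepts there.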

On the same platform, Bash 4.4.19(1)-release from Homebrew does _NOT_
match with `REG_ENHANCED` features.

Best operating hypothesis is:
 * Darwin userland bug
 * Bash build process has logic to detect broken ERE in system libraries
   and use a GNU ERE implementation (or ships with such always?) so that
   it's immune from bugs like this

Proposed action: nothing

Reason: most folks aren't familiar enough with regexps to know the
variances, and I suspect a non-trivial number of macOS users are
unwittingly relying upon TRE `REG_ENHANCED` features.  Fixing the
incompatibility (1) risks breaking working user scripts and (2) requires
shipping our own reliable ERE regexp library, and really I just don't
want to go there.

FWIW, somewhere lying around I also have a module which adds zsh/re2,
using Russ Cox's RE2 engine (as popularized by Go).  I suspect that this
would cause more confusion than it would solve, and I think I dropped it
part-way through converting RE_MATCH_PCRE to a compatibility shim which
edits a zsh-specific parameter which defines the engine to be used and
so can be set to any of (regex, re2, pcre).  If any of the core team
express interest, I can probably dust that off.

-Phil