From: Stephane Chazelas
To: Phil Pennock
Cc: zsh-workers@zsh.org
Date: Wed, 14 Mar 2018 14:37:24 +0000
Subject: Re: zsh/bash behavior variance: regex ERE matching
Message-ID: <20180314143724.GB10404@chaz.gmail.com>
In-Reply-To: <20180314024032.GB32722@tower.spodhuis.org>
References: <20180314024032.GB32722@tower.spodhuis.org>
List-Id: Zsh Workers List
X-Seq: 42462

2018-03-13 22:40:33 -0400, Phil Pennock:
[...]
> So: we ask for ERE, we get ERE+nonstandard.
>
> On the same platform, Bash 4.4.19(1)-release from Homebrew does _NOT_
> match with `REG_ENHANCED` features.
[...]

An important note about how bash's =~ works since 3.2 (in 3.1, or
with the compat31 option, it works more like zsh):

In bash (and to some extent in ksh93 as well, though it's very
buggy there), the shell quoting operators have an influence on
regex matching, just as they do for shell wildcards.

[[ a =~ "." ]] and [[ a =~ \. ]] both call regcomp() with a
"\." regexp.

To do that, bash needs to parse the regexp, and it does so using
the POSIX ERE syntax.

[[ a =~ \d ]] is the same as [[ a =~ "d" ]] and calls
regcomp() with "d", while [[ a =~ '\d' ]] calls it with
"\\d" (the "\" being shell-quoted results in it being
regexp-escaped).

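The quoting rules above can be checked with a short script. This is only a sketch: it assumes bash 4.4 or later without the compat31 option, and the subject strings and variable names are made up for illustration. Every regexp involved is plain POSIX ERE, so the results should not depend on which extensions the regex library implements.

```bash
#!/usr/bin/env bash
# Sketch only: assumes bash 4.4 or later, run without the compat31 option.
# Subject strings and variable names are illustrative, not from the thread.

# An unquoted "." is a regexp operator, so it matches any character.
[[ a =~ . ]]   && unquoted_dot=match || unquoted_dot="no match"

# A shell-quoted "." reaches regcomp() as "\.", so only a literal dot matches.
[[ a =~ "." ]] && quoted_dot=match   || quoted_dot="no match"
[[ a =~ \. ]]  && escaped_dot=match  || escaped_dot="no match"

# An unquoted \d is shell quoting applied to "d": the regexp is just "d".
[[ d =~ \d ]]  && bs_d_on_d=match    || bs_d_on_d="no match"
[[ 5 =~ \d ]]  && bs_d_on_5=match    || bs_d_on_5="no match"

# A quoted '\d' reaches regcomp() as "\\d": a literal backslash, then "d".
[[ d =~ '\d' ]]      && quoted_bs_d_on_d=match   || quoted_bs_d_on_d="no match"
[[ 'x\dy' =~ '\d' ]] && quoted_bs_d_on_lit=match || quoted_bs_d_on_lit="no match"

printf '%-22s %s\n' \
  'a =~ .'        "$unquoted_dot" \
  'a =~ "."'      "$quoted_dot" \
  'a =~ \.'       "$escaped_dot" \
  'd =~ \d'       "$bs_d_on_d" \
  '5 =~ \d'       "$bs_d_on_5" \
  "d =~ '\\d'"    "$quoted_bs_d_on_d" \
  "x\\dy =~ '\\d'" "$quoted_bs_d_on_lit"
```

If bash behaves as described, only the unquoted ".", the "d" subject with unquoted \d, and the subject containing a literal backslash-d report a match.
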
That means that if you want to use extensions, you need to use
variables or other expansions there (which you leave unquoted).
Like:

  re='\d'
  [[ a =~ $re ]]

for regcomp() to be called with "\d".

Note that (?:...) and \d are fine. We're not breaking EREs by
supporting them, as the behaviour of (?:...) and \d is unspecified
in the POSIX ERE specification. Other regexp implementations
have other backward-compatible extensions. For instance, GNU
EREs support \b, \<, \>...

Some incompatibilities I'm aware of between ERE and PCRE (I
don't know if that also applies to those macOS REs):

- In POSIX ERE, [\d] matches either \ or d, while in PCRE it
  matches a digit (see also [\]] and co).
- In POSIX ERE, alternation looks for the longest match, while
  PCRE takes the first (leftmost) alternative that matches.

  $ echo abc | grep -oE 'a|ab'
  ab
  $ echo abc | grep -oP 'a|ab'
  a
  $ [[ abc =~ '(a|ab)' ]]; echo $match
  ab
  $ setopt rematchpcre
  $ [[ abc =~ '(a|ab)' ]]; echo $match
  a

As long as the regex library does what is required for
POSIX-compliant regular expressions, and since we document that
=~ does POSIX ERE, I'd say it doesn't matter what extensions are
implemented on top of the standard.

-- 
Stephane