X-Seq: 41293
Date: Wed, 14 Jun 2017 16:49:38 -0400
From: Phil Pennock
To: Zsh hackers list
Subject: Re: =~ doesn't work with NUL characters
Message-ID: <20170614204938.GA76510@tower.spodhuis.org>
References: <20170613100217.GA9529@chaz.gmail.com>
In-Reply-To: <20170613100217.GA9529@chaz.gmail.com>

On 2017-06-13 at 11:02 +0100, Stephane Chazelas wrote:
> [[ $'a\0b' =~ 'a$' ]]
>
> returns true both with and without rematchpcre

Let's break this down, non-PCRE and PCRE, and consider appropriate
behaviour for each separately.

Without rematchpcre, this is ERE per the POSIX APIs, which don't
portably support size-supplied strings, relying instead upon C-string
NUL-termination. Current macOS has regnexec(), but this is not in the
system regexp library I see on Ubuntu Trusty or FreeBSD 10.3. It
appears to be an extension from when they switched to the TRE
implementation in macOS 10.8. Trying to support this would result in
variations in behaviour across systems in a way which I think might be
undesirable. The whole point of adding the non-PCRE implementation was
to match Bash behaviour by default, and Bash does the same thing.

So for non-PCRE, I think the current behaviour is the only sane choice.

For PCRE, I'm inclined to agree that we should be able to portably
supply the length, and there would not be any cross-platform
behavioural variances. I think it's also reasonable that PCRE matching
could diverge from ERE matching even more. Others might disagree?

We've "always" used strlen here; the most recent change was to handle
meta/unmeta (by me), but the strlen usage has been present since the
pcre module was introduced in commit bff61cf9e1 in 2001.

Thus: do we want to change behaviour, after 16 years, to allow embedded
NUL for the PCRE case, being different from the ERE case?

There's enough room for disagreement here that I'm not rushing to write
a patch, but instead deferring to those with commit-bit.

My personal inclination is to handle NUL in the PCRE case.
It should just be a case of passing an int* instead of NULL as the
second parameter to unmetafy().

-Phil
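
For illustration, the ERE half of the argument above can be reproduced
outside zsh. This is only a rough sketch, not zsh's code: ere_matches()
is a hypothetical helper name, and it assumes nothing beyond the POSIX
regcomp()/regexec() interface. Because regexec() takes no length
argument, it stops reading the subject at the first NUL byte.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not part of zsh.
 * Returns 1 if the POSIX ERE `pattern` matches `subject`, 0 if it does
 * not, and -1 if the pattern fails to compile.
 * regexec() has no length parameter, so `subject` is read only up to
 * its first NUL byte, whatever follows in the buffer. */
int ere_matches(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);

    return rc == 0 ? 1 : 0;
}
```

With this, ere_matches("a$", "a\0b") returns 1: the matcher sees only
"a", so the anchor is satisfied, which mirrors the result reported in
the quoted test. The same literal occupies 3 bytes before its
terminator, but strlen() reports 1, and that gap is what a
length-aware PCRE call would close.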