From: Stephane Chazelas
To: Phil Pennock
Cc: Zsh hackers list
Subject: Re: =~ doesn't work with NUL characters
Date: Thu, 15 Jun 2017 10:50:34 +0100
Message-ID: <20170615095033.GC2416@chaz.gmail.com>
In-Reply-To: <20170614204938.GA76510@tower.spodhuis.org>
References: <20170613100217.GA9529@chaz.gmail.com> <20170614204938.GA76510@tower.spodhuis.org>
List-Id: Zsh Workers List
X-Seq: 41307

2017-06-14 16:49:38 -0400, Phil Pennock:
[...]
> Without rematchpcre, this is ERE per POSIX APIs, which don't portably
> support size-supplied strings, relying instead upon C-string
> null-termination.
>
> Current macOS has regnexec() but this is not in the system regexp
> library I see on Ubuntu Trusty or FreeBSD 10.3.  It appears to be an
> extension from when they switched to the TRE implementation in macOS
> 10.8.
>
> Trying to support this would result in variations in behaviour across
> systems in a way which I think might be undesirable.  The whole point of
> adding the non-PCRE implementation was to match Bash behaviour by
> default, and Bash does the same thing.
[...]

A dirty trick in UTF-8 locales (the norm these days) may be to
encode NUL as U+7FFFFF00, and the bytes 0x80 to 0xff that don't
form part of valid characters as U+7FFFFF{80..FF}, in both the
string and the regexp.

That wouldn't work with every regexp implementation though, as
some would treat those as invalid characters if they go by the
newer definition where valid characters are only 0000->D7FF,
E000->10FFFF.

But with the implementations that do accept them, that would also
make the behaviour more consistent in cases like:

    [[ $'\x80' = ? ]]

vs

    [[ $'\x80' =~ '^.$' ]]

That wouldn't help in things like

    [[ x =~ $'[\0-\177]' ]]

(which anyway doesn't make sense in locales other than C/POSIX)
though.

-- 
Stephane
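
For illustration only, the byte mapping described in the message can be sketched in Python. This is not zsh code and not a proposed patch. The helper names (encode_extended, escape_bytes, unescape_bytes) are hypothetical. The sketch assumes the original six-byte UTF-8 form (1111110x followed by five 10xxxxxx bytes) is used for code points above U+3FFFFFF, and it uses Python's strict UTF-8 decoder to decide which bytes "don't form part of valid characters".

```python
import re

MARK_BASE = 0x7FFFFF00  # base code point suggested in the message


def encode_extended(cp: int) -> bytes:
    """Encode cp with the original six-byte UTF-8 form.

    Only code points in U+4000000..U+7FFFFFFF use this form.
    """
    if not 0x4000000 <= cp <= 0x7FFFFFFF:
        raise ValueError("code point outside the six-byte range")
    first = 0xFC | (cp >> 30)
    rest = [0x80 | ((cp >> shift) & 0x3F) for shift in (24, 18, 12, 6, 0)]
    return bytes([first, *rest])


def escape_bytes(data: bytes) -> bytes:
    """Replace NUL and invalid bytes with U+7FFFFFxx sequences."""
    out = bytearray()
    # surrogateescape turns each undecodable byte 0xNN into U+DCNN,
    # which lets us tell invalid bytes apart from valid characters.
    for ch in data.decode("utf-8", errors="surrogateescape"):
        cp = ord(ch)
        if cp == 0:
            out += encode_extended(MARK_BASE)
        elif 0xDC80 <= cp <= 0xDCFF:
            out += encode_extended(MARK_BASE | (cp - 0xDC00))
        else:
            out += ch.encode("utf-8")
    return bytes(out)


# A marker is FD BF BF BF followed by two bytes carrying the 8 payload bits.
_ESCAPED = re.compile(rb"\xfd\xbf\xbf\xbf([\xbc-\xbf])([\x80-\xbf])")


def unescape_bytes(data: bytes) -> bytes:
    """Reverse escape_bytes, restoring the original raw bytes."""
    def restore(match):
        hi, lo = match.group(1)[0], match.group(2)[0]
        return bytes([((hi & 0x03) << 6) | (lo & 0x3F)])
    return _ESCAPED.sub(restore, data)
```

Under these assumptions, escape_bytes(b"a\x00b") yields b"a\xfd\xbf\xbf\xbf\xbc\x80b", which contains no NUL byte and so survives a C-string API. Whether a given regexp library then accepts the six-byte sequences as characters is exactly the portability caveat raised in the message.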