List-Id: Zsh Workers List
X-Seq: 42045
Date: Wed, 22 Nov 2017 21:40:25 +0000
From: Stephane Chazelas
To: Zsh hackers list
Subject: Re: please consider using PCRE_DOLLAR_ENDONLY (and PCRE_DOTALL) for rematchpcre
Message-ID: <20171122214025.GA2992@chaz.gmail.com>
In-Reply-To: <20171122122519.GA13771@chaz.gmail.com>

2017-11-22 12:25:19 +0000, Stephane Chazelas:
[...]
> It can be worked around ([[ $a =~ 'a\z' ]], [[ $a =~ '(?s).'
> ]]), but IMO at least PCRE_DOLLAR_ENDONLY (if not PCRE_DOTALL)
> should be the default at least for [[ $string =~ ... ]] as
> in shells, $string usually do not include the newline delimiter.
[...]

The situation in other tools and languages:

ksh93:

  $ ksh93 -c "[[ $'a\n' = ~(P:a$) ]] || echo no; [[ $'\n' = ~(P:.) ]] && echo yes"
  no
  yes

  Both PCRE_DOLLAR_ENDONLY and PCRE_DOTALL (or rather their
  equivalents, as ksh93 comes with its own PCRE-like implementation).

php:

  $ php -r 'echo preg_match("/a$/", "a\n") . "\n" . preg_match("/./", "\n") . "\n";'
  1
  0

  Neither PCRE_DOLLAR_ENDONLY nor PCRE_DOTALL. This is clearly
  documented, and there is a "D" flag to enable PCRE_DOLLAR_ENDONLY:
  https://secure.php.net/manual/en/reference.pcre.pattern.modifiers.php

  $ php -r 'echo preg_match("/a$/D", "a\n") . "\n";'
  0

ssed:

  printf 'a\n\n' | ssed -Rn 'N;/a$/=;/a./!='

  Neither PCRE_DOLLAR_ENDONLY nor PCRE_DOTALL.

GNU grep:

  $ printf 'a\n\0' | ltrace -e 'pcre_compile' grep -zP 'a$'
  grep->pcre_compile("a$", 2080, 0x7ffcaf25aff8, 0x7ffcaf25aff4, 0x1e89280)

  PCRE_DOLLAR_ENDONLY (32) but not PCRE_DOTALL (2080 = 2048 + 32,
  i.e. PCRE_UTF8 | PCRE_DOLLAR_ENDONLY).

python (not PCRE):

  Neither PCRE_DOLLAR_ENDONLY nor PCRE_DOTALL. Documented at
  https://docs.python.org/3/library/re.html
  Beware: \Z there does not mean what it means in perl/PCRE! It
  matches at the very end only (like \z in perl/PCRE).

fish (string match -r pcre strings...):

  Neither PCRE_DOLLAR_ENDONLY nor PCRE_DOTALL.

So I'd understand if you left it as it is, since many other tools
do not use PCRE_DOLLAR_ENDONLY either.

I still find it dangerous that $ does not match only at the end of
the subject, as most people assume it does (as it does in BRE and
ERE). If it's not changed, it would be worth documenting clearly,
if only to flag the difference with ERE and warn of the potential
implications. The documentation currently has this misleading
example:

  [[ "$text" -pcre-match ^d+$ ]] &&
  print text variable contains only "d's".

It should be:

  print text variable contains only "d's" optionally followed by a newline character

or:

  [[ "$text" -pcre-match '^d+\z' ]]

This already affects perl and co. For instance, many people do:

  rename 's/\.back$//i' ./*

when they meant:

  rename 's/\.back\z//i' ./*

Same for PCRE_DOTALL:

  rename 's/-.*//' ./*-*

when they meant:

  rename 's/(?s)-.*//' ./*-*

-- 
Stephane