From: Stephane Chazelas <stephane.chazelas@gmail.com>
To: Zsh hackers list <zsh-workers@zsh.org>
Subject: Re: please consider using PCRE_DOLLAR_ENDONLY (and PCRE_DOTALL) for rematchpcre
Date: Wed, 22 Nov 2017 21:40:25 +0000
Message-ID: <20171122214025.GA2992@chaz.gmail.com>
In-Reply-To: <20171122122519.GA13771@chaz.gmail.com>

2017-11-22 12:25:19 +0000, Stephane Chazelas:
[...]
> It can be worked around ([[ $a =~ 'a\z' ]], [[ $a =~ '(?s).'
> ]]), but IMO at least PCRE_DOLLAR_ENDONLY (if not PCRE_DOTALL)
> should be the default at least for [[ $string =~ ... ]] as
> in shells, $string usually do not include the newline delimiter.
[...]

The situation in other tools and languages:

ksh93:

$ ksh93 -c "[[ $'a\n' = ~(P:a$) ]] || echo no; [[ $'\n' = ~(P:.) ]] && echo yes"
no
yes


(both PCRE_DOLLAR_ENDONLY and PCRE_DOTALL (or equivalent as
ksh93 comes with its own pcre-like implementation))

php:

$ php -r 'echo preg_match("/a$/", "a\n") . "\n" . preg_match("/./", "\n") . "\n";'
1
0

neither PCRE_DOLLAR_ENDONLY nor PCRE_DOTALL. Clearly documented
and has a "D" flag to enable PCRE_DOLLAR_ENDONLY
https://secure.php.net/manual/en/reference.pcre.pattern.modifiers.php

$ php -r 'echo preg_match("/a$/D", "a\n") . "\n";'
0

ssed:

printf 'a\n\n' | ssed -Rn 'N;/a$/=;/a./!='

neither PCRE_DOLLAR_ENDONLY nor PCRE_DOTALL

GNU grep:

$ printf 'a\n\0' | ltrace -e 'pcre_compile' grep -zP 'a$'
grep->pcre_compile("a$", 2080, 0x7ffcaf25aff8, 0x7ffcaf25aff4, 0x1e89280)

PCRE_DOLLAR_ENDONLY (32) but not PCRE_DOTALL
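
To make the arithmetic on that options value explicit, here is a small sketch that decodes 2080 into named bits. The table is an assumption on my part: it lists a subset of the option values as I recall them from pcre.h in PCRE 8.x, and decode_options is a hypothetical helper, not part of any library.

```python
# Subset of option bits, assumed to match pcre.h from PCRE 8.x.
PCRE_FLAGS = {
    0x0001: "PCRE_CASELESS",
    0x0002: "PCRE_MULTILINE",
    0x0004: "PCRE_DOTALL",
    0x0008: "PCRE_EXTENDED",
    0x0010: "PCRE_ANCHORED",
    0x0020: "PCRE_DOLLAR_ENDONLY",
    0x0800: "PCRE_UTF8",
}


def decode_options(options):
    """Return the names of the bits set in options, lowest bit first.

    Bits missing from the table above are reported as hex strings.
    """
    names = []
    bit = 1
    while bit <= options:
        if options & bit:
            names.append(PCRE_FLAGS.get(bit, hex(bit)))
        bit <<= 1
    return names


# 2080 == 0x820 == 0x800 | 0x20
print(decode_options(2080))
```

Under those assumed values, 2080 is PCRE_UTF8 plus PCRE_DOLLAR_ENDONLY, with the PCRE_DOTALL bit (4) unset.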

python (not PCRE):

neither PCRE_DOLLAR_ENDONLY nor PCRE_DOTALL. Documented:
https://docs.python.org/3/library/re.html

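Beware that \Z there means the opposite of what it means in perl/PCRE:
in python, \Z matches at the very end only (like \z in perl/PCRE),
whereas in perl/PCRE, \Z also matches before a trailing newline.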
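
A minimal check of those python defaults, using only the standard re module. The variable names are mine and only there for illustration.

```python
import re

subject = "a\n"

# By default "$" also matches just before a trailing newline.
dollar_matches = re.search(r"a$", subject) is not None

# In Python, "\Z" anchors at the very end of the string only.
end_only_matches = re.search(r"a\Z", subject) is not None

# By default "." does not match a newline; re.DOTALL changes that.
dot_matches = re.search(r".", "\n") is not None
dotall_matches = re.search(r".", "\n", re.DOTALL) is not None

print(dollar_matches, end_only_matches, dot_matches, dotall_matches)
# prints: True False False True
```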

fish (string match -r pcre strings...):

neither PCRE_DOLLAR_ENDONLY nor PCRE_DOTALL

So I'd understand if you left it as it is, since many other tools
do not use PCRE_DOLLAR_ENDONLY either.

I still find the idea of $ not matching only at the end of the
subject dangerous, as most people assume it does (like it does
in BRE and ERE). If not changed, it would be worth clearly
documenting (if only to flag the difference with ERE and warn of
potential implications). See how the documentation currently has
this misleading example:

  [[ "$text" -pcre-match ^d+$ ]] &&
  print text variable contains only "d's".

Should be:

  print text variable contains only "d's" optionally followed by a newline character

or:

  [[ "$text" -pcre-match '^d+\z' ]]


It affects perl and co already. Like, many people do:

rename 's/\.back$//i' ./*

When they meant:

rename 's/\.back\z//i' ./*

Same for PCRE_DOTALL

rename 's/-.*//' ./*-*

when they meant

rename 's/(?s)-.*//' ./*-*

for instance.
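
The effect on file names that contain newlines can be sketched with python's re standing in for the perl substitutions above. This is an illustration under that assumption, not a run of rename itself: python's default $ and . behave like perl's here, python's \Z plays the role of perl's \z, and the helper names are hypothetical.

```python
import re


def strip_suffix_dollar(name):
    # Analogue of s/\.back$//i: "$" also matches before a trailing newline.
    return re.sub(r"\.back$", "", name, flags=re.IGNORECASE)


def strip_suffix_end_only(name):
    # Analogue of s/\.back\z//i: Python spells the end-only anchor "\Z".
    return re.sub(r"\.back\Z", "", name, flags=re.IGNORECASE)


def cut_from_dash(name, dotall=False):
    # Analogue of s/-.*// and, with dotall=True, of s/(?s)-.*//.
    return re.sub(r"-.*", "", name, flags=re.DOTALL if dotall else 0)


# A name ending in ".back" followed by a newline character.
print(repr(strip_suffix_dollar("file.back\n")))    # 'file\n'
print(repr(strip_suffix_end_only("file.back\n")))  # 'file.back\n'

# A name with a newline after the dash.
print(repr(cut_from_dash("a-b\nc")))               # 'a\nc'
print(repr(cut_from_dash("a-b\nc", dotall=True)))  # 'a'
```

With $, a name that does not actually end in ".back" still gets altered, and without (?s) the part of the name after the newline is left behind.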

-- 
Stephane

