List-Id: Zsh Workers List
X-Seq: 42319
Date: Tue, 23 Jan 2018 06:57:35 +0000
From: Stephane Chazelas
To: Phil Pennock
Cc: Bart Schaefer, zsh-workers@zsh.org
Subject: Re: please consider using PCRE_DOLLAR_ENDONLY (and PCRE_DOTALL) for rematchpcre
Message-ID: <20180123065735.GA16678@chaz.gmail.com>
References: <20171122122519.GA13771@chaz.gmail.com>
 <20171122214025.GA2992@chaz.gmail.com>
 <180119234824.ZM7254@torch.brasslantern.com>
 <20180122052829.GA83799@tower.spodhuis.org>
In-Reply-To: <20180122052829.GA83799@tower.spodhuis.org>

2018-01-22 00:28:29 -0500, Phil Pennock:
[...]
> Changing the default behavior of valid semantics risks hard-to-debug
> breakage of existing scripts and I am erring on the side of being
> against this change. It's not hard opposition, but I'd like to see
> stronger justification before risking breaking changes.
>
> I know that I myself have scripts which rely upon PCRE matching against
> multiline data behaving as per the defaults of pcrepattern(3).
>
> In addition, while the DOTALL change can be turned off in-regex, the
> dollar-endonly one can't, AFAIK, so that becomes a breaking change which
> can't be worked around.
[...]

DOLLAR_ENDONLY is not really about multiline data.

  [[ $'a\nb' =~ 'a$' ]]

will not match with or without it, and

  [[ $'a\nb' =~ '(?m)a$' ]]

will match with or without it.

It's more about single-line data where the line delimiter happens to
be included (and you want the $ to match at the end of that line as
opposed to the end of the string).

$ matches before a trailing newline in a string in perl because of
how its <> operator works. perl is a text processing utility; its
regexps are primarily matched against single lines where the newline
is included. Traditional text processing utilities like sed/grep/awk
differ here: they do not include the record separator.

In:

  perl -pe 's/.$//'

(which calls <>), you want to remove the last character of the line,
not the newline character. That $ behaviour makes a lot of sense
there.

Even if you use:

  perl -lpe 's/.$//'

where -l causes the delimiter to be removed on input and added back
on output like in sed/awk, that behaviour does no harm because the
record does *not* contain any newline delimiter.

But zsh is not a text processing utility, and its "read" builtin (the
closest equivalent to perl's <>) does not include the delimiter. It's
actually hard to end up with a trailing newline when processing text
in shells, given that $(...) strips trailing newlines.

On the other hand, it is a concern that

  [[ $file =~ '\.txt$' ]]

matches file names that don't end in .txt (such as $'foo.txt\n'). In
my experience, file names (as opposed to text lines with their
delimiters) are the kind of thing I deal with most often in zsh.

And again, note that this only happens with PCRE matching; it works
as expected with EREs.

-- 
Stephane
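
The anchoring behaviour described above can be reproduced with Python's re module. Without the multiline flag, its $ follows Perl's default rule: it matches at the end of the string, or just before a newline that ends the string. This is a minimal sketch under that assumption and does not run zsh itself. The helper matches() is hypothetical and stands in for an unanchored [[ subject =~ pattern ]] test. Python's \Z (which, like PCRE's \z and unlike PCRE's \Z, matches only at the very end of the string) is used to approximate what a plain $ would mean under PCRE_DOLLAR_ENDONLY when (?m) is not in effect.

```python
import re


def matches(pattern: str, subject: str) -> bool:
    """Hypothetical helper: unanchored search, like [[ $subject =~ $pattern ]]."""
    return re.search(pattern, subject) is not None


# Multiline data: without (?m), "$" does not match before an inner newline.
print(matches(r"a$", "a\nb"))      # False
print(matches(r"(?m)a$", "a\nb"))  # True

# Delimiter included: "$" also matches just before a trailing newline,
# so a name that does not end in .txt still passes this test.
print(matches(r"\.txt$", "foo.txt\n"))   # True

# Python's \Z matches only at the very end of the string (like PCRE's \z),
# approximating a plain "$" under PCRE_DOLLAR_ENDONLY.
print(matches(r"\.txt\Z", "foo.txt\n"))  # False

# The perl -pe 's/.$//' case: "." does not match "\n", so ".$" removes
# the last character of the line and leaves the newline in place.
print(repr(re.sub(r".$", "", "hello\n")))  # 'hell\n'
```

The two .txt lines show the file-name concern directly: the newline-terminated name passes the $ test but fails the end-of-string test.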