List-Id: Zsh Workers List
X-Seq: 45149
Date: Fri, 27 Dec 2019 05:29:23 +0000
From: Daniel Shahaf
To: Zsh hackers list
Subject: Re: [Bug] S-flag imposes non-greedy match where it shouldn't
Message-ID:
<20191227052923.yal2nnmxdxfgvfkr@tarpaulin.shahaf.local2>
References: <1a130b2e-5824-4b7a-8510-2b1d0b3fdac5@www.fastmail.com>

Sebastian Gniazdowski wrote on Thu, Dec 26, 2019 at 19:35:05 +0100:
> I've attached the extended description.

Thanks; review below.

> It includes a trick to
> work-around the unintuitive behavior of S. It looks as follows:
>
> http://psprint.blinkenshell.org/S_flag.png

Please just copy-paste terminal transcripts into the email.  (Static
transcripts, as in the documentation.)

> I think that the way the S flag works is a bit of an inconsistency,
> Because ${str%%X##**} would not stop at the first from the right
> match, it would try other matches starting from the right and go on up
> to the final first from the left X. I think that (S) shouldn't change
> this, but on the other hand should ${(S)str%%X##} match the first
> three X? Rather not, as it would resemble ## then... Intuitively,
> however, it should match all the three right X.

Yes, I don't find the following very intuitive:

    % set -- aXbXc
    % p ${1%%X*}
    a
    % p ${(S)1%%X*}
    aXb
    % p ${(S)1%X*}
    aXbc
    %

I expected ${(S)%%} to mean: 'Look for the longest match that ends on
the last character; if you don't find any, then look for the longest
match that ends on the penultimate character; etc, until you finally
consider whether $str[1] is a match and whether ${str[1,0]} is a match'.
However, that's clearly not what it does here, or ${(S)1%%X*} would have
printed «a».

Rather, it seems that ${(S)%} and ${(S)%%} mean 'Find the match whose
_start_ is closest to the end of the string; of all matches that start
at a particular index, ${(S)%} picks the shortest and ${(S)%%} the
longest'.

> +++ b/Doc/Zsh/expn.yo
> @@ -1399,6 +1399,20 @@ from the beginning and with tt(%) start from the end of the string.
> With substitution via tt(${)...tt(/)...tt(}) or
> tt(${)...tt(//)...tt(}), specifies non-greedy matching, i.e. that the
> shortest instead of the longest match should be replaced.
> +The substring search means that the pattern is matched skipping the
> +parts of the input string starting from the direction set by the use
> +of tt(#) or tt(%).

I don't understand this sentence.  What does "skipping" mean?

Documentation should be clear and specific enough to allow acceptance
tests to be based on it.

> +For example, to match a pattern starting from the
> +end, one could use:
> +
> +example(str="abcXXXdefXXXghi"
> +out=${(S)str%%(#b)([^X])X##}
> +out=$out${match[1]}
> +)
> +
> +The result is tt(abcXXXdefghi).

That's not correct.  The output is abcXXXdefXXXghi (in 'zsh -f') or
abcXXXdeghif (with extendedglob set), but not abcXXXdefghi.

I doubt this example would clarify the meaning of ${(S)} to people who
encounter it for the first time.  Please use a more minimal example.
Specific issues:

- Assigning to $out a concatenation of two different values muddies the
  water.  It forces readers to reverse engineer which parts of the
  resultant value come from ${match[1]} and which from the ${(S)%%}.
  This is documentation, not a homework problem; the answer should be
  obvious.

  Something like «out="${out}+${match[1]}"» would address this, but…

- … the use of advanced pattern matching features needlessly raises the
  learning curve.  For example, the use of «##» doesn't affect the
  behaviour of the example in any meaningful way, but it has two
  downsides: it means the example won't work out of the box when people
  paste it into their shell, and it means people who RTFM about (S)
  won't be able to understand it until they also look up what «##» does
  [which in turn means they'll have to open zshoptions(1) to RTFM about
  EXTENDED_GLOB].

  This mostly applies to the use of (#b) and capture groups too: it
  would be better not to assume knowledge of that.
> It would have been tt(abcXXXdefXXghif)
> +if not the tt([^X]) part, as despite the tt(%%) specifies a greedy
> +match, the substring matching works by trying matches from right to
> +left and stops at a first valid match.

There are some grammatical errors here (e.g., s/(?<=specif)ies/ying/),
but let's not worry about them until the rest of the patch isn't
a moving target.

Thanks for the patch.  I look forward to a v2.

Daniel

P.S. Obviously, I meant to write «s/specifies/specifying/», but I wanted
to illustrate the point that no more knowledge of pattern-matching
syntax should be assumed than necessary.  [It was a positive lookbehind
assertion in Perl syntax.]
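
For anyone who wants to turn the two readings of ${(S)%%} discussed in this review into acceptance tests, here is a rough model of them. It is a sketch in Python, not a description of zsh's implementation. The function names `observed` and `expected_longest` are made up for illustration, and `fnmatchcase` is only a stand-in for zsh patterns that happens to agree with them for a simple glob such as «X*».

```python
from fnmatch import fnmatchcase  # stand-in for zsh patterns; only trusted for simple globs


def observed(s, pat, longest):
    """Reading that fits the transcript: the match whose start is closest
    to the end of the string wins; among matches starting there, take the
    longest (models (S)%%) or the shortest (models (S)%)."""
    for start in range(len(s), -1, -1):
        ends = [e for e in range(start, len(s) + 1) if fnmatchcase(s[start:e], pat)]
        if ends:
            end = max(ends) if longest else min(ends)
            return s[:start] + s[end:]
    return s  # no substring matches: leave the string unchanged


def expected_longest(s, pat):
    """The reading that was expected for (S)%%: the match whose end is
    closest to the end of the string wins; among matches ending there,
    take the longest."""
    for end in range(len(s), -1, -1):
        starts = [b for b in range(end + 1) if fnmatchcase(s[b:end], pat)]
        if starts:
            begin = min(starts)  # earliest start gives the longest match
            return s[:begin] + s[end:]
    return s


print(observed("aXbXc", "X*", longest=True))    # aXb, as in the (S)%% transcript line
print(observed("aXbXc", "X*", longest=False))   # aXbc, as in the (S)% transcript line
print(expected_longest("aXbXc", "X*"))          # a, which zsh does not print
```

The first two calls reproduce the transcript quoted earlier and the third gives the «a» that the other reading predicts. Whether the model holds for patterns beyond «X*» is untested.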