Date: Sat, 28 Dec 2019 21:00:17 +0000
From: Daniel Shahaf
To: Zsh hackers list
Subject: Re: [Bug] S-flag imposes non-greedy match where it shouldn't
List-Id: Zsh Workers List
X-Seq: 45152
Message-ID: <20191228210017.2cdgwgpqrssrfhgp@tarpaulin.shahaf.local2>
References:
 <1a130b2e-5824-4b7a-8510-2b1d0b3fdac5@www.fastmail.com>
 <20191227052923.yal2nnmxdxfgvfkr@tarpaulin.shahaf.local2>
User-Agent: NeoMutt/20170113 (1.7.2)

Sebastian Gniazdowski wrote on Sat, Dec 28, 2019 at 20:04:21 +0100:
> On Fri, 27 Dec 2019 at 06:30, Daniel Shahaf wrote:
> >
> > Sebastian Gniazdowski wrote on Thu, Dec 26, 2019 at 19:35:05 +0100:
> > > +++ b/Doc/Zsh/expn.yo
> > > @@ -1399,6 +1399,20 @@ from the beginning and with tt(%) start from the end of the string.
> > > With substitution via tt(${)...tt(/)...tt(}) or
> > > tt(${)...tt(//)...tt(}), specifies non-greedy matching, i.e. that the
> > > shortest instead of the longest match should be replaced.
> > > +The substring search means that the pattern is matched skipping the
> > > +parts of the input string starting from the direction set by the use
> > > +of tt(#) or tt(%).
> >
> > I don't understand this sentence.  What does "skipping" mean?
>
> It means that parts of the string are being skipped when they don't
> match when moving to the other end. Does the sentence need an update?

Yes.  Feel free to also add a paragraph break, and/or to change the
incumbent text, too.

> > > +For example, to match a pattern starting from the
> > > +end, one could use:
> > > +
> > > +example(str="abcXXXdefXXXghi"
> > > +out=${(S)str%%(#b)([^X])X##}
> > > +out=$out${match[1]}
> > > +)
> > > +
> > > +The result is tt(abcXXXdefghi).
> >
> > That's not correct.  The output is abcXXXdefXXXghi (in 'zsh -f') or
> > abcXXXdeghif (with extendedglob set), but not abcXXXdefghi.
>
> I've sent an updated patch half hour before your email. It contains
> the correct example.
>

I saw it, but most of my feedback applied to it too.

I think the last sentence of that patch is the most important one, since
it's the only one that actually gives the general rule.  I'd put it nearer
the top.
> > I doubt this example would clarify the meaning of ${(S)} to people who
> > encounter it for the first time.  Please use a more minimal example.
> > Specific issues:
> >
> > - (...) This is documentation, not
> >   a homework problem; the answer should be obvious.  Something like
> >   «out="${out}+${match[1]}"» would address this -- but…
>
> I think that many examples in the man pages are like that – they don't
> go the obvious path of just demonstrating the usage but instead, they
> cover some edge case that, after (sometimes quite long) thinking
> reveal something very peculiar about the feature.

So what?  We're not going to accept a patch that adds an unclear
explanation simply because other explanations are unclear.

New documentation should be clear.  If any of the existing documentation
is unclear, we should fix that, too.

> There are better examples of this, however, the best that I've found
> currently is the one used for the #b glob flag:
>
> foo="a string with a message"
> if [[ $foo = (a|an)' '(#b)(*)' '* ]]; then
>   print ${foo[$mbegin[1],$mend[1]]}
> fi
>
> The example prints `string with a', and the user has a "homework" of
> untangling a few points:
> - why it isn't "string with a message" (it's because the final ' '*
> part that requires a space after the final word of the (*) part),
> - why the answer isn't "message" (the same as above plus the fact that
> there's no * before (a|an) and the greediness).
>
> If not the homework-attitude of the examples in the man page, the
> example would have been
>
> if [[ "a string with a message" = (#b)a' '(*) ]]; then
>
> and would give the answer "string with a message". This would have
> been the obvious-demonstration attitude that I've referred to.

You can't actually get rid of the variable $foo; it's needed for the
«print» call on the next line.  Otherwise, I agree.  I'll go ahead and make
the change, and also change the spaces to underscores.  Thanks for pointing
this out.
Do you know any other examples that have room for improvement?

> > - … the use of advanced pattern matching features needlessly raises the
> >   learning curve.
>
> I can add the mention that the example needs EXTENDED_GLOB. Overall I
> think that the example:
> - is nice because it shows how to make the (S)...%% substitution
> behave as the intuition would suggest,

Let's not lose sight of the wood for the trees.  The purpose of the
documentation is first and foremost to describe what a feature _does_, be
it intuitive or not.  Describing how to coerce it into doing other things
is secondary.

Your (revised) patch puts the cart before the horse: it describes your
"trick" before describing what ${(S)%%} actually does.  Please change that.
If you then want to recommend left-anchoring the pattern in order to force
a match that starts farther from the end to be used, that would be fine.
And if the left-anchoring example requires capturing groups, so be it --
but you could probably give an example that doesn't.  (passwd(5) lines come
to mind.)

I wonder if there's anything else the documentation could recommend.  Your
trick boils down to using captured negated character classes as a poor
man's negative lookbehind assertion, but we have the zsh/pcre module which
supports real lookaround assertions (as well as resetting the start of the
match, \K), so perhaps that could be used?  Or perhaps there's a way to get
the "intuitive" behaviour by reversing the string, using ${(S)##}, and
reversing it again.

> - it's the only place in the documentation that uses the (#b) flag
> with #/% substitution, showing that it's possible to use it in that
> place,

We can add a separate example for that under (#b), which is the more
advanced of these topics, and subject the explanation of (S) to KISS.

> - it isn't that complex for someone that knows #b flag and the $match parameter.

The documentation is aimed at everyone, including people who don't already
know (#b).
> > > It would have been tt(abcXXXdefXXghif)
> > > +if not the tt([^X]) part, as despite the tt(%%) specifies a greedy
> > > +match, the substring matching works by trying matches from right to
> > > +left and stops at a first valid match.
> >
> > There are some grammatical errors here (e.g., s/(?<=specif)ies/ying/), but
> > let's not worry about them until the rest of the patch isn't a moving target.
>
> I think that grammar is correct here. Did you maybe misread the sentence?

No, I didn't.  I was taught that "despite" should always be followed by a
noun phrase, never by a sentence.

Cheers,

Daniel