zsh-workers
 help / color / mirror / code / Atom feed
From: Daniel Shahaf <d.s@daniel.shahaf.name>
To: Zsh hackers list <zsh-workers@zsh.org>
Subject: Re: [Bug] S-flag imposes non-greedy match where it shouldn't
Date: Fri, 27 Dec 2019 05:29:23 +0000	[thread overview]
Message-ID: <20191227052923.yal2nnmxdxfgvfkr@tarpaulin.shahaf.local2> (raw)
In-Reply-To: <CAKc7PVDOMz1wvMGWYOki7ca5dHodMJx30eq8UEzgj=D+W7E4aQ@mail.gmail.com>

Sebastian Gniazdowski wrote on Thu, Dec 26, 2019 at 19:35:05 +0100:
> I've attached the extended description.

Thanks; review below.

> It includes a trick to
> work-around the unintuitive behavior of S. It looks as follows:
> 
> http://psprint.blinkenshell.org/S_flag.png

Please just copy-paste terminal transcripts into the email.  (Static
transcripts, as in the documentation.)

> I think that the way the S flag works is a bit of an inconsistency,
> Because ${str%%X##**} would not stop at the first from the right
> match, it would try other matches starting from the right and go on up
> to the final first from the left X. I think that (S) shouldn't change
> this, but on the other hand should ${(S)str%%X##} match the first
> three X? Rather not, as it would resemble ## then... Intuitively,
> however, it should match all the three right X.

Yes, I don't find the following very intuitive:

% set -- aXbXc
% p ${1%%X*}
a
% p ${(S)1%%X*}
aXb
% p ${(S)1%X*}
aXbc
% 

I expected ${(S)%%} to mean: 'Look for the longest match that ends on the last
character; if you don't find any, then look for the longest match that ends on
the penultimate character; etc, until you finally consider whether $str[1] is a
match and whether ${str[1,0]} is a match'.  However, that's clearly not what it
does here, or ${(S)1%%X*} would have printed «a».  Rather, it seems that
${(S)%} and ${(S)%%} mean 'Find the match whose _start_ is closest to the end
of the string; of all matches that start at a particular index, ${(S)%} picks
the shortest and ${(S)%%} the longest.'.

> +++ b/Doc/Zsh/expn.yo
> @@ -1399,6 +1399,20 @@ from the beginning and with tt(%) start from the end of the string.
>  With substitution via tt(${)...tt(/)...tt(}) or
>  tt(${)...tt(//)...tt(}), specifies non-greedy matching, i.e. that the
>  shortest instead of the longest match should be replaced.
> +The substring search means that the pattern is matched skipping the
> +parts of the input string starting from the direction set by the use
> +of tt(#) or tt(%).

I don't understand this sentence.  What does "skipping" mean?

Documentation should be clear and specific enough to allow acceptance
tests to be based on it.

> +For example, to match a pattern starting from the
> +end, one could use:
> +
> +example(str="abcXXXdefXXXghi"
> +out=${(S)str%%(#b)([^X])X##}
> +out=$out${match[1]}
> +)
> +
> +The result is tt(abcXXXdefghi).

That's not correct.  The output is abcXXXdefXXXghi (in 'zsh -f') or
abcXXXdeghif (with extendedglob set), but not abcXXXdefghi.

I doubt this example would clarify the meaning of ${(S)} to people who
encounter it for the first time.  Please use a more minimal example.
Specific issues:

- Assigning to $out a concatenation of two different values muddies the water.
  It forces readers to reverse engineer which parts of the resultant value come
  from ${match[1]} and which from the ${(S)%%}.  This is documentation, not
  a homework problem; the answer should be obvious.  Something like
  «out="${out}+${match[1]}"» would address this — but…

- … the use of advanced pattern matching features needlessly raises the
  learning curve.  For example, the use of «##» doesn't affect the behaviour
  of the example in any meaningful way, but it has two downsides: it means the
  example won't work out of the box when people paste it into their shell, and
  it means people who RTFM about (S) won't be able to understand it until they
  also look up what «##» does [which in turn means they'll have to open
  zshoptions(1) to RTFM about EXTENDED_GLOB].  This mostly applies to the use
  of (#b) and capture groups too: it would be better not to assume knowledge
  of that.

> It would have been tt(abcXXXdefXXghif)
> +if not the tt([^X]) part, as despite the tt(%%) specifies a greedy
> +match, the substring matching works by trying matches from right to
> +left and stops at a first valid match.

There are some grammatical errors here (e.g., s/(?<=specif)ies/ying/), but
let's not worry about them until the rest of the patch isn't a moving target.

Thanks for the patch.  I look forward to a v2.

Daniel

P.S. Obviously, I meant to write «s/specifies/specifying/» — but I wanted to
illustrate the point that no more knowledge of pattern-matching syntax should
be assumed than necessary.  [It was a positive lookbehind assertion Perl
syntax.]

  parent reply	other threads:[~2019-12-27  5:30 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-18 20:41 Sebastian Gniazdowski
2019-12-18 20:44 ` Sebastian Gniazdowski
2019-12-19 15:29   ` Daniel Shahaf
2019-12-26 18:35     ` Sebastian Gniazdowski
2019-12-27  4:54       ` Sebastian Gniazdowski
2019-12-27  5:09         ` Sebastian Gniazdowski
2019-12-27  5:29       ` Daniel Shahaf [this message]
2019-12-28 19:04         ` Sebastian Gniazdowski
2019-12-28 20:34           ` Bart Schaefer
2019-12-28 21:00           ` Daniel Shahaf
2019-12-29  0:56             ` Sebastian Gniazdowski
2019-12-29  2:05               ` Daniel Shahaf
2019-12-29  3:14                 ` Sebastian Gniazdowski
2019-12-30 18:00                   ` [PATCH] zshexpn: Expand documentation of (S) (was: Re: [Bug] S-flag imposes non-greedy match where it shouldn't) Daniel Shahaf
2019-12-30 18:11                     ` Roman Perepelitsa
     [not found]                       ` <CAKc7PVAXLpKqZvmbazZK=mvcz8T-AHJXKusut6aEjkkSLzgdbw@mail.gmail.com>
2019-12-30 20:01                         ` Roman Perepelitsa
2019-12-30 20:20                           ` Sebastian Gniazdowski
2019-12-30 21:24                             ` ${(S)%%*} doesn't match the empty string (was: Re: [PATCH] zshexpn: Expand documentation of (S) (was: Re: [Bug] S-flag imposes non-greedy match where it shouldn't)) Daniel Shahaf
2019-12-30 21:44                               ` Roman Perepelitsa
2019-12-30 22:11                                 ` Sebastian Gniazdowski
2019-12-30 22:34                               ` Peter Stephenson
2019-12-30 21:40                             ` [PATCH] zshexpn: Expand documentation of (S) (was: Re: [Bug] S-flag imposes non-greedy match where it shouldn't) Roman Perepelitsa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191227052923.yal2nnmxdxfgvfkr@tarpaulin.shahaf.local2 \
    --to=d.s@daniel.shahaf.name \
    --cc=zsh-workers@zsh.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).