Questions about completion matchers

zsh-users

* Questions about completion matchers
@ 2021-09-21  9:23 Marlon Richert
  2021-09-22 23:25 ` Bart Schaefer
  2021-09-26 13:09 ` Oliver Kiddle
  0 siblings, 2 replies; 11+ messages in thread
From: Marlon Richert @ 2021-09-21  9:23 UTC (permalink / raw)
  To: Zsh Users

How can I make a matcher that completes the right-most part (and only
the right-most part) of each subword? That is, given a target
completion 'abcDefGhi', how do I make a match specification that
completes inputs

* a
* aD
* abD
* aDG
* aDe
* aDeG

to this target, but not inputs

* D
* aG
* acD
* DG
* aDf
* aDeGi

?

Additionally, the following are unclear to me from the manual:
* What is the exact difference between l:lanchor||ranchor=tpat and
r:lanchor||ranchor=tpat ?
* Why do the examples in the manual add r:|=* to the end of each
matcher? This appears to make no difference at all.
* It appears that the order of "match descriptions" in a matcher
matters, but it is unclear to me in what way and it isn't mentioned in
the manual. For example, the pairs of matchers below differ only in
the order of their match descriptions, yet each produces a different
behavior. How are the match descriptions inside a matcher evaluated
and what causes the difference between these?
  * 'r:|[[:punct:]]=** l:?|=[[:punct:]]' completes 'cd a/b' to 'cd
a/bc', but 'l:?|=[[:punct:]] r:|[[:punct:]]=**' does not.
  * Given two target completions 'a-b' and 'a_b', both 'l:?|=[-_]
m:{-}={_}' and 'm:{-}={_} l:?|=[-_]' will insert 'a-b' as the
unambiguous substring on the first try, but on the second try, only
the former will then list both completions, whereas the latter will
complete only 'a-b'.


* Re: Questions about completion matchers
  2021-09-21  9:23 Questions about completion matchers Marlon Richert
@ 2021-09-22 23:25 ` Bart Schaefer
  2021-09-26 13:09 ` Oliver Kiddle
  1 sibling, 0 replies; 11+ messages in thread
From: Bart Schaefer @ 2021-09-22 23:25 UTC (permalink / raw)
  To: Marlon Richert; +Cc: Zsh Users

On Tue, Sep 21, 2021 at 2:23 AM Marlon Richert <marlon.richert@gmail.com> wrote:
>
> How can I make a matcher that completes the right-most part (and only
> the right-most part) of each subword?

I would not try to do this with a matcher specification ... someone
else (Oliver?) may be able to give a more accurate answer, but I don't
think matchers are very good at splitting up words unless there is an
anchor character ("." or "-" for example) to subdivide the words.  I
know there's an example that purports to handle a similar situation,
but the more you want to constrain it ("only the right-most part") the
uglier it gets.

Instead I'd probably write a completer function that creates a
modified words array using match-words-by-style, then compset the
appropriate prefix and suffix.  But I haven't gone very far down that
road.

> * What is the exact difference between l:lanchor||ranchor=tpat and
> r:lanchor||ranchor=tpat ?

Again I'm not the ultimate expert here, but "lanchor" always has to
appear on the command line and with "l:" it has to appear to the left
of the matched substring (but not inside it) and with "r:" it has to
appear to the right of the matched substring (but again not inside
it).  In both cases ranchor has to appear in the potential completion
result (the "trial completion") but might bound a range on the command
line if it does match there.

In practice I've nearly always seen these to be empty strings.

> * Why do the examples in the manual add r:|=* to the end of each
> matcher? This appears to make no difference at all.

All of my real-life uses are "r:|=**" ... I don't know the answer to this one.

> * How are the match descriptions inside a matcher evaluated
> and what causes the difference between these?

I believe they're applied left to right and each one constrains the
possibilities seen by the next, based on what's already on the command
line when you invoke completion.


* Re: Questions about completion matchers
  2021-09-21  9:23 Questions about completion matchers Marlon Richert
  2021-09-22 23:25 ` Bart Schaefer
@ 2021-09-26 13:09 ` Oliver Kiddle
  2021-10-08 22:38   ` Marlon Richert
  1 sibling, 1 reply; 11+ messages in thread
From: Oliver Kiddle @ 2021-09-26 13:09 UTC (permalink / raw)
  To: Marlon Richert; +Cc: Zsh Users

Marlon Richert wrote:
> How can I make a matcher that completes the right-most part (and only
> the right-most part) of each subword? That is, given a target
> completion 'abcDefGhi', how do I make a match specification that
> completes inputs

If you're trying to do camel-case matching, one option is:
  'r:|[A-Z]=* r:|=*'

The following was used by the original creator of matching control; it
works and breaks for the same cases as above in your example:
  'r:[^ A-Z0-9]||[ A-Z0-9]=* r:|=*'

These allow extra characters at the beginning. So in your example, D
and DG match the target. There are also oddities with consecutive runs
of upper case characters. Consider completion after ssh -o, where
"TCPKeepAlive" is one of the options: TKA won't match but ideally
would.
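
A rough way to check which inputs the first spec accepts is to read it as a
regular expression, in the way described later in this message. The Python
sketch below is only a model, not what zsh implements: camel_regex and matches
are hypothetical helper names, and it assumes that * stops at any character
the anchor class [A-Z] would match.

```python
import re

def camel_regex(word):
    """Model 'r:|[A-Z]=* r:|=*' as a regex built from the typed word."""
    parts = []
    for ch in word:
        if "A" <= ch <= "Z":
            # r:|[A-Z]=* : any run of non-anchor characters may precede the anchor
            parts.append("[^A-Z]*" + re.escape(ch))
        else:
            parts.append(re.escape(ch))
    # r:|=* with the cursor at the end of the word: anything may follow
    parts.append(".*")
    return re.compile("".join(parts))

def matches(word, candidate):
    """True if the modelled pattern for `word` accepts the whole candidate."""
    return camel_regex(word).fullmatch(candidate) is not None
```

Under these assumptions the model accepts a, aD, abD, aDG, aDe and aDeG for
abcDefGhi, also accepts D and DG, and rejects TKA for TCPKeepAlive.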

With matching control, it is often easiest if you view it as converting
what is on the command-line into a regular expression. I haven't probed
the source code to get a precise view of how these are mapped. For my
own purposes, I keep a list but don't trust it in all cases because I've
found contradictory examples and tweaked it more than once, perhaps
making it less accurate in the process. So with the caveat that this
may contain errors, my current list is as follows:

Note that the starting point is:
  [cursor position] → .*
Then:
  'm:a=b'      – a   → b          (* doesn't work on rhs)
  'r:|b=*'     – b   → [^b]*b
  'r:a|b=*'    – ab  → [^b]*a?b
  'r:a|b=c'    – ab  → cb
  'l:a|=*'     – a   → [^a]*a
  'l:a|b=*'    – ab  → [^a]*ab?
  'l:a|b=c'    – ab  → ac
  'b:a=*'      – ^a  → .*
  'b:a=c'      – ^a  → ^c
  'e:a=*'      – a$  → .*
  'r:a||b=*'   – b   → [^a]*ab    (only * works on rhs, empty a or b has no use)
  'l:a||b=*'   – ^a  → a.*        (only * on rhs, empty a no use, b ignored?!)

Something like [A-Z] becomes its concrete form from the command line in the regex.
For correspondence classes, the corresponding form goes in the regex; they only work with the m:/M: forms.
** is like * but with .* instead of [^x]*.
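
To make the difference between * and ** concrete, here is a small Python
sketch of the 'r:|.=*' and 'r:|.=**' rows for a dot anchor. It is a model of
the list above and nothing more; dot_anchor_regex is a hypothetical helper, and
the trailing .* stands for the cursor sitting at the end of the word.

```python
import re

def dot_anchor_regex(word, tpat):
    """Model 'r:|.=*' (tpat="*") or 'r:|.=**' (tpat="**") for a typed word."""
    # * stops at the next dot, ** may run across dots
    before_dot = "[^.]*" if tpat == "*" else ".*"
    parts = []
    for ch in word:
        if ch == ".":
            parts.append(before_dot + r"\.")
        else:
            parts.append(re.escape(ch))
    # cursor at the end of the word: anything may follow
    return re.compile("".join(parts) + ".*")

def accepts(word, tpat, candidate):
    return dot_anchor_regex(word, tpat).fullmatch(candidate) is not None
```

In this model c.s.u reaches comp.sources.unix with either form, while c.u
reaches it only with **, because a single * cannot skip over the middle dot.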

In all cases, the original unchanged form also passes - a matching
control does not have to be used. I've excluded those in the regular
expressions above. But including them note the following potentially
useful effects with an empty lpat:

  'r:|b=c'	– b	→ c?b
  'l:a|=c'      – a	→ ac?

When composing multiple matching controls, it doesn't try to apply over
the results of the previous. You can consider it an alternation of the
effect of each matching control.

So 'r:a|b=* l:a|b=*' would be: ab → (ab|[^b]*a?b|[^a]*ab?)
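
The same composition can be written out as a short Python sketch. The three
readings are copied from the list above; alternation is a hypothetical helper
that joins them with | instead of feeding one into the next.

```python
import re

# Readings of the typed text "ab", taken from the list above.
UNCHANGED = "ab"        # no matching control used
R_SPEC = "[^b]*a?b"     # 'r:a|b=*'
L_SPEC = "[^a]*ab?"     # 'l:a|b=*'

def alternation(*readings):
    """Combine readings as alternatives rather than chaining them."""
    return re.compile("(?:" + "|".join(readings) + ")")

combined = alternation(UNCHANGED, R_SPEC, L_SPEC)
```

A candidate is accepted as soon as any one of the readings accepts it.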

For the most part there are certain common forms and if you stick to
those, you find fewer bugs than when being creative.

The || forms seem buggy to me. From the documentation, my assumption
would be that one means a[^a]*b and the other a[^b]*b
That could be more helpful for camel-case but I would need to generate
tests to say for sure.
b seems to even be ignored for the l form.

> Additionally, the following are unclear to me from the manual:
> * What is the exact difference between l:lanchor||ranchor=tpat and
> r:lanchor||ranchor=tpat ?

From the documentation and assuming some actual symmetry I would assume
the difference to be that lanchor needs to match the completion
candidate but not the command-line, while a tpat of * will not match
ranchor – swap l and r anchors for l and r forms in the description.
If that's what it did do, it might possibly bring us closer to a good
solution for camel-case matching.

But as the regex above indicates, that isn't the case. I don't really
see the logic of the l:lanchor||ranchor=tpat seeming to be anchored to
the beginning. I think those forms came about as an attempt to get
camel-case to work.

> * Why do the examples in the manual add r:|=* to the end of each
> matcher? This appears to make no difference at all.

For the case where the cursor is in the middle rather than the end. For
the example from the manual with Usenet group names like
comp.sources.unix, try c.s.u with the cursor after the s.

There are three components. Two have a dot anchor at the end. The final
has an end-of-string anchor.
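
Under the regex reading used earlier in this message, the effect can be
sketched in Python. This is a model and has not been checked against zsh: it
assumes the cursor contributes .* at its own position, that the end of the
word is otherwise anchored, and that 'r:|=*' relaxes that end anchor.
usenet_regex is a hypothetical helper.

```python
import re

def usenet_regex(before_cursor, after_cursor, with_trailing_spec):
    """Model 'r:|.=*' with or without the trailing 'r:|=*'."""
    def part(text):
        # r:|.=* : each typed dot may be preceded by any run of non-dots
        return "".join(r"[^.]*\." if ch == "." else re.escape(ch) for ch in text)
    # r:|=* : anything may sit between the last typed character and the end
    tail = ".*" if with_trailing_spec else ""
    return re.compile(part(before_cursor) + ".*" + part(after_cursor) + tail)

def accepts(before_cursor, after_cursor, with_trailing_spec, candidate):
    pattern = usenet_regex(before_cursor, after_cursor, with_trailing_spec)
    return pattern.fullmatch(candidate) is not None
```

With the cursor at the end the two variants behave the same in this model,
which would explain why the extra spec looks like it makes no difference.
With the cursor after the s, only the variant with 'r:|=*' still reaches
comp.sources.unix.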

> * It appears that the order of "match descriptions" in a matcher
> matters, but it is unclear to me in what way and it isn't mentioned in
> the manual. For example, the pairs of matchers below differ only in
> the order of their match descriptions, yet each produces a different
> behavior. How are the match descriptions inside a matcher evaluated
> and what causes the difference between these?

Order shouldn't really matter (apart from the x: matcher).

As I mentioned earlier, you can consider it as being the alternation of all
of them - at every point in the command-line where one of them can do
something. So a single match may rely on more than one matching control
to be matched. I can imagine that order might matter where you have mixed
up anchors. An example would be interesting.

>   * 'r:|[[:punct:]]=** l:?|=[[:punct:]]' completes 'cd a/b' to 'cd
> a/bc', but 'l:?|=[[:punct:]] r:|[[:punct:]]=**' does not.

In my testing, neither does. Where is the cursor? You can think of the
matching as adding .* at the cursor position so a/b completes to a/bc
with no matching control if the cursor is at the end. The lack of other
candidate completions can also confuse testing of this because with
prefix completion, a/bc can be the only unambiguous match. Are you sure
you don't have other customisations that are allowing the first case to
match?

The l: pattern allows punctuation after any character so a/b becomes the
pattern a(|[[:punct:]])/(|[[:punct:]])b(|[[:punct:]])

The r: pattern allows anything before the punctuation so a/b becomes the
pattern a*/b

>   * Given two target completions 'a-b' and 'a_b', both 'l:?|=[-_]
> m:{-}={_}' and 'm:{-}={_} l:?|=[-_]' will insert 'a-b' as the
> unambiguous substring on the first try, but on the second try, only
> the former will then list both completions, whereas the latter will
> complete only 'a-b'.

I'm not sure I follow what you mean by the first and second try. If you
mean a second press of <tab>, matching is done completely anew with the
new command-line contents.

With just compadd -M 'l:?|=[_-]' - a-b a_b
ab<tab> offers both candidates as matches.
Adding 'm:-=_' just means that completion after a-b will also match
a_b.
Single element correspondence classes are pointless, by the way.
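
The two specs can be modelled in Python along the same lines as the regular
expressions earlier in this message. This is a sketch, not zsh's matching
code; hyphen_regex is a hypothetical helper and the cursor is assumed to be at
the end of the word.

```python
import re

def hyphen_regex(word, with_m=False):
    """Model 'l:?|=[_-]', optionally combined with 'm:-=_'."""
    parts = []
    for ch in word:
        # m:-=_ : a typed '-' may also stand for '_' in the candidate
        atom = "[-_]" if (with_m and ch == "-") else re.escape(ch)
        # l:?|=[_-] : one '-' or '_' may be inserted after any typed character
        parts.append(atom + "[-_]?")
    # cursor at the end of the word: anything may follow
    return re.compile("".join(parts) + ".*")

def accepts(word, candidate, with_m=False):
    return hyphen_regex(word, with_m).fullmatch(candidate) is not None
```

In this model ab accepts both a-b and a_b with the l: spec alone, while a-b
accepts a_b only once the m: spec is added.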

Especially with the uppercase forms (L: etc) it is easy to create
situations where an unambiguous substring is inserted and the set of
candidate matches is quite different with the new command-line contents.
The effect can be somewhat jarring and has the appearance of a bug.

Oliver


* Re: Questions about completion matchers
  2021-09-26 13:09 ` Oliver Kiddle
@ 2021-10-08 22:38   ` Marlon Richert
  2021-10-09 16:23     ` Bart Schaefer
  2021-10-09 21:59     ` Oliver Kiddle
  0 siblings, 2 replies; 11+ messages in thread
From: Marlon Richert @ 2021-10-08 22:38 UTC (permalink / raw)
  To: Oliver Kiddle; +Cc: Zsh Users


Thanks, Oliver, for your long and thoughtful response. I'm afraid I don't
quite understand all of it, though. Let me try to explain how I've
understood things, but in a way that I find easier to process, and do
please correct me where I'm wrong.

The way I've understood it is that, if $word contains the command line
string for which completion is attempted, then each matcher should
transform $word as follows:

*              m:$lpat=$tpat -> ${word//$~lpat/$~tpat}

*              b:$lpat=$tpat -> ${word/#$~lpat/$~tpat}
*             l:|$lpat=$tpat -> ${word/#$~lpat/$~tpat}
*         l:||$ranchor=$tpat -> ${word/#(#b)($~ranchor)/$~tpat$match[1]}
*     l:$lanchor|$lpat=$tpat -> ${word//(#b)($~lanchor)$~lpat/$match[1]$~tpat}
* l:$lanchor||$ranchor=$tpat -> ${word//(#b)($~lanchor)($~ranchor)/$match[1]$~tpat$match[2]}

*              e:$lpat=$tpat -> ${word/%$~lpat/$~tpat}
*             r:$lpat|=$tpat -> ${word/%$~lpat/$~tpat}
*         r:$lanchor||=$tpat -> ${word/%(#b)($~lanchor)/$match[1]$~tpat}
*     r:$lpat|$ranchor=$tpat -> ${word//(#b)$~lpat($~ranchor)/$~tpat$match[1]}
* r:$lanchor||$ranchor=$tpat -> ${word//(#b)($~lanchor)($~ranchor)/$match[1]$~tpat$match[2]}

However, this leaves several transformations identical, which makes me
believe I've misunderstood something.

What did I miss?




* Re: Questions about completion matchers
  2021-10-08 22:38   ` Marlon Richert
@ 2021-10-09 16:23     ` Bart Schaefer
  2021-10-09 22:12       ` Marlon Richert
  2021-10-09 21:59     ` Oliver Kiddle
  1 sibling, 1 reply; 11+ messages in thread
From: Bart Schaefer @ 2021-10-09 16:23 UTC (permalink / raw)
  To: Marlon Richert; +Cc: Oliver Kiddle, Zsh Users

On Fri, Oct 8, 2021 at 3:39 PM Marlon Richert <marlon.richert@gmail.com> wrote:
>
> The way I've understood it, is that, if $word contains the command line string for which completion is attempted, then each matcher should transform $word as follows:
>
> What did I miss?

I think what you've missed is that there are two things being
examined:  The word on the command line, and the "trial completion",
that is, the word passed to compadd that might replace the one on the
command line.  It's not merely (choosing the first of your seeming
duplications)

> *              b:$lpat=$tpat -> ${word/#$~lpat/$~tpat}
> *             l:|$lpat=$tpat -> ${word/#$~lpat/$~tpat}

Rather it's

*              b:$lpat=$tpat -> [[ $trial = ${~lpat}* ]] && ${word/#$~lpat/$~tpat}
*             l:|$lpat=$tpat -> [[ $trial = *${~lpat}* ]] && ${word/#$~lpat/$~tpat}

In the cases with r: and R:, $ranchor is only compared to $trial, it
is not used when replacing into $word.



* Re: Questions about completion matchers
  2021-10-08 22:38   ` Marlon Richert
  2021-10-09 16:23     ` Bart Schaefer
@ 2021-10-09 21:59     ` Oliver Kiddle
  2021-10-10 12:05       ` Marlon Richert
  1 sibling, 1 reply; 11+ messages in thread
From: Oliver Kiddle @ 2021-10-09 21:59 UTC (permalink / raw)
  To: Marlon Richert; +Cc: Zsh Users

Marlon Richert wrote:
> Thanks, Oliver, for your long and thoughtful response. I'm afraid I don't quite
> understand all of it, though. Let me try to explain how I've understood things,
> but in a way that I find easier to process, and do please correct me where I'm
> wrong.
>
> The way I've understood it, is that, if $word contains the command line string
> for which completion is attempted, then each matcher should transform $word as
> follows:

That's not what the implementation does in any real sense so I'm not
sure how helpful it is to reframe the regular expressions I gave in zsh
syntax. But the effect is along those basic lines if you view the
"transformed" $word as being a pattern that is matched against each of
the candidate matches in turn to decide which to present as matches.

I find it helpful as a brief reference but if it doesn't make sense to
you, ignore it.

> However, this leaves several transformations identical, which makes me believe
> I've misunderstood something.
>
> What did I miss?

The difference between b: and l: with an empty anchor (or e/r) is not
encapsulated by my regular expressions. They only differ in how strict
the anchoring to the start of the match is where another matching
control allowed extra characters to be inserted at the beginning.

The example given when this was added was zsh option completion where
underscores are ignored and a prefix of NO is allowed.

I took a look at the source code and dug out original -workers posts and
it does seem that the intention for the two anchor || forms was as I
thought. Even as designed I don't think either is ideal for camel case -
the l: form excludes characters from the wrong anchor for that.
The matching code looks a lot like regular expression matching with a
back tracking algorithm.

Oliver


* Re: Questions about completion matchers
  2021-10-09 16:23     ` Bart Schaefer
@ 2021-10-09 22:12       ` Marlon Richert
  2021-10-09 22:39         ` Bart Schaefer
  0 siblings, 1 reply; 11+ messages in thread
From: Marlon Richert @ 2021-10-09 22:12 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Oliver Kiddle, Zsh Users

On Sat, Oct 9, 2021 at 7:23 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> I think what you've missed is that there are two things being
> examined:  The word on the command line, and the "trial completion",
> that is, the word passed to compadd that might replace the one on the
> command line.  It's not merely (choosing the first of your seeming
> duplications)
>
> > *              b:$lpat=$tpat -> ${word/#$~lpat/$~tpat}
> > *             l:|$lpat=$tpat -> ${word/#$~lpat/$~tpat}
>
> Rather it's
>
> *              b:$lpat=$tpat -> [[ $trial = ${~lpat}* ]] && ${word/#$~lpat/$~tpat}
> *             l:|$lpat=$tpat -> [[ $trial = *${~lpat}* ]] && ${word/#$~lpat/$~tpat}

Perhaps I'm mistaken, but aren't you mixing up $lpat and $lanchor
here? In the docs, it says:

> Matching for lpat and tpat is as for m and M, but the pattern lpat matched on the command line must be preceded by the pattern lanchor. The lanchor can be blank to anchor the match to the start of the command line string; otherwise the anchor can occur anywhere, but must match in both the command line and trial completion strings.

Above, $lanchor is blank and thus needs to match only the start of the
command line string, whereas anything to the right of $lanchor that
matches $~lpat is simply replaced with $~tpat, just as in
m:$lpat=$tpat. Are the docs wrong or am I understanding them wrong?

Thanks, though, for this example, because it does help me understand
what the documentation means with:

> If no lpat is given but a ranchor is, this matches the gap between substrings matched by lanchor and ranchor. Unlike lanchor, the ranchor only needs to match the trial completion string.

Before, it was unclear to me how I should interpret "only needs to
match the trial completion string", but now, I suppose it would be
like this:

l:$lanchor||$ranchor=$tpat -> [[ $trial == *$~lanchor$~tpat$~ranchor* ]] && ${word//(#m)($~lanchor)/$MATCH$~tpat}

However, isn't that equivalent to the following?

l:$lanchor||$ranchor=$tpat -> ${word//(#m)($~lanchor)/$MATCH$~tpat$~ranchor}

Correct me if I'm wrong, but this seems to match the exact same trial
strings. But then, the above transformation would be equivalent to
this one:

l:$lanchor|=$tpat$ranchor -> ${word//(#b)($~lanchor)/$match[1]$~tpat$~ranchor}

Again, something doesn't seem right here.

On Sat, Oct 9, 2021 at 7:23 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
> In the cases with r: and R:, $ranchor is only compared to $trial, it
> is not used when replacing into $word.

Are you sure? The phrase "the ranchor only needs to match the trial
completion string" is listed in the docs for l and L. Conversely, in
the docs for r and R, it says "As l, L, b and B, with the difference
that the command line and trial completion patterns are anchored on
the right side." This makes me believe that, in the case of r and R,
it is in fact $lanchor that is compared only to $trial, not $ranchor.
If your interpretation were correct, l:$lanchor||$ranchor=$tpat and
r:$lanchor||$ranchor=$tpat would be completely equivalent.

Perhaps I should consider the _examples_ in the docs to be the truth
and just ignore the ambiguous wording given earlier on in the docs. If
I do that, then Oliver's examples now start to make sense to me and I
can deduce the transformations as follows:

Given 'r:|.=* r:|=*', c.u becomes c(^*.*).u* and c.s.u becomes
c(^*.*).s(^*.*).u*. Ergo:

* r:$lpat|$ranchor=$tpat  ->  ${word//(#b)$~lpat($~ranchor)/($~tpat~*$~ranchor*)$match[1]}
* l:$lanchor|$lpat=$tpat  ->  ${word//(#b)($~lanchor)$~lpat/$match[1]($~tpat~*$~lanchor*)}
* r:$lpat|=$tpat  ->  ${word%$~lpat}$~tpat
* l:|$lpat=$tpat  ->  $~tpat${word#$~lpat}

Given  'r:|.=** r:|=*', c.u becomes c*.u*.
Given 'r:|[[:upper:]0-9]=** r:|=*', H becomes *H* and 2 becomes *2*.
Ergo:

* r:$lpat|$ranchor=**  ->  ${word//(#b)$~lpat($~ranchor)/*$match[1]}
* l:$lanchor|$lpat=**  ->  ${word//(#b)($~lanchor)$~lpat/$match[1]*}

Given 'r:[^[:upper:]0-9]||[[:upper:]0-9]=** r:|=*', H becomes
*[^[:upper:]0-9]H* and 2 becomes *[^[:upper:]0-9]2*. Ergo:

* r:$lanchor||$ranchor=**  ->  ${word//(#m)($~ranchor)/*$~lanchor$MATCH}
* l:$lanchor||$ranchor=**  ->  ${word//(#m)($~lanchor)/$MATCH$~lanchor*}

Given 'B:[nN][oO]= M:_= M:{[:upper:]}={[:lower:]}', both _NO_f and
NONO_f become f. Ergo:

* b:$lpat=$tpat  ->  ${word/#(#b)(|?)($~lpat)##/$match[1]$~tpat}
* e:$lpat=$tpat  ->  ${word/%(#b)($~lpat)##(|?)/$~tpat$match[-1]}

How about changing the docs to just literally state the transformation
that each matcher applies? It would be clearer than the prose it
currently contains, which is ambiguous and open to interpretation.


* Re: Questions about completion matchers
  2021-10-09 22:12       ` Marlon Richert
@ 2021-10-09 22:39         ` Bart Schaefer
  2021-10-10 11:17           ` Marlon Richert
  0 siblings, 1 reply; 11+ messages in thread
From: Bart Schaefer @ 2021-10-09 22:39 UTC (permalink / raw)
  To: Marlon Richert; +Cc: Oliver Kiddle, Zsh Users

On Sat, Oct 9, 2021 at 3:12 PM Marlon Richert <marlon.richert@gmail.com> wrote:
>
> On Sat, Oct 9, 2021 at 7:23 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
> >
> > *             l:|$lpat=$tpat -> [[ $trial = *${~lpat}* ]] && ${word/#$~lpat/$~tpat}
>
> Perhaps I'm mistaken, but aren't you mixing up $lpat and $lanchor
> here?

Well, sort of, yes.  See Oliver's more recent message.  A better
description of what's happening is that the matcher transforms the
word from the command line into a pattern, and then that pattern is
compared to every one of the trial candidates, and then pieces of the
trial candidates are extracted and merged with the word from the
command line to generate the list of possible replacements for that
word.  It's never as simple as a string substitution on the word
itself taken directly from the patterns in the matcher.

> How about changing the docs to just literally state the transformation
> that each matcher applies?

Because it's not a literal transformation.  Matchers don't transform,
they create a comparison between the command line and the compadd
strings and define which parts of the command line can be replaced by
what parts of the compadd strings when that comparison finds a match.


* Re: Questions about completion matchers
  2021-10-09 22:39         ` Bart Schaefer
@ 2021-10-10 11:17           ` Marlon Richert
  0 siblings, 0 replies; 11+ messages in thread
From: Marlon Richert @ 2021-10-10 11:17 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Oliver Kiddle, Zsh Users

On Sun, Oct 10, 2021 at 1:40 AM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> On Sat, Oct 9, 2021 at 3:12 PM Marlon Richert <marlon.richert@gmail.com> wrote:
> >
> > On Sat, Oct 9, 2021 at 7:23 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
> > >
> > > *             l:|$lpat=$tpat -> [[ $trial = *${~lpat}* ]] && ${word/#$~lpat/$~tpat}
> >
> > Perhaps I'm mistaken, but aren't you mixing up $lpat and $lanchor
> > here?
>
> Well, sort of, yes.  See Oliver's more recent message.  A better
> description of what's happening is that the matcher transforms the
> word from the command line into a pattern, and then that pattern is
> compared to every one of the trial candidates, and then pieces of the
> trial candidates are extracted and merged with the word from the
> command line to generate the list of possible replacements for that
> word.  It's never as simple as a string substitution on the word
> itself taken directly from the patterns in the matcher.

Having an explanation like this in the docs would help so much! :)

Or what would help even more is to put it in a step-by-step form. For example:

> 1. Each matcher generates a search pattern by taking the word on the command line (or the pattern produced by the previous matcher) and applying a transformation specific to the matcher. If the matcher has an uppercase letter, it also captures the original substrings of the command line word that it transformed.
> 2. After all matchers are applied, the resulting search pattern is used to find matching completions.
> 3. For each uppercase matcher, substrings captured from the word on the command line are then inserted into the matching completions.

And then each matcher could state exactly how it produces its search
pattern. For example:

> r:lanchor||ranchor=tpat
> l:lanchor||ranchor=tpat
>
> 1. Find each substring in the word on the command line that matches pattern ranchor (for r:) or lanchor (for l:). If this anchor is empty, it matches the end (for r:) or the beginning (for l:) of the word on the command line.
> 2. Insert a pattern to the left (for r:) or right (for l:) of each substring:
>    * If tpat is **, then insert `*lanchor` (for r:) or `ranchor*` (for l:).
>    * Otherwise, insert `(tpat~*lanchor*)lanchor` (for r:) or `ranchor(tpat~*ranchor*)` (for l:).
>
> Example: If the word on the command line is `H2`, then the match spec r:[[:lower:]]||[[:upper:][:digit:]]=** captures the substrings 'H' and '2' and generates the search pattern `*[[:lower:]]H*[[:lower:]]2`.
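
A minimal Python sketch of the rule proposed above for r:lanchor||ranchor=**, written as a regex rather than a glob. It models only the description in this message, not zsh's actual matching code. The function name and the character classes [a-z] and [A-Z0-9] (ASCII stand-ins for [[:lower:]] and [[:upper:][:digit:]]) are assumptions made for illustration.

```python
import re

def r_double_anchor_star(word, lanchor, ranchor):
    """Hypothetical model of the proposed r:lanchor||ranchor=** rule.

    Every character of `word` that matches `ranchor` gets `.*lanchor`
    inserted to its left; all other characters are kept literally.
    """
    parts = []
    for ch in word:
        if re.fullmatch(ranchor, ch):
            parts.append(".*" + lanchor + re.escape(ch))
        else:
            parts.append(re.escape(ch))
    return "".join(parts)

# Regex counterpart of the glob *[[:lower:]]H*[[:lower:]]2
pattern = r_double_anchor_star("H2", "[a-z]", "[A-Z0-9]")

# Under this model, each captured character must be preceded by a
# lowercase letter in the candidate.
matches_aHb2 = re.fullmatch(pattern, "aHb2") is not None
matches_H2 = re.fullmatch(pattern, "H2") is not None
```

Under this model the candidate 'aHb2' is accepted, while 'H2' itself is not, because nothing in it can satisfy the inserted lanchor.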


On Sun, Oct 10, 2021 at 1:40 AM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> On Sat, Oct 9, 2021 at 3:12 PM Marlon Richert <marlon.richert@gmail.com> wrote:
> >
> > How about changing the docs to just literally state the transformation
> > that each matcher applies?
>
> Because it's not a literal transformation.  Matchers don't transform,
> they create a comparison between the command line and the compadd
> strings and define which parts of the command line can be replaced by
> what parts of the compadd strings when that comparison finds a match.

Perhaps "transformation" is not the word I should've used. What I
meant is that each matcher generates a pattern, using as input the
word on the command line or the pattern generated by the previous
matcher. After this has been done in turn for each matcher, the
resulting pattern is then used to find matching completions. Isn't
that correct?



* Re: Questions about completion matchers
  2021-10-09 21:59     ` Oliver Kiddle
@ 2021-10-10 12:05       ` Marlon Richert
  2021-10-10 20:14         ` Marlon Richert
  0 siblings, 1 reply; 11+ messages in thread
From: Marlon Richert @ 2021-10-10 12:05 UTC (permalink / raw)
  To: Oliver Kiddle; +Cc: Zsh Users

On Sun, Oct 10, 2021 at 12:59 AM Oliver Kiddle <opk@zsh.org> wrote:
>
> Marlon Richert wrote:
> > Thanks, Oliver, for your long and thoughtful response. I'm afraid I don't quite
> > understand all of it, though. Let me try to explain how I've understood things,
> > but in a way that I find easier to process, and do please correct me where I'm
> > wrong.
> >
> > The way I've understood it, is that, if $word contains the command line string
> > for which completion is attempted, then each matcher should transform $word as
> > follows:
>
> That's not what the implementation does in any real sense so I'm not
> sure how helpful it is to reframe the regular expressions I gave in zsh
> syntax. But the effect is along those basic lines if you view the
> "transformed" $word as being a pattern that is matched against each of
> the candidate matches in turn to decide which to present as matches.
>
> I find it helpful as a brief reference but if it doesn't make sense to
> you, ignore it.

It didn't make sense at first, because I somehow overlooked that you
were using regex. I read them as glob patterns. :)

But now that I realize that, let me have a second look:

On Sun, Sep 26, 2021 at 4:09 PM Oliver Kiddle <opk@zsh.org> wrote:
>
> With matching control, it is often easiest if you view it as converting
> what is on the command-line into a regular expression. I haven't probed
> the source code to get a precise view of how these are mapped. For my
> own purposes, I keep a list but don't trust it in all cases because I've
> found contradictory examples and tweaked it more than once, perhaps
> making it less accurate in the process. So with the caveat that this
> may contain errors, my current list is as follows:
>
> Note that the starting point is:
>   [cursor position] → .*
> Then:
>   'm:a=b'       – a     → b             (* doesn't work on rhs)
>   'r:|b=*'      – b     → [^b]*b

The appearance of [^a] and [^b] in your patterns was a complete
surprise to me. I would've expected * to work as * in a glob
expression. This is not clear from the docs. Now that I know that the
matcher syntax was based on regex, it makes more sense, but I still
wouldn't have figured this out intuitively. A clearer explanation
about this in the docs would be helpful. Yes, it's mentioned somewhere
in the examples, but it should be explained more clearly earlier on.
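
To make the difference between `[^b]*` and a glob-like `*` concrete, here is a small Python sketch based on the regex view quoted above ('r:|b=*' – b → [^b]*b, with the cursor position becoming .*). It is only a model of that list, not of zsh's implementation. The helper name, the anchor character '.', and the candidate strings are assumptions chosen for illustration.

```python
import re

def to_regex(word, anchor, star):
    """Hypothetical helper: prefix each `anchor` character in `word`
    with the regex `star`, then append .* for the cursor position."""
    parts = []
    for ch in word:
        if ch == anchor:
            parts.append(star + re.escape(ch))
        else:
            parts.append(re.escape(ch))
    return "".join(parts) + ".*"

candidates = ["comp.sources.unix", "comp.sources.misc", "comp.unix.shell"]

# 'r:|.=*' in the quoted list: the inserted part may not cross an anchor.
anchored = re.compile(to_regex("c.u", ".", "[^.]*"))
# What a plain glob-style * would do instead: it may cross anchors.
globlike = re.compile(to_regex("c.u", ".", ".*"))

anchored_hits = [c for c in candidates if anchored.fullmatch(c)]
globlike_hits = [c for c in candidates if globlike.fullmatch(c)]
```

With the anchored form, 'c.u' selects only 'comp.unix.shell'. With the glob-like form it also selects 'comp.sources.unix', because the inserted part is allowed to swallow a whole extra dot-separated component.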

>   'r:a|b=*'     – ab    → [^b]*a?b

This one looks incorrect to me as it does not match the example in the
docs. From that example, it appears to me that it is supposed to work
like this:
 'r:a|b=*'     – b    → [^b]*ab

>   'r:a|b=c'     - ab    → cb
>   'l:a|=*'      – a     → [^a]*a
>   'l:a|b=*'     – ab    → [^a]*ab?

Shouldn't these last two result in a[^a]* and ab[^a]*, respectively,
since the anchor goes to the left?

>   'l:a|b=c'     – ab    → ac
>   'b:a=*'       – ^a    → .*

Oh, but here * does work like a * glob? So, I guess * behaves
differently only when anchors are involved?

>   'b:a=c'       – ^a    → ^c
>   'e:a=*'       – a$    → .*
>   'r:a||b=*'    – b     → [^a]*ab       (only * works on rhs, empty a or b has no use)
>   'l:a||b=*'    – ^a    → a.*           (only * on rhs, empty a no use, b ignored?!)

The comments on the last two items sound like bugs to me. Also,
'l:a||b=*' should work on just 'a' and not require '^a'.

On Sun, Oct 10, 2021 at 12:59 AM Oliver Kiddle <opk@zsh.org> wrote:
>
> The difference between b: and l: with an empty anchor (or e/r) is not
> encapsulated by my regular expressions. They only differ in how strict
> the anchoring to the start of the match is where another matching
> control allowed extra characters to be inserted at the beginning.

So, does that mean then that matchers are not evaluated strictly left-to-right?

> The example given when this was added was zsh option completion where
> underscores are ignored and a prefix of NO is allowed.

About that example, what exactly is the difference between L: and B:
that lets B: complete '_NO_f' to '_NO_foo' and 'NONO_f' to 'NONO_f'
but not L:? It's not clear from the example, let alone from the
description of the matchers.

> I took a look at the source code and dug out original -workers posts and
> it does seem that the intention for the two anchor || forms was as I
> thought. Even as designed I don't think either is ideal for camel case -
> the l: form excludes characters from the wrong anchor for that.
> The matching code looks a lot like regular expression matching with a
> back tracking algorithm.

Y02compmatch.ztst contains a lot of examples that could be added to
the docs to better explain how the different matchers are intended to
be used. It would help to better understand their workings.


* Re: Questions about completion matchers
  2021-10-10 12:05       ` Marlon Richert
@ 2021-10-10 20:14         ` Marlon Richert
  0 siblings, 0 replies; 11+ messages in thread
From: Marlon Richert @ 2021-10-10 20:14 UTC (permalink / raw)
  To: Oliver Kiddle; +Cc: Zsh Users

I have to say, after having processed both of your explanations, it
appears that r:lanchor||ranchor=tpat and l:lanchor||ranchor=tpat are
not working as intended. It intuitively feels like they should cover
this very common case:

If lanchor and ranchor are present and adjacent in the command line
string, then apply m:=tpat to the empty string between them. That is
to say: Enable completion between lanchor and ranchor, just like we
can enable completion to the left or right of an anchor.
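
As a rough illustration of that intended behavior, the following Python sketch checks it against the camel-case inputs from the first message of this thread (target 'abcDefGhi'). It is a regex model of the wished-for semantics only, not of what zsh currently does. The assumptions are: lanchor is any character, ranchor is an ASCII uppercase letter, tpat is a run of lowercase letters, and the function names are made up for this example.

```python
import re

def between_anchors_regex(word):
    """Hypothetical model: allow a run of lowercase letters to be
    completed in front of every uppercase letter that has some
    character to its left; .* stands for the cursor at the end."""
    parts = []
    for i, ch in enumerate(word):
        if i > 0 and ch.isupper():
            parts.append("[a-z]*")
        parts.append(re.escape(ch))
    return "".join(parts) + ".*"

def completes(word, target):
    """True if `word` would complete to `target` under this model."""
    return re.fullmatch(between_anchors_regex(word), target) is not None

target = "abcDefGhi"
should_complete = ["a", "aD", "abD", "aDG", "aDe", "aDeG"]
should_not_complete = ["D", "aG", "acD", "DG", "aDf", "aDeGi"]

accepted = [w for w in should_complete if completes(w, target)]
rejected = [w for w in should_not_complete if not completes(w, target)]
```

In this model all six wanted inputs complete to the target and all six unwanted inputs are rejected, which is the behavior the || forms were hoped to provide.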

In terms of syntax, this treats the void between || as an empty lpat,
just like it is in r:|ranchor= or l:lanchor|=. The || form (and indeed,
the | form) is essentially a conditional version of one of the other
matchers.

This actually extrapolates to a consistent interpretation of the
symbols in the matching syntax:
* lpat is always the substring whose meaning is "transformed": That is
to say, it (and only it) is made to be considered equal to any trial
substring matching tpat. It is permitted for lpat to be equal to the
empty string or the beginning/end of the command line string.
* Each |ranchor or lanchor| adds a constraint: A substring matching
them needs to be directly to the right or left of lpat -- or lpat's
meaning won't be "transformed". The meaning of the anchors themselves
is never "transformed": Any substring matching the anchor on the
command line needs to be matched literally in the trial string.
* For the first anchor in a matcher, the substring matching lpat will
not be considered equivalent to a trial substring that matches the
anchor. This clause is essentially there to prevent the matcher from
becoming too "greedy".
* For the second anchor, there is no such restriction. (Or otherwise,
the matcher could easily become too constrained and unable to match
any trial string at all.)

From this follows the meaning of each matcher:
* m:lpat=tpat - Treat each substring matching lpat on the command line
as being equal to any substring matching tpat in the trial string.
* r:lpat|ranchor=** - The same as m:lpat=*, but only if the substring
matching lpat has directly to its right a substring matching ranchor.
* r:lpat|ranchor=tpat - The same as m:lpat=tpat~ranchor, but only if
the substring matching lpat has directly to its right a substring
matching ranchor.
* r:lanchor||ranchor=tpat - The same as r:|ranchor=tpat, but only if
the substring matching ranchor is immediately preceded by a substring
matching lanchor.

One could even continue this pattern, as || is nothing more than
|lpat| with lpat equal to the empty string:
* r:lanchor|lpat|ranchor=tpat - The same as r:lpat|ranchor=tpat, but
only if the substring matching lpat is immediately preceded by a
substring matching lanchor.

However, in practice, the more constraints a matcher has, the more
likely it is to break consistency with this pattern. As a result, the
|| matchers no longer support the case for which they appear to have
been intended, namely completing the missing substring between lanchor
and ranchor. Unfortunately, that is now a missing feature.

I would hope the implementation of the || matchers could be modified
to restore this feature -- which I assume must (or was intended to)
have been there at some point.


