Thanks, Oliver, for your long and thoughtful response. I'm afraid I don't
quite understand all of it, though. Let me restate how I've understood
things in a form that I find easier to process; please correct me where
I'm wrong.

As I understand it, if $word contains the command-line string for which
completion is attempted, then each matcher transforms $word as follows:

*              m:$lpat=$tpat -> ${word//$~lpat/$~tpat}

*              b:$lpat=$tpat -> ${word/#$~lpat/$~tpat}
*             l:|$lpat=$tpat -> ${word/#$~lpat/$~tpat}
*         l:||$ranchor=$tpat -> ${word/#(#b)($~ranchor)/$~tpat$match[1]}
*     l:$lanchor|$lpat=$tpat -> ${word//(#b)($~lanchor)$~lpat/$match[1]$~tpat}
* l:$lanchor||$ranchor=$tpat -> ${word//(#b)($~lanchor)($~ranchor)/$match[1]$~tpat$match[2]}

*              e:$lpat=$tpat -> ${word/%$~lpat/$~tpat}
*             r:$lpat|=$tpat -> ${word/%$~lpat/$~tpat}
*         r:$lanchor||=$tpat -> ${word/%(#b)($~lanchor)/$match[1]$~tpat}
*     r:$lpat|$ranchor=$tpat -> ${word//(#b)$~lpat($~ranchor)/$~tpat$match[1]}
* r:$lanchor||$ranchor=$tpat -> ${word//(#b)($~lanchor)($~ranchor)/$match[1]$~tpat$match[2]}

However, this leaves several transformations identical (b: and l:|$lpat,
e: and r:$lpat|, and the two $lanchor||$ranchor forms), which makes me
believe I've misunderstood something.
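
To make the overlap concrete, here is a minimal sketch with hypothetical
values (word=abcab, lpat=ab, tpat=X). It writes the patterns literally
instead of using $~lpat so that it runs in both zsh and bash, and it only
illustrates my reading above, not what the completion code actually does:

```shell
# Hypothetical values, chosen only for illustration: lpat=ab, tpat=X.
word='abcab'

m_form=${word//ab/X}    # m:ab=X   -> replace every occurrence
b_form=${word/#ab/X}    # b:ab=X   -> replace at the start only
l_form=${word/#ab/X}    # l:|ab=X  -> same expansion as b: in my list
e_form=${word/%ab/X}    # e:ab=X   -> replace at the end only
r_form=${word/%ab/X}    # r:ab|=X  -> same expansion as e: in my list

printf '%s\n' "m: $m_form" "b: $b_form" "l: $l_form" "e: $e_form" "r: $r_form"
```

Under this reading, b: and l:| cannot be told apart, and neither can e: and
r:...|, which is the part that looks wrong to me.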

What did I miss?


On Sun, Sep 26, 2021 at 4:09 PM Oliver Kiddle <opk@zsh.org> wrote:
>
> Marlon Richert wrote:
> > How can I make a matcher that completes the right-most part (and only
> > the right-most part) of each subword? That is, given a target
> > completion 'abcDefGhi', how do I make a match specification that
> > completes inputs
>
> If you're trying to do camel-case matching, one option is:
>   'r:|[A-Z]=* r:|=*'
>
> The following was used by the original creator of matching control; it
> works and breaks for the same cases as above in your example:
>   'r:[^ A-Z0-9]||[ A-Z0-9]=* r:|=*'
>
> These allow extra characters at the beginning. So in your example, D
> and DG match the target. There are also oddities with consecutive runs
> of upper-case characters. Consider, e.g., completion after ssh -o, where
> "TCPKeepAlive" is an option: TKA won't match but ideally would.
>
> With matching control, it is often easiest if you view it as converting
> what is on the command-line into a regular expression. I haven't probed
> the source code to get a precise view of how these are mapped. For my
> own purposes, I keep a list but don't trust it in all cases because I've
> found contradictory examples and tweaked it more than once, perhaps
> making it less accurate in the process. So with the caveat that this
> may contain errors, my current list is as follows:
>
> Note that the starting point is:
>   [cursor position] → .*
> Then:
>   'm:a=b'       – a     → b             (* doesn't work on rhs)
>   'r:|b=*'      – b     → [^b]*b
>   'r:a|b=*'     – ab    → [^b]*a?b
>   'r:a|b=c'     – ab    → cb
>   'l:a|=*'      – a     → [^a]*a
>   'l:a|b=*'     – ab    → [^a]*ab?
>   'l:a|b=c'     – ab    → ac
>   'b:a=*'       – ^a    → .*
>   'b:a=c'       – ^a    → ^c
>   'e:a=*'       – a$    → .*
>   'r:a||b=*'    – b     → [^a]*ab       (only * works on rhs, empty a or
>                                          b has no use)
>   'l:a||b=*'    – ^a    → a.*           (only * on rhs, empty a no use, b
>                                          ignored?!)
>
> Something like [A-Z] becomes its concrete form from the command-line in
> the regex.
> For correspondence classes, the corresponding form goes in the regex;
> they only work with the m:/M: forms.
> ** is like * but with .* instead of [^x]*
>
> In all cases, the original unchanged form also passes - a matching
> control does not have to be used. I've excluded those in the regular
> expressions above. But including them note the following potentially
> useful effects with an empty lpat:
>
>   'r:|b=c'      – b     → c?b
>   'l:a|=c'      – a     → ac?
>
> When composing multiple matching controls, it doesn't try to apply over
> the results of the previous. You can consider it an alternation of the
> effect of each matching control.
>
> So 'r:a|b=* l:a|b=*' would be: ab → (ab|[^b]*a?b|[^a]*ab?)
>
> For the most part there are certain common forms and if you stick to
> those, you find fewer bugs than when being creative.
>
> The || forms seem buggy to me. From the documentation, my assumption
> would be that one means a[^a]*b and the other a[^b]*b
> That could be more helpful for camel-case but I would need to generate
> tests to say for sure.
> b seems to even be ignored for the l form.
>
> > Additionally, the following are unclear to me from the manual:
> > * What is the exact difference between l:lanchor||ranchor=tpat and
> > r:lanchor||ranchor=tpat ?
>
> From the documentation and assuming some actual symmetry I would assume
> the difference to be that lanchor needs to match the completion
> candidate but not the command-line, while a tpat of * will not match
> ranchor – swap l and r anchors for l and r forms in the description.
> If that's what it did do, it might possibly bring us closer to a good
> solution for camel-case matching.
>
> But as the regex above indicates, that isn't the case. I don't really
> see the logic of the l:lanchor||ranchor=tpat seeming to be anchored to
> the beginning. I think those forms came about as an attempt to get
> camel-case to work.
>
> > * Why do the examples in the manual add r:|=* to the end of each
> > matcher? This appears to make no difference at all.
>
> For the case where the cursor is in the middle rather than the end. For
> the example from the manual with Usenet group names like
> comp.sources.unix, try c.s.u with the cursor after the s.
>
> There are three components. Two have a dot anchor at the end. The final
> has an end-of-string anchor.
>
> > * It appears that the order of "match descriptions" in a matcher
> > matters, but it is unclear to me in what way and it isn't mentioned in
> > the manual. For example, the pairs of matchers below differ only in
> > the order of their match descriptions, yet each produces a different
> > behavior. How are the match descriptions inside a matcher evaluated
> > and what causes the difference between these?
>
> Order shouldn't really matter (apart from the x: matcher).
>
> As I mentioned earlier, you can consider it as being the alternation of all
> of them - at every point in the command-line where one of them can do
> something. So a single match may rely on more than one matching control
> to be matched. I can imagine that order might matter where you have mixed
> up anchors. An example would be interesting.
>
> >   * 'r:|[[:punct:]]=** l:?|=[[:punct:]]' completes 'cd a/b' to 'cd
> > a/bc', but 'l:?|=[[:punct:]] r:|[[:punct:]]=**' does not.
>
> In my testing, neither does. Where is the cursor? You can think of the
> matching as adding .* at the cursor position so a/b completes to a/bc
> with no matching control if the cursor is at the end. The lack of other
> candidate completions can also confuse testing of this because with
> prefix completion, a/bc can be the only unambiguous match. Are you sure
> you don't have other customisations that are allowing the first case to
> match?
>
> The l: pattern allows punctuation after any character so a/b becomes the
> pattern a(|[[:punct:]])/(|[[:punct:]])b(|[[:punct:]])
>
> The r: pattern allows anything before the punctuation so a/b becomes the
> pattern a*/b
>
> >   * Given two target completions 'a-b' and 'a_b', both 'l:?|=[-_]
> > m:{-}={_}' and 'm:{-}={_} l:?|=[-_]' will insert 'a-b' as the
> > unambiguous substring on the first try, but on the second try, only
> > the former will then list both completions, whereas the latter will
> > complete only 'a-b'.
>
> I'm not sure I follow what you mean by the first and second try. If you
> mean a second press of <tab>, matching is done completely anew with the
> new command-line contents.
>
> With just compadd -M 'l:?|=[_-]' - a-b a_b
> ab<tab> offers both candidates as matches.
> Adding 'm:-=_' just means that completion after a-b will also match
> a_b.
> Single-element correspondence classes are pointless, by the way.
>
> Especially with the uppercase forms (L: etc) it is easy to create
> situations where an unambiguous substring is inserted and the set of
> candidate matches is quite different with the new command-line contents.
> The effect can be somewhat jarring and has the appearance of a bug.
>
> Oliver