* Questions about completion matchers
@ 2021-09-21  9:23 Marlon Richert
  2021-09-22 23:25 ` Bart Schaefer
  2021-09-26 13:09 ` Oliver Kiddle
  0 siblings, 2 replies; 11+ messages in thread
From: Marlon Richert @ 2021-09-21  9:23 UTC
  To: Zsh Users

How can I make a matcher that completes the right-most part (and only
the right-most part) of each subword? That is, given a target
completion 'abcDefGhi', how do I make a match specification that
completes inputs
  * a
  * aD
  * abD
  * aDG
  * aDe
  * aDeG
to this target, but not inputs
  * D
  * aG
  * acD
  * DG
  * aDf
  * aDeGi
?

Additionally, the following are unclear to me from the manual:
  * What is the exact difference between l:lanchor||ranchor=tpat and
    r:lanchor||ranchor=tpat ?
  * Why do the examples in the manual add r:|=* to the end of each
    matcher? This appears to make no difference at all.
  * It appears that the order of "match descriptions" in a matcher
    matters, but it is unclear to me in what way and it isn't mentioned in
    the manual. For example, the pairs of matchers below differ only in
    the order of their match descriptions, yet each produces a different
    behavior. How are the match descriptions inside a matcher evaluated
    and what causes the difference between these?
    * 'r:|[[:punct:]]=** l:?|=[[:punct:]]' completes 'cd a/b' to 'cd
      a/bc', but 'l:?|=[[:punct:]] r:|[[:punct:]]=**' does not.
    * Given two target completions 'a-b' and 'a_b', both 'l:?|=[-_]
      m:{-}={_}' and 'm:{-}={_} l:?|=[-_]' will insert 'a-b' as the
      unambiguous substring on the first try, but on the second try, only
      the former will then list both completions, whereas the latter will
      complete only 'a-b'.
* Re: Questions about completion matchers
From: Bart Schaefer @ 2021-09-22 23:25 UTC
  To: Marlon Richert; +Cc: Zsh Users

On Tue, Sep 21, 2021 at 2:23 AM Marlon Richert <marlon.richert@gmail.com> wrote:
>
> How can I make a matcher that completes the right-most part (and only
> the right-most part) of each subword?

I would not try to do this with a matcher specification ... someone
else (Oliver?) may be able to give a more accurate answer, but I don't
think matchers are very good at splitting up words unless there is an
anchor character ("." or "-" for example) to subdivide the words. I
know there's an example that purports to handle a similar situation,
but the more you want to constrain it ("only the right-most part") the
uglier it gets.

Instead I'd probably write a completer function that creates a
modified words array using match-words-by-style, then compset the
appropriate prefix and suffix. But I haven't gone very far down that
road.

> * What is the exact difference between l:lanchor||ranchor=tpat and
> r:lanchor||ranchor=tpat ?

Again I'm not the ultimate expert here, but "lanchor" always has to
appear on the command line and with "l:" it has to appear to the left
of the matched substring (but not inside it) and with "r:" it has to
appear to the right of the matched substring (but again not inside
it). In both cases ranchor has to appear in the potential completion
result (the "trial completion") but might bound a range on the command
line if it does match there. In practice I've nearly always seen
these to be empty strings.

> * Why do the examples in the manual add r:|=* to the end of each
> matcher? This appears to make no difference at all.

All of my real-life uses are "r:|=**" ... I don't know the answer to this one.

> * How are the match descriptions inside a matcher evaluated
> and what causes the difference between these?

I believe they're applied left to right and each one constrains the
possibilities seen by the next, based on what's already on the command
line when you invoke completion.
* Re: Questions about completion matchers
From: Oliver Kiddle @ 2021-09-26 13:09 UTC
  To: Marlon Richert; +Cc: Zsh Users

Marlon Richert wrote:
> How can I make a matcher that completes the right-most part (and only
> the right-most part) of each subword? That is, given a target
> completion 'abcDefGhi', how do I make a match specification that
> completes inputs

If you're trying to do camel-case matching, one option is:
  'r:|[A-Z]=* r:|=*'

The following was used by the original creator of matching control, it
works and breaks for the same cases as above in your example:
  'r:[^ A-Z0-9]||[ A-Z0-9]=* r:|=*'

These allow extra characters at the beginning. So in your example, D
and DG match the target. There are also oddities with consecutive runs
of upper case characters, consider e.g. completion after ssh -o where
there is, e.g. "TCPKeepAlive" as an option. TKA won't match but ideally
would.

With matching control, it is often easiest if you view it as converting
what is on the command-line into a regular expression. I haven't probed
the source code to get a precise view of how these are mapped. For my
own purposes, I keep a list but don't trust it in all cases because I've
found contradictory examples and tweaked it more than once, perhaps
making it less accurate in the process. So with the caveat that this
may contain errors, my current list is as follows:

Note that the starting point is:
  [cursor position] → .*
Then:
  'm:a=b'     – a  → b        (* doesn't work on rhs)
  'r:|b=*'    – b  → [^b]*b
  'r:a|b=*'   – ab → [^b]*a?b
  'r:a|b=c'   – ab → cb
  'l:a|=*'    – a  → [^a]*a
  'l:a|b=*'   – ab → [^a]*ab?
  'l:a|b=c'   – ab → ac
  'b:a=*'     – ^a → .*
  'b:a=c'     – ^a → ^c
  'e:a=*'     – a$ → .*
  'r:a||b=*'  – b  → [^a]*ab  (only * works on rhs, empty a or b has no use)
  'l:a||b=*'  – ^a → a.*      (only * on rhs, empty a no use, b ignored?!)

Something like [A-Z] becomes its concrete form from the command-line in the regex.
For correspondence classes, the corresponding form goes in the regex, and they only work with m:/M: forms.
** is like * but with .* instead of [^x]*

In all cases, the original unchanged form also passes - a matching
control does not have to be used. I've excluded those in the regular
expressions above. But including them note the following potentially
useful effects with an empty lpat:

  'r:|b=c'    – b → c?b
  'l:a|=c'    – a → ac?

When composing multiple matching controls, it doesn't try to apply over
the results of the previous. You can consider it an alternation of the
effect of each matching control.

So 'r:a|b=* l:a|b=*' would be: ab → (ab|[^b]*a?b|[^a]*ab?)

For the most part there are certain common forms and if you stick to
those, you find fewer bugs than when being creative.

The || forms seem buggy to me. From the documentation, my assumption
would be that one means a[^a]*b and the other a[^b]*b
That could be more helpful for camel-case but I would need to generate
tests to say for sure.
b seems to even be ignored for the l form.

> Additionally, the following are unclear to me from the manual:
> * What is the exact difference between l:lanchor||ranchor=tpat and
> r:lanchor||ranchor=tpat ?

From the documentation and assuming some actual symmetry I would assume
the difference to be that lanchor needs to match the completion
candidate but not the command-line, while a tpat of * will not match
ranchor – swap l and r anchors for l and r forms in the description.
If that's what it did do, it might possibly bring us closer to a good
solution for camel-case matching.

But as the regex above indicates, that isn't the case.
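
The regex view in the list above can be tried outside zsh. The following is a minimal sketch in portable shell, not something posted in the thread and not a model of zsh's implementation. It applies only the 'r:|b=*' rule from the list (b → [^b]*b) to each upper-case letter of the typed word and appends .* for a cursor at the end of the word, which is how the list reads 'r:|[A-Z]=* r:|=*'. The names camel_regex and camel_matches are hypothetical, and the typed word is assumed to contain only ASCII letters and digits.

```shell
# Hypothetical helper: build an ERE from a typed word, following the
# list's reading of 'r:|[A-Z]=* r:|=*'. Assumes the word has no regex
# metacharacters (letters and digits only).
camel_regex() (
  word=$1
  re=
  while [ -n "$word" ]; do
    rest=${word#?}            # everything after the first character
    ch=${word%"$rest"}        # the first character itself
    case $ch in
      # 'r:|b=*' with b an upper-case letter: b -> [^b]*b
      [[:upper:]]) re="$re[^$ch]*$ch" ;;
      *)           re="$re$ch" ;;
    esac
    word=$rest
  done
  # Cursor at the end of the word: [cursor position] -> .*
  printf '%s.*\n' "$re"
)

# Hypothetical helper: succeed if candidate $2 matches the ERE built
# from typed word $1 (whole-string match).
camel_matches() {
  printf '%s\n' "$2" | grep -Eqx -- "$(camel_regex "$1")"
}
```

Under this model, `camel_matches D abcDefGhi` succeeds, which agrees with the remark that D and DG match the target. Whether zsh itself agrees with the model for every input still has to be checked in a real completion session.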
I don't really see the logic of the l:lanchor||ranchor=tpat seeming to
be anchored to the beginning. I think those forms came about as an
attempt to get camel-case to work.

> * Why do the examples in the manual add r:|=* to the end of each
> matcher? This appears to make no difference at all.

For the case where the cursor is in the middle rather than the end. For
the example from the manual with Usenet group names like
comp.sources.unix, try c.s.u with the cursor after the s.

There are three components. Two have a dot anchor at the end. The final
has an end-of-string anchor.

> * It appears that the order of "match descriptions" in a matchers
> matters, but it is unclear to me in what way and it isn't mentioned in
> the manual. For example, the pairs of matchers below differ only in
> the order of their match descriptions, yet each produces a different
> behavior. How are the match descriptions inside a matcher evaluated
> and what causes the difference between these?

Order shouldn't really matter (apart from the x: matcher).

As I mention earlier, you can consider it as being the alternation of all
of them - at every point in the command-line where one of them can do
something. So a single match may rely on more than one matching control
to be matched. I can imagine that order might matter where you have mixed
up anchors. An example would be interesting.

> * 'r:|[[:punct:]]=** l:?|=[[:punct:]]' completes 'cd a/b' to 'cd
> a/bc', but 'l:?|=[[:punct:]] r:|[[:punct:]]=**' does not.

In my testing, neither does. Where is the cursor? You can think of the
matching as adding .* at the cursor position so a/b completes to a/bc
with no matching control if the cursor is at the end. The lack of other
candidate completions can also confuse testing of this because with
prefix completion, a/bc can be the only unambiguous match. Are you sure
you don't have other customisations that are allowing the first case to
match?

The l: pattern allows punctuation after any character so a/b becomes the
pattern a(|[[:punct:]])/(|[[:punct:]])b(|[[:punct:]])

The r: pattern allows anything before the punctuation so a/b becomes the
pattern a*/b

> * Given two target completions 'a-b' and 'a_b', both 'l:?|=[-_]
> m:{-}={_}' and 'm:{-}={_} l:?|=[-_]' will insert 'a-b' as the
> unambiguous substring on the first try, but on the second try, only
> the former will then list both completions, whereas the latter will
> complete only 'a-b'.

I'm not sure I follow what you mean by the first and second try. If you
mean a second press of <tab>, matching is done completely anew with the
new command-line contents.

With just
  compadd -M 'l:?|=[_-]' - a-b a_b
ab<tab> offers both candidates as matches.
Adding 'm:-=_' in just means that completion after a-b will also match
a_b
Single element correspondence classes are pointless by the way.

Especially with the uppercase forms (L: etc) it is easy to create
situations where an unambiguous substring is inserted and the set of
candidate matches is quite different with the new command-line contents.
The effect can be somewhat jarring and has the appearance of a bug.

Oliver
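
The compadd experiment in the message above can be repeated in an interactive session, which also makes it easy to control where the cursor is and which candidates exist. This is a minimal sketch, assuming an interactive zsh in which compinit has already been run; the command name mtest and the function name _mtest are made up for the experiment and do not come from the thread.

```shell
# zsh only; the guard makes the snippet a no-op when read by another shell.
# Assumes compinit has already been run in this session.
if [ -n "${ZSH_VERSION-}" ]; then
  # _mtest and mtest are hypothetical names used only for this experiment.
  _mtest() {
    # Offer the two candidates from the message, with the l: matcher under test.
    compadd -M 'l:?|=[_-]' - a-b a_b
  }
  compdef _mtest mtest
fi
```

Typing `mtest ab` and pressing Tab then exercises exactly this matcher against exactly these two candidates; according to the message, both should be offered. Swapping in another match specification after -M gives a clean comparison without interference from matcher-list styles applied to other commands.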
* Re: Questions about completion matchers
From: Marlon Richert @ 2021-10-08 22:38 UTC
  To: Oliver Kiddle; +Cc: Zsh Users

Thanks, Oliver, for your long and thoughtful response. I'm afraid I
don't quite understand all of it, though. Let me try to explain how
I've understood things, but in a way that I find easier to process, and
do please correct me where I'm wrong.

The way I've understood it, is that, if $word contains the command line
string for which completion is attempted, then each matcher should
transform $word as follows:
  * m:$lpat=$tpat -> ${word//$~lpat/$~tpat}
  * b:$lpat=$tpat -> ${word/#$~lpat/$~tpat}
  * l:|$lpat=$tpat -> ${word/#$~lpat/$~tpat}
  * l:||$ranchor=$tpat -> ${word/#(#b)($~ranchor)/$~tpat$match[1]}
  * l:$lanchor|$lpat=$tpat -> ${word//(#b)($~lanchor)$~lpat/$match[1]$~tpat}
  * l:$lanchor||$ranchor=$tpat -> ${word//(#b)($~lanchor)($~ranchor)/$match[1]$~tpat$match[2]}
  * e:$lpat=$tpat -> ${word/%$~lpat/$~tpat}
  * r:$lpat|=$tpat -> ${word/%$~lpat/$~tpat}
  * r:$lanchor||=$tpat -> ${word/%(#b)($~lanchor)/$match[1]$~tpat}
  * r:$lpat|$ranchor=$tpat -> ${word//(#b)$~lpat($~ranchor)/$~tpat$match[1]}
  * r:$lanchor||$ranchor=$tpat -> ${word//(#b)($~lanchor)($~ranchor)/$match[1]$~tpat$match[2]}

However, this leaves several transformations identical, which makes me
believe I've misunderstood something.

What did I miss?
* Re: Questions about completion matchers
From: Bart Schaefer @ 2021-10-09 16:23 UTC
  To: Marlon Richert; +Cc: Oliver Kiddle, Zsh Users

On Fri, Oct 8, 2021 at 3:39 PM Marlon Richert <marlon.richert@gmail.com> wrote:
>
> The way I've understood it, is that, if $word contains the command line string for which completion is attempted, then each matcher should transform $word as follows:
>
> What did I miss?

I think what you've missed is that there are two things being
examined: The word on the command line, and the "trial completion",
that is, the word passed to compadd that might replace the one on the
command line. It's not merely (choosing the first of your seeming
duplications)

> * b:$lpat=$tpat -> ${word/#$~lpat/$~tpat}
> * l:|$lpat=$tpat -> ${word/#$~lpat/$~tpat}

Rather it's

  * b:$lpat=$tpat -> [[ $trial = ${~lpat}* ]] &&
    ${word/#$~lpat/$~tpat}
  * l:|$lpat=$tpat -> [[ $trial = *${~lpat}* ]] &&
    ${word/#$~lpat/$~tpat}

In the cases with r: and R:, $ranchor is only compared to $trial, it
is not used when replacing into $word.
* Re: Questions about completion matchers
From: Marlon Richert @ 2021-10-09 22:12 UTC
  To: Bart Schaefer; +Cc: Oliver Kiddle, Zsh Users

On Sat, Oct 9, 2021 at 7:23 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> I think what you've missed is that there are two things being
> examined: The word on the command line, and the "trial completion",
> that is, the word passed to compadd that might replace the one on the
> command line. It's not merely (choosing the first of your seeming
> duplications)
>
> > * b:$lpat=$tpat -> ${word/#$~lpat/$~tpat}
> > * l:|$lpat=$tpat -> ${word/#$~lpat/$~tpat}
>
> Rather it's
>
> * b:$lpat=$tpat -> [[ $trial = ${~lpat}* ]] &&
> ${word/#$~lpat/$~tpat}
> * l:|$lpat=$tpat -> [[ $trial = *${~lpat}* ]] &&
> ${word/#$~lpat/$~tpat}

Perhaps I'm mistaken, but aren't you mixing up $lpat and $lanchor
here? In the docs, it says:

> Matching for lpat and tpat is as for m and M, but the pattern lpat matched on the command line must be preceded by the pattern lanchor. The lanchor can be blank to anchor the match to the start of the command line string; otherwise the anchor can occur anywhere, but must match in both the command line and trial completion strings.

Above, $lanchor is blank and thus needs to match only the start of the
command line string, whereas anything to the right of $lanchor that
matches $~lpat is simply replaced with $~tpat, just as in
m:$lpat=$tpat. Are the docs wrong or am I understanding them wrong?

Thanks, though, for this example, because it does help me understand
what the documentation means with:

> If no lpat is given but a ranchor is, this matches the gap between substrings matched by lanchor and ranchor. Unlike lanchor, the ranchor only needs to match the trial completion string.

Before, it was unclear to me how I should interpret "only needs to
match the trial completion string", but now, I suppose it would be
like this:

  l:$lanchor||$ranchor=$tpat -> [[ $trial == *$~lanchor$~tpat$~ranchor* ]] && ${word//(#m)($~lanchor)/$MATCH$~tpat}

However, isn't that equivalent to the following?

  l:$lanchor||$ranchor=$tpat -> ${word//(#m)($~lanchor)/$MATCH$~tpat$~ranchor}

Correct me if I'm wrong, but this seems to match the exact same trial
strings. But then, the above transformation would be equivalent to
this one:

  l:$lanchor|=$tpat$ranchor -> ${word//(#b)($~lanchor)/$match[1]$~tpat$~ranchor}

Again, something doesn't seem right here.

On Sat, Oct 9, 2021 at 7:23 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
> In the cases with r: and R:, $ranchor is only compared to $trial, it
> is not used when replacing into $word.

Are you sure? The phrase "the ranchor only needs to match the trial
completion string" is listed in the docs for l and L. Conversely, in
the docs for r and R, it says "As l, L, b and B, with the difference
that the command line and trial completion patterns are anchored on
the right side." This makes me believe that, in the case of r and R,
it is in fact $lanchor that is compared only to $trial, not $ranchor.
If your interpretation were correct, l:$lanchor||$ranchor=$tpat and
r:$lanchor||$ranchor=$tpat would be completely equivalent.

Perhaps I should consider the _examples_ in the docs to be the truth
and just ignore the ambiguous wording given earlier on in the docs. If
I do that, then Oliver's examples now start to make sense to me and I
can deduce the transformations as follows:

Given 'r:|.=* r:|=*', c.u becomes c(^*.*).u* and c.s.u becomes
c(^*.*).s(^*.*).u*. Ergo:
  * r:$lpat|$ranchor=$tpat -> ${word//(#b)$~lpat($~ranchor)/($~tpat~*$~ranchor*)$match[1]}
  * l:$lanchor|$lpat=$tpat -> ${word//(#b)($~lanchor)$~lpat/$match[1]($~tpat~*$~lanchor*)}
  * r:$lpat|=$tpat -> ${word%$~lpat}$~tpat
  * l:|$lpat=$tpat -> $~tpat${word#$~lpat}

Given 'r:|.=** r:|=*', c.u becomes c*.u*.
Given 'r:|[[:upper:]0-9]=** r:|=*', H becomes *H* and 2 becomes *2*. Ergo:
  * r:$lpat|$ranchor=** -> ${word//(#b)$~lpat($~ranchor)/*$match[1]}
  * l:$lanchor|$lpat=** -> ${word//(#b)($~lanchor)$~lpat/$match[1]*}

Given 'r:[^[:upper:]0-9]||[[:upper:]0-9]=** r:|=*', H becomes
*[^[:upper:]0-9]H* and 2 becomes *[^[:upper:]0-9]2*. Ergo:
  * r:$lanchor||$ranchor=** -> ${word//(#m)($~ranchor)/*$~lanchor$MATCH}
  * l:$lanchor||$ranchor=** -> ${word//(#m)($~lanchor)/$MATCH$~lanchor*}

Given 'B:[nN][oO]= M:_= M:{[:upper:]}={[:lower:]}', both _NO_f and
NONO_f become f. Ergo:
  * b:$lpat=$tpat -> ${word/#(#b)(|?)($~lpat)##/$match[1]$~tpat}
  * e:$lpat=$tpat -> ${word/%(#b)($~lpat)##(|?)/$~tpat$match[-1]}

How about changing the docs to just literally state the transformation
that each matcher applies? It would be clearer than the prose it
currently contains, which is ambiguous and open to interpretation.
* Re: Questions about completion matchers
From: Bart Schaefer @ 2021-10-09 22:39 UTC
  To: Marlon Richert; +Cc: Oliver Kiddle, Zsh Users

On Sat, Oct 9, 2021 at 3:12 PM Marlon Richert <marlon.richert@gmail.com> wrote:
>
> On Sat, Oct 9, 2021 at 7:23 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
> >
> > * l:|$lpat=$tpat -> [[ $trial = *${~lpat}* ]] &&
> > ${word/#$~lpat/$~tpat}
>
> Perhaps I'm mistaken, but aren't you mixing up $lpat and $lanchor
> here?

Well, sort of, yes. See Oliver's more recent message. A better
description of what's happening is that the matcher transforms the
word from the command line into a pattern, and then that pattern is
compared to every one of the trial candidates, and then pieces of the
trial candidates are extracted and merged with the word from the
command line to generate the list of possible replacements for that
word. It's never as simple as a string substitution on the word
itself taken directly from the patterns in the matcher.

> How about changing the docs to just literally state the transformation
> that each matcher applies?

Because it's not a literal transformation. Matchers don't transform,
they create a comparison between the command line and the compadd
strings and define which parts of the command line can be replaced by
what parts of the compadd strings when that comparison finds a match.
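
The first two stages of the description above (the word becomes a pattern, and the pattern is compared with every candidate) can be sketched in portable shell. This only imitates the filtering step for the manual's 'r:|.=* r:|=*' newsgroup example, using the regex reading given earlier in the thread: a dot typed on the command line becomes [^.]*\. and a cursor at the end becomes .*. It does not model how zsh merges pieces of the candidates back into the command line. The function names dot_regex and filter_candidates, and every candidate name other than comp.sources.unix, are invented for illustration; the typed word is assumed to contain no regex metacharacters apart from dots.

```shell
# Hypothetical helper: turn a typed word such as c.s.u into an ERE.
# Each literal dot becomes "[^.]*\." and ".*" is appended for the
# cursor at the end of the word.
dot_regex() (
  printf '%s\n' "$1" | sed -e 's/\./[^.]*\\./g' -e 's/$/.*/'
)

# Hypothetical helper: print the candidates ($2 onwards) that match the
# pattern derived from the typed word ($1), one per line.
filter_candidates() (
  re=$(dot_regex "$1")
  shift
  printf '%s\n' "$@" | grep -Ex -- "$re"
)

# Example call; the candidate list is made up.
filter_candidates c.s.u comp.sources.unix comp.sources.misc comp.sys.unix
```

The pattern is derived once from the command-line word and then applied to each candidate independently, which is why the order in which candidates are supplied does not affect which of them survive.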
* Re: Questions about completion matchers
From: Marlon Richert @ 2021-10-10 11:17 UTC
  To: Bart Schaefer; +Cc: Oliver Kiddle, Zsh Users

On Sun, Oct 10, 2021 at 1:40 AM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> On Sat, Oct 9, 2021 at 3:12 PM Marlon Richert <marlon.richert@gmail.com> wrote:
> >
> > On Sat, Oct 9, 2021 at 7:23 PM Bart Schaefer <schaefer@brasslantern.com> wrote:
> > >
> > > * l:|$lpat=$tpat -> [[ $trial = *${~lpat}* ]] &&
> > > ${word/#$~lpat/$~tpat}
> >
> > Perhaps I'm mistaken, but aren't you mixing up $lpat and $lanchor
> > here?
>
> Well, sort of, yes. See Oliver's more recent message. A better
> description of what's happening is that the matcher transforms the
> word from the command line into a pattern, and then that pattern is
> compared to every one of the trial candidates, and then pieces of the
> trial candidates are extracted and merged with the word from the
> command line to generate the list of possible replacements for that
> word. It's never as simple as a string substitution on the word
> itself taken directly from the patterns in the matcher.

Having an explanation like this in the docs would help so much! :)

Or what would help even more is to put it in a step-by-step form. For example:

> 1. Each matcher generates a search pattern by taking the word on the command line (or the pattern produced by the previous matcher) and applying a transformation specific to the matcher. If the matcher has an uppercase letter, it also captures the original substrings of the command line word that it transformed.
> 2. After all matchers are applied, the resulting search pattern is used to find matching completions.
> 3. For each uppercase matcher, substrings captured from the word on the command line are then inserted into the matching completions.

And then each matcher could state exactly how it produces its search
pattern. For example:

> r:lanchor||ranchor=tpat
> l:lanchor||ranchor=tpat
>
> 1. Find each substring in the word on the command line that matches pattern ranchor (for r:) or lanchor (for l:). If this anchor is empty, it matches the end (for r:) or the beginning (for l:) of the word on the command line.
> 2. Insert a pattern to the left (for r:) or right (for l:) of each substring:
>   * If tpat is **, then insert `*lanchor` (for r:) or `ranchor*` (for l:).
>   * Otherwise, insert `(tpat~*lanchor*)lanchor` (for r:) or `ranchor(tpat~*ranchor*)` (for l:).
>
> Example: If the word on the command line is `H2`, then the match spec r:[[:lower:]]|[[:upper:][:digit:]]=** captures the substrings 'H' and '2' and generates the search pattern `*[[:lower:]]H*[[:lower:]]2`.

On Sun, Oct 10, 2021 at 1:40 AM Bart Schaefer <schaefer@brasslantern.com> wrote:
>
> On Sat, Oct 9, 2021 at 3:12 PM Marlon Richert <marlon.richert@gmail.com> wrote:
> >
> > How about changing the docs to just literally state the transformation
> > that each matcher applies?
>
> Because it's not a literal transformation. Matchers don't transform,
> they create a comparison between the command line and the compadd
> strings and define which parts of the command line can be replaced by
> what parts of the compadd strings when that comparison finds a match.

Perhaps "transformation" is not the word I should've used. What I
meant is that each matcher generates a pattern, using as input the
word on the command line or the pattern generated by the previous
matcher. After this has been done in turn for each matcher, the
resulting pattern is then used to find matching completions. Isn't
that correct?
* Re: Questions about completion matchers
From: Oliver Kiddle @ 2021-10-09 21:59 UTC
  To: Marlon Richert; +Cc: Zsh Users

Marlon Richert wrote:
> Thanks, Oliver, for your long and thoughtful response. I'm afraid I don't quite
> understand all of it, though. Let me try to explain how I've understood things,
> but in a way that I find easier to process, and do please correct me where I'm
> wrong.
>
> The way I've understood it, is that, if $word contains the command line string
> for which completion is attempted, then each matcher should transform $word as
> follows:

That's not what the implementation does in any real sense so I'm not
sure how helpful it is to reframe the regular expressions I gave in zsh
syntax. But the effect is along those basic lines if you view the
"transformed" $word as being a pattern that is matched against each of
the candidate matches in turn to decide which to present as matches.

I find it helpful as a brief reference but if it doesn't make sense to
you, ignore it.

> However, this leaves several transformations identical, which makes me believe
> I've misunderstood something.
>
> What did I miss?

The difference between b: and l: with an empty anchor (or e/r) is not
encapsulated by my regular expressions. They only differ in how strict
the anchoring to the start of the match is where another matching
control allowed extra characters to be inserted at the beginning. The
example given when this was added was zsh option completion where
underscores are ignored and a prefix of NO is allowed.

I took a look at the source code and dug out original -workers posts and
it does seem that the intention for the two anchor || forms was as I
thought. Even as designed I don't think either is ideal for camel case -
the l: form excludes characters from the wrong anchor for that.

The matching code looks a lot like regular expression matching with a
back tracking algorithm.

Oliver
* Re: Questions about completion matchers
From: Marlon Richert @ 2021-10-10 12:05 UTC
To: Oliver Kiddle; +Cc: Zsh Users

On Sun, Oct 10, 2021 at 12:59 AM Oliver Kiddle <opk@zsh.org> wrote:
>
> Marlon Richert wrote:
> > Thanks, Oliver, for your long and thoughtful response. I'm afraid I don't quite
> > understand all of it, though. Let me try to explain how I've understood things,
> > but in a way that I find easier to process, and do please correct me where I'm
> > wrong.
> >
> > The way I've understood it, is that, if $word contains the command line string
> > for which completion is attempted, then each matcher should transform $word as
> > follows:
>
> That's not what the implementation does in any real sense so I'm not
> sure how helpful it is to reframe the regular expressions I gave in zsh
> syntax. But the effect is along those basic lines if you view the
> "transformed" $word as being a pattern that is matched against each of
> the candidate matches in turn to decide which to present as matches.
>
> I find it helpful as a brief reference but if it doesn't make sense to
> you, ignore it.

It didn't make sense at first, because I somehow overlooked that you
were using regex. I read them as glob patterns. :) But now that I
realize that, let me have a second look:

On Sun, Sep 26, 2021 at 4:09 PM Oliver Kiddle <opk@zsh.org> wrote:
>
> With matching control, it is often easiest if you view it as converting
> what is on the command-line into a regular expression. I haven't probed
> the source code to get a precise view of how these are mapped. For my
> own purposes, I keep a list but don't trust it in all cases because I've
> found contradictory examples and tweaked it more than once, perhaps
> making it less accurate in the process. So with the caveat that this
> may contain errors, my current list is as follows:
>
> Note that the starting point is:
>   [cursor position] → .*
> Then:
>   'm:a=b' – a → b (* doesn't work on rhs)
>   'r:|b=*' – b → [^b]*b

The appearance of [^a] and [^b] in your patterns was a complete
surprise to me. I would've expected * to work as * in a glob
expression. This is not clear from the docs. Now that I know that the
matcher syntax was based on regex, it makes more sense, but I still
wouldn't have figured this out intuitively. A clearer explanation
about this in the docs would be helpful. Yes, it's mentioned somewhere
in the examples, but it should be explained more clearly earlier on.

>   'r:a|b=*' – ab → [^b]*a?b

This one looks incorrect to me as it does not match the example in the
docs. From that example, it appears to me that it is supposed to work
like this:
  'r:a|b=*' – b → [^b]*ab

>   'r:a|b=c' – ab → cb
>   'l:a|=*' – a → [^a]*a
>   'l:a|b=*' – ab → [^a]*ab?

Shouldn't these last two result in a[^a]* and ab[^a]*, respectively,
since the anchor goes to the left?

>   'l:a|b=c' – ab → ac
>   'b:a=*' – ^a → .*

Oh, but here * does work like a * glob? So, I guess * behaves
differently only when anchors are involved?

>   'b:a=c' – ^a → ^c
>   'e:a=*' – a$ → .*
>   'r:a||b=*' – b → [^a]*ab (only * works on rhs, empty a or b has no use)
>   'l:a||b=*' – ^a → a.* (only * on rhs, empty a no use, b ignored?!)

The comments on the last two items sound like bugs to me. Also,
'l:a||b=*' should work on just 'a' and not require '^a'.

On Sun, Oct 10, 2021 at 12:59 AM Oliver Kiddle <opk@zsh.org> wrote:
>
> The difference between b: and l: with an empty anchor (or e/r) is not
> encapsulated by my regular expressions. They only differ in how strict
> the anchoring to the start of the match is where another matching
> control allowed extra characters to be inserted at the beginning.

So, does that mean then that matchers are not evaluated strictly
left-to-right?

> The example given when this was added was zsh option completion where
> underscores are ignored and a prefix of NO is allowed.

About that example, what exactly is the difference between L: and B:
that lets B: complete '_NO_f' to '_NO_foo' and 'NONO_f' to 'NONO_f'
but not L:? It's not clear from the example, let alone from the
description of the matchers.

> I took a look at the source code and dug out original -workers posts and
> it does seem that the intention for the two anchor || forms was as I
> thought. Even as designed I don't think either is ideal for camel case -
> the l: form excludes characters from the wrong anchor for that.
> The matching code looks a lot like regular expression matching with a
> back tracking algorithm.

Y02compmatch.ztst contains a lot of examples that could be added to
the docs to better explain how the different matchers are intended to
be used. It would help to better understand their workings.
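To make the regular-expression reading of the list quoted above concrete, here is a minimal Python sketch of two of its entries only: the starting point (cursor position → .*) and 'r:|b=*' (b → [^b]*b). It models that approximate list, which its author says may contain errors, and not zsh's matching code. `r_anchor_star_regex` and the candidate strings are hypothetical, and the cursor is assumed to sit at the end of the word.

```python
import re


def r_anchor_star_regex(word, anchor):
    """Regex for 'r:|ANCHOR=*' under the approximate mapping
    ANCHOR -> [^ANCHOR]*ANCHOR, with the cursor at the end -> .*"""
    esc = re.escape(anchor)
    pieces = []
    for ch in word:
        if ch == anchor:
            # Anything except the anchor may be inserted before it.
            pieces.append(f"[^{esc}]*{esc}")
        else:
            pieces.append(re.escape(ch))
    return "".join(pieces) + ".*"


# Hypothetical candidate completions, for illustration only.
candidates = ["comp.sources.unix", "comp.sources.misc", "comp.sys.unix"]
regex = r_anchor_star_regex("c.s.u", ".")
matches = [c for c in candidates if re.fullmatch(regex, c)]
print(matches)
```

The [^.]* in front of each dot is what stops a single partial word from swallowing a whole dotted component, which is the difference from a plain glob * that the message above found surprising.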
* Re: Questions about completion matchers
From: Marlon Richert @ 2021-10-10 20:14 UTC
To: Oliver Kiddle; +Cc: Zsh Users

I have to say, after having processed both of your explanations, it
appears that r:lanchor||ranchor=tpat and l:lanchor||ranchor=tpat are
not working as intended. It intuitively feels like they should cover
this very common case: If lanchor and ranchor are present and adjacent
in the command line string, then apply m:=tpat to the empty string
between them. That is to say: Enable completion between lanchor and
ranchor, just like we can enable completion to the left or right of an
anchor.

In terms of syntax, this treats the void between || as an empty lpat,
just like it is in :|lanchor= or :ranchor|=. The || form (and indeed,
the | form) is essentially a conditional version of one of the other
matchers.

This actually extrapolates to a consistent interpretation of the
symbols in the matching syntax:

* lpat is always the substring whose meaning is "transformed": That is
  to say, it (and only it) is made to be considered equal to any trial
  substring matching tpat. It is permitted for lpat to be equal to the
  empty string or the beginning/end of the command line string.
* Each |ranchor or lanchor| adds a constraint: A substring matching
  them needs to be directly to the right or left of lpat -- or lpat's
  meaning won't be "transformed". The meaning of the anchors themselves
  is never "transformed": Any substring matching the anchor on the
  command line needs to be matched literally in the trial string.
* For the first anchor in a matcher, the substring matching lpat will
  not be considered equivalent to a trial substring that matches the
  anchor. This clause is essentially there to prevent the matcher from
  becoming too "greedy".
* For the second anchor, there is no such restriction. (Or otherwise,
  the matcher could easily become too constrained and unable to match
  any trial string at all.)

From this then follows the following meaning of each matcher:

* m:lpat=tpat - Treat each substring matching lpat on the command line
  as being equal to any substring matching tpat in the trial string.
* r:lpat|ranchor=** - The same as m:lpat=*, but only if the substring
  matching lpat has directly to its right a substring matching ranchor.
* r:lpat|ranchor=tpat - The same as m:lpat=tpat~ranchor, but only if
  the substring matching lpat has directly to its right a substring
  matching ranchor.
* r:lanchor||ranchor=tpat - The same as r:|ranchor=tpat, but only if
  the substring matching ranchor is immediately preceded by a substring
  matching lanchor.

One could even continue this pattern, as || is nothing more than
|lpat| with lpat equal to the empty string:

* r:lanchor|lpat|ranchor=tpat - The same as r:lpat|ranchor=tpat, but
  only if the substring matching lpat is immediately preceded by a
  substring matching lanchor.

However, in practice, the more constraints a matcher has, the more
likely it is to break consistency with this pattern. As a result, the
|| matchers no longer support the case for which it looks that they
were intended - to complete the missing substring between lanchor and
ranchor - which is now, unfortunately, a missing feature. I would hope
the implementation of the || matchers could be modified to restore this
feature -- which I assume must (or was intended to) have been there at
some point.
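The interpretation proposed in this last message ("enable completion between lanchor and ranchor") can be modelled as a regex rewrite and checked against the camel-case example from the opening message of the thread. This is only a sketch of the proposed behaviour, not of anything zsh implements. `between_anchors_regex` is a hypothetical helper. The assumptions are that lanchor is any single character, ranchor is an uppercase letter, the gap is filled by the regex [a-z]* (a regex, not matcher tpat syntax), and the cursor at the end of the word becomes .* as in the regex list discussed earlier in the thread.

```python
import re


def between_anchors_regex(word):
    """Allow a run of lowercase letters between any character (assumed
    lanchor) and an uppercase letter that directly follows it (assumed
    ranchor). The cursor at the end of the word becomes `.*`."""
    pieces = []
    for i, ch in enumerate(word):
        if i > 0 and ch.isupper():
            pieces.append("[a-z]*")
        pieces.append(re.escape(ch))
    return "".join(pieces) + ".*"


def completes_to(word, target):
    return re.fullmatch(between_anchors_regex(word), target) is not None


target = "abcDefGhi"
accepted = [w for w in ["a", "aD", "abD", "aDG", "aDe", "aDeG"]
            if completes_to(w, target)]
rejected = [w for w in ["D", "aG", "acD", "DG", "aDf", "aDeGi"]
            if not completes_to(w, target)]
print(accepted)
print(rejected)
```

Under these assumptions all six inputs that the opening message wants to complete are accepted and all six unwanted inputs are rejected. The `i > 0` check stands in for the left anchor: an uppercase letter at the very start of the word has nothing to its left, so no gap is opened there.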