zsh-workers
 help / color / mirror / code / Atom feed
* [RFC] Case-insensitive path completion in _git
@ 2020-12-13  4:40 dana
  2020-12-13 13:55 ` m0viefreak
  2021-03-27 20:06 ` Lawrence Velázquez
  0 siblings, 2 replies; 10+ messages in thread
From: dana @ 2020-12-13  4:40 UTC (permalink / raw)
  To: Zsh hackers list

I've had this sitting for a while:

_git has issues completing files case-insensitively (if you have matcher-list
'm:{a-zA-Z}={A-Za-z}' or whatever). Looking into it, i found that __git_files
is trying to pass a glob pattern to `git ls-files`, and this fails if there's
not an exact case match, since ls-files is always case-sensitive.

There is a fall-back to `git ls-files` with no path, but this doesn't always
work either, because it defaults to the CWD, and the file you're trying to
complete may not be under the CWD. Even when the fall-back succeeds, it's not
ideal, because it'll pass every single file in the tree to _multi_parts, which
can be slow.

The following hack solves the problem for me, but it might be too silly to
commit. Can anyone think of a more proper fix? If not, would the hack be
viable (probably gated behind a style)?

dana


diff --git a/Completion/Unix/Command/_git b/Completion/Unix/Command/_git
index 81a060e4d..7d0201efe 100644
--- a/Completion/Unix/Command/_git
+++ b/Completion/Unix/Command/_git
@@ -7138,6 +7138,7 @@ __git_files_relative () {
 (( $+functions[__git_files] )) ||
 __git_files () {
   local compadd_opts opts tag description gittoplevel gitprefix files expl
+  local MATCH MBEGIN MEND
 
   zparseopts -D -E -a compadd_opts V+: J+: 1 2 o+: n f x+: X+: M+: P: S: r: R: q F:
   zparseopts -D -E -a opts -- -cached -deleted -modified -others -ignored -unmerged -killed x+: --exclude+:
@@ -7152,14 +7153,26 @@ __git_files () {
 
   # TODO: --directory should probably be added to $opts when --others is given.
 
+  # ls-files allows the part of the path under the repository directory to be a
+  # glob pattern. However, these patterns are always case-sensitive (even if the
+  # file system is not). This means that if we have a matcher style set to make
+  # path completion case-insensitive, it won't (necessarily) work, because we
+  # won't get any results back from Git. To work around this, we can transform
+  # the pattern from abc to [Aa][Bb][Cc]. This is very dumb, but seemingly
+  # reliable enough
   local pref=${(Q)${~PREFIX}}
+  pref=${pref//(#m)[[:alpha:]]/\[${(U)MATCH}${(L)MATCH}\]}
   [[ $pref[1] == '/' ]] || pref=$gittoplevel$gitprefix$pref
 
   # First allow ls-files to pattern-match in case of remote repository
   files=(${(0)"$(_call_program files git ls-files -z --exclude-standard ${(q)opts} -- ${(q)${pref:+$pref\*}:-.} 2>/dev/null)"})
   __git_command_successful $pipestatus || return
 
-  # If ls-files succeeded but returned nothing, try again with no pattern
+  # If ls-files succeeded but returned nothing, try again with no pattern. Note
+  # that ls-files defaults to the CWD if not given a path, so if the file we
+  # were trying to add is in an *adjacent* directory, this won't return anything
+  # helpful. (If it did, the case-sensitivity issue mentioned above would only
+  # be a minor performance issue rather than a complete failure to return)
   if [[ -z "$files" && -n "$pref" ]]; then
     files=(${(0)"$(_call_program files git ls-files -z --exclude-standard ${(q)opts} -- 2>/dev/null)"})
     __git_command_successful $pipestatus || return



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] Case-insensitive path completion in _git
  2020-12-13  4:40 [RFC] Case-insensitive path completion in _git dana
@ 2020-12-13 13:55 ` m0viefreak
  2020-12-14 10:25   ` dana
  2021-03-27 21:08   ` Daniel Shahaf
  2021-03-27 20:06 ` Lawrence Velázquez
  1 sibling, 2 replies; 10+ messages in thread
From: m0viefreak @ 2020-12-13 13:55 UTC (permalink / raw)
  To: dana, Zsh hackers list



On 13.12.2020 05:40, dana wrote:
> _git has issues completing files case-insensitively (if you have matcher-list
> 'm:{a-zA-Z}={A-Za-z}' or whatever). Looking into it, i found that __git_files
> is trying to pass a glob pattern to `git ls-files`, and this fails if there's
> not an exact case match, since ls-files is always case-sensitive.

> There is a fall-back to `git ls-files` with no path, but this doesn't always
> work either, because it defaults to the CWD, and the file you're trying to
> complete may not be under the CWD. Even when the fall-back succeeds, it's not
> ideal, because it'll pass every single file in the tree to _multi_parts, which
> can be slow.
> 
> The following hack solves the problem for me, but it might be too silly to
> commit. Can anyone think of a more proper fix? If not, would the hack be
> viable (probably gated behind a style)?
> 
> dana
>
> ...
> +  pref=${pref//(#m)[[:alpha:]]/\[${(U)MATCH}${(L)MATCH}\]}
> ...

This solution only handles simple upper-lower-case conversion matcher-list styles,
but it will not work for more complicated styles like 'r:|[._-]=* r:|=*'.
That's probably fine for most use-cases.

I have been running with the following patch applied locally for quite some time:


diff --git a/Completion/Unix/Command/_git b/Completion/Unix/Command/_git
index 81a060e4d..1a6619e31 100644
--- a/Completion/Unix/Command/_git
+++ b/Completion/Unix/Command/_git
@@ -7153,7 +7153,7 @@ __git_files () {
    # TODO: --directory should probably be added to $opts when --others is given.

    local pref=${(Q)${~PREFIX}}
-  [[ $pref[1] == '/' ]] || pref=$gittoplevel$gitprefix$pref
+  [[ $pref[1] == '/' ]] || pref=$gittoplevel$gitprefix

    # First allow ls-files to pattern-match in case of remote repository
    files=(${(0)"$(_call_program files git ls-files -z --exclude-standard ${(q)opts} -- ${(q)${pref:+$pref\*}:-.} 2>/dev/null)"})


It just lets _multi_parts do the matching.
Performance-wise, it's obviously worse than than your proposed solution, but
still better than the fallback, since it still passes $gitprefix to ls-files,
which should also avoid the CWD problem.

The trigger of the fallback is broken anyways, isn't it?

>  # If ls-files succeeded but returned nothing, try again with no pattern
>  if [[ -z "$files" && -n "$pref" ]]; then

It just considers the case where no match was found at all, but it does not
handle the case where *some* matches were found, but not all.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] Case-insensitive path completion in _git
  2020-12-13 13:55 ` m0viefreak
@ 2020-12-14 10:25   ` dana
  2021-03-27 21:08   ` Daniel Shahaf
  1 sibling, 0 replies; 10+ messages in thread
From: dana @ 2020-12-14 10:25 UTC (permalink / raw)
  To: m0viefreak; +Cc: Zsh hackers list

On 13 Dec 2020, at 07:55, m0viefreak <m0viefreak.cm@googlemail.com> wrote:
> This solution only handles simple upper-lower-case conversion matcher-list styles,
> but it will not work for more complicated styles like 'r:|[._-]=* r:|=*'.
> That's probably fine for most use-cases.

That's true.

On 13 Dec 2020, at 07:55, m0viefreak <m0viefreak.cm@googlemail.com> wrote:
> Performance-wise, it's obviously worse than than your proposed solution, but
> still better than the fallback, since it still passes $gitprefix to ls-files,
> which should also avoid the CWD problem.

Are you sure? It seems to me that $gittoplevel$gitprefix is normally the CWD,
so you'd just be running `git ls-files /path/to/cwd/*`, which will never match
a PREFIX like `../foo`.

I guess we could make it so that it retains any leading segments in PREFIX,
but then you have the same two issues again (those segments have to be matched
case-insensitively, and it doesn't account for more complex matcher-list
styles).

I also just realised that we can't match stuff like `./foo` or `.//foo` or
`../cwd/../`, since `git ls-files` elides unnecessary slashes/segments in its
output.

On 13 Dec 2020, at 07:55, m0viefreak <m0viefreak.cm@googlemail.com> wrote:
> The trigger of the fallback is broken anyways, isn't it? ...
> It just considers the case where no match was found at all, but it does not
> handle the case where *some* matches were found, but not all.

As it is in the repo, yes, actually. If you have like foobar and Foobaz, then
foo<TAB> will return only the former without triggering the fall-back.

dana



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] Case-insensitive path completion in _git
  2020-12-13  4:40 [RFC] Case-insensitive path completion in _git dana
  2020-12-13 13:55 ` m0viefreak
@ 2021-03-27 20:06 ` Lawrence Velázquez
  1 sibling, 0 replies; 10+ messages in thread
From: Lawrence Velázquez @ 2021-03-27 20:06 UTC (permalink / raw)
  To: zsh-workers; +Cc: dana

On Sat, Dec 12, 2020, at 11:40 PM, dana wrote:
> I've had this sitting for a while:
> 
> _git has issues completing files case-insensitively (if you have matcher-list
> 'm:{a-zA-Z}={A-Za-z}' or whatever). Looking into it, i found that __git_files
> is trying to pass a glob pattern to `git ls-files`, and this fails if there's
> not an exact case match, since ls-files is always case-sensitive.
> 
> There is a fall-back to `git ls-files` with no path, but this doesn't always
> work either, because it defaults to the CWD, and the file you're trying to
> complete may not be under the CWD. Even when the fall-back succeeds, it's not
> ideal, because it'll pass every single file in the tree to _multi_parts, which
> can be slow.
> 
> The following hack solves the problem for me, but it might be too silly to
> commit. Can anyone think of a more proper fix? If not, would the hack be
> viable (probably gated behind a style)?

Any further feedback (assuming dana still feels like pushing this, of course)?

vq


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] Case-insensitive path completion in _git
  2020-12-13 13:55 ` m0viefreak
  2020-12-14 10:25   ` dana
@ 2021-03-27 21:08   ` Daniel Shahaf
  2021-03-28  5:07     ` dana
  1 sibling, 1 reply; 10+ messages in thread
From: Daniel Shahaf @ 2021-03-27 21:08 UTC (permalink / raw)
  To: m0viefreak; +Cc: dana, Zsh hackers list

m0viefreak wrote on Sun, Dec 13, 2020 at 14:55:55 +0100:
> 
> 
> On 13.12.2020 05:40, dana wrote:
> > _git has issues completing files case-insensitively (if you have matcher-list
> > 'm:{a-zA-Z}={A-Za-z}' or whatever). Looking into it, i found that __git_files
> > is trying to pass a glob pattern to `git ls-files`, and this fails if there's
> > not an exact case match, since ls-files is always case-sensitive.
> 
> > There is a fall-back to `git ls-files` with no path, but this doesn't always
> > work either, because it defaults to the CWD, and the file you're trying to
> > complete may not be under the CWD. Even when the fall-back succeeds, it's not
> > ideal, because it'll pass every single file in the tree to _multi_parts, which
> > can be slow.
> > 
> > The following hack solves the problem for me, but it might be too silly to
> > commit. Can anyone think of a more proper fix? If not, would the hack be
> > viable (probably gated behind a style)?
> > 
> > dana
> > 
> > ...
> > +  pref=${pref//(#m)[[:alpha:]]/\[${(U)MATCH}${(L)MATCH}\]}
> > ...
> 
> This solution only handles simple upper-lower-case conversion matcher-list styles,
> but it will not work for more complicated styles like 'r:|[._-]=* r:|=*'.
> That's probably fine for most use-cases.

What about the default use-case — no matchspec and case-sensitive
matching?  The patch transforms a plain string to a list of character
classes, and it's not clear to me what costs that might incur on large
trees.

@Lawrence Thanks for fishing these out of the archives ☺

Cheers,

Daniel

> I have been running with the following patch applied locally for quite some time:
> 
> 
> diff --git a/Completion/Unix/Command/_git b/Completion/Unix/Command/_git
> index 81a060e4d..1a6619e31 100644
> --- a/Completion/Unix/Command/_git
> +++ b/Completion/Unix/Command/_git
> @@ -7153,7 +7153,7 @@ __git_files () {
>    # TODO: --directory should probably be added to $opts when --others is given.
> 
>    local pref=${(Q)${~PREFIX}}
> -  [[ $pref[1] == '/' ]] || pref=$gittoplevel$gitprefix$pref
> +  [[ $pref[1] == '/' ]] || pref=$gittoplevel$gitprefix
> 
>    # First allow ls-files to pattern-match in case of remote repository
>    files=(${(0)"$(_call_program files git ls-files -z --exclude-standard ${(q)opts} -- ${(q)${pref:+$pref\*}:-.} 2>/dev/null)"})
> 
> 
> It just lets _multi_parts do the matching.
> Performance-wise, it's obviously worse than than your proposed solution, but
> still better than the fallback, since it still passes $gitprefix to ls-files,
> which should also avoid the CWD problem.
> 
> The trigger of the fallback is broken anyways, isn't it?
> 
> >  # If ls-files succeeded but returned nothing, try again with no pattern
> >  if [[ -z "$files" && -n "$pref" ]]; then
> 
> It just considers the case where no match was found at all, but it does not
> handle the case where *some* matches were found, but not all.
> 
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] Case-insensitive path completion in _git
  2021-03-27 21:08   ` Daniel Shahaf
@ 2021-03-28  5:07     ` dana
  2021-03-28  8:08       ` Oliver Kiddle
  0 siblings, 1 reply; 10+ messages in thread
From: dana @ 2021-03-28  5:07 UTC (permalink / raw)
  To: Daniel Shahaf; +Cc: Zsh hackers list

On 27 Mar 2021, at 16:08, Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> What about the default use-case — no matchspec and case-sensitive
> matching?  The patch transforms a plain string to a list of character
> classes, and it's not clear to me what costs that might incur on large
> trees.

I recently learnt that ls-files actually does support case-insensitive
matching using the following syntax:

  :(icase)/foo/bar/baz

Whether there might still be performance issues with it — i ran a quick test
against the Linux tree and it looks good enough to me, but idk how to make it
conclusive:

  % hyperfine --warmup=3 \
    "git ls-files -- ${(q)PWD}/'*Kbuild*'" \
    "git ls-files -- ${(q)PWD}/'*[Kk][Bb][Uu][Ii][Ll][Dd]*'" \
    "git ls-files -- ':(icase)'${(q)PWD}/'*kbuild*'"

  Benchmark #1: git ls-files -- /home/dana/linux/'*Kbuild*'
    Time (mean ± σ):      27.4 ms ±   0.9 ms    [User: 18.5 ms, System: 8.8 ms]
    Range (min … max):    26.0 ms …  31.9 ms    102 runs

  Benchmark #2: git ls-files -- /home/dana/linux/'*[Kk][Bb][Uu][Ii][Ll][Dd]*'
    Time (mean ± σ):      71.2 ms ±   1.0 ms    [User: 62.9 ms, System: 8.2 ms]
    Range (min … max):    70.1 ms …  76.0 ms    41 runs

  Benchmark #3: git ls-files -- ':(icase)'/home/dana/linux/'*kbuild*'
    Time (mean ± σ):      32.2 ms ±   0.6 ms    [User: 23.1 ms, System: 9.1 ms]
    Range (min … max):    30.7 ms …  33.9 ms    90 runs

  Summary
    'git ls-files -- /home/dana/linux/'*Kbuild*'' ran
      1.18 ± 0.04 times faster than 'git ls-files -- ':(icase)'/home/dana/linux/'*kbuild*''
      2.60 ± 0.09 times faster than 'git ls-files -- /home/dana/linux/'*[Kk][Bb][Uu][Ii][Ll][Dd]*''

It could still be gated behind a style if it's a concern.

dana



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] Case-insensitive path completion in _git
  2021-03-28  5:07     ` dana
@ 2021-03-28  8:08       ` Oliver Kiddle
  2021-03-30  5:59         ` dana
  0 siblings, 1 reply; 10+ messages in thread
From: Oliver Kiddle @ 2021-03-28  8:08 UTC (permalink / raw)
  To: dana; +Cc: Zsh hackers list

dana wrote:
> I recently learnt that ls-files actually does support case-insensitive
> matching using the following syntax:
>
>   :(icase)/foo/bar/baz

That applies to the whole path including the directory components which
I don't thing we'd want in this case - but correct me if I'm wrong.

It doesn't appear to support
  /foo/bar/:(icase)baz
The only reference to the feature I could find was in the top-level
git.1 man page so maybe the syntax is different.
You can also pass --icase-pathspecs or set an environment variable.

>     'git ls-files -- /home/dana/linux/'*Kbuild*'' ran
>       1.18 ± 0.04 times faster than 'git ls-files -- ':(icase)'/home/dana/linux/'*kbuild*''
>       2.60 ± 0.09 times faster than 'git ls-files -- /home/dana/linux/'*[Kk][Bb][Uu][Ii][Ll][Dd]*''

That doesn't look too bad at all but I tend not to work in big trees
like the Linux kernel. :(icase) may be worse for a deeper path where
it adds extra directory matches. Did you run these benchmarks on a
case-sensitive or insensitive file-system?

Strictly speaking, we should use * and rely on zsh completion matching
to do the filtering instead of giving git the contents of $PREFIX
anyway but that'd really slow it.

> It could still be gated behind a style if it's a concern.

I don't think thats necessary. Committing as per the original patch
seems good to me.

Oliver


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] Case-insensitive path completion in _git
  2021-03-28  8:08       ` Oliver Kiddle
@ 2021-03-30  5:59         ` dana
  2021-04-10 20:43           ` Lawrence Velázquez
  0 siblings, 1 reply; 10+ messages in thread
From: dana @ 2021-03-30  5:59 UTC (permalink / raw)
  To: Oliver Kiddle; +Cc: Zsh hackers list

On 28 Mar 2021, at 03:08, Oliver Kiddle <opk@zsh.org> wrote:
> That applies to the whole path including the directory components which
> I don't thing we'd want in this case - but correct me if I'm wrong. ...
> The only reference to the feature I could find was in the top-level
> git.1 man page so maybe the syntax is different.

I found the syntax description under the 'pathspec' section in gitglossary(7).

The documentation doesn't say one way or the other, but this suggests that
icase *doesn't* apply to the whole path:

  % git ls-files -- ":(icase)${(U)PWD}/*Kbuild*"
  fatal: Invalid path '/HOME': No such file or directory

And i found this in git's dir.c, which seems to confirm:

  * Suppose the pathspec is 'foo' and '../bar' running from
  * subdir 'xyz'. The common prefix at #1 will be empty, thanks
  * to "../". We may have xyz/foo _and_ XYZ/foo after #2. The
  * user does not want XYZ/foo, only the "foo" part should be
  * case-insensitive. We need to filter out XYZ/foo here. In
  * other words, we do not trust the caller on comparing the
  * prefix part when :(icase) is involved. We do exact
  * comparison ourselves.

On 28 Mar 2021, at 03:08, Oliver Kiddle <opk@zsh.org> wrote:
> That doesn't look too bad at all but I tend not to work in big trees
> like the Linux kernel. :(icase) may be worse for a deeper path where
> it adds extra directory matches. Did you run these benchmarks on a
> case-sensitive or insensitive file-system?

Case-sensitive (ext4). Not sure if it's still like this, but you used to get
unresolvable conflicts if you checked out the Linux source on a
case-insensitive file system...

On 28 Mar 2021, at 03:08, Oliver Kiddle <opk@zsh.org> wrote:
> I don't think thats necessary. Committing as per the original patch
> seems good to me.

Thanks.

The only other thing i can think of that's worth mentioning is that the first
ls-files call may now return unusable results that nonetheless bypass the
second call. e.g., if you're using case-sensitive matching, and you try to
complete Foo, but you only find foo, the latter will be used to populate
$files, which will bypass the second ls-files call. I don't know if or when
this would be an issue, but it is different.

If anyone can think of a problem with that, please let me know. Otherwise i'll
commit the following in a few days probably

dana


diff --git a/Completion/Unix/Command/_git b/Completion/Unix/Command/_git
index ced89b501..0267acfa8 100644
--- a/Completion/Unix/Command/_git
+++ b/Completion/Unix/Command/_git
@@ -7155,11 +7155,16 @@ __git_files () {
   local pref=${(Q)${~PREFIX}}
   [[ $pref[1] == '/' ]] || pref=$gittoplevel$gitprefix$pref
 
-  # First allow ls-files to pattern-match in case of remote repository
-  files=(${(0)"$(_call_program files git ls-files -z --exclude-standard ${(q)opts} -- ${(q)${pref:+$pref\*}:-.} 2>/dev/null)"})
+  # First allow ls-files to pattern-match in case of remote repository. Use the
+  # icase pathspec magic word to ensure that we support case-insensitive path
+  # completion for users with the appropriate matcher configuration
+  files=(${(0)"$(_call_program files git ls-files -z --exclude-standard ${(q)opts} -- ${(q)${pref:+:\(icase\)$pref\*}:-.} 2>/dev/null)"})
   __git_command_successful $pipestatus || return
 
-  # If ls-files succeeded but returned nothing, try again with no pattern
+  # If ls-files succeeded but returned nothing, try again with no pattern. Note
+  # that ls-files defaults to the CWD if not given a path, so if the file we
+  # were trying to add is in an *adjacent* directory, this won't return anything
+  # helpful either
   if [[ -z "$files" && -n "$pref" ]]; then
     files=(${(0)"$(_call_program files git ls-files -z --exclude-standard ${(q)opts} -- 2>/dev/null)"})
     __git_command_successful $pipestatus || return



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] Case-insensitive path completion in _git
  2021-03-30  5:59         ` dana
@ 2021-04-10 20:43           ` Lawrence Velázquez
  2021-04-10 20:56             ` dana
  0 siblings, 1 reply; 10+ messages in thread
From: Lawrence Velázquez @ 2021-04-10 20:43 UTC (permalink / raw)
  To: dana; +Cc: zsh-workers

On Tue, Mar 30, 2021, at 1:59 AM, dana wrote:
> If anyone can think of a problem with that, please let me know. Otherwise i'll
> commit the following in a few days probably

bump

vq


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC] Case-insensitive path completion in _git
  2021-04-10 20:43           ` Lawrence Velázquez
@ 2021-04-10 20:56             ` dana
  0 siblings, 0 replies; 10+ messages in thread
From: dana @ 2021-04-10 20:56 UTC (permalink / raw)
  To: Lawrence Velázquez; +Cc: zsh-workers

On 10 Apr 2021, at 15:43, Lawrence Velázquez <larryv@zsh.org> wrote:
> bump

Sorry, i had to send my computer away for a bit. This is committed

dana



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-04-10 20:56 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-13  4:40 [RFC] Case-insensitive path completion in _git dana
2020-12-13 13:55 ` m0viefreak
2020-12-14 10:25   ` dana
2021-03-27 21:08   ` Daniel Shahaf
2021-03-28  5:07     ` dana
2021-03-28  8:08       ` Oliver Kiddle
2021-03-30  5:59         ` dana
2021-04-10 20:43           ` Lawrence Velázquez
2021-04-10 20:56             ` dana
2021-03-27 20:06 ` Lawrence Velázquez

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).