zsh-workers
 help / color / mirror / code / Atom feed
* git completion is really slow for some git commands.
@ 2008-10-09 13:01 Brice Figureau
  2008-10-13  0:53 ` [PATCH] " Jörg Sommer
  0 siblings, 1 reply; 5+ messages in thread
From: Brice Figureau @ 2008-10-09 13:01 UTC (permalink / raw)
  To: zsh-workers

Hi,

I'm a long time user of zsh, and I've always been really pleased by its
superior completion system. Right now, I'm running lenny debian package
of 4.3.6.

Unfortunately git completion, although working fine, is extremely slow
as soon as you have a moderately sized git repository. In my case, my
checkout have 13000 files and about 6000 commits.

git log <TAB> takes about 15s using 100% CPU to complete on my 2.4GHZ P4
computer. OK, that's now an old computer, but I find that a little bit
slow to just sort/split 13000 strings.

It all boils down to the following chain of events:
 * git log completion uses __git_files

 * _git_files uses "git ls-files". In itself ls-files on this repository
on cold cache takes about 70ms (which is short, compared to the 15s of
the whole thing).

 * the result of git ls-files is splitted by \0 and stored in a shell
array. This operation takes about 350ms. That's still fast compared to
the 15s of the whole execution time. Note: that's not as fast we would
think it could be.

 * this shell array is then passed to _multi_parts for path splitting of
each element. This is this operation that takes age. As soon as I change
the _multi_parts code to just call a naive compadd and return, the
completion is (almost) immediate, and seems to work fine.

I understand that the issue might be that git ls-files returns _all_ the
files present in the index (ie not only the first level, it recurses the
hierarchy). 

Now, I'm really a zsh completion newbie, so I still fail to see how to
optimize the _git_files operation. Does anybody have an idea to speed up
the multi_parts thing? And Is it really needed?

Many thanks for the answers,
-- 
Brice Figureau <brice+zsh@daysofwonder.com>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] Re: git completion is really slow for some git commands.
  2008-10-09 13:01 git completion is really slow for some git commands Brice Figureau
@ 2008-10-13  0:53 ` Jörg Sommer
  2008-10-13  8:15   ` Brice Figureau
  0 siblings, 1 reply; 5+ messages in thread
From: Jörg Sommer @ 2008-10-13  0:53 UTC (permalink / raw)
  To: zsh-workers

Hello Brice,

Brice Figureau <brice+zsh@daysofwonder.com> wrote:
> Unfortunately git completion, although working fine, is extremely slow
> as soon as you have a moderately sized git repository. In my case, my
> checkout have 13000 files and about 6000 commits.
>
> git log <TAB> takes about 15s using 100% CPU to complete on my 2.4GHZ P4
> computer. OK, that's now an old computer, but I find that a little bit
> slow to just sort/split 13000 strings.
>
> It all boils down to the following chain of events:
>  * git log completion uses __git_files
>
>  * _git_files uses "git ls-files". In itself ls-files on this repository
> on cold cache takes about 70ms (which is short, compared to the 15s of
> the whole thing).
>
>  * the result of git ls-files is splitted by \0 and stored in a shell
> array. This operation takes about 350ms. That's still fast compared to
> the 15s of the whole execution time. Note: that's not as fast we would
> think it could be.
>
>  * this shell array is then passed to _multi_parts for path splitting of
> each element. This is this operation that takes age. As soon as I change
> the _multi_parts code to just call a naive compadd and return, the
> completion is (almost) immediate, and seems to work fine.

Can you try this patch? It doesn't change anything if you didn't specify
anything, i.e. git log -- <TAB> takes still very long. But it optimizes
the case when you specify anything. Try git log -- some/thing<TAB>.

commit e985683fe8d805228ad88903261f20181de23f1e
Author: Jörg Sommer <joerg@alea.gnuu.de>
Date:   Mon Oct 13 01:58:03 2008 +0200

    Prefilter the completion for _multi_parts
    
    The _multi_parts function consumes very much time, if the array with the
    possible completions is large. This happens often in large git
    repositories like the kernel git repository. To reduce the workload of
    _multi_parts filter out does entries from the array, they aren't possible
    completions. When the user specifies the path foo/bar only consider paths
    matching the pattern foo*/bar*.

diff --git a/Completion/Unix/Command/_git b/Completion/Unix/Command/_git
index c617613..5fd637a 100644
--- a/Completion/Unix/Command/_git
+++ b/Completion/Unix/Command/_git
@@ -2761,6 +2761,7 @@ __git_files () {
   files=(${(ps:\0:)"$(_call_program files git ls-files -z $ls_opts $opts 2>/dev/null)"})
   __git_command_successful || return
 
+  [[ -n $PREFIX ]] && files=${(M)files:#${~PREFIX//\//*/}*}
   _wanted files expl 'index file' _multi_parts $@ - / files
 }
 
@@ -2859,6 +2860,7 @@ __git_tree_files () {
   fi
 
   local expl
+  [[ -n $PREFIX ]] && tree_files=${(M)tree_files:#${~PREFIX//\//*/}*}
   _wanted files expl 'tree file' _multi_parts -f $@ -- / tree_files
 }
 

Question to everybody else: Should such a filter go to _multi_parts
itself?

Bye, Jörg.
-- 
Alles, wovor wir Angst haben müssen, ist die Angst selbst.
       	     	       	     	     	 (Franklin D. Roosevelt)


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Re: git completion is really slow for some git commands.
  2008-10-13  0:53 ` [PATCH] " Jörg Sommer
@ 2008-10-13  8:15   ` Brice Figureau
  2008-10-13 15:01     ` Jörg Sommer
  0 siblings, 1 reply; 5+ messages in thread
From: Brice Figureau @ 2008-10-13  8:15 UTC (permalink / raw)
  To: Jörg Sommer; +Cc: zsh-workers

Hi Jörg,


On Mon, 2008-10-13 at 00:53 +0000, Jörg Sommer wrote:
> >  * this shell array is then passed to _multi_parts for path splitting of
> > each element. This is this operation that takes age. As soon as I change
> > the _multi_parts code to just call a naive compadd and return, the
> > completion is (almost) immediate, and seems to work fine.
> 
> Can you try this patch? It doesn't change anything if you didn't specify
> anything, i.e. git log -- <TAB> takes still very long. But it optimizes
> the case when you specify anything. Try git log -- some/thing<TAB>.

[snipped patch]

Yes, that works way faster for this case. Unfortunately it doesn't seem
to report the right results:

when I try:
git log c<TAB>
it show in the completion menu:
---- index file
cleopatragame.com
---- branch
cdn

While I do have a cdn branch, I also happen to have more than one
file/directory starting with a 'c' in the index (there is common, and
colosseumgame.com). It's exactly as it was giving only the first choice.

I looked to your patch, and tried it outside of the completion system
and it was working fine (ie returning only results starting with 'c'),
so I'm puzzled.

At least it is a magnitude more faster :-)
You are definitely on the right track.
Many thanks,
-- 
Brice Figureau <brice+zsh@daysofwonder.com>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Re: git completion is really slow for some git commands.
  2008-10-13  8:15   ` Brice Figureau
@ 2008-10-13 15:01     ` Jörg Sommer
  2008-10-13 15:45       ` Brice Figureau
  0 siblings, 1 reply; 5+ messages in thread
From: Jörg Sommer @ 2008-10-13 15:01 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 2312 bytes --]

Hello Brice,

Brice Figureau schrieb am Mon 13. Oct, 10:15 (+0200):
> On Mon, 2008-10-13 at 00:53 +0000, Jörg Sommer wrote:
> > >  * this shell array is then passed to _multi_parts for path splitting of
> > > each element. This is this operation that takes age. As soon as I change
> > > the _multi_parts code to just call a naive compadd and return, the
> > > completion is (almost) immediate, and seems to work fine.
> > 
> > Can you try this patch? It doesn't change anything if you didn't specify
> > anything, i.e. git log -- <TAB> takes still very long. But it optimizes
> > the case when you specify anything. Try git log -- some/thing<TAB>.
> 
> [snipped patch]
> 
> Yes, that works way faster for this case. Unfortunately it doesn't seem
> to report the right results:

I've forgot to put the result back in an array. Try this patch:

commit e8536069c36e8ae310e249356274a398bf5bd38d
Author: Jörg Sommer <joerg@alea.gnuu.de>
Date:   Mon Oct 13 01:58:03 2008 +0200

    Prefilter the completion for _multi_parts
    
    The _multi_parts function consumes very much time, if the array with the
    possible completions is large. This happens often in large git
    repositories like the kernel git repository. To reduce the workload of
    _multi_parts filter out does entries from the array, they aren't possible
    completions. When the user specifies the path foo/bar only consider paths
    matching the pattern foo*/bar*.

diff --git a/Completion/Unix/Command/_git b/Completion/Unix/Command/_git
index c617613..b6f5297 100644
--- a/Completion/Unix/Command/_git
+++ b/Completion/Unix/Command/_git
@@ -2761,6 +2761,7 @@ __git_files () {
   files=(${(ps:\0:)"$(_call_program files git ls-files -z $ls_opts $opts 2>/dev/null)"})
   __git_command_successful || return
 
+  [[ -n $PREFIX ]] && files=(${(M)files:#${~PREFIX//\//*/}*})
   _wanted files expl 'index file' _multi_parts $@ - / files
 }
 
@@ -2859,6 +2860,7 @@ __git_tree_files () {
   fi
 
   local expl
+  [[ -n $PREFIX ]] && tree_files=(${(M)tree_files:#${~PREFIX//\//*/}*})
   _wanted files expl 'tree file' _multi_parts -f $@ -- / tree_files
 }
 

Bye, Jörg.
-- 
Objektivität ist die Wahnvorstellung, Beobachtungen könnten ohne
Beobachter gemacht werden – Heinz v. Foerster

[-- Attachment #2: Digital signature http://en.wikipedia.org/wiki/OpenPGP --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Re: git completion is really slow for some git commands.
  2008-10-13 15:01     ` Jörg Sommer
@ 2008-10-13 15:45       ` Brice Figureau
  0 siblings, 0 replies; 5+ messages in thread
From: Brice Figureau @ 2008-10-13 15:45 UTC (permalink / raw)
  To: zsh-workers

On Mon, 2008-10-13 at 17:01 +0200, Jrg Sommer wrote:
> Hello Brice,
> 
> Brice Figureau schrieb am Mon 13. Oct, 10:15 (+0200):
> > On Mon, 2008-10-13 at 00:53 +0000, Jrg Sommer wrote:
> > > >  * this shell array is then passed to _multi_parts for path
splitting of
> > > > each element. This is this operation that takes age. As soon as
I change
> > > > the _multi_parts code to just call a naive compadd and return,
the
> > > > completion is (almost) immediate, and seems to work fine.
> > > 
> > > Can you try this patch? It doesn't change anything if you didn't
specify
> > > anything, i.e. git log -- <TAB> takes still very long. But it
optimizes
> > > the case when you specify anything. Try git log --
some/thing<TAB>.
> > 
> > [snipped patch]
> > 
> > Yes, that works way faster for this case. Unfortunately it doesn't
seem
> > to report the right results:
> 
> I've forgot to put the result back in an array. Try this patch:
[snipped patch]

Yes, this fixes the issue, and that's a good enhancement!

Unfortunately if you have plenty of completion possibility under the
same prefix, you hit again the root issue that is that _multi_part is
pretty slow when fed with large arrays. I didn't had a look to
_multi_part in detail yet but I think it is at least an O(n2) kind of
algorithm. I'll try to have a look to the algorithm and see if maybe
there is something that we can do for it.

Thanks for you patch, I'm running with it now :-)
-- 
Brice Figureau <brice+zsh@daysofwonder.com>


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2008-10-13 15:45 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-10-09 13:01 git completion is really slow for some git commands Brice Figureau
2008-10-13  0:53 ` [PATCH] " Jörg Sommer
2008-10-13  8:15   ` Brice Figureau
2008-10-13 15:01     ` Jörg Sommer
2008-10-13 15:45       ` Brice Figureau

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).