zsh-workers
 help / color / mirror / code / Atom feed
* Performance of _store_cache and _retrieve_cache
@ 2015-02-08 16:19 Daniel Hahler
  2015-02-08 18:27 ` Daniel Hahler
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Daniel Hahler @ 2015-02-08 16:19 UTC (permalink / raw)
  To: Zsh Hackers' List

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

I've noticed that the completion systems cache mechanism
(_retrieve_cache and _store_cache) is slow with large lists (~50000).

_store_cache saves the array like this:

    _zsh_all_pkgs=( '02exercicio' '0x10c-asm'  ... )

and _retrieve_cache then sources it from a file.

The problem is that `source ./pip_allpkgs.slow` takes about 8 seconds,
and is slower than generating the list anew!


When converting the list to be line-separated, the following is much
faster (less than a second):

   _zsh_all_pkgs=(${(f)"$(<pip_allpkgs)"})

This also applies to using the "formatted"/"typed" source file as-is:
Even when using the slow list as is:

   _zsh_all_pkgs=(${$(<pip_allpkgs.slow)})


The initial list is generated using:

      _zsh_all_pkgs=( $(curl -s https://pypi.python.org/simple/ \
        | sed -n '/<a href/ s/.*>\([^<]\{1,\}\).*/\1/p' \
        | tr '\n' ' ') )


Regards,
Daniel.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iD8DBQFU14yofAK/hT/mPgARAkEFAKD2P5vOvKLwOQffWmq8OTTCTxFHTQCfQzzL
bqUGpSQvauen9qgus9FB5rQ=
=72Ab
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Performance of _store_cache and _retrieve_cache
  2015-02-08 16:19 Performance of _store_cache and _retrieve_cache Daniel Hahler
@ 2015-02-08 18:27 ` Daniel Hahler
  2015-02-08 19:14 ` Bart Schaefer
  2015-02-08 20:27 ` Bart Schaefer
  2 siblings, 0 replies; 9+ messages in thread
From: Daniel Hahler @ 2015-02-08 18:27 UTC (permalink / raw)
  To: Zsh Hackers' List

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I have profiled this a bit, using:

    % valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes --collect-jumps=yes Src/zsh -f
    ==1513== Callgrind, a call-graph generating cache profiler
    ==1513== Copyright (C) 2002-2013, and GNU GPL'd, by Josef Weidendorfer et al.
    ==1513== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
    ==1513== Command: Src/zsh -f
    ==1513== 
    --1513-- warning: L3 cache found, using its data for the LL simulation.
    ==1513== For interactive control, run 'callgrind_control -h'.
    lenny% source ~/.zcompcache/pip_allpkgs.slow  
    lenny% 
    ==1513== 
    ==1513== Events    : Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw
    ==1513== Collected : 16367431118 6050585484 1493446409 47422 484447632 867383 8162 142249 187982
    ==1513== 
    ==1513== I   refs:      16,367,431,118
    ==1513== I1  misses:            47,422
    ==1513== LLi misses:             8,162
    ==1513== I1  miss rate:            0.0%
    ==1513== LLi miss rate:            0.0%
    ==1513== 
    ==1513== D   refs:       7,544,031,893  (6,050,585,484 rd + 1,493,446,409 wr)
    ==1513== D1  misses:       485,315,015  (  484,447,632 rd +       867,383 wr)
    ==1513== LLd misses:           330,231  (      142,249 rd +       187,982 wr)
    ==1513== D1  miss rate:            6.4% (          8.0%   +           0.0%  )
    ==1513== LLd miss rate:            0.0% (          0.0%   +           0.0%  )
    ==1513== 
    ==1513== LL refs:          485,362,437  (  484,495,054 rd +       867,383 wr)
    ==1513== LL misses:            338,393  (      150,411 rd +       187,982 wr)
    ==1513== LL miss rate:             0.0% (          0.0%   +           0.0%  )
    valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes  Src/zsh -f  491,95s user 0,33s system 92% cpu 8:52,36 total


A screenshot of kcachegrind displaying the hot spot is available at:
http://i.imgur.com/8ntTLUQ.png

I can also provide the callgrind.out.1513 file itself, if this helps.

- From Src/parse.c, line 390:

        for (pp = &ecstrs; (p = *pp); ) {

This following condition is never true (but the most expensive):

            if (!(cmp = p->nfunc - ecnfunc) && !(cmp = strcmp(p->str, s)))
          > 286166892 call(s) to '__strcmp_ssse3' (libc-2.19.so: strcmp.S)
          > Jumping 286 166 892 times to parse.c:393 with 286 166 892 executions
              return p->offs;
            pp = (cmp < 0 ? &(p->left) : &(p->right));

Thanks,
Daniel.

On 08.02.2015 17:19, Daniel Hahler wrote:
> Hi,
> 
> I've noticed that the completion systems cache mechanism
> (_retrieve_cache and _store_cache) is slow with large lists (~50000).
> 
> _store_cache saves the array like this:
> 
>     _zsh_all_pkgs=( '02exercicio' '0x10c-asm'  ... )
> 
> and _retrieve_cache then sources it from a file.
> 
> The problem is that `source ./pip_allpkgs.slow` takes about 8 seconds,
> and is slower than generating the list anew!
> 
> 
> When converting the list to be line-separated, the following is much
> faster (less than a second):
> 
>    _zsh_all_pkgs=(${(f)"$(<pip_allpkgs)"})
> 
> This also applies to using the "formatted"/"typed" source file as-is:
> Even when using the slow list as is:
> 
>    _zsh_all_pkgs=(${$(<pip_allpkgs.slow)})
> 
> 
> The initial list is generated using:
> 
>       _zsh_all_pkgs=( $(curl -s https://pypi.python.org/simple/ \
>         | sed -n '/<a href/ s/.*>\([^<]\{1,\}\).*/\1/p' \
>         | tr '\n' ' ') )
> 
> 
> Regards,
> Daniel.
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iD8DBQFU16qtfAK/hT/mPgARAgNAAJ9y//ybvVz0MPwu9XzxC/6/js2PSACeLgQp
vWt3CCIPbOOeaD0+I0flWWg=
=5aY6
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Performance of _store_cache and _retrieve_cache
  2015-02-08 16:19 Performance of _store_cache and _retrieve_cache Daniel Hahler
  2015-02-08 18:27 ` Daniel Hahler
@ 2015-02-08 19:14 ` Bart Schaefer
  2015-02-08 20:27 ` Bart Schaefer
  2 siblings, 0 replies; 9+ messages in thread
From: Bart Schaefer @ 2015-02-08 19:14 UTC (permalink / raw)
  To: Zsh Hackers' List

On Feb 8,  5:19pm, Daniel Hahler wrote:
} Subject: Performance of _store_cache and _retrieve_cache
} 
} I've noticed that the completion systems cache mechanism
} (_retrieve_cache and _store_cache) is slow with large lists (~50000).

[...]

} The problem is that `source ./pip_allpkgs.slow` takes about 8 seconds,
} and is slower than generating the list anew!

This might be addressed by having a policy check at _store_cache time as
well as at _retrieve_cache.  (Check a different policy, that is.)

I've been wondering whether the cache mechanism should use zstyle
rather than the parameter space.  Unfortunately the current interface
encourages this type of thing:

  if ( [[ ${+_zypp_all_raw} -eq 0 ]] || _cache_invalid ZYPPER_ALL_RAW ) &&
     ! _retrieve_cache ZYPPER_ALL_RAW;
  then

or

  if [[ -n $state ]] && (( ! $+_svn_cmds )); then
    typeset -gHA _svn_cmds
    if _cache_invalid svn-cmds || ! _retrieve_cache svn-cmds; then

I.e., the caller is aware that the cache is stored in a variable and
therefore explicitly tests the variable as well as _cache_invalid.
That makes it a little difficult to change the backend.

Further the _cache_invalid tests there are actually redundant, it'll be
called again by _retrieve_cache.  So there's a bunch of cleanup to be
done in the use cases before the backend could be altered.

-- 
Barton E. Schaefer


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Performance of _store_cache and _retrieve_cache
  2015-02-08 16:19 Performance of _store_cache and _retrieve_cache Daniel Hahler
  2015-02-08 18:27 ` Daniel Hahler
  2015-02-08 19:14 ` Bart Schaefer
@ 2015-02-08 20:27 ` Bart Schaefer
  2015-02-09  2:20   ` Slow parsing of large array assignments Bart Schaefer
                     ` (2 more replies)
  2 siblings, 3 replies; 9+ messages in thread
From: Bart Schaefer @ 2015-02-08 20:27 UTC (permalink / raw)
  To: Zsh Hackers' List

On Feb 8,  5:19pm, Daniel Hahler wrote:
}
} _store_cache saves the array like this:
} 
}     _zsh_all_pkgs=( '02exercicio' '0x10c-asm'  ... )
} 
} The problem is that `source ./pip_allpkgs.slow` takes about 8 seconds,
} and is slower than generating the list anew!

The slowdown here appears to be with compiling the source'd file into
the internal wordcode format before executing it.  Even dumping the
whole assignment as a single string and then using "eval" on that, is
faster than allowing "source" to parse the words directly.  We may
want to dig further into why that is the case.

The following is a LOT faster than either "source" or "eval", but may
require recent changes that fix bugs in the parsing of $(...) :

_zsh_all_pkgs=( "${(zQ)$(<<\EO:_zsh_all_pkgs
'02exercicio' '0x10c-asm' ...
EO:_zsh_all_pkgs
)}" )


diff --git a/Completion/Base/Utility/_store_cache b/Completion/Base/Utility/_store_cache
index 86e72e9..8feaee6 100644
--- a/Completion/Base/Utility/_store_cache
+++ b/Completion/Base/Utility/_store_cache
@@ -46,8 +46,15 @@ if zstyle -t ":completion:${curcontext}:" use-cache; then
   for var; do
     case ${(Pt)var} in
     (*readonly*) ;;
-    (*(association|array)*) print -r "$var=( ${(kv@Pqq)^^var} )";;
-    (*)                     print -r "$var=${(Pqq)^^var}";;
+    (*(association|array)*)
+	# Dump the array as a here-document to reduce parsing overhead
+	# when reloading the cache with "source" from _retrieve_cache
+	print -r "$var=( "'"${(zQ)$(<<\EO:'"$var"
+	print -r "${(kv@Pqq)^^var}"
+	print -r "EO:$var"
+	print -r ')}" )'
+	;;
+    (*) print -r "$var=${(Pqq)^^var}";;
     esac
   done >! "$_cache_dir/$_cache_ident"
 else


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Slow parsing of large array assignments
  2015-02-08 20:27 ` Bart Schaefer
@ 2015-02-09  2:20   ` Bart Schaefer
  2015-05-23  1:10   ` Regression with completion cache (was: Re: Performance of _store_cache and _retrieve_cache) Daniel Hahler
  2015-05-27 22:10   ` Performance of _store_cache and _retrieve_cache Oliver Kiddle
  2 siblings, 0 replies; 9+ messages in thread
From: Bart Schaefer @ 2015-02-09  2:20 UTC (permalink / raw)
  To: Zsh Hackers' List

On Feb 8, 12:27pm, Bart Schaefer wrote:
}
} The slowdown here appears to be with compiling the source'd file into
} the internal wordcode format before executing it.  Even dumping the
} whole assignment as a single string and then using "eval" on that, is
} faster than allowing "source" to parse the words directly.  We may
} want to dig further into why that is the case.

The "slow" routine is parse.c:par_nl_wordlist(), which calls zshlex()
repeatedly [once for every word in the array].  This is slower when
reading from a file than when reading from a string in memory, which
explains the difference between "source" and "eval" for me.

It's also a lot slower than bufferwords(), which implements the (z)
flag for parameter splitting.

Finally, par_nl_wordlist() in the current revision of zsh seems to take
longer than in 4.2.0 or 5.0.7 for the same input.

Over 10 runs I get these average times sourcing a file that assigns
approximately 55000 words to an array:

zsh 4.2.0:                    3.91s
zsh-5.0.7 (release version):  4.85s
zsh-5.0.7-289-g7d15b9a:       7.47s

I believe the two zsh-5.0.7 were compiled with identical flags, though
the 5.0.7 binary is stripped; I don't know about the old 4.2.0.  If I
re-compile with -O in CFLAGS, g7d15b9a speeds up to 4.95s average.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Regression with completion cache (was: Re: Performance of _store_cache and _retrieve_cache)
  2015-02-08 20:27 ` Bart Schaefer
  2015-02-09  2:20   ` Slow parsing of large array assignments Bart Schaefer
@ 2015-05-23  1:10   ` Daniel Hahler
  2015-05-27 22:10   ` Performance of _store_cache and _retrieve_cache Oliver Kiddle
  2 siblings, 0 replies; 9+ messages in thread
From: Daniel Hahler @ 2015-05-23  1:10 UTC (permalink / raw)
  To: Zsh Hackers' List

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

the patch from 34476 (ae7dcab) breaks the caching for the _docker completion.

- From https://github.com/docker/docker/blob/master/contrib/completion/zsh/_docker#L177:

  __docker_commands () {
      # local -a  _docker_subcommands
      local cache_policy
  
      zstyle -s ":completion:${curcontext}:" cache-policy cache_policy
      if [[ -z "$cache_policy" ]]; then
          zstyle ":completion:${curcontext}:" cache-policy __docker_caching_policy
      fi
  
      if ( [[ ${+_docker_subcommands} -eq 0 ]] || _cache_invalid docker_subcommands) \
          && ! _retrieve_cache docker_subcommands;
      then
          local -a lines
          lines=(${(f)"$(_call_program commands docker 2>&1)"})
          _docker_subcommands=(${${${lines[$((${lines[(i)Commands:]} + 1)),${lines[(I)    *]}]}## #}/ ##/:})
          _docker_subcommands=($_docker_subcommands 'help:Show help for a command')
          _store_cache docker_subcommands _docker_subcommands
      fi
      _describe -t docker-commands "docker command" _docker_subcommands
  }

The cache being used is not the same as when storing it.

"docker ps" completes to "docker stops," then: "ps" is not in the cache,
and "stops," should be a description.

(backing out this commit made it work again)

The cache file has a line which is 1827 bytes long, which appears to be
an issue: removing some of the first entries makes "docker ps" complete
again.  Is there some 1024 bytes limit or something similar?

The cache file looks like this:

% cat .zcompcache/docker_subcommands
_docker_subcommands=( "${(zQ)$(<<\EO:_docker_subcommands
'attach:Attach to a running container' 'build:Build an image from a Dockerfile' … 'help:Show help for a command'
EO:_docker_subcommands
)}" )


Regards,
Daniel.

On 08.02.2015 21:27, Bart Schaefer wrote:
> On Feb 8,  5:19pm, Daniel Hahler wrote:
> }
> } _store_cache saves the array like this:
> } 
> }     _zsh_all_pkgs=( '02exercicio' '0x10c-asm'  ... )
> } 
> } The problem is that `source ./pip_allpkgs.slow` takes about 8 seconds,
> } and is slower than generating the list anew!
> 
> The slowdown here appears to be with compiling the source'd file into
> the internal wordcode format before executing it.  Even dumping the
> whole assignment as a single string and then using "eval" on that, is
> faster than allowing "source" to parse the words directly.  We may
> want to dig further into why that is the case.
> 
> The following is a LOT faster than either "source" or "eval", but may
> require recent changes that fix bugs in the parsing of $(...) :
> 
> _zsh_all_pkgs=( "${(zQ)$(<<\EO:_zsh_all_pkgs
> '02exercicio' '0x10c-asm' ...
> EO:_zsh_all_pkgs
> )}" )
> 
> 
> diff --git a/Completion/Base/Utility/_store_cache b/Completion/Base/Utility/_store_cache
> index 86e72e9..8feaee6 100644
> --- a/Completion/Base/Utility/_store_cache
> +++ b/Completion/Base/Utility/_store_cache
> @@ -46,8 +46,15 @@ if zstyle -t ":completion:${curcontext}:" use-cache; then
>    for var; do
>      case ${(Pt)var} in
>      (*readonly*) ;;
> -    (*(association|array)*) print -r "$var=( ${(kv@Pqq)^^var} )";;
> -    (*)                     print -r "$var=${(Pqq)^^var}";;
> +    (*(association|array)*)
> +	# Dump the array as a here-document to reduce parsing overhead
> +	# when reloading the cache with "source" from _retrieve_cache
> +	print -r "$var=( "'"${(zQ)$(<<\EO:'"$var"
> +	print -r "${(kv@Pqq)^^var}"
> +	print -r "EO:$var"
> +	print -r ')}" )'
> +	;;
> +    (*) print -r "$var=${(Pqq)^^var}";;
>      esac
>    done >! "$_cache_dir/$_cache_ident"
>  else
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iD8DBQFVX9OifAK/hT/mPgARAgxeAKDFsc0cg+pzpQ5xmF99D78nq3EqrQCfdpU9
r5vXtuoDJrVpTtuZ6lOcdNs=
=xe31
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Performance of _store_cache and _retrieve_cache
  2015-02-08 20:27 ` Bart Schaefer
  2015-02-09  2:20   ` Slow parsing of large array assignments Bart Schaefer
  2015-05-23  1:10   ` Regression with completion cache (was: Re: Performance of _store_cache and _retrieve_cache) Daniel Hahler
@ 2015-05-27 22:10   ` Oliver Kiddle
  2015-05-28 22:18     ` Bart Schaefer
  2 siblings, 1 reply; 9+ messages in thread
From: Oliver Kiddle @ 2015-05-27 22:10 UTC (permalink / raw)
  To: Zsh Hackers' List

On 8 Feb, Bart wrote:
> _zsh_all_pkgs=( "${(zQ)$(<<\EO:_zsh_all_pkgs
> '02exercicio' '0x10c-asm' ...
> EO:_zsh_all_pkgs
> )}" )

This is removing the quotes before doing the word splitting so breaks
if there are values that need quoting. It works with another level of
nesting as in the patch below. I timed some tests and this is still much
faster than the original given a large enough array.

For a good example of this failing, set the extra-verbose style and do
command completion. Note that you'll only see problems when
_command_descriptions is loaded from the cache.

Oliver

diff --git a/Completion/Base/Utility/_store_cache b/Completion/Base/Utility/_store_cache
index 8feaee6..fb2ab32 100644
--- a/Completion/Base/Utility/_store_cache
+++ b/Completion/Base/Utility/_store_cache
@@ -49,10 +49,10 @@ if zstyle -t ":completion:${curcontext}:" use-cache; then
     (*(association|array)*)
 	# Dump the array as a here-document to reduce parsing overhead
 	# when reloading the cache with "source" from _retrieve_cache
-	print -r "$var=( "'"${(zQ)$(<<\EO:'"$var"
+	print -r "$var=( "'${(Q)"${(z)$(<<\EO:'"$var"
 	print -r "${(kv@Pqq)^^var}"
 	print -r "EO:$var"
-	print -r ')}" )'
+	print -r ')}"} )'
 	;;
     (*) print -r "$var=${(Pqq)^^var}";;
     esac


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Performance of _store_cache and _retrieve_cache
  2015-05-27 22:10   ` Performance of _store_cache and _retrieve_cache Oliver Kiddle
@ 2015-05-28 22:18     ` Bart Schaefer
  2015-05-29 11:13       ` Peter Stephenson
  0 siblings, 1 reply; 9+ messages in thread
From: Bart Schaefer @ 2015-05-28 22:18 UTC (permalink / raw)
  To: Zsh Hackers' List

So ... this applies on top of ae7dcab5ed1882b6115c9fe3ab7894a8b6a7e9eb but
not with 899613f5351642c8257401054fef5f25725af172.

Are we sticking with the reverted version for release, or applying 35310
atop 34476?


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Performance of _store_cache and _retrieve_cache
  2015-05-28 22:18     ` Bart Schaefer
@ 2015-05-29 11:13       ` Peter Stephenson
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Stephenson @ 2015-05-29 11:13 UTC (permalink / raw)
  To: Zsh Hackers' List

On Thu, 28 May 2015 15:18:44 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:
> So ... this applies on top of ae7dcab5ed1882b6115c9fe3ab7894a8b6a7e9eb but
> not with 899613f5351642c8257401054fef5f25725af172.
> 
> Are we sticking with the reverted version for release, or applying 35310
> atop 34476?

If that fixes the problem, applying them both would be fine.

pws


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-05-29 11:13 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-08 16:19 Performance of _store_cache and _retrieve_cache Daniel Hahler
2015-02-08 18:27 ` Daniel Hahler
2015-02-08 19:14 ` Bart Schaefer
2015-02-08 20:27 ` Bart Schaefer
2015-02-09  2:20   ` Slow parsing of large array assignments Bart Schaefer
2015-05-23  1:10   ` Regression with completion cache (was: Re: Performance of _store_cache and _retrieve_cache) Daniel Hahler
2015-05-27 22:10   ` Performance of _store_cache and _retrieve_cache Oliver Kiddle
2015-05-28 22:18     ` Bart Schaefer
2015-05-29 11:13       ` Peter Stephenson

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).