* Performance of _store_cache and _retrieve_cache @ 2015-02-08 16:19 Daniel Hahler 2015-02-08 18:27 ` Daniel Hahler ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Daniel Hahler @ 2015-02-08 16:19 UTC (permalink / raw) To: Zsh Hackers' List -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, I've noticed that the completion systems cache mechanism (_retrieve_cache and _store_cache) is slow with large lists (~50000). _store_cache saves the array like this: _zsh_all_pkgs=( '02exercicio' '0x10c-asm' ... ) and _retrieve_cache then sources it from a file. The problem is that `source ./pip_allpkgs.slow` takes about 8 seconds, and is slower than generating the list anew! When converting the list to be line-separated, the following is much faster (less than a second): _zsh_all_pkgs=(${(f)"$(<pip_allpkgs)"}) This also applies to using the "formatted"/"typed" source file as-is: Even when using the slow list as is: _zsh_all_pkgs=(${$(<pip_allpkgs.slow)}) The initial list is generated using: _zsh_all_pkgs=( $(curl -s https://pypi.python.org/simple/ \ | sed -n '/<a href/ s/.*>\([^<]\{1,\}\).*/\1/p' \ | tr '\n' ' ') ) Regards, Daniel. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iD8DBQFU14yofAK/hT/mPgARAkEFAKD2P5vOvKLwOQffWmq8OTTCTxFHTQCfQzzL bqUGpSQvauen9qgus9FB5rQ= =72Ab -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Performance of _store_cache and _retrieve_cache 2015-02-08 16:19 Performance of _store_cache and _retrieve_cache Daniel Hahler @ 2015-02-08 18:27 ` Daniel Hahler 2015-02-08 19:14 ` Bart Schaefer 2015-02-08 20:27 ` Bart Schaefer 2 siblings, 0 replies; 9+ messages in thread From: Daniel Hahler @ 2015-02-08 18:27 UTC (permalink / raw) To: Zsh Hackers' List -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I have profiled this a bit, using: % valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes --collect-jumps=yes Src/zsh -f ==1513== Callgrind, a call-graph generating cache profiler ==1513== Copyright (C) 2002-2013, and GNU GPL'd, by Josef Weidendorfer et al. ==1513== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info ==1513== Command: Src/zsh -f ==1513== --1513-- warning: L3 cache found, using its data for the LL simulation. ==1513== For interactive control, run 'callgrind_control -h'. lenny% source ~/.zcompcache/pip_allpkgs.slow lenny% ==1513== ==1513== Events : Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw ==1513== Collected : 16367431118 6050585484 1493446409 47422 484447632 867383 8162 142249 187982 ==1513== ==1513== I refs: 16,367,431,118 ==1513== I1 misses: 47,422 ==1513== LLi misses: 8,162 ==1513== I1 miss rate: 0.0% ==1513== LLi miss rate: 0.0% ==1513== ==1513== D refs: 7,544,031,893 (6,050,585,484 rd + 1,493,446,409 wr) ==1513== D1 misses: 485,315,015 ( 484,447,632 rd + 867,383 wr) ==1513== LLd misses: 330,231 ( 142,249 rd + 187,982 wr) ==1513== D1 miss rate: 6.4% ( 8.0% + 0.0% ) ==1513== LLd miss rate: 0.0% ( 0.0% + 0.0% ) ==1513== ==1513== LL refs: 485,362,437 ( 484,495,054 rd + 867,383 wr) ==1513== LL misses: 338,393 ( 150,411 rd + 187,982 wr) ==1513== LL miss rate: 0.0% ( 0.0% + 0.0% ) valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes Src/zsh -f 491,95s user 0,33s system 92% cpu 8:52,36 total A screenshot of kcachegrind displaying the hot spot is available at: http://i.imgur.com/8ntTLUQ.png I can also provide the callgrind.out.1513 file itself, if this helps. - From Src/parse.c, line 390: for (pp = &ecstrs; (p = *pp); ) { This following condition is never true (but the most expensive): if (!(cmp = p->nfunc - ecnfunc) && !(cmp = strcmp(p->str, s))) > 286166892 call(s) to '__strcmp_ssse3' (libc-2.19.so: strcmp.S) > Jumping 286 166 892 times to parse.c:393 with 286 166 892 executions return p->offs; pp = (cmp < 0 ? &(p->left) : &(p->right)); Thanks, Daniel. On 08.02.2015 17:19, Daniel Hahler wrote: > Hi, > > I've noticed that the completion systems cache mechanism > (_retrieve_cache and _store_cache) is slow with large lists (~50000). > > _store_cache saves the array like this: > > _zsh_all_pkgs=( '02exercicio' '0x10c-asm' ... ) > > and _retrieve_cache then sources it from a file. > > The problem is that `source ./pip_allpkgs.slow` takes about 8 seconds, > and is slower than generating the list anew! > > > When converting the list to be line-separated, the following is much > faster (less than a second): > > _zsh_all_pkgs=(${(f)"$(<pip_allpkgs)"}) > > This also applies to using the "formatted"/"typed" source file as-is: > Even when using the slow list as is: > > _zsh_all_pkgs=(${$(<pip_allpkgs.slow)}) > > > The initial list is generated using: > > _zsh_all_pkgs=( $(curl -s https://pypi.python.org/simple/ \ > | sed -n '/<a href/ s/.*>\([^<]\{1,\}\).*/\1/p' \ > | tr '\n' ' ') ) > > > Regards, > Daniel. > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iD8DBQFU16qtfAK/hT/mPgARAgNAAJ9y//ybvVz0MPwu9XzxC/6/js2PSACeLgQp vWt3CCIPbOOeaD0+I0flWWg= =5aY6 -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Performance of _store_cache and _retrieve_cache 2015-02-08 16:19 Performance of _store_cache and _retrieve_cache Daniel Hahler 2015-02-08 18:27 ` Daniel Hahler @ 2015-02-08 19:14 ` Bart Schaefer 2015-02-08 20:27 ` Bart Schaefer 2 siblings, 0 replies; 9+ messages in thread From: Bart Schaefer @ 2015-02-08 19:14 UTC (permalink / raw) To: Zsh Hackers' List On Feb 8, 5:19pm, Daniel Hahler wrote: } Subject: Performance of _store_cache and _retrieve_cache } } I've noticed that the completion systems cache mechanism } (_retrieve_cache and _store_cache) is slow with large lists (~50000). [...] } The problem is that `source ./pip_allpkgs.slow` takes about 8 seconds, } and is slower than generating the list anew! This might be addressed by having a policy check at _store_cache time as well as at _retrieve_cache. (Check a different policy, that is.) I've been wondering whether the cache mechanism should use zstyle rather than the parameter space. Unfortunately the current interface encourages this type of thing: if ( [[ ${+_zypp_all_raw} -eq 0 ]] || _cache_invalid ZYPPER_ALL_RAW ) && ! _retrieve_cache ZYPPER_ALL_RAW; then or if [[ -n $state ]] && (( ! $+_svn_cmds )); then typeset -gHA _svn_cmds if _cache_invalid svn-cmds || ! _retrieve_cache svn-cmds; then I.e., the caller is aware that the cache is stored in a variable and therefore explicitly tests the variable as well as _cache_invalid. That makes it a little difficult to change the backend. Further the _cache_invalid tests there are actually redundant, it'll be called again by _retrieve_cache. So there's a bunch of cleanup to be done in the use cases before the backend could be altered. -- Barton E. Schaefer ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Performance of _store_cache and _retrieve_cache 2015-02-08 16:19 Performance of _store_cache and _retrieve_cache Daniel Hahler 2015-02-08 18:27 ` Daniel Hahler 2015-02-08 19:14 ` Bart Schaefer @ 2015-02-08 20:27 ` Bart Schaefer 2015-02-09 2:20 ` Slow parsing of large array assignments Bart Schaefer ` (2 more replies) 2 siblings, 3 replies; 9+ messages in thread From: Bart Schaefer @ 2015-02-08 20:27 UTC (permalink / raw) To: Zsh Hackers' List On Feb 8, 5:19pm, Daniel Hahler wrote: } } _store_cache saves the array like this: } } _zsh_all_pkgs=( '02exercicio' '0x10c-asm' ... ) } } The problem is that `source ./pip_allpkgs.slow` takes about 8 seconds, } and is slower than generating the list anew! The slowdown here appears to be with compiling the source'd file into the internal wordcode format before executing it. Even dumping the whole assignment as a single string and then using "eval" on that, is faster than allowing "source" to parse the words directly. We may want to dig further into why that is the case. The following is a LOT faster than either "source" or "eval", but may require recent changes that fix bugs in the parsing of $(...) : _zsh_all_pkgs=( "${(zQ)$(<<\EO:_zsh_all_pkgs '02exercicio' '0x10c-asm' ... EO:_zsh_all_pkgs )}" ) diff --git a/Completion/Base/Utility/_store_cache b/Completion/Base/Utility/_store_cache index 86e72e9..8feaee6 100644 --- a/Completion/Base/Utility/_store_cache +++ b/Completion/Base/Utility/_store_cache @@ -46,8 +46,15 @@ if zstyle -t ":completion:${curcontext}:" use-cache; then for var; do case ${(Pt)var} in (*readonly*) ;; - (*(association|array)*) print -r "$var=( ${(kv@Pqq)^^var} )";; - (*) print -r "$var=${(Pqq)^^var}";; + (*(association|array)*) + # Dump the array as a here-document to reduce parsing overhead + # when reloading the cache with "source" from _retrieve_cache + print -r "$var=( "'"${(zQ)$(<<\EO:'"$var" + print -r "${(kv@Pqq)^^var}" + print -r "EO:$var" + print -r ')}" )' + ;; + (*) print -r "$var=${(Pqq)^^var}";; esac done >! "$_cache_dir/$_cache_ident" else ^ permalink raw reply [flat|nested] 9+ messages in thread
* Slow parsing of large array assignments 2015-02-08 20:27 ` Bart Schaefer @ 2015-02-09 2:20 ` Bart Schaefer 2015-05-23 1:10 ` Regression with completion cache (was: Re: Performance of _store_cache and _retrieve_cache) Daniel Hahler 2015-05-27 22:10 ` Performance of _store_cache and _retrieve_cache Oliver Kiddle 2 siblings, 0 replies; 9+ messages in thread From: Bart Schaefer @ 2015-02-09 2:20 UTC (permalink / raw) To: Zsh Hackers' List On Feb 8, 12:27pm, Bart Schaefer wrote: } } The slowdown here appears to be with compiling the source'd file into } the internal wordcode format before executing it. Even dumping the } whole assignment as a single string and then using "eval" on that, is } faster than allowing "source" to parse the words directly. We may } want to dig further into why that is the case. The "slow" routine is parse.c:par_nl_wordlist(), which calls zshlex() repeatedly [once for every word in the array]. This is slower when reading from a file than when reading from a string in memory, which explains the difference between "source" and "eval" for me. It's also a lot slower than bufferwords(), which implements the (z) flag for parameter splitting. Finally, par_nl_wordlist() in the current revision of zsh seems to take longer than in 4.2.0 or 5.0.7 for the same input. Over 10 runs I get these average times sourcing a file that assigns approximately 55000 words to an array: zsh 4.2.0: 3.91s zsh-5.0.7 (release version): 4.85s zsh-5.0.7-289-g7d15b9a: 7.47s I believe the two zsh-5.0.7 were compiled with identical flags, though the 5.0.7 binary is stripped; I don't know about the old 4.2.0. If I re-compile with -O in CFLAGS, g7d15b9a speeds up to 4.95s average. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Regression with completion cache (was: Re: Performance of _store_cache and _retrieve_cache) 2015-02-08 20:27 ` Bart Schaefer 2015-02-09 2:20 ` Slow parsing of large array assignments Bart Schaefer @ 2015-05-23 1:10 ` Daniel Hahler 2015-05-27 22:10 ` Performance of _store_cache and _retrieve_cache Oliver Kiddle 2 siblings, 0 replies; 9+ messages in thread From: Daniel Hahler @ 2015-05-23 1:10 UTC (permalink / raw) To: Zsh Hackers' List -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, the patch from 34476 (ae7dcab) breaks the caching for the _docker completion. - From https://github.com/docker/docker/blob/master/contrib/completion/zsh/_docker#L177: __docker_commands () { # local -a _docker_subcommands local cache_policy zstyle -s ":completion:${curcontext}:" cache-policy cache_policy if [[ -z "$cache_policy" ]]; then zstyle ":completion:${curcontext}:" cache-policy __docker_caching_policy fi if ( [[ ${+_docker_subcommands} -eq 0 ]] || _cache_invalid docker_subcommands) \ && ! _retrieve_cache docker_subcommands; then local -a lines lines=(${(f)"$(_call_program commands docker 2>&1)"}) _docker_subcommands=(${${${lines[$((${lines[(i)Commands:]} + 1)),${lines[(I) *]}]}## #}/ ##/:}) _docker_subcommands=($_docker_subcommands 'help:Show help for a command') _store_cache docker_subcommands _docker_subcommands fi _describe -t docker-commands "docker command" _docker_subcommands } The cache being used is not the same as when storing it. "docker ps" completes to "docker stops," then: "ps" is not in the cache, and "stops," should be a description. (backing out this commit made it work again) The cache file has a line which is 1827 bytes long, which appears to be an issue: removing some of the first entries makes "docker ps" complete again. Is there some 1024 bytes limit or something similar? The cache file looks like this: % cat .zcompcache/docker_subcommands _docker_subcommands=( "${(zQ)$(<<\EO:_docker_subcommands 'attach:Attach to a running container' 'build:Build an image from a Dockerfile' … 'help:Show help for a command' EO:_docker_subcommands )}" ) Regards, Daniel. On 08.02.2015 21:27, Bart Schaefer wrote: > On Feb 8, 5:19pm, Daniel Hahler wrote: > } > } _store_cache saves the array like this: > } > } _zsh_all_pkgs=( '02exercicio' '0x10c-asm' ... ) > } > } The problem is that `source ./pip_allpkgs.slow` takes about 8 seconds, > } and is slower than generating the list anew! > > The slowdown here appears to be with compiling the source'd file into > the internal wordcode format before executing it. Even dumping the > whole assignment as a single string and then using "eval" on that, is > faster than allowing "source" to parse the words directly. We may > want to dig further into why that is the case. > > The following is a LOT faster than either "source" or "eval", but may > require recent changes that fix bugs in the parsing of $(...) : > > _zsh_all_pkgs=( "${(zQ)$(<<\EO:_zsh_all_pkgs > '02exercicio' '0x10c-asm' ... > EO:_zsh_all_pkgs > )}" ) > > > diff --git a/Completion/Base/Utility/_store_cache b/Completion/Base/Utility/_store_cache > index 86e72e9..8feaee6 100644 > --- a/Completion/Base/Utility/_store_cache > +++ b/Completion/Base/Utility/_store_cache > @@ -46,8 +46,15 @@ if zstyle -t ":completion:${curcontext}:" use-cache; then > for var; do > case ${(Pt)var} in > (*readonly*) ;; > - (*(association|array)*) print -r "$var=( ${(kv@Pqq)^^var} )";; > - (*) print -r "$var=${(Pqq)^^var}";; > + (*(association|array)*) > + # Dump the array as a here-document to reduce parsing overhead > + # when reloading the cache with "source" from _retrieve_cache > + print -r "$var=( "'"${(zQ)$(<<\EO:'"$var" > + print -r "${(kv@Pqq)^^var}" > + print -r "EO:$var" > + print -r ')}" )' > + ;; > + (*) print -r "$var=${(Pqq)^^var}";; > esac > done >! "$_cache_dir/$_cache_ident" > else > -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iD8DBQFVX9OifAK/hT/mPgARAgxeAKDFsc0cg+pzpQ5xmF99D78nq3EqrQCfdpU9 r5vXtuoDJrVpTtuZ6lOcdNs= =xe31 -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Performance of _store_cache and _retrieve_cache 2015-02-08 20:27 ` Bart Schaefer 2015-02-09 2:20 ` Slow parsing of large array assignments Bart Schaefer 2015-05-23 1:10 ` Regression with completion cache (was: Re: Performance of _store_cache and _retrieve_cache) Daniel Hahler @ 2015-05-27 22:10 ` Oliver Kiddle 2015-05-28 22:18 ` Bart Schaefer 2 siblings, 1 reply; 9+ messages in thread From: Oliver Kiddle @ 2015-05-27 22:10 UTC (permalink / raw) To: Zsh Hackers' List On 8 Feb, Bart wrote: > _zsh_all_pkgs=( "${(zQ)$(<<\EO:_zsh_all_pkgs > '02exercicio' '0x10c-asm' ... > EO:_zsh_all_pkgs > )}" ) This is removing the quotes before doing the word splitting so breaks if there are values that need quoting. It works with another level of nesting as in the patch below. I timed some tests and this is still much faster than the original given a large enough array. For a good example of this failing, set the extra-verbose style and do command completion. Note that you'll only see problems when _command_descriptions is loaded from the cache. Oliver diff --git a/Completion/Base/Utility/_store_cache b/Completion/Base/Utility/_store_cache index 8feaee6..fb2ab32 100644 --- a/Completion/Base/Utility/_store_cache +++ b/Completion/Base/Utility/_store_cache @@ -49,10 +49,10 @@ if zstyle -t ":completion:${curcontext}:" use-cache; then (*(association|array)*) # Dump the array as a here-document to reduce parsing overhead # when reloading the cache with "source" from _retrieve_cache - print -r "$var=( "'"${(zQ)$(<<\EO:'"$var" + print -r "$var=( "'${(Q)"${(z)$(<<\EO:'"$var" print -r "${(kv@Pqq)^^var}" print -r "EO:$var" - print -r ')}" )' + print -r ')}"} )' ;; (*) print -r "$var=${(Pqq)^^var}";; esac ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Performance of _store_cache and _retrieve_cache 2015-05-27 22:10 ` Performance of _store_cache and _retrieve_cache Oliver Kiddle @ 2015-05-28 22:18 ` Bart Schaefer 2015-05-29 11:13 ` Peter Stephenson 0 siblings, 1 reply; 9+ messages in thread From: Bart Schaefer @ 2015-05-28 22:18 UTC (permalink / raw) To: Zsh Hackers' List So ... this applies on top of ae7dcab5ed1882b6115c9fe3ab7894a8b6a7e9eb but not with 899613f5351642c8257401054fef5f25725af172. Are we sticking with the reverted version for release, or applying 35310 atop 34476? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Performance of _store_cache and _retrieve_cache 2015-05-28 22:18 ` Bart Schaefer @ 2015-05-29 11:13 ` Peter Stephenson 0 siblings, 0 replies; 9+ messages in thread From: Peter Stephenson @ 2015-05-29 11:13 UTC (permalink / raw) To: Zsh Hackers' List On Thu, 28 May 2015 15:18:44 -0700 Bart Schaefer <schaefer@brasslantern.com> wrote: > So ... this applies on top of ae7dcab5ed1882b6115c9fe3ab7894a8b6a7e9eb but > not with 899613f5351642c8257401054fef5f25725af172. > > Are we sticking with the reverted version for release, or applying 35310 > atop 34476? If that fixes the problem, applying them both would be fine. pws ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2015-05-29 11:13 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-02-08 16:19 Performance of _store_cache and _retrieve_cache Daniel Hahler 2015-02-08 18:27 ` Daniel Hahler 2015-02-08 19:14 ` Bart Schaefer 2015-02-08 20:27 ` Bart Schaefer 2015-02-09 2:20 ` Slow parsing of large array assignments Bart Schaefer 2015-05-23 1:10 ` Regression with completion cache (was: Re: Performance of _store_cache and _retrieve_cache) Daniel Hahler 2015-05-27 22:10 ` Performance of _store_cache and _retrieve_cache Oliver Kiddle 2015-05-28 22:18 ` Bart Schaefer 2015-05-29 11:13 ` Peter Stephenson
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/zsh/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).