* accessing array by index very slow @ 2023-04-16 9:03 Stephane Chazelas 2023-04-16 9:10 ` Roman Perepelitsa 0 siblings, 1 reply; 7+ messages in thread From: Stephane Chazelas @ 2023-04-16 9:03 UTC (permalink / raw) To: Zsh hackers list $ zsh -c 'a=($(seq 1000000)); time (for ((i=0;i<10000;i++)) { : "${a[1]}"; })' ( for ((i=0; i<10000; i++)) do; : "${a[1]}"; done; ) 0.07s user 0.03s system 96% cpu 0.102 total $ zsh -c 'a=($(seq 1000000)); time (for ((i=0;i<10000;i++)) { : "${a[999999]}"; })' ( for ((i=0; i<10000; i++)) do; : "${a[999999]}"; done; ) 14.56s user 0.06s system 99% cpu 14.616 total Accessing array elements via their index is very slow, especially for large index values. Comparing with other shells: ~$ bash -c 'a=($(seq 1000000)); time (for ((i=0;i<10000;i++)) { : "${a[999999]}"; })' real 0m0.078s user 0m0.077s sys 0m0.000s ~$ ksh -c 'a=($(seq 1000000)); time (for ((i=0;i<10000;i++)) { : "${a[999999]}"; })' real 0m00.02s user 0m00.02s sys 0m00.00s cachegrind reveals the time is spent in arrlen_ge(), as zsh needs to walk through the whole array to determine whether 999999 is past the end of the array or not. Why doesn't zsh record the length of arrays to avoid that gymnastic (and speed up $#array expansion)? -- Stephane ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: accessing array by index very slow 2023-04-16 9:03 accessing array by index very slow Stephane Chazelas @ 2023-04-16 9:10 ` Roman Perepelitsa 2023-04-16 9:19 ` Sebastian Gniazdowski 0 siblings, 1 reply; 7+ messages in thread From: Roman Perepelitsa @ 2023-04-16 9:10 UTC (permalink / raw) To: Zsh hackers list On Sun, Apr 16, 2023 at 11:04 AM Stephane Chazelas <stephane@chazelas.org> wrote: > > Accessing array elements via their index is very slow, > especially for large index values. Indeed, access to array elements is O(N) in zsh instead of O(1) that one might expect. The same goes for strings, even with no_multibyte. This makes computation-heavy code incredibly slow. If someone can fix this, they'll have my eternal gratitude. On the plus side zsh has fast hashtables (closed hashing with quadratic probing). Bash's implementation is a lot slower (it's node-based). Roman. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: accessing array by index very slow 2023-04-16 9:10 ` Roman Perepelitsa @ 2023-04-16 9:19 ` Sebastian Gniazdowski 2023-04-16 15:27 ` Bart Schaefer 0 siblings, 1 reply; 7+ messages in thread From: Sebastian Gniazdowski @ 2023-04-16 9:19 UTC (permalink / raw) To: Roman Perepelitsa; +Cc: Zsh hackers list [-- Attachment #1: Type: text/plain, Size: 832 bytes --] There was badarrays branch from Mikachu in the past. it was resolving constant arrlen() calling. niedz., 16 kwi 2023, 11:10 użytkownik Roman Perepelitsa < roman.perepelitsa@gmail.com> napisał: > On Sun, Apr 16, 2023 at 11:04 AM Stephane Chazelas > <stephane@chazelas.org> wrote: > > > > Accessing array elements via their index is very slow, > > especially for large index values. > > Indeed, access to array elements is O(N) in zsh instead of O(1) that > one might expect. The same goes for strings, even with no_multibyte. > This makes computation-heavy code incredibly slow. If someone can fix > this, they'll have my eternal gratitude. > > On the plus side zsh has fast hashtables (closed hashing with > quadratic probing). Bash's implementation is a lot slower (it's > node-based). > > Roman. > > [-- Attachment #2: Type: text/html, Size: 1232 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: accessing array by index very slow 2023-04-16 9:19 ` Sebastian Gniazdowski @ 2023-04-16 15:27 ` Bart Schaefer 2023-04-16 22:03 ` Mikael Magnusson 0 siblings, 1 reply; 7+ messages in thread From: Bart Schaefer @ 2023-04-16 15:27 UTC (permalink / raw) To: Sebastian Gniazdowski; +Cc: Roman Perepelitsa, Zsh hackers list On Sun, Apr 16, 2023 at 2:20 AM Sebastian Gniazdowski <sgniazdowski@gmail.com> wrote: > > There was badarrays branch from Mikachu in the past. it was resolving constant arrlen() calling. I had been rebasing that regularly for a while (origin/schaefer/badarrays) but lost access to the host with that clone and didn't keep up the effort elsewhere. Tried again just now and get a lot of errors in params.c declarations and on new calls to functions where Mikachu changed the arguments. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: accessing array by index very slow 2023-04-16 15:27 ` Bart Schaefer @ 2023-04-16 22:03 ` Mikael Magnusson 2023-04-18 2:43 ` Bart Schaefer 0 siblings, 1 reply; 7+ messages in thread From: Mikael Magnusson @ 2023-04-16 22:03 UTC (permalink / raw) To: Bart Schaefer; +Cc: Sebastian Gniazdowski, Roman Perepelitsa, Zsh hackers list On 4/16/23, Bart Schaefer <schaefer@brasslantern.com> wrote: > On Sun, Apr 16, 2023 at 2:20 AM Sebastian Gniazdowski > <sgniazdowski@gmail.com> wrote: >> >> There was badarrays branch from Mikachu in the past. it was resolving >> constant arrlen() calling. > > I had been rebasing that regularly for a while > (origin/schaefer/badarrays) but lost access to the host with that > clone and didn't keep up the effort elsewhere. Tried again just now > and get a lot of errors in params.c declarations and on new calls to > functions where Mikachu changed the arguments. I don't think I ever got it to actually work properly in the first place, but it would likely be a good place to start looking if anyone else is interested. -- Mikael Magnusson ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: accessing array by index very slow 2023-04-16 22:03 ` Mikael Magnusson @ 2023-04-18 2:43 ` Bart Schaefer 2023-04-30 19:26 ` Bart Schaefer 0 siblings, 1 reply; 7+ messages in thread From: Bart Schaefer @ 2023-04-18 2:43 UTC (permalink / raw) To: Mikael Magnusson Cc: Sebastian Gniazdowski, Roman Perepelitsa, Zsh hackers list On Sun, Apr 16, 2023 at 3:03 PM Mikael Magnusson <mikachu@gmail.com> wrote: > > I don't think I ever got it to actually work properly in the first > place, but it would likely be a good place to start looking if anyone > else is interested. We seem to have got partway there with arrlen_lt() and arrlen_ge(), a lot of the merge conflicts hit those. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: accessing array by index very slow 2023-04-18 2:43 ` Bart Schaefer @ 2023-04-30 19:26 ` Bart Schaefer 0 siblings, 0 replies; 7+ messages in thread From: Bart Schaefer @ 2023-04-30 19:26 UTC (permalink / raw) To: Zsh hackers list I've been poking around at this, and there are a few serious complications. The first is that we almost always retrieve arrays via a Value struct, which contains both a pointer to the source Param and the start/end indexes for an array slice. The largest reason array accesses aren't O(1) is due to bounds checking that those desired slices don't point outside the array. We've already optimized bounds checking to avoid scanning the full array, so unless the same array is going to be checked repeatedly it'd actually make things slower to populate the cache. Second and similarly, the best place to initialize the cache is when assigning or appending to the array, which makes those operations become O(n), which seems undesirable. Third, a lot of array operations either use an array constructed on the fly (array of all values of a hash table, for example) or fetch the array from the parameter at a point in the program flow that's abstracted from the operation that needs to do the bounds checking, so the cached length is no longer readily available; or a subset of the source array may already have been copied using (start,end) of the Value, or by scanning the array for a pattern match, so the cached value no longer applies. Fourth, the length of a PM_SPECIAL parameter can't reliably be cached because the source data may change between accesses. Not true for all PM_SPECIAL, of course, but recording which ones can or cannot be is an obstacle. There is a "numparamvals" global that's reset whenever a hash is converted to an array, so that might be co-opted to update on other array copy operations if we're careful. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2023-04-30 19:27 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-04-16 9:03 accessing array by index very slow Stephane Chazelas 2023-04-16 9:10 ` Roman Perepelitsa 2023-04-16 9:19 ` Sebastian Gniazdowski 2023-04-16 15:27 ` Bart Schaefer 2023-04-16 22:03 ` Mikael Magnusson 2023-04-18 2:43 ` Bart Schaefer 2023-04-30 19:26 ` Bart Schaefer
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/zsh/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).