zsh-workers
 help / color / mirror / code / Atom feed
* accessing array by index very slow
@ 2023-04-16  9:03 Stephane Chazelas
  2023-04-16  9:10 ` Roman Perepelitsa
  0 siblings, 1 reply; 7+ messages in thread
From: Stephane Chazelas @ 2023-04-16  9:03 UTC (permalink / raw)
  To: Zsh hackers list

$ zsh -c 'a=($(seq 1000000)); time (for ((i=0;i<10000;i++)) { : "${a[1]}"; })'
( for ((i=0; i<10000; i++)) do; : "${a[1]}"; done; )  0.07s user 0.03s system 96% cpu 0.102 total

$ zsh -c 'a=($(seq 1000000)); time (for ((i=0;i<10000;i++)) { : "${a[999999]}"; })'
( for ((i=0; i<10000; i++)) do; : "${a[999999]}"; done; )  14.56s user 0.06s system 99% cpu 14.616 total

Accessing array elements via their index is very slow,
especially for large index values.

Comparing with other shells:

~$ bash -c 'a=($(seq 1000000)); time (for ((i=0;i<10000;i++)) { : "${a[999999]}"; })'

real    0m0.078s
user    0m0.077s
sys     0m0.000s
~$ ksh -c 'a=($(seq 1000000)); time (for ((i=0;i<10000;i++)) { : "${a[999999]}"; })'

real    0m00.02s
user    0m00.02s
sys     0m00.00s

cachegrind reveals the time is spent in arrlen_ge(), as zsh
needs to walk through the whole array to determine whether
999999 is past the end of the array or not.

Why doesn't zsh record the length of arrays to avoid that
gymnastic (and speed up $#array expansion)?

-- 
Stephane


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: accessing array by index very slow
  2023-04-16  9:03 accessing array by index very slow Stephane Chazelas
@ 2023-04-16  9:10 ` Roman Perepelitsa
  2023-04-16  9:19   ` Sebastian Gniazdowski
  0 siblings, 1 reply; 7+ messages in thread
From: Roman Perepelitsa @ 2023-04-16  9:10 UTC (permalink / raw)
  To: Zsh hackers list

On Sun, Apr 16, 2023 at 11:04 AM Stephane Chazelas
<stephane@chazelas.org> wrote:
>
> Accessing array elements via their index is very slow,
> especially for large index values.

Indeed, access to array elements is O(N) in zsh instead of O(1) that
one might expect. The same goes for strings, even with no_multibyte.
This makes computation-heavy code incredibly slow. If someone can fix
this, they'll have my eternal gratitude.

On the plus side zsh has fast hashtables (closed hashing with
quadratic probing). Bash's implementation is a lot slower (it's
node-based).

Roman.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: accessing array by index very slow
  2023-04-16  9:10 ` Roman Perepelitsa
@ 2023-04-16  9:19   ` Sebastian Gniazdowski
  2023-04-16 15:27     ` Bart Schaefer
  0 siblings, 1 reply; 7+ messages in thread
From: Sebastian Gniazdowski @ 2023-04-16  9:19 UTC (permalink / raw)
  To: Roman Perepelitsa; +Cc: Zsh hackers list

[-- Attachment #1: Type: text/plain, Size: 832 bytes --]

There was badarrays branch from Mikachu in the past. it was resolving
constant arrlen() calling.

niedz., 16 kwi 2023, 11:10 użytkownik Roman Perepelitsa <
roman.perepelitsa@gmail.com> napisał:

> On Sun, Apr 16, 2023 at 11:04 AM Stephane Chazelas
> <stephane@chazelas.org> wrote:
> >
> > Accessing array elements via their index is very slow,
> > especially for large index values.
>
> Indeed, access to array elements is O(N) in zsh instead of O(1) that
> one might expect. The same goes for strings, even with no_multibyte.
> This makes computation-heavy code incredibly slow. If someone can fix
> this, they'll have my eternal gratitude.
>
> On the plus side zsh has fast hashtables (closed hashing with
> quadratic probing). Bash's implementation is a lot slower (it's
> node-based).
>
> Roman.
>
>

[-- Attachment #2: Type: text/html, Size: 1232 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: accessing array by index very slow
  2023-04-16  9:19   ` Sebastian Gniazdowski
@ 2023-04-16 15:27     ` Bart Schaefer
  2023-04-16 22:03       ` Mikael Magnusson
  0 siblings, 1 reply; 7+ messages in thread
From: Bart Schaefer @ 2023-04-16 15:27 UTC (permalink / raw)
  To: Sebastian Gniazdowski; +Cc: Roman Perepelitsa, Zsh hackers list

On Sun, Apr 16, 2023 at 2:20 AM Sebastian Gniazdowski
<sgniazdowski@gmail.com> wrote:
>
> There was badarrays branch from Mikachu in the past. it was resolving constant arrlen() calling.

I had been rebasing that regularly for a while
(origin/schaefer/badarrays) but lost access to the host with that
clone and didn't keep up the effort elsewhere.  Tried again just now
and get a lot of errors in params.c declarations and on new calls to
functions where Mikachu changed the arguments.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: accessing array by index very slow
  2023-04-16 15:27     ` Bart Schaefer
@ 2023-04-16 22:03       ` Mikael Magnusson
  2023-04-18  2:43         ` Bart Schaefer
  0 siblings, 1 reply; 7+ messages in thread
From: Mikael Magnusson @ 2023-04-16 22:03 UTC (permalink / raw)
  To: Bart Schaefer; +Cc: Sebastian Gniazdowski, Roman Perepelitsa, Zsh hackers list

On 4/16/23, Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Sun, Apr 16, 2023 at 2:20 AM Sebastian Gniazdowski
> <sgniazdowski@gmail.com> wrote:
>>
>> There was badarrays branch from Mikachu in the past. it was resolving
>> constant arrlen() calling.
>
> I had been rebasing that regularly for a while
> (origin/schaefer/badarrays) but lost access to the host with that
> clone and didn't keep up the effort elsewhere.  Tried again just now
> and get a lot of errors in params.c declarations and on new calls to
> functions where Mikachu changed the arguments.

I don't think I ever got it to actually work properly in the first
place, but it would likely be a good place to start looking if anyone
else is interested.

-- 
Mikael Magnusson


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: accessing array by index very slow
  2023-04-16 22:03       ` Mikael Magnusson
@ 2023-04-18  2:43         ` Bart Schaefer
  2023-04-30 19:26           ` Bart Schaefer
  0 siblings, 1 reply; 7+ messages in thread
From: Bart Schaefer @ 2023-04-18  2:43 UTC (permalink / raw)
  To: Mikael Magnusson
  Cc: Sebastian Gniazdowski, Roman Perepelitsa, Zsh hackers list

On Sun, Apr 16, 2023 at 3:03 PM Mikael Magnusson <mikachu@gmail.com> wrote:
>
> I don't think I ever got it to actually work properly in the first
> place, but it would likely be a good place to start looking if anyone
> else is interested.

We seem to have got partway there with arrlen_lt() and arrlen_ge(), a
lot of the merge conflicts hit those.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: accessing array by index very slow
  2023-04-18  2:43         ` Bart Schaefer
@ 2023-04-30 19:26           ` Bart Schaefer
  0 siblings, 0 replies; 7+ messages in thread
From: Bart Schaefer @ 2023-04-30 19:26 UTC (permalink / raw)
  To: Zsh hackers list

I've been poking around at this, and there are a few serious complications.

The first is that we almost always retrieve arrays via a Value struct,
which contains both a pointer to the source Param and the start/end
indexes for an array slice.  The largest reason array accesses aren't
O(1) is due to bounds checking that those desired slices don't point
outside the array.  We've already optimized bounds checking to avoid
scanning the full array, so unless the same array is going to be
checked repeatedly it'd actually make things slower to populate the
cache.

Second and similarly, the best place to initialize the cache is when
assigning or appending to the array, which makes those operations
become O(n), which seems undesirable.

Third, a lot of array operations either use an array constructed on
the fly (array of all values of a hash table, for example) or fetch
the array from the parameter at a point in the program flow that's
abstracted from the operation that needs to do the bounds checking, so
the cached length is no longer readily available; or a subset of the
source array may already have been copied using (start,end) of the
Value, or by scanning the array for a pattern match, so the cached
value no longer applies.

Fourth, the length of a PM_SPECIAL parameter can't reliably be cached
because the source data may change between accesses.  Not true for all
PM_SPECIAL, of course, but recording which ones can or cannot be is an
obstacle.

There is a "numparamvals" global that's reset whenever a hash is
converted to an array, so that might be co-opted to update on other
array copy operations if we're careful.


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-04-30 19:27 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-16  9:03 accessing array by index very slow Stephane Chazelas
2023-04-16  9:10 ` Roman Perepelitsa
2023-04-16  9:19   ` Sebastian Gniazdowski
2023-04-16 15:27     ` Bart Schaefer
2023-04-16 22:03       ` Mikael Magnusson
2023-04-18  2:43         ` Bart Schaefer
2023-04-30 19:26           ` Bart Schaefer

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).