zsh-workers
 help / color / mirror / code / Atom feed
* Array appends are quadratic
@ 2015-11-27  7:37 Daniel Shahaf
  2015-11-27  8:26 ` Mikael Magnusson
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Shahaf @ 2015-11-27  7:37 UTC (permalink / raw)
  To: zsh-workers

Making N appends takes O(N²) time:

    % zsh -fc 'local -a a; time ( repeat 5000 { a+=(foo) } )' 
    ( repeat 5000; do; a+=(foo) ; done; )  1.26s user 0.00s system 99% cpu 1.268 total
    % zsh -fc 'local -a a; time ( repeat 10000 { a+=(foo) } )' 
    ( repeat 10000; do; a+=(foo) ; done; )  5.02s user 0.00s system 99% cpu 5.028 total
    % zsh -fc 'local -a a; time ( repeat 20000 { a+=(foo) } )' 
    ( repeat 20000; do; a+=(foo) ; done; )  19.94s user 0.08s system 99% cpu 20.036 total
    
    % zsh -fc 'local -a a; N=5000; time ( a[N]=(); integer i; while (( i++ < N )) { a[i]=foo } )' 
    ( a[N]=() ; integer i; while (( i++ < N )); do; a[i]=foo ; done; )  2.10s user 0.00s system 99% cpu 2.004 total
    % zsh -fc 'local -a a; N=10000; time ( a[N]=(); integer i; while (( i++ < N )) { a[i]=foo } )' 
    ( a[N]=() ; integer i; while (( i++ < N )); do; a[i]=foo ; done; )  8.33s user 0.41s system 99% cpu 8.506 total
    % zsh -fc 'local -a a; N=20000; time ( a[N]=(); integer i; while (( i++ < N )) { a[i]=foo } )' 
    ( a[N]=() ; integer i; while (( i++ < N )); do; a[i]=foo ; done; )  34.92s user 0.00s system 99% cpu 34.246 total

This is because each append reallocates the array with one extra
element.  This may be confirmed by these debug prints:

    diff --git a/Src/params.c b/Src/params.c
    index b121bd6..a4ca3ce 100644
    --- a/Src/params.c
    +++ b/Src/params.c
    @@ -2579,6 +2579,7 @@ setarrvalue(Value v, char **val)
            if (v->end <= n)
                ll += n - v->end + 1;
     
    +       DPUTS2(1, "setarrvalue: Reallocating '%s' to size %d", v->pm->node.nam, ll);
            p = new = (char **) zshcalloc(sizeof(char *) * (ll + 1));
     
            for (i = 0; i < v->start; i++)
    @@ -2909,6 +2910,7 @@ assignaparam(char *s, char **val, int flags)
                    char **new;
                    int lv = arrlen(val);
     
    +               DPUTS2(1, "assignaparam: Reallocating '%s' to size %d", t, lv+1);
                    new = (char **) zalloc(sizeof(char *) * (lv + 2));
                    *new = ztrdup(getstrvalue(v));
                    memcpy(new+1, val, sizeof(char *) * (lv + 1));

The first example triggers the assignaparam() debug print with values
1,2,…,5000.  The second example triggers the setarrvalue() debug print
with values 5000,5000,…,5000, and has five thousand iterations.

Complexity could be reduced to O(N) through doubling reallocation
(whenever reallocating, allocate not oldsize + O(1) but oldsize * O(1)).

Is there a reason not to do that?  Any edge cases to take care of?

Cheers,

Daniel


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Array appends are quadratic
  2015-11-27  7:37 Array appends are quadratic Daniel Shahaf
@ 2015-11-27  8:26 ` Mikael Magnusson
  2015-11-27 10:59   ` Bart Schaefer
  0 siblings, 1 reply; 7+ messages in thread
From: Mikael Magnusson @ 2015-11-27  8:26 UTC (permalink / raw)
  To: Daniel Shahaf; +Cc: zsh workers

On Fri, Nov 27, 2015 at 8:37 AM, Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> Making N appends takes O(N²) time:
>
>     % zsh -fc 'local -a a; time ( repeat 5000 { a+=(foo) } )'
>     ( repeat 5000; do; a+=(foo) ; done; )  1.26s user 0.00s system 99% cpu 1.268 total
>     % zsh -fc 'local -a a; time ( repeat 10000 { a+=(foo) } )'
>     ( repeat 10000; do; a+=(foo) ; done; )  5.02s user 0.00s system 99% cpu 5.028 total
>     % zsh -fc 'local -a a; time ( repeat 20000 { a+=(foo) } )'
>     ( repeat 20000; do; a+=(foo) ; done; )  19.94s user 0.08s system 99% cpu 20.036 total
>
>     % zsh -fc 'local -a a; N=5000; time ( a[N]=(); integer i; while (( i++ < N )) { a[i]=foo } )'
>     ( a[N]=() ; integer i; while (( i++ < N )); do; a[i]=foo ; done; )  2.10s user 0.00s system 99% cpu 2.004 total
>     % zsh -fc 'local -a a; N=10000; time ( a[N]=(); integer i; while (( i++ < N )) { a[i]=foo } )'
>     ( a[N]=() ; integer i; while (( i++ < N )); do; a[i]=foo ; done; )  8.33s user 0.41s system 99% cpu 8.506 total
>     % zsh -fc 'local -a a; N=20000; time ( a[N]=(); integer i; while (( i++ < N )) { a[i]=foo } )'
>     ( a[N]=() ; integer i; while (( i++ < N )); do; a[i]=foo ; done; )  34.92s user 0.00s system 99% cpu 34.246 total
>
> This is because each append reallocates the array with one extra
> element.  This may be confirmed by these debug prints:
[...]
> The first example triggers the assignaparam() debug print with values
> 1,2,…,5000.  The second example triggers the setarrvalue() debug print
> with values 5000,5000,…,5000, and has five thousand iterations.
>
> Complexity could be reduced to O(N) through doubling reallocation
> (whenever reallocating, allocate not oldsize + O(1) but oldsize * O(1)).
>
> Is there a reason not to do that?  Any edge cases to take care of?

According to your debug prints, the array is reallocated every time it
is assigned to, even when the required element is already allocated.
Doubling the allocation size would do nothing to fix that, presumably.

% a[100]=foo
 params.c:2585: setarrvalue: Reallocating 'a' to size 100
% a[5]=bar
 params.c:2585: setarrvalue: Reallocating 'a' to size 101
% a[10]=bar
 params.c:2585: setarrvalue: Reallocating 'a' to size 101

Do we have an... off-by-two bug somewhere?

-- 
Mikael Magnusson


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Array appends are quadratic
  2015-11-27  8:26 ` Mikael Magnusson
@ 2015-11-27 10:59   ` Bart Schaefer
  2015-11-28 18:05     ` Daniel Shahaf
  0 siblings, 1 reply; 7+ messages in thread
From: Bart Schaefer @ 2015-11-27 10:59 UTC (permalink / raw)
  To: zsh workers

On Nov 27,  9:26am, Mikael Magnusson wrote:
} Subject: Re: Array appends are quadratic
}
} According to your debug prints, the array is reallocated every time it
} is assigned to, even when the required element is already allocated.
} Doubling the allocation size would do nothing to fix that, presumably.

Right, this is generic code to handle array splicing, so starting from
an array with 200 elements,

    a[100]=foo

and

    a[50,150]=(foo)

use the same loop even though the former does not change the size of
the resulting array and the latter will waste 100 slots at the top.
I vaguely recall a conversation in which we decided that fewer code
branches was better than attempting to efficiently re-use array slots.


} % a[100]=foo
}  params.c:2585: setarrvalue: Reallocating 'a' to size 100
} % a[5]=bar
}  params.c:2585: setarrvalue: Reallocating 'a' to size 101
} % a[10]=bar
}  params.c:2585: setarrvalue: Reallocating 'a' to size 101
} 
} Do we have an... off-by-two bug somewhere?

No, it just always allocates 1 more than the largest occupied array
index position if the array isn't empty to begin with.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Array appends are quadratic
  2015-11-27 10:59   ` Bart Schaefer
@ 2015-11-28 18:05     ` Daniel Shahaf
  2015-11-28 19:36       ` Peter Stephenson
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Shahaf @ 2015-11-28 18:05 UTC (permalink / raw)
  To: zsh workers

Bart Schaefer wrote on Fri, Nov 27, 2015 at 02:59:58 -0800:
> On Nov 27,  9:26am, Mikael Magnusson wrote:
> } Subject: Re: Array appends are quadratic
> }
> } According to your debug prints, the array is reallocated every time it
> } is assigned to, even when the required element is already allocated.
> } Doubling the allocation size would do nothing to fix that, presumably.
> 
> Right, this is generic code to handle array splicing, so starting from
> an array with 200 elements,
> 
>     a[100]=foo
> 
> and
> 
>     a[50,150]=(foo)
> 
> use the same loop even though the former does not change the size of
> the resulting array and the latter will waste 100 slots at the top.

(The reason it'll waste 100 slots is that it will allocate 200 elements
even though only 200-(150-50+1) are needed.)

> I vaguely recall a conversation in which we decided that fewer code
> branches was better than attempting to efficiently re-use array slots.

I'm not looking to implement sparse arrays.   I just wish it didn't
reallocate unless it needed to.  That _would_ mean some additional code
complexity, but it seems it would be manageable: having the array
remember the number of allocated slots, using memmove() to shift the
past-the-slice portion, copying the slice directly into the slots.

That should make this part of the code O(slice) rather than O(array
length).

The effort involved in implementing the redoubling allocation (for the
other problem indicated in 37236) would be similar: mainly having the
array remember the number of allocated slots.  I'll have to track done
all places in the code that allocate/reallocate arrays, hopefully there
aren't too many.

There may be other parts in the code that do O(n) processing instead
of O(1) — Mikael points out arrlen() as an example — but those can be
improved separately.

> } % a[100]=foo
> }  params.c:2585: setarrvalue: Reallocating 'a' to size 100
> } % a[5]=bar
> }  params.c:2585: setarrvalue: Reallocating 'a' to size 101
> } % a[10]=bar
> }  params.c:2585: setarrvalue: Reallocating 'a' to size 101
> } 
> } Do we have an... off-by-two bug somewhere?
> 
> No, it just always allocates 1 more than the largest occupied array
> index position if the array isn't empty to begin with.
> 

Cheers,

Daniel

P.S. It seems that bin_compquote is calling ztrdup() on every element of
an array parameter passed as an argument, so it would reallocate all array
members even if none of them need quoting.  I'm not planning to look
into it further, though, since I haven't run into it in practice yet.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Array appends are quadratic
  2015-11-28 18:05     ` Daniel Shahaf
@ 2015-11-28 19:36       ` Peter Stephenson
  2015-11-30  3:20         ` Daniel Shahaf
  2015-12-03 23:36         ` Daniel Shahaf
  0 siblings, 2 replies; 7+ messages in thread
From: Peter Stephenson @ 2015-11-28 19:36 UTC (permalink / raw)
  To: zsh workers

On Sat, 28 Nov 2015 18:05:17 +0000
Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> The effort involved in implementing the redoubling allocation (for the
> other problem indicated in 37236) would be similar: mainly having the
> array remember the number of allocated slots.  I'll have to track done
> all places in the code that allocate/reallocate arrays, hopefully there
> aren't too many.

I suspect (though I haven't actually done a search) you're out of luck
here, as there's a wide assumption that null-terminated char **'s are
the the way to carray a fixed number of elements with generic contents
that might get presented to the user.  See for example the number of
uses of mkarry().

Not allocating when the size hasn't changed should be much easier and is
easy to test for.

pws


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Array appends are quadratic
  2015-11-28 19:36       ` Peter Stephenson
@ 2015-11-30  3:20         ` Daniel Shahaf
  2015-12-03 23:36         ` Daniel Shahaf
  1 sibling, 0 replies; 7+ messages in thread
From: Daniel Shahaf @ 2015-11-30  3:20 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: zsh workers

Peter Stephenson wrote on Sat, Nov 28, 2015 at 19:36:52 +0000:
> On Sat, 28 Nov 2015 18:05:17 +0000
> Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> > The effort involved in implementing the redoubling allocation (for the
> > other problem indicated in 37236) would be similar: mainly having the
> > array remember the number of allocated slots.  I'll have to track done
> > all places in the code that allocate/reallocate arrays, hopefully there
> > aren't too many.
> 
> I suspect (though I haven't actually done a search) you're out of luck
> here, as there's a wide assumption that null-terminated char **'s are
> the the way to carray a fixed number of elements with generic contents
> that might get presented to the user.  See for example the number of
> uses of mkarry().

Thanks for the pointer, I shall have a look.

> Not allocating when the size hasn't changed should be much easier and is
> easy to test for.

It'll be just a local fix to setarrvalue(), and it would fix the problem
for callers that preallocate the array (by doing 'a[10000]=""').  Sounds
simple enough.  I'll look into it.

Cheers,

Daniel

> 
> pws


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Array appends are quadratic
  2015-11-28 19:36       ` Peter Stephenson
  2015-11-30  3:20         ` Daniel Shahaf
@ 2015-12-03 23:36         ` Daniel Shahaf
  1 sibling, 0 replies; 7+ messages in thread
From: Daniel Shahaf @ 2015-12-03 23:36 UTC (permalink / raw)
  To: zsh-workers

Peter Stephenson wrote on Sat, Nov 28, 2015 at 19:36:52 +0000:
> On Sat, 28 Nov 2015 18:05:17 +0000
> Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> > The effort involved in implementing the redoubling allocation (for the
> > other problem indicated in 37236) would be similar: mainly having the
> > array remember the number of allocated slots.  I'll have to track done
> > all places in the code that allocate/reallocate arrays, hopefully there
> > aren't too many.
> 
> I suspect (though I haven't actually done a search) you're out of luck
> here, as there's a wide assumption that null-terminated char **'s are
> the the way to carray a fixed number of elements with generic contents
> that might get presented to the user.  See for example the number of
> uses of mkarry().

I'm not sure I see the problem.  Doubling reallocations would require
the array to track its allocated size (the N in «pm->u.arr = alloc(N
* sizeof(void*))», which is strictly greater than its arrlen).  I assume
only affect places that modify the array would be affected: that is,
places that modify pm->u.arr directly (list below) and places that call
setfn.  So, I think doubling allocations can be made to work, if each
place that modifies the array knows what arrlen() would be after the
modification, in order to be able to cheaply determine whether its
smaller than the allocated length.

So maybe it's better to first teach arrays to know their arrlen in O(1),
before investigating doubling allocations.  (Mikael had investigated
this direction.)

Daniel

typeset_single:Src/builtin.c:2428:     tdp->arrptr = &altpm->u.arr;
bin_unset:Src/builtin.c:3379:                     if (start < arrlen(vbuf.pm->u.arr)) {
restore_params:Src/exec.c:3898:                    tpm->gsu.a->setfn(tpm, pm->u.arr);
copyparam:Src/params.c:1028:      tpm->u.arr = zarrdup(pm->gsu.a->getfn(pm));
arrgetfn:Src/params.c:3341:    return pm->u.arr ? pm->u.arr : &nullarray;
arrsetfn:Src/params.c:3350:    if (pm->u.arr && pm->u.arr != x)
arrsetfn:Src/params.c:3351:      freearray(pm->u.arr);
arrsetfn:Src/params.c:3354:    pm->u.arr = x;
scanendscope:Src/params.c:5096:                  pm->gsu.a->setfn(pm, tpm->u.arr);
paramsubst:Src/subst.c:2499:               pm->u.arr = aval;


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2015-12-03 23:36 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-27  7:37 Array appends are quadratic Daniel Shahaf
2015-11-27  8:26 ` Mikael Magnusson
2015-11-27 10:59   ` Bart Schaefer
2015-11-28 18:05     ` Daniel Shahaf
2015-11-28 19:36       ` Peter Stephenson
2015-11-30  3:20         ` Daniel Shahaf
2015-12-03 23:36         ` Daniel Shahaf

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).