zsh-workers
 help / color / mirror / code / Atom feed
* [PATCH] Optimization of getarrvalue()
@ 2016-11-08 20:11 ` Sebastian Gniazdowski
  2016-11-08 21:58   ` Bart Schaefer
                     ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Sebastian Gniazdowski @ 2016-11-08 20:11 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 1519 bytes --]

Hello
The function contains arrdup() that includes elements beyond end index.
I've replaced it with arrdup_max() that has limit parameter – will
duplicate at most that limit of elements. Following test code:

test_fun() {
    local -a arr
    repeat 12; do
        arr+=( "abcdefghijklmnop" $arr )
    done

    local -a arr2
    repeat 2000; do
        arr2=( ${arr[1,300]} )
    done
}

generates array of 4095 elements, and the running times are 1099 ms
(optimized) vs 2038 ms (unoptimized). The array generation utilizes
previous array optimization.

More, I suspect a memory leak in following code that has been replaced:

    if (v->end <= v->start)
        s[0] = NULL;
    else if (arrlen_ge(s, v->end - v->start))
        s[v->end - v->start] = NULL;

That code adapts array according to end index – however it seems that
strings after the NULL are then unreachable to freearray() ? That said,
I wasn't able to obtain the memory leak with repeated ${arr[1,2]} use
for $#arr > 2.

Interesting that some tests fail (e.g. ./Y03arguments.ztst) if I here
duplicate nular instead of doing:

    } else if (v->end <= v->start) {
        s = arrdup_max(s, 1);
        s[0] = NULL;

like in the original code. The test output is then:

@@ -6,6 +6,6 @@
 line: {tst r}{}
 line: {tst x}{}
 line: {tst x }{}
-MESSAGE:{no more arguments}
+MESSAGE:{}
 line: {tst x y }{}
-MESSAGE:{no more arguments}
+MESSAGE:{}

-- 
  Sebastian Gniazdowski
  psprint@fastmail.com

[-- Attachment #2: arr_opt.diff --]
[-- Type: text/plain, Size: 1937 bytes --]

diff --git a/Src/params.c b/Src/params.c
index 5fab84a..1e0bece 100644
--- a/Src/params.c
+++ b/Src/params.c
@@ -2290,14 +2290,24 @@ getarrvalue(Value v)
 	v->start += arrlen(s);
     if (v->end < 0)
 	v->end += arrlen(s) + 1;
-    if (arrlen_lt(s, v->start) || v->start < 0)
+
+    /* Null if 1) array too short, 2) index still negative */
+    if (arrlen_lt(s, v->start) || v->start < 0) {
 	s = arrdup(nular);
-    else
-	s = arrdup(s + v->start);
-    if (v->end <= v->start)
-	s[0] = NULL;
-    else if (arrlen_ge(s, v->end - v->start))
-	s[v->end - v->start] = NULL;
+    } else if (v->end <= v->start) {
+        s = arrdup_max(s, 1);
+        s[0] = NULL;
+    } else {
+        /* Here copying not till the end of the source array is handled
+         * -- arrdup_max will copy at most v->end - v->start elements,
+         * starting from v->start element. Original code said:
+	 *  s[v->end - v->start] = NULL
+         * which means that there are exactly the same number of
+         * elements as the value of the above *0-based* index.
+         * */
+	s = arrdup_max(s + v->start, v->end - v->start);
+    }
+
     return s;
 }
 
diff --git a/Src/utils.c b/Src/utils.c
index 733f570..e4351ad 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -4231,6 +4231,37 @@ arrdup(char **s)
 
 /**/
 mod_export char **
+arrdup_max(char **s, unsigned max)
+{
+    char **x, **y, **s_bkp, *bkp;
+    int len;
+
+    len = arrlen(s);
+
+    /* Limit has sense only if not equal to len */
+    if (max < len) {
+        s_bkp = s;
+
+        /* Num. of elements == max, sentinel *index* is the same */
+        bkp = s[max];
+        s[max] = NULL;
+    } else {
+        max = len;
+    }
+
+    y = x = (char **) zhalloc(sizeof(char *) * (max + 1));
+
+    while ((*x++ = dupstring(*s++)));
+
+    if (max < len) {
+        s_bkp[max] = bkp;
+    }
+
+    return y;
+}
+
+/**/
+mod_export char **
 zarrdup(char **s)
 {
     char **x, **y;

[-- Attachment #3: testopt5.zsh --]
[-- Type: application/octet-stream, Size: 272 bytes --]

#!Src/zsh-arr5-before
#!Src/zsh-arr5

zmodload zsh/zprof

test_fun() {
    local -a arr
    repeat 12; do
        arr+=( "abcdefghijklmnop" $arr )
    done

    local -a arr2
    repeat 2000; do
        arr2=( ${arr[1,300]} )
    done
}

test_fun
test_fun
test_fun

zprof

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Optimization of getarrvalue()
  2016-11-08 20:11 ` [PATCH] Optimization of getarrvalue() Sebastian Gniazdowski
@ 2016-11-08 21:58   ` Bart Schaefer
  2016-11-09  7:11   ` Bart Schaefer
  2016-11-09 11:42   ` Peter Stephenson
  2 siblings, 0 replies; 22+ messages in thread
From: Bart Schaefer @ 2016-11-08 21:58 UTC (permalink / raw)
  To: zsh-workers

On Nov 8, 12:11pm, Sebastian Gniazdowski wrote:
}
} More, I suspect a memory leak in following code that has been replaced:
} 
}     if (v->end <= v->start)
}         s[0] = NULL;
}     else if (arrlen_ge(s, v->end - v->start))
}         s[v->end - v->start] = NULL;
} 
} That code adapts array according to end index – however it seems that
} strings after the NULL are then unreachable to freearray() ?

You perhaps haven't noticed that arrdup() uses zhalloc() and dupstring()?
Everything is on the heap and is freed by freeheap() or popheap(), never
by freearray().

} Interesting that some tests fail (e.g. ./Y03arguments.ztst) if I here
} duplicate nular instead of doing:

I'm not sure I understand from this exactly what you're describing, but
it seems likely to have something to do with confusing zsh heap memory
with directly-malloc'd memory.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Optimization of getarrvalue()
  2016-11-08 20:11 ` [PATCH] Optimization of getarrvalue() Sebastian Gniazdowski
  2016-11-08 21:58   ` Bart Schaefer
@ 2016-11-09  7:11   ` Bart Schaefer
  2016-11-09 11:42   ` Peter Stephenson
  2 siblings, 0 replies; 22+ messages in thread
From: Bart Schaefer @ 2016-11-09  7:11 UTC (permalink / raw)
  To: Sebastian Gniazdowski, zsh-workers

On Nov 8, 12:11pm, Sebastian Gniazdowski wrote:
}
} Interesting that some tests fail (e.g. ./Y03arguments.ztst) if I here
} duplicate nular instead of doing:
} 
}     } else if (v->end <= v->start) {
}         s = arrdup_max(s, 1);
}         s[0] = NULL;
} 
} like in the original code.

Looking at this more closely, that's because nular isn't actually zero
length array, it's an array with a single empty string value.  If you
leave out the s[0]=NULL assignment you're changing the results.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Optimization of getarrvalue()
  2016-11-08 20:11 ` [PATCH] Optimization of getarrvalue() Sebastian Gniazdowski
  2016-11-08 21:58   ` Bart Schaefer
  2016-11-09  7:11   ` Bart Schaefer
@ 2016-11-09 11:42   ` Peter Stephenson
  2016-11-09 16:03     ` Bart Schaefer
  2 siblings, 1 reply; 22+ messages in thread
From: Peter Stephenson @ 2016-11-09 11:42 UTC (permalink / raw)
  To: zsh-workers

On Tue, 08 Nov 2016 12:11:39 -0800
Sebastian Gniazdowski <psprint@fastmail.com> wrote:
> The function contains arrdup() that includes elements beyond end index.
> I've replaced it with arrdup_max() that has limit parameter – will
> duplicate at most that limit of elements.

I'm not 100% sure that it's safe to write to the source string in all
cases (in particular, if the array is special).  I've replaced this with
the following.  It shouldn't make any significant difference.

pws

diff --git a/Src/params.c b/Src/params.c
index 3f01792..772345b 100644
--- a/Src/params.c
+++ b/Src/params.c
@@ -2294,14 +2294,24 @@ getarrvalue(Value v)
 	v->start += arrlen(s);
     if (v->end < 0)
 	v->end += arrlen(s) + 1;
-    if (arrlen_lt(s, v->start) || v->start < 0)
+
+    /* Null if 1) array too short, 2) index still negative */
+    if (arrlen_lt(s, v->start) || v->start < 0) {
 	s = arrdup(nular);
-    else
-	s = arrdup(s + v->start);
-    if (v->end <= v->start)
-	s[0] = NULL;
-    else if (arrlen_ge(s, v->end - v->start))
-	s[v->end - v->start] = NULL;
+    } else if (v->end <= v->start) {
+        s = arrdup_max(s, 1);
+        s[0] = NULL;
+    } else {
+        /* Copy to a point before the end of the source array:
+         * arrdup_max will copy at most v->end - v->start elements,
+         * starting from v->start element. Original code said:
+	 *  s[v->end - v->start] = NULL
+         * which means that there are exactly the same number of
+         * elements as the value of the above *0-based* index.
+         */
+	s = arrdup_max(s + v->start, v->end - v->start);
+    }
+
     return s;
 }
 
diff --git a/Src/utils.c b/Src/utils.c
index 93e757c..3d535b8 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -4229,6 +4229,31 @@ arrdup(char **s)
     return y;
 }
 
+/* Duplicate at most max elements of the array s with heap memory */
+
+/**/
+mod_export char **
+arrdup_max(char **s, unsigned max)
+{
+    char **x, **y, **send;
+    int len;
+
+    len = arrlen(s);
+
+    /* Limit has sense only if not equal to len */
+    if (max > len)
+        max = len;
+
+    y = x = (char **) zhalloc(sizeof(char *) * (max + 1));
+
+    send = s + max;
+    while (s < send)
+	*x++ = dupstring(*s++);
+    *x = NULL;
+
+    return y;
+}
+
 /**/
 mod_export char **
 zarrdup(char **s)


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Optimization of getarrvalue()
  2016-11-09 11:42   ` Peter Stephenson
@ 2016-11-09 16:03     ` Bart Schaefer
  2016-11-14 12:32       ` Jun T.
  0 siblings, 1 reply; 22+ messages in thread
From: Bart Schaefer @ 2016-11-09 16:03 UTC (permalink / raw)
  To: zsh-workers

On Nov 9, 11:42am, Peter Stephenson wrote:
}
} +    } else if (v->end <= v->start) {
} +        s = arrdup_max(s, 1);
} +        s[0] = NULL;

Might as well make it obvious?

diff --git a/Src/params.c b/Src/params.c
index ef72cba..6736aa5 100644
--- a/Src/params.c
+++ b/Src/params.c
@@ -2293,10 +2293,9 @@ getarrvalue(Value v)
 
     /* Null if 1) array too short, 2) index still negative */
     if (arrlen_lt(s, v->start) || v->start < 0) {
-	s = arrdup(nular);
+	s = arrdup_max(nular, 1);
     } else if (v->end <= v->start) {
-        s = arrdup_max(s, 1);
-        s[0] = NULL;
+        s = arrdup_max(nular, 0);
     } else {
         /* Copy to a point before the end of the source array:
          * arrdup_max will copy at most v->end - v->start elements,


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Optimization of getarrvalue()
  2016-11-09 16:03     ` Bart Schaefer
@ 2016-11-14 12:32       ` Jun T.
  2016-11-14 13:15         ` Jun T.
                           ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Jun T. @ 2016-11-14 12:32 UTC (permalink / raw)
  To: zsh-workers

After the following commit

commit a1633e09a761b9135a0a7084d2489d359a004e5a
Author: Peter Stephenson <pws@zsh.org>
Date:   Wed Nov 9 11:54:57 2016 +0000

    39886 based on 39877: Optimise arrdup to arrdup_max.

I'm having a problem with (some of) the completions which use
_sequence. For example, the following will produce bogus output:

zsh% ps -p <TAB>
zsh% mount -t <TAB>

The problem comes from the last line of _sequence:

"${(@)argv[1,minus-1]}" .... "${(@)argv[minus+1,-1]}"

In the case of 'ps -p <TAB>', argv=( _pids ) and minus=2.
We expect that, since it has a (@) flag, "${(@)argv[3,-1]}" is
entirely removed from the command line. This is indeed the case
before the commit:

zsh% a=( foo )
zsh% nargs () { print $#@ }
zsh% nargs "${a[3,-1]}"
1
zsh% nargs "${(@)a[3,-1]}"
0

But after the commit a1633e0, both output 1. This means an
empty string '' is passed to _pids and then down to compadd.
I still don't understand why compadd is confused by '' in its
arguments.

The patch below seems to make it work as before the commit
a1633e0; 'ps -p <TAB>' etc. works again. BUT:

The original (before a1633e0) behavior is already quite
confusing to me. For example,

zsh% nargs "${(@)a[i]}"

will output 0 only for i=0. On the other hand

zsh% nargs "${(@)a[i,i]}"

will output 0 for i=0 and 2.
If I replace
   arrlen_lt(s, v->start) by arrlen_le(s, v->start)
(which may look reasonable since the array s[] is 0-based)
then nargs "${(@)a[i,i]}" will output 0 only for i=0.
But then "make check" fails in two tests (D04parameter and
Y01completion).



diff --git a/Src/params.c b/Src/params.c
index 6f587a3..eee1cb8 100644
--- a/Src/params.c
+++ b/Src/params.c
@@ -2292,10 +2292,11 @@ getarrvalue(Value v)
 	v->end += arrlen(s) + 1;
 
     /* Null if 1) array too short, 2) index still negative */
-    if (arrlen_lt(s, v->start) || v->start < 0) {
-	s = arrdup_max(nular, 1);
-    } else if (v->end <= v->start) {
+    if (v->end <= v->start) {
         s = arrdup_max(nular, 0);
+    }
+    else if (arrlen_lt(s, v->start) || v->start < 0) {
+	s = arrdup_max(nular, 1);
     } else {
         /* Copy to a point before the end of the source array:
          * arrdup_max will copy at most v->end - v->start elements,






^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Optimization of getarrvalue()
  2016-11-14 12:32       ` Jun T.
@ 2016-11-14 13:15         ` Jun T.
  2016-11-14 13:57         ` Peter Stephenson
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Jun T. @ 2016-11-14 13:15 UTC (permalink / raw)
  To: zsh-workers


2016/11/14 21:32, Jun T. <takimoto-j@kba.biglobe.ne.jp> wrote:

> I still don't understand why compadd is confused by '' in its
> arguments.

It was rather obvious. With the following
   compadd '' -a array
compadd thinks that the possible completions are
   ''  '-a'  and  'array'
rather than the words in $array.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Optimization of getarrvalue()
  2016-11-14 12:32       ` Jun T.
  2016-11-14 13:15         ` Jun T.
@ 2016-11-14 13:57         ` Peter Stephenson
  2016-11-14 15:35           ` Jun T.
  2016-11-14 17:10           ` Bart Schaefer
  2016-11-15 12:28         ` Peter Stephenson
  2016-11-15 19:57         ` Peter Stephenson
  3 siblings, 2 replies; 22+ messages in thread
From: Peter Stephenson @ 2016-11-14 13:57 UTC (permalink / raw)
  To: zsh-workers

On Mon, 14 Nov 2016 21:32:19 +0900
Jun T. <takimoto-j@kba.biglobe.ne.jp> wrote:
> The original (before a1633e0) behavior is already quite
> confusing to me. For example,
> 
> zsh% nargs "${(@)a[i]}"
> 
> will output 0 only for i=0. On the other hand
> 
> zsh% nargs "${(@)a[i,i]}"
> 
> will output 0 for i=0 and 2.
> If I replace
>    arrlen_lt(s, v->start) by arrlen_le(s, v->start)
> (which may look reasonable since the array s[] is 0-based)
> then nargs "${(@)a[i,i]}" will output 0 only for i=0.
> But then "make check" fails in two tests (D04parameter and
> Y01completion).

If we can agree your patch is OK as far as it goes I'll have a look and
see if I can understand those failures with that additional change.

pws


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Optimization of getarrvalue()
  2016-11-14 13:57         ` Peter Stephenson
@ 2016-11-14 15:35           ` Jun T.
  2016-11-14 17:10           ` Bart Schaefer
  1 sibling, 0 replies; 22+ messages in thread
From: Jun T. @ 2016-11-14 15:35 UTC (permalink / raw)
  To: zsh-workers


2016/11/14 22:57, Peter Stephenson <p.stephenson@samsung.com> wrote:
> 
> If we can agree your patch is OK as far as it goes I'll have a look and
> see if I can understand those failures with that additional change.

I've pushed my patch (without replacing arrlen_lt() by arrlen_le()).


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Optimization of getarrvalue()
  2016-11-14 13:57         ` Peter Stephenson
  2016-11-14 15:35           ` Jun T.
@ 2016-11-14 17:10           ` Bart Schaefer
  2016-11-16  7:55             ` Sebastian Gniazdowski
  1 sibling, 1 reply; 22+ messages in thread
From: Bart Schaefer @ 2016-11-14 17:10 UTC (permalink / raw)
  To: zsh-workers

On Nov 14,  1:57pm, Peter Stephenson wrote:
}
} If we can agree your patch is OK as far as it goes I'll have a look and
} see if I can understand those failures with that additional change.

The original code structure was

    if (A)
      a;
    else
      x;
    if (B)
      b;
    else if (C)
      c;

This changed to

    if (A)
      a;
    else if (B)
      b;
    else
      c and x;

This means that in the new structure, b never happens when A.  In the
old code we might allocate the array ('') and then truncate to (), but
in the new code we allocate ('') and then do nothing.

This stems I think from misunderstanding that "nular" is an array of one
nul string, not an empty array.

Jun's change makes this

    if (B)
      b;
    else if (A)
      a;
    else
      c and x;

which looks correct to me because b results in an empty array in either
the original structure or the new one.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Optimization of getarrvalue()
  2016-11-14 12:32       ` Jun T.
  2016-11-14 13:15         ` Jun T.
  2016-11-14 13:57         ` Peter Stephenson
@ 2016-11-15 12:28         ` Peter Stephenson
  2016-11-15 19:57         ` Peter Stephenson
  3 siblings, 0 replies; 22+ messages in thread
From: Peter Stephenson @ 2016-11-15 12:28 UTC (permalink / raw)
  To: zsh-workers

On Mon, 14 Nov 2016 21:32:19 +0900
Jun T. <takimoto-j@kba.biglobe.ne.jp> wrote:
> The problem comes from the last line of _sequence:
> 
> "${(@)argv[1,minus-1]}" .... "${(@)argv[minus+1,-1]}"
> 
> In the case of 'ps -p <TAB>', argv=( _pids ) and minus=2.
> We expect that, since it has a (@) flag, "${(@)argv[3,-1]}" is
> entirely removed from the command line. This is indeed the case
> before the commit:
> 
> zsh% a=( foo )
> zsh% nargs () { print $#@ }
> zsh% nargs "${a[3,-1]}"
> 1
> zsh% nargs "${(@)a[3,-1]}"
> 0

Here's a test for this.

pws

diff --git a/Test/D04parameter.ztst b/Test/D04parameter.ztst
index 97c8ba3..4cbd2fa 100644
--- a/Test/D04parameter.ztst
+++ b/Test/D04parameter.ztst
@@ -2041,3 +2041,11 @@
   () { print -r -- "${(q)1}" "${(b)1}" "${(qq)1}" } '=foo'
 0:(q) and (b) quoting deal with the EQUALS option
 >\=foo =foo '=foo'
+
+  args() { print $#; }
+  a=(foo)
+  args "${a[3,-1]}"
+  args "${(@)a[3,-1]}"
+0:Out-of-range multiple array subscripts with quoting, with and without (@)
+>1
+>0


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Optimization of getarrvalue()
  2016-11-14 12:32       ` Jun T.
                           ` (2 preceding siblings ...)
  2016-11-15 12:28         ` Peter Stephenson
@ 2016-11-15 19:57         ` Peter Stephenson
  2016-11-15 21:11           ` Bart Schaefer
  2016-11-16 14:06           ` Jun T.
  3 siblings, 2 replies; 22+ messages in thread
From: Peter Stephenson @ 2016-11-15 19:57 UTC (permalink / raw)
  To: zsh-workers

On Mon, 14 Nov 2016 21:32:19 +0900
"Jun T." <takimoto-j@kba.biglobe.ne.jp> wrote:
> The original (before a1633e0) behavior is already quite
> confusing to me. For example,
> 
> zsh% nargs "${(@)a[i]}"
> 
> will output 0 only for i=0. On the other hand
> 
> zsh% nargs "${(@)a[i,i]}"
> 
> will output 0 for i=0 and 2.

0 is an invalid subscript, which is probably the difference in the first
case between it and other values.

> If I replace
>    arrlen_lt(s, v->start) by arrlen_le(s, v->start)
> (which may look reasonable since the array s[] is 0-based)
> then nargs "${(@)a[i,i]}" will output 0 only for i=0.
> But then "make check" fails in two tests (D04parameter and
> Y01completion).

The parameter test is testing

setopt rcexpandparam
local -A hash=(X x)                       
print LOST key=$hash[(I)y] val=$hash[(R)Y]

outputs LOST with the arguments removed.  With your change you get an
array wth an empty element and that doesn't happen.

I guess the differences are to do with the way the value of the end
index is used, with the hash case apparently behaving more like an array
slice than an invidivudal index and the latter being the odd one out
(but also the most commonly used case).  It may be too late to
disentangle this.

pws


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Optimization of getarrvalue()
  2016-11-15 19:57         ` Peter Stephenson
@ 2016-11-15 21:11           ` Bart Schaefer
  2016-11-16 14:06           ` Jun T.
  1 sibling, 0 replies; 22+ messages in thread
From: Bart Schaefer @ 2016-11-15 21:11 UTC (permalink / raw)
  To: zsh-workers

On Nov 15,  7:57pm, Peter Stephenson wrote:
}
} > But then "make check" fails in two tests (D04parameter and
} > Y01completion).
} 
} The parameter test is testing
} 
} setopt rcexpandparam
} local -A hash=(X x)                       
} print LOST key=$hash[(I)y] val=$hash[(R)Y]
} 
} outputs LOST with the arguments removed.  With your change you get an
} array wth an empty element and that doesn't happen.

An array with an empty element would be wrong in either of those caes.
(Sebastian reported the same thing when he tried removing s[0]=NULL
from one branch in his original patch.)

} I guess the differences are to do with the way the value of the end
} index is used, with the hash case apparently behaving more like an
} array slice than an invidivudal index

Yes, intentionally so -- (I) and (R) are defined on hashes to return
*all* the possible matches for the pattern, so the result is an array
even if there is only one matching element.  If you want a single
element, use (i) or (r).  Hashes don't have an ordering, so it's
meaningless to distinguish between the "first" and "last" elements.
("First" just means "most easily identified".)

} and the latter being the odd one out
} (but also the most commonly used case).  It may be too late to
} disentangle this.

Semantically, I don't think there's anything to disentangle?  As for
the implementation it might be nice not to have to reduce a hash to an
array of either keys or values quite so frequently, as that might have
avoided this confusion by never treating a hash as though it has any
numeric indices.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Optimization of getarrvalue()
  2016-11-14 17:10           ` Bart Schaefer
@ 2016-11-16  7:55             ` Sebastian Gniazdowski
  0 siblings, 0 replies; 22+ messages in thread
From: Sebastian Gniazdowski @ 2016-11-16  7:55 UTC (permalink / raw)
  To: zsh-workers

On Mon, Nov 14, 2016, at 09:10 AM, Bart Schaefer wrote:
> The original code structure was
> 
>     if (A)
>       a;
>     else
>       x;
>     if (B)
>       b;
>     else if (C)
>       c;
> 
...
> This means that in the new structure, b never happens when A.  In the
> old code we might allocate the array ('') and then truncate to (), but
> in the new code we allocate ('') and then do nothing.

I was aware of nular being something different because tests failed
(it's an easy thing to find though, but I chosen to signal problem
instead of solving it), however as the two if blocks are not connected
with else, nular can be, as you said, "again" truncated to 0 length.
That I didn't foresee.

-- 
  Sebastian Gniazdowski
  psprint@fastmail.com


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Optimization of getarrvalue()
  2016-11-15 19:57         ` Peter Stephenson
  2016-11-15 21:11           ` Bart Schaefer
@ 2016-11-16 14:06           ` Jun T.
  2016-11-16 16:14             ` Jun T.
  2016-11-16 18:50             ` Bart Schaefer
  1 sibling, 2 replies; 22+ messages in thread
From: Jun T. @ 2016-11-16 14:06 UTC (permalink / raw)
  To: zsh-workers


On 2016/11/16, at 4:57, Peter Stephenson <p.w.stephenson@ntlworld.com> wrote:

> On Mon, 14 Nov 2016 21:32:19 +0900
> "Jun T." <takimoto-j@kba.biglobe.ne.jp> wrote:
>> 
>> zsh% nargs "${(@)a[i]}"
>> 
>> will output 0 only for i=0. On the other hand
>> 
>> zsh% nargs "${(@)a[i,i]}"
>> 
>> will output 0 for i=0 and 2.
> 
> 0 is an invalid subscript, which is probably the difference in the first
> case between it and other values.

The document says (15.2.1)

   If  the  KSH_ARRAYS  option  is not set, then by default accesses to an
   array element with a subscript that evaluates to zero return  an  empty
   string,

so someone may expect that "${(@)a[0]}" is also replaced by an empty
string "" rather than removed from the command line. But I'm not sure.
And I fear changing this behavior might break some existing scripts.

> setopt rcexpandparam
> local -A hash=(X x)                       
> print LOST key=$hash[(I)y] val=$hash[(R)Y]
> 
> outputs LOST with the arguments removed.  With your change you get an
> array wth an empty element and that doesn't happen.

I haven't yet understood how associative arrays are handled in C code,
but how about the following patch? With this patch all the test passes,
and "${(@)a[i,i]}" becomes an empty array (and removed from the command
line) only if i=0. I think no (reasonable) scripts are relying on the
current behavior that "${(@)a[i,i]}" is removed if i==($#a+1).


diff --git a/Src/params.c b/Src/params.c
index 3c8658c..c501e5d 100644
--- a/Src/params.c
+++ b/Src/params.c
@@ -2291,11 +2291,12 @@ getarrvalue(Value v)
     if (v->end < 0)
 	v->end += arrlen(s) + 1;
 
-    /* Null if 1) array too short, 2) index still negative */
-    if (v->end <= v->start) {
+    if (!*s || v->end <= v->start) {
+	/* Empty array if 1) s[] is empty or 2) inconsistent indexes */
 	s = arrdup_max(nular, 0);
     }
-    else if (arrlen_lt(s, v->start) || v->start < 0) {
+    else if (arrlen_le(s, v->start) || v->start < 0) {
+	/* Null if 1) array too short, 2) index still negative */
 	s = arrdup_max(nular, 1);
     } else {
         /* Copy to a point before the end of the source array:





^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Optimization of getarrvalue()
  2016-11-16 14:06           ` Jun T.
@ 2016-11-16 16:14             ` Jun T.
  2016-11-16 18:50             ` Bart Schaefer
  1 sibling, 0 replies; 22+ messages in thread
From: Jun T. @ 2016-11-16 16:14 UTC (permalink / raw)
  To: zsh-workers


2016/11/16 23:06, Jun T. <takimoto-j@kba.biglobe.ne.jp> wrote:
> 
> I haven't yet understood how associative arrays are handled in C code,
> but how about the following patch? With this patch all the test passes,

But with the patch, and if a=() ,"${(@)a[i,j]}" is always removed from
the command line; this is quite different from the current behavior.
Maybe it's not a good idea.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Optimization of getarrvalue()
  2016-11-16 14:06           ` Jun T.
  2016-11-16 16:14             ` Jun T.
@ 2016-11-16 18:50             ` Bart Schaefer
  2016-11-21 12:30               ` Jun T.
  1 sibling, 1 reply; 22+ messages in thread
From: Bart Schaefer @ 2016-11-16 18:50 UTC (permalink / raw)
  To: zsh-workers

On Nov 16, 11:06pm, Jun T. wrote:
}
} The document says (15.2.1)
} 
}    If  the  KSH_ARRAYS  option  is not set, then by default accesses to an
}    array element with a subscript that evaluates to zero return  an  empty
}    string,
} 
} so someone may expect that "${(@)a[0]}" is also replaced by an empty
} string "" rather than removed from the command line. But I'm not sure.

"Return an empty string" is not entirely accurate.  For example:

torch% setopt nounset
torch% a=()
torch% print $a[0]
zsh: a[0]: parameter not set

It would be more accurate to say "accesses to an array element with a
subscript that evaluates to zero behave in the same way as accesses to
any unset scalar parameter."

} And I fear changing this behavior might break some existing scripts.

The reason too-large subscripts behave differently than too-small is
that arrays can be extended to the right but not to the left.  That is,
if you assign to ary[bignum] you implicitly create empty elements for
any unoccupied indices less than bignum.  (This differs from e.g. bash
which uses a sparse implementation of arrays so the intervening elements
remain unset.)  A reference "off the right end" of the array behaves
consistently with a reference into such a "gap".

(This is also in part why I attempted to engage on NO_UNSET being too
aggressive with arrays, but nobody took up the discussion.)

Zero is special in that it's neither a "gap" nor an error (unless you try
to assign to it).


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Optimization of getarrvalue()
  2016-11-16 18:50             ` Bart Schaefer
@ 2016-11-21 12:30               ` Jun T.
  2016-11-24  0:55                 ` Bart Schaefer
  0 siblings, 1 reply; 22+ messages in thread
From: Jun T. @ 2016-11-21 12:30 UTC (permalink / raw)
  To: zsh-workers


On 2016/11/17, at 3:50, Bart Schaefer <schaefer@brasslantern.com> wrote:
> 
> It would be more accurate to say "accesses to an array element with a
> subscript that evaluates to zero behave in the same way as accesses to
> any unset scalar parameter."

Maybe, but I still don't understand, for example, the following:

% a=()
% nargs "${(@)a[0]}"
1
% a=( foo )
% nargs "${(@)a[0]}"
0

But anyway zero is "special" and treating it specially may be OK.

My main concern was, for example,

% a=( foo bar )
% nargs "${(@)a[3]}"
1
% nargs "${(@)a[3,3]}"     # or "${(@)a[3,4]}"  "${(@)a[3,5]}" ...
0

More generally, $a[i,j] with i=$#a+1 is treated specially.

For a=(), treating i=j=1 specially is necessary for RC_EXPAND_PARAM
to work correctly. But for non-empty array, it may be possible to
make i=$#a+1 behave similarly as other out-of-range values.
But maybe it is not worth doing if no one is complaining about this.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Optimization of getarrvalue()
  2016-11-21 12:30               ` Jun T.
@ 2016-11-24  0:55                 ` Bart Schaefer
  2016-11-24 11:49                   ` Jun T.
  0 siblings, 1 reply; 22+ messages in thread
From: Bart Schaefer @ 2016-11-24  0:55 UTC (permalink / raw)
  To: Jun T.; +Cc: zsh-workers

On Mon, Nov 21, 2016 at 4:30 AM, Jun T. <takimoto-j@kba.biglobe.ne.jp> wrote:
>
> Maybe, but I still don't understand, for example, the following:
>
> % a=()
> % nargs "${(@)a[0]}"
> 1
> % a=( foo )
> % nargs "${(@)a[0]}"
> 0

OK, I agree that one is strange.  However:

> % a=( foo bar )
> % nargs "${(@)a[3]}"
> 1
> % nargs "${(@)a[3,3]}"     # or "${(@)a[3,4]}"  "${(@)a[3,5]}" ...
> 0

Two subscripts separated by a comma mean something different than one
subscript.  a[3] is a scalar, so adding (@) doesn't matter; a scalar
in double quotes is the empty string.  a[3,3] is an array, and an
array with the (@) flag in double quotes resolves to nothing if the
array is empty.  The only magic here is the already-magic semantics of
double-quoting "${a[@]}".

> More generally, $a[i,j] with i=$#a+1 is treated specially.

??  It's not treated any differently than i=$#a+N for any N >=0, is it?


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Optimization of getarrvalue()
  2016-11-24  0:55                 ` Bart Schaefer
@ 2016-11-24 11:49                   ` Jun T.
  2016-11-29  6:11                     ` Array slices that don't exist [was Optimization of getarrvalue()] Bart Schaefer
  0 siblings, 1 reply; 22+ messages in thread
From: Jun T. @ 2016-11-24 11:49 UTC (permalink / raw)
  To: zsh-workers


On 2016/11/24, at 9:55, Bart Schaefer <schaefer@brasslantern.com> wrote:

>> More generally, $a[i,j] with i=$#a+1 is treated specially.
> 
> ??  It's not treated any differently than i=$#a+N for any N >=0, is it?

Suppose we have an array with two elements ($#a = 2)
% a=(one two)
and see what is returned by
% nargs "${(@)a[i,j]}"

For i >= 4 or i <= -3 it returns 1 if j >= i,
while for i=3 (=$#a+1) it returns 0 always.

But, as I said before, if no one is complaining about this we can leave
it as is.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Array slices that don't exist [was Optimization of getarrvalue()]
  2016-11-24 11:49                   ` Jun T.
@ 2016-11-29  6:11                     ` Bart Schaefer
  2016-11-29  9:34                       ` Peter Stephenson
  0 siblings, 1 reply; 22+ messages in thread
From: Bart Schaefer @ 2016-11-29  6:11 UTC (permalink / raw)
  To: zsh-workers

On Nov 24,  8:49pm, Jun T. wrote:
}
} % a=(one two)
} and see what is returned by
} % nargs "${(@)a[i,j]}"
} 
} For i >= 4 or i <= -3 it returns 1 if j >= i,
} while for i=3 (=$#a+1) it returns 0 always.

Hmm.

Internally the array $a is stored as a[0]="one" a[1]="two" a[2]=NULL so
when we look at $a[i,j] we decrement i but not j to look at the internal
elements starting with a[i-1] and ending before (not including) a[j].

This means that if $i = $#a we're always going to take the branch that
now has the comment

        /* Copy to a point before the end of the source array:
         * arrdup_max will copy at most v->end - v->start elements,
         * starting from v->start element. Original code said:
	 *  s[v->end - v->start] = NULL

with v->start pointing at the NULL terminator of the internal array, so
we return an empty array.

If $i > $#a then we take the arrlen_lt(s, v->start) branch and always
return a single element.

So the question is ... is the following more consistent?

for i in 1 2 3 4; do
 for j in 1 2 3 4 5; do
  print -n "$i $j = "
  nargs "${(@)a[i,j]}"
 done
done

5.2       |  5.3 **
----------+----------
1 1 => 1  |  1 1 => 1
1 2 => 2  |  1 2 => 2
1 3 => 2  |  1 3 => 2
1 4 => 2  |  1 4 => 2
1 5 => 2  |  1 5 => 2
2 1 => 0  |  2 1 => 0
2 2 => 1  |  2 2 => 1
2 3 => 1  |  2 3 => 1
2 4 => 1  |  2 4 => 1
2 5 => 1  |  2 5 => 1
3 1 => 0  |  3 1 => 0
3 2 => 0  |  3 2 => 0
3 3 => 0  |  3 3 => 0
3 4 => 0  |  3 4 => 1   **
3 5 => 0  |  3 5 => 1   **
4 1 => 0  |  4 1 => 0
4 2 => 0  |  4 2 => 0
4 3 => 0  |  4 3 => 0
4 4 => 1  |  4 4 => 0   **
4 5 => 1  |  4 5 => 1

This means you have to slice more than one non-existent element to
get an empty string; a slice of one such element is an empty array.

All "make check" tests still pass with that behavior though I'm not
certain that all completions would work identically.

Code looks like this:

diff --git a/Src/params.c b/Src/params.c
index 45f398a..aa8b196 100644
--- a/Src/params.c
+++ b/Src/params.c
@@ -2299,9 +2299,16 @@ getarrvalue(Value v)
     if (v->end <= v->start) {
 	s = arrdup_max(nular, 0);
     }
-    else if (arrlen_lt(s, v->start) || v->start < 0) {
+    else if (v->start < 0) {
 	s = arrdup_max(nular, 1);
-    } else {
+    }
+    else if (arrlen_le(s, v->start)) {
+	/* Handle $ary[i,i] consistently for any $i > $#ary
+	 * and $ary[i,j] consistently for any $j > $i > $#ary
+	 */
+	s = arrdup_max(nular, v->end - (v->start + 1));
+    }
+    else {
         /* Copy to a point before the end of the source array:
          * arrdup_max will copy at most v->end - v->start elements,
          * starting from v->start element. Original code said:


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Array slices that don't exist [was Optimization of getarrvalue()]
  2016-11-29  6:11                     ` Array slices that don't exist [was Optimization of getarrvalue()] Bart Schaefer
@ 2016-11-29  9:34                       ` Peter Stephenson
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Stephenson @ 2016-11-29  9:34 UTC (permalink / raw)
  To: zsh-workers

On Mon, 28 Nov 2016 22:11:44 -0800
Bart Schaefer <schaefer@brasslantern.com> wrote:
> So the question is ... is the following more consistent?

It definitely *looks* more consistent.

I imagine at the point where we can only see what happens when we throw
it out.

pws


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2016-11-29  9:44 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CGME20161108201233epcas1p1e2900e2d67af8b8558ebdb70eb7ad480@epcas1p1.samsung.com>
2016-11-08 20:11 ` [PATCH] Optimization of getarrvalue() Sebastian Gniazdowski
2016-11-08 21:58   ` Bart Schaefer
2016-11-09  7:11   ` Bart Schaefer
2016-11-09 11:42   ` Peter Stephenson
2016-11-09 16:03     ` Bart Schaefer
2016-11-14 12:32       ` Jun T.
2016-11-14 13:15         ` Jun T.
2016-11-14 13:57         ` Peter Stephenson
2016-11-14 15:35           ` Jun T.
2016-11-14 17:10           ` Bart Schaefer
2016-11-16  7:55             ` Sebastian Gniazdowski
2016-11-15 12:28         ` Peter Stephenson
2016-11-15 19:57         ` Peter Stephenson
2016-11-15 21:11           ` Bart Schaefer
2016-11-16 14:06           ` Jun T.
2016-11-16 16:14             ` Jun T.
2016-11-16 18:50             ` Bart Schaefer
2016-11-21 12:30               ` Jun T.
2016-11-24  0:55                 ` Bart Schaefer
2016-11-24 11:49                   ` Jun T.
2016-11-29  6:11                     ` Array slices that don't exist [was Optimization of getarrvalue()] Bart Schaefer
2016-11-29  9:34                       ` Peter Stephenson

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).