zsh-workers
 help / color / mirror / code / Atom feed
* [PATCH 3/3] Constify two local variables.
@ 2015-11-30  3:21 Daniel Shahaf
  2015-11-30  9:38 ` Peter Stephenson
  0 siblings, 1 reply; 11+ messages in thread
From: Daniel Shahaf @ 2015-11-30  3:21 UTC (permalink / raw)
  To: zsh-workers

---
I assume that v->pm->gsu.a->getfn() has no access to v->start and v->end, hence
changing the order is safe.

Are there any compilers that choke on 'char **const' (const pointer to pointer
to char)?

Cheers,

Daniel

 Src/params.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/Src/params.c b/Src/params.c
index d8bf83d..142697f 100644
--- a/Src/params.c
+++ b/Src/params.c
@@ -2552,18 +2552,20 @@ setarrvalue(Value v, char **val)
 	     v->pm->node.nam);
 	return;
     } else {
-	char **old, **new, **p, **q, **r;
-	int pre_assignment_length;
+	char **const old = v->pm->gsu.a->getfn(v->pm);
+	char **new;
+	char **p, **q, **r; /* index variables */
+	const int pre_assignment_length = arrlen(old);
 	int post_assignment_length;
 	int i;
 
+	q = old;
+
 	if ((v->flags & VALFLAG_INV) && unset(KSHARRAYS)) {
 	    if (v->start > 0)
 		v->start--;
 	    v->end--;
 	}
-	q = old = v->pm->gsu.a->getfn(v->pm);
-	pre_assignment_length = arrlen(old);
 	if (v->start < 0) {
 	    v->start += pre_assignment_length;
 	    if (v->start < 0)
-- 
2.1.4


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] Constify two local variables.
  2015-11-30  3:21 [PATCH 3/3] Constify two local variables Daniel Shahaf
@ 2015-11-30  9:38 ` Peter Stephenson
  2015-11-30 15:59   ` Daniel Shahaf
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Stephenson @ 2015-11-30  9:38 UTC (permalink / raw)
  To: zsh-workers

On Mon, 30 Nov 2015 03:21:53 +0000
Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> I assume that v->pm->gsu.a->getfn() has no access to v->start and
> v->end, hence changing the order is safe.

Yes, the getfn() is very low level, dealing only with storage for that
parameter and having no knowledge at all of any expansion/substitution
happening above it.

> Are there any compilers that choke on 'char **const' (const pointer to pointer
> to char)?

I suspect we ditched those some years ago when we started making more
use of const, but there might be a few oddballs lying around.

By the way...

> See for example the number of uses of mkarry()

that should have been "mkarray()".  I see 19 uses, then there's
also hmkarray() which comes from the heap and is only used 3 times.

pws


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] Constify two local variables.
  2015-11-30  9:38 ` Peter Stephenson
@ 2015-11-30 15:59   ` Daniel Shahaf
  2015-11-30 16:28     ` Bart Schaefer
  2015-11-30 16:42     ` Peter Stephenson
  0 siblings, 2 replies; 11+ messages in thread
From: Daniel Shahaf @ 2015-11-30 15:59 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 2301 bytes --]

Peter Stephenson wrote on Mon, Nov 30, 2015 at 09:38:32 +0000:
> On Mon, 30 Nov 2015 03:21:53 +0000
> Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> > I assume that v->pm->gsu.a->getfn() has no access to v->start and
> > v->end, hence changing the order is safe.
> 
> Yes, the getfn() is very low level, dealing only with storage for that
> parameter and having no knowledge at all of any expansion/substitution
> happening above it.
> 

Thanks, that's what I expected.  What are the lifetime interfaces of
getfn/setfn of arrays?  As far as I can tell, getfn returns a "borrowed
reference" (i.e., something whose lifetime is managed by the parameter)
and setfn "steals" a reference to its second parameter (assumes
responsibility for freeing it).

I'm asking because I found that $region_highlights' getfn returns
a heap-allocated array, while its setfn calls free() on the array passed
to it, so
.
    Param pm = $region_highlights;
    values_a = pm->gsu.a->getfn();
    pm->gsu.a->setfn(values_a);
.
is undefined behaviour (calls free() on a heap pointer, triggering SIGABRT).

That's a reduced example of what the "don't reallocate if it's the same
size" patch does, and I'm not sure how to make it work: should I change
get_re, the setfn, or the callsite which tries to pipe them into each
other?

(The getfn is get_region_highlight.  The setfn is set_region_highlight.
I'm attaching that patch in its current state, in case it's relevant;
however, it's not ready to be committed, since it triggers the SIGABRT
whenever I invoke backward-kill-word.)

> > Are there any compilers that choke on 'char **const' (const pointer to pointer
> > to char)?
> 
> I suspect we ditched those some years ago when we started making more
> use of const, but there might be a few oddballs lying around.
> 

Ack.  I'll hold this patch (37253) until the release too then (unless
there's another test version first).  No point in breaking people's
builds if I can avoid it.

> > See for example the number of uses of mkarry()
> 
> that should have been "mkarray()".  I see 19 uses, then there's
> also hmkarray() which comes from the heap and is only used 3 times.

*nod*, however, I haven't looked into these yet; I have been looking
into the "don't reallocate if it's the same size" part first.

Thanks,

Daniel

[-- Attachment #2: 0004-Optimize-non-resizing-array-assignments.patch --]
[-- Type: text/x-patch, Size: 4066 bytes --]

>From c4cba933db3e972a5539e489e45d5908918e3b59 Mon Sep 17 00:00:00 2001
From: Daniel Shahaf <d.s@daniel.shahaf.name>
Date: Mon, 30 Nov 2015 09:18:12 +0000
Subject: [PATCH] Optimize non-resizing array assignments.

Before:
	% zsh -fc 'local -a a; N=5000; time ( a[N]=bar; integer i; while (( i++ < N )) { a[i]=foo } )'
	2.07s user 0.00s system 99% cpu 2.077 total

After:
	% zsh -fc 'local -a a; N=5000; time ( a[N]=bar; integer i; while (( i++ < N )) { a[i]=foo } )'
	0.11s user 0.00s system 98% cpu 0.114 total

Also add a test demonstrating that this codepath must call setfn(), and
another triggering the 'v->end > arrlen(old)' case.
---
To be applied on top of 37253.

 Src/params.c        | 70 +++++++++++++++++++++++++++++++++++++----------------
 Test/A06assign.ztst | 12 +++++++++
 2 files changed, 61 insertions(+), 21 deletions(-)

diff --git a/Src/params.c b/Src/params.c
index 142697f..94c996a 100644
--- a/Src/params.c
+++ b/Src/params.c
@@ -2553,13 +2553,9 @@ setarrvalue(Value v, char **val)
 	return;
     } else {
 	char **const old = v->pm->gsu.a->getfn(v->pm);
-	char **new;
-	char **p, **q, **r; /* index variables */
 	const int pre_assignment_length = arrlen(old);
+	const int slice_length = arrlen(val);
 	int post_assignment_length;
-	int i;
-
-	q = old;
 
 	if ((v->flags & VALFLAG_INV) && unset(KSHARRAYS)) {
 	    if (v->start > 0)
@@ -2579,24 +2575,56 @@ setarrvalue(Value v, char **val)
 	if (v->end < v->start)
 	    v->end = v->start;
 
-	post_assignment_length = v->start + arrlen(val);
+	post_assignment_length = v->start + slice_length;
 	if (v->end <= pre_assignment_length)
-	    post_assignment_length += pre_assignment_length - v->end + 1;
-
-	p = new = (char **) zshcalloc(sizeof(char *)
-		                      * (post_assignment_length + 1));
-
-	for (i = 0; i < v->start; i++)
-	    *p++ = i < pre_assignment_length ? ztrdup(*q++) : ztrdup("");
-	for (r = val; *r;)
-	    *p++ = ztrdup(*r++);
-	if (v->end < pre_assignment_length)
-	    for (q = old + v->end; *q;)
-		*p++ = ztrdup(*q++);
-	*p = NULL;
+	    post_assignment_length += pre_assignment_length - v->end;
+
+	if (post_assignment_length == pre_assignment_length) {
+	    /* Optimization: avoid reallocating. */
+	    const int effective_end_index = MIN(v->end, pre_assignment_length);
+	    int i;
+
+	    DPUTS3(v->start + slice_length != effective_end_index,
+		   "BUG: %s: start=%d end=%d",
+		    v->pm->node.nam, v->start, v->end);
+	    DPUTS3(v->start + slice_length != effective_end_index,
+		   "BUG: %s: slice#=%d arr#=%d",
+		    v->pm->node.nam, slice_length, pre_assignment_length);
+	   
+	    /* Shallow copy 'val' into 'old'. */
+	    for (i = v->start; i < effective_end_index; i++)
+		zsfree(old[i]);
+	    memcpy(old + v->start, val, (i - v->start) * sizeof(char *));
+
+	    v->pm->gsu.a->setfn(v->pm, old);
+	    free(val);
+	} else {
+	    /* Why allocate N+2 pointers?  The first N are the elements, the 
+	     * (N+1)th is the NULL sentinel, but what is the (N+2)th pointer?
+	     */
+	    char **const new =
+		(char**) zshcalloc(sizeof(char *)
+				   * (post_assignment_length + 2));
+	    char **p, **q, **r; /* index variables */
+	    int i;
+
+	    q = old;
+	    p = new;
+	    r = val;
+
+	    for (i = 0; i < v->start; i++)
+		*p++ = i < pre_assignment_length ? ztrdup(*q++) : ztrdup("");
+	    for (r = val; *r;)
+		*p++ = ztrdup(*r++);
+	    if (v->end < pre_assignment_length)
+		for (q = old + v->end; *q;)
+		    *p++ = ztrdup(*q++);
+	    *p = NULL;
+
+	    v->pm->gsu.a->setfn(v->pm, new);
+	    freearray(val);
+	}
 
-	v->pm->gsu.a->setfn(v->pm, new);
-	freearray(val);
     }
 }
 
diff --git a/Test/A06assign.ztst b/Test/A06assign.ztst
index 1e3d2ed..17f7c63 100644
--- a/Test/A06assign.ztst
+++ b/Test/A06assign.ztst
@@ -478,3 +478,15 @@
 >first second
 >first
 >second
+
+ typeset -aU unique_array=(first second)
+ unique_array[1]=second
+ print $unique_array
+0:assignment to unique array
+>second
+
+ typeset -a array=(first)
+ array[1,3]=(FIRST)
+ print $array
+0:slice beyond length of array
+>FIRST
-- 
2.1.4


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] Constify two local variables.
  2015-11-30 15:59   ` Daniel Shahaf
@ 2015-11-30 16:28     ` Bart Schaefer
  2015-11-30 16:42     ` Peter Stephenson
  1 sibling, 0 replies; 11+ messages in thread
From: Bart Schaefer @ 2015-11-30 16:28 UTC (permalink / raw)
  To: zsh-workers

On Nov 30,  3:59pm, Daniel Shahaf wrote:
}
} I'm asking because I found that $region_highlights' getfn returns
} a heap-allocated array, while its setfn calls free() on the array passed
} to it, so
} .
}     Param pm = $region_highlights;
}     values_a = pm->gsu.a->getfn();
}     pm->gsu.a->setfn(values_a);
} .
} is undefined behaviour (calls free() on a heap pointer, triggering SIGABRT).

In general you can't pass the result of any getfn() back to its own
or anyone else's setfn().  All setfn() should always expect a newly-
zshcalloc()d [or the equivalent] array.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] Constify two local variables.
  2015-11-30 15:59   ` Daniel Shahaf
  2015-11-30 16:28     ` Bart Schaefer
@ 2015-11-30 16:42     ` Peter Stephenson
  2015-11-30 20:39       ` Bart Schaefer
  1 sibling, 1 reply; 11+ messages in thread
From: Peter Stephenson @ 2015-11-30 16:42 UTC (permalink / raw)
  To: zsh-workers

On Mon, 30 Nov 2015 15:59:45 +0000
Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> Peter Stephenson wrote on Mon, Nov 30, 2015 at 09:38:32 +0000:
> > On Mon, 30 Nov 2015 03:21:53 +0000
> > Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> > > I assume that v->pm->gsu.a->getfn() has no access to v->start and
> > > v->end, hence changing the order is safe.
> > 
> > Yes, the getfn() is very low level, dealing only with storage for that
> > parameter and having no knowledge at all of any expansion/substitution
> > happening above it.
> > 
> 
> Thanks, that's what I expected.  What are the lifetime interfaces of
> getfn/setfn of arrays?

That's less well defined than it should be, but the implication is
obviously they will be used within the liftime of the current heap.
This is common in e.g. Src/Modules/parameter.c for returning parameters
reflecting some internal state.  Making sure this is what happens isn't
easy.  The only real difference from any other use of the heap is that
the source of the heap allocation is quite well buried.  But maybe
that's not much of a difference, since there's much heap allocation a
few layers above in paramsubst().

As Bart says, setfn() expects a permanently allocated values; if it
doesn't neeed it, it's up to the setfn in question to convert the value
and then free what's passed in.

pws


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] Constify two local variables.
  2015-11-30 16:42     ` Peter Stephenson
@ 2015-11-30 20:39       ` Bart Schaefer
  2015-11-30 21:36         ` Bart Schaefer
  2015-12-02  0:36         ` Daniel Shahaf
  0 siblings, 2 replies; 11+ messages in thread
From: Bart Schaefer @ 2015-11-30 20:39 UTC (permalink / raw)
  To: zsh-workers

On Nov 30,  4:42pm, Peter Stephenson wrote:
}
} As Bart says, setfn() expects a permanently allocated values; if it
} doesn't neeed it, it's up to the setfn in question to convert the value
} and then free what's passed in.

Thinking about it, it is quite likely that this makes it impossible to
optimize in the way Daniel wants.  At a very low level, we can't swap
values into an existing (char**) representing an array parameter.  All
the setfn() implementations would have to be updated to notice when
the pointer passed in is the same as the pointer already in the param
struct and skip the free, or something like that.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] Constify two local variables.
  2015-11-30 20:39       ` Bart Schaefer
@ 2015-11-30 21:36         ` Bart Schaefer
  2015-12-02  0:36         ` Daniel Shahaf
  1 sibling, 0 replies; 11+ messages in thread
From: Bart Schaefer @ 2015-11-30 21:36 UTC (permalink / raw)
  To: zsh-workers

On Nov 30, 12:39pm, Bart Schaefer wrote:
} Subject: Re: [PATCH 3/3] Constify two local variables.
}
} All the setfn() implementations would have to be updated to notice when
} the pointer passed in is the same as the pointer already in the param
} struct and skip the free, or something like that.

Hmm, the default array and hash setfn() already do that.  So it would
only be an issue with some of the specials?


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] Constify two local variables.
  2015-11-30 20:39       ` Bart Schaefer
  2015-11-30 21:36         ` Bart Schaefer
@ 2015-12-02  0:36         ` Daniel Shahaf
  2015-12-02  1:11           ` Bart Schaefer
  1 sibling, 1 reply; 11+ messages in thread
From: Daniel Shahaf @ 2015-12-02  0:36 UTC (permalink / raw)
  To: zsh-workers

Bart Schaefer wrote on Mon, Nov 30, 2015 at 12:39:37 -0800:
> On Nov 30,  4:42pm, Peter Stephenson wrote:
> }
> } As Bart says, setfn() expects a permanently allocated values; if it
> } doesn't neeed it, it's up to the setfn in question to convert the value
> } and then free what's passed in.
> 
> Thinking about it, it is quite likely that this makes it impossible to
> optimize in the way Daniel wants.  At a very low level, we can't swap
> values into an existing (char**) representing an array parameter.  All
> the setfn() implementations would have to be updated to notice when
> the pointer passed in is the same as the pointer already in the param
> struct and skip the free, or something like that.

There's a sort of a third way here: I could change

    vals = getfn()
    munge_in_place(vals)
    setfn(vals)

to

    #define N (sizeof(void*) * arrlen(vals))
    vals = getfn()
    munge_in_place(vals)
    dummy = zalloc(N)
    memcpy(dummy, vals, N)
    setfn(dummy)

which still does O(N) work, but does _less_ work: it still allocates and
initializes one chunk of N pointers, but (compared to the setarrvalue()
in HEAD) doesn't allocate and initialize N chunks of one element each.

Bart Schaefer wrote on Mon, Nov 30, 2015 at 13:36:17 -0800:
> On Nov 30, 12:39pm, Bart Schaefer wrote:
> } Subject: Re: [PATCH 3/3] Constify two local variables.
> }
> } All the setfn() implementations would have to be updated to notice when
> } the pointer passed in is the same as the pointer already in the param
> } struct and skip the free, or something like that.
> 
> Hmm, the default array and hash setfn() already do that.  So it would
> only be an issue with some of the specials?

Specials, and module-defined parameters that don't use arrsetfn(),
I believe.

So, one option is to change all setfn()s to support the setfn(getfn())
invocation...?  But that'll mean modules written for 5.1.2 would have to
be *patched*, not just recompiled, to work with 5.1.3, unless we put in
a way for Param's to indicate whether they support the setfn(getfn())
invocation (to let core skip the optimization for Param's that don't
support that).

But in the meantime, we can just limit the optimization to Param's that
are known to support it:

/* to be applied on top of 37257 (which itself is to be applied on top of 37253) */
diff --git a/Src/params.c b/Src/params.c
index 94c996a..1372958 100644
--- a/Src/params.c
+++ b/Src/params.c
@@ -2579,7 +2579,11 @@ setarrvalue(Value v, char **val)
        if (v->end <= pre_assignment_length)
            post_assignment_length += pre_assignment_length - v->end;
 
-       if (post_assignment_length == pre_assignment_length) {
+       if (post_assignment_length == pre_assignment_length &&
+           v->pm->gsu.a->setfn == arrsetfn) {
            /* Optimization: avoid reallocating. */
+           /* Only enable the optimization for setfn's that support being called
+            * on the return value of getfn, such as arrsetfn. */
+
            const int effective_end_index = MIN(v->end, pre_assignment_length);
            int i;

This fixes the SIGABRT, I think I can commit it like this.  So, I'll
commit 37253 and 37257+this after the release. 

Now, one of my actual use-cases for this happens to be region_highlight
(for zsh-syntax-highlighting reasons), which doesn't use arrsetfn, but
I'll see about that separately...

Cheers,

Daniel



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] Constify two local variables.
  2015-12-02  0:36         ` Daniel Shahaf
@ 2015-12-02  1:11           ` Bart Schaefer
  2015-12-03 23:37             ` Daniel Shahaf
  0 siblings, 1 reply; 11+ messages in thread
From: Bart Schaefer @ 2015-12-02  1:11 UTC (permalink / raw)
  To: zsh-workers

On Dec 2, 12:36am, Daniel Shahaf wrote:
} Subject: Re: [PATCH 3/3] Constify two local variables.
}
}     #define N (sizeof(void*) * arrlen(vals))
}     vals = getfn()
}     munge_in_place(vals)
}     dummy = zalloc(N)
}     memcpy(dummy, vals, N)
}     setfn(dummy)

Would that memcpy() really work, or would it need to be zarrdup() to copy
each of the elements as well?  Because e.g. arrsetfn() does freearray().


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] Constify two local variables.
  2015-12-02  1:11           ` Bart Schaefer
@ 2015-12-03 23:37             ` Daniel Shahaf
  2015-12-04  0:36               ` Bart Schaefer
  0 siblings, 1 reply; 11+ messages in thread
From: Daniel Shahaf @ 2015-12-03 23:37 UTC (permalink / raw)
  To: zsh-workers

Bart Schaefer wrote on Tue, Dec 01, 2015 at 17:11:21 -0800:
> On Dec 2, 12:36am, Daniel Shahaf wrote:
> } Subject: Re: [PATCH 3/3] Constify two local variables.
> }
> }     #define N (sizeof(void*) * arrlen(vals))
> }     vals = getfn()
> }     munge_in_place(vals)
> }     dummy = zalloc(N)
> }     memcpy(dummy, vals, N)
> }     setfn(dummy)
> 
> Would that memcpy() really work, or would it need to be zarrdup() to copy
> each of the elements as well?  Because e.g. arrsetfn() does freearray().

I think it would work, because setarrvalue() in HEAD calls freearray()
on the values array, implying that the individual elements are already
permanently allocated.

---

In the meantime I've found a reproducible invalid free() with this
patchset (reproducible by invoking _gnu_generic on ": Sr<TAB>" in the
source tree).  I haven't investigated yet.

Daniel


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] Constify two local variables.
  2015-12-03 23:37             ` Daniel Shahaf
@ 2015-12-04  0:36               ` Bart Schaefer
  0 siblings, 0 replies; 11+ messages in thread
From: Bart Schaefer @ 2015-12-04  0:36 UTC (permalink / raw)
  To: zsh-workers

On Dec 3, 11:37pm, Daniel Shahaf wrote:
} Subject: Re: [PATCH 3/3] Constify two local variables.
}
} Bart Schaefer wrote on Tue, Dec 01, 2015 at 17:11:21 -0800:
} > On Dec 2, 12:36am, Daniel Shahaf wrote:
} > }
} > }     memcpy(dummy, vals, N)
} > }     setfn(dummy)
} > 
} > Would that memcpy() really work, or would it need to be zarrdup() to copy
} > each of the elements as well?  Because e.g. arrsetfn() does freearray().
} 
} I think it would work, because setarrvalue() in HEAD calls freearray()
} on the values array, implying that the individual elements are already
} permanently allocated.

That's exactly why I think it would NOT work!  If you memcp() only the
pointers and then pass a new pointer-to-pointer into arrsetfn(), it's
going to call freearray() and discard all the individual pointers that
you intended would become the [re-used] elements.


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-12-04  0:36 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-30  3:21 [PATCH 3/3] Constify two local variables Daniel Shahaf
2015-11-30  9:38 ` Peter Stephenson
2015-11-30 15:59   ` Daniel Shahaf
2015-11-30 16:28     ` Bart Schaefer
2015-11-30 16:42     ` Peter Stephenson
2015-11-30 20:39       ` Bart Schaefer
2015-11-30 21:36         ` Bart Schaefer
2015-12-02  0:36         ` Daniel Shahaf
2015-12-02  1:11           ` Bart Schaefer
2015-12-03 23:37             ` Daniel Shahaf
2015-12-04  0:36               ` Bart Schaefer

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).