zsh-workers
 help / color / mirror / code / Atom feed
* [patch] memoize string offset lookups
@ 2018-03-14  2:43 Matthew Martin
  2018-03-18 23:30 ` Bart Schaefer
  0 siblings, 1 reply; 2+ messages in thread
From: Matthew Martin @ 2018-03-14  2:43 UTC (permalink / raw)
  To: zsh-workers

(Idea from 42227.)

Below patch caches the relevant information for the last getarg call in
the param struct to hopefully speedup future calls. In the best case, it
is quite successful.

% ../zsh-unpatched -fc 'a=$(repeat 50000; print -n a); time ( for (( i=1; i <= 50000; i++)); print $a[i] >/dev/null; )'
( for ((i=1; i <= 50000; i++)) do; print $a[i] > /dev/null; done; )  4.56s user 0.46s system 99% cpu 5.035 total
% ./zsh -fc 'a=$(repeat 50000; print -n a); time ( for (( i=1; i <= 50000; i++)); print $a[i] >/dev/null; )' 
( for ((i=1; i <= 50000; i++)) do; print $a[i] > /dev/null; done; )  0.68s user 0.29s system 99% cpu 0.970 total

% ../zsh-unpatched -fc 'a=$(repeat 100000; print -n a); time ( for (( i=1; i <= 100000; i++)); print $a[i] >/dev/null; )'
( for ((i=1; i <= 100000; i++)) do; print $a[i] > /dev/null; done; )  18.00s user 0.64s system 99% cpu 18.710 total
% ./zsh -fc 'a=$(repeat 100000; print -n a); time ( for (( i=1; i <= 100000; i++)); print $a[i] >/dev/null; )'
( for ((i=1; i <= 100000; i++)) do; print $a[i] > /dev/null; done; )  1.80s user 0.66s system 99% cpu 2.475 total

But in the worst case it hurts.

% ../zsh-unpatched -fc 'a=$(repeat 50000; print -n a); time ( for (( i=50000; i; i--)); print $a[i] >/dev/null; )' 
( for ((i=50000; i; i--)) do; print $a[i] > /dev/null; done; )  4.68s user 0.33s system 99% cpu 5.022 total
% ./zsh -fc 'a=$(repeat 50000; print -n a); time ( for (( i=50000; i; i--)); print $a[i] >/dev/null; )' 
( for ((i=50000; i; i--)) do; print $a[i] > /dev/null; done; )  6.88s user 0.51s system 99% cpu 7.422 tota

Any ideas if the worst case can be improved?

- Matthew Martin



diff --git a/Src/params.c b/Src/params.c
index de7730ae7..5556ca634 100644
--- a/Src/params.c
+++ b/Src/params.c
@@ -1476,11 +1476,22 @@ getarg(char **str, int *inv, Value v, int a2, zlong *w,
 	     * for Meta characters and maybe for multibyte characters.
 	     */
 	    if (r > 0) {
-		zlong nchars = r;
+		zlong nchars;
 
 		MB_METACHARINIT();
-		for (t = s; nchars && *t; nchars--)
+		if (v->pm->prev_idx > 0 && v->pm->prev_idx <= r) {
+		    t = s + v->pm->prev_off;
+		    lastcharlen = v->pm->prev_lcl;
+		    nchars = r - v->pm->prev_idx;
+		} else {
+		    t = s;
+		    nchars = r;
+		}
+		for (; nchars && *t; nchars--)
 		    t += (lastcharlen = MB_METACHARLEN(t));
+		v->pm->prev_idx = r;
+		v->pm->prev_lcl = lastcharlen;
+		v->pm->prev_off = t - s;
 		/* for consistency, keep any remainder off the end */
 		r = (zlong)(t - s) + nchars;
 		if (prevcharlen && !nchars /* ignore if off the end */)
diff --git a/Src/zsh.h b/Src/zsh.h
index 8b4898477..99508dc05 100644
--- a/Src/zsh.h
+++ b/Src/zsh.h
@@ -1820,6 +1820,9 @@ struct param {
     char *ename;		/* name of corresponding environment var */
     Param old;			/* old struct for use with local         */
     int level;			/* if (old != NULL), level of localness  */
+    zlong prev_idx;		/* previous offset in getargs() */
+    zlong prev_off;		/* previous offset into */
+    int prev_lcl;		/* previous lastcharlen */
 };
 
 /* structure stored in struct param's u.data by tied arrays */


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [patch] memoize string offset lookups
  2018-03-14  2:43 [patch] memoize string offset lookups Matthew Martin
@ 2018-03-18 23:30 ` Bart Schaefer
  0 siblings, 0 replies; 2+ messages in thread
From: Bart Schaefer @ 2018-03-18 23:30 UTC (permalink / raw)
  To: zsh-workers

On Mar 13,  9:43pm, Matthew Martin wrote:
}
} (Idea from 42227.)

If you follow that thread through to 42237, the last word on the subject
was that it's probably not worth it to memoize more than the single most
recent call to the function.  In which case it can be done with statics
in the function itself, rather than by expanding the param struct and
adding 20+ bytes to every parameter.

We'd need some real-world information on how common it is to iterate by
characters over two or more parameters in alternation, to know whether
tracking every parameter separately is worth the overhead.

} But in the worst case it hurts.

Iterating backwards over a multibyte character array is always painful.
Related to what Stephane mentions in 42466, it might be worthwhile to
have another optimization in this same bit of code to do the indexing
without the character counting when MB_CUR_MAX == 1.

One thought: Memoize two posistions in the array, one advancing to the
largest index seen so far and the other to half that, so you only have
to recount about half as much if you start skipping backward.  I don't
have a good feel for when the record-keeping becomes more expensive
than the recounting, so this may be a hare-brained scheme.


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2018-03-18 23:30 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-14  2:43 [patch] memoize string offset lookups Matthew Martin
2018-03-18 23:30 ` Bart Schaefer

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).