zsh-workers
* Callgrind run
@ 2016-11-10 10:37 ` Sebastian Gniazdowski
  2016-11-10 12:31   ` Peter Stephenson
  2016-11-10 13:47   ` multibyte optimisations Peter Stephenson
  0 siblings, 2 replies; 5+ messages in thread
From: Sebastian Gniazdowski @ 2016-11-10 10:37 UTC
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 2353 bytes --]

Hello,
I've run callgrind on Zsh while it executes syntax-highlighting code
that parses 823 lines of code:

2,269,560,047  ???:mb_metacharlenconv_r [/usr/local/bin/zsh-debug-opt]
1,698,947,505  ???:remnulargs [/usr/local/bin/zsh-debug-opt]
1,677,804,272  ???:_UTF8_mbrtowc [/usr/lib/system/libsystem_c.dylib]
1,425,973,736  ???:mbrtowc [/usr/lib/system/libsystem_c.dylib]
1,177,994,701  ???:untokenize [/usr/local/bin/zsh-debug-opt]
1,048,181,974  ???:mb_metacharlenconv [/usr/local/bin/zsh-debug-opt]
1,036,055,574  ???:getindex'2 [/usr/local/bin/zsh-debug-opt]
  793,202,632  ???:haswilds [/usr/local/bin/zsh-debug-opt]
  578,630,988  ???:mb_metastrlenend [/usr/local/bin/zsh-debug-opt]
  483,051,992  ???:szone_free_definite_size [/usr/lib/system/libsystem_malloc.dylib]
  436,411,797  ???:ztrsub [/usr/local/bin/zsh-debug-opt]
  364,444,476  ???:tiny_malloc_from_free_list [/usr/lib/system/libsystem_malloc.dylib]
  353,826,375  ???:pattrylen'2 [/usr/local/bin/zsh-debug-opt]
  280,090,072  ???:tiny_free_list_add_ptr [/usr/lib/system/libsystem_malloc.dylib]
  258,502,596  ???:strlen [/usr/lib/dyld]
  234,273,918  ???:pattrylen [/usr/local/bin/zsh-debug-opt]
  209,835,520  ???:szone_size [/usr/lib/system/libsystem_malloc.dylib]

To repeat the run, clone
https://github.com/psprint/history-search-multi-word/ and add
"valgrind --tool=callgrind" before "zsh" (after exec) in parse.zsh, then
run ./parse.zsh ./to-parse.zsh. I think this is a very good real-world
test.

It seems that Zsh execution could be sped up considerably if the
functions remnulargs, untokenize and haswilds were optimized. I am not
sure the results are reasonable, as haswilds just iterates over a string
and does a fairly basic switch. The other two functions have nested
loops, so they look more likely to be time-consuming. Maybe the nested
loops can be changed to something else?

The other functions highlighted are the multibyte ones, which is
expected. They could be optimized if a courageous decision were made
to do what charnext in pattern.c does:

    if (!(patglobflags & GF_MULTIBYTE) || !(STOUC(*x) & 0x80))
        return x + 1;

That is, optimize for ASCII as a subset of UTF-8 when calling
MB_METACHARLEN as well, not only for MB_METASTRLEN (a recent change).
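
A minimal standalone sketch of that fast path, outside Zsh. The name
char_len_fast is hypothetical, metafication is ignored, and the locale
encoding is assumed to be UTF-8 or another superset of ASCII:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not a zsh function): length in bytes of the
 * character starting at s, where at most `avail` bytes may be read.
 */
static size_t
char_len_fast(const char *s, size_t avail, mbstate_t *mbs)
{
    size_t ret;

    assert(s != NULL && mbs != NULL);
    if (avail == 0)
        return 0;
    /* Fast path: a byte with the top bit clear is a complete ASCII char. */
    if (!((unsigned char)*s & 0x80))
        return 1;
    /* Slow path: let the C library decode the multibyte sequence. */
    ret = mbrtowc(NULL, s, avail, mbs);
    if (ret == (size_t)-1 || ret == (size_t)-2) {
        /* Invalid or incomplete: reset the state, count a single byte. */
        memset(mbs, 0, sizeof(*mbs));
        return 1;
    }
    return ret ? ret : 1;
}
```

For mostly-ASCII input such as shell code, nearly every call returns
from the fast path without entering mbrtowc() at all.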

-- 
  Sebastian Gniazdowski
  psprint@fastmail.com

[-- Attachment #2: callgrind_annotate.txt --]
[-- Type: text/plain, Size: 7036 bytes --]

--------------------------------------------------------------------------------
Profile data file 'callgrind.out.11879' (creator: callgrind-3.12.0)
--------------------------------------------------------------------------------
I1 cache: 
D1 cache: 
LL cache: 
Timerange: Basic block 0 - 2995164135
Trigger: Program termination
Profiled target:  zsh-debug-opt -f -c source "./testparse.zsh" "./to-parse.zsh" "changes.out" "" (PID 11879, part 1)
Events recorded:  Ir
Events shown:     Ir
Event sort order: Ir
Thresholds:       99
Include dirs:     
User annotated:   
Auto-annotation:  off

--------------------------------------------------------------------------------
            Ir 
--------------------------------------------------------------------------------
16,735,388,538  PROGRAM TOTALS

--------------------------------------------------------------------------------
           Ir  file:function
--------------------------------------------------------------------------------
2,269,560,047  ???:mb_metacharlenconv_r [/usr/local/bin/zsh-debug-opt]
1,698,947,505  ???:remnulargs [/usr/local/bin/zsh-debug-opt]
1,677,804,272  ???:_UTF8_mbrtowc [/usr/lib/system/libsystem_c.dylib]
1,425,973,736  ???:mbrtowc [/usr/lib/system/libsystem_c.dylib]
1,177,994,701  ???:untokenize [/usr/local/bin/zsh-debug-opt]
1,048,181,974  ???:mb_metacharlenconv [/usr/local/bin/zsh-debug-opt]
1,036,055,574  ???:getindex'2 [/usr/local/bin/zsh-debug-opt]
  793,202,632  ???:haswilds [/usr/local/bin/zsh-debug-opt]
  578,630,988  ???:mb_metastrlenend [/usr/local/bin/zsh-debug-opt]
  483,051,992  ???:szone_free_definite_size [/usr/lib/system/libsystem_malloc.dylib]
  436,411,797  ???:ztrsub [/usr/local/bin/zsh-debug-opt]
  364,444,476  ???:tiny_malloc_from_free_list [/usr/lib/system/libsystem_malloc.dylib]
  353,826,375  ???:pattrylen'2 [/usr/local/bin/zsh-debug-opt]
  280,090,072  ???:tiny_free_list_add_ptr [/usr/lib/system/libsystem_malloc.dylib]
  258,502,596  ???:strlen [/usr/lib/dyld]
  234,273,918  ???:pattrylen [/usr/local/bin/zsh-debug-opt]
  209,835,520  ???:szone_size [/usr/lib/system/libsystem_malloc.dylib]
  193,985,837  ???:tiny_free_list_remove_ptr [/usr/lib/system/libsystem_malloc.dylib]
  169,580,182  ???:szone_malloc_should_clear [/usr/lib/system/libsystem_malloc.dylib]
  143,109,122  ???:_platform_memmove$VARIANT$Nehalem [/usr/lib/system/libsystem_platform.dylib]
   97,432,800  ???:free [/usr/lib/dyld]
   97,335,179  ???:itype_end [/usr/local/bin/zsh-debug-opt]
   95,353,820  ???:get_tiny_free_size [/usr/lib/system/libsystem_malloc.dylib]
   83,934,500  ???:pthread_getspecific [/usr/lib/system/libsystem_pthread.dylib]
   81,015,036  ???:filesub [/usr/local/bin/zsh-debug-opt]
   68,738,845  ???:__strcpy_chk [/usr/lib/system/libsystem_c.dylib]
   60,927,832  ???:malloc_zone_malloc [/usr/lib/system/libsystem_malloc.dylib]
   57,698,352  ???:zalloc [/usr/local/bin/zsh-debug-opt]
   55,196,289  ???:bin_log [/usr/local/bin/zsh-debug-opt]
   54,517,015  ???:stpcpy [/usr/lib/system/libsystem_c.dylib]
   51,545,105  ???:setarrvalue [/usr/local/bin/zsh-debug-opt]
   49,052,650  ???:get_tiny_previous_free_msize [/usr/lib/system/libsystem_malloc.dylib]
   48,122,314  ???:ztrdup [/usr/local/bin/zsh-debug-opt]
   45,371,076  ???:mathevalarg'2 [/usr/local/bin/zsh-debug-opt]
   44,923,221  ???:arrlen [/usr/local/bin/zsh-debug-opt]
   44,888,769  ???:__vsnprintf_chk [/usr/lib/system/libsystem_c.dylib]
   43,521,301  ???:malloc [/usr/lib/dyld]
   33,548,312  ???:__chk_overlap [/usr/lib/system/libsystem_c.dylib]
   33,378,378  ???:execlist'2 [/usr/local/bin/zsh-debug-opt]
   32,027,315  ???:_platform_memset$VARIANT$Merom [/usr/lib/system/libsystem_platform.dylib]
   29,584,698  ???:_platform_strchr$VARIANT$Generic [/usr/lib/system/libsystem_platform.dylib]
   28,786,904  ???:hasher [/usr/local/bin/zsh-debug-opt]
   25,459,319  ???:zhalloc [/usr/local/bin/zsh-debug-opt]
   25,436,057  ???:modify [/usr/local/bin/zsh-debug-opt]
   23,233,085  ???:patcompile'2 [/usr/local/bin/zsh-debug-opt]
   23,114,835  ???:zsfree [/usr/local/bin/zsh-debug-opt]
   21,720,915  ???:_os_lock_spin_lock [/usr/lib/system/libsystem_platform.dylib]
   21,033,364  ???:execrestore'2 [/usr/local/bin/zsh-debug-opt]
   21,029,416  ???:ingetc [/usr/local/bin/zsh-debug-opt]
   20,619,575  ???:freearray [/usr/local/bin/zsh-debug-opt]
   18,246,076  ???:fetchvalue [/usr/local/bin/zsh-debug-opt]
   17,288,068  ???:isascii [/usr/lib/system/libsystem_c.dylib]
   16,274,888  ???:filesub'2 [/usr/local/bin/zsh-debug-opt]
   15,162,279  ???:haswilds'2 [/usr/local/bin/zsh-debug-opt]
   12,590,562  ???:parsestrnoerr [/usr/local/bin/zsh-debug-opt]
   10,881,530  ???:szone_malloc [/usr/lib/system/libsystem_malloc.dylib]
   10,206,971  ???:zstrtol_underscore [/usr/local/bin/zsh-debug-opt]
    9,997,508  ???:_pthread_mutex_unlock_slow [/usr/lib/system/libsystem_pthread.dylib]
    9,639,468  ???:_platform_strcmp [/usr/lib/system/libsystem_platform.dylib]
    9,404,226  ???:modify'2 [/usr/local/bin/zsh-debug-opt]
    8,688,566  ???:os_lock_unlock [/usr/lib/system/libsystem_platform.dylib]
    8,688,566  ???:os_lock_lock [/usr/lib/system/libsystem_platform.dylib]
    8,688,366  ???:_os_lock_spin_unlock [/usr/lib/system/libsystem_platform.dylib]
    8,497,692  ???:op [/usr/local/bin/zsh-debug-opt]
    8,390,890  ???:prefork [/usr/local/bin/zsh-debug-opt]
    8,223,809  ???:patcompstart [/usr/local/bin/zsh-debug-opt]
    7,962,974  ???:gethashnode2 [/usr/local/bin/zsh-debug-opt]
    7,766,705  ???:scanmatchtable [/usr/local/bin/zsh-debug-opt]
    7,013,693  ???:parsestrnoerr'2 [/usr/local/bin/zsh-debug-opt]
    6,917,568  ???:op'2 [/usr/local/bin/zsh-debug-opt]
    6,909,521  ???:getindex [/usr/local/bin/zsh-debug-opt]
    6,827,173  ???:_pthread_mutex_lock_slow [/usr/lib/system/libsystem_pthread.dylib]
    6,691,178  ???:hasbraces [/usr/local/bin/zsh-debug-opt]
    6,601,957  ???:mathevalarg [/usr/local/bin/zsh-debug-opt]
    6,523,091  ???:get_node_from_uniquing_table [/usr/lib/system/libsystem_malloc.dylib]
    6,465,840  ???:ecgetstr [/usr/local/bin/zsh-debug-opt]
    6,193,899  ???:getstrvalue [/usr/local/bin/zsh-debug-opt]
    6,178,975  ???:matheval'2 [/usr/local/bin/zsh-debug-opt]
    6,012,923  ???:patcompile [/usr/local/bin/zsh-debug-opt]
    5,721,631  ???:ImageLoaderMachOCompressed::trieWalk(unsigned char const*, unsigned char const*, char const*) [/usr/lib/dyld]
    5,083,953  ???:add [/usr/local/bin/zsh-debug-opt]
    4,864,901  ???:__vsnprintf_chk'2 [/usr/lib/system/libsystem_c.dylib]
    4,728,297  ???:dupstring [/usr/local/bin/zsh-debug-opt]
    4,331,738  ???:fetchvalue'2 [/usr/local/bin/zsh-debug-opt]
    4,212,315  ???:newparamtable [/usr/local/bin/zsh-debug-opt]
    4,034,794  ???:__vfprintf [/usr/lib/system/libsystem_c.dylib]
    3,977,804  ???:pattryrefs [/usr/local/bin/zsh-debug-opt]
    3,604,588  ???:assignstrvalue [/usr/local/bin/zsh-debug-opt]
    3,520,423  ???:matheval [/usr/local/bin/zsh-debug-opt]
    3,518,560  ???:mb_charinit [/usr/local/bin/zsh-debug-opt]



* Re: Callgrind run
  2016-11-10 10:37 ` Callgrind run Sebastian Gniazdowski
@ 2016-11-10 12:31   ` Peter Stephenson
  2016-11-10 14:07     ` Sebastian Gniazdowski
  2016-11-10 13:47   ` multibyte optimisations Peter Stephenson
  1 sibling, 1 reply; 5+ messages in thread
From: Peter Stephenson @ 2016-11-10 12:31 UTC
  To: zsh-workers

On Thu, 10 Nov 2016 02:37:12 -0800
Sebastian Gniazdowski <psprint@fastmail.com> wrote:
> It seems that Zsh execution could be sped up considerably if the
> functions remnulargs, untokenize and haswilds were optimized. I am not
> sure the results are reasonable, as haswilds just iterates over a string
> and does a fairly basic switch. The other two functions have nested
> loops, so they look more likely to be time-consuming. Maybe the nested
> loops can be changed to something else?

The nested loops aren't "real" nested loops; the inner loop runs to
completion and then breaks if the outer loop detects a condition that
needs handling.

To do a good job optimising here, we really need state information
outside the functions --- in an experiment with my start up files, only
16% of calls to untokenize() actually had any effect.  But recording the
state generally is a very big change.

Some possible optimisations are along the following lines, although a
bit of care is needed, as it's not necessarily the case on all
architectures that the bit test used by itok() is faster than the range
test the following replaces it with.  It did seem faster on this fairly
standard Intel CPU.

I probably won't be committing this.

diff --git a/Src/exec.c b/Src/exec.c
index a01a633..a6b01a6 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -1953,26 +1953,24 @@ makecline(LinkList list)
 mod_export void
 untokenize(char *s)
 {
-    if (*s) {
+    if (*s) {			/* "" may be a const string. Ick. */
 	int c;
 
-	while ((c = *s++))
-	    if (itok(c)) {
+	while ((c = *s++)) {
+	    if (c >= FIRST_TOK && c <= LAST_TOK) {
 		char *p = s - 1;
 
 		if (c != Nularg)
-		    *p++ = ztokens[c - Pound];
+		    *p++ = ztoken_to_char[STOUC(c)];
 
 		while ((c = *s++)) {
-		    if (itok(c)) {
-			if (c != Nularg)
-			    *p++ = ztokens[c - Pound];
-		    } else
-			*p++ = c;
+		    if (c != Nularg)
+			*p++ = ztoken_to_char[STOUC(c)];
 		}
 		*p = '\0';
 		break;
 	    }
+	}
     }
 }
 
diff --git a/Src/glob.c b/Src/glob.c
index 50f6dce..4d3fc51 100644
--- a/Src/glob.c
+++ b/Src/glob.c
@@ -3570,7 +3570,7 @@ remnulargs(char *s)
     if (*s) {
 	char *o = s, c;
 
-	while ((c = *s++))
+	while ((c = *s++)) {
 	    if (c == Bnullkeep) {
 		/*
 		 * An active backslash that needs to be turned back into
@@ -3579,7 +3579,7 @@ remnulargs(char *s)
 		 * pattern matching.
 		 */
 		continue;
-	    } else if (inull(c)) {
+	    } else if (c >= FIRST_NULL && c <= LAST_NULL) {
 		char *t = s - 1;
 
 		while ((c = *s++)) {
@@ -3595,6 +3595,7 @@ remnulargs(char *s)
 		}
 		break;
 	    }
+	}
     }
 }
 
diff --git a/Src/lex.c b/Src/lex.c
index 8896128..bfd6b11 100644
--- a/Src/lex.c
+++ b/Src/lex.c
@@ -37,6 +37,18 @@
 /**/
 mod_export char ztokens[] = "#$^*(())$=|{}[]`<>>?~`,-!'\"\\\\";
 
+/*
+ * Map a possibly tokenized unsigned char to a normal unsigned
+ * char, for use in untokenize().
+ *
+ * Tokens that need untokenizing (everything in ztokens except Nularg)
+ * map to a different character, everything else maps to itself.
+ * In particular, metafied characters are passed through unchanged
+ * (effectively escaping tokens) and do not need special handling.
+ */
+/**/
+mod_export char ztoken_to_char[256];
+
 /* parts of the current token */
 
 /**/
diff --git a/Src/utils.c b/Src/utils.c
index 3d535b8..9fa8a97 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -4012,6 +4012,18 @@ inittyptab(void)
     for (s = PATCHARS; *s; s++)
 	typtab[STOUC(*s)] |= IPATTERN;
 
+    for (t0 = 0; t0 < 256; t0++)
+    {
+	if (itok(t0) && (char)t0 != Nularg)
+	{
+	    ztoken_to_char[t0] = ztokens[t0 - STOUC(Pound)];
+	}
+	else
+	{
+	    ztoken_to_char[t0] = (char)t0;
+	}
+    }
+
     unqueue_signals();
 }
 
diff --git a/Src/zsh.h b/Src/zsh.h
index a5d4455..5065a54 100644
--- a/Src/zsh.h
+++ b/Src/zsh.h
@@ -170,6 +170,7 @@ struct mathfunc {
  * These should match the characters in ztokens, defined in lex.c
  */
 #define Pound		((char) 0x84)
+#define FIRST_TOK	Pound
 #define String		((char) 0x85)
 #define Hat		((char) 0x86)
 #define Star		((char) 0x87)
@@ -204,6 +205,7 @@ struct mathfunc {
  * and backslashes.
  */
 #define Snull		((char) 0x9d)
+#define FIRST_NULL	Snull
 #define Dnull		((char) 0x9e)
 #define Bnull		((char) 0x9f)
 /*
@@ -217,6 +219,8 @@ struct mathfunc {
  * is used to initialise the IMETA type in inittyptab().
  */
 #define Nularg		((char) 0xa1)
+#define LAST_TOK	Nularg
+#define LAST_NULL	Nularg
 
 /*
  * Take care to update the use of IMETA appropriately when adding



* multibyte optimisations
  2016-11-10 10:37 ` Callgrind run Sebastian Gniazdowski
  2016-11-10 12:31   ` Peter Stephenson
@ 2016-11-10 13:47   ` Peter Stephenson
  2016-11-10 14:57     ` Sebastian Gniazdowski
  1 sibling, 1 reply; 5+ messages in thread
From: Peter Stephenson @ 2016-11-10 13:47 UTC
  To: zsh-workers

On Thu, 10 Nov 2016 02:37:12 -0800
Sebastian Gniazdowski <psprint@fastmail.com> wrote:
> The other functions highlighted are the multibyte ones, which is
> expected. They could be optimized if a courageous decision were made
> to do what charnext in pattern.c does:
> 
>     if (!(patglobflags & GF_MULTIBYTE) || !(STOUC(*x) & 0x80))
>         return x + 1;
> 
> That is, optimize for ASCII as a subset of UTF-8 when calling
> MB_METACHARLEN as well, not only for MB_METASTRLEN (a recent change).

These look straightforward and along the same lines as what we already
do.

pws

diff --git a/Src/utils.c b/Src/utils.c
index 3d535b8..cceaf4c 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -84,7 +84,15 @@ set_widearray(char *mb_array, Widechar_array wca)
 
 	mb_charinit();
 	while (*mb_array) {
-	    int mblen = mb_metacharlenconv(mb_array, &wci);
+	    int mblen;
+
+	    if (STOUC(*mb_array) <= 0x7f) {
+		*wcptr++ = (wchar_t)*mb_array;
+		mb_array++;
+		continue;
+	    }
+
+	    mblen = mb_metacharlenconv(mb_array, &wci);
 
 	    if (!mblen)
 		break;
@@ -5249,6 +5257,12 @@ mb_metacharlenconv_r(const char *s, wint_t *wcp, mbstate_t *mbsp)
     const char *ptr;
     wchar_t wc;
 
+    if (STOUC(*s) <= 0x7f) {
+	if (wcp)
+	    *wcp = (wint_t)*s;
+	return 1;
+    }
+
     for (ptr = s; *ptr; ) {
 	if (*ptr == Meta) {
 	    inchar = *++ptr ^ 32;
@@ -5301,7 +5315,7 @@ mb_metacharlenconv_r(const char *s, wint_t *wcp, mbstate_t *mbsp)
 mod_export int
 mb_metacharlenconv(const char *s, wint_t *wcp)
 {
-    if (!isset(MULTIBYTE)) {
+    if (!isset(MULTIBYTE) || STOUC(*s) <= 0x7f) {
 	/* treat as single byte, possibly metafied */
 	if (wcp)
 	    *wcp = (wint_t)(*s == Meta ? s[1] ^ 32 : *s);
@@ -5442,6 +5456,12 @@ mb_charlenconv_r(const char *s, int slen, wint_t *wcp, mbstate_t *mbsp)
     const char *ptr;
     wchar_t wc;
 
+    if (slen && STOUC(*s) <= 0x7f) {
+	if (wcp)
+	    *wcp = (wint_t)*s;
+	return 1;
+    }
+
     for (ptr = s; slen;  ) {
 	inchar = *ptr;
 	ptr++;
@@ -5477,7 +5497,7 @@ mb_charlenconv_r(const char *s, int slen, wint_t *wcp, mbstate_t *mbsp)
 mod_export int
 mb_charlenconv(const char *s, int slen, wint_t *wcp)
 {
-    if (!isset(MULTIBYTE)) {
+    if (!isset(MULTIBYTE) || STOUC(*s) <= 0x7f) {
 	if (wcp)
 	    *wcp = (wint_t)*s;
 	return 1;
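
The single-byte branches above decode a possibly metafied byte with
"s[1] ^ 32". A standalone sketch of that step follows; unmeta_one and
META_BYTE are hypothetical names, and the marker value 0x83 is an
assumption that should be checked against zsh.h:

```c
#include <assert.h>
#include <stddef.h>

#define META_BYTE ((char)0x83)  /* assumed marker value, see above */

/*
 * Hypothetical helper: decode one possibly metafied byte at s into
 * *out and return how many input bytes were consumed (1 or 2).
 */
static size_t
unmeta_one(const char *s, unsigned char *out)
{
    assert(s != NULL && out != NULL);
    if (*s == META_BYTE && s[1] != '\0') {
        /* The marker is followed by the real byte with bit 5 flipped. */
        *out = (unsigned char)(s[1] ^ 32);
        return 2;
    }
    *out = (unsigned char)*s;
    return 1;
}
```

Since the marker itself has the top bit set, a test such as
"STOUC(*s) <= 0x7f" can never match the start of a metafied pair, which
is why the ASCII fast path is safe to take before this decoding.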



* Re: Callgrind run
  2016-11-10 12:31   ` Peter Stephenson
@ 2016-11-10 14:07     ` Sebastian Gniazdowski
  0 siblings, 0 replies; 5+ messages in thread
From: Sebastian Gniazdowski @ 2016-11-10 14:07 UTC
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 1388 bytes --]

On Thu, Nov 10, 2016, at 04:31 AM, Peter Stephenson wrote:
> To do a good job optimising here, we really need state information
> outside the functions --- in an experiment with my start up files, only
> 16% of calls to untokenize() actually had any effect.  But recording the
> state generally is a very big change.
> 
> Some possible optimisations are along the following lines, although a
> bit of care is needed, as it's not necessarily the case on all
> architectures that the bit test used by itok() is faster than the range
> test the following replaces it with.  It did seem faster on this fairly
> standard Intel CPU.

I tested this and saw no big change, maybe 14 ms: running times are 2135
vs 2149 ms, but that can be just instability. However, callgrind reports
851,712,174 instructions instead of 1,177,994,701 for untokenize, while
the other instruction counts stay essentially the same, so the test
seems valid.
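
To put those counts in proportion, here is a small helper (the name
percent_saved is hypothetical) applied to the figures from the two
callgrind reports:

```c
#include <assert.h>

/* Percentage by which `after` is smaller than `before`. */
static double
percent_saved(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}
```

percent_saved(1177994701.0, 851712174.0) gives roughly 27.7 for
untokenize alone, while percent_saved(16735388538.0, 16408775049.0)
gives roughly 1.95 for the program totals. A saving of about 2% of all
instructions is consistent with a wall-clock difference that is hard to
tell apart from noise.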

My motivation is parsing long Zsh code. It would be great to iterate
over long (z)-split input in, say, 400 ms instead of 2 seconds. That is
a dream result and may actually be impossible, as disabling multibyte
yields 1560 ms. State recording might seem bad, but at least the room
for improvement is concentrated in apparently few places, which is
better than counting cycles across the whole Zsh code.

-- 
  Sebastian Gniazdowski
  psprint@fastmail.com

[-- Attachment #2: callgrind_annotate3.txt --]
[-- Type: text/plain, Size: 7316 bytes --]

--------------------------------------------------------------------------------
Profile data file 'callgrind.out.37869' (creator: callgrind-3.12.0)
--------------------------------------------------------------------------------
I1 cache: 
D1 cache: 
LL cache: 
Timerange: Basic block 0 - 3061023589
Trigger: Program termination
Profiled target:  zsh-ps-debug-opt -f -c source "./testparse.zsh" "./to-parse.zsh" "changes.out" "" (PID 37869, part 1)
Events recorded:  Ir
Events shown:     Ir
Event sort order: Ir
Thresholds:       99
Include dirs:     
User annotated:   
Auto-annotation:  off

--------------------------------------------------------------------------------
            Ir 
--------------------------------------------------------------------------------
16,408,775,049  PROGRAM TOTALS

--------------------------------------------------------------------------------
           Ir  file:function
--------------------------------------------------------------------------------
2,269,560,047  ???:mb_metacharlenconv_r [/usr/local/bin/zsh-ps-debug-opt]
1,697,840,717  ???:remnulargs [/usr/local/bin/zsh-ps-debug-opt]
1,677,804,272  ???:_UTF8_mbrtowc [/usr/lib/system/libsystem_c.dylib]
1,425,973,736  ???:mbrtowc [/usr/lib/system/libsystem_c.dylib]
1,048,181,974  ???:mb_metacharlenconv [/usr/local/bin/zsh-ps-debug-opt]
1,036,055,574  ???:getindex'2 [/usr/local/bin/zsh-ps-debug-opt]
  851,712,174  ???:untokenize [/usr/local/bin/zsh-ps-debug-opt]
  793,202,632  ???:haswilds [/usr/local/bin/zsh-ps-debug-opt]
  578,630,988  ???:mb_metastrlenend [/usr/local/bin/zsh-ps-debug-opt]
  482,828,373  ???:szone_free_definite_size [/usr/lib/system/libsystem_malloc.dylib]
  436,411,797  ???:ztrsub [/usr/local/bin/zsh-ps-debug-opt]
  363,212,196  ???:tiny_malloc_from_free_list [/usr/lib/system/libsystem_malloc.dylib]
  353,826,375  ???:pattrylen'2 [/usr/local/bin/zsh-ps-debug-opt]
  282,357,130  ???:tiny_free_list_add_ptr [/usr/lib/system/libsystem_malloc.dylib]
  258,502,798  ???:strlen [/usr/lib/dyld]
  234,273,918  ???:pattrylen [/usr/local/bin/zsh-ps-debug-opt]
  209,831,892  ???:szone_size [/usr/lib/system/libsystem_malloc.dylib]
  193,951,431  ???:tiny_free_list_remove_ptr [/usr/lib/system/libsystem_malloc.dylib]
  169,581,080  ???:szone_malloc_should_clear [/usr/lib/system/libsystem_malloc.dylib]
  143,108,999  ???:_platform_memmove$VARIANT$Nehalem [/usr/lib/system/libsystem_platform.dylib]
   97,432,800  ???:free [/usr/lib/system/libsystem_malloc.dylib]
   97,335,179  ???:itype_end [/usr/local/bin/zsh-ps-debug-opt]
   95,268,036  ???:get_tiny_free_size [/usr/lib/system/libsystem_malloc.dylib]
   83,934,500  ???:pthread_getspecific [/usr/lib/system/libsystem_pthread.dylib]
   81,015,036  ???:filesub [/usr/local/bin/zsh-ps-debug-opt]
   68,739,019  ???:__strcpy_chk [/usr/lib/system/libsystem_c.dylib]
   60,928,000  ???:malloc_zone_malloc [/usr/lib/system/libsystem_malloc.dylib]
   57,698,433  ???:zalloc [/usr/local/bin/zsh-ps-debug-opt]
   55,196,334  ???:bin_log [/usr/local/bin/zsh-ps-debug-opt]
   54,517,153  ???:stpcpy [/usr/lib/system/libsystem_c.dylib]
   51,545,105  ???:setarrvalue [/usr/local/bin/zsh-ps-debug-opt]
   49,058,372  ???:get_tiny_previous_free_msize [/usr/lib/system/libsystem_malloc.dylib]
   48,122,383  ???:ztrdup [/usr/local/bin/zsh-ps-debug-opt]
   45,371,076  ???:mathevalarg'2 [/usr/local/bin/zsh-ps-debug-opt]
   44,923,221  ???:arrlen [/usr/local/bin/zsh-ps-debug-opt]
   44,888,797  ???:__vsnprintf_chk [/usr/lib/system/libsystem_c.dylib]
   43,521,421  ???:malloc [/usr/lib/system/libsystem_malloc.dylib]
   33,548,396  ???:__chk_overlap [/usr/lib/system/libsystem_c.dylib]
   33,378,378  ???:execlist'2 [/usr/local/bin/zsh-ps-debug-opt]
   32,027,396  ???:_platform_memset$VARIANT$Merom [/usr/lib/system/libsystem_platform.dylib]
   29,584,698  ???:_platform_strchr$VARIANT$Generic [/usr/lib/system/libsystem_platform.dylib]
   28,788,128  ???:hasher [/usr/local/bin/zsh-ps-debug-opt]
   25,459,319  ???:zhalloc [/usr/local/bin/zsh-ps-debug-opt]
   25,436,057  ???:modify [/usr/local/bin/zsh-ps-debug-opt]
   23,233,085  ???:patcompile'2 [/usr/local/bin/zsh-ps-debug-opt]
   23,114,835  ???:zsfree [/usr/local/bin/zsh-ps-debug-opt]
   21,720,950  ???:_os_lock_spin_lock [/usr/lib/system/libsystem_platform.dylib]
   21,033,364  ???:execrestore'2 [/usr/local/bin/zsh-ps-debug-opt]
   21,029,575  ???:ingetc [/usr/local/bin/zsh-ps-debug-opt]
   20,619,575  ???:freearray [/usr/local/bin/zsh-ps-debug-opt]
   18,246,076  ???:fetchvalue [/usr/local/bin/zsh-ps-debug-opt]
   17,288,068  ???:isascii [/usr/lib/system/libsystem_c.dylib]
   16,274,888  ???:filesub'2 [/usr/local/bin/zsh-ps-debug-opt]
   15,162,279  ???:haswilds'2 [/usr/local/bin/zsh-ps-debug-opt]
   12,590,562  ???:parsestrnoerr [/usr/local/bin/zsh-ps-debug-opt]
   10,881,555  ???:szone_malloc [/usr/lib/system/libsystem_malloc.dylib]
   10,206,971  ???:zstrtol_underscore [/usr/local/bin/zsh-ps-debug-opt]
    9,997,820  ???:_pthread_mutex_unlock_slow [/usr/lib/system/libsystem_pthread.dylib]
    9,639,594  ???:_platform_strcmp [/usr/lib/system/libsystem_platform.dylib]
    9,404,226  ???:modify'2 [/usr/local/bin/zsh-ps-debug-opt]
    8,688,580  ???:os_lock_unlock [/usr/lib/system/libsystem_platform.dylib]
    8,688,580  ???:os_lock_lock [/usr/lib/system/libsystem_platform.dylib]
    8,688,380  ???:_os_lock_spin_unlock [/usr/lib/system/libsystem_platform.dylib]
    8,497,692  ???:op [/usr/local/bin/zsh-ps-debug-opt]
    8,390,890  ???:prefork [/usr/local/bin/zsh-ps-debug-opt]
    8,223,809  ???:patcompstart [/usr/local/bin/zsh-ps-debug-opt]
    7,963,097  ???:gethashnode2 [/usr/local/bin/zsh-ps-debug-opt]
    7,766,705  ???:scanmatchtable [/usr/local/bin/zsh-ps-debug-opt]
    7,013,693  ???:parsestrnoerr'2 [/usr/local/bin/zsh-ps-debug-opt]
    6,917,568  ???:op'2 [/usr/local/bin/zsh-ps-debug-opt]
    6,909,521  ???:getindex [/usr/local/bin/zsh-ps-debug-opt]
    6,827,386  ???:_pthread_mutex_lock_slow [/usr/lib/system/libsystem_pthread.dylib]
    6,691,178  ???:hasbraces [/usr/local/bin/zsh-ps-debug-opt]
    6,601,957  ???:mathevalarg [/usr/local/bin/zsh-ps-debug-opt]
    6,523,105  ???:get_node_from_uniquing_table [/usr/lib/system/libsystem_malloc.dylib]
    6,465,840  ???:ecgetstr [/usr/local/bin/zsh-ps-debug-opt]
    6,193,899  ???:getstrvalue [/usr/local/bin/zsh-ps-debug-opt]
    6,178,975  ???:matheval'2 [/usr/local/bin/zsh-ps-debug-opt]
    6,012,923  ???:patcompile [/usr/local/bin/zsh-ps-debug-opt]
    5,721,631  ???:ImageLoaderMachOCompressed::trieWalk(unsigned char const*, unsigned char const*, char const*) [/usr/lib/dyld]
    5,084,013  ???:add [/usr/local/bin/zsh-ps-debug-opt]
    4,864,913  ???:__vsnprintf_chk'2 [/usr/lib/system/libsystem_c.dylib]
    4,728,297  ???:dupstring [/usr/local/bin/zsh-ps-debug-opt]
    4,331,738  ???:fetchvalue'2 [/usr/local/bin/zsh-ps-debug-opt]
    4,212,315  ???:newparamtable [/usr/local/bin/zsh-ps-debug-opt]
    4,034,794  ???:__vfprintf [/usr/lib/system/libsystem_c.dylib]
    3,977,804  ???:pattryrefs [/usr/local/bin/zsh-ps-debug-opt]
    3,604,588  ???:assignstrvalue [/usr/local/bin/zsh-ps-debug-opt]
    3,520,423  ???:matheval [/usr/local/bin/zsh-ps-debug-opt]
    3,518,560  ???:mb_charinit [/usr/local/bin/zsh-ps-debug-opt]
    3,487,929  ???:freeheap [/usr/local/bin/zsh-ps-debug-opt]



* Re: multibyte optimisations
  2016-11-10 13:47   ` multibyte optimisations Peter Stephenson
@ 2016-11-10 14:57     ` Sebastian Gniazdowski
  0 siblings, 0 replies; 5+ messages in thread
From: Sebastian Gniazdowski @ 2016-11-10 14:57 UTC
  To: zsh-workers

On Thu, Nov 10, 2016, at 05:47 AM, Peter Stephenson wrote:
> On Thu, 10 Nov 2016 02:37:12 -0800
> Sebastian Gniazdowski <psprint@fastmail.com> wrote:
> > The other functions highlighted are the multibyte ones, which is
> > expected. They could be optimized if a courageous decision were made
> > to do what charnext in pattern.c does:
> > 
> >     if (!(patglobflags & GF_MULTIBYTE) || !(STOUC(*x) & 0x80))
> >         return x + 1;
> > 
> > That is, optimize for ASCII as a subset of UTF-8 when calling
> > MB_METACHARLEN as well, not only for MB_METASTRLEN (a recent change).
> 
> These look straightforward and along the same lines as what we already
> do.

I was worried that the multibyte state might be unclear when requesting
the length of a character, but that cannot really happen. If it did,
the loop that advances character by character would already have a
problem, being in an unclear state after the previous advance. With this
patch the parser runs for 1493 ms instead of 2148 ms :)

-- 
  Sebastian Gniazdowski
  psprint@fastmail.com


