* Callgrind run
From: Sebastian Gniazdowski @ 2016-11-10 10:37 UTC
To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 2353 bytes --]

Hello
I've run callgrind on Zsh while it executes syntax-highlighting code that
parses 823 lines of code:

2,269,560,047  ???:mb_metacharlenconv_r [/usr/local/bin/zsh-debug-opt]
1,698,947,505  ???:remnulargs [/usr/local/bin/zsh-debug-opt]
1,677,804,272  ???:_UTF8_mbrtowc [/usr/lib/system/libsystem_c.dylib]
1,425,973,736  ???:mbrtowc [/usr/lib/system/libsystem_c.dylib]
1,177,994,701  ???:untokenize [/usr/local/bin/zsh-debug-opt]
1,048,181,974  ???:mb_metacharlenconv [/usr/local/bin/zsh-debug-opt]
1,036,055,574  ???:getindex'2 [/usr/local/bin/zsh-debug-opt]
  793,202,632  ???:haswilds [/usr/local/bin/zsh-debug-opt]
  578,630,988  ???:mb_metastrlenend [/usr/local/bin/zsh-debug-opt]
  483,051,992  ???:szone_free_definite_size [/usr/lib/system/libsystem_malloc.dylib]
  436,411,797  ???:ztrsub [/usr/local/bin/zsh-debug-opt]
  364,444,476  ???:tiny_malloc_from_free_list [/usr/lib/system/libsystem_malloc.dylib]
  353,826,375  ???:pattrylen'2 [/usr/local/bin/zsh-debug-opt]
  280,090,072  ???:tiny_free_list_add_ptr [/usr/lib/system/libsystem_malloc.dylib]
  258,502,596  ???:strlen [/usr/lib/dyld]
  234,273,918  ???:pattrylen [/usr/local/bin/zsh-debug-opt]
  209,835,520  ???:szone_size [/usr/lib/system/libsystem_malloc.dylib]

To repeat the run, clone https://github.com/psprint/history-search-multi-word/
and add "valgrind --tool=callgrind" before "zsh" (after exec) in parse.zsh,
then run ./parse.zsh ./to-parse.zsh. I think this is a very good real-world
test.

It seems that Zsh execution could be greatly optimized if the functions
remnulargs, untokenize and haswilds were optimized. I'm not sure the results
are reasonable, as haswilds just iterates over a string and does a fairly
basic switch. The other two functions have nested loops, so they look more
likely to be time-consuming. Maybe the nested loops can be changed to
something else?

The other functions listed seem valid and expected: the multibyte
functions. They could be optimized if a courageous decision were made:
do what charnext() in pattern.c does:

  if (!(patglobflags & GF_MULTIBYTE) || !(STOUC(*x) & 0x80))
      return x + 1;

I.e. optimize for ASCII as a subset of UTF-8 also when calling
MB_METACHARLEN, not only for MB_METASTRLEN (a recent change).

-- 
Sebastian Gniazdowski
psprint@fastmail.com

[-- Attachment #2: callgrind_annotate.txt --]
[-- Type: text/plain, Size: 7036 bytes --]

--------------------------------------------------------------------------------
Profile data file 'callgrind.out.11879' (creator: callgrind-3.12.0)
--------------------------------------------------------------------------------
I1 cache:
D1 cache:
LL cache:
Timerange: Basic block 0 - 2995164135
Trigger: Program termination
Profiled target:  zsh-debug-opt -f -c source "./testparse.zsh" "./to-parse.zsh" "changes.out" "" (PID 11879, part 1)
Events recorded: Ir
Events shown:    Ir
Event sort order: Ir
Thresholds:      99
Include dirs:
User annotated:
Auto-annotation: off

--------------------------------------------------------------------------------
            Ir
--------------------------------------------------------------------------------
16,735,388,538  PROGRAM TOTALS

--------------------------------------------------------------------------------
            Ir  file:function
--------------------------------------------------------------------------------
2,269,560,047  ???:mb_metacharlenconv_r [/usr/local/bin/zsh-debug-opt]
1,698,947,505  ???:remnulargs [/usr/local/bin/zsh-debug-opt]
1,677,804,272  ???:_UTF8_mbrtowc [/usr/lib/system/libsystem_c.dylib]
1,425,973,736  ???:mbrtowc [/usr/lib/system/libsystem_c.dylib]
1,177,994,701  ???:untokenize [/usr/local/bin/zsh-debug-opt]
1,048,181,974  ???:mb_metacharlenconv [/usr/local/bin/zsh-debug-opt]
1,036,055,574  ???:getindex'2 [/usr/local/bin/zsh-debug-opt]
  793,202,632  ???:haswilds [/usr/local/bin/zsh-debug-opt]
  578,630,988  ???:mb_metastrlenend [/usr/local/bin/zsh-debug-opt]
  483,051,992  ???:szone_free_definite_size [/usr/lib/system/libsystem_malloc.dylib]
  436,411,797  ???:ztrsub [/usr/local/bin/zsh-debug-opt]
  364,444,476  ???:tiny_malloc_from_free_list [/usr/lib/system/libsystem_malloc.dylib]
  353,826,375  ???:pattrylen'2 [/usr/local/bin/zsh-debug-opt]
  280,090,072  ???:tiny_free_list_add_ptr [/usr/lib/system/libsystem_malloc.dylib]
  258,502,596  ???:strlen [/usr/lib/dyld]
  234,273,918  ???:pattrylen [/usr/local/bin/zsh-debug-opt]
  209,835,520  ???:szone_size [/usr/lib/system/libsystem_malloc.dylib]
  193,985,837  ???:tiny_free_list_remove_ptr [/usr/lib/system/libsystem_malloc.dylib]
  169,580,182  ???:szone_malloc_should_clear [/usr/lib/system/libsystem_malloc.dylib]
  143,109,122  ???:_platform_memmove$VARIANT$Nehalem [/usr/lib/system/libsystem_platform.dylib]
   97,432,800  ???:free [/usr/lib/dyld]
   97,335,179  ???:itype_end [/usr/local/bin/zsh-debug-opt]
   95,353,820  ???:get_tiny_free_size [/usr/lib/system/libsystem_malloc.dylib]
   83,934,500  ???:pthread_getspecific [/usr/lib/system/libsystem_pthread.dylib]
   81,015,036  ???:filesub [/usr/local/bin/zsh-debug-opt]
   68,738,845  ???:__strcpy_chk [/usr/lib/system/libsystem_c.dylib]
   60,927,832  ???:malloc_zone_malloc [/usr/lib/system/libsystem_malloc.dylib]
   57,698,352  ???:zalloc [/usr/local/bin/zsh-debug-opt]
   55,196,289  ???:bin_log [/usr/local/bin/zsh-debug-opt]
   54,517,015  ???:stpcpy [/usr/lib/system/libsystem_c.dylib]
   51,545,105  ???:setarrvalue [/usr/local/bin/zsh-debug-opt]
   49,052,650  ???:get_tiny_previous_free_msize [/usr/lib/system/libsystem_malloc.dylib]
   48,122,314  ???:ztrdup [/usr/local/bin/zsh-debug-opt]
   45,371,076  ???:mathevalarg'2 [/usr/local/bin/zsh-debug-opt]
   44,923,221  ???:arrlen [/usr/local/bin/zsh-debug-opt]
   44,888,769  ???:__vsnprintf_chk [/usr/lib/system/libsystem_c.dylib]
   43,521,301  ???:malloc [/usr/lib/dyld]
   33,548,312  ???:__chk_overlap [/usr/lib/system/libsystem_c.dylib]
   33,378,378  ???:execlist'2 [/usr/local/bin/zsh-debug-opt]
   32,027,315  ???:_platform_memset$VARIANT$Merom [/usr/lib/system/libsystem_platform.dylib]
   29,584,698  ???:_platform_strchr$VARIANT$Generic [/usr/lib/system/libsystem_platform.dylib]
   28,786,904  ???:hasher [/usr/local/bin/zsh-debug-opt]
   25,459,319  ???:zhalloc [/usr/local/bin/zsh-debug-opt]
   25,436,057  ???:modify [/usr/local/bin/zsh-debug-opt]
   23,233,085  ???:patcompile'2 [/usr/local/bin/zsh-debug-opt]
   23,114,835  ???:zsfree [/usr/local/bin/zsh-debug-opt]
   21,720,915  ???:_os_lock_spin_lock [/usr/lib/system/libsystem_platform.dylib]
   21,033,364  ???:execrestore'2 [/usr/local/bin/zsh-debug-opt]
   21,029,416  ???:ingetc [/usr/local/bin/zsh-debug-opt]
   20,619,575  ???:freearray [/usr/local/bin/zsh-debug-opt]
   18,246,076  ???:fetchvalue [/usr/local/bin/zsh-debug-opt]
   17,288,068  ???:isascii [/usr/lib/system/libsystem_c.dylib]
   16,274,888  ???:filesub'2 [/usr/local/bin/zsh-debug-opt]
   15,162,279  ???:haswilds'2 [/usr/local/bin/zsh-debug-opt]
   12,590,562  ???:parsestrnoerr [/usr/local/bin/zsh-debug-opt]
   10,881,530  ???:szone_malloc [/usr/lib/system/libsystem_malloc.dylib]
   10,206,971  ???:zstrtol_underscore [/usr/local/bin/zsh-debug-opt]
    9,997,508  ???:_pthread_mutex_unlock_slow [/usr/lib/system/libsystem_pthread.dylib]
    9,639,468  ???:_platform_strcmp [/usr/lib/system/libsystem_platform.dylib]
    9,404,226  ???:modify'2 [/usr/local/bin/zsh-debug-opt]
    8,688,566  ???:os_lock_unlock [/usr/lib/system/libsystem_platform.dylib]
    8,688,566  ???:os_lock_lock [/usr/lib/system/libsystem_platform.dylib]
    8,688,366  ???:_os_lock_spin_unlock [/usr/lib/system/libsystem_platform.dylib]
    8,497,692  ???:op [/usr/local/bin/zsh-debug-opt]
    8,390,890  ???:prefork [/usr/local/bin/zsh-debug-opt]
    8,223,809  ???:patcompstart [/usr/local/bin/zsh-debug-opt]
    7,962,974  ???:gethashnode2 [/usr/local/bin/zsh-debug-opt]
    7,766,705  ???:scanmatchtable [/usr/local/bin/zsh-debug-opt]
    7,013,693  ???:parsestrnoerr'2 [/usr/local/bin/zsh-debug-opt]
    6,917,568  ???:op'2 [/usr/local/bin/zsh-debug-opt]
    6,909,521  ???:getindex [/usr/local/bin/zsh-debug-opt]
    6,827,173  ???:_pthread_mutex_lock_slow [/usr/lib/system/libsystem_pthread.dylib]
    6,691,178  ???:hasbraces [/usr/local/bin/zsh-debug-opt]
    6,601,957  ???:mathevalarg [/usr/local/bin/zsh-debug-opt]
    6,523,091  ???:get_node_from_uniquing_table [/usr/lib/system/libsystem_malloc.dylib]
    6,465,840  ???:ecgetstr [/usr/local/bin/zsh-debug-opt]
    6,193,899  ???:getstrvalue [/usr/local/bin/zsh-debug-opt]
    6,178,975  ???:matheval'2 [/usr/local/bin/zsh-debug-opt]
    6,012,923  ???:patcompile [/usr/local/bin/zsh-debug-opt]
    5,721,631  ???:ImageLoaderMachOCompressed::trieWalk(unsigned char const*, unsigned char const*, char const*) [/usr/lib/dyld]
    5,083,953  ???:add [/usr/local/bin/zsh-debug-opt]
    4,864,901  ???:__vsnprintf_chk'2 [/usr/lib/system/libsystem_c.dylib]
    4,728,297  ???:dupstring [/usr/local/bin/zsh-debug-opt]
    4,331,738  ???:fetchvalue'2 [/usr/local/bin/zsh-debug-opt]
    4,212,315  ???:newparamtable [/usr/local/bin/zsh-debug-opt]
    4,034,794  ???:__vfprintf [/usr/lib/system/libsystem_c.dylib]
    3,977,804  ???:pattryrefs [/usr/local/bin/zsh-debug-opt]
    3,604,588  ???:assignstrvalue [/usr/local/bin/zsh-debug-opt]
    3,520,423  ???:matheval [/usr/local/bin/zsh-debug-opt]
    3,518,560  ???:mb_charinit [/usr/local/bin/zsh-debug-opt]
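The ASCII fast path suggested in the message above can be sketched in isolation. This is a minimal illustration, not zsh code: `next_char()` uses the same test as the charnext() snippet, while the hypothetical `utf8_seq_len()` stands in for the expensive mbrtowc()-based path and only decodes UTF-8 lead bytes, treating invalid or truncated sequences as a single byte. The hypothetical counter `slow_calls` makes the saving visible: ASCII bytes never reach the slow path.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how often the (hypothetical) slow path runs. */
static size_t slow_calls = 0;

/*
 * Hypothetical stand-in for a mbrtowc()-based slow path: return the
 * length of the UTF-8 sequence starting at s.  Invalid lead bytes and
 * truncated sequences count as one byte, so callers always advance and
 * never step past the terminating NUL.
 */
static size_t utf8_seq_len(const unsigned char *s)
{
    size_t len, i;

    slow_calls++;
    if ((s[0] & 0xE0) == 0xC0)
        len = 2;
    else if ((s[0] & 0xF0) == 0xE0)
        len = 3;
    else if ((s[0] & 0xF8) == 0xF0)
        len = 4;
    else
        return 1;                   /* stray continuation or invalid byte */
    for (i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)  /* also stops at the NUL terminator */
            return 1;
    return len;
}

/* Advance past one character, using the charnext()-style ASCII shortcut. */
static const char *next_char(const char *x, int multibyte)
{
    if (!multibyte || !((unsigned char)*x & 0x80))
        return x + 1;               /* single-byte mode or plain ASCII */
    return x + utf8_seq_len((const unsigned char *)x);
}

/* Count characters in a NUL-terminated string. */
static size_t count_chars(const char *s, int multibyte)
{
    size_t n = 0;

    assert(s != NULL);
    while (*s) {
        s = next_char(s, multibyte);
        n++;
    }
    return n;
}
```

For example, counting "a\xC3\xA9\xE2\x82\xAC" "b" (a, é, €, b) in multibyte mode gives 4 with only two slow-path calls, and pure ASCII input never calls the slow path at all. The shortcut is sound for UTF-8 because every byte of a multibyte sequence has the high bit set, so a byte below 0x80 is always a complete character.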
* Re: Callgrind run
From: Peter Stephenson @ 2016-11-10 12:31 UTC
To: zsh-workers

On Thu, 10 Nov 2016 02:37:12 -0800
Sebastian Gniazdowski <psprint@fastmail.com> wrote:
> It seems that Zsh execution could be greatly optimized if the functions
> remnulargs, untokenize and haswilds were optimized. I'm not sure the results
> are reasonable, as haswilds just iterates over a string and does a fairly
> basic switch. The other two functions have nested loops, so they look more
> likely to be time-consuming. Maybe the nested loops can be changed to
> something else?

The nested loops aren't "real" nested loops: the inner loop is only
entered once the outer loop detects a condition that needs handling; it
then runs to the end of the string, and the outer loop breaks.

To do a good job optimising here, we really need state information
outside the functions --- in an experiment with my start-up files, only
16% of calls to untokenize() actually had any effect. But recording the
state generally is a very big change.

Some possible optimisations are along the following lines, although a
bit of care is needed, as it's not necessarily the case on all
architectures that the bit test used by itok() is faster than the range
test the following replaces it with. It did seem faster on this fairly
standard Intel CPU.

I probably won't be committing this.

diff --git a/Src/exec.c b/Src/exec.c
index a01a633..a6b01a6 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -1953,26 +1953,24 @@ makecline(LinkList list)
 mod_export void
 untokenize(char *s)
 {
-    if (*s) {
+    if (*s) {  /* "" may be a const string. Ick. */
         int c;
 
-        while ((c = *s++))
-            if (itok(c)) {
+        while ((c = *s++)) {
+            if (c >= FIRST_TOK && c <= LAST_TOK) {
                 char *p = s - 1;
 
                 if (c != Nularg)
-                    *p++ = ztokens[c - Pound];
+                    *p++ = ztoken_to_char[STOUC(c)];
 
                 while ((c = *s++)) {
-                    if (itok(c)) {
-                        if (c != Nularg)
-                            *p++ = ztokens[c - Pound];
-                    } else
-                        *p++ = c;
+                    if (c != Nularg)
+                        *p++ = ztoken_to_char[STOUC(c)];
                 }
                 *p = '\0';
                 break;
             }
+        }
     }
 }
 
diff --git a/Src/glob.c b/Src/glob.c
index 50f6dce..4d3fc51 100644
--- a/Src/glob.c
+++ b/Src/glob.c
@@ -3570,7 +3570,7 @@ remnulargs(char *s)
     if (*s) {
         char *o = s, c;
 
-        while ((c = *s++))
+        while ((c = *s++)) {
             if (c == Bnullkeep) {
                 /*
                  * An active backslash that needs to be turned back into
@@ -3579,7 +3579,7 @@ remnulargs(char *s)
                  * pattern matching.
                  */
                 continue;
-            } else if (inull(c)) {
+            } else if (c >= FIRST_NULL && c <= LAST_NULL) {
                 char *t = s - 1;
 
                 while ((c = *s++)) {
@@ -3595,6 +3595,7 @@ remnulargs(char *s)
                 }
                 break;
             }
+        }
     }
 }
 
diff --git a/Src/lex.c b/Src/lex.c
index 8896128..bfd6b11 100644
--- a/Src/lex.c
+++ b/Src/lex.c
@@ -37,6 +37,18 @@
 /**/
 mod_export char ztokens[] = "#$^*(())$=|{}[]`<>>?~`,-!'\"\\\\";
 
+/*
+ * Map a possibly tokenized unsigned char to a normal unsigned
+ * char, for use in untokenize().
+ *
+ * Tokens that need untokenizing (everything in ztokens except Nularg)
+ * map to a different character, everything else maps to itself.
+ * In particular, metafied characters are passed through unchanged
+ * (effectively escaping tokens) and do not need special handling.
+ */
+/**/
+mod_export char ztoken_to_char[256];
+
 /* parts of the current token */
 
 /**/
diff --git a/Src/utils.c b/Src/utils.c
index 3d535b8..9fa8a97 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -4012,6 +4012,18 @@ inittyptab(void)
     for (s = PATCHARS; *s; s++)
         typtab[STOUC(*s)] |= IPATTERN;
 
+    for (t0 = 0; t0 < 256; t0++)
+    {
+        if (itok(t0) && (char)t0 != Nularg)
+        {
+            ztoken_to_char[t0] = ztokens[t0 - STOUC(Pound)];
+        }
+        else
+        {
+            ztoken_to_char[t0] = (char)t0;
+        }
+    }
+
     unqueue_signals();
 }
 
diff --git a/Src/zsh.h b/Src/zsh.h
index a5d4455..5065a54 100644
--- a/Src/zsh.h
+++ b/Src/zsh.h
@@ -170,6 +170,7 @@ struct mathfunc {
  * These should match the characters in ztokens, defined in lex.c
  */
 #define Pound          ((char) 0x84)
+#define FIRST_TOK      Pound
 #define String         ((char) 0x85)
 #define Hat            ((char) 0x86)
 #define Star           ((char) 0x87)
@@ -204,6 +205,7 @@ struct mathfunc {
  * and backslashes.
  */
 #define Snull          ((char) 0x9d)
+#define FIRST_NULL     Snull
 #define Dnull          ((char) 0x9e)
 #define Bnull          ((char) 0x9f)
 /*
@@ -217,6 +219,8 @@ struct mathfunc {
  * is used to initialise the IMETA type in inittyptab().
  */
 #define Nularg         ((char) 0xa1)
+#define LAST_TOK       Nularg
+#define LAST_NULL      Nularg
 
 /*
  * Take care to update the use of IMETA appropriately when adding
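The two ideas in this patch, replacing the itok() bit test with a range test and replacing the branchy inner loop with a lookup table, can be checked in a self-contained sketch. The token values 0x84 (Pound) and 0xa1 (Nularg) and the ztokens string come from the patch; the `typtab`/`ITOK` layout, the function names (`init_tables`, `itok_bit`, `itok_range`, `untokenize_sketch`) and the use of `unsigned char` throughout are assumptions made to keep it standalone, so treat it as an illustration rather than zsh's actual code. Using `unsigned char` sidesteps having to reason about whether plain `char` is signed. The table lets the inner loop drop its per-byte token branch, because non-token bytes simply map to themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Token byte values from the Src/zsh.h hunk: Pound .. Nularg. */
#define POUND  0x84
#define NULARG 0xa1
/* Hypothetical flag bit standing in for zsh's ITOK in typtab[]. */
#define ITOK   0x1u

/* Printable forms of the tokens Pound .. Nularg-1, as in Src/lex.c. */
static const char ztokens[] = "#$^*(())$=|{}[]`<>>?~`,-!'\"\\\\";

static unsigned typtab[256];     /* per-byte flag table (bit test) */
static char ztoken_to_char[256]; /* per-byte untokenized value */

static void init_tables(void)
{
    int c;

    /* One printable character per token except Nularg: 29 entries. */
    assert(sizeof ztokens - 1 == (size_t)(NULARG - POUND));
    for (c = 0; c < 256; c++) {
        typtab[c] = (c >= POUND && c <= NULARG) ? ITOK : 0;
        if ((typtab[c] & ITOK) && c != NULARG)
            ztoken_to_char[c] = ztokens[c - POUND];
        else
            ztoken_to_char[c] = (char)c;  /* non-tokens map to themselves */
    }
}

static int itok_bit(unsigned char c)   { return (typtab[c] & ITOK) != 0; }
static int itok_range(unsigned char c) { return c >= POUND && c <= NULARG; }

/* In place: skip ahead to the first token, then copy the rest via the table. */
static void untokenize_sketch(char *s)
{
    unsigned char c;

    while ((c = (unsigned char)*s++)) {
        if (itok_range(c)) {
            char *p = s - 1;

            do {
                if (c != NULARG)        /* Nularg is simply dropped */
                    *p++ = ztoken_to_char[c];
            } while ((c = (unsigned char)*s++));
            *p = '\0';
            break;
        }
    }
}
```

After init_tables(), the range and bit tests agree for all 256 byte values, and untokenize_sketch() turns the buffer "\x84" "abc" "\xa1" "x" into "#abcx": Pound becomes '#' and Nularg is dropped. Whether the range test is actually faster than the bit test remains architecture-dependent, as the message says.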
* Re: Callgrind run
From: Sebastian Gniazdowski @ 2016-11-10 14:07 UTC
To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 1388 bytes --]

On Thu, Nov 10, 2016, at 04:31 AM, Peter Stephenson wrote:
> To do a good job optimising here, we really need state information
> outside the functions --- in an experiment with my start-up files, only
> 16% of calls to untokenize() actually had any effect. But recording the
> state generally is a very big change.
>
> Some possible optimisations are along the following lines, although a
> bit of care is needed, as it's not necessarily the case on all
> architectures that the bit test used by itok() is faster than the range
> test the following replaces it with. It did seem faster on this fairly
> standard Intel CPU.

I tested this and there's no big change, maybe 14 ms: running times are
2135 vs 2149 ms, but that may just be noise. However, callgrind reports
851,712,174 instructions instead of 1,177,994,701 for untokenize, while the
other instruction counts stay the same, so the test seems valid.

My motivation is parsing long Zsh code: it would be great to iterate over
long (z)-split input in, say, 400 ms instead of 2 seconds. That's a dream
result, maybe actually impossible, as disabling multibyte yields 1560 ms.
State recording might seem bad, but at least the room for improvement is
concentrated in apparently few places, which is better than counting
cycles across the whole Zsh code base.
-- 
Sebastian Gniazdowski
psprint@fastmail.com

[-- Attachment #2: callgrind_annotate3.txt --]
[-- Type: text/plain, Size: 7316 bytes --]

--------------------------------------------------------------------------------
Profile data file 'callgrind.out.37869' (creator: callgrind-3.12.0)
--------------------------------------------------------------------------------
I1 cache:
D1 cache:
LL cache:
Timerange: Basic block 0 - 3061023589
Trigger: Program termination
Profiled target:  zsh-ps-debug-opt -f -c source "./testparse.zsh" "./to-parse.zsh" "changes.out" "" (PID 37869, part 1)
Events recorded: Ir
Events shown:    Ir
Event sort order: Ir
Thresholds:      99
Include dirs:
User annotated:
Auto-annotation: off

--------------------------------------------------------------------------------
            Ir
--------------------------------------------------------------------------------
16,408,775,049  PROGRAM TOTALS

--------------------------------------------------------------------------------
            Ir  file:function
--------------------------------------------------------------------------------
2,269,560,047  ???:mb_metacharlenconv_r [/usr/local/bin/zsh-ps-debug-opt]
1,697,840,717  ???:remnulargs [/usr/local/bin/zsh-ps-debug-opt]
1,677,804,272  ???:_UTF8_mbrtowc [/usr/lib/system/libsystem_c.dylib]
1,425,973,736  ???:mbrtowc [/usr/lib/system/libsystem_c.dylib]
1,048,181,974  ???:mb_metacharlenconv [/usr/local/bin/zsh-ps-debug-opt]
1,036,055,574  ???:getindex'2 [/usr/local/bin/zsh-ps-debug-opt]
  851,712,174  ???:untokenize [/usr/local/bin/zsh-ps-debug-opt]
  793,202,632  ???:haswilds [/usr/local/bin/zsh-ps-debug-opt]
  578,630,988  ???:mb_metastrlenend [/usr/local/bin/zsh-ps-debug-opt]
  482,828,373  ???:szone_free_definite_size [/usr/lib/system/libsystem_malloc.dylib]
  436,411,797  ???:ztrsub [/usr/local/bin/zsh-ps-debug-opt]
  363,212,196  ???:tiny_malloc_from_free_list [/usr/lib/system/libsystem_malloc.dylib]
  353,826,375  ???:pattrylen'2 [/usr/local/bin/zsh-ps-debug-opt]
  282,357,130  ???:tiny_free_list_add_ptr [/usr/lib/system/libsystem_malloc.dylib]
  258,502,798  ???:strlen [/usr/lib/dyld]
  234,273,918  ???:pattrylen [/usr/local/bin/zsh-ps-debug-opt]
  209,831,892  ???:szone_size [/usr/lib/system/libsystem_malloc.dylib]
  193,951,431  ???:tiny_free_list_remove_ptr [/usr/lib/system/libsystem_malloc.dylib]
  169,581,080  ???:szone_malloc_should_clear [/usr/lib/system/libsystem_malloc.dylib]
  143,108,999  ???:_platform_memmove$VARIANT$Nehalem [/usr/lib/system/libsystem_platform.dylib]
   97,432,800  ???:free [/usr/lib/system/libsystem_malloc.dylib]
   97,335,179  ???:itype_end [/usr/local/bin/zsh-ps-debug-opt]
   95,268,036  ???:get_tiny_free_size [/usr/lib/system/libsystem_malloc.dylib]
   83,934,500  ???:pthread_getspecific [/usr/lib/system/libsystem_pthread.dylib]
   81,015,036  ???:filesub [/usr/local/bin/zsh-ps-debug-opt]
   68,739,019  ???:__strcpy_chk [/usr/lib/system/libsystem_c.dylib]
   60,928,000  ???:malloc_zone_malloc [/usr/lib/system/libsystem_malloc.dylib]
   57,698,433  ???:zalloc [/usr/local/bin/zsh-ps-debug-opt]
   55,196,334  ???:bin_log [/usr/local/bin/zsh-ps-debug-opt]
   54,517,153  ???:stpcpy [/usr/lib/system/libsystem_c.dylib]
   51,545,105  ???:setarrvalue [/usr/local/bin/zsh-ps-debug-opt]
   49,058,372  ???:get_tiny_previous_free_msize [/usr/lib/system/libsystem_malloc.dylib]
   48,122,383  ???:ztrdup [/usr/local/bin/zsh-ps-debug-opt]
   45,371,076  ???:mathevalarg'2 [/usr/local/bin/zsh-ps-debug-opt]
   44,923,221  ???:arrlen [/usr/local/bin/zsh-ps-debug-opt]
   44,888,797  ???:__vsnprintf_chk [/usr/lib/system/libsystem_c.dylib]
   43,521,421  ???:malloc [/usr/lib/system/libsystem_malloc.dylib]
   33,548,396  ???:__chk_overlap [/usr/lib/system/libsystem_c.dylib]
   33,378,378  ???:execlist'2 [/usr/local/bin/zsh-ps-debug-opt]
   32,027,396  ???:_platform_memset$VARIANT$Merom [/usr/lib/system/libsystem_platform.dylib]
   29,584,698  ???:_platform_strchr$VARIANT$Generic [/usr/lib/system/libsystem_platform.dylib]
   28,788,128  ???:hasher [/usr/local/bin/zsh-ps-debug-opt]
   25,459,319  ???:zhalloc [/usr/local/bin/zsh-ps-debug-opt]
   25,436,057  ???:modify [/usr/local/bin/zsh-ps-debug-opt]
   23,233,085  ???:patcompile'2 [/usr/local/bin/zsh-ps-debug-opt]
   23,114,835  ???:zsfree [/usr/local/bin/zsh-ps-debug-opt]
   21,720,950  ???:_os_lock_spin_lock [/usr/lib/system/libsystem_platform.dylib]
   21,033,364  ???:execrestore'2 [/usr/local/bin/zsh-ps-debug-opt]
   21,029,575  ???:ingetc [/usr/local/bin/zsh-ps-debug-opt]
   20,619,575  ???:freearray [/usr/local/bin/zsh-ps-debug-opt]
   18,246,076  ???:fetchvalue [/usr/local/bin/zsh-ps-debug-opt]
   17,288,068  ???:isascii [/usr/lib/system/libsystem_c.dylib]
   16,274,888  ???:filesub'2 [/usr/local/bin/zsh-ps-debug-opt]
   15,162,279  ???:haswilds'2 [/usr/local/bin/zsh-ps-debug-opt]
   12,590,562  ???:parsestrnoerr [/usr/local/bin/zsh-ps-debug-opt]
   10,881,555  ???:szone_malloc [/usr/lib/system/libsystem_malloc.dylib]
   10,206,971  ???:zstrtol_underscore [/usr/local/bin/zsh-ps-debug-opt]
    9,997,820  ???:_pthread_mutex_unlock_slow [/usr/lib/system/libsystem_pthread.dylib]
    9,639,594  ???:_platform_strcmp [/usr/lib/system/libsystem_platform.dylib]
    9,404,226  ???:modify'2 [/usr/local/bin/zsh-ps-debug-opt]
    8,688,580  ???:os_lock_unlock [/usr/lib/system/libsystem_platform.dylib]
    8,688,580  ???:os_lock_lock [/usr/lib/system/libsystem_platform.dylib]
    8,688,380  ???:_os_lock_spin_unlock [/usr/lib/system/libsystem_platform.dylib]
    8,497,692  ???:op [/usr/local/bin/zsh-ps-debug-opt]
    8,390,890  ???:prefork [/usr/local/bin/zsh-ps-debug-opt]
    8,223,809  ???:patcompstart [/usr/local/bin/zsh-ps-debug-opt]
    7,963,097  ???:gethashnode2 [/usr/local/bin/zsh-ps-debug-opt]
    7,766,705  ???:scanmatchtable [/usr/local/bin/zsh-ps-debug-opt]
    7,013,693  ???:parsestrnoerr'2 [/usr/local/bin/zsh-ps-debug-opt]
    6,917,568  ???:op'2 [/usr/local/bin/zsh-ps-debug-opt]
    6,909,521  ???:getindex [/usr/local/bin/zsh-ps-debug-opt]
    6,827,386  ???:_pthread_mutex_lock_slow [/usr/lib/system/libsystem_pthread.dylib]
    6,691,178  ???:hasbraces [/usr/local/bin/zsh-ps-debug-opt]
    6,601,957  ???:mathevalarg [/usr/local/bin/zsh-ps-debug-opt]
    6,523,105  ???:get_node_from_uniquing_table [/usr/lib/system/libsystem_malloc.dylib]
    6,465,840  ???:ecgetstr [/usr/local/bin/zsh-ps-debug-opt]
    6,193,899  ???:getstrvalue [/usr/local/bin/zsh-ps-debug-opt]
    6,178,975  ???:matheval'2 [/usr/local/bin/zsh-ps-debug-opt]
    6,012,923  ???:patcompile [/usr/local/bin/zsh-ps-debug-opt]
    5,721,631  ???:ImageLoaderMachOCompressed::trieWalk(unsigned char const*, unsigned char const*, char const*) [/usr/lib/dyld]
    5,084,013  ???:add [/usr/local/bin/zsh-ps-debug-opt]
    4,864,913  ???:__vsnprintf_chk'2 [/usr/lib/system/libsystem_c.dylib]
    4,728,297  ???:dupstring [/usr/local/bin/zsh-ps-debug-opt]
    4,331,738  ???:fetchvalue'2 [/usr/local/bin/zsh-ps-debug-opt]
    4,212,315  ???:newparamtable [/usr/local/bin/zsh-ps-debug-opt]
    4,034,794  ???:__vfprintf [/usr/lib/system/libsystem_c.dylib]
    3,977,804  ???:pattryrefs [/usr/local/bin/zsh-ps-debug-opt]
    3,604,588  ???:assignstrvalue [/usr/local/bin/zsh-ps-debug-opt]
    3,520,423  ???:matheval [/usr/local/bin/zsh-ps-debug-opt]
    3,518,560  ???:mb_charinit [/usr/local/bin/zsh-ps-debug-opt]
    3,487,929  ???:freeheap [/usr/local/bin/zsh-ps-debug-opt]
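A quick check of the figures in the two profiles puts these numbers in proportion. Instruction counts (Ir) are a proxy for run time, not a measurement of it:

```latex
\begin{aligned}
\Delta_{\texttt{untokenize}} &= 1{,}177{,}994{,}701 - 851{,}712{,}174 = 326{,}282{,}527
  \approx 27.7\% \text{ of } \texttt{untokenize}\text{'s instructions} \\
\Delta_{\text{total}} &= 16{,}735{,}388{,}538 - 16{,}408{,}775{,}049 = 326{,}613{,}489
  \approx 1.95\% \text{ of the program total}
\end{aligned}
```

Almost all of the drop in the program total comes from untokenize, which fits the observation that the other counts barely moved. Since the saving is under 2% of all instructions, it is unsurprising that wall-clock time changed by only about 14 ms on a run of roughly 2.1 s.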
* multibyte optimisations
From: Peter Stephenson @ 2016-11-10 13:47 UTC
To: zsh-workers

On Thu, 10 Nov 2016 02:37:12 -0800
Sebastian Gniazdowski <psprint@fastmail.com> wrote:
> The other functions listed seem valid and expected: the multibyte
> functions. They could be optimized if a courageous decision were made:
> do what charnext() in pattern.c does:
>
>   if (!(patglobflags & GF_MULTIBYTE) || !(STOUC(*x) & 0x80))
>       return x + 1;
>
> I.e. optimize for ASCII as a subset of UTF-8 also when calling
> MB_METACHARLEN, not only for MB_METASTRLEN (a recent change).

These look straightforward and along the same lines as what we already
do.

pws

diff --git a/Src/utils.c b/Src/utils.c
index 3d535b8..cceaf4c 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -84,7 +84,15 @@ set_widearray(char *mb_array, Widechar_array wca)
 
         mb_charinit();
         while (*mb_array) {
-            int mblen = mb_metacharlenconv(mb_array, &wci);
+            int mblen;
+
+            if (STOUC(*mb_array) <= 0x7f) {
+                *wcptr++ = (wchar_t)*mb_array;
+                mb_array++;
+                continue;
+            }
+
+            mblen = mb_metacharlenconv(mb_array, &wci);
 
             if (!mblen)
                 break;
@@ -5249,6 +5257,12 @@ mb_metacharlenconv_r(const char *s, wint_t *wcp, mbstate_t *mbsp)
     const char *ptr;
     wchar_t wc;
 
+    if (STOUC(*s) <= 0x7f) {
+        if (wcp)
+            *wcp = (wint_t)*s;
+        return 1;
+    }
+
     for (ptr = s; *ptr; ) {
         if (*ptr == Meta) {
             inchar = *++ptr ^ 32;
@@ -5301,7 +5315,7 @@ mb_metacharlenconv_r(const char *s, wint_t *wcp, mbstate_t *mbsp)
 mod_export int
 mb_metacharlenconv(const char *s, wint_t *wcp)
 {
-    if (!isset(MULTIBYTE)) {
+    if (!isset(MULTIBYTE) || STOUC(*s) <= 0x7f) {
         /* treat as single byte, possibly metafied */
         if (wcp)
             *wcp = (wint_t)(*s == Meta ? s[1] ^ 32 : *s);
@@ -5442,6 +5456,12 @@ mb_charlenconv_r(const char *s, int slen, wint_t *wcp, mbstate_t *mbsp)
     const char *ptr;
     wchar_t wc;
 
+    if (slen && STOUC(*s) <= 0x7f) {
+        if (wcp)
+            *wcp = (wint_t)*s;
+        return 1;
+    }
+
     for (ptr = s; slen; ) {
         inchar = *ptr;
         ptr++;
@@ -5477,7 +5497,7 @@ mb_charlenconv_r(const char *s, int slen, wint_t *wcp, mbstate_t *mbsp)
 mod_export int
 mb_charlenconv(const char *s, int slen, wint_t *wcp)
 {
-    if (!isset(MULTIBYTE)) {
+    if (!isset(MULTIBYTE) || STOUC(*s) <= 0x7f) {
         if (wcp)
             *wcp = (wint_t)*s;
         return 1;
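The set_widearray() fast path amounts to a conversion loop that copies ASCII bytes straight into the wide-character array and calls mbrtowc() only for other bytes. The sketch below is a standalone illustration under stated assumptions: `to_wide()` is a hypothetical name, its input is a plain (not metafied) multibyte string, and it assumes, as the patch does, that bytes up to 0x7f are the same character in the current locale. The order in the fast path matters: the byte is stored before the pointer is advanced.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper in the spirit of set_widearray(): convert a plain
 * (not metafied) NUL-terminated multibyte string to wide characters.
 * Assumes bytes 0x00-0x7f are single characters with the same code in
 * the current locale.  out needs room for strlen(mb) + 1 elements.
 * Returns the number of wide characters stored, or (size_t)-1 on an
 * invalid or incomplete sequence.
 */
static size_t to_wide(const char *mb, wchar_t *out)
{
    mbstate_t st;
    wchar_t *wp = out;

    assert(mb != NULL && out != NULL);
    memset(&st, 0, sizeof st);          /* initial conversion state */
    while (*mb) {
        wchar_t wc;
        size_t n;

        if ((unsigned char)*mb <= 0x7f) {
            /* Fast path: store the byte first, then advance past it. */
            *wp++ = (wchar_t)(unsigned char)*mb++;
            continue;
        }
        n = mbrtowc(&wc, mb, strlen(mb), &st);
        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;
        *wp++ = wc;
        mb += n;                        /* n >= 1 here since *mb != '\0' */
    }
    *wp = L'\0';
    return (size_t)(wp - out);
}
```

The ASCII path never touches the mbstate_t, which is safe here because in UTF-8 the state is back in its initial state after every complete character and an ASCII byte cannot occur inside a multibyte sequence. In the default C locale, to_wide("abc", buf) returns 3 and fills buf with L"abc".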
* Re: multibyte optimisations
From: Sebastian Gniazdowski @ 2016-11-10 14:57 UTC
To: zsh-workers

On Thu, Nov 10, 2016, at 05:47 AM, Peter Stephenson wrote:
> On Thu, 10 Nov 2016 02:37:12 -0800
> Sebastian Gniazdowski <psprint@fastmail.com> wrote:
> > The other functions listed seem valid and expected: the multibyte
> > functions. They could be optimized if a courageous decision were made:
> > do what charnext() in pattern.c does:
> >
> >   if (!(patglobflags & GF_MULTIBYTE) || !(STOUC(*x) & 0x80))
> >       return x + 1;
> >
> > I.e. optimize for ASCII as a subset of UTF-8 also when calling
> > MB_METACHARLEN, not only for MB_METASTRLEN (a recent change).
>
> These look straightforward and along the same lines as what we already
> do.

I was worried that the multibyte state might not be clear when requesting
the length of a character, but that cannot really happen; and if it did,
the loop that advances character by character would already have a
problem, being in an unclear state after the previous advance.

With this patch the parser runs in 1493 ms instead of 2148 ms :)

-- 
Sebastian Gniazdowski
psprint@fastmail.com
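The concern about the conversion state can be made explicit with mbsinit(), which reports whether an mbstate_t describes the initial conversion state. The sketch below is a hypothetical, more defensive variant of the fast path, not what the patch does: `char_len()` takes the ASCII shortcut only when no partial sequence is pending, and otherwise defers to mbrtowc(). Because it resets the state after an error, and UTF-8 returns to the initial state after every complete character, the extra check should never change the result for UTF-8; it would only matter for stateful encodings. For scale, the patch as posted took the parser from 2148 ms to 1493 ms, a saving of 655 ms, or about 30%.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical character-length function with a guarded ASCII fast path:
 * the shortcut is taken only when *st is in the initial conversion state,
 * i.e. no partial multibyte sequence is pending.  Otherwise mbrtowc()
 * decides.  Invalid or incomplete input resets the state and counts as
 * one byte.  Returns 0 only when slen is 0.
 */
static int char_len(const char *s, size_t slen, wchar_t *wcp, mbstate_t *st)
{
    size_t n;

    assert(s != NULL && st != NULL);
    if (slen == 0)
        return 0;
    if ((unsigned char)*s <= 0x7f && mbsinit(st)) {
        if (wcp)
            *wcp = (wchar_t)(unsigned char)*s;
        return 1;
    }
    n = mbrtowc(wcp, s, slen, st);
    if (n == (size_t)-1 || n == (size_t)-2) {
        memset(st, 0, sizeof *st);      /* back to the initial state */
        if (wcp)
            *wcp = (wchar_t)(unsigned char)*s;
        return 1;
    }
    return n == 0 ? 1 : (int)n;         /* n == 0: s points at a NUL */
}
```

For example, with a zeroed mbstate_t, char_len("x", 1, &wc, &st) returns 1, sets wc to L'x' and leaves the state initial. The cost of the guard is one mbsinit() call per ASCII character, so whether it is worth having is a trade-off against the speed the fast path is meant to buy.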