Hello,

I've run callgrind on Zsh while it executed syntax-highlighting code that parses 823 lines of code:

2,269,560,047  ???:mb_metacharlenconv_r [/usr/local/bin/zsh-debug-opt]
1,698,947,505  ???:remnulargs [/usr/local/bin/zsh-debug-opt]
1,677,804,272  ???:_UTF8_mbrtowc [/usr/lib/system/libsystem_c.dylib]
1,425,973,736  ???:mbrtowc [/usr/lib/system/libsystem_c.dylib]
1,177,994,701  ???:untokenize [/usr/local/bin/zsh-debug-opt]
1,048,181,974  ???:mb_metacharlenconv [/usr/local/bin/zsh-debug-opt]
1,036,055,574  ???:getindex'2 [/usr/local/bin/zsh-debug-opt]
  793,202,632  ???:haswilds [/usr/local/bin/zsh-debug-opt]
  578,630,988  ???:mb_metastrlenend [/usr/local/bin/zsh-debug-opt]
  483,051,992  ???:szone_free_definite_size [/usr/lib/system/libsystem_malloc.dylib]
  436,411,797  ???:ztrsub [/usr/local/bin/zsh-debug-opt]
  364,444,476  ???:tiny_malloc_from_free_list [/usr/lib/system/libsystem_malloc.dylib]
  353,826,375  ???:pattrylen'2 [/usr/local/bin/zsh-debug-opt]
  280,090,072  ???:tiny_free_list_add_ptr [/usr/lib/system/libsystem_malloc.dylib]
  258,502,596  ???:strlen [/usr/lib/dyld]
  234,273,918  ???:pattrylen [/usr/local/bin/zsh-debug-opt]
  209,835,520  ???:szone_size [/usr/lib/system/libsystem_malloc.dylib]

To repeat the run, clone https://github.com/psprint/history-search-multi-word/ and add "valgrind --tool=callgrind" before "zsh" (after exec) in parse.zsh, then run:

./parse.zsh ./to-parse.zsh

I think this is a very good real-world test. It seems that Zsh execution could be sped up greatly if the functions remnulargs, untokenize and haswilds were optimized. I'm not sure the results are reasonable, though, as haswilds just iterates over a string and does a quite basic switch. The other two functions have nested loops, so they look more likely to be time-consuming. Maybe the nested loop can be changed to something else?

The other functions pointed out seem very valid / expected: they are the multibyte functions.
They can be optimized if a courageous decision is made: to do what charnext (pattern.c) does:

    if (!(patglobflags & GF_MULTIBYTE) || !(STOUC(*x) & 0x80))
        return x + 1;

I.e. to optimize for ASCII as a subset of UTF-8 also when calling MB_METACHARLEN, not only for MB_METASTRLEN (recent change).

-- 
Sebastian Gniazdowski
psprint@fastmail.com
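
To illustrate the ASCII fast path proposed above, here is a minimal standalone sketch. It is not zsh code: char_len_fast and count_chars are hypothetical names, and the slow path uses the standard mbrlen() where zsh would go through its own metafied-string helpers. It also assumes a stateless, ASCII-compatible encoding such as UTF-8, so that a byte with the high bit clear is always a complete character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not a zsh function): length in bytes of the
 * character starting at s, or 0 at the end of the string.
 */
static size_t
char_len_fast(const char *s, mbstate_t *st)
{
    size_t n;

    assert(s != NULL && st != NULL);
    if (*s == '\0')
        return 0;
    /* Fast path: high bit clear means a one-byte character in UTF-8. */
    if (!((unsigned char)*s & 0x80))
        return 1;
    /* Slow path: let the C library measure the multibyte sequence. */
    n = mbrlen(s, strlen(s), st);
    if (n == (size_t)-1 || n == (size_t)-2) {
        /* Invalid or incomplete sequence: reset state, consume one byte. */
        memset(st, 0, sizeof(*st));
        return 1;
    }
    return n ? n : 1;
}

/* Hypothetical helper: count the characters in s using char_len_fast(). */
static size_t
count_chars(const char *s)
{
    mbstate_t st;
    size_t count = 0, n;

    memset(&st, 0, sizeof(st));
    while ((n = char_len_fast(s, &st)) > 0) {
        s += n;
        count++;
    }
    return count;
}
```

For input that is mostly ASCII, as shell code usually is, a loop like count_chars() would reach mbrlen() only on bytes with the high bit set, which is the saving the profile above suggests is worth having.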