* [PATCH] Optimization for mb_metastrlenend()
@ 2016-11-03 19:44 ` Sebastian Gniazdowski
  2016-11-03 19:47   ` Sebastian Gniazdowski
  2016-11-04  9:59   ` Peter Stephenson
  0 siblings, 2 replies; 3+ messages in thread
From: Sebastian Gniazdowski @ 2016-11-03 19:44 UTC (permalink / raw)
  To: zsh-workers

Hello,
mb_metastrlenend() can count a byte as one character immediately, without
calling mbrtowc(), if the byte is ASCII (0..127) and follows a complete
character. I found a good test for this: a syntax highlighting parser
working on 823 lines of Zsh code. It comes from my project HSMW and is a
modified and optimized zsh-syntax-highlighting parser. Running time before
the optimization: 2237 ms, after: 2027 ms, so this is a gain of about 9%
for long buffers. I repeated the test many times and it's a clear win. For
short buffers (calling the parser line by line on different, hard input)
the gain is ~30 ms for run times of ~1450 ms, about 2%, so no real win
there. Zprof results for long buffers and instructions to repeat the test
are attached. I checked that all Zsh tests pass.
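
As a rough illustration of the fast path described above, here is a minimal standalone C sketch. It is not the zsh code: count_chars() is a hypothetical helper, and it leaves out zsh's Meta encoding, the width calculation, and the rewind on invalid input. It also does not count the bytes of a trailing incomplete sequence.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Count characters in a NUL-terminated multibyte string, feeding
 * mbrtowc() one byte at a time. Hypothetical helper for illustration;
 * the name and signature are assumptions, not part of zsh.
 */
size_t count_chars(const char *s)
{
    mbstate_t state;
    size_t num = 0;
    int complete = 1;   /* nonzero when no multibyte character is pending */

    memset(&state, 0, sizeof(state));
    while (*s) {
        unsigned char c = (unsigned char)*s;
        char byte = *s++;
        size_t ret;

        /* Fast path: a 7-bit byte at a character boundary is one character. */
        if (complete && c <= 0x7f) {
            num++;
            continue;
        }

        ret = mbrtowc(NULL, &byte, 1, &state);
        if (ret == (size_t)-2) {
            /* Incomplete sequence: keep feeding bytes before counting. */
            complete = 0;
        } else {
            if (ret == (size_t)-1) {
                /* Invalid byte: reset the state and count it as one character. */
                memset(&state, 0, sizeof(state));
            }
            num++;
            complete = 1;
        }
    }
    return num;
}
```

The complete flag matters because a byte in the 0..127 range is only skipped when no multibyte sequence is pending; otherwise it still goes through mbrtowc(), so the shift state stays consistent. The result for non-ASCII input depends on the locale selected with setlocale().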



diff --git a/Src/utils.c b/Src/utils.c
index db43529..5bc9ef4 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -5323,7 +5323,7 @@ mb_metastrlenend(char *ptr, int width, char *eptr)
     char inchar, *laststart;
     size_t ret;
     wchar_t wc;
-    int num, num_in_char;
+    int num, num_in_char, complete;

     if (!isset(MULTIBYTE))
        return ztrlen(ptr);
@@ -5331,6 +5331,7 @@ mb_metastrlenend(char *ptr, int width, char *eptr)
     laststart = ptr;
     ret = MB_INVALID;
     num = num_in_char = 0;
+    complete = 1;

     memset(&mb_shiftstate, 0, sizeof(mb_shiftstate));
     while (*ptr && !(eptr && ptr >= eptr)) {
@@ -5339,6 +5340,14 @@ mb_metastrlenend(char *ptr, int width, char *eptr)
        else
            inchar = *ptr;
        ptr++;
+
+        if ( complete && ( inchar >= 0 && inchar <= 0x7f ) ) {
+            num ++;
+            laststart = ptr;
+            num_in_char = 0;
+            continue;
+        }
+
        ret = mbrtowc(&wc, &inchar, 1, &mb_shiftstate);

        if (ret == MB_INCOMPLETE) {
@@ -5358,6 +5367,7 @@ mb_metastrlenend(char *ptr, int width, char *eptr)
             * so we don't count characters twice.
             */
            num_in_char++;
+            complete = 0;
        } else {
            if (ret == MB_INVALID) {
                /* Reset, treat as single character */
@@ -5378,8 +5388,10 @@ mb_metastrlenend(char *ptr, int width, char *eptr)
                }
            } else
                num++;
+
            laststart = ptr;
            num_in_char = 0;
+            complete = 1;
        }
     }

-- 
  Sebastian Gniazdowski
  psprint@fastmail.com



* Re: [PATCH] Optimization for mb_metastrlenend()
  2016-11-03 19:44 ` [PATCH] Optimization for mb_metastrlenend() Sebastian Gniazdowski
@ 2016-11-03 19:47   ` Sebastian Gniazdowski
  2016-11-04  9:59   ` Peter Stephenson
  1 sibling, 0 replies; 3+ messages in thread
From: Sebastian Gniazdowski @ 2016-11-03 19:47 UTC (permalink / raw)
  To: zsh-workers

[-- Attachment #1: Type: text/plain, Size: 70 bytes --]

Here are the attachments missing from my previous message.

-- 
  Sebastian Gniazdowski
  psprint@fastmail.com

[-- Attachment #2: mbrtowc_utils.diff --]
[-- Type: text/plain, Size: 1452 bytes --]

diff --git a/Src/utils.c b/Src/utils.c
index db43529..5bc9ef4 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -5323,7 +5323,7 @@ mb_metastrlenend(char *ptr, int width, char *eptr)
     char inchar, *laststart;
     size_t ret;
     wchar_t wc;
-    int num, num_in_char;
+    int num, num_in_char, complete;
 
     if (!isset(MULTIBYTE))
 	return ztrlen(ptr);
@@ -5331,6 +5331,7 @@ mb_metastrlenend(char *ptr, int width, char *eptr)
     laststart = ptr;
     ret = MB_INVALID;
     num = num_in_char = 0;
+    complete = 1;
 
     memset(&mb_shiftstate, 0, sizeof(mb_shiftstate));
     while (*ptr && !(eptr && ptr >= eptr)) {
@@ -5339,6 +5340,14 @@ mb_metastrlenend(char *ptr, int width, char *eptr)
 	else
 	    inchar = *ptr;
 	ptr++;
+
+        if ( complete && ( inchar >= 0 && inchar <= 0x7f ) ) {
+            num ++;
+            laststart = ptr;
+            num_in_char = 0;
+            continue;
+        }
+
 	ret = mbrtowc(&wc, &inchar, 1, &mb_shiftstate);
 
 	if (ret == MB_INCOMPLETE) {
@@ -5358,6 +5367,7 @@ mb_metastrlenend(char *ptr, int width, char *eptr)
 	     * so we don't count characters twice.
 	     */
 	    num_in_char++;
+            complete = 0;
 	} else {
 	    if (ret == MB_INVALID) {
 		/* Reset, treat as single character */
@@ -5378,8 +5388,10 @@ mb_metastrlenend(char *ptr, int width, char *eptr)
 		}
 	    } else
 		num++;
+
 	    laststart = ptr;
 	    num_in_char = 0;
+            complete = 1;
 	}
     }
 

[-- Attachment #3: zprof_results.txt --]
[-- Type: text/plain, Size: 2013 bytes --]

git clone https://github.com/psprint/history-search-multi-word.git
cd history-search-multi-word/test; ./parse.zsh ./to-parse.zsh

823 lines parsed with modified, optimized zsh-syntax-highlighting code

After optimization, minimum obtainable time:

Running time: 2.0280520000
num  calls                time                       self            name
-----------------------------------------------------------------------------------
 1)    1        2027,49  2027,49  100,00%   1898,46  1898,46   93,63%  -hsmw-highlight-process
 2)  754         109,36     0,15    5,39%    109,36     0,15    5,39%  -hsmw-highlight-main-type
 3)  395          11,57     0,03    0,57%     11,57     0,03    0,57%  -hsmw-highlight-check-path
 4)   22           5,78     0,26    0,28%      5,78     0,26    0,28%  -hsmw-highlight-string
 5)    6           2,33     0,39    0,11%      2,33     0,39    0,11%  -hsmw-highlight-dollar-string
 6)    1           0,07     0,07    0,00%      0,07     0,07    0,00%  -hsmw-highlight-fill-option-variables
 7)    1           0,01     0,01    0,00%      0,01     0,01    0,00%  -hsmw-highlight-init

Before optimization, minimum obtainable time:

Running time: 2.2383990000
num  calls                time                       self            name
-----------------------------------------------------------------------------------
 1)    1        2237,79  2237,79  100,00%   2104,55  2104,55   94,04%  -hsmw-highlight-process
 2)  754         113,73     0,15    5,08%    113,73     0,15    5,08%  -hsmw-highlight-main-type
 3)  395          11,24     0,03    0,50%     11,24     0,03    0,50%  -hsmw-highlight-check-path
 4)   22           6,02     0,27    0,27%      6,02     0,27    0,27%  -hsmw-highlight-string
 5)    6           2,26     0,38    0,10%      2,26     0,38    0,10%  -hsmw-highlight-dollar-string
 6)    1           0,07     0,07    0,00%      0,07     0,07    0,00%  -hsmw-highlight-fill-option-variables
 7)    1           0,01     0,01    0,00%      0,01     0,01    0,00%  -hsmw-highlight-init



* Re: [PATCH] Optimization for mb_metastrlenend()
  2016-11-03 19:44 ` [PATCH] Optimization for mb_metastrlenend() Sebastian Gniazdowski
  2016-11-03 19:47   ` Sebastian Gniazdowski
@ 2016-11-04  9:59   ` Peter Stephenson
  1 sibling, 0 replies; 3+ messages in thread
From: Peter Stephenson @ 2016-11-04  9:59 UTC (permalink / raw)
  To: zsh-workers

On Thu, 03 Nov 2016 12:44:12 -0700
Sebastian Gniazdowski <psprint@fastmail.com> wrote:
> mb_metastrlenend can quickly count character if it's ASCII (0..127) and
> occurs after complete char. A good test for this has been found – syntax
> highlighting parser working on 823 lines of Zsh-code input. It comes
> from my project HSMW, is a modified and optimized
> zsh-syntax-highlighting parser. Running time before optimizations: 2237
> ms, after: 2027 ms, so this is a 10% optimization for long buffers.
> Repeated the test many times, it's a clear win. For short buffers
> (line-by-line calling the parser on different, hard input) the gain is
> ~30 ms for run times ~1450 ms, so no win. Zprof results for long buffers
> and instruction to repeat the test are attached. Checked that all Zsh
> tests are passing.

Thanks, we do rely throughout on US-ASCII as a 7-bit subset so this is a
reasonable thing to do.

I had to apply it by hand and I've slightly reformatted it, but the code
is identical.

pws

