multibyte performance findings

mailing list of musl libc
 help / color / mirror / code / Atom feed

* multibyte performance findings
@ 2013-04-06  5:21 Rich Felker
  2013-04-06  6:08 ` Rich Felker
  0 siblings, 1 reply; 3+ messages in thread
From: Rich Felker @ 2013-04-06  5:21 UTC (permalink / raw)
  To: musl

Hi all,

I've been examining performance in the multibyte conversion functions
(as part of the POSIX locale controversy), and have some interesting
findings so far:

1. Performance of mbrtowc seems to be very sensitive to the compiler's
code generation. Even adding code in untaken branches can drastically
slow down or speed up the overall runtime. In one case, adding a dummy
conditional to mimic locale-dependent encoding actually made the test
run faster. I think this means that before we can draw any conclusions
we need to figure out what's causing the compiler to behave to
wackily, and whether the code can be restructured in such a way that
its performance is less vulnerable to the whims of the compiler.

2. Implementing mbtowc (the old non-restartable function) as a wrapper
for mbrtowc is a bad idea. The interface contract of mbrtowc forces it
to be much slower than desirable; mbtowc's simpler interface can in
theory give much better performance, and based on my first rewrite of
mbtowc, the difference is big -- around 40% faster than the equivalent
mbrtowc calls, and over 50% faster than the wrapper-based mbtowc. This
means all musl-internal use of mbrtowc should probably be replaced by
mbtowc, or perhaps even an internal-use-only function with a better
interface.

3. A significant amount of time is "wasted" checking that the size n
of the input buffer is not exceeded when reading; removing the checks
speeds up mbtowc by 10%. As such, it might be desirable to break the
function into two cases: n>=4 (in which case no further length checks
are needed anywhere) and n<4 (in which case, each additional read
needs a check). Alternately, for mbtowc, perhaps there's a quick and
easy way to check the length against the state mask.

I'm probably going to go ahead and commit some changes that seem to be
clear wins in the above areas, but there's definitely room for
discussion. If anybody's interested in poking around at what's going
on with the optimizer or testing these functions heavily on other cpu
variants, let me know.

Rich

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: multibyte performance findings
  2013-04-06  5:21 multibyte performance findings Rich Felker
@ 2013-04-06  6:08 ` Rich Felker
  2013-04-09  5:54   ` Rich Felker
  0 siblings, 1 reply; 3+ messages in thread
From: Rich Felker @ 2013-04-06  6:08 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 302 bytes --]

On Sat, Apr 06, 2013 at 01:21:21AM -0400, Rich Felker wrote:
> Hi all,
> 
> I've been examining performance in the multibyte conversion functions
> (as part of the POSIX locale controversy), and have some interesting
> findings so far:
> [...]

And here's a diff of the proposed changes so far..

Rich

[-- Attachment #2: mb.diff --]
[-- Type: text/plain, Size: 2006 bytes --]

diff --git a/src/multibyte/mbrtowc.c b/src/multibyte/mbrtowc.c
index cc49781..d552652 100644
--- a/src/multibyte/mbrtowc.c
+++ b/src/multibyte/mbrtowc.c
@@ -18,6 +18,7 @@ size_t mbrtowc(wchar_t *restrict wc, const char *restrict src, size_t n, mbstate
 	const unsigned char *s = (const void *)src;
 	const unsigned N = n;
 
+	if (!n) return -2;
 	if (!st) st = (void *)&internal_state;
 	c = *(unsigned *)st;
 	
@@ -27,9 +28,9 @@ size_t mbrtowc(wchar_t *restrict wc, const char *restrict src, size_t n, mbstate
 		n = 1;
 	} else if (!wc) wc = (void *)&wc;
 
-	if (!n) return -2;
+	/* This condition can only be true if *s<0x80 and c==0 */
+	if (*s + c < 0x80) return !!(*wc = *s);
 	if (!c) {
-		if (*s < 0x80) return !!(*wc = *s);
 		if (*s-SA > SB-SA) goto ilseq;
 		c = bittab[*s++-SA]; n--;
 	}
diff --git a/src/multibyte/mbtowc.c b/src/multibyte/mbtowc.c
index b5dd7e3..5ce9281 100644
--- a/src/multibyte/mbtowc.c
+++ b/src/multibyte/mbtowc.c
@@ -11,9 +11,43 @@
 
 #include "internal.h"
 
-int mbtowc(wchar_t *restrict wc, const char *restrict s, size_t n)
+int mbtowc(wchar_t *restrict wc, const char *restrict src, size_t n)
 {
-	mbstate_t st = { 0 };
-	n = mbrtowc(wc, s, n, &st);
-	return n+2 ? n : -1;
+	unsigned c;
+	const unsigned char *s = (const void *)src;
+
+	if (!s) return 0;
+	if (!n) goto ilseq;
+	if (!wc) wc = (void *)&wc;
+
+	if (*s < 0x80) return !!(*wc = *s);
+	if (*s-SA > SB-SA) goto ilseq;
+	c = bittab[*s++-SA];
+
+	/* Avoid excessive checks against n: If shifting the state n-1
+	 * times does not clear the high bit, then the value of n is
+	 * insufficient to read a character */
+	if (n<4 && ((c<<(6*n-6)) & (1U<<31))) goto ilseq;
+
+	if (OOB(c,*s)) goto ilseq;
+	c = c<<6 | *s++-0x80;
+	if (!(c&(1U<<31))) {
+		*wc = c;
+		return 2;
+	}
+
+	if (*s-0x80u >= 0x40) goto ilseq;
+	c = c<<6 | *s++-0x80;
+	if (!(c&(1U<<31))) {
+		*wc = c;
+		return 3;
+	}
+
+	if (*s-0x80u >= 0x40) goto ilseq;
+	*wc = c<<6 | *s++-0x80;
+	return 4;
+
+ilseq:
+	errno = EILSEQ;
+	return -1;
 }

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: multibyte performance findings
  2013-04-06  6:08 ` Rich Felker
@ 2013-04-09  5:54   ` Rich Felker
  0 siblings, 0 replies; 3+ messages in thread
From: Rich Felker @ 2013-04-09  5:54 UTC (permalink / raw)
  To: musl

On Sat, Apr 06, 2013 at 02:08:52AM -0400, Rich Felker wrote:
> On Sat, Apr 06, 2013 at 01:21:21AM -0400, Rich Felker wrote:
> > Hi all,
> > 
> > I've been examining performance in the multibyte conversion functions
> > (as part of the POSIX locale controversy), and have some interesting
> > findings so far:
> > [...]
> 
> And here's a diff of the proposed changes so far..
> 
> Rich

> diff --git a/src/multibyte/mbrtowc.c b/src/multibyte/mbrtowc.c
> index cc49781..d552652 100644
> --- a/src/multibyte/mbrtowc.c
> +++ b/src/multibyte/mbrtowc.c
> @@ -18,6 +18,7 @@ size_t mbrtowc(wchar_t *restrict wc, const char *restrict src, size_t n, mbstate
>  	const unsigned char *s = (const void *)src;
>  	const unsigned N = n;
>  
> +	if (!n) return -2;
>  	if (!st) st = (void *)&internal_state;
>  	c = *(unsigned *)st;
>  	
> @@ -27,9 +28,9 @@ size_t mbrtowc(wchar_t *restrict wc, const char *restrict src, size_t n, mbstate
>  		n = 1;
>  	} else if (!wc) wc = (void *)&wc;
>  
> -	if (!n) return -2;

This change turned out to be wrong (it's an invalid transformation
when s is null) and I found a better improvement anyway, which I've
committed. The commit log message is actually rather interesting:

http://git.musl-libc.org/cgit/musl/commit/?id=a49e038bab7b3927b6a9c7d0c52f9e1a9cb82629

and I think this finding serves as a warning about writing 'clever'
code for special cases that "falls through" to the general code,
rather than just writing the special case code explicitly.

> +	/* This condition can only be true if *s<0x80 and c==0 */
> +	if (*s + c < 0x80) return !!(*wc = *s);
>  	if (!c) {
> -		if (*s < 0x80) return !!(*wc = *s);
>  		if (*s-SA > SB-SA) goto ilseq;
>  		c = bittab[*s++-SA]; n--;
>  	}

I omitted this for now too since the improvement seems difficult to
measure. In principle it should be better, so I may revisit this
later.

Rich


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2013-04-09  5:54 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-06  5:21 multibyte performance findings Rich Felker
2013-04-06  6:08 ` Rich Felker
2013-04-09  5:54   ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).