mailing list of musl libc
* multibyte performance findings
From: Rich Felker @ 2013-04-06  5:21 UTC (permalink / raw)
  To: musl

Hi all,

I've been examining performance in the multibyte conversion functions
(as part of the POSIX locale controversy), and have some interesting
findings so far:

1. Performance of mbrtowc seems to be very sensitive to the compiler's
code generation. Even adding code in untaken branches can drastically
slow down or speed up the overall runtime. In one case, adding a dummy
conditional to mimic locale-dependent encoding actually made the test
run faster. I think this means that before we can draw any conclusions
we need to figure out what's causing the compiler to behave so
wackily, and whether the code can be restructured in such a way that
its performance is less vulnerable to the whims of the compiler.

2. Implementing mbtowc (the old non-restartable function) as a wrapper
for mbrtowc is a bad idea. The interface contract of mbrtowc forces it
to be much slower than desirable; mbtowc's simpler interface can in
theory give much better performance, and based on my first rewrite of
mbtowc, the difference is big -- around 40% faster than the equivalent
mbrtowc calls, and over 50% faster than the wrapper-based mbtowc. This
means all musl-internal use of mbrtowc should probably be replaced by
mbtowc, or perhaps even an internal-use-only function with a better
interface.

3. A significant amount of time is "wasted" checking that the size n
of the input buffer is not exceeded when reading; removing the checks
speeds up mbtowc by 10%. As such, it might be desirable to break the
function into two cases: n>=4 (in which case no further length checks
are needed anywhere) and n<4 (in which case, each additional read
needs a check). Alternatively, for mbtowc, perhaps there's a quick and
easy way to check the length against the state mask.
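
To make point 2 concrete, here is a rough sketch of what a wrapper-based
mbtowc can look like. The name wrapper_mbtowc is made up for illustration,
and this is not the code that was in musl; it only shows the kind of extra
work a wrapper has to do (setting up a conversion state and translating
mbrtowc's size_t return codes) that a direct implementation can skip.

```c
#include <errno.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical wrapper: mbtowc semantics built on top of mbrtowc. */
int wrapper_mbtowc(wchar_t *wc, const char *s, size_t n)
{
    mbstate_t st;
    size_t r;

    /* A null s asks whether the encoding is state-dependent; assume not. */
    if (!s) return 0;

    /* mbtowc keeps no state between calls, so start from the initial state. */
    memset(&st, 0, sizeof st);

    r = mbrtowc(wc, s, n, &st);

    /* mbrtowc reports "incomplete" separately; mbtowc only has -1. */
    if (r == (size_t)-2) {
        errno = EILSEQ;
        return -1;
    }
    if (r == (size_t)-1) return -1;

    /* 0 for the null character, otherwise the number of bytes consumed. */
    return (int)r;
}
```

Calling wrapper_mbtowc(&wc, "A", 1) should return 1 and store L'A' in wc
in any locale, since ASCII bytes decode the same way everywhere.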

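The n>=4 versus n<4 split from point 3 could be sketched roughly as
below. This is a simplified standalone UTF-8 decoder written only for
illustration, not musl's implementation; sketch_mbtowc, seq_len and
finish are hypothetical names, and it writes a uint32_t rather than a
wchar_t to stay independent of the platform's wchar_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Sequence length implied by a lead byte, or -1 if it cannot start one. */
static int seq_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if (b >= 0xc2 && b <= 0xdf) return 2;
    if (b >= 0xe0 && b <= 0xef) return 3;
    if (b >= 0xf0 && b <= 0xf4) return 4;
    return -1;
}

/* Reject overlong forms, surrogates, and values above U+10FFFF. */
static int finish(uint32_t *out, uint32_t c, int len)
{
    static const uint32_t min[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    if (c < min[len] || c > 0x10ffff || (c >= 0xd800 && c <= 0xdfff))
        return -1;
    *out = c;
    return c ? len : 0; /* like mbtowc, return 0 for the null character */
}

int sketch_mbtowc(uint32_t *out, const unsigned char *s, size_t n)
{
    int len, i;
    uint32_t c;

    if (n == 0) return -1;
    len = seq_len(s[0]);
    if (len < 0) return -1;
    /* Keep only the payload bits of the lead byte. */
    c = len == 1 ? s[0] : (uint32_t)(s[0] & (0xff >> (len + 1)));

    if (n >= 4) {
        /* Fast path: a character is at most 4 bytes, so no read can
         * go past the buffer and no length checks are needed. */
        for (i = 1; i < len; i++) {
            if ((s[i] & 0xc0) != 0x80) return -1;
            c = c << 6 | (s[i] & 0x3f);
        }
    } else {
        /* Slow path: every additional read is checked against n. */
        for (i = 1; i < len; i++) {
            if ((size_t)i >= n || (s[i] & 0xc0) != 0x80) return -1;
            c = c << 6 | (s[i] & 0x3f);
        }
    }
    return finish(out, c, len);
}
```

Whether the duplicated loop actually pays off would have to be measured;
per point 1, the result may well depend on the compiler.
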
I'm probably going to go ahead and commit some changes that seem to be
clear wins in the above areas, but there's definitely room for
discussion. If anybody's interested in poking around at what's going
on with the optimizer or testing these functions heavily on other CPU
variants, let me know.

Rich



