multibyte performance findings

From: Rich Felker <dalias@aerifal.cx>
To: musl@lists.openwall.com
Subject: multibyte performance findings
Date: Sat, 6 Apr 2013 01:21:21 -0400
Message-ID: <20130406052121.GA20915@brightrain.aerifal.cx>

Hi all,

I've been examining performance in the multibyte conversion functions
(as part of the POSIX locale controversy), and have some interesting
findings so far:

1. Performance of mbrtowc seems to be very sensitive to the compiler's
code generation. Even adding code in untaken branches can drastically
slow down or speed up the overall runtime. In one case, adding a dummy
conditional to mimic locale-dependent encoding actually made the test
run faster. I think this means that before we can draw any conclusions
we need to figure out what's causing the compiler to behave so
wackily, and whether the code can be restructured in such a way that
its performance is less vulnerable to the whims of the compiler.
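
For anyone wanting to reproduce this kind of measurement, a minimal timing
harness might look like the sketch below. It is an assumption, not the test
actually used for the numbers in this message; the name time_mbrtowc and its
parameters are hypothetical. It decodes the same buffer repeatedly with
mbrtowc and reports elapsed time and the number of characters decoded.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>
#include <time.h>
#include <wchar.h>

/* Hypothetical harness (not from the original message): decode the first
 * len bytes of buf `reps` times with mbrtowc. Returns elapsed seconds and
 * stores the total number of characters decoded in *chars. */
double time_mbrtowc(const char *buf, size_t len, int reps, size_t *chars)
{
    struct timespec t0, t1;
    size_t total = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < reps; r++) {
        mbstate_t st;
        const char *p = buf;
        size_t left = len;

        memset(&st, 0, sizeof st);      /* start in the initial state */
        while (left) {
            wchar_t wc;
            size_t k = mbrtowc(&wc, p, left, &st);
            if (k == (size_t)-1 || k == (size_t)-2)
                break;                  /* invalid or incomplete input */
            if (k == 0)
                k = 1;                  /* a null byte still consumes 1 byte */
            p += k;
            left -= k;
            total++;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *chars = total;
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Comparing the result across compilers, optimization levels, and small source
changes is one way to see how much of the difference comes from code
generation rather than from the algorithm.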

2. Implementing mbtowc (the old non-restartable function) as a wrapper
for mbrtowc is a bad idea. The interface contract of mbrtowc forces it
to be much slower than desirable; mbtowc's simpler interface can in
theory give much better performance, and based on my first rewrite of
mbtowc, the difference is big -- around 40% faster than the equivalent
mbrtowc calls, and over 50% faster than the wrapper-based mbtowc. This
means all musl-internal use of mbrtowc should probably be replaced by
mbtowc, or perhaps even an internal-use-only function with a better
interface.
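
To make the wrapper approach concrete, here is a rough sketch of mbtowc
written on top of mbrtowc. It is an illustration only, not musl's source; the
name wrapper_mbtowc is hypothetical. The extra work is visible: the wrapper
has to maintain an mbstate_t and translate mbrtowc's (size_t)-1 and
(size_t)-2 results into mbtowc's single -1.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical wrapper-style mbtowc (illustration, not musl's code). */
int wrapper_mbtowc(wchar_t *wc, const char *s, size_t n)
{
    static mbstate_t st;    /* mbtowc keeps its own internal state */
    size_t r;

    if (!s) {
        /* Reset the state; report that the encoding is not
         * state-dependent (true for UTF-8 and the C locale). */
        memset(&st, 0, sizeof st);
        return 0;
    }

    r = mbrtowc(wc, s, n, &st);
    if (r == (size_t)-1 || r == (size_t)-2) {
        /* mbtowc has no "incomplete" result: both cases become -1,
         * and any partial state is discarded. */
        memset(&st, 0, sizeof st);
        return -1;
    }
    return (int)r;
}
```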

3. A significant amount of time is "wasted" checking that the size n
of the input buffer is not exceeded when reading; removing the checks
speeds up mbtowc by 10%. As such, it might be desirable to break the
function into two cases: n>=4 (in which case no further length checks
are needed anywhere) and n<4 (in which case, each additional read
needs a check). Alternatively, for mbtowc, perhaps there's a quick and
easy way to check the length against the state mask.
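
The n>=4 / n<4 split could be sketched as follows. This is a simplified,
standalone UTF-8 decoder written for illustration, not the mbtowc code under
discussion; the names utf8_decode, decode_fast and decode_checked are
hypothetical, and it returns -1 for both invalid and incomplete input. The
fast path relies on the fact that a UTF-8 character is at most 4 bytes long,
so with n>=4 no read can run past the buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Fast path: caller guarantees at least 4 readable bytes at s,
 * so no length checks are needed. */
static int decode_fast(uint32_t *out, const unsigned char *s)
{
    uint32_t c = s[0];

    if (c < 0x80) { *out = c; return 1; }
    if (c - 0xc2 > 0xf4 - 0xc2) return -1;      /* not a valid lead byte */
    if ((s[1] & 0xc0) != 0x80) return -1;
    if (c < 0xe0) {
        *out = (c & 0x1f) << 6 | (s[1] & 0x3f);
        return 2;
    }
    if ((s[2] & 0xc0) != 0x80) return -1;
    if (c < 0xf0) {
        c = (c & 0x0f) << 12 | (s[1] & 0x3f) << 6 | (s[2] & 0x3f);
        if (c < 0x800 || (c >= 0xd800 && c <= 0xdfff)) return -1;
        *out = c;
        return 3;
    }
    if ((s[3] & 0xc0) != 0x80) return -1;
    c = (c & 0x07) << 18 | (s[1] & 0x3f) << 12
      | (s[2] & 0x3f) << 6 | (s[3] & 0x3f);
    if (c < 0x10000 || c > 0x10ffff) return -1;
    *out = c;
    return 4;
}

/* Slow path for n < 4: every additional read is checked against n. */
static int decode_checked(uint32_t *out, const unsigned char *s, size_t n)
{
    static const uint32_t min[] = { 0, 0, 0x80, 0x800, 0x10000 };
    uint32_t c;
    int len, i;

    if (n == 0) return -1;
    c = s[0];
    if (c < 0x80) { *out = c; return 1; }
    if (c < 0xc2 || c > 0xf4) return -1;
    len = c < 0xe0 ? 2 : c < 0xf0 ? 3 : 4;
    c &= 0x7f >> len;                           /* keep the payload bits */
    for (i = 1; i < len; i++) {
        if ((size_t)i >= n) return -1;          /* per-read length check */
        if ((s[i] & 0xc0) != 0x80) return -1;
        c = c << 6 | (s[i] & 0x3f);
    }
    /* Reject overlong forms, surrogates, and out-of-range values. */
    if (c < min[len] || c > 0x10ffff || (c >= 0xd800 && c <= 0xdfff))
        return -1;
    *out = c;
    return len;
}

/* Dispatch once on n instead of checking the length at every read. */
int utf8_decode(uint32_t *out, const unsigned char *s, size_t n)
{
    return n >= 4 ? decode_fast(out, s) : decode_checked(out, s, n);
}
```

When decoding a long string, almost every call takes the n>=4 branch; only
the last few bytes of the buffer go through the checked path.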

I'm probably going to go ahead and commit some changes that seem to be
clear wins in the above areas, but there's definitely room for
discussion. If anybody's interested in poking around at what's going
on with the optimizer or testing these functions heavily on other cpu
variants, let me know.

Rich
