From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/3049
Path: news.gmane.org!not-for-mail
From: Rich Felker <dalias@aerifal.cx>
Newsgroups: gmane.linux.lib.musl.general
Subject: multibyte performance findings
Date: Sat, 6 Apr 2013 01:21:21 -0400
Message-ID: <20130406052121.GA20915@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: ger.gmane.org 1365274423 13184 80.91.229.3 (6 Apr 2013 18:53:43 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sat, 6 Apr 2013 18:53:43 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-3052-gllmg-musl=m.gmane.org@lists.openwall.com Sat Apr 06 20:53:45 2013
Return-path: <musl-return-3052-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@plane.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by plane.gmane.org with smtp (Exim 4.69)
	(envelope-from <musl-return-3052-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1UOY9x-0005dU-Bk
	for gllmg-musl@plane.gmane.org; Sat, 06 Apr 2013 20:48:17 +0200
Original-Received: (qmail 19904 invoked by uid 550); 6 Apr 2013 05:21:35 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
Original-Received: (qmail 19896 invoked from network); 6 Apr 2013 05:21:35 -0000
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
Xref: news.gmane.org gmane.linux.lib.musl.general:3049
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/3049>

Hi all,

I've been examining performance in the multibyte conversion functions
(as part of the POSIX locale controversy), and have some interesting
findings so far:

1. Performance of mbrtowc seems to be very sensitive to the compiler's
code generation. Even adding code in untaken branches can drastically
slow down or speed up the overall runtime. In one case, adding a dummy
conditional to mimic locale-dependent encoding actually made the test
run faster. I think this means that before we can draw any conclusions
we need to figure out what's causing the compiler to behave so
wackily, and whether the code can be restructured in such a way that
its performance is less vulnerable to the whims of the compiler.

2. Implementing mbtowc (the old non-restartable function) as a wrapper
for mbrtowc is a bad idea. The interface contract of mbrtowc forces it
to be much slower than desirable; mbtowc's simpler interface can in
theory give much better performance, and based on my first rewrite of
mbtowc, the difference is big -- around 40% faster than the equivalent
mbrtowc calls, and over 50% faster than the wrapper-based mbtowc. This
means all musl-internal use of mbrtowc should probably be replaced by
mbtowc, or perhaps even an internal-use-only function with a better
interface.

3. A significant amount of time is "wasted" checking that the size n
of the input buffer is not exceeded when reading; removing the checks
speeds up mbtowc by 10%. As such, it might be desirable to break the
function into two cases: n>=4 (in which case no further length checks
are needed anywhere) and n<4 (in which case, each additional read
needs a check). Alternately, for mbtowc, perhaps there's a quick and
easy way to check the length against the state mask.
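
To make the n>=4 / n<4 split in point 3 concrete, here is a minimal
sketch of a standalone UTF-8 decoder. It is not musl's code: the names
decode_utf8, seq_len and valid_scalar and the -1/-2 return convention
are all invented for illustration. The fast path relies only on the
fact that a UTF-8 sequence is at most 4 bytes long, so once n>=4 every
read is in bounds; the slow path checks the length before each
additional read.

```c
#include <stddef.h>
#include <stdint.h>

/* Sequence length implied by lead byte b, or 0 if b cannot start one. */
static int seq_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if (b >= 0xc2 && b <= 0xdf) return 2;
    if (b >= 0xe0 && b <= 0xef) return 3;
    if (b >= 0xf0 && b <= 0xf4) return 4;
    return 0;
}

/* Reject overlong forms, surrogates, and values above U+10FFFF. */
static int valid_scalar(uint32_t c, int len)
{
    static const uint32_t min[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    return c >= min[len] && c <= 0x10ffff && !(c >= 0xd800 && c <= 0xdfff);
}

/* Hypothetical decoder (illustration only). Returns the number of bytes
   consumed, -1 for invalid input, or -2 if the n available bytes end
   before the sequence does. */
int decode_utf8(uint32_t *out, const unsigned char *s, size_t n)
{
    uint32_t c;
    int len, i;

    if (n == 0) return -2;
    len = seq_len(s[0]);
    if (len == 0) return -1;
    if (len == 1) { *out = s[0]; return 1; }
    c = s[0] & (0x7f >> len);   /* keep the payload bits of the lead byte */

    if (n >= 4) {
        /* Fast path: at least 4 bytes are readable, so no length checks. */
        for (i = 1; i < len; i++) {
            if ((s[i] & 0xc0) != 0x80) return -1;
            c = c << 6 | (s[i] & 0x3f);
        }
    } else {
        /* Slow path: fewer than 4 bytes, check before each further read. */
        for (i = 1; i < len; i++) {
            if ((size_t)i >= n) return -2;
            if ((s[i] & 0xc0) != 0x80) return -1;
            c = c << 6 | (s[i] & 0x3f);
        }
    }
    if (!valid_scalar(c, len)) return -1;
    *out = c;
    return len;
}
```

Whether duplicating the loop like this actually pays off depends on the
compiler and the CPU, which is exactly the sensitivity described in
point 1, so it would need measuring rather than assuming.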

I'm probably going to go ahead and commit some changes that seem to be
clear wins in the above areas, but there's definitely room for
discussion. If anybody's interested in poking around at what's going
on with the optimizer or testing these functions heavily on other CPU
variants, let me know.
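
For anyone who wants to compare mbtowc and mbrtowc on their own
hardware, a harness along the following lines might be a starting
point. This is only a sketch and not the test program behind the
numbers above: time_decode, via_mbtowc, via_mbrtowc and the decode_fn
type are hypothetical names. It uses only the standard mbtowc, mbrtowc
and clock interfaces.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <wchar.h>

/* Common signature so both functions can be timed by the same loop. */
typedef int (*decode_fn)(wchar_t *wc, const char *s, size_t n);

static int via_mbtowc(wchar_t *wc, const char *s, size_t n)
{
    return mbtowc(wc, s, n);
}

static int via_mbrtowc(wchar_t *wc, const char *s, size_t n)
{
    static mbstate_t st;            /* zero-initialized: initial state */
    size_t r = mbrtowc(wc, s, n, &st);
    if (r >= (size_t)-2) {          /* (size_t)-1 invalid, (size_t)-2 incomplete */
        memset(&st, 0, sizeof st);  /* reset so later calls start clean */
        return -1;
    }
    return (int)r;
}

/* Decode buf (len bytes) reps times with f. Stores the total number of
   characters decoded in *count and returns the elapsed CPU time in
   seconds as measured by clock(). */
double time_decode(decode_fn f, const char *buf, size_t len, int reps,
                   size_t *count)
{
    size_t total = 0;
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++) {
        size_t i = 0;
        while (i < len) {
            wchar_t wc;
            int k = f(&wc, buf + i, len - i);
            if (k <= 0) break;      /* null byte, or invalid/incomplete input */
            i += (size_t)k;
            total++;
        }
    }
    clock_t t1 = clock();
    *count = total;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

On most systems the caller would first do setlocale(LC_CTYPE, "") with
a UTF-8 locale in the environment, then run time_decode once with
via_mbtowc and once with via_mbrtowc over the same buffer and compare
the returned times. Checking that both runs report the same count
guards against comparing runs that bailed out early on invalid input.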

Rich