From: Rich Felker
To: musl@lists.openwall.com
Subject: multibyte performance findings
Date: Sat, 6 Apr 2013 01:21:21 -0400
Message-ID: <20130406052121.GA20915@brightrain.aerifal.cx>

Hi all,

I've been examining performance in the multibyte conversion functions
(as part of the POSIX locale controversy), and have some interesting
findings so far:

1. Performance of mbrtowc seems to be very sensitive to the compiler's
   code generation. Even adding code in untaken branches can
   drastically slow down or speed up the overall runtime. In one case,
   adding a dummy conditional to mimic locale-dependent encoding
   actually made the test run faster.
   I think this means that before we can draw any conclusions we need
   to figure out what's causing the compiler to behave so wackily, and
   whether the code can be restructured in such a way that its
   performance is less vulnerable to the whims of the compiler.

2. Implementing mbtowc (the old non-restartable function) as a wrapper
   for mbrtowc is a bad idea. The interface contract of mbrtowc forces
   it to be much slower than desirable; mbtowc's simpler interface can
   in theory give much better performance, and based on my first
   rewrite of mbtowc, the difference is big -- around 40% faster than
   the equivalent mbrtowc calls, and over 50% faster than the
   wrapper-based mbtowc. This means all musl-internal use of mbrtowc
   should probably be replaced by mbtowc, or perhaps even an
   internal-use-only function with a better interface.

3. A significant amount of time is "wasted" checking that the size n
   of the input buffer is not exceeded when reading; removing the
   checks speeds up mbtowc by 10%. As such, it might be desirable to
   break the function into two cases: n>=4 (in which case no further
   length checks are needed anywhere) and n<4 (in which case each
   additional read needs a check). Alternatively, for mbtowc, perhaps
   there's a quick and easy way to check the length against the state
   mask.

I'm probably going to go ahead and commit some changes that seem to be
clear wins in the above areas, but there's definitely room for
discussion. If anybody's interested in poking around at what's going
on with the optimizer or testing these functions heavily on other CPU
variants, let me know.

Rich
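
For illustration only, the n>=4 / n<4 split from point 3 might look
roughly like the sketch below. It assumes UTF-8 input. The name
sketch_mbtowc, the uint32_t output type, and the return convention
(bytes consumed, or -1 for invalid or truncated input) are all
hypothetical; this is not musl's code and it does not follow the full
mbtowc contract (for example, it returns 1 rather than 0 for a null
byte).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not musl's implementation. Decodes one UTF-8
 * character from s, reading at most n bytes. Returns the number of
 * bytes consumed, or -1 if the input is invalid or truncated. */
int sketch_mbtowc(uint32_t *out, const unsigned char *s, size_t n)
{
    /* Smallest code point allowed for each sequence length; anything
     * lower is an overlong encoding. */
    static const uint32_t min[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    uint32_t c;
    int len, i;

    if (n == 0) return -1;
    c = s[0];
    if (c < 0x80) { *out = c; return 1; }

    /* The lead byte determines the sequence length. */
    if (c >= 0xc2 && c <= 0xdf) len = 2;
    else if (c >= 0xe0 && c <= 0xef) len = 3;
    else if (c >= 0xf0 && c <= 0xf4) len = 4;
    else return -1;
    c &= 0xff >> (len + 1);   /* keep only the lead byte's payload bits */

    if (n >= 4) {
        /* Fast path: at least 4 bytes are readable and no sequence is
         * longer than 4 bytes, so no read is checked against n. */
        for (i = 1; i < len; i++) {
            if ((s[i] & 0xc0) != 0x80) return -1;
            c = (c << 6) | (s[i] & 0x3f);
        }
    } else {
        /* Slow path: each additional read is checked against n. */
        for (i = 1; i < len; i++) {
            if ((size_t)i >= n) return -1;
            if ((s[i] & 0xc0) != 0x80) return -1;
            c = (c << 6) | (s[i] & 0x3f);
        }
    }

    /* Reject overlong forms, surrogates, and values past U+10FFFF. */
    if (c < min[len] || c > 0x10ffff || (c >= 0xd800 && c <= 0xdfff))
        return -1;
    *out = c;
    return len;
}
```

Whether the duplicated loop actually pays off would have to be
measured; given point 1, the result may well depend on the compiler.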