From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/3077
Path: news.gmane.org!not-for-mail
From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: multibyte performance findings
Date: Tue, 9 Apr 2013 01:54:36 -0400
Message-ID: <20130409055436.GU20323@brightrain.aerifal.cx>
References: <20130406052121.GA20915@brightrain.aerifal.cx> <20130406060852.GH20323@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: ger.gmane.org 1365486887 2277 80.91.229.3 (9 Apr 2013 05:54:47 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Tue, 9 Apr 2013 05:54:47 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-3081-gllmg-musl=m.gmane.org@lists.openwall.com Tue Apr 09 07:54:51 2013
Return-path:
Envelope-to: gllmg-musl@plane.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200]) by plane.gmane.org with smtp (Exim 4.69) (envelope-from ) id 1UPRW7-0007LB-0f for gllmg-musl@plane.gmane.org; Tue, 09 Apr 2013 07:54:51 +0200
Original-Received: (qmail 28464 invoked by uid 550); 9 Apr 2013 05:54:49 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
Original-Received: (qmail 28456 invoked from network); 9 Apr 2013 05:54:49 -0000
Content-Disposition: inline
In-Reply-To: <20130406060852.GH20323@brightrain.aerifal.cx>
User-Agent: Mutt/1.5.21 (2010-09-15)
Xref: news.gmane.org gmane.linux.lib.musl.general:3077
Archived-At:

On Sat, Apr 06, 2013 at 02:08:52AM -0400, Rich Felker wrote:
> On Sat, Apr 06, 2013 at 01:21:21AM -0400, Rich Felker wrote:
> > Hi all,
> >
> > I've been examining performance in the multibyte conversion functions
> > (as part of the POSIX locale controversy), and have some interesting
> > findings so far:
> > [...]
>
> And here's a diff of the proposed changes so far..
>
> Rich
> diff --git a/src/multibyte/mbrtowc.c b/src/multibyte/mbrtowc.c
> index cc49781..d552652 100644
> --- a/src/multibyte/mbrtowc.c
> +++ b/src/multibyte/mbrtowc.c
> @@ -18,6 +18,7 @@ size_t mbrtowc(wchar_t *restrict wc, const char *restrict src, size_t n, mbstate
>  	const unsigned char *s = (const void *)src;
>  	const unsigned N = n;
>
> +	if (!n) return -2;
>  	if (!st) st = (void *)&internal_state;
>  	c = *(unsigned *)st;
>
> @@ -27,9 +28,9 @@ size_t mbrtowc(wchar_t *restrict wc, const char *restrict src, size_t n, mbstate
>  		n = 1;
>  	} else if (!wc) wc = (void *)&wc;
>
> -	if (!n) return -2;

This change turned out to be wrong (it's an invalid transformation
when s is null) and I found a better improvement anyway, which I've
committed. The commit log message is actually rather interesting:

http://git.musl-libc.org/cgit/musl/commit/?id=a49e038bab7b3927b6a9c7d0c52f9e1a9cb82629

and I think this finding serves as a warning about writing 'clever'
code for special cases that "falls through" to the general code,
rather than just writing the special case code explicitly.

> +	/* This condition can only be true if *s<0x80 and c==0 */
> +	if (*s + c < 0x80) return !!(*wc = *s);
>  	if (!c) {
> -		if (*s < 0x80) return !!(*wc = *s);
>  		if (*s-SA > SB-SA) goto ilseq;
>  		c = bittab[*s++-SA]; n--;
>  	}

I omitted this for now too since the improvement seems difficult to
measure. In principle it should be better, so I may revisit this
later.

Rich