From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/11353
Path: news.gmane.org!.POSTED!not-for-mail
From: Rich Felker <dalias@libc.org>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: towlower performance
Date: Fri, 26 May 2017 20:59:50 -0400
Message-ID: <20170527005950.GA1627@brightrain.aerifal.cx>
References: <VI1P192MB011211DE6A9BDC8F4AF61CB6FBFF0@VI1P192MB0112.EURP192.PROD.OUTLOOK.COM>
 <CAJ86T=XRrvspiJb4whak3qHpWu44FFD8gkrjHB7mghr-Le6epA@mail.gmail.com>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: blaine.gmane.org 1495846805 12821 195.159.176.226 (27 May 2017 01:00:05 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Sat, 27 May 2017 01:00:05 +0000 (UTC)
User-Agent: Mutt/1.5.21 (2010-09-15)
To: musl@lists.openwall.com
Original-X-From: musl-return-11368-gllmg-musl=m.gmane.org@lists.openwall.com Sat May 27 02:59:59 2017
Return-path: <musl-return-11368-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by blaine.gmane.org with smtp (Exim 4.84_2)
	(envelope-from <musl-return-11368-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1dEQ5H-0003E3-Js
	for gllmg-musl@m.gmane.org; Sat, 27 May 2017 02:59:59 +0200
Original-Received: (qmail 5903 invoked by uid 550); 27 May 2017 01:00:03 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
List-ID: <musl.lists.openwall.com>
Original-Received: (qmail 5879 invoked from network); 27 May 2017 01:00:03 -0000
Content-Disposition: inline
In-Reply-To: <CAJ86T=XRrvspiJb4whak3qHpWu44FFD8gkrjHB7mghr-Le6epA@mail.gmail.com>
Original-Sender: Rich Felker <dalias@aerifal.cx>
Xref: news.gmane.org gmane.linux.lib.musl.general:11353
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/11353>

On Fri, May 26, 2017 at 05:39:31PM -0700, Andre McCurdy wrote:
> On Thu, May 25, 2017 at 11:01 AM, maksis .
> <maksis@adrenaline-network.com> wrote:
> > Hi,
> >
> > After switching my C++ app from a glibc-based build to a static build using
> > musl, I've noticed significant differences in performance with certain
> > loading operations, which seems to be related to using std::map with a
> > case-insensitive UTF-8 sorting algorithm. For example, parsing a XML file
> > with a flat listing of ~350k directory items and mapping them by name takes
> > about 3 seconds with glibc vs 13 seconds with musl. After modifying the sort
> > algorithm by removing all calls to towlower, the loading time with musl
> > drops to 3.5 seconds.
> >
> > I put together a C++ benchmark with the same sorting algorithm, which seems
> > to produce similar results (GCC 6.3, 64 bit, -O3):
> >
> > https://gist.github.com/maksis/92ad04f525d69043283350675d04f160
> >
> > glibc: ~2 seconds (Arch Linux)
> >
> > musl: ~7 seconds (Alpine Linux 3.6.0)
> >
> > What might be causing the difference?
> 
> I'm not sure if it maps directly to your results, but when building a
> gcc based musl toolchain, libstdc++ gets configured to use the
> "generic" OS specific include dir, which seems to contain some less
> than optimal ctype code. e.g. ctype_inline.h comes with the following
> warning:
> 
>   // The following definitions are portable, but insanely slow. If one
>   // cares at all about performance, then specialized ctype
>   // functionality should be added for the native os in question: see
>   // the config/os/bits/ctype_*.h files.
> 
>   https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/config/os/generic/ctype_inline.h

That's a bit of an overstatement. While it's possible for some
operations to be slightly to moderately faster with a
pokes-at-libc-internals implementation of c++ locale/ctype
functionality, the portable code is not "insanely" slow. For wchar
functionality I don't think it's even slower at all; as far as I know
glibc only provides function-bypass tables for the 8bit char is*
functions, and for the past couple decades they haven't even fully
bypassed function calls since a function call is used to obtain the
thread-local locale tables.

In any case the portable-interface-based implementation (generic) is
the correct thing to do. musl does not and won't provide backdoors to
bypass that.

Thankfully, though, I think the issue maksis reported is more a matter
of the towlower/towupper implementation being oriented towards size
(we have all the Unicode case mappings plus code to perform them in
about 1k!) rather than performance. A simple fix that would make this
case fast (maybe faster than glibc even) is short-circuiting the
common ascii case, but a more multilingual-friendly solution might be
using some sort of O(1) first-stage table to find the relevant parts
of the mapping tables instead of using linear searches and multiple
special-case branches up front. I suspect we can make the code very
fast while keeping the size under 3k and probably less.

Rich