* towlower performance
@ 2017-05-25 18:01 maksis .
2017-05-27 0:39 ` Andre McCurdy
0 siblings, 1 reply; 5+ messages in thread
From: maksis . @ 2017-05-25 18:01 UTC (permalink / raw)
To: musl@lists.openwall.com <musl@lists.openwall.com>
[-- Attachment #1: Type: text/plain, Size: 860 bytes --]
Hi,
After switching my C++ app from a glibc-based build to a static build using musl, I've noticed significant differences in performance with certain loading operations, which seems to be related to using std::map with a case-insensitive UTF-8 sorting algorithm. For example, parsing a XML file with a flat listing of ~350k directory items and mapping them by name takes about 3 seconds with glibc vs 13 seconds with musl. After modifying the sort algorithm by removing all calls to towlower, the loading time with musl drops to 3.5 seconds.
I put together a C++ benchmark with the same sorting algorithm, which seems to produce similar results (GCC 6.3, 64 bit, -O3):
https://gist.github.com/maksis/92ad04f525d69043283350675d04f160
glibc: ~2 seconds (Arch Linux)
musl: ~7 seconds (Alpine Linux 3.6.0)
What might be causing the difference?
[-- Attachment #2: Type: text/html, Size: 2597 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: towlower performance
2017-05-25 18:01 towlower performance maksis .
@ 2017-05-27 0:39 ` Andre McCurdy
2017-05-27 0:59 ` Rich Felker
0 siblings, 1 reply; 5+ messages in thread
From: Andre McCurdy @ 2017-05-27 0:39 UTC (permalink / raw)
To: musl
On Thu, May 25, 2017 at 11:01 AM, maksis .
<maksis@adrenaline-network.com> wrote:
> Hi,
>
> After switching my C++ app from a glibc-based build to a static build using
> musl, I've noticed significant differences in performance with certain
> loading operations, which seems to be related to using std::map with a
> case-insensitive UTF-8 sorting algorithm. For example, parsing a XML file
> with a flat listing of ~350k directory items and mapping them by name takes
> about 3 seconds with glibc vs 13 seconds with musl. After modifying the sort
> algorithm by removing all calls to towlower, the loading time with musl
> drops to 3.5 seconds.
>
> I put together a C++ benchmark with the same sorting algorithm, which seems
> to produce similar results (GCC 6.3, 64 bit, -O3):
>
> https://gist.github.com/maksis/92ad04f525d69043283350675d04f160
>
> glibc: ~2 seconds (Arch Linux)
>
> musl: ~7 seconds (Alpine Linux 3.6.0)
>
> What might be causing the difference?
I'm not sure if it maps directly to your results, but when building a
gcc based musl toolchain, libstdc++ gets configured to use the
"generic" OS specific include dir, which seems to contain some less
than optimal ctype code. e.g. ctype_inline.h comes with the following
warning:
// The following definitions are portable, but insanely slow. If one
// cares at all about performance, then specialized ctype
// functionality should be added for the native os in question: see
// the config/os/bits/ctype_*.h files.
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/config/os/generic/ctype_inline.h
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: towlower performance
2017-05-27 0:39 ` Andre McCurdy
@ 2017-05-27 0:59 ` Rich Felker
2017-05-30 12:23 ` [PATCH] towupper/towlower: fast path for ascii chars Natanael Copa
0 siblings, 1 reply; 5+ messages in thread
From: Rich Felker @ 2017-05-27 0:59 UTC (permalink / raw)
To: musl
On Fri, May 26, 2017 at 05:39:31PM -0700, Andre McCurdy wrote:
> On Thu, May 25, 2017 at 11:01 AM, maksis .
> <maksis@adrenaline-network.com> wrote:
> > Hi,
> >
> > After switching my C++ app from a glibc-based build to a static build using
> > musl, I've noticed significant differences in performance with certain
> > loading operations, which seems to be related to using std::map with a
> > case-insensitive UTF-8 sorting algorithm. For example, parsing a XML file
> > with a flat listing of ~350k directory items and mapping them by name takes
> > about 3 seconds with glibc vs 13 seconds with musl. After modifying the sort
> > algorithm by removing all calls to towlower, the loading time with musl
> > drops to 3.5 seconds.
> >
> > I put together a C++ benchmark with the same sorting algorithm, which seems
> > to produce similar results (GCC 6.3, 64 bit, -O3):
> >
> > https://gist.github.com/maksis/92ad04f525d69043283350675d04f160
> >
> > glibc: ~2 seconds (Arch Linux)
> >
> > musl: ~7 seconds (Alpine Linux 3.6.0)
> >
> > What might be causing the difference?
>
> I'm not sure if it maps directly to your results, but when building a
> gcc based musl toolchain, libstdc++ gets configured to use the
> "generic" OS specific include dir, which seems to contain some less
> than optimal ctype code. e.g. ctype_inline.h comes with the following
> warning:
>
> // The following definitions are portable, but insanely slow. If one
> // cares at all about performance, then specialized ctype
> // functionality should be added for the native os in question: see
> // the config/os/bits/ctype_*.h files.
>
> https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/config/os/generic/ctype_inline.h
That's a bit of an overstatement. While it's possible for some
operations to be slightly to moderately faster with a
pokes-at-libc-internals implementation of c++ locale/ctype
functionality, the portable code is not "insanely" slow. For wchar
functionality I don't think it's even slower at all; as far as I know
glibc only provides function-bypass tables for the 8bit char is*
functions, and for the past couple decades they haven't even fully
bypassed function calls since a function call is used to obtain the
thread-local locale tables.
In any case the portable-interface-based implementation (generic) is
the correct thing to do. musl does not and won't provide backdoors to
bypass that.
Thankfully, though, I think the issue maksis reported is more a matter
of the towlower/towupper implementation being oriented towards size
(we have all the Unicode case mappings plus code to perform them in
about 1k!) rather than performance. A simple fix that would make this
case fast (maybe faster than glibc even) is short-circuiting the
common ascii case, but a more multilingual-friendly solution might be
using some sort of O(1) first-stage table to find the relevant parts
of the mapping tables instead of using linear searches and multiple
special-case branches up front. I suspect we can make the code very
fast while keeping the size under 3k and probably less.
Rich
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH] towupper/towlower: fast path for ascii chars
2017-05-27 0:59 ` Rich Felker
@ 2017-05-30 12:23 ` Natanael Copa
2017-05-31 3:04 ` A. Wilcox
0 siblings, 1 reply; 5+ messages in thread
From: Natanael Copa @ 2017-05-30 12:23 UTC (permalink / raw)
To: musl; +Cc: Natanael Copa
Make a fast path for ascii chars which is assumed to be the most common
case. This has significant performance benefit on xml json and similar
---
This gives a performance boost for the given testcase:
https://gist.github.com/maksis/92ad04f525d69043283350675d04f160
Before:
Completed in 8.302969s, compare count 54136421
After:
Completed in 2.745886s, compare count 54136421
I don't consider this the final solution but it is atleast a significant
improvement.
src/ctype/towctrans.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/ctype/towctrans.c b/src/ctype/towctrans.c
index 6af61875..cf13a862 100644
--- a/src/ctype/towctrans.c
+++ b/src/ctype/towctrans.c
@@ -1,3 +1,4 @@
+#include <ctype.h>
#include <wctype.h>
#include "libc.h"
@@ -9,7 +10,6 @@ static const struct {
signed char lower;
unsigned char len;
} casemaps[] = {
- CASEMAP('A','Z','a'),
CASEMAP(0xc0,0xde,0xe0),
CASELACE(0x0100,0x012e),
@@ -257,12 +257,12 @@ static wchar_t __towcase(wchar_t wc, int lower)
wint_t towupper(wint_t wc)
{
- return __towcase(wc, 0);
+ return (unsigned)wc < 128 ? toupper(wc) : __towcase(wc, 0);
}
wint_t towlower(wint_t wc)
{
- return __towcase(wc, 1);
+ return (unsigned)wc < 128 ? tolower(wc) : __towcase(wc, 1);
}
wint_t __towupper_l(wint_t c, locale_t l)
--
2.13.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] towupper/towlower: fast path for ascii chars
2017-05-30 12:23 ` [PATCH] towupper/towlower: fast path for ascii chars Natanael Copa
@ 2017-05-31 3:04 ` A. Wilcox
0 siblings, 0 replies; 5+ messages in thread
From: A. Wilcox @ 2017-05-31 3:04 UTC (permalink / raw)
To: musl
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
On 30/05/17 07:23, Natanael Copa wrote:
> Make a fast path for ascii chars which is assumed to be the most
> common case. This has significant performance benefit on xml json
> and similar
I'm not sure if that would be the common case when you are already
dealing with wide chars. I would assume you're using wide chars for a
reason...
> { - return __towcase(wc, 0); + return (unsigned)wc < 128 ?
> toupper(wc) : __towcase(wc, 0); }
>
> wint_t towlower(wint_t wc) { - return __towcase(wc, 1); + return
> (unsigned)wc < 128 ? tolower(wc) : __towcase(wc, 1); }
...so I would likely do something like:
return (unsigned)wc > 127 ? __towcase(wc, 1) : tolower(wc);
I think it comes down to silly style though. Functionally it is
likely similar / the same.
Would be curious to see if there is any difference on other loads (and
on other architectures).
- --arw
- --
A. Wilcox (awilfox)
Project Lead, Adélie Linux
http://adelielinux.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAEBCAAGBQJZLjKRAAoJEMspy1GSK50UfPMQAI9WHwH2TRHfRBmC9F9dRRqR
c0wTMC0IkaI9iD7xrdpG9lq6EhHelNbC94bIUFtHY5Eao4Vf6SJUv7b+o+OEHGs0
11PmDSvciHZx4vQb64rQJte8u6s3Z++POMpsLKULS4Gi63xLCtaCKB8yp4/eKezL
5cBqxCIXz+TlCpS/7TQxyYjwCA8C13Utd/2GmU4/9cCkdDOHJ5asg3sfFS4MtVPS
tSSunDgDgn0b3b0nLD0q4PiymRZ0cpGj2MKLWCR2WPdBkMitjegNm4wYcJE6naWH
ao4Fp26McviszvQvxlWzidpiDAq6kxU6jwHylV/VniseqcG5XBDDyG/MrzzMDlKr
lyNGQg6xXAcVvIujZBJYJYrgL1oDEpQ6c+JS4aD4gcM0WIn8YM9K5PQPKbcwrsqo
2sDKY51UTT3SDoghtj7ZkqgBMbQLmEE5lyom3B7EtcQazKj9pWtTmCy8kI+cMQcT
WPYurl67aEg54WW1oGChKza6P1WyuWTL27TL4NVOsVMyCqSa2ekF7rJTBP7rey9r
hNuFz14/tKyZTA1Mq3RWtN15ydsC+ToLbZikYqEmmBq36p2J1BbeWznfRRw6dpyt
G1sz70FtTBtODAK1xVngsUsYVPINgNhaQlQKs96NSiQzU0khY2or9kbYWPm0GyS8
fPerUQTvKecXmUYd2OVk
=pfgv
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2017-05-31 3:04 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-25 18:01 towlower performance maksis .
2017-05-27 0:39 ` Andre McCurdy
2017-05-27 0:59 ` Rich Felker
2017-05-30 12:23 ` [PATCH] towupper/towlower: fast path for ascii chars Natanael Copa
2017-05-31 3:04 ` A. Wilcox
Code repositories for project(s) associated with this public inbox
https://git.vuxu.org/mirror/musl/
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).