From mboxrd@z Thu Jan  1 00:00:00 1970
From: John Mudd
Newsgroups: gmane.linux.lib.musl.general
Subject: musl perf, 20% slower than native build?
Date: Wed, 8 Apr 2015 11:33:16 -0400
Reply-To: musl@lists.openwall.com
To: musl
Cc: John Mudd
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=001a1140f5d23044f20513384119
Xref: news.gmane.org gmane.linux.lib.musl.general:7356

--001a1140f5d23044f20513384119
Content-Type: text/plain; charset=ISO-8859-1

On March 13 I raised a concern about performance, Subject "musl 14x
slower?". It now looks like most of that issue had to do with my
application code. But here's a more focused look at how musl compares to
a native build. Granted, this is crude and limited, but it looks like
musl is 20% slower.

I built Python both native and with musl. It's not apples to apples: I
actually used a newer version of gcc to build musl 1.1.8 and the musl
Python. This may or may not be an advantage for the musl version. I also
used -O3 when building musl and the musl Python. I assume this should
help the musl version. The musl version also uses a newer version of
Python. That may or may not be helpful. I can redo this with more
consistent builds if it is worthwhile. Maybe you can suggest a straight
C benchmark instead.

I ran a standard Python benchmark. The code is available here:
http://goo.gl/UyLDYC
I picked this at random. It is not all-encompassing.

I've also never used "perf" before. Feel free to advise if this needs
improvement.


musl version:

$ python
Python 2.7.9 (default, Apr  2 2015, 15:16:16)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
$ perf stat python spitfire_bigtable.py
StringIO                                   507.98 ms
cStringIO                                  189.18 ms
list concat                                 61.59 ms

 Performance counter stats for '/home/mudd/multicorn_ctree/spitfire_bigtable.py':

        810.537826 task-clock (msec)         #    0.971 CPUs utilized
               297 context-switches          #    0.366 K/sec
                11 cpu-migrations            #    0.014 K/sec
             5,977 page-faults               #    0.007 M/sec
     2,151,830,012 cycles                    #    2.655 GHz                     [50.64%]
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
     3,106,074,350 instructions              #    1.44  insns per cycle         [74.86%]
       677,389,217 branches                  #  835.728 M/sec                   [74.56%]
        13,710,101 branch-misses             #    2.02% of all branches         [75.09%]

       0.834844640 seconds time elapsed
$


native:

$ python
Python 2.7.5 (default, Aug 19 2013, 15:23:53)
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
$ perf stat ~/multicorn_ctree/spitfire_bigtable.py
StringIO                                   402.63 ms
cStringIO                                  132.62 ms
list concat                                 46.89 ms

 Performance counter stats for '/home/mudd/multicorn_ctree/spitfire_bigtable.py':

        626.547364 task-clock (msec)         #    0.982 CPUs utilized
               169 context-switches          #    0.270 K/sec
                19 cpu-migrations            #    0.030 K/sec
             5,773 page-faults               #    0.009 M/sec
     1,663,247,805 cycles                    #    2.655 GHz                     [49.94%]
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
     2,573,617,826 instructions              #    1.55  insns per cycle         [75.03%]
       554,357,437 branches                  #  884.781 M/sec                   [75.49%]
        10,851,258 branch-misses             #    1.96% of all branches         [74.83%]

       0.638252827 seconds time elapsed
$

--001a1140f5d23044f20513384119
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
--001a1140f5d23044f20513384119--
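
For anyone who wants to put the two runs side by side, here is a rough
sketch that divides each musl figure by the matching native figure. The
numbers are copied from the perf stat output in this message; the dict
layout, the key names and the helper name slowdown() are only
illustrative assumptions, and a single run per build is of course a thin
basis for any conclusion. A ratio above 1.0 means the musl run was
slower, or needed more of that counter, than the native run.

```python
# Figures copied from the two `perf stat` runs quoted in this message.
# The dict layout, key names and helper name are illustrative assumptions.
MUSL = {
    "StringIO ms": 507.98,
    "cStringIO ms": 189.18,
    "list concat ms": 61.59,
    "task-clock ms": 810.537826,
    "cycles": 2151830012,
    "instructions": 3106074350,
    "elapsed s": 0.834844640,
}

NATIVE = {
    "StringIO ms": 402.63,
    "cStringIO ms": 132.62,
    "list concat ms": 46.89,
    "task-clock ms": 626.547364,
    "cycles": 1663247805,
    "instructions": 2573617826,
    "elapsed s": 0.638252827,
}


def slowdown(musl, native):
    """Return musl/native for every metric present in both runs."""
    return {k: float(musl[k]) / native[k] for k in musl if k in native}


ratios = slowdown(MUSL, NATIVE)

if __name__ == "__main__":
    for name in sorted(ratios):
        r = ratios[name]
        # Show the ratio and the same value as a percentage difference.
        print("%-16s musl/native = %.3f (%+.1f%%)" % (name, r, (r - 1) * 100))
```

Comparing the instruction-count ratio with the elapsed-time ratio is one
way to tell apart "the musl build executes more instructions" from "it
executes them less efficiently", though the different gcc and Python
versions noted above affect both.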