From: John Mudd
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: musl perf, 20% slower than native build?
Date: Wed, 8 Apr 2015 15:10:51 -0400
In-Reply-To: <20150408160507.GB31681@port70.net>

Here's a fresh native compile of the same version of Python, built with the same gcc. Now the musl build is only slightly slower, maybe 5%. BTW, I'm not complaining; I use musl for portability, not speed.

$ python
Python 2.7.9 (default, Apr  8 2015, 14:29:14)
[GCC 4.8.2] on linux2
>>>

$ perf stat ~/multicorn_ctree/spitfire_bigtable.py
StringIO                                 523.62 ms
cStringIO                                144.32 ms
list concat                               55.12 ms

 Performance counter stats for '/home/mudd/multicorn_ctree/spitfire_bigtable.py':

        769.874633      task-clock (msec)         #    0.977 CPUs utilized
               269      context-switches          #    0.349 K/sec
                 6      cpu-migrations            #    0.008 K/sec
             5,997      page-faults               #    0.008 M/sec
     2,043,153,669      cycles                    #    2.654 GHz                     [50.74%]
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
     2,993,940,382      instructions              #    1.47  insns per cycle         [75.11%]
       673,064,696      branches                  #  874.252 M/sec                   [74.59%]
        15,486,299      branch-misses             #    2.30% of all branches         [74.71%]

       0.787704322 seconds time elapsed
$
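For reference, the "#" column in the perf stat output is derived from the raw counts: per-second rates are divided by task-clock, "CPUs utilized" is task-clock over elapsed time, and IPC is instructions over cycles. A small Python sketch (the variable names are illustrative; the numbers are copied from the run above) reproduces those figures:

```python
# Raw counts copied from the perf stat output above.
task_clock_ms = 769.874633
elapsed_s = 0.787704322
context_switches = 269
cpu_migrations = 6
page_faults = 5997
cycles = 2_043_153_669
instructions = 2_993_940_382
branches = 673_064_696
branch_misses = 15_486_299

# perf reports task-clock in milliseconds; convert to seconds for rates.
task_clock_s = task_clock_ms / 1000.0

# Each derived value is a simple ratio of the raw counts.
derived = {
    "CPUs utilized": task_clock_s / elapsed_s,
    "context-switches K/sec": context_switches / task_clock_s / 1e3,
    "cpu-migrations K/sec": cpu_migrations / task_clock_s / 1e3,
    "page-faults M/sec": page_faults / task_clock_s / 1e6,
    "GHz": cycles / task_clock_s / 1e9,
    "insns per cycle": instructions / cycles,
    "branches M/sec": branches / task_clock_s / 1e6,
    "branch-miss %": 100.0 * branch_misses / branches,
}

for name, value in derived.items():
    print(f"{name:>24}: {value:.3f}")
```

The bracketed percentages (e.g. [50.74%]) mean those hardware counters were multiplexed and only running for that fraction of the run, so their counts are scaled estimates rather than exact totals.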


Here's the libc portion of the perf record/report output for each build. This looks consistent with the roughly 5% longer run time.

native:
     2.20%  python  libc-2.19.so  [.] __memcpy_ssse3
     0.85%  python  libc-2.19.so  [.] __x86.get_pc_thunk.bx
     0.72%  python  libc-2.19.so  [.] _int_malloc
     0.56%  python  libc-2.19.so  [.] __memset_sse2
     0.47%  python  libc-2.19.so  [.] _int_free
     0.38%  python  libc-2.19.so  [.] malloc
     0.25%  python  libc-2.19.so  [.] realloc
     0.25%  python  libc-2.19.so  [.] __ctype_b_loc
     0.10%  python  libc-2.19.so  [.] free
     0.04%  python  libc-2.19.so  [.] __strchr_sse2_bsf
     0.03%  python  libc-2.19.so  [.] __memcpy_ia32
     0.03%  python  libc-2.19.so  [.] __sbrk
     0.03%  python  libc-2.19.so  [.] vfprintf
     0.03%  python  libc-2.19.so  [.] mremap_chunk
     0.03%  python  libc-2.19.so  [.] __strncpy_ssse3
     0.03%  python  libc-2.19.so  [.] __strlen_sse2_bsf
     0.03%  python  libc-2.19.so  [.] __x86.get_pc_thunk.cx

musl:
     4.74%  python  libc.so  [.] memcpy
     2.05%  python  libc.so  [.] free
     1.17%  python  libc.so  [.] malloc
     1.05%  python  libc.so  [.] unbin
     0.90%  python  libc.so  [.] a_and_64
     0.81%  python  libc.so  [.] a_or_64
     0.68%  python  libc.so  [.] memset
     0.31%  python  libc.so  [.] bin_index_up
     0.22%  python  libc.so  [.] bin_index
     0.22%  python  libc.so  [.] a_ctz_64
     0.16%  python  libc.so  [.] realloc
     0.16%  python  libc.so  [.] __x86.get_pc_thunk.bx
     0.14%  python  libc.so  [.] strlen
     0.12%  python  libc.so  [.] trim
     0.12%  python  libc.so  [.] strcmp
     0.09%  python  libc.so  [.] adjust_size
     0.06%  python  libc.so  [.] __strerror_l
     0.06%  python  libc.so  [.] __stpncpy
     0.03%  python  libc.so  [.] first_set
     0.03%  python  libc.so  [.] .L80
     0.03%  python  libc.so  [.] remap_rel
     0.03%  python  libc.so  [.] find_sym
     0.03%  python  libc.so  [.] sysv_hash
     0.03%  python  libc.so  [.] fclose
     0.03%  python  libc.so  [.] do_relocs
     0.03%  python  libc.so  [.] __aio_close
     0.03%  python  libc.so  [.] sysv_lookup
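The "consistent with 5%" reading can be made explicit by summing the libc rows of each profile. A rough Python sketch, assuming (as a simplification, not something measured) that all non-libc work takes the same absolute time in both builds, so only the libc portion grows:

```python
# Self-time percentages of the libc symbols, copied from the two
# perf report listings above (glibc 2.19 native vs musl).
native_libc = [2.20, 0.85, 0.72, 0.56, 0.47, 0.38, 0.25, 0.25, 0.10, 0.04] + [0.03] * 7
musl_libc = [4.74, 2.05, 1.17, 1.05, 0.90, 0.81, 0.68, 0.31, 0.22, 0.22,
             0.16, 0.16, 0.14, 0.12, 0.12, 0.09, 0.06, 0.06] + [0.03] * 9

# Fraction of all samples that landed in libc for each build.
native_share = sum(native_libc) / 100.0
musl_share = sum(musl_libc) / 100.0

# Assumption: the non-libc time is identical in both runs. If it is a
# fraction (1 - share) of each run, the run-time ratio follows directly.
non_libc = 1.0 - native_share
estimated_ratio = non_libc / (1.0 - musl_share)

print(f"native libc share: {native_share:.2%}")
print(f"musl libc share:   {musl_share:.2%}")
print(f"estimated musl/native run time: {estimated_ratio:.3f}")
```

With these numbers libc accounts for about 6.0% of samples natively and about 13.3% under musl, and the crude model puts the musl run roughly 8% longer: the same single-digit ballpark as the measured difference, given sampling noise and the simplifying assumption.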




On Wed, Apr 8, 2015 at 12:05 PM, Szabolcs Nagy <nsz@port70.net> wrote:
> 20% is tiny measurement noise compared to the huge
> variance in the environments you are comparing

