From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/7069
Path: news.gmane.org!not-for-mail
From: Denys Vlasenko
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: [PATCH] x86_64/memset: use "small block" code for blocks up to 30 bytes long
Date: Tue, 17 Feb 2015 18:30:37 +0100
Message-ID:
References: <1423845589-5920-1-git-send-email-vda.linux@googlemail.com> <20150214193533.GK23507@brightrain.aerifal.cx> <20150215040655.GM23507@brightrain.aerifal.cx> <20150215150313.GO23507@brightrain.aerifal.cx> <20150216173634.GA23507@brightrain.aerifal.cx> <20150217161222.GF23507@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary=001a1139b1c0ce7f19050f4c1036
X-Trace: ger.gmane.org 1424194286 3707 80.91.229.3 (17 Feb 2015 17:31:26 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Tue, 17 Feb 2015 17:31:26 +0000 (UTC)
Cc: musl
To: Rich Felker
Original-X-From: musl-return-7082-gllmg-musl=m.gmane.org@lists.openwall.com Tue Feb 17 18:31:22 2015
Return-path:
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200]) by plane.gmane.org with smtp (Exim 4.69) (envelope-from ) id 1YNlzS-0001Tb-LF for gllmg-musl@m.gmane.org; Tue, 17 Feb 2015 18:31:18 +0100
Original-Received: (qmail 5613 invoked by uid 550); 17 Feb 2015 17:31:16 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
Original-Received: (qmail 5470 invoked from network); 17 Feb 2015 17:31:09 -0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=d8kJvaOkRDYSn+fkIv37lp+qz7O4LUGdtBLVYJH4JyM=; b=O7tCz4d+cf8nr1w2SNRUTUc/OPxPILaoJVyAn/PjL2W5lUEYffYDXX9LhYRdAjXQPh
EfuHlJP9utDhvruE7GLi+eK5DnHupvlwAf4iiRAFnhT5qlsWdJZTKAzjYMKJVZbYfTUy 5H9MYxLzqRlryCrI0HoYDHwRcTBgWnxfxQgV+1Fgt9ps8FJ1yEy1xeukYHuYY+25z9s+ SpIZg0oIF54th7XCY/BVnwWKOywTAg9yelCiR7wSMqtr+k8c1rdrrkFSDr1gkqr27ufN WaU/7P+x2LpqtzToV9x/fRmYolCB0AcfURbW/V/Hlb7YQDO6cnRijuPaYBzhDXw6mhdl a9Dw==
X-Received: by 10.140.93.73 with SMTP id c67mr798702qge.53.1424194257378; Tue, 17 Feb 2015 09:30:57 -0800 (PST)
In-Reply-To:
Xref: news.gmane.org gmane.linux.lib.musl.general:7069
Archived-At:

--001a1139b1c0ce7f19050f4c1036
Content-Type: text/plain; charset=UTF-8

On Tue, Feb 17, 2015 at 5:51 PM, Denys Vlasenko wrote:
> On Tue, Feb 17, 2015 at 5:12 PM, Rich Felker wrote:
>> On Tue, Feb 17, 2015 at 02:08:52PM +0100, Denys Vlasenko wrote:
>>> >> Please see attached file.
>>> >
>>> > I tried it and it's ~1 cycle slower for at least sizes 16-30;
>>> > presumably we're seeing the cost of the extra compare/branch at these
>>> > sizes but not at others. What does your timing test show?
>>>
>>> See below.
>>> First column - result of my2.s
>>> Second column - result of vda1.s
>>>
>>> Basically, the "rep stosq" code path got a bit faster, while
>>> small memsets stayed the same.
>>
>> Can you post your test program for me to try out? Here's what I've
>> been using, attached.
>
> With your program I see similar results:

I changed your program to print floating-point results and to run many
more iterations, keeping the minimum. Otherwise (on my machine)
consecutive runs differ by +-2 cycles for most measurements.
With one million iterations, the discrepancy between runs is often zero,
and when it is not, it is one cycle or less.

Please see the attached files.
my2.OUT1 and my2.OUT2 are two runs of the my2.s code
(to judge how much noise is in the measurements).
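
For readers who cannot open the attachment, here is a minimal sketch of
the reduction described above: take the minimum and the mean over many
timed batches, then divide by the batch size to get fractional cycles
per call. It assumes each raw sample is the tick count for one batch of
"repeat" back-to-back memset calls. The names summarize and
struct timing_summary are hypothetical and are not claimed to match the
attached memset-cycles-vda.c.

```c
#include <assert.h>
#include <stddef.h>

/* Per-call timing summary, in the same unit as the raw samples (ticks). */
struct timing_summary {
    double min_per_call;
    double avg_per_call;
};

/*
 * Hypothetical helper, for illustration only.
 * samples[i] is the elapsed tick count for one batch of `repeat` calls.
 * The minimum filters out interrupts and cache noise; the mean shows
 * how much of that noise was present.
 */
static struct timing_summary summarize(const unsigned *samples,
                                       size_t nsamples, unsigned repeat)
{
    struct timing_summary s;
    unsigned tmin = (unsigned)-1;      /* largest unsigned value */
    unsigned long long total = 0;      /* wide, so the sum cannot wrap early */
    size_t i;

    assert(samples != NULL && nsamples > 0 && repeat > 0);

    for (i = 0; i < nsamples; i++) {
        if (samples[i] < tmin)
            tmin = samples[i];
        total += samples[i];
    }

    /* Dividing by the batch size is what yields non-integer results. */
    s.min_per_call = (double)tmin / repeat;
    s.avg_per_call = (double)total / ((double)repeat * (double)nsamples);
    return s;
}
```

Timing a batch rather than a single call amortizes the cost of reading
the counter, which matters when one call takes only about 8 cycles.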
--001a1139b1c0ce7f19050f4c1036 Content-Type: application/octet-stream; name="my2.OUT1" Content-Disposition: attachment; filename="my2.OUT1" Content-Transfer-Encoding: base64 X-Attachment-Id: f_i69k9wa92 c2l6ZSAyOiBtaW49Ny45NiwgYXZnPTguMTcKc2l6ZSA0OiBtaW49OC4wMSwgYXZnPTguMTkKc2l6 ZSA2OiBtaW49OC4wMywgYXZnPTguMjYKc2l6ZSA4OiBtaW49OC4xMiwgYXZnPTguMzEKc2l6ZSAx MDogbWluPTguMTMsIGF2Zz04LjQ1CnNpemUgMTI6IG1pbj04LjE5LCBhdmc9OC40MQpzaXplIDE0 OiBtaW49OC4yMSwgYXZnPTguNDgKc2l6ZSAxNjogbWluPTkuMDUsIGF2Zz05LjM1CnNpemUgMTg6 IG1pbj05LjA2LCBhdmc9OS40OQpzaXplIDIwOiBtaW49OS4xMiwgYXZnPTkuNDUKc2l6ZSAyMjog bWluPTkuMjAsIGF2Zz05LjQ4CnNpemUgMjQ6IG1pbj05LjIyLCBhdmc9OS41NApzaXplIDI2OiBt aW49OS4yNCwgYXZnPTkuNjUKc2l6ZSAyODogbWluPTkuMzQsIGF2Zz0xMC4yMQpzaXplIDMwOiBt aW49OS4yNywgYXZnPTkuNzEKc2l6ZSAzMjogbWluPTEwLjk0LCBhdmc9MTEuMTcKc2l6ZSAzNDog bWluPTEyLjQxLCBhdmc9MTIuNzUKc2l6ZSAzNjogbWluPTEyLjU2LCBhdmc9MTIuODcKc2l6ZSAz ODogbWluPTEyLjU4LCBhdmc9MTIuOTcKc2l6ZSA0MDogbWluPTExLjc1LCBhdmc9MTIuMzkKc2l6 ZSA0MjogbWluPTExLjg3LCBhdmc9MTIuMDcKc2l6ZSA0NDogbWluPTExLjczLCBhdmc9MTIuNzQK c2l6ZSA0NjogbWluPTExLjcxLCBhdmc9MTIuODkKc2l6ZSA0ODogbWluPTExLjcwLCBhdmc9MTIu NjkKc2l6ZSA1MDogbWluPTExLjcwLCBhdmc9MTIuOTIKc2l6ZSA1MjogbWluPTExLjg0LCBhdmc9 MTMuMDMKc2l6ZSA1NDogbWluPTExLjY3LCBhdmc9MTIuMjMKc2l6ZSA1NjogbWluPTExLjY1LCBh dmc9MTIuMzcKc2l6ZSA1ODogbWluPTExLjY1LCBhdmc9MTIuMTMKc2l6ZSA2MDogbWluPTExLjYy LCBhdmc9MTIuMDUKc2l6ZSA2MjogbWluPTExLjYyLCBhdmc9MTIuMTEKc2l6ZSA2NDogbWluPTE5 LjQwLCBhdmc9MTkuOTgKc2l6ZSA5NjogbWluPTE4LjAwLCBhdmc9MTguNTgKc2l6ZSAxMjg6IG1p bj0zMi4xNCwgYXZnPTM0LjQzCnNpemUgMTYwOiBtaW49MzUuNTAsIGF2Zz0zNy45MQpzaXplIDE5 MjogbWluPTM5LjAwLCBhdmc9NDEuNjIKc2l6ZSAyMjQ6IG1pbj00Mi4wMCwgYXZnPTQ2LjQyCnNp emUgMjU2OiBtaW49NDUuMDAsIGF2Zz01MC4zMQpzaXplIDI4ODogbWluPTQ4LjAwLCBhdmc9NTIu NjMKc2l6ZSAzMjA6IG1pbj01MS4wMCwgYXZnPTU1LjczCnNpemUgMzUyOiBtaW49NTcuMDAsIGF2 Zz02MS4xMQpzaXplIDM4NDogbWluPTYwLjAwLCBhdmc9NjQuMDYKc2l6ZSA0MTY6IG1pbj02My4w MCwgYXZnPTY3LjA3CnNpemUgNDQ4OiBtaW49NjYuMDAsIGF2Zz03MC4yMwpzaXplIDQ4MDogbWlu 
PTY5LjAwLCBhdmc9NzMuNjEKc2l6ZSA1MTI6IG1pbj03NS4wMCwgYXZnPTgzLjEzCnNpemUgMTAy NDogbWluPTEyNi4wMCwgYXZnPTEzNC40MgpzaXplIDIwNDg6IG1pbj0yMjguMDAsIGF2Zz0yMzcu MjQKc2l6ZSA0MDk2OiBtaW49NDMyLjAwLCBhdmc9NDUzLjAxCnNpemUgODE5MjogbWluPTgzNy4w MCwgYXZnPTg2MS4yMApzaXplIDE2Mzg0OiBtaW49MTY1MC4wMCwgYXZnPTE2OTUuMzkK --001a1139b1c0ce7f19050f4c1036 Content-Type: application/octet-stream; name="vda1.OUT1" Content-Disposition: attachment; filename="vda1.OUT1" Content-Transfer-Encoding: base64 X-Attachment-Id: f_i69k9yv93 c2l6ZSAyOiBtaW49Ny45NywgYXZnPTguMTUKc2l6ZSA0OiBtaW49OC4wMSwgYXZnPTguMjEKc2l6 ZSA2OiBtaW49OC4wMywgYXZnPTguMjcKc2l6ZSA4OiBtaW49OC4xMiwgYXZnPTguMjkKc2l6ZSAx MDogbWluPTguMTYsIGF2Zz04LjQwCnNpemUgMTI6IG1pbj04LjE5LCBhdmc9OC40OQpzaXplIDE0 OiBtaW49OC4yNSwgYXZnPTguNDYKc2l6ZSAxNjogbWluPTkuMTAsIGF2Zz05LjMzCnNpemUgMTg6 IG1pbj05LjExLCBhdmc9OS40OApzaXplIDIwOiBtaW49OS4xOSwgYXZnPTkuNTMKc2l6ZSAyMjog bWluPTkuMjAsIGF2Zz05LjQ5CnNpemUgMjQ6IG1pbj05LjIyLCBhdmc9OS41MQpzaXplIDI2OiBt aW49OS4yNCwgYXZnPTkuNjAKc2l6ZSAyODogbWluPTkuMzQsIGF2Zz0xMC4wNgpzaXplIDMwOiBt aW49OS4zNiwgYXZnPTkuOTUKc2l6ZSAzMjogbWluPTEwLjk0LCBhdmc9MTEuMzQKc2l6ZSAzNDog bWluPTEyLjUyLCBhdmc9MTIuODgKc2l6ZSAzNjogbWluPTEyLjU2LCBhdmc9MTIuOTgKc2l6ZSAz ODogbWluPTEyLjY5LCBhdmc9MTMuMDAKc2l6ZSA0MDogbWluPTExLjc1LCBhdmc9MTIuMTUKc2l6 ZSA0MjogbWluPTExLjg3LCBhdmc9MTIuNDMKc2l6ZSA0NDogbWluPTExLjczLCBhdmc9MTMuMTgK c2l6ZSA0NjogbWluPTExLjcxLCBhdmc9MTMuMTQKc2l6ZSA0ODogbWluPTExLjg1LCBhdmc9MTMu NDMKc2l6ZSA1MDogbWluPTExLjg1LCBhdmc9MTMuMjYKc2l6ZSA1MjogbWluPTExLjg0LCBhdmc9 MTMuMzQKc2l6ZSA1NDogbWluPTExLjY3LCBhdmc9MTIuMzgKc2l6ZSA1NjogbWluPTExLjY1LCBh dmc9MTIuMjQKc2l6ZSA1ODogbWluPTExLjY1LCBhdmc9MTIuNzIKc2l6ZSA2MDogbWluPTExLjYy LCBhdmc9MTIuMzYKc2l6ZSA2MjogbWluPTExLjYyLCBhdmc9MTIuMjUKc2l6ZSA2NDogbWluPTE5 LjQwLCBhdmc9MjAuNDUKc2l6ZSA5NjogbWluPTE4LjAwLCBhdmc9MTguNTUKc2l6ZSAxMjg6IG1p bj0zMS4yOSwgYXZnPTMzLjMzCnNpemUgMTYwOiBtaW49MzQuNTAsIGF2Zz0zNi44OQpzaXplIDE5 MjogbWluPTM3LjgwLCBhdmc9NDAuMjEKc2l6ZSAyMjQ6IG1pbj00Mi4wMCwgYXZnPTQzLjgxCnNp 
emUgMjU2OiBtaW49NDUuMDAsIGF2Zz00OC4wMwpzaXplIDI4ODogbWluPTQ4LjAwLCBhdmc9NTIu NjkKc2l6ZSAzMjA6IG1pbj01MS4wMCwgYXZnPTU1LjEwCnNpemUgMzUyOiBtaW49NTUuNTAsIGF2 Zz01OS4zMgpzaXplIDM4NDogbWluPTU4LjUwLCBhdmc9NjEuNzQKc2l6ZSA0MTY6IG1pbj02MS41 MCwgYXZnPTY1LjE3CnNpemUgNDQ4OiBtaW49NjQuNTAsIGF2Zz02OS4wOQpzaXplIDQ4MDogbWlu PTY3LjUwLCBhdmc9NzEuNjIKc2l6ZSA1MTI6IG1pbj03NS4wMCwgYXZnPTgwLjE0CnNpemUgMTAy NDogbWluPTEyNi4wMCwgYXZnPTEzMS4zMApzaXplIDIwNDg6IG1pbj0yMjguMDAsIGF2Zz0yMzUu NjkKc2l6ZSA0MDk2OiBtaW49NDMyLjAwLCBhdmc9NDQyLjQ4CnNpemUgODE5MjogbWluPTgzNy4w MCwgYXZnPTg2Mi4zNwpzaXplIDE2Mzg0OiBtaW49MTY1MC4wMCwgYXZnPTE2ODcuMDIK --001a1139b1c0ce7f19050f4c1036 Content-Type: text/x-csrc; charset=US-ASCII; name="memset-cycles-vda.c" Content-Disposition: attachment; filename="memset-cycles-vda.c" Content-Transfer-Encoding: base64 X-Attachment-Id: f_i69kap2k3 I2RlZmluZSBfWE9QRU5fU09VUkNFIDcwMAojaW5jbHVkZSA8c3RkaW8uaD4KI2luY2x1ZGUgPHRp bWUuaD4KI2luY2x1ZGUgPHN0ZGxpYi5oPgojaW5jbHVkZSA8c3RyaW5nLmg+CgpzdGF0aWMgaW5s aW5lIHVuc2lnbmVkIHJkdHNjKCkKewojaWYgZGVmaW5lZCBfX2kzODZfXyB8fCBkZWZpbmVkIF9f eDg2XzY0X18KCXVuc2lnbmVkIHg7CglfX2FzbV9fIF9fdm9sYXRpbGVfXyAoICJyZHRzYyIgOiAi PWEiKHgpIDogOiAicmR4IiApOwovLwlfX2FzbV9fIF9fdm9sYXRpbGVfXyAoICJjcHVpZCA7IHJk dHNjIiA6ICI9YSIoeCkKLy8JCTogOiAicmJ4IiwgInJjeCIsICJyZHgiICk7CglyZXR1cm4geDsK I2Vsc2UKCXN0cnVjdCB0aW1lc3BlYyB0czsKCWNsb2NrX2dldHRpbWUoQ0xPQ0tfUFJPQ0VTU19D UFVUSU1FX0lELCAmdHMpOwoJcmV0dXJuIHRzLnR2X25zZWM7CiNlbmRpZgp9CgpjaGFyIGJ1Zlsz Mjc2OCsxMDBdOwoKaW50IG1haW4oKQp7Cgl1bnNpZ25lZCBpLCB0LCB0bWluOwoJdW5zaWduZWQg bG9uZyBsb25nIHRtZWFuOwoJdW5zaWduZWQgbjsKCi8vIEkgbmVlZCBhIG1pbGxpb24gb2YgaXRl cmF0aW9ucyB0byBnZXQgYSBzdGFibGUgIm1pbiIgbWVhc3VyZW1lbnQKI2RlZmluZSBSRVAgKDEw MjQqNDA5NikKCglmb3IgKG49MjsgbjwzMjc2ODsgbis9KG48NjQgPyAyIDogbjw1MTIgPyAzMiA6 IG4pKSB7CgkJaW50IHJlcGVhdCA9ICgxMDI0IC8gKG58MSkpID8gOiAxOwoKCQltZW1zZXQoYnVm LCAwLCBuKTsKCQl0bWluID0gLTE7CgkJdG1lYW4gPSAwOwoJCWZvciAoaT0wOyBpIDwgUkVQOyBp KyspIHsKCQkJaW50IGogPSByZXBlYXQ7CgkJCV9fYXNtX18gX192b2xhdGlsZV9fICgiIiA6IDog 
OiAibWVtb3J5Iik7CgkJCXQgPSByZHRzYygpOwoJCQlkbyB7CgkJCQltZW1zZXQoYnVmLCAwLCBu KTsKCQkJCV9fYXNtX18gX192b2xhdGlsZV9fICgiIiA6IDogOiAibWVtb3J5Iik7CgkJCX0gd2hp bGUgKC0taiAhPSAwKTsKCQkJdCA9IHJkdHNjKCkgLSB0OwoJCQlfX2FzbV9fIF9fdm9sYXRpbGVf XyAoIiIgOiA6IDogIm1lbW9yeSIpOwoJCQlpZiAodCA8IHRtaW4pIHRtaW4gPSB0OwoJCQl0bWVh biArPSB0OwoJCX0KCQlwcmludGYoInNpemUgJXU6IG1pbj0lLjJmLCBhdmc9JS4yZlxuIiwKCQkJ biwKCQkJKGRvdWJsZSl0bWluIC8gcmVwZWF0LAoJCQkoZG91YmxlKXRtZWFuIC8gKHJlcGVhdCpS RVApCgkJKTsKCX0KCXJldHVybiAwOwp9Cg== --001a1139b1c0ce7f19050f4c1036 Content-Type: application/octet-stream; name="my2.OUT2" Content-Disposition: attachment; filename="my2.OUT2" Content-Transfer-Encoding: base64 X-Attachment-Id: f_i69ke7qz4 c2l6ZSAyOiBtaW49Ny45NiwgYXZnPTguMjAKc2l6ZSA0OiBtaW49OC4wMSwgYXZnPTguMjQKc2l6 ZSA2OiBtaW49OC4wMywgYXZnPTguMjkKc2l6ZSA4OiBtaW49OC4xMCwgYXZnPTguMzIKc2l6ZSAx MDogbWluPTguMTYsIGF2Zz04LjQ1CnNpemUgMTI6IG1pbj04LjE5LCBhdmc9OC41MApzaXplIDE0 OiBtaW49OC4yNSwgYXZnPTguNDkKc2l6ZSAxNjogbWluPTkuMTAsIGF2Zz05LjM1CnNpemUgMTg6 IG1pbj05LjE3LCBhdmc9OS41OApzaXplIDIwOiBtaW49OS4xMiwgYXZnPTkuNDgKc2l6ZSAyMjog bWluPTkuMjAsIGF2Zz05LjU0CnNpemUgMjQ6IG1pbj05LjIyLCBhdmc9OS42MQpzaXplIDI2OiBt aW49OS4yNCwgYXZnPTkuNjgKc2l6ZSAyODogbWluPTkuMzQsIGF2Zz0xMC4yNQpzaXplIDMwOiBt aW49OS4zNiwgYXZnPTkuODkKc2l6ZSAzMjogbWluPTEwLjk0LCBhdmc9MTEuMzMKc2l6ZSAzNDog bWluPTEyLjQxLCBhdmc9MTIuODcKc2l6ZSAzNjogbWluPTEyLjU2LCBhdmc9MTIuOTgKc2l6ZSAz ODogbWluPTEyLjU4LCBhdmc9MTIuOTgKc2l6ZSA0MDogbWluPTExLjc1LCBhdmc9MTIuMjIKc2l6 ZSA0MjogbWluPTExLjg3LCBhdmc9MTIuMzAKc2l6ZSA0NDogbWluPTExLjczLCBhdmc9MTIuNzEK c2l6ZSA0NjogbWluPTExLjcxLCBhdmc9MTIuODIKc2l6ZSA0ODogbWluPTExLjcwLCBhdmc9MTIu NjUKc2l6ZSA1MDogbWluPTExLjcwLCBhdmc9MTIuNjQKc2l6ZSA1MjogbWluPTExLjg0LCBhdmc9 MTUuNjEKc2l6ZSA1NDogbWluPTExLjY3LCBhdmc9MTIuNzEKc2l6ZSA1NjogbWluPTExLjY1LCBh dmc9MTIuMzMKc2l6ZSA1ODogbWluPTExLjY1LCBhdmc9MTIuNDMKc2l6ZSA2MDogbWluPTExLjYy LCBhdmc9MTIuMTAKc2l6ZSA2MjogbWluPTExLjYyLCBhdmc9MTIuMDUKc2l6ZSA2NDogbWluPTE5 
LjQwLCBhdmc9MjAuMTIKc2l6ZSA5NjogbWluPTE4LjAwLCBhdmc9MTguNDYKc2l6ZSAxMjg6IG1p bj0zMi4xNCwgYXZnPTM0LjY2CnNpemUgMTYwOiBtaW49MzYuMDAsIGF2Zz0zNy45NgpzaXplIDE5 MjogbWluPTM4LjQwLCBhdmc9NDEuODMKc2l6ZSAyMjQ6IG1pbj00Mi43NSwgYXZnPTQ1LjU4CnNp emUgMjU2OiBtaW49NDUuMDAsIGF2Zz01MC44OApzaXplIDI4ODogbWluPTQ5LjAwLCBhdmc9NTQu MDEKc2l6ZSAzMjA6IG1pbj01Mi4wMCwgYXZnPTU2LjEyCnNpemUgMzUyOiBtaW49NTcuMDAsIGF2 Zz02MC42OApzaXplIDM4NDogbWluPTYwLjAwLCBhdmc9NjMuOTAKc2l6ZSA0MTY6IG1pbj02My4w MCwgYXZnPTY3LjUxCnNpemUgNDQ4OiBtaW49NjYuMDAsIGF2Zz03MC4xNwpzaXplIDQ4MDogbWlu PTY5LjAwLCBhdmc9NzMuMzIKc2l6ZSA1MTI6IG1pbj03NS4wMCwgYXZnPTgyLjc4CnNpemUgMTAy NDogbWluPTEyNi4wMCwgYXZnPTEzNC4wMApzaXplIDIwNDg6IG1pbj0yMjguMDAsIGF2Zz0yMzcu ODYKc2l6ZSA0MDk2OiBtaW49NDMyLjAwLCBhdmc9NDQ4LjMzCnNpemUgODE5MjogbWluPTgzNy4w MCwgYXZnPTg2Mi41MgpzaXplIDE2Mzg0OiBtaW49MTY1MC4wMCwgYXZnPTE2OTguNzcK --001a1139b1c0ce7f19050f4c1036--