From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/7068
From: Denys Vlasenko
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: [PATCH] x86_64/memset: use "small block" code for blocks up to 30 bytes long
Date: Tue, 17 Feb 2015 17:51:11 +0100
References: <1423845589-5920-1-git-send-email-vda.linux@googlemail.com>
 <20150214193533.GK23507@brightrain.aerifal.cx>
 <20150215040655.GM23507@brightrain.aerifal.cx>
 <20150215150313.GO23507@brightrain.aerifal.cx>
 <20150216173634.GA23507@brightrain.aerifal.cx>
 <20150217161222.GF23507@brightrain.aerifal.cx>
In-Reply-To: <20150217161222.GF23507@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: Rich Felker
Cc: musl
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary=001a113ac228d0fd86050f4b83f6

--001a113ac228d0fd86050f4b83f6
Content-Type: text/plain; charset=UTF-8

On Tue, Feb 17, 2015 at 5:12 PM, Rich Felker wrote:
> On Tue, Feb 17, 2015 at 02:08:52PM +0100, Denys Vlasenko wrote:
>> >> Please see attached file.
>> >
>> > I tried it and it's ~1 cycle slower for at least sizes 16-30;
>> > presumably we're seeing the cost of the extra compare/branch at these
>> > sizes but not at others. What does your timing test show?
>>
>> See below.
>> First column - result of my2.s
>> Second column - result of vda1.s
>>
>> Basically, the "rep stosq" code path got a bit faster, while
>> small memsets stayed the same.
>
> Can you post your test program for me to try out? Here's what I've
> been using, attached.

With your program I see similar results:

...
size 50:    min=10, avg=10        min=10, avg=10
size 52:    min=10, avg=10        min=10, avg=10
size 54:    min=10, avg=11        min=10, avg=11
size 56:    min=10, avg=11        min=10, avg=11
size 58:    min=10, avg=11        min=10, avg=10
size 60:    min=10, avg=10        min=10, avg=12
size 62:    min=10, avg=10        min=10, avg=11
size 64:    min=18, avg=18        min=18, avg=22
size 96:    min=17, avg=17        min=18, avg=18
size 128:   min=31, avg=32        min=32, avg=32
size 160:   min=35, avg=37        min=33, avg=37
size 192:   min=40, avg=40        min=36, avg=37
size 224:   min=43, avg=43        min=40, avg=40
size 256:   min=44, avg=47        min=43, avg=43
size 288:   min=47, avg=48        min=46, avg=47
size 320:   min=50, avg=52        min=52, avg=52
size 352:   min=53, avg=54        min=52, avg=60
size 384:   min=56, avg=57        min=55, avg=57
size 416:   min=59, avg=60        min=62, avg=63
size 448:   min=63, avg=65        min=66, avg=66
size 480:   min=66, avg=71        min=69, avg=69
size 512:   min=73, avg=74        min=73, avg=76
size 1024:  min=127, avg=129      min=127, avg=129
size 2048:  min=221, avg=236      min=221, avg=236
size 4096:  min=425, avg=444      min=424, avg=450
size 8192:  min=831, avg=881      min=830, avg=883
size 16384: min=1644, avg=1717    min=1643, avg=1748

My test program is attached, I use:

gcc -O2 -Wall memset-cycles.c FOO.s

--001a113ac228d0fd86050f4b83f6
Content-Type: text/x-csrc; charset=US-ASCII; name="t.c"
Content-Disposition: attachment; filename="t.c"

#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/time.h>
#include <sys/syscall.h>
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
/* Old glibc (< 2.3.4) does not provide this constant. We use syscall
 * directly so this definition is safe. */
#ifndef CLOCK_MONOTONIC
#define CLOCK_MONOTONIC 1
#endif

#define BUF (2*1024)
#define FILL 0

/* libc has incredibly messy way of doing this,
 * typically requiring -lrt. We just skip all this mess */
static void get_mono(struct timespec *ts)
{
        syscall(__NR_clock_gettime, CLOCK_MONOTONIC, ts);
}

//void *musl_memset(void *s, int c, size_t n);

void *memset_rep_stosq(void *ptr, int c, size_t cnt)
{
	unsigned long ax,cx,di;

	asm volatile(
		"rep stosq"
	: "=D" (di), "=c" (cx), "=a" (ax)
	: "0" (ptr), "1" (cnt/8), "2" (0)
	: "memory"
	);
	return ptr;
}

void *memset_movnti(void *ptr, int c, size_t cnt)
{
	unsigned long ax,cx,di;

	asm volatile(
		"1: movnti %%rax,(%%rdi)\n"
		"add $8,%%rdi\n"
		"dec %%rcx\n"
		"jnz 1b\n"
		"sfence\n"
	: "=D" (di), "=c" (cx), "=a" (ax)
	: "0" (ptr), "1" (cnt/8), "2" (0)
	: "memory"
	);
	return ptr;
}

void *memset_movnti_unroll(void *ptr, int c, size_t cnt)
{
	unsigned long ax,cx,di;

	asm volatile(
		"1:\n"
		"movnti %%rax,(%%rdi)\n"
		"movnti %%rax,8(%%rdi)\n"
		"movnti %%rax,16(%%rdi)\n"
		"movnti %%rax,24(%%rdi)\n"
		"add $32,%%rdi\n"
		"dec %%rcx\n"
		"jnz 1b\n"
		"sfence\n"
	: "=D" (di), "=c" (cx), "=a" (ax)
	: "0" (ptr), "1" (cnt/(8*4)), "2" (0)
	: "memory"
	);
	return ptr;
}

unsigned gett()
{
#if 0
	struct timeval tv;
	gettimeofday(&tv, NULL);
	return tv.tv_usec;
#else
	struct timespec ts;
	get_mono(&ts);
	return ts.tv_nsec;
#endif
}

unsigned difft(unsigned t2, unsigned t1)
{
	t2 -= t1;
	if ((int)t2 < 0)
		t2 += 1000000000;
	return t2;
}

void measure(unsigned sz, void *buf, void* (*m)(void *ptr, int c, size_t cnt), const char *name)
{
	unsigned t1, t2, cnt;
	unsigned repeat = 1;

	/* For small sizes, call m() repeatedly before measuring time diff */
	repeat = ((256*1024) / (sz|1)) ? : 1;

//	sleep(1);
	m(buf, FILL, sz); /* warm up caches */
	m(buf, FILL, sz); /* warm up caches */

	t2 = -1U;
	cnt = 1000;
	while (--cnt) {
		unsigned rep = repeat;

		t1 = gett();
		do {
			m(buf, FILL, sz);
		} while (--rep);
		t1 = difft(gett(), t1);
		if (t2 > t1)
			t2 = t1;
//		printf("%s:%u ns %u\n", name, t1, t2);
	}
//	printf("%s:%u ns (times %d), %u bytes, %.2f bytes/ns\n", name, t2, repeat, sz, (double)(sz) * repeat / t2);
	printf("%u byte block: %.2f bytes/ns\n", sz, (double)(sz) * repeat / t2);
}

int main()
{
	int sz;
	char *buf = malloc(BUF + 4096);

	buf += 0x100;
	buf = (char*)((long)buf & ~0xffL);

	setlinebuf(stdout);
	printf("size:%u (%uk) buf:%p\n", BUF, BUF/1024, buf);

	sz = BUF;
	do {
		measure(sz, buf, memset, "musl");
//		measure(sz, buf+1, memset, "musL");
	} while (--sz >= 0);
//	measure(buf, memset_movnti, "movnti");
//	measure(buf, memset_movnti_unroll, "movnti_unroll");
//	measure(buf, memset_rep_stosq, "stos");
//	measure(buf+1, memset_movnti, "movnti+1");
//	measure(buf+1, memset_movnti_unroll, "movnti_unroll+1");
//	measure(buf+1, memset_rep_stosq, "stos+1");
//	measure(buf+3, memset_movnti, "movnti+3");
//	measure(buf+3, memset_movnti_unroll, "movnti_unroll+3");
//	measure(buf+4, memset_rep_stosq, "stos+4");
//	measure(buf+8, memset_rep_stosq, "stos+8");

	return 0;
}
--001a113ac228d0fd86050f4b83f6--
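The subject line says the patch sends blocks of up to 30 bytes through "small block" code, but that assembly is not shown in this message. As a rough illustration only, here is a minimal C sketch of one common way to write such a path. It assumes unaligned, possibly overlapping stores from both ends of the buffer. It is not musl's x86_64 memset, and `small_memset`, `store16`, `store32` and `store64` are hypothetical names. The fill byte is replicated into a 64-bit word by multiplying by 0x0101010101010101. The `n >= 16` / `n >= 8` / `n >= 4` comparisons that pick a size class are one example of the kind of compare/branch overhead Rich mentions for sizes 16-30.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: unaligned stores done via memcpy, which optimizing
 * compilers usually turn into a single store instruction. */
static void store16(unsigned char *p, uint16_t v) { memcpy(p, &v, sizeof v); }
static void store32(unsigned char *p, uint32_t v) { memcpy(p, &v, sizeof v); }
static void store64(unsigned char *p, uint64_t v) { memcpy(p, &v, sizeof v); }

/* small_memset (hypothetical): sketch of a "small block" fill for n <= 32.
 * Each size class writes one run from the start and one run ending at the
 * last byte; the runs may overlap, which is harmless because every byte
 * receives the same value. */
void *small_memset(void *dest, int c, size_t n)
{
	unsigned char *s = dest;
	/* Replicate the fill byte into all 8 bytes of a 64-bit word. */
	uint64_t v = (uint64_t)(unsigned char)c * 0x0101010101010101ULL;

	assert(n <= 32);
	if (n >= 16) {		/* 16..32: 16 bytes from each end */
		store64(s, v);
		store64(s + 8, v);
		store64(s + n - 16, v);
		store64(s + n - 8, v);
	} else if (n >= 8) {	/* 8..15 */
		store64(s, v);
		store64(s + n - 8, v);
	} else if (n >= 4) {	/* 4..7 */
		store32(s, (uint32_t)v);
		store32(s + n - 4, (uint32_t)v);
	} else if (n >= 2) {	/* 2..3 */
		store16(s, (uint16_t)v);
		store16(s + n - 2, (uint16_t)v);
	} else if (n == 1) {
		s[0] = (unsigned char)c;
	}
	return dest;
}
```

For the 30-byte case in the subject line, this sketch takes the 16..32 branch and issues four 8-byte stores at offsets 0, 8, 14 and 22, so bytes 14 and 15 are written twice.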