From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/6966
Path: news.gmane.org!not-for-mail
From: Denys Vlasenko
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: [PATCH] x86_64/memset: simple optimizations
Date: Tue, 10 Feb 2015 21:27:17 +0100
Message-ID:
References: <1423258814-9045-1-git-send-email-vda.linux@googlemail.com>
 <20150207003535.GS23507@brightrain.aerifal.cx>
 <20150207130655.GW23507@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary=001a11c1bf90c72a99050ec1b7a0
X-Trace: ger.gmane.org 1423600072 29582 80.91.229.3 (10 Feb 2015 20:27:52 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Tue, 10 Feb 2015 20:27:52 +0000 (UTC)
To: Rich Felker , musl@lists.openwall.com
Original-X-From: musl-return-6979-gllmg-musl=m.gmane.org@lists.openwall.com Tue Feb 10 21:27:52 2015
Return-path:
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
 by plane.gmane.org with smtp (Exim 4.69)
 (envelope-from )
 id 1YLHPT-0004cl-MT
 for gllmg-musl@m.gmane.org; Tue, 10 Feb 2015 21:27:51 +0100
Original-Received: (qmail 13921 invoked by uid 550); 10 Feb 2015 20:27:50 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
Original-Received: (qmail 13912 invoked from network); 10 Feb 2015 20:27:49 -0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=googlemail.com; s=20120113;
 h=mime-version:in-reply-to:references:from:date:message-id:subject:to
 :content-type;
 bh=tI8GowCT1H4B1raq1gPXLBYzteY9VQXf28AkRxToX3w=;
 b=DKcVNqRXcpshGlOm5R4ebbvbsS+8BrYCzXkzoFlQx6hE9puKIUTW3wmk8Q6JvYGJMx
 olAIVk7/Dq3dvLO7QZfArk7IPUTIt0oA2TZhnTcka9Gkz72DFU9vIcQUuo3ImMCOCq1s
 848yA2IMsmSF+8tGjHXYCT4Qc1Z9q/2LarP8C/p7w2FJdrbmyCzH+hLLiXJu/p8IzG6j
 eUE7mPD7CXtLt1WiUAgkhuO1NjUNDOBip3P/qBBsTenQGEI1LJL9bWhTRB6T7REpyDsp
 2WO8xsKWH8mMtWxuweDVSQIm0FUZtYbG6FxmxFutp1MUalqu7da4WbGFXAr/bZDzU+Qo
 rHPQ==
X-Received: by 10.224.28.70 with SMTP id l6mr25196394qac.77.1423600058305;
 Tue, 10 Feb 2015 12:27:38 -0800 (PST)
In-Reply-To: <20150207130655.GW23507@brightrain.aerifal.cx>
Xref: news.gmane.org gmane.linux.lib.musl.general:6966
Archived-At:

--001a11c1bf90c72a99050ec1b7a0
Content-Type: text/plain; charset=UTF-8

On Sat, Feb 7, 2015 at 2:06 PM, Rich Felker wrote:
> On Sat, Feb 07, 2015 at 01:49:43PM +0100, Denys Vlasenko wrote:
>> On Sat, Feb 7, 2015 at 1:35 AM, Rich Felker wrote:
>> What speedups?
>> In particular:
>> - perform pre-alignment if dst is unaligned
>
> For the rep stosq path? Does it help? I don't recall the details but I
> seem to remember both docs and measurements showing no reliable
> benefit from alignment for this instruction, and we had people trying
> things on several different cpu models. I'm open to hearing evidence
> to the contrary though.

size:20k buf:0x7f38656e2100
stos:25978 ns (times 32), 25.227500 bytes/ns
stos+1:31395 ns (times 32), 20.874662 bytes/ns
stos+4:31396 ns (times 32), 20.873997 bytes/ns
stos+8:24446 ns (times 32), 26.808476 bytes/ns

size:50k buf:0x7fbca1dc9100
stos:68149 ns (times 32), 24.041439 bytes/ns
stos+1:85762 ns (times 32), 19.104032 bytes/ns
stos+4:85762 ns (times 32), 19.104032 bytes/ns
stos+8:68204 ns (times 32), 24.022051 bytes/ns

size:1024k buf:0x7fa3036a5100
stos:1632285 ns (times 32), 20.556724 bytes/ns
stos+1:1891092 ns (times 32), 17.743416 bytes/ns
stos+4:1891089 ns (times 32), 17.743444 bytes/ns
stos+8:1632181 ns (times 32), 20.558034 bytes/ns

size:5000k buf:0x7fdf5cd6b100
stos:15592138 ns (times 32), 10.558298 bytes/ns
stos+1:15501841 ns (times 32), 10.619799 bytes/ns
stos+4:15507773 ns (times 32), 10.615737 bytes/ns
stos+8:15589617 ns (times 32), 10.560005 bytes/ns

The source is attached.

This data shows that (on my CPU, Sandy Bridge with 4MB L2)
8-byte alignment helps when stores fit into L1 or L2.
If memset is larger than L2, memory throughput is too low and there is no measurable difference. --001a11c1bf90c72a99050ec1b7a0 Content-Type: text/x-csrc; charset=US-ASCII; name="t.c" Content-Disposition: attachment; filename="t.c" Content-Transfer-Encoding: base64 X-Attachment-Id: f_i5zqi4640 I2RlZmluZSBfR05VX1NPVVJDRQojaW5jbHVkZSA8c3lzL3R5cGVzLmg+CiNpbmNsdWRlIDxzeXMv dGltZS5oPgojaW5jbHVkZSA8c3lzL3N5c2NhbGwuaD4KI2luY2x1ZGUgPHRpbWUuaD4KI2luY2x1 ZGUgPHN0ZGlvLmg+CiNpbmNsdWRlIDxzdGRsaWIuaD4KI2luY2x1ZGUgPHVuaXN0ZC5oPgojaW5j bHVkZSA8c3RyaW5nLmg+Ci8qIE9sZCBnbGliYyAoPCAyLjMuNCkgZG9lcyBub3QgcHJvdmlkZSB0 aGlzIGNvbnN0YW50LiBXZSB1c2Ugc3lzY2FsbAogKiBkaXJlY3RseSBzbyB0aGlzIGRlZmluaXRp b24gaXMgc2FmZS4gKi8KI2lmbmRlZiBDTE9DS19NT05PVE9OSUMKI2RlZmluZSBDTE9DS19NT05P VE9OSUMgMQojZW5kaWYKCi8qIGxpYmMgaGFzIGluY3JlZGlibHkgbWVzc3kgd2F5IG9mIGRvaW5n IHRoaXMsCiAqIHR5cGljYWxseSByZXF1aXJpbmcgLWxydC4gV2UganVzdCBza2lwIGFsbCB0aGlz IG1lc3MgKi8Kc3RhdGljIHZvaWQgZ2V0X21vbm8oc3RydWN0IHRpbWVzcGVjICp0cykKewogICAg ICAgIHN5c2NhbGwoX19OUl9jbG9ja19nZXR0aW1lLCBDTE9DS19NT05PVE9OSUMsIHRzKTsKfQoK dm9pZCBtZW1zZXRfcmVwX3N0b3NxKHZvaWQgKnB0ciwgdW5zaWduZWQgbG9uZyBjbnQpCnsKCXVu c2lnbmVkIGxvbmcgYXgsY3gsZGk7CgoJYXNtIHZvbGF0aWxlKAoJCSJyZXAgc3Rvc3EiCgk6ICI9 RCIgKGRpKSwgIj1jIiAoY3gpLCAiPWEiIChheCkKCTogIjAiIChwdHIpLCAiMSIgKGNudCksICIy IiAoMCkKCTogIm1lbW9yeSIKCSk7Cn0KCnZvaWQgbWVtc2V0X21vdm50aSh2b2lkICpwdHIsIHVu c2lnbmVkIGxvbmcgY250KQp7Cgl1bnNpZ25lZCBsb25nIGF4LGN4LGRpOwoKCWFzbSB2b2xhdGls ZSgKCQkiMTogbW92bnRpICUlcmF4LCglJXJkaSlcbiIKCQkiYWRkICQ4LCUlcmRpXG4iCgkJImRl YyAlJXJjeFxuIgoJCSJqbnogMWJcbiIKCQkic2ZlbmNlXG4iCgk6ICI9RCIgKGRpKSwgIj1jIiAo Y3gpLCAiPWEiIChheCkKCTogIjAiIChwdHIpLCAiMSIgKGNudCksICIyIiAoMCkKCTogIm1lbW9y eSIKCSk7Cn0KCnZvaWQgbWVtc2V0X21vdm50aV91bnJvbGwodm9pZCAqcHRyLCB1bnNpZ25lZCBs b25nIGNudCkKewoJdW5zaWduZWQgbG9uZyBheCxjeCxkaTsKCglhc20gdm9sYXRpbGUoCgkJIjE6 XG4iCgkJIm1vdm50aSAlJXJheCwoJSVyZGkpXG4iCgkJIm1vdm50aSAlJXJheCw4KCUlcmRpKVxu IgoJCSJtb3ZudGkgJSVyYXgsMTYoJSVyZGkpXG4iCgkJIm1vdm50aSAlJXJheCwyNCglJXJkaSlc 
biIKCQkiYWRkICQzMiwlJXJkaVxuIgoJCSJkZWMgJSVyY3hcbiIKCQkiam56IDFiXG4iCgkJInNm ZW5jZVxuIgoJOiAiPUQiIChkaSksICI9YyIgKGN4KSwgIj1hIiAoYXgpCgk6ICIwIiAocHRyKSwg IjEiIChjbnQvNCksICIyIiAoMCkKCTogIm1lbW9yeSIKCSk7Cn0KCnVuc2lnbmVkIGdldHQoKQp7 CiNpZiAwCglzdHJ1Y3QgdGltZXZhbCB0djsKCWdldHRpbWVvZmRheSgmdHYsIE5VTEwpOwoJcmV0 dXJuIHR2LnR2X3VzZWM7CiNlbHNlCglzdHJ1Y3QgdGltZXNwZWMgdHM7CglnZXRfbW9ubygmdHMp OwoJcmV0dXJuIHRzLnR2X25zZWM7CiNlbmRpZgp9Cgp1bnNpZ25lZCBkaWZmdCh1bnNpZ25lZCB0 MiwgdW5zaWduZWQgdDEpCnsKCXQyIC09IHQxOwoJaWYgKChpbnQpdDIgPCAwKQoJCXQyICs9IDEw MDAwMDAwMDA7CglyZXR1cm4gdDI7Cn0KCiNkZWZpbmUgQlVGICg1MCoxMDI0KQojZGVmaW5lIEJV RjggKEJVRi84KQoKdm9pZCBtZWFzdXJlKHZvaWQgKmJ1Ziwgdm9pZCAoKm0pKHZvaWQgKnB0ciwg dW5zaWduZWQgbG9uZyBjbnQpLCBjb25zdCBjaGFyICpuYW1lKQp7Cgl1bnNpZ25lZCB0MSwgdDIs IGNudDsKCglzbGVlcCgxKTsKCW0oYnVmLCBCVUY4KTsKCgl0MiA9IC0xVTsKCWNudCA9IDEwMDA7 Cgl3aGlsZSAoLS1jbnQpIHsKCQl0MSA9IGdldHQoKTsKI2RlZmluZSBSRVBFQVQgMzIKCQltKGJ1 ZiwgQlVGOCk7bShidWYsIEJVRjgpO20oYnVmLCBCVUY4KTttKGJ1ZiwgQlVGOCk7CgkJbShidWYs IEJVRjgpO20oYnVmLCBCVUY4KTttKGJ1ZiwgQlVGOCk7bShidWYsIEJVRjgpOwoJCW0oYnVmLCBC VUY4KTttKGJ1ZiwgQlVGOCk7bShidWYsIEJVRjgpO20oYnVmLCBCVUY4KTsKCQltKGJ1ZiwgQlVG OCk7bShidWYsIEJVRjgpO20oYnVmLCBCVUY4KTttKGJ1ZiwgQlVGOCk7CgkJbShidWYsIEJVRjgp O20oYnVmLCBCVUY4KTttKGJ1ZiwgQlVGOCk7bShidWYsIEJVRjgpOwoJCW0oYnVmLCBCVUY4KTtt KGJ1ZiwgQlVGOCk7bShidWYsIEJVRjgpO20oYnVmLCBCVUY4KTsKCQltKGJ1ZiwgQlVGOCk7bShi dWYsIEJVRjgpO20oYnVmLCBCVUY4KTttKGJ1ZiwgQlVGOCk7CgkJbShidWYsIEJVRjgpO20oYnVm LCBCVUY4KTttKGJ1ZiwgQlVGOCk7bShidWYsIEJVRjgpOwoJCXQxID0gZGlmZnQoZ2V0dCgpLCB0 MSk7CgkJaWYgKHQyID4gdDEpCgkJCXQyID0gdDE7Ci8vCQlwcmludGYoIiVzOiV1IG5zICV1XG4i LCBuYW1lLCB0MSwgdDIpOwoJfQoJcHJpbnRmKCIlczoldSBucyAodGltZXMgJWQpLCAlLjZmIGJ5 dGVzL25zXG4iLCBuYW1lLCB0MiwgUkVQRUFULCAoZG91YmxlKShCVUYpICogUkVQRUFUIC8gdDIp Owp9CgppbnQgbWFpbigpCnsKCWNoYXIgKmJ1ZiA9IG1hbGxvYyg4KkJVRiArIDQwOTYpOwoKCWJ1 ZiArPSAweDEwMDsKCWJ1ZiA9IChjaGFyKikoKGxvbmcpYnVmICYgfjB4ZmZMKTsKCglwcmludGYo 
InNpemU6JXVrIGJ1ZjolcFxuIiwgQlVGLzEwMjQsIGJ1Zik7Ci8vCW1lYXN1cmUoYnVmLCBtZW1z ZXRfbW92bnRpLCAibW92bnRpIik7Ci8vCW1lYXN1cmUoYnVmLCBtZW1zZXRfbW92bnRpX3Vucm9s bCwgIm1vdm50aV91bnJvbGwiKTsKCW1lYXN1cmUoYnVmLCBtZW1zZXRfcmVwX3N0b3NxLCAic3Rv cyIpOwovLwltZWFzdXJlKGJ1ZisxLCBtZW1zZXRfbW92bnRpLCAibW92bnRpKzEiKTsKLy8JbWVh c3VyZShidWYrMSwgbWVtc2V0X21vdm50aV91bnJvbGwsICJtb3ZudGlfdW5yb2xsKzEiKTsKCW1l YXN1cmUoYnVmKzEsIG1lbXNldF9yZXBfc3Rvc3EsICJzdG9zKzEiKTsKLy8JbWVhc3VyZShidWYr MywgbWVtc2V0X21vdm50aSwgIm1vdm50aSszIik7Ci8vCW1lYXN1cmUoYnVmKzMsIG1lbXNldF9t b3ZudGlfdW5yb2xsLCAibW92bnRpX3Vucm9sbCszIik7CgltZWFzdXJlKGJ1Zis0LCBtZW1zZXRf cmVwX3N0b3NxLCAic3Rvcys0Iik7CgltZWFzdXJlKGJ1Zis4LCBtZW1zZXRfcmVwX3N0b3NxLCAi c3Rvcys4Iik7CgoJcmV0dXJuIDA7Cn0K --001a11c1bf90c72a99050ec1b7a0--
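
For readers who want to see what "perform pre-alignment if dst is
unaligned" could look like in code, here is a minimal C sketch. It is
only an illustration of the idea the measurements above argue for, not
the patch under discussion and not musl's memset. The names
memset_prealign and fill_words are hypothetical, the byte-wise head and
tail loops are the simplest possible choice rather than a tuned one, and
the non-x86_64 fallback exists only so the sketch builds elsewhere.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store cnt 8-byte words of pattern starting at dst.
 * Hypothetical helper; dst is expected to be 8-byte aligned. */
static void fill_words(void *dst, uint64_t pattern, size_t cnt)
{
#if defined(__x86_64__) && defined(__GNUC__)
	/* Same instruction that memset_rep_stosq() in the attached t.c
	 * measures. Assumes the direction flag is clear, as the ABI
	 * guarantees on function entry. */
	__asm__ volatile("rep stosq"
		: "+D" (dst), "+c" (cnt)
		: "a" (pattern)
		: "memory");
#else
	/* Portable fallback: copy the pattern one word at a time. */
	unsigned char *p = dst;
	while (cnt--) {
		memcpy(p, &pattern, sizeof pattern);
		p += sizeof pattern;
	}
#endif
}

/* memset variant that aligns dst to 8 bytes before the bulk store.
 * Hypothetical name, for illustration only. */
void *memset_prealign(void *dst, int c, size_t n)
{
	unsigned char *p = dst;
	unsigned char b = (unsigned char)c;

	/* Bytes needed to reach the next 8-byte boundary (0..7). */
	size_t head = (size_t)(-(uintptr_t)p & 7);
	if (head > n)
		head = n;
	n -= head;
	while (head--)
		*p++ = b;

	/* Aligned bulk: replicate the byte into all 8 lanes of a word. */
	fill_words(p, b * 0x0101010101010101ULL, n / 8);
	p += n & ~(size_t)7;

	/* Remaining 0..7 tail bytes. */
	n &= 7;
	while (n--)
		*p++ = b;

	return dst;
}
```

Calling memset_prealign(buf + 1, 0, len) then issues rep stosq at an
8-byte-aligned address, which corresponds to the "stos" and "stos+8"
rows above rather than "stos+1". Whether the extra head/tail handling
pays off for small sizes is not something this sketch or the posted
numbers address.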