From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/7014
Path: news.gmane.org!not-for-mail
From: Denys Vlasenko
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: [PATCH 1/2] x86_64/memset: avoid multiply insn if possible
Date: Thu, 12 Feb 2015 21:36:26 +0100
Message-ID:
References: <1423761423-30050-1-git-send-email-vda.linux@googlemail.com> <20150212172735.GX23507@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary=001a11c12986286cda050eea1485
X-Trace: ger.gmane.org 1423773434 15347 80.91.229.3 (12 Feb 2015 20:37:14 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Thu, 12 Feb 2015 20:37:14 +0000 (UTC)
To: musl
Original-X-From: musl-return-7027-gllmg-musl=m.gmane.org@lists.openwall.com Thu Feb 12 21:37:08 2015
Return-path:
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200]) by plane.gmane.org with smtp (Exim 4.69) (envelope-from ) id 1YM0VX-0005MJ-CI for gllmg-musl@m.gmane.org; Thu, 12 Feb 2015 21:37:07 +0100
Original-Received: (qmail 19991 invoked by uid 550); 12 Feb 2015 20:37:05 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
Original-Received: (qmail 19907 invoked from network); 12 Feb 2015 20:36:59 -0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type; bh=xnxMWv8U2RqjwbbgSJgW0+aqrzVGse4lJg8KfPoWA88=; b=RLKKwYk1hlVPJx3QGJiLynX7uMC8oBpueRxcIn07ta/kHTahDu/xRg9eF4u9GwVCVu YACP/33snzCZbxCZo4kgEMid7rzvhi+oBOHtSKiPqTknZW/vuGVJwWL8Pvqood1qWqZ+ i4miviF6U9+yLCVMgmayxiwSTWGq7hWEKQvyQFfY7g+6MPdiuxqxygOYvuK9x0sPFjwF PVFMR2lZkMmjq/m5f0hXLXV2UCIcNdvwueRcFm4kT36z1ym8T+1YVxBRwvMLLx1/z/qn gPL8iimKw6FAt49Af5B9CUoKKPC68/pu4pJN6hlyWZxjPh4QkhwbJhzWpJQxcGG6kOSz
4iCA==
X-Received: by 10.140.38.197 with SMTP id t63mr5517948qgt.61.1423773406859; Thu, 12 Feb 2015 12:36:46 -0800 (PST)
In-Reply-To:
Xref: news.gmane.org gmane.linux.lib.musl.general:7014
Archived-At:

--001a11c12986286cda050eea1485
Content-Type: text/plain; charset=UTF-8

On Thu, Feb 12, 2015 at 8:26 PM, Denys Vlasenko wrote:
>> I'd actually like to extend the "short" range up to at least 32 bytes
>> using two 8-byte writes for the middle, unless the savings from using
>> 32-bit imul instead of 64-bit are sufficient to justify 4 4-byte
>> writes for the middle. On the cpu I tested on, the difference is 11
>> cycles vs 32 cycles for non-rep path versus rep path at size 32.
>
> The short path causes mixed feelings in me.
>
> On one hand, it's elegant in a contrived way.
>
> On the other hand, multiple
> overlaying stores must be causing hell in store unit.
>

I'm thinking, maybe there's a faster way to do that. For example, as in
the attached implementation. This one will not perform eight stores to
memory to fill a 15-byte area... only two.
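
To illustrate the two-store idea in C: the sketch below loosely follows
the short path of the attached memset.s, but it is only a paraphrase
under stated assumptions, not the attached code itself. The name
small_fill is hypothetical, the caller is assumed to guarantee n <= 16,
and memcpy stands in for the unaligned mov stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of musl): fill n <= 16 bytes at dest
 * with the byte c, using at most two stores per call. */
static void *small_fill(void *dest, int c, size_t n)
{
    unsigned char *d = dest;
    /* Replicate the byte into all eight byte lanes of a 64-bit word. */
    uint64_t v = (uint64_t)(unsigned char)c * 0x0101010101010101ULL;

    assert(n <= 16); /* assumption: larger sizes take another path */

    if (n >= 8) {
        /* 8..16 bytes: one 8-byte store at the start and one ending at
         * the last byte; the two overlap whenever n < 16. */
        memcpy(d, &v, 8);
        memcpy(d + n - 8, &v, 8);
    } else if (n >= 4) {
        /* 4..7 bytes: same trick with two 4-byte stores. */
        uint32_t w = (uint32_t)v;
        memcpy(d, &w, 4);
        memcpy(d + n - 4, &w, 4);
    } else {
        /* 0..3 bytes: an optional 2-byte store, then an optional
         * 1-byte store. */
        if (n & 2) {
            uint16_t h = (uint16_t)v;
            memcpy(d, &h, 2);
            d += 2;
        }
        if (n & 1)
            *d = (unsigned char)v;
    }
    return dest;
}
```

For n = 15 this writes bytes 0..7 and then bytes 7..14, so only byte 7
is stored twice.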
--001a11c12986286cda050eea1485 Content-Type: application/octet-stream; name="memset.s" Content-Disposition: attachment; filename="memset.s" Content-Transfer-Encoding: base64 X-Attachment-Id: f_i62lsvde0 Lmdsb2JhbCBtZW1zZXQKLnR5cGUgbWVtc2V0LEBmdW5jdGlvbgptZW1zZXQ6Cgltb3Z6YnEgJXNp bCwlcmF4Cgl0ZXN0ICVlc2ksJWVzaQoJam56IC5MX3dpZGVuX3JheCAgIyB1bmxpa2VseQouTF93 aWRlbmVkOgoKCW1vdiAlcmRpLCVyOAoKCWNtcCAkMTYsJXJkeAoJamJlIC5MZXNzX3RoYW5fb3Jf ZXF1YWxfMTYKCgl0ZXN0ICQ3LCVkaWwKCWpueiAuTF9hbGlnbiAgIyB1bmxpa2VseQouTF9hbGln bmVkOgoKCWxlYSAtMSglcmR4KSwlcmN4CglzaHIgJDMsJXJjeAoJbW92ICVyYXgsLTgoJXJkaSwl cmR4KQoJcmVwCglzdG9zcQoKCW1vdiAlcjgsJXJheAoJcmV0CgouTF93aWRlbl9yYXg6CgkjIDY0 LWJpdCBpbXVsIGhhcyAzLTcgY3ljbGVzIGxhdGVuY3kKCW1vdiAkMHgxMDEwMTAxMDEwMTAxMDEs JXJzaQoJaW11bCAlcnNpLCVyYXgKCWptcCAuTF93aWRlbmVkCgojIDgtYnl0ZSBhbGlnbm1lbnQg Z2l2ZXMgfjI1JSBzcGVlZHVwIG9uICJyZXAgc3Rvc3EiIG1lbXNldHMKIyB0byBMMSBjYWNoZSwg Y29tcGFyZWQgdG8gaW50ZW50aW9uYWxseSBtaXNhbGlnbmVkIG9uZXMuCiMgSXQgaXMgYSBzbWFs bGVyIHdpbiBvZiB+MTUlIG9uIGxhcmdlciBtZW1zZXRzIHRvIEwyIHRvby4KIyBNZWFzdXJlZCBv biBJbnRlbCBTYW5keSBCcmlkZ2UgQ1BVIChpNy0yNjIwTSwgMi43MEdIeikKLkxfYWxpZ246Cglt b3YgJXJheCwoJXJkaSkKMToJaW5jICVyZGkKCWRlYyAlcmR4Cgl0ZXN0ICQ3LCVkaWwKCWpueiAx YgoJam1wIC5MX2FsaWduZWQKCgouTGVzc190aGFuX29yX2VxdWFsXzE2OgoJamIgMWYKICAgICMg ZmlsbCA4LTE2IGJ5dGVzOgowOgltb3YgJXJheCwoJXJkaSkKCW1vdiAlcmF4LC04KCVyZGksJXJk eCkKCW1vdiAlcjgsJXJheAoJcmV0CjE6CXRlc3QgJDgsJWRsCglqbnogMGIKCgl0ZXN0ICQ0LCVk bAoJanogMWYKICAgICMgZmlsbCA0LTcgYnl0ZXM6Cgltb3YgJWVheCwoJXJkaSkKCW1vdiAlZWF4 LC00KCVyZGksJXJkeCkKCW1vdiAlcjgsJXJheAoJcmV0CgogICAgIyBmaWxsIDAtMyBieXRlczoK MToJdGVzdCAkMiwlZGwKCWp6IDFmCgltb3YgJWF4LCglcmRpKQoJYWRkICQyLCVyZGkKMToJdGVz dCAkMSwlZGwKCWp6IDFmCgltb3YgJWFsLCglcmRpKQoxOgltb3YgJXI4LCVyYXgKCXJldAo= --001a11c12986286cda050eea1485--