From mboxrd@z Thu Jan 1 00:00:00 1970
From: Denys Vlasenko
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: [PATCH] x86_64/memset: use "small block" code for blocks up to 30 bytes long
Date: Sun, 15 Feb 2015 22:44:59 +0100
References: <1423845589-5920-1-git-send-email-vda.linux@googlemail.com> <20150214193533.GK23507@brightrain.aerifal.cx> <20150215040655.GM23507@brightrain.aerifal.cx> <20150215150313.GO23507@brightrain.aerifal.cx>
In-Reply-To: <20150215150313.GO23507@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: Rich Felker
Cc: musl
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary=001a1136f6b0e64993050f2762b2

--001a1136f6b0e64993050f2762b2
Content-Type: text/plain; charset=UTF-8

On Sun, Feb 15, 2015 at 4:03 PM, Rich Felker wrote:
>> Just because we don't personally see a hit from 6-cycle imul of AMD CPUs,
>> it does not mean people who do use those CPUs don't exist. Have heart...
>
> Did you test the version I attached? I think there should be at least
> 4-5 cycles between when the imul is launched and when the result is
> used, so I'm failing to see how the latency is a big deal.

Okay, I won't insist. Your version works well.

The "rep stosq" setup time is still noticeable even when we switch
to it after 126:

129 byte block: 10.37 bytes/ns
128 byte block: 10.65 bytes/ns
127 byte block: 10.58 bytes/ns
126 byte block: 18.44 bytes/ns
125 byte block: 18.30 bytes/ns
124 byte block: 18.15 bytes/ns

but I don't think we should do anything about this.

Here

	lea -1(%rdx),%rcx
	cmp $126,%rcx
	jae 2f

you'd have a stall, since cmp needs the result of lea. Why not this?

	lea -1(%rdx),%rcx
	cmp $127,%rdx
	jae 2f

Then you can even move the lea to the "big buf" code part (there is no
point doing it in the "small buf" code, where it is not used).

Possible bug: this check seems misplaced:

2:	test %rdx,%rdx
	jz 1b

It should be before the byte stores:

	mov %sil,(%rdi)
	mov %sil,-1(%rdi,%rdx)
	cmp $2,%edx
	jbe 1f

Otherwise a memset of zero length will write two bytes, at buf[0]
and buf[-1].

"sub $8,%rcx" can be folded into the lea.

Please see the attached file.
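
To make the zero-length concern concrete, here is a minimal C sketch of the "small buf" scheme: pairs of stores that grow from both ends of the buffer, with the length test placed before the first store. It is only an illustration under stated assumptions, not the patch itself: the names small_fill and put are hypothetical, and memcpy stands in for the unaligned mov stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store the low `width` bytes of v at p.
   Every byte of v is identical, so byte order does not matter. */
static void put(unsigned char *p, uint64_t v, size_t width)
{
    memcpy(p, &v, width);
}

/* Hypothetical sketch of the small-block path, valid for n < 127. */
static void *small_fill(void *dest, int c, size_t n)
{
    unsigned char *s = dest;
    /* Broadcast the fill byte to all 8 bytes (the imul in the assembly). */
    uint64_t v = (unsigned char)c * 0x0101010101010101ULL;

    assert(n < 127);
    if (n == 0) return dest;   /* must precede the stores, or s[0] and s[-1] get written */

    put(s, v, 1);      put(s + n - 1, v, 1);
    if (n <= 2) return dest;
    put(s + 1, v, 2);  put(s + n - 3, v, 2);
    if (n <= 6) return dest;
    put(s + 3, v, 4);  put(s + n - 7, v, 4);
    if (n <= 14) return dest;
    put(s + 7, v, 8);  put(s + n - 15, v, 8);
    if (n <= 30) return dest;
    put(s + 15, v, 8); put(s + 23, v, 8);
    put(s + n - 31, v, 8); put(s + n - 23, v, 8);
    if (n <= 62) return dest;
    put(s + 31, v, 8); put(s + 39, v, 8);
    put(s + 47, v, 8); put(s + 55, v, 8);
    put(s + n - 63, v, 8); put(s + n - 55, v, 8);
    put(s + n - 47, v, 8); put(s + n - 39, v, 8);
    return dest;
}
```

Without the n == 0 test, the first pair of stores would touch s[0] and s[n-1], which for n == 0 is s[-1]. The head stores cover bytes 0..62 and the tail stores cover the last 63 bytes, so together they reach every length up to 126.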
--001a1136f6b0e64993050f2762b2
Content-Type: application/octet-stream; name="vda1.s"
Content-Disposition: attachment; filename="vda1.s"
Content-Transfer-Encoding: 7bit
X-Attachment-Id: f_i66ymu560

.global memset
.type memset,@function
memset:
	movzbq %sil,%rax
	mov $0x101010101010101,%r8
	imul %r8,%rax

	cmp $127,%rdx
	jae 2f

	test %rdx,%rdx
	jz 1f

	mov %sil,(%rdi)
	mov %sil,-1(%rdi,%rdx)
	cmp $2,%edx
	jbe 1f

	mov %ax,1(%rdi)
	mov %ax,(-1-2)(%rdi,%rdx)
	cmp $6,%edx
	jbe 1f

	mov %eax,(1+2)(%rdi)
	mov %eax,(-1-2-4)(%rdi,%rdx)
	cmp $14,%edx
	jbe 1f

	mov %rax,(1+2+4)(%rdi)
	mov %rax,(-1-2-4-8)(%rdi,%rdx)
	cmp $30,%edx
	jbe 1f

	mov %rax,(1+2+4+8)(%rdi)
	mov %rax,(1+2+4+8+8)(%rdi)
	mov %rax,(-1-2-4-8-16)(%rdi,%rdx)
	mov %rax,(-1-2-4-8-8)(%rdi,%rdx)
	cmp $62,%edx
	jbe 1f

	mov %rax,(1+2+4+8+16)(%rdi)
	mov %rax,(1+2+4+8+16+8)(%rdi)
	mov %rax,(1+2+4+8+16+16)(%rdi)
	mov %rax,(1+2+4+8+16+24)(%rdi)
	mov %rax,(-1-2-4-8-16-32)(%rdi,%rdx)
	mov %rax,(-1-2-4-8-16-24)(%rdi,%rdx)
	mov %rax,(-1-2-4-8-16-16)(%rdi,%rdx)
	mov %rax,(-1-2-4-8-16-8)(%rdi,%rdx)

1:	mov %rdi,%rax
	ret

2:	lea -9(%rdx),%rcx
	mov %rdi,%r8
	shr $3,%rcx
	mov %rax,(%rdi)
	mov %rax,-16(%rdi,%rdx)
	mov %rax,-8(%rdi,%rdx)
	add $8,%rdi
	and $-8,%rdi
	rep
	stosq
	mov %r8,%rax
	ret

--001a1136f6b0e64993050f2762b2--
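
The large-block path of vda1.s (label 2:) can be read as the C sketch below, which may help when checking the folded count of (n - 9) >> 3, that is, the earlier lea -1 and sub $8 combined into one lea -9. This is a hedged illustration only: big_fill is a hypothetical name, the loop stands in for "rep stosq", and memcpy stands in for the 8-byte mov stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of the large-block path, used for n >= 127. */
static void *big_fill(void *dest, int c, size_t n)
{
    unsigned char *s = dest;
    /* Broadcast the fill byte to all 8 bytes (the imul in the assembly). */
    uint64_t v = (unsigned char)c * 0x0101010101010101ULL;

    assert(n >= 127);

    /* lea -9(%rdx),%rcx ; shr $3,%rcx */
    size_t count = (n - 9) >> 3;

    /* One unaligned store at the head and two at the tail. */
    memcpy(s, &v, 8);
    memcpy(s + n - 16, &v, 8);
    memcpy(s + n - 8, &v, 8);

    /* add $8,%rdi ; and $-8,%rdi : next 8-byte boundary strictly above s. */
    unsigned char *p = s + (8 - ((uintptr_t)s & 7));

    /* rep stosq: count aligned 8-byte stores. */
    for (size_t i = 0; i < count; i++)
        memcpy(p + 8 * i, &v, 8);

    return dest;
}
```

The aligned run starts between s+1 and s+8 and ends no later than s+n-1, so it never writes past the buffer. In the worst case it stops at s+n-15, which is why two tail stores (covering the last 16 bytes) are needed rather than one.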