From: Denys Vlasenko
To: Rich Felker
Cc: Denys Vlasenko, musl@lists.openwall.com
Date: Sun, 4 Oct 2020 00:32:09 +0200
Message-Id: <20201003223209.10307-1-vda.linux@googlemail.com>
Subject: [musl] [PATCH] x86/memset: avoid performing final store twice

From: Denys Vlasenko

For the not-very-short NBYTES case:

To handle the tail alignment, the code performs a potentially misaligned
word store to fill the final word of the buffer (8 bytes on x86_64,
4 bytes on i386). This is done even if the buffer's end is aligned.

The code then fills the rest of the buffer with NBYTES / 8 aligned word
stores (NBYTES / 4 on i386). It relies on the tail store to cover any
leftover bytes.

However, this means that if NBYTES *was* divisible by the word size, the
last word is stored a second time.
This patch decrements the byte count before dividing it by the word
size. (NBYTES - 1) / 8 equals NBYTES / 8 - 1 when NBYTES is a multiple
of 8, and NBYTES / 8 otherwise; the same holds for 4 on i386. So this
makes one store fewer in the "NBYTES is divisible by the word size" case
and changes nothing in all other cases.

While at it, remove trailing whitespace on two lines of
src/string/i386/memset.s.

CC: Rich Felker
CC: musl@lists.openwall.com
Signed-off-by: Denys Vlasenko
---
 src/string/i386/memset.s   | 7 ++++---
 src/string/x86_64/memset.s | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/string/i386/memset.s b/src/string/i386/memset.s
index d00422c4..b1c5c2f8 100644
--- a/src/string/i386/memset.s
+++ b/src/string/i386/memset.s
@@ -47,7 +47,7 @@ memset:
 	mov %edx,(-1-2-4-8-8)(%eax,%ecx)
 	mov %edx,(-1-2-4-8-4)(%eax,%ecx)
 
-1:	ret 	
+1:	ret
 
 2:	movzbl 8(%esp),%eax
 	mov %edi,12(%esp)
@@ -57,13 +57,14 @@ memset:
 	mov %eax,-4(%edi,%ecx)
 	jnz 2f
 
-1:	shr $2, %ecx
+1:	dec %ecx
+	shr $2, %ecx
 	rep
 	stosl
 	mov 4(%esp),%eax
 	mov 12(%esp),%edi
 	ret
-	
+
 2:	xor %edx,%edx
 	sub %edi,%edx
 	and $15,%edx
diff --git a/src/string/x86_64/memset.s b/src/string/x86_64/memset.s
index 2d3f5e52..85bb686c 100644
--- a/src/string/x86_64/memset.s
+++ b/src/string/x86_64/memset.s
@@ -53,7 +53,7 @@ memset:
 2:	test $15,%edi
 	mov %rdi,%r8
 	mov %rax,-8(%rdi,%rdx)
-	mov %rdx,%rcx
+	lea -1(%rdx),%rcx
 	jnz 2f
 
 1:	shr $3,%rcx
-- 
2.25.0
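The store pattern described in the commit message can be checked with a small simulation. The sketch below is a hypothetical model, not musl code, and makes these assumptions:

- The destination is already 16-byte aligned, so the head-alignment branch is not modeled.
- n is large enough to take the `rep stos` path.

It counts word stores and per-byte writes for the old word count (n / w) and the patched one ((n - 1) / w). The names `rep_count_old`, `rep_count_new` and `simulate` are made up for illustration.

```python
# Hypothetical model, not musl code: simulates the word stores made by the
# large-size path of musl's x86 memset for a destination that is already
# 16-byte aligned (the head-alignment branch is not modeled).
# w is the word size: 8 on x86_64 (stosq), 4 on i386 (stosl).


def rep_count_old(n, w):
    """Word count handed to `rep stos` before the patch: n / w."""
    return n // w


def rep_count_new(n, w):
    """Word count after the patch: (n - 1) / w."""
    return (n - 1) // w


def simulate(n, w, rep_count):
    """Return (total word stores, list of per-byte write counts)."""
    hits = [0] * n
    # The possibly misaligned store of the final word, ending at the buffer end.
    for i in range(n - w, n):
        hits[i] += 1
    stores = 1
    # The aligned `rep stos` stores, starting at the beginning of the buffer.
    for k in range(rep_count(n, w)):
        for i in range(k * w, (k + 1) * w):
            hits[i] += 1
        stores += 1
    return stores, hits


if __name__ == "__main__":
    for w in (4, 8):
        for n in (127, 128, 130, 136):
            old_stores, _ = simulate(n, w, rep_count_old)
            new_stores, new_hits = simulate(n, w, rep_count_new)
            # The patched count must still cover every byte.
            assert all(h > 0 for h in new_hits)
            print(f"w={w} n={n}: {old_stores} -> {new_stores} stores")
```

For w = 8 the script prints `17 -> 16 stores` for n = 128 and `17 -> 17 stores` for n = 130. Only byte counts divisible by the word size lose a store. The store that disappears is the one whose bytes the tail store had already written in full.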