mailing list of musl libc
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Re: [PATCH 1/2] x86_64/memset: simple optimizations
Date: Tue, 10 Feb 2015 17:36:48 -0500
Message-ID: <20150210223648.GN23507@brightrain.aerifal.cx>
In-Reply-To: <20150210213756.GM23507@brightrain.aerifal.cx>

On Tue, Feb 10, 2015 at 04:37:56PM -0500, Rich Felker wrote:
> On Tue, Feb 10, 2015 at 10:08:29PM +0100, Denys Vlasenko wrote:
> > On Tue, Feb 10, 2015 at 9:50 PM, Rich Felker <dalias@libc.org> wrote:
> > > On Tue, Feb 10, 2015 at 06:30:56PM +0100, Denys Vlasenko wrote:
> > >> "and $0xff,%esi" is a six-byte insn (81 e6 ff 00 00 00), can use
> > >> 4-byte "movzbl %sil,%esi" (40 0f b6 f6) instead.
> > >> [...]
> > >
> > > Do you want to go ahead with these patches as-is, or consider some of
> > > the other ideas we discussed off-list like avoiding the 64-bit imul
> > > entirely in the small-n case? If you think that's easy as another
> > > incremental change, I'll go ahead with these.
> > 
> > I think you can apply these patches without waiting
> > for potential future improvements.
> 
> OK. Based on some casual testing on my Celeron 847:
> 
> - For small sizes, your patches make a significant improvement, 20-30%.
> 
> - For the rep stosq path, the improvement is minimal (roughly 1-2 cycles).
> 
> - Using 32-bit imul instead of 64-bit makes no difference at all.
> 
> I'll review the patches again for correctness, but so far they look
> good, and it doesn't look like these are things we'd want to back out
> or rewrite for subsequent improvements anyway.
> 
> Thanks!
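
For readers following the quoted discussion, here is a minimal C sketch of
what the instructions under discussion compute. It is not the musl assembly:
the function names are hypothetical, and the way the 32-bit product is
widened to 64 bits is an assumption made only so the two variants can be
compared.

```c
#include <stdint.h>

/* Zero-extend the low byte of c. Both "and $0xff,%esi" and
   "movzbl %sil,%esi" produce this value; they differ only in
   encoding length (six bytes versus four). */
static uint32_t low_byte(int c)
{
    return (uint32_t)(unsigned char)c;
}

/* Replicate the fill byte into all eight bytes of a 64-bit word
   using a 64-bit multiply. */
static uint64_t broadcast64(int c)
{
    return (uint64_t)low_byte(c) * UINT64_C(0x0101010101010101);
}

/* Variant using a 32-bit multiply. Widening by duplicating the
   32-bit word is an assumption for illustration only. */
static uint64_t broadcast64_via32(int c)
{
    uint32_t w = low_byte(c) * UINT32_C(0x01010101);
    return ((uint64_t)w << 32) | w;
}
```

Both variants yield the same fill pattern, so any difference between them
would be in instruction cost rather than in the result.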

One more trivial change I might make: since the non-rep-stosq path is
faster for small sizes, changing the jb 1f to jbe 1f sends 16-byte
memsets down that path too, which significantly improves them with no
additional code changes.
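
The effect of that one-instruction change can be modeled in C as follows.
This is only a sketch of the branch condition: the names are hypothetical,
and the threshold of 16 is inferred from the remark about 16-byte memsets
rather than copied from the assembly.

```c
#include <stddef.h>

/* Hypothetical labels for the two code paths. */
enum memset_path { PATH_SMALL, PATH_REP_STOSQ };

/* "jb 1f" after comparing n with 16: branch to the small path
   only when n is strictly below 16. */
static enum memset_path pick_path_jb(size_t n)
{
    return n < 16 ? PATH_SMALL : PATH_REP_STOSQ;
}

/* "jbe 1f": branch when n is below or equal to 16, so n == 16
   also takes the small path. */
static enum memset_path pick_path_jbe(size_t n)
{
    return n <= 16 ? PATH_SMALL : PATH_REP_STOSQ;
}
```

Only n == 16 changes paths; every other size is dispatched exactly as
before.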

Rich

