mailing list of musl libc
From: Rob Landley <rob@landley.net>
To: musl@lists.openwall.com
Cc: musl@lists.openwall.com, Andre Renaud <andre@bluewatersys.com>
Subject: Re: Thinking about release
Date: Sun, 14 Jul 2013 23:25:37 -0500
Message-ID: <1373862337.1776.3@driftwood>
In-Reply-To: <20130711033754.GL29800@brightrain.aerifal.cx> (from dalias@aerifal.cx on Wed Jul 10 22:37:55 2013)

On 07/10/2013 10:37:55 PM, Rich Felker wrote:
> On Thu, Jul 11, 2013 at 10:44:16AM +1200, Andre Renaud wrote:
> > > This results in 95MB/s on my platform (up from 65MB/s for the
> > > existing memcpy.c, and down from 105MB/s with the asm optimised
> > > version). It is essentially identically readable to the existing
> > > memcpy.c. I'm not really familiar with any other cpu architectures,
> > > so I'm not sure if this would improve, or hurt, performance on
> > > other platforms.
> >
> > Reviewing the assembler that is produced, it appears that GCC will
> > never generate an ldm/stm instruction (load/store multiple) that
> > reads into more than 4 registers, whereas the optimised assembler
> > uses ones that read 8 (i.e. 8 * 32-bit reads in a single
> > instruction). I've tried
> 
> For the asm, could we make it more than 8? 10 seems easy, 12 seems
> doubtful. I don't see a fundamental reason it needs to be a power of
> two, unless the cache line alignment really helps and isn't just
> cargo-culting. (This is something I'd still like to know about the
> asm: whether it's doing unnecessary stuff that does not help
> performance.)

You're going to hit bus bandwidth at some point, and that's likely to  
be a power of two.

> > various tricks/optimisations with the C code, and can't convince GCC
> > to do more than 4. I assume that this is probably where the
> > remaining 10MB/s is between these two variants.
> 
> Yes, I suspect so. One slightly crazy idea I had was to write the
> function in C with just inline asm for the inner ldm/stm loop. The
> build system does not yet have support for .c files in the arch dirs
> instead of .s files, but it could be added.

Does it have support for a header defining a macro containing the
assembly bit?
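
A rough sketch of what such a header macro could look like, assuming GCC-style inline asm and ARM (non-Thumb) mode. The names COPY_BLOCK32 and copy_aligned_sketch are hypothetical, and the register list and clobbers are my assumption rather than anything taken from the asm under discussion. Other targets get a plain C fallback with the same contract, so the surrounding C stays identical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro; in a real tree it would sit in a per-arch header.
 * Contract: copy 32 bytes from s to d and advance both pointers by 32. */
#if defined(__arm__) && !defined(__thumb__)
/* One 8-register ldm/stm pair (8 * 32-bit words). r3-r10 is an assumed
 * choice; the clobber list tells GCC to keep d and s out of them. */
#define COPY_BLOCK32(d, s) \
    __asm__ volatile( \
        "ldmia %1!, {r3-r10}\n\t" \
        "stmia %0!, {r3-r10}" \
        : "+r"(d), "+r"(s) \
        : \
        : "r3", "r4", "r5", "r6", "r7", "r8", "r9", "r10", "memory")
#else
/* Generic fallback: same contract in plain C. */
#define COPY_BLOCK32(d, s) \
    do { \
        for (int i_ = 0; i_ < 32; i_++) (d)[i_] = (s)[i_]; \
        (d) += 32; (s) += 32; \
    } while (0)
#endif

/* Copies n bytes. Both pointers must be 4-byte aligned, since ldm/stm
 * expect word-aligned addresses; a real memcpy would handle the
 * misaligned cases before reaching this loop. */
void *copy_aligned_sketch(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    assert(((uintptr_t)d & 3) == 0 && ((uintptr_t)s & 3) == 0);

    for (; n >= 32; n -= 32)
        COPY_BLOCK32(d, s);
    for (; n; n--)          /* tail bytes */
        *d++ = *s++;
    return dest;
}
```

The point of the split is that only the inner block copy is arch-specific, so the part somebody has to read and say "this is clearly right" about stays in C.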

> > Rich - do you have any comments on whether either the C or assembler
> > variants of memcpy might be suitable for inclusion in musl?
> 
> I would say either might be, but it looks like if we want competitive
> performance, some asm will be needed (either inline or full). My
> leaning would be to go for something simpler than the asm you've been
> experimenting with, but with same or better performance, if this is
> possible. I realize the code is not that big as-is, in terms of binary
> size, but it's big from an "understanding it" perspective and I don't
> like big asm blobs that are hard for somebody to look at and say "oh
> yeah, this is clearly right".
> 
> Anyway, the big question I'd still like to get answered before moving
> forward is whether the cache line alignment has any benefit.

I'd expect so. Fundamentally what the processor is doing is fetching  
and writing cachelines. What it does to the contents of the cachelines  
is just annotating that larger operation.
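
To make the alignment idea concrete, here is a minimal sketch in plain C of copying a short head until the destination sits on a cache line boundary, so that each pass of the bulk loop writes exactly one whole line. The 32-byte line size is an assumption (it matches the 8 * 32-bit block discussed above, but varies by CPU), and head_bytes and copy_line_aligned are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE 32 /* assumed cache line size in bytes; varies by CPU */

/* Bytes to copy before p reaches the next LINE boundary (0 if already there). */
size_t head_bytes(const void *p)
{
    return (LINE - ((uintptr_t)p & (LINE - 1))) & (LINE - 1);
}

void *copy_line_aligned(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    size_t head = head_bytes(d);

    /* Head: byte copies until the destination is line-aligned. */
    if (head > n)
        head = n;
    n -= head;
    for (; head; head--)
        *d++ = *s++;

    /* Bulk: each pass fills one whole destination cache line. This is
     * where an ldm/stm block would go in an optimised version. */
    for (; n >= LINE; n -= LINE) {
        assert(((uintptr_t)d & (LINE - 1)) == 0);
        for (size_t i = 0; i < LINE; i++)
            d[i] = s[i];
        d += LINE;
        s += LINE;
    }

    /* Tail: whatever is left over. */
    for (; n; n--)
        *d++ = *s++;
    return dest;
}
```

Only the destination is aligned here; the source may still straddle lines. Whether this beats an unaligned bulk loop is exactly the thing that would need measuring.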

(Several days behind on email, as usual...)

> Rich

Rob
