mailing list of musl libc
From: Rob Landley <rob@landley.net>
To: Rich Felker <dalias@libc.org>
Cc: "j-core@j-core.org" <j-core@j-core.org>, musl@lists.openwall.com
Subject: Re: Re: [J-core] Aligned copies and cacheline conflicts?
Date: Fri, 16 Sep 2016 20:40:05 -0500
Message-ID: <b7d6634b-6560-07fe-6a0a-411f2f95a0f8@landley.net>
In-Reply-To: <20160916221603.GS15995@brightrain.aerifal.cx>

On 09/16/2016 05:16 PM, Rich Felker wrote:
> Attached is a draft memcpy I'm considering for musl. Compared to the
> current one, it:
> 
> 1. Works on 32 bytes per iteration, and adds barriers between the load
>    phase and store phase to preclude cache line aliasing between src
>    and dest with a direct-mapped cache.
> 
> 2. Equally unrolls the misaligned src/dest cases.
> 
> 3. Adjusts the offsets used in the misaligned src/dest loops to all be
>    multiples of 4, with the adjustments to make that work outside the
>    loops. This helps compilers generate indexed addressing modes (e.g.
>    @(4,Rm)) rather than having to resort to arithmetic.
> 
> 4. Factors the misaligned cases into a common inline function to
>    reduce code duplication.
> 
> Comments welcome.
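
For readers without the attachment, item 1 can be pictured roughly as follows. This is a minimal sketch and not the attached draft: the name copy32_blocks is hypothetical, it assumes GCC-style inline asm and attributes, and it only handles the case where both pointers are 4-byte aligned and n is a multiple of 32.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: a may_alias word type, so word accesses to arbitrary
   buffers do not violate strict aliasing (GCC/clang extension). */
typedef uint32_t __attribute__((__may_alias__)) u32;

/* Hypothetical helper, not the attached draft. Assumes dest and src are
   4-byte aligned, do not overlap, and n is a multiple of 32. */
static void copy32_blocks(void *restrict dest, const void *restrict src, size_t n)
{
    u32 *d = dest;
    const u32 *s = src;

    for (; n >= 32; n -= 32, s += 8, d += 8) {
        /* Load phase: read the whole 32-byte block into locals. */
        u32 w0 = s[0], w1 = s[1], w2 = s[2], w3 = s[3];
        u32 w4 = s[4], w5 = s[5], w6 = s[6], w7 = s[7];

        /* Compiler barrier: stops the compiler from moving the stores
           below in among the loads above. It emits no instruction. */
        __asm__ __volatile__("" ::: "memory");

        /* Store phase: write the block out only after all loads. */
        d[0] = w0; d[1] = w1; d[2] = w2; d[3] = w3;
        d[4] = w4; d[5] = w5; d[6] = w6; d[7] = w7;
    }
}
```

Grouping the loads ahead of the stores means that, when src and dest map to the same line of a direct-mapped cache, each block costs at most one eviction in each direction instead of one per word.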

Superficial comments first:

I know the compiler's probably smart enough to convert %4 into &3, but
given that the point is performance optimization I'd have thought you'd
be explicit about what the machine should be doing?
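
To spell out what is meant: for unsigned operands the two forms give the same result, which is what lets the compiler substitute one for the other. A small sketch, with helper names made up for illustration (they are not from the draft):

```c
#include <stdint.h>

/* Hypothetical helpers: two spellings of "how far is p from 4-byte
   alignment". uintptr_t is unsigned, so both return the same value. */
static inline unsigned misalign_mod(const void *p)
{
    return (unsigned)((uintptr_t)p % 4);   /* remainder form */
}

static inline unsigned misalign_mask(const void *p)
{
    return (unsigned)((uintptr_t)p & 3);   /* explicit low-two-bits mask */
}
```

The equivalence only holds for unsigned values; with a signed operand, % can yield a negative remainder, and the compiler has to emit extra fix-up code.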

Both chunks of code have their own 8-register read and 8-register write
(one is 0-7, one is 1-8).

Design comments:

Instead of optimized per-target assembly, you have an #ifdef __GNUC__
wrapped around just under 70 lines of C code with an __asm__
__volatile__ blob in the middle, calling a 20-line C function.
Presumably on sh this will produce roughly the same workaround for the
primitive cache architecture, only now you're doing it indirectly and
applying it to everybody.

The motivation for this is that j2 has a more primitive cache
architecture than is normal these days, so it needs an optimization most
other chips don't. But this is the "generic" code, so it'll also be built
on register-constrained 32-bit x86, and on 64-bit systems where it should
presumably be using u64 not u32.
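
As a rough illustration of the word-size point, generic code could pick its copy unit per target. The sketch below is an assumption-laden example, not anything from the draft: word_t and copy_native_words are hypothetical names, pointer width is used as a stand-in for register width (which is wrong on ABIs such as x32), and both pointers are assumed to be aligned to sizeof(word_t).

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: pointer width approximates the native register width. */
#if UINTPTR_MAX > 0xffffffffu
typedef uint64_t __attribute__((__may_alias__)) word_t;
#else
typedef uint32_t __attribute__((__may_alias__)) word_t;
#endif

/* Hypothetical helper: copies n bytes using native-width words, then
   finishes the tail bytewise. Assumes dest and src are aligned to
   sizeof(word_t) and do not overlap. */
static void copy_native_words(void *restrict dest, const void *restrict src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    for (; n >= sizeof(word_t); n -= sizeof(word_t)) {
        *(word_t *)d = *(const word_t *)s;
        d += sizeof(word_t);
        s += sizeof(word_t);
    }
    while (n--)
        *d++ = *s++;
}
```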

And of course gcc inlines its own version unless you hit it with a brick
anyway, so solving it in musl is of questionable utility.

I'm not sure you're focusing on the right problem?

> Rich

Rob


