mailing list of musl libc
From: Rich Felker <dalias@libc.org>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>,
	libc-alpha@sourceware.org, libc-dev@lists.llvm.org,
	linuxppc-dev@lists.ozlabs.org, musl@lists.openwall.com
Subject: Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2
Date: Mon, 20 Apr 2020 13:27:15 -0400
Message-ID: <20200420172715.GC11469@brightrain.aerifal.cx>
In-Reply-To: <1587356128.aslvdnmtbw.astroid@bobo.none>

On Mon, Apr 20, 2020 at 02:31:58PM +1000, Nicholas Piggin wrote:
> Excerpts from Rich Felker's message of April 20, 2020 2:09 pm:
> > On Mon, Apr 20, 2020 at 12:32:21PM +1000, Nicholas Piggin wrote:
> >> Excerpts from Rich Felker's message of April 20, 2020 11:34 am:
> >> > On Mon, Apr 20, 2020 at 11:10:25AM +1000, Nicholas Piggin wrote:
> >> >> Excerpts from Rich Felker's message of April 17, 2020 4:31 am:
> >> >> > Note that because lr is clobbered we need at least one normally
> >> >> > call-clobbered register that's not syscall-clobbered to save lr in.
> >> >> > Otherwise stack frame setup is required to spill it.
> >> >> 
> >> >> The kernel would like to use r9-r12 for itself. We could do with
> >> >> fewer registers, but we have some delay establishing the stack (it
> >> >> depends on a load, which depends on a mfspr), and entry code tends
> >> >> to be quite store-heavy. On the caller side, by contrast, you have
> >> >> r1 set up (modulo stack updates), and the system call is a long
> >> >> delay during which the store queue has significant time to drain.
> >> >> 
> >> >> My feeling is it would be better for kernel to have these scratch 
> >> >> registers.
> >> > 
> >> > If your new kernel syscall mechanism requires the caller to make a
> >> > whole stack frame it otherwise doesn't need and spill registers to it,
> >> > it becomes a lot less attractive. Some of those 90 cycles saved are
> >> > immediately lost on the userspace side, plus you either waste icache
> >> > at the call point or require the syscall to go through a
> >> > userspace-side helper function that performs the spill and restore.
> >> 
> >> You would be surprised how few cycles that takes on a high end CPU;
> >> some might cost a couple of percent. I am one for counting cycles,
> >> mind you; I'm not being flippant about it. If we can come up with
> >> something faster I'd be up for it.
> > 
> > If the cycle count is trivial then just do it on the kernel side.
> 
> The cycle count for userspace is trivial, because you have r1 ready.
> The kernel does not have its stack ready; it has to do
> mfspr rX ; ld rY,N(rX) to get a stack to save into.
> 
> Which is also wasted work for userspace.
> 
> Now that I think about it, no stack frame is even required! lr is saved
> into the caller's stack when it's clobbered with an asm, just as when
> it's used for a function call.

No. If there is a non-clobbered register, lr can be moved to the
non-clobbered register rather than saved to the stack. However, it
looks like (1) gcc doesn't take advantage of that possibility, but (2)
the caller already arranged for there to be space on the stack to save
lr, so the cost is only one store and one load, not any stack
adjustment or other frame setup. So it's probably not a big deal.
However, just adding an "lr" clobber to the existing syscall asm in
musl increased the size of a simple syscall function (getuid) from 20
bytes to 36 bytes.
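
For concreteness, here's roughly what that looks like: a sketch in the
style of musl's powerpc64 __syscall0, with "lr" added to the clobber
list. The function name and the exact clobber set are illustrative,
not musl's verbatim code:

static inline long __syscall0_lr_clobber(long n)
{
	register long r0 __asm__("r0") = n;  /* syscall number */
	register long r3 __asm__("r3");      /* return value */
	__asm__ __volatile__(
		"sc ; bns+ 1f ; neg %1, %1 ; 1:" /* negate result if cr0.SO set */
		: "+r"(r0), "=r"(r3)
		:
		: "memory", "cr0", "lr",         /* "lr" is the added clobber */
		  "r4", "r5", "r6", "r7", "r8",
		  "r9", "r10", "r11", "r12");
	return r3;
}

With "lr" in the clobber list and no syscall-unclobbered volatile
register free to hold it, gcc stores lr to the caller's lr save slot
and reloads it afterwards, which accounts for the size growth above.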

> >> > syscall arg registers still preserved? If not, this is a major cost on
> >> > the userspace side, since any call point that has to loop-and-retry
> >> > (e.g. futex) now needs to make its own place to store the original
> >> > values.)
> >> 
> >> Powerpc system calls never did. We could have scv preserve them, but 
> >> you'd still need to restore r3. We could make an ABI which does not
> >> clobber r3 but puts the return value in r9, say. I'd like to see what
> >> the user side code looks like to take advantage of such a thing though.
> > 
> > Oh wow, I hadn't realized that, but indeed the code we have now is
> > allowing for the kernel to clobber them all. So at least this isn't
> > getting any worse I guess. I think it was a very poor choice of
> > behavior though and a disadvantage vs what other archs do (some of
> > them preserve all registers; others preserve only normally call-saved
> > ones plus the syscall arg ones and possibly a few other specials).
> 
> Well, we could change it. Does the generated code improve significantly
> if we take those clobbers away?

I'd have to experiment a bit more to see. It's not going to help at
all in functions which are pure syscall wrappers that just do the
syscall and return, since the arg regs are dead after the syscall
anyway (the caller must assume they were clobbered). But where
syscalls are inlined and used in a loop, like a futex wait, it might
make a nontrivial difference.
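
As a hypothetical illustration of where it would matter, consider an
inlined futex wait loop; __syscall4 here stands in for musl's internal
inline syscall, and the constants come from the usual kernel headers:

#include <errno.h>
#include <sys/syscall.h>
#include <linux/futex.h>

static void wait_on(volatile int *addr, int val)
{
	long r;
	do {
		/* Under the current sc ABI the kernel may clobber r0 and
		 * r3-r12, so addr, FUTEX_WAIT, val, and the timeout all
		 * have to be re-materialized on every retry. If the arg
		 * registers were preserved, the loop body would shrink to
		 * the syscall itself plus the exit test. */
		r = __syscall4(SYS_futex, (long)addr, FUTEX_WAIT, val, 0);
	} while (r == -EINTR);
}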

Unfortunately, even if you did change it for the new scv mechanism, it
would be hard to take advantage of the change while also supporting
sc, unless we used a helper function that did scv directly but
saved/restored all the arg regs when using the legacy sc mechanism.
Just inlining the hwcap conditional and clobbering more regs in one
code path than in the other likely would not help; gcc won't
shrink-wrap the clobbered/non-clobbered paths separately, and even if
it did, when this were inlined somewhere like a futex loop, it would
have to hoist the conditional out of the loop to gain much, which in
turn makes the code much larger by producing two copies of the loop.
So I think just behaving similarly to the old sc method is probably
the best option we have...
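
(For completeness, the helper-function approach would look something
like this in outline. PPC_FEATURE2_SCV and its value follow the kernel
proposal; __syscall_outline and the two per-mechanism routines are
hypothetical names, and the real thing would have to be asm with a
custom ABI so the sc path can actually preserve the arg registers:)

#include <sys/auxv.h>   /* getauxval, AT_HWCAP2 */

#ifndef PPC_FEATURE2_SCV
#define PPC_FEATURE2_SCV 0x00100000 /* hwcap2 bit for scv, per the proposal */
#endif

/* Hypothetical per-mechanism routines; asm in practice. */
long __scv_syscall6(long n, long a, long b, long c, long d, long e, long f);
long __sc_syscall6(long n, long a, long b, long c, long d, long e, long f);

long __syscall_outline(long n, long a, long b, long c, long d,
                       long e, long f)
{
	if (getauxval(AT_HWCAP2) & PPC_FEATURE2_SCV)
		/* scv path: if the new ABI preserved the arg registers,
		 * nothing extra needs saving here. */
		return __scv_syscall6(n, a, b, c, d, e, f);
	/* Legacy sc path: the arg-register save/restore happens once,
	 * in here, rather than at every inlined call site. */
	return __sc_syscall6(n, a, b, c, d, e, f);
}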

Rich
