Date: Mon, 20 Apr 2020 14:31:58 +1000
From: Nicholas Piggin
To: Rich Felker
Cc: Adhemerval Zanella, libc-alpha@sourceware.org, libc-dev@lists.llvm.org,
 linuxppc-dev@lists.ozlabs.org, musl@lists.openwall.com
In-Reply-To: <20200420040926.GA11469@brightrain.aerifal.cx>
Message-Id: <1587356128.aslvdnmtbw.astroid@bobo.none>
Subject: Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2

Excerpts from Rich Felker's message of April 20, 2020 2:09 pm:
> On Mon, Apr 20, 2020 at 12:32:21PM +1000, Nicholas Piggin wrote:
>> Excerpts from Rich Felker's message of April 20, 2020 11:34 am:
>> > On Mon, Apr 20, 2020 at 11:10:25AM +1000, Nicholas Piggin wrote:
>> >> Excerpts from Rich Felker's message of April 17, 2020 4:31 am:
>> >> > Note that because lr is clobbered we need at least once normally
>> >> > call-clobbered register that's not syscall clobbered to save lr in.
>> >> > Otherwise stack frame setup is required to spill it.
>> >>
>> >> The kernel would like to use r9-r12 for itself. We could do with fewer
>> >> registers, but we have some delay establishing the stack (depends on a
>> >> load which depends on a mfspr), and entry code tends to be quite store
>> >> heavy whereas on the caller side you have r1 set up (modulo stack
>> >> updates), and the system call is a long delay during which time the
>> >> store queue has significant time to drain.
>> >>
>> >> My feeling is it would be better for kernel to have these scratch
>> >> registers.
>> >
>> > If your new kernel syscall mechanism requires the caller to make a
>> > whole stack frame it otherwise doesn't need and spill registers to it,
>> > it becomes a lot less attractive. Some of those 90 cycles saved are
>> > immediately lost on the userspace side, plus you either waste icache
>> > at the call point or require the syscall to go through a
>> > userspace-side helper function that performs the spill and restore.
>>
>> You would be surprised how few cycles that takes on a high end CPU. Some
>> might be a couple of %. I am one for counting cycles mind you, I'm not
>> being flippant about it. If we can come up with something faster I'd be
>> up for it.
>
> If the cycle count is trivial then just do it on the kernel side.

The cycle count for user is trivial, because you have r1 ready. Kernel
does not have its stack ready, it has to mfspr rX ; ld rY,N(rX); to get
stack to save into. Which is also wasted work for a userspace.

Now that I think about it, no stack frame is even required! lr is saved
into the caller's stack when it's clobbered with an asm, just as when
it's used for a function call.

>> > The right way to do this is to have the kernel preserve enough
>> > registers that userspace can avoid having any spills.
>> > It doesn't have to preserve everything, probably just enough to save
>> > lr. (BTW are
>>
>> Again, the problem is the kernel doesn't have its dependencies
>> immediately ready to spill, and spilling (may be) more costly
>> immediately after the call because we're doing a lot of stores.
>>
>> I could try measure this. Unfortunately our pipeline simulator tool
>> doesn't model system calls properly so it's hard to see what's happening
>> across the user/kernel horizon, I might check if that can be improved
>> or I can hack it by putting some isync in there or something.
>
> I think it's unlikely to make any real difference to the total number
> of cycles spent which side it happens on, but putting it on the kernel
> side makes it easier to avoid wasting size/icache at each syscall
> site.
>
>> > syscall arg registers still preserved? If not, this is a major cost on
>> > the userspace side, since any call point that has to loop-and-retry
>> > (e.g. futex) now needs to make its own place to store the original
>> > values.)
>>
>> Powerpc system calls never did. We could have scv preserve them, but
>> you'd still need to restore r3. We could make an ABI which does not
>> clobber r3 but puts the return value in r9, say. I'd like to see what
>> the user side code looks like to take advantage of such a thing though.
>
> Oh wow, I hadn't realized that, but indeed the code we have now is
> allowing for the kernel to clobber them all. So at least this isn't
> getting any worse I guess. I think it was a very poor choice of
> behavior though and a disadvantage vs what other archs do (some of
> them preserve all registers; others preserve only normally call-saved
> ones plus the syscall arg ones and possibly a few other specials).

Well, we could change it. Does the generated code improve significantly
if we take those clobbers away?

Thanks,
Nick
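
For anyone who wants to look at the generated code, here is a rough
sketch of what the user side of this discussion looks like as GCC-style
inline asm. It is not taken from musl, glibc or the kernel. The names
sketch_syscall0 and sketch_selfcheck are made up for illustration, and
the clobber list (cr0 and r4-r12) is an assumption based on the thread's
description of the existing 'sc' convention. The "lr" and "ctr" entries
are not needed for 'sc'; they are there only so the compiler output
shows what an lr-clobbering entry costs the caller. On anything other
than powerpc64 the sketch falls back to the libc syscall() wrapper so
that it still builds and runs.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* syscall(), getpid() */

/*
 * Hypothetical helper: issue a system call that takes no arguments.
 * On powerpc64 it returns a negative errno value on failure; the
 * fallback path returns whatever the libc syscall() wrapper returns
 * (-1 with errno set on failure).
 */
static inline long sketch_syscall0(long n)
{
#if defined(__powerpc64__) && defined(__GNUC__)
	register long r0 __asm__("r0") = n;
	register long r3 __asm__("r3");

	/* Classic 'sc' entry: cr0.SO set on return signals an error. */
	__asm__ __volatile__(
		"sc\n\t"
		"bns+ 1f\n\t"
		"neg %1, %1\n"
		"1:"
		: "+r"(r0), "=r"(r3)
		:
		: "memory", "cr0",
		  /* Assumed volatile across the call, args included. */
		  "r4", "r5", "r6", "r7", "r8",
		  "r9", "r10", "r11", "r12",
		  /* Illustration only: makes the compiler preserve lr
		   * (and treat ctr as dead) around the asm. */
		  "lr", "ctr");
	return r3;
#else
	return syscall(n);
#endif
}

/* Small self-check: getpid cannot fail, so both paths must agree. */
void sketch_selfcheck(void)
{
	assert(sketch_syscall0(SYS_getpid) == (long)getpid());
}
```

Compiling this for powerpc64 with and without the "lr" entry, and with
and without r4-r8 in the clobber list, is one way to see how much the
call site changes when those clobbers are taken away.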