Date: Mon, 20 Apr 2020 12:32:21 +1000
From: Nicholas Piggin
To: Rich Felker
Cc: Adhemerval Zanella, libc-alpha@sourceware.org, libc-dev@lists.llvm.org,
    linuxppc-dev@lists.ozlabs.org, musl@lists.openwall.com
In-Reply-To: <20200420013412.GZ11469@brightrain.aerifal.cx>
Message-Id: <1587348538.l1ioqml73m.astroid@bobo.none>
Subject: Re: [musl] Powerpc Linux 'scv' system call ABI proposal take 2

Excerpts from Rich Felker's message of April 20, 2020 11:34 am:
> On Mon, Apr 20, 2020 at 11:10:25AM +1000, Nicholas Piggin wrote:
>> Excerpts from Rich Felker's message of April 17, 2020 4:31 am:
>> > Note that because lr is clobbered we need at least one normally
>> > call-clobbered register that's not syscall clobbered to save lr in.
>> > Otherwise stack frame setup is required to spill it.
>>
>> The kernel would like to use r9-r12 for itself.
>> We could do with fewer registers, but we have some delay
>> establishing the stack (depends on a load which depends on a mfspr),
>> and entry code tends to be quite store heavy, whereas on the caller
>> side you have r1 set up (modulo stack updates), and the system call
>> is a long delay during which time the store queue has significant
>> time to drain.
>>
>> My feeling is it would be better for the kernel to have these scratch
>> registers.
>
> If your new kernel syscall mechanism requires the caller to make a
> whole stack frame it otherwise doesn't need and spill registers to it,
> it becomes a lot less attractive. Some of those 90 cycles saved are
> immediately lost on the userspace side, plus you either waste icache
> at the call point or require the syscall to go through a
> userspace-side helper function that performs the spill and restore.

You would be surprised how few cycles that takes on a high-end CPU. Some
might be a couple of %. I am one for counting cycles, mind you; I'm not
being flippant about it. If we can come up with something faster I'd be
up for it.

>
> The right way to do this is to have the kernel preserve enough
> registers that userspace can avoid having any spills. It doesn't have
> to preserve everything, probably just enough to save lr. (BTW are

Again, the problem is the kernel doesn't have its dependencies
immediately ready to spill, and spilling (may be) more costly
immediately after the call because we're doing a lot of stores.

I could try to measure this. Unfortunately our pipeline simulator tool
doesn't model system calls properly, so it's hard to see what's happening
across the user/kernel horizon. I might check if that can be improved,
or I can hack it by putting some isync in there or something.

> syscall arg registers still preserved? If not, this is a major cost on
> the userspace side, since any call point that has to loop-and-retry
> (e.g.
> futex) now needs to make its own place to store the original
> values.)

Powerpc system calls never did. We could have scv preserve them, but
you'd still need to restore r3. We could make an ABI which does not
clobber r3 but puts the return value in r9, say. I'd like to see what
the user side code looks like to take advantage of such a thing though.

Thanks,
Nick
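
For reference, the loop-and-retry call site discussed above has roughly
this shape on the user side. This is a minimal sketch, not code from the
thread: futex_wait_retry is a hypothetical helper name, and it goes
through the libc syscall() wrapper, so it shows only the structure of the
loop, not the register-level cost of an inlined sc or scv instruction.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for illustration): block while
 * *addr == expected, retrying when the wait is interrupted by a signal.
 * Returns 0 on wakeup, or a negative errno value on any other failure.
 */
static int futex_wait_retry(uint32_t *addr, uint32_t expected)
{
    assert(addr != NULL);

    for (;;) {
        /* The same arguments are needed again on every iteration. */
        long r = syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, expected,
                         NULL, NULL, 0);
        if (r == 0)
            return 0;          /* woken by a FUTEX_WAKE */
        if (errno != EINTR)
            return -errno;     /* e.g. -EAGAIN if *addr != expected */
        /* EINTR: loop and issue the identical call again. */
    }
}
```

The same addr and expected are passed on every iteration. With an
inlined syscall instruction, those values would have to stay in registers
the kernel leaves alone, or be reloaded from somewhere before each retry,
which is the cost raised in the quoted text.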