* [musl] Linux powerpc new system call instruction and ABI @ 2020-06-11 8:12 Nicholas Piggin 2020-06-11 8:12 ` [musl] [PATCH 1/2] powerpc/64s/exception: treat NIA below __end_interrupts as soft-masked Nicholas Piggin ` (3 more replies) 0 siblings, 4 replies; 42+ messages in thread From: Nicholas Piggin @ 2020-06-11 8:12 UTC (permalink / raw) To: linuxppc-dev; +Cc: Nicholas Piggin, musl, libc-dev, linux-api Thanks to everyone who has given feedback on the proposed new system call instruction and ABI, I think it has reached agreement and the implementation can be merged into Linux. I have a hacked glibc implementation (that doesn't do all the right HWCAP detection and misses a few things) that I've tested several things including some kernel selftests (involving signals and syscalls) with. System Call Vectored (scv) ABI ============================== The scv instruction is introduced with POWER9 / ISA3, it comes with an rfscv counter-part. The benefit of these instructions is performance (trading slower SRR0/1 with faster LR/CTR registers, and entering the kernel with MSR[EE] and MSR[RI] left enabled, which can reduce MSR updates. The scv instruction has 128 levels (not enough to cover the Linux system call space). Assignment and advertisement ---------------------------- The proposal is to assign scv levels conservatively, and advertise them with HWCAP feature bits as we add support for more. Linux has not enabled FSCR[SCV] yet, so executing the scv instruction will cause the kernel to log a "SCV facility unavilable" message, and deliver a SIGILL with ILL_ILLOPC to the process. Linux has defined a HWCAP2 bit PPC_FEATURE2_SCV for SCV support, but does not set it. This change allocates the zero level ('scv 0'), advertised with PPC_FEATURE2_SCV, which will be used to provide normal Linux system calls (equivalent to 'sc'). Attempting to execute scv with other levels will cause a SIGILL to be delivered the same as before, but will not log a "SCV facility unavailable" message (because the processor facility is enabled). Calling convention ------------------ The proposal is for scv 0 to provide the standard Linux system call ABI with the following differences from sc convention[1]: - lr is to be volatile across scv calls. This is necessary because the scv instruction clobbers lr. From previous discussion, this should be possible to deal with in GCC clobbers and CFI. - cr1 and cr5-cr7 are volatile. This matches the C ABI and would allow the kernel system call exit to avoid restoring the volatile cr registers (although we probably still would anyway to avoid information leaks). - Error handling: The consensus among kernel, glibc, and musl is to move to using negative return values in r3 rather than CR0[SO]=1 to indicate error, which matches most other architectures, and is closer to a function call. Notes ----- - r0,r4-r8 are documented as volatile in the ABI, but the kernel patch as submitted currently preserves them. This is to leave room for deciding which way to go with these. Some small benefit was found by preserving them[1] but I'm not convinced it's worth deviating from the C function call ABI just for this. Release code should follow the ABI. Previous discussions: https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/208691.html https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/209268.html [1] https://github.com/torvalds/linux/blob/master/Documentation/powerpc/syscall64-abi.rst [2] https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-April/209263.html The following patches to add scv support to Linux are posted to https://lists.ozlabs.org/pipermail/linuxppc-dev/ Nicholas Piggin (2): powerpc/64s/exception: treat NIA below __end_interrupts as soft-masked powerpc/64s: system call support for scv/rfscv instructions Thanks, Nick -- 2.23.0 ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] [PATCH 1/2] powerpc/64s/exception: treat NIA below __end_interrupts as soft-masked 2020-06-11 8:12 [musl] Linux powerpc new system call instruction and ABI Nicholas Piggin @ 2020-06-11 8:12 ` Nicholas Piggin 2020-07-24 13:25 ` [musl] " Michael Ellerman 2020-06-11 8:12 ` [musl] [PATCH 2/2] powerpc/64s: system call support for scv/rfscv instructions Nicholas Piggin ` (2 subsequent siblings) 3 siblings, 1 reply; 42+ messages in thread From: Nicholas Piggin @ 2020-06-11 8:12 UTC (permalink / raw) To: linuxppc-dev; +Cc: Nicholas Piggin, musl, libc-dev, linux-api The scv instruction causes an interrupt which can enter the kernel with MSR[EE]=1, thus allowing interrupts to hit at any time. These must not be taken as normal interrupts, because they come from MSR[PR]=0 context, and yet the kernel stack is not yet set up and r13 is not set to the PACA). Treat this as a soft-masked interrupt regardless of the soft masked state. This does not affect behaviour yet, because currently all interrupts are taken with MSR[EE]=0. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> --- arch/powerpc/kernel/exceptions-64s.S | 27 ++++++++++++++++++++++++--- 1 file changed, 24 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index e70ebb5c318c..388e34665b4a 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -508,8 +508,24 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real) .macro __GEN_COMMON_BODY name .if IMASK + .if ! ISTACK + .error "No support for masked interrupt to use custom stack" + .endif + + /* If coming from user, skip soft-mask tests. */ + andi. r10,r12,MSR_PR + bne 2f + + /* Kernel code running below __end_interrupts is implicitly + * soft-masked */ + LOAD_HANDLER(r10, __end_interrupts) + cmpld r11,r10 + li r10,IMASK + blt- 1f + + /* Test the soft mask state against our interrupt's bit */ lbz r10,PACAIRQSOFTMASK(r13) - andi. r10,r10,IMASK +1: andi. r10,r10,IMASK /* Associate vector numbers with bits in paca->irq_happened */ .if IVEC == 0x500 || IVEC == 0xea0 li r10,PACA_IRQ_EE @@ -540,7 +556,7 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real) .if ISTACK andi. r10,r12,MSR_PR /* See if coming from user */ - mr r10,r1 /* Save r1 */ +2: mr r10,r1 /* Save r1 */ subi r1,r1,INT_FRAME_SIZE /* alloc frame on kernel stack */ beq- 100f ld r1,PACAKSAVE(r13) /* kernel stack to use */ @@ -2838,7 +2854,8 @@ masked_interrupt: ld r10,PACA_EXGEN+EX_R10(r13) ld r11,PACA_EXGEN+EX_R11(r13) ld r12,PACA_EXGEN+EX_R12(r13) - /* returns to kernel where r13 must be set up, so don't restore it */ + ld r13,PACA_EXGEN+EX_R13(r13) + /* May return to masked low address where r13 is not set up */ .if \hsrr HRFI_TO_KERNEL .else @@ -2997,6 +3014,10 @@ EXC_COMMON_BEGIN(ppc64_runlatch_on_trampoline) USE_FIXED_SECTION(virt_trampolines) /* + * All code below __end_interrupts is treated as soft-masked. If + * any code runs here with MSR[EE]=1, it must then cope with pending + * soft interrupt being raised (i.e., by ensuring it is replayed). + * * The __end_interrupts marker must be past the out-of-line (OOL) * handlers, so that they are copied to real address 0x100 when running * a relocatable kernel. This ensures they can be reached from the short -- 2.23.0 ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: [PATCH 1/2] powerpc/64s/exception: treat NIA below __end_interrupts as soft-masked 2020-06-11 8:12 ` [musl] [PATCH 1/2] powerpc/64s/exception: treat NIA below __end_interrupts as soft-masked Nicholas Piggin @ 2020-07-24 13:25 ` Michael Ellerman 0 siblings, 0 replies; 42+ messages in thread From: Michael Ellerman @ 2020-07-24 13:25 UTC (permalink / raw) To: linuxppc-dev, Nicholas Piggin; +Cc: musl, linux-api, libc-dev On Thu, 11 Jun 2020 18:12:02 +1000, Nicholas Piggin wrote: > The scv instruction causes an interrupt which can enter the kernel with > MSR[EE]=1, thus allowing interrupts to hit at any time. These must not > be taken as normal interrupts, because they come from MSR[PR]=0 context, > and yet the kernel stack is not yet set up and r13 is not set to the > PACA). > > Treat this as a soft-masked interrupt regardless of the soft masked > state. This does not affect behaviour yet, because currently all > interrupts are taken with MSR[EE]=0. Applied to powerpc/next. [1/2] powerpc/64s/exception: treat NIA below __end_interrupts as soft-masked https://git.kernel.org/powerpc/c/b2dc2977cba48990df45e0a96150663d4f342700 [2/2] powerpc/64s: system call support for scv/rfscv instructions https://git.kernel.org/powerpc/c/7fa95f9adaee7e5cbb195d3359741120829e488b cheers ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] [PATCH 2/2] powerpc/64s: system call support for scv/rfscv instructions 2020-06-11 8:12 [musl] Linux powerpc new system call instruction and ABI Nicholas Piggin 2020-06-11 8:12 ` [musl] [PATCH 1/2] powerpc/64s/exception: treat NIA below __end_interrupts as soft-masked Nicholas Piggin @ 2020-06-11 8:12 ` Nicholas Piggin 2020-07-23 6:47 ` [musl] " Michael Ellerman 2020-06-11 21:02 ` [musl] Re: Linux powerpc new system call instruction and ABI Segher Boessenkool 2021-05-18 23:13 ` Dmitry V. Levin 3 siblings, 1 reply; 42+ messages in thread From: Nicholas Piggin @ 2020-06-11 8:12 UTC (permalink / raw) To: linuxppc-dev; +Cc: Nicholas Piggin, musl, libc-dev, linux-api Add support for the scv instruction on POWER9 and later CPUs. For now this implements the zeroth scv vector 'scv 0', as identical to 'sc' system calls, with the exception that lr is not preserved, nor are volatile cr registers, and error is not indicated with CR0[SO], but by returning a negative errno. rfscv is implemented to return from scv type system calls. It can not be used to return from sc system calls because those are defined to preserve lr. getpid syscall throughput on POWER9 is improved by 26% (428 to 318 cycles), largely due to reducing mtmsr and mtspr. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> --- Documentation/powerpc/syscall64-abi.rst | 42 ++++-- arch/powerpc/include/asm/asm-prototypes.h | 2 +- arch/powerpc/include/asm/exception-64s.h | 6 + arch/powerpc/include/asm/head-64.h | 2 +- arch/powerpc/include/asm/ppc-opcode.h | 2 + arch/powerpc/include/asm/ppc_asm.h | 2 + arch/powerpc/include/asm/ptrace.h | 7 +- arch/powerpc/include/asm/setup.h | 4 +- arch/powerpc/include/asm/sstep.h | 1 + arch/powerpc/kernel/cpu_setup_power.S | 10 +- arch/powerpc/kernel/cputable.c | 3 +- arch/powerpc/kernel/dt_cpu_ftrs.c | 1 + arch/powerpc/kernel/entry_64.S | 171 +++++++++++++++++++++- arch/powerpc/kernel/exceptions-64s.S | 123 +++++++++++++++- arch/powerpc/kernel/process.c | 10 +- arch/powerpc/kernel/setup_64.c | 5 +- arch/powerpc/kernel/signal.c | 19 ++- arch/powerpc/kernel/syscall_64.c | 32 +++- arch/powerpc/lib/sstep.c | 14 ++ arch/powerpc/platforms/pseries/setup.c | 8 +- arch/powerpc/xmon/xmon.c | 1 + 21 files changed, 421 insertions(+), 44 deletions(-) diff --git a/Documentation/powerpc/syscall64-abi.rst b/Documentation/powerpc/syscall64-abi.rst index e49f69f941b9..46caaadbb029 100644 --- a/Documentation/powerpc/syscall64-abi.rst +++ b/Documentation/powerpc/syscall64-abi.rst @@ -5,6 +5,15 @@ Power Architecture 64-bit Linux system call ABI syscall ======= +Invocation +---------- +The syscall is made with the sc instruction, and returns with execution +continuing at the instruction following the sc instruction. + +If PPC_FEATURE2_SCV appears in the AT_HWCAP2 ELF auxiliary vector, the +scv 0 instruction is an alternative that may provide better performance, +with some differences to calling sequence. + syscall calling sequence\ [1]_ matches the Power Architecture 64-bit ELF ABI specification C function calling sequence, including register preservation rules, with the following differences. @@ -12,16 +21,23 @@ rules, with the following differences. .. [1] Some syscalls (typically low-level management functions) may have different calling sequences (e.g., rt_sigreturn). -Parameters and return value ---------------------------- +Parameters +---------- The system call number is specified in r0. There is a maximum of 6 integer parameters to a syscall, passed in r3-r8. -Both a return value and a return error code are returned. cr0.SO is the return -error code, and r3 is the return value or error code. When cr0.SO is clear, -the syscall succeeded and r3 is the return value. When cr0.SO is set, the -syscall failed and r3 is the error code that generally corresponds to errno. +Return value +------------ +- For the sc instruction, both a value and an error condition are returned. + cr0.SO is the error condition, and r3 is the return value. When cr0.SO is + clear, the syscall succeeded and r3 is the return value. When cr0.SO is set, + the syscall failed and r3 is the error value (that normally corresponds to + errno). + +- For the scv 0 instruction, the return value indicates failure if it is + -4095..-1 (i.e., it is >= -MAX_ERRNO (-4095) as an unsigned comparison), + in which case the error value is the negated return value. Stack ----- @@ -34,22 +50,23 @@ Register preservation rules match the ELF ABI calling sequence with the following differences: =========== ============= ======================================== +--- For the sc instruction, differences with the ELF ABI --- r0 Volatile (System call number.) r3 Volatile (Parameter 1, and return value.) r4-r8 Volatile (Parameters 2-6.) -cr0 Volatile (cr0.SO is the return error condition) +cr0 Volatile (cr0.SO is the return error condition.) cr1, cr5-7 Nonvolatile lr Nonvolatile + +--- For the scv 0 instruction, differences with the ELF ABI --- +r0 Volatile (System call number.) +r3 Volatile (Parameter 1, and return value.) +r4-r8 Volatile (Parameters 2-6.) =========== ============= ======================================== All floating point and vector data registers as well as control and status registers are nonvolatile. -Invocation ----------- -The syscall is performed with the sc instruction, and returns with execution -continuing at the instruction following the sc instruction. - Transactional Memory -------------------- Syscall behavior can change if the processor is in transactional or suspended @@ -75,6 +92,7 @@ auxiliary vector. returning to the caller. This case is not well defined or supported, so this behavior should not be relied upon. +scv 0 syscalls will always behave as PPC_FEATURE2_HTM_NOSC. vsyscall ======== diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h index 7d81e86a1e5d..fb47bf5818c8 100644 --- a/arch/powerpc/include/asm/asm-prototypes.h +++ b/arch/powerpc/include/asm/asm-prototypes.h @@ -98,7 +98,7 @@ unsigned long __init early_init(unsigned long dt_ptr); void __init machine_init(u64 dt_ptr); #endif long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8, unsigned long r0, struct pt_regs *regs); -notrace unsigned long syscall_exit_prepare(unsigned long r3, struct pt_regs *regs); +notrace unsigned long syscall_exit_prepare(unsigned long r3, struct pt_regs *regs, long scv); notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned long msr); notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr); diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index 47bd4ea0837d..0c2fe7f042d1 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -123,6 +123,12 @@ hrfid; \ b hrfi_flush_fallback +#define RFSCV_TO_USER \ + STF_EXIT_BARRIER_SLOT; \ + RFI_FLUSH_SLOT; \ + RFSCV; \ + b rfscv_flush_fallback + #endif /* __ASSEMBLY__ */ #endif /* _ASM_POWERPC_EXCEPTION_H */ diff --git a/arch/powerpc/include/asm/head-64.h b/arch/powerpc/include/asm/head-64.h index 2dabcf668292..4cb9efa2eb21 100644 --- a/arch/powerpc/include/asm/head-64.h +++ b/arch/powerpc/include/asm/head-64.h @@ -128,7 +128,7 @@ end_##sname: .if ((start) % (size) != 0); \ .error "Fixed section exception vector misalignment"; \ .endif; \ - .if ((size) != 0x20) && ((size) != 0x80) && ((size) != 0x100); \ + .if ((size) != 0x20) && ((size) != 0x80) && ((size) != 0x100) && ((size) != 0x1000); \ .error "Fixed section exception vector bad size"; \ .endif; \ .if (start) < sname##_start; \ diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h index 2a39c716c343..b2bdc4de1292 100644 --- a/arch/powerpc/include/asm/ppc-opcode.h +++ b/arch/powerpc/include/asm/ppc-opcode.h @@ -257,6 +257,7 @@ #define PPC_INST_MFVSRD 0x7c000066 #define PPC_INST_MTVSRD 0x7c000166 #define PPC_INST_SC 0x44000002 +#define PPC_INST_SCV 0x44000001 #define PPC_INST_SLBFEE 0x7c0007a7 #define PPC_INST_SLBIA 0x7c0003e4 @@ -411,6 +412,7 @@ #define __PPC_CT(t) (((t) & 0x0f) << 21) #define __PPC_SPR(r) ((((r) & 0x1f) << 16) | ((((r) >> 5) & 0x1f) << 11)) #define __PPC_RC21 (0x1 << 10) +#define __PPC_LEV(l) (((l) & 0x7f) << 5) /* * Both low and high 16 bits are added as SIGNED additions, so if low 16 bits diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h index 6b03dff61a05..160f3bb77ea4 100644 --- a/arch/powerpc/include/asm/ppc_asm.h +++ b/arch/powerpc/include/asm/ppc_asm.h @@ -755,6 +755,8 @@ END_FTR_SECTION_NESTED(CPU_FTR_CELL_TB_BUG, CPU_FTR_CELL_TB_BUG, 96) #define N_SLINE 68 #define N_SO 100 +#define RFSCV .long 0x4c0000a4 + /* * Create an endian fixup trampoline * diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h index ac3970fff0d5..f194339cef3b 100644 --- a/arch/powerpc/include/asm/ptrace.h +++ b/arch/powerpc/include/asm/ptrace.h @@ -222,9 +222,14 @@ static inline void set_trap(struct pt_regs *regs, unsigned long val) regs->trap = (regs->trap & TRAP_FLAGS_MASK) | (val & ~TRAP_FLAGS_MASK); } +static inline bool trap_is_scv(struct pt_regs *regs) +{ + return (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && TRAP(regs) == 0x3000); +} + static inline bool trap_is_syscall(struct pt_regs *regs) { - return TRAP(regs) == 0xc00; + return (trap_is_scv(regs) || TRAP(regs) == 0xc00); } static inline bool trap_norestart(struct pt_regs *regs) diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h index 65676e2325b8..9efbddee2bca 100644 --- a/arch/powerpc/include/asm/setup.h +++ b/arch/powerpc/include/asm/setup.h @@ -30,12 +30,12 @@ void setup_panic(void); #define ARCH_PANIC_TIMEOUT 180 #ifdef CONFIG_PPC_PSERIES -extern void pseries_enable_reloc_on_exc(void); +extern bool pseries_enable_reloc_on_exc(void); extern void pseries_disable_reloc_on_exc(void); extern void pseries_big_endian_exceptions(void); extern void pseries_little_endian_exceptions(void); #else -static inline void pseries_enable_reloc_on_exc(void) {} +static inline bool pseries_enable_reloc_on_exc(void) { return false; } static inline void pseries_disable_reloc_on_exc(void) {} static inline void pseries_big_endian_exceptions(void) {} static inline void pseries_little_endian_exceptions(void) {} diff --git a/arch/powerpc/include/asm/sstep.h b/arch/powerpc/include/asm/sstep.h index 3b01c69a44aa..eaa4fb6c8960 100644 --- a/arch/powerpc/include/asm/sstep.h +++ b/arch/powerpc/include/asm/sstep.h @@ -40,6 +40,7 @@ enum instruction_type { CACHEOP, BARRIER, SYSCALL, + SYSCALL_VECTORED_0, MFMSR, MTMSR, RFI, diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S index efdcfa714106..86527d19348c 100644 --- a/arch/powerpc/kernel/cpu_setup_power.S +++ b/arch/powerpc/kernel/cpu_setup_power.S @@ -98,7 +98,7 @@ _GLOBAL(__setup_cpu_power10) _GLOBAL(__setup_cpu_power9) mflr r11 - bl __init_FSCR + bl __init_FSCR_power9 1: bl __init_PMU bl __init_hvmode_206 mtlr r11 @@ -128,7 +128,7 @@ _GLOBAL(__restore_cpu_power10) _GLOBAL(__restore_cpu_power9) mflr r11 - bl __init_FSCR + bl __init_FSCR_power9 1: bl __init_PMU mfmsr r3 rldicl. r0,r3,4,63 @@ -198,6 +198,12 @@ __init_FSCR_power10: mtspr SPRN_FSCR, r3 // fall through +__init_FSCR_power9: + mfspr r3, SPRN_FSCR + ori r3, r3, FSCR_SCV + mtspr SPRN_FSCR, r3 + // fall through + __init_FSCR: mfspr r3,SPRN_FSCR ori r3,r3,FSCR_TAR|FSCR_EBB diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c index b4066354f073..3d406a9626e8 100644 --- a/arch/powerpc/kernel/cputable.c +++ b/arch/powerpc/kernel/cputable.c @@ -120,7 +120,8 @@ extern void __restore_cpu_e6500(void); #define COMMON_USER2_POWER9 (COMMON_USER2_POWER8 | \ PPC_FEATURE2_ARCH_3_00 | \ PPC_FEATURE2_HAS_IEEE128 | \ - PPC_FEATURE2_DARN ) + PPC_FEATURE2_DARN | \ + PPC_FEATURE2_SCV) #define COMMON_USER_POWER10 COMMON_USER_POWER9 #define COMMON_USER2_POWER10 (COMMON_USER2_POWER9 | \ PPC_FEATURE2_ARCH_3_1 | \ diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c index 3a409517c031..50b2d544361e 100644 --- a/arch/powerpc/kernel/dt_cpu_ftrs.c +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c @@ -587,6 +587,7 @@ static struct dt_cpu_feature_match __initdata {"little-endian", feat_enable_le, CPU_FTR_REAL_LE}, {"smt", feat_enable_smt, 0}, {"interrupt-facilities", feat_enable, 0}, + {"system-call-vectored", feat_enable, 0}, {"timer-facilities", feat_enable, 0}, {"timer-facilities-v3", feat_enable, 0}, {"debug-facilities", feat_enable, 0}, diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 9d49338e0c85..223c4f008e63 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -64,15 +64,173 @@ exception_marker: .section ".text" .align 7 +#ifdef CONFIG_PPC_BOOK3S +.macro system_call_vectored name trapnr + .globl system_call_vectored_\name +system_call_vectored_\name: +_ASM_NOKPROBE_SYMBOL(system_call_vectored_\name) +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM +BEGIN_FTR_SECTION + extrdi. r10, r12, 1, (63-MSR_TS_T_LG) /* transaction active? */ + bne .Ltabort_syscall +END_FTR_SECTION_IFSET(CPU_FTR_TM) +#endif + INTERRUPT_TO_KERNEL + mr r10,r1 + ld r1,PACAKSAVE(r13) + std r10,0(r1) + std r11,_NIP(r1) + std r12,_MSR(r1) + std r0,GPR0(r1) + std r10,GPR1(r1) + std r2,GPR2(r1) + ld r2,PACATOC(r13) + mfcr r12 + li r11,0 + /* Can we avoid saving r3-r8 in common case? */ + std r3,GPR3(r1) + std r4,GPR4(r1) + std r5,GPR5(r1) + std r6,GPR6(r1) + std r7,GPR7(r1) + std r8,GPR8(r1) + /* Zero r9-r12, this should only be required when restoring all GPRs */ + std r11,GPR9(r1) + std r11,GPR10(r1) + std r11,GPR11(r1) + std r11,GPR12(r1) + std r9,GPR13(r1) + SAVE_NVGPRS(r1) + std r11,_XER(r1) + std r11,_LINK(r1) + std r11,_CTR(r1) + + li r11,\trapnr + std r11,_TRAP(r1) + std r12,_CCR(r1) + std r3,ORIG_GPR3(r1) + addi r10,r1,STACK_FRAME_OVERHEAD + ld r11,exception_marker@toc(r2) + std r11,-16(r10) /* "regshere" marker */ + + /* + * RECONCILE_IRQ_STATE without calling trace_hardirqs_off(), which + * would clobber syscall parameters. Also we always enter with IRQs + * enabled and nothing pending. system_call_exception() will call + * trace_hardirqs_off(). + * + * scv enters with MSR[EE]=1, so don't set PACA_IRQ_HARD_DIS. The + * entry vector already sets PACAIRQSOFTMASK to IRQS_ALL_DISABLED. + */ + + /* Calling convention has r9 = orig r0, r10 = regs */ + mr r9,r0 + bl system_call_exception + +.Lsyscall_vectored_\name\()_exit: + addi r4,r1,STACK_FRAME_OVERHEAD + li r5,1 /* scv */ + bl syscall_exit_prepare + + ld r2,_CCR(r1) + ld r4,_NIP(r1) + ld r5,_MSR(r1) + +BEGIN_FTR_SECTION + stdcx. r0,0,r1 /* to clear the reservation */ +END_FTR_SECTION_IFCLR(CPU_FTR_STCX_CHECKS_ADDRESS) + +BEGIN_FTR_SECTION + HMT_MEDIUM_LOW +END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) + + cmpdi r3,0 + bne .Lsyscall_vectored_\name\()_restore_regs + + /* rfscv returns with LR->NIA and CTR->MSR */ + mtlr r4 + mtctr r5 + + /* Could zero these as per ABI, but we may consider a stricter ABI + * which preserves these if libc implementations can benefit, so + * restore them for now until further measurement is done. */ + ld r0,GPR0(r1) + ld r4,GPR4(r1) + ld r5,GPR5(r1) + ld r6,GPR6(r1) + ld r7,GPR7(r1) + ld r8,GPR8(r1) + /* Zero volatile regs that may contain sensitive kernel data */ + li r9,0 + li r10,0 + li r11,0 + li r12,0 + mtspr SPRN_XER,r0 + + /* + * We don't need to restore AMR on the way back to userspace for KUAP. + * The value of AMR only matters while we're in the kernel. + */ + mtcr r2 + ld r2,GPR2(r1) + ld r3,GPR3(r1) + ld r13,GPR13(r1) + ld r1,GPR1(r1) + RFSCV_TO_USER + b . /* prevent speculative execution */ + +.Lsyscall_vectored_\name\()_restore_regs: + li r3,0 + mtmsrd r3,1 + mtspr SPRN_SRR0,r4 + mtspr SPRN_SRR1,r5 + + ld r3,_CTR(r1) + ld r4,_LINK(r1) + ld r5,_XER(r1) + + REST_NVGPRS(r1) + ld r0,GPR0(r1) + mtcr r2 + mtctr r3 + mtlr r4 + mtspr SPRN_XER,r5 + REST_10GPRS(2, r1) + REST_2GPRS(12, r1) + ld r1,GPR1(r1) + RFI_TO_USER +.endm + +system_call_vectored common 0x3000 +/* + * We instantiate another entry copy for the SIGILL variant, with TRAP=0x7ff0 + * which is tested by system_call_exception when r0 is -1 (as set by vector + * entry code). + */ +system_call_vectored sigill 0x7ff0 + + +/* + * Entered via kernel return set up by kernel/sstep.c, must match entry regs + */ + .globl system_call_vectored_emulate +system_call_vectored_emulate: +_ASM_NOKPROBE_SYMBOL(system_call_vectored_emulate) + li r10,IRQS_ALL_DISABLED + stb r10,PACAIRQSOFTMASK(r13) + b system_call_vectored_common +#endif + + .balign IFETCH_ALIGN_BYTES .globl system_call_common system_call_common: +_ASM_NOKPROBE_SYMBOL(system_call_common) #ifdef CONFIG_PPC_TRANSACTIONAL_MEM BEGIN_FTR_SECTION extrdi. r10, r12, 1, (63-MSR_TS_T_LG) /* transaction active? */ bne .Ltabort_syscall END_FTR_SECTION_IFSET(CPU_FTR_TM) #endif -_ASM_NOKPROBE_SYMBOL(system_call_common) mr r10,r1 ld r1,PACAKSAVE(r13) std r10,0(r1) @@ -138,6 +296,7 @@ END_BTB_FLUSH_SECTION .Lsyscall_exit: addi r4,r1,STACK_FRAME_OVERHEAD + li r5,0 /* !scv */ bl syscall_exit_prepare ld r2,_CCR(r1) @@ -224,10 +383,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) b . /* prevent speculative execution */ #endif +#ifdef CONFIG_PPC_BOOK3S +_GLOBAL(ret_from_fork_scv) + bl schedule_tail + REST_NVGPRS(r1) + li r3,0 /* fork() return value */ + b .Lsyscall_vectored_common_exit +#endif + _GLOBAL(ret_from_fork) bl schedule_tail REST_NVGPRS(r1) - li r3,0 + li r3,0 /* fork() return value */ b .Lsyscall_exit _GLOBAL(ret_from_kernel_thread) diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 388e34665b4a..f5f24e6c685f 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -756,6 +756,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP) * guarantee they will be delivered virtually. Some conditions (see the ISA) * cause exceptions to be delivered in real mode. * + * The scv instructions are a special case. They get a 0x3000 offset applied. + * scv exceptions have unique reentrancy properties, see below. + * * It's impossible to receive interrupts below 0x300 via AIL. * * KVM: None of the virtual exceptions are from the guest. Anything that @@ -765,8 +768,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP) * We layout physical memory as follows: * 0x0000 - 0x00ff : Secondary processor spin code * 0x0100 - 0x18ff : Real mode pSeries interrupt vectors - * 0x1900 - 0x3fff : Real mode trampolines - * 0x4000 - 0x58ff : Relon (IR=1,DR=1) mode pSeries interrupt vectors + * 0x1900 - 0x2fff : Real mode trampolines + * 0x3000 - 0x58ff : Relon (IR=1,DR=1) mode pSeries interrupt vectors * 0x5900 - 0x6fff : Relon mode trampolines * 0x7000 - 0x7fff : FWNMI data area * 0x8000 - .... : Common interrupt handlers, remaining early @@ -777,8 +780,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP) * vectors there. */ OPEN_FIXED_SECTION(real_vectors, 0x0100, 0x1900) -OPEN_FIXED_SECTION(real_trampolines, 0x1900, 0x4000) -OPEN_FIXED_SECTION(virt_vectors, 0x4000, 0x5900) +OPEN_FIXED_SECTION(real_trampolines, 0x1900, 0x3000) +OPEN_FIXED_SECTION(virt_vectors, 0x3000, 0x5900) OPEN_FIXED_SECTION(virt_trampolines, 0x5900, 0x7000) #ifdef CONFIG_PPC_POWERNV @@ -814,6 +817,77 @@ USE_FIXED_SECTION(real_vectors) .globl __start_interrupts __start_interrupts: +/** + * Interrupt 0x3000 - System Call Vectored Interrupt (syscall). + * This is a synchronous interrupt invoked with the "scv" instruction. The + * system call does not alter the HV bit, so it is directed to the OS. + * + * Handling: + * scv instructions enter the kernel without changing EE, RI, ME, or HV. + * In particular, this means we can take a maskable interrupt at any point + * in the scv handler, which is unlike any other interrupt. This is solved + * by treating the instruction addresses below __end_interrupts as being + * soft-masked. + * + * AIL-0 mode scv exceptions go to 0x17000-0x17fff, but we set AIL-3 and + * ensure scv is never executed with relocation off, which means AIL-0 + * should never happen. + * + * Before leaving the below __end_interrupts text, at least of the following + * must be true: + * - MSR[PR]=1 (i.e., return to userspace) + * - MSR_EE|MSR_RI is set (no reentrant exceptions) + * - Standard kernel environment is set up (stack, paca, etc) + * + * Call convention: + * + * syscall register convention is in Documentation/powerpc/syscall64-abi.rst + */ +EXC_VIRT_BEGIN(system_call_vectored, 0x3000, 0x1000) + /* SCV 0 */ + mr r9,r13 + GET_PACA(r13) + mflr r11 + mfctr r12 + li r10,IRQS_ALL_DISABLED + stb r10,PACAIRQSOFTMASK(r13) +#ifdef CONFIG_RELOCATABLE + b system_call_vectored_tramp +#else + b system_call_vectored_common +#endif + nop + + /* SCV 1 - 127 */ + .rept 127 + mr r9,r13 + GET_PACA(r13) + mflr r11 + mfctr r12 + li r10,IRQS_ALL_DISABLED + stb r10,PACAIRQSOFTMASK(r13) + li r0,-1 /* cause failure */ +#ifdef CONFIG_RELOCATABLE + b system_call_vectored_sigill_tramp +#else + b system_call_vectored_sigill +#endif + .endr +EXC_VIRT_END(system_call_vectored, 0x3000, 0x1000) + +#ifdef CONFIG_RELOCATABLE +TRAMP_VIRT_BEGIN(system_call_vectored_tramp) + __LOAD_HANDLER(r10, system_call_vectored_common) + mtctr r10 + bctr + +TRAMP_VIRT_BEGIN(system_call_vectored_sigill_tramp) + __LOAD_HANDLER(r10, system_call_vectored_sigill) + mtctr r10 + bctr +#endif + + /* No virt vectors corresponding with 0x0..0x100 */ EXC_VIRT_NONE(0x4000, 0x100) @@ -2963,6 +3037,47 @@ TRAMP_REAL_BEGIN(hrfi_flush_fallback) GET_SCRATCH0(r13); hrfid +TRAMP_REAL_BEGIN(rfscv_flush_fallback) + /* system call volatile */ + mr r7,r13 + GET_PACA(r13); + mr r8,r1 + ld r1,PACAKSAVE(r13) + mfctr r9 + ld r10,PACA_RFI_FLUSH_FALLBACK_AREA(r13) + ld r11,PACA_L1D_FLUSH_SIZE(r13) + srdi r11,r11,(7 + 3) /* 128 byte lines, unrolled 8x */ + mtctr r11 + DCBT_BOOK3S_STOP_ALL_STREAM_IDS(r11) /* Stop prefetch streams */ + + /* order ld/st prior to dcbt stop all streams with flushing */ + sync + + /* + * The load adresses are at staggered offsets within cachelines, + * which suits some pipelines better (on others it should not + * hurt). + */ +1: + ld r11,(0x80 + 8)*0(r10) + ld r11,(0x80 + 8)*1(r10) + ld r11,(0x80 + 8)*2(r10) + ld r11,(0x80 + 8)*3(r10) + ld r11,(0x80 + 8)*4(r10) + ld r11,(0x80 + 8)*5(r10) + ld r11,(0x80 + 8)*6(r10) + ld r11,(0x80 + 8)*7(r10) + addi r10,r10,0x80*8 + bdnz 1b + + mtctr r9 + li r9,0 + li r10,0 + li r11,0 + mr r1,r8 + mr r13,r7 + RFSCV + USE_TEXT_SECTION() MASKED_INTERRUPT MASKED_INTERRUPT hsrr=1 diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 7bb7faf84490..a0c2746f8c11 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -1596,6 +1596,7 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long usp, { struct pt_regs *childregs, *kregs; extern void ret_from_fork(void); + extern void ret_from_fork_scv(void); extern void ret_from_kernel_thread(void); void (*f)(void); unsigned long sp = (unsigned long)task_stack_page(p) + THREAD_SIZE; @@ -1632,7 +1633,9 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long usp, if (usp) childregs->gpr[1] = usp; p->thread.regs = childregs; - childregs->gpr[3] = 0; /* Result from fork() */ + /* 64s sets this in ret_from_fork */ + if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64)) + childregs->gpr[3] = 0; /* Result from fork() */ if (clone_flags & CLONE_SETTLS) { if (!is_32bit_task()) childregs->gpr[13] = tls; @@ -1640,7 +1643,10 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long usp, childregs->gpr[2] = tls; } - f = ret_from_fork; + if (trap_is_scv(regs)) + f = ret_from_fork_scv; + else + f = ret_from_fork; } childregs->msr &= ~(MSR_FP|MSR_VEC|MSR_VSX); sp -= STACK_FRAME_OVERHEAD; diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index 0ba1ed77dc68..6be430107c6f 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -196,7 +196,10 @@ static void __init configure_exceptions(void) /* Under a PAPR hypervisor, we need hypercalls */ if (firmware_has_feature(FW_FEATURE_SET_MODE)) { /* Enable AIL if possible */ - pseries_enable_reloc_on_exc(); + if (!pseries_enable_reloc_on_exc()) { + init_task.thread.fscr &= ~FSCR_SCV; + cur_cpu_spec->cpu_user_features2 &= ~PPC_FEATURE2_SCV; + } /* * Tell the hypervisor that we want our exceptions to diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c index b4143b6ff093..d15a98c758b8 100644 --- a/arch/powerpc/kernel/signal.c +++ b/arch/powerpc/kernel/signal.c @@ -205,8 +205,14 @@ static void check_syscall_restart(struct pt_regs *regs, struct k_sigaction *ka, return; /* error signalled ? */ - if (!(regs->ccr & 0x10000000)) + if (trap_is_scv(regs)) { + /* 32-bit compat mode sign extend? */ + if (!IS_ERR_VALUE(ret)) + return; + ret = -ret; + } else if (!(regs->ccr & 0x10000000)) { return; + } switch (ret) { case ERESTART_RESTARTBLOCK: @@ -239,9 +245,14 @@ static void check_syscall_restart(struct pt_regs *regs, struct k_sigaction *ka, regs->nip -= 4; regs->result = 0; } else { - regs->result = -EINTR; - regs->gpr[3] = EINTR; - regs->ccr |= 0x10000000; + if (trap_is_scv(regs)) { + regs->result = -EINTR; + regs->gpr[3] = -EINTR; + } else { + regs->result = -EINTR; + regs->gpr[3] = EINTR; + regs->ccr |= 0x10000000; + } } } diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c index 79edba3ab312..a783fd324cb0 100644 --- a/arch/powerpc/kernel/syscall_64.c +++ b/arch/powerpc/kernel/syscall_64.c @@ -60,6 +60,11 @@ notrace long system_call_exception(long r3, long r4, long r5, local_irq_enable(); if (unlikely(current_thread_info()->flags & _TIF_SYSCALL_DOTRACE)) { + if (unlikely(regs->trap == 0x7ff0)) { + /* Unsupported scv vector */ + _exception(SIGILL, regs, ILL_ILLOPC, regs->nip); + return regs->gpr[3]; + } /* * We use the return value of do_syscall_trace_enter() as the * syscall number. If the syscall was rejected for any reason @@ -78,6 +83,11 @@ notrace long system_call_exception(long r3, long r4, long r5, r8 = regs->gpr[8]; } else if (unlikely(r0 >= NR_syscalls)) { + if (unlikely(regs->trap == 0x7ff0)) { + /* Unsupported scv vector */ + _exception(SIGILL, regs, ILL_ILLOPC, regs->nip); + return regs->gpr[3]; + } return -ENOSYS; } @@ -105,16 +115,20 @@ notrace long system_call_exception(long r3, long r4, long r5, * local irqs must be disabled. Returns false if the caller must re-enable * them, check for new work, and try again. */ -static notrace inline bool prep_irq_for_enabled_exit(void) +static notrace inline bool prep_irq_for_enabled_exit(bool clear_ri) { /* This must be done with RI=1 because tracing may touch vmaps */ trace_hardirqs_on(); /* This pattern matches prep_irq_for_idle */ - __hard_EE_RI_disable(); + if (clear_ri) + __hard_EE_RI_disable(); + else + __hard_irq_disable(); if (unlikely(lazy_irq_pending_nocheck())) { /* Took an interrupt, may have more exit work to do. */ - __hard_RI_enable(); + if (clear_ri) + __hard_RI_enable(); trace_hardirqs_off(); local_paca->irq_happened |= PACA_IRQ_HARD_DIS; @@ -136,7 +150,8 @@ static notrace inline bool prep_irq_for_enabled_exit(void) * because RI=0 and soft mask state is "unreconciled", so it is marked notrace. */ notrace unsigned long syscall_exit_prepare(unsigned long r3, - struct pt_regs *regs) + struct pt_regs *regs, + long scv) { unsigned long *ti_flagsp = ¤t_thread_info()->flags; unsigned long ti_flags; @@ -151,7 +166,7 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3, ti_flags = *ti_flagsp; - if (unlikely(r3 >= (unsigned long)-MAX_ERRNO)) { + if (unlikely(r3 >= (unsigned long)-MAX_ERRNO) && !scv) { if (likely(!(ti_flags & (_TIF_NOERROR | _TIF_RESTOREALL)))) { r3 = -r3; regs->ccr |= 0x10000000; /* Set SO bit in CR */ @@ -211,7 +226,8 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3, } } - if (unlikely(!prep_irq_for_enabled_exit())) { + /* scv need not set RI=0 because SRRs are not used */ + if (unlikely(!prep_irq_for_enabled_exit(!scv))) { local_irq_enable(); goto again; } @@ -282,7 +298,7 @@ notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned } } - if (unlikely(!prep_irq_for_enabled_exit())) { + if (unlikely(!prep_irq_for_enabled_exit(true))) { local_irq_enable(); local_irq_disable(); goto again; @@ -345,7 +361,7 @@ notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsign } } - if (unlikely(!prep_irq_for_enabled_exit())) { + if (unlikely(!prep_irq_for_enabled_exit(true))) { /* * Can't local_irq_restore to replay if we were in * interrupt context. Must replay directly. diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c index 5abe98216dc2..161bfccbc309 100644 --- a/arch/powerpc/lib/sstep.c +++ b/arch/powerpc/lib/sstep.c @@ -16,6 +16,7 @@ #include <asm/disassemble.h> extern char system_call_common[]; +extern char system_call_vectored_emulate[]; #ifdef CONFIG_PPC64 /* Bits in SRR1 that are copied from MSR */ @@ -1236,6 +1237,9 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs, case 17: /* sc */ if ((word & 0xfe2) == 2) op->type = SYSCALL; + else if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && + (word & 0xfe3) == 1) + op->type = SYSCALL_VECTORED_0; else op->type = UNKNOWN; return 0; @@ -3378,6 +3382,16 @@ int emulate_step(struct pt_regs *regs, struct ppc_inst instr) regs->msr = MSR_KERNEL; return 1; + case SYSCALL_VECTORED_0: /* scv 0 */ + regs->gpr[9] = regs->gpr[13]; + regs->gpr[10] = MSR_KERNEL; + regs->gpr[11] = regs->nip + 4; + regs->gpr[12] = regs->msr & MSR_MASK; + regs->gpr[13] = (unsigned long) get_paca(); + regs->nip = (unsigned long) &system_call_vectored_emulate; + regs->msr = MSR_KERNEL; + return 1; + case RFI: return -1; #endif diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index 2db8469e475f..8c85466e0dd8 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -358,7 +358,7 @@ static void pseries_lpar_idle(void) * to ever be a problem in practice we can move this into a kernel thread to * finish off the process later in boot. */ -void pseries_enable_reloc_on_exc(void) +bool pseries_enable_reloc_on_exc(void) { long rc; unsigned int delay, total_delay = 0; @@ -369,11 +369,13 @@ void pseries_enable_reloc_on_exc(void) if (rc == H_P2) { pr_info("Relocation on exceptions not" " supported\n"); + return false; } else if (rc != H_SUCCESS) { pr_warn("Unable to enable relocation" " on exceptions: %ld\n", rc); + return false; } - break; + return true; } delay = get_longbusy_msecs(rc); @@ -382,7 +384,7 @@ void pseries_enable_reloc_on_exc(void) pr_warn("Warning: Giving up waiting to enable " "relocation on exceptions (%u msec)!\n", total_delay); - return; + return false; } mdelay(delay); diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index 7efe4bc3ccf6..3203c3606737 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -1593,6 +1593,7 @@ const char *getvecname(unsigned long vec) case 0x1300: ret = "(Instruction Breakpoint)"; break; case 0x1500: ret = "(Denormalisation)"; break; case 0x1700: ret = "(Altivec Assist)"; break; + case 0x3000: ret = "(System Call Vectored)"; break; default: ret = ""; } return ret; -- 2.23.0 ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: [PATCH 2/2] powerpc/64s: system call support for scv/rfscv instructions 2020-06-11 8:12 ` [musl] [PATCH 2/2] powerpc/64s: system call support for scv/rfscv instructions Nicholas Piggin @ 2020-07-23 6:47 ` Michael Ellerman 2020-07-23 16:48 ` Christophe Leroy 0 siblings, 1 reply; 42+ messages in thread From: Michael Ellerman @ 2020-07-23 6:47 UTC (permalink / raw) To: Nicholas Piggin, linuxppc-dev; +Cc: libc-dev, musl, Nicholas Piggin, linux-api Nicholas Piggin <npiggin@gmail.com> writes: > diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h > index 2a39c716c343..b2bdc4de1292 100644 > --- a/arch/powerpc/include/asm/ppc-opcode.h > +++ b/arch/powerpc/include/asm/ppc-opcode.h > @@ -257,6 +257,7 @@ > #define PPC_INST_MFVSRD 0x7c000066 > #define PPC_INST_MTVSRD 0x7c000166 > #define PPC_INST_SC 0x44000002 > +#define PPC_INST_SCV 0x44000001 ... > @@ -411,6 +412,7 @@ ... > +#define __PPC_LEV(l) (((l) & 0x7f) << 5) These conflicted and didn't seem to be used so I dropped them. > diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c > index 5abe98216dc2..161bfccbc309 100644 > --- a/arch/powerpc/lib/sstep.c > +++ b/arch/powerpc/lib/sstep.c > @@ -3378,6 +3382,16 @@ int emulate_step(struct pt_regs *regs, struct ppc_inst instr) > regs->msr = MSR_KERNEL; > return 1; > > + case SYSCALL_VECTORED_0: /* scv 0 */ > + regs->gpr[9] = regs->gpr[13]; > + regs->gpr[10] = MSR_KERNEL; > + regs->gpr[11] = regs->nip + 4; > + regs->gpr[12] = regs->msr & MSR_MASK; > + regs->gpr[13] = (unsigned long) get_paca(); > + regs->nip = (unsigned long) &system_call_vectored_emulate; > + regs->msr = MSR_KERNEL; > + return 1; > + This broke the ppc64e build: ld: arch/powerpc/lib/sstep.o:(.toc+0x0): undefined reference to `system_call_vectored_emulate' make[1]: *** [/home/michael/linux/Makefile:1139: vmlinux] Error 1 I wrapped it in #ifdef CONFIG_PPC64_BOOK3S. cheers ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: [PATCH 2/2] powerpc/64s: system call support for scv/rfscv instructions 2020-07-23 6:47 ` [musl] " Michael Ellerman @ 2020-07-23 16:48 ` Christophe Leroy 2020-07-24 10:45 ` Michael Ellerman 0 siblings, 1 reply; 42+ messages in thread From: Christophe Leroy @ 2020-07-23 16:48 UTC (permalink / raw) To: Michael Ellerman; +Cc: linux-api, musl, libc-dev, linuxppc-dev, Nicholas Piggin Michael Ellerman <mpe@ellerman.id.au> a écrit : > Nicholas Piggin <npiggin@gmail.com> writes: >> diff --git a/arch/powerpc/include/asm/ppc-opcode.h >> b/arch/powerpc/include/asm/ppc-opcode.h >> index 2a39c716c343..b2bdc4de1292 100644 >> --- a/arch/powerpc/include/asm/ppc-opcode.h >> +++ b/arch/powerpc/include/asm/ppc-opcode.h >> @@ -257,6 +257,7 @@ >> #define PPC_INST_MFVSRD 0x7c000066 >> #define PPC_INST_MTVSRD 0x7c000166 >> #define PPC_INST_SC 0x44000002 >> +#define PPC_INST_SCV 0x44000001 > ... >> @@ -411,6 +412,7 @@ > ... >> +#define __PPC_LEV(l) (((l) & 0x7f) << 5) > > These conflicted and didn't seem to be used so I dropped them. > >> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c >> index 5abe98216dc2..161bfccbc309 100644 >> --- a/arch/powerpc/lib/sstep.c >> +++ b/arch/powerpc/lib/sstep.c >> @@ -3378,6 +3382,16 @@ int emulate_step(struct pt_regs *regs, >> struct ppc_inst instr) >> regs->msr = MSR_KERNEL; >> return 1; >> >> + case SYSCALL_VECTORED_0: /* scv 0 */ >> + regs->gpr[9] = regs->gpr[13]; >> + regs->gpr[10] = MSR_KERNEL; >> + regs->gpr[11] = regs->nip + 4; >> + regs->gpr[12] = regs->msr & MSR_MASK; >> + regs->gpr[13] = (unsigned long) get_paca(); >> + regs->nip = (unsigned long) &system_call_vectored_emulate; >> + regs->msr = MSR_KERNEL; >> + return 1; >> + > > This broke the ppc64e build: > > ld: arch/powerpc/lib/sstep.o:(.toc+0x0): undefined reference to > `system_call_vectored_emulate' > make[1]: *** [/home/michael/linux/Makefile:1139: vmlinux] Error 1 > > I wrapped it in #ifdef CONFIG_PPC64_BOOK3S. You mean CONFIG_PPC_BOOK3S_64 ? Christophe > > cheers ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: [PATCH 2/2] powerpc/64s: system call support for scv/rfscv instructions 2020-07-23 16:48 ` Christophe Leroy @ 2020-07-24 10:45 ` Michael Ellerman 0 siblings, 0 replies; 42+ messages in thread From: Michael Ellerman @ 2020-07-24 10:45 UTC (permalink / raw) To: Christophe Leroy; +Cc: linux-api, musl, libc-dev, linuxppc-dev, Nicholas Piggin Christophe Leroy <christophe.leroy@csgroup.eu> writes: > Michael Ellerman <mpe@ellerman.id.au> a écrit : > >> Nicholas Piggin <npiggin@gmail.com> writes: >>> diff --git a/arch/powerpc/include/asm/ppc-opcode.h >>> b/arch/powerpc/include/asm/ppc-opcode.h >>> index 2a39c716c343..b2bdc4de1292 100644 >>> --- a/arch/powerpc/include/asm/ppc-opcode.h >>> +++ b/arch/powerpc/include/asm/ppc-opcode.h >>> @@ -257,6 +257,7 @@ >>> #define PPC_INST_MFVSRD 0x7c000066 >>> #define PPC_INST_MTVSRD 0x7c000166 >>> #define PPC_INST_SC 0x44000002 >>> +#define PPC_INST_SCV 0x44000001 >> ... >>> @@ -411,6 +412,7 @@ >> ... >>> +#define __PPC_LEV(l) (((l) & 0x7f) << 5) >> >> These conflicted and didn't seem to be used so I dropped them. >> >>> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c >>> index 5abe98216dc2..161bfccbc309 100644 >>> --- a/arch/powerpc/lib/sstep.c >>> +++ b/arch/powerpc/lib/sstep.c >>> @@ -3378,6 +3382,16 @@ int emulate_step(struct pt_regs *regs, >>> struct ppc_inst instr) >>> regs->msr = MSR_KERNEL; >>> return 1; >>> >>> + case SYSCALL_VECTORED_0: /* scv 0 */ >>> + regs->gpr[9] = regs->gpr[13]; >>> + regs->gpr[10] = MSR_KERNEL; >>> + regs->gpr[11] = regs->nip + 4; >>> + regs->gpr[12] = regs->msr & MSR_MASK; >>> + regs->gpr[13] = (unsigned long) get_paca(); >>> + regs->nip = (unsigned long) &system_call_vectored_emulate; >>> + regs->msr = MSR_KERNEL; >>> + return 1; >>> + >> >> This broke the ppc64e build: >> >> ld: arch/powerpc/lib/sstep.o:(.toc+0x0): undefined reference to >> `system_call_vectored_emulate' >> make[1]: *** [/home/michael/linux/Makefile:1139: vmlinux] Error 1 >> >> I wrapped it in #ifdef CONFIG_PPC64_BOOK3S. > > You mean CONFIG_PPC_BOOK3S_64 ? I hope so ... #### ## ####. Will send a fixup. Thanks for noticing. cheers ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2020-06-11 8:12 [musl] Linux powerpc new system call instruction and ABI Nicholas Piggin 2020-06-11 8:12 ` [musl] [PATCH 1/2] powerpc/64s/exception: treat NIA below __end_interrupts as soft-masked Nicholas Piggin 2020-06-11 8:12 ` [musl] [PATCH 2/2] powerpc/64s: system call support for scv/rfscv instructions Nicholas Piggin @ 2020-06-11 21:02 ` Segher Boessenkool 2020-06-14 9:26 ` Nicholas Piggin 2021-05-18 23:13 ` Dmitry V. Levin 3 siblings, 1 reply; 42+ messages in thread From: Segher Boessenkool @ 2020-06-11 21:02 UTC (permalink / raw) To: Nicholas Piggin; +Cc: linuxppc-dev, libc-dev, musl, linux-api Hi! On Thu, Jun 11, 2020 at 06:12:01PM +1000, Nicholas Piggin wrote: > Calling convention > ------------------ > The proposal is for scv 0 to provide the standard Linux system call ABI > with the following differences from sc convention[1]: > > - lr is to be volatile across scv calls. This is necessary because the > scv instruction clobbers lr. From previous discussion, this should be > possible to deal with in GCC clobbers and CFI. > > - cr1 and cr5-cr7 are volatile. This matches the C ABI and would allow the > kernel system call exit to avoid restoring the volatile cr registers > (although we probably still would anyway to avoid information leaks). > > - Error handling: The consensus among kernel, glibc, and musl is to move to > using negative return values in r3 rather than CR0[SO]=1 to indicate error, > which matches most other architectures, and is closer to a function call. What about cr0 then? Will it be volatile as well (exactly like for function calls)? > Notes > ----- > - r0,r4-r8 are documented as volatile in the ABI, but the kernel patch as > submitted currently preserves them. This is to leave room for deciding > which way to go with these. The kernel has to set it to *something* that doesn't leak information ;-) Segher ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2020-06-11 21:02 ` [musl] Re: Linux powerpc new system call instruction and ABI Segher Boessenkool @ 2020-06-14 9:26 ` Nicholas Piggin 0 siblings, 0 replies; 42+ messages in thread From: Nicholas Piggin @ 2020-06-14 9:26 UTC (permalink / raw) To: Segher Boessenkool; +Cc: libc-dev, linux-api, linuxppc-dev, musl Excerpts from Segher Boessenkool's message of June 12, 2020 7:02 am: > Hi! > > On Thu, Jun 11, 2020 at 06:12:01PM +1000, Nicholas Piggin wrote: >> Calling convention >> ------------------ >> The proposal is for scv 0 to provide the standard Linux system call ABI >> with the following differences from sc convention[1]: >> >> - lr is to be volatile across scv calls. This is necessary because the >> scv instruction clobbers lr. From previous discussion, this should be >> possible to deal with in GCC clobbers and CFI. >> >> - cr1 and cr5-cr7 are volatile. This matches the C ABI and would allow the >> kernel system call exit to avoid restoring the volatile cr registers >> (although we probably still would anyway to avoid information leaks). >> >> - Error handling: The consensus among kernel, glibc, and musl is to move to >> using negative return values in r3 rather than CR0[SO]=1 to indicate error, >> which matches most other architectures, and is closer to a function call. > > What about cr0 then? Will it be volatile as well (exactly like for > function calls)? Yes, same as for sc (except for SO bit). Which is a bit unclear in this section. >> Notes >> ----- >> - r0,r4-r8 are documented as volatile in the ABI, but the kernel patch as >> submitted currently preserves them. This is to leave room for deciding >> which way to go with these. > > The kernel has to set it to *something* that doesn't leak information ;-) For "sc" system calls these were defined as volatile (and used to just leak information), so now we just zero them. Thanks, Nick ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2020-06-11 8:12 [musl] Linux powerpc new system call instruction and ABI Nicholas Piggin ` (2 preceding siblings ...) 2020-06-11 21:02 ` [musl] Re: Linux powerpc new system call instruction and ABI Segher Boessenkool @ 2021-05-18 23:13 ` Dmitry V. Levin 2021-05-19 2:50 ` Nicholas Piggin 2021-05-19 7:33 ` Joakim Tjernlund 3 siblings, 2 replies; 42+ messages in thread From: Dmitry V. Levin @ 2021-05-18 23:13 UTC (permalink / raw) To: Nicholas Piggin, Michael Ellerman; +Cc: linuxppc-dev, musl, libc-dev, linux-api Hi, On Thu, Jun 11, 2020 at 06:12:01PM +1000, Nicholas Piggin wrote: [...] > - Error handling: The consensus among kernel, glibc, and musl is to move to > using negative return values in r3 rather than CR0[SO]=1 to indicate error, > which matches most other architectures, and is closer to a function call. Apparently, the patchset merged by commit v5.9-rc1~100^2~164 was incomplete: all functions defined in arch/powerpc/include/asm/ptrace.h and arch/powerpc/include/asm/syscall.h that use ccr are broken when scv is used. This includes syscall_get_error() and all its users including PTRACE_GET_SYSCALL_INFO API, which in turn makes strace unusable when scv is used. See also https://bugzilla.redhat.com/1929836 -- ldv ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-18 23:13 ` Dmitry V. Levin @ 2021-05-19 2:50 ` Nicholas Piggin 2021-05-19 5:01 ` Nicholas Piggin 2021-05-19 10:24 ` Dmitry V. Levin 2021-05-19 7:33 ` Joakim Tjernlund 1 sibling, 2 replies; 42+ messages in thread From: Nicholas Piggin @ 2021-05-19 2:50 UTC (permalink / raw) To: Dmitry V. Levin, Michael Ellerman Cc: libc-dev, linux-api, linuxppc-dev, musl, Matheus Castanho, libc-alpha Excerpts from Dmitry V. Levin's message of May 19, 2021 9:13 am: > Hi, > > On Thu, Jun 11, 2020 at 06:12:01PM +1000, Nicholas Piggin wrote: > [...] >> - Error handling: The consensus among kernel, glibc, and musl is to move to >> using negative return values in r3 rather than CR0[SO]=1 to indicate error, >> which matches most other architectures, and is closer to a function call. > > Apparently, the patchset merged by commit v5.9-rc1~100^2~164 was > incomplete: all functions defined in arch/powerpc/include/asm/ptrace.h and > arch/powerpc/include/asm/syscall.h that use ccr are broken when scv is used. > This includes syscall_get_error() and all its users including > PTRACE_GET_SYSCALL_INFO API, which in turn makes strace unusable > when scv is used. > > See also https://bugzilla.redhat.com/1929836 I see, thanks. Using latest strace from github.com, the attached kernel patch makes strace -k check results a lot greener. Some of the remaining failing tests look like this (I didn't look at all of them yet): signal(SIGUSR1, 0xfacefeeddeadbeef) = 0 (SIG_DFL) write(1, "signal(SIGUSR1, 0xfacefeeddeadbe"..., 50signal(SIGUSR1, 0xfacefeeddeadbeef) = 0 (SIG_DFL) ) = 50 signal(SIGUSR1, SIG_IGN) = 0xfacefeeddeadbeef write(2, "errno2name.c:461: unknown errno "..., 41errno2name.c:461: unknown errno 559038737) = 41 write(2, ": Unknown error 559038737\n", 26: Unknown error 559038737 ) = 26 exit_group(1) = ? I think the problem is glibc testing for -ve, but it should be comparing against -4095 (+cc Matheus) #define RET_SCV \ cmpdi r3,0; \ bgelr+; \ neg r3,r3; With this patch, I think the ptrace ABI should mostly be fixed. I think a problem remains with applications that look at system call return registers directly and have powerpc specific error cases. Those probably will just need to be updated unfortunately. Michael thought it might be possible to return an indication via ptrace somehow that the syscall is using a new ABI, so such apps can be updated to test for it. I don't know how that would be done. Thanks, Nick -- diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h index 9c9ab2746168..b476a685f066 100644 --- a/arch/powerpc/include/asm/ptrace.h +++ b/arch/powerpc/include/asm/ptrace.h @@ -19,6 +19,7 @@ #ifndef _ASM_POWERPC_PTRACE_H #define _ASM_POWERPC_PTRACE_H +#include <linux/err.h> #include <uapi/asm/ptrace.h> #include <asm/asm-const.h> @@ -152,25 +153,6 @@ extern unsigned long profile_pc(struct pt_regs *regs); long do_syscall_trace_enter(struct pt_regs *regs); void do_syscall_trace_leave(struct pt_regs *regs); -#define kernel_stack_pointer(regs) ((regs)->gpr[1]) -static inline int is_syscall_success(struct pt_regs *regs) -{ - return !(regs->ccr & 0x10000000); -} - -static inline long regs_return_value(struct pt_regs *regs) -{ - if (is_syscall_success(regs)) - return regs->gpr[3]; - else - return -regs->gpr[3]; -} - -static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc) -{ - regs->gpr[3] = rc; -} - #ifdef __powerpc64__ #define user_mode(regs) ((((regs)->msr) >> MSR_PR_LG) & 0x1) #else @@ -235,6 +217,31 @@ static __always_inline void set_trap_norestart(struct pt_regs *regs) regs->trap |= 0x1; } +#define kernel_stack_pointer(regs) ((regs)->gpr[1]) +static inline int is_syscall_success(struct pt_regs *regs) +{ + if (trap_is_scv(regs)) + return !IS_ERR_VALUE((unsigned long)regs->gpr[3]); + else + return !(regs->ccr & 0x10000000); +} + +static inline long regs_return_value(struct pt_regs *regs) +{ + if (trap_is_scv(regs)) + return regs->gpr[3]; + + if (is_syscall_success(regs)) + return regs->gpr[3]; + else + return -regs->gpr[3]; +} + +static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc) +{ + regs->gpr[3] = rc; +} + #define arch_has_single_step() (1) #define arch_has_block_step() (true) #define ARCH_HAS_USER_SINGLE_STEP_REPORT diff --git a/arch/powerpc/include/asm/syscall.h b/arch/powerpc/include/asm/syscall.h index fd1b518eed17..e8b40149bf7e 100644 --- a/arch/powerpc/include/asm/syscall.h +++ b/arch/powerpc/include/asm/syscall.h @@ -41,11 +41,20 @@ static inline void syscall_rollback(struct task_struct *task, static inline long syscall_get_error(struct task_struct *task, struct pt_regs *regs) { - /* - * If the system call failed, - * regs->gpr[3] contains a positive ERRORCODE. - */ - return (regs->ccr & 0x10000000UL) ? -regs->gpr[3] : 0; + if (trap_is_scv(regs)) { + unsigned long error = regs->gpr[3]; + + if (task_is_32bit(task)) + error = (long)(int)error; + + return IS_ERR_VALUE(error) ? error : 0; + } else { + /* + * If the system call failed, + * regs->gpr[3] contains a positive ERRORCODE. + */ + return (regs->ccr & 0x10000000UL) ? -regs->gpr[3] : 0; + } } static inline long syscall_get_return_value(struct task_struct *task, @@ -58,18 +67,22 @@ static inline void syscall_set_return_value(struct task_struct *task, struct pt_regs *regs, int error, long val) { - /* - * In the general case it's not obvious that we must deal with CCR - * here, as the syscall exit path will also do that for us. However - * there are some places, eg. the signal code, which check ccr to - * decide if the value in r3 is actually an error. - */ - if (error) { - regs->ccr |= 0x10000000L; - regs->gpr[3] = error; + if (trap_is_scv(regs)) { + regs->gpr[3] = (long) error ?: val; } else { - regs->ccr &= ~0x10000000L; - regs->gpr[3] = val; + /* + * In the general case it's not obvious that we must deal with + * CCR here, as the syscall exit path will also do that for us. + * However there are some places, eg. the signal code, which + * check ccr to decide if the value in r3 is actually an error. + */ + if (error) { + regs->ccr |= 0x10000000L; + regs->gpr[3] = error; + } else { + regs->ccr &= ~0x10000000L; + regs->gpr[3] = val; + } } } diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h index b4ec6c7dd72e..c4bb9f305eaf 100644 --- a/arch/powerpc/include/asm/thread_info.h +++ b/arch/powerpc/include/asm/thread_info.h @@ -165,8 +165,10 @@ static inline bool test_thread_local_flags(unsigned int flags) #ifdef CONFIG_COMPAT #define is_32bit_task() (test_thread_flag(TIF_32BIT)) +#define task_is_32bit(task) (test_tsk_thread_flag(task, TIF_32BIT)) #else #define is_32bit_task() (IS_ENABLED(CONFIG_PPC32)) +#define task_is_32bit(task) (IS_ENABLED(CONFIG_PPC32)) #endif #if defined(CONFIG_PPC64) ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-19 2:50 ` Nicholas Piggin @ 2021-05-19 5:01 ` Nicholas Piggin 2021-05-21 19:40 ` Matheus Castanho 2021-05-19 10:24 ` Dmitry V. Levin 1 sibling, 1 reply; 42+ messages in thread From: Nicholas Piggin @ 2021-05-19 5:01 UTC (permalink / raw) To: Dmitry V. Levin, Michael Ellerman Cc: libc-alpha, libc-dev, linux-api, linuxppc-dev, Matheus Castanho, musl Excerpts from Nicholas Piggin's message of May 19, 2021 12:50 pm: > Excerpts from Dmitry V. Levin's message of May 19, 2021 9:13 am: >> Hi, >> >> On Thu, Jun 11, 2020 at 06:12:01PM +1000, Nicholas Piggin wrote: >> [...] >>> - Error handling: The consensus among kernel, glibc, and musl is to move to >>> using negative return values in r3 rather than CR0[SO]=1 to indicate error, >>> which matches most other architectures, and is closer to a function call. >> >> Apparently, the patchset merged by commit v5.9-rc1~100^2~164 was >> incomplete: all functions defined in arch/powerpc/include/asm/ptrace.h and >> arch/powerpc/include/asm/syscall.h that use ccr are broken when scv is used. >> This includes syscall_get_error() and all its users including >> PTRACE_GET_SYSCALL_INFO API, which in turn makes strace unusable >> when scv is used. >> >> See also https://bugzilla.redhat.com/1929836 > > I see, thanks. Using latest strace from github.com, the attached kernel > patch makes strace -k check results a lot greener. > > Some of the remaining failing tests look like this (I didn't look at all > of them yet): > > signal(SIGUSR1, 0xfacefeeddeadbeef) = 0 (SIG_DFL) > write(1, "signal(SIGUSR1, 0xfacefeeddeadbe"..., 50signal(SIGUSR1, 0xfacefeeddeadbeef) = 0 (SIG_DFL) > ) = 50 > signal(SIGUSR1, SIG_IGN) = 0xfacefeeddeadbeef > write(2, "errno2name.c:461: unknown errno "..., 41errno2name.c:461: unknown errno 559038737) = 41 > write(2, ": Unknown error 559038737\n", 26: Unknown error 559038737 > ) = 26 > exit_group(1) = ? > > I think the problem is glibc testing for -ve, but it should be comparing > against -4095 (+cc Matheus) > > #define RET_SCV \ > cmpdi r3,0; \ > bgelr+; \ > neg r3,r3; This glibc patch at least gets that signal test working. Haven't run the full suite yet because of trouble making it work with a local glibc install... Thanks, Nick --- diff --git a/sysdeps/powerpc/powerpc64/sysdep.h b/sysdeps/powerpc/powerpc64/sysdep.h index c57bb1c05d..1ea4c3b917 100644 --- a/sysdeps/powerpc/powerpc64/sysdep.h +++ b/sysdeps/powerpc/powerpc64/sysdep.h @@ -398,8 +398,9 @@ LT_LABELSUFFIX(name,_name_end): ; \ #endif #define RET_SCV \ - cmpdi r3,0; \ - bgelr+; \ + li r9,-4095; \ + cmpld r3,r9; \ + bltlr+; \ neg r3,r3; #define RET_SC \ ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-19 5:01 ` Nicholas Piggin @ 2021-05-21 19:40 ` Matheus Castanho 2021-05-21 19:52 ` Florian Weimer 0 siblings, 1 reply; 42+ messages in thread From: Matheus Castanho @ 2021-05-21 19:40 UTC (permalink / raw) To: Nicholas Piggin Cc: Dmitry V. Levin, Michael Ellerman, libc-alpha, libc-dev, linux-api, linuxppc-dev, musl Nicholas Piggin <npiggin@gmail.com> writes: > Excerpts from Nicholas Piggin's message of May 19, 2021 12:50 pm: >> Excerpts from Dmitry V. Levin's message of May 19, 2021 9:13 am: >>> Hi, >>> >>> On Thu, Jun 11, 2020 at 06:12:01PM +1000, Nicholas Piggin wrote: >>> [...] >>>> - Error handling: The consensus among kernel, glibc, and musl is to move to >>>> using negative return values in r3 rather than CR0[SO]=1 to indicate error, >>>> which matches most other architectures, and is closer to a function call. >>> >>> Apparently, the patchset merged by commit v5.9-rc1~100^2~164 was >>> incomplete: all functions defined in arch/powerpc/include/asm/ptrace.h and >>> arch/powerpc/include/asm/syscall.h that use ccr are broken when scv is used. >>> This includes syscall_get_error() and all its users including >>> PTRACE_GET_SYSCALL_INFO API, which in turn makes strace unusable >>> when scv is used. >>> >>> See also https://bugzilla.redhat.com/1929836 >> >> I see, thanks. Using latest strace from github.com, the attached kernel >> patch makes strace -k check results a lot greener. >> >> Some of the remaining failing tests look like this (I didn't look at all >> of them yet): >> >> signal(SIGUSR1, 0xfacefeeddeadbeef) = 0 (SIG_DFL) >> write(1, "signal(SIGUSR1, 0xfacefeeddeadbe"..., 50signal(SIGUSR1, 0xfacefeeddeadbeef) = 0 (SIG_DFL) >> ) = 50 >> signal(SIGUSR1, SIG_IGN) = 0xfacefeeddeadbeef >> write(2, "errno2name.c:461: unknown errno "..., 41errno2name.c:461: unknown errno 559038737) = 41 >> write(2, ": Unknown error 559038737\n", 26: Unknown error 559038737 >> ) = 26 >> exit_group(1) = ? >> >> I think the problem is glibc testing for -ve, but it should be comparing >> against -4095 (+cc Matheus) >> >> #define RET_SCV \ >> cmpdi r3,0; \ >> bgelr+; \ >> neg r3,r3; > > This glibc patch at least gets that signal test working. Haven't run the > full suite yet because of trouble making it work with a local glibc > install... > > Thanks, > Nick > > --- > > diff --git a/sysdeps/powerpc/powerpc64/sysdep.h b/sysdeps/powerpc/powerpc64/sysdep.h > index c57bb1c05d..1ea4c3b917 100644 > --- a/sysdeps/powerpc/powerpc64/sysdep.h > +++ b/sysdeps/powerpc/powerpc64/sysdep.h > @@ -398,8 +398,9 @@ LT_LABELSUFFIX(name,_name_end): ; \ > #endif > > #define RET_SCV \ > - cmpdi r3,0; \ > - bgelr+; \ > + li r9,-4095; \ > + cmpld r3,r9; \ > + bltlr+; \ > neg r3,r3; > > #define RET_SC \ Hi Nick, I agree the current code is accepting more values as errors than it should. This change looks good to me. All glibc tests are passing with it. I also built strace and checked one of the failing tests against a glibc with your patch: ~/src/strace/tests$ uname -r 5.10.16-1-default ~/src/strace/tests$ /lib64/libc.so.6 GNU C Library (GNU libc) release release version 2.33 (git 9826b03b74). [...] ~/src/strace/tests$ ./signal.gen.test errno2name.c:461: unknown errno 559038737: Unknown error 559038737 signal.gen.test: failed test: ../signal failed with code 1 ~/src/strace/tests$ ./signal signal(SIGUSR1, SIG_IGN) = 0 (SIG_DFL) signal(SIGUSR1, SIG_DFL) = 0x1 (SIG_IGN) signal(SIGUSR1, 0xfacefeeddeadbeef) = 0 (SIG_DFL) errno2name.c:461: unknown errno 559038737: Unknown error 559038737 Running with glibc containing the patch: ~/src/strace/tests$ ~/build/glibc/testrun.sh ./signal signal(SIGUSR1, SIG_IGN) = 0 (SIG_DFL) signal(SIGUSR1, SIG_DFL) = 0x1 (SIG_IGN) signal(SIGUSR1, 0xfacefeeddeadbeef) = 0 (SIG_DFL) signal(SIGUSR1, SIG_IGN) = 0xfacefeeddeadbeef signal(-559038737, SIG_IGN) = -1 EINVAL (Invalid argument) +++ exited with 0 +++ If the patch below looks OK to you and no one objects, I'll commit it to glibc on Monday. Thanks, Matheus Castanho --- From: Nicholas Piggin <npiggin@gmail.com> Subject: [PATCH 1/1] powerpc: Fix handling of scv return error codes When using scv on templated ASM syscalls, current code interprets any negative return value as error, but the only valid error codes are in the range -4095..-1 according to the ABI. Reviewed-by: Matheus Castanho <msc@linux.ibm.com> --- sysdeps/powerpc/powerpc64/sysdep.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/sysdeps/powerpc/powerpc64/sysdep.h b/sysdeps/powerpc/powerpc64/sysdep.h index c57bb1c05d..1ea4c3b917 100644 --- a/sysdeps/powerpc/powerpc64/sysdep.h +++ b/sysdeps/powerpc/powerpc64/sysdep.h @@ -398,8 +398,9 @@ LT_LABELSUFFIX(name,_name_end): ; \ #endif #define RET_SCV \ - cmpdi r3,0; \ - bgelr+; \ + li r9,-4095; \ + cmpld r3,r9; \ + bltlr+; \ neg r3,r3; #define RET_SC \ -- 2.31.1 ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-21 19:40 ` Matheus Castanho @ 2021-05-21 19:52 ` Florian Weimer 2021-05-21 20:00 ` Matheus Castanho 0 siblings, 1 reply; 42+ messages in thread From: Florian Weimer @ 2021-05-21 19:52 UTC (permalink / raw) To: Matheus Castanho via Libc-alpha Cc: Nicholas Piggin, Matheus Castanho, musl, Dmitry V. Levin, linux-api, libc-dev, linuxppc-dev * Matheus Castanho via Libc-alpha: > From: Nicholas Piggin <npiggin@gmail.com> > Subject: [PATCH 1/1] powerpc: Fix handling of scv return error codes > > When using scv on templated ASM syscalls, current code interprets any > negative return value as error, but the only valid error codes are in > the range -4095..-1 according to the ABI. > > Reviewed-by: Matheus Castanho <msc@linux.ibm.com> Please reference bug 27892 in the commit message. I'd also appreciate a backport to the 2.33 release branch (where you need to add NEWS manually to add the bug reference). Thanks, Florian ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-21 19:52 ` Florian Weimer @ 2021-05-21 20:00 ` Matheus Castanho 2021-05-21 20:52 ` Dmitry V. Levin 0 siblings, 1 reply; 42+ messages in thread From: Matheus Castanho @ 2021-05-21 20:00 UTC (permalink / raw) To: Florian Weimer Cc: Matheus Castanho via Libc-alpha, Nicholas Piggin, musl, Dmitry V. Levin, linux-api, libc-dev, linuxppc-dev Florian Weimer <fweimer@redhat.com> writes: > * Matheus Castanho via Libc-alpha: > >> From: Nicholas Piggin <npiggin@gmail.com> >> Subject: [PATCH 1/1] powerpc: Fix handling of scv return error codes >> >> When using scv on templated ASM syscalls, current code interprets any >> negative return value as error, but the only valid error codes are in >> the range -4095..-1 according to the ABI. >> >> Reviewed-by: Matheus Castanho <msc@linux.ibm.com> > > Please reference bug 27892 in the commit message. I'd also appreciate a > backport to the 2.33 release branch (where you need to add NEWS manually > to add the bug reference). No problem. [BZ #27892] appended to the commit title. I'll make sure to backport to 2.33 as well. Thanks -- Matheus Castanho ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-21 20:00 ` Matheus Castanho @ 2021-05-21 20:52 ` Dmitry V. Levin 2021-05-24 12:11 ` Matheus Castanho 0 siblings, 1 reply; 42+ messages in thread From: Dmitry V. Levin @ 2021-05-21 20:52 UTC (permalink / raw) To: Matheus Castanho Cc: Florian Weimer, Matheus Castanho via Libc-alpha, Nicholas Piggin, musl, linux-api, libc-dev, linuxppc-dev On Fri, May 21, 2021 at 05:00:36PM -0300, Matheus Castanho wrote: > Florian Weimer <fweimer@redhat.com> writes: > > * Matheus Castanho via Libc-alpha: > >> From: Nicholas Piggin <npiggin@gmail.com> > >> Subject: [PATCH 1/1] powerpc: Fix handling of scv return error codes > >> > >> When using scv on templated ASM syscalls, current code interprets any > >> negative return value as error, but the only valid error codes are in > >> the range -4095..-1 according to the ABI. > >> > >> Reviewed-by: Matheus Castanho <msc@linux.ibm.com> > > > > Please reference bug 27892 in the commit message. I'd also appreciate a > > backport to the 2.33 release branch (where you need to add NEWS manually > > to add the bug reference). > > No problem. [BZ #27892] appended to the commit title. I'll make sure to > backport to 2.33 as well. Could you also mention in the commit message that the change fixes 'signal.gen.test' strace test where it was observed initially? -- ldv ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-21 20:52 ` Dmitry V. Levin @ 2021-05-24 12:11 ` Matheus Castanho 2021-05-24 20:33 ` Matheus Castanho 0 siblings, 1 reply; 42+ messages in thread From: Matheus Castanho @ 2021-05-24 12:11 UTC (permalink / raw) To: Dmitry V. Levin Cc: Florian Weimer, Matheus Castanho via Libc-alpha, Nicholas Piggin, musl, linux-api, libc-dev, linuxppc-dev Dmitry V. Levin <ldv@altlinux.org> writes: > On Fri, May 21, 2021 at 05:00:36PM -0300, Matheus Castanho wrote: >> Florian Weimer <fweimer@redhat.com> writes: >> > * Matheus Castanho via Libc-alpha: >> >> From: Nicholas Piggin <npiggin@gmail.com> >> >> Subject: [PATCH 1/1] powerpc: Fix handling of scv return error codes >> >> >> >> When using scv on templated ASM syscalls, current code interprets any >> >> negative return value as error, but the only valid error codes are in >> >> the range -4095..-1 according to the ABI. >> >> >> >> Reviewed-by: Matheus Castanho <msc@linux.ibm.com> >> > >> > Please reference bug 27892 in the commit message. I'd also appreciate a >> > backport to the 2.33 release branch (where you need to add NEWS manually >> > to add the bug reference). >> >> No problem. [BZ #27892] appended to the commit title. I'll make sure to >> backport to 2.33 as well. > > Could you also mention in the commit message that the change fixes > 'signal.gen.test' strace test where it was observed initially? Sure, no problem. I'll commit it later today. -- Matheus Castanho ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-24 12:11 ` Matheus Castanho @ 2021-05-24 20:33 ` Matheus Castanho 0 siblings, 0 replies; 42+ messages in thread From: Matheus Castanho @ 2021-05-24 20:33 UTC (permalink / raw) To: Matheus Castanho Cc: Dmitry V. Levin, Florian Weimer, Matheus Castanho via Libc-alpha, linux-api, musl, Nicholas Piggin, libc-dev, linuxppc-dev Matheus Castanho <msc@linux.ibm.com> writes: > Dmitry V. Levin <ldv@altlinux.org> writes: > >> On Fri, May 21, 2021 at 05:00:36PM -0300, Matheus Castanho wrote: >>> Florian Weimer <fweimer@redhat.com> writes: >>> > * Matheus Castanho via Libc-alpha: >>> >> From: Nicholas Piggin <npiggin@gmail.com> >>> >> Subject: [PATCH 1/1] powerpc: Fix handling of scv return error codes >>> >> >>> >> When using scv on templated ASM syscalls, current code interprets any >>> >> negative return value as error, but the only valid error codes are in >>> >> the range -4095..-1 according to the ABI. >>> >> >>> >> Reviewed-by: Matheus Castanho <msc@linux.ibm.com> >>> > >>> > Please reference bug 27892 in the commit message. I'd also appreciate a >>> > backport to the 2.33 release branch (where you need to add NEWS manually >>> > to add the bug reference). >>> >>> No problem. [BZ #27892] appended to the commit title. I'll make sure to >>> backport to 2.33 as well. >> >> Could you also mention in the commit message that the change fixes >> 'signal.gen.test' strace test where it was observed initially? > > Sure, no problem. I'll commit it later today. Since the patch falls into the less-than-15-LOC category and this is Nick's first contribution to glibc, looks like he doesn't need a copyright assignment. Pushed to master as 7de36744ee1325f35d3fe0ca079dd33c40b12267 Backported to 2.33 via commit 0ef0e6de7fdfa18328b09ba2afb4f0112d4bdab4 Thanks, Matheus Castanho ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-19 2:50 ` Nicholas Piggin 2021-05-19 5:01 ` Nicholas Piggin @ 2021-05-19 10:24 ` Dmitry V. Levin 2021-05-19 10:59 ` Nicholas Piggin 1 sibling, 1 reply; 42+ messages in thread From: Dmitry V. Levin @ 2021-05-19 10:24 UTC (permalink / raw) To: Nicholas Piggin Cc: Michael Ellerman, libc-dev, linux-api, linuxppc-dev, musl, Matheus Castanho, libc-alpha On Wed, May 19, 2021 at 12:50:24PM +1000, Nicholas Piggin wrote: [...] > With this patch, I think the ptrace ABI should mostly be fixed. I think > a problem remains with applications that look at system call return > registers directly and have powerpc specific error cases. Those probably > will just need to be updated unfortunately. Michael thought it might be > possible to return an indication via ptrace somehow that the syscall is > using a new ABI, so such apps can be updated to test for it. I don't > know how that would be done. Is there any sane way for these applications to handle the scv case? How can they tell that the scv semantics is being used for the given syscall invocation? Can this information be obtained e.g. from struct pt_regs? For example, in strace we have the following powerpc-specific code used for syscall tampering: $ cat src/linux/powerpc/set_error.c /* * Copyright (c) 2016-2021 The strace developers. * All rights reserved. * * SPDX-License-Identifier: LGPL-2.1-or-later */ static int arch_set_r3_ccr(struct tcb *tcp, const unsigned long r3, const unsigned long ccr_set, const unsigned long ccr_clear) { if (ptrace_syscall_info_is_valid() && upeek(tcp, sizeof(long) * PT_CCR, &ppc_regs.ccr)) return -1; const unsigned long old_ccr = ppc_regs.ccr; ppc_regs.gpr[3] = r3; ppc_regs.ccr |= ccr_set; ppc_regs.ccr &= ~ccr_clear; if (ppc_regs.ccr != old_ccr && upoke(tcp, sizeof(long) * PT_CCR, ppc_regs.ccr)) return -1; return upoke(tcp, sizeof(long) * (PT_R0 + 3), ppc_regs.gpr[3]); } static int arch_set_error(struct tcb *tcp) { return arch_set_r3_ccr(tcp, tcp->u_error, 0x10000000, 0); } static int arch_set_success(struct tcb *tcp) { return arch_set_r3_ccr(tcp, tcp->u_rval, 0, 0x10000000); } -- ldv ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-19 10:24 ` Dmitry V. Levin @ 2021-05-19 10:59 ` Nicholas Piggin 2021-05-19 12:39 ` Tulio Magno Quites Machado Filho 2021-05-19 13:26 ` Dmitry V. Levin 0 siblings, 2 replies; 42+ messages in thread From: Nicholas Piggin @ 2021-05-19 10:59 UTC (permalink / raw) To: Dmitry V. Levin Cc: libc-alpha, libc-dev, linux-api, linuxppc-dev, Michael Ellerman, Matheus Castanho, musl Excerpts from Dmitry V. Levin's message of May 19, 2021 8:24 pm: > On Wed, May 19, 2021 at 12:50:24PM +1000, Nicholas Piggin wrote: > [...] >> With this patch, I think the ptrace ABI should mostly be fixed. I think >> a problem remains with applications that look at system call return >> registers directly and have powerpc specific error cases. Those probably >> will just need to be updated unfortunately. Michael thought it might be >> possible to return an indication via ptrace somehow that the syscall is >> using a new ABI, so such apps can be updated to test for it. I don't >> know how that would be done. > > Is there any sane way for these applications to handle the scv case? > How can they tell that the scv semantics is being used for the given > syscall invocation? Can this information be obtained e.g. from struct > pt_regs? Not that I know of. Michael suggested there might be a way to add something. ptrace_syscall_info has some pad bytes, could we use one for flags bits and set a bit for "new system call ABI"? As a more hacky thing you could make a syscall with -1 and see how the error looks, and then assume all syscalls will be the same. Thanks, Nick > > For example, in strace we have the following powerpc-specific code used > for syscall tampering: > > $ cat src/linux/powerpc/set_error.c > /* > * Copyright (c) 2016-2021 The strace developers. > * All rights reserved. > * > * SPDX-License-Identifier: LGPL-2.1-or-later > */ > > static int > arch_set_r3_ccr(struct tcb *tcp, const unsigned long r3, > const unsigned long ccr_set, const unsigned long ccr_clear) > { > if (ptrace_syscall_info_is_valid() && > upeek(tcp, sizeof(long) * PT_CCR, &ppc_regs.ccr)) > return -1; > const unsigned long old_ccr = ppc_regs.ccr; > ppc_regs.gpr[3] = r3; > ppc_regs.ccr |= ccr_set; > ppc_regs.ccr &= ~ccr_clear; > if (ppc_regs.ccr != old_ccr && > upoke(tcp, sizeof(long) * PT_CCR, ppc_regs.ccr)) > return -1; > return upoke(tcp, sizeof(long) * (PT_R0 + 3), ppc_regs.gpr[3]); > } > > static int > arch_set_error(struct tcb *tcp) > { > return arch_set_r3_ccr(tcp, tcp->u_error, 0x10000000, 0); > } > > static int > arch_set_success(struct tcb *tcp) > { > return arch_set_r3_ccr(tcp, tcp->u_rval, 0, 0x10000000); > } > > > -- > ldv > ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-19 10:59 ` Nicholas Piggin @ 2021-05-19 12:39 ` Tulio Magno Quites Machado Filho 2021-05-19 13:26 ` Dmitry V. Levin 1 sibling, 0 replies; 42+ messages in thread From: Tulio Magno Quites Machado Filho @ 2021-05-19 12:39 UTC (permalink / raw) To: Nicholas Piggin, Dmitry V. Levin Cc: libc-alpha, musl, linux-api, libc-dev, linuxppc-dev Nicholas Piggin via Libc-alpha <libc-alpha@sourceware.org> writes: > As a more hacky thing you could make a syscall with -1 and see how > the error looks, and then assume all syscalls will be the same. I'm not sure this would work. Even in glibc, it's expected that early syscalls will use sc while scv is used later in the execution. -- Tulio Magno ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-19 10:59 ` Nicholas Piggin 2021-05-19 12:39 ` Tulio Magno Quites Machado Filho @ 2021-05-19 13:26 ` Dmitry V. Levin 2021-05-19 22:51 ` Nicholas Piggin 1 sibling, 1 reply; 42+ messages in thread From: Dmitry V. Levin @ 2021-05-19 13:26 UTC (permalink / raw) To: Nicholas Piggin, Michael Ellerman Cc: libc-alpha, libc-dev, linux-api, linuxppc-dev, Matheus Castanho, musl, Joakim Tjernlund On Wed, May 19, 2021 at 08:59:05PM +1000, Nicholas Piggin wrote: > Excerpts from Dmitry V. Levin's message of May 19, 2021 8:24 pm: > > On Wed, May 19, 2021 at 12:50:24PM +1000, Nicholas Piggin wrote: > > [...] > >> With this patch, I think the ptrace ABI should mostly be fixed. I think > >> a problem remains with applications that look at system call return > >> registers directly and have powerpc specific error cases. Those probably > >> will just need to be updated unfortunately. Michael thought it might be > >> possible to return an indication via ptrace somehow that the syscall is > >> using a new ABI, so such apps can be updated to test for it. I don't > >> know how that would be done. > > > > Is there any sane way for these applications to handle the scv case? > > How can they tell that the scv semantics is being used for the given > > syscall invocation? Can this information be obtained e.g. from struct > > pt_regs? > > Not that I know of. Michael suggested there might be a way to add > something. ptrace_syscall_info has some pad bytes, could > we use one for flags bits and set a bit for "new system call ABI"? PTRACE_GET_SYSCALL_INFO is an architecture-agnostic API, it hides all architecture-specific details behind struct ptrace_syscall_info which has the same meaning on all architectures. ptrace_syscall_info.exit contains both rval and is_error fields to support every architecture regardless of its syscall ABI. ptrace_syscall_info.exit is extensible, but every architecture would have to define a method of telling whether the system call follows the "new system call ABI" conventions to export this bit of information. This essentially means implementing something like static inline long syscall_get_error_abi(struct task_struct *task, struct pt_regs *regs) for every architecture, and using it along with syscall_get_error in ptrace_get_syscall_info_exit to initialize the new field in ptrace_syscall_info.exit structure. > As a more hacky thing you could make a syscall with -1 and see how > the error looks, and then assume all syscalls will be the same. This would be very unreliable because sc and scv are allowed to intermingle, so every syscall invocation can follow any of these two error handling conventions. > Or... is it possible at syscall entry to peek the address of > the instruction which caused the call and see if that was a > scv instruction? That would be about as reliable as possible > without having that new flag bit. No other architecture requires peeking into tracee memory just to find out the syscall ABI. This would make powerpc the most ugly architecture for ptracing. I wonder why can't this information be just exported to the tracer via struct pt_regs? -- ldv ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-19 13:26 ` Dmitry V. Levin @ 2021-05-19 22:51 ` Nicholas Piggin 2021-05-19 23:27 ` Dmitry V. Levin 0 siblings, 1 reply; 42+ messages in thread From: Nicholas Piggin @ 2021-05-19 22:51 UTC (permalink / raw) To: Dmitry V. Levin, Michael Ellerman Cc: Joakim Tjernlund, libc-alpha, libc-dev, linux-api, linuxppc-dev, Matheus Castanho, musl Excerpts from Dmitry V. Levin's message of May 19, 2021 11:26 pm: > On Wed, May 19, 2021 at 08:59:05PM +1000, Nicholas Piggin wrote: >> Excerpts from Dmitry V. Levin's message of May 19, 2021 8:24 pm: >> > On Wed, May 19, 2021 at 12:50:24PM +1000, Nicholas Piggin wrote: >> > [...] >> >> With this patch, I think the ptrace ABI should mostly be fixed. I think >> >> a problem remains with applications that look at system call return >> >> registers directly and have powerpc specific error cases. Those probably >> >> will just need to be updated unfortunately. Michael thought it might be >> >> possible to return an indication via ptrace somehow that the syscall is >> >> using a new ABI, so such apps can be updated to test for it. I don't >> >> know how that would be done. >> > >> > Is there any sane way for these applications to handle the scv case? >> > How can they tell that the scv semantics is being used for the given >> > syscall invocation? Can this information be obtained e.g. from struct >> > pt_regs? >> >> Not that I know of. Michael suggested there might be a way to add >> something. ptrace_syscall_info has some pad bytes, could >> we use one for flags bits and set a bit for "new system call ABI"? > > PTRACE_GET_SYSCALL_INFO is an architecture-agnostic API, it hides all > architecture-specific details behind struct ptrace_syscall_info which has > the same meaning on all architectures. ptrace_syscall_info.exit contains > both rval and is_error fields to support every architecture regardless of > its syscall ABI. > > ptrace_syscall_info.exit is extensible, but every architecture would have > to define a method of telling whether the system call follows the "new > system call ABI" conventions to export this bit of information. It's already architecture speicfic if you look at registers of syscall exit state so I don't see a problem with a flag that ppc can use for ABI. > > This essentially means implementing something like > static inline long syscall_get_error_abi(struct task_struct *task, struct pt_regs *regs) > for every architecture, and using it along with syscall_get_error > in ptrace_get_syscall_info_exit to initialize the new field in > ptrace_syscall_info.exit structure. Yes this could work. Other architectures can just use a generic implementation if they don't define their own so that's easy. And in userspace they can continue to ignore the flag. > >> As a more hacky thing you could make a syscall with -1 and see how >> the error looks, and then assume all syscalls will be the same. > > This would be very unreliable because sc and scv are allowed to intermingle, > so every syscall invocation can follow any of these two error handling > conventions. > >> Or... is it possible at syscall entry to peek the address of >> the instruction which caused the call and see if that was a >> scv instruction? That would be about as reliable as possible >> without having that new flag bit. > > No other architecture requires peeking into tracee memory just to find out > the syscall ABI. This would make powerpc the most ugly architecture for > ptracing. > > I wonder why can't this information be just exported to the tracer via > struct pt_regs? It might be able to, I don't see why that would be superior though. Where could you put it... I guess it could go in the trap field in a high bit. But could that break things that just test for syscall trap number (and don't care about register ABI)? I'm not sure. Thanks, Nick ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-19 22:51 ` Nicholas Piggin @ 2021-05-19 23:27 ` Dmitry V. Levin 2021-05-20 2:40 ` Nicholas Piggin 0 siblings, 1 reply; 42+ messages in thread From: Dmitry V. Levin @ 2021-05-19 23:27 UTC (permalink / raw) To: Nicholas Piggin Cc: Michael Ellerman, Joakim Tjernlund, libc-alpha, libc-dev, linux-api, linuxppc-dev, Matheus Castanho, musl On Thu, May 20, 2021 at 08:51:53AM +1000, Nicholas Piggin wrote: > Excerpts from Dmitry V. Levin's message of May 19, 2021 11:26 pm: > > On Wed, May 19, 2021 at 08:59:05PM +1000, Nicholas Piggin wrote: > >> Excerpts from Dmitry V. Levin's message of May 19, 2021 8:24 pm: > >> > On Wed, May 19, 2021 at 12:50:24PM +1000, Nicholas Piggin wrote: > >> > [...] > >> >> With this patch, I think the ptrace ABI should mostly be fixed. I think > >> >> a problem remains with applications that look at system call return > >> >> registers directly and have powerpc specific error cases. Those probably > >> >> will just need to be updated unfortunately. Michael thought it might be > >> >> possible to return an indication via ptrace somehow that the syscall is > >> >> using a new ABI, so such apps can be updated to test for it. I don't > >> >> know how that would be done. > >> > > >> > Is there any sane way for these applications to handle the scv case? > >> > How can they tell that the scv semantics is being used for the given > >> > syscall invocation? Can this information be obtained e.g. from struct > >> > pt_regs? > >> > >> Not that I know of. Michael suggested there might be a way to add > >> something. ptrace_syscall_info has some pad bytes, could > >> we use one for flags bits and set a bit for "new system call ABI"? > > > > PTRACE_GET_SYSCALL_INFO is an architecture-agnostic API, it hides all > > architecture-specific details behind struct ptrace_syscall_info which has > > the same meaning on all architectures. ptrace_syscall_info.exit contains > > both rval and is_error fields to support every architecture regardless of > > its syscall ABI. > > > > ptrace_syscall_info.exit is extensible, but every architecture would have > > to define a method of telling whether the system call follows the "new > > system call ABI" conventions to export this bit of information. > > It's already architecture speicfic if you look at registers of syscall > exit state so I don't see a problem with a flag that ppc can use for > ABI. To be honest, I don't see anything architecture-specific in PTRACE_GET_SYSCALL_INFO API. Yes, it's implementation uses various functions defined in asm/syscall.h, but this doesn't make the interface architecture-specific. PTRACE_GET_SYSCALL_INFO saves its users from necessity to be aware of tracee registers. That's why the only place where strace has to deal with tracee registers nowadays is syscall tampering. The most reliable solution is to introduce PTRACE_SET_SYSCALL_INFO, this would make the whole syscall abi issue irrelevant for ptracers, maybe the time has come to implement it. Unfortunately, extending ptrace API takes time, and it's not going to be backported to older kernels anyway, but scv-enabled kernels are already in the wild, so we need a quick powerpc-specific fix that would be backported to all maintained scv-enabled kernels. [...] > > I wonder why can't this information be just exported to the tracer via > > struct pt_regs? > > It might be able to, I don't see why that would be superior though. > > Where could you put it... I guess it could go in the trap field in a > high bit. But could that break things that just test for syscall > trap number (and don't care about register ABI)? I'm not sure. Looks like struct pt_regs.trap already contains the information that could be used to tell 'sc' from 'scv': if (pt_regs.trap & ~0xf) == 0x3000, then it's scv. Is my reading of arch/powerpc/include/asm/ptrace.h correct? -- ldv ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-19 23:27 ` Dmitry V. Levin @ 2021-05-20 2:40 ` Nicholas Piggin 2021-05-20 3:06 ` Dmitry V. Levin 0 siblings, 1 reply; 42+ messages in thread From: Nicholas Piggin @ 2021-05-20 2:40 UTC (permalink / raw) To: Dmitry V. Levin Cc: Joakim Tjernlund, libc-alpha, libc-dev, linux-api, linuxppc-dev, Michael Ellerman, Matheus Castanho, musl Excerpts from Dmitry V. Levin's message of May 20, 2021 9:27 am: > On Thu, May 20, 2021 at 08:51:53AM +1000, Nicholas Piggin wrote: >> Excerpts from Dmitry V. Levin's message of May 19, 2021 11:26 pm: >> > On Wed, May 19, 2021 at 08:59:05PM +1000, Nicholas Piggin wrote: >> >> Excerpts from Dmitry V. Levin's message of May 19, 2021 8:24 pm: >> >> > On Wed, May 19, 2021 at 12:50:24PM +1000, Nicholas Piggin wrote: >> >> > [...] >> >> >> With this patch, I think the ptrace ABI should mostly be fixed. I think >> >> >> a problem remains with applications that look at system call return >> >> >> registers directly and have powerpc specific error cases. Those probably >> >> >> will just need to be updated unfortunately. Michael thought it might be >> >> >> possible to return an indication via ptrace somehow that the syscall is >> >> >> using a new ABI, so such apps can be updated to test for it. I don't >> >> >> know how that would be done. >> >> > >> >> > Is there any sane way for these applications to handle the scv case? >> >> > How can they tell that the scv semantics is being used for the given >> >> > syscall invocation? Can this information be obtained e.g. from struct >> >> > pt_regs? >> >> >> >> Not that I know of. Michael suggested there might be a way to add >> >> something. ptrace_syscall_info has some pad bytes, could >> >> we use one for flags bits and set a bit for "new system call ABI"? >> > >> > PTRACE_GET_SYSCALL_INFO is an architecture-agnostic API, it hides all >> > architecture-specific details behind struct ptrace_syscall_info which has >> > the same meaning on all architectures. ptrace_syscall_info.exit contains >> > both rval and is_error fields to support every architecture regardless of >> > its syscall ABI. >> > >> > ptrace_syscall_info.exit is extensible, but every architecture would have >> > to define a method of telling whether the system call follows the "new >> > system call ABI" conventions to export this bit of information. >> >> It's already architecture speicfic if you look at registers of syscall >> exit state so I don't see a problem with a flag that ppc can use for >> ABI. > > To be honest, I don't see anything architecture-specific in > PTRACE_GET_SYSCALL_INFO API. Yes, it's implementation uses various > functions defined in asm/syscall.h, but this doesn't make the interface > architecture-specific. No. But a field or flag it exports could be architecture dependent. It doesn't detract independence from the rest of the ABI. That said... > PTRACE_GET_SYSCALL_INFO saves its users from necessity to be aware of > tracee registers. That's why the only place where strace has to deal > with tracee registers nowadays is syscall tampering. The most reliable > solution is to introduce PTRACE_SET_SYSCALL_INFO, this would make the > whole syscall abi issue irrelevant for ptracers, maybe the time has come > to implement it. > > Unfortunately, extending ptrace API takes time, and it's not going to be > backported to older kernels anyway, but scv-enabled kernels are already > in the wild, so we need a quick powerpc-specific fix that would be > backported to all maintained scv-enabled kernels. > > [...] >> > I wonder why can't this information be just exported to the tracer via >> > struct pt_regs? >> >> It might be able to, I don't see why that would be superior though. >> >> Where could you put it... I guess it could go in the trap field in a >> high bit. But could that break things that just test for syscall >> trap number (and don't care about register ABI)? I'm not sure. > > Looks like struct pt_regs.trap already contains the information that could > be used to tell 'sc' from 'scv': if (pt_regs.trap & ~0xf) == 0x3000, then > it's scv. Is my reading of arch/powerpc/include/asm/ptrace.h correct? Hmm, I think it is. Certainly in the kernel regs struct it is, I had in my mind that we put it to 0xc00 when populating the user struct for compatibility, but it seems not. So I guess this would work. Thanks, Nick ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-20 2:40 ` Nicholas Piggin @ 2021-05-20 3:06 ` Dmitry V. Levin 2021-05-20 5:12 ` Nicholas Piggin 0 siblings, 1 reply; 42+ messages in thread From: Dmitry V. Levin @ 2021-05-20 3:06 UTC (permalink / raw) To: Nicholas Piggin Cc: Joakim Tjernlund, libc-alpha, libc-dev, linux-api, linuxppc-dev, Michael Ellerman, Matheus Castanho, musl On Thu, May 20, 2021 at 12:40:36PM +1000, Nicholas Piggin wrote: [...] > > Looks like struct pt_regs.trap already contains the information that could > > be used to tell 'sc' from 'scv': if (pt_regs.trap & ~0xf) == 0x3000, then > > it's scv. Is my reading of arch/powerpc/include/asm/ptrace.h correct? > > Hmm, I think it is. Certainly in the kernel regs struct it is, I had in > my mind that we put it to 0xc00 when populating the user struct for > compatibility, but it seems not. So I guess this would work. OK, can we state that (pt_regs.trap & ~0xf) == 0x3000 is a part of the scv ABI, so it's not going to change and could be relied upon by userspace? Could this be documented in Documentation/powerpc/syscall64-abi.rst, please? -- ldv ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-20 3:06 ` Dmitry V. Levin @ 2021-05-20 5:12 ` Nicholas Piggin 0 siblings, 0 replies; 42+ messages in thread From: Nicholas Piggin @ 2021-05-20 5:12 UTC (permalink / raw) To: Dmitry V. Levin Cc: Joakim Tjernlund, libc-alpha, libc-dev, linux-api, linuxppc-dev, Michael Ellerman, Matheus Castanho, musl Excerpts from Dmitry V. Levin's message of May 20, 2021 1:06 pm: > On Thu, May 20, 2021 at 12:40:36PM +1000, Nicholas Piggin wrote: > [...] >> > Looks like struct pt_regs.trap already contains the information that could >> > be used to tell 'sc' from 'scv': if (pt_regs.trap & ~0xf) == 0x3000, then >> > it's scv. Is my reading of arch/powerpc/include/asm/ptrace.h correct? >> >> Hmm, I think it is. Certainly in the kernel regs struct it is, I had in >> my mind that we put it to 0xc00 when populating the user struct for >> compatibility, but it seems not. So I guess this would work. > > OK, can we state that (pt_regs.trap & ~0xf) == 0x3000 is a part of the scv > ABI, so it's not going to change and could be relied upon by userspace? > Could this be documented in Documentation/powerpc/syscall64-abi.rst, > please? Yeah I think we can do that. The kernel doesn't care what is put in the userspace pt_regs.trap too much so if this is your preferred approach then I will document it in the ABI. Thanks, Nick ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-18 23:13 ` Dmitry V. Levin 2021-05-19 2:50 ` Nicholas Piggin @ 2021-05-19 7:33 ` Joakim Tjernlund 2021-05-19 7:55 ` Nicholas Piggin 1 sibling, 1 reply; 42+ messages in thread From: Joakim Tjernlund @ 2021-05-19 7:33 UTC (permalink / raw) To: ldv, mpe, npiggin; +Cc: linuxppc-dev, libc-dev, musl, linux-api On Wed, 2021-05-19 at 02:13 +0300, Dmitry V. Levin wrote: > Hi, > > On Thu, Jun 11, 2020 at 06:12:01PM +1000, Nicholas Piggin wrote: > [...] > > - Error handling: The consensus among kernel, glibc, and musl is to move to > > using negative return values in r3 rather than CR0[SO]=1 to indicate error, > > which matches most other architectures, and is closer to a function call. What about syscalls like times(2) which can return -1 without it being an error? Jocke ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-19 7:33 ` Joakim Tjernlund @ 2021-05-19 7:55 ` Nicholas Piggin 2021-05-19 8:08 ` Joakim Tjernlund 0 siblings, 1 reply; 42+ messages in thread From: Nicholas Piggin @ 2021-05-19 7:55 UTC (permalink / raw) To: Joakim Tjernlund, ldv, mpe; +Cc: libc-dev, linux-api, linuxppc-dev, musl Excerpts from Joakim Tjernlund's message of May 19, 2021 5:33 pm: > On Wed, 2021-05-19 at 02:13 +0300, Dmitry V. Levin wrote: >> Hi, >> >> On Thu, Jun 11, 2020 at 06:12:01PM +1000, Nicholas Piggin wrote: >> [...] >> > - Error handling: The consensus among kernel, glibc, and musl is to move to >> > using negative return values in r3 rather than CR0[SO]=1 to indicate error, >> > which matches most other architectures, and is closer to a function call. > > What about syscalls like times(2) which can return -1 without it being an error? They do become errors / indistinguishable and have to be dealt with by libc or userspace. Which does follow what most architectures do (all except ia64, mips, sparc, and powerpc actually). Interesting question though, it should have been noted. Thanks, Nick ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-19 7:55 ` Nicholas Piggin @ 2021-05-19 8:08 ` Joakim Tjernlund 2021-05-19 8:42 ` Nicholas Piggin 0 siblings, 1 reply; 42+ messages in thread From: Joakim Tjernlund @ 2021-05-19 8:08 UTC (permalink / raw) To: ldv, mpe, npiggin; +Cc: linuxppc-dev, libc-dev, musl, linux-api On Wed, 2021-05-19 at 17:55 +1000, Nicholas Piggin wrote: > Excerpts from Joakim Tjernlund's message of May 19, 2021 5:33 pm: > > On Wed, 2021-05-19 at 02:13 +0300, Dmitry V. Levin wrote: > > > Hi, > > > > > > On Thu, Jun 11, 2020 at 06:12:01PM +1000, Nicholas Piggin wrote: > > > [...] > > > > - Error handling: The consensus among kernel, glibc, and musl is to move to > > > > using negative return values in r3 rather than CR0[SO]=1 to indicate error, > > > > which matches most other architectures, and is closer to a function call. > > > > What about syscalls like times(2) which can return -1 without it being an error? > > They do become errors / indistinguishable and have to be dealt with by > libc or userspace. Which does follow what most architectures do (all > except ia64, mips, sparc, and powerpc actually). > > Interesting question though, it should have been noted. > > Thanks, > Nick I always figured the ppc way was superior. It begs the question if not the other archs should change instead? Jocke ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-19 8:08 ` Joakim Tjernlund @ 2021-05-19 8:42 ` Nicholas Piggin 2021-05-19 11:12 ` Nicholas Piggin 2021-05-19 14:38 ` Segher Boessenkool 0 siblings, 2 replies; 42+ messages in thread From: Nicholas Piggin @ 2021-05-19 8:42 UTC (permalink / raw) To: Joakim Tjernlund, ldv, mpe; +Cc: libc-dev, linux-api, linuxppc-dev, musl Excerpts from Joakim Tjernlund's message of May 19, 2021 6:08 pm: > On Wed, 2021-05-19 at 17:55 +1000, Nicholas Piggin wrote: >> Excerpts from Joakim Tjernlund's message of May 19, 2021 5:33 pm: >> > On Wed, 2021-05-19 at 02:13 +0300, Dmitry V. Levin wrote: >> > > Hi, >> > > >> > > On Thu, Jun 11, 2020 at 06:12:01PM +1000, Nicholas Piggin wrote: >> > > [...] >> > > > - Error handling: The consensus among kernel, glibc, and musl is to move to >> > > > using negative return values in r3 rather than CR0[SO]=1 to indicate error, >> > > > which matches most other architectures, and is closer to a function call. >> > >> > What about syscalls like times(2) which can return -1 without it being an error? >> >> They do become errors / indistinguishable and have to be dealt with by >> libc or userspace. Which does follow what most architectures do (all >> except ia64, mips, sparc, and powerpc actually). >> >> Interesting question though, it should have been noted. >> >> Thanks, >> Nick > > I always figured the ppc way was superior. It begs the question if not the other archs should > change instead? It is superior in some ways, not enough to be worth being different. Other archs are unlikely to change because it would be painful for not much benefit. New system calls just should be made to not return error numbers. If we ever had a big new version of syscall ABI in Linux, we can always use another scv vector number for it. Thanks, Nick ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-19 8:42 ` Nicholas Piggin @ 2021-05-19 11:12 ` Nicholas Piggin 2021-05-19 14:38 ` Segher Boessenkool 1 sibling, 0 replies; 42+ messages in thread From: Nicholas Piggin @ 2021-05-19 11:12 UTC (permalink / raw) To: Joakim Tjernlund, ldv, mpe; +Cc: libc-dev, linux-api, linuxppc-dev, musl Excerpts from Nicholas Piggin's message of May 19, 2021 6:42 pm: > Excerpts from Joakim Tjernlund's message of May 19, 2021 6:08 pm: >> On Wed, 2021-05-19 at 17:55 +1000, Nicholas Piggin wrote: >>> Excerpts from Joakim Tjernlund's message of May 19, 2021 5:33 pm: >>> > On Wed, 2021-05-19 at 02:13 +0300, Dmitry V. Levin wrote: >>> > > Hi, >>> > > >>> > > On Thu, Jun 11, 2020 at 06:12:01PM +1000, Nicholas Piggin wrote: >>> > > [...] >>> > > > - Error handling: The consensus among kernel, glibc, and musl is to move to >>> > > > using negative return values in r3 rather than CR0[SO]=1 to indicate error, >>> > > > which matches most other architectures, and is closer to a function call. >>> > >>> > What about syscalls like times(2) which can return -1 without it being an error? >>> >>> They do become errors / indistinguishable and have to be dealt with by >>> libc or userspace. Which does follow what most architectures do (all >>> except ia64, mips, sparc, and powerpc actually). >>> >>> Interesting question though, it should have been noted. >>> >>> Thanks, >>> Nick >> >> I always figured the ppc way was superior. It begs the question if not the other archs should >> change instead? > > It is superior in some ways, not enough to be worth being different. > > Other archs are unlikely to change because it would be painful for > not much benefit. New system calls just should be made to not return > error numbers. If we ever had a big new version of syscall ABI in > Linux, we can always use another scv vector number for it. Or... is it possible at syscall entry to peek the address of the instruction which caused the call and see if that was a scv instruction? That would be about as reliable as possible without having that new flag bit. Thanks, Nick ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-19 8:42 ` Nicholas Piggin 2021-05-19 11:12 ` Nicholas Piggin @ 2021-05-19 14:38 ` Segher Boessenkool 2021-05-19 15:06 ` Joakim Tjernlund 1 sibling, 1 reply; 42+ messages in thread From: Segher Boessenkool @ 2021-05-19 14:38 UTC (permalink / raw) To: Nicholas Piggin Cc: Joakim Tjernlund, ldv, mpe, libc-dev, musl, linuxppc-dev, linux-api On Wed, May 19, 2021 at 06:42:40PM +1000, Nicholas Piggin wrote: > Excerpts from Joakim Tjernlund's message of May 19, 2021 6:08 pm: > > I always figured the ppc way was superior. It begs the question if not the other archs should > > change instead? > > It is superior in some ways, not enough to be worth being different. The PowerPC syscall ABI *requires* using cr0.3 for indicating errors, you will have to do that whether you conflate the concepts of return code and error indicator or not! > Other archs are unlikely to change because it would be painful for > not much benefit. Other archs cannot easily change for much the same reason :-) > New system calls just should be made to not return > error numbers. Which sometimes is a difficult / non-natural / clumsy thing to do. Segher ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-19 14:38 ` Segher Boessenkool @ 2021-05-19 15:06 ` Joakim Tjernlund 2021-05-19 15:22 ` Segher Boessenkool 0 siblings, 1 reply; 42+ messages in thread From: Joakim Tjernlund @ 2021-05-19 15:06 UTC (permalink / raw) To: segher, npiggin; +Cc: linuxppc-dev, ldv, mpe, musl, libc-dev, linux-api On Wed, 2021-05-19 at 09:38 -0500, Segher Boessenkool wrote: > On Wed, May 19, 2021 at 06:42:40PM +1000, Nicholas Piggin wrote: > > Excerpts from Joakim Tjernlund's message of May 19, 2021 6:08 pm: > > > I always figured the ppc way was superior. It begs the question if not the other archs should > > > change instead? > > > > It is superior in some ways, not enough to be worth being different. > > The PowerPC syscall ABI *requires* using cr0.3 for indicating errors, > you will have to do that whether you conflate the concepts of return > code and error indicator or not! > > > Other archs are unlikely to change because it would be painful for > > not much benefit. > > Other archs cannot easily change for much the same reason :-) Really? I figured you could just add extra error indication in kernel syscall I/F. Eventually user space could migrate to the new indication. Jocke ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-19 15:06 ` Joakim Tjernlund @ 2021-05-19 15:22 ` Segher Boessenkool 2021-05-19 15:36 ` Rich Felker 2021-05-19 18:09 ` Joakim Tjernlund 0 siblings, 2 replies; 42+ messages in thread From: Segher Boessenkool @ 2021-05-19 15:22 UTC (permalink / raw) To: Joakim Tjernlund Cc: npiggin, linuxppc-dev, ldv, mpe, musl, libc-dev, linux-api On Wed, May 19, 2021 at 03:06:49PM +0000, Joakim Tjernlund wrote: > On Wed, 2021-05-19 at 09:38 -0500, Segher Boessenkool wrote: > > On Wed, May 19, 2021 at 06:42:40PM +1000, Nicholas Piggin wrote: > > > Excerpts from Joakim Tjernlund's message of May 19, 2021 6:08 pm: > > > > I always figured the ppc way was superior. It begs the question if not the other archs should > > > > change instead? > > > > > > It is superior in some ways, not enough to be worth being different. > > > > The PowerPC syscall ABI *requires* using cr0.3 for indicating errors, > > you will have to do that whether you conflate the concepts of return > > code and error indicator or not! > > > > > Other archs are unlikely to change because it would be painful for > > > not much benefit. > > > > Other archs cannot easily change for much the same reason :-) > > Really? I figured you could just add extra error indication in kernel syscall I/F. > Eventually user space could migrate to the new indication. You seem to assume all user space uses glibc, or *any* libc even? This is false. Some programs do system calls directly. Do not break the kernel ABI :-) Segher ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-19 15:22 ` Segher Boessenkool @ 2021-05-19 15:36 ` Rich Felker 2021-05-19 18:09 ` Joakim Tjernlund 1 sibling, 0 replies; 42+ messages in thread From: Rich Felker @ 2021-05-19 15:36 UTC (permalink / raw) To: Segher Boessenkool Cc: Joakim Tjernlund, npiggin, linuxppc-dev, ldv, mpe, musl, libc-dev, linux-api On Wed, May 19, 2021 at 10:22:05AM -0500, Segher Boessenkool wrote: > On Wed, May 19, 2021 at 03:06:49PM +0000, Joakim Tjernlund wrote: > > On Wed, 2021-05-19 at 09:38 -0500, Segher Boessenkool wrote: > > > On Wed, May 19, 2021 at 06:42:40PM +1000, Nicholas Piggin wrote: > > > > Excerpts from Joakim Tjernlund's message of May 19, 2021 6:08 pm: > > > > > I always figured the ppc way was superior. It begs the question if not the other archs should > > > > > change instead? > > > > > > > > It is superior in some ways, not enough to be worth being different. > > > > > > The PowerPC syscall ABI *requires* using cr0.3 for indicating errors, > > > you will have to do that whether you conflate the concepts of return > > > code and error indicator or not! > > > > > > > Other archs are unlikely to change because it would be painful for > > > > not much benefit. > > > > > > Other archs cannot easily change for much the same reason :-) > > > > Really? I figured you could just add extra error indication in kernel syscall I/F. > > Eventually user space could migrate to the new indication. > > You seem to assume all user space uses glibc, or *any* libc even? This > is false. Some programs do system calls directly. Do not break the > kernel ABI :-) Even if it were easy to change, the old ppc ABI with a separate error indicator is much worse to use. In musl we paper over archs that do this silliness by converting to a normal negated errno code. There are literally no syscalls that need the ability to return negative values in addition to error codes; historically there were one or two (I only recall one fcntl command) but there were ways to disambiguate and they're only fallbacks for ancient kernels nowadays, if used at all. Rich ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-19 15:22 ` Segher Boessenkool 2021-05-19 15:36 ` Rich Felker @ 2021-05-19 18:09 ` Joakim Tjernlund 2021-05-19 23:48 ` Rich Felker 1 sibling, 1 reply; 42+ messages in thread From: Joakim Tjernlund @ 2021-05-19 18:09 UTC (permalink / raw) To: segher; +Cc: linuxppc-dev, ldv, mpe, musl, npiggin, linux-api, libc-dev On Wed, 2021-05-19 at 10:22 -0500, Segher Boessenkool wrote: > On Wed, May 19, 2021 at 03:06:49PM +0000, Joakim Tjernlund wrote: > > On Wed, 2021-05-19 at 09:38 -0500, Segher Boessenkool wrote: > > > On Wed, May 19, 2021 at 06:42:40PM +1000, Nicholas Piggin wrote: > > > > Excerpts from Joakim Tjernlund's message of May 19, 2021 6:08 pm: > > > > > I always figured the ppc way was superior. It begs the question if not the other archs should > > > > > change instead? > > > > > > > > It is superior in some ways, not enough to be worth being different. > > > > > > The PowerPC syscall ABI *requires* using cr0.3 for indicating errors, > > > you will have to do that whether you conflate the concepts of return > > > code and error indicator or not! > > > > > > > Other archs are unlikely to change because it would be painful for > > > > not much benefit. > > > > > > Other archs cannot easily change for much the same reason :-) > > > > Really? I figured you could just add extra error indication in kernel syscall I/F. > > Eventually user space could migrate to the new indication. > > You seem to assume all user space uses glibc, or *any* libc even? This > is false. Some programs do system calls directly. Do not break the > kernel ABI :-) Adding an extra indicator does not break ABI, does it ? W.r.t breaking ABI, isn't that what PowerPC is trying to do with the new syscall I/F? Jocke ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-19 18:09 ` Joakim Tjernlund @ 2021-05-19 23:48 ` Rich Felker 2021-05-20 1:06 ` Dmitry V. Levin 0 siblings, 1 reply; 42+ messages in thread From: Rich Felker @ 2021-05-19 23:48 UTC (permalink / raw) To: Joakim Tjernlund Cc: segher, linuxppc-dev, ldv, mpe, musl, npiggin, linux-api, libc-dev On Wed, May 19, 2021 at 06:09:25PM +0000, Joakim Tjernlund wrote: > On Wed, 2021-05-19 at 10:22 -0500, Segher Boessenkool wrote: > > On Wed, May 19, 2021 at 03:06:49PM +0000, Joakim Tjernlund wrote: > > > On Wed, 2021-05-19 at 09:38 -0500, Segher Boessenkool wrote: > > > > On Wed, May 19, 2021 at 06:42:40PM +1000, Nicholas Piggin wrote: > > > > > Excerpts from Joakim Tjernlund's message of May 19, 2021 6:08 pm: > > > > > > I always figured the ppc way was superior. It begs the question if not the other archs should > > > > > > change instead? > > > > > > > > > > It is superior in some ways, not enough to be worth being different. > > > > > > > > The PowerPC syscall ABI *requires* using cr0.3 for indicating errors, > > > > you will have to do that whether you conflate the concepts of return > > > > code and error indicator or not! > > > > > > > > > Other archs are unlikely to change because it would be painful for > > > > > not much benefit. > > > > > > > > Other archs cannot easily change for much the same reason :-) > > > > > > Really? I figured you could just add extra error indication in kernel syscall I/F. > > > Eventually user space could migrate to the new indication. > > > > You seem to assume all user space uses glibc, or *any* libc even? This > > is false. Some programs do system calls directly. Do not break the > > kernel ABI :-) > > Adding an extra indicator does not break ABI, does it ? It does, because the old ABI on most archs has no clobbers except the return value register. Some archs though have additional syscall-clobbered regs that could be repurposed as extra return values, but you could only rely on them being meaningful after probing for a kernel that speaks the new variant. This just makes things more complicated, not more useful. > W.r.t breaking ABI, isn't that what PowerPC is trying to do with the new syscall I/F? No, it's a new independent interface. Rich ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-19 23:48 ` Rich Felker @ 2021-05-20 1:06 ` Dmitry V. Levin 2021-05-20 2:45 ` Nicholas Piggin 0 siblings, 1 reply; 42+ messages in thread From: Dmitry V. Levin @ 2021-05-20 1:06 UTC (permalink / raw) To: Rich Felker Cc: Joakim Tjernlund, Segher Boessenkool, Michael Ellerman, Nicholas Piggin, musl, linuxppc-dev, linux-api On Wed, May 19, 2021 at 07:48:47PM -0400, Rich Felker wrote: > On Wed, May 19, 2021 at 06:09:25PM +0000, Joakim Tjernlund wrote: [...] > > W.r.t breaking ABI, isn't that what PowerPC is trying to do with the new syscall I/F? > > No, it's a new independent interface. Unfortunately, being a new independent interface doesn't mean it isn't an ABI break. In fact, it was a severe ABI break, and this thread is an attempt to find a hotfix. -- ldv ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-20 1:06 ` Dmitry V. Levin @ 2021-05-20 2:45 ` Nicholas Piggin 2021-05-20 2:59 ` Dmitry V. Levin 0 siblings, 1 reply; 42+ messages in thread From: Nicholas Piggin @ 2021-05-20 2:45 UTC (permalink / raw) To: Rich Felker, Dmitry V. Levin Cc: Joakim Tjernlund, linux-api, linuxppc-dev, Michael Ellerman, musl, Segher Boessenkool Excerpts from Dmitry V. Levin's message of May 20, 2021 11:06 am: > On Wed, May 19, 2021 at 07:48:47PM -0400, Rich Felker wrote: >> On Wed, May 19, 2021 at 06:09:25PM +0000, Joakim Tjernlund wrote: > [...] >> > W.r.t breaking ABI, isn't that what PowerPC is trying to do with the new syscall I/F? >> >> No, it's a new independent interface. > > Unfortunately, being a new independent interface doesn't mean it isn't > an ABI break. In fact, it was a severe ABI break, and this thread is > an attempt to find a hotfix. It is an ABI break, that was known. The ptrace info stuff I fixed with the patch earlier was obviously a bug in my initial implementation and not intended (sorry my ptrace testing was not sufficient, and thanks for reporting it, by the way). But the register ABI was always a known break. The issue is that rfscv clobbers LR, so it can not support the old ABI. If the old ABI did not preserve LR, then we may have chosen to not change register ABI at all. Thanks, Nick ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-20 2:45 ` Nicholas Piggin @ 2021-05-20 2:59 ` Dmitry V. Levin 2021-05-20 7:20 ` Nicholas Piggin 0 siblings, 1 reply; 42+ messages in thread From: Dmitry V. Levin @ 2021-05-20 2:59 UTC (permalink / raw) To: Nicholas Piggin Cc: Rich Felker, Joakim Tjernlund, linux-api, linuxppc-dev, Michael Ellerman, musl, Segher Boessenkool On Thu, May 20, 2021 at 12:45:57PM +1000, Nicholas Piggin wrote: > Excerpts from Dmitry V. Levin's message of May 20, 2021 11:06 am: > > On Wed, May 19, 2021 at 07:48:47PM -0400, Rich Felker wrote: > >> On Wed, May 19, 2021 at 06:09:25PM +0000, Joakim Tjernlund wrote: > > [...] > >> > W.r.t breaking ABI, isn't that what PowerPC is trying to do with the new syscall I/F? > >> > >> No, it's a new independent interface. > > > > Unfortunately, being a new independent interface doesn't mean it isn't > > an ABI break. In fact, it was a severe ABI break, and this thread is > > an attempt to find a hotfix. > > It is an ABI break, that was known. The ptrace info stuff I fixed with > the patch earlier was obviously a bug in my initial implementation and > not intended (sorry my ptrace testing was not sufficient, and thanks for > reporting it, by the way). Could you check whether tools/testing/selftests/ptrace/get_syscall_info.c passes again with your fix, please? If yes, then PTRACE_GET_SYSCALL_INFO is fixed. By the way, kernel tracing and audit subsystems also use those functions from asm/syscall.h and asm/ptrace.h, so your ptrace fix is likely to fix these subsystems as well. -- ldv ^ permalink raw reply [flat|nested] 42+ messages in thread
* [musl] Re: Linux powerpc new system call instruction and ABI 2021-05-20 2:59 ` Dmitry V. Levin @ 2021-05-20 7:20 ` Nicholas Piggin 0 siblings, 0 replies; 42+ messages in thread From: Nicholas Piggin @ 2021-05-20 7:20 UTC (permalink / raw) To: Dmitry V. Levin Cc: Rich Felker, Joakim Tjernlund, linux-api, linuxppc-dev, Michael Ellerman, musl, Segher Boessenkool Excerpts from Dmitry V. Levin's message of May 20, 2021 12:59 pm: > On Thu, May 20, 2021 at 12:45:57PM +1000, Nicholas Piggin wrote: >> Excerpts from Dmitry V. Levin's message of May 20, 2021 11:06 am: >> > On Wed, May 19, 2021 at 07:48:47PM -0400, Rich Felker wrote: >> >> On Wed, May 19, 2021 at 06:09:25PM +0000, Joakim Tjernlund wrote: >> > [...] >> >> > W.r.t breaking ABI, isn't that what PowerPC is trying to do with the new syscall I/F? >> >> >> >> No, it's a new independent interface. >> > >> > Unfortunately, being a new independent interface doesn't mean it isn't >> > an ABI break. In fact, it was a severe ABI break, and this thread is >> > an attempt to find a hotfix. >> >> It is an ABI break, that was known. The ptrace info stuff I fixed with >> the patch earlier was obviously a bug in my initial implementation and >> not intended (sorry my ptrace testing was not sufficient, and thanks for >> reporting it, by the way). > > Could you check whether tools/testing/selftests/ptrace/get_syscall_info.c > passes again with your fix, please? It does. Thanks, Nick > If yes, then PTRACE_GET_SYSCALL_INFO is fixed. > > By the way, kernel tracing and audit subsystems also use those functions > from asm/syscall.h and asm/ptrace.h, so your ptrace fix is likely to fix > these subsystems as well. ^ permalink raw reply [flat|nested] 42+ messages in thread
end of thread, other threads:[~2021-05-24 20:33 UTC | newest] Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-06-11 8:12 [musl] Linux powerpc new system call instruction and ABI Nicholas Piggin 2020-06-11 8:12 ` [musl] [PATCH 1/2] powerpc/64s/exception: treat NIA below __end_interrupts as soft-masked Nicholas Piggin 2020-07-24 13:25 ` [musl] " Michael Ellerman 2020-06-11 8:12 ` [musl] [PATCH 2/2] powerpc/64s: system call support for scv/rfscv instructions Nicholas Piggin 2020-07-23 6:47 ` [musl] " Michael Ellerman 2020-07-23 16:48 ` Christophe Leroy 2020-07-24 10:45 ` Michael Ellerman 2020-06-11 21:02 ` [musl] Re: Linux powerpc new system call instruction and ABI Segher Boessenkool 2020-06-14 9:26 ` Nicholas Piggin 2021-05-18 23:13 ` Dmitry V. Levin 2021-05-19 2:50 ` Nicholas Piggin 2021-05-19 5:01 ` Nicholas Piggin 2021-05-21 19:40 ` Matheus Castanho 2021-05-21 19:52 ` Florian Weimer 2021-05-21 20:00 ` Matheus Castanho 2021-05-21 20:52 ` Dmitry V. Levin 2021-05-24 12:11 ` Matheus Castanho 2021-05-24 20:33 ` Matheus Castanho 2021-05-19 10:24 ` Dmitry V. Levin 2021-05-19 10:59 ` Nicholas Piggin 2021-05-19 12:39 ` Tulio Magno Quites Machado Filho 2021-05-19 13:26 ` Dmitry V. Levin 2021-05-19 22:51 ` Nicholas Piggin 2021-05-19 23:27 ` Dmitry V. Levin 2021-05-20 2:40 ` Nicholas Piggin 2021-05-20 3:06 ` Dmitry V. Levin 2021-05-20 5:12 ` Nicholas Piggin 2021-05-19 7:33 ` Joakim Tjernlund 2021-05-19 7:55 ` Nicholas Piggin 2021-05-19 8:08 ` Joakim Tjernlund 2021-05-19 8:42 ` Nicholas Piggin 2021-05-19 11:12 ` Nicholas Piggin 2021-05-19 14:38 ` Segher Boessenkool 2021-05-19 15:06 ` Joakim Tjernlund 2021-05-19 15:22 ` Segher Boessenkool 2021-05-19 15:36 ` Rich Felker 2021-05-19 18:09 ` Joakim Tjernlund 2021-05-19 23:48 ` Rich Felker 2021-05-20 1:06 ` Dmitry V. Levin 2021-05-20 2:45 ` Nicholas Piggin 2021-05-20 2:59 ` Dmitry V. Levin 2021-05-20 7:20 ` Nicholas Piggin
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/musl/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).