* [musl] PAC/BTI Support on aarch64 @ 2024-02-12 16:38 William Roberts 2024-02-12 18:42 ` Rich Felker 0 siblings, 1 reply; 25+ messages in thread From: William Roberts @ 2024-02-12 16:38 UTC (permalink / raw) To: musl Hello, I was just wondering if there was any work being done to support PAC and BTI in aarch64? I could add support but didn't want to duplicate the work. Thanks, Bill ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [musl] PAC/BTI Support on aarch64 2024-02-12 16:38 [musl] PAC/BTI Support on aarch64 William Roberts @ 2024-02-12 18:42 ` Rich Felker 2024-02-12 21:25 ` William Roberts ` (2 more replies) 0 siblings, 3 replies; 25+ messages in thread From: Rich Felker @ 2024-02-12 18:42 UTC (permalink / raw) To: William Roberts; +Cc: musl On Mon, Feb 12, 2024 at 10:38:50AM -0600, William Roberts wrote: > Hello, > > I was just wondering if there was any work being done to support PAC > and BTI in aarch64? I could add support but didn't want to duplicate > the work. I'm not aware of any active work on this, but before writing a full implementation, it would be really helpful to start with a basic proposal for the scope of changes needed to make it work to assess whether these are manageable and acceptable cost. Rich ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [musl] PAC/BTI Support on aarch64 2024-02-12 18:42 ` Rich Felker @ 2024-02-12 21:25 ` William Roberts 2024-02-12 21:34 ` enh 2024-02-12 22:46 ` Rich Felker 2024-02-19 23:54 ` Fangrui Song [not found] ` <DS7PR12MB57659BC5D5536574D1B91D26CB502@DS7PR12MB5765.namprd12.prod.outlook.com> 2 siblings, 2 replies; 25+ messages in thread From: William Roberts @ 2024-02-12 21:25 UTC (permalink / raw) To: Rich Felker; +Cc: musl On Mon, Feb 12, 2024 at 12:42 PM Rich Felker <dalias@libc.org> wrote: > > On Mon, Feb 12, 2024 at 10:38:50AM -0600, William Roberts wrote: > > Hello, > > > > I was just wondering if there was any work being done to support PAC > > and BTI in aarch64? I could add support but didn't want to duplicate > > the work. > > I'm not aware of any active work on this, but before writing a full > implementation, it would be really helpful to start with a basic > proposal for the scope of changes needed to make it work to assess > whether these are manageable and acceptable cost. It's a matter of building with -mbranch-protection=standard Just the ASM labels need the first instruction to be a BTI. They're in the NOP space so they are backwards compatible, older hardware will just NOP it. It's been done for many projects, glibc and bionic have it. The problem with BTI is that when one item in the link list doesn't support BTI the loader/linker turns it off. So when it's something like a libc that is fundamental in the link chain, it turns it off for everything. The initial scope of code changes would be what's reported when LDFLAGS=-Wl,-zforce-bti,--fatal-warnings /usr/bin/ld: obj/src/fenv/aarch64/fenv.lo: warning: BTI turned on by -z force-bti when all inputs do not have BTI in NOTE section. /usr/bin/ld: obj/src/ldso/aarch64/dlsym.lo: warning: BTI turned on by -z force-bti when all inputs do not have BTI in NOTE section. /usr/bin/ld: obj/src/ldso/aarch64/tlsdesc.lo: warning: BTI turned on by -z force-bti when all inputs do not have BTI in NOTE section. 
/usr/bin/ld: obj/src/process/aarch64/vfork.lo: warning: BTI turned on by -z force-bti when all inputs do not have BTI in NOTE section. /usr/bin/ld: obj/src/setjmp/aarch64/longjmp.lo: warning: BTI turned on by -z force-bti when all inputs do not have BTI in NOTE section. /usr/bin/ld: obj/src/setjmp/aarch64/setjmp.lo: warning: BTI turned on by -z force-bti when all inputs do not have BTI in NOTE section. /usr/bin/ld: obj/src/signal/aarch64/restore.lo: warning: BTI turned on by -z force-bti when all inputs do not have BTI in NOTE section. /usr/bin/ld: obj/src/signal/aarch64/sigsetjmp.lo: warning: BTI turned on by -z force-bti when all inputs do not have BTI in NOTE section. /usr/bin/ld: obj/src/string/aarch64/memcpy.lo: warning: BTI turned on by -z force-bti when all inputs do not have BTI in NOTE section. /usr/bin/ld: obj/src/string/aarch64/memset.lo: warning: BTI turned on by -z force-bti when all inputs do not have BTI in NOTE section. /usr/bin/ld: obj/src/thread/aarch64/__set_thread_area.lo: warning: BTI turned on by -z force-bti when all inputs do not have BTI in NOTE section. /usr/bin/ld: obj/src/thread/aarch64/__unmapself.lo: warning: BTI turned on by -z force-bti when all inputs do not have BTI in NOTE section. /usr/bin/ld: obj/src/thread/aarch64/clone.lo: warning: BTI turned on by -z force-bti when all inputs do not have BTI in NOTE section. /usr/bin/ld: obj/src/thread/aarch64/syscall_cp.lo: warning: BTI turned on by -z force-bti when all inputs do not have BTI in NOTE section. > > Rich ^ permalink raw reply [flat|nested] 25+ messages in thread
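The "one item in the link list turns it off for everything" behaviour described above follows from the feature word being an AND property: the linker keeps only the bits that every input object advertises. A minimal sketch of that merge rule, assuming each input has already been reduced to a 32-bit feature word (0 when the object has no property note). The helper name `merge_feature_1_and` is hypothetical and not part of any linker or of musl.

```c
#include <stddef.h>
#include <stdint.h>

/* Feature bits of GNU_PROPERTY_AARCH64_FEATURE_1_AND (AArch64 ELF ABI). */
#define FEATURE_1_BTI (1u << 0)
#define FEATURE_1_PAC (1u << 1)

/* Hypothetical helper: features[i] is the feature word taken from input
 * object i, or 0 when that object carries no property note at all.
 * The output keeps only the bits present in every input. */
static uint32_t merge_feature_1_and(const uint32_t *features, size_t n)
{
    uint32_t merged = n ? ~0u : 0;   /* no inputs: nothing is advertised */
    for (size_t i = 0; i < n; i++)
        merged &= features[i];
    return merged;
}
```

With fourteen unannotated asm objects in libc.a or libc.so, the merged word has the BTI bit cleared, which is what the `-z force-bti` warnings listed above are reporting.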
* Re: [musl] PAC/BTI Support on aarch64 2024-02-12 21:25 ` William Roberts @ 2024-02-12 21:34 ` enh 2024-02-12 22:46 ` Rich Felker 1 sibling, 0 replies; 25+ messages in thread From: enh @ 2024-02-12 21:34 UTC (permalink / raw) To: musl; +Cc: Rich Felker On Mon, Feb 12, 2024 at 1:26 PM William Roberts <bill.c.roberts@gmail.com> wrote: > > On Mon, Feb 12, 2024 at 12:42 PM Rich Felker <dalias@libc.org> wrote: > > > > On Mon, Feb 12, 2024 at 10:38:50AM -0600, William Roberts wrote: > > > Hello, > > > > > > I was just wondering if there was any work being done to support PAC > > > and BTI in aarch64? I could add support but didn't want to duplicate > > > the work. > > > > I'm not aware of any active work on this, but before writing a full > > implementation, it would be really helpful to start with a basic > > proposal for the scope of changes needed to make it work to assess > > whether these are manageable and acceptable cost. > > It's a matter of building with -mbranch-protection=standard > > Just the ASM labels need the first instruction to be a BTI. They're in > the NOP space > so they are backwards compatible, older hardware will just NOP it. > > It's been done for many projects, glibc and bionic have it. The > problem with BTI is that when one item in the link > list doesn't support BTI the loader/linker turns it off. So when it's > something like a libc that is fundamental in the link chain, > it turns it off for everything. note that bionic was quite sneaky, and if you look at bionic's arm64 .S files, you'll think we _haven't_ done the BTI work... we hid the `bti c` instruction in the implementation of our ENTRY() macro [https://android.googlesource.com/platform/bionic/+/main/libc/private/bionic_asm_arm64.h#48] and similarly the ELF note you need is hidden by macros too [https://android.googlesource.com/platform/bionic/+/main/libc/private/bionic_asm_arm64.h#60]. 
> The initial scope of code changes would be what's reported when > LDFLAGS=-Wl,-zforce-bti,--fatal-warnings > > /usr/bin/ld: obj/src/fenv/aarch64/fenv.lo: warning: BTI turned on by > -z force-bti when all inputs do not have BTI in NOTE section. > /usr/bin/ld: obj/src/ldso/aarch64/dlsym.lo: warning: BTI turned on by > -z force-bti when all inputs do not have BTI in NOTE section. > /usr/bin/ld: obj/src/ldso/aarch64/tlsdesc.lo: warning: BTI turned on > by -z force-bti when all inputs do not have BTI in NOTE section. > /usr/bin/ld: obj/src/process/aarch64/vfork.lo: warning: BTI turned on > by -z force-bti when all inputs do not have BTI in NOTE section. > /usr/bin/ld: obj/src/setjmp/aarch64/longjmp.lo: warning: BTI turned on > by -z force-bti when all inputs do not have BTI in NOTE section. > /usr/bin/ld: obj/src/setjmp/aarch64/setjmp.lo: warning: BTI turned on > by -z force-bti when all inputs do not have BTI in NOTE section. > /usr/bin/ld: obj/src/signal/aarch64/restore.lo: warning: BTI turned on > by -z force-bti when all inputs do not have BTI in NOTE section. > /usr/bin/ld: obj/src/signal/aarch64/sigsetjmp.lo: warning: BTI turned > on by -z force-bti when all inputs do not have BTI in NOTE section. > /usr/bin/ld: obj/src/string/aarch64/memcpy.lo: warning: BTI turned on > by -z force-bti when all inputs do not have BTI in NOTE section. > /usr/bin/ld: obj/src/string/aarch64/memset.lo: warning: BTI turned on > by -z force-bti when all inputs do not have BTI in NOTE section. > /usr/bin/ld: obj/src/thread/aarch64/__set_thread_area.lo: warning: BTI > turned on by -z force-bti when all inputs do not have BTI in NOTE > section. > /usr/bin/ld: obj/src/thread/aarch64/__unmapself.lo: warning: BTI > turned on by -z force-bti when all inputs do not have BTI in NOTE > section. > /usr/bin/ld: obj/src/thread/aarch64/clone.lo: warning: BTI turned on > by -z force-bti when all inputs do not have BTI in NOTE section. 
> /usr/bin/ld: obj/src/thread/aarch64/syscall_cp.lo: warning: BTI turned > on by -z force-bti when all inputs do not have BTI in NOTE section. > > > > > Rich ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [musl] PAC/BTI Support on aarch64 2024-02-12 21:25 ` William Roberts 2024-02-12 21:34 ` enh @ 2024-02-12 22:46 ` Rich Felker 2024-02-12 23:05 ` enh 1 sibling, 1 reply; 25+ messages in thread From: Rich Felker @ 2024-02-12 22:46 UTC (permalink / raw) To: William Roberts; +Cc: musl On Mon, Feb 12, 2024 at 03:25:48PM -0600, William Roberts wrote: > On Mon, Feb 12, 2024 at 12:42 PM Rich Felker <dalias@libc.org> wrote: > > > > On Mon, Feb 12, 2024 at 10:38:50AM -0600, William Roberts wrote: > > > Hello, > > > > > > I was just wondering if there was any work being done to support PAC > > > and BTI in aarch64? I could add support but didn't want to duplicate > > > the work. > > > > I'm not aware of any active work on this, but before writing a full > > implementation, it would be really helpful to start with a basic > > proposal for the scope of changes needed to make it work to assess > > whether these are manageable and acceptable cost. > > It's a matter of building with -mbranch-protection=standard > > Just the ASM labels need the first instruction to be a BTI. They're in > the NOP space > so they are backwards compatible, older hardware will just NOP it. I think it's a little more elaborate than that. Those asm instructions need to be added (probably as .instr or .word or something, unless there's a way to spell this particular nop that existing tooling will understand). Or it could be made conditional, but that would require converting any asm that's not already .S files to .S. Not bad, but not quite as trivial as adding something to CFLAGS. I also wondered if [sig]setjmp/longjmp would be affected, but probably not. > It's been done for many projects, glibc and bionic have it. The > problem with BTI is that when one item in the link > list doesn't support BTI the loader/linker turns it off. So when it's > something like a libc that is fundamental in the link chain, > it turns it off for everything. 
This presumably requires some kind of machinery for how dynamic linking will work, and possibly turning it off if a library without it is dlopened? My understanding doing some brief searches though was that you can individually mprotect it off in certain regions. So maybe it's possible to just enable only for DSOs that support it? > The initial scope of code changes would be what's reported when > LDFLAGS=-Wl,-zforce-bti,--fatal-warnings Is there a way to disable these warnings so that every asm file does not need to be cluttered with annotations? Rich ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [musl] PAC/BTI Support on aarch64 2024-02-12 22:46 ` Rich Felker @ 2024-02-12 23:05 ` enh 2024-02-12 23:18 ` William Roberts 0 siblings, 1 reply; 25+ messages in thread From: enh @ 2024-02-12 23:05 UTC (permalink / raw) To: musl; +Cc: William Roberts On Mon, Feb 12, 2024 at 2:46 PM Rich Felker <dalias@libc.org> wrote: > > On Mon, Feb 12, 2024 at 03:25:48PM -0600, William Roberts wrote: > > On Mon, Feb 12, 2024 at 12:42 PM Rich Felker <dalias@libc.org> wrote: > > > > > > On Mon, Feb 12, 2024 at 10:38:50AM -0600, William Roberts wrote: > > > > Hello, > > > > > > > > I was just wondering if there was any work being done to support PAC > > > > and BTI in aarch64? I could add support but didn't want to duplicate > > > > the work. > > > > > > I'm not aware of any active work on this, but before writing a full > > > implementation, it would be really helpful to start with a basic > > > proposal for the scope of changes needed to make it work to assess > > > whether these are manageable and acceptable cost. > > > > It's a matter of building with -mbranch-protection=standard > > > > Just the ASM labels need the first instruction to be a BTI. They're in > > the NOP space > > so they are backwards compatible, older hardware will just NOP it. > > I think it's a little more elaborate than that. Those asm instructions > need to be added (probably as .instr or .word or something, unless > there's a way to spell this particular nop that existing tooling will > understand). depends on your toolchain version. when we added this to bionic, the toolchain work was still happening. so you'll want to test against whatever your oldest-supported toolchain is. > Or it could be made conditional, but that would require > converting any asm that's not already .S files to .S. Not bad, but not > quite as trivial as adding something to CFLAGS. > > I also wondered if [sig]setjmp/longjmp would be affected, but probably > not. 
bionic does use PAC, but i think glibc has its own "pointer mangling" thing? > > It's been done for many projects, glibc and bionic have it. The > > problem with BTI is that when one item in the link > > list doesn't support BTI the loader/linker turns it off. So when it's > > something like a libc that is fundamental in the link chain, > > it turns it off for everything. > > This presumably requires some kind of machinery for how dynamic > linking will work, and possibly turning it off if a library without it > is dlopened? > > My understanding doing some brief searches though was that you can > individually mprotect it off in certain regions. So maybe it's > possible to just enable only for DSOs that support it? correct. > > The initial scope of code changes would be what's reported when > > LDFLAGS=-Wl,-zforce-bti,--fatal-warnings > > Is there a way to disable these warnings so that every asm file does > not need to be cluttered with annotations? well, that's the ELF note stuff i was talking about, and if you don't have it you'll fall foul of the static linker saying "not all this code is BTI-enabled, therefore this .so isn't", and the dynamic linker doing nothing because the static linker effectively tells it not to. > Rich ^ permalink raw reply [flat|nested] 25+ messages in thread
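For reference, the "ELF note stuff" is a `.note.gnu.property` note whose descriptor holds a `GNU_PROPERTY_AARCH64_FEATURE_1_AND` property with the BTI bit set. The sketch below shows roughly what a loader-side check could look like, assuming a single little-endian ELF64 note already located in memory. The function names `rd32le` and `note_has_bti` are hypothetical; this is not musl or bionic code, and a real loader would first find the note via the program headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NT_GNU_PROPERTY_TYPE_0              5
#define GNU_PROPERTY_AARCH64_FEATURE_1_AND  0xc0000000u
#define GNU_PROPERTY_AARCH64_FEATURE_1_BTI  (1u << 0)

/* Read a 32-bit little-endian value regardless of host byte order. */
static uint32_t rd32le(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical check: returns 1 if the single note at buf advertises BTI,
 * 0 otherwise (including on any malformed or truncated input). */
static int note_has_bti(const unsigned char *buf, size_t len)
{
    if (len < 16) return 0;
    uint32_t namesz = rd32le(buf);
    uint32_t descsz = rd32le(buf + 4);
    uint32_t type   = rd32le(buf + 8);
    if (namesz != 4 || type != NT_GNU_PROPERTY_TYPE_0) return 0;
    if (memcmp(buf + 12, "GNU", 4) != 0) return 0;   /* includes the NUL */
    if (descsz > len - 16) return 0;

    const unsigned char *p = buf + 16;
    size_t left = descsz;
    while (left >= 8) {
        uint32_t pr_type   = rd32le(p);
        uint32_t pr_datasz = rd32le(p + 4);
        if (pr_datasz > left - 8) return 0;
        if (pr_type == GNU_PROPERTY_AARCH64_FEATURE_1_AND && pr_datasz >= 4)
            return !!(rd32le(p + 8) & GNU_PROPERTY_AARCH64_FEATURE_1_BTI);
        /* Each property is padded to an 8-byte boundary in ELF64. */
        size_t step = 8 + (((size_t)pr_datasz + 7) & ~(size_t)7);
        if (step > left) return 0;
        p += step;
        left -= step;
    }
    return 0;
}
```

If the static linker drops the property because one input lacked it, a check like this returns 0 and the dynamic linker leaves the mapping alone, which matches the behaviour described above.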
* Re: [musl] PAC/BTI Support on aarch64 2024-02-12 23:05 ` enh @ 2024-02-12 23:18 ` William Roberts 2024-02-13 2:08 ` Rich Felker 0 siblings, 1 reply; 25+ messages in thread From: William Roberts @ 2024-02-12 23:18 UTC (permalink / raw) To: enh; +Cc: musl On Mon, Feb 12, 2024 at 5:05 PM enh <enh@google.com> wrote: > > On Mon, Feb 12, 2024 at 2:46 PM Rich Felker <dalias@libc.org> wrote: > > > > On Mon, Feb 12, 2024 at 03:25:48PM -0600, William Roberts wrote: > > > On Mon, Feb 12, 2024 at 12:42 PM Rich Felker <dalias@libc.org> wrote: > > > > > > > > On Mon, Feb 12, 2024 at 10:38:50AM -0600, William Roberts wrote: > > > > > Hello, > > > > > > > > > > I was just wondering if there was any work being done to support PAC > > > > > and BTI in aarch64? I could add support but didn't want to duplicate > > > > > the work. > > > > > > > > I'm not aware of any active work on this, but before writing a full > > > > implementation, it would be really helpful to start with a basic > > > > proposal for the scope of changes needed to make it work to assess > > > > whether these are manageable and acceptable cost. > > > > > > It's a matter of building with -mbranch-protection=standard > > > > > > Just the ASM labels need the first instruction to be a BTI. They're in > > > the NOP space > > > so they are backwards compatible, older hardware will just NOP it. > > > > I think it's a little more elaborate than that. Those asm instructions > > need to be added (probably as .instr or .word or something, unless > > there's a way to spell this particular nop that existing tooling will > > understand). > > depends on your toolchain version. when we added this to bionic, the > toolchain work was still happening. so you'll want to test against > whatever your oldest-supported toolchain is. > You just use the hint <immediate> instructions, they are understood by old toolchains. 
But you can only support a subset of the BTI/PAC instructions but it's been enough for most projects that follow the normal ABI conventions like OpenSSL/BoringSSL,etc, but not enough for libffi for example. > > Or it could be made conditional, but that would require > > converting any asm that's not already .S files to .S. Not bad, but not as in inline asm? Unless it's a branch target, no need. > > quite as trivial as adding something to CFLAGS. That's not really what I said... > > > > I also wondered if [sig]setjmp/longjmp would be affected, but probably > > not. > > bionic does use PAC, but i think glibc has its own "pointer mangling" thing? You need it, as the first instruction from a branch (where longjmp returns to) needs to be a BTI instruction. > > > > It's been done for many projects, glibc and bionic have it. The > > > problem with BTI is that when one item in the link > > > list doesn't support BTI the loader/linker turns it off. So when it's > > > something like a libc that is fundamental in the link chain, > > > it turns it off for everything. > > > > This presumably requires some kind of machinery for how dynamic > > linking will work, and possibly turning it off if a library without it > > is dlopened? > > > > My understanding doing some brief searches though was that you can > > individually mprotect it off in certain regions. So maybe it's > > possible to just enable only for DSOs that support it? > > correct. > > > > The initial scope of code changes would be what's reported when > > > LDFLAGS=-Wl,-zforce-bti,--fatal-warnings > > > > Is there a way to disable these warnings so that every asm file does > > not need to be cluttered with annotations? > > well, that's the ELF note stuff i was talking about, and if you don't > have it you'll fall foul of the static linker saying "not all this > code is BTI-enabled, therefore this .so isn't", and the dynamic linker > doing nothing because the static linker effectively tells it not to. Yep, well said ENH. 
It's been since Android since we crossed paths :-). It's not that hard to annotate an asm file :-p I forget what project (I think it was gnutls, but they just use openssl's code for the asm) but I just put it in a header file and by virtue of #include'ing it you get the notes added. > > > Rich ^ permalink raw reply [flat|nested] 25+ messages in thread
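The reason the `hint <immediate>` spelling works on old toolchains and old hardware is that the basic BTI and PAC instructions are allocated inside the HINT encoding space, so they are NOP encodings with a different immediate. A small sketch that computes the encodings, for illustration only. The helper `a64_hint` and the enum names are made up here; in an asm file one would simply write, for example, `hint 34` for `bti c`.

```c
#include <stdint.h>

/* A64 "HINT #imm7" is the NOP encoding with imm7 placed in bits [11:5]. */
#define A64_NOP 0xd503201fu

static uint32_t a64_hint(unsigned imm7)
{
    return A64_NOP | ((uint32_t)(imm7 & 0x7f) << 5);
}

/* Hint immediates for the instructions relevant to this thread. */
enum {
    HINT_PACIASP = 25,  /* sign LR using SP as modifier */
    HINT_AUTIASP = 29,  /* authenticate LR using SP as modifier */
    HINT_BTI_C   = 34,  /* landing pad for indirect calls (BLR) */
    HINT_BTI_J   = 36,  /* landing pad for indirect jumps (BR) */
    HINT_BTI_JC  = 38   /* landing pad for both */
};
```

For example, `a64_hint(HINT_BTI_C)` yields `0xd503245f`, the same word an assembler emits for `bti c`. Instructions outside the hint space (such as `retaa`) have no such spelling, which is the "subset" limitation mentioned above.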
* Re: [musl] PAC/BTI Support on aarch64 2024-02-12 23:18 ` William Roberts @ 2024-02-13 2:08 ` Rich Felker 2024-02-13 14:47 ` William Roberts 2024-02-15 0:03 ` Szabolcs Nagy 0 siblings, 2 replies; 25+ messages in thread From: Rich Felker @ 2024-02-13 2:08 UTC (permalink / raw) To: William Roberts; +Cc: enh, musl On Mon, Feb 12, 2024 at 05:18:22PM -0600, William Roberts wrote: > On Mon, Feb 12, 2024 at 5:05 PM enh <enh@google.com> wrote: > > > > On Mon, Feb 12, 2024 at 2:46 PM Rich Felker <dalias@libc.org> wrote: > > > > > > On Mon, Feb 12, 2024 at 03:25:48PM -0600, William Roberts wrote: > > > > On Mon, Feb 12, 2024 at 12:42 PM Rich Felker <dalias@libc.org> wrote: > > > > > > > > > > On Mon, Feb 12, 2024 at 10:38:50AM -0600, William Roberts wrote: > > > > > > Hello, > > > > > > > > > > > > I was just wondering if there was any work being done to support PAC > > > > > > and BTI in aarch64? I could add support but didn't want to duplicate > > > > > > the work. > > > > > > > > > > I'm not aware of any active work on this, but before writing a full > > > > > implementation, it would be really helpful to start with a basic > > > > > proposal for the scope of changes needed to make it work to assess > > > > > whether these are manageable and acceptable cost. > > > > > > > > It's a matter of building with -mbranch-protection=standard > > > > > > > > Just the ASM labels need the first instruction to be a BTI. They're in > > > > the NOP space > > > > so they are backwards compatible, older hardware will just NOP it. > > > > > > I think it's a little more elaborate than that. Those asm instructions > > > need to be added (probably as .instr or .word or something, unless > > > there's a way to spell this particular nop that existing tooling will > > > understand). > > > > depends on your toolchain version. when we added this to bionic, the > > toolchain work was still happening. so you'll want to test against > > whatever your oldest-supported toolchain is. 
> > > > You just use the hint <immediate> instructions, they are understood by old > toolchains. But you can only support a subset of the BTI/PAC instructions > but it's been enough for most projects that follow the normal ABI conventions > like OpenSSL/BoringSSL,etc, but not enough for libffi for example. If hint goes all the way back, that's probably fine and ideal to use. > > > Or it could be made conditional, but that would require > > > converting any asm that's not already .S files to .S. Not bad, but not > > as in inline asm? Unless it's a branch target, no need. No, .S (preprocessed) vs .s (not). But if the hint insn works, I think just having it there unconditionally is probably the way to go. > > > I also wondered if [sig]setjmp/longjmp would be affected, but probably > > > not. > > > > bionic does use PAC, but i think glibc has its own "pointer mangling" thing? > > You need it, as the first instruction from a branch (where longjmp returns to) > needs to be a BTI instruction. Is that different from a normal function return? Note that in the case of sigsetjmp, (sig)longjmp returns to a point inside the sigsetjmp asm, so that point needs the annotation I think. > > > > It's been done for many projects, glibc and bionic have it. The > > > > problem with BTI is that when one item in the link > > > > list doesn't support BTI the loader/linker turns it off. So when it's > > > > something like a libc that is fundamental in the link chain, > > > > it turns it off for everything. > > > > > > This presumably requires some kind of machinery for how dynamic > > > linking will work, and possibly turning it off if a library without it > > > is dlopened? > > > > > > My understanding doing some brief searches though was that you can > > > individually mprotect it off in certain regions. So maybe it's > > > possible to just enable only for DSOs that support it? > > > > correct. OK, that's good to know. So which direction is it? 
Do DSOs that support BTI need it explicitly turned on via mprotect/mmap flags? Or is there some process-global flag to turn it on, and then ones that don't support it need it turned off? I suspect it's possible to first enable BTI for third-party libraries as a feature of the dynamic linker, and add BTI support for libc itself as a separate thing. That might be a nice factoring to make changes minimal and easy for ppl to read. The changes in dynlink.c should be as arch-agnostic as possible. If there's a corresponding feature on other archs, it should use the same basic code, with arch-specific headers (arch/$ARCH/reloc.h) defining the mechanisms for evaluating if an ELF file is compatible, how to do the mprotect, etc. > > > > The initial scope of code changes would be what's reported when > > > > LDFLAGS=-Wl,-zforce-bti,--fatal-warnings > > > > > > Is there a way to disable these warnings so that every asm file does > > > not need to be cluttered with annotations? > > > > well, that's the ELF note stuff i was talking about, and if you don't > > have it you'll fall foul of the static linker saying "not all this > > code is BTI-enabled, therefore this .so isn't", and the dynamic linker > > doing nothing because the static linker effectively tells it not to. > > Yep, well said ENH. It's been since Android since we crossed paths :-). > > It's not that hard to annotate an asm file :-p I forget what project > (I think it was gnutls, but they just use openssl's code for the asm) > but I just put it in a header file and by virtue of #include'ing it you get the > notes added. Yes, we generally don't do that. There are no "asm headers" in musl; all asm files are self-contained and readable standalone. So if there's no way to tell the assembler/linker from the command line that files are BTI-compatible without generating a huge load of warning spam, I guess it's a mess of copy-and-paste... Rich ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [musl] PAC/BTI Support on aarch64 2024-02-13 2:08 ` Rich Felker @ 2024-02-13 14:47 ` William Roberts 2024-02-13 17:51 ` Markus Wichmann 2024-02-15 0:03 ` Szabolcs Nagy 1 sibling, 1 reply; 25+ messages in thread From: William Roberts @ 2024-02-13 14:47 UTC (permalink / raw) To: Rich Felker; +Cc: enh, musl On Mon, Feb 12, 2024 at 8:08 PM Rich Felker <dalias@libc.org> wrote: > > On Mon, Feb 12, 2024 at 05:18:22PM -0600, William Roberts wrote: > > On Mon, Feb 12, 2024 at 5:05 PM enh <enh@google.com> wrote: > > > > > > On Mon, Feb 12, 2024 at 2:46 PM Rich Felker <dalias@libc.org> wrote: > > > > > > > > On Mon, Feb 12, 2024 at 03:25:48PM -0600, William Roberts wrote: > > > > > On Mon, Feb 12, 2024 at 12:42 PM Rich Felker <dalias@libc.org> wrote: > > > > > > > > > > > > On Mon, Feb 12, 2024 at 10:38:50AM -0600, William Roberts wrote: > > > > > > > Hello, > > > > > > > > > > > > > > I was just wondering if there was any work being done to support PAC > > > > > > > and BTI in aarch64? I could add support but didn't want to duplicate > > > > > > > the work. > > > > > > > > > > > > I'm not aware of any active work on this, but before writing a full > > > > > > implementation, it would be really helpful to start with a basic > > > > > > proposal for the scope of changes needed to make it work to assess > > > > > > whether these are manageable and acceptable cost. > > > > > > > > > > It's a matter of building with -mbranch-protection=standard > > > > > > > > > > Just the ASM labels need the first instruction to be a BTI. They're in > > > > > the NOP space > > > > > so they are backwards compatible, older hardware will just NOP it. > > > > > > > > I think it's a little more elaborate than that. Those asm instructions > > > > need to be added (probably as .instr or .word or something, unless > > > > there's a way to spell this particular nop that existing tooling will > > > > understand). > > > > > > depends on your toolchain version. 
when we added this to bionic, the > > > toolchain work was still happening. so you'll want to test against > > > whatever your oldest-supported toolchain is. > > > > > > > You just use the hint <immediate> instructions, they are understood by old > > toolchains. But you can only support a subset of the BTI/PAC instructions > > but it's been enough for most projects that follow the normal ABI conventions > > like OpenSSL/BoringSSL,etc, but not enough for libffi for example. > > If hint goes all the way back, that's probably fine and ideal to use. It should. Is there a known minimal tool chain requirement and I can test? > > > > > Or it could be made conditional, but that would require > > > > converting any asm that's not already .S files to .S. Not bad, but not > > > > as in inline asm? Unless it's a branch target, no need. > > No, .S (preprocessed) vs .s (not). But if the hint insn works, I think > just having it there unconditionally is probably the way to go. > > > > > I also wondered if [sig]setjmp/longjmp would be affected, but probably > > > > not. > > > > > > bionic does use PAC, but i think glibc has its own "pointer mangling" thing? > > > > You need it, as the first instruction from a branch (where longjmp returns to) > > needs to be a BTI instruction. > > Is that different from a normal function return? No, anywhere branches are allowed, a BTI instruction must be the first instruction. BTI is just a way for software to say, hey this is a valid jump/branch target, allow it. This reduces the amount of gadgets available to an attacker, which is why libc is such a juicy target, as it's in everything. A lot of things static link it, which effectively turns it off for the whole process. > > Note that in the case of sigsetjmp, (sig)longjmp returns to a point > inside the sigsetjmp asm, so that point needs the annotation I think. > > > > > > It's been done for many projects, glibc and bionic have it. 
The > > > > > problem with BTI is that when one item in the link > > > > > list doesn't support BTI the loader/linker turns it off. So when it's > > > > > something like a libc that is fundamental in the link chain, > > > > > it turns it off for everything. > > > > > > > > This presumably requires some kind of machinery for how dynamic > > > > linking will work, and possibly turning it off if a library without it > > > > is dlopened? > > > > > > > > My understanding doing some brief searches though was that you can > > > > individually mprotect it off in certain regions. So maybe it's > > > > possible to just enable only for DSOs that support it? > > > > > > correct. > > OK, that's good to know. So which direction is it? Do DSOs that > support BTI need it explicitly turned on via mprotect/mmap flags? Yes, so the kernel will manage the EL1 register flag for this, and then mprotect sets the PROT_BTI flag during dlopen(). > Or > is there some process-global flag to turn it on, and then ones that > don't support it need it turned off? EL1 MSR register (I forget which one offhand), but the granularity is managed at the page level. > > I suspect it's possible to first enable BTI for third-party libraries > as a feature of the dynamic linker, If you mean, check the GNU Notes section for BTI enabled and set PROT_BTI via mprotect, that's just one of the many patches, but can be taken independently. > and add BTI support for libc > itself as a separate thing. That might be a nice factoring to make > changes minimal and easy for ppl to read. This is just a matter of organizing things, there's no dependency between enabling the linker and enabling the library itself. So of course that shouldn't come as one giant patch. It's important to note, that even when enabling the assembly code files, if the C level source is not built with -mbranch-protection=standard, the feature will remain off for the library. 
BTI is enabled for third party packages on Fedora by default: - https://fedoraproject.org/wiki/Changes/Aarch64_PointerAuthentication The problem now is all the packages that don't use the default set of CFLAGS and/or roll their own asm. > > The changes in dynlink.c should be as arch-agnostic as possible. If > there's a corresponding feature on other archs I can't think of anything like this offhand, but arches may want to add prot flags to mprotect calls. > it should use the same > basic code, with arch-specific headers (arch/$ARCH/reloc.h) defining > the mechanisms for evaluating if an ELF file is compatible, how to do > the mprotect, etc. It usually looks like:

    #ifdef __aarch64__
    if (gnu_notes_bti_set && (prot & PROT_EXEC)) {
        prot |= PROT_BTI;
    } else {
        prot &= ~PROT_BTI;
    }
    #endif
    mprotect(..., prot);

but this could be done with something like an arch specific macro fn or inline in a header that just does nothing for most architectures or a weak symbol, but I am always worried with weak symbols someone might override it in a bad way. > > > > > > The initial scope of code changes would be what's reported when > > > > > LDFLAGS=-Wl,-zforce-bti,--fatal-warnings > > > > > > > > Is there a way to disable these warnings so that every asm file does > > > > not need to be cluttered with annotations? > > > > > > well, that's the ELF note stuff i was talking about, and if you don't > > > have it you'll fall foul of the static linker saying "not all this > > > code is BTI-enabled, therefore this .so isn't", and the dynamic linker > > > doing nothing because the static linker effectively tells it not to. > > > > Yep, well said ENH. It's been since Android since we crossed paths :-). > > > > It's not that hard to annotate an asm file :-p I forget what project > > (I think it was gnutls, but they just use openssl's code for the asm) > > but I just put it in a header file and by virtue of #include'ing it you get the > > notes added. > > Yes, we generally don't do that.
There are no "asm headers" in musl; > all asm files are self-contained and readable standalone. So if > there's no way to tell the assembler/linker from the command line that > files are BTI-compatible without generating a huge load of warning > spam, I guess it's a mess of copy-and-paste... > > Rich ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [musl] PAC/BTI Support on aarch64 2024-02-13 14:47 ` William Roberts @ 2024-02-13 17:51 ` Markus Wichmann 2024-02-14 2:19 ` Rich Felker 0 siblings, 1 reply; 25+ messages in thread From: Markus Wichmann @ 2024-02-13 17:51 UTC (permalink / raw) To: musl; +Cc: Rich Felker, enh Am Tue, Feb 13, 2024 at 08:47:42AM -0600 schrieb William Roberts: > It should. Is there a known minimal tool chain requirement and I can test? > Typically the first C99 compiler or the first aarch64 compiler, whichever is younger. > > No, anywhere branches are allowed, a BTI instruction must be the first > instruction. BTI is just a way for software to say, hey this is a > valid jump/branch > target, allow it. This reduces the amount of gadgets available to an > attacker, which > is why libc is such a juicy target, as it's in everything. A lot of > things static link it, > which effectively turns it off for the whole process. > So this means there must be a BTI instruction following every single BL instruction. But in the end this isn't that much different from endbr64 on the PC. Whatever happened to those patches, BTW? > Yes, so the kernel will manage the EL1 register flag for this, and then > mprotect sets the PROT_BTI flag during dlopen(). > Well, this is a novelty. This is the first time there will be an arch-specific flag in mmap()/mprotect() for the musl dynlinker. So far that code has been entirely portable. > It's important to note, that even when enabling the assembly code files, if the > C level source is not built with -mbranch-protection=standard, the feature will > remain off for the library. > Arch-specific compiler flags are not a problem; configure.sh can add those as needed. > I can't think of anything like this offhand, but aarches may want to add prot > flags to mprotect calls. > That hasn't happened yet. Of course, this may be as simple as adding a static inline function. The fact that the important information is in a note section is yet another novelty, of course. 
So far, the important information (even arch-specific) has been contained in the dynamic section. > it usually > #ifdef aarch64 > if (gnu_notes_bti_set && (prot & PROT_EXEC)) { > prot |= PROT_BTI; > else { > prot &= ~PROT_BTI; > } > #endif > > mprotect(..., prot); > So far we have managed to steer clear of conditional inclusion, and I think we should try to keep it that way. Ciao, Markus ^ permalink raw reply [flat|nested] 25+ messages in thread
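For readers following the point about the note section: a minimal sketch of what "check the GNU notes for BTI" could look like. It assumes the buffer holds exactly one NT_GNU_PROPERTY_TYPE_0 note in the 64-bit ELF layout (properties padded to 8 bytes) and in host byte order. The function name is hypothetical and is not taken from musl or from any patch in this thread; the constants are the values from the AArch64 ELF ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values from the ELF gABI extension and the AArch64 ELF ABI. */
#define NT_GNU_PROPERTY_TYPE_0              5
#define GNU_PROPERTY_AARCH64_FEATURE_1_AND  0xc0000000u
#define GNU_PROPERTY_AARCH64_FEATURE_1_BTI  (1u << 0)
#define GNU_PROPERTY_AARCH64_FEATURE_1_PAC  (1u << 1)

/* Unaligned-safe 32-bit read in host byte order. */
static uint32_t rd32(const unsigned char *p)
{
	uint32_t v;
	memcpy(&v, p, sizeof v);
	return v;
}

/* Hypothetical helper: return the AArch64 feature bits found in a single
 * GNU property note, or 0 if the note is absent or malformed. */
static uint32_t gnu_property_aarch64_features(const unsigned char *note, size_t len)
{
	assert(note || !len);
	if (len < 16) return 0;                       /* namesz, descsz, type, "GNU\0" */
	uint32_t namesz = rd32(note);
	uint32_t descsz = rd32(note + 4);
	uint32_t type = rd32(note + 8);
	if (type != NT_GNU_PROPERTY_TYPE_0 || namesz != 4) return 0;
	if (memcmp(note + 12, "GNU", 4)) return 0;    /* compares the NUL as well */
	if (descsz > len - 16) return 0;

	const unsigned char *p = note + 16, *end = p + descsz;
	while (end - p >= 8) {
		uint32_t pr_type = rd32(p);
		uint32_t pr_datasz = rd32(p + 4);
		p += 8;
		if (pr_datasz > (size_t)(end - p)) return 0;
		if (pr_type == GNU_PROPERTY_AARCH64_FEATURE_1_AND && pr_datasz == 4)
			return rd32(p);
		/* Skip this property; data is padded to 8 bytes on 64-bit ELF. */
		size_t step = ((size_t)pr_datasz + 7) & ~(size_t)7;
		if (step > (size_t)(end - p)) break;
		p += step;
	}
	return 0;
}
```

A loader would then test the result against GNU_PROPERTY_AARCH64_FEATURE_1_BTI before deciding whether executable segments get the extra protection flag. Locating the note (via PT_GNU_PROPERTY or PT_NOTE) is left out of the sketch.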
* Re: [musl] PAC/BTI Support on aarch64 2024-02-13 17:51 ` Markus Wichmann @ 2024-02-14 2:19 ` Rich Felker 2024-02-14 3:19 ` William Roberts ` (2 more replies) 0 siblings, 3 replies; 25+ messages in thread From: Rich Felker @ 2024-02-14 2:19 UTC (permalink / raw) To: Markus Wichmann; +Cc: musl, enh On Tue, Feb 13, 2024 at 06:51:47PM +0100, Markus Wichmann wrote: > Am Tue, Feb 13, 2024 at 08:47:42AM -0600 schrieb William Roberts: > > It should. Is there a known minimal tool chain requirement and I can test? > > Typically the first C99 compiler or the first aarch64 compiler, > whichever is younger. I think binutils is the relevant component, and that'd be whichever version of binutils added aarch64. > > No, anywhere branches are allowed, a BTI instruction must be the first > > instruction. BTI is just a way for software to say, hey this is a > > valid jump/branch > > target, allow it. This reduces the amount of gadgets available to an > > attacker, which > > is why libc is such a juicy target, as it's in everything. A lot of > > things static link it, > > which effectively turns it off for the whole process. > > > > So this means there must be a BTI instruction following every single BL > instruction. > > But in the end this isn't that much different from endbr64 on the PC. > Whatever happened to those patches, BTW? What is the situation on x86? Does it use the same kind of per-page enforcement mode, or is it only global, requiring disabling it if any DSO lacks support? Is the endbr64 opcode a guaranteed-safe nop on older ISA levels, or does it need to be conditional? > > Yes, so the kernel will manage the EL1 register flag for this, and then > > mprotect sets the PROT_BTI flag during dlopen(). > > Well, this is a novelty. This is the first time there will be an > arch-specific flag in mmap()/mprotect() for the musl dynlinker. So far > that code has been entirely portable. Can the flag be used at mmap time, or only in mprotect? 
It would be a lot more efficient to do it as part of the mmap, but getting visibility to the note to know you need it at mmap time seems difficult and more costly than doing the mprotect later... I assume we would either add the code conditional on the existence of a PROT_BTI macro (#ifdef) and define that to the corresponding thing on other archs in the future, or abstract it with a new name in arch/$ARCH/reloc.h defined in terms of whatever the arch provides so as to be a little bit more naming-agnostic. It should not be #ifdef __aarch64__ or similar. > > It's important to note, that even when enabling the assembly code files, if the > > C level source is not built with -mbranch-protection=standard, the feature will > > remain off for the library. > > > > Arch-specific compiler flags are not a problem; configure.sh can add > those as needed. Yep, that's fine. Possibly a question of whether it should be on by default or configurable, but if there's essentially no cost, on-by-default seems fine. > > I can't think of anything like this offhand, but aarches may want to add prot > > flags to mprotect calls. > > That hasn't happened yet. Of course, this may be as simple as adding a > static inline function. The fact that the important information is in a > note section is yet another novelty, of course. So far, the important > information (even arch-specific) has been contained in the dynamic > section. Yes, that's gratuitously annoying. Ideally it would have been somewhere easily accessible from the Ehdr so it's available at initial mmap time... :/ > > it usually > > #ifdef aarch64 > > if (gnu_notes_bti_set && (prot & PROT_EXEC)) { > > prot |= PROT_BTI; > > else { > > prot &= ~PROT_BTI; > > } > > #endif > > > > mprotect(..., prot); > > > > So far we have managed to steer clear of conditional inclusion, and I > think we should try to keep it that way. Yes. 
I think reloc.h should define a predicate macro (which may call a static inline function if the predicate is complex) to check if a DSO needs branch protection on its PROT_EXEC segments. src/internal/dynlink.h could provide a default always-false one if it's not defined. Then dynlink.c can just, when that predicate evaluates true, loop thru the segments and mprotect any PROT_EXEC ones to also have PROT_BTI or whatever. This remains very arch-agnostic and the code should be either directly usable on other archs, or admit easy generalization if needed. Rich ^ permalink raw reply [flat|nested] 25+ messages in thread
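A rough sketch of the predicate-macro idea described above, under stated assumptions: the macro names (DSO_NEEDS_BRANCH_PROT, ARCH_BRANCH_PROT), the struct layout, and the helper are all hypothetical and only illustrate the shape of the proposal. The real change would call mprotect() on each segment; here the loop only updates recorded protection bits so the sketch stays runnable on any host. The 0x10 fallback for PROT_BTI is the Linux aarch64 value.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef PROT_BTI
#define PROT_BTI 0x10   /* Linux aarch64 value; fallback so the sketch builds elsewhere */
#endif

/* Hypothetical stand-ins for the loader's per-DSO bookkeeping. */
struct seg_sketch { int prot; };
struct dso_sketch {
	int bti_note;               /* set if the GNU property note advertised BTI */
	size_t nsegs;
	struct seg_sketch segs[4];
};

/* What an arch's reloc.h might provide (names are assumptions). */
#define DSO_NEEDS_BRANCH_PROT(d) ((d)->bti_note)
#define ARCH_BRANCH_PROT PROT_BTI

/* What a generic header might provide when the arch defines nothing. */
#ifndef DSO_NEEDS_BRANCH_PROT
#define DSO_NEEDS_BRANCH_PROT(d) 0
#define ARCH_BRANCH_PROT 0
#endif

/* Arch-agnostic loop: add the branch-protection flag to every executable
 * segment of a DSO that asked for it. Returns the number of segments changed. */
static size_t apply_branch_prot(struct dso_sketch *d)
{
	size_t changed = 0;
	assert(d && d->nsegs <= 4);
	if (!DSO_NEEDS_BRANCH_PROT(d)) return 0;
	for (size_t i = 0; i < d->nsegs; i++) {
		if (d->segs[i].prot & PROT_EXEC) {
			d->segs[i].prot |= ARCH_BRANCH_PROT;
			changed++;      /* a real loader would mprotect() the segment here */
		}
	}
	return changed;
}
```

The point of this shape is that the loop itself contains no architecture names, so another arch with a similar per-page mechanism would only need to supply its own two definitions.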
* Re: [musl] PAC/BTI Support on aarch64 2024-02-14 2:19 ` Rich Felker @ 2024-02-14 3:19 ` William Roberts 2024-02-14 4:44 ` Markus Wichmann 2024-02-15 13:29 ` Stefan O'Rear 2 siblings, 0 replies; 25+ messages in thread From: William Roberts @ 2024-02-14 3:19 UTC (permalink / raw) To: musl; +Cc: Markus Wichmann, enh On Tue, Feb 13, 2024 at 8:19 PM Rich Felker <dalias@libc.org> wrote: > > On Tue, Feb 13, 2024 at 06:51:47PM +0100, Markus Wichmann wrote: > > Am Tue, Feb 13, 2024 at 08:47:42AM -0600 schrieb William Roberts: > > > It should. Is there a known minimal tool chain requirement and I can test? > > > > Typically the first C99 compiler or the first aarch64 compiler, > > whichever is younger. > > I think binutils is the relevant component, and that'd be whichever > version of binutils added aarch64. AFAICT 2.24 tagged Dec 2013 > > > > No, anywhere branches are allowed, a BTI instruction must be the first > > > instruction. BTI is just a way for software to say, hey this is a > > > valid jump/branch > > > target, allow it. This reduces the amount of gadgets available to an > > > attacker, which > > > is why libc is such a juicy target, as it's in everything. A lot of > > > things static link it, > > > which effectively turns it off for the whole process. > > > > > > > So this means there must be a BTI instruction following every single BL > > instruction. > > I don't think so, you wouldn't want call sites to effectively become gadget locations. You want entry points marked, returns are handled with PAC, which goes hand in hand with BTI. As the PAC instruction can also be a landing pad. Looking at some generated ASM, I don't see BL's being marked. 
Here's a decent doc BTW:
- https://www.google.com/search?q=arm+introduction+to+bti&rlz=1C5GCEM_enUS1088US1089&oq=arm+introduction+to+bti&gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIHCAEQIRigATIHCAIQIRigATIHCAMQIRifBTIHCAQQIRifBTIHCAUQIRifBdIBCDM2NTZqMGo3qAIAsAIA&sourceid=chrome&ie=UTF-8#:~:text=Arm%20Instruction%20Set,Arm%20Compiler%206

Essentially, call targets need a valid PAC/BTI instruction as a landing pad; if using PAC, there must also be a validate before ret.

landing pad with pointer auth (with the A key): paciasp or hint #25
validate with autiasp or hint #29
landing pad with pointer auth (with the B key): pacibsp or hint #27
validate with autibsp or hint #31
landing pad BTI only: bti c or hint #34

The compilers set some defines so you know which key to use, but some projects just support the A key. To support other keys, you would need to go the route of conditional asm, but almost everyone just uses the A key, it's what's turned on by -mbranch-protection=standard (I think), easy to catch in the arm header and balk if someone sets the B key. > > But in the end this isn't that much different from endbr64 on the PC. > > Whatever happened to those patches, BTW? > > What is the situation on x86? Does it use the same kind of per-page > enforcement mode, or is it only global, requiring disabling it if any > DSO lacks support? Is the endbr64 opcode a guaranteed-safe nop on > older ISA levels, or does it need to be conditional? > > > > Yes, so the kernel will manage the EL1 register flag for this, and then > > > mprotect sets the PROT_BTI flag during dlopen(). > > > > Well, this is a novelty. This is the first time there will be an > > arch-specific flag in mmap()/mprotect() for the musl dynlinker. So far > > that code has been entirely portable. > > Can the flag be used at mmap time, or only in mprotect? 
It would be a > lot more efficient to do it as part of the mmap, but getting > visibility to the note to know you need it at mmap time seems > difficult and more costly than doing the mprotect later... > > I assume we would either add the code conditional on the existence of > a PROT_BTI macro (#ifdef) and define that to the corresponding thing > on other archs in the future, or abstract it with a new name in > arch/$ARCH/reloc.h defined in terms of whatever the arch provides so > as to be a little bit more naming-agnostic. > > It should not be #ifdef __aarch64__ or similar. > > > > It's important to note, that even when enabling the assembly code files, if the > > > C level source is not built with -mbranch-protection=standard, the feature will > > > remain off for the library. > > > > > > > Arch-specific compiler flags are not a problem; configure.sh can add > > those as needed. > > Yep, that's fine. Possibly a question of whether it should be on by > default or configurable, but if there's essentially no cost, > on-by-default seems fine. > > > > I can't think of anything like this offhand, but aarches may want to add prot > > > flags to mprotect calls. > > > > That hasn't happened yet. Of course, this may be as simple as adding a > > static inline function. The fact that the important information is in a > > note section is yet another novelty, of course. So far, the important > > information (even arch-specific) has been contained in the dynamic > > section. > > Yes, that's gratuitously annoying. Ideally it would have been > somewhere easily accessible from the Ehdr so it's available at initial > mmap time... :/ > > > > it usually > > > #ifdef aarch64 > > > if (gnu_notes_bti_set && (prot & PROT_EXEC)) { > > > prot |= PROT_BTI; > > > else { > > > prot &= ~PROT_BTI; > > > } > > > #endif > > > > > > mprotect(..., prot); > > > > > > > So far we have managed to steer clear of conditional inclusion, and I > > think we should try to keep it that way. > > Yes. 
I think reloc.h should define a predicate macro (which may call a > static inline function if the predicate is complex) to check if a DSO > needs branch protection on its PROT_EXEC segments. > src/internal/dynlink.h could provide a default always-false one if > it's not defined. Then dynlink.c can just, when that predicate > evaluates true, loop thru the segments and mprotect any PROT_EXEC ones > to also have PROT_BTI or whatever. I was tinkering today, arch/generic has a bunch of empty files, just add an empty file and let arches add to it. This code is far from useful, but just clarifying that approach. diff --git a/arch/aarch64/mprot_arch.h b/arch/aarch64/mprot_arch.h new file mode 100644 index 00000000..32f7afc6 --- /dev/null +++ b/arch/aarch64/mprot_arch.h @@ -0,0 +1,13 @@ +#ifndef _MPROT_ARCH_H +#define _MPROT_ARCH_H + +static inline int do_mprot(int prot) { + if (prot & PROT_EXEC) + +#define MPROT_ARCH(prot) \ +do { + if (prot & PROT_EXEC) { \ + prot |= PROT_BTI; \ + } \ +} while(0) +#endif diff --git a/arch/generic/mprot_arch.h b/arch/generic/mprot_arch.h new file mode 100644 index 00000000..e69de29b diff --git a/ldso/dynlink.c b/ldso/dynlink.c index 324aa859..a9b2278a 100644 --- a/ldso/dynlink.c +++ b/ldso/dynlink.c @@ -22,6 +22,7 @@ #include "pthread_impl.h" #include "fork_impl.h" #include "dynlink.h" +#include "mprot_arch.h" static size_t ldso_page_size; #ifndef PAGE_SIZE @@ -851,7 +852,9 @@ static void *map_library(int fd, struct dso *dso) } for (i=0; ((size_t *)(base+dyn))[i]; i+=2) if (((size_t *)(base+dyn))[i]==DT_TEXTREL) { - if (mprotect(map, map_len, PROT_READ|PROT_WRITE|PROT_EXEC) + int prot = PROT_READ|PROT_WRITE|PROT_EXEC; + MPROT_ARCH(prot); + if (mprotect(map, map_len, prot) && errno != ENOSYS) goto error; break; > This remains very arch-agnostic and the code should be either directly > usable on other archs, or admit easy generalization if needed. > > Rich ^ permalink raw reply [flat|nested] 25+ messages in thread
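To illustrate why the landing-pad and PAC instructions listed in this message are "in the NOP space": each one is an A64 HINT #imm, and HINT #0 is NOP itself. The sketch below computes the encodings from the hint numbers. The helper names are made up for illustration; the base word 0xd503201f is the A64 NOP encoding, with the 7-bit hint number in bits [11:5].

```c
#include <assert.h>
#include <stdint.h>

#define A64_HINT_BASE 0xd503201fu   /* HINT #0, which is NOP */

/* Encode HINT #imm7 (the CRm:op2 field occupies bits [11:5]). */
static uint32_t a64_hint(unsigned imm7)
{
	assert(imm7 < 128);
	return A64_HINT_BASE | ((uint32_t)imm7 << 5);
}

/* True if insn lies anywhere in the HINT space, i.e. hardware that does
 * not implement the specific hint executes it as a NOP. */
static int a64_is_hint(uint32_t insn)
{
	return (insn & ~(0x7fu << 5)) == A64_HINT_BASE;
}
```

With these, a64_hint(25) is paciasp, a64_hint(29) is autiasp, a64_hint(27) is pacibsp, a64_hint(31) is autibsp, and a64_hint(34) is bti c.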
* Re: [musl] PAC/BTI Support on aarch64 2024-02-14 2:19 ` Rich Felker 2024-02-14 3:19 ` William Roberts @ 2024-02-14 4:44 ` Markus Wichmann 2024-02-14 13:32 ` Thorsten Glaser 2024-02-15 13:29 ` Stefan O'Rear 2 siblings, 1 reply; 25+ messages in thread From: Markus Wichmann @ 2024-02-14 4:44 UTC (permalink / raw) To: musl; +Cc: enh Am Tue, Feb 13, 2024 at 09:19:25PM -0500 schrieb Rich Felker: > What is the situation on x86? Does it use the same kind of per-page > enforcement mode, or is it only global, requiring disabling it if any > DSO lacks support? Is the endbr64 opcode a guaranteed-safe nop on > older ISA levels, or does it need to be conditional? > My, what a journey. I had a look around the Internet for this question and kept finding contradictory results. Turns out that is because, as per kernel documentation, Linux only supports *kernel* IBT. The only part of CET it supports for userspace is shadow stacks. Unless the kernel docs are not up-to-date, of course. According to Intel, the ENDBR64 instruction decodes as NOP on older processors. GCC has support for emitting it, but at this point in time it appears to be useless outside of Linux itself. Ciao, Markus ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [musl] PAC/BTI Support on aarch64 2024-02-14 4:44 ` Markus Wichmann @ 2024-02-14 13:32 ` Thorsten Glaser 2024-02-14 14:03 ` Rich Felker 0 siblings, 1 reply; 25+ messages in thread From: Thorsten Glaser @ 2024-02-14 13:32 UTC (permalink / raw) To: musl Markus Wichmann dixit: >According to Intel, the ENDBR64 instruction decodes as NOP on older >processors. That’s unfortunately only true for processors manufactured by Intel. There exist 686-class CPUs that don’t handle these and other long nops so it’s best omitted on generic, as in not -march=native, builds. bye, //mirabilos -- 15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-) ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [musl] PAC/BTI Support on aarch64 2024-02-14 13:32 ` Thorsten Glaser @ 2024-02-14 14:03 ` Rich Felker 2024-02-14 14:12 ` Thorsten Glaser 0 siblings, 1 reply; 25+ messages in thread From: Rich Felker @ 2024-02-14 14:03 UTC (permalink / raw) To: Thorsten Glaser; +Cc: musl On Wed, Feb 14, 2024 at 01:32:13PM +0000, Thorsten Glaser wrote: > Markus Wichmann dixit: > > >According to Intel, the ENDBR64 instruction decodes as NOP on older > >processors. > > That’s unfortunately only true for processors manufactored by Intel. > There exist 686-class CPUs that don’t handle these and other long nops > so it’s best omitted on generic, as in not -march=native, builds. Lovely. So yet another reason the Intel thing sounds unusable in practice while the ARM thing seems very reasonable to support... Since you mentioned 686-class which are 32-bit, is the same true for x86_64, or is the situation better there? Rich ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [musl] PAC/BTI Support on aarch64 2024-02-14 14:03 ` Rich Felker @ 2024-02-14 14:12 ` Thorsten Glaser 0 siblings, 0 replies; 25+ messages in thread From: Thorsten Glaser @ 2024-02-14 14:12 UTC (permalink / raw) To: musl Rich Felker dixit: >Since you mentioned 686-class which are 32-bit, is the same true for AIUI for amd64 and x32, it should be fine to use. bye, //mirabilos -- <ch> you introduced a merge commit │<mika> % g rebase -i HEAD^^ <mika> sorry, no idea and rebasing just fscked │<mika> Segmentation <ch> should have cloned into a clean repo │ fault (core dumped) <ch> if I rebase that now, it's really ugh │<mika:#grml> wuahhhhhh ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [musl] PAC/BTI Support on aarch64 2024-02-14 2:19 ` Rich Felker 2024-02-14 3:19 ` William Roberts 2024-02-14 4:44 ` Markus Wichmann @ 2024-02-15 13:29 ` Stefan O'Rear 2024-02-15 14:06 ` Rich Felker 2 siblings, 1 reply; 25+ messages in thread From: Stefan O'Rear @ 2024-02-15 13:29 UTC (permalink / raw) To: Rich Felker, musl, Markus Wichmann; +Cc: enh On Tue, Feb 13, 2024, at 9:19 PM, Rich Felker wrote: > What is the situation on x86? Does it use the same kind of per-page > enforcement mode, or is it only global, requiring disabling it if any > DSO lacks support? Is the endbr64 opcode a guaranteed-safe nop on > older ISA levels, or does it need to be conditional? The situation for hardware control flow hardening on risc-v is two in-development extensions: Zicfilp (landing pads) provides a 4-byte instruction which marks valid targets for indirect jumps and calls, written `lpad LABEL`. This is an *architectural NOP at all ISA levels*. Enforcement is process-global, not per-page. Indirect jumps can be exempted from landing pad depending on which register is used for the address; this is expected to be used if the address is obtained from read-only memory or an auipc instruction, so jump tables do not use landing pads, nor are landing pads needed after direct calls regardless of length. A function which is not a visible symbol and does not have its address taken does not need a landing pad. The ABI function return is a member of the set of indirect jumps which bypass landing pad checks, so no landing pads are needed at the return sites of ABI function calls. Zicfilp intentionally does not provide any protection against ROP, a different extension must be used to protect return addresses. Landing pads have a 20-bit label which is expected to be used for a function type signature, catching function type confusion events. The hashing scheme used to generate the label from the call signature has not yet been decided. 
The call signature must be placed in the x7/t2 register prior to an indirect jump. The immediate layout is such that indirect jump sites can use a single lui instruction with a matching 20-bit immediate. Landing pads do not check x7/t2 if reached by a direct jump, so there is no need to initialize it prior to a direct jump. A `lpad 0` matches any incoming type signature. Zicfiss (shadow stacks) provides a new shadow stack pointer register and shadow stack memory which cannot be modified using ordinary stores. Unlike GCS and SHSTK, the shadow stack is never accessed automatically, "sspush ra" and "sspopchk ra" instructions must be added to the prologue and epilogue of functions which spill their return address to the stack. These instructions are NOPs if the shadow stack is disabled at runtime, but are *not architectural NOPs* and will trap if executed on current hardware. Also unlike GCS and SHSTK, the Zicfiss `ssp` register can be read and written from user mode using dedicated instructions, so no special mechanism is used for shadow stack switching. To my knowledge, nothing analogous to PAC is under development. Both shadow stacks and landing pads are enabled by bits in the senvcfg register, and are exposed via a prctl. The shadow stack prctl is being developed as an architecture-independent API, which provides some form of automatic allocation and deallocation of shadow stacks for threads. I believe the current strategy for marking CFI support in binaries is an ELF note similar to the x86 approach, but have not checked this part in detail. -s ^ permalink raw reply [flat|nested] 25+ messages in thread
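A small sketch of the label-matching scheme described in this message. It rests on one assumption not stated in the thread: that `lpad` is encoded in the AUIPC opcode space with rd = x0, which is how the Zicfilp specification is understood to define it. The helper names are hypothetical, and the type-hash scheme itself was undecided at the time of writing.

```c
#include <assert.h>
#include <stdint.h>

/* lpad LABEL, assumed to be encoded as "auipc x0, LABEL". */
static uint32_t rv_lpad(uint32_t label)
{
	assert(label < (1u << 20));
	return (label << 12) | 0x17u;
}

/* lui t2, LABEL (t2 is x7): loads the label into bits [31:12] of x7. */
static uint32_t rv_lui_t2(uint32_t label)
{
	assert(label < (1u << 20));
	return (label << 12) | (7u << 7) | 0x37u;
}

/* Model of the check on an indirect jump: label 0 accepts any caller,
 * otherwise bits [31:12] of t2 must equal the landing pad's label. */
static int lpad_accepts(uint32_t lpad_insn, uint64_t t2)
{
	uint32_t want = lpad_insn >> 12;
	return want == 0 || ((t2 >> 12) & 0xfffffu) == want;
}
```

Because both instructions carry the label in the same 20-bit field, a call site needs only the single `lui` before the indirect jump, as the message notes.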
* Re: [musl] PAC/BTI Support on aarch64 2024-02-15 13:29 ` Stefan O'Rear @ 2024-02-15 14:06 ` Rich Felker 2024-03-02 14:33 ` Szabolcs Nagy 0 siblings, 1 reply; 25+ messages in thread From: Rich Felker @ 2024-02-15 14:06 UTC (permalink / raw) To: Stefan O'Rear; +Cc: musl, Markus Wichmann, enh On Thu, Feb 15, 2024 at 08:29:15AM -0500, Stefan O'Rear wrote: > On Tue, Feb 13, 2024, at 9:19 PM, Rich Felker wrote: > > What is the situation on x86? Does it use the same kind of per-page > > enforcement mode, or is it only global, requiring disabling it if any > > DSO lacks support? Is the endbr64 opcode a guaranteed-safe nop on > > older ISA levels, or does it need to be conditional? > > The situation for hardware control flow hardening on risc-v is two > in-development extensions: > > Zicfilp (landing pads) provides a 4-byte instruction which marks valid > targets for indirect jumps and calls, written `lpad LABEL`. This is > an *architectural NOP at all ISA levels*. Enforcement is > process-global, not per-page. > > Indirect jumps can be exempted from landing pad depending on which > register is used for the address; this is expected to be used if the > address is obtained from read-only memory or an auipc instruction, so > jump tables do not use landing pads, nor are landing pads needed after > direct calls regardless of length. A function which is not a visible > symbol and does not have its address taken does not need a landing pad. > > The ABI function return is a member of the set of indirect jumps > which bypass landing pad checks, so no landing pads are needed at the > return sites of ABI function calls. Zicfilp intentionally does not > provide any protection against ROP, a different extension must be used > to protect return addresses. This all sounds very good and reasonable to support. > Landing pads have a 20-bit label which is expected to be used for a > function type signature, catching function type confusion events. 
> The hashing scheme used to generate the label from the call signature > has not yet been decided. The call signature must be placed in the > x7/t2 register prior to an indirect jump. The immediate layout is > such that indirect jump sites can use a single lui instruction with > a matching 20-bit immediate. Landing pads do not check x7/t2 if > reached by a direct jump, so there is no need to initialize it prior > to a direct jump. A `lpad 0` matches any incoming type signature. This is very interesting. I wonder if it will break code with UB like: https://github.com/systemd/systemd/blob/d0aef638ac43ad64df920d8b3f6c2d835db7643c/src/basic/sort-util.h It's my belief that it *should* break such code, and that breaking it would be a feature. But I could see folks making the choice to hash just the "mechanical" types rather than actual types, and there may be practical reasons this is what needs to be done. Note that this also has implications for musl and whether we would ever be able to redefine some opaque types. In fact, we already have some types, like pthread_t, which are defined differently in __cplusplus mode to match a name mangling ABI; these would be badly broken. I'm not sure what the right fix for that would be. (Doing that to begin with was almost surely a big mistake.) > Zicfiss (shadow stacks) provides a new shadow stack pointer register > and shadow stack memory which cannot be modified using ordinary stores. > Unlike GCS and SHSTK, the shadow stack is never accessed automatically, > "sspush ra" and "sspopchk ra" instructions must be added to the prologue > and epilogue of functions which spill their return address to the stack. > These instructions are NOPs if the shadow stack is disabled at runtime, > but are *not architectural NOPs* and will trap if executed on current > hardware. 
> > Also unlike GCS and SHSTK, the Zicfiss `ssp` register can be read and > written from user mode using dedicated instructions, so no special > mechanism is used for shadow stack switching. > > To my knowledge, nothing analogous to PAC is under development. This is unfortunate, since PAC seems a lot less invasive and actually-doable. However, protection equivalent to PAC also seems possible in software, in an entirely arch-agnostic way, with overhead only slightly higher than standard SSP... so I'm not sure why we aren't just pursuing getting compilers to do that rather than chasing arch-specific anti-ROP hacks vendors are trying to use to differentiate themselves and remain relevant in the age of open ISAs... > Both shadow stacks and landing pads are enabled by bits in the senvcfg > register, and are exposed via a prctl. The shadow stack prctl is being > developed as an architecture-independent API, which provides some form > of automatic allocation and deallocation of shadow stacks for threads. > I believe the current strategy for marking CFI support in binaries is > an ELF note similar to the x86 approach, but have not checked this part > in detail. I know this should be written up in more detail, but based on request on IRC, I think it would be good to go ahead and mention "in public" on the list: *** Any API for shadow stacks that involved automatic allocation and deallocation which can fail "behind the application's back" at runtime is a very poor candidate for support by musl. *** To be supported, shadow stacks would probably need to use contiguous memory (with special protections applied to it for the duration of its usage as call stack, with automatic end to that status if it's subsequently accessed with normal loads/stores) with the normal application-provided stack, so as not to break sigaltstack, pthread_setstack, makecontext, etc. 
and not to introduce memory leaks or conditions under which a behind-the-scenes allocation failure makes hard program termination the only possible result. AFAICT the current shadow stack stuff in the kernel (and maybe the underlying hardware mechanisms) is not usable. Rich ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [musl] PAC/BTI Support on aarch64 2024-02-15 14:06 ` Rich Felker @ 2024-03-02 14:33 ` Szabolcs Nagy 2024-03-02 14:45 ` Rich Felker 0 siblings, 1 reply; 25+ messages in thread From: Szabolcs Nagy @ 2024-03-02 14:33 UTC (permalink / raw) To: Rich Felker; +Cc: Stefan O'Rear, musl, Markus Wichmann, enh * Rich Felker <dalias@libc.org> [2024-02-15 09:06:40 -0500]: > On Thu, Feb 15, 2024 at 08:29:15AM -0500, Stefan O'Rear wrote: > > On Tue, Feb 13, 2024, at 9:19 PM, Rich Felker wrote: > > > What is the situation on x86? Does it use the same kind of per-page > > > enforcement mode, or is it only global, requiring disabling it if any > > > DSO lacks support? Is the endbr64 opcode a guaranteed-safe nop on > > > older ISA levels, or does it need to be conditional? > > > > The situation for hardware control flow hardening on risc-v is two > > in-development extensions: > > > > Zicfilp (landing pads) provides a 4-byte instruction which marks valid > > targets for indirect jumps and calls, written `lpad LABEL`. This is > > an *architectural NOP at all ISA levels*. Enforcement is > > process-global, not per-page. > > > > Indirect jumps can be exempted from landing pad depending on which > > register is used for the address; this is expected to be used if the > > address is obtained from read-only memory or an auipc instruction, so > > jump tables do not use landing pads, nor are landing pads needed after > > direct calls regardless of length. A function which is not a visible > > symbol and does not have its address taken does not need a landing pad. > > > > The ABI function return is a member of the set of indirect jumps > > which bypass landing pad checks, so no landing pads are needed at the > > return sites of ABI function calls. Zicfilp intentionally does not > > provide any protection against ROP, a different extension must be used > > to protect return addresses. > > This all sounds very good and reasonable to support. 
process global setting is not practical because legacy code may be dlopened so libc cannot decide when to enable the feature. linux in general only provides per thread disable for such features which does not help with dlopen. > > Both shadow stacks and landing pads are enabled by bits in the senvcfg > > register, and are exposed via a prctl. The shadow stack prctl is being ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [musl] PAC/BTI Support on aarch64 2024-03-02 14:33 ` Szabolcs Nagy @ 2024-03-02 14:45 ` Rich Felker 0 siblings, 0 replies; 25+ messages in thread From: Rich Felker @ 2024-03-02 14:45 UTC (permalink / raw) To: Stefan O'Rear, musl, Markus Wichmann, enh On Sat, Mar 02, 2024 at 03:33:45PM +0100, Szabolcs Nagy wrote: > * Rich Felker <dalias@libc.org> [2024-02-15 09:06:40 -0500]: > > > On Thu, Feb 15, 2024 at 08:29:15AM -0500, Stefan O'Rear wrote: > > > On Tue, Feb 13, 2024, at 9:19 PM, Rich Felker wrote: > > > > What is the situation on x86? Does it use the same kind of per-page > > > > enforcement mode, or is it only global, requiring disabling it if any > > > > DSO lacks support? Is the endbr64 opcode a guaranteed-safe nop on > > > > older ISA levels, or does it need to be conditional? > > > > > > The situation for hardware control flow hardening on risc-v is two > > > in-development extensions: > > > > > > Zicfilp (landing pads) provides a 4-byte instruction which marks valid > > > targets for indirect jumps and calls, written `lpad LABEL`. This is > > > an *architectural NOP at all ISA levels*. Enforcement is > > > process-global, not per-page. > > > > > > Indirect jumps can be exempted from landing pad depending on which > > > register is used for the address; this is expected to be used if the > > > address is obtained from read-only memory or an auipc instruction, so > > > jump tables do not use landing pads, nor are landing pads needed after > > > direct calls regardless of length. A function which is not a visible > > > symbol and does not have its address taken does not need a landing pad. > > > > > > The ABI function return is a member of the set of indirect jumps > > > which bypass landing pad checks, so no landing pads are needed at the > > > return sites of ABI function calls. Zicfilp intentionally does not > > > provide any protection against ROP, a different extension must be used > > > to protect return addresses. 
> > > > This all sounds very good and reasonable to support. > > process global setting is not practical > because legacy code may be dlopened so libc > cannot decide when to enable the feature. That's exactly what you need process-global for: so that as soon as you dlopen an incompatible library, all enforcement gets turned off and everything turns into nops. > linux in general only provides per thread disable > for such features which does not help with dlopen. Indeed this is a problem. The kernel needs to provide a way to make sure none of the special instructions, which may still be pending (and blocked by arbitrarily many interrupting stack frames), fault if executed after disabling. In theory there are horrible ways userspace could do this if we wrapped signal handlers and patched things up at every signal return (to restart any interrupted critical section), but that kind of invasiveness is not worth it to support shadow stacks. Rich ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [musl] PAC/BTI Support on aarch64 2024-02-13 2:08 ` Rich Felker 2024-02-13 14:47 ` William Roberts @ 2024-02-15 0:03 ` Szabolcs Nagy 2024-02-15 0:22 ` enh 1 sibling, 1 reply; 25+ messages in thread From: Szabolcs Nagy @ 2024-02-15 0:03 UTC (permalink / raw) To: Rich Felker; +Cc: William Roberts, enh, musl * Rich Felker <dalias@libc.org> [2024-02-12 21:08:34 -0500]: > On Mon, Feb 12, 2024 at 05:18:22PM -0600, William Roberts wrote: > > On Mon, Feb 12, 2024 at 5:05 PM enh <enh@google.com> wrote: > > > On Mon, Feb 12, 2024 at 2:46 PM Rich Felker <dalias@libc.org> wrote: > > > > On Mon, Feb 12, 2024 at 03:25:48PM -0600, William Roberts wrote: > > > > > It's a matter of building with -mbranch-protection=standard > > > > > > > > > > Just the ASM labels need the first instruction to be a BTI. They're in > > > > > the NOP space > > > > > so they are backwards compatible, older hardware will just NOP it. not quite that simple. sorry, long brain dump follows: tl;dr: i think the main issues are asm handling (property notes and cfi for pac-ret), property note handling in ld.so, perf overhead and possible compat issues of pac-ret (not possible to disable per process) and testing (ensuring the code works when pac/bti is not nop). - asm code needs manual marking. GNU_PROPERTY note in asm is ugly and error prone, see https://github.com/ARM-software/optimized-routines/blob/master/string/aarch64/asmdefs.h#L23 i.e. no equivalent to -Wa,--noexecstack we use for GNU_STACK (this was an oversight, llvm added an option, binutils gas has none). - if asm code does indirect tailcall it should use x16 or x17. bti c is compatible with the indirect branch that way. - only functions that *may be called indirectly* need bti c.
this turned out to be trickier than expected: lld/llvm and bfd.ld/gcc currently disagree about the definition in case of linker inserted veneers: gcc assumes ld handles the case if an inserted veneer indirectly branches to a non-bti location, llvm emits bti in all functions (including local ones without their address taken), just in case ld inserts an indirect veneer. https://github.com/ARM-software/abi-aa/issues/196 there is some difference in how PLTs are emitted between the linkers, but i don't think that causes compat issues (might cause trouble for tools that try to interpret the PLT). - dynamic linker has to figure out when to enable it. systemd MDWE (memory deny write exec) feature used to seccomp filter mprotect(PROT_EXEC) so even if the underlying mapping was already PROT_EXEC and it just added a PROT_BTI on top, mprotect would fail. this was fixed by adding an MDWE prctl to linux for systemd to make it stop using that filter, but there may be other similar seccomp filters and old kernels without MDWE so glibc re-mmaps the exec segment (which systemd happily accepts). note: the bit that tells if a load segment needs to be mapped as PROT_BTI is in the load segment (usually program headers are in the executable segment) so first mmap cannot get it right, unless a quick read of the prog headers is done before mmap but that has a lot of failure modes (the size can be unbounded) and does not gain much compared to just mmap twice. except when the exe is mapped by the kernel, then we cannot mmap since there is no fd and mprotect may fail. MDWE prctl: https://lwn.net/Articles/937315/ another detail is that static-exe / ld.so / vdso marking is handled by the kernel, while other dsos are PROT_BTI marked by ld.so. an interesting case is dynamic linked exe which was originally handled by ld.so, but after the MDWE fiasco the kernel started loading it with PROT_BTI (i.e. now all binaries mapped by the kernel are BTI protected by the kernel).
ld.so should also take care to gracefully handle BTI protection failure (or invalid notes) in dlopen. (although invalid note is more of an x86 thing.) - special functions may need to return indirectly instead of ret the only really nasty one is swapcontext which is not supported by musl so that's good (and it is only a problem if bti is used with a shadow-stack-like feature). for returns_twice functions (like setjmp) the compiler emits bti j at the call site so the second return can use an indirect branch. return from unwinder to an exception handling landing pad via indirect branch is supported too. otherwise return via indirect branch is not supported (no bti j at call sites). - a64fx (hpc core) implemented hint nops in a slow way so glibc only adds bti if glibc is configured for bti. https://sourceware.org/pipermail/libc-alpha/2021-May/125784.html > > > > I think it's a little more elaborate than that. Those asm instructions > > > > need to be added (probably as .instr or .word or something, unless > > > > there's a way to spell this particular nop that existing tooling will > > > > understand). > > > > You just use the hint <immediate> instructions, they are understood by old > > toolchains. But you can only support a subset of the BTI/PAC instructions yes 'hint <imm>' is armv8.0-a, part of the base isa. i think the non-hint pac instructions are not relevant to musl. > > but it's been enough for most projects that follow the normal ABI conventions > > like OpenSSL/BoringSSL,etc, but not enough for libffi for example. as far as i know openssl is the reason android does not enable pac: they added the hint instructions incorrectly at first so there are binaries that fail if the hw enables pac. this reveals another issue with pac/bti: since they are nops on most existing hw, they are not properly tested (e.g. by distro QA) so a binary can look ok until you move it to a newer machine. (but recent amazon graviton 4 has pac/bti so we may get more coverage soon.) 
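For reference, the 'hint <imm>' spelling discussed above can be generated mechanically. The following is a minimal sketch, not musl code: the helper name a64_hint is made up for illustration, and the hint numbers in the comments (25 for paciasp, 29 for autiasp, 34 for bti c, 36 for bti j) are stated from the Arm ISA as I understand it rather than taken from this thread. It computes the 32-bit word that an old assembler could emit with .inst or .word when it does not know the mnemonic.

```c
#include <stdint.h>

/*
 * Encode the A64 "HINT #imm7" instruction (hypothetical helper).
 * HINT is the NOP encoding 0xd503201f with the 7-bit immediate
 * (CRm:op2) placed in bits [11:5], which is why cores that do not
 * implement PAC/BTI execute these words as plain NOPs.
 *
 *   hint #0  = nop        hint #34 = bti c
 *   hint #25 = paciasp    hint #36 = bti j
 *   hint #29 = autiasp    hint #38 = bti jc
 */
static uint32_t a64_hint(unsigned imm7)
{
    return UINT32_C(0xd503201f) | ((uint32_t)(imm7 & 0x7fu) << 5);
}
```

With this, a64_hint(34) is the word for bti c, so an asm entry point could start with ".inst 0xd503245f" or simply "hint #34" on toolchains that predate the bti mnemonic.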
> > > > I also wondered if [sig]setjmp/longjmp would be affected, but probably > > > > not. > > > > > > bionic does use PAC, but i think glibc has its own "pointer mangling" thing? > > > > You need it, as the first instruction from a branch (where longjmp returns to) > > needs to be a BTI instruction. > > Is that different from a normal function return? > > Note that in the case of sigsetjmp, (sig)longjmp returns to a point > inside the sigsetjmp asm, so that point needs the annotation I think. setjmp/sigsetjmp has to decide how to protect the longjmp return. with a shadow-stack-like bw-edge-cfi, longjmp cannot return with ret (first ret from setjmp consumes the return address from the shadow stack), it must use indirect branch. (there is now gcs which is aarch64 shadow stack, linux support is in progress). since jmpbuf does not expose the return address representation a libc specific protection can be applied (mangling or pac) and then longjmp can use ret and remain protected. (but e.g. setcontext exposes the pc/lr so those cannot be mangled in memory). compilers emit bti j at the call site of returns_twice functions to allow both ret and indirect branch. musl sigsetjmp can decide what it does. > > > > > It's been done for many projects, glibc and bionic have it. The > > > > > problem with BTI is that when one item in the link > > > > > list doesn't support BTI the loader/linker turns it off. So when it's > > > > > something like a libc that is fundamental in the link chain, > > > > > it turns it off for everything. > > > > > > > > This presumably requires some kind of machinery for how dynamic > > > > linking will work, and possibly turning it off if a library without it > > > > is dlopened? > > > > > > > > My understanding doing some brief searches though was that you can > > > > individually mprotect it off in certain regions. So maybe it's > > > > possible to just enable only for DSOs that support it? > > > > > > correct. > > OK, that's good to know. 
So which direction is it? Do DSOs that > support BTI need it explicitly turned on via mprotect/mmap flags? Or > is there some process-global flag to turn it on, and then ones that > don't support it need it turned off? per dso marking and explicit PROT_BTI setting. > I suspect it's possible to first enable BTI for third-party libraries > as a feature of the dynamic linker, and add BTI support for libc > itself as a separate thing. That might be a nice factoring to make > changes minimal and easy for ppl to read. for third-party, it is enough to fix crt*.o and handle markings in ld.so. (but of course for bti to be effective the libc must have it too, otherwise there are likely enough gadgets to do whatever) > The changes in dynlink.c should be as arch-agnostic as possible. If > there's a corresponding feature on other archs, it should use the same > basic code, with arch-specific headers (arch/$ARCH/reloc.h) defining > the mechanisms for evaluating if an ELF file is compatible, how to do > the mprotect, etc. generic code should work. but e.g. see above about who handles the marking (kernel vs ld.so), turns out x86 (and likely aarch64, riscv) shadow stack uses different rules: always libc handles the marking. so there are caveats. > > > > > The initial scope of code changes would be what's reported when > > > > > LDFLAGS=-Wl,-zforce-bti,--fatal-warnings > > > > > > > > Is there a way to disable these warnings so that every asm file does > > > > not need to be cluttered with annotations? > > > > > > well, that's the ELF note stuff i was talking about, and if you don't > > > have it you'll fall foul of the static linker saying "not all this > > > code is BTI-enabled, therefore this .so isn't", and the dynamic linker > > > doing nothing because the static linker effectively tells it not to. > > > > Yep, well said ENH. It's been since Android since we crossed paths :-). 
> > > > It's not that hard to annotate an asm file :-p I forget what project > > (I think it was gnutls, but they just use openssl's code for the asm) > > but I just put it in a header file and by virtue of #include'ing it you get the > > notes added. > > Yes, we generally don't do that. There are no "asm headers" in musl; > all asm files are self-contained and readable standalone. So if > there's no way to tell the assembler/linker from the command line that > files are BTI-compatible without generating a huge load of warning > spam, I guess it's a mess of copy-and-paste... currently there is only the ugly asm directives, see above. final notes: bti is fairly deployable (iirc x86 ibt failed because it is not per dso, so dlopen does not really work, but aarch64 does not have that problem), not strong security (the final binary is littered with bti j/c so plenty opportunity to misdirect an indirect jump/call), but at least it has minimal impact (minimal compatibility issues and on a modern core even when bti is enabled it should not be slower). for pac every function can independently decide if it uses pac-ret (aka return address signing), no need for per dso marking. however it has bigger compat as well as performance impact: there are custom unwinders, pac-ret uses a new dwarf cfi (to mark the code regions where the return address is signed), custom unwinders may not understand this and that's a runtime crash. some code looks at or modifies the return address (various hacks), such code needs to be updated or not use pac-ret in relevant functions, but such issues are hard to discover without hw. pac is a per-boot system-wide setting, not per-process, so if there is any issue or bug there is no way to disable it for one broken binary (nowadays there is a disable prctl, but it is documented to slow the system down, so not suitable for working around perf issues). 
on simple cores pac can be slow (can add latency to non-leaf functions). with 48bit va space, there is only 7bit pac, i.e. limited protection. ^ permalink raw reply [flat|nested] 25+ messages in thread
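The "7bit pac" figure above follows from simple arithmetic on the pointer layout. A minimal sketch, assuming the usual AArch64 layout as I understand it (not stated in the thread): bit 55 selects the address-space half and is never part of the PAC, and with top-byte-ignore enabled bits 63:56 are a tag, so the PAC occupies bits 54 down to the VA size. The function name pac_bits is hypothetical.

```c
/*
 * Number of pointer bits available for the PAC (illustrative only).
 *   va_bits: configured virtual address size, e.g. 39, 48 or 52
 *   tbi:     nonzero if top-byte-ignore is enabled for the pointer
 *
 * With TBI the field is bits [54:va_bits], i.e. 55 - va_bits bits.
 * Without TBI bits [63:56] are usable as well, giving 63 - va_bits.
 */
static int pac_bits(int va_bits, int tbi)
{
    return (tbi ? 55 : 63) - va_bits;
}
```

So pac_bits(48, 1) gives the 7 bits quoted above, while a 39-bit VA configuration with TBI would leave 16 bits.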
* Re: [musl] PAC/BTI Support on aarch64 2024-02-15 0:03 ` Szabolcs Nagy @ 2024-02-15 0:22 ` enh 2024-02-15 9:18 ` Szabolcs Nagy 0 siblings, 1 reply; 25+ messages in thread From: enh @ 2024-02-15 0:22 UTC (permalink / raw) To: Rich Felker, William Roberts, enh, musl On Wed, Feb 14, 2024 at 4:03 PM Szabolcs Nagy <nsz@port70.net> wrote: > > * Rich Felker <dalias@libc.org> [2024-02-12 21:08:34 -0500]: > > On Mon, Feb 12, 2024 at 05:18:22PM -0600, William Roberts wrote: > > > On Mon, Feb 12, 2024 at 5:05 PM enh <enh@google.com> wrote: > > > > On Mon, Feb 12, 2024 at 2:46 PM Rich Felker <dalias@libc.org> wrote: > > > > > On Mon, Feb 12, 2024 at 03:25:48PM -0600, William Roberts wrote: > > > > > > It's a matter of building with -mbranch-protection=standard > > > > > > > > > > > > Just the ASM labels need the first instruction to be a BTI. They're in > > > > > > the NOP space > > > > > > so they are backwards compatible, older hardware will just NOP it. > > not quite that simple. sorry long brain dump follows: > > tl;dr: i think the main issues are asm handling (property notes > and cfi for pac-ret), property note handling in ld.so, perf > overhead and possible compat issues of pac-ret (not possible > to disable per process) and testing (ensuring the code works > when pac/bti is not nop). > > - asm code needs manual marking. > > GNU_PROPERTY note in asm is ugly and error prone, see > https://github.com/ARM-software/optimized-routines/blob/master/string/aarch64/asmdefs.h#L23 > > i.e. no equivalent to -Wa,--noexecstack we use for GNU_STACK > (this was an oversight, llvm added an option, binutils gas has none). what's the option? (since Android only supports llvm, that might be worth considering as a slight cleanup for us...) > - if asm code does indirect tailcall it should use x16 or x17. > > bti c is compatible with the indirect branch that way. > > - only functions that *maybe called indirectly* need bti c. 
> > this turned out to be trickier than expected: lld/llvm and bfd.ld/gcc > currently disagree about the definition in case of linker inserted > veneers: gcc assumes ld handles the case if an inserted veneer > indirectly branches to a non-bti location, llvm emits bti in all > functions (including local ones without their address taken), just > in case ld inserts an indirect veneer. > https://github.com/ARM-software/abi-aa/issues/196 > > there is some difference to how PLTs are emitted between the > linkers, but i don't think that causes compat issues (might cause > trouble for tools that try to interpret the PLT). > > - dynamic linker has to figure out when to enable it. > > systemd MDWE (memory deny write exec) feature used to seccomp > filter mprotect(PROT_EXEC) so even if the underlying mapping was > already PROT_EXEC and it just added a PROT_BTI on top, mprotect > would fail. this was fixed by adding an MDWE prctl to linux for > systemd to make it stop using that filter, but there may be other > similar seccomp filters and old kernels without MDWE so glibc > re-mmaps the exec segment (which systemd happily accepts). > > note: the bit that tells if a load segment needs to be mapped as > PROT_BTI is in the load segment (usually program headers are in > the executable segment) so first mmap cannot get it right, unless > a quick read of the prog headers are done before mmap but that has > a lot of failure modes (the size can be unbounded) and does not > gain much compared to just mmap twice. except when the exe is mapped > by the kernel, then we cannot mmap since there is no fd and mprotect > may fail). > MDWE prctl: https://lwn.net/Articles/937315/ > > another detail is that static-exe / ld.so / vdso marking is handled > by the kernel, while other dsos are PROT_BTI marked by ld.so. an > interesting case is dynamic linked exe which was originally handled > by ld.so, but after the MDWE fiasco the kernel started loading it > with PROT_BTI (i.e. 
now all binaries mapped by the kernel are BTI > protected by the kernel). > > ld.so should also take care to gracefully handle BTI protection > failure (or invalid notes) in dlopen. (although invalid note is > more of an x86 thing.) > > - special functions may need to return indirectly instead of ret > > the only really nasty one is swapcontext which is not supported by > musl so that's good (and it is only a problem if bti is used with > a shadow-stack-like feature). > > for returns_twice functions (like setjmp) the compiler emits bti j > at the call site so the second return can use an indirect branch. > return from unwinder to an exception handling landing pad via > indirect branch is supported too. otherwise return via indirect > branch is not supported (no bti j at call sites). > > - a64fx (hpc core) implemented hint nops in a slow way > > so glibc only adds bti if glibc is configured for bti. > https://sourceware.org/pipermail/libc-alpha/2021-May/125784.html > > > > > > I think it's a little more elaborate than that. Those asm instructions > > > > > need to be added (probably as .instr or .word or something, unless > > > > > there's a way to spell this particular nop that existing tooling will > > > > > understand). > > > > > > You just use the hint <immediate> instructions, they are understood by old > > > toolchains. But you can only support a subset of the BTI/PAC instructions > > yes 'hint <imm>' is armv8.0-a, part of the base isa. > > i think the non-hint pac instructions are not relevant to musl. > > > > but it's been enough for most projects that follow the normal ABI conventions > > > like OpenSSL/BoringSSL,etc, but not enough for libffi for example. > > as far as i know openssl is the reason android does not enable pac: > they added the hint instructions incorrectly at first so there are > binaries that fail if the hw enables pac. 
that was one reason why "android does not enable pac" _by default in the android target triples for app developers_, yes --- though i think we're at the point where we think we should flip that default (not least because the number of users whose devices would actually _benefit_ from the extra instructions is a lot larger now!): https://github.com/android/ndk/issues/1914 > this reveals another issue with pac/bti: since they are nops on most > existing hw, they are not properly tested (e.g. by distro QA) so a > binary can look ok until you move it to a newer machine. (but recent > amazon graviton 4 has pac/bti so we may get more coverage soon.) exactly --- that was one of our key concerns, that app developers would _think_ they've tested with pac/bti but unknowingly used a device without. (or even an x86-64 emulator!) > > > > > I also wondered if [sig]setjmp/longjmp would be affected, but probably > > > > > not. > > > > > > > > bionic does use PAC, but i think glibc has its own "pointer mangling" thing? > > > > > > You need it, as the first instruction from a branch (where longjmp returns to) > > > needs to be a BTI instruction. > > > > Is that different from a normal function return? > > > > Note that in the case of sigsetjmp, (sig)longjmp returns to a point > > inside the sigsetjmp asm, so that point needs the annotation I think. > > setjmp/sigsetjmp has to decide how to protect the longjmp return. > > with a shadow-stack-like bw-edge-cfi, longjmp cannot return with > ret (first ret from setjmp consumes the return address from the > shadow stack), it must use indirect branch. (there is now gcs > which is aarch64 shadow stack, linux support is in progress). > > since jmpbuf does not expose the return address representation a > libc specific protection can be applied (mangling or pac) and then > longjmp can use ret and remain protected. (but e.g. setcontext > exposes the pc/lr so those cannot be mangled in memory). 
> > compilers emit bti j at the call site of returns_twice functions > to allow both ret and indirect branch. musl sigsetjmp can decide > what it does. > > > > > > > It's been done for many projects, glibc and bionic have it. The > > > > > > problem with BTI is that when one item in the link > > > > > > list doesn't support BTI the loader/linker turns it off. So when it's > > > > > > something like a libc that is fundamental in the link chain, > > > > > > it turns it off for everything. > > > > > > > > > > This presumably requires some kind of machinery for how dynamic > > > > > linking will work, and possibly turning it off if a library without it > > > > > is dlopened? > > > > > > > > > > My understanding doing some brief searches though was that you can > > > > > individually mprotect it off in certain regions. So maybe it's > > > > > possible to just enable only for DSOs that support it? > > > > > > > > correct. > > > > OK, that's good to know. So which direction is it? Do DSOs that > > support BTI need it explicitly turned on via mprotect/mmap flags? Or > > is there some process-global flag to turn it on, and then ones that > > don't support it need it turned off? > > per dso marking and explicit PROT_BTI setting. > > > I suspect it's possible to first enable BTI for third-party libraries > > as a feature of the dynamic linker, and add BTI support for libc > > itself as a separate thing. That might be a nice factoring to make > > changes minimal and easy for ppl to read. > > for third-party, it is enough to fix crt*.o and handle markings > in ld.so. > > (but of course for bti to be effective the libc must have it too, > otherwise there are likely enough gadgets to do whatever) > > > The changes in dynlink.c should be as arch-agnostic as possible. 
If > > there's a corresponding feature on other archs, it should use the same > > basic code, with arch-specific headers (arch/$ARCH/reloc.h) defining > > the mechanisms for evaluating if an ELF file is compatible, how to do > > the mprotect, etc. > > generic code should work. > > but e.g. see above about who handles the marking (kernel vs ld.so), > turns out x86 (and likely aarch64, riscv) shadow stack uses different > rules: always libc handles the marking. so there are caveats. > > > > > > > The initial scope of code changes would be what's reported when > > > > > > LDFLAGS=-Wl,-zforce-bti,--fatal-warnings > > > > > > > > > > Is there a way to disable these warnings so that every asm file does > > > > > not need to be cluttered with annotations? > > > > > > > > well, that's the ELF note stuff i was talking about, and if you don't > > > > have it you'll fall foul of the static linker saying "not all this > > > > code is BTI-enabled, therefore this .so isn't", and the dynamic linker > > > > doing nothing because the static linker effectively tells it not to. > > > > > > Yep, well said ENH. It's been since Android since we crossed paths :-). > > > > > > It's not that hard to annotate an asm file :-p I forget what project > > > (I think it was gnutls, but they just use openssl's code for the asm) > > > but I just put it in a header file and by virtue of #include'ing it you get the > > > notes added. > > > > Yes, we generally don't do that. There are no "asm headers" in musl; > > all asm files are self-contained and readable standalone. So if > > there's no way to tell the assembler/linker from the command line that > > files are BTI-compatible without generating a huge load of warning > > spam, I guess it's a mess of copy-and-paste... > > currently there is only the ugly asm directives, see above. 
> > final notes: > > bti is fairly deployable (iirc x86 ibt failed because it is not per > dso, so dlopen does not really work, but aarch64 does not have that > problem), not strong security (the final binary is littered with > bti j/c so plenty opportunity to misdirect an indirect jump/call), > but at least it has minimal impact (minimal compatibility issues and > on a modern core even when bti is enabled it should not be slower). > > for pac every function can independently decide if it uses pac-ret > (aka return address signing), no need for per dso marking. however > it has bigger compat as well as performance impact: > > there are custom unwinders, pac-ret uses a new dwarf cfi (to mark > the code regions where the return address is signed), custom > unwinders may not understand this and that's a runtime crash. > > some code looks at or modifies the return address (various hacks), such > code needs to be updated or not use pac-ret in relevant functions, but > such issues are hard to discover without hw. > > pac is a per-boot system-wide setting, not per-process, so if there is > any issue or bug there is no way to disable it for one broken binary > (nowadays there is a disable prctl, but it is documented to slow the > system down, so not suitable for working around perf issues). > > on simple cores pac can be slow (can add latency to non-leaf functions) > > with 48bit va space, there is only 7bit pac, i.e. limited protection. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [musl] PAC/BTI Support on aarch64 2024-02-15 0:22 ` enh @ 2024-02-15 9:18 ` Szabolcs Nagy 0 siblings, 0 replies; 25+ messages in thread From: Szabolcs Nagy @ 2024-02-15 9:18 UTC (permalink / raw) To: enh; +Cc: Rich Felker, William Roberts, musl * enh <enh@google.com> [2024-02-14 16:22:05 -0800]: > On Wed, Feb 14, 2024 at 4:03 PM Szabolcs Nagy <nsz@port70.net> wrote: > > i.e. no equivalent to -Wa,--noexecstack we use for GNU_STACK > > (this was an oversight, llvm added an option, binutils gas has none). > > what's the option? (since Android only supports llvm, that might be > worth considering as a slight cleanup for us...) -mmark-bti-property https://releases.llvm.org/16.0.0/tools/clang/docs/ClangCommandLineReference.html#cmdoption-clang-mmark-bti-property https://reviews.llvm.org/D81930 > > as far as i know openssl is the reason android does not enable pac: > > they added the hint instructions incorrectly at first so there are > > binaries that fail if the hw enables pac. > > that was one reason why "android does not enable pac" _by default in > the android target triples for app developers_, yes --- though i think > we're at the point where we think we should flip that default (not > least because the number of users whose devices would actually > _benefit_ from the extra instructions is a lot larger now!): > https://github.com/android/ndk/issues/1914 i see ^ permalink raw reply [flat|nested] 25+ messages in thread
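Whichever way the marking gets into the object (-mmark-bti-property or hand-written asm directives), the loader side has to find the BTI bit in the GNU property note before it can decide on PROT_BTI. The following is a rough sketch of that lookup, not musl or glibc code: the function name aarch64_feature_1 is made up, the constants are the ones I understand the AArch64 ELF ABI to define, and it assumes an ELF64 note whose byte order matches the host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NT_GNU_PROPERTY_TYPE_0              5
#define GNU_PROPERTY_AARCH64_FEATURE_1_AND  0xc0000000u
#define GNU_PROPERTY_AARCH64_FEATURE_1_BTI  (1u << 0)
#define GNU_PROPERTY_AARCH64_FEATURE_1_PAC  (1u << 1)

/* Read a host-endian 32-bit word without alignment assumptions. */
static uint32_t rd32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/*
 * Return the AArch64 FEATURE_1_AND bits from a single GNU property
 * note (the bytes covered by PT_GNU_PROPERTY), or 0 if the note is
 * absent or malformed. A loader would treat 0 as "no BTI".
 */
static uint32_t aarch64_feature_1(const unsigned char *note, size_t len)
{
    if (len < 16)
        return 0;
    uint32_t namesz = rd32(note);
    uint32_t descsz = rd32(note + 4);
    uint32_t type   = rd32(note + 8);
    if (type != NT_GNU_PROPERTY_TYPE_0 || namesz != 4 ||
        memcmp(note + 12, "GNU", 4) != 0 || descsz > len - 16)
        return 0;

    const unsigned char *p = note + 16, *end = p + descsz;
    while ((size_t)(end - p) >= 8) {
        uint32_t pr_type   = rd32(p);
        uint32_t pr_datasz = rd32(p + 4);
        p += 8;
        if (pr_datasz > (size_t)(end - p))
            return 0;                       /* truncated property */
        if (pr_type == GNU_PROPERTY_AARCH64_FEATURE_1_AND && pr_datasz == 4)
            return rd32(p);
        /* each property is padded to 8 bytes on ELF64 */
        size_t step = ((size_t)pr_datasz + 7) & ~(size_t)7;
        if (step > (size_t)(end - p))
            break;
        p += step;
    }
    return 0;
}
```

A caller would test the result against GNU_PROPERTY_AARCH64_FEATURE_1_BTI and only then try to add PROT_BTI to the executable mapping, ignoring failure as described earlier in the thread.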
* Re: [musl] PAC/BTI Support on aarch64 2024-02-12 18:42 ` Rich Felker 2024-02-12 21:25 ` William Roberts @ 2024-02-19 23:54 ` Fangrui Song [not found] ` <DS7PR12MB57659BC5D5536574D1B91D26CB502@DS7PR12MB5765.namprd12.prod.outlook.com> 2 siblings, 0 replies; 25+ messages in thread From: Fangrui Song @ 2024-02-19 23:54 UTC (permalink / raw) To: musl; +Cc: William Roberts, Anton Korobeynikov On Mon, Feb 12, 2024 at 10:42 AM Rich Felker <dalias@libc.org> wrote: > > On Mon, Feb 12, 2024 at 10:38:50AM -0600, William Roberts wrote: > > Hello, > > > > I was just wondering if there was any work being done to support PAC > > and BTI in aarch64? I could add support but didn't want to duplicate > > the work. > > I'm not aware of any active work on this, but before writing a full > implementation, it would be really helpful to start with a basic > proposal for the scope of changes needed to make it work to assess > whether these are manageable and acceptable cost. > > Rich Cc +Anton (other messages of this thread can be found at https://www.openwall.com/lists/musl/2024/02/12/ ). Per https://discourse.llvm.org/t/llvm-pointer-authentication-sync-ups/62661/23 and an lld/ELF patch * https://github.com/access-softek/llvm-project/commits/elf-pauth * https://github.com/access-softek/musl/tree/dkovalev/pauth-code-drop contains a prototype. > We verified that LLVM testsuite compiled with pauth successfully passes on pauth-enabled AArch64 board. https://www.openwall.com/lists/musl/2024/02/12/ It looks like there will be an LLVM Pointer Authentication discussion in a few hours: https://calendar.google.com/calendar/u/0/embed?src=calendar@llvm.org ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [musl] PAC/BTI Support on aarch64 [not found] ` <DS7PR12MB57659BC5D5536574D1B91D26CB502@DS7PR12MB5765.namprd12.prod.outlook.com> @ 2024-02-20 6:21 ` Anton Korobeynikov 0 siblings, 0 replies; 25+ messages in thread From: Anton Korobeynikov @ 2024-02-20 6:21 UTC (permalink / raw) To: Fangrui Song; +Cc: musl, William Roberts Thanks Fangrui! For PAC / BTI no support from the C standard library is required. All changes are ordinary source code changes and only assembler sources should contain proper annotations / notes / BTI checks. The links above are about the pointer authentication ABI (aka "arm64e"). PAC / BTI could be considered a part of it, but only a small one. Over the last few months we have been working on bringing pauth to ELF-based platforms. Our aim is to have pauth ABI support released as part of LLVM 19. That github Access Softek repo is a downstream fork that contains rebased Apple changes to frontend, intrinsics, etc. and ELF codegen bits. We are working on upstreaming code from it to LLVM mainline. For pauth, deeper interaction with the standard library is required, as the dynamic loader should process pauth relocations and sign pointers as needed. Plus, some additional handling of the gnu.note segment would be necessary as one would need to e.g. prohibit loading of DSOs with an incompatible ABI. We have a proof-of-concept patch for musl to process pauth relocations (https://github.com/access-softek/musl/pull/1). We have not submitted it to musl upstream as there are lots of moving pieces and we do not want to submit something that could be changed (e.g. reloc numbers already changed once). Certainly, for pauth support additional code changes to assembler sources would be required, as well as ABI marking. PS: Please CC me on responses as I am not subscribed. 
On Mon, Feb 19, 2024 at 4:01 PM Fangrui Song <i@maskray.me> wrote: > > On Mon, Feb 12, 2024 at 10:42 AM Rich Felker <dalias@libc.org> wrote: > > > > On Mon, Feb 12, 2024 at 10:38:50AM -0600, William Roberts wrote: > > > Hello, > > > > > > I was just wondering if there was any work being done to support PAC > > > and BTI in aarch64? I could add support but didn't want to duplicate > > > the work. > > > > I'm not aware of any active work on this, but before writing a full > > implementation, it would be really helpful to start with a basic > > proposal for the scope of changes needed to make it work to assess > > whether these are manageable and acceptable cost. > > > > Rich > > Cc +Anton (other messages of this thread can be found at > https://www.openwall.com/lists/musl/2024/02/12/ ). > > Per https://discourse.llvm.org/t/llvm-pointer-authentication-sync-ups/62661/23 > and an lld/ELF patch > > * https://github.com/access-softek/llvm-project/commits/elf-pauth > * https://github.com/access-softek/musl/tree/dkovalev/pauth-code-drop > > contains a prototype. > > > We verified that LLVM testsuite compiled with pauth successfully passes on pauth-enabled AArch64 board. > > https://www.openwall.com/lists/musl/2024/02/12/ > > It looks like there will be an LLVM Pointer Authentication discussion > in a few hours: > https://calendar.google.com/calendar/u/0/embed?src=calendar@llvm.org -- With best regards, Anton Korobeynikov ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2024-03-02 14:45 UTC | newest] Thread overview: 25+ messages 2024-02-12 16:38 [musl] PAC/BTI Support on aarch64 William Roberts 2024-02-12 18:42 ` Rich Felker 2024-02-12 21:25 ` William Roberts 2024-02-12 21:34 ` enh 2024-02-12 22:46 ` Rich Felker 2024-02-12 23:05 ` enh 2024-02-12 23:18 ` William Roberts 2024-02-13 2:08 ` Rich Felker 2024-02-13 14:47 ` William Roberts 2024-02-13 17:51 ` Markus Wichmann 2024-02-14 2:19 ` Rich Felker 2024-02-14 3:19 ` William Roberts 2024-02-14 4:44 ` Markus Wichmann 2024-02-14 13:32 ` Thorsten Glaser 2024-02-14 14:03 ` Rich Felker 2024-02-14 14:12 ` Thorsten Glaser 2024-02-15 13:29 ` Stefan O'Rear 2024-02-15 14:06 ` Rich Felker 2024-03-02 14:33 ` Szabolcs Nagy 2024-03-02 14:45 ` Rich Felker 2024-02-15 0:03 ` Szabolcs Nagy 2024-02-15 0:22 ` enh 2024-02-15 9:18 ` Szabolcs Nagy 2024-02-19 23:54 ` Fangrui Song [not found] ` <DS7PR12MB57659BC5D5536574D1B91D26CB502@DS7PR12MB5765.namprd12.prod.outlook.com> 2024-02-20 6:21 ` Anton Korobeynikov