mailing list of musl libc
* [musl] PAC/BTI Support on aarch64
@ 2024-02-12 16:38 William Roberts
  2024-02-12 18:42 ` Rich Felker
  0 siblings, 1 reply; 25+ messages in thread
From: William Roberts @ 2024-02-12 16:38 UTC (permalink / raw)
  To: musl

Hello,

I was just wondering if there was any work being done to support PAC
and BTI in aarch64? I could add support but didn't want to duplicate
the work.

Thanks,
Bill

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [musl] PAC/BTI Support on aarch64
  2024-02-12 16:38 [musl] PAC/BTI Support on aarch64 William Roberts
@ 2024-02-12 18:42 ` Rich Felker
  2024-02-12 21:25   ` William Roberts
                     ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Rich Felker @ 2024-02-12 18:42 UTC (permalink / raw)
  To: William Roberts; +Cc: musl

On Mon, Feb 12, 2024 at 10:38:50AM -0600, William Roberts wrote:
> Hello,
> 
> I was just wondering if there was any work being done to support PAC
> and BTI in aarch64? I could add support but didn't want to duplicate
> the work.

I'm not aware of any active work on this, but before writing a full
implementation, it would be really helpful to start with a basic
proposal for the scope of changes needed to make it work to assess
whether these are manageable and acceptable cost.

Rich


* Re: [musl] PAC/BTI Support on aarch64
  2024-02-12 18:42 ` Rich Felker
@ 2024-02-12 21:25   ` William Roberts
  2024-02-12 21:34     ` enh
  2024-02-12 22:46     ` Rich Felker
  2024-02-19 23:54   ` Fangrui Song
       [not found]   ` <DS7PR12MB57659BC5D5536574D1B91D26CB502@DS7PR12MB5765.namprd12.prod.outlook.com>
  2 siblings, 2 replies; 25+ messages in thread
From: William Roberts @ 2024-02-12 21:25 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

On Mon, Feb 12, 2024 at 12:42 PM Rich Felker <dalias@libc.org> wrote:
>
> On Mon, Feb 12, 2024 at 10:38:50AM -0600, William Roberts wrote:
> > Hello,
> >
> > I was just wondering if there was any work being done to support PAC
> > and BTI in aarch64? I could add support but didn't want to duplicate
> > the work.
>
> I'm not aware of any active work on this, but before writing a full
> implementation, it would be really helpful to start with a basic
> proposal for the scope of changes needed to make it work to assess
> whether these are manageable and acceptable cost.

For the C code, it's a matter of building with -mbranch-protection=standard.

The asm entry points just need a BTI as their first instruction. BTI
instructions are in the NOP (hint) space, so they are backwards
compatible; older hardware will just execute them as NOPs.

It's been done for many projects; glibc and bionic have it. The
problem with BTI is that when one item in the link list doesn't
support BTI, the loader/linker turns it off. So when it's something
like a libc that is fundamental in the link chain, it turns it off
for everything.

The initial scope of code changes would be whatever the linker reports
when building with LDFLAGS=-Wl,-zforce-bti,--fatal-warnings:

/usr/bin/ld: obj/src/fenv/aarch64/fenv.lo: warning: BTI turned on by
-z force-bti when all inputs do not have BTI in NOTE section.
/usr/bin/ld: obj/src/ldso/aarch64/dlsym.lo: warning: BTI turned on by
-z force-bti when all inputs do not have BTI in NOTE section.
/usr/bin/ld: obj/src/ldso/aarch64/tlsdesc.lo: warning: BTI turned on
by -z force-bti when all inputs do not have BTI in NOTE section.
/usr/bin/ld: obj/src/process/aarch64/vfork.lo: warning: BTI turned on
by -z force-bti when all inputs do not have BTI in NOTE section.
/usr/bin/ld: obj/src/setjmp/aarch64/longjmp.lo: warning: BTI turned on
by -z force-bti when all inputs do not have BTI in NOTE section.
/usr/bin/ld: obj/src/setjmp/aarch64/setjmp.lo: warning: BTI turned on
by -z force-bti when all inputs do not have BTI in NOTE section.
/usr/bin/ld: obj/src/signal/aarch64/restore.lo: warning: BTI turned on
by -z force-bti when all inputs do not have BTI in NOTE section.
/usr/bin/ld: obj/src/signal/aarch64/sigsetjmp.lo: warning: BTI turned
on by -z force-bti when all inputs do not have BTI in NOTE section.
/usr/bin/ld: obj/src/string/aarch64/memcpy.lo: warning: BTI turned on
by -z force-bti when all inputs do not have BTI in NOTE section.
/usr/bin/ld: obj/src/string/aarch64/memset.lo: warning: BTI turned on
by -z force-bti when all inputs do not have BTI in NOTE section.
/usr/bin/ld: obj/src/thread/aarch64/__set_thread_area.lo: warning: BTI
turned on by -z force-bti when all inputs do not have BTI in NOTE
section.
/usr/bin/ld: obj/src/thread/aarch64/__unmapself.lo: warning: BTI
turned on by -z force-bti when all inputs do not have BTI in NOTE
section.
/usr/bin/ld: obj/src/thread/aarch64/clone.lo: warning: BTI turned on
by -z force-bti when all inputs do not have BTI in NOTE section.
/usr/bin/ld: obj/src/thread/aarch64/syscall_cp.lo: warning: BTI turned
on by -z force-bti when all inputs do not have BTI in NOTE section.

>
> Rich


* Re: [musl] PAC/BTI Support on aarch64
  2024-02-12 21:25   ` William Roberts
@ 2024-02-12 21:34     ` enh
  2024-02-12 22:46     ` Rich Felker
  1 sibling, 0 replies; 25+ messages in thread
From: enh @ 2024-02-12 21:34 UTC (permalink / raw)
  To: musl; +Cc: Rich Felker

On Mon, Feb 12, 2024 at 1:26 PM William Roberts
<bill.c.roberts@gmail.com> wrote:
>
> On Mon, Feb 12, 2024 at 12:42 PM Rich Felker <dalias@libc.org> wrote:
> >
> > On Mon, Feb 12, 2024 at 10:38:50AM -0600, William Roberts wrote:
> > > Hello,
> > >
> > > I was just wondering if there was any work being done to support PAC
> > > and BTI in aarch64? I could add support but didn't want to duplicate
> > > the work.
> >
> > I'm not aware of any active work on this, but before writing a full
> > implementation, it would be really helpful to start with a basic
> > proposal for the scope of changes needed to make it work to assess
> > whether these are manageable and acceptable cost.
>
> It's a matter of building with -mbranch-protection=standard
>
> Just the ASM labels need the first instruction to be a BTI. They're in
> the NOP space
> so they are backwards compatible, older hardware will just NOP it.
>
> It's been done for many projects, glibc and bionic have it. The
> problem with BTI is that when one item in the link
> list doesn't support BTI the loader/linker turns it off. So when it's
> something like a libc that is fundamental in the link chain,
> it turns it off for everything.

note that bionic was quite sneaky, and if you look at bionic's arm64
.S files, you'll think we _haven't_ done the BTI work... we hid the
`bti c` instruction in the implementation of our ENTRY() macro
[https://android.googlesource.com/platform/bionic/+/main/libc/private/bionic_asm_arm64.h#48]
and similarly the ELF note you need is hidden by macros too
[https://android.googlesource.com/platform/bionic/+/main/libc/private/bionic_asm_arm64.h#60].

> The initial scope of code changes would be what's reported when
> LDFLAGS=-Wl,-zforce-bti,--fatal-warnings
>
> /usr/bin/ld: obj/src/fenv/aarch64/fenv.lo: warning: BTI turned on by
> -z force-bti when all inputs do not have BTI in NOTE section.
> /usr/bin/ld: obj/src/ldso/aarch64/dlsym.lo: warning: BTI turned on by
> -z force-bti when all inputs do not have BTI in NOTE section.
> /usr/bin/ld: obj/src/ldso/aarch64/tlsdesc.lo: warning: BTI turned on
> by -z force-bti when all inputs do not have BTI in NOTE section.
> /usr/bin/ld: obj/src/process/aarch64/vfork.lo: warning: BTI turned on
> by -z force-bti when all inputs do not have BTI in NOTE section.
> /usr/bin/ld: obj/src/setjmp/aarch64/longjmp.lo: warning: BTI turned on
> by -z force-bti when all inputs do not have BTI in NOTE section.
> /usr/bin/ld: obj/src/setjmp/aarch64/setjmp.lo: warning: BTI turned on
> by -z force-bti when all inputs do not have BTI in NOTE section.
> /usr/bin/ld: obj/src/signal/aarch64/restore.lo: warning: BTI turned on
> by -z force-bti when all inputs do not have BTI in NOTE section.
> /usr/bin/ld: obj/src/signal/aarch64/sigsetjmp.lo: warning: BTI turned
> on by -z force-bti when all inputs do not have BTI in NOTE section.
> /usr/bin/ld: obj/src/string/aarch64/memcpy.lo: warning: BTI turned on
> by -z force-bti when all inputs do not have BTI in NOTE section.
> /usr/bin/ld: obj/src/string/aarch64/memset.lo: warning: BTI turned on
> by -z force-bti when all inputs do not have BTI in NOTE section.
> /usr/bin/ld: obj/src/thread/aarch64/__set_thread_area.lo: warning: BTI
> turned on by -z force-bti when all inputs do not have BTI in NOTE
> section.
> /usr/bin/ld: obj/src/thread/aarch64/__unmapself.lo: warning: BTI
> turned on by -z force-bti when all inputs do not have BTI in NOTE
> section.
> /usr/bin/ld: obj/src/thread/aarch64/clone.lo: warning: BTI turned on
> by -z force-bti when all inputs do not have BTI in NOTE section.
> /usr/bin/ld: obj/src/thread/aarch64/syscall_cp.lo: warning: BTI turned
> on by -z force-bti when all inputs do not have BTI in NOTE section.
>
> >
> > Rich


* Re: [musl] PAC/BTI Support on aarch64
  2024-02-12 21:25   ` William Roberts
  2024-02-12 21:34     ` enh
@ 2024-02-12 22:46     ` Rich Felker
  2024-02-12 23:05       ` enh
  1 sibling, 1 reply; 25+ messages in thread
From: Rich Felker @ 2024-02-12 22:46 UTC (permalink / raw)
  To: William Roberts; +Cc: musl

On Mon, Feb 12, 2024 at 03:25:48PM -0600, William Roberts wrote:
> On Mon, Feb 12, 2024 at 12:42 PM Rich Felker <dalias@libc.org> wrote:
> >
> > On Mon, Feb 12, 2024 at 10:38:50AM -0600, William Roberts wrote:
> > > Hello,
> > >
> > > I was just wondering if there was any work being done to support PAC
> > > and BTI in aarch64? I could add support but didn't want to duplicate
> > > the work.
> >
> > I'm not aware of any active work on this, but before writing a full
> > implementation, it would be really helpful to start with a basic
> > proposal for the scope of changes needed to make it work to assess
> > whether these are manageable and acceptable cost.
> 
> It's a matter of building with -mbranch-protection=standard
> 
> Just the ASM labels need the first instruction to be a BTI. They're in
> the NOP space
> so they are backwards compatible, older hardware will just NOP it.

I think it's a little more elaborate than that. Those asm instructions
need to be added (probably as .inst or .word or something, unless
there's a way to spell this particular nop that existing tooling will
understand). Or it could be made conditional, but that would require
converting any asm that's not already .S files to .S. Not bad, but not
quite as trivial as adding something to CFLAGS.

I also wondered if [sig]setjmp/longjmp would be affected, but probably
not.

> It's been done for many projects, glibc and bionic have it. The
> problem with BTI is that when one item in the link
> list doesn't support BTI the loader/linker turns it off. So when it's
> something like a libc that is fundamental in the link chain,
> it turns it off for everything.

This presumably requires some kind of machinery for how dynamic
linking will work, and possibly turning it off if a library without it
is dlopened?

My understanding doing some brief searches though was that you can
individually mprotect it off in certain regions. So maybe it's
possible to just enable only for DSOs that support it?

> The initial scope of code changes would be what's reported when
> LDFLAGS=-Wl,-zforce-bti,--fatal-warnings

Is there a way to disable these warnings so that every asm file does
not need to be cluttered with annotations?

Rich


* Re: [musl] PAC/BTI Support on aarch64
  2024-02-12 22:46     ` Rich Felker
@ 2024-02-12 23:05       ` enh
  2024-02-12 23:18         ` William Roberts
  0 siblings, 1 reply; 25+ messages in thread
From: enh @ 2024-02-12 23:05 UTC (permalink / raw)
  To: musl; +Cc: William Roberts

On Mon, Feb 12, 2024 at 2:46 PM Rich Felker <dalias@libc.org> wrote:
>
> On Mon, Feb 12, 2024 at 03:25:48PM -0600, William Roberts wrote:
> > On Mon, Feb 12, 2024 at 12:42 PM Rich Felker <dalias@libc.org> wrote:
> > >
> > > On Mon, Feb 12, 2024 at 10:38:50AM -0600, William Roberts wrote:
> > > > Hello,
> > > >
> > > > I was just wondering if there was any work being done to support PAC
> > > > and BTI in aarch64? I could add support but didn't want to duplicate
> > > > the work.
> > >
> > > I'm not aware of any active work on this, but before writing a full
> > > implementation, it would be really helpful to start with a basic
> > > proposal for the scope of changes needed to make it work to assess
> > > whether these are manageable and acceptable cost.
> >
> > It's a matter of building with -mbranch-protection=standard
> >
> > Just the ASM labels need the first instruction to be a BTI. They're in
> > the NOP space
> > so they are backwards compatible, older hardware will just NOP it.
>
> I think it's a little more elaborate than that. Those asm instructions
> need to be added (probably as .instr or .word or something, unless
> there's a way to spell this particular nop that existing tooling will
> understand).

depends on your toolchain version. when we added this to bionic, the
toolchain work was still happening. so you'll want to test against
whatever your oldest-supported toolchain is.

> Or it could be made conditional, but that would require
> converting any asm that's not already .S files to .S. Not bad, but not
> quite as trivial as adding something to CFLAGS.
>
> I also wondered if [sig]setjmp/longjmp would be affected, but probably
> not.

bionic does use PAC, but i think glibc has its own "pointer mangling" thing?

> > It's been done for many projects, glibc and bionic have it. The
> > problem with BTI is that when one item in the link
> > list doesn't support BTI the loader/linker turns it off. So when it's
> > something like a libc that is fundamental in the link chain,
> > it turns it off for everything.
>
> This presumably requires some kind of machinery for how dynamic
> linking will work, and possibly turning it off if a library without it
> is dlopened?
>
> My understanding doing some brief searches though was that you can
> individually mprotect it off in certain regions. So maybe it's
> possible to just enable only for DSOs that support it?

correct.

> > The initial scope of code changes would be what's reported when
> > LDFLAGS=-Wl,-zforce-bti,--fatal-warnings
>
> Is there a way to disable these warnings so that every asm file does
> not need to be cluttered with annotations?

well, that's the ELF note stuff i was talking about, and if you don't
have it you'll fall foul of the static linker saying "not all this
code is BTI-enabled, therefore this .so isn't", and the dynamic linker
doing nothing because the static linker effectively tells it not to.

> Rich


* Re: [musl] PAC/BTI Support on aarch64
  2024-02-12 23:05       ` enh
@ 2024-02-12 23:18         ` William Roberts
  2024-02-13  2:08           ` Rich Felker
  0 siblings, 1 reply; 25+ messages in thread
From: William Roberts @ 2024-02-12 23:18 UTC (permalink / raw)
  To: enh; +Cc: musl

On Mon, Feb 12, 2024 at 5:05 PM enh <enh@google.com> wrote:
>
> On Mon, Feb 12, 2024 at 2:46 PM Rich Felker <dalias@libc.org> wrote:
> >
> > On Mon, Feb 12, 2024 at 03:25:48PM -0600, William Roberts wrote:
> > > On Mon, Feb 12, 2024 at 12:42 PM Rich Felker <dalias@libc.org> wrote:
> > > >
> > > > On Mon, Feb 12, 2024 at 10:38:50AM -0600, William Roberts wrote:
> > > > > Hello,
> > > > >
> > > > > I was just wondering if there was any work being done to support PAC
> > > > > and BTI in aarch64? I could add support but didn't want to duplicate
> > > > > the work.
> > > >
> > > > I'm not aware of any active work on this, but before writing a full
> > > > implementation, it would be really helpful to start with a basic
> > > > proposal for the scope of changes needed to make it work to assess
> > > > whether these are manageable and acceptable cost.
> > >
> > > It's a matter of building with -mbranch-protection=standard
> > >
> > > Just the ASM labels need the first instruction to be a BTI. They're in
> > > the NOP space
> > > so they are backwards compatible, older hardware will just NOP it.
> >
> > I think it's a little more elaborate than that. Those asm instructions
> > need to be added (probably as .instr or .word or something, unless
> > there's a way to spell this particular nop that existing tooling will
> > understand).
>
> depends on your toolchain version. when we added this to bionic, the
> toolchain work was still happening. so you'll want to test against
> whatever your oldest-supported toolchain is.
>

You just use the hint <immediate> encodings; they are understood by old
toolchains. That only covers a subset of the BTI/PAC instructions, but
it has been enough for most projects that follow the normal ABI
conventions, like OpenSSL and BoringSSL, though not for libffi, for example.
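
To make the hint approach concrete, here is a minimal C sketch of how the
BTI/PAC mnemonics map onto HINT encodings. The helper name a64_hint and
the enum names are made up for illustration; the immediates are the ones
the Arm architecture manual assigns as far as I know, so double-check
them against the spec before relying on this.

```c
#include <assert.h>
#include <stdint.h>

/*
 * A64 "HINT #imm7" encoding. NOP is HINT #0 (0xd503201f) and the
 * 7-bit immediate occupies bits [11:5] of the instruction word.
 */
static uint32_t a64_hint(unsigned imm7)
{
	assert(imm7 < 128);
	return UINT32_C(0xd503201f) | ((uint32_t)imm7 << 5);
}

/* Hint immediates for the instructions discussed in this thread. */
enum {
	HINT_PACIASP = 25, /* sign LR using SP as modifier   */
	HINT_AUTIASP = 29, /* authenticate LR using SP       */
	HINT_BTI     = 32, /* bti (no branch type accepted)  */
	HINT_BTI_C   = 34, /* bti c  (indirect calls)        */
	HINT_BTI_J   = 36, /* bti j  (indirect jumps)        */
	HINT_BTI_JC  = 38, /* bti jc (both)                  */
};
```

In an asm file this would mean writing "hint 34" as the first instruction
of an entry point where a newer assembler would accept "bti c"; both
should assemble to the same word, and CPUs without BTI execute it as a NOP.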

> > Or it could be made conditional, but that would require
> > converting any asm that's not already .S files to .S. Not bad, but not

as in inline asm? Unless it's a branch target, no need.

> > quite as trivial as adding something to CFLAGS.

That's not really what I said...

> >
> > I also wondered if [sig]setjmp/longjmp would be affected, but probably
> > not.
>
> bionic does use PAC, but i think glibc has its own "pointer mangling" thing?

You need it, as the first instruction at a branch target (where longjmp
returns to) needs to be a BTI instruction.

>
> > > It's been done for many projects, glibc and bionic have it. The
> > > problem with BTI is that when one item in the link
> > > list doesn't support BTI the loader/linker turns it off. So when it's
> > > something like a libc that is fundamental in the link chain,
> > > it turns it off for everything.
> >
> > This presumably requires some kind of machinery for how dynamic
> > linking will work, and possibly turning it off if a library without it
> > is dlopened?
> >
> > My understanding doing some brief searches though was that you can
> > individually mprotect it off in certain regions. So maybe it's
> > possible to just enable only for DSOs that support it?
>
> correct.
>
> > > The initial scope of code changes would be what's reported when
> > > LDFLAGS=-Wl,-zforce-bti,--fatal-warnings
> >
> > Is there a way to disable these warnings so that every asm file does
> > not need to be cluttered with annotations?
>
> well, that's the ELF note stuff i was talking about, and if you don't
> have it you'll fall foul of the static linker saying "not all this
> code is BTI-enabled, therefore this .so isn't", and the dynamic linker
> doing nothing because the static linker effectively tells it not to.

Yep, well said, enh. We haven't crossed paths since Android :-).

It's not that hard to annotate an asm file :-p I forget which project
it was (I think gnutls, but they just use openssl's code for the asm),
but I just put the annotation in a header file, and by virtue of
#include'ing it you get the notes added.

>
> > Rich


* Re: [musl] PAC/BTI Support on aarch64
  2024-02-12 23:18         ` William Roberts
@ 2024-02-13  2:08           ` Rich Felker
  2024-02-13 14:47             ` William Roberts
  2024-02-15  0:03             ` Szabolcs Nagy
  0 siblings, 2 replies; 25+ messages in thread
From: Rich Felker @ 2024-02-13  2:08 UTC (permalink / raw)
  To: William Roberts; +Cc: enh, musl

On Mon, Feb 12, 2024 at 05:18:22PM -0600, William Roberts wrote:
> On Mon, Feb 12, 2024 at 5:05 PM enh <enh@google.com> wrote:
> >
> > On Mon, Feb 12, 2024 at 2:46 PM Rich Felker <dalias@libc.org> wrote:
> > >
> > > On Mon, Feb 12, 2024 at 03:25:48PM -0600, William Roberts wrote:
> > > > On Mon, Feb 12, 2024 at 12:42 PM Rich Felker <dalias@libc.org> wrote:
> > > > >
> > > > > On Mon, Feb 12, 2024 at 10:38:50AM -0600, William Roberts wrote:
> > > > > > Hello,
> > > > > >
> > > > > > I was just wondering if there was any work being done to support PAC
> > > > > > and BTI in aarch64? I could add support but didn't want to duplicate
> > > > > > the work.
> > > > >
> > > > > I'm not aware of any active work on this, but before writing a full
> > > > > implementation, it would be really helpful to start with a basic
> > > > > proposal for the scope of changes needed to make it work to assess
> > > > > whether these are manageable and acceptable cost.
> > > >
> > > > It's a matter of building with -mbranch-protection=standard
> > > >
> > > > Just the ASM labels need the first instruction to be a BTI. They're in
> > > > the NOP space
> > > > so they are backwards compatible, older hardware will just NOP it.
> > >
> > > I think it's a little more elaborate than that. Those asm instructions
> > > need to be added (probably as .instr or .word or something, unless
> > > there's a way to spell this particular nop that existing tooling will
> > > understand).
> >
> > depends on your toolchain version. when we added this to bionic, the
> > toolchain work was still happening. so you'll want to test against
> > whatever your oldest-supported toolchain is.
> >
> 
> You just use the hint <immediate> instructions, they are understood by old
> toolchains. But you can only support a subset of the BTI/PAC instructions
> but it's been enough for most projects that follow the normal ABI conventions
> like OpenSSL/BoringSSL,etc, but not enough for libffi for example.

If hint goes all the way back, that's probably fine and ideal to use.

> > > Or it could be made conditional, but that would require
> > > converting any asm that's not already .S files to .S. Not bad, but not
> 
> as in inline asm? Unless it's a branch target, no need.

No, .S (preprocessed) vs .s (not). But if the hint insn works, I think
just having it there unconditionally is probably the way to go.

> > > I also wondered if [sig]setjmp/longjmp would be affected, but probably
> > > not.
> >
> > bionic does use PAC, but i think glibc has its own "pointer mangling" thing?
> 
> You need it, as the first instruction from a branch (where longjmp returns to)
> needs to be a BTI instruction.

Is that different from a normal function return?

Note that in the case of sigsetjmp, (sig)longjmp returns to a point
inside the sigsetjmp asm, so that point needs the annotation I think.

> > > > It's been done for many projects, glibc and bionic have it. The
> > > > problem with BTI is that when one item in the link
> > > > list doesn't support BTI the loader/linker turns it off. So when it's
> > > > something like a libc that is fundamental in the link chain,
> > > > it turns it off for everything.
> > >
> > > This presumably requires some kind of machinery for how dynamic
> > > linking will work, and possibly turning it off if a library without it
> > > is dlopened?
> > >
> > > My understanding doing some brief searches though was that you can
> > > individually mprotect it off in certain regions. So maybe it's
> > > possible to just enable only for DSOs that support it?
> >
> > correct.

OK, that's good to know. So which direction is it? Do DSOs that
support BTI need it explicitly turned on via mprotect/mmap flags? Or
is there some process-global flag to turn it on, and then ones that
don't support it need it turned off?

I suspect it's possible to first enable BTI for third-party libraries
as a feature of the dynamic linker, and add BTI support for libc
itself as a separate thing. That might be a nice factoring to make
changes minimal and easy for ppl to read.

The changes in dynlink.c should be as arch-agnostic as possible. If
there's a corresponding feature on other archs, it should use the same
basic code, with arch-specific headers (arch/$ARCH/reloc.h) defining
the mechanisms for evaluating if an ELF file is compatible, how to do
the mprotect, etc.

> > > > The initial scope of code changes would be what's reported when
> > > > LDFLAGS=-Wl,-zforce-bti,--fatal-warnings
> > >
> > > Is there a way to disable these warnings so that every asm file does
> > > not need to be cluttered with annotations?
> >
> > well, that's the ELF note stuff i was talking about, and if you don't
> > have it you'll fall foul of the static linker saying "not all this
> > code is BTI-enabled, therefore this .so isn't", and the dynamic linker
> > doing nothing because the static linker effectively tells it not to.
> 
> Yep, well said ENH. It's been since Android since we crossed paths :-).
> 
> It's not that hard to annotate an asm file :-p I forget what project
> (I think it was gnutls, but they just use openssl's code for the asm)
> but I just put it in a header file and by virtue of #include'ing it you get the
> notes added.

Yes, we generally don't do that. There are no "asm headers" in musl;
all asm files are self-contained and readable standalone. So if
there's no way to tell the assembler/linker from the command line that
files are BTI-compatible without generating a huge load of warning
spam, I guess it's a mess of copy-and-paste...

Rich


* Re: [musl] PAC/BTI Support on aarch64
  2024-02-13  2:08           ` Rich Felker
@ 2024-02-13 14:47             ` William Roberts
  2024-02-13 17:51               ` Markus Wichmann
  2024-02-15  0:03             ` Szabolcs Nagy
  1 sibling, 1 reply; 25+ messages in thread
From: William Roberts @ 2024-02-13 14:47 UTC (permalink / raw)
  To: Rich Felker; +Cc: enh, musl

On Mon, Feb 12, 2024 at 8:08 PM Rich Felker <dalias@libc.org> wrote:
>
> On Mon, Feb 12, 2024 at 05:18:22PM -0600, William Roberts wrote:
> > On Mon, Feb 12, 2024 at 5:05 PM enh <enh@google.com> wrote:
> > >
> > > On Mon, Feb 12, 2024 at 2:46 PM Rich Felker <dalias@libc.org> wrote:
> > > >
> > > > On Mon, Feb 12, 2024 at 03:25:48PM -0600, William Roberts wrote:
> > > > > On Mon, Feb 12, 2024 at 12:42 PM Rich Felker <dalias@libc.org> wrote:
> > > > > >
> > > > > > On Mon, Feb 12, 2024 at 10:38:50AM -0600, William Roberts wrote:
> > > > > > > Hello,
> > > > > > >
> > > > > > > I was just wondering if there was any work being done to support PAC
> > > > > > > and BTI in aarch64? I could add support but didn't want to duplicate
> > > > > > > the work.
> > > > > >
> > > > > > I'm not aware of any active work on this, but before writing a full
> > > > > > implementation, it would be really helpful to start with a basic
> > > > > > proposal for the scope of changes needed to make it work to assess
> > > > > > whether these are manageable and acceptable cost.
> > > > >
> > > > > It's a matter of building with -mbranch-protection=standard
> > > > >
> > > > > Just the ASM labels need the first instruction to be a BTI. They're in
> > > > > the NOP space
> > > > > so they are backwards compatible, older hardware will just NOP it.
> > > >
> > > > I think it's a little more elaborate than that. Those asm instructions
> > > > need to be added (probably as .instr or .word or something, unless
> > > > there's a way to spell this particular nop that existing tooling will
> > > > understand).
> > >
> > > depends on your toolchain version. when we added this to bionic, the
> > > toolchain work was still happening. so you'll want to test against
> > > whatever your oldest-supported toolchain is.
> > >
> >
> > You just use the hint <immediate> instructions, they are understood by old
> > toolchains. But you can only support a subset of the BTI/PAC instructions
> > but it's been enough for most projects that follow the normal ABI conventions
> > like OpenSSL/BoringSSL,etc, but not enough for libffi for example.
>
> If hint goes all the way back, that's probably fine and ideal to use.

It should. Is there a known minimal tool chain requirement and I can test?

>
> > > > Or it could be made conditional, but that would require
> > > > converting any asm that's not already .S files to .S. Not bad, but not
> >
> > as in inline asm? Unless it's a branch target, no need.
>
> No, .S (preprocessed) vs .s (not). But if the hint insn works, I think
> just having it there unconditionally is probably the way to go.
>
> > > > I also wondered if [sig]setjmp/longjmp would be affected, but probably
> > > > not.
> > >
> > > bionic does use PAC, but i think glibc has its own "pointer mangling" thing?
> >
> > You need it, as the first instruction from a branch (where longjmp returns to)
> > needs to be a BTI instruction.
>
> Is that different from a normal function return?

No, anywhere branches are allowed, a BTI instruction must be the first
instruction. BTI is just a way for software to say, hey, this is a
valid jump/branch target, allow it. This reduces the number of gadgets
available to an attacker, which is why libc is such a juicy target, as
it's in everything. A lot of things link it statically, and a libc
without BTI then effectively turns it off for the whole process.

>
> Note that in the case of sigsetjmp, (sig)longjmp returns to a point
> inside the sigsetjmp asm, so that point needs the annotation I think.
>
> > > > > It's been done for many projects, glibc and bionic have it. The
> > > > > problem with BTI is that when one item in the link
> > > > > list doesn't support BTI the loader/linker turns it off. So when it's
> > > > > something like a libc that is fundamental in the link chain,
> > > > > it turns it off for everything.
> > > >
> > > > This presumably requires some kind of machinery for how dynamic
> > > > linking will work, and possibly turning it off if a library without it
> > > > is dlopened?
> > > >
> > > > My understanding doing some brief searches though was that you can
> > > > individually mprotect it off in certain regions. So maybe it's
> > > > possible to just enable only for DSOs that support it?
> > >
> > > correct.
>
> OK, that's good to know. So which direction is it? Do DSOs that
> support BTI need it explicitly turned on via mprotect/mmap flags?

Yes, so the kernel will manage the EL1 register flag for this, and then
mprotect sets the PROT_BTI flag during dlopen().

> Or
> is there some process-global flag to turn it on, and then ones that
> don't support it need it turned off?

EL1 MSR register (I forget which one offhand), but the granularity is
managed at the page level.

>
> I suspect it's possible to first enable BTI for third-party libraries
> as a feature of the dynamic linker,

If you mean checking the GNU property note for the BTI flag and setting
PROT_BTI via mprotect, that's just one of the many patches, and it can
be taken independently.
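
As a rough illustration of that check, here is a minimal C sketch that
scans one GNU property note for the AArch64 BTI bit. The function name
note_has_bti is hypothetical, the note is assumed to be in host byte
order, and a real loader would first have to locate the note (for
example through a PT_GNU_PROPERTY or PT_NOTE program header), which this
sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifndef NT_GNU_PROPERTY_TYPE_0
#define NT_GNU_PROPERTY_TYPE_0 5
#endif
#ifndef GNU_PROPERTY_AARCH64_FEATURE_1_AND
#define GNU_PROPERTY_AARCH64_FEATURE_1_AND 0xc0000000u
#endif
#ifndef GNU_PROPERTY_AARCH64_FEATURE_1_BTI
#define GNU_PROPERTY_AARCH64_FEATURE_1_BTI (1u << 0)
#endif

/* Read a 32-bit word without assuming alignment. */
static uint32_t rd32(const unsigned char *p)
{
	uint32_t v;
	memcpy(&v, p, sizeof v);
	return v;
}

/* Return 1 if the note marks the object as BTI-compatible, else 0. */
static int note_has_bti(const unsigned char *note, size_t len)
{
	if (len < 16) return 0;
	uint32_t namesz = rd32(note), descsz = rd32(note + 4);
	uint32_t type = rd32(note + 8);
	if (type != NT_GNU_PROPERTY_TYPE_0 || namesz != 4) return 0;
	if (memcmp(note + 12, "GNU", 4)) return 0;
	if (descsz > len - 16) return 0;

	const unsigned char *p = note + 16, *end = p + descsz;
	while ((size_t)(end - p) >= 8) {
		uint32_t pr_type = rd32(p), pr_datasz = rd32(p + 4);
		p += 8;
		if (pr_datasz > (size_t)(end - p)) return 0;
		if (pr_type == GNU_PROPERTY_AARCH64_FEATURE_1_AND && pr_datasz >= 4)
			return !!(rd32(p) & GNU_PROPERTY_AARCH64_FEATURE_1_BTI);
		/* Each property is padded to 8 bytes in 64-bit ELF. */
		size_t adv = ((size_t)pr_datasz + 7) & ~(size_t)7;
		if (adv > (size_t)(end - p)) break;
		p += adv;
	}
	return 0;
}
```

The result would then feed whatever decides the protection flags for the
object's executable segments.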

> and add BTI support for libc
> itself as a separate thing. That might be a nice factoring to make
> changes minimal and easy for ppl to read.

This is just a matter of organizing things, there's no dependency between
enabling the linker and enabling the library itself. So of course that shouldn't
come as one giant patch.

It's important to note that even with the assembly files annotated, if
the C sources are not built with -mbranch-protection=standard, the
feature will remain off for the library.

BTI is enabled for third party packages on Fedora by default:
  - https://fedoraproject.org/wiki/Changes/Aarch64_PointerAuthentication

The problem now is all the packages that don't use the default set of
CFLAGS and/or roll their own asm.

>
> The changes in dynlink.c should be as arch-agnostic as possible. If
> there's a corresponding feature on other archs

I can't think of anything like this offhand, but other arches may want
to add prot flags to mprotect calls.

> it should use the same
> basic code, with arch-specific headers (arch/$ARCH/reloc.h) defining
> the mechanisms for evaluating if an ELF file is compatible, how to do
> the mprotect, etc.

It's usually something like:

#ifdef __aarch64__
if (gnu_notes_bti_set && (prot & PROT_EXEC)) {
    prot |= PROT_BTI;
} else {
    prot &= ~PROT_BTI;
}
#endif

mprotect(..., prot);

But this could be done with something like an arch-specific macro or an
inline function in a header that just does nothing for most
architectures, or with a weak symbol, though I am always worried that
with weak symbols someone might override it in a bad way.

>
> > > > > The initial scope of code changes would be what's reported when
> > > > > LDFLAGS=-Wl,-zforce-bti,--fatal-warnings
> > > >
> > > > Is there a way to disable these warnings so that every asm file does
> > > > not need to be cluttered with annotations?
> > >
> > > well, that's the ELF note stuff i was talking about, and if you don't
> > > have it you'll fall foul of the static linker saying "not all this
> > > code is BTI-enabled, therefore this .so isn't", and the dynamic linker
> > > doing nothing because the static linker effectively tells it not to.
> >
> > Yep, well said ENH. It's been since Android since we crossed paths :-).
> >
> > It's not that hard to annotate an asm file :-p I forget what project
> > (I think it was gnutls, but they just use openssl's code for the asm)
> > but I just put it in a header file and by virtue of #include'ing it you get the
> > notes added.
>
> Yes, we generally don't do that. There are no "asm headers" in musl;
> all asm files are self-contained and readable standalone. So if
> there's no way to tell the assembler/linker from the command line that
> files are BTI-compatible without generating a huge load of warning
> spam, I guess it's a mess of copy-and-paste...
>
> Rich


* Re: [musl] PAC/BTI Support on aarch64
  2024-02-13 14:47             ` William Roberts
@ 2024-02-13 17:51               ` Markus Wichmann
  2024-02-14  2:19                 ` Rich Felker
  0 siblings, 1 reply; 25+ messages in thread
From: Markus Wichmann @ 2024-02-13 17:51 UTC (permalink / raw)
  To: musl; +Cc: Rich Felker, enh

Am Tue, Feb 13, 2024 at 08:47:42AM -0600 schrieb William Roberts:
> It should. Is there a known minimal tool chain requirement and I can test?
>

Typically the first C99 compiler or the first aarch64 compiler,
whichever is younger.

>
> No, anywhere branches are allowed, a BTI instruction must be the first
> instruction. BTI is just a way for software to say, hey this is a
> valid jump/branch
> target, allow it. This reduces the amount of gadgets available to an
> attacker, which
> is why libc is such a juicy target, as it's in everything. A lot of
> things static link it,
> which effectively turns it off for the whole process.
>

So this means there must be a BTI instruction following every single BL
instruction.

But in the end this isn't that much different from endbr64 on the PC.
Whatever happened to those patches, BTW?

> Yes, so the kernel will manage the EL1 register flag for this, and then
> mprotect sets the PROT_BTI flag during dlopen().
>

Well, this is a novelty. This is the first time there will be an
arch-specific flag in mmap()/mprotect() for the musl dynlinker. So far
that code has been entirely portable.

> It's important to note, that even when enabling the assembly code files, if the
> C level source is not built with -mbranch-protection=standard, the feature will
> remain off for the library.
>

Arch-specific compiler flags are not a problem; configure.sh can add
those as needed.

> I can't think of anything like this offhand, but other arches may want to add prot
> flags to mprotect calls.
>

That hasn't happened yet. Of course, this may be as simple as adding a
static inline function. The fact that the important information is in a
note section is yet another novelty, of course. So far, the important
information (even arch-specific) has been contained in the dynamic
section.

> it's usually
> #ifdef __aarch64__
> if (gnu_notes_bti_set && (prot & PROT_EXEC)) {
>     prot |= PROT_BTI;
> } else {
>     prot &= ~PROT_BTI;
> }
> #endif
>
> mprotect(..., prot);
>

So far we have managed to steer clear of conditional inclusion, and I
think we should try to keep it that way.

Ciao,
Markus


* Re: [musl] PAC/BTI Support on aarch64
  2024-02-13 17:51               ` Markus Wichmann
@ 2024-02-14  2:19                 ` Rich Felker
  2024-02-14  3:19                   ` William Roberts
                                     ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Rich Felker @ 2024-02-14  2:19 UTC (permalink / raw)
  To: Markus Wichmann; +Cc: musl, enh

On Tue, Feb 13, 2024 at 06:51:47PM +0100, Markus Wichmann wrote:
> Am Tue, Feb 13, 2024 at 08:47:42AM -0600 schrieb William Roberts:
> > It should. Is there a known minimal tool chain requirement and I can test?
> 
> Typically the first C99 compiler or the first aarch64 compiler,
> whichever is younger.

I think binutils is the relevant component, and that'd be whichever
version of binutils added aarch64.

> > No, anywhere branches are allowed, a BTI instruction must be the first
> > instruction. BTI is just a way for software to say, hey this is a
> > valid jump/branch
> > target, allow it. This reduces the amount of gadgets available to an
> > attacker, which
> > is why libc is such a juicy target, as it's in everything. A lot of
> > things static link it,
> > which effectively turns it off for the whole process.
> >
> 
> So this means there must be a BTI instruction following every single BL
> instruction.
> 
> But in the end this isn't that much different from endbr64 on the PC.
> Whatever happened to those patches, BTW?

What is the situation on x86? Does it use the same kind of per-page
enforcement mode, or is it only global, requiring disabling it if any
DSO lacks support? Is the endbr64 opcode a guaranteed-safe nop on
older ISA levels, or does it need to be conditional?

> > Yes, so the kernel will manage the EL1 register flag for this, and then
> > mprotect sets the PROT_BTI flag during dlopen().
> 
> Well, this is a novelty. This is the first time there will be an
> arch-specific flag in mmap()/mprotect() for the musl dynlinker. So far
> that code has been entirely portable.

Can the flag be used at mmap time, or only in mprotect? It would be a
lot more efficient to do it as part of the mmap, but getting
visibility to the note to know you need it at mmap time seems
difficult and more costly than doing the mprotect later...

I assume we would either add the code conditional on the existence of
a PROT_BTI macro (#ifdef) and define that to the corresponding thing
on other archs in the future, or abstract it with a new name in
arch/$ARCH/reloc.h defined in terms of whatever the arch provides so
as to be a little bit more naming-agnostic.

It should not be #ifdef __aarch64__ or similar.

> > It's important to note, that even when enabling the assembly code files, if the
> > C level source is not built with -mbranch-protection=standard, the feature will
> > remain off for the library.
> >
> 
> Arch-specific compiler flags are not a problem; configure.sh can add
> those as needed.

Yep, that's fine. Possibly a question of whether it should be on by
default or configurable, but if there's essentially no cost,
on-by-default seems fine.

> > I can't think of anything like this offhand, but other arches may want to add prot
> > flags to mprotect calls.
> 
> That hasn't happened yet. Of course, this may be as simple as adding a
> static inline function. The fact that the important information is in a
> note section is yet another novelty, of course. So far, the important
> information (even arch-specific) has been contained in the dynamic
> section.

Yes, that's gratuitously annoying. Ideally it would have been
somewhere easily accessible from the Ehdr so it's available at initial
mmap time... :/

> > it's usually
> > #ifdef __aarch64__
> > if (gnu_notes_bti_set && (prot & PROT_EXEC)) {
> >     prot |= PROT_BTI;
> > } else {
> >     prot &= ~PROT_BTI;
> > }
> > #endif
> >
> > mprotect(..., prot);
> >
> 
> So far we have managed to steer clear of conditional inclusion, and I
> think we should try to keep it that way.

Yes. I think reloc.h should define a predicate macro (which may call a
static inline function if the predicate is complex) to check if a DSO
needs branch protection on its PROT_EXEC segments.
src/internal/dynlink.h could provide a default always-false one if
it's not defined. Then dynlink.c can just, when that predicate
evaluates true, loop thru the segments and mprotect any PROT_EXEC ones
to also have PROT_BTI or whatever.

This remains very arch-agnostic and the code should be either directly
usable on other archs, or admit easy generalization if needed.
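
A minimal sketch of that factoring, for illustration only. Every name here
(struct dso_sketch, DSO_NEEDS_BRANCH_PROT, ARCH_BRANCH_PROT, segment_prot) is
hypothetical and not existing musl code; the PROT_BTI fallback value is the one
from the Linux aarch64 uapi headers, defined so the sketch builds anywhere.

```c
#include <sys/mman.h>

/* Hypothetical per-DSO state; musl's real struct dso looks different. */
struct dso_sketch {
	int has_bti_note;   /* set by whatever parses the ELF property note */
};

/* --- what an arch's reloc.h might define (aarch64 shown) --- */
#ifndef PROT_BTI
#define PROT_BTI 0x10   /* value from the Linux aarch64 uapi headers */
#endif
#define ARCH_BRANCH_PROT PROT_BTI
#define DSO_NEEDS_BRANCH_PROT(d) ((d)->has_bti_note)

/* --- default dynlink.h could supply when the arch defines nothing --- */
#ifndef DSO_NEEDS_BRANCH_PROT
#define ARCH_BRANCH_PROT 0
#define DSO_NEEDS_BRANCH_PROT(d) 0
#endif

/* Arch-agnostic helper for dynlink.c: the protection a segment should
 * end up with. Only executable segments of a marked DSO gain the flag. */
static inline int segment_prot(const struct dso_sketch *d, int prot)
{
	if (DSO_NEEDS_BRANCH_PROT(d) && (prot & PROT_EXEC))
		return prot | ARCH_BRANCH_PROT;
	return prot;
}
```

dynlink.c would then call mprotect with segment_prot(dso, prot) for each
loadable segment, with no arch #ifdef in the generic code.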

Rich


* Re: [musl] PAC/BTI Support on aarch64
  2024-02-14  2:19                 ` Rich Felker
@ 2024-02-14  3:19                   ` William Roberts
  2024-02-14  4:44                   ` Markus Wichmann
  2024-02-15 13:29                   ` Stefan O'Rear
  2 siblings, 0 replies; 25+ messages in thread
From: William Roberts @ 2024-02-14  3:19 UTC (permalink / raw)
  To: musl; +Cc: Markus Wichmann, enh

On Tue, Feb 13, 2024 at 8:19 PM Rich Felker <dalias@libc.org> wrote:
>
> On Tue, Feb 13, 2024 at 06:51:47PM +0100, Markus Wichmann wrote:
> > Am Tue, Feb 13, 2024 at 08:47:42AM -0600 schrieb William Roberts:
> > > It should. Is there a known minimal tool chain requirement and I can test?
> >
> > Typically the first C99 compiler or the first aarch64 compiler,
> > whichever is younger.
>
> I think binutils is the relevant component, and that'd be whichever
> version of binutils added aarch64.

AFAICT 2.24 tagged Dec 2013

>
> > > No, anywhere branches are allowed, a BTI instruction must be the first
> > > instruction. BTI is just a way for software to say, hey this is a
> > > valid jump/branch
> > > target, allow it. This reduces the amount of gadgets available to an
> > > attacker, which
> > > is why libc is such a juicy target, as it's in everything. A lot of
> > > things static link it,
> > > which effectively turns it off for the whole process.
> > >
> >
> > So this means there must be a BTI instruction following every single BL
> > instruction.
> >

I don't think so; you wouldn't want call sites to effectively become gadget
locations. You want entry points marked; returns are handled with PAC,
which goes hand in hand with BTI, as the PAC instruction can also be
a landing pad. Looking at some generated asm, I don't see BLs being marked.

Here's a decent doc BTW:
  - https://www.google.com/search?q=arm+introduction+to+bti&rlz=1C5GCEM_enUS1088US1089&oq=arm+introduction+to+bti&gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIHCAEQIRigATIHCAIQIRigATIHCAMQIRifBTIHCAQQIRifBTIHCAUQIRifBdIBCDM2NTZqMGo3qAIAsAIA&sourceid=chrome&ie=UTF-8#:~:text=Arm%20Instruction%20Set,Arm%20Compiler%206

Essentially, call targets need a valid pac/bti instruction, and if using
pac, there must be a validate before ret.

landing pad with pointer auth (with the A key): paciasp or hint #25
validate with autiasp or hint #29
landing pad with pointer auth (with the B key): pacibsp or hint #27
validate with autibsp or hint #31

landing pad BTI only:  bti c or hint #34

The compilers set some defines so you know which key to use, but some
projects just support the A key.

To support other keys, you would need to go the route of conditional
asm, but almost everyone just uses the A key; it's what's turned on by
-mbranch-protection=standard (I think), and it's easy to catch in the
arm header and balk if someone sets the B key.
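
For reference, a small sketch of how those hint numbers map to instruction
words, assuming the standard A64 HINT encoding (the NOP word with the immediate
in bits 11:5). The helper name a64_hint and the enum names are made up for
illustration.

```c
#include <stdint.h>

/* HINT #0 is NOP; every other hint is the same word with imm in bits 11:5. */
#define A64_NOP 0xd503201fu

/* Hint numbers as listed above (names are illustrative only). */
enum {
	HINT_PACIASP = 25,
	HINT_PACIBSP = 27,
	HINT_AUTIASP = 29,
	HINT_AUTIBSP = 31,
	HINT_BTI_C   = 34,
};

/* Encode "hint #imm" as a 32-bit A64 instruction word. */
static inline uint32_t a64_hint(unsigned imm)
{
	return A64_NOP | ((uint32_t)(imm & 0x7f) << 5);
}
```

Since these all sit in the hint space, hardware without PAC/BTI executes them
as NOPs, and an old assembler that lacks the mnemonics can still take the
"hint #imm" spelling.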

> > But in the end this isn't that much different from endbr64 on the PC.
> > Whatever happened to those patches, BTW?
>
> What is the situation on x86? Does it use the same kind of per-page
> enforcement mode, or is it only global, requiring disabling it if any
> DSO lacks support? Is the endbr64 opcode a guaranteed-safe nop on
> older ISA levels, or does it need to be conditional?
>
> > > Yes, so the kernel will manage the EL1 register flag for this, and then
> > > mprotect sets the PROT_BTI flag during dlopen().
> >
> > Well, this is a novelty. This is the first time there will be an
> > arch-specific flag in mmap()/mprotect() for the musl dynlinker. So far
> > that code has been entirely portable.
>
> Can the flag be used at mmap time, or only in mprotect? It would be a
> lot more efficient to do it as part of the mmap, but getting
> visibility to the note to know you need it at mmap time seems
> difficult and more costly than doing the mprotect later...
>
> I assume we would either add the code conditional on the existence of
> a PROT_BTI macro (#ifdef) and define that to the corresponding thing
> on other archs in the future, or abstract it with a new name in
> arch/$ARCH/reloc.h defined in terms of whatever the arch provides so
> as to be a little bit more naming-agnostic.
>
> It should not be #ifdef __aarch64__ or similar.
>
> > > It's important to note, that even when enabling the assembly code files, if the
> > > C level source is not built with -mbranch-protection=standard, the feature will
> > > remain off for the library.
> > >
> >
> > Arch-specific compiler flags are not a problem; configure.sh can add
> > those as needed.
>
> Yep, that's fine. Possibly a question of whether it should be on by
> default or configurable, but if there's essentially no cost,
> on-by-default seems fine.
>
> > > I can't think of anything like this offhand, but other arches may want to add prot
> > > flags to mprotect calls.
> >
> > That hasn't happened yet. Of course, this may be as simple as adding a
> > static inline function. The fact that the important information is in a
> > note section is yet another novelty, of course. So far, the important
> > information (even arch-specific) has been contained in the dynamic
> > section.
>
> Yes, that's gratuitously annoying. Ideally it would have been
> somewhere easily accessible from the Ehdr so it's available at initial
> mmap time... :/
>
> > > it's usually
> > > #ifdef __aarch64__
> > > if (gnu_notes_bti_set && (prot & PROT_EXEC)) {
> > >     prot |= PROT_BTI;
> > > } else {
> > >     prot &= ~PROT_BTI;
> > > }
> > > #endif
> > >
> > > mprotect(..., prot);
> > >
> >
> > So far we have managed to steer clear of conditional inclusion, and I
> > think we should try to keep it that way.
>
> Yes. I think reloc.h should define a predicate macro (which may call a
> static inline function if the predicate is complex) to check if a DSO
> needs branch protection on its PROT_EXEC segments.
> src/internal/dynlink.h could provide a default always-false one if
> it's not defined. Then dynlink.c can just, when that predicate
> evaluates true, loop thru the segments and mprotect any PROT_EXEC ones
> to also have PROT_BTI or whatever.

I was tinkering today: arch/generic has a bunch of empty files, so just
add an empty file there and let arches add to it. This code is far from
useful, but it clarifies the approach.

diff --git a/arch/aarch64/mprot_arch.h b/arch/aarch64/mprot_arch.h
new file mode 100644
index 00000000..32f7afc6
--- /dev/null
+++ b/arch/aarch64/mprot_arch.h
@@ -0,0 +1,12 @@
+#ifndef _MPROT_ARCH_H
+#define _MPROT_ARCH_H
+
+/* Add PROT_BTI to executable mappings; does not yet consult the ELF note. */
+#define MPROT_ARCH(prot) \
+do { \
+	if ((prot) & PROT_EXEC) { \
+		(prot) |= PROT_BTI; \
+	} \
+} while(0)
+
+#endif
diff --git a/arch/generic/mprot_arch.h b/arch/generic/mprot_arch.h
new file mode 100644
index 00000000..e69de29b
diff --git a/ldso/dynlink.c b/ldso/dynlink.c
index 324aa859..a9b2278a 100644
--- a/ldso/dynlink.c
+++ b/ldso/dynlink.c
@@ -22,6 +22,7 @@
 #include "pthread_impl.h"
 #include "fork_impl.h"
 #include "dynlink.h"
+#include "mprot_arch.h"

 static size_t ldso_page_size;
 #ifndef PAGE_SIZE
@@ -851,7 +852,9 @@ static void *map_library(int fd, struct dso *dso)
  }
  for (i=0; ((size_t *)(base+dyn))[i]; i+=2)
  if (((size_t *)(base+dyn))[i]==DT_TEXTREL) {
- if (mprotect(map, map_len, PROT_READ|PROT_WRITE|PROT_EXEC)
+ int prot = PROT_READ|PROT_WRITE|PROT_EXEC;
+ MPROT_ARCH(prot);
+ if (mprotect(map, map_len, prot)
      && errno != ENOSYS)
  goto error;
  break;

> This remains very arch-agnostic and the code should be either directly
> usable on other archs, or admit easy generalization if needed.
>
> Rich


* Re: [musl] PAC/BTI Support on aarch64
  2024-02-14  2:19                 ` Rich Felker
  2024-02-14  3:19                   ` William Roberts
@ 2024-02-14  4:44                   ` Markus Wichmann
  2024-02-14 13:32                     ` Thorsten Glaser
  2024-02-15 13:29                   ` Stefan O'Rear
  2 siblings, 1 reply; 25+ messages in thread
From: Markus Wichmann @ 2024-02-14  4:44 UTC (permalink / raw)
  To: musl; +Cc: enh

Am Tue, Feb 13, 2024 at 09:19:25PM -0500 schrieb Rich Felker:
> What is the situation on x86? Does it use the same kind of per-page
> enforcement mode, or is it only global, requiring disabling it if any
> DSO lacks support? Is the endbr64 opcode a guaranteed-safe nop on
> older ISA levels, or does it need to be conditional?
>

My, what a journey. I had a look around the Internet for this question
and kept finding contradictory results. Turns out that is because, as
per kernel documentation, Linux only supports *kernel* IBT. The only
part of CET it supports for userspace is shadow stacks. Unless the
kernel docs are not up-to-date, of course.

According to Intel, the ENDBR64 instruction decodes as NOP on older
processors. GCC has support for emitting it, but at this point in time it
appears to be useless outside of Linux itself.

Ciao,
Markus


* Re: [musl] PAC/BTI Support on aarch64
  2024-02-14  4:44                   ` Markus Wichmann
@ 2024-02-14 13:32                     ` Thorsten Glaser
  2024-02-14 14:03                       ` Rich Felker
  0 siblings, 1 reply; 25+ messages in thread
From: Thorsten Glaser @ 2024-02-14 13:32 UTC (permalink / raw)
  To: musl

Markus Wichmann dixit:

>According to Intel, the ENDBR64 instruction decodes as NOP on older
>processors.

That’s unfortunately only true for processors manufactured by Intel.
There exist 686-class CPUs that don’t handle these and other long nops
so it’s best omitted on generic, as in not -march=native, builds.

bye,
//mirabilos
-- 
15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-)


* Re: [musl] PAC/BTI Support on aarch64
  2024-02-14 13:32                     ` Thorsten Glaser
@ 2024-02-14 14:03                       ` Rich Felker
  2024-02-14 14:12                         ` Thorsten Glaser
  0 siblings, 1 reply; 25+ messages in thread
From: Rich Felker @ 2024-02-14 14:03 UTC (permalink / raw)
  To: Thorsten Glaser; +Cc: musl

On Wed, Feb 14, 2024 at 01:32:13PM +0000, Thorsten Glaser wrote:
> Markus Wichmann dixit:
> 
> >According to Intel, the ENDBR64 instruction decodes as NOP on older
> >processors.
> 
> That’s unfortunately only true for processors manufactured by Intel.
> There exist 686-class CPUs that don’t handle these and other long nops
> so it’s best omitted on generic, as in not -march=native, builds.

Lovely. So yet another reason the Intel thing sounds unusable in
practice while the ARM thing seems very reasonable to support...

Since you mentioned 686-class which are 32-bit, is the same true for
x86_64, or is the situation better there?

Rich


* Re: [musl] PAC/BTI Support on aarch64
  2024-02-14 14:03                       ` Rich Felker
@ 2024-02-14 14:12                         ` Thorsten Glaser
  0 siblings, 0 replies; 25+ messages in thread
From: Thorsten Glaser @ 2024-02-14 14:12 UTC (permalink / raw)
  To: musl

Rich Felker dixit:

>Since you mentioned 686-class which are 32-bit, is the same true for

AIUI for amd64 and x32, it should be fine to use.

bye,
//mirabilos
-- 
<ch> you introduced a merge commit        │<mika> % g rebase -i HEAD^^
<mika> sorry, no idea and rebasing just fscked │<mika> Segmentation
<ch> should have cloned into a clean repo      │  fault (core dumped)
<ch> if I rebase that now, it's really ugh     │<mika:#grml> wuahhhhhh


* Re: [musl] PAC/BTI Support on aarch64
  2024-02-13  2:08           ` Rich Felker
  2024-02-13 14:47             ` William Roberts
@ 2024-02-15  0:03             ` Szabolcs Nagy
  2024-02-15  0:22               ` enh
  1 sibling, 1 reply; 25+ messages in thread
From: Szabolcs Nagy @ 2024-02-15  0:03 UTC (permalink / raw)
  To: Rich Felker; +Cc: William Roberts, enh, musl

* Rich Felker <dalias@libc.org> [2024-02-12 21:08:34 -0500]:
> On Mon, Feb 12, 2024 at 05:18:22PM -0600, William Roberts wrote:
> > On Mon, Feb 12, 2024 at 5:05 PM enh <enh@google.com> wrote:
> > > On Mon, Feb 12, 2024 at 2:46 PM Rich Felker <dalias@libc.org> wrote:
> > > > On Mon, Feb 12, 2024 at 03:25:48PM -0600, William Roberts wrote:
> > > > > It's a matter of building with -mbranch-protection=standard
> > > > >
> > > > > Just the ASM labels need the first instruction to be a BTI. They're in
> > > > > the NOP space
> > > > > so they are backwards compatible, older hardware will just NOP it.

not quite that simple. sorry long brain dump follows:

tl;dr: i think the main issues are asm handling (property notes
and cfi for pac-ret), property note handling in ld.so, perf
overhead and possible compat issues of pac-ret (not possible
to disable per process) and testing (ensuring the code works
when pac/bti is not nop).

- asm code needs manual marking.

GNU_PROPERTY note in asm is ugly and error prone, see
https://github.com/ARM-software/optimized-routines/blob/master/string/aarch64/asmdefs.h#L23

i.e. no equivalent to -Wa,--noexecstack we use for GNU_STACK
(this was an oversight, llvm added an option, binutils gas has none).

- if asm code does indirect tailcall it should use x16 or x17.

bti c is compatible with the indirect branch that way.

- only functions that *may be called indirectly* need bti c.

this turned out to be trickier than expected: lld/llvm and bfd.ld/gcc
currently disagree about the definition in case of linker inserted
veneers: gcc assumes ld handles the case if an inserted veneer
indirectly branches to a non-bti location, llvm emits bti in all
functions (including local ones without their address taken), just
in case ld inserts an indirect veneer.
https://github.com/ARM-software/abi-aa/issues/196

there is some difference to how PLTs are emitted between the
linkers, but i don't think that causes compat issues (might cause
trouble for tools that try to interpret the PLT).

- dynamic linker has to figure out when to enable it.

systemd MDWE (memory deny write exec) feature used to seccomp
filter mprotect(PROT_EXEC) so even if the underlying mapping was
already PROT_EXEC and it just added a PROT_BTI on top, mprotect
would fail. this was fixed by adding an MDWE prctl to linux for
systemd to make it stop using that filter, but there may be other
similar seccomp filters and old kernels without MDWE so glibc
re-mmaps the exec segment (which systemd happily accepts).

note: the bit that tells if a load segment needs to be mapped as
PROT_BTI is in the load segment (usually program headers are in
the executable segment) so first mmap cannot get it right, unless
a quick read of the prog headers is done before mmap, but that has
a lot of failure modes (the size can be unbounded) and does not
gain much compared to just mmap twice (except when the exe is mapped
by the kernel: then we cannot mmap since there is no fd, and mprotect
may fail).
MDWE prctl: https://lwn.net/Articles/937315/

another detail is that static-exe / ld.so / vdso marking is handled
by the kernel, while other dsos are PROT_BTI marked by ld.so. an
interesting case is dynamic linked exe which was originally handled
by ld.so, but after the MDWE fiasco the kernel started loading it
with PROT_BTI (i.e. now all binaries mapped by the kernel are BTI
protected by the kernel).

ld.so should also take care to gracefully handle BTI protection
failure (or invalid notes) in dlopen. (although invalid note is
more of an x86 thing.)
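
a rough sketch of the note check itself, assuming the input is the raw
contents of a PT_GNU_PROPERTY segment from an ELF64 object in host byte
order. the function names (rd32, note_has_bti) are made up; the constants are
the ones from the aarch64 ELF ABI. anything malformed is treated as "no bti"
rather than a load failure, which is one possible reading of "gracefully".

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define NT_GNU_PROPERTY_TYPE_0              5
#define GNU_PROPERTY_AARCH64_FEATURE_1_AND  0xc0000000u
#define GNU_PROPERTY_AARCH64_FEATURE_1_BTI  (1u << 0)

/* Read a 32-bit word in host byte order from a possibly unaligned pointer. */
static uint32_t rd32(const unsigned char *p)
{
	uint32_t v;
	memcpy(&v, p, sizeof v);
	return v;
}

/* Returns 1 if the note marks the object BTI-compatible, 0 otherwise.
 * Malformed or truncated notes yield 0 instead of an error. */
static int note_has_bti(const unsigned char *note, size_t len)
{
	if (len < 16) return 0;                 /* Nhdr (12 bytes) + "GNU\0" */
	uint32_t namesz = rd32(note), descsz = rd32(note + 4);
	if (namesz != 4 || rd32(note + 8) != NT_GNU_PROPERTY_TYPE_0) return 0;
	if (memcmp(note + 12, "GNU", 4) != 0) return 0;
	if (descsz > len - 16) return 0;

	const unsigned char *p = note + 16, *end = p + descsz;
	while ((size_t)(end - p) >= 8) {
		uint32_t type = rd32(p), datasz = rd32(p + 4);
		p += 8;
		if (datasz > (size_t)(end - p)) return 0;
		if (type == GNU_PROPERTY_AARCH64_FEATURE_1_AND && datasz >= 4)
			return !!(rd32(p) & GNU_PROPERTY_AARCH64_FEATURE_1_BTI);
		/* ELF64: each property is padded to an 8-byte boundary */
		size_t step = ((size_t)datasz + 7) & ~(size_t)7;
		if (step > (size_t)(end - p)) return 0;
		p += step;
	}
	return 0;
}
```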

- special functions may need to return indirectly instead of ret

the only really nasty one is swapcontext which is not supported by
musl so that's good (and it is only a problem if bti is used with
a shadow-stack-like feature).

for returns_twice functions (like setjmp) the compiler emits bti j
at the call site so the second return can use an indirect branch.
return from unwinder to an exception handling landing pad via
indirect branch is supported too. otherwise return via indirect
branch is not supported (no bti j at call sites).

- a64fx (hpc core) implemented hint nops in a slow way

so glibc only adds bti if glibc is configured for bti.
https://sourceware.org/pipermail/libc-alpha/2021-May/125784.html

> > > > I think it's a little more elaborate than that. Those asm instructions
> > > > need to be added (probably as .instr or .word or something, unless
> > > > there's a way to spell this particular nop that existing tooling will
> > > > understand).
> > 
> > You just use the hint <immediate> instructions, they are understood by old
> > toolchains. But you can only support a subset of the BTI/PAC instructions

yes 'hint <imm>' is armv8.0-a, part of the base isa.

i think the non-hint pac instructions are not relevant to musl.

> > but it's been enough for most projects that follow the normal ABI conventions
> > like OpenSSL/BoringSSL,etc, but not enough for libffi for example.

as far as i know openssl is the reason android does not enable pac:
they added the hint instructions incorrectly at first so there are
binaries that fail if the hw enables pac.

this reveals another issue with pac/bti: since they are nops on most
existing hw, they are not properly tested (e.g. by distro QA) so a
binary can look ok until you move it to a newer machine. (but recent
amazon graviton 4 has pac/bti so we may get more coverage soon.)

> > > > I also wondered if [sig]setjmp/longjmp would be affected, but probably
> > > > not.
> > >
> > > bionic does use PAC, but i think glibc has its own "pointer mangling" thing?
> > 
> > You need it, as the first instruction from a branch (where longjmp returns to)
> > needs to be a BTI instruction.
> 
> Is that different from a normal function return?
> 
> Note that in the case of sigsetjmp, (sig)longjmp returns to a point
> inside the sigsetjmp asm, so that point needs the annotation I think.

setjmp/sigsetjmp has to decide how to protect the longjmp return.

with a shadow-stack-like bw-edge-cfi, longjmp cannot return with
ret (first ret from setjmp consumes the return address from the
shadow stack), it must use indirect branch. (there is now gcs
which is aarch64 shadow stack, linux support is in progress).

since jmpbuf does not expose the return address representation a
libc specific protection can be applied (mangling or pac) and then
longjmp can use ret and remain protected. (but e.g. setcontext
exposes the pc/lr so those cannot be mangled in memory).

compilers emit bti j at the call site of returns_twice functions
to allow both ret and indirect branch. musl sigsetjmp can decide
what it does.

> > > > > It's been done for many projects, glibc and bionic have it. The
> > > > > problem with BTI is that when one item in the link
> > > > > list doesn't support BTI the loader/linker turns it off. So when it's
> > > > > something like a libc that is fundamental in the link chain,
> > > > > it turns it off for everything.
> > > >
> > > > This presumably requires some kind of machinery for how dynamic
> > > > linking will work, and possibly turning it off if a library without it
> > > > is dlopened?
> > > >
> > > > My understanding doing some brief searches though was that you can
> > > > individually mprotect it off in certain regions. So maybe it's
> > > > possible to just enable only for DSOs that support it?
> > >
> > > correct.
> 
> OK, that's good to know. So which direction is it? Do DSOs that
> support BTI need it explicitly turned on via mprotect/mmap flags? Or
> is there some process-global flag to turn it on, and then ones that
> don't support it need it turned off?

per dso marking and explicit PROT_BTI setting.

> I suspect it's possible to first enable BTI for third-party libraries
> as a feature of the dynamic linker, and add BTI support for libc
> itself as a separate thing. That might be a nice factoring to make
> changes minimal and easy for ppl to read.

for third-party, it is enough to fix crt*.o and handle markings
in ld.so.

(but of course for bti to be effective the libc must have it too,
otherwise there are likely enough gadgets to do whatever)

> The changes in dynlink.c should be as arch-agnostic as possible. If
> there's a corresponding feature on other archs, it should use the same
> basic code, with arch-specific headers (arch/$ARCH/reloc.h) defining
> the mechanisms for evaluating if an ELF file is compatible, how to do
> the mprotect, etc.

generic code should work.

but e.g. see above about who handles the marking (kernel vs ld.so),
turns out x86 (and likely aarch64, riscv) shadow stack uses different
rules: always libc handles the marking. so there are caveats.

> > > > > The initial scope of code changes would be what's reported when
> > > > > LDFLAGS=-Wl,-zforce-bti,--fatal-warnings
> > > >
> > > > Is there a way to disable these warnings so that every asm file does
> > > > not need to be cluttered with annotations?
> > >
> > > well, that's the ELF note stuff i was talking about, and if you don't
> > > have it you'll fall foul of the static linker saying "not all this
> > > code is BTI-enabled, therefore this .so isn't", and the dynamic linker
> > > doing nothing because the static linker effectively tells it not to.
> > 
> > Yep, well said ENH. It's been since Android since we crossed paths :-).
> > 
> > It's not that hard to annotate an asm file :-p I forget what project
> > (I think it was gnutls, but they just use openssl's code for the asm)
> > but I just put it in a header file and by virtue of #include'ing it you get the
> > notes added.
> 
> Yes, we generally don't do that. There are no "asm headers" in musl;
> all asm files are self-contained and readable standalone. So if
> there's no way to tell the assembler/linker from the command line that
> files are BTI-compatible without generating a huge load of warning
> spam, I guess it's a mess of copy-and-paste...

currently there is only the ugly asm directives, see above.

final notes:

bti is fairly deployable (iirc x86 ibt failed because it is not per
dso, so dlopen does not really work, but aarch64 does not have that
problem), not strong security (the final binary is littered with
bti j/c so plenty opportunity to misdirect an indirect jump/call),
but at least it has minimal impact (minimal compatibility issues and
on a modern core even when bti is enabled it should not be slower).

for pac every function can independently decide if it uses pac-ret
(aka return address signing), no need for per dso marking. however
it has bigger compat as well as performance impact:

there are custom unwinders, pac-ret uses a new dwarf cfi (to mark
the code regions where the return address is signed), custom
unwinders may not understand this and that's a runtime crash.

some code looks at or modifies the return address (various hacks), such
code needs to be updated or not use pac-ret in relevant functions, but
such issues are hard to discover without hw.

pac is a per-boot system-wide setting, not per-process, so if there is
any issue or bug there is no way to disable it for one broken binary
(nowadays there is a disable prctl, but it is documented to slow the
system down, so not suitable for working around perf issues).

on simple cores pac can be slow (can add latency to non-leaf functions)

with 48bit va space, there is only 7bit pac, i.e. limited protection.
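
the arithmetic behind that number, as a sketch: bit 55 selects the address
half and is not part of the pac, and with top-byte-ignore (tbi) bits 63:56 are
not available either, so the pac gets whatever lies between the va size and
bit 55. the helper name pac_bits is made up, and that userspace runs with tbi
enabled is an assumption here.

```c
/* Number of pointer bits available for the PAC.
 * va_bits: configured virtual address size (e.g. 48)
 * tbi:     nonzero if the top byte (bits 63:56) is ignored/reserved */
static inline int pac_bits(int va_bits, int tbi)
{
	/* bits va_bits..54 are always usable; bits 56..63 only without TBI */
	return (tbi ? 55 : 63) - va_bits;
}
```

so pac_bits(48, 1) gives the 7 bits mentioned above, and the same address
size without tbi would give 15.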


* Re: [musl] PAC/BTI Support on aarch64
  2024-02-15  0:03             ` Szabolcs Nagy
@ 2024-02-15  0:22               ` enh
  2024-02-15  9:18                 ` Szabolcs Nagy
  0 siblings, 1 reply; 25+ messages in thread
From: enh @ 2024-02-15  0:22 UTC (permalink / raw)
  To: Rich Felker, William Roberts, enh, musl

On Wed, Feb 14, 2024 at 4:03 PM Szabolcs Nagy <nsz@port70.net> wrote:
>
> * Rich Felker <dalias@libc.org> [2024-02-12 21:08:34 -0500]:
> > On Mon, Feb 12, 2024 at 05:18:22PM -0600, William Roberts wrote:
> > > On Mon, Feb 12, 2024 at 5:05 PM enh <enh@google.com> wrote:
> > > > On Mon, Feb 12, 2024 at 2:46 PM Rich Felker <dalias@libc.org> wrote:
> > > > > On Mon, Feb 12, 2024 at 03:25:48PM -0600, William Roberts wrote:
> > > > > > It's a matter of building with -mbranch-protection=standard
> > > > > >
> > > > > > Just the ASM labels need the first instruction to be a BTI. They're in
> > > > > > the NOP space
> > > > > > so they are backwards compatible, older hardware will just NOP it.
>
> not quite that simple. sorry long brain dump follows:
>
> tl;dr: i think the main issues are asm handling (property notes
> and cfi for pac-ret), property note handling in ld.so, perf
> overhead and possible compat issues of pac-ret (not possible
> to disable per process) and testing (ensuring the code works
> when pac/bti is not nop).
>
> - asm code needs manual marking.
>
> GNU_PROPERTY note in asm is ugly and error prone, see
> https://github.com/ARM-software/optimized-routines/blob/master/string/aarch64/asmdefs.h#L23
>
> i.e. no equivalent to -Wa,--noexecstack we use for GNU_STACK
> (this was an oversight, llvm added an option, binutils gas has none).

what's the option? (since Android only supports llvm, that might be
worth considering as a slight cleanup for us...)

> - if asm code does indirect tailcall it should use x16 or x17.
>
> bti c is compatible with the indirect branch that way.
>
> - only functions that *may be called indirectly* need bti c.
>
> this turned out to be trickier than expected: lld/llvm and bfd.ld/gcc
> currently disagree about the definition in case of linker inserted
> veneers: gcc assumes ld handles the case if an inserted veneer
> indirectly branches to a non-bti location, llvm emits bti in all
> functions (including local ones without their address taken), just
> in case ld inserts an indirect veneer.
> https://github.com/ARM-software/abi-aa/issues/196
>
> there is some difference to how PLTs are emitted between the
> linkers, but i don't think that causes compat issues (might cause
> trouble for tools that try to interpret the PLT).
>
> - dynamic linker has to figure out when to enable it.
>
> systemd MDWE (memory deny write exec) feature used to seccomp
> filter mprotect(PROT_EXEC) so even if the underlying mapping was
> already PROT_EXEC and it just added a PROT_BTI on top, mprotect
> would fail. this was fixed by adding an MDWE prctl to linux for
> systemd to make it stop using that filter, but there may be other
> similar seccomp filters and old kernels without MDWE so glibc
> re-mmaps the exec segment (which systemd happily accepts).
>
> note: the bit that tells if a load segment needs to be mapped as
> PROT_BTI is in the load segment (usually program headers are in
> the executable segment) so first mmap cannot get it right, unless
> a quick read of the prog headers is done before mmap but that has
> a lot of failure modes (the size can be unbounded) and does not
> gain much compared to just mmap twice (except when the exe is mapped
> by the kernel, then we cannot mmap since there is no fd and mprotect
> may fail).
> MDWE prctl: https://lwn.net/Articles/937315/
>
> another detail is that static-exe / ld.so / vdso marking is handled
> by the kernel, while other dsos are PROT_BTI marked by ld.so. an
> interesting case is dynamic linked exe which was originally handled
> by ld.so, but after the MDWE fiasco the kernel started loading it
> with PROT_BTI (i.e. now all binaries mapped by the kernel are BTI
> protected by the kernel).
>
> ld.so should also take care to gracefully handle BTI protection
> failure (or invalid notes) in dlopen. (although invalid note is
> more of an x86 thing.)
>
> - special functions may need to return indirectly instead of ret
>
> the only really nasty one is swapcontext which is not supported by
> musl so that's good (and it is only a problem if bti is used with
> a shadow-stack-like feature).
>
> for returns_twice functions (like setjmp) the compiler emits bti j
> at the call site so the second return can use an indirect branch.
> return from unwinder to an exception handling landing pad via
> indirect branch is supported too. otherwise return via indirect
> branch is not supported (no bti j at call sites).
>
> - a64fx (hpc core) implemented hint nops in a slow way
>
> so glibc only adds bti if glibc is configured for bti.
> https://sourceware.org/pipermail/libc-alpha/2021-May/125784.html
>
> > > > > I think it's a little more elaborate than that. Those asm instructions
> > > > > need to be added (probably as .instr or .word or something, unless
> > > > > there's a way to spell this particular nop that existing tooling will
> > > > > understand).
> > >
> > > You just use the hint <immediate> instructions, they are understood by old
> > > toolchains. But you can only support a subset of the BTI/PAC instructions
>
> yes 'hint <imm>' is armv8.0-a, part of the base isa.
>
> i think the non-hint pac instructions are not relevant to musl.
>
> > > but it's been enough for most projects that follow the normal ABI conventions
> > > like OpenSSL/BoringSSL,etc, but not enough for libffi for example.
>
> as far as i know openssl is the reason android does not enable pac:
> they added the hint instructions incorrectly at first so there are
> binaries that fail if the hw enables pac.

that was one reason why "android does not enable pac" _by default in
the android target triples for app developers_, yes --- though i think
we're at the point where we think we should flip that default (not
least because the number of users whose devices would actually
_benefit_ from the extra instructions is a lot larger now!):
https://github.com/android/ndk/issues/1914

> this reveals another issue with pac/bti: since they are nops on most
> existing hw, they are not properly tested (e.g. by distro QA) so a
> binary can look ok until you move it to a newer machine. (but recent
> amazon graviton 4 has pac/bti so we may get more coverage soon.)

exactly --- that was one of our key concerns, that app developers
would _think_ they've tested with pac/bti but unknowingly used a
device without. (or even an x86-64 emulator!)
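
One way a test harness could at least detect that it is running on hardware where pac/bti are not nops is to look at the hwcaps. The following is only a sketch: the function name and the CPU_HAS_* flags are made up, and the bit values are the ones from the arm64 Linux uapi headers, repeated here so that the code also compiles elsewhere.

```c
#if defined(__linux__)
#include <sys/auxv.h>
#endif

/* Values from arm64 Linux <asm/hwcap.h>; only defined if missing. */
#ifndef HWCAP_PACA
#define HWCAP_PACA (1UL << 30)
#endif
#ifndef HWCAP2_BTI
#define HWCAP2_BTI (1UL << 17)
#endif

#define CPU_HAS_PAC 1
#define CPU_HAS_BTI 2

/* Returns a mask of CPU_HAS_* flags; always 0 when not on arm64 Linux. */
static int pac_bti_support(void)
{
#if defined(__aarch64__) && defined(__linux__)
    int r = 0;
    if (getauxval(AT_HWCAP) & HWCAP_PACA)
        r |= CPU_HAS_PAC;
    if (getauxval(AT_HWCAP2) & HWCAP2_BTI)
        r |= CPU_HAS_BTI;
    return r;
#else
    return 0;
#endif
}
```

A test suite could refuse to report "tested with pac/bti" unless both flags are set.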

> > > > > I also wondered if [sig]setjmp/longjmp would be affected, but probably
> > > > > not.
> > > >
> > > > bionic does use PAC, but i think glibc has its own "pointer mangling" thing?
> > >
> > > You need it, as the first instruction from a branch (where longjmp returns to)
> > > needs to be a BTI instruction.
> >
> > Is that different from a normal function return?
> >
> > Note that in the case of sigsetjmp, (sig)longjmp returns to a point
> > inside the sigsetjmp asm, so that point needs the annotation I think.
>
> setjmp/sigsetjmp has to decide how to protect the longjmp return.
>
> with a shadow-stack-like bw-edge-cfi, longjmp cannot return with
> ret (first ret from setjmp consumes the return address from the
> shadow stack), it must use indirect branch. (there is now gcs
> which is aarch64 shadow stack, linux support is in progress).
>
> since jmpbuf does not expose the return address representation a
> libc specific protection can be applied (mangling or pac) and then
> longjmp can use ret and remain protected. (but e.g. setcontext
> exposes the pc/lr so those cannot be mangled in memory).
>
> compilers emit bti j at the call site of returns_twice functions
> to allow both ret and indirect branch. musl sigsetjmp can decide
> what it does.
>
> > > > > > It's been done for many projects, glibc and bionic have it. The
> > > > > > problem with BTI is that when one item in the link
> > > > > > list doesn't support BTI the loader/linker turns it off. So when it's
> > > > > > something like a libc that is fundamental in the link chain,
> > > > > > it turns it off for everything.
> > > > >
> > > > > This presumably requires some kind of machinery for how dynamic
> > > > > linking will work, and possibly turning it off if a library without it
> > > > > is dlopened?
> > > > >
> > > > > My understanding doing some brief searches though was that you can
> > > > > individually mprotect it off in certain regions. So maybe it's
> > > > > possible to just enable only for DSOs that support it?
> > > >
> > > > correct.
> >
> > OK, that's good to know. So which direction is it? Do DSOs that
> > support BTI need it explicitly turned on via mprotect/mmap flags? Or
> > is there some process-global flag to turn it on, and then ones that
> > don't support it need it turned off?
>
> per dso marking and explicit PROT_BTI setting.
>
> > I suspect it's possible to first enable BTI for third-party libraries
> > as a feature of the dynamic linker, and add BTI support for libc
> > itself as a separate thing. That might be a nice factoring to make
> > changes minimal and easy for ppl to read.
>
> for third-party, it is enough to fix crt*.o and handle markings
> in ld.so.
>
> (but of course for bti to be effective the libc must have it too,
> otherwise there are likely enough gadgets to do whatever)
>
> > The changes in dynlink.c should be as arch-agnostic as possible. If
> > there's a corresponding feature on other archs, it should use the same
> > basic code, with arch-specific headers (arch/$ARCH/reloc.h) defining
> > the mechanisms for evaluating if an ELF file is compatible, how to do
> > the mprotect, etc.
>
> generic code should work.
>
> but e.g. see above about who handles the marking (kernel vs ld.so),
> turns out x86 (and likely aarch64, riscv) shadow stack uses different
> rules: always libc handles the marking. so there are caveats.
>
> > > > > > The initial scope of code changes would be what's reported when
> > > > > > LDFLAGS=-Wl,-zforce-bti,--fatal-warnings
> > > > >
> > > > > Is there a way to disable these warnings so that every asm file does
> > > > > not need to be cluttered with annotations?
> > > >
> > > > well, that's the ELF note stuff i was talking about, and if you don't
> > > > have it you'll fall foul of the static linker saying "not all this
> > > > code is BTI-enabled, therefore this .so isn't", and the dynamic linker
> > > > doing nothing because the static linker effectively tells it not to.
> > >
> > > Yep, well said ENH. It's been since Android since we crossed paths :-).
> > >
> > > It's not that hard to annotate an asm file :-p I forget what project
> > > (I think it was gnutls, but they just use openssl's code for the asm)
> > > but I just put it in a header file and by virtue of #include'ing it you get the
> > > notes added.
> >
> > Yes, we generally don't do that. There are no "asm headers" in musl;
> > all asm files are self-contained and readable standalone. So if
> > there's no way to tell the assembler/linker from the command line that
> > files are BTI-compatible without generating a huge load of warning
> > spam, I guess it's a mess of copy-and-paste...
>
> currently there is only the ugly asm directives, see above.
>
> final notes:
>
> bti is fairly deployable (iirc x86 ibt failed because it is not per
> dso, so dlopen does not really work, but aarch64 does not have that
> problem), not strong security (the final binary is littered with
> bti j/c so plenty opportunity to misdirect an indirect jump/call),
> but at least it has minimal impact (minimal compatibility issues and
> on a modern core even when bti is enabled it should not be slower).
>
> for pac every function can independently decide if it uses pac-ret
> (aka return address signing), no need for per dso marking. however
> it has bigger compat as well as performance impact:
>
> there are custom unwinders, pac-ret uses a new dwarf cfi (to mark
> the code regions where the return address is signed), custom
> unwinders may not understand this and that's a runtime crash.
>
> some code looks at or modifies the return address (various hacks), such
> code needs to be updated or not use pac-ret in relevant functions, but
> such issues are hard to discover without hw.
>
> pac is a per-boot system-wide setting, not per-process, so if there is
> any issue or bug there is no way to disable it for one broken binary
> (nowadays there is a disable prctl, but it is documented to slow the
> system down, so not suitable for working around perf issues).
>
> on simple cores pac can be slow (can add latency to non-leaf functions)
>
> with 48bit va space, there is only 7bit pac, i.e. limited protection.


* Re: [musl] PAC/BTI Support on aarch64
  2024-02-15  0:22               ` enh
@ 2024-02-15  9:18                 ` Szabolcs Nagy
  0 siblings, 0 replies; 25+ messages in thread
From: Szabolcs Nagy @ 2024-02-15  9:18 UTC (permalink / raw)
  To: enh; +Cc: Rich Felker, William Roberts, musl

* enh <enh@google.com> [2024-02-14 16:22:05 -0800]:
> On Wed, Feb 14, 2024 at 4:03 PM Szabolcs Nagy <nsz@port70.net> wrote:
> > i.e. no equivalent to -Wa,--noexecstack we use for GNU_STACK
> > (this was an oversight, llvm added an option, binutils gas has none).
> 
> what's the option? (since Android only supports llvm, that might be
> worth considering as a slight cleanup for us...)

-mmark-bti-property
https://releases.llvm.org/16.0.0/tools/clang/docs/ClangCommandLineReference.html#cmdoption-clang-mmark-bti-property
https://reviews.llvm.org/D81930
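
for reference, what the option (or the manual asm directives) ends up producing is a .note.gnu.property note carrying the GNU_PROPERTY_AARCH64_FEATURE_1_AND property. a sketch of the layout and of how a loader could read it is below; the function names and the example_note array are made up for illustration, the constants are the ones from the aarch64 elf abi, and little-endian elf64 is assumed:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define NT_GNU_PROPERTY_TYPE_0              5
#define GNU_PROPERTY_AARCH64_FEATURE_1_AND  0xc0000000u
#define GNU_PROPERTY_AARCH64_FEATURE_1_BTI  1u
#define GNU_PROPERTY_AARCH64_FEATURE_1_PAC  2u

/* The note a BTI-marked object carries (little-endian). */
static const unsigned char example_note[32] = {
    4, 0, 0, 0,             /* namesz                          */
    16, 0, 0, 0,            /* descsz                          */
    5, 0, 0, 0,             /* type: NT_GNU_PROPERTY_TYPE_0    */
    'G', 'N', 'U', 0,       /* name                            */
    0, 0, 0, 0xc0,          /* pr_type: FEATURE_1_AND          */
    4, 0, 0, 0,             /* pr_datasz                       */
    1, 0, 0, 0,             /* data: FEATURE_1_BTI             */
    0, 0, 0, 0              /* padding to 8 bytes              */
};

static uint32_t rd32le(const unsigned char *p)
{
    return p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* Returns the FEATURE_1_AND bits of one note, or 0 if absent/invalid. */
static uint32_t aarch64_feature_1(const unsigned char *note, size_t len)
{
    if (len < 16)
        return 0;
    uint32_t namesz = rd32le(note), descsz = rd32le(note + 4);
    if (namesz != 4 || rd32le(note + 8) != NT_GNU_PROPERTY_TYPE_0
        || memcmp(note + 12, "GNU", 4) || descsz > len - 16)
        return 0;
    const unsigned char *p = note + 16, *end = p + descsz;
    while (end - p >= 8) {
        uint32_t type = rd32le(p), datasz = rd32le(p + 4);
        p += 8;
        if (datasz > (size_t)(end - p))
            return 0;
        if (type == GNU_PROPERTY_AARCH64_FEATURE_1_AND && datasz == 4)
            return rd32le(p);
        /* properties are padded to 8 bytes in ELF64 */
        size_t step = ((size_t)datasz + 7) & ~(size_t)7;
        if (step > (size_t)(end - p))
            break;
        p += step;
    }
    return 0;
}
```

a dso would be treated as bti-compatible only if the BTI bit is set in the returned value.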

> > as far as i know openssl is the reason android does not enable pac:
> > they added the hint instructions incorrectly at first so there are
> > binaries that fail if the hw enables pac.
> 
> that was one reason why "android does not enable pac" _by default in
> the android target triples for app developers_, yes --- though i think
> we're at the point where we think we should flip that default (not
> least because the number of users whose devices would actually
> _benefit_ from the extra instructions is a lot larger now!):
> https://github.com/android/ndk/issues/1914

i see


* Re: [musl] PAC/BTI Support on aarch64
  2024-02-14  2:19                 ` Rich Felker
  2024-02-14  3:19                   ` William Roberts
  2024-02-14  4:44                   ` Markus Wichmann
@ 2024-02-15 13:29                   ` Stefan O'Rear
  2024-02-15 14:06                     ` Rich Felker
  2 siblings, 1 reply; 25+ messages in thread
From: Stefan O'Rear @ 2024-02-15 13:29 UTC (permalink / raw)
  To: Rich Felker, musl, Markus Wichmann; +Cc: enh

On Tue, Feb 13, 2024, at 9:19 PM, Rich Felker wrote:
> What is the situation on x86? Does it use the same kind of per-page
> enforcement mode, or is it only global, requiring disabling it if any
> DSO lacks support? Is the endbr64 opcode a guaranteed-safe nop on
> older ISA levels, or does it need to be conditional?

The situation for hardware control flow hardening on risc-v is two
in-development extensions:

Zicfilp (landing pads) provides a 4-byte instruction which marks valid
targets for indirect jumps and calls, written `lpad LABEL`.  This is
an *architectural NOP at all ISA levels*.  Enforcement is
process-global, not per-page.

Indirect jumps can be exempted from landing pad checks depending on which
register is used for the address; this is expected to be used if the
address is obtained from read-only memory or an auipc instruction, so
jump tables do not use landing pads, nor are landing pads needed after
direct calls regardless of length.  A function which is not a visible
symbol and does not have its address taken does not need a landing pad.

The ABI function return is a member of the set of indirect jumps
which bypass landing pad checks, so no landing pads are needed at the
return sites of ABI function calls.  Zicfilp intentionally does not
provide any protection against ROP, a different extension must be used
to protect return addresses.

Landing pads have a 20-bit label which is expected to be used for a
function type signature, catching function type confusion events.
The hashing scheme used to generate the label from the call signature
has not yet been decided.  The call signature must be placed in the
x7/t2 register prior to an indirect jump.  The immediate layout is
such that indirect jump sites can use a single lui instruction with
a matching 20-bit immediate.  Landing pads do not check x7/t2 if
reached by a direct jump, so there is no need to initialize it prior
to a direct jump.  A `lpad 0` matches any incoming type signature.
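
A small model of the label matching described above, as a sketch only. It assumes the draft encoding in which `lpad` reuses the auipc opcode with rd = x0, so that its 20-bit label sits in the same bits as the immediate of `lui x7, label`; the extension is still in development and this may change. All function names are made up for illustration.

```c
#include <stdint.h>

/* "lui x7, label": U-type, opcode 0x37, rd = 7 (x7/t2). */
static uint32_t enc_lui_t2(uint32_t label)
{
    return ((label & 0xfffff) << 12) | (7u << 7) | 0x37;
}

/* "lpad label": assumed to be auipc (opcode 0x17) with rd = x0. */
static uint32_t enc_lpad(uint32_t label)
{
    return ((label & 0xfffff) << 12) | 0x17;
}

/* Check performed when a landing pad is reached by an indirect jump:
 * label 0 accepts anything, otherwise bits 31:12 of x7 must match. */
static int lpad_accepts(uint32_t lpad_insn, uint32_t x7)
{
    uint32_t label = lpad_insn >> 12;
    return label == 0 || label == (x7 >> 12);
}
```

After `lui x7, 0x12345` the register holds 0x12345000, which is accepted by `lpad 0x12345` and by `lpad 0`, and rejected by any other label.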

Zicfiss (shadow stacks) provides a new shadow stack pointer register
and shadow stack memory which cannot be modified using ordinary stores.
Unlike GCS and SHSTK, the shadow stack is never accessed automatically,
"sspush ra" and "sspopchk ra" instructions must be added to the prologue
and epilogue of functions which spill their return address to the stack.
These instructions are NOPs if the shadow stack is disabled at runtime,
but are *not architectural NOPs* and will trap if executed on current
hardware.

Also unlike GCS and SHSTK, the Zicfiss `ssp` register can be read and
written from user mode using dedicated instructions, so no special
mechanism is used for shadow stack switching.
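
As a purely software model of what the explicit instructions do (not real Zicfiss code; the struct, the function names and the fixed depth are made up for illustration), the prologue/epilogue pairing looks like this:

```c
#include <stdint.h>
#include <stddef.h>

#define SS_DEPTH 64

struct shadow_stack {
    uintptr_t slot[SS_DEPTH];
    size_t top;             /* number of saved return addresses */
};

/* Prologue: models "sspush ra". */
static int ss_push(struct shadow_stack *ss, uintptr_t ra)
{
    if (ss->top == SS_DEPTH)
        return -1;
    ss->slot[ss->top++] = ra;
    return 0;
}

/* Epilogue: models "sspopchk ra"; a nonzero result stands for the
 * fault the hardware would raise on a mismatch. */
static int ss_popchk(struct shadow_stack *ss, uintptr_t ra)
{
    if (ss->top == 0)
        return -1;
    return ss->slot[--ss->top] == ra ? 0 : -1;
}

/* One call frame: ra is saved on entry, and ra_at_exit is whatever
 * was reloaded from the ordinary (writable) stack before returning. */
static int ss_demo(uintptr_t ra, uintptr_t ra_at_exit)
{
    struct shadow_stack ss = { {0}, 0 };
    if (ss_push(&ss, ra))
        return -1;
    return ss_popchk(&ss, ra_at_exit);
}
```

A leaf function that never spills ra needs neither instruction, which is why only functions that save their return address get the extra prologue and epilogue.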

To my knowledge, nothing analogous to PAC is under development.

Both shadow stacks and landing pads are enabled by bits in the senvcfg
register, and are exposed via a prctl.  The shadow stack prctl is being
developed as an architecture-independent API, which provides some form
of automatic allocation and deallocation of shadow stacks for threads.
I believe the current strategy for marking CFI support in binaries is
an ELF note similar to the x86 approach, but have not checked this part
in detail.

-s


* Re: [musl] PAC/BTI Support on aarch64
  2024-02-15 13:29                   ` Stefan O'Rear
@ 2024-02-15 14:06                     ` Rich Felker
  2024-03-02 14:33                       ` Szabolcs Nagy
  0 siblings, 1 reply; 25+ messages in thread
From: Rich Felker @ 2024-02-15 14:06 UTC (permalink / raw)
  To: Stefan O'Rear; +Cc: musl, Markus Wichmann, enh

On Thu, Feb 15, 2024 at 08:29:15AM -0500, Stefan O'Rear wrote:
> On Tue, Feb 13, 2024, at 9:19 PM, Rich Felker wrote:
> > What is the situation on x86? Does it use the same kind of per-page
> > enforcement mode, or is it only global, requiring disabling it if any
> > DSO lacks support? Is the endbr64 opcode a guaranteed-safe nop on
> > older ISA levels, or does it need to be conditional?
> 
> The situation for hardware control flow hardening on risc-v is two
> in-development extensions:
> 
> Zicfilp (landing pads) provides a 4-byte instruction which marks valid
> targets for indirect jumps and calls, written `lpad LABEL`.  This is
> an *architectural NOP at all ISA levels*.  Enforcement is
> process-global, not per-page.
> 
> Indirect jumps can be exempted from landing pad depending on which
> register is used for the address; this is expected to be used if the
> address is obtained from read-only memory or an auipc instruction, so
> jump tables do not use landing pads, nor are landing pads needed after
> direct calls regardless of length.  A function which is not a visible
> symbol and does not have its address taken does not need a landing pad.
> 
> The ABI function return is a member of the set of indirect jumps
> which bypass landing pad checks, so no landing pads are needed at the
> return sites of ABI function calls.  Zicfilp intentionally does not
> provide any protection against ROP, a different extension must be used
> to protect return addresses.

This all sounds very good and reasonable to support.

> Landing pads have a 20-bit label which is expected to be used for a
> function type signature, catching function type confusion events.
> The hashing scheme used to generate the label from the call signature
> has not yet been decided.  The call signature must be placed in the
> x7/t2 register prior to an indirect jump.  The immediate layout is
> such that indirect jump sites can use a single lui instruction with
> a matching 20-bit immediate.  Landing pads do not check x7/t2 if
> reached by a direct jump, so there is no need to initialize it prior
> to a direct jump.  A `lpad 0` matches any incoming type signature.

This is very interesting. I wonder if it will break code with UB like:

https://github.com/systemd/systemd/blob/d0aef638ac43ad64df920d8b3f6c2d835db7643c/src/basic/sort-util.h

It's my belief that it *should* break such code, and that breaking it
would be a feature. But I could see folks making the choice to hash
just the "mechanical" types rather than actual types, and there may be
practical reasons this is what needs to be done.
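
To illustrate the kind of pattern meant here (this is not copied from the systemd header; struct item and the function names are made up), compare calling a typed comparator through a cast with calling through a wrapper that really has the type qsort expects:

```c
#include <stdlib.h>

struct item { int key; };

/* Typed comparator. Its type is
 *   int (*)(const struct item *, const struct item *)
 * which is not the type qsort calls through. */
static int item_cmp(const struct item *a, const struct item *b)
{
    return (a->key > b->key) - (a->key < b->key);
}

/* Wrapper with exactly the type qsort expects. */
static int item_cmp_void(const void *a, const void *b)
{
    return item_cmp(a, b);
}

static void sort_items(struct item *v, size_t n)
{
    /* Undefined behavior, and a label mismatch if landing pad labels
     * are derived from the actual C function types:
     *   qsort(v, n, sizeof *v,
     *         (int (*)(const void *, const void *))item_cmp);
     */
    qsort(v, n, sizeof *v, item_cmp_void);
}
```

If labels were hashed from "mechanical" types only (all object pointers treated alike), both variants would presumably pass the check.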

Note that this also has implications for musl and whether we would
ever be able to redefine some opaque types. In fact, we already have
some types, like pthread_t, which are defined differently in
__cplusplus mode to match a name mangling ABI; these would be badly
broken. I'm not sure what the right fix for that would be. (Doing that
to begin with was almost surely a big mistake.)

> Zicfiss (shadow stacks) provides a new shadow stack pointer register
> and shadow stack memory which cannot be modified using ordinary stores.
> Unlike GCS and SHSTK, the shadow stack is never accessed automatically,
> "sspush ra" and "sspopchk ra" instructions must be added to the prologue
> and epilogue of functions which spill their return address to the stack.
> These instructions are NOPs if the shadow stack is disabled at runtime,
> but are *not architectural NOPs* and will trap if executed on current
> hardware.
> 
> Also unlike GCS and SHSTK, the Zicfiss `ssp` register can be read and
> written from user mode using dedicated instructions, so no special
> mechanism is used for shadow stack switching.
> 
> To my knowledge, nothing analogous to PAC is under development.

This is unfortunate, since PAC seems a lot less invasive and
actually-doable. However, protection equivalent to PAC also seems
possible in software, in an entirely arch-agnostic way, with overhead
only slightly higher than standard SSP... so I'm not sure why we
aren't just pursuing getting compilers to do that rather than chasing
arch-specific anti-ROP hacks vendors are trying to use to
differentiate themselves and remain relevant in the age of open
ISAs...
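
As a toy illustration only of what a software scheme could look like (this is not a concrete proposal from this thread; the function names, the choice of XOR, and the use of the stack pointer as a modifier are all assumptions), the saved return address can be masked with a per-process secret on entry and unmasked before return:

```c
#include <stdint.h>

/* Mask derived from a secret and the frame's stack pointer, loosely
 * mirroring how pac-ret uses sp as the modifier. */
static uintptr_t ra_mask(uintptr_t sp, uintptr_t secret)
{
    return secret ^ sp;
}

/* Prologue: value actually stored in the stack frame. */
static uintptr_t ra_protect(uintptr_t ra, uintptr_t sp, uintptr_t secret)
{
    return ra ^ ra_mask(sp, secret);
}

/* Epilogue: recover the return address from the stored value. */
static uintptr_t ra_restore(uintptr_t stored, uintptr_t sp, uintptr_t secret)
{
    return stored ^ ra_mask(sp, secret);
}
```

Unlike PAC this does not detect tampering; an overwritten slot just decodes to an address the attacker cannot predict without the secret, so it is weaker than an authenticated scheme.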

> Both shadow stacks and landing pads are enabled by bits in the senvcfg
> register, and are exposed via a prctl.  The shadow stack prctl is being
> developed as an architecture-independent API, which provides some form
> of automatic allocation and deallocation of shadow stacks for threads.
> I believe the current strategy for marking CFI support in binaries is
> an ELF note similar to the x86 approach, but have not checked this part
> in detail.

I know this should be written up in more detail, but based on request
on IRC, I think it would be good to go ahead and mention "in public"
on the list:

*** Any API for shadow stacks that involves automatic allocation and
deallocation which can fail "behind the application's back" at runtime
is a very poor candidate for support by musl. ***

To be supported, shadow stacks would probably need to use contiguous
memory (with special protections applied to it for the duration of its
usage as call stack, with automatic end to that status if it's
subsequently accessed with normal loads/stores) with the normal
application-provided stack, so as not to break sigaltstack,
pthread_attr_setstack, makecontext, etc. and not to introduce memory leaks
or conditions under which a behind-the-scenes allocation failure makes
hard program termination the only possible result.

AFAICT the current shadow stack stuff in the kernel (and maybe the
underlying hardware mechanisms) is not usable.

Rich


* Re: [musl] PAC/BTI Support on aarch64
  2024-02-12 18:42 ` Rich Felker
  2024-02-12 21:25   ` William Roberts
@ 2024-02-19 23:54   ` Fangrui Song
       [not found]   ` <DS7PR12MB57659BC5D5536574D1B91D26CB502@DS7PR12MB5765.namprd12.prod.outlook.com>
  2 siblings, 0 replies; 25+ messages in thread
From: Fangrui Song @ 2024-02-19 23:54 UTC (permalink / raw)
  To: musl; +Cc: William Roberts, Anton Korobeynikov

On Mon, Feb 12, 2024 at 10:42 AM Rich Felker <dalias@libc.org> wrote:
>
> On Mon, Feb 12, 2024 at 10:38:50AM -0600, William Roberts wrote:
> > Hello,
> >
> > I was just wondering if there was any work being done to support PAC
> > and BTI in aarch64? I could add support but didn't want to duplicate
> > the work.
>
> I'm not aware of any active work on this, but before writing a full
> implementation, it would be really helpful to start with a basic
> proposal for the scope of changes needed to make it work to assess
> whether these are manageable and acceptable cost.
>
> Rich

Cc +Anton (other messages of this thread can be found at
https://www.openwall.com/lists/musl/2024/02/12/ ).

Per https://discourse.llvm.org/t/llvm-pointer-authentication-sync-ups/62661/23
and an lld/ELF patch

* https://github.com/access-softek/llvm-project/commits/elf-pauth
* https://github.com/access-softek/musl/tree/dkovalev/pauth-code-drop

contains a prototype.

> We verified that LLVM testsuite compiled with pauth successfully passes on pauth-enabled AArch64 board.

https://www.openwall.com/lists/musl/2024/02/12/

It looks like there will be an LLVM Pointer Authentication discussion
in a few hours:
https://calendar.google.com/calendar/u/0/embed?src=calendar@llvm.org


* Re: [musl] PAC/BTI Support on aarch64
       [not found]   ` <DS7PR12MB57659BC5D5536574D1B91D26CB502@DS7PR12MB5765.namprd12.prod.outlook.com>
@ 2024-02-20  6:21     ` Anton Korobeynikov
  0 siblings, 0 replies; 25+ messages in thread
From: Anton Korobeynikov @ 2024-02-20  6:21 UTC (permalink / raw)
  To: Fangrui Song; +Cc: musl, William Roberts

Thanks Fangrui!

For PAC / BTI no support from the C standard library is required. All
changes are ordinary source code changes and only assembler sources
should contain proper annotations / notes / BTI checks.

The links above are about the pointer authentication ABI (aka "arm64e").
PAC / BTI could be considered part of it, but only a small part.
Over the last few months we have been working on bringing pauth to
ELF-based platforms. Our aim is to have pauth ABI support released
as part of LLVM 19.

That github Access Softek repo is a downstream fork that contains
rebased Apple changes to frontend, intrinsics, etc. and ELF codegen
bits. We are working on upstreaming code from it to LLVM mainline.

For pauth, deeper interaction with the standard library is required, as
the dynamic loader should process pauth relocations and sign pointers as
needed. Plus, some additional handling of the gnu.note segment would
be necessary, as one would need to e.g. prohibit loading of DSOs with an
incompatible ABI. We have a proof-of-concept patch for musl to
process pauth relocations
(https://github.com/access-softek/musl/pull/1). We have not submitted
it to musl upstream, as there are lots of moving pieces and we do not
want to submit something that could still change (e.g. the reloc numbers
have already changed once).

Certainly, pauth support would also require additional code changes to
the assembler sources, as well as ABI marking.

PS: Please CC me on responses as I am not subscribed.


-- 
With best regards, Anton Korobeynikov


* Re: [musl] PAC/BTI Support on aarch64
  2024-02-15 14:06                     ` Rich Felker
@ 2024-03-02 14:33                       ` Szabolcs Nagy
  2024-03-02 14:45                         ` Rich Felker
  0 siblings, 1 reply; 25+ messages in thread
From: Szabolcs Nagy @ 2024-03-02 14:33 UTC (permalink / raw)
  To: Rich Felker; +Cc: Stefan O'Rear, musl, Markus Wichmann, enh

* Rich Felker <dalias@libc.org> [2024-02-15 09:06:40 -0500]:

> On Thu, Feb 15, 2024 at 08:29:15AM -0500, Stefan O'Rear wrote:
> > On Tue, Feb 13, 2024, at 9:19 PM, Rich Felker wrote:
> > > What is the situation on x86? Does it use the same kind of per-page
> > > enforcement mode, or is it only global, requiring disabling it if any
> > > DSO lacks support? Is the endbr64 opcode a guaranteed-safe nop on
> > > older ISA levels, or does it need to be conditional?
> > 
> > The situation for hardware control flow hardening on risc-v is two
> > in-development extensions:
> > 
> > Zicfilp (landing pads) provides a 4-byte instruction which marks valid
> > targets for indirect jumps and calls, written `lpad LABEL`.  This is
> > an *architectural NOP at all ISA levels*.  Enforcement is
> > process-global, not per-page.
> > 
> > Indirect jumps can be exempted from landing pad depending on which
> > register is used for the address; this is expected to be used if the
> > address is obtained from read-only memory or an auipc instruction, so
> > jump tables do not use landing pads, nor are landing pads needed after
> > direct calls regardless of length.  A function which is not a visible
> > symbol and does not have its address taken does not need a landing pad.
> > 
> > The ABI function return is a member of the set of indirect jumps
> > which bypass landing pad checks, so no landing pads are needed at the
> > return sites of ABI function calls.  Zicfilp intentionally does not
> > provide any protection against ROP, a different extension must be used
> > to protect return addresses.
> 
> This all sounds very good and reasonable to support.


a process-global setting is not practical
because legacy code may be dlopened, so libc
cannot decide when to enable the feature.

linux in general only provides a per-thread disable
for such features, which does not help with dlopen.
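
for contrast, the per-dso model on aarch64 amounts to choosing the protection flags per executable segment. a sketch (the function names are made up; PROT_BTI is the arm64 linux value and is only defined here if the header lacks it):

```c
#include <sys/mman.h>
#include <stddef.h>

#ifndef PROT_BTI
#define PROT_BTI 0x10   /* arm64 Linux value, assumed when missing */
#endif

/* Protection for an executable segment of one dso. */
static int exec_prot(int dso_has_bti)
{
    int prot = PROT_READ | PROT_EXEC;
    if (dso_has_bti)
        prot |= PROT_BTI;
    return prot;
}

/* Called after the segment is mapped. Returns 0 on success; legacy
 * dsos are left alone, so they can be dlopened at any time without
 * affecting the marked ones. */
static int protect_text(void *base, size_t len, int dso_has_bti)
{
    if (!dso_has_bti)
        return 0;
    return mprotect(base, len, exec_prot(1));
}
```

a failing mprotect here (e.g. due to a seccomp filter) would need to be handled gracefully, for instance by leaving the segment without bti or by mapping it again with the final flags.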


> > Both shadow stacks and landing pads are enabled by bits in the senvcfg
> > register, and are exposed via a prctl.  The shadow stack prctl is being


* Re: [musl] PAC/BTI Support on aarch64
  2024-03-02 14:33                       ` Szabolcs Nagy
@ 2024-03-02 14:45                         ` Rich Felker
  0 siblings, 0 replies; 25+ messages in thread
From: Rich Felker @ 2024-03-02 14:45 UTC (permalink / raw)
  To: Stefan O'Rear, musl, Markus Wichmann, enh

On Sat, Mar 02, 2024 at 03:33:45PM +0100, Szabolcs Nagy wrote:
> * Rich Felker <dalias@libc.org> [2024-02-15 09:06:40 -0500]:
> 
> > On Thu, Feb 15, 2024 at 08:29:15AM -0500, Stefan O'Rear wrote:
> > > On Tue, Feb 13, 2024, at 9:19 PM, Rich Felker wrote:
> > > > What is the situation on x86? Does it use the same kind of per-page
> > > > enforcement mode, or is it only global, requiring disabling it if any
> > > > DSO lacks support? Is the endbr64 opcode a guaranteed-safe nop on
> > > > older ISA levels, or does it need to be conditional?
> > > 
> > > The situation for hardware control flow hardening on risc-v is two
> > > in-development extensions:
> > > 
> > > Zicfilp (landing pads) provides a 4-byte instruction which marks valid
> > > targets for indirect jumps and calls, written `lpad LABEL`.  This is
> > > an *architectural NOP at all ISA levels*.  Enforcement is
> > > process-global, not per-page.
> > > 
> > > > Indirect jumps can be exempted from landing pad checks depending on which
> > > register is used for the address; this is expected to be used if the
> > > address is obtained from read-only memory or an auipc instruction, so
> > > jump tables do not use landing pads, nor are landing pads needed after
> > > direct calls regardless of length.  A function which is not a visible
> > > symbol and does not have its address taken does not need a landing pad.
> > > 
> > > The ABI function return is a member of the set of indirect jumps
> > > which bypass landing pad checks, so no landing pads are needed at the
> > > return sites of ABI function calls.  Zicfilp intentionally does not
> > > > provide any protection against ROP; a different extension must be used
> > > to protect return addresses.
> > 
> > This all sounds very good and reasonable to support.
> 
> a process-global setting is not practical
> because legacy code may be dlopened, so libc
> cannot decide when to enable the feature.

That's exactly why you need it to be process-global: so that as soon
as you dlopen an incompatible library, all enforcement gets turned off
and everything turns into nops.
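
As an illustration only (not code from this thread or from musl), the
fallback described above could be modeled roughly as follows. The
struct, the flag, and the function names are all hypothetical, and no
real kernel or loader interface is used; this is a toy model of the
policy, assuming each loaded object carries a yes/no marking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a loaded object; not a musl structure. */
struct dso_model {
    const char *name;
    bool has_lpad_marking;   /* true if built with landing pads */
};

/* Hypothetical process-wide state: one flag covers every thread and page. */
static bool enforcement_enabled = true;

/* Model of the dlopen-time check: the first unmarked object turns
 * enforcement off for the whole process. It is never turned back on. */
static void model_dlopen(const struct dso_model *dso)
{
    assert(dso != NULL);
    if (!dso->has_lpad_marking)
        enforcement_enabled = false;
}

/* With enforcement off, the landing pad instruction acts as a nop, so
 * an indirect call into unmarked code no longer faults. */
static bool indirect_call_faults(const struct dso_model *target)
{
    assert(target != NULL);
    return enforcement_enabled && !target->has_lpad_marking;
}
```

In this model, loading a marked object leaves enforcement_enabled
true, while loading a single unmarked one clears it for everything
already in the process.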

> linux in general only provides per thread disable
> for such features which does not help with dlopen.

Indeed this is a problem. The kernel needs to provide a way to make
sure none of the special instructions, which may still be pending (and
blocked by arbitrarily many interrupting stack frames), fault if
executed after disabling. In theory there are horrible ways userspace
could do this if we wrapped signal handlers and patched things up at
every signal return (to restart any interrupted critical section), but
that kind of invasiveness is not worth it to support shadow stacks.
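
As an illustration only (not code from this thread), here is a toy
model of why a per-thread disable does not help with dlopen: the
thread that calls dlopen can opt itself out, but the other threads
keep enforcing. The array, the thread count, and the function names
are hypothetical, and no real prctl or threading call is made.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NTHREADS 3

/* Hypothetical per-thread enforcement flags, one slot per thread. */
static bool thread_enforcing[MODEL_NTHREADS] = { true, true, true };

/* Model of a per-thread disable: only the calling thread is affected. */
static void model_disable_self(size_t tid)
{
    assert(tid < MODEL_NTHREADS);
    thread_enforcing[tid] = false;
}

/* Count the threads that would still fault when calling into
 * unmarked code brought in by dlopen. */
static size_t model_threads_still_enforcing(void)
{
    size_t n = 0;
    for (size_t i = 0; i < MODEL_NTHREADS; i++)
        if (thread_enforcing[i])
            n++;
    return n;
}
```

After the dlopen-ing thread disables itself, the model still reports
the remaining threads as enforcing, which is the gap a process-wide
disable would have to close.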

Rich


end of thread (2024-03-02 14:45 UTC)

Thread overview: 25+ messages
2024-02-12 16:38 [musl] PAC/BTI Support on aarch64 William Roberts
2024-02-12 18:42 ` Rich Felker
2024-02-12 21:25   ` William Roberts
2024-02-12 21:34     ` enh
2024-02-12 22:46     ` Rich Felker
2024-02-12 23:05       ` enh
2024-02-12 23:18         ` William Roberts
2024-02-13  2:08           ` Rich Felker
2024-02-13 14:47             ` William Roberts
2024-02-13 17:51               ` Markus Wichmann
2024-02-14  2:19                 ` Rich Felker
2024-02-14  3:19                   ` William Roberts
2024-02-14  4:44                   ` Markus Wichmann
2024-02-14 13:32                     ` Thorsten Glaser
2024-02-14 14:03                       ` Rich Felker
2024-02-14 14:12                         ` Thorsten Glaser
2024-02-15 13:29                   ` Stefan O'Rear
2024-02-15 14:06                     ` Rich Felker
2024-03-02 14:33                       ` Szabolcs Nagy
2024-03-02 14:45                         ` Rich Felker
2024-02-15  0:03             ` Szabolcs Nagy
2024-02-15  0:22               ` enh
2024-02-15  9:18                 ` Szabolcs Nagy
2024-02-19 23:54   ` Fangrui Song
     [not found]   ` <DS7PR12MB57659BC5D5536574D1B91D26CB502@DS7PR12MB5765.namprd12.prod.outlook.com>
2024-02-20  6:21     ` Anton Korobeynikov

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/
