* [musl] aarch64 SME support issues
@ 2025-07-01 21:37 Rich Felker
2025-07-06 18:20 ` Szabolcs Nagy
0 siblings, 1 reply; 10+ messages in thread
From: Rich Felker @ 2025-07-01 21:37 UTC
To: musl
There's a thread going on elsewhere (glibc, kernel folks, etc.) that
I'm CC'd on but that has not been on the musl list so far, about
support for the aarch64 SME extension. I was under the impression that
the way things were done on the ISA side, it should be possible to
support applications that use it as long as the kernel does the right
things, without any consideration for whether libc is new enough to
know about it. (This is a condition I would deem necessary for it to
be a transparent, non-ABI-breaking addition.) However, it seems that
may not be the case. Here is a link to the current tail of the thread
(note that it extends back thru June and May as well):
https://sourceware.org/pipermail/libc-alpha/2025-July/168330.html
At present, we should not have any musl-linked applications attempting
to use SME, since it's mandatory to check the hwcap bits for it, and
we have never defined the corresponding hwcap macro. (However it's
possible that someone is wrongly bypassing libc headers and using the
kernel ones, or defining it themselves, in which case they get to keep
both pieces.)
Anyway, the immediate question I have in mind in preparation for a
release is whether we should do something to future-proof for this
now. Specifically, should we have the aarch64 entry code mask off all
unknown hwcap bits? This would make it so if at some point in the
future we expose a macro for SME, applications don't detect it as
available if they're run with 1.2.6. (Note: this wouldn't help with
1.2.5 or earlier, since that ship has already sailed.)
The downside of this is that it would prevent using any other ISA
features newer than what were available when the libc version shipped.
But if ARM is potentially going to be making future ISA extensions
breaking like this, it might be the safety-correct option.
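As a rough illustration of the masking idea and its tradeoff, here is a minimal sketch. It is not musl code: `KNOWN_HWCAP` and `filter_hwcap` are made-up names, and the choice of "bits 0..31 are known" is a placeholder for whatever set a given release actually knows about.

```c
#include <stdint.h>

/* Hypothetical allowlist: pretend this libc release knows hwcap
 * bits 0..31 only. The real set would come from the libc headers. */
#define KNOWN_HWCAP ((UINT64_C(1) << 32) - 1)

/* Clear every bit the libc release does not know about, so only
 * features whose semantics were known at release time are reported. */
static uint64_t filter_hwcap(uint64_t kernel_hwcap)
{
	return kernel_hwcap & KNOWN_HWCAP;
}
```

With this policy a known bit passes through unchanged, while a bit assigned after the release (bit 40 in this sketch) is hidden whether or not it would have been safe to expose.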
If OTOH applications that use SME reference a libc-provided symbol
(rather than a libgcc-provided one) to do the ABI magic, failure to
resolve symbols would prevent them from being run unsafely, and
there's not any issue.
I'd welcome input from anyone more familiar with the particulars of
SME than myself.
Rich
* Re: [musl] aarch64 SME support issues
2025-07-01 21:37 [musl] aarch64 SME support issues Rich Felker
@ 2025-07-06 18:20 ` Szabolcs Nagy
2025-07-08 16:20 ` Rich Felker
0 siblings, 1 reply; 10+ messages in thread
From: Szabolcs Nagy @ 2025-07-06 18:20 UTC
To: Rich Felker; +Cc: musl
* Rich Felker <dalias@libc.org> [2025-07-01 17:37:03 -0400]:
> There's a thread going on elsewhere (glibc, kernel folks, etc.) that
> I'm CC'd on but that has not been on the musl list so far, about
> support for the aarch64 SME extension. I was under the impression that
> the way things were done on the ISA side, it should be possible to
> support applications that use it as long as the kernel does the right
> things, without any consideration for whether libc is new enough to
> know about it. (This is a condition I would deem necessary for it to
> be a transparent, non-ABI-breaking addition.) However, it seems that
> may not be the case. Here is a link to the current tail of the thread
> (note that it extends back thru June and May as well):
>
> https://sourceware.org/pipermail/libc-alpha/2025-July/168330.html
>
> At present, we should not have any musl-linked applications attempting
> to use SME, since it's mandatory to check the hwcap bits for it, and
> we have never defined the corresponding hwcap macro. (However it's
> possible that someone is wrongly bypassing libc headers and using the
> kernel ones, or defining it themselves, in which case they get to keep
> both pieces.)
>
> Anyway, the immediate question I have in mind in preparation for a
> release is whether we should do something to future-proof for this
> now. Specifically, should we have the aarch64 entry code mask off all
> unknown hwcap bits? This would make it so if at some point in the
> future we expose a macro for SME, applications don't detect it as
> available if they're run with 1.2.6. (Note: this wouldn't help with
> 1.2.5 or earlier, since that ship has already sailed.)
fwiw i would not fiddle with hwcap for this release
1. there are ways around that (cpu id registers for features
are now emulated for userspace by linux and hwcap is visible
in auxv etc) so we cant do it cleanly.
2. users of sme za state should rarely longjmp or create threads
so we are worrying about a cornercase we havent seen in practice
yet.
3. i think libgcc does not enable sme for musl due to lack of
__getauxval (not configure detected for bootstrap reasons,
based on target triplet, on for *-linux-gnu) so discussion is
moot until libgcc is updated.
morally the sme runtime should be in libc but it ended up in
libgcc because that's supportable in old glibc without abi
update, there were glibc vs gcc release schedule dependency
delays and testability problems otherwise, and because (per 2.) the
abi breakage is unlikely.
but yes currently the libc control over sme is via __getauxval
and hwcap masking if we want it off.
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libgcc/config/aarch64/__aarch64_have_sme.c;hb=HEAD
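For illustration, a check of the following general shape is what hwcap masking would defeat. This is a sketch, not the libgcc source: `sme_reported` and `SKETCH_HWCAP2_SME` are hypothetical names, the bit position is the one the kernel uses for HWCAP2_SME, and the AT_HWCAP2 word is passed in as a parameter (rather than fetched via __getauxval) so the example stays self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit position as the kernel's HWCAP2_SME; renamed to avoid
 * clashing with any system header. */
#define SKETCH_HWCAP2_SME (UINT64_C(1) << 23)

/* Decide SME availability purely from the AT_HWCAP2 word that the
 * libc reports. If libc clears the bit, this returns false even on
 * SME-capable hardware. */
static bool sme_reported(uint64_t hwcap2)
{
	return (hwcap2 & SKETCH_HWCAP2_SME) != 0;
}
```

A runtime that gates SME use on such a check never enters the SME paths when the libc reports the bit as clear.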
>
> The downside of this is that it would prevent using any other ISA
> features newer than what were available when the libc version shipped.
> But if ARM is potentially going to be making future ISA extensions
> breaking like this, it might be the safety-correct option.
>
> If OTOH applications that use SME reference a libc-provided symbol
> (rather than a libgcc-provided one) to do the ABI magic, failure to
> resolve symbols would prevent them from being run unsafely, and
> there's not any issue.
for newlib libgcc uses a libc symbol __aarch64_sme_accessible
because there is no __getauxval.
but that's problematic for dynamic linking: the sme runtime
is in shared libgcc like the unwinder so all applications using
libgcc would fail not just the sme ones if the symbol is missing.
>
> I'd welcome input from anyone more familiar with the particulars of
> SME than myself.
>
> Rich
* Re: [musl] aarch64 SME support issues
2025-07-06 18:20 ` Szabolcs Nagy
@ 2025-07-08 16:20 ` Rich Felker
2025-07-09 14:26 ` Szabolcs Nagy
0 siblings, 1 reply; 10+ messages in thread
From: Rich Felker @ 2025-07-08 16:20 UTC
To: musl
On Sun, Jul 06, 2025 at 08:20:40PM +0200, Szabolcs Nagy wrote:
> * Rich Felker <dalias@libc.org> [2025-07-01 17:37:03 -0400]:
> > There's a thread going on elsewhere (glibc, kernel folks, etc.) that
> > I'm CC'd on but that has not been on the musl list so far, about
> > support for the aarch64 SME extension. I was under the impression that
> > the way things were done on the ISA side, it should be possible to
> > support applications that use it as long as the kernel does the right
> > things, without any consideration for whether libc is new enough to
> > know about it. (This is a condition I would deem necessary for it to
> > be a transparent, non-ABI-breaking addition.) However, it seems that
> > may not be the case. Here is a link to the current tail of the thread
> > (note that it extends back thru June and May as well):
> >
> > https://sourceware.org/pipermail/libc-alpha/2025-July/168330.html
> >
> > At present, we should not have any musl-linked applications attempting
> > to use SME, since it's mandatory to check the hwcap bits for it, and
> > we have never defined the corresponding hwcap macro. (However it's
> > possible that someone is wrongly bypassing libc headers and using the
> > kernel ones, or defining it themselves, in which case they get to keep
> > both pieces.)
> >
> > Anyway, the immediate question I have in mind in preparation for a
> > release is whether we should do something to future-proof for this
> > now. Specifically, should we have the aarch64 entry code mask off all
> > unknown hwcap bits? This would make it so if at some point in the
> > future we expose a macro for SME, applications don't detect it as
> > available if they're run with 1.2.6. (Note: this wouldn't help with
> > 1.2.5 or earlier, since that ship has already sailed.)
>
> fwiw i would not fiddle with hwcap for this release
Based on what you've said below I think that's not a good idea. See
inline responses:
> 1. there are ways around that (cpu id registers for features
> are now emulated for userspace by linux and hwcap is visible
> in auxv etc) so we cant do it cleanly.
The documentation I found for using SME says it's required to use the
hwcap bit to determine availability, not other means.
> 2. users of sme za state should rarely longjmp or create threads
> so we are worrying about a cornercase we havent seen in practice
> yet.
Yes, that's not generally the way musl deals with safety tho.
> 3. i think libgcc does not enable sme for musl due to lack of
> __getauxval (not configure detected for bootstrap reasons,
> based on target triplet, on for *-linux-gnu) so discussion is
> moot until libgcc is updated.
I'm planning to include your patch exposing __getauxval in this
release, which would thereby enable SME support on musl in a way that
would silently break. So it sounds like adding __getauxval and one of
either masking off hwcap, or actually adding working SME support, need
to happen "atomically" in the same release in order not to put broken
configurations into the wild.
> morally the sme runtime should be in libc but it ended up in
> libgcc because that's supportable in old glibc without abi
> update, there were glibc vs gcc release schedule dependency
> delays and testability problems otherwise, and because (per 2.) the
> abi breakage is unlikely.
It sounds like this was a commercial consideration for rapidly pushing
a new feature to be available on existing system versions not actually
prepared to support it safely. The norm should be that new
functionality doesn't necessarily work on older systems and
applications need to be prepared for that.
> but yes currently the libc control over sme is via __getauxval
> and hwcap masking if we want it off.
> https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libgcc/config/aarch64/__aarch64_have_sme.c;hb=HEAD
Do you have a recommendation/preference between masking it off or
dropping the __getauxval exposure for now?
I think I'd rather mask it off, since in the (unusual but plausible)
case where a static-only toolchain is built, I think the libgcc
configure test will see the hidden __getauxval and be able to use it
already.
And if we do masking, I think it makes sense to mask off all unknown
bits so this doesn't happen again in the future with the next new
thing, but I'm not sure. Does this sound reasonable? Are there any
cases where *hiding* a hwcap bit could result in malfunction?
> > The downside of this is that it would prevent using any other ISA
> > features newer than what were available when the libc version shipped.
> > But if ARM is potentially going to be making future ISA extensions
> > breaking like this, it might be the safety-correct option.
> >
> > If OTOH applications that use SME reference a libc-provided symbol
> > (rather than a libgcc-provided one) to do the ABI magic, failure to
> > resolve symbols would prevent them from being run unsafely, and
> > there's not any issue.
>
> for newlib libgcc uses a libc symbol __aarch64_sme_accessible
> because there is no __getauxval.
>
> but that's problematic for dynamic linking: the sme runtime
> is in shared libgcc like the unwinder so all applications using
> libgcc would fail not just the sme ones if the symbol is missing.
Indeed, that doesn't seem like a great idea.
Rich
* Re: [musl] aarch64 SME support issues
2025-07-08 16:20 ` Rich Felker
@ 2025-07-09 14:26 ` Szabolcs Nagy
2025-07-09 18:45 ` Rich Felker
0 siblings, 1 reply; 10+ messages in thread
From: Szabolcs Nagy @ 2025-07-09 14:26 UTC
To: Rich Felker; +Cc: musl
* Rich Felker <dalias@libc.org> [2025-07-08 12:20:11 -0400]:
> On Sun, Jul 06, 2025 at 08:20:40PM +0200, Szabolcs Nagy wrote:
> > * Rich Felker <dalias@libc.org> [2025-07-01 17:37:03 -0400]:
> > > There's a thread going on elsewhere (glibc, kernel folks, etc.) that
> > > I'm CC'd on but that has not been on the musl list so far, about
> > > support for the aarch64 SME extension. I was under the impression that
> > > the way things were done on the ISA side, it should be possible to
> > > support applications that use it as long as the kernel does the right
> > > things, without any consideration for whether libc is new enough to
> > > know about it. (This is a condition I would deem necessary for it to
> > > be a transparent, non-ABI-breaking addition.) However, it seems that
> > > may not be the case. Here is a link to the current tail of the thread
> > > (note that it extends back thru June and May as well):
> > >
> > > https://sourceware.org/pipermail/libc-alpha/2025-July/168330.html
> > >
> > > At present, we should not have any musl-linked applications attempting
> > > to use SME, since it's mandatory to check the hwcap bits for it, and
> > > we have never defined the corresponding hwcap macro. (However it's
> > > possible that someone is wrongly bypassing libc headers and using the
> > > kernel ones, or defining it themselves, in which case they get to keep
> > > both pieces.)
> > >
> > > Anyway, the immediate question I have in mind in preparation for a
> > > release is whether we should do something to future-proof for this
> > > now. Specifically, should we have the aarch64 entry code mask off all
> > > unknown hwcap bits? This would make it so if at some point in the
> > > future we expose a macro for SME, applications don't detect it as
> > > available if they're run with 1.2.6. (Note: this wouldn't help with
> > > 1.2.5 or earlier, since that ship has already sailed.)
> >
> > fwiw i would not fiddle with hwcap for this release
>
> Based on what you've said below I think that's not a good idea. See
> inline responses:
>
> > 1. there are ways around that (cpu id registers for features
> > are now emulated for userspace by linux and hwcap is visible
> > in auxv etc) so we cant do it cleanly.
>
> The documentation I found for using SME says it's required to use the
> hwcap bit to determine availability, not other means.
>
> > 2. users of sme za state should rarely longjmp or create threads
> > so we are worrying about a cornercase we havent seen in practice
> > yet.
>
> Yes, that's not generally the way musl deals with safety tho.
>
> > 3. i think libgcc does not enable sme for musl due to lack of
> > __getauxval (not configure detected for bootstrap reasons,
> > based on target triplet, on for *-linux-gnu) so discussion is
> > moot until libgcc is updated.
>
> I'm planning to include your patch exposing __getauxval in this
> release, which would thereby enable SME support on musl in a way that
> would silently break. So it sounds like adding __getauxval and one of
> either masking off hwcap, or actually adding working SME support, need
> to happen "atomically" in the same release in order not to put broken
> configurations into the wild.
>
> > morally the sme runtime should be in libc but it ended up in
> > libgcc because that's supportable in old glibc without abi
> > update, there were glibc vs gcc release schedule dependency
> > delays and testability problems otherwise, and because (per 2.) the
> > abi breakage is unlikely.
>
> It sounds like this was a commercial consideration for rapidly pushing
> a new feature to be available on existing system versions not actually
> prepared to support it safely. The norm should be that new
> functionality doesn't necessarily work on older systems and
> applications need to be prepared for that.
>
> > but yes currently the libc control over sme is via __getauxval
> > and hwcap masking if we want it off.
> > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libgcc/config/aarch64/__aarch64_have_sme.c;hb=HEAD
>
> Do you have a recommendation/preference between masking it off or
> dropping the __getauxval exposure for now?
>
> I think I'd rather mask it off, since in the (unusual but plausible)
> case where a static-only toolchain is built, I think the libgcc
> configure test will see the hidden __getauxval and be able to use it
> already.
>
> And if we do masking, I think it makes sense to mask off all unknown
> bits so this doesn't happen again in the future with the next new
> thing, but I'm not sure. Does this sound reasonable? Are there any
> cases where *hiding* a hwcap bit could result in malfunction?
ok i hadnt considered the __getauxval change, i think that
is useful to go in: it will take time to safely update libgcc
so better to add it sooner and potentially more widely useful
than just for SME.
i think hiding a hwcap bit may lead to inconsistencies due
to kernel behaving differently than what libc pretends,
but i don't have a strong case, it likely can only affect
hacky code. so likely no abi break for normal code.
e.g. kernel enables BTI on vdso (or static exe) and user code
trying to indirect jump into the middle of a function after
checking via the libc hwcap that bti is off.
or creating MTE tagged objects via mprotect + instructions
based on cpuid and then passing them to a function that is
only MTE safe when HWCAP_MTE is set.
or different part of atomics code trying to detect 128bit
lse atomics support differently (hwcap vs cpuid).
note that HWCAP2 is all used up, and now the top 32 bits
of HWCAP are getting allocated (used to be reserved when
we thought ilp32 was a thing, now only the top 2 bits are
kept for libc to use), musl does not have AT_HWCAP3 but
user code may query that anyway as AT_* values are abi.
not sure if you plan to deal with AT_HWCAP3 too.
i think masking HWCAP_SME* and top bits of AT_HWCAP
above 1<<41 should be fine for now. presumably this
can be undone if sme support is added.
>
> > > The downside of this is that it would prevent using any other ISA
> > > features newer than what were available when the libc version shipped.
> > > But if ARM is potentially going to be making future ISA extensions
> > > breaking like this, it might be the safety-correct option.
> > >
> > > If OTOH applications that use SME reference a libc-provided symbol
> > > (rather than a libgcc-provided one) to do the ABI magic, failure to
> > > resolve symbols would prevent them from being run unsafely, and
> > > there's not any issue.
> >
> > for newlib libgcc uses a libc symbol __aarch64_sme_accessible
> > because there is no __getauxval.
> >
> > but that's problematic for dynamic linking: the sme runtime
> > is in shared libgcc like the unwinder so all applications using
> > libgcc would fail not just the sme ones if the symbol is missing.
>
> Indeed, that doesn't seem like a great idea.
>
> Rich
* Re: aarch64 SME support issues
2025-07-09 14:26 ` Szabolcs Nagy
@ 2025-07-09 18:45 ` Rich Felker
2025-07-09 19:02 ` Rich Felker
0 siblings, 1 reply; 10+ messages in thread
From: Rich Felker @ 2025-07-09 18:45 UTC
To: musl
On Wed, Jul 09, 2025 at 04:26:46PM +0200, Szabolcs Nagy wrote:
> > Do you have a recommendation/preference between masking it off or
> > dropping the __getauxval exposure for now?
> >
> > I think I'd rather mask it off, since in the (unusual but plausible)
> > case where a static-only toolchain is built, I think the libgcc
> > configure test will see the hidden __getauxval and be able to use it
> > already.
> >
> > And if we do masking, I think it makes sense to mask off all unknown
> > bits so this doesn't happen again in the future with the next new
> > thing, but I'm not sure. Does this sound reasonable? Are there any
> > cases where *hiding* a hwcap bit could result in malfunction?
>
> ok i hadnt considered the __getauxval change, i think that
> is useful to go in: it will take time to safely update libgcc
> so better to add it sooner and potentially more widely useful
> than just for SME.
>
> i think hiding a hwcap bit may lead to inconsistencies due
> to kernel behaving differently than what libc pretends,
> but i don't have a strong case, it likely can only affect
> hacky code. so likely no abi break for normal code.
Yes that's what I'd expect.
> e.g. kernel enables BTI on vdso (or static exe) and user code
> trying to indirect jump into the middle of a function after
> checking via the libc hwcap that bti is off.
>
> or creating MTE tagged objects via mprotect + instructions
> based on cpuid and then passing them to a function that is
> only MTE safe when HWCAP_MTE is set.
Note that we don't need to mask off any caps we already know the
semantics for, only SME and possibly as-yet-unassigned ones we don't
know will be safe without libc support.
> or different part of atomics code trying to detect 128bit
> lse atomics support differently (hwcap vs cpuid).
>
> note that HWCAP2 is all used up, and now the top 32 bits
> of HWCAP are getting allocated (used to be reserved when
> we thought ilp32 was a thing, now only the top 2 bits are
> kept for libc to use), musl does not have AT_HWCAP3 but
> user code may query that anyway as AT_* values are abi.
> not sure if you plan to deal with AT_HWCAP3 too.
>
> i think masking HWCAP_SME* and top bits of AT_HWCAP
> above 1<<41 should be fine for now. presumably this
> can be undone if sme support is added.
Sounds good. Should we add and mask hwcap3 too?
Rich
* Re: aarch64 SME support issues
2025-07-09 18:45 ` Rich Felker
@ 2025-07-09 19:02 ` Rich Felker
2025-07-09 22:47 ` Szabolcs Nagy
0 siblings, 1 reply; 10+ messages in thread
From: Rich Felker @ 2025-07-09 19:02 UTC
To: musl
On Wed, Jul 09, 2025 at 02:45:54PM -0400, Rich Felker wrote:
> On Wed, Jul 09, 2025 at 04:26:46PM +0200, Szabolcs Nagy wrote:
> > > Do you have a recommendation/preference between masking it off or
> > > dropping the __getauxval exposure for now?
> > >
> > > I think I'd rather mask it off, since in the (unusual but plausible)
> > > case where a static-only toolchain is built, I think the libgcc
> > > configure test will see the hidden __getauxval and be able to use it
> > > already.
> > >
> > > And if we do masking, I think it makes sense to mask off all unknown
> > > bits so this doesn't happen again in the future with the next new
> > > thing, but I'm not sure. Does this sound reasonable? Are there any
> > > cases where *hiding* a hwcap bit could result in malfunction?
> >
> > ok i hadnt considered the __getauxval change, i think that
> > is useful to go in: it will take time to safely update libgcc
> > so better to add it sooner and potentially more widely useful
> > than just for SME.
> >
> > i think hiding a hwcap bit may lead to inconsistencies due
> > to kernel behaving differently than what libc pretends,
> > but i don't have a strong case, it likely can only affect
> > hacky code. so likely no abi break for normal code.
>
> Yes that's what I'd expect.
>
> > e.g. kernel enables BTI on vdso (or static exe) and user code
> > trying to indirect jump into the middle of a function after
> > checking via the libc hwcap that bti is off.
> >
> > or creating MTE tagged objects via mprotect + instructions
> > based on cpuid and then passing them to a function that is
> > only MTE safe when HWCAP_MTE is set.
>
> Note that we don't need to mask off any caps we already know the
> semantics for, only SME and possibly as-yet-unassigned ones we don't
> know will be safe without libc support.
>
> > or different part of atomics code trying to detect 128bit
> > lse atomics support differently (hwcap vs cpuid).
> >
> > note that HWCAP2 is all used up, and now the top 32 bits
> > of HWCAP are getting allocated (used to be reserved when
> > we thought ilp32 was a thing, now only the top 2 bits are
> > kept for libc to use), musl does not have AT_HWCAP3 but
> > user code may query that anyway as AT_* values are abi.
> > not sure if you plan to deal with AT_HWCAP3 too.
> >
> > i think masking HWCAP_SME* and top bits of AT_HWCAP
> > above 1<<41 should be fine for now. presumably this
> > can be undone if sme support is added.
>
> Sounds good. Should we add and mask hwcap3 too?
Hmm, it looks like there are hwcap2 sme bits:
#define HWCAP2_SME (1 << 23)
#define HWCAP2_SME_I16I64 (1 << 24)
#define HWCAP2_SME_F64F64 (1 << 25)
#define HWCAP2_SME_I8I32 (1 << 26)
#define HWCAP2_SME_F16F32 (1 << 27)
#define HWCAP2_SME_B16F32 (1 << 28)
#define HWCAP2_SME_F32F32 (1 << 29)
#define HWCAP2_SME_FA64 (1 << 30)
...
#define HWCAP2_SME2 (1UL << 37)
#define HWCAP2_SME2P1 (1UL << 38)
#define HWCAP2_SME_I16I32 (1UL << 39)
#define HWCAP2_SME_BI32I32 (1UL << 40)
#define HWCAP2_SME_B16B16 (1UL << 41)
#define HWCAP2_SME_F16F16 (1UL << 42)
...
#define HWCAP2_SME_LUTV2 (1UL << 57)
#define HWCAP2_SME_F8F16 (1UL << 58)
#define HWCAP2_SME_F8F32 (1UL << 59)
#define HWCAP2_SME_SF8FMA (1UL << 60)
#define HWCAP2_SME_SF8DP4 (1UL << 61)
#define HWCAP2_SME_SF8DP2 (1UL << 62)
Not clear if any others are SME-related.
In plain hwcap I see:
#define HWCAP_SME2P2 (1UL << 42)
#define HWCAP_SME_SBITPERM (1UL << 43)
#define HWCAP_SME_AES (1UL << 44)
#define HWCAP_SME_SFEXPA (1UL << 45)
#define HWCAP_SME_STMOP (1UL << 46)
#define HWCAP_SME_SMOP4 (1UL << 47)
And no hwcap3 bits defined yet.
Should the above all be masked? Any I missed?
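One way the bits listed above could be collected into masks, as a sketch only. The macro and function names are hypothetical; the bit ranges are read off the definitions quoted in this message, and the AT_HWCAP mask follows the "above 1<<41" suggestion rather than covering only bits 42..47.

```c
#include <stdint.h>

/* Mask with bits lo..hi (inclusive) set; requires 0 <= lo <= hi <= 63. */
#define BITS(lo, hi) \
	((UINT64_MAX >> (63 - (hi))) & (UINT64_MAX << (lo)))

/* SME-related AT_HWCAP2 bits listed above: 23..30, 37..42, 57..62. */
#define SME_HWCAP2_MASK (BITS(23, 30) | BITS(37, 42) | BITS(57, 62))

/* AT_HWCAP: hide everything above bit 41, which covers the listed
 * SME bits 42..47 as well as bits not yet assigned. */
#define HIDDEN_HWCAP_MASK BITS(42, 63)

static uint64_t hide_sme_hwcap(uint64_t hwcap)   { return hwcap & ~HIDDEN_HWCAP_MASK; }
static uint64_t hide_sme_hwcap2(uint64_t hwcap2) { return hwcap2 & ~SME_HWCAP2_MASK; }
```

Expressing the masks as bit ranges keeps them easy to compare against the kernel header if more SME bits turn up in the elided parts of the list.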
* Re: aarch64 SME support issues
2025-07-09 19:02 ` Rich Felker
@ 2025-07-09 22:47 ` Szabolcs Nagy
2025-07-13 2:12 ` [musl] " Rich Felker
0 siblings, 1 reply; 10+ messages in thread
From: Szabolcs Nagy @ 2025-07-09 22:47 UTC
To: Rich Felker; +Cc: musl
* Rich Felker <dalias@libc.org> [2025-07-09 15:02:35 -0400]:
> On Wed, Jul 09, 2025 at 02:45:54PM -0400, Rich Felker wrote:
> > On Wed, Jul 09, 2025 at 04:26:46PM +0200, Szabolcs Nagy wrote:
> > > > Do you have a recommendation/preference between masking it off or
> > > > dropping the __getauxval exposure for now?
> > > >
> > > > I think I'd rather mask it off, since in the (unusual but plausible)
> > > > case where a static-only toolchain is built, I think the libgcc
> > > > configure test will see the hidden __getauxval and be able to use it
> > > > already.
> > > >
> > > > And if we do masking, I think it makes sense to mask off all unknown
> > > > bits so this doesn't happen again in the future with the next new
> > > > thing, but I'm not sure. Does this sound reasonable? Are there any
> > > > cases where *hiding* a hwcap bit could result in malfunction?
> > >
> > > ok i hadnt considered the __getauxval change, i think that
> > > is useful to go in: it will take time to safely update libgcc
> > > so better to add it sooner and potentially more widely useful
> > > than just for SME.
> > >
> > > i think hiding a hwcap bit may lead to inconsistencies due
> > > to kernel behaving differently than what libc pretends,
> > > but i don't have a strong case, it likely can only affect
> > > hacky code. so likely no abi break for normal code.
> >
> > Yes that's what I'd expect.
> >
> > > e.g. kernel enables BTI on vdso (or static exe) and user code
> > > trying to indirect jump into the middle of a function after
> > > checking via the libc hwcap that bti is off.
> > >
> > > or creating MTE tagged objects via mprotect + instructions
> > > based on cpuid and then passing them to a function that is
> > > only MTE safe when HWCAP_MTE is set.
> >
> > Note that we don't need to mask off any caps we already know the
> > semantics for, only SME and possibly as-yet-unassigned ones we don't
> > know will be safe without libc support.
these were meant to be examples of how masking
a future unknown hwcap bit may go wrong based
on existing hwcaps where libc hwcap vs kernel/isa
difference may be visible.
> >
> > > or different part of atomics code trying to detect 128bit
> > > lse atomics support differently (hwcap vs cpuid).
> > >
> > > note that HWCAP2 is all used up, and now the top 32 bits
> > > of HWCAP are getting allocated (used to be reserved when
> > > we thought ilp32 was a thing, now only the top 2 bits are
> > > kept for libc to use), musl does not have AT_HWCAP3 but
> > > user code may query that anyway as AT_* values are abi.
> > > not sure if you plan to deal with AT_HWCAP3 too.
> > >
> > > i think masking HWCAP_SME* and top bits of AT_HWCAP
> > > above 1<<41 should be fine for now. presumably this
> > > can be undone if sme support is added.
> >
> > Sounds good. Should we add and mask hwcap3 too?
>
> Hmm, it looks like there are hwcap2 sme bits:
>
> #define HWCAP2_SME (1 << 23)
> #define HWCAP2_SME_I16I64 (1 << 24)
> #define HWCAP2_SME_F64F64 (1 << 25)
> #define HWCAP2_SME_I8I32 (1 << 26)
> #define HWCAP2_SME_F16F32 (1 << 27)
> #define HWCAP2_SME_B16F32 (1 << 28)
> #define HWCAP2_SME_F32F32 (1 << 29)
> #define HWCAP2_SME_FA64 (1 << 30)
> ...
> #define HWCAP2_SME2 (1UL << 37)
> #define HWCAP2_SME2P1 (1UL << 38)
> #define HWCAP2_SME_I16I32 (1UL << 39)
> #define HWCAP2_SME_BI32I32 (1UL << 40)
> #define HWCAP2_SME_B16B16 (1UL << 41)
> #define HWCAP2_SME_F16F16 (1UL << 42)
> ...
> #define HWCAP2_SME_LUTV2 (1UL << 57)
> #define HWCAP2_SME_F8F16 (1UL << 58)
> #define HWCAP2_SME_F8F32 (1UL << 59)
> #define HWCAP2_SME_SF8FMA (1UL << 60)
> #define HWCAP2_SME_SF8DP4 (1UL << 61)
> #define HWCAP2_SME_SF8DP2 (1UL << 62)
>
> Not clear if any others are SME-related.
>
> In plain hwcap I see:
>
> #define HWCAP_SME2P2 (1UL << 42)
> #define HWCAP_SME_SBITPERM (1UL << 43)
> #define HWCAP_SME_AES (1UL << 44)
> #define HWCAP_SME_SFEXPA (1UL << 45)
> #define HWCAP_SME_STMOP (1UL << 46)
> #define HWCAP_SME_SMOP4 (1UL << 47)
>
> And no hwcap3 bits defined yet.
>
> Should the above all be masked? Any I missed?
yeah i'd mask them all even if in principle
HWCAP2_SME should be enough. i don't think
any of the non-SME hwcaps imply HWCAP2_SME.
if we mask future bits then i think HWCAP3 should
be masked too. there are no bits defined yet, so
no existing kernel would pass it in auxv yet, but
once it is passed musl should return 0 for it.
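a sketch of what a filtered lookup might look like with HWCAP3 included. everything here is illustrative: `filtered_auxval` and the `SK_` names are made up, the tag numbers are the linux uapi AT_* values (AT_HWCAP3 is 29 there), and the masks are placeholders for whatever set is finally chosen.

```c
#include <stdint.h>

/* Linux auxv tags, renamed to avoid clashing with system headers. */
enum { SK_AT_HWCAP = 16, SK_AT_HWCAP2 = 26, SK_AT_HWCAP3 = 29 };

/* Placeholder masks of bits to hide from callers. */
#define SK_HIDE_HWCAP  (UINT64_MAX << 42)   /* bits above 1<<41 */
#define SK_HIDE_HWCAP2 (UINT64_C(1) << 23)  /* HWCAP2_SME only, for brevity */

/* Apply the per-tag policy to a raw value taken from the kernel's auxv. */
static uint64_t filtered_auxval(unsigned long type, uint64_t raw)
{
	switch (type) {
	case SK_AT_HWCAP:  return raw & ~SK_HIDE_HWCAP;
	case SK_AT_HWCAP2: return raw & ~SK_HIDE_HWCAP2;
	case SK_AT_HWCAP3: return 0;   /* no bits known yet: report none */
	default:           return raw; /* other tags pass through */
	}
}
```

returning 0 for the HWCAP3 tag means a future kernel that starts passing it cannot expose bits this libc has never heard of.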
i just fear that if ppl figure out that musl is
masking bits they will try to work around it by
using whacky cpu feature detection. so ideally
we don't keep masking forever (i can look into
adding sme support, but not right now).
* Re: [musl] aarch64 SME support issues
2025-07-09 22:47 ` Szabolcs Nagy
@ 2025-07-13 2:12 ` Rich Felker
2025-07-16 15:35 ` Szabolcs Nagy
0 siblings, 1 reply; 10+ messages in thread
From: Rich Felker @ 2025-07-13 2:12 UTC
To: musl
On Thu, Jul 10, 2025 at 12:47:32AM +0200, Szabolcs Nagy wrote:
> * Rich Felker <dalias@libc.org> [2025-07-09 15:02:35 -0400]:
> > On Wed, Jul 09, 2025 at 02:45:54PM -0400, Rich Felker wrote:
> > > On Wed, Jul 09, 2025 at 04:26:46PM +0200, Szabolcs Nagy wrote:
> > > > > Do you have a recommendation/preference between masking it off or
> > > > > dropping the __getauxval exposure for now?
> > > > >
> > > > > I think I'd rather mask it off, since in the (unusual but plausible)
> > > > > case where a static-only toolchain is built, I think the libgcc
> > > > > configure test will see the hidden __getauxval and be able to use it
> > > > > already.
> > > > >
> > > > > And if we do masking, I think it makes sense to mask off all unknown
> > > > > bits so this doesn't happen again in the future with the next new
> > > > > thing, but I'm not sure. Does this sound reasonable? Are there any
> > > > > cases where *hiding* a hwcap bit could result in malfunction?
> > > >
> > > > ok i hadnt considered the __getauxval change, i think that
> > > > is useful to go in: it will take time to safely update libgcc
> > > > so better to add it sooner and potentially more widely useful
> > > > than just for SME.
> > > >
> > > > i think hiding a hwcap bit may lead to inconsistencies due
> > > > to kernel behaving differently than what libc pretends,
> > > > but i don't have a strong case, it likely can only affect
> > > > hacky code. so likely no abi break for normal code.
> > >
> > > Yes that's what I'd expect.
> > >
> > > > e.g. kernel enables BTI on vdso (or static exe) and user code
> > > > trying to indirect jump into the middle of a function after
> > > > checking via the libc hwcap that bti is off.
> > > >
> > > > or creating MTE tagged objects via mprotect + instructions
> > > > based on cpuid and then passing them to a function that is
> > > > only MTE safe when HWCAP_MTE is set.
> > >
> > > Note that we don't need to mask off any caps we already know the
> > > semantics for, only SME and possibly as-yet-unassigned ones we don't
> > > know will be safe without libc support.
>
> these were meant to be examples of how masking
> a future unknown hwcap bit may go wrong based
> on existing hwcaps where libc hwcap vs kernel/isa
> difference may be visible.
>
> > >
> > > > or different part of atomics code trying to detect 128bit
> > > > lse atomics support differently (hwcap vs cpuid).
> > > >
> > > > note that HWCAP2 is all used up, and now the top 32 bits
> > > > of HWCAP are getting allocated (used to be reserved when
> > > > we thought ilp32 was a thing, now only the top 2 bits are
> > > > kept for libc to use), musl does not have AT_HWCAP3 but
> > > > user code may query that anyway as AT_* values are abi.
> > > > not sure if you plan to deal with AT_HWCAP3 too.
> > > >
> > > > i think masking HWCAP_SME* and top bits of AT_HWCAP
> > > > above 1<<41 should be fine for now. presumably this
> > > > can be undone if sme support is added.
> > >
> > > Sounds good. Should we add and mask hwcap3 too?
> >
> > Hmm, it looks like there are hwcap2 sme bits:
> >
> > #define HWCAP2_SME (1 << 23)
> > #define HWCAP2_SME_I16I64 (1 << 24)
> > #define HWCAP2_SME_F64F64 (1 << 25)
> > #define HWCAP2_SME_I8I32 (1 << 26)
> > #define HWCAP2_SME_F16F32 (1 << 27)
> > #define HWCAP2_SME_B16F32 (1 << 28)
> > #define HWCAP2_SME_F32F32 (1 << 29)
> > #define HWCAP2_SME_FA64 (1 << 30)
> > ...
> > #define HWCAP2_SME2 (1UL << 37)
> > #define HWCAP2_SME2P1 (1UL << 38)
> > #define HWCAP2_SME_I16I32 (1UL << 39)
> > #define HWCAP2_SME_BI32I32 (1UL << 40)
> > #define HWCAP2_SME_B16B16 (1UL << 41)
> > #define HWCAP2_SME_F16F16 (1UL << 42)
> > ...
> > #define HWCAP2_SME_LUTV2 (1UL << 57)
> > #define HWCAP2_SME_F8F16 (1UL << 58)
> > #define HWCAP2_SME_F8F32 (1UL << 59)
> > #define HWCAP2_SME_SF8FMA (1UL << 60)
> > #define HWCAP2_SME_SF8DP4 (1UL << 61)
> > #define HWCAP2_SME_SF8DP2 (1UL << 62)
> >
> > Not clear if any others are SME-related.
> >
> > In plain hwcap I see:
> >
> > #define HWCAP_SME2P2 (1UL << 42)
> > #define HWCAP_SME_SBITPERM (1UL << 43)
> > #define HWCAP_SME_AES (1UL << 44)
> > #define HWCAP_SME_SFEXPA (1UL << 45)
> > #define HWCAP_SME_STMOP (1UL << 46)
> > #define HWCAP_SME_SMOP4 (1UL << 47)
> >
> > And no hwcap3 bits defined yet.
> >
> > Should the above all be masked? Any I missed?
>
> yeah i'd mask them all even if in principle
> HWCAP2_SME should be enough. i don't think
> any of the non-SME hwcaps imply HWCAP2_SME.
>
> if we mask future bits then i think HWCAP3 should
> be masked too. there are no bits defined yet, so
> no existing kernel would pass it in auxv yet, but
> once it is passed musl should return 0 for it.
>
> i just fear that if ppl figure out that musl is
> masking bits they will try to work it around by
> using whacky cpu feature detection. so ideally
> we don't keep masking forever (i can look into
> adding sme support, but not right now).
Proposed code attached.
Rich
[-- Attachment #2: __set_thread_area.c --]
[-- Type: text/plain, Size: 715 bytes --]
#include <elf.h>
#include "libc.h"

#define BITRANGE(a,b) (2*(1UL<<(b))-(1UL<<(a)))

int __set_thread_area(void *p)
{
	__asm__ __volatile__ ("msr tpidr_el0,%0" : : "r"(p) : "memory");

	/* Mask off hwcap bits for SME and unknown future features. This is
	 * necessary because SME is not safe to use without libc support for
	 * it, and we do not (yet) have such support. */
	for (size_t *v = libc.auxv; *v; v+=2) {
		if (v[0]==AT_HWCAP) {
			v[1] &= ~BITRANGE(42,63); /* 42-47 are SME */
		} else if (v[0]==AT_HWCAP2) {
			v[1] &= ~(BITRANGE(23,30)
			        | BITRANGE(37,42)
			        | BITRANGE(57,62));
		} else if (v[0]==AT_HWCAP3 || v[0]==AT_HWCAP4) {
			v[0] = AT_IGNORE;
			v[1] = 0;
		}
	}

	return 0;
}
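
As a sanity check of the BITRANGE arithmetic in the attachment, the following sketch compares it against a bit-by-bit loop. It is an illustration only: BITRANGE64, bitrange_loop and check_bitrange are hypothetical names, and uint64_t is used in place of the attachment's 1UL so the check does not depend on the width of unsigned long (which is 64 bits on aarch64).

```c
#include <assert.h>
#include <stdint.h>

/* Same formula as the attachment's BITRANGE, spelled with uint64_t. */
#define BITRANGE64(a,b) (2*(UINT64_C(1)<<(b))-(UINT64_C(1)<<(a)))

/* Reference implementation: set bits a..b (inclusive) one at a time. */
static uint64_t bitrange_loop(int a, int b)
{
	uint64_t m = 0;
	for (int i = a; i <= b; i++)
		m |= UINT64_C(1) << i;
	return m;
}

static void check_bitrange(void)
{
	/* The three AT_HWCAP2 ranges used in the attachment. */
	assert(BITRANGE64(23,30) == bitrange_loop(23,30));
	assert(BITRANGE64(37,42) == bitrange_loop(37,42));
	assert(BITRANGE64(57,62) == bitrange_loop(57,62));

	/* b == 63: 2*(1<<63) wraps to 0 in unsigned arithmetic, so the
	 * result is 2^64 - 2^42, i.e. bits 42..63 set. */
	assert(BITRANGE64(42,63) == bitrange_loop(42,63));
	assert(BITRANGE64(42,63) == UINT64_C(0xfffffc0000000000));
}
```

The b == 63 case relies on unsigned wraparound, which is well defined in C, so the macro is valid for the top bit as well.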
* Re: [musl] aarch64 SME support issues
2025-07-13 2:12 ` [musl] " Rich Felker
@ 2025-07-16 15:35 ` Szabolcs Nagy
2025-07-16 16:51 ` Rich Felker
0 siblings, 1 reply; 10+ messages in thread
From: Szabolcs Nagy @ 2025-07-16 15:35 UTC (permalink / raw)
To: Rich Felker; +Cc: musl
* Rich Felker <dalias@libc.org> [2025-07-12 22:12:41 -0400]:
> On Thu, Jul 10, 2025 at 12:47:32AM +0200, Szabolcs Nagy wrote:
> > * Rich Felker <dalias@libc.org> [2025-07-09 15:02:35 -0400]:
> > > On Wed, Jul 09, 2025 at 02:45:54PM -0400, Rich Felker wrote:
> > > Hmm, it looks like there are hwcap2 sme bits:
> > >
> > > #define HWCAP2_SME (1 << 23)
> > > #define HWCAP2_SME_I16I64 (1 << 24)
> > > #define HWCAP2_SME_F64F64 (1 << 25)
> > > #define HWCAP2_SME_I8I32 (1 << 26)
> > > #define HWCAP2_SME_F16F32 (1 << 27)
> > > #define HWCAP2_SME_B16F32 (1 << 28)
> > > #define HWCAP2_SME_F32F32 (1 << 29)
> > > #define HWCAP2_SME_FA64 (1 << 30)
> > > ...
> > > #define HWCAP2_SME2 (1UL << 37)
> > > #define HWCAP2_SME2P1 (1UL << 38)
> > > #define HWCAP2_SME_I16I32 (1UL << 39)
> > > #define HWCAP2_SME_BI32I32 (1UL << 40)
> > > #define HWCAP2_SME_B16B16 (1UL << 41)
> > > #define HWCAP2_SME_F16F16 (1UL << 42)
> > > ...
> > > #define HWCAP2_SME_LUTV2 (1UL << 57)
> > > #define HWCAP2_SME_F8F16 (1UL << 58)
> > > #define HWCAP2_SME_F8F32 (1UL << 59)
> > > #define HWCAP2_SME_SF8FMA (1UL << 60)
> > > #define HWCAP2_SME_SF8DP4 (1UL << 61)
> > > #define HWCAP2_SME_SF8DP2 (1UL << 62)
> > >
> > > Not clear if any others are SME-related.
> > >
> > > In plain hwcap I see:
> > >
> > > #define HWCAP_SME2P2 (1UL << 42)
> > > #define HWCAP_SME_SBITPERM (1UL << 43)
> > > #define HWCAP_SME_AES (1UL << 44)
> > > #define HWCAP_SME_SFEXPA (1UL << 45)
> > > #define HWCAP_SME_STMOP (1UL << 46)
> > > #define HWCAP_SME_SMOP4 (1UL << 47)
> > >
> > > And no hwcap3 bits defined yet.
> > >
> > > Should the above all be masked? Any I missed?
> >
> > yeah i'd mask them all even if in principle
> > HWCAP2_SME should be enough. i don't think
> > any of the non-SME hwcaps imply HWCAP2_SME.
> >
> > if we mask future bits then i think HWCAP3 should
> > be masked too. there are no bits defined yet, so
> > no existing kernel would pass it in auxv yet, but
> > once it is passed musl should return 0 for it.
> >
> > i just fear that if ppl figure out that musl is
> > masking bits they will try to work it around by
> > using whacky cpu feature detection. so ideally
> > we don't keep masking forever (i can look into
> > adding sme support, but not right now).
>
> Proposed code attached.
>
> Rich
code looks good.
> #include <elf.h>
> #include "libc.h"
>
> #define BITRANGE(a,b) (2*(1UL<<(b))-(1UL<<(a)))
>
> int __set_thread_area(void *p)
> {
> 	__asm__ __volatile__ ("msr tpidr_el0,%0" : : "r"(p) : "memory");
>
> 	/* Mask off hwcap bits for SME and unknown future features. This is
> 	 * necessary because SME is not safe to use without libc support for
> 	 * it, and we do not (yet) have such support. */
> 	for (size_t *v = libc.auxv; *v; v+=2) {
> 		if (v[0]==AT_HWCAP) {
> 			v[1] &= ~BITRANGE(42,63); /* 42-47 are SME */
> 		} else if (v[0]==AT_HWCAP2) {
> 			v[1] &= ~(BITRANGE(23,30)
> 			        | BITRANGE(37,42)
> 			        | BITRANGE(57,62));
> 		} else if (v[0]==AT_HWCAP3 || v[0]==AT_HWCAP4) {
> 			v[0] = AT_IGNORE;
> 			v[1] = 0;
> 		}
> 	}
>
> 	return 0;
> }
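
To illustrate why rewriting the AT_HWCAP3/AT_HWCAP4 entries to AT_IGNORE makes a later lookup return 0, here is a small sketch that runs the same kind of walk over a synthetic auxiliary vector. Everything in it is an assumption made for the example: the EX_AT_* constants are local stand-ins (their values are meant to mirror the Linux AT_* numbers as I understand them, but the sketch does not depend on that), and mask_auxv, lookup and check_hwcap3_hidden are hypothetical helpers, not musl functions.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the AT_* tags (assumed values, illustration only). */
#define EX_AT_NULL    0
#define EX_AT_IGNORE  1
#define EX_AT_HWCAP  16
#define EX_AT_HWCAP3 29
#define EX_AT_HWCAP4 30

/* Retag HWCAP3/HWCAP4 entries so that lookups by type no longer find them. */
static void mask_auxv(uint64_t *v)
{
	for (; *v; v += 2) {
		if (v[0] == EX_AT_HWCAP3 || v[0] == EX_AT_HWCAP4) {
			v[0] = EX_AT_IGNORE;
			v[1] = 0;
		}
	}
}

/* getauxval-style lookup: value of the first matching entry, 0 if none. */
static uint64_t lookup(const uint64_t *v, uint64_t type)
{
	for (; *v; v += 2)
		if (v[0] == type)
			return v[1];
	return 0;
}

static void check_hwcap3_hidden(void)
{
	uint64_t auxv[] = {
		EX_AT_HWCAP,  0x3,
		EX_AT_HWCAP3, 0xff,
		EX_AT_NULL,   0
	};
	assert(lookup(auxv, EX_AT_HWCAP3) == 0xff);
	mask_auxv(auxv);
	assert(lookup(auxv, EX_AT_HWCAP3) == 0);  /* entry is now hidden */
	assert(lookup(auxv, EX_AT_HWCAP) == 0x3); /* other entries intact */
}
```

Because the ignore tag is nonzero, the walk continues past the rewritten entry instead of treating it as the end of the vector.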
* Re: aarch64 SME support issues
2025-07-16 15:35 ` Szabolcs Nagy
@ 2025-07-16 16:51 ` Rich Felker
0 siblings, 0 replies; 10+ messages in thread
From: Rich Felker @ 2025-07-16 16:51 UTC (permalink / raw)
To: musl
On Wed, Jul 16, 2025 at 05:35:14PM +0200, Szabolcs Nagy wrote:
> * Rich Felker <dalias@libc.org> [2025-07-12 22:12:41 -0400]:
> > On Thu, Jul 10, 2025 at 12:47:32AM +0200, Szabolcs Nagy wrote:
> > > * Rich Felker <dalias@libc.org> [2025-07-09 15:02:35 -0400]:
> > > > On Wed, Jul 09, 2025 at 02:45:54PM -0400, Rich Felker wrote:
> > > > Hmm, it looks like there are hwcap2 sme bits:
> > > >
> > > > #define HWCAP2_SME (1 << 23)
> > > > #define HWCAP2_SME_I16I64 (1 << 24)
> > > > #define HWCAP2_SME_F64F64 (1 << 25)
> > > > #define HWCAP2_SME_I8I32 (1 << 26)
> > > > #define HWCAP2_SME_F16F32 (1 << 27)
> > > > #define HWCAP2_SME_B16F32 (1 << 28)
> > > > #define HWCAP2_SME_F32F32 (1 << 29)
> > > > #define HWCAP2_SME_FA64 (1 << 30)
> > > > ...
> > > > #define HWCAP2_SME2 (1UL << 37)
> > > > #define HWCAP2_SME2P1 (1UL << 38)
> > > > #define HWCAP2_SME_I16I32 (1UL << 39)
> > > > #define HWCAP2_SME_BI32I32 (1UL << 40)
> > > > #define HWCAP2_SME_B16B16 (1UL << 41)
> > > > #define HWCAP2_SME_F16F16 (1UL << 42)
> > > > ...
> > > > #define HWCAP2_SME_LUTV2 (1UL << 57)
> > > > #define HWCAP2_SME_F8F16 (1UL << 58)
> > > > #define HWCAP2_SME_F8F32 (1UL << 59)
> > > > #define HWCAP2_SME_SF8FMA (1UL << 60)
> > > > #define HWCAP2_SME_SF8DP4 (1UL << 61)
> > > > #define HWCAP2_SME_SF8DP2 (1UL << 62)
> > > >
> > > > Not clear if any others are SME-related.
> > > >
> > > > In plain hwcap I see:
> > > >
> > > > #define HWCAP_SME2P2 (1UL << 42)
> > > > #define HWCAP_SME_SBITPERM (1UL << 43)
> > > > #define HWCAP_SME_AES (1UL << 44)
> > > > #define HWCAP_SME_SFEXPA (1UL << 45)
> > > > #define HWCAP_SME_STMOP (1UL << 46)
> > > > #define HWCAP_SME_SMOP4 (1UL << 47)
> > > >
> > > > And no hwcap3 bits defined yet.
> > > >
> > > > Should the above all be masked? Any I missed?
> > >
> > > yeah i'd mask them all even if in principle
> > > HWCAP2_SME should be enough. i don't think
> > > any of the non-SME hwcaps imply HWCAP2_SME.
> > >
> > > if we mask future bits then i think HWCAP3 should
> > > be masked too. there are no bits defined yet, so
> > > no existing kernel would pass it in auxv yet, but
> > > once it is passed musl should return 0 for it.
> > >
> > > i just fear that if ppl figure out that musl is
> > > masking bits they will try to work it around by
> > > using whacky cpu feature detection. so ideally
> > > we don't keep masking forever (i can look into
> > > adding sme support, but not right now).
> >
> > Proposed code attached.
> >
> > Rich
>
> code looks good.
>
> > #include <elf.h>
> > #include "libc.h"
> >
> > #define BITRANGE(a,b) (2*(1UL<<(b))-(1UL<<(a)))
> >
> > int __set_thread_area(void *p)
> > {
> > 	__asm__ __volatile__ ("msr tpidr_el0,%0" : : "r"(p) : "memory");
> >
> > 	/* Mask off hwcap bits for SME and unknown future features. This is
> > 	 * necessary because SME is not safe to use without libc support for
> > 	 * it, and we do not (yet) have such support. */
> > 	for (size_t *v = libc.auxv; *v; v+=2) {
> > 		if (v[0]==AT_HWCAP) {
> > 			v[1] &= ~BITRANGE(42,63); /* 42-47 are SME */
> > 		} else if (v[0]==AT_HWCAP2) {
> > 			v[1] &= ~(BITRANGE(23,30)
> > 			        | BITRANGE(37,42)
> > 			        | BITRANGE(57,62));
> > 		} else if (v[0]==AT_HWCAP3 || v[0]==AT_HWCAP4) {
> > 			v[0] = AT_IGNORE;
> > 			v[1] = 0;
> > 		}
> > 	}
> >
> > 	return 0;
> > }
OK, merging. I'll push this with a commit message and the other stuff
I'd been holding off on til this issue was resolved.
Thanks for the feedback/help!
Rich
Thread overview: 10+ messages
2025-07-01 21:37 [musl] aarch64 SME support issues Rich Felker
2025-07-06 18:20 ` Szabolcs Nagy
2025-07-08 16:20 ` Rich Felker
2025-07-09 14:26 ` Szabolcs Nagy
2025-07-09 18:45 ` Rich Felker
2025-07-09 19:02 ` Rich Felker
2025-07-09 22:47 ` Szabolcs Nagy
2025-07-13 2:12 ` [musl] " Rich Felker
2025-07-16 15:35 ` Szabolcs Nagy
2025-07-16 16:51 ` Rich Felker