mailing list of musl libc
 help / color / mirror / code / Atom feed
* ARM atomics overhaul for musl
@ 2014-11-16  5:56 Rich Felker
  2014-11-16 16:33 ` Russell King - ARM Linux
  2014-11-16 22:33 ` Jens Gustedt
  0 siblings, 2 replies; 28+ messages in thread
From: Rich Felker @ 2014-11-16  5:56 UTC (permalink / raw)
  To: musl
  Cc: Andy Lutomirski, Russell King - ARM Linux, Szabolcs Nagy,
	Kees Cook, linux-arm-kernel

One item on the agenda for this release cycle is overhauling the way
atomics are done on ARM. I'm cc'ing people who have been involved in
this discussion in the past in case anyone's not on the musl list and
has opinions about what should be done.

The current situation looks like the following:

Pre-v6: Hard-coded to use cas from kuser_helper page (0xffff0fc0)

v6: Hard-coded to use ldrex/strex with mcr-based barrier

v7+: Hard-coded to use ldrex/strex with dmb-based barrier

In the cases where ldrex/strex are used directly, they're still not
used optimally; all the non-cas primitives like atomic inc/dec are
built on top of cas and thus have more loop complexity and probably
more barriers than they should.

Aside from that, the only case among the above that's "right" already
is v7+. Hard-coding the mcr-based barrier on v6 is wrong because it's
deprecated (future models may not support the instruction, and
although the kernel could trap and emulate it this would be horribly
slow) and hard-coding kuser helper on pre-v6 is wrong because pre-v6
binaries might run on v6+ hardware and kernel where the kernel has
been built with the kuser_helper page removed for security.

My main goals for this overhaul are:

1. Make baseline (pre-v6) binaries truely universal so they run even
   on kernels with kuser_helper removed.

2. Make v7+ perform competitively. This means optimal code sequences
   for a_cas, a_swap, a_fetch_add, a_store, etc. rather than just
   doing everything with a_cas.

What's still not entirely clear is what to do with v6, and how goal #1
should be achieved. The options are basically:

A. Prefer using ldrex/strex and an appropriate barrier directly, but
   fall back to kuser_helper (assuming it's present) if the hwcap or
   similar does not indicate availability of atomics.

B. Prefer kuser_helper and and only fallback to using atomics and an
   appropriate barrier directly if kuser_helper page is missing.

Of these two approaches, A seems easier, because it's easier to know
that atomics are available (via HWCAP_TLS) than that kuser_helper is
(which requires some sort of probe for the mapping if we want to
support grsec kernels where the mapping is completely missing; if not,
we can just check the kuser version number at a fixed address).
However neither is really very easy because it seems impossible to
detect whether the mcr-based barrier or the dmb-based barrier should
be used -- there's no hwcap flag to indicate support for the latter.
This also complicates what to do in builds for v6.

Before proceeding, I think we need some sort of proposed way to detect
the availability of dmb. If there really is none, we probably need to
go with option B (prefer kuser_helper) for both pre-v6 and v6 (i.e.
only use atomics directly on v7+) and choose what to do when
kuser_helper is missing: either assume v7+ and use dmb, or assume that
the mcr barrier is still working and use it. I think I would lean
towards the latter.

Rich


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ARM atomics overhaul for musl
  2014-11-16  5:56 ARM atomics overhaul for musl Rich Felker
@ 2014-11-16 16:33 ` Russell King - ARM Linux
  2014-11-16 16:50   ` Rich Felker
  2014-11-17 11:48   ` Catalin Marinas
  2014-11-16 22:33 ` Jens Gustedt
  1 sibling, 2 replies; 28+ messages in thread
From: Russell King - ARM Linux @ 2014-11-16 16:33 UTC (permalink / raw)
  To: Rich Felker
  Cc: musl, Andy Lutomirski, Szabolcs Nagy, Kees Cook, linux-arm-kernel

On Sun, Nov 16, 2014 at 12:56:56AM -0500, Rich Felker wrote:
> Aside from that, the only case among the above that's "right" already
> is v7+. Hard-coding the mcr-based barrier on v6 is wrong because it's

I don't think it's wrong at all.  The instruction isn't going away from
ARMv7, because ARMv7 deprecates it, but it _still_ has to be implemented
by a CPU conforming to ARMv7.  As ARMv7 is going to be the last 32-bit
ARM architecture, we aren't going to see the MCR instruction disappearing
on 32-bit CPUs.

On ARMv8, it may have been removed, but we have already decided that the
kernel _must_ provide emulation for this op-code, because otherwise we
are breaking existing userspace, which is just not permissible.  However,
you are absolutely right that running on ARMv8 should use the new
instruction where possible.

> However neither is really very easy because it seems impossible to
> detect whether the mcr-based barrier or the dmb-based barrier should
> be used -- there's no hwcap flag to indicate support for the latter.
> This also complicates what to do in builds for v6.

It is entirely possible to detect whether you should use mcr or dmb, and
you've said how to do that all the way through this message.  The mcr
instruction is present on ARMv6, and present but deprecated on ARMv7.
dmb is only present on ARMv7.  So, if you know the CPU architecture, you
know whether you should be using nothing, mcr, or dmb.

There's two ways to get that - firstly, the uname syscall, which gives
a string in the form "armv..." which gives the CPU architecture.  The
second way is the ELF AT_PLATFORM entry.  AT_PLATFORM has well defined
format, and is already used to select between different library versions
(so is already a user API, and is subject to user API rules).  See:

$ grep string.*elf_name arch/arm/mm/proc*.S

for a list of the prefixes - the last character is always the endian-ness.
From that, you can see that the format is "v" (for version), then the CPU
architecture number, followed (optionally) by any suffixes.  Parse that
wisely, and you have the CPU architecture version, and the CPU architecture
version defines whether the MCR or DMB variant should be used.

See http://lwn.net/Articles/519085/ for a way to get at the ELF aux info
with recent glibc.  I'm sure other C libraries will be getting their own
implementation of that for compatibility with glibc.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ARM atomics overhaul for musl
  2014-11-16 16:33 ` Russell King - ARM Linux
@ 2014-11-16 16:50   ` Rich Felker
  2014-11-16 17:10     ` Russell King - ARM Linux
  2014-11-17 11:48   ` Catalin Marinas
  1 sibling, 1 reply; 28+ messages in thread
From: Rich Felker @ 2014-11-16 16:50 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: musl, Andy Lutomirski, Szabolcs Nagy, Kees Cook, linux-arm-kernel

On Sun, Nov 16, 2014 at 04:33:56PM +0000, Russell King - ARM Linux wrote:
> On Sun, Nov 16, 2014 at 12:56:56AM -0500, Rich Felker wrote:
> > Aside from that, the only case among the above that's "right" already
> > is v7+. Hard-coding the mcr-based barrier on v6 is wrong because it's
> 
> I don't think it's wrong at all.  The instruction isn't going away from
> ARMv7, because ARMv7 deprecates it, but it _still_ has to be implemented
> by a CPU conforming to ARMv7.  As ARMv7 is going to be the last 32-bit
> ARM architecture, we aren't going to see the MCR instruction disappearing
> on 32-bit CPUs.
> 
> On ARMv8, it may have been removed, but we have already decided that the
> kernel _must_ provide emulation for this op-code, because otherwise we
> are breaking existing userspace, which is just not permissible.  However,
> you are absolutely right that running on ARMv8 should use the new
> instruction where possible.

Thanks for the clarification on the current and intended future
compatibility status!

Emulation by the kernel would be something like 100x slower though,
no? While it's better than not working at all, I think that would be a
good argument for never using mcr explicitly unless either it's known
to be supported in hardware or there's no alternative (because kuser
helper is missing).

> > However neither is really very easy because it seems impossible to
> > detect whether the mcr-based barrier or the dmb-based barrier should
> > be used -- there's no hwcap flag to indicate support for the latter.
> > This also complicates what to do in builds for v6.
> 
> It is entirely possible to detect whether you should use mcr or dmb, and
> you've said how to do that all the way through this message.  The mcr
> instruction is present on ARMv6, and present but deprecated on ARMv7.
> dmb is only present on ARMv7.  So, if you know the CPU architecture, you
> know whether you should be using nothing, mcr, or dmb.
> 
> There's two ways to get that - firstly, the uname syscall, which gives
> a string in the form "armv..." which gives the CPU architecture.  The

Isn't it clear from the "Windows 10" fiasco that strcmp on a version
string is NOT an acceptable way to determine version/capabilities?

> second way is the ELF AT_PLATFORM entry.  AT_PLATFORM has well defined
> format, and is already used to select between different library versions
> (so is already a user API, and is subject to user API rules).  See:
> 
> $ grep string.*elf_name arch/arm/mm/proc*.S
> 
> for a list of the prefixes - the last character is always the endian-ness.
> >From that, you can see that the format is "v" (for version), then the CPU
> architecture number, followed (optionally) by any suffixes.  Parse that
> wisely, and you have the CPU architecture version, and the CPU architecture
> version defines whether the MCR or DMB variant should be used.

That seems much more acceptable to use.

> See http://lwn.net/Articles/519085/ for a way to get at the ELF aux info
> with recent glibc.  I'm sure other C libraries will be getting their own
> implementation of that for compatibility with glibc.

Yes, we have access to the aux vector, so this should work in
principle.

Rich


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ARM atomics overhaul for musl
  2014-11-16 16:50   ` Rich Felker
@ 2014-11-16 17:10     ` Russell King - ARM Linux
  2014-11-16 18:27       ` Andy Lutomirski
                         ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Russell King - ARM Linux @ 2014-11-16 17:10 UTC (permalink / raw)
  To: Rich Felker
  Cc: musl, Andy Lutomirski, Szabolcs Nagy, Kees Cook, linux-arm-kernel

On Sun, Nov 16, 2014 at 11:50:17AM -0500, Rich Felker wrote:
> On Sun, Nov 16, 2014 at 04:33:56PM +0000, Russell King - ARM Linux wrote:
> > On Sun, Nov 16, 2014 at 12:56:56AM -0500, Rich Felker wrote:
> > > Aside from that, the only case among the above that's "right" already
> > > is v7+. Hard-coding the mcr-based barrier on v6 is wrong because it's
> > 
> > I don't think it's wrong at all.  The instruction isn't going away from
> > ARMv7, because ARMv7 deprecates it, but it _still_ has to be implemented
> > by a CPU conforming to ARMv7.  As ARMv7 is going to be the last 32-bit
> > ARM architecture, we aren't going to see the MCR instruction disappearing
> > on 32-bit CPUs.
> > 
> > On ARMv8, it may have been removed, but we have already decided that the
> > kernel _must_ provide emulation for this op-code, because otherwise we
> > are breaking existing userspace, which is just not permissible.  However,
> > you are absolutely right that running on ARMv8 should use the new
> > instruction where possible.
> 
> Thanks for the clarification on the current and intended future
> compatibility status!
> 
> Emulation by the kernel would be something like 100x slower though,
> no? While it's better than not working at all, I think that would be a
> good argument for never using mcr explicitly unless either it's known
> to be supported in hardware or there's no alternative (because kuser
> helper is missing).

Right, and that is "ARMv8 or later".

> > > However neither is really very easy because it seems impossible to
> > > detect whether the mcr-based barrier or the dmb-based barrier should
> > > be used -- there's no hwcap flag to indicate support for the latter.
> > > This also complicates what to do in builds for v6.
> > 
> > It is entirely possible to detect whether you should use mcr or dmb, and
> > you've said how to do that all the way through this message.  The mcr
> > instruction is present on ARMv6, and present but deprecated on ARMv7.
> > dmb is only present on ARMv7.  So, if you know the CPU architecture, you
> > know whether you should be using nothing, mcr, or dmb.
> > 
> > There's two ways to get that - firstly, the uname syscall, which gives
> > a string in the form "armv..." which gives the CPU architecture.  The
> 
> Isn't it clear from the "Windows 10" fiasco that strcmp on a version
> string is NOT an acceptable way to determine version/capabilities?

Would there be a "Windows 10" fiasco if there had been better control of
the version numbering?  No.

However, this is already in use as a CPU architecture thing.  It's had a
/very/ long history of being used by package managers to detect which
packages are suitable for installation on a platform, whether it be an
x86 platform, PowerPC, or ARM platform.

> > second way is the ELF AT_PLATFORM entry.  AT_PLATFORM has well defined
> > format, and is already used to select between different library versions
> > (so is already a user API, and is subject to user API rules).  See:
> > 
> > $ grep string.*elf_name arch/arm/mm/proc*.S
> > 
> > for a list of the prefixes - the last character is always the endian-ness.
> > >From that, you can see that the format is "v" (for version), then the CPU
> > architecture number, followed (optionally) by any suffixes.  Parse that
> > wisely, and you have the CPU architecture version, and the CPU architecture
> > version defines whether the MCR or DMB variant should be used.
> 
> That seems much more acceptable to use.
> 
> > See http://lwn.net/Articles/519085/ for a way to get at the ELF aux info
> > with recent glibc.  I'm sure other C libraries will be getting their own
> > implementation of that for compatibility with glibc.
> 
> Yes, we have access to the aux vector, so this should work in
> principle.

In both of these cases, we know that:
- ARMv1-ARMv3 is no longer supported (for several years)
- ARMv4 and ARMv5 do not have either the MCR or DMB instructions.
- ARMv6 has the MCR instruction only
- ARMv7 has the MCR instruction and the DMB instruction.
- ARMv8 has the DMB instruction, and MCR emulation.

A safe bet would be that DMB is going to be there in the future (if that
goes, then the ARM architecture will be regarded as even more of a toy
architecture by Linus than he already regards it today, and he'll probably
stop giving a damn about whether any changes break ARM.)

Now, there is a twist here: ARM64 decided to use an ELF platform string
of "aarch64" for everything, which means that rather than encoding the
CPU architecture (like with every other Linux architecture), we have a
string which encodes the kernel architecture instead, which is absurd.
Obviously, the plan for ARM64 is that there will never be an ARMv9
architecture, and ARMv8 is the perfect architecture for the future. :p

So, a reasonable parsing of this would be:

	const char *ptr;
	int architecture;

	ptr = (const char *)(uintptr_t)getauxval(AT_PLATFORM);
	assert(ptr);

	if (!strncmp(ptr, "aarch64", 7))
		architecture = 8;
	else
		assert(sscanf(ptr, "v%d", &architecture) == 1);

	switch (architecture) {
	case 4:
	case 5:
		no_mcr_dmb;
		break;
	case 6:
		use_mcr;
		break;
	default:
		use_dmb;
		break;
	}

That will be safe - we can't really predict what future architectures will
do, but as I say above, if dmb vanishes in future with a preference for
yet another different method, I think the ARM architecture will be laughed
at even more than it is today.

Before this is finalised, I think the ARM64 maintainers need to have a long
think about the wiseness of their existing AT_PLATFORM string, and consider
whether they have created something of a cockup there.  But that's /their/
problem, it isn't an ARM32 problem, on ARM32 this is the solution which
should be used.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ARM atomics overhaul for musl
  2014-11-16 17:10     ` Russell King - ARM Linux
@ 2014-11-16 18:27       ` Andy Lutomirski
  2014-11-16 18:56         ` Rich Felker
  2014-11-16 19:02       ` Rich Felker
  2014-11-17 13:54       ` Catalin Marinas
  2 siblings, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2014-11-16 18:27 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Rich Felker, musl, Szabolcs Nagy, Kees Cook, linux-arm-kernel

On Sun, Nov 16, 2014 at 9:10 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Sun, Nov 16, 2014 at 11:50:17AM -0500, Rich Felker wrote:
>> On Sun, Nov 16, 2014 at 04:33:56PM +0000, Russell King - ARM Linux wrote:
>> > On Sun, Nov 16, 2014 at 12:56:56AM -0500, Rich Felker wrote:
>> > > Aside from that, the only case among the above that's "right" already
>> > > is v7+. Hard-coding the mcr-based barrier on v6 is wrong because it's
>> >
>> > I don't think it's wrong at all.  The instruction isn't going away from
>> > ARMv7, because ARMv7 deprecates it, but it _still_ has to be implemented
>> > by a CPU conforming to ARMv7.  As ARMv7 is going to be the last 32-bit
>> > ARM architecture, we aren't going to see the MCR instruction disappearing
>> > on 32-bit CPUs.
>> >
>> > On ARMv8, it may have been removed, but we have already decided that the
>> > kernel _must_ provide emulation for this op-code, because otherwise we
>> > are breaking existing userspace, which is just not permissible.  However,
>> > you are absolutely right that running on ARMv8 should use the new
>> > instruction where possible.
>>
>> Thanks for the clarification on the current and intended future
>> compatibility status!
>>
>> Emulation by the kernel would be something like 100x slower though,
>> no? While it's better than not working at all, I think that would be a
>> good argument for never using mcr explicitly unless either it's known
>> to be supported in hardware or there's no alternative (because kuser
>> helper is missing).
>
> Right, and that is "ARMv8 or later".
>
>> > > However neither is really very easy because it seems impossible to
>> > > detect whether the mcr-based barrier or the dmb-based barrier should
>> > > be used -- there's no hwcap flag to indicate support for the latter.
>> > > This also complicates what to do in builds for v6.
>> >
>> > It is entirely possible to detect whether you should use mcr or dmb, and
>> > you've said how to do that all the way through this message.  The mcr
>> > instruction is present on ARMv6, and present but deprecated on ARMv7.
>> > dmb is only present on ARMv7.  So, if you know the CPU architecture, you
>> > know whether you should be using nothing, mcr, or dmb.
>> >
>> > There's two ways to get that - firstly, the uname syscall, which gives
>> > a string in the form "armv..." which gives the CPU architecture.  The
>>
>> Isn't it clear from the "Windows 10" fiasco that strcmp on a version
>> string is NOT an acceptable way to determine version/capabilities?
>
> Would there be a "Windows 10" fiasco if there had been better control of
> the version numbering?  No.
>
> However, this is already in use as a CPU architecture thing.  It's had a
> /very/ long history of being used by package managers to detect which
> packages are suitable for installation on a platform, whether it be an
> x86 platform, PowerPC, or ARM platform.
>
>> > second way is the ELF AT_PLATFORM entry.  AT_PLATFORM has well defined
>> > format, and is already used to select between different library versions
>> > (so is already a user API, and is subject to user API rules).  See:
>> >
>> > $ grep string.*elf_name arch/arm/mm/proc*.S
>> >
>> > for a list of the prefixes - the last character is always the endian-ness.
>> > >From that, you can see that the format is "v" (for version), then the CPU
>> > architecture number, followed (optionally) by any suffixes.  Parse that
>> > wisely, and you have the CPU architecture version, and the CPU architecture
>> > version defines whether the MCR or DMB variant should be used.
>>
>> That seems much more acceptable to use.
>>
>> > See http://lwn.net/Articles/519085/ for a way to get at the ELF aux info
>> > with recent glibc.  I'm sure other C libraries will be getting their own
>> > implementation of that for compatibility with glibc.
>>
>> Yes, we have access to the aux vector, so this should work in
>> principle.
>
> In both of these cases, we know that:
> - ARMv1-ARMv3 is no longer supported (for several years)
> - ARMv4 and ARMv5 do not have either the MCR or DMB instructions.
> - ARMv6 has the MCR instruction only
> - ARMv7 has the MCR instruction and the DMB instruction.
> - ARMv8 has the DMB instruction, and MCR emulation.
>
> A safe bet would be that DMB is going to be there in the future (if that
> goes, then the ARM architecture will be regarded as even more of a toy
> architecture by Linus than he already regards it today, and he'll probably
> stop giving a damn about whether any changes break ARM.)
>
> Now, there is a twist here: ARM64 decided to use an ELF platform string
> of "aarch64" for everything, which means that rather than encoding the
> CPU architecture (like with every other Linux architecture), we have a
> string which encodes the kernel architecture instead, which is absurd.
> Obviously, the plan for ARM64 is that there will never be an ARMv9
> architecture, and ARMv8 is the perfect architecture for the future. :p
>
> So, a reasonable parsing of this would be:
>
>         const char *ptr;
>         int architecture;
>
>         ptr = (const char *)(uintptr_t)getauxval(AT_PLATFORM);
>         assert(ptr);
>
>         if (!strncmp(ptr, "aarch64", 7))
>                 architecture = 8;
>         else
>                 assert(sscanf(ptr, "v%d", &architecture) == 1);
>
>         switch (architecture) {
>         case 4:
>         case 5:
>                 no_mcr_dmb;
>                 break;
>         case 6:
>                 use_mcr;
>                 break;
>         default:
>                 use_dmb;
>                 break;
>         }
>
> That will be safe - we can't really predict what future architectures will
> do, but as I say above, if dmb vanishes in future with a preference for
> yet another different method, I think the ARM architecture will be laughed
> at even more than it is today.
>
> Before this is finalised, I think the ARM64 maintainers need to have a long
> think about the wiseness of their existing AT_PLATFORM string, and consider
> whether they have created something of a cockup there.  But that's /their/
> problem, it isn't an ARM32 problem, on ARM32 this is the solution which
> should be used.

Would it make sense for arm and arm64 to add bits for these features
to AT_HWCAP, along with an extra bit indicating that the kernel
provides these bits?

--Andy


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ARM atomics overhaul for musl
  2014-11-16 18:27       ` Andy Lutomirski
@ 2014-11-16 18:56         ` Rich Felker
  0 siblings, 0 replies; 28+ messages in thread
From: Rich Felker @ 2014-11-16 18:56 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Russell King - ARM Linux, musl, Szabolcs Nagy, Kees Cook,
	linux-arm-kernel

On Sun, Nov 16, 2014 at 10:27:04AM -0800, Andy Lutomirski wrote:
> Would it make sense for arm and arm64 to add bits for these features
> to AT_HWCAP, along with an extra bit indicating that the kernel
> provides these bits?

Sadly since it wasn't available there from the beginning, I don't
think there would be a lot of benefit in adding it now, but it
wouldn't hurt.

It might be useful if there's a risk that the existing methods will
break in the future; adding it now would ensure that there are only a
known finite set of kernels for which the old hackish string methods
need to be used, so that there's no concern about their compatibility
with future kernels/models.

Rich


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ARM atomics overhaul for musl
  2014-11-16 17:10     ` Russell King - ARM Linux
  2014-11-16 18:27       ` Andy Lutomirski
@ 2014-11-16 19:02       ` Rich Felker
  2014-11-17 13:54       ` Catalin Marinas
  2 siblings, 0 replies; 28+ messages in thread
From: Rich Felker @ 2014-11-16 19:02 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: musl, Andy Lutomirski, Szabolcs Nagy, Kees Cook, linux-arm-kernel

On Sun, Nov 16, 2014 at 05:10:55PM +0000, Russell King - ARM Linux wrote:
> > > There's two ways to get that - firstly, the uname syscall, which gives
> > > a string in the form "armv..." which gives the CPU architecture.  The
> > 
> > Isn't it clear from the "Windows 10" fiasco that strcmp on a version
> > string is NOT an acceptable way to determine version/capabilities?
> 
> Would there be a "Windows 10" fiasco if there had been better control of
> the version numbering?  No.
> 
> However, this is already in use as a CPU architecture thing.  It's had a
> /very/ long history of being used by package managers to detect which
> packages are suitable for installation on a platform, whether it be an
> x86 platform, PowerPC, or ARM platform.

Use by package managers (which can be upgraded independently, and
which can, in the worst case, be overridden anyway) and by program
binaries for which you might not even have source are very different
issues.

> In both of these cases, we know that:
> - ARMv1-ARMv3 is no longer supported (for several years)
> - ARMv4 and ARMv5 do not have either the MCR or DMB instructions.
> - ARMv6 has the MCR instruction only
> - ARMv7 has the MCR instruction and the DMB instruction.
> - ARMv8 has the DMB instruction, and MCR emulation.
> 
> A safe bet would be that DMB is going to be there in the future (if that
> goes, then the ARM architecture will be regarded as even more of a toy
> architecture by Linus than he already regards it today, and he'll probably
> stop giving a damn about whether any changes break ARM.)

Yes, I think that's reasonable.

> Now, there is a twist here: ARM64 decided to use an ELF platform string
> of "aarch64" for everything, which means that rather than encoding the
> CPU architecture (like with every other Linux architecture), we have a
> string which encodes the kernel architecture instead, which is absurd.
> Obviously, the plan for ARM64 is that there will never be an ARMv9
> architecture, and ARMv8 is the perfect architecture for the future. :p

I'm confused. Does this mean that 32-bit binaries running on a 64-bit
kernel are going to see "aarch64" here?

> So, a reasonable parsing of this would be:
> 
> 	const char *ptr;
> 	int architecture;
> 
> 	ptr = (const char *)(uintptr_t)getauxval(AT_PLATFORM);
> 	assert(ptr);
> 
> 	if (!strncmp(ptr, "aarch64", 7))
> 		architecture = 8;
> 	else
> 		assert(sscanf(ptr, "v%d", &architecture) == 1);
> 
> 	switch (architecture) {
> 	case 4:
> 	case 5:
> 		no_mcr_dmb;
> 		break;
> 	case 6:
> 		use_mcr;
> 		break;
> 	default:
> 		use_dmb;
> 		break;
> 	}

Is (ptr[1]=='6' && !isdigit(ptr[2])) a safe condition for v6? v4/v5
(and original v6 without the k) don't need to be detected at all since
kuser is mandatory for them and already indicated by !(hwcap &
HWCAP_TLS).

Rich


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ARM atomics overhaul for musl
  2014-11-16  5:56 ARM atomics overhaul for musl Rich Felker
  2014-11-16 16:33 ` Russell King - ARM Linux
@ 2014-11-16 22:33 ` Jens Gustedt
  2014-11-16 23:23   ` Rich Felker
  1 sibling, 1 reply; 28+ messages in thread
From: Jens Gustedt @ 2014-11-16 22:33 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1236 bytes --]

Hello,

Am Sonntag, den 16.11.2014, 00:56 -0500 schrieb Rich Felker:
> One item on the agenda for this release cycle is overhauling the way
> atomics are done on ARM. I'm cc'ing people who have been involved in
> this discussion in the past in case anyone's not on the musl list and
> has opinions about what should be done.
> 
> The current situation looks like the following: ...

I don't know enough about the nasty details of this architecture to be
helpful, I think. But what I'd very much like to have is some sort of
documentation or standards concerning memory ordering for the atomics
that we use internally. And also about which OS features are
needed/missing to make atomic operations appear stateless (AKA
"lockfree" in C11 terminology).

Since this is the most complicated architecture (or merely family of
architectures) this is probably the best to start such a reflection.

Thanks

Jens

-- 
:: INRIA Nancy Grand Est ::: AlGorille ::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::





[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ARM atomics overhaul for musl
  2014-11-16 22:33 ` Jens Gustedt
@ 2014-11-16 23:23   ` Rich Felker
  0 siblings, 0 replies; 28+ messages in thread
From: Rich Felker @ 2014-11-16 23:23 UTC (permalink / raw)
  To: musl

On Sun, Nov 16, 2014 at 11:33:15PM +0100, Jens Gustedt wrote:
> Hello,
> 
> Am Sonntag, den 16.11.2014, 00:56 -0500 schrieb Rich Felker:
> > One item on the agenda for this release cycle is overhauling the way
> > atomics are done on ARM. I'm cc'ing people who have been involved in
> > this discussion in the past in case anyone's not on the musl list and
> > has opinions about what should be done.
> > 
> > The current situation looks like the following: ...
> 
> I don't know enough about the nasty details of this architecture to be
> helpful, I think. But what I'd very much like to have is some sort of
> documentation or standards concerning memory ordering for the atomics
> that we use internally.

At present, the assumptions made about musl's atomic primitives used
internally is that they meet the POSIX requirement for synchronizing
memory. They are at least acquire+release barriers. Assuming a POSIX
memory model that does not have atomic objects and where you can only
access memory when simultaneous modification is excluded by
synchronizing functions, I think this is equivalent to sequential
consistency, but it's not necessarily equivalent when the application
can access atomic objects itself. Does this sound correct?

> And also about which OS features are
> needed/missing to make atomic operations appear stateless (AKA
> "lockfree" in C11 terminology).

This is purely dependent on having a hardware CAS of the correct size.
musl requires int- and long/pointer-sized CAS, and IMO it's impossible
to implement POSIX correctly without them (of course they could be
emulated by kernel blocking interrupts and shutting down all but one
core temporarily).

> Since this is the most complicated architecture (or merely family of
> architectures) this is probably the best to start such a reflection.

The complexities being discussed here are complexities in the
instruction set architecture and the kernel's failure to report the
particular variant in use in a reasonable way. The memory model is
just a pretty standard relaxed-order.

Rich


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ARM atomics overhaul for musl
  2014-11-16 16:33 ` Russell King - ARM Linux
  2014-11-16 16:50   ` Rich Felker
@ 2014-11-17 11:48   ` Catalin Marinas
  2014-11-17 12:21     ` Arnd Bergmann
  1 sibling, 1 reply; 28+ messages in thread
From: Catalin Marinas @ 2014-11-17 11:48 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Szabolcs Nagy, Rich Felker, Kees Cook, musl, Andy Lutomirski,
	linux-arm-kernel

On Sun, Nov 16, 2014 at 04:33:56PM +0000, Russell King - ARM Linux wrote:
> On Sun, Nov 16, 2014 at 12:56:56AM -0500, Rich Felker wrote:
> > Aside from that, the only case among the above that's "right" already
> > is v7+. Hard-coding the mcr-based barrier on v6 is wrong because it's
> 
> I don't think it's wrong at all.  The instruction isn't going away from
> ARMv7, because ARMv7 deprecates it, but it _still_ has to be implemented
> by a CPU conforming to ARMv7.  As ARMv7 is going to be the last 32-bit
> ARM architecture, we aren't going to see the MCR instruction disappearing
> on 32-bit CPUs.

You are wrong here. ARMv8-A supports 32-bit at all levels. ARMv8-R is
32-bit only (and it even has an MMU at EL1). And there is a slight
chance that we may even see 32-bit only ARMv8-A implementations (I'm not
really giving a hint and I'm not aware of any but I don't see anything
preventing this, it's all marketing driven).

http://www.arm.com/products/processors/instruction-set-architectures/armv8-r-architecture.php

> On ARMv8, it may have been removed, but we have already decided that the
> kernel _must_ provide emulation for this op-code, because otherwise we
> are breaking existing userspace, which is just not permissible.  However,
> you are absolutely right that running on ARMv8 should use the new
> instruction where possible.

Even on ARMv8 we could enable CP15 barriers in hardware, they are just
deprecated but haven't been removed (yet). What I'm pushing for, though
it's not easy, is that the hardware just deprecates such instructions
for performance rather than removing them entirely. This would make them
faster than emulation but I fully agree with you that the new
instructions should be used where possible.

-- 
Catalin

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ARM atomics overhaul for musl
  2014-11-17 11:48   ` Catalin Marinas
@ 2014-11-17 12:21     ` Arnd Bergmann
  2014-11-17 13:30       ` Szabolcs Nagy
  0 siblings, 1 reply; 28+ messages in thread
From: Arnd Bergmann @ 2014-11-17 12:21 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Szabolcs Nagy, Rich Felker, Russell King - ARM Linux, Kees Cook,
	Catalin Marinas, musl, Andy Lutomirski

On Monday 17 November 2014 11:48:33 Catalin Marinas wrote:
> On Sun, Nov 16, 2014 at 04:33:56PM +0000, Russell King - ARM Linux wrote:
> > On Sun, Nov 16, 2014 at 12:56:56AM -0500, Rich Felker wrote:
> > > Aside from that, the only case among the above that's "right" already
> > > is v7+. Hard-coding the mcr-based barrier on v6 is wrong because it's
> > 
> > I don't think it's wrong at all.  The instruction isn't going away from
> > ARMv7, because ARMv7 deprecates it, but it _still_ has to be implemented
> > by a CPU conforming to ARMv7.  As ARMv7 is going to be the last 32-bit
> > ARM architecture, we aren't going to see the MCR instruction disappearing
> > on 32-bit CPUs.
> 
> You are wrong here. ARMv8-A supports 32-bit at all levels. ARMv8-R is
> 32-bit only (and it even has an MMU at EL1). And there is a slight
> chance that we may even see 32-bit only ARMv8-A implementations (I'm not
> really giving a hint and I'm not aware of any but I don't see anything
> preventing this, it's all marketing driven).

FWIW, both Samsung EXYNOS and Qualcomm Snapdragon SoCs based on Cortex-A53
have been shipped in 32-bit only devices.

	Arnd

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ARM atomics overhaul for musl
  2014-11-17 12:21     ` Arnd Bergmann
@ 2014-11-17 13:30       ` Szabolcs Nagy
  2014-11-17 14:34         ` Catalin Marinas
  0 siblings, 1 reply; 28+ messages in thread
From: Szabolcs Nagy @ 2014-11-17 13:30 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel, Catalin Marinas, Russell King - ARM Linux,
	Rich Felker, Kees Cook, musl, Andy Lutomirski

* Arnd Bergmann <arnd@arndb.de> [2014-11-17 13:21:03 +0100]:
> On Monday 17 November 2014 11:48:33 Catalin Marinas wrote:
> > On Sun, Nov 16, 2014 at 04:33:56PM +0000, Russell King - ARM Linux wrote:
> > > On Sun, Nov 16, 2014 at 12:56:56AM -0500, Rich Felker wrote:
> > > > Aside from that, the only case among the above that's "right" already
> > > > is v7+. Hard-coding the mcr-based barrier on v6 is wrong because it's
> > > 
> > > I don't think it's wrong at all.  The instruction isn't going away from
> > > ARMv7, because ARMv7 deprecates it, but it _still_ has to be implemented
> > > by a CPU conforming to ARMv7.  As ARMv7 is going to be the last 32-bit
> > > ARM architecture, we aren't going to see the MCR instruction disappearing
> > > on 32-bit CPUs.
> > 
> > You are wrong here. ARMv8-A supports 32-bit at all levels. ARMv8-R is
> > 32-bit only (and it even has an MMU at EL1). And there is a slight
> > chance that we may even see 32-bit only ARMv8-A implementations (I'm not
> > really giving a hint and I'm not aware of any but I don't see anything
> > preventing this, it's all marketing driven).
> 
> FWIW, both Samsung EXYNOS and Qualcomm Snapdragon SoCs based on Cortex-A53
> have been shipped in 32-bit only devices.
> 

ARMv8-A manual talks about two execution sates:
- aarch64 with 64 bit registers and A64 instruction set
- aarch32 with 32 bit registers and A32 or T32 instruction sets

(i thought an armv8-a cpu must support both but that is not
relevant to userspace)

for userspace the two states are different architectures
so i guess for libc aarch32 backward compatibility is the
interesting question (does armv7 instructions, syscalls, elf
abi work on aarch32) and how to recognize it when its new
features can be used in the libc

if aarch32 has cp15 barrier then that is an option for portable
binaries and the other approach is runtime dispatch but then libc
needs a reliable check for >=armv7


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ARM atomics overhaul for musl
  2014-11-16 17:10     ` Russell King - ARM Linux
  2014-11-16 18:27       ` Andy Lutomirski
  2014-11-16 19:02       ` Rich Felker
@ 2014-11-17 13:54       ` Catalin Marinas
  2014-11-17 14:11         ` Szabolcs Nagy
  2014-11-17 14:39         ` Russell King - ARM Linux
  2 siblings, 2 replies; 28+ messages in thread
From: Catalin Marinas @ 2014-11-17 13:54 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Rich Felker, Szabolcs Nagy, musl, Kees Cook, linux-arm-kernel,
	Andy Lutomirski

On Sun, Nov 16, 2014 at 05:10:55PM +0000, Russell King - ARM Linux wrote:
> On Sun, Nov 16, 2014 at 11:50:17AM -0500, Rich Felker wrote:
> > On Sun, Nov 16, 2014 at 04:33:56PM +0000, Russell King - ARM Linux wrote:
> > > second way is the ELF AT_PLATFORM entry.  AT_PLATFORM has well defined
> > > format, and is already used to select between different library versions
> > > (so is already a user API, and is subject to user API rules).  See:
> > > 
> > > $ grep string.*elf_name arch/arm/mm/proc*.S
> > > 
> > > for a list of the prefixes - the last character is always the endian-ness.
> > > >From that, you can see that the format is "v" (for version), then the CPU
> > > architecture number, followed (optionally) by any suffixes.  Parse that
> > > wisely, and you have the CPU architecture version, and the CPU architecture
> > > version defines whether the MCR or DMB variant should be used.
> > 
> > That seems much more acceptable to use.
> > 
> > > See http://lwn.net/Articles/519085/ for a way to get at the ELF aux info
> > > with recent glibc.  I'm sure other C libraries will be getting their own
> > > implementation of that for compatibility with glibc.
> > 
> > Yes, we have access to the aux vector, so this should work in
> > principle.
> 
> In both of these cases, we know that:
> - ARMv1-ARMv3 is no longer supported (for several years)
> - ARMv4 and ARMv5 do not have either the MCR or DMB instructions.
> - ARMv6 has the MCR instruction only
> - ARMv7 has the MCR instruction and the DMB instruction.
> - ARMv8 has the DMB instruction, and MCR emulation.

MCR can be enabled in hardware on ARMv8 (SCTLR_EL1 bit), though there is
no guarantee that it is as fast as the DMB (normally I don't see a
reason why it wouldn't, it's just instruction decoding problem but you
never know what the microarchitecture does).

> Now, there is a twist here: ARM64 decided to use an ELF platform string
> of "aarch64" for everything,

Please define "everything". This matches the ELF name as defined in the
ARM 64-bit ELF ABI.

> which means that rather than encoding the
> CPU architecture (like with every other Linux architecture), we have a
> string which encodes the kernel architecture instead, which is absurd.

Just like x86_64 vs i686?

> Obviously, the plan for ARM64 is that there will never be an ARMv9
> architecture, and ARMv8 is the perfect architecture for the future. :p

If you haven't noticed, the distinction between ARMv6 and ARMv7 has been
blurred enough (guess why cpu_architecture() reports ARMv7 for
ARM11MPCore). ARM is trying to move away from architecture version
numbers, which are rather useful for marketing, to proper feature
detection based on CPUID. Whether there is an ARMv9 or not, it's
irrelevant to what Linux should do (i.e. use CPUID rather than guess
features based on architecture version numbers).

> So, a reasonable parsing of this would be:
> 
> 	const char *ptr;
> 	int architecture;
> 
> 	ptr = (const char *)(uintptr_t)getauxval(AT_PLATFORM);
> 	assert(ptr);
> 
> 	if (!strncmp(ptr, "aarch64", 7))
> 		architecture = 8;
> 	else
> 		assert(sscanf(ptr, "v%d", &architecture) == 1);

Oh, have you even bothered trying 32-bit (compat) getauxval(AT_PLATFORM)
on an aarch64 kernel? It reports "v8l", so please don't confuse others.

> Before this is finalised, I think the ARM64 maintainers need to have a long
> think about the wiseness of their existing AT_PLATFORM string, and consider
> whether they have created something of a cockup there.

We had a think long time ago already and it was a wise decision. FWIW,
it matches x86 in this respect.

-- 
Catalin


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ARM atomics overhaul for musl
  2014-11-17 13:54       ` Catalin Marinas
@ 2014-11-17 14:11         ` Szabolcs Nagy
  2014-11-17 14:47           ` Catalin Marinas
  2014-11-17 14:39         ` Russell King - ARM Linux
  1 sibling, 1 reply; 28+ messages in thread
From: Szabolcs Nagy @ 2014-11-17 14:11 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Russell King - ARM Linux, Rich Felker, musl, Kees Cook,
	linux-arm-kernel, Andy Lutomirski

* Catalin Marinas <catalin.marinas@arm.com> [2014-11-17 13:54:13 +0000]:
> ARM11MPCore). ARM is trying to move away from architecture version
> numbers, which are rather useful for marketing, to proper feature
> detection based on CPUID. Whether there is an ARMv9 or not, it's
> irrelevant to what Linux should do (i.e. use CPUID rather than guess
> features based on architecture version numbers).
> 

how to use cpuid from userspace?

should linux export those bits into hwcap?

(i assume all relevant info will be available in /proc/cpuinfo but
that does not work for libc)


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ARM atomics overhaul for musl
  2014-11-17 13:30       ` Szabolcs Nagy
@ 2014-11-17 14:34         ` Catalin Marinas
  0 siblings, 0 replies; 28+ messages in thread
From: Catalin Marinas @ 2014-11-17 14:34 UTC (permalink / raw)
  To: Szabolcs Nagy
  Cc: Arnd Bergmann, linux-arm-kernel, Russell King - ARM Linux,
	Rich Felker, Kees Cook, musl, Andy Lutomirski

On Mon, Nov 17, 2014 at 01:30:35PM +0000, Szabolcs Nagy wrote:
> * Arnd Bergmann <arnd@arndb.de> [2014-11-17 13:21:03 +0100]:
> > On Monday 17 November 2014 11:48:33 Catalin Marinas wrote:
> > > On Sun, Nov 16, 2014 at 04:33:56PM +0000, Russell King - ARM Linux wrote:
> > > > On Sun, Nov 16, 2014 at 12:56:56AM -0500, Rich Felker wrote:
> > > > > Aside from that, the only case among the above that's "right" already
> > > > > is v7+. Hard-coding the mcr-based barrier on v6 is wrong because it's
> > > > 
> > > > I don't think it's wrong at all.  The instruction isn't going away from
> > > > ARMv7, because ARMv7 deprecates it, but it _still_ has to be implemented
> > > > by a CPU conforming to ARMv7.  As ARMv7 is going to be the last 32-bit
> > > > ARM architecture, we aren't going to see the MCR instruction disappearing
> > > > on 32-bit CPUs.
> > > 
> > > You are wrong here. ARMv8-A supports 32-bit at all levels. ARMv8-R is
> > > 32-bit only (and it even has an MMU at EL1). And there is a slight
> > > chance that we may even see 32-bit only ARMv8-A implementations (I'm not
> > > really giving a hint and I'm not aware of any but I don't see anything
> > > preventing this, it's all marketing driven).
> > 
> > FWIW, both Samsung EXYNOS and Qualcomm Snapdragon SoCs based on Cortex-A53
> > have been shipped in 32-bit only devices.
> 
> ARMv8-A manual talks about two execution sates:
> - aarch64 with 64 bit registers and A64 instruction set
> - aarch32 with 32 bit registers and A32 or T32 instruction sets
> 
> (i thought an armv8-a cpu must support both but that is not
> relevant to userspace)

I'm not sure there is a clear statement that both must be supported.
Even if they are, an SoC manufacturer may decide to hardwire the EL3
register width (external pin) to 32-bit only which makes the CPU a
32-bit one (as Arnd already mentioned).

> for userspace the two states are different architectures
> so i guess for libc aarch32 backward compatibility is the
> interesting question (does armv7 instructions, syscalls, elf
> abi work on aarch32) and how to recognize it when its new
> features can be used in the libc

The AT_PLATFORM on an AArch32 kernel running on ARMv8 would report
"v7l". An AArch64 kernel would report "v8l" to _compat_ tasks.

The AArch32 kernel could (and I think it should) be aligned to ARMv8 as
well. This is a trivial patch adding the corresponding proc_info in
proc-v7.S for Cortex-A53 etc. (we don't really need a proc-v8.S).

The differences between AArch32 binaries running on ARMv7 and ARMv8 are
around /proc/cpuinfo and uname (the latter matches x86 behaviour
already but for the former we wrongly thought people would just use
HWCAP). 32-bit binaries would need to use PER_LINUX32 personality for
uname and /proc/cpuinfo (the latter from 3.19) if they want the
ARMv7-like information.

The other things like syscall, HWCAP, AT_PLATFORM are all provided in a
compatible way to 32-bit binaries.

> if aarch32 has cp15 barrier then that is an option for portable
> binaries and the other approach is runtime dispatch but then libc
> needs a reliable check for >=armv7

AT_PLATFORM should work. We could also add a HWCAP bit for the presence
of DMB/DSB/ISB as well but it does not solve the problem of older
kernels, so I wouldn't recommend it.

We've had a long thread about deprecated/obsolete instructions and how
we inform user space about them. My proposal is here:

http://lists.infradead.org/pipermail/linux-arm-kernel/2014-July/269675.html

which means that for CP15 barriers, we emulate the instructions even
though they are present to give an advanced warning to user space that
these may no longer be optimal at some point (e.g. emulated). As a
consequence, see the thread below implementing emulation, with the
possibility of quick hardware execution if available (e.g. for CP15
barriers, though defaulting to emulation):

http://lists.infradead.org/pipermail/linux-arm-kernel/2014-October/297448.html

-- 
Catalin


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ARM atomics overhaul for musl
  2014-11-17 13:54       ` Catalin Marinas
  2014-11-17 14:11         ` Szabolcs Nagy
@ 2014-11-17 14:39         ` Russell King - ARM Linux
  2014-11-17 15:26           ` Catalin Marinas
  2014-11-17 17:38           ` Andy Lutomirski
  1 sibling, 2 replies; 28+ messages in thread
From: Russell King - ARM Linux @ 2014-11-17 14:39 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Rich Felker, Szabolcs Nagy, musl, Kees Cook, linux-arm-kernel,
	Andy Lutomirski

On Mon, Nov 17, 2014 at 01:54:13PM +0000, Catalin Marinas wrote:
> On Sun, Nov 16, 2014 at 05:10:55PM +0000, Russell King - ARM Linux wrote:
> > which means that rather than encoding the
> > CPU architecture (like with every other Linux architecture), we have a
> > string which encodes the kernel architecture instead, which is absurd.
> 
> Just like x86_64 vs i686?

That is still valid, but let's wait and see what happens when a new
"version" of x86_64 comes along.

However, the issue on x86 is far less of a problem: userspace (even
kernel space) does not have to play these games because the CPUs aren't
designed by people intent on removing old instructions from the
instruction set, thereby stopping existing binaries working without
kernel emulation of the missing instructions.

> > Obviously, the plan for ARM64 is that there will never be an ARMv9
> > architecture, and ARMv8 is the perfect architecture for the future. :p
> 
> If you haven't noticed, the distinction between ARMv6 and ARMv7 has been
> blurred enough (guess why cpu_architecture() reports ARMv7 for
> ARM11MPCore). ARM is trying to move away from architecture version
> numbers, which are rather useful for marketing, to proper feature
> detection based on CPUID. Whether there is an ARMv9 or not, it's
> irrelevant to what Linux should do (i.e. use CPUID rather than guess
> features based on architecture version numbers).

That may be what is desired, but unfortunately we have no way to export
all the intricate feature registers to userspace.  No, elf hwcaps don't
support it, there's only 64 bits split between two words there, and
there are many more than just 64 bits of feature registers.

Given that even cocked these up (just as what happened with the cache
type register) decoding of the feature type registers depends on the
underlying CPU architecture.

So, even _if_ we exported the feature registers to userspace, you still
need to know the CPU architecture to decode them properly, so you still
need to parse the AT_PLATFORM string to get that information.

> > So, a reasonable parsing of this would be:
> > 
> > 	const char *ptr;
> > 	int architecture;
> > 
> > 	ptr = (const char *)(uintptr_t)getauxval(AT_PLATFORM);
> > 	assert(ptr);
> > 
> > 	if (!strncmp(ptr, "aarch64", 7))
> > 		architecture = 8;
> > 	else
> > 		assert(sscanf(ptr, "v%d", &architecture) == 1);
> 
> Oh, have you even bothered trying 32-bit (compat) getauxval(AT_PLATFORM)
> on an aarch64 kernel? It reports "v8l", so please don't confuse others.

Right, I see that now - I'm not knowledgable of the compat code, because
ARM32 has nothing to do with it, and I missed COMPAT_ELF_PLATFORM.

As for "bothered trying" - tell me how I could possibly try that.  You
know full well that I have /no/ 64-bit hardware, and you also know that
I have *nothing* capable of running, let alone building a 64-bit ARM
kernel.

Please, next time you decide to make accusations, bear that in mind - my
"guesses" as to what ARM64 does are based upon reading your code,
sometimes for the first time, and not through any kind of experience of
actually running the damned stuff.

Now, think about what /you/ have said.  Think about your assertion about
that "v8l" string.  How does the code react to that?  Oh my, it sets
"architecture" to 8 !  Oh lookie, it's the right value.  Oh look, the
code works correctly.

So, counter to your crap about me confusing others, maybe you should
make that same accusation of yourself!

Maybe ARM and yourself should have tried to be more inclusive with ARMv8
in general, rather than trying to push me away with accusations and the
like (like you're doing right now) every time I say something about it.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ARM atomics overhaul for musl
  2014-11-17 14:11         ` Szabolcs Nagy
@ 2014-11-17 14:47           ` Catalin Marinas
  0 siblings, 0 replies; 28+ messages in thread
From: Catalin Marinas @ 2014-11-17 14:47 UTC (permalink / raw)
  To: Szabolcs Nagy
  Cc: Russell King - ARM Linux, Rich Felker, musl, Kees Cook,
	linux-arm-kernel, Andy Lutomirski

On Mon, Nov 17, 2014 at 02:11:23PM +0000, Szabolcs Nagy wrote:
> * Catalin Marinas <catalin.marinas@arm.com> [2014-11-17 13:54:13 +0000]:
> > ARM11MPCore). ARM is trying to move away from architecture version
> > numbers, which are rather useful for marketing, to proper feature
> > detection based on CPUID. Whether there is an ARMv9 or not, it's
> > irrelevant to what Linux should do (i.e. use CPUID rather than guess
> > features based on architecture version numbers).
> 
> how to use cpuid from userspace?

It's not possible and not recommended either. Just because the hardware
supports a feature line Neon doesn't mean that the kernel supports it as
well (e.g. saving/restoring registers).

> should linux export those bits into hwcap?

For many things we do. Unfortunately, we didn't do this for DMB, maybe
because we relied too much on the kuser helpers.

> (i assume all relevant info will be available in /proc/cpuinfo but
> that does not work for libc)

I wouldn't recommend /proc/cpuinfo.

If you want to put a dependency on newer kernel versions, there are
options like hwcap, more info in auxv (e.g. an arch-specific dump of the
CPUID registers) or even emulating CPUID access in user space (trap the
undefined instruction and return something that the kernel knows it can
support).

I'm happy with any of these options but I would like to see a concrete
proposal accepted by the libc people before committing to
implementing/supporting such ABI.

-- 
Catalin


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ARM atomics overhaul for musl
  2014-11-17 14:39         ` Russell King - ARM Linux
@ 2014-11-17 15:26           ` Catalin Marinas
  2014-11-17 15:47             ` Russell King - ARM Linux
  2014-11-17 17:38           ` Andy Lutomirski
  1 sibling, 1 reply; 28+ messages in thread
From: Catalin Marinas @ 2014-11-17 15:26 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Rich Felker, Szabolcs Nagy, musl, Kees Cook, linux-arm-kernel,
	Andy Lutomirski

On Mon, Nov 17, 2014 at 02:39:05PM +0000, Russell King - ARM Linux wrote:
> On Mon, Nov 17, 2014 at 01:54:13PM +0000, Catalin Marinas wrote:
> > On Sun, Nov 16, 2014 at 05:10:55PM +0000, Russell King - ARM Linux wrote:
> > > which means that rather than encoding the
> > > CPU architecture (like with every other Linux architecture), we have a
> > > string which encodes the kernel architecture instead, which is absurd.
> > 
> > Just like x86_64 vs i686?
> 
> That is still valid, but let's wait and see what happens when a new
> "version" of x86_64 comes along.

I'm not familiar enough with x86 but are there any differences between
AMD's and Intel's implementations? Or are they completely binary
compatible (no extensions)? The differences are probably covered by
hwcap.

> However, the issue on x86 is far less of a problem: userspace (even
> kernel space) does not have to play these games because the CPUs aren't
> designed by people intent on removing old instructions from the
> instruction set, thereby stopping existing binaries working without
> kernel emulation of the missing instructions.

That's what ARM hopes with AArch64. Whether this will still be valid
many years in the future, I can't tell (but a lesson learnt by the
architecture folk is that it's impossible to get rid of old instructions
in user space). There are, of course, optional features like crypto but
we use hwcap for them.

> > > Obviously, the plan for ARM64 is that there will never be an ARMv9
> > > architecture, and ARMv8 is the perfect architecture for the future. :p
> > 
> > If you haven't noticed, the distinction between ARMv6 and ARMv7 has been
> > blurred enough (guess why cpu_architecture() reports ARMv7 for
> > ARM11MPCore). ARM is trying to move away from architecture version
> > numbers, which are rather useful for marketing, to proper feature
> > detection based on CPUID. Whether there is an ARMv9 or not, it's
> > irrelevant to what Linux should do (i.e. use CPUID rather than guess
> > features based on architecture version numbers).
> 
> That may be what is desired, but unfortunately we have no way to export
> all the intricate feature registers to userspace.  No, elf hwcaps don't
> support it, there's only 64 bits split between two words there, and
> there are many more than just 64 bits of feature registers.

As I replied to Szabolcs, maybe we need a way to export more of the
CPUID space to user (like trapping such mrc's and returning something
that the kernel has enabled). I have a similar request from the AArch64
tools people.

> Given that even cocked these up (just as what happened with the cache
> type register) decoding of the feature type registers depends on the
> underlying CPU architecture.
> 
> So, even _if_ we exported the feature registers to userspace, you still
> need to know the CPU architecture to decode them properly, so you still
> need to parse the AT_PLATFORM string to get that information.

From ARMv7 and many recent ARMv6, you can rely on the MIDR to tell you
whether you have the extended CPUID or not. Prior to that, MIDR contains
the architecture number.

> > > So, a reasonable parsing of this would be:
> > > 
> > > 	const char *ptr;
> > > 	int architecture;
> > > 
> > > 	ptr = (const char *)(uintptr_t)getauxval(AT_PLATFORM);
> > > 	assert(ptr);
> > > 
> > > 	if (!strncmp(ptr, "aarch64", 7))
> > > 		architecture = 8;
> > > 	else
> > > 		assert(sscanf(ptr, "v%d", &architecture) == 1);
> > 
> > Oh, have you even bothered trying 32-bit (compat) getauxval(AT_PLATFORM)
> > on an aarch64 kernel? It reports "v8l", so please don't confuse others.
> 
> Right, I see that now - I'm not knowledgable of the compat code, because
> ARM32 has nothing to do with it, and I missed COMPAT_ELF_PLATFORM.
> 
> As for "bothered trying" - tell me how I could possibly try that.  You
> know full well that I have /no/ 64-bit hardware, and you also know that
> I have *nothing* capable of running, let alone building a 64-bit ARM
> kernel.

Sorry, I was assuming that you have access to at least an ARM software
model (freely available, AArch64 Qemu is also stable enough). If you
have an interest and need ARMv8 hardware, please let us know.

> Now, think about what /you/ have said.  Think about your assertion about
> that "v8l" string.  How does the code react to that?  Oh my, it sets
> "architecture" to 8 !  Oh lookie, it's the right value.  Oh look, the
> code works correctly.

So what would you like to set it to? "v7l"? Even for pre-ARMv8 CPUs,
such value doesn't give enough information and user space should rely on
hwcap (yes, we missed a HWCAP_DMB because we relied on kuser helpers;
another big thing we missed is Thumb-2 in hwcap).

For ARMv8, we have additional features that I would like to include in
hwcap on arm32 (and we've already done this with crypto; there are
load/store with release/acquire semantics which would allow slightly
faster locking, see kuser helpers provided by the AArch64 kernel to
compat user space).

(I'm ignoring the rest of your email in order to keep the thread
constructive)

-- 
Catalin


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ARM atomics overhaul for musl
  2014-11-17 15:26           ` Catalin Marinas
@ 2014-11-17 15:47             ` Russell King - ARM Linux
  2014-11-17 16:19               ` Catalin Marinas
  0 siblings, 1 reply; 28+ messages in thread
From: Russell King - ARM Linux @ 2014-11-17 15:47 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Rich Felker, Szabolcs Nagy, musl, Kees Cook, linux-arm-kernel,
	Andy Lutomirski

On Mon, Nov 17, 2014 at 03:26:25PM +0000, Catalin Marinas wrote:
> On Mon, Nov 17, 2014 at 02:39:05PM +0000, Russell King - ARM Linux wrote:
> > Given that even cocked these up (just as what happened with the cache
> > type register) decoding of the feature type registers depends on the
> > underlying CPU architecture.
> > 
> > So, even _if_ we exported the feature registers to userspace, you still
> > need to know the CPU architecture to decode them properly, so you still
> > need to parse the AT_PLATFORM string to get that information.
> 
> >From ARMv7 and many recent ARMv6, you can rely on the MIDR to tell you
> whether you have the extended CPUID or not. Prior to that, MIDR contains
> the architecture number.

That is not what I'm referring to.  Where the feature registers are
implemented, there are at least two different interpretations of these
feature registers.  They do not comprise of a single coherent set of
definitions - the meaning of some nibbles were changed between different
architectures.

> So what would you like to set it to? "v7l"? Even for pre-ARMv8 CPUs,
> such value doesn't give enough information and user space should rely on
> hwcap (yes, we missed a HWCAP_DMB because we relied on kuser helpers;
> another big thing we missed is Thumb-2 in hwcap).

Shall we look at the entire code fragment again, and this time use our
heads to *think* about it first?

        const char *ptr;
        int architecture;

        ptr = (const char *)(uintptr_t)getauxval(AT_PLATFORM);
        assert(ptr);

        if (!strncmp(ptr, "aarch64", 7))
                architecture = 8;
        else
                assert(sscanf(ptr, "v%d", &architecture) == 1);

        switch (architecture) {
        case 4:
        case 5:
                no_mcr_dmb;
                break;
        case 6:
                use_mcr;
                break;
        default:
                use_dmb;
                break;
        }

Now, if 32-bit ARMv8 returns "v8l" from the AT_PLATFORM auxval, then
it is not equal to "aarch64".  So, we fall through th sscanf().  sscanf()
parses the "v8l" string, and sets "architecture" to 8.

We now enter the switch() statement.  8 isn't 4.  8 also isn't 5.  Nor is
it 6.  So, we fall through to the "default" section, which uses "use_dmb".

That's the correct answer for ARMv8 CPUs, because we don't want to use
the MCR instruction there, nor do we want to do nothing.  That is not
coincidence - it was /specifically/ designed to select that outcome for
any architecture value it didn't explicitly know.  The assumption there
is that ARM are not going to deprecate and remove the dmb instruction.

So it doesn't matter if there's a v9, v10, v11, v12 etc.  It'll continue
to select the dmb method until the code is modified to do otherwise.

So, maybe I'm not as stupid as you first thought, and maybe I /did/ think
about this carefully about the possible scenarios before suggesting this
code fragment as a solution.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ARM atomics overhaul for musl
  2014-11-17 15:47             ` Russell King - ARM Linux
@ 2014-11-17 16:19               ` Catalin Marinas
  2014-11-17 16:53                 ` Russell King - ARM Linux
  0 siblings, 1 reply; 28+ messages in thread
From: Catalin Marinas @ 2014-11-17 16:19 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Rich Felker, Szabolcs Nagy, musl, Kees Cook, linux-arm-kernel,
	Andy Lutomirski

On Mon, Nov 17, 2014 at 03:47:39PM +0000, Russell King - ARM Linux wrote:
> On Mon, Nov 17, 2014 at 03:26:25PM +0000, Catalin Marinas wrote:
> > On Mon, Nov 17, 2014 at 02:39:05PM +0000, Russell King - ARM Linux wrote:
> > > Given that even cocked these up (just as what happened with the cache
> > > type register) decoding of the feature type registers depends on the
> > > underlying CPU architecture.
> > > 
> > > So, even _if_ we exported the feature registers to userspace, you still
> > > need to know the CPU architecture to decode them properly, so you still
> > > need to parse the AT_PLATFORM string to get that information.
> > 
> > >From ARMv7 and many recent ARMv6, you can rely on the MIDR to tell you
> > whether you have the extended CPUID or not. Prior to that, MIDR contains
> > the architecture number.
> 
> That is not what I'm referring to.  Where the feature registers are
> implemented, there are at least two different interpretations of these
> feature registers.  They do not comprise of a single coherent set of
> definitions - the meaning of some nibbles were changed between different
> architectures.

They were indeed messy on ARMv6 and earlier but I think they stabilised
enough for ARMv7.

> > So what would you like to set it to? "v7l"? Even for pre-ARMv8 CPUs,
> > such value doesn't give enough information and user space should rely on
> > hwcap (yes, we missed a HWCAP_DMB because we relied on kuser helpers;
> > another big thing we missed is Thumb-2 in hwcap).
> 
> Shall we look at the entire code fragment again, and this time use our
> heads to *think* about it first?
> 
>         const char *ptr;
>         int architecture;
> 
>         ptr = (const char *)(uintptr_t)getauxval(AT_PLATFORM);
>         assert(ptr);
> 
>         if (!strncmp(ptr, "aarch64", 7))
>                 architecture = 8;
>         else
>                 assert(sscanf(ptr, "v%d", &architecture) == 1);
> 
>         switch (architecture) {
>         case 4:
>         case 5:
>                 no_mcr_dmb;
>                 break;
>         case 6:
>                 use_mcr;
>                 break;
>         default:
>                 use_dmb;
>                 break;
>         }
> 
> Now, if 32-bit ARMv8 returns "v8l" from the AT_PLATFORM auxval, then
> it is not equal to "aarch64".  So, we fall through th sscanf().  sscanf()
> parses the "v8l" string, and sets "architecture" to 8.

I agree, but is there a reason to still check for "aarch64" AT_PLATFORM?

> We now enter the switch() statement.  8 isn't 4.  8 also isn't 5.  Nor is
> it 6.  So, we fall through to the "default" section, which uses "use_dmb".

This indeed works and it is likely the way you designed it with the
_arm32_ kernel in mind (but not before accusing the arm64 maintainers of
making a bad decision with the "aarch64" AT_PLATFORM string for compat
apps ;)).

In your code sequence, the "aarch64" check should be removed, unless you
aim it at portable code between 32 and 64-bit but I would rather use an
#ifdef __aarch64__ in such case. On AArch64 (nothing to do with ARMv8,
v9 etc.), we should move away from thinking in terms of architecture
version numbers but just features.

Similarly for AArch32, I think we should switch our focus from version
numbers (well, only from v7/v8) to features (exposed by the hardware to
the kernel via CPUID). An example is how we got LPAE on ARMv7 without a
change in the architecture version number. We even expose this to user
space via hwcap because that's how we know we have atomic LDRD/STRD.

-- 
Catalin


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ARM atomics overhaul for musl
  2014-11-17 16:19               ` Catalin Marinas
@ 2014-11-17 16:53                 ` Russell King - ARM Linux
  2014-11-17 17:48                   ` Catalin Marinas
  0 siblings, 1 reply; 28+ messages in thread
From: Russell King - ARM Linux @ 2014-11-17 16:53 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Rich Felker, Szabolcs Nagy, musl, Kees Cook, linux-arm-kernel,
	Andy Lutomirski

On Mon, Nov 17, 2014 at 04:19:42PM +0000, Catalin Marinas wrote:
> This indeed works and it is likely the way you designed it with the
> _arm32_ kernel in mind (but not before accusing the arm64 maintainers of
> making a bad decision with the "aarch64" AT_PLATFORM string for compat
> apps ;)).

Seeing as I'm the ARM32 maintainer, and you are the ARM64 maintainer, then
of course I designed it with the ARM32 kernel in mind, with a reference
into the ARM64 situation to the best of my knowledge, which suggested
that compat tasks got the "aarch64" string.  As you have pointed out,
they don't, they get a "v8l" string, which means...

> In your code sequence, the "aarch64" check should be removed, unless you
> aim it at portable code between 32 and 64-bit but I would rather use an
> #ifdef __aarch64__ in such case. On AArch64 (nothing to do with ARMv8,
> v9 etc.), we should move away from thinking in terms of architecture
> version numbers but just features.

... that it can indeed be removed.  To repeat, the check for "aarch64"
was only there because I thought that ARM64 kernels used that for
everything.

> Similarly for AArch32, I think we should switch our focus from version
> numbers (well, only from v7/v8) to features (exposed by the hardware to
> the kernel via CPUID). An example is how we got LPAE on ARMv7 without a
> change in the architecture version number. We even expose this to user
> space via hwcap because that's how we know we have atomic LDRD/STRD.

For this case, I disagree.  There is no value (in fact, there is lots of
harm) to adding a hwcap bit for this.

If we added such a hwcap bit, it would mean that userspace would have
to implement the check that I suggested, plus a check for the hwcap bit,
plus maybe a kernel version check to decide which test to use.

That is needlessly complicated.  Okay, you could decide that if the
hwcap bit is set, then that indicates that DMB should be used, but you
still have to then check the architecture version if it isn't set to
be compatible with old kernels.

So, it's all round simpler just to do the architecture version check -
and we know for certain that ARMv4, ARMv5, and ARMv6 do not have dmb.
We know that ARMv7 and ARMv8 both have dmb, and it is likely (especially
if you exert pressure on the architecture people) that dmb will remain
implemented.  We also know that ARMv6 implements the mcr instruction.
So, in this case, we know everything we need to know just by looking
at the architecture version.

Of course, we can't predict the future with any accuracy, but hoping
that dmb won't be deprecated and obsoleted is a reasonable hope, and
if it does, we would need to modify the code to add the new method in
any case.

What the code is *intentionally* safe from is the architecture number
incrementing.

So, I really don't see the point in exposing the presence of DMB via
a hwcap bit - if we wanted to do that, it's something that we should
have done at the very start, but we didn't.  Now, it's pointless to
do so.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ARM atomics overhaul for musl
  2014-11-17 14:39         ` Russell King - ARM Linux
  2014-11-17 15:26           ` Catalin Marinas
@ 2014-11-17 17:38           ` Andy Lutomirski
  2014-11-18 10:56             ` Catalin Marinas
  1 sibling, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2014-11-17 17:38 UTC (permalink / raw)
  To: Russell King
  Cc: musl, Catalin Marinas, Szabolcs Nagy, Kees Cook, Rich Felker,
	linux-arm-kernel

On Nov 17, 2014 6:39 AM, "Russell King - ARM Linux"
<linux@arm.linux.org.uk> wrote:
>
> On Mon, Nov 17, 2014 at 01:54:13PM +0000, Catalin Marinas wrote:
> > On Sun, Nov 16, 2014 at 05:10:55PM +0000, Russell King - ARM Linux wrote:
> > > which means that rather than encoding the
> > > CPU architecture (like with every other Linux architecture), we have a
> > > string which encodes the kernel architecture instead, which is absurd.
> >
> > Just like x86_64 vs i686?
>
> That is still valid, but let's wait and see what happens when a new
> "version" of x86_64 comes along.
>
> However, the issue on x86 is far less of a problem: userspace (even
> kernel space) does not have to play these games because the CPUs aren't
> designed by people intent on removing old instructions from the
> instruction set, thereby stopping existing binaries working without
> kernel emulation of the missing instructions.
>
> > > Obviously, the plan for ARM64 is that there will never be an ARMv9
> > > architecture, and ARMv8 is the perfect architecture for the future. :p
> >
> > If you haven't noticed, the distinction between ARMv6 and ARMv7 has been
> > blurred enough (guess why cpu_architecture() reports ARMv7 for
> > ARM11MPCore). ARM is trying to move away from architecture version
> > numbers, which are rather useful for marketing, to proper feature
> > detection based on CPUID. Whether there is an ARMv9 or not, it's
> > irrelevant to what Linux should do (i.e. use CPUID rather than guess
> > features based on architecture version numbers).
>
> That may be what is desired, but unfortunately we have no way to export
> all the intricate feature registers to userspace.  No, elf hwcaps don't
> support it, there's only 64 bits split between two words there, and
> there are many more than just 64 bits of feature registers.

That's a ridiculous argument.  Linux can freely add bits.

You could add AT_ARM_FEATURES that points to a length followed by the
indicated number of words, or you could just keep adding new HWCAP
fields as needed.  This is expandable forever.

As an x86 person and a complete ARM outsider, this situation is
totally nuts.  There is no good reason *not* to have feature bits, and
even in x86 land, relying on the architecture version is dangerous.
(Intel seems to be reinstating version 5 right now with Quark, and
even that is having minor issues since it's not really quite a version
5 chip.)

>
> Given that even cocked these up (just as what happened with the cache
> type register) decoding of the feature type registers depends on the
> underlying CPU architecture.
>
> So, even _if_ we exported the feature registers to userspace, you still
> need to know the CPU architecture to decode them properly, so you still
> need to parse the AT_PLATFORM string to get that information.
>

There's no need to expose the hardware feature registers as is.
Define your own sensible feature bits just for Linux.

Yes, libc implementations will need a fallback for old kernels, but at
least the set of legacy configurations that need to be supported that
way will stop increasing at some point.

--Andy

> > > So, a reasonable parsing of this would be:
> > >
> > >     const char *ptr;
> > >     int architecture;
> > >
> > >     ptr = (const char *)(uintptr_t)getauxval(AT_PLATFORM);
> > >     assert(ptr);
> > >
> > >     if (!strncmp(ptr, "aarch64", 7))
> > >             architecture = 8;
> > >     else
> > >             assert(sscanf(ptr, "v%d", &architecture) == 1);
> >
> > Oh, have you even bothered trying 32-bit (compat) getauxval(AT_PLATFORM)
> > on an aarch64 kernel? It reports "v8l", so please don't confuse others.
>
> Right, I see that now - I'm not knowledgable of the compat code, because
> ARM32 has nothing to do with it, and I missed COMPAT_ELF_PLATFORM.
>
> As for "bothered trying" - tell me how I could possibly try that.  You
> know full well that I have /no/ 64-bit hardware, and you also know that
> I have *nothing* capable of running, let alone building a 64-bit ARM
> kernel.
>
> Please, next time you decide to make accusations, bear that in mind - my
> "guesses" as to what ARM64 does are based upon reading your code,
> sometimes for the first time, and not through any kind of experience of
> actually running the damned stuff.
>
> Now, think about what /you/ have said.  Think about your assertion about
> that "v8l" string.  How does the code react to that?  Oh my, it sets
> "architecture" to 8 !  Oh lookie, it's the right value.  Oh look, the
> code works correctly.
>
> So, counter to your crap about me confusing others, maybe you should
> make that same accusation of yourself!
>
> Maybe ARM and yourself should have tried to be more inclusive with ARMv8
> in general, rather than trying to push me away with accusations and the
> like (like you're doing right now) every time I say something about it.
>
> --
> FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
> according to speedtest.net.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ARM atomics overhaul for musl
  2014-11-17 16:53                 ` Russell King - ARM Linux
@ 2014-11-17 17:48                   ` Catalin Marinas
  0 siblings, 0 replies; 28+ messages in thread
From: Catalin Marinas @ 2014-11-17 17:48 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Rich Felker, Szabolcs Nagy, musl, Kees Cook, linux-arm-kernel,
	Andy Lutomirski

On Mon, Nov 17, 2014 at 04:53:34PM +0000, Russell King - ARM Linux wrote:
> On Mon, Nov 17, 2014 at 04:19:42PM +0000, Catalin Marinas wrote:
> > Similarly for AArch32, I think we should switch our focus from version
> > numbers (well, only from v7/v8) to features (exposed by the hardware to
> > the kernel via CPUID). An example is how we got LPAE on ARMv7 without a
> > change in the architecture version number. We even expose this to user
> > space via hwcap because that's how we know we have atomic LDRD/STRD.
> 
> For this case, I disagree.  There is no value (in fact, there is lots of
> harm) to adding a hwcap bit for this.
> 
> If we added such a hwcap bit, it would mean that userspace would have
> to implement the check that I suggested, plus a check for the hwcap bit,
> plus maybe a kernel version check to decide which test to use.
[...]
> So, I really don't see the point in exposing the presence of DMB via
> a hwcap bit - if we wanted to do that, it's something that we should
> have done at the very start, but we didn't.  Now, it's pointless to
> do so.

I agree with you on a HWCAP_DMB bit, it's too late now and code should
rely on the architecture version instead.

But my point is about new features that will appear (or already did) in
the current or next architecture versions (e.g. ARMv8). So far we seem
to have avoided adding HWCAP bits for new features that were mandated by
certain architecture versions, probably under the assumption that
software would check the architecture version number.

For example, on ARMv8, do you want to add a HWCAP_LDACQ (for
acquire/release semantics) or we tell user space to check for "v8l"
instead? There are additional hints available for AArch32 DMB and DSB
(ISHLD, OSHLD, NSHLD, LD), there are LDREX/STREX with acquire/release
semantics, a new SEVL instruction. User space needs to know about these
not only from a backwards compatibility perspective (I don't expect DMB
to ever go away) but from a future optimisation one.

If you are worried about the risk of running out of HWCAP bits (we still
have I think 32 left, we could also introduce elf_hwcap3), what about,
for new features, adding a HWCAP_CPUID (only when the extended CPUID is
present) and, when enabled, allow user space to probe CPUID registers
via an ARM-specific syscall or undef hooks? These would filtered by the
kernel, it doesn't need to always present the real register content,
especially on heterogeneous systems.

-- 
Catalin


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ARM atomics overhaul for musl
  2014-11-17 17:38           ` Andy Lutomirski
@ 2014-11-18 10:56             ` Catalin Marinas
  2014-11-18 18:14               ` Will Deacon
  0 siblings, 1 reply; 28+ messages in thread
From: Catalin Marinas @ 2014-11-18 10:56 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Russell King, musl, Szabolcs Nagy, Kees Cook, Rich Felker,
	linux-arm-kernel

On Mon, Nov 17, 2014 at 05:38:46PM +0000, Andy Lutomirski wrote:
> On Nov 17, 2014 6:39 AM, "Russell King - ARM Linux"
> <linux@arm.linux.org.uk> wrote:
> >
> > On Mon, Nov 17, 2014 at 01:54:13PM +0000, Catalin Marinas wrote:
> > > If you haven't noticed, the distinction between ARMv6 and ARMv7 has been
> > > blurred enough (guess why cpu_architecture() reports ARMv7 for
> > > ARM11MPCore). ARM is trying to move away from architecture version
> > > numbers, which are rather useful for marketing, to proper feature
> > > detection based on CPUID. Whether there is an ARMv9 or not, it's
> > > irrelevant to what Linux should do (i.e. use CPUID rather than guess
> > > features based on architecture version numbers).
> >
> > That may be what is desired, but unfortunately we have no way to export
> > all the intricate feature registers to userspace.  No, elf hwcaps don't
> > support it, there's only 64 bits split between two words there, and
> > there are many more than just 64 bits of feature registers.
> 
> That's a ridiculous argument.  Linux can freely add bits.
> 
> You could add AT_ARM_FEATURES that points to a length followed by the
> indicated number of words, or you could just keep adding new HWCAP
> fields as needed.  This is expandable forever.

That's fine by me, I don't have a problem with more hwcap bits.

> > Given that even cocked these up (just as what happened with the cache
> > type register) decoding of the feature type registers depends on the
> > underlying CPU architecture.
> >
> > So, even _if_ we exported the feature registers to userspace, you still
> > need to know the CPU architecture to decode them properly, so you still
> > need to parse the AT_PLATFORM string to get that information.
> 
> There's no need to expose the hardware feature registers as is.
> Define your own sensible feature bits just for Linux.

We get regular questions about direct access to the hardware feature
bits, many using the x86 cpuid instruction as argument. So far we
couldn't see good enough reasons, otherwise we would have pushed such
instruction in the ARMv8 architecture. It's also not a simple direct
hardware access since the kernel may want to mask some features it does
not support, which pretty much requires HWCAP or some banked CPUID
registers in hardware.

There seems to be a category of software that can't access HWCAP or
/proc/self/auxv. This is Android software, I'm not sure how the
developers came to this conclusion but they think allowing
/proc/cpuinfo access is ok but not /proc/self/auxv. I'm not sure direct
cpuid access is a good enough argument for such scenario. To me it looks
like something they should solve in their security implementation.

Another class are dynamic loaders that don't yet have a C library
loaded. However, as such loaders are the first entry point, I don't see
why they couldn't access auxv directly. One particular scenario here is
finding out which CPU micro-architecture (implementation) it is so that
the dynamic loader could choose a more optimised library. CPUID would
help partially here (get the actual MIDR identifying the CPU
implementation rather than just features) but not on heterogeneous
systems like big.LITTLE. Which means that we would still be better off
with some extra features in auxv, maybe even listing the individual MIDR
for all the CPUs in the system.

-- 
Catalin


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ARM atomics overhaul for musl
  2014-11-18 10:56             ` Catalin Marinas
@ 2014-11-18 18:14               ` Will Deacon
  2014-11-18 18:24                 ` Andy Lutomirski
                                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Will Deacon @ 2014-11-18 18:14 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Szabolcs Nagy, Rich Felker, Russell King, Kees Cook, musl,
	Andy Lutomirski, linux-arm-kernel

I was really hoping to avoid this thread, but I wanted to comment on the
suitability of hwcap as a discovery mechanism.

On Tue, Nov 18, 2014 at 10:56:12AM +0000, Catalin Marinas wrote:
> On Mon, Nov 17, 2014 at 05:38:46PM +0000, Andy Lutomirski wrote:
> > On Nov 17, 2014 6:39 AM, "Russell King - ARM Linux"
> > > Given that even cocked these up (just as what happened with the cache
> > > type register) decoding of the feature type registers depends on the
> > > underlying CPU architecture.
> > >
> > > So, even _if_ we exported the feature registers to userspace, you still
> > > need to know the CPU architecture to decode them properly, so you still
> > > need to parse the AT_PLATFORM string to get that information.
> > 
> > There's no need to expose the hardware feature registers as is.
> > Define your own sensible feature bits just for Linux.
> 
> We get regular questions about direct access to the hardware feature
> bits, many using the x86 cpuid instruction as argument. So far we
> couldn't see good enough reasons, otherwise we would have pushed such
> instruction in the ARMv8 architecture. It's also not a simple direct
> hardware access since the kernel may want to mask some features it does
> not support, which pretty much requires HWCAP or some banked CPUID
> registers in hardware.

Or trapping the undef exception from EL0 and emulating it in the kernel,
which doesn't require any extra hardware, allows the kernel to mask out
things it can't support and gives userspace the information it needs
under any scenario.

> Another class are dynamic loaders that don't yet have a C library
> loaded. However, as such loaders are the first entry point, I don't see
> why they couldn't access auxv directly. One particular scenario here is
> finding out which CPU micro-architecture (implementation) it is so that
> the dynamic loader could choose a more optimised library. CPUID would
> help partially here (get the actual MIDR identifying the CPU
> implementation rather than just features) but not on heterogeneous
> systems like big.LITTLE. Which means that we would still be better off
> with some extra features in auxv, maybe even listing the individual MIDR
> for all the CPUs in the system.

The only way I can see hwcap working is if we follow what the architecture
allows for in ARMv8, which is 4 bits per feature over (currently) around
10 32-bit registers. That would mean potentially exposing 1280 hwcaps,
which is clearly insane.

Instead, we currently advertise a tiny subset of the information exposing
in the ID registers and end up grouping it together in an ad-hoc way without
any buy-in from the instruction set architects. For example, how the
`asimd' hwcap on the arm64 kernel corresponds to feature bits in the MVFR
registers is not at all clear, especially as those hardware registers are
extended over time.

We've done a bit better with the crypto extensions, where we provide
fine-grained sha1, sha2 etc hwcaps, but this is based on the relavant 4-bit
fields in ISAR5 being positive values. I can't find any architectural
guarantees that this will work on future cores (e.g. bumping the 4-bit
field to indicate a subset of previous functionality).

My position is that hwcap is trying to group fine-grained architectural
features into higher level Linux features, but that's likely to lead to
an unmaintainable mess as the feature diversity of real systems continues
to grow. We can fix this easily by exposing the features to userspace in
the form that is described by the architecture (probably with a single
HWCAP to say that such an access won't result in SIGILL).

Will

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ARM atomics overhaul for musl
  2014-11-18 18:14               ` Will Deacon
@ 2014-11-18 18:24                 ` Andy Lutomirski
  2014-11-18 19:19                 ` Russell King - ARM Linux
  2014-11-19 18:32                 ` Catalin Marinas
  2 siblings, 0 replies; 28+ messages in thread
From: Andy Lutomirski @ 2014-11-18 18:24 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, Szabolcs Nagy, Rich Felker, Russell King,
	Kees Cook, musl, linux-arm-kernel

On Tue, Nov 18, 2014 at 10:14 AM, Will Deacon <will.deacon@arm.com> wrote:
> I was really hoping to avoid this thread, but I wanted to comment on the
> suitability of hwcap as a discovery mechanism.
>
> On Tue, Nov 18, 2014 at 10:56:12AM +0000, Catalin Marinas wrote:
>> On Mon, Nov 17, 2014 at 05:38:46PM +0000, Andy Lutomirski wrote:
>> > On Nov 17, 2014 6:39 AM, "Russell King - ARM Linux"
>> > > Given that even cocked these up (just as what happened with the cache
>> > > type register) decoding of the feature type registers depends on the
>> > > underlying CPU architecture.
>> > >
>> > > So, even _if_ we exported the feature registers to userspace, you still
>> > > need to know the CPU architecture to decode them properly, so you still
>> > > need to parse the AT_PLATFORM string to get that information.
>> >
>> > There's no need to expose the hardware feature registers as is.
>> > Define your own sensible feature bits just for Linux.
>>
>> We get regular questions about direct access to the hardware feature
>> bits, many using the x86 cpuid instruction as argument. So far we
>> couldn't see good enough reasons, otherwise we would have pushed such
>> instruction in the ARMv8 architecture. It's also not a simple direct
>> hardware access since the kernel may want to mask some features it does
>> not support, which pretty much requires HWCAP or some banked CPUID
>> registers in hardware.
>
> Or trapping the undef exception from EL0 and emulating it in the kernel,
> which doesn't require any extra hardware, allows the kernel to mask out
> things it can't support and gives userspace the information it needs
> under any scenario.
>

This only sounds reasonable to me if non-Linux architectures do it,
too.  Otherwise using a syscall sounds more sensible.

FWIW, I personally don't like the fact that x86 allows unprivileged
code to use CPUID.  This can sort of be disabled on some recent Intel
hardware, but last I checked that ability was explicitly not
guaranteed to exist going forward.

>> Another class are dynamic loaders that don't yet have a C library
>> loaded. However, as such loaders are the first entry point, I don't see
>> why they couldn't access auxv directly. One particular scenario here is
>> finding out which CPU micro-architecture (implementation) it is so that
>> the dynamic loader could choose a more optimised library. CPUID would
>> help partially here (get the actual MIDR identifying the CPU
>> implementation rather than just features) but not on heterogeneous
>> systems like big.LITTLE. Which means that we would still be better off
>> with some extra features in auxv, maybe even listing the individual MIDR
>> for all the CPUs in the system.
>
> The only way I can see hwcap working is if we follow what the architecture
> allows for in ARMv8, which is 4 bits per feature over (currently) around
> 10 32-bit registers. That would mean potentially exposing 1280 hwcaps,
> which is clearly insane.

Stick it in the vdso?  /me snickers

Out of curiosity, why are there 4 bits per feature?

--Andy


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ARM atomics overhaul for musl
  2014-11-18 18:14               ` Will Deacon
  2014-11-18 18:24                 ` Andy Lutomirski
@ 2014-11-18 19:19                 ` Russell King - ARM Linux
  2014-11-19 18:32                 ` Catalin Marinas
  2 siblings, 0 replies; 28+ messages in thread
From: Russell King - ARM Linux @ 2014-11-18 19:19 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, Andy Lutomirski, Szabolcs Nagy, Rich Felker,
	Kees Cook, musl, linux-arm-kernel

On Tue, Nov 18, 2014 at 06:14:25PM +0000, Will Deacon wrote:
> The only way I can see hwcap working is if we follow what the architecture
> allows for in ARMv8, which is 4 bits per feature over (currently) around
> 10 32-bit registers. That would mean potentially exposing 1280 hwcaps,
> which is clearly insane.

Exactly my argument, which got called "rediculous" !  I'm glad that
someone with a similar visibility of the problem has come to the
same conclusion that I did.

> We've done a bit better with the crypto extensions, where we provide
> fine-grained sha1, sha2 etc hwcaps, but this is based on the relavant 4-bit
> fields in ISAR5 being positive values. I can't find any architectural
> guarantees that this will work on future cores (e.g. bumping the 4-bit
> field to indicate a subset of previous functionality).

This is the big problem.  An example of this is the barrier bits, which
indicate whether dmb & dsb are present or not.  It's not a single bit,
but a group of four.  If we provide a single bit for dmb, and another
for dsb (to cater for a future possibility that dmb or dsb may be
separately indicated by a future 4-bit binary pattern), that's fine,
but should we then list every instruction which is conditional on any
ISAR bit pattern?  That becomes a /very/ big space indeed.

If we don't do this, and (eg) we use a single bit for both dmb and dsb,
what if a future bit pattern indicates that (eg) dmb is obsolete, but
dsb hasn't.

Contary to what others assert, this is not a trivial problem, and it's
not trivial to just add additional hwcap bits to solve it.

There's also the problem in /knowing/ what information to export to
userspace, before userspace knows that they need it... which is exactly
what's happened with DMB (and this is not the first time it's happened.)

I suspect this won't be the last time either.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: ARM atomics overhaul for musl
  2014-11-18 18:14               ` Will Deacon
  2014-11-18 18:24                 ` Andy Lutomirski
  2014-11-18 19:19                 ` Russell King - ARM Linux
@ 2014-11-19 18:32                 ` Catalin Marinas
  2 siblings, 0 replies; 28+ messages in thread
From: Catalin Marinas @ 2014-11-19 18:32 UTC (permalink / raw)
  To: Will Deacon
  Cc: Andy Lutomirski, Szabolcs Nagy, Rich Felker, Russell King,
	Kees Cook, musl, linux-arm-kernel

Hi Will,

On Tue, Nov 18, 2014 at 06:14:25PM +0000, Will Deacon wrote:
> I was really hoping to avoid this thread, but I wanted to comment on the
> suitability of hwcap as a discovery mechanism.

Such discussions come up regularly, so I think we should stick to this
thread and try to sort it out (it would be good to get the glibc folk to
join).

> On Tue, Nov 18, 2014 at 10:56:12AM +0000, Catalin Marinas wrote:
> > On Mon, Nov 17, 2014 at 05:38:46PM +0000, Andy Lutomirski wrote:
> > > On Nov 17, 2014 6:39 AM, "Russell King - ARM Linux"
> > > > Given that even cocked these up (just as what happened with the cache
> > > > type register) decoding of the feature type registers depends on the
> > > > underlying CPU architecture.
> > > >
> > > > So, even _if_ we exported the feature registers to userspace, you still
> > > > need to know the CPU architecture to decode them properly, so you still
> > > > need to parse the AT_PLATFORM string to get that information.
> > > 
> > > There's no need to expose the hardware feature registers as is.
> > > Define your own sensible feature bits just for Linux.
> > 
> > We get regular questions about direct access to the hardware feature
> > bits, many using the x86 cpuid instruction as argument. So far we
> > couldn't see good enough reasons, otherwise we would have pushed such
> > instruction in the ARMv8 architecture. It's also not a simple direct
> > hardware access since the kernel may want to mask some features it does
> > not support, which pretty much requires HWCAP or some banked CPUID
> > registers in hardware.
> 
> Or trapping the undef exception from EL0 and emulating it in the kernel,
> which doesn't require any extra hardware, allows the kernel to mask out
> things it can't support and gives userspace the information it needs
> under any scenario.

This would be the simplest. What the hardware could do though is
populating ESR with the right information to avoid decoding the
undefined instruction.

If we go this route, I think we should also expose MIDR for some
micro-architecture optimisations (with the risk that people use it
incorrectly).

> > Another class are dynamic loaders that don't yet have a C library
> > loaded. However, as such loaders are the first entry point, I don't see
> > why they couldn't access auxv directly. One particular scenario here is
> > finding out which CPU micro-architecture (implementation) it is so that
> > the dynamic loader could choose a more optimised library. CPUID would
> > help partially here (get the actual MIDR identifying the CPU
> > implementation rather than just features) but not on heterogeneous
> > systems like big.LITTLE. Which means that we would still be better off
> > with some extra features in auxv, maybe even listing the individual MIDR
> > for all the CPUs in the system.
> 
> The only way I can see hwcap working is if we follow what the architecture
> allows for in ARMv8, which is 4 bits per feature over (currently) around
> 10 32-bit registers. That would mean potentially exposing 1280 hwcaps,
> which is clearly insane.

We have a similar set of registers on ARMv7. But I disagree with the
simplistic calculation that we need 1280 hwcaps. As I replied to
Stephen, many of these are not relevant to user space, other fields are
still reserved and they may never be populated.

Some values we don't even need to bother with, for example on ARMv7
ID_ISAR2[15:12] specify a MLA instruction that has been around since
ARMv4. The way these are structured, ARM assumes an incremental change
to such fields. In the ID_ISAR2[15:12] example, when the field is 1 it
means that MLA is present, when it is 2, it means whatever 1 supported
plus MLS (that's ARMv7 and ARMv6T2). So in this case we only need
HWCAP_MLS as MLA has been there already. Basically we don't need to
encode all the possible states in HWCAP.

> Instead, we currently advertise a tiny subset of the information exposing
> in the ID registers and end up grouping it together in an ad-hoc way without
> any buy-in from the instruction set architects. For example, how the
> `asimd' hwcap on the arm64 kernel corresponds to feature bits in the MVFR
> registers is not at all clear, especially as those hardware registers are
> extended over time.

Minor correction here, there is no MVFR on AArch64. Strangely, the
architects have a field for asimd which means not present when 0 and
present when ffff. It looks like they don't expect to add any values in
here. Crypto instructions which use the same register bank as ASIMD and
are listed in the ID_AA64ISAR registers with the possibility of
extending them (actually the AES fields got PMULL as well and we added a
HWCAP for it).

> We've done a bit better with the crypto extensions, where we provide
> fine-grained sha1, sha2 etc hwcaps, but this is based on the relavant 4-bit
> fields in ISAR5 being positive values. I can't find any architectural
> guarantees that this will work on future cores (e.g. bumping the 4-bit
> field to indicate a subset of previous functionality).

There are no guarantees that they are present (either not built in,
export regulations etc.), that's the aim of CPUID. The problem is when
something not covered by CPUID or covered by it but not by HWCAP gets
removed.

Another example is SWP. It has been included in ARMv7 CPUID as field
ID_ISAR0[3:0] == 1 but allowing implementations to drop this field to 0
(well, we even had HWCAP_SWP but people took its presence for granted,
which is fair since there was no other way to do atomic operations).

> My position is that hwcap is trying to group fine-grained architectural
> features into higher level Linux features, but that's likely to lead to
> an unmaintainable mess as the feature diversity of real systems continues
> to grow. We can fix this easily by exposing the features to userspace in
> the form that is described by the architecture (probably with a single
> HWCAP to say that such an access won't result in SIGILL).

I think there is still value to HWCAP like we do for crypto. We could
add access to CPUID but definitely not a replacement for HWCAP.

What we need from the architects:

1. Clear statement for an architecture version of what's the minimum
   CPUID required
2. Guarantees that a new architecture would not change such minimum to
   smaller values

-- 
Catalin


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2014-11-19 18:32 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-11-16  5:56 ARM atomics overhaul for musl Rich Felker
2014-11-16 16:33 ` Russell King - ARM Linux
2014-11-16 16:50   ` Rich Felker
2014-11-16 17:10     ` Russell King - ARM Linux
2014-11-16 18:27       ` Andy Lutomirski
2014-11-16 18:56         ` Rich Felker
2014-11-16 19:02       ` Rich Felker
2014-11-17 13:54       ` Catalin Marinas
2014-11-17 14:11         ` Szabolcs Nagy
2014-11-17 14:47           ` Catalin Marinas
2014-11-17 14:39         ` Russell King - ARM Linux
2014-11-17 15:26           ` Catalin Marinas
2014-11-17 15:47             ` Russell King - ARM Linux
2014-11-17 16:19               ` Catalin Marinas
2014-11-17 16:53                 ` Russell King - ARM Linux
2014-11-17 17:48                   ` Catalin Marinas
2014-11-17 17:38           ` Andy Lutomirski
2014-11-18 10:56             ` Catalin Marinas
2014-11-18 18:14               ` Will Deacon
2014-11-18 18:24                 ` Andy Lutomirski
2014-11-18 19:19                 ` Russell King - ARM Linux
2014-11-19 18:32                 ` Catalin Marinas
2014-11-17 11:48   ` Catalin Marinas
2014-11-17 12:21     ` Arnd Bergmann
2014-11-17 13:30       ` Szabolcs Nagy
2014-11-17 14:34         ` Catalin Marinas
2014-11-16 22:33 ` Jens Gustedt
2014-11-16 23:23   ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).