* Re: ARM atomics overhaul for musl
2014-11-16 17:10 ` Russell King - ARM Linux
@ 2014-11-16 18:27 ` Andy Lutomirski
2014-11-16 18:56 ` Rich Felker
2014-11-16 19:02 ` Rich Felker
2014-11-17 13:54 ` Catalin Marinas
2 siblings, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2014-11-16 18:27 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Rich Felker, musl, Szabolcs Nagy, Kees Cook, linux-arm-kernel
On Sun, Nov 16, 2014 at 9:10 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Sun, Nov 16, 2014 at 11:50:17AM -0500, Rich Felker wrote:
>> On Sun, Nov 16, 2014 at 04:33:56PM +0000, Russell King - ARM Linux wrote:
>> > On Sun, Nov 16, 2014 at 12:56:56AM -0500, Rich Felker wrote:
>> > > Aside from that, the only case among the above that's "right" already
>> > > is v7+. Hard-coding the mcr-based barrier on v6 is wrong because it's
>> >
>> > I don't think it's wrong at all. The instruction isn't going away from
>> > ARMv7, because ARMv7 deprecates it, but it _still_ has to be implemented
>> > by a CPU conforming to ARMv7. As ARMv7 is going to be the last 32-bit
>> > ARM architecture, we aren't going to see the MCR instruction disappearing
>> > on 32-bit CPUs.
>> >
>> > On ARMv8, it may have been removed, but we have already decided that the
>> > kernel _must_ provide emulation for this op-code, because otherwise we
>> > are breaking existing userspace, which is just not permissible. However,
>> > you are absolutely right that running on ARMv8 should use the new
>> > instruction where possible.
>>
>> Thanks for the clarification on the current and intended future
>> compatibility status!
>>
>> Emulation by the kernel would be something like 100x slower though,
>> no? While it's better than not working at all, I think that would be a
>> good argument for never using mcr explicitly unless either it's known
>> to be supported in hardware or there's no alternative (because kuser
>> helper is missing).
>
> Right, and that is "ARMv8 or later".
>
>> > > However neither is really very easy because it seems impossible to
>> > > detect whether the mcr-based barrier or the dmb-based barrier should
>> > > be used -- there's no hwcap flag to indicate support for the latter.
>> > > This also complicates what to do in builds for v6.
>> >
>> > It is entirely possible to detect whether you should use mcr or dmb, and
>> > you've said how to do that all the way through this message. The mcr
>> > instruction is present on ARMv6, and present but deprecated on ARMv7.
>> > dmb is only present on ARMv7. So, if you know the CPU architecture, you
>> > know whether you should be using nothing, mcr, or dmb.
>> >
>> > There's two ways to get that - firstly, the uname syscall, which gives
>> > a string in the form "armv..." which gives the CPU architecture. The
>>
>> Isn't it clear from the "Windows 10" fiasco that strcmp on a version
>> string is NOT an acceptable way to determine version/capabilities?
>
> Would there be a "Windows 10" fiasco if there had been better control of
> the version numbering? No.
>
> However, this is already in use as a CPU architecture thing. It's had a
> /very/ long history of being used by package managers to detect which
> packages are suitable for installation on a platform, whether it be an
> x86 platform, PowerPC, or ARM platform.
>
>> > second way is the ELF AT_PLATFORM entry. AT_PLATFORM has well defined
>> > format, and is already used to select between different library versions
>> > (so is already a user API, and is subject to user API rules). See:
>> >
>> > $ grep string.*elf_name arch/arm/mm/proc*.S
>> >
>> > for a list of the prefixes - the last character is always the endian-ness.
>> > From that, you can see that the format is "v" (for version), then the CPU
>> > architecture number, followed (optionally) by any suffixes. Parse that
>> > wisely, and you have the CPU architecture version, and the CPU architecture
>> > version defines whether the MCR or DMB variant should be used.
>>
>> That seems much more acceptable to use.
>>
>> > See http://lwn.net/Articles/519085/ for a way to get at the ELF aux info
>> > with recent glibc. I'm sure other C libraries will be getting their own
>> > implementation of that for compatibility with glibc.
>>
>> Yes, we have access to the aux vector, so this should work in
>> principle.
>
> In both of these cases, we know that:
> - ARMv1-ARMv3 is no longer supported (for several years)
> - ARMv4 and ARMv5 do not have either the MCR or DMB instructions.
> - ARMv6 has the MCR instruction only
> - ARMv7 has the MCR instruction and the DMB instruction.
> - ARMv8 has the DMB instruction, and MCR emulation.
>
> A safe bet would be that DMB is going to be there in the future (if that
> goes, then the ARM architecture will be regarded as even more of a toy
> architecture by Linus than he already regards it today, and he'll probably
> stop giving a damn about whether any changes break ARM.)
>
> Now, there is a twist here: ARM64 decided to use an ELF platform string
> of "aarch64" for everything, which means that rather than encoding the
> CPU architecture (like with every other Linux architecture), we have a
> string which encodes the kernel architecture instead, which is absurd.
> Obviously, the plan for ARM64 is that there will never be an ARMv9
> architecture, and ARMv8 is the perfect architecture for the future. :p
>
> So, a reasonable parsing of this would be:
>
> const char *ptr;
> int architecture;
>
> ptr = (const char *)(uintptr_t)getauxval(AT_PLATFORM);
> assert(ptr);
>
> if (!strncmp(ptr, "aarch64", 7))
> architecture = 8;
> else
> assert(sscanf(ptr, "v%d", &architecture) == 1);
>
> switch (architecture) {
> case 4:
> case 5:
> no_mcr_dmb;
> break;
> case 6:
> use_mcr;
> break;
> default:
> use_dmb;
> break;
> }
>
> That will be safe - we can't really predict what future architectures will
> do, but as I say above, if dmb vanishes in future with a preference for
> yet another different method, I think the ARM architecture will be laughed
> at even more than it is today.
>
> Before this is finalised, I think the ARM64 maintainers need to have a long
> think about the wiseness of their existing AT_PLATFORM string, and consider
> whether they have created something of a cockup there. But that's /their/
> problem, it isn't an ARM32 problem, on ARM32 this is the solution which
> should be used.
Would it make sense for arm and arm64 to add bits for these features
to AT_HWCAP, along with an extra bit indicating that the kernel
provides these bits?
--Andy
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: ARM atomics overhaul for musl
2014-11-16 18:27 ` Andy Lutomirski
@ 2014-11-16 18:56 ` Rich Felker
0 siblings, 0 replies; 28+ messages in thread
From: Rich Felker @ 2014-11-16 18:56 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Russell King - ARM Linux, musl, Szabolcs Nagy, Kees Cook,
linux-arm-kernel
On Sun, Nov 16, 2014 at 10:27:04AM -0800, Andy Lutomirski wrote:
> Would it make sense for arm and arm64 to add bits for these features
> to AT_HWCAP, along with an extra bit indicating that the kernel
> provides these bits?
Sadly since it wasn't available there from the beginning, I don't
think there would be a lot of benefit in adding it now, but it
wouldn't hurt.
It might be useful if there's a risk that the existing methods will
break in the future; adding it now would ensure that there is only a
known finite set of kernels for which the old hackish string methods
need to be used, so that there's no concern about their compatibility
with future kernels/models.
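A minimal sketch of how userspace might consume such bits, assuming they were ever added. The names HYP_HWCAP_BARRIER_INFO, HYP_HWCAP_DMB and HYP_HWCAP_MCR_BARRIER and their bit positions are invented for illustration; no kernel defines them. The point is only that the "kernel provides these bits" marker lets a libc tell "feature absent" apart from "old kernel that says nothing".

```c
#include <assert.h>

/* Hypothetical bit assignments, for illustration only; no kernel defines them. */
#define HYP_HWCAP_BARRIER_INFO (1UL << 29) /* kernel fills in the two bits below */
#define HYP_HWCAP_DMB          (1UL << 30) /* assumed meaning: dmb runs natively */
#define HYP_HWCAP_MCR_BARRIER  (1UL << 31) /* assumed meaning: cp15 mcr barrier runs natively */

enum hwcap_barrier {
    HWCAP_BARRIER_UNKNOWN, /* old kernel: fall back to kuser or AT_PLATFORM parsing */
    HWCAP_BARRIER_KUSER,   /* kernel reports neither instruction: use the kuser helper */
    HWCAP_BARRIER_MCR,
    HWCAP_BARRIER_DMB
};

/* Pick a barrier method from an AT_HWCAP-style word. */
enum hwcap_barrier barrier_from_hwcap(unsigned long hwcap)
{
    if (!(hwcap & HYP_HWCAP_BARRIER_INFO))
        return HWCAP_BARRIER_UNKNOWN;   /* the feature bits carry no meaning */
    if (hwcap & HYP_HWCAP_DMB)
        return HWCAP_BARRIER_DMB;       /* prefer dmb whenever it is reported */
    if (hwcap & HYP_HWCAP_MCR_BARRIER)
        return HWCAP_BARRIER_MCR;
    return HWCAP_BARRIER_KUSER;
}

/* Small self-check showing the intended behaviour. */
void barrier_from_hwcap_selfcheck(void)
{
    assert(barrier_from_hwcap(0) == HWCAP_BARRIER_UNKNOWN);
    assert(barrier_from_hwcap(HYP_HWCAP_DMB) == HWCAP_BARRIER_UNKNOWN);
    assert(barrier_from_hwcap(HYP_HWCAP_BARRIER_INFO | HYP_HWCAP_DMB)
           == HWCAP_BARRIER_DMB);
}
```

On a kernel without the marker bit the result is HWCAP_BARRIER_UNKNOWN, which is exactly the "known finite set of kernels" for which the string-based methods would still be needed.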
Rich
* Re: ARM atomics overhaul for musl
2014-11-16 17:10 ` Russell King - ARM Linux
2014-11-16 18:27 ` Andy Lutomirski
@ 2014-11-16 19:02 ` Rich Felker
2014-11-17 13:54 ` Catalin Marinas
2 siblings, 0 replies; 28+ messages in thread
From: Rich Felker @ 2014-11-16 19:02 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: musl, Andy Lutomirski, Szabolcs Nagy, Kees Cook, linux-arm-kernel
On Sun, Nov 16, 2014 at 05:10:55PM +0000, Russell King - ARM Linux wrote:
> > > There's two ways to get that - firstly, the uname syscall, which gives
> > > a string in the form "armv..." which gives the CPU architecture. The
> >
> > Isn't it clear from the "Windows 10" fiasco that strcmp on a version
> > string is NOT an acceptable way to determine version/capabilities?
>
> Would there be a "Windows 10" fiasco if there had been better control of
> the version numbering? No.
>
> However, this is already in use as a CPU architecture thing. It's had a
> /very/ long history of being used by package managers to detect which
> packages are suitable for installation on a platform, whether it be an
> x86 platform, PowerPC, or ARM platform.
Use by package managers (which can be upgraded independently, and
which can, in the worst case, be overridden anyway) and by program
binaries for which you might not even have source are very different
issues.
> In both of these cases, we know that:
> - ARMv1-ARMv3 is no longer supported (for several years)
> - ARMv4 and ARMv5 do not have either the MCR or DMB instructions.
> - ARMv6 has the MCR instruction only
> - ARMv7 has the MCR instruction and the DMB instruction.
> - ARMv8 has the DMB instruction, and MCR emulation.
>
> A safe bet would be that DMB is going to be there in the future (if that
> goes, then the ARM architecture will be regarded as even more of a toy
> architecture by Linus than he already regards it today, and he'll probably
> stop giving a damn about whether any changes break ARM.)
Yes, I think that's reasonable.
> Now, there is a twist here: ARM64 decided to use an ELF platform string
> of "aarch64" for everything, which means that rather than encoding the
> CPU architecture (like with every other Linux architecture), we have a
> string which encodes the kernel architecture instead, which is absurd.
> Obviously, the plan for ARM64 is that there will never be an ARMv9
> architecture, and ARMv8 is the perfect architecture for the future. :p
I'm confused. Does this mean that 32-bit binaries running on a 64-bit
kernel are going to see "aarch64" here?
> So, a reasonable parsing of this would be:
>
> const char *ptr;
> int architecture;
>
> ptr = (const char *)(uintptr_t)getauxval(AT_PLATFORM);
> assert(ptr);
>
> if (!strncmp(ptr, "aarch64", 7))
> architecture = 8;
> else
> assert(sscanf(ptr, "v%d", &architecture) == 1);
>
> switch (architecture) {
> case 4:
> case 5:
> no_mcr_dmb;
> break;
> case 6:
> use_mcr;
> break;
> default:
> use_dmb;
> break;
> }
Is (ptr[1]=='6' && !isdigit(ptr[2])) a safe condition for v6? v4/v5
(and original v6 without the k) don't need to be detected at all since
kuser is mandatory for them and already indicated by !(hwcap &
HWCAP_TLS).
Rich
* Re: ARM atomics overhaul for musl
2014-11-16 17:10 ` Russell King - ARM Linux
2014-11-16 18:27 ` Andy Lutomirski
2014-11-16 19:02 ` Rich Felker
@ 2014-11-17 13:54 ` Catalin Marinas
2014-11-17 14:11 ` Szabolcs Nagy
2014-11-17 14:39 ` Russell King - ARM Linux
2 siblings, 2 replies; 28+ messages in thread
From: Catalin Marinas @ 2014-11-17 13:54 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Rich Felker, Szabolcs Nagy, musl, Kees Cook, linux-arm-kernel,
Andy Lutomirski
On Sun, Nov 16, 2014 at 05:10:55PM +0000, Russell King - ARM Linux wrote:
> On Sun, Nov 16, 2014 at 11:50:17AM -0500, Rich Felker wrote:
> > On Sun, Nov 16, 2014 at 04:33:56PM +0000, Russell King - ARM Linux wrote:
> > > second way is the ELF AT_PLATFORM entry. AT_PLATFORM has well defined
> > > format, and is already used to select between different library versions
> > > (so is already a user API, and is subject to user API rules). See:
> > >
> > > $ grep string.*elf_name arch/arm/mm/proc*.S
> > >
> > > for a list of the prefixes - the last character is always the endian-ness.
> > > From that, you can see that the format is "v" (for version), then the CPU
> > > architecture number, followed (optionally) by any suffixes. Parse that
> > > wisely, and you have the CPU architecture version, and the CPU architecture
> > > version defines whether the MCR or DMB variant should be used.
> >
> > That seems much more acceptable to use.
> >
> > > See http://lwn.net/Articles/519085/ for a way to get at the ELF aux info
> > > with recent glibc. I'm sure other C libraries will be getting their own
> > > implementation of that for compatibility with glibc.
> >
> > Yes, we have access to the aux vector, so this should work in
> > principle.
>
> In both of these cases, we know that:
> - ARMv1-ARMv3 is no longer supported (for several years)
> - ARMv4 and ARMv5 do not have either the MCR or DMB instructions.
> - ARMv6 has the MCR instruction only
> - ARMv7 has the MCR instruction and the DMB instruction.
> - ARMv8 has the DMB instruction, and MCR emulation.
MCR can be enabled in hardware on ARMv8 (SCTLR_EL1 bit), though there is
no guarantee that it is as fast as the DMB (normally I don't see a
reason why it wouldn't be, it's just an instruction decoding problem, but
you never know what the microarchitecture does).
> Now, there is a twist here: ARM64 decided to use an ELF platform string
> of "aarch64" for everything,
Please define "everything". This matches the ELF name as defined in the
ARM 64-bit ELF ABI.
> which means that rather than encoding the
> CPU architecture (like with every other Linux architecture), we have a
> string which encodes the kernel architecture instead, which is absurd.
Just like x86_64 vs i686?
> Obviously, the plan for ARM64 is that there will never be an ARMv9
> architecture, and ARMv8 is the perfect architecture for the future. :p
If you haven't noticed, the distinction between ARMv6 and ARMv7 has been
blurred enough (guess why cpu_architecture() reports ARMv7 for
ARM11MPCore). ARM is trying to move away from architecture version
numbers, which are rather useful for marketing, to proper feature
detection based on CPUID. Whether there is an ARMv9 or not, it's
irrelevant to what Linux should do (i.e. use CPUID rather than guess
features based on architecture version numbers).
> So, a reasonable parsing of this would be:
>
> const char *ptr;
> int architecture;
>
> ptr = (const char *)(uintptr_t)getauxval(AT_PLATFORM);
> assert(ptr);
>
> if (!strncmp(ptr, "aarch64", 7))
> architecture = 8;
> else
> assert(sscanf(ptr, "v%d", &architecture) == 1);
Oh, have you even bothered trying 32-bit (compat) getauxval(AT_PLATFORM)
on an aarch64 kernel? It reports "v8l", so please don't confuse others.
> Before this is finalised, I think the ARM64 maintainers need to have a long
> think about the wiseness of their existing AT_PLATFORM string, and consider
> whether they have created something of a cockup there.
We had a think a long time ago already and it was a wise decision. FWIW,
it matches x86 in this respect.
--
Catalin
* Re: ARM atomics overhaul for musl
2014-11-17 13:54 ` Catalin Marinas
@ 2014-11-17 14:11 ` Szabolcs Nagy
2014-11-17 14:47 ` Catalin Marinas
2014-11-17 14:39 ` Russell King - ARM Linux
1 sibling, 1 reply; 28+ messages in thread
From: Szabolcs Nagy @ 2014-11-17 14:11 UTC (permalink / raw)
To: Catalin Marinas
Cc: Russell King - ARM Linux, Rich Felker, musl, Kees Cook,
linux-arm-kernel, Andy Lutomirski
* Catalin Marinas <catalin.marinas@arm.com> [2014-11-17 13:54:13 +0000]:
> ARM11MPCore). ARM is trying to move away from architecture version
> numbers, which are rather useful for marketing, to proper feature
> detection based on CPUID. Whether there is an ARMv9 or not, it's
> irrelevant to what Linux should do (i.e. use CPUID rather than guess
> features based on architecture version numbers).
>
how to use cpuid from userspace?
should linux export those bits into hwcap?
(i assume all relevant info will be available in /proc/cpuinfo but
that does not work for libc)
* Re: ARM atomics overhaul for musl
2014-11-17 14:11 ` Szabolcs Nagy
@ 2014-11-17 14:47 ` Catalin Marinas
0 siblings, 0 replies; 28+ messages in thread
From: Catalin Marinas @ 2014-11-17 14:47 UTC (permalink / raw)
To: Szabolcs Nagy
Cc: Russell King - ARM Linux, Rich Felker, musl, Kees Cook,
linux-arm-kernel, Andy Lutomirski
On Mon, Nov 17, 2014 at 02:11:23PM +0000, Szabolcs Nagy wrote:
> * Catalin Marinas <catalin.marinas@arm.com> [2014-11-17 13:54:13 +0000]:
> > ARM11MPCore). ARM is trying to move away from architecture version
> > numbers, which are rather useful for marketing, to proper feature
> > detection based on CPUID. Whether there is an ARMv9 or not, it's
> > irrelevant to what Linux should do (i.e. use CPUID rather than guess
> > features based on architecture version numbers).
>
> how to use cpuid from userspace?
It's not possible and not recommended either. Just because the hardware
supports a feature like Neon doesn't mean that the kernel supports it as
well (e.g. saving/restoring registers).
> should linux export those bits into hwcap?
For many things we do. Unfortunately, we didn't do this for DMB, maybe
because we relied too much on the kuser helpers.
> (i assume all relevant info will be available in /proc/cpuinfo but
> that does not work for libc)
I wouldn't recommend /proc/cpuinfo.
If you want to put a dependency on newer kernel versions, there are
options like hwcap, more info in auxv (e.g. an arch-specific dump of the
CPUID registers) or even emulating CPUID access in user space (trap the
undefined instruction and return something that the kernel knows it can
support).
I'm happy with any of these options but I would like to see a concrete
proposal accepted by the libc people before committing to
implementing/supporting such ABI.
--
Catalin
* Re: ARM atomics overhaul for musl
2014-11-17 13:54 ` Catalin Marinas
2014-11-17 14:11 ` Szabolcs Nagy
@ 2014-11-17 14:39 ` Russell King - ARM Linux
2014-11-17 15:26 ` Catalin Marinas
2014-11-17 17:38 ` Andy Lutomirski
1 sibling, 2 replies; 28+ messages in thread
From: Russell King - ARM Linux @ 2014-11-17 14:39 UTC (permalink / raw)
To: Catalin Marinas
Cc: Rich Felker, Szabolcs Nagy, musl, Kees Cook, linux-arm-kernel,
Andy Lutomirski
On Mon, Nov 17, 2014 at 01:54:13PM +0000, Catalin Marinas wrote:
> On Sun, Nov 16, 2014 at 05:10:55PM +0000, Russell King - ARM Linux wrote:
> > which means that rather than encoding the
> > CPU architecture (like with every other Linux architecture), we have a
> > string which encodes the kernel architecture instead, which is absurd.
>
> Just like x86_64 vs i686?
That is still valid, but let's wait and see what happens when a new
"version" of x86_64 comes along.
However, the issue on x86 is far less of a problem: userspace (even
kernel space) does not have to play these games because the CPUs aren't
designed by people intent on removing old instructions from the
instruction set, thereby stopping existing binaries working without
kernel emulation of the missing instructions.
> > Obviously, the plan for ARM64 is that there will never be an ARMv9
> > architecture, and ARMv8 is the perfect architecture for the future. :p
>
> If you haven't noticed, the distinction between ARMv6 and ARMv7 has been
> blurred enough (guess why cpu_architecture() reports ARMv7 for
> ARM11MPCore). ARM is trying to move away from architecture version
> numbers, which are rather useful for marketing, to proper feature
> detection based on CPUID. Whether there is an ARMv9 or not, it's
> irrelevant to what Linux should do (i.e. use CPUID rather than guess
> features based on architecture version numbers).
That may be what is desired, but unfortunately we have no way to export
all the intricate feature registers to userspace. No, elf hwcaps don't
support it, there's only 64 bits split between two words there, and
there are many more than just 64 bits of feature registers.
Given that ARM even cocked these up (just as happened with the cache
type register), decoding of the feature type registers depends on the
underlying CPU architecture.
So, even _if_ we exported the feature registers to userspace, you still
need to know the CPU architecture to decode them properly, so you still
need to parse the AT_PLATFORM string to get that information.
> > So, a reasonable parsing of this would be:
> >
> > const char *ptr;
> > int architecture;
> >
> > ptr = (const char *)(uintptr_t)getauxval(AT_PLATFORM);
> > assert(ptr);
> >
> > if (!strncmp(ptr, "aarch64", 7))
> > architecture = 8;
> > else
> > assert(sscanf(ptr, "v%d", &architecture) == 1);
>
> Oh, have you even bothered trying 32-bit (compat) getauxval(AT_PLATFORM)
> on an aarch64 kernel? It reports "v8l", so please don't confuse others.
Right, I see that now - I'm not knowledgeable about the compat code, because
ARM32 has nothing to do with it, and I missed COMPAT_ELF_PLATFORM.
As for "bothered trying" - tell me how I could possibly try that. You
know full well that I have /no/ 64-bit hardware, and you also know that
I have *nothing* capable of running, let alone building a 64-bit ARM
kernel.
Please, next time you decide to make accusations, bear that in mind - my
"guesses" as to what ARM64 does are based upon reading your code,
sometimes for the first time, and not through any kind of experience of
actually running the damned stuff.
Now, think about what /you/ have said. Think about your assertion about
that "v8l" string. How does the code react to that? Oh my, it sets
"architecture" to 8 ! Oh lookie, it's the right value. Oh look, the
code works correctly.
So, counter to your crap about me confusing others, maybe you should
make that same accusation of yourself!
Maybe ARM and yourself should have tried to be more inclusive with ARMv8
in general, rather than trying to push me away with accusations and the
like (like you're doing right now) every time I say something about it.
--
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.
* Re: ARM atomics overhaul for musl
2014-11-17 14:39 ` Russell King - ARM Linux
@ 2014-11-17 15:26 ` Catalin Marinas
2014-11-17 15:47 ` Russell King - ARM Linux
2014-11-17 17:38 ` Andy Lutomirski
1 sibling, 1 reply; 28+ messages in thread
From: Catalin Marinas @ 2014-11-17 15:26 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Rich Felker, Szabolcs Nagy, musl, Kees Cook, linux-arm-kernel,
Andy Lutomirski
On Mon, Nov 17, 2014 at 02:39:05PM +0000, Russell King - ARM Linux wrote:
> On Mon, Nov 17, 2014 at 01:54:13PM +0000, Catalin Marinas wrote:
> > On Sun, Nov 16, 2014 at 05:10:55PM +0000, Russell King - ARM Linux wrote:
> > > which means that rather than encoding the
> > > CPU architecture (like with every other Linux architecture), we have a
> > > string which encodes the kernel architecture instead, which is absurd.
> >
> > Just like x86_64 vs i686?
>
> That is still valid, but let's wait and see what happens when a new
> "version" of x86_64 comes along.
I'm not familiar enough with x86 but are there any differences between
AMD's and Intel's implementations? Or are they completely binary
compatible (no extensions)? The differences are probably covered by
hwcap.
> However, the issue on x86 is far less of a problem: userspace (even
> kernel space) does not have to play these games because the CPUs aren't
> designed by people intent on removing old instructions from the
> instruction set, thereby stopping existing binaries working without
> kernel emulation of the missing instructions.
That's what ARM hopes with AArch64. Whether this will still be valid
many years in the future, I can't tell (but a lesson learnt by the
architecture folk is that it's impossible to get rid of old instructions
in user space). There are, of course, optional features like crypto but
we use hwcap for them.
> > > Obviously, the plan for ARM64 is that there will never be an ARMv9
> > > architecture, and ARMv8 is the perfect architecture for the future. :p
> >
> > If you haven't noticed, the distinction between ARMv6 and ARMv7 has been
> > blurred enough (guess why cpu_architecture() reports ARMv7 for
> > ARM11MPCore). ARM is trying to move away from architecture version
> > numbers, which are rather useful for marketing, to proper feature
> > detection based on CPUID. Whether there is an ARMv9 or not, it's
> > irrelevant to what Linux should do (i.e. use CPUID rather than guess
> > features based on architecture version numbers).
>
> That may be what is desired, but unfortunately we have no way to export
> all the intricate feature registers to userspace. No, elf hwcaps don't
> support it, there's only 64 bits split between two words there, and
> there are many more than just 64 bits of feature registers.
As I replied to Szabolcs, maybe we need a way to export more of the
CPUID space to user (like trapping such mrc's and returning something
that the kernel has enabled). I have a similar request from the AArch64
tools people.
> Given that even cocked these up (just as what happened with the cache
> type register) decoding of the feature type registers depends on the
> underlying CPU architecture.
>
> So, even _if_ we exported the feature registers to userspace, you still
> need to know the CPU architecture to decode them properly, so you still
> need to parse the AT_PLATFORM string to get that information.
From ARMv7 and many recent ARMv6, you can rely on the MIDR to tell you
whether you have the extended CPUID or not. Prior to that, MIDR contains
the architecture number.
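To make the MIDR point concrete, here is a small sketch of the decoding, under the assumption (from the ARM architecture manuals as I understand them) that the architecture field is MIDR bits [19:16] and that the value 0xF means "use the CPUID scheme". The function names are made up, the two sample ID values are quoted from memory for illustration, and the layout of very old (ARM7-era) ID registers is ignored. Userspace cannot read the MIDR directly, so this only illustrates what a kernel or emulation layer would look at.

```c
#include <assert.h>
#include <stdint.h>

#define MIDR_ARCH_SHIFT 16
#define MIDR_ARCH_MASK  0xFu
#define MIDR_ARCH_CPUID 0xFu /* architecture described by the CPUID feature registers */

/* Extract the 4-bit architecture field from a MIDR value. */
unsigned midr_arch_field(uint32_t midr)
{
    return (midr >> MIDR_ARCH_SHIFT) & MIDR_ARCH_MASK;
}

/* Returns 1 when the extended CPUID feature registers should be consulted. */
int midr_uses_cpuid_scheme(uint32_t midr)
{
    return midr_arch_field(midr) == MIDR_ARCH_CPUID;
}

/* Self-check with two illustrative ID values (assumed, not read from hardware). */
void midr_selfcheck(void)
{
    /* Cortex-A9 style ID: architecture field is 0xF. */
    assert(midr_uses_cpuid_scheme(0x413FC090u));
    /* ARM926EJ-S style ID: the field holds an architecture code instead. */
    assert(midr_arch_field(0x41069265u) == 0x6u);
    assert(!midr_uses_cpuid_scheme(0x41069265u));
}
```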
> > > So, a reasonable parsing of this would be:
> > >
> > > const char *ptr;
> > > int architecture;
> > >
> > > ptr = (const char *)(uintptr_t)getauxval(AT_PLATFORM);
> > > assert(ptr);
> > >
> > > if (!strncmp(ptr, "aarch64", 7))
> > > architecture = 8;
> > > else
> > > assert(sscanf(ptr, "v%d", &architecture) == 1);
> >
> > Oh, have you even bothered trying 32-bit (compat) getauxval(AT_PLATFORM)
> > on an aarch64 kernel? It reports "v8l", so please don't confuse others.
>
> Right, I see that now - I'm not knowledgeable about the compat code, because
> ARM32 has nothing to do with it, and I missed COMPAT_ELF_PLATFORM.
>
> As for "bothered trying" - tell me how I could possibly try that. You
> know full well that I have /no/ 64-bit hardware, and you also know that
> I have *nothing* capable of running, let alone building a 64-bit ARM
> kernel.
Sorry, I was assuming that you have access to at least an ARM software
model (freely available, AArch64 Qemu is also stable enough). If you
have an interest and need ARMv8 hardware, please let us know.
> Now, think about what /you/ have said. Think about your assertion about
> that "v8l" string. How does the code react to that? Oh my, it sets
> "architecture" to 8 ! Oh lookie, it's the right value. Oh look, the
> code works correctly.
So what would you like to set it to? "v7l"? Even for pre-ARMv8 CPUs,
such value doesn't give enough information and user space should rely on
hwcap (yes, we missed a HWCAP_DMB because we relied on kuser helpers;
another big thing we missed is Thumb-2 in hwcap).
For ARMv8, we have additional features that I would like to include in
hwcap on arm32 (and we've already done this with crypto; there are
load/store with release/acquire semantics which would allow slightly
faster locking, see kuser helpers provided by the AArch64 kernel to
compat user space).
(I'm ignoring the rest of your email in order to keep the thread
constructive)
--
Catalin
* Re: ARM atomics overhaul for musl
2014-11-17 15:26 ` Catalin Marinas
@ 2014-11-17 15:47 ` Russell King - ARM Linux
2014-11-17 16:19 ` Catalin Marinas
0 siblings, 1 reply; 28+ messages in thread
From: Russell King - ARM Linux @ 2014-11-17 15:47 UTC (permalink / raw)
To: Catalin Marinas
Cc: Rich Felker, Szabolcs Nagy, musl, Kees Cook, linux-arm-kernel,
Andy Lutomirski
On Mon, Nov 17, 2014 at 03:26:25PM +0000, Catalin Marinas wrote:
> On Mon, Nov 17, 2014 at 02:39:05PM +0000, Russell King - ARM Linux wrote:
> > Given that even cocked these up (just as what happened with the cache
> > type register) decoding of the feature type registers depends on the
> > underlying CPU architecture.
> >
> > So, even _if_ we exported the feature registers to userspace, you still
> > need to know the CPU architecture to decode them properly, so you still
> > need to parse the AT_PLATFORM string to get that information.
>
> From ARMv7 and many recent ARMv6, you can rely on the MIDR to tell you
> whether you have the extended CPUID or not. Prior to that, MIDR contains
> the architecture number.
That is not what I'm referring to. Where the feature registers are
implemented, there are at least two different interpretations of these
feature registers. They do not comprise a single coherent set of
definitions - the meaning of some nibbles was changed between different
architectures.
> So what would you like to set it to? "v7l"? Even for pre-ARMv8 CPUs,
> such value doesn't give enough information and user space should rely on
> hwcap (yes, we missed a HWCAP_DMB because we relied on kuser helpers;
> another big thing we missed is Thumb-2 in hwcap).
Shall we look at the entire code fragment again, and this time use our
heads to *think* about it first?
    const char *ptr;
    int architecture;

    ptr = (const char *)(uintptr_t)getauxval(AT_PLATFORM);
    assert(ptr);

    if (!strncmp(ptr, "aarch64", 7))
        architecture = 8;
    else
        assert(sscanf(ptr, "v%d", &architecture) == 1);

    switch (architecture) {
    case 4:
    case 5:
        no_mcr_dmb;
        break;
    case 6:
        use_mcr;
        break;
    default:
        use_dmb;
        break;
    }
Now, if 32-bit ARMv8 returns "v8l" from the AT_PLATFORM auxval, then
it is not equal to "aarch64". So, we fall through to the sscanf(). sscanf()
parses the "v8l" string, and sets "architecture" to 8.
We now enter the switch() statement. 8 isn't 4. 8 also isn't 5. Nor is
it 6. So, we fall through to the "default" section, which uses "use_dmb".
That's the correct answer for ARMv8 CPUs, because we don't want to use
the MCR instruction there, nor do we want to do nothing. That is not
coincidence - it was /specifically/ designed to select that outcome for
any architecture value it didn't explicitly know. The assumption there
is that ARM are not going to deprecate and remove the dmb instruction.
So it doesn't matter if there's a v9, v10, v11, v12 etc. It'll continue
to select the dmb method until the code is modified to do otherwise.
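The walkthrough above can be checked with a self-contained variant that takes the platform string as a parameter instead of calling getauxval(). This is a sketch, not the fragment itself: the names platform_arch and barrier_for_arch and the enum are invented for illustration, and sscanf() is moved out of assert() because an assert() argument is not evaluated when NDEBUG is defined.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum barrier_method { BARRIER_NEITHER, BARRIER_USE_MCR, BARRIER_USE_DMB };

/* Returns the architecture number from an AT_PLATFORM-style string, or -1. */
int platform_arch(const char *platform)
{
    int arch;

    assert(platform != NULL);
    if (strncmp(platform, "aarch64", 7) == 0)
        return 8;
    /* Kept outside assert() so the parse still happens under NDEBUG. */
    if (sscanf(platform, "v%d", &arch) != 1)
        return -1;
    return arch;
}

/* Same mapping as the switch in the mail: unknown versions select dmb. */
enum barrier_method barrier_for_arch(int arch)
{
    switch (arch) {
    case 4:
    case 5:
        return BARRIER_NEITHER;
    case 6:
        return BARRIER_USE_MCR;
    default:
        return BARRIER_USE_DMB;
    }
}
```

Feeding it "v8l" yields 8 and then BARRIER_USE_DMB, matching the reasoning above. A caller would need to handle the -1 (unparseable) case itself before calling barrier_for_arch(), since the default branch would otherwise pick dmb for it as well.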
So, maybe I'm not as stupid as you first thought, and maybe I /did/ think
carefully about the possible scenarios before suggesting this code
fragment as a solution.
--
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.
* Re: ARM atomics overhaul for musl
2014-11-17 15:47 ` Russell King - ARM Linux
@ 2014-11-17 16:19 ` Catalin Marinas
2014-11-17 16:53 ` Russell King - ARM Linux
0 siblings, 1 reply; 28+ messages in thread
From: Catalin Marinas @ 2014-11-17 16:19 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Rich Felker, Szabolcs Nagy, musl, Kees Cook, linux-arm-kernel,
Andy Lutomirski
On Mon, Nov 17, 2014 at 03:47:39PM +0000, Russell King - ARM Linux wrote:
> On Mon, Nov 17, 2014 at 03:26:25PM +0000, Catalin Marinas wrote:
> > On Mon, Nov 17, 2014 at 02:39:05PM +0000, Russell King - ARM Linux wrote:
> > > Given that even cocked these up (just as what happened with the cache
> > > type register) decoding of the feature type registers depends on the
> > > underlying CPU architecture.
> > >
> > > So, even _if_ we exported the feature registers to userspace, you still
> > > need to know the CPU architecture to decode them properly, so you still
> > > need to parse the AT_PLATFORM string to get that information.
> >
> > From ARMv7 and many recent ARMv6, you can rely on the MIDR to tell you
> > whether you have the extended CPUID or not. Prior to that, MIDR contains
> > the architecture number.
>
> That is not what I'm referring to. Where the feature registers are
> implemented, there are at least two different interpretations of these
> feature registers. They do not comprise a single coherent set of
> definitions - the meaning of some nibbles was changed between different
> architectures.
They were indeed messy on ARMv6 and earlier but I think they stabilised
enough for ARMv7.
> > So what would you like to set it to? "v7l"? Even for pre-ARMv8 CPUs,
> > such value doesn't give enough information and user space should rely on
> > hwcap (yes, we missed a HWCAP_DMB because we relied on kuser helpers;
> > another big thing we missed is Thumb-2 in hwcap).
>
> Shall we look at the entire code fragment again, and this time use our
> heads to *think* about it first?
>
>         const char *ptr;
>         int architecture;
>
>         ptr = (const char *)(uintptr_t)getauxval(AT_PLATFORM);
>         assert(ptr);
>
>         if (!strncmp(ptr, "aarch64", 7))
>                 architecture = 8;
>         else
>                 assert(sscanf(ptr, "v%d", &architecture) == 1);
>
>         switch (architecture) {
>         case 4:
>         case 5:
>                 no_mcr_dmb;
>                 break;
>         case 6:
>                 use_mcr;
>                 break;
>         default:
>                 use_dmb;
>                 break;
>         }
>
> Now, if 32-bit ARMv8 returns "v8l" from the AT_PLATFORM auxval, then
> it is not equal to "aarch64". So, we fall through to the sscanf(). sscanf()
> parses the "v8l" string, and sets "architecture" to 8.
I agree, but is there a reason to still check for "aarch64" AT_PLATFORM?
> We now enter the switch() statement. 8 isn't 4. 8 also isn't 5. Nor is
> it 6. So, we fall through to the "default" section, which uses "use_dmb".
This indeed works and it is likely the way you designed it with the
_arm32_ kernel in mind (but not before accusing the arm64 maintainers of
making a bad decision with the "aarch64" AT_PLATFORM string for compat
apps ;)).
In your code sequence, the "aarch64" check should be removed, unless you
aim it at portable code between 32 and 64-bit but I would rather use an
#ifdef __aarch64__ in such case. On AArch64 (nothing to do with ARMv8,
v9 etc.), we should move away from thinking in terms of architecture
version numbers but just features.
Similarly for AArch32, I think we should switch our focus from version
numbers (well, only from v7/v8) to features (exposed by the hardware to
the kernel via CPUID). An example is how we got LPAE on ARMv7 without a
change in the architecture version number. We even expose this to user
space via hwcap because that's how we know we have atomic LDRD/STRD.
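As a rough illustration of the hwcap-based approach, a user-space check
for atomic LDRD/STRD might look like the sketch below. The bit value is
believed to match HWCAP_LPAE in the 32-bit ARM <asm/hwcap.h>, but it is
redefined here under a local name (an assumption of this sketch) so the
code builds on any host; has_atomic_ldrd_strd() is likewise a made-up
name, not an existing API.

```c
#include <stdbool.h>

/* Assumed to equal HWCAP_LPAE from the 32-bit ARM <asm/hwcap.h>;
 * check the header on the real target before relying on it. */
#define SKETCH_HWCAP_LPAE (1UL << 20)

/* Hypothetical helper: LPAE implies atomic 64-bit LDRD/STRD. */
static bool has_atomic_ldrd_strd(unsigned long hwcap)
{
    return (hwcap & SKETCH_HWCAP_LPAE) != 0;
}
```

On an ARM target the argument would be getauxval(AT_HWCAP) from
<sys/auxv.h>.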
--
Catalin
* Re: ARM atomics overhaul for musl
2014-11-17 16:19 ` Catalin Marinas
@ 2014-11-17 16:53 ` Russell King - ARM Linux
2014-11-17 17:48 ` Catalin Marinas
0 siblings, 1 reply; 28+ messages in thread
From: Russell King - ARM Linux @ 2014-11-17 16:53 UTC (permalink / raw)
To: Catalin Marinas
Cc: Rich Felker, Szabolcs Nagy, musl, Kees Cook, linux-arm-kernel,
Andy Lutomirski
On Mon, Nov 17, 2014 at 04:19:42PM +0000, Catalin Marinas wrote:
> This indeed works and it is likely the way you designed it with the
> _arm32_ kernel in mind (but not before accusing the arm64 maintainers of
> making a bad decision with the "aarch64" AT_PLATFORM string for compat
> apps ;)).
Seeing as I'm the ARM32 maintainer, and you are the ARM64 maintainer, then
of course I designed it with the ARM32 kernel in mind, with a reference
into the ARM64 situation to the best of my knowledge, which suggested
that compat tasks got the "aarch64" string. As you have pointed out,
they don't, they get a "v8l" string, which means...
> In your code sequence, the "aarch64" check should be removed, unless you
> aim it at portable code between 32 and 64-bit but I would rather use an
> #ifdef __aarch64__ in such case. On AArch64 (nothing to do with ARMv8,
> v9 etc.), we should move away from thinking in terms of architecture
> version numbers but just features.
... that it can indeed be removed. To repeat, the check for "aarch64"
was only there because I thought that ARM64 kernels used that for
everything.
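With the "aarch64" check dropped, the parsing step reduces to a single
sscanf(). A minimal sketch, assuming a helper called
parse_arch_version() (an illustrative name, not an existing API) that is
handed the AT_PLATFORM string:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns the number following the leading 'v'
 * in strings such as "v7l" or "v8l", or -1 if the string has another
 * shape. The caller is expected to reject -1 before using the value. */
static int parse_arch_version(const char *platform)
{
    int arch;

    assert(platform != NULL);
    /* The parse is kept outside assert() so it still runs under NDEBUG. */
    if (sscanf(platform, "v%d", &arch) != 1)
        return -1;
    return arch;
}
```

Keeping sscanf() out of the assert() expression matters: with NDEBUG
defined, assert() expands to nothing and a parse placed inside it would
not run at all.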
> Similarly for AArch32, I think we should switch our focus from version
> numbers (well, only from v7/v8) to features (exposed by the hardware to
> the kernel via CPUID). An example is how we got LPAE on ARMv7 without a
> change in the architecture version number. We even expose this to user
> space via hwcap because that's how we know we have atomic LDRD/STRD.
For this case, I disagree. There is no value (in fact, there is lots of
harm) to adding a hwcap bit for this.
If we added such a hwcap bit, it would mean that userspace would have
to implement the check that I suggested, plus a check for the hwcap bit,
plus maybe a kernel version check to decide which test to use.
That is needlessly complicated. Okay, you could decide that if the
hwcap bit is set, then that indicates that DMB should be used, but you
still have to then check the architecture version if it isn't set to
be compatible with old kernels.
So, it's all round simpler just to do the architecture version check -
and we know for certain that ARMv4, ARMv5, and ARMv6 do not have dmb.
We know that ARMv7 and ARMv8 both have dmb, and it is likely (especially
if you exert pressure on the architecture people) that dmb will remain
implemented. We also know that ARMv6 implements the mcr instruction.
So, in this case, we know everything we need to know just by looking
at the architecture version.
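To make the old-kernel problem concrete, here is a sketch of what user
space would end up with if such a bit were added. HYPOTHETICAL_HWCAP_DMB
and its value are invented for this example (no such hwcap bit exists),
and select_barrier() is an illustrative name. The version switch cannot
be dropped, because a clear bit may simply mean the kernel predates it.

```c
enum barrier_method { BARRIER_NONE, BARRIER_MCR, BARRIER_DMB };

/* Invented for this sketch: no such hwcap bit exists. */
#define HYPOTHETICAL_HWCAP_DMB (1UL << 31)

static enum barrier_method select_barrier(unsigned long hwcap, int arch)
{
    if (hwcap & HYPOTHETICAL_HWCAP_DMB)
        return BARRIER_DMB;

    /* Bit clear: either no dmb, or a kernel too old to report it,
     * so the architecture version still has to be consulted. */
    switch (arch) {
    case 4:
    case 5:
        return BARRIER_NONE;   /* neither the mcr barrier nor dmb */
    case 6:
        return BARRIER_MCR;
    default:
        return BARRIER_DMB;    /* v7, v8 and any later number */
    }
}
```

The hwcap test only adds a fast path in front of the same switch that
the version-only approach needs anyway.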
Of course, we can't predict the future with any accuracy, but hoping
that dmb won't be deprecated and obsoleted is a reasonable hope, and
if it does, we would need to modify the code to add the new method in
any case.
What the code is *intentionally* safe from is the architecture number
incrementing.
So, I really don't see the point in exposing the presence of DMB via
a hwcap bit - if we wanted to do that, it's something that we should
have done at the very start, but we didn't. Now, it's pointless to
do so.
* Re: ARM atomics overhaul for musl
2014-11-17 16:53 ` Russell King - ARM Linux
@ 2014-11-17 17:48 ` Catalin Marinas
0 siblings, 0 replies; 28+ messages in thread
From: Catalin Marinas @ 2014-11-17 17:48 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Rich Felker, Szabolcs Nagy, musl, Kees Cook, linux-arm-kernel,
Andy Lutomirski
On Mon, Nov 17, 2014 at 04:53:34PM +0000, Russell King - ARM Linux wrote:
> On Mon, Nov 17, 2014 at 04:19:42PM +0000, Catalin Marinas wrote:
> > Similarly for AArch32, I think we should switch our focus from version
> > numbers (well, only from v7/v8) to features (exposed by the hardware to
> > the kernel via CPUID). An example is how we got LPAE on ARMv7 without a
> > change in the architecture version number. We even expose this to user
> > space via hwcap because that's how we know we have atomic LDRD/STRD.
>
> For this case, I disagree. There is no value (in fact, there is lots of
> harm) to adding a hwcap bit for this.
>
> If we added such a hwcap bit, it would mean that userspace would have
> to implement the check that I suggested, plus a check for the hwcap bit,
> plus maybe a kernel version check to decide which test to use.
[...]
> So, I really don't see the point in exposing the presence of DMB via
> a hwcap bit - if we wanted to do that, it's something that we should
> have done at the very start, but we didn't. Now, it's pointless to
> do so.
I agree with you on a HWCAP_DMB bit, it's too late now and code should
rely on the architecture version instead.
But my point is about new features that will appear (or already did) in
the current or next architecture versions (e.g. ARMv8). So far we seem
to have avoided adding HWCAP bits for new features that were mandated by
certain architecture versions, probably under the assumption that
software would check the architecture version number.
For example, on ARMv8, do you want to add a HWCAP_LDACQ (for
acquire/release semantics) or do we tell user space to check for "v8l"
instead? There are additional hints available for AArch32 DMB and DSB
(ISHLD, OSHLD, NSHLD, LD), there are LDREX/STREX with acquire/release
semantics, a new SEVL instruction. User space needs to know about these
not only from a backwards compatibility perspective (I don't expect DMB
to ever go away) but from a future optimisation one.
If you are worried about the risk of running out of HWCAP bits (we still
have I think 32 left, we could also introduce elf_hwcap3), what about,
for new features, adding a HWCAP_CPUID (only when the extended CPUID is
present) and, when enabled, allow user space to probe CPUID registers
via an ARM-specific syscall or undef hooks? These would be filtered by the
kernel; it doesn't need to always present the real register content,
especially on heterogeneous systems.
--
Catalin
* Re: ARM atomics overhaul for musl
2014-11-17 14:39 ` Russell King - ARM Linux
2014-11-17 15:26 ` Catalin Marinas
@ 2014-11-17 17:38 ` Andy Lutomirski
2014-11-18 10:56 ` Catalin Marinas
1 sibling, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2014-11-17 17:38 UTC (permalink / raw)
To: Russell King
Cc: musl, Catalin Marinas, Szabolcs Nagy, Kees Cook, Rich Felker,
linux-arm-kernel
On Nov 17, 2014 6:39 AM, "Russell King - ARM Linux"
<linux@arm.linux.org.uk> wrote:
>
> On Mon, Nov 17, 2014 at 01:54:13PM +0000, Catalin Marinas wrote:
> > On Sun, Nov 16, 2014 at 05:10:55PM +0000, Russell King - ARM Linux wrote:
> > > which means that rather than encoding the
> > > CPU architecture (like with every other Linux architecture), we have a
> > > string which encodes the kernel architecture instead, which is absurd.
> >
> > Just like x86_64 vs i686?
>
> That is still valid, but let's wait and see what happens when a new
> "version" of x86_64 comes along.
>
> However, the issue on x86 is far less of a problem: userspace (even
> kernel space) does not have to play these games because the CPUs aren't
> designed by people intent on removing old instructions from the
> instruction set, thereby stopping existing binaries working without
> kernel emulation of the missing instructions.
>
> > > Obviously, the plan for ARM64 is that there will never be an ARMv9
> > > architecture, and ARMv8 is the perfect architecture for the future. :p
> >
> > If you haven't noticed, the distinction between ARMv6 and ARMv7 has been
> > blurred enough (guess why cpu_architecture() reports ARMv7 for
> > ARM11MPCore). ARM is trying to move away from architecture version
> > numbers, which are rather useful for marketing, to proper feature
> > detection based on CPUID. Whether there is an ARMv9 or not, it's
> > irrelevant to what Linux should do (i.e. use CPUID rather than guess
> > features based on architecture version numbers).
>
> That may be what is desired, but unfortunately we have no way to export
> all the intricate feature registers to userspace. No, elf hwcaps don't
> support it, there's only 64 bits split between two words there, and
> there are many more than just 64 bits of feature registers.
That's a ridiculous argument. Linux can freely add bits.
You could add AT_ARM_FEATURES that points to a length followed by the
indicated number of words, or you could just keep adding new HWCAP
fields as needed. This is expandable forever.
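A sketch of how user space might consume such a vector, purely as an
assumption about a layout that does not exist today: features[0] holds
the number of words that follow, and bit N lives in word N / bits-per-word.
feature_bit_set() is a made-up name. Bits beyond the reported length
read as absent, which is what lets the vector grow without breaking
older binaries or older kernels.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout: features[0] = word count, features[1..] = bits. */
static bool feature_bit_set(const unsigned long *features, size_t bit)
{
    const size_t bits_per_word = sizeof(unsigned long) * CHAR_BIT;
    size_t word = bit / bits_per_word;

    /* No vector, or a bit the kernel does not report: treat as absent. */
    if (features == NULL || word >= features[0])
        return false;
    return (features[1 + word] >> (bit % bits_per_word)) & 1UL;
}
```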
As an x86 person and a complete ARM outsider, this situation is
totally nuts. There is no good reason *not* to have feature bits, and
even in x86 land, relying on the architecture version is dangerous.
(Intel seems to be reinstating version 5 right now with Quark, and
even that is having minor issues since it's not really quite a version
5 chip.)
>
> Given that even cocked these up (just as what happened with the cache
> type register) decoding of the feature type registers depends on the
> underlying CPU architecture.
>
> So, even _if_ we exported the feature registers to userspace, you still
> need to know the CPU architecture to decode them properly, so you still
> need to parse the AT_PLATFORM string to get that information.
>
There's no need to expose the hardware feature registers as is.
Define your own sensible feature bits just for Linux.
Yes, libc implementations will need a fallback for old kernels, but at
least the set of legacy configurations that need to be supported that
way will stop increasing at some point.
--Andy
> > > So, a reasonable parsing of this would be:
> > >
> > >         const char *ptr;
> > >         int architecture;
> > >
> > >         ptr = (const char *)(uintptr_t)getauxval(AT_PLATFORM);
> > >         assert(ptr);
> > >
> > >         if (!strncmp(ptr, "aarch64", 7))
> > >                 architecture = 8;
> > >         else
> > >                 assert(sscanf(ptr, "v%d", &architecture) == 1);
> >
> > Oh, have you even bothered trying 32-bit (compat) getauxval(AT_PLATFORM)
> > on an aarch64 kernel? It reports "v8l", so please don't confuse others.
>
> Right, I see that now - I'm not knowledgeable of the compat code, because
> ARM32 has nothing to do with it, and I missed COMPAT_ELF_PLATFORM.
>
> As for "bothered trying" - tell me how I could possibly try that. You
> know full well that I have /no/ 64-bit hardware, and you also know that
> I have *nothing* capable of running, let alone building a 64-bit ARM
> kernel.
>
> Please, next time you decide to make accusations, bear that in mind - my
> "guesses" as to what ARM64 does are based upon reading your code,
> sometimes for the first time, and not through any kind of experience of
> actually running the damned stuff.
>
> Now, think about what /you/ have said. Think about your assertion about
> that "v8l" string. How does the code react to that? Oh my, it sets
> "architecture" to 8 ! Oh lookie, it's the right value. Oh look, the
> code works correctly.
>
> So, counter to your crap about me confusing others, maybe you should
> make that same accusation of yourself!
>
> Maybe ARM and yourself should have tried to be more inclusive with ARMv8
> in general, rather than trying to push me away with accusations and the
> like (like you're doing right now) every time I say something about it.
>
> --
> FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
> according to speedtest.net.
* Re: ARM atomics overhaul for musl
2014-11-17 17:38 ` Andy Lutomirski
@ 2014-11-18 10:56 ` Catalin Marinas
2014-11-18 18:14 ` Will Deacon
0 siblings, 1 reply; 28+ messages in thread
From: Catalin Marinas @ 2014-11-18 10:56 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Russell King, musl, Szabolcs Nagy, Kees Cook, Rich Felker,
linux-arm-kernel
On Mon, Nov 17, 2014 at 05:38:46PM +0000, Andy Lutomirski wrote:
> On Nov 17, 2014 6:39 AM, "Russell King - ARM Linux"
> <linux@arm.linux.org.uk> wrote:
> >
> > On Mon, Nov 17, 2014 at 01:54:13PM +0000, Catalin Marinas wrote:
> > > If you haven't noticed, the distinction between ARMv6 and ARMv7 has been
> > > blurred enough (guess why cpu_architecture() reports ARMv7 for
> > > ARM11MPCore). ARM is trying to move away from architecture version
> > > numbers, which are rather useful for marketing, to proper feature
> > > detection based on CPUID. Whether there is an ARMv9 or not, it's
> > > irrelevant to what Linux should do (i.e. use CPUID rather than guess
> > > features based on architecture version numbers).
> >
> > That may be what is desired, but unfortunately we have no way to export
> > all the intricate feature registers to userspace. No, elf hwcaps don't
> > support it, there's only 64 bits split between two words there, and
> > there are many more than just 64 bits of feature registers.
>
> That's a ridiculous argument. Linux can freely add bits.
>
> You could add AT_ARM_FEATURES that points to a length followed by the
> indicated number of words, or you could just keep adding new HWCAP
> fields as needed. This is expandable forever.
That's fine by me, I don't have a problem with more hwcap bits.
> > Given that even cocked these up (just as what happened with the cache
> > type register) decoding of the feature type registers depends on the
> > underlying CPU architecture.
> >
> > So, even _if_ we exported the feature registers to userspace, you still
> > need to know the CPU architecture to decode them properly, so you still
> > need to parse the AT_PLATFORM string to get that information.
>
> There's no need to expose the hardware feature registers as is.
> Define your own sensible feature bits just for Linux.
We get regular questions about direct access to the hardware feature
bits, many using the x86 cpuid instruction as an argument. So far we
couldn't see good enough reasons, otherwise we would have pushed such an
instruction in the ARMv8 architecture. It's also not a simple direct
hardware access since the kernel may want to mask some features it does
not support, which pretty much requires HWCAP or some banked CPUID
registers in hardware.
There seems to be a category of software that can't access HWCAP or
/proc/self/auxv. This is Android software; I'm not sure how the
developers came to this conclusion, but they think allowing
/proc/cpuinfo access is OK but not /proc/self/auxv. I'm not sure direct
cpuid access is a good enough argument for such a scenario. To me it looks
like something they should solve in their security implementation.
Another class are dynamic loaders that don't yet have a C library
loaded. However, as such loaders are the first entry point, I don't see
why they couldn't access auxv directly. One particular scenario here is
finding out which CPU micro-architecture (implementation) it is so that
the dynamic loader could choose a more optimised library. CPUID would
help partially here (get the actual MIDR identifying the CPU
implementation rather than just features) but not on heterogeneous
systems like big.LITTLE. Which means that we would still be better off
with some extra features in auxv, maybe even listing the individual MIDR
for all the CPUs in the system.
--
Catalin
* Re: ARM atomics overhaul for musl
2014-11-18 10:56 ` Catalin Marinas
@ 2014-11-18 18:14 ` Will Deacon
2014-11-18 18:24 ` Andy Lutomirski
` (2 more replies)
0 siblings, 3 replies; 28+ messages in thread
From: Will Deacon @ 2014-11-18 18:14 UTC (permalink / raw)
To: Catalin Marinas
Cc: Szabolcs Nagy, Rich Felker, Russell King, Kees Cook, musl,
Andy Lutomirski, linux-arm-kernel
I was really hoping to avoid this thread, but I wanted to comment on the
suitability of hwcap as a discovery mechanism.
On Tue, Nov 18, 2014 at 10:56:12AM +0000, Catalin Marinas wrote:
> On Mon, Nov 17, 2014 at 05:38:46PM +0000, Andy Lutomirski wrote:
> > On Nov 17, 2014 6:39 AM, "Russell King - ARM Linux"
> > > Given that even cocked these up (just as what happened with the cache
> > > type register) decoding of the feature type registers depends on the
> > > underlying CPU architecture.
> > >
> > > So, even _if_ we exported the feature registers to userspace, you still
> > > need to know the CPU architecture to decode them properly, so you still
> > > need to parse the AT_PLATFORM string to get that information.
> >
> > There's no need to expose the hardware feature registers as is.
> > Define your own sensible feature bits just for Linux.
>
> We get regular questions about direct access to the hardware feature
> bits, many using the x86 cpuid instruction as argument. So far we
> couldn't see good enough reasons, otherwise we would have pushed such
> instruction in the ARMv8 architecture. It's also not a simple direct
> hardware access since the kernel may want to mask some features it does
> not support, which pretty much requires HWCAP or some banked CPUID
> registers in hardware.
Or trapping the undef exception from EL0 and emulating it in the kernel,
which doesn't require any extra hardware, allows the kernel to mask out
things it can't support and gives userspace the information it needs
under any scenario.
> Another class are dynamic loaders that don't yet have a C library
> loaded. However, as such loaders are the first entry point, I don't see
> why they couldn't access auxv directly. One particular scenario here is
> finding out which CPU micro-architecture (implementation) it is so that
> the dynamic loader could choose a more optimised library. CPUID would
> help partially here (get the actual MIDR identifying the CPU
> implementation rather than just features) but not on heterogeneous
> systems like big.LITTLE. Which means that we would still be better off
> with some extra features in auxv, maybe even listing the individual MIDR
> for all the CPUs in the system.
The only way I can see hwcap working is if we follow what the architecture
allows for in ARMv8, which is 4 bits per feature over (currently) around
10 32-bit registers. That would mean potentially exposing 1280 hwcaps,
which is clearly insane.
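The 1280 figure is presumably the worst case in which every possible
value of every 4-bit field gets its own hwcap. Spelled out as constants
(the names are illustrative only, and the register count is the "around
10" quoted above):

```c
enum {
    ID_REGISTERS      = 10,                     /* "around 10" registers */
    REGISTER_BITS     = 32,
    FIELD_BITS        = 4,
    FIELDS_PER_REG    = REGISTER_BITS / FIELD_BITS,   /* 8 fields */
    VALUES_PER_FIELD  = 1 << FIELD_BITS,              /* 16 values */
    WORST_CASE_HWCAPS = ID_REGISTERS * FIELDS_PER_REG * VALUES_PER_FIELD
};
```

That is 10 registers of 8 fields each, with 16 encodable values per
field.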
Instead, we currently advertise a tiny subset of the information exposed
in the ID registers and end up grouping it together in an ad-hoc way without
any buy-in from the instruction set architects. For example, how the
`asimd' hwcap on the arm64 kernel corresponds to feature bits in the MVFR
registers is not at all clear, especially as those hardware registers are
extended over time.
We've done a bit better with the crypto extensions, where we provide
fine-grained sha1, sha2 etc hwcaps, but this is based on the relevant 4-bit
fields in ISAR5 being positive values. I can't find any architectural
guarantees that this will work on future cores (e.g. bumping the 4-bit
field to indicate a subset of previous functionality).
My position is that hwcap is trying to group fine-grained architectural
features into higher level Linux features, but that's likely to lead to
an unmaintainable mess as the feature diversity of real systems continues
to grow. We can fix this easily by exposing the features to userspace in
the form that is described by the architecture (probably with a single
HWCAP to say that such an access won't result in SIGILL).
Will
* Re: ARM atomics overhaul for musl
2014-11-18 18:14 ` Will Deacon
@ 2014-11-18 18:24 ` Andy Lutomirski
2014-11-18 19:19 ` Russell King - ARM Linux
2014-11-19 18:32 ` Catalin Marinas
2 siblings, 0 replies; 28+ messages in thread
From: Andy Lutomirski @ 2014-11-18 18:24 UTC (permalink / raw)
To: Will Deacon
Cc: Catalin Marinas, Szabolcs Nagy, Rich Felker, Russell King,
Kees Cook, musl, linux-arm-kernel
On Tue, Nov 18, 2014 at 10:14 AM, Will Deacon <will.deacon@arm.com> wrote:
> I was really hoping to avoid this thread, but I wanted to comment on the
> suitability of hwcap as a discovery mechanism.
>
> On Tue, Nov 18, 2014 at 10:56:12AM +0000, Catalin Marinas wrote:
>> On Mon, Nov 17, 2014 at 05:38:46PM +0000, Andy Lutomirski wrote:
>> > On Nov 17, 2014 6:39 AM, "Russell King - ARM Linux"
>> > > Given that even cocked these up (just as what happened with the cache
>> > > type register) decoding of the feature type registers depends on the
>> > > underlying CPU architecture.
>> > >
>> > > So, even _if_ we exported the feature registers to userspace, you still
>> > > need to know the CPU architecture to decode them properly, so you still
>> > > need to parse the AT_PLATFORM string to get that information.
>> >
>> > There's no need to expose the hardware feature registers as is.
>> > Define your own sensible feature bits just for Linux.
>>
>> We get regular questions about direct access to the hardware feature
>> bits, many using the x86 cpuid instruction as argument. So far we
>> couldn't see good enough reasons, otherwise we would have pushed such
>> instruction in the ARMv8 architecture. It's also not a simple direct
>> hardware access since the kernel may want to mask some features it does
>> not support, which pretty much requires HWCAP or some banked CPUID
>> registers in hardware.
>
> Or trapping the undef exception from EL0 and emulating it in the kernel,
> which doesn't require any extra hardware, allows the kernel to mask out
> things it can't support and gives userspace the information it needs
> under any scenario.
>
This only sounds reasonable to me if non-Linux architectures do it,
too. Otherwise using a syscall sounds more sensible.
FWIW, I personally don't like the fact that x86 allows unprivileged
code to use CPUID. This can sort of be disabled on some recent Intel
hardware, but last I checked that ability was explicitly not
guaranteed to exist going forward.
>> Another class are dynamic loaders that don't yet have a C library
>> loaded. However, as such loaders are the first entry point, I don't see
>> why they couldn't access auxv directly. One particular scenario here is
>> finding out which CPU micro-architecture (implementation) it is so that
>> the dynamic loader could choose a more optimised library. CPUID would
>> help partially here (get the actual MIDR identifying the CPU
>> implementation rather than just features) but not on heterogeneous
>> systems like big.LITTLE. Which means that we would still be better off
>> with some extra features in auxv, maybe even listing the individual MIDR
>> for all the CPUs in the system.
>
> The only way I can see hwcap working is if we follow what the architecture
> allows for in ARMv8, which is 4 bits per feature over (currently) around
> 10 32-bit registers. That would mean potentially exposing 1280 hwcaps,
> which is clearly insane.
Stick it in the vdso? /me snickers
Out of curiosity, why are there 4 bits per feature?
--Andy
* Re: ARM atomics overhaul for musl
2014-11-18 18:14 ` Will Deacon
2014-11-18 18:24 ` Andy Lutomirski
@ 2014-11-18 19:19 ` Russell King - ARM Linux
2014-11-19 18:32 ` Catalin Marinas
2 siblings, 0 replies; 28+ messages in thread
From: Russell King - ARM Linux @ 2014-11-18 19:19 UTC (permalink / raw)
To: Will Deacon
Cc: Catalin Marinas, Andy Lutomirski, Szabolcs Nagy, Rich Felker,
Kees Cook, musl, linux-arm-kernel
On Tue, Nov 18, 2014 at 06:14:25PM +0000, Will Deacon wrote:
> The only way I can see hwcap working is if we follow what the architecture
> allows for in ARMv8, which is 4 bits per feature over (currently) around
> 10 32-bit registers. That would mean potentially exposing 1280 hwcaps,
> which is clearly insane.
Exactly my argument, which got called "ridiculous"! I'm glad that
someone with a similar visibility of the problem has come to the
same conclusion that I did.
> We've done a bit better with the crypto extensions, where we provide
> fine-grained sha1, sha2 etc hwcaps, but this is based on the relevant 4-bit
> fields in ISAR5 being positive values. I can't find any architectural
> guarantees that this will work on future cores (e.g. bumping the 4-bit
> field to indicate a subset of previous functionality).
This is the big problem. An example of this is the barrier bits, which
indicate whether dmb & dsb are present or not. It's not a single bit,
but a group of four. If we provide a single bit for dmb, and another
for dsb (to cater for a future possibility that dmb or dsb may be
separately indicated by a future 4-bit binary pattern), that's fine,
but should we then list every instruction which is conditional on any
ISAR bit pattern? That becomes a /very/ big space indeed.
If we don't do this, and (eg) we use a single bit for both dmb and dsb,
what if a future bit pattern indicates that (eg) dmb is obsolete, but
dsb isn't?
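A sketch of the "one flag per instruction" option, assuming the 4-bit
barrier field has already been extracted by the kernel. The reading that
value 1 means the barrier instructions are present is this sketch's
understanding of the ARMv7 definition; struct barrier_caps and
decode_barrier_field() are hypothetical names. Unknown values are
deliberately reported as "nothing known" rather than guessed at.

```c
#include <stdbool.h>

/* Hypothetical per-instruction flags derived from one 4-bit field. */
struct barrier_caps {
    bool dmb;
    bool dsb;
};

static struct barrier_caps decode_barrier_field(unsigned field)
{
    struct barrier_caps caps = { false, false };

    switch (field & 0xfu) {
    case 1:
        /* Assumed meaning: the barrier instructions are implemented. */
        caps.dmb = true;
        caps.dsb = true;
        break;
    default:
        /* 0 means none; other patterns are unknown to this sketch. */
        break;
    }
    return caps;
}
```

Separate flags would survive a future pattern that keeps only one of the
two instructions, at the cost of one flag per instruction.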
Contrary to what others assert, this is not a trivial problem, and it's
not trivial to just add additional hwcap bits to solve it.
There's also the problem in /knowing/ what information to export to
userspace, before userspace knows that they need it... which is exactly
what's happened with DMB (and this is not the first time it's happened.)
I suspect this won't be the last time either.
* Re: ARM atomics overhaul for musl
2014-11-18 18:14 ` Will Deacon
2014-11-18 18:24 ` Andy Lutomirski
2014-11-18 19:19 ` Russell King - ARM Linux
@ 2014-11-19 18:32 ` Catalin Marinas
2 siblings, 0 replies; 28+ messages in thread
From: Catalin Marinas @ 2014-11-19 18:32 UTC (permalink / raw)
To: Will Deacon
Cc: Andy Lutomirski, Szabolcs Nagy, Rich Felker, Russell King,
Kees Cook, musl, linux-arm-kernel
Hi Will,
On Tue, Nov 18, 2014 at 06:14:25PM +0000, Will Deacon wrote:
> I was really hoping to avoid this thread, but I wanted to comment on the
> suitability of hwcap as a discovery mechanism.
Such discussions come up regularly, so I think we should stick to this
thread and try to sort it out (it would be good to get the glibc folk to
join).
> On Tue, Nov 18, 2014 at 10:56:12AM +0000, Catalin Marinas wrote:
> > On Mon, Nov 17, 2014 at 05:38:46PM +0000, Andy Lutomirski wrote:
> > > On Nov 17, 2014 6:39 AM, "Russell King - ARM Linux"
> > > > Given that even cocked these up (just as what happened with the cache
> > > > type register) decoding of the feature type registers depends on the
> > > > underlying CPU architecture.
> > > >
> > > > So, even _if_ we exported the feature registers to userspace, you still
> > > > need to know the CPU architecture to decode them properly, so you still
> > > > need to parse the AT_PLATFORM string to get that information.
> > >
> > > There's no need to expose the hardware feature registers as is.
> > > Define your own sensible feature bits just for Linux.
> >
> > We get regular questions about direct access to the hardware feature
> > bits, many using the x86 cpuid instruction as argument. So far we
> > couldn't see good enough reasons, otherwise we would have pushed such
> > instruction in the ARMv8 architecture. It's also not a simple direct
> > hardware access since the kernel may want to mask some features it does
> > not support, which pretty much requires HWCAP or some banked CPUID
> > registers in hardware.
>
> Or trapping the undef exception from EL0 and emulating it in the kernel,
> which doesn't require any extra hardware, allows the kernel to mask out
> things it can't support and gives userspace the information it needs
> under any scenario.
This would be the simplest. What the hardware could do, though, is
populate ESR with the right information to avoid decoding the
undefined instruction.
If we go this route, I think we should also expose MIDR for some
micro-architecture optimisations (with the risk that people use it
incorrectly).
> > Another class are dynamic loaders that don't yet have a C library
> > loaded. However, as such loaders are the first entry point, I don't see
> > why they couldn't access auxv directly. One particular scenario here is
> > finding out which CPU micro-architecture (implementation) it is so that
> > the dynamic loader could choose a more optimised library. CPUID would
> > help partially here (get the actual MIDR identifying the CPU
> > implementation rather than just features) but not on heterogeneous
> > systems like big.LITTLE. Which means that we would still be better off
> > with some extra features in auxv, maybe even listing the individual MIDR
> > for all the CPUs in the system.
>
> The only way I can see hwcap working is if we follow what the architecture
> allows for in ARMv8, which is 4 bits per feature over (currently) around
> 10 32-bit registers. That would mean potentially exposing 1280 hwcaps,
> which is clearly insane.
We have a similar set of registers on ARMv7. But I disagree with the
simplistic calculation that we need 1280 hwcaps. As I replied to
Stephen, many of these are not relevant to user space, other fields are
still reserved and they may never be populated.
Some values we don't even need to bother with, for example on ARMv7
ID_ISAR2[15:12] specifies an MLA instruction that has been around since
ARMv4. The way these are structured, ARM assumes an incremental change
to such fields. In the ID_ISAR2[15:12] example, when the field is 1 it
means that MLA is present, when it is 2, it means whatever 1 supported
plus MLS (that's ARMv7 and ARMv6T2). So in this case we only need
HWCAP_MLS as MLA has been there already. Basically we don't need to
encode all the possible states in HWCAP.
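
A rough sketch of that reduction, assuming the incremental field
semantics described above. `EXAMPLE_HWCAP_MLS`, `id_field` and
`hwcaps_from_isar2` are hypothetical names for illustration, not
existing kernel symbols:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hwcap bit, chosen only for this example. */
#define EXAMPLE_HWCAP_MLS (1ul << 0)

/* Extract the 4-bit ID field whose lowest bit is at `shift`. */
unsigned id_field(uint32_t reg, unsigned shift)
{
    assert(shift <= 28 && shift % 4 == 0);
    return (reg >> shift) & 0xfu;
}

/*
 * Derive hwcaps from ID_ISAR2. Field [15:12]: 1 means MLA, 2 means MLA
 * plus MLS. MLA needs no bit of its own since it is always there, so
 * only one hwcap comes out of this field. The >= comparison relies on
 * the assumption that larger values only ever add functionality.
 */
unsigned long hwcaps_from_isar2(uint32_t id_isar2)
{
    unsigned long caps = 0;

    if (id_field(id_isar2, 12) >= 2)
        caps |= EXAMPLE_HWCAP_MLS;
    return caps;
}
```

The whole scheme rests on the >= test: if a future core were allowed to
report a larger value for a subset of the functionality, the comparison
would give the wrong answer.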
> Instead, we currently advertise a tiny subset of the information exposed
> in the ID registers and end up grouping it together in an ad-hoc way without
> any buy-in from the instruction set architects. For example, how the
> `asimd' hwcap on the arm64 kernel corresponds to feature bits in the MVFR
> registers is not at all clear, especially as those hardware registers are
> extended over time.
Minor correction here, there is no MVFR on AArch64. Strangely, the
architects have a field for asimd which means not present when 0 and
present when ffff. It looks like they don't expect to add any values in
here. Crypto instructions, which use the same register bank as ASIMD,
are listed in the ID_AA64ISAR registers with the possibility of
extending them (actually the AES field got PMULL as well and we added a
HWCAP for it).
> We've done a bit better with the crypto extensions, where we provide
> fine-grained sha1, sha2 etc hwcaps, but this is based on the relevant 4-bit
> fields in ISAR5 being positive values. I can't find any architectural
> guarantees that this will work on future cores (e.g. bumping the 4-bit
> field to indicate a subset of previous functionality).
There are no guarantees that they are present (either not built in,
export regulations etc.); reporting that is the aim of CPUID. The
problem is when something not covered by CPUID, or covered by it but
not by HWCAP, gets removed.
Another example is SWP. It has been included in ARMv7 CPUID as field
ID_ISAR0[3:0] == 1, but implementations are allowed to drop this field
to 0 (well, we even had HWCAP_SWP but people took its presence for
granted, which is fair since there was no other way to do atomic
operations).
> My position is that hwcap is trying to group fine-grained architectural
> features into higher level Linux features, but that's likely to lead to
> an unmaintainable mess as the feature diversity of real systems continues
> to grow. We can fix this easily by exposing the features to userspace in
> the form that is described by the architecture (probably with a single
> HWCAP to say that such an access won't result in SIGILL).
I think there is still value in HWCAP, as we do for crypto. We could
add access to CPUID, but definitely not as a replacement for HWCAP.
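
As an illustration of the HWCAP side, here is a minimal sketch of a
user-space check for the crypto case, using getauxval() from glibc or
musl. The bit values are the arm64 ones from the kernel's uapi
<asm/hwcap.h>, spelled out under EX_ names so the snippet builds on any
Linux target; `hwcap_has` and `can_use_aes_pmull` are names invented
for this example:

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/auxv.h> /* getauxval(), AT_HWCAP */

/* arm64 hwcap bits; on other architectures these bits mean other things. */
#define EX_ARM64_HWCAP_ASIMD (1ul << 1)
#define EX_ARM64_HWCAP_AES   (1ul << 3)
#define EX_ARM64_HWCAP_PMULL (1ul << 4)

/* True if every bit in `mask` is set in `hwcaps`. */
bool hwcap_has(unsigned long hwcaps, unsigned long mask)
{
    assert(mask != 0);
    return (hwcaps & mask) == mask;
}

/* Only meaningful when running on arm64: AES and PMULL both advertised. */
bool can_use_aes_pmull(void)
{
    unsigned long hwcaps = getauxval(AT_HWCAP);

    return hwcap_has(hwcaps, EX_ARM64_HWCAP_AES | EX_ARM64_HWCAP_PMULL);
}
```

A caller would evaluate can_use_aes_pmull() once at startup and pick
the accelerated or the generic routine accordingly. No ID register
access and no field decoding is involved, which is the value HWCAP
still has.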
What we need from the architects:
1. A clear statement, for each architecture version, of the minimum
CPUID values required
2. Guarantees that a new architecture version would not change such
minimums to smaller values
--
Catalin