public inbox for developer@lists.illumos.org (since 2011-08)
* libc_hwcap
@ 2024-02-19 18:51 Peter Tribble
  2024-02-19 20:00 ` [developer] libc_hwcap Chris Ridd
  2024-02-21 20:01 ` Robert Mustacchi
  0 siblings, 2 replies; 7+ messages in thread
From: Peter Tribble @ 2024-02-19 18:51 UTC (permalink / raw)
  To: illumos-dev


I was poking around at /usr/lib/libc, and started to wonder how current
it is.

We have 3 "optimized" copies of libc there, with capabilities, and the
"best" one is selected by moe to be loopback-mounted over libc:

libc_hwcap1.so.1 [SSE MMX CMOV SEP FPU]
libc_hwcap2.so.1 [SSE2 SSE MMX CMOV AMD_SYSC FPU]
libc_hwcap3.so.1 [SSE MMX CMOV FPU]

OK, so hwcap1 means sysenter/sysexit (i.e. Intel), and hwcap2 is syscall
(i.e. AMD). And libc_hwcap3 is the (legacy, I guess) int 0x91.
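
As an illustration of the selection described above, here is a minimal
Python sketch. It assumes a simple rule: a candidate is usable only if the
CPU reports every capability the candidate lists, and among the usable
candidates the one listing the most capabilities wins. That rule and the
function name select_best are assumptions made for illustration; this is
not a description of how moe actually ranks objects.

```python
# Capability sets copied from the listing in this message.
CANDIDATES = {
    "libc_hwcap1.so.1": {"SSE", "MMX", "CMOV", "SEP", "FPU"},
    "libc_hwcap2.so.1": {"SSE2", "SSE", "MMX", "CMOV", "AMD_SYSC", "FPU"},
    "libc_hwcap3.so.1": {"SSE", "MMX", "CMOV", "FPU"},
}


def select_best(cpu_caps, candidates=CANDIDATES):
    """Return the name of the usable candidate with the most capabilities.

    Hypothetical rule (an assumption, not moe's real algorithm):
    a candidate is usable when its required set is a subset of cpu_caps.
    Returns None when nothing is usable, i.e. fall back to the base libc.
    """
    usable = [name for name in sorted(candidates)
              if candidates[name] <= cpu_caps]
    if not usable:
        return None
    # max() keeps the first maximal element, so ties resolve by name order.
    return max(usable, key=lambda name: len(candidates[name]))


if __name__ == "__main__":
    intel_like = {"FPU", "CMOV", "MMX", "SSE", "SSE2", "SEP"}
    print(select_best(intel_like))
```

Under this model a CPU lacking both SEP and AMD_SYSC is the only kind that
ends up on hwcap3, which is the case question 1 below asks about.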

Given that any cpu we're running on must be reasonably modern, as it
has to be a 64-bit processor, this leads to a couple of questions:

1. Are there any cpus we can boot on that will select hwcap3? If not,
can we drop it? (We'll always fall back on the base libc in any case.)

2. Do all cpus we support have SSE2, in which case maybe we can
enable SSE2 for hwcap1?

One might wonder what other baseline optimizations could be made.

And then this isn't an issue for 64-bit. As such, the code to mount an
optimized 64-bit libc in /lib/svc/method/fs-root is redundant, and is ripe
for removal. Does that sound right?

(The only other hwcap file I can find is /lib/libmvec/libmvec_hwcap1.so.1,
and I haven't yet found anything that uses it, on x86 at any rate.)

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/


* Re: [developer] libc_hwcap
  2024-02-19 18:51 libc_hwcap Peter Tribble
@ 2024-02-19 20:00 ` Chris Ridd
  2024-02-21 20:07   ` Robert Mustacchi
  2024-04-06 14:30   ` Peter Tribble
  2024-02-21 20:01 ` Robert Mustacchi
  1 sibling, 2 replies; 7+ messages in thread
From: Chris Ridd @ 2024-02-19 20:00 UTC (permalink / raw)
  To: illumos-developer

> On 19 Feb 2024, at 18:51, Peter Tribble <peter.tribble@gmail.com> wrote:
> 
> I was poking around at /usr/lib/libc, and started to wonder how current it is.
> 
> We have 3 "optimized" copies of libc there, with capabilities, and the  "best"
> one is selected by moe to be loopback-mounted over libc:
> 
> libc_hwcap1.so.1 [SSE MMX CMOV SEP FPU]
> libc_hwcap2.so.1 [SSE2 SSE MMX CMOV AMD_SYSC FPU]
> libc_hwcap3.so.1 [SSE MMX CMOV FPU]
> 
> Ok, so hwcap1 means sysenter/sysexit (ie Intel), and hwcap2 is syscall (ie AMD).
> And libc_hwcap3 is the (legacy, I guess) int 0x91.

A number of different x86-64 microarchitecture levels have been defined by AMD/Intel/Red Hat/SUSE, see https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels

Would it be helpful to use these levels (x86-64-v1, v2, v3, v4) in Illumos’s libc_hwcap?
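
To make the idea of levels concrete, here is a small Python sketch that
maps a set of CPU feature names to the highest level it satisfies. The
feature lists are written from memory of the x86-64 psABI table behind the
page linked above and should be checked against it before being relied on;
the function name highest_level is hypothetical.

```python
# Each level adds features on top of the previous one. The lists are an
# assumption recalled from the psABI; verify them against the specification.
LEVELS = [
    ("x86-64-v1", {"CMOV", "CX8", "FPU", "FXSR", "MMX", "OSFXSR",
                   "SCE", "SSE", "SSE2"}),
    ("x86-64-v2", {"CMPXCHG16B", "LAHF_SAHF", "POPCNT", "SSE3",
                   "SSE4_1", "SSE4_2", "SSSE3"}),
    ("x86-64-v3", {"AVX", "AVX2", "BMI1", "BMI2", "F16C", "FMA",
                   "LZCNT", "MOVBE", "OSXSAVE"}),
    ("x86-64-v4", {"AVX512F", "AVX512BW", "AVX512CD", "AVX512DQ",
                   "AVX512VL"}),
]


def highest_level(features):
    """Return the highest level whose cumulative requirements are met.

    Levels are cumulative, so the walk stops at the first level with a
    missing feature. Returns None if even the v1 baseline is not met.
    """
    best = None
    for name, required in LEVELS:
        if not required <= features:
            break
        best = name
    return best


if __name__ == "__main__":
    v1_only = set(LEVELS[0][1])
    print(highest_level(v1_only | {"AVX"}))  # AVX alone does not reach v2
```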

Chris


* Re: [developer] libc_hwcap
  2024-02-19 18:51 libc_hwcap Peter Tribble
  2024-02-19 20:00 ` [developer] libc_hwcap Chris Ridd
@ 2024-02-21 20:01 ` Robert Mustacchi
  1 sibling, 0 replies; 7+ messages in thread
From: Robert Mustacchi @ 2024-02-21 20:01 UTC (permalink / raw)
  To: illumos-developer, Peter Tribble

On 2/19/24 10:51, Peter Tribble wrote:
> I was poking around at /usr/lib/libc, and started to wonder how current it
> is.
> 
> We have 3 "optimized" copies of libc there, with capabilities, and the
> "best"
> one is selected by moe to be loopback-mounted over libc:
> 
> libc_hwcap1.so.1 [SSE MMX CMOV SEP FPU]
> libc_hwcap2.so.1 [SSE2 SSE MMX CMOV AMD_SYSC FPU]
> libc_hwcap3.so.1 [SSE MMX CMOV FPU]
> 
> Ok, so hwcap1 means sysenter/sysexit (ie Intel), and hwcap2 is syscall (ie
> AMD).
> And libc_hwcap3 is the (legacy, I guess) int 0x91.
> 
> Given that any cpu we're running on must be reasonably modern, as it
> has to be a 64-bit processor, this leads to a couple of questions:
> 
> 1. Are there any cpus we can boot on that will select hwcap3? If not,
> can we drop it? (We'll always fall back on the base libc in any case.)

I'm not confident enough to say yes to this today. I would like to assume
that any 64-bit CPU Intel made, including the various old Atom parts,
supports sysenter, but I'm not certain.

> 2. Do all cpus we support have SSE2, in which case maybe we can
> enable SSE2 for hwcap1?

Yes. We only support 64-bit CPUs, and every 64-bit CPU implements the
baseline amd64 ISA, which includes SSE2 and is essentially the baseline
instruction set described in the page Chris linked.

> One might wonder what other baseline optimizations could be made.
> 
> And then this isn't an issue for 64-bit. As such, the code to mount an
> optimized 64-bit libc in /lib/svc/method/fs-root is redundant, and is ripe
> for removal. Does that sound right?

Given that we don't have an optimized 64-bit libc, I think that's correct.
There's an open question about how we want to approach that, but I'll
follow up on it in Chris's thread.

Robert


* Re: [developer] libc_hwcap
  2024-02-19 20:00 ` [developer] libc_hwcap Chris Ridd
@ 2024-02-21 20:07   ` Robert Mustacchi
  2024-02-21 20:21     ` C
  2024-04-06 14:30   ` Peter Tribble
  1 sibling, 1 reply; 7+ messages in thread
From: Robert Mustacchi @ 2024-02-21 20:07 UTC (permalink / raw)
  To: illumos-developer

On 2/19/24 12:00, Chris Ridd via illumos-developer wrote:
> 
> 
>> On 19 Feb 2024, at 18:51, Peter Tribble <peter.tribble@gmail.com> wrote:
>>
>> I was poking around at /usr/lib/libc, and started to wonder how current it is.
>>
>> We have 3 "optimized" copies of libc there, with capabilities, and the  "best"
>> one is selected by moe to be loopback-mounted over libc:
>>
>> libc_hwcap1.so.1 [SSE MMX CMOV SEP FPU]
>> libc_hwcap2.so.1 [SSE2 SSE MMX CMOV AMD_SYSC FPU]
>> libc_hwcap3.so.1 [SSE MMX CMOV FPU]
>>
>> Ok, so hwcap1 means sysenter/sysexit (ie Intel), and hwcap2 is syscall (ie AMD).
>> And libc_hwcap3 is the (legacy, I guess) int 0x91.
> 
> A number of different x86-64 microarchitecture levels have been defined by AMD/Intel/Red Hat/SUSE, see https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels
> 
> Would it be helpful to use these levels (x86-64-v1, v2, v3, v4) in Illumos’s libc_hwcap?

Right now we don't have much here for 64-bit applications. I think the
thing we need to look at for libc and others is how much of this is
really about changing the base for the entire thing versus optimizing
for a couple of particular instructions. There's an alternative approach
that we use in other libraries around object capabilities.

As an example, there's a single 64-bit libmd.so.1 that is loaded.
However, if your process has the SHA hardware capability present, then
the run-time link-editor (rtld) will transparently relocate the SHA
instruction based version of the SHA256 transform function rather than
using the normal one. So I think, writ large, as we evaluate what we want
to enable, we should look at the tradeoffs between two approaches: using
disjoint objects selected by platform identification a la isainfo (which
is what the linked levels are similar to), versus paying the relocation
cost. libc doesn't use either method today, but it's something for us to
think about as folks evaluate different optimized versions of these
functions, which is probably where the initial bang for the buck is,
versus building several different ISA-enabled versions.
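
To contrast this with selecting a whole library, here is a toy Python model
of per-symbol selection. It is only a sketch of the idea: one object holds
several variants of a function, each tagged with the capabilities it needs,
and binding picks the first variant the process can run. The capability
name "SHA", the symbol name "SHA256Transform", and the function bind_symbol
are hypothetical labels; this does not reproduce how rtld implements it.

```python
def transform_generic(block):
    # Stand-in for the portable implementation.
    return ("generic", len(block))


def transform_sha_ext(block):
    # Stand-in for the version built on the SHA instructions.
    return ("sha-ext", len(block))


# One object, several variants per symbol, most demanding first.
# The empty set marks the fallback that every process can use.
SYMBOL_VARIANTS = {
    "SHA256Transform": [
        ({"SHA"}, transform_sha_ext),
        (set(), transform_generic),
    ],
}


def bind_symbol(name, process_caps, table=SYMBOL_VARIANTS):
    """Return the first variant whose requirements the process satisfies."""
    for required, impl in table[name]:
        if required <= process_caps:
            return impl
    raise LookupError("no usable variant for " + name)


if __name__ == "__main__":
    impl = bind_symbol("SHA256Transform", {"SSE2", "SHA"})
    print(impl(b"\0" * 64))
```

The caller always loads the same object and calls the same symbol; only the
binding step differs per process, which is the cost being weighed against
shipping several complete libraries.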

Robert


* Re: [developer] libc_hwcap
  2024-02-21 20:07   ` Robert Mustacchi
@ 2024-02-21 20:21     ` C
  2024-02-25 21:55       ` Peter Tribble
  0 siblings, 1 reply; 7+ messages in thread
From: C @ 2024-02-21 20:21 UTC (permalink / raw)
  To: illumos-developer

As constructively as I can ask -

Are there any benchmarks showing that the unoptimized libc functions are
actually a hotspot?

There's a very wide range of recent extensions that could be enabled, but
if, when, and where each one should be used is going to be highly CPU
specific.

// AVX512 is a very good example

If moe is extended to have a much better understanding of CPU profiles,
and not just extension capability, it may make sense.


In terms of places where I could see this potentially having more impact:
libm and libmv (it's been so long, forgive me if I'm mistaken)

(isn't the mv the vector version of the math functions?)

Assuming that malloc isn't hand-tuned assembly, this is certainly also
going to have an impact there, but again it is CPU specific.

Stepping back for a minute, there is the question of compiler and
toolchain support. I have no idea if newer gcc is better (from a
performance perspective), but this could be the biggest limiting factor on
the performance impact of enabling anything.

Let's say you enable all these fancy extensions, but gcc is tuned for Intel
heuristics and you're running AMD.

(I don't remember previous generations of gcc caring too much about CPU
specific stuff)
// I have been out of the compiler world for more than a few years now and
maybe things have changed..

I'm just a little nobody...

It would be great if someone championed this and started small, but had a
game plan for extending things beyond just an optimized libc.



* Re: [developer] libc_hwcap
  2024-02-21 20:21     ` C
@ 2024-02-25 21:55       ` Peter Tribble
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Tribble @ 2024-02-25 21:55 UTC (permalink / raw)
  To: illumos-developer

On Wed, Feb 21, 2024 at 8:22 PM C via illumos-developer <
developer@lists.illumos.org> wrote:

> As constructively as I can ask -
>

It's worth having the feedback!


> Is there any benchmarks showing that the unoptimized libc functions are
> actually a hotspot?
>

I don't recall seeing anything recently, but do recall that there was a
reasonable amount of work in this area in the Solaris 10/OpenSolaris days.
That's probably behind the choices we're currently living with. And
remember that the baseline target was something like the original
Pentium Pro.


> There's going to be a very very wide range of recent extensions that could
> be enabled, but it's going to be highly CPU specific about
> if/when/where/who/what it should happen.
>

I was trying to go the other way - does it make sense to remove any of the
current 32-bit libraries, can we bring the targets closer into alignment,
and is there anything to be gained by modernising the baseline?

I wouldn't want to get into the situation we had on SPARC at one point,
where we had really a rather large number of variant ISAs being shipped as
separate libraries.




-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/


* Re: [developer] libc_hwcap
  2024-02-19 20:00 ` [developer] libc_hwcap Chris Ridd
  2024-02-21 20:07   ` Robert Mustacchi
@ 2024-04-06 14:30   ` Peter Tribble
  1 sibling, 0 replies; 7+ messages in thread
From: Peter Tribble @ 2024-04-06 14:30 UTC (permalink / raw)
  To: illumos-developer

On Mon, Feb 19, 2024 at 8:01 PM Chris Ridd via illumos-developer <
developer@lists.illumos.org> wrote:

>
>
> On 19 Feb 2024, at 18:51, Peter Tribble <peter.tribble@gmail.com> wrote:
>
> I was poking around at /usr/lib/libc, and started to wonder how current it
> is.
>
> We have 3 "optimized" copies of libc there, with capabilities, and the
> "best"
> one is selected by moe to be loopback-mounted over libc:
>
> libc_hwcap1.so.1 [SSE MMX CMOV SEP FPU]
> libc_hwcap2.so.1 [SSE2 SSE MMX CMOV AMD_SYSC FPU]
> libc_hwcap3.so.1 [SSE MMX CMOV FPU]
>
> Ok, so hwcap1 means sysenter/sysexit (ie Intel), and hwcap2 is syscall (ie
> AMD).
> And libc_hwcap3 is the (legacy, I guess) int 0x91.
>
>
> A number of different x86-64 microarchitecture levels have been defined by
> AMD/Intel/Red Hat/SUSE, see
> https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels
>
> Would it be helpful to use these levels (x86-64-v1, v2, v3, v4) in
> Illumos’s libc_hwcap?
>
> Chris

Well, that's for 64-bit, where we don't really differentiate much at all.
But I think that if we were to have differentiation - and in particular if
we wanted to bump the baseline - then following established precedent
makes sense.

One related thing is the discussion in Pale Moon about the supported
baseline; Pale Moon will shortly require AVX by default:

https://forum.palemoon.org/viewtopic.php?f=5&t=30909

My approach for Tribblix is to retain SSE2 as the baseline - not because I
have a fetish for older hardware (although I do have a reasonable amount of
that), but simply because I don't like the idea that the OS will boot but
some applications will mysteriously fail to run because they have
different requirements.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/

