mailing list of musl libc
 help / color / Atom feed
* [musl] PPC64(LE) support in musl requires ALTIVEC
@ 2020-02-03 14:16 Romain Naour
  2020-02-03 14:50 ` Rich Felker
  0 siblings, 1 reply; 64+ messages in thread
From: Romain Naour @ 2020-02-03 14:16 UTC (permalink / raw)
  To: musl; +Cc: Vincent Fazio

Hi,

We noticed that musl toolchain doesn't work on PPC64 e5500 cpus due to
Altived instructions in src/setjmp/powerpc64/setjmp.s [1].

A patch has been sent by Vincent Fazio to the Buildroot mailing list to disable
musl for such cpus [2].

Maybe the supported-platforms list could be updated [3].

[1] https://git.musl-libc.org/cgit/musl/tree/src/setjmp/powerpc64/setjmp.s#n74
[2] http://patchwork.ozlabs.org/patch/1231986/
[3] https://wiki.musl-libc.org/supported-platforms.html

Best regards,
Romain

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] PPC64(LE) support in musl requires ALTIVEC
  2020-02-03 14:16 [musl] PPC64(LE) support in musl requires ALTIVEC Romain Naour
@ 2020-02-03 14:50 ` Rich Felker
  2020-02-03 15:42   ` Romain Naour
                     ` (2 more replies)
  0 siblings, 3 replies; 64+ messages in thread
From: Rich Felker @ 2020-02-03 14:50 UTC (permalink / raw)
  To: musl

On Mon, Feb 03, 2020 at 03:16:17PM +0100, Romain Naour wrote:
> Hi,
> 
> We noticed that musl toolchain doesn't work on PPC64 e5500 cpus due to
> Altived instructions in src/setjmp/powerpc64/setjmp.s [1].
> 
> A patch has been sent by Vincent Fazio to the Buildroot mailing list to disable
> musl for such cpus [2].
> 
> Maybe the supported-platforms list could be updated [3].
> 
> [1] https://git.musl-libc.org/cgit/musl/tree/src/setjmp/powerpc64/setjmp.s#n74
> [2] http://patchwork.ozlabs.org/patch/1231986/
> [3] https://wiki.musl-libc.org/supported-platforms.html

Is this like the 32-bit Freescale things with the weird alternate FPU?
We support those for ppc32 as soft-float (and AIUI the ABI for use
with the FPU matches soft-float ABI, so in theory it could be
supported but we were never clear on whether it's IEEE-conforming) but
I wasn't aware of anything like that for 64-bit so it was never added.
Assuming it's the same concept, I don't see a reason we couldn't add
it.

Rich

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] PPC64(LE) support in musl requires ALTIVEC
  2020-02-03 14:50 ` Rich Felker
@ 2020-02-03 15:42   ` Romain Naour
  2020-02-03 16:02     ` Jeffrey Walton
  2020-02-03 16:51   ` A. Wilcox
  2020-02-05  1:32   ` [musl] Considering x86-64 fenv.s to C Damian McGuckin
  2 siblings, 1 reply; 64+ messages in thread
From: Romain Naour @ 2020-02-03 15:42 UTC (permalink / raw)
  To: musl, Rich Felker

Hi,

Le 03/02/2020 à 15:50, Rich Felker a écrit :
> On Mon, Feb 03, 2020 at 03:16:17PM +0100, Romain Naour wrote:
>> Hi,
>>
>> We noticed that musl toolchain doesn't work on PPC64 e5500 cpus due to
>> Altived instructions in src/setjmp/powerpc64/setjmp.s [1].
>>
>> A patch has been sent by Vincent Fazio to the Buildroot mailing list to disable
>> musl for such cpus [2].
>>
>> Maybe the supported-platforms list could be updated [3].
>>
>> [1] https://git.musl-libc.org/cgit/musl/tree/src/setjmp/powerpc64/setjmp.s#n74
>> [2] http://patchwork.ozlabs.org/patch/1231986/
>> [3] https://wiki.musl-libc.org/supported-platforms.html
> 
> Is this like the 32-bit Freescale things with the weird alternate FPU?

Indeed, it seems that the only one ppc64 cpu without Altivec support.

> We support those for ppc32 as soft-float (and AIUI the ABI for use
> with the FPU matches soft-float ABI, so in theory it could be
> supported but we were never clear on whether it's IEEE-conforming) but
> I wasn't aware of anything like that for 64-bit so it was never added.
> Assuming it's the same concept, I don't see a reason we couldn't add
> it.

I'm not sure it worth the effort, I don't have a particular interest on this
cpu. I'm fine if this case is not supported.
The issue was discovered by toolchain-builder [1] project (a Buildroot side
project for building cross toolchains) due to issue while testing the system
with Qemu.

If upcoming version of musl support this cpu, we will re-enable musl for it.

[1] https://toolchains.bootlin.com

Best regards,
Romain

> 
> Rich
> 


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] PPC64(LE) support in musl requires ALTIVEC
  2020-02-03 15:42   ` Romain Naour
@ 2020-02-03 16:02     ` Jeffrey Walton
  2020-02-03 16:18       ` David Edelsohn
  0 siblings, 1 reply; 64+ messages in thread
From: Jeffrey Walton @ 2020-02-03 16:02 UTC (permalink / raw)
  To: musl

On Mon, Feb 3, 2020 at 10:43 AM Romain Naour <romain.naour@gmail.com> wrote:
>
> Le 03/02/2020 à 15:50, Rich Felker a écrit :
> > On Mon, Feb 03, 2020 at 03:16:17PM +0100, Romain Naour wrote:
> >> Hi,
> >>
> >> We noticed that musl toolchain doesn't work on PPC64 e5500 cpus due to
> >> Altived instructions in src/setjmp/powerpc64/setjmp.s [1].
> >>
> >> A patch has been sent by Vincent Fazio to the Buildroot mailing list to disable
> >> musl for such cpus [2].
> >>
> >> Maybe the supported-platforms list could be updated [3].
> >>
> >> [1] https://git.musl-libc.org/cgit/musl/tree/src/setjmp/powerpc64/setjmp.s#n74
> >> [2] http://patchwork.ozlabs.org/patch/1231986/
> >> [3] https://wiki.musl-libc.org/supported-platforms.html
> >
> > Is this like the 32-bit Freescale things with the weird alternate FPU?
>
> Indeed, it seems that the only one ppc64 cpu without Altivec support.

I'm not sure if it matters or how it relates here, but I believe IBM
compilers (XL C and XL C++) do not allow Altivec until Power5. I think
the decision was more policies and procedures, though. Power5 was the
first ABI specification from the OpenPower group.

Other compilers supported Altivec earlier then Power5. For example,
Apple's GCC supported Altivec at least as early as Power3. And I have
a PowerMac that is Power4 and Altivec works fine.

Jeff

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] PPC64(LE) support in musl requires ALTIVEC
  2020-02-03 16:02     ` Jeffrey Walton
@ 2020-02-03 16:18       ` David Edelsohn
  0 siblings, 0 replies; 64+ messages in thread
From: David Edelsohn @ 2020-02-03 16:18 UTC (permalink / raw)
  To: musl

On Mon, Feb 3, 2020 at 11:04 AM Jeffrey Walton <noloader@gmail.com> wrote:
>
> On Mon, Feb 3, 2020 at 10:43 AM Romain Naour <romain.naour@gmail.com> wrote:
> >
> > Le 03/02/2020 à 15:50, Rich Felker a écrit :
> > > On Mon, Feb 03, 2020 at 03:16:17PM +0100, Romain Naour wrote:
> > >> Hi,
> > >>
> > >> We noticed that musl toolchain doesn't work on PPC64 e5500 cpus due to
> > >> Altived instructions in src/setjmp/powerpc64/setjmp.s [1].
> > >>
> > >> A patch has been sent by Vincent Fazio to the Buildroot mailing list to disable
> > >> musl for such cpus [2].
> > >>
> > >> Maybe the supported-platforms list could be updated [3].
> > >>
> > >> [1] https://git.musl-libc.org/cgit/musl/tree/src/setjmp/powerpc64/setjmp.s#n74
> > >> [2] http://patchwork.ozlabs.org/patch/1231986/
> > >> [3] https://wiki.musl-libc.org/supported-platforms.html
> > >
> > > Is this like the 32-bit Freescale things with the weird alternate FPU?
> >
> > Indeed, it seems that the only one ppc64 cpu without Altivec support.
>
> I'm not sure if it matters or how it relates here, but I believe IBM
> compilers (XL C and XL C++) do not allow Altivec until Power5. I think
> the decision was more policies and procedures, though. Power5 was the
> first ABI specification from the OpenPower group.
>
> Other compilers supported Altivec earlier then Power5. For example,
> Apple's GCC supported Altivec at least as early as Power3. And I have
> a PowerMac that is Power4 and Altivec works fine.

The PPC64LE ELFv2 ABI specifies Power8 with VSX as the minimum ISA.
Musl is free to ignore that.

The rest of the discussion above about processor support for Altivec
is wrong, but correcting that is off topic.

Thanks, David

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] PPC64(LE) support in musl requires ALTIVEC
  2020-02-03 14:50 ` Rich Felker
  2020-02-03 15:42   ` Romain Naour
@ 2020-02-03 16:51   ` A. Wilcox
  2020-02-05  1:32   ` [musl] Considering x86-64 fenv.s to C Damian McGuckin
  2 siblings, 0 replies; 64+ messages in thread
From: A. Wilcox @ 2020-02-03 16:51 UTC (permalink / raw)
  To: musl

[-- Attachment #1.1: Type: text/plain, Size: 1594 bytes --]

On 03/02/2020 08:50, Rich Felker wrote:
> Is this like the 32-bit Freescale things with the weird alternate FPU?
> We support those for ppc32 as soft-float (and AIUI the ABI for use
> with the FPU matches soft-float ABI, so in theory it could be
> supported but we were never clear on whether it's IEEE-conforming) but
> I wasn't aware of anything like that for 64-bit so it was never added.
> Assuming it's the same concept, I don't see a reason we couldn't add
> it.
> 
> Rich
> 


Yes, it's the exact same thing.

The Freescale e5500 series does not support AltiVec / VMX.  The
Freescale e6500 series, however, does.

The e5500 uses the e600 FPU, which Freescale certified IEEE 754
conformant.  I don't know if the e5500 was ever independently tested.

The QorIQ P50x0s use the e5500 core, as do the BAE radhard machines.
Some of the AmigaOne desktops do as well.

I believe FreeBSD supports the e5500 as 'powerpc64sbe', in line with the
e500v2 being 'powerpcsbe'.  But I cannot reliably confirm that; it's
been a while since I've dealt hands-on with FreeBSD's ppc64 machdep code
and it has had significant churn with the P9 enablement.

For our part, Adélie's ppc64 port requires AltiVec for performance
reasons; we made the explicit choice to not support the e5500 for 1.0.
We haven't had any requests to change this, but that doesn't mean users
aren't out there.  If the change is simple to do, having wider support
is always an admirable action.

Best,
--arw


-- 
A. Wilcox (awilfox)
Project Lead, Adélie Linux
https://www.adelielinux.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-02-03 14:50 ` Rich Felker
  2020-02-03 15:42   ` Romain Naour
  2020-02-03 16:51   ` A. Wilcox
@ 2020-02-05  1:32   ` Damian McGuckin
  2 siblings, 0 replies; 64+ messages in thread
From: Damian McGuckin @ 2020-02-05  1:32 UTC (permalink / raw)
  To: musl


Sort of style question.  No rush at replying.

As I read it, and I could be wrong, the assumptions on the FENV interface 
is that excepts/exceptions fit into an int. MUSL takes this further and 
then assumes that these are in the 31 least significant bits of floating 
point exception registers.

That works for all known architectures, including Sparc and Itanium.

Mind you, an 'fexcept_t' may be an unsigned long but it is still just an 
image of the status register and no architecture has anything which is of 
interest to FENV stuck up there in the sign bit.

This MUSL assumption would appear to also be the case for the control 
register where the rounding bits appear (which more often than not is the 
same register as the status register).

However, the raw bit mask encroaches on the sign bit for a Sparc. And yes, 
MUSL does not support Sparc. But I assume if Sparc did it, some hardware 
designer may try it into the future for some new chip that MUSL does want 
to support (although I see no evidence of that). The user-space rounding 
bits fed to fesetrounding() and retrieved from fegetrounding() for a Sparc 
are small integers which are shifted into (and out of) their bit position 
within the register in a BSD implementation. I have not figured out how
OpenSolaris/Illumnos does it.

Now, I

a)	Normally avoid using signed quantities when working with bit
 	masks as such handling was not always predictable in the past. So
 	using signed quantities for bit operations is not something with
 	which I have loads of experience.

b)	Am trying to use consistent processing for the status and control
 	registers because they are often one and the same. And I also want
 	to cover all eventualities.

Do I just stick with working with (signed) int's as MUSL does currently or
do I try and make this generic code a reference implementation that goes 
beyond usefulness in just a Linux environment and the architectures that 
MUSL does support. Is this overcomplicating this task?

Thoughts anybody? - Damian

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-02-22 20:17                                         ` Rich Felker
@ 2020-02-22 20:53                                           ` Damian McGuckin
  0 siblings, 0 replies; 64+ messages in thread
From: Damian McGuckin @ 2020-02-22 20:53 UTC (permalink / raw)
  To: musl

On Sat, 22 Feb 2020, Rich Felker wrote:

> First comment: I couldn't find (maybe I missed?) what you intend fore
> the contents of fenv-generic.c and fenv-trivial.c to be, but I don't
> see what you want them for. fenv.c should just use the macros/inlines
> the fenv_arch.h defines, naturally collapsing to empty functions when
> they do nothing (for softfloat archs).

I will repost the current state later this week.  Moving house this week 
so it might take a bit longer!
>
> In general, the idiom of #include "../foo.c" for $(ARCH)/foo.c only
> works when the names are the same, so that the arch file is replacing
> the one in the parent dir, and there are special cases (subarch
> variants) where you want the arch dir to just do the same thing as the
> generic file in the parent dir does, or where it's doing approximately
> the same thing but with some macros defined to change what it does.
> The way you seem to be trying to do it, src/fenv/fenv-generic.c,
> src/fenv/fenv-trivial.c, and src/fenv/$(ARCH)/fenv.c would all be
> getting built.

Understand. I will think about what I am trying to do. Open to advice.

> A couple minor style things I also spotted right away: If you want an
> unsigned 0 constant, 0U is very much preferred to ((unsigned)0), and
> not putting the type there at all is preferable unless there's a good
> reason you need it (like to make other things in expressions get
> coerced to unsigned). Also verbose type names like "unsigned int" in
> place of just "unsigned" are not preferred. If they appear in some
> existing places in the source, it's by mistake/oversight.

OK.

> I'll follow up with more interesting technical review soon.

Thanks - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-02-03  2:12                                       ` Damian McGuckin
@ 2020-02-22 20:17                                         ` Rich Felker
  2020-02-22 20:53                                           ` Damian McGuckin
  0 siblings, 1 reply; 64+ messages in thread
From: Rich Felker @ 2020-02-22 20:17 UTC (permalink / raw)
  To: musl

On Mon, Feb 03, 2020 at 01:12:35PM +1100, Damian McGuckin wrote:
> On Sun, 2 Feb 2020, Rich Felker wrote:
> 
> >Hi. Just checking in to let you know I'm not ignoring you, but
> >might need a bit to task-switch back to following up on this. I'll
> >try to do it in parallel with finishing up the release cycle.
> 
> No rush. I have to fight other fires for most of this week anyway.

OK, I'm going to start review and follow-up now. Sorry it's taken so
long. Back-and-forth over time64 concerns in applications & testing,
documenting time64 transition for users, etc. took a lot longer than I
had hoped but it's done now. :-)

First comment: I couldn't find (maybe I missed?) what you intend fore
the contents of fenv-generic.c and fenv-trivial.c to be, but I don't
see what you want them for. fenv.c should just use the macros/inlines
the fenv_arch.h defines, naturally collapsing to empty functions when
they do nothing (for softfloat archs).

In general, the idiom of #include "../foo.c" for $(ARCH)/foo.c only
works when the names are the same, so that the arch file is replacing
the one in the parent dir, and there are special cases (subarch
variants) where you want the arch dir to just do the same thing as the
generic file in the parent dir does, or where it's doing approximately
the same thing but with some macros defined to change what it does.
The way you seem to be trying to do it, src/fenv/fenv-generic.c,
src/fenv/fenv-trivial.c, and src/fenv/$(ARCH)/fenv.c would all be
getting built.

A couple minor style things I also spotted right away: If you want an
unsigned 0 constant, 0U is very much preferred to ((unsigned)0), and
not putting the type there at all is preferable unless there's a good
reason you need it (like to make other things in expressions get
coerced to unsigned). Also verbose type names like "unsigned int" in
place of just "unsigned" are not preferred. If they appear in some
existing places in the source, it's by mistake/oversight.

I'll follow up with more interesting technical review soon.

Rich

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-02-07 21:38                                     ` Rich Felker
  2020-02-07 23:53                                       ` Damian McGuckin
@ 2020-02-09  5:04                                       ` Damian McGuckin
  1 sibling, 0 replies; 64+ messages in thread
From: Damian McGuckin @ 2020-02-09  5:04 UTC (permalink / raw)
  To: musl


The overall generic design is that each architecture has its own

 	fenv.c

file and

a)	if it is soft float, they pull in the trivial fenv.c file
 	currently in ../fenv (which I will call trivial.c) as

 		#include "../trivial.c"

b)	otherwise,

 	i) most architectures, except Intel then pull in two files

 		#include "../regular.c"

 		#include "../generic.c"

 	ii) Intel architectures pull in solely

 		#include "../generic.c"

All decision making logic is in "generic.c". There is no decision making 
in the "regular.c" file.

There is no decision making in the "fenv.c" file which is largely embedded 
assembler and a few definitions EXCEPT or

a	Intel i386 basically trying to figure out whether it has XMM
 	instructions and trying to optimize the use of clearing the
 	exceptions or writing to the environment.

b)	PowerPC where it has to figure out whether to augment the
 	exception mask in the case of the INVALID exception.

The i386, x32 and x86_64 will only use "generic.c" but not "regular.c". 
They need extend their own "fenv.c" file as this architecture is just too 
different.

The code within "generic.c" file will, when it is asked to raise a single 
excepion, do so arithmetically irrespective of architecture. For more than 
one exception, the use of the status register is, at this stage of the 
design process, deemed more cost effective. This is only done for 'raise'. 
It does not try and optimize 'feclearexcept' or 'fesetexceptlags'. It just 
gets too messy.

There are 2 #define's in "regular.c" to cater for a SPARC into the future. 
If MUSL will never support this architecture, six lines can be dropped and 
two lines simpliied.

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-02-07 21:38                                     ` Rich Felker
@ 2020-02-07 23:53                                       ` Damian McGuckin
  2020-02-09  5:04                                       ` Damian McGuckin
  1 sibling, 0 replies; 64+ messages in thread
From: Damian McGuckin @ 2020-02-07 23:53 UTC (permalink / raw)
  To: musl


Merging for i386.

Comments?

fegetenv(fenv_t *e)

 	as per i387 assembler instruction - fnstenv
 	+ merge MXCSR exception bits into e-> __status_word exception bits

fesetenv (fenv_t *e)
 	as per i387 assembler instruction - fldenv
 	+ merge e->__control_word rounding bits into MXCSR rounding bits
 	+ clear MXCSR exception bits

feupdateenv(fenv_t *e)

 	merge existing MXCSR and X87 exception bits  e->__status_word
 	fesetenv(e) as above

feholdexcept(fenv_t)

 	fegetenv(e) as above
 	clear MXCSR exception bits
 	clear X87 exception bits

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-02-07 21:25                                   ` Damian McGuckin
@ 2020-02-07 21:38                                     ` Rich Felker
  2020-02-07 23:53                                       ` Damian McGuckin
  2020-02-09  5:04                                       ` Damian McGuckin
  0 siblings, 2 replies; 64+ messages in thread
From: Rich Felker @ 2020-02-07 21:38 UTC (permalink / raw)
  To: musl

On Sat, Feb 08, 2020 at 08:25:43AM +1100, Damian McGuckin wrote:
> 
> In considering an i386 architecture with XMM capability, there is an issue
> with fegetenv()/feholdexcept() and fesetenv()/feupdateenv().
> 
> There is no __mxcsr field in the fenv_t structure.
> 
> So, there is no way to return the MXCSR using the first pair.
> 
> Do I just bury any MXCSR exceptions into the __status_word?
> 
> Regards - Damian

There's plenty of unused space to tuck it away somewhere if we wanted,
but I don't know if it's actually useful to do so. I'm fine with just
merging the states like the other fenv functions do.

Rich

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-25  0:11                                 ` Rich Felker
                                                     ` (8 preceding siblings ...)
  2020-02-03  0:54                                   ` Damian McGuckin
@ 2020-02-07 21:25                                   ` Damian McGuckin
  2020-02-07 21:38                                     ` Rich Felker
  9 siblings, 1 reply; 64+ messages in thread
From: Damian McGuckin @ 2020-02-07 21:25 UTC (permalink / raw)
  To: musl


In considering an i386 architecture with XMM capability, there is an issue
with fegetenv()/feholdexcept() and fesetenv()/feupdateenv().

There is no __mxcsr field in the fenv_t structure.

So, there is no way to return the MXCSR using the first pair.

Do I just bury any MXCSR exceptions into the __status_word?

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-02-03  2:09                                     ` Rich Felker
@ 2020-02-03  2:12                                       ` Damian McGuckin
  2020-02-22 20:17                                         ` Rich Felker
  0 siblings, 1 reply; 64+ messages in thread
From: Damian McGuckin @ 2020-02-03  2:12 UTC (permalink / raw)
  To: musl

On Sun, 2 Feb 2020, Rich Felker wrote:

> Hi. Just checking in to let you know I'm not ignoring you, but might 
> need a bit to task-switch back to following up on this. I'll try to do 
> it in parallel with finishing up the release cycle.

No rush. I have to fight other fires for most of this week anyway.

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-02-03  0:54                                   ` Damian McGuckin
@ 2020-02-03  2:09                                     ` Rich Felker
  2020-02-03  2:12                                       ` Damian McGuckin
  0 siblings, 1 reply; 64+ messages in thread
From: Rich Felker @ 2020-02-03  2:09 UTC (permalink / raw)
  To: musl

On Mon, Feb 03, 2020 at 11:54:52AM +1100, Damian McGuckin wrote:
> 
> Latest Implementation Document is attached.
> 
> Comments need reviewing and clarification.
> 
> Style issues will need to be addressed.
> 
> This is meant for for ARM with hardware floating point, any MIPS,
> RISC-V, M68K, Hitachi SuperH, IBM S390x, Powerpc64, and (in theory)
> Sparc.
> 
> The version to handle Intel i386, x32 and x86-64 will be very different.

Hi. Just checking in to let you know I'm not ignoring you, but might
need a bit to task-switch back to following up on this. I'll try to do
it in parallel with finishing up the release cycle.

Rich

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-25  0:11                                 ` Rich Felker
                                                     ` (7 preceding siblings ...)
  2020-02-02  0:47                                   ` Damian McGuckin
@ 2020-02-03  0:54                                   ` Damian McGuckin
  2020-02-03  2:09                                     ` Rich Felker
  2020-02-07 21:25                                   ` Damian McGuckin
  9 siblings, 1 reply; 64+ messages in thread
From: Damian McGuckin @ 2020-02-03  0:54 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: TEXT/PLAIN, Size: 595 bytes --]


Latest Implementation Document is attached.

Comments need reviewing and clarification.

Style issues will need to be addressed.

This is meant for for ARM with hardware floating point, any MIPS, RISC-V, 
M68K, Hitachi SuperH, IBM S390x, Powerpc64, and (in theory) Sparc.

The version to handle Intel i386, x32 and x86-64 will be very different.

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

[-- Attachment #2: Type: TEXT/plain, Size: 9158 bytes --]

/*
 * these lines are here to allow the compilation of this file
 * in isolation and the lint'ing of the same with 'splint'!!
 */
#include	<assert.h>

#define TEST

#ifdef TEST

#include "../../../arch/powerpc64/bits/fenv.h"

typedef unsigned int fe_csr_t;

extern unsigned int fe_get_sr();
extern void fe_set_sr(unsigned int);
extern unsigned int fe_get_cr();
extern void fe_set_cr(unsigned int);
extern double fe_get_csr_f();
extern void fe_set_csr_f(double);
extern void fe_get_env(fenv_t *e);
extern void fe_set_env(fenv_t *e);
extern void fe_set_env_but_keep_raised_excepts(fenv_t *e);
extern void fe_get_env_and_clear_all_excepts(fenv_t *e);
extern void fescrubexcept(void);
extern fenv_t fe_dfl_env;
extern int FE_ROUNDING_TYPE(int);

#endif
\f
/*
 * fenv-generic.c (use 'fenv-trivial.v' for soft-float environments):
 *
 * All architectures which fit the generic mould are assumed to have a status
 * register (sr) which contains the exception flags are stored and a control
 * register (cr) which contains the rounding mode. These registers may be one
 * and the same or they may be distinct. All architecture-dependent features
 * live in an individual 'fenv.c' file which contains function definitions,
 * mainly of one-line embedded assembler, declarations, macros and typedef's:
 *
 * a) the type 'fe_csr_t' which can store both 'sr' and 'cr',
 *    - this is an unsigned int on all known architectures
 * b) routines to copy 'sr' and 'cr' into memory, respectively
 *    - fe_csr_ fe_get_sr() and fe_csr_t fe_get_cr()
 * c) routines to load 'sr' and 'cr' from memory, respectively
 *    - void fe_set_sr(fe_csr_t) and void fe_set_cr(fe_csr_t) 
 * d) routines to copy and load an architecture's floating point environment, 
 *    - fe_get_env(fenv_t *) and fe_set_env(fenv_t *) respectively
 *    - 'fenv_t' is defined in an architectures own 'fenv.h' file
 * e) a routine to reset an architecture's floating point environment
 *    - like fe_set_env(fenv_t *) but currently raised exceptions stay raised
 * f) a routine to clone an architecture's floating point environment
 *    - like fe_get_env(fenv_t *) but currently raised exceptions get cleared
 * g) a macro yielding a ratified exception list during a CLEAR operation
 *    - FE_RATIFY_CLEAR_EXCEPTS
 * h) a macro yielding a ratified exception list during a RAISE operation
 *    - FE_RATIFY_RAISE_EXCEPTS
 * i) macros which maps from the rounding mode to a rounding style
 *    - FE_ROUNDING_STYLE
 * j) a (macro) bit-mask of all legal rounding modes
 *    - FE_ROUNDING_MASK
 *    - it returns the value to be returned by the FLT_ROUNDS macro
 * k) an initialization for the default fenv_t
 *    - i.e. static const fenv_t fe_dfl_env = .....
 *
 * (k) is an implicitly hidden, IEEE 754 default floating point environment in
 * which all exceptions are clear, non-stop (or continue on exception) mode is
 * active, and rounding to nearest will be used for all arithmetic.
 *
 * For most architectures, the macros (g) and (h) can be allowed to default
 * to the handling done by the routine 'fe_ratified_excepts'. This routine
 * reviews all the exceptions implicit in the mask 'excepts' fed to all the
 * 'fe*' routines, discarding any inappropriate bits. In this case, these
 * macros are defined automatically as seen below. Only one architecture,
 * the PowerPC, does not fit this mould, and its definitions are explained
 * in detail in that architecture's own 'fenv.c'.
 *
 * If the architecture-specific 'fenv.c' file does not define the mask (j),
 * all four IEEE 754 rounding modes are assumed to exist, a mask which is
 * the OR'd combinations of those modes will be automatically created, as
 * well as the macro (i) which maps the machine-dependent rounding mode of
 * the control register to the machine-independent rounding style which is
 * provided by FLT_ROUNDS.  If all four modes are not available, this mask
 * and macro must be provided in the architecture-specific 'fenv.c' file.
 * Apart from architectures with soft floating point, only the Hitachi
 * SuperH RISC chip (sh) needs such definitions.
 *
 * After all these definitions and declarations, the architectures-specific
 * file must then also include this file, i.e. 'fenv-generic,c' following
 * current practice as
 *
 *	#include	"../fenv-generic.c"
 */
\f
#ifndef	FE_RATIFIED_RAISE_EXCEPTS
#define	FE_RATIFIED_RAISE_EXCEPTS(e)	fe_ratified_excepts(e)
#endif
#ifndef	FE_RATIFIED_CLEAR_EXCEPTS
#define	FE_RATIFIED_CLEAR_EXCEPTS(e)	fe_ratified_excepts(e)
#endif
/*
 * take a bit mask which should be just an OR'd combination of IEEE 754
 * exceptions and discard any bit which is not one of those exceptions.
 */
static inline int
fe_ratified_excepts(int excepts)
{
	return excepts & FE_ALL_EXCEPT;
}

#ifndef	FE_ROUNDING_MASK
#define	FE_ROUNDING_MASK	(FE_TOWARDZERO|FE_DOWNWARD|FE_UPWARD)
/*
 * Map an architecture-dependent rounding mode to that needed by FLT_ROUNDS.
 * There are better ways to do this but none of those reads as well as this.
 */
static inline int
fe_rounding_style(int rounding)
{
	return
	(
		rounding == FE_DOWNWARD
	?
		3
	:
		rounding == FE_UPWARD
	?
		2
	:
		(int) (rounding == FE_TONEAREST)
	);
}
#define	FE_ROUNDING_STYLE(m)	fe_rounding_style(m)
#endif
\f
/*
 * mask out the bits with 'bits' from 'mask' ensuring type compatability
 */
static inline fe_csr_t
fe_clear_bits(fe_csr_t mask, int bits)
{
	return mask & ~((fe_csr_t) bits);
}
/*
 * get the rounding mask
 */
int
fegetround(void)
{
	return (int) (fe_get_cr() & FE_ROUNDING_MASK);
}
/*
 * internals for FLT_ROUNDS
 */
int __flt_rounds()
{
	return FE_ROUNDING_STYLE(fegetround());
}
/*
 * set the rounding mode - returning (-1) if input is not a valid mode
 */
int
fesetround(int rounding_mode)
{
	if ((rounding_mode & FE_ROUNDING_MASK) != rounding_mode)
		return -1;

	fe_set_cr(fe_clear_bits(fe_get_cr(), FE_ROUNDING_MASK) | rounding_mode);
	return 0;
}
\f
/*
 * returns a word in which the bits (which represent IEEE 754 exceptions)
 * are set that were also set in the argument 'excepts' and for which the
 * corresponding exception is currently set. The only legal bits that can
 * be given in the argment are those which correspond to an exception.
 */
int
fetestexcept(int excepts)
{
	return (int) (fe_get_sr() & fe_ratified_excepts(excepts));
}
/*
 * clears the supported exceptions represented by the bits in its argument.
 */
int
feclearexcept(int excepts)
{
	excepts = FE_RATIFIED_CLEAR_EXCEPTS(excepts);
	fe_set_sr(fe_clear_bits(fe_get_sr(), excepts));
	return 0;
}
/*
 * raises the supported exceptions represented by the bits in its argument.
 */
int
feraiseexcept(int excepts)
{
	/*
	 * assume single OP is faster than double OP
	 */
	const float zero = (float) 0;
	const float one = (float) 1;
	const float large = 2.076919e+34; // 2^(+108)
	const float small = 4.814826e-35; // 2^(-108)
	volatile float x;
	/*
	 * if it is just a simple exception, arithmetic expressions are optimal
	 */
	switch(excepts)
	{
	case FE_INVALID:
		x = zero, x /= x;
		break;
	case FE_DIVBYZERO:
		x = zero, x = one / x;
		break;
	case FE_INEXACT:
		x = small, x += one;
		break;
	case FE_OVERFLOW | FE_INEXACT:
		x = large, x *= x; 
		break;
	case FE_UNDERFLOW | FE_INEXACT:
		x = small, x *= x;
		break;
	default:
		/*
		 * if multiple exceptions exist, a sledgehammer is viable
		 */
		excepts = FE_RATIFIED_RAISE_EXCEPTS(excepts);
		fe_set_sr(fe_get_sr() | excepts);
		break;
	}
	return 0;
}
\f
/*
 * copy the complete current exception state into *fp
 */
int
fegetexceptflag(fexcept_t *fp, int excepts)
{
	*fp = (fexcept_t) fetestexcept(excepts);
	return 0;
}
/*
 * load the exception state from *fp, but intelligently avoiding
 * clearing (raising) exceptions which are already cleared (raised).
 */
int
fesetexceptflag(const fexcept_t *fp, int excepts)
{
	const fe_csr_t sr =  fe_get_sr();
	const fe_csr_t fx = (fe_csr_t) *fp;
	const int requested = excepts & FE_ALL_EXCEPT;
	const int excepts_old = (int) (sr & requested);
	const int excepts_new = (int) (fx & requested);
	const int essential = excepts_old ^ excepts_new;
	const int clear = FE_RATIFIED_CLEAR_EXCEPTS(excepts_old & essential);
	const int raise = FE_RATIFIED_RAISE_EXCEPTS(excepts_new & essential);

	if ((clear | raise) != 0)
		fe_set_sr(fe_clear_bits(sr, clear) | raise);
	return 0;
}
/*
 * copy the floating point environment into memory
 */
int
fegetenv(fenv_t *e)
{
	fe_get_env(e);
	return 0;
}
/*
 * load the floating point environment from memory
 */
int
fesetenv(fenv_t *e)
{
	fe_set_env(e != FE_DFL_ENV ? e : &fe_dfl_env);
	return 0;
}
/*
 * as for fesetenv but retain the exception data
 */
int
feupdateenv(fenv_t *e)
{
	fe_set_env_but_keep_raised_excepts(e != FE_DFL_ENV ? e : &fe_dfl_env);
	return 0;
}
/*
 * as for fegetenv but clear the exception states
 */
int
feholdexcept(fenv_t *e)
{
	fe_get_env_and_clear_all_excepts(e);
	return 0;
}

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-25  0:11                                 ` Rich Felker
                                                     ` (6 preceding siblings ...)
  2020-01-27  5:39                                   ` Damian McGuckin
@ 2020-02-02  0:47                                   ` Damian McGuckin
  2020-02-03  0:54                                   ` Damian McGuckin
  2020-02-07 21:25                                   ` Damian McGuckin
  9 siblings, 0 replies; 64+ messages in thread
From: Damian McGuckin @ 2020-02-02  0:47 UTC (permalink / raw)
  To: musl


Questions of style:

When working inside 'fenv.c', what should internal routines and
definition be called to avoid namespace pollution, retain clarity,
and so on.

Note that 'fenv.c' is designed to never call any routines external to 
itself.

Do we keep using

 	FE_<SOMETHING>

and
 	fe<SOMETHING>

where <SOMETHING> is some meaningful name,

to conform to the external interface which we assume will never clash.

Or do we use

 	__<SOMETHING>

as is used or

 	__flt_rounds

which is used by FTL_ROUNDS when recovering the current IEEE 754 rounding 
style, as opposed to the architecture dependent one.

Thanks - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-25  0:11                                 ` Rich Felker
                                                     ` (5 preceding siblings ...)
  2020-01-27  3:32                                   ` Damian McGuckin
@ 2020-01-27  5:39                                   ` Damian McGuckin
  2020-02-02  0:47                                   ` Damian McGuckin
                                                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 64+ messages in thread
From: Damian McGuckin @ 2020-01-27  5:39 UTC (permalink / raw)
  To: musl


The PowerPC is ever so close to being generic.

Generic Handling of Floating Point Exceptions (And Causes Thereof)
------------------------------------------------------------------

The PowerPC architecture has the non-standard concept of the cause of an
INVALID exception, the reason why an FE_INVALID exception has occurred.
There are nine such causes, encapsulated in a non-contiguous mask across
nine bits, the OR'd combination of which is the FE_ALL_INVALID macro.
Instructions performing invalid operations (as I read the documentation)
only need to see the CAUSE bit, which implicitly raises the FE_INVALID
exception. So, an FE_INVALID implies the existence of a CAUSE bit, and
vica versa. On a PowerPC, MUSL supports one of those 9 causes, the macro
called FE_INVALID_SOFTWARE (if _GNU_SOURCE is defined) which says that
a FE_INVALID exception has been caused 'by an explicit software request'.

The handling of cause bits is awkward. However they must be addressed as
they can stab the user, well MUSL, in the back if not done right.

An incoming exception list (of bits) to be cleared, raised, or tested, is 
validated by masking it with respect to an mask comprising the net OR'd
combination of all IEEE 754 exceptions. This approach results in any bits 
which are not a valid exception being thrown away. For anything other than 
a PowerPC, this mask is defined as simply

 	#define	FE_EXCEPT_MASK	(FE_ALL_EXCEPT)

It is proposed that for a PowerPC, it is instead defined as

 	#define	FE_EXCEPT_MASK	(FE_ALL_EXCEPT | FE_ALL_INVALID)

*** This change also greatly simplifies the generic nature of handling ***

This also allows the user to interrogate, clear and raise these extra
cause bits on the PowerPC and provides consistent handling. A bonus.

Mind you, I do not yet have a PowerPC64 with which to test any of this,
so I could just be talking through my hat.

Everything needs to be tested!

If the user wants to specify a CAUSE bits, i.e. one within the those
for which FE_ALL_INVALID is a mask, the FE_INVALID_SOFTWARE bit should
probably not be also introduced.

That needs to be tested too!!

Comments, Complaints, Criticism??????

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-25  0:11                                 ` Rich Felker
                                                     ` (4 preceding siblings ...)
  2020-01-26  5:25                                   ` Damian McGuckin
@ 2020-01-27  3:32                                   ` Damian McGuckin
  2020-01-27  5:39                                   ` Damian McGuckin
                                                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 64+ messages in thread
From: Damian McGuckin @ 2020-01-27  3:32 UTC (permalink / raw)
  To: musl


Greatly simplified things.  Will send later today or tomorrow.

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-25  0:11                                 ` Rich Felker
                                                     ` (3 preceding siblings ...)
  2020-01-26  3:32                                   ` Damian McGuckin
@ 2020-01-26  5:25                                   ` Damian McGuckin
  2020-01-27  3:32                                   ` Damian McGuckin
                                                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 64+ messages in thread
From: Damian McGuckin @ 2020-01-26  5:25 UTC (permalink / raw)
  To: musl


a)	Simplified ROUND_MASK definition

b)	Eliminated #define's for fe_get_[cs]r
 	- no longer need fe_[sg]et_*_arch routines

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-25  0:11                                 ` Rich Felker
                                                     ` (2 preceding siblings ...)
  2020-01-26  3:30                                   ` Damian McGuckin
@ 2020-01-26  3:32                                   ` Damian McGuckin
  2020-01-26  5:25                                   ` Damian McGuckin
                                                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 64+ messages in thread
From: Damian McGuckin @ 2020-01-26  3:32 UTC (permalink / raw)
  To: musl


/*
  * mips64 ARCHITECTURE
  */
#include	<fenv.h>

#ifndef __mips_soft_float

static inline unsigned int fe_get_csr_arch(void)
{
 	unsigned int fcsr;

 	__asm__ __volatile__ ("cfc1 %0, $31" : "=r" (fcsr));
 	return fcsr;
}

static inline void fe_set_csr_arch(unsigned int fcsr)
{
 	__asm__ __volatile__ ("ctc1 %0, $31" : : "r" (fcsr));
}

#define	fe_get_e(e)	(e->__cw = fe_get_csr_arch())
#define	fe_set_e(e)	(fe_set_csr_arch(e->__cw))

#define	FE_DFL_ENV_DATA	{ 0 }

#include	"../fenv-generic.c"

#else

#include	"../fenv-trivial.c"

#endif

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-25  0:11                                 ` Rich Felker
  2020-01-26  3:28                                   ` Damian McGuckin
  2020-01-26  3:28                                   ` Damian McGuckin
@ 2020-01-26  3:30                                   ` Damian McGuckin
  2020-01-26  3:32                                   ` Damian McGuckin
                                                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 64+ messages in thread
From: Damian McGuckin @ 2020-01-26  3:30 UTC (permalink / raw)
  To: musl


PowerPC 64:

/*
  * powerpc64 ARCHITECTURE (tolterable reference is below)
  *
  * http://math-atlas.sourceforge.net/devel/assembly/ppc_isa.pdf
  *
  * PowerPC includes cause (or reason) bits which qualify why an INVALID
  * exception get raised. The cause can be regarded as just an explanation
  * in most cases. In fact, if only the ppropriate cause bit is set in the
  * control register and this is fed to in a 'Move to FPSCR', the INVALID
  * exception will be triggered automatically. The complete list is:
  *
  *		VXSNAN	caused by a floating point operation on a SNAN
  *		VXISI	caused by the operation INFINITY - INFINITY
  *		VXIDI	caused by the operation INFINITY / INFINITY
  *		VXZDZ	caused by the operation 0.0 / 0.0
  *		VXIMZ	caused by the operation INFINITY * 0
  *		VXCVI	caused by an illegal integer conversion
  *		VXVC	caused by an ordered comparison involving a NaN
  *		VXSOFT	caused by an explicit sofware request
  *		VXSQRT	caused by a sqrt/rsqrt on a negative & non-zero number
  *
  * The OR of all such causes is called FE_ALL_INVALID in the API.
  *
  * One above, VXSOFT, is that caused 'By (An Explicit Software) Request'. The
  * documentation says that its purpose is to allow software to cause/raise an
  * Invalid Operation Exception for a condition that is not necessarily
  * associated with the execution of a floating-point instruction. It could
  * even be used by a program to complain about some negative and non-zero
  * operand long BEFORE it gets given to a SQRT or RSQRT imnstruction where
  * it would otherwise invoke the VXSQRT cause and trigger that bit to be set.
  *
  * Note, when an FE_INVALID exception is cleared, all the causes including
  * the ones that other instructions can generate, i.e all of the bits in the
  * FE_ALL_INVALID bit mask, must be similarly also cleared. Otherwise, any
  * CAUSE bit left in the mask will, when fed back into the 'MOVE to FPSCR'
  * instruction (implicit in the 'fe_set_fpscr' routine) will retrigger the
  * FE_INVALID exception. So MUSL must still cater for these non-standard
  * causes, because to not do so will stab itself in the back.
  *
  * So MUSL makes the assumption that at least for a PowerPC, any raising of
  * the INVALID exception must also raise a cause, that cause being by default,
  * that called 'By Explicit Software Request'. That makes sense as this is
  * beind done by the MUSL itself, not as a part, or side-effect, of some
  * arithmetic operation. Note also that if a user has explicitly (violated
  * the standard and) augmented an exception list to contain some other cause,
  * that extra cause gets trashed on entry to the fenv(3) routines and only
  * this default cause is added. The rejection of the benefits of the extra
  * information provided by a PowerPC could be addressed in the future as
  * the fix is trivial!
  *
  * The FE_INVALID_SOFTWARE flag is relabelled as FE_INVALID_BY_REQUEST to
  * correctly explain its purpose and make any use remotely comprehensible.
  * This is questionable in terms of style but without this, the ability of
  * the comments to make sense is seriously compromised. For the same reason,
  * FE_ALL_INVALID is relabelled FE_ALL_INVALID_CAUSES. At some stage, it is
  * recommended that the IBM ISA documentation be enhanced to be more lucid.
  */

#include	<fenv.h>
/*
  * Remember that the floating point environment lives in a 'double' That
  * is locked by the ABI defined in the architecture dependent <fenv.h>.
  *
  * The first routine pair gets and sets that environment for later use by
  * the MUSL mandated pair of fe_get_csr_arch/fe_set_csr_arch routines
  */
static inline double fe_get_fpscr_f(void)
{
 	double fpscr;

 	__asm__ __volatile__ ("mffs %0" : "=d"(fpscr));
 	return fpscr;
}

static inline void fe_set_fpscr_f(double fpscr)
{
 	__asm__ __volatile__ ("mtfsf 255, %0" : : "d"(fpscr));
}
/*
  * This routine pair move tos and from the double (of the environment) into
  * MUSL's unsigned int object in which it stores control & status registers
  */
static inline unsigned int fe_get_csr_arch(void)
{
 	return (unsigned int) (union {double f; long i;}) {fe_get_fpscr_f()}.i;
}

static inline void fe_set_csr_arch(unsigned int fpscr)
{
 	fe_set_fpscr_f((union {long i; double f;}) {(long) fpscr}.f);
}
/*
  * These two macros work with the environment directly
  * Should they eventually move to being inline routines?
  */
#define	fe_get_e(e)	(*e = fe_get_fpscr_f())
#define	fe_set_e(e)	(fe_set_fpscr_f(*e))
/*
  * the default value of a floating point environment of type fenv_t (double)
  */
#define	FE_DFL_ENV_DATA	0.0
/*
  * Eludicate the macros related to CAUSEs for improved readability.
  * While this is tolerable at the architecture definition stage, it
  * may need to change in the production code.
  */
#define	FE_ALL_INVALID_CAUSES	FE_ALL_INVALID
#define	FE_INVALID_BY_REQUEST	FE_INVALID_SOFTWARE

#include	"../fenv-generic.c"

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-25  0:11                                 ` Rich Felker
  2020-01-26  3:28                                   ` Damian McGuckin
@ 2020-01-26  3:28                                   ` Damian McGuckin
  2020-01-26  3:30                                   ` Damian McGuckin
                                                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 64+ messages in thread
From: Damian McGuckin @ 2020-01-26  3:28 UTC (permalink / raw)
  To: musl


M68K is below:

I looked inside 'fenv.c' inside the .../fenv/arm directory and noticed
it pulled in a default trivial 'fenv.c' from the directory above it.

In the absence of an alternative, I might exploit that but say

 	#include	"../fenv-trivial.c"

for that trivial case and

 	#include	"../fenv-generic.c"

for the new generic architecture.

Apologizing in advance for insuficient comments, but the hardware specific 
fenv.c for (say) the m68k is

/*
  * 68k ARCHITECTURE
  *
  * this CPU has distinct control and status registers
  */
#include	<fenv.h>

#if __HAVE_68881 || __mcffpu__

static inline unsigned int fe_get_sr_arch()
{
 	unsigned int v;

 	__asm__ __volatile__ ("fmove.l %%fpsr,%0" : "=dm"(v));
 	return v;
}

static inline void fe_set_sr_arch(unsigned v)
{
 	__asm__ __volatile__ ("fmove.l %0,%%fpsr" : : "dm"(v));
}

static inline unsigned int fe_get_cr_arch()
{
 	unsigned int v;

 	__asm__ __volatile__ ("fmove.l %%fpcr,%0" : "=dm"(v));
 	return v;
}

static inline void fe_set_cr_arch(unsigned v)
{
 	__asm__ __volatile__ ("fmove.l %0,%%fpcr" : : "dm"(v));
}

static inline unsigned int fe_get_ia_arch()
{
 	unsigned int address;

 	__asm__ __volatile__ ("fmove.l %%fpiar,%0" : "=dm"(address));
 	return address;
}

static inline void fe_set_ia_arch(unsigned int address)
{
 	__asm__ __volatile__ ("fmove.l %0,%%fpiar" : : "dm"(address));
}

/*
  * Expose the above generically
  */
#define	fe_get_sr	fe_get_sr_arch
#define	fe_set_sr	fe_set_sr_arch
#define	fe_get_cr	fe_get_cr_arch
#define	fe_set_cr	fe_set_cr_arch

/*
  * Handle the environment (which is a struct)
  */
#define	fe_get_e(e)\
 		((e)->__control_register = fe_get_cr_arch(),\
 		(e)->__status_register = fe_get_sr_arch(),\
 		(e)->__archnstruction_address = fe_get_ia_arch())
#define	fe_set_e(e)\
 		(fe_set_cr_arch((e)->__control_register),\
 		fe_set_sr_arch((e)->__status_register),\
 		fe_set_ia_arch((e)->__instruction_address))

/* the contents of the default fenv_t  */

#define	FE_DFL_ENV_DATA	{ 0, 0, 0 }

#include	"../fenv-generic.c"

#else

#include	"../fenv-trivial.c"

#endif


Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-25  0:11                                 ` Rich Felker
@ 2020-01-26  3:28                                   ` Damian McGuckin
  2020-01-26  3:28                                   ` Damian McGuckin
                                                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 64+ messages in thread
From: Damian McGuckin @ 2020-01-26  3:28 UTC (permalink / raw)
  To: musl


Latest below - Apologies if I missed anything about which we spoke.

*	Lots more comments about architecture.

*	better explanation about dealing with those PowerPC issues.

*	some naming is improved

I will shortly post some guesses at what I think some hardware specific 
versions of fenv.c need to be.

Not sure how best to handle these because the way used currently is to 
pull in a template from directory above as seen in

 	.../musl-1.1.24/src/fenv/arm/fenv.c

which just does

 	#include "../fenv.c"

which includes the most trivial of fenv(3) implementations possible.

The following passes 'splint'.

/*
  * fenv-generic.c:
  *
  * All generic architectures are assumed to have
  *
  * a)	a status register in which the exception flags exist
  * b)	a control register in which the rounding mode exists
  * c)	an environment structure, fenv_t
  *
  * Routines must be provided with the architecture specific fenv.c routines
  * to both store (copy) these register into memory and load them from memory.
  *
  * If the control & status registers are distinct, then the declaration
  *
  *	static unsigned int fe_get_sr_arch() { ... }
  *	static void fe_set_sr_arch(unsigned int) { ... }
  *	static unsigned int fe_get_cr_arch() { ... }
  *	static void fe_set_cr_arch(unsigned int) { ... }
  *
  * and the macros that use these internal routines
  *
  *	#define fe_get_sr	fe_get_sr_arch
  *	#define fe_set_sr	fe_set_sr_arch
  *	#define fe_get_cr	fe_get_cr_arch
  *	#define fe_set_cr	fe_set_cr_arch
  *
  * need to be provided.
  *
  * If the control & status register are one and the same, then the declaration
  *
  *	static unsigned int fe_get_csr_arch() { ... }
  *	static void fe_set_csr_arch(unsigned int) { ... }
  *
  * needs to be provided. Those fe_[gs]et_[sc] macros are auto-created.
  *
  * Also needed are definitions that respectively
  *
  * a) allows a (memory) fenv_t object *e to be filled with the contents
  *    of the current floating point environment, and
  * b) allows the current floating point environment to mirror the state
  *    of the contents of some (memory) fenv_t object *e.
  *
  *	#define fe_get_e(e)	...
  *	#define fe_set_e(e)	...
  *
  * The default state of the floating point environment must be defined as
  *
  *	#define FE_DFL_ENV_DATA	....
  *
  * Where an architecture like a PowerPC64 supports a list of causes for an
  * INVALID exceptions, the bit mask encapsulating all those causes must be
  * defined, FE_ALL_VALID_CAUSES, as well as the bit field associated with
  * an FE_INVALID exception being caused 'by (an explicit software) request'.
  * They will commonly be defined using the existing ABI as:
  *
  *	#define FE_ALL_INVALID_CAUSES	FE_ALL_INVALID
  *	#define FE_INVALID_BY_REQUEST	FE_INVALID_SOFTWARE
  *
  * Where no causes exist, the above 2 lines must NEVER be defined!
  *
  * Once the above inline functions and appropriate macros have been defined
  * within an 'fenv.c' for a given architecture, the ''fenv.c' should pull
  * this file into its compilation space following existing practice like
  *
  *	#include	"../fenv-generic.c"
  */

/*
  * default inspection and modification of FPU status and control registers
  */
#ifndef	fe_get_cr
#define	fe_get_cr	fe_get_csr_arch
#endif
#ifndef	fe_set_cr
#define	fe_set_cr	fe_set_csr_arch
#endif
#ifndef	fe_get_sr
#define	fe_get_sr	fe_get_csr_arch
#endif
#ifndef	fe_set_sr
#define	fe_set_sr	fe_set_csr_arch
#endif
/*
  * Compute the rounding mask with some rigid logic
  */
#ifndef	FE_TONEAREST
#define	FE_TONEAREST	0
#define	ROUND_MASK	((unsigned int) 0)
#else
#ifndef	FE_TOWARDZERO
#define	ROUND_MASK	((unsigned int) 0)
#else
#ifdef	FE_DOWNWARD
#ifdef	FE_UPWARD
#define	ROUND_MASK	((unsigned int) (FE_DOWNWARD|FE_UPWARD|FE_TOWARDZERO))
#else
#define	ROUND_MASK	UNSUPPORTED rounding
#endif
#else
#ifdef	FE_UPWARD
#define	ROUND_MASK	UNSUPPORTED rounding
#else
#define	ROUND_MASK	((unsigned int) (FE_TOWARDZERO))
#endif
#endif
#endif
#endif

/*
  * The PowerPC architecture has the concept of an INVALID exception cause,
  * basically the reason why an FE_INVALID exception has occurred. There are
  * nine such causes, encapsulated in a non-contiguous mask of 9 bits, the
  * macro called (herein) FE_ALL_INVALID_CAUSES for clarity. Instructions
  * which involve invalid operations will set the CAUSE bit first, that
  * action implicitly raising the FE_INVALID exception, i.e. an FE_INVALID
  * implies the existence of a CAUSE bit, and vica versa.  Even on this a
  * PowerPC, MUSL supports only one of those 9 hardware causes, the macro
  * called (herein)  FE_INVALID_BY_REQUEST. These nomenclature is used to
  * document that not only do these macros refer to some cause, but that the
  * FE_INVALID exception has been caused 'by (an explicit software) request'.
  *
  * On any other architecture, no such definitions will exist of the mask of
  * all such causes, the macro FE_ALL_INVALID_CAUSES, nor will the macro
  * FE_INVALID_BY_REQUEST exist. In that case, the former is just defined as
  * zero, implying that there are NO causes, while the latter will be just
  * defined as FE_INVALID to keep the compiler happy (as it is never used
  * in any logic).  See fenv.c for the PowerPC64 for more detailed words on
  * this interesting, useful, fascinating, but highly non-standard feature.
  *
  * The above is important because ignoring it stabs MUSL in the back. When
  * it is desired that an FE_INVALID exception be cleared on an architecture
  * with a bit mask of causes as to why any FE_INVALID occurred, any CAUSE
  * bit field must also to be cleared. If this is not done, any set CAUSE
  * bit, i.e. one left uncleared, will cause an FE_INVALID exception to be
  * re-raised when the register image containing those uncleared bits is
  * fed back into 'fe_set_sr' where the instruction * 'Move to FPSCR' is
  * used. The single line of code following doing such handling is avoided
  * on architectures which do not need it because the zero value of the
  * FE_ALL_INVALID_CAUSE mask should mean that this line of code disappears
  * into oblivion by compile-time optimization.
  *
  * On architectures where a raised FE_INVALID exception implies some CAUSE
  * bit is also raised (or set), consistently is provided by saying that the
  * FE_INVALID is caused 'by (an explicit software) request', the bit named
  * (herein) as FE_INVALID_BY_REQUEST.  The single line of code doing such
  * handling is again avoided on architectures which do not need it because
  * the zero value of the FE_ALL_INVALID_CAUSE mask should mean that this
  * line of code is optimized into oblivion at compile-time. To keep the
  * compiler happy, FE_INVALID_BY_REQUEST is simply set to FE_INVALID.
  */
#ifndef	FE_ALL_INVALID_CAUSES
#define	FE_ALL_INVALID_CAUSES	0
#define	FE_INVALID_BY_REQUEST	FE_INVALID
#endif

static inline void
__cleargeneric(int excepts)
{
 	/*
 	 * Address where FE_INVALID is to be cleared on an architecture with a
 	 * matching CAUSE to say why that FE_INVALID occurred. To handle this,
 	 * every one of the CAUSE bits must also be cleared - see above.
 	 */
 	if (FE_ALL_INVALID_CAUSES != 0 && (excepts & FE_INVALID) != 0)
 		excepts |= FE_ALL_INVALID_CAUSES;

 	fe_set_sr(fe_get_sr() & ~excepts);
}

static inline void
__raisegeneric(int excepts)
{
 	/*
 	 * Address where FE_INVALID is to be raised on an architecture demanding
 	 * a matching CAUSE to say why the FE_INVALID is being raised. To handle
 	 * this, say it is caused 'By (Explicit Software) Request' - see above.
 	 */
 	if (FE_ALL_INVALID_CAUSES != 0 && (excepts & FE_INVALID) != 0)
 		excepts |= FE_INVALID_BY_REQUEST;

 	fe_set_sr(fe_get_sr() | excepts);
}

static inline void
__raisecleverly(int excepts)
{
 	/*
 	 * assume single OP is faster than double OP
 	 */
 	const float zero = (float) 0;
 	const float one = (float) 1;
 	const float large = 2.076919e+34; // 2^(+108)
 	const float small = 4.814826e-35; // 2^(-108)
 	volatile float x;
 	/*
 	 * if it is just a simple exception, arithmetic expressions are optimal
 	 */
 	switch(excepts)
 	{
 	case FE_INVALID:
 		x = zero, x /= x;
 		break;
 	case FE_DIVBYZERO:
 		x = zero, x = one / x;
 		break;
 	case FE_INEXACT:
 		x = small, x += one;
 		break;
 	case FE_OVERFLOW | FE_INEXACT:
 		x = large, x *= x;
 		break;
 	case FE_UNDERFLOW | FE_INEXACT:
 		x = small, x *= x;
 		break;
 	default:
 		/*
 		 * if more than one exception exists, a sledgehammer is viable
 		 */
 		__raisegeneric(excepts);
 		break;
 	}
}

int
feclearexcept(int excepts)
{
 	__cleargeneric(excepts & FE_ALL_EXCEPT);
 	return 0;
}

int
feraiseexcept(int excepts)
{
 	__raisecleverly(excepts & FE_ALL_EXCEPT);
 	return 0;
}

int
fetestexcept(int excepts)
{
 	return (int) (fe_get_sr() & (excepts & FE_ALL_EXCEPT));
}

int
fegetround(void)
{
 	return (int) (fe_get_cr() & ROUND_MASK);
}

int
fesetround(int rounding_mode)
{
 	if ((rounding_mode & ~ROUND_MASK) != 0)
 		return -1;

 	fe_set_cr((fe_get_cr() & ~ROUND_MASK) | (rounding_mode & ROUND_MASK));
 	return 0;
}

int
fegetenv(fenv_t *envp)
{
 	fe_get_e(envp);
 	return 0;
}

int
fesetenv(fenv_t *envp)
{
 	const fenv_t envpd = FE_DFL_ENV_DATA;
 	const fenv_t *e = envp == FE_DFL_ENV ? &envpd : envp;

 	fe_set_e(e);
 	return 0;
}

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-24 23:59                               ` Damian McGuckin
@ 2020-01-25  0:11                                 ` Rich Felker
  2020-01-26  3:28                                   ` Damian McGuckin
                                                     ` (9 more replies)
  0 siblings, 10 replies; 64+ messages in thread
From: Rich Felker @ 2020-01-25  0:11 UTC (permalink / raw)
  To: musl

On Sat, Jan 25, 2020 at 10:59:07AM +1100, Damian McGuckin wrote:
> 
> Is MUSL ever likely to support Sparc64?
> 
> If so, there is tweak needed for the rounding to match the API. For
> some reason, what is returned to a user program query is just a
> right shifted version of what is in the register, with an equivalent
> left shift for when the mode is set. Very odd.
> 
> The default shift for everything else would be 0.

Unless the bits are in the high portion of the register, we can just
define the FE_* constants to match their location in the register.
Otherwise defining a shift, zero by default, is not a big deal. I
wouldn't worry about this now. It's not like a shift is hard to add
later if needed; it doesn't invalidate the proposed design.

Rich

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-24 13:44                             ` Rich Felker
  2020-01-24 14:45                               ` Damian McGuckin
@ 2020-01-24 23:59                               ` Damian McGuckin
  2020-01-25  0:11                                 ` Rich Felker
  1 sibling, 1 reply; 64+ messages in thread
From: Damian McGuckin @ 2020-01-24 23:59 UTC (permalink / raw)
  To: musl


Is MUSL ever likely to support Sparc64?

If so, there is tweak needed for the rounding to match the API. For some 
reason, what is returned to a user program query is just a right shifted 
version of what is in the register, with an equivalent left shift for when 
the mode is set. Very odd.

The default shift for everything else would be 0.

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-24 13:44                             ` Rich Felker
@ 2020-01-24 14:45                               ` Damian McGuckin
  2020-01-24 23:59                               ` Damian McGuckin
  1 sibling, 0 replies; 64+ messages in thread
From: Damian McGuckin @ 2020-01-24 14:45 UTC (permalink / raw)
  To: musl

On Fri, 24 Jan 2020, Rich Felker wrote:

>> 	static inline double get_fpscr_f(void)
>> 	{
>> 		double fpscr;
>>
>> 		__asm__ __volatile__ ("mffs %0" : "=d"(fpscr));
>> 		return fpscr;
>> 	}
>> 	static inline unsigned int get_csr(void)
>> 	{
>> 		return (unsigned int) (union {double f; long i;}) {get_fpscr_f()}.i;
>> 	}
>> 	static inline void set_fpscr_f(double fpscr)
>> 	{
>> 		__asm__ __volatile__ ("mtfsf 255, %0" : : "d"(fpscr));
>> 	}
>> 	static inline void set_csr(unsigned int fpscr)
>> 	{
>> 		set_fpscr_f((union {long i; double f;}) {(long) fpscr}.f);
>> 	}
>
> I would drop the _f functions; they're not useful separately

They are needed for fegetenv/fesetenv. The API defines

 	fenv_t

as a double, even though it is interpreted as a long when you have to use 
it.

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-24  6:08                           ` Damian McGuckin
@ 2020-01-24 13:44                             ` Rich Felker
  2020-01-24 14:45                               ` Damian McGuckin
  2020-01-24 23:59                               ` Damian McGuckin
  0 siblings, 2 replies; 64+ messages in thread
From: Rich Felker @ 2020-01-24 13:44 UTC (permalink / raw)
  To: musl

On Fri, Jan 24, 2020 at 05:08:54PM +1100, Damian McGuckin wrote:
> On Thu, 23 Jan 2020, Rich Felker wrote:
> 
> >Note that it's not necessary to write that logic in asm just because
> >it's in the arch-defined part. It could simply be C that fixes up the
> >value before/after the asm.
> 
> My rule was to have no logic in the embedded assembler. Just get or
> set the register and quit. Your skill levels may let you do more. I
> tried to keep these routine super simple.  Here they are for the
> PowerPC.
> 
> 	static inline double get_fpscr_f(void)
> 	{
> 		double fpscr;
> 
> 		__asm__ __volatile__ ("mffs %0" : "=d"(fpscr));
> 		return fpscr;
> 	}
> 	static inline unsigned int get_csr(void)
> 	{
> 		return (unsigned int) (union {double f; long i;}) {get_fpscr_f()}.i;
> 	}
> 	static inline void set_fpscr_f(double fpscr)
> 	{
> 		__asm__ __volatile__ ("mtfsf 255, %0" : : "d"(fpscr));
> 	}
> 	static inline void set_csr(unsigned int fpscr)
> 	{
> 		set_fpscr_f((union {long i; double f;}) {(long) fpscr}.f);
> 	}

I would drop the _f functions; they're not useful separately and
without them you can do something like:

	static inline unsigned get_fpscr_f(void)
	{
		union { double f; long i; } fpscr;
		__asm__ __volatile__ ("mffs %0" : "=d"(fpscr.f));
		return fpscr.i;
	}

that seems both shorter and easier to follow.

> >But indeed the following may be nicer anyway:
> .......
> >>	if (excepts & FE_INVALID)
> >>		excepts |= FE_ALL_INVALID;
> >>and
> >>	if (excepts & FE_INVALID)
> >>		excepts |= FE_INVALID_SOFTWARE;
> >>
> >>An optimize should see the OR with zero for most architectures and
> >>then ignore that whole 'if' statement because the action was a
> >>NO-OP.
> >
> >Yes, this looks nice (although I would probably include FE_INVALID
> >in FE_ALL_INVALID just out of principle of least surprise; it
> >should generate the same code either way.
> 
> Do you mean
> 
>  	if (excepts & FE_INVALID)
>  		excepts |= FE_ALL_INVALID | FE_INVALID;
> 
> If so, will it really be optimized away if FE_ALL_INVALID is zero (0)?

No, I meant so that FE_ALL_INVALID is defined as FE_INVALID rather
than as 0 by default (when the arch doesn't define it). I don't think
it makes any difference to the code emitted, but the value 0 is rather
confusing with the name being "ALL". Without knowing to expect
otherwise, I read:

#ifndef FE_ALL_INVALID
#define FE_ALL_INVALID 0
#endif

as assuming there are no invalid flags at all (not even FE_INVALID) if
the arch didn't define FE_ALL_INVALID. Whereas I read:

#ifndef FE_ALL_INVALID
#define FE_ALL_INVALID FE_INVALID
#endif

as assuming FE_INVALID is the only invalid flag if if the arch didn't
define otherwise.

> >feclearexcept would do:
> >
> >	soft_fpsr = (soft_fpsr | get_sr()) & ~except;
> >	clear_sr();
> >
> >At that point, FE_INVALID is clearable independent of the
> >specific-reason flags and doesn't seem to need any special treatment.
> 
> ??? - I need to wrap my head around that one over the weekend.

The underlying idea is that it makes no difference whether the flag
bits are in the hard or soft fpsr. The logical value is
soft_sr|get_sr(). What the above does is move the entire logical value
to soft_fpsr so that the value in hard_fpsr can be cleared.

Rich

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-24  4:55                         ` Rich Felker
  2020-01-24  6:08                           ` Damian McGuckin
@ 2020-01-24  7:10                           ` Damian McGuckin
  1 sibling, 0 replies; 64+ messages in thread
From: Damian McGuckin @ 2020-01-24  7:10 UTC (permalink / raw)
  To: musl


My nomenclature for

 	[sg]etsr
 	[sg]etcr
 	[sg]gete

need to have a bit more 'fe' to ensure they never have a name clash.

Any suggestions?

 	fe[sg]etsr
 	fe[sg]etcr
 	fe[sg]gete

Is the last pair too close to fe[gs]stenv?

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-24  4:55                         ` Rich Felker
@ 2020-01-24  6:08                           ` Damian McGuckin
  2020-01-24 13:44                             ` Rich Felker
  2020-01-24  7:10                           ` Damian McGuckin
  1 sibling, 1 reply; 64+ messages in thread
From: Damian McGuckin @ 2020-01-24  6:08 UTC (permalink / raw)
  To: musl

On Thu, 23 Jan 2020, Rich Felker wrote:

> Note that it's not necessary to write that logic in asm just because
> it's in the arch-defined part. It could simply be C that fixes up the
> value before/after the asm.

My rule was to have no logic in the embedded assembler. Just get or set 
the register and quit. Your skill levels may let you do more. I tried to 
keep these routine super simple.  Here they are for the PowerPC.

 	static inline double get_fpscr_f(void)
 	{
 		double fpscr;

 		__asm__ __volatile__ ("mffs %0" : "=d"(fpscr));
 		return fpscr;
 	}
 	static inline unsigned int get_csr(void)
 	{
 		return (unsigned int) (union {double f; long i;}) {get_fpscr_f()}.i;
 	}
 	static inline void set_fpscr_f(double fpscr)
 	{
 		__asm__ __volatile__ ("mtfsf 255, %0" : : "d"(fpscr));
 	}
 	static inline void set_csr(unsigned int fpscr)
 	{
 		set_fpscr_f((union {long i; double f;}) {(long) fpscr}.f);
 	}

> But indeed the following may be nicer anyway:
......
>> 	if (excepts & FE_INVALID)
>> 		excepts |= FE_ALL_INVALID;
>> and
>> 	if (excepts & FE_INVALID)
>> 		excepts |= FE_INVALID_SOFTWARE;
>>
>> An optimize should see the OR with zero for most architectures and
>> then ignore that whole 'if' statement because the action was a
>> NO-OP.
>
> Yes, this looks nice (although I would probably include FE_INVALID in 
> FE_ALL_INVALID just out of principle of least surprise; it should 
> generate the same code either way.

Do you mean

  	if (excepts & FE_INVALID)
  		excepts |= FE_ALL_INVALID | FE_INVALID;

If so, will it really be optimized away if FE_ALL_INVALID is zero (0)?

>>>>    case (FE_OVERFLOW | FE_INEXACT):
>
> One more style thing: no need for parens here.

Sorry. Too much cutting and pasting from the original 'if' that I realized 
(at the last minute) should be a switch.

> Indeed, I think if we make that change it would be nice to factor it
> as a separate change later.

Please. Sounds wise.

> I do think I want to make it, though,

I think the idea of a SOFT_FPSR is a really good idea.

> feclearexcept would do:
>
> 	soft_fpsr = (soft_fpsr | get_sr()) & ~except;
> 	clear_sr();
>
> At that point, FE_INVALID is clearable independent of the
> specific-reason flags and doesn't seem to need any special treatment.

??? - I need to wrap my head around that one over the weekend.

> Note that just using clear_sr() rather than set_sr() here is a huge 
> optimization on i386 too -- it avoids the need to read-modify-write the 
> full fenv.

Definitely. The Intel architectures are a world unto themselves.

> You should consider arch/*/bits/fenv.h as essentially immutable; the
> types and macros there are ABI or API conventions. The arch-specific
> definitions for fenv stuff should go in arch/$(ARCH)/fenv_arch.h.

OK.

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-24  4:13                       ` Damian McGuckin
@ 2020-01-24  4:55                         ` Rich Felker
  2020-01-24  6:08                           ` Damian McGuckin
  2020-01-24  7:10                           ` Damian McGuckin
  0 siblings, 2 replies; 64+ messages in thread
From: Rich Felker @ 2020-01-24  4:55 UTC (permalink / raw)
  To: musl

On Fri, Jan 24, 2020 at 03:13:32PM +1100, Damian McGuckin wrote:
> >>/*
> >> * There is usually no need to mess the generic valid exceptions bits
> >> * given to 'feclearexcept' and 'feraiseexcept' so define a 'nothing'
> >> * macro for such a scenario - is this the MUSL accepted style????
> >> */
> >>#ifndef FE_QUALIFY_CLEAR_EXCEPT
> >>#define FE_QUALIFY_CLEAR_EXCEPT(excepts)    (0)
> >>#endif
> >>#ifndef FE_QUALIFY_RAISE_EXCEPT
> >>#define FE_QUALIFY_RAISE_EXCEPT(excepts)    (0)
> >>#endif
> >
> >Can you clarify how these would be defined for powerpc? Could the
> >logic just be embedded in get_sr/set_sr if needed?
> 
> Address the second question first, my original aim was to keep any
> logic out of where embedded assembler was used, mainly because of my
> own very
> irregular experience with it. I could also not convince myself that
>  get_sr and set_sr may need to know from where they are called. In
> the case of the powerpc64, with the SR and CR being the same, that
> adds to the complexity, so I gave up.

Note that it's not necessary to write that logic in asm just because
it's in the arch-defined part. It could simply be C that fixes up the
value before/after the asm. But indeed the following may be nicer
anyway:

> Answering your first question, for the powerpc64.
> 
> #define	QUALIFY_CLEAR_EXCEPT(e)	if (e & FE_INVALID) e |= FE_ALL_INVALID
> #define	QUALIFY_RAISE_EXCEPT(e)	if (e & FE_INVALID) e |= FE_INVALID_SOFTWARE
> 
> An alternative is to avoid these macros and then
> 
> 	#ifndef FE_ALL_INVALID
> 	#define	FE_ALL_INVALID 0
> 	#endif
> 	#ifndef FE_INVALID_SOFTWARE
> 	#define	FE_INVALID_SOFTWARE 0
> 	#endif
> 
> and have respectively the following code in feclear.. and feraise..
> 
> 	if (excepts & FE_INVALID)
> 		excepts |= FE_ALL_INVALID;
> and
> 	if (excepts & FE_INVALID)
> 		excepts |= FE_INVALID_SOFTWARE;
> 
> An optimize should see the OR with zero for most architectures and
> then ignore that whole 'if' statement because the action was a
> NO-OP.

Yes, this looks nice (although I would probably include FE_INVALID in
FE_ALL_INVALID just out of principle of least surprise; it should
generate the same code either way.

> >
> >>/*
> >> * Compute the rounding mask with some rigid logic
> >> */
> >>#ifndef FE_TO_NEAREST
> >>#define FE_TO_NEAREST   0
> >>#define ROUND_MASK  ((unsigned int) 0)
> >>#else
> >>#ifndef FE_TOWARDS_ZERO
> >>#define ROUND_MASK  ((unsigned int) 0)
> >>#else
> >>#ifdef  FE_DOWNWARD
> >>#ifdef  FE_UPWARD
> >>#define ROUND_MASK  ((unsigned int) (FE_DOWNWARD|FE_UPWARD|FE_TOWARDZERO))
> >>#else
> >>#define ROUND_MASK  UNSUPPORTED rounding
> >>#endif
> >>#else
> >>#ifdef  FE_UPWARD
> >>#define ROUND_MASK  UNSUPPORTED rounding
> >>#else
> >>#define ROUND_MASK  ((unsigned int) (FETOWARDZERO))
> >>#endif
> >>#endif
> >>#endif
> >>#endif
> >
> >I don't see why you have "UNSUPPORTED rounding" cases here. Any
> >combination should be valid.
> 
> My deliberate assumption was that the only legal combinations were
> 
> 	round to nearest only
> or
> 	round to nearest and truncate only
> or
> 	round to nearest, truncate, and round to either +/- INFINITY
> 
> Whether that assumption is valid is another thing but I am happy to
> be proven wrong.  If somebody defined an architecture where that
> assumption was not valid, I wanted the compiler to complain at me.

I don't want to be encoding assumptions that we're not actually using,
and that aren't useful, into the code. That makes the code more
complex and fragile. (If someone adds an arch where it's not true,
then this breaks and they have to figure out why before proceeding,
instead of just having it work.)

> >Since these are not bitmasks, probably the arch should just have
> >to define the rounding mode mask. For example if the values were
> >0b000, 0b001, 0b100, and 0b101, but the 0 in the middle was
> >significant, it should probably be treated as part of the mask.
> 
> That was not the way I read it but again, I am happy to be proven wrong.
> If the net OR'ing of the various types of rounding cannot be treated
> as a mask, then every *BSD libc implementations that I know will
> need to be fixed because they do treat it that way. I did run tests
> and none of the
> ones I tested from BSD and MUSL had that structure. Maybe my test code
> was wrong. They all seems to be 2 bits where TO_NEAREST was all bits
> zero and one of the others was the some of the other two.

OK then it's probably not an issue and generating it from | of the
individual ones should be fine.

> >If you really want to generate the mask, something like
> >
> >#ifdef FE_TONEAREST
> >#define FE_TONEAREST_MASK FE_TONEAREST
> >#else
> >#define FE_TONEAREST_MASK 0
> >#endif
> >...
> >#define ROUND_MASK (FE_TONEAREST_MASK | FE_DOWNWARD_MASK | ...)
> >
> >should be simpler and comprehensive.
> 
> I will run a test but I think they produce the same result for all
> the architecture for which I have definition files.

Yes, it produces the same result but it's fully general and doesn't
depend on special-casing all the combinations you expect to be
possible.

> >>int
> >>feclearexcept(excepts)
> >>{
> >>    FE_VALIDATE_EXCEPT(excepts);
> >>    FE_QUALIFY_CLEAR_EXCEPT(excepts);
> >>    setsr(getsr() & ~((unsigned int) excepts));
> >>    return(0);
> >>}
> >
> >No function-like parens around return value, no gratuitous casts.
> 
> No problems. Sorry, gratuitous casting is a habit to stop various
> linters complaining. I went looking for a MUSL style guide but could
> not find one.

There's something of a style guide on the wiki but it's not very
complete:

	https://wiki.musl-libc.org/coding-style.html

This discussion here suggests some things we could add.

Otherwise, just inferring style from existing code is what people
generally do.

> >>static inline int
> >>__raisearithmetically(int excepts)
> >>{
> >>    /*
> >>     * assume single OP is faster than double OP
> >>     */
> >>    const float one = (float) 1;
> >>    const float zero = (float) 0;
> >>    const float tiny = (float) 0x1.0p-126;
> >>    const float huge = (float) 0x1.0p+126;
> >>    volatile float x;
> >>
> >>    /*
> >>     * if it is just a simple exception, arithmetic expressions are optimal
> >>     */
> >>    switch(excepts)
> >>    {
> >>    case FE_INVALID:
> >>        x = zero, x /= x;
> >>        break;
> >>    case FE_DIVBYZERO:
> >>        x = zero, x = one / x;
> >>        break;
> >>    case FE_INEXACT:
> >>        x = tiny, x += one;
> >>        break;
> >>    case (FE_OVERFLOW | FE_INEXACT):

One more style thing: no need for parens here.

> >>        x = huge, x *= x;
> >>        break;
> >>    case (FE_UNDERFLOW | FE_INEXACT):
> >>        x = tiny, x *= x;
> >>        break;
> >>    default: /* if more than one exception exists, a sledgehammer is viable */
> >>        setsr(getsr() | ((unsigned int) excepts));
> >>        break;
> >>    }
> >>    return(0);
> >>}
> >
> >This is probably ok (aside from style things mentioned above), but
> >simply recording the flags in software without writing them to the
> >status register at all may be preferable. In that case we would
> >not even need a set_sr primitive, only get_sr and clear_sr.
> 
> Yes. I like the idea. But I was deliberately not trying to change
> anything external to 'fenv.c'.

Indeed, I think if we make that change it would be nice to factor it
as a separate change later. I do think I want to make it, though,
since it will enable us to add fenv support for soft-float archs
(which will be fully working if we hook up the FP_HANDLE_EXCEPTIONS
macro in libgcc soft-fp.h to call feraiseexcept!)

> Also, is the handling of
> __pthread_self() protected
> into the future?

I would put it as a macro in the internal fenv header so it could be
changed easily if needed, something like:

#define SOFT_FPSR (__pthread_self()->soft_fpsr)

> >Doing this would also simplify the ppc special-casing for FE_INVALID I
> >think..
> 
> I think only for feraiseexcept as you show.  It does not simplify
> feclearexcept as far as I can see.

feclearexcept would do:

	soft_fpsr = (soft_fpsr | get_sr()) & ~except;
	clear_sr();

At that point, FE_INVALID is clearable independent of the
specific-reason flags and doesn't seem to need any special treatment.

Note that just using clear_sr() rather than set_sr() here is a huge
optimization on i386 too -- it avoids the need to read-modify-write
the full fenv.

> >>int
> >>feraiseexcept(int excepts)
> >>{
> >>    FE_VALIDATE_EXCEPT(excepts);
> >>    FE_QUALIFY_RAISE_EXCEPT(excepts);
> >>    return __raisearithmetically(excepts);
> >>}
> >
> >Then this could just be __pthread_self()->fpsr |= excepts;
> 
> Very clean and simple. Is it portable outside of Linux? Not that this is
> relevant for MUSL which is designed for Linux-based devices but I do work
> on non-Linux (and no C/C++) systems so I am curious.

It's portable to anywhere you have thread-local storage, which should
be anywhere nowadays. You'd just define it as a tls var instead of
using the __pthread_self() access method (which is mildly more
efficient but mainly avoids depending on tls tooling and some issues
with early init of the dynamic linker). Something like:

_Thread_local unsigned __soft_fpsr;
#define SOFT_FPSR __soft_fpsr

> >>int
> >>fesetenv(fenv_t *envp)
> >>{
> >>    fenv_t envpd = FE_DFL_ENV_DATA, *e = envp == FE_DFL_ENV ? &envpd : envp;
> >>
> >>    return ((sete(envp)), 0);
> >>}
> >
> >It might be preferable to use a static const object for default env;
> >I'm not sure. Not a big deal either way though.
> 
> I thought of that. I used to always do that until 5 years ago until
> I found that an auto variable would often produce more optimal code.
> It
> needs to be checked but it is irrelevant to the fundament design issues.

I think you're probably right on that.

> Ongoing:
> 
> I will incorporate your suggestions except for the __pthread_self()->fpsr
> tweaks which I like but cannot test across multiple O/S's and then we can
> review this.  Should I repost to make sure I got everything and we have a
> reference point?
> 
> I will post some inline assembler for your review and entertainment
> but in the spirit of a generic template, they should be reviewed in
> isolation.
> 
> Also, I notice that on the mips, the single unsigned in an 'fenv_t'
> is encapsulated in a struct. Is this historical, mandatory,
> stylistic, a way to fix a compiler bug, compatability or something
> else? I also notice that there is a subtle difference in the
> 'fegetenv' assembler between

You should consider arch/*/bits/fenv.h as essentially immutable; the
types and macros there are ABI or API conventions. The arch-specific
definitions for fenv stuff should go in arch/$(ARCH)/fenv_arch.h.

> 	diff mips/*.S mips64/*.S
> 
> 	64c64
> 	< 	addiu	$5, $4, 1
> 	---
> 	> 	daddiu	$5, $4, 1
> 
> I guess this is just struct alignment issues? Sorry, it is 20 years
> since I worked with the assembler on a MIPS so my knowledge is
> really rusty.

The code there is comparing the pointer in the argument (register $4)
with (void*)-1 by adding 1 and checking whether the result is 0. For
64-bit arch, it has to do a 64-bit add; otherwise the pointer would be
truncated to 32-bit. This kind of nastiness in the asm source files is
the whole reason I want the high-level logic moved to C.

Rich

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-24  1:11                     ` Rich Felker
@ 2020-01-24  4:13                       ` Damian McGuckin
  2020-01-24  4:55                         ` Rich Felker
  0 siblings, 1 reply; 64+ messages in thread
From: Damian McGuckin @ 2020-01-24  4:13 UTC (permalink / raw)
  To: musl


Let me know if I missed something in the following:

On Thu, 23 Jan 2020, Rich Felker wrote:

>> #ifndef FE_ALL_EXCEPT
>> #define FE_VALID_EXCEPT (FE_INEXACT|FE_DIV_BY_ZERO|FE_UNDERFLOW|FE_OVERFLOW)
>> #define FE_ALL_EXCEPT   (FE_INVALID|FE_VALID_EXCEPT)
>> #endif
>
> I don't understand what this is for. FE_ALL_EXCEPT is defined in the
> public fenv.h and FE_VALID_EXCEPT is not used.

For the first question, it was done purely to avoid the need to define 
FE_ALL_EXCEPT in all of the various 'fenv.h' files. I am happy to throw it 
away.  For the second question, at one stage, I considered having the 
concept of what was anything except an INVALID exception.

I will remove those 4 lines.

>> /*
>>  * The usual approach to validating an exception mask is to throw out
>>  * anything which is illegal. Some systems will NOT do this, choosing
>>  * instead to return a non-zero result on encountering an illegal mask
>>  */
>
> The latter is not desirable behavior and not something you need to be
> trying to reproduce. It's a bug or at least an unspecified case we
> don't care about in existing code.

Wonderful. I was purely trying to preserve historical performance. I am 
more than happy to NOT do this. That makes the next problem easy to fix.

>> #ifndef FE_VALIDATE_EXCEPT
>> #define FE_VALIDATE_EXCEPT(e)   e &= FE_ALL_EXCEPT
>> #endif
>
> Write things like this directly in code; don't hide them behind
> macros. Doing that just makes the code hard to read because you can't
> tell what it's doing without searching for the macro definition. (Also
> the macro is not protected by parens as written but that's easily
> fixable.)

Because we can ignore that non-zero return case, we no longer need
an architecture dependent macro for this so it can go directly in
the code. So, again, an easy problem to fix.

>> /*
>>  * There is usually no need to mess the generic valid exceptions bits
>>  * given to 'feclearexcept' and 'feraiseexcept' so define a 'nothing'
>>  * macro for such a scenario - is this the MUSL accepted style????
>>  */
>> #ifndef FE_QUALIFY_CLEAR_EXCEPT
>> #define FE_QUALIFY_CLEAR_EXCEPT(excepts)    (0)
>> #endif
>> #ifndef FE_QUALIFY_RAISE_EXCEPT
>> #define FE_QUALIFY_RAISE_EXCEPT(excepts)    (0)
>> #endif
>
> Can you clarify how these would be defined for powerpc? Could the logic 
> just be embedded in get_sr/set_sr if needed?

Address the second question first, my original aim was to keep any logic 
out of where embedded assembler was used, mainly because of my own very
irregular experience with it. I could also not convince myself that
  get_sr and set_sr may need to know from where they are called. In the 
case of the powerpc64, with the SR and CR being the same, that adds to the 
complexity, so I gave up.

Answering your first question, for the powerpc64.

#define	QUALIFY_CLEAR_EXCEPT(e)	if (e & FE_INVALID) e |= FE_ALL_INVALID
#define	QUALIFY_RAISE_EXCEPT(e)	if (e & FE_INVALID) e |= FE_INVALID_SOFTWARE

An alternative is to avoid these macros and then

 	#ifndef FE_ALL_INVALID
 	#define	FE_ALL_INVALID 0
 	#endif
 	#ifndef FE_INVALID_SOFTWARE
 	#define	FE_INVALID_SOFTWARE 0
 	#endif

and have respectively the following code in feclear.. and feraise..

 	if (excepts & FE_INVALID)
 		excepts |= FE_ALL_INVALID;
and
 	if (excepts & FE_INVALID)
 		excepts |= FE_INVALID_SOFTWARE;

An optimize should see the OR with zero for most architectures and then 
ignore that whole 'if' statement because the action was a NO-OP.

>
>> /*
>>  * Compute the rounding mask with some rigid logic
>>  */
>> #ifndef FE_TO_NEAREST
>> #define FE_TO_NEAREST   0
>> #define ROUND_MASK  ((unsigned int) 0)
>> #else
>> #ifndef FE_TOWARDS_ZERO
>> #define ROUND_MASK  ((unsigned int) 0)
>> #else
>> #ifdef  FE_DOWNWARD
>> #ifdef  FE_UPWARD
>> #define ROUND_MASK  ((unsigned int) (FE_DOWNWARD|FE_UPWARD|FE_TOWARDZERO))
>> #else
>> #define ROUND_MASK  UNSUPPORTED rounding
>> #endif
>> #else
>> #ifdef  FE_UPWARD
>> #define ROUND_MASK  UNSUPPORTED rounding
>> #else
>> #define ROUND_MASK  ((unsigned int) (FETOWARDZERO))
>> #endif
>> #endif
>> #endif
>> #endif
>
> I don't see why you have "UNSUPPORTED rounding" cases here. Any 
> combination should be valid.

My deliberate assumption was that the only legal combinations were

 	round to nearest only
or
 	round to nearest and truncate only
or
 	round to nearest, truncate, and round to either +/- INFINITY

Whether that assumption is valid is another thing but I am happy to be 
proven wrong.  If somebody defined an architecture where that assumption 
was not valid, I wanted the compiler to complain at me.

> Since these are not bitmasks, probably the arch should just have to 
> define the rounding mode mask. For example if the values were 0b000, 
> 0b001, 0b100, and 0b101, but the 0 in the middle was significant, it 
> should probably be treated as part of the mask.

That was not the way I read it but again, I am happy to be proven wrong.
If the net OR'ing of the various types of rounding cannot be treated as a 
mask, then every *BSD libc implementations that I know will need to be 
fixed because they do treat it that way. I did run tests and none of the
ones I tested from BSD and MUSL had that structure. Maybe my test code
was wrong. They all seems to be 2 bits where TO_NEAREST was all bits
zero and one of the others was the some of the other two.

> If you really want to generate the mask, something like
>
> #ifdef FE_TONEAREST
> #define FE_TONEAREST_MASK FE_TONEAREST
> #else
> #define FE_TONEAREST_MASK 0
> #endif
> ...
> #define ROUND_MASK (FE_TONEAREST_MASK | FE_DOWNWARD_MASK | ...)
>
> should be simpler and comprehensive.

I will run a test but I think they produce the same result for all
the architecture for which I have definition files.

>> int
>> feclearexcept(excepts)
>> {
>>     FE_VALIDATE_EXCEPT(excepts);
>>     FE_QUALIFY_CLEAR_EXCEPT(excepts);
>>     setsr(getsr() & ~((unsigned int) excepts));
>>     return(0);
>> }
>
> No function-like parens around return value, no gratuitous casts.

No problems. Sorry, gratuitous casting is a habit to stop various linters 
complaining. I went looking for a MUSL style guide but could not find one.

>> static inline int
>> __raisearithmetically(int excepts)
>> {
>>     /*
>>      * assume single OP is faster than double OP
>>      */
>>     const float one = (float) 1;
>>     const float zero = (float) 0;
>>     const float tiny = (float) 0x1.0p-126;
>>     const float huge = (float) 0x1.0p+126;
>>     volatile float x;
>>
>>     /*
>>      * if it is just a simple exception, arithmetic expressions are optimal
>>      */
>>     switch(excepts)
>>     {
>>     case FE_INVALID:
>>         x = zero, x /= x;
>>         break;
>>     case FE_DIVBYZERO:
>>         x = zero, x = one / x;
>>         break;
>>     case FE_INEXACT:
>>         x = tiny, x += one;
>>         break;
>>     case (FE_OVERFLOW | FE_INEXACT):
>>         x = huge, x *= x;
>>         break;
>>     case (FE_UNDERFLOW | FE_INEXACT):
>>         x = tiny, x *= x;
>>         break;
>>     default: /* if more than one exception exists, a sledgehammer is viable */
>>         setsr(getsr() | ((unsigned int) excepts));
>>         break;
>>     }
>>     return(0);
>> }
>
> This is probably ok (aside from style things mentioned above), but 
> simply recording the flags in software without writing them to the 
> status register at all may be preferable. In that case we would not even 
> need a set_sr primitive, only get_sr and clear_sr.

Yes. I like the idea. But I was deliberately not trying to change anything 
external to 'fenv.c'. Also, is the handling of __pthread_self() protected
into the future?

> Doing this would also simplify the ppc special-casing for FE_INVALID I
> think..

I think only for feraiseexcept as you show.  It does not simplify 
feclearexcept as far as I can see.

>> int
>> feraiseexcept(int excepts)
>> {
>>     FE_VALIDATE_EXCEPT(excepts);
>>     FE_QUALIFY_RAISE_EXCEPT(excepts);
>>     return __raisearithmetically(excepts);
>> }
>
> Then this could just be __pthread_self()->fpsr |= excepts;

Very clean and simple. Is it portable outside of Linux? Not that this is
relevant for MUSL which is designed for Linux-based devices but I do work
on non-Linux (and no C/C++) systems so I am curious.

>> int
>> fegetround(void)
>> {
>>     return (int) (getcr() & ROUND_MASK);
>> }
>>
>> int
>> fesetround(int rounding_mode)
>> {
>>     if ((rounding_mode & ~ROUND_MASK) == 0)
>>     {
>>         unsigned int mode = ((unsigned int) rounding_mode);
>>
>>         return (setcr((getcr() & ~ROUND_MASK) | (mode & ROUND_MASK)), 0);
>>     }
>>     return(-1);
>> }
>
> I would put the error test at the beginning rather than putting the
> whole body inside {}:
>
>     if (rounding_mode & ~ROUND_MASK) return -1;

Happy to.  A while ago, I found that the alternative produces more optimal 
code when dealing with floating point dominant routines. But that is huge 
generalization and I have not tested that generalization in this case.

>> int
>> fegetenv(fenv_t *envp)
>> {
>>     return ((gete(envp)), 0);
>> }
>
> Is there a reason for using comma operator here?

I used it to let the compiler prove that my #defines for gete() and sete() 
worked as if they were a single statement. That gave me some rigour on the 
design of that macro, which in the case of the m68k (and eventually the 
i386), does demand a comma operator.

But for the production code, it will disappear.

>> int
>> fesetenv(fenv_t *envp)
>> {
>>     fenv_t envpd = FE_DFL_ENV_DATA, *e = envp == FE_DFL_ENV ? &envpd : envp;
>>
>>     return ((sete(envp)), 0);
>> }
>
> It might be preferable to use a static const object for default env;
> I'm not sure. Not a big deal either way though.

I thought of that. I used to always do that until 5 years ago until I 
found that an auto variable would often produce more optimal code. It
needs to be checked but it is irrelevant to the fundament design issues.

Ongoing:

I will incorporate your suggestions except for the __pthread_self()->fpsr
tweaks which I like but cannot test across multiple O/S's and then we can
review this.  Should I repost to make sure I got everything and we have a
reference point?

I will post some inline assembler for your review and entertainment but in 
the spirit of a generic template, they should be reviewed in isolation.

Also, I notice that on the mips, the single unsigned in an 'fenv_t' is 
encapsulated in a struct. Is this historical, mandatory, stylistic, a way 
to fix a compiler bug, compatability or something else? I also notice that 
there is a subtle difference in the 'fegetenv' assembler between

 	diff mips/*.S mips64/*.S

 	64c64
 	< 	addiu	$5, $4, 1
 	---
 	> 	daddiu	$5, $4, 1

I guess this is just struct alignment issues? Sorry, it is 20 years since 
I worked with the assembler on a MIPS so my knowledge is really rusty.

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-24  0:42                   ` Damian McGuckin
@ 2020-01-24  1:11                     ` Rich Felker
  2020-01-24  4:13                       ` Damian McGuckin
  0 siblings, 1 reply; 64+ messages in thread
From: Rich Felker @ 2020-01-24  1:11 UTC (permalink / raw)
  To: musl

On Fri, Jan 24, 2020 at 11:42:02AM +1100, Damian McGuckin wrote:
> 
> In an attempt to address this issue, the following can be considered
> an attempt at a 'discussion paper'.

Thanks! I'm first commenting inline with the code since that's a more
natural way for me to review, but I might come back with some comments
on the text a bit later. Here it goes:


> #ifndef FE_ALL_EXCEPT
> #define FE_VALID_EXCEPT (FE_INEXACT|FE_DIV_BY_ZERO|FE_UNDERFLOW|FE_OVERFLOW)
> #define FE_ALL_EXCEPT   (FE_INVALID|FE_VALID_EXCEPT)
> #endif

I don't understand what this is for. FE_ALL_EXCEPT is defined in the
public fenv.h and FE_VALID_EXCEPT is not used.

> /*
>  * The usual approach to validating an exception mask is to throw out
>  * anything which is illegal. Some systems will NOT do this, choosing
>  * instead to return a non-zero result on encountering an illegal mask
>  */

The latter is not desirable behavior and not something you need to be
trying to reproduce. It's a bug or at least an unspecified case we
don't care about in existing code.

> #ifndef FE_VALIDATE_EXCEPT
> #define FE_VALIDATE_EXCEPT(e)   e &= FE_ALL_EXCEPT
> #endif

Write things like this directly in code; don't hide them behind
macros. Doing that just makes the code hard to read because you can't
tell what it's doing without searching for the macro definition. (Also
the macro is not protected by parens as written but that's easily
fixable.)

> /*
>  * There is usually no need to mess the generic valid exceptions bits
>  * given to 'feclearexcept' and 'feraiseexcept' so define a 'nothing'
>  * macro for such a scenario - is this the MUSL accepted style????
>  */
> #ifndef FE_QUALIFY_CLEAR_EXCEPT
> #define FE_QUALIFY_CLEAR_EXCEPT(excepts)    (0)
> #endif
> #ifndef FE_QUALIFY_RAISE_EXCEPT
> #define FE_QUALIFY_RAISE_EXCEPT(excepts)    (0)
> #endif

Can you clarify how these would be defined for powerpc? Could the
logic just be embedded in get_sr/set_sr if needed?

> /*
>  * Compute the rounding mask with some rigid logic
>  */
> #ifndef FE_TO_NEAREST
> #define FE_TO_NEAREST   0
> #define ROUND_MASK  ((unsigned int) 0)
> #else
> #ifndef FE_TOWARDS_ZERO
> #define ROUND_MASK  ((unsigned int) 0)
> #else
> #ifdef  FE_DOWNWARD
> #ifdef  FE_UPWARD
> #define ROUND_MASK  ((unsigned int) (FE_DOWNWARD|FE_UPWARD|FE_TOWARDZERO))
> #else
> #define ROUND_MASK  UNSUPPORTED rounding
> #endif
> #else
> #ifdef  FE_UPWARD
> #define ROUND_MASK  UNSUPPORTED rounding
> #else
> #define ROUND_MASK  ((unsigned int) (FETOWARDZERO))
> #endif
> #endif
> #endif
> #endif

I don't see why you have "UNSUPPORTED rounding" cases here. Any
combination should be valid. Since these are not bitmasks, probably
the arch should just have to define the rounding mode mask. For
example if the values were 0b000, 0b001, 0b100, and 0b101, but the 0
in the middle was significant, it should probably be treated as part
of the mask.

If you really want to generate the mask, something like

#ifdef FE_TONEAREST
#define FE_TONEAREST_MASK FE_TONEAREST
#else
#define FE_TONEAREST_MASK 0
#endif
...
#define ROUND_MASK (FE_TONEAREST_MASK | FE_DOWNWARD_MASK | ...)

should be simpler and comprehensive.

> int
> feclearexcept(excepts)
> {
>     FE_VALIDATE_EXCEPT(excepts);
>     FE_QUALIFY_CLEAR_EXCEPT(excepts);
>     setsr(getsr() & ~((unsigned int) excepts));
>     return(0);
> }

No function-like parens around return value, no gratuitous casts.

> static inline int
> __raisearithmetically(int excepts)
> {
>     /*
>      * assume single OP is faster than double OP
>      */
>     const float one = (float) 1;
>     const float zero = (float) 0;
>     const float tiny = (float) 0x1.0p-126;
>     const float huge = (float) 0x1.0p+126;
>     volatile float x;
> 
>     /*
>      * if it is just a simple exception, arithmetic expressions are optimal
>      */
>     switch(excepts)
>     {
>     case FE_INVALID:
>         x = zero, x /= x;
>         break;
>     case FE_DIVBYZERO:
>         x = zero, x = one / x;
>         break;
>     case FE_INEXACT:
>         x = tiny, x += one;
>         break;
>     case (FE_OVERFLOW | FE_INEXACT):
>         x = huge, x *= x;
>         break;
>     case (FE_UNDERFLOW | FE_INEXACT):
>         x = tiny, x *= x;
>         break;
>     default: /* if more than one exception exists, a sledgehammer is viable */
>         setsr(getsr() | ((unsigned int) excepts));
>         break;
>     }
>     return(0);
> }

This is probably ok (aside from style things mentioned above), but
simply recording the flags in software without writing them to the
status register at all may be preferable. In that case we would not
even need a set_sr primitive, only get_sr and clear_sr.

Doing this would also simplify the ppc special-casing for FE_INVALID I
think..

> int
> feraiseexcept(int excepts)
> {
>     FE_VALIDATE_EXCEPT(excepts);
>     FE_QUALIFY_RAISE_EXCEPT(excepts);
>     return __raisearithmetically(excepts);
> }

Then this could just be __pthread_self()->fpsr |= excepts;

> int
> fetestexcept(int excepts)
> {
>     FE_VALIDATE_EXCEPT(excepts);
> 
>     return (int) (getsr() & ((unsigned int) excepts));
> }
> int
> fegetround(void)
> {
>     return (int) (getcr() & ROUND_MASK);
> }
> 
> int
> fesetround(int rounding_mode)
> {
>     if ((rounding_mode & ~ROUND_MASK) == 0)
>     {
>         unsigned int mode = ((unsigned int) rounding_mode);
> 
>         return (setcr((getcr() & ~ROUND_MASK) | (mode & ROUND_MASK)), 0);
>     }
>     return(-1);
> }

I would put the error test at the beginning rather than putting the
whole body inside {}:

     if (rounding_mode & ~ROUND_MASK) return -1;

> int
> fegetenv(fenv_t *envp)
> {
>     return ((gete(envp)), 0);
> }

Is there a reason for using comma operator here?

> int
> fesetenv(fenv_t *envp)
> {
>     fenv_t envpd = FE_DFL_ENV_DATA, *e = envp == FE_DFL_ENV ? &envpd : envp;
> 
>     return ((sete(envp)), 0);
> }

It might be preferable to use a static const object for default env;
I'm not sure. Not a big deal either way though.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-18  9:40                 ` Szabolcs Nagy
@ 2020-01-24  0:42                   ` Damian McGuckin
  2020-01-24  1:11                     ` Rich Felker
  0 siblings, 1 reply; 64+ messages in thread
From: Damian McGuckin @ 2020-01-24  0:42 UTC (permalink / raw)
  To: musl


In an attempt to address this issue, the following can be considered an 
attempt at a 'discussion paper'.

The following are assumptions. Whether the assumptions are valid enough
is debatable and open for discussion by virtue of this initial post. I
will put the generic details and once there is some agreement to that,
the details can be attacked, including nomenclature.  The names are not
locked in stone. They are just what is being used to illustrate the design.
Comments relating to accepted practice in MUSL are welcome.

These assumptions do not hold for i386, x32, and x86_64, for reasons that 
are largely a need to address their historical origins which at the time, 
were quite ground breaking. The 3 architectures need their own special 
handling although background research suggest that all three can be made 
generic with respect to each.  But that is another story.

I have written this at a level that was meant to be read by those with a
low level of knowledge. That may be a mistake but it meant that I could
get some level of initial peer review before posting this to the list!

Ignoring Intel FPUs for the moment at least, it is assumed that all FPUs 
expose their state to, and allow control of themselves by, user programs 
through

a) a status register which contains bits which are set when a floating
    point exception occurs, the exception bits, and
b) a control register which contains a bit field in which is stored the
    currently active rounding mode.

These registers may be one and the same.  It is assumed that they can be
stored in, or loaded from an unsigned int (of 32-bits).  This appears to
be the case for all considered architectures (see the list below).

The interrogation and modification of both the status register and the
control register can be done via the respective pair of 'routines'

 	getsr/setsr
and
 	getcr/setcr

which respectively copies the register's contents into a local variable,
an unsigned int, and sets the register's contents from its unsigned int
argument.

These are simple #defines, being mapped to

a)	if the status and control bits are in the same register, one of

 	   static inline unsigned int get_csr() { .... }
 	   static inline void set_csr(unsigned int csr) { .... }

 	where get_csr is mapped to both 'getsr' and 'getcr' and

a)	if the registers are in fact distinct, one

 	    static inline unsigned int get_sr() { .... }
 	    static inline void set_sr(unsigned int csr) { .... }
 	    static inline unsigned int get_cr() { .... }
 	    static inline void set_cr(unsigned int csr) { .... }

There exists a 'const' default floating point environment which can be
defined as a local variable as:

      const fenv_t fe_dfl_env = FE_DFL_ENV_DATA;

where FE_DFL_ENV_DATA is a #define which is created automatically if the
variable 'fe_dfl_env' is an unsigned int.

There also exists a pair of macros, 'gete' and 'sete', and an assumed
memory object '*e' which are defined such that

 	#define gete(e)	... // extracts the FPU's environment into *e

 	#define sete(e)	... // replaces the FPU's environment with *e

 	fenv_t *e;

Some examples of these architecture dependent routines and the associated 
definitions were crafted for m68k, powerpc64, and s390x, by copying the 
appropriate bits from how 'musl' currently does it.  An approximation for 
examples for sparc64, mips64 and sh is provides, remembering that these 
are approximations only.  The implementation for these cases is likely to 
be at least less than optimal if not wrong.  Please amend as appropriate.

Also, at least one architecture has an expanded idea of the concept of an
INVALID exception. For such scenarios, 'feclearexcept' and 'feraiseexcept'
need macros to be defined which massage the exceptions bits 'excepts' that
come a user's program, respectively

     #define FE_QUALIFY_CLEAR_EXCEPT(excepts)	....
and
     #define FE_QUALIFY_RAISE_EXCEPT(excepts)	....

For any other scenarios, these macros never need to be defined explicitly,
the code will define them as empty by itself.
All routines validate the exceptions bits 'excepts' coming a user's program.
This is achieve with a macro, the default of which is

     #define FE_VALIDATE_EXCEPT(except)	except &= FE_ALL_EXCEPTS

This simply discards any bits which are not a valid exception bit. It is
created automatically and does not need to be defined. However, where such
an approach for the case of illegal exceptions is deemed to unsatisfactory
and for example routine failure and a non-zero return is deems a better
approach, that macro can redefined explicitly as

    #define FE_VALIDATE_EXCEPT(except) if ((except) & FE_ALL_EXCEPTS) return(-1)

Historically it seems only one architecture took this approach, 'm68k',
and then only on Linux.  On any other architecture, and on *BSD, the
earlier, the default approach is used.

An attempt at an implementation is shown below, WITHOUT the architecture 
dependent macro definitions and 'static inline' routines which largely 
just comprise one line of embedded assembler.

Note that raising exceptions is done with arithmetic expression for trivial
cases and by directly modifying the floating point status register for the
non-trivial case.  The approach is open to question. While it allows testing
in isolation, it does not use any of the latest barrier function approaches
and routines like 'math_uflow' and colleagues.  It might be advantages to
focus on the overall design first and then discuss how to raise exceptions
later so as to not detract from the discussion of the overall design of the
generic interface.

Please consider:

/*
  * default inspection and modification of FPU status and control registerd
  */
#ifndef getcr
#define getcr   get_csr
#endif
#ifndef setcr
#define setcr   set_csr
#endif
#ifndef getsr
#define getsr   get_csr
#endif
#ifndef setsr
#define setsr   set_csr
#endif
#ifdef FE_DFL_ENV_DATA
#define FE_DFL_ENV_DATA 0	/* assume this is a unsigned int - cast?? */
#endif
#ifndef gete
#define gete(e) *e = get_csr()	/* assume this is a basic type */
#endif
#ifndef sete
#define sete(e) set_csr(*e)	/* assume this is a basic type */
#endif
#ifndef FE_ALL_EXCEPT
#define FE_VALID_EXCEPT (FE_INEXACT|FE_DIV_BY_ZERO|FE_UNDERFLOW|FE_OVERFLOW)
#define FE_ALL_EXCEPT   (FE_INVALID|FE_VALID_EXCEPT)
#endif
/*
  * The usual approach to validating an exception mask is to throw out
  * anything which is illegal. Some systems will NOT do this, choosing
  * instead to return a non-zero result on encountering an illegal mask
  */ 
#ifndef FE_VALIDATE_EXCEPT
#define FE_VALIDATE_EXCEPT(e)   e &= FE_ALL_EXCEPT
#endif
/*
  * There is usually no need to mess the generic valid exceptions bits
  * given to 'feclearexcept' and 'feraiseexcept' so define a 'nothing'
  * macro for such a scenario - is this the MUSL accepted style????
  */
#ifndef FE_QUALIFY_CLEAR_EXCEPT
#define FE_QUALIFY_CLEAR_EXCEPT(excepts)    (0)
#endif
#ifndef FE_QUALIFY_RAISE_EXCEPT
#define FE_QUALIFY_RAISE_EXCEPT(excepts)    (0)
#endif
/*
  * Compute the rounding mask with some rigid logic
  */
#ifndef FE_TO_NEAREST
#define FE_TO_NEAREST   0
#define ROUND_MASK  ((unsigned int) 0)
#else
#ifndef FE_TOWARDS_ZERO
#define ROUND_MASK  ((unsigned int) 0)
#else
#ifdef  FE_DOWNWARD
#ifdef  FE_UPWARD
#define ROUND_MASK  ((unsigned int) (FE_DOWNWARD|FE_UPWARD|FE_TOWARDZERO))
#else
#define ROUND_MASK  UNSUPPORTED rounding
#endif
#else
#ifdef  FE_UPWARD
#define ROUND_MASK  UNSUPPORTED rounding
#else
#define ROUND_MASK  ((unsigned int) (FETOWARDZERO))
#endif
#endif
#endif
#endif

int
feclearexcept(excepts)
{
     FE_VALIDATE_EXCEPT(excepts);
     FE_QUALIFY_CLEAR_EXCEPT(excepts);
     setsr(getsr() & ~((unsigned int) excepts));
     return(0);
}

static inline int
__raisearithmetically(int excepts)
{
     /*
      * assume single OP is faster than double OP
      */
     const float one = (float) 1;
     const float zero = (float) 0;
     const float tiny = (float) 0x1.0p-126;
     const float huge = (float) 0x1.0p+126;
     volatile float x;

     /*
      * if it is just a simple exception, arithmetic expressions are optimal
      */
     switch(excepts)
     {
     case FE_INVALID:
         x = zero, x /= x;
         break;
     case FE_DIVBYZERO:
         x = zero, x = one / x;
         break;
     case FE_INEXACT:
         x = tiny, x += one;
         break;
     case (FE_OVERFLOW | FE_INEXACT):
         x = huge, x *= x;
         break;
     case (FE_UNDERFLOW | FE_INEXACT):
         x = tiny, x *= x;
         break;
     default: /* if more than one exception exists, a sledgehammer is viable */
         setsr(getsr() | ((unsigned int) excepts));
         break;
     }
     return(0);
}

int
feraiseexcept(int excepts)
{
     FE_VALIDATE_EXCEPT(excepts);
     FE_QUALIFY_RAISE_EXCEPT(excepts);
     return __raisearithmetically(excepts);
}

int
fetestexcept(int excepts)
{
     FE_VALIDATE_EXCEPT(excepts);

     return (int) (getsr() & ((unsigned int) excepts));
}

int
fegetround(void)
{
     return (int) (getcr() & ROUND_MASK);
}

int
fesetround(int rounding_mode)
{
     if ((rounding_mode & ~ROUND_MASK) == 0)
     {
         unsigned int mode = ((unsigned int) rounding_mode);

         return (setcr((getcr() & ~ROUND_MASK) | (mode & ROUND_MASK)), 0);
     }
     return(-1);
}

int
fegetenv(fenv_t *envp)
{
     return ((gete(envp)), 0);
}

int
fesetenv(fenv_t *envp)
{
     fenv_t envpd = FE_DFL_ENV_DATA, *e = envp == FE_DFL_ENV ? &envpd : envp;

     return ((sete(envp)), 0);
}

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-21  4:46               ` Damian McGuckin
@ 2020-01-21  7:26                 ` Damian McGuckin
  0 siblings, 0 replies; 64+ messages in thread
From: Damian McGuckin @ 2020-01-21  7:26 UTC (permalink / raw)
  To: musl

On Tue, 21 Jan 2020, Damian McGuckin wrote:

> I will revisit the abstraction without the i386/x86-?? architecture.
> I think that would work.

My rough abstraction of these:

 	m68k/fenv.c
 	powerpc64/fenv.c
 	s390x/fenv.c

Looks to be pretty compatible with this:

 	mips64/fenv.S

These next three look very similar but my knowledge of these assemblers is 
close to zero.

 	riscv64/fenv.S
 	sh/fenv.S
 	aarch64/fenv.s

I will try and tidy up the abstraction over the weekend.

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-21  4:22             ` Rich Felker
@ 2020-01-21  4:46               ` Damian McGuckin
  2020-01-21  7:26                 ` Damian McGuckin
  0 siblings, 1 reply; 64+ messages in thread
From: Damian McGuckin @ 2020-01-21  4:46 UTC (permalink / raw)
  To: musl

On Mon, 20 Jan 2020, Rich Felker wrote:

> If you don't feel ready to do unification or work on archs you're 
> unfamiliar with, I think it's okay to either (1) only do the x86 work 
> now, with no unification, or (2) start the unification in src/fenv/*.c, 
> but with the arch files left in place in src/fenv/*/*.[csS] for all the 
> archs that haven't been converted yet. I don't want to block improvement 
> of the x86 versions just because the bigger task is too big.

I will revisit the abstraction without the i386/x86-?? architecture.
I think that would work.

> Another really stupid but perhaps very efficient idea we could do is
> just emulating the flags. Add a TLS slot for an fexcept_t value, move
> exceptions there as needed, and or it onto the result when reading
> back current exceptions. This would also make it dirt cheap for the
> math library to raise any exception it wants, without needing
> arithmetic, and it would make it possible to have the math library
> return errors via exception flags even on softfloat archs.

That makes a lot of sense.

That just leaves rounding and, even on the X86, the abstraction of 
getting/setting the register can be made generic.

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-21  3:53           ` Damian McGuckin
@ 2020-01-21  4:22             ` Rich Felker
  2020-01-21  4:46               ` Damian McGuckin
  0 siblings, 1 reply; 64+ messages in thread
From: Rich Felker @ 2020-01-21  4:22 UTC (permalink / raw)
  To: musl

On Tue, Jan 21, 2020 at 02:53:53PM +1100, Damian McGuckin wrote:
> 
> On Thu, 16 Jan 2020, Rich Felker wrote:
> 
> >Would you be interested in assessing what kind of abstraction makes
> >sense here?
> 
> I think it is quite difficult, but eventually feasibly.
> 
> Even having one abstract version for i386/x32 and x86_64 is not easy.
> 
> My thoughts were to do an abstraction that works for at least those three,
> simplify this to be even more abstract,  and then see how well it
> works for say something else.  The i386/x32 and x86 are arguably
> among the worst as
> they effectively have 2 lots of status and control registers which are
> not synced on-chip but that need to be for MUSL.

It's possible that the x86's are actually the worst fit for the
abstraction, and should be left separate, while the rest are unified.

> The only assembler in which I have even limited skills is Sparc32/64
> which is not terribly useful for MUSL but in terms of an
> abstraction, may be as good as anything. I will be investing in an
> ARM soon but my skills will be starting from a base of none.

If you don't feel ready to do unification or work on archs you're
unfamiliar with, I think it's okay to either (1) only do the x86 work
now, with no unification, or (2) start the unification in
src/fenv/*.c, but with the arch files left in place in
src/fenv/*/*.[csS] for all the archs that haven't been converted yet.
I don't want to block improvement of the x86 versions just because the
bigger task is too big.

> On Fri, 17 Jan 2020, Rich Felker wrote:
> 
> >As you said above, updating x87 status register is expensive because
> >the only way to write it is to read-modify-write the whole fenv. But
> >since we know on x86_64 we have sse registers, we can just move all
> >the flags to the sse register, then use fnclex to clear the x87 one
> >inexpensively, and the effective set of raised flags remains the same.
> >
> >I think we could do this on i386 too with a couple tricks:
> >
> >1. Do the same thing if sse is available (hwcap check).
> 
> Yes.
> >
> >2. If sse is not available, clear all flags then re-raise the desired
> >set via arithmetic operations.
> 
> That works.  That said, Based on a comment earlier today, my
> thoughts are to use an arithmetic expression for the case where only
> a single exception was active, including the pairs INEXACT/OVERFLOW
> and INEXACT/UNDERFLOW, and use a fegetenv/set-register/fesetenv for
> anything more complex.

I think arithmetic should be far better for *any* case it works on.

Another really stupid but perhaps very efficient idea we could do is
just emulating the flags. Add a TLS slot for an fexcept_t value, move
exceptions there as needed, and or it onto the result when reading
back current exceptions. This would also make it dirt cheap for the
math library to raise any exception it wants, without needing
arithmetic, and it would make it possible to have the math library
return errors via exception flags even on softfloat archs.

Rich

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-17 14:53         ` Rich Felker
                             ` (2 preceding siblings ...)
  2020-01-20  5:32           ` Damian McGuckin
@ 2020-01-21  3:53           ` Damian McGuckin
  2020-01-21  4:22             ` Rich Felker
  3 siblings, 1 reply; 64+ messages in thread
From: Damian McGuckin @ 2020-01-21  3:53 UTC (permalink / raw)
  To: musl


On Thu, 16 Jan 2020, Rich Felker wrote:

> Would you be interested in assessing what kind of abstraction makes
> sense here?

I think it is quite difficult, but eventually feasibly.

Even having one abstract version for i386/x32 and x86_64 is not easy.

My thoughts were to do an abstraction that works for at least those three,
simplify this to be even more abstract,  and then see how well it works for 
say something else.  The i386/x32 and x86 are arguably among the worst as
they effectively have 2 lots of status and control registers which are
not synced on-chip but that need to be for MUSL.

The only assembler in which I have even limited skills is Sparc32/64 which 
is not terribly useful for MUSL but in terms of an abstraction, may be as 
good as anything. I will be investing in an ARM soon but my skills will be 
starting from a base of none.

On Fri, 17 Jan 2020, Rich Felker wrote:

> As you said above, updating x87 status register is expensive because
> the only way to write it is to read-modify-write the whole fenv. But
> since we know on x86_64 we have sse registers, we can just move all
> the flags to the sse register, then use fnclex to clear the x87 one
> inexpensively, and the effective set of raised flags remains the same.
>
> I think we could do this on i386 too with a couple tricks:
>
> 1. Do the same thing if sse is available (hwcap check).

Yes.
>
> 2. If sse is not available, clear all flags then re-raise the desired
> set via arithmetic operations.

That works.  That said, Based on a comment earlier today, my thoughts are 
to use an arithmetic expression for the case where only a single exception 
was active, including the pairs INEXACT/OVERFLOW and INEXACT/UNDERFLOW, 
and use a fegetenv/set-register/fesetenv for anything more complex.

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-20  5:32           ` Damian McGuckin
@ 2020-01-20 17:38             ` Rich Felker
  0 siblings, 0 replies; 64+ messages in thread
From: Rich Felker @ 2020-01-20 17:38 UTC (permalink / raw)
  To: musl

On Mon, Jan 20, 2020 at 04:32:32PM +1100, Damian McGuckin wrote:
> On Fri, 17 Jan 2020, Rich Felker wrote:
> 
> >Note that this approach is not compatible with trapping
> >exceptions, but we don't support them anyway.
> 
> When an FNSTENV instruction is executed, all pending exceptions are
> essentially lost (either the x87 FPU status register is cleared or
> all exceptions are masked).

I don't follow what you're saying here and suspect you have a
misunderstanding of some of the terms involved. "Pending" does not
mean status flags that are already set. It's a concept from the
original 387 that's irrelevant in later cpu models, whereby an
exception may have already happened on the fpu but still be in transit
to the cpu. Executing fwait first, or fstenv rather than fnstenv, will
ensure that you don't miss anything even on the original 387.

> I think that this means either it wipes the exception bits in the
> status register or it wipes out the bits of the control register
> which handle which exceptions are trapped.
> 
> Am I wrong.

Yes.

> Because MUSL does not support trapping exceptions, we do not need to
> restore the control word.
> 
> But do we need to copy the lost exceptions flags in the x87 register
> back into the SSE register, well at least where we have an SSE.
> 
> Mind you, on an i386 without an SSE, you cannot restore them!

Again, I don't follow.

Rich

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-17 14:53         ` Rich Felker
  2020-01-18  4:45           ` Damian McGuckin
  2020-01-19  9:07           ` Damian McGuckin
@ 2020-01-20  5:32           ` Damian McGuckin
  2020-01-20 17:38             ` Rich Felker
  2020-01-21  3:53           ` Damian McGuckin
  3 siblings, 1 reply; 64+ messages in thread
From: Damian McGuckin @ 2020-01-20  5:32 UTC (permalink / raw)
  To: musl

On Fri, 17 Jan 2020, Rich Felker wrote:

> Note that this approach is not compatible with trapping exceptions, but 
> we don't support them anyway.

When an FNSTENV instruction is executed, all pending exceptions are 
essentially lost (either the x87 FPU status register is cleared or all 
exceptions are masked).

I think that this means either it wipes the exception bits in the status 
register or it wipes out the bits of the control register which handle 
which exceptions are trapped.

Am I wrong.

Because MUSL does not support trapping exceptions, we do not need to
restore the control word.

But do we need to copy the lost exceptions flags in the x87 register
back into the SSE register, well at least where we have an SSE.

Mind you, on an i386 without an SSE, you cannot restore them!

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-19 10:42             ` Szabolcs Nagy
@ 2020-01-19 12:25               ` Damian McGuckin
  0 siblings, 0 replies; 64+ messages in thread
From: Damian McGuckin @ 2020-01-19 12:25 UTC (permalink / raw)
  To: musl

On Sun, 19 Jan 2020, Szabolcs Nagy wrote:

> only musl internally has to be cosistent, which it is.

Thanks for the guidance.

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-19  9:07           ` Damian McGuckin
@ 2020-01-19 10:42             ` Szabolcs Nagy
  2020-01-19 12:25               ` Damian McGuckin
  0 siblings, 1 reply; 64+ messages in thread
From: Szabolcs Nagy @ 2020-01-19 10:42 UTC (permalink / raw)
  To: musl

* Damian McGuckin <damianm@esi.com.au> [2020-01-19 20:07:56 +1100]:
> On Fri, 17 Jan 2020, Rich Felker wrote:
> 
> > > x86_64
> > > 
> > > *	In assembler
> > > 
> > > *	Why does 'feclearexcept' disrespect the flags by clearing ALL x86 bits?
> > 
> > As you said above, updating x87 status register is expensive because
> > the only way to write it is to read-modify-write the whole fenv. But
> > since we know on x86_64 we have sse registers, we can just move all
> > the flags to the sse register, then use fnclex to clear the x87 one
> > inexpensively, and the effective set of raised flags remains the same.
> 
> I am worried about the lack of consistency.
> 
> What if something that is only using (say) 80-bit floating arithmetic comes
> back looking for a raised exception in the x87 status register &
> MUSL has moved effectively moved it to the mxcsr.

only musl internally has to be cosistent, which it is.

if anything else looks at fpu status it must look at
both x87 and sse status otherwise it will be wrong
no matter what musl does (fp operations are outside
the control of musl and whoever else tries to look
at fpu status)

but i think if something looks at fpu status behind
the libc it's already wrong: libc can implement fe*
however it wants to as long as the behavior follows
the language semantics, so internal hw registers are
meaningless as soon as a libc call is made.

iirc gcc can emit software float code on x86 that
uses the x87 status register to record the soft
float fenv status, so those bits can change even
if there was no hard float instruction, i.e. the
flags make sense on the c language level but not
on the hw level, if you mix those two levels you
are in trouble.


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-17 14:53         ` Rich Felker
  2020-01-18  4:45           ` Damian McGuckin
@ 2020-01-19  9:07           ` Damian McGuckin
  2020-01-19 10:42             ` Szabolcs Nagy
  2020-01-20  5:32           ` Damian McGuckin
  2020-01-21  3:53           ` Damian McGuckin
  3 siblings, 1 reply; 64+ messages in thread
From: Damian McGuckin @ 2020-01-19  9:07 UTC (permalink / raw)
  To: musl

On Fri, 17 Jan 2020, Rich Felker wrote:

>> x86_64
>>
>> *	In assembler
>>
>> *	Why does 'feclearexcept' disrespect the flags by clearing ALL x86 bits?
>
> As you said above, updating x87 status register is expensive because
> the only way to write it is to read-modify-write the whole fenv. But
> since we know on x86_64 we have sse registers, we can just move all
> the flags to the sse register, then use fnclex to clear the x87 one
> inexpensively, and the effective set of raised flags remains the same.

I am worried about the lack of consistency.

What if something that is only using (say) 80-bit floating arithmetic 
comes back looking for a raised exception in the x87 status register &
MUSL has moved effectively moved it to the mxcsr.

Mind you, I think that BSD is guilty of almost the same error in rounding 
modes.

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-18  5:29             ` Rich Felker
@ 2020-01-19  8:50               ` Damian McGuckin
  0 siblings, 0 replies; 64+ messages in thread
From: Damian McGuckin @ 2020-01-19  8:50 UTC (permalink / raw)
  To: musl


Hi,

Does MUSL need to support

 	fesetexceptflag
 	fegetexceptflag
 	feholdexcept
and
 	feupdateenv

What about

 	fedisableexcept
 	feenableexcept

Is there need to put in a softfloat environment for instances that do not
support it?

I am not suggesting YES for any of the above. Just curious?

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-18  5:37               ` Rich Felker
@ 2020-01-18  9:40                 ` Szabolcs Nagy
  2020-01-24  0:42                   ` Damian McGuckin
  0 siblings, 1 reply; 64+ messages in thread
From: Szabolcs Nagy @ 2020-01-18  9:40 UTC (permalink / raw)
  To: musl

* Rich Felker <dalias@libc.org> [2020-01-18 00:37:59 -0500]:
> On Sat, Jan 18, 2020 at 04:03:01PM +1100, Damian McGuckin wrote:
> > Do we have to start supporting 128-bit floats, we will need to use
> > the __int28_t and __uint128_t about which I know nothing. But they
> > have their own limitations.
> 
> No, these are not used at all in musl, and are problematic types
> because they can't formally be extended integer types, since they
> mismatch intmax_t being 64-bit. We just use "ldshape".

there is one place where __int128 is used: powerpc64 jmp_buf

and i think aarch64 should use it in ucontext, both are
places where quad float regs are saved, but the kernel and
system software wants to access them with integer code.
(aarch64 uses long double type now)

> I don't think it really makes any difference. Existing ABIs are not
> going to be changed, and the consensus from people who actually want
> to use IEEE quad seems to be that they want a __float128 type that's
> uniform across archs, not having it for long double. This puts it
> outside the scope of math library for the time being, and would
> probably be a separate add-on math library if there eventually is a
> math library for these since musl can't/won't depend on compiler
> support for new/optional types to build a working libc.so (this would
> preclude using any compiler without them).

nowadays there is some support for ts 18661-3 in gcc so
a libc can support the interfaces defined there (which
include math functions for _Float128 type), an old draft:
http://www.open-std.org/jtc1/sc22//WG14/www/docs/n1945.pdf

however hw implementation is costly and with current
implementations there is significant gap between double
and quad prec performance so heavy uses of quad is
unlikely even when available in hw.

otoh quad prec software implementations are still much
faster than arbitrary precision float libraries (mpfr,
arb) so there are use cases when quad makes sense in
general (but it might be better to just optimize the
arbitrary precision libraries)

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-18  5:03             ` Damian McGuckin
@ 2020-01-18  5:37               ` Rich Felker
  2020-01-18  9:40                 ` Szabolcs Nagy
  0 siblings, 1 reply; 64+ messages in thread
From: Rich Felker @ 2020-01-18  5:37 UTC (permalink / raw)
  To: musl

On Sat, Jan 18, 2020 at 04:03:01PM +1100, Damian McGuckin wrote:
> On Sat, 18 Jan 2020, Szabolcs Nagy wrote:
> 
> >* Markus Wichmann <nullplan@gmx.net> [2020-01-17 17:41:43 +0100]:
> >>On Fri, Jan 17, 2020 at 02:36:20PM +1100, Damian McGuckin wrote:
> >>>Also, and I could be wrong, currently MUSL assumes that there is an integral
> >>>type for every floating type.
> >>
> >>Let me stop you there. Musl assumes that for long doubles, you can
> >>create a union ldshape. So there is no need for a 128 bit data type, a
> >>64 bit one is good enough.
> 
> What does it assume for a long double which has 128-bits?

Just that there's a struct ldshape defined for the specific long
double type and endianness, that can be used to access its
representation.

> >actually it's quite annoying to deal with N bit long double
> >without N bit int, because sometimes you want to fiddle with the
> >bits using int operations and if you don't have N bit int type you
> >have to deal with endianness (you can always copy the long double
> >into an array of bytes, but the layout of the bytes is not
> >portable, same for union with multiple fields: you need variants
> >for different endianness)
> 
> I was just seeking others experience and whether they is solution
> about which I am unaware. But it seems not. I am looking at this
> issues in the context of (purchasing) a Power9 CPU which has no
> 80-bit floats but has instead 128-bit floats.

The musl ABI for powerpc[64] uses 64-bit long double because the
compiler support for IEEE quad is very lacking and was pretty much
nonexistent at the time musl added these archs.

> Do we have to start supporting 128-bit floats, we will need to use
> the __int28_t and __uint128_t about which I know nothing. But they
> have their own limitations.

No, these are not used at all in musl, and are problematic types
because they can't formally be extended integer types, since they
mismatch intmax_t being 64-bit. We just use "ldshape".

> Are hardware versions of these on the horizon for Intel or others?
> 
> Should we start lobbying the chip makers to get their act together??
> 
> Probably not questions to which I will get resolutions any time soon.

I don't think it really makes any difference. Existing ABIs are not
going to be changed, and the consensus from people who actually want
to use IEEE quad seems to be that they want a __float128 type that's
uniform across archs, not having it for long double. This puts it
outside the scope of math library for the time being, and would
probably be a separate add-on math library if there eventually is a
math library for these since musl can't/won't depend on compiler
support for new/optional types to build a working libc.so (this would
preclude using any compiler without them).

Rich

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-18  4:45           ` Damian McGuckin
@ 2020-01-18  5:29             ` Rich Felker
  2020-01-19  8:50               ` Damian McGuckin
  0 siblings, 1 reply; 64+ messages in thread
From: Rich Felker @ 2020-01-18  5:29 UTC (permalink / raw)
  To: musl

On Sat, Jan 18, 2020 at 03:45:27PM +1100, Damian McGuckin wrote:
> >C specifies it as:
> >
> >   "The feraiseexcept function returns zero if the excepts argument
> >   is zero or if all the specified exceptions were successfully
> >   raised. Otherwise, it returns a nonzero value."
> 
> Pretty vague. This is not why the M68K routines return (-1).
> 
> No routine currently checks that the exceptions were successfully
> raised. They assume that a write to the status register works. If we
> are going to check each instruction such as storing into a register
> works, we have a lot of work to do.

If the ISA has instructions to set status word then we can assume they
actually work. The idea of the possibility that they can fail is to
admit different implementation choices: rather than declining to
define exceptions and rounding modes on softfloat ABI variants, we
could have defined them but make the functions to set them always
report failure. I think this would have been a worse choice which is
why it wasn't done.

In the future if softfloat implementations add fenv, but it's only
conditionally available at runtime depending on [something], then it
would make sense to have possibility of failure at runtime.

> >>	double to a union and then extract the data as a long.
> >>
> >>		return (union {double f; long i;}) {get_fpscr_f()}.i;
> >>
> >>	Is this style of coding universally accepted within MUSL? From my
> >>	reading of other routines, it is normally done as
> >>
> >>		union {double f; long i;} f = { get_fpscr_f() };
> >>
> >>		return f.i;
> >>
> >>	Just curious.
> >
> >Yes, the compound literal form is preferred since it avoids a
> >gratuitous named variable.
> 
> I would humbly suggest initially using the longer form, and then
> once the architecture of the routine is complete, we revert to the
> compound literal form where the code is simple enough to make it
> possible.

I don't follow why this is better.

> >>x86_64
> >>
> >>*	In assembler
> >>
> >>*	Why does 'feclearexcept' disrespect the flags by clearing ALL x86 bits?
> >
> >As you said above, updating x87 status register is expensive because
> >the only way to write it is to read-modify-write the whole fenv. But
> >since we know on x86_64 we have sse registers, we can just move all
> >the flags to the sse register, then use fnclex to clear the x87 one
> >inexpensively, and the effective set of raised flags remains the same.
> 
> Thanks for the explanation. Neat.
> 
> >I think we could do this on i386 too with a couple tricks:
> >
> >1. Do the same thing if sse is available (hwcap check).
> >
> >2. If sse is not available, clear all flags then re-raise the desired
> >set via arithmetic operations.
> 
> Simple. I like it. But more code.
> 
> Also, playing devil's advocate for a minute ....
> 
> Are we, or should we, be aiming to have
> 
> 	fetestexcept(int excepts)
> 
> and (even also)
> 
> 	feraiseexcept(int excepts)
> 
> being expanded inline so their use does not compromise optimization?

Only if LTO is in use, and then as you mention you may run into bugs
with the compiler not considering side effects of floating point
operations correctly, so LTO isn't really practical to use until
compilers are fixed. Otherwise we don't put implementations of stuff
like this in public headers.

Rich

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-18  1:15           ` Szabolcs Nagy
  2020-01-18  5:03             ` Damian McGuckin
@ 2020-01-18  5:04             ` Markus Wichmann
  1 sibling, 0 replies; 64+ messages in thread
From: Markus Wichmann @ 2020-01-18  5:04 UTC (permalink / raw)
  To: musl

On Sat, Jan 18, 2020 at 02:15:35AM +0100, Szabolcs Nagy wrote:
> actually it's quite annoying to deal with N bit long double without
> N bit int, because sometimes you want to fiddle with the bits using
> int operations and if you don't have N bit int type you have to
> deal with endianness (you can always copy the long double into an
> array of bytes, but the layout of the bytes is not portable, same
> for union with multiple fields: you need variants for different
> endianness)

Well, yeah, but there is never going to be an 80 bit integer type. As
for endianess, I feel your pain. I just got done porting a really old
piece of software from PowerPC to ARM. Sooo many endianess problems...

Ciao,
Markus

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-18  1:15           ` Szabolcs Nagy
@ 2020-01-18  5:03             ` Damian McGuckin
  2020-01-18  5:37               ` Rich Felker
  2020-01-18  5:04             ` Markus Wichmann
  1 sibling, 1 reply; 64+ messages in thread
From: Damian McGuckin @ 2020-01-18  5:03 UTC (permalink / raw)
  To: musl

On Sat, 18 Jan 2020, Szabolcs Nagy wrote:

> * Markus Wichmann <nullplan@gmx.net> [2020-01-17 17:41:43 +0100]:
>> On Fri, Jan 17, 2020 at 02:36:20PM +1100, Damian McGuckin wrote:
>>> Also, and I could be wrong, currently MUSL assumes that there is an integral
>>> type for every floating type.
>>
>> Let me stop you there. Musl assumes that for long doubles, you can
>> create a union ldshape. So there is no need for a 128 bit data type, a
>> 64 bit one is good enough.

What does it assume for a long double which has 128-bits?

> actually it's quite annoying to deal with N bit long double without N 
> bit int, because sometimes you want to fiddle with the bits using int 
> operations and if you don't have N bit int type you have to deal with 
> endianness (you can always copy the long double into an array of bytes, 
> but the layout of the bytes is not portable, same for union with 
> multiple fields: you need variants for different endianness)

I was just seeking others experience and whether they is solution about 
which I am unaware. But it seems not. I am looking at this issues in the 
context of (purchasing) a Power9 CPU which has no 80-bit floats but has 
instead 128-bit floats.

As you say, it is annoying.  I had this problem ages ago when a long was a 
32-bit integer and a double was 64-bits. And now the problem has come back 
again.

I find MUSL just so much more elegant that the old SUN library because 
MUSL no longer has to worry the get/set hi/lo twin pairs of macros and the 
soft arithmetic that was often necessary to support the 2 32-bit integers.

Do we have to start supporting 128-bit floats, we will need to use the 
__int28_t and __uint128_t about which I know nothing. But they have their 
own limitations.

Are hardware versions of these on the horizon for Intel or others?

Should we start lobbying the chip makers to get their act together??

Probably not questions to which I will get resolutions any time soon.

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-17 14:53         ` Rich Felker
@ 2020-01-18  4:45           ` Damian McGuckin
  2020-01-18  5:29             ` Rich Felker
  2020-01-19  9:07           ` Damian McGuckin
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 64+ messages in thread
From: Damian McGuckin @ 2020-01-18  4:45 UTC (permalink / raw)
  To: musl

On Fri, 17 Jan 2020, Rich Felker wrote:

>> arm (bare)
>>
>> *	Empty
>
> I think you missed that there's fenv-hf.S for armhf. The default ABI
> does not have fpu/fenv.

Oops. Thanks for that.

>> *	The fldenv instruction to update the status registers has a serious
>> 	overhead which cannot be avoided in 'feraiseexcept'. No attempt is
>> 	made to optimize any unnecessary usage (as occurs in feclearexcept).
>
> One thing we could do in C is write feraiseexcept portably to raise
> the exceptions via performing appropriate floating point operations,
> rather than directly modifying status word. This would probably be
> faster on most if not all archs.

To confirm your use of 'probably' on (say) Ivy Bridge

Most places in MUSL use FORCE_EVAL for just that reason.  Just looking at 
the example within 'trunc', to force INEXACT there is an ADDSD and a MOVSD 
with memory accesses in both so 1 cycle each so 2 in total. Done inline, 
the STMXCSR and LDMXCSR are 3 cycles each so 6 in total. But in another 
example to force UNDERFLOW within 'nextafter' using FORCE_EVAL needs 4 
cycles total against the same 6. So a bit closer

Most of the time I prefer FORCE_EVAL but 'feraiseexcept' should probably 
be treated as a fall back as use the current method. My 2c is that we 
should retain 'modifying the register' concept of the 'feraiseexcept' 
routine.

.... Your call.

On a related note:

For the non-zero bits in the first 'setexceptflag' argument, doesn't this 
routine do the same thing as 'feraiseexcept' for those exceptions which 
correspond to those bits.

> Note that the high level C could avoid any action when the flags to be
> cleared are already clear.

Agreed. Nobody does this currently to keep the routine size small.

>> *	What is the best way to query '__hwcap' from inline __asm__ statement,
>
>> From *inline* asm you don't have to do it at all. You just write the
> branch in C. This is one of the reasons to prefer C.
>
>> 	specifically to verify if SSE instructions have to be supported

I have never done this in my life? On BSD there is a system call but that 
is a bit heavy handed.

> It's not to verify that they have to be supported, rather to verify
> that they *can* be used. If the bit is not set in hwcap, the
> instructions are not there and will fault if executed.

I understand this.

> C specifies it as:
>
>    "The feraiseexcept function returns zero if the excepts argument
>    is zero or if all the specified exceptions were successfully
>    raised. Otherwise, it returns a nonzero value."

Pretty vague. This is not why the M68K routines return (-1).

No routine currently checks that the exceptions were successfully raised. 
They assume that a write to the status register works. If we are going to 
check each instruction such as storing into a register works, we have a 
lot of work to do.

> It's not 100% clear to me that this is supposed to apply to invalid
> arguments rather than just some kind of failure to raise valid
> arguments, but I'd err on the side of assuming it does apply if we're
> overhauling this code in C. It's a minor issue though.

Most routines simply mask away anything invalid and then proceed.

>> powerpc
>>
>> *	All assembler
>>
>> *	I think that this architecture has more exception bits than IEEE 754
>> 	specifies. It has lots of specific cases of FE_INVALID. This needs
>> 	to be considered when dealing with FE_INVALID.
>
> I'm trying to remember -- does the hardware lack a unified FE_INVALID
> bit, so that you have to emulate it by aggregating the individual
> ones? I think that could be hidden inside a primitive that
> loads/stores the status word.

Not as far as I understand but others may wish to correct me. When they 
raise FE_INVALID, they concurrently raise a 'reason'. So if we are going 
to manually raise FE_INVALID, the mask MUST include a 'reason'. There is a 
catch-all 'miscellaneous' reason, what is call Software-Defined Condition, 
that is currently used. Other reasons are SNan, Infinity-Infinite, 
Infinity/Infinity, Zero/Zero, Infinite*Zero, Invalid Compare, Invalid 
Integer Convert and Invalid Square Root.

Reasons are a nice ideas but these 'explanations' have others chips have 
never picked up on the concept (which has been around a while).  Sadly, a 
lot of great ideas have never been commercial successes and humanity is 
the poorer for it. I digress. Sorry.

>> powerpc64

>> 	double to a union and then extract the data as a long.
>>
>> 		return (union {double f; long i;}) {get_fpscr_f()}.i;
>>
>> 	Is this style of coding universally accepted within MUSL? From my
>> 	reading of other routines, it is normally done as
>>
>> 		union {double f; long i;} f = { get_fpscr_f() };
>>
>> 		return f.i;
>>
>> 	Just curious.
>
> Yes, the compound literal form is preferred since it avoids a
> gratuitous named variable.

I would humbly suggest initially using the longer form, and then once the 
architecture of the routine is complete, we revert to the compound literal 
form where the code is simple enough to make it possible.

>> sh (SuperH??)
>>
>> *	In assembler
>>
>> *	I know zero about this assembler
>>
>> *	There is some pecularity about updating the environment. I have no
>> 	idea what is going on here. Anybody clear to elaborate?
>
> The comment about preserving precision bit is just incorrect as long
> as this is an external function. The function call ABI has precision
> set to double on function entry. If it's inline asm we might have to
> think about ensuring it's safe. I'm not sure how to constrain the
> precision on __asm__ statement entry but I assume there must be a way
> or you couldn't write inline asm using floating point operands.

Interesting. Way beyond my knowledge. Interesting chip.

>> x86_64
>>
>> *	In assembler
>>
>> *	Why does 'feclearexcept' disrespect the flags by clearing ALL x86 bits?
>
> As you said above, updating x87 status register is expensive because
> the only way to write it is to read-modify-write the whole fenv. But
> since we know on x86_64 we have sse registers, we can just move all
> the flags to the sse register, then use fnclex to clear the x87 one
> inexpensively, and the effective set of raised flags remains the same.

Thanks for the explanation. Neat.

> I think we could do this on i386 too with a couple tricks:
>
> 1. Do the same thing if sse is available (hwcap check).
>
> 2. If sse is not available, clear all flags then re-raise the desired
> set via arithmetic operations.

Simple. I like it. But more code.

Also, playing devil's advocate for a minute ....

Are we, or should we, be aiming to have

 	fetestexcept(int excepts)

and (even also)

 	feraiseexcept(int excepts)

being expanded inline so their use does not compromise optimization?  How 
feasible it this? How much will compilers get confused by the fact that 
there are side-effects of floating point operations that affect floating 
point status registers.

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-17 16:41         ` Markus Wichmann
@ 2020-01-18  1:15           ` Szabolcs Nagy
  2020-01-18  5:03             ` Damian McGuckin
  2020-01-18  5:04             ` Markus Wichmann
  0 siblings, 2 replies; 64+ messages in thread
From: Szabolcs Nagy @ 2020-01-18  1:15 UTC (permalink / raw)
  To: musl

* Markus Wichmann <nullplan@gmx.net> [2020-01-17 17:41:43 +0100]:
> On Fri, Jan 17, 2020 at 02:36:20PM +1100, Damian McGuckin wrote:
> > Also, and I could be wrong, currently MUSL assumes that there is an integral
> > type for every floating type.
> 
> Let me stop you there. Musl assumes that for long doubles, you can
> create a union ldshape. So there is no need for a 128 bit data type, a
> 64 bit one is good enough.

actually it's quite annoying to deal with N bit long double without
N bit int, because sometimes you want to fiddle with the bits using
int operations and if you don't have N bit int type you have to
deal with endianness (you can always copy the long double into an
array of bytes, but the layout of the bytes is not portable, same
for union with multiple fields: you need variants for different
endianness)

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-17  3:36       ` Damian McGuckin
                           ` (2 preceding siblings ...)
  2020-01-17 14:53         ` Rich Felker
@ 2020-01-17 16:41         ` Markus Wichmann
  2020-01-18  1:15           ` Szabolcs Nagy
  3 siblings, 1 reply; 64+ messages in thread
From: Markus Wichmann @ 2020-01-17 16:41 UTC (permalink / raw)
  To: musl

On Fri, Jan 17, 2020 at 02:36:20PM +1100, Damian McGuckin wrote:
> Also, and I could be wrong, currently MUSL assumes that there is an integral
> type for every floating type.

Let me stop you there. Musl assumes that for long doubles, you can
create a union ldshape. So there is no need for a 128 bit data type, a
64 bit one is good enough.

Ciao,
Markus

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-17  3:36       ` Damian McGuckin
  2020-01-17  3:48         ` Damian McGuckin
  2020-01-17  3:53         ` David Edelsohn
@ 2020-01-17 14:53         ` Rich Felker
  2020-01-18  4:45           ` Damian McGuckin
                             ` (3 more replies)
  2020-01-17 16:41         ` Markus Wichmann
  3 siblings, 4 replies; 64+ messages in thread
From: Rich Felker @ 2020-01-17 14:53 UTC (permalink / raw)
  To: musl

On Fri, Jan 17, 2020 at 02:36:20PM +1100, Damian McGuckin wrote:
> 
> Feedback/Discussion please, especially in terms of what extra
> comments I need to make?  I hope I have not missed anything.
> 
> General Comments
> ****************
> 
> Except where noted, the approach taken to invalid input is to mask
> out the invalid data, use what data is left, and never inform the
> calling program of invalid data.
> 
> The i386(sometimes), X32 and X86-64 generally need to realise that
> they have both the X87 FPU and the SSE.  Are there scenarios where
> this will not be the case or do we need to plan for future
> scenarious where this will not be the case?
> 
> Do we need to consider what is in the latest IEEE 754 2019 standard
> to see what enhancements are needed or just wait for C2X?
> 
> Other Architectures
> *******************
> 
> Should we look at what is needed for Sparc and Power9 to ensure that
> the (eventually-) chosen abstraction will work with these? Are there
> any other chips which need to be considered. If you look at more
> recent chipset designs, they have all been able to leverage the
> experience of working with IEEE 754 exceptions and rounding and
> follow the same style of use of an exception status and round
> control register . So I think catering for the current crop, plus
> those 2 mentioned above, should be adequate. But
> am I wrong?
> 
> Is Power9 the same as PowerPC64?  I have never seen one. I know I do
> not know enough about this chip as the 128-bit floating point
> discussion talks about Rounding-To-Odd mode? I have tried to read
> the 1358 pages of the ISA 3.0 architecture manual but I have a long
> way to go before I know even 10% of what is in there. Are the newer
> beefy ARMS likely to change what they
> do not in the context of 'fenv' routines?
> 
> Also, and I could be wrong, currently MUSL assumes that there is an
> integral type for every floating type.  On some architectures, I
> believe this is not always the case for 128-bit floating point
> numbers. On some Sparcs, I am not sure it was even the case for
> 64-bit numbers but that was a long time. I do not think that this
> restriction will influence anything here.  How it affects MUSL in
> general is another question irrelevant to this discussion.
> 
> Summary
> *******
> 
> aarch64 (arm)
> 
> *	All assembler
> 
> arm (bare)
> 
> *	Empty

I think you missed that there's fenv-hf.S for armhf. The default ABI
does not have fpu/fenv.

> i386
> 
> *	All assembler
> 
> *	The fldenv instruction to update the status registers has a serious
> 	overhead which cannot be avoided in 'feraiseexcept'. No attempt is
> 	made to optimize any unnecessary usage (as occurs in feclearexcept).

One thing we could do in C is write feraiseexcept portably to raise
the exceptions via performing appropriate floating point operations,
rather than directly modifying status word. This would probably be
faster on most if not all archs.

> 	Note that fldenv also makes the 'feclearexcept' routine unavoidably
> 	complex.

Note that the high level C could avoid any action when the flags to be
cleared are already clear.

> *	What is the best way to query '__hwcap' from inline __asm__ statement,

From *inline* asm you don't have to do it at all. You just write the
branch in C. This is one of the reasons to prefer C.

> 	specifically to verify if SSE instructions have to be supported

It's not to verify that they have to be supported, rather to verify
that they *can* be used. If the bit is not set in hwcap, the
instructions are not there and will fault if executed.

> m68k
> 
> *	In C.
> 
> *	Very clear
> 
> *	feclearexcept and feraiseexcept
> 
> 		if (exception_mask & ~FE_ALL_EXCEPT) return (-1)
> 
> 	Different to the way others handle invalid input. Is this cast
> 	behaviour cast in stone based on standard documentation?

C specifies it as:

    "The feraiseexcept function returns zero if the excepts argument
    is zero or if all the specified exceptions were successfully
    raised. Otherwise, it returns a nonzero value."

It's not 100% clear to me that this is supposed to apply to invalid
arguments rather than just some kind of failure to raise valid
arguments, but I'd err on the side of assuming it does apply if we're
overhauling this code in C. It's a minor issue though.

> mips/mips64/mipsn32
> 
> *	All assembler
> 
> *	Not overly complex.
> 
> powerpc
> 
> *	All assembler
> 
> *	I think that this architecture has more exception bits than IEEE 754
> 	specifies. It has lots of specific cases of FE_INVALID. This needs
> 	to be considered when dealing with FE_INVALID.

I'm trying to remember -- does the hardware lack a unified FE_INVALID
bit, so that you have to emulate it by aggregating the individual
ones? I think that could be hidden inside a primitive that
loads/stores the status word.

> powerpc64
> 
> *	In C.
> 
> *	Very clear
> 
> *	Note that this architecture has more exception bits than IEEE 754
> 	specifies. It has lots of specific cases of FE_INVALID. This needs
> 	to be considered when dealing with FE_INVALID.
> 
> *	This is the first time I have seen this style of coding to cast a
> 	double to a union and then extract the data as a long.
> 
> 		return (union {double f; long i;}) {get_fpscr_f()}.i;
> 
> 	Is this style of coding universally accepted within MUSL? From my
> 	reading of other routines, it is normally done as
> 
> 		union {double f; long i;} f = { get_fpscr_f() };
> 
> 		return f.i;
> 
> 	Just curious.

Yes, the compound literal form is preferred since it avoids a
gratuitous named variable.

> riscv64
> 
> *	All assembler.
> 
> *	Very clear.
> 
> *	The architecture has obviously been done after a review of lots
> 	of experience with the IEEE 754 standard.
> 
> s390x
> 
> *	In C.
> 
> *	Very clear.
> 
> *	Why is __fesetround(int) 'hidden'? Where is fesetround()?

In src/fenv. It does the validity check in C before calling into the
arch backend.

> sh (SuperH??)
> 
> *	In assembler
> 
> *	I know zero about this assembler
> 
> *	There is some pecularity about updating the environment. I have no
> 	idea what is going on here. Anybody clear to elaborate?

The comment about preserving precision bit is just incorrect as long
as this is an external function. The function call ABI has precision
set to double on function entry. If it's inline asm we might have to
think about ensuring it's safe. I'm not sure how to constrain the
precision on __asm__ statement entry but I assume there must be a way
or you couldn't write inline asm using floating point operands.

> x32
> 
> *	In assembler
> 
> *	Why does 'feclearexcept' disrespect the flags by clearing ALL x86 bits?
> 
> *	It is this really much the same as x86-64 (or am I wrong)?

Yes, they're identical except that pointers in memory occupy only 4
bytes.

> x86_64
> 
> *	In assembler
> 
> *	Why does 'feclearexcept' disrespect the flags by clearing ALL x86 bits?

As you said above, updating x87 status register is expensive because
the only way to write it is to read-modify-write the whole fenv. But
since we know on x86_64 we have sse registers, we can just move all
the flags to the sse register, then use fnclex to clear the x87 one
inexpensively, and the effective set of raised flags remains the same.

I think we could do this on i386 too with a couple tricks:

1. Do the same thing if sse is available (hwcap check).

2. If sse is not available, clear all flags then re-raise the desired
set via arithmetic operations.

Note that this approach is not compatible with trapping exceptions,
but we don't support them anyway.

Rich

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-17 14:13           ` Rich Felker
@ 2020-01-17 14:19             ` David Edelsohn
  0 siblings, 0 replies; 64+ messages in thread
From: David Edelsohn @ 2020-01-17 14:19 UTC (permalink / raw)
  To: musl

On Fri, Jan 17, 2020 at 6:13 AM Rich Felker <dalias@libc.org> wrote:
>
> On Thu, Jan 16, 2020 at 07:53:18PM -0800, David Edelsohn wrote:
> > The latest iteration / evolution of the PowerPC architecture is called
> > Power.  Power9 is the most recent processor with support for the most
> > recent ISA.
>
> Is this really relevant to the thread you replied to?

Damian asked:

"Is Power9 the same as PowerPC64?  I have never seen one. I know I do not
know enough about this chip as the 128-bit floating point discussion talks
about Rounding-To-Odd mode? I have tried to read the 1358 pages of the ISA
3.0 architecture manual but I have a long way to go before I know even 10%
of what is in there. Are the newer beefy ARMS likely to change what they
do not in the context of 'fenv' routines?"

>
> > If Intel Ice Lake is x86-64, then IBM Power9 is PowerPC64.
>
> Both of these are true in the naming we're using.
>
> > PowerPC64 Little Endian Linux ABI specifies Power8 as the minimum ISA.
>
> We do not follow that and you're well aware of this. All earlier ISA
> levels are supported in musl. Please do not reply into threads where a
> contributor is looking for accurate information to state things which
> do not apply to the project. Especially ones which are only
> tangentially related to what you want to say.

I was trying to provide Damian and others with additional context for
PPC64LE ABI and Power9.

Please don't attack me for providing additional information and
context.  Your demeaning reply is inappropriate.

Thanks, David

>
> > The basic ABI can run on earlier versions of the 64 bit PowerPC ISA,
> > but it was helpful to define a new, minimum instruction set for Linux
> > distribution releases during the switch to Little Endian.
>
> We also don't have a "switch to little endian". Both endiannesses are
> supported on any hardware capable of running them, and both use ELFv2
> ABI because there never was a v1 ABI on musl (we added the arch long
> after v2 was a thing).
>
> > The switch to IEEE 128 bit FP came later and is not complete in the
> > toolchains yet.  I believe that Musl already decided that it would
> > ignore "long double" until IEEE 128 was available and would avoid the
> > IBM double-double format.
>
> long double in the musl ABI for both 32-bit and 64-bit ppc is same as
> IEEE double. The IBM double-double format's semantics are not
> compatible with musl code and most compilers do not support an ABI
> with IEEE quad on ppc.
>
> Rich

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-17  3:53         ` David Edelsohn
@ 2020-01-17 14:13           ` Rich Felker
  2020-01-17 14:19             ` David Edelsohn
  0 siblings, 1 reply; 64+ messages in thread
From: Rich Felker @ 2020-01-17 14:13 UTC (permalink / raw)
  To: musl

On Thu, Jan 16, 2020 at 07:53:18PM -0800, David Edelsohn wrote:
> The latest iteration / evolution of the PowerPC architecture is called
> Power.  Power9 is the most recent processor with support for the most
> recent ISA.

Is this really relevant to the thread you replied to?

> If Intel Ice Lake is x86-64, then IBM Power9 is PowerPC64.

Both of these are true in the naming we're using.

> PowerPC64 Little Endian Linux ABI specifies Power8 as the minimum ISA.

We do not follow that and you're well aware of this. All earlier ISA
levels are supported in musl. Please do not reply into threads where a
contributor is looking for accurate information to state things which
do not apply to the project. Especially ones which are only
tangentially related to what you want to say.

> The basic ABI can run on earlier versions of the 64 bit PowerPC ISA,
> but it was helpful to define a new, minimum instruction set for Linux
> distribution releases during the switch to Little Endian.

We also don't have a "switch to little endian". Both endiannesses are
supported on any hardware capable of running them, and both use ELFv2
ABI because there never was a v1 ABI on musl (we added the arch long
after v2 was a thing).

> The switch to IEEE 128 bit FP came later and is not complete in the
> toolchains yet.  I believe that Musl already decided that it would
> ignore "long double" until IEEE 128 was available and would avoid the
> IBM double-double format.

long double in the musl ABI for both 32-bit and 64-bit ppc is same as
IEEE double. The IBM double-double format's semantics are not
compatible with musl code and most compilers do not support an ABI
with IEEE quad on ppc.

Rich

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-17  3:36       ` Damian McGuckin
  2020-01-17  3:48         ` Damian McGuckin
@ 2020-01-17  3:53         ` David Edelsohn
  2020-01-17 14:13           ` Rich Felker
  2020-01-17 14:53         ` Rich Felker
  2020-01-17 16:41         ` Markus Wichmann
  3 siblings, 1 reply; 64+ messages in thread
From: David Edelsohn @ 2020-01-17  3:53 UTC (permalink / raw)
  To: musl

The latest iteration / evolution of the PowerPC architecture is called
Power.  Power9 is the most recent processor with support for the most
recent ISA.

If Intel Ice Lake is x86-64, then IBM Power9 is PowerPC64.

PowerPC64 Little Endian Linux ABI specifies Power8 as the minimum ISA.
The basic ABI can run on earlier versions of the 64 bit PowerPC ISA,
but it was helpful to define a new, minimum instruction set for Linux
distribution releases during the switch to Little Endian.

The switch to IEEE 128 bit FP came later and is not complete in the
toolchains yet.  I believe that Musl already decided that it would
ignore "long double" until IEEE 128 was available and would avoid the
IBM double-double format.

Thanks, David

On Thu, Jan 16, 2020 at 7:36 PM Damian McGuckin <damianm@esi.com.au> wrote:
>
>
> Feedback/Discussion please, especially in terms of what extra comments I
> need to make?  I hope I have not missed anything.
>
> General Comments
> ****************
>
> Except where noted, the approach taken to invalid input is to mask out the
> invalid data, use what data is left, and never inform the calling program
> of invalid data.
>
> The i386(sometimes), X32 and X86-64 generally need to realise that they
> have both the X87 FPU and the SSE.  Are there scenarios where this will
> not be the case or do we need to plan for future scenarious where this
> will not be the case?
>
> Do we need to consider what is in the latest IEEE 754 2019 standard to see
> what enhancements are needed or just wait for C2X?
>
> Other Architectures
> *******************
>
> Should we look at what is needed for Sparc and Power9 to ensure that the
> (eventually-) chosen abstraction will work with these? Are there any other
> chips which need to be considered. If you look at more recent chipset
> designs, they have all been able to leverage the experience of working
> with IEEE 754 exceptions and rounding and follow the same style of use of
> an exception status and round control register . So I think catering for
> the current crop, plus those 2 mentioned above, should be adequate. But
> am I wrong?
>
> Is Power9 the same as PowerPC64?  I have never seen one. I know I do not
> know enough about this chip as the 128-bit floating point discussion talks
> about Rounding-To-Odd mode? I have tried to read the 1358 pages of the ISA
> 3.0 architecture manual but I have a long way to go before I know even 10%
> of what is in there. Are the newer beefy ARMS likely to change what they
> do not in the context of 'fenv' routines?
>
> Also, and I could be wrong, currently MUSL assumes that there is an
> integral type for every floating type.  On some architectures, I believe
> this is not always the case for 128-bit floating point numbers. On some
> Sparcs, I am not sure it was even the case for 64-bit numbers but that was
> a long time. I do not think that this restriction will influence anything
> here.  How it affects MUSL in general is another question irrelevant to
> this discussion.
>
> Summary
> *******
>
> aarch64 (arm)
>
> *       All assembler
>
> arm (bare)
>
> *       Empty
>
> i386
>
> *       All assembler
>
> *       The fldenv instruction to update the status registers has a serious
>         overhead which cannot be avoided in 'feraiseexcept'. No attempt is
>         made to optimize any unnecessary usage (as occurs in feclearexcept).
>         Note that fldenv also makes the 'feclearexcept' routine unavoidably
>         complex.
>
> *       What is the best way to query '__hwcap' from inline __asm__ statement,
>         specifically to verify if SSE instructions have to be supported
>
> m68k
>
> *       In C.
>
> *       Very clear
>
> *       feclearexcept and feraiseexcept
>
>                 if (exception_mask & ~FE_ALL_EXCEPT) return (-1)
>
>         Different to the way others handle invalid input. Is this cast
>         behaviour cast in stone based on standard documentation?
>
> mips/mips64/mipsn32
>
> *       All assembler
>
> *       Not overly complex.
>
> powerpc
>
> *       All assembler
>
> *       I think that this architecture has more exception bits than IEEE 754
>         specifies. It has lots of specific cases of FE_INVALID. This needs
>         to be considered when dealing with FE_INVALID.
>
> *       My knowledge of this assembler is poor. Please expand these comments!!
>
> powerpc64
>
> *       In C.
>
> *       Very clear
>
> *       Note that this architecture has more exception bits than IEEE 754
>         specifies. It has lots of specific cases of FE_INVALID. This needs
>         to be considered when dealing with FE_INVALID.
>
> *       This is the first time I have seen this style of coding to cast a
>         double to a union and then extract the data as a long.
>
>                 return (union {double f; long i;}) {get_fpscr_f()}.i;
>
>         Is this style of coding universally accepted within MUSL? From my
>         reading of other routines, it is normally done as
>
>                 union {double f; long i;} f = { get_fpscr_f() };
>
>                 return f.i;
>
>         Just curious.
>
> riscv64
>
> *       All assembler.
>
> *       Very clear.
>
> *       The architecture has obviously been done after a review of lots
>         of experience with the IEEE 754 standard.
>
> s390x
>
> *       In C.
>
> *       Very clear.
>
> *       Why is __fesetround(int) 'hidden'? Where is fesetround()?
>
> sh (SuperH??)
>
> *       In assembler
>
> *       I know zero about this assembler
>
> *       There is some pecularity about updating the environment. I have no
>         idea what is going on here. Anybody clear to elaborate?
>
> x32
>
> *       In assembler
>
> *       Why does 'feclearexcept' disrespect the flags by clearing ALL x86 bits?
>
> *       It is this really much the same as x86-64 (or am I wrong)?
>
> x86_64
>
> *       In assembler
>
> *       Why does 'feclearexcept' disrespect the flags by clearing ALL x86 bits?
>
> *** FINISH

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-17  3:36       ` Damian McGuckin
@ 2020-01-17  3:48         ` Damian McGuckin
  2020-01-17  3:53         ` David Edelsohn
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 64+ messages in thread
From: Damian McGuckin @ 2020-01-17  3:48 UTC (permalink / raw)
  To: musl

On Fri, 17 Jan 2020, Damian McGuckin wrote:

> *	Why is __fesetround(int) 'hidden'? Where is fesetround()?

Sorry. Ignore that.  Forgot to delete that line from my document.

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-16 19:33     ` Rich Felker
  2020-01-16 21:31       ` Damian McGuckin
@ 2020-01-17  3:36       ` Damian McGuckin
  2020-01-17  3:48         ` Damian McGuckin
                           ` (3 more replies)
  1 sibling, 4 replies; 64+ messages in thread
From: Damian McGuckin @ 2020-01-17  3:36 UTC (permalink / raw)
  To: musl


Feedback/Discussion please, especially in terms of what extra comments I 
need to make?  I hope I have not missed anything.

General Comments
****************

Except where noted, the approach taken to invalid input is to mask out the 
invalid data, use what data is left, and never inform the calling program 
of invalid data.

The i386(sometimes), X32 and X86-64 generally need to realise that they 
have both the X87 FPU and the SSE.  Are there scenarios where this will 
not be the case or do we need to plan for future scenarious where this 
will not be the case?

Do we need to consider what is in the latest IEEE 754 2019 standard to see 
what enhancements are needed or just wait for C2X?

Other Architectures
*******************

Should we look at what is needed for Sparc and Power9 to ensure that the 
(eventually-) chosen abstraction will work with these? Are there any other 
chips which need to be considered. If you look at more recent chipset 
designs, they have all been able to leverage the experience of working 
with IEEE 754 exceptions and rounding and follow the same style of use of 
an exception status and round control register . So I think catering for 
the current crop, plus those 2 mentioned above, should be adequate. But
am I wrong?

Is Power9 the same as PowerPC64?  I have never seen one. I know I do not 
know enough about this chip as the 128-bit floating point discussion talks 
about Rounding-To-Odd mode? I have tried to read the 1358 pages of the ISA 
3.0 architecture manual but I have a long way to go before I know even 10% 
of what is in there. Are the newer beefy ARMS likely to change what they
do not in the context of 'fenv' routines?

Also, and I could be wrong, currently MUSL assumes that there is an 
integral type for every floating type.  On some architectures, I believe 
this is not always the case for 128-bit floating point numbers. On some 
Sparcs, I am not sure it was even the case for 64-bit numbers but that was 
a long time. I do not think that this restriction will influence anything 
here.  How it affects MUSL in general is another question irrelevant to 
this discussion.

Summary
*******

aarch64 (arm)

*	All assembler

arm (bare)

*	Empty

i386

*	All assembler

*	The fldenv instruction to update the status registers has a serious
 	overhead which cannot be avoided in 'feraiseexcept'. No attempt is
 	made to optimize any unnecessary usage (as occurs in feclearexcept).
 	Note that fldenv also makes the 'feclearexcept' routine unavoidably
 	complex.

*	What is the best way to query '__hwcap' from inline __asm__ statement,
 	specifically to verify if SSE instructions have to be supported

m68k

*	In C.

*	Very clear

*	feclearexcept and feraiseexcept

 		if (exception_mask & ~FE_ALL_EXCEPT) return (-1)

 	Different to the way others handle invalid input. Is this cast
 	behaviour cast in stone based on standard documentation?

mips/mips64/mipsn32

*	All assembler

*	Not overly complex.

powerpc

*	All assembler

*	I think that this architecture has more exception bits than IEEE 754
 	specifies. It has lots of specific cases of FE_INVALID. This needs
 	to be considered when dealing with FE_INVALID.

*	My knowledge of this assembler is poor. Please expand these comments!!

powerpc64

*	In C.

*	Very clear

*	Note that this architecture has more exception bits than IEEE 754
 	specifies. It has lots of specific cases of FE_INVALID. This needs
 	to be considered when dealing with FE_INVALID.

*	This is the first time I have seen this style of coding to cast a
 	double to a union and then extract the data as a long.

 		return (union {double f; long i;}) {get_fpscr_f()}.i;

 	Is this style of coding universally accepted within MUSL? From my
 	reading of other routines, it is normally done as

 		union {double f; long i;} f = { get_fpscr_f() };

 		return f.i;

 	Just curious.

riscv64

*	All assembler.

*	Very clear.

*	The architecture has obviously been done after a review of lots
 	of experience with the IEEE 754 standard.

s390x

*	In C.

*	Very clear.

*	Why is __fesetround(int) 'hidden'? Where is fesetround()?

sh (SuperH??)

*	In assembler

*	I know zero about this assembler

*	There is some pecularity about updating the environment. I have no
 	idea what is going on here. Anybody clear to elaborate?

x32

*	In assembler

*	Why does 'feclearexcept' disrespect the flags by clearing ALL x86 bits?

*	It is this really much the same as x86-64 (or am I wrong)?

x86_64

*	In assembler

*	Why does 'feclearexcept' disrespect the flags by clearing ALL x86 bits?

*** FINISH

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-16 19:33     ` Rich Felker
@ 2020-01-16 21:31       ` Damian McGuckin
  2020-01-17  3:36       ` Damian McGuckin
  1 sibling, 0 replies; 64+ messages in thread
From: Damian McGuckin @ 2020-01-16 21:31 UTC (permalink / raw)
  To: musl

On Thu, 16 Jan 2020, Rich Felker wrote:

> That's one point of having the high level logic in C: not having to
> repeat it in each arch where it's potentially subtly wrong. The C code
> should check the validity of the mask before attempting to use the
> arch primitive to set it.

All check it but

a)	most then mask away the rubbish, and then use it, but

b)	the 68k simply gives up and returns (-1).

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-16 18:56   ` Damian McGuckin
@ 2020-01-16 19:33     ` Rich Felker
  2020-01-16 21:31       ` Damian McGuckin
  2020-01-17  3:36       ` Damian McGuckin
  0 siblings, 2 replies; 64+ messages in thread
From: Rich Felker @ 2020-01-16 19:33 UTC (permalink / raw)
  To: musl

On Fri, Jan 17, 2020 at 05:56:06AM +1100, Damian McGuckin wrote:
> On Thu, 16 Jan 2020, Rich Felker wrote:
> 
> >As an aside, rather than implementing x86-specific C versions of
> >these, I'd like to end up with a general framework where the arch
> >just exposes a header (fenv_arch.h?) defining some primitives, and
> >the actual fenv function logic is all shared C. I'm not sure what
> >the right generalization is though. It looks like all archs have a
> >get/set exception-flags (status word) operation, but the rest
> >varies a bit.
> 
> I will try and put together a summary of what things do and we can
> go from there. It will need input from somebody with way more
> expertise on ARM and RISC-V (and X87) than I.
> 
> >Would you be interested in assessing what kind of abstraction makes
> >sense here?
> 
> I think the abstraction might need to be consensus. Some of it is
> not even 'fenv' related. For example, some routines only ever return
> zero, while for other FPUs, the same routine can return a (-1) if
> the exception mask
> had an component (bit) which was not a valid exception.

That's one point of having the high level logic in C: not having to
repeat it in each arch where it's potentially subtly wrong. The C code
should check the validity of the mask before attempting to use the
arch primitive to set it.

Rich

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-16 16:14 ` Rich Felker
@ 2020-01-16 18:56   ` Damian McGuckin
  2020-01-16 19:33     ` Rich Felker
  0 siblings, 1 reply; 64+ messages in thread
From: Damian McGuckin @ 2020-01-16 18:56 UTC (permalink / raw)
  To: musl

On Thu, 16 Jan 2020, Rich Felker wrote:

> As an aside, rather than implementing x86-specific C versions of these, 
> I'd like to end up with a general framework where the arch just exposes 
> a header (fenv_arch.h?) defining some primitives, and the actual fenv 
> function logic is all shared C. I'm not sure what the right 
> generalization is though. It looks like all archs have a get/set 
> exception-flags (status word) operation, but the rest varies a bit.

I will try and put together a summary of what things do and we can go from 
there. It will need input from somebody with way more expertise on ARM and 
RISC-V (and X87) than I.

> Would you be interested in assessing what kind of abstraction makes
> sense here?

I think the abstraction might need to be consensus. Some of it is not even 
'fenv' related. For example, some routines only ever return zero, while 
for other FPUs, the same routine can return a (-1) if the exception mask
had an component (bit) which was not a valid exception.

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-16  4:30 Damian McGuckin
  2020-01-16 15:11 ` Markus Wichmann
@ 2020-01-16 16:14 ` Rich Felker
  2020-01-16 18:56   ` Damian McGuckin
  1 sibling, 1 reply; 64+ messages in thread
From: Rich Felker @ 2020-01-16 16:14 UTC (permalink / raw)
  To: musl

On Thu, Jan 16, 2020 at 03:30:03PM +1100, Damian McGuckin wrote:
> 
> I was just looking at fenv.s in x86-64 with a view to doing this as
> C-code with some __asm__. I have limited experience with playing
> with this stuff in an SSE-only environment, i.e. no 80-bit floats.
> Before I attempt anything, I need to expand my knowledge.
> 
> My observations on
> 
> 	musl-1.1.24/src/fenv/x86_64/fenv.s
> 
> * feclearexcept
> 
> 	considers both the X87 status word and the SSE MXCSR
> 
> * feraiseexcept
> 
> 	ignores the X87 status word and considers only the SSE MXCSR
> 
> * fetesteexcept
> 
> 	considers both the X87 status word and the SSE MXCSR
> 
> * fesetround
> 
> 	considers both the X87 control word and the SSE MXCSR
> 
> * fegetround
> 
> 	ignores the X87 control word and considers only the SSE MXCSR
> 
> Why is the X87 component of the FPU control/status sometimes ignored
> and sometimes not?
> 
> Does the X87 control or status bits automatically flow through to the
> equivalent stuff in the SSE or vica-versa?
> 
> I assume that 'fwait' is irrelevant and unnecessary, even if one is
> using the x87 FPU?

As an aside, rather than implementing x86-specific C versions of
these, I'd like to end up with a general framework where the arch just
exposes a header (fenv_arch.h?) defining some primitives, and the
actual fenv function logic is all shared C. I'm not sure what the
right generalization is though. It looks like all archs have a get/set
exception-flags (status word) operation, but the rest varies a bit.

Would you be interested in assessing what kind of abstraction makes
sense here?

Rich

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [musl] Considering x86-64 fenv.s to C
  2020-01-16  4:30 Damian McGuckin
@ 2020-01-16 15:11 ` Markus Wichmann
  2020-01-16 16:14 ` Rich Felker
  1 sibling, 0 replies; 64+ messages in thread
From: Markus Wichmann @ 2020-01-16 15:11 UTC (permalink / raw)
  To: musl

On Thu, Jan 16, 2020 at 03:30:03PM +1100, Damian McGuckin wrote:
>
> I was just looking at fenv.s in x86-64 with a view to doing this as C-code
> with some __asm__. I have limited experience with playing with this stuff in
> an SSE-only environment, i.e. no 80-bit floats.  Before I attempt anything,
> I need to expand my knowledge.
>
> My observations on
>
> 	musl-1.1.24/src/fenv/x86_64/fenv.s
>
> * feclearexcept
>
> 	considers both the X87 status word and the SSE MXCSR
>
> * feraiseexcept
>
> 	ignores the X87 status word and considers only the SSE MXCSR
>
> * fetesteexcept
>
> 	considers both the X87 status word and the SSE MXCSR
>
> * fesetround
>
> 	considers both the X87 control word and the SSE MXCSR
>
> * fegetround
>
> 	ignores the X87 control word and considers only the SSE MXCSR
>
> Why is the X87 component of the FPU control/status sometimes ignored and
> sometimes not?
>

feraiseexcept() is merely supposed to set a flag in the fenv such that
fetestexcept() can pick it up. Therefore it is sufficient to set the
flag in either the X87 SW or the MXCSR, and the MXCSR is nicer to
access.

fegetround() has to determine rounding mode. Since no conformant code
can set the rounding modes in X87 and SSE differently from each other,
it is sufficient to read it from either source, and again, the MXCSR is
nicer to access.

> Does the X87 control or status bits automatically flow through to the
> equivalent stuff in the SSE or vica-versa?
>

No, the CPU treats X87 and SSE as separate. That's why fetestexcept()
has to calculate the bitwise OR of the two sources.

> I assume that 'fwait' is irrelevant and unnecessary, even if one is using
> the x87 FPU?
>

Yes, fwait was last relevant on the 386, when an external 387 (was there
a 487?) was actually a possibility, but ever since the 486DX, the FPU
has been moved into the processor. Therefore FPU instructions are now
synchronous to program execution, and a synchronization instruction is
superfluous. Doesn't hurt, either, it's basically a NOP now. A NOP that
can generate #NM under some circumstances, but it doesn't change
anything.

> Regards - Damian
>

Ciao,
Markus

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [musl] Considering x86-64 fenv.s to C 
@ 2020-01-16  4:30 Damian McGuckin
  2020-01-16 15:11 ` Markus Wichmann
  2020-01-16 16:14 ` Rich Felker
  0 siblings, 2 replies; 64+ messages in thread
From: Damian McGuckin @ 2020-01-16  4:30 UTC (permalink / raw)
  To: musl


I was just looking at fenv.s in x86-64 with a view to doing this as C-code 
with some __asm__. I have limited experience with playing with this stuff 
in an SSE-only environment, i.e. no 80-bit floats.  Before I attempt 
anything, I need to expand my knowledge.

My observations on

 	musl-1.1.24/src/fenv/x86_64/fenv.s

* feclearexcept

 	considers both the X87 status word and the SSE MXCSR

* feraiseexcept

 	ignores the X87 status word and considers only the SSE MXCSR

* fetesteexcept

 	considers both the X87 status word and the SSE MXCSR

* fesetround

 	considers both the X87 control word and the SSE MXCSR

* fegetround

 	ignores the X87 control word and considers only the SSE MXCSR

Why is the X87 component of the FPU control/status sometimes ignored and 
sometimes not?

Does the X87 control or status bits automatically flow through to the
equivalent stuff in the SSE or vica-versa?

I assume that 'fwait' is irrelevant and unnecessary, even if one is using 
the x87 FPU?

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer

^ permalink raw reply	[flat|nested] 64+ messages in thread

end of thread, back to index

Thread overview: 64+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-03 14:16 [musl] PPC64(LE) support in musl requires ALTIVEC Romain Naour
2020-02-03 14:50 ` Rich Felker
2020-02-03 15:42   ` Romain Naour
2020-02-03 16:02     ` Jeffrey Walton
2020-02-03 16:18       ` David Edelsohn
2020-02-03 16:51   ` A. Wilcox
2020-02-05  1:32   ` [musl] Considering x86-64 fenv.s to C Damian McGuckin
  -- strict thread matches above, loose matches on Subject: below --
2020-01-16  4:30 Damian McGuckin
2020-01-16 15:11 ` Markus Wichmann
2020-01-16 16:14 ` Rich Felker
2020-01-16 18:56   ` Damian McGuckin
2020-01-16 19:33     ` Rich Felker
2020-01-16 21:31       ` Damian McGuckin
2020-01-17  3:36       ` Damian McGuckin
2020-01-17  3:48         ` Damian McGuckin
2020-01-17  3:53         ` David Edelsohn
2020-01-17 14:13           ` Rich Felker
2020-01-17 14:19             ` David Edelsohn
2020-01-17 14:53         ` Rich Felker
2020-01-18  4:45           ` Damian McGuckin
2020-01-18  5:29             ` Rich Felker
2020-01-19  8:50               ` Damian McGuckin
2020-01-19  9:07           ` Damian McGuckin
2020-01-19 10:42             ` Szabolcs Nagy
2020-01-19 12:25               ` Damian McGuckin
2020-01-20  5:32           ` Damian McGuckin
2020-01-20 17:38             ` Rich Felker
2020-01-21  3:53           ` Damian McGuckin
2020-01-21  4:22             ` Rich Felker
2020-01-21  4:46               ` Damian McGuckin
2020-01-21  7:26                 ` Damian McGuckin
2020-01-17 16:41         ` Markus Wichmann
2020-01-18  1:15           ` Szabolcs Nagy
2020-01-18  5:03             ` Damian McGuckin
2020-01-18  5:37               ` Rich Felker
2020-01-18  9:40                 ` Szabolcs Nagy
2020-01-24  0:42                   ` Damian McGuckin
2020-01-24  1:11                     ` Rich Felker
2020-01-24  4:13                       ` Damian McGuckin
2020-01-24  4:55                         ` Rich Felker
2020-01-24  6:08                           ` Damian McGuckin
2020-01-24 13:44                             ` Rich Felker
2020-01-24 14:45                               ` Damian McGuckin
2020-01-24 23:59                               ` Damian McGuckin
2020-01-25  0:11                                 ` Rich Felker
2020-01-26  3:28                                   ` Damian McGuckin
2020-01-26  3:28                                   ` Damian McGuckin
2020-01-26  3:30                                   ` Damian McGuckin
2020-01-26  3:32                                   ` Damian McGuckin
2020-01-26  5:25                                   ` Damian McGuckin
2020-01-27  3:32                                   ` Damian McGuckin
2020-01-27  5:39                                   ` Damian McGuckin
2020-02-02  0:47                                   ` Damian McGuckin
2020-02-03  0:54                                   ` Damian McGuckin
2020-02-03  2:09                                     ` Rich Felker
2020-02-03  2:12                                       ` Damian McGuckin
2020-02-22 20:17                                         ` Rich Felker
2020-02-22 20:53                                           ` Damian McGuckin
2020-02-07 21:25                                   ` Damian McGuckin
2020-02-07 21:38                                     ` Rich Felker
2020-02-07 23:53                                       ` Damian McGuckin
2020-02-09  5:04                                       ` Damian McGuckin
2020-01-24  7:10                           ` Damian McGuckin
2020-01-18  5:04             ` Markus Wichmann

mailing list of musl libc

Archives are clonable: git clone --mirror http://inbox.vuxu.org/musl

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://inbox.vuxu.org/vuxu.archive.musl


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git