From: Damian McGuckin <damianm@esi.com.au>
To: musl@lists.openwall.com
Subject: Re: [musl] Considering x86-64 fenv.s to C
Date: Fri, 24 Jan 2020 15:13:32 +1100 (AEDT) [thread overview]
Message-ID: <alpine.LRH.2.02.2001241227030.6298@key0.esi.com.au> (raw)
In-Reply-To: <20200124011122.GP30412@brightrain.aerifal.cx>
Let me know if I missed something in the following:
On Thu, 23 Jan 2020, Rich Felker wrote:
>> #ifndef FE_ALL_EXCEPT
>> #define FE_VALID_EXCEPT (FE_INEXACT|FE_DIV_BY_ZERO|FE_UNDERFLOW|FE_OVERFLOW)
>> #define FE_ALL_EXCEPT (FE_INVALID|FE_VALID_EXCEPT)
>> #endif
>
> I don't understand what this is for. FE_ALL_EXCEPT is defined in the
> public fenv.h and FE_VALID_EXCEPT is not used.
For the first question, it was done purely to avoid the need to define
FE_ALL_EXCEPT in all of the various 'fenv.h' files. I am happy to throw it
away. For the second question, at one stage, I considered having the
concept of what was anything except an INVALID exception.
I will remove those 4 lines.
>> /*
>> * The usual approach to validating an exception mask is to throw out
>> * anything which is illegal. Some systems will NOT do this, choosing
>> * instead to return a non-zero result on encountering an illegal mask
>> */
>
> The latter is not desirable behavior and not something you need to be
> trying to reproduce. It's a bug or at least an unspecified case we
> don't care about in existing code.
Wonderful. I was purely trying to preserve historical performance. I am
more than happy to NOT do this. That makes the next problem easy to fix.
>> #ifndef FE_VALIDATE_EXCEPT
>> #define FE_VALIDATE_EXCEPT(e) e &= FE_ALL_EXCEPT
>> #endif
>
> Write things like this directly in code; don't hide them behind
> macros. Doing that just makes the code hard to read because you can't
> tell what it's doing without searching for the macro definition. (Also
> the macro is not protected by parens as written but that's easily
> fixable.)
Because we can ignore that non-zero return case, we no longer need
an architecture dependent macro for this so it can go directly in
the code. So, again, an easy problem to fix.
>> /*
>> * There is usually no need to mess the generic valid exceptions bits
>> * given to 'feclearexcept' and 'feraiseexcept' so define a 'nothing'
>> * macro for such a scenario - is this the MUSL accepted style????
>> */
>> #ifndef FE_QUALIFY_CLEAR_EXCEPT
>> #define FE_QUALIFY_CLEAR_EXCEPT(excepts) (0)
>> #endif
>> #ifndef FE_QUALIFY_RAISE_EXCEPT
>> #define FE_QUALIFY_RAISE_EXCEPT(excepts) (0)
>> #endif
>
> Can you clarify how these would be defined for powerpc? Could the logic
> just be embedded in get_sr/set_sr if needed?
Address the second question first, my original aim was to keep any logic
out of where embedded assembler was used, mainly because of my own very
irregular experience with it. I could also not convince myself that
get_sr and set_sr may need to know from where they are called. In the
case of the powerpc64, with the SR and CR being the same, that adds to the
complexity, so I gave up.
Answering your first question, for the powerpc64.
#define QUALIFY_CLEAR_EXCEPT(e) if (e & FE_INVALID) e |= FE_ALL_INVALID
#define QUALIFY_RAISE_EXCEPT(e) if (e & FE_INVALID) e |= FE_INVALID_SOFTWARE
An alternative is to avoid these macros and then
#ifndef FE_ALL_INVALID
#define FE_ALL_INVALID 0
#endif
#ifndef FE_INVALID_SOFTWARE
#define FE_INVALID_SOFTWARE 0
#endif
and have respectively the following code in feclear.. and feraise..
if (excepts & FE_INVALID)
excepts |= FE_ALL_INVALID;
and
if (excepts & FE_INVALID)
excepts |= FE_INVALID_SOFTWARE;
An optimize should see the OR with zero for most architectures and then
ignore that whole 'if' statement because the action was a NO-OP.
>
>> /*
>> * Compute the rounding mask with some rigid logic
>> */
>> #ifndef FE_TO_NEAREST
>> #define FE_TO_NEAREST 0
>> #define ROUND_MASK ((unsigned int) 0)
>> #else
>> #ifndef FE_TOWARDS_ZERO
>> #define ROUND_MASK ((unsigned int) 0)
>> #else
>> #ifdef FE_DOWNWARD
>> #ifdef FE_UPWARD
>> #define ROUND_MASK ((unsigned int) (FE_DOWNWARD|FE_UPWARD|FE_TOWARDZERO))
>> #else
>> #define ROUND_MASK UNSUPPORTED rounding
>> #endif
>> #else
>> #ifdef FE_UPWARD
>> #define ROUND_MASK UNSUPPORTED rounding
>> #else
>> #define ROUND_MASK ((unsigned int) (FETOWARDZERO))
>> #endif
>> #endif
>> #endif
>> #endif
>
> I don't see why you have "UNSUPPORTED rounding" cases here. Any
> combination should be valid.
My deliberate assumption was that the only legal combinations were
round to nearest only
or
round to nearest and truncate only
or
round to nearest, truncate, and round to either +/- INFINITY
Whether that assumption is valid is another thing but I am happy to be
proven wrong. If somebody defined an architecture where that assumption
was not valid, I wanted the compiler to complain at me.
> Since these are not bitmasks, probably the arch should just have to
> define the rounding mode mask. For example if the values were 0b000,
> 0b001, 0b100, and 0b101, but the 0 in the middle was significant, it
> should probably be treated as part of the mask.
That was not the way I read it but again, I am happy to be proven wrong.
If the net OR'ing of the various types of rounding cannot be treated as a
mask, then every *BSD libc implementations that I know will need to be
fixed because they do treat it that way. I did run tests and none of the
ones I tested from BSD and MUSL had that structure. Maybe my test code
was wrong. They all seems to be 2 bits where TO_NEAREST was all bits
zero and one of the others was the some of the other two.
> If you really want to generate the mask, something like
>
> #ifdef FE_TONEAREST
> #define FE_TONEAREST_MASK FE_TONEAREST
> #else
> #define FE_TONEAREST_MASK 0
> #endif
> ...
> #define ROUND_MASK (FE_TONEAREST_MASK | FE_DOWNWARD_MASK | ...)
>
> should be simpler and comprehensive.
I will run a test but I think they produce the same result for all
the architecture for which I have definition files.
>> int
>> feclearexcept(excepts)
>> {
>> FE_VALIDATE_EXCEPT(excepts);
>> FE_QUALIFY_CLEAR_EXCEPT(excepts);
>> setsr(getsr() & ~((unsigned int) excepts));
>> return(0);
>> }
>
> No function-like parens around return value, no gratuitous casts.
No problems. Sorry, gratuitous casting is a habit to stop various linters
complaining. I went looking for a MUSL style guide but could not find one.
>> static inline int
>> __raisearithmetically(int excepts)
>> {
>> /*
>> * assume single OP is faster than double OP
>> */
>> const float one = (float) 1;
>> const float zero = (float) 0;
>> const float tiny = (float) 0x1.0p-126;
>> const float huge = (float) 0x1.0p+126;
>> volatile float x;
>>
>> /*
>> * if it is just a simple exception, arithmetic expressions are optimal
>> */
>> switch(excepts)
>> {
>> case FE_INVALID:
>> x = zero, x /= x;
>> break;
>> case FE_DIVBYZERO:
>> x = zero, x = one / x;
>> break;
>> case FE_INEXACT:
>> x = tiny, x += one;
>> break;
>> case (FE_OVERFLOW | FE_INEXACT):
>> x = huge, x *= x;
>> break;
>> case (FE_UNDERFLOW | FE_INEXACT):
>> x = tiny, x *= x;
>> break;
>> default: /* if more than one exception exists, a sledgehammer is viable */
>> setsr(getsr() | ((unsigned int) excepts));
>> break;
>> }
>> return(0);
>> }
>
> This is probably ok (aside from style things mentioned above), but
> simply recording the flags in software without writing them to the
> status register at all may be preferable. In that case we would not even
> need a set_sr primitive, only get_sr and clear_sr.
Yes. I like the idea. But I was deliberately not trying to change anything
external to 'fenv.c'. Also, is the handling of __pthread_self() protected
into the future?
> Doing this would also simplify the ppc special-casing for FE_INVALID I
> think..
I think only for feraiseexcept as you show. It does not simplify
feclearexcept as far as I can see.
>> int
>> feraiseexcept(int excepts)
>> {
>> FE_VALIDATE_EXCEPT(excepts);
>> FE_QUALIFY_RAISE_EXCEPT(excepts);
>> return __raisearithmetically(excepts);
>> }
>
> Then this could just be __pthread_self()->fpsr |= excepts;
Very clean and simple. Is it portable outside of Linux? Not that this is
relevant for MUSL which is designed for Linux-based devices but I do work
on non-Linux (and no C/C++) systems so I am curious.
>> int
>> fegetround(void)
>> {
>> return (int) (getcr() & ROUND_MASK);
>> }
>>
>> int
>> fesetround(int rounding_mode)
>> {
>> if ((rounding_mode & ~ROUND_MASK) == 0)
>> {
>> unsigned int mode = ((unsigned int) rounding_mode);
>>
>> return (setcr((getcr() & ~ROUND_MASK) | (mode & ROUND_MASK)), 0);
>> }
>> return(-1);
>> }
>
> I would put the error test at the beginning rather than putting the
> whole body inside {}:
>
> if (rounding_mode & ~ROUND_MASK) return -1;
Happy to. A while ago, I found that the alternative produces more optimal
code when dealing with floating point dominant routines. But that is huge
generalization and I have not tested that generalization in this case.
>> int
>> fegetenv(fenv_t *envp)
>> {
>> return ((gete(envp)), 0);
>> }
>
> Is there a reason for using comma operator here?
I used it to let the compiler prove that my #defines for gete() and sete()
worked as if they were a single statement. That gave me some rigour on the
design of that macro, which in the case of the m68k (and eventually the
i386), does demand a comma operator.
But for the production code, it will disappear.
>> int
>> fesetenv(fenv_t *envp)
>> {
>> fenv_t envpd = FE_DFL_ENV_DATA, *e = envp == FE_DFL_ENV ? &envpd : envp;
>>
>> return ((sete(envp)), 0);
>> }
>
> It might be preferable to use a static const object for default env;
> I'm not sure. Not a big deal either way though.
I thought of that. I used to always do that until 5 years ago until I
found that an auto variable would often produce more optimal code. It
needs to be checked but it is irrelevant to the fundament design issues.
Ongoing:
I will incorporate your suggestions except for the __pthread_self()->fpsr
tweaks which I like but cannot test across multiple O/S's and then we can
review this. Should I repost to make sure I got everything and we have a
reference point?
I will post some inline assembler for your review and entertainment but in
the spirit of a generic template, they should be reviewed in isolation.
Also, I notice that on the mips, the single unsigned in an 'fenv_t' is
encapsulated in a struct. Is this historical, mandatory, stylistic, a way
to fix a compiler bug, compatability or something else? I also notice that
there is a subtle difference in the 'fegetenv' assembler between
diff mips/*.S mips64/*.S
64c64
< addiu $5, $4, 1
---
> daddiu $5, $4, 1
I guess this is just struct alignment issues? Sorry, it is 20 years since
I worked with the assembler on a MIPS so my knowledge is really rusty.
Regards - Damian
Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer
next prev parent reply other threads:[~2020-01-24 4:13 UTC|newest]
Thread overview: 63+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-01-16 4:30 Damian McGuckin
2020-01-16 15:11 ` Markus Wichmann
2020-01-16 16:14 ` Rich Felker
2020-01-16 18:56 ` Damian McGuckin
2020-01-16 19:33 ` Rich Felker
2020-01-16 21:31 ` Damian McGuckin
2020-01-17 3:36 ` Damian McGuckin
2020-01-17 3:48 ` Damian McGuckin
2020-01-17 3:53 ` David Edelsohn
2020-01-17 14:13 ` Rich Felker
2020-01-17 14:19 ` David Edelsohn
2020-01-17 14:53 ` Rich Felker
2020-01-18 4:45 ` Damian McGuckin
2020-01-18 5:29 ` Rich Felker
2020-01-19 8:50 ` Damian McGuckin
2020-01-19 9:07 ` Damian McGuckin
2020-01-19 10:42 ` Szabolcs Nagy
2020-01-19 12:25 ` Damian McGuckin
2020-01-20 5:32 ` Damian McGuckin
2020-01-20 17:38 ` Rich Felker
2020-01-20 21:11 ` [musl] Triggering Overflow (or Underflow) without triggering Inexact on i386 Damian McGuckin
2020-01-20 22:32 ` Szabolcs Nagy
2020-01-21 3:53 ` [musl] Considering x86-64 fenv.s to C Damian McGuckin
2020-01-21 4:22 ` Rich Felker
2020-01-21 4:46 ` Damian McGuckin
2020-01-21 7:26 ` Damian McGuckin
2020-01-17 16:41 ` Markus Wichmann
2020-01-18 1:15 ` Szabolcs Nagy
2020-01-18 5:03 ` Damian McGuckin
2020-01-18 5:37 ` Rich Felker
2020-01-18 9:40 ` Szabolcs Nagy
2020-01-24 0:42 ` Damian McGuckin
2020-01-24 1:11 ` Rich Felker
2020-01-24 4:13 ` Damian McGuckin [this message]
2020-01-24 4:55 ` Rich Felker
2020-01-24 6:08 ` Damian McGuckin
2020-01-24 13:44 ` Rich Felker
2020-01-24 14:45 ` Damian McGuckin
2020-01-24 23:59 ` Damian McGuckin
2020-01-25 0:11 ` Rich Felker
2020-01-26 3:28 ` Damian McGuckin
2020-01-26 3:28 ` Damian McGuckin
2020-01-26 3:30 ` Damian McGuckin
2020-01-26 3:32 ` Damian McGuckin
2020-01-26 5:25 ` Damian McGuckin
2020-01-27 3:32 ` Damian McGuckin
2020-01-27 5:39 ` Damian McGuckin
2020-02-02 0:47 ` Damian McGuckin
2020-02-03 0:54 ` Damian McGuckin
2020-02-03 2:09 ` Rich Felker
2020-02-03 2:12 ` Damian McGuckin
2020-02-22 20:17 ` Rich Felker
2020-02-22 20:53 ` Damian McGuckin
2020-02-23 5:41 ` Damian McGuckin
2020-02-23 6:25 ` Rich Felker
2020-02-23 8:35 ` Damian McGuckin
2020-02-07 21:25 ` Damian McGuckin
2020-02-07 21:38 ` Rich Felker
2020-02-07 23:53 ` Damian McGuckin
2020-02-09 5:04 ` Damian McGuckin
2020-01-24 7:10 ` Damian McGuckin
2020-01-18 5:04 ` Markus Wichmann
2020-02-03 14:16 [musl] PPC64(LE) support in musl requires ALTIVEC Romain Naour
2020-02-03 14:50 ` Rich Felker
2020-02-05 1:32 ` [musl] Considering x86-64 fenv.s to C Damian McGuckin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LRH.2.02.2001241227030.6298@key0.esi.com.au \
--to=damianm@esi.com.au \
--cc=musl@lists.openwall.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.vuxu.org/mirror/musl/
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).