From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/13859 Path: news.gmane.org!.POSTED.blaine.gmane.org!not-for-mail From: Szabolcs Nagy Newsgroups: gmane.linux.lib.musl.general Subject: Re: [PATCH 00/18] math updates Date: Sun, 24 Feb 2019 22:00:31 +0100 Message-ID: <20190224210016.GA21289@port70.net> References: <20181208125009.GY21289@port70.net> <20190223233602.GO23599@brightrain.aerifal.cx> Reply-To: musl@lists.openwall.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Injection-Info: blaine.gmane.org; posting-host="blaine.gmane.org:195.159.176.226"; logging-data="51900"; mail-complaints-to="usenet@blaine.gmane.org" User-Agent: Mutt/1.10.1 (2018-07-13) To: musl@lists.openwall.com Original-X-From: musl-return-13875-gllmg-musl=m.gmane.org@lists.openwall.com Sun Feb 24 22:00:49 2019 Return-path: Envelope-to: gllmg-musl@m.gmane.org Original-Received: from mother.openwall.net ([195.42.179.200]) by blaine.gmane.org with smtp (Exim 4.89) (envelope-from ) id 1gy0tC-000DJZ-4s for gllmg-musl@m.gmane.org; Sun, 24 Feb 2019 22:00:46 +0100 Original-Received: (qmail 1253 invoked by uid 550); 24 Feb 2019 21:00:44 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Original-Received: (qmail 1235 invoked from network); 24 Feb 2019 21:00:43 -0000 Mail-Followup-To: musl@lists.openwall.com Content-Disposition: inline In-Reply-To: <20190223233602.GO23599@brightrain.aerifal.cx> Xref: news.gmane.org gmane.linux.lib.musl.general:13859 Archived-At: * Rich Felker [2019-02-23 18:36:02 -0500]: > On Sat, Dec 08, 2018 at 01:50:10PM +0100, Szabolcs Nagy wrote: > > add new code from > > > > https://github.com/ARM-software/optimized-routines > > I think when I merge this, I should include a sha1 hash to indicate > what revision it was based on. This documents both code history and > license change. There should also be an update to the COPYRIGHT file > indicating ARM copyright w/MIT license on new math code. ok > > with small modifications: > > - remove all code paths related to errno handling (WANT_ERRNO cases) > > - reformat with clang-format to linux style instead of gnu style. > > - drop non-default configs (polynomial and table size settings) > > - remove SNaN support (but keep the code so adding it back is easy) > > - use __FP_FAST_FMA feature test with __builtin_fma > > I think this should also be conditioned on defined(__GNUC__) or > __GNUC__>=something or similar. It's not clear that __FP_FAST_FMA is > supposed to imply that. But it looks like it's also used to condition > presence of data in headers, so perhaps libm.h should define something > indicating whether to use __builtin_fma to put the logic all in one > place. in gcc __FP_FAST_FMA is only defined if __builtin_fma is inlined as a single instruction, i assumed other implementations would not define __FP_FAST_FMA unless they also supported __builtin_fma and able to inline it. (clang does not define it, icc seems to, they both support __builtin_fma with inlining) > > - some macros got renamed: barriers, unlikely, HIDDEN > > - kept TOINT_INTRINSICS code paths, but it's never set > > (requires __builtin_round and __builtin_lround support as single insn) > > That's ok. Do you know if it helps enough that we should consider > adding these later? the generic c solution is problematic for non-nearest rounding mode, because there is no way to efficiently compute a remainder that works in all rounding modes: r = x - toint(x) // r in [-0.5, 0.5] e.g. i don't yet know how to do this reasonably for double prec trig functions. it turns out in exp/exp2/pow it's ok to have r in [-1,1] if we accept slightly worse quality for non-nearest rm. then x+Shift-Shift works as toint and it should not be much slower than a toint(x) instruction (the latency will be a bit higher, i'd say ~5% for an exp call) but in trig functions you really want r in [-0.5-tiny, 0.5+tiny], and you also want -x to give -r so trig function symmetry is not broken (ideally in all rounding modes). this is easy to achieve with a rounding mode independent toint instruction, but not with normal arithmetic operators. (e.g. one approach is to just call round and lround and on targets where there is single instruction it will be very fast, on others it will be very slow, but at least the behaviour will be consistent. in trig functions the arg reduction is not in the critical path for common inputs so the cost may be justified, but in exp such approach would be bad) > > - error handling is split up across several translation units. > > - data layout declarations are split into several _data.h headers > > > > todo: > > - fp_barrier implementation for various targets > > Isn't the only thing that's needed the asm constraint letter for float > regs? This would be easier than duplicating the code per-target. What > are the exact barrier semantics? Hiding how the output depends on the > input from buggy optimizer? Or forcing that the evaluation happen? it's mostly just constraint letter, but e.g. on soft-fp it is probably not needed (fenv does not work anyway). there are several use-cases for the barrier: - ensure a value is computed so the side-effect happens even if the result is otherwise unused. - ensure constant computation is not folded i.e. the value should be hidden from the compiler. - ensure an operation is not hoisted from a conditional path. (the current barrier does not guarantee this i think, but at least the inline asm prevents if conversions to single x=a?b:c; instruction with both b and c evaluated unconditionally, with x=a?barrier(b):c the compiler in principle can still do the evals unconditionally, but in practice it does not want to) i didn't want to introduce many barriers for the different use-cases, in general things work (the compiler cannot do fenv breaking transformations without whole program analysis) and only a few places need some little tweaks. trying to provide correct fenv behaviour assuming the compiler does not know anything about fenv is a bit fragile no matter how it is done, i'm not sure what's a reasonable definition for the barriers (the definition would likely rely on no-lto) or if there is a cleaner way to achieve the same result (ideally compilers would provide a __builtin_fp_barrier thingy). > > - musl does not enable fma contraction, new code would be better with it > > Yes, we could do it selectively if gcc honored the pragma... it honors the cflag, e.g. we can add -ffp-contract=fast to math/*.c (i think this should be ok, if we run into trouble we can remve the optimization) > > - musl disables fabs etc inlining, using builtins would help > > Yes. I've thought of trying to make the src/include/*.h patch this up > to use the builtins that we can. The problem is not math-specific; > string functions (esp memcpy/memset) are an even bigger issue. i agree. > > - FENV_ACCESS pragma should be set in some top level header in principle > > (like features.h) > > Is there a reason? I like it as documentation of what actually wants > it, even if gcc currently isn't honoring it. libm.h might make sense > since all of libm needs it. the iso c definition is "all or nothing" kind, so any code that may run under non-default rounding mode must be compiled with FENV_ACCESS "on" (otherwise ub) so if we want to support libc calls with non-default rounding mode then everything needs it on. unfortunately "on" also implies that any call (and possibly inline asm?) may change the rounding mode which prevents some useful optimizations. (the same story holds for status flag checks: any call may modify or check the flags), which is why compiler devs don't like this pragma thing. > > - use the new helper functions/macros in existing code. > > > > overall libc.so code size increase on x86_64: +8540 bytes > > > > (i'll send the patches as attachments in two parts, because they are too > > big for one mail) > > > > Szabolcs Nagy (18): > > define FP_FAST_FMA* when fma* can be inlined > > math: move complex math out of libm.h > > math: add asuint, asuint64, asfloat and asdouble > > math: remove sun copyright from libm.h > > math: add fp_arch.h with fp_barrier and fp_force_eval > > math: add eval_as_float and eval_as_double > > math: add single precision error handling functions > > math: add double precision error handling functions > > math: add macros for static branch prediction hints > > math: add configuration macros > > A few of these things (esp the ones that facilitate broken compilers > that don't implement excess-precision right) I'm not really a fan of, > or at least don't know if I am (I might be convinced if I'd worked on > the code they affect). But as long as they're what work well for you > developing and improving this code, and make it easier to share with > other projects, I have no objection to going forward with them. eval_as_float assumes the compiler will honor the assignment, in practice that's not enough for gcc with default excess precision handling but if we have the eval_as_float call in all the relevant places, it is easy to reuse the code externally with non-standard build flags: only the eval_as_float definition needs to be changed (e.g. by adding a barrier). > > Sorry it took a while to get to this. I know sometime last release > cycle I said I'd do it in the next cycle, but didn't remember to look > back at it right away. The performance figures and code improvements > look nice and I definitely want to have this in 1.1.22. ok.