* [musl] roundf() (and round(), and ...) @ 2024-06-17 1:48 Damian McGuckin 2024-06-18 12:23 ` Szabolcs Nagy 0 siblings, 1 reply; 26+ messages in thread From: Damian McGuckin @ 2024-06-17 1:48 UTC (permalink / raw) To: MUSL Before I submit any suggestion for a change to roundf/round/.., could I get a critique of the style to make sure I compy with the style guide. Also, I have not used anything except for the now default FLT_EVAL_METHOD so I need guidance as to where to use 'float_t'. However, the only place where such arithmetic (here) might be affected is the expression 'a + a' which is exact anyway so I think it is irrelevant. Thanks - Damian Code follows: float roundf(float x) { static const int b = 0x7f; const float a = fabsf(x); union { float f; uint32_t _f; } r = { a }, _x = { x }; if (((int) (r._f >> 23)) < b + 23) /* i.e. effectively |x| < 2^(p-1) */ { /* * capture the sign of the argument */ const uint32_t s = _x._f - r._f; /* * this should achieve rounding to nearest with any * ties (half-way cases) being rounded away-from-zero. * (is it wise to use uint32_t instead of int32_t here?) */ const uint32_t rf = ((uint32_t) (a + a)) - (uint32_t) a; x = (r.f = (float) rf, r._f |= s, r.f); } return x; } Note that as per the latest IEEE-754 standard, the above does not raise an exception in the event of the rounding being inexact. This is not backwards compatible with the existing MUSL equivalents. While it is another issue altogether, this latest standard also drops the nearbyint() in favour of routines called (as per the C standard) 'roundevenf()/roundeven()/...'. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [musl] roundf() (and round(), and ...) 2024-06-17 1:48 [musl] roundf() (and round(), and ...) Damian McGuckin @ 2024-06-18 12:23 ` Szabolcs Nagy 2024-06-18 12:40 ` Damian McGuckin ` (2 more replies) 0 siblings, 3 replies; 26+ messages in thread From: Szabolcs Nagy @ 2024-06-18 12:23 UTC (permalink / raw) To: Damian McGuckin; +Cc: MUSL * Damian McGuckin <damianm@esi.com.au> [2024-06-17 11:48:38 +1000]: > Before I submit any suggestion for a change to roundf/round/.., could I get > a critique of the style to make sure I compy with the style guide. > > Also, I have not used anything except for the now default FLT_EVAL_METHOD > so I need guidance as to where to use 'float_t'. However, the only place > where such arithmetic (here) might be affected is the expression 'a + a' > which is exact anyway so I think it is irrelevant. note that it is unspecified if your algorithm raises inexact or not. (iirc musl asm implementations don't raise inexact, the c code does, c23 now requires no inexact which i guess is what you tried to follow, but that is hard to do in c) otherwise i think your style is fine, but i added some comments. FLT_EVAL_METHOD is not relevant for this code. benchmark data may be useful (or code size e.g. on a soft-float target) because that may be a valid justification to use this implementation. > > Thanks - Damian > > Code follows: > > float roundf(float x) > { > static const int b = 0x7f; > const float a = fabsf(x); > union { float f; uint32_t _f; } r = { a }, _x = { x }; > > if (((int) (r._f >> 23)) < b + 23) /* i.e. effectively |x| < 2^(p-1) */ > { > /* > * capture the sign of the argument > */ > const uint32_t s = _x._f - r._f; i used to use explicit unions then switched to helper function asuint(x) because i found that a bit clearer and compiler optimizes it just fine. and some ppl complained that memcpy is more correct than union, so it is better hidden away behind a helper function if somebody wants to switch. nowadays i'd probably write if (asuint(a) >> 23 < asuint(0x1p23f) >> 23) but e.g. i find it just as clear writing const uint32_t abits = asuint(a); if (abits >> 23 < 0x7f + 23) > /* > * this should achieve rounding to nearest with any > * ties (half-way cases) being rounded away-from-zero. > * (is it wise to use uint32_t instead of int32_t here?) > */ > const uint32_t rf = ((uint32_t) (a + a)) - (uint32_t) a; > > x = (r.f = (float) rf, r._f |= s, r.f); it is isa dependent if int32_t or uint32_t is better, but i'd expect signed convert is more widely supported than unsigned. so i'd write this as const float r = (float)((int32_t)(a + a) - (int32_t)a); x = asfloat(asuint(r) | s); > } > return x; > } > > Note that as per the latest IEEE-754 standard, the above does not raise an > exception in the event of the rounding being inexact. This is not backwards it is unspecified if (uint32_t)a raises inexact exception, so this will be target and compiler dependent in practice. ieee-754-2008: 5.8 Details of conversions from floating-point to integer formats ... A language standard that permits implicit conversions or expressions involving mixed types should require that these be implemented with the inexact-signaling conversion operations below. c23: F.4 Floating to integer conversion ... ... whether conversion of a non-integral floating value raises the "inexact" floating-point exception is unspecified.[note] ... note: In those cases where it matters, library functions can be used to effect such conversions with or without raising the "inexact" floating-point exception. > compatible with the existing MUSL equivalents. While it is another issue > altogether, this latest standard also drops the nearbyint() in favour of > routines called (as per the C standard) 'roundevenf()/roundeven()/...'. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [musl] roundf() (and round(), and ...) 2024-06-18 12:23 ` Szabolcs Nagy @ 2024-06-18 12:40 ` Damian McGuckin 2024-06-19 7:09 ` Damian McGuckin 2024-06-19 8:17 ` Damian McGuckin 2 siblings, 0 replies; 26+ messages in thread From: Damian McGuckin @ 2024-06-18 12:40 UTC (permalink / raw) To: MUSL Thanks for the thorough review I will look at it in more tomorrow. Almost midnight here now - Damian ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [musl] roundf() (and round(), and ...) 2024-06-18 12:23 ` Szabolcs Nagy 2024-06-18 12:40 ` Damian McGuckin @ 2024-06-19 7:09 ` Damian McGuckin 2024-06-19 8:17 ` Damian McGuckin 2 siblings, 0 replies; 26+ messages in thread From: Damian McGuckin @ 2024-06-19 7:09 UTC (permalink / raw) To: MUSL Commenting on only part of that: On Tue, 18 Jun 2024, Szabolcs Nagy wrote: > it is unspecified if (uint32_t)a raises inexact exception, > so this will be target and compiler dependent in practice. Yes. Scary. > ieee-754-2008: > > 5.8 Details of conversions from floating-point to integer formats > ... > A language standard that permits implicit conversions or expressions > involving mixed types should require that these be implemented with > the inexact-signaling conversion operations below. IEEE-754 2019 now specifies: The operations for conversion from floating-point to a specific signed or unsigned integer format without signaling the inexact exception are: One of which is: intFormatOf-convertToIntegerTowardZero(source) convertToIntegerTowardZero(x) rounds x to an integral value toward zero. AND The operations for conversion from floating-point to a specific signed or unsigned integer format, signaling if inexact, are: intFormatOf-convertToIntegerExactTowardZero(source) convertToIntegerExactTowardZero(x) rounds x to an integral value toward zero. So, it depends as you say, which of those two operations the compiler uses when it translates: (integral-type) floating-point-expression If indeed the target supports both. > c23: > > F.4 Floating to integer conversion > ... > ... whether conversion of a non-integral floating value raises the > "inexact" floating-point exception is unspecified.[note] > ... > note: In those cases where it matters, library functions can be used > to effect such conversions with or without raising the "inexact" > floating-point exception. Ah, but then such functions are lower down those that we currently name. Thanks - Damian Pacific Engineering Systems International ..... 20D Grose St, Glebe NSW 2037 Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here Views & opinions here are mine and not those of any past or present employer ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [musl] roundf() (and round(), and ...) 2024-06-18 12:23 ` Szabolcs Nagy 2024-06-18 12:40 ` Damian McGuckin 2024-06-19 7:09 ` Damian McGuckin @ 2024-06-19 8:17 ` Damian McGuckin 2024-06-19 18:58 ` Szabolcs Nagy 2 siblings, 1 reply; 26+ messages in thread From: Damian McGuckin @ 2024-06-19 8:17 UTC (permalink / raw) To: MUSL On Tue, 18 Jun 2024, Szabolcs Nagy wrote: >> Also, I have not used anything except for the now default >> FLT_EVAL_METHOD so I need guidance as to where to use 'float_t'. >> However, the only place where such arithmetic (here) might be affected >> is the expression 'a + a' which is exact anyway so I think it is >> irrelevant. > > note that it is unspecified if your algorithm raises inexact or not. > (iirc musl asm implementations don't raise inexact, the c code does, > c23 now requires no inexact which i guess is what you tried to follow, Yes. > but that is hard to do in c) Yes, well for certain (and across multiple architectures). > otherwise i think your style is fine, but i added some comments. > > FLT_EVAL_METHOD is not relevant for this code. That was my placeholder as in 'Memo to Self'. I have not used anything other than 'FLT_EVAL_METHOD=0' for decades so I am likely to not handle the other scenarios properly, i.e. using the types 'float_t' and 'double_t' as appropriate. So I am worried that my code is buggy in that context. > benchmark data may be useful (or code size e.g. on a soft-float target) > because that may be a valid justification to use this implementation. It is much less assembler size on a hard-float target. I have no experience on a soft-float target. In my testing on X86-64, it is faster than either existing MUSL code or GLIBM, but not as fast as assembler using 'ROUNDSS/ROUNDSD'. Not that these routines are going to be performance killers. > i used to use explicit unions then switched to helper function asuint(x) > because i found that a bit clearer and compiler optimizes it just fine. > and some ppl complained that memcpy is more correct than union, so it is > better hidden away behind a helper function if somebody wants to switch. I initially wrote these routines using these helper functions but went back to unions because I had some issues. Unions are the only things which work at compile-time. But I am happy to go back to helper function. I just redid them routines and I notices that they read quite well. They also map nicely into things like Rust's or Chapel's 'transmute' feature although they work at compile time and these helper functions do not. I had also tried implementing these routines with <ieee754.h> from Linux to see how well that worked. I have very little experience with the use of 'memcpy" as this also has problems at compile time. > nowadays i'd probably write > > if (asuint(a) >> 23 < asuint(0x1p23f) >> 23) I wish it was asuint32 rather than asuint() but maybe I am too picky. I would prefer to define the bias 'b' and precision 'p' and write it as if ((asuint(a) >> (p - 1)) < b + (p - 1)) But I was worried that would be a change too far. Also, I use different languages in a given week so I probably us more coercions (or casts) and parentheses than MUSL prefers for C. >> /* >> * this should achieve rounding to nearest with any >> * ties (half-way cases) being rounded away-from-zero. >> * (is it wise to use uint32_t instead of int32_t here?) >> */ >> const uint32_t rf = ((uint32_t) (a + a)) - (uint32_t) a; >> >> x = (r.f = (float) rf, r._f |= s, r.f); > > it is isa dependent if int32_t or uint32_t is better, but i'd > expect signed convert is more widely supported than unsigned. Thanks for the advice - I am just testing that change now - Damian ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [musl] roundf() (and round(), and ...) 2024-06-19 8:17 ` Damian McGuckin @ 2024-06-19 18:58 ` Szabolcs Nagy 2024-06-19 23:20 ` Rich Felker 2024-06-21 8:12 ` Damian McGuckin 0 siblings, 2 replies; 26+ messages in thread From: Szabolcs Nagy @ 2024-06-19 18:58 UTC (permalink / raw) To: Damian McGuckin; +Cc: MUSL * Damian McGuckin <damianm@esi.com.au> [2024-06-19 18:17:21 +1000]: > On Tue, 18 Jun 2024, Szabolcs Nagy wrote: > > FLT_EVAL_METHOD is not relevant for this code. > > That was my placeholder as in 'Memo to Self'. > > I have not used anything other than 'FLT_EVAL_METHOD=0' for decades so > I am likely to not handle the other scenarios properly, i.e. using the > types 'float_t' and 'double_t' as appropriate. So I am worried that my > code is buggy in that context. you can change all 'float x' declarations to 'float_t x' and sometimes this allows saving a store+load on i386 and m68k due to iso c rules. e.g. consider float foo1(float x) { T y = x*x; T z = x+y; return z; } with T==float and FLT_EVAL_METHOD!=0 this is not the same as float foo2(float x) { retrun x + x*x; } because iso c requires assignment (and return statement) to round away the excess precision, so in the former there are two float stores while in the latter only one (for the return). (with vs without the extra store the result is slightly different but they are both different from ieee754 arithmetics anyway. if the difference matters we use explicit eval_as_float. i believe the standard -ffp-contract=on mode only contracts foo2 and would give yet another different result, but musl keeps contraction off so there is no implicit fma.) there are two approaches to deal with the extra store issue (only a performance issue, not correctness): the gnu approach is that by default assignment does not round excess precision (-fexcess-precision=fast) so T==float works, but there is no guarantee where exactly the rounding happens, any register spill may do it. the musl approach is to use -fexcess-precision=standard and use T==float_t and then it is guaranteed that intermediate results are evaluated with excess precision (even if spilled). i don't think optimizing for i386 or m68k is very relevant today, so if you get this wrong it will not be a big issue. > > benchmark data may be useful (or code size e.g. on a soft-float target) > > because that may be a valid justification to use this implementation. > > It is much less assembler size on a hard-float target. > > I have no experience on a soft-float target. > > In my testing on X86-64, it is faster than either existing MUSL code or > GLIBM, but not as fast as assembler using 'ROUNDSS/ROUNDSD'. Not that these > routines are going to be performance killers. ok, this sounds good. note that in musl fabsf() is an actual call because we build musl with -ffrestanding (builtins are turned off) at some point we should fix this to ensure important functions are optimized internally, e.g. have a header with #define fabsf(x) __builtin_fabsf(x) > > i used to use explicit unions then switched to helper function asuint(x) > > because i found that a bit clearer and compiler optimizes it just fine. > > and some ppl complained that memcpy is more correct than union, so it is > > better hidden away behind a helper function if somebody wants to switch. > > I initially wrote these routines using these helper functions but went back > to unions because I had some issues. Unions are the only things which work > at compile-time. > > But I am happy to go back to helper function. I just redid them routines and > I notices that they read quite well. They also map nicely into things like > Rust's or Chapel's 'transmute' feature although they work at compile time > and these helper functions do not. > > I had also tried implementing these routines with <ieee754.h> from Linux to > see how well that worked. I have very little experience with the use of > 'memcpy" as this also has problems at compile time. not sure what you mean by 'compile-time', in musl the helper functions are actually macros now using unions (iirc with older compilers there were some optimization issues when i used static inline function) but with recent compiler and -O2 static inline unsigned asuint(float x) { unsigned u; memcpy(&u, &x, sizeof x); return u; } should work just as well. (and this is pedantically correct even on c++) > > > nowadays i'd probably write > > > > if (asuint(a) >> 23 < asuint(0x1p23f) >> 23) > > I wish it was asuint32 rather than asuint() but maybe I am too picky. > > I would prefer to define the bias 'b' and precision 'p' and write it as > > if ((asuint(a) >> (p - 1)) < b + (p - 1)) > > But I was worried that would be a change too far. Also, I use different > languages in a given week so I probably us more coercions (or casts) and > parentheses than MUSL prefers for C. i'm used to writing out the bias term and precision as int literal but using names is fine too i guess. the naming does not always help as the c and ieee float models differ: they both use x=m*2^e, but in c m is in [0.5,1), in ieee m is in [1,2) so the e is off by one (1024 vs 1023 emax). and 'precision bits' includes the leading 1 in both models, but the representation does not store that bit so i often don't count that (53 vs 52 bits). > > > > /* > > > * this should achieve rounding to nearest with any > > > * ties (half-way cases) being rounded away-from-zero. > > > * (is it wise to use uint32_t instead of int32_t here?) > > > */ > > > const uint32_t rf = ((uint32_t) (a + a)) - (uint32_t) a; > > > > > > x = (r.f = (float) rf, r._f |= s, r.f); > > > > it is isa dependent if int32_t or uint32_t is better, but i'd > > expect signed convert is more widely supported than unsigned. > > Thanks for the advice - I am just testing that change now - Damian ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [musl] roundf() (and round(), and ...) 2024-06-19 18:58 ` Szabolcs Nagy @ 2024-06-19 23:20 ` Rich Felker 2024-06-21 8:12 ` Damian McGuckin 1 sibling, 0 replies; 26+ messages in thread From: Rich Felker @ 2024-06-19 23:20 UTC (permalink / raw) To: Damian McGuckin, MUSL On Wed, Jun 19, 2024 at 08:58:37PM +0200, Szabolcs Nagy wrote: > * Damian McGuckin <damianm@esi.com.au> [2024-06-19 18:17:21 +1000]: > > On Tue, 18 Jun 2024, Szabolcs Nagy wrote: > > > FLT_EVAL_METHOD is not relevant for this code. > > > > That was my placeholder as in 'Memo to Self'. > > > > I have not used anything other than 'FLT_EVAL_METHOD=0' for decades so > > I am likely to not handle the other scenarios properly, i.e. using the > > types 'float_t' and 'double_t' as appropriate. So I am worried that my > > code is buggy in that context. > > you can change all 'float x' declarations to 'float_t x' and > sometimes this allows saving a store+load on i386 and m68k > due to iso c rules. > > e.g. consider > > float foo1(float x) > { > T y = x*x; > T z = x+y; > return z; > } > > with T==float and FLT_EVAL_METHOD!=0 this is not the same as > > float foo2(float x) > { > retrun x + x*x; > } > > because iso c requires assignment (and return statement) to > round away the excess precision, so in the former there are two > float stores while in the latter only one (for the return). > (with vs without the extra store the result is slightly different > but they are both different from ieee754 arithmetics anyway. > if the difference matters we use explicit eval_as_float. > i believe the standard -ffp-contract=on mode only contracts > foo2 and would give yet another different result, but musl keeps > contraction off so there is no implicit fma.) > > there are two approaches to deal with the extra store issue > (only a performance issue, not correctness): > > the gnu approach is that by default assignment does not round > excess precision (-fexcess-precision=fast) so T==float works, > but there is no guarantee where exactly the rounding happens, > any register spill may do it. > > the musl approach is to use -fexcess-precision=standard and > use T==float_t and then it is guaranteed that intermediate > results are evaluated with excess precision (even if spilled). > > i don't think optimizing for i386 or m68k is very relevant today, > so if you get this wrong it will not be a big issue. The code needs to be correct even on archs with excess precision. But there's usually only a correctess distinction for functions like this with bit-exact/correctly-rounded results, which we don't require for most of libm. > > > benchmark data may be useful (or code size e.g. on a soft-float target) > > > because that may be a valid justification to use this implementation. > > > > It is much less assembler size on a hard-float target. > > > > I have no experience on a soft-float target. > > > > In my testing on X86-64, it is faster than either existing MUSL code or > > GLIBM, but not as fast as assembler using 'ROUNDSS/ROUNDSD'. Not that these > > routines are going to be performance killers. > > ok, this sounds good. > > note that in musl fabsf() is an actual call because we build musl > with -ffrestanding (builtins are turned off) > > at some point we should fix this to ensure important functions are > optimized internally, e.g. have a header with > > #define fabsf(x) __builtin_fabsf(x) Yes. I think the src/include/*.h functions should do this where appropriate, but we probably need detection unless __GNUC__ is sufficient to assume the existence of all the ones we want to use. We also need a list of functions it would be helpful for (mainly memcpy, memset, strlen & friends, and some math functions, I think) and some review to make sure there are no places where using the builtins would be breaking. > > > nowadays i'd probably write > > > > > > if (asuint(a) >> 23 < asuint(0x1p23f) >> 23) > > > > I wish it was asuint32 rather than asuint() but maybe I am too picky. > > > > I would prefer to define the bias 'b' and precision 'p' and write it as > > > > if ((asuint(a) >> (p - 1)) < b + (p - 1)) > > > > But I was worried that would be a change too far. Also, I use different > > languages in a given week so I probably us more coercions (or casts) and > > parentheses than MUSL prefers for C. > > i'm used to writing out the bias term and precision as int literal > but using names is fine too i guess. I prefer having it written out where it's used rather than having to go look somewhere else to see the value. Rich ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [musl] roundf() (and round(), and ...) 2024-06-19 18:58 ` Szabolcs Nagy 2024-06-19 23:20 ` Rich Felker @ 2024-06-21 8:12 ` Damian McGuckin 2024-06-21 8:16 ` Damian McGuckin 2024-06-23 18:33 ` Rich Felker 1 sibling, 2 replies; 26+ messages in thread From: Damian McGuckin @ 2024-06-21 8:12 UTC (permalink / raw) To: MUSL There is an optimization in roundf() (and an equivalent in round()) which handles the case of numbers 0 < |x| < 0.5 if (e < 0x7f-1) { return 0*u.f; } If one has lots of numbers close to zero (as you would if you were testing even possible 32-bit floating point number or for any other reason), this can mean a total run time is 5% faster overall (at the expense of the performance of cases for |x| >= 0.5. Is such an optimization worth it? And yes, we re not talking about lots of compute cycles here! Thanks - Damian Pacific Engineering Systems International ..... 20D Grose St, Glebe NSW 2037 Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here Views & opinions here are mine and not those of any past or present employer ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [musl] roundf() (and round(), and ...) 2024-06-21 8:12 ` Damian McGuckin @ 2024-06-21 8:16 ` Damian McGuckin 2024-06-23 18:33 ` Rich Felker 1 sibling, 0 replies; 26+ messages in thread From: Damian McGuckin @ 2024-06-21 8:16 UTC (permalink / raw) To: MUSL On Fri, 21 Jun 2024, Damian McGuckin wrote: > There is an optimization in roundf() (and an equivalent in round()) > which handles the case of numbers 0 < |x| < 0.5 > > if (e < 0x7f-1) { > return 0*u.f; > } > > If one has lots of numbers close to zero (as you would if you were testing > even possible 32-bit floating point number or for any other reason), this > can mean a total run time is 5% faster overall (at the expense of the > performance of cases for |x| >= 0.5. > > Is such an optimization worth it? And yes, we re not talking about lots of > compute cycles here! This issue also arises in an implementation roundeven() which will be needed for C2Y compliance. - Damian ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [musl] roundf() (and round(), and ...) 2024-06-21 8:12 ` Damian McGuckin 2024-06-21 8:16 ` Damian McGuckin @ 2024-06-23 18:33 ` Rich Felker 2024-06-25 3:41 ` Damian McGuckin 1 sibling, 1 reply; 26+ messages in thread From: Rich Felker @ 2024-06-23 18:33 UTC (permalink / raw) To: Damian McGuckin; +Cc: MUSL On Fri, Jun 21, 2024 at 06:12:22PM +1000, Damian McGuckin wrote: > > There is an optimization in roundf() (and an equivalent in round()) > which handles the case of numbers 0 < |x| < 0.5 > > if (e < 0x7f-1) { > return 0*u.f; > } > > If one has lots of numbers close to zero (as you would if you were > testing even possible 32-bit floating point number or for any other > reason), this > can mean a total run time is 5% faster overall (at the expense of > the performance of cases for |x| >= 0.5. > > Is such an optimization worth it? And yes, we re not talking about lots of > compute cycles here! I think this is not a good tradeoff. It seems unlikely anyone would be using round in a performance-critical context with numbers that are in a range where an inline compare would be far better, and the costs are increased code size, decreased performance in useful cases, and exacerbating timing dependency on data (which is generally a negative thing). Rich ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [musl] roundf() (and round(), and ...) 2024-06-23 18:33 ` Rich Felker @ 2024-06-25 3:41 ` Damian McGuckin 2024-06-25 19:09 ` Markus Wichmann 0 siblings, 1 reply; 26+ messages in thread From: Damian McGuckin @ 2024-06-25 3:41 UTC (permalink / raw) To: MUSL On Sun, 23 Jun 2024, Rich Felker wrote: > I think this is not a good tradeoff. It seems unlikely anyone would be > using round in a performance-critical context with numbers that are in > a range where an inline compare would be far better, and the costs are > increased code size, decreased performance in useful cases, and > exacerbating timing dependency on data (which is generally a negative > thing). I was actually playing devil's advocate. I do not like the tradeoff either but I was looking for some words of wisdom (as per yours above) to justify it. Anyway... I am not convinced of the benefits of changing the existing rounding routines in MUSL to reflect any modification I might have made to the internal algorithms. And before any changes, I think there are other questions which need to be answered. So I might park it for now. Here is a summary of the work for those who are interested. They reduce the number of lines of GCC-11 generated assembler code by 33%. I am sure that others have thought of (and used) the same incremental changes long ago but never published them. Only the C 'float' routines are in a good shape. The project itself was actually done in another language. My modifications assumes that a call likes 'fabsf' is inlined to assembler which I hope we can assume is the rule these days. This is the code size in GCC11 assembler instructions for C 'float' routines: rintf.... 32/16 - metric was 5% faster truncf... 19/21 - metric was 10% faster roundf... 47/30 - no speed difference ceilf.... 38/29 - no speed difference floorf... 38/29 - no speed difference This line count does not include the startproc and endproc lines, but does include any line with just a label. The first number is that of the MUSL 2.5 routine, and the second is that of the revised routine. Overall, 174 lines down to 115 lines or a >33% reduction. The performance metric was the computation and comparison of GLibc routine + MUSL routine or Glibc routine + Revised routine The GLIBC routines is there for comparison. Where this showed the revised algorithm was faster by 5% or more, I have noted it. Note than no revision algorithm is slower. The CPU was a Xeon E5-2650v4. A 'roundeven' routine exists and it is 44 lines of assembler long. CLANG's line count is slightly longer, I can recode the revised roundf() to be >5% faster than MUSL's roundf() but then I run the risk that I am have a timing dependency on data, in this case, the set of all the 32-bit floating point numbers. - Damian ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [musl] roundf() (and round(), and ...) 2024-06-25 3:41 ` Damian McGuckin @ 2024-06-25 19:09 ` Markus Wichmann 2024-06-25 22:24 ` Damian McGuckin 2024-06-25 22:55 ` Damian McGuckin 0 siblings, 2 replies; 26+ messages in thread From: Markus Wichmann @ 2024-06-25 19:09 UTC (permalink / raw) To: musl; +Cc: Damian McGuckin Am Tue, Jun 25, 2024 at 01:41:45PM +1000 schrieb Damian McGuckin: > My > modifications assumes that a call likes 'fabsf' is inlined to assembler > which I hope we can assume is the rule these days. > If GCC is working according to the documentation, then not inside of musl at the moment. These inlinings happen when a builtin instrinsic is lowered into an assembler instruction, and by default, fabs* is recognized as a builtin. But not when building musl, because musl builds with -ffreestanding, which includes -fno-builtin. Rich has stated he wants to work around that with an implementation-internal header file that defines macros such as #define fabs(x) __builtin_fabs(x) But as far as I can tell, this hasn't happened yet. Ciao, Markus ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [musl] roundf() (and round(), and ...) 2024-06-25 19:09 ` Markus Wichmann @ 2024-06-25 22:24 ` Damian McGuckin 2024-06-25 22:28 ` enh 2024-06-25 22:55 ` Damian McGuckin 1 sibling, 1 reply; 26+ messages in thread From: Damian McGuckin @ 2024-06-25 22:24 UTC (permalink / raw) To: musl On Tue, 25 Jun 2024, Markus Wichmann wrote: > #define fabs(x) __builtin_fabs(x) > > But as far as I can tell, this hasn't happened yet. Thanks for the insight - Damian ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [musl] roundf() (and round(), and ...) 2024-06-25 22:24 ` Damian McGuckin @ 2024-06-25 22:28 ` enh 2024-06-25 23:35 ` Rich Felker 2024-06-26 1:49 ` Thorsten Glaser 0 siblings, 2 replies; 26+ messages in thread From: enh @ 2024-06-25 22:28 UTC (permalink / raw) To: musl i don't know about gcc, but iirc for clang you don't even need to do that. it assumes it knows what various functions mean, and inlines trivial stuff like this anyway... /tmp$ cat x.c #include <math.h> double foo(double x) { return fabs(x); } /tmp$ cc -O2 -S -o - x.c .section __TEXT,__text,regular,pure_instructions .build_version macos, 14, 0 sdk_version 14, 4 .globl _foo ; -- Begin function foo .p2align 2 _foo: ; @foo .cfi_startproc ; %bb.0: fabs d0, d0 ret .cfi_endproc ; -- End function .subsections_via_symbols /tmp$ (the same is true even at -O0, there's just more boilerplate around it that way.) bionic actually uses __builtin_fabs() [and friends] to _implement_ these functions, should someone be foolish enough to be using a function pointer to call them, but we don't actually expect these functions to ever be called. On Tue, Jun 25, 2024 at 6:24 PM Damian McGuckin <damianm@esi.com.au> wrote: > > On Tue, 25 Jun 2024, Markus Wichmann wrote: > > > #define fabs(x) __builtin_fabs(x) > > > > But as far as I can tell, this hasn't happened yet. > > Thanks for the insight - Damian ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [musl] roundf() (and round(), and ...) 2024-06-25 22:28 ` enh @ 2024-06-25 23:35 ` Rich Felker 2024-06-26 10:36 ` enh 2024-06-26 1:49 ` Thorsten Glaser 1 sibling, 1 reply; 26+ messages in thread From: Rich Felker @ 2024-06-25 23:35 UTC (permalink / raw) To: enh; +Cc: musl On Tue, Jun 25, 2024 at 06:28:52PM -0400, enh wrote: > i don't know about gcc, but iirc for clang you don't even need to do > that. it assumes it knows what various functions mean, and inlines > trivial stuff like this anyway... Not with -ffreestanding. If it does that with -ffreestanding, it's a bug. Rich ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [musl] roundf() (and round(), and ...) 2024-06-25 23:35 ` Rich Felker @ 2024-06-26 10:36 ` enh 2024-06-26 15:23 ` Rich Felker 0 siblings, 1 reply; 26+ messages in thread From: enh @ 2024-06-26 10:36 UTC (permalink / raw) To: Rich Felker; +Cc: musl On Tue, Jun 25, 2024 at 7:35 PM Rich Felker <dalias@libc.org> wrote: > > On Tue, Jun 25, 2024 at 06:28:52PM -0400, enh wrote: > > i don't know about gcc, but iirc for clang you don't even need to do > > that. it assumes it knows what various functions mean, and inlines > > trivial stuff like this anyway... > > Not with -ffreestanding. If it does that with -ffreestanding, it's a > bug. no, but my point is that most of your _users_ will never see the libm functions anyway. (at least not on architectures new enough that their fp instructions are basically "we went through <math.h> and added an instruction for everything that was easy", like arm64 or riscv64.) > Rich ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [musl] roundf() (and round(), and ...) 2024-06-26 10:36 ` enh @ 2024-06-26 15:23 ` Rich Felker 0 siblings, 0 replies; 26+ messages in thread From: Rich Felker @ 2024-06-26 15:23 UTC (permalink / raw) To: enh; +Cc: musl On Wed, Jun 26, 2024 at 06:36:09AM -0400, enh wrote: > On Tue, Jun 25, 2024 at 7:35 PM Rich Felker <dalias@libc.org> wrote: > > > > On Tue, Jun 25, 2024 at 06:28:52PM -0400, enh wrote: > > > i don't know about gcc, but iirc for clang you don't even need to do > > > that. it assumes it knows what various functions mean, and inlines > > > trivial stuff like this anyway... > > > > Not with -ffreestanding. If it does that with -ffreestanding, it's a > > bug. > > no, but my point is that most of your _users_ will never see the libm > functions anyway. (at least not on architectures new enough that their > fp instructions are basically "we went through <math.h> and added an > instruction for everything that was easy", like arm64 or riscv64.) They will if they take the address of those functions and call them indirectly, or if they're using (still default I think; if so that should be fixed) -fmath-errno, etc. Inlining is an optimization; it's not by itself a conforming implementation. Rich ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [musl] roundf() (and round(), and ...) 2024-06-25 22:28 ` enh 2024-06-25 23:35 ` Rich Felker @ 2024-06-26 1:49 ` Thorsten Glaser 2024-06-26 3:19 ` Markus Wichmann 2024-06-26 10:38 ` enh 1 sibling, 2 replies; 26+ messages in thread From: Thorsten Glaser @ 2024-06-26 1:49 UTC (permalink / raw) To: musl enh dixit: >bionic actually uses __builtin_fabs() [and friends] to _implement_ >these functions, should someone be foolish enough to be using a That’ll be fun should the compiler decide to insert a call to fabs() at the call site instead, which it’d be allowed to do ☺ Stories of compilers doing that abound… IIRC it was the compiler recognising the implementation pattern of memcpy and replacing that by a call to memcpy… in some libc’s that was not careful to use -ffreestanding memcpy implementation. bye, //mirabilos -- 15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-) ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [musl] roundf() (and round(), and ...) 2024-06-26 1:49 ` Thorsten Glaser @ 2024-06-26 3:19 ` Markus Wichmann 2024-06-26 3:43 ` Thorsten Glaser 2024-06-26 10:38 ` enh 1 sibling, 1 reply; 26+ messages in thread From: Markus Wichmann @ 2024-06-26 3:19 UTC (permalink / raw) To: musl Am Wed, Jun 26, 2024 at 01:49:09AM +0000 schrieb Thorsten Glaser: > Stories of compilers doing that abound… IIRC it was the compiler > recognising the implementation pattern of memcpy and replacing > that by a call to memcpy… in some libc’s that was not careful to > use -ffreestanding memcpy implementation. memcpy() is special, in that GCC reserves the right to create calls to it (and memmove(), memcmp(), and memset()) even in freestanding mode. It doesn't do that for any other builtin. Your only options are to write those functions in assembler, or write them in C and constantly curate the options needed to compile the functions so that no recursive calls happen. > 15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-) $ ./hw > /dev/full && echo FAIL || echo PASS And you wouldn't believe how many implementations fail that. Ciao, Markus ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [musl] roundf() (and round(), and ...) 2024-06-26 3:19 ` Markus Wichmann @ 2024-06-26 3:43 ` Thorsten Glaser 2024-06-26 14:11 ` Markus Wichmann 2024-06-26 15:25 ` Rich Felker 0 siblings, 2 replies; 26+ messages in thread From: Thorsten Glaser @ 2024-06-26 3:43 UTC (permalink / raw) To: musl Markus Wichmann dixit: >memcpy() is special, in that GCC reserves the right to create calls to >it (and memmove(), memcmp(), and memset()) even in freestanding mode. It >doesn't do that for any other builtin. But not to replace code that implements a memcpy, only to implement things like struct copying. >Your only options are to write >those functions in assembler, or write them in C and constantly curate >the options needed to compile the functions so that no recursive calls >happen. I’m personally happy writing .S files, but that’s a dying craft. >> 15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-) >$ ./hw > /dev/full && echo FAIL || echo PASS > >And you wouldn't believe how many implementations fail that. W: mksh: /dev/full: create: Permission denied PASS I don’t know what you mean… *g* bye, //mirabilos (Yes, of course I know all nuances of what you mean, I’m on the list.) -- “It is inappropriate to require that a time represented as seconds since the Epoch precisely represent the number of seconds between the referenced time and the Epoch.” -- IEEE Std 1003.1b-1993 (POSIX) Section B.2.2.2 ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [musl] roundf() (and round(), and ...) 2024-06-26 3:43 ` Thorsten Glaser @ 2024-06-26 14:11 ` Markus Wichmann 2024-06-26 15:25 ` Rich Felker 1 sibling, 0 replies; 26+ messages in thread From: Markus Wichmann @ 2024-06-26 14:11 UTC (permalink / raw) To: musl Am Wed, Jun 26, 2024 at 03:43:27AM +0000 schrieb Thorsten Glaser: > Markus Wichmann dixit: > > >memcpy() is special, in that GCC reserves the right to create calls to > >it (and memmove(), memcmp(), and memset()) even in freestanding mode. It > >doesn't do that for any other builtin. > > But not to replace code that implements a memcpy, only to implement > things like struct copying. > No, there have been versions that would recognize you trying to write a memcpy() and replacing it with a call to memcpy(). Which is fine, unless you are actually writing a memcpy(). That is why musl now builds with -fno-tree-loop-distribute-patterns if that is supported. Because that was the optimizer making that decision, so it is now turned off. Ciao, Markus ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [musl] roundf() (and round(), and ...) 2024-06-26 3:43 ` Thorsten Glaser 2024-06-26 14:11 ` Markus Wichmann @ 2024-06-26 15:25 ` Rich Felker 2024-06-30 4:32 ` Damian McGuckin 1 sibling, 1 reply; 26+ messages in thread From: Rich Felker @ 2024-06-26 15:25 UTC (permalink / raw) To: Thorsten Glaser; +Cc: musl On Wed, Jun 26, 2024 at 03:43:27AM +0000, Thorsten Glaser wrote: > Markus Wichmann dixit: > > >memcpy() is special, in that GCC reserves the right to create calls to > >it (and memmove(), memcmp(), and memset()) even in freestanding mode. It > >doesn't do that for any other builtin. > > But not to replace code that implements a memcpy, only to implement > things like struct copying. FWIW "struct copying" (with __may_alias__ attribute) is, or would be, a reasonable way to write a portable "efficiently copy blocks of the middle at a time" part of memcpy -- outsourcing the strategy for doing so to the compiler's logic for what insn patterns are efficient. Unfortunately you can't because it will just emit a call to memcpy on many archs. Rich ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [musl] roundf() (and round(), and ...) 2024-06-26 15:25 ` Rich Felker @ 2024-06-30 4:32 ` Damian McGuckin 0 siblings, 0 replies; 26+ messages in thread From: Damian McGuckin @ 2024-06-30 4:32 UTC (permalink / raw) To: musl As an aside, the current round(), roundf() and roundl() routines raise an exception for non-integral arguments. I believe that this is incorrect according to the latest IEEE 754 and C2Y. Should it be changed? Thanks - Damian ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [musl] roundf() (and round(), and ...) 2024-06-26 1:49 ` Thorsten Glaser 2024-06-26 3:19 ` Markus Wichmann @ 2024-06-26 10:38 ` enh 1 sibling, 0 replies; 26+ messages in thread From: enh @ 2024-06-26 10:38 UTC (permalink / raw) To: musl On Tue, Jun 25, 2024 at 9:49 PM Thorsten Glaser <tg@mirbsd.de> wrote: > > enh dixit: > > >bionic actually uses __builtin_fabs() [and friends] to _implement_ > >these functions, should someone be foolish enough to be using a > > That’ll be fun should the compiler decide to insert a call to > fabs() at the call site instead, which it’d be allowed to do ☺ yeah, even for our limited set of supported architectures we can't do all the functions for all the architectures. arm32 in particular is missing a lot of stuff. > Stories of compilers doing that abound… IIRC it was the compiler > recognising the implementation pattern of memcpy and replacing > that by a call to memcpy… in some libc’s that was not careful to > use -ffreestanding memcpy implementation. well, unlike musl we don't implement those functions in C. (but, yes, bionic's had similar fun. stack protector trying to protect stuff before we've set up the stack protector infrastructure is our usual one.) > bye, > //mirabilos > -- > 15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-) ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [musl] roundf() (and round(), and ...) 2024-06-25 19:09 ` Markus Wichmann 2024-06-25 22:24 ` Damian McGuckin @ 2024-06-25 22:55 ` Damian McGuckin 2024-06-25 23:37 ` Rich Felker 1 sibling, 1 reply; 26+ messages in thread From: Damian McGuckin @ 2024-06-25 22:55 UTC (permalink / raw) To: musl On Tue, 25 Jun 2024, Markus Wichmann wrote: > If GCC is working according to the documentation, then not inside of > musl at the moment. These inlinings happen when a builtin instrinsic is > lowered into an assembler instruction, and by default, fabs* is > recognized as a builtin. But not when building musl, because musl builds > with -ffreestanding, which includes -fno-builtin. As a general rule, I think that is a very wise decison. > Rich has stated he wants to work around that with an > implementation-internal header file that defines macros such as > #define fabs(x) __builtin_fabs(x) With IEEE 754 (in 5.5.1) defining all of copy, negate, abs (and copysign) as sign-bit operations, i.e. they are not just recommended operatons, I think this is a smart move, at least for the absolute value and sign copy functionality. - Damian ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [musl] roundf() (and round(), and ...) 2024-06-25 22:55 ` Damian McGuckin @ 2024-06-25 23:37 ` Rich Felker 0 siblings, 0 replies; 26+ messages in thread From: Rich Felker @ 2024-06-25 23:37 UTC (permalink / raw) To: Damian McGuckin; +Cc: musl On Wed, Jun 26, 2024 at 08:55:08AM +1000, Damian McGuckin wrote: > On Tue, 25 Jun 2024, Markus Wichmann wrote: > > >If GCC is working according to the documentation, then not inside > >of musl at the moment. These inlinings happen when a builtin > >instrinsic is lowered into an assembler instruction, and by > >default, fabs* is recognized as a builtin. But not when building > >musl, because musl builds with -ffreestanding, which includes > >-fno-builtin. > > As a general rule, I think that is a very wise decison. > > >Rich has stated he wants to work around that with an > >implementation-internal header file that defines macros such as > > >#define fabs(x) __builtin_fabs(x) > > With IEEE 754 (in 5.5.1) defining all of copy, negate, abs (and > copysign) as sign-bit operations, i.e. they are not just recommended > operatons, I think this is a smart move, at least for the absolute > value and sign copy > functionality. I don't see how that's relevant. The external function call versions are obligated to follow the IEEE rules too, which we do, and __builtin_fabs does not imply a contract that the compiler can expand fabs inline, just an allowance that it can know this means "the standard fabs function semantics" to inline that if it wants, or emit an external call. So doing this will purely be an optimization, no semantic difference. Rich ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2024-06-30 4:32 UTC | newest] Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2024-06-17 1:48 [musl] roundf() (and round(), and ...) Damian McGuckin 2024-06-18 12:23 ` Szabolcs Nagy 2024-06-18 12:40 ` Damian McGuckin 2024-06-19 7:09 ` Damian McGuckin 2024-06-19 8:17 ` Damian McGuckin 2024-06-19 18:58 ` Szabolcs Nagy 2024-06-19 23:20 ` Rich Felker 2024-06-21 8:12 ` Damian McGuckin 2024-06-21 8:16 ` Damian McGuckin 2024-06-23 18:33 ` Rich Felker 2024-06-25 3:41 ` Damian McGuckin 2024-06-25 19:09 ` Markus Wichmann 2024-06-25 22:24 ` Damian McGuckin 2024-06-25 22:28 ` enh 2024-06-25 23:35 ` Rich Felker 2024-06-26 10:36 ` enh 2024-06-26 15:23 ` Rich Felker 2024-06-26 1:49 ` Thorsten Glaser 2024-06-26 3:19 ` Markus Wichmann 2024-06-26 3:43 ` Thorsten Glaser 2024-06-26 14:11 ` Markus Wichmann 2024-06-26 15:25 ` Rich Felker 2024-06-30 4:32 ` Damian McGuckin 2024-06-26 10:38 ` enh 2024-06-25 22:55 ` Damian McGuckin 2024-06-25 23:37 ` Rich Felker
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/musl/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).