From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-3.1 required=5.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED,RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 Received: from second.openwall.net (second.openwall.net [193.110.157.125]) by inbox.vuxu.org (Postfix) with SMTP id 08CA42421E for ; Wed, 19 Jun 2024 20:58:51 +0200 (CEST) Received: (qmail 1867 invoked by uid 550); 19 Jun 2024 18:58:46 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Reply-To: musl@lists.openwall.com Received: (qmail 1830 invoked from network); 19 Jun 2024 18:58:46 -0000 Date: Wed, 19 Jun 2024 20:58:37 +0200 From: Szabolcs Nagy To: Damian McGuckin Cc: MUSL Message-ID: <20240619185837.GM3766212@port70.net> Mail-Followup-To: Damian McGuckin , MUSL References: <31679941-f6c2-64fa-7a8d-6b6f7112c31@esi.com.au> <20240618122357.GL3766212@port70.net> <67395818-5d95-f74f-5c50-435fc157dda@esi.com.au> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <67395818-5d95-f74f-5c50-435fc157dda@esi.com.au> Subject: Re: [musl] roundf() (and round(), and ...) * Damian McGuckin [2024-06-19 18:17:21 +1000]: > On Tue, 18 Jun 2024, Szabolcs Nagy wrote: > > FLT_EVAL_METHOD is not relevant for this code. > > That was my placeholder as in 'Memo to Self'. > > I have not used anything other than 'FLT_EVAL_METHOD=0' for decades so > I am likely to not handle the other scenarios properly, i.e. using the > types 'float_t' and 'double_t' as appropriate. So I am worried that my > code is buggy in that context. you can change all 'float x' declarations to 'float_t x' and sometimes this allows saving a store+load on i386 and m68k due to iso c rules. e.g. consider float foo1(float x) { T y = x*x; T z = x+y; return z; } with T==float and FLT_EVAL_METHOD!=0 this is not the same as float foo2(float x) { retrun x + x*x; } because iso c requires assignment (and return statement) to round away the excess precision, so in the former there are two float stores while in the latter only one (for the return). (with vs without the extra store the result is slightly different but they are both different from ieee754 arithmetics anyway. if the difference matters we use explicit eval_as_float. i believe the standard -ffp-contract=on mode only contracts foo2 and would give yet another different result, but musl keeps contraction off so there is no implicit fma.) there are two approaches to deal with the extra store issue (only a performance issue, not correctness): the gnu approach is that by default assignment does not round excess precision (-fexcess-precision=fast) so T==float works, but there is no guarantee where exactly the rounding happens, any register spill may do it. the musl approach is to use -fexcess-precision=standard and use T==float_t and then it is guaranteed that intermediate results are evaluated with excess precision (even if spilled). i don't think optimizing for i386 or m68k is very relevant today, so if you get this wrong it will not be a big issue. > > benchmark data may be useful (or code size e.g. on a soft-float target) > > because that may be a valid justification to use this implementation. > > It is much less assembler size on a hard-float target. > > I have no experience on a soft-float target. > > In my testing on X86-64, it is faster than either existing MUSL code or > GLIBM, but not as fast as assembler using 'ROUNDSS/ROUNDSD'. Not that these > routines are going to be performance killers. ok, this sounds good. note that in musl fabsf() is an actual call because we build musl with -ffrestanding (builtins are turned off) at some point we should fix this to ensure important functions are optimized internally, e.g. have a header with #define fabsf(x) __builtin_fabsf(x) > > i used to use explicit unions then switched to helper function asuint(x) > > because i found that a bit clearer and compiler optimizes it just fine. > > and some ppl complained that memcpy is more correct than union, so it is > > better hidden away behind a helper function if somebody wants to switch. > > I initially wrote these routines using these helper functions but went back > to unions because I had some issues. Unions are the only things which work > at compile-time. > > But I am happy to go back to helper function. I just redid them routines and > I notices that they read quite well. They also map nicely into things like > Rust's or Chapel's 'transmute' feature although they work at compile time > and these helper functions do not. > > I had also tried implementing these routines with from Linux to > see how well that worked. I have very little experience with the use of > 'memcpy" as this also has problems at compile time. not sure what you mean by 'compile-time', in musl the helper functions are actually macros now using unions (iirc with older compilers there were some optimization issues when i used static inline function) but with recent compiler and -O2 static inline unsigned asuint(float x) { unsigned u; memcpy(&u, &x, sizeof x); return u; } should work just as well. (and this is pedantically correct even on c++) > > > nowadays i'd probably write > > > > if (asuint(a) >> 23 < asuint(0x1p23f) >> 23) > > I wish it was asuint32 rather than asuint() but maybe I am too picky. > > I would prefer to define the bias 'b' and precision 'p' and write it as > > if ((asuint(a) >> (p - 1)) < b + (p - 1)) > > But I was worried that would be a change too far. Also, I use different > languages in a given week so I probably us more coercions (or casts) and > parentheses than MUSL prefers for C. i'm used to writing out the bias term and precision as int literal but using names is fine too i guess. the naming does not always help as the c and ieee float models differ: they both use x=m*2^e, but in c m is in [0.5,1), in ieee m is in [1,2) so the e is off by one (1024 vs 1023 emax). and 'precision bits' includes the leading 1 in both models, but the representation does not store that bit so i often don't count that (53 vs 52 bits). > > > > /* > > > * this should achieve rounding to nearest with any > > > * ties (half-way cases) being rounded away-from-zero. > > > * (is it wise to use uint32_t instead of int32_t here?) > > > */ > > > const uint32_t rf = ((uint32_t) (a + a)) - (uint32_t) a; > > > > > > x = (r.f = (float) rf, r._f |= s, r.f); > > > > it is isa dependent if int32_t or uint32_t is better, but i'd > > expect signed convert is more widely supported than unsigned. > > Thanks for the advice - I am just testing that change now - Damian