Date: Mon, 5 Sep 2022 18:38:30 +0200
From: Szabolcs Nagy
To: Paul Zimmermann
Cc: dalias@libc.org, musl@lists.openwall.com
Message-ID: <20220905163830.GP1320090@port70.net>
References: <20220902121755.GS7074@brightrain.aerifal.cx> <20220905130859.GO1320090@port70.net>
Subject: Re: [musl] Re: integration of CORE-MATH routines into Musl?

* Paul Zimmermann [2022-09-05 16:39:02 +0200]:
> Dear Szabolcs,
>
> > when i worked on exp and log i noticed that for single prec it is
> > easy to do correct rounding with only minor overhead, but it required
> > either a bit bigger lookup table or a bit bigger polynomial vs going
> > for < 1 ulp error only.
>
> please have a look at https://gitlab.inria.fr/core-math/core-math/-/blob/master/src/binary32/exp/expf.c:
> no big lookup table, degree 5 only.

"a bit bigger". in this case the polynomial is bigger: order 5 instead
of 3. (order 3 is enough for < 1 ulp error).
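To make the order 3 versus order 5 remark concrete, here is a minimal sketch that compares the truncation error of the two polynomial degrees on a small reduced interval. Everything in it is an assumption made for illustration: it uses plain Taylor coefficients of exp rather than the fitted coefficients a real libm would use, it assumes the reduced argument satisfies |r| <= 1/64 (as a 32-entry table would give), and the helper names `exp_taylor`, `trunc_err` and `XMAX` are hypothetical. It is not the code of either library.

```c
/* Horner evaluation of the degree-n Taylor polynomial of exp(x),
   i.e. sum of x^k / k! for k = 0..n.  Taylor coefficients are only
   a stand-in for the fitted coefficients a real libm would use. */
static double exp_taylor(double x, int n)
{
    double p = 1.0;
    for (int k = n; k >= 1; k--)
        p = 1.0 + p * x / k;
    return p;
}

/* Hypothetical helper: absolute truncation error of the degree-n
   polynomial at x, measured against a degree-12 reference that is
   accurate to double precision for such small x. */
static double trunc_err(double x, int n)
{
    double d = exp_taylor(x, n) - exp_taylor(x, 12);
    return d < 0 ? -d : d;
}

#define LN2 0.6931471805599453

/* Assumed worst-case reduced argument: 2^r with |r| <= 1/64 is
   evaluated as exp(r * ln2), so |x| <= ln2 / 64. */
#define XMAX (LN2 / 64)
```

Under these assumptions `trunc_err(XMAX, 3)` is a few times 1e-10, already far below a binary32 ulp near 1 (about 1.2e-7), which fits the point that order 3 suffices for < 1 ulp. `trunc_err(XMAX, 5)` is on the order of 1e-15, the kind of extra margin a correctly rounded result needs so that nearly halfway cases can still be decided.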
the code size is also bigger:

core-math: size -G (x86_64 -O3):
      text       data        bss      total filename
       464        352          0        816 exp2/exp2f.o
       398        348          0        746 exp/expf.o

musl: size -G: (data is shared between expf, exp2f and powf)
      text       data        bss      total filename
         0        328          0        328 exp2f_data.o
       202         12          0        214 exp2f.o
       211         16          0        227 expf.o

i'd expect at least a bit of overhead between <1 ulp and cr functions
(but not significant overhead in case of binary32). so when core-math
is faster, it should be possible to write an even faster version that
only aims to be <1 ulp (but the perf diff will not be huge).

in case of binary64: i'd expect one can turn a close to 0.5 ulp
implementation into a cr one with small overhead by testing for near
halfway cases in the end and having a slow path for those. but the
slow path will be much slower and bigger (and harder to test).
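The "test for near halfway cases, then fall back to a slow path" idea for binary64 can be sketched as follows. This is a generic rounding test in the style usually attributed to Ziv, not code taken from musl or core-math. It assumes the fast path returns its result as an unevaluated sum hi + lo together with an absolute error bound err, that the rounding mode is round-to-nearest, and that double arithmetic is evaluated strictly (no -ffast-math, no x87 excess precision). The names `rounding_is_safe`, `cr_eval`, `fast_fn` and `slow_fn` are hypothetical.

```c
#include <assert.h>

/* Rounding test: the true value lies in [hi+lo-err, hi+lo+err].
   If both ends of that interval round to the same double, the
   rounding is decided and *res holds the correctly rounded result.
   Returns 0 when the interval straddles a rounding boundary. */
int rounding_is_safe(double hi, double lo, double err, double *res)
{
    assert(err >= 0.0);
    double left  = hi + (lo - err);
    double right = hi + (lo + err);
    *res = left;
    return left == right;
}

/* Placeholder signatures for the two paths (assumptions):
   fast_fn computes hi, lo and an error bound cheaply,
   slow_fn computes the correctly rounded value by some more
   precise and much more expensive method. */
typedef void (*fast_fn)(double x, double *hi, double *lo, double *err);
typedef double (*slow_fn)(double x);

/* Hypothetical driver: take the fast result when the rounding
   test passes, otherwise pay for the slow path. */
double cr_eval(double x, fast_fn fast, slow_fn slow)
{
    double hi, lo, err, res;
    fast(x, &hi, &lo, &err);
    if (rounding_is_safe(hi, lo, err, &res))
        return res;
    return slow(x);
}
```

The smaller err is, the less often the slow path runs, which is why starting from an implementation that is already close to 0.5 ulp keeps the average overhead small. The slow path still has to be written and tested for the rare inputs that reach it.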