From: Damian McGuckin
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Possible Mistype in exp.c
Date: Wed, 30 Jan 2019 23:56:05 +1100 (AEDT)
References: <20190129110135.GC21289@port70.net> <20190129114308.GD21289@port70.net> <20190130093738.GE21289@port70.net>
In-Reply-To: <20190130093738.GE21289@port70.net>
Reply-To: musl@lists.openwall.com
To: Szabolcs Nagy
Cc: musl@lists.openwall.com
User-Agent: Alpine 2.02 (LRH 1266 2009-07-14)

I have Muller's book. It is good for the math, but not for ideas about the actual implementation.

As a matter of interest, what was the benchmark against which you got a 2x speed gain? I got 1.75x against glibc, for what that is worth. I used a faster scaling routine.
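(A minimal sketch of what such a fast scaling routine might look like; `fast_scale` is a hypothetical name, not the routine actually used. It assumes `k` stays within the range where both 2^k and the result are normal, so it skips all the subnormal/overflow handling a full ldexp/scalbn must provide.)

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical cut-down ldexp: multiply x by 2^k, assuming 2^k and
 * the result are both normal IEEE-754 doubles.  2^k is constructed
 * directly from its bit pattern: biased exponent (1023+k) in bits
 * 52..62, zero fraction.  No branches, no range checks. */
static double fast_scale(double x, int k)
{
	uint64_t bits = (uint64_t)(1023 + k) << 52;  /* encoding of 2^k */
	double pow2k;
	memcpy(&pow2k, &bits, sizeof pow2k);         /* safe type-pun */
	return x * pow2k;
}
```

The single multiply by a bit-constructed power of two is why it beats a library ldexp/scalbn on the fast path: all the argument classification is gone.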
But I was not chasing improved ULP performance like you were, as that was too much extra work. Your work there sounds like seriously smart stuff to me.

I used super-scalar friendly code which adds an extra multiplication. It made a minuscule net benefit on the Xeons (not Xeon Gold).

I had two versions of the fast scaling routine replacing ldexp. One used a single ternary if/then/else; the other grabbed the sign and did a table lookup, which meant one extra multiplication all the time but no branches. The one extra multiplication instead of a branch in the 2-line scaling routine made no difference. I saw a tiny but measurable difference when I used a Xeon with an FMA compared to one which did not.

Discarding the last term in the SUN routine, with its net loss of one multiplication, still made no serious difference to the timing and, of course, the results were affected.

My timings on my reworked code (for doubles) showed:

	21+%	the preliminary comparisons
	43+%	polynomial computation, done a super-scalar friendly way
	35+%	y = 1 + (x*c/(2-c) - lo + hi);
		return k == 0 ? y : scalbn-FAST(y, k);

I have slightly increased the workload in the comparisons because I avoid pulling 'x' apart into 'hx'. I used only doubles or floats.

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer