From: Adhemerval Zanella
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Model specific optimizations?
Date: Sun, 2 Oct 2016 10:59:00 -0300
Message-ID: <0a73b895-5d34-0059-0355-397e0abf0d3e@linaro.org>
In-Reply-To: <20161001195304.GF22343@voyager>
To: musl@lists.openwall.com
Reply-To: musl@lists.openwall.com

On 01/10/2016 16:53, Markus Wichmann wrote:
> On Sat, Oct 01, 2016 at 11:10:12AM -0400, Rich Felker wrote:
>> On Sat, Oct 01, 2016 at 10:52:14AM +0200, Markus Wichmann wrote:
>>> On Sat, Oct 01, 2016 at 01:50:23AM -0400, Rich Felker wrote:
>>>> I don't think this works at all. sqrt() is required to be
>>>> correctly-rounded; that's the whole reason sqrt.c is costly.
>>>
>>> It's an approximation, at least, which was rather my point.
>>>
>>> As I've come to realize over the course of this discussion, the fsqrt
>>> instruction is useless here and pretty much everywhere out there:
>>
>> I don't think that conclusion is correct.
>> It certainly makes sense for libc to use it in targets that have it,
>> assuming it safely produces correct results, and for compilers to
>> generate it in place of a call to sqrt.
>>
>
> But again, that requires the appropriate flags.
>
>>> Also, at least according to Apple, which were the only ones actually
>>> looking at the thing, such as I could find, it was only ever supported
>>> by the 970 and the 970FX cores, released in 2002 and 2004, respectively.
>>> I highly doubt they'll have much relevance. Chalk up my suspicions from
>>> the OP to not having researched enough.
>>
>> Do you mean these are the only non-POWER line models that have fsqrt?
>>
>
> The more I research this, the more confused I get!
>
> So, I was looking for real-world users of fsqrt, to look at how they
> determine availability. The first such user I found was Apple's libm.
> Tracing back to where they set their feature flags, I found this file
>
> http://opensource.apple.com/source/xnu/xnu-1456.1.26/osfmk/ppc/start.s
>
> If you search for _cpu_capabilities, around line 180 you'll find a
> comment saying the feature flags in this file are only defaults and may
> be changed by initialization code. But I couldn't find anything setting
> more flags; if anything, flags got removed. And the only models that
> have the flag kHasFsqrt are the 970 and the 970FX.
>
> But then I noticed that their processor list is kind of small, so I
> continued the search. I found this e-mail claiming the 604 supports the
> instruction:
>
> http://aps.anl.gov/epics/tech-talk/2011/msg01247.php
>
> But if you look at datasheets of the 604, they say nothing either way.
> But alright, the 604 is an old model (introduced in 1994), maybe fsqrt
> wasn't defined then.
>
> I personally work with the e300 (at my day job), and at least their
> datasheet makes it clear that fsqrt is not supported.
> Actually, apparently Freescale aren't big fans of this instruction at
> all, according to this comment:
>
> https://github.com/ibmruntimes/v8ppc/issues/119#issuecomment-72705975
>
> Wikipedia claims, however, that it wasn't until the 620 that the square
> root instruction was put into hardware. I tried to find a 620 datasheet,
> but no luck so far.
>
> Next family on the list would be the 4xx. 403 can be discounted
> immediately as it lacks an FPU. Since the 401 is stripped down even
> further, it also has no FPU. From the 405 onward it gets dicey as they
> went the way of the x87: You could connect an external FPU if desired.
>
> I found one for 405 here:
>
> http://www.xilinx.com/support/documentation/ip_documentation/apu_fpu.pdf
>
> That one doesn't support fsqrt, at least not enough for our purposes,
> but it does support fsqrts (that's the single precision variant). That's
> a whole new level of weird.
>
> As for the rest: I hope Apple got it right, because after the 970,
> nothing more is listed in Wikipedia.
>
> So, as you can see, the whole thing is a mess.

Since the kernel does not track it, I have found GCC's internal
implementation to be the most reliable way to find out whether a chip
implementation actually supports a given ppc instruction. For fsqrt, GCC
defines _ARCH_PPCSQ, and internally this flag is controlled by
OPTION_MASK_PPC_GPOPT. GCC's cpu definition file has some information
about it [1]:

    /* For ISA 2.05, do not add MFPGPR, since it isn't in ISA 2.06, and don't add
       ALTIVEC, since in general it isn't a win on power6.  In ISA 2.04, fsel,
       fre, fsqrt, etc. were no longer documented as optional.  Group masks by
       server and embedded.  */
    #define ISA_2_5_MASKS_EMBEDDED  (ISA_2_4_MASKS                  \
                                     | OPTION_MASK_CMPB             \
                                     | OPTION_MASK_RECIP_PRECISION  \
                                     | OPTION_MASK_PPC_GFXOPT       \
                                     | OPTION_MASK_PPC_GPOPT)

    #define ISA_2_5_MASKS_SERVER    (ISA_2_5_MASKS_EMBEDDED | OPTION_MASK_DFP)

Later in the file you can see that, besides all POWER chips from power4
onward, only the 970, cell, and G5 support fsqrt. The problem is that
there are newer powerpc cores, such as the e6500, that are supposed to
follow the ISA 2.06 embedded profile but do not implement fsqrt, even in
ppc64 mode.

[1] gcc/config/rs6000/rs6000-cpus.def
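
As a rough illustration of the _ARCH_PPCSQ check mentioned above, here
is a minimal sketch of compile-time selection, assuming GCC-style inline
asm. The names have_hw_fsqrt and sqrt_dispatch are hypothetical, and the
software fallback is passed in by the caller so that the sketch does not
assume any particular implementation. It is not taken from musl or GCC.

```c
/* Returns 1 when the compiler reports that the target implements fsqrt
 * (GCC predefines _ARCH_PPCSQ in that case), 0 otherwise. */
static inline int have_hw_fsqrt(void)
{
#ifdef _ARCH_PPCSQ
	return 1;
#else
	return 0;
#endif
}

/* Hypothetical dispatcher: uses the fsqrt instruction when the build
 * flags say it exists, otherwise calls the caller-supplied software
 * routine (for example a correctly-rounded generic sqrt). */
static inline double sqrt_dispatch(double x, double (*generic_sqrt)(double))
{
#ifdef _ARCH_PPCSQ
	double r;
	(void)generic_sqrt; /* unused on this path */
	__asm__ ("fsqrt %0, %1" : "=d"(r) : "d"(x));
	return r;
#else
	return generic_sqrt(x);
#endif
}
```

Note that the macro reflects the -mcpu and related flags given to the
compiler, not the chip the binary ends up running on, so the build flags
have to match the hardware.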