From: Markus Wichmann
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Model specific optimizations?
Date: Fri, 30 Sep 2016 06:56:15 +0200
Message-ID: <20160930045615.GD22343@voyager>
In-Reply-To: <20160929181336.GL19318@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On Thu, Sep 29, 2016 at 02:13:36PM -0400, Rich Felker wrote:
> On Thu, Sep 29, 2016 at 07:08:01PM +0200, Markus Wichmann wrote:
> > [...]
> On Linux it's supposed to be the kernel which detects availability of
> features (either by feature-specific cpu flags or translating a model
> to flags) but I don't see anything for fsqrt on ppc. :-( How/why did
> they botch this?
>

Maybe it's a new extension? I only know version 2.2 of the PowerPC Book.
Or maybe it goes back to the single core thing. (Only the 970 supports
it, and that's pretty new.)

Or maybe Linux kernel developers aren't interested in this problem,
because a manual sqrt exists, and if need be, anyone can just implement
the Babylonian method for speed.
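As a rough C sketch of the Babylonian iteration mentioned above: the
function name and the iteration-count parameter are made up for
illustration and are not anything in musl, and the argument is assumed
to be positive and finite.

```c
/* Hypothetical helper, not part of musl: Babylonian (Newton) square root.
 * Assumes a is positive and finite. */
double babylonian_sqrt(double a, int iterations)
{
	double x = a;	/* crude initial estimate: the argument itself */

	for (int i = 0; i < iterations; i++)
		x = 0.5 * (x + a / x);	/* x_next = (x + a/x) / 2 */

	return x;
}
```

Starting from x = a, six steps are plenty for small arguments such as
2.0, but for a = 100.0 they still leave an error of roughly 5e-5, so a
better initial estimate or more iterations would be needed in general.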
On PPC, it can be implemented in a loop consisting of four instructions,
namely:

; .rodata
half: .double 0.5

; assuming positive finite argument
; if that can't be assumed, go through memory to inspect argument
    fmr 1, 0        ; yes, halving the exponent would be a better estimate
                    ; requires going through memory, though
    lfd 2, half(13)
    li 0, 6         ; or more for more accuracy
    mtctr 0
1:                  ; fr0 = x, fr1 = a
    fdiv 3, 1, 0    ; fr3 = a/x
    fadd 3, 3, 0    ; fr3 = x + a/x
    fmul 0, 3, 2    ; fr0 = 0.5(x + a/x)
    bdnz 1b

So maybe there wasn't a lot of need for the hardware sqrt.

> > Well, yes, I was just throwing shit at a wall to see what sticks. We
> > could also move the function pointer dispatch into a pthread_once block
> > or something. I don't know if any caches need to be cleared then or not.
>
> pthread_once/call_once would be the nice clean abstraction to use, but
> it's mildly to considerably more expensive, currently involving a full
> barrier. There's a nice technical report on how that can be eliminated
> but it requires TLS, which is also expensive on some archs. In cases
> like this where there's no state other than the function pointer,
> relaxed atomics can simply be used on the reading end and then they're
> always fast.
>

Hmmm... not on PPC, though. TLS on Linux PPC just uses r2 as TLS
pointer. So the entire thing could be used almost as-is by making sqrtfn
thread-local?

> > So any PowerPC implementation is free to include it or not.
> > There are a lot of optional features, and if the gas people made a
> > different subarch for each combination of them, they'd be here all day.
>
> They've actually done that for some archs...
>

That actually made me check if they did it here, but thankfully not. gas
assembles the instruction without flags, without warning, and without a
note or anything on the output file.

> Anyway, I would have no objection right away to doing a patch like
> this that's decided at compile-time based on predefined macros set by
> -march.
> For runtime choice I think we need to discuss motivation. Are
> you trying to do a powerpc-based distro where you need a universal
> libc.so that works optimally on various models? Or would just
> compiling for the right -march meet your needs?
>

Just idle musings. I was reading sqrt.c, which has a flowerbox saying
"Use hardware sqrt if available" and recalled that there is a hardware
sqrt on PPC and started doing research from there. And that ended up in
the OP.

> Rich

Ciao,
Markus
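
For illustration, the function-pointer dispatch with relaxed atomics
discussed above might look roughly like the following sketch. All names
here (my_sqrt, sqrt_soft, sqrt_hw, sqrt_resolve, cpu_has_fsqrt) are
hypothetical and not musl code. The sketch assumes that GCC predefines
_ARCH_PPCSQ when the selected CPU has fsqrt, and since the thread found
no kernel-provided flag for fsqrt, the runtime check is only a
placeholder. The software fallback assumes a positive finite argument.

```c
#include <stdatomic.h>

typedef double (*sqrt_fn)(double);

/* Stand-in for a generic software sqrt; assumes positive finite x. */
static double sqrt_soft(double x)
{
	double r = x > 1.0 ? x : 1.0;	/* start at or above sqrt(x) */

	for (;;) {
		double n = 0.5 * (r + x / r);
		if (!(n < r))		/* stop once it no longer decreases */
			break;
		r = n;
	}
	return r;
}

#if defined(__powerpc__)
static double sqrt_hw(double x)
{
	__asm__ ("fsqrt %0, %1" : "=d"(x) : "d"(x));
	return x;
}
#else
/* Not a PowerPC build: never selected, present only so the sketch compiles. */
static double sqrt_hw(double x) { return sqrt_soft(x); }
#endif

/* Placeholder detection: only the compile-time answer is known here. */
static int cpu_has_fsqrt(void)
{
#if defined(_ARCH_PPCSQ)
	return 1;
#else
	return 0;	/* assumption: no runtime detection available */
#endif
}

static double sqrt_resolve(double x);

/* The only shared state is this pointer; it starts at the resolver. */
static _Atomic(sqrt_fn) sqrtfn = sqrt_resolve;

static double sqrt_resolve(double x)
{
	sqrt_fn f = cpu_has_fsqrt() ? sqrt_hw : sqrt_soft;

	atomic_store_explicit(&sqrtfn, f, memory_order_relaxed);
	return f(x);
}

double my_sqrt(double x)
{
	sqrt_fn f = atomic_load_explicit(&sqrtfn, memory_order_relaxed);

	return f(x);
}
```

Since the resolver always stores the same value, a thread that still
sees the resolver just repeats the detection, so this particular sketch
needs neither a barrier nor pthread_once.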