mailing list of musl libc
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Re: Model specific optimizations?
Date: Sat, 1 Oct 2016 01:50:23 -0400
Message-ID: <20161001055023.GA24569@brightrain.aerifal.cx>
In-Reply-To: <20160930045615.GD22343@voyager>

On Fri, Sep 30, 2016 at 06:56:15AM +0200, Markus Wichmann wrote:
> On Thu, Sep 29, 2016 at 02:13:36PM -0400, Rich Felker wrote:
> > On Thu, Sep 29, 2016 at 07:08:01PM +0200, Markus Wichmann wrote:
> > > [...]
> > On Linux it's supposed to be the kernel which detects availability of
> > features (either by feature-specific cpu flags or translating a model
> > to flags) but I don't see anything for fsqrt on ppc. :-( How/why did
> > they botch this?
> > 
> 
> Maybe it's a new extension? I only know version 2.2 of the PowerPC Book.
> 
> Or maybe it goes back to the single core thing. (Only the 970 supports
> it, and that's pretty new.) Or maybe Linux kernel developers aren't
> interested in this problem, because a manual sqrt exists, and if need
> be, anyone can just implement the Babylonian method for speed. On PPC,
> it can be implemented in a loop consisting of four instructions, namely:
> 
> ; .rodata
> half: .double 0.5
> ; assuming positive finite argument
> ; if that can't be assumed, go through memory to inspect argument
> fmr 1, 0    ; yes, halving the exponent would be a better estimate
> ; requires going through memory, though
> lfd 2, half(13)
> li 0, 6 ; or more for more accuracy
> mtctr 0
> 
> 1:  ; fr0 = x, fr1 = a
>     fdiv 3, 1, 0    ; fr3 = a/x
>     fadd 3, 3, 0    ; fr3 = x + a/x
>     fmul 0, 3, 2    ; fr0 = 0.5(x + a/x)
>     bdnz 1b
> 
> So maybe there wasn't a lot of need for the hardware sqrt.

I don't think this works at all. sqrt() is required to be
correctly-rounded; that's the whole reason sqrt.c is costly.

> > > Well, yes, I was just throwing shit at a wall to see what sticks. We
> > > could also move the function pointer dispatch into a pthread_once block
> > > or something. I don't know if any caches need to be cleared then or not.
> > 
> > pthread_once/call_once would be the nice clean abstraction to use, but
> > it's mildly to considerably more expensive, currently involving a full
> > barrier. There's a nice technical report on how that can be eliminated
> > but it requires TLS, which is also expensive on some archs. In cases
> > like this where there's no state other than the function pointer,
> > relaxed atomics can simply be used on the reading end and then they're
> > always fast.
> 
> Hmmm... not on PPC, though. TLS on Linux PPC just uses r2 as TLS
> pointer. So the entire thing could be used almost as-is by making sqrtfn
> thread-local?

Yes and no. Not in musl because we don't use _Thread_local; it would
require allocating space in the thread structure which is not
appropriate for something like this. The right and most efficient
solution is the one I described above.
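
A minimal sketch of that approach, assuming C11 <stdatomic.h>. It is
not musl's actual code: sqrt_fn, sqrtfn, sqrt_resolve, sqrt_soft,
sqrt_hw, cpu_has_fsqrt, placeholder_sqrt and my_sqrt are all
hypothetical names, and the arithmetic is a stand-in that is not
correctly rounded.

```c
#include <stdatomic.h>

typedef double (*sqrt_fn)(double);

/* Placeholder arithmetic so the sketch builds without libm. It is NOT
 * correctly rounded and only suits moderate positive inputs; a real
 * libc would use its sqrt.c code or an fsqrt wrapper instead. */
static double placeholder_sqrt(double x)
{
	if (!(x > 0)) return x;
	double r = x;
	for (int i = 0; i < 64; i++) r = 0.5 * (r + x / r);
	return r;
}

/* Hypothetical stand-ins for the portable and the hardware variants. */
static double sqrt_soft(double x) { return placeholder_sqrt(x); }
static double sqrt_hw(double x)   { return placeholder_sqrt(x); }

/* Hypothetical feature probe; always reports "no fsqrt" here. */
static int cpu_has_fsqrt(void) { return 0; }

static double sqrt_resolve(double x);

/* The only shared state is this pointer, which starts at the resolver. */
static _Atomic(sqrt_fn) sqrtfn = sqrt_resolve;

/* First call: pick an implementation, publish it, then use it. */
static double sqrt_resolve(double x)
{
	sqrt_fn f = cpu_has_fsqrt() ? sqrt_hw : sqrt_soft;
	atomic_store_explicit(&sqrtfn, f, memory_order_relaxed);
	return f(x);
}

/* Fast path: one relaxed load and an indirect call, no barrier. */
double my_sqrt(double x)
{
	sqrt_fn f = atomic_load_explicit(&sqrtfn, memory_order_relaxed);
	return f(x);
}
```

If several threads make the first call at once, each may run the
resolver, but they all store the same pointer, so the race is benign.
That only holds because nothing else has to become visible together
with the pointer.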

Rich

