mailing list of musl libc
From: Markus Wichmann <nullplan@gmx.net>
To: musl@lists.openwall.com
Subject: Re: Model specific optimizations?
Date: Fri, 30 Sep 2016 06:56:15 +0200
Message-ID: <20160930045615.GD22343@voyager>
In-Reply-To: <20160929181336.GL19318@brightrain.aerifal.cx>

On Thu, Sep 29, 2016 at 02:13:36PM -0400, Rich Felker wrote:
> On Thu, Sep 29, 2016 at 07:08:01PM +0200, Markus Wichmann wrote:
> > [...]
> On Linux it's supposed to be the kernel which detects availability of
> features (either by feature-specific cpu flags or translating a model
> to flags) but I don't see anything for fsqrt on ppc. :-( How/why did
> they botch this?
> 

Maybe it's a new extension? I only know version 2.2 of the PowerPC Book.

Or maybe it goes back to the single core thing. (Only the 970 supports
it, and that's pretty new.) Or maybe Linux kernel developers aren't
interested in this problem, because a manual sqrt exists, and if need
be, anyone can just implement the Babylonian method for speed. On PPC,
it can be implemented in a loop consisting of four instructions, namely:

; .rodata
half: .double 0.5
; assuming positive finite argument
; if that can't be assumed, go through memory to inspect argument
fmr 1, 0    ; fr1 = a; fr0 keeps a as the initial estimate x
; yes, halving the exponent would be a better estimate;
; requires going through memory, though
lfd 2, half(13)    ; fr2 = 0.5
li 0, 6    ; or more for more accuracy
mtctr 0

1:  ; fr0 = x, fr1 = a
    fdiv 3, 1, 0    ; fr3 = a/x
    fadd 3, 3, 0    ; fr3 = x + a/x
    fmul 0, 3, 2    ; fr0 = 0.5(x + a/x)
    bdnz 1b

So maybe there wasn't a lot of need for the hardware sqrt.
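
For anyone who doesn't read PPC assembly, here is a rough C rendering of
the loop above. It is only a sketch under the same assumption (positive
finite argument); babylonian_sqrt and its iterations parameter are names
made up for illustration, not anything in musl.

```c
/* Babylonian (Newton) iteration for sqrt(a).
 * Assumes a is positive and finite, like the assembly above.
 * The function name and the iteration parameter are illustrative only. */
double babylonian_sqrt(double a, int iterations)
{
    double x = a;                  /* crude initial estimate: x = a */
    for (int i = 0; i < iterations; i++)
        x = 0.5 * (x + a / x);     /* fdiv, fadd, fmul in the assembly */
    return x;
}
```

With x = a as the starting estimate, six rounds are plenty for small
arguments such as 4.0, but an argument like 1e6 is still far from
converged after six, which is why a better initial estimate (or a larger
count) matters.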

> > Well, yes, I was just throwing shit at a wall to see what sticks. We
> > could also move the function pointer dispatch into a pthread_once block
> > or something. I don't know if any caches need to be cleared then or not.
> 
> pthread_once/call_once would be the nice clean abstraction to use, but
> it's mildly to considerably more expensive, currently involving a full
> barrier. There's a nice technical report on how that can be eliminated
> but it requires TLS, which is also expensive on some archs. In cases
> like this where there's no state other than the function pointer,
> relaxed atomics can simply be used on the reading end and then they're
> always fast.
> 

Hmmm... not on PPC, though. TLS on Linux PPC just uses r2 as the TLS
pointer. So the entire thing could be used almost as-is by making sqrtfn
thread-local?
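
For reference, the relaxed-atomics variant Rich describes might look
roughly like the following C11 sketch. Everything named here is
hypothetical: cpu_has_fsqrt() is a placeholder that always says no (as
noted above, there seems to be no kernel flag to ask), and soft_sqrt is
just a stand-in software routine for positive finite input.

```c
#include <stdatomic.h>

typedef double (*sqrt_fn)(double);

/* Stand-in software sqrt: Babylonian iteration, positive finite input only. */
static double soft_sqrt(double a)
{
    double x = a;
    for (int i = 0; i < 64; i++)
        x = 0.5 * (x + a / x);
    return x;
}

static double hard_sqrt(double x)
{
#if defined(__powerpc__) || defined(__powerpc64__)
    __asm__ ("fsqrt %0, %1" : "=d"(x) : "d"(x));
    return x;
#else
    return soft_sqrt(x);    /* so the sketch still builds on other targets */
#endif
}

/* Hypothetical feature probe; a real one would need some source of truth. */
static int cpu_has_fsqrt(void)
{
    return 0;
}

/* Static storage, so it starts out as a null pointer. */
static _Atomic(sqrt_fn) sqrtfn;

double dispatch_sqrt(double x)
{
    /* Relaxed is enough: the pointer is the only state being published,
     * and every thread that races here computes the same value. */
    sqrt_fn f = atomic_load_explicit(&sqrtfn, memory_order_relaxed);
    if (!f) {
        f = cpu_has_fsqrt() ? hard_sqrt : soft_sqrt;
        atomic_store_explicit(&sqrtfn, f, memory_order_relaxed);
    }
    return f(x);
}
```

The worst case is that several threads run the probe once each before
the store becomes visible, which is harmless as long as the probe is
idempotent.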

> > So any PowerPC implementation is free to include it or not.
> > There are a lot of optional features, and if the gas people made a
> > different subarch for each combination of them, they'd be here all day.
> 
> They've actually done that for some archs...
> 

That actually made me check if they did it here, but thankfully not. gas
assembles the instruction without flags, without warning, and without a
note or anything on the output file.

> Anyway, I would have no objection right away to doing a patch like
> this that's decided at compile-time based on predefined macros set by
> -march. For runtime choice I think we need to discuss motivation. Are
> you trying to do a powerpc-based distro where you need a universal
> libc.so that works optimally on various models? Or would just
> compiling for the right -march meet your needs?
> 

Just idle musings. I was reading sqrt.c, which has a flowerbox saying
"Use hardware sqrt if available" and recalled that there is a hardware
sqrt on PPC and started doing research from there. And that ended up in
the OP.
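
The compile-time variant you mention could be as small as the sketch
below. Two assumptions to flag: I am assuming the compiler predefines
_ARCH_PPCSQ when the selected CPU has the optional fsqrt instruction,
and my_sqrt/soft_sqrt are placeholder names (a libc would define sqrt
itself and fall back to its generic C implementation).

```c
/* Placeholder software fallback: Babylonian iteration,
 * positive finite input only. */
static double soft_sqrt(double a)
{
    double x = a;
    for (int i = 0; i < 64; i++)
        x = 0.5 * (x + a / x);
    return x;
}

/* my_sqrt is a hypothetical name used only for this sketch. */
double my_sqrt(double x)
{
#ifdef _ARCH_PPCSQ
    /* Assumption: this macro is predefined when the target CPU
     * selected at compile time implements fsqrt. */
    __asm__ ("fsqrt %0, %1" : "=d"(x) : "d"(x));
    return x;
#else
    return soft_sqrt(x);
#endif
}
```

That costs nothing at runtime, but the resulting libc.so is only correct
on models that match the flags it was built with.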

> Rich

Ciao,
Markus

