From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/14995
Path: news.gmane.org!.POSTED.blaine.gmane.org!not-for-mail
From: "Stefan Kanthak" <stefan.kanthak@nexgo.de>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: More patches for math subtree
Date: Wed, 11 Dec 2019 10:53:41 +0100
Organization: Me, myself & IT
Message-ID: <F2BD74F09B4748A094E99A4B8C5CBB4A@H270>
References: <2C3325A208DA4260A1A0F7B4517D6DFA@H270> <20191210193558.GK1666@brightrain.aerifal.cx> <FAF063FA3F8F4F1B883929708874E3AA@H270> <20191210221738.GL1666@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
Mime-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Injection-Info: blaine.gmane.org; posting-host="blaine.gmane.org:195.159.176.226";
	logging-data="143399"; mail-complaints-to="usenet@blaine.gmane.org"
Cc: <musl@lists.openwall.com>
To: "Rich Felker" <dalias@libc.org>
Original-X-From: musl-return-15011-gllmg-musl=m.gmane.org@lists.openwall.com Wed Dec 11 10:58:41 2019
Return-path: <musl-return-15011-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by blaine.gmane.org with smtp (Exim 4.89)
	(envelope-from <musl-return-15011-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1ieylS-000b8v-C7
	for gllmg-musl@m.gmane.org; Wed, 11 Dec 2019 10:58:38 +0100
Original-Received: (qmail 3564 invoked by uid 550); 11 Dec 2019 09:58:35 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
List-ID: <musl.lists.openwall.com>
Original-Received: (qmail 3534 invoked from network); 11 Dec 2019 09:58:34 -0000
In-Reply-To: <20191210221738.GL1666@brightrain.aerifal.cx>
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Windows Mail 6.0.6002.18197
X-MimeOLE: Produced By Microsoft MimeOLE V6.1.7601.24158
X-VADE-STATUS: LEGIT
Xref: news.gmane.org gmane.linux.lib.musl.general:14995
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/14995>

"Rich Felker" <dalias@libc.org> wrote:

> On Tue, Dec 10, 2019 at 10:32:26PM +0100, Stefan Kanthak wrote:

[ asm vs. C ]

>> Does any compiler emit branch-free instruction sequences like the
>> following for Intel CPUs without SSE4.1, i.e. without ROUNDSS/ROUNDSD?
>> 
>>         .code   ; Intel syntax
>> ceil    proc    public
>>         extern  __real@8000000000000000:real8
>>         movsd   xmm1, __real@8000000000000000
>>         extern  __real@3ff0000000000000:real8
>>         movsd   xmm2, __real@3ff0000000000000
>>         extern  __real@4330000000000000:real8
>>         movsd   xmm3, __real@4330000000000000
>>         movsd   xmm4, xmm1
>>         andnpd  xmm1, xmm0
>>         andpd   xmm4, xmm0
>>         cmpltsd xmm1, xmm3
>>         andpd   xmm1, xmm3
>>         orpd    xmm1, xmm4
>>         movsd   xmm3, xmm0
>>         addsd   xmm0, xmm1
>>         subsd   xmm0, xmm1
>>         movsd   xmm1, xmm0
>>         cmpltsd xmm0, xmm3
>>         andpd   xmm0, xmm2
>>         addsd   xmm0, xmm1
>>         orpd    xmm0, xmm4
>>         ret
>> ceil    endp
>> 
>> Or instruction sequences like
>> 
>>         .code   ; Intel syntax
>> copysign proc   public
>>         movd    rcx, xmm0
>>         movd    rdx, xmm1
>>         shld    rcx, rdx, 1
>>         ror     rcx, 1
>>         movd    xmm0, rcx
>>         ret
>> copysign endp
> 
> Not quite (but it might be possible to write the C in terms of shifts
> instead of masks such that it does), but I also don't think it's clear
> which version is better. Yours here is mildly smaller and might
> perform better, but when making changes that aren't clearly better
> there should be some evidence that it's actually an improvement --
> especially if it's not just improving existing arch optimizations but
> adding new ones where the C was formerly used.

Correct.
I expect the compiler to emit such properly optimised code instead of
calls to the library for standard functions like copysign(), fdim(),
etc., which can be written with just a few instructions ... something
the compiler does not (always) do.
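As an illustration, the copysign sequence quoted above can be written in plain C with shifts rather than masks, along the lines Rich suggests. This is only a sketch: copysign_shift is a hypothetical name (chosen to avoid clashing with libm's copysign), it assumes double is IEEE-754 binary64 with the sign in bit 63, and whether a given compiler turns it into shld/ror, or into the usual and/or mask sequence, is not guaranteed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical shift-based copysign: magnitude of x, sign of y.
   Assumes IEEE-754 binary64 with the sign in bit 63. */
static double copysign_shift(double x, double y)
{
    uint64_t ux, uy;
    memcpy(&ux, &x, sizeof ux);   /* type-pun without aliasing issues */
    memcpy(&uy, &y, sizeof uy);
    /* ux << 1 >> 1 clears x's sign bit (like shld dropping it);
       uy >> 63 << 63 isolates y's sign bit (like ror moving it to the top) */
    ux = (ux << 1 >> 1) | (uy >> 63 << 63);
    memcpy(&x, &ux, sizeof x);
    return x;
}
```

For example, copysign_shift(3.0, -0.0) yields -3.0, and copysign_shift(0.0, -1.0) yields -0.0.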

JFTR: I don't know whether GCC or clang provide intrinsics or
      __builtin_* functions for such (or all of these) small standard functions.
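For comparison, the branch-free ceil sequence quoted earlier can be expressed in C roughly as follows. This is a sketch, not musl's implementation: ceil_nobranch, as_bits and as_double are hypothetical names, and it assumes IEEE-754 binary64 evaluated without excess precision (FLT_EVAL_METHOD == 0, as with SSE2 on x86-64), the default round-to-nearest mode, and no -ffast-math (which could let the compiler fold (x + c) - c away).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers to reinterpret a double as its bit pattern and back. */
static uint64_t as_bits(double x) { uint64_t u; memcpy(&u, &x, sizeof u); return u; }
static double as_double(uint64_t u) { double x; memcpy(&x, &u, sizeof x); return x; }

/* Branch-free ceil mirroring the quoted SSE2 sequence.
   Assumes binary64 without excess precision and round-to-nearest. */
static double ceil_nobranch(double x)
{
    const uint64_t sign_mask = 0x8000000000000000u;   /* __real@8000000000000000 */
    const double two52 = 4503599627370496.0;         /* 2^52, __real@4330000000000000 */
    uint64_t sign = as_bits(x) & sign_mask;
    double ax = as_double(as_bits(x) & ~sign_mask);  /* |x|, like andnpd */

    /* all-ones mask when |x| < 2^52 (false for NaN), like cmpltsd */
    uint64_t lt = -(uint64_t)(ax < two52);
    /* c = +-2^52 for small |x|, +-0 otherwise */
    double c = as_double((as_bits(two52) & lt) | sign);

    double r = (x + c) - c;                          /* x rounded to an integer */
    uint64_t below = -(uint64_t)(r < x);             /* did rounding go below x? */
    r = r + as_double(as_bits(1.0) & below);         /* add 1.0 or +0.0 */
    return as_double(as_bits(r) | sign);             /* keep -0.0 for -1 < x <= -0 */
}
```

The final OR with the sign bit is what makes ceil(-0.5) come out as -0.0 rather than +0.0, matching the last orpd in the assembly.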

> Generally musl avoids asm and arch-specific files as much as possible,
> using them only for things that aren't representable in C or where
> the C is a lot larger or slower or both.
> 
>>         .code   ; Intel syntax
>> fdim    proc    public
>>         movsd   xmm2, xmm0
>>         cmpsd   xmm0, xmm1, 6
>>         subsd   xmm2, xmm1
>>         andpd   xmm0, xmm2
>>         ret
>> fdim    endp
> 
> Does this handle nans correctly?

Of course! It's equivalent to

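#include <stdint.h>

double fdim(double a, double b)
{
    /* all ones unless a <= b; that comparison is false for NaN operands */
    uint64_t mask = (a <= b) ? 0ull : ~0ull;
    union {double dbl; uint64_t ull;} u = {a - b};
    u.ull &= mask;    /* +0.0 if a <= b, else a - b (a NaN stays NaN) */
    return u.dbl;
}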
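To check the NaN claim mechanically, the masked version can be compared against a straightforward branching reference written from the C99 definition of fdim (a - b if a > b, +0 if a <= b), with NaN operands following ordinary NaN propagation. fdim_mask, fdim_ref and agrees are hypothetical names, used so this sketch does not clash with libm's fdim.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* The masked fdim from above, renamed to avoid clashing with libm. */
static double fdim_mask(double a, double b)
{
    /* a <= b is false for a > b and whenever a or b is NaN,
       so the mask keeps a - b (and its NaN) in exactly those cases */
    uint64_t mask = (a <= b) ? 0 : ~(uint64_t)0;
    union { double dbl; uint64_t ull; } u = { a - b };
    u.ull &= mask;
    return u.dbl;
}

/* Branching reference written from the C99 description of fdim. */
static double fdim_ref(double a, double b)
{
    if (isnan(a) || isnan(b))
        return a + b;            /* propagate a NaN operand */
    return a > b ? a - b : 0.0;
}

/* 1 if both results are NaN, or both are bit-for-bit identical
   (the bit comparison also distinguishes +0.0 from -0.0). */
static int agrees(double a, double b)
{
    double x = fdim_mask(a, b), y = fdim_ref(a, b);
    if (isnan(x) || isnan(y))
        return isnan(x) && isnan(y);
    return memcmp(&x, &y, sizeof x) == 0;
}
```

Only the isnan macro is used from <math.h>, so no libm function needs to be linked.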

[...]

> OK. I don't mind looking at these patches further as-is, and I'll try
> to continue offering constructive comments now, but it'll be after
> this release cycle (hopefully wrapping that up in the next week or so)
> before consideration for merging. musl 1.2.0 is already going to be a
> release with big changes (time64) and I don't want to risk subtle
> breakage with new changes that haven't been reviewed in detail yet or
> had time for users to test.

That's OK.

Stefan