From: Markus Wichmann
Newsgroups: gmane.linux.lib.musl.general
Subject: Model specific optimizations?
Date: Thu, 29 Sep 2016 16:21:26 +0200
Message-ID: <20160929142126.GB22343@voyager>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

Hi there,

I wanted to ask whether there is any interest in supporting model-specific optimizations in the near future. By that I mean multiple implementations of the same function, with the best one selected at run time.

One simple example is PowerPC's fsqrt instruction. PowerPC Book 1 defines it as optional and provides no way to find out whether the currently running processor supports it, short of executing it and seeing if you get a SIGILL. A cursory DuckDuckGo search revealed that Apple uses the instruction as its sqrt implementation if it detects the CPU capability for it, but it detects that capability only by checking the PVR for known-good bit patterns. (Currently, the only PowerPC cores known to support this instruction are the 970 and 970FX, which have a version field of 0x39 and 0x3c, respectively.)

x86 and its derived architectures at least have the cpuid instruction to check for some features, and admittedly, there are a lot of defined bits. However, glibc's ifunc initialization function (which selects the implementation) also does a lot of work finding out the precise make and model of the CPU to set some more flags.
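To illustrate what a cpuid-based feature check looks like, here is a minimal sketch. It assumes a GCC or Clang toolchain that ships <cpuid.h> with the __get_cpuid() helper. The function name x86_has_sse2 is made up for this example and is not anything in musl or glibc.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* assumption: GCC/Clang helper header is available */
#endif

/* Hypothetical helper: returns 1 if cpuid reports SSE2, 0 otherwise.
 * On non-x86 builds there is no cpuid, so it simply returns 0. */
int x86_has_sse2(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 if the requested leaf is unsupported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    /* cpuid leaf 1: EDX bit 26 is the SSE2 feature flag. */
    return (edx >> 26) & 1;
#else
    return 0;
#endif
}
```

The point is that this asks the CPU about the feature itself, so it keeps working on models that did not exist when the code was written.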
The reason I ask is that lots of ISAs define optional parts that grow more and more popular until they're seen in all current practical implementations. x87, for example, started out as a separate device but has been a fixed part of x86 since the later days of the 486. The same goes for MMX, SSE, and SSE2. None of these are mandated by the ABI, but they are available in all practical implementations. And musl is never going to be able to utilize that in its current form. Oh, alright, the compiler might support it, but that's different. I also suspect the fsqrt instruction will be available in more PowerPC implementations in the future.

If we were to go this route, the question is how to go about it. First, the detection method: stuff like cpuid or AT_HWCAP is pretty nice, because it allows for the detection of a feature, whereas version checking only allows one to find known-good implementations. Version checking means there's a list of known-good values, and that list has to be kept up to date. However, version checking is also pretty much always possible, while feature detection isn't always available. The kernel doesn't check for fsqrt availability, for example.

Then, organization: do we go the glibc route, which gathers all indirect functions in a single section and initializes all of the pointers at startup (__libc_init_main()), or do we put these checks separately in each function?
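For the version-checking approach, a minimal sketch of what the known-good list might look like if it were kept in one array that several functions could share. The names fsqrt_versions, pvr_version and pvr_has_fsqrt are made up for illustration. The two version values are the ones mentioned above for the 970 and 970FX, and the sketch assumes the version sits in the upper 16 bits of the 32-bit PVR.

```c
#include <stddef.h>

/* Known-good PVR version fields for cores with fsqrt.
 * Only the two values named in this mail are listed. */
static const unsigned short fsqrt_versions[] = {
    0x39, /* PowerPC 970 */
    0x3c, /* PowerPC 970FX */
};

/* Hypothetical helper: extract the version field from a PVR value
 * (assumed to be the upper 16 bits of the 32-bit register). */
unsigned pvr_version(unsigned long pvr)
{
    return (pvr >> 16) & 0xffff;
}

/* Hypothetical helper: 1 if the PVR is on the known-good list, else 0. */
int pvr_has_fsqrt(unsigned long pvr)
{
    unsigned ver = pvr_version(pvr);
    size_t i;

    for (i = 0; i < sizeof fsqrt_versions / sizeof fsqrt_versions[0]; i++)
        if (fsqrt_versions[i] == ver)
            return 1;
    return 0;
}
```

Any core that is not on the list falls back to the software path, even if it does have the instruction. That is the maintenance cost of version checking.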
To make a practical example, we could implement sqrt() for PowerPC like this:

    static double soft_sqrt(double);
    static double hard_sqrt(double);
    static double init_sqrt(double);
    static double (*sqrtfn)(double) = init_sqrt;

    double sqrt(double x)
    {
        return sqrtfn(x);
    }

    /* First call: pick an implementation, then forward to it. */
    static double init_sqrt(double x)
    {
        unsigned long pvr;
        unsigned long ver;

        /* Read the Processor Version Register (SPR 287). */
        __asm__ ("mfspr %0, 287" : "=r"(pvr));
        ver = (pvr >> 16) & 0xffff;
        /* XXX: Add more values for cores with the fsqrt instruction here */
        if (0
            || ver == 0x39 /* PowerPC 970 */
            || ver == 0x3c /* PowerPC 970FX */
           )
            sqrtfn = hard_sqrt;
        else
            sqrtfn = soft_sqrt;
        return sqrtfn(x);
    }

    static double hard_sqrt(double x)
    {
        double r;
        __asm__ ("fsqrt %0, %1" : "=d"(r) : "d"(x));
        return r;
    }

    /* Pull in the generic implementation under the name soft_sqrt. */
    #define sqrt soft_sqrt
    #include "../sqrt.c"

The problem with this is that the same thing would have to be repeated for sqrtf(), so the same list of known values would have to be maintained twice, although we could make it a real list (an array, I mean) and get rid of that issue. But it does add quite a bit of code and the overhead of an indirect function call, and at the moment it is only going to be useful to a few people.

Also, the inclusion here is a hack. But I couldn't think of a better way.

Thoughts?

Ciao,
Markus