Date: Sun, 26 Dec 2021 21:42:38 +0100
From: Markus Wichmann
To: musl@lists.openwall.com
Message-ID: <20211226204238.GA1949@voyager>
Subject: [musl] ASM-to-C conversion for i386

Hi all,

merry Christmas, everyone. I hope you survived the various family
visitations in good health and are slowly coming out of the food coma,
or whatever your annual rituals are.

Anyway, I found myself with a bit of time on my hands and chose to be
productive for once. Rich made some noise however long ago that he
wanted to move from assembly source code files to C source code files
with inline assembly. So I looked at what I could contribute to that
cause. This is hindered somewhat by the fact that my knowledge of
assembler is restricted to x86, PowerPC, and Microblaze. And for
Microblaze, it has been a while since I've used it. For ARM and most of
the others, I can get the gist, but there may be subtleties I am not
grasping, and that is precisely what we cannot use for such a
conversion.

So I decided to start with the architecture I am most familiar with:
i386. And now I am finished with the largest part of it, the maths code.
That is, finished with the first pass. You can follow the progress here:

https://github.com/nullplan/musl/tree/asm2c

So I've converted __set_thread_area(). That was pretty straightforward
once I found SYSCALL_NO_TLS. The assembly generated by clang 6.0.0 hits
the same notes as the handwritten code, so I'm willing to count that as
a win.

For the maths code, I've added the likely() and unlikely() macros to
libm.h. Not sure if they belong there, but they do make the generated
assembly more similar to the handwritten code. Most of that code was
straightforward, but some of the more complex functions I am not sure
about.

What is up with __exp2l()? I can see that expl() is calling it, but I'm
not sure why. But its existence forced me to employ a technique not used
elsewhere in the code (that I could find): a hidden alias. I vaguely
recall that such hackery was rejected before (on grounds of old binutils
reacting badly to such magic), but I don't really know what else I could
have done. Or was the correct way to make __exp2l() a hidden function
with the actual implementation and exp2l() (without the underscores) a
weak alias?

Anyway, the maths code suffers from massive code duplication on both the
assembler and the C level. Not sure what to do about it, though. In many
cases, the three versions of a function differ only in the fine details,
but clang being as inline-happy as it is means that many techniques to
reduce code duplication in C cause bloated object files after
compilation.

For example, all functions of the floor, ceil, and trunc families have
been implemented in floor.c, in terms of a new static function I called
"rndint()", containing the heart of what used to be at label 1 in
floor.s. Unfortunately, after compiling, clang has inlined rndint()
every time, so that floor.o contains all nine functions, and all
functions are substantially copies of rndint().
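A rough sketch of the floor.c layout described above, reduced to the
double-precision variants. Everything here is illustrative and not taken
from the asm2c branch: the enum, the sketch_* names, and the cast-based
body of rndint() are stand-ins (the real helper would switch the x87
rounding mode and use frndint), and the noinline attribute is an
assumption about one GCC/clang-specific way to keep a single copy of the
helper in the object file.

```c
/* Rounding direction selector; hypothetical names for this sketch. */
enum rnd_dir { RND_DOWN, RND_UP, RND_ZERO };

/*
 * Stand-in for the shared helper. The cast truncates toward zero and is
 * only valid for finite x that fit in a long long; it also loses the
 * sign of zero. It exists only so the file structure can be shown.
 * The noinline attribute (GCC/clang extension, an assumption here) asks
 * the compiler not to copy the body into every wrapper.
 */
__attribute__((noinline))
static double rndint(double x, enum rnd_dir dir)
{
    double t = (double)(long long)x;  /* truncate toward zero */

    if (dir == RND_DOWN && t > x)
        t -= 1.0;                     /* negative non-integer: step down */
    if (dir == RND_UP && t < x)
        t += 1.0;                     /* positive non-integer: step up */
    return t;
}

/* Thin wrappers; in the layout described above there would be nine,
 * three precisions for each of floor, ceil, and trunc. */
double sketch_floor(double x) { return rndint(x, RND_DOWN); }
double sketch_ceil(double x)  { return rndint(x, RND_UP); }
double sketch_trunc(double x) { return rndint(x, RND_ZERO); }
```

With the attribute removed, an optimizing compiler is free to inline
rndint() into each wrapper, which is the duplication described above.
Whether a compiler-specific attribute would be acceptable in musl is a
separate question.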
The only solution I can see would be to rename "rndint()" to something
with a double underscore at the start, make it hidden and extern, and
move all the functions into their own files, thus preventing inlining
and making the object files more modular. Not sure how you'd like it.

Also, the generated assembly tends to use more memory. It appears that
clang is hesitant to overwrite memory allocated to a variable, even if
that variable is currently parked in a register. Or maybe my clang
version is just weird. That would also explain why it sometimes emits
"fld" instructions in the wrong order and then fixes the mistake with
"fxch". Not a huge deal, just weird. Nothing forces the wrong order. And
the order is often correct in the smaller precision versions of the same
function.

Many of the maths functions test whether their argument is subnormal,
and raise an underflow exception if so and the argument is not zero.
For the single-precision case, the idiom used was to square the input,
which I have recreated with FORCE_EVAL(). For the double-precision case,
however, it was to store the variable as single precision.

Finally, I have also converted fenv.s today. I was hesitant to do that
at first, since a general C framework for fenv is under development, but
it has been quite a while since I've heard a peep from that project. In
any case, since their code should overwrite all of the existing fenv
code, a merge would now just lead to trivial path conflicts that are
easily resolved.

I believe that in doing the conversion, I found a bug in feclearexcept().
The original code said in the non-SSE version (context: EAX contains the
status word, ECX contains the function argument, and "1b" is a function
return)

|	test %eax,%ecx
|	jz 1b
|	not %ecx
|	and %ecx,%eax
|	test $0x3f,%eax
|	jz 1f
|	fnclex
|	jmp 1b
|1:	sub $32,%esp
|	fnstenv (%esp)
|	mov %al,4(%esp)
|	fldenv (%esp)
|	add $32,%esp
|	xor %eax,%eax
|	ret

That second "jz" confuses me.
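To make the two possible readings of that branch easier to compare, here
is a plain-C model of the control flow in the listing. It executes no
x87 instructions; it only reports which path would be taken for a given
status word and argument. The enum and both function names are
hypothetical and exist only for this sketch.

```c
/* Which way the routine leaves; hypothetical names for this sketch. */
enum clear_path {
    PATH_NOTHING,  /* first "jz 1b": none of the requested flags are set */
    PATH_FNCLEX,   /* fnclex: clears every exception flag at once */
    PATH_FLDENV    /* fnstenv/patch/fldenv: writes back the remaining flags */
};

/* Branch sense as in the listing: "jz 1f" goes to the fldenv path when
 * no exception flags remain after masking, and falls through to fnclex
 * when some do remain. */
enum clear_path path_as_written(unsigned sw, unsigned mask)
{
    unsigned remaining;

    if (!(sw & mask))
        return PATH_NOTHING;
    remaining = sw & ~mask;
    if (!(remaining & 0x3f))
        return PATH_FLDENV;
    return PATH_FNCLEX;
}

/* Branch sense with "jnz" instead: fnclex only when nothing would be
 * left over, fldenv when other flags have to survive. */
enum clear_path path_with_jnz(unsigned sw, unsigned mask)
{
    unsigned remaining;

    if (!(sw & mask))
        return PATH_NOTHING;
    remaining = sw & ~mask;
    if (!(remaining & 0x3f))
        return PATH_FNCLEX;
    return PATH_FLDENV;
}
```

For a status word of 0x21 and an argument of 0x20, path_as_written()
reports the fnclex path even though flag 0x01 should survive, while
path_with_jnz() reports the fldenv path.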
The intent seems to be to test if any exceptions remain, and use
"fnclex" if not. That would make sense, since "fnclex" clears all
exceptions. But since the second "jz" is a "jz" and not a "jnz", the
"fnclex" path is used only if exceptions remain, and the slower "fldenv"
path is used if none remain. Or am I reading this wrong? Anyway, I
implemented the logic that made sense to me in the C version.

What remains to be done? Well, looking at the list of assembler files,
the only targets for a C conversion that remain (in i386) are the string
functions. After that, it is time to clean up and submit patches.
Speaking of, how would you like those? One patch for everything, one
patch per directory (i.e. one for thread, one for math, one for fenv,
one for string), one per function group (the three precisions of each
function), or one per function? I don't want to overwhelm you.

Ciao,
Markus