[musl] ASM-to-C conversion for i386

From: Markus Wichmann <nullplan@gmx.net>
To: musl@lists.openwall.com
Subject: [musl] ASM-to-C conversion for i386
Date: Sun, 26 Dec 2021 21:42:38 +0100
Message-ID: <20211226204238.GA1949@voyager>

Hi all,

merry Christmas, everyone. I hope you survived the various family
visitations in good health and are slowly coming out of the food coma,
or whatever your annual rituals are.

Anyway, I found myself with a bit of time on my hands and chose to be
productive for once. Rich made some noise however long ago that he
wanted to move from assembly source code files to C source code files
with inline assembly. So I looked at what I could contribute to that
cause.

This is hindered somewhat by the fact that my knowledge of assembler is
restricted to x86, PowerPC, and Microblaze. And for Microblaze, it has
been a while since I've used it. For ARM and most of the others, I can
get the gist, but there may be subtleties I am not grasping, and missed
subtleties are precisely what we cannot afford in such a conversion.

So I decided to start with the architecture I am most familiar with:
i386. And now I am finished with the largest part of it, the maths code.
That is, finished with the first pass.

You can follow the progress here: https://github.com/nullplan/musl/tree/asm2c

So I've converted __set_thread_area(). That was pretty straightforward
once I found SYSCALL_NO_TLS. The assembly generated by clang 6.0.0 hits
the same notes as the handwritten code, so I'm willing to count that as
a win.

For the maths code, I've added the likely() and unlikely() macros to
libm.h. Not sure if they belong there, but they do make the generated
assembly more similar to the handwritten code.
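
For illustration, a minimal sketch of how such macros are commonly
written with the GCC/clang builtin __builtin_expect(). This is not the
code from the asm2c branch; demo_half() is a hypothetical function that
only shows the intended use on a rare special-case branch.

```c
/* Branch hints: tell the compiler which outcome to lay out as the
 * fall-through path. Requires GCC or clang. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical example: the NaN check is marked as the rare case, so
 * the common path runs straight through without a taken branch. */
static float demo_half(float x)
{
	if (unlikely(x != x))	/* true only for NaN */
		return x;
	return 0.5f * x;
}
```

The hint changes only code layout, not the result of the condition.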

Most of that code was straightforward, but some of the more complex
functions I am not sure about. What is up with __exp2l()? I can see that
expl() is calling it, but I'm not sure why. But its existence forced me
to employ a technique not used elsewhere in the code (that I could
find): A hidden alias. I vaguely recall that such hackery was rejected
before (on grounds of old binutils reacting badly to such magic), but I
don't really know what else I could have done. Or was the correct way to
make __exp2l() a hidden function with the actual implementation and
exp2l() (without the underscores) a weak alias?
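
To make the two options concrete, here is a sketch using the GCC/clang
attributes directly, assuming an ELF target. All demo_* names are
hypothetical stand-ins (demo_twice_a for exp2l() and so on), and the
function body is a placeholder, not the real implementation.

```c
/* Variant A: the public function holds the implementation, and a
 * hidden alias gives libc-internal callers a second name for it. */
double demo_twice_a(double x)
{
	return x + x;
}
extern __typeof(demo_twice_a) demo_twice_a_internal
	__attribute__((__alias__("demo_twice_a"), __visibility__("hidden")));

/* Variant B: the hidden function holds the implementation, and the
 * public name is a weak alias for it. */
__attribute__((__visibility__("hidden")))
double demo_twice_b_internal(double x)
{
	return x + x;
}
extern __typeof(demo_twice_b_internal) demo_twice_b
	__attribute__((__weak__, __alias__("demo_twice_b_internal")));
```

In both variants the two names resolve to the same address; the
difference is which symbol carries the definition and whether the
public one is weak.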

Anyway, the maths code suffers from massive code duplication on both
assembler and C levels. Not sure what to do about it, though. In many
cases, the three versions of a function differ only in the fine
details, but clang being as inline-happy as it is means that many
techniques to reduce code duplication in C cause bloated object files
once compiled. For example, all functions of the floor, ceil, and trunc
families have been implemented in floor.c, in terms of a new static
function I called "rndint()", containing the heart of what used to be at
label 1 in floor.s. Unfortunately, after compiling, clang has inlined
rndint() every time, so that floor.o contains all nine functions, and
all functions are substantially copies of rndint(). The only solution I
would see to that would have been to rename "rndint()" to something with
a double underscore at the start, make it hidden and extern, and move
all the functions into their own files, thus preventing inlining and
making the object files more modular. Not sure how you'd like it.
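
A rough sketch of that layout, shown in one block for brevity. The names
are hypothetical, only the double-precision wrappers are shown, and the
helper body is a portable placeholder (valid for finite values, ignoring
the sign of zero), not the x87 control-word code from floor.s.

```c
/* Would live in its own file, e.g. next to the real helper. Hidden
 * visibility keeps it out of the dynamic symbol table; being extern in
 * a separate translation unit is what stops the compiler from inlining
 * it into every caller (absent LTO). */
enum demo_rnd { DEMO_DOWN, DEMO_UP, DEMO_ZERO };

__attribute__((__visibility__("hidden")))
double demo_rndint(double x, enum demo_rnd mode)
{
	/* Values this large are already integral; NaN also lands here. */
	if (!(x > -0x1p52 && x < 0x1p52))
		return x;
	double t = (double)(long long)x;	/* truncates toward zero */
	if (mode == DEMO_DOWN && t > x) t -= 1.0;
	if (mode == DEMO_UP && t < x) t += 1.0;
	return t;
}

/* Each of these would then be a tiny function in its own file. */
double demo_floor(double x) { return demo_rndint(x, DEMO_DOWN); }
double demo_ceil(double x)  { return demo_rndint(x, DEMO_UP); }
double demo_trunc(double x) { return demo_rndint(x, DEMO_ZERO); }
```

With this split, each object file contains one short wrapper plus a call,
at the cost of an extra call per invocation.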

Also, the generated assembly tends to use more memory. It appears that
clang is hesitant to overwrite memory allocated to a variable, even if
that variable is currently parked in a register. Or maybe my clang
version is just weird. That also explains why it sometimes emits "fld"
instructions in the wrong order and then fixes the mistake with "fxch".
Not a huge deal, just weird. Nothing forces the wrong order. And the
order is often correct in the smaller precision versions of the same
function.

Many of the maths functions test if their argument is subnormal, and
raise an underflow exception if so and the argument is not zero. For the
single-precision case, the idiom used was to square the input, which I
have recreated with FORCE_EVAL(). For the double-precision case,
however, it was to store the variable as single precision.
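
A sketch of both idioms in C. DEMO_FORCE_EVAL_F() is a local stand-in
for musl's FORCE_EVAL(), and the demo_* helpers are hypothetical; the
subnormal test here works on the bit pattern, which is an assumption
about how one might write it, not a copy of the converted code.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for FORCE_EVAL(): the volatile store keeps the compiler
 * from discarding a computation whose only purpose is its FP flags. */
#define DEMO_FORCE_EVAL_F(expr) \
	do { volatile float demo_sink; demo_sink = (expr); (void)demo_sink; } while (0)

/* Exponent field zero and mantissa nonzero: a nonzero subnormal. */
static int demo_is_subnormal_f(float x)
{
	uint32_t bits;
	memcpy(&bits, &x, sizeof bits);
	return !(bits & 0x7f800000u) && (bits & 0x007fffffu);
}

static int demo_is_subnormal_d(double x)
{
	uint64_t bits;
	memcpy(&bits, &x, sizeof bits);
	return !(bits & 0x7ff0000000000000u) && (bits & 0x000fffffffffffffu);
}

/* Single precision: squaring a subnormal underflows. */
static float demo_underflow_f(float x)
{
	if (demo_is_subnormal_f(x))
		DEMO_FORCE_EVAL_F(x * x);
	return x;
}

/* Double precision: a subnormal double is far too small for float,
 * so narrowing it underflows. */
static double demo_underflow_d(double x)
{
	if (demo_is_subnormal_d(x))
		DEMO_FORCE_EVAL_F((float)x);
	return x;
}
```

In both cases the computed value is thrown away; only the side effect on
the floating-point status flags matters.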

Finally, I have also converted fenv.s today. I was hesitant to do that
at first, since a general C framework for fenv is under development, but
it has been quite a while since I've heard a peep from that project. In
any case, since their code should overwrite all of the existing fenv
code, a merge would now just lead to trivial path conflicts that are
easily resolved.

I believe in doing the conversion, I found a bug in feclearexcept(). The
original code said in the non-SSE version (context: EAX contains the
status word, ECX contains the function argument, and "1b" is a function
return)

|	test %eax,%ecx
|	jz 1b
|	not %ecx
|	and %ecx,%eax
|	test $0x3f,%eax
|	jz 1f
|	fnclex
|	jmp 1b
|1:	sub $32,%esp
|	fnstenv (%esp)
|	mov %al,4(%esp)
|	fldenv (%esp)
|	add $32,%esp
|	xor %eax,%eax
|	ret

That second "jz" confuses me. The intent seems to be to test if any
exceptions remain, and use "fnclex" if not. That would make sense, since
"fnclex" clears all exceptions. But since the second "jz" is a "jz" and
not a "jnz", the "fnclex" path is used only if exceptions remain, and
the slower "fldenv" path is used if none remain. Or am I reading this
wrong?

Anyway, I implemented the logic that made sense to me in the C version.

What remains to be done? Well, looking at the list of assembler files,
the only targets for a C conversion that remain (in i386) are the string
functions. After that, it is time to clean up and submit patches.

Speaking of, how would you like those? One patch for everything, one
patch per directory (i.e. one for thread, one for math, one for fenv,
one for string), or one per function group (the three precisions of
each function), or one per function? I don't want to overwhelm you.

Ciao,
Markus
