Rich Felker wrote:

If you don't want patches for assembly modules, please state so
VERY CLEARLY in your FAQ.

> On Fri, Aug 06, 2021 at 12:17:12PM +0200, Stefan Kanthak wrote:
>> Alexander Monakov wrote:
>>
>> > On Wed, 4 Aug 2021, Stefan Kanthak wrote:
>> >> The change just follows by removing 6 LOC/instructions.-)
>> >
>> > Have you considered collecting the three bits in one go via a multiplication?
>>
>> No. My mind is not that twisted;-)
>>
>> > You can first isolate the necessary bits with 'and $0x4300, %eax', then do
>> > 'imul $0x910000, %eax, %eax' to put the required bits in EAX[31:29] in the
>> > right order, then shift right by 29. Three instructions, 14 bytes.
>>
>> Thanks, VERY NICE! How did you come up to it?
>>
>> Revised patch with shorter bit twiddling attached.
>
> The path forward for all the math asm is moving it to inline asm in C
> files, with no flow control or bit/register shuffling in the asm, only
> using asm for the single instructions. See how Alexander Monakov did
> x86_64 remquol in commit 19f870c3a68a959c7c6ef1de12086ac908920e5e.

This commit is for i386 fmod/fmodf/fmodl. The bit twiddling used in
(which I hadn't noticed yet) and the code GCC generates for it is
(almost) as bad as the original assembly code:

|       shrl    $8, %eax
|       movl    %eax, %ecx
|       movabsq $8463725162920157216, %rax
|       rolb    $4, %cl
|       andl    $60, %ecx
|       sarq    %cl, %rax
|       andl    $7, %eax

vs.

|       mov     %ah,%dl
|       shr     %dl
|       and     $1,%dl
|       mov     %ah,%al
|       shr     $5,%al
|       and     $2,%al
|       or      %al,%dl
|       mov     %ah,%al
|       shl     $2,%al
|       and     $4,%al
|       or      %al,%dl

> I haven't read the mul trick here in detail but I believe it should be
> duplicable with plain C * operator.

It is.

> I really do not want to review/merge asm changes that keep this kind
> of complex logic in asm when there's no strong motivation for it (like
> fixing an actual bug, vs just reducing size or improving speed). The
> risk to reward ratio is just not reasonable.

Final patch attached!

Stefan
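For readers following the thread, here is a minimal C sketch of the multiplication trick quoted above, written with the plain C * operator. It assumes the usual x87 status word layout (C0 = bit 8, C1 = bit 9, C3 = bit 14) and that the three low quotient bits are wanted in the order C0, C3, C1, which is what the eleven-instruction sequence above computes. The function names are hypothetical and are not taken from musl or from the attached patch.

```c
#include <assert.h>
#include <stdint.h>

/* Reference version: extract C0, C1 and C3 one at a time and place
   them as result bits 2, 0 and 1 (same result as the mov/shr/and/or
   sequence in the mail). */
unsigned quo_bits_shift(uint16_t sw)
{
    unsigned c0 = (sw >> 8) & 1;   /* status word bit 8  */
    unsigned c1 = (sw >> 9) & 1;   /* status word bit 9  */
    unsigned c3 = (sw >> 14) & 1;  /* status word bit 14 */
    return c0 << 2 | c3 << 1 | c1;
}

/* Multiplication version: mask, multiply, shift.
   0x910000 has bits 16, 20 and 23 set, so modulo 2^32 the product has
     C0 (bit 8)  at bits 24, 28, 31
     C1 (bit 9)  at bits 25, 29     (bit 32 is discarded)
     C3 (bit 14) at bit  30         (bits 34, 37 are discarded)
   No two terms land on the same bit, so there are no carries and
   bits 31:29 read C0, C3, C1. */
unsigned quo_bits_mul(uint16_t sw)
{
    uint32_t x = sw & UINT32_C(0x4300);
    x *= UINT32_C(0x910000);       /* wraps modulo 2^32 */
    return x >> 29;                /* logical shift: x is unsigned */
}

/* Exhaustive self-check over every 16-bit status word.
   Returns the number of values checked. */
unsigned check_all_status_words(void)
{
    unsigned n = 0;
    for (uint32_t sw = 0; sw <= 0xFFFF; sw++, n++)
        assert(quo_bits_mul((uint16_t)sw) == quo_bits_shift((uint16_t)sw));
    return n;
}
```

Calling check_all_status_words() once compares the two versions on all 65536 inputs. The unsigned type matters: the final shift must be a logical one, as with shr in the quoted three-instruction sequence.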