From: Mathias Krause <minipli@grsecurity.net>
To: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: WireGuard mailing list <wireguard@lists.zx2c4.com>
Subject: Re: [PATCH 0/2] wireguard-linux-compat: grsecurity compat patches
Date: Mon, 9 Aug 2021 12:13:54 +0200
Message-ID: <c772d6c2-dac7-8c11-7c41-aaa6435a0bd6@grsecurity.net>
In-Reply-To: <CAHmME9o38fA4_nDVyS9FvO1EL-Z4fS+a2eCo6Xq7VzMSrPYNKQ@mail.gmail.com>
Hi Jason,
On 08.08.21 at 22:53, Jason A. Donenfeld wrote:
> Hi Mathias,
>
> Sorry for the delay in reviewing these. Thanks for that. I've merged
> them with a trivial change.
Thanks!
> The constraint one is interesting; it
> looks like with your change and those extra constraints it winds up
> spilling rdi to the stack?
Indeed. Looking, for example, at fsquare_times() (!RAP, FRAMEPOINTER=y),
gcc seems to prefer the alternative with a memory output operand even
though it could satisfy the all-register one, i.e. it could allocate
all 3 remaining registers but chooses to allocate only 2. However, for
me it's %rsi that's put on the stack. Also, it's not spilled but used
as a variable: the stack frame grows by 8 bytes and the stack slot
that %rsi is written to is used directly as an operand later on. Again,
this differs from the original code, which allocated three registers
for the inline assembly. The old code has two memory loads and one
compare at the end, while the new one has only one memory access,
encoded directly into the compare instruction. So, memory-wise, a net
win?
Below is the diff of the disassembly of fsquare_times(), stripped of
addresses to reduce clutter, with comments below each hunk:
--- old.dis 2021-08-09 11:46:53.050680456 +0200
+++ new.dis 2021-08-09 11:48:18.816851059 +0200
@@ -6,10 +6,10 @@
push %r13
push %r12
push %rbx
- sub $0x18,%rsp
- mov %rdi,-0x38(%rbp)
- mov %rdx,-0x40(%rbp)
- mov %ecx,-0x2c(%rbp)
+ sub $0x20,%rsp
+ mov %rdi,-0x40(%rbp)
+ mov %rdx,-0x48(%rbp)
+ mov %ecx,-0x30(%rbp)
mov %rdx,%r12
mov (%rsi),%rdx
mulx 0x8(%rsi),%r8,%r14
This first hunk is really only the increased stack frame and resulting
offset changes.
@@ -92,15 +92,14 @@
cmovb %rdx,%rax
add %rax,%r8
mov %r8,(%r12)
- mov -0x2c(%rbp),%eax
+ mov -0x30(%rbp),%eax
sub $0x1,%eax
- mov %eax,-0x30(%rbp)
- je 1f42 <fsquare_times+0x3a2>
+ mov %eax,-0x34(%rbp)
+ je 1f3c <fsquare_times+0x39c>
xor %r12d,%r12d
- mov %r12d,-0x2c(%rbp) <-- no longer needed in new code
- mov -0x38(%rbp),%rsi
- mov -0x40(%rbp),%rdi
- mov %rsi,%r12 <-- see below
+ mov -0x40(%rbp),%rsi
+ mov -0x48(%rbp),%rdi
+ mov %rsi,-0x30(%rbp) <-- reg -> mem operand constraint
mov (%rsi),%rdx
mulx 0x8(%rsi),%r8,%r14
xor %r15,%r15
Offset changes again. Besides the one dropped instruction, there's only
one real change ("mov %rsi,%r12" -> "mov %rsi,-0x30(%rbp)"), induced by
the memory constraint of the inline assembly, which switches a register
operand to a memory location.
@@ -154,7 +153,7 @@
adcx %rcx,%r14
mov %r14,0x38(%rdi)
mov %rdi,%rsi
- mov %r12,%rdi
+ mov -0x30(%rbp),%rdi
mov $0x26,%rdx
mulx 0x20(%rsi),%r8,%r13
xor %rcx,%rcx
A load from the stack slot replaces what used to be a register move.
@@ -182,12 +181,10 @@
cmovb %rdx,%rax
add %rax,%r8
mov %r8,(%rdi)
- addl $0x1,-0x2c(%rbp) <-- mem operation
- mov -0x30(%rbp),%ebx <-- mem load
- mov -0x2c(%rbp),%eax <-- mem load
- cmp %ebx,%eax <-- reg compare
- jne 1d83 <fsquare_times+0x1e3>
- add $0x18,%rsp
+ add $0x1,%r12d <-- reg operation
+ cmp -0x34(%rbp),%r12d <-- mem compare
+ jne 1d7f <fsquare_times+0x1df>
+ add $0x20,%rsp
pop %rbx
pop %r12
pop %r13
We switch one memory operation (addl) to a register one and save the
two memory loads for the compare by encoding one memory operand
directly into the compare instruction. IMHO a net win, as gcc can now
use a register for the loop condition variable, making the conditional
jump depend on only one memory operand, not two. But only a benchmark
can tell whether it's really faster or even slower.
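For reference, the loop these hunks belong to has roughly the following
shape. This is a sketch under the assumption that fsquare_times() does
one squaring up front and n - 1 more in a counted loop, which is what
the disassembly suggests; fsqr_stub(), fsqr_calls and
fsquare_times_model() are made-up names, and the stub only counts
calls instead of doing field arithmetic:

```c
#include <stdint.h>

/* Hypothetical stand-in for the squaring routine; it only counts calls. */
static unsigned int fsqr_calls;

static void fsqr_stub(uint64_t *out, const uint64_t *in, uint64_t *tmp)
{
	(void)out;
	(void)in;
	(void)tmp;
	fsqr_calls++;
}

/* Assumed loop shape: one squaring, then n - 1 further squarings. */
static void fsquare_times_model(uint64_t *o, const uint64_t *inp,
				uint64_t *tmp, uint32_t n)
{
	uint32_t i;

	fsqr_stub(o, inp, tmp);
	/*
	 * i is the loop condition variable: kept at -0x2c(%rbp) in
	 * old.dis, in %r12d in new.dis. The bound n - 1 stays in memory
	 * in both versions (-0x30(%rbp) resp. -0x34(%rbp)).
	 */
	for (i = 0; i < n - 1; i++)
		fsqr_stub(o, o, tmp);
}
```

With n = 5 the stub is called five times in total, once before the
loop and four times inside it.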
Thanks,
Mathias