Development discussion of WireGuard
 help / color / mirror / Atom feed
* [PATCH 0/2] wireguard-linux-compat: grsecurity compat patches
@ 2021-07-06 13:27 Mathias Krause
  2021-07-06 13:27 ` [PATCH 1/2] compat: better grsecurity compatibility Mathias Krause
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Mathias Krause @ 2021-07-06 13:27 UTC (permalink / raw)
  To: Jason A . Donenfeld; +Cc: wireguard, Mathias Krause

Hi Jason,

this mini series enhances compatibility of the wireguard-linux-compat
repo with grsecurity. I noticed some build breakage on our 4.14 kernel
which the following patches take care of.

Please apply!

Thanks,
Mathias


Mathias Krause (2):
  compat: better grsecurity compatibility
  curve25519-x86_64: solve register constraints with reserved registers

 src/compat/compat-asm.h                        | 4 ++--
 src/compat/compat.h                            | 4 ++--
 src/crypto/zinc/curve25519/curve25519-x86_64.c | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/2] compat: better grsecurity compatibility
  2021-07-06 13:27 [PATCH 0/2] wireguard-linux-compat: grsecurity compat patches Mathias Krause
@ 2021-07-06 13:27 ` Mathias Krause
  2021-07-06 13:27 ` [PATCH 2/2] curve25519-x86_64: solve register constraints with reserved registers Mathias Krause
  2021-08-08 20:53 ` [PATCH 0/2] wireguard-linux-compat: grsecurity compat patches Jason A. Donenfeld
  2 siblings, 0 replies; 5+ messages in thread
From: Mathias Krause @ 2021-07-06 13:27 UTC (permalink / raw)
  To: Jason A . Donenfeld; +Cc: wireguard, Mathias Krause

grsecurity kernels tend to carry additional backports and changes, like
commit b60b87fc2996 ("netlink: add ethernet address policy types") or
the SYM_FUNC_* changes. RAP nowadays hooks the latter, therefore no
diversion to RAP_ENTRY is needed any more.

Instead of relying on the kernel version test, also test for the macros
we're about to define to not already be defined to account for these
additional changes in the grsecurity patch without breaking
compatibility to the older public ones.

Also test for CONFIG_PAX instead of RAP_PLUGIN for the timer API related
changes as these don't depend on the RAP plugin to be enabled but just a
PaX/grsecurity patch to be applied. While there is no preprocessor knob
for the latter, use CONFIG_PAX as this will likely be enabled in every
kernel that uses the patch.

Signed-off-by: Mathias Krause <minipli@grsecurity.net>
---
 src/compat/compat-asm.h | 4 ++--
 src/compat/compat.h     | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/compat/compat-asm.h b/src/compat/compat-asm.h
index fde21dabba4f..5bfdb9410933 100644
--- a/src/compat/compat-asm.h
+++ b/src/compat/compat-asm.h
@@ -22,7 +22,7 @@
 #endif
 
 /* PaX compatibility */
-#if defined(RAP_PLUGIN)
+#if defined(RAP_PLUGIN) && defined(RAP_ENTRY)
 #undef ENTRY
 #define ENTRY RAP_ENTRY
 #endif
@@ -51,7 +51,7 @@
 #undef pull
 #endif
 
-#if LINUX_VERSION_CODE < KERNEL_VERSION(5, 4, 76) && !defined(ISCENTOS8S)
+#if LINUX_VERSION_CODE < KERNEL_VERSION(5, 4, 76) && !defined(ISCENTOS8S) && !defined(SYM_FUNC_START)
 #define SYM_FUNC_START ENTRY
 #define SYM_FUNC_END ENDPROC
 #endif
diff --git a/src/compat/compat.h b/src/compat/compat.h
index b2041327d85c..da6912d871fa 100644
--- a/src/compat/compat.h
+++ b/src/compat/compat.h
@@ -830,7 +830,7 @@ static inline void skb_mark_not_on_list(struct sk_buff *skb)
 }
 #endif
 
-#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 20, 0) && !defined(ISRHEL8)
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 20, 0) && !defined(ISRHEL8) && !defined(NLA_POLICY_EXACT_LEN)
 #define NLA_POLICY_EXACT_LEN(_len) { .type = NLA_UNSPEC, .len = _len }
 #endif
 #if LINUX_VERSION_CODE < KERNEL_VERSION(5, 2, 0) && !defined(ISRHEL8)
@@ -1127,7 +1127,7 @@ static const struct header_ops ip_tunnel_header_ops = { .parse_protocol = ip_tun
 #undef __read_mostly
 #define __read_mostly
 #endif
-#if (defined(RAP_PLUGIN) || defined(CONFIG_CFI_CLANG)) && LINUX_VERSION_CODE < KERNEL_VERSION(4, 15, 0)
+#if (defined(CONFIG_PAX) || defined(CONFIG_CFI_CLANG)) && LINUX_VERSION_CODE < KERNEL_VERSION(4, 15, 0)
 #include <linux/timer.h>
 #define wg_expired_retransmit_handshake(a) wg_expired_retransmit_handshake(unsigned long timer)
 #define wg_expired_send_keepalive(a) wg_expired_send_keepalive(unsigned long timer)
-- 
2.20.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 2/2] curve25519-x86_64: solve register constraints with reserved registers
  2021-07-06 13:27 [PATCH 0/2] wireguard-linux-compat: grsecurity compat patches Mathias Krause
  2021-07-06 13:27 ` [PATCH 1/2] compat: better grsecurity compatibility Mathias Krause
@ 2021-07-06 13:27 ` Mathias Krause
  2021-08-08 20:53 ` [PATCH 0/2] wireguard-linux-compat: grsecurity compat patches Jason A. Donenfeld
  2 siblings, 0 replies; 5+ messages in thread
From: Mathias Krause @ 2021-07-06 13:27 UTC (permalink / raw)
  To: Jason A . Donenfeld; +Cc: wireguard, Mathias Krause

The register constraints for the inline assembly in fsqr() and fsqr2()
are pretty tight on what the compiler may assign to the remaining three
register variables. The clobber list only allows the following to be
used: RDI, RSI, RBP and R12. With RAP reserving R12 and a kernel having
CONFIG_FRAME_POINTER=y, claiming RBP, there are only two registers left
so the compiler rightfully complains about impossible constraints.

Provide alternatives that'll allow a memory reference for 'out' to solve
the allocation constraint dilemma for this configuration.

Signed-off-by: Mathias Krause <minipli@grsecurity.net>

---

Yes, the '+' constraint prefix doesn't need to be repeated for the
alternatives. In fact, it's invalid syntax to do so (see [1]). Also
"+rm" won't do the trick either, as in this case gcc still insists to
have a free register -- even if it would choose a memory operand in the
end.

[1] https://gcc.gnu.org/onlinedocs/gcc/Multi-Alternative.html#Multi-Alternative
---
 src/crypto/zinc/curve25519/curve25519-x86_64.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/crypto/zinc/curve25519/curve25519-x86_64.c b/src/crypto/zinc/curve25519/curve25519-x86_64.c
index 79716c425b0c..67f55affcf88 100644
--- a/src/crypto/zinc/curve25519/curve25519-x86_64.c
+++ b/src/crypto/zinc/curve25519/curve25519-x86_64.c
@@ -581,7 +581,7 @@ static inline void fsqr(u64 *out, const u64 *f, u64 *tmp)
 		"  cmovc %%rdx, %%rax;"
 		"  add %%rax, %%r8;"
 		"  movq %%r8, 0(%0);"
-	: "+&r" (tmp), "+&r" (f), "+&r" (out)
+	: "+&r,&r" (tmp), "+&r,&r" (f), "+&r,m" (out)
 	:
 	: "%rax", "%rcx", "%rdx", "%r8", "%r9", "%r10", "%r11", "%rbx", "%r13", "%r14", "%r15", "memory", "cc"
 	);
@@ -743,7 +743,7 @@ static inline void fsqr2(u64 *out, const u64 *f, u64 *tmp)
 		"  cmovc %%rdx, %%rax;"
 		"  add %%rax, %%r8;"
 		"  movq %%r8, 32(%0);"
-	: "+&r" (tmp), "+&r" (f), "+&r" (out)
+	: "+&r,&r" (tmp), "+&r,&r" (f), "+&r,m" (out)
 	:
 	: "%rax", "%rcx", "%rdx", "%r8", "%r9", "%r10", "%r11", "%rbx", "%r13", "%r14", "%r15", "memory", "cc"
 	);
-- 
2.20.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/2] wireguard-linux-compat: grsecurity compat patches
  2021-07-06 13:27 [PATCH 0/2] wireguard-linux-compat: grsecurity compat patches Mathias Krause
  2021-07-06 13:27 ` [PATCH 1/2] compat: better grsecurity compatibility Mathias Krause
  2021-07-06 13:27 ` [PATCH 2/2] curve25519-x86_64: solve register constraints with reserved registers Mathias Krause
@ 2021-08-08 20:53 ` Jason A. Donenfeld
  2021-08-09 10:13   ` Mathias Krause
  2 siblings, 1 reply; 5+ messages in thread
From: Jason A. Donenfeld @ 2021-08-08 20:53 UTC (permalink / raw)
  To: Mathias Krause; +Cc: WireGuard mailing list

Hi Mathias,

Sorry for the delay in reviewing these. Thanks for that. I've merged
them with a trivial change. The constraint one is interesting; it
looks like with your change and those extra constraints it winds up
spilling rdi to the stack?

Jason

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/2] wireguard-linux-compat: grsecurity compat patches
  2021-08-08 20:53 ` [PATCH 0/2] wireguard-linux-compat: grsecurity compat patches Jason A. Donenfeld
@ 2021-08-09 10:13   ` Mathias Krause
  0 siblings, 0 replies; 5+ messages in thread
From: Mathias Krause @ 2021-08-09 10:13 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: WireGuard mailing list

Hi Jason,

Am 08.08.21 um 22:53 schrieb Jason A. Donenfeld:
> Hi Mathias,
> 
> Sorry for the delay in reviewing these. Thanks for that. I've merged
> them with a trivial change.

thanks!

>                             The constraint one is interesting; it
> looks like with your change and those extra constraints it winds up
> spilling rdi to the stack?

Indeed, looking, for example, at fsquare_times() (!RAP, FRAMEPOINTER=y),
it looks like gcc prefers to choose the constraint with a memory output
argument even if it could fulfill the register allocation request, i.e.
it could allocate (all remaining) 3 registers instead of only 2 but
chooses to do the latter nonetheless. However, for me it's %rsi that's
put on the stack. Also it's not spilled, but used as a variable -- the
stack frame increases by 8 bytes and the stack slot where %rsi is
written to is used as-is for operations in the end. But, again, this
differs from the original code that used to allocate three registers for
the inline assembly. The old code has two memory loads at the end and
one compare while the new one has only one -- directly encode into the
compare instruction. So, memory-wise, a net win?

Below is the diff of the disassembly of fsquare_times(), spare of
addresses to reduce clutter with comments below each hunk:

--- old.dis	2021-08-09 11:46:53.050680456 +0200
+++ new.dis	2021-08-09 11:48:18.816851059 +0200
@@ -6,10 +6,10 @@
 	push   %r13
 	push   %r12
 	push   %rbx
-	sub    $0x18,%rsp
-	mov    %rdi,-0x38(%rbp)
-	mov    %rdx,-0x40(%rbp)
-	mov    %ecx,-0x2c(%rbp)
+	sub    $0x20,%rsp
+	mov    %rdi,-0x40(%rbp)
+	mov    %rdx,-0x48(%rbp)
+	mov    %ecx,-0x30(%rbp)
 	mov    %rdx,%r12
 	mov    (%rsi),%rdx
 	mulx   0x8(%rsi),%r8,%r14

This first hunk is really only the increased stack frame and resulting
offset changes.

@@ -92,15 +92,14 @@
 	cmovb  %rdx,%rax
 	add    %rax,%r8
 	mov    %r8,(%r12)
-	mov    -0x2c(%rbp),%eax
+	mov    -0x30(%rbp),%eax
 	sub    $0x1,%eax
-	mov    %eax,-0x30(%rbp)
-	je     1f42 <fsquare_times+0x3a2>
+	mov    %eax,-0x34(%rbp)
+	je     1f3c <fsquare_times+0x39c>
 	xor    %r12d,%r12d
-	mov    %r12d,-0x2c(%rbp)     <-- no longer needed in new code
-	mov    -0x38(%rbp),%rsi
-	mov    -0x40(%rbp),%rdi
-	mov    %rsi,%r12             <-- see below
+	mov    -0x40(%rbp),%rsi
+	mov    -0x48(%rbp),%rdi
+	mov    %rsi,-0x30(%rbp)      <-- reg -> mem operand constraint
 	mov    (%rsi),%rdx
 	mulx   0x8(%rsi),%r8,%r14
 	xor    %r15,%r15

Offset changes again. Beside the one dropped instruction there's only
one real change ("mov %rsi,%r12" -> "mov %rsi,-0x30(%rbp)"), induced by
the memory constraint of the inline assembly, switching a register
operand to a memory location.

@@ -154,7 +153,7 @@
 	adcx   %rcx,%r14
 	mov    %r14,0x38(%rdi)
 	mov    %rdi,%rsi
-	mov    %r12,%rdi
+	mov    -0x30(%rbp),%rdi
 	mov    $0x26,%rdx
 	mulx   0x20(%rsi),%r8,%r13
 	xor    %rcx,%rcx

Load of the stack slot which used to be a register operation before.

@@ -182,12 +181,10 @@
 	cmovb  %rdx,%rax
 	add    %rax,%r8
 	mov    %r8,(%rdi)
-	addl   $0x1,-0x2c(%rbp)            <-- mem operation
-	mov    -0x30(%rbp),%ebx            <-- mem load
-	mov    -0x2c(%rbp),%eax            <-- mem load
-	cmp    %ebx,%eax                   <-- reg compare
-	jne    1d83 <fsquare_times+0x1e3>
-	add    $0x18,%rsp
+	add    $0x1,%r12d                  <-- reg operation
+	cmp    -0x34(%rbp),%r12d           <-- mem compare
+	jne    1d7f <fsquare_times+0x1df>
+	add    $0x20,%rsp
 	pop    %rbx
 	pop    %r12
 	pop    %r13

We switch one memory operation (addl) to a register one and spare the
memory loads for the compare by directly encoding one of the memory
operands into the compare instruction. IMHO a net win, as gcc now can
use a register for the loop condition variable, making the conditional
jump depend only on one memory operand, not two. But only a benchmark
can tell if it's really faster or even slower.


Thanks,
Mathias

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-08-09 10:14 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-06 13:27 [PATCH 0/2] wireguard-linux-compat: grsecurity compat patches Mathias Krause
2021-07-06 13:27 ` [PATCH 1/2] compat: better grsecurity compatibility Mathias Krause
2021-07-06 13:27 ` [PATCH 2/2] curve25519-x86_64: solve register constraints with reserved registers Mathias Krause
2021-08-08 20:53 ` [PATCH 0/2] wireguard-linux-compat: grsecurity compat patches Jason A. Donenfeld
2021-08-09 10:13   ` Mathias Krause

Development discussion of WireGuard

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://inbox.vuxu.org/wireguard/0 wireguard/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 wireguard wireguard/ https://inbox.vuxu.org/wireguard \
		wireguard@lists.zx2c4.com
	public-inbox-index wireguard

Example config snippet for mirrors.
Newsgroup available over NNTP:
	nntp://inbox.vuxu.org/vuxu.archive.wireguard


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git