Re: wireguard: unknown relocation: 102 [ARMv7 Thumb-2]

Development discussion of WireGuard
 help / color / mirror / Atom feed

* Re: wireguard: unknown relocation: 102 [ARMv7 Thumb-2]
       [not found]             ` <CAHmME9p51XvLEZ7QbDreEXym34S4XZZaRotAv4aRiT5D4Pz3XA@mail.gmail.com>
@ 2020-06-17 20:45               ` Jason A. Donenfeld
  2020-06-17 20:54                 ` Jason A. Donenfeld
  0 siblings, 1 reply; 7+ messages in thread
From: Jason A. Donenfeld @ 2020-06-17 20:45 UTC (permalink / raw)
  To: Rui Salvaterra; +Cc: OpenWrt Development List, wireguard

On Wed, Jun 17, 2020 at 02:33:49PM -0600, Jason A. Donenfeld wrote:
> So, some more research: it looks like the R_ARM_THM_JUMP11 symbol is
> actually wg_packet_send_staged_packets, a boring C function with
> nothing fancy about it. That github issue you pointed to suggested
> that it might have something to do with complex crypto functions, but
> it looks like that's not the case. wg_packet_send_staged_packets is
> plain old boring C.
> 
> But there is one interesting thing about
> wg_packet_send_staged_packets: it's defined in send.c, and called from
> send.c, receive.c, device.c, and netlink.c -- four places. What I
> suspect is happening is that the linker can't quite figure out how to
> order the functions in the final executable so that the
> wg_packet_send_staged_packets definition is sufficiently close to all
> of its call sites, so it then needs to add that extra trampoline
> midway to get to it. Stupid linker. I'm playing now if there's some
> manual reordering I can do in the build system so that this isn't a
> problem, but I'm not very optimistic that I'll succeed.

Looks like my explanation there wasn't 100% accurate, but it does seem
like the issue occurs when gcc sees a clear tail call that it can
optimize into a B instruction instead of a BL instruction.

The below patch avoids that, and thus fixes your issue, using a pretty
bad trick that's not really suitable for being committed anywhere, but
it is perhaps leading us in the right direction:

diff --git a/src/send.c b/src/send.c
index 828b086a..4bb6911f 100644
--- a/src/send.c
+++ b/src/send.c
@@ -221,6 +221,8 @@ static bool encrypt_packet(struct sk_buff *skb, struct noise_keypair *keypair,
     simd_context);
 }
 
+volatile char dummy;
+
 void wg_packet_send_keepalive(struct wg_peer *peer)
 {
  struct sk_buff *skb;
@@ -240,6 +242,7 @@ void wg_packet_send_keepalive(struct wg_peer *peer)
  }
 
  wg_packet_send_staged_packets(peer);
+ dummy = -1;
 }
 
 static void wg_packet_create_data_done(struct sk_buff *first,

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: wireguard: unknown relocation: 102 [ARMv7 Thumb-2]
  2020-06-17 20:45               ` wireguard: unknown relocation: 102 [ARMv7 Thumb-2] Jason A. Donenfeld
@ 2020-06-17 20:54                 ` Jason A. Donenfeld
  2020-06-17 21:02                   ` Any progress on R_ARM_THM_JUMP11 issues? Jason A. Donenfeld
  0 siblings, 1 reply; 7+ messages in thread
From: Jason A. Donenfeld @ 2020-06-17 20:54 UTC (permalink / raw)
  To: Rui Salvaterra; +Cc: OpenWrt Development List, wireguard

On Wed, Jun 17, 2020 at 02:45:12PM -0600, Jason A. Donenfeld wrote:
> Looks like my explanation there wasn't 100% accurate, but it does seem
> like the issue occurs when gcc sees a clear tail call that it can
> optimize into a B instruction instead of a BL instruction.
> 
> The below patch avoids that, and thus fixes your issue, using a pretty
> bad trick that's not really suitable for being committed anywhere, but
> it is perhaps leading us in the right direction:
> 
> diff --git a/src/send.c b/src/send.c
> index 828b086a..4bb6911f 100644
> --- a/src/send.c
> +++ b/src/send.c
> @@ -221,6 +221,8 @@ static bool encrypt_packet(struct sk_buff *skb, struct noise_keypair *keypair,
>      simd_context);
>  }
>  
> +volatile char dummy;
> +
>  void wg_packet_send_keepalive(struct wg_peer *peer)
>  {
>   struct sk_buff *skb;
> @@ -240,6 +242,7 @@ void wg_packet_send_keepalive(struct wg_peer *peer)
>   }
>  
>   wg_packet_send_staged_packets(peer);
> + dummy = -1;
>  }
>  
>  static void wg_packet_create_data_done(struct sk_buff *first,

A better fix with more explanation: it looks like the issue doesn't have
to do with the multifile thing I pointed out before, but just that gcc
sees it can optimize the tail call into a B instruction, which seems to
have a ±2KB range, whereas BL has a ±4MB range. The solution is to just
move the location of the function in that file to be closer to the
destination of the tail call. I'm not a big fan of that and I'm slightly
worried davem will nack it because it makes backporting harder for a
fairly speculative gain (at least, I haven't yet taken measurements,
though I suppose I could). There's also the question of - why are we
doing goofy reordering things to the code to work around a toolchain
bug? Shouldn't we fix the toolchain? So, I'll keep thinking...

diff --git a/src/send.c b/src/send.c
index 828b086a..f44aff8d 100644
--- a/src/send.c
+++ b/src/send.c
@@ -221,27 +221,6 @@ static bool encrypt_packet(struct sk_buff *skb, struct noise_keypair *keypair,
 						   simd_context);
 }

-void wg_packet_send_keepalive(struct wg_peer *peer)
-{
-	struct sk_buff *skb;
-
-	if (skb_queue_empty(&peer->staged_packet_queue)) {
-		skb = alloc_skb(DATA_PACKET_HEAD_ROOM + MESSAGE_MINIMUM_LENGTH,
-				GFP_ATOMIC);
-		if (unlikely(!skb))
-			return;
-		skb_reserve(skb, DATA_PACKET_HEAD_ROOM);
-		skb->dev = peer->device->dev;
-		PACKET_CB(skb)->mtu = skb->dev->mtu;
-		skb_queue_tail(&peer->staged_packet_queue, skb);
-		net_dbg_ratelimited("%s: Sending keepalive packet to peer %llu (%pISpfsc)\n",
-				    peer->device->dev->name, peer->internal_id,
-				    &peer->endpoint.addr);
-	}
-
-	wg_packet_send_staged_packets(peer);
-}
-
 static void wg_packet_create_data_done(struct sk_buff *first,
 				       struct wg_peer *peer)
 {
@@ -346,6 +325,27 @@ err:
 	kfree_skb_list(first);
 }

+void wg_packet_send_keepalive(struct wg_peer *peer)
+{
+	struct sk_buff *skb;
+
+	if (skb_queue_empty(&peer->staged_packet_queue)) {
+		skb = alloc_skb(DATA_PACKET_HEAD_ROOM + MESSAGE_MINIMUM_LENGTH,
+				GFP_ATOMIC);
+		if (unlikely(!skb))
+			return;
+		skb_reserve(skb, DATA_PACKET_HEAD_ROOM);
+		skb->dev = peer->device->dev;
+		PACKET_CB(skb)->mtu = skb->dev->mtu;
+		skb_queue_tail(&peer->staged_packet_queue, skb);
+		net_dbg_ratelimited("%s: Sending keepalive packet to peer %llu (%pISpfsc)\n",
+				    peer->device->dev->name, peer->internal_id,
+				    &peer->endpoint.addr);
+	}
+
+	wg_packet_send_staged_packets(peer);
+}
+
 void wg_packet_purge_staged_packets(struct wg_peer *peer)
 {
 	spin_lock_bh(&peer->staged_packet_queue.lock);


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Any progress on R_ARM_THM_JUMP11 issues?
  2020-06-17 20:54                 ` Jason A. Donenfeld
@ 2020-06-17 21:02                   ` Jason A. Donenfeld
  2020-06-17 22:18                     ` Rui Salvaterra
  0 siblings, 1 reply; 7+ messages in thread
From: Jason A. Donenfeld @ 2020-06-17 21:02 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: OpenWrt Development List, WireGuard mailing list, Rui Salvaterra

Hi ARM folks,

Rui emailed the OpenWRT list and me about an issue he found when
compiling WireGuard. He was compiling kernels with
CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11=n -- which I'm well aware the
Kconfig advices people not to do -- and got the dreaded "unknown
relocation 102" error when trying to load the module. I figured out
that I could "fix" it in the WireGuard code by either doing some extra
stuff after the tail call, so that the B becomes a BL, or by moving
the destination of the tail call a bit closer to the callsite, so that
THUMB2's jump distance is shorter and fits within the B's limitations,
thereby not needing the "JUMP11" relocation.

Obviously reordering code for this reason isn't going to fly with
upstream patches, nor would adding dummy code to avoid a tail call.
And there's already CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11=y which seems
like the right global solution for this.

But I am wondering: has anybody heard about toolchain progress toward
fixing this? Couldn't the compiler reorder functions itself more
intelligently? Or avoid emitting the B in the case that the jump will
be too far? Or does nobody care much about 32-bit ARM these days so
it's just fallen by the wayside and
CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11=y is the best we've got? Or
something else?

Jason

On Wed, Jun 17, 2020 at 2:54 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> On Wed, Jun 17, 2020 at 02:45:12PM -0600, Jason A. Donenfeld wrote:
> > Looks like my explanation there wasn't 100% accurate, but it does seem
> > like the issue occurs when gcc sees a clear tail call that it can
> > optimize into a B instruction instead of a BL instruction.
> >
> > The below patch avoids that, and thus fixes your issue, using a pretty
> > bad trick that's not really suitable for being committed anywhere, but
> > it is perhaps leading us in the right direction:
> >
> > diff --git a/src/send.c b/src/send.c
> > index 828b086a..4bb6911f 100644
> > --- a/src/send.c
> > +++ b/src/send.c
> > @@ -221,6 +221,8 @@ static bool encrypt_packet(struct sk_buff *skb, struct noise_keypair *keypair,
> >      simd_context);
> >  }
> >
> > +volatile char dummy;
> > +
> >  void wg_packet_send_keepalive(struct wg_peer *peer)
> >  {
> >   struct sk_buff *skb;
> > @@ -240,6 +242,7 @@ void wg_packet_send_keepalive(struct wg_peer *peer)
> >   }
> >
> >   wg_packet_send_staged_packets(peer);
> > + dummy = -1;
> >  }
> >
> >  static void wg_packet_create_data_done(struct sk_buff *first,
>
> A better fix with more explanation: it looks like the issue doesn't have
> to do with the multifile thing I pointed out before, but just that gcc
> sees it can optimize the tail call into a B instruction, which seems to
> have a ±2KB range, whereas BL has a ±4MB range. The solution is to just
> move the location of the function in that file to be closer to the
> destination of the tail call. I'm not a big fan of that and I'm slightly
> worried davem will nack it because it makes backporting harder for a
> fairly speculative gain (at least, I haven't yet taken measurements,
> though I suppose I could). There's also the question of - why are we
> doing goofy reordering things to the code to work around a toolchain
> bug? Shouldn't we fix the toolchain? So, I'll keep thinking...
>
> diff --git a/src/send.c b/src/send.c
> index 828b086a..f44aff8d 100644
> --- a/src/send.c
> +++ b/src/send.c
> @@ -221,27 +221,6 @@ static bool encrypt_packet(struct sk_buff *skb, struct noise_keypair *keypair,
>                                                    simd_context);
>  }
>
> -void wg_packet_send_keepalive(struct wg_peer *peer)
> -{
> -       struct sk_buff *skb;
> -
> -       if (skb_queue_empty(&peer->staged_packet_queue)) {
> -               skb = alloc_skb(DATA_PACKET_HEAD_ROOM + MESSAGE_MINIMUM_LENGTH,
> -                               GFP_ATOMIC);
> -               if (unlikely(!skb))
> -                       return;
> -               skb_reserve(skb, DATA_PACKET_HEAD_ROOM);
> -               skb->dev = peer->device->dev;
> -               PACKET_CB(skb)->mtu = skb->dev->mtu;
> -               skb_queue_tail(&peer->staged_packet_queue, skb);
> -               net_dbg_ratelimited("%s: Sending keepalive packet to peer %llu (%pISpfsc)\n",
> -                                   peer->device->dev->name, peer->internal_id,
> -                                   &peer->endpoint.addr);
> -       }
> -
> -       wg_packet_send_staged_packets(peer);
> -}
> -
>  static void wg_packet_create_data_done(struct sk_buff *first,
>                                        struct wg_peer *peer)
>  {
> @@ -346,6 +325,27 @@ err:
>         kfree_skb_list(first);
>  }
>
> +void wg_packet_send_keepalive(struct wg_peer *peer)
> +{
> +       struct sk_buff *skb;
> +
> +       if (skb_queue_empty(&peer->staged_packet_queue)) {
> +               skb = alloc_skb(DATA_PACKET_HEAD_ROOM + MESSAGE_MINIMUM_LENGTH,
> +                               GFP_ATOMIC);
> +               if (unlikely(!skb))
> +                       return;
> +               skb_reserve(skb, DATA_PACKET_HEAD_ROOM);
> +               skb->dev = peer->device->dev;
> +               PACKET_CB(skb)->mtu = skb->dev->mtu;
> +               skb_queue_tail(&peer->staged_packet_queue, skb);
> +               net_dbg_ratelimited("%s: Sending keepalive packet to peer %llu (%pISpfsc)\n",
> +                                   peer->device->dev->name, peer->internal_id,
> +                                   &peer->endpoint.addr);
> +       }
> +
> +       wg_packet_send_staged_packets(peer);
> +}
> +
>  void wg_packet_purge_staged_packets(struct wg_peer *peer)
>  {
>         spin_lock_bh(&peer->staged_packet_queue.lock);
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Any progress on R_ARM_THM_JUMP11 issues?
  2020-06-17 21:02                   ` Any progress on R_ARM_THM_JUMP11 issues? Jason A. Donenfeld
@ 2020-06-17 22:18                     ` Rui Salvaterra
  2020-06-18 23:58                       ` Jason A. Donenfeld
  0 siblings, 1 reply; 7+ messages in thread
From: Rui Salvaterra @ 2020-06-17 22:18 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: linux-arm-kernel, OpenWrt Development List, WireGuard mailing list

Hi again, Jason,

[Adding a bit of extra context for linux-arm-kernel.]

On Wed, 17 Jun 2020 at 22:02, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> But I am wondering: has anybody heard about toolchain progress toward
> fixing this? Couldn't the compiler reorder functions itself more
> intelligently? Or avoid emitting the B in the case that the jump will
> be too far? Or does nobody care much about 32-bit ARM these days so
> it's just fallen by the wayside and
> CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11=y is the best we've got? Or
> something else?

The thing is, CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11=y implies
-fno-optimize-sibling-calls, which seems like a big hammer to work
around a toolchain bug.
Now, this bug has been reported in Linaro binutils as early as
February 2011 [1] and the upstream bug has been deemed fixed in
binutils 2.22 [2], two months later. I usually don't build modular
kernels, but in OpenWrt we don't have a choice due to the compat
backports of wireless drivers. What strikes me as odd is the fact that
without CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11, all kernel modules load
and run just fine, except for WireGuard. Anyway, I completely agree
that if it's a toolchain bug, the toolchain must be fixed.

Out of curiosity, I also compared the vmlinux sizes in both modes
(OpenWrt kernel, Linux 5.4.46 with my Turris Omnia configuration, gcc
9.3.0 and binutils 2.34).

Pure ARM:
24243392 bytes

Thumb-2 (with CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11=y):
22102716 bytes

A 2 MiB smaller code footprint is nothing to sneeze at.

[1] https://bugs.launchpad.net/binutils-linaro/+bug/725126
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=12532

Cheers,
Rui

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: wireguard: unknown relocation: 102 [ARMv7 Thumb-2]
       [not found]           ` <CAHmME9qWrBTCsBr7s6oLD0zuBMzZUD2OV3s-tgDwV0W7bb9Utw@mail.gmail.com>
       [not found]             ` <CAHmME9p51XvLEZ7QbDreEXym34S4XZZaRotAv4aRiT5D4Pz3XA@mail.gmail.com>
@ 2020-06-18 23:50             ` Jason A. Donenfeld
  2020-06-19  8:26               ` Rui Salvaterra
  1 sibling, 1 reply; 7+ messages in thread
From: Jason A. Donenfeld @ 2020-06-18 23:50 UTC (permalink / raw)
  To: Rui Salvaterra; +Cc: OpenWrt Development List, WireGuard mailing list

Hey Rui,

I fixed it! It turned out to be caused by -fvisibility=hidden undoing
the effect of the binutils fix from a while back. Here's the patch
that makes the problem go away:

https://git.zx2c4.com/wireguard-linux-compat/commit/?id=178cdfffb99f2fd6fb4a5bfd2f9319461d93f53b

This will be in the next compat release.

Jason

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Any progress on R_ARM_THM_JUMP11 issues?
  2020-06-17 22:18                     ` Rui Salvaterra
@ 2020-06-18 23:58                       ` Jason A. Donenfeld
  0 siblings, 0 replies; 7+ messages in thread
From: Jason A. Donenfeld @ 2020-06-18 23:58 UTC (permalink / raw)
  To: Rui Salvaterra
  Cc: linux-arm-kernel, OpenWrt Development List, WireGuard mailing list

Looks as though in the end this is a binutils bug with
-fvisibility=hidden. Details on
https://sourceware.org/bugzilla/show_bug.cgi?id=12532#c9

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: wireguard: unknown relocation: 102 [ARMv7 Thumb-2]
  2020-06-18 23:50             ` wireguard: unknown relocation: 102 [ARMv7 Thumb-2] Jason A. Donenfeld
@ 2020-06-19  8:26               ` Rui Salvaterra
  0 siblings, 0 replies; 7+ messages in thread
From: Rui Salvaterra @ 2020-06-19  8:26 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: OpenWrt Development List, WireGuard mailing list

Good morning, Jason!

On Fri, 19 Jun 2020 at 00:50, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Hey Rui,
>
> I fixed it! It turned out to be caused by -fvisibility=hidden undoing
> the effect of the binutils fix from a while back. Here's the patch
> that makes the problem go away:
>
> https://git.zx2c4.com/wireguard-linux-compat/commit/?id=178cdfffb99f2fd6fb4a5bfd2f9319461d93f53b
>
> This will be in the next compat release.
>
> Jason

Great detective work there too! :) I do have to wonder if this is
really a binutils/gas bug, though. From what I could gather, it's only
the kernel module loader which can't (and won't, I remember reading
somewhere they don't make sense for the kernel) resolve
R_ARM_THM_JUMP11 relocations, and using -fvisibility=hidden in a
kernel module seems to send the linker a conflicting message. Anyway,
I'd still open that bug, at least to get a definitive answer. ;)

Thanks a lot,
Rui

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-06-19  8:27 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CALjTZvbpu1Lw0j9dtXZPmVS+i-OnopUo+zuqtoQLnABQGw-SqQ@mail.gmail.com>
     [not found] ` <CAHmME9r3nPwmUoYYrj0PnUStd1ACSmdFAO4Qv2cZtmiLspOW1g@mail.gmail.com>
     [not found]   ` <CALjTZvbtjVwpyV+AMX4htssTbwTHV45mQeokUr952D_GbtFPvw@mail.gmail.com>
     [not found]     ` <CALjTZvZRerzWqaqhY2U=m44n5taLEsY99uEt2=ZNCe27=LYbLA@mail.gmail.com>
     [not found]       ` <CAHmME9otC1mOqR2tLB55BVQQpNPvCMUGa1E4jfMYYXNp6_31BA@mail.gmail.com>
     [not found]         ` <CALjTZvZ4wqZZ7_Fk-YHaxT9uuWnS4n9dLm4ZXSy1UM3riv+NuQ@mail.gmail.com>
     [not found]           ` <CAHmME9qWrBTCsBr7s6oLD0zuBMzZUD2OV3s-tgDwV0W7bb9Utw@mail.gmail.com>
     [not found]             ` <CAHmME9p51XvLEZ7QbDreEXym34S4XZZaRotAv4aRiT5D4Pz3XA@mail.gmail.com>
2020-06-17 20:45               ` wireguard: unknown relocation: 102 [ARMv7 Thumb-2] Jason A. Donenfeld
2020-06-17 20:54                 ` Jason A. Donenfeld
2020-06-17 21:02                   ` Any progress on R_ARM_THM_JUMP11 issues? Jason A. Donenfeld
2020-06-17 22:18                     ` Rui Salvaterra
2020-06-18 23:58                       ` Jason A. Donenfeld
2020-06-18 23:50             ` wireguard: unknown relocation: 102 [ARMv7 Thumb-2] Jason A. Donenfeld
2020-06-19  8:26               ` Rui Salvaterra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).