* Re: wireguard: unknown relocation: 102 [ARMv7 Thumb-2] [not found] ` <CAHmME9p51XvLEZ7QbDreEXym34S4XZZaRotAv4aRiT5D4Pz3XA@mail.gmail.com> @ 2020-06-17 20:45 ` Jason A. Donenfeld 2020-06-17 20:54 ` Jason A. Donenfeld 0 siblings, 1 reply; 7+ messages in thread From: Jason A. Donenfeld @ 2020-06-17 20:45 UTC (permalink / raw) To: Rui Salvaterra; +Cc: OpenWrt Development List, wireguard On Wed, Jun 17, 2020 at 02:33:49PM -0600, Jason A. Donenfeld wrote: > So, some more research: it looks like the R_ARM_THM_JUMP11 symbol is > actually wg_packet_send_staged_packets, a boring C function with > nothing fancy about it. That github issue you pointed to suggested > that it might have something to do with complex crypto functions, but > it looks like that's not the case. wg_packet_send_staged_packets is > plain old boring C. > > But there is one interesting thing about > wg_packet_send_staged_packets: it's defined in send.c, and called from > send.c, receive.c, device.c, and netlink.c -- four places. What I > suspect is happening is that the linker can't quite figure out how to > order the functions in the final executable so that the > wg_packet_send_staged_packets definition is sufficiently close to all > of its call sites, so it then needs to add that extra trampoline > midway to get to it. Stupid linker. I'm playing now if there's some > manual reordering I can do in the build system so that this isn't a > problem, but I'm not very optimistic that I'll succeed. Looks like my explanation there wasn't 100% accurate, but it does seem like the issue occurs when gcc sees a clear tail call that it can optimize into a B instruction instead of a BL instruction. The below patch avoids that, and thus fixes your issue, using a pretty bad trick that's not really suitable for being committed anywhere, but it is perhaps leading us in the right direction: diff --git a/src/send.c b/src/send.c index 828b086a..4bb6911f 100644 --- a/src/send.c +++ b/src/send.c @@ -221,6 +221,8 @@ static bool encrypt_packet(struct sk_buff *skb, struct noise_keypair *keypair, simd_context); } +volatile char dummy; + void wg_packet_send_keepalive(struct wg_peer *peer) { struct sk_buff *skb; @@ -240,6 +242,7 @@ void wg_packet_send_keepalive(struct wg_peer *peer) } wg_packet_send_staged_packets(peer); + dummy = -1; } static void wg_packet_create_data_done(struct sk_buff *first, ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: wireguard: unknown relocation: 102 [ARMv7 Thumb-2] 2020-06-17 20:45 ` wireguard: unknown relocation: 102 [ARMv7 Thumb-2] Jason A. Donenfeld @ 2020-06-17 20:54 ` Jason A. Donenfeld 2020-06-17 21:02 ` Any progress on R_ARM_THM_JUMP11 issues? Jason A. Donenfeld 0 siblings, 1 reply; 7+ messages in thread From: Jason A. Donenfeld @ 2020-06-17 20:54 UTC (permalink / raw) To: Rui Salvaterra; +Cc: OpenWrt Development List, wireguard On Wed, Jun 17, 2020 at 02:45:12PM -0600, Jason A. Donenfeld wrote: > Looks like my explanation there wasn't 100% accurate, but it does seem > like the issue occurs when gcc sees a clear tail call that it can > optimize into a B instruction instead of a BL instruction. > > The below patch avoids that, and thus fixes your issue, using a pretty > bad trick that's not really suitable for being committed anywhere, but > it is perhaps leading us in the right direction: > > diff --git a/src/send.c b/src/send.c > index 828b086a..4bb6911f 100644 > --- a/src/send.c > +++ b/src/send.c > @@ -221,6 +221,8 @@ static bool encrypt_packet(struct sk_buff *skb, struct noise_keypair *keypair, > simd_context); > } > > +volatile char dummy; > + > void wg_packet_send_keepalive(struct wg_peer *peer) > { > struct sk_buff *skb; > @@ -240,6 +242,7 @@ void wg_packet_send_keepalive(struct wg_peer *peer) > } > > wg_packet_send_staged_packets(peer); > + dummy = -1; > } > > static void wg_packet_create_data_done(struct sk_buff *first, A better fix with more explanation: it looks like the issue doesn't have to do with the multifile thing I pointed out before, but just that gcc sees it can optimize the tail call into a B instruction, which seems to have a ±2KB range, whereas BL has a ±4MB range. The solution is to just move the location of the function in that file to be closer to the destination of the tail call. I'm not a big fan of that and I'm slightly worried davem will nack it because it makes backporting harder for a fairly speculative gain (at least, I haven't yet taken measurements, though I suppose I could). There's also the question of - why are we doing goofy reordering things to the code to work around a toolchain bug? Shouldn't we fix the toolchain? So, I'll keep thinking... diff --git a/src/send.c b/src/send.c index 828b086a..f44aff8d 100644 --- a/src/send.c +++ b/src/send.c @@ -221,27 +221,6 @@ static bool encrypt_packet(struct sk_buff *skb, struct noise_keypair *keypair, simd_context); } -void wg_packet_send_keepalive(struct wg_peer *peer) -{ - struct sk_buff *skb; - - if (skb_queue_empty(&peer->staged_packet_queue)) { - skb = alloc_skb(DATA_PACKET_HEAD_ROOM + MESSAGE_MINIMUM_LENGTH, - GFP_ATOMIC); - if (unlikely(!skb)) - return; - skb_reserve(skb, DATA_PACKET_HEAD_ROOM); - skb->dev = peer->device->dev; - PACKET_CB(skb)->mtu = skb->dev->mtu; - skb_queue_tail(&peer->staged_packet_queue, skb); - net_dbg_ratelimited("%s: Sending keepalive packet to peer %llu (%pISpfsc)\n", - peer->device->dev->name, peer->internal_id, - &peer->endpoint.addr); - } - - wg_packet_send_staged_packets(peer); -} - static void wg_packet_create_data_done(struct sk_buff *first, struct wg_peer *peer) { @@ -346,6 +325,27 @@ err: kfree_skb_list(first); } +void wg_packet_send_keepalive(struct wg_peer *peer) +{ + struct sk_buff *skb; + + if (skb_queue_empty(&peer->staged_packet_queue)) { + skb = alloc_skb(DATA_PACKET_HEAD_ROOM + MESSAGE_MINIMUM_LENGTH, + GFP_ATOMIC); + if (unlikely(!skb)) + return; + skb_reserve(skb, DATA_PACKET_HEAD_ROOM); + skb->dev = peer->device->dev; + PACKET_CB(skb)->mtu = skb->dev->mtu; + skb_queue_tail(&peer->staged_packet_queue, skb); + net_dbg_ratelimited("%s: Sending keepalive packet to peer %llu (%pISpfsc)\n", + peer->device->dev->name, peer->internal_id, + &peer->endpoint.addr); + } + + wg_packet_send_staged_packets(peer); +} + void wg_packet_purge_staged_packets(struct wg_peer *peer) { spin_lock_bh(&peer->staged_packet_queue.lock); ^ permalink raw reply [flat|nested] 7+ messages in thread
* Any progress on R_ARM_THM_JUMP11 issues? 2020-06-17 20:54 ` Jason A. Donenfeld @ 2020-06-17 21:02 ` Jason A. Donenfeld 2020-06-17 22:18 ` Rui Salvaterra 0 siblings, 1 reply; 7+ messages in thread From: Jason A. Donenfeld @ 2020-06-17 21:02 UTC (permalink / raw) To: linux-arm-kernel Cc: OpenWrt Development List, WireGuard mailing list, Rui Salvaterra Hi ARM folks, Rui emailed the OpenWRT list and me about an issue he found when compiling WireGuard. He was compiling kernels with CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11=n -- which I'm well aware the Kconfig advices people not to do -- and got the dreaded "unknown relocation 102" error when trying to load the module. I figured out that I could "fix" it in the WireGuard code by either doing some extra stuff after the tail call, so that the B becomes a BL, or by moving the destination of the tail call a bit closer to the callsite, so that THUMB2's jump distance is shorter and fits within the B's limitations, thereby not needing the "JUMP11" relocation. Obviously reordering code for this reason isn't going to fly with upstream patches, nor would adding dummy code to avoid a tail call. And there's already CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11=y which seems like the right global solution for this. But I am wondering: has anybody heard about toolchain progress toward fixing this? Couldn't the compiler reorder functions itself more intelligently? Or avoid emitting the B in the case that the jump will be too far? Or does nobody care much about 32-bit ARM these days so it's just fallen by the wayside and CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11=y is the best we've got? Or something else? Jason On Wed, Jun 17, 2020 at 2:54 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > On Wed, Jun 17, 2020 at 02:45:12PM -0600, Jason A. Donenfeld wrote: > > Looks like my explanation there wasn't 100% accurate, but it does seem > > like the issue occurs when gcc sees a clear tail call that it can > > optimize into a B instruction instead of a BL instruction. > > > > The below patch avoids that, and thus fixes your issue, using a pretty > > bad trick that's not really suitable for being committed anywhere, but > > it is perhaps leading us in the right direction: > > > > diff --git a/src/send.c b/src/send.c > > index 828b086a..4bb6911f 100644 > > --- a/src/send.c > > +++ b/src/send.c > > @@ -221,6 +221,8 @@ static bool encrypt_packet(struct sk_buff *skb, struct noise_keypair *keypair, > > simd_context); > > } > > > > +volatile char dummy; > > + > > void wg_packet_send_keepalive(struct wg_peer *peer) > > { > > struct sk_buff *skb; > > @@ -240,6 +242,7 @@ void wg_packet_send_keepalive(struct wg_peer *peer) > > } > > > > wg_packet_send_staged_packets(peer); > > + dummy = -1; > > } > > > > static void wg_packet_create_data_done(struct sk_buff *first, > > A better fix with more explanation: it looks like the issue doesn't have > to do with the multifile thing I pointed out before, but just that gcc > sees it can optimize the tail call into a B instruction, which seems to > have a ±2KB range, whereas BL has a ±4MB range. The solution is to just > move the location of the function in that file to be closer to the > destination of the tail call. I'm not a big fan of that and I'm slightly > worried davem will nack it because it makes backporting harder for a > fairly speculative gain (at least, I haven't yet taken measurements, > though I suppose I could). There's also the question of - why are we > doing goofy reordering things to the code to work around a toolchain > bug? Shouldn't we fix the toolchain? So, I'll keep thinking... > > diff --git a/src/send.c b/src/send.c > index 828b086a..f44aff8d 100644 > --- a/src/send.c > +++ b/src/send.c > @@ -221,27 +221,6 @@ static bool encrypt_packet(struct sk_buff *skb, struct noise_keypair *keypair, > simd_context); > } > > -void wg_packet_send_keepalive(struct wg_peer *peer) > -{ > - struct sk_buff *skb; > - > - if (skb_queue_empty(&peer->staged_packet_queue)) { > - skb = alloc_skb(DATA_PACKET_HEAD_ROOM + MESSAGE_MINIMUM_LENGTH, > - GFP_ATOMIC); > - if (unlikely(!skb)) > - return; > - skb_reserve(skb, DATA_PACKET_HEAD_ROOM); > - skb->dev = peer->device->dev; > - PACKET_CB(skb)->mtu = skb->dev->mtu; > - skb_queue_tail(&peer->staged_packet_queue, skb); > - net_dbg_ratelimited("%s: Sending keepalive packet to peer %llu (%pISpfsc)\n", > - peer->device->dev->name, peer->internal_id, > - &peer->endpoint.addr); > - } > - > - wg_packet_send_staged_packets(peer); > -} > - > static void wg_packet_create_data_done(struct sk_buff *first, > struct wg_peer *peer) > { > @@ -346,6 +325,27 @@ err: > kfree_skb_list(first); > } > > +void wg_packet_send_keepalive(struct wg_peer *peer) > +{ > + struct sk_buff *skb; > + > + if (skb_queue_empty(&peer->staged_packet_queue)) { > + skb = alloc_skb(DATA_PACKET_HEAD_ROOM + MESSAGE_MINIMUM_LENGTH, > + GFP_ATOMIC); > + if (unlikely(!skb)) > + return; > + skb_reserve(skb, DATA_PACKET_HEAD_ROOM); > + skb->dev = peer->device->dev; > + PACKET_CB(skb)->mtu = skb->dev->mtu; > + skb_queue_tail(&peer->staged_packet_queue, skb); > + net_dbg_ratelimited("%s: Sending keepalive packet to peer %llu (%pISpfsc)\n", > + peer->device->dev->name, peer->internal_id, > + &peer->endpoint.addr); > + } > + > + wg_packet_send_staged_packets(peer); > +} > + > void wg_packet_purge_staged_packets(struct wg_peer *peer) > { > spin_lock_bh(&peer->staged_packet_queue.lock); > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Any progress on R_ARM_THM_JUMP11 issues? 2020-06-17 21:02 ` Any progress on R_ARM_THM_JUMP11 issues? Jason A. Donenfeld @ 2020-06-17 22:18 ` Rui Salvaterra 2020-06-18 23:58 ` Jason A. Donenfeld 0 siblings, 1 reply; 7+ messages in thread From: Rui Salvaterra @ 2020-06-17 22:18 UTC (permalink / raw) To: Jason A. Donenfeld Cc: linux-arm-kernel, OpenWrt Development List, WireGuard mailing list Hi again, Jason, [Adding a bit of extra context for linux-arm-kernel.] On Wed, 17 Jun 2020 at 22:02, Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > But I am wondering: has anybody heard about toolchain progress toward > fixing this? Couldn't the compiler reorder functions itself more > intelligently? Or avoid emitting the B in the case that the jump will > be too far? Or does nobody care much about 32-bit ARM these days so > it's just fallen by the wayside and > CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11=y is the best we've got? Or > something else? The thing is, CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11=y implies -fno-optimize-sibling-calls, which seems like a big hammer to work around a toolchain bug. Now, this bug has been reported in Linaro binutils as early as February 2011 [1] and the upstream bug has been deemed fixed in binutils 2.22 [2], two months later. I usually don't build modular kernels, but in OpenWrt we don't have a choice due to the compat backports of wireless drivers. What strikes me as odd is the fact that without CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11, all kernel modules load and run just fine, except for WireGuard. Anyway, I completely agree that if it's a toolchain bug, the toolchain must be fixed. Out of curiosity, I also compared the vmlinux sizes in both modes (OpenWrt kernel, Linux 5.4.46 with my Turris Omnia configuration, gcc 9.3.0 and binutils 2.34). Pure ARM: 24243392 bytes Thumb-2 (with CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11=y): 22102716 bytes A 2 MiB smaller code footprint is nothing to sneeze at. [1] https://bugs.launchpad.net/binutils-linaro/+bug/725126 [2] https://sourceware.org/bugzilla/show_bug.cgi?id=12532 Cheers, Rui ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Any progress on R_ARM_THM_JUMP11 issues? 2020-06-17 22:18 ` Rui Salvaterra @ 2020-06-18 23:58 ` Jason A. Donenfeld 0 siblings, 0 replies; 7+ messages in thread From: Jason A. Donenfeld @ 2020-06-18 23:58 UTC (permalink / raw) To: Rui Salvaterra Cc: linux-arm-kernel, OpenWrt Development List, WireGuard mailing list Looks as though in the end this is a binutils bug with -fvisibility=hidden. Details on https://sourceware.org/bugzilla/show_bug.cgi?id=12532#c9 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: wireguard: unknown relocation: 102 [ARMv7 Thumb-2] [not found] ` <CAHmME9qWrBTCsBr7s6oLD0zuBMzZUD2OV3s-tgDwV0W7bb9Utw@mail.gmail.com> [not found] ` <CAHmME9p51XvLEZ7QbDreEXym34S4XZZaRotAv4aRiT5D4Pz3XA@mail.gmail.com> @ 2020-06-18 23:50 ` Jason A. Donenfeld 2020-06-19 8:26 ` Rui Salvaterra 1 sibling, 1 reply; 7+ messages in thread From: Jason A. Donenfeld @ 2020-06-18 23:50 UTC (permalink / raw) To: Rui Salvaterra; +Cc: OpenWrt Development List, WireGuard mailing list Hey Rui, I fixed it! It turned out to be caused by -fvisibility=hidden undoing the effect of the binutils fix from a while back. Here's the patch that makes the problem go away: https://git.zx2c4.com/wireguard-linux-compat/commit/?id=178cdfffb99f2fd6fb4a5bfd2f9319461d93f53b This will be in the next compat release. Jason ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: wireguard: unknown relocation: 102 [ARMv7 Thumb-2] 2020-06-18 23:50 ` wireguard: unknown relocation: 102 [ARMv7 Thumb-2] Jason A. Donenfeld @ 2020-06-19 8:26 ` Rui Salvaterra 0 siblings, 0 replies; 7+ messages in thread From: Rui Salvaterra @ 2020-06-19 8:26 UTC (permalink / raw) To: Jason A. Donenfeld; +Cc: OpenWrt Development List, WireGuard mailing list Good morning, Jason! On Fri, 19 Jun 2020 at 00:50, Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > Hey Rui, > > I fixed it! It turned out to be caused by -fvisibility=hidden undoing > the effect of the binutils fix from a while back. Here's the patch > that makes the problem go away: > > https://git.zx2c4.com/wireguard-linux-compat/commit/?id=178cdfffb99f2fd6fb4a5bfd2f9319461d93f53b > > This will be in the next compat release. > > Jason Great detective work there too! :) I do have to wonder if this is really a binutils/gas bug, though. From what I could gather, it's only the kernel module loader which can't (and won't, I remember reading somewhere they don't make sense for the kernel) resolve R_ARM_THM_JUMP11 relocations, and using -fvisibility=hidden in a kernel module seems to send the linker a conflicting message. Anyway, I'd still open that bug, at least to get a definitive answer. ;) Thanks a lot, Rui ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2020-06-19 8:27 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <CALjTZvbpu1Lw0j9dtXZPmVS+i-OnopUo+zuqtoQLnABQGw-SqQ@mail.gmail.com> [not found] ` <CAHmME9r3nPwmUoYYrj0PnUStd1ACSmdFAO4Qv2cZtmiLspOW1g@mail.gmail.com> [not found] ` <CALjTZvbtjVwpyV+AMX4htssTbwTHV45mQeokUr952D_GbtFPvw@mail.gmail.com> [not found] ` <CALjTZvZRerzWqaqhY2U=m44n5taLEsY99uEt2=ZNCe27=LYbLA@mail.gmail.com> [not found] ` <CAHmME9otC1mOqR2tLB55BVQQpNPvCMUGa1E4jfMYYXNp6_31BA@mail.gmail.com> [not found] ` <CALjTZvZ4wqZZ7_Fk-YHaxT9uuWnS4n9dLm4ZXSy1UM3riv+NuQ@mail.gmail.com> [not found] ` <CAHmME9qWrBTCsBr7s6oLD0zuBMzZUD2OV3s-tgDwV0W7bb9Utw@mail.gmail.com> [not found] ` <CAHmME9p51XvLEZ7QbDreEXym34S4XZZaRotAv4aRiT5D4Pz3XA@mail.gmail.com> 2020-06-17 20:45 ` wireguard: unknown relocation: 102 [ARMv7 Thumb-2] Jason A. Donenfeld 2020-06-17 20:54 ` Jason A. Donenfeld 2020-06-17 21:02 ` Any progress on R_ARM_THM_JUMP11 issues? Jason A. Donenfeld 2020-06-17 22:18 ` Rui Salvaterra 2020-06-18 23:58 ` Jason A. Donenfeld 2020-06-18 23:50 ` wireguard: unknown relocation: 102 [ARMv7 Thumb-2] Jason A. Donenfeld 2020-06-19 8:26 ` Rui Salvaterra
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).