Hi Jason,

Great work on the FreeBSD kmod so far! A couple of issues to report.

I am running the wireguard-kmod-0.0.20210428 snapshot on my pfSense
router. I am working with the pfSense-pkg-Wireguard effort to build
the WG package. Admittedly I am mostly testing and providing some UI
code. However, I have come across two errors.

The first one is a kernel panic (KP) that happened sometime today.

FreeBSD pfsense 12.2-STABLE FreeBSD 12.2-STABLE 1b709158e581(RELENG_2_5_0) pfSense amd64

Here is the stack trace from the KP:
https://pastebin.com/4bjdzYas

db:0:kdb.enter.default> bt
Tracing pid 0 tid 100402 td 0xfffff800c67b6740
kdb_enter() at kdb_enter+0x37/frame 0xfffffe004d02c4b0
vpanic() at vpanic+0x197/frame 0xfffffe004d02c500
panic() at panic+0x43/frame 0xfffffe004d02c560
trap_fatal() at trap_fatal+0x391/frame 0xfffffe004d02c5c0
trap() at trap+0x67/frame 0xfffffe004d02c6d0
calltrap() at calltrap+0x8/frame 0xfffffe004d02c6d0
--- trap 0x9, rip = 0xffffffff840fd580, rsp = 0xfffffe004d02c7a0, rbp = 0xfffffe004d02c7e0 ---
noise_remote_index_insert() at noise_remote_index_insert+0xb0/frame 0xfffffe004d02c7e0
noise_consume_initiation() at noise_consume_initiation+0x6bb/frame 0xfffffe004d02ca10
wg_softc_handshake_receive() at wg_softc_handshake_receive+0x27a/frame 0xfffffe004d02cb20
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x121/frame 0xfffffe004d02cb80
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xb6/frame 0xfffffe004d02cbb0
fork_exit() at fork_exit+0x7e/frame 0xfffffe004d02cbf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe004d02cbf0

The second issue is that I am seeing silent memory corruption where the
pfSense UI stops responding and serves up invalid files. A reboot fixes
it. I have NOT noticed this issue with the 0415 snapshot; it happened
in both the 0424 and 0428 snapshots. While I cannot definitively say
it's WG-related, that is the only bit changing on the boxes.

Thanks,
Manoj
Hi Manojav,

On Mon, May 3, 2021 at 3:05 PM Manojav Sridhar <manojav@manojav.com> wrote:
> --- trap 0x9, rip = 0xffffffff840fd580, rsp = 0xfffffe004d02c7a0, rbp =
> 0xfffffe004d02c7e0 ---
> noise_remote_index_insert() at noise_remote_index_insert+0xb0/frame
> 0xfffffe004d02c7e0
> noise_consume_initiation() at noise_consume_initiation+0x6bb/frame
> 0xfffffe004d02ca10
> wg_softc_handshake_receive() at wg_softc_handshake_receive+0x27a/frame
> 0xfffffe004d02cb20

Do you know how to reproduce this? Do you have the symbol file
anywhere? Otherwise, do you think you could send me (off list) your
if_wg.ko file that produced this stack trace? Then I can put it into
the disassembler.

> The second issue is that I am seeing silent memory corruption where the
> pfSense UI stops responding and serves up invalid files.

Fixed in https://lists.zx2c4.com/pipermail/wireguard/2021-May/006694.html .

Jason
Thanks. I have responded off list to your other request. I will
continue to test on the latest snapshots!
Hey again,

Thanks for the .ko you sent me. That was helpful in tracking down the
bug, which Matt and I have now fixed here:

https://git.zx2c4.com/wireguard-freebsd/commit/?id=c69fb61b94341027ea3c539bcf96d9fe03f65fa5

The commit message includes a little bash reproducer that hit the same
crash in my tests, making me somewhat confident we squashed the right
one.

Jason
Jason,
Thanks for the follow-up and the bash script to help reproduce it.
I am guessing my constant restarting of the tunnels while testing the
pfSense-based UI we are building triggered the scenario your bash
script creates.
I tried the bash script on both bare metal box and virtualbox pfsense
box. both ran for a few minutes okay. How long does it take to happen?
I am testing with the 0428 snapshot.
Manoj
On Mon, May 3, 2021 at 5:27 PM Manojav Sridhar <manojav@manojav.com> wrote:
> I tried the bash script on both bare metal box and virtualbox pfsense
> box. both ran for a few minutes okay. How long does it take to happen?
> I am testing with the 0428 snapshot.
Try changing in wg_noise.c:
#define HT_INDEX_SIZE (1 << 13)
to
#define HT_INDEX_SIZE (1 << 3)
And then you'll see it hit pretty quickly.
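
The suggestion above shrinks the index table from 8192 slots to 8. A minimal sketch of why that makes a collision-dependent path fire sooner, assuming the crash needs two entries to land in the same bucket (the thread does not spell out the exact trigger; the linked commit does). Both function names are hypothetical and not from wg_noise.c, and xorshift32 is only a stand-in for however indices are really generated:

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not from wg_noise.c: inserts pseudo-random 32-bit
 * indices into a table of `buckets` slots (must be a power of two) and
 * returns how many insertions occur before two indices share a bucket.
 * Returns 0 only if the allocation fails.
 */
static unsigned insertions_until_shared_bucket(unsigned buckets, uint32_t seed)
{
	unsigned char *used = calloc(buckets, 1);
	uint32_t x = seed ? seed : 1;	/* xorshift32 state must be nonzero */
	unsigned n = 0;

	if (used == NULL)
		return 0;
	for (;;) {
		/* xorshift32 step: an assumed stand-in index generator. */
		x ^= x << 13;
		x ^= x >> 17;
		x ^= x << 5;
		unsigned b = x & (buckets - 1);	/* mask down to a bucket */
		n++;
		if (used[b])
			break;	/* bucket already occupied: first collision */
		used[b] = 1;
	}
	free(used);
	return n;
}

/* Hypothetical helper: averages the above over seeds 1..trials. */
static double average_insertions(unsigned buckets, unsigned trials)
{
	double sum = 0.0;

	for (unsigned s = 1; s <= trials; s++)
		sum += insertions_until_shared_bucket(buckets, s);
	return sum / trials;
}
```

Calling `average_insertions(1 << 3, 200)` and `average_insertions(1 << 13, 200)` side by side shows the gap: with 8 slots a shared bucket is forced within 9 insertions at most, while 8192 slots typically need on the order of a hundred.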
Ah. Understood. I am not set up to build the ko for FreeBSD yet. But I
can leave it running on my test box for a bit.
Just happened! so yeah that was it on the trigger. Once Cmac builds
the ko for me I will test it again!
Again thanks so much!
On Mon, May 3, 2021 at 5:33 PM Manojav Sridhar <manojav@manojav.com> wrote:
>
> Ah. Understood. I am not set up to build the ko for FreeBSD yet. But I
> can leave it running on my test box for a bit.
Ah, don't worry about it. The trigger was sufficient for my purposes,
but it doesn't need to be reproduced elsewhere necessarily. However,
if you do wind up seeing this same bug again, using the latest master
branch that contains the fix, please let me know, since that'd
indicate I've done something wrong.
Jason
On Mon, May 3, 2021 at 5:35 PM Manojav Sridhar <manojav@manojav.com> wrote:
>
> Just happened! so yeah that was it on the trigger.
With 1 << 13 or 1 << 3?
With the same ko I sent you. 1<<13. I was just confirming I could trigger it.
The code in here will repro the bug much faster: https://git.zx2c4.com/wireguard-freebsd/commit/?id=561f3a8f930cf2e44f493fa04d932ba9a2362cc5
Jason,
Thanks for the update. Yes it still triggers this on the current
snapshot, which is built prior to your fix. I will retry it once you
release a new snapshot. It seems quite a long shot that this occurred
on my firewall in the first place. Glad it was reported and fixed.
Onward!
Thanks
Manoj
Hi Manojav,

0.0.20210503 is now in ports, which contains these fixes.

Jason
Jason,
With some help I was able to get the latest if_wg.ko built. I have
been running the triggering bash script for a while now and have not
managed to trigger it. However, prior to installing the latest snapshot
it happened once more on my firewall (just provided as an FYI) in the
middle of the night as part of normal usage.
Thanks for jumping on this! Looking solid again!
Manoj