Development discussion of WireGuard
 help / color / mirror / Atom feed
From: "حامد صابر" <hsaber@gmail.com>
To: WireGuard mailing list <wireguard@lists.zx2c4.com>
Subject: Re: Unexpected experience of site-to-site wireguard tunneling‏‏
Date: Tue, 21 Sep 2021 08:56:59 +0430	[thread overview]
Message-ID: <CAECraRqw7tZ12cPgADE=APs0a1LyG7zwPpJYZ+E3DhMhkcohDg@mail.gmail.com> (raw)
In-Reply-To: <CAECraRpgfySdLsk_HtW8NSph3B==oX_2fdbJrOnS9vn3MWokwg@mail.gmail.com>

Dear all,
I have been struggling with this issue for many days but can't even
guess the cause.
John did me a favor and connected to my server to check what's going
on, but he couldn't find it either.
May someone please suggest an idea. I doubt there is a Wireguard bug.
Rgds
Hamed

‫حامد صابر <‪hsaber@gmail.com‬‏> در تاریخ شنبه ۴ سپتامبر ۲۰۲۱ ساعت ۲۱:۲۵ نوشت:‬
>
> UPDATE:
> During the outage, I ran a simple test to check if both sides have
> access to each other's specified UDP ports, then I found something
> interesting:
> Some 148 bytes length packets are transferred between parties which I
> can't recognize what are them.
> By the way here I copy my test results.
> Anybody can spot what's going on here?
> The middle-node host name which runs wg1 is ir-pp and the exit-note
> (to free internet) which runs wg2 is sf-do:
> (these two parts happened at the same time and are copied from 2
> different servers):
>
> ========================
> root@ir-pp:~# nc -u sf-do 50840
> 123456
> 1234
> 12
> ^C
> root@ir-pp:~# tcpdump 'port 50841'
> dropped privs to tcpdump
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
> 20:56:57.607342 IP ir-pp.50841 > sf-do.50840: UDP, length 148
> 20:57:01.042471 IP sf-do.43161 > ir-pp.50841: UDP, length 3
> 20:57:02.361827 IP ir-pp.50841 > sf-do.50840: UDP, length 148
> 20:57:03.635754 IP sf-do.43161 > ir-pp.50841: UDP, length 5
> 20:57:06.740922 IP sf-do.43161 > ir-pp.50841: UDP, length 7
> 20:57:07.552305 IP ir-pp.50841 > sf-do.50840: UDP, length 148
> ^C
> 6 packets captured
> 15 packets received by filter
> 0 packets dropped by kernel
> root@ir-pp:~#
>
>
> ========================
> root@sf-do:~# tcpdump 'port 50840'
> dropped privs to tcpdump
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
> 20:56:09.126958 IP sf-do.50840 > ir-pp.50841: UDP, length 148
> 20:56:14.758974 IP sf-do.50840 > ir-pp.50841: UDP, length 148
> 20:56:18.222814 IP ir-pp.38136 > sf-do.50840: UDP, length 7
> 20:56:20.391041 IP sf-do.50840 > ir-pp.50841: UDP, length 148
> 20:56:22.307488 IP ir-pp.38136 > sf-do.50840: UDP, length 5
> 20:56:24.702590 IP ir-pp.38136 > sf-do.50840: UDP, length 3
> 20:56:25.510985 IP sf-do.50840 > ir-pp.50841: UDP, length 148
> 20:56:30.630917 IP sf-do.50840 > ir-pp.50841: UDP, length 148
> 20:56:35.750965 IP sf-do.50840 > ir-pp.50841: UDP, length 148
> ^C
> 9 packets captured
> 9 packets received by filter
> 0 packets dropped by kernel
> root@sf-do:~# nc -u ir-pp 50841
> 12
> 1234
> 123456
> ^C
> root@sf-do:~#
>
> ‫حامد صابر <‪hsaber@gmail.com‬‏> در تاریخ جمعه ۳ سپتامبر ۲۰۲۱ ساعت ۸:۳۹ نوشت:‬
> >
> > Hi again,
> > Something new happened which makes me more confused.
> > I wrote a small shell-script to check the connection between wg1 and
> > wg2, so whenever it drops, the script restarts the wg1 and everything
> > comes back.
> > Yes yes I don't like this way of addressing issues, but what could I
> > do if no meaningful debug information exists, and I predict that it
> > might be a bug of Wireguard itself?
> > BTW this system worked fine and the anti-censorship VPN chain was up
> > and running till this morning.
> > The connection went down at 7:49 and didn't come back with
> > auto-restart. The process of restarting continued for about 6 minutes,
> > and at last at 7:55 it came back.
> > During the outage I checked both sides and everything seemed fine.
> > Sides could ping the public ip of each other, their wg udp ports were
> > accessible to each other, and even handshake seemed to be finished
> > correctly (using wg-show command) but peers couldn't ping each other.
> > And the most confusing part is everything came back to life without
> > any action from my side. Just after 6 minutes of continuously
> > restarting the wg1 interface!
> > Isn't there a bug? Somebody please help me to debug this problem and
> > find the cause.
> >
> > Here I bring you my shell-script code, and then its related log:
> >
> > -------------------------
> > #!/bin/bash
> >
> > exec >>/var/log/wg-ping 2>&1
> > while true
> > do
> >         connection=$(ping -c 1 10.10.10.1)
> >         time=$(date +%H:%M)
> >         seconds=$(date +%S)
> >         seconds=${seconds#0}
> >         if [[ "$connection" != *"icmp"* ]]; then
> >           echo " "
> >           date
> >           wg-quick down wg1
> >           echo " "
> >           wg-quick up wg1
> >           connection=$(ping -c 1 10.10.10.1)
> >           time=$(date +%T)
> >           if [[ "$connection" != *"icmp"* ]]; then
> >             echo "$time ERROR"
> >           else
> >             echo "$time OK"
> >             echo " "
> >           fi
> >         elif [[ $seconds -lt 5  ]]; then
> >           echo "$time OK"
> >         fi
> >         sleep 5
> > done
> > ===============
> > Sample log of simply restarting the wg1 which makes everything fine
> > (and happens every few hours):
> > -----------------------
> > 01:17 OK
> > 01:18 OK
> > 01:19 OK
> > 01:20 OK
> > 01:21 OK
> > 01:22 OK
> > 01:23 OK
> > 01:24 OK
> > 01:25 OK
> > 01:26 OK
> >
> > Fri Sep  3 01:26:41 +0430 2021
> > [#] ip route del default dev wg1 table middle
> > [#] ip rule del iif wg0 lookup middle
> > [#] ip link delete dev wg1
> >
> > [#] ip link add wg1 type wireguard
> > [#] wg setconf wg1 /dev/fd/63
> > [#] ip -4 address add 10.10.10.2/32 dev wg1
> > [#] ip link set mtu 1420 up dev wg1
> > [#] ip -4 route add 10.10.10.1/32 dev wg1
> > [#] ip route add default dev wg1 table middle
> > [#] ip rule add iif wg0 lookup middle
> > [#] wg set wg1 peer <public key> allowed-ips 0.0.0.0/0
> > 01:26:42 OK
> >
> > 01:27 OK
> > 01:28 OK
> > 01:30 OK
> > 01:31 OK
> > 01:32 OK
> > 01:33 OK
> > 01:34 OK
> > =====================================================
> > And the log for the confusing situation I explained:
> > -----------------------
> > 07:45 OK
> > 07:46 OK
> > 07:47 OK
> > 07:48 OK
> > 07:49 OK
> >
> > Fri Sep  3 07:49:42 +0430 2021
> > [#] ip route del default dev wg1 table middle
> > [#] ip rule del iif wg0 lookup middle
> > [#] ip link delete dev wg1
> >
> > [#] ip link add wg1 type wireguard
> > [#] wg setconf wg1 /dev/fd/63
> > [#] ip -4 address add 10.10.10.2/32 dev wg1
> > [#] ip link set mtu 1420 up dev wg1
> > [#] ip -4 route add 10.10.10.1/32 dev wg1
> > [#] ip route add default dev wg1 table middle
> > [#] ip rule add iif wg0 lookup middle
> > [#] wg set wg1 peer <public key> allowed-ips 0.0.0.0/0
> > 07:49:53 ERROR
> >
> > Fri Sep  3 07:50:08 +0430 2021
> > [#] ip route del default dev wg1 table middle
> > [#] ip rule del iif wg0 lookup middle
> > [#] ip link delete dev wg1
> >
> > [#] ip link add wg1 type wireguard
> > [#] wg setconf wg1 /dev/fd/63
> > [#] ip -4 address add 10.10.10.2/32 dev wg1
> > [#] ip link set mtu 1420 up dev wg1
> > [#] ip -4 route add 10.10.10.1/32 dev wg1
> > [#] ip route add default dev wg1 table middle
> > [#] ip rule add iif wg0 lookup middle
> > [#] wg set wg1 peer <public key> allowed-ips 0.0.0.0/0
> > 07:50:18 ERROR
> >
> > ======
> > LOTS OF RETRY LOGS CROPPED
> > ======
> >
> > Fri Sep  3 07:55:13 +0430 2021
> > [#] ip route del default dev wg1 table middle
> > [#] ip rule del iif wg0 lookup middle
> > [#] ip link delete dev wg1
> >
> > [#] ip link add wg1 type wireguard
> > [#] wg setconf wg1 /dev/fd/63
> > [#] ip -4 address add 10.10.10.2/32 dev wg1
> > [#] ip link set mtu 1420 up dev wg1
> > [#] ip -4 route add 10.10.10.1/32 dev wg1
> > [#] ip route add default dev wg1 table middle
> > [#] ip rule add iif wg0 lookup middle
> > [#] wg set wg1 peer <public key> allowed-ips 0.0.0.0/0
> > 07:55:23 ERROR
> >
> > Fri Sep  3 07:55:38 +0430 2021
> > [#] ip route del default dev wg1 table middle
> > [#] ip rule del iif wg0 lookup middle
> > [#] ip link delete dev wg1
> >
> > [#] ip link add wg1 type wireguard
> > [#] wg setconf wg1 /dev/fd/63
> > [#] ip -4 address add 10.10.10.2/32 dev wg1
> > [#] ip link set mtu 1420 up dev wg1
> > [#] ip -4 route add 10.10.10.1/32 dev wg1
> > [#] ip route add default dev wg1 table middle
> > [#] ip rule add iif wg0 lookup middle
> > [#] wg set wg1 peer <public key> allowed-ips 0.0.0.0/0
> > 07:55:39 OK
> >
> > 07:56 OK
> > 07:57 OK
> > 07:58 OK
> > 07:59 OK
> > 08:00 OK
> > ---------------------------------------
> >
> >
> >
> > ‫حامد صابر <‪hsaber@gmail.com‬‏> در تاریخ پنجشنبه ۲ سپتامبر ۲۰۲۱ ساعت
> > ۸:۴۷ نوشت:‬
> > >
> > > -Thanks for reply.
> > > For test reasons, I turned the firewall off on my middle-node server.
> > > But I can't understand how this issue may be related to firewall,
> > > because most of the time it works (and to me it means firewall is Ok),
> > > but sometime for some unknown reason it stops working, which when I
> > > restart the wg1 interface with this command everything comes back to
> > > life:
> > > wg-quick down wg1 && wg-quick up wg1
> > >
> > > BTW here it is the firewall status of middle-node:
> > > ==================
> > > ● firewalld.service - firewalld - dynamic firewall daemon
> > >    Loaded: loaded (/usr/lib/systemd/system/firewalld.service;
> > > disabled; vendor preset: enabled)
> > >    Active: inactive (dead)
> > >      Docs: man:firewalld(1)
> > > -----------------------------------
> > > Chain INPUT (policy ACCEPT)
> > > target     prot opt source               destination
> > >
> > > Chain FORWARD (policy ACCEPT)
> > > target     prot opt source               destination
> > >
> > > Chain OUTPUT (policy ACCEPT)
> > > target     prot opt source               destination
> > > -----------------------------------
> > >
> > > And the firewall status of exit-node:
> > > ====================
> > > Unit firewalld.service could not be found.
> > > ---------------------------
> > > Chain INPUT (policy ACCEPT)
> > > target     prot opt source               destination
> > > ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0            udp dpt:50842
> > > ACCEPT     udp  --  0.0.0.0/0            0.0.0.0/0            udp dpt:53
> > >
> > > Chain FORWARD (policy ACCEPT)
> > > target     prot opt source               destination
> > > ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
> > > ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0
> > >
> > > Chain OUTPUT (policy ACCEPT)
> > > target     prot opt source               destination
> > > ---------------------
> > > Regards
> > >
> > > ‫‪John Lauro‬‏ <‪johnalauro@gmail.com‬‏> در تاریخ چهارشنبه ۱ سپتامبر
> > > ۲۰۲۱ ساعت ۲۱:۵۱ نوشت:‬
> > > >
> > > > Just a guess, but I would be suspicious about connection tracking causing the issue.  What are your firewall rules?
> > > >
> > > > ‪On Wed, Sep 1, 2021 at 9:51 AM ‫حامد صابر‬‎ <hsaber@gmail.com> wrote:‬
> > > >>
> > > >> Dear friends,
> > > >> I have configured 3 wireguard interfaces on 2 servers to act as a
> > > >> chained VPN for me (to bypass the internet censorship in my country),
> > > >> with this schema:
> > > >>
> > > >> client -- wg0 on middle-node -- wg1 on middle node -- wg2 on exit node
> > > >> (to free internet)
> > > >>
> > > >> Everything works fine, but after a while, the connection between wg1
> > > >> and wg2 drops and I can't find the reason. The connection comes back
> > > >> to action by simply switching the wg1 down and up again using
> > > >> wg-quick. And the amazing behaviour is that sometimes the connection
> > > >> comes back to work automatically  after some random time passes,
> > > >> without any actions from my side (sometimes after a few tens of
> > > >> minutes, sometimes after a few hours).
> > > >> When the wg1-wg2 connection is not working, anything else between 2
> > > >> servers (middle-node and exit-node) works fine. I mean I can ping the
> > > >> public IP of each server from another part, but the local wireguard ip
> > > >> of none of them are accessible.
> > > >>
> > > >> I tried to monitor the situation and read the logs but couldn't find
> > > >> what is happening here, so please help!
> > > >>
> > > >> The configuration:
> > > >> ======================
> > > >>
> > > >> client (my mobile phone):
> > > >> -------------------------------------------
> > > >> [Interface]
> > > >> Address = 10.10.20.2/32
> > > >> PrivateKey =  <private key of client>
> > > >> DNS = 10.10.10.1
> > > >>
> > > >> ### Middle Node
> > > >> [Peer]
> > > >> PublicKey =  <public key of wg0>
> > > >> PresharedKey =  <preshared key>
> > > >> AllowedIPs = 0.0.0.0/0
> > > >> Endpoint = middle-node:50842
> > > >> ======================
> > > >>
> > > >> wg0 (in middle-node server):
> > > >> -------------------------------------------
> > > >> [Interface]
> > > >> Address = 10.10.20.1/24
> > > >> ListenPort = 50842
> > > >> PrivateKey =  <private key of wg0>
> > > >>
> > > >> ### Client
> > > >> [Peer]
> > > >> PublicKey =  <public key of client>
> > > >> PresharedKey =  <preshared key>
> > > >> AllowedIPs = 10.10.20.2/32
> > > >> ======================
> > > >>
> > > >> wg1 (again in middle-node server):
> > > >> -------------------------------------------
> > > >> [Interface]
> > > >> Address = 10.10.10.2/32
> > > >> PrivateKey =  <private key of wg1>
> > > >>
> > > >> PostUp = ip route add default dev wg1 table middle
> > > >> PostUp = ip rule add iif wg0 lookup middle
> > > >> PostUp = wg set wg1 peer <publickey of wg2 (in exit-node)> allowed-ips 0.0.0.0/0
> > > >>
> > > >> PreDown = ip route del default dev wg1 table middle
> > > >> PreDown = ip rule del iif wg0 lookup middle
> > > >>
> > > >> ### Exit Node
> > > >> [Peer]
> > > >> PublicKey =  <publickey of wg2 (in exit-node)>
> > > >> PresharedKey =  <preshared key>
> > > >> AllowedIPs = 10.10.10.1/32
> > > >> Endpoint = exit-node:50842
> > > >> PersistentKeepalive = 25
> > > >> ======================
> > > >>
> > > >> wg2 (in exit-node server):
> > > >> -------------------------------------------
> > > >> [Interface]
> > > >> Address = 10.10.10.1/24
> > > >> ListenPort = 50842
> > > >> PrivateKey =  <private key of wg2>
> > > >>
> > > >> PostUp   = iptables -A FORWARD -i eth0 -o wg2 -j ACCEPT
> > > >> PostUp   = iptables -A FORWARD -i wg2 -j ACCEPT
> > > >> PostUp   = iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
> > > >>
> > > >> PostDown = iptables -D FORWARD -i eth0 -o wg2 -j ACCEPT
> > > >> PostDown = iptables -D FORWARD -i wg2 -j ACCEPT
> > > >> PostDown = iptables -t nat -D POSTROUTING -o eth0 -j MASQUERADE
> > > >>
> > > >> ### Middle Node
> > > >> [Peer]
> > > >> PublicKey =  <publickey of wg1 (in middle-node)>
> > > >> PresharedKey =  <preshared key>
> > > >> AllowedIPs = 10.0.0.0/8
> > > >> ======================
> > > >> ======================
> > > >> ======================
> > > >>
> > > >> Sample log of dmesg when the wg1-wg2 connection is not working:
> > > >> -------------------------------------------
> > > >> [Wed Sep  1 11:19:32 2021] wireguard: wg1: Sending keepalive packet to
> > > >> peer 12 (~exit-node-ip~:50842)
> > > >> [Wed Sep  1 11:19:44 2021] wireguard: wg0: Sending keepalive packet to
> > > >> peer 8 (~client-ip~:65323)
> > > >> [Wed Sep  1 11:19:44 2021] wireguard: wg1: Receiving keepalive packet
> > > >> from peer 12 (~exit-node-ip~:50842)
> > > >> [Wed Sep  1 11:20:09 2021] wireguard: wg0: Receiving handshake
> > > >> initiation from peer 8 (~client-ip~:65323)
> > > >> [Wed Sep  1 11:20:09 2021] wireguard: wg0: Sending handshake response
> > > >> to peer 8 (~client-ip~:65323)
> > > >> [Wed Sep  1 11:20:09 2021] wireguard: wg0: Keypair 2867 destroyed for peer 8
> > > >> [Wed Sep  1 11:20:09 2021] wireguard: wg0: Keypair 2871 created for peer 8
> > > >> [Wed Sep  1 11:20:09 2021] wireguard: wg0: Receiving keepalive packet
> > > >> from peer 8 (~client-ip~:65323)
> > > >> [Wed Sep  1 11:21:19 2021] wireguard: wg0: Sending keepalive packet to
> > > >> peer 8 (~client-ip~:65323)
> > > >> [Wed Sep  1 11:21:24 2021] wireguard: wg1: Retrying handshake with
> > > >> peer 12 (~exit-node-ip~:50842) because we stopped hearing back after
> > > >> 15 seconds
> > > >> [Wed Sep  1 11:21:24 2021] wireguard: wg1: Sending handshake
> > > >> initiation to peer 12 (~exit-node-ip~:50842)
> > > >> [Wed Sep  1 11:21:30 2021] wireguard: wg1: Handshake for peer 12
> > > >> (~exit-node-ip~:50842) did not complete after 5 seconds, retrying (try
> > > >> 2)
> > > >> ======================
> > > >>
> > > >> Sample log of dmesg when the wg1-wg2 connection is coming back using
> > > >> manual restart:
> > > >> -------------------------------------------
> > > >> [Wed Sep  1 11:45:52 2021] wireguard: wg1: Sending handshake
> > > >> initiation to peer 12 (~exit-node-ip~:50842)
> > > >> [Wed Sep  1 11:45:52 2021] wireguard: wg0: Sending keepalive packet to
> > > >> peer 8 (~client-ip~:2335)
> > > >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Handshake for peer 12
> > > >> (~exit-node-ip~:50842) did not complete after 5 seconds, retrying (try
> > > >> 3)
> > > >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Sending handshake
> > > >> initiation to peer 12 (~exit-node-ip~:50842)
> > > >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Keypair 2878 destroyed for peer 12
> > > >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Peer 12
> > > >> (~exit-node-ip~:50842) destroyed
> > > >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Interface destroyed
> > > >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Interface created
> > > >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Peer 13 created
> > > >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Sending keepalive packet to
> > > >> peer 13 (~exit-node-ip~:50842)
> > > >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Sending handshake
> > > >> initiation to peer 13 (~exit-node-ip~:50842)
> > > >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Receiving handshake
> > > >> response from peer 13 (~exit-node-ip~:50842)
> > > >> [Wed Sep  1 11:45:58 2021] wireguard: wg1: Keypair 2881 created for peer 13
> > > >> [Wed Sep  1 11:46:12 2021] wireguard: wg0: Receiving keepalive packet
> > > >> from peer 8 (~client-ip~:2335)
> > > >> [Wed Sep  1 11:46:14 2021] wireguard: wg1: Receiving keepalive packet
> > > >> from peer 13 (~exit-node-ip~:50842)
> > > >> [Wed Sep  1 11:46:27 2021] wireguard: wg0: Sending keepalive packet to
> > > >> peer 8 (~client-ip~:2335)
> > > >> [Wed Sep  1 11:46:28 2021] wireguard: wg1: Receiving keepalive packet
> > > >> from peer 13 (~exit-node-ip~:50842)
> > > >> [Wed Sep  1 11:46:52 2021] wireguard: wg1: Receiving keepalive packet
> > > >> from peer 13 (~exit-node-ip~:50842)
> > > >>
> > > >>
> > > >> Thanks in advance for your kind help

      reply	other threads:[~2021-09-21  4:29 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CAECraRrhiAZQfZjSc0Auhby0Z+G0jSfNeapgDnfQ6=NT8p7q5Q@mail.gmail.com>
2021-09-01  9:03 ` حامد صابر
     [not found]   ` <CADGd2Dp0S5_=bPQDnRsamvraJZ5BeVV1eWh98js3wATXodfQcQ@mail.gmail.com>
2021-09-02  4:17     ` حامد صابر
2021-09-03  4:09       ` حامد صابر
2021-09-04 16:55         ` حامد صابر
2021-09-21  4:26           ` حامد صابر [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAECraRqw7tZ12cPgADE=APs0a1LyG7zwPpJYZ+E3DhMhkcohDg@mail.gmail.com' \
    --to=hsaber@gmail.com \
    --cc=wireguard@lists.zx2c4.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).