Development discussion of WireGuard
 help / color / mirror / Atom feed
* wireguard-go: data race introduced in recent commit 9e2f386
@ 2023-03-23  9:01 lsc
  2023-03-23 17:00 ` Jordan Whited
  0 siblings, 1 reply; 5+ messages in thread
From: lsc @ 2023-03-23  9:01 UTC (permalink / raw)
  To: wireguard; +Cc: jordan, james, Jason

Hi,

Not sure if this is the correct place for bug reports, hope it's okay :)

I found my wireguard-go server (built from current ToT 1417a47) always 
panicking when I ran speedtest on a client via the server. I rebuilt the 
server with data race detector (`go build -race`) and got the following 
trace:

==================
WARNING: DATA RACE
Read at 0x00c00043b2f8 by goroutine 49:
   golang.zx2c4.com/wireguard/conn.setSrcControl()
       /home/lsc36/wireguard-go/conn/sticky_linux.go:76 +0x8c
   golang.zx2c4.com/wireguard/conn.(*StdNetBind).send4()
       /home/lsc36/wireguard-go/conn/bind_std.go:346 +0x174
   golang.zx2c4.com/wireguard/conn.(*StdNetBind).Send()
       /home/lsc36/wireguard-go/conn/bind_std.go:332 +0x1d4
   golang.zx2c4.com/wireguard/device.(*Peer).SendBuffers()
       /home/lsc36/wireguard-go/device/peer.go:126 +0x18c
   golang.zx2c4.com/wireguard/device.(*Peer).RoutineSequentialSender()
       /home/lsc36/wireguard-go/device/send.go:519 +0x458
   golang.zx2c4.com/wireguard/device.(*Peer).Start.func2()
       /home/lsc36/wireguard-go/device/peer.go:198 +0x40

Previous write at 0x00c00043b2f8 by goroutine 47:
   golang.zx2c4.com/wireguard/conn.(*StdNetEndpoint).ClearSrc()
       /home/lsc36/wireguard-go/conn/bind_std.go:100 +0x48
   golang.zx2c4.com/wireguard/conn.getSrcFromControl()
       /home/lsc36/wireguard-go/conn/sticky_linux.go:18 +0x3c
   golang.zx2c4.com/wireguard/conn.(*StdNetBind).makeReceiveIPv4.func1()
       /home/lsc36/wireguard-go/conn/bind_std.go:232 +0x350
   golang.zx2c4.com/wireguard/device.(*Device).RoutineReceiveIncoming()
       /home/lsc36/wireguard-go/device/receive.go:107 +0x3ac
   golang.zx2c4.com/wireguard/device.(*Device).BindUpdate.func3()
       /home/lsc36/wireguard-go/device/device.go:530 +0x4c

Goroutine 49 (running) created at:
   golang.zx2c4.com/wireguard/device.(*Peer).Start()
       /home/lsc36/wireguard-go/device/peer.go:198 +0x300
   golang.zx2c4.com/wireguard/device.(*ipcSetPeer).handlePostConfig()
       /home/lsc36/wireguard-go/device/uapi.go:268 +0x164
   golang.zx2c4.com/wireguard/device.(*Device).IpcSetOperation()
       /home/lsc36/wireguard-go/device/uapi.go:160 +0x35c
   golang.zx2c4.com/wireguard/device.(*Device).IpcHandle()
       /home/lsc36/wireguard-go/device/uapi.go:428 +0x248
   main.main.func4.1()
       /home/lsc36/wireguard-go/main.go:245 +0x4c

Goroutine 47 (running) created at:
   golang.zx2c4.com/wireguard/device.(*Device).BindUpdate()
       /home/lsc36/wireguard-go/device/device.go:530 +0x4b8
   golang.zx2c4.com/wireguard/device.(*Device).handleDeviceLine()
       /home/lsc36/wireguard-go/device/uapi.go:223 +0x45c
   golang.zx2c4.com/wireguard/device.(*Device).IpcSetOperation()
       /home/lsc36/wireguard-go/device/uapi.go:183 +0x210
   golang.zx2c4.com/wireguard/device.(*Device).IpcHandle()
       /home/lsc36/wireguard-go/device/uapi.go:428 +0x248
   main.main.func4.1()
       /home/lsc36/wireguard-go/main.go:245 +0x4c
==================

The bug looks introduced by a recent commit 9e2f386 "conn, device, tun: 
implement vectorized I/O on Linux". Without much knowledge to the 
codebase, it seems to me that Endpoint is supposed to be guarded by the 
RWMutex embedded in Peer, yet the change introduced a write without 
locking the mutex (`ep.ClearSrc()` called from sticky_linux.go:18).

In addition, I found it weird that Endpoint objects are reused via 
`endpointPool`. The docstring suggests "Endpoints are immutable, so we 
can re-use them" but that doesn't seem true.

Added authors/reviewers of 9e2f386 to cc. Would you take a look?

Regards,
SC


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: wireguard-go: data race introduced in recent commit 9e2f386
  2023-03-23  9:01 wireguard-go: data race introduced in recent commit 9e2f386 lsc
@ 2023-03-23 17:00 ` Jordan Whited
  2023-03-24  3:21   ` lsc
  0 siblings, 1 reply; 5+ messages in thread
From: Jordan Whited @ 2023-03-23 17:00 UTC (permalink / raw)
  To: lsc; +Cc: wireguard, james, Jason

SC, thank you for reporting. Do you happen to have a stack trace from
one of the panics?

On Thu, Mar 23, 2023 at 2:01 AM lsc <lsc@lv6.tw> wrote:
>
> Hi,
>
> Not sure if this is the correct place for bug reports, hope it's okay :)
>
> I found my wireguard-go server (built from current ToT 1417a47) always
> panicking when I ran speedtest on a client via the server. I rebuilt the
> server with data race detector (`go build -race`) and got the following
> trace:
>
> ==================
> WARNING: DATA RACE
> Read at 0x00c00043b2f8 by goroutine 49:
>    golang.zx2c4.com/wireguard/conn.setSrcControl()
>        /home/lsc36/wireguard-go/conn/sticky_linux.go:76 +0x8c
>    golang.zx2c4.com/wireguard/conn.(*StdNetBind).send4()
>        /home/lsc36/wireguard-go/conn/bind_std.go:346 +0x174
>    golang.zx2c4.com/wireguard/conn.(*StdNetBind).Send()
>        /home/lsc36/wireguard-go/conn/bind_std.go:332 +0x1d4
>    golang.zx2c4.com/wireguard/device.(*Peer).SendBuffers()
>        /home/lsc36/wireguard-go/device/peer.go:126 +0x18c
>    golang.zx2c4.com/wireguard/device.(*Peer).RoutineSequentialSender()
>        /home/lsc36/wireguard-go/device/send.go:519 +0x458
>    golang.zx2c4.com/wireguard/device.(*Peer).Start.func2()
>        /home/lsc36/wireguard-go/device/peer.go:198 +0x40
>
> Previous write at 0x00c00043b2f8 by goroutine 47:
>    golang.zx2c4.com/wireguard/conn.(*StdNetEndpoint).ClearSrc()
>        /home/lsc36/wireguard-go/conn/bind_std.go:100 +0x48
>    golang.zx2c4.com/wireguard/conn.getSrcFromControl()
>        /home/lsc36/wireguard-go/conn/sticky_linux.go:18 +0x3c
>    golang.zx2c4.com/wireguard/conn.(*StdNetBind).makeReceiveIPv4.func1()
>        /home/lsc36/wireguard-go/conn/bind_std.go:232 +0x350
>    golang.zx2c4.com/wireguard/device.(*Device).RoutineReceiveIncoming()
>        /home/lsc36/wireguard-go/device/receive.go:107 +0x3ac
>    golang.zx2c4.com/wireguard/device.(*Device).BindUpdate.func3()
>        /home/lsc36/wireguard-go/device/device.go:530 +0x4c
>
> Goroutine 49 (running) created at:
>    golang.zx2c4.com/wireguard/device.(*Peer).Start()
>        /home/lsc36/wireguard-go/device/peer.go:198 +0x300
>    golang.zx2c4.com/wireguard/device.(*ipcSetPeer).handlePostConfig()
>        /home/lsc36/wireguard-go/device/uapi.go:268 +0x164
>    golang.zx2c4.com/wireguard/device.(*Device).IpcSetOperation()
>        /home/lsc36/wireguard-go/device/uapi.go:160 +0x35c
>    golang.zx2c4.com/wireguard/device.(*Device).IpcHandle()
>        /home/lsc36/wireguard-go/device/uapi.go:428 +0x248
>    main.main.func4.1()
>        /home/lsc36/wireguard-go/main.go:245 +0x4c
>
> Goroutine 47 (running) created at:
>    golang.zx2c4.com/wireguard/device.(*Device).BindUpdate()
>        /home/lsc36/wireguard-go/device/device.go:530 +0x4b8
>    golang.zx2c4.com/wireguard/device.(*Device).handleDeviceLine()
>        /home/lsc36/wireguard-go/device/uapi.go:223 +0x45c
>    golang.zx2c4.com/wireguard/device.(*Device).IpcSetOperation()
>        /home/lsc36/wireguard-go/device/uapi.go:183 +0x210
>    golang.zx2c4.com/wireguard/device.(*Device).IpcHandle()
>        /home/lsc36/wireguard-go/device/uapi.go:428 +0x248
>    main.main.func4.1()
>        /home/lsc36/wireguard-go/main.go:245 +0x4c
> ==================
>
> The bug looks introduced by a recent commit 9e2f386 "conn, device, tun:
> implement vectorized I/O on Linux". Without much knowledge to the
> codebase, it seems to me that Endpoint is supposed to be guarded by the
> RWMutex embedded in Peer, yet the change introduced a write without
> locking the mutex (`ep.ClearSrc()` called from sticky_linux.go:18).
>
> In addition, I found it weird that Endpoint objects are reused via
> `endpointPool`. The docstring suggests "Endpoints are immutable, so we
> can re-use them" but that doesn't seem true.
>
> Added authors/reviewers of 9e2f386 to cc. Would you take a look?
>
> Regards,
> SC
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: wireguard-go: data race introduced in recent commit 9e2f386
  2023-03-23 17:00 ` Jordan Whited
@ 2023-03-24  3:21   ` lsc
  2023-03-24 13:44     ` Jason A. Donenfeld
  0 siblings, 1 reply; 5+ messages in thread
From: lsc @ 2023-03-24  3:21 UTC (permalink / raw)
  To: Jordan Whited; +Cc: wireguard, james, Jason

Sure, here's the trace:

--
DEBUG: (wg1) 2023/03/22 23:46:39 peer(Suxf…6dEI) - Routine: sequential sender - stopped
panic: As4 called on IP zero value

goroutine 70 [running]:
net/netip.Addr.As4({{0xc0048b1200?, 0xc00058bb28?}, 0x0?})
        /opt/go1.20.2.linux-arm64/src/net/netip/netip.go:690 +0x174
golang.zx2c4.com/wireguard/conn.setSrcControl(0xc00931b018, 0xc006301640)
        /home/lsc36/wireguard-go/conn/sticky_linux.go:95 +0x280
golang.zx2c4.com/wireguard/conn.(*StdNetBind).send4(0xc0003b6000, 0xffffa9d04200?, 0xc0000b8230, {0x24e6e8?, 0xc006301640?}, {0xc000533800, 0x1, 0xba250?})
        /home/lsc36/wireguard-go/conn/bind_std.go:346 +0x178
golang.zx2c4.com/wireguard/conn.(*StdNetBind).Send(0xc0003b6000, {0xc000533800, 0x1, 0x80}, {0x24e6e8, 0xc006301640})
        /home/lsc36/wireguard-go/conn/bind_std.go:332 +0x1d8
golang.zx2c4.com/wireguard/device.(*Peer).SendBuffers(0xc0000be380, {0xc000533800?, 0x1, 0x80})
        /home/lsc36/wireguard-go/device/peer.go:126 +0x190
golang.zx2c4.com/wireguard/device.(*Peer).RoutineSequentialSender(0xc0000be380, 0x80)
        /home/lsc36/wireguard-go/device/send.go:519 +0x45c
created by golang.zx2c4.com/wireguard/device.(*Peer).Start
        /home/lsc36/wireguard-go/device/peer.go:198 +0x304
--

The offending call `ep.SrcIP().As4()` (sticky_linux.go:95) is only
reached when `ep.SrcIP().Is4() && ep.SrcIP().IsValid()` is true, yet it
panicked saying it was being called against a zero value. That's why I
suspected it's a data race in the first place.

On 3/24/23 01:00, Jordan Whited wrote:
> SC, thank you for reporting. Do you happen to have a stack trace from
> one of the panics?
> 
> On Thu, Mar 23, 2023 at 2:01 AM lsc <lsc@lv6.tw> wrote:
>>
>> Hi,
>>
>> Not sure if this is the correct place for bug reports, hope it's okay :)
>>
>> I found my wireguard-go server (built from current ToT 1417a47) always
>> panicking when I ran speedtest on a client via the server. I rebuilt the
>> server with data race detector (`go build -race`) and got the following
>> trace:
>>
>> ==================
>> WARNING: DATA RACE
>> Read at 0x00c00043b2f8 by goroutine 49:
>>    golang.zx2c4.com/wireguard/conn.setSrcControl()
>>        /home/lsc36/wireguard-go/conn/sticky_linux.go:76 +0x8c
>>    golang.zx2c4.com/wireguard/conn.(*StdNetBind).send4()
>>        /home/lsc36/wireguard-go/conn/bind_std.go:346 +0x174
>>    golang.zx2c4.com/wireguard/conn.(*StdNetBind).Send()
>>        /home/lsc36/wireguard-go/conn/bind_std.go:332 +0x1d4
>>    golang.zx2c4.com/wireguard/device.(*Peer).SendBuffers()
>>        /home/lsc36/wireguard-go/device/peer.go:126 +0x18c
>>    golang.zx2c4.com/wireguard/device.(*Peer).RoutineSequentialSender()
>>        /home/lsc36/wireguard-go/device/send.go:519 +0x458
>>    golang.zx2c4.com/wireguard/device.(*Peer).Start.func2()
>>        /home/lsc36/wireguard-go/device/peer.go:198 +0x40
>>
>> Previous write at 0x00c00043b2f8 by goroutine 47:
>>    golang.zx2c4.com/wireguard/conn.(*StdNetEndpoint).ClearSrc()
>>        /home/lsc36/wireguard-go/conn/bind_std.go:100 +0x48
>>    golang.zx2c4.com/wireguard/conn.getSrcFromControl()
>>        /home/lsc36/wireguard-go/conn/sticky_linux.go:18 +0x3c
>>    golang.zx2c4.com/wireguard/conn.(*StdNetBind).makeReceiveIPv4.func1()
>>        /home/lsc36/wireguard-go/conn/bind_std.go:232 +0x350
>>    golang.zx2c4.com/wireguard/device.(*Device).RoutineReceiveIncoming()
>>        /home/lsc36/wireguard-go/device/receive.go:107 +0x3ac
>>    golang.zx2c4.com/wireguard/device.(*Device).BindUpdate.func3()
>>        /home/lsc36/wireguard-go/device/device.go:530 +0x4c
>>
>> Goroutine 49 (running) created at:
>>    golang.zx2c4.com/wireguard/device.(*Peer).Start()
>>        /home/lsc36/wireguard-go/device/peer.go:198 +0x300
>>    golang.zx2c4.com/wireguard/device.(*ipcSetPeer).handlePostConfig()
>>        /home/lsc36/wireguard-go/device/uapi.go:268 +0x164
>>    golang.zx2c4.com/wireguard/device.(*Device).IpcSetOperation()
>>        /home/lsc36/wireguard-go/device/uapi.go:160 +0x35c
>>    golang.zx2c4.com/wireguard/device.(*Device).IpcHandle()
>>        /home/lsc36/wireguard-go/device/uapi.go:428 +0x248
>>    main.main.func4.1()
>>        /home/lsc36/wireguard-go/main.go:245 +0x4c
>>
>> Goroutine 47 (running) created at:
>>    golang.zx2c4.com/wireguard/device.(*Device).BindUpdate()
>>        /home/lsc36/wireguard-go/device/device.go:530 +0x4b8
>>    golang.zx2c4.com/wireguard/device.(*Device).handleDeviceLine()
>>        /home/lsc36/wireguard-go/device/uapi.go:223 +0x45c
>>    golang.zx2c4.com/wireguard/device.(*Device).IpcSetOperation()
>>        /home/lsc36/wireguard-go/device/uapi.go:183 +0x210
>>    golang.zx2c4.com/wireguard/device.(*Device).IpcHandle()
>>        /home/lsc36/wireguard-go/device/uapi.go:428 +0x248
>>    main.main.func4.1()
>>        /home/lsc36/wireguard-go/main.go:245 +0x4c
>> ==================
>>
>> The bug looks introduced by a recent commit 9e2f386 "conn, device, tun:
>> implement vectorized I/O on Linux". Without much knowledge to the
>> codebase, it seems to me that Endpoint is supposed to be guarded by the
>> RWMutex embedded in Peer, yet the change introduced a write without
>> locking the mutex (`ep.ClearSrc()` called from sticky_linux.go:18).
>>
>> In addition, I found it weird that Endpoint objects are reused via
>> `endpointPool`. The docstring suggests "Endpoints are immutable, so we
>> can re-use them" but that doesn't seem true.
>>
>> Added authors/reviewers of 9e2f386 to cc. Would you take a look?
>>
>> Regards,
>> SC
>>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: wireguard-go: data race introduced in recent commit 9e2f386
  2023-03-24  3:21   ` lsc
@ 2023-03-24 13:44     ` Jason A. Donenfeld
  2023-03-24 13:45       ` [PATCH wireguard-go] conn: fix StdNetEndpoint data race by dynamically allocating endpoints Jason A. Donenfeld
  0 siblings, 1 reply; 5+ messages in thread
From: Jason A. Donenfeld @ 2023-03-24 13:44 UTC (permalink / raw)
  To: lsc; +Cc: Jordan Whited, wireguard, james

Thanks for the panic trace. That indeed looks like the same issue as
the race. Jordan has a stop-gap patch prepared, which I'll send to the
list.

Jason

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH wireguard-go] conn: fix StdNetEndpoint data race by dynamically allocating endpoints
  2023-03-24 13:44     ` Jason A. Donenfeld
@ 2023-03-24 13:45       ` Jason A. Donenfeld
  0 siblings, 0 replies; 5+ messages in thread
From: Jason A. Donenfeld @ 2023-03-24 13:45 UTC (permalink / raw)
  To: wireguard
  Cc: Jordan Whited, lsc, Josh Bleecher Snyder, James Tucker,
	Jason A . Donenfeld

From: Jordan Whited <jordan@tailscale.com>

In 9e2f386 ("conn, device, tun: implement vectorized I/O on Linux"), the
Linux-specific Bind implementation was collapsed into StdNetBind. This
introduced a race on StdNetEndpoint from getSrcFromControl() and
setSrcControl().

Remove the sync.Pool involved in the race, and simplify StdNetBind's
receive path to allocate StdNetEndpoint on the heap instead, with the
intent for it to be cleaned up by the GC, later. This essentially
reverts ef5c587 ("conn: remove the final alloc per packet receive"),
adding back that allocation, unfortunately.

This does slightly increase resident memory usage in higher throughput
scenarios. StdNetBind is the only Bind implementation that was using
this Endpoint recycling technique prior to this commit.

This is considered a stop-gap solution, and there are plans to replace
the allocation with a better mechanism.

Reported-by: lsc <lsc@lv6.tw>
Link: https://lore.kernel.org/wireguard/ac87f86f-6837-4e0e-ec34-1df35f52540e@lv6.tw/
Fixes: 9e2f386 ("conn, device, tun: implement vectorized I/O on Linux")
Cc: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: James Tucker <james@tailscale.com>
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 conn/bind_std.go | 32 ++++++++------------------------
 1 file changed, 8 insertions(+), 24 deletions(-)

diff --git a/conn/bind_std.go b/conn/bind_std.go
index e8cf7b8..491e571 100644
--- a/conn/bind_std.go
+++ b/conn/bind_std.go
@@ -93,7 +93,12 @@ var (
 
 func (*StdNetBind) ParseEndpoint(s string) (Endpoint, error) {
 	e, err := netip.ParseAddrPort(s)
-	return asEndpoint(e), err
+	if err != nil {
+		return nil, err
+	}
+	return &StdNetEndpoint{
+		AddrPort: e,
+	}, nil
 }
 
 func (e *StdNetEndpoint) ClearSrc() {
@@ -228,7 +233,7 @@ func (s *StdNetBind) makeReceiveIPv4(pc *ipv4.PacketConn, conn *net.UDPConn) Rec
 			msg := &(*msgs)[i]
 			sizes[i] = msg.N
 			addrPort := msg.Addr.(*net.UDPAddr).AddrPort()
-			ep := asEndpoint(addrPort)
+			ep := &StdNetEndpoint{AddrPort: addrPort} // TODO: remove allocation
 			getSrcFromControl(msg.OOB[:msg.NN], ep)
 			eps[i] = ep
 		}
@@ -261,7 +266,7 @@ func (s *StdNetBind) makeReceiveIPv6(pc *ipv6.PacketConn, conn *net.UDPConn) Rec
 			msg := &(*msgs)[i]
 			sizes[i] = msg.N
 			addrPort := msg.Addr.(*net.UDPAddr).AddrPort()
-			ep := asEndpoint(addrPort)
+			ep := &StdNetEndpoint{AddrPort: addrPort} // TODO: remove allocation
 			getSrcFromControl(msg.OOB[:msg.NN], ep)
 			eps[i] = ep
 		}
@@ -408,24 +413,3 @@ func (s *StdNetBind) send6(conn *net.UDPConn, pc *ipv6.PacketConn, ep Endpoint,
 	s.ipv6MsgsPool.Put(msgs)
 	return err
 }
-
-// endpointPool contains a re-usable set of mapping from netip.AddrPort to Endpoint.
-// This exists to reduce allocations: Putting a netip.AddrPort in an Endpoint allocates,
-// but Endpoints are immutable, so we can re-use them.
-var endpointPool = sync.Pool{
-	New: func() any {
-		return make(map[netip.AddrPort]*StdNetEndpoint)
-	},
-}
-
-// asEndpoint returns an Endpoint containing ap.
-func asEndpoint(ap netip.AddrPort) *StdNetEndpoint {
-	m := endpointPool.Get().(map[netip.AddrPort]*StdNetEndpoint)
-	defer endpointPool.Put(m)
-	e, ok := m[ap]
-	if !ok {
-		e = &StdNetEndpoint{AddrPort: ap}
-		m[ap] = e
-	}
-	return e
-}
-- 
2.40.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-03-24 13:47 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-23  9:01 wireguard-go: data race introduced in recent commit 9e2f386 lsc
2023-03-23 17:00 ` Jordan Whited
2023-03-24  3:21   ` lsc
2023-03-24 13:44     ` Jason A. Donenfeld
2023-03-24 13:45       ` [PATCH wireguard-go] conn: fix StdNetEndpoint data race by dynamically allocating endpoints Jason A. Donenfeld

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).