* wireguard-go: data race introduced in recent commit 9e2f386
@ 2023-03-23  9:01 lsc
  2023-03-23 17:00 ` Jordan Whited
  0 siblings, 1 reply; 5+ messages in thread

From: lsc @ 2023-03-23  9:01 UTC (permalink / raw)
  To: wireguard; +Cc: jordan, james, Jason

Hi,

Not sure if this is the correct place for bug reports, hope it's okay :)

I found my wireguard-go server (built from current ToT 1417a47) always
panicking when I ran a speedtest on a client via the server. I rebuilt
the server with the data race detector (`go build -race`) and got the
following trace:

==================
WARNING: DATA RACE
Read at 0x00c00043b2f8 by goroutine 49:
  golang.zx2c4.com/wireguard/conn.setSrcControl()
      /home/lsc36/wireguard-go/conn/sticky_linux.go:76 +0x8c
  golang.zx2c4.com/wireguard/conn.(*StdNetBind).send4()
      /home/lsc36/wireguard-go/conn/bind_std.go:346 +0x174
  golang.zx2c4.com/wireguard/conn.(*StdNetBind).Send()
      /home/lsc36/wireguard-go/conn/bind_std.go:332 +0x1d4
  golang.zx2c4.com/wireguard/device.(*Peer).SendBuffers()
      /home/lsc36/wireguard-go/device/peer.go:126 +0x18c
  golang.zx2c4.com/wireguard/device.(*Peer).RoutineSequentialSender()
      /home/lsc36/wireguard-go/device/send.go:519 +0x458
  golang.zx2c4.com/wireguard/device.(*Peer).Start.func2()
      /home/lsc36/wireguard-go/device/peer.go:198 +0x40

Previous write at 0x00c00043b2f8 by goroutine 47:
  golang.zx2c4.com/wireguard/conn.(*StdNetEndpoint).ClearSrc()
      /home/lsc36/wireguard-go/conn/bind_std.go:100 +0x48
  golang.zx2c4.com/wireguard/conn.getSrcFromControl()
      /home/lsc36/wireguard-go/conn/sticky_linux.go:18 +0x3c
  golang.zx2c4.com/wireguard/conn.(*StdNetBind).makeReceiveIPv4.func1()
      /home/lsc36/wireguard-go/conn/bind_std.go:232 +0x350
  golang.zx2c4.com/wireguard/device.(*Device).RoutineReceiveIncoming()
      /home/lsc36/wireguard-go/device/receive.go:107 +0x3ac
  golang.zx2c4.com/wireguard/device.(*Device).BindUpdate.func3()
      /home/lsc36/wireguard-go/device/device.go:530 +0x4c

Goroutine 49 (running) created at:
  golang.zx2c4.com/wireguard/device.(*Peer).Start()
      /home/lsc36/wireguard-go/device/peer.go:198 +0x300
  golang.zx2c4.com/wireguard/device.(*ipcSetPeer).handlePostConfig()
      /home/lsc36/wireguard-go/device/uapi.go:268 +0x164
  golang.zx2c4.com/wireguard/device.(*Device).IpcSetOperation()
      /home/lsc36/wireguard-go/device/uapi.go:160 +0x35c
  golang.zx2c4.com/wireguard/device.(*Device).IpcHandle()
      /home/lsc36/wireguard-go/device/uapi.go:428 +0x248
  main.main.func4.1()
      /home/lsc36/wireguard-go/main.go:245 +0x4c

Goroutine 47 (running) created at:
  golang.zx2c4.com/wireguard/device.(*Device).BindUpdate()
      /home/lsc36/wireguard-go/device/device.go:530 +0x4b8
  golang.zx2c4.com/wireguard/device.(*Device).handleDeviceLine()
      /home/lsc36/wireguard-go/device/uapi.go:223 +0x45c
  golang.zx2c4.com/wireguard/device.(*Device).IpcSetOperation()
      /home/lsc36/wireguard-go/device/uapi.go:183 +0x210
  golang.zx2c4.com/wireguard/device.(*Device).IpcHandle()
      /home/lsc36/wireguard-go/device/uapi.go:428 +0x248
  main.main.func4.1()
      /home/lsc36/wireguard-go/main.go:245 +0x4c
==================

The bug looks introduced by a recent commit 9e2f386 "conn, device, tun:
implement vectorized I/O on Linux". Without much knowledge of the
codebase, it seems to me that Endpoint is supposed to be guarded by the
RWMutex embedded in Peer, yet the change introduced a write without
locking the mutex (`ep.ClearSrc()` called from sticky_linux.go:18).

In addition, I found it weird that Endpoint objects are reused via
`endpointPool`. The docstring suggests "Endpoints are immutable, so we
can re-use them", but that doesn't seem to be true.

Added authors/reviewers of 9e2f386 to cc. Would you take a look?

Regards,
SC

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: wireguard-go: data race introduced in recent commit 9e2f386
  2023-03-23  9:01 wireguard-go: data race introduced in recent commit 9e2f386 lsc
@ 2023-03-23 17:00 ` Jordan Whited
  2023-03-24  3:21   ` lsc
  0 siblings, 1 reply; 5+ messages in thread

From: Jordan Whited @ 2023-03-23 17:00 UTC (permalink / raw)
  To: lsc; +Cc: wireguard, james, Jason

SC, thank you for reporting. Do you happen to have a stack trace from
one of the panics?

On Thu, Mar 23, 2023 at 2:01 AM lsc <lsc@lv6.tw> wrote:
>
> Hi,
>
> Not sure if this is the correct place for bug reports, hope it's okay :)
>
> I found my wireguard-go server (built from current ToT 1417a47) always
> panicking when I ran a speedtest on a client via the server. I rebuilt the
> server with the data race detector (`go build -race`) and got the following
> trace:
>
> ==================
> WARNING: DATA RACE
> Read at 0x00c00043b2f8 by goroutine 49:
>   golang.zx2c4.com/wireguard/conn.setSrcControl()
>       /home/lsc36/wireguard-go/conn/sticky_linux.go:76 +0x8c
>   golang.zx2c4.com/wireguard/conn.(*StdNetBind).send4()
>       /home/lsc36/wireguard-go/conn/bind_std.go:346 +0x174
>   golang.zx2c4.com/wireguard/conn.(*StdNetBind).Send()
>       /home/lsc36/wireguard-go/conn/bind_std.go:332 +0x1d4
>   golang.zx2c4.com/wireguard/device.(*Peer).SendBuffers()
>       /home/lsc36/wireguard-go/device/peer.go:126 +0x18c
>   golang.zx2c4.com/wireguard/device.(*Peer).RoutineSequentialSender()
>       /home/lsc36/wireguard-go/device/send.go:519 +0x458
>   golang.zx2c4.com/wireguard/device.(*Peer).Start.func2()
>       /home/lsc36/wireguard-go/device/peer.go:198 +0x40
>
> Previous write at 0x00c00043b2f8 by goroutine 47:
>   golang.zx2c4.com/wireguard/conn.(*StdNetEndpoint).ClearSrc()
>       /home/lsc36/wireguard-go/conn/bind_std.go:100 +0x48
>   golang.zx2c4.com/wireguard/conn.getSrcFromControl()
>       /home/lsc36/wireguard-go/conn/sticky_linux.go:18 +0x3c
>   golang.zx2c4.com/wireguard/conn.(*StdNetBind).makeReceiveIPv4.func1()
>       /home/lsc36/wireguard-go/conn/bind_std.go:232 +0x350
>   golang.zx2c4.com/wireguard/device.(*Device).RoutineReceiveIncoming()
>       /home/lsc36/wireguard-go/device/receive.go:107 +0x3ac
>   golang.zx2c4.com/wireguard/device.(*Device).BindUpdate.func3()
>       /home/lsc36/wireguard-go/device/device.go:530 +0x4c
>
> Goroutine 49 (running) created at:
>   golang.zx2c4.com/wireguard/device.(*Peer).Start()
>       /home/lsc36/wireguard-go/device/peer.go:198 +0x300
>   golang.zx2c4.com/wireguard/device.(*ipcSetPeer).handlePostConfig()
>       /home/lsc36/wireguard-go/device/uapi.go:268 +0x164
>   golang.zx2c4.com/wireguard/device.(*Device).IpcSetOperation()
>       /home/lsc36/wireguard-go/device/uapi.go:160 +0x35c
>   golang.zx2c4.com/wireguard/device.(*Device).IpcHandle()
>       /home/lsc36/wireguard-go/device/uapi.go:428 +0x248
>   main.main.func4.1()
>       /home/lsc36/wireguard-go/main.go:245 +0x4c
>
> Goroutine 47 (running) created at:
>   golang.zx2c4.com/wireguard/device.(*Device).BindUpdate()
>       /home/lsc36/wireguard-go/device/device.go:530 +0x4b8
>   golang.zx2c4.com/wireguard/device.(*Device).handleDeviceLine()
>       /home/lsc36/wireguard-go/device/uapi.go:223 +0x45c
>   golang.zx2c4.com/wireguard/device.(*Device).IpcSetOperation()
>       /home/lsc36/wireguard-go/device/uapi.go:183 +0x210
>   golang.zx2c4.com/wireguard/device.(*Device).IpcHandle()
>       /home/lsc36/wireguard-go/device/uapi.go:428 +0x248
>   main.main.func4.1()
>       /home/lsc36/wireguard-go/main.go:245 +0x4c
> ==================
>
> The bug looks introduced by a recent commit 9e2f386 "conn, device, tun:
> implement vectorized I/O on Linux". Without much knowledge of the
> codebase, it seems to me that Endpoint is supposed to be guarded by the
> RWMutex embedded in Peer, yet the change introduced a write without
> locking the mutex (`ep.ClearSrc()` called from sticky_linux.go:18).
>
> In addition, I found it weird that Endpoint objects are reused via
> `endpointPool`. The docstring suggests "Endpoints are immutable, so we
> can re-use them", but that doesn't seem to be true.
>
> Added authors/reviewers of 9e2f386 to cc. Would you take a look?
>
> Regards,
> SC
>

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: wireguard-go: data race introduced in recent commit 9e2f386
  2023-03-23 17:00 ` Jordan Whited
@ 2023-03-24  3:21   ` lsc
  2023-03-24 13:44     ` Jason A. Donenfeld
  0 siblings, 1 reply; 5+ messages in thread

From: lsc @ 2023-03-24  3:21 UTC (permalink / raw)
  To: Jordan Whited; +Cc: wireguard, james, Jason

Sure, here's the trace:

--
DEBUG: (wg1) 2023/03/22 23:46:39 peer(Suxf…6dEI) - Routine: sequential sender - stopped
panic: As4 called on IP zero value

goroutine 70 [running]:
net/netip.Addr.As4({{0xc0048b1200?, 0xc00058bb28?}, 0x0?})
	/opt/go1.20.2.linux-arm64/src/net/netip/netip.go:690 +0x174
golang.zx2c4.com/wireguard/conn.setSrcControl(0xc00931b018, 0xc006301640)
	/home/lsc36/wireguard-go/conn/sticky_linux.go:95 +0x280
golang.zx2c4.com/wireguard/conn.(*StdNetBind).send4(0xc0003b6000, 0xffffa9d04200?, 0xc0000b8230, {0x24e6e8?, 0xc006301640?}, {0xc000533800, 0x1, 0xba250?})
	/home/lsc36/wireguard-go/conn/bind_std.go:346 +0x178
golang.zx2c4.com/wireguard/conn.(*StdNetBind).Send(0xc0003b6000, {0xc000533800, 0x1, 0x80}, {0x24e6e8, 0xc006301640})
	/home/lsc36/wireguard-go/conn/bind_std.go:332 +0x1d8
golang.zx2c4.com/wireguard/device.(*Peer).SendBuffers(0xc0000be380, {0xc000533800?, 0x1, 0x80})
	/home/lsc36/wireguard-go/device/peer.go:126 +0x190
golang.zx2c4.com/wireguard/device.(*Peer).RoutineSequentialSender(0xc0000be380, 0x80)
	/home/lsc36/wireguard-go/device/send.go:519 +0x45c
created by golang.zx2c4.com/wireguard/device.(*Peer).Start
	/home/lsc36/wireguard-go/device/peer.go:198 +0x304
--

The offending call `ep.SrcIP().As4()` (sticky_linux.go:95) is only
reached when `ep.SrcIP().Is4() && ep.SrcIP().IsValid()` is true, yet it
panicked saying it was called on a zero value. That's why I suspected a
data race in the first place.

On 3/24/23 01:00, Jordan Whited wrote:
> SC, thank you for reporting. Do you happen to have a stack trace from
> one of the panics?
>
> On Thu, Mar 23, 2023 at 2:01 AM lsc <lsc@lv6.tw> wrote:
>>
>> Hi,
>>
>> Not sure if this is the correct place for bug reports, hope it's okay :)
>>
>> I found my wireguard-go server (built from current ToT 1417a47) always
>> panicking when I ran a speedtest on a client via the server. I rebuilt the
>> server with the data race detector (`go build -race`) and got the following
>> trace:
>>
>> ==================
>> WARNING: DATA RACE
>> Read at 0x00c00043b2f8 by goroutine 49:
>>   golang.zx2c4.com/wireguard/conn.setSrcControl()
>>       /home/lsc36/wireguard-go/conn/sticky_linux.go:76 +0x8c
>>   golang.zx2c4.com/wireguard/conn.(*StdNetBind).send4()
>>       /home/lsc36/wireguard-go/conn/bind_std.go:346 +0x174
>>   golang.zx2c4.com/wireguard/conn.(*StdNetBind).Send()
>>       /home/lsc36/wireguard-go/conn/bind_std.go:332 +0x1d4
>>   golang.zx2c4.com/wireguard/device.(*Peer).SendBuffers()
>>       /home/lsc36/wireguard-go/device/peer.go:126 +0x18c
>>   golang.zx2c4.com/wireguard/device.(*Peer).RoutineSequentialSender()
>>       /home/lsc36/wireguard-go/device/send.go:519 +0x458
>>   golang.zx2c4.com/wireguard/device.(*Peer).Start.func2()
>>       /home/lsc36/wireguard-go/device/peer.go:198 +0x40
>>
>> Previous write at 0x00c00043b2f8 by goroutine 47:
>>   golang.zx2c4.com/wireguard/conn.(*StdNetEndpoint).ClearSrc()
>>       /home/lsc36/wireguard-go/conn/bind_std.go:100 +0x48
>>   golang.zx2c4.com/wireguard/conn.getSrcFromControl()
>>       /home/lsc36/wireguard-go/conn/sticky_linux.go:18 +0x3c
>>   golang.zx2c4.com/wireguard/conn.(*StdNetBind).makeReceiveIPv4.func1()
>>       /home/lsc36/wireguard-go/conn/bind_std.go:232 +0x350
>>   golang.zx2c4.com/wireguard/device.(*Device).RoutineReceiveIncoming()
>>       /home/lsc36/wireguard-go/device/receive.go:107 +0x3ac
>>   golang.zx2c4.com/wireguard/device.(*Device).BindUpdate.func3()
>>       /home/lsc36/wireguard-go/device/device.go:530 +0x4c
>>
>> Goroutine 49 (running) created at:
>>   golang.zx2c4.com/wireguard/device.(*Peer).Start()
>>       /home/lsc36/wireguard-go/device/peer.go:198 +0x300
>>   golang.zx2c4.com/wireguard/device.(*ipcSetPeer).handlePostConfig()
>>       /home/lsc36/wireguard-go/device/uapi.go:268 +0x164
>>   golang.zx2c4.com/wireguard/device.(*Device).IpcSetOperation()
>>       /home/lsc36/wireguard-go/device/uapi.go:160 +0x35c
>>   golang.zx2c4.com/wireguard/device.(*Device).IpcHandle()
>>       /home/lsc36/wireguard-go/device/uapi.go:428 +0x248
>>   main.main.func4.1()
>>       /home/lsc36/wireguard-go/main.go:245 +0x4c
>>
>> Goroutine 47 (running) created at:
>>   golang.zx2c4.com/wireguard/device.(*Device).BindUpdate()
>>       /home/lsc36/wireguard-go/device/device.go:530 +0x4b8
>>   golang.zx2c4.com/wireguard/device.(*Device).handleDeviceLine()
>>       /home/lsc36/wireguard-go/device/uapi.go:223 +0x45c
>>   golang.zx2c4.com/wireguard/device.(*Device).IpcSetOperation()
>>       /home/lsc36/wireguard-go/device/uapi.go:183 +0x210
>>   golang.zx2c4.com/wireguard/device.(*Device).IpcHandle()
>>       /home/lsc36/wireguard-go/device/uapi.go:428 +0x248
>>   main.main.func4.1()
>>       /home/lsc36/wireguard-go/main.go:245 +0x4c
>> ==================
>>
>> The bug looks introduced by a recent commit 9e2f386 "conn, device, tun:
>> implement vectorized I/O on Linux". Without much knowledge of the
>> codebase, it seems to me that Endpoint is supposed to be guarded by the
>> RWMutex embedded in Peer, yet the change introduced a write without
>> locking the mutex (`ep.ClearSrc()` called from sticky_linux.go:18).
>>
>> In addition, I found it weird that Endpoint objects are reused via
>> `endpointPool`. The docstring suggests "Endpoints are immutable, so we
>> can re-use them", but that doesn't seem to be true.
>>
>> Added authors/reviewers of 9e2f386 to cc. Would you take a look?
>>
>> Regards,
>> SC
>>

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: wireguard-go: data race introduced in recent commit 9e2f386
  2023-03-24  3:21 ` lsc
@ 2023-03-24 13:44   ` Jason A. Donenfeld
  2023-03-24 13:45     ` [PATCH wireguard-go] conn: fix StdNetEndpoint data race by dynamically allocating endpoints Jason A. Donenfeld
  0 siblings, 1 reply; 5+ messages in thread

From: Jason A. Donenfeld @ 2023-03-24 13:44 UTC (permalink / raw)
  To: lsc; +Cc: Jordan Whited, wireguard, james

Thanks for the panic trace. That indeed looks like the same issue as the
race. Jordan has a stop-gap patch prepared, which I'll send to the list.

Jason

^ permalink raw reply	[flat|nested] 5+ messages in thread
* [PATCH wireguard-go] conn: fix StdNetEndpoint data race by dynamically allocating endpoints
  2023-03-24 13:44 ` Jason A. Donenfeld
@ 2023-03-24 13:45   ` Jason A. Donenfeld
  0 siblings, 0 replies; 5+ messages in thread

From: Jason A. Donenfeld @ 2023-03-24 13:45 UTC (permalink / raw)
  To: wireguard
  Cc: Jordan Whited, lsc, Josh Bleecher Snyder, James Tucker, Jason A. Donenfeld

From: Jordan Whited <jordan@tailscale.com>

In 9e2f386 ("conn, device, tun: implement vectorized I/O on Linux"), the
Linux-specific Bind implementation was collapsed into StdNetBind. This
introduced a race on StdNetEndpoint from getSrcFromControl() and
setSrcControl().

Remove the sync.Pool involved in the race, and simplify StdNetBind's
receive path to allocate StdNetEndpoint on the heap instead, with the
intent for it to be cleaned up by the GC, later. This essentially
reverts ef5c587 ("conn: remove the final alloc per packet receive"),
adding back that allocation, unfortunately. This does slightly increase
resident memory usage in higher throughput scenarios. StdNetBind is the
only Bind implementation that was using this Endpoint recycling
technique prior to this commit.

This is considered a stop-gap solution, and there are plans to replace
the allocation with a better mechanism.

Reported-by: lsc <lsc@lv6.tw>
Link: https://lore.kernel.org/wireguard/ac87f86f-6837-4e0e-ec34-1df35f52540e@lv6.tw/
Fixes: 9e2f386 ("conn, device, tun: implement vectorized I/O on Linux")
Cc: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: James Tucker <james@tailscale.com>
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 conn/bind_std.go | 32 ++++++++------------------------
 1 file changed, 8 insertions(+), 24 deletions(-)

diff --git a/conn/bind_std.go b/conn/bind_std.go
index e8cf7b8..491e571 100644
--- a/conn/bind_std.go
+++ b/conn/bind_std.go
@@ -93,7 +93,12 @@ var (
 
 func (*StdNetBind) ParseEndpoint(s string) (Endpoint, error) {
 	e, err := netip.ParseAddrPort(s)
-	return asEndpoint(e), err
+	if err != nil {
+		return nil, err
+	}
+	return &StdNetEndpoint{
+		AddrPort: e,
+	}, nil
 }
 
 func (e *StdNetEndpoint) ClearSrc() {
@@ -228,7 +233,7 @@ func (s *StdNetBind) makeReceiveIPv4(pc *ipv4.PacketConn, conn *net.UDPConn) Rec
 			msg := &(*msgs)[i]
 			sizes[i] = msg.N
 			addrPort := msg.Addr.(*net.UDPAddr).AddrPort()
-			ep := asEndpoint(addrPort)
+			ep := &StdNetEndpoint{AddrPort: addrPort} // TODO: remove allocation
 			getSrcFromControl(msg.OOB[:msg.NN], ep)
 			eps[i] = ep
 		}
@@ -261,7 +266,7 @@ func (s *StdNetBind) makeReceiveIPv6(pc *ipv6.PacketConn, conn *net.UDPConn) Rec
 			msg := &(*msgs)[i]
 			sizes[i] = msg.N
 			addrPort := msg.Addr.(*net.UDPAddr).AddrPort()
-			ep := asEndpoint(addrPort)
+			ep := &StdNetEndpoint{AddrPort: addrPort} // TODO: remove allocation
 			getSrcFromControl(msg.OOB[:msg.NN], ep)
 			eps[i] = ep
 		}
@@ -408,24 +413,3 @@ func (s *StdNetBind) send6(conn *net.UDPConn, pc *ipv6.PacketConn, ep Endpoint,
 	s.ipv6MsgsPool.Put(msgs)
 	return err
 }
-
-// endpointPool contains a re-usable set of mapping from netip.AddrPort to Endpoint.
-// This exists to reduce allocations: Putting a netip.AddrPort in an Endpoint allocates,
-// but Endpoints are immutable, so we can re-use them.
-var endpointPool = sync.Pool{
-	New: func() any {
-		return make(map[netip.AddrPort]*StdNetEndpoint)
-	},
-}
-
-// asEndpoint returns an Endpoint containing ap.
-func asEndpoint(ap netip.AddrPort) *StdNetEndpoint {
-	m := endpointPool.Get().(map[netip.AddrPort]*StdNetEndpoint)
-	defer endpointPool.Put(m)
-	e, ok := m[ap]
-	if !ok {
-		e = &StdNetEndpoint{AddrPort: ap}
-		m[ap] = e
-	}
-	return e
-}
-- 
2.40.0

^ permalink raw reply	[flat|nested] 5+ messages in thread
end of thread, other threads:[~2023-03-24 13:47 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-23  9:01 wireguard-go: data race introduced in recent commit 9e2f386 lsc
2023-03-23 17:00 ` Jordan Whited
2023-03-24  3:21   ` lsc
2023-03-24 13:44     ` Jason A. Donenfeld
2023-03-24 13:45       ` [PATCH wireguard-go] conn: fix StdNetEndpoint data race by dynamically allocating endpoints Jason A. Donenfeld