* wireguard-go: data race introduced in recent commit 9e2f386
@ 2023-03-23 9:01 lsc
2023-03-23 17:00 ` Jordan Whited
0 siblings, 1 reply; 5+ messages in thread
From: lsc @ 2023-03-23 9:01 UTC (permalink / raw)
To: wireguard; +Cc: jordan, james, Jason
Hi,
Not sure if this is the correct place for bug reports, hope it's okay :)
I found my wireguard-go server (built from current ToT 1417a47) always
panicking when I ran speedtest on a client via the server. I rebuilt the
server with data race detector (`go build -race`) and got the following
trace:
==================
WARNING: DATA RACE
Read at 0x00c00043b2f8 by goroutine 49:
golang.zx2c4.com/wireguard/conn.setSrcControl()
/home/lsc36/wireguard-go/conn/sticky_linux.go:76 +0x8c
golang.zx2c4.com/wireguard/conn.(*StdNetBind).send4()
/home/lsc36/wireguard-go/conn/bind_std.go:346 +0x174
golang.zx2c4.com/wireguard/conn.(*StdNetBind).Send()
/home/lsc36/wireguard-go/conn/bind_std.go:332 +0x1d4
golang.zx2c4.com/wireguard/device.(*Peer).SendBuffers()
/home/lsc36/wireguard-go/device/peer.go:126 +0x18c
golang.zx2c4.com/wireguard/device.(*Peer).RoutineSequentialSender()
/home/lsc36/wireguard-go/device/send.go:519 +0x458
golang.zx2c4.com/wireguard/device.(*Peer).Start.func2()
/home/lsc36/wireguard-go/device/peer.go:198 +0x40
Previous write at 0x00c00043b2f8 by goroutine 47:
golang.zx2c4.com/wireguard/conn.(*StdNetEndpoint).ClearSrc()
/home/lsc36/wireguard-go/conn/bind_std.go:100 +0x48
golang.zx2c4.com/wireguard/conn.getSrcFromControl()
/home/lsc36/wireguard-go/conn/sticky_linux.go:18 +0x3c
golang.zx2c4.com/wireguard/conn.(*StdNetBind).makeReceiveIPv4.func1()
/home/lsc36/wireguard-go/conn/bind_std.go:232 +0x350
golang.zx2c4.com/wireguard/device.(*Device).RoutineReceiveIncoming()
/home/lsc36/wireguard-go/device/receive.go:107 +0x3ac
golang.zx2c4.com/wireguard/device.(*Device).BindUpdate.func3()
/home/lsc36/wireguard-go/device/device.go:530 +0x4c
Goroutine 49 (running) created at:
golang.zx2c4.com/wireguard/device.(*Peer).Start()
/home/lsc36/wireguard-go/device/peer.go:198 +0x300
golang.zx2c4.com/wireguard/device.(*ipcSetPeer).handlePostConfig()
/home/lsc36/wireguard-go/device/uapi.go:268 +0x164
golang.zx2c4.com/wireguard/device.(*Device).IpcSetOperation()
/home/lsc36/wireguard-go/device/uapi.go:160 +0x35c
golang.zx2c4.com/wireguard/device.(*Device).IpcHandle()
/home/lsc36/wireguard-go/device/uapi.go:428 +0x248
main.main.func4.1()
/home/lsc36/wireguard-go/main.go:245 +0x4c
Goroutine 47 (running) created at:
golang.zx2c4.com/wireguard/device.(*Device).BindUpdate()
/home/lsc36/wireguard-go/device/device.go:530 +0x4b8
golang.zx2c4.com/wireguard/device.(*Device).handleDeviceLine()
/home/lsc36/wireguard-go/device/uapi.go:223 +0x45c
golang.zx2c4.com/wireguard/device.(*Device).IpcSetOperation()
/home/lsc36/wireguard-go/device/uapi.go:183 +0x210
golang.zx2c4.com/wireguard/device.(*Device).IpcHandle()
/home/lsc36/wireguard-go/device/uapi.go:428 +0x248
main.main.func4.1()
/home/lsc36/wireguard-go/main.go:245 +0x4c
==================
The bug looks like it was introduced by the recent commit 9e2f386 ("conn,
device, tun: implement vectorized I/O on Linux"). Without much knowledge
of the codebase, it seems to me that Endpoint is supposed to be guarded
by the RWMutex embedded in Peer, yet the change introduced a write that
does not lock the mutex (`ep.ClearSrc()` called from sticky_linux.go:18).
In addition, I found it odd that Endpoint objects are reused via
`endpointPool`. The docstring claims "Endpoints are immutable, so we
can re-use them", but that no longer seems to be true.
Added authors/reviewers of 9e2f386 to cc. Would you take a look?
Regards,
SC
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: wireguard-go: data race introduced in recent commit 9e2f386
2023-03-23 9:01 wireguard-go: data race introduced in recent commit 9e2f386 lsc
@ 2023-03-23 17:00 ` Jordan Whited
2023-03-24 3:21 ` lsc
0 siblings, 1 reply; 5+ messages in thread
From: Jordan Whited @ 2023-03-23 17:00 UTC (permalink / raw)
To: lsc; +Cc: wireguard, james, Jason
SC, thank you for reporting. Do you happen to have a stack trace from
one of the panics?
On Thu, Mar 23, 2023 at 2:01 AM lsc <lsc@lv6.tw> wrote:
>
> Hi,
>
> Not sure if this is the correct place for bug reports, hope it's okay :)
>
> I found my wireguard-go server (built from current ToT 1417a47) always
> panicking when I ran speedtest on a client via the server. I rebuilt the
> server with data race detector (`go build -race`) and got the following
> trace:
>
> ==================
> WARNING: DATA RACE
> Read at 0x00c00043b2f8 by goroutine 49:
> golang.zx2c4.com/wireguard/conn.setSrcControl()
> /home/lsc36/wireguard-go/conn/sticky_linux.go:76 +0x8c
> golang.zx2c4.com/wireguard/conn.(*StdNetBind).send4()
> /home/lsc36/wireguard-go/conn/bind_std.go:346 +0x174
> golang.zx2c4.com/wireguard/conn.(*StdNetBind).Send()
> /home/lsc36/wireguard-go/conn/bind_std.go:332 +0x1d4
> golang.zx2c4.com/wireguard/device.(*Peer).SendBuffers()
> /home/lsc36/wireguard-go/device/peer.go:126 +0x18c
> golang.zx2c4.com/wireguard/device.(*Peer).RoutineSequentialSender()
> /home/lsc36/wireguard-go/device/send.go:519 +0x458
> golang.zx2c4.com/wireguard/device.(*Peer).Start.func2()
> /home/lsc36/wireguard-go/device/peer.go:198 +0x40
>
> Previous write at 0x00c00043b2f8 by goroutine 47:
> golang.zx2c4.com/wireguard/conn.(*StdNetEndpoint).ClearSrc()
> /home/lsc36/wireguard-go/conn/bind_std.go:100 +0x48
> golang.zx2c4.com/wireguard/conn.getSrcFromControl()
> /home/lsc36/wireguard-go/conn/sticky_linux.go:18 +0x3c
> golang.zx2c4.com/wireguard/conn.(*StdNetBind).makeReceiveIPv4.func1()
> /home/lsc36/wireguard-go/conn/bind_std.go:232 +0x350
> golang.zx2c4.com/wireguard/device.(*Device).RoutineReceiveIncoming()
> /home/lsc36/wireguard-go/device/receive.go:107 +0x3ac
> golang.zx2c4.com/wireguard/device.(*Device).BindUpdate.func3()
> /home/lsc36/wireguard-go/device/device.go:530 +0x4c
>
> Goroutine 49 (running) created at:
> golang.zx2c4.com/wireguard/device.(*Peer).Start()
> /home/lsc36/wireguard-go/device/peer.go:198 +0x300
> golang.zx2c4.com/wireguard/device.(*ipcSetPeer).handlePostConfig()
> /home/lsc36/wireguard-go/device/uapi.go:268 +0x164
> golang.zx2c4.com/wireguard/device.(*Device).IpcSetOperation()
> /home/lsc36/wireguard-go/device/uapi.go:160 +0x35c
> golang.zx2c4.com/wireguard/device.(*Device).IpcHandle()
> /home/lsc36/wireguard-go/device/uapi.go:428 +0x248
> main.main.func4.1()
> /home/lsc36/wireguard-go/main.go:245 +0x4c
>
> Goroutine 47 (running) created at:
> golang.zx2c4.com/wireguard/device.(*Device).BindUpdate()
> /home/lsc36/wireguard-go/device/device.go:530 +0x4b8
> golang.zx2c4.com/wireguard/device.(*Device).handleDeviceLine()
> /home/lsc36/wireguard-go/device/uapi.go:223 +0x45c
> golang.zx2c4.com/wireguard/device.(*Device).IpcSetOperation()
> /home/lsc36/wireguard-go/device/uapi.go:183 +0x210
> golang.zx2c4.com/wireguard/device.(*Device).IpcHandle()
> /home/lsc36/wireguard-go/device/uapi.go:428 +0x248
> main.main.func4.1()
> /home/lsc36/wireguard-go/main.go:245 +0x4c
> ==================
>
> The bug looks like it was introduced by the recent commit 9e2f386 ("conn,
> device, tun: implement vectorized I/O on Linux"). Without much knowledge
> of the codebase, it seems to me that Endpoint is supposed to be guarded
> by the RWMutex embedded in Peer, yet the change introduced a write that
> does not lock the mutex (`ep.ClearSrc()` called from sticky_linux.go:18).
>
> In addition, I found it odd that Endpoint objects are reused via
> `endpointPool`. The docstring claims "Endpoints are immutable, so we
> can re-use them", but that no longer seems to be true.
>
> Added authors/reviewers of 9e2f386 to cc. Would you take a look?
>
> Regards,
> SC
>
* Re: wireguard-go: data race introduced in recent commit 9e2f386
2023-03-23 17:00 ` Jordan Whited
@ 2023-03-24 3:21 ` lsc
2023-03-24 13:44 ` Jason A. Donenfeld
0 siblings, 1 reply; 5+ messages in thread
From: lsc @ 2023-03-24 3:21 UTC (permalink / raw)
To: Jordan Whited; +Cc: wireguard, james, Jason
Sure, here's the trace:
--
DEBUG: (wg1) 2023/03/22 23:46:39 peer(Suxf…6dEI) - Routine: sequential sender - stopped
panic: As4 called on IP zero value
goroutine 70 [running]:
net/netip.Addr.As4({{0xc0048b1200?, 0xc00058bb28?}, 0x0?})
/opt/go1.20.2.linux-arm64/src/net/netip/netip.go:690 +0x174
golang.zx2c4.com/wireguard/conn.setSrcControl(0xc00931b018, 0xc006301640)
/home/lsc36/wireguard-go/conn/sticky_linux.go:95 +0x280
golang.zx2c4.com/wireguard/conn.(*StdNetBind).send4(0xc0003b6000, 0xffffa9d04200?, 0xc0000b8230, {0x24e6e8?, 0xc006301640?}, {0xc000533800, 0x1, 0xba250?})
/home/lsc36/wireguard-go/conn/bind_std.go:346 +0x178
golang.zx2c4.com/wireguard/conn.(*StdNetBind).Send(0xc0003b6000, {0xc000533800, 0x1, 0x80}, {0x24e6e8, 0xc006301640})
/home/lsc36/wireguard-go/conn/bind_std.go:332 +0x1d8
golang.zx2c4.com/wireguard/device.(*Peer).SendBuffers(0xc0000be380, {0xc000533800?, 0x1, 0x80})
/home/lsc36/wireguard-go/device/peer.go:126 +0x190
golang.zx2c4.com/wireguard/device.(*Peer).RoutineSequentialSender(0xc0000be380, 0x80)
/home/lsc36/wireguard-go/device/send.go:519 +0x45c
created by golang.zx2c4.com/wireguard/device.(*Peer).Start
/home/lsc36/wireguard-go/device/peer.go:198 +0x304
--
The offending call `ep.SrcIP().As4()` (sticky_linux.go:95) is only
reached when `ep.SrcIP().Is4() && ep.SrcIP().IsValid()` is true, yet it
panicked saying it was called on a zero value. That is why I suspected
a data race in the first place.
On 3/24/23 01:00, Jordan Whited wrote:
> SC, thank you for reporting. Do you happen to have a stack trace from
> one of the panics?
>
> On Thu, Mar 23, 2023 at 2:01 AM lsc <lsc@lv6.tw> wrote:
>>
>> Hi,
>>
>> Not sure if this is the correct place for bug reports, hope it's okay :)
>>
>> I found my wireguard-go server (built from current ToT 1417a47) always
>> panicking when I ran speedtest on a client via the server. I rebuilt the
>> server with data race detector (`go build -race`) and got the following
>> trace:
>>
>> ==================
>> WARNING: DATA RACE
>> Read at 0x00c00043b2f8 by goroutine 49:
>> golang.zx2c4.com/wireguard/conn.setSrcControl()
>> /home/lsc36/wireguard-go/conn/sticky_linux.go:76 +0x8c
>> golang.zx2c4.com/wireguard/conn.(*StdNetBind).send4()
>> /home/lsc36/wireguard-go/conn/bind_std.go:346 +0x174
>> golang.zx2c4.com/wireguard/conn.(*StdNetBind).Send()
>> /home/lsc36/wireguard-go/conn/bind_std.go:332 +0x1d4
>> golang.zx2c4.com/wireguard/device.(*Peer).SendBuffers()
>> /home/lsc36/wireguard-go/device/peer.go:126 +0x18c
>> golang.zx2c4.com/wireguard/device.(*Peer).RoutineSequentialSender()
>> /home/lsc36/wireguard-go/device/send.go:519 +0x458
>> golang.zx2c4.com/wireguard/device.(*Peer).Start.func2()
>> /home/lsc36/wireguard-go/device/peer.go:198 +0x40
>>
>> Previous write at 0x00c00043b2f8 by goroutine 47:
>> golang.zx2c4.com/wireguard/conn.(*StdNetEndpoint).ClearSrc()
>> /home/lsc36/wireguard-go/conn/bind_std.go:100 +0x48
>> golang.zx2c4.com/wireguard/conn.getSrcFromControl()
>> /home/lsc36/wireguard-go/conn/sticky_linux.go:18 +0x3c
>> golang.zx2c4.com/wireguard/conn.(*StdNetBind).makeReceiveIPv4.func1()
>> /home/lsc36/wireguard-go/conn/bind_std.go:232 +0x350
>> golang.zx2c4.com/wireguard/device.(*Device).RoutineReceiveIncoming()
>> /home/lsc36/wireguard-go/device/receive.go:107 +0x3ac
>> golang.zx2c4.com/wireguard/device.(*Device).BindUpdate.func3()
>> /home/lsc36/wireguard-go/device/device.go:530 +0x4c
>>
>> Goroutine 49 (running) created at:
>> golang.zx2c4.com/wireguard/device.(*Peer).Start()
>> /home/lsc36/wireguard-go/device/peer.go:198 +0x300
>> golang.zx2c4.com/wireguard/device.(*ipcSetPeer).handlePostConfig()
>> /home/lsc36/wireguard-go/device/uapi.go:268 +0x164
>> golang.zx2c4.com/wireguard/device.(*Device).IpcSetOperation()
>> /home/lsc36/wireguard-go/device/uapi.go:160 +0x35c
>> golang.zx2c4.com/wireguard/device.(*Device).IpcHandle()
>> /home/lsc36/wireguard-go/device/uapi.go:428 +0x248
>> main.main.func4.1()
>> /home/lsc36/wireguard-go/main.go:245 +0x4c
>>
>> Goroutine 47 (running) created at:
>> golang.zx2c4.com/wireguard/device.(*Device).BindUpdate()
>> /home/lsc36/wireguard-go/device/device.go:530 +0x4b8
>> golang.zx2c4.com/wireguard/device.(*Device).handleDeviceLine()
>> /home/lsc36/wireguard-go/device/uapi.go:223 +0x45c
>> golang.zx2c4.com/wireguard/device.(*Device).IpcSetOperation()
>> /home/lsc36/wireguard-go/device/uapi.go:183 +0x210
>> golang.zx2c4.com/wireguard/device.(*Device).IpcHandle()
>> /home/lsc36/wireguard-go/device/uapi.go:428 +0x248
>> main.main.func4.1()
>> /home/lsc36/wireguard-go/main.go:245 +0x4c
>> ==================
>>
>> The bug looks introduced by a recent commit 9e2f386 "conn, device, tun:
>> implement vectorized I/O on Linux". Without much knowledge to the
>> codebase, it seems to me that Endpoint is supposed to be guarded by the
>> RWMutex embedded in Peer, yet the change introduced a write without
>> locking the mutex (`ep.ClearSrc()` called from sticky_linux.go:18).
>>
>> In addition, I found it weird that Endpoint objects are reused via
>> `endpointPool`. The docstring suggests "Endpoints are immutable, so we
>> can re-use them" but that doesn't seem true.
>>
>> Added authors/reviewers of 9e2f386 to cc. Would you take a look?
>>
>> Regards,
>> SC
>>
* Re: wireguard-go: data race introduced in recent commit 9e2f386
2023-03-24 3:21 ` lsc
@ 2023-03-24 13:44 ` Jason A. Donenfeld
2023-03-24 13:45 ` [PATCH wireguard-go] conn: fix StdNetEndpoint data race by dynamically allocating endpoints Jason A. Donenfeld
0 siblings, 1 reply; 5+ messages in thread
From: Jason A. Donenfeld @ 2023-03-24 13:44 UTC (permalink / raw)
To: lsc; +Cc: Jordan Whited, wireguard, james
Thanks for the panic trace. That indeed looks like the same issue as
the race. Jordan has a stop-gap patch prepared, which I'll send to the
list.
Jason
* [PATCH wireguard-go] conn: fix StdNetEndpoint data race by dynamically allocating endpoints
2023-03-24 13:44 ` Jason A. Donenfeld
@ 2023-03-24 13:45 ` Jason A. Donenfeld
0 siblings, 0 replies; 5+ messages in thread
From: Jason A. Donenfeld @ 2023-03-24 13:45 UTC (permalink / raw)
To: wireguard
Cc: Jordan Whited, lsc, Josh Bleecher Snyder, James Tucker,
Jason A . Donenfeld
From: Jordan Whited <jordan@tailscale.com>
In 9e2f386 ("conn, device, tun: implement vectorized I/O on Linux"), the
Linux-specific Bind implementation was collapsed into StdNetBind. This
introduced a race on StdNetEndpoint from getSrcFromControl() and
setSrcControl().
Remove the sync.Pool involved in the race, and simplify StdNetBind's
receive path to allocate StdNetEndpoint on the heap instead, with the
intent for it to be cleaned up by the GC later. This essentially
reverts ef5c587 ("conn: remove the final alloc per packet receive"),
adding back that allocation, unfortunately.
This does slightly increase resident memory usage in higher throughput
scenarios. StdNetBind is the only Bind implementation that was using
this Endpoint recycling technique prior to this commit.
This is considered a stop-gap solution, and there are plans to replace
the allocation with a better mechanism.
Reported-by: lsc <lsc@lv6.tw>
Link: https://lore.kernel.org/wireguard/ac87f86f-6837-4e0e-ec34-1df35f52540e@lv6.tw/
Fixes: 9e2f386 ("conn, device, tun: implement vectorized I/O on Linux")
Cc: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: James Tucker <james@tailscale.com>
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
conn/bind_std.go | 32 ++++++++------------------------
1 file changed, 8 insertions(+), 24 deletions(-)
diff --git a/conn/bind_std.go b/conn/bind_std.go
index e8cf7b8..491e571 100644
--- a/conn/bind_std.go
+++ b/conn/bind_std.go
@@ -93,7 +93,12 @@ var (
func (*StdNetBind) ParseEndpoint(s string) (Endpoint, error) {
e, err := netip.ParseAddrPort(s)
- return asEndpoint(e), err
+ if err != nil {
+ return nil, err
+ }
+ return &StdNetEndpoint{
+ AddrPort: e,
+ }, nil
}
func (e *StdNetEndpoint) ClearSrc() {
@@ -228,7 +233,7 @@ func (s *StdNetBind) makeReceiveIPv4(pc *ipv4.PacketConn, conn *net.UDPConn) Rec
msg := &(*msgs)[i]
sizes[i] = msg.N
addrPort := msg.Addr.(*net.UDPAddr).AddrPort()
- ep := asEndpoint(addrPort)
+ ep := &StdNetEndpoint{AddrPort: addrPort} // TODO: remove allocation
getSrcFromControl(msg.OOB[:msg.NN], ep)
eps[i] = ep
}
@@ -261,7 +266,7 @@ func (s *StdNetBind) makeReceiveIPv6(pc *ipv6.PacketConn, conn *net.UDPConn) Rec
msg := &(*msgs)[i]
sizes[i] = msg.N
addrPort := msg.Addr.(*net.UDPAddr).AddrPort()
- ep := asEndpoint(addrPort)
+ ep := &StdNetEndpoint{AddrPort: addrPort} // TODO: remove allocation
getSrcFromControl(msg.OOB[:msg.NN], ep)
eps[i] = ep
}
@@ -408,24 +413,3 @@ func (s *StdNetBind) send6(conn *net.UDPConn, pc *ipv6.PacketConn, ep Endpoint,
s.ipv6MsgsPool.Put(msgs)
return err
}
-
-// endpointPool contains a re-usable set of mapping from netip.AddrPort to Endpoint.
-// This exists to reduce allocations: Putting a netip.AddrPort in an Endpoint allocates,
-// but Endpoints are immutable, so we can re-use them.
-var endpointPool = sync.Pool{
- New: func() any {
- return make(map[netip.AddrPort]*StdNetEndpoint)
- },
-}
-
-// asEndpoint returns an Endpoint containing ap.
-func asEndpoint(ap netip.AddrPort) *StdNetEndpoint {
- m := endpointPool.Get().(map[netip.AddrPort]*StdNetEndpoint)
- defer endpointPool.Put(m)
- e, ok := m[ap]
- if !ok {
- e = &StdNetEndpoint{AddrPort: ap}
- m[ap] = e
- }
- return e
-}
--
2.40.0