Development discussion of WireGuard
 help / color / mirror / Atom feed
* Handshake state collision between parralel RoutineHandshake threads
@ 2020-12-22 10:16 Laura Zelenku
  2021-01-08  6:56 ` Laura Zelenku
  2021-02-25 11:23 ` Jason A. Donenfeld
  0 siblings, 2 replies; 6+ messages in thread
From: Laura Zelenku @ 2020-12-22 10:16 UTC (permalink / raw)
  To: wireguard

Hello devs,
I’m getting errors "handshake initiation must be consumed first” caused by handshake state is not set to HandshakeInitiationConsumed when processing handshake initiation from a peer. The handshake state is changed between “ConsumeMessageInitiation” and “CreateMessageResponse” by processing handshake response from the same peer in different thread.
You know, UDP packet can be delayed or lost mainly from mobile devices with worse mobile signal. After some time (handshake timeout) the peer that initiated connection doesn’t send any handshake (handshake is not delivered) to other side, the other side starts to send its own handshake initiation requests. And now there are both sides sending handshakes.

In a log I’ve found “Received handshake response”, “Received handshake initiation” and “Failed to create response message” on the same time (ms).

Proposals:
1) lower log level of "handshake initiation must be consumed first”
2) start only one RoutineHandshake (not based on nr. of CPUs)
3) extend critical section over whole MessageInitiation processing

thanks and I’m looking forward for following discussions
Laura
-- 
*IMPORTANT NOTICE*: This email, its attachments and any rights attaching 
hereto are confidential and intended exclusively for the person to whom the 
email is addressed. If you are not the intended recipient, do not read, 
copy, disclose or use the contents in any way. Wandera accepts no liability 
for any loss, damage or consequence resulting directly or indirectly from 
the use of this email and attachments.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Handshake state collision between parralel RoutineHandshake threads
  2020-12-22 10:16 Handshake state collision between parralel RoutineHandshake threads Laura Zelenku
@ 2021-01-08  6:56 ` Laura Zelenku
  2021-02-25 11:23 ` Jason A. Donenfeld
  1 sibling, 0 replies; 6+ messages in thread
From: Laura Zelenku @ 2021-01-08  6:56 UTC (permalink / raw)
  To: wireguard

Hi there,
we are able to prepare appropriate fix/pull request for this issue. We just need your desision as Wireguard representatives. There are listed 3 proposals in previous message. Do you see some of them feasible?

thank you



-- 
*IMPORTANT NOTICE*: This email, its attachments and any rights attaching 
hereto are confidential and intended exclusively for the person to whom the 
email is addressed. If you are not the intended recipient, do not read, 
copy, disclose or use the contents in any way. Wandera accepts no liability 
for any loss, damage or consequence resulting directly or indirectly from 
the use of this email and attachments.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Handshake state collision between parralel RoutineHandshake threads
  2020-12-22 10:16 Handshake state collision between parralel RoutineHandshake threads Laura Zelenku
  2021-01-08  6:56 ` Laura Zelenku
@ 2021-02-25 11:23 ` Jason A. Donenfeld
  2021-03-01 14:08   ` Laura Zelenku
  1 sibling, 1 reply; 6+ messages in thread
From: Jason A. Donenfeld @ 2021-02-25 11:23 UTC (permalink / raw)
  To: Laura Zelenku; +Cc: WireGuard mailing list

Hi Laura,

I'm not sure this is actually a problem. The latest handshake message
should probably win the race. I don't see state machine or data
corruption here, but just one handshake interrupting another, which is
par for the course with WireGuard.

Or have I overlooked something important in the state machine implementation?

Jason

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Handshake state collision between parralel RoutineHandshake threads
  2021-02-25 11:23 ` Jason A. Donenfeld
@ 2021-03-01 14:08   ` Laura Zelenku
  2021-03-16 14:03     ` Laura Zelenku
  0 siblings, 1 reply; 6+ messages in thread
From: Laura Zelenku @ 2021-03-01 14:08 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: WireGuard mailing list

Hi Jason,
I’ll try to explain the issue.

For incomming hanshake, the `handshake.state` is changing in the following way:
1. set state handshakeInitiationConsumed
2. check the state is handshakeInitiationConsumed otherwise "handshake initiation must be consumed first” error
3. set state handshakeResponseCreated
4. check the state is handshakeResponseCreated, otherwise "invalid state for keypair derivation” error
5. set state handshakeZeroed

For outgoing handshake the `handshake.state` is changing:
1. set state handshakeInitiationCreated
2. <sending handshake and waiting for response>
3. check the state is handshakeInitiationCreated, otherwise skip the packet
4. set state handshakeResponseConsumed
5. check the state is handshakeResponseConsumed, otherwise "invalid state for keypair derivation” error
6. set state handshakeZeroed

Usually only “client” is sending handshake initiations and the “server” responding. But in case some delay (e.g. cause by some network issues mainly for mobile devices) the “server” can start sending handshake initiations (expiredNewHandshake or expiredRetransmitHandshake timers). In this time the client and server are sending hanshake initiations against each other. "go device.RoutineHandshake()” is running in multiple threads. `handshake.state` is defined per peer. Two threads (RoutineHandshake) can process both handshakes (incomming, outgoing) in the same time and these threads are working with shared resource, handshake.state. Because the routine is expecting state that was set before and the second thread can modify the state, the routine can fail on checking the expected handshake.state.
This is happening to us. We are getting error "handshake initiation must be consumed first”. handshakeInitiationConsumed is expected but handshakeZeroed is actually set (set by different thread). The error is logged on error level (Failed to create response message).

Hope this will help to understand the issue well.

Laura


> On 25 Feb 2021, at 12:23, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> 
> Hi Laura,
> 
> I'm not sure this is actually a problem. The latest handshake message
> should probably win the race. I don't see state machine or data
> corruption here, but just one handshake interrupting another, which is
> par for the course with WireGuard.
> 
> Or have I overlooked something important in the state machine implementation?
> 
> Jason


-- 
*IMPORTANT NOTICE*: This email, its attachments and any rights attaching 
hereto are confidential and intended exclusively for the person to whom the 
email is addressed. If you are not the intended recipient, do not read, 
copy, disclose or use the contents in any way. Wandera accepts no liability 
for any loss, damage or consequence resulting directly or indirectly from 
the use of this email and attachments.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Handshake state collision between parralel RoutineHandshake threads
  2021-03-01 14:08   ` Laura Zelenku
@ 2021-03-16 14:03     ` Laura Zelenku
  2021-03-23  9:47       ` Laura Zelenku
  0 siblings, 1 reply; 6+ messages in thread
From: Laura Zelenku @ 2021-03-16 14:03 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: WireGuard mailing list

Still struggling with this issue. Running RoutineHandshake in single instance will help. Remember there is shared resource “peer.handshake.state”. Even the resource is per peer there are two directions (upstream/downstream) that can fight for this resource and write it’s own value.

Index: device/receive.go
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- device/receive.go	(revision 5f0c8b942d93be6ac36a156c0ba44c86c3698f91)
+++ device/receive.go	(date 1615902577604)
@@ -10,6 +10,7 @@
 	"encoding/binary"
 	"errors"
 	"net"
+	"runtime"
 	"sync"
 	"sync/atomic"
 	"time"
@@ -239,7 +240,9 @@
 func (device *Device) RoutineHandshake() {
 	defer func() {
 		device.log.Verbosef("Routine: handshake worker - stopped")
-		device.queue.encryption.wg.Done()
+		for i := 0; i < runtime.NumCPU(); i++ {
+			device.queue.encryption.wg.Done()
+		}
 	}()
 	device.log.Verbosef("Routine: handshake worker - started")
 
Index: device/device.go
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- device/device.go	(revision 5f0c8b942d93be6ac36a156c0ba44c86c3698f91)
+++ device/device.go	(date 1615902577566)
@@ -305,12 +305,12 @@
 
 	cpus := runtime.NumCPU()
 	device.state.stopping.Wait()
-	device.queue.encryption.wg.Add(cpus) // One for each RoutineHandshake
+	device.queue.encryption.wg.Add(cpus)
 	for i := 0; i < cpus; i++ {
 		go device.RoutineEncryption()
 		go device.RoutineDecryption()
-		go device.RoutineHandshake()
-	}
+	}
+	go device.RoutineHandshake()
 
 	device.state.stopping.Add(1)      // RoutineReadFromTUN
 	device.queue.encryption.wg.Add(1) // RoutineReadFromTUN


> On 1. 3. 2021, at 15:08, Laura Zelenku <laura.zelenku@wandera.com> wrote:
> 
> Hi Jason,
> I’ll try to explain the issue.
> 
> For incomming hanshake, the `handshake.state` is changing in the following way:
> 1. set state handshakeInitiationConsumed
> 2. check the state is handshakeInitiationConsumed otherwise "handshake initiation must be consumed first” error
> 3. set state handshakeResponseCreated
> 4. check the state is handshakeResponseCreated, otherwise "invalid state for keypair derivation” error
> 5. set state handshakeZeroed
> 
> For outgoing handshake the `handshake.state` is changing:
> 1. set state handshakeInitiationCreated
> 2. <sending handshake and waiting for response>
> 3. check the state is handshakeInitiationCreated, otherwise skip the packet
> 4. set state handshakeResponseConsumed
> 5. check the state is handshakeResponseConsumed, otherwise "invalid state for keypair derivation” error
> 6. set state handshakeZeroed
> 
> Usually only “client” is sending handshake initiations and the “server” responding. But in case some delay (e.g. cause by some network issues mainly for mobile devices) the “server” can start sending handshake initiations (expiredNewHandshake or expiredRetransmitHandshake timers). In this time the client and server are sending hanshake initiations against each other. "go device.RoutineHandshake()” is running in multiple threads. `handshake.state` is defined per peer. Two threads (RoutineHandshake) can process both handshakes (incomming, outgoing) in the same time and these threads are working with shared resource, handshake.state. Because the routine is expecting state that was set before and the second thread can modify the state, the routine can fail on checking the expected handshake.state.
> This is happening to us. We are getting error "handshake initiation must be consumed first”. handshakeInitiationConsumed is expected but handshakeZeroed is actually set (set by different thread). The error is logged on error level (Failed to create response message).
> 
> Hope this will help to understand the issue well.
> 
> Laura
> 
> 
>> On 25 Feb 2021, at 12:23, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>> 
>> Hi Laura,
>> 
>> I'm not sure this is actually a problem. The latest handshake message
>> should probably win the race. I don't see state machine or data
>> corruption here, but just one handshake interrupting another, which is
>> par for the course with WireGuard.
>> 
>> Or have I overlooked something important in the state machine implementation?
>> 
>> Jason
> 


-- 
*IMPORTANT NOTICE*: This email, its attachments and any rights attaching 
hereto are confidential and intended exclusively for the person to whom the 
email is addressed. If you are not the intended recipient, do not read, 
copy, disclose or use the contents in any way. Wandera accepts no liability 
for any loss, damage or consequence resulting directly or indirectly from 
the use of this email and attachments.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Handshake state collision between parralel RoutineHandshake threads
  2021-03-16 14:03     ` Laura Zelenku
@ 2021-03-23  9:47       ` Laura Zelenku
  0 siblings, 0 replies; 6+ messages in thread
From: Laura Zelenku @ 2021-03-23  9:47 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: WireGuard mailing list

Or can we at least lower log level of "Failed to create response message” if it is not problem? The error is really disturbing and as you wrote one of the handshakes is processed.

> On 16. 3. 2021, at 15:03, Laura Zelenku <laura.zelenku@wandera.com> wrote:
> 
> Still struggling with this issue. Running RoutineHandshake in single instance will help. Remember there is shared resource “peer.handshake.state”. Even the resource is per peer there are two directions (upstream/downstream) that can fight for this resource and write it’s own value.
> 
> Index: device/receive.go
> IDEA additional info:
> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> <+>UTF-8
> ===================================================================
> --- device/receive.go	(revision 5f0c8b942d93be6ac36a156c0ba44c86c3698f91)
> +++ device/receive.go	(date 1615902577604)
> @@ -10,6 +10,7 @@
> 	"encoding/binary"
> 	"errors"
> 	"net"
> +	"runtime"
> 	"sync"
> 	"sync/atomic"
> 	"time"
> @@ -239,7 +240,9 @@
> func (device *Device) RoutineHandshake() {
> 	defer func() {
> 		device.log.Verbosef("Routine: handshake worker - stopped")
> -		device.queue.encryption.wg.Done()
> +		for i := 0; i < runtime.NumCPU(); i++ {
> +			device.queue.encryption.wg.Done()
> +		}
> 	}()
> 	device.log.Verbosef("Routine: handshake worker - started")
> 
> Index: device/device.go
> IDEA additional info:
> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
> <+>UTF-8
> ===================================================================
> --- device/device.go	(revision 5f0c8b942d93be6ac36a156c0ba44c86c3698f91)
> +++ device/device.go	(date 1615902577566)
> @@ -305,12 +305,12 @@
> 
> 	cpus := runtime.NumCPU()
> 	device.state.stopping.Wait()
> -	device.queue.encryption.wg.Add(cpus) // One for each RoutineHandshake
> +	device.queue.encryption.wg.Add(cpus)
> 	for i := 0; i < cpus; i++ {
> 		go device.RoutineEncryption()
> 		go device.RoutineDecryption()
> -		go device.RoutineHandshake()
> -	}
> +	}
> +	go device.RoutineHandshake()
> 
> 	device.state.stopping.Add(1)      // RoutineReadFromTUN
> 	device.queue.encryption.wg.Add(1) // RoutineReadFromTUN
> 
> 
>> On 1. 3. 2021, at 15:08, Laura Zelenku <laura.zelenku@wandera.com> wrote:
>> 
>> Hi Jason,
>> I’ll try to explain the issue.
>> 
>> For incomming hanshake, the `handshake.state` is changing in the following way:
>> 1. set state handshakeInitiationConsumed
>> 2. check the state is handshakeInitiationConsumed otherwise "handshake initiation must be consumed first” error
>> 3. set state handshakeResponseCreated
>> 4. check the state is handshakeResponseCreated, otherwise "invalid state for keypair derivation” error
>> 5. set state handshakeZeroed
>> 
>> For outgoing handshake the `handshake.state` is changing:
>> 1. set state handshakeInitiationCreated
>> 2. <sending handshake and waiting for response>
>> 3. check the state is handshakeInitiationCreated, otherwise skip the packet
>> 4. set state handshakeResponseConsumed
>> 5. check the state is handshakeResponseConsumed, otherwise "invalid state for keypair derivation” error
>> 6. set state handshakeZeroed
>> 
>> Usually only “client” is sending handshake initiations and the “server” responding. But in case some delay (e.g. cause by some network issues mainly for mobile devices) the “server” can start sending handshake initiations (expiredNewHandshake or expiredRetransmitHandshake timers). In this time the client and server are sending hanshake initiations against each other. "go device.RoutineHandshake()” is running in multiple threads. `handshake.state` is defined per peer. Two threads (RoutineHandshake) can process both handshakes (incomming, outgoing) in the same time and these threads are working with shared resource, handshake.state. Because the routine is expecting state that was set before and the second thread can modify the state, the routine can fail on checking the expected handshake.state.
>> This is happening to us. We are getting error "handshake initiation must be consumed first”. handshakeInitiationConsumed is expected but handshakeZeroed is actually set (set by different thread). The error is logged on error level (Failed to create response message).
>> 
>> Hope this will help to understand the issue well.
>> 
>> Laura
>> 
>> 
>>> On 25 Feb 2021, at 12:23, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>>> 
>>> Hi Laura,
>>> 
>>> I'm not sure this is actually a problem. The latest handshake message
>>> should probably win the race. I don't see state machine or data
>>> corruption here, but just one handshake interrupting another, which is
>>> par for the course with WireGuard.
>>> 
>>> Or have I overlooked something important in the state machine implementation?
>>> 
>>> Jason
>> 
> 


-- 
*IMPORTANT NOTICE*: This email, its attachments and any rights attaching 
hereto are confidential and intended exclusively for the person to whom the 
email is addressed. If you are not the intended recipient, do not read, 
copy, disclose or use the contents in any way. Wandera accepts no liability 
for any loss, damage or consequence resulting directly or indirectly from 
the use of this email and attachments.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-03-23 14:44 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-22 10:16 Handshake state collision between parralel RoutineHandshake threads Laura Zelenku
2021-01-08  6:56 ` Laura Zelenku
2021-02-25 11:23 ` Jason A. Donenfeld
2021-03-01 14:08   ` Laura Zelenku
2021-03-16 14:03     ` Laura Zelenku
2021-03-23  9:47       ` Laura Zelenku

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).