From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/12313
Path: news.gmane.org!.POSTED!not-for-mail
From: Jens Gustedt <Jens.Gustedt@inria.fr>
Newsgroups: gmane.linux.lib.musl.general
Subject: [PATCH 1/7] a new lock algorithm with lock value and congestion
 count in the same atomic int
Date: Wed, 3 Jan 2018 14:17:12 +0100
Message-ID: <add3e6cdffdca6a484278d46087e500e4374a3b6.1514985618.git.Jens.Gustedt@inria.fr>
References: <cover.1514985618.git.Jens.Gustedt@inria.fr>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: blaine.gmane.org
X-Trace: blaine.gmane.org 1514987202 23761 195.159.176.226 (3 Jan 2018 13:46:42 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Wed, 3 Jan 2018 13:46:42 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-12330-gllmg-musl=m.gmane.org@lists.openwall.com Wed Jan 03 14:46:37 2018
Return-path: <musl-return-12330-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by blaine.gmane.org with smtp (Exim 4.84_2)
	(envelope-from <musl-return-12330-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1eWjNJ-0005h1-7F
	for gllmg-musl@m.gmane.org; Wed, 03 Jan 2018 14:46:33 +0100
Original-Received: (qmail 7212 invoked by uid 550); 3 Jan 2018 13:48:16 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
List-ID: <musl.lists.openwall.com>
Original-Received: (qmail 6102 invoked from network); 3 Jan 2018 13:48:14 -0000
X-IronPort-AV: E=Sophos;i="5.45,501,1508796000"; 
   d="scan'208";a="307522383"
In-Reply-To: <cover.1514985618.git.Jens.Gustedt@inria.fr>
Resent-Date: Wed, 3 Jan 2018 14:43:38 +0100
Resent-From: Jens Gustedt <jens.gustedt@inria.fr>
Resent-Message-ID: <20180103144338.24b49f36@inria.fr>
Resent-To: musl <musl@lists.openwall.com>
Xref: news.gmane.org gmane.linux.lib.musl.general:12313
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/12313>

A variant of this new lock algorithm has been presented at SAC'16, see
https://hal.inria.fr/hal-01304108. A full version of that paper is
available at https://hal.inria.fr/hal-01236734.

The main motivation of this is to improve on the safety of the basic lock
implementation in musl. This is achieved by squeezing a lock flag and a
congestion count (= threads inside the critical section) into a single
int. Thereby an unlock operation does exactly one memory
transfer (a_fetch_add) and never touches the value again, but still
detects if a waiter has to be woken up.

This is a fix of a use-after-free bug in pthread_detach that had
temporarily been patched. Therefore this patch also reverts

         c1e27367a9b26b9baac0f37a12349fc36567c8b6

This is also the only place where internal knowledge of the lock
algorithm is used.

The main price for the improved safety is a little bit larger code.

Under high congestion, the scheduling behavior will be different
compared to the previous algorithm. In that case, a successful
put-to-sleep may appear out of order compared to the arrival in the
critical section.

This is version 3 of the patch that takes remarks of Alexander and Rich
into account:

 - all lock values are now formulated with respect to INT_MIN
 - unlock now uses __wake to wake up waiters
 - a new inline function __futexwait with the same complexity as __wake
   is use to set threads to sleep
 - __unlock covers the case that no lock had been recorded
 - the fast path of __lock is moved outside the loop in a prominent position
 - revert c1e27367
---
 src/internal/pthread_impl.h |  6 +++++
 src/thread/__lock.c         | 55 ++++++++++++++++++++++++++++++++++++++++-----
 src/thread/pthread_detach.c |  5 ++---
 3 files changed, 58 insertions(+), 8 deletions(-)

diff --git a/src/internal/pthread_impl.h b/src/internal/pthread_impl.h
index 56e19348..602d6f56 100644
--- a/src/internal/pthread_impl.h
+++ b/src/internal/pthread_impl.h
@@ -136,6 +136,12 @@ static inline void __wake(volatile void *addr, int cnt, int priv)
 	__syscall(SYS_futex, addr, FUTEX_WAKE|priv, cnt) != -ENOSYS ||
 	__syscall(SYS_futex, addr, FUTEX_WAKE, cnt);
 }
+static inline void __futexwait(volatile void *addr, int val, int priv)
+{
+	if (priv) priv = FUTEX_PRIVATE;
+	__syscall(SYS_futex, addr, FUTEX_WAIT|priv, val) != -ENOSYS ||
+	__syscall(SYS_futex, addr, FUTEX_WAIT, val);
+}
 
 void __acquire_ptc(void);
 void __release_ptc(void);
diff --git a/src/thread/__lock.c b/src/thread/__lock.c
index 0874c04a..45557c88 100644
--- a/src/thread/__lock.c
+++ b/src/thread/__lock.c
@@ -1,15 +1,60 @@
 #include "pthread_impl.h"
 
+/* This lock primitive combines a flag (in the sign bit) and a
+ * congestion count (= threads inside the critical section, CS) in a
+ * single int that is accessed through atomic operations. The states
+ * of the int for value x are:
+ *
+ * x == 0: unlocked and no thread inside the critical section
+ *
+ * x < 0: locked with a congestion of x-INT_MIN, including the thread
+ * that holds the lock
+ *
+ * x > 0: unlocked with a congestion of x
+ *
+ * or in an equivalent formulation x is the congestion count or'ed
+ * with INT_MIN as a lock flag.
+ */
+
 void __lock(volatile int *l)
 {
-	if (libc.threads_minus_1)
-		while (a_swap(l, 1)) __wait(l, l+1, 1, 1);
+	if (!libc.threads_minus_1) return;
+	/* fast path: INT_MIN for the lock, +1 for the congestion */
+	int current = a_cas(l, 0, INT_MIN + 1);
+	if (!current) return;
+	/* A first spin loop, for medium congestion. */
+	for (unsigned i = 0; i < 10; ++i) {
+		if (current < 0) current -= INT_MIN + 1;
+		// assertion: current >= 0
+		int val = a_cas(l, current, INT_MIN + (current + 1));
+		if (val == current) return;
+		current = val;
+	}
+	// Spinning failed, so mark ourselves as being inside the CS.
+	current = a_fetch_add(l, 1) + 1;
+	/* The main lock acquisition loop for heavy congestion. The only
+	 * change to the value performed inside that loop is a successful
+	 * lock via the CAS that acquires the lock. */
+	for (;;) {
+		/* We can only go into wait, if we know that somebody holds the
+		 * lock and will eventually wake us up, again. */
+		if (current < 0) {
+			__futexwait(l, current, 1);
+			current -= INT_MIN + 1;
+		}
+		/* assertion: current > 0, the count includes us already. */
+		int val = a_cas(l, current, INT_MIN + current);
+		if (val == current) return;
+		current = val;
+	}
 }
 
 void __unlock(volatile int *l)
 {
-	if (l[0]) {
-		a_store(l, 0);
-		if (l[1]) __wake(l, 1, 1);
+	/* Check l[0] to see if we are multi-threaded. */
+	if (l[0] < 0) {
+		if (a_fetch_add(l, -(INT_MIN + 1)) != (INT_MIN + 1)) {
+			__wake(l, 1, 1);
+		}
 	}
 }
diff --git a/src/thread/pthread_detach.c b/src/thread/pthread_detach.c
index 13482607..d9c90d1c 100644
--- a/src/thread/pthread_detach.c
+++ b/src/thread/pthread_detach.c
@@ -6,11 +6,10 @@ int __pthread_join(pthread_t, void **);
 static int __pthread_detach(pthread_t t)
 {
 	/* Cannot detach a thread that's already exiting */
-	if (a_swap(t->exitlock, 1))
+	if (a_cas(t->exitlock, 0, INT_MIN + 1))
 		return __pthread_join(t, 0);
 	t->detached = 2;
-	a_store(t->exitlock, 0);
-	__wake(t->exitlock, 1, 1);
+	__unlock(t->exitlock);
 	return 0;
 }
 
-- 
2.15.1