mailing list of musl libc
* [PATCH 1/8] (V2) a new lock algorithm with lock value and CS counts in the same atomic int
  2017-06-23 14:38 [PATCH 0/8] the new __lock and follow up patches Jens Gustedt
@ 2017-06-16  7:11 ` Jens Gustedt
  2017-12-20 21:58   ` Rich Felker
  2017-06-20 13:25 ` [PATCH 3/8] revise the definition of multiple basic locks in the code Jens Gustedt
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 27+ messages in thread
From: Jens Gustedt @ 2017-06-16  7:11 UTC (permalink / raw)
  To: musl

A variant of this new lock algorithm has been presented at SAC'16, see
https://hal.inria.fr/hal-01304108. A full version of that paper is
available at https://hal.inria.fr/hal-01236734.

The main motivation of this is to improve on the safety of the basic lock
implementation in musl. This is achieved by squeezing lock value and
"waiter" count into a single int. Thereby an unlock operation does
exactly one memory transfer (a_fetch_add) and never touches the value
again, but still detects if a waiter has to be woken up.
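
To make the value encoding explicit, here is a small illustrative
sketch (not part of the patch; the helper names are made up):

#include <limits.h>

/* Decoding the combined lock word as used by the code below:
     v == 0            unlocked, nobody around
     v == INT_MIN + n  locked, n threads engaged (the holder included)
     v == n > 0        momentarily unlocked, n threads racing for it  */
static inline int lock_is_held(int v)
{
	return v < 0;
}
static inline int lock_congestion(int v)
{
	return v < 0 ? v - INT_MIN : v;
}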

Another effect could be improved performance, but don't take this too
seriously: these lock/unlock functions are probably never used in
performance-critical parts of libc.

Depending on the architecture, the performance of the previous lock
strategy was not always great under no or medium congestion. Now, in
the uncongested case, there are exactly two memory operations, one to
lock and one to unlock. So if we were to start using this same new
strategy also for user locks such as mutexes, semaphores or the locks
used for lock-full atomics, we might see a performance improvement.

Also, the previous algorithm could be terrible under high load. Key to
the better performance is, again, the combination of the lock value and
the "waiter" count into one atomic entity. This drastically reduces the
memory transfers that have to be performed under load. In particular,
the main acquisition loop changes this atomic int exactly once, namely
when the lock is acquired, so the mutual disturbance between acquiring
threads is reduced. Again, this is probably not an issue, because this
lock/unlock algorithm is *not* used in places that arbitrarily hammer
thousands of requests on the same poor lock. (A throughput test of
thread creation shows about 50000 threads created per second on my
machine, not enough for any of this to make a difference.)

The main price for the improved safety is slightly larger code.

Also, under high congestion the scheduling behavior will differ from
the previous algorithm. In that case, a successful put-to-sleep may
occur out of order compared to the arrival at the critical section. I
am not sure that this is very different from the previous behavior, nor
that the scheduling order of these lock primitives is anything an
application should ever count on.

For the patch itself, I found only one other place where knowledge of
the internals of the lock algorithm is used. Here I patch this usage
with the appropriate CAS, but this should perhaps be extracted as a
__trylock function or, maybe even better, a macro.

This is version 2 of the patch that takes some of the remarks of
Alexander and Rich into account:

 - all lock values are now formulated with respect to INT_MIN
 - unlock now uses __wake to wake up waiters
 - a new inline function __futexwait with the same complexity as __wake
   is used to put threads to sleep
 - __unlock covers the case that no lock had been recorded
 - the fast path of __lock is moved outside the loop in a prominent position
---
 src/internal/pthread_impl.h |  6 ++++++
 src/thread/__lock.c         | 42 +++++++++++++++++++++++++++++++++++++-----
 src/thread/pthread_detach.c |  2 +-
 3 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/src/internal/pthread_impl.h b/src/internal/pthread_impl.h
index 757b86ad..0622ad52 100644
--- a/src/internal/pthread_impl.h
+++ b/src/internal/pthread_impl.h
@@ -132,6 +132,12 @@ static inline void __wake(volatile void *addr, int cnt, int priv)
 	__syscall(SYS_futex, addr, FUTEX_WAKE|priv, cnt) != -ENOSYS ||
 	__syscall(SYS_futex, addr, FUTEX_WAKE, cnt);
 }
+static inline void __futexwait(volatile void *addr, int val, int priv)
+{
+	if (priv) priv = 128;
+	__syscall(SYS_futex, addr, FUTEX_WAIT|priv, val) != -ENOSYS ||
+	__syscall(SYS_futex, addr, FUTEX_WAIT, val);
+}
 
 void __acquire_ptc(void);
 void __release_ptc(void);
diff --git a/src/thread/__lock.c b/src/thread/__lock.c
index 0874c04a..56092240 100644
--- a/src/thread/__lock.c
+++ b/src/thread/__lock.c
@@ -2,14 +2,46 @@
 
 void __lock(volatile int *l)
 {
-	if (libc.threads_minus_1)
-		while (a_swap(l, 1)) __wait(l, l+1, 1, 1);
+	if (!libc.threads_minus_1) return;
+        /* fast path: INT_MIN for holding the lock, +1 to count this
+           thread in the critical section. */
+	int current = a_cas(l, 0, INT_MIN + 1);
+        if (!current) return;
+	/* A first spin lock acquisition loop, for the case of
+	   medium congestion. */
+	for (unsigned i = 0; i < 10; ++i) {
+		if (current < 0) current -= INT_MIN + 1;
+		// assertion: current >= 0
+		int val = a_cas(l, current, INT_MIN + (current + 1));
+		if (val == current) return;
+		current = val;
+	}
+	// Spinning failed, so mark ourselves as being inside the CS.
+	current = a_fetch_add(l, 1) + 1;
+	/* The main lock acquisition loop for heavy congestion. The only
+	   change to the value performed inside that loop is a successful
+	   lock via the CAS that acquires the lock. */
+	for (;;) {
+		/* We can only go into wait, if we know that somebody holds the
+		   lock and will eventually wake us up, again. */
+		if (current < 0) {
+			__futexwait(l, current, 1);
+			current -= INT_MIN + 1;
+		}
+		/* assertion: current > 0, because the count
+		   includes us already. */
+		int val = a_cas(l, current, INT_MIN + current);
+		if (val == current) return;
+		current = val;
+	}
 }
 
 void __unlock(volatile int *l)
 {
-	if (l[0]) {
-		a_store(l, 0);
-		if (l[1]) __wake(l, 1, 1);
+	/* We have to check if l[0] had been touched at all. */
+	if (l[0] < 0) {
+		if (a_fetch_add(l, -(INT_MIN + 1)) != (INT_MIN + 1)) {
+			__wake(l, 1, 1);
+		}
 	}
 }
diff --git a/src/thread/pthread_detach.c b/src/thread/pthread_detach.c
index ed77f74d..818626ed 100644
--- a/src/thread/pthread_detach.c
+++ b/src/thread/pthread_detach.c
@@ -6,7 +6,7 @@ int __pthread_join(pthread_t, void **);
 static int __pthread_detach(pthread_t t)
 {
 	/* Cannot detach a thread that's already exiting */
-	if (a_swap(t->exitlock, 1))
+	if (a_cas(t->exitlock, 0, -INT_MAX))
 		return __pthread_join(t, 0);
 	t->detached = 2;
 	__unlock(t->exitlock);



* [PATCH 3/8] revise the definition of multiple basic locks in the code
  2017-06-23 14:38 [PATCH 0/8] the new __lock and follow up patches Jens Gustedt
  2017-06-16  7:11 ` [PATCH 1/8] (V2) a new lock algorithm with lock value and CS counts in the same atomic int Jens Gustedt
@ 2017-06-20 13:25 ` Jens Gustedt
  2017-06-20 19:08 ` [PATCH 6/8] use the new lock algorithm for malloc Jens Gustedt
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Jens Gustedt @ 2017-06-20 13:25 UTC (permalink / raw)
  To: musl

This was done by temporarily using a struct type to identify all the
users of the __lock and __unlock primitives.

In most cases this is just a change from an array of two volatile ints
to one.

One place (__get_locale) was also missing the volatile qualifier, a
minor bug.
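
For reference, a sketch of the transitional step (not part of this
diff; the wrapper type is the one that still shows up briefly in the
thread-creation patch of this series): wrapping the lock in a struct
makes every direct use a compile error, so no user can be missed before
the arrays are switched to a single volatile int.

typedef struct { volatile int lc[1]; } __lock_t;

void __lock(volatile int *);
void __unlock(volatile int *);

static __lock_t lock;            /* was: static volatile int lock[2]; */

static void with_lock(void (*f)(void))
{
	__lock(lock.lc);         /* old call sites like __lock(lock) no longer compile */
	f();
	__unlock(lock.lc);
}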
---
 src/dirent/__dirent.h       | 2 +-
 src/exit/at_quick_exit.c    | 2 +-
 src/exit/atexit.c           | 2 +-
 src/internal/pthread_impl.h | 5 +++--
 src/locale/dcngettext.c     | 2 +-
 src/locale/locale_map.c     | 2 +-
 src/locale/setlocale.c      | 2 +-
 src/malloc/lite_malloc.c    | 2 +-
 src/misc/syslog.c           | 2 +-
 src/prng/random.c           | 2 +-
 src/stdio/ofl.c             | 2 +-
 src/thread/pthread_atfork.c | 2 +-
 src/thread/sem_open.c       | 2 +-
 src/thread/synccall.c       | 2 +-
 src/time/__tz.c             | 2 +-
 15 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/src/dirent/__dirent.h b/src/dirent/__dirent.h
index 32871baf..101b0368 100644
--- a/src/dirent/__dirent.h
+++ b/src/dirent/__dirent.h
@@ -4,6 +4,6 @@ struct __dirstream
 	off_t tell;
 	int buf_pos;
 	int buf_end;
-	volatile int lock[2];
+	volatile int lock[1];
 	char buf[2048];
 };
diff --git a/src/exit/at_quick_exit.c b/src/exit/at_quick_exit.c
index 34541bad..4e8b119e 100644
--- a/src/exit/at_quick_exit.c
+++ b/src/exit/at_quick_exit.c
@@ -5,7 +5,7 @@
 
 static void (*funcs[COUNT])(void);
 static int count;
-static volatile int lock[2];
+static volatile int lock[1];
 
 void __funcs_on_quick_exit()
 {
diff --git a/src/exit/atexit.c b/src/exit/atexit.c
index 2b58b8bb..cd3b0a64 100644
--- a/src/exit/atexit.c
+++ b/src/exit/atexit.c
@@ -13,7 +13,7 @@ static struct fl
 } builtin, *head;
 
 static int slot;
-static volatile int lock[2];
+static volatile int lock[1];
 
 void __funcs_on_exit()
 {
diff --git a/src/internal/pthread_impl.h b/src/internal/pthread_impl.h
index 0622ad52..e1dd4421 100644
--- a/src/internal/pthread_impl.h
+++ b/src/internal/pthread_impl.h
@@ -39,8 +39,8 @@ struct pthread {
 	int unblock_cancel;
 	volatile int timer_id;
 	locale_t locale;
-	volatile int killlock[2];
-	volatile int exitlock[2];
+	volatile int killlock[1];
+	volatile int exitlock[1];
 	volatile int startlock[2];
 	unsigned long sigmask[_NSIG/8/sizeof(long)];
 	char *dlerror_buf;
@@ -115,6 +115,7 @@ int __clone(int (*)(void *), void *, int, void *, ...);
 int __set_thread_area(void *);
 int __libc_sigaction(int, const struct sigaction *, struct sigaction *);
 int __libc_sigprocmask(int, const sigset_t *, sigset_t *);
+
 void __lock(volatile int *);
 void __unmapself(void *, size_t);
 
diff --git a/src/locale/dcngettext.c b/src/locale/dcngettext.c
index a5ff8475..26023676 100644
--- a/src/locale/dcngettext.c
+++ b/src/locale/dcngettext.c
@@ -34,7 +34,7 @@ static char *gettextdir(const char *domainname, size_t *dirlen)
 
 char *bindtextdomain(const char *domainname, const char *dirname)
 {
-	static volatile int lock[2];
+	static volatile int lock[1];
 	struct binding *p, *q;
 
 	if (!domainname) return 0;
diff --git a/src/locale/locale_map.c b/src/locale/locale_map.c
index c3e59174..79542310 100644
--- a/src/locale/locale_map.c
+++ b/src/locale/locale_map.c
@@ -26,7 +26,7 @@ static const char envvars[][12] = {
 
 const struct __locale_map *__get_locale(int cat, const char *val)
 {
-	static int lock[2];
+	static volatile int lock[1];
 	static void *volatile loc_head;
 	const struct __locale_map *p;
 	struct __locale_map *new = 0;
diff --git a/src/locale/setlocale.c b/src/locale/setlocale.c
index 8dae5a4e..3af87e05 100644
--- a/src/locale/setlocale.c
+++ b/src/locale/setlocale.c
@@ -21,7 +21,7 @@ char *__strchrnul(const char *, int);
 
 char *setlocale(int cat, const char *name)
 {
-	static volatile int lock[2];
+	static volatile int lock[1];
 
 	if ((unsigned)cat > LC_ALL) return 0;
 
diff --git a/src/malloc/lite_malloc.c b/src/malloc/lite_malloc.c
index a7e4a9f7..701f60b4 100644
--- a/src/malloc/lite_malloc.c
+++ b/src/malloc/lite_malloc.c
@@ -11,7 +11,7 @@ void *__expand_heap(size_t *);
 static void *__simple_malloc(size_t n)
 {
 	static char *cur, *end;
-	static volatile int lock[2];
+	static volatile int lock[1];
 	size_t align=1, pad;
 	void *p;
 
diff --git a/src/misc/syslog.c b/src/misc/syslog.c
index 9dd1ddb5..60e9020f 100644
--- a/src/misc/syslog.c
+++ b/src/misc/syslog.c
@@ -11,7 +11,7 @@
 #include <fcntl.h>
 #include "libc.h"
 
-static volatile int lock[2];
+static volatile int lock[1];
 static char log_ident[32];
 static int log_opt;
 static int log_facility = LOG_USER;
diff --git a/src/prng/random.c b/src/prng/random.c
index 7d557d70..13a5e6df 100644
--- a/src/prng/random.c
+++ b/src/prng/random.c
@@ -22,7 +22,7 @@ static int n = 31;
 static int i = 3;
 static int j = 0;
 static uint32_t *x = init+1;
-static volatile int lock[2];
+static volatile int lock[1];
 
 static uint32_t lcg31(uint32_t x) {
 	return (1103515245*x + 12345) & 0x7fffffff;
diff --git a/src/stdio/ofl.c b/src/stdio/ofl.c
index b143999c..0e3602aa 100644
--- a/src/stdio/ofl.c
+++ b/src/stdio/ofl.c
@@ -2,7 +2,7 @@
 #include "libc.h"
 
 static FILE *ofl_head;
-static volatile int ofl_lock[2];
+static volatile int ofl_lock[1];
 
 FILE **__ofl_lock()
 {
diff --git a/src/thread/pthread_atfork.c b/src/thread/pthread_atfork.c
index a40d7f63..c6f77b3f 100644
--- a/src/thread/pthread_atfork.c
+++ b/src/thread/pthread_atfork.c
@@ -8,7 +8,7 @@ static struct atfork_funcs {
 	struct atfork_funcs *prev, *next;
 } *funcs;
 
-static volatile int lock[2];
+static volatile int lock[1];
 
 void __fork_handler(int who)
 {
diff --git a/src/thread/sem_open.c b/src/thread/sem_open.c
index fda0acd3..dc0279e8 100644
--- a/src/thread/sem_open.c
+++ b/src/thread/sem_open.c
@@ -20,7 +20,7 @@ static struct {
 	sem_t *sem;
 	int refcnt;
 } *semtab;
-static volatile int lock[2];
+static volatile int lock[1];
 
 #define FLAGS (O_RDWR|O_NOFOLLOW|O_CLOEXEC|O_NONBLOCK)
 
diff --git a/src/thread/synccall.c b/src/thread/synccall.c
index 000ec4e3..14d40506 100644
--- a/src/thread/synccall.c
+++ b/src/thread/synccall.c
@@ -14,7 +14,7 @@ static struct chain {
 	sem_t target_sem, caller_sem;
 } *volatile head;
 
-static volatile int synccall_lock[2];
+static volatile int synccall_lock[1];
 static volatile int target_tid;
 static void (*callback)(void *), *context;
 static volatile int dummy = 0;
diff --git a/src/time/__tz.c b/src/time/__tz.c
index 0e0c4ea2..b5951a67 100644
--- a/src/time/__tz.c
+++ b/src/time/__tz.c
@@ -27,7 +27,7 @@ static char old_tz_buf[32];
 static char *old_tz = old_tz_buf;
 static size_t old_tz_size = sizeof old_tz_buf;
 
-static volatile int lock[2];
+static volatile int lock[1];
 
 static int getint(const char **p)
 {



* [PATCH 6/8] use the new lock algorithm for malloc
  2017-06-23 14:38 [PATCH 0/8] the new __lock and follow up patches Jens Gustedt
  2017-06-16  7:11 ` [PATCH 1/8] (V2) a new lock algorithm with lock value and CS counts in the same atomic int Jens Gustedt
  2017-06-20 13:25 ` [PATCH 3/8] revise the definition of multiple basic locks in the code Jens Gustedt
@ 2017-06-20 19:08 ` Jens Gustedt
  2017-06-23 15:01   ` Jens Gustedt
  2017-06-20 19:41 ` [PATCH 2/8] consistently use the LOCK an UNLOCK macros Jens Gustedt
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 27+ messages in thread
From: Jens Gustedt @ 2017-06-20 19:08 UTC (permalink / raw)
  To: musl

Malloc used a specialized lock implementation in many places. Now that
we have a generic lock that has the desired properties, we should just
use it instead of this multitude of very similar lock mechanisms.



* [PATCH 2/8] consistently use the LOCK an UNLOCK macros
  2017-06-23 14:38 [PATCH 0/8] the new __lock and follow up patches Jens Gustedt
                   ` (2 preceding siblings ...)
  2017-06-20 19:08 ` [PATCH 6/8] use the new lock algorithm for malloc Jens Gustedt
@ 2017-06-20 19:41 ` Jens Gustedt
  2017-06-20 19:44 ` [PATCH 5/8] separate the fast parts of __lock and __unlock into a .h file that may be used by other TU Jens Gustedt
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Jens Gustedt @ 2017-06-20 19:41 UTC (permalink / raw)
  To: musl

In some places the functions were used directly. Use the macros
consistently everywhere, so that it becomes easier later on to capture
the fast path directly inside the macro and to pay the call overhead
only on the slow path.
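
As an illustration of where this could lead (a sketch, not part of this
patch; the *_fast names only appear later in this series):

void __lock(volatile int *);
void __unlock(volatile int *);

/* current definitions, as in src/internal/libc.h: plain call wrappers */
#define LOCK(x) __lock(x)
#define UNLOCK(x) __unlock(x)

/* Once every call site goes through the macros, a later change can
   inline the fast path without touching any caller, e.g.
   (hypothetical at this point):
     #define LOCK(x) __lock_fast(x)
     #define UNLOCK(x) __unlock_fast(x)  */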
---
 src/thread/pthread_create.c        | 6 +++---
 src/thread/pthread_detach.c        | 2 +-
 src/thread/pthread_getschedparam.c | 4 ++--
 src/thread/pthread_kill.c          | 4 ++--
 src/thread/pthread_setschedparam.c | 4 ++--
 src/thread/pthread_setschedprio.c  | 4 ++--
 6 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
index 49f2f729..ca5c6b90 100644
--- a/src/thread/pthread_create.c
+++ b/src/thread/pthread_create.c
@@ -37,10 +37,10 @@ _Noreturn void __pthread_exit(void *result)
 
 	__pthread_tsd_run_dtors();
 
-	__lock(self->exitlock);
+	LOCK(self->exitlock);
 
 	/* Mark this thread dead before decrementing count */
-	__lock(self->killlock);
+	LOCK(self->killlock);
 	self->dead = 1;
 
 	/* Block all signals before decrementing the live thread count.
@@ -54,7 +54,7 @@ _Noreturn void __pthread_exit(void *result)
 	 * been blocked. This precludes observation of the thread id
 	 * as a live thread (with application code running in it) after
 	 * the thread was reported dead by ESRCH being returned. */
-	__unlock(self->killlock);
+	UNLOCK(self->killlock);
 
 	/* It's impossible to determine whether this is "the last thread"
 	 * until performing the atomic decrement, since multiple threads
diff --git a/src/thread/pthread_detach.c b/src/thread/pthread_detach.c
index 818626ed..cf0397f7 100644
--- a/src/thread/pthread_detach.c
+++ b/src/thread/pthread_detach.c
@@ -9,7 +9,7 @@ static int __pthread_detach(pthread_t t)
 	if (a_cas(t->exitlock, 0, -INT_MAX))
 		return __pthread_join(t, 0);
 	t->detached = 2;
-	__unlock(t->exitlock);
+	UNLOCK(t->exitlock);
 	return 0;
 }
 
diff --git a/src/thread/pthread_getschedparam.c b/src/thread/pthread_getschedparam.c
index 3053c186..a994b637 100644
--- a/src/thread/pthread_getschedparam.c
+++ b/src/thread/pthread_getschedparam.c
@@ -3,7 +3,7 @@
 int pthread_getschedparam(pthread_t t, int *restrict policy, struct sched_param *restrict param)
 {
 	int r;
-	__lock(t->killlock);
+	LOCK(t->killlock);
 	if (t->dead) {
 		r = ESRCH;
 	} else {
@@ -12,6 +12,6 @@ int pthread_getschedparam(pthread_t t, int *restrict policy, struct sched_param
 			*policy = __syscall(SYS_sched_getscheduler, t->tid);
 		}
 	}
-	__unlock(t->killlock);
+	UNLOCK(t->killlock);
 	return r;
 }
diff --git a/src/thread/pthread_kill.c b/src/thread/pthread_kill.c
index acdb1ea6..f0903420 100644
--- a/src/thread/pthread_kill.c
+++ b/src/thread/pthread_kill.c
@@ -3,8 +3,8 @@
 int pthread_kill(pthread_t t, int sig)
 {
 	int r;
-	__lock(t->killlock);
+	LOCK(t->killlock);
 	r = t->dead ? ESRCH : -__syscall(SYS_tkill, t->tid, sig);
-	__unlock(t->killlock);
+	UNLOCK(t->killlock);
 	return r;
 }
diff --git a/src/thread/pthread_setschedparam.c b/src/thread/pthread_setschedparam.c
index c4738d64..9e2fa456 100644
--- a/src/thread/pthread_setschedparam.c
+++ b/src/thread/pthread_setschedparam.c
@@ -3,8 +3,8 @@
 int pthread_setschedparam(pthread_t t, int policy, const struct sched_param *param)
 {
 	int r;
-	__lock(t->killlock);
+	LOCK(t->killlock);
 	r = t->dead ? ESRCH : -__syscall(SYS_sched_setscheduler, t->tid, policy, param);
-	__unlock(t->killlock);
+	UNLOCK(t->killlock);
 	return r;
 }
diff --git a/src/thread/pthread_setschedprio.c b/src/thread/pthread_setschedprio.c
index e0bdc03b..dc745b42 100644
--- a/src/thread/pthread_setschedprio.c
+++ b/src/thread/pthread_setschedprio.c
@@ -3,8 +3,8 @@
 int pthread_setschedprio(pthread_t t, int prio)
 {
 	int r;
-	__lock(t->killlock);
+	LOCK(t->killlock);
 	r = t->dead ? ESRCH : -__syscall(SYS_sched_setparam, t->tid, &prio);
-	__unlock(t->killlock);
+	UNLOCK(t->killlock);
 	return r;
 }



* [PATCH 5/8] separate the fast parts of __lock and __unlock into a .h file that may be used by other TU
  2017-06-23 14:38 [PATCH 0/8] the new __lock and follow up patches Jens Gustedt
                   ` (3 preceding siblings ...)
  2017-06-20 19:41 ` [PATCH 2/8] consistently use the LOCK an UNLOCK macros Jens Gustedt
@ 2017-06-20 19:44 ` Jens Gustedt
  2017-06-20 20:35 ` [PATCH 4/8] determine the existence of private futexes at the first thread creation Jens Gustedt
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Jens Gustedt @ 2017-06-20 19:44 UTC (permalink / raw)
  To: musl

This provides two interfaces, __lock_fast and __unlock_fast, that are
both "static inline" and should result in a better integration of the
lock at the call site. The slow path of the lock algorithm remains
centralized; there, the overhead of a function call is not a big deal.

These should only be used by a TU that encapsulates all LOCK and
UNLOCK calls of a particular lock object.
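
A minimal usage sketch under that constraint (the TU and the counter
are hypothetical; __lock.h is the header added by this patch):

#include "__lock.h"              /* also pulls in pthread_impl.h */

static volatile int my_lock[1];  /* lock fully owned by this TU */
static unsigned counter;

unsigned bump(void)
{
	unsigned r;
	__lock_fast(my_lock);    /* fast path inlined; contended case calls __lock_slow */
	r = ++counter;
	__unlock_fast(my_lock);  /* inlined release; wakes a waiter only if needed */
	return r;
}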
---
 src/internal/__lock.h       | 22 ++++++++++++++++++++++
 src/internal/libc.h         |  1 +
 src/thread/__lock.c         | 22 ++++++----------------
 src/thread/pthread_create.c |  4 ++--
 4 files changed, 31 insertions(+), 18 deletions(-)
 create mode 100644 src/internal/__lock.h

diff --git a/src/internal/__lock.h b/src/internal/__lock.h
new file mode 100644
index 00000000..c1f07fc0
--- /dev/null
+++ b/src/internal/__lock.h
@@ -0,0 +1,22 @@
+#include "pthread_impl.h"
+
+static inline void __lock_fast(volatile int *l)
+{
+	extern void __lock_slow(volatile int*, int);
+	if (!libc.threads_minus_1) return;
+        /* fast path: INT_MIN for holding the lock, +1 to count this
+           thread in the critical section. */
+	int current = a_cas(l, 0, INT_MIN + 1);
+        if (!current) return;
+        __lock_slow(l, current);
+}
+
+static inline void __unlock_fast(volatile int *l)
+{
+	/* We have to check if l[0] had been touched at all. */
+	if (l[0] < 0) {
+		if (a_fetch_add(l, -(INT_MIN + 1)) != (INT_MIN + 1)) {
+			__wake(l, 1, 1);
+		}
+	}
+}
diff --git a/src/internal/libc.h b/src/internal/libc.h
index 5e145183..a594d0c5 100644
--- a/src/internal/libc.h
+++ b/src/internal/libc.h
@@ -47,6 +47,7 @@ extern size_t __sysinfo ATTR_LIBC_VISIBILITY;
 extern char *__progname, *__progname_full;
 
 /* Designed to avoid any overhead in non-threaded processes */
+void __lock_slow(volatile int *, int) ATTR_LIBC_VISIBILITY;
 void __lock(volatile int *) ATTR_LIBC_VISIBILITY;
 void __unlock(volatile int *) ATTR_LIBC_VISIBILITY;
 int __lockfile(FILE *) ATTR_LIBC_VISIBILITY;
diff --git a/src/thread/__lock.c b/src/thread/__lock.c
index 56092240..e612c6f9 100644
--- a/src/thread/__lock.c
+++ b/src/thread/__lock.c
@@ -1,12 +1,12 @@
 #include "pthread_impl.h"
 
-void __lock(volatile int *l)
+#include "__lock.h"
+
+weak_alias(__lock_fast, __lock);
+weak_alias(__unlock_fast, __unlock);
+
+void __lock_slow(volatile int *l, int current)
 {
-	if (!libc.threads_minus_1) return;
-        /* fast path: INT_MIN for holding the lock, +1 to count this
-           thread in the critical section. */
-	int current = a_cas(l, 0, INT_MIN + 1);
-        if (!current) return;
 	/* A first spin lock acquisition loop, for the case of
 	   medium congestion. */
 	for (unsigned i = 0; i < 10; ++i) {
@@ -35,13 +35,3 @@ void __lock(volatile int *l)
 		current = val;
 	}
 }
-
-void __unlock(volatile int *l)
-{
-	/* We have to check if l[0] had been touched at all. */
-	if (l[0] < 0) {
-		if (a_fetch_add(l, -(INT_MIN + 1)) != (INT_MIN + 1)) {
-			__wake(l, 1, 1);
-		}
-	}
-}
diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
index 26945022..015d7bee 100644
--- a/src/thread/pthread_create.c
+++ b/src/thread/pthread_create.c
@@ -282,8 +282,8 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
 	if (!a_fetch_add(&libc.threads_minus_1, 1)) {
           // As long as we only have one thread, test if this supports
           // private futexes.
-          __lock_t dummy = { 0 };
-          if (__syscall(SYS_futex, dummy.lc, FUTEX_WAKE|FUTEX_PRIVATE, 0) != -ENOSYS)
+          volatile int dummy[1] = { 0 };
+          if (__syscall(SYS_futex, dummy, FUTEX_WAKE|FUTEX_PRIVATE, 0) != -ENOSYS)
 		__futex_private = FUTEX_PRIVATE;
         }
 	ret = __clone((c11 ? start_c11 : start), stack, flags, new, &new->tid, TP_ADJ(new), &new->tid);



* [PATCH 4/8] determine the existence of private futexes at the first thread creation
  2017-06-23 14:38 [PATCH 0/8] the new __lock and follow up patches Jens Gustedt
                   ` (4 preceding siblings ...)
  2017-06-20 19:44 ` [PATCH 5/8] separate the fast parts of __lock and __unlock into a .h file that may be used by other TU Jens Gustedt
@ 2017-06-20 20:35 ` Jens Gustedt
  2017-06-23 17:05   ` Rich Felker
  2017-06-24 22:23   ` [PATCH 4/8] determine the existence of private futexes at the first thread creation Rich Felker
  2017-06-22 21:17 ` [PATCH 7/8] implement __unlock_requeue Jens Gustedt
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 27+ messages in thread
From: Jens Gustedt @ 2017-06-20 20:35 UTC (permalink / raw)
  To: musl

The current strategy to deal with kernels that don't implement private
futexes is to

 - test if the call works with FUTEX_PRIVATE
 - fall back to a call without it if it doesn't

This forces an overhead on both sides, Linux kernels with and without
private futexes. For those with, it adds a superfluous branch instruction
to all calls. For those without, it adds the whole overhead of an extra
syscall.

The idea of this patch is simple. We determine dynamically, but only
once when we switch from 1 thread to multiple threads, whether private
futexes are implemented. This reduces the overhead to
 - one system call in total
 - one load of a global and an OR of the flags at each futex call

In addition, this system call will be as short as possible, because it
futexes on a local variable where nobody is to be woken, and all of
this happens in a context (thread creation) that is clearly dominated
by a lot of other things.
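
Condensed, the mechanism looks like this (a sketch; the helper function
is hypothetical and wraps what the diff below does inline in
pthread_create):

#include "pthread_impl.h"   /* __syscall, SYS_futex, FUTEX_* */

int __futex_private = 0;    /* stays 0 on kernels without private futexes */

/* run exactly once, on the transition from 1 to 2 threads */
static void detect_private_futexes(void)
{
	volatile int dummy[1] = { 0 };
	if (__syscall(SYS_futex, dummy, FUTEX_WAKE|FUTEX_PRIVATE, 0) != -ENOSYS)
		__futex_private = FUTEX_PRIVATE;
}

/* every futex caller then just ORs in the global, with no fallback syscall:
	if (priv) priv = __futex_private;
	__syscall(SYS_futex, addr, FUTEX_WAIT|priv, val);                       */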

This patch also clears up an inconsistency in the code that had
FUTEX_PRIVATE written out as 128 in many places, which made it really
hard to digest and to identify what was going on.
---
 src/internal/pthread_impl.h         | 11 +++++------
 src/thread/__timedwait.c            |  3 +--
 src/thread/__wait.c                 |  5 ++---
 src/thread/pthread_barrier_wait.c   |  3 +--
 src/thread/pthread_cond_timedwait.c |  3 +--
 src/thread/pthread_create.c         | 10 +++++++++-
 6 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/src/internal/pthread_impl.h b/src/internal/pthread_impl.h
index e1dd4421..02929e07 100644
--- a/src/internal/pthread_impl.h
+++ b/src/internal/pthread_impl.h
@@ -126,18 +126,17 @@ void __vm_unlock(void);
 int __timedwait(volatile int *, int, clockid_t, const struct timespec *, int);
 int __timedwait_cp(volatile int *, int, clockid_t, const struct timespec *, int);
 void __wait(volatile int *, volatile int *, int, int);
+extern int __futex_private;
 static inline void __wake(volatile void *addr, int cnt, int priv)
 {
-	if (priv) priv = 128;
+	if (priv) priv = __futex_private;
 	if (cnt<0) cnt = INT_MAX;
-	__syscall(SYS_futex, addr, FUTEX_WAKE|priv, cnt) != -ENOSYS ||
-	__syscall(SYS_futex, addr, FUTEX_WAKE, cnt);
+	__syscall(SYS_futex, addr, FUTEX_WAKE|priv, cnt);
 }
 static inline void __futexwait(volatile void *addr, int val, int priv)
 {
-	if (priv) priv = 128;
-	__syscall(SYS_futex, addr, FUTEX_WAIT|priv, val) != -ENOSYS ||
-	__syscall(SYS_futex, addr, FUTEX_WAIT, val);
+	if (priv) priv = __futex_private;
+	__syscall(SYS_futex, addr, FUTEX_WAIT|priv, val);
 }
 
 void __acquire_ptc(void);
diff --git a/src/thread/__timedwait.c b/src/thread/__timedwait.c
index 13d8465a..7f0f03e6 100644
--- a/src/thread/__timedwait.c
+++ b/src/thread/__timedwait.c
@@ -14,7 +14,7 @@ int __timedwait_cp(volatile int *addr, int val,
 	int r;
 	struct timespec to, *top=0;
 
-	if (priv) priv = 128;
+	if (priv) priv = __futex_private;
 
 	if (at) {
 		if (at->tv_nsec >= 1000000000UL) return EINVAL;
@@ -29,7 +29,6 @@ int __timedwait_cp(volatile int *addr, int val,
 	}
 
 	r = -__syscall_cp(SYS_futex, addr, FUTEX_WAIT|priv, val, top);
-	if (r == ENOSYS) r = -__syscall_cp(SYS_futex, addr, FUTEX_WAIT, val, top);
 	if (r != EINTR && r != ETIMEDOUT && r != ECANCELED) r = 0;
 
 	return r;
diff --git a/src/thread/__wait.c b/src/thread/__wait.c
index dc33c1a3..a3391b79 100644
--- a/src/thread/__wait.c
+++ b/src/thread/__wait.c
@@ -3,15 +3,14 @@
 void __wait(volatile int *addr, volatile int *waiters, int val, int priv)
 {
 	int spins=100;
-	if (priv) priv = FUTEX_PRIVATE;
+	if (priv) priv = __futex_private;
 	while (spins-- && (!waiters || !*waiters)) {
 		if (*addr==val) a_spin();
 		else return;
 	}
 	if (waiters) a_inc(waiters);
 	while (*addr==val) {
-		__syscall(SYS_futex, addr, FUTEX_WAIT|priv, val, 0) != -ENOSYS
-		|| __syscall(SYS_futex, addr, FUTEX_WAIT, val, 0);
+		__syscall(SYS_futex, addr, FUTEX_WAIT|priv, val, 0);
 	}
 	if (waiters) a_dec(waiters);
 }
diff --git a/src/thread/pthread_barrier_wait.c b/src/thread/pthread_barrier_wait.c
index 06b83db9..a89051e5 100644
--- a/src/thread/pthread_barrier_wait.c
+++ b/src/thread/pthread_barrier_wait.c
@@ -84,8 +84,7 @@ int pthread_barrier_wait(pthread_barrier_t *b)
 			a_spin();
 		a_inc(&inst->finished);
 		while (inst->finished == 1)
-			__syscall(SYS_futex,&inst->finished,FUTEX_WAIT|128,1,0) != -ENOSYS
-			|| __syscall(SYS_futex,&inst->finished,FUTEX_WAIT,1,0);
+			__syscall(SYS_futex,&inst->finished,FUTEX_WAIT|__futex_private,1,0);
 		return PTHREAD_BARRIER_SERIAL_THREAD;
 	}
 
diff --git a/src/thread/pthread_cond_timedwait.c b/src/thread/pthread_cond_timedwait.c
index 3526ecfb..6d92ea88 100644
--- a/src/thread/pthread_cond_timedwait.c
+++ b/src/thread/pthread_cond_timedwait.c
@@ -54,8 +54,7 @@ static inline void unlock_requeue(volatile int *l, volatile int *r, int w)
 {
 	a_store(l, 0);
 	if (w) __wake(l, 1, 1);
-	else __syscall(SYS_futex, l, FUTEX_REQUEUE|128, 0, 1, r) != -ENOSYS
-		|| __syscall(SYS_futex, l, FUTEX_REQUEUE, 0, 1, r);
+	else __syscall(SYS_futex, l, FUTEX_REQUEUE|__futex_private, 0, 1, r);
 }
 
 enum {
diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
index ca5c6b90..26945022 100644
--- a/src/thread/pthread_create.c
+++ b/src/thread/pthread_create.c
@@ -10,6 +10,8 @@ void *__mmap(void *, size_t, int, int, int, off_t);
 int __munmap(void *, size_t);
 int __mprotect(void *, size_t, int);
 
+int __futex_private = 0;
+
 static void dummy_0()
 {
 }
@@ -277,7 +279,13 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
 	new->unblock_cancel = self->cancel;
 	new->CANARY = self->CANARY;
 
-	a_inc(&libc.threads_minus_1);
+	if (!a_fetch_add(&libc.threads_minus_1, 1)) {
+          // As long as we only have one thread, test if this supports
+          // private futexes.
+          __lock_t dummy = { 0 };
+          if (__syscall(SYS_futex, dummy.lc, FUTEX_WAKE|FUTEX_PRIVATE, 0) != -ENOSYS)
+		__futex_private = FUTEX_PRIVATE;
+        }
 	ret = __clone((c11 ? start_c11 : start), stack, flags, new, &new->tid, TP_ADJ(new), &new->tid);
 
 	__release_ptc();



* [PATCH 7/8] implement __unlock_requeue
  2017-06-23 14:38 [PATCH 0/8] the new __lock and follow up patches Jens Gustedt
                   ` (5 preceding siblings ...)
  2017-06-20 20:35 ` [PATCH 4/8] determine the existence of private futexes at the first thread creation Jens Gustedt
@ 2017-06-22 21:17 ` Jens Gustedt
  2017-06-22 21:42 ` [PATCH 8/8] implement the local lock for conditions with __lock & Co Jens Gustedt
  2017-06-23 14:57 ` [PATCH 0/8] the new __lock and follow up patches Jens Gustedt
  8 siblings, 0 replies; 27+ messages in thread
From: Jens Gustedt @ 2017-06-22 21:17 UTC (permalink / raw)
  To: musl

We need a special variant of unlock for the case where we might take
advantage of the futex requeue feature to move a thread from the queue
of a condition variable to the queue of the corresponding mutex.
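
A sketch of how a user of this interface looks (names here are
hypothetical; the last patch of the series does the equivalent with a
condition variable's per-waiter barrier and the mutex futex):

#include "pthread_impl.h"        /* declares __unlock_requeue via libc.h */

static volatile int cv_lock[1];  /* hypothetical lock guarding a wait queue */

/* Release cv_lock; if it was contended, either wake one waiter (shared
   case) or requeue it onto mutex_futex (private case), so that it goes
   back to sleep directly on the mutex. */
static void release_toward_mutex(volatile int *mutex_futex, int shared)
{
	__unlock_requeue(cv_lock, mutex_futex, shared);
}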
---
 src/internal/__lock.h |  8 ++++++++
 src/internal/libc.h   |  2 ++
 src/thread/__lock.c   | 16 ++++++++++++++++
 3 files changed, 26 insertions(+)

diff --git a/src/internal/__lock.h b/src/internal/__lock.h
index c1f07fc0..a96d159f 100644
--- a/src/internal/__lock.h
+++ b/src/internal/__lock.h
@@ -20,3 +20,11 @@ static inline void __unlock_fast(volatile int *l)
 		}
 	}
 }
+
+static inline void __unlock_requeue_fast(volatile int *l, volatile int *r, int shared)
+{
+	extern void __unlock_requeue_slow(volatile int*, volatile int*, int);
+	/* We have to check if l[0] had been touched at all. */
+	if (l[0] < 0 && (a_fetch_add(l, -(INT_MIN + 1)) != (INT_MIN + 1)))
+          __unlock_requeue_slow(l, r, shared);
+}
diff --git a/src/internal/libc.h b/src/internal/libc.h
index a594d0c5..999b4cba 100644
--- a/src/internal/libc.h
+++ b/src/internal/libc.h
@@ -50,6 +50,8 @@ extern char *__progname, *__progname_full;
 void __lock_slow(volatile int *, int) ATTR_LIBC_VISIBILITY;
 void __lock(volatile int *) ATTR_LIBC_VISIBILITY;
 void __unlock(volatile int *) ATTR_LIBC_VISIBILITY;
+void __unlock_requeue(volatile int *, volatile int *, int) ATTR_LIBC_VISIBILITY;
+void __unlock_requeue_slow(volatile int *, volatile int *, int) ATTR_LIBC_VISIBILITY;
 int __lockfile(FILE *) ATTR_LIBC_VISIBILITY;
 void __unlockfile(FILE *) ATTR_LIBC_VISIBILITY;
 #define LOCK(x) __lock(x)
diff --git a/src/thread/__lock.c b/src/thread/__lock.c
index e612c6f9..6516e2e1 100644
--- a/src/thread/__lock.c
+++ b/src/thread/__lock.c
@@ -4,6 +4,22 @@
 
 weak_alias(__lock_fast, __lock);
 weak_alias(__unlock_fast, __unlock);
+weak_alias(__unlock_requeue_fast, __unlock_requeue);
+
+/* __unlock_requeue handles the case where we have a second futex
+   queue to which the a waiting thread should be scheduled if shared
+   is false. This is e.g the case when l is the futex of a condition
+   variable and r is the one of a private mutex: if the mutex is
+   private, it is more efficient to just insert the thread in the
+   queue of that mutex, instead of waking him up and put it back to
+   sleep immediately when trying to lock the mutex. */
+
+void __unlock_requeue_slow(volatile int *l, volatile int *r, int shared)
+{
+	if (shared) __wake(l, 1, 1);
+	else __syscall(SYS_futex, l, FUTEX_REQUEUE|__futex_private, 0, 1, r);
+}
+
 
 void __lock_slow(volatile int *l, int current)
 {



* [PATCH 8/8] implement the local lock for conditions with __lock & Co
  2017-06-23 14:38 [PATCH 0/8] the new __lock and follow up patches Jens Gustedt
                   ` (6 preceding siblings ...)
  2017-06-22 21:17 ` [PATCH 7/8] implement __unlock_requeue Jens Gustedt
@ 2017-06-22 21:42 ` Jens Gustedt
  2017-06-23 14:57 ` [PATCH 0/8] the new __lock and follow up patches Jens Gustedt
  8 siblings, 0 replies; 27+ messages in thread
From: Jens Gustedt @ 2017-06-22 21:42 UTC (permalink / raw)
  To: musl

Condition variables (pthread and C11) had a specialized lock
implementation that had already been designed such that no access to
the lock variable is made after an unlock. Here this property is needed
not because the variable might be freed, but because the implementation
uses variables on the threads' stacks to chain the accesses.

The previous implementation only distinguished two internal states, one
for no congestion and one for congestion; the latter is then used to
wake up other threads if necessary.

The new lock structure deals with such situations more adequately,
because it keeps track of the exact congestion and allows us to react
appropriately.
---
 src/thread/pthread_cond_timedwait.c | 48 ++++++++++++-------------------------
 1 file changed, 15 insertions(+), 33 deletions(-)

diff --git a/src/thread/pthread_cond_timedwait.c b/src/thread/pthread_cond_timedwait.c
index 6d92ea88..873fac2c 100644
--- a/src/thread/pthread_cond_timedwait.c
+++ b/src/thread/pthread_cond_timedwait.c
@@ -1,5 +1,11 @@
 #include "pthread_impl.h"
 
+#if defined(__GNUC__) && defined(__PIC__)
+#define inline inline __attribute__((always_inline))
+#endif
+
+#include "__lock.h"
+
 void __pthread_testcancel(void);
 int __pthread_mutex_lock(pthread_mutex_t *);
 int __pthread_mutex_unlock(pthread_mutex_t *);
@@ -33,30 +39,6 @@ struct waiter {
 	volatile int *notify;
 };
 
-/* Self-synchronized-destruction-safe lock functions */
-
-static inline void lock(volatile int *l)
-{
-	if (a_cas(l, 0, 1)) {
-		a_cas(l, 1, 2);
-		do __wait(l, 0, 2, 1);
-		while (a_cas(l, 0, 2));
-	}
-}
-
-static inline void unlock(volatile int *l)
-{
-	if (a_swap(l, 0)==2)
-		__wake(l, 1, 1);
-}
-
-static inline void unlock_requeue(volatile int *l, volatile int *r, int w)
-{
-	a_store(l, 0);
-	if (w) __wake(l, 1, 1);
-	else __syscall(SYS_futex, l, FUTEX_REQUEUE|__futex_private, 0, 1, r);
-}
-
 enum {
 	WAITING,
 	SIGNALED,
@@ -83,7 +65,7 @@ int __pthread_cond_timedwait(pthread_cond_t *restrict c, pthread_mutex_t *restri
 		seq = c->_c_seq;
 		a_inc(&c->_c_waiters);
 	} else {
-		lock(&c->_c_lock);
+		__lock_fast(&c->_c_lock);
 
 		seq = node.barrier = 2;
 		fut = &node.barrier;
@@ -93,7 +75,7 @@ int __pthread_cond_timedwait(pthread_cond_t *restrict c, pthread_mutex_t *restri
 		if (!c->_c_tail) c->_c_tail = &node;
 		else node.next->prev = &node;
 
-		unlock(&c->_c_lock);
+		__unlock_fast(&c->_c_lock);
 	}
 
 	__pthread_mutex_unlock(m);
@@ -124,14 +106,14 @@ int __pthread_cond_timedwait(pthread_cond_t *restrict c, pthread_mutex_t *restri
 		 * after seeing a LEAVING waiter without getting notified
 		 * via the futex notify below. */
 
-		lock(&c->_c_lock);
+		__lock_fast(&c->_c_lock);
 		
 		if (c->_c_head == &node) c->_c_head = node.next;
 		else if (node.prev) node.prev->next = node.next;
 		if (c->_c_tail == &node) c->_c_tail = node.prev;
 		else if (node.next) node.next->prev = node.prev;
 		
-		unlock(&c->_c_lock);
+		__unlock_fast(&c->_c_lock);
 
 		if (node.notify) {
 			if (a_fetch_add(node.notify, -1)==1)
@@ -139,7 +121,7 @@ int __pthread_cond_timedwait(pthread_cond_t *restrict c, pthread_mutex_t *restri
 		}
 	} else {
 		/* Lock barrier first to control wake order. */
-		lock(&node.barrier);
+		__lock_fast(&node.barrier);
 	}
 
 relock:
@@ -155,7 +137,7 @@ relock:
 	/* Unlock the barrier that's holding back the next waiter, and
 	 * either wake it or requeue it to the mutex. */
 	if (node.prev)
-		unlock_requeue(&node.prev->barrier, &m->_m_lock, m->_m_type & 128);
+		__unlock_requeue_fast(&node.prev->barrier, &m->_m_lock, m->_m_type & 128);
 	else
 		a_dec(&m->_m_waiters);
 
@@ -179,7 +161,7 @@ int __private_cond_signal(pthread_cond_t *c, int n)
 	volatile int ref = 0;
 	int cur;
 
-	lock(&c->_c_lock);
+	__lock_fast(&c->_c_lock);
 	for (p=c->_c_tail; n && p; p=p->prev) {
 		if (a_cas(&p->state, WAITING, SIGNALED) != WAITING) {
 			ref++;
@@ -197,7 +179,7 @@ int __private_cond_signal(pthread_cond_t *c, int n)
 		c->_c_head = 0;
 	}
 	c->_c_tail = p;
-	unlock(&c->_c_lock);
+	__unlock_fast(&c->_c_lock);
 
 	/* Wait for any waiters in the LEAVING state to remove
 	 * themselves from the list before returning or allowing
@@ -205,7 +187,7 @@ int __private_cond_signal(pthread_cond_t *c, int n)
 	while ((cur = ref)) __wait(&ref, 0, cur, 1);
 
 	/* Allow first signaled waiter, if any, to proceed. */
-	if (first) unlock(&first->barrier);
+	if (first) __unlock_fast(&first->barrier);
 
 	return 0;
 }



* [PATCH 0/8] the new __lock and follow up patches
@ 2017-06-23 14:38 Jens Gustedt
  2017-06-16  7:11 ` [PATCH 1/8] (V2) a new lock algorithm with lock value and CS counts in the same atomic int Jens Gustedt
                   ` (8 more replies)
  0 siblings, 9 replies; 27+ messages in thread
From: Jens Gustedt @ 2017-06-23 14:38 UTC (permalink / raw)
  To: musl

Here comes a 

Jens Gustedt (8):
  (V2) a new lock algorithm with lock value and CS counts in the same
    atomic int
  consistently use the LOCK an UNLOCK macros
  revise the definition of multiple basic locks in the code
  determine the existence of private futexes at the first thread
    creation
  separate the fast parts of __lock and __unlock into a .h file that may
    be used by other TU
  use the new lock algorithm for malloc
  implement __unlock_requeue
  implement the local lock for conditions with __lock & Co

 src/dirent/__dirent.h               |  2 +-
 src/exit/at_quick_exit.c            |  2 +-
 src/exit/atexit.c                   |  2 +-
 src/internal/__lock.h               | 30 +++++++++++++++++++++
 src/internal/libc.h                 |  3 +++
 src/internal/pthread_impl.h         | 16 ++++++++----
 src/locale/dcngettext.c             |  2 +-
 src/locale/locale_map.c             |  2 +-
 src/locale/setlocale.c              |  2 +-
 src/malloc/lite_malloc.c            |  2 +-
 src/malloc/malloc.c                 | 38 ++++++++++-----------------
 src/misc/syslog.c                   |  2 +-
 src/prng/random.c                   |  2 +-
 src/stdio/ofl.c                     |  2 +-
 src/thread/__lock.c                 | 52 ++++++++++++++++++++++++++++++++-----
 src/thread/__timedwait.c            |  3 +--
 src/thread/__wait.c                 |  5 ++--
 src/thread/pthread_atfork.c         |  2 +-
 src/thread/pthread_barrier_wait.c   |  3 +--
 src/thread/pthread_cond_timedwait.c | 49 +++++++++++-----------------------
 src/thread/pthread_create.c         | 16 +++++++++---
 src/thread/pthread_detach.c         |  4 +--
 src/thread/pthread_getschedparam.c  |  4 +--
 src/thread/pthread_kill.c           |  4 +--
 src/thread/pthread_setschedparam.c  |  4 +--
 src/thread/pthread_setschedprio.c   |  4 +--
 src/thread/sem_open.c               |  2 +-
 src/thread/synccall.c               |  2 +-
 src/time/__tz.c                     |  2 +-
 29 files changed, 157 insertions(+), 106 deletions(-)
 create mode 100644 src/internal/__lock.h


base-commit: 8fe1f2d79b275b7f7fb0d41c99e379357df63cd9



* Re: [PATCH 0/8] the new __lock and follow up patches
  2017-06-23 14:38 [PATCH 0/8] the new __lock and follow up patches Jens Gustedt
                   ` (7 preceding siblings ...)
  2017-06-22 21:42 ` [PATCH 8/8] implement the local lock for conditions with __lock & Co Jens Gustedt
@ 2017-06-23 14:57 ` Jens Gustedt
  8 siblings, 0 replies; 27+ messages in thread
From: Jens Gustedt @ 2017-06-23 14:57 UTC (permalink / raw)
  To: musl


Sorry,
somehow the contents of this mail were eaten, so here is the real thing
that I wanted to add to this patch series.

This is merely a series to make a point, so that I may get feedback on
whether I am on the right track. The V2 patch is unchanged from what I
have sent before, and on top of it I add several ideas of what could be
useful and easy to do without changing the lock logic of musl too much.

Some of these are largely independent and may be factored out, but it
was easier to present them here as one series.
Thanks
Jens

PS: I also see that 3/8 has been cut, so I'll send it again

On Fri, 23 Jun 2017 16:38:53 +0200 Jens Gustedt <Jens.Gustedt@inria.fr>
wrote:

> Here comes a 
> 
> Jens Gustedt (8):
>   (V2) a new lock algorithm with lock value and CS counts in the same
>     atomic int
>   consistently use the LOCK an UNLOCK macros
>   revise the definition of multiple basic locks in the code
>   determine the existence of private futexes at the first thread
>     creation
>   separate the fast parts of __lock and __unlock into a .h file that
> may be used by other TU
>   use the new lock algorithm for malloc
>   implement __unlock_requeue
>   implement the local lock for conditions with __lock & Co
> 
>  src/dirent/__dirent.h               |  2 +-
>  src/exit/at_quick_exit.c            |  2 +-
>  src/exit/atexit.c                   |  2 +-
>  src/internal/__lock.h               | 30 +++++++++++++++++++++
>  src/internal/libc.h                 |  3 +++
>  src/internal/pthread_impl.h         | 16 ++++++++----
>  src/locale/dcngettext.c             |  2 +-
>  src/locale/locale_map.c             |  2 +-
>  src/locale/setlocale.c              |  2 +-
>  src/malloc/lite_malloc.c            |  2 +-
>  src/malloc/malloc.c                 | 38 ++++++++++-----------------
>  src/misc/syslog.c                   |  2 +-
>  src/prng/random.c                   |  2 +-
>  src/stdio/ofl.c                     |  2 +-
>  src/thread/__lock.c                 | 52 ++++++++++++++++++++++++++++++++-----
>  src/thread/__timedwait.c            |  3 +--
>  src/thread/__wait.c                 |  5 ++--
>  src/thread/pthread_atfork.c         |  2 +-
>  src/thread/pthread_barrier_wait.c   |  3 +--
>  src/thread/pthread_cond_timedwait.c | 49 +++++++++++-----------------------
>  src/thread/pthread_create.c         | 16 +++++++++---
>  src/thread/pthread_detach.c         |  4 +--
>  src/thread/pthread_getschedparam.c  |  4 +--
>  src/thread/pthread_kill.c           |  4 +--
>  src/thread/pthread_setschedparam.c  |  4 +--
>  src/thread/pthread_setschedprio.c   |  4 +--
>  src/thread/sem_open.c               |  2 +-
>  src/thread/synccall.c               |  2 +-
>  src/time/__tz.c                     |  2 +-
>  29 files changed, 157 insertions(+), 106 deletions(-)
>  create mode 100644 src/internal/__lock.h
> 
> 
> base-commit: 8fe1f2d79b275b7f7fb0d41c99e379357df63cd9



Thanks
Jens

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::



* Re: [PATCH 6/8] use the new lock algorithm for malloc
  2017-06-20 19:08 ` [PATCH 6/8] use the new lock algorithm for malloc Jens Gustedt
@ 2017-06-23 15:01   ` Jens Gustedt
  0 siblings, 0 replies; 27+ messages in thread
From: Jens Gustedt @ 2017-06-23 15:01 UTC (permalink / raw)
  To: musl


Sorry, this mail was cut, so here the patch comes again.

From 16e02eb8cc008bb6e160da72df9b69e361015938 Mon Sep 17 00:00:00 2001
Message-Id: <16e02eb8cc008bb6e160da72df9b69e361015938.1498228733.git.Jens.Gustedt@inria.fr>
In-Reply-To: <cover.1498228733.git.Jens.Gustedt@inria.fr>
References: <cover.1498228733.git.Jens.Gustedt@inria.fr>
From: Jens Gustedt <Jens.Gustedt@inria.fr>
Date: Tue, 20 Jun 2017 21:08:30 +0200
Subject: [PATCH 6/8] use the new lock algorithm for malloc
To: musl@lists.openwall.com

Malloc used a specialized lock implementation in many places. Now that
we have a generic lock that has the desired properties, we should just
use it instead of this multitude of very similar lock mechanisms.

From the original benchmarks that I did it should follow that the
performance of this new code is at least as good as that of the old
one. But this might vary with compilers, platforms, room temperature,
mood ... so feedback on this would be much appreciated.
---
 src/malloc/malloc.c | 38 +++++++++++++-------------------------
 1 file changed, 13 insertions(+), 25 deletions(-)

diff --git a/src/malloc/malloc.c b/src/malloc/malloc.c
index c38c46fe..dc00e4a7 100644
--- a/src/malloc/malloc.c
+++ b/src/malloc/malloc.c
@@ -13,6 +13,8 @@
 #define inline inline __attribute__((always_inline))
 #endif
 
+#include "__lock.h"
+
 void *__mmap(void *, size_t, int, int, int, off_t);
 int __munmap(void *, size_t);
 void *__mremap(void *, size_t, size_t, int, ...);
@@ -24,7 +26,7 @@ struct chunk {
 };
 
 struct bin {
-	volatile int lock[2];
+	volatile int lock[1];
 	struct chunk *head;
 	struct chunk *tail;
 };
@@ -32,7 +34,7 @@ struct bin {
 static struct {
 	volatile uint64_t binmap;
 	struct bin bins[64];
-	volatile int free_lock[2];
+	volatile int free_lock[1];
 } mal;
 
 
@@ -58,30 +60,16 @@ static struct {
 
 /* Synchronization tools */
 
-static inline void lock(volatile int *lk)
-{
-	if (libc.threads_minus_1)
-		while(a_swap(lk, 1)) __wait(lk, lk+1, 1, 1);
-}
-
-static inline void unlock(volatile int *lk)
-{
-	if (lk[0]) {
-		a_store(lk, 0);
-		if (lk[1]) __wake(lk, 1, 1);
-	}
-}
-
 static inline void lock_bin(int i)
 {
-	lock(mal.bins[i].lock);
+	__lock_fast(mal.bins[i].lock);
 	if (!mal.bins[i].head)
 		mal.bins[i].head = mal.bins[i].tail = BIN_TO_CHUNK(i);
 }
 
 static inline void unlock_bin(int i)
 {
-	unlock(mal.bins[i].lock);
+	__unlock_fast(mal.bins[i].lock);
 }
 
 static int first_set(uint64_t x)
@@ -161,7 +149,7 @@ void *__expand_heap(size_t *);
 
 static struct chunk *expand_heap(size_t n)
 {
-	static int heap_lock[2];
+	static volatile int heap_lock[1];
 	static void *end;
 	void *p;
 	struct chunk *w;
@@ -171,11 +159,11 @@ static struct chunk *expand_heap(size_t n)
 	 * we need room for an extra zero-sized sentinel chunk. */
 	n += SIZE_ALIGN;
 
-	lock(heap_lock);
+	__lock_fast(heap_lock);
 
 	p = __expand_heap(&n);
 	if (!p) {
-		unlock(heap_lock);
+		__unlock_fast(heap_lock);
 		return 0;
 	}
 
@@ -200,7 +188,7 @@ static struct chunk *expand_heap(size_t n)
 	w = MEM_TO_CHUNK(p);
 	w->csize = n | C_INUSE;
 
-	unlock(heap_lock);
+	__unlock_fast(heap_lock);
 
 	return w;
 }
@@ -479,10 +467,10 @@ void free(void *p)
 			next->psize = final_size | C_INUSE;
 			i = bin_index(final_size);
 			lock_bin(i);
-			lock(mal.free_lock);
+			__lock_fast(mal.free_lock);
 			if (self->psize & next->csize & C_INUSE)
 				break;
-			unlock(mal.free_lock);
+			__unlock_fast(mal.free_lock);
 			unlock_bin(i);
 		}
 
@@ -508,7 +496,7 @@ void free(void *p)
 
 	self->csize = final_size;
 	next->psize = final_size;
-	unlock(mal.free_lock);
+	__unlock_fast(mal.free_lock);
 
 	self->next = BIN_TO_CHUNK(i);
 	self->prev = mal.bins[i].tail;



* Re: [PATCH 4/8] determine the existence of private futexes at the first thread creation
  2017-06-20 20:35 ` [PATCH 4/8] determine the existence of private futexes at the first thread creation Jens Gustedt
@ 2017-06-23 17:05   ` Rich Felker
  2017-06-23 17:16     ` Jens Gustedt
  2017-06-23 21:48     ` Jens Gustedt
  2017-06-24 22:23   ` [PATCH 4/8] determine the existence of private futexes at the first thread creation Rich Felker
  1 sibling, 2 replies; 27+ messages in thread
From: Rich Felker @ 2017-06-23 17:05 UTC (permalink / raw)
  To: musl

On Tue, Jun 20, 2017 at 10:35:28PM +0200, Jens Gustedt wrote:
> The current strategy to deal with kernels that don't implement private
> futexes is to
> 
>  - test if the call works with FUTEX_PRIVATE
>  - fall back to a call without it if it doesn't
> 
> This forces an overhead on both sides, Linux kernels with and without
> private futexes. For those with, it adds a superfluous branch instruction
> to all calls. For those without, it adds the whole overhead of an extra
> syscall.

This was intentional, the idea being that a 100% predictable branch in
a path where a syscall is being made anyway is much less expensive
than a GOT address load that gets hoisted all the way to the top of
the function and affects even code paths that don't need to make the
syscall. Whether it was a choice that makes sense overall, I'm not
sure, but that was the intent.

Rich



* Re: [PATCH 4/8] determine the existence of private futexes at the first thread creation
  2017-06-23 17:05   ` Rich Felker
@ 2017-06-23 17:16     ` Jens Gustedt
  2017-06-23 21:48     ` Jens Gustedt
  1 sibling, 0 replies; 27+ messages in thread
From: Jens Gustedt @ 2017-06-23 17:16 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl


Hello Rich,

On Fri, 23 Jun 2017 13:05:35 -0400 Rich Felker <dalias@libc.org> wrote:

> On Tue, Jun 20, 2017 at 10:35:28PM +0200, Jens Gustedt wrote:
> > The current strategie to deal with kernels that don't implement
> > private futexes is to
> > 
> >  - test if the call works with FUTEX_PRIVATE
> >  - fallback to a call without if it doesn't
> > 
> > This forces an overhead for both sides, Linux'es with and without
> > private futexes. For those with, it adds a superflouous branch
> > instruction to all calls. For those without, it add the whole call
> > overhead of a syscall.  
> 
> This was intentional, the idea being that a 100% predictable branch in
> a path where a syscall is being made anyway is much less expensive
> than a GOT address load that gets hoisted all the way to the top of
> the function and affects even code paths that don't need to make the
> syscall. Whether it was a choice that makes sense overall, I'm not
> sure, but that was the intent.

Ok, I see. I came to it because I noticed that it might have a real
impact on the inline optimization that was, e.g., suggested for __wake.

Also, as observed while doing this, the mixing of 128 and FUTEX_PRIVATE
in the original code is really disruptive. Not least because 128 is
also the (explicitly used) value of the flag for mutexes, where it has
exactly the opposite meaning: its presence means that the mutex is
shared.

Thanks
Jens

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::



* Re: [PATCH 4/8] determine the existence of private futexes at the first thread creation
  2017-06-23 17:05   ` Rich Felker
  2017-06-23 17:16     ` Jens Gustedt
@ 2017-06-23 21:48     ` Jens Gustedt
  2017-06-23 22:08       ` Rich Felker
  1 sibling, 1 reply; 27+ messages in thread
From: Jens Gustedt @ 2017-06-23 21:48 UTC (permalink / raw)
  To: musl


Hello Rich,

On Fri, 23 Jun 2017 13:05:35 -0400 Rich Felker <dalias@libc.org> wrote:

> This was intentional, the idea being that a 100% predictable branch in
> a path where a syscall is being made anyway is much less expensive
> than a GOT address load that gets hoisted all the way to the top of
> the function and affects even code paths that don't need to make the
> syscall. Whether it was a choice that makes sense overall, I'm not
> sure, but that was the intent.

So if we can avoid going through GOT, this would be better?
I'd just add ATTR_LIBC_VISIBILITY to the variable, and then this
should go away the same way as it is done for the libc object.
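
Concretely, that would amount to something like the following sketch
(ATTR_LIBC_VISIBILITY expands to the hidden-visibility attribute in
src/internal/libc.h):

extern int __futex_private ATTR_LIBC_VISIBILITY;  /* hidden: no GOT entry needed */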


Thanks
Jens

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::



* Re: [PATCH 4/8] determine the existence of private futexes at the first thread creation
  2017-06-23 21:48     ` Jens Gustedt
@ 2017-06-23 22:08       ` Rich Felker
  2017-06-23 23:42         ` Jens Gustedt
  0 siblings, 1 reply; 27+ messages in thread
From: Rich Felker @ 2017-06-23 22:08 UTC (permalink / raw)
  To: musl

On Fri, Jun 23, 2017 at 11:48:25PM +0200, Jens Gustedt wrote:
> Hello Rich,
> 
> On Fri, 23 Jun 2017 13:05:35 -0400 Rich Felker <dalias@libc.org> wrote:
> 
> > This was intentional, the idea being that a 100% predictable branch in
> > a path where a syscall is being made anyway is much less expensive
> > than a GOT address load that gets hoisted all the way to the top of
> > the function and affects even code paths that don't need to make the
> > syscall. Whether it was a choice that makes sense overall, I'm not
> > sure, but that was the intent.
> 
> So if we can avoid going through GOT, this would be better?
> I'd just add ATTR_LIBC_VISIBILITY to the variable, and then this
> should go away the same way as it is done for the libc object.

It's not going through the GOT that's costly, but actually getting the
GOT address, which is used for both accesses through the GOT and
GOT-relative addressing. On several archs including i386, PC-relative
addressing is not directly available and requires hacks to load the PC
into a GPR, and these usually take some cycles themselves and spill
out of the free call-clobbered registers so that additional stack
shuffling is needed.

Rich



* Re: [PATCH 4/8] determine the existence of private futexes at the first thread creation
  2017-06-23 22:08       ` Rich Felker
@ 2017-06-23 23:42         ` Jens Gustedt
  2017-06-24  0:53           ` Rich Felker
  0 siblings, 1 reply; 27+ messages in thread
From: Jens Gustedt @ 2017-06-23 23:42 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 2391 bytes --]

Hello Rich,

On Fri, 23 Jun 2017 18:08:23 -0400 Rich Felker <dalias@libc.org> wrote:

> On Fri, Jun 23, 2017 at 11:48:25PM +0200, Jens Gustedt wrote:
> > Hello Rich,
> > 
> > On Fri, 23 Jun 2017 13:05:35 -0400 Rich Felker <dalias@libc.org>
> > wrote: 
> > > This was intentional, the idea being that a 100% predictable
> > > branch in a path where a syscall is being made anyway is much
> > > less expensive than a GOT address load that gets hoisted all the
> > > way to the top of the function and affects even code paths that
> > > don't need to make the syscall. Whether it was a choice that
> > > makes sense overall, I'm not sure, but that was the intent.  
> > 
> > So if we can avoid going through GOT, this would be better?
> > I'd just add ATTR_LIBC_VISIBILITY to the variable, and then this
> > should go away the same way as it is done for the libc object.  
> 
> It's not going through the GOT that's costly, but actually getting the
> GOT address, which is used for both accesses through the GOT and
> GOT-relative addressing. On several archs including i386, PC-relative
> addressing is not directly available and requires hacks to load the PC
> into a GPR, and these usually take some cycles themselves and spill
> out of the free call-clobbered registers so that additional stack
> shuffling is needed.

So you are saying that when I add ATTR_LIBC_VISIBILITY
and see something like

	movslq	__futex_private(%rip), %rsi

in the assembly (instead of GOT stuff), this is actually going through
such a complicated mechanism? But then the same penalty applies to
members of the "libc" object, doesn't it? E.g. __lock accesses
"libc.threads_minus_1", which results in something like

	movl	12+__libc(%rip), %eax

In any case, all of this is probably not so important in view of the
system call that is happening right after. So let's just drop it.

What would you think of a patch that just cleans up the 128 vs
FUTEX_PRIVATE issue? Just to improve readability?
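
Nothing fancy, just spelling the flag out where the literal is used
today, e.g. in __wake and the new __futexwait (a sketch, assuming
FUTEX_PRIVATE is the internal macro for the value 128):

	if (priv) priv = FUTEX_PRIVATE;	/* instead of: if (priv) priv = 128; */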

Also there is this missing volatile in __get_locale.

Thanks
Jens

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::

[-- Attachment #2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/8] determine the existence of private futexes at the first thread creation
  2017-06-23 23:42         ` Jens Gustedt
@ 2017-06-24  0:53           ` Rich Felker
  2017-06-24  8:00             ` Jens Gustedt
  0 siblings, 1 reply; 27+ messages in thread
From: Rich Felker @ 2017-06-24  0:53 UTC (permalink / raw)
  To: musl

On Sat, Jun 24, 2017 at 01:42:20AM +0200, Jens Gustedt wrote:
> Hello Rich,
> 
> On Fri, 23 Jun 2017 18:08:23 -0400 Rich Felker <dalias@libc.org> wrote:
> 
> > On Fri, Jun 23, 2017 at 11:48:25PM +0200, Jens Gustedt wrote:
> > > Hello Rich,
> > > 
> > > On Fri, 23 Jun 2017 13:05:35 -0400 Rich Felker <dalias@libc.org>
> > > wrote: 
> > > > This was intentional, the idea being that a 100% predictable
> > > > branch in a path where a syscall is being made anyway is much
> > > > less expensive than a GOT address load that gets hoisted all the
> > > > way to the top of the function and affects even code paths that
> > > > don't need to make the syscall. Whether it was a choice that
> > > > makes sense overall, I'm not sure, but that was the intent.  
> > > 
> > > So if we can avoid going through GOT, this would be better?
> > > I'd just add ATTR_LIBC_VISIBILITY to the variable, and then this
> > > should go away the same way as it is done for the libc object.  
> > 
> > It's not going through the GOT that's costly, but actually getting the
> > GOT address, which is used for both accesses through the GOT and
> > GOT-relative addressing. On several archs including i386, PC-relative
> > addressing is not directly available and requires hacks to load the PC
> > into a GPR, and these usually take some cycles themselves and spill
> > out of the free call-clobbered registers so that additional stack
> > shuffling is needed.
> 
> So you are saying that when I add ATTR_LIBC_VISIBILITY
> and see something like
> 
> 	movslq	__futex_private(%rip), %rsi

i386 does not have (%rip). x86_64 is one of the archs with very
efficient PC-relative addressing.

> What would you think of a patch that just cleans up the 128 vs
> FUTEX_PRIVATE issue? Just to improve readability?

That seems like a good thing. Regarding the mutex flag, I thought some
places we depended on the 128 shared mutex flag being the inverse of
the FUTEX_PRIVATE flag (so ^128 translates between them); if this is
the case, do you have an elegant way to make it work?

> Also there is this missing volatile in __get_locale.

Nice catch.

Rich


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/8] determine the existence of private futexes at the first thread creation
  2017-06-24  0:53           ` Rich Felker
@ 2017-06-24  8:00             ` Jens Gustedt
  2017-06-24 10:12               ` flag 128 Jens Gustedt
  0 siblings, 1 reply; 27+ messages in thread
From: Jens Gustedt @ 2017-06-24  8:00 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

[-- Attachment #1: Type: text/plain, Size: 1405 bytes --]

Hello Rich,

On Fri, 23 Jun 2017 20:53:48 -0400 Rich Felker <dalias@libc.org> wrote:

> On Sat, Jun 24, 2017 at 01:42:20AM +0200, Jens Gustedt wrote:
> > What would you think of a patch that just cleans up the 128 vs
> > FUTEX_PRIVATE issue? Just to improve readability?  
> 
> That seems like a good thing. Regarding the mutex flag, I thought some
> places we depended on the 128 shared mutex flag being the inverse of
> the FUTEX_PRIVATE flag (so ^128 translates between them); if this is
> the case, do you have an elegant way to make it work?

The usages I found (I suppose this is all in src/thread/) only have
^128 to produce a "Boolean" int "priv" that is passed to __wake and
alike. There we always have something like

if (priv) priv = FUTEX_PRIVATE; (or 128)

In the case of __wake, which is inlined, the compiler might be able to
shortcut that conditional branch, but that should be independent of
whether we use the macro or the literal.
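
For illustration, the calling side then looks roughly like this (a
sketch from memory, the variable names are made up; only the shape of
the code matters):

	int priv = (type & 128) ^ 128;	/* 128 when not process-shared, 0 otherwise */
	__wake(addr, cnt, priv);	/* __wake only cares whether priv is non-zero */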

> > Also there is this missing volatile in __get_locale.  
> 
> Nice catch.

I'll prepare patches for both issues.

Thanks
Jens

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::

[-- Attachment #2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* flag 128
  2017-06-24  8:00             ` Jens Gustedt
@ 2017-06-24 10:12               ` Jens Gustedt
  2017-06-24 22:28                 ` Rich Felker
  0 siblings, 1 reply; 27+ messages in thread
From: Jens Gustedt @ 2017-06-24 10:12 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 728 bytes --]

There is also one weird use of flag 128 in sem_init. Semantically it
encodes "private" so the complement of a shared flag as used for
mutexes:

	sem->__val[2] = pshared ? 0 : 128;

Musl only uses this as a Boolean, so the semantics would be the same if
we'd just have

	sem->__val[2] = !pshared;

Do you have any recollection if the "128" was needed for
compatibility, or is this just an artefact?

Thanks
Jens

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::

[-- Attachment #2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/8] determine the existence of private futexes at the first thread creation
  2017-06-20 20:35 ` [PATCH 4/8] determine the existence of private futexes at the first thread creation Jens Gustedt
  2017-06-23 17:05   ` Rich Felker
@ 2017-06-24 22:23   ` Rich Felker
  1 sibling, 0 replies; 27+ messages in thread
From: Rich Felker @ 2017-06-24 22:23 UTC (permalink / raw)
  To: musl

On Tue, Jun 20, 2017 at 10:35:28PM +0200, Jens Gustedt wrote:
> diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
> index ca5c6b90..26945022 100644
> --- a/src/thread/pthread_create.c
> +++ b/src/thread/pthread_create.c
> @@ -10,6 +10,8 @@ void *__mmap(void *, size_t, int, int, int, off_t);
>  int __munmap(void *, size_t);
>  int __mprotect(void *, size_t, int);
>  
> +int __futex_private = 0;
> +
>  static void dummy_0()
>  {
>  }
> @@ -277,7 +279,13 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
>  	new->unblock_cancel = self->cancel;
>  	new->CANARY = self->CANARY;
>  
> -	a_inc(&libc.threads_minus_1);
> +	if (!a_fetch_add(&libc.threads_minus_1, 1)) {
> +          // As long as we only have one thread, test if this supports
> +          // private futexes.
> +          __lock_t dummy = { 0 };
> +          if (__syscall(SYS_futex, dummy.lc, FUTEX_WAKE|FUTEX_PRIVATE, 0) != -ENOSYS)
> +		__futex_private = FUTEX_PRIVATE;
> +        }
>  	ret = __clone((c11 ? start_c11 : start), stack, flags, new, &new->tid, TP_ADJ(new), &new->tid);

For future reference, this is also the wrong way to do "on first
thread creation" -- it will trigger every time the number of threads
changes from 1 to 2. There's already a block at the top of the
function for first thread creation, that sets up stdio for
multithreaded use and such; that would be the place to put this.
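
Roughly like this (only a sketch; I'm writing the guard as
!libc.threaded from memory, and the probe itself is essentially lifted
from your patch):

	if (!libc.threaded) {
		/* ... existing setup of stdio locks etc. for multithreaded use ... */
		int dummy = 0;
		if (__syscall(SYS_futex, &dummy, FUTEX_WAKE|FUTEX_PRIVATE, 0) != -ENOSYS)
			__futex_private = FUTEX_PRIVATE;
		libc.threaded = 1;
	}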

I was also momentarily concerned about whether there are cases where a
single-threaded process needs to be able to use the futex ops, but I
think it probably doesn't matter except for the not-private case
(pshared sync objects).

Rich


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: flag 128
  2017-06-24 10:12               ` flag 128 Jens Gustedt
@ 2017-06-24 22:28                 ` Rich Felker
  0 siblings, 0 replies; 27+ messages in thread
From: Rich Felker @ 2017-06-24 22:28 UTC (permalink / raw)
  To: musl

On Sat, Jun 24, 2017 at 12:12:23PM +0200, Jens Gustedt wrote:
> There is also one weird use of flag 128 in sem_init. Semantically it
> encodes "private" so the complement of a shared flag as used for
> mutexes:
> 
> 	sem->__val[2] = pshared ? 0 : 128;
> 
> Musl only uses this as a Boolean, so the semantics would be the same if
> we'd just have
> 
> 	sem->__val[2] = !pshared;
> 
> Do you have any recollection if the "128" was needed for
> compatibility, or is this just an artefact?

I think it was just an artefact of wanting to be able to pass it
directly to __wait and __wake and have them use it directly, which
they did a long time ago -- but the code was disabled because it
didn't work on kernels without private futex support. Ever since we've
actually used the private flag, I'm pretty sure they've always treated it
as a boolean.

Note that it is preferable not to break the representation of pshared
objects (for ABI-compat across static binaries with different
versions) but since the value is just passed to __wake/__wait, I don't
think it matters if it's 1 or 128. It might be best to just leave it
alone for now though -- fewer gratuitous changes to worry about, less
risk of breaking things.

Rich


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/8] (V2) a new lock algorithm with lock value and CS counts in the same atomic int
  2017-06-16  7:11 ` [PATCH 1/8] (V2) a new lock algorithm with lock value and CS counts in the same atomic int Jens Gustedt
@ 2017-12-20 21:58   ` Rich Felker
  2017-12-21 11:06     ` Jens Gustedt
  0 siblings, 1 reply; 27+ messages in thread
From: Rich Felker @ 2017-12-20 21:58 UTC (permalink / raw)
  To: musl

On Fri, Jun 16, 2017 at 09:11:35AM +0200, Jens Gustedt wrote:
> A variant of this new lock algorithm has been presented at SAC'16, see
> https://hal.inria.fr/hal-01304108. A full version of that paper is
> available at https://hal.inria.fr/hal-01236734.

Finally getting back to this, and trying to finish it in the current
(1.1.19) release cycle...

> The main motivation of this is to improve on the safety of the basic lock
> implementation in musl. This is achieved by squeezing lock value and
> "waiter" count into a single int. Thereby an unlock operation does
> exactly one memory transfer (a_fetch_add) and never touches the value
> again, but still detects if a waiter has to be woken up.
> 
> Another effect could be improved performance, but don't take this too
> seriously, these lock/unlock functions are probably never used in
> performance critical parts of libc.

Given this, I'm still mildly skeptical of the spin tuning and addition
of the new __futexwait inline with a single caller.

> Depending on the architecture, the performance of the previous lock
> strategy was not always so great under no or medium congestion. Now, in
> case of no congestion, there are exactly two memory operations, one to
> lock an one to unlock. So if we would start to use this same new strategy
> also for user locks such as mutexes, semaphores or the locks used for
> lockfull atomics, we might see a performance improvement.

Indeed, but it would only be usable for some cases (for mutexes, only
those without owner tracking), and seems like it would add some
complexity. I'm not opposed to looking into it at some point, but
let's get the internal lock stuff done first. When updating, can you
trim down the proposed commit message a bit (or do you mind if I do
so?) to focus on the change being made and the motivation without
going into lots of speculation about related changes that could be
made?

> Also, the previous algorithm could be terrible under high load. Key to
> the better performance is, again, the combination of lock value and
> "waiter" count into one atomic entity. This drastically reduces the
> memory transfers that have to be performed under load. In particular our
> main acquisition loop changes this atomic int exactly once, namely when
> the lock is acquired, and so the mutual disturbance between acquiring
> threads is reduced. Again, this is probably not an issue, because this
> lock/unlock algorithm is *not* used in places that arbitrarily hammer
> thousands of requests on the same poor lock. (A throughput test in thread
> creation shows about 50000 threads created per second on my machine, not
> enough that all of this could make a difference.)
> 
> The main price for the improved safety is a little bit larger code.
> 
> Also under high congestion, the scheduling behavior will be different
> compared to the previous algorithm. In that case, a successful
> put-to-sleep may appear out of order compared to the arrival in the
> critical section. I am not sure that this is very different from the
> previous behavior, nor that scheduling order for these lock primitives is
> anything that an application should ever count on.
> 
> For the patch itself, I found only one other place where knowledge of the
> internals of the lock algorithm is used. Here I patch this usage with the
> appropriate CAS, but this should perhaps be extracted as a __trylock
> function or, maybe even better, macro.

The context for this part changed slightly from commit
c1e27367a9b26b9baac0f37a12349fc36567c8b6, but reversion of that patch
could probably be merged with your patch since the old code is valid
once the lock implementation is changed.

> This is version 2 of the patch that takes some of the remarks of
> Alexander and Rich into account:
> 
>  - all lock values are now formulated with respect to INT_MIN
>  - unlock now uses __wake to wake up waiters
>  - a new inline function __futexwait with the same complexity as __wake
>    is use to set threads to sleep
>  - __unlock covers the case that no lock had been recorded
>  - the fast path of __lock is moved outside the loop in a prominent position
> ---
>  src/internal/pthread_impl.h |  6 ++++++
>  src/thread/__lock.c         | 42 +++++++++++++++++++++++++++++++++++++-----
>  src/thread/pthread_detach.c |  2 +-
>  3 files changed, 44 insertions(+), 6 deletions(-)
> 
> diff --git a/src/internal/pthread_impl.h b/src/internal/pthread_impl.h
> index 757b86ad..0622ad52 100644
> --- a/src/internal/pthread_impl.h
> +++ b/src/internal/pthread_impl.h
> @@ -132,6 +132,12 @@ static inline void __wake(volatile void *addr, int cnt, int priv)
>  	__syscall(SYS_futex, addr, FUTEX_WAKE|priv, cnt) != -ENOSYS ||
>  	__syscall(SYS_futex, addr, FUTEX_WAKE, cnt);
>  }
> +static inline void __futexwait(volatile void *addr, int val, int priv)
> +{
> +	if (priv) priv = 128;
> +	__syscall(SYS_futex, addr, FUTEX_WAIT|priv, val) != -ENOSYS ||
> +	__syscall(SYS_futex, addr, FUTEX_WAIT, val);
> +}
>  
>  void __acquire_ptc(void);
>  void __release_ptc(void);
> diff --git a/src/thread/__lock.c b/src/thread/__lock.c
> index 0874c04a..56092240 100644
> --- a/src/thread/__lock.c
> +++ b/src/thread/__lock.c
> @@ -2,14 +2,46 @@
>  
>  void __lock(volatile int *l)
>  {
> -	if (libc.threads_minus_1)
> -		while (a_swap(l, 1)) __wait(l, l+1, 1, 1);
> +	if (!libc.threads_minus_1) return;
> +        /* fast path: INT_MIN for holding the lock, +1 to count this
> +           thread in the critical section. */

There seems to be some mixing of tabs and spaces here and in a few
other lines. Tabs should be used for indentation levels, spaces (after
any initial indent) to column-align. Generally musl style also has
continuation comment lines begin with a * aligned with the * of the
opening /* too.

> +	int current = a_cas(l, 0, INT_MIN + 1);
> +        if (!current) return;
> +	/* A first spin lock acquisition loop, for the case of
> +	   medium congestion. */
> +	for (unsigned i = 0; i < 10; ++i) {
> +		if (current < 0) current -= INT_MIN + 1;
> +		// assertion: current >= 0
> +		int val = a_cas(l, current, INT_MIN + (current + 1));
> +		if (val == current) return;
> +		current = val;
> +	}
> +	// Spinning failed, so mark ourselves as being inside the CS.
> +	current = a_fetch_add(l, 1) + 1;
> +	/* The main lock acquisition loop for heavy congestion. The only
> +	   change to the value performed inside that loop is a successful
> +	   lock via the CAS that acquires the lock. */
> +	for (;;) {
> +		/* We can only go into wait, if we know that somebody holds the
> +		   lock and will eventually wake us up, again. */
> +		if (current < 0) {
> +			__futexwait(l, current, 1);
> +			current -= INT_MIN + 1;
> +		}
> +		/* assertion: current > 0, because the count
> +		   includes us already. */
> +		int val = a_cas(l, current, INT_MIN + current);
> +		if (val == current) return;
> +		current = val;
> +	}
>  }

As long as you're including comments, which I think makes sense given
that the algorithm is somewhat nontrivial, it might be nice to add a
quick exposition on the states. As I understand them:

- v==0 unlocked, no waiters
- v>=0 unlocked, with v waiters contending
- v<0  locked, with v-INT_MIN-1 waiters

One thing I don't recall understanding is why the lock owner is
"included" in the waiter count, i.e. why v<0 indicates v-INT_MIN-1
waiters rather than v-INT_MIN waiters. Is there some race it's
avoiding? Sorry if this is something obvious I knew at one time; it's
just been a long time since I visited this code due to the
very-drawn-out 1.1.17 release cycle.

>  void __unlock(volatile int *l)
>  {
> -	if (l[0]) {
> -		a_store(l, 0);
> -		if (l[1]) __wake(l, 1, 1);
> +	/* We have to check if l[0] had been touched at all. */

I don't understand this comment.

> +	if (l[0] < 0) {
> +		if (a_fetch_add(l, -(INT_MIN + 1)) != (INT_MIN + 1)) {
> +			__wake(l, 1, 1);
> +		}
>  	}
>  }

This would be a candidate for single if with &&, but I'm not sure if
it's more or less legible that way. I don't have a strong preference.

> diff --git a/src/thread/pthread_detach.c b/src/thread/pthread_detach.c
> index ed77f74d..818626ed 100644
> --- a/src/thread/pthread_detach.c
> +++ b/src/thread/pthread_detach.c
> @@ -6,7 +6,7 @@ int __pthread_join(pthread_t, void **);
>  static int __pthread_detach(pthread_t t)
>  {
>  	/* Cannot detach a thread that's already exiting */
> -	if (a_swap(t->exitlock, 1))
> +	if (a_cas(t->exitlock, 0, -INT_MAX))
>  		return __pthread_join(t, 0);
>  	t->detached = 2;
>  	__unlock(t->exitlock);

As noted above this needs slight update to revert the commit that
worked around the use-after-free bug.

Overall I think it looks pretty much good to go, aside from minor
style things. I'm open to your ideas on __futexwait, maybe just
writing it inline here since it's only used in one place; while I
don't like duplicating what __wait already does, __wait is mildly big
and using your __futexwait version instead will probably offset the
full size increase of the new lock (and maybe more) in programs that
have no reason other than __lock to pull in __wait.

Rich


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/8] (V2) a new lock algorithm with lock value and CS counts in the same atomic int
  2017-12-20 21:58   ` Rich Felker
@ 2017-12-21 11:06     ` Jens Gustedt
  2017-12-21 23:34       ` Rich Felker
  0 siblings, 1 reply; 27+ messages in thread
From: Jens Gustedt @ 2017-12-21 11:06 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 5176 bytes --]

Hello Rich,

On Wed, 20 Dec 2017 16:58:27 -0500, Rich Felker wrote:

> On Fri, Jun 16, 2017 at 09:11:35AM +0200, Jens Gustedt wrote:
> > A variant of this new lock algorithm has been presented at SAC'16,
> > see https://hal.inria.fr/hal-01304108. A full version of that paper
> > is available at https://hal.inria.fr/hal-01236734.  
> 
> Finally getting back to this, and trying to finish it in the current
> (1.1.19) release cycle...

great, I appreciate it

since for me, too, this was half a year ago, I have to wrap my head
around it and look into the details again.

> When updating, can you trim down the proposed commit message a bit
> (or do you mind if I do so?) to focus on the change being made and
> the motivation without going into lots of speculation about related
> changes that could be made?

sure, I'll do that, the patches were clearly not supposed to be final
anyhow

> The context for this part changed slightly from commit
> c1e27367a9b26b9baac0f37a12349fc36567c8b6, but reversion of that patch
> could probably be merged with your patch since the old code is valid
> once the lock implementation is changed.

I'll look into that particular one.

> >  void __lock(volatile int *l)
> >  {
> > -	if (libc.threads_minus_1)
> > -		while (a_swap(l, 1)) __wait(l, l+1, 1, 1);
> > +	if (!libc.threads_minus_1) return;
> > +        /* fast path: INT_MIN for holding the lock, +1 to count
> > this
> > +           thread in the critical section. */  
> 
> There seems to be some mixing of tabs and spaces here and in a few
> other lines. Tabs should be used for indentation levels, spaces (after
> any initial indent) to column-align. Generally musl style also has
> continuation comment lines begin with a * aligned with the * of the
> opening /* too.

Since I ran into similar comments before and I find it quite
frustrating and disruptive:

musl style is just not what my editor is tuned to, and I wouldn't
change that, just for musl.

Is there a tool (and command line arguments) that you could recommend,
and that one could just run before submitting any patches?

In my own projects I usually use astyle, which does a pretty good
job. Would you be able to release a list of astyle (or indent or
whatever) arguments that format code in a style accepted for musl?

> As long as you're including comments, which I think makes sense given
> that the algorithm is somewhat nontrivial, it might be nice to add a
> quick exposition on the states.

yes, very good idea

> As I understand them:
> 
> - v==0 unlocked, no waiters
> - v>=0 unlocked, with v waiters contending  
> - v<0  locked, with v-INT_MIN-1 waiters

sort of, see below

> One thing I don't recall understanding is why the lock owner is
> "included" in the waiter count, i.e. why v<0 indicates v-INT_MIN-1
> waiters rather than v-INT_MIN waiters. Is there some race it's
> avoiding?

yes.

The counter effectively is a congestion counter, not a wait counter,
that is, it counts all threads that are inside the critical section.

This POV avoids a race around sending waiters to sleep: now that the
'int' of the futex contains all the state, if this were a wait
count, waiters that want to go to sleep would have to change the
very state to which the sleep request is atomically bound.

As a consequence, under high congestion, many futex_wait calls could
fail because the 'int' changes too often. There is even potential for
some sort of ping-pong where threads hammer on the wait counter and
are never able to fall asleep because everybody is shouting their
"good night" - "good morning" messages in a loop.

Using the counter as a congestion counter avoids that. The fact that a
thread on the slow path wants to sleep doesn't change the state, and
thus the futex_wait calls have better chances to succeed. And at the
other end the unlock function can operate on that effectively, because
we can change the lock bit and the counter in one atomic operation.

If you are interested, the paper that I mentioned contains a proof that
such a system eventually calms down.

In summary, with a congestion count n we have at most 3xn changes to
the atomic state in total. With a wait count we have at least as many,
with no upper bound.
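
To make the encoding explicit (this is only a restatement of the above
with made-up helper names, not code from the patch):

	#include <limits.h>

	/* v == 0            lock free, nobody in the critical section
	 * v == n, 0 < n     lock free, n threads contending inside the CS
	 * v == INT_MIN + n  lock held, n threads inside the CS, including the holder */
	static inline int lk_is_locked(int v)  { return v < 0; }
	static inline int lk_congestion(int v) { return v < 0 ? v - INT_MIN : v; }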

> Overall I think it looks pretty much good to go, aside from minor
> style things. I'm open to your ideas on __futexwait, maybe just
> writing it inline here since it's only used in one place; while I
> don't like duplicating what __wait already does, __wait is mildly big
> and using your __futexwait version instead will probably offset the
> full size increase of the new lock (and maybe more) in programs that
> have no reason other than __lock to pull in __wait.

I'll look into all of that, but I might need some days to do so.

Thanks
Jens


-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/8] (V2) a new lock algorithm with lock value and CS counts in the same atomic int
  2017-12-21 11:06     ` Jens Gustedt
@ 2017-12-21 23:34       ` Rich Felker
  2018-01-03 14:08         ` automated coding style Jens Gustedt
  0 siblings, 1 reply; 27+ messages in thread
From: Rich Felker @ 2017-12-21 23:34 UTC (permalink / raw)
  To: musl

On Thu, Dec 21, 2017 at 12:06:55PM +0100, Jens Gustedt wrote:
> Hello Rich,
> 
> On Wed, 20 Dec 2017 16:58:27 -0500, Rich Felker wrote:
> 
> > On Fri, Jun 16, 2017 at 09:11:35AM +0200, Jens Gustedt wrote:
> > > A variant of this new lock algorithm has been presented at SAC'16,
> > > see https://hal.inria.fr/hal-01304108. A full version of that paper
> > > is available at https://hal.inria.fr/hal-01236734.  
> > 
> > Finally getting back to this, and trying to finish it in the current
> > (1.1.19) release cycle...
> 
> great, I appreciate it
> 
> since for me, too, this was half a year ago, I have to wrap my head
> around it and look into the details again.

Sure, no problem.

> > >  void __lock(volatile int *l)
> > >  {
> > > -	if (libc.threads_minus_1)
> > > -		while (a_swap(l, 1)) __wait(l, l+1, 1, 1);
> > > +	if (!libc.threads_minus_1) return;
> > > +        /* fast path: INT_MIN for holding the lock, +1 to count
> > > this
> > > +           thread in the critical section. */  
> > 
> > There seems to be some mixing of tabs and spaces here and in a few
> > other lines. Tabs should be used for indentation levels, spaces (after
> > any initial indent) to column-align. Generally musl style also has
> > continuation comment lines begin with a * aligned with the * of the
> > opening /* too.
> 
> Since I ran into similar comments before and I find it quite
> frustrating and disruptive:

By similar comments do you mean existing comments in the musl source
styled like you did it, or similar email comments before about styling
matters?

> musl style is just not what my editor is tuned to, and I wouldn't
> change that, just for musl.
> 
> Is there a tool (and command line arguments) that you could recommend,
> and that one could just run before submitting any patches?
> 
> In my own projects I usually use astyle, which does a pretty good
> job. Would you be able to release a list of astyle (or indent or
> whatever) arguments that format code in a style accepted for musl?

I'm not familiar with these tools, so I could look into it, but
someone else interested might be able to get to it before me. I'll
see. The way I deal with touching code in lots of different projects
with different styles is having my editor configured to copy exactly
the leading whitespace of the previous line when adding a new line,
and then adding or deleting tabs or spaces as appropriate. But I can
see how that's not agreeable to everyone.

Actually now that I think about it, I'm pretty sure the official Linux
kernel style is nearly the same as preferred style in musl, and should
usually produce formatting that's perfectly acceptable.

> > As long as you're including comments, which I think makes sense given
> > that the algorithm is somewhat nontrivial, it might be nice to add a
> > quick exposition on the states.
> 
> yes, very good idea
> 
> > As I understand them:
> > 
> > - v==0 unlocked, no waiters
> > - v>=0 unlocked, with v waiters contending  
> > - v<0  locked, with v-INT_MIN-1 waiters
> 
> sort of, see below
> 
> > One thing I don't recall understanding is why the lock owner is
> > "included" in the waiter count, i.e. why v<0 indicates v-INT_MIN-1
> > waiters rather than v-INT_MIN waiters. Is there some race it's
> > avoiding?
> 
> yes.
> 
> The counter effectively is a congestion counter, not a wait counter,
> that is, it counts all threads that are inside the critical section.
> 
> This POV avoids a race around sending waiters to sleep: now that the
> 'int' of the futex contains all the state, if this were a wait
> count, waiters that want to go to sleep would have to change the
> very state to which the sleep request is atomically bound.

I see. I actually had in mind counting threads that were spinning but
not yet in the futex wait as "waiters", so in that case I think the
only difference between what you're doing and what I had in mind as an
alternative is that the thread which owns the lock still gets counted
even after acquiring it rather than subtracting itself out at the same
time as acquiring the lock. But I don't think there's any performance
or safety/lock-properties difference either way; it seems to just be a
notational difference.

> As a consequence, under high congestion, many futex_wait calls could
> fail because the 'int' changes too often. There is even potential for
> some sort of ping-pong where threads hammer on the wait counter and
> are never able to fall asleep because everybody is shouting their
> "good night" - "good morning" messages in a loop.
> 
> Using the counter as a congestion counter avoids that. The fact that a
> thread on the slow path wants to sleep doesn't change the state, and
> thus the futex_wait calls have better chances to succeed. And at the
> other end the unlock function can operate on that effectively, because
> we can change the lock bit and the counter in one atomic operation.
> 
> If you are interested, the paper that I mentioned contains a proof that
> such a system eventually calms down.
> 
> In summary, with a congestion count n we have at most 3xn changes to
> the atomic state in total. With a wait count we have at least as many,
> with no upper bound.

Yes, this is a very nice property. I think I actually read the paper
or something preliminary you'd written up on it previously (when you
first proposed the new lock) but it's been so long that I may be
misremembering.

> > Overall I think it looks pretty much good to go, aside from minor
> > style things. I'm open to your ideas on __futexwait, maybe just
> > writing it inline here since it's only used in one place; while I
> > don't like duplicating what __wait already does, __wait is mildly big
> > and using your __futexwait version instead will probably offset the
> > full size increase of the new lock (and maybe more) in programs that
> > have no reason other than __lock to pull in __wait.
> 
> I'll look into all of that, but I might need some days to do so.

The more I think about it the more I'm leaning towards avoiding musl's
existing __wait function here, like what you're doing, and maybe even
phasing out the current __wait and __timedwait since they seem to have
bad worst-case performance properties (unbounded changing of shared
memory under contention). Moving the waiter counting out to the
calling lock function seems like the right way to take things. So for
now it's just a question of whether to keep __futexwait as a function
like you did, and if we're going to use it more in the future, the
answer is probably yes.

Rich


^ permalink raw reply	[flat|nested] 27+ messages in thread

* automated coding style
  2017-12-21 23:34       ` Rich Felker
@ 2018-01-03 14:08         ` Jens Gustedt
  2018-01-11  4:41           ` Samuel Holland
  0 siblings, 1 reply; 27+ messages in thread
From: Jens Gustedt @ 2018-01-03 14:08 UTC (permalink / raw)
  Cc: musl

[-- Attachment #1: Type: text/plain, Size: 2142 bytes --]

Hello Rich,

On Thu, 21 Dec 2017 18:34:02 -0500 Rich Felker <dalias@libc.org> wrote:

> On Thu, Dec 21, 2017 at 12:06:55PM +0100, Jens Gustedt wrote:

> > > There seems to be some mixing of tabs and spaces here and in a few
> > > other lines. Tabs should be used for indentation levels, spaces
> > > (after any initial indent) to column-align. Generally musl style
> > > also has continuation comment lines begin with a * aligned with
> > > the * of the opening /* too.  
> > 
> > Since I ran into similar comments before and I find it quite
> > frustrating and disruptive:  
> 
> By similar comments do you mean existing comments in the musl source
> styled like you did it, or similar email comments before about styling
> matters?

comments about style.

I think an automatic formatting procedure that accommodates everybody
could help us focus on content

> I'm not familiar with these tools, so I could look into it, but
> someone else interested might be able to get to it before me. I'll
> see. The way I deal with touching code in lots of different projects
> with different styles is having my editor configured to copy exactly
> the leading whitespace of the previous line when adding a new line,
> and then adding or deleting tabs or spaces as appropriate. But I can
> see how that's not agreeable to everyone.
> 
> Actually now that I think about it, I'm pretty sure the official Linux
> kernel style is nearly the same as preferred style in musl, and should
> usually produce formatting that's perfectly acceptable.

so for my latest patches I experimented with

astyle --style=linux --indent=tab

This helps a lot, already. Unfortunately, this combination still
changes some musl sources, basically because indentation of
continuation lines is handled differently.

Thanks
Jens

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: automated coding style
  2018-01-03 14:08         ` automated coding style Jens Gustedt
@ 2018-01-11  4:41           ` Samuel Holland
  2018-01-11  8:28             ` Jens Gustedt
  0 siblings, 1 reply; 27+ messages in thread
From: Samuel Holland @ 2018-01-11  4:41 UTC (permalink / raw)
  To: musl, Jens Gustedt

[-- Attachment #1: Type: text/plain, Size: 3318 bytes --]

Hello,

On 01/03/18 08:08, Jens Gustedt wrote:
> On Thu, 21 Dec 2017 18:34:02 -0500 Rich Felker <dalias@libc.org> wrote:
>> On Thu, Dec 21, 2017 at 12:06:55PM +0100, Jens Gustedt wrote:
>>>> There seems to be some mixing of tabs and spaces here and in a few 
>>>> other lines. Tabs should be used for indentation levels, spaces (after 
>>>> any initial indent) to column-align. Generally musl style also has 
>>>> continuation comment lines begin with a * aligned with the * of the 
>>>> opening /* too.
>>> 
>>> Since I ran into similar comments before and I find it quite frustrating 
>>> and disruptive:
>> 
>> By similar comments do you mean existing comments in the musl source
>> styled like you did it, or similar email comments before about styling
>> matters?
> 
> comments about style.
> 
> I think an automatic formatting procedure that accommodates everybody could
> help us focus on content
> 
>> I'm not familiar with these tools, so I could look into it, but someone 
>> else interested might be able to get to it before me. I'll see. The way I 
>> deal with touching code in lots of different projects with different
>> styles is having my editor configured to copy exactly the leading
>> whitespace of the previous line when adding a new line, and then adding or
>> deleting tabs or spaces as appropriate. But I can see how that's not
>> agreeable to everyone.
>> 
>> Actually now that I think about it, I'm pretty sure the official Linux 
>> kernel style is nearly the same as preferred style in musl, and should 
>> usually produce formatting that's perfectly acceptable.
> 
> so for my latest patches I experimented with
> 
> astyle --style=linux --indent=tab
> 
> This helps a lot, already. Unfortunately, this combination still changes
> some musl sources, basically because indentation of continuation lines is
> handled differently.

I'm familiar with three source code formatting tools for C: astyle,
clang-format, and uncrustify. I haven't used astyle, because it seems to be the
least flexible of the three.

clang-format is very stable and consistent, and has a moderate number of
configuration options. However, because it is designed for use with C++, it has
poor handling of C braced initializers, especially designated initializers. It
also is unable to align macro definitions, making it unusable on musl's headers.
The changeset for this feature has been open for over a year and doesn't look
like it will be merged soon: https://reviews.llvm.org/D28462

uncrustify is the most flexible of the three, making it the best for matching an
existing code style. It's also a bit buggier than clang-format, especially when
the tool wraps long lines for you (though this doesn't affect code already
following a line length limit).

I wrote a script to permute over 600 of the options to uncrustify and generate a
configuration with the smallest diff from the current musl source. Attached is
that configuration file, as well as a diffstat from 628cf979b249. You can see
most of the changes are in the inline assembly syntax, continuation of header
namespace guards, and the imported code in src/math and src/regex.

I don't claim that this file represents the "official musl style", especially
because it's autogenerated, but hopefully it's useful for formatting patches.

Regards,
Samuel

[-- Attachment #2: uncrustify --]
[-- Type: text/plain, Size: 15471 bytes --]

align_asm_colon=False
align_assign_span=0
align_assign_thresh=0
align_enum_equ_span=0
align_enum_equ_thresh=0
align_func_params=True
align_func_params_gap=1
align_func_params_span=8
align_func_params_thresh=2
align_func_proto_gap=0
align_func_proto_span=0
align_keep_extra_space=False
align_keep_tabs=True
align_left_shift=False
align_mix_var_proto=False
align_nl_cont=False
align_number_right=False
align_oc_decl_colon=False
align_oc_msg_colon_first=False
align_oc_msg_colon_span=0
align_oc_msg_spec_span=0
align_on_operator=False
align_on_tabstop=False
align_pp_define_gap=0
align_pp_define_span=0
align_pp_define_together=False
align_right_cmt_at_col=0
align_right_cmt_gap=0
align_right_cmt_mix=False
align_right_cmt_span=0
align_same_func_call_params=False
align_single_line_brace=False
align_single_line_brace_gap=0
align_single_line_func=False
align_struct_init_span=0
align_typedef_amp_style=0
align_typedef_func=0
align_typedef_gap=0
align_typedef_span=0
align_typedef_star_style=0
align_var_class_gap=0
align_var_class_span=0
align_var_class_thresh=0
align_var_def_amp_style=0
align_var_def_attribute=False
align_var_def_colon=False
align_var_def_colon_gap=0
align_var_def_gap=0
align_var_def_inline=False
align_var_def_span=0
align_var_def_star_style=0
align_var_def_thresh=0
align_var_struct_gap=0
align_var_struct_span=0
align_var_struct_thresh=0
align_with_tabs=False
cmt_c_group=False
cmt_c_nl_end=False
cmt_c_nl_start=False
cmt_convert_tab_to_spaces=False
cmt_cpp_group=False
cmt_cpp_nl_end=False
cmt_cpp_nl_start=False
cmt_cpp_to_c=False
cmt_indent_multi=False
cmt_insert_before_ctor_dtor=False
cmt_insert_before_inlines=True
cmt_insert_before_preproc=False
cmt_insert_class_header=""
cmt_insert_file_footer=""
cmt_insert_file_header=""
cmt_insert_func_header=""
cmt_insert_oc_msg_header=""
cmt_multi_check_last=True
cmt_multi_first_len_minimum=4
cmt_reflow_mode=0
cmt_sp_after_star_cont=0
cmt_sp_before_star_cont=0
cmt_star_cont=False
cmt_width=0
code_width=0
disable_processing_cmt=""
eat_blanks_after_open_brace=False
eat_blanks_before_close_brace=True
enable_digraphs=False
enable_processing_cmt=""
force_tab_after_define=False
include_category_0=""
include_category_1=""
include_category_2=""
indent_access_spec=4
indent_access_spec_body=False
indent_align_assign=False
indent_align_string=False
indent_bool_paren=False
indent_brace=0
indent_brace_parent=false
indent_braces=false
indent_braces_no_class=false
indent_braces_no_func=false
indent_braces_no_struct=false
indent_case_brace=2
indent_case_shift=0
indent_class=False
indent_class_colon=False
indent_class_on_colon=False
indent_cmt_with_tabs=False
indent_col1_comment=False
indent_columns=8
indent_comma_paren=False
indent_constr_colon=False
indent_continue=0
indent_cpp_lambda_body=False
indent_cs_delegate_brace=False
indent_ctor_init=0
indent_ctor_init_leading=2
indent_else_if=False
indent_extern=False
indent_first_bool_expr=False
indent_func_call_param=True
indent_func_class_param=False
indent_func_const=0
indent_func_ctor_var_param=False
indent_func_def_force_col1=False
indent_func_def_param=True
indent_func_param_double=False
indent_func_proto_param=True
indent_func_throw=0
indent_ignore_asm_block=True
indent_label=1
indent_member=0
indent_min_vbrace_open=0
indent_namespace=False
indent_namespace_level=0
indent_namespace_limit=0
indent_namespace_single_indent=False
indent_oc_block=False
indent_oc_block_msg=0
indent_oc_block_msg_from_brace=False
indent_oc_block_msg_from_caret=False
indent_oc_block_msg_from_colon=False
indent_oc_block_msg_from_keyword=False
indent_oc_block_msg_xcode_style=False
indent_oc_msg_colon=0
indent_oc_msg_prioritize_first_colon=True
indent_param=0
indent_paren_after_func_call=False
indent_paren_after_func_decl=False
indent_paren_after_func_def=False
indent_paren_close=2
indent_paren_nl=False
indent_paren_open_brace=False
indent_preserve_sql=False
indent_relative_single_line_comments=False
indent_shift=False
indent_sing_line_comments=0
indent_square_nl=False
indent_switch_case=0
indent_switch_pp=True
indent_template_param=False
indent_ternary_operator=0
indent_token_after_brace=False
indent_using_block=True
indent_var_def_blk=0
indent_var_def_cont=False
indent_vbrace_open_on_tabstop=False
indent_with_tabs=2
indent_xml_string=0
input_tab_size=8
ls_code_width=false
ls_for_split_full=False
ls_func_split_full=False
mod_add_long_class_closebrace_comment=0
mod_add_long_function_closebrace_comment=0
mod_add_long_ifdef_else_comment=0
mod_add_long_ifdef_endif_comment=0
mod_add_long_namespace_closebrace_comment=0
mod_add_long_switch_closebrace_comment=0
mod_case_brace=Remove
mod_full_brace_do=ignore
mod_full_brace_for=ignore
mod_full_brace_function=Force
mod_full_brace_if=ignore
mod_full_brace_if_chain=false
mod_full_brace_if_chain_only=false
mod_full_brace_nl=2
mod_full_brace_nl_block_rem_mlcond=False
mod_full_brace_using=Force
mod_full_brace_while=Remove
mod_full_paren_if_bool=False
mod_move_case_break=False
mod_paren_on_return=ignore
mod_pawn_semicolon=False
mod_remove_empty_return=False
mod_remove_extra_semicolon=False
mod_sort_import=False
mod_sort_include=False
mod_sort_oc_properties=False
mod_sort_oc_property_class_weight=0
mod_sort_oc_property_getter_weight=0
mod_sort_oc_property_nullability_weight=0
mod_sort_oc_property_readwrite_weight=0
mod_sort_oc_property_reference_weight=0
mod_sort_oc_property_setter_weight=0
mod_sort_oc_property_thread_safe_weight=0
mod_sort_using=False
newlines=LF
nl_after_access_spec=0
nl_after_annotation=Force
nl_after_brace_close=False
nl_after_brace_open=False
nl_after_brace_open_cmt=False
nl_after_case=False
nl_after_class=0
nl_after_do=ignore
nl_after_for=ignore
nl_after_func_body=0
nl_after_func_body_class=0
nl_after_func_body_one_liner=0
nl_after_func_class_proto=0
nl_after_func_class_proto_group=0
nl_after_func_proto=0
nl_after_func_proto_group=0
nl_after_if=ignore
nl_after_label_colon=False
nl_after_multiline_comment=False
nl_after_return=False
nl_after_semicolon=False
nl_after_square_assign=Force
nl_after_struct=0
nl_after_switch=ignore
nl_after_synchronized=Force
nl_after_try_catch_finally=0
nl_after_vbrace_close=False
nl_after_vbrace_open=False
nl_after_vbrace_open_empty=False
nl_after_while=ignore
nl_around_cs_property=0
nl_assign_brace=Remove
nl_assign_leave_one_liners=False
nl_assign_square=Force
nl_before_access_spec=0
nl_before_block_comment=0
nl_before_c_comment=0
nl_before_case=False
nl_before_class=0
nl_before_cpp_comment=0
nl_before_do=ignore
nl_before_for=ignore
nl_before_func_body_def=0
nl_before_func_body_proto=0
nl_before_func_class_def=0
nl_before_func_class_proto=0
nl_before_if=ignore
nl_before_if_closing_paren=ignore
nl_before_return=False
nl_before_switch=ignore
nl_before_synchronized=Force
nl_before_throw=Force
nl_before_while=ignore
nl_between_annotation=Force
nl_between_get_set=0
nl_brace_brace=Remove
nl_brace_catch=Force
nl_brace_else=Remove
nl_brace_finally=Force
nl_brace_fparen=Remove
nl_brace_square=Force
nl_brace_struct_var=Remove
nl_brace_while=Remove
nl_case_colon_brace=Remove
nl_catch_brace=Force
nl_class_brace=Force
nl_class_colon=Force
nl_class_init_args=Force
nl_class_leave_one_liners=False
nl_collapse_empty_body=False
nl_comment_func_def=0
nl_constr_colon=Force
nl_constr_init_args=Force
nl_cpp_lambda_leave_one_liners=False
nl_cpp_ldef_brace=Force
nl_create_for_one_liner=False
nl_create_if_one_liner=False
nl_create_while_one_liner=False
nl_define_macro=True
nl_do_brace=Remove
nl_ds_struct_enum_close_brace=False
nl_ds_struct_enum_cmt=False
nl_else_brace=Remove
nl_else_if=Remove
nl_elseif_brace=Remove
nl_end_of_file=ignore
nl_end_of_file_min=0
nl_enum_brace=Remove
nl_enum_class=Force
nl_enum_class_identifier=Force
nl_enum_colon_type=Force
nl_enum_identifier_colon=Force
nl_enum_leave_one_liners=False
nl_enum_own_lines=ignore
nl_fcall_brace=Force
nl_fdef_brace=ignore
nl_finally_brace=Force
nl_for_brace=Remove
nl_func_call_args_multi_line=False
nl_func_call_empty=Remove
nl_func_call_end_multi_line=False
nl_func_call_paren=Remove
nl_func_call_paren_empty=ignore
nl_func_call_start_multi_line=False
nl_func_class_scope=Force
nl_func_decl_args=ignore
nl_func_decl_args_multi_line=false
nl_func_decl_empty=Remove
nl_func_decl_end=Remove
nl_func_decl_end_multi_line=false
nl_func_decl_end_single=ignore
nl_func_decl_start=ignore
nl_func_decl_start_multi_line=false
nl_func_decl_start_single=ignore
nl_func_def_args=ignore
nl_func_def_args_multi_line=false
nl_func_def_empty=Remove
nl_func_def_end=Remove
nl_func_def_end_multi_line=false
nl_func_def_end_single=ignore
nl_func_def_paren=Remove
nl_func_def_paren_empty=ignore
nl_func_def_start=Remove
nl_func_def_start_multi_line=false
nl_func_def_start_single=ignore
nl_func_leave_one_liners=True
nl_func_paren=Remove
nl_func_paren_empty=ignore
nl_func_proto_type_name=ignore
nl_func_scope_name=Force
nl_func_type_name=ignore
nl_func_type_name_class=ignore
nl_func_var_def_blk=0
nl_getset_brace=Force
nl_getset_leave_one_liners=False
nl_if_brace=Remove
nl_if_leave_one_liners=True
nl_max=0
nl_max_blank_in_func=0
nl_multi_line_cond=False
nl_multi_line_define=false
nl_namespace_brace=Force
nl_oc_block_brace=Force
nl_oc_msg_args=False
nl_oc_msg_leave_one_liner=False
nl_paren_dbrace_open=Force
nl_property_brace=Force
nl_remove_extra_newlines=0
nl_return_expr=Remove
nl_scope_brace=Force
nl_split_for_one_liner=False
nl_split_if_one_liner=False
nl_split_while_one_liner=False
nl_squeeze_ifdef=False
nl_squeeze_ifdef_top_level=False
nl_start_of_file=Force
nl_start_of_file_min=0
nl_struct_brace=ignore
nl_switch_brace=Remove
nl_synchronized_brace=Force
nl_template_class=Force
nl_try_brace=Force
nl_type_brace_init_lst=Force
nl_type_brace_init_lst_close=Force
nl_type_brace_init_lst_open=Force
nl_typedef_blk_end=0
nl_typedef_blk_in=0
nl_typedef_blk_start=0
nl_union_brace=ignore
nl_unittest_brace=Force
nl_using_brace=Force
nl_var_def_blk_end=0
nl_var_def_blk_in=0
nl_var_def_blk_start=0
nl_version_brace=Force
nl_while_brace=ignore
nl_while_leave_one_liners=False
output_tab_size=8
pos_arith=ignore
pos_assign=Trail
pos_bool=ignore
pos_class_colon=Trail
pos_class_comma=Trail
pos_comma=Trail
pos_compare=Lead
pos_conditional=ignore
pos_constr_colon=Trail
pos_constr_comma=Trail
pos_enum_comma=Trail
pp_define_at_level=False
pp_if_indent_code=False
pp_ignore_define_body=True
pp_indent=ignore
pp_indent_at_level=false
pp_indent_brace=true
pp_indent_case=true
pp_indent_count=1
pp_indent_extern=true
pp_indent_func_def=true
pp_indent_if=0
pp_indent_region=0
pp_region_indent_code=False
pp_space=Remove
pp_space_count=8
sp_addr=Remove
sp_after_angle=Force
sp_after_assign=ignore
sp_after_byref=Force
sp_after_byref_func=Force
sp_after_cast=ignore
sp_after_class_colon=Force
sp_after_comma=ignore
sp_after_constr_colon=Force
sp_after_dc=Force
sp_after_for_colon=Force
sp_after_invariant_paren=Force
sp_after_mdatype_commas=Force
sp_after_new=Force
sp_after_newop_paren=Force
sp_after_oc_at_sel=Force
sp_after_oc_at_sel_parens=Force
sp_after_oc_block_caret=Force
sp_after_oc_colon=Force
sp_after_oc_dict_colon=Force
sp_after_oc_msg_receiver=Force
sp_after_oc_property=Force
sp_after_oc_return_type=Force
sp_after_oc_scope=Force
sp_after_oc_type=Force
sp_after_operator=Force
sp_after_operator_sym=Force
sp_after_operator_sym_empty=Force
sp_after_ptr_star=ignore
sp_after_ptr_star_func=ignore
sp_after_ptr_star_qualifier=ignore
sp_after_semi=Add
sp_after_semi_for=Force
sp_after_semi_for_empty=ignore
sp_after_send_oc_colon=Force
sp_after_sparen=Force
sp_after_tag=Force
sp_after_throw=Force
sp_after_tparen_close=Remove
sp_after_type=Add
sp_after_type_brace_init_lst_open=Force
sp_angle_colon=Force
sp_angle_paren=Force
sp_angle_paren_empty=Force
sp_angle_shift=Add
sp_angle_word=Force
sp_annotation_paren=Force
sp_arith=ignore
sp_arith_additive=ignore
sp_assign=ignore
sp_assign_default=ignore
sp_attribute_paren=Remove
sp_balance_nested_parens=False
sp_before_angle=Force
sp_before_assign=ignore
sp_before_byref=Force
sp_before_byref_func=Force
sp_before_case_colon=Remove
sp_before_class_colon=Force
sp_before_comma=Remove
sp_before_constr_colon=Force
sp_before_dc=Force
sp_before_ellipsis=Force
sp_before_for_colon=Force
sp_before_mdatype_commas=Force
sp_before_nl_cont=Add
sp_before_oc_block_caret=Force
sp_before_oc_colon=Force
sp_before_oc_dict_colon=Force
sp_before_pp_stringify=Force
sp_before_ptr_star=ignore
sp_before_ptr_star_func=ignore
sp_before_semi=Remove
sp_before_semi_for=ignore
sp_before_semi_for_empty=ignore
sp_before_send_oc_colon=Force
sp_before_sparen=ignore
sp_before_square=ignore
sp_before_squares=ignore
sp_before_template_paren=Force
sp_before_tr_emb_cmt=ignore
sp_before_type_brace_init_lst_close=Force
sp_before_unnamed_byref=Force
sp_before_unnamed_ptr_star=ignore
sp_between_mdatype_commas=Force
sp_between_new_paren=Force
sp_between_ptr_star=Remove
sp_bool=ignore
sp_brace_catch=Force
sp_brace_else=Force
sp_brace_finally=Force
sp_brace_typedef=Force
sp_case_label=Force
sp_catch_brace=Force
sp_catch_paren=Force
sp_cmt_cpp_doxygen=False
sp_cmt_cpp_qttr=False
sp_cmt_cpp_start=ignore
sp_compare=ignore
sp_cond_colon=Force
sp_cond_colon_after=Force
sp_cond_colon_before=Force
sp_cond_question=Force
sp_cond_question_after=Force
sp_cond_question_before=Force
sp_cond_ternary_short=Force
sp_cparen_oparen=Remove
sp_cpp_cast_paren=Force
sp_cpp_lambda_assign=Force
sp_cpp_lambda_paren=Force
sp_d_array_colon=Force
sp_defined_paren=Remove
sp_deref=Remove
sp_else_brace=Force
sp_endif_cmt=Force
sp_enum_after_assign=Force
sp_enum_assign=Force
sp_enum_before_assign=Force
sp_enum_colon=Force
sp_enum_paren=Force
sp_extern_paren=Force
sp_finally_brace=Force
sp_fparen_brace=Force
sp_fparen_dbrace=Force
sp_func_call_paren=ignore
sp_func_call_paren_empty=ignore
sp_func_call_user_paren=Force
sp_func_class_paren=Force
sp_func_class_paren_empty=Force
sp_func_def_paren=ignore
sp_func_def_paren_empty=ignore
sp_func_proto_paren=Remove
sp_func_proto_paren_empty=ignore
sp_getset_brace=Force
sp_incdec=Remove
sp_inside_angle=Force
sp_inside_braces=ignore
sp_inside_braces_empty=ignore
sp_inside_braces_enum=ignore
sp_inside_braces_struct=ignore
sp_inside_fparen=ignore
sp_inside_fparens=ignore
sp_inside_newop_paren=Force
sp_inside_newop_paren_close=Force
sp_inside_newop_paren_open=Force
sp_inside_oc_at_sel_parens=Force
sp_inside_paren=ignore
sp_inside_paren_cast=ignore
sp_inside_sparen=ignore
sp_inside_sparen_close=ignore
sp_inside_sparen_open=ignore
sp_inside_square=Remove
sp_inside_tparen=Remove
sp_inside_type_brace_init_lst=Force
sp_inv=Remove
sp_invariant_paren=Force
sp_macro=Add
sp_macro_func=Add
sp_member=Remove
sp_not=Remove
sp_num_before_tr_emb_cmt=0
sp_paren_brace=ignore
sp_paren_comma=Force
sp_paren_paren=ignore
sp_permit_cpp11_shift=False
sp_pp_concat=Add
sp_pp_stringify=Force
sp_ptr_star_paren=Remove
sp_range=Force
sp_return_paren=Force
sp_scope_paren=Force
sp_sign=Remove
sp_sizeof_paren=ignore
sp_skip_vbrace_tokens=False
sp_sparen_brace=Force
sp_special_semi=Force
sp_square_fparen=Remove
sp_super_paren=Remove
sp_template_angle=Force
sp_this_paren=Remove
sp_throw_paren=Force
sp_try_brace=Force
sp_type_brace_init_lst=Force
sp_type_func=ignore
sp_version_paren=Force
sp_word_brace=Add
sp_word_brace_ns=Add
string_escape_char2=0
string_escape_char=92
string_replace_tab_chars=False
tok_split_gte=False
use_indent_continue_only_once=False
use_indent_func_call_param=False
use_options_overriding_for_qt_macros=True
utf8_bom=Force
utf8_byte=False
utf8_force=False
warn_level_tabs_found_in_verbatim_string_literals=2

[-- Attachment #3: diffstat --]
[-- Type: text/plain, Size: 18373 bytes --]

 arch/aarch64/atomic_arch.h              |   10 +-
 arch/aarch64/bits/limits.h              |    2 +-
 arch/aarch64/bits/signal.h              |    2 +-
 arch/aarch64/crt_arch.h                 |   28 +-
 arch/aarch64/syscall_arch.h             |   72 +-
 arch/arm/atomic_arch.h                  |   32 +-
 arch/arm/bits/limits.h                  |    2 +-
 arch/arm/bits/signal.h                  |    2 +-
 arch/arm/bits/user.h                    |   18 +-
 arch/arm/crt_arch.h                     |   34 +-
 arch/arm/pthread_arch.h                 |    8 +-
 arch/arm/syscall_arch.h                 |   72 +-
 arch/generic/bits/ioctl.h               |   10 +-
 arch/i386/atomic_arch.h                 |   58 +-
 arch/i386/bits/fenv.h                   |    4 +-
 arch/i386/bits/limits.h                 |    2 +-
 arch/i386/bits/signal.h                 |    2 +-
 arch/i386/crt_arch.h                    |   30 +-
 arch/i386/pthread_arch.h                |    2 +-
 arch/microblaze/atomic_arch.h           |    6 +-
 arch/microblaze/bits/limits.h           |    2 +-
 arch/microblaze/bits/signal.h           |    2 +-
 arch/microblaze/crt_arch.h              |   32 +-
 arch/microblaze/pthread_arch.h          |    2 +-
 arch/microblaze/syscall_arch.h          |   70 +-
 arch/mips/bits/ioctl.h                  |   10 +-
 arch/mips/bits/limits.h                 |    2 +-
 arch/mips/bits/signal.h                 |    2 +-
 arch/mips/bits/stat.h                   |    2 +-
 arch/mips/crt_arch.h                    |   56 +-
 arch/mips/pthread_arch.h                |    6 +-
 arch/mips/syscall_arch.h                |   38 +-
 arch/mips64/bits/ioctl.h                |   10 +-
 arch/mips64/bits/limits.h               |    2 +-
 arch/mips64/bits/signal.h               |    2 +-
 arch/mips64/crt_arch.h                  |   64 +-
 arch/mips64/pthread_arch.h              |    6 +-
 arch/mips64/syscall_arch.h              |   38 +-
 arch/mipsn32/bits/ioctl.h               |   10 +-
 arch/mipsn32/bits/limits.h              |    2 +-
 arch/mipsn32/bits/signal.h              |    2 +-
 arch/mipsn32/crt_arch.h                 |   62 +-
 arch/mipsn32/pthread_arch.h             |    6 +-
 arch/mipsn32/syscall_arch.h             |   38 +-
 arch/or1k/atomic_arch.h                 |   20 +-
 arch/or1k/bits/limits.h                 |    2 +-
 arch/or1k/bits/signal.h                 |    2 +-
 arch/or1k/crt_arch.h                    |   34 +-
 arch/or1k/pthread_arch.h                |    6 +-
 arch/or1k/syscall_arch.h                |   56 +-
 arch/powerpc/bits/limits.h              |    2 +-
 arch/powerpc/bits/signal.h              |    4 +-
 arch/powerpc/bits/user.h                |    2 +-
 arch/powerpc/crt_arch.h                 |   38 +-
 arch/powerpc/pthread_arch.h             |    8 +-
 arch/powerpc64/bits/limits.h            |    2 +-
 arch/powerpc64/bits/signal.h            |    2 +-
 arch/powerpc64/bits/user.h              |    2 +-
 arch/powerpc64/crt_arch.h               |   36 +-
 arch/powerpc64/pthread_arch.h           |    4 +-
 arch/powerpc64/syscall_arch.h           |  100 +-
 arch/s390x/bits/limits.h                |    2 +-
 arch/s390x/bits/signal.h                |    2 +-
 arch/s390x/crt_arch.h                   |   32 +-
 arch/s390x/syscall_arch.h               |   72 +-
 arch/sh/atomic_arch.h                   |    6 +-
 arch/sh/bits/ioctl.h                    |    8 +-
 arch/sh/bits/limits.h                   |    2 +-
 arch/sh/bits/signal.h                   |    2 +-
 arch/sh/bits/user.h                     |    2 +-
 arch/sh/crt_arch.h                      |  118 +-
 arch/sh/pthread_arch.h                  |    2 +-
 arch/sh/syscall_arch.h                  |   80 +-
 arch/x32/atomic_arch.h                  |   64 +-
 arch/x32/bits/fenv.h                    |    4 +-
 arch/x32/bits/limits.h                  |    2 +-
 arch/x32/bits/shm.h                     |    2 +-
 arch/x32/bits/signal.h                  |    2 +-
 arch/x32/crt_arch.h                     |   22 +-
 arch/x32/pthread_arch.h                 |    2 +-
 arch/x32/syscall_arch.h                 |   18 +-
 arch/x86_64/atomic_arch.h               |   68 +-
 arch/x86_64/bits/fenv.h                 |    4 +-
 arch/x86_64/bits/limits.h               |    2 +-
 arch/x86_64/bits/signal.h               |    2 +-
 arch/x86_64/crt_arch.h                  |   22 +-
 arch/x86_64/pthread_arch.h              |    2 +-
 arch/x86_64/syscall_arch.h              |   12 +-
 crt/crt1.c                              |    2 +-
 crt/rcrt1.c                             |    2 +-
 include/arpa/inet.h                     |   12 +-
 include/arpa/nameser.h                  |   58 +-
 include/arpa/telnet.h                   |    4 +-
 include/assert.h                        |    2 +-
 include/ctype.h                         |    4 +-
 include/elf.h                           |  793 ++++----
 include/features.h                      |    4 +-
 include/fnmatch.h                       |    2 +-
 include/libintl.h                       |    2 +-
 include/limits.h                        |    4 +-
 include/locale.h                        |    4 +-
 include/malloc.h                        |   10 +-
 include/net/ethernet.h                  |    2 +-
 include/net/if.h                        |   12 +-
 include/net/if_arp.h                    |    6 +-
 include/netdb.h                         |   50 +-
 include/netinet/ether.h                 |    8 +-
 include/netinet/igmp.h                  |    2 +-
 include/netinet/in.h                    |   36 +-
 include/netinet/ip.h                    |   34 +-
 include/netinet/ip_icmp.h               |   20 +-
 include/netinet/tcp.h                   |   85 +-
 include/nl_types.h                      |    6 +-
 include/poll.h                          |    2 +-
 include/pwd.h                           |   14 +-
 include/resolv.h                        |   10 +-
 include/sched.h                         |    4 +-
 include/scsi/sg.h                       |   22 +-
 include/search.h                        |    4 +-
 include/setjmp.h                        |   18 +-
 include/signal.h                        |   10 +-
 include/stdio.h                         |   10 +-
 include/stdlib.h                        |  158 +-
 include/string.h                        |   78 +-
 include/strings.h                       |   28 +-
 include/sys/cachectl.h                  |    2 +-
 include/sys/epoll.h                     |    2 +-
 include/sys/ioctl.h                     |    2 +-
 include/sys/ipc.h                       |    2 +-
 include/sys/klog.h                      |    2 +-
 include/sys/mman.h                      |   30 +-
 include/sys/msg.h                       |    8 +-
 include/sys/mtio.h                      |   60 +-
 include/sys/prctl.h                     |    2 +-
 include/sys/resource.h                  |   10 +-
 include/sys/select.h                    |    4 +-
 include/sys/socket.h                    |   40 +-
 include/sys/statfs.h                    |    4 +-
 include/sys/statvfs.h                   |    8 +-
 include/sys/swap.h                      |    4 +-
 include/sys/sysinfo.h                   |   10 +-
 include/sys/sysmacros.h                 |    4 +-
 include/sys/time.h                      |   10 +-
 include/sys/times.h                     |    2 +-
 include/sys/uio.h                       |    8 +-
 include/sys/utsname.h                   |    2 +-
 include/sys/wait.h                      |   14 +-
 include/syslog.h                        |   10 +-
 include/termios.h                       |   22 +-
 include/threads.h                       |   12 +-
 include/time.h                          |   64 +-
 include/ulimit.h                        |    2 +-
 include/unistd.h                        |    2 +-
 include/utime.h                         |    2 +-
 include/wchar.h                         |  150 +-
 include/wctype.h                        |    4 +-
 include/wordexp.h                       |    4 +-
 ldso/dlstart.c                          |    6 +-
 ldso/dynlink.c                          |   88 +-
 src/aio/aio.c                           |    2 +-
 src/aio/aio_suspend.c                   |   10 +-
 src/complex/cexp.c                      |    4 +-
 src/complex/cexpf.c                     |    4 +-
 src/conf/legacy.c                       |    4 +-
 src/conf/sysconf.c                      |    4 +-
 src/crypt/crypt_blowfish.c              |  562 +++---
 src/crypt/crypt_des.c                   |   70 +-
 src/crypt/crypt_md5.c                   |   21 +-
 src/crypt/crypt_sha256.c                |   21 +-
 src/crypt/crypt_sha512.c                |   45 +-
 src/crypt/encrypt.c                     |    6 +-
 src/ctype/__ctype_b_loc.c               |   48 +-
 src/ctype/__ctype_tolower_loc.c         |   38 +-
 src/ctype/__ctype_toupper_loc.c         |   38 +-
 src/ctype/iswcntrl.c                    |    6 +-
 src/ctype/towctrans.c                   |    8 +-
 src/dirent/readdir.c                    |    2 +-
 src/dirent/readdir_r.c                  |    2 +-
 src/env/__libc_start_main.c             |    8 +-
 src/env/__reset_tls.c                   |    8 +-
 src/exit/arm/__aeabi_atexit.c           |    2 +-
 src/exit/atexit.c                       |   14 +-
 src/exit/exit.c                         |    4 +-
 src/fenv/fesetround.c                   |   10 +-
 src/fenv/powerpc64/fenv.c               |    4 +-
 src/fenv/s390x/fenv.c                   |    4 +-
 src/internal/atomic.h                   |    8 +-
 src/internal/floatscan.c                |    8 +-
 src/internal/intscan.c                  |   33 +-
 src/internal/syscall.h                  |    4 +-
 src/internal/vdso.c                     |    6 +-
 src/ldso/dl_iterate_phdr.c              |    2 +-
 src/ldso/dlerror.c                      |    2 +-
 src/linux/fanotify.c                    |    2 +-
 src/linux/x32/sysinfo.c                 |    4 +-
 src/locale/__mo_lookup.c                |    3 +-
 src/locale/dcngettext.c                 |    2 +-
 src/locale/iconv.c                      |   66 +-
 src/locale/langinfo.c                   |    2 +-
 src/locale/locale_map.c                 |   42 +-
 src/locale/strfmon.c                    |    8 +-
 src/malloc/malloc.c                     |    8 +-
 src/math/__cos.c                        |   12 +-
 src/math/__cosdf.c                      |    8 +-
 src/math/__cosl.c                       |   36 +-
 src/math/__invtrigl.c                   |   62 +-
 src/math/__rem_pio2.c                   |   16 +-
 src/math/__rem_pio2_large.c             |  249 ++-
 src/math/__rem_pio2f.c                  |    8 +-
 src/math/__rem_pio2l.c                  |   28 +-
 src/math/__sin.c                        |   12 +-
 src/math/__sindf.c                      |    8 +-
 src/math/__sinl.c                       |   40 +-
 src/math/__tan.c                        |   30 +-
 src/math/__tandf.c                      |   12 +-
 src/math/__tanl.c                       |   96 +-
 src/math/acos.c                         |   24 +-
 src/math/acosf.c                        |   12 +-
 src/math/asin.c                         |   24 +-
 src/math/asinf.c                        |   10 +-
 src/math/atan.c                         |   38 +-
 src/math/atan2.c                        |   14 +-
 src/math/atan2f.c                       |   12 +-
 src/math/atan2l.c                       |   10 +-
 src/math/atanf.c                        |   26 +-
 src/math/atanl.c                        |   76 +-
 src/math/cbrt.c                         |   14 +-
 src/math/cbrtf.c                        |    4 +-
 src/math/cos.c                          |    4 +-
 src/math/cosf.c                         |   14 +-
 src/math/erf.c                          |  118 +-
 src/math/erff.c                         |  118 +-
 src/math/erfl.c                         |   58 +-
 src/math/exp.c                          |   20 +-
 src/math/exp2.c                         |  524 ++---
 src/math/exp2f.c                        |   42 +-
 src/math/exp2l.c                        |  180 +-
 src/math/expf.c                         |   12 +-
 src/math/expl.c                         |   20 +-
 src/math/expm1.c                        |   18 +-
 src/math/expm1f.c                       |   12 +-
 src/math/expm1l.c                       |   28 +-
 src/math/fmaf.c                         |    7 +-
 src/math/j0.c                           |  236 ++-
 src/math/j0f.c                          |  236 ++-
 src/math/j1.c                           |  236 ++-
 src/math/j1f.c                          |  236 ++-
 src/math/jn.c                           |    6 +-
 src/math/jnf.c                          |    4 +-
 src/math/lgamma_r.c                     |  124 +-
 src/math/lgammaf_r.c                    |  124 +-
 src/math/lgammal.c                      |  126 +-
 src/math/log.c                          |   18 +-
 src/math/log10.c                        |   22 +-
 src/math/log10f.c                       |   16 +-
 src/math/log10l.c                       |   42 +-
 src/math/log1p.c                        |   18 +-
 src/math/log1pf.c                       |   12 +-
 src/math/log1pl.c                       |   40 +-
 src/math/log2.c                         |   18 +-
 src/math/log2f.c                        |   12 +-
 src/math/log2l.c                        |   42 +-
 src/math/logb.c                         |    6 +-
 src/math/logf.c                         |   12 +-
 src/math/logl.c                         |   40 +-
 src/math/pow.c                          |   56 +-
 src/math/powf.c                         |   58 +-
 src/math/powl.c                         |  128 +-
 src/math/sin.c                          |    4 +-
 src/math/sincosf.c                      |    8 +-
 src/math/sinf.c                         |   14 +-
 src/math/tanf.c                         |    8 +-
 src/math/tgammal.c                      |  100 +-
 src/misc/fmtmsg.c                       |   28 +-
 src/misc/getopt_long.c                  |    7 +-
 src/misc/nftw.c                         |   12 +-
 src/misc/syslog.c                       |    4 +-
 src/misc/wordexp.c                      |   84 +-
 src/multibyte/internal.c                |   22 +-
 src/multibyte/mbrtoc16.c                |    2 +-
 src/multibyte/mbrtowc.c                 |    2 +-
 src/multibyte/mbsnrtowcs.c              |   32 +-
 src/multibyte/mbsrtowcs.c               |   97 +-
 src/multibyte/wcsnrtombs.c              |   20 +-
 src/network/dn_comp.c                   |   14 +-
 src/network/getaddrinfo.c               |   47 +-
 src/network/gethostbyname2_r.c          |   30 +-
 src/network/getnameinfo.c               |    3 +-
 src/network/getservbyname_r.c           |   12 +-
 src/network/herror.c                    |    2 +-
 src/network/inet_pton.c                 |    2 +-
 src/network/lookup_name.c               |   18 +-
 src/network/lookup_serv.c               |   14 +-
 src/network/ns_parse.c                  |    2 +-
 src/network/res_msend.c                 |    8 +-
 src/network/resolvconf.c                |   14 +-
 src/passwd/getgr_a.c                    |    8 +-
 src/passwd/getgr_r.c                    |    4 +-
 src/passwd/getpw_a.c                    |   14 +-
 src/passwd/getpw_r.c                    |    2 +-
 src/passwd/putgrent.c                   |    2 +-
 src/prng/random.c                       |   17 +-
 src/process/execvp.c                    |    2 +-
 src/process/posix_spawn.c               |    4 +-
 src/regex/fnmatch.c                     |   21 +-
 src/regex/glob.c                        |    6 +-
 src/regex/regcomp.c                     | 3358 +++++++++++++++----------------
 src/regex/regerror.c                    |   30 +-
 src/regex/regexec.c                     | 1451 +++++++------
 src/regex/tre-mem.c                     |  154 +-
 src/regex/tre.h                         |  128 +-
 src/sched/sched_cpucount.c              |    2 +-
 src/search/hsearch.c                    |    4 +-
 src/search/tsearch_avl.c                |    6 +-
 src/select/poll.c                       |    2 +-
 src/signal/sigaction.c                  |    2 +-
 src/signal/sigset.c                     |    2 +-
 src/signal/sigsetjmp_tail.c             |    2 +-
 src/stdio/fclose.c                      |    2 +-
 src/stdio/fmemopen.c                    |    6 +-
 src/stdio/fread.c                       |    2 +-
 src/stdio/fseek.c                       |    2 +-
 src/stdio/fwide.c                       |    2 +-
 src/stdio/open_wmemstream.c             |    2 +-
 src/stdio/perror.c                      |    2 +-
 src/stdio/popen.c                       |    4 +-
 src/stdio/vfprintf.c                    |   87 +-
 src/stdio/vfscanf.c                     |   22 +-
 src/stdio/vfwprintf.c                   |   46 +-
 src/stdio/vfwscanf.c                    |    7 +-
 src/stdlib/qsort.c                      |    6 +-
 src/string/memccpy.c                    |    2 +-
 src/string/memcpy.c                     |   96 +-
 src/string/stpncpy.c                    |    2 +-
 src/string/strlcpy.c                    |    2 +-
 src/string/strncmp.c                    |    2 +-
 src/string/strsignal.c                  |   14 +-
 src/thread/__syscall_cp.c               |    8 +-
 src/thread/arm/__set_thread_area.c      |    4 +-
 src/thread/pthread_barrier_init.c       |    2 +-
 src/thread/pthread_barrier_wait.c       |    2 +-
 src/thread/pthread_cancel.c             |    8 +-
 src/thread/pthread_cond_timedwait.c     |    4 +-
 src/thread/pthread_create.c             |    4 +-
 src/thread/pthread_mutex_timedlock.c    |    4 +-
 src/thread/pthread_once.c               |   32 +-
 src/thread/pthread_rwlock_timedrdlock.c |    2 +-
 src/thread/pthread_rwlock_timedwrlock.c |    4 +-
 src/thread/sh/__set_thread_area.c       |    2 +-
 src/thread/thrd_join.c                  |    8 +-
 src/thread/x32/syscall_cp_fixup.c       |    4 +-
 src/time/__asctime.c                    |    3 +-
 src/time/__month_to_secs.c              |    3 +-
 src/time/clock.c                        |    2 +-
 src/time/strptime.c                     |    8 +-
 src/time/utime.c                        |    3 +-
 src/unistd/faccessat.c                  |    2 +-
 src/unistd/fchown.c                     |    1 -
 358 files changed, 7466 insertions(+), 7712 deletions(-)


* Re: automated coding style
  2018-01-11  4:41           ` Samuel Holland
@ 2018-01-11  8:28             ` Jens Gustedt
  0 siblings, 0 replies; 27+ messages in thread
From: Jens Gustedt @ 2018-01-11  8:28 UTC (permalink / raw)
  Cc: musl

[-- Attachment #1: Type: text/plain, Size: 1576 bytes --]

Hello Samuel,

On Wed, 10 Jan 2018 22:41:23 -0600 Samuel Holland <samuel@sholland.org>
wrote:

> uncrustify is the most flexible of the three, making it the best for
> matching an existing code style. It's also a bit buggier than
> clang-format, especially when the tool wraps long lines for you
> (though this doesn't affect code already following a line length
> limit).

I didn't know of uncrustify. It looks promising.

> I wrote a script to permute over 600 of the options to uncrustify and
> generate a configuration with the smallest diff from the current musl
> source. Attached is that configuration file, as well as a diffstat
> from 628cf979b249. You can see most of the changes are in the inline
> assembly syntax, continuation of header namespace guards, and the
> imported code in src/math and src/regex.

This sounds like a lot of work.
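
(Samuel's script itself is not included in the message. Purely as a
hypothetical illustration of how such a search could be organized, here
is a minimal Python sketch; the greedy per-option strategy, the tiny
option table, and all file names are assumptions on my part, not a
description of the real script.)

#!/usr/bin/env python3
# Hypothetical sketch (not Samuel's actual script): for each uncrustify
# option, try each candidate value, reformat the sources, and keep the
# value that produces the smallest diff against the unmodified tree.
import subprocess

# Placeholder option table; the real search covered over 600 options.
OPTIONS = {
    "sp_arith":  ["ignore", "add", "remove", "force"],
    "sp_assign": ["ignore", "add", "remove", "force"],
}

FILES = subprocess.run(["git", "ls-files", "*.c", "*.h"],
                       capture_output=True, text=True).stdout.split()

def diff_size(config):
    # Count the lines uncrustify would change under the given config.
    changed = 0
    for path in FILES:
        formatted = subprocess.run(["uncrustify", "-c", config, "-f", path],
                                   capture_output=True, text=True).stdout
        diff = subprocess.run(["diff", "-u", path, "-"], input=formatted,
                              capture_output=True, text=True).stdout
        changed += sum(1 for line in diff.splitlines()
                       if line.startswith(("+", "-"))
                       and not line.startswith(("+++", "---")))
    return changed

best = {}
for opt, values in OPTIONS.items():
    scores = {}
    for val in values:
        with open("trial.cfg", "w") as cfg:
            for k, v in {**best, opt: val}.items():
                cfg.write(k + "=" + v + "\n")
        scores[val] = diff_size("trial.cfg")
    best[opt] = min(scores, key=scores.get)

with open("musl-uncrustify.cfg", "w") as cfg:
    for k, v in sorted(best.items()):
        cfg.write(k + "=" + v + "\n")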

Debian also has UniversalIndentGui, which seems to be able to guide
you through the choice of options for different beautifiers.

> I don't claim that this file represents the "official musl style",
> especially because it's autogenerated, but hopefully it's useful for
> formatting patches.

Such patches could be applied as the last change before a release, or
the first one after it, so that we have a clear cut.
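
As a rough illustration of such a "clear cut", the whole tree could be
reformatted in a single self-contained commit right after a release
tag, so the style change never mixes with functional changes. A hedged
sketch follows (config file name and commit message are placeholders;
--replace and --no-backup are uncrustify's in-place rewrite options):

#!/usr/bin/env python3
# Hypothetical sketch: apply the generated config to every C source and
# header in one commit, keeping the reformat separate from real changes.
import subprocess

def run(*cmd, **kw):
    subprocess.run(cmd, check=True, **kw)

files = subprocess.run(["git", "ls-files", "*.c", "*.h"],
                       capture_output=True, text=True, check=True).stdout.split()

# --replace rewrites each file in place; --no-backup suppresses the
# backup copies uncrustify would otherwise leave next to the sources.
run("uncrustify", "-c", "musl-uncrustify.cfg", "--replace", "--no-backup", *files)

run("git", "commit", "-a", "-m",
    "apply automated coding style (no functional change)")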

Jens

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::

[-- Attachment #2: Digitale Signatur von OpenPGP --]
[-- Type: application/pgp-signature, Size: 195 bytes --]


end of thread, other threads:[~2018-01-11  8:28 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-23 14:38 [PATCH 0/8] the new __lock and follow up patches Jens Gustedt
2017-06-16  7:11 ` [PATCH 1/8] (V2) a new lock algorithm with lock value and CS counts in the same atomic int Jens Gustedt
2017-12-20 21:58   ` Rich Felker
2017-12-21 11:06     ` Jens Gustedt
2017-12-21 23:34       ` Rich Felker
2018-01-03 14:08         ` automated coding style Jens Gustedt
2018-01-11  4:41           ` Samuel Holland
2018-01-11  8:28             ` Jens Gustedt
2017-06-20 13:25 ` [PATCH 3/8] revise the definition of multiple basic locks in the code Jens Gustedt
2017-06-20 19:08 ` [PATCH 6/8] use the new lock algorithm for malloc Jens Gustedt
2017-06-23 15:01   ` Jens Gustedt
2017-06-20 19:41 ` [PATCH 2/8] consistently use the LOCK an UNLOCK macros Jens Gustedt
2017-06-20 19:44 ` [PATCH 5/8] separate the fast parts of __lock and __unlock into a .h file that may be used by other TU Jens Gustedt
2017-06-20 20:35 ` [PATCH 4/8] determine the existence of private futexes at the first thread creation Jens Gustedt
2017-06-23 17:05   ` Rich Felker
2017-06-23 17:16     ` Jens Gustedt
2017-06-23 21:48     ` Jens Gustedt
2017-06-23 22:08       ` Rich Felker
2017-06-23 23:42         ` Jens Gustedt
2017-06-24  0:53           ` Rich Felker
2017-06-24  8:00             ` Jens Gustedt
2017-06-24 10:12               ` flag 128 Jens Gustedt
2017-06-24 22:28                 ` Rich Felker
2017-06-24 22:23   ` [PATCH 4/8] determine the existence of private futexes at the first thread creation Rich Felker
2017-06-22 21:17 ` [PATCH 7/8] implement __unlock_requeue Jens Gustedt
2017-06-22 21:42 ` [PATCH 8/8] implement the local lock for conditions with __lock & Co Jens Gustedt
2017-06-23 14:57 ` [PATCH 0/8] the new __lock and follow up patches Jens Gustedt

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/
