From mboxrd@z Thu Jan  1 00:00:00 1970
From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: [PATCH 4/7] separate the fast parts of __lock and __unlock into a .h file that may be used by other TU
Date: Tue, 9 Jan 2018 12:41:08 -0500
Message-ID: <20180109174108.GZ1627@brightrain.aerifal.cx>
References: <41048892c9aa47a6ebde97cd90f244981221a437.1514985618.git.Jens.Gustedt@inria.fr>
In-Reply-To: <41048892c9aa47a6ebde97cd90f244981221a437.1514985618.git.Jens.Gustedt@inria.fr>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)

On Wed, Jan 03, 2018 at 02:17:12PM +0100, Jens Gustedt wrote:
> This provides two interfaces __lock_fast and __unlock_fast that are both
> "static inline" and that should result in a better integration of the
> lock in place. The slow path of the lock algorithm remains centralized,
> here adding the overhead of a function call is not a big deal.
>
> This must only be used directly by a TU that encapsulates all LOCK and
> UNLOCK calls of a particular lock object.
> ---
>  src/internal/__lock.h | 21 +++++++++++++++++++++
>  src/internal/libc.h   |  1 +
>  src/thread/__lock.c   | 20 +++++---------------
>  3 files changed, 27 insertions(+), 15 deletions(-)
>  create mode 100644 src/internal/__lock.h
>
> diff --git a/src/internal/__lock.h b/src/internal/__lock.h
> new file mode 100644
> index 00000000..f16d6176
> --- /dev/null
> +++ b/src/internal/__lock.h
> @@ -0,0 +1,21 @@
> +#include "pthread_impl.h"
> +
> +static inline void __lock_fast(volatile int *l)
> +{
> +	extern void __lock_slow(volatile int*, int);
> +	if (!libc.threads_minus_1) return;
> +	/* fast path: INT_MIN for the lock, +1 for the congestion */
> +	int current = a_cas(l, 0, INT_MIN + 1);
> +	if (!current) return;
> +	__lock_slow(l, current);
> +}

I think I'd like to hold off on this one for now until we study the
impact. Burying the libc.threads_minus_1 check inside an external
function was actually something of a trick to prevent the caller from
needing a GOT pointer; on archs where loading the GOT pointer is a
pain, putting the load in the caller may badly un-shrinkwrap the
function.
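To make that concrete, here is a rough sketch of the two caller
shapes being compared (the function names are invented for
illustration, and the comments describe the intended effect on
codegen, not anything measured; __lock_slow is the out-of-line slow
path this patch adds):

#include <limits.h>
#include "pthread_impl.h"

/* Status quo: the libc.threads_minus_1 check is hidden inside
 * __lock()/__unlock() in libc's own TU, so this caller contains no
 * reference to the libc global and, on PIC archs, needs no
 * GOT-pointer setup of its own just for the lock fast path. */
void lock_via_call(volatile int *l)
{
	__lock(l);
	/* ... critical section ... */
	__unlock(l);
}

/* With the patch's __lock_fast expanded here instead: the load of
 * libc.threads_minus_1 moves into the caller, which on archs where
 * loading the GOT pointer is expensive can drag that prologue work
 * onto the fast path (the un-shrinkwrapping concern above). */
void lock_inlined(volatile int *l)
{
	extern void __lock_slow(volatile int *, int);
	if (!libc.threads_minus_1) return;
	int current = a_cas(l, 0, INT_MIN+1);
	if (!current) return;
	__lock_slow(l, current);
}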
In practice, that shrinkwrap trick probably doesn't work correctly
right now because too many internal functions are missing the hidden
attribute, and too many of the affected functions call public
functions (which the compiler was able to assume didn't go through
the PLT when we had the vis.h hack, but that's off by default now
because it was a bad idea). So figuring out whether this sort of
shrinkwrap assistance is still salvageable or just needs to be thrown
out is an open question.

> +static inline void __unlock_fast(volatile int *l)
> +{
> +	/* Check l[0] to see if we are multi-threaded. */
> +	if (l[0] < 0) {
> +		if (a_fetch_add(l, -(INT_MIN + 1)) != (INT_MIN + 1)) {
> +			__wake(l, 1, 1);
> +		}
> +	}
> +}

This one probably already makes sense to inline.

> diff --git a/src/internal/libc.h b/src/internal/libc.h
> index 5e145183..a594d0c5 100644
> --- a/src/internal/libc.h
> +++ b/src/internal/libc.h
> @@ -47,6 +47,7 @@ extern size_t __sysinfo ATTR_LIBC_VISIBILITY;
>  extern char *__progname, *__progname_full;
>
>  /* Designed to avoid any overhead in non-threaded processes */
> +void __lock_slow(volatile int *, int) ATTR_LIBC_VISIBILITY;
>  void __lock(volatile int *) ATTR_LIBC_VISIBILITY;
>  void __unlock(volatile int *) ATTR_LIBC_VISIBILITY;
>  int __lockfile(FILE *) ATTR_LIBC_VISIBILITY;
> diff --git a/src/thread/__lock.c b/src/thread/__lock.c
> index 45557c88..a3d8d4d0 100644
> --- a/src/thread/__lock.c
> +++ b/src/thread/__lock.c
> @@ -1,4 +1,5 @@
>  #include "pthread_impl.h"
> +#include "__lock.h"
>
>  /* This lock primitive combines a flag (in the sign bit) and a
>   * congestion count (= threads inside the critical section, CS) in a
> @@ -16,12 +17,11 @@
>   * with INT_MIN as a lock flag.
>   */
>
> -void __lock(volatile int *l)
> +weak_alias(__lock_fast, __lock);
> +weak_alias(__unlock_fast, __unlock);
> +
> +void __lock_slow(volatile int *l, int current)
>  {
> -	if (!libc.threads_minus_1) return;
> -	/* fast path: INT_MIN for the lock, +1 for the congestion */
> -	int current = a_cas(l, 0, INT_MIN + 1);
> -	if (!current) return;
>  	/* A first spin loop, for medium congestion. */
>  	for (unsigned i = 0; i < 10; ++i) {
>  		if (current < 0) current -= INT_MIN + 1;
> @@ -48,13 +48,3 @@ void __lock(volatile int *l)
>  		current = val;
>  	}
>  }
> -
> -void __unlock(volatile int *l)
> -{
> -	/* Check l[0] to see if we are multi-threaded. */
> -	if (l[0] < 0) {
> -		if (a_fetch_add(l, -(INT_MIN + 1)) != (INT_MIN + 1)) {
> -			__wake(l, 1, 1);
> -		}
> -	}
> -}
> --
> 2.15.1

I'm missing the mechanism by which the new __lock_fast and
__unlock_fast ever get inlined. __lock.h does not seem to be included
anywhere. In order to make this work right, some refactoring of what
should be exposed by libc.h vs pthread_impl.h probably needs to be
discussed. I think the factoring right now is fairly wrong; lock and
futex primitives aren't really part of the "pthread implementation"
but general stuff libc needs internally.

Rich
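P.S. The intended consumer of __lock.h presumably looks something
like the sketch below, in a TU that owns every LOCK/UNLOCK of a
particular lock object as the commit message requires. Nothing in
this patch wires that up, and the lock object and function names here
are invented:

#include "__lock.h"	/* proposed header with the static inline fast paths */

static volatile int mylock[1];	/* hypothetical lock owned by this TU */

void touch_shared_state(void)
{
	__lock_fast(mylock);	/* CAS fast path inlined into this TU */
	/* ... operate on the data mylock protects ... */
	__unlock_fast(mylock);	/* inlined; only __wake stays out of line */
}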