On Tue, Jan 09, 2018 at 02:26:44PM -0500, Rich Felker wrote:
> On Tue, Jan 09, 2018 at 07:58:51PM +0100, Jens Gustedt wrote:
> > Hello Rich,
> > 
> > On Tue, 9 Jan 2018 12:42:34 -0500 Rich Felker wrote:
> > 
> > > On Wed, Jan 03, 2018 at 02:17:12PM +0100, Jens Gustedt wrote:
> > > > Malloc used a specialized lock implementation in many places. Now
> > > > that we have a generic lock that has the desired properties, we
> > > > should just use this, instead of this multitude of very similar
> > > > lock mechanisms.
> > > > ---
> > > >  src/malloc/malloc.c | 38 +++++++++++++-------------------------
> > > >  1 file changed, 13 insertions(+), 25 deletions(-)
> > > > 
> > > > diff --git a/src/malloc/malloc.c b/src/malloc/malloc.c
> > > > index 9e05e1d6..6c667a5a 100644
> > > > --- a/src/malloc/malloc.c
> > > > +++ b/src/malloc/malloc.c
> > > > @@ -13,6 +13,8 @@
> > > >  #define inline inline __attribute__((always_inline))
> > > >  #endif
> > > > 
> > > > +#include "__lock.h"
> > > > +
> > > 
> > > Ah, I see -- maybe you deemed malloc to be the only place where
> > > inlining for the sake of speed made sense? That's probably true.
> > 
> > Yes, and also I was trying to be conservative. Previously, the lock
> > functions for malloc resided in the same TU, so they were probably
> > inlined most of the time.
> 
> Yes, and that was done because (at least at the time) it made a
> significant empirical difference. So I suspect it makes sense to do
> the same still. I've queued your patches 1-3 for inclusion in my next
> push unless I see any major problem. I might try to get the rest
> included too but being that I'm behind on this release cycle we'll
> see..
> 
> Thanks for all your work on this and patience. :)

I'm just coming back to look at this, and I can't get the new lock to
perform comparably well to the current one, much less better, in
malloc. I suspect the benefit of just being able to do a store and
relaxed read on x86 for the unlock is too great to beat. Note that I
just fixed a bug related to this on powerpc64 in commit
12817793301398241b6cb00c740f0d3ca41076e9, and I expect the performance
properties might be reversed on non-x86 archs.

I did have to hack the new lock in by hand, since the patch from this
series no longer applies directly, and I just did it inline as a test,
but I don't think I did anything wrong there; it's attached for
reference. I'm also attaching the (very old) malloc_stress.c I used to
measure. I noticed the greatest differences running test #3 with 4
threads (./malloc_stress 3 4), where 4 is the number of cores.

Rich
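
P.S. For anyone following along, here is a rough sketch of the
difference I mean, reduced to just the two unlock paths and written
with C11 atomics instead of the internal a_*() primitives and futex
calls -- don't read it as the exact code on either side. wake_one()
is a placeholder for the futex wake, and the function names are mine.

#include <stdatomic.h>
#include <limits.h>

/* Placeholder for a futex wake; a no-op in this sketch. */
static void wake_one(void *addr) { (void)addr; }

/* Current malloc lock, roughly: lk[0] is the lock word, lk[1] counts
 * waiters.  Unlock is a store to the lock word, a barrier, and a
 * relaxed read of the waiter count -- on x86 that is an ordinary
 * store, a cheap fence, and an ordinary load, with no locked
 * read-modify-write on the (contended) lock word. */
static void unlock_current(_Atomic int lk[2])
{
	atomic_store_explicit(&lk[0], 0, memory_order_release);
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&lk[1], memory_order_relaxed))
		wake_one(&lk[0]);
}

/* Generic lock, roughly: a single word whose sign bit is the lock
 * flag and whose low bits count waiters.  Unlock has to clear the
 * flag and learn the waiter count in one atomic step, i.e. a full
 * read-modify-write (a lock xadd on x86). */
static void unlock_generic(_Atomic int *l)
{
	if (atomic_fetch_add_explicit(l, -(INT_MIN + 1),
	                              memory_order_seq_cst) != INT_MIN + 1)
		wake_one(l);
}

On archs where a plain store needs a heavy barrier anyway, the
single read-modify-write may come out ahead, which is why I expect
the comparison could go the other way off x86.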