From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/5900
Path: news.gmane.org!not-for-mail
From: Rich Felker <dalias@libc.org>
Newsgroups: gmane.linux.lib.musl.general
Subject: Multi-threaded performance progress
Date: Mon, 25 Aug 2014 23:43:21 -0400
Message-ID: <20140826034321.GA13999@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Trace: ger.gmane.org 1409024621 22771 80.91.229.3 (26 Aug 2014 03:43:41 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Tue, 26 Aug 2014 03:43:41 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-5906-gllmg-musl=m.gmane.org@lists.openwall.com Tue Aug 26 05:43:35 2014
Return-path: <musl-return-5906-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@plane.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by plane.gmane.org with smtp (Exim 4.69)
	(envelope-from <musl-return-5906-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1XM7fS-0007jm-1R
	for gllmg-musl@plane.gmane.org; Tue, 26 Aug 2014 05:43:34 +0200
Original-Received: (qmail 9383 invoked by uid 550); 26 Aug 2014 03:43:33 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
Original-Received: (qmail 9367 invoked from network); 26 Aug 2014 03:43:33 -0000
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
Original-Sender: Rich Felker <dalias@aerifal.cx>
Xref: news.gmane.org gmane.linux.lib.musl.general:5900
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/5900>

This release cycle looks like it's going to be huge for multi-threaded
performance. So far the cumulative improvement on my main
development system, as measured by the cond_bench.c by Timo Teräs, is
from ~250k signals in 2 seconds up to ~3.7M signals in 2 seconds.
That's comparable to what glibc gets on similar hardware with a cond
var implementation that's much less correct. The improvements come
from adding private futex support, redesigning the cond var
implementation, and improving the spin-before-futex-wait behavior.

Semaphore performance has also improved, up from fewer than 500k
wait/post operations to ~12M, mostly due to spin-before-futex-wait.
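
For anyone unfamiliar with the spin-before-futex-wait idea, here is a
minimal sketch of it on a counting semaphore. It is not the code in
musl: the function names, the SPIN_LIMIT value, and the bare
atomic_int counter are all assumptions made for illustration, and it
is Linux-only. The waiter retries in userspace a bounded number of
times and only then sleeps in the kernel, using the private futex ops.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical spin budget; not a value taken from musl. */
#define SPIN_LIMIT 100

/* Thin wrapper around the raw futex syscall (Linux only). */
long futex_op(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

/* Try to take one unit; returns 1 on success, 0 if the count is 0. */
int sem_trywait_sketch(atomic_int *count)
{
    int c = atomic_load(count);
    while (c > 0) {
        /* On failure, c is reloaded with the current value. */
        if (atomic_compare_exchange_weak(count, &c, c - 1))
            return 1;
    }
    return 0;
}

void sem_wait_sketch(atomic_int *count)
{
    /* Phase 1: spin in userspace, hoping a post arrives soon. */
    for (int i = 0; i < SPIN_LIMIT; i++) {
        if (sem_trywait_sketch(count))
            return;
    }
    /* Phase 2: sleep in the kernel while the count is still 0.
     * FUTEX_WAIT returns immediately if *count is no longer 0. */
    while (!sem_trywait_sketch(count))
        futex_op(count, FUTEX_WAIT | FUTEX_PRIVATE_FLAG, 0);
}

void sem_post_sketch(atomic_int *count)
{
    int prev = atomic_fetch_add(count, 1);
    assert(prev >= 0);
    /* Simplification: always wake one waiter, even if none sleeps. */
    futex_op(count, FUTEX_WAKE | FUTEX_PRIVATE_FLAG, 1);
}
```

When posts arrive quickly, the waiter usually succeeds in phase 1 and
never enters the kernel, which is where the gain in a wait/post
micro-benchmark would come from.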

The above results are all based on micro-benchmarks which are
potentially meaningless to real-world applications, so I'd be
interested in seeing any higher-level or real-application-based
comparisons of the old and new code.

There is one remaining performance issue I still want to look into
fixing, possibly during this release cycle: when a thread repeatedly
takes and releases a lock on which other threads are waiting, it makes
a futex wake syscall on each unlock, despite only the first one being
necessary. I have a design for avoiding this on internal locks, but
it's less obvious how to do it for mutexes where storage is tight and
self-synchronized destruction is possible.
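
To make the problem concrete, here is a sketch of a lock whose unlock
path wakes whenever the waiter count is nonzero. The structure
layout, the names, and the wake counter are hypothetical and only for
illustration; this is neither musl's actual lock nor the design
mentioned above. As long as a waiter is registered and has not yet
run, every release by the same thread issues another FUTEX_WAKE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical lock layout for illustration; not musl's structure. */
struct sketch_lock {
    atomic_int locked;   /* 0 = free, 1 = held */
    atomic_int waiters;  /* threads asleep or about to sleep */
};

/* Counts FUTEX_WAKE syscalls so the cost is visible. */
atomic_long sketch_wake_calls;

void sketch_lock_acquire(struct sketch_lock *l)
{
    int expected = 0;
    while (!atomic_compare_exchange_strong(&l->locked, &expected, 1)) {
        /* Register as a waiter, then sleep while the lock is held. */
        atomic_fetch_add(&l->waiters, 1);
        syscall(SYS_futex, &l->locked,
                FUTEX_WAIT | FUTEX_PRIVATE_FLAG, 1, NULL, NULL, 0);
        atomic_fetch_sub(&l->waiters, 1);
        expected = 0;
    }
}

void sketch_lock_release(struct sketch_lock *l)
{
    assert(atomic_load(&l->locked) == 1);
    atomic_store(&l->locked, 0);
    /* A wake is issued on every release while any waiter is
     * registered, even if an earlier wake is still pending. */
    if (atomic_load(&l->waiters) > 0) {
        atomic_fetch_add(&sketch_wake_calls, 1);
        syscall(SYS_futex, &l->locked,
                FUTEX_WAKE | FUTEX_PRIVATE_FLAG, 1, NULL, NULL, 0);
    }
}
```

If one thread runs acquire/release three times while a single waiter
stays registered, the counter reaches 3, although only the first wake
was needed to get that waiter moving.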

We're near the end of my planned time frame for this release cycle,
but I'm still interested in working with Jens to get C11 threads into
this release if possible, so I'll probably extend it a while longer.

Rich