From: Rich Felker
To: musl@lists.openwall.com
Date: Mon, 25 Aug 2014 23:43:21 -0400
Subject: Multi-threaded performance progress
Message-ID: <20140826034321.GA13999@brightrain.aerifal.cx>

This release cycle looks like it's going to be a big one for multi-threaded performance. So far the cumulative improvement on my main development system, as measured by cond_bench.c by Timo Teräs, is from ~250k signals in 2 seconds up to ~3.7M signals in 2 seconds. That's comparable to what glibc gets on similar hardware, and glibc's cond var implementation is much less correct.
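To make the "signals in 2 seconds" metric concrete, here is a minimal sketch of that kind of micro-benchmark. It is not Timo Teräs's cond_bench.c, whose source isn't reproduced here; the names run_bench and player and the ping-pong workload are hypothetical. Two threads hand a turn back and forth under one mutex and one condition variable, and each handoff counts as one signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical ping-pong benchmark (not cond_bench.c): two threads
   pass a "turn" back and forth using one mutex and one cond var. */
static pthread_mutex_t bench_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t bench_cond = PTHREAD_COND_INITIALIZER;
static int turn;      /* whose turn it is: 0 or 1 */
static int stop;      /* set by run_bench when time is up */
static long signals;  /* pthread_cond_signal calls made so far */

static void *player(void *arg)
{
    int me = *(int *)arg;

    pthread_mutex_lock(&bench_lock);
    for (;;) {
        /* Sleep until it is our turn or the benchmark is over. */
        while (turn != me && !stop)
            pthread_cond_wait(&bench_cond, &bench_lock);
        if (stop)
            break;
        /* Hand the turn to the other thread and wake it. */
        turn = !me;
        signals++;
        pthread_cond_signal(&bench_cond);
    }
    pthread_mutex_unlock(&bench_lock);
    return NULL;
}

/* Runs the ping-pong for roughly `ms` milliseconds and returns the
   number of signals sent. */
long run_bench(long ms)
{
    static int ids[2] = { 0, 1 };
    pthread_t t[2];
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };

    turn = 0;
    stop = 0;
    signals = 0;
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, player, &ids[i]);

    nanosleep(&ts, NULL);

    /* Tell both players to finish and wake whichever one is waiting. */
    pthread_mutex_lock(&bench_lock);
    stop = 1;
    pthread_cond_broadcast(&bench_cond);
    pthread_mutex_unlock(&bench_lock);

    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return signals;
}
```

Calling run_bench(2000) and printing the result yields a count in the same units as the figures above, but with a different workload, so the numbers are not directly comparable to cond_bench.c results.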
The improvements come from adding private futex support, redesigning the cond var implementation, and improving the spin-before-futex-wait behavior. Semaphore performance has also improved, from fewer than 500k wait/post operations to ~12M, mostly due to spin-before-futex-wait.

The above results are all based on micro-benchmarks, which may be meaningless for real-world applications, so I'd be interested in seeing any higher-level or real-application-based comparisons of the old and new code.

There is one remaining performance issue I still want to look into fixing, possibly during this release cycle: when a thread repeatedly takes and releases a lock on which other threads are waiting, it makes a futex wake syscall on each unlock, even though only the first one is necessary. I have a design for avoiding this on internal locks, but it's less obvious how to do it for mutexes, where storage is tight and self-synchronized destruction is possible.

We're near the end of my planned time frame for this release cycle, but I'm still interested in working with Jens to get C11 threads into this release if possible, so I'll probably extend the cycle a while longer.

Rich
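The spin-before-futex-wait and private-futex ideas mentioned in this post can be illustrated with a minimal sketch, assuming Linux, syscall(SYS_futex, ...) and C11 atomics. It is the classic three-state futex lock from Ulrich Drepper's "Futexes Are Tricky", not musl's implementation, and every name in it (spinfutex_lock, spinfutex_unlock, SPIN_COUNT, wake_calls, sf_demo) is hypothetical. Its unlock makes a wake syscall only when the lock word says a waiter may be sleeping. Whether that kind of design would suit musl's mutexes is not settled here, and the sketch does nothing about self-synchronized destruction: the wake after release touches memory that another thread may already have freed.

```c
#define _GNU_SOURCE
#include <linux/futex.h>   /* FUTEX_WAIT_PRIVATE, FUTEX_WAKE_PRIVATE */
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_futex */
#include <unistd.h>        /* syscall */

/* Hypothetical three-state lock (not musl's code). Lock word:
   0 = unlocked, 1 = locked, 2 = locked and possibly contended. */
typedef struct { atomic_int state; } spinfutex;

#define SPIN_COUNT 100          /* hypothetical; real tuning varies */
static atomic_long wake_calls;  /* FUTEX_WAKE syscalls issued */

static long futex_op(atomic_int *addr, int op, int val)
{
    /* _PRIVATE ops say the futex is not shared between processes,
       which lets the kernel take a cheaper lookup path. */
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

void spinfutex_lock(spinfutex *l)
{
    int c = 0;

    /* Spin briefly before sleeping: a short critical section often ends
       within a few iterations, avoiding a futex wait and its wake.
       Stop spinning early if the word says others are already waiting. */
    for (int i = 0; i < SPIN_COUNT; i++) {
        c = 0;
        if (atomic_compare_exchange_strong(&l->state, &c, 1))
            return;
        if (c == 2)
            break;
    }

    /* Slow path: mark the lock contended and sleep while it is held. */
    if (c != 2)
        c = atomic_exchange(&l->state, 2);
    while (c != 0) {
        futex_op(&l->state, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(&l->state, 2);
    }
}

void spinfutex_unlock(spinfutex *l)
{
    /* Release; if the old value was 2, a waiter may be asleep, so wake
       one. A thread that immediately re-locks sets the word to 1, not 2,
       so its next unlock skips the wake until a woken waiter runs and
       marks the word contended again. */
    if (atomic_exchange(&l->state, 0) == 2) {
        atomic_fetch_add(&wake_calls, 1);
        futex_op(&l->state, FUTEX_WAKE_PRIVATE, 1);
    }
}

/* Usage demo: four threads increment a shared counter under the lock. */
static spinfutex demo_lock;
static long demo_counter;

static void *demo_worker(void *arg)
{
    int iters = *(int *)arg;
    for (int i = 0; i < iters; i++) {
        spinfutex_lock(&demo_lock);
        demo_counter++;
        spinfutex_unlock(&demo_lock);
    }
    return NULL;
}

long sf_demo(int iters)
{
    pthread_t t[4];
    demo_counter = 0;
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, demo_worker, &iters);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return demo_counter;
}
```

sf_demo(100000) should return 400000 if the lock provides mutual exclusion; comparing wake_calls with the total number of unlocks gives a rough idea of how many unlocks actually needed a syscall under that workload.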