From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/5875 Path: news.gmane.org!not-for-mail From: Rich Felker Newsgroups: gmane.linux.lib.musl.general Subject: Re: New private cond var design Date: Mon, 18 Aug 2014 00:04:37 -0400 Message-ID: <20140818040437.GO12888@brightrain.aerifal.cx> References: <20140815193536.GA26312@brightrain.aerifal.cx> Reply-To: musl@lists.openwall.com NNTP-Posting-Host: plane.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Trace: ger.gmane.org 1408334701 23849 80.91.229.3 (18 Aug 2014 04:05:01 GMT) X-Complaints-To: usenet@ger.gmane.org NNTP-Posting-Date: Mon, 18 Aug 2014 04:05:01 +0000 (UTC) To: musl@lists.openwall.com Original-X-From: musl-return-5881-gllmg-musl=m.gmane.org@lists.openwall.com Mon Aug 18 06:04:54 2014 Return-path: Envelope-to: gllmg-musl@plane.gmane.org Original-Received: from mother.openwall.net ([195.42.179.200]) by plane.gmane.org with smtp (Exim 4.69) (envelope-from ) id 1XJEBi-0005Af-5q for gllmg-musl@plane.gmane.org; Mon, 18 Aug 2014 06:04:54 +0200 Original-Received: (qmail 5845 invoked by uid 550); 18 Aug 2014 04:04:52 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: Original-Received: (qmail 5837 invoked from network); 18 Aug 2014 04:04:51 -0000 Content-Disposition: inline In-Reply-To: <20140815193536.GA26312@brightrain.aerifal.cx> User-Agent: Mutt/1.5.21 (2010-09-15) Original-Sender: Rich Felker Xref: news.gmane.org gmane.linux.lib.musl.general:5875 Archived-At: On Fri, Aug 15, 2014 at 03:35:36PM -0400, Rich Felker wrote: > Jens' proposed solution tracked "instances" via dynamically allocated, > reference-counted objects. I finally think I have a solution which > avoids dynamic allocation: representing the "instance" as a > doubly-linked-list of automatic objects on the stack of each waiter. > > [...] > > Option 2: Each waiter can wait on a separate futex on its own stack, > so that sequence numbers are totally unneeded. This eliminates all > spurious wakes; signal can precisely control exactly which waiter > wakes (e.g. choosing the oldest), thereby waking only one waiter. > Broadcast then becomes much more expensive: the broadcasting thread > has to make one requeue syscall per waiter. But this still might be a > good design. I ended up implementing option 2. It doesn't avoid O(n) operations in broadcast like Jens seems to have had in mind (although it only makes O(1) syscalls); avoiding the need for waiters to access the cv object after waiting (which would require round-trip synchronization with all waiters at broadcast time) necessitates writing to each waiter node object. I have some improvements in mind that might further reduce the time spent in broadcast and make the wake order more predictable (and reduce code size) to look into soon, though. So far I'm really happy with the performance -- it's much better than before, sometimes even 2-3x. The benchmark by Timo Teräs that showed the benefit of private futexes particularly benefits. My cvb2.c test benefits much less: roughly 15% improvement for private mutex and ~33% improvement with process-shared mutex. I'd still like to look into whether it's possible to improve process-shared cond vars (even at the expense of some performance) to get rid of the synchronization in the destroy function (and thereby make them unmapping-safe too). Rich