From: Rich Felker
To: musl@lists.openwall.com
Subject: Re: [PATCH] private futex support
Date: Fri, 8 Aug 2014 21:50:13 -0400
Message-ID: <20140809015013.GX1674@brightrain.aerifal.cx>
In-Reply-To: <20140808113857.1839babf@vostro>

On Fri, Aug 08, 2014 at 11:38:57AM +0300, Timo Teras wrote:
> > actually commit it. If anyone is interested in this feature, please
> > see if you can find some examples that demonstrate that it measurably
> > improves performance.
>
> And running my simple test case, which has two threads wake each other
> up using a condition variable, seems to yield a noticeable performance
> speedup from private futexes. See the end of the mail for the code.
>
> The low and high numbers from a few test runs on musl git
> 4fe57cad709fdfb377060, without and with the futex patch, are as
> follows:
>
> ~/privfutex $ time ~/oss/musl/lib/libc.so ./test
> count=2516417
> real 0m 2.00s
> user 0m 1.68s
> sys 0m 2.30s
>
> ~/privfutex $ time ~/oss/musl/lib/libc.so ./test
> count=2679381
> real 0m 2.00s
> user 0m 1.59s
> sys 0m 2.39s
>
> Private futexes:
>
> ~/privfutex $ time ~/oss/musl/lib/libc.so ./test
> count=3839470
> real 0m 2.00s
> user 0m 1.68s
> sys 0m 1.98s
>
> ~/privfutex $ time ~/oss/musl/lib/libc.so ./test
> count=5350852
> real 0m 2.00s
> user 0m 1.66s
> sys 0m 2.32s
>
> You can see clearly lowered sys time use, and up to doubled
> throughput of wait/wake operations.

I was able to match the relative difference (albeit at about 10% of
the total throughput you got for both versions) on my Atom. I also dug
up an old test of mine that shows some difference (1.9s vs 2.2s to
run). The original point of that test was to demonstrate that glibc's
non-process-shared condvars are 2-2.5x slower than its process-shared
ones (yes, the opposite of what you would expect; see glibc bug
13234). The code is attached.
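For illustration only, a ping-pong load of the sort described above
might look roughly like the following sketch. This is written under my
own assumptions and is not Timo's actual test: the pong() function, the
fixed two-second run, and the single shared counter are all invented.
The point is that essentially every iteration goes through a condvar
wait/wake, i.e. a futex syscall, which is where the private-futex path
matters.

/* Sketch only: a two-thread condvar ping-pong; not Timo's actual test. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t c = PTHREAD_COND_INITIALIZER;
static int turn, done;
static unsigned long count;

/* Each thread waits for its turn, hands the turn back, and signals the
 * other thread, so the two threads keep waking each other up. */
static void *pong(void *arg)
{
	int me = (int)(long)arg;
	pthread_mutex_lock(&m);
	while (!done) {
		while (turn != me && !done) pthread_cond_wait(&c, &m);
		turn = !me;
		count++;
		pthread_cond_signal(&c);
	}
	pthread_mutex_unlock(&m);
	return 0;
}

int main()
{
	pthread_t t0, t1;
	pthread_create(&t0, 0, pong, (void *)0L);
	pthread_create(&t1, 0, pong, (void *)1L);
	sleep(2);                       /* let the ping-pong run for ~2s */
	pthread_mutex_lock(&m);
	done = 1;
	pthread_cond_broadcast(&c);
	pthread_mutex_unlock(&m);
	pthread_join(t0, 0);
	pthread_join(t1, 0);
	printf("count=%lu\n", count);
	return 0;
}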
> So I suspect your test case was not measuring the right thing.
> Private futexes speed up only specific loads, and this type of
> pthread_cond_t usage is probably the pattern that benefits most.
>
> Please reconsider adding this after addressing the deficiencies
> noted at the beginning.

Yes, I think you've succeeded in establishing that private futex
support is useful. So now I just need to check for more stupid
mistakes, get it into a form that's ready to commit, and do some
testing between now and the next release. We should do at least one
test with private futexes hard-wired to fail (or just find an old
kernel to test on) to make sure the fallback code is working, too;
see the sketch after the attached test for the general shape of such
a fallback.

Rich

/* attachment: cvb2.c */
#include <pthread.h>
#include <stdio.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t c = PTHREAD_COND_INITIALIZER;
volatile int p;
int left[5], avail[5], wakes;

/* Worker: consume work units for slot i, waiting on the condvar until
 * the main thread marks the slot available; count every wakeup. */
void *tf(void *arg)
{
	int i = (long)arg;
	pthread_mutex_lock(&m);
	while (left[i]) {
		while (!avail[i]) pthread_cond_wait(&c, &m), wakes++;
		left[i]--;
		avail[i]--;
	}
	pthread_mutex_unlock(&m);
	return 0;
}

int main()
{
	pthread_t td[5];
	int i, total;
	pthread_mutexattr_t ma;
	pthread_mutexattr_init(&ma);
	pthread_mutexattr_settype(&ma, PTHREAD_MUTEX_ERRORCHECK);
	pthread_condattr_t ca;
	pthread_condattr_init(&ca);
	pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);
	/* Uncomment to use a process-shared condvar and an error-checking
	 * mutex instead of the statically initialized defaults: */
	//pthread_cond_init(&c, &ca);
	//pthread_mutex_init(&m, &ma);

	for (i=0; i<5; i++) left[i] = 100000;
	for (i=0; i<5; i++) pthread_create(td+i, 0, tf, (void *)(long)i);

	/* Main thread: repeatedly hand one unit to every worker and
	 * broadcast, until all the work is consumed. */
	pthread_mutex_lock(&m);
	for (;;) {
		for (total=i=0; i<5; i++) total += left[i];
		if (!total) break;
		for (i=0; i<5; i++) avail[i] = 1;
		pthread_cond_broadcast(&c);
		pthread_mutex_unlock(&m);
		pthread_mutex_lock(&m);
	}
	pthread_mutex_unlock(&m);

	for (i=0; i<5; i++) pthread_join(td[i], 0);
	printf("%d\n", wakes);
	return 0;
}
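As for testing the fallback: the code below is only a sketch of the
general pattern, written under my own assumptions; it is not musl's
actual implementation, and futex_wait(), futex_wake(), and the
futex_private variable are hypothetical names. The idea is that the
first private-flagged call the kernel rejects with ENOSYS clears the
flag and retries the operation as a plain shared futex. Hard-wiring
private futexes to fail, as suggested above, amounts to forcing that
ENOSYS branch.

/* Sketch only: generic private-futex fallback; not musl's actual code. */
#include <errno.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Cleared the first time the kernel rejects a private futex op. */
static int futex_private = FUTEX_PRIVATE_FLAG;

static int futex_wait(volatile int *addr, int val)
{
	int r = syscall(SYS_futex, addr, FUTEX_WAIT | futex_private, val, 0, 0, 0);
	if (r < 0 && errno == ENOSYS && futex_private) {
		/* Kernel without private futex support: retry shared. */
		futex_private = 0;
		r = syscall(SYS_futex, addr, FUTEX_WAIT, val, 0, 0, 0);
	}
	return r;
}

static int futex_wake(volatile int *addr, int n)
{
	int r = syscall(SYS_futex, addr, FUTEX_WAKE | futex_private, n, 0, 0, 0);
	if (r < 0 && errno == ENOSYS && futex_private) {
		futex_private = 0;
		r = syscall(SYS_futex, addr, FUTEX_WAKE, n, 0, 0, 0);
	}
	return r;
}

int main()
{
	volatile int dummy = 0;
	futex_wake(&dummy, 1);  /* no waiters; just exercises the path */
	futex_wait(&dummy, 1);  /* returns at once: dummy != 1 (EAGAIN) */
	return 0;
}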