From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: New optimized normal-type mutex?
Date: Thu, 30 Jul 2015 09:45:19 -0400
Message-ID: <20150730134519.GB16376@brightrain.aerifal.cx>
In-Reply-To: <1438243654.10742.9.camel@inria.fr>
To: musl@lists.openwall.com
Reply-To: musl@lists.openwall.com

On Thu, Jul 30, 2015 at 10:07:34AM +0200, Jens Gustedt wrote:
> On Wednesday, 29.07.2015, 20:10 -0400, Rich Felker wrote:
> > On Thu, Jul 30, 2015 at 01:49:20AM
> > +0200, Jens Gustedt wrote:
> > > Hm, could you be more specific about where this hurts?
> > >
> > > In the code I have there is
> > >
> > >     for (;val & lockbit;) {
> > >         __syscall(SYS_futex, loc, FUTEX_WAIT, val, 0);
> > >         val = atomic_load_explicit(loc, memory_order_consume);
> > >     }
> > >
> > > so this should be robust against spurious wakeups, no?
> >
> > The problem is that futex_wait returns immediately with EAGAIN if
> > *loc!=val, which happens very often if *loc is incremented or
> > otherwise changed on each arriving waiter.
>
> Yes, sure, it may change. Whether or not this happens often may
> depend; I don't think we can easily make a quantitative statement
> here.

The same thing happened with the obvious implementation of cond vars:
having each waiter store its tid, then wait on *val==tid with
futex_wait. The ratio of spurious wakes to intended wakes was something
like 10:1, so this is not a purely theoretical problem.

> In the case of atomics the critical section is extremely short, and
> the count, if it varies so much, should have stabilized a bit during
> the spinlock phase before coming to the futex part. That futex part
> is really only a last resort for the rare case that the thread that
> is holding the lock has been descheduled in the middle.

Spinning is near-useless whenever you have more contenders than cores.
It's an optimization for a special case (fewer contenders than cores),
but you don't want to rely on it.

> My current test case is having X threads hammer on one single
> location, X being up to some hundred. On my 2x2 hyperthreaded CPU,
> for a reasonable number of threads (X = 16 or 32) I see an overall
> performance improvement of about 30% when using my version of the
> lock instead of the original musl one. The point of inversion, where
> the original musl lock becomes better, is at about 200 threads.

Interesting. I suspect this would change quickly if you increased the
amount of work done with the lock held (e.g. to more closely mimic
malloc).
The nasty thing is that, as soon as you _do_ cause futex_wait to start
happening rather than just spinning, performance blows up, because each
thread that needs to wait calls futex_wait several times before it
succeeds, taking up cpu time that the thread holding the lock could
have used to run.

Rich