From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/8246
Path: news.gmane.org!not-for-mail
From: Alexander Monakov <amonakov@ispras.ru>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: New optimized normal-type mutex?
Date: Thu, 30 Jul 2015 14:37:13 +0300 (MSK)
Message-ID: <alpine.LNX.2.20.1507301426460.11825@monopod.intra.ispras.ru>
References: <20150521234402.GA25373@brightrain.aerifal.cx>  <8c49d81e.dNq.dMV.21.hNiSfA@mailjet.com>  <1438207875.10742.3.camel@inria.fr>  <20150729233054.GZ16376@brightrain.aerifal.cx>  <1438213760.10742.5.camel@inria.fr>  <20150730001014.GA16376@brightrain.aerifal.cx>
  <1438243654.10742.9.camel@inria.fr> <1438247427.10742.13.camel@inria.fr>  <alpine.LNX.2.20.1507301228230.11825@monopod.intra.ispras.ru> <1438250459.10742.16.camel@inria.fr>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-Trace: ger.gmane.org 1438256248 18659 80.91.229.3 (30 Jul 2015 11:37:28 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Thu, 30 Jul 2015 11:37:28 +0000 (UTC)
To: musl@lists.openwall.com
In-Reply-To: <1438250459.10742.16.camel@inria.fr>
User-Agent: Alpine 2.20 (LNX 67 2015-01-07)
Xref: news.gmane.org gmane.linux.lib.musl.general:8246
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/8246>

On Thu, 30 Jul 2015, Jens Gustedt wrote:
> Am Donnerstag, den 30.07.2015, 12:36 +0300 schrieb Alexander Monakov:
> > That sounds like your testcase simulates a load where you'd be better off with
> > a spinlock in the first place, no?
> 
> Hm, this is not a "testcase" in the sense that this is the real code
> that I'd like to use for the generic atomic lock-full stuff. My test
> is just using this atomic lock-full thing, with a lot of threads that
> use the same head of a "lock-free" FIFO implementation. There the
> inner part in the critical section is just memcpy of some bytes. For
> reasonable uses of atomics this should be about 16 to 32 bytes that
> are copied.
> 
> So this is really a use case that I consider important, and that I
> would like to see implemented with similar performance.

I acknowledge that this seems like an important case, but you have not
addressed my main point.  With so little work in the critical section, it does
not make sense to me to use something like a normal-type, futex-based mutex.
Even the call/return needed to acquire it adds overhead.  I'd expect you to use
a fully inlined spinlock acquire/release around the memory copy.

> 
> (I didn't yet think of making this into a full-fledged mutex,
> implementing timed versions certainly needs some thinking.)
> 
> > Have you tried simulating a load that does some non-trivial work between
> > lock/unlock, making a spinlock a poor fit?
> 
> No. But I am not sure that there is such a case :)

There appears to be some miscommunication here, and the smiley does not help.
"such a case" would be copying 32KB in the critical section, for example.
 
> With this idea that the counter doesn't change once the thread is
> inside the lock-acquisition loop, there is much less noise on the lock
> value. This has two benefits. First the accesses in the loop are
> mainly reads, to see if there has been a change, no writes. So the bus
> pressure should be reduced. And second, because there are fewer writes
> in total, other threads that are inside the same loop perceive less
> perturbation, and the futex has a good chance to succeed.

I think spinning every time you're about to enter futex_wait helps only if you
expect critical sections to be about as short as your spin period.  Otherwise,
it's not obviously an improvement.  Normally, I think, you spin once, before
the very first atomic operation, in the hope that the lock gets released and
you can proceed via the fast path.  Your spin scheme is different.

Alexander