From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/5788
Path: news.gmane.org!not-for-mail
From: Rich Felker <dalias@libc.org>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Explaining cond var destroy [Re: [musl] C threads, v3.0]
Date: Fri, 8 Aug 2014 16:48:55 -0400
Message-ID: <20140808204855.GQ1674@brightrain.aerifal.cx>
References: <1407358510.24324.328.camel@eris.loria.fr>
 <20140806220453.GD1674@brightrain.aerifal.cx>
 <1407365015.24324.334.camel@eris.loria.fr>
 <20140806231539.GE1674@brightrain.aerifal.cx>
 <1407397851.24324.354.camel@eris.loria.fr>
 <20140807161342.GH1674@brightrain.aerifal.cx>
 <1407430024.24324.387.camel@eris.loria.fr>
 <20140807172514.GI1674@brightrain.aerifal.cx>
 <1407489627.24324.419.camel@eris.loria.fr>
 <20140808191405.GP1674@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: ger.gmane.org 1407530960 31252 80.91.229.3 (8 Aug 2014 20:49:20 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Fri, 8 Aug 2014 20:49:20 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-5793-gllmg-musl=m.gmane.org@lists.openwall.com Fri Aug 08 22:49:10 2014
Return-path: <musl-return-5793-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@plane.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by plane.gmane.org with smtp (Exim 4.69)
	(envelope-from <musl-return-5793-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1XFr64-0004nW-0Y
	for gllmg-musl@plane.gmane.org; Fri, 08 Aug 2014 22:49:08 +0200
Original-Received: (qmail 21651 invoked by uid 550); 8 Aug 2014 20:49:07 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
Original-Received: (qmail 21643 invoked from network); 8 Aug 2014 20:49:07 -0000
Content-Disposition: inline
In-Reply-To: <20140808191405.GP1674@brightrain.aerifal.cx>
User-Agent: Mutt/1.5.21 (2010-09-15)
Original-Sender: Rich Felker <dalias@aerifal.cx>
Xref: news.gmane.org gmane.linux.lib.musl.general:5788
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/5788>

On Fri, Aug 08, 2014 at 03:14:06PM -0400, Rich Felker wrote:
> I think I may have a solution you'll like:
> 
> We can perform the release of the lock via a compare-and-swap rather
> than a simple swap. In this way, we can know before releasing the lock
> whether it's going to require a wake or not:
> 
> - If waiters was zero and the cas from owned/uncontended to zero
>   succeeds, no futex wake operation is needed.
> 
> - If waiters was nonzero, or if the cas fails (thereby instead
>   requiring a cas from owned/contended to zero), we can do the
>   following:
> 
> Don't use a userspace CAS to release; this would allow the lock to be
> acquired by another thread, released, destroyed, and freed before the
> futex wake is performed. Instead, use FUTEX_WAKE_OP to atomically
> perform the atomic assignment and futex wake.

FUTEX_WAKE_OP is highly under-documented, and i'm worried it might be
unsupported on some archs (since the atomics for it have to be
implemented on a per-arch basis in the kernel) but of course we can
just fallback on archs where it's not supported yet.

Anyway, the behavior seems to be:

- Futex acquisition for uaddr1 and uaddr2 both happen prior to the
  atomic operation, and this hold locks that seem to prevent new
  waiters on the futex(es). This should preclude any risk of waking a
  new waiter that arrives after the atomic operation, as desired.

- Both uaddr1 and uaddr2 are hashed, with no check for equality. This
  is a fairly costly wasteful operation, but could be fixed on the
  kernel side. At present I suspect they don't care because
  FUTEX_WAKE_OP is considered unnecessary, but if I raise it on the
  glibc bug tracker thread for issue 13690 as a solution to the
  problem, I think there would be a lot more interest in optimizing
  this kernel path.

- After the atomic operation is performed, a wake is always performed
  on uaddr1 (based on the previous acquisition); this fact is omitted
  from all the documentation, but it's obviously intentional since
  otherwise the uaddr1 argument would not be used for anything but
  wasting time. The wake on uaddr2 is conditional on a comparison.

- No allocation is required anywhere in the operation, so we don't
  have to worry about lost actions on OOM. For plain FUTEX_WAKE this
  would not have been an issue (if acquirin the futex required memory,
  then failure for FUTEX_WAKE to acquire it would mean there was no
  FUTEX_WAIT taking place anyway), but for FUTEX_WAKE_OP, failure
  would omit the atomic operation, which must take place even if there
  are no current FUTEX_WAIT waiters (e.g. if the FUTEX_WAIT was
  interrupted by a signal handler).

Based on the above, I think it's safe to move forward with using
FUTEX_WAKE_OP. It seems optimal to me to use uaddr1==uaddr2 and a
comparison that always yields false, so that the wake only goes to
uaddr1. This will allow the kernel to optimize out double-hashing in
the future by checking for uaddr1==uaddr2, and already optimizes out
the double-iteration of the hash bucket for waking purposes.

Any further thoughts on the matter? I think we should finish the
private futex support task before starting on this, so that we don't
do new work that's going to conflict with a pending patch.

Rich