Re: Explaining cond var destroy [Re: [musl] C threads, v3.0]


From: Jens Gustedt <jens.gustedt@inria.fr>
To: musl@lists.openwall.com
Subject: Re: Explaining cond var destroy [Re: [musl] C threads, v3.0]
Date: Sat, 09 Aug 2014 08:47:34 +0200
Message-ID: <1407566854.4988.231.camel@eris.loria.fr>
In-Reply-To: <20140808204855.GQ1674@brightrain.aerifal.cx>

Hello,

On Friday, 08.08.2014 at 16:48 -0400, Rich Felker wrote:
> On Fri, Aug 08, 2014 at 03:14:06PM -0400, Rich Felker wrote:
> > I think I may have a solution you'll like:
> > 
> > We can perform the release of the lock via a compare-and-swap rather
> > than a simple swap. In this way, we can know before releasing the lock
> > whether it's going to require a wake or not:
> > 
> > - If waiters was zero and the cas from owned/uncontended to zero
> >   succeeds, no futex wake operation is needed.
> > 
> > - If waiters was nonzero, or if the cas fails (thereby instead
> >   requiring a cas from owned/contended to zero), we can do the
> >   following:
> > 
> > Don't use a userspace CAS to release; this would allow the lock to be
> > acquired by another thread, released, destroyed, and freed before the
> > futex wake is performed. Instead, use FUTEX_WAKE_OP to atomically
> > perform the atomic assignment and futex wake.
> 
> FUTEX_WAKE_OP is highly under-documented, and I'm worried it might be
> unsupported on some archs (since the atomics for it have to be
> implemented on a per-arch basis in the kernel), but of course we can
> just fall back on archs where it's not supported yet.
> 
> Anyway, the behavior seems to be:
> 
> - Futex acquisition for both uaddr1 and uaddr2 happens prior to the
>   atomic operation, and this holds locks that seem to prevent new
>   waiters on the futex(es). This should preclude any risk of waking a
>   new waiter that arrives after the atomic operation, as desired.
> 
> - Both uaddr1 and uaddr2 are hashed, with no check for equality. This
>   is a fairly costly, wasteful operation, but could be fixed on the
>   kernel side. At present I suspect they don't care because
>   FUTEX_WAKE_OP is considered unnecessary, but if I raise it on the
>   glibc bug tracker thread for issue 13690 as a solution to the
>   problem, I think there would be a lot more interest in optimizing
>   this kernel path.
> 
> - After the atomic operation is performed, a wake is always performed
>   on uaddr1 (based on the previous acquisition); this fact is omitted
>   from all the documentation, but it's obviously intentional since
>   otherwise the uaddr1 argument would not be used for anything but
>   wasting time. The wake on uaddr2 is conditional on a comparison.
> 
> - No allocation is required anywhere in the operation, so we don't
>   have to worry about lost actions on OOM. For plain FUTEX_WAKE this
>   would not have been an issue (if acquiring the futex required memory,
>   then failure for FUTEX_WAKE to acquire it would mean there was no
>   FUTEX_WAIT taking place anyway), but for FUTEX_WAKE_OP, failure
>   would omit the atomic operation, which must take place even if there
>   are no current FUTEX_WAIT waiters (e.g. if the FUTEX_WAIT was
>   interrupted by a signal handler).
> 
> Based on the above, I think it's safe to move forward with using
> FUTEX_WAKE_OP. It seems optimal to me to use uaddr1==uaddr2 and a
> comparison that always yields false, so that the wake only goes to
> uaddr1. This will allow the kernel to optimize out double-hashing in
> the future by checking for uaddr1==uaddr2, and already optimizes out
> the double-iteration of the hash bucket for waking purposes.
> 
> Any further thoughts on the matter? I think we should finish the
> private futex support task before starting on this, so that we don't
> do new work that's going to conflict with a pending patch.

This looks promising, but I don't yet know enough about these less
common futex operations to comment further.
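
For readers following along, the release path proposed in the quoted
message could be sketched roughly as below. This is only an
illustration under assumptions that are not in the thread: the three
lock states (0 free, 1 owned/uncontended, 2 owned/contended), the
sk_* names, and the use of the GCC/Clang __atomic builtins are all
hypothetical, and the separate waiters count is left out. It is not
musl's implementation.

```c
#define _GNU_SOURCE
#include <linux/futex.h>   /* FUTEX_WAKE_OP, FUTEX_OP, FUTEX_OP_SET, FUTEX_OP_CMP_LT */
#include <sys/syscall.h>   /* SYS_futex */
#include <unistd.h>        /* syscall */

/* Assumed lock states for this sketch (not musl's real encoding). */
enum { SK_FREE = 0, SK_OWNED = 1, SK_CONTENDED = 2 };

/* Thin wrapper: the futex syscall takes the second wake count in the
 * argument slot that other futex operations use for the timeout. */
static long sk_futex_wake_op(int *uaddr1, int nwake1,
                             int *uaddr2, int nwake2, int op)
{
    return syscall(SYS_futex, uaddr1, FUTEX_WAKE_OP, nwake1,
                   (unsigned long)nwake2, uaddr2, op);
}

/* Release the lock. Returns the number of waiters woken, or -1 on error. */
static long sk_unlock(int *lock)
{
    int expected = SK_OWNED;

    /* Fast path: nobody is waiting, so no wake follows the store and
     * the lock memory is not touched again after the CAS. */
    if (__atomic_compare_exchange_n(lock, &expected, SK_FREE, 0,
                                    __ATOMIC_RELEASE, __ATOMIC_RELAXED))
        return 0;

    /* Slow path: ask the kernel to store SK_FREE and wake one waiter
     * on uaddr1 in one call. uaddr1 == uaddr2, and the comparison
     * "old value < 0" is false for every state assumed above, so no
     * second wake is requested. */
    return sk_futex_wake_op(lock, 1, lock, 1,
                            FUTEX_OP(FUTEX_OP_SET, SK_FREE,
                                     FUTEX_OP_CMP_LT, 0));
}
```

With no thread blocked in FUTEX_WAIT, both paths should leave the lock
word at 0 and report zero woken waiters. On an arch or kernel without
FUTEX_WAKE_OP the slow path would return -1, and a fallback would be
needed as mentioned in the quoted text.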

Generally I think that the control structures should be as tight as
possible and give provable properties in the mathematical sense. The
interaction between userland and kernel land should be minimal, and we
shouldn't provoke reactions from the kernel that concern threads (or
even processes) that are not really targeted.
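
A property of this kind can be checked on a small scale. The sketch
below is a userspace model of how a FUTEX_WAKE_OP operation word is
decoded and applied, following the field layout of the FUTEX_OP macro
in <linux/futex.h>. The m_* names are hypothetical, the
FUTEX_OP_OPARG_SHIFT flag is not modelled, and the lock states 1 and 2
used in the check are an assumption. It only shows that, for those
states, "set to 0, compare old value < 0" never requests a wake on
uaddr2; it says nothing about the kernel's locking.

```c
#include <stdbool.h>
#include <stdint.h>

/* Numeric values as defined in <linux/futex.h>. */
enum { M_OP_SET, M_OP_ADD, M_OP_OR, M_OP_ANDN, M_OP_XOR };
enum { M_CMP_EQ, M_CMP_NE, M_CMP_LT, M_CMP_LE, M_CMP_GT, M_CMP_GE };

struct m_result {
    int  newval;       /* value stored at uaddr2 */
    bool second_wake;  /* whether waiters on uaddr2 would be woken */
};

/* Same bit layout as the FUTEX_OP macro. */
static uint32_t m_encode(uint32_t op, uint32_t oparg,
                         uint32_t cmp, uint32_t cmparg)
{
    return ((op & 0xf) << 28) | ((cmp & 0xf) << 24)
         | ((oparg & 0xfff) << 12) | (cmparg & 0xfff);
}

/* Sign-extend the low 12 bits of v. */
static int m_sext12(uint32_t v)
{
    return (int)((v & 0xfff) ^ 0x800) - 0x800;
}

/* Apply the encoded operation to oldval and evaluate the comparison
 * against the OLD value, as described in the futex(2) man page. */
static struct m_result m_apply(int oldval, uint32_t enc)
{
    int oparg  = m_sext12(enc >> 12);
    int cmparg = m_sext12(enc);
    struct m_result r = { oldval, false };

    switch ((enc >> 28) & 0xf) {
    case M_OP_SET:  r.newval = oparg; break;
    case M_OP_ADD:  r.newval = (int)((unsigned)oldval + (unsigned)oparg); break;
    case M_OP_OR:   r.newval = oldval | oparg; break;
    case M_OP_ANDN: r.newval = oldval & ~oparg; break;
    case M_OP_XOR:  r.newval = oldval ^ oparg; break;
    default:        break;  /* unmodelled op: value left unchanged */
    }

    switch ((enc >> 24) & 0xf) {
    case M_CMP_EQ: r.second_wake = oldval == cmparg; break;
    case M_CMP_NE: r.second_wake = oldval != cmparg; break;
    case M_CMP_LT: r.second_wake = oldval <  cmparg; break;
    case M_CMP_LE: r.second_wake = oldval <= cmparg; break;
    case M_CMP_GT: r.second_wake = oldval >  cmparg; break;
    case M_CMP_GE: r.second_wake = oldval >= cmparg; break;
    default:       break;   /* unmodelled comparison: no second wake */
    }
    return r;
}
```

Calling m_apply(1, enc) and m_apply(2, enc) with
enc = m_encode(M_OP_SET, 0, M_CMP_LT, 0) gives newval 0 and
second_wake false in both cases, whereas a comparison such as
M_CMP_NE against 0 would request the second wake for a held lock.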

Jens


PS: I will be a bit less available in the next days.


-- 
:: INRIA Nancy Grand Est ::: AlGorille ::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::




