On Wednesday, 06.08.2014, at 19:15 -0400, Rich Felker wrote:
> No, it's not. The wait happens prior to the deallocation, in the same
> thread that performs the deallocation. The interleaving looks like
> this:
> 
> Thread A                        Thread B
> --------                        --------
> waiters++
>                                 save waiters count
>                                 atomic unlock
> futex wait fails with EAGAIN
> cas succeeds & gets lock
> waiters--
> [unlock operation]
> [free operation]
>                                 futex wake to freed address
> 
> The free operation in thread A is valid since A knows it is the last
> user of the mutex and thread B's use/ownership of the mutex formally
> ends with the atomic unlock.

No, operating on an object that has been freed is UB.  This is
independent of this object being a mutex or not. This must never
happen. So the free is making a wrong assumption.

I think the fundamental flaw with this approach is that it mixes two
different concepts, the waiters count and a reference count. These are
two different things.

With a reference count, the scheme looks like this.

Initially the "obj->refs" counter is at least 2 because both threads
hold references on the object.

Thread A                                   Thread B
--------                                   --------
waiters++
                                           save waiters count
                                           atomic unlock
futex wait fails with EAGAIN
cas succeeds & gets lock
waiters--
[unlock operation]
if (atomic_fetch_sub(&obj->refs,1) == 1)
  [free operation on obj]

                                           futex wake (obj is still allocated)
                                           if (atomic_fetch_sub(&obj->refs,1) == 1)
                                              [free operation on obj]

Which thread performs the free operation (if any) depends only on the
order in which the atomic_fetch_sub operations take effect. (And
musl doesn't seem to have the primitives to do atomic_fetch_sub?)
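
A minimal sketch of that release step with C11 atomics, just to make
the idea concrete. The names "struct obj", "obj_new" and "obj_release"
are hypothetical and are not musl interfaces:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical object; "refs" counts owners and is kept separate
   from any waiters count. */
struct obj {
    atomic_int refs;
};

/* Allocate an object that starts with "refs" owners. */
struct obj *obj_new(int refs) {
    struct obj *o = malloc(sizeof *o);
    if (o) atomic_init(&o->refs, refs);
    return o;
}

/* Drop one reference. Returns 1 if this call freed the object,
   0 if another owner still holds a reference. */
int obj_release(struct obj *o) {
    int prev = atomic_fetch_sub(&o->refs, 1);
    assert(prev >= 1);           /* releasing without owning is a bug */
    if (prev == 1) {
        free(o);                 /* we were the last owner */
        return 1;
    }
    return 0;
}
```

Whichever thread sees the previous value 1 is the one that frees, so
neither thread has to guess whether the other is done with obj.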

Now I am aware that such a scheme is difficult to establish in a
setting where obj can be malloced or not. This scenario supposes that
both threads *know* that the allocation of obj has been done with
malloc.

The easiest way to ensure that would be to require that the "real"
data object used by the thread lock, unlock, wait etc. operations
is always malloced.

For C threads this can be done by mtx_init and cnd_init.  They would
allocate the dynamic object, set "refs" to 1, and store a link to
that object. For mtx_t and cnd_t dynamic initialization is mandatory.
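
As a rough sketch of what such an init function could look like: the
types "struct dyn_lock" and "my_mtx_t" and the functions "my_mtx_init"
and "my_mtx_destroy" are made up for illustration and deliberately do
not use the real <threads.h> names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical "real" lock object; always obtained from malloc. */
struct dyn_lock {
    atomic_int refs;     /* reference count, 1 for the handle itself */
    atomic_int lock;     /* word the lock/futex operations would use */
    atomic_int waiters;  /* waiters count, distinct from refs */
};

/* Hypothetical user-visible handle: only a link to the real object. */
typedef struct { struct dyn_lock *impl; } my_mtx_t;

enum { my_success = 0, my_nomem = 1 };

int my_mtx_init(my_mtx_t *m) {
    struct dyn_lock *d = malloc(sizeof *d);
    if (!d) {                    /* the added failure path */
        m->impl = NULL;
        return my_nomem;
    }
    atomic_init(&d->refs, 1);
    atomic_init(&d->lock, 0);
    atomic_init(&d->waiters, 0);
    m->impl = d;
    return my_success;
}

void my_mtx_destroy(my_mtx_t *m) {
    struct dyn_lock *d = m->impl;
    m->impl = NULL;
    /* Free only if no other thread still holds a reference. */
    if (d && atomic_fetch_sub(&d->refs, 1) == 1)
        free(d);
}
```

A thread that is about to operate on the lock would take its own
reference first and drop it after its last access, for instance after
the futex wake.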

For pthread unshared mutexes and conditions that are initialized by an
initializer (and not by the corresponding init function) one can
certainly get away with delaying that part of the dynamic initialization
to the first usage. This can certainly also be done with mtx_t and
cnd_t as an extension to the C standard.
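
One possible shape for such a delayed initialization, assuming the
handle holds an atomic pointer that the static initializer sets to
null. "lazy_mtx_t", "LAZY_MTX_INITIALIZER" and "lazy_mtx_get" are
hypothetical names:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical malloced lock object, as in the scheme above. */
struct lazy_lock {
    atomic_int refs;
    atomic_int lock;
};

/* Hypothetical handle; a null link means "not yet allocated". */
typedef struct { _Atomic(struct lazy_lock *) impl; } lazy_mtx_t;
#define LAZY_MTX_INITIALIZER { NULL }

/* Return the real object, allocating it on first use.
   Returns NULL if the allocation fails. */
struct lazy_lock *lazy_mtx_get(lazy_mtx_t *m) {
    struct lazy_lock *d = atomic_load(&m->impl);
    if (d) return d;                    /* already initialized */

    struct lazy_lock *fresh = malloc(sizeof *fresh);
    if (!fresh) return NULL;
    atomic_init(&fresh->refs, 1);
    atomic_init(&fresh->lock, 0);

    struct lazy_lock *expected = NULL;
    if (atomic_compare_exchange_strong(&m->impl, &expected, fresh))
        return fresh;                   /* we installed our object */

    free(fresh);                        /* another thread won the race */
    return expected;                    /* use the winner's object */
}
```

If two threads race on the first use, exactly one compare-and-swap
succeeds and the loser discards its allocation, so both end up with
the same object.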

For process shared mutexes and conditions I have no clue how I would
do that.

This uses dynamic allocation under the hood, and adds possible failure
paths to the game. But as always for allocations these failures occur
in situations where the application is pretty much screwed.

I don't like going dynamic too much, and I strongly suspect that you
don't like such an approach for this reason, either :)

But for C threads this would be a way to go.

Jens

-- 
:: INRIA Nancy Grand Est ::: AlGorille ::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::