Am Dienstag, den 19.05.2015, 19:15 -0400 schrieb Rich Felker:
> On Wed, May 20, 2015 at 12:47:44AM +0200, Jens Gustedt wrote:
> > > >  - pthread_once_t should always be volatile
> > > >  - pthread_spinlock_t should always be volatile
> > > 
> > > These are C++ ABI changes. I'd like to make the change but I'm
> > > concerned it might break things.
> > 
> > Both are broken as they are now, if you fall into a compiler that
> > "knows" that the object itself isn't volatile qualified, and by that
> > excuse takes the liberty to optimize out loads. For both types there
> > is one point where there is a cast to (volatile int*) followed by a
> > load, that might not do what we want.
> > 
> > (For pthread_once_t, this on line 43 in pthread_once.c)
> > 
> > I think the safest would be to make the data types volatile. If that
> > is not possible, do the load with some new function "a_load_rel" that
> > is guaranteed to be volatile, atomic and with memory_order relaxed.
> 
> This would require lots of changes and it would be easy to overlook
> some.

No, no, I just meant a special function that is applied for these two
special cases. (With a big warning flag that we only have that because
ABI compatibility issues.)

> The main reason is that lots of the implementations of a_*
> functions perform loads via C code inside their CAS loops and need the
> load to be volatile. They'd all need to be changed to use a_load_rel,
> and a_load_rel would in turn need to be added for all archs.
> 
> I have a slightly different approach: destroy the compiler's ability
> to known the identity of the object being accessed by passing it
> through:
> 
> volatile int *make_volatile(volatile int *p)
> {
> 	__asm__ ( "" : "=r"(p) : "0"(p) );
> 	return p;
> }

Really not so different, I was just thinking to have one line of
assembler that does the load directly. So minor details.

> > > >  - pthread_barrier needs atomic increment
> > > 
> > > Where?
> > 
> > I think all uses of count in pthread_barrier_wait should be
> > atomic. The increments in lines 15 and 93 should be atomic_fetch_add.
> > 
> > Probably most archs do a ++ on a volatile as one memory operation, but
> > nothing enforces that.
> 
> At line 93, the modification to inst->count happens with a lock held.
> There is no concurrent access.
> 
> At line 15, I think there may be an issue; I need to look closer. But
> it's not concurrent writes that need atomicity of the ++ as a RMW
> operation; those are excluded by a lock. It's just the possibility of
> another thread reading concurrently with the writes that I'm concerned
> may happen.

I don't think that both of these are very performance critical, there
are a lot of locks, descheduling etc in that function. So I would
clearly go for security and simplicity here: make them all atomic.

> > > I don't know a really good solution to this. The vast majority of uses
> > > of tid are self->tid, from which perspective tid is not volatile or
> > > even mutable. We don't want to pessimize them with volatile access.
> > > Probably the best approach is to split it out into separate tid and
> > > exit_futex fields.
> > 
> > Yes, this would probably be safer and cleaner.
> 
> I'm not sure if it's easy to do, though. There seem to be race
> conditions, since ->tid is accessed from other threads for things like
> pthread_kill and pthread_cancel. If we setup ->tid from start(), it
> might not be seen by a pthread_kill or pthread_cancel made by the
> caller of pthread_create after pthread_create returns. If we setup
> ->tid from pthread_create, it would not be seen by the new thread
> immediately when using self->tid. If we setup ->tid from both places,
> then we have a data race (concurrent writes) unless we make it atomic
> (thus volatile) and then we're back to the original problem. (Note
> that we don't have this race now because ->tid is set in the kernel
> before clone returns in either the parent or the child.)
> 
> I think the above make_volatile trick would solve the problem without
> adding a new field.

probably

> Alternatively, instead of ->tid and ->exit_futex, we could have ->tid
> (used by self) and ->volatile_tid (used by other threads acting on the
> target).

I would prefer something along that line, looks cleaner to me.

Jens


-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::