Am Dienstag, den 19.05.2015, 19:15 -0400 schrieb Rich Felker: > On Wed, May 20, 2015 at 12:47:44AM +0200, Jens Gustedt wrote: > > > > - pthread_once_t should always be volatile > > > > - pthread_spinlock_t should always be volatile > > > > > > These are C++ ABI changes. I'd like to make the change but I'm > > > concerned it might break things. > > > > Both are broken as they are now, if you fall into a compiler that > > "knows" that the object itself isn't volatile qualified, and by that > > excuse takes the liberty to optimize out loads. For both types there > > is one point where there is a cast to (volatile int*) followed by a > > load, that might not do what we want. > > > > (For pthread_once_t, this on line 43 in pthread_once.c) > > > > I think the safest would be to make the data types volatile. If that > > is not possible, do the load with some new function "a_load_rel" that > > is guaranteed to be volatile, atomic and with memory_order relaxed. > > This would require lots of changes and it would be easy to overlook > some. No, no, I just meant a special function that is applied for these two special cases. (With a big warning flag that we only have that because ABI compatibility issues.) > The main reason is that lots of the implementations of a_* > functions perform loads via C code inside their CAS loops and need the > load to be volatile. They'd all need to be changed to use a_load_rel, > and a_load_rel would in turn need to be added for all archs. > > I have a slightly different approach: destroy the compiler's ability > to known the identity of the object being accessed by passing it > through: > > volatile int *make_volatile(volatile int *p) > { > __asm__ ( "" : "=r"(p) : "0"(p) ); > return p; > } Really not so different, I was just thinking to have one line of assembler that does the load directly. So minor details. > > > > - pthread_barrier needs atomic increment > > > > > > Where? > > > > I think all uses of count in pthread_barrier_wait should be > > atomic. The increments in lines 15 and 93 should be atomic_fetch_add. > > > > Probably most archs do a ++ on a volatile as one memory operation, but > > nothing enforces that. > > At line 93, the modification to inst->count happens with a lock held. > There is no concurrent access. > > At line 15, I think there may be an issue; I need to look closer. But > it's not concurrent writes that need atomicity of the ++ as a RMW > operation; those are excluded by a lock. It's just the possibility of > another thread reading concurrently with the writes that I'm concerned > may happen. I don't think that both of these are very performance critical, there are a lot of locks, descheduling etc in that function. So I would clearly go for security and simplicity here: make them all atomic. > > > I don't know a really good solution to this. The vast majority of uses > > > of tid are self->tid, from which perspective tid is not volatile or > > > even mutable. We don't want to pessimize them with volatile access. > > > Probably the best approach is to split it out into separate tid and > > > exit_futex fields. > > > > Yes, this would probably be safer and cleaner. > > I'm not sure if it's easy to do, though. There seem to be race > conditions, since ->tid is accessed from other threads for things like > pthread_kill and pthread_cancel. If we setup ->tid from start(), it > might not be seen by a pthread_kill or pthread_cancel made by the > caller of pthread_create after pthread_create returns. If we setup > ->tid from pthread_create, it would not be seen by the new thread > immediately when using self->tid. If we setup ->tid from both places, > then we have a data race (concurrent writes) unless we make it atomic > (thus volatile) and then we're back to the original problem. (Note > that we don't have this race now because ->tid is set in the kernel > before clone returns in either the parent or the child.) > > I think the above make_volatile trick would solve the problem without > adding a new field. probably > Alternatively, instead of ->tid and ->exit_futex, we could have ->tid > (used by self) and ->volatile_tid (used by other threads acting on the > target). I would prefer something along that line, looks cleaner to me. Jens -- :: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS ::: :: ::::::::::::::: office Strasbourg : +33 368854536 :: :: :::::::::::::::::::::: gsm France : +33 651400183 :: :: ::::::::::::::: gsm international : +49 15737185122 :: :: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::