On Fri, Jun 29, 2018 at 10:27 AM, wrote:

> Thread local storage and starting threads up is largely a rather
> inconsequential implementation detail. When it comes down to actual
> parallel programming, of which I have done more than a little, the big
> thing is thread synchronization. It's rather hardware dependent. You can
> pretty much entirely wipe out any parallelism gains with a synchronization
> call that results in a context switch or even a serious cache impact. On
> one side you have machines like the Denelcor HEP, where every memory word
> had a pair of semaphores on it and the instructions could stall the process
> while waiting for them and the hardware would schedule the other threads.
> On the other hand you have your x86, with which you can do a few clever
> things with some atomic operations and inlined assembler, but a lot of the
> "standard" (boost, pthread, etc...) synchs will kill you.

C11 also defines thread APIs and atomic operations sufficient to do many
types of locking. POSIX layers its own threads on top as well, and those
could be implemented using the same atomics.

Warner
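
For illustration, here is a minimal sketch of the kind of lock the C11
atomics make possible: a test-and-set spinlock built on atomic_flag from
<stdatomic.h>. Every name in it (spinlock, spin_lock, spin_unlock,
spin_demo, MAX_THREADS) is hypothetical and not taken from any library or
from this thread. Threads are started with POSIX pthread_create only as an
assumption about what is most widely available; the lock itself uses
nothing beyond C11.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical spinlock type: a single C11 atomic_flag. */
typedef struct { atomic_flag held; } spinlock;

static void spin_lock(spinlock *l)
{
    /* test-and-set returns the previous value; keep trying until it was
       clear. Acquire ordering keeps the critical section after the lock. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;   /* busy-wait: the lock itself makes no system call */
}

static void spin_unlock(spinlock *l)
{
    /* Release ordering publishes the critical section's writes. */
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

enum { MAX_THREADS = 16 };          /* arbitrary limit for this sketch */

static spinlock lock = { ATOMIC_FLAG_INIT };
static long counter;                /* protected by lock */
static long iterations;             /* set before the threads start */

static void *worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < iterations; i++) {
        spin_lock(&lock);
        counter++;                  /* plain increment, guarded by the lock */
        spin_unlock(&lock);
    }
    return NULL;
}

/* Run nthreads workers, each incrementing the shared counter iters times,
   and return the final count. */
long spin_demo(int nthreads, long iters)
{
    pthread_t tid[MAX_THREADS];

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    counter = 0;
    iterations = iters;

    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);

    return counter;
}
```

If the lock is correct, spin_demo(4, 50000) should return 200000. A
spinlock like this avoids a context switch inside the lock call, but every
acquisition still bounces a cache line between cores, so it only pays off
for very short critical sections on otherwise idle cores.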