On Thu, 9 Mar 2006, skaller wrote:
> Ahem. Now try that on an AMDx2 (dual core). The cost goes through
> the roof if one process has a thread on each core. Because each
> core has its own cache and both caches have to be flushed/
> synchronised. And those caches are BIG!

Love to. Wanna buy me the box? :-}

Seriously: my code is attached; if someone wants to run it on other
boxes and post the results, feel free. It's GNU C/x86 specific, as I'm
using GNU C's inline assembler and the rdtsc instruction to get
accurate cycle counts. (A sketch of the timing loop is at the end of
this mail, for anyone the attachment doesn't reach.)

As to the cache comment: the whole caches don't have to be flushed,
just the line the mutex is on. That makes it roughly the cost of a
cache miss, which is also a good approximation of the cost of getting
an uncontended lock.

> I have no idea if Linux, for example, running SMP kernel,
> is smart enough to know if a mutex is shared between two
> processing units or not: AFAIK Linux doesn't support
> interprocess mutex. Windows does. Be interesting to
> compare.

It doesn't look like the mutex code is even entering the kernel. I
don't think the Linux kernel even knows the mutex *exists*, let alone
which threads are competing for it. On x86, at least, lock
instructions are not privileged, so the whole fast path can stay in
user space (see the second sketch below).

> As mentioned before the only data I have at the moment
> is a two thread counter increment experiment on a dual
> CPU G5 box, where the speed up from 2 CPUs vs 1 was
> a factor of 15 .. times SLOWER.

If you're ping-ponging a cache line between two CPUs (and the AMD
dual cores count as two CPUs), then I can easily believe that. So?

Brian
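P.S. For anyone the attachment doesn't reach, here is a minimal
sketch of the timing loop: the same idea, not the attached code
itself, and the iteration count is arbitrary.

#include <stdio.h>
#include <pthread.h>

/* Read the time-stamp counter via rdtsc (GNU C, x86).
 * The low 32 bits land in eax, the high 32 in edx. */
static inline unsigned long long rdtsc(void)
{
    unsigned int lo, hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((unsigned long long) hi << 32) | lo;
}

int main(void)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    unsigned long long start, end;
    int i, iters = 1000000;

    start = rdtsc();
    for (i = 0; i < iters; i++) {
        pthread_mutex_lock(&m);
        pthread_mutex_unlock(&m);
    }
    end = rdtsc();

    printf("%llu cycles per uncontended lock/unlock pair\n",
           (end - start) / iters);
    return 0;
}

Build with gcc -O2 and -lpthread; the printed figure is the average
cost in cycles of one lock/unlock round trip with no contention.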
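To make the "not privileged" point concrete: you can build a working
lock entirely in user space out of the x86 xchg instruction. This is
an illustration, not how any particular pthreads implementation does
it; a real mutex would sleep rather than spin when contended.

/* Minimal test-and-set spinlock (GNU C, x86). *lock is 0 when
 * free, 1 when held. xchg with a memory operand is implicitly
 * locked on x86, and it is not a privileged instruction. */
static inline int test_and_set(volatile int *lock)
{
    int old = 1;
    __asm__ __volatile__ ("xchgl %0, %1"
                          : "+r" (old), "+m" (*lock)
                          :
                          : "memory");
    return old;             /* 0 means we acquired the lock */
}

static void spin_lock(volatile int *lock)
{
    while (test_and_set(lock))
        ;                   /* spin; a real mutex would block here */
}

static void spin_unlock(volatile int *lock)
{
    __asm__ __volatile__ ("" ::: "memory");  /* compiler barrier */
    *lock = 0;              /* plain store releases the lock on x86 */
}

No system call anywhere on that path, which is why the kernel never
hears about the mutex.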
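And here is the shape of the two-thread counter experiment skaller
describes; names, padding, and counts are illustrative. As written,
each thread gets its own counter on its own cache line. Change the
second thread's argument to 0 and both threads hammer the same line,
and the wall-clock time blows up.

#include <stdio.h>
#include <pthread.h>

#define ITERS 10000000L

/* One counter per 64-byte cache line, so the two counters never
 * share a line unless we make them. */
struct counter {
    volatile long n;
} __attribute__ ((aligned (64)));

static struct counter c[2];

static void *worker(void *arg)
{
    volatile long *n = &c[(long) arg].n;
    long i;
    for (i = 0; i < ITERS; i++)
        (*n)++;             /* not atomic; we only care about timing */
    return 0;
}

int main(void)
{
    pthread_t t0, t1;
    pthread_create(&t0, 0, worker, (void *) 0L);
    pthread_create(&t1, 0, worker, (void *) 1L); /* 0L => ping-pong */
    pthread_join(t0, 0);
    pthread_join(t1, 0);
    printf("%ld %ld\n", c[0].n, c[1].n);
    return 0;
}

Run it under "time" both ways. The shared-line version also loses
updates (the increment isn't atomic), but the point is the timing:
it is dramatically slower, which is the kind of slowdown skaller is
reporting.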