They are machines designed to run programs most people do not write!
> Also, NUMA effects are more important in practice on big multicores. Some
> of the off-chip delays are brutal.
yeah, we've been talking about this on #cat-v. even inside one CPU
package amd puts multiple dies nowadays, and the cross-die cpu cache
access latencies are approaching the same order of magnitude as a
main-memory access!
also, each die has what they call a ccx (cpu complex): a group of
4 cores that are connected to each other much faster than to the
cores in the other ccx.
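
if you want to see these groupings on a given box, here is a rough
sketch. it assumes Linux and its sysfs cache topology files (nothing
from this thread), and it assumes that cores sharing an L3 cache
correspond to one ccx, which is how the amd parts described above are
laid out as far as i know. the function name l3_groups is made up for
illustration.

```python
import glob
import os
from collections import defaultdict


def l3_groups(sysfs_root="/sys/devices/system/cpu"):
    """Group logical cpus by the L3 cache they share (Linux sysfs only).

    Returns a dict mapping each distinct shared_cpu_list string
    (for example "0-3") to the sorted cpu numbers that reported it.
    Assumption: one L3 sharing group corresponds to one ccx.
    """
    groups = defaultdict(list)
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cache", "index*", "level")
    for level_path in glob.glob(pattern):
        # Only look at level-3 caches; skip L1 and L2 entries.
        with open(level_path) as f:
            if f.read().strip() != "3":
                continue
        index_dir = os.path.dirname(level_path)
        # index_dir is <root>/cpuN/cache/indexM, so go up two levels for cpuN.
        cpu_name = os.path.basename(os.path.dirname(os.path.dirname(index_dir)))
        with open(os.path.join(index_dir, "shared_cpu_list")) as f:
            groups[f.read().strip()].append(int(cpu_name[3:]))
    return {shared: sorted(cpus) for shared, cpus in groups.items()}


if __name__ == "__main__":
    for shared, cpus in sorted(l3_groups().items(), key=lambda kv: kv[1]):
        print(f"L3 shared by cpus {shared}: {cpus}")
```

each line printed is one set of logical cpus behind a single L3. on a
machine without those sysfs files it simply prints nothing.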