for those without much mwait experience, mwait is a kernel-only primitive
(per the instruction set reference) that pauses the processor until a change
is made in some range of memory. the size of the range is determined by
probing the h/w, but think cacheline. so the discussion of locking is
kernel-specific as well.
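to make the semantics concrete, here is a userspace sketch of the usual monitor/mwait protocol. monitor() and mwait() below are hypothetical stand-ins (the real instructions are privileged), but the re-check between arming the monitor and waiting is the important part, since the write can land in the window between the check and the wait:

```c
/* hypothetical stand-ins for the privileged MONITOR/MWAIT
 * instructions; a real mwait() would halt the processor until
 * the armed cacheline is written. */
static volatile int *armed;

static void
monitor(volatile int *p)
{
	armed = p;	/* arm the monitor on p's cacheline */
}

static void
mwait(void)
{
	/* stand-in: returns immediately instead of halting */
}

/* wait until *p changes from old; return the new value */
static int
waitfor(volatile int *p, int old)
{
	while(*p == old){
		monitor(p);
		if(*p != old)	/* re-check: the write may have beaten us here */
			break;
		mwait();
	}
	return *p;
}
```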
> On 17 Dec 2013, at 12:00, cinap_lenrek@felloff.net wrote:
> > i assume you mean that there is contention on the cacheline holding the runq lock?
> >
> > that's a surprising result. by dog pile lock you mean the runq spinlock, no?
> >
>
> I guess it depends on the HW, but I don't find that so surprising. You are looping
> sending messages to the coherency fabric, which gets congested as a result.
> I have seen that happen.
i don't think there's classical congestion, as i believe cachelines not involved in the
mwait would experience no holdup.
mwait() does improve things, and one would expect the latency to always be better
than spinning*. but as it turns out the current scheduler is pretty hopeless in its locking
anyway. simply grabbing the lock with lock rather than looping on canlock makes more sense to me.
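roughly what i mean; the names mirror the lock/canlock api, but this is a portable C11 sketch, not the kernel code. looping on canlock re-runs the atomic exchange and bounces the cacheline between cpus on every try, while lock spins read-only on its local copy of the line until the holder's store invalidates it:

```c
#include <stdatomic.h>

typedef struct {
	atomic_int key;
} Lock;

int
canlock(Lock *l)
{
	/* one tas attempt; each call is a write that bounces the line */
	return atomic_exchange(&l->key, 1) == 0;
}

void
lock(Lock *l)
{
	while(atomic_exchange(&l->key, 1) != 0)
		while(atomic_load(&l->key))
			;	/* read-only spin: wait without dirtying the line */
}

void
unlock(Lock *l)
{
	atomic_store(&l->key, 0);
}
```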
also, using ticket locks (see 9atom nix kernel) will provide automatic backoff within the lock.
ticket locks are a poor solution as they're not really scalable but they will scale to 24 cpus
much better than tas locks.
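a minimal ticket lock sketch in C11 atomics (the 9atom nix version differs in detail). the fetch-add hands out tickets in fifo order, so waiters are served in arrival order rather than whoever wins the tas race; that is the automatic backoff:

```c
#include <stdatomic.h>

typedef struct {
	atomic_uint next;	/* next ticket to hand out */
	atomic_uint owner;	/* ticket currently being served */
} Ticketlock;

void
tlock(Ticketlock *l)
{
	unsigned t;

	t = atomic_fetch_add(&l->next, 1);	/* take a ticket */
	while(atomic_load(&l->owner) != t)
		;	/* spin until our number is called */
}

void
tunlock(Ticketlock *l)
{
	atomic_fetch_add(&l->owner, 1);	/* serve the next waiter */
}
```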
mcs locks or some other queueing-style lock are clearly the long-term solution. but as
charles points out, one would really prefer to figure out a way to fit them to the lock
api. i have some test code, but testing queueing locks in user space is ... interesting.
i need a new approach.
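for reference, an mcs lock sketch in C11 atomics. each waiter spins on its own node, so there is no shared line to bounce; the catch, as above, is the extra node argument that the lock(Lock*) api has no place for (QNode is my name here, not anything in the kernel):

```c
#include <stddef.h>
#include <stdatomic.h>

typedef struct QNode QNode;
struct QNode {
	_Atomic(QNode*) next;
	atomic_int locked;
};

typedef struct {
	_Atomic(QNode*) tail;
} MCSLock;

void
mcslock(MCSLock *l, QNode *n)
{
	QNode *prev;

	atomic_store(&n->next, NULL);
	atomic_store(&n->locked, 1);
	prev = atomic_exchange(&l->tail, n);	/* join the queue */
	if(prev != NULL){
		atomic_store(&prev->next, n);	/* link behind predecessor */
		while(atomic_load(&n->locked))
			;	/* spin on our own node only */
	}
}

void
mcsunlock(MCSLock *l, QNode *n)
{
	QNode *next, *expect;

	next = atomic_load(&n->next);
	if(next == NULL){
		expect = n;
		if(atomic_compare_exchange_strong(&l->tail, &expect, NULL))
			return;	/* no waiters; queue emptied */
		while((next = atomic_load(&n->next)) == NULL)
			;	/* successor is still linking itself in */
	}
	atomic_store(&next->locked, 0);	/* hand the lock over */
}
```

one usual workaround for the api mismatch is a per-cpu node embedded somewhere the lock code can reach, which is exactly where fitting this behind lock(Lock*) gets interesting.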