mailing list of musl libc
* New private cond var design
From: Rich Felker @ 2014-08-15 19:35 UTC
  To: musl

The current cv bug reported by Jens occurs when a cv is reused with a
new mutex before all the former waiters from the previous mutex have
woken up and decremented themselves from the waiter count. In that
case, a waking waiter can't know whether to decrement the in-cv waiter
count or the in-mutex waiter count, and these counts end up corrupted.

Jens' proposed solution tracked "instances" via dynamically allocated,
reference-counted objects. I finally think I have a solution which
avoids dynamic allocation: representing the "instance" as a
doubly-linked-list of automatic objects on the stack of each waiter.

The cv object itself needs a single pointer to the head of the current
instance. This pointer is set by the first waiter on an instance.
Subsequent waiters which arrive when it's already set can check that
the mutex argument is the same; if not, this is an error. The pointer
is cleared when the last (formal) waiter is removed by the signal or
broadcast operation.
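
As a rough sketch (hypothetical names, not actual musl code), the
objects involved might look like this in C:

    #include <pthread.h>

    /* One node per waiter, living on that waiter's stack for the
     * duration of the wait. */
    struct waiter {
            struct waiter *prev, *next;
            int state;              /* futex word (see option 2 below) */
            pthread_mutex_t *mutex; /* mutex this instance is bound to */
    };

    /* The cv itself then needs only a futex word and a pointer to
     * the head of the current instance's list: */
    struct cv {
            int seq;                /* sequence number waiters block on */
            struct waiter *head;    /* set by the first waiter; cleared
                                     * when the last waiter is removed */
    };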

Storing this list eliminates the need to keep a waiter count. The
length of the linked list itself is the number of waiters which need
to be moved to the mutex on broadcast. This requires an O(n) walk of
the list at broadcast time, but that's really a non-issue since the
kernel is already doing a much more expensive O(n) walk of the futex
waiter list anyway.
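
For illustration, a broadcast along these lines might look like the
following sketch (using the hypothetical structures above;
FUTEX_REQUEUE wakes the given number of waiters on the first futex
and requeues up to the next argument's worth onto the second, the
requeue count riding in the timeout slot of the raw syscall):

    #include <linux/futex.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Sketch only: count the instance's waiters with the O(n) walk,
     * then wake one and requeue the rest onto the mutex's futex word
     * in a single syscall. */
    static void broadcast_sketch(struct cv *c, int *mutex_lock)
    {
            int n = 0;
            for (struct waiter *w = c->head; w; w = w->next) n++;
            if (!n) return;
            c->head = 0;   /* the instance ends with the broadcast */
            c->seq++;      /* invalidate the value waiters block on */
            syscall(SYS_futex, &c->seq, FUTEX_REQUEUE_PRIVATE,
                    1, (long)(n-1), mutex_lock, 0);
    }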

The list also allows us to eliminate the sequence number wrapping
issue (sadly, only for private, non-process-shared cv's, since
process-shared can't use process-local memory like this) in one of two
ways:

Option 1: If the list elements store the sequence number their waiter
is waiting on, the signal/broadcast operations can choose a new
sequence number distinct from that of all waiters.
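
Concretely, the selection could look like this sketch (assuming each
waiter node additionally records, in a seq field, the value its owner
is blocked on):

    /* Pick a sequence value no current waiter is blocked on; this
     * terminates because the set of waiters is finite. */
    static int pick_new_seq(struct cv *c)
    {
            int seq = c->seq;
            struct waiter *w;
            do {
                    seq++;
                    for (w = c->head; w && w->seq != seq; w = w->next);
            } while (w);   /* nonzero w: seq already in use; retry */
            return seq;
    }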

Option 2: Each waiter can wait on a separate futex on its own stack,
so that sequence numbers are totally unneeded. This eliminates all
spurious wakes; signal can precisely control exactly which waiter
wakes (e.g. choosing the oldest), thereby waking only one waiter.
Broadcast then becomes much more expensive: the broadcasting thread
has to make one requeue syscall per waiter. But this still might be a
good design.
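
Signal under option 2 might then be something like this sketch
(unlink_node stands in for the obvious doubly-linked-list removal,
and a_store for an atomic store in the style of musl's internal
atomics):

    #define WAITING  0
    #define SIGNALED 1

    /* Wake exactly one chosen waiter via the futex on its own
     * stack; no other waiter is disturbed. */
    static void signal_sketch(struct cv *c)
    {
            struct waiter *w = c->head;  /* or walk to the oldest */
            if (!w) return;
            unlink_node(c, w);
            a_store(&w->state, SIGNALED);
            syscall(SYS_futex, &w->state, FUTEX_WAKE_PRIVATE,
                    1, 0, 0, 0);
    }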

Unless anyone sees problems with this design, I'll probably start
working on it soon. I think I'll try to commit the private-futex stuff
first, though, to avoid having to rebase it; fixing the cv issue in
1.0.x will not be a direct cherry-pick anyway, so there's no point in
putting off 1.0.x-incompatible changes pending the fix.

Rich



* Re: New private cond var design
From: Rich Felker @ 2014-08-15 20:28 UTC
  To: musl

On Fri, Aug 15, 2014 at 03:35:36PM -0400, Rich Felker wrote:
> The list also allows us to eliminate the sequence number wrapping
> issue (sadly, only for private, non-process-shared cv's, since
> process-shared can't use process-local memory like this) in one of two
> ways:
> 
> Option 1: If the list elements store the sequence number their waiter
> is waiting on, the signal/broadcast operations can choose a new
> sequence number distinct from that of all waiters.

I don't think this actually works to avoid the sequence number issue,
at least not as described above, since the sequence number still has
to remain unique even once there's a new instance. If we instead keep
a linked list that is not instance-specific but covers all instances,
we could use it to avoid sequence number reuse, but then some (light,
I think) extra accounting is needed to mark which part of the list
belongs to previous instances.
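
Purely as an illustration of what that accounting could be (nothing
above specifies it), each node could carry an instance id, so the
sequence-number scan covers the whole list while signal/broadcast
only move nodes belonging to the current instance:

    /* Illustrative only. */
    struct waiter {
            struct waiter *prev, *next;
            int seq;         /* value this waiter blocks on */
            unsigned inst;   /* id of the instance it joined under */
    };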

> Option 2: Each waiter can wait on a separate futex on its own stack,
> so that sequence numbers are totally unneeded. This eliminates all
> spurious wakes; signal can precisely control exactly which waiter
> wakes (e.g. choosing the oldest), thereby waking only one waiter.
> Broadcast then becomes much more expensive: the broadcasting thread
> has to make one requeue syscall per waiter. But this still might be a
> good design.

I think this design is more elegant, and probably performs better
when only signal is used, since spurious wakes are avoided entirely,
but somewhat worse when broadcast is used.

Rich



* Re: New private cond var design
From: Jens Gustedt @ 2014-08-17 13:44 UTC
  To: musl

Hi,

I definitely like this idea of the list items on the stack.

Some thoughts:

Threads can leave the list prematurely, so we need a doubly linked list.

Threads that are inside the wait function can be organized such that
they hold the mutex; this should avoid a lot of race trouble in
maintaining the list.

Broadcast can be done in O(1) by delegating the wake-up to the
threads on the list, one after another. They have to acquire the
mutex anyhow, one at a time.

Jens

:: INRIA Nancy Grand Est :: http://www.loria.fr/~gustedt/   ::
:: AlGorille ::::::::::::::: office Nancy : +33 383593090   ::
:: ICube :::::::::::::: office Strasbourg : +33 368854536   ::
:: ::::::::::::::::::::::::::: gsm France : +33 651400183   ::
:: :::::::::::::::::::: gsm international : +49 15737185122 ::






* Re: New private cond var design
From: Rich Felker @ 2014-08-17 15:44 UTC
  To: musl

On Sun, Aug 17, 2014 at 03:44:15PM +0200, Jens Gustedt wrote:
> Hi,
> I definitely like this idea of the list items on the stack.

Do you have a preference which variant? It's not clear to me whether
your comments are in regard to variant 1 or 2 or apply to both.

> Some thoughts:
> 
> Threads can leave the list prematurely, so we need a doubly linked list.

Yes, obviously.

> Threads that are inside the wait function can be organized such that
> they hold the mutex; this should avoid a lot of race trouble in
> maintaining the list.

If we have one list per 'instance' rather than a single list of all
waiters that haven't yet exited, then yes, the waiters are all using
the mutex and thereby excluding one another from access to the list.

But the signal/broadcast thread(s) still need a way to access the list
and cv object safely...

> Broadcast can be done in O(1) by delegating the wake-up to the
> threads on the list, one after another. They have to acquire the
> mutex anyhow, one at a time.

Is your thought to have each waiter, after acquiring the mutex in
preparation to return, requeue the next waiter (if any) onto the
mutex? I think this works and should be conforming. And with this
approach there seems to be no reason to prefer having all waiters use
the same futex rather than a futex local to their own stacks, so the
option 2 looks very attractive.

Rich



* Re: New private cond var design
From: Rich Felker @ 2014-08-18  4:04 UTC
  To: musl

On Fri, Aug 15, 2014 at 03:35:36PM -0400, Rich Felker wrote:
> Jens' proposed solution tracked "instances" via dynamically allocated,
> reference-counted objects. I finally think I have a solution which
> avoids dynamic allocation: representing the "instance" as a
> doubly-linked-list of automatic objects on the stack of each waiter.
> 
> [...]
> 
> Option 2: Each waiter can wait on a separate futex on its own stack,
> so that sequence numbers are totally unneeded. This eliminates all
> spurious wakes; signal can precisely control exactly which waiter
> wakes (e.g. choosing the oldest), thereby waking only one waiter.
> Broadcast then becomes much more expensive: the broadcasting thread
> has to make one requeue syscall per waiter. But this still might be a
> good design.

I ended up implementing option 2. It doesn't avoid O(n) operations in
broadcast as Jens seems to have had in mind (although it makes only
O(1) syscalls); avoiding the need for waiters to access the cv object
after waiting (which would require round-trip synchronization with
all waiters at broadcast time) necessitates writing to each waiter
node object. I have some improvements in mind to look into soon,
though, that might further reduce the time spent in broadcast, make
the wake order more predictable, and reduce code size.
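
In rough outline (a sketch of the shape just described, not the
committed code): broadcast marks every node so waiters never need to
touch the cv again, then issues a single wake, and the woken waiter
chains the rest via the requeue hand-off sketched earlier:

    static void broadcast_sketch2(struct cv *c)
    {
            struct waiter *first = c->head, *w;
            for (w = first; w; w = w->next)
                    a_store(&w->state, SIGNALED);  /* O(n) stores... */
            c->head = 0;
            if (first)                             /* ...O(1) syscalls */
                    syscall(SYS_futex, &first->state,
                            FUTEX_WAKE_PRIVATE, 1, 0, 0, 0);
    }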

So far I'm really happy with the performance -- it's much better than
before, sometimes even 2-3x. The benchmark by Timo Teräs that showed
the advantage of private futexes improves the most; my cvb2.c test
improves much less: roughly a 15% gain with a private mutex and ~33%
with a process-shared mutex.

I'd still like to look into whether it's possible to improve
process-shared cond vars (even at the expense of some performance) to
get rid of the synchronization in the destroy function (and thereby
make them unmapping-safe too).

Rich

