mailing list of musl libc
 help / color / mirror / code / Atom feed
* [PATCH 2/2] avoid taking _c_lock if we know it isn't necessary
@ 2014-08-27  9:57 Jens Gustedt
  2014-08-27 20:07 ` Rich Felker
  0 siblings, 1 reply; 4+ messages in thread
From: Jens Gustedt @ 2014-08-27  9:57 UTC (permalink / raw)
  To: musl

This avoids calls to __wait for both, the signaler and the waiter, by
eventually adding a __wake to the waiter. Since __wait may unschedule the
calling thread and a private __wake is just a not too expensive syscall,
this is considered being an advantage.

The additional __wake call could be avoided by tracing the fact that the
signaler is inside the "ref" count, making it a win-win.
---
 src/thread/pthread_cond_timedwait.c |   33 +++++++++++++++++++++++++++------
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/src/thread/pthread_cond_timedwait.c b/src/thread/pthread_cond_timedwait.c
index 52b1026..9790c84 100644
--- a/src/thread/pthread_cond_timedwait.c
+++ b/src/thread/pthread_cond_timedwait.c
@@ -25,7 +25,7 @@
 struct waiter {
 	struct waiter *prev, *next;
 	int state, barrier, mutex_ret;
-	int *notify;
+	int *volatile notify;
 	pthread_mutex_t *mutex;
 	pthread_cond_t *cond;
 	int shared;
@@ -42,6 +42,19 @@ static inline void lock(volatile int *l)
 	}
 }
 
+/* Avoid taking the lock if we know it isn't necessary. */
+static inline int lockRace(volatile int *l, int*volatile* notifier)
+{
+	int ret = 1;
+	if (!*notifier && (ret = a_cas(l, 0, 1))) {
+		a_cas(l, 1, 2);
+		do __wait(l, 0, 2, 1);
+		while (!*notifier && (ret = a_cas(l, 0, 2)));
+	}
+	return ret;
+}
+
+
 static inline void unlock(volatile int *l)
 {
 	if (a_swap(l, 0)==2)
@@ -86,21 +99,29 @@ static void unwait(void *arg)
 		pthread_cond_t *c = node->cond;
 		int * ref;
 
-		lock(&c->_c_lock);
+		int locked = lockRace(&c->_c_lock, &node->notify);
 		/* Our membership to the list may have changed while waiting for the lock. */
-		/* Ensure that the notify field is only read while we hold the lock. */
+		/* This field is written with an atomic op, so we don't need the lock to be taken. */
+		/* But it might also have changed while we were waiting, so reload it, again. */
 		ref = node->notify;
 
 		/* If there has been no race with a signaler, splice us out of the list. */
 		/* Otherwise, the signaler has already taken care of it. */
 		if (!ref) {
+			/* In this case locked is always 0 and the lock is acquired. */
 			if (c->_c_head == node) c->_c_head = node->next;
 			else if (node->prev) node->prev->next = node->next;
 			if (c->_c_tail == node) c->_c_tail = node->prev;
 			else if (node->next) node->next->prev = node->prev;
 		}
-		unlock(&c->_c_lock);
+		if (!locked) unlock(&c->_c_lock);
+
 
+		/* Since this leaving waiter might not have held the _c_lock, the following       */
+		/* __wake might be issued when the signaler is still inside its CS.               */
+		/* But if so, this avoids a __wait of the signaler, which more important.         */
+		/* This should not target any spurious wake up in any other thread:               */
+		/* ref is on the stack of the signaler, and that signaler is still alive.         */
 		if (ref) {
 			if (a_fetch_add(ref, -1)==1)
 				__wake(ref, 1, 1);
@@ -178,8 +199,8 @@ int __private_cond_signal(pthread_cond_t *c, int n)
 	lock(&c->_c_lock);
 	for (p=c->_c_tail; n && p; p=p->prev) {
 		if (a_cas(&p->state, WAITING, SIGNALED) != WAITING) {
-			ref++;
-			p->notify = &ref;
+			a_fetch_add(&ref, 1);
+			a_cas_p(&p->notify, 0, &ref);
 			/* Let the signaler do all work concerning consistency of the list. */
 			if (c->_c_head == p) c->_c_head = p->next;
 			else if (p->prev) p->prev->next = p->next;
-- 
1.7.10.4



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/2] avoid taking _c_lock if we know it isn't necessary
  2014-08-27  9:57 [PATCH 2/2] avoid taking _c_lock if we know it isn't necessary Jens Gustedt
@ 2014-08-27 20:07 ` Rich Felker
  2014-08-27 21:30   ` Jens Gustedt
  0 siblings, 1 reply; 4+ messages in thread
From: Rich Felker @ 2014-08-27 20:07 UTC (permalink / raw)
  To: musl

On Wed, Aug 27, 2014 at 11:57:47AM +0200, Jens Gustedt wrote:
> +		/* Since this leaving waiter might not have held the _c_lock, the following       */
> +		/* __wake might be issued when the signaler is still inside its CS.               */
> +		/* But if so, this avoids a __wait of the signaler, which more important.         */
> +		/* This should not target any spurious wake up in any other thread:               */
> +		/* ref is on the stack of the signaler, and that signaler is still alive.         */
>  		if (ref) {
>  			if (a_fetch_add(ref, -1)==1)
>  				__wake(ref, 1, 1);

Can't you avoid that with the design I suggested, having the signaler
use an extra ref count on itself, which it decrements right before
waiting?

Aside from that, based on my reading so far, these patches look like
they should work correctly. But since we both want to get C11 threads
done, let's put them aside for now (pending some testing for
measurable benefits). I also have some other potential changes to this
code based on my latest comments to:

http://austingroupbugs.net/view.php?id=609

regarding things they seem to deem as requirements, and which musl
does not satisfy, that are specified in non-normative text. So there's
likely to be more cond var work to do before the release still...

Rich


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/2] avoid taking _c_lock if we know it isn't necessary
  2014-08-27 20:07 ` Rich Felker
@ 2014-08-27 21:30   ` Jens Gustedt
  2014-08-27 21:48     ` Rich Felker
  0 siblings, 1 reply; 4+ messages in thread
From: Jens Gustedt @ 2014-08-27 21:30 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 3547 bytes --]

Am Mittwoch, den 27.08.2014, 16:07 -0400 schrieb Rich Felker:
> On Wed, Aug 27, 2014 at 11:57:47AM +0200, Jens Gustedt wrote:
> > +		/* Since this leaving waiter might not have held the _c_lock, the following       */
> > +		/* __wake might be issued when the signaler is still inside its CS.               */
> > +		/* But if so, this avoids a __wait of the signaler, which more important.         */
> > +		/* This should not target any spurious wake up in any other thread:               */
> > +		/* ref is on the stack of the signaler, and that signaler is still alive.         */
> >  		if (ref) {
> >  			if (a_fetch_add(ref, -1)==1)
> >  				__wake(ref, 1, 1);
> 
> Can't you avoid that with the design I suggested, having the signaler
> use an extra ref count on itself, which it decrements right before
> waiting?

Probably, but I have not completely thought this through. But if so
this would round up this series nicely with a third patch.

> Aside from that, based on my reading so far, these patches look like
> they should work correctly. But since we both want to get C11 threads
> done,

Exactly, that was my idea, freeze the main ideas in some short patches
and have them in the list archives for future use.

> let's put them aside for now (pending some testing for
> measurable benefits).


> I also have some other potential changes to this
> code based on my latest comments to:
> 
> http://austingroupbugs.net/view.php?id=609
> 
> regarding things they seem to deem as requirements, and which musl
> does not satisfy, that are specified in non-normative text. So there's
> likely to be more cond var work to do before the release still...

Ah, the cancelation stuff. As if condition variables wouldn't be
complicated enough already, without cancelation. We already have two
different ordered sequences of events, those on the cv and those on
the mutex. The discussion (and our implementation struggles) already
shows how difficult it is to get these two linear sequences ordered in
a convenient way. If you add a third set of events that are neither
ordered among themselves (cancelation to different threads are
asynchronous) nor with any of the two sequences, the semantics aren't
clear at all. (This is why I think that generally thread cancelation
is not a good idea, and why it is not very widely used. It contributes
for more than 50% to the complexity of the implementation of
pthreads.)

But with the current implementation, I would think that it basically
fulfills (or can be easily made to fulfill) the requirement that
cancelation would not be "consuming" a signal when some other thread
is available. We are marking threads as WAITING, LEAVING or SIGNALED
and only for WAITING, a thread can be consired "blocked" on the
cv. The transition between these is atomic, and so once a signaler
marked a thread SIGNALED, it is not blocked and has rightly consumed
the signal.

I didn't check, though, if timedwait returns 0 in that case the final
value is SIGNALED. If not, that would probably be a reasonable way to
go. Something like

if (SIGNALED && "mutex sucessfully acquired") return 0
else return the proper error code as before

Jens



-- 
:: INRIA Nancy Grand Est ::: AlGorille ::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/2] avoid taking _c_lock if we know it isn't necessary
  2014-08-27 21:30   ` Jens Gustedt
@ 2014-08-27 21:48     ` Rich Felker
  0 siblings, 0 replies; 4+ messages in thread
From: Rich Felker @ 2014-08-27 21:48 UTC (permalink / raw)
  To: musl

On Wed, Aug 27, 2014 at 11:30:26PM +0200, Jens Gustedt wrote:
> > I also have some other potential changes to this
> > code based on my latest comments to:
> > 
> > http://austingroupbugs.net/view.php?id=609
> > 
> > regarding things they seem to deem as requirements, and which musl
> > does not satisfy, that are specified in non-normative text. So there's
> > likely to be more cond var work to do before the release still...
> 
> Ah, the cancelation stuff. As if condition variables wouldn't be
> complicated enough already, without cancelation. We already have two
> different ordered sequences of events, those on the cv and those on
> the mutex. The discussion (and our implementation struggles) already
> shows how difficult it is to get these two linear sequences ordered in
> a convenient way. If you add a third set of events that are neither
> ordered among themselves (cancelation to different threads are
> asynchronous) nor with any of the two sequences, the semantics aren't
> clear at all. (This is why I think that generally thread cancelation
> is not a good idea, and why it is not very widely used. It contributes
> for more than 50% to the complexity of the implementation of
> pthreads.)
> 
> But with the current implementation, I would think that it basically
> fulfills (or can be easily made to fulfill) the requirement that
> cancelation would not be "consuming" a signal when some other thread
> is available. We are marking threads as WAITING, LEAVING or SIGNALED
> and only for WAITING, a thread can be consired "blocked" on the
> cv. The transition between these is atomic, and so once a signaler
> marked a thread SIGNALED, it is not blocked and has rightly consumed
> the signal.

Yet this transition to SIGNALED can happen when the waiter is already
executing the cancellation cleanup handler, before the a_cas there. In
this case, it has "consumed the signal", but __timedwait never
returns (the __syscall_cp in timedwait never returns).

I have a patch which solves this problem via setjmp in
pthread_cond_timedwait and longjmp in unwait when SIGNALED won the
a_cas race, but it has noticable performance cost (due to
unconditional setjmp on each call).

The ideal solution would be to implement the cancellation variant I've
been wanting to add for some time now: a cancellation mode where the
cancelled function returns with ECANCELED rather than acting on
cancellation immediately. This can be implemented by having the
cancellation signal handler not just check the program counter, but
also modify it, when this mode is in effect, so that returning from
the signal handler skips the syscall and instead returns -ECANCELED.

With that done, all of the nasty libc-internal use of cancellation
cleanup handlers could be replaced with temporarily changing the
cancellation mode and simply checking return values/errno for
ECANCELED. And it allows us to implement things like the cond var
behavior where deciding whether to act on cancellation or leave it
pending should take place in userspace after the syscall returns.

We can also expose this behavior as an experimental public interface
and propose it for standardization, but there are a lot of corner
cases I'd want to analyze in more detail before doing so to make sure
they're done right.

Rich


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2014-08-27 21:48 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-27  9:57 [PATCH 2/2] avoid taking _c_lock if we know it isn't necessary Jens Gustedt
2014-08-27 20:07 ` Rich Felker
2014-08-27 21:30   ` Jens Gustedt
2014-08-27 21:48     ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).