* [PATCH] Use CAS instead of atomic swap to implement spinlock
@ 2015-04-14 22:44 Alexander Monakov
  2015-04-14 23:16 ` Rich Felker
  2015-04-19  0:24 ` [PATCH] Use CAS instead of atomic swap to implement spinlock Rich Felker
  0 siblings, 2 replies; 8+ messages in thread
From: Alexander Monakov @ 2015-04-14 22:44 UTC (permalink / raw)
To: musl; +Cc: Alexander Monakov

This should allow spinning without constantly dirtying cache lines holding
the spinlock value. On architectures without native atomic swap, musl
implements a_swap by looping around a_cas.
---
If I'm not mistaken this was also suggested by nsz on IRC.

 src/thread/pthread_spin_lock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/thread/pthread_spin_lock.c b/src/thread/pthread_spin_lock.c
index df575f0..dabcb31 100644
--- a/src/thread/pthread_spin_lock.c
+++ b/src/thread/pthread_spin_lock.c
@@ -2,6 +2,6 @@
 
 int pthread_spin_lock(pthread_spinlock_t *s)
 {
-	while (a_swap(s, 1)) a_spin();
+	while (a_cas(s, 0, 1)) a_spin();
 	return 0;
 }
-- 
2.1.3

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: [PATCH] Use CAS instead of atomic swap to implement spinlock
  2015-04-14 22:44 [PATCH] Use CAS instead of atomic swap to implement spinlock Alexander Monakov
@ 2015-04-14 23:16 ` Rich Felker
  2015-04-15  6:29   ` Concerns about dlclose() behaviour Brad Conroy
  2015-04-19  0:24 ` [PATCH] Use CAS instead of atomic swap to implement spinlock Rich Felker
  1 sibling, 1 reply; 8+ messages in thread
From: Rich Felker @ 2015-04-14 23:16 UTC (permalink / raw)
To: musl

On Wed, Apr 15, 2015 at 01:44:53AM +0300, Alexander Monakov wrote:
> This should allow spinning without constantly dirtying cache lines holding
> the spinlock value. On architectures without native atomic swap, musl
> implements a_swap by looping around a_cas.

It won't help on x86, where all lock-prefixed ops perform both a read
and a write even if they fail. But it should help on ll/sc-style archs,
and it shouldn't hurt on x86.

Rich

^ permalink raw reply	[flat|nested] 8+ messages in thread
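[Editor's note: the read-before-CAS pattern discussed in this thread (commonly called test-and-test-and-set) can be sketched with standard C11 atomics. The names spin_lock/spin_unlock are illustrative, not musl's internal a_* primitives:]

```c
#include <stdatomic.h>

/* Test-and-test-and-set sketch: spin on a plain load, which lets the
 * waiting core keep the cache line in shared state, and attempt the
 * read-modify-write CAS only once the lock looks free. */
static void spin_lock(atomic_int *s)
{
	for (;;) {
		/* Read-only wait: generates no coherence write traffic. */
		while (atomic_load_explicit(s, memory_order_relaxed))
			; /* a pause/yield hint (musl's a_spin) would go here */
		/* Lock looks free: try to claim it with a single CAS. */
		int expected = 0;
		if (atomic_compare_exchange_strong_explicit(s, &expected, 1,
				memory_order_acquire, memory_order_relaxed))
			return;
	}
}

static void spin_unlock(atomic_int *s)
{
	atomic_store_explicit(s, 0, memory_order_release);
}
```

[As Rich notes, on x86 the CAS still compiles to a locked cmpxchg that reads and writes regardless of success, so the benefit comes from the cheap load in the inner loop; on ll/sc architectures a failed comparison can bail out before the store-conditional.]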
* Concerns about dlclose() behaviour
  2015-04-14 23:16 ` Rich Felker
@ 2015-04-15  6:29 ` Brad Conroy
  2015-04-15  7:17   ` Rich Felker
  0 siblings, 1 reply; 8+ messages in thread
From: Brad Conroy @ 2015-04-15 6:29 UTC (permalink / raw)
To: musl

I recently checked out the netsurf-framebuffer package from Debian and
was bothered that all of the backends were compiled in, requiring a
bunch of extra dependencies and increasing the footprint.

I am currently patching it to make the backends modular, but ran into
a concern about musl compatibility.

libnsfb is set up in such a way that each backend has (static) functions
exported in a single struct. With normal dlclose() functionality, I could
use dlopen() to get a module and run its init function, then, if it fails,
just dlclose() it and try the next backend (all using the same variable
and function names, thus simplifying the implementation).

I am thinking this will still work on musl in a way similar to using
LD_PRELOAD, but wanted to verify before moving forward since many
nsfb users also use musl.

R,
Brad Conroy

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: Concerns about dlclose() behaviour
  2015-04-15  6:29 ` Concerns about dlclose() behaviour Brad Conroy
@ 2015-04-15  7:17 ` Rich Felker
  0 siblings, 0 replies; 8+ messages in thread
From: Rich Felker @ 2015-04-15 7:17 UTC (permalink / raw)
To: musl

On Tue, Apr 14, 2015 at 11:29:17PM -0700, Brad Conroy wrote:
> I recently checked out the netsurf-framebuffer package from Debian and
> was bothered that all of the backends were compiled in, requiring a
> bunch of extra dependencies and increasing the footprint.
> 
> I am currently patching it to make the backends modular, but ran into
> a concern about musl compatibility.
> 
> libnsfb is set up in such a way that each backend has (static) functions
> exported in a single struct. With normal dlclose() functionality, I could
> use dlopen() to get a module and run its init function, then, if it fails,
> just dlclose() it and try the next backend (all using the same variable
> and function names, thus simplifying the implementation).
> 
> I am thinking this will still work on musl in a way similar to using
> LD_PRELOAD, but wanted to verify before moving forward since many
> nsfb users also use musl.

If I understand correctly, what you're hoping is that dlclose will
remove the symbols loaded by dlopen so that you can replace them with
new definitions. This is not going to happen, but I'm not sure your
conclusion that you need it to happen is correct. If you're using
RTLD_LOCAL, the names aren't even global to begin with. How are you
calling/accessing them? By dlsym? Or is some other library getting
loaded and binding to the names in the global (RTLD_GLOBAL) namespace?

Rich

^ permalink raw reply	[flat|nested] 8+ messages in thread
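[Editor's note: one portable approach to the backend-probing scheme discussed above, which does not depend on dlclose() actually unloading anything, is to open each module RTLD_LOCAL and reach its operations table through dlsym(). A minimal sketch; the struct layout and the "nsfb_ops" symbol name are hypothetical, not libnsfb's real ABI:]

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical per-backend operations table; each backend module is
 * assumed to export one symbol named "nsfb_ops" of this type. */
struct nsfb_ops {
	int (*init)(void);
	void (*fini)(void);
};

/* Try each candidate module in turn. RTLD_LOCAL keeps identically
 * named symbols in different modules from clashing, so whether
 * dlclose() really unloads a failed backend is irrelevant. */
static struct nsfb_ops *probe_backends(const char *const *paths)
{
	for (; *paths; paths++) {
		void *h = dlopen(*paths, RTLD_NOW | RTLD_LOCAL);
		if (!h) continue;
		struct nsfb_ops *ops = dlsym(h, "nsfb_ops");
		if (ops && ops->init() == 0)
			return ops; /* keep the handle open; backend stays loaded */
		dlclose(h); /* may or may not unload; its symbols are never touched again */
	}
	return NULL;
}
```

[This sidesteps the question entirely: no two backends' names ever need to coexist in, or be removed from, the global namespace.]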
* Re: [PATCH] Use CAS instead of atomic swap to implement spinlock
  2015-04-14 22:44 [PATCH] Use CAS instead of atomic swap to implement spinlock Alexander Monakov
  2015-04-14 23:16 ` Rich Felker
@ 2015-04-19  0:24 ` Rich Felker
  2015-04-19  5:50   ` Alexander Monakov
  1 sibling, 1 reply; 8+ messages in thread
From: Rich Felker @ 2015-04-19 0:24 UTC (permalink / raw)
To: musl

On Wed, Apr 15, 2015 at 01:44:53AM +0300, Alexander Monakov wrote:
> This should allow spinning without constantly dirtying cache lines holding
> the spinlock value. On architectures without native atomic swap, musl
> implements a_swap by looping around a_cas.
> ---
> If I'm not mistaken this was also suggested by nsz on IRC.
> 
>  src/thread/pthread_spin_lock.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/thread/pthread_spin_lock.c b/src/thread/pthread_spin_lock.c
> index df575f0..dabcb31 100644
> --- a/src/thread/pthread_spin_lock.c
> +++ b/src/thread/pthread_spin_lock.c
> @@ -2,6 +2,6 @@
>  
>  int pthread_spin_lock(pthread_spinlock_t *s)
>  {
> -	while (a_swap(s, 1)) a_spin();
> +	while (a_cas(s, 0, 1)) a_spin();
>  	return 0;
>  }

Would it perhaps be better to do something like this?

	while (*(volatile int *)s || a_cas(s, 0, 1)) a_spin();

Rich

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: [PATCH] Use CAS instead of atomic swap to implement spinlock
  2015-04-19  0:24 ` [PATCH] Use CAS instead of atomic swap to implement spinlock Rich Felker
@ 2015-04-19  5:50 ` Alexander Monakov
  2015-04-19  6:01   ` Rich Felker
  0 siblings, 1 reply; 8+ messages in thread
From: Alexander Monakov @ 2015-04-19 5:50 UTC (permalink / raw)
To: musl

On Sat, 18 Apr 2015, Rich Felker wrote:
> On Wed, Apr 15, 2015 at 01:44:53AM +0300, Alexander Monakov wrote:
> > This should allow spinning without constantly dirtying cache lines holding
> > the spinlock value. On architectures without native atomic swap, musl
> > implements a_swap by looping around a_cas.
> > ---
> > If I'm not mistaken this was also suggested by nsz on IRC.
> > 
> >  src/thread/pthread_spin_lock.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/src/thread/pthread_spin_lock.c b/src/thread/pthread_spin_lock.c
> > index df575f0..dabcb31 100644
> > --- a/src/thread/pthread_spin_lock.c
> > +++ b/src/thread/pthread_spin_lock.c
> > @@ -2,6 +2,6 @@
> >  
> >  int pthread_spin_lock(pthread_spinlock_t *s)
> >  {
> > -	while (a_swap(s, 1)) a_spin();
> > +	while (a_cas(s, 0, 1)) a_spin();
> >  	return 0;
> >  }
> 
> Would it perhaps be better to do something like this?
> 
> 	while (*(volatile int *)s || a_cas(s, 0, 1)) a_spin();

I think so, yes. Is the cast required, or is it possible to change the
pthread_spinlock_t typedef to 'volatile int'?

Thanks.
Alexander

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: [PATCH] Use CAS instead of atomic swap to implement spinlock
  2015-04-19  5:50 ` Alexander Monakov
@ 2015-04-19  6:01 ` Rich Felker
  2015-04-19 12:46   ` Szabolcs Nagy
  0 siblings, 1 reply; 8+ messages in thread
From: Rich Felker @ 2015-04-19 6:01 UTC (permalink / raw)
To: musl

On Sun, Apr 19, 2015 at 08:50:18AM +0300, Alexander Monakov wrote:
> On Sat, 18 Apr 2015, Rich Felker wrote:
> > On Wed, Apr 15, 2015 at 01:44:53AM +0300, Alexander Monakov wrote:
> > > This should allow spinning without constantly dirtying cache lines holding
> > > the spinlock value. On architectures without native atomic swap, musl
> > > implements a_swap by looping around a_cas.
> > > ---
> > > If I'm not mistaken this was also suggested by nsz on IRC.
> > > 
> > >  src/thread/pthread_spin_lock.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/src/thread/pthread_spin_lock.c b/src/thread/pthread_spin_lock.c
> > > index df575f0..dabcb31 100644
> > > --- a/src/thread/pthread_spin_lock.c
> > > +++ b/src/thread/pthread_spin_lock.c
> > > @@ -2,6 +2,6 @@
> > >  
> > >  int pthread_spin_lock(pthread_spinlock_t *s)
> > >  {
> > > -	while (a_swap(s, 1)) a_spin();
> > > +	while (a_cas(s, 0, 1)) a_spin();
> > >  	return 0;
> > >  }
> > 
> > Would it perhaps be better to do something like this?
> > 
> > 	while (*(volatile int *)s || a_cas(s, 0, 1)) a_spin();
> 
> I think so, yes. Is the cast required, or is it possible to change the
> pthread_spinlock_t typedef to 'volatile int'?

For C++ ABI purposes, I think switching to volatile int would be a
different type. :(

I wouldn't really be opposed to changing it for C and just having the
ABI-compat type used when __cplusplus is defined. We already do that
for pthread_t.

Rich

^ permalink raw reply	[flat|nested] 8+ messages in thread
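[Editor's note: the conditional-typedef arrangement Rich alludes to could look roughly like this in a libc header. This is a sketch of the idea only, not musl's actual alltypes.h:]

```c
/* C translation units see the volatile-qualified type, so a plain read
 * of the lock word cannot be cached in a register across the spin loop;
 * C++ keeps the old unqualified int so that the name mangling of
 * functions taking pthread_spinlock_t* is unchanged (cv-qualifiers on
 * the pointed-to type are part of the mangled signature). */
#ifdef __cplusplus
typedef int pthread_spinlock_t;
#else
typedef volatile int pthread_spinlock_t;
#endif
```

[With this in place, pthread_spin_lock could spin on a plain `*s` read with no cast, since the lvalue is already volatile-qualified in C.]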
* Re: [PATCH] Use CAS instead of atomic swap to implement spinlock
  2015-04-19  6:01 ` Rich Felker
@ 2015-04-19 12:46 ` Szabolcs Nagy
  0 siblings, 0 replies; 8+ messages in thread
From: Szabolcs Nagy @ 2015-04-19 12:46 UTC (permalink / raw)
To: musl

[-- Attachment #1: Type: text/plain, Size: 981 bytes --]

* Rich Felker <dalias@libc.org> [2015-04-19 02:01:32 -0400]:
> On Sun, Apr 19, 2015 at 08:50:18AM +0300, Alexander Monakov wrote:
> > On Sat, 18 Apr 2015, Rich Felker wrote:
> > > 
> > > 	while (*(volatile int *)s || a_cas(s, 0, 1)) a_spin();
> > 
> > I think so, yes. Is the cast required, or is it possible to change the
> > pthread_spinlock_t typedef to 'volatile int'?
> 
> For C++ ABI purposes, I think switching to volatile int would be a
> different type. :(
> 
> I wouldn't really be opposed to changing it for C and just having the
> ABI-compat type used when __cplusplus is defined. We already do that
> for pthread_t.

btw pthread_spinlock_t is volatile int in glibc, i didnt catch this abi
diff earlier because it only matters when used as a pointer

now changed my abi checks for all type T to do

void x_T(T x, T* ptr, size(*y)[sizeof(T)], align(*z)[__alignof__(T)]){}

but i found no other case with qualifier differences
(updated x86_64 abi diffs are attached)

[-- Attachment #2: abi.x86_64.diff --]
[-- Type: text/x-diff, Size: 4079 bytes --]

--- abi.x86_64.glibc
+++ abi.x86_64.musl
@@ -2 +2 @@
-x_CODE(_code, _code*, size (*) [16], align (*) [8])
+x_CODE(CODE, CODE*, size (*) [16], align (*) [8])
@@ -69 +69 @@
-x___jmp_buf(long*, long (*) [8], size (*) [64], align (*) [8])
+x___jmp_buf(unsigned long*, unsigned long (*) [8], size (*) [64], align (*) [8])
@@ -97 +97 @@
-x_cmsghdr(cmsghdr, cmsghdr*, size (*) [16], align (*) [8])
+x_cmsghdr(cmsghdr, cmsghdr*, size (*) [16], align (*) [4])
@@ -100 +100 @@
-x_crypt_data(crypt_data, crypt_data*, size (*) [131232], align (*) [8])
+x_crypt_data(crypt_data, crypt_data*, size (*) [260], align (*) [4])
@@ -121,2 +121,2 @@
-x_ether_header(ether_header, ether_header*, size (*) [14], align (*) [1])
-x_ethhdr(ethhdr, ethhdr*, size (*) [14], align (*) [1])
+x_ether_header(ether_header, ether_header*, size (*) [14], align (*) [2])
+x_ethhdr(ethhdr, ethhdr*, size (*) [14], align (*) [2])
@@ -127 +127 @@
-x_fd_mask(long, long*, size (*) [8], align (*) [8])
+x_fd_mask(unsigned long, unsigned long*, size (*) [8], align (*) [8])
@@ -135 +135 @@
-x_fpregset_t(_libc_fpstate*, _libc_fpstate**, size (*) [8], align (*) [8])
+x_fpregset_t(_fpstate*, _fpstate**, size (*) [8], align (*) [8])
@@ -180,2 +180,2 @@
-x_int_fast16_t(long, long*, size (*) [8], align (*) [8])
-x_int_fast32_t(long, long*, size (*) [8], align (*) [8])
+x_int_fast16_t(int, int*, size (*) [4], align (*) [4])
+x_int_fast32_t(int, int*, size (*) [4], align (*) [4])
@@ -218 +218 @@
-x_lastlog(lastlog, lastlog*, size (*) [292], align (*) [4])
+x_lastlog(lastlog, lastlog*, size (*) [296], align (*) [8])
@@ -281 +281 @@
-x_ntptimeval(ntptimeval, ntptimeval*, size (*) [72], align (*) [8])
+x_ntptimeval(ntptimeval, ntptimeval*, size (*) [32], align (*) [8])
@@ -307,2 +307,2 @@
-x_pthread_rwlockattr_t(pthread_rwlockattr_t, pthread_rwlockattr_t*, size (*) [8], align (*) [8])
-x_pthread_spinlock_t(int, int volatile*, size (*) [4], align (*) [4])
+x_pthread_rwlockattr_t(pthread_rwlockattr_t, pthread_rwlockattr_t*, size (*) [8], align (*) [4])
+x_pthread_spinlock_t(int, int*, size (*) [4], align (*) [4])
@@ -312 +312 @@
-x_quad_t(long, long*, size (*) [8], align (*) [8])
+x_quad_t(long long, long long*, size (*) [8], align (*) [8])
@@ -317,2 +317,2 @@
-x_regmatch_t(regmatch_t, regmatch_t*, size (*) [8], align (*) [4])
-x_regoff_t(int, int*, size (*) [4], align (*) [4])
+x_regmatch_t(regmatch_t, regmatch_t*, size (*) [16], align (*) [8])
+x_regoff_t(long, long*, size (*) [8], align (*) [8])
@@ -321 +321 @@
-x_rlim_t(unsigned long, unsigned long*, size (*) [8], align (*) [8])
+x_rlim_t(unsigned long long, unsigned long long*, size (*) [8], align (*) [8])
@@ -327 +327 @@
-x_rusage(rusage, rusage*, size (*) [144], align (*) [8])
+x_rusage(rusage, rusage*, size (*) [272], align (*) [8])
@@ -329,2 +329,2 @@
-x_sched_param(sched_param, sched_param*, size (*) [4], align (*) [4])
-x_sem_t(sem_t, sem_t*, size (*) [32], align (*) [8])
+x_sched_param(sched_param, sched_param*, size (*) [48], align (*) [8])
+x_sem_t(sem_t, sem_t*, size (*) [32], align (*) [4])
@@ -383 +383 @@
-x_sysinfo(sysinfo, sysinfo*, size (*) [112], align (*) [8])
+x_sysinfo(sysinfo, sysinfo*, size (*) [368], align (*) [8])
@@ -385 +385 @@
-x_tcp_info(tcp_info, tcp_info*, size (*) [104], align (*) [4])
+x_tcp_info(tcp_info, tcp_info*, size (*) [120], align (*) [8])
@@ -389 +389 @@
-x_tftphdr(tftphdr, tftphdr*, size (*) [5], align (*) [1])
+x_tftphdr(tftphdr, tftphdr*, size (*) [6], align (*) [2])
@@ -407 +407 @@
-x_u_quad_t(unsigned long, unsigned long*, size (*) [8], align (*) [8])
+x_u_quad_t(unsigned long long, unsigned long long*, size (*) [8], align (*) [8])
@@ -418,2 +418,2 @@
-x_uint_fast16_t(unsigned long, unsigned long*, size (*) [8], align (*) [8])
-x_uint_fast32_t(unsigned long, unsigned long*, size (*) [8], align (*) [8])
+x_uint_fast16_t(unsigned int, unsigned int*, size (*) [4], align (*) [4])
+x_uint_fast32_t(unsigned int, unsigned int*, size (*) [4], align (*) [4])
@@ -435 +435 @@
-x_utmpx(utmpx, utmpx*, size (*) [384], align (*) [4])
+x_utmpx(utmpx, utmpx*, size (*) [400], align (*) [8])

^ permalink raw reply	[flat|nested] 8+ messages in thread
end of thread, other threads:[~2015-04-19 12:46 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-14 22:44 [PATCH] Use CAS instead of atomic swap to implement spinlock Alexander Monakov
2015-04-14 23:16 ` Rich Felker
2015-04-15  6:29   ` Concerns about dlclose() behaviour Brad Conroy
2015-04-15  7:17     ` Rich Felker
2015-04-19  0:24 ` [PATCH] Use CAS instead of atomic swap to implement spinlock Rich Felker
2015-04-19  5:50   ` Alexander Monakov
2015-04-19  6:01     ` Rich Felker
2015-04-19 12:46       ` Szabolcs Nagy