* [PATCH] Use CAS instead of atomic swap to implement spinlock
@ 2015-04-14 22:44 Alexander Monakov
  2015-04-14 23:16 ` Rich Felker
  2015-04-19  0:24 ` [PATCH] Use CAS instead of atomic swap to implement spinlock Rich Felker
  0 siblings, 2 replies; 8+ messages in thread
From: Alexander Monakov @ 2015-04-14 22:44 UTC (permalink / raw)
To: musl; +Cc: Alexander Monakov

This should allow spinning without constantly dirtying cache lines holding
the spinlock value. On architectures without native atomic swap, musl
implements a_swap by looping around a_cas.
---
If I'm not mistaken this was also suggested by nsz on IRC.

 src/thread/pthread_spin_lock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/thread/pthread_spin_lock.c b/src/thread/pthread_spin_lock.c
index df575f0..dabcb31 100644
--- a/src/thread/pthread_spin_lock.c
+++ b/src/thread/pthread_spin_lock.c
@@ -2,6 +2,6 @@
 
 int pthread_spin_lock(pthread_spinlock_t *s)
 {
-	while (a_swap(s, 1)) a_spin();
+	while (a_cas(s, 0, 1)) a_spin();
 	return 0;
 }
-- 
2.1.3

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: [PATCH] Use CAS instead of atomic swap to implement spinlock
  2015-04-14 22:44 [PATCH] Use CAS instead of atomic swap to implement spinlock Alexander Monakov
@ 2015-04-14 23:16 ` Rich Felker
  2015-04-15  6:29   ` Concerns about dlclose() behaviour Brad Conroy
  2015-04-19  0:24 ` [PATCH] Use CAS instead of atomic swap to implement spinlock Rich Felker
  1 sibling, 1 reply; 8+ messages in thread
From: Rich Felker @ 2015-04-14 23:16 UTC (permalink / raw)
To: musl

On Wed, Apr 15, 2015 at 01:44:53AM +0300, Alexander Monakov wrote:
> This should allow spinning without constantly dirtying cache lines holding
> the spinlock value. On architectures without native atomic swap, musl
> implements a_swap by looping around a_cas.

It won't help on x86, where all lock-prefixed ops perform both a read
and a write even if they fail. But it should help on ll/sc-style archs,
and it shouldn't hurt on x86.

Rich

^ permalink raw reply	[flat|nested] 8+ messages in thread
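[Editor's note: the read-before-CAS pattern discussed in this thread (commonly called test-and-test-and-set) can be sketched with standard C11 atomics. The names spin_lock/spin_unlock are illustrative, not musl's internal a_* primitives:]

```c
#include <stdatomic.h>

/* Test-and-test-and-set sketch: spin on a plain load, which lets the
 * waiting core keep the cache line in shared state, and attempt the
 * read-modify-write CAS only once the lock looks free. */
static void spin_lock(atomic_int *s)
{
	for (;;) {
		/* Read-only wait: generates no coherence write traffic. */
		while (atomic_load_explicit(s, memory_order_relaxed))
			; /* a pause/yield hint (musl's a_spin) would go here */
		/* Lock looks free: try to claim it with a single CAS. */
		int expected = 0;
		if (atomic_compare_exchange_strong_explicit(s, &expected, 1,
				memory_order_acquire, memory_order_relaxed))
			return;
	}
}

static void spin_unlock(atomic_int *s)
{
	atomic_store_explicit(s, 0, memory_order_release);
}
```

[As Rich notes, on x86 the CAS still compiles to a locked cmpxchg that reads and writes regardless of success, so the benefit comes from the cheap load in the inner loop; on ll/sc architectures a failed comparison can bail out before the store-conditional.]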
* Concerns about dlclose() behaviour
  2015-04-14 23:16 ` Rich Felker
@ 2015-04-15  6:29 ` Brad Conroy
  2015-04-15  7:17   ` Rich Felker
  0 siblings, 1 reply; 8+ messages in thread
From: Brad Conroy @ 2015-04-15 6:29 UTC (permalink / raw)
To: musl

I recently checked out the netsurf-framebuffer package from Debian and
was bothered that all of the backends were compiled in, requiring a
bunch of extra dependencies and increasing the footprint.

I am currently patching it to make the backends modular, but ran into
a concern about musl compatibility.

libnsfb is set up in such a way that each backend has (static) functions
exported in a single struct. With normal dlclose() functionality, I could
use dlopen() to get a module and run its init function, then, if it fails,
just dlclose() it and try the next backend (all using the same variable
and function names, thus simplifying the implementation).

I am thinking this will still work on musl in a way similar to using
LD_PRELOAD, but wanted to verify before moving forward since many
nsfb users also use musl.

R,
Brad Conroy

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: Concerns about dlclose() behaviour
  2015-04-15  6:29 ` Concerns about dlclose() behaviour Brad Conroy
@ 2015-04-15  7:17 ` Rich Felker
  0 siblings, 0 replies; 8+ messages in thread
From: Rich Felker @ 2015-04-15 7:17 UTC (permalink / raw)
To: musl

On Tue, Apr 14, 2015 at 11:29:17PM -0700, Brad Conroy wrote:
> I recently checked out the netsurf-framebuffer package from Debian and
> was bothered that all of the backends were compiled in, requiring a
> bunch of extra dependencies and increasing the footprint.
> 
> I am currently patching it to make the backends modular, but ran into
> a concern about musl compatibility.
> 
> libnsfb is set up in such a way that each backend has (static) functions
> exported in a single struct. With normal dlclose() functionality, I could
> use dlopen() to get a module and run its init function, then, if it fails,
> just dlclose() it and try the next backend (all using the same variable
> and function names, thus simplifying the implementation).
> 
> I am thinking this will still work on musl in a way similar to using
> LD_PRELOAD, but wanted to verify before moving forward since many
> nsfb users also use musl.

If I understand correctly, what you're hoping is that dlclose will
remove the symbols loaded by dlopen so that you can replace them with
new definitions. This is not going to happen, but I'm not sure your
conclusion that you need it to happen is correct. If you're using
RTLD_LOCAL, the names aren't even global to begin with. How are you
calling/accessing them? By dlsym? Or is some other library getting
loaded and binding to the names in the global (RTLD_GLOBAL) namespace?

Rich

^ permalink raw reply	[flat|nested] 8+ messages in thread
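[Editor's note: one portable approach to the backend-probing scheme discussed above, which does not depend on dlclose() actually unloading anything, is to open each module RTLD_LOCAL and reach its operations table through dlsym(). A minimal sketch; the struct layout and the "nsfb_ops" symbol name are hypothetical, not libnsfb's real ABI:]

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical per-backend operations table; each backend module is
 * assumed to export one symbol named "nsfb_ops" of this type. */
struct nsfb_ops {
	int (*init)(void);
	void (*fini)(void);
};

/* Try each candidate module in turn. RTLD_LOCAL keeps identically
 * named symbols in different modules from clashing, so whether
 * dlclose() really unloads a failed backend is irrelevant. */
static struct nsfb_ops *probe_backends(const char *const *paths)
{
	for (; *paths; paths++) {
		void *h = dlopen(*paths, RTLD_NOW | RTLD_LOCAL);
		if (!h) continue;
		struct nsfb_ops *ops = dlsym(h, "nsfb_ops");
		if (ops && ops->init() == 0)
			return ops; /* keep the handle open; backend stays loaded */
		dlclose(h); /* may or may not unload; its symbols are never touched again */
	}
	return NULL;
}
```

[This sidesteps the question entirely: no two backends' names ever need to coexist in, or be removed from, the global namespace.]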
* Re: [PATCH] Use CAS instead of atomic swap to implement spinlock
  2015-04-14 22:44 [PATCH] Use CAS instead of atomic swap to implement spinlock Alexander Monakov
  2015-04-14 23:16 ` Rich Felker
@ 2015-04-19  0:24 ` Rich Felker
  2015-04-19  5:50   ` Alexander Monakov
  1 sibling, 1 reply; 8+ messages in thread
From: Rich Felker @ 2015-04-19 0:24 UTC (permalink / raw)
To: musl

On Wed, Apr 15, 2015 at 01:44:53AM +0300, Alexander Monakov wrote:
> This should allow spinning without constantly dirtying cache lines holding
> the spinlock value. On architectures without native atomic swap, musl
> implements a_swap by looping around a_cas.
> ---
> If I'm not mistaken this was also suggested by nsz on IRC.
> 
>  src/thread/pthread_spin_lock.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/thread/pthread_spin_lock.c b/src/thread/pthread_spin_lock.c
> index df575f0..dabcb31 100644
> --- a/src/thread/pthread_spin_lock.c
> +++ b/src/thread/pthread_spin_lock.c
> @@ -2,6 +2,6 @@
>  
>  int pthread_spin_lock(pthread_spinlock_t *s)
>  {
> -	while (a_swap(s, 1)) a_spin();
> +	while (a_cas(s, 0, 1)) a_spin();
>  	return 0;
>  }

Would it perhaps be better to do something like this?

	while (*(volatile int *)s || a_cas(s, 0, 1)) a_spin();

Rich

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: [PATCH] Use CAS instead of atomic swap to implement spinlock
  2015-04-19  0:24 ` [PATCH] Use CAS instead of atomic swap to implement spinlock Rich Felker
@ 2015-04-19  5:50 ` Alexander Monakov
  2015-04-19  6:01   ` Rich Felker
  0 siblings, 1 reply; 8+ messages in thread
From: Alexander Monakov @ 2015-04-19 5:50 UTC (permalink / raw)
To: musl

On Sat, 18 Apr 2015, Rich Felker wrote:
> On Wed, Apr 15, 2015 at 01:44:53AM +0300, Alexander Monakov wrote:
> > This should allow spinning without constantly dirtying cache lines holding
> > the spinlock value. On architectures without native atomic swap, musl
> > implements a_swap by looping around a_cas.
> > ---
> > If I'm not mistaken this was also suggested by nsz on IRC.
> > 
> >  src/thread/pthread_spin_lock.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/src/thread/pthread_spin_lock.c b/src/thread/pthread_spin_lock.c
> > index df575f0..dabcb31 100644
> > --- a/src/thread/pthread_spin_lock.c
> > +++ b/src/thread/pthread_spin_lock.c
> > @@ -2,6 +2,6 @@
> >  
> >  int pthread_spin_lock(pthread_spinlock_t *s)
> >  {
> > -	while (a_swap(s, 1)) a_spin();
> > +	while (a_cas(s, 0, 1)) a_spin();
> >  	return 0;
> >  }
> 
> Would it perhaps be better to do something like this?
> 
> 	while (*(volatile int *)s || a_cas(s, 0, 1)) a_spin();

I think so, yes. Is the cast required, or is it possible to change the
pthread_spinlock_t typedef to 'volatile int'?

Thanks.
Alexander

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: [PATCH] Use CAS instead of atomic swap to implement spinlock
  2015-04-19  5:50 ` Alexander Monakov
@ 2015-04-19  6:01 ` Rich Felker
  2015-04-19 12:46   ` Szabolcs Nagy
  0 siblings, 1 reply; 8+ messages in thread
From: Rich Felker @ 2015-04-19 6:01 UTC (permalink / raw)
To: musl

On Sun, Apr 19, 2015 at 08:50:18AM +0300, Alexander Monakov wrote:
> On Sat, 18 Apr 2015, Rich Felker wrote:
> > On Wed, Apr 15, 2015 at 01:44:53AM +0300, Alexander Monakov wrote:
> > > This should allow spinning without constantly dirtying cache lines holding
> > > the spinlock value. On architectures without native atomic swap, musl
> > > implements a_swap by looping around a_cas.
> > > ---
> > > If I'm not mistaken this was also suggested by nsz on IRC.
> > > 
> > >  src/thread/pthread_spin_lock.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/src/thread/pthread_spin_lock.c b/src/thread/pthread_spin_lock.c
> > > index df575f0..dabcb31 100644
> > > --- a/src/thread/pthread_spin_lock.c
> > > +++ b/src/thread/pthread_spin_lock.c
> > > @@ -2,6 +2,6 @@
> > >  
> > >  int pthread_spin_lock(pthread_spinlock_t *s)
> > >  {
> > > -	while (a_swap(s, 1)) a_spin();
> > > +	while (a_cas(s, 0, 1)) a_spin();
> > >  	return 0;
> > >  }
> > 
> > Would it perhaps be better to do something like this?
> > 
> > 	while (*(volatile int *)s || a_cas(s, 0, 1)) a_spin();
> 
> I think so, yes. Is the cast required, or is it possible to change the
> pthread_spinlock_t typedef to 'volatile int'?

For C++ ABI purposes, I think switching to volatile int would be a
different type. :(

I wouldn't really be opposed to changing it for C and just having the
ABI-compat type used when __cplusplus is defined. We already do that
for pthread_t.

Rich

^ permalink raw reply	[flat|nested] 8+ messages in thread
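[Editor's note: the conditional-typedef arrangement Rich alludes to could look roughly like this in a libc header. This is a sketch of the idea only, not musl's actual alltypes.h:]

```c
/* C translation units see the volatile-qualified type, so a plain read
 * of the lock word cannot be cached in a register across the spin loop;
 * C++ keeps the old unqualified int so that the name mangling of
 * functions taking pthread_spinlock_t* is unchanged (cv-qualifiers on
 * the pointed-to type are part of the mangled signature). */
#ifdef __cplusplus
typedef int pthread_spinlock_t;
#else
typedef volatile int pthread_spinlock_t;
#endif
```

[With this in place, pthread_spin_lock could spin on a plain `*s` read with no cast, since the lvalue is already volatile-qualified in C.]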
* Re: [PATCH] Use CAS instead of atomic swap to implement spinlock
  2015-04-19  6:01 ` Rich Felker
@ 2015-04-19 12:46 ` Szabolcs Nagy
  0 siblings, 0 replies; 8+ messages in thread
From: Szabolcs Nagy @ 2015-04-19 12:46 UTC (permalink / raw)
To: musl

[-- Attachment #1: Type: text/plain, Size: 981 bytes --]

* Rich Felker <dalias@libc.org> [2015-04-19 02:01:32 -0400]:
> On Sun, Apr 19, 2015 at 08:50:18AM +0300, Alexander Monakov wrote:
> > On Sat, 18 Apr 2015, Rich Felker wrote:
> > > 
> > > 	while (*(volatile int *)s || a_cas(s, 0, 1)) a_spin();
> > 
> > I think so, yes. Is the cast required, or is it possible to change the
> > pthread_spinlock_t typedef to 'volatile int'?
> 
> For C++ ABI purposes, I think switching to volatile int would be a
> different type. :(
> 
> I wouldn't really be opposed to changing it for C and just having the
> ABI-compat type used when __cplusplus is defined. We already do that
> for pthread_t.

btw pthread_spinlock_t is volatile int in glibc, i didnt catch this abi
diff earlier because it only matters when used as a pointer

now changed my abi checks for all type T to do

void x_T(T x, T* ptr, size(*y)[sizeof(T)], align(*z)[__alignof__(T)]){}

but i found no other case with qualifier differences
(updated x86_64 abi diffs are attached)

[-- Attachment #2: abi.x86_64.diff --]
[-- Type: text/x-diff, Size: 4079 bytes --]

--- abi.x86_64.glibc
+++ abi.x86_64.musl
@@ -2 +2 @@
-x_CODE(_code, _code*, size (*) [16], align (*) [8])
+x_CODE(CODE, CODE*, size (*) [16], align (*) [8])
@@ -69 +69 @@
-x___jmp_buf(long*, long (*) [8], size (*) [64], align (*) [8])
+x___jmp_buf(unsigned long*, unsigned long (*) [8], size (*) [64], align (*) [8])
@@ -97 +97 @@
-x_cmsghdr(cmsghdr, cmsghdr*, size (*) [16], align (*) [8])
+x_cmsghdr(cmsghdr, cmsghdr*, size (*) [16], align (*) [4])
@@ -100 +100 @@
-x_crypt_data(crypt_data, crypt_data*, size (*) [131232], align (*) [8])
+x_crypt_data(crypt_data, crypt_data*, size (*) [260], align (*) [4])
@@ -121,2 +121,2 @@
-x_ether_header(ether_header, ether_header*, size (*) [14], align (*) [1])
-x_ethhdr(ethhdr, ethhdr*, size (*) [14], align (*) [1])
+x_ether_header(ether_header, ether_header*, size (*) [14], align (*) [2])
+x_ethhdr(ethhdr, ethhdr*, size (*) [14], align (*) [2])
@@ -127 +127 @@
-x_fd_mask(long, long*, size (*) [8], align (*) [8])
+x_fd_mask(unsigned long, unsigned long*, size (*) [8], align (*) [8])
@@ -135 +135 @@
-x_fpregset_t(_libc_fpstate*, _libc_fpstate**, size (*) [8], align (*) [8])
+x_fpregset_t(_fpstate*, _fpstate**, size (*) [8], align (*) [8])
@@ -180,2 +180,2 @@
-x_int_fast16_t(long, long*, size (*) [8], align (*) [8])
-x_int_fast32_t(long, long*, size (*) [8], align (*) [8])
+x_int_fast16_t(int, int*, size (*) [4], align (*) [4])
+x_int_fast32_t(int, int*, size (*) [4], align (*) [4])
@@ -218 +218 @@
-x_lastlog(lastlog, lastlog*, size (*) [292], align (*) [4])
+x_lastlog(lastlog, lastlog*, size (*) [296], align (*) [8])
@@ -281 +281 @@
-x_ntptimeval(ntptimeval, ntptimeval*, size (*) [72], align (*) [8])
+x_ntptimeval(ntptimeval, ntptimeval*, size (*) [32], align (*) [8])
@@ -307,2 +307,2 @@
-x_pthread_rwlockattr_t(pthread_rwlockattr_t, pthread_rwlockattr_t*, size (*) [8], align (*) [8])
-x_pthread_spinlock_t(int, int volatile*, size (*) [4], align (*) [4])
+x_pthread_rwlockattr_t(pthread_rwlockattr_t, pthread_rwlockattr_t*, size (*) [8], align (*) [4])
+x_pthread_spinlock_t(int, int*, size (*) [4], align (*) [4])
@@ -312 +312 @@
-x_quad_t(long, long*, size (*) [8], align (*) [8])
+x_quad_t(long long, long long*, size (*) [8], align (*) [8])
@@ -317,2 +317,2 @@
-x_regmatch_t(regmatch_t, regmatch_t*, size (*) [8], align (*) [4])
-x_regoff_t(int, int*, size (*) [4], align (*) [4])
+x_regmatch_t(regmatch_t, regmatch_t*, size (*) [16], align (*) [8])
+x_regoff_t(long, long*, size (*) [8], align (*) [8])
@@ -321 +321 @@
-x_rlim_t(unsigned long, unsigned long*, size (*) [8], align (*) [8])
+x_rlim_t(unsigned long long, unsigned long long*, size (*) [8], align (*) [8])
@@ -327 +327 @@
-x_rusage(rusage, rusage*, size (*) [144], align (*) [8])
+x_rusage(rusage, rusage*, size (*) [272], align (*) [8])
@@ -329,2 +329,2 @@
-x_sched_param(sched_param, sched_param*, size (*) [4], align (*) [4])
-x_sem_t(sem_t, sem_t*, size (*) [32], align (*) [8])
+x_sched_param(sched_param, sched_param*, size (*) [48], align (*) [8])
+x_sem_t(sem_t, sem_t*, size (*) [32], align (*) [4])
@@ -383 +383 @@
-x_sysinfo(sysinfo, sysinfo*, size (*) [112], align (*) [8])
+x_sysinfo(sysinfo, sysinfo*, size (*) [368], align (*) [8])
@@ -385 +385 @@
-x_tcp_info(tcp_info, tcp_info*, size (*) [104], align (*) [4])
+x_tcp_info(tcp_info, tcp_info*, size (*) [120], align (*) [8])
@@ -389 +389 @@
-x_tftphdr(tftphdr, tftphdr*, size (*) [5], align (*) [1])
+x_tftphdr(tftphdr, tftphdr*, size (*) [6], align (*) [2])
@@ -407 +407 @@
-x_u_quad_t(unsigned long, unsigned long*, size (*) [8], align (*) [8])
+x_u_quad_t(unsigned long long, unsigned long long*, size (*) [8], align (*) [8])
@@ -418,2 +418,2 @@
-x_uint_fast16_t(unsigned long, unsigned long*, size (*) [8], align (*) [8])
-x_uint_fast32_t(unsigned long, unsigned long*, size (*) [8], align (*) [8])
+x_uint_fast16_t(unsigned int, unsigned int*, size (*) [4], align (*) [4])
+x_uint_fast32_t(unsigned int, unsigned int*, size (*) [4], align (*) [4])
@@ -435 +435 @@
-x_utmpx(utmpx, utmpx*, size (*) [384], align (*) [4])
+x_utmpx(utmpx, utmpx*, size (*) [400], align (*) [8])

^ permalink raw reply	[flat|nested] 8+ messages in thread
end of thread, other threads:[~2015-04-19 12:46 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-14 22:44 [PATCH] Use CAS instead of atomic swap to implement spinlock Alexander Monakov
2015-04-14 23:16 ` Rich Felker
2015-04-15  6:29   ` Concerns about dlclose() behaviour Brad Conroy
2015-04-15  7:17     ` Rich Felker
2015-04-19  0:24 ` [PATCH] Use CAS instead of atomic swap to implement spinlock Rich Felker
2015-04-19  5:50   ` Alexander Monakov
2015-04-19  6:01     ` Rich Felker
2015-04-19 12:46       ` Szabolcs Nagy