* atomic.h cleanup @ 2016-01-10 12:21 Markus Wichmann 2016-01-10 16:57 ` Rich Felker 0 siblings, 1 reply; 14+ messages in thread From: Markus Wichmann @ 2016-01-10 12:21 UTC (permalink / raw) To: musl Hi all, The development roadmap on the musl wiki lists the ominous point "atomic.h cleanup" for 1.2.0. I assume you mean a sort of simplification and unification. I noticed that for the RISC arch's there are rather liberal amounts of inline assembly for the atomic operations. And I have always been taught, that as soon as you start copying code, you are probably doing it wrong. So first thing I'd do: add a new file, let's call it atomic_debruijn.h. It contains an implementation of a_ctz() and a_ctz_64() based on the DeBruijn number. That way, all the architectures currently implementing a_ctz() in this manner can just include that file, and a lot of duplicate code goes out the window. Second thing: We can reduce the inline assembly footprint and the amount of duplicate code by adding a new file, let's call it atomic_llsc.h, that implements a_cas(), a_cas_p(), a_swap(), a_fetch_add(), a_inc(), a_dec(), a_and() and a_or() in terms of new functions that would have to be defined, namely: static inline void a_presync(void) - execute any barrier needed before attempting an atomic operation, like "dmb ish" for arm, or "sync" for ppc. static inline void a_postsync(void) - execute any barrier needed afterwards, like "isync" for PPC, or, again, "dmb ish" for ARM. static inline int a_ll(int*) - perform an LL on the given pointer and return the value there. This would be "lwarx" for PPC, or "ldrex" for ARM. static inline int a_sc(int*, int) - perform an SC on the given pointer with the given value. Return zero iff that failed. static inline void* a_ll_p(void*) - same as a_ll(), but with machine words instead of int, if that's a difference. static inline int a_sc_p(void*, void*) - same as a_sc(), but with machine words. With these function we can implement e.g. CAS as: static inline int a_cas(volatile int *p, int t, int s) { int v; do { v = a_ll(p); if (v != t) break; } while (!a_sc(p, s)); return v; } Add some #ifdefs to only activate the pointer variations if they're needed (i.e. if we're on 64 bits) and Bob's your uncle. The only hardship would be in implementing a_sc(), but that can be solved by using a feature often referenced but rarely seen in the wild: ASM goto. How that works is that, if the arch's SC instruction returns success or failure in a flag and the CPU can jump on that flag (unlike, say, microblaze, which can only jump on comparisons), then you encode the jump in the assembly snippet but let the compiler handle the targets for you. Since in all cases, we want to jump on failure, that's what the assembly should do, so for instance for PowerPC: static inline int a_sc(volatile int* p, int x) { __asm__ goto ("stwcx. %0, 0, %1\n\tbne- %l2" : : "r"(x), "r"(p) : "cc", "memory" : fail); return 1; fail: return 0; } I already tried the compiler results for such a design, but I never tried running it for lack of hardware. Anyway, this code makes it possible for the compiler to redirect the conditional jump on failure to the top of the loop in a_cas(). Since the return value isn't used otherwise, the values 1 and 0 never appear in the generated assembly. What do you say to this design? Ciao, Markus ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: atomic.h cleanup 2016-01-10 12:21 atomic.h cleanup Markus Wichmann @ 2016-01-10 16:57 ` Rich Felker 2016-01-10 17:35 ` Markus Wichmann ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Rich Felker @ 2016-01-10 16:57 UTC (permalink / raw) To: musl On Sun, Jan 10, 2016 at 01:21:39PM +0100, Markus Wichmann wrote: > Hi all, > > The development roadmap on the musl wiki lists the ominous point > "atomic.h cleanup" for 1.2.0. > > I assume you mean a sort of simplification and unification. I noticed > that for the RISC arch's there are rather liberal amounts of inline > assembly for the atomic operations. And I have always been taught, that > as soon as you start copying code, you are probably doing it wrong. > > So first thing I'd do: add a new file, let's call it atomic_debruijn.h. > It contains an implementation of a_ctz() and a_ctz_64() based on the > DeBruijn number. That way, all the architectures currently implementing > a_ctz() in this manner can just include that file, and a lot of > duplicate code goes out the window. > > Second thing: We can reduce the inline assembly footprint and the amount > of duplicate code by adding a new file, let's call it atomic_llsc.h, > that implements a_cas(), a_cas_p(), a_swap(), a_fetch_add(), a_inc(), > a_dec(), a_and() and a_or() in terms of new functions that would have to > be defined, namely: > > static inline void a_presync(void) - execute any barrier needed before > attempting an atomic operation, like "dmb ish" for arm, or "sync" for > ppc. > > static inline void a_postsync(void) - execute any barrier needed > afterwards, like "isync" for PPC, or, again, "dmb ish" for ARM. > > static inline int a_ll(int*) - perform an LL on the given pointer and > return the value there. This would be "lwarx" for PPC, or "ldrex" for > ARM. > > static inline int a_sc(int*, int) - perform an SC on the given pointer > with the given value. Return zero iff that failed. > > static inline void* a_ll_p(void*) - same as a_ll(), but with machine > words instead of int, if that's a difference. > > static inline int a_sc_p(void*, void*) - same as a_sc(), but with > machine words. > > > With these function we can implement e.g. CAS as: > > static inline int a_cas(volatile int *p, int t, int s) > { > int v; > do { > v = a_ll(p); > if (v != t) > break; > } while (!a_sc(p, s)); > return v; > } > > Add some #ifdefs to only activate the pointer variations if they're > needed (i.e. if we're on 64 bits) and Bob's your uncle. > > The only hardship would be in implementing a_sc(), but that can be > solved by using a feature often referenced but rarely seen in the wild: > ASM goto. How that works is that, if the arch's SC instruction returns > success or failure in a flag and the CPU can jump on that flag (unlike, > say, microblaze, which can only jump on comparisons), then you encode > the jump in the assembly snippet but let the compiler handle the targets > for you. Since in all cases, we want to jump on failure, that's what the > assembly should do, so for instance for PowerPC: > > static inline int a_sc(volatile int* p, int x) > { > __asm__ goto ("stwcx. %0, 0, %1\n\tbne- %l2" : : "r"(x), "r"(p) : "cc", "memory" : fail); > return 1; > fail: > return 0; > } > > I already tried the compiler results for such a design, but I never > tried running it for lack of hardware. > > Anyway, this code makes it possible for the compiler to redirect the > conditional jump on failure to the top of the loop in a_cas(). Since the > return value isn't used otherwise, the values 1 and 0 never appear in > the generated assembly. > > What do you say to this design? Have you read this thread? :) http://www.openwall.com/lists/musl/2015/05/20/1 I thought at one point it was linked from the wiki but maybe it got lost. Basically I have this done already outside of musl as an experiment, but there are minor details that were holding it up. One annoyance is that, on some archs, success/failure of "sc" comes via a condition flag which the C caller can't easily branch on, so there's an extra conversion to a boolean result inside the asm and extra conversion back to a test/branch outside the asm. In practice we probably don't care. One other issue is that risc-v seems to guarantee, at least on some implementations, stronger forward-progress guarantees than a normal ll/sc as long as the ll/sc are in order, within a few instruction slots of each other, with no branches between. Such conditions cannot be met without putting them in the same asm block, so we might need to do a custom version for risc-v if we want to take advantage of the stronger properties. Anyway, at this point the main obstacle to finishing the task is doing the actual merging and testing, not any new coding, I think. Rich ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: atomic.h cleanup 2016-01-10 16:57 ` Rich Felker @ 2016-01-10 17:35 ` Markus Wichmann 2016-01-10 17:50 ` Alexander Monakov 2016-01-10 17:37 ` Markus Wichmann 2016-01-22 0:09 ` Rich Felker 2 siblings, 1 reply; 14+ messages in thread From: Markus Wichmann @ 2016-01-10 17:35 UTC (permalink / raw) To: musl On Sun, Jan 10, 2016 at 11:57:18AM -0500, Rich Felker wrote: > On Sun, Jan 10, 2016 at 01:21:39PM +0100, Markus Wichmann wrote: >> [...] >> What do you say to this design? > > Have you read this thread? :) > > http://www.openwall.com/lists/musl/2015/05/20/1 > > I thought at one point it was linked from the wiki but maybe it got > lost. > Well, at least it's not on the Roadmap page. There should probably be a wiki page for such open questions. Then people like me wouldn't have to ask (or do extended searches in the mailing list archives, which at the moment is only possible through your favorite search engine, BTW, which is why I've not been doing that so often). > Basically I have this done already outside of musl as an experiment, > but there are minor details that were holding it up. One annoyance is > that, on some archs, success/failure of "sc" comes via a condition > flag which the C caller can't easily branch on, so there's an extra > conversion to a boolean result inside the asm and extra conversion > back to a test/branch outside the asm. In practice we probably don't > care. > Yes, and my original message showed how to deal with that: | static inline int a_sc(volatile int* p, int x) | { | __asm__ goto ("stwcx. %0, 0, %1\n\tbne- %l2" : : "r"(x), "r"(p) : "cc", "memory" : fail); | return 1; | fail: | return 0; | } I tested the assembler output from both gcc and clang and it looks alright to me (testcase attached). As I said, we typically want to branch if SC fails, so that's how these snippets should be written (not "branch if it succeeds", because AFAICS the compiler can't rewrite the ASM snippets). > One other issue is that risc-v seems to guarantee, at least on some > implementations, stronger forward-progress guarantees than a normal > ll/sc as long as the ll/sc are in order, within a few instruction > slots of each other, with no branches between. Such conditions cannot > be met without putting them in the same asm block, so we might need to > do a custom version for risc-v if we want to take advantage of the > stronger properties. > Oh goodie, another term to research... As far as I can see, RISC-V is another architecture. Since the scheme I'm proposing is strictly opt-in, we can cross that bridge once we come to it. For instance, this scheme is insufficient to support SuperH and its three different versions of a_cas(). So I'd just leave that be for the moment. Ciao, Markus ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: atomic.h cleanup 2016-01-10 17:35 ` Markus Wichmann @ 2016-01-10 17:50 ` Alexander Monakov 2016-01-11 16:35 ` Markus Wichmann 0 siblings, 1 reply; 14+ messages in thread From: Alexander Monakov @ 2016-01-10 17:50 UTC (permalink / raw) To: musl On Sun, 10 Jan 2016, Markus Wichmann wrote: > | static inline int a_sc(volatile int* p, int x) > | { > | __asm__ goto ("stwcx. %0, 0, %1\n\tbne- %l2" : : "r"(x), "r"(p) : "cc", "memory" : fail); > | return 1; > | fail: > | return 0; > | } > > I tested the assembler output from both gcc and clang and it looks > alright to me (testcase attached). "asm goto" is an extension that appeared in gcc 4.5 and is not supported in clang (I get "error: expected '(' after 'asm'" with clang 3.7). I don't know why you claim it works with clang. Corresponding llvm bugs (no plans to add support soon): https://llvm.org/bugs/show_bug.cgi?id=9295 https://llvm.org/bugs/show_bug.cgi?id=14406 Alexander ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: atomic.h cleanup 2016-01-10 17:50 ` Alexander Monakov @ 2016-01-11 16:35 ` Markus Wichmann 2016-01-11 17:12 ` Jens Gustedt 0 siblings, 1 reply; 14+ messages in thread From: Markus Wichmann @ 2016-01-11 16:35 UTC (permalink / raw) To: musl On Sun, Jan 10, 2016 at 08:50:28PM +0300, Alexander Monakov wrote: > "asm goto" is an extension that appeared in gcc 4.5 and is not supported in > clang (I get "error: expected '(' after 'asm'" with clang 3.7). I don't know > why you claim it works with clang. > That's very simple: Because I misremembered. I had tested a lot of things against gcc and clang and thought that file had been among them. But it wasn't. And I didn't test it because building a cross-gcc had been hard enough, so I didn't have the energy to try a cross-clang on that day. So I'm sorry, but it was an honest mistake. > Corresponding llvm bugs (no plans to add support soon): > https://llvm.org/bugs/show_bug.cgi?id=9295 > https://llvm.org/bugs/show_bug.cgi?id=14406 > Really? OK, so it's either suboptimal code for everyone or compiler-specific better code. Why can't we have nice things? OTOH, maybe we simply shouldn't write synchronisation primitives ourselves and instead use the ones provided by GCC (and let other compilers suck on a salty sausage, if they don't support those primitives). Ciao, Markus ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: atomic.h cleanup 2016-01-11 16:35 ` Markus Wichmann @ 2016-01-11 17:12 ` Jens Gustedt 2016-01-11 19:03 ` Szabolcs Nagy 0 siblings, 1 reply; 14+ messages in thread From: Jens Gustedt @ 2016-01-11 17:12 UTC (permalink / raw) To: musl [-- Attachment #1: Type: text/plain, Size: 1513 bytes --] Hello, Am Montag, den 11.01.2016, 17:35 +0100 schrieb Markus Wichmann: > Really? OK, so it's either suboptimal code for everyone or > compiler-specific better code. Why can't we have nice things? nice things would be portable atomics, no? > OTOH, maybe we simply shouldn't write synchronisation primitives > ourselves and instead use the ones provided by GCC (and let other > compilers suck on a salty sausage, if they don't support those > primitives). I think on the long run we should use C11 atomics and leave the dirty work to the compiler writers. To my experience they do good work with that now, the assembler they produce looks nice. My stdatomic library is sitting there, ready to integrate into musl. It solves the problem of backwards compatibility for all compilers that that implement the __sync builtins. (gcc and clang with very old version numbers.) Last time I looked, all usages but one of atomic operations in musl are clean. If an atomic operation is used for a data a some point, atomic operations are used in all other places. So moving to _Atomic(int) would be a option. (Basically this would be `volatile int*` => `_Atomic(int)`, IIRC). Jens -- :: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS ::: :: ::::::::::::::: office Strasbourg : +33 368854536 :: :: :::::::::::::::::::::: gsm France : +33 651400183 :: :: ::::::::::::::: gsm international : +49 15737185122 :: :: http://icube-icps.unistra.fr/index.php/Jens_Gustedt :: [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: atomic.h cleanup 2016-01-11 17:12 ` Jens Gustedt @ 2016-01-11 19:03 ` Szabolcs Nagy 2016-01-11 20:56 ` Jens Gustedt 0 siblings, 1 reply; 14+ messages in thread From: Szabolcs Nagy @ 2016-01-11 19:03 UTC (permalink / raw) To: musl * Jens Gustedt <jens.gustedt@inria.fr> [2016-01-11 18:12:29 +0100]: > Am Montag, den 11.01.2016, 17:35 +0100 schrieb Markus Wichmann: > > OTOH, maybe we simply shouldn't write synchronisation primitives > > ourselves and instead use the ones provided by GCC (and let other > > compilers suck on a salty sausage, if they don't support those > > primitives). > > I think on the long run we should use C11 atomics and leave the dirty > work to the compiler writers. To my experience they do good work with > that now, the assembler they produce looks nice. > yes but old compilers had various bugs on various targets. > My stdatomic library is sitting there, ready to integrate into > musl. It solves the problem of backwards compatibility for all > compilers that that implement the __sync builtins. (gcc and clang with > very old version numbers.) > i think simpler compilers like pcc, cparser, tcc dont implement that. if musl moves to compiler builtins then i'd like to have a possibility to compile atomic primitives as a separate tu > Last time I looked, all usages but one of atomic operations in musl > are clean. If an atomic operation is used for a data a some point, > atomic operations are used in all other places. So moving to > _Atomic(int) would be a option. (Basically this would be `volatile > int*` => `_Atomic(int)`, IIRC). > pthread_once_t and pthread_spinlock_t are publicly visibles type (without volatile and _Atomic) i dont think we can fix those without abi change. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: atomic.h cleanup 2016-01-11 19:03 ` Szabolcs Nagy @ 2016-01-11 20:56 ` Jens Gustedt 2016-01-14 22:12 ` Rich Felker 0 siblings, 1 reply; 14+ messages in thread From: Jens Gustedt @ 2016-01-11 20:56 UTC (permalink / raw) To: musl [-- Attachment #1: Type: text/plain, Size: 2846 bytes --] Am Montag, den 11.01.2016, 20:03 +0100 schrieb Szabolcs Nagy: > * Jens Gustedt <jens.gustedt@inria.fr> [2016-01-11 18:12:29 +0100]: > > Am Montag, den 11.01.2016, 17:35 +0100 schrieb Markus Wichmann: > > > OTOH, maybe we simply shouldn't write synchronisation primitives > > > ourselves and instead use the ones provided by GCC (and let other > > > compilers suck on a salty sausage, if they don't support those > > > primitives). > > > > I think on the long run we should use C11 atomics and leave the dirty > > work to the compiler writers. To my experience they do good work with > > that now, the assembler they produce looks nice. > > > > yes but old compilers had various bugs on various targets. > > > My stdatomic library is sitting there, ready to integrate into > > musl. It solves the problem of backwards compatibility for all > > compilers that that implement the __sync builtins. (gcc and clang with > > very old version numbers.) > > > > i think simpler compilers like pcc, cparser, tcc > dont implement that. > > if musl moves to compiler builtins then i'd > like to have a possibility to compile atomic > primitives as a separate tu In a sense, stdatomic has that already. It also implements the atomic operations as fallback functions, for the case that the compiler isn't able to synthesise the operation. But you are right, support for those simpler compilers then would mean that we'd have to maintain stubs, at least for the most commonly used 4 byte operations. > > Last time I looked, all usages but one of atomic operations in musl > > are clean. If an atomic operation is used for a data a some point, > > atomic operations are used in all other places. So moving to > > _Atomic(int) would be a option. (Basically this would be `volatile > > int*` => `_Atomic(int)`, IIRC). oops I meant `volatile int*` => `_Atomic(int)*` > pthread_once_t and pthread_spinlock_t are > publicly visibles type (without volatile and > _Atomic) > > i dont think we can fix those without abi > change. This is really a question what ABI means in this case. The width, alignment and representation of the `int` would stay the same, we would just internally (to the library implementation) interpret it as _Atomic(int). Also it seems that we do such a re-interpretation already with `volatile`. One interpretation of the standard says that the object itself has to be `volatile`, just casting a pointer to `volatile int*` doesn't inhibit optimizations. Jens -- :: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS ::: :: ::::::::::::::: office Strasbourg : +33 368854536 :: :: :::::::::::::::::::::: gsm France : +33 651400183 :: :: ::::::::::::::: gsm international : +49 15737185122 :: :: http://icube-icps.unistra.fr/index.php/Jens_Gustedt :: [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: atomic.h cleanup 2016-01-11 20:56 ` Jens Gustedt @ 2016-01-14 22:12 ` Rich Felker 2016-01-14 22:37 ` Jens Gustedt 0 siblings, 1 reply; 14+ messages in thread From: Rich Felker @ 2016-01-14 22:12 UTC (permalink / raw) To: musl On Mon, Jan 11, 2016 at 09:56:50PM +0100, Jens Gustedt wrote: > Am Montag, den 11.01.2016, 20:03 +0100 schrieb Szabolcs Nagy: > > * Jens Gustedt <jens.gustedt@inria.fr> [2016-01-11 18:12:29 +0100]: > > > Am Montag, den 11.01.2016, 17:35 +0100 schrieb Markus Wichmann: > > > > OTOH, maybe we simply shouldn't write synchronisation primitives > > > > ourselves and instead use the ones provided by GCC (and let other > > > > compilers suck on a salty sausage, if they don't support those > > > > primitives). > > > > > > I think on the long run we should use C11 atomics and leave the dirty > > > work to the compiler writers. To my experience they do good work with > > > that now, the assembler they produce looks nice. > > > > > > > yes but old compilers had various bugs on various targets. > > > > > My stdatomic library is sitting there, ready to integrate into > > > musl. It solves the problem of backwards compatibility for all > > > compilers that that implement the __sync builtins. (gcc and clang with > > > very old version numbers.) > > > > > > > i think simpler compilers like pcc, cparser, tcc > > dont implement that. > > > > if musl moves to compiler builtins then i'd > > like to have a possibility to compile atomic > > primitives as a separate tu > > In a sense, stdatomic has that already. It also implements the atomic > operations as fallback functions, for the case that the compiler isn't > able to synthesise the operation. > > But you are right, support for those simpler compilers then would mean > that we'd have to maintain stubs, at least for the most commonly used > 4 byte operations. There are already multiple reasons we don't use the compiler's atomics, either directly or indirectly via stdatomic.h. They're not supported in some old/alternative compilers, they generate highly suboptimal code even on modern compilers for some important archs (e.g. ARM), and they fail to properly support archs where it's necessary to make a runtime choice of which atomic code paths to use in order to achieve safe/correct behavior. With the atomics overhaul I am planning to have an option (selected by the arch headers, not the user) to use __sync_* as the backend for atomics, which will ease porting to new archs where it already works correctly on all compilers that support the arch. > > > Last time I looked, all usages but one of atomic operations in musl > > > are clean. If an atomic operation is used for a data a some point, > > > atomic operations are used in all other places. So moving to > > > _Atomic(int) would be a option. (Basically this would be `volatile > > > int*` => `_Atomic(int)`, IIRC). > > oops I meant `volatile int*` => `_Atomic(int)*` > > > pthread_once_t and pthread_spinlock_t are > > publicly visibles type (without volatile and > > _Atomic) > > > > i dont think we can fix those without abi > > change. > > This is really a question what ABI means in this case. The width, > alignment and representation of the `int` would stay the same, we > would just internally (to the library implementation) interpret it as > _Atomic(int). From a C++ perspective ABI certainly includes the type that will appear in mangled function names. This is the main motivation for not changing types like this. Of course LTO could also break when formal types don't match. > Also it seems that we do such a re-interpretation already with > `volatile`. One interpretation of the standard says that the object > itself has to be `volatile`, just casting a pointer to `volatile int*` > doesn't inhibit optimizations. GCC explicitly interprets it the other way, and documents it as such. If we want to deal with compilers that don't provide such a guarantee, and for which the accesses could break, we probably need an explicit load construct in asm... But this seems like a low priorit and I'm happy to wait to address it unless/until real-world problems seem likely. For this reason it would be nice to document the assumption, however, as Szabolcs Nagy suggested we do for issues like this. Rich ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: atomic.h cleanup 2016-01-14 22:12 ` Rich Felker @ 2016-01-14 22:37 ` Jens Gustedt 2016-01-14 23:32 ` Rich Felker 0 siblings, 1 reply; 14+ messages in thread From: Jens Gustedt @ 2016-01-14 22:37 UTC (permalink / raw) To: musl [-- Attachment #1: Type: text/plain, Size: 2164 bytes --] Am Donnerstag, den 14.01.2016, 17:12 -0500 schrieb Rich Felker: > There are already multiple reasons we don't use the compiler's > atomics, either directly or indirectly via stdatomic.h. They're not > supported in some old/alternative compilers, they generate highly > suboptimal code even on modern compilers for some important archs > (e.g. ARM), I have seen some pretty good assembler when using the __atomic... builtins, so I can't completely follow, here. > and they fail to properly support archs where it's > necessary to make a runtime choice of which atomic code paths to use > in order to achieve safe/correct behavior. ok > > > > Last time I looked, all usages but one of atomic operations in musl > > > > are clean. If an atomic operation is used for a data a some point, > > > > atomic operations are used in all other places. So moving to > > > > _Atomic(int) would be a option. (Basically this would be `volatile > > > > int*` => `_Atomic(int)`, IIRC). > > > > oops I meant `volatile int*` => `_Atomic(int)*` > > > > > pthread_once_t and pthread_spinlock_t are > > > publicly visibles type (without volatile and > > > _Atomic) > > > > > > i dont think we can fix those without abi > > > change. > > > > This is really a question what ABI means in this case. The width, > > alignment and representation of the `int` would stay the same, we > > would just internally (to the library implementation) interpret it as > > _Atomic(int). > > From a C++ perspective ABI certainly includes the type that will > appear in mangled function names. This is the main motivation for not > changing types like this. Of course LTO could also break when formal > types don't match. Yes, I know. I was thinking of a way to have the externally visible type remain stable, but internally use _Atomic. Jens -- :: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS ::: :: ::::::::::::::: office Strasbourg : +33 368854536 :: :: :::::::::::::::::::::: gsm France : +33 651400183 :: :: ::::::::::::::: gsm international : +49 15737185122 :: :: http://icube-icps.unistra.fr/index.php/Jens_Gustedt :: [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: atomic.h cleanup 2016-01-14 22:37 ` Jens Gustedt @ 2016-01-14 23:32 ` Rich Felker 2016-01-15 0:46 ` Szabolcs Nagy 0 siblings, 1 reply; 14+ messages in thread From: Rich Felker @ 2016-01-14 23:32 UTC (permalink / raw) To: musl On Thu, Jan 14, 2016 at 11:37:48PM +0100, Jens Gustedt wrote: > Am Donnerstag, den 14.01.2016, 17:12 -0500 schrieb Rich Felker: > > There are already multiple reasons we don't use the compiler's > > atomics, either directly or indirectly via stdatomic.h. They're not > > supported in some old/alternative compilers, they generate highly > > suboptimal code even on modern compilers for some important archs > > (e.g. ARM), > > I have seen some pretty good assembler when using the > __atomic... builtins, so I can't completely follow, here. It generates "dmb sy" all over the place instead of "dmb ish". Synchronizing with external bus devices is NOT something you want to happen in thread synchronization primitives. Rich ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: atomic.h cleanup 2016-01-14 23:32 ` Rich Felker @ 2016-01-15 0:46 ` Szabolcs Nagy 0 siblings, 0 replies; 14+ messages in thread From: Szabolcs Nagy @ 2016-01-15 0:46 UTC (permalink / raw) To: musl * Rich Felker <dalias@libc.org> [2016-01-14 18:32:24 -0500]: > On Thu, Jan 14, 2016 at 11:37:48PM +0100, Jens Gustedt wrote: > > Am Donnerstag, den 14.01.2016, 17:12 -0500 schrieb Rich Felker: > > > There are already multiple reasons we don't use the compiler's > > > atomics, either directly or indirectly via stdatomic.h. They're not > > > supported in some old/alternative compilers, they generate highly > > > suboptimal code even on modern compilers for some important archs > > > (e.g. ARM), > > > > I have seen some pretty good assembler when using the > > __atomic... builtins, so I can't completely follow, here. > > It generates "dmb sy" all over the place instead of "dmb ish". > Synchronizing with external bus devices is NOT something you want to > happen in thread synchronization primitives. > fixed in gcc-6 :) https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=224317 (not backported to earlier branches though) ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: atomic.h cleanup 2016-01-10 16:57 ` Rich Felker 2016-01-10 17:35 ` Markus Wichmann @ 2016-01-10 17:37 ` Markus Wichmann 2016-01-22 0:09 ` Rich Felker 2 siblings, 0 replies; 14+ messages in thread From: Markus Wichmann @ 2016-01-10 17:37 UTC (permalink / raw) To: musl [-- Attachment #1: Type: text/plain, Size: 79 bytes --] Hi all, And once more with the promised attachments. I'm sorry. Ciao, Markus [-- Attachment #2: cas.c --] [-- Type: text/x-csrc, Size: 477 bytes --] static inline int a_ll(volatile int *p) { int x; asm volatile ("lwarx %0, 0, %1" : "=r"(x) : "r"(p) : "memory"); return x; } static inline int a_sc(volatile int *p, int x) { asm goto ("stwcx. %0, 0, %1\nbne- %l2" : : "r"(x), "r"(p) : "cc","memory" : fail); return 1; fail: return 0; } int a_cas(volatile int *p, int t, int s) { int x; do { x = a_ll(p); if (x != t) break; } while (!a_sc(p, s)); return x; } [-- Attachment #3: cas.s --] [-- Type: text/plain, Size: 359 bytes --] .file "cas.c" .section ".text" .align 2 .globl a_cas .type a_cas, @function a_cas: .L2: #APP # 3 "cas.c" 1 lwarx 9, 0, 3 # 0 "" 2 #NO_APP cmpw 7,4,9 bne- 7,.L4 #APP # 8 "cas.c" 1 stwcx. 5, 0, 3 bne- .L2 # 0 "" 2 #NO_APP .L3: mr 3,4 blr .L4: mr 4,9 b .L3 .size a_cas, .-a_cas .ident "GCC: (GNU) 4.9.2" .section .note.GNU-stack,"",@progbits ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: atomic.h cleanup 2016-01-10 16:57 ` Rich Felker 2016-01-10 17:35 ` Markus Wichmann 2016-01-10 17:37 ` Markus Wichmann @ 2016-01-22 0:09 ` Rich Felker 2 siblings, 0 replies; 14+ messages in thread From: Rich Felker @ 2016-01-22 0:09 UTC (permalink / raw) To: musl On Sun, Jan 10, 2016 at 11:57:18AM -0500, Rich Felker wrote: > On Sun, Jan 10, 2016 at 01:21:39PM +0100, Markus Wichmann wrote: > > Hi all, > > > > The development roadmap on the musl wiki lists the ominous point > > "atomic.h cleanup" for 1.2.0. > > > > I assume you mean a sort of simplification and unification. I noticed > > that for the RISC arch's there are rather liberal amounts of inline > > assembly for the atomic operations. And I have always been taught, that > > as soon as you start copying code, you are probably doing it wrong. > > > > So first thing I'd do: add a new file, let's call it atomic_debruijn.h. > > It contains an implementation of a_ctz() and a_ctz_64() based on the > > DeBruijn number. That way, all the architectures currently implementing > > a_ctz() in this manner can just include that file, and a lot of > > duplicate code goes out the window. > > > > Second thing: We can reduce the inline assembly footprint and the amount > > of duplicate code by adding a new file, let's call it atomic_llsc.h, > > that implements a_cas(), a_cas_p(), a_swap(), a_fetch_add(), a_inc(), > > a_dec(), a_and() and a_or() in terms of new functions that would have to > > be defined, namely: > > > > static inline void a_presync(void) - execute any barrier needed before > > attempting an atomic operation, like "dmb ish" for arm, or "sync" for > > ppc. > > > > static inline void a_postsync(void) - execute any barrier needed > > afterwards, like "isync" for PPC, or, again, "dmb ish" for ARM. > > > > static inline int a_ll(int*) - perform an LL on the given pointer and > > return the value there. This would be "lwarx" for PPC, or "ldrex" for > > ARM. > > > > static inline int a_sc(int*, int) - perform an SC on the given pointer > > with the given value. Return zero iff that failed. > > > > static inline void* a_ll_p(void*) - same as a_ll(), but with machine > > words instead of int, if that's a difference. > > > > static inline int a_sc_p(void*, void*) - same as a_sc(), but with > > machine words. > > > > > > With these function we can implement e.g. CAS as: > > > > static inline int a_cas(volatile int *p, int t, int s) > > { > > int v; > > do { > > v = a_ll(p); > > if (v != t) > > break; > > } while (!a_sc(p, s)); > > return v; > > } > > > > Add some #ifdefs to only activate the pointer variations if they're > > needed (i.e. if we're on 64 bits) and Bob's your uncle. > > > > The only hardship would be in implementing a_sc(), but that can be > > solved by using a feature often referenced but rarely seen in the wild: > > ASM goto. How that works is that, if the arch's SC instruction returns > > success or failure in a flag and the CPU can jump on that flag (unlike, > > say, microblaze, which can only jump on comparisons), then you encode > > the jump in the assembly snippet but let the compiler handle the targets > > for you. Since in all cases, we want to jump on failure, that's what the > > assembly should do, so for instance for PowerPC: > > > > static inline int a_sc(volatile int* p, int x) > > { > > __asm__ goto ("stwcx. %0, 0, %1\n\tbne- %l2" : : "r"(x), "r"(p) : "cc", "memory" : fail); > > return 1; > > fail: > > return 0; > > } > > > > I already tried the compiler results for such a design, but I never > > tried running it for lack of hardware. > > > > Anyway, this code makes it possible for the compiler to redirect the > > conditional jump on failure to the top of the loop in a_cas(). Since the > > return value isn't used otherwise, the values 1 and 0 never appear in > > the generated assembly. > > > > What do you say to this design? > > Have you read this thread? :) > > http://www.openwall.com/lists/musl/2015/05/20/1 > > I thought at one point it was linked from the wiki but maybe it got > lost. > > Basically I have this done already outside of musl as an experiment, > but there are minor details that were holding it up. One annoyance is > that, on some archs, success/failure of "sc" comes via a condition > flag which the C caller can't easily branch on, so there's an extra > conversion to a boolean result inside the asm and extra conversion > back to a test/branch outside the asm. In practice we probably don't > care. > > One other issue is that risc-v seems to guarantee, at least on some > implementations, stronger forward-progress guarantees than a normal > ll/sc as long as the ll/sc are in order, within a few instruction > slots of each other, with no branches between. Such conditions cannot > be met without putting them in the same asm block, so we might need to > do a custom version for risc-v if we want to take advantage of the > stronger properties. > > Anyway, at this point the main obstacle to finishing the task is doing > the actual merging and testing, not any new coding, I think. Most of this is committed now! The original commit introducing the new system is: http://git.musl-libc.org/cgit/musl/commit/?id=1315596b510189b5159e742110b504177bdd4932 Subsequent commits are converting archs over one by one to make best use of the new system. I'd really like to see some before-and-after benchmarks on real hardware. For Timo's cond_bench test which I've been using as a quick check to make sure the new atomics have no obvious breakage, performance under qemu user-level emulation has roughly doubled for most of the llsc archs due to better ability of gcc to inline (it refuses to inline the largeish ll/sc loops written in asm but is happy to inline the tiny a_ll and a_sc asm). Please report any regressions, etc. Rich ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2016-01-22 0:09 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-01-10 12:21 atomic.h cleanup Markus Wichmann 2016-01-10 16:57 ` Rich Felker 2016-01-10 17:35 ` Markus Wichmann 2016-01-10 17:50 ` Alexander Monakov 2016-01-11 16:35 ` Markus Wichmann 2016-01-11 17:12 ` Jens Gustedt 2016-01-11 19:03 ` Szabolcs Nagy 2016-01-11 20:56 ` Jens Gustedt 2016-01-14 22:12 ` Rich Felker 2016-01-14 22:37 ` Jens Gustedt 2016-01-14 23:32 ` Rich Felker 2016-01-15 0:46 ` Szabolcs Nagy 2016-01-10 17:37 ` Markus Wichmann 2016-01-22 0:09 ` Rich Felker
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/musl/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).