* [musl] [PATCH 0/1] riscv: Add support for Zacas in atomic operations
@ 2025-09-18 16:47 Pincheng Wang
2025-09-18 16:47 ` [musl] [PATCH 1/1] riscv: add Zacas extension support for atomic CAS Pincheng Wang
2025-10-16 15:37 ` [musl] [PATCH 0/1] riscv: Add support for Zacas in atomic operations Pincheng Wang
0 siblings, 2 replies; 5+ messages in thread
From: Pincheng Wang @ 2025-09-18 16:47 UTC (permalink / raw)
To: musl; +Cc: pincheng.plct
Hi all,
This patch adds support for the RISC-V Zacas (Atomic Compare-and-Swap)
extension in musl's atomic operations for both riscv64 and riscv32.
Currently, musl implements a_cas using a
Load-Reserved/Store-Conditional (lr/sc) loop that:
- Requires at least four instructions (lr+bne+sc+bnez) per CAS
operation,
- Contains a retry loop under contention,
- Incurs branch penalties that may cause pipeline stalls.
Zacas introduces amocas.w.aqrl/amocas.d.aqrl instructions that perform
CAS atomically in a single instruction, eliminating retry loops and
conditional branches.
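For readers less familiar with the primitive, the contract that both the
lr/sc loop and amocas have to satisfy can be sketched in portable C. This
is only an illustration, not the code in the patch: cas_sketch and
cas_sketch_demo are hypothetical names, and the GCC/Clang builtin
__atomic_compare_exchange_n stands in for the RISC-V instructions.

```c
#include <assert.h>

/* Hypothetical stand-in for a_cas(): if *p == t, store s into *p.
 * Either way, return the value *p held before the operation.
 * A compiler builtin replaces lr/sc or amocas here (assumption,
 * for illustration only). */
static inline int cas_sketch(volatile int *p, int t, int s)
{
	int old = t;
	/* On failure the builtin writes the observed value into old;
	 * on success old still equals t, which was the prior value. */
	__atomic_compare_exchange_n(p, &old, s, 0,
		__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return old;
}

/* Small usage example exercising both outcomes. */
static inline void cas_sketch_demo(void)
{
	volatile int x = 1;
	assert(cas_sketch(&x, 1, 2) == 1); /* match: the swap happens */
	assert(x == 2);
	assert(cas_sketch(&x, 1, 3) == 2); /* mismatch: nothing is stored */
	assert(x == 2);
}
```

Callers detect success by comparing the return value with the expected
value t, which is why the function returns the old value rather than a
boolean.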
Due to hardware limitations, we evaluated this change under QEMU using
both mcycle and minstret counters. The results show clear benefits:
Metric                                          lr/sc   Zacas  Improvement
Instr. per CAS (50k ops average)                15.04    8.36       -44.4%
Instr. per op (single-thread)                   23.61   14.25       -39.6%
Instr. per op (multi-thread, high contention)  528.24  251.14       -52.5%
In addition, libc.a size is reduced by ~1.2% due to removal of loop
code.
The patch automatically falls back to the lr/sc implementation on
systems where Zacas is not available, preserving full backward
compatibility.
This work provides a measurable reduction in instruction count,
execution cycles and binary size, improving scalability of
synchronization primitives under load.
Thanks for reviewing!
Best regards,
Pincheng Wang
Pincheng Wang (1):
riscv: add Zacas extension support for atomic CAS
arch/riscv32/atomic_arch.h | 17 +++++++++++++++++
arch/riscv64/atomic_arch.h | 30 ++++++++++++++++++++++++++++++
2 files changed, 47 insertions(+)
--
2.39.5
* [musl] [PATCH 1/1] riscv: add Zacas extension support for atomic CAS
2025-09-18 16:47 [musl] [PATCH 0/1] riscv: Add support for Zacas in atomic operations Pincheng Wang
@ 2025-09-18 16:47 ` Pincheng Wang
2025-10-21 0:30 ` Szabolcs Nagy
2025-10-16 15:37 ` [musl] [PATCH 0/1] riscv: Add support for Zacas in atomic operations Pincheng Wang
1 sibling, 1 reply; 5+ messages in thread
From: Pincheng Wang @ 2025-09-18 16:47 UTC (permalink / raw)
To: musl; +Cc: pincheng.plct
Add compile-time detection for RISC-V Zacas extension and use
amocas.w.aqrl/amocas.d.aqrl instructions when available.
When __riscv_zacas is defined, a_cas() and a_cas_p() use single amocas
instructions instead of lr/sc loops. Falls back to existing lr/sc
implementation when Zacas is not available.
Signed-off-by: Pincheng Wang <pincheng.plct@isrc.iscas.ac.cn>
---
arch/riscv32/atomic_arch.h | 17 +++++++++++++++++
arch/riscv64/atomic_arch.h | 30 ++++++++++++++++++++++++++++++
2 files changed, 47 insertions(+)
diff --git a/arch/riscv32/atomic_arch.h b/arch/riscv32/atomic_arch.h
index 4d418f63..64ef05b7 100644
--- a/arch/riscv32/atomic_arch.h
+++ b/arch/riscv32/atomic_arch.h
@@ -3,6 +3,21 @@ static inline void a_barrier()
{
__asm__ __volatile__ ("fence rw,rw" : : : "memory");
}
+#ifdef __riscv_zacas
+
+#define a_cas a_cas
+static inline int a_cas(volatile int *p, int t, int s)
+{
+ int old = t;
+ __asm__ __volatile__ (
+ "amocas.w.aqrl %0, %2, %1"
+ : "+r"(old), "+A"(*(volatile int *)p)
+ : "r"(s)
+ : "memory");
+ return old;
+}
+
+#else /* Fallback to lr/sc when Zacas is not available */
#define a_cas a_cas
static inline int a_cas(volatile int *p, int t, int s)
@@ -19,3 +34,5 @@ static inline int a_cas(volatile int *p, int t, int s)
: "memory");
return old;
}
+
+#endif /* __riscv_zacas */
\ No newline at end of file
diff --git a/arch/riscv64/atomic_arch.h b/arch/riscv64/atomic_arch.h
index 0c382588..9681505e 100644
--- a/arch/riscv64/atomic_arch.h
+++ b/arch/riscv64/atomic_arch.h
@@ -4,6 +4,34 @@ static inline void a_barrier()
__asm__ __volatile__ ("fence rw,rw" : : : "memory");
}
+#ifdef __riscv_zacas
+
+#define a_cas a_cas
+static inline int a_cas(volatile int *p, int t, int s)
+{
+ int old = t;
+ __asm__ __volatile__ (
+ "amocas.w.aqrl %0, %2, %1"
+ : "+r"(old), "+A"(*(volatile int *)p)
+ : "r"(s)
+ : "memory");
+ return old;
+}
+
+#define a_cas_p a_cas_p
+static inline void *a_cas_p(volatile void *p, void *t, void *s)
+{
+ void *old = t;
+ __asm__ __volatile__ (
+ "amocas.d.aqrl %0, %2, %1"
+ : "+r"(old), "+A"(*(void *volatile *)p)
+ : "r"(s)
+ : "memory");
+ return old;
+}
+
+#else /* Fallback to lr/sc when Zacas is not available */
+
#define a_cas a_cas
static inline int a_cas(volatile int *p, int t, int s)
{
@@ -36,3 +64,5 @@ static inline void *a_cas_p(volatile void *p, void *t, void *s)
: "memory");
return old;
}
+
+#endif /* __riscv_zacas */
--
2.39.5
* Re: [musl] [PATCH 0/1] riscv: Add support for Zacas in atomic operations
2025-09-18 16:47 [musl] [PATCH 0/1] riscv: Add support for Zacas in atomic operations Pincheng Wang
2025-09-18 16:47 ` [musl] [PATCH 1/1] riscv: add Zacas extension support for atomic CAS Pincheng Wang
@ 2025-10-16 15:37 ` Pincheng Wang
1 sibling, 0 replies; 5+ messages in thread
From: Pincheng Wang @ 2025-10-16 15:37 UTC (permalink / raw)
To: musl
On 2025/9/19 00:47, Pincheng Wang wrote:
> Hi all,
>
> This patch adds support for the RISC-V Zacas (Atomic Compare-and-Swap)
> extension in musl's atomic operations for both riscv64 and riscv32.
>
> Currently, musl implements a_cas using a
> Load-Reserved/Store-Conditional (lr/sc) loop that:
> - Requires at least four instructions (lr+bne+sc+bnez) per CAS
> operation,
> - Contains a retry loop under contention,
> - Incurs branch penalties that may cause pipeline stalls.
>
> Zacas introduces amocas.w.aqrl/amocas.d.aqrl instructions that perform
> CAS atomically in a single instruction, eliminating retry loops and
> conditional branches.
>
> Due to hardware limitations, we evaluated this change under QEMU using
> both mcycle and minstret counters. The results show clear benefits:
>
> Metric                                          lr/sc   Zacas  Improvement
> Instr. per CAS (50k ops average)                15.04    8.36       -44.4%
> Instr. per op (single-thread)                   23.61   14.25       -39.6%
> Instr. per op (multi-thread, high contention)  528.24  251.14       -52.5%
>
> In addition, libc.a size is reduced by ~1.2% due to removal of loop
> code.
>
> The patch automatically falls back to the lr/sc implementation on
> systems where Zacas is not available, preserving full backward
> compatibility.
>
> This work provides a measurable reduction in instruction count,
> execution cycles and binary size, improving scalability of
> synchronization primitives under load.
>
> Thanks for reviewing!
>
> Best regards,
> Pincheng Wang
>
>
> Pincheng Wang (1):
> riscv: add Zacas extension support for atomic CAS
>
> arch/riscv32/atomic_arch.h | 17 +++++++++++++++++
> arch/riscv64/atomic_arch.h | 30 ++++++++++++++++++++++++++++++
> 2 files changed, 47 insertions(+)
>
Hi all,
Friendly ping regarding my earlier patch on enabling the RISC-V Zacas
(amocas.{w,d}) path for a_cas()/a_cas_p().
Best regards,
Pincheng Wang
* Re: [musl] [PATCH 1/1] riscv: add Zacas extension support for atomic CAS
2025-09-18 16:47 ` [musl] [PATCH 1/1] riscv: add Zacas extension support for atomic CAS Pincheng Wang
@ 2025-10-21 0:30 ` Szabolcs Nagy
2025-10-21 3:05 ` Pincheng Wang
0 siblings, 1 reply; 5+ messages in thread
From: Szabolcs Nagy @ 2025-10-21 0:30 UTC (permalink / raw)
To: Pincheng Wang; +Cc: musl
* Pincheng Wang <pincheng.plct@isrc.iscas.ac.cn> [2025-09-19 00:47:20 +0800]:
> Add compile-time detection for RISC-V Zacas extension and use
> amocas.w.aqrl/amocas.d.aqrl instructions when available.
>
> When __riscv_zacas is defined, a_cas() and a_cas_p() use single amocas
> instructions instead of lr/sc loops. Falls back to existing lr/sc
> implementation when Zacas is not available.
is this a supported extension? are there users?
(implemented on existing cpus with released toolchain versions)
what cflags enable the extension? (how to test)
i can't review if the instructions have the right semantics,
but the code looks ok, with some comments below.
>
> Signed-off-by: Pincheng Wang <pincheng.plct@isrc.iscas.ac.cn>
> ---
> arch/riscv32/atomic_arch.h | 17 +++++++++++++++++
> arch/riscv64/atomic_arch.h | 30 ++++++++++++++++++++++++++++++
> 2 files changed, 47 insertions(+)
>
> diff --git a/arch/riscv32/atomic_arch.h b/arch/riscv32/atomic_arch.h
> index 4d418f63..64ef05b7 100644
> --- a/arch/riscv32/atomic_arch.h
> +++ b/arch/riscv32/atomic_arch.h
> }
> +#ifdef __riscv_zacas
newline before #ifdef
> +#else /* Fallback to lr/sc when Zacas is not available */
...
> +#endif /* __riscv_zacas */
> \ No newline at end of file
newline after endif
i think ifdef comments are not needed in such a simple file.
> +++ b/arch/riscv64/atomic_arch.h
> @@ -4,6 +4,34 @@ static inline void a_barrier()
> __asm__ __volatile__ ("fence rw,rw" : : : "memory");
> }
>
> +#ifdef __riscv_zacas
> +
> +#define a_cas a_cas
> +static inline int a_cas(volatile int *p, int t, int s)
> +{
> + int old = t;
> + __asm__ __volatile__ (
> + "amocas.w.aqrl %0, %2, %1"
> + : "+r"(old), "+A"(*(volatile int *)p)
> + : "r"(s)
> + : "memory");
existing cas does not use +A constraint (check git log
why and ensure this is ok).
the ptr cast should not be needed. (same for rv32)
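One possible shape for a revised function, sketched here only as a guess
at what these comments point toward: pass the pointer as a plain "r"
input and address it as (%1), rely on the "memory" clobber, and drop the
pointer cast. The name a_cas_zacas_sketch is hypothetical, the inline
assembly is untested on hardware, and the non-Zacas branch uses a
compiler builtin purely so that the snippet builds on other targets.

```c
/* Hypothetical revision sketch; untested on real Zacas hardware. */
static inline int a_cas_zacas_sketch(volatile int *p, int t, int s)
{
	int old = t;
#ifdef __riscv_zacas
	/* rd (%0) carries the expected value in and the loaded value out;
	 * rs2 (%2) is the value to store; (%1) is the address register. */
	__asm__ __volatile__ (
		"amocas.w.aqrl %0, %2, (%1)"
		: "+r"(old)
		: "r"(p), "r"(s)
		: "memory");
#else
	/* Portable stand-in so the sketch compiles without Zacas. */
	__atomic_compare_exchange_n(p, &old, s, 0,
		__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
#endif
	return old;
}
```

Whether dropping "+A" is actually required still depends on the reason
recorded in the git log, so this should be read as a starting point
rather than a settled answer.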
* Re: [musl] [PATCH 1/1] riscv: add Zacas extension support for atomic CAS
2025-10-21 0:30 ` Szabolcs Nagy
@ 2025-10-21 3:05 ` Pincheng Wang
0 siblings, 0 replies; 5+ messages in thread
From: Pincheng Wang @ 2025-10-21 3:05 UTC (permalink / raw)
To: musl, nsz
On 2025/10/21 08:30, Szabolcs Nagy wrote:
> * Pincheng Wang <pincheng.plct@isrc.iscas.ac.cn> [2025-09-19 00:47:20 +0800]:
>> Add compile-time detection for RISC-V Zacas extension and use
>> amocas.w.aqrl/amocas.d.aqrl instructions when available.
>>
>> When __riscv_zacas is defined, a_cas() and a_cas_p() use single amocas
>> instructions instead of lr/sc loops. Falls back to existing lr/sc
>> implementation when Zacas is not available.
>
> is this a supported extension? are there users?
> (implemented on existing cpus with released toolchain versions)
The Zacas extension was ratified in November 2023. CPUs such as the
XuanTie C930 already support this extension [1].
For toolchain support, GCC added Zacas extension support in commit
11c2453 ("RISC-V: Add basic support for the Zacas extension") on Jul 30,
2024.
Moreover, the RVA23 profile document [2] lists Zacas as a development
option and states that it "is intended to become mandatory in the future
RVA profile", suggesting broader adoption in the near future,
particularly in high-performance computing domains such as PCs and
servers.
[1] https://www.xrvm.com/product/xuantie/C930
[2]
https://docs.riscv.org/reference/profiles/rva23/_attachments/rva23-profile.pdf
> what cflags enable the extension? (how to test)
To enable this extension, pass "-march=rv{32,64}gc_zacas" in CFLAGS. In my
development environment, I'm using riscv64-unknown-linux-gnu-gcc
(version 15.1.0, commit g1b306039ac4) with the following configure command:
`CC=riscv64-unknown-linux-gnu-gcc CFLAGS="-march=rv64gc_zacas"
./configure --prefix=/home/wpcwzy/sysroot-rv64`
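A quick way to confirm that a given compiler invocation selects the new
path is to check the predefined macro the patch keys on. The sketch below
assumes only that the toolchain defines __riscv_zacas when Zacas is
enabled through -march, as the patch itself relies on; built_with_zacas
and cas_path_name are hypothetical helper names.

```c
/* Returns 1 when this translation unit was compiled with Zacas
 * enabled (the compiler predefines __riscv_zacas), else 0. */
static inline int built_with_zacas(void)
{
#ifdef __riscv_zacas
	return 1;
#else
	return 0;
#endif
}

/* Names the a_cas implementation the patch would select. */
static inline const char *cas_path_name(void)
{
	return built_with_zacas() ? "amocas" : "lr/sc";
}
```

Compiling this once with and once without the -march flag above and
printing cas_path_name() shows which branch of the #ifdef is taken.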
> i can't review if the instructions have the right semantics,
> but the code looks ok, with some comments below.
>
>>
>> Signed-off-by: Pincheng Wang <pincheng.plct@isrc.iscas.ac.cn>
>> ---
>> arch/riscv32/atomic_arch.h | 17 +++++++++++++++++
>> arch/riscv64/atomic_arch.h | 30 ++++++++++++++++++++++++++++++
>> 2 files changed, 47 insertions(+)
>>
>> diff --git a/arch/riscv32/atomic_arch.h b/arch/riscv32/atomic_arch.h
>> index 4d418f63..64ef05b7 100644
>> --- a/arch/riscv32/atomic_arch.h
>> +++ b/arch/riscv32/atomic_arch.h
>> }
>> +#ifdef __riscv_zacas
>
> newline before #ifdef
>
>> +#else /* Fallback to lr/sc when Zacas is not available */
> ...
>> +#endif /* __riscv_zacas */
>> \ No newline at end of file
>
> newline after endif
>
> i think ifdef comments are not needed in such a simple file.
>
Thank you for the formatting suggestions. I'll address these in the next
revision.
>> +++ b/arch/riscv64/atomic_arch.h
>> @@ -4,6 +4,34 @@ static inline void a_barrier()
>> __asm__ __volatile__ ("fence rw,rw" : : : "memory");
>> }
>>
>> +#ifdef __riscv_zacas
>> +
>> +#define a_cas a_cas
>> +static inline int a_cas(volatile int *p, int t, int s)
>> +{
>> + int old = t;
>> + __asm__ __volatile__ (
>> + "amocas.w.aqrl %0, %2, %1"
>> + : "+r"(old), "+A"(*(volatile int *)p)
>> + : "r"(s)
>> + : "memory");
>
> existing cas does not use +A constraint (check git log
> why and ensure this is ok).
>
> the ptr cast should not be needed. (same for rv32)
Thanks, I will review the git history and adjust the constraint and cast
in the next patch revision.
Best regards,
Pincheng Wang