mailing list of musl libc
* [musl] [PATCH 0/1] riscv: Add support for Zacas in atomic operations
@ 2025-09-18 16:47 Pincheng Wang
  2025-09-18 16:47 ` [musl] [PATCH 1/1] riscv: add Zacas extension support for atomic CAS Pincheng Wang
  2025-10-16 15:37 ` [musl] [PATCH 0/1] riscv: Add support for Zacas in atomic operations Pincheng Wang
  0 siblings, 2 replies; 5+ messages in thread
From: Pincheng Wang @ 2025-09-18 16:47 UTC
  To: musl; +Cc: pincheng.plct

Hi all,

This patch adds support for the RISC-V Zacas (Atomic Compare-and-Swap)
extension in musl's atomic operations for both riscv64 and riscv32.

Currently, musl implements a_cas using a
Load-Reserved/Store-Conditional (lr/sc) loop that:
- Requires at least four instructions (lr+bne+sc+bnez) per CAS
  operation,
- Contains a retry loop under contention,
- Incurs branch penalties that may cause pipeline stalls.

Zacas introduces amocas.w.aqrl/amocas.d.aqrl instructions that perform
CAS atomically in a single instruction, eliminating retry loops and
conditional branches.
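
For readers less familiar with the primitive, the contract that both the
lr/sc loop and amocas implement (store s only if *p equals t, and always
return the value observed in *p) can be sketched in portable C. This is an
illustration only: cas_sketch and cas_sketch_selfcheck are hypothetical
names, and the sketch relies on the GCC/Clang __atomic_compare_exchange_n
builtin rather than on any musl code.

```c
#include <assert.h>

/* Hypothetical stand-in for a_cas(): if *p == t, store s into *p.
 * In both cases return the value that was observed in *p. */
static inline int cas_sketch(volatile int *p, int t, int s)
{
	int expected = t;
	/* On failure the builtin overwrites `expected` with the observed
	 * value; on success `expected` still equals the old value t. */
	__atomic_compare_exchange_n(p, &expected, s, 0,
		__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return expected;
}

/* Small usage example: one successful and one failed CAS. */
static inline void cas_sketch_selfcheck(void)
{
	volatile int x = 5;
	assert(cas_sketch(&x, 5, 7) == 5); /* matched: x becomes 7 */
	assert(cas_sketch(&x, 5, 9) == 7); /* mismatch: x stays 7 */
	assert(x == 7);
}
```

A caller detects success by comparing the return value with t, which is
the same convention the patch's a_cas() follows by returning old.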

Due to hardware limitations, we evaluated this change under QEMU using
both mcycle and minstret counters. The results show clear benefits:

Metric                                         lr/sc    Zacas    Change
Instr. per CAS (50k ops average)               15.04    8.36     -44.4%
Instr. per op (single-thread)                  23.61    14.25    -39.6%
Instr. per op (multi-thread, high contention)  528.24   251.14   -52.5%
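
The percentage column follows from the two measured columns as
(Zacas - lr/sc) / lr/sc. A minimal sketch of that arithmetic, with
pct_change as a hypothetical helper name:

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent.
 * A negative result means fewer instructions after the change. */
static inline double pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Check the first table row to one decimal place: 15.04 -> 8.36. */
static inline void pct_change_selfcheck(void)
{
	double c = pct_change(15.04, 8.36);
	assert(c > -44.45 && c < -44.35);
}
```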

In addition, libc.a size is reduced by ~1.2% due to removal of loop
code.

The selection happens at compile time: when the compiler does not
define __riscv_zacas, the existing lr/sc implementation is built
unchanged, so targets without Zacas are unaffected.
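
As a rough sketch of how that selection works, the patch keys off the
__riscv_zacas macro with a plain #ifdef. The helper below only reports
which branch a given build would take; uses_zacas is a hypothetical name
and is not part of the patch.

```c
#include <assert.h>

/* Returns 1 when this translation unit was compiled with Zacas enabled
 * (the compiler defined __riscv_zacas), 0 otherwise. */
static inline int uses_zacas(void)
{
#ifdef __riscv_zacas
	return 1; /* the amocas-based a_cas()/a_cas_p() would be built */
#else
	return 0; /* the existing lr/sc loop would be built */
#endif
}
```

Because the check is made by the preprocessor, there is no runtime probing:
a libc built without Zacas in -march never contains amocas instructions.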

This work provides a measurable reduction in instruction count,
execution cycles and binary size, improving scalability of
synchronization primitives under load.

Thanks for reviewing!

Best regards,
Pincheng Wang


Pincheng Wang (1):
  riscv: add Zacas extension support for atomic CAS

 arch/riscv32/atomic_arch.h | 17 +++++++++++++++++
 arch/riscv64/atomic_arch.h | 30 ++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

-- 
2.39.5



* [musl] [PATCH 1/1] riscv: add Zacas extension support for atomic CAS
  2025-09-18 16:47 [musl] [PATCH 0/1] riscv: Add support for Zacas in atomic operations Pincheng Wang
@ 2025-09-18 16:47 ` Pincheng Wang
  2025-10-21  0:30   ` Szabolcs Nagy
  2025-10-16 15:37 ` [musl] [PATCH 0/1] riscv: Add support for Zacas in atomic operations Pincheng Wang
  1 sibling, 1 reply; 5+ messages in thread
From: Pincheng Wang @ 2025-09-18 16:47 UTC
  To: musl; +Cc: pincheng.plct

Add compile-time detection for RISC-V Zacas extension and use
amocas.w.aqrl/amocas.d.aqrl instructions when available.

When __riscv_zacas is defined, a_cas() and a_cas_p() use single amocas
instructions instead of lr/sc loops. Falls back to existing lr/sc
implementation when Zacas is not available.

Signed-off-by: Pincheng Wang <pincheng.plct@isrc.iscas.ac.cn>
---
 arch/riscv32/atomic_arch.h | 17 +++++++++++++++++
 arch/riscv64/atomic_arch.h | 30 ++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/arch/riscv32/atomic_arch.h b/arch/riscv32/atomic_arch.h
index 4d418f63..64ef05b7 100644
--- a/arch/riscv32/atomic_arch.h
+++ b/arch/riscv32/atomic_arch.h
@@ -3,6 +3,21 @@ static inline void a_barrier()
 {
 	__asm__ __volatile__ ("fence rw,rw" : : : "memory");
 }
+#ifdef __riscv_zacas
+
+#define a_cas a_cas
+static inline int a_cas(volatile int *p, int t, int s)
+{
+	int old = t;
+	__asm__ __volatile__ (
+		"amocas.w.aqrl %0, %2, %1"
+		: "+r"(old), "+A"(*(volatile int *)p)
+		: "r"(s)
+		: "memory");
+	return old;
+}
+
+#else /* Fallback to lr/sc when Zacas is not available */
 
 #define a_cas a_cas
 static inline int a_cas(volatile int *p, int t, int s)
@@ -19,3 +34,5 @@ static inline int a_cas(volatile int *p, int t, int s)
 		: "memory");
 	return old;
 }
+
+#endif /* __riscv_zacas */
\ No newline at end of file
diff --git a/arch/riscv64/atomic_arch.h b/arch/riscv64/atomic_arch.h
index 0c382588..9681505e 100644
--- a/arch/riscv64/atomic_arch.h
+++ b/arch/riscv64/atomic_arch.h
@@ -4,6 +4,34 @@ static inline void a_barrier()
 	__asm__ __volatile__ ("fence rw,rw" : : : "memory");
 }
 
+#ifdef __riscv_zacas
+
+#define a_cas a_cas
+static inline int a_cas(volatile int *p, int t, int s)
+{
+	int old = t;
+	__asm__ __volatile__ (
+		"amocas.w.aqrl %0, %2, %1"
+		: "+r"(old), "+A"(*(volatile int *)p)
+		: "r"(s)
+		: "memory");
+	return old;
+}
+
+#define a_cas_p a_cas_p
+static inline void *a_cas_p(volatile void *p, void *t, void *s)
+{
+	void *old = t;
+	__asm__ __volatile__ (
+		"amocas.d.aqrl %0, %2, %1"
+		: "+r"(old), "+A"(*(void *volatile *)p)
+		: "r"(s)
+		: "memory");
+	return old;
+}
+
+#else /* Fallback to lr/sc when Zacas is not available */
+
 #define a_cas a_cas
 static inline int a_cas(volatile int *p, int t, int s)
 {
@@ -36,3 +64,5 @@ static inline void *a_cas_p(volatile void *p, void *t, void *s)
 		: "memory");
 	return old;
 }
+
+#endif /* __riscv_zacas */
-- 
2.39.5



* Re: [musl] [PATCH 0/1] riscv: Add support for Zacas in atomic operations
  2025-09-18 16:47 [musl] [PATCH 0/1] riscv: Add support for Zacas in atomic operations Pincheng Wang
  2025-09-18 16:47 ` [musl] [PATCH 1/1] riscv: add Zacas extension support for atomic CAS Pincheng Wang
@ 2025-10-16 15:37 ` Pincheng Wang
  1 sibling, 0 replies; 5+ messages in thread
From: Pincheng Wang @ 2025-10-16 15:37 UTC
  To: musl

On 2025/9/19 00:47, Pincheng Wang wrote:
> Hi all,
> 
> This patch adds support for the RISC-V Zacas (Atomic Compare-and-Swap)
> extension in musl's atomic operations for both riscv64 and riscv32.
> 
> Currently, musl implements a_cas using a
> Load-Reserved/Store-Conditional (lr/sc) loop that:
> - Requires at least four instructions (lr+bne+sc+bnez) per CAS
>    operation,
> - Contains a retry loop under contention,
> - Incurs branch penalties that may cause pipeline stalls.
> 
> Zacas introduces amocas.w.aqrl/amocas.d.aqrl instructions that perform
> CAS atomically in a single instruction, eliminating retry loops and
> conditional branches.
> 
> Due to hardware limitations, we evaluated this change under QEMU using
> both mcycle and minstret counters. The results show clear benefits:
> 
> Metric											lr/sc	Zacas	Improvement
> Instr. per CAS (50k ops average)				15.04	8.36	-44.4%
> Instr. per op (single-thread)					23.61	14.25	-39.6%
> Instr. per op (multi-thread, high contention)	528.24	251.14	-52.5%
> 
> In addition, libc.a size is reduced by ~1.2% due to removal of loop
> code.
> 
> The patch automatically falls back to the lr/sc implementation on
> systems where Zacas is not available, preserving full backward
> compatibility.
> 
> This work provides a measurable reduction in instruction count,
> execution cycles and binary size, improving scalability of
> synchronization primitives under load.
> 
> Thanks for reviewing!
> 
> Best regards,
> Pincheng Wang
> 
> 
> Pincheng Wang (1):
>    riscv: add Zacas extension support for atomic CAS
> 
>   arch/riscv32/atomic_arch.h | 17 +++++++++++++++++
>   arch/riscv64/atomic_arch.h | 30 ++++++++++++++++++++++++++++++
>   2 files changed, 47 insertions(+)
> 

Hi all,

Friendly ping regarding my earlier patch on enabling the RISC-V Zacas 
(amocas.{w,d}) path for a_cas()/a_cas_p().

Best regards,
Pincheng Wang



* Re: [musl] [PATCH 1/1] riscv: add Zacas extension support for atomic CAS
  2025-09-18 16:47 ` [musl] [PATCH 1/1] riscv: add Zacas extension support for atomic CAS Pincheng Wang
@ 2025-10-21  0:30   ` Szabolcs Nagy
  2025-10-21  3:05     ` Pincheng Wang
  0 siblings, 1 reply; 5+ messages in thread
From: Szabolcs Nagy @ 2025-10-21  0:30 UTC
  To: Pincheng Wang; +Cc: musl

* Pincheng Wang <pincheng.plct@isrc.iscas.ac.cn> [2025-09-19 00:47:20 +0800]:
> Add compile-time detection for RISC-V Zacas extension and use
> amocas.w.aqrl/amocas.d.aqrl instructions when available.
> 
> When __riscv_zacas is defined, a_cas() and a_cas_p() use single amocas
> instructions instead of lr/sc loops. Falls back to existing lr/sc
> implementation when Zacas is not available.

is this a supported extension? are there users?
(implemented on existing cpus with released toolchain versions)

what cflags enable the extension? (how to test)

i can't review if the instructions have the right semantics,
but the code looks ok, with some comments below.

> 
> Signed-off-by: Pincheng Wang <pincheng.plct@isrc.iscas.ac.cn>
> ---
>  arch/riscv32/atomic_arch.h | 17 +++++++++++++++++
>  arch/riscv64/atomic_arch.h | 30 ++++++++++++++++++++++++++++++
>  2 files changed, 47 insertions(+)
> 
> diff --git a/arch/riscv32/atomic_arch.h b/arch/riscv32/atomic_arch.h
> index 4d418f63..64ef05b7 100644
> --- a/arch/riscv32/atomic_arch.h
> +++ b/arch/riscv32/atomic_arch.h
>  }
> +#ifdef __riscv_zacas

newline before #ifdef

> +#else /* Fallback to lr/sc when Zacas is not available */
...
> +#endif /* __riscv_zacas */
> \ No newline at end of file

newline after endif

i think ifdef comments are not needed in such a simple file.

> +++ b/arch/riscv64/atomic_arch.h
> @@ -4,6 +4,34 @@ static inline void a_barrier()
>  	__asm__ __volatile__ ("fence rw,rw" : : : "memory");
>  }
>  
> +#ifdef __riscv_zacas
> +
> +#define a_cas a_cas
> +static inline int a_cas(volatile int *p, int t, int s)
> +{
> +	int old = t;
> +	__asm__ __volatile__ (
> +		"amocas.w.aqrl %0, %2, %1"
> +		: "+r"(old), "+A"(*(volatile int *)p)
> +		: "r"(s)
> +		: "memory");

existing cas does not use +A constraint (check git log
why and ensure this is ok).

the ptr cast should not be needed. (same for rv32)


* Re: [musl] [PATCH 1/1] riscv: add Zacas extension support for atomic CAS
  2025-10-21  0:30   ` Szabolcs Nagy
@ 2025-10-21  3:05     ` Pincheng Wang
  0 siblings, 0 replies; 5+ messages in thread
From: Pincheng Wang @ 2025-10-21  3:05 UTC
  To: musl, nsz

On 2025/10/21 08:30, Szabolcs Nagy wrote:
> * Pincheng Wang <pincheng.plct@isrc.iscas.ac.cn> [2025-09-19 00:47:20 +0800]:
>> Add compile-time detection for RISC-V Zacas extension and use
>> amocas.w.aqrl/amocas.d.aqrl instructions when available.
>>
>> When __riscv_zacas is defined, a_cas() and a_cas_p() use single amocas
>> instructions instead of lr/sc loops. Falls back to existing lr/sc
>> implementation when Zacas is not available.
> 
> is this a supported extension? are there users?
> (implemented on existing cpus with released toolchain versions)

The Zacas extension was ratified in November 2023. CPUs such as the 
XuanTie C930 already support this extension [1].

For toolchain support, GCC added Zacas extension support in commit 
11c2453 ("RISC-V: Add basic support for the Zacas extension") on Jul 30, 
2024.

Moreover, the RVA23 profile document [2] lists Zacas as a development 
option and states that it "is intended to become mandatory in the future 
RVA profile", suggesting broader adoption in the near future, 
particularly in high-performance computing domains such as PCs and 
servers.

[1] https://www.xrvm.com/product/xuantie/C930
[2] https://docs.riscv.org/reference/profiles/rva23/_attachments/rva23-profile.pdf

> what cflags enable the extension? (how to test) 

To enable this extension, use "-march=rv{32,64}gc_zacas" CFLAGS. In my 
development environment, I'm using riscv64-unknown-linux-gnu-gcc 
(version 15.1.0, commit g1b306039ac4) with the following configure command:
`CC=riscv64-unknown-linux-gnu-gcc CFLAGS="-march=rv64gc_zacas" 
./configure --prefix=/home/wpcwzy/sysroot-rv64`

> i can't review if the instructions have the right semantics,
> but the code looks ok, with some comments below.
> 
>>
>> Signed-off-by: Pincheng Wang <pincheng.plct@isrc.iscas.ac.cn>
>> ---
>>   arch/riscv32/atomic_arch.h | 17 +++++++++++++++++
>>   arch/riscv64/atomic_arch.h | 30 ++++++++++++++++++++++++++++++
>>   2 files changed, 47 insertions(+)
>>
>> diff --git a/arch/riscv32/atomic_arch.h b/arch/riscv32/atomic_arch.h
>> index 4d418f63..64ef05b7 100644
>> --- a/arch/riscv32/atomic_arch.h
>> +++ b/arch/riscv32/atomic_arch.h
>>   }
>> +#ifdef __riscv_zacas
> 
> newline before #ifdef
> 
>> +#else /* Fallback to lr/sc when Zacas is not available */
> ...
>> +#endif /* __riscv_zacas */
>> \ No newline at end of file
> 
> newline after endif
> 
> i think ifdef comments are not needed in such a simple file.
> 

Thank you for the formatting suggestions. I'll address these in the next 
revision.

>> +++ b/arch/riscv64/atomic_arch.h
>> @@ -4,6 +4,34 @@ static inline void a_barrier()
>>   	__asm__ __volatile__ ("fence rw,rw" : : : "memory");
>>   }
>>   
>> +#ifdef __riscv_zacas
>> +
>> +#define a_cas a_cas
>> +static inline int a_cas(volatile int *p, int t, int s)
>> +{
>> +	int old = t;
>> +	__asm__ __volatile__ (
>> +		"amocas.w.aqrl %0, %2, %1"
>> +		: "+r"(old), "+A"(*(volatile int *)p)
>> +		: "r"(s)
>> +		: "memory");
> 
> existing cas does not use +A constraint (check git log
> why and ensure this is ok).
> 
> the ptr cast should not be needed. (same for rv32)

Thanks, I will review the git history and adjust the constraint and cast 
in the next patch revision.
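
One possible shape for that adjustment, shown only as a sketch and not as 
the planned revision: pass the pointer in a register operand instead of 
using "+A", drop the cast, and rely on the "memory" clobber to order the 
access. The name a_cas_sketch is hypothetical, and the non-Zacas branch 
uses a compiler builtin purely so the snippet builds on other hosts; it is 
not the lr/sc fallback from the patch.

```c
#include <assert.h>

static inline int a_cas_sketch(volatile int *p, int t, int s)
{
	int old = t;
#ifdef __riscv_zacas
	/* rd (%0) carries the compare value in and the loaded value out;
	 * the address is a plain register operand, with no cast on p. */
	__asm__ __volatile__ (
		"amocas.w.aqrl %0, %2, (%1)"
		: "+r"(old)
		: "r"(p), "r"(s)
		: "memory");
#else
	/* Stand-in for hosts without Zacas (assumption, not musl code). */
	__atomic_compare_exchange_n(p, &old, s, 0,
		__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
#endif
	return old;
}
```

Whether a register operand or "+A" is the right choice still depends on 
the reasoning recorded in the git history mentioned above.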

Best regards,
Pincheng Wang

