Hi! I have revised the memset implementation. It now contains two versions, one using the base instruction set and one using the vector instruction set, and uses macro definitions to determine whether the RISC-V hardware supports the vector extension. The reason for keeping two versions is that the base-instruction-set memset can run on every RISC-V CPU, while the vector version can accelerate hardware that supports the vector extension. The RISC-V vector extension is now frozen, so the instruction set is stable, and other open-source libraries such as OpenSSL and OpenCV already provide RISC-V vector optimizations.

I tested a range of data sizes and compared the performance of memset implemented in C, with the base instruction set, and with the vector instruction set. The test case is test_memset.c.

The comparison between the C implementation and the base-instruction-set assembly implementation was run on a SiFive chip (RISC-V SiFive U74 dual-core 64-bit RV64GC ISA chip platform). The results are as follows: below 16 bytes, the C implementation performs better; from 16 bytes to 32768 bytes, the base-instruction implementation performs better, with an average improvement of over 30%; above 32768 bytes, the two perform about the same.
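A base-instruction memset is usually a word-at-a-time fill: replicate the byte across a 64-bit register, store whole words, and handle the unaligned head and the tail with byte stores. Assuming the patch follows that general technique, the idea can be sketched in C as below. The name memset_words and the 16-byte small-size cutoff are hypothetical choices for illustration; this is not the patch's code and not musl's current C memset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: word-at-a-time fill, for illustration only. */
void *memset_words(void *dest, int c, size_t n)
{
    /* may_alias (a GCC/Clang extension) makes the 64-bit store below
     * legal regardless of the destination object's declared type. */
    typedef uint64_t __attribute__((__may_alias__)) word_t;

    unsigned char *s = dest;
    unsigned char b = (unsigned char)c;

    /* Very short fills: a plain byte loop avoids the alignment setup. */
    if (n < 16) {
        while (n--)
            *s++ = b;
        return dest;
    }

    /* Byte stores until s is 8-byte aligned (at most 7, so n stays >= 9). */
    while ((uintptr_t)s & 7) {
        *s++ = b;
        n--;
    }

    /* Replicate the byte into all eight bytes of a 64-bit word. */
    uint64_t w = (uint64_t)b * 0x0101010101010101ULL;

    /* Aligned 64-bit stores (one sd per iteration on RV64). */
    for (; n >= 8; n -= 8, s += 8)
        *(word_t *)s = w;

    /* Remaining 0..7 tail bytes. */
    while (n--)
        *s++ = b;
    return dest;
}
```

It is called exactly like memset: memset_words(buf, 0, len) zeroes len bytes of buf and returns buf.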
--------------------------------------------------------------------------------
length (bytes)   C implementation (s)   Base instruction implementation (s)
--------------------------------------------------------------------------------
4                0.00000352             0.000004001
8                0.000004001            0.000005441
16               0.000006241            0.00000464
32               0.00000752             0.00000448
64               0.000008481            0.000005281
128              0.000009281            0.000005921
256              0.000011201            0.000007041
512              0.000014402            0.000010401
1024             0.000022563            0.000016962
2048             0.000039205            0.000030724
4096             0.000072809            0.000057768
8192             0.000153459            0.000132793
16384            0.000297157            0.000244992
32768            0.000784416            0.000735298
65536            0.005005252            0.004987382
131072           0.011286821            0.011256855
262144           0.023295169            0.022932165
524288           0.04647724             0.046084839
1048576          0.094114058            0.0932383
--------------------------------------------------------------------------------

Because I do not have a chip that supports the vector extension, I compared the C and vector implementations of memset on the Spike simulator, which should still serve as a useful reference. The vector implementation is clearly faster than the C implementation, with an average performance improvement of over 200%.
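test_memset.c is not shown here, so as a hypothetical sketch of how per-length timings and "improvement" percentages like these could be produced, the helpers below time repeated calls with POSIX clock_gettime(CLOCK_MONOTONIC) and compute improvement as t_ref / t_new - 1. The names time_fill and improvement, and this definition of improvement, are assumptions rather than the author's actual method.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Signature shared by memset and any candidate replacement. */
typedef void *(*memset_fn)(void *, int, size_t);

/* Hypothetical helper: average wall-clock seconds per call of
 * fn(buf, value, len), measured over `reps` back-to-back calls. */
static double time_fill(memset_fn fn, unsigned char *buf, size_t len, int reps)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < reps; i++)
        fn(buf, i & 0xff, len); /* vary the fill value between calls */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs / reps;
}

/* Assumed definition of "improvement": how much longer the reference
 * takes than the candidate, e.g. 0.30 means 30%. */
static double improvement(double t_ref, double t_new)
{
    return t_ref / t_new - 1.0;
}
```

For each length, one would call time_fill(memset, buf, len, reps) and time_fill on the candidate, then pass both times to improvement(). Applied to the 16 to 32768 byte rows of the SiFive table, this definition averages roughly 37%, consistent with the "over 30%" figure.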
--------------------------------------------------------------------------------
length (bytes)   C implementation (s)   Vector instruction implementation (s)
--------------------------------------------------------------------------------
4                0.000002839            0.000002939
8                0.000003239            0.000002939
16               0.000005239            0.000002939
32               0.000007039            0.000002939
64               0.000008039            0.000002939
128              0.000009239            0.000002939
256              0.000011639            0.000003539
512              0.000016439            0.000004739
1024             0.000026039            0.000007139
2048             0.000045239            0.000011939
4096             0.000083639            0.000021539
8192             0.000162298            0.000042598
16384            0.000317757            0.000082857
32768            0.000628675            0.000163375
65536            0.001250511            0.000324411
131072           0.002494183            0.000646483
262144           0.004981527            0.001290627
524288           0.009956215            0.002578915
1048576          0.019905591            0.005155491
--------------------------------------------------------------------------------

So I hope that a more efficient assembly implementation of memset for riscv64, one that works across different RISC-V CPUs, can be integrated into musl.

Fei Zhang

> -----Original Message-----
> From: "Pedro Falcato"
> Sent: 2023-04-11 17:48:31 (Tuesday)
> To: musl@lists.openwall.com
> Cc:
> Subject: Re: [musl] memset_riscv64
>
> On Tue, Apr 11, 2023 at 3:18 AM 张飞 wrote:
> >
> > Hello,
> >
> > Currently, there is no assembly implementation of the memset function for riscv64 in Musl.
> > This patch is a riscv64 assembly implementation of the memset function, which is implemented using the basic instruction set and
> > has better performance than the c language implementation in Musl. I hope it can be integrated into Musl.
>
> Hi!
>
> Do you have performance measurements here? What exactly is the difference?
> As far as I know, no one is actively optimizing on riscv yet, only
> some movements in the upstream kernel (to prepare for vector extension
> stuff, unaligned loads/stores) and the corresponding glibc patches.
> Mainly because it's still super unobtanium and in very early stages,
> so optimizing is very hard.
>
> So what hardware did you use? Is there a large gain here? Given that
> your memset looks so simple, wouldn't it just be easier to write this
> in C?
>
> --
> Pedro