From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-3.3 required=5.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED,RCVD_IN_MSPIKE_H5, RCVD_IN_MSPIKE_WL,RCVD_IN_ZEN_BLOCKED_OPENDNS autolearn=ham autolearn_force=no version=3.4.4 Received: from second.openwall.net (second.openwall.net [193.110.157.125]) by inbox.vuxu.org (Postfix) with SMTP id B636D200A9 for ; Thu, 25 Sep 2025 15:16:52 +0200 (CEST) Received: (qmail 7840 invoked by uid 550); 25 Sep 2025 13:16:40 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Reply-To: musl@lists.openwall.com x-ms-reactions: disallow Received: (qmail 7787 invoked from network); 25 Sep 2025 13:16:39 -0000 From: Pincheng Wang To: musl@lists.openwall.com Cc: pincheng.plct@isrc.iscas.ac.cn Date: Thu, 25 Sep 2025 21:15:57 +0800 Message-Id: <20250925131557.8907-2-pincheng.plct@isrc.iscas.ac.cn> X-Mailer: git-send-email 2.39.5 In-Reply-To: <20250925131557.8907-1-pincheng.plct@isrc.iscas.ac.cn> References: <20250925131557.8907-1-pincheng.plct@isrc.iscas.ac.cn> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-CM-TRANSID:qwCowACX7Z+mQNVojvn3BQ--.2885S3 X-Coremail-Antispam: 1UD129KBjvJXoW7Cr48JF47XrW8Aw4UZr1UKFg_yoW8KFy7pr 1Yk34akr43tr97urWfJw13JFs8K3yFqr15WanIva4jyryxJF4DuF9xJ3yjqFWxtr1qkw4a vF48Zryfu3ykAr7anT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUU9214x267AKxVWUJVW8JwAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_Jr4l82xGYIkIc2 x26xkF7I0E14v26r18M28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2z4x0Y4vE 2Ix0cI8IcVAFwI0_Jr0_JF4l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Jr0_Gr1l84ACjc xK6I8E87Iv67AKxVWUJVW8JwA2z4x0Y4vEx4A2jsIEc7CjxVAFwI0_Jr0_Gr1le2I262IY c4CY6c8Ij28IcVAaY2xG8wAqx4xG64xvF2IEw4CE5I8CrVC2j2WlYx0E2Ix0cI8IcVAFwI 0_Jr0_Jr4lYx0Ex4A2jsIE14v26r1j6r4UMcvjeVCFs4IE7xkEbVWUJVW8JwACjcxG0xvY 0x0EwIxGrwACjI8F5VA0II8E6IAqYI8I648v4I1l42xK82IYc2Ij64vIr41l4I8I3I0E4I kC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s026x8GjcxK67AKxVWUGVWU WwC2zVAF1VAY17CE14v26r1Y6r17MIIYrxkI7VAKI48JMIIF0xvE2Ix0cI8IcVAFwI0_Jr 0_JF4lIxAIcVC0I7IYx2IY6xkF7I0E14v26r1j6r4UMIIF0xvE42xK8VAvwI8IcIk0rVWU JVWUCwCI42IY6I8E87Iv67AKxVWUJVW8JwCI42IY6I8E87Iv6xkF7I0E14v26r1j6r4UYx BIdaVFxhVjvjDU0xZFpf9x0JUqfO7UUUUU= X-Originating-IP: [36.148.251.191] X-CM-SenderInfo: pslquxhhqjh1xofwqxxvufhxpvfd2hldfou0/ Subject: [musl] [PATCH 1/1] riscv64: optimize memset implementation with vector extension Use head-tail filling strategy for small sizes and dynamic vsetvli approach for vector loops to reduce branch overhead. Add conditional compilation to fall back to scalar implementation when __riscv_vector is not available. Signed-off-by: Pincheng Wang --- src/string/riscv64/memset.S | 101 ++++++++++++++++++++++++++++++++++++ 1 file changed, 101 insertions(+) create mode 100644 src/string/riscv64/memset.S diff --git a/src/string/riscv64/memset.S b/src/string/riscv64/memset.S new file mode 100644 index 00000000..5fc6ee14 --- /dev/null +++ b/src/string/riscv64/memset.S @@ -0,0 +1,101 @@ +#ifdef __riscv_vector + + .text + .global memset +/* void *memset(void *s, int c, size_t n) + * a0 = s (dest), a1 = c (fill byte), a2 = n (size) + * Returns a0. + */ +memset: + mv t0, a0 /* running dst; keep a0 as return */ + beqz a2, .Ldone /* n == 0 → return */ + + li t3, 8 + bltu a2, t3, .Lsmall /* small-size fast path */ + + /* Broadcast fill byte once. */ + vsetvli t1, zero, e8, m8, ta, ma + vmv.v.x v0, a1 + +.Lbulk: + vsetvli t1, a2, e8, m8, ta, ma /* t1 = vl (bytes) */ + vse8.v v0, (t0) + add t0, t0, t1 + sub a2, a2, t1 + bnez a2, .Lbulk + j .Ldone + +/* Small-size fast path (< 8). + * Head-tail fills to minimize branches and avoid vsetvli overhead. + */ +.Lsmall: + /* Fill s[0], s[n-1] */ + sb a1, 0(t0) + add t2, t0, a2 + sb a1, -1(t2) + li t3, 2 + bleu a2, t3, .Ldone + + /* Fill s[1], s[2], s[n-2], s[n-3] */ + sb a1, 1(t0) + sb a1, 2(t0) + sb a1, -2(t2) + sb a1, -3(t2) + li t3, 6 + bleu a2, t3, .Ldone + + /* Fill s[3], s[n-4] */ + sb a1, 3(t0) + sb a1, -4(t2) + /* fallthrough for n <= 8 */ + +.Ldone: + ret +.size memset, .-memset + +#else /* !__riscv_vector */ + + .text + .global memset +/* Fallback scalar memset + * void *memset(void *s, int c, size_t n) + */ +memset: + mv t0, a0 /* running dst; keep a0 as return */ + beqz a2, .Ldone + + andi a1, a1, 0xff /* use low 8 bits only */ + + /* Head-tail strategy for small n */ + sb a1, 0(t0) /* s[0] */ + add t2, t0, a2 + sb a1, -1(t2) /* s[n-1] */ + li t3, 2 + bleu a2, t3, .Ldone + + sb a1, 1(t0) + sb a1, 2(t0) + sb a1, -2(t2) + sb a1, -3(t2) + li t3, 6 + bleu a2, t3, .Ldone + + sb a1, 3(t0) + sb a1, -4(t2) + li t3, 8 + bleu a2, t3, .Ldone + + /* Linear fill middle region [4, n-4) */ + addi t4, t0, 4 + addi t5, t2, -4 +.Lloop: + bgeu t4, t5, .Ldone + sb a1, 0(t4) + addi t4, t4, 1 + j .Lloop + +.Ldone: + ret +.size memset, .-memset + +#endif -- 2.39.5