From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <35488ed2-3c30-4bc4-89ab-70f30dee9890@isrc.iscas.ac.cn>
Date: Fri, 26 Sep 2025 08:31:53 +0800
From: Pincheng Wang
To: musl@lists.openwall.com, Yao Zi
Reply-To: musl@lists.openwall.com
References: <20250925131557.8907-1-pincheng.plct@isrc.iscas.ac.cn> <20250925131557.8907-2-pincheng.plct@isrc.iscas.ac.cn>
Subject: Re: [musl] [PATCH 1/1] riscv64: optimize memset implementation with vector extension

On 2025/9/25 23:30, Yao Zi wrote:
> On Thu, Sep 25, 2025 at 09:15:57PM +0800, Pincheng Wang wrote:
>> Use head-tail filling strategy for small sizes and dynamic vsetvli
>> approach for vector loops to reduce branch overhead. Add conditional
>> compilation to fall back to scalar implementation when __riscv_vector is
>> not available.
>>
>> Signed-off-by: Pincheng Wang
>> ---
>>  src/string/riscv64/memset.S | 101 ++++++++++++++++++++++++++++++++++++
>>  1 file changed, 101 insertions(+)
>>  create mode 100644 src/string/riscv64/memset.S
>>
>> diff --git a/src/string/riscv64/memset.S b/src/string/riscv64/memset.S
>> new file mode 100644
>> index 00000000..5fc6ee14
>> --- /dev/null
>> +++ b/src/string/riscv64/memset.S
>> @@ -0,0 +1,101 @@
>> +#ifdef __riscv_vector
>
> I don't think musl is built with V extension specified in march on
> RISC-V platforms by default. Does this patch only benefit builds that
> "-march=rv64gcv" is manually specified in CFLAGS?
>
> Furthermore, having RVV available at compilation-time doesn't mean it's
> available at runtime. This effectively raises the baseline for RISC-V
> platforms from RV64GC (or even lower) to RV64GCV, where the latter isn't
> implied by the mostly-adapted RVA20 profile.
>
> Best regards,
> Yao Zi

Hi, Yao

Thank you for your review.

This patch currently only takes effect when `-march=rv64gcv` is manually
specified in CFLAGS. I also understand your concern about enabling the
vector implementation purely through compile-time conditionals.

I am investigating a runtime detection and dispatch mechanism to select
the appropriate implementation based on actual hardware support. If I
make progress on this and verify it works as expected, I will update the
approach in a v2 patch.

Best regards,
Pincheng Wang
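
P.S. To make the runtime detection and dispatch idea concrete, here is a rough C sketch. It is an illustration, not the planned v2. It assumes Linux, where AT_HWCAP on RISC-V carries one bit per single-letter ISA extension, so 'V' is bit ('V' - 'A'). The names HWCAP_RISCV_V, memset_scalar, memset_vector, select_memset and dispatch_memset are hypothetical, and memset_vector is only a stand-in for the assembly routine from the patch.

```c
#include <stddef.h>
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP (Linux) */

/* Assumption: Linux/RISC-V AT_HWCAP layout, one bit per single-letter
 * extension. On other architectures this bit means something else. */
#define HWCAP_RISCV_V (1UL << ('V' - 'A'))

typedef void *(*memset_fn)(void *, int, size_t);

/* Hypothetical scalar fallback: a plain byte loop. */
static void *memset_scalar(void *dest, int c, size_t n)
{
    unsigned char *p = dest;
    while (n--)
        *p++ = (unsigned char)c;
    return dest;
}

/* Hypothetical stand-in for the RVV assembly routine. It reuses the
 * scalar loop so that the sketch builds without -march=rv64gcv. */
static void *memset_vector(void *dest, int c, size_t n)
{
    return memset_scalar(dest, c, n);
}

/* Pick an implementation from a hwcap word. Taking the word as a
 * parameter keeps the selection logic testable on any machine. */
static memset_fn select_memset(unsigned long hwcap)
{
    return (hwcap & HWCAP_RISCV_V) ? memset_vector : memset_scalar;
}

/* Cached choice. This lazy initialisation is not thread-safe; a real
 * libc would more likely resolve the pointer once during startup. */
static memset_fn memset_impl;

void *dispatch_memset(void *dest, int c, size_t n)
{
    if (!memset_impl)
        memset_impl = select_memset(getauxval(AT_HWCAP));
    return memset_impl(dest, c, n);
}
```

With this shape the vector path is only taken when the kernel reports V at runtime, so a binary built with the vector code included would still run on RV64GC hardware. The cost is one indirect call per memset, which may matter for very small sizes.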