From: Szabolcs Nagy
To: 张飞
Cc: musl@lists.openwall.com
Date: Wed, 19 Apr 2023 11:02:10 +0200
Subject: Re: Re: [musl] memset_riscv64
Message-ID: <20230419090210.GR3630668@port70.net>

* 张飞 [2023-04-19 13:33:08 +0800]:
> -------------------------------------------------------------------------------------
> length (bytes)   C language implementation (s)   Basic instruction implementation (s)
> -------------------------------------------------------------------------------------
> 4                0.00000352                      0.000004001
> 8                0.000004001                     0.000005441
> 16               0.000006241                     0.00000464
> 32               0.00000752                      0.00000448
> 64               0.000008481                     0.000005281
> 128              0.000009281                     0.000005921
> 256              0.000011201                     0.000007041

i don't think these numbers can be trusted.
> #include <stdio.h>
> #include <string.h>
> #include <sys/mman.h>
> #include <time.h>
>
> #define DATA_SIZE 5*1024*1024
> #define MAX_LEN 1*1024*1024
> #define OFFSET 0
> #define LOOP_TIMES 100
> int main(){
>     char *str1,*src1;
>     str1 = (char *)mmap(NULL, DATA_SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
>
>     printf("function test start\n");
>
>     src1 = str1+OFFSET;
>     struct timespec tv0,tv;
>     for(int len=2; len<=MAX_LEN; len*=2){
>         clock_gettime(CLOCK_REALTIME, &tv0);
>         for(int k=0; k<LOOP_TIMES; k++){
>             memset(src1, 'a', len);
>         }
>         clock_gettime(CLOCK_REALTIME, &tv);
>         tv.tv_sec -= tv0.tv_sec;
>         if ((tv.tv_nsec -= tv0.tv_nsec) < 0) {
>             tv.tv_nsec += 1000000000;
>             tv.tv_sec--;
>         }
>         printf("len: %d time: %ld.%.9ld\n",len, (long)tv.tv_sec, (long)tv.tv_nsec);

this repeatedly calls memset with exactly the same len, alignment and value, so it favours branch-heavy code, since those branches are always correctly predicted.

but even if you care about a branch-predicted microbenchmark, you made a single measurement per size, so you cannot tell how much the time varies. you should do several measurements and take the min, so that noise from system effects and cpu internal state is reduced (that state also needs to be warmed up first). LOOP_TIMES likely needs to be bigger for small sizes, too, to get reliable timing.

benchmarking string functions is tricky, especially for a target arch with many implementations.

>     }
>
>     printf("function test end\n");
>     munmap(str1,DATA_SIZE);
>     return 0;
> }
>
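A minimal sketch of the measurement approach suggested above (warm up first, time several passes, keep the minimum), assuming a POSIX system that provides clock_gettime with CLOCK_MONOTONIC. The helper names make_inputs, run_once and bench_min_ns are hypothetical and come neither from musl nor from the quoted benchmark. Drawing sizes and offsets from a pseudo-random table is one assumed way to avoid replaying a single (len, alignment) pair; the reply itself does not prescribe it.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in nanoseconds; unlike CLOCK_REALTIME it
   does not jump when the system clock is adjusted. */
static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Hypothetical helper: fill lens[] with pseudo-random sizes in [1, max_len]
   and offs[] with offsets in [0, 63], using a fixed-seed LCG so runs are
   repeatable but one (len, alignment) pair is not replayed on every call. */
static void make_inputs(size_t *lens, size_t *offs, size_t n,
                        size_t max_len, uint32_t seed)
{
    assert(max_len > 0);
    for (size_t i = 0; i < n; i++) {
        seed = seed * 1103515245u + 12345u;
        lens[i] = 1 + (seed >> 8) % max_len;
        seed = seed * 1103515245u + 12345u;
        offs[i] = (seed >> 8) % 64;
    }
}

/* One pass over the input table. buf must hold at least max_len + 64 bytes. */
static void run_once(char *buf, const size_t *lens, const size_t *offs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        memset(buf + offs[i], 'a', lens[i]);
}

/* Hypothetical helper: one untimed warm-up pass, then `reps` timed passes;
   returns the fastest pass in nanoseconds per memset call. */
static double bench_min_ns(char *buf, const size_t *lens, const size_t *offs,
                           size_t n, int reps)
{
    assert(n > 0 && reps > 0);
    run_once(buf, lens, offs, n); /* warm caches, TLB and branch predictors */
    double best = 0.0;
    for (int r = 0; r < reps; r++) {
        double t0 = now_ns();
        run_once(buf, lens, offs, n);
        double dt = now_ns() - t0;
        if (r == 0 || dt < best)
            best = dt; /* keep the minimum: the pass least disturbed by noise */
    }
    return best / (double)n;
}
```

Filling lens[] with one value and offs[] with zeros reproduces the original fixed-size, branch-predicted case, but with warm-up and min-of-N timing. For small sizes, a large n keeps each timed pass well above the clock resolution, which addresses the same concern as raising LOOP_TIMES.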