From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 19 Apr 2023 11:02:10 +0200
From: Szabolcs Nagy <nsz@port70.net>
To: 张飞 <zhangfei@nj.iscas.ac.cn>
Cc: musl@lists.openwall.com
Message-ID: <20230419090210.GR3630668@port70.net>
Mail-Followup-To: 张飞 <zhangfei@nj.iscas.ac.cn>,
	musl@lists.openwall.com
References: <7ab4e713.9fae.1876e1ac122.Coremail.zhangfei@nj.iscas.ac.cn>
 <CAKbZUD2Rfd9Lg37GY+N_bMeJOQJ=84yZ=SW9+vHMRdByU0CZ+A@mail.gmail.com>
 <658c32ae.2348c.187980096c9.Coremail.zhangfei@nj.iscas.ac.cn>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
In-Reply-To: <658c32ae.2348c.187980096c9.Coremail.zhangfei@nj.iscas.ac.cn>
Subject: Re: Re: [musl] memset_riscv64

* 张飞 <zhangfei@nj.iscas.ac.cn> [2023-04-19 13:33:08 +0800]:
> --------------------------------------------------------------------------------
> length(byte)  C language implementation(s)   Basic instruction implementation(s)
> --------------------------------------------------------------------------------
> 4             0.00000352                     0.000004001
> 8             0.000004001                    0.000005441
> 16            0.000006241                    0.00000464
> 32            0.00000752                     0.00000448
> 64            0.000008481                    0.000005281
> 128           0.000009281                    0.000005921
> 256           0.000011201                    0.000007041

i don't think these numbers can be trusted.

> #include <stdio.h>
> #include <sys/mman.h>
> #include <string.h>
> #include <stdlib.h>
> #include <time.h>
>
> #define DATA_SIZE 5*1024*1024
> #define MAX_LEN 1*1024*1024
> #define OFFSET 0
> #define LOOP_TIMES 100
> int main(){
>    char *str1,*src1;
>    str1 = (char *)mmap(NULL, DATA_SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
>
>    printf("function test start\n");
>
>    src1 = str1+OFFSET;
>    struct timespec tv0,tv;
>    for(int len=2; len<=MAX_LEN; len*=2){
>       clock_gettime(CLOCK_REALTIME, &tv0);
>       for(int k=0; k<LOOP_TIMES; k++){
>           memset(src1, 'a', len);
>       }
>       clock_gettime(CLOCK_REALTIME, &tv);
>       tv.tv_sec -= tv0.tv_sec;
>       if ((tv.tv_nsec -= tv0.tv_nsec) < 0) {
>           tv.tv_nsec += 1000000000;
>           tv.tv_sec--;
>       }
>       printf("len: %d  time: %ld.%.9ld\n",len, (long)tv.tv_sec, (long)tv.tv_nsec);


this repeatedly calls memset with the exact same len, alignment and
value, so it favours branch-heavy code since those branches are
always correctly predicted.
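
one way to avoid that is to precompute a sequence of varying offsets
and lengths and replay it inside the timed region. a minimal sketch,
assuming a small xorshift prng is good enough here; next_rand,
make_calls and run_calls are made-up names, not part of musl or of
the quoted test:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* one memset call to replay: offset into the buffer and length */
struct call {
    size_t off;
    size_t len;
};

/* hypothetical helper: xorshift32 prng, *state must be nonzero */
static uint32_t next_rand(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* fill c[0..n) with offsets in 0..15 and lengths in 1..max_len */
static void make_calls(struct call *c, size_t n, size_t max_len, uint32_t seed)
{
    for (size_t i = 0; i < n; i++) {
        c[i].off = next_rand(&seed) % 16;
        c[i].len = 1 + next_rand(&seed) % max_len;
    }
}

/* replay the calls; the second loop is what would sit between the
   two clock reads */
static void run_calls(unsigned char *buf, size_t buf_size,
                      const struct call *c, size_t n)
{
    /* bounds are checked up front so the check is not in the replay loop */
    for (size_t i = 0; i < n; i++)
        assert(c[i].off + c[i].len <= buf_size);

    for (size_t i = 0; i < n; i++)
        memset(buf + c[i].off, 'a', c[i].len);
}
```

the 0..15 offset range and the uniform length distribution are
arbitrary choices for illustration; a short table may still be learnt
by the predictor, so the table probably has to be fairly long.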

but even if you care about a branch-predicted microbenchmark, you
made a single measurement per size, so you cannot tell how much the
time varies. you should do several measurements and take the min so
that noise from system effects and cpu internal state is reduced
(that state also needs to be warmed up first). LOOP_TIMES should
likely be bigger too for small sizes to get reliable timing.

benchmarking string functions is tricky, especially for a target arch
with many implementations.

>    }
>
>    printf("function test end\n");
>    munmap(str1,DATA_SIZE);
>    return 0;
> }
>