mailing list of musl libc
From: 张飞 <zhangfei@nj.iscas.ac.cn>
To: "Szabolcs Nagy" <nsz@port70.net>
Cc: musl@lists.openwall.com
Subject: Re: Re: Re: [musl] memset_riscv64
Date: Thu, 20 Apr 2023 16:17:10 +0800 (GMT+08:00)
Message-ID: <4deb3986.247e1.1879dbd21b4.Coremail.zhangfei@nj.iscas.ac.cn>
In-Reply-To: <20230419090210.GR3630668@port70.net>

[-- Attachment #1: Type: text/plain, Size: 7671 bytes --]

Hi!
Following your suggestions, I used string.c from musl's benchmark suite (libc-bench) as a
reference and modified the test cases. BUFLEN is a fixed value in strlen.c, so in my own
test case I replaced it with a variable length that is passed to memset as a parameter.
I raised LOOP_TIMES to 500, sorted the 500 running times, and recorded only the middle
300 of them.
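
The selection step described above (sort the samples, discard the extremes, keep the
middle) could look roughly like the following sketch. The names cmp_double and
trimmed_sum are hypothetical and are not taken from libc-bench or from the attached
test. With 500 samples and trim set to 100, it sums the middle 300.

```c
#include <stddef.h>
#include <stdlib.h>

/* Order two doubles for qsort (hypothetical helper). */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sort samples in place and return the sum of the middle n - 2*trim values.
 * Returns 0.0 when trimming would leave nothing to sum. */
double trimmed_sum(double *samples, size_t n, size_t trim)
{
    if (n <= 2 * trim)
        return 0.0;
    qsort(samples, n, sizeof samples[0], cmp_double);

    double sum = 0.0;
    for (size_t i = trim; i < n - trim; i++)
        sum += samples[i];
    return sum;
}
```

Dropping both tails is one way to limit the effect of outliers such as interrupts or
cold caches. Dividing the result by n - 2*trim would give a trimmed mean instead of a sum.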

I ran the two programs alternately on the SiFive chip, three times each. The results
are shown below.
                             First run result
--------------------------------------------------------------------------------
length(byte)  C language implementation(s)   Basic instruction implementation(s)
--------------------------------------------------------------------------------
100                 0.002208102                     0.002304056
200                 0.005053208                     0.004629598
400                 0.008666684                     0.007739176
800                 0.014065196                     0.012372702
1600                0.023377685                     0.020090966
3200                0.040221849                     0.034059631
6400                0.072095377                     0.060028906
12800               0.134040475                     0.110039387
25600               0.257426806                     0.210710952
51200               1.173755160                     1.121833227
102400              3.693170402                     3.637194098
204800              8.919975455                     8.865504460
409600             19.410922418                    19.360956493
--------------------------------------------------------------------------------

                             Second run result 
--------------------------------------------------------------------------------
length(byte)  C language implementation(s)   Basic instruction implementation(s)
--------------------------------------------------------------------------------
100                 0.002208109                     0.002293857
200                 0.005057374                     0.004640669
400                 0.008674218                     0.007760795
800                 0.014068582                     0.012417084
1600                0.023381095                     0.020124496
3200                0.040225138                     0.034093181
6400                0.072098744                     0.060069574
12800               0.134043954                     0.110088141
25600               0.256453187                     0.208578633
51200               1.166602505                     1.118972796
102400              3.684957231                     3.635116808
204800              8.916302592                     8.861590734
409600             19.411057216                    19.358777670
--------------------------------------------------------------------------------

                             Third run result 
--------------------------------------------------------------------------------
length(byte)  C language implementation(s)   Basic instruction implementation(s)
--------------------------------------------------------------------------------
100                 0.002208111                     0.002293227
200                 0.005056101                     0.004628539
400                 0.008677756                     0.007748687
800                 0.014085242                     0.012404443
1600                0.023397782                     0.020115710
3200                0.040242985                     0.034084435
6400                0.072116665                     0.060063767
12800               0.134060262                     0.110082427
25600               0.257865186                     0.209101754
51200               1.174257177                     1.117753408
102400              3.696518162                     3.635417503
204800              8.929357747                     8.858765915
409600             19.426520562                    19.356515671
--------------------------------------------------------------------------------

In these results, memset implemented in assembly with the basic instruction set runs
faster than the C implementation at every length except 100 bytes.
Are these test results convincing?


> -----Original Message-----
> From: "Szabolcs Nagy" <nsz@port70.net>
> Sent: 2023-04-19 17:02:10 (Wednesday)
> To: "张飞" <zhangfei@nj.iscas.ac.cn>
> Cc: musl@lists.openwall.com
> Subject: Re: Re: [musl] memset_riscv64
> 
> * 张飞 <zhangfei@nj.iscas.ac.cn> [2023-04-19 13:33:08 +0800]:
> > --------------------------------------------------------------------------------
> > length(byte)  C language implementation(s)   Basic instruction implementation(s)
> > --------------------------------------------------------------------------------
> > 4                   0.00000352                      0.000004001
> > 8                   0.000004001                     0.000005441
> > 16                  0.000006241                     0.00000464
> > 32                  0.00000752                      0.00000448
> > 64                  0.000008481                     0.000005281
> > 128                 0.000009281                     0.000005921
> > 256                 0.000011201                     0.000007041
> 
> i don't think these numbers can be trusted.
> 
> > #include <stdio.h>
> > #include <sys/mman.h>
> > #include <string.h>
> > #include <stdlib.h>
> > #include <time.h>
> > 
> > #define DATA_SIZE 5*1024*1024
> > #define MAX_LEN 1*1024*1024
> > #define OFFSET 0
> > #define LOOP_TIMES 100
> > int main(){
> >    char *str1,*src1;
> >    str1 = (char *)mmap(NULL, DATA_SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
> > 
> >    printf("function test start\n");
> >    
> >    src1 = str1+OFFSET;
> >    struct timespec tv0,tv;
> >    for(int len=2; len<=MAX_LEN; len*=2){
> >       clock_gettime(CLOCK_REALTIME, &tv0);
> >       for(int k=0; k<LOOP_TIMES; k++){
> >           memset(src1, 'a', len);
> >       }
> >       clock_gettime(CLOCK_REALTIME, &tv);
> >       tv.tv_sec -= tv0.tv_sec;
> >       if ((tv.tv_nsec -= tv0.tv_nsec) < 0) {
> >           tv.tv_nsec += 1000000000;
> >           tv.tv_sec--;
> >       }
> >       printf("len: %d  time: %ld.%.9ld\n",len, (long)tv.tv_sec, (long)tv.tv_nsec);
> 
> 
> this repeatedly calls memset with exact same len, alignment and value.
> so it favours branch heavy code since those are correctly predicted.
> 
> but even if you care about a branch-predicted microbenchmark, you
> made a single measurement per size so you cannot tell how much the
> time varies, you should do several measurements and take the min
> so noise from system effects and cpu internal state are reduced
> (also that state needs to be warmed up). and likely the LOOP_TIMES
> should be bigger too for small sizes for reliable timing.
> 
> benchmarking string functions is tricky especially for a target arch
> with many implementations.
> 
> >    }
> > 
> >    printf("function test end\n");
> >    munmap(str1,DATA_SIZE);
> >    return 0;
> > }
> > 
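
For comparison, the advice quoted above (warm up first, measure several times, take the
minimum) could be sketched roughly as follows. min_elapsed_ns, fill_job and fill_job_run
are hypothetical names used only for illustration, and CLOCK_MONOTONIC is an assumption
of this sketch; the code in this thread uses CLOCK_REALTIME.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime under strict modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Example workload (hypothetical): one memset call over a caller buffer. */
struct fill_job { char *buf; size_t len; int calls; };

void fill_job_run(void *p)
{
    struct fill_job *job = p;
    memset(job->buf, 'a', job->len);
    job->calls++;
}

/* Elapsed nanoseconds between two timespec readings. */
static int64_t elapsed_ns(struct timespec start, struct timespec end)
{
    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000
         + (end.tv_nsec - start.tv_nsec);
}

/* Call fn(arg) `warmup` times untimed, then `repeats` times timed.
 * Returns the smallest elapsed time in nanoseconds, or -1 if repeats <= 0. */
int64_t min_elapsed_ns(void (*fn)(void *), void *arg, int warmup, int repeats)
{
    for (int i = 0; i < warmup; i++)
        fn(arg);

    int64_t best = -1;
    for (int i = 0; i < repeats; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        fn(arg);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        int64_t ns = elapsed_ns(t0, t1);
        if (best < 0 || ns < best)
            best = ns;
    }
    return best;
}
```

The minimum discards every run that was slowed down by system noise, which is a
different choice from summing the middle of the sorted samples. For very small lengths,
a single call may be shorter than the timer resolution, so the workload would probably
need an inner loop of many calls.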

[-- Attachment #2: test_memset2.c --]
[-- Type: text/plain, Size: 1364 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUFLEN 500000
#define LOOP_TIMES 500

int cmp(const void *a, const void *b) {
    double x = *(double *)a;
    double y = *(double *)b;
    if (x < y) return -1;
    if (x > y) return 1;
    return 0;
}

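int main(void)
{
    char *buf = malloc(BUFLEN);
    double *arr = malloc(sizeof(double) * LOOP_TIMES);
    size_t i, j, k;
    struct timespec tv0, tv;
    double times;

    if (!buf || !arr) {
        free(buf);
        free(arr);
        return 1;
    }

    /* Warm-up pass: same calls as the timed pass, results discarded. */
    for (j = 100; j < BUFLEN; j *= 2) {
        for (k = 0; k < LOOP_TIMES; k++) {
            for (i = 0; i < 100; i++)
                memset(buf + i, i, j - i);
        }
    }

    for (j = 100; j < BUFLEN; j *= 2) {
        for (k = 0; k < LOOP_TIMES; k++) {
            clock_gettime(CLOCK_REALTIME, &tv0);
            /* 100 calls with varying alignment, fill value and length. */
            for (i = 0; i < 100; i++)
                memset(buf + i, i, j - i);
            clock_gettime(CLOCK_REALTIME, &tv);
            tv.tv_sec -= tv0.tv_sec;
            if ((tv.tv_nsec -= tv0.tv_nsec) < 0) {
                tv.tv_nsec += 1000000000;
                tv.tv_sec--;
            }
            arr[k] = tv.tv_sec + (double)tv.tv_nsec / 1000000000;
        }
        qsort(arr, LOOP_TIMES, sizeof(double), cmp);

        /* Sum the middle samples. Start from zero for each length so the
           printed value does not carry over the sums of earlier lengths. */
        times = 0.0;
        for (int m = 100; m < LOOP_TIMES - 100; m++) {
            times += arr[m];
        }
        printf("len: %zu  time: %.9f\n", j, times);
    }
    free(arr);
    free(buf);
    return 0;
}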


Thread overview: 10+ messages
2023-04-11  2:17 张飞
2023-04-11  9:48 ` Pedro Falcato
2023-04-19  5:33   ` 张飞
2023-04-19  9:02     ` Szabolcs Nagy
2023-04-20  8:17       ` 张飞 [this message]
2023-04-21 13:30         ` Szabolcs Nagy
2023-04-21 14:50           ` Pedro Falcato
2023-04-21 16:54             ` Rich Felker
2023-04-21 17:01               ` enh
2023-04-26  7:25           ` 张飞
