mailing list of musl libc
From: Rich Felker <dalias@libc.org>
To: 王志强 <00107082@163.com>
Cc: musl@lists.openwall.com, Quentin Rameau <quinq@fifth.space>,
	Florian Weimer <fweimer@redhat.com>
Subject: Re: [musl] Re:Re: [musl] The heap memory performance (malloc/free/realloc) is significantly degraded in musl 1.2 (compared to 1.1)
Date: Wed, 21 Sep 2022 13:15:35 -0400	[thread overview]
Message-ID: <20220921171535.GV9709@brightrain.aerifal.cx> (raw)
In-Reply-To: <30eefd71.52d4.1835f8b367f.Coremail.00107082@163.com>

On Wed, Sep 21, 2022 at 06:15:02PM +0800, 王志强 wrote:
> Hi Rich,
> 
> 
> 
> I am quite interested in this topic, and made a comparison between glibc and musl with the following code:
> ```
> #include <stdlib.h>
> 
> #define MAXF 4096
> void* tobefree[MAXF];
> int main() {
>     long long i;
>     int v, k;
>     size_t s, c=0;
>     char *p;
>     for (i=0; i<100000000L; i++) {
>         v = rand();   
>         s = ((v%256)+1)*1024;
>         p = (char*) malloc(s);
>         p[1023]=0;
>         if (c>=MAXF) {
>             k = v%c;
>             free(tobefree[k]);
>             tobefree[k]=tobefree[--c];
>         }
>         tobefree[c++]=p;
>     }
>     return 0;
> }
> ```
> 
> The results show a significant difference.
> With glibc (running within a Debian docker image):
> # gcc -o m.debian -O0 app_malloc.c
> 
> # time ./m.debian
> real    0m37.529s
> user    0m36.677s
> sys    0m0.771s
> 
> With musl (running within an Alpine 3.15 docker image):
> 
> # gcc -o m.alpine -O0 app_malloc.c
> 
> # time ./m.alpine
> real    6m 30.51s
> user    1m 36.67s
> sys    4m 53.31s
> 
> 
> 
> musl seems to spend way too much time in the kernel, while glibc keeps most of the work in userspace.
> I used perf_event_open to profile those programs:
> The musl profile (302899 total samples) shows that the malloc/free sequence spends lots of time on pagefault/munmap/madvise/mmap:
> 
> munmap (30.858% 93469/302899)
> _init? (22.583% 68404/302899)
>   aligned_alloc? (89.290% 61078/68404)
>     asm_exc_page_fault (45.961% 28072/61078)
>   main (9.001% 6157/68404)
>     asm_exc_page_fault (29.170% 1796/6157)
>   rand (1.266% 866/68404)
> aligned_alloc? (20.437% 61904/302899)
>   asm_exc_page_fault (56.038% 34690/61904)
> madvise (13.275% 40209/302899)
> mmap64 (11.125% 33698/302899)
> 
> 
> But the glibc profile (29072 total samples) is much lighter; page faults are the main cost, and glibc spends significant time in "free":
> 
> 
> 
> pthread_attr_setschedparam? (82.021% 23845/29072)
>   asm_exc_page_fault (1.657% 395/23845)
> _dl_catch_error? (16.714% 4859/29072)
>   __libc_start_main (100.000% 4859/4859)
>     cfree (58.839% 2859/4859)
>     main (31.138% 1513/4859)
>       asm_exc_page_fault (2.115% 32/1513)
>     pthread_attr_setschedparam? (3.725% 181/4859)
>     random (2.099% 102/4859)
>     random_r (1.832% 89/4859)
>     __libc_malloc (1.420% 69/4859)
> It seems to me that glibc makes heavy use of memory cached from the
> kernel and avoids lots of page faults and syscalls.
> Should this performance difference concern real-world applications?
> On average, musl actually spends about 3~4µs per malloc/free pair,
> which is quite acceptable in real-world applications, I think.
> 
> 
> 
> (It seems to me that the performance difference has nothing to do
> with malloc_usable_size; that may indeed have been just a speculative
> guess without any basis.)

Indeed it has nothing to do with that. What you're seeing is just that
musl/mallocng returns freed memory to the system, and glibc, basically,
doesn't (modulo the special case of a large contiguous free block at
the 'top' of the heap). This inherently has a time cost.

mallocng does make significant efforts to avoid hammering mmap/munmap
under repeated malloc/free, at least in cases where it can reasonably
be deemed to matter. However, this is best-effort, and always a
tradeoff between (potentially) large unwanted memory usage and
performance. More on this later.

Your test case, with the completely random size distribution across
various large sizes, is likely a worst case. The mean size you're
allocating is 128k, which is the threshold for direct mmap/munmap of
each allocation, so at least half of the allocations you're making can
*never* be reused, and will always be immediately unmapped on free. It
might be interesting to change the scaling factor from 1k to 256 bytes
so that basically all of the allocation sizes are in the
malloc-managed range.

Now, I mentioned above "when it can reasonably be deemed to matter".
One of the assumptions mallocng makes for deciding where to risk
larger memory usage for the sake of performance is that you're
*actually going to use* the memory you're allocating. Yes, 6min might
seem like a lot of time to do nothing, but that "doing nothing" was
churning through allocating 12 TB of memory. How much time would you
have spent *actually processing* 12 TB of data (and not as flat linear
data, but avg-128kB chunks), and what percentage of that time would
the 6min in malloc have been?

Rich


Thread overview: 68+ messages
2022-09-19  7:53 baiyang
2022-09-19 11:08 ` Szabolcs Nagy
2022-09-19 12:36   ` Florian Weimer
2022-09-19 13:46     ` Rich Felker
2022-09-19 13:53       ` James Y Knight
2022-09-19 17:40         ` baiyang
2022-09-19 18:14           ` Szabolcs Nagy
2022-09-19 18:40             ` baiyang
2022-09-19 19:07             ` Gabriel Ravier
2022-09-19 19:21               ` Rich Felker
2022-09-19 21:02                 ` Gabriel Ravier
2022-09-19 21:47                   ` Rich Felker
2022-09-19 22:31                     ` Gabriel Ravier
2022-09-19 22:46                       ` baiyang
2022-09-19 20:46             ` Nat!
2022-09-20  8:51               ` Szabolcs Nagy
2022-09-20  0:13           ` James Y Knight
2022-09-20  0:25             ` baiyang
2022-09-20  0:38               ` Rich Felker
2022-09-20  0:47                 ` baiyang
2022-09-20  1:00                   ` Rich Felker
2022-09-20  1:18                     ` baiyang
2022-09-20  2:15                       ` Rich Felker
2022-09-20  2:35                         ` baiyang
2022-09-20  3:28                           ` Rich Felker
2022-09-20  3:53                             ` baiyang
2022-09-20  5:41                               ` Rich Felker
2022-09-20  5:56                                 ` baiyang
2022-09-20 12:16                                   ` Rich Felker
2022-09-20 17:21                                     ` baiyang
2022-09-20  8:33       ` Florian Weimer
2022-09-20 13:54         ` Siddhesh Poyarekar
2022-09-20 16:59           ` James Y Knight
2022-09-20 17:34             ` Szabolcs Nagy
2022-09-20 19:53               ` James Y Knight
2022-09-24  8:55               ` Fangrui Song
2022-09-20 17:39             ` baiyang
2022-09-20 18:12               ` Quentin Rameau
2022-09-20 18:19                 ` Rich Felker
2022-09-20 18:26                   ` Alexander Monakov
2022-09-20 18:35                     ` baiyang
2022-09-20 20:33                       ` Gabriel Ravier
2022-09-20 20:45                         ` baiyang
2022-09-21  8:42                           ` NRK
2022-09-20 18:37                     ` Quentin Rameau
2022-09-21 10:15                   ` [musl] " 王志强
2022-09-21 16:11                     ` [musl] " 王志强
2022-09-21 17:15                     ` Rich Felker [this message]
2022-09-21 17:58                       ` [musl] " Rich Felker
2022-09-22  3:34                         ` [musl] " 王志强
2022-09-22  9:10                           ` [musl] " 王志强
2022-09-22  9:39                             ` [musl] " 王志强
2022-09-20 17:28           ` baiyang
2022-09-20 17:44             ` Siddhesh Poyarekar
2022-10-10 14:13           ` Florian Weimer
2022-09-19 13:43 ` Rich Felker
2022-09-19 17:32   ` baiyang
2022-09-19 18:15     ` Rich Felker
2022-09-19 18:44       ` baiyang
2022-09-19 19:18         ` Rich Felker
2022-09-19 19:45           ` baiyang
2022-09-19 20:07             ` Rich Felker
2022-09-19 20:17               ` baiyang
2022-09-19 20:28                 ` Rich Felker
2022-09-19 20:38                   ` baiyang
2022-09-19 22:02                 ` Quentin Rameau
2022-09-19 20:17             ` Joakim Sindholt
2022-09-19 20:33               ` baiyang

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/
