Hi Rich,

I am quite interested in this topic, and made a comparison between glibc and musl with the following code:

```c
#include <stdlib.h>

#define MAXF 4096
void *tobefree[MAXF];

int main()
{
	long long i;
	int v, k;
	size_t s, c = 0;
	char *p;
	for (i = 0; i < 100000000L; i++) {
		v = rand();
		s = ((v % 256) + 1) * 1024;   /* 1K..256K */
		p = (char *)malloc(s);
		p[1023] = 0;                  /* touch the allocation */
		if (c >= MAXF) {              /* pool full: free a random block */
			k = v % c;
			free(tobefree[k]);
			tobefree[k] = tobefree[--c];
		}
		tobefree[c++] = p;
	}
	return 0;
}
```

The results show a significant difference.

With glibc (running within a Debian docker image):

# gcc -o m.debian -O0 app_malloc.c
# time ./m.debian
real    0m37.529s
user    0m36.677s
sys     0m0.771s

With musl (running within an Alpine 3.15 docker image):

# gcc -o m.alpine -O0 app_malloc.c
# time ./m.alpine
real    6m 30.51s
user    1m 36.67s
sys     4m 53.31s

musl seems to spend way too much time in the kernel, while glibc keeps most of the work in userspace.

I used perf_event_open to profile those programs. The musl profile (302899 samples total) shows that the malloc/free sequence spends most of its time on page faults and on munmap/madvise/mmap:

munmap(30.858% 93469/302899)
_init?(22.583% 68404/302899)
    aligned_alloc?(89.290% 61078/68404)
        asm_exc_page_fault(45.961% 28072/61078)
    main(9.001% 6157/68404)
        asm_exc_page_fault(29.170% 1796/6157)
    rand(1.266% 866/68404)
aligned_alloc?(20.437% 61904/302899)
    asm_exc_page_fault(56.038% 34690/61904)
madvise(13.275% 40209/302899)
mmap64(11.125% 33698/302899)

But the glibc profile (29072 samples total) is much lighter; page faults are the main kernel cost, and glibc spends a significant share of its userspace time in "free":

pthread_attr_setschedparam?(82.021% 23845/29072)
    asm_exc_page_fault(1.657% 395/23845)
_dl_catch_error?(16.714% 4859/29072)
    __libc_start_main(100.000% 4859/4859)
        cfree(58.839% 2859/4859)
        main(31.138% 1513/4859)
            asm_exc_page_fault(2.115% 32/1513)
        pthread_attr_setschedparam?(3.725% 181/4859)
        random(2.099% 102/4859)
        random_r(1.832% 89/4859)
        __libc_malloc(1.420% 69/4859)

It seems to me that glibc caches the memory it gets from the kernel, and thereby avoids most of the page faults and syscalls.

Should this performance difference concern real-world applications? On average, musl actually spends about 3~4 µs per malloc/free pair (6m30s over 100 million iterations works out to about 3.9 µs each), which I think is still acceptable for real-world applications.

(It also seems to me that the performance difference has nothing to do with malloc_usable_size; that was indeed just a speculative guess without any basis.)

David Wang
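
P.S. For reference, below is a minimal sketch of the kind of perf_event_open counter setup I mean. It is not the exact profiler used above (that one samples call stacks); it just counts page faults around an illustrative allocation loop:

```c
/* Minimal page-fault counter using perf_event_open (Linux-specific).
 * The allocation loop here is only illustrative. */
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int open_counter(void)
{
	struct perf_event_attr attr;
	memset(&attr, 0, sizeof attr);
	attr.type = PERF_TYPE_SOFTWARE;
	attr.size = sizeof attr;
	attr.config = PERF_COUNT_SW_PAGE_FAULTS;
	attr.disabled = 1;
	/* count for the calling thread, on any CPU */
	return syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

int main(void)
{
	int fd = open_counter();
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}
	ioctl(fd, PERF_EVENT_IOC_RESET, 0);
	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

	/* workload: same allocation pattern as the benchmark, fewer iterations */
	for (long i = 0; i < 1000000; i++) {
		char *p = malloc(((rand() % 256) + 1) * 1024);
		p[1023] = 0;
		free(p);
	}

	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
	uint64_t faults;
	if (read(fd, &faults, sizeof faults) == sizeof faults)
		printf("page faults: %llu\n", (unsigned long long)faults);
	close(fd);
	return 0;
}
```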
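
P.P.S. To check the "glibc caches memory from the kernel" guess more directly, something like the following could be used; it is glibc-specific, since malloc_stats() is a GNU extension, and the block sizes are just chosen to match the benchmark:

```c
/* glibc-specific: malloc_stats() is a GNU extension declared in <malloc.h>. */
#include <malloc.h>
#include <stdlib.h>

int main(void)
{
	enum { N = 4096 };
	static void *blocks[N];

	/* allocate and then free a batch of 1K..256K blocks */
	for (int i = 0; i < N; i++)
		blocks[i] = malloc(((rand() % 256) + 1) * 1024);
	for (int i = 0; i < N; i++)
		free(blocks[i]);

	/* prints per-arena "system bytes" vs "in use bytes" to stderr;
	 * a large gap means the allocator is holding on to freed memory */
	malloc_stats();
	return 0;
}
```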