> You seem to think that if the group stride was 8100, calling realloc
> might memcpy up to 8100 bytes. This is not the case.

Yes, once I was told that malloc_usable_size returns the size requested
by the user, I understood that mallocng would only memcpy 6600 bytes.
But AFAIK, many other malloc implementations do not record the requested
size (6600), so they actually memcpy the full 8100 bytes.

> You also seem to be under the impression that the work to determine
> that the size was 6600 and not 8100 is where most (or at least a
> significant portion of) the time is spent. This is also not the case.
> The majority of the metadata processing time is chasing pointers back
> to the out-of-band metadata, validating it, validating that it
> round-trips back, and validating various other things. Some of these
> could in principle be omitted at the cost of loss-of-hardening.

Yes, that was my previous understanding (which now seems wrong). Other
malloc_usable_size implementations that directly return 8100 (the actual
size class length), such as tcmalloc's, are all very fast, so I could
only conclude that mallocng is so much slower because it has to return
6600 rather than 8100. Apart from this difference, I saw no reason for
it to be slower than other implementations of malloc_usable_size.

If this is not the main reason, can we speed up this algorithm with the
help of a fast lookup table mechanism like tcmalloc's? As I said before,
this would greatly improve not only the performance of
malloc_usable_size, but also that of realloc and free.

Thanks :-)

--

Best Regards
BaiYang
baiyang@gmail.com
http://i.baiy.cn

**** < END OF EMAIL > ****

From: Rich Felker
Date: 2022-09-20 10:15
To: baiyang
CC: musl
Subject: Re: Re: [musl] The heap memory performance (malloc/free/realloc) is significantly degraded in musl 1.2 (compared to 1.1)

On Tue, Sep 20, 2022 at 09:18:04AM +0800, baiyang wrote:
> > There is no hidden "size actually allocated internally". The size you
> > get is the size you requested. Everything else is allocator data
> > structures *outside of the object* that the caller has no entitlement
> > to peek or poke at, and malloc_usable_size's return value reflects
> > that.
>
> If I understand correctly, according to the definition of
> size_classes in the mallocng code:
> 1. When I call `void* p = malloc(6600)`, mallocng actually allocates
> more than 8100 bytes of usable space, right?

No, it uses space from a size-class-8176 group (~=slab) to produce an
allocation of size 6600. The *allocation* is the part that belongs to
the caller. Everything else is part of the allocator data structures.

> 2. According to your previous explanation, calling
> malloc_usable_size(p) at this time returns 6600, right?

Yes.

> My question is, if malloc_usable_size(p) can directly return 8191
> (or similar actual allocated size, as other libc do) instead of
> 6600, is it possible to make mallocng achieve higher performance
> both in time and space?

No, and the reason you said you want it to does not make sense.

You seem to think that if the group stride was 8100, calling realloc
might memcpy up to 8100 bytes. This is not the case. If realloc has to
allocate a new object, the amount copied will be 6600 or exactly
whatever the allocated object size was (or the new size, if smaller).
This is the only meaningful number.

You also seem to be under the impression that the work to determine
that the size was 6600 and not 8100 is where most (or at least a
significant portion of) the time is spent. This is also not the case.
The majority of the metadata processing time is chasing pointers back
to the out-of-band metadata, validating it, validating that it
round-trips back, and validating various other things. Some of these
could in principle be omitted at the cost of loss-of-hardening.

Figuring out that the allocation is 6600 bytes, once you already know
the size class and out-of-band metadata, is quite trivial and hardly
takes any of the time. (It also has a few validation checks that could
be omitted at the cost of loss of hardening, but these are
proportionally much smaller.)

Rich
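
The two points in the message above can be illustrated with a small toy
model in C. This is only a sketch under stated assumptions, not
mallocng's code: `toy_meta`, `toy_header`, `toy_alloc`, `toy_get_meta`,
`toy_usable_size`, `toy_realloc_move`, `toy_free` and `toy_last_copy`
are all hypothetical names, the layout is invented for illustration, and
the slots are simply taken from the system malloc. It shows (1) that
the lookup cost sits in following a pointer to an out-of-band record and
checking that it points back, after which the size is a single field
read, and (2) that a moving realloc copies min(old requested size, new
size), never the slot stride.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical out-of-band record; NOT mallocng's real metadata layout. */
struct toy_meta {
    unsigned char *slot; /* start of the slot backing this allocation */
    size_t stride;       /* bytes available for user data, e.g. 8176 */
    size_t requested;    /* size the caller asked for, e.g. 6600 */
};

/* In-band header in front of the user data; padded to keep alignment. */
union toy_header {
    struct toy_meta *meta;
    max_align_t pad;
};

static size_t toy_last_copy; /* bytes moved by the latest toy_realloc_move */

static void *toy_alloc(size_t n, size_t stride)
{
    if (n > stride) return NULL;
    struct toy_meta *m = malloc(sizeof *m);
    unsigned char *slot = malloc(sizeof(union toy_header) + stride);
    if (!m || !slot) { free(m); free(slot); return NULL; }
    m->slot = slot;
    m->stride = stride;
    m->requested = n;
    ((union toy_header *)slot)->meta = m;
    return slot + sizeof(union toy_header);
}

/* The comparatively costly step: follow the pointer out of band and
   validate that the record round-trips back to this very slot. */
static struct toy_meta *toy_get_meta(void *p)
{
    union toy_header *h = (union toy_header *)p - 1;
    struct toy_meta *m = h->meta;
    if (!m || m->slot != (unsigned char *)h || m->requested > m->stride)
        abort(); /* corrupted or foreign pointer */
    return m;
}

/* Once the record is in hand, the size is a single field read. */
static size_t toy_usable_size(void *p)
{
    return toy_get_meta(p)->requested;
}

static void toy_free(void *p)
{
    struct toy_meta *m = toy_get_meta(p);
    free(m->slot);
    free(m);
}

/* Models only the case where realloc has to allocate a new object. */
static void *toy_realloc_move(void *p, size_t n, size_t new_stride)
{
    struct toy_meta *m = toy_get_meta(p);
    void *q = toy_alloc(n, new_stride);
    if (!q) return NULL;
    /* Copy only the caller's bytes: min(old requested size, new size). */
    size_t len = m->requested < n ? m->requested : n;
    memcpy(q, p, len);
    toy_last_copy = len;
    toy_free(p);
    return q;
}
```

In this model, after `toy_alloc(6600, 8176)` the usable size is reported
as 6600; moving that object to a 20000-byte allocation copies 6600
bytes, and moving it to a 100-byte allocation copies 100 bytes. The
8176-byte stride never enters the copy length.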