[-- Attachment #1: Type: text/plain, Size: 252 bytes --] Hello, When chunks are merged, we use "(curr_size + pre_size) ^ pre_size > pre_size" to decide whether to reclaim. I think this may be something related to performance, but I can’t prove it. I want to know the reason. Thank you! Zhengyu [-- Attachment #2: Type: text/html, Size: 698 bytes --]
On Fri, Jul 31, 2020 at 08:17:02AM +0800, Zhao Zhengyu wrote:
> Hello,
>
> When chunks are merged, we use "(curr_size + pre_size) ^ pre_size >
> pre_size" to decide whether to reclaim. I think this may be something
> related to performance, but I can’t prove it. I want to know the
> reason.
>
> Thank you!
>
> Zhengyu
I asked that same question a while ago. For one, this was in the old
malloc code, which is now usually no longer used. For two, this tries to
figure out if adding current size to previous size rolls over into a new
power of two. Usually, curr_size will be small and pre_size will be
large. Therefore, adding the two will not change much about the high
bits of pre_size, so due to the XOR operator, those bits will cancel out
and the result will be smaller than pre_size. However, if the sum does
roll over, then the sum has one bit set that isn't in pre_size and is
larger than all the ones in pre_size. So the XOR can't cancel that bit,
and the result of the XOR is greater than pre_size.
Fundamentally, it is an optimized version of (a_clz(curr_size +
pre_size) < a_clz(pre_size)).
Ciao,
Markus
On Fri, Jul 31, 2020 at 07:22:16PM +0200, Markus Wichmann wrote:
> On Fri, Jul 31, 2020 at 08:17:02AM +0800, Zhao Zhengyu wrote:
> > Hello,
> >
> > When chunks are merged, we use "(curr_size + pre_size) ^ pre_size >
> > pre_size" to decide whether to reclaim. I think this may be something
> > related to performance, but I can’t prove it. I want to know the
> > reason.
> >
> > Thank you!
> >
> > Zhengyu
>
> I asked that same question a while ago. For one, this was in the old
> malloc code, which is now usually no longer used. For two, this tries to
> figure out if adding current size to previous size rolls over into a new
> power of two. Usually, curr_size will be small and pre_size will be
> large. Therefore, adding the two will not change much about the high
> bits of pre_size, so due to the XOR operator, those bits will cancel out
> and the result will be smaller than pre_size. However, if the sum does
> roll over, then the sum has one bit set that isn't in pre_size and is
> larger than all the ones in pre_size. So the XOR can't cancel that bit,
> and the result of the XOR is greater than pre_size.
>
> Fundamentally, it is an optimized version of (a_clz(curr_size +
> pre_size) < a_clz(pre_size)).
Yes, it's a heuristic at least approximately equivalent to "crossed
the next power of two size boundary" to limit the frequency of madvise
syscalls when a large free zone keeps getting expanded by adjacent
tiny frees. However it does not work very well in practice, and
doesn't even mitigate the possibility of continuous syscall load when
a repeated malloc/free cycle occurs right at a power-of-two boundary.
mallocng handles this kind of thing much better by grouping same-sized
allocations and returning them as a group when all are freed, only
holding back from doing so if it's observed allocations of this size
"bouncing" (repeatedly creating and destroying the same group).
Rich
[-- Attachment #1: Type: text/plain, Size: 2516 bytes --] I read the code of mallocng carefully and found some ideas similar to my current work: For malloc requirements of the “small size”, I split one page into the specified size, and manage them as a group. When free, I give them back to heap together. But at the point of reclaiming physical pages to kernel, I can't get enough inspiration from mallocng. Anyway, I understand the question I asked. Thanks! On Sat, Aug 1, 2020 at 3:24 AM Rich Felker <dalias@libc.org> wrote: > On Fri, Jul 31, 2020 at 07:22:16PM +0200, Markus Wichmann wrote: > > On Fri, Jul 31, 2020 at 08:17:02AM +0800, Zhao Zhengyu wrote: > > > Hello, > > > > > > When chunks are merged, we use "(curr_size + pre_size) ^ pre_size > > > > pre_size" to decide whether to reclaim. I think this may be something > > > related to performance, but I can’t prove it. I want to know the > > > reason. > > > > > > Thank you! > > > > > > Zhengyu > > > > I asked that same question a while ago. For one, this was in the old > > malloc code, which is now usually no longer used. For two, this tries to > > figure out if adding current size to previous size rolls over into a new > > power of two. Usually, curr_size will be small and pre_size will be > > large. Therefore, adding the two will not change much about the high > > bits of pre_size, so due to the XOR operator, those bits will cancel out > > and the result will be smaller than pre_size. However, if the sum does > > roll over, then the sum has one bit set that isn't in pre_size and is > > larger than all the ones in pre_size. So the XOR can't cancel that bit, > > and the result of the XOR is greater than pre_size. > > > > Fundamentally, it is an optimized version of (a_clz(curr_size + > > pre_size) < a_clz(pre_size)). > > Yes, it's a heuristic at least approximately equivalent to "crossed > the next power of two size boundary" to limit the frequency of madvise > syscalls when a large free zone keeps getting expanded by adjacent > tiny frees. However it does not work very well in practice, and > doesn't even mitigate the possibility of continuous syscall load when > a repeated malloc/free cycle occurs right at a power-of-two boundary. > > mallocng handles this kind of thing much better by grouping same-sized > allocations and returning them as a group when all are freed, only > holding back from doing so if it's observed allocations of this size > "bouncing" (repeatedly creating and destroying the same group). > > Rich > [-- Attachment #2: Type: text/html, Size: 3178 bytes --]