* [musl] mallocng progress and growth chart
@ 2020-05-10 18:09 Rich Felker
2020-05-11 17:51 ` Rich Felker
2020-05-16 0:29 ` Rich Felker
0 siblings, 2 replies; 12+ messages in thread
From: Rich Felker @ 2020-05-10 18:09 UTC (permalink / raw)
To: musl
[-- Attachment #1: Type: text/plain, Size: 2188 bytes --]
I just pushed a big queued set of changes to mallocng, including some
directions that didn't go well and were mostly reversed but with
improvements to the old. These include:
- Fixes to definitions of some size classes where counts didn't fit in
a power-of-two sized mmap or containing-group.
- Fixing bad interaction between realloc and coarse size classing
(always causing malloc-memcpy-free due to size class not matching;
this was what was making apk and some other things painfully slow).
- Coarse size classing down to class 4 (except 6->7). This tends to
save memory in low-usage programs.
- Improvements to logic for single-slot and low-count groups to avoid
situations that waste lots of memory as well as ones that
unnecessarily do eagar allocation.
- Flipping definition/dependency of aligned_alloc/memalign so that
legacy nonstd function is not used in definition of std functions.
- New bounce counter with sequence numbers and decay, so that bounces
with lots of intervening expensive (mmap/munmap) ops don't get
accounted, and classes in bouncing state (retaining a free group
unnecessarily to avoid cost of un- and re-mapping) can return to
non-bouncing state.
I've also hacked up a test program that probes and prints the growth
sequence at each size just below a step in size class or page
boundary. This is intended to verify that the emergent behavior from
the allocation strategy rules in mallocng isn't doing anything
horribly stupid.
For each size, 200 mallocs are performed and if the result is the
start of a new group, the size/type of the allocation is printed. The
form 'NxZ' indicates a normal group of N slots each of slot size Z
(Z-4 usable). The form 'Nk' indicates a single-slot group where the
mapping length (N kB) is less than the nominal slot size; it shows
places where the strategy chose to use individual mmap below the hard
mmap threshold to save space and preserve individual freeability. (And
in principle these allocations should be amenable to mremap for
realloc, but it's not used yet.)
Source (with nasty poking at implementation internals, so build with
-no-builtin or -ffreestanding) is also attached.
Rich
[-- Attachment #2: growth.c --]
[-- Type: text/plain, Size: 1549 bytes --]
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/wait.h>
struct meta {
struct meta *prev, *next;
struct group *mem;
volatile int avail_mask, freed_mask;
uintptr_t last_idx:5;
uintptr_t freeable:1;
uintptr_t sizeclass:6;
uintptr_t maplen:8*sizeof(uintptr_t)-12;
};
const uint16_t size_classes[] = {
1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 12, 15,
18, 20, 25, 31,
36, 42, 50, 63,
72, 84, 102, 127,
146, 170, 204, 255,
292, 340, 409, 511,
584, 682, 818, 1023,
1169, 1364, 1637, 2047,
2340, 2730, 3276, 4095,
4680, 5460, 6552, 8191,
};
void runtest(size_t size)
{
pid_t pid;
if ((pid=fork())) {
waitpid(pid, 0, 0);
return;
}
unsigned char *p, *ptrs[200];
printf("%5zu: ", size);
for (int i=0; i<sizeof ptrs / sizeof *ptrs; i++) {
ptrs[i] = p = malloc(size);
int idx = p[-3] & 31;
struct meta *m = *(struct meta **)(p-16-*(uint16_t *)(p-2)*16);
if (idx==0) {
if (!m->maplen || m->maplen*4096UL>=size_classes[m->sizeclass]*16+16)
printf(" %2dx%-5d", m->last_idx+1, size_classes[m->sizeclass]*16);
else
printf(" %7luk", m->maplen*4UL);
} else {
//printf("[ ]");
}
}
for (int i=0; i<sizeof ptrs / sizeof *ptrs; i++) free(ptrs[i]);
printf("\n");
exit(0);
}
int main() {
if (1) for (int i=0; i<44; i++) {
size_t size = size_classes[i]*16;
runtest(size-4);
size_t pgaligned = (size+4095) & -4096;
if (pgaligned != size+16 && pgaligned < size_classes[i+1]*16)
runtest(pgaligned-20);
}
// runtest(4076);
// runtest(4076);
// runtest(4076);
}
[-- Attachment #3: growth.txt --]
[-- Type: text/plain, Size: 13062 bytes --]
12: 30x16 30x16 30x16 30x16 30x16 30x16 30x16
28: 15x32 15x32 15x32 15x32 15x32 15x32 15x32 15x32 15x32 31x32 31x32 31x32
44: 10x48 10x48 10x48 10x48 10x48 10x48 10x48 10x48 20x48 20x48 20x48 20x48 20x48 20x48
60: 7x64 7x64 7x64 7x64 7x64 7x64 7x64 7x64 7x64 15x64 15x64 15x64 15x64 15x64 31x64 31x64
76: 5x96 5x96 6x80 6x80 6x80 6x80 6x80 6x80 6x80 6x80 12x80 12x80 12x80 12x80 12x80 25x80 25x80 25x80 25x80
92: 5x96 5x96 5x96 5x96 5x96 5x96 5x96 5x96 10x96 10x96 10x96 10x96 10x96 21x96 21x96 21x96 21x96 21x96 21x96
108: 4x112 4x112 4x112 4x112 4x112 4x112 4x112 4x112 8x112 8x112 8x112 8x112 8x112 18x112 18x112 18x112 18x112 18x112 18x112 18x112 18x112
124: 7x128 7x128 7x128 7x128 7x128 7x128 7x128 7x128 7x128 15x128 15x128 15x128 15x128 15x128 31x128 31x128
140: 3x160 3x160 3x160 3x160 6x144 6x144 6x144 6x144 6x144 6x144 6x144 6x144 6x144 6x144 14x144 14x144 14x144 14x144 28x144 28x144 28x144
156: 3x160 3x160 3x160 3x160 3x160 3x160 3x160 3x160 6x160 6x160 6x160 6x160 12x160 12x160 12x160 12x160 24x160 24x160 24x160 24x160 24x160
188: 2x240 2x240 2x240 2x240 2x240 5x192 5x192 5x192 5x192 5x192 5x192 5x192 5x192 10x192 10x192 10x192 10x192 20x192 20x192 20x192 20x192 20x192 20x192
236: 2x240 2x240 2x240 2x240 2x240 2x240 2x240 2x240 4x240 4x240 4x240 4x240 8x240 8x240 8x240 8x240 16x240 16x240 16x240 16x240 32x240 32x240 32x240
284: 3x320 3x320 3x320 3x320 7x288 7x288 7x288 7x288 7x288 7x288 7x288 7x288 14x288 14x288 14x288 14x288 28x288 28x288 28x288
316: 3x320 3x320 3x320 3x320 3x320 3x320 3x320 3x320 6x320 6x320 6x320 6x320 12x320 12x320 12x320 12x320 24x320 24x320 24x320 24x320 24x320
396: 2x496 2x496 2x496 2x496 2x496 5x400 5x400 5x400 5x400 5x400 5x400 5x400 5x400 10x400 10x400 10x400 10x400 20x400 20x400 20x400 20x400 20x400 20x400
492: 2x496 2x496 2x496 2x496 2x496 2x496 2x496 2x496 4x496 4x496 4x496 4x496 8x496 8x496 8x496 8x496 16x496 16x496 16x496 16x496 32x496 32x496 32x496
572: 3x672 3x672 3x672 3x672 7x576 7x576 7x576 7x576 7x576 7x576 7x576 7x576 14x576 14x576 14x576 14x576 28x576 28x576 28x576
668: 3x672 3x672 3x672 3x672 3x672 3x672 3x672 3x672 6x672 6x672 6x672 6x672 12x672 12x672 12x672 12x672 24x672 24x672 24x672 24x672 24x672
796: 2x1008 2x1008 2x1008 2x1008 2x1008 5x800 5x800 5x800 5x800 5x800 5x800 5x800 5x800 10x800 10x800 10x800 10x800 20x800 20x800 20x800 20x800 20x800 20x800
1004: 2x1008 2x1008 2x1008 2x1008 2x1008 2x1008 2x1008 2x1008 4x1008 4x1008 4x1008 4x1008 8x1008 8x1008 8x1008 8x1008 16x1008 16x1008 16x1008 16x1008 32x1008 32x1008 32x1008
1148: 3x1344 3x1344 3x1344 3x1344 7x1152 7x1152 7x1152 7x1152 7x1152 7x1152 7x1152 7x1152 14x1152 14x1152 14x1152 14x1152 28x1152 28x1152 28x1152
1340: 3x1344 3x1344 3x1344 3x1344 3x1344 3x1344 3x1344 3x1344 6x1344 6x1344 6x1344 6x1344 12x1344 12x1344 12x1344 12x1344 24x1344 24x1344 24x1344 24x1344 24x1344
1628: 2x2032 2x2032 2x2032 2x2032 2x2032 5x1632 5x1632 5x1632 5x1632 5x1632 5x1632 5x1632 5x1632 10x1632 10x1632 10x1632 10x1632 20x1632 20x1632 20x1632 20x1632 20x1632 20x1632
2028: 2x2032 2x2032 2x2032 2x2032 2x2032 2x2032 2x2032 2x2032 4x2032 4x2032 4x2032 4x2032 8x2032 8x2032 8x2032 8x2032 16x2032 16x2032 16x2032 16x2032 32x2032 32x2032 32x2032
2332: 3x2720 3x2720 3x2720 3x2720 5x2336 5x2336 5x2336 5x2336 5x2336 5x2336 7x2336 7x2336 7x2336 7x2336 14x2336 14x2336 14x2336 14x2336 28x2336 28x2336 28x2336
2716: 3x2720 3x2720 3x2720 3x2720 3x2720 3x2720 3x2720 3x2720 6x2720 6x2720 6x2720 6x2720 12x2720 12x2720 12x2720 12x2720 24x2720 24x2720 24x2720 24x2720 24x2720
3260: 1x4080 1x4080 1x4080 1x4080 1x4080 1x4080 1x4080 1x4080 2x4080 5x3264 5x3264 5x3264 5x3264 5x3264 5x3264 5x3264 5x3264 10x3264 10x3264 10x3264 10x3264 20x3264 20x3264 20x3264 20x3264 20x3264 20x3264
4076: 1x4080 1x4080 1x4080 1x4080 1x4080 1x4080 1x4080 1x4080 2x4080 2x4080 2x4080 2x4080 4x4080 4x4080 4x4080 4x4080 8x4080 8x4080 8x4080 8x4080 16x4080 16x4080 16x4080 16x4080 32x4080 32x4080 32x4080
4668: 2x5440 2x5440 2x5440 2x5440 2x5440 5x4672 5x4672 5x4672 5x4672 5x4672 5x4672 7x4672 7x4672 7x4672 7x4672 14x4672 14x4672 14x4672 14x4672 28x4672 28x4672 28x4672
5436: 2x5440 2x5440 2x5440 2x5440 2x5440 2x5440 3x5440 3x5440 3x5440 3x5440 6x5440 6x5440 6x5440 6x5440 12x5440 12x5440 12x5440 12x5440 24x5440 24x5440 24x5440 24x5440 24x5440
6540: 1x8176 1x8176 1x8176 1x8176 1x8176 1x8176 1x8176 1x8176 2x8176 3x6544 3x6544 3x6544 3x6544 3x6544 3x6544 3x6544 5x6544 5x6544 5x6544 5x6544 10x6544 10x6544 10x6544 10x6544 20x6544 20x6544 20x6544 20x6544 20x6544 20x6544
8172: 1x8176 1x8176 1x8176 1x8176 1x8176 1x8176 1x8176 1x8176 2x8176 2x8176 2x8176 2x8176 4x8176 4x8176 4x8176 4x8176 8x8176 8x8176 8x8176 8x8176 16x8176 16x8176 16x8176 16x8176 32x8176 32x8176 32x8176
9340: 2x10912 2x10912 2x10912 2x10912 2x10912 3x9344 3x9344 3x9344 3x9344 3x9344 3x9344 3x9344 3x9344 3x9344 3x9344 7x9344 7x9344 7x9344 7x9344 14x9344 14x9344 14x9344 14x9344 28x9344 28x9344 28x9344
10908: 2x10912 2x10912 2x10912 2x10912 2x10912 2x10912 3x10912 3x10912 3x10912 3x10912 6x10912 6x10912 6x10912 6x10912 12x10912 12x10912 12x10912 12x10912 24x10912 24x10912 24x10912 24x10912 24x10912
12268: 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 12k 10x13088 10x13088 10x13088 10x13088 20x13088 20x13088 20x13088 20x13088 20x13088 20x13088
13084: 1x16368 1x16368 1x16368 1x16368 1x16368 1x16368 1x16368 1x16368 2x16368 1x13088 1x13088 1x13088 1x13088 1x13088 1x13088 3x13088 3x13088 3x13088 3x13088 3x13088 5x13088 5x13088 5x13088 5x13088 10x13088 10x13088 10x13088 10x13088 20x13088 20x13088 20x13088 20x13088 20x13088 20x13088
16364: 1x16368 1x16368 1x16368 1x16368 1x16368 1x16368 1x16368 1x16368 2x16368 2x16368 2x16368 2x16368 4x16368 4x16368 4x16368 4x16368 8x16368 8x16368 8x16368 8x16368 16x16368 16x16368 16x16368 16x16368 32x16368 32x16368 32x16368
18700: 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 1x18704 1x18704 1x18704 1x18704 1x18704 1x18704 3x18704 3x18704 3x18704 3x18704 3x18704 3x18704 3x18704 3x18704 7x18704 7x18704 7x18704 7x18704 14x18704 14x18704 14x18704 14x18704 28x18704 28x18704 28x18704
20460: 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 20k 12x21824 12x21824 12x21824 12x21824 24x21824 24x21824 24x21824 24x21824 24x21824
21820: 1x21824 1x21824 1x21824 1x21824 2x21824 2x21824 2x21824 2x21824 3x21824 3x21824 3x21824 3x21824 6x21824 6x21824 6x21824 6x21824 12x21824 12x21824 12x21824 12x21824 24x21824 24x21824 24x21824 24x21824 24x21824
24556: 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 24k 10x26192 10x26192 10x26192 10x26192 20x26192 20x26192 20x26192 20x26192 20x26192 20x26192
26188: 28k 28k 28k 28k 28k 28k 28k 28k 28k 28k 1x26192 1x26192 1x26192 1x26192 1x26192 1x26192 3x26192 3x26192 3x26192 3x26192 3x26192 5x26192 5x26192 5x26192 5x26192 10x26192 10x26192 10x26192 10x26192 20x26192 20x26192 20x26192 20x26192 20x26192 20x26192
28652: 28k 28k 28k 28k 28k 28k 28k 28k 28k 28k 28k 28k 28k 28k 28k 28k 28k 28k 28k 28k 28k 28k 28k 28k 28k 28k 28k 28k 28k 28k 28k 28k 8x32752 8x32752 8x32752 8x32752 16x32752 16x32752 16x32752 16x32752 32x32752 32x32752 32x32752
32748: 1x32752 1x32752 1x32752 1x32752 1x32752 1x32752 1x32752 1x32752 2x32752 2x32752 2x32752 2x32752 4x32752 4x32752 4x32752 4x32752 8x32752 8x32752 8x32752 8x32752 16x32752 16x32752 16x32752 16x32752 32x32752 32x32752 32x32752
37436: 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 1x37440 1x37440 1x37440 1x37440 1x37440 1x37440 3x37440 3x37440 3x37440 3x37440 3x37440 3x37440 3x37440 3x37440 7x37440 7x37440 7x37440 7x37440 14x37440 14x37440 14x37440 14x37440 28x37440 28x37440 28x37440
40940: 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 40k 12x43680 12x43680 12x43680 12x43680 24x43680 24x43680 24x43680 24x43680 24x43680
43676: 1x43680 1x43680 1x43680 1x43680 2x43680 2x43680 2x43680 2x43680 3x43680 3x43680 3x43680 3x43680 6x43680 6x43680 6x43680 6x43680 12x43680 12x43680 12x43680 12x43680 24x43680 24x43680 24x43680 24x43680 24x43680
45036: 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 44k 10x52416 10x52416 10x52416 10x52416 20x52416 20x52416 20x52416 20x52416 20x52416 20x52416
52412: 52k 52k 52k 52k 52k 52k 52k 52k 52k 52k 1x52416 1x52416 1x52416 1x52416 1x52416 1x52416 3x52416 3x52416 3x52416 3x52416 3x52416 5x52416 5x52416 5x52416 5x52416 10x52416 10x52416 10x52416 10x52416 20x52416 20x52416 20x52416 20x52416 20x52416 20x52416
53228: 52k 52k 52k 52k 52k 52k 52k 52k 52k 52k 52k 52k 52k 52k 52k 52k 52k 52k 52k 52k 52k 52k 52k 52k 52k 52k 52k 52k 52k 52k 52k 52k 8x65520 8x65520 8x65520 8x65520 16x65520 16x65520 16x65520 16x65520 16x65520 16x65520 16x65520 16x65520 16x65520
65516: 1x65520 1x65520 1x65520 1x65520 1x65520 1x65520 1x65520 1x65520 2x65520 2x65520 2x65520 2x65520 4x65520 4x65520 4x65520 4x65520 8x65520 8x65520 8x65520 8x65520 16x65520 16x65520 16x65520 16x65520 16x65520 16x65520 16x65520 16x65520 16x65520
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [musl] mallocng progress and growth chart
2020-05-10 18:09 [musl] mallocng progress and growth chart Rich Felker
@ 2020-05-11 17:51 ` Rich Felker
2020-05-16 0:29 ` Rich Felker
1 sibling, 0 replies; 12+ messages in thread
From: Rich Felker @ 2020-05-11 17:51 UTC (permalink / raw)
To: musl
On Sun, May 10, 2020 at 02:09:34PM -0400, Rich Felker wrote:
> I just pushed a big queued set of changes to mallocng, including some
> directions that didn't go well and were mostly reversed but with
> improvements to the old. These include:
>
> - Fixes to definitions of some size classes where counts didn't fit in
> a power-of-two sized mmap or containing-group.
>
> - Fixing bad interaction between realloc and coarse size classing
> (always causing malloc-memcpy-free due to size class not matching;
> this was what was making apk and some other things painfully slow).
>
> - Coarse size classing down to class 4 (except 6->7). This tends to
> save memory in low-usage programs.
>
> - Improvements to logic for single-slot and low-count groups to avoid
> situations that waste lots of memory as well as ones that
> unnecessarily do eagar allocation.
>
> - Flipping definition/dependency of aligned_alloc/memalign so that
> legacy nonstd function is not used in definition of std functions.
>
> - New bounce counter with sequence numbers and decay, so that bounces
> with lots of intervening expensive (mmap/munmap) ops don't get
> accounted, and classes in bouncing state (retaining a free group
> unnecessarily to avoid cost of un- and re-mapping) can return to
> non-bouncing state.
I left off one fairly important one:
- Hardening enhancement: clearing the pointer back to the group's meta
struct before freeing the group, so that the next time the memory is
handed out in an allocation the uninitialized memory doesn't contain
pointers to malloc internal structures. For example:
free(malloc(42));
p=malloc(492);
would include a pointer to the freed meta structure for the nested
group, at offset 0 or 16 if I'm not mistaken. This could allow
uninitialized memory leak plus a second difficult-to-exploit bug to
be transformed into an easier-to-exploit bug.
This also suggested to me that we could clear the pointer back to meta
whenever a group is entirely free slots but not itself freed, and
restore it only when it's subsequently used. This might have minor
hardening benefits too, but it highlights that we could use MADV_FREE
on such groups, which might be valuable at large sizes. OTOH MADV_FREE
has some anti-hardening properties (potentially loses offset cycling
state under attacker-controlled load, allowing the attacker to obtain
more predictable addresses) that make it less attractive to use here.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [musl] mallocng progress and growth chart
2020-05-10 18:09 [musl] mallocng progress and growth chart Rich Felker
2020-05-11 17:51 ` Rich Felker
@ 2020-05-16 0:29 ` Rich Felker
2020-05-16 3:29 ` Rich Felker
2020-05-18 18:35 ` Rich Felker
1 sibling, 2 replies; 12+ messages in thread
From: Rich Felker @ 2020-05-16 0:29 UTC (permalink / raw)
To: musl
On Sun, May 10, 2020 at 02:09:34PM -0400, Rich Felker wrote:
> 4668: 2x5440 2x5440 2x5440 2x5440 2x5440 5x4672 5x4672 5x4672 5x4672 5x4672 5x4672 7x4672 ...
This turns out to be just about the worst edge case we have, and in a
sense one that's fundamental. Sadly there are a number of
applications, including bash, that do a lot of malloc(4096). The ones
that just allocate and don't have any complex malloc/free patterns
will see somewhat higher usage with mallocng, and I don't think
there's any way around that. (Note: oldmalloc also has problems here
under certain patterns of alloc/free, due to bin_index vs bin_index_up
discrepancy!)
I have some changes I'm about to push that help this somewhat. The
2x5440 count-reduction (this is 3x with proper-fit count) is overly
costly at this size, and imposes a 12.5% waste on top of the slack
from coarse size classing and the base slack from mapping 4096 into a
4672 size class. Getting rid of it, and accounting for existing coarse
size class usage when doing the 7->5 reduction, produce:
4668: 3x5440 3x5440 3x5440 3x5440 5x4672 7x4672 ...
which seems like about the best we can do. The initial allocation of
3x rather than 2x only uses one additional page to get an additional
slot that can be used before needing to mmap again, which is a big win
(essentially that third slot doesn't have any overhead) except in the
case where it's never used, and only a small loss (1 page) even then.
The same thing happens at the next doubling for malloc(8192), and the
same mitigation applies. However with that:
9340: 3x10912 3x10912 3x10912 3x10912 3x9344 7x9344 ...
the coarse size classing is dubious because the size is sufficiently
large that a 7->3 count reduction can be used, with the same count the
coarsse class would have, but with a 28k rather than 32k mapping.
Unfortunately the decision here depends on knowing page size, which
isn't constant at the point where it needs to be made. For integration
with musl, page size is initially known even if it's variable, so we
could possibly make a decision not to use coarse sizing based on that
here, but standalone mallocng lacks that knowledge (page size isn't
known until after first alloc_meta). This might could be reworked.
There's a fairly small range of sizes that would benefit (larger ones
are better off with individual mmap because page size quickly becomes
"finer than" size classes), but the benefit seems fairly significant
(not wasting an extra 1.3k each for the first 12 malloc(8192)'s) at
the sizes where it helps.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [musl] mallocng progress and growth chart
2020-05-16 0:29 ` Rich Felker
@ 2020-05-16 3:29 ` Rich Felker
2020-05-17 3:30 ` Rich Felker
2020-05-18 18:35 ` Rich Felker
1 sibling, 1 reply; 12+ messages in thread
From: Rich Felker @ 2020-05-16 3:29 UTC (permalink / raw)
To: musl
On Fri, May 15, 2020 at 08:29:13PM -0400, Rich Felker wrote:
> The same thing happens at the next doubling for malloc(8192), and the
> same mitigation applies. However with that:
>
> 9340: 3x10912 3x10912 3x10912 3x10912 3x9344 7x9344 ...
>
> the coarse size classing is dubious because the size is sufficiently
> large that a 7->3 count reduction can be used, with the same count the
> coarsse class would have, but with a 28k rather than 32k mapping.
>
> Unfortunately the decision here depends on knowing page size, which
> isn't constant at the point where it needs to be made. For integration
> with musl, page size is initially known even if it's variable, so we
> could possibly make a decision not to use coarse sizing based on that
> here, but standalone mallocng lacks that knowledge (page size isn't
> known until after first alloc_meta). This might could be reworked.
> There's a fairly small range of sizes that would benefit (larger ones
> are better off with individual mmap because page size quickly becomes
> "finer than" size classes), but the benefit seems fairly significant
> (not wasting an extra 1.3k each for the first 12 malloc(8192)'s) at
> the sizes where it helps.
It might just make sense to always disable coarse size classing
starting at this size. The absolute amount of over-allocation is
sufficiently high that it's probably not justified. On archs with
larger page size, more pages may be mapped (e.g. 7x9344 can't be
reduced to 3x if page size is over 4k, and can't be reduced to 5x if
page size is over 16k) but having too much memory mapped is generally
an expected consequence of ridiculous page sizes.
Another possibly horrible idea for dealing with exact page sizes at
low usage: pair groups of short length. Rather than needing to choose
between coarse classing 3x5440 (16k) or a 5x4672 (5 whole slots, 24k)
for a malloc(4095), at low useage we could create a pseudo-2x4672
where the second slot is truncated and contains a nested 1x3264 that's
only freeable together with the outer group. This relation always
works: if size class k is just under a power of two (k == 3 mod 4),
classes k+1 and k-1 add up to just under 2 times class k. (This
follows from n/5 + 2n/7 == (7n+10n)/35 == 17n/35 <= n/2 == 2*n/4,
where n/5, 2n/7, and n/4 are the sizes of class k-1, k+1, and k,
respectively.)
This gives a strategy that always works for allocating very-low-count
of arbitrary size classes, as long as we're willing to allocate a slot
for the complementary size at the same time. And in some ways it's
nicer than coarse classing -- rather than overallocating the requested
slot in hopes that the grouped slots will be useful for larger
allocations too, it allocates a smaller complementary pair in hopes
that the complementary size will be useful.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [musl] mallocng progress and growth chart
2020-05-16 3:29 ` Rich Felker
@ 2020-05-17 3:30 ` Rich Felker
2020-05-18 18:53 ` Rich Felker
0 siblings, 1 reply; 12+ messages in thread
From: Rich Felker @ 2020-05-17 3:30 UTC (permalink / raw)
To: musl
On Fri, May 15, 2020 at 11:29:01PM -0400, Rich Felker wrote:
> On Fri, May 15, 2020 at 08:29:13PM -0400, Rich Felker wrote:
> > The same thing happens at the next doubling for malloc(8192), and the
> > same mitigation applies. However with that:
> >
> > 9340: 3x10912 3x10912 3x10912 3x10912 3x9344 7x9344 ...
> >
> > the coarse size classing is dubious because the size is sufficiently
> > large that a 7->3 count reduction can be used, with the same count the
> > coarsse class would have, but with a 28k rather than 32k mapping.
> >
> > Unfortunately the decision here depends on knowing page size, which
> > isn't constant at the point where it needs to be made. For integration
> > with musl, page size is initially known even if it's variable, so we
> > could possibly make a decision not to use coarse sizing based on that
> > here, but standalone mallocng lacks that knowledge (page size isn't
> > known until after first alloc_meta). This might could be reworked.
> > There's a fairly small range of sizes that would benefit (larger ones
> > are better off with individual mmap because page size quickly becomes
> > "finer than" size classes), but the benefit seems fairly significant
> > (not wasting an extra 1.3k each for the first 12 malloc(8192)'s) at
> > the sizes where it helps.
>
> It might just make sense to always disable coarse size classing
> starting at this size. The absolute amount of over-allocation is
> sufficiently high that it's probably not justified. On archs with
> larger page size, more pages may be mapped (e.g. 7x9344 can't be
> reduced to 3x if page size is over 4k, and can't be reduced to 5x if
> page size is over 16k) but having too much memory mapped is generally
> an expected consequence of ridiculous page sizes.
>
> Another possibly horrible idea for dealing with exact page sizes at
> low usage: pair groups of short length. Rather than needing to choose
> between coarse classing 3x5440 (16k) or a 5x4672 (5 whole slots, 24k)
> for a malloc(4095), at low useage we could create a pseudo-2x4672
> where the second slot is truncated and contains a nested 1x3264 that's
> only freeable together with the outer group. This relation always
> works: if size class k is just under a power of two (k == 3 mod 4),
> classes k+1 and k-1 add up to just under 2 times class k. (This
> follows from n/5 + 2n/7 == (7n+10n)/35 == 17n/35 <= n/2 == 2*n/4,
> where n/5, 2n/7, and n/4 are the sizes of class k-1, k+1, and k,
> respectively.)
>
> This gives a strategy that always works for allocating very-low-count
> of arbitrary size classes, as long as we're willing to allocate a slot
> for the complementary size at the same time. And in some ways it's
> nicer than coarse classing -- rather than overallocating the requested
> slot in hopes that the grouped slots will be useful for larger
> allocations too, it allocates a smaller complementary pair in hopes
> that the complementary size will be useful.
Another alternative for avoiding eagar commit at low usage, which
works for all but nommu: when adding groups with nontrivial slot count
at low usage, don't activate all the slots right away. Reserve vm
space for 7 slots for a 7x4672, but only unprotect the first 2 pages,
and treat it as a group of just 1 slot until there are no slots free
and one is needed. Then, unprotect another page (or more if needed to
fit another slot, as would be needed at larger sizes) and adjust the
slot count to match. (Conceptually; implementation-wise, the slot
count would be fixed, and there would just be a limit on the number of
slots made avilable when transformed from "freed" to "available" for
activation.)
Note that this is what happens anyway with physical memory as clean
anonymous pages are first touched, but (1) doing it without explicit
unprotect over-counts the not-yet-used slots for commit charge
purposes and breaks tightly-memory-constrained environments (global
commit limit or cgroup) and (2) when all slots are initially available
as they are now, repeated free/malloc cycles for the same size will
round-robin all the slots, touching them all.
Here, property (2) is of course desirable for hardening at moderate to
high usage, but at low usage UAF tends to be less of a concern
(because you don't have complex data structures with complex lifetimes
if you hardly have any malloc).
Note also that (2) could be solved without addressing (1) just by
skipping the protection aspect of this idea and only using the
available-slot-limiting part.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [musl] mallocng progress and growth chart
2020-05-16 0:29 ` Rich Felker
2020-05-16 3:29 ` Rich Felker
@ 2020-05-18 18:35 ` Rich Felker
1 sibling, 0 replies; 12+ messages in thread
From: Rich Felker @ 2020-05-18 18:35 UTC (permalink / raw)
To: musl
On Fri, May 15, 2020 at 08:29:13PM -0400, Rich Felker wrote:
> On Sun, May 10, 2020 at 02:09:34PM -0400, Rich Felker wrote:
> > 4668: 2x5440 2x5440 2x5440 2x5440 2x5440 5x4672 5x4672 5x4672 5x4672 5x4672 5x4672 7x4672 ...
>
> This turns out to be just about the worst edge case we have, and in a
> sense one that's fundamental. Sadly there are a number of
> applications, including bash, that do a lot of malloc(4096). The ones
> that just allocate and don't have any complex malloc/free patterns
> will see somewhat higher usage with mallocng, and I don't think
> there's any way around that. (Note: oldmalloc also has problems here
> under certain patterns of alloc/free, due to bin_index vs bin_index_up
> discrepancy!)
Actually oldmalloc "cheats" for the exact case mentioned here, 4096,
because it's an exact size class boundary where
bin_index(4096)==bin_index_up(4096). For any other size (e.g. 4080 or
4097) the catastrophic non-reusable-chunk thing will happen.
This actually suggests that, if trying to improve/salvage a dlmalloc
type design, it would be useful to round up to size classes before
allocating, and to avoid trimming down to exact size. Of course this
would use considerably more memory initially (just like mallocng uses
more than oldmalloc) for straight bump allocation with no free, but it
should give less fragmentation and more-stable usage over time. Of
course there might be new failure modes too when doing that,
especially since split/merge won't preserve the property of being
exact size classes.
An interesting experiment would be to hack up oldmalloc to round up to
size classes, then compare memory usage with mallocng. I suspect we'd
then find mallocng almost always uses less memory.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [musl] mallocng progress and growth chart
2020-05-17 3:30 ` Rich Felker
@ 2020-05-18 18:53 ` Rich Felker
2020-05-25 15:45 ` Pirmin Walthert
0 siblings, 1 reply; 12+ messages in thread
From: Rich Felker @ 2020-05-18 18:53 UTC (permalink / raw)
To: musl
On Sat, May 16, 2020 at 11:30:25PM -0400, Rich Felker wrote:
> Another alternative for avoiding eagar commit at low usage, which
> works for all but nommu: when adding groups with nontrivial slot count
> at low usage, don't activate all the slots right away. Reserve vm
> space for 7 slots for a 7x4672, but only unprotect the first 2 pages,
> and treat it as a group of just 1 slot until there are no slots free
> and one is needed. Then, unprotect another page (or more if needed to
> fit another slot, as would be needed at larger sizes) and adjust the
> slot count to match. (Conceptually; implementation-wise, the slot
> count would be fixed, and there would just be a limit on the number of
> slots made avilable when transformed from "freed" to "available" for
> activation.)
>
> Note that this is what happens anyway with physical memory as clean
> anonymous pages are first touched, but (1) doing it without explicit
> unprotect over-counts the not-yet-used slots for commit charge
> purposes and breaks tightly-memory-constrained environments (global
> commit limit or cgroup) and (2) when all slots are initially available
> as they are now, repeated free/malloc cycles for the same size will
> round-robin all the slots, touching them all.
>
> Here, property (2) is of course desirable for hardening at moderate to
> high usage, but at low usage UAF tends to be less of a concern
> (because you don't have complex data structures with complex lifetimes
> if you hardly have any malloc).
>
> Note also that (2) could be solved without addressing (1) just by
> skipping the protection aspect of this idea and only using the
> available-slot-limiting part.
One abstract way of thinking about the above is that it's just a
per-size-class bump allocator, pre-reserving enough virtual address
space to end sufficiently close to a page boundary that there's no
significant memory waste. This is actually fairly elegant, and might
obsolete some of the other measures taken to avoid overly eagar
allocation. So this might be a worthwhile direction to pursue.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [musl] mallocng progress and growth chart
2020-05-18 18:53 ` Rich Felker
@ 2020-05-25 15:45 ` Pirmin Walthert
2020-05-25 17:54 ` Rich Felker
0 siblings, 1 reply; 12+ messages in thread
From: Pirmin Walthert @ 2020-05-25 15:45 UTC (permalink / raw)
To: musl
Am 18.05.20 um 20:53 schrieb Rich Felker:
> On Sat, May 16, 2020 at 11:30:25PM -0400, Rich Felker wrote:
>> Another alternative for avoiding eagar commit at low usage, which
>> works for all but nommu: when adding groups with nontrivial slot count
>> at low usage, don't activate all the slots right away. Reserve vm
>> space for 7 slots for a 7x4672, but only unprotect the first 2 pages,
>> and treat it as a group of just 1 slot until there are no slots free
>> and one is needed. Then, unprotect another page (or more if needed to
>> fit another slot, as would be needed at larger sizes) and adjust the
>> slot count to match. (Conceptually; implementation-wise, the slot
>> count would be fixed, and there would just be a limit on the number of
>> slots made avilable when transformed from "freed" to "available" for
>> activation.)
>>
>> Note that this is what happens anyway with physical memory as clean
>> anonymous pages are first touched, but (1) doing it without explicit
>> unprotect over-counts the not-yet-used slots for commit charge
>> purposes and breaks tightly-memory-constrained environments (global
>> commit limit or cgroup) and (2) when all slots are initially available
>> as they are now, repeated free/malloc cycles for the same size will
>> round-robin all the slots, touching them all.
>>
>> Here, property (2) is of course desirable for hardening at moderate to
>> high usage, but at low usage UAF tends to be less of a concern
>> (because you don't have complex data structures with complex lifetimes
>> if you hardly have any malloc).
>> c
>> Note also that (2) could be solved without addressing (1) just by
>> skipping the protection aspect of this idea and only using the
>> available-slot-limiting part.
> One abstract way of thinking about the above is that it's just a
> per-size-class bump allocator, pre-reserving enough virtual address
> space to end sufficiently close to a page boundary that there's no
> significant memory waste. This is actually fairly elegant, and might
> obsolete some of the other measures taken to avoid overly eagar
> allocation. So this might be a worthwhile direction to pursue.
Dear Rich,
Currently we use mallocng in production for most applications in our
"embedded like" virtualised system setups, it even helped to find some
bugs (for example in asterisk) as mallocng was less forgiving than the
old malloc implementation. So if you're interested in real world
feedback: everything seems to be running quite smoothly so far, thanks
for this great work.
Currently we use the git version of April 24th, so the version before
you merged the huge optimization changes. As you mentioned in your
"brainstorming mails", if I got them right, that you might rethink a few
of these changes, I'd like to ask: do you think it would be better to
use the current git-master version rather than the version of April 24th
(we are not THAT memory constrained, so stability is the most important
thing) or do you think it would be better to stick on the old version
and wait for the next changes to be merged?
Best regards,
Pirmin
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [musl] mallocng progress and growth chart
2020-05-25 15:45 ` Pirmin Walthert
@ 2020-05-25 17:54 ` Rich Felker
2020-05-25 18:13 ` Pirmin Walthert
0 siblings, 1 reply; 12+ messages in thread
From: Rich Felker @ 2020-05-25 17:54 UTC (permalink / raw)
To: Pirmin Walthert; +Cc: musl
On Mon, May 25, 2020 at 05:45:33PM +0200, Pirmin Walthert wrote:
> Am 18.05.20 um 20:53 schrieb Rich Felker:
>
> >On Sat, May 16, 2020 at 11:30:25PM -0400, Rich Felker wrote:
> >>Another alternative for avoiding eagar commit at low usage, which
> >>works for all but nommu: when adding groups with nontrivial slot count
> >>at low usage, don't activate all the slots right away. Reserve vm
> >>space for 7 slots for a 7x4672, but only unprotect the first 2 pages,
> >>and treat it as a group of just 1 slot until there are no slots free
> >>and one is needed. Then, unprotect another page (or more if needed to
> >>fit another slot, as would be needed at larger sizes) and adjust the
> >>slot count to match. (Conceptually; implementation-wise, the slot
> >>count would be fixed, and there would just be a limit on the number of
> >>slots made avilable when transformed from "freed" to "available" for
> >>activation.)
> >>
> >>Note that this is what happens anyway with physical memory as clean
> >>anonymous pages are first touched, but (1) doing it without explicit
> >>unprotect over-counts the not-yet-used slots for commit charge
> >>purposes and breaks tightly-memory-constrained environments (global
> >>commit limit or cgroup) and (2) when all slots are initially available
> >>as they are now, repeated free/malloc cycles for the same size will
> >>round-robin all the slots, touching them all.
> >>
> >>Here, property (2) is of course desirable for hardening at moderate to
> >>high usage, but at low usage UAF tends to be less of a concern
> >>(because you don't have complex data structures with complex lifetimes
> >>if you hardly have any malloc).
> >>c
> >>Note also that (2) could be solved without addressing (1) just by
> >>skipping the protection aspect of this idea and only using the
> >>available-slot-limiting part.
> >One abstract way of thinking about the above is that it's just a
> >per-size-class bump allocator, pre-reserving enough virtual address
> >space to end sufficiently close to a page boundary that there's no
> >significant memory waste. This is actually fairly elegant, and might
> >obsolete some of the other measures taken to avoid overly eagar
> >allocation. So this might be a worthwhile direction to pursue.
>
> Dear Rich,
>
> Currently we use mallocng in production for most applications in our
> "embedded like" virtualised system setups, it even helped to find
> some bugs (for example in asterisk) as mallocng was less forgiving
> than the old malloc implementation. So if you're interested in real
> world feedback: everything seems to be running quite smoothly so
> far, thanks for this great work.
>
> Currently we use the git version of April 24th, so the version
> before you merged the huge optimization changes. As you mentioned in
> your "brainstorming mails", if I got them right, that you might
> rethink a few of these changes, I'd like to ask: do you think it
> would be better to use the current git-master version rather than
> the version of April 24th (we are not THAT memory constrained, so
> stability is the most important thing) or do you think it would be
> better to stick on the old version and wait for the next changes to
> be merged?
Thanks for the feedback!
Which are the "huge optimization changes" you're wondering about?
Indeed there's a large series of commits after the version you're
using but I think you're possibly misattributing them.
A number of the commits are bug fixes -- mostly not for hard bugs, but
for unwanted and unintended behaviors:
a709dde fix unexpected allocation of 7x144 group in non-power-of-two slot
dda5a88 fix exact size tracking in memalign
915a914 adjust several size classes to fix nested groups of non-power-of-2 size
7acd61e allow in-place realloc when ideal size class is off-by-one
caca917 add support for aligned_alloc alignments 1M and over
There were also quite a few around an idea that didn't go well and was
mostly reverted, but with major improvements to the original behavior:
5bff93c overhaul bounce counter to work with map sizes instead of size classes
71262cd tune bounce counter to avoid triggering early
9601aaa prevent overflow of unmap counter part of bounce counter
aca1f32 don't let the mmap cache limit grow unboundedly or overflow
6fbee31 second partial overhaul of bounce counter system
150de6e revert from map cache to old okay_to_free scheme, but improved
1e972da initial conversion of bounce counting to use sequence numbers, decay
e3eecb1 factor bounce/sequence counter logic into meta.h
6693738 account seq for individually-mmapped allocations above hard threshold
4443f64 fix complete regression (malloc always fails) on variable-pagesize archs
If you don't care about low usage, that whole change series is fairly
unimportant, but should be harmless. It just changes decisions about
choices where either choice produces as valid state for the allocator
but there are tradeoffs between memory usage and performance. The new
behavior should be better, though.
A few commits were reordering the dependency between memalign and the
standard memalign-variant functions, which is a minor namespace
detail:
da4c88e rename aligned_alloc.c
04407f7 reverse dependency order of memalign and aligned_alloc
74e6657 rename aligned_alloc source file back to its proper name
c990cb1 rename memalign source file back to its proper name
A couple were hardening:
5bf4e92 clear group header pointer to meta when freeing groups
bd04c75 in get_meta, check offset against maplen (minor hardening)
77cea57 add support for allocating meta areas via legacy brk
And pretty much all the rest of the changes are tuning behavior for
"optimization" of some sort or another, which may be what you were
referring to:
26143c4 limit slot count growth to 25% instead of 50% of current usage in class
a9187f0 remove unnecessary optimization tuning flags from Makefile CFLAGS
045cc04 move coarse size classing logic to malloc fast path
8348a82 eliminate med_twos_tab
e619034 allow slot count 1 for size classes 3 mod 4 as natural/first-class
c9d54f4 activate coarse size classing for small classes down to 4 (but not 6)
44092d8 improve individual-mmap decision
d355eaf remove slot count reduction to 1 for size classes 1 mod 3
c555ebe fix off-by-one in logic to use single-slot groups
9d5ec34 switch from MADV_DONTNEED to MADV_FREE for large free slots
584c7aa avoid over-use of reduced-count groups due to coarse size classing
f9bfb0a increase threshold for 3->2 slot reduction to 16 pages
20da09e disable coarse size classing for large classes (over 8k)
I don't think any of these changes are potentially obsoleted by
further ideas in the above thread. I am working on delaying activation
of slots until they're actually needed, so that we don't dirty pages
we could avoid touching, but I proposed this as an alternative to
other more complex tricks that I didn't really like, which have not
been implemented and probably won't be now.
So, in summary, I don't see any good reason not to go with latest.
Rich
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [musl] mallocng progress and growth chart
2020-05-25 17:54 ` Rich Felker
@ 2020-05-25 18:13 ` Pirmin Walthert
2020-06-04 7:04 ` Pirmin Walthert
0 siblings, 1 reply; 12+ messages in thread
From: Pirmin Walthert @ 2020-05-25 18:13 UTC (permalink / raw)
To: Rich Felker; +Cc: musl
Am 25.05.20 um 19:54 schrieb Rich Felker:
> On Mon, May 25, 2020 at 05:45:33PM +0200, Pirmin Walthert wrote:
>> Am 18.05.20 um 20:53 schrieb Rich Felker:
>>
>>> On Sat, May 16, 2020 at 11:30:25PM -0400, Rich Felker wrote:
>>>> Another alternative for avoiding eagar commit at low usage, which
>>>> works for all but nommu: when adding groups with nontrivial slot count
>>>> at low usage, don't activate all the slots right away. Reserve vm
>>>> space for 7 slots for a 7x4672, but only unprotect the first 2 pages,
>>>> and treat it as a group of just 1 slot until there are no slots free
>>>> and one is needed. Then, unprotect another page (or more if needed to
>>>> fit another slot, as would be needed at larger sizes) and adjust the
>>>> slot count to match. (Conceptually; implementation-wise, the slot
>>>> count would be fixed, and there would just be a limit on the number of
>>>> slots made avilable when transformed from "freed" to "available" for
>>>> activation.)
>>>>
>>>> Note that this is what happens anyway with physical memory as clean
>>>> anonymous pages are first touched, but (1) doing it without explicit
>>>> unprotect over-counts the not-yet-used slots for commit charge
>>>> purposes and breaks tightly-memory-constrained environments (global
>>>> commit limit or cgroup) and (2) when all slots are initially available
>>>> as they are now, repeated free/malloc cycles for the same size will
>>>> round-robin all the slots, touching them all.
>>>>
>>>> Here, property (2) is of course desirable for hardening at moderate to
>>>> high usage, but at low usage UAF tends to be less of a concern
>>>> (because you don't have complex data structures with complex lifetimes
>>>> if you hardly have any malloc).
>>>> c
>>>> Note also that (2) could be solved without addressing (1) just by
>>>> skipping the protection aspect of this idea and only using the
>>>> available-slot-limiting part.
>>> One abstract way of thinking about the above is that it's just a
>>> per-size-class bump allocator, pre-reserving enough virtual address
>>> space to end sufficiently close to a page boundary that there's no
>>> significant memory waste. This is actually fairly elegant, and might
>>> obsolete some of the other measures taken to avoid overly eagar
>>> allocation. So this might be a worthwhile direction to pursue.
>> Dear Rich,
>>
>> Currently we use mallocng in production for most applications in our
>> "embedded like" virtualised system setups, it even helped to find
>> some bugs (for example in asterisk) as mallocng was less forgiving
>> than the old malloc implementation. So if you're interested in real
>> world feedback: everything seems to be running quite smoothly so
>> far, thanks for this great work.
>>
>> Currently we use the git version of April 24th, so the version
>> before you merged the huge optimization changes. As you mentioned in
>> your "brainstorming mails", if I got them right, that you might
>> rethink a few of these changes, I'd like to ask: do you think it
>> would be better to use the current git-master version rather than
>> the version of April 24th (we are not THAT memory constrained, so
>> stability is the most important thing) or do you think it would be
>> better to stick on the old version and wait for the next changes to
>> be merged?
> Thanks for the feedback!
>
> Which are the "huge optimization changes" you're wondering about?
> Indeed there's a large series of commits after the version you're
> using but I think you're possibly misattributing them.
>
> A number of the commits are bug fixes -- mostly not for hard bugs, but
> for unwanted and unintended behaviors:
>
> a709dde fix unexpected allocation of 7x144 group in non-power-of-two slot
> dda5a88 fix exact size tracking in memalign
> 915a914 adjust several size classes to fix nested groups of non-power-of-2 size
> 7acd61e allow in-place realloc when ideal size class is off-by-one
> caca917 add support for aligned_alloc alignments 1M and over
>
> There were also quite a few around an idea that didn't go well and was
> mostly reverted, but with major improvements to the original behavior:
>
> 5bff93c overhaul bounce counter to work with map sizes instead of size classes
> 71262cd tune bounce counter to avoid triggering early
> 9601aaa prevent overflow of unmap counter part of bounce counter
> aca1f32 don't let the mmap cache limit grow unboundedly or overflow
> 6fbee31 second partial overhaul of bounce counter system
> 150de6e revert from map cache to old okay_to_free scheme, but improved
> 1e972da initial conversion of bounce counting to use sequence numbers, decay
> e3eecb1 factor bounce/sequence counter logic into meta.h
> 6693738 account seq for individually-mmapped allocations above hard threshold
> 4443f64 fix complete regression (malloc always fails) on variable-pagesize archs
>
> If you don't care about low usage, that whole change series is fairly
> unimportant, but should be harmless. It just changes decisions about
> choices where either choice produces as valid state for the allocator
> but there are tradeoffs between memory usage and performance. The new
> behavior should be better, though.
>
> A few commits were reordering the dependency between memalign and the
> standard memalign-variant functions, which is a minor namespace
> detail:
>
> da4c88e rename aligned_alloc.c
> 04407f7 reverse dependency order of memalign and aligned_alloc
> 74e6657 rename aligned_alloc source file back to its proper name
> c990cb1 rename memalign source file back to its proper name
>
> A couple were hardening:
>
> 5bf4e92 clear group header pointer to meta when freeing groups
> bd04c75 in get_meta, check offset against maplen (minor hardening)
> 77cea57 add support for allocating meta areas via legacy brk
>
> And pretty much all the rest of the changes are tuning behavior for
> "optimization" of some sort or another, which may be what you were
> referring to:
>
> 26143c4 limit slot count growth to 25% instead of 50% of current usage in class
> a9187f0 remove unnecessary optimization tuning flags from Makefile CFLAGS
> 045cc04 move coarse size classing logic to malloc fast path
> 8348a82 eliminate med_twos_tab
> e619034 allow slot count 1 for size classes 3 mod 4 as natural/first-class
> c9d54f4 activate coarse size classing for small classes down to 4 (but not 6)
> 44092d8 improve individual-mmap decision
> d355eaf remove slot count reduction to 1 for size classes 1 mod 3
> c555ebe fix off-by-one in logic to use single-slot groups
> 9d5ec34 switch from MADV_DONTNEED to MADV_FREE for large free slots
> 584c7aa avoid over-use of reduced-count groups due to coarse size classing
> f9bfb0a increase threshold for 3->2 slot reduction to 16 pages
> 20da09e disable coarse size classing for large classes (over 8k)
>
> I don't think any of these changes are potentially obsoleted by
> further ideas in the above thread. I am working on delaying activation
> of slots until they're actually needed, so that we don't dirty pages
> we could avoid touching, but I proposed this as an alternative to
> other more complex tricks that I didn't really like, which have not
> been implemented and probably won't be now.
>
> So, in summary, I don't see any good reason not to go with latest.
>
> Rich
Many thanks for your detailed answer. I'll give it a try then!
Pirmin
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [musl] mallocng progress and growth chart
2020-05-25 18:13 ` Pirmin Walthert
@ 2020-06-04 7:04 ` Pirmin Walthert
2020-06-04 15:11 ` Rich Felker
0 siblings, 1 reply; 12+ messages in thread
From: Pirmin Walthert @ 2020-06-04 7:04 UTC (permalink / raw)
To: musl
Am 25.05.20 um 20:13 schrieb Pirmin Walthert:
> Am 25.05.20 um 19:54 schrieb Rich Felker:
>
>> On Mon, May 25, 2020 at 05:45:33PM +0200, Pirmin Walthert wrote:
>>> Am 18.05.20 um 20:53 schrieb Rich Felker:
>>>
>>>> On Sat, May 16, 2020 at 11:30:25PM -0400, Rich Felker wrote:
>>>>> Another alternative for avoiding eagar commit at low usage, which
>>>>> works for all but nommu: when adding groups with nontrivial slot
>>>>> count
>>>>> at low usage, don't activate all the slots right away. Reserve vm
>>>>> space for 7 slots for a 7x4672, but only unprotect the first 2 pages,
>>>>> and treat it as a group of just 1 slot until there are no slots free
>>>>> and one is needed. Then, unprotect another page (or more if needed to
>>>>> fit another slot, as would be needed at larger sizes) and adjust the
>>>>> slot count to match. (Conceptually; implementation-wise, the slot
>>>>> count would be fixed, and there would just be a limit on the
>>>>> number of
>>>>> slots made avilable when transformed from "freed" to "available" for
>>>>> activation.)
>>>>>
>>>>> Note that this is what happens anyway with physical memory as clean
>>>>> anonymous pages are first touched, but (1) doing it without explicit
>>>>> unprotect over-counts the not-yet-used slots for commit charge
>>>>> purposes and breaks tightly-memory-constrained environments (global
>>>>> commit limit or cgroup) and (2) when all slots are initially
>>>>> available
>>>>> as they are now, repeated free/malloc cycles for the same size will
>>>>> round-robin all the slots, touching them all.
>>>>>
>>>>> Here, property (2) is of course desirable for hardening at
>>>>> moderate to
>>>>> high usage, but at low usage UAF tends to be less of a concern
>>>>> (because you don't have complex data structures with complex
>>>>> lifetimes
>>>>> if you hardly have any malloc).
>>>>> c
>>>>> Note also that (2) could be solved without addressing (1) just by
>>>>> skipping the protection aspect of this idea and only using the
>>>>> available-slot-limiting part.
>>>> One abstract way of thinking about the above is that it's just a
>>>> per-size-class bump allocator, pre-reserving enough virtual address
>>>> space to end sufficiently close to a page boundary that there's no
>>>> significant memory waste. This is actually fairly elegant, and might
>>>> obsolete some of the other measures taken to avoid overly eagar
>>>> allocation. So this might be a worthwhile direction to pursue.
>>> Dear Rich,
>>>
>>> Currently we use mallocng in production for most applications in our
>>> "embedded like" virtualised system setups, it even helped to find
>>> some bugs (for example in asterisk) as mallocng was less forgiving
>>> than the old malloc implementation. So if you're interested in real
>>> world feedback: everything seems to be running quite smoothly so
>>> far, thanks for this great work.
>>>
>>> Currently we use the git version of April 24th, so the version
>>> before you merged the huge optimization changes. As you mentioned in
>>> your "brainstorming mails", if I got them right, that you might
>>> rethink a few of these changes, I'd like to ask: do you think it
>>> would be better to use the current git-master version rather than
>>> the version of April 24th (we are not THAT memory constrained, so
>>> stability is the most important thing) or do you think it would be
>>> better to stick on the old version and wait for the next changes to
>>> be merged?
>> Thanks for the feedback!
>>
>> Which are the "huge optimization changes" you're wondering about?
>> Indeed there's a large series of commits after the version you're
>> using but I think you're possibly misattributing them.
>>
>> A number of the commits are bug fixes -- mostly not for hard bugs, but
>> for unwanted and unintended behaviors:
>>
>> a709dde fix unexpected allocation of 7x144 group in non-power-of-two
>> slot
>> dda5a88 fix exact size tracking in memalign
>> 915a914 adjust several size classes to fix nested groups of
>> non-power-of-2 size
>> 7acd61e allow in-place realloc when ideal size class is off-by-one
>> caca917 add support for aligned_alloc alignments 1M and over
>>
>> There were also quite a few around an idea that didn't go well and was
>> mostly reverted, but with major improvements to the original behavior:
>>
>> 5bff93c overhaul bounce counter to work with map sizes instead of
>> size classes
>> 71262cd tune bounce counter to avoid triggering early
>> 9601aaa prevent overflow of unmap counter part of bounce counter
>> aca1f32 don't let the mmap cache limit grow unboundedly or overflow
>> 6fbee31 second partial overhaul of bounce counter system
>> 150de6e revert from map cache to old okay_to_free scheme, but improved
>> 1e972da initial conversion of bounce counting to use sequence
>> numbers, decay
>> e3eecb1 factor bounce/sequence counter logic into meta.h
>> 6693738 account seq for individually-mmapped allocations above hard
>> threshold
>> 4443f64 fix complete regression (malloc always fails) on
>> variable-pagesize archs
>>
>> If you don't care about low usage, that whole change series is fairly
>> unimportant, but should be harmless. It just changes decisions about
>> choices where either choice produces as valid state for the allocator
>> but there are tradeoffs between memory usage and performance. The new
>> behavior should be better, though.
>>
>> A few commits were reordering the dependency between memalign and the
>> standard memalign-variant functions, which is a minor namespace
>> detail:
>>
>> da4c88e rename aligned_alloc.c
>> 04407f7 reverse dependency order of memalign and aligned_alloc
>> 74e6657 rename aligned_alloc source file back to its proper name
>> c990cb1 rename memalign source file back to its proper name
>>
>> A couple were hardening:
>>
>> 5bf4e92 clear group header pointer to meta when freeing groups
>> bd04c75 in get_meta, check offset against maplen (minor hardening)
>> 77cea57 add support for allocating meta areas via legacy brk
>>
>> And pretty much all the rest of the changes are tuning behavior for
>> "optimization" of some sort or another, which may be what you were
>> referring to:
>>
>> 26143c4 limit slot count growth to 25% instead of 50% of current
>> usage in class
>> a9187f0 remove unnecessary optimization tuning flags from Makefile
>> CFLAGS
>> 045cc04 move coarse size classing logic to malloc fast path
>> 8348a82 eliminate med_twos_tab
>> e619034 allow slot count 1 for size classes 3 mod 4 as
>> natural/first-class
>> c9d54f4 activate coarse size classing for small classes down to 4
>> (but not 6)
>> 44092d8 improve individual-mmap decision
>> d355eaf remove slot count reduction to 1 for size classes 1 mod 3
>> c555ebe fix off-by-one in logic to use single-slot groups
>> 9d5ec34 switch from MADV_DONTNEED to MADV_FREE for large free slots
>> 584c7aa avoid over-use of reduced-count groups due to coarse size
>> classing
>> f9bfb0a increase threshold for 3->2 slot reduction to 16 pages
>> 20da09e disable coarse size classing for large classes (over 8k)
>>
>> I don't think any of these changes are potentially obsoleted by
>> further ideas in the above thread. I am working on delaying activation
>> of slots until they're actually needed, so that we don't dirty pages
>> we could avoid touching, but I proposed this as an alternative to
>> other more complex tricks that I didn't really like, which have not
>> been implemented and probably won't be now.
>>
>> So, in summary, I don't see any good reason not to go with latest.
>>
>> Rich
>
> Many thanks for your detailed answer. I'll give it a try then!
>
> Pirmin
>
FYI: everything working without any issues so far on about 150 systems
(using the most current version from 27th of May). Even got around an
OOM issue during DOS that could be reproduced with the old malloc
implementation (maybe because of the fragmentation issue?). Server
software running on these systems: lighttpd, php-fpm 7.3, asterisk 16,
slapd, isc-dhcpd, dropbear
Pirmin
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [musl] mallocng progress and growth chart
2020-06-04 7:04 ` Pirmin Walthert
@ 2020-06-04 15:11 ` Rich Felker
0 siblings, 0 replies; 12+ messages in thread
From: Rich Felker @ 2020-06-04 15:11 UTC (permalink / raw)
To: musl
On Thu, Jun 04, 2020 at 09:04:47AM +0200, Pirmin Walthert wrote:
> FYI: everything working without any issues so far on about 150
> systems (using the most current version from 27th of May). Even got
> around an OOM issue during DOS that could be reproduced with the old
> malloc implementation (maybe because of the fragmentation issue?).
> Server software running on these systems: lighttpd, php-fpm 7.3,
> asterisk 16, slapd, isc-dhcpd, dropbear
Thanks for the update! That's really good to hear.
Rich
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2020-06-04 15:11 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-10 18:09 [musl] mallocng progress and growth chart Rich Felker
2020-05-11 17:51 ` Rich Felker
2020-05-16 0:29 ` Rich Felker
2020-05-16 3:29 ` Rich Felker
2020-05-17 3:30 ` Rich Felker
2020-05-18 18:53 ` Rich Felker
2020-05-25 15:45 ` Pirmin Walthert
2020-05-25 17:54 ` Rich Felker
2020-05-25 18:13 ` Pirmin Walthert
2020-06-04 7:04 ` Pirmin Walthert
2020-06-04 15:11 ` Rich Felker
2020-05-18 18:35 ` Rich Felker
Code repositories for project(s) associated with this public inbox
https://git.vuxu.org/mirror/musl/
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).