* [musl] "Expected behavior" for mallocng @ low usage @ 2020-04-08 2:32 Rich Felker 2020-04-10 15:48 ` Rich Felker 0 siblings, 1 reply; 5+ messages in thread From: Rich Felker @ 2020-04-08 2:32 UTC (permalink / raw) To: musl I figured as I tune and prepare to integrate the new malloc, it would be helpful to have a description of what users should see in programs that make only small use of malloc, since this is the "hard case" to get right and since any reports of unexpected behavior would be really useful. For simplicity I'm going to assume 4k pages. If the program makes at least one "very small" (<= 108 bytes) allocation, or at least one allocation of certain other sizes smaller than the page size, you should expect to see a single-page mmap that's divided up something like a buddy allocator. At the top level it consists of 2 2032-byte slots, one of which will be broken into two 1008-byte slots, one of which will be broken into 2 496-byte slots. For "very small" sizes, one of these will in turn be broken up into N equal-sized slots for the requested size class (N=30, 15, 10, 7, 6, 5, or 4). If the page is fully broken up into pairs of 496-byte slots, there are 8 such slots, and only 7 "very small" size classes, so under "very low usage", all such objects should fit in the single page, even if you use a multitude of different sizes. For the next 8 size classes (2 doublings) up to 492 bytes, and depending on divisibility, a group of 2, 3, 5, or 7 slots will be created in a slot of size 496, 1008, or 2032. These can use the same page as the above smaller sizes if there's room available. Above this size, coarse size classing is used at first (until usage reaches a threshold) to avoid allocating a large number of many-slot groups of slightly different sizes that might never be filled. The next doubling consists only of ranges [493,668] and [669,1004], allocated in slots of size 2032 in groups of 3 and 2, respectively; these can use any existing free slot of size 2032. (Once usage has reached a threshold such that adding a group of 5 or 7 slots doesn't cause a dramatic relative increase in total usage, finer-grained size classes will be used.) At higher sizes, groups of slots are not allocated inside a larger slot, but as mmaps consisting of a power-of-two number of pages, which will be split N ways, initially with N=7, 3, 5, or 2 depending on divisibility. As usage increases, so does the N (doubling the number of pages used) which reduces potential for vm space fragmentation and increases the number of slots that can be allocated/freed with fast paths manipulating free masks. At present, coarse size classing is used for all these at first, which can result in significant "waste" but avoids preallocating large (5 or 7) counts of slots that might not ever be used. This is what I'm presently working to improve by allowing direct individual mmaps in cases where they can be efficient. Changes in this area are likely coming soon. Main thing I'm trying to solve still is getting eagar allocation down even further so that small programs don't grow significantly when switching to mallocng. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [musl] "Expected behavior" for mallocng @ low usage 2020-04-08 2:32 [musl] "Expected behavior" for mallocng @ low usage Rich Felker @ 2020-04-10 15:48 ` Rich Felker 2020-04-10 17:04 ` [musl] Concrete case-by-case examples of " Rich Felker 0 siblings, 1 reply; 5+ messages in thread From: Rich Felker @ 2020-04-10 15:48 UTC (permalink / raw) To: musl On Tue, Apr 07, 2020 at 10:32:29PM -0400, Rich Felker wrote: > I figured as I tune and prepare to integrate the new malloc, it would > be helpful to have a description of what users should see in programs > that make only small use of malloc, since this is the "hard case" to > get right and since any reports of unexpected behavior would be really > useful. For simplicity I'm going to assume 4k pages. > > If the program makes at least one "very small" (<= 108 bytes) > allocation, or at least one allocation of certain other sizes smaller > than the page size, you should expect to see a single-page mmap that's > divided up something like a buddy allocator. At the top level it > consists of 2 2032-byte slots, one of which will be broken into two > 1008-byte slots, one of which will be broken into 2 496-byte slots. > For "very small" sizes, one of these will in turn be broken up into N > equal-sized slots for the requested size class (N=30, 15, 10, 7, 6, 5, > or 4). > > If the page is fully broken up into pairs of 496-byte slots, there are > 8 such slots, and only 7 "very small" size classes, so under "very low > usage", all such objects should fit in the single page, even if you > use a multitude of different sizes. > > For the next 8 size classes (2 doublings) up to 492 bytes, and > depending on divisibility, a group of 2, 3, 5, or 7 slots will be > created in a slot of size 496, 1008, or 2032. These can use the same > page as the above smaller sizes if there's room available. > > Above this size, coarse size classing is used at first (until usage > reaches a threshold) to avoid allocating a large number of many-slot > groups of slightly different sizes that might never be filled. The > next doubling consists only of ranges [493,668] and [669,1004], > allocated in slots of size 2032 in groups of 3 and 2, respectively; > these can use any existing free slot of size 2032. (Once usage has > reached a threshold such that adding a group of 5 or 7 slots doesn't > cause a dramatic relative increase in total usage, finer-grained size > classes will be used.) > > At higher sizes, groups of slots are not allocated inside a larger > slot, but as mmaps consisting of a power-of-two number of pages, which > will be split N ways, initially with N=7, 3, 5, or 2 depending on > divisibility. As usage increases, so does the N (doubling the number > of pages used) which reduces potential for vm space fragmentation and > increases the number of slots that can be allocated/freed with fast > paths manipulating free masks. > > At present, coarse size classing is used for all these at first, which > can result in significant "waste" but avoids preallocating large (5 or > 7) counts of slots that might not ever be used. This is what I'm > presently working to improve by allowing direct individual mmaps in > cases where they can be efficient. > > Changes in this area are likely coming soon. Main thing I'm trying to > solve still is getting eagar allocation down even further so that > small programs don't grow significantly when switching to mallocng. I've made some changes that should improve this significantly: https://github.com/richfelker/mallocng-draft/commits/2b676e6859ca241b62ed133fb1a8828509af29bf For the "higher sizes" above, the minimums of 7, 3, 5, or 2 are reduced to 3 or 5, 1 or 2, 3, or 1, respectively, conditional on size being sufficient that these lower minumum counts approximate an integral number of pages (the requirement that it be a power-of-two number of pages is relaxed). In particular with coarse size classing, the initial/minimal count will never be more than 2, and will always be 1 if the slot size is at least 4 pages minus epsilon. Also, individual mmap servicing of allocations in this range is now possible. This differs from "just allocating a group with count 1", in that, if the requested size was significantly less than the slot size, only the requested part (rounded up to whole pages) is mapped. Such allocations can't be kept and reused, since they might not suffice for a future allocation in the same size class. For this reason, individual mmap shouldn't be used for sizes where "unmap and remap" is likely to dominate time cost. They also shouldn't be used except a in a small bounded number of instances to avoid vm space fragmentation (and hitting vma limit). Details of how these conditions are implemented can be seen in the source. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [musl] Concrete case-by-case examples of mallocng @ low usage 2020-04-10 15:48 ` Rich Felker @ 2020-04-10 17:04 ` Rich Felker 2020-04-10 17:21 ` Rich Felker 2020-05-03 3:03 ` Rich Felker 0 siblings, 2 replies; 5+ messages in thread From: Rich Felker @ 2020-04-10 17:04 UTC (permalink / raw) To: musl The following are tables of initial allocation behavior for all sizes up through 16k assuming 4k page size. The fine classing figures won't actually be used for initial allocations, but I've included them for comparison to illustrate why coarse classes are used. These numbers were all done by hand independently of the code, so that if actual behavior differs, it indicates a bug in one or the other. First 16 size classes, no coarse classing: 0-12: 2x2032 -> 2x1008 -> 2x496 -> 30x16 13-28: 2x2032 -> 2x1008 -> 2x496 -> 15x32 29-44: 2x2032 -> 2x1008 -> 2x496 -> 10x48 45-60: 2x2032 -> 2x1008 -> 2x496 -> 7x64 61-76: 2x2032 -> 2x1008 -> 2x496 -> 6x80 77-92: 2x2032 -> 2x1008 -> 2x496 -> 5x96 93-108: 2x2032 -> 2x1008 -> 2x496 -> 4x112 109-124: 2x2032 -> 2x1008 -> 7x128 125-140: 2x2032 -> 2x1008 -> 7x144 141-156: 2x2032 -> 2x1008 -> 2x496 -> 3x160 157-188: 2x2032 -> 2x1008 -> 5x192 189-236: 2x2032 -> 2x1008 -> 2x496 -> 2x240 237-272: 2x2032 -> 7x288 273-332: 2x2032 -> 2x1008 -> 3x336 333-396: 2x2032 -> 5x400 397-492: 2x2032 -> 2x1008 -> 2x496 Coarse classing for classes 16-31: 493-668: 2x2032 -> 3x672 669-1004: 2x2032 -> 2x1008 1005-1356: 3x1360 (1 page) 1357-2028: 2x2032 (1 page) 2029-2716: 3x2720 (2 pages) 2717-4076: 1x4080 (1 page) 4077-5452: 2x5456 (3 pages) 5453-8172: 1x8176 (2 pages) Fine classing for classes 16-31: 493-572: 7x576 573-668: 2x2032 -> 3x672 669-812: 5x816 813-1004: 2x2032 -> 2x1008 1005-1164: 7x1168 (2 pages) 1165-1356: 3x1360 (1 page) 1357-1628: 5x1632 (2 pages) 1629-2028: 2x2032 (1 page) 2029-2332: 5x2336 (3 pages) 2333-2716: 3x2720 (2 pages) 2717-3260: 5x3264 (4 pages) 3261-4076: 1x4080 (1 page) 4077-4668: 5x4672 (6 pages) 4669-5452: 2x5456 (3 pages) 5453-6540: 3x6544 (5 pages) 6541-8172: 1x8176 (2 pages) Individual mmap for classes 32+: 8173-12268: 3 pages 12269-16364: 4 pages 16365-20460: 5 pages ... And for comparison to without individual mmap, fine classes: 8173-9340: 3x9344 (7 pages) 9341-10908: 1x10912 (3 pages) 10909-13084: 3x13088 (10 pages) 13085-16364: 1x16386 (4 pages) 16365-18700: 3x18704 (14 pages) 18701-21820: 1x21824 (6 pages) ... ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [musl] Concrete case-by-case examples of mallocng @ low usage 2020-04-10 17:04 ` [musl] Concrete case-by-case examples of " Rich Felker @ 2020-04-10 17:21 ` Rich Felker 2020-05-03 3:03 ` Rich Felker 1 sibling, 0 replies; 5+ messages in thread From: Rich Felker @ 2020-04-10 17:21 UTC (permalink / raw) To: musl On Fri, Apr 10, 2020 at 01:04:07PM -0400, Rich Felker wrote: > The following are tables of initial allocation behavior for all sizes > up through 16k assuming 4k page size. The fine classing figures won't > actually be used for initial allocations, but I've included them for > comparison to illustrate why coarse classes are used. These numbers > were all done by hand independently of the code, so that if actual > behavior differs, it indicates a bug in one or the other. > > > First 16 size classes, no coarse classing: > > 0-12: 2x2032 -> 2x1008 -> 2x496 -> 30x16 > 13-28: 2x2032 -> 2x1008 -> 2x496 -> 15x32 > 29-44: 2x2032 -> 2x1008 -> 2x496 -> 10x48 > 45-60: 2x2032 -> 2x1008 -> 2x496 -> 7x64 > 61-76: 2x2032 -> 2x1008 -> 2x496 -> 6x80 > 77-92: 2x2032 -> 2x1008 -> 2x496 -> 5x96 > 93-108: 2x2032 -> 2x1008 -> 2x496 -> 4x112 > 109-124: 2x2032 -> 2x1008 -> 7x128 > > 125-140: 2x2032 -> 2x1008 -> 7x144 > 141-156: 2x2032 -> 2x1008 -> 2x496 -> 3x160 > 157-188: 2x2032 -> 2x1008 -> 5x192 > 189-236: 2x2032 -> 2x1008 -> 2x496 -> 2x240 > > 237-272: 2x2032 -> 7x288 > 273-332: 2x2032 -> 2x1008 -> 3x336 These should read: 237-284: 2x2032 -> 7x288 285-332: 2x2032 -> 2x1008 -> 3x336 Not sure why I can't subtract 288-4... ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [musl] Concrete case-by-case examples of mallocng @ low usage 2020-04-10 17:04 ` [musl] Concrete case-by-case examples of " Rich Felker 2020-04-10 17:21 ` Rich Felker @ 2020-05-03 3:03 ` Rich Felker 1 sibling, 0 replies; 5+ messages in thread From: Rich Felker @ 2020-05-03 3:03 UTC (permalink / raw) To: musl On Fri, Apr 10, 2020 at 01:04:07PM -0400, Rich Felker wrote: > The following are tables of initial allocation behavior for all sizes > up through 16k assuming 4k page size. The fine classing figures won't > actually be used for initial allocations, but I've included them for > comparison to illustrate why coarse classes are used. These numbers > were all done by hand independently of the code, so that if actual > behavior differs, it indicates a bug in one or the other. > > > First 16 size classes, no coarse classing: > > 0-12: 2x2032 -> 2x1008 -> 2x496 -> 30x16 > 13-28: 2x2032 -> 2x1008 -> 2x496 -> 15x32 > 29-44: 2x2032 -> 2x1008 -> 2x496 -> 10x48 > 45-60: 2x2032 -> 2x1008 -> 2x496 -> 7x64 > 61-76: 2x2032 -> 2x1008 -> 2x496 -> 6x80 > 77-92: 2x2032 -> 2x1008 -> 2x496 -> 5x96 > 93-108: 2x2032 -> 2x1008 -> 2x496 -> 4x112 > 109-124: 2x2032 -> 2x1008 -> 7x128 > > 125-140: 2x2032 -> 2x1008 -> 7x144 This line is a math error present in the actual code. 7x144 fit in 1008, but not with room for the header of one extra unit (16b). Thus 7x144 get allocated inside a 1168 slot instead. This badly harms memory usage in small programs since the smallest 1168 group is 8k and the group may not get used for anything but satisfying a single malloc(128). For now I'll probably just make it special-case this one as 6x144 instead of 7x, but that wastes space so a better solution should be found. That might be applying coarse size classing so that this size class can start at 14x144 once it switches to fine classing. Rich ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2020-05-03 3:03 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-04-08 2:32 [musl] "Expected behavior" for mallocng @ low usage Rich Felker 2020-04-10 15:48 ` Rich Felker 2020-04-10 17:04 ` [musl] Concrete case-by-case examples of " Rich Felker 2020-04-10 17:21 ` Rich Felker 2020-05-03 3:03 ` Rich Felker
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/musl/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).