From mboxrd@z Thu Jan  1 00:00:00 1970
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
Reply-To: musl@lists.openwall.com
Date: Sat, 4 Apr 2020 22:20:23 -0400
From: Rich Felker
To: musl@lists.openwall.com
Message-ID: <20200405022023.GI11469@brightrain.aerifal.cx>
References: <20200403213110.GD11469@brightrain.aerifal.cx>
 <20200404025554.GG11469@brightrain.aerifal.cx>
 <20200404181948.GH11469@brightrain.aerifal.cx>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20200404181948.GH11469@brightrain.aerifal.cx>
User-Agent: Mutt/1.5.21 (2010-09-15)
Subject: Re: [musl] New malloc tuning for low usage

On Sat, Apr 04, 2020 at 02:19:48PM -0400, Rich Felker wrote:
> On Fri, Apr 03, 2020 at 10:55:54PM -0400, Rich Felker wrote:
> > In working on this, I noticed that it looks like the coarse size
> > class threshold (6) in top-level malloc() is too low. At that
> > threshold, the first fine-grained-class group allocation will be
> > roughly a 100% increase in memory usage by the class; I'd rather
> > keep the relative increase bounded by 50% or less.
> > It should probably be something more like 10 or 12 to achieve
> > this. With 12, repeated allocations of 16k first produce 7
> > individual 20k mmaps, then a 3-slot class-37 (21824-byte slots)
> > group, then a 7-slot class-36 (18704-byte slots) group.
> >
> > One thing that's not clear to me is whether it's useful at all to
> > produce the 3-slot class-37 group rather than just going on making
> > more individual mmaps until it's time to switch to the larger
> > group. It's easy to tune things to do the latter, and it seems to
> > offer more flexibility in how memory is used. It also allows
> > slightly more fragmentation, but the number of such objects is
> > highly bounded to begin with because we use increasingly larger
> > groups as usage goes up, so the contribution should be
> > asymptotically irrelevant.
>
> The answer is that it depends on where the sizes fall. At 16k,
> rounding up to page size produces 20k usage (5 pages) but the 3-slot
> class-37 group uses 5+1/3 pages per slot, so individual mmaps are
> preferable. However, if we requested 20k, individual mmaps would be
> 24k (6 pages) while the 3-slot group would still use just 5+1/3
> pages, and would be preferable to switch to. The condition seems to
> be just whether the rounded-up-to-whole-pages request size is larger
> than the slot size, and we should prefer individual mmaps if (1)
> it's smaller than the slot size, or (2) using a multi-slot group
> would be a relative usage increase in the class of more than 50% (or
> whatever threshold it ends up being tuned to).
>
> I'll see if I can put together a quick implementation of this and
> see how it works.
This seems to be working very well with the condition:

	if (sc >= 35 && cnt<=3 && (size*cnt > usage/2
	    ^^^^^^^^    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	    at least    wanted to make a smaller
	    ~16k        group but hit lower cnt
	                limit; see loop above
	 || ((req+20+pagesize-1) & -pagesize) <= size))
	    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	    requested size rounded up to page <= slot size

at the end of the else clause for if (sc < 8) in alloc_group. Here req
is a new argument to expose the size of the actual request malloc made,
so that for single-slot groups (mmap serviced allocations) we can
allocate just the minimum needed rather than the nominal slot size.

Rich
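The quoted condition, restated as a standalone predicate for
illustration (a sketch based only on the expression above; the
function name and parameter framing are mine, not musl's, and per the
surrounding description a true result means an individual mmap, i.e. a
single-slot group, is preferred over a small multi-slot group):

```c
#include <stddef.h>

/* Sketch of the quoted alloc_group condition (names are mine):
 *   - sc >= 35: size classes of roughly 16k and up, where whole-page
 *     sizing of individual mmaps matters;
 *   - cnt <= 3: a smaller group was wanted but the count hit its
 *     lower limit;
 *   - size*cnt > usage/2: the group would grow class usage by more
 *     than 50%; or
 *   - the request, plus 20-byte overhead, rounded up to whole pages,
 *     fits in no more bytes than one nominal slot. */
static int prefer_individual_mmap(int sc, size_t cnt, size_t size,
                                  size_t usage, size_t req, size_t pagesize)
{
	return sc >= 35 && cnt <= 3 &&
	       (size*cnt > usage/2 ||
	        ((req + 20 + pagesize - 1) & -pagesize) <= size);
}
```

For example, with the thread's numbers (class 37, 21824-byte slots,
143360 bytes of existing usage, 4096-byte pages), a 16k request
triggers the predicate via the page-rounding clause, while a 20k
request does not.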