From: Rich Felker
To: musl@lists.openwall.com
Reply-To: musl@lists.openwall.com
Subject: Re: malloc implementation survey: omalloc
Date: Mon, 30 Jul 2018 21:44:07 -0400
Message-ID: <20180731014407.GE1392@brightrain.aerifal.cx>
References: <20180729192618.GA22386@voyager>
 <20180731004728.GD1392@brightrain.aerifal.cx>
In-Reply-To: <20180731004728.GD1392@brightrain.aerifal.cx>
User-Agent: Mutt/1.5.21 (2010-09-15)

On Mon, Jul 30, 2018 at 08:47:28PM -0400, Rich Felker wrote:
> On Sun, Jul 29, 2018 at 09:26:18PM +0200, Markus Wichmann wrote:
> > Hi all,
> >
> > we discussed rewriting malloc() a while back, because, as I recall, Rich
> > wasn't satisfied with the internal storage the current system is using
> > (i.e. metadata is stored with the returned pointer) as well as some
> > corner cases on lock contention, although fine grained locking is a nice
> > feature in itself.
> >
> > I therefore had a look at existing malloc() algorithms, the rationale
> > being that I thought malloc() to be a solved problem, so we only have to
> > find the right solution.
> >
> > As it turns out, it appears Doug Lea was simply too successful: Many
> > allocators follow his pattern in one way or another. But some systems
> > buck the trend.
> >
> > So today I found omalloc, the allocator OpenBSD uses, in this nice
> > repository:
> >
> > https://github.com/emeryberger/Malloc-Implementations
>
> I haven't looked at it in a couple years, but last I did, the OpenBSD
> allocator was not practical. It used individual mmaps for even
> moderately large allocations, and used guard pages with each, which
> with the default Linux VMA limits puts an upper bound of 32768 on the
> number of non-tiny (roughly, larger-than-page IIRC) allocations.
>
> It does have much stronger hardening against overflows than musl's
> current malloc or any other allocator, but it seemed inferior in all
> other ways.
>
> I'll read the rest of your description later and see if there's
> anything new that's interesting.

I've now read your description, and it seems to agree with what I
remember, except maybe replacing "individual mmaps" with some reuse of
regions, and the guard page thing likely being optional (or no longer
used because it was too inefficient?).
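As a concrete illustration of the VMA-limit point above (not how
omalloc actually does it; the function name here is made up): a
trailing guard page per allocation on Linux ends up as two kernel
VMAs, one R/W region plus one PROT_NONE page, so the default
vm.max_map_count of 65530 caps you at roughly the 32768 figure quoted
above.

#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a length rounded up to whole pages, plus one extra page that is
 * then made inaccessible. The protection change splits the mapping, so
 * each allocation like this consumes two entries against
 * vm.max_map_count. */
void *guarded_alloc(size_t len)
{
	size_t pg = (size_t)sysconf(_SC_PAGESIZE);
	size_t rounded = (len + pg - 1) & ~(pg - 1);

	void *p = mmap(0, rounded + pg, PROT_READ|PROT_WRITE,
	               MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) return 0;

	/* trailing guard page */
	if (mprotect((char *)p + rounded, pg, PROT_NONE)) {
		munmap(p, rounded + pg);
		return 0;
	}
	return p;
}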
Some ideas, like treating different size ranges differently, might be
worthwhile; this should go a long way toward limiting worst-case
fragmentation. For instance, I could see all allocations below a
certain size having to come from "pages" of 512 bytes, broken into
16-byte runs with a 32-bit atomic bitmap of allocated runs, and
likewise for a few more size ranges. This is the kind of thing I've
been thinking about for a while, but I haven't come up with any good
approach to estimating the downsides.

One thing that would probably be really useful is empirical data on
what range of sizes "needs to be fast" for bloated OO stuff that does
rapid malloc/free cycles. Conceptually it should only be small sizes,
assuming the whole allocated space is actually used: beyond a certain
size, actual use of the allocated memory (loads/stores) will dominate
the time spent allocating and freeing it. But the cutoff isn't clear.

Rich
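P.S. A rough sketch of the bitmap-page idea, just to pin down the
shape of it. This uses C11 atomics rather than musl's internal ones,
all the names are made up, and it leaves out where the 512-byte pages
come from and how a thread picks one:

#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 512
#define RUN_SIZE  16
#define RUNS_PER_PAGE (PAGE_SIZE / RUN_SIZE)  /* 32 runs -> one 32-bit mask */

struct bitmap_page {
	_Atomic uint32_t used;               /* bit i set = run i allocated */
	_Alignas(16) unsigned char mem[PAGE_SIZE];
};

/* Claim a free 16-byte run, or return 0 if the page is full. */
void *page_alloc(struct bitmap_page *p)
{
	uint32_t old = atomic_load_explicit(&p->used, memory_order_relaxed);
	for (;;) {
		if (old == 0xffffffffu) return 0;  /* all 32 runs taken */
		int i = __builtin_ctz(~old);       /* lowest clear bit (gcc/clang) */
		if (atomic_compare_exchange_weak_explicit(&p->used, &old,
		    old | UINT32_C(1) << i,
		    memory_order_acquire, memory_order_relaxed))
			return p->mem + (size_t)i * RUN_SIZE;
		/* CAS failed; old now holds the current mask, retry */
	}
}

/* Release a run previously handed out by page_alloc. */
void page_free(struct bitmap_page *p, void *q)
{
	size_t i = ((unsigned char *)q - p->mem) / RUN_SIZE;
	atomic_fetch_and_explicit(&p->used, ~(UINT32_C(1) << i),
	                          memory_order_release);
}

Freeing is just clearing a bit, a page whose mask drops back to zero
could be handed back to whatever owns the 512-byte pages, and the
same layout could repeat for the other size ranges with larger runs.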