From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/12752
Path: news.gmane.org!.POSTED!not-for-mail
From: Rich Felker <dalias@libc.org>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: What's wrong with musl's malloc
Date: Sat, 21 Apr 2018 01:28:12 -0400
Message-ID: <20180421052812.GU3094@brightrain.aerifal.cx>
References: <20180420200904.GS3094@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: blaine.gmane.org 1524288384 28822 195.159.176.226 (21 Apr 2018 05:26:24 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Sat, 21 Apr 2018 05:26:24 +0000 (UTC)
User-Agent: Mutt/1.5.21 (2010-09-15)
To: musl@lists.openwall.com
Original-X-From: musl-return-12768-gllmg-musl=m.gmane.org@lists.openwall.com Sat Apr 21 07:26:20 2018
Return-path: <musl-return-12768-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by blaine.gmane.org with smtp (Exim 4.84_2)
	(envelope-from <musl-return-12768-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1f9l2R-0007PL-0p
	for gllmg-musl@m.gmane.org; Sat, 21 Apr 2018 07:26:19 +0200
Original-Received: (qmail 23939 invoked by uid 550); 21 Apr 2018 05:28:25 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
List-ID: <musl.lists.openwall.com>
Original-Received: (qmail 23858 invoked from network); 21 Apr 2018 05:28:24 -0000
Content-Disposition: inline
In-Reply-To: <20180420200904.GS3094@brightrain.aerifal.cx>
Original-Sender: Rich Felker <dalias@aerifal.cx>
Xref: news.gmane.org gmane.linux.lib.musl.general:12752
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/12752>

On Fri, Apr 20, 2018 at 04:09:04PM -0400, Rich Felker wrote:
> I've talked on and off about musl's malloc needing to be overhauled or
> replaced, and gaining the ability to experiment with that is one of
> the motivations for making malloc interposable/replacable. Here's a
> brief write-up of what's wrong that needs to be addressed:
> 
> 
> The main benefits of musl's malloc vs the standard dlmalloc algorithms
> it's based on is the fine-grained locking. As long as there are binned
> free chunks of various sizes available, threads calling malloc will
> only contend for a lock when they're requesting allocations of same or
> similar size. This works out well under artificial random loads; I'm
> not sure how much it helps on typical real-world loads.
> 
> 
> In any case, the fine-grained locking also created a problem I didn't
> anticipate when I designed it: when allocating memory that involves
> splitting a free chunk, or when freeing memory that will get merged
> with an adjacent free chunk, the operation as a whole is not atomic;
> rather, a large free chunk is consumed, possibly emptying the bin it
> lies in, split or merged, then returned either to the same or a
> different bin. By saying this operation is non-atomic, I mean other
> threads see the intermediate state where the large free chunk has been
> consumed but not yet returned, and when this happens, they
> unnecessarily split another free chunk or expand the heap rather than
> waiting for it to be returned. This yields bad, potentially
> catastrophic, fragmentation.
> 
> The malloc side already partially mitigates this with the "pretrim"
> function, which is used whenever the leftover part after splitting
> would remain in the same bin it was already in. In particular his
> covers splitting small allocations off a large "top of heap" zone, but
> it doesn't help with splitting small chunks. And the free side is much
> worse -- large free zones momentarily disappear every time something
> adjacent to them is freed.
> 
> In order to overcome this fully while maintaining the fine-grained
> locking, we'd need the ability to hold multiple locks concurrently,
> which would require a protocol for lock acquisition order to avoid
> deadlock. But there may be cheap mitigations that could be made in
> free like on the malloc side. For instance it might be possible to
> make a simpler protocol whereby we could always hold the lock for bin
> 63, thereby avoiding exposing a state where large free zones are
> unavailable. If there's a way to make this work, moving the binmap
> masking for newly-emptied bins from unbin() to unlock_bin() would
> ensure that malloc doesn't "see" a "temporary empty" state.

One protocol that might work:

When locking multiple bins, they must be locked in decreasing order.

For malloc, this is the natural thing to do -- the bin you return the
excess to will be the same or smaller than the bin you allocate from.

For free, take the freelock from the beginning, rather than just
momentarily at the end. Then you know the bin sizes you'll be working
with (up to at most 3 -- self, prev, and next) and can lock them in
the protocol order, merge them, and bin the result without any
looping. The MADV_DONTNEED operation (the most expensive and
contention-generating part when it happens) can be deferred until
after the freelock and all but one of the bin locks are released.

Overall I like this because it creates an opportunity to simplify the
code. My guess is that performance would be roughly the same, or at
least not significantly worse (and probably significantly better for
single-threaded use), and the chances at catastrophic fragmentation
from contention would be gone.

I still think a complete redesign is the right way forward in the long
term.

Rich