From: Michael Clark
Subject: Re: What's wrong with musl's malloc
Date: Mon, 23 Apr 2018 09:40:08 +1200
To: musl@lists.openwall.com
In-reply-to: <20180420200904.GS3094@brightrain.aerifal.cx>

> On 21/04/2018, at 8:09 AM, Rich Felker wrote:
>
> I've talked on and off about musl's malloc needing to be overhauled or
> replaced, and gaining the ability to experiment with that is one of
> the motivations for making malloc interposable/replaceable. Here's a
> brief write-up of what's wrong that needs to be addressed:
>
> The main benefit of musl's malloc vs the standard dlmalloc algorithms
> it's based on is the fine-grained locking. As long as there are binned
> free chunks of various sizes available, threads calling malloc will
> only contend for a lock when they're requesting allocations of same or
> similar size. This works out well under artificial random loads; I'm
> not sure how much it helps on typical real-world loads.
>
> In any case, the fine-grained locking also created a problem I didn't
> anticipate when I designed it: when allocating memory that involves
> splitting a free chunk, or when freeing memory that will get merged
> with an adjacent free chunk, the operation as a whole is not atomic;
> rather, a large free chunk is consumed, possibly emptying the bin it
> lies in, split or merged, then returned either to the same or a
> different bin. By saying this operation is non-atomic, I mean other
> threads see the intermediate state where the large free chunk has been
> consumed but not yet returned, and when this happens, they
> unnecessarily split another free chunk or expand the heap rather than
> waiting for it to be returned. This yields bad, potentially
> catastrophic, fragmentation.
>
> The malloc side already partially mitigates this with the "pretrim"
> function, which is used whenever the leftover part after splitting
> would remain in the same bin it was already in. In particular this
> covers splitting small allocations off a large "top of heap" zone, but
> it doesn't help with splitting small chunks. And the free side is much
> worse -- large free zones momentarily disappear every time something
> adjacent to them is freed.
>
> In order to overcome this fully while maintaining the fine-grained
> locking, we'd need the ability to hold multiple locks concurrently,
> which would require a protocol for lock acquisition order to avoid
> deadlock. But there may be cheap mitigations that could be made in
> free like on the malloc side. For instance it might be possible to
> make a simpler protocol whereby we could always hold the lock for bin
> 63, thereby avoiding exposing a state where large free zones are
> unavailable. If there's a way to make this work, moving the binmap
> masking for newly-emptied bins from unbin() to unlock_bin() would
> ensure that malloc doesn't "see" a "temporary empty" state.
>
> In the long term, I think the whole malloc design we're using now is
> wrong, and should be replaced, but until the time comes to actually do
> that, it may make sense to try to improve some of the worst
> properties, or even to reduce or eliminate the granularity of the
> locking if it proves not to be very valuable on real-world loads.

Hi Rich,
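If I'm reading the write-up right, the problematic sequence is roughly the
one below. This is just a toy restatement to make sure I follow -- the names
(toy_alloc, bin_for, the 64-entry bins array) are invented for illustration
and are not musl's actual internals:

/* Toy restatement of the window -- NOT musl's code; names are invented.
 * One mutex per size bin; consuming a chunk, splitting it, and putting
 * the remainder back happen under separate lock acquisitions. */
#include <pthread.h>
#include <stddef.h>

struct chunk { struct chunk *next; size_t size; };
struct bin   { pthread_mutex_t lock; struct chunk *head; };

/* assume bins[] and their mutexes get initialized at startup */
static struct bin bins[64];

/* stand-in for a real size-to-bin mapping */
static struct bin *bin_for(size_t size) { return &bins[size & 63]; }

struct chunk *toy_alloc(size_t want)
{
    struct bin *b = bin_for(want);

    pthread_mutex_lock(&b->lock);
    struct chunk *c = b->head;          /* consume a free chunk...       */
    if (c) b->head = c->next;           /* ...the bin may now look empty */
    pthread_mutex_unlock(&b->lock);

    if (!c) return NULL;                /* caller would go split another
                                           chunk or expand the heap      */

    /* window: the leftover is invisible to every other thread, so a
       concurrent caller that finds this bin empty splits some other
       large chunk or grows the heap instead of waiting */

    if (c->size > want + sizeof(struct chunk)) {
        struct chunk *rest = (struct chunk *)((char *)c + want);
        rest->size = c->size - want;
        c->size = want;

        struct bin *b2 = bin_for(rest->size);
        pthread_mutex_lock(&b2->lock);
        rest->next = b2->head;          /* leftover only reappears here  */
        b2->head = rest;
        pthread_mutex_unlock(&b2->lock);
    }
    return c;
}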
Have you considered importing jemalloc?

It has been battle-tested for 12 years in FreeBSD and in many other apps. It's
bundled and used in MariaDB, which is undoubtedly a demanding user, and
presumably Monty approved the choice of allocator, so it shouldn't be a bad
choice. I'd trust the allocator choice of a well-respected database architect.

From looking at the Lockless, Inc. benchmarks, jemalloc has good performance
from very small to very large allocations. From the Lockless Inc. memory
allocator comparisons: "The disadvantage of the jemalloc allocator is its
memory usage. It uses power-of-two sized bins, which leads to a greatly
increased memory footprint compared to other allocators. This can affect
real-world performance due to excess cache and TLB misses."

From the wiki:

- https://github.com/jemalloc/jemalloc/wiki/Background

"jemalloc is a general purpose malloc(3) implementation that emphasizes
fragmentation avoidance and scalable concurrency support. It is intended for
use as the system-provided memory allocator, as in FreeBSD's libc library, as
well as for linking into C/C++ applications. jemalloc provides many
introspection, memory management, and tuning features beyond the standard
allocator functionality."

It's a libc allocator.

Here are some of the major users (who, it's fair to say, must have exercised
due consideration in choosing their memory allocator, given the list includes
a browser (Firefox's mozjemalloc fork), several databases, and caches):

- FreeBSD
- NetBSD
- Mozilla Firefox
- Varnish
- Cassandra
- Redis
- hhvm
- MariaDB
- Aerospike
- Android

I'm curious how big the linked code is... assuming code size is an important
consideration.

Looking at the code, it doesn't look too big, and it's interesting to note
that the recent changes to glibc to support per-thread pools use the same
tcache feature name as jemalloc, whose per-thread pool implementation is in
tcache.c:

- https://sourceware.org/ml/libc-alpha/2017-01/msg00452.html
- https://sourceware.org/ml/libc-alpha/2017-01/msg00524.html

musl libc could vendor the jemalloc code with some sed scripts to modify
includes so that jemalloc fits into the musl build structure and can be
easily synced with upstream. Ideally musl would want to import/vendor rather
than use a submodule so musl remains stand-alone. I assume many projects have
done the same. MariaDB bundles it. Firefox forked it.

I think the primary disadvantage of jemalloc is its power-of-2 size bins.
Non-power-of-2 strides for small structures have better caching performance
for many workloads, due to the more stochastic cache eviction you get when
strides are not a power of 2. I've noticed that power-of-2 alignments can
have pathological performance in some cases, e.g. an array with
sizeof(struct) == 28 or 36 performing much better than one with
sizeof(struct) == 32. That said, most apps would use a single malloc call for
these kinds of performance-critical arrays, so it's only an issue for apps
that individually malloc many small objects. Worth considering.
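To make the stride point concrete, here's the shape of experiment I mean -- a
throwaway microbenchmark, not code from musl or jemalloc, and the constants
(K, n, the record sizes) are arbitrary. It visits one field in every 128th
record, so with a 32-byte record the byte stride between accesses is a power
of two and the accesses pile onto a few cache sets, while 28- and 36-byte
records spread across more sets. How big a difference that makes is very
machine-dependent:

/* Throwaway microbenchmark sketch: visit one 4-byte field in every K-th
 * record of a large array, for record sizes 28, 32 and 36. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define K 128          /* records skipped between consecutive accesses */

static double walk(const unsigned char *base, size_t size, size_t n, int reps)
{
    volatile uint32_t sink = 0;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < reps; r++)
        for (size_t j = 0; j < K; j++)
            for (size_t i = j; i < n; i += K)
                sink += *(const uint32_t *)(base + i * size);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    const size_t n = 1 << 20;                  /* number of records */
    const size_t sizes[] = { 28, 32, 36 };
    for (size_t s = 0; s < sizeof sizes / sizeof sizes[0]; s++) {
        unsigned char *a = calloc(n, sizes[s]);
        if (!a) return 1;
        for (size_t i = 0; i < n; i++)         /* touch every record once */
            *(uint32_t *)(a + i * sizes[s]) = (uint32_t)i;
        printf("record size %2zu: %.3fs\n", sizes[s], walk(a, sizes[s], n, 20));
        free(a);
    }
    return 0;
}

This only models the array-walk case, of course; the allocator version of the
problem is many individually malloc'd small objects landing at
power-of-two-related offsets.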
BTW - I owe you an update on the riscv musl port. I'll post one soon... I've
been busy on the QEMU RISC-V port, whose 'virt' board has recently been used
by Fedora and Debian distro porters for riscv64 arch support (glibc-based
distros). Stefan O'Rear got MTTCG working on qemu/target/riscv so we can run
SMP Linux in QEMU with reasonable performance. Debian, for example, now has
QEMU RISC-V builders for the riscv64 arch.

- https://github.com/riscv/riscv-qemu/wiki

It will be nice to get the riscv musl support upstream. The major issue at
present is ELF thread-local storage (pthreads are working but GCC's __thread
is not). The RISC-V TLS model is not documented, so it requires some reverse
engineering of the riscv support in GCC/binutils/glibc. I haven't been able
to isolate the issue and could benefit from some help from someone more
knowledgeable in that area (ELF TLS and architecture-specific TLS models).

Regards,
Michael