Date: Tue, 7 Apr 2020 22:32:29 -0400
From: Rich Felker
To: musl@lists.openwall.com
Subject: [musl] "Expected behavior" for mallocng @ low usage

I figured as I tune and prepare to integrate the new malloc, it would
be helpful to have a description of what users should see in programs
that make only small use of malloc, since this is the "hard case" to
get right and since any reports of unexpected behavior would be really
useful. For simplicity I'm going to assume 4k pages.

If the program makes at least one "very small" (<= 108 bytes)
allocation, or at least one allocation of certain other sizes smaller
than the page size, you should expect to see a single-page mmap that's
divided up something like a buddy allocator. At the top level it
consists of two 2032-byte slots, one of which will be broken into two
1008-byte slots, one of which will be broken into two 496-byte slots.
For "very small" sizes, one of these will in turn be broken up into N
equal-sized slots for the requested size class (N=30, 15, 10, 7, 6, 5,
or 4). If the page is fully broken up into pairs of 496-byte slots,
there are 8 such slots, and only 7 "very small" size classes, so under
"very low usage", all such objects should fit in the single page, even
if you use a multitude of different sizes.

For the next 8 size classes (2 doublings) up to 492 bytes, and
depending on divisibility, a group of 2, 3, 5, or 7 slots will be
created in a slot of size 496, 1008, or 2032. These can use the same
page as the above smaller sizes if there's room available.

Above this size, coarse size classing is used at first (until usage
reaches a threshold) to avoid allocating a large number of many-slot
groups of slightly different sizes that might never be filled. The
next doubling consists only of the ranges [493,668] and [669,1004],
allocated in slots of size 2032 in groups of 3 and 2, respectively;
these can use any existing free slot of size 2032. (Once usage has
reached a threshold such that adding a group of 5 or 7 slots doesn't
cause a dramatic relative increase in total usage, finer-grained size
classes will be used.)

At higher sizes, groups of slots are not allocated inside a larger
slot, but as mmaps consisting of a power-of-two number of pages, which
will be split N ways, initially with N=7, 3, 5, or 2 depending on
divisibility.
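To make the slot arithmetic above concrete, here is a throwaway
program that just reproduces those counts. It's purely an
illustration, not mallocng code, and the 16-byte group header and
16-byte slot granularity it uses are assumptions of the sketch rather
than something spelled out above:

#include <stdio.h>

int main(void)
{
	/* Nested splitting of one 4k page as described above:
	 * 4096 -> 2x2032 -> 2x1008 -> 2x496. */
	int nest[] = { 4096, 2032, 1008, 496 };
	for (int i = 0; i+1 < 4; i++)
		printf("%4d-byte region -> two %4d-byte slots\n",
		       nest[i], nest[i+1]);

	/* For the 7 "very small" classes, dividing the 480 bytes left
	 * in a 496-byte slot (after an assumed 16-byte group header)
	 * by strides of 16..112 gives N = 30, 15, 10, 7, 6, 5, 4,
	 * matching the slot counts listed above. */
	for (int stride = 16; stride <= 112; stride += 16)
		printf("stride %3d: N = %d slots\n", stride, 480/stride);

	return 0;
}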
As usage increases, so does N (doubling the number of pages used),
which reduces the potential for vm space fragmentation and increases
the number of slots that can be allocated/freed with fast paths
manipulating free masks.

At present, coarse size classing is used for all these at first, which
can result in significant "waste" but avoids preallocating large (5 or
7) counts of slots that might never be used. This is what I'm
presently working to improve by allowing direct individual mmaps in
cases where they can be efficient. Changes in this area are likely
coming soon. The main thing I'm still trying to solve is getting eager
allocation down even further so that small programs don't grow
significantly when switching to mallocng.
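For anyone curious what the free-mask fast path amounts to, here is a
toy sketch: each group keeps a bitmask of available slots, so
allocation is a ctz plus a bit clear and free is a bit set. Again,
this is only an illustration under those assumptions, not mallocng's
actual code or data layout:

#include <stdint.h>
#include <stdio.h>

struct group {
	uint32_t avail_mask;    /* bit i set => slot i is free */
	unsigned char *base;    /* start of the slot area */
	size_t stride;          /* distance between consecutive slots */
};

/* Take the lowest-numbered free slot, or fail if the group is full. */
static void *group_alloc(struct group *g)
{
	if (!g->avail_mask) return 0;          /* full; a slow path would run here */
	int i = __builtin_ctz(g->avail_mask);  /* lowest set bit = first free slot */
	g->avail_mask &= g->avail_mask - 1;    /* clear that bit */
	return g->base + (size_t)i * g->stride;
}

/* Return a slot to its group. */
static void group_free(struct group *g, void *p)
{
	size_t i = (size_t)((unsigned char *)p - g->base) / g->stride;
	g->avail_mask |= (uint32_t)1 << i;
}

int main(void)
{
	static unsigned char area[480];
	/* e.g. a 30-slot group of the smallest class, stride 16 */
	struct group g = { (1u<<30)-1, area, 16 };
	void *a = group_alloc(&g), *b = group_alloc(&g);
	printf("a=%p b=%p\n", a, b);
	group_free(&g, a);
	printf("%d slots free\n", __builtin_popcount(g.avail_mask));
	return 0;
}

The point of the sketch is just that the common case touches a single
word of per-group metadata, which is why growing the slot count per
group lets more allocations and frees stay on that path.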