From: Alexander Monakov
To: musl@lists.openwall.com
Reply-To: musl@lists.openwall.com
Subject: [PATCH] optimize malloc0
Date: Tue, 27 Jun 2017 00:43:39 +0300
Message-ID: <20170626214339.10942-1-amonakov@ispras.ru>
X-Mailer: git-send-email 2.11.0

The implementation of __malloc0 in malloc.c takes care to preserve zero
pages by overwriting only non-zero data. However, malloc must have
already modified auxiliary heap data just before and beyond the
allocated region, so we know that the edge pages need not be preserved.

For allocations smaller than one page, pass them immediately to memset.
Otherwise, use memset to handle partial pages at the head and tail of
the allocation, and scan complete pages in the interior. Optimize the
scanning loop by processing 16 bytes per iteration and handling the
rest of the page via memset as soon as a non-zero byte is found.
---
A followup to a recent IRC discussion. The code size cost on x86 is
only about 80 bytes (note e.g. how mal0_clear uses memset for two
purposes simultaneously: handling the partial page at the end of the
allocation, and clearing interior non-zero pages). On a Sandy Bridge
CPU, the speed improvement for the potentially-zero-page scanning loop
is almost 2x on 64-bit and almost 3x on 32-bit.

Note that the existing implementation can over-clear by as much as
sizeof(size_t)-1 bytes beyond the allocation; the new implementation
never does that. This may expose application bugs that were hidden
before.
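For readers less familiar with malloc.c, below is a commented, standalone
sketch of the same tail-to-head strategy. The function name, the PAGESZ
constant and the little test in main are illustrative assumptions for the
demo only, not part of the patch:

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGESZ 4096	/* assumed page size, for the demo only */

/* Clear the partial page at the tail, then walk interior pages backwards,
   testing 16 bytes per iteration; once a non-zero group is seen, the memset
   at the top of the outer loop clears the unscanned remainder of that page
   (including the non-zero group).  Returns the number of bytes at the head
   of the allocation that the caller still has to clear. */
static size_t clear_tail_and_interior(char *p, size_t pagesz, size_t n)
{
	typedef unsigned long long T;
	char *pp = p + n;
	size_t i = (uintptr_t)pp & (pagesz - 1);
	for (;;) {
		pp = memset(pp - i, 0, i);
		if (pp - p < pagesz) return pp - p;
		for (i = pagesz; i; i -= 2*sizeof(T), pp -= 2*sizeof(T))
			if (((T *)pp)[-1] | ((T *)pp)[-2])
				break;
	}
}

int main(void)
{
	size_t n = 3*PAGESZ + 123;
	char *p = malloc(n);
	assert(p);
	memset(p, 0xab, n);	/* pretend the memory is dirty */
	size_t head = clear_tail_and_interior(p, PAGESZ, n);
	memset(p, 0, head);	/* clear the head portion, as __malloc0 does */
	for (size_t k = 0; k < n; k++) assert(p[k] == 0);
	free(p);
	return 0;
}

The demo keeps the same n >= PAGE_SIZE precondition that __malloc0 checks
before calling mal0_clear; that is what guarantees the first memset can
never reach below the start of the allocation.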
Alexander

 src/malloc/malloc.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/src/malloc/malloc.c b/src/malloc/malloc.c
index d5ee4280..720fa696 100644
--- a/src/malloc/malloc.c
+++ b/src/malloc/malloc.c
@@ -366,15 +366,28 @@ void *malloc(size_t n)
 	return CHUNK_TO_MEM(c);
 }
 
+static size_t mal0_clear(char *p, size_t pagesz, size_t n)
+{
+	typedef unsigned long long T;
+	char *pp = p + n;
+	size_t i = (uintptr_t)pp & (pagesz - 1);
+	for (;;) {
+		pp = memset(pp - i, 0, i);
+		if (pp - p < pagesz) return pp - p;
+		for (i = pagesz; i; i -= 2*sizeof(T), pp -= 2*sizeof(T))
+			if (((T *)pp)[-1] | ((T *)pp)[-2])
+				break;
+	}
+}
+
 void *__malloc0(size_t n)
 {
 	void *p = malloc(n);
-	if (p && !IS_MMAPPED(MEM_TO_CHUNK(p))) {
-		size_t *z;
-		n = (n + sizeof *z - 1)/sizeof *z;
-		for (z=p; n; n--, z++) if (*z) *z=0;
-	}
-	return p;
+	if (!p || IS_MMAPPED(MEM_TO_CHUNK(p)))
+		return p;
+	if (n >= PAGE_SIZE)
+		n = mal0_clear(p, PAGE_SIZE, n);
+	return memset(p, 0, n);
 }
 
 void *realloc(void *p, size_t n)
-- 
2.11.0