From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Re: [PATCH] optimize malloc0
Date: Tue, 4 Jul 2017 17:45:54 -0400
Message-ID: <20170704214554.GS1627@brightrain.aerifal.cx>
In-Reply-To: <20170626214339.10942-1-amonakov@ispras.ru>

On Tue, Jun 27, 2017 at 12:43:39AM +0300, Alexander Monakov wrote:
> Implementation of __malloc0 in malloc.c takes care to preserve zero
> pages by overwriting only non-zero data. However, malloc must have
> already modified auxiliary heap data just before and beyond the
> allocated region, so we know that edge pages need not be preserved.
>
> For allocations smaller than one page, pass them immediately to memset.
> Otherwise, use memset to handle partial pages at the head and tail of
> the allocation, and scan complete pages in the interior. Optimize the
> scanning loop by processing 16 bytes per iteration and handling the rest
> of the page via memset as soon as a non-zero byte is found.
> ---
> A followup to a recent IRC discussion. Code size cost on x86 is only about 80
> bytes (note, e.g., how mal0_clear uses memset for two purposes simultaneously:
> handling the partial page at the end, and clearing interior non-zero pages).
>
> On a Sandy Bridge CPU, speed improvement for the potentially-zero-page scanning
> loop is almost 2x on 64-bit, almost 3x on 32-bit.
>
> Note that the existing implementation can over-clear by as much as
> sizeof(size_t)-1 bytes beyond the allocation; the new implementation never
> does that. This may expose application bugs that were hidden before.
>
> Alexander
>
> src/malloc/malloc.c | 25 +++++++++++++++++++------
> 1 file changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/src/malloc/malloc.c b/src/malloc/malloc.c
> index d5ee4280..720fa696 100644
> --- a/src/malloc/malloc.c
> +++ b/src/malloc/malloc.c
> @@ -366,15 +366,28 @@ void *malloc(size_t n)
>  	return CHUNK_TO_MEM(c);
>  }
>
> +static size_t mal0_clear(char *p, size_t pagesz, size_t n)
> +{
> +	typedef unsigned long long T;
> +	char *pp = p + n;
> +	size_t i = (uintptr_t)pp & (pagesz - 1);
> +	for (;;) {
> +		pp = memset(pp - i, 0, i);
> +		if (pp - p < pagesz) return pp - p;
> +		for (i = pagesz; i; i -= 2*sizeof(T), pp -= 2*sizeof(T))
> +			if (((T *)pp)[-1] | ((T *)pp)[-2])
> +				break;
> +	}
> +}
> +
>  void *__malloc0(size_t n)
>  {
>  	void *p = malloc(n);
> -	if (p && !IS_MMAPPED(MEM_TO_CHUNK(p))) {
> -		size_t *z;
> -		n = (n + sizeof *z - 1)/sizeof *z;
> -		for (z=p; n; n--, z++) if (*z) *z=0;
> -	}
> -	return p;
> +	if (!p || IS_MMAPPED(MEM_TO_CHUNK(p)))
> +		return p;
> +	if (n >= PAGE_SIZE)
> +		n = mal0_clear(p, PAGE_SIZE, n);
> +	return memset(p, 0, n);
>  }
>
> void *realloc(void *p, size_t n)
> --
> 2.11.0

Overall I like this. Reviewing what was discussed on IRC, I called the
loop logic clever and nsz said maybe a bit too clever. On further
reading I think he's right. One additional concern was that the
reverse-scanning may be bad for performance. I suggested it might work
just as well to restructure the loop as "for each word, if nonzero,
memset to end of page and advance to that point". A cheap way to avoid
the scanning logic for the first and last partial page, while not
complicating the loop logic, would be just writing a nonzero value to
the first byte of each before the loop.

Rich
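
For illustration, the forward-scanning alternative described above might look
roughly like the following. This is only a sketch under stated assumptions,
not code from the patch or from musl: the name zero_sparse and its signature
are hypothetical, the page size is passed in and assumed to be a power of
two, and words are read with memcpy so that no alignment of the pointer is
assumed. A nonzero sentinel byte is written to the start of each partial edge
page so that the ordinary loop clears those pages on its first probe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of musl): zero n bytes at mem, skipping
 * writes to interior pages that already contain only zero bytes. */
static void zero_sparse(void *mem, size_t n, size_t pagesz)
{
	unsigned char *p = mem, *end = p + n;
	size_t mask = pagesz - 1;

	/* Assumption of this sketch: pagesz is a nonzero power of two. */
	assert(pagesz && !(pagesz & mask));

	/* Less than one page: nothing worth preserving, clear it all. */
	if (n < pagesz) {
		memset(p, 0, n);
		return;
	}

	/* Sentinels: make the first word probed in each partial edge page
	 * nonzero, so the loop below clears those pages without scanning. */
	if ((uintptr_t)p & mask)
		*p = 1;
	size_t tail = (uintptr_t)end & mask;
	if (tail)
		*(end - tail) = 1;

	while (p < end) {
		size_t left = end - p, w;

		/* Fewer bytes than one word remain: just clear them. */
		if (left < sizeof w) {
			memset(p, 0, left);
			break;
		}
		memcpy(&w, p, sizeof w);
		if (!w) {
			p += sizeof w;
			continue;
		}
		/* Nonzero word: clear through the end of this page (or the
		 * end of the region) and resume scanning from there. */
		size_t len = pagesz - ((uintptr_t)p & mask);
		if (len > left)
			len = left;
		memset(p, 0, len);
		p += len;
	}
}
```

Whether this is actually faster than the reverse scan in the patch would
need to be measured; the sketch is only meant to show that the loop body
stays simple once the edge pages are handled with sentinels.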