From: Alexander Monakov
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: [PATCH] optimize malloc0
Date: Wed, 5 Jul 2017 02:09:21 +0300 (MSK)
To: musl@lists.openwall.com
In-Reply-To: <20170704214554.GS1627@brightrain.aerifal.cx>

On Tue, 4 Jul 2017, Rich Felker wrote:

> Overall I like this. Reviewing what was discussed on IRC, I called the
> loop logic clever and nsz said maybe a bit too clever. On further
> reading I think he's right.

Somehow raising this point in the context of the rest of src/malloc
seems even worse than common bikeshedding.

> One additional concern was that the reverse-scanning may be bad for
> performance.
Or it might be good for performance, because:

a) the caller is likely to use the lower addresses, in which case the
   reverse scan is more likely to leave relevant lines in L1$

b) switching directions corresponds to switching access patterns:
   reverse for reading, forward (in memset) for writing, and that may
   help hardware more than it hurts

c) at least on Intel CPUs the hardware prefetcher doesn't cross 4K
   boundaries anyway, so discontiguous access on memset->scan
   transitions shouldn't matter there

d) in practice the most frequent calls are probably less than page
   size, and the patch handles those in the most efficient way

> A cheap way to avoid the scanning logic for the first and last partial
> page, while not complicating the loop logic, would be just writing a
> nonzero value to the first byte of each before the loop.

Nonsense. This patch handles the common case (less than 4K) in the most
efficient way, strikes a good size/speed tradeoff for the rest, and
makes the mal0_clear interface such that it can be moved to a separate
translation unit (to assist non-'--gc-sections' static linking, if
desired) with minimal penalty.

I can rewrite it fully forward-scanning without much trouble, but I
think it wouldn't be for the better.

Alexander