From: Alexander Monakov
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: [PATCH] optimize malloc0
Date: Wed, 5 Jul 2017 02:09:21 +0300 (MSK)
To: musl@lists.openwall.com
In-Reply-To: <20170704214554.GS1627@brightrain.aerifal.cx>

On Tue, 4 Jul 2017, Rich Felker wrote:

> Overall I like this. Reviewing what was discussed on IRC, I called the
> loop logic clever and nsz said maybe a bit too clever. On further
> reading I think he's right.

Somehow raising this point in the context of the rest of src/malloc
seems even worse than common bikeshedding.

> One additional concern was that the reverse-scanning may be bad for
> performance.
Or it might be good for performance, because:

a) the caller is likely to use the lower addresses, in which case the
   reverse scan is more likely to leave relevant lines in L1$

b) switching directions corresponds to switching access patterns:
   reverse for reading, forward (in memset) for writing, and that may
   help hardware more than it hurts

c) at least on Intel CPUs the hardware prefetcher doesn't cross 4K
   boundaries anyway, so discontiguous access on memset->scan
   transitions shouldn't matter there

d) in practice the most frequent calls are probably less than page
   size, and the patch handles those in the most efficient way

> A cheap way to avoid the scanning logic for the first and last partial
> page, while not complicating the loop logic, would be just writing a
> nonzero value to the first byte of each before the loop.

Nonsense. This patch handles the common case (less than 4K) in the most
efficient way, strikes a good size/speed tradeoff for the rest, and
makes the mal0_clear interface such that it can be moved to a separate
translation unit (to assist non-'--gc-sections' static linking, if
desired) with minimal penalty.

I can rewrite it fully forward-scanning without much trouble, but I
think it wouldn't be for the better.

Alexander