From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/3593
Path: news.gmane.org!not-for-mail
From: Andre Renaud <andre@bluewatersys.com>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Thinking about release
Date: Wed, 10 Jul 2013 09:28:21 +1200
Message-ID: <CAPfzE3ZTxynUeJjq7KWijZGhsV==NymW4vqLhnQbEYCXRxVf-g@mail.gmail.com>
References: <20130613012517.GA5859@brightrain.aerifal.cx>
	<CAPfzE3a0h=2NFqgnBqXj3J2q7VgYjqZ19Ab=0LAe5u5SvWXHaA@mail.gmail.com>
	<20130613014314.GC29800@brightrain.aerifal.cx>
	<CAPfzE3aerGrdmTkj15o0CTVtt8TZpTyAnSAj1Joau+Jb_cNGUA@mail.gmail.com>
	<20130709053711.GO29800@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
X-Trace: ger.gmane.org 1373405314 24724 80.91.229.3 (9 Jul 2013 21:28:34 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Tue, 9 Jul 2013 21:28:34 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-3597-gllmg-musl=m.gmane.org@lists.openwall.com Tue Jul 09 23:28:36 2013
Return-path: <musl-return-3597-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@plane.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by plane.gmane.org with smtp (Exim 4.69)
	(envelope-from <musl-return-3597-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1UwfSc-0000In-Bq
	for gllmg-musl@plane.gmane.org; Tue, 09 Jul 2013 23:28:34 +0200
Original-Received: (qmail 3528 invoked by uid 550); 9 Jul 2013 21:28:33 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
Original-Received: (qmail 3517 invoked from network); 9 Jul 2013 21:28:33 -0000
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20120113;
        h=mime-version:in-reply-to:references:date:message-id:subject:from:to
         :content-type:x-gm-message-state;
        bh=8YPCghjfoaLT/HYPPNC4I1jaMA3FukfeThrTXPgOrW4=;
        b=UX0VmexPR0zxPjKytIqmguPnUbyvkZ5dFTKfjb/qud4jKuI43M7AVq29QWGyZquAQ6
         y+k3YgbHk4rdIKqmYikqtnvJ2OloxPbecwLLmDjfDaWEPZwKLgjC+AGXvNUibskGwIuy
         +M31WS8R7t7XkJb7+vrsvvwV66RtNnNFZYaR/JstjGoTOxgP+qe4L4tZsoTnubCgaYCw
         jlqM0C/mLzYFrkWW/rrtbGOh7IzwwW/bMvu8M6UKHSzFbCwLZordTA01a63R9YggYazY
         O1ubQjUze3mYypsk+zaJ3Sl29f/AZTmVo9TahFp0rcYZWvOrIXrhByW/SffUVnLJAkay
         WtvA==
X-Received: by 10.58.152.3 with SMTP id uu3mr13005466veb.16.1373405301928;
 Tue, 09 Jul 2013 14:28:21 -0700 (PDT)
In-Reply-To: <20130709053711.GO29800@brightrain.aerifal.cx>
X-Gm-Message-State: ALoCoQnAxozzrVnCqs/73qgrnPzpRcQVVOEtGaZ0TMjB4DWnd8RMd5oNx9tbbh1wQkA6mW68OXDF
Xref: news.gmane.org gmane.linux.lib.musl.general:3593
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/3593>

Hi Rich,

> I think that's a reasonable place to begin. I do mildly question the
> relevance of memmove to performance, so if we end up having to do a
> lot of review or changes to get the asm committed, it might make sense
> to leave memmove for later.

I wasn't too sure about memmove, but I've seen a reasonable amount of
code that just uses memmove as standard (rather than memcpy) to avoid
any problems with overlapping regions. Not a great policy, but still.
I'm fine with dropping it at this stage.
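
To illustrate the overlap case, here is a minimal sketch. open_gap is
a hypothetical helper made up for this example, not code from musl or
from the patch under discussion. Source and destination overlap, so
memcpy would be undefined behaviour and memmove is needed:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: shift buf[pos..len-1] one byte to the right,
 * opening a one-byte gap at index pos. The caller must guarantee that
 * buf has room for len + 1 bytes. The source and destination ranges
 * overlap, which is exactly the case memcpy does not support. */
static void open_gap(char *buf, size_t len, size_t pos)
{
    memmove(buf + pos + 1, buf + pos, len - pos);
}
```

Code that defaults to memmove everywhere is usually guarding against
this situation without checking whether it can actually occur.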

> At first glance, this looks like a clear improvement, but have you
> compared it to much more naive optimizations? My _general_ experience
> with optimized memcpy asm that's complex like this and that goes out
> of its way to deal explicitly with cache lines and such is that it's
> no faster than just naively moving large blocks at a time. Of course
> this may or may not be the case for ARM, but I'd like to know if
> you've done any tests.
>
> The basic principle in my mind here is that a complex solution is not
> necessarily wrong if it's a big win in other ways, but that a complex
> solution which is at most 1-2% faster than a much simpler solution is
> probably not the best choice.

Certainly, if a more straightforward C implementation achieved
similar results, that would be superior. However, the existing musl C
memcpy code is already optimised to some degree (doing 32-bit rather
than 8-bit copies), and I've found it difficult to convince gcc to
use the load-multiple and store-multiple instructions from C without
resorting to pretty horrible code. That may still be preferable to
the assembler, though. I haven't benchmarked this yet - I'll see if I
can come up with something.
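
As a rough sketch of the kind of naive comparison being asked for,
something like the following could serve as a starting point. Both
copy_words_by8 and time_copy are hypothetical names invented for this
example; this is not musl's memcpy and not the asm under review. It
assumes word-aligned buffers, and nothing guarantees that gcc will
turn the eight loads and stores into ldm/stm:

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical sketch: copy nwords 32-bit words, eight per iteration.
 * Reading eight words into locals before storing them is the shape
 * that might encourage load-multiple/store-multiple on ARM. */
static void copy_words_by8(uint32_t *restrict d,
                           const uint32_t *restrict s, size_t nwords)
{
    while (nwords >= 8) {
        uint32_t a = s[0], b = s[1], c = s[2], e = s[3];
        uint32_t f = s[4], g = s[5], h = s[6], i = s[7];
        d[0] = a; d[1] = b; d[2] = c; d[3] = e;
        d[4] = f; d[5] = g; d[6] = h; d[7] = i;
        d += 8; s += 8; nwords -= 8;
    }
    /* Copy the remaining 0..7 words one at a time. */
    for (; nwords; nwords--)
        *d++ = *s++;
}

typedef void (*copy_fn)(uint32_t *restrict, const uint32_t *restrict,
                        size_t);

/* Hypothetical timing helper: CPU seconds spent in reps calls to fn.
 * clock() is coarse, so reps needs to be large for small buffers. */
static double time_copy(copy_fn fn, uint32_t *d, const uint32_t *s,
                        size_t nwords, int reps)
{
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        fn(d, s, nwords);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Timing this against the existing C memcpy and the asm across a range
of sizes would show whether the extra complexity buys more than a
percent or two. An optimising compiler may rearrange the timing loop,
so the generated code is worth inspecting before trusting any numbers.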

> It's an open question whether it's better to sync something like this
> with an 'upstream' or adapt it to musl coding conventions. Generally
> musl uses explicit instructions rather than pseudo-instructions/macros
> for prologue and epilogue, and does not use named labels.

Given that most of the other systems do some form of compile-time
optimisation (which we're trying to avoid), and that these are not
functions that see a lot of code churn, I don't think it's too bad to
adapt it to musl's style. I haven't really done that so far.

>> Does anyone have any comments on the suitability of this code, or what
>
> If nothing else, it fails to be armv4 compatible. Fixing that should
> not be hard, but it would require a bit of an audit. The return
> sequences are the obvious issue, but there may be other instructions
> in use that are not available on armv4 or maybe not even on armv5...?

Rob Landley mentioned a while ago that armv4 has issues with the EABI
stuff. Is armv4 a definite lower bound for musl support, as opposed to
armv4t or armv5?

Regards,
Andre