From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/3785
Path: news.gmane.org!not-for-mail
From: Harald Becker <ralda@gmx.de>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: ARM memcpy post-0.9.12-release thread
Date: Wed, 31 Jul 2013 06:18:58 +0200
Message-ID: <20130731061858.07c30257@ralda.gmx.de>
References: <20130731022631.GA6655@brightrain.aerifal.cx>
	<20130731051347.7d8340ac@ralda.gmx.de>
	<20130731032315.GA221@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: musl@lists.openwall.com
To: Rich Felker <dalias@aerifal.cx>
In-Reply-To: <20130731032315.GA221@brightrain.aerifal.cx>
Xref: news.gmane.org gmane.linux.lib.musl.general:3785
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/3785>

Hi Rich !

30-07-2013 23:23 Rich Felker <dalias@aerifal.cx>:

> > misaligned case happens mostly due to working with strings,
> > and those are usually short. Can't we consider the other
> > misaligned cases a fault of the programmer or the code
> > generator? If so, I would prefer best-attempt inline asm
> > versions of the code, or even best-attempt C code, over
> > arch-specific asm versions ... and add
> 
> Part of the problem discussed on #musl was that I was having to
> be really careful with "best attempt C" since GCC will
> _generate_ calls to memcpy for some code, even when
> -ffreestanding is used. The folks on #gcc claim this is not a
> bug. So, if compilers deem themselves at liberty to make this
> kind of transformation, any C implementation of memcpy that's
> not intentionally crippled (e.g. using volatile temps and 20x
> slower than it should be) is a time-bomb that might blow up on
> us with the next GCC version...
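A minimal sketch of the kind of "best attempt C" memcpy being
discussed, assuming a 32-bit word copy for the case where source
and destination share alignment, and byte copies otherwise. The
name sketch_memcpy and the word32 typedef are hypothetical; the
__may_alias__ attribute is a GCC/Clang extension used so the word
accesses do not break strict aliasing. Built with optimization,
GCC's loop idiom recognition (-ftree-loop-distribute-patterns) may
still turn these loops back into a call to memcpy, which is exactly
the time-bomb described above; compiling such a file with
-fno-tree-loop-distribute-patterns is one common mitigation, not a
guarantee for future GCC versions.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: lets us read/write arbitrary bytes as 32-bit
 * words without violating strict aliasing rules. */
typedef uint32_t __attribute__((__may_alias__)) word32;

/* Hypothetical "best attempt C" copy: word copies when dest and src
 * have the same alignment modulo 4, byte copies otherwise.
 * With optimization enabled, GCC may still recognize these loops and
 * replace them with a call to memcpy itself. */
void *sketch_memcpy(void *restrict dest, const void *restrict src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    if ((((uintptr_t)d ^ (uintptr_t)s) & 3) == 0) {
        /* Byte-copy until d (and therefore s) is 4-byte aligned. */
        for (; n && ((uintptr_t)d & 3); n--)
            *d++ = *s++;
        /* Bulk of the copy, one aligned 32-bit word per iteration. */
        for (; n >= 4; n -= 4, d += 4, s += 4)
            *(word32 *)d = *(const word32 *)s;
    }
    /* Remaining tail, or the whole buffer when the alignments differ. */
    for (; n; n--)
        *d++ = *s++;
    return dest;
}
```

When the two pointers differ in alignment (the misaligned case),
this sketch falls all the way back to byte copies, which is where
the large slowdown comes from.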

I have never dealt with the details of this kind of GCC code
generation, but doesn't this only happen for small copies and
structure copies? Structure copies should usually be aligned,
so if they are, the simpler version saves code space.
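A small illustration of the structure-copy case, assuming a
hypothetical struct frame_info and helper copy_info. Whether GCC
emits inline moves or a call to memcpy for the assignment depends on
the struct size, the target and the options, so the call is only a
possibility here. Both operands have the struct's natural alignment,
which is the "usually aligned" case argued above.

```c
#include <stdint.h>

/* Hypothetical header, large enough (140 bytes) that GCC may lower
 * the assignment below to a call to memcpy instead of inline moves. */
struct frame_info {
    uint32_t width, height, stride;
    uint32_t timestamps[32];
};

/* The struct assignment is where the compiler may insert memcpy.
 * dst and src both carry the alignment of uint32_t (typically 4). */
void copy_info(struct frame_info *dst, const struct frame_info *src)
{
    *dst = *src;
}
```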

> This makes asm (either inline or standalone) a lot more
> appealing for memcpy than it otherwise would be.

Optimization is always a matter of trade-offs, and deciding
which ones to make is the hard part of the job ... :(
 
> > a warning about the performance loss on misaligned data to the
> > documentation, giving a rough percentage of this loss.
> 
> You'd prefer video processing being 4 to 5 times slower?

No, definitely not, but video processing is one of the cases I
consider a candidate for specially optimized code. Such projects
should include optimized versions of their low-level processing
functions (including memcpy, but not only; a candidate for a
library of optimized functions?).

> Video typically consists of single-byte samples (planar YUV) and
> operations like cropping to a non-multiple-of-4 size, motion
> compensation, etc. all involve misaligned memcpy. Same goes for
> image transformations in gimp, image blitting in web browsers
> (not necessarily aligned to multiple-of-4 boundaries unless
> you're using 32bpp), etc...
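To make the cropping case concrete, a sketch assuming an 8-bit
plane (such as the Y plane of planar YUV) with a given stride;
crop_plane is a hypothetical helper. If the plane and stride are
4-byte aligned, then with x = 1 every source row starts one byte
past a 4-byte boundary, so each memcpy call here takes the
misaligned path.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a w x h window starting at column x, row y out of an 8-bit
 * plane with row pitch src_stride into a tightly packed buffer. */
void crop_plane(uint8_t *dst, const uint8_t *src, size_t src_stride,
                size_t x, size_t y, size_t w, size_t h)
{
    for (size_t row = 0; row < h; row++) {
        /* The source address is 4-byte aligned only when x is a
         * multiple of 4 (given aligned base and stride); the packed
         * destination rows are aligned only when w is. */
        memcpy(dst + row * w, src + (y + row) * src_stride + x, w);
    }
}
```

For example, cropping a 3x2 window at (1, 1) from an 8-byte-wide
plane issues two 3-byte copies, each starting at an odd address.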

You are right, but the programmer should know this and consider
using appropriate functions. The parts of the code that need the
speed can be written to call optimized functions, in a way that
usually does not conflict with the calls GCC inserts on its own.
So those compiler-inserted calls usually hit the aligned case;
if they do not, it is the programmer who misbehaved (not the
compiler).
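One possible way to structure this, sketched under the assumption
that a project routes its hot-path copies through its own hook
rather than calling memcpy directly. The names row_copy_fn,
set_row_copy and blit are hypothetical. The hook defaults to the
libc memcpy and can be pointed at a project-supplied routine tuned
for misaligned data, while compiler-inserted memcpy calls elsewhere
are unaffected.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical hook type, matching memcpy's signature. */
typedef void *(*row_copy_fn)(void *restrict, const void *restrict, size_t);

/* Defaults to libc memcpy until the project installs its own routine. */
static row_copy_fn row_copy = memcpy;

/* Install an optimized copy routine; NULL restores the default. */
void set_row_copy(row_copy_fn fn)
{
    row_copy = fn ? fn : memcpy;
}

/* Hot path: copy h rows of w bytes between buffers with their own
 * strides, always going through the installable hook. */
void blit(unsigned char *dst, size_t dst_stride,
          const unsigned char *src, size_t src_stride,
          size_t w, size_t h)
{
    for (size_t r = 0; r < h; r++)
        row_copy(dst + r * dst_stride, src + r * src_stride, w);
}
```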

--
Harald