From: Harald Becker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: ARM memcpy post-0.9.12-release thread
Date: Wed, 31 Jul 2013 06:18:58 +0200
Message-ID: <20130731061858.07c30257@ralda.gmx.de>
References: <20130731022631.GA6655@brightrain.aerifal.cx> <20130731051347.7d8340ac@ralda.gmx.de> <20130731032315.GA221@brightrain.aerifal.cx>
In-Reply-To: <20130731032315.GA221@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: Rich Felker
Cc: musl@lists.openwall.com

Hi Rich!

30-07-2013 23:23 Rich Felker:

> > misaligned case happens mostly due to working with strings,
> > and those are usually short. Can't we consider other
> > misaligned cases violation of the programmer or code
> > generator? If so, I would prefer the best-attempt inline asm
> > versions of code or even best attempt C code over arch
> > specific asm versions ... and add
>
> Part of the problem discussed on #musl was that I was having to
> be really careful with "best attempt C" since GCC will
> _generate_ calls to memcpy for some code, even when
> -ffreestanding is used. The folks on #gcc claim this is not a
> bug. So, if compilers deem themselves at liberty to make this
> kind of transformation, any C implementation of memcpy that's
> not intentionally crippled (e.g. using volatile temps and 20x
> slower than it should be) is a time-bomb that might blow up on
> us with the next GCC version...

I have never dealt with the details of this kind of gcc code
generation, but doesn't this only happen for small copies and
structure copies? Structure copies should usually be aligned, so
if they are aligned, the simpler version saves code space.

> This makes asm (either inline or standalone) a lot more
> appealing for memcpy than it otherwise would be.

Optimization is always a matter of making decisions, which I
consider the hard part of the job ... :(

> > a warning for performance lose on misaligned data in
> > documentation, with giving a rough percentage of this lose.
>
> You'd prefer video processing being 4 to 5 times slower?

No, definitely not, but video processing is one of the cases I
consider a candidate for optimized processing. Such projects
should include optimized versions of their low-level processing
functions (including memcpy, but not only that; a candidate for a
library of optimized functions?).

> Video typically consists of single-byte samples (planar YUV) and
> operations like cropping to a non-multiple-of-4 size, motion
> compensation, etc. all involve misaligned memcpy.
> Same goes for image transformations in gimp, image blitting in
> web browsers (not necessarily aligned to multiple-of-4
> boundaries unless you're using 32bpp), etc...

You are quite right, but the programmer should know about this
and consider using appropriate functions. The parts of the code
that need the speed can be written so that they call optimized
functions, in a way that usually does not conflict with the calls
gcc inserts itself. Those compiler-inserted calls then usually
hit the aligned case; if they do not, it is the programmer who
misbehaved, not the compiler.

--
Harald
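
To make the "best attempt C" idea above concrete, here is a minimal sketch of a copy routine with a word-at-a-time path for the aligned case and a plain byte loop for everything else. It is an illustration only, not musl's memcpy. The names sketch_copy and word_alias are hypothetical, the 4-byte word size is an assumption, and __may_alias__ is a GCC/Clang extension. The function is deliberately not named memcpy, since the thread notes that GCC may itself emit memcpy calls for code like this.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: a 32-bit word that may alias any object.
 * __may_alias__ is a GCC/Clang extension, assumed available here. */
typedef uint32_t __attribute__((__may_alias__)) word_alias;

/* Hypothetical name; a sketch, not a drop-in memcpy replacement. */
static void *sketch_copy(void *restrict dest, const void *restrict src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* Aligned case: both pointers are multiples of 4, so copy
     * one 32-bit word per iteration. */
    if ((((uintptr_t)d | (uintptr_t)s) & 3) == 0) {
        for (; n >= 4; n -= 4, d += 4, s += 4)
            *(word_alias *)d = *(const word_alias *)s;
    }

    /* Misaligned case, and the tail of the aligned case:
     * simple but slow byte-by-byte copy. */
    for (; n; n--)
        *d++ = *s++;

    return dest;
}
```

The byte loop is the part that costs performance on misaligned data, which is the trade-off discussed above: small code that is fast for aligned copies and merely correct for the rest.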