From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: ARM memcpy post-0.9.12-release thread
Date: Tue, 30 Jul 2013 23:23:15 -0400
Message-ID: <20130731032315.GA221@brightrain.aerifal.cx>
References: <20130731022631.GA6655@brightrain.aerifal.cx> <20130731051347.7d8340ac@ralda.gmx.de>
In-Reply-To: <20130731051347.7d8340ac@ralda.gmx.de>
Reply-To: musl@lists.openwall.com
To: Harald Becker
Cc: musl@lists.openwall.com

On Wed, Jul 31, 2013 at 05:13:47AM +0200, Harald Becker wrote:
> Hi Rich !
>
> 30-07-2013 22:26 Rich Felker :
>
> > Some rough times (128k copy repeated 10000 times):
> >
> > Aligned case:
> > Current C code: 1.2s
> > My best-attempt C code: 0.75s
> > My best-attempt inline asm: 0.57s
> > Bionic asm: 0.63s
> > Bionic asm without prefetch: 0.57s
> >
> > Misaligned case:
> > Current C code: 4.7s
> > My best-attempt inline asm: 2.9s
> > Bionic asm: 1.1s
>
> I like to throw in a question, as my cent to this topic:
>
> Does modern C Compiler not try to align all data types? So
> following this path in most cases aligned data structures are
> used and copying them around usually hit the aligned case. The

Yes but these are small anyway and the compiler will be generating
inline code to copy them with ldmia/stmia.

> misaligned case happens mostly due to working with strings, and
> those are usually short. Can't we consider other misaligned cases
> violation of the programmer or code generator? If so, I would
> prefer the best-attempt inline asm versions of code or even
> best attempt C code over arch specific asm versions ... and add

Part of the problem discussed on #musl was that I was having to be
really careful with "best attempt C" since GCC will _generate_ calls
to memcpy for some code, even when -ffreestanding is used. The folks
on #gcc claim this is not a bug. So, if compilers deem themselves at
liberty to make this kind of transformation, any C implementation of
memcpy that's not intentionally crippled (e.g. using volatile temps
and 20x slower than it should be) is a time-bomb that might blow up on
us with the next GCC version...

This makes asm (either inline or standalone) a lot more appealing for
memcpy than it otherwise would be.

> a warning for performance lose on misaligned data in
> documentation, with giving a rough percentage of this lose.

You'd prefer video processing being 4 to 5 times slower?
Video typically consists of single-byte samples (planar YUV) and
operations like cropping to a non-multiple-of-4 size, motion
compensation, etc. all involve misaligned memcpy. Same goes for image
transformations in gimp, image blitting in web browsers (not
necessarily aligned to multiple-of-4 boundaries unless you're using
32bpp), etc...

Rich
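
To make the cropping case above concrete, here is a minimal sketch of
copying a window out of one 8-bit plane row by row. It is an
illustration only, not code from musl or from any of the programs
mentioned; the names crop_plane and count_misaligned_rows are
hypothetical, and the second function assumes the plane base addresses
are themselves 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from musl or any video library): copy a
 * crop_w x crop_h window starting at (crop_x, crop_y) out of one 8-bit
 * plane whose rows are src_stride bytes apart, into a tightly packed
 * destination buffer of crop_w * crop_h bytes. */
static void crop_plane(uint8_t *dst, const uint8_t *src, size_t src_stride,
                       size_t crop_x, size_t crop_y,
                       size_t crop_w, size_t crop_h)
{
    assert(crop_x + crop_w <= src_stride);
    for (size_t row = 0; row < crop_h; row++) {
        const uint8_t *s = src + (crop_y + row) * src_stride + crop_x;
        uint8_t *d = dst + row * crop_w;
        /* With one byte per sample, s and d can land on arbitrary byte
         * offsets, so these calls are the misaligned memcpy case. */
        memcpy(d, s, crop_w);
    }
}

/* Count the row copies above whose source or destination offset is not
 * a multiple of 4. Assumption: both plane bases are 4-byte aligned, so
 * the offset alone decides the alignment of each copy. */
static size_t count_misaligned_rows(size_t src_stride, size_t crop_x,
                                    size_t crop_y, size_t crop_w,
                                    size_t crop_h)
{
    size_t n = 0;
    for (size_t row = 0; row < crop_h; row++) {
        size_t src_off = (crop_y + row) * src_stride + crop_x;
        size_t dst_off = row * crop_w;
        if (src_off % 4 != 0 || dst_off % 4 != 0)
            n++;
    }
    return n;
}
```

Cropping with an odd crop_x or a crop_w that is not a multiple of 4
makes most or all of the per-row copies start off a 4-byte boundary,
which is why the misaligned timings matter for this kind of workload.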