From: Rich Felker
To: musl@lists.openwall.com
Subject: Re: Optimized C memcpy [updated]
Date: Sun, 11 Aug 2013 07:27:54 -0400
Message-ID: <20130811112754.GZ221@brightrain.aerifal.cx>
In-Reply-To: <52077217.9070004@gentoo.org>

On Sun, Aug 11, 2013 at 01:14:31PM +0200, Luca Barbato wrote:
> On 11/08/13 10:13, Rich Felker wrote:
> >> Unfortunately this case seems to be compiling to a call to memcpy on
> >> powerpc (but nowhere else I found). So I may need to drop the special
> >> case for 64-bit alignment. I wish there was some source for knowledge
> >> of the cases that can trigger gcc's stupidity, though...
> >
> > It turns out mips at certain optimization levels is also generating a
> > memcpy for the structure assignments. I think I just need to drop all
> > of the structure-assignment tricks and use a mildly unrolled loop with
> > uint32_t units for the aligned case. This gives much worse performance
> > on ARM, where gcc fails to generate the proper ldmia/stmia without the
> > struct, but we have asm we can use for ARM anyway. On other archs, the
> > struct copy code does not even seem to help. The simple integer loop
> > works just as well.
> >
> > I'll do some more experimenting and probably commit the ARM asm soon,
> > followed by the C code once I get some better feedback on how it
> > performs on real machines.
>
> What about sprinkling volatile here and there?

That might help with the gcc 4.8.x issues, but those are already worked
around by turning off the offending optimization, and it seems the major
GCC folks consider them a bug and hope to fix them in an upcoming
version anyway.

The structure assignments generating memcpy, however, are a longstanding
bug, hard-coded deep in the target-specific code generation. There is no
indication that volatile would make them go away (in fact, I was unable
to make them go away using volatile), because the memcpy being generated
is not an abstract optimization of a series of reads/writes; it is
generated directly from the single operation of struct assignment.

While I'd like to see this bug fixed, I don't see any hope of using
struct assignments to implement a C memcpy without some serious
configure tests to make sure it works. There are just too many
combinations of optimization flags, compilers, compiler versions, etc.
that would have to be checked to have any confidence in it not breaking,
and so far it seems ARM (for which we can just use the asm) is the only
arch that would actually benefit from it.

Rich
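
For readers following the thread, here is a minimal sketch of the kind
of "mildly unrolled loop with uint32_t units for the aligned case"
described in the quoted text. It is not the code from the patch under
discussion. The names word, sketch_copy and sketch_selfcheck are
hypothetical, and the __may_alias__ attribute is a GCC/Clang extension
that this sketch assumes is available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: GCC/Clang extension. It permits accessing arbitrary
 * memory through 32-bit words without breaking aliasing rules. */
typedef uint32_t __attribute__((__may_alias__)) word;

/* Hypothetical name; an illustration only, not musl's memcpy.
 * The word loop is used only when src and dest have the same
 * alignment modulo 4; otherwise everything is copied bytewise. */
static void *sketch_copy(void *restrict dest, const void *restrict src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    if (((uintptr_t)d & 3) == ((uintptr_t)s & 3)) {
        /* Copy leading bytes until both pointers are 4-byte aligned. */
        for (; ((uintptr_t)d & 3) && n; n--) *d++ = *s++;

        word *wd = (void *)d;
        const word *ws = (const void *)s;

        /* Mildly unrolled: four 32-bit words per iteration. */
        for (; n >= 16; n -= 16, wd += 4, ws += 4) {
            wd[0] = ws[0];
            wd[1] = ws[1];
            wd[2] = ws[2];
            wd[3] = ws[3];
        }
        /* Remaining whole words. */
        for (; n >= 4; n -= 4) *wd++ = *ws++;

        d = (void *)wd;
        s = (const void *)ws;
    }

    /* Tail bytes, or the whole buffer when the alignments differ. */
    for (; n; n--) *d++ = *s++;
    return dest;
}

/* Small usage check: compares against memcmp for several offsets and
 * lengths, and verifies the byte just past the copy is untouched. */
static int sketch_selfcheck(void)
{
    unsigned char src[64], dst[64];
    for (size_t i = 0; i < sizeof src; i++)
        src[i] = (unsigned char)(i * 7 + 1);

    for (size_t off = 0; off < 4; off++) {
        for (size_t len = 0; len + off <= 40; len++) {
            memset(dst, 0xAA, sizeof dst);
            sketch_copy(dst + off, src + off, len);
            assert(memcmp(dst + off, src + off, len) == 0);
            assert(dst[off + len] == 0xAA);
        }
    }

    /* Source and destination offsets that differ by one byte. */
    memset(dst, 0xAA, sizeof dst);
    sketch_copy(dst + 1, src + 2, 20);
    assert(memcmp(dst + 1, src + 2, 20) == 0);
    return 0;
}
```

Note that a loop like this avoids struct assignment entirely, so it
sidesteps the target-specific memcpy generation discussed above. It does
not by itself address the separate gcc 4.8.x issue; the flag usually
associated with that is -fno-tree-loop-distribute-patterns, though the
message does not name the optimization that was turned off, so treat
that as an assumption.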