From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Thinking about release
Date: Fri, 12 Jul 2013 00:16:09 -0400
Message-ID: <20130712041609.GV29800@brightrain.aerifal.cx>
References: <20130711033754.GL29800@brightrain.aerifal.cx>
 <20130711124613.GO29800@brightrain.aerifal.cx>
 <20130712031615.GS29800@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On Fri, Jul 12, 2013 at 03:36:42PM +1200, Andre Renaud wrote:
> > I was unable to measure any difference in performance of your version
> > with the prefetch hack versus simply:
> >
> >         __asm__ __volatile__(
> >                 "ldmia %1!,{a4,v1,v2,v3,v4,v5,v6,v7}\n\t"
> >                 "stmia %0!,{a4,v1,v2,v3,v4,v5,v6,v7}\n\t"
> >                 : "+r"(d), "+r"(s) :
> >                 : "a4", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "memory");
>
> What kind of machine were you using? I see a change of 115MB/s ->

It's a combined ARM Cortex-A9 & FPGA chip from Xilinx. Supposedly the
timings match the Cortex-A9 in other ARM chips.

> 105MB/s when I drop the prefetch, even using the code that you
> suggested. This is on an Atmel AT91sam9g45 (ARM926ejs @ 400MHz). I'm
> assuming this is some subtlety about how the cache is operating?

Perhaps so. By the way, I also did some tests with misaligning the
src/dest with respect to cache lines, and the timing did change, but
not in any way I could make sense of...

It may turn out that the issues are sufficiently complex that we
won't get ideal performance without either copying the BSD code you
suggested or fully understanding what it's doing, and other ARM
performance issues, and developing something new based on that
understanding... In that case copying/adapting the BSD code might turn
out to be the right solution for now.

Rich