From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Thinking about release
Date: Wed, 10 Jul 2013 17:01:49 -0400
Message-ID: <20130710210149.GG29800@brightrain.aerifal.cx>
References: <20130709053711.GO29800@brightrain.aerifal.cx> <1373485116.27613.40@driftwood>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On Thu, Jul 11, 2013 at 08:34:03AM +1200, Andre Renaud wrote:
> >> What also might be worth testing is whether GCC can compete if you
> >> just give it a naive loop (not the fancy pseudo-vectorized stuff
> >> currently in musl) and good CFLAGS. I know on x86 I was able to beat
> >> the fanciest asm strlen I could come up with simply by writing the
> >> naive loop in C and unrolling it a lot.
> >
> >
> > Duff's device!
>
> That was exactly my first idea too, but interestingly it turns out not
> to have really added any performance improvement. Looking at the
> assembler, with -O3, gcc does a pretty good job of unrolling as it is.

For what it's worth, my testing showed the current memcpy code in musl
and the naive "while (n--) *d++=*s++;" version performing
near-identically at -O3, and both got about 20% faster with
-funroll-all-loops. With -O2 or -Os, the naive version was about 5
times slower.

Rich
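
For anyone wanting to repeat the comparison, here is a minimal sketch of
the naive loop wrapped in a function, plus a correctness check against
the library memcpy. The names naive_memcpy, check_copy and self_test are
hypothetical and are not musl symbols; this is not the code that was
benchmarked, only an illustration of the loop quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The naive byte-at-a-time loop from the message, wrapped in a function.
 * naive_memcpy is a hypothetical name used only for this sketch. */
static void *naive_memcpy(void *restrict dest, const void *restrict src, size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;
	while (n--) *d++ = *s++;
	return dest;
}

/* Returns 1 if naive_memcpy and the library memcpy give identical
 * results for an n-byte copy, 0 otherwise (or if n exceeds the buffer). */
static int check_copy(size_t n)
{
	static unsigned char src[4096], a[4096], b[4096];
	if (n > sizeof src) return 0;
	/* Fill the source with a simple non-constant pattern. */
	for (size_t i = 0; i < sizeof src; i++)
		src[i] = (unsigned char)(i * 31 + 7);
	memset(a, 0, sizeof a);
	memset(b, 0, sizeof b);
	naive_memcpy(a, src, n);
	memcpy(b, src, n);
	/* Compare the whole buffers so bytes written past n are caught too. */
	return memcmp(a, b, sizeof a) == 0;
}

/* Checks a few sizes, including the empty and full-buffer cases. */
static void self_test(void)
{
	assert(check_copy(0));
	assert(check_copy(1));
	assert(check_copy(63));
	assert(check_copy(4096));
}
```

To compare optimization levels, the same file can be built once per set
of flags mentioned above (-Os, -O2, -O3, and -O3 -funroll-all-loops) and
timed with whatever harness is at hand. Note that an optimizing compiler
may recognize this loop and replace it with a call to the library
memcpy, so it is worth checking the generated assembly before trusting
any timing.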