From: Solar Designer
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: crypt* files in crypt directory
Date: Fri, 10 Aug 2012 21:04:35 +0400
Message-ID: <20120810170435.GA29839@openwall.com>
References: <20120808044235.GA22470@openwall.com>
	<20120808052844.GF27715@brightrain.aerifal.cx>
	<20120808062706.GA23135@openwall.com>
	<20120808214855.GL27715@brightrain.aerifal.cx>
	<20120809033613.GA24926@openwall.com>
	<20120809072940.GA26288@openwall.com>
	<20120809105348.GA27361@openwall.com>
	<20120809115811.GA32316@port70.net>
	<20120809232132.GX27715@brightrain.aerifal.cx>
In-Reply-To: <20120809232132.GX27715@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On Thu, Aug 09, 2012 at 07:21:32PM -0400, Rich Felker wrote:
> On Thu, Aug 09, 2012 at 01:58:12PM +0200, Szabolcs Nagy wrote:
> > > 	do {
> > > 		ptr += 2;
> > > 		L ^= ctx->s.P[0];
> > > 		BF_ROUND(L, R, 0);
[...]
> > > 		BF_ROUND(R, L, 15);
> > > 		tmp4 = R;
> > > 		R = L;
> > > 		L = tmp4 ^ ctx->s.P[BF_N + 1];
> > > 		*(ptr - 1) = R;
> > > 		*(ptr - 2) = L;
> > > 	} while (ptr < end);
> >
> > why increase ptr at the begining?
> > it seems the idiomatic way would be
> >
> > *ptr++ = L;
> > *ptr++ = R;
>
> For me, making this change makes it 5% faster. I suspect the
> difference comes from the fact that gcc is not smart enough to move
> the ptr+=2; across the rest of the loop body, and the fact that it
> gets spilled to the stack and reloaded for *both* points of usage
> rather than just one. The original version may perform better on
> machines with A LOT more registers, but I'm doubtful...

The spilling theory makes sense to me, but it does not fully explain
the 5% difference - I think it could explain a 1% difference or so.
More likely there's some change in register allocation overall, not
only for ptr - or something like it.

Anyhow, this does not match my test results so far, for different
revisions of this code.  What compiler, options, architecture, CPU?

As written, this code did in fact want more registers than 32-bit x86
has - it needs one more register for the context, for thread-safety
introduced in crypt_blowfish as opposed to JtR.  In crypt_blowfish, I
addressed this by some magic in the asm code, and assumed that other
common archs do have more than 8 registers.  With the asm code dropped,
maybe this piece of C does need to be optimized for 32-bit x86 more -
although it performs as well as the asm code on CPUs newer than the
original Pentium (where the asm code is a lot faster) and different
than Atom (where users reported the asm code being significantly
faster).

Alexander
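
For reference, the two store patterns discussed above can be compared in
isolation.  This is only a sketch under stated assumptions: toy_rounds()
is a made-up stand-in for the sixteen BF_ROUND steps (it is not
Blowfish), and the function names and the 18-word buffer are
hypothetical.  It checks only that both loop shapes write the same
words; it says nothing about which one a given compiler handles better.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the 16 BF_ROUND steps; NOT Blowfish. */
static void toy_rounds(uint32_t *L, uint32_t *R)
{
	for (uint32_t i = 0; i < 16; i++) {
		uint32_t rot = (*L << 5) | (*L >> 27);
		uint32_t t;

		*R ^= rot + 0x9e3779b9u + i;
		t = *L;		/* swap halves, Feistel style */
		*L = *R;
		*R = t;
	}
}

/* Shape of the quoted loop: bump ptr first, store through ptr[-1], ptr[-2]. */
static void fill_preinc(uint32_t *ptr, uint32_t *end)
{
	uint32_t L = 0, R = 0;

	/* The do-while always writes at least one pair of words. */
	assert(end - ptr >= 2 && (end - ptr) % 2 == 0);
	do {
		ptr += 2;
		toy_rounds(&L, &R);
		*(ptr - 1) = R;
		*(ptr - 2) = L;
	} while (ptr < end);
}

/* Shape suggested in the reply: store with post-increment at the end. */
static void fill_postinc(uint32_t *ptr, uint32_t *end)
{
	uint32_t L = 0, R = 0;

	assert(end - ptr >= 2 && (end - ptr) % 2 == 0);
	do {
		toy_rounds(&L, &R);
		*ptr++ = L;
		*ptr++ = R;
	} while (ptr < end);
}

/* Returns 1 if both variants fill the buffer with identical words. */
static int variants_agree(void)
{
	uint32_t a[18], b[18];

	fill_preinc(a, a + 18);
	fill_postinc(b, b + 18);
	return memcmp(a, b, sizeof a) == 0;
}
```

Since the two variants are functionally equivalent, any speed difference
has to come from code generation.  Building both with the compiler and
options in question and comparing the assembly output (e.g. gcc -S)
would be one way to see whether ptr, or something else, is spilled
differently.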