From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: crypt* files in crypt directory
Date: Thu, 9 Aug 2012 19:21:32 -0400
Message-ID: <20120809232132.GX27715@brightrain.aerifal.cx>
In-Reply-To: <20120809115811.GA32316@port70.net>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On Thu, Aug 09, 2012 at 01:58:12PM +0200, Szabolcs Nagy wrote:
> > #define BF_ROUND(L, R, N) \
> >     tmp1 = L & 0xFF; \
> >     tmp2 = L >> 8; \
> >     tmp2 &= 0xFF; \
> >     tmp3 = L >> 16; \
> >     tmp3 &= 0xFF; \
> >     tmp4 = L >> 24; \
> >     tmp1 = ctx->s.S[3][tmp1]; \
> >     tmp2 = ctx->s.S[2][tmp2]; \
> >     tmp3 = ctx->s.S[1][tmp3]; \
> >     tmp3 += ctx->s.S[0][tmp4]; \
> >     tmp3 ^= tmp2; \
> >     R ^= ctx->s.P[N + 1]; \
> >     tmp3 += tmp1; \
> >     R ^= tmp3;
>
> i guess this is performance critical, but
> i wouldn't spread those expressions over
> several lines
>
> tmp1 = ctx->S[3][L & 0xff];
> tmp2 = ctx->S[2][L>>8 & 0xff];
> tmp3 = ctx->S[1][L>>16 & 0xff];
> tmp4 = ctx->S[0][L>>24 & 0xff];
> R ^= ctx->P[N+1];
> R ^= ((tmp3 + tmp4) ^ tmp2) + tmp1;

My first modified version to remove the manual scheduling is
significantly slower than the hand-scheduled version. I haven't tried
your version here yet, but it looks nicer and I think it would be
reasonable to compare and see if it's better.

> >     do {
> >         ptr += 2;
> >         L ^= ctx->s.P[0];
> >         BF_ROUND(L, R, 0);
> >         BF_ROUND(R, L, 1);
> >         BF_ROUND(L, R, 2);
> >         BF_ROUND(R, L, 3);
> >         BF_ROUND(L, R, 4);
> >         BF_ROUND(R, L, 5);
> >         BF_ROUND(L, R, 6);
> >         BF_ROUND(R, L, 7);
> >         BF_ROUND(L, R, 8);
> >         BF_ROUND(R, L, 9);
> >         BF_ROUND(L, R, 10);
> >         BF_ROUND(R, L, 11);
> >         BF_ROUND(L, R, 12);
> >         BF_ROUND(R, L, 13);
> >         BF_ROUND(L, R, 14);
> >         BF_ROUND(R, L, 15);
> >         tmp4 = R;
> >         R = L;
> >         L = tmp4 ^ ctx->s.P[BF_N + 1];
> >         *(ptr - 1) = R;
> >         *(ptr - 2) = L;
> >     } while (ptr < end);
>
> why increase ptr at the begining?
> it seems the idiomatic way would be
>
> *ptr++ = L;
> *ptr++ = R;

For me, making this change makes it 5% faster. I suspect the
difference comes from the fact that gcc is not smart enough to move
the ptr+=2; across the rest of the loop body, and the fact that it
gets spilled to the stack and reloaded for *both* points of usage
rather than just one. The original version may perform better on
machines with A LOT more registers, but I'm doubtful...

Rich
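
Before timing the two variants discussed above against each other, it may be worth confirming that they compute the same thing. The following is only a sketch under assumptions: `struct bf_ctx` is a hypothetical stand-in for the real context (the quoted code reaches the tables through `ctx->s`), the tables are filled with arbitrary generator output rather than the real Blowfish constants, and all function names are made up for illustration. It says nothing about speed, which depends on the compiler and target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t BF_word;

/* Hypothetical stand-in for the Blowfish context (layout assumed). */
struct bf_ctx { BF_word P[18]; BF_word S[4][256]; };

/* Hand-scheduled round, following the quoted BF_ROUND macro. */
static BF_word round_scheduled(const struct bf_ctx *ctx, BF_word L, BF_word R, int N)
{
    BF_word tmp1, tmp2, tmp3, tmp4;
    tmp1 = L & 0xFF;
    tmp2 = L >> 8;
    tmp2 &= 0xFF;
    tmp3 = L >> 16;
    tmp3 &= 0xFF;
    tmp4 = L >> 24;
    tmp1 = ctx->S[3][tmp1];
    tmp2 = ctx->S[2][tmp2];
    tmp3 = ctx->S[1][tmp3];
    tmp3 += ctx->S[0][tmp4];
    tmp3 ^= tmp2;
    R ^= ctx->P[N + 1];
    tmp3 += tmp1;
    R ^= tmp3;
    return R;
}

/* Compact round, following the suggested rewrite. */
static BF_word round_compact(const struct bf_ctx *ctx, BF_word L, BF_word R, int N)
{
    BF_word tmp1 = ctx->S[3][L & 0xff];
    BF_word tmp2 = ctx->S[2][L >> 8 & 0xff];
    BF_word tmp3 = ctx->S[1][L >> 16 & 0xff];
    BF_word tmp4 = ctx->S[0][L >> 24 & 0xff];
    R ^= ctx->P[N + 1];
    R ^= ((tmp3 + tmp4) ^ tmp2) + tmp1;
    return R;
}

/* Small linear congruential generator, used only for test data. */
static BF_word lcg(BF_word *x)
{
    *x = *x * 1664525u + 1013904223u;
    return *x;
}

/* Count how many of `trials` generated (L, R, N) inputs disagree. */
static int count_round_mismatches(int trials)
{
    static struct bf_ctx ctx;
    BF_word seed = 1;
    int i, j, bad = 0;
    assert(trials > 0);
    /* Arbitrary table contents, NOT the real Blowfish constants. */
    for (i = 0; i < 18; i++)
        ctx.P[i] = lcg(&seed);
    for (i = 0; i < 4; i++)
        for (j = 0; j < 256; j++)
            ctx.S[i][j] = lcg(&seed);
    for (i = 0; i < trials; i++) {
        BF_word L = lcg(&seed), R = lcg(&seed);
        int N = i % 16;
        if (round_scheduled(&ctx, L, R, N) != round_compact(&ctx, L, R, N))
            bad++;
    }
    return bad;
}

/* Store order of the original loop: advance first, store behind ptr. */
static void store_advance_first(BF_word *ptr, BF_word *end, BF_word L, BF_word R)
{
    do {
        ptr += 2;
        *(ptr - 1) = R;
        *(ptr - 2) = L;
    } while (ptr < end);
}

/* Store order of the suggested rewrite: post-increment at each store. */
static void store_post_increment(BF_word *ptr, BF_word *end, BF_word L, BF_word R)
{
    do {
        *ptr++ = L;
        *ptr++ = R;
    } while (ptr < end);
}

/* Returns 1 if both store orders leave identical buffer contents. */
static int stores_match(void)
{
    BF_word a[6] = {0}, b[6] = {0};
    store_advance_first(a, a + 6, 0x11111111u, 0x22222222u);
    store_post_increment(b, b + 6, 0x11111111u, 0x22222222u);
    return memcmp(a, b, sizeof a) == 0;
}
```

Calling `count_round_mismatches()` and `stores_match()` from a small `main` checks the round rewrite and the store-order rewrite respectively. Any timing comparison should still be done on the real code, since the difference described above comes from register allocation and spilling, not from the arithmetic.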