From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/1505
Path: news.gmane.org!not-for-mail
From: Solar Designer <solar@openwall.com>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: crypt* files in crypt directory
Date: Fri, 10 Aug 2012 21:04:35 +0400
Message-ID: <20120810170435.GA29839@openwall.com>
References: <20120808044235.GA22470@openwall.com> <20120808052844.GF27715@brightrain.aerifal.cx> <20120808062706.GA23135@openwall.com> <CAPLrYETKUwjrV-R6ohPZuDZUXezSMvJM6Dzf7enitPu7gq_2yg@mail.gmail.com> <20120808214855.GL27715@brightrain.aerifal.cx> <20120809033613.GA24926@openwall.com> <20120809072940.GA26288@openwall.com> <20120809105348.GA27361@openwall.com> <20120809115811.GA32316@port70.net> <20120809232132.GX27715@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: dough.gmane.org 1344618284 433 80.91.229.3 (10 Aug 2012 17:04:44 GMT)
X-Complaints-To: usenet@dough.gmane.org
NNTP-Posting-Date: Fri, 10 Aug 2012 17:04:44 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-1506-gllmg-musl=m.gmane.org@lists.openwall.com Fri Aug 10 19:04:44 2012
Return-path: <musl-return-1506-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@plane.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by plane.gmane.org with smtp (Exim 4.69)
	(envelope-from <musl-return-1506-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1Szsdb-00027Z-SI
	for gllmg-musl@plane.gmane.org; Fri, 10 Aug 2012 19:04:40 +0200
Original-Received: (qmail 1532 invoked by uid 550); 10 Aug 2012 17:04:38 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
Original-Received: (qmail 1524 invoked from network); 10 Aug 2012 17:04:38 -0000
Content-Disposition: inline
In-Reply-To: <20120809232132.GX27715@brightrain.aerifal.cx>
User-Agent: Mutt/1.4.2.3i
Xref: news.gmane.org gmane.linux.lib.musl.general:1505
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/1505>

On Thu, Aug 09, 2012 at 07:21:32PM -0400, Rich Felker wrote:
> On Thu, Aug 09, 2012 at 01:58:12PM +0200, Szabolcs Nagy wrote:
> > > 	do {
> > > 		ptr += 2;
> > > 		L ^= ctx->s.P[0];
> > > 		BF_ROUND(L, R, 0);
[...]
> > > 		BF_ROUND(R, L, 15);
> > > 		tmp4 = R;
> > > 		R = L;
> > > 		L = tmp4 ^ ctx->s.P[BF_N + 1];
> > > 		*(ptr - 1) = R;
> > > 		*(ptr - 2) = L;
> > > 	} while (ptr < end);
> > 
> > why increase ptr at the begining?
> > it seems the idiomatic way would be
> > 
> >  *ptr++ = L;
> >  *ptr++ = R;
> 
> For me, making this change makes it 5% faster. I suspect the
> difference comes from the fact that gcc is not smart enough to move
> the ptr+=2; across the rest of the loop body, and the fact that it
> gets spilled to the stack and reloaded for *both* points of usage
> rather than just one. The original version may perform better on
> machines with A LOT more registers, but I'm doubtful...

The spilling theory makes sense to me, but it does not fully explain the
5% difference - I think it could explain a 1% difference or so.  More
likely the change affects register allocation overall, not only for
ptr, or has some similar side effect.

Anyhow, this does not match my test results so far, for different
revisions of this code.  What compiler, options, architecture, CPU?
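
To make it easier to compare notes, here is a minimal sketch of the two
loop shapes under discussion.  It is not the crypt_blowfish code:
toy_rounds() is a hypothetical stand-in for the 16 BF_ROUNDs, and the
function names and buffer size are made up for illustration.  It only
shows that the two store patterns write the same data, so that any
timing difference is down to code generation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the 16 BF_ROUNDs.  NOT Blowfish; it only
 * keeps L and R live across the loop body like the real rounds do. */
static void toy_rounds(uint32_t *L, uint32_t *R)
{
	for (int i = 0; i < 16; i++) {
		uint32_t rot = (*L << 5) | (*L >> 27);
		uint32_t t = *R ^ rot ^ (0x9e3779b9u * (uint32_t)(i + 1));
		*R = *L;
		*L = t;
	}
}

/* Shape of the original loop: bump ptr first, store via negative
 * offsets at the end of the body. */
static void fill_pre_increment(uint32_t *ptr, size_t n)
{
	uint32_t *end = ptr + n;
	uint32_t L = 0, R = 0;

	assert(n >= 2 && n % 2 == 0);
	do {
		ptr += 2;
		toy_rounds(&L, &R);
		*(ptr - 1) = R;
		*(ptr - 2) = L;
	} while (ptr < end);
}

/* Shape suggested in the thread: store and advance in one step. */
static void fill_post_increment(uint32_t *ptr, size_t n)
{
	uint32_t *end = ptr + n;
	uint32_t L = 0, R = 0;

	assert(n >= 2 && n % 2 == 0);
	do {
		toy_rounds(&L, &R);
		*ptr++ = L;
		*ptr++ = R;
	} while (ptr < end);
}

/* Returns 1 if both loop shapes produce identical buffers. */
int loop_styles_agree(void)
{
	uint32_t a[64], b[64];

	fill_pre_increment(a, 64);
	fill_post_increment(b, 64);
	return memcmp(a, b, sizeof a) == 0;
}
```

Calling loop_styles_agree() from a small main() and timing each fill
function separately, with the compiler and options stated, would give
numbers that can be compared across setups.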

As written, this code did in fact want more registers than 32-bit x86
has - it needs one more register for the context, which was introduced
in crypt_blowfish for thread-safety and is not present in JtR.  In
crypt_blowfish, I addressed this with some magic in the asm code, and
assumed that other common archs do have more than 8 registers.  With
the asm code dropped, maybe this piece of C does need to be optimized
for 32-bit x86 more.  That said, it performs as well as the asm code on
most CPUs newer than the original Pentium (where the asm code is a lot
faster); the exception is Atom, where users reported the asm code being
significantly faster.

Alexander