From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/1453
Path: news.gmane.org!not-for-mail
From: Solar Designer <solar@openwall.com>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: crypt* files in crypt directory
Date: Wed, 8 Aug 2012 10:27:06 +0400
Message-ID: <20120808062706.GA23135@openwall.com>
References: <alpine.LNX.2.02.1207211701001.1301@localhost.localdomain> <20120808022421.GE27715@brightrain.aerifal.cx> <20120808044235.GA22470@openwall.com> <20120808052844.GF27715@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: dough.gmane.org 1344407230 31196 80.91.229.3 (8 Aug 2012 06:27:10 GMT)
X-Complaints-To: usenet@dough.gmane.org
NNTP-Posting-Date: Wed, 8 Aug 2012 06:27:10 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-1454-gllmg-musl=m.gmane.org@lists.openwall.com Wed Aug 08 08:27:10 2012
Return-path: <musl-return-1454-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@plane.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by plane.gmane.org with smtp (Exim 4.69)
	(envelope-from <musl-return-1454-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1SyzjZ-0004Rz-7m
	for gllmg-musl@plane.gmane.org; Wed, 08 Aug 2012 08:27:09 +0200
Original-Received: (qmail 13922 invoked by uid 550); 8 Aug 2012 06:27:08 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
Original-Received: (qmail 13914 invoked from network); 8 Aug 2012 06:27:08 -0000
Content-Disposition: inline
In-Reply-To: <20120808052844.GF27715@brightrain.aerifal.cx>
User-Agent: Mutt/1.4.2.3i
Xref: news.gmane.org gmane.linux.lib.musl.general:1453
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/1453>

On Wed, Aug 08, 2012 at 01:28:44AM -0400, Rich Felker wrote:
> On Wed, Aug 08, 2012 at 08:42:35AM +0400, Solar Designer wrote:
> > I see that you did this - and I think you took it too far.  The code
> > became about half as fast on Pentium 3 when compiling with gcc 3.4.5 (approx.
> > 140 c/s down to 77 c/s).  Adding -finline-functions
> > -fold-unroll-all-loops regains only a fraction of the speed (112 c/s);
> > less aggressive loop unrolling results in lower speeds.
> 
> Can you compare with a more modern gcc?

I could and I might do that later, but to me the slowdown with gcc 3 is
enough reason not to make those changes in that specific way.

> > The impact on x86-64 is less.  With Ubuntu 12.04's gcc 4.6.3 on FX-8120
> > I get 490 c/s for the original code, 450 c/s for your code without
> > inlining/unrolling, and somehow only 430 c/s with -finline-functions
> > -funroll-loops.
> 
> Actually this is a lot closer to what I expected. I think you'll find
> similar results on 32-bit with gcc 4.6.3 too. The modern expectation
> is that manually unrolling loops will give worse performance than
> letting the compiler decide what to do. Certainly there are exceptions
> to the expected result, but on average, it's the right decision.

Per the numbers above, the compiler's unrolling here (430 c/s) is slower
not only than the manual unrolling (490 c/s), but also than the
non-unrolled code (450 c/s).

> Even if it's twice as slow, that should only be the cost of
> incrementing the (logarithmic) iteration count by one.

Yes, and I think this is significant.

> The size difference between the versions is roughly 50%

It doesn't have to be.  There are 6 instances of BF_ENCRYPT in
BF_crypt().  I am only asking you to revert the two that are inside
BF_body to their larger form.  The other 4 may stay as calls to a
function.  Alternatively, all 6 may be function calls, but then the
BF_ENCRYPT inside that function should be fully manually unrolled.  I am
not sure which of these options will be faster overall for typical
settings (we'd need to benchmark them at $2a$08).
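
To make the two forms concrete, here is a toy sketch, not the actual
crypt_blowfish.c code.  The subkeys in toy_P, the round function toy_f,
and all toy_/TOY_ names are made up for illustration; only the overall
shape (16 rounds, compact callable form vs. fully unrolled inline form)
mirrors what is being discussed.

```c
#include <assert.h>
#include <stdint.h>

/* Toy subkeys: arbitrary values, NOT Blowfish's real P-array. */
static const uint32_t toy_P[18] = {
	0x01234567u, 0x89abcdefu, 0xfedcba98u, 0x76543210u,
	0x0f1e2d3cu, 0x4b5a6978u, 0x8796a5b4u, 0xc3d2e1f0u,
	0x13579bdfu, 0x2468ace0u, 0xdeadbeefu, 0xcafef00du,
	0x0badc0deu, 0x5eed5eedu, 0xa5a5a5a5u, 0x5a5a5a5au,
	0x31415926u, 0x27182818u
};

/* Toy round function: a stand-in for the S-box lookups. */
static uint32_t toy_f(uint32_t x)
{
	return ((x << 5) | (x >> 27)) + (x ^ 0x9e3779b9u);
}

/* One Feistel round: mix L into R using subkey N + 1. */
#define TOY_ROUND(L, R, N) ((R) ^= toy_f(L) ^ toy_P[(N) + 1])

/* Compact form: a real function with a loop, small but not unrolled. */
static void toy_encrypt_call(uint32_t *l, uint32_t *r)
{
	uint32_t L = *l ^ toy_P[0], R = *r, t;
	int i;

	for (i = 0; i < 16; i += 2) {
		TOY_ROUND(L, R, i);
		TOY_ROUND(R, L, i + 1);
	}
	t = R; R = L; L = t ^ toy_P[17];
	*l = L; *r = R;
}

/* Larger form: all 16 rounds written out, expanded at each use site. */
#define TOY_ENCRYPT_UNROLLED(L, R) do { \
	uint32_t t_; \
	(L) ^= toy_P[0]; \
	TOY_ROUND(L, R, 0);  TOY_ROUND(R, L, 1); \
	TOY_ROUND(L, R, 2);  TOY_ROUND(R, L, 3); \
	TOY_ROUND(L, R, 4);  TOY_ROUND(R, L, 5); \
	TOY_ROUND(L, R, 6);  TOY_ROUND(R, L, 7); \
	TOY_ROUND(L, R, 8);  TOY_ROUND(R, L, 9); \
	TOY_ROUND(L, R, 10); TOY_ROUND(R, L, 11); \
	TOY_ROUND(L, R, 12); TOY_ROUND(R, L, 13); \
	TOY_ROUND(L, R, 14); TOY_ROUND(R, L, 15); \
	t_ = (R); (R) = (L); (L) = t_ ^ toy_P[17]; \
} while (0)

/* Returns 1 if both forms produce the same output for this block. */
static int toy_forms_agree(uint32_t l, uint32_t r)
{
	uint32_t l1 = l, r1 = r, l2 = l, r2 = r;

	toy_encrypt_call(&l1, &r1);
	TOY_ENCRYPT_UNROLLED(l2, r2);
	return l1 == l2 && r1 == r2;
}

/* Aborts if the two forms ever disagree on a sample block. */
static void toy_selfcheck(void)
{
	assert(toy_forms_agree(0x00000000u, 0x00000000u));
	assert(toy_forms_agree(0x12345678u, 0x9abcdef0u));
}
```

The first option would use something like the macro in the two hot
spots and the function elsewhere; the second would keep one function
whose body is the unrolled macro.  Which is faster still has to be
measured.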

> (7k vs 11.5k with -Os
> and roughly 9k vs 13.5k with -O3). Yes one can argue that the
> difference doesn't matter for one particular component they especially
> care about,

Exactly.

> but everyone cares about something different, and in the
> end the whole library ends up 50% larger if you follow that to its
> logical end.

Makes sense.

> I'd much rather stick with letting the compiler do the
> bloating-up for performance purposes if the user wants it, so that
> the choice is left to them.

Maybe you could support -DFAST_CRYPT or the like.  It could enable
forced inlining and manual unrolls in crypt_blowfish.c.
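
Roughly, such a switch might look like the sketch below.  This is only
an illustration under assumptions: CRYPT_INLINE, TOY_STEP, toy_mix and
toy_mix_ref are hypothetical names, the loop body is a toy and not
Blowfish, and the always_inline attribute is a GCC/Clang extension, so
it is guarded by __GNUC__.

```c
#include <assert.h>
#include <stdint.h>

/* Build with -DFAST_CRYPT to request forced inlining and manual unrolls. */
#if defined(FAST_CRYPT) && defined(__GNUC__)
#define CRYPT_INLINE static inline __attribute__((always_inline))
#else
#define CRYPT_INLINE static
#endif

/* Toy step: rotate left by one bit, then mix in the step number. */
#define TOY_STEP(x, n) ((x) = (((x) << 1) | ((x) >> 31)) ^ (uint32_t)(n))

CRYPT_INLINE uint32_t toy_mix(uint32_t x)
{
#ifdef FAST_CRYPT
	/* Speed build: steps written out by hand. */
	TOY_STEP(x, 0); TOY_STEP(x, 1); TOY_STEP(x, 2); TOY_STEP(x, 3);
	TOY_STEP(x, 4); TOY_STEP(x, 5); TOY_STEP(x, 6); TOY_STEP(x, 7);
#else
	/* Default build: compact loop, unrolling left to the compiler. */
	int n;

	for (n = 0; n < 8; n++)
		TOY_STEP(x, n);
#endif
	return x;
}

/* Plain reference loop, independent of FAST_CRYPT. */
static uint32_t toy_mix_ref(uint32_t x)
{
	int n;

	for (n = 0; n < 8; n++)
		TOY_STEP(x, n);
	return x;
}

/* Either build must compute the same result as the reference. */
static void toy_mix_selfcheck(void)
{
	assert(toy_mix(0x12345678u) == toy_mix_ref(0x12345678u));
	assert(toy_mix(0xffffffffu) == toy_mix_ref(0xffffffffu));
}
```

The default build stays small, and those who care about crypt speed
opt in at compile time.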

Alexander