From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/1475
Path: news.gmane.org!not-for-mail
From: Rich Felker <dalias@aerifal.cx>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: crypt* files in crypt directory
Date: Wed, 8 Aug 2012 23:16:40 -0400
Message-ID: <20120809031639.GM27715@brightrain.aerifal.cx>
References: <alpine.LNX.2.02.1207211701001.1301@localhost.localdomain>
 <20120808022421.GE27715@brightrain.aerifal.cx>
 <20120808044235.GA22470@openwall.com>
 <20120808052844.GF27715@brightrain.aerifal.cx>
 <20120808062706.GA23135@openwall.com>
 <CAPLrYETKUwjrV-R6ohPZuDZUXezSMvJM6Dzf7enitPu7gq_2yg@mail.gmail.com>
 <20120808214855.GL27715@brightrain.aerifal.cx>
 <20120808160810.731cec78@newbook>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: dough.gmane.org 1344482166 29911 80.91.229.3 (9 Aug 2012 03:16:06 GMT)
X-Complaints-To: usenet@dough.gmane.org
NNTP-Posting-Date: Thu, 9 Aug 2012 03:16:06 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-1476-gllmg-musl=m.gmane.org@lists.openwall.com Thu Aug 09 05:16:07 2012
Return-path: <musl-return-1476-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@plane.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by plane.gmane.org with smtp (Exim 4.69)
	(envelope-from <musl-return-1476-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1SzJEB-0002lk-9Y
	for gllmg-musl@plane.gmane.org; Thu, 09 Aug 2012 05:16:03 +0200
Original-Received: (qmail 3788 invoked by uid 550); 9 Aug 2012 03:16:01 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
Original-Received: (qmail 3780 invoked from network); 9 Aug 2012 03:16:01 -0000
Content-Disposition: inline
In-Reply-To: <20120808160810.731cec78@newbook>
User-Agent: Mutt/1.5.21 (2010-09-15)
Xref: news.gmane.org gmane.linux.lib.musl.general:1475
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/1475>

On Wed, Aug 08, 2012 at 04:08:10PM -0700, Isaac Dunham wrote:
> On Wed, 8 Aug 2012 17:48:55 -0400
> Rich Felker <dalias@aerifal.cx> wrote:
> 
> > > > Maybe you could support -DFAST_CRYPT or the like.  It could enable
> > > > forced inlining and manual unrolls in crypt_blowfish.c.
> ...
> > Unless there's a really compelling reason to do so, I'd like to avoid
> > having multiple alternative versions of the same code in a codebase.
> > It makes it so there's more combinations you have to test to be sure
> > the code works and doesn't have regressions.
> > 
> > As it stands, the code I posted with the manual unrolling removed
> > performs _better_ than the manually unrolled code with gcc 4 on x86_64
> > when optimized for speed, and it's 33% smaller when optimized for
> > size.
> 
> Per your own tests?
> I say this because the test previously mentioned shows the
> opposite:

OK, I misread the units as c=cycles and s=?? instead of c=crypts and
s=sec. But of course that reading doesn't make sense.

> > > The impact on x86-64 is less.  With Ubuntu 12.04's gcc 4.6.3 on
> > > FX-8120 I get 490 c/s for the original code, 450 c/s for your code
> > > without inlining/unrolling, and somehow only 430 c/s with
> > > -finline-functions -funroll-loops.  
> 
> That's:
> Raw      %speed  version
> 490 c/s  100%    original
> 450 c/s  92%     Rich's version
> 430 c/s  88%     Rich's version, unrolled by compiler
> Higher is faster.
> I.e., unrolling is actually slowing your version down more.
> 
> GCC 3/x86 is getting 80% with Rich's version, optimized.
> 
> Also, how much "bloat" does Solar Designer's proposal (unroll inside
> BF_body) add?

In terms of source bloat, it's even worse than either version: it
requires completely duplicating the whole function (once unrolled,
once straight). I have no idea how much binary bloat it adds; anybody
care to try it? My principal hesitation to even go there is that it
(1) makes for really ugly source bloat, and (2) perhaps cuts the
binary size savings in half or even worse, making the savings marginal
and arguably no longer worth the cost of having 2 copies of the same
code.
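
To make the duplication concern concrete, here is a minimal sketch
using a toy Feistel network. It is not the crypt_blowfish.c code, and
every name in it (toy_f, toy_encrypt_loop, toy_encrypt_unrolled,
TOY_ROUND, toy_versions_agree) is hypothetical. It only illustrates
that a straight copy and a hand-unrolled copy of the same routine have
to be kept in sync and tested against each other.

```c
#include <stdint.h>

#define TOY_ROUNDS 16

/* Toy round function: a stand-in for illustration, NOT Blowfish's F. */
static uint32_t toy_f(uint32_t x, uint32_t k)
{
	return ((x << 5) | (x >> 27)) ^ (x + k);
}

/* Straight copy: one loop; the compiler decides whether to unroll it. */
static void toy_encrypt_loop(uint32_t *l, uint32_t *r,
                             const uint32_t key[TOY_ROUNDS])
{
	uint32_t L = *l, R = *r;
	for (int i = 0; i < TOY_ROUNDS; i++) {
		uint32_t t = R;
		R = L ^ toy_f(R, key[i]);
		L = t;
	}
	*l = L;
	*r = R;
}

/* Hand-unrolled copy: the same 16 rounds, with the two halves
 * alternating roles instead of being swapped after each round. */
#define TOY_ROUND(a, b, n) ((a) ^= toy_f((b), key[(n)]))

static void toy_encrypt_unrolled(uint32_t *l, uint32_t *r,
                                 const uint32_t key[TOY_ROUNDS])
{
	uint32_t L = *l, R = *r;
	TOY_ROUND(L, R, 0);  TOY_ROUND(R, L, 1);
	TOY_ROUND(L, R, 2);  TOY_ROUND(R, L, 3);
	TOY_ROUND(L, R, 4);  TOY_ROUND(R, L, 5);
	TOY_ROUND(L, R, 6);  TOY_ROUND(R, L, 7);
	TOY_ROUND(L, R, 8);  TOY_ROUND(R, L, 9);
	TOY_ROUND(L, R, 10); TOY_ROUND(R, L, 11);
	TOY_ROUND(L, R, 12); TOY_ROUND(R, L, 13);
	TOY_ROUND(L, R, 14); TOY_ROUND(R, L, 15);
	*l = L;
	*r = R;
}

/* Returns 1 if both copies agree on one sample input, 0 otherwise. */
static int toy_versions_agree(void)
{
	uint32_t key[TOY_ROUNDS];
	for (int i = 0; i < TOY_ROUNDS; i++)
		key[i] = 0x9e3779b9u * (uint32_t)(i + 1);

	uint32_t l1 = 0x01234567u, r1 = 0x89abcdefu;
	uint32_t l2 = l1, r2 = r1;
	toy_encrypt_loop(&l1, &r1, key);
	toy_encrypt_unrolled(&l2, &r2, key);
	return l1 == l2 && r1 == r2;
}
```

Calling toy_versions_agree() from a test program is the kind of extra
check that becomes necessary once two copies exist; any change to the
round logic has to be made twice.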

Rich