From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/1496
Path: news.gmane.org!not-for-mail
From: Rich Felker <dalias@aerifal.cx>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: crypt_blowfish integration, optimization
Date: Thu, 9 Aug 2012 18:32:59 -0400
Message-ID: <20120809223258.GW27715@brightrain.aerifal.cx>
References: <20120808044235.GA22470@openwall.com>
 <20120808052844.GF27715@brightrain.aerifal.cx>
 <20120808062706.GA23135@openwall.com>
 <CAPLrYETKUwjrV-R6ohPZuDZUXezSMvJM6Dzf7enitPu7gq_2yg@mail.gmail.com>
 <20120808214855.GL27715@brightrain.aerifal.cx>
 <20120809033613.GA24926@openwall.com>
 <20120809072940.GA26288@openwall.com>
 <20120809105348.GA27361@openwall.com>
 <20120809214654.GU27715@brightrain.aerifal.cx>
 <20120809222103.GA29365@openwall.com>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: dough.gmane.org 1344551544 28617 80.91.229.3 (9 Aug 2012 22:32:24 GMT)
X-Complaints-To: usenet@dough.gmane.org
NNTP-Posting-Date: Thu, 9 Aug 2012 22:32:24 +0000 (UTC)
To: musl@lists.openwall.com
Original-X-From: musl-return-1497-gllmg-musl=m.gmane.org@lists.openwall.com Fri Aug 10 00:32:24 2012
Return-path: <musl-return-1497-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@plane.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by plane.gmane.org with smtp (Exim 4.69)
	(envelope-from <musl-return-1497-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1SzbH8-0006Fy-2j
	for gllmg-musl@plane.gmane.org; Fri, 10 Aug 2012 00:32:18 +0200
Original-Received: (qmail 32421 invoked by uid 550); 9 Aug 2012 22:32:17 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
Original-Received: (qmail 32413 invoked from network); 9 Aug 2012 22:32:16 -0000
Content-Disposition: inline
In-Reply-To: <20120809222103.GA29365@openwall.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Xref: news.gmane.org gmane.linux.lib.musl.general:1496
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/1496>

On Fri, Aug 10, 2012 at 02:21:03AM +0400, Solar Designer wrote:
> On Thu, Aug 09, 2012 at 05:46:54PM -0400, Rich Felker wrote:
> > I've taken this version and made some minimum changes based on my
> > version, mainly for integration with musl where I'm testing it. I also
> > think we've reached the final word on loop unrolling:
> > 
> > Just For Fun, I tried replacing your unrolled BF_ROUND loop with a for
> > loop and compiling with -O3 on gcc 4.6.3. After noticing the
> > performance numbers were coming out near-identical, and that the .o
> > sizes were mysteriously identical, I decided, Just For Fun, to
> > disassemble both versions with objdump and diff them. They are
> > identical. That is, modern gcc generates byte-for-byte identical code
> > with -O3 for the manually unrolled loop and the for loop.
> 
> What about -O2?
> 
> -O3 is probably not what will be used for most musl builds, is it?
> 
> Hmm, for me "gcc -Q -O2 --help=optimizers" and ditto for -O3 both show
> "disabled" for -funroll-loops.  Why was the loop unrolled for you?

Not sure. I've found -Q --help=optimizers completely unreliable in the
past though. It only reports minimal differences between -Os, -O2, and
-O3, and trying to start with -O3 and reproduce -Os by just changing
the options that are different does not give effects even remotely
similar to -Os.

> Did you also have -funroll-loops specified explicitly?  If so, does this
> happen for normal musl builds?  I guess not?

No, I did not explicitly specify it. At present, -Os is default for
static libc and -O3 is default for shared libc. The reason for this
discrepancy is that -fPIC generates a lot of size and speed bloat at
each function call, so the inlining from -O3 comes at reduced cost (it
eliminates wasteful prologue, compensating for some of the size
increase) and brings much greater performance benefits (again, from
killing prologue).

I've been thinking of making -O3 default across the board rather than
having different defaults for the two, which is ugly from a
build-system perspective, but some people are still against it even
though it's easy to override.

> As discussed, the problem with avoiding such hand-unrolls is that the
> compiler doesn't know just which loops are most important to unroll.

My experience has been that it tends to make good decisions overall,
and that if somebody is using -Os, they really want smallest size, not
performance.
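
For concreteness, here is a minimal sketch of the two forms being
compared: a rolled for loop and the same rounds written out by hand.
This is not the crypt_blowfish code. TOY_ROUND, toy_f, toy_key and
TOY_ROUNDS are hypothetical stand-ins (real Blowfish uses 16 rounds
and S-box lookups), kept only to show the shape of the loop.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ROUNDS 4 /* hypothetical; real Blowfish uses 16 rounds */

/* Hypothetical subkeys, chosen arbitrarily for illustration. */
static const uint32_t toy_key[TOY_ROUNDS] = {
    0x243f6a88u, 0x85a308d3u, 0x13198a2eu, 0x03707344u
};

/* Toy mixing function standing in for the real S-box lookups. */
static uint32_t toy_f(uint32_t x)
{
    return (x << 5 | x >> 27) + 0x9e3779b9u;
}

/* One Feistel-style round; the caller alternates the roles of L and R. */
#define TOY_ROUND(L, R, N) \
    do { (L) ^= toy_key[N]; (R) ^= toy_f(L); } while (0)

/* Rolled form: two rounds per iteration so L and R swap roles. */
void toy_encrypt_rolled(uint32_t *l, uint32_t *r)
{
    uint32_t L = *l, R = *r;
    for (int i = 0; i < TOY_ROUNDS; i += 2) {
        TOY_ROUND(L, R, i);
        TOY_ROUND(R, L, i + 1);
    }
    *l = L;
    *r = R;
}

/* Hand-unrolled form: the same rounds written out explicitly. */
void toy_encrypt_unrolled(uint32_t *l, uint32_t *r)
{
    uint32_t L = *l, R = *r;
    TOY_ROUND(L, R, 0);
    TOY_ROUND(R, L, 1);
    TOY_ROUND(L, R, 2);
    TOY_ROUND(R, L, 3);
    *l = L;
    *r = R;
}

/* Returns 1 if both forms produce the same output for this input. */
int toy_versions_agree(uint32_t l, uint32_t r)
{
    uint32_t a = l, b = r, c = l, d = r;
    toy_encrypt_rolled(&a, &b);
    toy_encrypt_unrolled(&c, &d);
    return a == c && b == d;
}

/* Optional self-check; aborts if the two forms ever disagree. */
void toy_self_check(void)
{
    assert(toy_versions_agree(0xdeadbeefu, 0x01234567u));
}
```

Whether a given compiler turns the rolled form into the unrolled one
depends on its version and optimization level. The way to find out is
the one described in the quoted text: put each form in its own file,
compile both with the same flags, and diff the objdump disassembly.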

> BTW, what speeds are you getting on your Atom?

I was clocking 0.573 seconds for one run with the 2^12 iterations on
one test, and about 4 million cycles per run with 2^4 iterations. This
is with my version of the code (essentially the same as yours;
compiled at -O3).

> How does this compare to
> the original crypt_blowfish-1.2 with asm code (both on 32-bit)?

I'll have to get the code and try it... The asm doesn't seem to have
ever been present in the code sent to the list.

Rich