From: Solar Designer
To: musl@lists.openwall.com
Reply-To: musl@lists.openwall.com
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: crypt* files in crypt directory
Date: Wed, 8 Aug 2012 10:27:06 +0400
Message-ID: <20120808062706.GA23135@openwall.com>
In-Reply-To: <20120808052844.GF27715@brightrain.aerifal.cx>
References: <20120808022421.GE27715@brightrain.aerifal.cx> <20120808044235.GA22470@openwall.com> <20120808052844.GF27715@brightrain.aerifal.cx>

On Wed, Aug 08, 2012 at 01:28:44AM -0400, Rich Felker wrote:
> On Wed, Aug 08, 2012 at 08:42:35AM +0400, Solar Designer wrote:
> > I see that you did this - and I think you took it too far. The code
> > became twice slower on Pentium 3 when compiling with gcc 3.4.5 (approx.
> > 140 c/s down to 77 c/s). Adding -finline-functions
> > -fold-unroll-all-loops regains only a fraction of the speed (112 c/s);
> > less aggressive loop unrolling results in lower speeds.
>
> Can you compare with a more modern gcc?

I could and I might do that later, but to me the slowdown with gcc 3 is
enough reason not to make those changes in that specific way.

> > The impact on x86-64 is less. With Ubuntu 12.04's gcc 4.6.3 on FX-8120
> > I get 490 c/s for the original code, 450 c/s for your code without
> > inlining/unrolling, and somehow only 430 c/s with -finline-functions
> > -funroll-loops.
>
> Actually this is a lot closer to what I expected. I think you'll find
> similar results on 32-bit with gcc 4.6.3 too. The modern expectation
> is that manually unrolling loops will give worse performance than
> letting the compiler decide what to do. Certainly there are exceptions
> to the expected result, but on average, it's the right decision.

Per the numbers above, here the compiler's unroll is slower not only
than manual unroll, but also than non-unrolled code.

> Even if it's twice as slow, that should only be the cost of
> incrementing the (logarithmic) iteration count by one).

Yes, and I think this is significant.

> The size difference between the versions is roughly 50%

It doesn't have to be. There are 6 instances of BF_ENCRYPT in
BF_crypt(). I am only asking you to revert to their larger form the two
that are inside BF_body. The remaining 4 may remain as calls to a
function.

Alternatively, all 6 may be function calls, but then the function's
BF_ENCRYPT should be a fully manually unrolled one.

I am not sure which of these options will be faster overall for typical
settings (we'd need to benchmark these at $2a$08).

> (7k vs 11.5k with -Os
> and roughly 9k vs 13.5k with -O3). Yes one can argue that the
> difference doesn't matter for one particular component they especially
> care about,

Exactly.

> but everyone cares about something different, and in the
> end the whole library ends up 50% larger if you follow that to its
> logical end.

Makes sense.

> I'd much rather stick with letting the compiler do the
> bloating-up for performance purposes if the user wants it, so that
> the choice is left to them.

Maybe you could support -DFAST_CRYPT or the like. It could enable
forced inlining and manual unrolls in crypt_blowfish.c.

Alexander
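
To illustrate the -DFAST_CRYPT idea, here is a rough sketch of how such a
switch might look. It is not code from crypt_blowfish.c: the round
function, the TOY_P constants and every toy_* / TOY_* name are made up for
the example, and the cipher is a toy Feistel network rather than Blowfish.
Only the shape is the point: a compact out-of-line loop by default, and a
force-inlined, manually unrolled body when FAST_CRYPT is defined. The
always_inline attribute is assumed to be available (GCC and Clang).

```c
#include <stdint.h>

/* Hypothetical helper macro; always_inline is a GCC/Clang attribute. */
#if defined(__GNUC__)
#define TOY_FORCE_INLINE static inline __attribute__((always_inline))
#else
#define TOY_FORCE_INLINE static inline
#endif

/* Toy stand-ins for the P-array and F function (NOT real Blowfish). */
#define TOY_P(n) ((uint32_t)(0x9E3779B9u * (uint32_t)((n) + 1)))

TOY_FORCE_INLINE uint32_t toy_f(uint32_t x)
{
    return ((x << 5) | (x >> 27)) + (x ^ 0xA5A5A5A5u);
}

/* Compact form: one out-of-line function with a 16-round loop. */
void toy_encrypt_loop(uint32_t block[2])
{
    uint32_t l = block[0], r = block[1], tmp;
    int i;

    for (i = 0; i < 16; i++) {
        l ^= TOY_P(i);
        r ^= toy_f(l);
        tmp = l; l = r; r = tmp;
    }
    tmp = l; l = r; r = tmp; /* undo the swap done by the last round */
    r ^= TOY_P(16);
    l ^= TOY_P(17);
    block[0] = l;
    block[1] = r;
}

/* Large form: same rounds, unrolled by hand; swaps become renamed operands. */
#define TOY_ROUND(a, b, n) do { (a) ^= TOY_P(n); (b) ^= toy_f(a); } while (0)

TOY_FORCE_INLINE void toy_encrypt_unrolled(uint32_t block[2])
{
    uint32_t l = block[0], r = block[1];

    TOY_ROUND(l, r, 0);  TOY_ROUND(r, l, 1);
    TOY_ROUND(l, r, 2);  TOY_ROUND(r, l, 3);
    TOY_ROUND(l, r, 4);  TOY_ROUND(r, l, 5);
    TOY_ROUND(l, r, 6);  TOY_ROUND(r, l, 7);
    TOY_ROUND(l, r, 8);  TOY_ROUND(r, l, 9);
    TOY_ROUND(l, r, 10); TOY_ROUND(r, l, 11);
    TOY_ROUND(l, r, 12); TOY_ROUND(r, l, 13);
    TOY_ROUND(l, r, 14); TOY_ROUND(r, l, 15);
    l ^= TOY_P(16);
    r ^= TOY_P(17);
    block[0] = r; /* halves come out crossed, as in the loop form */
    block[1] = l;
}

/* The build switch: -DFAST_CRYPT selects the inlined, unrolled body. */
#ifdef FAST_CRYPT
#define toy_encrypt toy_encrypt_unrolled
#else
#define toy_encrypt toy_encrypt_loop
#endif

/* Hot loop with a logarithmic cost setting (log2_count must be below 32):
 * raising log2_count by one doubles the number of toy_encrypt calls. */
void toy_hash(uint32_t block[2], unsigned log2_count)
{
    uint32_t n = (uint32_t)1 << log2_count;

    do {
        toy_encrypt(block);
    } while (--n);
}
```

Both forms compute the same result, so the choice only trades code size
against speed. Building once with and once without -DFAST_CRYPT and timing
toy_hash() at a fixed cost setting would be one way to run the kind of
comparison discussed above.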