mailing list of musl libc
 help / color / mirror / code / Atom feed
From: Solar Designer <solar@openwall.com>
To: musl@lists.openwall.com
Subject: Re: crypt* files in crypt directory
Date: Fri, 10 Aug 2012 21:04:35 +0400	[thread overview]
Message-ID: <20120810170435.GA29839@openwall.com> (raw)
In-Reply-To: <20120809232132.GX27715@brightrain.aerifal.cx>

On Thu, Aug 09, 2012 at 07:21:32PM -0400, Rich Felker wrote:
> On Thu, Aug 09, 2012 at 01:58:12PM +0200, Szabolcs Nagy wrote:
> > > 	do {
> > > 		ptr += 2;
> > > 		L ^= ctx->s.P[0];
> > > 		BF_ROUND(L, R, 0);
[...]
> > > 		BF_ROUND(R, L, 15);
> > > 		tmp4 = R;
> > > 		R = L;
> > > 		L = tmp4 ^ ctx->s.P[BF_N + 1];
> > > 		*(ptr - 1) = R;
> > > 		*(ptr - 2) = L;
> > > 	} while (ptr < end);
> > 
> > why increase ptr at the begining?
> > it seems the idiomatic way would be
> > 
> >  *ptr++ = L;
> >  *ptr++ = R;
> 
> For me, making this change makes it 5% faster. I suspect the
> difference comes from the fact that gcc is not smart enough to move
> the ptr+=2; across the rest of the loop body, and the fact that it
> gets spilled to the stack and reloaded for *both* points of usage
> rather than just one. The original version may perform better on
> machines with A LOT more registers, but I'm doubtful...

The spilling theory makes sense to me, but it does not fully explain the
5% difference - I think it could explain a 1% difference or so.  More
likely there's some change in register allocation overall, not only for
ptr - or something like it.

Anyhow, this does not match my test results so far, for different
revisions of this code.  What compiler, options, architecture, CPU?

As written, this code did in fact want more registers than 32-bit x86
has - it needs one more register for the context, for thread-safety
introduced in crypt_blowfish as opposed to JtR.  In crypt_blowfish, I
addressed this by some magic in the asm code, and assumed that other
common archs do have more than 8 registers.  With the asm code dropped,
maybe this piece of C does need to be optimized for 32-bit x86 more -
although it performs as well as the asm code on CPUs newer than the
original Pentium (where the asm code is a lot faster) and different than
Atom (where users reported the asm code being significantly faster).

Alexander


  reply	other threads:[~2012-08-10 17:04 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-21 15:23 Łukasz Sowa
2012-07-21 17:11 ` Solar Designer
2012-07-21 20:17   ` Rich Felker
2012-07-22 16:23   ` Łukasz Sowa
2012-07-25  7:57 ` Rich Felker
2012-08-08  2:24 ` Rich Felker
2012-08-08  4:42   ` Solar Designer
2012-08-08  5:28     ` Rich Felker
2012-08-08  6:27       ` Solar Designer
2012-08-08  7:03         ` Daniel Cegiełka
2012-08-08  7:24           ` Solar Designer
2012-08-08  7:42             ` Daniel Cegiełka
2012-08-08 21:48           ` Rich Felker
2012-08-08 23:08             ` Isaac Dunham
2012-08-08 23:24               ` John Spencer
2012-08-09  1:03                 ` Isaac Dunham
2012-08-09  3:16               ` Rich Felker
2012-08-09  3:36             ` Solar Designer
2012-08-09  7:13               ` orc
2012-08-09  7:28                 ` Rich Felker
2012-08-09  7:29               ` Solar Designer
2012-08-09 10:53                 ` Solar Designer
2012-08-09 11:58                   ` Szabolcs Nagy
2012-08-09 16:43                     ` Solar Designer
2012-08-09 17:30                       ` Szabolcs Nagy
2012-08-09 18:22                       ` Rich Felker
2012-08-09 23:21                     ` Rich Felker
2012-08-10 17:04                       ` Solar Designer [this message]
2012-08-10 18:06                         ` Rich Felker
2012-08-09 21:46                   ` crypt_blowfish integration, optimization Rich Felker
2012-08-09 22:21                     ` Solar Designer
2012-08-09 22:32                       ` Rich Felker
2012-08-10 17:18                         ` Solar Designer
2012-08-10 18:08                           ` Rich Felker
2012-08-10 22:52                             ` Solar Designer
2012-08-08  7:52     ` crypt* files in crypt directory Szabolcs Nagy
2012-08-08 13:06       ` Rich Felker
2012-08-08 14:30         ` orc
2012-08-08 14:53           ` Szabolcs Nagy
2012-08-08 15:05             ` orc
2012-08-08 18:10         ` Rich Felker
2012-08-09  1:51         ` Solar Designer
2012-08-09  3:25           ` Rich Felker
2012-08-09  4:04             ` Solar Designer
2012-08-09  5:48               ` Rich Felker
2012-08-09 15:52                 ` Solar Designer
2012-08-09 17:59                   ` Rich Felker
2012-08-09 21:17                   ` Rich Felker
2012-08-09 21:44                     ` Solar Designer
2012-08-09 22:08                       ` Rich Felker
2012-08-09 23:33           ` Rich Felker
2012-08-09  6:03   ` Rich Felker
  -- strict thread matches above, loose matches on Subject: below --
2012-07-17  9:40 Daniel Cegiełka
2012-07-17 17:51 ` Rich Felker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120810170435.GA29839@openwall.com \
    --to=solar@openwall.com \
    --cc=musl@lists.openwall.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).