mailing list of musl libc
From: Rich Felker <dalias@aerifal.cx>
To: musl@lists.openwall.com
Subject: Re: crypt* files in crypt directory
Date: Thu, 9 Aug 2012 19:21:32 -0400
Message-ID: <20120809232132.GX27715@brightrain.aerifal.cx>
In-Reply-To: <20120809115811.GA32316@port70.net>

On Thu, Aug 09, 2012 at 01:58:12PM +0200, Szabolcs Nagy wrote:
> > #define BF_ROUND(L, R, N) \
> > 	tmp1 = L & 0xFF; \
> > 	tmp2 = L >> 8; \
> > 	tmp2 &= 0xFF; \
> > 	tmp3 = L >> 16; \
> > 	tmp3 &= 0xFF; \
> > 	tmp4 = L >> 24; \
> > 	tmp1 = ctx->s.S[3][tmp1]; \
> > 	tmp2 = ctx->s.S[2][tmp2]; \
> > 	tmp3 = ctx->s.S[1][tmp3]; \
> > 	tmp3 += ctx->s.S[0][tmp4]; \
> > 	tmp3 ^= tmp2; \
> > 	R ^= ctx->s.P[N + 1]; \
> > 	tmp3 += tmp1; \
> > 	R ^= tmp3;
> 
> i guess this is performance critical, but
> i wouldn't spread those expressions over
> several lines
> 
> tmp1 = ctx->s.S[3][L & 0xff];
> tmp2 = ctx->s.S[2][L>>8 & 0xff];
> tmp3 = ctx->s.S[1][L>>16 & 0xff];
> tmp4 = ctx->s.S[0][L>>24 & 0xff];
> R ^= ctx->s.P[N+1];
> R ^= ((tmp3 + tmp4) ^ tmp2) + tmp1;

My first attempt at removing the manual scheduling was significantly
slower than the hand-scheduled version. I haven't tried your version
yet, but it looks nicer, and I think it's worth benchmarking it
against the hand-scheduled one to see whether it does better.

> > 	do {
> > 		ptr += 2;
> > 		L ^= ctx->s.P[0];
> > 		BF_ROUND(L, R, 0);
> > 		BF_ROUND(R, L, 1);
> > 		BF_ROUND(L, R, 2);
> > 		BF_ROUND(R, L, 3);
> > 		BF_ROUND(L, R, 4);
> > 		BF_ROUND(R, L, 5);
> > 		BF_ROUND(L, R, 6);
> > 		BF_ROUND(R, L, 7);
> > 		BF_ROUND(L, R, 8);
> > 		BF_ROUND(R, L, 9);
> > 		BF_ROUND(L, R, 10);
> > 		BF_ROUND(R, L, 11);
> > 		BF_ROUND(L, R, 12);
> > 		BF_ROUND(R, L, 13);
> > 		BF_ROUND(L, R, 14);
> > 		BF_ROUND(R, L, 15);
> > 		tmp4 = R;
> > 		R = L;
> > 		L = tmp4 ^ ctx->s.P[BF_N + 1];
> > 		*(ptr - 1) = R;
> > 		*(ptr - 2) = L;
> > 	} while (ptr < end);
> 
> why increase ptr at the beginning?
> it seems the idiomatic way would be
> 
>  *ptr++ = L;
>  *ptr++ = R;

For me, this change makes it 5% faster. I suspect the difference
comes from two things: gcc is not smart enough to move the ptr+=2;
past the rest of the loop body, and ptr then gets spilled to the
stack and reloaded at *both* points of use rather than just one. The
original version may perform better on machines with A LOT more
registers, but I'm doubtful...

Rich



Thread overview: 54+ messages
2012-07-21 15:23 Łukasz Sowa
2012-07-21 17:11 ` Solar Designer
2012-07-21 20:17   ` Rich Felker
2012-07-22 16:23   ` Łukasz Sowa
2012-07-25  7:57 ` Rich Felker
2012-08-08  2:24 ` Rich Felker
2012-08-08  4:42   ` Solar Designer
2012-08-08  5:28     ` Rich Felker
2012-08-08  6:27       ` Solar Designer
2012-08-08  7:03         ` Daniel Cegiełka
2012-08-08  7:24           ` Solar Designer
2012-08-08  7:42             ` Daniel Cegiełka
2012-08-08 21:48           ` Rich Felker
2012-08-08 23:08             ` Isaac Dunham
2012-08-08 23:24               ` John Spencer
2012-08-09  1:03                 ` Isaac Dunham
2012-08-09  3:16               ` Rich Felker
2012-08-09  3:36             ` Solar Designer
2012-08-09  7:13               ` orc
2012-08-09  7:28                 ` Rich Felker
2012-08-09  7:29               ` Solar Designer
2012-08-09 10:53                 ` Solar Designer
2012-08-09 11:58                   ` Szabolcs Nagy
2012-08-09 16:43                     ` Solar Designer
2012-08-09 17:30                       ` Szabolcs Nagy
2012-08-09 18:22                       ` Rich Felker
2012-08-09 23:21                     ` Rich Felker [this message]
2012-08-10 17:04                       ` Solar Designer
2012-08-10 18:06                         ` Rich Felker
2012-08-09 21:46                   ` crypt_blowfish integration, optimization Rich Felker
2012-08-09 22:21                     ` Solar Designer
2012-08-09 22:32                       ` Rich Felker
2012-08-10 17:18                         ` Solar Designer
2012-08-10 18:08                           ` Rich Felker
2012-08-10 22:52                             ` Solar Designer
2012-08-08  7:52     ` crypt* files in crypt directory Szabolcs Nagy
2012-08-08 13:06       ` Rich Felker
2012-08-08 14:30         ` orc
2012-08-08 14:53           ` Szabolcs Nagy
2012-08-08 15:05             ` orc
2012-08-08 18:10         ` Rich Felker
2012-08-09  1:51         ` Solar Designer
2012-08-09  3:25           ` Rich Felker
2012-08-09  4:04             ` Solar Designer
2012-08-09  5:48               ` Rich Felker
2012-08-09 15:52                 ` Solar Designer
2012-08-09 17:59                   ` Rich Felker
2012-08-09 21:17                   ` Rich Felker
2012-08-09 21:44                     ` Solar Designer
2012-08-09 22:08                       ` Rich Felker
2012-08-09 23:33           ` Rich Felker
2012-08-09  6:03   ` Rich Felker
  -- strict thread matches above, loose matches on Subject: below --
2012-07-17  9:40 Daniel Cegiełka
2012-07-17 17:51 ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/
