From: Rich Felker <dalias@aerifal.cx>
To: musl@lists.openwall.com
Subject: Re: crypt* files in crypt directory
Date: Thu, 9 Aug 2012 19:21:32 -0400
Message-ID: <20120809232132.GX27715@brightrain.aerifal.cx>
In-Reply-To: <20120809115811.GA32316@port70.net>
On Thu, Aug 09, 2012 at 01:58:12PM +0200, Szabolcs Nagy wrote:
> > #define BF_ROUND(L, R, N) \
> > tmp1 = L & 0xFF; \
> > tmp2 = L >> 8; \
> > tmp2 &= 0xFF; \
> > tmp3 = L >> 16; \
> > tmp3 &= 0xFF; \
> > tmp4 = L >> 24; \
> > tmp1 = ctx->s.S[3][tmp1]; \
> > tmp2 = ctx->s.S[2][tmp2]; \
> > tmp3 = ctx->s.S[1][tmp3]; \
> > tmp3 += ctx->s.S[0][tmp4]; \
> > tmp3 ^= tmp2; \
> > R ^= ctx->s.P[N + 1]; \
> > tmp3 += tmp1; \
> > R ^= tmp3;
>
> i guess this is performance critical, but
> i wouldn't spread those expressions over
> several lines
>
> tmp1 = ctx->S[3][L & 0xff];
> tmp2 = ctx->S[2][L>>8 & 0xff];
> tmp3 = ctx->S[1][L>>16 & 0xff];
> tmp4 = ctx->S[0][L>>24 & 0xff];
> R ^= ctx->P[N+1];
> R ^= ((tmp3 + tmp4) ^ tmp2) + tmp1;
My first modified version, which removes the manual scheduling, is
significantly slower than the hand-scheduled version. I haven't tried
your version here yet, but it looks nicer, and I think it would be
reasonable to benchmark it against the others and see if it's better.
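Before comparing speed, it is worth confirming that the two round
formulations quoted above compute the same value. The following is a
minimal sketch, not code from crypt_blowfish: struct bf_ctx, xorshift32,
fill_ctx and count_mismatches are hypothetical names, the tables are
filled with arbitrary pseudo-random words rather than real Blowfish
subkeys, and a 32-bit word type is assumed. It says nothing about
performance, only about equivalence.

```c
#include <stdint.h>

/* Hypothetical stand-in for the real context: 18 P words, 4 S-boxes. */
struct bf_ctx {
    uint32_t P[18];
    uint32_t S[4][256];
};

/* Small PRNG used only to produce arbitrary test data. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

static void fill_ctx(struct bf_ctx *ctx, uint32_t *state)
{
    for (int i = 0; i < 18; i++)
        ctx->P[i] = xorshift32(state);
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 256; j++)
            ctx->S[i][j] = xorshift32(state);
}

/* Mirrors the hand-scheduled macro body; returns the updated R. */
static uint32_t round_scheduled(const struct bf_ctx *ctx,
                                uint32_t L, uint32_t R, int n)
{
    uint32_t tmp1, tmp2, tmp3, tmp4;
    tmp1 = L & 0xFF;
    tmp2 = L >> 8;
    tmp2 &= 0xFF;
    tmp3 = L >> 16;
    tmp3 &= 0xFF;
    tmp4 = L >> 24;
    tmp1 = ctx->S[3][tmp1];
    tmp2 = ctx->S[2][tmp2];
    tmp3 = ctx->S[1][tmp3];
    tmp3 += ctx->S[0][tmp4];
    tmp3 ^= tmp2;
    R ^= ctx->P[n + 1];
    tmp3 += tmp1;
    R ^= tmp3;
    return R;
}

/* Mirrors the compact form suggested in the quoted reply. */
static uint32_t round_compact(const struct bf_ctx *ctx,
                              uint32_t L, uint32_t R, int n)
{
    uint32_t tmp1 = ctx->S[3][L & 0xff];
    uint32_t tmp2 = ctx->S[2][L >> 8 & 0xff];
    uint32_t tmp3 = ctx->S[1][L >> 16 & 0xff];
    uint32_t tmp4 = ctx->S[0][L >> 24 & 0xff];
    R ^= ctx->P[n + 1];
    R ^= ((tmp3 + tmp4) ^ tmp2) + tmp1;
    return R;
}

/* Runs both forms on the same inputs; returns how many results differ. */
unsigned count_mismatches(unsigned trials)
{
    struct bf_ctx ctx;
    uint32_t state = 0x12345678u; /* arbitrary nonzero seed */
    unsigned bad = 0;

    fill_ctx(&ctx, &state);
    for (unsigned i = 0; i < trials; i++) {
        uint32_t L = xorshift32(&state);
        uint32_t R = xorshift32(&state);
        int n = (int)(i % 16); /* round index 0..15, as in the loop */
        if (round_scheduled(&ctx, L, R, n) != round_compact(&ctx, L, R, n))
            bad++;
    }
    return bad;
}
```

With both forms agreeing, any timing difference between them comes
down to how the compiler schedules the loads and arithmetic.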
> > do {
> > ptr += 2;
> > L ^= ctx->s.P[0];
> > BF_ROUND(L, R, 0);
> > BF_ROUND(R, L, 1);
> > BF_ROUND(L, R, 2);
> > BF_ROUND(R, L, 3);
> > BF_ROUND(L, R, 4);
> > BF_ROUND(R, L, 5);
> > BF_ROUND(L, R, 6);
> > BF_ROUND(R, L, 7);
> > BF_ROUND(L, R, 8);
> > BF_ROUND(R, L, 9);
> > BF_ROUND(L, R, 10);
> > BF_ROUND(R, L, 11);
> > BF_ROUND(L, R, 12);
> > BF_ROUND(R, L, 13);
> > BF_ROUND(L, R, 14);
> > BF_ROUND(R, L, 15);
> > tmp4 = R;
> > R = L;
> > L = tmp4 ^ ctx->s.P[BF_N + 1];
> > *(ptr - 1) = R;
> > *(ptr - 2) = L;
> > } while (ptr < end);
>
> why increase ptr at the beginning?
> it seems the idiomatic way would be
>
> *ptr++ = L;
> *ptr++ = R;
For me, this change makes it 5% faster. I suspect the difference
comes from two things: gcc is not smart enough to move the ptr+=2;
across the rest of the loop body, and ptr gets spilled to the stack
and reloaded at *both* points of use rather than just one. The
original version may perform better on machines with A LOT more
registers, but I'm doubtful...
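To make the two store patterns concrete, here is a small sketch of
both loop shapes side by side. It is an illustration under
assumptions, not the crypt_blowfish code: mix() is a hypothetical
placeholder for the sixteen rounds and the final swap, and
fill_preincrement, fill_postincrement and fills_agree are made-up
names. It only shows that the two forms store the same words in the
same slots; it does not reproduce the timing difference, which depends
on the compiler and target.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical placeholder for the 16 rounds plus the final swap. */
static void mix(uint32_t *L, uint32_t *R)
{
    *L = *L * 2654435761u + *R;
    *R ^= *L >> 7;
}

/* Original shape: bump ptr first, then store through negative offsets.
 * ptr is live across the whole loop body. Needs at least one pair. */
void fill_preincrement(uint32_t *ptr, uint32_t *end, uint32_t L, uint32_t R)
{
    do {
        ptr += 2;
        mix(&L, &R);
        *(ptr - 1) = R;
        *(ptr - 2) = L;
    } while (ptr < end);
}

/* Suggested shape: ptr is only touched at the stores. */
void fill_postincrement(uint32_t *ptr, uint32_t *end, uint32_t L, uint32_t R)
{
    do {
        mix(&L, &R);
        *ptr++ = L;
        *ptr++ = R;
    } while (ptr < end);
}

/* Returns 1 if both loops produce identical output buffers. */
int fills_agree(void)
{
    uint32_t a[16], b[16];

    fill_preincrement(a, a + 16, 1, 2);
    fill_postincrement(b, b + 16, 1, 2);
    return memcmp(a, b, sizeof a) == 0;
}
```

Comparing the output of gcc -S for the two functions is one way to see
whether ptr is actually being spilled and reloaded on a given target.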
Rich