Re: crypt* files in crypt directory

mailing list of musl libc
 help / color / mirror / code / Atom feed

From: Rich Felker <dalias@aerifal.cx>
To: musl@lists.openwall.com
Subject: Re: crypt* files in crypt directory
Date: Thu, 9 Aug 2012 14:22:05 -0400	[thread overview]
Message-ID: <20120809182204.GS27715@brightrain.aerifal.cx> (raw)
In-Reply-To: <20120809164355.GA28398@openwall.com>

On Thu, Aug 09, 2012 at 08:43:55PM +0400, Solar Designer wrote:
> On Thu, Aug 09, 2012 at 01:58:12PM +0200, Szabolcs Nagy wrote:
> > it's scary how these crypto codes mix various
> > signed and unsigned integer types
> 
> Yes, there's room for improvement here.  I'd probably write this
> differently if I were writing it from scratch now.  Perhaps a cleanup is
> in order, similarly to what we did for the FreeSec code recently.
> Specifically, we could have a higher-level wrapper function accept char *
> like crypt() is defined to accept, but pass these to a function accepting
> unsigned char * and go with that from this point.

I actually like this idea a lot. It should go a long way to clarify
the code and make it obvious to readers that no further signed-char
sign extension bugs exist.

> > > typedef unsigned int BF_word;
> > > typedef signed int BF_word_signed;
> > 
> > i don't like these typedefs
> > it seems the code assumes 32bit unsigned BF_word
> 
> Yes.  As you suggested while we were discussed FreeSec, in musl's
> revision of the code we should use <stdint.h> types here.  I made those
> changes to that code, but not yet to crypt_blowfish code - I think Rich
> will do that.

Yes, I think this change should be made, for clarity if nothing else.
It's non-obvious reading the code that BF_word is intended to be a
fixed-size 32-bit type (i.e. that arithmetic on it is intended to be
modulo 2^32).

> > tmp1 = ctx->S[3][L & 0xff];
> > tmp2 = ctx->S[2][L>>8 & 0xff];
> > tmp3 = ctx->S[1][L>>16 & 0xff];
> > tmp4 = ctx->S[0][L>>24 & 0xff];
> > R ^= ctx->P[N+1];
> > R ^= ((tmp3 + tmp4) ^ tmp2) + tmp1;
> 
> I think this rewrite is likely to cause a slowdown with some compilers.

When we're done with everything else, I'd like to see a comparison
with my version of the code (no explicit scheduling, straightforward
expression with no temps) versus the original on modern gcc (and
possibly clang). I would expect them to perform identically, in which
case I'd choose code clarity. I like C that reads like C rather than
reading like decompiled asm.. :-)

> > > 	do {
> > > 		ptr += 2;
> > > 		L ^= ctx->s.P[0];
> > > 		BF_ROUND(L, R, 0);
> > > 		BF_ROUND(R, L, 1);
> > > 		BF_ROUND(L, R, 2);
> > > 		BF_ROUND(R, L, 3);
> > > 		BF_ROUND(L, R, 4);
> > > 		BF_ROUND(R, L, 5);
> > > 		BF_ROUND(L, R, 6);
> > > 		BF_ROUND(R, L, 7);
> > > 		BF_ROUND(L, R, 8);
> > > 		BF_ROUND(R, L, 9);
> > > 		BF_ROUND(L, R, 10);
> > > 		BF_ROUND(R, L, 11);
> > > 		BF_ROUND(L, R, 12);
> > > 		BF_ROUND(R, L, 13);
> > > 		BF_ROUND(L, R, 14);
> > > 		BF_ROUND(R, L, 15);
> > > 		tmp4 = R;
> > > 		R = L;
> > > 		L = tmp4 ^ ctx->s.P[BF_N + 1];
> > > 		*(ptr - 1) = R;
> > > 		*(ptr - 2) = L;
> > > 	} while (ptr < end);
> > 
> > why increase ptr at the begining?
> 
> To hide the latency.  Indeed, a smart compiler should do this on its
> own, but not all do it well.

I suspect your version will generate worse code (both larger and
slower) on cpus like arm that can increment the index/pointer register
and dereference it at the same time.

> > > 			tmp[1] |= (BF_word_signed)(signed char)*ptr; /* bug */
> > 
> > i don't think the BF_word_signed cast helps here
> > signed char is promoted to int and then
> > it will be converted to BF_word
> 
> Yes, I just like it explicit to show what exactly is happening, because
> it matters a lot.

The cast is not the same as what's happening implicitly.

XXX

> 
> > > 	if (setting[0] != '$' ||
> > > 	    setting[1] != '2' ||
> > > 	    setting[2] < 'a' || setting[2] > 'z' ||
> > > 	    !flags_by_subtype[(unsigned int)(unsigned char)setting[2] - 'a'] ||
> > 
> > setting[2] is already range checked so
> > the casts are not necessary
> > (assuming 'a' < 'z')
> 
> Casting chars to (unsigned int)(unsigned char) for array indices is an
> idiom for me.  Now, we have a promotion to (int) due to "- 'a'", but if
> this is ever compiled with a C++ compiler, the 'a' constant would be of
> type char rather than int, IIRC.  So better safe than sorry.

No, all arithmetic and logical operators are subject to default
promotions in both C and C++. 'b'-'a' has type int in both languages
despite the fact that 'a' and 'b' have type char in C++ (and int in
C). The only possible function the (unsigned int) cast can have is
forcing promotion of another value in the expressione from int to
unsigned int.

> Yes, the range checking should have helped in this case, but I recall
> seeing weird behavior with in-range (implies non-negative) char
> subscripts on Alpha in 1990s before I started using this idiom.  IIRC,
> instructions were generated such that high bytes of the array index
> register were not cleared before the lookup.  Compiler bug?  But it was
> seen with different compilers.

I suspect you're thinking of something different. There's definitely
no way char types could be usable at all if the high bytes of
registers loaded from chars were full of random junk. This compiler
bug sounds too serious to be plausible.

> > i'd use for (;;) for infinite loops
> 
> I saw Rich do that, but I kept the infinite loops in this file I posted
> consistent with those I use elsewhere for now.  Rich can musl'ify this
> code in this way if desired.

I just did it when removing the while condition from do/while loops,
not gratuitously changing existing infinite loops. I didn't even think
of the possibility of leaving it as do/while(1) because I rarely/never
use that idiom for infinite loops. I don't care which is used though.

> > for (count = 0; count < 64; count++)
> > 
> > i assume it will be optimized to the same thing
> > (count is not used later)
> 
> I hope/expect that it will be optimized to the same thing, but I am not
> sure it'd happen with all relevant compilers all the time.  I like
> writing performance-critical code closer to the desired assembly code.

This is a simple strength reduction optimization and every gcc version
since 2.7.2 or so has done it, if I'm not mistaken. I don't care which
way it's written though; either is clear and idiomatic C.

> > if it does not fail then errno is unchanged i guess
> 
> Yes, this should be the case currently.  I just did not want to have a
> pitfall in there in case of future changes to BF_crypt() where errno
> could potentially be altered on a successful call.

In principle this could happen if arbitrary library functions are
being called, but in practice it shouldn't. Anyway EINVAL is the only
error we support here, so it could just be set just before return if
an error was encountered. Or if we return fake hashes that can never
match instead of null on error, there's no sense in ever setting
errno to begin with.

> > > 	memcpy(buf.s, test_setting, sizeof(buf.s));
> > 
> > sizeof buf.s
> > without ()
> 
> I am consistently using sizeof() with the braces.  Maybe musl's coding
> style is different.  If so, this may be changed indeed.

I don't care if it's inconsistent in code that was written outside of
musl, but the idiom I use is to only include parentheses for types.
Aside from my dislike for gratuitous parens, this is to make it clear
that we're taking the size of an object rather than the size of a
type.

> > > 	    test_hash[(unsigned int)(unsigned char)buf.s[2] & 1],
> > 
> > the extra (unsigned int) cast is unnecessary
> 
> Yes, we could break the idiom and rely on the promotion to int here.
> (Without such promotion or cast, IIRC, things would sometimes fail on
> Alpha.)

I smell cargo culting...

Rich

next prev parent reply	other threads:[~2012-08-09 18:22 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-21 15:23 Łukasz Sowa
2012-07-21 17:11 ` Solar Designer
2012-07-21 20:17   ` Rich Felker
2012-07-22 16:23   ` Łukasz Sowa
2012-07-25  7:57 ` Rich Felker
2012-08-08  2:24 ` Rich Felker
2012-08-08  4:42   ` Solar Designer
2012-08-08  5:28     ` Rich Felker
2012-08-08  6:27       ` Solar Designer
2012-08-08  7:03         ` Daniel Cegiełka
2012-08-08  7:24           ` Solar Designer
2012-08-08  7:42             ` Daniel Cegiełka
2012-08-08 21:48           ` Rich Felker
2012-08-08 23:08             ` Isaac Dunham
2012-08-08 23:24               ` John Spencer
2012-08-09  1:03                 ` Isaac Dunham
2012-08-09  3:16               ` Rich Felker
2012-08-09  3:36             ` Solar Designer
2012-08-09  7:13               ` orc
2012-08-09  7:28                 ` Rich Felker
2012-08-09  7:29               ` Solar Designer
2012-08-09 10:53                 ` Solar Designer
2012-08-09 11:58                   ` Szabolcs Nagy
2012-08-09 16:43                     ` Solar Designer
2012-08-09 17:30                       ` Szabolcs Nagy
2012-08-09 18:22                       ` Rich Felker [this message]
2012-08-09 23:21                     ` Rich Felker
2012-08-10 17:04                       ` Solar Designer
2012-08-10 18:06                         ` Rich Felker
2012-08-09 21:46                   ` crypt_blowfish integration, optimization Rich Felker
2012-08-09 22:21                     ` Solar Designer
2012-08-09 22:32                       ` Rich Felker
2012-08-10 17:18                         ` Solar Designer
2012-08-10 18:08                           ` Rich Felker
2012-08-10 22:52                             ` Solar Designer
2012-08-08  7:52     ` crypt* files in crypt directory Szabolcs Nagy
2012-08-08 13:06       ` Rich Felker
2012-08-08 14:30         ` orc
2012-08-08 14:53           ` Szabolcs Nagy
2012-08-08 15:05             ` orc
2012-08-08 18:10         ` Rich Felker
2012-08-09  1:51         ` Solar Designer
2012-08-09  3:25           ` Rich Felker
2012-08-09  4:04             ` Solar Designer
2012-08-09  5:48               ` Rich Felker
2012-08-09 15:52                 ` Solar Designer
2012-08-09 17:59                   ` Rich Felker
2012-08-09 21:17                   ` Rich Felker
2012-08-09 21:44                     ` Solar Designer
2012-08-09 22:08                       ` Rich Felker
2012-08-09 23:33           ` Rich Felker
2012-08-09  6:03   ` Rich Felker
  -- strict thread matches above, loose matches on Subject: below --
2012-07-17  9:40 Daniel Cegiełka
2012-07-17 17:51 ` Rich Felker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120809182204.GS27715@brightrain.aerifal.cx \
    --to=dalias@aerifal.cx \
    --cc=musl@lists.openwall.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).