mailing list of musl libc
 help / color / mirror / code / Atom feed
* Re: crypt* files in crypt directory
@ 2012-07-21 15:23 Łukasz Sowa
  2012-07-21 17:11 ` Solar Designer
                   ` (2 more replies)
  0 siblings, 3 replies; 52+ messages in thread
From: Łukasz Sowa @ 2012-07-21 15:23 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1958 bytes --]

Hi,

My first mail here so sorry for any faux pas :)

>On Tue, Jul 17, 2012 at 11:40:28AM +0200, Daniel Cegiełka wrote:
>> Hi,
>> 
>> I have a proposal to move the crypt* files from misc directory to the
>> new crypt directory.
>
>I'm definitely open to this; the misc directory is presently a mess
>and eventually I want to eliminate it completely. If we do go ahead
>with moving it, it might make sense to eliminate the crypt_ prefix
>from the filenames (i.e. just des.c instead of crypt_des.c) to
>eliminate redundancy.

>> I want to port to musl crypt_blowfish patch
>> (http://www.openwall.com/crypt/) and keeping everything in one
>> directory would be more readable.

>I would be interested in getting at least this, md5, and sha hash
>support added, as long as they can be made compact. 3-5k each would be
>ideal but I realize that might be difficult. For inclusion in musl,
>they should have no non-const static data (i.e. no global state) and
>also avoid using excessive stack space for temporary tables.
>
>Rich

Together with Daniel we've prepared initial patch for inclusion crypt_blowfish
and crypt_gensalt in musl. Things to change were rather cosmetic. Both blowfish
and gensalt implementations don't use global state, only const statics (which
needed simply 'const' to meet musl standard). Besides that code looks quite
clean & nice.

However, there are some consts arrays used inside functions which may clutter
stack like flags_by_subtype from BF_crypt(), test_key, test_setting, test_hash
from _crypt_blowfish_rn(). I think that they can be pulled up to global static
consts but we haven't done that yet. What do you think about this?

By the way, we had some problems with compiling it to libcrypt.a - it seems
empty, don't know why. The code is being compiled into libc. Anyway Daniel
successfully tested it with
gcc-musl -DTEST -D_GNU_SOURCE wrapper.c -o musl_blowfish_test

Looking forward to your feedback.

Best Regards,
Łukasz Sowa

[-- Attachment #2: Type: TEXT/PLAIN, Size: 39500 bytes --]

diff --git a/include/crypt.h b/include/crypt.h
index 07de216..2d0e22a 100644
--- a/include/crypt.h
+++ b/include/crypt.h
@@ -13,6 +13,19 @@ struct crypt_data {
 char *crypt(const char *, const char *);
 char *crypt_r(const char *, const char *, struct crypt_data *);
 
+char *_crypt_gensalt_traditional_rn(const char *prefix, unsigned long count,
+	const char *input, int size, char *output, int output_size);
+char *_crypt_gensalt_extended_rn(const char *prefix, unsigned long count,
+	const char *input, int size, char *output, int output_size);
+char *_crypt_gensalt_md5_rn(const char *prefix, unsigned long count,
+	const char *input, int size, char *output, int output_size);
+
+int _crypt_output_magic(const char *setting, char *output, int size);
+char *_crypt_blowfish_rn(const char *key, const char *setting,
+	char *output, int size);
+char *_crypt_gensalt_blowfish_rn(const char *prefix, unsigned long count,
+	const char *input, int size, char *output, int output_size);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/src/misc/crypt_blowfish.c b/src/misc/crypt_blowfish.c
new file mode 100644
index 0000000..5ef9276
--- /dev/null
+++ b/src/misc/crypt_blowfish.c
@@ -0,0 +1,902 @@
+/*
+ * The crypt_blowfish homepage is:
+ *
+ *	http://www.openwall.com/crypt/
+ *
+ * This code comes from John the Ripper password cracker, with reentrant
+ * and crypt(3) interfaces added, but optimizations specific to password
+ * cracking removed.
+ *
+ * Written by Solar Designer <solar at openwall.com> in 1998-2011.
+ * No copyright is claimed, and the software is hereby placed in the public
+ * domain.  In case this attempt to disclaim copyright and place the software
+ * in the public domain is deemed null and void, then the software is
+ * Copyright (c) 1998-2011 Solar Designer and it is hereby released to the
+ * general public under the following terms:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted.
+ *
+ * There's ABSOLUTELY NO WARRANTY, express or implied.
+ *
+ * It is my intent that you should be able to use this on your system,
+ * as part of a software package, or anywhere else to improve security,
+ * ensure compatibility, or for any other purpose.  I would appreciate
+ * it if you give credit where it is due and keep your modifications in
+ * the public domain as well, but I don't require that in order to let
+ * you place this code and any modifications you make under a license
+ * of your choice.
+ *
+ * This implementation is mostly compatible with OpenBSD's bcrypt.c (prefix
+ * "$2a$") by Niels Provos <provos at citi.umich.edu>, and uses some of his
+ * ideas.  The password hashing algorithm was designed by David Mazieres
+ * <dm at lcs.mit.edu>.  For more information on the level of compatibility,
+ * please refer to the comments in BF_set_key() below and to the included
+ * crypt(3) man page.
+ *
+ * There's a paper on the algorithm that explains its design decisions:
+ *
+ *	http://www.usenix.org/events/usenix99/provos.html
+ *
+ * Some of the tricks in BF_ROUND might be inspired by Eric Young's
+ * Blowfish library (I can't be sure if I would think of something if I
+ * hadn't seen his code).
+ */
+
+#include <string.h>
+
+#include <errno.h>
+#ifndef __set_errno
+#define __set_errno(val) errno = (val)
+#endif
+
+/* Just to make sure the prototypes match the actual definitions */
+#include <crypt.h>
+
+#ifdef __i386__
+#define BF_ASM				1
+#define BF_SCALE			1
+#elif defined(__x86_64__) || defined(__alpha__) || defined(__hppa__)
+#define BF_ASM				0
+#define BF_SCALE			1
+#else
+#define BF_ASM				0
+#define BF_SCALE			0
+#endif
+
+typedef unsigned int BF_word;
+typedef signed int BF_word_signed;
+
+/* Number of Blowfish rounds, this is also hardcoded into a few places */
+#define BF_N				16
+
+typedef BF_word BF_key[BF_N + 2];
+
+typedef struct {
+	BF_word S[4][0x100];
+	BF_key P;
+} BF_ctx;
+
+/*
+ * Magic IV for 64 Blowfish encryptions that we do at the end.
+ * The string is "OrpheanBeholderScryDoubt" on big-endian.
+ */
+static const BF_word BF_magic_w[6] = {
+	0x4F727068, 0x65616E42, 0x65686F6C,
+	0x64657253, 0x63727944, 0x6F756274
+};
+
+/*
+ * P-box and S-box tables initialized with digits of Pi.
+ */
+static const BF_ctx BF_init_state = {
+	{
+		{
+			0xd1310ba6, 0x98dfb5ac, 0x2ffd72db, 0xd01adfb7,
+			0xb8e1afed, 0x6a267e96, 0xba7c9045, 0xf12c7f99,
+			0x24a19947, 0xb3916cf7, 0x0801f2e2, 0x858efc16,
+			0x636920d8, 0x71574e69, 0xa458fea3, 0xf4933d7e,
+			0x0d95748f, 0x728eb658, 0x718bcd58, 0x82154aee,
+			0x7b54a41d, 0xc25a59b5, 0x9c30d539, 0x2af26013,
+			0xc5d1b023, 0x286085f0, 0xca417918, 0xb8db38ef,
+			0x8e79dcb0, 0x603a180e, 0x6c9e0e8b, 0xb01e8a3e,
+			0xd71577c1, 0xbd314b27, 0x78af2fda, 0x55605c60,
+			0xe65525f3, 0xaa55ab94, 0x57489862, 0x63e81440,
+			0x55ca396a, 0x2aab10b6, 0xb4cc5c34, 0x1141e8ce,
+			0xa15486af, 0x7c72e993, 0xb3ee1411, 0x636fbc2a,
+			0x2ba9c55d, 0x741831f6, 0xce5c3e16, 0x9b87931e,
+			0xafd6ba33, 0x6c24cf5c, 0x7a325381, 0x28958677,
+			0x3b8f4898, 0x6b4bb9af, 0xc4bfe81b, 0x66282193,
+			0x61d809cc, 0xfb21a991, 0x487cac60, 0x5dec8032,
+			0xef845d5d, 0xe98575b1, 0xdc262302, 0xeb651b88,
+			0x23893e81, 0xd396acc5, 0x0f6d6ff3, 0x83f44239,
+			0x2e0b4482, 0xa4842004, 0x69c8f04a, 0x9e1f9b5e,
+			0x21c66842, 0xf6e96c9a, 0x670c9c61, 0xabd388f0,
+			0x6a51a0d2, 0xd8542f68, 0x960fa728, 0xab5133a3,
+			0x6eef0b6c, 0x137a3be4, 0xba3bf050, 0x7efb2a98,
+			0xa1f1651d, 0x39af0176, 0x66ca593e, 0x82430e88,
+			0x8cee8619, 0x456f9fb4, 0x7d84a5c3, 0x3b8b5ebe,
+			0xe06f75d8, 0x85c12073, 0x401a449f, 0x56c16aa6,
+			0x4ed3aa62, 0x363f7706, 0x1bfedf72, 0x429b023d,
+			0x37d0d724, 0xd00a1248, 0xdb0fead3, 0x49f1c09b,
+			0x075372c9, 0x80991b7b, 0x25d479d8, 0xf6e8def7,
+			0xe3fe501a, 0xb6794c3b, 0x976ce0bd, 0x04c006ba,
+			0xc1a94fb6, 0x409f60c4, 0x5e5c9ec2, 0x196a2463,
+			0x68fb6faf, 0x3e6c53b5, 0x1339b2eb, 0x3b52ec6f,
+			0x6dfc511f, 0x9b30952c, 0xcc814544, 0xaf5ebd09,
+			0xbee3d004, 0xde334afd, 0x660f2807, 0x192e4bb3,
+			0xc0cba857, 0x45c8740f, 0xd20b5f39, 0xb9d3fbdb,
+			0x5579c0bd, 0x1a60320a, 0xd6a100c6, 0x402c7279,
+			0x679f25fe, 0xfb1fa3cc, 0x8ea5e9f8, 0xdb3222f8,
+			0x3c7516df, 0xfd616b15, 0x2f501ec8, 0xad0552ab,
+			0x323db5fa, 0xfd238760, 0x53317b48, 0x3e00df82,
+			0x9e5c57bb, 0xca6f8ca0, 0x1a87562e, 0xdf1769db,
+			0xd542a8f6, 0x287effc3, 0xac6732c6, 0x8c4f5573,
+			0x695b27b0, 0xbbca58c8, 0xe1ffa35d, 0xb8f011a0,
+			0x10fa3d98, 0xfd2183b8, 0x4afcb56c, 0x2dd1d35b,
+			0x9a53e479, 0xb6f84565, 0xd28e49bc, 0x4bfb9790,
+			0xe1ddf2da, 0xa4cb7e33, 0x62fb1341, 0xcee4c6e8,
+			0xef20cada, 0x36774c01, 0xd07e9efe, 0x2bf11fb4,
+			0x95dbda4d, 0xae909198, 0xeaad8e71, 0x6b93d5a0,
+			0xd08ed1d0, 0xafc725e0, 0x8e3c5b2f, 0x8e7594b7,
+			0x8ff6e2fb, 0xf2122b64, 0x8888b812, 0x900df01c,
+			0x4fad5ea0, 0x688fc31c, 0xd1cff191, 0xb3a8c1ad,
+			0x2f2f2218, 0xbe0e1777, 0xea752dfe, 0x8b021fa1,
+			0xe5a0cc0f, 0xb56f74e8, 0x18acf3d6, 0xce89e299,
+			0xb4a84fe0, 0xfd13e0b7, 0x7cc43b81, 0xd2ada8d9,
+			0x165fa266, 0x80957705, 0x93cc7314, 0x211a1477,
+			0xe6ad2065, 0x77b5fa86, 0xc75442f5, 0xfb9d35cf,
+			0xebcdaf0c, 0x7b3e89a0, 0xd6411bd3, 0xae1e7e49,
+			0x00250e2d, 0x2071b35e, 0x226800bb, 0x57b8e0af,
+			0x2464369b, 0xf009b91e, 0x5563911d, 0x59dfa6aa,
+			0x78c14389, 0xd95a537f, 0x207d5ba2, 0x02e5b9c5,
+			0x83260376, 0x6295cfa9, 0x11c81968, 0x4e734a41,
+			0xb3472dca, 0x7b14a94a, 0x1b510052, 0x9a532915,
+			0xd60f573f, 0xbc9bc6e4, 0x2b60a476, 0x81e67400,
+			0x08ba6fb5, 0x571be91f, 0xf296ec6b, 0x2a0dd915,
+			0xb6636521, 0xe7b9f9b6, 0xff34052e, 0xc5855664,
+			0x53b02d5d, 0xa99f8fa1, 0x08ba4799, 0x6e85076a
+		}, {
+			0x4b7a70e9, 0xb5b32944, 0xdb75092e, 0xc4192623,
+			0xad6ea6b0, 0x49a7df7d, 0x9cee60b8, 0x8fedb266,
+			0xecaa8c71, 0x699a17ff, 0x5664526c, 0xc2b19ee1,
+			0x193602a5, 0x75094c29, 0xa0591340, 0xe4183a3e,
+			0x3f54989a, 0x5b429d65, 0x6b8fe4d6, 0x99f73fd6,
+			0xa1d29c07, 0xefe830f5, 0x4d2d38e6, 0xf0255dc1,
+			0x4cdd2086, 0x8470eb26, 0x6382e9c6, 0x021ecc5e,
+			0x09686b3f, 0x3ebaefc9, 0x3c971814, 0x6b6a70a1,
+			0x687f3584, 0x52a0e286, 0xb79c5305, 0xaa500737,
+			0x3e07841c, 0x7fdeae5c, 0x8e7d44ec, 0x5716f2b8,
+			0xb03ada37, 0xf0500c0d, 0xf01c1f04, 0x0200b3ff,
+			0xae0cf51a, 0x3cb574b2, 0x25837a58, 0xdc0921bd,
+			0xd19113f9, 0x7ca92ff6, 0x94324773, 0x22f54701,
+			0x3ae5e581, 0x37c2dadc, 0xc8b57634, 0x9af3dda7,
+			0xa9446146, 0x0fd0030e, 0xecc8c73e, 0xa4751e41,
+			0xe238cd99, 0x3bea0e2f, 0x3280bba1, 0x183eb331,
+			0x4e548b38, 0x4f6db908, 0x6f420d03, 0xf60a04bf,
+			0x2cb81290, 0x24977c79, 0x5679b072, 0xbcaf89af,
+			0xde9a771f, 0xd9930810, 0xb38bae12, 0xdccf3f2e,
+			0x5512721f, 0x2e6b7124, 0x501adde6, 0x9f84cd87,
+			0x7a584718, 0x7408da17, 0xbc9f9abc, 0xe94b7d8c,
+			0xec7aec3a, 0xdb851dfa, 0x63094366, 0xc464c3d2,
+			0xef1c1847, 0x3215d908, 0xdd433b37, 0x24c2ba16,
+			0x12a14d43, 0x2a65c451, 0x50940002, 0x133ae4dd,
+			0x71dff89e, 0x10314e55, 0x81ac77d6, 0x5f11199b,
+			0x043556f1, 0xd7a3c76b, 0x3c11183b, 0x5924a509,
+			0xf28fe6ed, 0x97f1fbfa, 0x9ebabf2c, 0x1e153c6e,
+			0x86e34570, 0xeae96fb1, 0x860e5e0a, 0x5a3e2ab3,
+			0x771fe71c, 0x4e3d06fa, 0x2965dcb9, 0x99e71d0f,
+			0x803e89d6, 0x5266c825, 0x2e4cc978, 0x9c10b36a,
+			0xc6150eba, 0x94e2ea78, 0xa5fc3c53, 0x1e0a2df4,
+			0xf2f74ea7, 0x361d2b3d, 0x1939260f, 0x19c27960,
+			0x5223a708, 0xf71312b6, 0xebadfe6e, 0xeac31f66,
+			0xe3bc4595, 0xa67bc883, 0xb17f37d1, 0x018cff28,
+			0xc332ddef, 0xbe6c5aa5, 0x65582185, 0x68ab9802,
+			0xeecea50f, 0xdb2f953b, 0x2aef7dad, 0x5b6e2f84,
+			0x1521b628, 0x29076170, 0xecdd4775, 0x619f1510,
+			0x13cca830, 0xeb61bd96, 0x0334fe1e, 0xaa0363cf,
+			0xb5735c90, 0x4c70a239, 0xd59e9e0b, 0xcbaade14,
+			0xeecc86bc, 0x60622ca7, 0x9cab5cab, 0xb2f3846e,
+			0x648b1eaf, 0x19bdf0ca, 0xa02369b9, 0x655abb50,
+			0x40685a32, 0x3c2ab4b3, 0x319ee9d5, 0xc021b8f7,
+			0x9b540b19, 0x875fa099, 0x95f7997e, 0x623d7da8,
+			0xf837889a, 0x97e32d77, 0x11ed935f, 0x16681281,
+			0x0e358829, 0xc7e61fd6, 0x96dedfa1, 0x7858ba99,
+			0x57f584a5, 0x1b227263, 0x9b83c3ff, 0x1ac24696,
+			0xcdb30aeb, 0x532e3054, 0x8fd948e4, 0x6dbc3128,
+			0x58ebf2ef, 0x34c6ffea, 0xfe28ed61, 0xee7c3c73,
+			0x5d4a14d9, 0xe864b7e3, 0x42105d14, 0x203e13e0,
+			0x45eee2b6, 0xa3aaabea, 0xdb6c4f15, 0xfacb4fd0,
+			0xc742f442, 0xef6abbb5, 0x654f3b1d, 0x41cd2105,
+			0xd81e799e, 0x86854dc7, 0xe44b476a, 0x3d816250,
+			0xcf62a1f2, 0x5b8d2646, 0xfc8883a0, 0xc1c7b6a3,
+			0x7f1524c3, 0x69cb7492, 0x47848a0b, 0x5692b285,
+			0x095bbf00, 0xad19489d, 0x1462b174, 0x23820e00,
+			0x58428d2a, 0x0c55f5ea, 0x1dadf43e, 0x233f7061,
+			0x3372f092, 0x8d937e41, 0xd65fecf1, 0x6c223bdb,
+			0x7cde3759, 0xcbee7460, 0x4085f2a7, 0xce77326e,
+			0xa6078084, 0x19f8509e, 0xe8efd855, 0x61d99735,
+			0xa969a7aa, 0xc50c06c2, 0x5a04abfc, 0x800bcadc,
+			0x9e447a2e, 0xc3453484, 0xfdd56705, 0x0e1e9ec9,
+			0xdb73dbd3, 0x105588cd, 0x675fda79, 0xe3674340,
+			0xc5c43465, 0x713e38d8, 0x3d28f89e, 0xf16dff20,
+			0x153e21e7, 0x8fb03d4a, 0xe6e39f2b, 0xdb83adf7
+		}, {
+			0xe93d5a68, 0x948140f7, 0xf64c261c, 0x94692934,
+			0x411520f7, 0x7602d4f7, 0xbcf46b2e, 0xd4a20068,
+			0xd4082471, 0x3320f46a, 0x43b7d4b7, 0x500061af,
+			0x1e39f62e, 0x97244546, 0x14214f74, 0xbf8b8840,
+			0x4d95fc1d, 0x96b591af, 0x70f4ddd3, 0x66a02f45,
+			0xbfbc09ec, 0x03bd9785, 0x7fac6dd0, 0x31cb8504,
+			0x96eb27b3, 0x55fd3941, 0xda2547e6, 0xabca0a9a,
+			0x28507825, 0x530429f4, 0x0a2c86da, 0xe9b66dfb,
+			0x68dc1462, 0xd7486900, 0x680ec0a4, 0x27a18dee,
+			0x4f3ffea2, 0xe887ad8c, 0xb58ce006, 0x7af4d6b6,
+			0xaace1e7c, 0xd3375fec, 0xce78a399, 0x406b2a42,
+			0x20fe9e35, 0xd9f385b9, 0xee39d7ab, 0x3b124e8b,
+			0x1dc9faf7, 0x4b6d1856, 0x26a36631, 0xeae397b2,
+			0x3a6efa74, 0xdd5b4332, 0x6841e7f7, 0xca7820fb,
+			0xfb0af54e, 0xd8feb397, 0x454056ac, 0xba489527,
+			0x55533a3a, 0x20838d87, 0xfe6ba9b7, 0xd096954b,
+			0x55a867bc, 0xa1159a58, 0xcca92963, 0x99e1db33,
+			0xa62a4a56, 0x3f3125f9, 0x5ef47e1c, 0x9029317c,
+			0xfdf8e802, 0x04272f70, 0x80bb155c, 0x05282ce3,
+			0x95c11548, 0xe4c66d22, 0x48c1133f, 0xc70f86dc,
+			0x07f9c9ee, 0x41041f0f, 0x404779a4, 0x5d886e17,
+			0x325f51eb, 0xd59bc0d1, 0xf2bcc18f, 0x41113564,
+			0x257b7834, 0x602a9c60, 0xdff8e8a3, 0x1f636c1b,
+			0x0e12b4c2, 0x02e1329e, 0xaf664fd1, 0xcad18115,
+			0x6b2395e0, 0x333e92e1, 0x3b240b62, 0xeebeb922,
+			0x85b2a20e, 0xe6ba0d99, 0xde720c8c, 0x2da2f728,
+			0xd0127845, 0x95b794fd, 0x647d0862, 0xe7ccf5f0,
+			0x5449a36f, 0x877d48fa, 0xc39dfd27, 0xf33e8d1e,
+			0x0a476341, 0x992eff74, 0x3a6f6eab, 0xf4f8fd37,
+			0xa812dc60, 0xa1ebddf8, 0x991be14c, 0xdb6e6b0d,
+			0xc67b5510, 0x6d672c37, 0x2765d43b, 0xdcd0e804,
+			0xf1290dc7, 0xcc00ffa3, 0xb5390f92, 0x690fed0b,
+			0x667b9ffb, 0xcedb7d9c, 0xa091cf0b, 0xd9155ea3,
+			0xbb132f88, 0x515bad24, 0x7b9479bf, 0x763bd6eb,
+			0x37392eb3, 0xcc115979, 0x8026e297, 0xf42e312d,
+			0x6842ada7, 0xc66a2b3b, 0x12754ccc, 0x782ef11c,
+			0x6a124237, 0xb79251e7, 0x06a1bbe6, 0x4bfb6350,
+			0x1a6b1018, 0x11caedfa, 0x3d25bdd8, 0xe2e1c3c9,
+			0x44421659, 0x0a121386, 0xd90cec6e, 0xd5abea2a,
+			0x64af674e, 0xda86a85f, 0xbebfe988, 0x64e4c3fe,
+			0x9dbc8057, 0xf0f7c086, 0x60787bf8, 0x6003604d,
+			0xd1fd8346, 0xf6381fb0, 0x7745ae04, 0xd736fccc,
+			0x83426b33, 0xf01eab71, 0xb0804187, 0x3c005e5f,
+			0x77a057be, 0xbde8ae24, 0x55464299, 0xbf582e61,
+			0x4e58f48f, 0xf2ddfda2, 0xf474ef38, 0x8789bdc2,
+			0x5366f9c3, 0xc8b38e74, 0xb475f255, 0x46fcd9b9,
+			0x7aeb2661, 0x8b1ddf84, 0x846a0e79, 0x915f95e2,
+			0x466e598e, 0x20b45770, 0x8cd55591, 0xc902de4c,
+			0xb90bace1, 0xbb8205d0, 0x11a86248, 0x7574a99e,
+			0xb77f19b6, 0xe0a9dc09, 0x662d09a1, 0xc4324633,
+			0xe85a1f02, 0x09f0be8c, 0x4a99a025, 0x1d6efe10,
+			0x1ab93d1d, 0x0ba5a4df, 0xa186f20f, 0x2868f169,
+			0xdcb7da83, 0x573906fe, 0xa1e2ce9b, 0x4fcd7f52,
+			0x50115e01, 0xa70683fa, 0xa002b5c4, 0x0de6d027,
+			0x9af88c27, 0x773f8641, 0xc3604c06, 0x61a806b5,
+			0xf0177a28, 0xc0f586e0, 0x006058aa, 0x30dc7d62,
+			0x11e69ed7, 0x2338ea63, 0x53c2dd94, 0xc2c21634,
+			0xbbcbee56, 0x90bcb6de, 0xebfc7da1, 0xce591d76,
+			0x6f05e409, 0x4b7c0188, 0x39720a3d, 0x7c927c24,
+			0x86e3725f, 0x724d9db9, 0x1ac15bb4, 0xd39eb8fc,
+			0xed545578, 0x08fca5b5, 0xd83d7cd3, 0x4dad0fc4,
+			0x1e50ef5e, 0xb161e6f8, 0xa28514d9, 0x6c51133c,
+			0x6fd5c7e7, 0x56e14ec4, 0x362abfce, 0xddc6c837,
+			0xd79a3234, 0x92638212, 0x670efa8e, 0x406000e0
+		}, {
+			0x3a39ce37, 0xd3faf5cf, 0xabc27737, 0x5ac52d1b,
+			0x5cb0679e, 0x4fa33742, 0xd3822740, 0x99bc9bbe,
+			0xd5118e9d, 0xbf0f7315, 0xd62d1c7e, 0xc700c47b,
+			0xb78c1b6b, 0x21a19045, 0xb26eb1be, 0x6a366eb4,
+			0x5748ab2f, 0xbc946e79, 0xc6a376d2, 0x6549c2c8,
+			0x530ff8ee, 0x468dde7d, 0xd5730a1d, 0x4cd04dc6,
+			0x2939bbdb, 0xa9ba4650, 0xac9526e8, 0xbe5ee304,
+			0xa1fad5f0, 0x6a2d519a, 0x63ef8ce2, 0x9a86ee22,
+			0xc089c2b8, 0x43242ef6, 0xa51e03aa, 0x9cf2d0a4,
+			0x83c061ba, 0x9be96a4d, 0x8fe51550, 0xba645bd6,
+			0x2826a2f9, 0xa73a3ae1, 0x4ba99586, 0xef5562e9,
+			0xc72fefd3, 0xf752f7da, 0x3f046f69, 0x77fa0a59,
+			0x80e4a915, 0x87b08601, 0x9b09e6ad, 0x3b3ee593,
+			0xe990fd5a, 0x9e34d797, 0x2cf0b7d9, 0x022b8b51,
+			0x96d5ac3a, 0x017da67d, 0xd1cf3ed6, 0x7c7d2d28,
+			0x1f9f25cf, 0xadf2b89b, 0x5ad6b472, 0x5a88f54c,
+			0xe029ac71, 0xe019a5e6, 0x47b0acfd, 0xed93fa9b,
+			0xe8d3c48d, 0x283b57cc, 0xf8d56629, 0x79132e28,
+			0x785f0191, 0xed756055, 0xf7960e44, 0xe3d35e8c,
+			0x15056dd4, 0x88f46dba, 0x03a16125, 0x0564f0bd,
+			0xc3eb9e15, 0x3c9057a2, 0x97271aec, 0xa93a072a,
+			0x1b3f6d9b, 0x1e6321f5, 0xf59c66fb, 0x26dcf319,
+			0x7533d928, 0xb155fdf5, 0x03563482, 0x8aba3cbb,
+			0x28517711, 0xc20ad9f8, 0xabcc5167, 0xccad925f,
+			0x4de81751, 0x3830dc8e, 0x379d5862, 0x9320f991,
+			0xea7a90c2, 0xfb3e7bce, 0x5121ce64, 0x774fbe32,
+			0xa8b6e37e, 0xc3293d46, 0x48de5369, 0x6413e680,
+			0xa2ae0810, 0xdd6db224, 0x69852dfd, 0x09072166,
+			0xb39a460a, 0x6445c0dd, 0x586cdecf, 0x1c20c8ae,
+			0x5bbef7dd, 0x1b588d40, 0xccd2017f, 0x6bb4e3bb,
+			0xdda26a7e, 0x3a59ff45, 0x3e350a44, 0xbcb4cdd5,
+			0x72eacea8, 0xfa6484bb, 0x8d6612ae, 0xbf3c6f47,
+			0xd29be463, 0x542f5d9e, 0xaec2771b, 0xf64e6370,
+			0x740e0d8d, 0xe75b1357, 0xf8721671, 0xaf537d5d,
+			0x4040cb08, 0x4eb4e2cc, 0x34d2466a, 0x0115af84,
+			0xe1b00428, 0x95983a1d, 0x06b89fb4, 0xce6ea048,
+			0x6f3f3b82, 0x3520ab82, 0x011a1d4b, 0x277227f8,
+			0x611560b1, 0xe7933fdc, 0xbb3a792b, 0x344525bd,
+			0xa08839e1, 0x51ce794b, 0x2f32c9b7, 0xa01fbac9,
+			0xe01cc87e, 0xbcc7d1f6, 0xcf0111c3, 0xa1e8aac7,
+			0x1a908749, 0xd44fbd9a, 0xd0dadecb, 0xd50ada38,
+			0x0339c32a, 0xc6913667, 0x8df9317c, 0xe0b12b4f,
+			0xf79e59b7, 0x43f5bb3a, 0xf2d519ff, 0x27d9459c,
+			0xbf97222c, 0x15e6fc2a, 0x0f91fc71, 0x9b941525,
+			0xfae59361, 0xceb69ceb, 0xc2a86459, 0x12baa8d1,
+			0xb6c1075e, 0xe3056a0c, 0x10d25065, 0xcb03a442,
+			0xe0ec6e0e, 0x1698db3b, 0x4c98a0be, 0x3278e964,
+			0x9f1f9532, 0xe0d392df, 0xd3a0342b, 0x8971f21e,
+			0x1b0a7441, 0x4ba3348c, 0xc5be7120, 0xc37632d8,
+			0xdf359f8d, 0x9b992f2e, 0xe60b6f47, 0x0fe3f11d,
+			0xe54cda54, 0x1edad891, 0xce6279cf, 0xcd3e7e6f,
+			0x1618b166, 0xfd2c1d05, 0x848fd2c5, 0xf6fb2299,
+			0xf523f357, 0xa6327623, 0x93a83531, 0x56cccd02,
+			0xacf08162, 0x5a75ebb5, 0x6e163697, 0x88d273cc,
+			0xde966292, 0x81b949d0, 0x4c50901b, 0x71c65614,
+			0xe6c6c7bd, 0x327a140a, 0x45e1d006, 0xc3f27b9a,
+			0xc9aa53fd, 0x62a80f00, 0xbb25bfe2, 0x35bdd2f6,
+			0x71126905, 0xb2040222, 0xb6cbcf7c, 0xcd769c2b,
+			0x53113ec0, 0x1640e3d3, 0x38abbd60, 0x2547adf0,
+			0xba38209c, 0xf746ce76, 0x77afa1c5, 0x20756060,
+			0x85cbfe4e, 0x8ae88dd8, 0x7aaaf9b0, 0x4cf9aa7e,
+			0x1948c25c, 0x02fb8a8c, 0x01c36ae4, 0xd6ebe1f9,
+			0x90d4f869, 0xa65cdea0, 0x3f09252d, 0xc208e69f,
+			0xb74e6132, 0xce77e25b, 0x578fdfe3, 0x3ac372e6
+		}
+	}, {
+		0x243f6a88, 0x85a308d3, 0x13198a2e, 0x03707344,
+		0xa4093822, 0x299f31d0, 0x082efa98, 0xec4e6c89,
+		0x452821e6, 0x38d01377, 0xbe5466cf, 0x34e90c6c,
+		0xc0ac29b7, 0xc97c50dd, 0x3f84d5b5, 0xb5470917,
+		0x9216d5d9, 0x8979fb1b
+	}
+};
+
+static const unsigned char BF_itoa64[64 + 1] =
+	"./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
+
+static const unsigned char BF_atoi64[0x60] = {
+	64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 0, 1,
+	54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 64, 64, 64, 64, 64,
+	64, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
+	17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 64, 64, 64, 64, 64,
+	64, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
+	43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 64, 64, 64, 64, 64
+};
+
+#define BF_safe_atoi64(dst, src) \
+{ \
+	tmp = (unsigned char)(src); \
+	if ((unsigned int)(tmp -= 0x20) >= 0x60) return -1; \
+	tmp = BF_atoi64[tmp]; \
+	if (tmp > 63) return -1; \
+	(dst) = tmp; \
+}
+
+static int BF_decode(BF_word *dst, const char *src, int size)
+{
+	unsigned char *dptr = (unsigned char *)dst;
+	unsigned char *end = dptr + size;
+	const unsigned char *sptr = (const unsigned char *)src;
+	unsigned int tmp, c1, c2, c3, c4;
+
+	do {
+		BF_safe_atoi64(c1, *sptr++);
+		BF_safe_atoi64(c2, *sptr++);
+		*dptr++ = (c1 << 2) | ((c2 & 0x30) >> 4);
+		if (dptr >= end) break;
+
+		BF_safe_atoi64(c3, *sptr++);
+		*dptr++ = ((c2 & 0x0F) << 4) | ((c3 & 0x3C) >> 2);
+		if (dptr >= end) break;
+
+		BF_safe_atoi64(c4, *sptr++);
+		*dptr++ = ((c3 & 0x03) << 6) | c4;
+	} while (dptr < end);
+
+	return 0;
+}
+
+static void BF_encode(char *dst, const BF_word *src, int size)
+{
+	const unsigned char *sptr = (const unsigned char *)src;
+	const unsigned char *end = sptr + size;
+	unsigned char *dptr = (unsigned char *)dst;
+	unsigned int c1, c2;
+
+	do {
+		c1 = *sptr++;
+		*dptr++ = BF_itoa64[c1 >> 2];
+		c1 = (c1 & 0x03) << 4;
+		if (sptr >= end) {
+			*dptr++ = BF_itoa64[c1];
+			break;
+		}
+
+		c2 = *sptr++;
+		c1 |= c2 >> 4;
+		*dptr++ = BF_itoa64[c1];
+		c1 = (c2 & 0x0f) << 2;
+		if (sptr >= end) {
+			*dptr++ = BF_itoa64[c1];
+			break;
+		}
+
+		c2 = *sptr++;
+		c1 |= c2 >> 6;
+		*dptr++ = BF_itoa64[c1];
+		*dptr++ = BF_itoa64[c2 & 0x3f];
+	} while (sptr < end);
+}
+
+static void BF_swap(BF_word *x, int count)
+{
+	static int endianness_check = 1;
+	char *is_little_endian = (char *)&endianness_check;
+	BF_word tmp;
+
+	if (*is_little_endian)
+	do {
+		tmp = *x;
+		tmp = (tmp << 16) | (tmp >> 16);
+		*x++ = ((tmp & 0x00FF00FF) << 8) | ((tmp >> 8) & 0x00FF00FF);
+	} while (--count);
+}
+
+#if BF_SCALE
+/* Architectures which can shift addresses left by 2 bits with no extra cost */
+#define BF_ROUND(L, R, N) \
+	tmp1 = L & 0xFF; \
+	tmp2 = L >> 8; \
+	tmp2 &= 0xFF; \
+	tmp3 = L >> 16; \
+	tmp3 &= 0xFF; \
+	tmp4 = L >> 24; \
+	tmp1 = data.ctx.S[3][tmp1]; \
+	tmp2 = data.ctx.S[2][tmp2]; \
+	tmp3 = data.ctx.S[1][tmp3]; \
+	tmp3 += data.ctx.S[0][tmp4]; \
+	tmp3 ^= tmp2; \
+	R ^= data.ctx.P[N + 1]; \
+	tmp3 += tmp1; \
+	R ^= tmp3;
+#else
+/* Architectures with no complicated addressing modes supported */
+#define BF_INDEX(S, i) \
+	(*((BF_word *)(((unsigned char *)S) + (i))))
+#define BF_ROUND(L, R, N) \
+	tmp1 = L & 0xFF; \
+	tmp1 <<= 2; \
+	tmp2 = L >> 6; \
+	tmp2 &= 0x3FC; \
+	tmp3 = L >> 14; \
+	tmp3 &= 0x3FC; \
+	tmp4 = L >> 22; \
+	tmp4 &= 0x3FC; \
+	tmp1 = BF_INDEX(data.ctx.S[3], tmp1); \
+	tmp2 = BF_INDEX(data.ctx.S[2], tmp2); \
+	tmp3 = BF_INDEX(data.ctx.S[1], tmp3); \
+	tmp3 += BF_INDEX(data.ctx.S[0], tmp4); \
+	tmp3 ^= tmp2; \
+	R ^= data.ctx.P[N + 1]; \
+	tmp3 += tmp1; \
+	R ^= tmp3;
+#endif
+
+/*
+ * Encrypt one block, BF_N is hardcoded here.
+ */
+#define BF_ENCRYPT \
+	L ^= data.ctx.P[0]; \
+	BF_ROUND(L, R, 0); \
+	BF_ROUND(R, L, 1); \
+	BF_ROUND(L, R, 2); \
+	BF_ROUND(R, L, 3); \
+	BF_ROUND(L, R, 4); \
+	BF_ROUND(R, L, 5); \
+	BF_ROUND(L, R, 6); \
+	BF_ROUND(R, L, 7); \
+	BF_ROUND(L, R, 8); \
+	BF_ROUND(R, L, 9); \
+	BF_ROUND(L, R, 10); \
+	BF_ROUND(R, L, 11); \
+	BF_ROUND(L, R, 12); \
+	BF_ROUND(R, L, 13); \
+	BF_ROUND(L, R, 14); \
+	BF_ROUND(R, L, 15); \
+	tmp4 = R; \
+	R = L; \
+	L = tmp4 ^ data.ctx.P[BF_N + 1];
+
+#if BF_ASM
+#define BF_body() \
+	_BF_body_r(&data.ctx);
+#else
+#define BF_body() \
+	L = R = 0; \
+	ptr = data.ctx.P; \
+	do { \
+		ptr += 2; \
+		BF_ENCRYPT; \
+		*(ptr - 2) = L; \
+		*(ptr - 1) = R; \
+	} while (ptr < &data.ctx.P[BF_N + 2]); \
+\
+	ptr = data.ctx.S[0]; \
+	do { \
+		ptr += 2; \
+		BF_ENCRYPT; \
+		*(ptr - 2) = L; \
+		*(ptr - 1) = R; \
+	} while (ptr < &data.ctx.S[3][0xFF]);
+#endif
+
+static void BF_set_key(const char *key, BF_key expanded, BF_key initial,
+    unsigned char flags)
+{
+	const char *ptr = key;
+	unsigned int bug, i, j;
+	BF_word safety, sign, diff, tmp[2];
+
+/*
+ * There was a sign extension bug in older revisions of this function.  While
+ * we would have liked to simply fix the bug and move on, we have to provide
+ * a backwards compatibility feature (essentially the bug) for some systems and
+ * a safety measure for some others.  The latter is needed because for certain
+ * multiple inputs to the buggy algorithm there exist easily found inputs to
+ * the correct algorithm that produce the same hash.  Thus, we optionally
+ * deviate from the correct algorithm just enough to avoid such collisions.
+ * While the bug itself affected the majority of passwords containing
+ * characters with the 8th bit set (although only a percentage of those in a
+ * collision-producing way), the anti-collision safety measure affects
+ * only a subset of passwords containing the '\xff' character (not even all of
+ * those passwords, just some of them).  This character is not found in valid
+ * UTF-8 sequences and is rarely used in popular 8-bit character encodings.
+ * Thus, the safety measure is unlikely to cause much annoyance, and is a
+ * reasonable tradeoff to use when authenticating against existing hashes that
+ * are not reliably known to have been computed with the correct algorithm.
+ *
+ * We use an approach that tries to minimize side-channel leaks of password
+ * information - that is, we mostly use fixed-cost bitwise operations instead
+ * of branches or table lookups.  (One conditional branch based on password
+ * length remains.  It is not part of the bug aftermath, though, and is
+ * difficult and possibly unreasonable to avoid given the use of C strings by
+ * the caller, which results in similar timing leaks anyway.)
+ *
+ * For actual implementation, we set an array index in the variable "bug"
+ * (0 means no bug, 1 means sign extension bug emulation) and a flag in the
+ * variable "safety" (bit 16 is set when the safety measure is requested).
+ * Valid combinations of settings are:
+ *
+ * Prefix "$2a$": bug = 0, safety = 0x10000
+ * Prefix "$2x$": bug = 1, safety = 0
+ * Prefix "$2y$": bug = 0, safety = 0
+ */
+	bug = (unsigned int)flags & 1;
+	safety = ((BF_word)flags & 2) << 15;
+
+	sign = diff = 0;
+
+	for (i = 0; i < BF_N + 2; i++) {
+		tmp[0] = tmp[1] = 0;
+		for (j = 0; j < 4; j++) {
+			tmp[0] <<= 8;
+			tmp[0] |= (unsigned char)*ptr; /* correct */
+			tmp[1] <<= 8;
+			tmp[1] |= (BF_word_signed)(signed char)*ptr; /* bug */
+/*
+ * Sign extension in the first char has no effect - nothing to overwrite yet,
+ * and those extra 24 bits will be fully shifted out of the 32-bit word.  For
+ * chars 2, 3, 4 in each four-char block, we set bit 7 of "sign" if sign
+ * extension in tmp[1] occurs.  Once this flag is set, it remains set.
+ */
+			if (j)
+				sign |= tmp[1] & 0x80;
+			if (!*ptr)
+				ptr = key;
+			else
+				ptr++;
+		}
+		diff |= tmp[0] ^ tmp[1]; /* Non-zero on any differences */
+
+		expanded[i] = tmp[bug];
+		initial[i] = BF_init_state.P[i] ^ tmp[bug];
+	}
+
+/*
+ * At this point, "diff" is zero iff the correct and buggy algorithms produced
+ * exactly the same result.  If so and if "sign" is non-zero, which indicates
+ * that there was a non-benign sign extension, this means that we have a
+ * collision between the correctly computed hash for this password and a set of
+ * passwords that could be supplied to the buggy algorithm.  Our safety measure
+ * is meant to protect from such many-buggy to one-correct collisions, by
+ * deviating from the correct algorithm in such cases.  Let's check for this.
+ */
+	diff |= diff >> 16; /* still zero iff exact match */
+	diff &= 0xffff; /* ditto */
+	diff += 0xffff; /* bit 16 set iff "diff" was non-zero (on non-match) */
+	sign <<= 9; /* move the non-benign sign extension flag to bit 16 */
+	sign &= ~diff & safety; /* action needed? */
+
+/*
+ * If we have determined that we need to deviate from the correct algorithm,
+ * flip bit 16 in initial expanded key.  (The choice of 16 is arbitrary, but
+ * let's stick to it now.  It came out of the approach we used above, and it's
+ * not any worse than any other choice we could make.)
+ *
+ * It is crucial that we don't do the same to the expanded key used in the main
+ * Eksblowfish loop.  By doing it to only one of these two, we deviate from a
+ * state that could be directly specified by a password to the buggy algorithm
+ * (and to the fully correct one as well, but that's a side-effect).
+ */
+	initial[0] ^= sign;
+}
+
+static char *BF_crypt(const char *key, const char *setting,
+	char *output, int size,
+	BF_word min)
+{
+#if BF_ASM
+	extern void _BF_body_r(BF_ctx *ctx);
+#endif
+	static const unsigned char flags_by_subtype[26] =
+		{2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 4, 0};
+	struct {
+		BF_ctx ctx;
+		BF_key expanded_key;
+		union {
+			BF_word salt[4];
+			BF_word output[6];
+		} binary;
+	} data;
+	BF_word L, R;
+	BF_word tmp1, tmp2, tmp3, tmp4;
+	BF_word *ptr;
+	BF_word count;
+	int i;
+
+	if (size < 7 + 22 + 31 + 1) {
+		__set_errno(ERANGE);
+		return NULL;
+	}
+
+	if (setting[0] != '$' ||
+	    setting[1] != '2' ||
+	    setting[2] < 'a' || setting[2] > 'z' ||
+	    !flags_by_subtype[(unsigned int)(unsigned char)setting[2] - 'a'] ||
+	    setting[3] != '$' ||
+	    setting[4] < '0' || setting[4] > '3' ||
+	    setting[5] < '0' || setting[5] > '9' ||
+	    (setting[4] == '3' && setting[5] > '1') ||
+	    setting[6] != '$') {
+		__set_errno(EINVAL);
+		return NULL;
+	}
+
+	count = (BF_word)1 << ((setting[4] - '0') * 10 + (setting[5] - '0'));
+	if (count < min || BF_decode(data.binary.salt, &setting[7], 16)) {
+		__set_errno(EINVAL);
+		return NULL;
+	}
+	BF_swap(data.binary.salt, 4);
+
+	BF_set_key(key, data.expanded_key, data.ctx.P,
+	    flags_by_subtype[(unsigned int)(unsigned char)setting[2] - 'a']);
+
+	memcpy(data.ctx.S, BF_init_state.S, sizeof(data.ctx.S));
+
+	L = R = 0;
+	for (i = 0; i < BF_N + 2; i += 2) {
+		L ^= data.binary.salt[i & 2];
+		R ^= data.binary.salt[(i & 2) + 1];
+		BF_ENCRYPT;
+		data.ctx.P[i] = L;
+		data.ctx.P[i + 1] = R;
+	}
+
+	ptr = data.ctx.S[0];
+	do {
+		ptr += 4;
+		L ^= data.binary.salt[(BF_N + 2) & 3];
+		R ^= data.binary.salt[(BF_N + 3) & 3];
+		BF_ENCRYPT;
+		*(ptr - 4) = L;
+		*(ptr - 3) = R;
+
+		L ^= data.binary.salt[(BF_N + 4) & 3];
+		R ^= data.binary.salt[(BF_N + 5) & 3];
+		BF_ENCRYPT;
+		*(ptr - 2) = L;
+		*(ptr - 1) = R;
+	} while (ptr < &data.ctx.S[3][0xFF]);
+
+	do {
+		int done;
+
+		for (i = 0; i < BF_N + 2; i += 2) {
+			data.ctx.P[i] ^= data.expanded_key[i];
+			data.ctx.P[i + 1] ^= data.expanded_key[i + 1];
+		}
+
+		done = 0;
+		do {
+			BF_body();
+			if (done)
+				break;
+			done = 1;
+
+			tmp1 = data.binary.salt[0];
+			tmp2 = data.binary.salt[1];
+			tmp3 = data.binary.salt[2];
+			tmp4 = data.binary.salt[3];
+			for (i = 0; i < BF_N; i += 4) {
+				data.ctx.P[i] ^= tmp1;
+				data.ctx.P[i + 1] ^= tmp2;
+				data.ctx.P[i + 2] ^= tmp3;
+				data.ctx.P[i + 3] ^= tmp4;
+			}
+			data.ctx.P[16] ^= tmp1;
+			data.ctx.P[17] ^= tmp2;
+		} while (1);
+	} while (--count);
+
+	for (i = 0; i < 6; i += 2) {
+		L = BF_magic_w[i];
+		R = BF_magic_w[i + 1];
+
+		count = 64;
+		do {
+			BF_ENCRYPT;
+		} while (--count);
+
+		data.binary.output[i] = L;
+		data.binary.output[i + 1] = R;
+	}
+
+	memcpy(output, setting, 7 + 22 - 1);
+	output[7 + 22 - 1] = BF_itoa64[(int)
+		BF_atoi64[(int)setting[7 + 22 - 1] - 0x20] & 0x30];
+
+/* This has to be bug-compatible with the original implementation, so
+ * only encode 23 of the 24 bytes. :-) */
+	BF_swap(data.binary.output, 6);
+	BF_encode(&output[7 + 22], data.binary.output, 23);
+	output[7 + 22 + 31] = '\0';
+
+	return output;
+}
+
+int _crypt_output_magic(const char *setting, char *output, int size)
+{
+	if (size < 3)
+		return -1;
+
+	output[0] = '*';
+	output[1] = '0';
+	output[2] = '\0';
+
+	if (setting[0] == '*' && setting[1] == '0')
+		output[1] = '1';
+
+	return 0;
+}
+
+/*
+ * Please preserve the runtime self-test.  It serves two purposes at once:
+ *
+ * 1. We really can't afford the risk of producing incompatible hashes e.g.
+ * when there's something like gcc bug 26587 again, whereas an application or
+ * library integrating this code might not also integrate our external tests or
+ * it might not run them after every build.  Even if it does, the miscompile
+ * might only occur on the production build, but not on a testing build (such
+ * as because of different optimization settings).  It is painful to recover
+ * from incorrectly-computed hashes - merely fixing whatever broke is not
+ * enough.  Thus, a proactive measure like this self-test is needed.
+ *
+ * 2. We don't want to leave sensitive data from our actual password hash
+ * computation on the stack or in registers.  Previous revisions of the code
+ * would do explicit cleanups, but simply running the self-test after hash
+ * computation is more reliable.
+ *
+ * The performance cost of this quick self-test is around 0.6% at the "$2a$08"
+ * setting.
+ */
+char *_crypt_blowfish_rn(const char *key, const char *setting,
+	char *output, int size)
+{
+	const char *test_key = "8b \xd0\xc1\xd2\xcf\xcc\xd8";
+	const char *test_setting = "$2a$00$abcdefghijklmnopqrstuu";
+	static const char * const test_hash[2] =
+		{"VUrPmXD6q/nVSSp7pNDhCR9071IfIRe\0\x55", /* $2x$ */
+		"i1D709vfamulimlGcq0qq3UvuUasvEa\0\x55"}; /* $2a$, $2y$ */
+	char *retval;
+	const char *p;
+	int save_errno, ok;
+	struct {
+		char s[7 + 22 + 1];
+		char o[7 + 22 + 31 + 1 + 1 + 1];
+	} buf;
+
+/* Hash the supplied password */
+	_crypt_output_magic(setting, output, size);
+	retval = BF_crypt(key, setting, output, size, 16);
+	save_errno = errno;
+
+/*
+ * Do a quick self-test.  It is important that we make both calls to BF_crypt()
+ * from the same scope such that they likely use the same stack locations,
+ * which makes the second call overwrite the first call's sensitive data on the
+ * stack and makes it more likely that any alignment related issues would be
+ * detected by the self-test.
+ */
+	memcpy(buf.s, test_setting, sizeof(buf.s));
+	if (retval)
+		buf.s[2] = setting[2];
+	memset(buf.o, 0x55, sizeof(buf.o));
+	buf.o[sizeof(buf.o) - 1] = 0;
+	p = BF_crypt(test_key, buf.s, buf.o, sizeof(buf.o) - (1 + 1), 1);
+
+	ok = (p == buf.o &&
+	    !memcmp(p, buf.s, 7 + 22) &&
+	    !memcmp(p + (7 + 22),
+	    test_hash[(unsigned int)(unsigned char)buf.s[2] & 1],
+	    31 + 1 + 1 + 1));
+
+	{
+		const char *k = "\xff\xa3" "34" "\xff\xff\xff\xa3" "345";
+		BF_key ae, ai, ye, yi;
+		BF_set_key(k, ae, ai, 2); /* $2a$ */
+		BF_set_key(k, ye, yi, 4); /* $2y$ */
+		ai[0] ^= 0x10000; /* undo the safety (for comparison) */
+		ok = ok && ai[0] == 0xdb9c59bc && ye[17] == 0x33343500 &&
+		    !memcmp(ae, ye, sizeof(ae)) &&
+		    !memcmp(ai, yi, sizeof(ai));
+	}
+
+	__set_errno(save_errno);
+	if (ok)
+		return retval;
+
+/* Should not happen */
+	_crypt_output_magic(setting, output, size);
+	__set_errno(EINVAL); /* pretend we don't support this hash type */
+	return NULL;
+}
+
+char *_crypt_gensalt_blowfish_rn(const char *prefix, unsigned long count,
+	const char *input, int size, char *output, int output_size)
+{
+	if (size < 16 || output_size < 7 + 22 + 1 ||
+	    (count && (count < 4 || count > 31)) ||
+	    prefix[0] != '$' || prefix[1] != '2' ||
+	    (prefix[2] != 'a' && prefix[2] != 'y')) {
+		if (output_size > 0) output[0] = '\0';
+		__set_errno((output_size < 7 + 22 + 1) ? ERANGE : EINVAL);
+		return NULL;
+	}
+
+	if (!count) count = 5;
+
+	output[0] = '$';
+	output[1] = '2';
+	output[2] = prefix[2];
+	output[3] = '$';
+	output[4] = '0' + count / 10;
+	output[5] = '0' + count % 10;
+	output[6] = '$';
+
+	BF_encode(&output[7], (const BF_word *)input, 16);
+	output[7 + 22] = '\0';
+
+	return output;
+}
diff --git a/src/misc/crypt_gensalt.c b/src/misc/crypt_gensalt.c
new file mode 100644
index 0000000..43e1585
--- /dev/null
+++ b/src/misc/crypt_gensalt.c
@@ -0,0 +1,124 @@
+/*
+ * Written by Solar Designer <solar at openwall.com> in 2000-2011.
+ * No copyright is claimed, and the software is hereby placed in the public
+ * domain.  In case this attempt to disclaim copyright and place the software
+ * in the public domain is deemed null and void, then the software is
+ * Copyright (c) 2000-2011 Solar Designer and it is hereby released to the
+ * general public under the following terms:
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted.
+ *
+ * There's ABSOLUTELY NO WARRANTY, express or implied.
+ *
+ * See crypt_blowfish.c for more information.
+ *
+ * This file contains salt generation functions for the traditional and
+ * other common crypt(3) algorithms, except for bcrypt which is defined
+ * entirely in crypt_blowfish.c.
+ */
+
+#include <string.h>
+
+#include <errno.h>
+#ifndef __set_errno
+#define __set_errno(val) errno = (val)
+#endif
+
+/* Just to make sure the prototypes match the actual definitions */
+#include <crypt.h>
+
+unsigned const char _crypt_itoa64[64 + 1] =
+	"./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
+
+char *_crypt_gensalt_traditional_rn(const char *prefix, unsigned long count,
+	const char *input, int size, char *output, int output_size)
+{
+	(void) prefix;
+
+	if (size < 2 || output_size < 2 + 1 || (count && count != 25)) {
+		if (output_size > 0) output[0] = '\0';
+		__set_errno((output_size < 2 + 1) ? ERANGE : EINVAL);
+		return NULL;
+	}
+
+	output[0] = _crypt_itoa64[(unsigned int)input[0] & 0x3f];
+	output[1] = _crypt_itoa64[(unsigned int)input[1] & 0x3f];
+	output[2] = '\0';
+
+	return output;
+}
+
+char *_crypt_gensalt_extended_rn(const char *prefix, unsigned long count,
+	const char *input, int size, char *output, int output_size)
+{
+	unsigned long value;
+
+	(void) prefix;
+
+/* Even iteration counts make it easier to detect weak DES keys from a look
+ * at the hash, so they should be avoided */
+	if (size < 3 || output_size < 1 + 4 + 4 + 1 ||
+	    (count && (count > 0xffffff || !(count & 1)))) {
+		if (output_size > 0) output[0] = '\0';
+		__set_errno((output_size < 1 + 4 + 4 + 1) ? ERANGE : EINVAL);
+		return NULL;
+	}
+
+	if (!count) count = 725;
+
+	output[0] = '_';
+	output[1] = _crypt_itoa64[count & 0x3f];
+	output[2] = _crypt_itoa64[(count >> 6) & 0x3f];
+	output[3] = _crypt_itoa64[(count >> 12) & 0x3f];
+	output[4] = _crypt_itoa64[(count >> 18) & 0x3f];
+	value = (unsigned long)(unsigned char)input[0] |
+		((unsigned long)(unsigned char)input[1] << 8) |
+		((unsigned long)(unsigned char)input[2] << 16);
+	output[5] = _crypt_itoa64[value & 0x3f];
+	output[6] = _crypt_itoa64[(value >> 6) & 0x3f];
+	output[7] = _crypt_itoa64[(value >> 12) & 0x3f];
+	output[8] = _crypt_itoa64[(value >> 18) & 0x3f];
+	output[9] = '\0';
+
+	return output;
+}
+
+char *_crypt_gensalt_md5_rn(const char *prefix, unsigned long count,
+	const char *input, int size, char *output, int output_size)
+{
+	unsigned long value;
+
+	(void) prefix;
+
+	if (size < 3 || output_size < 3 + 4 + 1 || (count && count != 1000)) {
+		if (output_size > 0) output[0] = '\0';
+		__set_errno((output_size < 3 + 4 + 1) ? ERANGE : EINVAL);
+		return NULL;
+	}
+
+	output[0] = '$';
+	output[1] = '1';
+	output[2] = '$';
+	value = (unsigned long)(unsigned char)input[0] |
+		((unsigned long)(unsigned char)input[1] << 8) |
+		((unsigned long)(unsigned char)input[2] << 16);
+	output[3] = _crypt_itoa64[value & 0x3f];
+	output[4] = _crypt_itoa64[(value >> 6) & 0x3f];
+	output[5] = _crypt_itoa64[(value >> 12) & 0x3f];
+	output[6] = _crypt_itoa64[(value >> 18) & 0x3f];
+	output[7] = '\0';
+
+	if (size >= 6 && output_size >= 3 + 4 + 4 + 1) {
+		value = (unsigned long)(unsigned char)input[3] |
+			((unsigned long)(unsigned char)input[4] << 8) |
+			((unsigned long)(unsigned char)input[5] << 16);
+		output[7] = _crypt_itoa64[value & 0x3f];
+		output[8] = _crypt_itoa64[(value >> 6) & 0x3f];
+		output[9] = _crypt_itoa64[(value >> 12) & 0x3f];
+		output[10] = _crypt_itoa64[(value >> 18) & 0x3f];
+		output[11] = '\0';
+	}
+
+	return output;
+}

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-07-21 15:23 crypt* files in crypt directory Łukasz Sowa
@ 2012-07-21 17:11 ` Solar Designer
  2012-07-21 20:17   ` Rich Felker
  2012-07-22 16:23   ` Łukasz Sowa
  2012-07-25  7:57 ` Rich Felker
  2012-08-08  2:24 ` Rich Felker
  2 siblings, 2 replies; 52+ messages in thread
From: Solar Designer @ 2012-07-21 17:11 UTC (permalink / raw)
  To: Lukasz Sowa; +Cc: musl

Hi,

I suggest that you subscribe to the list, so that if someone does not CC
you on a message yet you want to reply, you don't happen to start a new
thread (again).

On Sat, Jul 21, 2012 at 05:23:24PM +0200, ?ukasz Sowa wrote:
> Together with Daniel we've prepared initial patch for inclusion 
> crypt_blowfish
> and crypt_gensalt in musl. Things to change were rather cosmetic. Both 
> blowfish
> and gensalt implementations don't use global state, only const statics 
> (which
> needed simply 'const' to meet musl standard).

I will likely add those const's to my "upstream" code for crypt_blowfish.

> However, there are some consts arrays used inside functions which may 
> clutter
> stack like flags_by_subtype from BF_crypt(), test_key, test_setting, 
> test_hash
> from _crypt_blowfish_rn(). I think that they can be pulled up to global 
> static
> consts but we haven't done that yet. What do you think about this?

I think that they are in .rodata as long as you have "const" on them,
and thus there's no reason to move them to global scope.  They don't
clutter the stack.

> +++ b/include/crypt.h
> @@ -13,6 +13,19 @@ struct crypt_data {
>  char *crypt(const char *, const char *);
>  char *crypt_r(const char *, const char *, struct crypt_data *);
>  
> +char *_crypt_gensalt_traditional_rn(const char *prefix, unsigned long count,
> +	const char *input, int size, char *output, int output_size);
> +char *_crypt_gensalt_extended_rn(const char *prefix, unsigned long count,
> +	const char *input, int size, char *output, int output_size);
> +char *_crypt_gensalt_md5_rn(const char *prefix, unsigned long count,
> +	const char *input, int size, char *output, int output_size);
> +
> +int _crypt_output_magic(const char *setting, char *output, int size);
> +char *_crypt_blowfish_rn(const char *key, const char *setting,
> +	char *output, int size);
> +char *_crypt_gensalt_blowfish_rn(const char *prefix, unsigned long count,
> +	const char *input, int size, char *output, int output_size);
> +
>  #ifdef __cplusplus
>  }
>  #endif

None of the interfaces you've added above were supposed to be exported
by a libc.  They're not in Owl's glibc, for example, although it does
include crypt_blowfish.

Instead, you need to roll crypt_blowfish support into crypt() and
crypt_r() wrappers.  You may also add similarly hash type agnostic
crypt_rn(), crypt_ra(), crypt_gensalt(), crypt_gensalt_rn(), and
crypt_gensalt_ra().  The crypt.3 man page included with crypt_blowfish
documents them - perhaps it can also become the man page for musl's.

See crypt_blowfish's wrapper.c and modify it for use in musl or at least
reuse code from it.

> +typedef unsigned int BF_word;
> +typedef signed int BF_word_signed;
[...]
> +	const char *ptr = key;
[...]
> +			tmp[0] <<= 8;
> +			tmp[0] |= (unsigned char)*ptr; /* correct */
> +			tmp[1] <<= 8;
> +			tmp[1] |= (BF_word_signed)(signed char)*ptr; /* bug */

I think I have an instance of undefined behavior on the "bug" line here:
I am casting a potentially unsigned char *ptr to (signed char), thus
causing signed overflow (value may not fit in the signed type's data
range on systems where char is unsigned by default).  While reproducing
the old bug here is on purpose, the new behavior should better be
precisely defined (just _the_ bug the way it happened to be compiled
before, not some other misbehavior).  I'd appreciate suggestions for a
clean and not too verbose fix for this.

Thanks,

Alexander


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-07-21 17:11 ` Solar Designer
@ 2012-07-21 20:17   ` Rich Felker
  2012-07-22 16:23   ` Łukasz Sowa
  1 sibling, 0 replies; 52+ messages in thread
From: Rich Felker @ 2012-07-21 20:17 UTC (permalink / raw)
  To: musl; +Cc: Lukasz Sowa

On Sat, Jul 21, 2012 at 09:11:02PM +0400, Solar Designer wrote:
> > However, there are some consts arrays used inside functions which may 
> > clutter
> > stack like flags_by_subtype from BF_crypt(), test_key, test_setting, 
> > test_hash
> > from _crypt_blowfish_rn(). I think that they can be pulled up to global 
> > static
> > consts but we haven't done that yet. What do you think about this?
> 
> I think that they are in .rodata as long as you have "const" on them,
> and thus there's no reason to move them to global scope.  They don't
> clutter the stack.

These arrays have static storage duration so they do not use stack
space. If they were not static, then a naive compiler would be forced
to put them on the stack and generate bloated code to initialize them.
A very smart compiler could determine that the address does not leak
and thus optimize them to a single static copy.

> crypt_r() wrappers.  You may also add similarly hash type agnostic
> crypt_rn(), crypt_ra(), crypt_gensalt(), crypt_gensalt_rn(), and
> crypt_gensalt_ra().  The crypt.3 man page included with crypt_blowfish
> documents them - perhaps it can also become the man page for musl's.

This isn't a decision yet, but I really question (1) the value of the
_rn/_ra interfaces, and (2) whether any of these belong in libc.
There's no historical precedent (except perhaps openwall-patched
glibc) for having these functions in libc, and as far as I can tell,
the only program which will have any use for them on a typical system
is the passwd utility.

As for why I question _rn/_ra, the only historical reason for having
large amounts of data in the crypt_data structure is to store internal
state, which should not exist for the crypt_r interface - keeping
state is at best useless and at worst an information-leak security
vuln. The only time state would be needed is for the encrypt_r and
setkey_r functions (analogues of XSI legacy encrypt and setkey
functions) which have no place in modern cryptographic programming. As
such, a small crypt_data structure (or even a legacy large one) should
be fine for storing the only thing it's actually needed to store: the
resulting hash from crypt_r.

If we do include the crypt_rn/_ra functions, I'd prefer them be the
trivial wrappers for the plain crypt_r function.

> > +typedef unsigned int BF_word;
> > +typedef signed int BF_word_signed;

Using types whose range could vary between systems (even though they
won't vary on musl systems) seems like a bigger portability issue than
the char stuff below... Should these be uint32_t/int32_t?

> [...]
> > +	const char *ptr = key;
> [...]
> > +			tmp[0] <<= 8;
> > +			tmp[0] |= (unsigned char)*ptr; /* correct */
> > +			tmp[1] <<= 8;
> > +			tmp[1] |= (BF_word_signed)(signed char)*ptr; /* bug */
> 
> I think I have an instance of undefined behavior on the "bug" line here:
> I am casting a potentially unsigned char *ptr to (signed char), thus
> causing signed overflow (value may not fit in the signed type's data
> range on systems where char is unsigned by default).  While reproducing

There's no signed overflow here; it's a conversion to a signed type,
whose result is implementation-defined if the value does not fit. We
require the standard conversion (modular reduction) in musl, so I have
no objection to the code as-is, but if you want it to be "more
portable" you can write this some other way like the way we did it in
the DES crypt module.

Rich


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-07-21 17:11 ` Solar Designer
  2012-07-21 20:17   ` Rich Felker
@ 2012-07-22 16:23   ` Łukasz Sowa
  1 sibling, 0 replies; 52+ messages in thread
From: Łukasz Sowa @ 2012-07-22 16:23 UTC (permalink / raw)
  To: Solar Designer; +Cc: Lukasz Sowa, musl

Hi again, thanks for your reply.

On Sat, 21 Jul 2012, Solar Designer wrote:

> Hi,
>
> I suggest that you subscribe to the list, so that if someone does not CC
> you on a message yet you want to reply, you don't happen to start a new
> thread (again).

Sorry about that, I've already subscribed to the list.

>> However, there are some consts arrays used inside functions which may
>> clutter
>> stack like flags_by_subtype from BF_crypt(), test_key, test_setting,
>> test_hash
>> from _crypt_blowfish_rn(). I think that they can be pulled up to global
>> static
>> consts but we haven't done that yet. What do you think about this?
>
> I think that they are in .rodata as long as you have "const" on them,
> and thus there's no reason to move them to global scope.  They don't
> clutter the stack.

Yes, that's what I thought about at first, so we didn't moved the code to global
but my fault I didn't simply checked it in generated code. Code looks nicer if
they're function's static so we should leave it as is.

>
>> +++ b/include/crypt.h
>> @@ -13,6 +13,19 @@ struct crypt_data {
>>  char *crypt(const char *, const char *);
>>  char *crypt_r(const char *, const char *, struct crypt_data *);
>>
>> +char *_crypt_gensalt_traditional_rn(const char *prefix, unsigned long count,
>> +	const char *input, int size, char *output, int output_size);
>> +char *_crypt_gensalt_extended_rn(const char *prefix, unsigned long count,
>> +	const char *input, int size, char *output, int output_size);
>> +char *_crypt_gensalt_md5_rn(const char *prefix, unsigned long count,
>> +	const char *input, int size, char *output, int output_size);
>> +
>> +int _crypt_output_magic(const char *setting, char *output, int size);
>> +char *_crypt_blowfish_rn(const char *key, const char *setting,
>> +	char *output, int size);
>> +char *_crypt_gensalt_blowfish_rn(const char *prefix, unsigned long count,
>> +	const char *input, int size, char *output, int output_size);
>> +
>>  #ifdef __cplusplus
>>  }
>>  #endif
>
> None of the interfaces you've added above were supposed to be exported
> by a libc.  They're not in Owl's glibc, for example, although it does
> include crypt_blowfish.
>
> Instead, you need to roll crypt_blowfish support into crypt() and
> crypt_r() wrappers.  You may also add similarly hash type agnostic
> crypt_rn(), crypt_ra(), crypt_gensalt(), crypt_gensalt_rn(), and
> crypt_gensalt_ra().  The crypt.3 man page included with crypt_blowfish
> documents them - perhaps it can also become the man page for musl's.
>
> See crypt_blowfish's wrapper.c and modify it for use in musl or at least
> reuse code from it.
>

I think it's up to Rich what to do with this code. We (Daniel and me) don't have
clear idea about it.

Best Regards,
Lukasz Sowa


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: Re: crypt* files in crypt directory
  2012-07-21 15:23 crypt* files in crypt directory Łukasz Sowa
  2012-07-21 17:11 ` Solar Designer
@ 2012-07-25  7:57 ` Rich Felker
  2012-08-08  2:24 ` Rich Felker
  2 siblings, 0 replies; 52+ messages in thread
From: Rich Felker @ 2012-07-25  7:57 UTC (permalink / raw)
  To: musl

Hi. Finally got around to reviewing this a bit.. :-)

On Sat, Jul 21, 2012 at 05:23:24PM +0200, Łukasz Sowa wrote:
> --- /dev/null
> +++ b/src/misc/crypt_blowfish.c
> [...]
> +#include <errno.h>
> +#ifndef __set_errno
> +#define __set_errno(val) errno = (val)
> +#endif

Is there a reason for this __set_errno stuff? IMO the code should just
directly assign to errno. This looks like some silly cargo-culting
from glibc...

> +/* Just to make sure the prototypes match the actual definitions */
> +#include <crypt.h>

I'm not sure this is useful; this file does not define any public
interfaces.

> +#ifdef __i386__
> +#define BF_ASM				1
> +#define BF_SCALE			1
> +#elif defined(__x86_64__) || defined(__alpha__) || defined(__hppa__)
> +#define BF_ASM				0
> +#define BF_SCALE			1
> +#else
> +#define BF_ASM				0
> +#define BF_SCALE			0
> +#endif

Is this used/needed for anything? I'm generally opposed to having
different versions of code conditionally compiled for different archs
unless there's a very good reason. It makes it so testing on multiple
archs is required to ensure that there are not bugs.

> +typedef unsigned int BF_word;
> +typedef signed int BF_word_signed;

While it's okay for musl where all targets have 32-bit int, if you
want a type that's 32-bit you should probably be using [u]int32_t, or
if you want the system wordsize then int is wrong and you should be
using long or size_t or something...

> +#if BF_ASM
> +#define BF_body() \
> +	_BF_body_r(&data.ctx);
> +#else

This does not seem to exist in the submitted code..

> +char *_crypt_gensalt_blowfish_rn(const char *prefix, unsigned long count,
> +	const char *input, int size, char *output, int output_size)

This belongs in the gensalt file, not here. There's no reason every
program that merely wants to authenticate against passwords should
pull in salt-generation code.

Aside from that, the code looks a bit long as-is, but I haven't tried
building it yet and it might just look that way from all the comments.
I'll take a look at size stuff soon..

Rich


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: Re: crypt* files in crypt directory
  2012-07-21 15:23 crypt* files in crypt directory Łukasz Sowa
  2012-07-21 17:11 ` Solar Designer
  2012-07-25  7:57 ` Rich Felker
@ 2012-08-08  2:24 ` Rich Felker
  2012-08-08  4:42   ` Solar Designer
  2012-08-09  6:03   ` Rich Felker
  2 siblings, 2 replies; 52+ messages in thread
From: Rich Felker @ 2012-08-08  2:24 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 2236 bytes --]

On Sat, Jul 21, 2012 at 05:23:24PM +0200, Łukasz Sowa wrote:
> Together with Daniel we've prepared initial patch for inclusion crypt_blowfish
> and crypt_gensalt in musl. Things to change were rather cosmetic. Both blowfish
> and gensalt implementations don't use global state, only const statics (which
> needed simply 'const' to meet musl standard). Besides that code looks quite
> clean & nice.

I've been looking at the code some now that I got the 0.9.3 release
out of the way, and I have a few questions for you and the
community...

First, the compatibility code for the sign extension bug. How
important is it to keep this? Part of my question is that I'm having a
hard time understanding how it's useful. For passwords that were
subject to collisions due to this bug, it seems like there's nothing
that can be done except discarding the old hashes, since they're
vulnerable. So my understanding is that the bug-compatibility code
just serves to keep the subset of old hashes of 8-bit passwords that
were _not_ subject to collisions from becoming unusable. I.e. the
bug-compat code only benefits users who:

1. Used passwords containing 8-bit characters
2. Happened to be using a password that was not subject to the
collision bug, and
3. Did not regenerate hashes after the bug was fixed.

I'm uncertain whether there's any portion of musl's user base that
this would be useful to.

For folks completely unfamiliar with the issue, here's an LWN article:
http://lwn.net/Articles/448699/

Second, what can be done to reduce size? I think the first step is
replacing the giant macros (BF_ROUND, BF_ENCRYPT, etc.) with
functions so that the code doesn't get generated in duplicate unless
aggressive inlining is enabled by CFLAGS. But are there other things
that would help? With the data tables being 4k in size, I'm thinking
a reasonable target size for the whole file might be 7k.

Actually while writing this, I made some quick changes and seem to
have already achieved that goal. See the attached file. It's untested,
so I might have broken something in the process. I'm not sure I'll
have time to test it well right away, so I'd appreciate comments on
whether it works as well as any other possible improvements... :-)

Rich

[-- Attachment #2: crypt_blowfish.c --]
[-- Type: text/plain, Size: 30078 bytes --]

/* Modified by Rich Felker for inclusion in musl libc. Main changes
 * made were reversing the manual inlining/loop unrolling and
 * replacing it with code that can be optimized for size or speed
 * depending on compiler flags. Endianness check was also optimized to
 * be a compile-time constant. */

/*
 * The crypt_blowfish homepage is:
 *
 *	http://www.openwall.com/crypt/
 *
 * This code comes from John the Ripper password cracker, with reentrant
 * and crypt(3) interfaces added, but optimizations specific to password
 * cracking removed.
 *
 * Written by Solar Designer <solar at openwall.com> in 1998-2011.
 * No copyright is claimed, and the software is hereby placed in the public
 * domain.  In case this attempt to disclaim copyright and place the software
 * in the public domain is deemed null and void, then the software is
 * Copyright (c) 1998-2011 Solar Designer and it is hereby released to the
 * general public under the following terms:
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted.
 *
 * There's ABSOLUTELY NO WARRANTY, express or implied.
 *
 * It is my intent that you should be able to use this on your system,
 * as part of a software package, or anywhere else to improve security,
 * ensure compatibility, or for any other purpose.  I would appreciate
 * it if you give credit where it is due and keep your modifications in
 * the public domain as well, but I don't require that in order to let
 * you place this code and any modifications you make under a license
 * of your choice.
 *
 * This implementation is mostly compatible with OpenBSD's bcrypt.c (prefix
 * "$2a$") by Niels Provos <provos at citi.umich.edu>, and uses some of his
 * ideas.  The password hashing algorithm was designed by David Mazieres
 * <dm at lcs.mit.edu>.  For more information on the level of compatibility,
 * please refer to the comments in BF_set_key() below and to the included
 * crypt(3) man page.
 *
 * There's a paper on the algorithm that explains its design decisions:
 *
 *	http://www.usenix.org/events/usenix99/provos.html
 *
 * Some of the tricks in BF_ROUND might be inspired by Eric Young's
 * Blowfish library (I can't be sure if I would think of something if I
 * hadn't seen his code).
 */

#include <string.h>
#include <errno.h>

typedef unsigned int BF_word;
typedef signed int BF_word_signed;

/* Number of Blowfish rounds, this is also hardcoded into a few places */
#define BF_N				16

typedef BF_word BF_key[BF_N + 2];

typedef struct {
	BF_word S[4][0x100];
	BF_key P;
} BF_ctx;

/*
 * Magic IV for 64 Blowfish encryptions that we do at the end.
 * The string is "OrpheanBeholderScryDoubt" on big-endian.
 */
static const BF_word BF_magic_w[6] = {
	0x4F727068, 0x65616E42, 0x65686F6C,
	0x64657253, 0x63727944, 0x6F756274
};

/*
 * P-box and S-box tables initialized with digits of Pi.
 */
static const BF_ctx BF_init_state = {
	{
		{
			0xd1310ba6, 0x98dfb5ac, 0x2ffd72db, 0xd01adfb7,
			0xb8e1afed, 0x6a267e96, 0xba7c9045, 0xf12c7f99,
			0x24a19947, 0xb3916cf7, 0x0801f2e2, 0x858efc16,
			0x636920d8, 0x71574e69, 0xa458fea3, 0xf4933d7e,
			0x0d95748f, 0x728eb658, 0x718bcd58, 0x82154aee,
			0x7b54a41d, 0xc25a59b5, 0x9c30d539, 0x2af26013,
			0xc5d1b023, 0x286085f0, 0xca417918, 0xb8db38ef,
			0x8e79dcb0, 0x603a180e, 0x6c9e0e8b, 0xb01e8a3e,
			0xd71577c1, 0xbd314b27, 0x78af2fda, 0x55605c60,
			0xe65525f3, 0xaa55ab94, 0x57489862, 0x63e81440,
			0x55ca396a, 0x2aab10b6, 0xb4cc5c34, 0x1141e8ce,
			0xa15486af, 0x7c72e993, 0xb3ee1411, 0x636fbc2a,
			0x2ba9c55d, 0x741831f6, 0xce5c3e16, 0x9b87931e,
			0xafd6ba33, 0x6c24cf5c, 0x7a325381, 0x28958677,
			0x3b8f4898, 0x6b4bb9af, 0xc4bfe81b, 0x66282193,
			0x61d809cc, 0xfb21a991, 0x487cac60, 0x5dec8032,
			0xef845d5d, 0xe98575b1, 0xdc262302, 0xeb651b88,
			0x23893e81, 0xd396acc5, 0x0f6d6ff3, 0x83f44239,
			0x2e0b4482, 0xa4842004, 0x69c8f04a, 0x9e1f9b5e,
			0x21c66842, 0xf6e96c9a, 0x670c9c61, 0xabd388f0,
			0x6a51a0d2, 0xd8542f68, 0x960fa728, 0xab5133a3,
			0x6eef0b6c, 0x137a3be4, 0xba3bf050, 0x7efb2a98,
			0xa1f1651d, 0x39af0176, 0x66ca593e, 0x82430e88,
			0x8cee8619, 0x456f9fb4, 0x7d84a5c3, 0x3b8b5ebe,
			0xe06f75d8, 0x85c12073, 0x401a449f, 0x56c16aa6,
			0x4ed3aa62, 0x363f7706, 0x1bfedf72, 0x429b023d,
			0x37d0d724, 0xd00a1248, 0xdb0fead3, 0x49f1c09b,
			0x075372c9, 0x80991b7b, 0x25d479d8, 0xf6e8def7,
			0xe3fe501a, 0xb6794c3b, 0x976ce0bd, 0x04c006ba,
			0xc1a94fb6, 0x409f60c4, 0x5e5c9ec2, 0x196a2463,
			0x68fb6faf, 0x3e6c53b5, 0x1339b2eb, 0x3b52ec6f,
			0x6dfc511f, 0x9b30952c, 0xcc814544, 0xaf5ebd09,
			0xbee3d004, 0xde334afd, 0x660f2807, 0x192e4bb3,
			0xc0cba857, 0x45c8740f, 0xd20b5f39, 0xb9d3fbdb,
			0x5579c0bd, 0x1a60320a, 0xd6a100c6, 0x402c7279,
			0x679f25fe, 0xfb1fa3cc, 0x8ea5e9f8, 0xdb3222f8,
			0x3c7516df, 0xfd616b15, 0x2f501ec8, 0xad0552ab,
			0x323db5fa, 0xfd238760, 0x53317b48, 0x3e00df82,
			0x9e5c57bb, 0xca6f8ca0, 0x1a87562e, 0xdf1769db,
			0xd542a8f6, 0x287effc3, 0xac6732c6, 0x8c4f5573,
			0x695b27b0, 0xbbca58c8, 0xe1ffa35d, 0xb8f011a0,
			0x10fa3d98, 0xfd2183b8, 0x4afcb56c, 0x2dd1d35b,
			0x9a53e479, 0xb6f84565, 0xd28e49bc, 0x4bfb9790,
			0xe1ddf2da, 0xa4cb7e33, 0x62fb1341, 0xcee4c6e8,
			0xef20cada, 0x36774c01, 0xd07e9efe, 0x2bf11fb4,
			0x95dbda4d, 0xae909198, 0xeaad8e71, 0x6b93d5a0,
			0xd08ed1d0, 0xafc725e0, 0x8e3c5b2f, 0x8e7594b7,
			0x8ff6e2fb, 0xf2122b64, 0x8888b812, 0x900df01c,
			0x4fad5ea0, 0x688fc31c, 0xd1cff191, 0xb3a8c1ad,
			0x2f2f2218, 0xbe0e1777, 0xea752dfe, 0x8b021fa1,
			0xe5a0cc0f, 0xb56f74e8, 0x18acf3d6, 0xce89e299,
			0xb4a84fe0, 0xfd13e0b7, 0x7cc43b81, 0xd2ada8d9,
			0x165fa266, 0x80957705, 0x93cc7314, 0x211a1477,
			0xe6ad2065, 0x77b5fa86, 0xc75442f5, 0xfb9d35cf,
			0xebcdaf0c, 0x7b3e89a0, 0xd6411bd3, 0xae1e7e49,
			0x00250e2d, 0x2071b35e, 0x226800bb, 0x57b8e0af,
			0x2464369b, 0xf009b91e, 0x5563911d, 0x59dfa6aa,
			0x78c14389, 0xd95a537f, 0x207d5ba2, 0x02e5b9c5,
			0x83260376, 0x6295cfa9, 0x11c81968, 0x4e734a41,
			0xb3472dca, 0x7b14a94a, 0x1b510052, 0x9a532915,
			0xd60f573f, 0xbc9bc6e4, 0x2b60a476, 0x81e67400,
			0x08ba6fb5, 0x571be91f, 0xf296ec6b, 0x2a0dd915,
			0xb6636521, 0xe7b9f9b6, 0xff34052e, 0xc5855664,
			0x53b02d5d, 0xa99f8fa1, 0x08ba4799, 0x6e85076a
		}, {
			0x4b7a70e9, 0xb5b32944, 0xdb75092e, 0xc4192623,
			0xad6ea6b0, 0x49a7df7d, 0x9cee60b8, 0x8fedb266,
			0xecaa8c71, 0x699a17ff, 0x5664526c, 0xc2b19ee1,
			0x193602a5, 0x75094c29, 0xa0591340, 0xe4183a3e,
			0x3f54989a, 0x5b429d65, 0x6b8fe4d6, 0x99f73fd6,
			0xa1d29c07, 0xefe830f5, 0x4d2d38e6, 0xf0255dc1,
			0x4cdd2086, 0x8470eb26, 0x6382e9c6, 0x021ecc5e,
			0x09686b3f, 0x3ebaefc9, 0x3c971814, 0x6b6a70a1,
			0x687f3584, 0x52a0e286, 0xb79c5305, 0xaa500737,
			0x3e07841c, 0x7fdeae5c, 0x8e7d44ec, 0x5716f2b8,
			0xb03ada37, 0xf0500c0d, 0xf01c1f04, 0x0200b3ff,
			0xae0cf51a, 0x3cb574b2, 0x25837a58, 0xdc0921bd,
			0xd19113f9, 0x7ca92ff6, 0x94324773, 0x22f54701,
			0x3ae5e581, 0x37c2dadc, 0xc8b57634, 0x9af3dda7,
			0xa9446146, 0x0fd0030e, 0xecc8c73e, 0xa4751e41,
			0xe238cd99, 0x3bea0e2f, 0x3280bba1, 0x183eb331,
			0x4e548b38, 0x4f6db908, 0x6f420d03, 0xf60a04bf,
			0x2cb81290, 0x24977c79, 0x5679b072, 0xbcaf89af,
			0xde9a771f, 0xd9930810, 0xb38bae12, 0xdccf3f2e,
			0x5512721f, 0x2e6b7124, 0x501adde6, 0x9f84cd87,
			0x7a584718, 0x7408da17, 0xbc9f9abc, 0xe94b7d8c,
			0xec7aec3a, 0xdb851dfa, 0x63094366, 0xc464c3d2,
			0xef1c1847, 0x3215d908, 0xdd433b37, 0x24c2ba16,
			0x12a14d43, 0x2a65c451, 0x50940002, 0x133ae4dd,
			0x71dff89e, 0x10314e55, 0x81ac77d6, 0x5f11199b,
			0x043556f1, 0xd7a3c76b, 0x3c11183b, 0x5924a509,
			0xf28fe6ed, 0x97f1fbfa, 0x9ebabf2c, 0x1e153c6e,
			0x86e34570, 0xeae96fb1, 0x860e5e0a, 0x5a3e2ab3,
			0x771fe71c, 0x4e3d06fa, 0x2965dcb9, 0x99e71d0f,
			0x803e89d6, 0x5266c825, 0x2e4cc978, 0x9c10b36a,
			0xc6150eba, 0x94e2ea78, 0xa5fc3c53, 0x1e0a2df4,
			0xf2f74ea7, 0x361d2b3d, 0x1939260f, 0x19c27960,
			0x5223a708, 0xf71312b6, 0xebadfe6e, 0xeac31f66,
			0xe3bc4595, 0xa67bc883, 0xb17f37d1, 0x018cff28,
			0xc332ddef, 0xbe6c5aa5, 0x65582185, 0x68ab9802,
			0xeecea50f, 0xdb2f953b, 0x2aef7dad, 0x5b6e2f84,
			0x1521b628, 0x29076170, 0xecdd4775, 0x619f1510,
			0x13cca830, 0xeb61bd96, 0x0334fe1e, 0xaa0363cf,
			0xb5735c90, 0x4c70a239, 0xd59e9e0b, 0xcbaade14,
			0xeecc86bc, 0x60622ca7, 0x9cab5cab, 0xb2f3846e,
			0x648b1eaf, 0x19bdf0ca, 0xa02369b9, 0x655abb50,
			0x40685a32, 0x3c2ab4b3, 0x319ee9d5, 0xc021b8f7,
			0x9b540b19, 0x875fa099, 0x95f7997e, 0x623d7da8,
			0xf837889a, 0x97e32d77, 0x11ed935f, 0x16681281,
			0x0e358829, 0xc7e61fd6, 0x96dedfa1, 0x7858ba99,
			0x57f584a5, 0x1b227263, 0x9b83c3ff, 0x1ac24696,
			0xcdb30aeb, 0x532e3054, 0x8fd948e4, 0x6dbc3128,
			0x58ebf2ef, 0x34c6ffea, 0xfe28ed61, 0xee7c3c73,
			0x5d4a14d9, 0xe864b7e3, 0x42105d14, 0x203e13e0,
			0x45eee2b6, 0xa3aaabea, 0xdb6c4f15, 0xfacb4fd0,
			0xc742f442, 0xef6abbb5, 0x654f3b1d, 0x41cd2105,
			0xd81e799e, 0x86854dc7, 0xe44b476a, 0x3d816250,
			0xcf62a1f2, 0x5b8d2646, 0xfc8883a0, 0xc1c7b6a3,
			0x7f1524c3, 0x69cb7492, 0x47848a0b, 0x5692b285,
			0x095bbf00, 0xad19489d, 0x1462b174, 0x23820e00,
			0x58428d2a, 0x0c55f5ea, 0x1dadf43e, 0x233f7061,
			0x3372f092, 0x8d937e41, 0xd65fecf1, 0x6c223bdb,
			0x7cde3759, 0xcbee7460, 0x4085f2a7, 0xce77326e,
			0xa6078084, 0x19f8509e, 0xe8efd855, 0x61d99735,
			0xa969a7aa, 0xc50c06c2, 0x5a04abfc, 0x800bcadc,
			0x9e447a2e, 0xc3453484, 0xfdd56705, 0x0e1e9ec9,
			0xdb73dbd3, 0x105588cd, 0x675fda79, 0xe3674340,
			0xc5c43465, 0x713e38d8, 0x3d28f89e, 0xf16dff20,
			0x153e21e7, 0x8fb03d4a, 0xe6e39f2b, 0xdb83adf7
		}, {
			0xe93d5a68, 0x948140f7, 0xf64c261c, 0x94692934,
			0x411520f7, 0x7602d4f7, 0xbcf46b2e, 0xd4a20068,
			0xd4082471, 0x3320f46a, 0x43b7d4b7, 0x500061af,
			0x1e39f62e, 0x97244546, 0x14214f74, 0xbf8b8840,
			0x4d95fc1d, 0x96b591af, 0x70f4ddd3, 0x66a02f45,
			0xbfbc09ec, 0x03bd9785, 0x7fac6dd0, 0x31cb8504,
			0x96eb27b3, 0x55fd3941, 0xda2547e6, 0xabca0a9a,
			0x28507825, 0x530429f4, 0x0a2c86da, 0xe9b66dfb,
			0x68dc1462, 0xd7486900, 0x680ec0a4, 0x27a18dee,
			0x4f3ffea2, 0xe887ad8c, 0xb58ce006, 0x7af4d6b6,
			0xaace1e7c, 0xd3375fec, 0xce78a399, 0x406b2a42,
			0x20fe9e35, 0xd9f385b9, 0xee39d7ab, 0x3b124e8b,
			0x1dc9faf7, 0x4b6d1856, 0x26a36631, 0xeae397b2,
			0x3a6efa74, 0xdd5b4332, 0x6841e7f7, 0xca7820fb,
			0xfb0af54e, 0xd8feb397, 0x454056ac, 0xba489527,
			0x55533a3a, 0x20838d87, 0xfe6ba9b7, 0xd096954b,
			0x55a867bc, 0xa1159a58, 0xcca92963, 0x99e1db33,
			0xa62a4a56, 0x3f3125f9, 0x5ef47e1c, 0x9029317c,
			0xfdf8e802, 0x04272f70, 0x80bb155c, 0x05282ce3,
			0x95c11548, 0xe4c66d22, 0x48c1133f, 0xc70f86dc,
			0x07f9c9ee, 0x41041f0f, 0x404779a4, 0x5d886e17,
			0x325f51eb, 0xd59bc0d1, 0xf2bcc18f, 0x41113564,
			0x257b7834, 0x602a9c60, 0xdff8e8a3, 0x1f636c1b,
			0x0e12b4c2, 0x02e1329e, 0xaf664fd1, 0xcad18115,
			0x6b2395e0, 0x333e92e1, 0x3b240b62, 0xeebeb922,
			0x85b2a20e, 0xe6ba0d99, 0xde720c8c, 0x2da2f728,
			0xd0127845, 0x95b794fd, 0x647d0862, 0xe7ccf5f0,
			0x5449a36f, 0x877d48fa, 0xc39dfd27, 0xf33e8d1e,
			0x0a476341, 0x992eff74, 0x3a6f6eab, 0xf4f8fd37,
			0xa812dc60, 0xa1ebddf8, 0x991be14c, 0xdb6e6b0d,
			0xc67b5510, 0x6d672c37, 0x2765d43b, 0xdcd0e804,
			0xf1290dc7, 0xcc00ffa3, 0xb5390f92, 0x690fed0b,
			0x667b9ffb, 0xcedb7d9c, 0xa091cf0b, 0xd9155ea3,
			0xbb132f88, 0x515bad24, 0x7b9479bf, 0x763bd6eb,
			0x37392eb3, 0xcc115979, 0x8026e297, 0xf42e312d,
			0x6842ada7, 0xc66a2b3b, 0x12754ccc, 0x782ef11c,
			0x6a124237, 0xb79251e7, 0x06a1bbe6, 0x4bfb6350,
			0x1a6b1018, 0x11caedfa, 0x3d25bdd8, 0xe2e1c3c9,
			0x44421659, 0x0a121386, 0xd90cec6e, 0xd5abea2a,
			0x64af674e, 0xda86a85f, 0xbebfe988, 0x64e4c3fe,
			0x9dbc8057, 0xf0f7c086, 0x60787bf8, 0x6003604d,
			0xd1fd8346, 0xf6381fb0, 0x7745ae04, 0xd736fccc,
			0x83426b33, 0xf01eab71, 0xb0804187, 0x3c005e5f,
			0x77a057be, 0xbde8ae24, 0x55464299, 0xbf582e61,
			0x4e58f48f, 0xf2ddfda2, 0xf474ef38, 0x8789bdc2,
			0x5366f9c3, 0xc8b38e74, 0xb475f255, 0x46fcd9b9,
			0x7aeb2661, 0x8b1ddf84, 0x846a0e79, 0x915f95e2,
			0x466e598e, 0x20b45770, 0x8cd55591, 0xc902de4c,
			0xb90bace1, 0xbb8205d0, 0x11a86248, 0x7574a99e,
			0xb77f19b6, 0xe0a9dc09, 0x662d09a1, 0xc4324633,
			0xe85a1f02, 0x09f0be8c, 0x4a99a025, 0x1d6efe10,
			0x1ab93d1d, 0x0ba5a4df, 0xa186f20f, 0x2868f169,
			0xdcb7da83, 0x573906fe, 0xa1e2ce9b, 0x4fcd7f52,
			0x50115e01, 0xa70683fa, 0xa002b5c4, 0x0de6d027,
			0x9af88c27, 0x773f8641, 0xc3604c06, 0x61a806b5,
			0xf0177a28, 0xc0f586e0, 0x006058aa, 0x30dc7d62,
			0x11e69ed7, 0x2338ea63, 0x53c2dd94, 0xc2c21634,
			0xbbcbee56, 0x90bcb6de, 0xebfc7da1, 0xce591d76,
			0x6f05e409, 0x4b7c0188, 0x39720a3d, 0x7c927c24,
			0x86e3725f, 0x724d9db9, 0x1ac15bb4, 0xd39eb8fc,
			0xed545578, 0x08fca5b5, 0xd83d7cd3, 0x4dad0fc4,
			0x1e50ef5e, 0xb161e6f8, 0xa28514d9, 0x6c51133c,
			0x6fd5c7e7, 0x56e14ec4, 0x362abfce, 0xddc6c837,
			0xd79a3234, 0x92638212, 0x670efa8e, 0x406000e0
		}, {
			0x3a39ce37, 0xd3faf5cf, 0xabc27737, 0x5ac52d1b,
			0x5cb0679e, 0x4fa33742, 0xd3822740, 0x99bc9bbe,
			0xd5118e9d, 0xbf0f7315, 0xd62d1c7e, 0xc700c47b,
			0xb78c1b6b, 0x21a19045, 0xb26eb1be, 0x6a366eb4,
			0x5748ab2f, 0xbc946e79, 0xc6a376d2, 0x6549c2c8,
			0x530ff8ee, 0x468dde7d, 0xd5730a1d, 0x4cd04dc6,
			0x2939bbdb, 0xa9ba4650, 0xac9526e8, 0xbe5ee304,
			0xa1fad5f0, 0x6a2d519a, 0x63ef8ce2, 0x9a86ee22,
			0xc089c2b8, 0x43242ef6, 0xa51e03aa, 0x9cf2d0a4,
			0x83c061ba, 0x9be96a4d, 0x8fe51550, 0xba645bd6,
			0x2826a2f9, 0xa73a3ae1, 0x4ba99586, 0xef5562e9,
			0xc72fefd3, 0xf752f7da, 0x3f046f69, 0x77fa0a59,
			0x80e4a915, 0x87b08601, 0x9b09e6ad, 0x3b3ee593,
			0xe990fd5a, 0x9e34d797, 0x2cf0b7d9, 0x022b8b51,
			0x96d5ac3a, 0x017da67d, 0xd1cf3ed6, 0x7c7d2d28,
			0x1f9f25cf, 0xadf2b89b, 0x5ad6b472, 0x5a88f54c,
			0xe029ac71, 0xe019a5e6, 0x47b0acfd, 0xed93fa9b,
			0xe8d3c48d, 0x283b57cc, 0xf8d56629, 0x79132e28,
			0x785f0191, 0xed756055, 0xf7960e44, 0xe3d35e8c,
			0x15056dd4, 0x88f46dba, 0x03a16125, 0x0564f0bd,
			0xc3eb9e15, 0x3c9057a2, 0x97271aec, 0xa93a072a,
			0x1b3f6d9b, 0x1e6321f5, 0xf59c66fb, 0x26dcf319,
			0x7533d928, 0xb155fdf5, 0x03563482, 0x8aba3cbb,
			0x28517711, 0xc20ad9f8, 0xabcc5167, 0xccad925f,
			0x4de81751, 0x3830dc8e, 0x379d5862, 0x9320f991,
			0xea7a90c2, 0xfb3e7bce, 0x5121ce64, 0x774fbe32,
			0xa8b6e37e, 0xc3293d46, 0x48de5369, 0x6413e680,
			0xa2ae0810, 0xdd6db224, 0x69852dfd, 0x09072166,
			0xb39a460a, 0x6445c0dd, 0x586cdecf, 0x1c20c8ae,
			0x5bbef7dd, 0x1b588d40, 0xccd2017f, 0x6bb4e3bb,
			0xdda26a7e, 0x3a59ff45, 0x3e350a44, 0xbcb4cdd5,
			0x72eacea8, 0xfa6484bb, 0x8d6612ae, 0xbf3c6f47,
			0xd29be463, 0x542f5d9e, 0xaec2771b, 0xf64e6370,
			0x740e0d8d, 0xe75b1357, 0xf8721671, 0xaf537d5d,
			0x4040cb08, 0x4eb4e2cc, 0x34d2466a, 0x0115af84,
			0xe1b00428, 0x95983a1d, 0x06b89fb4, 0xce6ea048,
			0x6f3f3b82, 0x3520ab82, 0x011a1d4b, 0x277227f8,
			0x611560b1, 0xe7933fdc, 0xbb3a792b, 0x344525bd,
			0xa08839e1, 0x51ce794b, 0x2f32c9b7, 0xa01fbac9,
			0xe01cc87e, 0xbcc7d1f6, 0xcf0111c3, 0xa1e8aac7,
			0x1a908749, 0xd44fbd9a, 0xd0dadecb, 0xd50ada38,
			0x0339c32a, 0xc6913667, 0x8df9317c, 0xe0b12b4f,
			0xf79e59b7, 0x43f5bb3a, 0xf2d519ff, 0x27d9459c,
			0xbf97222c, 0x15e6fc2a, 0x0f91fc71, 0x9b941525,
			0xfae59361, 0xceb69ceb, 0xc2a86459, 0x12baa8d1,
			0xb6c1075e, 0xe3056a0c, 0x10d25065, 0xcb03a442,
			0xe0ec6e0e, 0x1698db3b, 0x4c98a0be, 0x3278e964,
			0x9f1f9532, 0xe0d392df, 0xd3a0342b, 0x8971f21e,
			0x1b0a7441, 0x4ba3348c, 0xc5be7120, 0xc37632d8,
			0xdf359f8d, 0x9b992f2e, 0xe60b6f47, 0x0fe3f11d,
			0xe54cda54, 0x1edad891, 0xce6279cf, 0xcd3e7e6f,
			0x1618b166, 0xfd2c1d05, 0x848fd2c5, 0xf6fb2299,
			0xf523f357, 0xa6327623, 0x93a83531, 0x56cccd02,
			0xacf08162, 0x5a75ebb5, 0x6e163697, 0x88d273cc,
			0xde966292, 0x81b949d0, 0x4c50901b, 0x71c65614,
			0xe6c6c7bd, 0x327a140a, 0x45e1d006, 0xc3f27b9a,
			0xc9aa53fd, 0x62a80f00, 0xbb25bfe2, 0x35bdd2f6,
			0x71126905, 0xb2040222, 0xb6cbcf7c, 0xcd769c2b,
			0x53113ec0, 0x1640e3d3, 0x38abbd60, 0x2547adf0,
			0xba38209c, 0xf746ce76, 0x77afa1c5, 0x20756060,
			0x85cbfe4e, 0x8ae88dd8, 0x7aaaf9b0, 0x4cf9aa7e,
			0x1948c25c, 0x02fb8a8c, 0x01c36ae4, 0xd6ebe1f9,
			0x90d4f869, 0xa65cdea0, 0x3f09252d, 0xc208e69f,
			0xb74e6132, 0xce77e25b, 0x578fdfe3, 0x3ac372e6
		}
	}, {
		0x243f6a88, 0x85a308d3, 0x13198a2e, 0x03707344,
		0xa4093822, 0x299f31d0, 0x082efa98, 0xec4e6c89,
		0x452821e6, 0x38d01377, 0xbe5466cf, 0x34e90c6c,
		0xc0ac29b7, 0xc97c50dd, 0x3f84d5b5, 0xb5470917,
		0x9216d5d9, 0x8979fb1b
	}
};

static const unsigned char BF_itoa64[64 + 1] =
	"./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

static const unsigned char BF_atoi64[0x60] = {
	64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 0, 1,
	54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 64, 64, 64, 64, 64,
	64, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
	17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 64, 64, 64, 64, 64,
	64, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
	43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 64, 64, 64, 64, 64
};

#define BF_safe_atoi64(dst, src) \
{ \
	tmp = (unsigned char)(src); \
	if ((unsigned int)(tmp -= 0x20) >= 0x60) return -1; \
	tmp = BF_atoi64[tmp]; \
	if (tmp > 63) return -1; \
	(dst) = tmp; \
}

static int BF_decode(BF_word *dst, const char *src, int size)
{
	unsigned char *dptr = (unsigned char *)dst;
	unsigned char *end = dptr + size;
	const unsigned char *sptr = (const unsigned char *)src;
	unsigned int tmp, c1, c2, c3, c4;

	do {
		BF_safe_atoi64(c1, *sptr++);
		BF_safe_atoi64(c2, *sptr++);
		*dptr++ = (c1 << 2) | ((c2 & 0x30) >> 4);
		if (dptr >= end) break;

		BF_safe_atoi64(c3, *sptr++);
		*dptr++ = ((c2 & 0x0F) << 4) | ((c3 & 0x3C) >> 2);
		if (dptr >= end) break;

		BF_safe_atoi64(c4, *sptr++);
		*dptr++ = ((c3 & 0x03) << 6) | c4;
	} while (dptr < end);

	return 0;
}

static void BF_encode(char *dst, const BF_word *src, int size)
{
	const unsigned char *sptr = (const unsigned char *)src;
	const unsigned char *end = sptr + size;
	unsigned char *dptr = (unsigned char *)dst;
	unsigned int c1, c2;

	do {
		c1 = *sptr++;
		*dptr++ = BF_itoa64[c1 >> 2];
		c1 = (c1 & 0x03) << 4;
		if (sptr >= end) {
			*dptr++ = BF_itoa64[c1];
			break;
		}

		c2 = *sptr++;
		c1 |= c2 >> 4;
		*dptr++ = BF_itoa64[c1];
		c1 = (c2 & 0x0f) << 2;
		if (sptr >= end) {
			*dptr++ = BF_itoa64[c1];
			break;
		}

		c2 = *sptr++;
		c1 |= c2 >> 6;
		*dptr++ = BF_itoa64[c1];
		*dptr++ = BF_itoa64[c2 & 0x3f];
	} while (sptr < end);
}

static void BF_swap(BF_word *x, int count)
{
	BF_word tmp;

	if ((union { int i; char c; }){1}.c)
	do {
		tmp = *x;
		tmp = (tmp << 16) | (tmp >> 16);
		*x++ = ((tmp & 0x00FF00FF) << 8) | ((tmp >> 8) & 0x00FF00FF);
	} while (--count);
}

static BF_word BF_round(BF_word L, int N, BF_ctx *ctx)
{
	return ctx->P[N + 1] ^ (ctx->S[3][L & 0xFF]
		+ (ctx->S[2][L >> 8 & 0xFF]
		^ (ctx->S[1][L >> 16 & 0xFF]
		+ ctx->S[0][L >> 24])));
}

#define BF_ROUND(L, R, N) ( R ^= BF_round(L, N, ctx) )

static BF_word BF_encrypt(BF_word L, BF_word *RP, BF_ctx *ctx)
{
	int i;
	BF_word R = *RP;
	L ^= ctx->P[0];
	for (i=0; i<16; i+=2) {
		BF_ROUND(L, R, i);
		BF_ROUND(R, L, i+1);
	}
	*RP = L;
	return R ^ ctx->P[BF_N + 1];
}

#define BF_ENCRYPT ( L = BF_encrypt(L, &R, &data.ctx) )

#define BF_body() \
	L = R = 0; \
	ptr = data.ctx.P; \
	do { \
		ptr += 2; \
		BF_ENCRYPT; \
		*(ptr - 2) = L; \
		*(ptr - 1) = R; \
	} while (ptr < &data.ctx.P[BF_N + 2]); \
\
	ptr = data.ctx.S[0]; \
	do { \
		ptr += 2; \
		BF_ENCRYPT; \
		*(ptr - 2) = L; \
		*(ptr - 1) = R; \
	} while (ptr < &data.ctx.S[3][0xFF]);

static void BF_set_key(const char *key, BF_key expanded, BF_key initial,
    unsigned char flags)
{
	const char *ptr = key;
	unsigned int bug, i, j;
	BF_word safety, sign, diff, tmp[2];

/*
 * There was a sign extension bug in older revisions of this function.  While
 * we would have liked to simply fix the bug and move on, we have to provide
 * a backwards compatibility feature (essentially the bug) for some systems and
 * a safety measure for some others.  The latter is needed because for certain
 * multiple inputs to the buggy algorithm there exist easily found inputs to
 * the correct algorithm that produce the same hash.  Thus, we optionally
 * deviate from the correct algorithm just enough to avoid such collisions.
 * While the bug itself affected the majority of passwords containing
 * characters with the 8th bit set (although only a percentage of those in a
 * collision-producing way), the anti-collision safety measure affects
 * only a subset of passwords containing the '\xff' character (not even all of
 * those passwords, just some of them).  This character is not found in valid
 * UTF-8 sequences and is rarely used in popular 8-bit character encodings.
 * Thus, the safety measure is unlikely to cause much annoyance, and is a
 * reasonable tradeoff to use when authenticating against existing hashes that
 * are not reliably known to have been computed with the correct algorithm.
 *
 * We use an approach that tries to minimize side-channel leaks of password
 * information - that is, we mostly use fixed-cost bitwise operations instead
 * of branches or table lookups.  (One conditional branch based on password
 * length remains.  It is not part of the bug aftermath, though, and is
 * difficult and possibly unreasonable to avoid given the use of C strings by
 * the caller, which results in similar timing leaks anyway.)
 *
 * For actual implementation, we set an array index in the variable "bug"
 * (0 means no bug, 1 means sign extension bug emulation) and a flag in the
 * variable "safety" (bit 16 is set when the safety measure is requested).
 * Valid combinations of settings are:
 *
 * Prefix "$2a$": bug = 0, safety = 0x10000
 * Prefix "$2x$": bug = 1, safety = 0
 * Prefix "$2y$": bug = 0, safety = 0
 */
	bug = (unsigned int)flags & 1;
	safety = ((BF_word)flags & 2) << 15;

	sign = diff = 0;

	for (i = 0; i < BF_N + 2; i++) {
		tmp[0] = tmp[1] = 0;
		for (j = 0; j < 4; j++) {
			tmp[0] <<= 8;
			tmp[0] |= (unsigned char)*ptr; /* correct */
			tmp[1] <<= 8;
			tmp[1] |= (BF_word_signed)(signed char)*ptr; /* bug */
/*
 * Sign extension in the first char has no effect - nothing to overwrite yet,
 * and those extra 24 bits will be fully shifted out of the 32-bit word.  For
 * chars 2, 3, 4 in each four-char block, we set bit 7 of "sign" if sign
 * extension in tmp[1] occurs.  Once this flag is set, it remains set.
 */
			if (j)
				sign |= tmp[1] & 0x80;
			if (!*ptr)
				ptr = key;
			else
				ptr++;
		}
		diff |= tmp[0] ^ tmp[1]; /* Non-zero on any differences */

		expanded[i] = tmp[bug];
		initial[i] = BF_init_state.P[i] ^ tmp[bug];
	}

/*
 * At this point, "diff" is zero iff the correct and buggy algorithms produced
 * exactly the same result.  If so and if "sign" is non-zero, which indicates
 * that there was a non-benign sign extension, this means that we have a
 * collision between the correctly computed hash for this password and a set of
 * passwords that could be supplied to the buggy algorithm.  Our safety measure
 * is meant to protect from such many-buggy to one-correct collisions, by
 * deviating from the correct algorithm in such cases.  Let's check for this.
 */
	diff |= diff >> 16; /* still zero iff exact match */
	diff &= 0xffff; /* ditto */
	diff += 0xffff; /* bit 16 set iff "diff" was non-zero (on non-match) */
	sign <<= 9; /* move the non-benign sign extension flag to bit 16 */
	sign &= ~diff & safety; /* action needed? */

/*
 * If we have determined that we need to deviate from the correct algorithm,
 * flip bit 16 in initial expanded key.  (The choice of 16 is arbitrary, but
 * let's stick to it now.  It came out of the approach we used above, and it's
 * not any worse than any other choice we could make.)
 *
 * It is crucial that we don't do the same to the expanded key used in the main
 * Eksblowfish loop.  By doing it to only one of these two, we deviate from a
 * state that could be directly specified by a password to the buggy algorithm
 * (and to the fully correct one as well, but that's a side-effect).
 */
	initial[0] ^= sign;
}

static char *BF_crypt(const char *key, const char *setting,
	char *output, int size,
	BF_word min)
{
	static const unsigned char flags_by_subtype[26] =
		{2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 4, 0};
	struct {
		BF_ctx ctx;
		BF_key expanded_key;
		union {
			BF_word salt[4];
			BF_word output[6];
		} binary;
	} data;
	BF_word L, R;
	BF_word tmp1, tmp2, tmp3, tmp4;
	BF_word *ptr;
	BF_word count;
	int i;

	if (size < 7 + 22 + 31 + 1) {
		errno = ERANGE;
		return NULL;
	}

	if (setting[0] != '$' ||
	    setting[1] != '2' ||
	    setting[2] < 'a' || setting[2] > 'z' ||
	    !flags_by_subtype[(unsigned int)(unsigned char)setting[2] - 'a'] ||
	    setting[3] != '$' ||
	    setting[4] < '0' || setting[4] > '3' ||
	    setting[5] < '0' || setting[5] > '9' ||
	    (setting[4] == '3' && setting[5] > '1') ||
	    setting[6] != '$') {
		errno = EINVAL;
		return NULL;
	}

	count = (BF_word)1 << ((setting[4] - '0') * 10 + (setting[5] - '0'));
	if (count < min || BF_decode(data.binary.salt, &setting[7], 16)) {
		errno = EINVAL;
		return NULL;
	}
	BF_swap(data.binary.salt, 4);

	BF_set_key(key, data.expanded_key, data.ctx.P,
	    flags_by_subtype[(unsigned int)(unsigned char)setting[2] - 'a']);

	memcpy(data.ctx.S, BF_init_state.S, sizeof(data.ctx.S));

	L = R = 0;
	for (i = 0; i < BF_N + 2; i += 2) {
		L ^= data.binary.salt[i & 2];
		R ^= data.binary.salt[(i & 2) + 1];
		BF_ENCRYPT;
		data.ctx.P[i] = L;
		data.ctx.P[i + 1] = R;
	}

	ptr = data.ctx.S[0];
	do {
		ptr += 4;
		L ^= data.binary.salt[(BF_N + 2) & 3];
		R ^= data.binary.salt[(BF_N + 3) & 3];
		BF_ENCRYPT;
		*(ptr - 4) = L;
		*(ptr - 3) = R;

		L ^= data.binary.salt[(BF_N + 4) & 3];
		R ^= data.binary.salt[(BF_N + 5) & 3];
		BF_ENCRYPT;
		*(ptr - 2) = L;
		*(ptr - 1) = R;
	} while (ptr < &data.ctx.S[3][0xFF]);

	do {
		int done;

		for (i = 0; i < BF_N + 2; i += 2) {
			data.ctx.P[i] ^= data.expanded_key[i];
			data.ctx.P[i + 1] ^= data.expanded_key[i + 1];
		}

		done = 0;
		do {
			BF_body();
			if (done)
				break;
			done = 1;

			tmp1 = data.binary.salt[0];
			tmp2 = data.binary.salt[1];
			tmp3 = data.binary.salt[2];
			tmp4 = data.binary.salt[3];
			for (i = 0; i < BF_N; i += 4) {
				data.ctx.P[i] ^= tmp1;
				data.ctx.P[i + 1] ^= tmp2;
				data.ctx.P[i + 2] ^= tmp3;
				data.ctx.P[i + 3] ^= tmp4;
			}
			data.ctx.P[16] ^= tmp1;
			data.ctx.P[17] ^= tmp2;
		} while (1);
	} while (--count);

	for (i = 0; i < 6; i += 2) {
		L = BF_magic_w[i];
		R = BF_magic_w[i + 1];

		count = 64;
		do {
			BF_ENCRYPT;
		} while (--count);

		data.binary.output[i] = L;
		data.binary.output[i + 1] = R;
	}

	memcpy(output, setting, 7 + 22 - 1);
	output[7 + 22 - 1] = BF_itoa64[(int)
		BF_atoi64[(int)setting[7 + 22 - 1] - 0x20] & 0x30];

/* This has to be bug-compatible with the original implementation, so
 * only encode 23 of the 24 bytes. :-) */
	BF_swap(data.binary.output, 6);
	BF_encode(&output[7 + 22], data.binary.output, 23);
	output[7 + 22 + 31] = '\0';

	return output;
}

int _crypt_output_magic(const char *setting, char *output, int size)
{
	if (size < 3)
		return -1;

	output[0] = '*';
	output[1] = '0';
	output[2] = '\0';

	if (setting[0] == '*' && setting[1] == '0')
		output[1] = '1';

	return 0;
}

/*
 * Please preserve the runtime self-test.  It serves two purposes at once:
 *
 * 1. We really can't afford the risk of producing incompatible hashes e.g.
 * when there's something like gcc bug 26587 again, whereas an application or
 * library integrating this code might not also integrate our external tests or
 * it might not run them after every build.  Even if it does, the miscompile
 * might only occur on the production build, but not on a testing build (such
 * as because of different optimization settings).  It is painful to recover
 * from incorrectly-computed hashes - merely fixing whatever broke is not
 * enough.  Thus, a proactive measure like this self-test is needed.
 *
 * 2. We don't want to leave sensitive data from our actual password hash
 * computation on the stack or in registers.  Previous revisions of the code
 * would do explicit cleanups, but simply running the self-test after hash
 * computation is more reliable.
 *
 * The performance cost of this quick self-test is around 0.6% at the "$2a$08"
 * setting.
 */
char *_crypt_blowfish_rn(const char *key, const char *setting,
	char *output, int size)
{
	const char *test_key = "8b \xd0\xc1\xd2\xcf\xcc\xd8";
	const char *test_setting = "$2a$00$abcdefghijklmnopqrstuu";
	static const char * const test_hash[2] =
		{"VUrPmXD6q/nVSSp7pNDhCR9071IfIRe\0\x55", /* $2x$ */
		"i1D709vfamulimlGcq0qq3UvuUasvEa\0\x55"}; /* $2a$, $2y$ */
	char *retval;
	const char *p;
	int save_errno, ok;
	struct {
		char s[7 + 22 + 1];
		char o[7 + 22 + 31 + 1 + 1 + 1];
	} buf;

/* Hash the supplied password */
	_crypt_output_magic(setting, output, size);
	retval = BF_crypt(key, setting, output, size, 16);
	save_errno = errno;

/*
 * Do a quick self-test.  It is important that we make both calls to BF_crypt()
 * from the same scope such that they likely use the same stack locations,
 * which makes the second call overwrite the first call's sensitive data on the
 * stack and makes it more likely that any alignment related issues would be
 * detected by the self-test.
 */
	memcpy(buf.s, test_setting, sizeof(buf.s));
	if (retval)
		buf.s[2] = setting[2];
	memset(buf.o, 0x55, sizeof(buf.o));
	buf.o[sizeof(buf.o) - 1] = 0;
	p = BF_crypt(test_key, buf.s, buf.o, sizeof(buf.o) - (1 + 1), 1);

	ok = (p == buf.o &&
	    !memcmp(p, buf.s, 7 + 22) &&
	    !memcmp(p + (7 + 22),
	    test_hash[(unsigned int)(unsigned char)buf.s[2] & 1],
	    31 + 1 + 1 + 1));

	{
		const char *k = "\xff\xa3" "34" "\xff\xff\xff\xa3" "345";
		BF_key ae, ai, ye, yi;
		BF_set_key(k, ae, ai, 2); /* $2a$ */
		BF_set_key(k, ye, yi, 4); /* $2y$ */
		ai[0] ^= 0x10000; /* undo the safety (for comparison) */
		ok = ok && ai[0] == 0xdb9c59bc && ye[17] == 0x33343500 &&
		    !memcmp(ae, ye, sizeof(ae)) &&
		    !memcmp(ai, yi, sizeof(ai));
	}

	errno = save_errno;
	if (ok)
		return retval;

/* Should not happen */
	_crypt_output_magic(setting, output, size);
	errno = EINVAL; /* pretend we don't support this hash type */
	return NULL;
}

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-08  2:24 ` Rich Felker
@ 2012-08-08  4:42   ` Solar Designer
  2012-08-08  5:28     ` Rich Felker
  2012-08-08  7:52     ` crypt* files in crypt directory Szabolcs Nagy
  2012-08-09  6:03   ` Rich Felker
  1 sibling, 2 replies; 52+ messages in thread
From: Solar Designer @ 2012-08-08  4:42 UTC (permalink / raw)
  To: musl

On Tue, Aug 07, 2012 at 10:24:21PM -0400, Rich Felker wrote:
> First, the compatibility code for the sign extension bug. How
> important is it to keep this?

Not very important, but nice to keep musl's code revision closer to
upstream.

> Part of my question is that I'm having a
> hard time understanding how it's useful. For passwords that were
> subject to collisions due to this bug, it seems like there's nothing
> that can be done except discarding the old hashes, since they're
> vulnerable.

Not everyone cares about the security risk this much.  Some sysadmins
may prefer not to inconvenience their users.  They would change all
existing hashes to use the $2x$ prefix when they deploy a fixed version
of the code on a system that was known to have the bug before that
point.  That way, all users can continue to log in normally, but newly
changed passwords would be free from the bug.

> So my understanding is that the bug-compatibility code
> just serves to keep the subset of old hashes of 8-bit passwords that
> were _not_ subject to collisions from becoming unusable. I.e. the
> bug-compat code only benefits users who:
> 
> 1. Used passwords containing 8-bit characters
> 2. Happened to be using a password that was not subject to the
> collision bug, and
> 3. Did not regenerate hashes after the bug was fixed.

This could be true, but the sysadmin does not know which hashes were
subject to the collision impact of the bug and which were not (short of
trying to crack the passwords or adding code to analyze them upon login
and substitute different hash prefixes).

So a possible decision to use $2x$ for older hashes may be made based on
other criteria only (security risk vs. user annoyance).

I think very few systems actually made use of the $2x$ prefix.  Maybe
some websites did.

> I'm uncertain whether there's any portion of musl's user base that
> this would be useful to.

Maybe not.

> Second, what can be done to reduce size?

I felt the size was acceptable already.  However, if you must, the
instances of BF_ENCRYPT that are outside of BF_body may be made slower
with little impact on overall speed.  For example, they may be made a
function rather than a macro, and the function would only be inlined in
builds optimized for speed rather than size.

> I think the first step is
> replacing the giant macros (BF_ROUND, BF_ENCRYPT, etc.) with
> functions so that the code doesn't get generated in duplicate unless
> aggressive inlining is enabled by CFLAGS.

I see that you did this - and I think you took it too far.  The code
became twice slower on Pentium 3 when compiling with gcc 3.4.5 (approx.
140 c/s down to 77 c/s).  Adding -finline-functions
-fold-unroll-all-loops regains only a fraction of the speed (112 c/s);
less aggressive loop unrolling results in lower speeds.

The impact on x86-64 is less.  With Ubuntu 12.04's gcc 4.6.3 on FX-8120
I get 490 c/s for the original code, 450 c/s for your code without
inlining/unrolling, and somehow only 430 c/s with -finline-functions
-funroll-loops.

I think you should revert the changes for the instance of BF_ENCRYPT
that is inside of BF_body.

I also think that this code should be optimized for speed even when the
rest of musl is optimized for size.  In this case, better speed may mean
better security, because it lets the sysadmin configure a higher
iteration count for new passwords.

> But are there other things
> that would help? With the data tables being 4k in size, I'm thinking
> a reasonable target size for the whole file might be 7k.
> 
> Actually while writing this, I made some quick changes and seem to
> have already achieved that goal. See the attached file. It's untested,
> so I might have broken something in the process. I'm not sure I'll
> have time to test it well right away, so I'd appreciate comments on
> whether it works as well as any other possible improvements... :-)

It passes wrapper.c's tests once I re-added _crypt_gensalt_blowfish_rn()
to make these files compile together again.

Alexander


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-08  4:42   ` Solar Designer
@ 2012-08-08  5:28     ` Rich Felker
  2012-08-08  6:27       ` Solar Designer
  2012-08-08  7:52     ` crypt* files in crypt directory Szabolcs Nagy
  1 sibling, 1 reply; 52+ messages in thread
From: Rich Felker @ 2012-08-08  5:28 UTC (permalink / raw)
  To: musl

On Wed, Aug 08, 2012 at 08:42:35AM +0400, Solar Designer wrote:
> On Tue, Aug 07, 2012 at 10:24:21PM -0400, Rich Felker wrote:
> > First, the compatibility code for the sign extension bug. How
> > important is it to keep this?
> 
> Not very important, but nice to keep musl's code revision closer to
> upstream.
> [...]
> > I'm uncertain whether there's any portion of musl's user base that
> > this would be useful to.
> 
> Maybe not.

After further reading, the cost is near zero. The compat hack is done
at the same time useful data is being computed. I see no reason to
disable/remove this feature unless the goal is to force people to stop
using old hashes that are likely-vulnerable.

> > Second, what can be done to reduce size?
> 
> I felt the size was acceptable already.  However, if you must, the
> instances of BF_ENCRYPT that are outside of BF_body may be made slower
> with little impact on overall speed.  For example, they may be made a
> function rather than a macro, and the function would only be inlined in
> builds optimized for speed rather than size.
> 
> > I think the first step is
> > replacing the giant macros (BF_ROUND, BF_ENCRYPT, etc.) with
> > functions so that the code doesn't get generated in duplicate unless
> > aggressive inlining is enabled by CFLAGS.
> 
> I see that you did this - and I think you took it too far.  The code
> became twice slower on Pentium 3 when compiling with gcc 3.4.5 (approx.
> 140 c/s down to 77 c/s).  Adding -finline-functions
> -fold-unroll-all-loops regains only a fraction of the speed (112 c/s);
> less aggressive loop unrolling results in lower speeds.

Can you compare with a more modern gcc? 3.x is known to be horrible at
optimizing. It can't even peephole-optimize bswaps.

> The impact on x86-64 is less.  With Ubuntu 12.04's gcc 4.6.3 on FX-8120
> I get 490 c/s for the original code, 450 c/s for your code without
> inlining/unrolling, and somehow only 430 c/s with -finline-functions
> -funroll-loops.

Actually this is a lot closer to what I expected. I think you'll find
similar results on 32-bit with gcc 4.6.3 too. The modern expectation
is that manually unrolling loops will give worse performance than
letting the compiler decide what to do. Certainly there are exceptions
to the expected result, but on average, it's the right decision.

> I think you should revert the changes for the instance of BF_ENCRYPT
> that is inside of BF_body.
> 
> I also think that this code should be optimized for speed even when the
> rest of musl is optimized for size.  In this case, better speed may mean
> better security, because it lets the sysadmin configure a higher
> iteration count for new passwords.

Even if it's twice as slow, that should only be the cost of
incrementing the (logarithmic) iteration count by one). The size
difference between the versions is roughly 50% (7k vs 11.5k with -Os
and roughly 9k vs 13.5k with -O3). Yes one can argue that the
difference doesn't matter for one particular component they especially
care about, but everyone cares about something different, and in the
end the whole library ends up 50% larger if you follow that to its
logical end. I'd much rather stick with letting the compiler do the
bloating-up for performance purposes if the user wants it, so that
the choice is left to them.

Rich


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-08  5:28     ` Rich Felker
@ 2012-08-08  6:27       ` Solar Designer
  2012-08-08  7:03         ` Daniel Cegiełka
  0 siblings, 1 reply; 52+ messages in thread
From: Solar Designer @ 2012-08-08  6:27 UTC (permalink / raw)
  To: musl

On Wed, Aug 08, 2012 at 01:28:44AM -0400, Rich Felker wrote:
> On Wed, Aug 08, 2012 at 08:42:35AM +0400, Solar Designer wrote:
> > I see that you did this - and I think you took it too far.  The code
> > became twice slower on Pentium 3 when compiling with gcc 3.4.5 (approx.
> > 140 c/s down to 77 c/s).  Adding -finline-functions
> > -fold-unroll-all-loops regains only a fraction of the speed (112 c/s);
> > less aggressive loop unrolling results in lower speeds.
> 
> Can you compare with a more modern gcc?

I could and I might do that later, but to me the slowdown with gcc 3 is
enough reason not to make those changes in that specific way.

> > The impact on x86-64 is less.  With Ubuntu 12.04's gcc 4.6.3 on FX-8120
> > I get 490 c/s for the original code, 450 c/s for your code without
> > inlining/unrolling, and somehow only 430 c/s with -finline-functions
> > -funroll-loops.
> 
> Actually this is a lot closer to what I expected. I think you'll find
> similar results on 32-bit with gcc 4.6.3 too. The modern expectation
> is that manually unrolling loops will give worse performance than
> letting the compiler decide what to do. Certainly there are exceptions
> to the expected result, but on average, it's the right decision.

Per the numbers above, here the compiler's unroll is slower not only
than manual unroll, but also than non-unrolled code.

> Even if it's twice as slow, that should only be the cost of
> incrementing the (logarithmic) iteration count by one).

Yes, and I think this is significant.

> The size difference between the versions is roughly 50%

It doesn't have to be.  There are 6 instances of BF_ENCRYPT in
BF_crypt().  I am only asking you to revert to their larger form the two
that are inside BF_body.  The remaining 4 may remain as calls to a
function.  Alternatively, all 6 may be function calls, but then the
function's BF_ENCRYPT should be a fully manually unrolled one.  I am not
sure which of these options will be faster overall for typical settings
(we'd need to benchmark these at $2a$08).

> (7k vs 11.5k with -Os
> and roughly 9k vs 13.5k with -O3). Yes one can argue that the
> difference doesn't matter for one particular component they especially
> care about,

Exactly.

> but everyone cares about something different, and in the
> end the whole library ends up 50% larger if you follow that to its
> logical end.

Makes sense.

> I'd much rather stick with letting the compiler do the
> bloating-up for performance purposes if the user wants it, so that
> the choice is left to them.

Maybe you could support -DFAST_CRYPT or the like.  It could enable
forced inlining and manual unrolls in crypt_blowfish.c.

Alexander


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-08  6:27       ` Solar Designer
@ 2012-08-08  7:03         ` Daniel Cegiełka
  2012-08-08  7:24           ` Solar Designer
  2012-08-08 21:48           ` Rich Felker
  0 siblings, 2 replies; 52+ messages in thread
From: Daniel Cegiełka @ 2012-08-08  7:03 UTC (permalink / raw)
  To: musl

2012/8/8 Solar Designer <solar@openwall.com>:

> Not very important, but nice to keep musl's code revision closer to
> upstream.
> [...]
> > I'm uncertain whether there's any portion of musl's user base that
> > this would be useful to.

If closer to upstream, it would be preferable to use scrypt:
http://www.tarsnap.com/scrypt.html

> Maybe you could support -DFAST_CRYPT or the like.  It could enable
> forced inlining and manual unrolls in crypt_blowfish.c.
>
> Alexander

This can be a very sensible solution.

Daniel


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-08  7:03         ` Daniel Cegiełka
@ 2012-08-08  7:24           ` Solar Designer
  2012-08-08  7:42             ` Daniel Cegiełka
  2012-08-08 21:48           ` Rich Felker
  1 sibling, 1 reply; 52+ messages in thread
From: Solar Designer @ 2012-08-08  7:24 UTC (permalink / raw)
  To: musl

On Wed, Aug 08, 2012 at 09:03:00AM +0200, Daniel Cegie?ka wrote:
> If closer to upstream, it would be preferable to use scrypt:
> http://www.tarsnap.com/scrypt.html

Huh?  As far as I'm aware, there's still no crypt(3) encoding syntax
defined for scrypt (which is intended primarily as a KDF rather than a
password hashing method for servers), so we'd have to devise our own.
How is that "closer to upstream"?

It does make sense to use scrypt for password hashing, but how exactly
that will be done and whether it'll be scrypt or something future
inspired by scrypt and others is not clear yet:

http://www.openwall.com/lists/crypt-dev/2011/05/12/4
http://www.openwall.com/lists/crypt-dev/2012/08/07/1

Alexander


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-08  7:24           ` Solar Designer
@ 2012-08-08  7:42             ` Daniel Cegiełka
  0 siblings, 0 replies; 52+ messages in thread
From: Daniel Cegiełka @ 2012-08-08  7:42 UTC (permalink / raw)
  To: musl

2012/8/8 Solar Designer <solar@openwall.com>:
> On Wed, Aug 08, 2012 at 09:03:00AM +0200, Daniel Cegie?ka wrote:
>> If closer to upstream, it would be preferable to use scrypt:
>> http://www.tarsnap.com/scrypt.html
>
> Huh?  As far as I'm aware, there's still no crypt(3) encoding syntax
> defined for scrypt (which is intended primarily as a KDF rather than a
> password hashing method for servers), so we'd have to devise our own.
> How is that "closer to upstream"?
>
> It does make sense to use scrypt for password hashing, but how exactly
> that will be done and whether it'll be scrypt or something future
> inspired by scrypt and others is not clear yet:
>
> http://www.openwall.com/lists/crypt-dev/2011/05/12/4
> http://www.openwall.com/lists/crypt-dev/2012/08/07/1
>
> Alexander

Right. So blowfish/gensalt going to be a good solution for us (at this time).

Daniel


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-08  4:42   ` Solar Designer
  2012-08-08  5:28     ` Rich Felker
@ 2012-08-08  7:52     ` Szabolcs Nagy
  2012-08-08 13:06       ` Rich Felker
  1 sibling, 1 reply; 52+ messages in thread
From: Szabolcs Nagy @ 2012-08-08  7:52 UTC (permalink / raw)
  To: musl

* Solar Designer <solar@openwall.com> [2012-08-08 08:42:35 +0400]:
> I see that you did this - and I think you took it too far.  The code
> became twice slower on Pentium 3 when compiling with gcc 3.4.5 (approx.
> 140 c/s down to 77 c/s).  Adding -finline-functions
> -fold-unroll-all-loops regains only a fraction of the speed (112 c/s);
> less aggressive loop unrolling results in lower speeds.
> 

i thought slowness is a feature in this case..

at least that was the general agreement about the
size vs speed tradeoff of the des code


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-08  7:52     ` crypt* files in crypt directory Szabolcs Nagy
@ 2012-08-08 13:06       ` Rich Felker
  2012-08-08 14:30         ` orc
                           ` (2 more replies)
  0 siblings, 3 replies; 52+ messages in thread
From: Rich Felker @ 2012-08-08 13:06 UTC (permalink / raw)
  To: musl

On Wed, Aug 08, 2012 at 09:52:34AM +0200, Szabolcs Nagy wrote:
> * Solar Designer <solar@openwall.com> [2012-08-08 08:42:35 +0400]:
> > I see that you did this - and I think you took it too far.  The code
> > became twice slower on Pentium 3 when compiling with gcc 3.4.5 (approx.
> > 140 c/s down to 77 c/s).  Adding -finline-functions
> > -fold-unroll-all-loops regains only a fraction of the speed (112 c/s);
> > less aggressive loop unrolling results in lower speeds.
> > 
> 
> i thought slowness is a feature in this case..
> 
> at least that was the general agreement about the
> size vs speed tradeoff of the des code

Solar's argument is that if you want more slowness, you should just
use a higher iteration count that also affects people trying to crack
your passwords. And being slower at the same count discourages
increasing the count.

Actually this brings up a HUGE DoS vuln in blowfish crypt: with tcb
passwords, a malicious user can put a password with count=31 (it's
logarithmic, so this means 2^31) in their tcb shadow file. This will
cause a root-owned process to eat 100% cpu for hours if not days.
Perform a few simultaneous login attempts and the whole server becomes
unusable.

I don't know how to solve it, but in musl I think we'll have to put a
low limit on count if we're going to support blowfish. Unfortunately I
don't see a good way to make it runtime configurable without
hard-coding additional non-standard config paths, but letting the DoS
bug slip in is not acceptable.

Rich


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-08 13:06       ` Rich Felker
@ 2012-08-08 14:30         ` orc
  2012-08-08 14:53           ` Szabolcs Nagy
  2012-08-08 18:10         ` Rich Felker
  2012-08-09  1:51         ` Solar Designer
  2 siblings, 1 reply; 52+ messages in thread
From: orc @ 2012-08-08 14:30 UTC (permalink / raw)
  To: musl

On Wed, 8 Aug 2012 09:06:23 -0400
Rich Felker <dalias@aerifal.cx> wrote:

> On Wed, Aug 08, 2012 at 09:52:34AM +0200, Szabolcs Nagy wrote:
> > * Solar Designer <solar@openwall.com> [2012-08-08 08:42:35 +0400]:
> > > I see that you did this - and I think you took it too far.  The
> > > code became twice slower on Pentium 3 when compiling with gcc
> > > 3.4.5 (approx. 140 c/s down to 77 c/s).  Adding -finline-functions
> > > -fold-unroll-all-loops regains only a fraction of the speed (112
> > > c/s); less aggressive loop unrolling results in lower speeds.
> > > 
> > 
> > i thought slowness is a feature in this case..
> > 
> > at least that was the general agreement about the
> > size vs speed tradeoff of the des code
> 
> Solar's argument is that if you want more slowness, you should just
> use a higher iteration count that also affects people trying to crack
> your passwords. And being slower at the same count discourages
> increasing the count.
> 
> Actually this brings up a HUGE DoS vuln in blowfish crypt: with tcb
> passwords, a malicious user can put a password with count=31 (it's
> logarithmic, so this means 2^31) in their tcb shadow file. This will
> cause a root-owned process to eat 100% cpu for hours if not days.
> Perform a few simultaneous login attempts and the whole server becomes
> unusable.

That's why glibc does not implements tcb scheme internally? (Not just
because Drepper can say "this is useless")
They have 'rounds=' argument in their crypt() sha256/512 implementation.

> 
> I don't know how to solve it, but in musl I think we'll have to put a
> low limit on count if we're going to support blowfish. Unfortunately I
> don't see a good way to make it runtime configurable without
> hard-coding additional non-standard config paths, but letting the DoS
> bug slip in is not acceptable.

While I experimented with musl-enabled system I implemented another
password hashing algorithm in musl (because musl had only des encryption
with max. 8 password chars) based on skein hash. I also separately
written small code for parsing standalone config file to take
additional parameters like second salt (just for testing, then I leaved
it so any host can have different hashes) and number of rounds. This
file can be accessed by root (obviously) and programs that require user
auth (I set sgid bit on them and 'tcb' group, same group on file with
settings).

Should we support such way to set number of rounds (count) and revert to
hardcoded one if file cannot be read?

> 
> Rich



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-08 14:30         ` orc
@ 2012-08-08 14:53           ` Szabolcs Nagy
  2012-08-08 15:05             ` orc
  0 siblings, 1 reply; 52+ messages in thread
From: Szabolcs Nagy @ 2012-08-08 14:53 UTC (permalink / raw)
  To: musl

* orc <orc@sibserver.ru> [2012-08-08 22:30:01 +0800]:
> While I experimented with musl-enabled system I implemented another
> password hashing algorithm in musl (because musl had only des encryption
> with max. 8 password chars) based on skein hash. I also separately

why not use a known password-based key derivation function?


musl shouldn't do too much crypto experiments, but select
something that's reasonable and already available
(i wouldn't do musl specific crypto api)


(at some point it may be reasonable to provide alternative
apis for things that are broken in posix or just missing
but those should go into a separate lib and we are not there
yet)



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-08 14:53           ` Szabolcs Nagy
@ 2012-08-08 15:05             ` orc
  0 siblings, 0 replies; 52+ messages in thread
From: orc @ 2012-08-08 15:05 UTC (permalink / raw)
  To: musl

On Wed, 8 Aug 2012 16:53:03 +0200
Szabolcs Nagy <nsz@port70.net> wrote:

> * orc <orc@sibserver.ru> [2012-08-08 22:30:01 +0800]:
> > While I experimented with musl-enabled system I implemented another
> > password hashing algorithm in musl (because musl had only des
> > encryption with max. 8 password chars) based on skein hash. I also
> > separately
> 
> why not use a known password-based key derivation function?
> 
> 
> musl shouldn't do too much crypto experiments, but select
> something that's reasonable and already available
> (i wouldn't do musl specific crypto api)
> 
> 
> (at some point it may be reasonable to provide alternative
> apis for things that are broken in posix or just missing
> but those should go into a separate lib and we are not there
> yet)
> 

This was just my experiment, since there were no news about bcrypt,
only plans.


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-08 13:06       ` Rich Felker
  2012-08-08 14:30         ` orc
@ 2012-08-08 18:10         ` Rich Felker
  2012-08-09  1:51         ` Solar Designer
  2 siblings, 0 replies; 52+ messages in thread
From: Rich Felker @ 2012-08-08 18:10 UTC (permalink / raw)
  To: musl

On Wed, Aug 08, 2012 at 09:06:23AM -0400, Rich Felker wrote:
> Actually this brings up a HUGE DoS vuln in blowfish crypt: with tcb
> passwords, a malicious user can put a password with count=31 (it's
> logarithmic, so this means 2^31) in their tcb shadow file. This will
> cause a root-owned process to eat 100% cpu for hours if not days.
> Perform a few simultaneous login attempts and the whole server becomes
> unusable.
> 
> I don't know how to solve it, but in musl I think we'll have to put a
> low limit on count if we're going to support blowfish. Unfortunately I
> don't see a good way to make it runtime configurable without
> hard-coding additional non-standard config paths, but letting the DoS
> bug slip in is not acceptable.

OK, here's my proposed direction for a fix. I definitely don't want to
be _unconditionally_ reading a config file for every crypt() call,
since that would adversely affect even calls with sane iteration
counts and perhaps dominate the run time or at least the cache
thrashing. Instead, I propose we come up with a range of iteration
counts that are "remotely sane" in the sense of taking (for example)
less than 250ms on a very low-end system, and always allowing any
count in this range. Initially we can just forbid higher counts, and
later we can extend the code to probe some sort of configuration when
the default limit is exceeded to see if there's a configuration
extending the limit.

On the other hand, for self-contained static binaries, it might be
desirable the have the limit be configured into the binary. This could
be achieved by making a weak symbol whose address is used as the
limit, and then -Wl,--defsym=__crypt_iter_max=20 or similar could be
used to configure it at link time.

In any case, if the default limit is sufficiently large, I think we
can get away without taking any action on the configurability right
away..

Rich


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-08  7:03         ` Daniel Cegiełka
  2012-08-08  7:24           ` Solar Designer
@ 2012-08-08 21:48           ` Rich Felker
  2012-08-08 23:08             ` Isaac Dunham
  2012-08-09  3:36             ` Solar Designer
  1 sibling, 2 replies; 52+ messages in thread
From: Rich Felker @ 2012-08-08 21:48 UTC (permalink / raw)
  To: musl

On Wed, Aug 08, 2012 at 09:03:00AM +0200, Daniel Cegiełka wrote:
> > Maybe you could support -DFAST_CRYPT or the like.  It could enable
> > forced inlining and manual unrolls in crypt_blowfish.c.
> >
> > Alexander
> 
> This can be a very sensible solution.

Unless there's a really compelling reason to do so, I'd like to avoid
having multiple alternative versions of the same code in a codebase.
It makes it so there's more combinations you have to test to be sure
the code works and doesn't have regressions.

As it stands, the code I posted with the manual unrolling removed
performs _better_ than the manually unrolled code with gcc 4 on x86_64
when optimized for speed, and it's 33% smaller when optimized for
size.

As for being slower on gcc 3, there's already much more
performance-critical code that's significantly slower on musl+gcc3
than on glibc due to gcc3 badness, for example all of the
endianness-swapping functions (byteswap.h and htonl,etc. in netdb.h).
Really the only place where crypt performance is critical is in JtR,
and there you're using your own optimized code internal to JtR, right?
Even if crypt is half-speed on gcc3 without the manual unrolling, that
still only makes a 1-order-of-magnitude (base 2) difference to the
iterations you can use while keeping the same responsiveness/load,
i.e. not nearly enough to make or break somebody's ability to crack
your hashes. (In general, as long as you don't try to iterate this
principle, an attacker who can afford N time (or N cores) can also
afford 2*N time (or 2*N cores).)

Aside from my own feelings on the matter, I'm trying to consider the
impressions it makes on our user base. I've already taken some heat
for replacing the heap sort qsort code in musl with smoothsort,
despite it being a lot faster in a task where performance is generally
important, and the size difference was less than this crypt unrolling.
When someone frustrated with bloat sees hand-unrolled loops, their
first reaction is "eew, this code is bloated". My intent with
modernizing (and fixing the stack usage) of the old DES crypt code was
to save enough space that we could get the new algorithms (blowfish,
md5, sha) integrated without much (or even any, if possible) size
increase versus the old bad DES code. I think this makes a difference
for "selling" the idea of supporting all these algorithms to the
anti-bloat faction of musl's following.

Rich


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-08 21:48           ` Rich Felker
@ 2012-08-08 23:08             ` Isaac Dunham
  2012-08-08 23:24               ` John Spencer
  2012-08-09  3:16               ` Rich Felker
  2012-08-09  3:36             ` Solar Designer
  1 sibling, 2 replies; 52+ messages in thread
From: Isaac Dunham @ 2012-08-08 23:08 UTC (permalink / raw)
  To: musl

On Wed, 8 Aug 2012 17:48:55 -0400
Rich Felker <dalias@aerifal.cx> wrote:

> > > Maybe you could support -DFAST_CRYPT or the like.  It could enable
> > > forced inlining and manual unrolls in crypt_blowfish.c.
..
> Unless there's a really compelling reason to do so, I'd like to avoid
> having multiple alternative versions of the same code in a codebase.
> It makes it so there's more combinations you have to test to be sure
> the code works and doesn't have regressions.
> 
> As it stands, the code I posted with the manual unrolling removed
> performs _better_ than the manually unrolled code with gcc 4 on x86_64
> when optimized for speed, and it's 33% smaller when optimized for
> size.

Per your own tests?
I say this because the test previously mentioned shows the
opposite:
> > The impact on x86-64 is less.  With Ubuntu 12.04's gcc 4.6.3 on
> > FX-8120 I get 490 c/s for the original code, 450 c/s for your code
> > without inlining/unrolling, and somehow only 430 c/s with
> > -finline-functions -funroll-loops.  

that's :
Raw	%speed	version
490 c/s	100%	original
450 c/s	92%	rich's version
430 c/s	88%	rich's version, unrolled by compiler
Higher is faster.
IE, unrolling is actually slowing your version down more.

GCC 3/x86 is getting 80% with rich's version, optimized.

Also, how much "bloat" does solar designer's proposal (unroll inside
BF_body) add?

Isaac Dunham



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-08 23:08             ` Isaac Dunham
@ 2012-08-08 23:24               ` John Spencer
  2012-08-09  1:03                 ` Isaac Dunham
  2012-08-09  3:16               ` Rich Felker
  1 sibling, 1 reply; 52+ messages in thread
From: John Spencer @ 2012-08-08 23:24 UTC (permalink / raw)
  To: musl

On 08/09/2012 01:08 AM, Isaac Dunham wrote:
>
> Also, how much "bloat" does solar designer's proposal (unroll inside
> BF_body) add?
On 08/08/2012 07:28 AM, Rich Felker wrote:

The size difference between the versions is roughly 50% (7k vs 11.5k with -Os
and roughly 9k vs 13.5k with -O3).


> Isaac Dunham
+1 for the size (and clean code) vs speed tradeoff
crypt is a function that very rarely gets called. better keep it 
non-bloated.
if it takes 0.05 instead of 0.02 seconds, nobody will ever care...


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-08 23:24               ` John Spencer
@ 2012-08-09  1:03                 ` Isaac Dunham
  0 siblings, 0 replies; 52+ messages in thread
From: Isaac Dunham @ 2012-08-09  1:03 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1624 bytes --]

On Thu, 09 Aug 2012 01:24:18 +0200
John Spencer <maillist-musl@barfooze.de> wrote:

> On 08/09/2012 01:08 AM, Isaac Dunham wrote:
> >
> > Also, how much "bloat" does solar designer's proposal (unroll inside
> > BF_body) add?
> On 08/08/2012 07:28 AM, Rich Felker wrote:
> 
> The size difference between the versions is roughly 50% (7k vs 11.5k
> with -Os and roughly 9k vs 13.5k with -O3).

That's not for Solar Designer's proposal, it's for Luka's original
patch.
The proposal should be somewhere in between; I have a modified one
that's running ~1.9k more as .o (which I have _not_ made pretty;
it's just for a test!)

But just for comparison, I did a check:
$ size src/misc/crypt_blowfish*o                              
   text    data     bss     dec     hex filename
  17089       8       0   17097    42c9 src/misc/crypt_blowfish.lo
   8738       0       0    8738    2222 src/misc/crypt_blowfish.o
  13840      12       0   13852    361c src/misc/crypt_blowfish_old.lo
  11430       4       0   11434    2caa src/misc/crypt_blowfish_old.o
  17617       8       0   17625    44d9 src/misc/crypt_blowfish_rich.lo
   6845       0       0    6845    1abd src/misc/crypt_blowfish_rich.o
All compiled with same flags, using GCC 4.1 on Lucid. 
(-Os = .o; -O3 -fPIC = .lo).


> +1 for the size (and clean code) vs speed tradeoff
> crypt is a function that very rarely gets called. better keep it 
> non-bloated.
It's interesting that in the shared version (the only case where
crypt() size matters to apps not using it), size seems to be
inverted, with the old manually unrolled version weighing about 4k less.

Isaac Dunham

[-- Attachment #2: crypt_blowfish.c --]
[-- Type: text/x-c++src, Size: 30867 bytes --]

/* Modified by Rich Felker for inclusion in musl libc. Main changes
 * made were reversing the manual inlining/loop unrolling and
 * replacing it with code that can be optimized for size or speed
 * depending on compiler flags. Endianness check was also optimized to
 * be a compile-time constant. */

/*
 * The crypt_blowfish homepage is:
 *
 *	http://www.openwall.com/crypt/
 *
 * This code comes from John the Ripper password cracker, with reentrant
 * and crypt(3) interfaces added, but optimizations specific to password
 * cracking removed.
 *
 * Written by Solar Designer <solar at openwall.com> in 1998-2011.
 * No copyright is claimed, and the software is hereby placed in the public
 * domain.  In case this attempt to disclaim copyright and place the software
 * in the public domain is deemed null and void, then the software is
 * Copyright (c) 1998-2011 Solar Designer and it is hereby released to the
 * general public under the following terms:
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted.
 *
 * There's ABSOLUTELY NO WARRANTY, express or implied.
 *
 * It is my intent that you should be able to use this on your system,
 * as part of a software package, or anywhere else to improve security,
 * ensure compatibility, or for any other purpose.  I would appreciate
 * it if you give credit where it is due and keep your modifications in
 * the public domain as well, but I don't require that in order to let
 * you place this code and any modifications you make under a license
 * of your choice.
 *
 * This implementation is mostly compatible with OpenBSD's bcrypt.c (prefix
 * "$2a$") by Niels Provos <provos at citi.umich.edu>, and uses some of his
 * ideas.  The password hashing algorithm was designed by David Mazieres
 * <dm at lcs.mit.edu>.  For more information on the level of compatibility,
 * please refer to the comments in BF_set_key() below and to the included
 * crypt(3) man page.
 *
 * There's a paper on the algorithm that explains its design decisions:
 *
 *	http://www.usenix.org/events/usenix99/provos.html
 *
 * Some of the tricks in BF_ROUND might be inspired by Eric Young's
 * Blowfish library (I can't be sure if I would think of something if I
 * hadn't seen his code).
 */

#include <string.h>
#include <errno.h>

typedef unsigned int BF_word;
typedef signed int BF_word_signed;

/* Number of Blowfish rounds, this is also hardcoded into a few places */
#define BF_N				16

typedef BF_word BF_key[BF_N + 2];

typedef struct {
	BF_word S[4][0x100];
	BF_key P;
} BF_ctx;

/*
 * Magic IV for 64 Blowfish encryptions that we do at the end.
 * The string is "OrpheanBeholderScryDoubt" on big-endian.
 */
static const BF_word BF_magic_w[6] = {
	0x4F727068, 0x65616E42, 0x65686F6C,
	0x64657253, 0x63727944, 0x6F756274
};

/*
 * P-box and S-box tables initialized with digits of Pi.
 */
static const BF_ctx BF_init_state = {
	{
		{
			0xd1310ba6, 0x98dfb5ac, 0x2ffd72db, 0xd01adfb7,
			0xb8e1afed, 0x6a267e96, 0xba7c9045, 0xf12c7f99,
			0x24a19947, 0xb3916cf7, 0x0801f2e2, 0x858efc16,
			0x636920d8, 0x71574e69, 0xa458fea3, 0xf4933d7e,
			0x0d95748f, 0x728eb658, 0x718bcd58, 0x82154aee,
			0x7b54a41d, 0xc25a59b5, 0x9c30d539, 0x2af26013,
			0xc5d1b023, 0x286085f0, 0xca417918, 0xb8db38ef,
			0x8e79dcb0, 0x603a180e, 0x6c9e0e8b, 0xb01e8a3e,
			0xd71577c1, 0xbd314b27, 0x78af2fda, 0x55605c60,
			0xe65525f3, 0xaa55ab94, 0x57489862, 0x63e81440,
			0x55ca396a, 0x2aab10b6, 0xb4cc5c34, 0x1141e8ce,
			0xa15486af, 0x7c72e993, 0xb3ee1411, 0x636fbc2a,
			0x2ba9c55d, 0x741831f6, 0xce5c3e16, 0x9b87931e,
			0xafd6ba33, 0x6c24cf5c, 0x7a325381, 0x28958677,
			0x3b8f4898, 0x6b4bb9af, 0xc4bfe81b, 0x66282193,
			0x61d809cc, 0xfb21a991, 0x487cac60, 0x5dec8032,
			0xef845d5d, 0xe98575b1, 0xdc262302, 0xeb651b88,
			0x23893e81, 0xd396acc5, 0x0f6d6ff3, 0x83f44239,
			0x2e0b4482, 0xa4842004, 0x69c8f04a, 0x9e1f9b5e,
			0x21c66842, 0xf6e96c9a, 0x670c9c61, 0xabd388f0,
			0x6a51a0d2, 0xd8542f68, 0x960fa728, 0xab5133a3,
			0x6eef0b6c, 0x137a3be4, 0xba3bf050, 0x7efb2a98,
			0xa1f1651d, 0x39af0176, 0x66ca593e, 0x82430e88,
			0x8cee8619, 0x456f9fb4, 0x7d84a5c3, 0x3b8b5ebe,
			0xe06f75d8, 0x85c12073, 0x401a449f, 0x56c16aa6,
			0x4ed3aa62, 0x363f7706, 0x1bfedf72, 0x429b023d,
			0x37d0d724, 0xd00a1248, 0xdb0fead3, 0x49f1c09b,
			0x075372c9, 0x80991b7b, 0x25d479d8, 0xf6e8def7,
			0xe3fe501a, 0xb6794c3b, 0x976ce0bd, 0x04c006ba,
			0xc1a94fb6, 0x409f60c4, 0x5e5c9ec2, 0x196a2463,
			0x68fb6faf, 0x3e6c53b5, 0x1339b2eb, 0x3b52ec6f,
			0x6dfc511f, 0x9b30952c, 0xcc814544, 0xaf5ebd09,
			0xbee3d004, 0xde334afd, 0x660f2807, 0x192e4bb3,
			0xc0cba857, 0x45c8740f, 0xd20b5f39, 0xb9d3fbdb,
			0x5579c0bd, 0x1a60320a, 0xd6a100c6, 0x402c7279,
			0x679f25fe, 0xfb1fa3cc, 0x8ea5e9f8, 0xdb3222f8,
			0x3c7516df, 0xfd616b15, 0x2f501ec8, 0xad0552ab,
			0x323db5fa, 0xfd238760, 0x53317b48, 0x3e00df82,
			0x9e5c57bb, 0xca6f8ca0, 0x1a87562e, 0xdf1769db,
			0xd542a8f6, 0x287effc3, 0xac6732c6, 0x8c4f5573,
			0x695b27b0, 0xbbca58c8, 0xe1ffa35d, 0xb8f011a0,
			0x10fa3d98, 0xfd2183b8, 0x4afcb56c, 0x2dd1d35b,
			0x9a53e479, 0xb6f84565, 0xd28e49bc, 0x4bfb9790,
			0xe1ddf2da, 0xa4cb7e33, 0x62fb1341, 0xcee4c6e8,
			0xef20cada, 0x36774c01, 0xd07e9efe, 0x2bf11fb4,
			0x95dbda4d, 0xae909198, 0xeaad8e71, 0x6b93d5a0,
			0xd08ed1d0, 0xafc725e0, 0x8e3c5b2f, 0x8e7594b7,
			0x8ff6e2fb, 0xf2122b64, 0x8888b812, 0x900df01c,
			0x4fad5ea0, 0x688fc31c, 0xd1cff191, 0xb3a8c1ad,
			0x2f2f2218, 0xbe0e1777, 0xea752dfe, 0x8b021fa1,
			0xe5a0cc0f, 0xb56f74e8, 0x18acf3d6, 0xce89e299,
			0xb4a84fe0, 0xfd13e0b7, 0x7cc43b81, 0xd2ada8d9,
			0x165fa266, 0x80957705, 0x93cc7314, 0x211a1477,
			0xe6ad2065, 0x77b5fa86, 0xc75442f5, 0xfb9d35cf,
			0xebcdaf0c, 0x7b3e89a0, 0xd6411bd3, 0xae1e7e49,
			0x00250e2d, 0x2071b35e, 0x226800bb, 0x57b8e0af,
			0x2464369b, 0xf009b91e, 0x5563911d, 0x59dfa6aa,
			0x78c14389, 0xd95a537f, 0x207d5ba2, 0x02e5b9c5,
			0x83260376, 0x6295cfa9, 0x11c81968, 0x4e734a41,
			0xb3472dca, 0x7b14a94a, 0x1b510052, 0x9a532915,
			0xd60f573f, 0xbc9bc6e4, 0x2b60a476, 0x81e67400,
			0x08ba6fb5, 0x571be91f, 0xf296ec6b, 0x2a0dd915,
			0xb6636521, 0xe7b9f9b6, 0xff34052e, 0xc5855664,
			0x53b02d5d, 0xa99f8fa1, 0x08ba4799, 0x6e85076a
		}, {
			0x4b7a70e9, 0xb5b32944, 0xdb75092e, 0xc4192623,
			0xad6ea6b0, 0x49a7df7d, 0x9cee60b8, 0x8fedb266,
			0xecaa8c71, 0x699a17ff, 0x5664526c, 0xc2b19ee1,
			0x193602a5, 0x75094c29, 0xa0591340, 0xe4183a3e,
			0x3f54989a, 0x5b429d65, 0x6b8fe4d6, 0x99f73fd6,
			0xa1d29c07, 0xefe830f5, 0x4d2d38e6, 0xf0255dc1,
			0x4cdd2086, 0x8470eb26, 0x6382e9c6, 0x021ecc5e,
			0x09686b3f, 0x3ebaefc9, 0x3c971814, 0x6b6a70a1,
			0x687f3584, 0x52a0e286, 0xb79c5305, 0xaa500737,
			0x3e07841c, 0x7fdeae5c, 0x8e7d44ec, 0x5716f2b8,
			0xb03ada37, 0xf0500c0d, 0xf01c1f04, 0x0200b3ff,
			0xae0cf51a, 0x3cb574b2, 0x25837a58, 0xdc0921bd,
			0xd19113f9, 0x7ca92ff6, 0x94324773, 0x22f54701,
			0x3ae5e581, 0x37c2dadc, 0xc8b57634, 0x9af3dda7,
			0xa9446146, 0x0fd0030e, 0xecc8c73e, 0xa4751e41,
			0xe238cd99, 0x3bea0e2f, 0x3280bba1, 0x183eb331,
			0x4e548b38, 0x4f6db908, 0x6f420d03, 0xf60a04bf,
			0x2cb81290, 0x24977c79, 0x5679b072, 0xbcaf89af,
			0xde9a771f, 0xd9930810, 0xb38bae12, 0xdccf3f2e,
			0x5512721f, 0x2e6b7124, 0x501adde6, 0x9f84cd87,
			0x7a584718, 0x7408da17, 0xbc9f9abc, 0xe94b7d8c,
			0xec7aec3a, 0xdb851dfa, 0x63094366, 0xc464c3d2,
			0xef1c1847, 0x3215d908, 0xdd433b37, 0x24c2ba16,
			0x12a14d43, 0x2a65c451, 0x50940002, 0x133ae4dd,
			0x71dff89e, 0x10314e55, 0x81ac77d6, 0x5f11199b,
			0x043556f1, 0xd7a3c76b, 0x3c11183b, 0x5924a509,
			0xf28fe6ed, 0x97f1fbfa, 0x9ebabf2c, 0x1e153c6e,
			0x86e34570, 0xeae96fb1, 0x860e5e0a, 0x5a3e2ab3,
			0x771fe71c, 0x4e3d06fa, 0x2965dcb9, 0x99e71d0f,
			0x803e89d6, 0x5266c825, 0x2e4cc978, 0x9c10b36a,
			0xc6150eba, 0x94e2ea78, 0xa5fc3c53, 0x1e0a2df4,
			0xf2f74ea7, 0x361d2b3d, 0x1939260f, 0x19c27960,
			0x5223a708, 0xf71312b6, 0xebadfe6e, 0xeac31f66,
			0xe3bc4595, 0xa67bc883, 0xb17f37d1, 0x018cff28,
			0xc332ddef, 0xbe6c5aa5, 0x65582185, 0x68ab9802,
			0xeecea50f, 0xdb2f953b, 0x2aef7dad, 0x5b6e2f84,
			0x1521b628, 0x29076170, 0xecdd4775, 0x619f1510,
			0x13cca830, 0xeb61bd96, 0x0334fe1e, 0xaa0363cf,
			0xb5735c90, 0x4c70a239, 0xd59e9e0b, 0xcbaade14,
			0xeecc86bc, 0x60622ca7, 0x9cab5cab, 0xb2f3846e,
			0x648b1eaf, 0x19bdf0ca, 0xa02369b9, 0x655abb50,
			0x40685a32, 0x3c2ab4b3, 0x319ee9d5, 0xc021b8f7,
			0x9b540b19, 0x875fa099, 0x95f7997e, 0x623d7da8,
			0xf837889a, 0x97e32d77, 0x11ed935f, 0x16681281,
			0x0e358829, 0xc7e61fd6, 0x96dedfa1, 0x7858ba99,
			0x57f584a5, 0x1b227263, 0x9b83c3ff, 0x1ac24696,
			0xcdb30aeb, 0x532e3054, 0x8fd948e4, 0x6dbc3128,
			0x58ebf2ef, 0x34c6ffea, 0xfe28ed61, 0xee7c3c73,
			0x5d4a14d9, 0xe864b7e3, 0x42105d14, 0x203e13e0,
			0x45eee2b6, 0xa3aaabea, 0xdb6c4f15, 0xfacb4fd0,
			0xc742f442, 0xef6abbb5, 0x654f3b1d, 0x41cd2105,
			0xd81e799e, 0x86854dc7, 0xe44b476a, 0x3d816250,
			0xcf62a1f2, 0x5b8d2646, 0xfc8883a0, 0xc1c7b6a3,
			0x7f1524c3, 0x69cb7492, 0x47848a0b, 0x5692b285,
			0x095bbf00, 0xad19489d, 0x1462b174, 0x23820e00,
			0x58428d2a, 0x0c55f5ea, 0x1dadf43e, 0x233f7061,
			0x3372f092, 0x8d937e41, 0xd65fecf1, 0x6c223bdb,
			0x7cde3759, 0xcbee7460, 0x4085f2a7, 0xce77326e,
			0xa6078084, 0x19f8509e, 0xe8efd855, 0x61d99735,
			0xa969a7aa, 0xc50c06c2, 0x5a04abfc, 0x800bcadc,
			0x9e447a2e, 0xc3453484, 0xfdd56705, 0x0e1e9ec9,
			0xdb73dbd3, 0x105588cd, 0x675fda79, 0xe3674340,
			0xc5c43465, 0x713e38d8, 0x3d28f89e, 0xf16dff20,
			0x153e21e7, 0x8fb03d4a, 0xe6e39f2b, 0xdb83adf7
		}, {
			0xe93d5a68, 0x948140f7, 0xf64c261c, 0x94692934,
			0x411520f7, 0x7602d4f7, 0xbcf46b2e, 0xd4a20068,
			0xd4082471, 0x3320f46a, 0x43b7d4b7, 0x500061af,
			0x1e39f62e, 0x97244546, 0x14214f74, 0xbf8b8840,
			0x4d95fc1d, 0x96b591af, 0x70f4ddd3, 0x66a02f45,
			0xbfbc09ec, 0x03bd9785, 0x7fac6dd0, 0x31cb8504,
			0x96eb27b3, 0x55fd3941, 0xda2547e6, 0xabca0a9a,
			0x28507825, 0x530429f4, 0x0a2c86da, 0xe9b66dfb,
			0x68dc1462, 0xd7486900, 0x680ec0a4, 0x27a18dee,
			0x4f3ffea2, 0xe887ad8c, 0xb58ce006, 0x7af4d6b6,
			0xaace1e7c, 0xd3375fec, 0xce78a399, 0x406b2a42,
			0x20fe9e35, 0xd9f385b9, 0xee39d7ab, 0x3b124e8b,
			0x1dc9faf7, 0x4b6d1856, 0x26a36631, 0xeae397b2,
			0x3a6efa74, 0xdd5b4332, 0x6841e7f7, 0xca7820fb,
			0xfb0af54e, 0xd8feb397, 0x454056ac, 0xba489527,
			0x55533a3a, 0x20838d87, 0xfe6ba9b7, 0xd096954b,
			0x55a867bc, 0xa1159a58, 0xcca92963, 0x99e1db33,
			0xa62a4a56, 0x3f3125f9, 0x5ef47e1c, 0x9029317c,
			0xfdf8e802, 0x04272f70, 0x80bb155c, 0x05282ce3,
			0x95c11548, 0xe4c66d22, 0x48c1133f, 0xc70f86dc,
			0x07f9c9ee, 0x41041f0f, 0x404779a4, 0x5d886e17,
			0x325f51eb, 0xd59bc0d1, 0xf2bcc18f, 0x41113564,
			0x257b7834, 0x602a9c60, 0xdff8e8a3, 0x1f636c1b,
			0x0e12b4c2, 0x02e1329e, 0xaf664fd1, 0xcad18115,
			0x6b2395e0, 0x333e92e1, 0x3b240b62, 0xeebeb922,
			0x85b2a20e, 0xe6ba0d99, 0xde720c8c, 0x2da2f728,
			0xd0127845, 0x95b794fd, 0x647d0862, 0xe7ccf5f0,
			0x5449a36f, 0x877d48fa, 0xc39dfd27, 0xf33e8d1e,
			0x0a476341, 0x992eff74, 0x3a6f6eab, 0xf4f8fd37,
			0xa812dc60, 0xa1ebddf8, 0x991be14c, 0xdb6e6b0d,
			0xc67b5510, 0x6d672c37, 0x2765d43b, 0xdcd0e804,
			0xf1290dc7, 0xcc00ffa3, 0xb5390f92, 0x690fed0b,
			0x667b9ffb, 0xcedb7d9c, 0xa091cf0b, 0xd9155ea3,
			0xbb132f88, 0x515bad24, 0x7b9479bf, 0x763bd6eb,
			0x37392eb3, 0xcc115979, 0x8026e297, 0xf42e312d,
			0x6842ada7, 0xc66a2b3b, 0x12754ccc, 0x782ef11c,
			0x6a124237, 0xb79251e7, 0x06a1bbe6, 0x4bfb6350,
			0x1a6b1018, 0x11caedfa, 0x3d25bdd8, 0xe2e1c3c9,
			0x44421659, 0x0a121386, 0xd90cec6e, 0xd5abea2a,
			0x64af674e, 0xda86a85f, 0xbebfe988, 0x64e4c3fe,
			0x9dbc8057, 0xf0f7c086, 0x60787bf8, 0x6003604d,
			0xd1fd8346, 0xf6381fb0, 0x7745ae04, 0xd736fccc,
			0x83426b33, 0xf01eab71, 0xb0804187, 0x3c005e5f,
			0x77a057be, 0xbde8ae24, 0x55464299, 0xbf582e61,
			0x4e58f48f, 0xf2ddfda2, 0xf474ef38, 0x8789bdc2,
			0x5366f9c3, 0xc8b38e74, 0xb475f255, 0x46fcd9b9,
			0x7aeb2661, 0x8b1ddf84, 0x846a0e79, 0x915f95e2,
			0x466e598e, 0x20b45770, 0x8cd55591, 0xc902de4c,
			0xb90bace1, 0xbb8205d0, 0x11a86248, 0x7574a99e,
			0xb77f19b6, 0xe0a9dc09, 0x662d09a1, 0xc4324633,
			0xe85a1f02, 0x09f0be8c, 0x4a99a025, 0x1d6efe10,
			0x1ab93d1d, 0x0ba5a4df, 0xa186f20f, 0x2868f169,
			0xdcb7da83, 0x573906fe, 0xa1e2ce9b, 0x4fcd7f52,
			0x50115e01, 0xa70683fa, 0xa002b5c4, 0x0de6d027,
			0x9af88c27, 0x773f8641, 0xc3604c06, 0x61a806b5,
			0xf0177a28, 0xc0f586e0, 0x006058aa, 0x30dc7d62,
			0x11e69ed7, 0x2338ea63, 0x53c2dd94, 0xc2c21634,
			0xbbcbee56, 0x90bcb6de, 0xebfc7da1, 0xce591d76,
			0x6f05e409, 0x4b7c0188, 0x39720a3d, 0x7c927c24,
			0x86e3725f, 0x724d9db9, 0x1ac15bb4, 0xd39eb8fc,
			0xed545578, 0x08fca5b5, 0xd83d7cd3, 0x4dad0fc4,
			0x1e50ef5e, 0xb161e6f8, 0xa28514d9, 0x6c51133c,
			0x6fd5c7e7, 0x56e14ec4, 0x362abfce, 0xddc6c837,
			0xd79a3234, 0x92638212, 0x670efa8e, 0x406000e0
		}, {
			0x3a39ce37, 0xd3faf5cf, 0xabc27737, 0x5ac52d1b,
			0x5cb0679e, 0x4fa33742, 0xd3822740, 0x99bc9bbe,
			0xd5118e9d, 0xbf0f7315, 0xd62d1c7e, 0xc700c47b,
			0xb78c1b6b, 0x21a19045, 0xb26eb1be, 0x6a366eb4,
			0x5748ab2f, 0xbc946e79, 0xc6a376d2, 0x6549c2c8,
			0x530ff8ee, 0x468dde7d, 0xd5730a1d, 0x4cd04dc6,
			0x2939bbdb, 0xa9ba4650, 0xac9526e8, 0xbe5ee304,
			0xa1fad5f0, 0x6a2d519a, 0x63ef8ce2, 0x9a86ee22,
			0xc089c2b8, 0x43242ef6, 0xa51e03aa, 0x9cf2d0a4,
			0x83c061ba, 0x9be96a4d, 0x8fe51550, 0xba645bd6,
			0x2826a2f9, 0xa73a3ae1, 0x4ba99586, 0xef5562e9,
			0xc72fefd3, 0xf752f7da, 0x3f046f69, 0x77fa0a59,
			0x80e4a915, 0x87b08601, 0x9b09e6ad, 0x3b3ee593,
			0xe990fd5a, 0x9e34d797, 0x2cf0b7d9, 0x022b8b51,
			0x96d5ac3a, 0x017da67d, 0xd1cf3ed6, 0x7c7d2d28,
			0x1f9f25cf, 0xadf2b89b, 0x5ad6b472, 0x5a88f54c,
			0xe029ac71, 0xe019a5e6, 0x47b0acfd, 0xed93fa9b,
			0xe8d3c48d, 0x283b57cc, 0xf8d56629, 0x79132e28,
			0x785f0191, 0xed756055, 0xf7960e44, 0xe3d35e8c,
			0x15056dd4, 0x88f46dba, 0x03a16125, 0x0564f0bd,
			0xc3eb9e15, 0x3c9057a2, 0x97271aec, 0xa93a072a,
			0x1b3f6d9b, 0x1e6321f5, 0xf59c66fb, 0x26dcf319,
			0x7533d928, 0xb155fdf5, 0x03563482, 0x8aba3cbb,
			0x28517711, 0xc20ad9f8, 0xabcc5167, 0xccad925f,
			0x4de81751, 0x3830dc8e, 0x379d5862, 0x9320f991,
			0xea7a90c2, 0xfb3e7bce, 0x5121ce64, 0x774fbe32,
			0xa8b6e37e, 0xc3293d46, 0x48de5369, 0x6413e680,
			0xa2ae0810, 0xdd6db224, 0x69852dfd, 0x09072166,
			0xb39a460a, 0x6445c0dd, 0x586cdecf, 0x1c20c8ae,
			0x5bbef7dd, 0x1b588d40, 0xccd2017f, 0x6bb4e3bb,
			0xdda26a7e, 0x3a59ff45, 0x3e350a44, 0xbcb4cdd5,
			0x72eacea8, 0xfa6484bb, 0x8d6612ae, 0xbf3c6f47,
			0xd29be463, 0x542f5d9e, 0xaec2771b, 0xf64e6370,
			0x740e0d8d, 0xe75b1357, 0xf8721671, 0xaf537d5d,
			0x4040cb08, 0x4eb4e2cc, 0x34d2466a, 0x0115af84,
			0xe1b00428, 0x95983a1d, 0x06b89fb4, 0xce6ea048,
			0x6f3f3b82, 0x3520ab82, 0x011a1d4b, 0x277227f8,
			0x611560b1, 0xe7933fdc, 0xbb3a792b, 0x344525bd,
			0xa08839e1, 0x51ce794b, 0x2f32c9b7, 0xa01fbac9,
			0xe01cc87e, 0xbcc7d1f6, 0xcf0111c3, 0xa1e8aac7,
			0x1a908749, 0xd44fbd9a, 0xd0dadecb, 0xd50ada38,
			0x0339c32a, 0xc6913667, 0x8df9317c, 0xe0b12b4f,
			0xf79e59b7, 0x43f5bb3a, 0xf2d519ff, 0x27d9459c,
			0xbf97222c, 0x15e6fc2a, 0x0f91fc71, 0x9b941525,
			0xfae59361, 0xceb69ceb, 0xc2a86459, 0x12baa8d1,
			0xb6c1075e, 0xe3056a0c, 0x10d25065, 0xcb03a442,
			0xe0ec6e0e, 0x1698db3b, 0x4c98a0be, 0x3278e964,
			0x9f1f9532, 0xe0d392df, 0xd3a0342b, 0x8971f21e,
			0x1b0a7441, 0x4ba3348c, 0xc5be7120, 0xc37632d8,
			0xdf359f8d, 0x9b992f2e, 0xe60b6f47, 0x0fe3f11d,
			0xe54cda54, 0x1edad891, 0xce6279cf, 0xcd3e7e6f,
			0x1618b166, 0xfd2c1d05, 0x848fd2c5, 0xf6fb2299,
			0xf523f357, 0xa6327623, 0x93a83531, 0x56cccd02,
			0xacf08162, 0x5a75ebb5, 0x6e163697, 0x88d273cc,
			0xde966292, 0x81b949d0, 0x4c50901b, 0x71c65614,
			0xe6c6c7bd, 0x327a140a, 0x45e1d006, 0xc3f27b9a,
			0xc9aa53fd, 0x62a80f00, 0xbb25bfe2, 0x35bdd2f6,
			0x71126905, 0xb2040222, 0xb6cbcf7c, 0xcd769c2b,
			0x53113ec0, 0x1640e3d3, 0x38abbd60, 0x2547adf0,
			0xba38209c, 0xf746ce76, 0x77afa1c5, 0x20756060,
			0x85cbfe4e, 0x8ae88dd8, 0x7aaaf9b0, 0x4cf9aa7e,
			0x1948c25c, 0x02fb8a8c, 0x01c36ae4, 0xd6ebe1f9,
			0x90d4f869, 0xa65cdea0, 0x3f09252d, 0xc208e69f,
			0xb74e6132, 0xce77e25b, 0x578fdfe3, 0x3ac372e6
		}
	}, {
		0x243f6a88, 0x85a308d3, 0x13198a2e, 0x03707344,
		0xa4093822, 0x299f31d0, 0x082efa98, 0xec4e6c89,
		0x452821e6, 0x38d01377, 0xbe5466cf, 0x34e90c6c,
		0xc0ac29b7, 0xc97c50dd, 0x3f84d5b5, 0xb5470917,
		0x9216d5d9, 0x8979fb1b
	}
};

static const unsigned char BF_itoa64[64 + 1] =
	"./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

static const unsigned char BF_atoi64[0x60] = {
	64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 0, 1,
	54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 64, 64, 64, 64, 64,
	64, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
	17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 64, 64, 64, 64, 64,
	64, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
	43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 64, 64, 64, 64, 64
};

#define BF_safe_atoi64(dst, src) \
{ \
	tmp = (unsigned char)(src); \
	if ((unsigned int)(tmp -= 0x20) >= 0x60) return -1; \
	tmp = BF_atoi64[tmp]; \
	if (tmp > 63) return -1; \
	(dst) = tmp; \
}

static int BF_decode(BF_word *dst, const char *src, int size)
{
	unsigned char *dptr = (unsigned char *)dst;
	unsigned char *end = dptr + size;
	const unsigned char *sptr = (const unsigned char *)src;
	unsigned int tmp, c1, c2, c3, c4;

	do {
		BF_safe_atoi64(c1, *sptr++);
		BF_safe_atoi64(c2, *sptr++);
		*dptr++ = (c1 << 2) | ((c2 & 0x30) >> 4);
		if (dptr >= end) break;

		BF_safe_atoi64(c3, *sptr++);
		*dptr++ = ((c2 & 0x0F) << 4) | ((c3 & 0x3C) >> 2);
		if (dptr >= end) break;

		BF_safe_atoi64(c4, *sptr++);
		*dptr++ = ((c3 & 0x03) << 6) | c4;
	} while (dptr < end);

	return 0;
}

static void BF_encode(char *dst, const BF_word *src, int size)
{
	const unsigned char *sptr = (const unsigned char *)src;
	const unsigned char *end = sptr + size;
	unsigned char *dptr = (unsigned char *)dst;
	unsigned int c1, c2;

	do {
		c1 = *sptr++;
		*dptr++ = BF_itoa64[c1 >> 2];
		c1 = (c1 & 0x03) << 4;
		if (sptr >= end) {
			*dptr++ = BF_itoa64[c1];
			break;
		}

		c2 = *sptr++;
		c1 |= c2 >> 4;
		*dptr++ = BF_itoa64[c1];
		c1 = (c2 & 0x0f) << 2;
		if (sptr >= end) {
			*dptr++ = BF_itoa64[c1];
			break;
		}

		c2 = *sptr++;
		c1 |= c2 >> 6;
		*dptr++ = BF_itoa64[c1];
		*dptr++ = BF_itoa64[c2 & 0x3f];
	} while (sptr < end);
}

static void BF_swap(BF_word *x, int count)
{
	BF_word tmp;

	if ((union { int i; char c; }){1}.c)
	do {
		tmp = *x;
		tmp = (tmp << 16) | (tmp >> 16);
		*x++ = ((tmp & 0x00FF00FF) << 8) | ((tmp >> 8) & 0x00FF00FF);
	} while (--count);
}

static BF_word BF_round(BF_word L, int N, BF_ctx *ctx)
{
	return ctx->P[N + 1] ^ (ctx->S[3][L & 0xFF]
		+ (ctx->S[2][L >> 8 & 0xFF]
		^ (ctx->S[1][L >> 16 & 0xFF]
		+ ctx->S[0][L >> 24])));
}

#define BF_ROUND(L, R, N) ( R ^= BF_round(L, N, ctx) )
#define BF_RND(L, R, N) \
	tmp1 = L & 0xFF; \
	tmp2 = L >> 8; \
	tmp2 &= 0xFF; \
	tmp3 = L >> 16; \
	tmp3 &= 0xFF; \
	tmp4 = L >> 24; \
	tmp1 = data.ctx.S[3][tmp1]; \
	tmp2 = data.ctx.S[2][tmp2]; \
	tmp3 = data.ctx.S[1][tmp3]; \
	tmp3 += data.ctx.S[0][tmp4]; \
	tmp3 ^= tmp2; \
	R ^= data.ctx.P[N + 1]; \
	tmp3 += tmp1; \
	R ^= tmp3;


static BF_word BF_encrypt(BF_word L, BF_word *RP, BF_ctx *ctx)
{
	int i;
	BF_word R = *RP;
	L ^= ctx->P[0];
	for (i=0; i<16; i+=2) {
		BF_ROUND(L, R, i);
		BF_ROUND(R, L, i+1);
	}
	*RP = L;
	return R ^ ctx->P[BF_N + 1];
}

/*
 * Rich used:
 */
#define BF_ENCRYPT ( L = BF_encrypt(L, &R, &data.ctx) )

#define BF_ENCRYPT_ \
	L ^= data.ctx.P[0]; \
	BF_RND(L, R, 0); \
	BF_RND(R, L, 1); \
	BF_RND(L, R, 2); \
	BF_RND(R, L, 3); \
	BF_RND(L, R, 4); \
	BF_RND(R, L, 5); \
	BF_RND(L, R, 6); \
	BF_RND(R, L, 7); \
	BF_RND(L, R, 8); \
	BF_RND(R, L, 9); \
	BF_RND(L, R, 10); \
	BF_RND(R, L, 11); \
	BF_RND(L, R, 12); \
	BF_RND(R, L, 13); \
	BF_RND(L, R, 14); \
	BF_RND(R, L, 15); \
	tmp4 = R; \
	R = L; \
	L = tmp4 ^ data.ctx.P[BF_N + 1];



#define BF_body() \
	L = R = 0; \
	ptr = data.ctx.P; \
	do { \
		ptr += 2; \
		BF_ENCRYPT_; \
		*(ptr - 2) = L; \
		*(ptr - 1) = R; \
	} while (ptr < &data.ctx.P[BF_N + 2]); \
\
	ptr = data.ctx.S[0]; \
	do { \
		ptr += 2; \
		BF_ENCRYPT_; \
		*(ptr - 2) = L; \
		*(ptr - 1) = R; \
	} while (ptr < &data.ctx.S[3][0xFF]);

static void BF_set_key(const char *key, BF_key expanded, BF_key initial,
    unsigned char flags)
{
	const char *ptr = key;
	unsigned int bug, i, j;
	BF_word safety, sign, diff, tmp[2];

/*
 * There was a sign extension bug in older revisions of this function.  While
 * we would have liked to simply fix the bug and move on, we have to provide
 * a backwards compatibility feature (essentially the bug) for some systems and
 * a safety measure for some others.  The latter is needed because for certain
 * multiple inputs to the buggy algorithm there exist easily found inputs to
 * the correct algorithm that produce the same hash.  Thus, we optionally
 * deviate from the correct algorithm just enough to avoid such collisions.
 * While the bug itself affected the majority of passwords containing
 * characters with the 8th bit set (although only a percentage of those in a
 * collision-producing way), the anti-collision safety measure affects
 * only a subset of passwords containing the '\xff' character (not even all of
 * those passwords, just some of them).  This character is not found in valid
 * UTF-8 sequences and is rarely used in popular 8-bit character encodings.
 * Thus, the safety measure is unlikely to cause much annoyance, and is a
 * reasonable tradeoff to use when authenticating against existing hashes that
 * are not reliably known to have been computed with the correct algorithm.
 *
 * We use an approach that tries to minimize side-channel leaks of password
 * information - that is, we mostly use fixed-cost bitwise operations instead
 * of branches or table lookups.  (One conditional branch based on password
 * length remains.  It is not part of the bug aftermath, though, and is
 * difficult and possibly unreasonable to avoid given the use of C strings by
 * the caller, which results in similar timing leaks anyway.)
 *
 * For actual implementation, we set an array index in the variable "bug"
 * (0 means no bug, 1 means sign extension bug emulation) and a flag in the
 * variable "safety" (bit 16 is set when the safety measure is requested).
 * Valid combinations of settings are:
 *
 * Prefix "$2a$": bug = 0, safety = 0x10000
 * Prefix "$2x$": bug = 1, safety = 0
 * Prefix "$2y$": bug = 0, safety = 0
 */
	bug = (unsigned int)flags & 1;
	safety = ((BF_word)flags & 2) << 15;

	sign = diff = 0;

	for (i = 0; i < BF_N + 2; i++) {
		tmp[0] = tmp[1] = 0;
		for (j = 0; j < 4; j++) {
			tmp[0] <<= 8;
			tmp[0] |= (unsigned char)*ptr; /* correct */
			tmp[1] <<= 8;
			tmp[1] |= (BF_word_signed)(signed char)*ptr; /* bug */
/*
 * Sign extension in the first char has no effect - nothing to overwrite yet,
 * and those extra 24 bits will be fully shifted out of the 32-bit word.  For
 * chars 2, 3, 4 in each four-char block, we set bit 7 of "sign" if sign
 * extension in tmp[1] occurs.  Once this flag is set, it remains set.
 */
			if (j)
				sign |= tmp[1] & 0x80;
			if (!*ptr)
				ptr = key;
			else
				ptr++;
		}
		diff |= tmp[0] ^ tmp[1]; /* Non-zero on any differences */

		expanded[i] = tmp[bug];
		initial[i] = BF_init_state.P[i] ^ tmp[bug];
	}

/*
 * At this point, "diff" is zero iff the correct and buggy algorithms produced
 * exactly the same result.  If so and if "sign" is non-zero, which indicates
 * that there was a non-benign sign extension, this means that we have a
 * collision between the correctly computed hash for this password and a set of
 * passwords that could be supplied to the buggy algorithm.  Our safety measure
 * is meant to protect from such many-buggy to one-correct collisions, by
 * deviating from the correct algorithm in such cases.  Let's check for this.
 */
	diff |= diff >> 16; /* still zero iff exact match */
	diff &= 0xffff; /* ditto */
	diff += 0xffff; /* bit 16 set iff "diff" was non-zero (on non-match) */
	sign <<= 9; /* move the non-benign sign extension flag to bit 16 */
	sign &= ~diff & safety; /* action needed? */

/*
 * If we have determined that we need to deviate from the correct algorithm,
 * flip bit 16 in initial expanded key.  (The choice of 16 is arbitrary, but
 * let's stick to it now.  It came out of the approach we used above, and it's
 * not any worse than any other choice we could make.)
 *
 * It is crucial that we don't do the same to the expanded key used in the main
 * Eksblowfish loop.  By doing it to only one of these two, we deviate from a
 * state that could be directly specified by a password to the buggy algorithm
 * (and to the fully correct one as well, but that's a side-effect).
 */
	initial[0] ^= sign;
}

static char *BF_crypt(const char *key, const char *setting,
	char *output, int size,
	BF_word min)
{
	static const unsigned char flags_by_subtype[26] =
		{2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 4, 0};
	struct {
		BF_ctx ctx;
		BF_key expanded_key;
		union {
			BF_word salt[4];
			BF_word output[6];
		} binary;
	} data;
	BF_word L, R;
	BF_word tmp1, tmp2, tmp3, tmp4;
	BF_word *ptr;
	BF_word count;
	int i;

	if (size < 7 + 22 + 31 + 1) {
		errno = ERANGE;
		return NULL;
	}

	if (setting[0] != '$' ||
	    setting[1] != '2' ||
	    setting[2] < 'a' || setting[2] > 'z' ||
	    !flags_by_subtype[(unsigned int)(unsigned char)setting[2] - 'a'] ||
	    setting[3] != '$' ||
	    setting[4] < '0' || setting[4] > '3' ||
	    setting[5] < '0' || setting[5] > '9' ||
	    (setting[4] == '3' && setting[5] > '1') ||
	    setting[6] != '$') {
		errno = EINVAL;
		return NULL;
	}

	count = (BF_word)1 << ((setting[4] - '0') * 10 + (setting[5] - '0'));
	if (count < min || BF_decode(data.binary.salt, &setting[7], 16)) {
		errno = EINVAL;
		return NULL;
	}
	BF_swap(data.binary.salt, 4);

	BF_set_key(key, data.expanded_key, data.ctx.P,
	    flags_by_subtype[(unsigned int)(unsigned char)setting[2] - 'a']);

	memcpy(data.ctx.S, BF_init_state.S, sizeof(data.ctx.S));

	L = R = 0;
	for (i = 0; i < BF_N + 2; i += 2) {
		L ^= data.binary.salt[i & 2];
		R ^= data.binary.salt[(i & 2) + 1];
		BF_ENCRYPT;
		data.ctx.P[i] = L;
		data.ctx.P[i + 1] = R;
	}

	ptr = data.ctx.S[0];
	do {
		ptr += 4;
		L ^= data.binary.salt[(BF_N + 2) & 3];
		R ^= data.binary.salt[(BF_N + 3) & 3];
		BF_ENCRYPT;
		*(ptr - 4) = L;
		*(ptr - 3) = R;

		L ^= data.binary.salt[(BF_N + 4) & 3];
		R ^= data.binary.salt[(BF_N + 5) & 3];
		BF_ENCRYPT;
		*(ptr - 2) = L;
		*(ptr - 1) = R;
	} while (ptr < &data.ctx.S[3][0xFF]);

	do {
		int done;

		for (i = 0; i < BF_N + 2; i += 2) {
			data.ctx.P[i] ^= data.expanded_key[i];
			data.ctx.P[i + 1] ^= data.expanded_key[i + 1];
		}

		done = 0;
		do {
			BF_body();
			if (done)
				break;
			done = 1;

			tmp1 = data.binary.salt[0];
			tmp2 = data.binary.salt[1];
			tmp3 = data.binary.salt[2];
			tmp4 = data.binary.salt[3];
			for (i = 0; i < BF_N; i += 4) {
				data.ctx.P[i] ^= tmp1;
				data.ctx.P[i + 1] ^= tmp2;
				data.ctx.P[i + 2] ^= tmp3;
				data.ctx.P[i + 3] ^= tmp4;
			}
			data.ctx.P[16] ^= tmp1;
			data.ctx.P[17] ^= tmp2;
		} while (1);
	} while (--count);

	for (i = 0; i < 6; i += 2) {
		L = BF_magic_w[i];
		R = BF_magic_w[i + 1];

		count = 64;
		do {
			BF_ENCRYPT;
		} while (--count);

		data.binary.output[i] = L;
		data.binary.output[i + 1] = R;
	}

	memcpy(output, setting, 7 + 22 - 1);
	output[7 + 22 - 1] = BF_itoa64[(int)
		BF_atoi64[(int)setting[7 + 22 - 1] - 0x20] & 0x30];

/* This has to be bug-compatible with the original implementation, so
 * only encode 23 of the 24 bytes. :-) */
	BF_swap(data.binary.output, 6);
	BF_encode(&output[7 + 22], data.binary.output, 23);
	output[7 + 22 + 31] = '\0';

	return output;
}

int _crypt_output_magic(const char *setting, char *output, int size)
{
	if (size < 3)
		return -1;

	output[0] = '*';
	output[1] = '0';
	output[2] = '\0';

	if (setting[0] == '*' && setting[1] == '0')
		output[1] = '1';

	return 0;
}

/*
 * Please preserve the runtime self-test.  It serves two purposes at once:
 *
 * 1. We really can't afford the risk of producing incompatible hashes e.g.
 * when there's something like gcc bug 26587 again, whereas an application or
 * library integrating this code might not also integrate our external tests or
 * it might not run them after every build.  Even if it does, the miscompile
 * might only occur on the production build, but not on a testing build (such
 * as because of different optimization settings).  It is painful to recover
 * from incorrectly-computed hashes - merely fixing whatever broke is not
 * enough.  Thus, a proactive measure like this self-test is needed.
 *
 * 2. We don't want to leave sensitive data from our actual password hash
 * computation on the stack or in registers.  Previous revisions of the code
 * would do explicit cleanups, but simply running the self-test after hash
 * computation is more reliable.
 *
 * The performance cost of this quick self-test is around 0.6% at the "$2a$08"
 * setting.
 */
char *_crypt_blowfish_rn(const char *key, const char *setting,
	char *output, int size)
{
	const char *test_key = "8b \xd0\xc1\xd2\xcf\xcc\xd8";
	const char *test_setting = "$2a$00$abcdefghijklmnopqrstuu";
	static const char * const test_hash[2] =
		{"VUrPmXD6q/nVSSp7pNDhCR9071IfIRe\0\x55", /* $2x$ */
		"i1D709vfamulimlGcq0qq3UvuUasvEa\0\x55"}; /* $2a$, $2y$ */
	char *retval;
	const char *p;
	int save_errno, ok;
	struct {
		char s[7 + 22 + 1];
		char o[7 + 22 + 31 + 1 + 1 + 1];
	} buf;

/* Hash the supplied password */
	_crypt_output_magic(setting, output, size);
	retval = BF_crypt(key, setting, output, size, 16);
	save_errno = errno;

/*
 * Do a quick self-test.  It is important that we make both calls to BF_crypt()
 * from the same scope such that they likely use the same stack locations,
 * which makes the second call overwrite the first call's sensitive data on the
 * stack and makes it more likely that any alignment related issues would be
 * detected by the self-test.
 */
	memcpy(buf.s, test_setting, sizeof(buf.s));
	if (retval)
		buf.s[2] = setting[2];
	memset(buf.o, 0x55, sizeof(buf.o));
	buf.o[sizeof(buf.o) - 1] = 0;
	p = BF_crypt(test_key, buf.s, buf.o, sizeof(buf.o) - (1 + 1), 1);

	ok = (p == buf.o &&
	    !memcmp(p, buf.s, 7 + 22) &&
	    !memcmp(p + (7 + 22),
	    test_hash[(unsigned int)(unsigned char)buf.s[2] & 1],
	    31 + 1 + 1 + 1));

	{
		const char *k = "\xff\xa3" "34" "\xff\xff\xff\xa3" "345";
		BF_key ae, ai, ye, yi;
		BF_set_key(k, ae, ai, 2); /* $2a$ */
		BF_set_key(k, ye, yi, 4); /* $2y$ */
		ai[0] ^= 0x10000; /* undo the safety (for comparison) */
		ok = ok && ai[0] == 0xdb9c59bc && ye[17] == 0x33343500 &&
		    !memcmp(ae, ye, sizeof(ae)) &&
		    !memcmp(ai, yi, sizeof(ai));
	}

	errno = save_errno;
	if (ok)
		return retval;

/* Should not happen */
	_crypt_output_magic(setting, output, size);
	errno = EINVAL; /* pretend we don't support this hash type */
	return NULL;
}

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-08 13:06       ` Rich Felker
  2012-08-08 14:30         ` orc
  2012-08-08 18:10         ` Rich Felker
@ 2012-08-09  1:51         ` Solar Designer
  2012-08-09  3:25           ` Rich Felker
  2012-08-09 23:33           ` Rich Felker
  2 siblings, 2 replies; 52+ messages in thread
From: Solar Designer @ 2012-08-09  1:51 UTC (permalink / raw)
  To: musl

On Wed, Aug 08, 2012 at 09:06:23AM -0400, Rich Felker wrote:
> Actually this brings up a HUGE DoS vuln in blowfish crypt: with tcb
> passwords, a malicious user can put a password with count=31 (it's
> logarithmic, so this means 2^31) in their tcb shadow file.

Yes, but only after having compromised group shadow.  If a user does
compromise group shadow, I'd appreciate learning of that - even if via
being DoS'ed. ;-)

Direct access to tcb shadow files should not be available for other
reasons as well, including password policy enforcement and not assisting
in exploitation of read-any-file vulnerabilities e.g. in web apps into
remote shell access.

If you implement tcb differently, then _that_ should be fixed.  It is
not a musl issue since musl does not set file permissions (nor is it
supposed to).  Whatever you use to create/update the files may need to
be fixed.

> I don't know how to solve it, but in musl I think we'll have to put a
> low limit on count if we're going to support blowfish.

That's not good.

BTW, the extended DES-based hashes that are already supported in musl
allow for variable iteration counts encoded along with hashes too, and
that's the way it should be.

> Unfortunately I
> don't see a good way to make it runtime configurable without
> hard-coding additional non-standard config paths, but letting the DoS
> bug slip in is not acceptable.

I agree that allowing for the DoS without the attacker having to find
and exploit a vulnerability first is not acceptable, but the issue is
not in crypt_blowfish nor in musl per se.

Our tcb suite as released by Openwall does not provide direct access to
those files.  It protects them with group shadow.

Do you have any released (rather than private use) software that makes
tcb shadow files user-writable without requiring any group privileges?

If not, then this issue should not affect musl development in any way,
and you may fix whatever you use privately as a separate step.

Alexander


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-08 23:08             ` Isaac Dunham
  2012-08-08 23:24               ` John Spencer
@ 2012-08-09  3:16               ` Rich Felker
  1 sibling, 0 replies; 52+ messages in thread
From: Rich Felker @ 2012-08-09  3:16 UTC (permalink / raw)
  To: musl

On Wed, Aug 08, 2012 at 04:08:10PM -0700, Isaac Dunham wrote:
> On Wed, 8 Aug 2012 17:48:55 -0400
> Rich Felker <dalias@aerifal.cx> wrote:
> 
> > > > Maybe you could support -DFAST_CRYPT or the like.  It could enable
> > > > forced inlining and manual unrolls in crypt_blowfish.c.
> ...
> > Unless there's a really compelling reason to do so, I'd like to avoid
> > having multiple alternative versions of the same code in a codebase.
> > It makes it so there's more combinations you have to test to be sure
> > the code works and doesn't have regressions.
> > 
> > As it stands, the code I posted with the manual unrolling removed
> > performs _better_ than the manually unrolled code with gcc 4 on x86_64
> > when optimized for speed, and it's 33% smaller when optimized for
> > size.
> 
> Per your own tests?
> I say this because the test previously mentioned shows the
> opposite:

OK, I misread the units as c=cycles and s=?? instead of c=crypts and
s=sec. But of course that doesn't make sense..

> > > The impact on x86-64 is less.  With Ubuntu 12.04's gcc 4.6.3 on
> > > FX-8120 I get 490 c/s for the original code, 450 c/s for your code
> > > without inlining/unrolling, and somehow only 430 c/s with
> > > -finline-functions -funroll-loops.  
> 
> that's :
> Raw	%speed	version
> 490 c/s	100%	original
> 450 c/s	92%	rich's version
> 430 c/s	88%	rich's version, unrolled by compiler
> Higher is faster.
> IE, unrolling is actually slowing your version down more.
> 
> GCC 3/x86 is getting 80% with rich's version, optimized.
> 
> Also, how much "bloat" does solar designer's proposal (unroll inside
> BF_body) add?

Source bloat, even worse than either version. It requires completely
duplicating the whole function (once unrolled, once straight). I have
no idea how much binary bloat it adds; anybody care to try it? My
principal hesitation to even go there is that it (1) makes really ugly
source bloat, and (2) perhaps cuts the binary bloat savings in half or
even worse, making the savings marginal and arguably no longer worth
the cost of the source bloat from having 2 copies of the same code.

Rich


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-09  1:51         ` Solar Designer
@ 2012-08-09  3:25           ` Rich Felker
  2012-08-09  4:04             ` Solar Designer
  2012-08-09 23:33           ` Rich Felker
  1 sibling, 1 reply; 52+ messages in thread
From: Rich Felker @ 2012-08-09  3:25 UTC (permalink / raw)
  To: musl

On Thu, Aug 09, 2012 at 05:51:04AM +0400, Solar Designer wrote:
> On Wed, Aug 08, 2012 at 09:06:23AM -0400, Rich Felker wrote:
> > Actually this brings up a HUGE DoS vuln in blowfish crypt: with tcb
> > passwords, a malicious user can put a password with count=31 (it's
> > logarithmic, so this means 2^31) in their tcb shadow file.
> 
> Yes, but only after having compromised group shadow.  If a user does
> compromise group shadow, I'd appreciate learning of that - even if via
> being DoS'ed. ;-)

OK, so your intent is to require sgid-shadow utilities to update
passwords? How is this significantly better than the old suid-root
way? If someone compromises the utilities, they can change any user's
password (including perhaps root's?) and thus gain access to any
account. I would much rather just let users have rights to update
their own shadow files (and throw away/ignore all the silly policy
stuff in the shadow db; PAM can handle that better anyway) than risk
compromise of other user's (or worse, root's) passwords due to a bug
in the passwd program or similar... I thought the whole point of tcb
was to get us past suid/sgid madness.

> Direct access to tcb shadow files should not be available for other
> reasons as well, including password policy enforcement and not assisting
> in exploitation of read-any-file vulnerabilities e.g. in web apps into
> remote shell access.

Hm? We already protect against symlink issues. This was discussed when
tcb support in musl was first discussed.

> If you implement tcb differently, then _that_ should be fixed.  It is
> not a musl issue since musl does not set file permissions (nor is it
> supposed to).  Whatever you use to create/update the files may need to
> be fixed.

Indeed, this has nothing to do with musl. It's just my preferred
policy of having NO suid programs at all and no sgid programs that
could cause other users' accounts to be compromised if they were
compromised. Of course if you handle it with a daemon rather than suid
(where there's only a single channel of input, not all sorts of ways
you can control the environment the program runs in) then it may be
okay to use a group like this...

> > I don't know how to solve it, but in musl I think we'll have to put a
> > low limit on count if we're going to support blowfish.
> 
> That's not good.
> 
> BTW, the extended DES-based hashes that are already supported in musl
> allow for variable iteration counts encoded along with hashes too, and
> that's the way it should be.

Hmm, then we need to address that issue too. I consider O(2^2^n)
performance when processing potentially-untrusted input a major DoS
vuln. (It's 2^2^n where n is the number of bits in the logarithmic
iteration count).

There's no reason applications should not be able to assume they can
safely call crypt where both the hash/salt/setting and key were
provided by an untrusted party.

Rich


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-08 21:48           ` Rich Felker
  2012-08-08 23:08             ` Isaac Dunham
@ 2012-08-09  3:36             ` Solar Designer
  2012-08-09  7:13               ` orc
  2012-08-09  7:29               ` Solar Designer
  1 sibling, 2 replies; 52+ messages in thread
From: Solar Designer @ 2012-08-09  3:36 UTC (permalink / raw)
  To: musl

On Wed, Aug 08, 2012 at 05:48:55PM -0400, Rich Felker wrote:
> On Wed, Aug 08, 2012 at 09:03:00AM +0200, Daniel Cegie??ka wrote:
> > > Maybe you could support -DFAST_CRYPT or the like.  It could enable
> > > forced inlining and manual unrolls in crypt_blowfish.c.
> > >
> > > Alexander
> > 
> > This can be a very sensible solution.
> 
> Unless there's a really compelling reason to do so, I'd like to avoid
> having multiple alternative versions of the same code in a codebase.
> It makes it so there's more combinations you have to test to be sure
> the code works and doesn't have regressions.

This makes sense.

> As it stands, the code I posted with the manual unrolling removed
> performs _better_ than the manually unrolled code with gcc 4 on x86_64
> when optimized for speed,

For me, it does not.  The opposite is true.

> and it's 33% smaller when optimized for size.

I think we can arrive at almost the same size reduction without such
loss in speed.

BTW, I did not mention it yet, but the asm code that you dropped
probably performed a lot faster on Atom.  (This is based on user reports
for similar code in JtR.)  Anyone with an Atom to test?

> As for being slower on gcc 3, there's already much more
> performance-critical code that's significantly slower on musl+gcc3
> than on glibc due to gcc3 badness, for example all of the
> endianness-swapping functions (byteswap.h and htonl,etc. in netdb.h).

Not everyone will agree that those things you mention are much more
performance-critical.

> Really the only place where crypt performance is critical is in JtR,

As you point out below, it is also relevant when doing authentication.

> and there you're using your own optimized code internal to JtR, right?

Yes.  In fact, JtR's code is even faster since it processes multiple
candidate passwords at once (per logical CPU) for greater
instruction-level parallelism - something we can't reasonably do here.

> Even if crypt is half-speed on gcc3 without the manual unrolling, that
> still only makes a 1-order-of-magnitude (base 2) difference to the
> iterations you can use while keeping the same responsiveness/load,
> i.e. not nearly enough to make or break somebody's ability to crack
> your hashes. (In general, as long as you don't try to iterate this
> principle, an attacker who can afford N time (or N cores) can also
> afford 2*N time (or 2*N cores).)

Yes, but that's not the only relevant threat model.  It is also common
(or even more common) for the attacker to apply a fixed amount of
resources (whatever they happen to have and/or are willing to use) and
get a certain percentage of passwords cracked.  This percentage is
(slightly) affected by the cost of computing each password hash.  For a
(small) subset of the users, this will be the difference between their
password having been cracked vs. not.

Another way to think of it is that an average password's entropy may be,
say, 40 bits with some not-too-strict password policy in place (30 bits
without).  A 1-bit difference in the amount of password stretching we
provide is thus similar to a 2.5% difference in password entropy (or
more without password policy).  Passwords would need to be accordingly
more complicated to compensate for that, or they'd be weaker by this much.

For, say, 1 million users (e.g. total across multiple systems with
musl), that's 1 megabit of human memory - and I think that's a lot.
We can't directly compare this to the code size increase on those systems
(e.g., a few kilobytes per system or even per crypt-using program with
static linking), but clearly human memory is a far more precious
resource than computer memory.

> Aside from my own feelings on the matter, I'm trying to consider the
> impressions it makes on our user base. I've already taken some heat
> for replacing the heap sort qsort code in musl with smoothsort,
> despite it being a lot faster in a task where performance is generally
> important, and the size difference was less than this crypt unrolling.

Oh, I don't recall that.  Was it on this list?

> When someone frustrated with bloat sees hand-unrolled loops, their
> first reaction is "eew, this code is bloated".

I am frustrated with bloat, but hand-unrolled loops in crypto code, in
memcpy(), etc. look just right to me.

There's no compiler option to ask it to unroll only the truly
performance-critical loops.

> My intent with
> modernizing (and fixing the stack usage) of the old DES crypt code was
> to save enough space that we could get the new algorithms (blowfish,
> md5, sha) integrated without much (or even any, if possible) size
> increase versus the old bad DES code. I think this makes a difference
> for "selling" the idea of supporting all these algorithms to the
> anti-bloat faction of musl's following.

Understood.

How about we (almost) achieve your size goal for crypt_blowfish (7 KB
including tables), yet keep the hand-unrolling?  I think this is
possible, albeit maybe with some tricks.

Is use of GNU C extensions (also supported in clang) acceptable?  I am
thinking of a "Labels as Values" trick to share more code.  This would
make the source code less readable, though.

Alexander


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-09  3:25           ` Rich Felker
@ 2012-08-09  4:04             ` Solar Designer
  2012-08-09  5:48               ` Rich Felker
  0 siblings, 1 reply; 52+ messages in thread
From: Solar Designer @ 2012-08-09  4:04 UTC (permalink / raw)
  To: musl

On Wed, Aug 08, 2012 at 11:25:27PM -0400, Rich Felker wrote:
> On Thu, Aug 09, 2012 at 05:51:04AM +0400, Solar Designer wrote:
> > On Wed, Aug 08, 2012 at 09:06:23AM -0400, Rich Felker wrote:
> > > Actually this brings up a HUGE DoS vuln in blowfish crypt: with tcb
> > > passwords, a malicious user can put a password with count=31 (it's
> > > logarithmic, so this means 2^31) in their tcb shadow file.
> > 
> > Yes, but only after having compromised group shadow.  If a user does
> > compromise group shadow, I'd appreciate learning of that - even if via
> > being DoS'ed. ;-)

I think I need to clarify that the above is my own preference and some
sysadmin's preference might be different.  This is a reason why we deal
with whatever DoS attacks may be easily dealt with anyway - those with
non-regular files.

For DoS via high iteration count, I see no good solution other than to
accept this as a possibility for when group shadow is compromised.

> OK, so your intent is to require sgid-shadow utilities to update
> passwords?

Yes.

/usr/bin/passwd and (if enabled) /usr/bin/chage on Owl are SGID shadow.

> How is this significantly better than the old suid-root
> way? If someone compromises the utilities, they can change any user's
> password

No, they can't.  They only gain the ability to perform this DoS attack,
bypass password policy for their own account, and mount certain further
attacks - which would require another vulnerability to succeed.  So we
have an extra layer of security here.

> I would much rather just let users have rights to update
> their own shadow files

This is precisely what happens after group shadow is somehow compromised.
But we don't provide that level of access right away for the reasons I
mentioned.

> (and throw away/ignore all the silly policy
> stuff in the shadow db; PAM can handle that better anyway)

PAM needs a place to store that info anyway, and in typical cases the
fields of the shadow db are just sufficient.

> than risk
> compromise of other user's (or worse, root's) passwords due to a bug
> in the passwd program or similar...

Our tcb scheme was designed precisely to mitigate this risk.

> I thought the whole point of tcb was to get us past suid/sgid madness.

Past SUID madness, yes.  SGID, on the other hand, is sometimes a way to
have multiple layers of security.  We use it for that in Owl - not only
in tcb, but also e.g. for crontab(1).  We always assume that the group
access might be compromised, yet we do introduce this separation layer.

This reminds me of how early versions of Postfix (VMailer at the time)
tried to avoid SUID/SGID altogether, but eventually, after some debates
between DJB and Wietse on Bugtraq, Postfix started using SGID for its
postdrop program - similarly to how qmail does that.  I think this move
was just right.

> > Direct access to tcb shadow files should not be available for other
> > reasons as well, including password policy enforcement and not assisting
> > in exploitation of read-any-file vulnerabilities e.g. in web apps into
> > remote shell access.
> 
> Hm? We already protect against symlink issues. This was discussed when
> tcb support in musl was first discussed.

Right.  That's not what I was referring to here.

Suppose any user can read their own tcb shadow file without any special
privileges.  This means that a CGI program running as them can read this
file too.  If the program has a remote read-any-file vulnerability, the
remote attacker can read the user's password hash, possibly succeed in
cracking it offline, then SSH in (if the service is running, etc.)  On
Owl, we avoid this risk by requiring group shadow for /etc/tcb access.

> > If you implement tcb differently, then _that_ should be fixed.  It is
> > not a musl issue since musl does not set file permissions (nor is it
> > supposed to).  Whatever you use to create/update the files may need to
> > be fixed.
> 
> Indeed, this has nothing to do with musl. It's just my preferred
> policy of having NO suid programs at all and no sgid programs that
> could cause other users' accounts to be compromised if they were
> compromised.

That's our preference (for Owl) too.  I think you misunderstood our use
of group shadow.  It does not "cause other users' accounts to be
compromised" if it is compromised.

> > BTW, the extended DES-based hashes that are already supported in musl
> > allow for variable iteration counts encoded along with hashes too, and
> > that's the way it should be.
> 
> Hmm, then we need to address that issue too.

Just implement tcb right, like we do. ;-)

> There's no reason applications should not be able to assume they can
> safely call crypt where both the hash/salt/setting and key were
> provided by an untrusted party.

There is such a reason for almost two decades now - since BSDi's
introduction of iteration counts in "setting" strings in 1993 or so.
That's the reality by now, and I think we should not be trying to change
it in musl.

As to untrusted keys (passwords/phrases), I agree with you.  Those kinds
of issues should be avoided.  glibc's uses of alloca() in SHA-crypt were
recently patched for that reason.

Alexander


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-09  4:04             ` Solar Designer
@ 2012-08-09  5:48               ` Rich Felker
  2012-08-09 15:52                 ` Solar Designer
  0 siblings, 1 reply; 52+ messages in thread
From: Rich Felker @ 2012-08-09  5:48 UTC (permalink / raw)
  To: musl

On Thu, Aug 09, 2012 at 08:04:32AM +0400, Solar Designer wrote:
> On Wed, Aug 08, 2012 at 11:25:27PM -0400, Rich Felker wrote:
> > On Thu, Aug 09, 2012 at 05:51:04AM +0400, Solar Designer wrote:
> > > On Wed, Aug 08, 2012 at 09:06:23AM -0400, Rich Felker wrote:
> > > > Actually this brings up a HUGE DoS vuln in blowfish crypt: with tcb
> > > > passwords, a malicious user can put a password with count=31 (it's
> > > > logarithmic, so this means 2^31) in their tcb shadow file.
> > > 
> > > Yes, but only after having compromised group shadow.  If a user does
> > > compromise group shadow, I'd appreciate learning of that - even if via
> > > being DoS'ed. ;-)
> 
> I think I need to clarify that the above is my own preference and some
> sysadmin's preference might be different.  This is a reason why we deal
> with whatever DoS attacks may be easily dealt with anyway - those with
> non-regular files.
> 
> For DoS via high iteration count, I see no good solution other than to
> accept this as a possibility for when group shadow is compromised.

Well it's also a possibility if you're using crypt to validate
passwords where both the hash and password are provided by a third
party. I think that's a major problem. I generally frown upon
interfaces where the run time is non-obviously superlinear in the
input size.

I don't see any down-size to limiting the iteration count if the limit
is reasonable. For instance if the limit were such that higher counts
would take more than 1 second on a theoretical 50 GHz variant of a
modern cpu (which is faster than a single core will EVER be able to
get), there's no way they would be practical to use, and there's no
sense in supporting them except to satisfy a fetish for "no arbitrary
limits" even when it conflicts with security and robustness. This
would at least ensure the function can't get stuck running for
hours/days/weeks at a time.

The hard part is putting the limit at some point a good bit lower.

> > OK, so your intent is to require sgid-shadow utilities to update
> > passwords?
> 
> Yes.
> 
> /usr/bin/passwd and (if enabled) /usr/bin/chage on Owl are SGID shadow.

If reading your own password hash also requires sgid-shadow, then
screen is sgid-shadow. Which means any user can easily get full shadow
group perms (since screen is full of vulns if it's running suid/sgid)
and thus you might as well not have had the group protection to begin
with. Same applies to things like xlock.

> That's our preference (for Owl) too.  I think you misunderstood our use
> of group shadow.  It does not "cause other users' accounts to be
> compromised" if it is compromised.

Indeed, I did misunderstand. Your way is not as bad as it could be,
but I'm still not very comfortable with the security implications of
it..

> > There's no reason applications should not be able to assume they can
> > safely call crypt where both the hash/salt/setting and key were
> > provided by an untrusted party.
> 
> There is such a reason for almost two decades now - since BSDi's
> introduction of iteration counts in "setting" strings in 1993 or so.
> That's the reality by now, and I think we should not be trying to change
> it in musl.

Thanks for bringing this up. I think the same limiting logic is
required there. And if there's going to be a way to configurably
override the limit, it should be mostly/entirely shareable.

> As to untrusted keys (passwords/phrases), I agree with you.  Those kinds
> of issues should be avoided.  glibc's uses of alloca() in SHA-crypt were
> recently patched for that reason.

Good analogy. And I think this one likewise needs to be addressed and
fixed.

Rich


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: Re: crypt* files in crypt directory
  2012-08-08  2:24 ` Rich Felker
  2012-08-08  4:42   ` Solar Designer
@ 2012-08-09  6:03   ` Rich Felker
  1 sibling, 0 replies; 52+ messages in thread
From: Rich Felker @ 2012-08-09  6:03 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1268 bytes --]

On Tue, Aug 07, 2012 at 10:24:21PM -0400, Rich Felker wrote:
> Second, what can be done to reduce size? I think the first step is
> replacing the giant macros (BF_ROUND, BF_ENCRYPT, etc.) with
> functions so that the code doesn't get generated in duplicate unless
> aggressive inlining is enabled by CFLAGS. But are there other things
> that would help? With the data tables being 4k in size, I'm thinking
> a reasonable target size for the whole file might be 7k.
> 
> Actually while writing this, I made some quick changes and seem to
> have already achieved that goal. See the attached file. It's untested,
> so I might have broken something in the process. I'm not sure I'll
> have time to test it well right away, so I'd appreciate comments on
> whether it works as well as any other possible improvements... :-)

Here's an updated version with a few other minor optimizations -
particularly, the base64 encode/decode functions only operate on
fixed-size data, so instead of requiring the size as an argument and
having loop exit logic after each advance, they only need to check
after the advances that could lead into the exit condition.

I also made the changes to integrate the code with the caller
(crypt_r/crypt) and I'm including the patch for that.

Rich

[-- Attachment #2: cbf.diff --]
[-- Type: text/plain, Size: 1338 bytes --]

diff --git a/src/misc/crypt.c b/src/misc/crypt.c
index f35e13d..fc5c4ee 100644
--- a/src/misc/crypt.c
+++ b/src/misc/crypt.c
@@ -6,6 +6,6 @@ char *__crypt_r(const char *, const char *, struct crypt_data *);
 char *crypt(const char *key, const char *salt)
 {
 	/* Note: update this size when we add more hash types */
-	static char buf[21];
+	static char buf[64];
 	return __crypt_r(key, salt, (struct crypt_data *)buf);
 }
diff --git a/src/misc/crypt_r.c b/src/misc/crypt_r.c
index d16ab48..f4716d6 100644
--- a/src/misc/crypt_r.c
+++ b/src/misc/crypt_r.c
@@ -5,19 +5,19 @@ struct crypt_data;
 
 char *__crypt_des(const char *, const char *, char *);
 char *__crypt_md5(const char *, const char *, char *);
+char *__crypt_blowfish(const char *, const char *, char *);
 
 char *__crypt_r(const char *key, const char *salt, struct crypt_data *data)
 {
 	char *output = (char *)data;
+	if (salt[0] == '$' && salt[1] && salt[2]) {
 #if 0
-	/* MD5 or SHA? */
-	if (salt[0] == '$' && salt[1] && salt[2] == '$') {
-		if (salt[1] == '1')
-			return __crypt_md5((char *)data, key, salt);
-		else
-			return "x";
-	}
+		if (salt[1] == '1' && salt[2] == '$')
+			return __crypt_md5(key, salt, output);
 #endif
+		if (salt[1] == '2' && salt[3] == '$')
+			return __crypt_blowfish(key, salt, output);
+	}
 	return __crypt_des(key, salt, output);
 }
 

[-- Attachment #3: crypt_blowfish.c --]
[-- Type: text/plain, Size: 29279 bytes --]

/* Modified by Rich Felker for inclusion in musl libc. Main changes
 * made were reversing the manual inlining/loop unrolling and
 * replacing it with code that can be optimized for size or speed
 * depending on compiler flags. Endianness check was also optimized to
 * be a compile-time constant. */

/*
 * The crypt_blowfish homepage is:
 *
 *	http://www.openwall.com/crypt/
 *
 * This code comes from John the Ripper password cracker, with reentrant
 * and crypt(3) interfaces added, but optimizations specific to password
 * cracking removed.
 *
 * Written by Solar Designer <solar at openwall.com> in 1998-2011.
 * No copyright is claimed, and the software is hereby placed in the public
 * domain.  In case this attempt to disclaim copyright and place the software
 * in the public domain is deemed null and void, then the software is
 * Copyright (c) 1998-2011 Solar Designer and it is hereby released to the
 * general public under the following terms:
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted.
 *
 * There's ABSOLUTELY NO WARRANTY, express or implied.
 *
 * It is my intent that you should be able to use this on your system,
 * as part of a software package, or anywhere else to improve security,
 * ensure compatibility, or for any other purpose.  I would appreciate
 * it if you give credit where it is due and keep your modifications in
 * the public domain as well, but I don't require that in order to let
 * you place this code and any modifications you make under a license
 * of your choice.
 *
 * This implementation is mostly compatible with OpenBSD's bcrypt.c (prefix
 * "$2a$") by Niels Provos <provos at citi.umich.edu>, and uses some of his
 * ideas.  The password hashing algorithm was designed by David Mazieres
 * <dm at lcs.mit.edu>.  For more information on the level of compatibility,
 * please refer to the comments in BF_set_key() below and to the included
 * crypt(3) man page.
 *
 * There's a paper on the algorithm that explains its design decisions:
 *
 *	http://www.usenix.org/events/usenix99/provos.html
 *
 * Some of the tricks in BF_ROUND might be inspired by Eric Young's
 * Blowfish library (I can't be sure if I would think of something if I
 * hadn't seen his code).
 */

#include <string.h>
#include <errno.h>

typedef unsigned int BF_word;
typedef signed int BF_word_signed;

/* Number of Blowfish rounds, this is also hardcoded into a few places */
#define BF_N				16

typedef BF_word BF_key[BF_N + 2];

typedef struct {
	BF_word S[4][0x100];
	BF_key P;
} BF_ctx;

/*
 * Magic IV for 64 Blowfish encryptions that we do at the end.
 * The string is "OrpheanBeholderScryDoubt" on big-endian.
 */
static const BF_word BF_magic_w[6] = {
	0x4F727068, 0x65616E42, 0x65686F6C,
	0x64657253, 0x63727944, 0x6F756274
};

/*
 * P-box and S-box tables initialized with digits of Pi.
 */
static const BF_ctx BF_init_state = {
	{
		{
			0xd1310ba6, 0x98dfb5ac, 0x2ffd72db, 0xd01adfb7,
			0xb8e1afed, 0x6a267e96, 0xba7c9045, 0xf12c7f99,
			0x24a19947, 0xb3916cf7, 0x0801f2e2, 0x858efc16,
			0x636920d8, 0x71574e69, 0xa458fea3, 0xf4933d7e,
			0x0d95748f, 0x728eb658, 0x718bcd58, 0x82154aee,
			0x7b54a41d, 0xc25a59b5, 0x9c30d539, 0x2af26013,
			0xc5d1b023, 0x286085f0, 0xca417918, 0xb8db38ef,
			0x8e79dcb0, 0x603a180e, 0x6c9e0e8b, 0xb01e8a3e,
			0xd71577c1, 0xbd314b27, 0x78af2fda, 0x55605c60,
			0xe65525f3, 0xaa55ab94, 0x57489862, 0x63e81440,
			0x55ca396a, 0x2aab10b6, 0xb4cc5c34, 0x1141e8ce,
			0xa15486af, 0x7c72e993, 0xb3ee1411, 0x636fbc2a,
			0x2ba9c55d, 0x741831f6, 0xce5c3e16, 0x9b87931e,
			0xafd6ba33, 0x6c24cf5c, 0x7a325381, 0x28958677,
			0x3b8f4898, 0x6b4bb9af, 0xc4bfe81b, 0x66282193,
			0x61d809cc, 0xfb21a991, 0x487cac60, 0x5dec8032,
			0xef845d5d, 0xe98575b1, 0xdc262302, 0xeb651b88,
			0x23893e81, 0xd396acc5, 0x0f6d6ff3, 0x83f44239,
			0x2e0b4482, 0xa4842004, 0x69c8f04a, 0x9e1f9b5e,
			0x21c66842, 0xf6e96c9a, 0x670c9c61, 0xabd388f0,
			0x6a51a0d2, 0xd8542f68, 0x960fa728, 0xab5133a3,
			0x6eef0b6c, 0x137a3be4, 0xba3bf050, 0x7efb2a98,
			0xa1f1651d, 0x39af0176, 0x66ca593e, 0x82430e88,
			0x8cee8619, 0x456f9fb4, 0x7d84a5c3, 0x3b8b5ebe,
			0xe06f75d8, 0x85c12073, 0x401a449f, 0x56c16aa6,
			0x4ed3aa62, 0x363f7706, 0x1bfedf72, 0x429b023d,
			0x37d0d724, 0xd00a1248, 0xdb0fead3, 0x49f1c09b,
			0x075372c9, 0x80991b7b, 0x25d479d8, 0xf6e8def7,
			0xe3fe501a, 0xb6794c3b, 0x976ce0bd, 0x04c006ba,
			0xc1a94fb6, 0x409f60c4, 0x5e5c9ec2, 0x196a2463,
			0x68fb6faf, 0x3e6c53b5, 0x1339b2eb, 0x3b52ec6f,
			0x6dfc511f, 0x9b30952c, 0xcc814544, 0xaf5ebd09,
			0xbee3d004, 0xde334afd, 0x660f2807, 0x192e4bb3,
			0xc0cba857, 0x45c8740f, 0xd20b5f39, 0xb9d3fbdb,
			0x5579c0bd, 0x1a60320a, 0xd6a100c6, 0x402c7279,
			0x679f25fe, 0xfb1fa3cc, 0x8ea5e9f8, 0xdb3222f8,
			0x3c7516df, 0xfd616b15, 0x2f501ec8, 0xad0552ab,
			0x323db5fa, 0xfd238760, 0x53317b48, 0x3e00df82,
			0x9e5c57bb, 0xca6f8ca0, 0x1a87562e, 0xdf1769db,
			0xd542a8f6, 0x287effc3, 0xac6732c6, 0x8c4f5573,
			0x695b27b0, 0xbbca58c8, 0xe1ffa35d, 0xb8f011a0,
			0x10fa3d98, 0xfd2183b8, 0x4afcb56c, 0x2dd1d35b,
			0x9a53e479, 0xb6f84565, 0xd28e49bc, 0x4bfb9790,
			0xe1ddf2da, 0xa4cb7e33, 0x62fb1341, 0xcee4c6e8,
			0xef20cada, 0x36774c01, 0xd07e9efe, 0x2bf11fb4,
			0x95dbda4d, 0xae909198, 0xeaad8e71, 0x6b93d5a0,
			0xd08ed1d0, 0xafc725e0, 0x8e3c5b2f, 0x8e7594b7,
			0x8ff6e2fb, 0xf2122b64, 0x8888b812, 0x900df01c,
			0x4fad5ea0, 0x688fc31c, 0xd1cff191, 0xb3a8c1ad,
			0x2f2f2218, 0xbe0e1777, 0xea752dfe, 0x8b021fa1,
			0xe5a0cc0f, 0xb56f74e8, 0x18acf3d6, 0xce89e299,
			0xb4a84fe0, 0xfd13e0b7, 0x7cc43b81, 0xd2ada8d9,
			0x165fa266, 0x80957705, 0x93cc7314, 0x211a1477,
			0xe6ad2065, 0x77b5fa86, 0xc75442f5, 0xfb9d35cf,
			0xebcdaf0c, 0x7b3e89a0, 0xd6411bd3, 0xae1e7e49,
			0x00250e2d, 0x2071b35e, 0x226800bb, 0x57b8e0af,
			0x2464369b, 0xf009b91e, 0x5563911d, 0x59dfa6aa,
			0x78c14389, 0xd95a537f, 0x207d5ba2, 0x02e5b9c5,
			0x83260376, 0x6295cfa9, 0x11c81968, 0x4e734a41,
			0xb3472dca, 0x7b14a94a, 0x1b510052, 0x9a532915,
			0xd60f573f, 0xbc9bc6e4, 0x2b60a476, 0x81e67400,
			0x08ba6fb5, 0x571be91f, 0xf296ec6b, 0x2a0dd915,
			0xb6636521, 0xe7b9f9b6, 0xff34052e, 0xc5855664,
			0x53b02d5d, 0xa99f8fa1, 0x08ba4799, 0x6e85076a
		}, {
			0x4b7a70e9, 0xb5b32944, 0xdb75092e, 0xc4192623,
			0xad6ea6b0, 0x49a7df7d, 0x9cee60b8, 0x8fedb266,
			0xecaa8c71, 0x699a17ff, 0x5664526c, 0xc2b19ee1,
			0x193602a5, 0x75094c29, 0xa0591340, 0xe4183a3e,
			0x3f54989a, 0x5b429d65, 0x6b8fe4d6, 0x99f73fd6,
			0xa1d29c07, 0xefe830f5, 0x4d2d38e6, 0xf0255dc1,
			0x4cdd2086, 0x8470eb26, 0x6382e9c6, 0x021ecc5e,
			0x09686b3f, 0x3ebaefc9, 0x3c971814, 0x6b6a70a1,
			0x687f3584, 0x52a0e286, 0xb79c5305, 0xaa500737,
			0x3e07841c, 0x7fdeae5c, 0x8e7d44ec, 0x5716f2b8,
			0xb03ada37, 0xf0500c0d, 0xf01c1f04, 0x0200b3ff,
			0xae0cf51a, 0x3cb574b2, 0x25837a58, 0xdc0921bd,
			0xd19113f9, 0x7ca92ff6, 0x94324773, 0x22f54701,
			0x3ae5e581, 0x37c2dadc, 0xc8b57634, 0x9af3dda7,
			0xa9446146, 0x0fd0030e, 0xecc8c73e, 0xa4751e41,
			0xe238cd99, 0x3bea0e2f, 0x3280bba1, 0x183eb331,
			0x4e548b38, 0x4f6db908, 0x6f420d03, 0xf60a04bf,
			0x2cb81290, 0x24977c79, 0x5679b072, 0xbcaf89af,
			0xde9a771f, 0xd9930810, 0xb38bae12, 0xdccf3f2e,
			0x5512721f, 0x2e6b7124, 0x501adde6, 0x9f84cd87,
			0x7a584718, 0x7408da17, 0xbc9f9abc, 0xe94b7d8c,
			0xec7aec3a, 0xdb851dfa, 0x63094366, 0xc464c3d2,
			0xef1c1847, 0x3215d908, 0xdd433b37, 0x24c2ba16,
			0x12a14d43, 0x2a65c451, 0x50940002, 0x133ae4dd,
			0x71dff89e, 0x10314e55, 0x81ac77d6, 0x5f11199b,
			0x043556f1, 0xd7a3c76b, 0x3c11183b, 0x5924a509,
			0xf28fe6ed, 0x97f1fbfa, 0x9ebabf2c, 0x1e153c6e,
			0x86e34570, 0xeae96fb1, 0x860e5e0a, 0x5a3e2ab3,
			0x771fe71c, 0x4e3d06fa, 0x2965dcb9, 0x99e71d0f,
			0x803e89d6, 0x5266c825, 0x2e4cc978, 0x9c10b36a,
			0xc6150eba, 0x94e2ea78, 0xa5fc3c53, 0x1e0a2df4,
			0xf2f74ea7, 0x361d2b3d, 0x1939260f, 0x19c27960,
			0x5223a708, 0xf71312b6, 0xebadfe6e, 0xeac31f66,
			0xe3bc4595, 0xa67bc883, 0xb17f37d1, 0x018cff28,
			0xc332ddef, 0xbe6c5aa5, 0x65582185, 0x68ab9802,
			0xeecea50f, 0xdb2f953b, 0x2aef7dad, 0x5b6e2f84,
			0x1521b628, 0x29076170, 0xecdd4775, 0x619f1510,
			0x13cca830, 0xeb61bd96, 0x0334fe1e, 0xaa0363cf,
			0xb5735c90, 0x4c70a239, 0xd59e9e0b, 0xcbaade14,
			0xeecc86bc, 0x60622ca7, 0x9cab5cab, 0xb2f3846e,
			0x648b1eaf, 0x19bdf0ca, 0xa02369b9, 0x655abb50,
			0x40685a32, 0x3c2ab4b3, 0x319ee9d5, 0xc021b8f7,
			0x9b540b19, 0x875fa099, 0x95f7997e, 0x623d7da8,
			0xf837889a, 0x97e32d77, 0x11ed935f, 0x16681281,
			0x0e358829, 0xc7e61fd6, 0x96dedfa1, 0x7858ba99,
			0x57f584a5, 0x1b227263, 0x9b83c3ff, 0x1ac24696,
			0xcdb30aeb, 0x532e3054, 0x8fd948e4, 0x6dbc3128,
			0x58ebf2ef, 0x34c6ffea, 0xfe28ed61, 0xee7c3c73,
			0x5d4a14d9, 0xe864b7e3, 0x42105d14, 0x203e13e0,
			0x45eee2b6, 0xa3aaabea, 0xdb6c4f15, 0xfacb4fd0,
			0xc742f442, 0xef6abbb5, 0x654f3b1d, 0x41cd2105,
			0xd81e799e, 0x86854dc7, 0xe44b476a, 0x3d816250,
			0xcf62a1f2, 0x5b8d2646, 0xfc8883a0, 0xc1c7b6a3,
			0x7f1524c3, 0x69cb7492, 0x47848a0b, 0x5692b285,
			0x095bbf00, 0xad19489d, 0x1462b174, 0x23820e00,
			0x58428d2a, 0x0c55f5ea, 0x1dadf43e, 0x233f7061,
			0x3372f092, 0x8d937e41, 0xd65fecf1, 0x6c223bdb,
			0x7cde3759, 0xcbee7460, 0x4085f2a7, 0xce77326e,
			0xa6078084, 0x19f8509e, 0xe8efd855, 0x61d99735,
			0xa969a7aa, 0xc50c06c2, 0x5a04abfc, 0x800bcadc,
			0x9e447a2e, 0xc3453484, 0xfdd56705, 0x0e1e9ec9,
			0xdb73dbd3, 0x105588cd, 0x675fda79, 0xe3674340,
			0xc5c43465, 0x713e38d8, 0x3d28f89e, 0xf16dff20,
			0x153e21e7, 0x8fb03d4a, 0xe6e39f2b, 0xdb83adf7
		}, {
			0xe93d5a68, 0x948140f7, 0xf64c261c, 0x94692934,
			0x411520f7, 0x7602d4f7, 0xbcf46b2e, 0xd4a20068,
			0xd4082471, 0x3320f46a, 0x43b7d4b7, 0x500061af,
			0x1e39f62e, 0x97244546, 0x14214f74, 0xbf8b8840,
			0x4d95fc1d, 0x96b591af, 0x70f4ddd3, 0x66a02f45,
			0xbfbc09ec, 0x03bd9785, 0x7fac6dd0, 0x31cb8504,
			0x96eb27b3, 0x55fd3941, 0xda2547e6, 0xabca0a9a,
			0x28507825, 0x530429f4, 0x0a2c86da, 0xe9b66dfb,
			0x68dc1462, 0xd7486900, 0x680ec0a4, 0x27a18dee,
			0x4f3ffea2, 0xe887ad8c, 0xb58ce006, 0x7af4d6b6,
			0xaace1e7c, 0xd3375fec, 0xce78a399, 0x406b2a42,
			0x20fe9e35, 0xd9f385b9, 0xee39d7ab, 0x3b124e8b,
			0x1dc9faf7, 0x4b6d1856, 0x26a36631, 0xeae397b2,
			0x3a6efa74, 0xdd5b4332, 0x6841e7f7, 0xca7820fb,
			0xfb0af54e, 0xd8feb397, 0x454056ac, 0xba489527,
			0x55533a3a, 0x20838d87, 0xfe6ba9b7, 0xd096954b,
			0x55a867bc, 0xa1159a58, 0xcca92963, 0x99e1db33,
			0xa62a4a56, 0x3f3125f9, 0x5ef47e1c, 0x9029317c,
			0xfdf8e802, 0x04272f70, 0x80bb155c, 0x05282ce3,
			0x95c11548, 0xe4c66d22, 0x48c1133f, 0xc70f86dc,
			0x07f9c9ee, 0x41041f0f, 0x404779a4, 0x5d886e17,
			0x325f51eb, 0xd59bc0d1, 0xf2bcc18f, 0x41113564,
			0x257b7834, 0x602a9c60, 0xdff8e8a3, 0x1f636c1b,
			0x0e12b4c2, 0x02e1329e, 0xaf664fd1, 0xcad18115,
			0x6b2395e0, 0x333e92e1, 0x3b240b62, 0xeebeb922,
			0x85b2a20e, 0xe6ba0d99, 0xde720c8c, 0x2da2f728,
			0xd0127845, 0x95b794fd, 0x647d0862, 0xe7ccf5f0,
			0x5449a36f, 0x877d48fa, 0xc39dfd27, 0xf33e8d1e,
			0x0a476341, 0x992eff74, 0x3a6f6eab, 0xf4f8fd37,
			0xa812dc60, 0xa1ebddf8, 0x991be14c, 0xdb6e6b0d,
			0xc67b5510, 0x6d672c37, 0x2765d43b, 0xdcd0e804,
			0xf1290dc7, 0xcc00ffa3, 0xb5390f92, 0x690fed0b,
			0x667b9ffb, 0xcedb7d9c, 0xa091cf0b, 0xd9155ea3,
			0xbb132f88, 0x515bad24, 0x7b9479bf, 0x763bd6eb,
			0x37392eb3, 0xcc115979, 0x8026e297, 0xf42e312d,
			0x6842ada7, 0xc66a2b3b, 0x12754ccc, 0x782ef11c,
			0x6a124237, 0xb79251e7, 0x06a1bbe6, 0x4bfb6350,
			0x1a6b1018, 0x11caedfa, 0x3d25bdd8, 0xe2e1c3c9,
			0x44421659, 0x0a121386, 0xd90cec6e, 0xd5abea2a,
			0x64af674e, 0xda86a85f, 0xbebfe988, 0x64e4c3fe,
			0x9dbc8057, 0xf0f7c086, 0x60787bf8, 0x6003604d,
			0xd1fd8346, 0xf6381fb0, 0x7745ae04, 0xd736fccc,
			0x83426b33, 0xf01eab71, 0xb0804187, 0x3c005e5f,
			0x77a057be, 0xbde8ae24, 0x55464299, 0xbf582e61,
			0x4e58f48f, 0xf2ddfda2, 0xf474ef38, 0x8789bdc2,
			0x5366f9c3, 0xc8b38e74, 0xb475f255, 0x46fcd9b9,
			0x7aeb2661, 0x8b1ddf84, 0x846a0e79, 0x915f95e2,
			0x466e598e, 0x20b45770, 0x8cd55591, 0xc902de4c,
			0xb90bace1, 0xbb8205d0, 0x11a86248, 0x7574a99e,
			0xb77f19b6, 0xe0a9dc09, 0x662d09a1, 0xc4324633,
			0xe85a1f02, 0x09f0be8c, 0x4a99a025, 0x1d6efe10,
			0x1ab93d1d, 0x0ba5a4df, 0xa186f20f, 0x2868f169,
			0xdcb7da83, 0x573906fe, 0xa1e2ce9b, 0x4fcd7f52,
			0x50115e01, 0xa70683fa, 0xa002b5c4, 0x0de6d027,
			0x9af88c27, 0x773f8641, 0xc3604c06, 0x61a806b5,
			0xf0177a28, 0xc0f586e0, 0x006058aa, 0x30dc7d62,
			0x11e69ed7, 0x2338ea63, 0x53c2dd94, 0xc2c21634,
			0xbbcbee56, 0x90bcb6de, 0xebfc7da1, 0xce591d76,
			0x6f05e409, 0x4b7c0188, 0x39720a3d, 0x7c927c24,
			0x86e3725f, 0x724d9db9, 0x1ac15bb4, 0xd39eb8fc,
			0xed545578, 0x08fca5b5, 0xd83d7cd3, 0x4dad0fc4,
			0x1e50ef5e, 0xb161e6f8, 0xa28514d9, 0x6c51133c,
			0x6fd5c7e7, 0x56e14ec4, 0x362abfce, 0xddc6c837,
			0xd79a3234, 0x92638212, 0x670efa8e, 0x406000e0
		}, {
			0x3a39ce37, 0xd3faf5cf, 0xabc27737, 0x5ac52d1b,
			0x5cb0679e, 0x4fa33742, 0xd3822740, 0x99bc9bbe,
			0xd5118e9d, 0xbf0f7315, 0xd62d1c7e, 0xc700c47b,
			0xb78c1b6b, 0x21a19045, 0xb26eb1be, 0x6a366eb4,
			0x5748ab2f, 0xbc946e79, 0xc6a376d2, 0x6549c2c8,
			0x530ff8ee, 0x468dde7d, 0xd5730a1d, 0x4cd04dc6,
			0x2939bbdb, 0xa9ba4650, 0xac9526e8, 0xbe5ee304,
			0xa1fad5f0, 0x6a2d519a, 0x63ef8ce2, 0x9a86ee22,
			0xc089c2b8, 0x43242ef6, 0xa51e03aa, 0x9cf2d0a4,
			0x83c061ba, 0x9be96a4d, 0x8fe51550, 0xba645bd6,
			0x2826a2f9, 0xa73a3ae1, 0x4ba99586, 0xef5562e9,
			0xc72fefd3, 0xf752f7da, 0x3f046f69, 0x77fa0a59,
			0x80e4a915, 0x87b08601, 0x9b09e6ad, 0x3b3ee593,
			0xe990fd5a, 0x9e34d797, 0x2cf0b7d9, 0x022b8b51,
			0x96d5ac3a, 0x017da67d, 0xd1cf3ed6, 0x7c7d2d28,
			0x1f9f25cf, 0xadf2b89b, 0x5ad6b472, 0x5a88f54c,
			0xe029ac71, 0xe019a5e6, 0x47b0acfd, 0xed93fa9b,
			0xe8d3c48d, 0x283b57cc, 0xf8d56629, 0x79132e28,
			0x785f0191, 0xed756055, 0xf7960e44, 0xe3d35e8c,
			0x15056dd4, 0x88f46dba, 0x03a16125, 0x0564f0bd,
			0xc3eb9e15, 0x3c9057a2, 0x97271aec, 0xa93a072a,
			0x1b3f6d9b, 0x1e6321f5, 0xf59c66fb, 0x26dcf319,
			0x7533d928, 0xb155fdf5, 0x03563482, 0x8aba3cbb,
			0x28517711, 0xc20ad9f8, 0xabcc5167, 0xccad925f,
			0x4de81751, 0x3830dc8e, 0x379d5862, 0x9320f991,
			0xea7a90c2, 0xfb3e7bce, 0x5121ce64, 0x774fbe32,
			0xa8b6e37e, 0xc3293d46, 0x48de5369, 0x6413e680,
			0xa2ae0810, 0xdd6db224, 0x69852dfd, 0x09072166,
			0xb39a460a, 0x6445c0dd, 0x586cdecf, 0x1c20c8ae,
			0x5bbef7dd, 0x1b588d40, 0xccd2017f, 0x6bb4e3bb,
			0xdda26a7e, 0x3a59ff45, 0x3e350a44, 0xbcb4cdd5,
			0x72eacea8, 0xfa6484bb, 0x8d6612ae, 0xbf3c6f47,
			0xd29be463, 0x542f5d9e, 0xaec2771b, 0xf64e6370,
			0x740e0d8d, 0xe75b1357, 0xf8721671, 0xaf537d5d,
			0x4040cb08, 0x4eb4e2cc, 0x34d2466a, 0x0115af84,
			0xe1b00428, 0x95983a1d, 0x06b89fb4, 0xce6ea048,
			0x6f3f3b82, 0x3520ab82, 0x011a1d4b, 0x277227f8,
			0x611560b1, 0xe7933fdc, 0xbb3a792b, 0x344525bd,
			0xa08839e1, 0x51ce794b, 0x2f32c9b7, 0xa01fbac9,
			0xe01cc87e, 0xbcc7d1f6, 0xcf0111c3, 0xa1e8aac7,
			0x1a908749, 0xd44fbd9a, 0xd0dadecb, 0xd50ada38,
			0x0339c32a, 0xc6913667, 0x8df9317c, 0xe0b12b4f,
			0xf79e59b7, 0x43f5bb3a, 0xf2d519ff, 0x27d9459c,
			0xbf97222c, 0x15e6fc2a, 0x0f91fc71, 0x9b941525,
			0xfae59361, 0xceb69ceb, 0xc2a86459, 0x12baa8d1,
			0xb6c1075e, 0xe3056a0c, 0x10d25065, 0xcb03a442,
			0xe0ec6e0e, 0x1698db3b, 0x4c98a0be, 0x3278e964,
			0x9f1f9532, 0xe0d392df, 0xd3a0342b, 0x8971f21e,
			0x1b0a7441, 0x4ba3348c, 0xc5be7120, 0xc37632d8,
			0xdf359f8d, 0x9b992f2e, 0xe60b6f47, 0x0fe3f11d,
			0xe54cda54, 0x1edad891, 0xce6279cf, 0xcd3e7e6f,
			0x1618b166, 0xfd2c1d05, 0x848fd2c5, 0xf6fb2299,
			0xf523f357, 0xa6327623, 0x93a83531, 0x56cccd02,
			0xacf08162, 0x5a75ebb5, 0x6e163697, 0x88d273cc,
			0xde966292, 0x81b949d0, 0x4c50901b, 0x71c65614,
			0xe6c6c7bd, 0x327a140a, 0x45e1d006, 0xc3f27b9a,
			0xc9aa53fd, 0x62a80f00, 0xbb25bfe2, 0x35bdd2f6,
			0x71126905, 0xb2040222, 0xb6cbcf7c, 0xcd769c2b,
			0x53113ec0, 0x1640e3d3, 0x38abbd60, 0x2547adf0,
			0xba38209c, 0xf746ce76, 0x77afa1c5, 0x20756060,
			0x85cbfe4e, 0x8ae88dd8, 0x7aaaf9b0, 0x4cf9aa7e,
			0x1948c25c, 0x02fb8a8c, 0x01c36ae4, 0xd6ebe1f9,
			0x90d4f869, 0xa65cdea0, 0x3f09252d, 0xc208e69f,
			0xb74e6132, 0xce77e25b, 0x578fdfe3, 0x3ac372e6
		}
	}, {
		0x243f6a88, 0x85a308d3, 0x13198a2e, 0x03707344,
		0xa4093822, 0x299f31d0, 0x082efa98, 0xec4e6c89,
		0x452821e6, 0x38d01377, 0xbe5466cf, 0x34e90c6c,
		0xc0ac29b7, 0xc97c50dd, 0x3f84d5b5, 0xb5470917,
		0x9216d5d9, 0x8979fb1b
	}
};

static const unsigned char BF_itoa64[64 + 1] =
	"./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

static const unsigned char BF_atoi64[0x60] = {
	64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 0, 1,
	54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 64, 64, 64, 64, 64,
	64, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
	17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 64, 64, 64, 64, 64,
	64, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
	43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 64, 64, 64, 64, 64
};

static int BF_decode_16(BF_word *dst, const char *src)
{
	unsigned char *dptr = (unsigned char *)dst;
	unsigned char *end = dptr + 16;
	const unsigned char *sptr = (const unsigned char *)src;
	unsigned int tmp, c1, c2, c3, c4;

	if (strspn(src, (char *)BF_itoa64) < 22) return -1;
	for (;;) {
		c1 = BF_atoi64[*sptr++-0x20];
		c2 = BF_atoi64[*sptr++-0x20];
		*dptr++ = (c1 << 2) | ((c2 & 0x30) >> 4);
		if (dptr == end) break;

		c3 = BF_atoi64[*sptr++-0x20];
		c4 = BF_atoi64[*sptr++-0x20];
		*dptr++ = ((c2 & 0x0F) << 4) | ((c3 & 0x3C) >> 2);
		*dptr++ = ((c3 & 0x03) << 6) | c4;
	}

	return 0;
}

static void BF_encode_23(char *dst, const BF_word *src)
{
	const unsigned char *sptr = (const unsigned char *)src;
	const unsigned char *end = sptr + 23;
	unsigned char *dptr = (unsigned char *)dst;
	unsigned int c1, c2;

	for (;;) {
		c1 = *sptr++;
		*dptr++ = BF_itoa64[c1 >> 2];
		c1 = (c1 & 0x03) << 4;

		c2 = *sptr++;
		c1 |= c2 >> 4;
		*dptr++ = BF_itoa64[c1];
		c1 = (c2 & 0x0f) << 2;
		if (sptr == end) {
			*dptr++ = BF_itoa64[c1];
			break;
		}

		c2 = *sptr++;
		c1 |= c2 >> 6;
		*dptr++ = BF_itoa64[c1];
		*dptr++ = BF_itoa64[c2 & 0x3f];
	}
}

static void BF_swap(BF_word *x, int count)
{
	BF_word tmp;

	if ((union { int i; char c; }){1}.c)
	do {
		tmp = *x;
		tmp = (tmp << 16) | (tmp >> 16);
		*x++ = ((tmp & 0x00FF00FF) << 8) | ((tmp >> 8) & 0x00FF00FF);
	} while (--count);
}

static BF_word BF_round(BF_word L, int N, BF_ctx *ctx)
{
	return ctx->P[N + 1] ^ (ctx->S[3][L & 0xFF]
		+ (ctx->S[2][L >> 8 & 0xFF]
		^ (ctx->S[1][L >> 16 & 0xFF]
		+ ctx->S[0][L >> 24])));
}

#define BF_ROUND(L, R, N) ( R ^= BF_round(L, N, ctx) )

static BF_word BF_encrypt(BF_word L, BF_word *RP, BF_ctx *ctx)
{
	int i;
	BF_word R = *RP;
	L ^= ctx->P[0];
	for (i=0; i<16; i+=2) {
		BF_ROUND(L, R, i);
		BF_ROUND(R, L, i+1);
	}
	*RP = L;
	return R ^ ctx->P[BF_N + 1];
}

#define BF_ENCRYPT ( L = BF_encrypt(L, &R, &data.ctx) )

#define BF_body() \
	L = R = 0; \
	ptr = data.ctx.P; \
	do { \
		ptr += 2; \
		BF_ENCRYPT; \
		*(ptr - 2) = L; \
		*(ptr - 1) = R; \
	} while (ptr < &data.ctx.P[BF_N + 2]); \
\
	ptr = data.ctx.S[0]; \
	do { \
		ptr += 2; \
		BF_ENCRYPT; \
		*(ptr - 2) = L; \
		*(ptr - 1) = R; \
	} while (ptr < &data.ctx.S[3][0xFF]);

static void BF_set_key(const char *key, BF_key expanded, BF_key initial,
    unsigned char flags)
{
	const char *ptr = key;
	unsigned int bug, i, j;
	BF_word safety, sign, diff, tmp[2];

/*
 * There was a sign extension bug in older revisions of this function.  While
 * we would have liked to simply fix the bug and move on, we have to provide
 * a backwards compatibility feature (essentially the bug) for some systems and
 * a safety measure for some others.  The latter is needed because for certain
 * multiple inputs to the buggy algorithm there exist easily found inputs to
 * the correct algorithm that produce the same hash.  Thus, we optionally
 * deviate from the correct algorithm just enough to avoid such collisions.
 * While the bug itself affected the majority of passwords containing
 * characters with the 8th bit set (although only a percentage of those in a
 * collision-producing way), the anti-collision safety measure affects
 * only a subset of passwords containing the '\xff' character (not even all of
 * those passwords, just some of them).  This character is not found in valid
 * UTF-8 sequences and is rarely used in popular 8-bit character encodings.
 * Thus, the safety measure is unlikely to cause much annoyance, and is a
 * reasonable tradeoff to use when authenticating against existing hashes that
 * are not reliably known to have been computed with the correct algorithm.
 *
 * We use an approach that tries to minimize side-channel leaks of password
 * information - that is, we mostly use fixed-cost bitwise operations instead
 * of branches or table lookups.  (One conditional branch based on password
 * length remains.  It is not part of the bug aftermath, though, and is
 * difficult and possibly unreasonable to avoid given the use of C strings by
 * the caller, which results in similar timing leaks anyway.)
 *
 * For actual implementation, we set an array index in the variable "bug"
 * (0 means no bug, 1 means sign extension bug emulation) and a flag in the
 * variable "safety" (bit 16 is set when the safety measure is requested).
 * Valid combinations of settings are:
 *
 * Prefix "$2a$": bug = 0, safety = 0x10000
 * Prefix "$2x$": bug = 1, safety = 0
 * Prefix "$2y$": bug = 0, safety = 0
 */
	bug = (unsigned int)flags & 1;
	safety = ((BF_word)flags & 2) << 15;

	sign = diff = 0;

	for (i = 0; i < BF_N + 2; i++) {
		tmp[0] = tmp[1] = 0;
		for (j = 0; j < 4; j++) {
			tmp[0] <<= 8;
			tmp[0] |= (unsigned char)*ptr; /* correct */
			tmp[1] <<= 8;
			tmp[1] |= (BF_word_signed)(signed char)*ptr; /* bug */
/*
 * Sign extension in the first char has no effect - nothing to overwrite yet,
 * and those extra 24 bits will be fully shifted out of the 32-bit word.  For
 * chars 2, 3, 4 in each four-char block, we set bit 7 of "sign" if sign
 * extension in tmp[1] occurs.  Once this flag is set, it remains set.
 */
			if (j)
				sign |= tmp[1] & 0x80;
			if (!*ptr)
				ptr = key;
			else
				ptr++;
		}
		diff |= tmp[0] ^ tmp[1]; /* Non-zero on any differences */

		expanded[i] = tmp[bug];
		initial[i] = BF_init_state.P[i] ^ tmp[bug];
	}

/*
 * At this point, "diff" is zero iff the correct and buggy algorithms produced
 * exactly the same result.  If so and if "sign" is non-zero, which indicates
 * that there was a non-benign sign extension, this means that we have a
 * collision between the correctly computed hash for this password and a set of
 * passwords that could be supplied to the buggy algorithm.  Our safety measure
 * is meant to protect from such many-buggy to one-correct collisions, by
 * deviating from the correct algorithm in such cases.  Let's check for this.
 */
	diff |= diff >> 16; /* still zero iff exact match */
	diff &= 0xffff; /* ditto */
	diff += 0xffff; /* bit 16 set iff "diff" was non-zero (on non-match) */
	sign <<= 9; /* move the non-benign sign extension flag to bit 16 */
	sign &= ~diff & safety; /* action needed? */

/*
 * If we have determined that we need to deviate from the correct algorithm,
 * flip bit 16 in initial expanded key.  (The choice of 16 is arbitrary, but
 * let's stick to it now.  It came out of the approach we used above, and it's
 * not any worse than any other choice we could make.)
 *
 * It is crucial that we don't do the same to the expanded key used in the main
 * Eksblowfish loop.  By doing it to only one of these two, we deviate from a
 * state that could be directly specified by a password to the buggy algorithm
 * (and to the fully correct one as well, but that's a side-effect).
 */
	initial[0] ^= sign;
}

static char *BF_crypt(const char *key, const char *setting,
	char *output, BF_word min)
{
	static const unsigned char flags_by_subtype[26] =
		{2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 4, 0};
	struct {
		BF_ctx ctx;
		BF_key expanded_key;
		union {
			BF_word salt[4];
			BF_word output[6];
		} binary;
	} data;
	BF_word L, R;
	BF_word tmp1, tmp2, tmp3, tmp4;
	BF_word *ptr;
	BF_word count;
	int i;

	if (setting[0] != '$' ||
	    setting[1] != '2' ||
	    setting[2] - 'a' > 25U ||
	    !flags_by_subtype[setting[2] - 'a'] ||
	    setting[3] != '$' ||
	    setting[4] - '0' > 3U ||
	    setting[5] - '0' > 9U ||
	    (setting[4] == '3' && setting[5] > '1') ||
	    setting[6] != '$') {
		errno = EINVAL;
		return NULL;
	}

	count = (BF_word)1 << ((setting[4] - '0') * 10 + (setting[5] - '0'));
	if (count < min || BF_decode_16(data.binary.salt, &setting[7])) {
		errno = EINVAL;
		return NULL;
	}
	BF_swap(data.binary.salt, 4);

	BF_set_key(key, data.expanded_key, data.ctx.P,
	    flags_by_subtype[(unsigned int)(unsigned char)setting[2] - 'a']);

	memcpy(data.ctx.S, BF_init_state.S, sizeof(data.ctx.S));

	L = R = 0;
	for (i = 0; i < BF_N + 2; i += 2) {
		L ^= data.binary.salt[i & 2];
		R ^= data.binary.salt[(i & 2) + 1];
		BF_ENCRYPT;
		data.ctx.P[i] = L;
		data.ctx.P[i + 1] = R;
	}

	ptr = data.ctx.S[0];
	do {
		ptr += 4;
		L ^= data.binary.salt[(BF_N + 2) & 3];
		R ^= data.binary.salt[(BF_N + 3) & 3];
		BF_ENCRYPT;
		*(ptr - 4) = L;
		*(ptr - 3) = R;

		L ^= data.binary.salt[(BF_N + 4) & 3];
		R ^= data.binary.salt[(BF_N + 5) & 3];
		BF_ENCRYPT;
		*(ptr - 2) = L;
		*(ptr - 1) = R;
	} while (ptr < &data.ctx.S[3][0xFF]);

	do {
		int done;

		for (i = 0; i < BF_N + 2; i += 2) {
			data.ctx.P[i] ^= data.expanded_key[i];
			data.ctx.P[i + 1] ^= data.expanded_key[i + 1];
		}

		done = 0;
		do {
			BF_body();
			if (done)
				break;
			done = 1;

			tmp1 = data.binary.salt[0];
			tmp2 = data.binary.salt[1];
			tmp3 = data.binary.salt[2];
			tmp4 = data.binary.salt[3];
			for (i = 0; i < BF_N; i += 4) {
				data.ctx.P[i] ^= tmp1;
				data.ctx.P[i + 1] ^= tmp2;
				data.ctx.P[i + 2] ^= tmp3;
				data.ctx.P[i + 3] ^= tmp4;
			}
			data.ctx.P[16] ^= tmp1;
			data.ctx.P[17] ^= tmp2;
		} while (1);
	} while (--count);

	for (i = 0; i < 6; i += 2) {
		L = BF_magic_w[i];
		R = BF_magic_w[i + 1];

		count = 64;
		do {
			BF_ENCRYPT;
		} while (--count);

		data.binary.output[i] = L;
		data.binary.output[i + 1] = R;
	}

	memcpy(output, setting, 7 + 22 - 1);
	output[7 + 22 - 1] = BF_itoa64[(int)
		BF_atoi64[(int)setting[7 + 22 - 1] - 0x20] & 0x30];

/* This has to be bug-compatible with the original implementation, so
 * only encode 23 of the 24 bytes. :-) */
	BF_swap(data.binary.output, 6);
	BF_encode_23(&output[7 + 22], data.binary.output);
	output[7 + 22 + 31] = '\0';

	return output;
}

/*
 * Please preserve the runtime self-test.  It serves two purposes at once:
 *
 * 1. We really can't afford the risk of producing incompatible hashes e.g.
 * when there's something like gcc bug 26587 again, whereas an application or
 * library integrating this code might not also integrate our external tests or
 * it might not run them after every build.  Even if it does, the miscompile
 * might only occur on the production build, but not on a testing build (such
 * as because of different optimization settings).  It is painful to recover
 * from incorrectly-computed hashes - merely fixing whatever broke is not
 * enough.  Thus, a proactive measure like this self-test is needed.
 *
 * 2. We don't want to leave sensitive data from our actual password hash
 * computation on the stack or in registers.  Previous revisions of the code
 * would do explicit cleanups, but simply running the self-test after hash
 * computation is more reliable.
 *
 * The performance cost of this quick self-test is around 0.6% at the "$2a$08"
 * setting.
 */
char *__crypt_blowfish(const char *key, const char *setting, char *output)
{
	const char *test_key = "8b \xd0\xc1\xd2\xcf\xcc\xd8";
	const char *test_setting = "$2a$00$abcdefghijklmnopqrstuu";
	static const char * const test_hash[2] =
		{"VUrPmXD6q/nVSSp7pNDhCR9071IfIRe\0\x55", /* $2x$ */
		"i1D709vfamulimlGcq0qq3UvuUasvEa\0\x55"}; /* $2a$, $2y$ */
	char *retval;
	const char *p;
	int save_errno, ok;
	struct {
		char s[7 + 22 + 1];
		char o[7 + 22 + 31 + 1 + 1 + 1];
	} buf;

/* Hash the supplied password */
	retval = BF_crypt(key, setting, output, 16);
	save_errno = errno;

/*
 * Do a quick self-test.  It is important that we make both calls to BF_crypt()
 * from the same scope such that they likely use the same stack locations,
 * which makes the second call overwrite the first call's sensitive data on the
 * stack and makes it more likely that any alignment related issues would be
 * detected by the self-test.
 */
	memcpy(buf.s, test_setting, sizeof(buf.s));
	if (retval)
		buf.s[2] = setting[2];
	memset(buf.o, 0x55, sizeof(buf.o));
	buf.o[sizeof(buf.o) - 1] = 0;
	p = BF_crypt(test_key, buf.s, buf.o, 1);

	ok = (p == buf.o &&
	    !memcmp(p, buf.s, 7 + 22) &&
	    !memcmp(p + (7 + 22),
	    test_hash[(unsigned int)(unsigned char)buf.s[2] & 1],
	    31 + 1 + 1 + 1));

	{
		const char *k = "\xff\xa3" "34" "\xff\xff\xff\xa3" "345";
		BF_key ae, ai, ye, yi;
		BF_set_key(k, ae, ai, 2); /* $2a$ */
		BF_set_key(k, ye, yi, 4); /* $2y$ */
		ai[0] ^= 0x10000; /* undo the safety (for comparison) */
		ok = ok && ai[0] == 0xdb9c59bc && ye[17] == 0x33343500 &&
		    !memcmp(ae, ye, sizeof(ae)) &&
		    !memcmp(ai, yi, sizeof(ai));
	}

	errno = save_errno;
	if (ok)
		return retval;

/* Should not happen */
	errno = EINVAL; /* pretend we don't support this hash type */
	return NULL;
}

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-09  3:36             ` Solar Designer
@ 2012-08-09  7:13               ` orc
  2012-08-09  7:28                 ` Rich Felker
  2012-08-09  7:29               ` Solar Designer
  1 sibling, 1 reply; 52+ messages in thread
From: orc @ 2012-08-09  7:13 UTC (permalink / raw)
  To: musl


> Is use of GNU C extensions (also supported in clang) acceptable?  I am
> thinking of a "Labels as Values" trick to share more code.  This would
> make the source code less readable, though.
> 
> Alexander

They should be avoided (since the library is not targetting only gcc or
clang compilers only).


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-09  7:13               ` orc
@ 2012-08-09  7:28                 ` Rich Felker
  0 siblings, 0 replies; 52+ messages in thread
From: Rich Felker @ 2012-08-09  7:28 UTC (permalink / raw)
  To: musl

On Thu, Aug 09, 2012 at 03:13:32PM +0800, orc wrote:
> 
> > Is use of GNU C extensions (also supported in clang) acceptable?  I am
> > thinking of a "Labels as Values" trick to share more code.  This would
> > make the source code less readable, though.
> > 
> > Alexander
> 
> They should be avoided (since the library is not targetting only gcc or
> clang compilers only).

Yes. To clarify, of course musl uses _some_ features outside of plain
C99, mainly for weak symbols, atomic operations, and inline syscalls.
Unless/until we get a "coding standards" document of some sort, the
best description of current practice I can make is that extensions to
the C language are used mostly/entirely for controlling linking and
for interfacing with asm, not for making C into a different high level
language that's a superset of C. In particular, statement expressions
(in the form of ({x;y;z;})), nested functions, computed gotos/labels
as values, and other "GNU C" features that are particularly different
from normal C are not presently used in musl, and I don't have plans
to start using them.

Rich


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-09  3:36             ` Solar Designer
  2012-08-09  7:13               ` orc
@ 2012-08-09  7:29               ` Solar Designer
  2012-08-09 10:53                 ` Solar Designer
  1 sibling, 1 reply; 52+ messages in thread
From: Solar Designer @ 2012-08-09  7:29 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 439 bytes --]

Rich -

On Thu, Aug 09, 2012 at 07:36:13AM +0400, Solar Designer wrote:
> How about we (almost) achieve your size goal for crypt_blowfish (7 KB
> including tables), yet keep the hand-unrolling?  I think this is
> possible, albeit maybe with some tricks.

Attached is the smaller and faster code, as discussed on IRC.

This is under 8 KB.  The speed is similar to the original, I measured
-3% to +2% on different systems/builds.

Alexander

[-- Attachment #2: crypt_blowfish.c --]
[-- Type: text/plain, Size: 31251 bytes --]

/*
 * The crypt_blowfish homepage is:
 *
 *	http://www.openwall.com/crypt/
 *
 * This code comes from John the Ripper password cracker, with reentrant
 * and crypt(3) interfaces added, but optimizations specific to password
 * cracking removed.
 *
 * Written by Solar Designer <solar at openwall.com> in 1998-2012.
 * No copyright is claimed, and the software is hereby placed in the public
 * domain.  In case this attempt to disclaim copyright and place the software
 * in the public domain is deemed null and void, then the software is
 * Copyright (c) 1998-2012 Solar Designer and it is hereby released to the
 * general public under the following terms:
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted.
 *
 * There's ABSOLUTELY NO WARRANTY, express or implied.
 *
 * It is my intent that you should be able to use this on your system,
 * as part of a software package, or anywhere else to improve security,
 * ensure compatibility, or for any other purpose.  I would appreciate
 * it if you give credit where it is due and keep your modifications in
 * the public domain as well, but I don't require that in order to let
 * you place this code and any modifications you make under a license
 * of your choice.
 *
 * This implementation is mostly compatible with OpenBSD's bcrypt.c (prefix
 * "$2a$") by Niels Provos <provos at citi.umich.edu>, and uses some of his
 * ideas.  The password hashing algorithm was designed by David Mazieres
 * <dm at lcs.mit.edu>.  For more information on the level of compatibility,
 * please refer to the comments in BF_set_key() below and to the included
 * crypt(3) man page.
 *
 * There's a paper on the algorithm that explains its design decisions:
 *
 *	http://www.usenix.org/events/usenix99/provos.html
 *
 * Some of the tricks in BF_ROUND might be inspired by Eric Young's
 * Blowfish library (I can't be sure if I would think of something if I
 * hadn't seen his code).
 */

#include <string.h>
#include <errno.h>

/* Just to make sure the prototypes match the actual definitions */
#include "crypt_blowfish.h"

typedef unsigned int BF_word;
typedef signed int BF_word_signed;

/* Number of Blowfish rounds, this is also hardcoded into a few places */
#define BF_N				16

typedef BF_word BF_key[BF_N + 2];

typedef struct {
	BF_word S[4][0x100];
	BF_key P;
} BF_ctx;

/*
 * Magic IV for 64 Blowfish encryptions that we do at the end.
 * The string is "OrpheanBeholderScryDoubt" on big-endian.
 */
static const BF_word BF_magic_w[6] = {
	0x4F727068, 0x65616E42, 0x65686F6C,
	0x64657253, 0x63727944, 0x6F756274
};

/*
 * P-box and S-box tables initialized with digits of Pi.
 */
static const BF_ctx BF_init_state = {
	{
		{
			0xd1310ba6, 0x98dfb5ac, 0x2ffd72db, 0xd01adfb7,
			0xb8e1afed, 0x6a267e96, 0xba7c9045, 0xf12c7f99,
			0x24a19947, 0xb3916cf7, 0x0801f2e2, 0x858efc16,
			0x636920d8, 0x71574e69, 0xa458fea3, 0xf4933d7e,
			0x0d95748f, 0x728eb658, 0x718bcd58, 0x82154aee,
			0x7b54a41d, 0xc25a59b5, 0x9c30d539, 0x2af26013,
			0xc5d1b023, 0x286085f0, 0xca417918, 0xb8db38ef,
			0x8e79dcb0, 0x603a180e, 0x6c9e0e8b, 0xb01e8a3e,
			0xd71577c1, 0xbd314b27, 0x78af2fda, 0x55605c60,
			0xe65525f3, 0xaa55ab94, 0x57489862, 0x63e81440,
			0x55ca396a, 0x2aab10b6, 0xb4cc5c34, 0x1141e8ce,
			0xa15486af, 0x7c72e993, 0xb3ee1411, 0x636fbc2a,
			0x2ba9c55d, 0x741831f6, 0xce5c3e16, 0x9b87931e,
			0xafd6ba33, 0x6c24cf5c, 0x7a325381, 0x28958677,
			0x3b8f4898, 0x6b4bb9af, 0xc4bfe81b, 0x66282193,
			0x61d809cc, 0xfb21a991, 0x487cac60, 0x5dec8032,
			0xef845d5d, 0xe98575b1, 0xdc262302, 0xeb651b88,
			0x23893e81, 0xd396acc5, 0x0f6d6ff3, 0x83f44239,
			0x2e0b4482, 0xa4842004, 0x69c8f04a, 0x9e1f9b5e,
			0x21c66842, 0xf6e96c9a, 0x670c9c61, 0xabd388f0,
			0x6a51a0d2, 0xd8542f68, 0x960fa728, 0xab5133a3,
			0x6eef0b6c, 0x137a3be4, 0xba3bf050, 0x7efb2a98,
			0xa1f1651d, 0x39af0176, 0x66ca593e, 0x82430e88,
			0x8cee8619, 0x456f9fb4, 0x7d84a5c3, 0x3b8b5ebe,
			0xe06f75d8, 0x85c12073, 0x401a449f, 0x56c16aa6,
			0x4ed3aa62, 0x363f7706, 0x1bfedf72, 0x429b023d,
			0x37d0d724, 0xd00a1248, 0xdb0fead3, 0x49f1c09b,
			0x075372c9, 0x80991b7b, 0x25d479d8, 0xf6e8def7,
			0xe3fe501a, 0xb6794c3b, 0x976ce0bd, 0x04c006ba,
			0xc1a94fb6, 0x409f60c4, 0x5e5c9ec2, 0x196a2463,
			0x68fb6faf, 0x3e6c53b5, 0x1339b2eb, 0x3b52ec6f,
			0x6dfc511f, 0x9b30952c, 0xcc814544, 0xaf5ebd09,
			0xbee3d004, 0xde334afd, 0x660f2807, 0x192e4bb3,
			0xc0cba857, 0x45c8740f, 0xd20b5f39, 0xb9d3fbdb,
			0x5579c0bd, 0x1a60320a, 0xd6a100c6, 0x402c7279,
			0x679f25fe, 0xfb1fa3cc, 0x8ea5e9f8, 0xdb3222f8,
			0x3c7516df, 0xfd616b15, 0x2f501ec8, 0xad0552ab,
			0x323db5fa, 0xfd238760, 0x53317b48, 0x3e00df82,
			0x9e5c57bb, 0xca6f8ca0, 0x1a87562e, 0xdf1769db,
			0xd542a8f6, 0x287effc3, 0xac6732c6, 0x8c4f5573,
			0x695b27b0, 0xbbca58c8, 0xe1ffa35d, 0xb8f011a0,
			0x10fa3d98, 0xfd2183b8, 0x4afcb56c, 0x2dd1d35b,
			0x9a53e479, 0xb6f84565, 0xd28e49bc, 0x4bfb9790,
			0xe1ddf2da, 0xa4cb7e33, 0x62fb1341, 0xcee4c6e8,
			0xef20cada, 0x36774c01, 0xd07e9efe, 0x2bf11fb4,
			0x95dbda4d, 0xae909198, 0xeaad8e71, 0x6b93d5a0,
			0xd08ed1d0, 0xafc725e0, 0x8e3c5b2f, 0x8e7594b7,
			0x8ff6e2fb, 0xf2122b64, 0x8888b812, 0x900df01c,
			0x4fad5ea0, 0x688fc31c, 0xd1cff191, 0xb3a8c1ad,
			0x2f2f2218, 0xbe0e1777, 0xea752dfe, 0x8b021fa1,
			0xe5a0cc0f, 0xb56f74e8, 0x18acf3d6, 0xce89e299,
			0xb4a84fe0, 0xfd13e0b7, 0x7cc43b81, 0xd2ada8d9,
			0x165fa266, 0x80957705, 0x93cc7314, 0x211a1477,
			0xe6ad2065, 0x77b5fa86, 0xc75442f5, 0xfb9d35cf,
			0xebcdaf0c, 0x7b3e89a0, 0xd6411bd3, 0xae1e7e49,
			0x00250e2d, 0x2071b35e, 0x226800bb, 0x57b8e0af,
			0x2464369b, 0xf009b91e, 0x5563911d, 0x59dfa6aa,
			0x78c14389, 0xd95a537f, 0x207d5ba2, 0x02e5b9c5,
			0x83260376, 0x6295cfa9, 0x11c81968, 0x4e734a41,
			0xb3472dca, 0x7b14a94a, 0x1b510052, 0x9a532915,
			0xd60f573f, 0xbc9bc6e4, 0x2b60a476, 0x81e67400,
			0x08ba6fb5, 0x571be91f, 0xf296ec6b, 0x2a0dd915,
			0xb6636521, 0xe7b9f9b6, 0xff34052e, 0xc5855664,
			0x53b02d5d, 0xa99f8fa1, 0x08ba4799, 0x6e85076a
		}, {
			0x4b7a70e9, 0xb5b32944, 0xdb75092e, 0xc4192623,
			0xad6ea6b0, 0x49a7df7d, 0x9cee60b8, 0x8fedb266,
			0xecaa8c71, 0x699a17ff, 0x5664526c, 0xc2b19ee1,
			0x193602a5, 0x75094c29, 0xa0591340, 0xe4183a3e,
			0x3f54989a, 0x5b429d65, 0x6b8fe4d6, 0x99f73fd6,
			0xa1d29c07, 0xefe830f5, 0x4d2d38e6, 0xf0255dc1,
			0x4cdd2086, 0x8470eb26, 0x6382e9c6, 0x021ecc5e,
			0x09686b3f, 0x3ebaefc9, 0x3c971814, 0x6b6a70a1,
			0x687f3584, 0x52a0e286, 0xb79c5305, 0xaa500737,
			0x3e07841c, 0x7fdeae5c, 0x8e7d44ec, 0x5716f2b8,
			0xb03ada37, 0xf0500c0d, 0xf01c1f04, 0x0200b3ff,
			0xae0cf51a, 0x3cb574b2, 0x25837a58, 0xdc0921bd,
			0xd19113f9, 0x7ca92ff6, 0x94324773, 0x22f54701,
			0x3ae5e581, 0x37c2dadc, 0xc8b57634, 0x9af3dda7,
			0xa9446146, 0x0fd0030e, 0xecc8c73e, 0xa4751e41,
			0xe238cd99, 0x3bea0e2f, 0x3280bba1, 0x183eb331,
			0x4e548b38, 0x4f6db908, 0x6f420d03, 0xf60a04bf,
			0x2cb81290, 0x24977c79, 0x5679b072, 0xbcaf89af,
			0xde9a771f, 0xd9930810, 0xb38bae12, 0xdccf3f2e,
			0x5512721f, 0x2e6b7124, 0x501adde6, 0x9f84cd87,
			0x7a584718, 0x7408da17, 0xbc9f9abc, 0xe94b7d8c,
			0xec7aec3a, 0xdb851dfa, 0x63094366, 0xc464c3d2,
			0xef1c1847, 0x3215d908, 0xdd433b37, 0x24c2ba16,
			0x12a14d43, 0x2a65c451, 0x50940002, 0x133ae4dd,
			0x71dff89e, 0x10314e55, 0x81ac77d6, 0x5f11199b,
			0x043556f1, 0xd7a3c76b, 0x3c11183b, 0x5924a509,
			0xf28fe6ed, 0x97f1fbfa, 0x9ebabf2c, 0x1e153c6e,
			0x86e34570, 0xeae96fb1, 0x860e5e0a, 0x5a3e2ab3,
			0x771fe71c, 0x4e3d06fa, 0x2965dcb9, 0x99e71d0f,
			0x803e89d6, 0x5266c825, 0x2e4cc978, 0x9c10b36a,
			0xc6150eba, 0x94e2ea78, 0xa5fc3c53, 0x1e0a2df4,
			0xf2f74ea7, 0x361d2b3d, 0x1939260f, 0x19c27960,
			0x5223a708, 0xf71312b6, 0xebadfe6e, 0xeac31f66,
			0xe3bc4595, 0xa67bc883, 0xb17f37d1, 0x018cff28,
			0xc332ddef, 0xbe6c5aa5, 0x65582185, 0x68ab9802,
			0xeecea50f, 0xdb2f953b, 0x2aef7dad, 0x5b6e2f84,
			0x1521b628, 0x29076170, 0xecdd4775, 0x619f1510,
			0x13cca830, 0xeb61bd96, 0x0334fe1e, 0xaa0363cf,
			0xb5735c90, 0x4c70a239, 0xd59e9e0b, 0xcbaade14,
			0xeecc86bc, 0x60622ca7, 0x9cab5cab, 0xb2f3846e,
			0x648b1eaf, 0x19bdf0ca, 0xa02369b9, 0x655abb50,
			0x40685a32, 0x3c2ab4b3, 0x319ee9d5, 0xc021b8f7,
			0x9b540b19, 0x875fa099, 0x95f7997e, 0x623d7da8,
			0xf837889a, 0x97e32d77, 0x11ed935f, 0x16681281,
			0x0e358829, 0xc7e61fd6, 0x96dedfa1, 0x7858ba99,
			0x57f584a5, 0x1b227263, 0x9b83c3ff, 0x1ac24696,
			0xcdb30aeb, 0x532e3054, 0x8fd948e4, 0x6dbc3128,
			0x58ebf2ef, 0x34c6ffea, 0xfe28ed61, 0xee7c3c73,
			0x5d4a14d9, 0xe864b7e3, 0x42105d14, 0x203e13e0,
			0x45eee2b6, 0xa3aaabea, 0xdb6c4f15, 0xfacb4fd0,
			0xc742f442, 0xef6abbb5, 0x654f3b1d, 0x41cd2105,
			0xd81e799e, 0x86854dc7, 0xe44b476a, 0x3d816250,
			0xcf62a1f2, 0x5b8d2646, 0xfc8883a0, 0xc1c7b6a3,
			0x7f1524c3, 0x69cb7492, 0x47848a0b, 0x5692b285,
			0x095bbf00, 0xad19489d, 0x1462b174, 0x23820e00,
			0x58428d2a, 0x0c55f5ea, 0x1dadf43e, 0x233f7061,
			0x3372f092, 0x8d937e41, 0xd65fecf1, 0x6c223bdb,
			0x7cde3759, 0xcbee7460, 0x4085f2a7, 0xce77326e,
			0xa6078084, 0x19f8509e, 0xe8efd855, 0x61d99735,
			0xa969a7aa, 0xc50c06c2, 0x5a04abfc, 0x800bcadc,
			0x9e447a2e, 0xc3453484, 0xfdd56705, 0x0e1e9ec9,
			0xdb73dbd3, 0x105588cd, 0x675fda79, 0xe3674340,
			0xc5c43465, 0x713e38d8, 0x3d28f89e, 0xf16dff20,
			0x153e21e7, 0x8fb03d4a, 0xe6e39f2b, 0xdb83adf7
		}, {
			0xe93d5a68, 0x948140f7, 0xf64c261c, 0x94692934,
			0x411520f7, 0x7602d4f7, 0xbcf46b2e, 0xd4a20068,
			0xd4082471, 0x3320f46a, 0x43b7d4b7, 0x500061af,
			0x1e39f62e, 0x97244546, 0x14214f74, 0xbf8b8840,
			0x4d95fc1d, 0x96b591af, 0x70f4ddd3, 0x66a02f45,
			0xbfbc09ec, 0x03bd9785, 0x7fac6dd0, 0x31cb8504,
			0x96eb27b3, 0x55fd3941, 0xda2547e6, 0xabca0a9a,
			0x28507825, 0x530429f4, 0x0a2c86da, 0xe9b66dfb,
			0x68dc1462, 0xd7486900, 0x680ec0a4, 0x27a18dee,
			0x4f3ffea2, 0xe887ad8c, 0xb58ce006, 0x7af4d6b6,
			0xaace1e7c, 0xd3375fec, 0xce78a399, 0x406b2a42,
			0x20fe9e35, 0xd9f385b9, 0xee39d7ab, 0x3b124e8b,
			0x1dc9faf7, 0x4b6d1856, 0x26a36631, 0xeae397b2,
			0x3a6efa74, 0xdd5b4332, 0x6841e7f7, 0xca7820fb,
			0xfb0af54e, 0xd8feb397, 0x454056ac, 0xba489527,
			0x55533a3a, 0x20838d87, 0xfe6ba9b7, 0xd096954b,
			0x55a867bc, 0xa1159a58, 0xcca92963, 0x99e1db33,
			0xa62a4a56, 0x3f3125f9, 0x5ef47e1c, 0x9029317c,
			0xfdf8e802, 0x04272f70, 0x80bb155c, 0x05282ce3,
			0x95c11548, 0xe4c66d22, 0x48c1133f, 0xc70f86dc,
			0x07f9c9ee, 0x41041f0f, 0x404779a4, 0x5d886e17,
			0x325f51eb, 0xd59bc0d1, 0xf2bcc18f, 0x41113564,
			0x257b7834, 0x602a9c60, 0xdff8e8a3, 0x1f636c1b,
			0x0e12b4c2, 0x02e1329e, 0xaf664fd1, 0xcad18115,
			0x6b2395e0, 0x333e92e1, 0x3b240b62, 0xeebeb922,
			0x85b2a20e, 0xe6ba0d99, 0xde720c8c, 0x2da2f728,
			0xd0127845, 0x95b794fd, 0x647d0862, 0xe7ccf5f0,
			0x5449a36f, 0x877d48fa, 0xc39dfd27, 0xf33e8d1e,
			0x0a476341, 0x992eff74, 0x3a6f6eab, 0xf4f8fd37,
			0xa812dc60, 0xa1ebddf8, 0x991be14c, 0xdb6e6b0d,
			0xc67b5510, 0x6d672c37, 0x2765d43b, 0xdcd0e804,
			0xf1290dc7, 0xcc00ffa3, 0xb5390f92, 0x690fed0b,
			0x667b9ffb, 0xcedb7d9c, 0xa091cf0b, 0xd9155ea3,
			0xbb132f88, 0x515bad24, 0x7b9479bf, 0x763bd6eb,
			0x37392eb3, 0xcc115979, 0x8026e297, 0xf42e312d,
			0x6842ada7, 0xc66a2b3b, 0x12754ccc, 0x782ef11c,
			0x6a124237, 0xb79251e7, 0x06a1bbe6, 0x4bfb6350,
			0x1a6b1018, 0x11caedfa, 0x3d25bdd8, 0xe2e1c3c9,
			0x44421659, 0x0a121386, 0xd90cec6e, 0xd5abea2a,
			0x64af674e, 0xda86a85f, 0xbebfe988, 0x64e4c3fe,
			0x9dbc8057, 0xf0f7c086, 0x60787bf8, 0x6003604d,
			0xd1fd8346, 0xf6381fb0, 0x7745ae04, 0xd736fccc,
			0x83426b33, 0xf01eab71, 0xb0804187, 0x3c005e5f,
			0x77a057be, 0xbde8ae24, 0x55464299, 0xbf582e61,
			0x4e58f48f, 0xf2ddfda2, 0xf474ef38, 0x8789bdc2,
			0x5366f9c3, 0xc8b38e74, 0xb475f255, 0x46fcd9b9,
			0x7aeb2661, 0x8b1ddf84, 0x846a0e79, 0x915f95e2,
			0x466e598e, 0x20b45770, 0x8cd55591, 0xc902de4c,
			0xb90bace1, 0xbb8205d0, 0x11a86248, 0x7574a99e,
			0xb77f19b6, 0xe0a9dc09, 0x662d09a1, 0xc4324633,
			0xe85a1f02, 0x09f0be8c, 0x4a99a025, 0x1d6efe10,
			0x1ab93d1d, 0x0ba5a4df, 0xa186f20f, 0x2868f169,
			0xdcb7da83, 0x573906fe, 0xa1e2ce9b, 0x4fcd7f52,
			0x50115e01, 0xa70683fa, 0xa002b5c4, 0x0de6d027,
			0x9af88c27, 0x773f8641, 0xc3604c06, 0x61a806b5,
			0xf0177a28, 0xc0f586e0, 0x006058aa, 0x30dc7d62,
			0x11e69ed7, 0x2338ea63, 0x53c2dd94, 0xc2c21634,
			0xbbcbee56, 0x90bcb6de, 0xebfc7da1, 0xce591d76,
			0x6f05e409, 0x4b7c0188, 0x39720a3d, 0x7c927c24,
			0x86e3725f, 0x724d9db9, 0x1ac15bb4, 0xd39eb8fc,
			0xed545578, 0x08fca5b5, 0xd83d7cd3, 0x4dad0fc4,
			0x1e50ef5e, 0xb161e6f8, 0xa28514d9, 0x6c51133c,
			0x6fd5c7e7, 0x56e14ec4, 0x362abfce, 0xddc6c837,
			0xd79a3234, 0x92638212, 0x670efa8e, 0x406000e0
		}, {
			0x3a39ce37, 0xd3faf5cf, 0xabc27737, 0x5ac52d1b,
			0x5cb0679e, 0x4fa33742, 0xd3822740, 0x99bc9bbe,
			0xd5118e9d, 0xbf0f7315, 0xd62d1c7e, 0xc700c47b,
			0xb78c1b6b, 0x21a19045, 0xb26eb1be, 0x6a366eb4,
			0x5748ab2f, 0xbc946e79, 0xc6a376d2, 0x6549c2c8,
			0x530ff8ee, 0x468dde7d, 0xd5730a1d, 0x4cd04dc6,
			0x2939bbdb, 0xa9ba4650, 0xac9526e8, 0xbe5ee304,
			0xa1fad5f0, 0x6a2d519a, 0x63ef8ce2, 0x9a86ee22,
			0xc089c2b8, 0x43242ef6, 0xa51e03aa, 0x9cf2d0a4,
			0x83c061ba, 0x9be96a4d, 0x8fe51550, 0xba645bd6,
			0x2826a2f9, 0xa73a3ae1, 0x4ba99586, 0xef5562e9,
			0xc72fefd3, 0xf752f7da, 0x3f046f69, 0x77fa0a59,
			0x80e4a915, 0x87b08601, 0x9b09e6ad, 0x3b3ee593,
			0xe990fd5a, 0x9e34d797, 0x2cf0b7d9, 0x022b8b51,
			0x96d5ac3a, 0x017da67d, 0xd1cf3ed6, 0x7c7d2d28,
			0x1f9f25cf, 0xadf2b89b, 0x5ad6b472, 0x5a88f54c,
			0xe029ac71, 0xe019a5e6, 0x47b0acfd, 0xed93fa9b,
			0xe8d3c48d, 0x283b57cc, 0xf8d56629, 0x79132e28,
			0x785f0191, 0xed756055, 0xf7960e44, 0xe3d35e8c,
			0x15056dd4, 0x88f46dba, 0x03a16125, 0x0564f0bd,
			0xc3eb9e15, 0x3c9057a2, 0x97271aec, 0xa93a072a,
			0x1b3f6d9b, 0x1e6321f5, 0xf59c66fb, 0x26dcf319,
			0x7533d928, 0xb155fdf5, 0x03563482, 0x8aba3cbb,
			0x28517711, 0xc20ad9f8, 0xabcc5167, 0xccad925f,
			0x4de81751, 0x3830dc8e, 0x379d5862, 0x9320f991,
			0xea7a90c2, 0xfb3e7bce, 0x5121ce64, 0x774fbe32,
			0xa8b6e37e, 0xc3293d46, 0x48de5369, 0x6413e680,
			0xa2ae0810, 0xdd6db224, 0x69852dfd, 0x09072166,
			0xb39a460a, 0x6445c0dd, 0x586cdecf, 0x1c20c8ae,
			0x5bbef7dd, 0x1b588d40, 0xccd2017f, 0x6bb4e3bb,
			0xdda26a7e, 0x3a59ff45, 0x3e350a44, 0xbcb4cdd5,
			0x72eacea8, 0xfa6484bb, 0x8d6612ae, 0xbf3c6f47,
			0xd29be463, 0x542f5d9e, 0xaec2771b, 0xf64e6370,
			0x740e0d8d, 0xe75b1357, 0xf8721671, 0xaf537d5d,
			0x4040cb08, 0x4eb4e2cc, 0x34d2466a, 0x0115af84,
			0xe1b00428, 0x95983a1d, 0x06b89fb4, 0xce6ea048,
			0x6f3f3b82, 0x3520ab82, 0x011a1d4b, 0x277227f8,
			0x611560b1, 0xe7933fdc, 0xbb3a792b, 0x344525bd,
			0xa08839e1, 0x51ce794b, 0x2f32c9b7, 0xa01fbac9,
			0xe01cc87e, 0xbcc7d1f6, 0xcf0111c3, 0xa1e8aac7,
			0x1a908749, 0xd44fbd9a, 0xd0dadecb, 0xd50ada38,
			0x0339c32a, 0xc6913667, 0x8df9317c, 0xe0b12b4f,
			0xf79e59b7, 0x43f5bb3a, 0xf2d519ff, 0x27d9459c,
			0xbf97222c, 0x15e6fc2a, 0x0f91fc71, 0x9b941525,
			0xfae59361, 0xceb69ceb, 0xc2a86459, 0x12baa8d1,
			0xb6c1075e, 0xe3056a0c, 0x10d25065, 0xcb03a442,
			0xe0ec6e0e, 0x1698db3b, 0x4c98a0be, 0x3278e964,
			0x9f1f9532, 0xe0d392df, 0xd3a0342b, 0x8971f21e,
			0x1b0a7441, 0x4ba3348c, 0xc5be7120, 0xc37632d8,
			0xdf359f8d, 0x9b992f2e, 0xe60b6f47, 0x0fe3f11d,
			0xe54cda54, 0x1edad891, 0xce6279cf, 0xcd3e7e6f,
			0x1618b166, 0xfd2c1d05, 0x848fd2c5, 0xf6fb2299,
			0xf523f357, 0xa6327623, 0x93a83531, 0x56cccd02,
			0xacf08162, 0x5a75ebb5, 0x6e163697, 0x88d273cc,
			0xde966292, 0x81b949d0, 0x4c50901b, 0x71c65614,
			0xe6c6c7bd, 0x327a140a, 0x45e1d006, 0xc3f27b9a,
			0xc9aa53fd, 0x62a80f00, 0xbb25bfe2, 0x35bdd2f6,
			0x71126905, 0xb2040222, 0xb6cbcf7c, 0xcd769c2b,
			0x53113ec0, 0x1640e3d3, 0x38abbd60, 0x2547adf0,
			0xba38209c, 0xf746ce76, 0x77afa1c5, 0x20756060,
			0x85cbfe4e, 0x8ae88dd8, 0x7aaaf9b0, 0x4cf9aa7e,
			0x1948c25c, 0x02fb8a8c, 0x01c36ae4, 0xd6ebe1f9,
			0x90d4f869, 0xa65cdea0, 0x3f09252d, 0xc208e69f,
			0xb74e6132, 0xce77e25b, 0x578fdfe3, 0x3ac372e6
		}
	}, {
		0x243f6a88, 0x85a308d3, 0x13198a2e, 0x03707344,
		0xa4093822, 0x299f31d0, 0x082efa98, 0xec4e6c89,
		0x452821e6, 0x38d01377, 0xbe5466cf, 0x34e90c6c,
		0xc0ac29b7, 0xc97c50dd, 0x3f84d5b5, 0xb5470917,
		0x9216d5d9, 0x8979fb1b
	}
};

static const unsigned char BF_itoa64[64 + 1] =
	"./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

static const unsigned char BF_atoi64[0x60] = {
	64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 0, 1,
	54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 64, 64, 64, 64, 64,
	64, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
	17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 64, 64, 64, 64, 64,
	64, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
	43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 64, 64, 64, 64, 64
};

#define BF_safe_atoi64(dst, src) \
{ \
	tmp = (unsigned char)(src); \
	if ((unsigned int)(tmp -= 0x20) >= 0x60) return -1; \
	tmp = BF_atoi64[tmp]; \
	if (tmp > 63) return -1; \
	(dst) = tmp; \
}

static int BF_decode(BF_word *dst, const char *src, int size)
{
	unsigned char *dptr = (unsigned char *)dst;
	unsigned char *end = dptr + size;
	const unsigned char *sptr = (const unsigned char *)src;
	unsigned int tmp, c1, c2, c3, c4;

	do {
		BF_safe_atoi64(c1, *sptr++);
		BF_safe_atoi64(c2, *sptr++);
		*dptr++ = (c1 << 2) | ((c2 & 0x30) >> 4);
		if (dptr >= end) break;

		BF_safe_atoi64(c3, *sptr++);
		*dptr++ = ((c2 & 0x0F) << 4) | ((c3 & 0x3C) >> 2);
		if (dptr >= end) break;

		BF_safe_atoi64(c4, *sptr++);
		*dptr++ = ((c3 & 0x03) << 6) | c4;
	} while (dptr < end);

	return 0;
}

static void BF_encode(char *dst, const BF_word *src, int size)
{
	const unsigned char *sptr = (const unsigned char *)src;
	const unsigned char *end = sptr + size;
	unsigned char *dptr = (unsigned char *)dst;
	unsigned int c1, c2;

	do {
		c1 = *sptr++;
		*dptr++ = BF_itoa64[c1 >> 2];
		c1 = (c1 & 0x03) << 4;
		if (sptr >= end) {
			*dptr++ = BF_itoa64[c1];
			break;
		}

		c2 = *sptr++;
		c1 |= c2 >> 4;
		*dptr++ = BF_itoa64[c1];
		c1 = (c2 & 0x0f) << 2;
		if (sptr >= end) {
			*dptr++ = BF_itoa64[c1];
			break;
		}

		c2 = *sptr++;
		c1 |= c2 >> 6;
		*dptr++ = BF_itoa64[c1];
		*dptr++ = BF_itoa64[c2 & 0x3f];
	} while (sptr < end);
}

static void BF_swap(BF_word *x, int count)
{
	if ((union { int i; char c; }){1}.c)
	do {
		BF_word tmp = *x;
		tmp = (tmp << 16) | (tmp >> 16);
		*x++ = ((tmp & 0x00FF00FF) << 8) | ((tmp >> 8) & 0x00FF00FF);
	} while (--count);
}

#define BF_ROUND(L, R, N) \
	tmp1 = L & 0xFF; \
	tmp2 = L >> 8; \
	tmp2 &= 0xFF; \
	tmp3 = L >> 16; \
	tmp3 &= 0xFF; \
	tmp4 = L >> 24; \
	tmp1 = ctx->S[3][tmp1]; \
	tmp2 = ctx->S[2][tmp2]; \
	tmp3 = ctx->S[1][tmp3]; \
	tmp3 += ctx->S[0][tmp4]; \
	tmp3 ^= tmp2; \
	R ^= ctx->P[N + 1]; \
	tmp3 += tmp1; \
	R ^= tmp3;

static BF_word BF_encrypt(BF_ctx *ctx,
    BF_word L, BF_word R,
    BF_word Lxor, BF_word Rxor,
    BF_word *start, BF_word *end)
{
	BF_word tmp1, tmp2, tmp3, tmp4;
	BF_word *ptr = start;

	do {
		ptr += 2;
		L ^= Lxor ^ ctx->P[0];
		R ^= Rxor;
		BF_ROUND(L, R, 0);
		BF_ROUND(R, L, 1);
		BF_ROUND(L, R, 2);
		BF_ROUND(R, L, 3);
		BF_ROUND(L, R, 4);
		BF_ROUND(R, L, 5);
		BF_ROUND(L, R, 6);
		BF_ROUND(R, L, 7);
		BF_ROUND(L, R, 8);
		BF_ROUND(R, L, 9);
		BF_ROUND(L, R, 10);
		BF_ROUND(R, L, 11);
		BF_ROUND(L, R, 12);
		BF_ROUND(R, L, 13);
		BF_ROUND(L, R, 14);
		BF_ROUND(R, L, 15);
		tmp4 = R;
		R = L;
		L = tmp4 ^ ctx->P[BF_N + 1];
		*(ptr - 1) = R;
		*(ptr - 2) = L;
	} while (ptr < end);

	return L;
}

static void BF_set_key(const char *key, BF_key expanded, BF_key initial,
    unsigned char flags)
{
	const char *ptr = key;
	unsigned int bug, i, j;
	BF_word safety, sign, diff, tmp[2];

/*
 * There was a sign extension bug in older revisions of this function.  While
 * we would have liked to simply fix the bug and move on, we have to provide
 * a backwards compatibility feature (essentially the bug) for some systems and
 * a safety measure for some others.  The latter is needed because for certain
 * multiple inputs to the buggy algorithm there exist easily found inputs to
 * the correct algorithm that produce the same hash.  Thus, we optionally
 * deviate from the correct algorithm just enough to avoid such collisions.
 * While the bug itself affected the majority of passwords containing
 * characters with the 8th bit set (although only a percentage of those in a
 * collision-producing way), the anti-collision safety measure affects
 * only a subset of passwords containing the '\xff' character (not even all of
 * those passwords, just some of them).  This character is not found in valid
 * UTF-8 sequences and is rarely used in popular 8-bit character encodings.
 * Thus, the safety measure is unlikely to cause much annoyance, and is a
 * reasonable tradeoff to use when authenticating against existing hashes that
 * are not reliably known to have been computed with the correct algorithm.
 *
 * We use an approach that tries to minimize side-channel leaks of password
 * information - that is, we mostly use fixed-cost bitwise operations instead
 * of branches or table lookups.  (One conditional branch based on password
 * length remains.  It is not part of the bug aftermath, though, and is
 * difficult and possibly unreasonable to avoid given the use of C strings by
 * the caller, which results in similar timing leaks anyway.)
 *
 * For actual implementation, we set an array index in the variable "bug"
 * (0 means no bug, 1 means sign extension bug emulation) and a flag in the
 * variable "safety" (bit 16 is set when the safety measure is requested).
 * Valid combinations of settings are:
 *
 * Prefix "$2a$": bug = 0, safety = 0x10000
 * Prefix "$2x$": bug = 1, safety = 0
 * Prefix "$2y$": bug = 0, safety = 0
 */
	bug = (unsigned int)flags & 1;
	safety = ((BF_word)flags & 2) << 15;

	sign = diff = 0;

	for (i = 0; i < BF_N + 2; i++) {
		tmp[0] = tmp[1] = 0;
		for (j = 0; j < 4; j++) {
			tmp[0] <<= 8;
			tmp[0] |= (unsigned char)*ptr; /* correct */
			tmp[1] <<= 8;
			tmp[1] |= (BF_word_signed)(signed char)*ptr; /* bug */
/*
 * Sign extension in the first char has no effect - nothing to overwrite yet,
 * and those extra 24 bits will be fully shifted out of the 32-bit word.  For
 * chars 2, 3, 4 in each four-char block, we set bit 7 of "sign" if sign
 * extension in tmp[1] occurs.  Once this flag is set, it remains set.
 */
			if (j)
				sign |= tmp[1] & 0x80;
			if (!*ptr)
				ptr = key;
			else
				ptr++;
		}
		diff |= tmp[0] ^ tmp[1]; /* Non-zero on any differences */

		expanded[i] = tmp[bug];
		initial[i] = BF_init_state.P[i] ^ tmp[bug];
	}

/*
 * At this point, "diff" is zero iff the correct and buggy algorithms produced
 * exactly the same result.  If so and if "sign" is non-zero, which indicates
 * that there was a non-benign sign extension, this means that we have a
 * collision between the correctly computed hash for this password and a set of
 * passwords that could be supplied to the buggy algorithm.  Our safety measure
 * is meant to protect from such many-buggy to one-correct collisions, by
 * deviating from the correct algorithm in such cases.  Let's check for this.
 */
	diff |= diff >> 16; /* still zero iff exact match */
	diff &= 0xffff; /* ditto */
	diff += 0xffff; /* bit 16 set iff "diff" was non-zero (on non-match) */
	sign <<= 9; /* move the non-benign sign extension flag to bit 16 */
	sign &= ~diff & safety; /* action needed? */

/*
 * If we have determined that we need to deviate from the correct algorithm,
 * flip bit 16 in initial expanded key.  (The choice of 16 is arbitrary, but
 * let's stick to it now.  It came out of the approach we used above, and it's
 * not any worse than any other choice we could make.)
 *
 * It is crucial that we don't do the same to the expanded key used in the main
 * Eksblowfish loop.  By doing it to only one of these two, we deviate from a
 * state that could be directly specified by a password to the buggy algorithm
 * (and to the fully correct one as well, but that's a side-effect).
 */
	initial[0] ^= sign;
}

static char *BF_crypt(const char *key, const char *setting,
	char *output, int size,
	BF_word min)
{
	static const unsigned char flags_by_subtype[26] =
		{2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 4, 0};
	struct {
		BF_ctx ctx;
		BF_key expanded_key;
		union {
			BF_word salt[4];
			BF_word output[6];
		} binary;
	} data;
	BF_word L, R;
	BF_word *ptr;
	BF_word count;
	int i;

	if (size < 7 + 22 + 31 + 1) {
		errno = ERANGE;
		return NULL;
	}

	if (setting[0] != '$' ||
	    setting[1] != '2' ||
	    setting[2] < 'a' || setting[2] > 'z' ||
	    !flags_by_subtype[(unsigned int)(unsigned char)setting[2] - 'a'] ||
	    setting[3] != '$' ||
	    setting[4] < '0' || setting[4] > '3' ||
	    setting[5] < '0' || setting[5] > '9' ||
	    (setting[4] == '3' && setting[5] > '1') ||
	    setting[6] != '$') {
		errno = EINVAL;
		return NULL;
	}

	count = (BF_word)1 << ((setting[4] - '0') * 10 + (setting[5] - '0'));
	if (count < min || BF_decode(data.binary.salt, &setting[7], 16)) {
		errno = EINVAL;
		return NULL;
	}
	BF_swap(data.binary.salt, 4);

	BF_set_key(key, data.expanded_key, data.ctx.P,
	    flags_by_subtype[(unsigned int)(unsigned char)setting[2] - 'a']);

	memcpy(data.ctx.S, BF_init_state.S, sizeof(data.ctx.S));

	L = R = 0;
	ptr = &data.ctx.P[0];
	do {
		L = BF_encrypt(&data.ctx, L, R,
		    data.binary.salt[0], data.binary.salt[1], ptr, ptr);
		R = *(ptr + 1);
		ptr += 2;

		if (ptr >= &data.ctx.P[BF_N + 2])
			break;

		L = BF_encrypt(&data.ctx, L, R,
		    data.binary.salt[2], data.binary.salt[3], ptr, ptr);
		R = *(ptr + 1);
		ptr += 2;
	} while (1);

	ptr = data.ctx.S[0];
	do {
		L = BF_encrypt(&data.ctx, L, R,
		    data.binary.salt[(BF_N + 2) & 3],
		    data.binary.salt[(BF_N + 3) & 3],
		    ptr, ptr);
		R = *(ptr + 1);
		ptr += 2;

		L = BF_encrypt(&data.ctx, L, R,
		    data.binary.salt[(BF_N + 4) & 3],
		    data.binary.salt[(BF_N + 5) & 3],
		    ptr, ptr);
		R = *(ptr + 1);
		ptr += 2;
	} while (ptr < &data.ctx.S[3][0xFF]);

	do {
		int done;

		for (i = 0; i < BF_N + 2; i += 2) {
			data.ctx.P[i] ^= data.expanded_key[i];
			data.ctx.P[i + 1] ^= data.expanded_key[i + 1];
		}

		done = 0;
		do {
			BF_word tmp1, tmp2, tmp3, tmp4;

			L = BF_encrypt(&data.ctx, 0, 0, 0, 0,
			    &data.ctx.P[0], &data.ctx.P[BF_N + 2]);
			R = data.ctx.P[BF_N + 1];

			BF_encrypt(&data.ctx, L, R, 0, 0,
			    &data.ctx.S[0][0], &data.ctx.S[3][0xFF]);

			if (done)
				break;
			done = 1;

			tmp1 = data.binary.salt[0];
			tmp2 = data.binary.salt[1];
			tmp3 = data.binary.salt[2];
			tmp4 = data.binary.salt[3];
			for (i = 0; i < BF_N; i += 4) {
				data.ctx.P[i] ^= tmp1;
				data.ctx.P[i + 1] ^= tmp2;
				data.ctx.P[i + 2] ^= tmp3;
				data.ctx.P[i + 3] ^= tmp4;
			}
			data.ctx.P[16] ^= tmp1;
			data.ctx.P[17] ^= tmp2;
		} while (1);
	} while (--count);

	for (i = 0; i < 6; i += 2) {
		BF_word LR[2];

		L = BF_magic_w[i];
		LR[1] = BF_magic_w[i + 1];

		count = 64;
		do {
			L = BF_encrypt(&data.ctx, L, LR[1], 0, 0,
			    &LR[0], &LR[0]);
		} while (--count);

		data.binary.output[i] = L;
		data.binary.output[i + 1] = LR[1];
	}

	memcpy(output, setting, 7 + 22 - 1);
	output[7 + 22 - 1] = BF_itoa64[(int)
		BF_atoi64[(int)setting[7 + 22 - 1] - 0x20] & 0x30];

/* This has to be bug-compatible with the original implementation, so
 * only encode 23 of the 24 bytes. :-) */
	BF_swap(data.binary.output, 6);
	BF_encode(&output[7 + 22], data.binary.output, 23);
	output[7 + 22 + 31] = '\0';

	return output;
}

int _crypt_output_magic(const char *setting, char *output, int size)
{
	if (size < 3)
		return -1;

	output[0] = '*';
	output[1] = '0';
	output[2] = '\0';

	if (setting[0] == '*' && setting[1] == '0')
		output[1] = '1';

	return 0;
}

/*
 * Please preserve the runtime self-test.  It serves two purposes at once:
 *
 * 1. We really can't afford the risk of producing incompatible hashes e.g.
 * when there's something like gcc bug 26587 again, whereas an application or
 * library integrating this code might not also integrate our external tests or
 * it might not run them after every build.  Even if it does, the miscompile
 * might only occur on the production build, but not on a testing build (such
 * as because of different optimization settings).  It is painful to recover
 * from incorrectly-computed hashes - merely fixing whatever broke is not
 * enough.  Thus, a proactive measure like this self-test is needed.
 *
 * 2. We don't want to leave sensitive data from our actual password hash
 * computation on the stack or in registers.  Previous revisions of the code
 * would do explicit cleanups, but simply running the self-test after hash
 * computation is more reliable.
 *
 * The performance cost of this quick self-test is around 0.6% at the "$2a$08"
 * setting.
 */
char *_crypt_blowfish_rn(const char *key, const char *setting,
	char *output, int size)
{
	const char *test_key = "8b \xd0\xc1\xd2\xcf\xcc\xd8";
	const char *test_setting = "$2a$00$abcdefghijklmnopqrstuu";
	static const char * const test_hash[2] =
		{"VUrPmXD6q/nVSSp7pNDhCR9071IfIRe\0\x55", /* $2x$ */
		"i1D709vfamulimlGcq0qq3UvuUasvEa\0\x55"}; /* $2a$, $2y$ */
	char *retval;
	const char *p;
	int save_errno, ok;
	struct {
		char s[7 + 22 + 1];
		char o[7 + 22 + 31 + 1 + 1 + 1];
	} buf;

/* Hash the supplied password */
	_crypt_output_magic(setting, output, size);
	retval = BF_crypt(key, setting, output, size, 16);
	save_errno = errno;

/*
 * Do a quick self-test.  It is important that we make both calls to BF_crypt()
 * from the same scope such that they likely use the same stack locations,
 * which makes the second call overwrite the first call's sensitive data on the
 * stack and makes it more likely that any alignment related issues would be
 * detected by the self-test.
 */
	memcpy(buf.s, test_setting, sizeof(buf.s));
	if (retval)
		buf.s[2] = setting[2];
	memset(buf.o, 0x55, sizeof(buf.o));
	buf.o[sizeof(buf.o) - 1] = 0;
	p = BF_crypt(test_key, buf.s, buf.o, sizeof(buf.o) - (1 + 1), 1);

	ok = (p == buf.o &&
	    !memcmp(p, buf.s, 7 + 22) &&
	    !memcmp(p + (7 + 22),
	    test_hash[(unsigned int)(unsigned char)buf.s[2] & 1],
	    31 + 1 + 1 + 1));

	{
		const char *k = "\xff\xa3" "34" "\xff\xff\xff\xa3" "345";
		BF_key ae, ai, ye, yi;
		BF_set_key(k, ae, ai, 2); /* $2a$ */
		BF_set_key(k, ye, yi, 4); /* $2y$ */
		ai[0] ^= 0x10000; /* undo the safety (for comparison) */
		ok = ok && ai[0] == 0xdb9c59bc && ye[17] == 0x33343500 &&
		    !memcmp(ae, ye, sizeof(ae)) &&
		    !memcmp(ai, yi, sizeof(ai));
	}

	errno = save_errno;
	if (ok)
		return retval;

/* Should not happen */
	_crypt_output_magic(setting, output, size);
	errno = EINVAL; /* pretend we don't support this hash type */
	return NULL;
}

char *_crypt_gensalt_blowfish_rn(const char *prefix, unsigned long count,
	const char *input, int size, char *output, int output_size)
{
	if (size < 16 || output_size < 7 + 22 + 1 ||
	    (count && (count < 4 || count > 31)) ||
	    prefix[0] != '$' || prefix[1] != '2' ||
	    (prefix[2] != 'a' && prefix[2] != 'y')) {
		if (output_size > 0) output[0] = '\0';
		errno = (output_size < 7 + 22 + 1) ? ERANGE : EINVAL;
		return NULL;
	}

	if (!count) count = 5;

	output[0] = '$';
	output[1] = '2';
	output[2] = prefix[2];
	output[3] = '$';
	output[4] = '0' + count / 10;
	output[5] = '0' + count % 10;
	output[6] = '$';

	BF_encode(&output[7], (const BF_word *)input, 16);
	output[7 + 22] = '\0';

	return output;
}

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-09  7:29               ` Solar Designer
@ 2012-08-09 10:53                 ` Solar Designer
  2012-08-09 11:58                   ` Szabolcs Nagy
  2012-08-09 21:46                   ` crypt_blowfish integration, optimization Rich Felker
  0 siblings, 2 replies; 52+ messages in thread
From: Solar Designer @ 2012-08-09 10:53 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 297 bytes --]

Rich -

On Thu, Aug 09, 2012 at 11:29:40AM +0400, Solar Designer wrote:
> Attached is the smaller and faster code, as discussed on IRC.
> 
> This is under 8 KB.  The speed is similar to the original, I measured
> -3% to +2% on different systems/builds.

Here's an even smaller version.

Alexander

[-- Attachment #2: crypt_blowfish.c --]
[-- Type: text/plain, Size: 30859 bytes --]

/*
 * The crypt_blowfish homepage is:
 *
 *	http://www.openwall.com/crypt/
 *
 * This code comes from John the Ripper password cracker, with reentrant
 * and crypt(3) interfaces added, but optimizations specific to password
 * cracking removed.
 *
 * Written by Solar Designer <solar at openwall.com> in 1998-2012.
 * No copyright is claimed, and the software is hereby placed in the public
 * domain.  In case this attempt to disclaim copyright and place the software
 * in the public domain is deemed null and void, then the software is
 * Copyright (c) 1998-2012 Solar Designer and it is hereby released to the
 * general public under the following terms:
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted.
 *
 * There's ABSOLUTELY NO WARRANTY, express or implied.
 *
 * It is my intent that you should be able to use this on your system,
 * as part of a software package, or anywhere else to improve security,
 * ensure compatibility, or for any other purpose.  I would appreciate
 * it if you give credit where it is due and keep your modifications in
 * the public domain as well, but I don't require that in order to let
 * you place this code and any modifications you make under a license
 * of your choice.
 *
 * This implementation is mostly compatible with OpenBSD's bcrypt.c (prefix
 * "$2a$") by Niels Provos <provos at citi.umich.edu>, and uses some of his
 * ideas.  The password hashing algorithm was designed by David Mazieres
 * <dm at lcs.mit.edu>.  For more information on the level of compatibility,
 * please refer to the comments in BF_set_key() below and to the included
 * crypt(3) man page.
 *
 * There's a paper on the algorithm that explains its design decisions:
 *
 *	http://www.usenix.org/events/usenix99/provos.html
 *
 * Some of the tricks in BF_ROUND might be inspired by Eric Young's
 * Blowfish library (I can't be sure if I would think of something if I
 * hadn't seen his code).
 */

#include <string.h>
#include <errno.h>

/* Just to make sure the prototypes match the actual definitions */
#include "crypt_blowfish.h"

typedef unsigned int BF_word;
typedef signed int BF_word_signed;

/* Number of Blowfish rounds, this is also hardcoded into a few places */
#define BF_N				16

typedef BF_word BF_key[BF_N + 2];

typedef union {
	struct {
		BF_key P;
		BF_word S[4][0x100];
	} s;
	BF_word PS[BF_N + 2 + 4 * 0x100];
} BF_ctx;

/*
 * Magic IV for 64 Blowfish encryptions that we do at the end.
 * The string is "OrpheanBeholderScryDoubt" on big-endian.
 */
static const BF_word BF_magic_w[6] = {
	0x4F727068, 0x65616E42, 0x65686F6C,
	0x64657253, 0x63727944, 0x6F756274
};

/*
 * P-box and S-box tables initialized with digits of Pi.
 */
static const BF_ctx BF_init_state = {{
	{
		0x243f6a88, 0x85a308d3, 0x13198a2e, 0x03707344,
		0xa4093822, 0x299f31d0, 0x082efa98, 0xec4e6c89,
		0x452821e6, 0x38d01377, 0xbe5466cf, 0x34e90c6c,
		0xc0ac29b7, 0xc97c50dd, 0x3f84d5b5, 0xb5470917,
		0x9216d5d9, 0x8979fb1b
	}, {
		{
			0xd1310ba6, 0x98dfb5ac, 0x2ffd72db, 0xd01adfb7,
			0xb8e1afed, 0x6a267e96, 0xba7c9045, 0xf12c7f99,
			0x24a19947, 0xb3916cf7, 0x0801f2e2, 0x858efc16,
			0x636920d8, 0x71574e69, 0xa458fea3, 0xf4933d7e,
			0x0d95748f, 0x728eb658, 0x718bcd58, 0x82154aee,
			0x7b54a41d, 0xc25a59b5, 0x9c30d539, 0x2af26013,
			0xc5d1b023, 0x286085f0, 0xca417918, 0xb8db38ef,
			0x8e79dcb0, 0x603a180e, 0x6c9e0e8b, 0xb01e8a3e,
			0xd71577c1, 0xbd314b27, 0x78af2fda, 0x55605c60,
			0xe65525f3, 0xaa55ab94, 0x57489862, 0x63e81440,
			0x55ca396a, 0x2aab10b6, 0xb4cc5c34, 0x1141e8ce,
			0xa15486af, 0x7c72e993, 0xb3ee1411, 0x636fbc2a,
			0x2ba9c55d, 0x741831f6, 0xce5c3e16, 0x9b87931e,
			0xafd6ba33, 0x6c24cf5c, 0x7a325381, 0x28958677,
			0x3b8f4898, 0x6b4bb9af, 0xc4bfe81b, 0x66282193,
			0x61d809cc, 0xfb21a991, 0x487cac60, 0x5dec8032,
			0xef845d5d, 0xe98575b1, 0xdc262302, 0xeb651b88,
			0x23893e81, 0xd396acc5, 0x0f6d6ff3, 0x83f44239,
			0x2e0b4482, 0xa4842004, 0x69c8f04a, 0x9e1f9b5e,
			0x21c66842, 0xf6e96c9a, 0x670c9c61, 0xabd388f0,
			0x6a51a0d2, 0xd8542f68, 0x960fa728, 0xab5133a3,
			0x6eef0b6c, 0x137a3be4, 0xba3bf050, 0x7efb2a98,
			0xa1f1651d, 0x39af0176, 0x66ca593e, 0x82430e88,
			0x8cee8619, 0x456f9fb4, 0x7d84a5c3, 0x3b8b5ebe,
			0xe06f75d8, 0x85c12073, 0x401a449f, 0x56c16aa6,
			0x4ed3aa62, 0x363f7706, 0x1bfedf72, 0x429b023d,
			0x37d0d724, 0xd00a1248, 0xdb0fead3, 0x49f1c09b,
			0x075372c9, 0x80991b7b, 0x25d479d8, 0xf6e8def7,
			0xe3fe501a, 0xb6794c3b, 0x976ce0bd, 0x04c006ba,
			0xc1a94fb6, 0x409f60c4, 0x5e5c9ec2, 0x196a2463,
			0x68fb6faf, 0x3e6c53b5, 0x1339b2eb, 0x3b52ec6f,
			0x6dfc511f, 0x9b30952c, 0xcc814544, 0xaf5ebd09,
			0xbee3d004, 0xde334afd, 0x660f2807, 0x192e4bb3,
			0xc0cba857, 0x45c8740f, 0xd20b5f39, 0xb9d3fbdb,
			0x5579c0bd, 0x1a60320a, 0xd6a100c6, 0x402c7279,
			0x679f25fe, 0xfb1fa3cc, 0x8ea5e9f8, 0xdb3222f8,
			0x3c7516df, 0xfd616b15, 0x2f501ec8, 0xad0552ab,
			0x323db5fa, 0xfd238760, 0x53317b48, 0x3e00df82,
			0x9e5c57bb, 0xca6f8ca0, 0x1a87562e, 0xdf1769db,
			0xd542a8f6, 0x287effc3, 0xac6732c6, 0x8c4f5573,
			0x695b27b0, 0xbbca58c8, 0xe1ffa35d, 0xb8f011a0,
			0x10fa3d98, 0xfd2183b8, 0x4afcb56c, 0x2dd1d35b,
			0x9a53e479, 0xb6f84565, 0xd28e49bc, 0x4bfb9790,
			0xe1ddf2da, 0xa4cb7e33, 0x62fb1341, 0xcee4c6e8,
			0xef20cada, 0x36774c01, 0xd07e9efe, 0x2bf11fb4,
			0x95dbda4d, 0xae909198, 0xeaad8e71, 0x6b93d5a0,
			0xd08ed1d0, 0xafc725e0, 0x8e3c5b2f, 0x8e7594b7,
			0x8ff6e2fb, 0xf2122b64, 0x8888b812, 0x900df01c,
			0x4fad5ea0, 0x688fc31c, 0xd1cff191, 0xb3a8c1ad,
			0x2f2f2218, 0xbe0e1777, 0xea752dfe, 0x8b021fa1,
			0xe5a0cc0f, 0xb56f74e8, 0x18acf3d6, 0xce89e299,
			0xb4a84fe0, 0xfd13e0b7, 0x7cc43b81, 0xd2ada8d9,
			0x165fa266, 0x80957705, 0x93cc7314, 0x211a1477,
			0xe6ad2065, 0x77b5fa86, 0xc75442f5, 0xfb9d35cf,
			0xebcdaf0c, 0x7b3e89a0, 0xd6411bd3, 0xae1e7e49,
			0x00250e2d, 0x2071b35e, 0x226800bb, 0x57b8e0af,
			0x2464369b, 0xf009b91e, 0x5563911d, 0x59dfa6aa,
			0x78c14389, 0xd95a537f, 0x207d5ba2, 0x02e5b9c5,
			0x83260376, 0x6295cfa9, 0x11c81968, 0x4e734a41,
			0xb3472dca, 0x7b14a94a, 0x1b510052, 0x9a532915,
			0xd60f573f, 0xbc9bc6e4, 0x2b60a476, 0x81e67400,
			0x08ba6fb5, 0x571be91f, 0xf296ec6b, 0x2a0dd915,
			0xb6636521, 0xe7b9f9b6, 0xff34052e, 0xc5855664,
			0x53b02d5d, 0xa99f8fa1, 0x08ba4799, 0x6e85076a
		}, {
			0x4b7a70e9, 0xb5b32944, 0xdb75092e, 0xc4192623,
			0xad6ea6b0, 0x49a7df7d, 0x9cee60b8, 0x8fedb266,
			0xecaa8c71, 0x699a17ff, 0x5664526c, 0xc2b19ee1,
			0x193602a5, 0x75094c29, 0xa0591340, 0xe4183a3e,
			0x3f54989a, 0x5b429d65, 0x6b8fe4d6, 0x99f73fd6,
			0xa1d29c07, 0xefe830f5, 0x4d2d38e6, 0xf0255dc1,
			0x4cdd2086, 0x8470eb26, 0x6382e9c6, 0x021ecc5e,
			0x09686b3f, 0x3ebaefc9, 0x3c971814, 0x6b6a70a1,
			0x687f3584, 0x52a0e286, 0xb79c5305, 0xaa500737,
			0x3e07841c, 0x7fdeae5c, 0x8e7d44ec, 0x5716f2b8,
			0xb03ada37, 0xf0500c0d, 0xf01c1f04, 0x0200b3ff,
			0xae0cf51a, 0x3cb574b2, 0x25837a58, 0xdc0921bd,
			0xd19113f9, 0x7ca92ff6, 0x94324773, 0x22f54701,
			0x3ae5e581, 0x37c2dadc, 0xc8b57634, 0x9af3dda7,
			0xa9446146, 0x0fd0030e, 0xecc8c73e, 0xa4751e41,
			0xe238cd99, 0x3bea0e2f, 0x3280bba1, 0x183eb331,
			0x4e548b38, 0x4f6db908, 0x6f420d03, 0xf60a04bf,
			0x2cb81290, 0x24977c79, 0x5679b072, 0xbcaf89af,
			0xde9a771f, 0xd9930810, 0xb38bae12, 0xdccf3f2e,
			0x5512721f, 0x2e6b7124, 0x501adde6, 0x9f84cd87,
			0x7a584718, 0x7408da17, 0xbc9f9abc, 0xe94b7d8c,
			0xec7aec3a, 0xdb851dfa, 0x63094366, 0xc464c3d2,
			0xef1c1847, 0x3215d908, 0xdd433b37, 0x24c2ba16,
			0x12a14d43, 0x2a65c451, 0x50940002, 0x133ae4dd,
			0x71dff89e, 0x10314e55, 0x81ac77d6, 0x5f11199b,
			0x043556f1, 0xd7a3c76b, 0x3c11183b, 0x5924a509,
			0xf28fe6ed, 0x97f1fbfa, 0x9ebabf2c, 0x1e153c6e,
			0x86e34570, 0xeae96fb1, 0x860e5e0a, 0x5a3e2ab3,
			0x771fe71c, 0x4e3d06fa, 0x2965dcb9, 0x99e71d0f,
			0x803e89d6, 0x5266c825, 0x2e4cc978, 0x9c10b36a,
			0xc6150eba, 0x94e2ea78, 0xa5fc3c53, 0x1e0a2df4,
			0xf2f74ea7, 0x361d2b3d, 0x1939260f, 0x19c27960,
			0x5223a708, 0xf71312b6, 0xebadfe6e, 0xeac31f66,
			0xe3bc4595, 0xa67bc883, 0xb17f37d1, 0x018cff28,
			0xc332ddef, 0xbe6c5aa5, 0x65582185, 0x68ab9802,
			0xeecea50f, 0xdb2f953b, 0x2aef7dad, 0x5b6e2f84,
			0x1521b628, 0x29076170, 0xecdd4775, 0x619f1510,
			0x13cca830, 0xeb61bd96, 0x0334fe1e, 0xaa0363cf,
			0xb5735c90, 0x4c70a239, 0xd59e9e0b, 0xcbaade14,
			0xeecc86bc, 0x60622ca7, 0x9cab5cab, 0xb2f3846e,
			0x648b1eaf, 0x19bdf0ca, 0xa02369b9, 0x655abb50,
			0x40685a32, 0x3c2ab4b3, 0x319ee9d5, 0xc021b8f7,
			0x9b540b19, 0x875fa099, 0x95f7997e, 0x623d7da8,
			0xf837889a, 0x97e32d77, 0x11ed935f, 0x16681281,
			0x0e358829, 0xc7e61fd6, 0x96dedfa1, 0x7858ba99,
			0x57f584a5, 0x1b227263, 0x9b83c3ff, 0x1ac24696,
			0xcdb30aeb, 0x532e3054, 0x8fd948e4, 0x6dbc3128,
			0x58ebf2ef, 0x34c6ffea, 0xfe28ed61, 0xee7c3c73,
			0x5d4a14d9, 0xe864b7e3, 0x42105d14, 0x203e13e0,
			0x45eee2b6, 0xa3aaabea, 0xdb6c4f15, 0xfacb4fd0,
			0xc742f442, 0xef6abbb5, 0x654f3b1d, 0x41cd2105,
			0xd81e799e, 0x86854dc7, 0xe44b476a, 0x3d816250,
			0xcf62a1f2, 0x5b8d2646, 0xfc8883a0, 0xc1c7b6a3,
			0x7f1524c3, 0x69cb7492, 0x47848a0b, 0x5692b285,
			0x095bbf00, 0xad19489d, 0x1462b174, 0x23820e00,
			0x58428d2a, 0x0c55f5ea, 0x1dadf43e, 0x233f7061,
			0x3372f092, 0x8d937e41, 0xd65fecf1, 0x6c223bdb,
			0x7cde3759, 0xcbee7460, 0x4085f2a7, 0xce77326e,
			0xa6078084, 0x19f8509e, 0xe8efd855, 0x61d99735,
			0xa969a7aa, 0xc50c06c2, 0x5a04abfc, 0x800bcadc,
			0x9e447a2e, 0xc3453484, 0xfdd56705, 0x0e1e9ec9,
			0xdb73dbd3, 0x105588cd, 0x675fda79, 0xe3674340,
			0xc5c43465, 0x713e38d8, 0x3d28f89e, 0xf16dff20,
			0x153e21e7, 0x8fb03d4a, 0xe6e39f2b, 0xdb83adf7
		}, {
			0xe93d5a68, 0x948140f7, 0xf64c261c, 0x94692934,
			0x411520f7, 0x7602d4f7, 0xbcf46b2e, 0xd4a20068,
			0xd4082471, 0x3320f46a, 0x43b7d4b7, 0x500061af,
			0x1e39f62e, 0x97244546, 0x14214f74, 0xbf8b8840,
			0x4d95fc1d, 0x96b591af, 0x70f4ddd3, 0x66a02f45,
			0xbfbc09ec, 0x03bd9785, 0x7fac6dd0, 0x31cb8504,
			0x96eb27b3, 0x55fd3941, 0xda2547e6, 0xabca0a9a,
			0x28507825, 0x530429f4, 0x0a2c86da, 0xe9b66dfb,
			0x68dc1462, 0xd7486900, 0x680ec0a4, 0x27a18dee,
			0x4f3ffea2, 0xe887ad8c, 0xb58ce006, 0x7af4d6b6,
			0xaace1e7c, 0xd3375fec, 0xce78a399, 0x406b2a42,
			0x20fe9e35, 0xd9f385b9, 0xee39d7ab, 0x3b124e8b,
			0x1dc9faf7, 0x4b6d1856, 0x26a36631, 0xeae397b2,
			0x3a6efa74, 0xdd5b4332, 0x6841e7f7, 0xca7820fb,
			0xfb0af54e, 0xd8feb397, 0x454056ac, 0xba489527,
			0x55533a3a, 0x20838d87, 0xfe6ba9b7, 0xd096954b,
			0x55a867bc, 0xa1159a58, 0xcca92963, 0x99e1db33,
			0xa62a4a56, 0x3f3125f9, 0x5ef47e1c, 0x9029317c,
			0xfdf8e802, 0x04272f70, 0x80bb155c, 0x05282ce3,
			0x95c11548, 0xe4c66d22, 0x48c1133f, 0xc70f86dc,
			0x07f9c9ee, 0x41041f0f, 0x404779a4, 0x5d886e17,
			0x325f51eb, 0xd59bc0d1, 0xf2bcc18f, 0x41113564,
			0x257b7834, 0x602a9c60, 0xdff8e8a3, 0x1f636c1b,
			0x0e12b4c2, 0x02e1329e, 0xaf664fd1, 0xcad18115,
			0x6b2395e0, 0x333e92e1, 0x3b240b62, 0xeebeb922,
			0x85b2a20e, 0xe6ba0d99, 0xde720c8c, 0x2da2f728,
			0xd0127845, 0x95b794fd, 0x647d0862, 0xe7ccf5f0,
			0x5449a36f, 0x877d48fa, 0xc39dfd27, 0xf33e8d1e,
			0x0a476341, 0x992eff74, 0x3a6f6eab, 0xf4f8fd37,
			0xa812dc60, 0xa1ebddf8, 0x991be14c, 0xdb6e6b0d,
			0xc67b5510, 0x6d672c37, 0x2765d43b, 0xdcd0e804,
			0xf1290dc7, 0xcc00ffa3, 0xb5390f92, 0x690fed0b,
			0x667b9ffb, 0xcedb7d9c, 0xa091cf0b, 0xd9155ea3,
			0xbb132f88, 0x515bad24, 0x7b9479bf, 0x763bd6eb,
			0x37392eb3, 0xcc115979, 0x8026e297, 0xf42e312d,
			0x6842ada7, 0xc66a2b3b, 0x12754ccc, 0x782ef11c,
			0x6a124237, 0xb79251e7, 0x06a1bbe6, 0x4bfb6350,
			0x1a6b1018, 0x11caedfa, 0x3d25bdd8, 0xe2e1c3c9,
			0x44421659, 0x0a121386, 0xd90cec6e, 0xd5abea2a,
			0x64af674e, 0xda86a85f, 0xbebfe988, 0x64e4c3fe,
			0x9dbc8057, 0xf0f7c086, 0x60787bf8, 0x6003604d,
			0xd1fd8346, 0xf6381fb0, 0x7745ae04, 0xd736fccc,
			0x83426b33, 0xf01eab71, 0xb0804187, 0x3c005e5f,
			0x77a057be, 0xbde8ae24, 0x55464299, 0xbf582e61,
			0x4e58f48f, 0xf2ddfda2, 0xf474ef38, 0x8789bdc2,
			0x5366f9c3, 0xc8b38e74, 0xb475f255, 0x46fcd9b9,
			0x7aeb2661, 0x8b1ddf84, 0x846a0e79, 0x915f95e2,
			0x466e598e, 0x20b45770, 0x8cd55591, 0xc902de4c,
			0xb90bace1, 0xbb8205d0, 0x11a86248, 0x7574a99e,
			0xb77f19b6, 0xe0a9dc09, 0x662d09a1, 0xc4324633,
			0xe85a1f02, 0x09f0be8c, 0x4a99a025, 0x1d6efe10,
			0x1ab93d1d, 0x0ba5a4df, 0xa186f20f, 0x2868f169,
			0xdcb7da83, 0x573906fe, 0xa1e2ce9b, 0x4fcd7f52,
			0x50115e01, 0xa70683fa, 0xa002b5c4, 0x0de6d027,
			0x9af88c27, 0x773f8641, 0xc3604c06, 0x61a806b5,
			0xf0177a28, 0xc0f586e0, 0x006058aa, 0x30dc7d62,
			0x11e69ed7, 0x2338ea63, 0x53c2dd94, 0xc2c21634,
			0xbbcbee56, 0x90bcb6de, 0xebfc7da1, 0xce591d76,
			0x6f05e409, 0x4b7c0188, 0x39720a3d, 0x7c927c24,
			0x86e3725f, 0x724d9db9, 0x1ac15bb4, 0xd39eb8fc,
			0xed545578, 0x08fca5b5, 0xd83d7cd3, 0x4dad0fc4,
			0x1e50ef5e, 0xb161e6f8, 0xa28514d9, 0x6c51133c,
			0x6fd5c7e7, 0x56e14ec4, 0x362abfce, 0xddc6c837,
			0xd79a3234, 0x92638212, 0x670efa8e, 0x406000e0
		}, {
			0x3a39ce37, 0xd3faf5cf, 0xabc27737, 0x5ac52d1b,
			0x5cb0679e, 0x4fa33742, 0xd3822740, 0x99bc9bbe,
			0xd5118e9d, 0xbf0f7315, 0xd62d1c7e, 0xc700c47b,
			0xb78c1b6b, 0x21a19045, 0xb26eb1be, 0x6a366eb4,
			0x5748ab2f, 0xbc946e79, 0xc6a376d2, 0x6549c2c8,
			0x530ff8ee, 0x468dde7d, 0xd5730a1d, 0x4cd04dc6,
			0x2939bbdb, 0xa9ba4650, 0xac9526e8, 0xbe5ee304,
			0xa1fad5f0, 0x6a2d519a, 0x63ef8ce2, 0x9a86ee22,
			0xc089c2b8, 0x43242ef6, 0xa51e03aa, 0x9cf2d0a4,
			0x83c061ba, 0x9be96a4d, 0x8fe51550, 0xba645bd6,
			0x2826a2f9, 0xa73a3ae1, 0x4ba99586, 0xef5562e9,
			0xc72fefd3, 0xf752f7da, 0x3f046f69, 0x77fa0a59,
			0x80e4a915, 0x87b08601, 0x9b09e6ad, 0x3b3ee593,
			0xe990fd5a, 0x9e34d797, 0x2cf0b7d9, 0x022b8b51,
			0x96d5ac3a, 0x017da67d, 0xd1cf3ed6, 0x7c7d2d28,
			0x1f9f25cf, 0xadf2b89b, 0x5ad6b472, 0x5a88f54c,
			0xe029ac71, 0xe019a5e6, 0x47b0acfd, 0xed93fa9b,
			0xe8d3c48d, 0x283b57cc, 0xf8d56629, 0x79132e28,
			0x785f0191, 0xed756055, 0xf7960e44, 0xe3d35e8c,
			0x15056dd4, 0x88f46dba, 0x03a16125, 0x0564f0bd,
			0xc3eb9e15, 0x3c9057a2, 0x97271aec, 0xa93a072a,
			0x1b3f6d9b, 0x1e6321f5, 0xf59c66fb, 0x26dcf319,
			0x7533d928, 0xb155fdf5, 0x03563482, 0x8aba3cbb,
			0x28517711, 0xc20ad9f8, 0xabcc5167, 0xccad925f,
			0x4de81751, 0x3830dc8e, 0x379d5862, 0x9320f991,
			0xea7a90c2, 0xfb3e7bce, 0x5121ce64, 0x774fbe32,
			0xa8b6e37e, 0xc3293d46, 0x48de5369, 0x6413e680,
			0xa2ae0810, 0xdd6db224, 0x69852dfd, 0x09072166,
			0xb39a460a, 0x6445c0dd, 0x586cdecf, 0x1c20c8ae,
			0x5bbef7dd, 0x1b588d40, 0xccd2017f, 0x6bb4e3bb,
			0xdda26a7e, 0x3a59ff45, 0x3e350a44, 0xbcb4cdd5,
			0x72eacea8, 0xfa6484bb, 0x8d6612ae, 0xbf3c6f47,
			0xd29be463, 0x542f5d9e, 0xaec2771b, 0xf64e6370,
			0x740e0d8d, 0xe75b1357, 0xf8721671, 0xaf537d5d,
			0x4040cb08, 0x4eb4e2cc, 0x34d2466a, 0x0115af84,
			0xe1b00428, 0x95983a1d, 0x06b89fb4, 0xce6ea048,
			0x6f3f3b82, 0x3520ab82, 0x011a1d4b, 0x277227f8,
			0x611560b1, 0xe7933fdc, 0xbb3a792b, 0x344525bd,
			0xa08839e1, 0x51ce794b, 0x2f32c9b7, 0xa01fbac9,
			0xe01cc87e, 0xbcc7d1f6, 0xcf0111c3, 0xa1e8aac7,
			0x1a908749, 0xd44fbd9a, 0xd0dadecb, 0xd50ada38,
			0x0339c32a, 0xc6913667, 0x8df9317c, 0xe0b12b4f,
			0xf79e59b7, 0x43f5bb3a, 0xf2d519ff, 0x27d9459c,
			0xbf97222c, 0x15e6fc2a, 0x0f91fc71, 0x9b941525,
			0xfae59361, 0xceb69ceb, 0xc2a86459, 0x12baa8d1,
			0xb6c1075e, 0xe3056a0c, 0x10d25065, 0xcb03a442,
			0xe0ec6e0e, 0x1698db3b, 0x4c98a0be, 0x3278e964,
			0x9f1f9532, 0xe0d392df, 0xd3a0342b, 0x8971f21e,
			0x1b0a7441, 0x4ba3348c, 0xc5be7120, 0xc37632d8,
			0xdf359f8d, 0x9b992f2e, 0xe60b6f47, 0x0fe3f11d,
			0xe54cda54, 0x1edad891, 0xce6279cf, 0xcd3e7e6f,
			0x1618b166, 0xfd2c1d05, 0x848fd2c5, 0xf6fb2299,
			0xf523f357, 0xa6327623, 0x93a83531, 0x56cccd02,
			0xacf08162, 0x5a75ebb5, 0x6e163697, 0x88d273cc,
			0xde966292, 0x81b949d0, 0x4c50901b, 0x71c65614,
			0xe6c6c7bd, 0x327a140a, 0x45e1d006, 0xc3f27b9a,
			0xc9aa53fd, 0x62a80f00, 0xbb25bfe2, 0x35bdd2f6,
			0x71126905, 0xb2040222, 0xb6cbcf7c, 0xcd769c2b,
			0x53113ec0, 0x1640e3d3, 0x38abbd60, 0x2547adf0,
			0xba38209c, 0xf746ce76, 0x77afa1c5, 0x20756060,
			0x85cbfe4e, 0x8ae88dd8, 0x7aaaf9b0, 0x4cf9aa7e,
			0x1948c25c, 0x02fb8a8c, 0x01c36ae4, 0xd6ebe1f9,
			0x90d4f869, 0xa65cdea0, 0x3f09252d, 0xc208e69f,
			0xb74e6132, 0xce77e25b, 0x578fdfe3, 0x3ac372e6
		}
	}
}};

static const unsigned char BF_itoa64[64 + 1] =
	"./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

static const unsigned char BF_atoi64[0x60] = {
	64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 0, 1,
	54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 64, 64, 64, 64, 64,
	64, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
	17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 64, 64, 64, 64, 64,
	64, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
	43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 64, 64, 64, 64, 64
};

#define BF_safe_atoi64(dst, src) \
{ \
	tmp = (unsigned char)(src); \
	if ((unsigned int)(tmp -= 0x20) >= 0x60) return -1; \
	tmp = BF_atoi64[tmp]; \
	if (tmp > 63) return -1; \
	(dst) = tmp; \
}

static int BF_decode(BF_word *dst, const char *src, int size)
{
	unsigned char *dptr = (unsigned char *)dst;
	unsigned char *end = dptr + size;
	const unsigned char *sptr = (const unsigned char *)src;
	unsigned int tmp, c1, c2, c3, c4;

	do {
		BF_safe_atoi64(c1, *sptr++);
		BF_safe_atoi64(c2, *sptr++);
		*dptr++ = (c1 << 2) | ((c2 & 0x30) >> 4);
		if (dptr >= end) break;

		BF_safe_atoi64(c3, *sptr++);
		*dptr++ = ((c2 & 0x0F) << 4) | ((c3 & 0x3C) >> 2);
		if (dptr >= end) break;

		BF_safe_atoi64(c4, *sptr++);
		*dptr++ = ((c3 & 0x03) << 6) | c4;
	} while (dptr < end);

	return 0;
}

static void BF_encode(char *dst, const BF_word *src, int size)
{
	const unsigned char *sptr = (const unsigned char *)src;
	const unsigned char *end = sptr + size;
	unsigned char *dptr = (unsigned char *)dst;
	unsigned int c1, c2;

	do {
		c1 = *sptr++;
		*dptr++ = BF_itoa64[c1 >> 2];
		c1 = (c1 & 0x03) << 4;
		if (sptr >= end) {
			*dptr++ = BF_itoa64[c1];
			break;
		}

		c2 = *sptr++;
		c1 |= c2 >> 4;
		*dptr++ = BF_itoa64[c1];
		c1 = (c2 & 0x0f) << 2;
		if (sptr >= end) {
			*dptr++ = BF_itoa64[c1];
			break;
		}

		c2 = *sptr++;
		c1 |= c2 >> 6;
		*dptr++ = BF_itoa64[c1];
		*dptr++ = BF_itoa64[c2 & 0x3f];
	} while (sptr < end);
}

static void BF_swap(BF_word *x, int count)
{
	if ((union { int i; char c; }){1}.c)
	do {
		BF_word tmp = *x;
		tmp = (tmp << 16) | (tmp >> 16);
		*x++ = ((tmp & 0x00FF00FF) << 8) | ((tmp >> 8) & 0x00FF00FF);
	} while (--count);
}

#define BF_ROUND(L, R, N) \
	tmp1 = L & 0xFF; \
	tmp2 = L >> 8; \
	tmp2 &= 0xFF; \
	tmp3 = L >> 16; \
	tmp3 &= 0xFF; \
	tmp4 = L >> 24; \
	tmp1 = ctx->s.S[3][tmp1]; \
	tmp2 = ctx->s.S[2][tmp2]; \
	tmp3 = ctx->s.S[1][tmp3]; \
	tmp3 += ctx->s.S[0][tmp4]; \
	tmp3 ^= tmp2; \
	R ^= ctx->s.P[N + 1]; \
	tmp3 += tmp1; \
	R ^= tmp3;

static BF_word BF_encrypt(BF_ctx *ctx,
    BF_word L, BF_word R,
    BF_word *start, BF_word *end)
{
	BF_word tmp1, tmp2, tmp3, tmp4;
	BF_word *ptr = start;

	do {
		ptr += 2;
		L ^= ctx->s.P[0];
		BF_ROUND(L, R, 0);
		BF_ROUND(R, L, 1);
		BF_ROUND(L, R, 2);
		BF_ROUND(R, L, 3);
		BF_ROUND(L, R, 4);
		BF_ROUND(R, L, 5);
		BF_ROUND(L, R, 6);
		BF_ROUND(R, L, 7);
		BF_ROUND(L, R, 8);
		BF_ROUND(R, L, 9);
		BF_ROUND(L, R, 10);
		BF_ROUND(R, L, 11);
		BF_ROUND(L, R, 12);
		BF_ROUND(R, L, 13);
		BF_ROUND(L, R, 14);
		BF_ROUND(R, L, 15);
		tmp4 = R;
		R = L;
		L = tmp4 ^ ctx->s.P[BF_N + 1];
		*(ptr - 1) = R;
		*(ptr - 2) = L;
	} while (ptr < end);

	return L;
}

static void BF_set_key(const char *key, BF_key expanded, BF_key initial,
    unsigned char flags)
{
	const char *ptr = key;
	unsigned int bug, i, j;
	BF_word safety, sign, diff, tmp[2];

/*
 * There was a sign extension bug in older revisions of this function.  While
 * we would have liked to simply fix the bug and move on, we have to provide
 * a backwards compatibility feature (essentially the bug) for some systems and
 * a safety measure for some others.  The latter is needed because for certain
 * multiple inputs to the buggy algorithm there exist easily found inputs to
 * the correct algorithm that produce the same hash.  Thus, we optionally
 * deviate from the correct algorithm just enough to avoid such collisions.
 * While the bug itself affected the majority of passwords containing
 * characters with the 8th bit set (although only a percentage of those in a
 * collision-producing way), the anti-collision safety measure affects
 * only a subset of passwords containing the '\xff' character (not even all of
 * those passwords, just some of them).  This character is not found in valid
 * UTF-8 sequences and is rarely used in popular 8-bit character encodings.
 * Thus, the safety measure is unlikely to cause much annoyance, and is a
 * reasonable tradeoff to use when authenticating against existing hashes that
 * are not reliably known to have been computed with the correct algorithm.
 *
 * We use an approach that tries to minimize side-channel leaks of password
 * information - that is, we mostly use fixed-cost bitwise operations instead
 * of branches or table lookups.  (One conditional branch based on password
 * length remains.  It is not part of the bug aftermath, though, and is
 * difficult and possibly unreasonable to avoid given the use of C strings by
 * the caller, which results in similar timing leaks anyway.)
 *
 * For actual implementation, we set an array index in the variable "bug"
 * (0 means no bug, 1 means sign extension bug emulation) and a flag in the
 * variable "safety" (bit 16 is set when the safety measure is requested).
 * Valid combinations of settings are:
 *
 * Prefix "$2a$": bug = 0, safety = 0x10000
 * Prefix "$2x$": bug = 1, safety = 0
 * Prefix "$2y$": bug = 0, safety = 0
 */
	bug = (unsigned int)flags & 1;
	safety = ((BF_word)flags & 2) << 15;

	sign = diff = 0;

	for (i = 0; i < BF_N + 2; i++) {
		tmp[0] = tmp[1] = 0;
		for (j = 0; j < 4; j++) {
			tmp[0] <<= 8;
			tmp[0] |= (unsigned char)*ptr; /* correct */
			tmp[1] <<= 8;
			tmp[1] |= (BF_word_signed)(signed char)*ptr; /* bug */
/*
 * Sign extension in the first char has no effect - nothing to overwrite yet,
 * and those extra 24 bits will be fully shifted out of the 32-bit word.  For
 * chars 2, 3, 4 in each four-char block, we set bit 7 of "sign" if sign
 * extension in tmp[1] occurs.  Once this flag is set, it remains set.
 */
			if (j)
				sign |= tmp[1] & 0x80;
			if (!*ptr)
				ptr = key;
			else
				ptr++;
		}
		diff |= tmp[0] ^ tmp[1]; /* Non-zero on any differences */

		expanded[i] = tmp[bug];
		initial[i] = BF_init_state.s.P[i] ^ tmp[bug];
	}

/*
 * At this point, "diff" is zero iff the correct and buggy algorithms produced
 * exactly the same result.  If so and if "sign" is non-zero, which indicates
 * that there was a non-benign sign extension, this means that we have a
 * collision between the correctly computed hash for this password and a set of
 * passwords that could be supplied to the buggy algorithm.  Our safety measure
 * is meant to protect from such many-buggy to one-correct collisions, by
 * deviating from the correct algorithm in such cases.  Let's check for this.
 */
	diff |= diff >> 16; /* still zero iff exact match */
	diff &= 0xffff; /* ditto */
	diff += 0xffff; /* bit 16 set iff "diff" was non-zero (on non-match) */
	sign <<= 9; /* move the non-benign sign extension flag to bit 16 */
	sign &= ~diff & safety; /* action needed? */

/*
 * If we have determined that we need to deviate from the correct algorithm,
 * flip bit 16 in initial expanded key.  (The choice of 16 is arbitrary, but
 * let's stick to it now.  It came out of the approach we used above, and it's
 * not any worse than any other choice we could make.)
 *
 * It is crucial that we don't do the same to the expanded key used in the main
 * Eksblowfish loop.  By doing it to only one of these two, we deviate from a
 * state that could be directly specified by a password to the buggy algorithm
 * (and to the fully correct one as well, but that's a side-effect).
 */
	initial[0] ^= sign;
}

static char *BF_crypt(const char *key, const char *setting,
	char *output, int size,
	BF_word min)
{
	static const unsigned char flags_by_subtype[26] =
		{2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 4, 0};
	struct {
		BF_ctx ctx;
		BF_key expanded_key;
		union {
			BF_word salt[4];
			BF_word output[6];
		} binary;
	} data;
	BF_word count;
	int i;

	if (size < 7 + 22 + 31 + 1) {
		errno = ERANGE;
		return NULL;
	}

	if (setting[0] != '$' ||
	    setting[1] != '2' ||
	    setting[2] < 'a' || setting[2] > 'z' ||
	    !flags_by_subtype[(unsigned int)(unsigned char)setting[2] - 'a'] ||
	    setting[3] != '$' ||
	    setting[4] < '0' || setting[4] > '3' ||
	    setting[5] < '0' || setting[5] > '9' ||
	    (setting[4] == '3' && setting[5] > '1') ||
	    setting[6] != '$') {
		errno = EINVAL;
		return NULL;
	}

	count = (BF_word)1 << ((setting[4] - '0') * 10 + (setting[5] - '0'));
	if (count < min || BF_decode(data.binary.salt, &setting[7], 16)) {
		errno = EINVAL;
		return NULL;
	}
	BF_swap(data.binary.salt, 4);

	BF_set_key(key, data.expanded_key, data.ctx.s.P,
	    flags_by_subtype[(unsigned int)(unsigned char)setting[2] - 'a']);

	memcpy(data.ctx.s.S, BF_init_state.s.S, sizeof(data.ctx.s.S));

	{
		BF_word L = 0, R = 0;
		BF_word *ptr = &data.ctx.PS[0];
		do {
			L = BF_encrypt(&data.ctx,
			    L ^ data.binary.salt[0], R ^ data.binary.salt[1],
			    ptr, ptr);
			R = *(ptr + 1);
			ptr += 2;

			if (ptr >= &data.ctx.PS[BF_N + 2 + 4 * 0x100])
				break;

			L = BF_encrypt(&data.ctx,
			    L ^ data.binary.salt[2], R ^ data.binary.salt[3],
			    ptr, ptr);
			R = *(ptr + 1);
			ptr += 2;
		} while (1);
	}

	do {
		int done;

		for (i = 0; i < BF_N + 2; i += 2) {
			data.ctx.s.P[i] ^= data.expanded_key[i];
			data.ctx.s.P[i + 1] ^= data.expanded_key[i + 1];
		}

		done = 0;
		do {
			BF_encrypt(&data.ctx, 0, 0,
			    &data.ctx.PS[0],
			    &data.ctx.PS[BF_N + 2 + 4 * 0x100]);

			if (done)
				break;
			done = 1;

			{
				BF_word tmp1, tmp2, tmp3, tmp4;

				tmp1 = data.binary.salt[0];
				tmp2 = data.binary.salt[1];
				tmp3 = data.binary.salt[2];
				tmp4 = data.binary.salt[3];
				for (i = 0; i < BF_N; i += 4) {
					data.ctx.s.P[i] ^= tmp1;
					data.ctx.s.P[i + 1] ^= tmp2;
					data.ctx.s.P[i + 2] ^= tmp3;
					data.ctx.s.P[i + 3] ^= tmp4;
				}
				data.ctx.s.P[16] ^= tmp1;
				data.ctx.s.P[17] ^= tmp2;
			}
		} while (1);
	} while (--count);

	for (i = 0; i < 6; i += 2) {
		BF_word L, LR[2];

		L = BF_magic_w[i];
		LR[1] = BF_magic_w[i + 1];

		count = 64;
		do {
			L = BF_encrypt(&data.ctx, L, LR[1],
			    &LR[0], &LR[0]);
		} while (--count);

		data.binary.output[i] = L;
		data.binary.output[i + 1] = LR[1];
	}

	memcpy(output, setting, 7 + 22 - 1);
	output[7 + 22 - 1] = BF_itoa64[(int)
		BF_atoi64[(int)setting[7 + 22 - 1] - 0x20] & 0x30];

/* This has to be bug-compatible with the original implementation, so
 * only encode 23 of the 24 bytes. :-) */
	BF_swap(data.binary.output, 6);
	BF_encode(&output[7 + 22], data.binary.output, 23);
	output[7 + 22 + 31] = '\0';

	return output;
}

int _crypt_output_magic(const char *setting, char *output, int size)
{
	if (size < 3)
		return -1;

	output[0] = '*';
	output[1] = '0';
	output[2] = '\0';

	if (setting[0] == '*' && setting[1] == '0')
		output[1] = '1';

	return 0;
}

/*
 * Please preserve the runtime self-test.  It serves two purposes at once:
 *
 * 1. We really can't afford the risk of producing incompatible hashes e.g.
 * when there's something like gcc bug 26587 again, whereas an application or
 * library integrating this code might not also integrate our external tests or
 * it might not run them after every build.  Even if it does, the miscompile
 * might only occur on the production build, but not on a testing build (such
 * as because of different optimization settings).  It is painful to recover
 * from incorrectly-computed hashes - merely fixing whatever broke is not
 * enough.  Thus, a proactive measure like this self-test is needed.
 *
 * 2. We don't want to leave sensitive data from our actual password hash
 * computation on the stack or in registers.  Previous revisions of the code
 * would do explicit cleanups, but simply running the self-test after hash
 * computation is more reliable.
 *
 * The performance cost of this quick self-test is around 0.6% at the "$2a$08"
 * setting.
 */
char *_crypt_blowfish_rn(const char *key, const char *setting,
	char *output, int size)
{
	const char *test_key = "8b \xd0\xc1\xd2\xcf\xcc\xd8";
	const char *test_setting = "$2a$00$abcdefghijklmnopqrstuu";
	static const char * const test_hash[2] =
		{"VUrPmXD6q/nVSSp7pNDhCR9071IfIRe\0\x55", /* $2x$ */
		"i1D709vfamulimlGcq0qq3UvuUasvEa\0\x55"}; /* $2a$, $2y$ */
	char *retval;
	const char *p;
	int save_errno, ok;
	struct {
		char s[7 + 22 + 1];
		char o[7 + 22 + 31 + 1 + 1 + 1];
	} buf;

/* Hash the supplied password */
	_crypt_output_magic(setting, output, size);
	retval = BF_crypt(key, setting, output, size, 16);
	save_errno = errno;

/*
 * Do a quick self-test.  It is important that we make both calls to BF_crypt()
 * from the same scope such that they likely use the same stack locations,
 * which makes the second call overwrite the first call's sensitive data on the
 * stack and makes it more likely that any alignment related issues would be
 * detected by the self-test.
 */
	memcpy(buf.s, test_setting, sizeof(buf.s));
	if (retval)
		buf.s[2] = setting[2];
	memset(buf.o, 0x55, sizeof(buf.o));
	buf.o[sizeof(buf.o) - 1] = 0;
	p = BF_crypt(test_key, buf.s, buf.o, sizeof(buf.o) - (1 + 1), 1);

	ok = (p == buf.o &&
	    !memcmp(p, buf.s, 7 + 22) &&
	    !memcmp(p + (7 + 22),
	    test_hash[(unsigned int)(unsigned char)buf.s[2] & 1],
	    31 + 1 + 1 + 1));

	{
		const char *k = "\xff\xa3" "34" "\xff\xff\xff\xa3" "345";
		BF_key ae, ai, ye, yi;
		BF_set_key(k, ae, ai, 2); /* $2a$ */
		BF_set_key(k, ye, yi, 4); /* $2y$ */
		ai[0] ^= 0x10000; /* undo the safety (for comparison) */
		ok = ok && ai[0] == 0xdb9c59bc && ye[17] == 0x33343500 &&
		    !memcmp(ae, ye, sizeof(ae)) &&
		    !memcmp(ai, yi, sizeof(ai));
	}

	errno = save_errno;
	if (ok)
		return retval;

/* Should not happen */
	_crypt_output_magic(setting, output, size);
	errno = EINVAL; /* pretend we don't support this hash type */
	return NULL;
}

char *_crypt_gensalt_blowfish_rn(const char *prefix, unsigned long count,
	const char *input, int size, char *output, int output_size)
{
	if (size < 16 || output_size < 7 + 22 + 1 ||
	    (count && (count < 4 || count > 31)) ||
	    prefix[0] != '$' || prefix[1] != '2' ||
	    (prefix[2] != 'a' && prefix[2] != 'y')) {
		if (output_size > 0) output[0] = '\0';
		errno = (output_size < 7 + 22 + 1) ? ERANGE : EINVAL;
		return NULL;
	}

	if (!count) count = 5;

	output[0] = '$';
	output[1] = '2';
	output[2] = prefix[2];
	output[3] = '$';
	output[4] = '0' + count / 10;
	output[5] = '0' + count % 10;
	output[6] = '$';

	BF_encode(&output[7], (const BF_word *)input, 16);
	output[7 + 22] = '\0';

	return output;
}

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-09 10:53                 ` Solar Designer
@ 2012-08-09 11:58                   ` Szabolcs Nagy
  2012-08-09 16:43                     ` Solar Designer
  2012-08-09 23:21                     ` Rich Felker
  2012-08-09 21:46                   ` crypt_blowfish integration, optimization Rich Felker
  1 sibling, 2 replies; 52+ messages in thread
From: Szabolcs Nagy @ 2012-08-09 11:58 UTC (permalink / raw)
  To: musl

it's scary how these crypto codes mix various
signed and unsigned integer types

i thought these algorithms were designed and
implemented with the c type system in mind..


i added review comments
the style comments are subjective of course

* Solar Designer <solar@openwall.com> [2012-08-09 14:53:48 +0400]:
> typedef unsigned int BF_word;
> typedef signed int BF_word_signed;
> 
> /* Number of Blowfish rounds, this is also hardcoded into a few places */
> #define BF_N				16
> 
> typedef BF_word BF_key[BF_N + 2];

i don't like these typedefs
it seems the code assumes 32bit unsigned BF_word
(eg. the way L>>24 is used below)

> #define BF_safe_atoi64(dst, src) \
> { \
> 	tmp = (unsigned char)(src); \
> 	if ((unsigned int)(tmp -= 0x20) >= 0x60) return -1; \

tmp is already unsigned int

> #define BF_ROUND(L, R, N) \
> 	tmp1 = L & 0xFF; \
> 	tmp2 = L >> 8; \
> 	tmp2 &= 0xFF; \
> 	tmp3 = L >> 16; \
> 	tmp3 &= 0xFF; \
> 	tmp4 = L >> 24; \
> 	tmp1 = ctx->s.S[3][tmp1]; \
> 	tmp2 = ctx->s.S[2][tmp2]; \
> 	tmp3 = ctx->s.S[1][tmp3]; \
> 	tmp3 += ctx->s.S[0][tmp4]; \
> 	tmp3 ^= tmp2; \
> 	R ^= ctx->s.P[N + 1]; \
> 	tmp3 += tmp1; \
> 	R ^= tmp3;

i guess this is performance critical, but
i wouldn't spread those expressions over
several lines

tmp1 = ctx->S[3][L & 0xff];
tmp2 = ctx->S[2][L>>8 & 0xff];
tmp3 = ctx->S[1][L>>16 & 0xff];
tmp4 = ctx->S[0][L>>24 & 0xff];
R ^= ctx->P[N+1];
R ^= ((tmp3 + tmp4) ^ tmp2) + tmp1;

> 	do {
> 		ptr += 2;
> 		L ^= ctx->s.P[0];
> 		BF_ROUND(L, R, 0);
> 		BF_ROUND(R, L, 1);
> 		BF_ROUND(L, R, 2);
> 		BF_ROUND(R, L, 3);
> 		BF_ROUND(L, R, 4);
> 		BF_ROUND(R, L, 5);
> 		BF_ROUND(L, R, 6);
> 		BF_ROUND(R, L, 7);
> 		BF_ROUND(L, R, 8);
> 		BF_ROUND(R, L, 9);
> 		BF_ROUND(L, R, 10);
> 		BF_ROUND(R, L, 11);
> 		BF_ROUND(L, R, 12);
> 		BF_ROUND(R, L, 13);
> 		BF_ROUND(L, R, 14);
> 		BF_ROUND(R, L, 15);
> 		tmp4 = R;
> 		R = L;
> 		L = tmp4 ^ ctx->s.P[BF_N + 1];
> 		*(ptr - 1) = R;
> 		*(ptr - 2) = L;
> 	} while (ptr < end);

why increase ptr at the begining?
it seems the idiomatic way would be

 *ptr++ = L;
 *ptr++ = R;

> 			tmp[1] |= (BF_word_signed)(signed char)*ptr; /* bug */

i don't think the BF_word_signed cast helps here
signed char is promoted to int and then
it will be converted to BF_word

> 	if (setting[0] != '$' ||
> 	    setting[1] != '2' ||
> 	    setting[2] < 'a' || setting[2] > 'z' ||
> 	    !flags_by_subtype[(unsigned int)(unsigned char)setting[2] - 'a'] ||

setting[2] is already range checked so
the casts are not necessary
(assuming 'a' < 'z')

> 	BF_set_key(key, data.expanded_key, data.ctx.s.P,
> 	    flags_by_subtype[(unsigned int)(unsigned char)setting[2] - 'a']);

ditto

> 		do {
> 			L = BF_encrypt(&data.ctx,
> 			    L ^ data.binary.salt[0], R ^ data.binary.salt[1],
> 			    ptr, ptr);
> 			R = *(ptr + 1);
> 			ptr += 2;
> 
> 			if (ptr >= &data.ctx.PS[BF_N + 2 + 4 * 0x100])
> 				break;
> 
> 			L = BF_encrypt(&data.ctx,
> 			    L ^ data.binary.salt[2], R ^ data.binary.salt[3],
> 			    ptr, ptr);
> 			R = *(ptr + 1);
> 			ptr += 2;
> 		} while (1);

i'd use for (;;) for infinite loops

eventhough most of the loops in the code are do{}while

> 	do {
> 		int done;
> 
> 		for (i = 0; i < BF_N + 2; i += 2) {
> 			data.ctx.s.P[i] ^= data.expanded_key[i];
> 			data.ctx.s.P[i + 1] ^= data.expanded_key[i + 1];
> 		}
> 
> 		done = 0;
> 		do {
> 			BF_encrypt(&data.ctx, 0, 0,
> 			    &data.ctx.PS[0],
> 			    &data.ctx.PS[BF_N + 2 + 4 * 0x100]);
> 
> 			if (done)
> 				break;
> 			done = 1;
> 
> 			{
> 				BF_word tmp1, tmp2, tmp3, tmp4;
> 
> 				tmp1 = data.binary.salt[0];
> 				tmp2 = data.binary.salt[1];
> 				tmp3 = data.binary.salt[2];
> 				tmp4 = data.binary.salt[3];
> 				for (i = 0; i < BF_N; i += 4) {
> 					data.ctx.s.P[i] ^= tmp1;
> 					data.ctx.s.P[i + 1] ^= tmp2;
> 					data.ctx.s.P[i + 2] ^= tmp3;
> 					data.ctx.s.P[i + 3] ^= tmp4;
> 				}
> 				data.ctx.s.P[16] ^= tmp1;
> 				data.ctx.s.P[17] ^= tmp2;
> 			}
> 		} while (1);

i like for better

for (done = 0; ; done++) {
	...
	if (done)
		break;
	...
}

> 		count = 64;
> 		do {
> 			L = BF_encrypt(&data.ctx, L, LR[1],
> 			    &LR[0], &LR[0]);
> 		} while (--count);

for (count = 0; count < 64; count++)

i assume it will be optimized to the same thing
(count is not used later)

> 	output[7 + 22 - 1] = BF_itoa64[(int)
> 		BF_atoi64[(int)setting[7 + 22 - 1] - 0x20] & 0x30];

is setting[28] range checked at this point?

the int casts do not help

> 	_crypt_output_magic(setting, output, size);
> 	retval = BF_crypt(key, setting, output, size, 16);
> 	save_errno = errno;

why save errno before the self test?

if the self test fails then errno can be anything anyway
if it does not fail then errno is unchanged i guess

> 	memcpy(buf.s, test_setting, sizeof(buf.s));

sizeof buf.s
without ()

> 	memset(buf.o, 0x55, sizeof(buf.o));
> 	buf.o[sizeof(buf.o) - 1] = 0;
> 	p = BF_crypt(test_key, buf.s, buf.o, sizeof(buf.o) - (1 + 1), 1);

ditto

i'd write (1 + 1) as 2

> 	    test_hash[(unsigned int)(unsigned char)buf.s[2] & 1],

the extra (unsigned int) cast is unnecessary



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-09  5:48               ` Rich Felker
@ 2012-08-09 15:52                 ` Solar Designer
  2012-08-09 17:59                   ` Rich Felker
  2012-08-09 21:17                   ` Rich Felker
  0 siblings, 2 replies; 52+ messages in thread
From: Solar Designer @ 2012-08-09 15:52 UTC (permalink / raw)
  To: musl

On Thu, Aug 09, 2012 at 01:48:04AM -0400, Rich Felker wrote:
> On Thu, Aug 09, 2012 at 08:04:32AM +0400, Solar Designer wrote:
> > For DoS via high iteration count, I see no good solution other than to
> > accept this as a possibility for when group shadow is compromised.
> 
> Well it's also a possibility if you're using crypt to validate
> passwords where both the hash and password are provided by a third
> party. I think that's a major problem. I generally frown upon
> interfaces where the run time is non-obviously superlinear in the
> input size.

I agree that it's not great that this problem exists, but I am unsure if
trying to solve it would make things better overall.

> I don't see any down-size to limiting the iteration count if the limit
> is reasonable. For instance if the limit were such that higher counts
> would take more than 1 second on a theoretical 50 GHz variant of a
> modern cpu (which is faster than a single core will EVER be able to
> get), there's no way they would be practical to use, and there's no
> sense in supporting them except to satisfy a fetish for "no arbitrary
> limits" even when it conflicts with security and robustness. This
> would at least ensure the function can't get stuck running for
> hours/days/weeks at a time.
> 
> The hard part is putting the limit at some point a good bit lower.

This makes some sense.

> > /usr/bin/passwd and (if enabled) /usr/bin/chage on Owl are SGID shadow.
> 
> If reading your own password hash also requires sgid-shadow, then
> screen is sgid-shadow. Which means any user can easily get full shadow
> group perms (since screen is full of vulns if it's running suid/sgid)
> and thus you might as well not have had the group protection to begin
> with. Same applies to things like xlock.

No, screen is SGID screen, and group screen provides access to the
tcb_chkpwd and utempter helpers, which are SGID shadow and utmp,
respectively.

xlock, if installed, may be made SGID chkpwd (a group provided on Owl by
default for that possible use), which provides access to tcb_chkpwd
only.  This is what doc/REDHAT (advice on using Red Hat's packages on
Owl) suggests.  Being a server distro, we don't provide X ourselves.

Even if group screen or chkpwd is compromised, this only allows for
direct attacks on tcb_chkpwd - and it's a rather small program (5 KB
binary).  This does not allow for group shadow access without having
found and exploited a vulnerability in tcb_chkpwd first.

Obviously, certain vulnerabilities in the dynamic linker, libc, or/and
the kernel would allow to compromise any SGID program's target group.
That would be nasty, but not fatal - e.g., DoS attacks like what we're
discussing would be possible.

Alexander


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-09 11:58                   ` Szabolcs Nagy
@ 2012-08-09 16:43                     ` Solar Designer
  2012-08-09 17:30                       ` Szabolcs Nagy
  2012-08-09 18:22                       ` Rich Felker
  2012-08-09 23:21                     ` Rich Felker
  1 sibling, 2 replies; 52+ messages in thread
From: Solar Designer @ 2012-08-09 16:43 UTC (permalink / raw)
  To: musl

On Thu, Aug 09, 2012 at 01:58:12PM +0200, Szabolcs Nagy wrote:
> it's scary how these crypto codes mix various
> signed and unsigned integer types

Yes, there's room for improvement here.  I'd probably write this
differently if I were writing it from scratch now.  Perhaps a cleanup is
in order, similarly to what we did for the FreeSec code recently.
Specifically, we could have a higher-level wrapper function accept char *
like crypt() is defined to accept, but pass these to a function accepting
unsigned char * and go with that from this point.

> i added review comments
> the style comments are subjective of course

Thanks.  Note that I deliberately did not merge Rich's cleanups yet in
order to be able to continue testing this as part of my tree, with its
wrapper.c still intact, etc.  As discussed with Rich on IRC, I left
merging those changes for musl's use to Rich.

> * Solar Designer <solar@openwall.com> [2012-08-09 14:53:48 +0400]:
> > typedef unsigned int BF_word;
> > typedef signed int BF_word_signed;
> > 
> > /* Number of Blowfish rounds, this is also hardcoded into a few places */
> > #define BF_N				16
> > 
> > typedef BF_word BF_key[BF_N + 2];
> 
> i don't like these typedefs
> it seems the code assumes 32bit unsigned BF_word

Yes.  As you suggested while we were discussed FreeSec, in musl's
revision of the code we should use <stdint.h> types here.  I made those
changes to that code, but not yet to crypt_blowfish code - I think Rich
will do that.

> > #define BF_safe_atoi64(dst, src) \
> > { \
> > 	tmp = (unsigned char)(src); \
> > 	if ((unsigned int)(tmp -= 0x20) >= 0x60) return -1; \
> 
> tmp is already unsigned int

Rich is dropping BF_safe_atoi64() anyway, in favor of different code.
Thanks for the comment, though.  I might look into this when cleaning
this up for non-musl use.

> > #define BF_ROUND(L, R, N) \
> > 	tmp1 = L & 0xFF; \
> > 	tmp2 = L >> 8; \
> > 	tmp2 &= 0xFF; \
> > 	tmp3 = L >> 16; \
> > 	tmp3 &= 0xFF; \
> > 	tmp4 = L >> 24; \
> > 	tmp1 = ctx->s.S[3][tmp1]; \
> > 	tmp2 = ctx->s.S[2][tmp2]; \
> > 	tmp3 = ctx->s.S[1][tmp3]; \
> > 	tmp3 += ctx->s.S[0][tmp4]; \
> > 	tmp3 ^= tmp2; \
> > 	R ^= ctx->s.P[N + 1]; \
> > 	tmp3 += tmp1; \
> > 	R ^= tmp3;
> 
> i guess this is performance critical, but
> i wouldn't spread those expressions over
> several lines
> 
> tmp1 = ctx->S[3][L & 0xff];
> tmp2 = ctx->S[2][L>>8 & 0xff];
> tmp3 = ctx->S[1][L>>16 & 0xff];
> tmp4 = ctx->S[0][L>>24 & 0xff];
> R ^= ctx->P[N+1];
> R ^= ((tmp3 + tmp4) ^ tmp2) + tmp1;

I think this rewrite is likely to cause a slowdown with some compilers.

My revision of BF_ROUND makes the available parallelism explicit - for
example, that L & 0xFF may be computed early, but the corresponding
array lookup postponed until after other indices are also computed -
which leaves more time for the mask and address generation latency.

Sure, an optimizing compiler is supposed to figure this out and do
proper instruction scheduling on its own.  However, there exist cases
when some optimizations of a compiler have to be disabled - e.g., an
Apple person suggested disabling clang/llvm's instruction scheduling
when compiling this very code (well, a two hashes at a time revision of
it) a couple of years ago, as a workaround for the compiler doing it
poorly.  My hand-written instruction scheduling as above turned out to
work a lot better:

http://www.openwall.com/lists/john-users/2010/08/08/4

(and two follow-ups - click "thread-next").  It would be curious to see
what the same compiler would do with your "unoptimized" revision of the
source code, but I doubt that it would perform better.

> > 	do {
> > 		ptr += 2;
> > 		L ^= ctx->s.P[0];
> > 		BF_ROUND(L, R, 0);
> > 		BF_ROUND(R, L, 1);
> > 		BF_ROUND(L, R, 2);
> > 		BF_ROUND(R, L, 3);
> > 		BF_ROUND(L, R, 4);
> > 		BF_ROUND(R, L, 5);
> > 		BF_ROUND(L, R, 6);
> > 		BF_ROUND(R, L, 7);
> > 		BF_ROUND(L, R, 8);
> > 		BF_ROUND(R, L, 9);
> > 		BF_ROUND(L, R, 10);
> > 		BF_ROUND(R, L, 11);
> > 		BF_ROUND(L, R, 12);
> > 		BF_ROUND(R, L, 13);
> > 		BF_ROUND(L, R, 14);
> > 		BF_ROUND(R, L, 15);
> > 		tmp4 = R;
> > 		R = L;
> > 		L = tmp4 ^ ctx->s.P[BF_N + 1];
> > 		*(ptr - 1) = R;
> > 		*(ptr - 2) = L;
> > 	} while (ptr < end);
> 
> why increase ptr at the begining?

To hide the latency.  Indeed, a smart compiler should do this on its
own, but not all do it well.

> it seems the idiomatic way would be
> 
>  *ptr++ = L;
>  *ptr++ = R;

Yes, but note that the writes are made in the other order, for a reason:
R is computed first, so it is available for the write first.

All of this is a result of experiments, some on old systems many years
ago, some recent.

> > 			tmp[1] |= (BF_word_signed)(signed char)*ptr; /* bug */
> 
> i don't think the BF_word_signed cast helps here
> signed char is promoted to int and then
> it will be converted to BF_word

Yes, I just like it explicit to show what exactly is happening, because
it matters a lot.

> > 	if (setting[0] != '$' ||
> > 	    setting[1] != '2' ||
> > 	    setting[2] < 'a' || setting[2] > 'z' ||
> > 	    !flags_by_subtype[(unsigned int)(unsigned char)setting[2] - 'a'] ||
> 
> setting[2] is already range checked so
> the casts are not necessary
> (assuming 'a' < 'z')

Casting chars to (unsigned int)(unsigned char) for array indices is an
idiom for me.  Now, we have a promotion to (int) due to "- 'a'", but if
this is ever compiled with a C++ compiler, the 'a' constant would be of
type char rather than int, IIRC.  So better safe than sorry.

Yes, the range checking should have helped in this case, but I recall
seeing weird behavior with in-range (implies non-negative) char
subscripts on Alpha in 1990s before I started using this idiom.  IIRC,
instructions were generated such that high bytes of the array index
register were not cleared before the lookup.  Compiler bug?  But it was
seen with different compilers.

> i'd use for (;;) for infinite loops

I saw Rich do that, but I kept the infinite loops in this file I posted
consistent with those I use elsewhere for now.  Rich can musl'ify this
code in this way if desired.

> i like for better
> 
> for (done = 0; ; done++) {
> 	...
> 	if (done)
> 		break;
> 	...
> }

I might try that and see if it's actually cleaner and as fast.  Instead
of "done++", I'd use "done = 1".

> > 		count = 64;
> > 		do {
> > 			L = BF_encrypt(&data.ctx, L, LR[1],
> > 			    &LR[0], &LR[0]);
> > 		} while (--count);
> 
> for (count = 0; count < 64; count++)
> 
> i assume it will be optimized to the same thing
> (count is not used later)

I hope/expect that it will be optimized to the same thing, but I am not
sure it'd happen with all relevant compilers all the time.  I like
writing performance-critical code closer to the desired assembly code.

This one loop is only moderately performance relevant, though.

> > 	output[7 + 22 - 1] = BF_itoa64[(int)
> > 		BF_atoi64[(int)setting[7 + 22 - 1] - 0x20] & 0x30];
> 
> is setting[28] range checked at this point?

Yes, as part of the salt decoding at function start.

> the int casts do not help

Yes, there's promotion to int here anyway.

> > 	_crypt_output_magic(setting, output, size);
> > 	retval = BF_crypt(key, setting, output, size, 16);
> > 	save_errno = errno;
> 
> why save errno before the self test?
> 
> if the self test fails then errno can be anything anyway

Yes, if the test fails, we set errno explicitly to EINVAL after this
point.

> if it does not fail then errno is unchanged i guess

Yes, this should be the case currently.  I just did not want to have a
pitfall in there in case of future changes to BF_crypt() where errno
could potentially be altered on a successful call.

> > 	memcpy(buf.s, test_setting, sizeof(buf.s));
> 
> sizeof buf.s
> without ()

I am consistently using sizeof() with the braces.  Maybe musl's coding
style is different.  If so, this may be changed indeed.

> > 	memset(buf.o, 0x55, sizeof(buf.o));
> > 	buf.o[sizeof(buf.o) - 1] = 0;
> > 	p = BF_crypt(test_key, buf.s, buf.o, sizeof(buf.o) - (1 + 1), 1);
> 
> ditto
> 
> i'd write (1 + 1) as 2

I'm not sure which is cleaner.  It can be said that both the '\x55' and
the NUL after it are canary bytes and thus they're the same kind of
thing and we can talk of a canary of size 2.  Then we'd need to use this
"2" in three places, not just here.  However, in the way we specify the
test_hash[] strings, the NUL is implicit, which kind of makes it
separate from the '\x55'.

> > 	    test_hash[(unsigned int)(unsigned char)buf.s[2] & 1],
> 
> the extra (unsigned int) cast is unnecessary

Yes, we could break the idiom and rely on the promotion to int here.
(Without such promotion or cast, IIRC, things would sometimes fail on
Alpha.)

Thanks,

Alexander


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-09 16:43                     ` Solar Designer
@ 2012-08-09 17:30                       ` Szabolcs Nagy
  2012-08-09 18:22                       ` Rich Felker
  1 sibling, 0 replies; 52+ messages in thread
From: Szabolcs Nagy @ 2012-08-09 17:30 UTC (permalink / raw)
  To: musl

* Solar Designer <solar@openwall.com> [2012-08-09 20:43:55 +0400]:
> On Thu, Aug 09, 2012 at 01:58:12PM +0200, Szabolcs Nagy wrote:
> > i added review comments
> > the style comments are subjective of course
> 
> Thanks.  Note that I deliberately did not merge Rich's cleanups yet in
> order to be able to continue testing this as part of my tree, with its
> wrapper.c still intact, etc.  As discussed with Rich on IRC, I left
> merging those changes for musl's use to Rich.

ah ok

thanks for the insightful answers

i didn't know about the alpha thing
(unsigned int cast may be needed in array index)

and i'm not used to these low level optimizations
(i prefer writing clean c assuming that the
compiler is perfect, but i can see how that
might not be the best approach in practice)


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-09 15:52                 ` Solar Designer
@ 2012-08-09 17:59                   ` Rich Felker
  2012-08-09 21:17                   ` Rich Felker
  1 sibling, 0 replies; 52+ messages in thread
From: Rich Felker @ 2012-08-09 17:59 UTC (permalink / raw)
  To: musl

On Thu, Aug 09, 2012 at 07:52:55PM +0400, Solar Designer wrote:
> > > /usr/bin/passwd and (if enabled) /usr/bin/chage on Owl are SGID shadow.
> > 
> > If reading your own password hash also requires sgid-shadow, then
> > screen is sgid-shadow. Which means any user can easily get full shadow
> > group perms (since screen is full of vulns if it's running suid/sgid)
> > and thus you might as well not have had the group protection to begin
> > with. Same applies to things like xlock.
> 
> No, screen is SGID screen, and group screen provides access to the
> tcb_chkpwd and utempter helpers, which are SGID shadow and utmp,
> respectively.

OK, so basically if you have screen installed, any user has permission
to run tcb_chkpwd any way they like (because essentially every user
now belongs to group screen).

> xlock, if installed, may be made SGID chkpwd (a group provided on Owl by
> default for that possible use), which provides access to tcb_chkpwd
> only.  This is what doc/REDHAT (advice on using Red Hat's packages on
> Owl) suggests.  Being a server distro, we don't provide X ourselves.

This sounds fairly reasonable, but given the history of vulns in
xlock, it probably means all users now belong to group chkpwd.

> Even if group screen or chkpwd is compromised, this only allows for
> direct attacks on tcb_chkpwd - and it's a rather small program (5 KB
> binary).  This does not allow for group shadow access without having
> found and exploited a vulnerability in tcb_chkpwd first.
> 
> Obviously, certain vulnerabilities in the dynamic linker, libc, or/and
> the kernel would allow to compromise any SGID program's target group.
> That would be nasty, but not fatal - e.g., DoS attacks like what we're
> discussing would be possible.

Have you considered changing tcb_chkpwd to communicate with a daemon
over a unix socket rather than being sgid? That reduced the number of
input channels an attacker has from N (which grows with time since
Linux keeps adding more and more ridiculous amounts of state processes
can inherit) to 1 (the socket). I think the change could be made
transparently to programs which use it.

Rich


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-09 16:43                     ` Solar Designer
  2012-08-09 17:30                       ` Szabolcs Nagy
@ 2012-08-09 18:22                       ` Rich Felker
  1 sibling, 0 replies; 52+ messages in thread
From: Rich Felker @ 2012-08-09 18:22 UTC (permalink / raw)
  To: musl

On Thu, Aug 09, 2012 at 08:43:55PM +0400, Solar Designer wrote:
> On Thu, Aug 09, 2012 at 01:58:12PM +0200, Szabolcs Nagy wrote:
> > it's scary how these crypto codes mix various
> > signed and unsigned integer types
> 
> Yes, there's room for improvement here.  I'd probably write this
> differently if I were writing it from scratch now.  Perhaps a cleanup is
> in order, similarly to what we did for the FreeSec code recently.
> Specifically, we could have a higher-level wrapper function accept char *
> like crypt() is defined to accept, but pass these to a function accepting
> unsigned char * and go with that from this point.

I actually like this idea a lot. It should go a long way to clarify
the code and make it obvious to readers that no further signed-char
sign extension bugs exist.

> > > typedef unsigned int BF_word;
> > > typedef signed int BF_word_signed;
> > 
> > i don't like these typedefs
> > it seems the code assumes 32bit unsigned BF_word
> 
> Yes.  As you suggested while we were discussed FreeSec, in musl's
> revision of the code we should use <stdint.h> types here.  I made those
> changes to that code, but not yet to crypt_blowfish code - I think Rich
> will do that.

Yes, I think this change should be made, for clarity if nothing else.
It's non-obvious reading the code that BF_word is intended to be a
fixed-size 32-bit type (i.e. that arithmetic on it is intended to be
modulo 2^32).

> > tmp1 = ctx->S[3][L & 0xff];
> > tmp2 = ctx->S[2][L>>8 & 0xff];
> > tmp3 = ctx->S[1][L>>16 & 0xff];
> > tmp4 = ctx->S[0][L>>24 & 0xff];
> > R ^= ctx->P[N+1];
> > R ^= ((tmp3 + tmp4) ^ tmp2) + tmp1;
> 
> I think this rewrite is likely to cause a slowdown with some compilers.

When we're done with everything else, I'd like to see a comparison
with my version of the code (no explicit scheduling, straightforward
expression with no temps) versus the original on modern gcc (and
possibly clang). I would expect them to perform identically, in which
case I'd choose code clarity. I like C that reads like C rather than
reading like decompiled asm.. :-)

> > > 	do {
> > > 		ptr += 2;
> > > 		L ^= ctx->s.P[0];
> > > 		BF_ROUND(L, R, 0);
> > > 		BF_ROUND(R, L, 1);
> > > 		BF_ROUND(L, R, 2);
> > > 		BF_ROUND(R, L, 3);
> > > 		BF_ROUND(L, R, 4);
> > > 		BF_ROUND(R, L, 5);
> > > 		BF_ROUND(L, R, 6);
> > > 		BF_ROUND(R, L, 7);
> > > 		BF_ROUND(L, R, 8);
> > > 		BF_ROUND(R, L, 9);
> > > 		BF_ROUND(L, R, 10);
> > > 		BF_ROUND(R, L, 11);
> > > 		BF_ROUND(L, R, 12);
> > > 		BF_ROUND(R, L, 13);
> > > 		BF_ROUND(L, R, 14);
> > > 		BF_ROUND(R, L, 15);
> > > 		tmp4 = R;
> > > 		R = L;
> > > 		L = tmp4 ^ ctx->s.P[BF_N + 1];
> > > 		*(ptr - 1) = R;
> > > 		*(ptr - 2) = L;
> > > 	} while (ptr < end);
> > 
> > why increase ptr at the begining?
> 
> To hide the latency.  Indeed, a smart compiler should do this on its
> own, but not all do it well.

I suspect your version will generate worse code (both larger and
slower) on cpus like arm that can increment the index/pointer register
and dereference it at the same time.

> > > 			tmp[1] |= (BF_word_signed)(signed char)*ptr; /* bug */
> > 
> > i don't think the BF_word_signed cast helps here
> > signed char is promoted to int and then
> > it will be converted to BF_word
> 
> Yes, I just like it explicit to show what exactly is happening, because
> it matters a lot.

The cast is not the same as what's happening implicitly.

XXX

> 
> > > 	if (setting[0] != '$' ||
> > > 	    setting[1] != '2' ||
> > > 	    setting[2] < 'a' || setting[2] > 'z' ||
> > > 	    !flags_by_subtype[(unsigned int)(unsigned char)setting[2] - 'a'] ||
> > 
> > setting[2] is already range checked so
> > the casts are not necessary
> > (assuming 'a' < 'z')
> 
> Casting chars to (unsigned int)(unsigned char) for array indices is an
> idiom for me.  Now, we have a promotion to (int) due to "- 'a'", but if
> this is ever compiled with a C++ compiler, the 'a' constant would be of
> type char rather than int, IIRC.  So better safe than sorry.

No, all arithmetic and logical operators are subject to default
promotions in both C and C++. 'b'-'a' has type int in both languages
despite the fact that 'a' and 'b' have type char in C++ (and int in
C). The only possible function the (unsigned int) cast can have is
forcing promotion of another value in the expressione from int to
unsigned int.

> Yes, the range checking should have helped in this case, but I recall
> seeing weird behavior with in-range (implies non-negative) char
> subscripts on Alpha in 1990s before I started using this idiom.  IIRC,
> instructions were generated such that high bytes of the array index
> register were not cleared before the lookup.  Compiler bug?  But it was
> seen with different compilers.

I suspect you're thinking of something different. There's definitely
no way char types could be usable at all if the high bytes of
registers loaded from chars were full of random junk. This compiler
bug sounds too serious to be plausible.

> > i'd use for (;;) for infinite loops
> 
> I saw Rich do that, but I kept the infinite loops in this file I posted
> consistent with those I use elsewhere for now.  Rich can musl'ify this
> code in this way if desired.

I just did it when removing the while condition from do/while loops,
not gratuitously changing existing infinite loops. I didn't even think
of the possibility of leaving it as do/while(1) because I rarely/never
use that idiom for infinite loops. I don't care which is used though.

> > for (count = 0; count < 64; count++)
> > 
> > i assume it will be optimized to the same thing
> > (count is not used later)
> 
> I hope/expect that it will be optimized to the same thing, but I am not
> sure it'd happen with all relevant compilers all the time.  I like
> writing performance-critical code closer to the desired assembly code.

This is a simple strength reduction optimization and every gcc version
since 2.7.2 or so has done it, if I'm not mistaken. I don't care which
way it's written though; either is clear and idiomatic C.

> > if it does not fail then errno is unchanged i guess
> 
> Yes, this should be the case currently.  I just did not want to have a
> pitfall in there in case of future changes to BF_crypt() where errno
> could potentially be altered on a successful call.

In principle this could happen if arbitrary library functions are
being called, but in practice it shouldn't. Anyway EINVAL is the only
error we support here, so it could just be set just before return if
an error was encountered. Or if we return fake hashes that can never
match instead of null on error, there's no sense in ever setting
errno to begin with.

> > > 	memcpy(buf.s, test_setting, sizeof(buf.s));
> > 
> > sizeof buf.s
> > without ()
> 
> I am consistently using sizeof() with the braces.  Maybe musl's coding
> style is different.  If so, this may be changed indeed.

I don't care if it's inconsistent in code that was written outside of
musl, but the idiom I use is to only include parentheses for types.
Aside from my dislike for gratuitous parens, this is to make it clear
that we're taking the size of an object rather than the size of a
type.

> > > 	    test_hash[(unsigned int)(unsigned char)buf.s[2] & 1],
> > 
> > the extra (unsigned int) cast is unnecessary
> 
> Yes, we could break the idiom and rely on the promotion to int here.
> (Without such promotion or cast, IIRC, things would sometimes fail on
> Alpha.)

I smell cargo culting...

Rich


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-09 15:52                 ` Solar Designer
  2012-08-09 17:59                   ` Rich Felker
@ 2012-08-09 21:17                   ` Rich Felker
  2012-08-09 21:44                     ` Solar Designer
  1 sibling, 1 reply; 52+ messages in thread
From: Rich Felker @ 2012-08-09 21:17 UTC (permalink / raw)
  To: musl

On Thu, Aug 09, 2012 at 07:52:55PM +0400, Solar Designer wrote:
> > I don't see any down-size to limiting the iteration count if the limit
> > is reasonable. For instance if the limit were such that higher counts
> > would take more than 1 second on a theoretical 50 GHz variant of a
> > modern cpu (which is faster than a single core will EVER be able to
> > get), there's no way they would be practical to use, and there's no
> > sense in supporting them except to satisfy a fetish for "no arbitrary
> > limits" even when it conflicts with security and robustness. This
> > would at least ensure the function can't get stuck running for
> > hours/days/weeks at a time.
> > 
> > The hard part is putting the limit at some point a good bit lower.
> 
> This makes some sense.

After some casual tests, I would say somewhere around 16 is
appropriate as the absolute upper cut-off, and 12-14 is probably the
"point a good bit lower" we're aiming for. Anyone else have opinions
on this? Information on what's in common use in the wild? (I would
guess 4-8 is typical in the wild..)

Rich


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-09 21:17                   ` Rich Felker
@ 2012-08-09 21:44                     ` Solar Designer
  2012-08-09 22:08                       ` Rich Felker
  0 siblings, 1 reply; 52+ messages in thread
From: Solar Designer @ 2012-08-09 21:44 UTC (permalink / raw)
  To: musl

On Thu, Aug 09, 2012 at 05:17:36PM -0400, Rich Felker wrote:
> After some casual tests, I would say somewhere around 16 is
> appropriate as the absolute upper cut-off, and 12-14 is probably the
> "point a good bit lower" we're aiming for. Anyone else have opinions
> on this? Information on what's in common use in the wild? (I would
> guess 4-8 is typical in the wild..)

4-12 exist in the wild for password authentication, larger values are
sometimes seen for other uses (you may choose not to support such uses).

I think the defaults are as follows:

Solaris - $2a$04 once bcrypt is enabled (it is not by default)
CommuniGate Pro - $2a$05, ditto
OpenBSD - $2a$08 for root, $2a$06 for non-root
Owl - $2y$08 for all by default
openSUSE - $2y$10 for all by default

Google web searches also find numerous instances of $2a$12, albeit
mostly in discussions on use of bcrypt from scripts and such.

An example use other than password authentication:

http://crypto.stackexchange.com/questions/1765/can-i-construct-a-zero-knowledge-proof-that-i-solved-a-project-euler-problem

This has $2a$16 and $2a$20 samples.

The paper and slides on scrypt compare it against bcrypt at up to $2a$16
("tuned for file encryption").

Alexander


^ permalink raw reply	[flat|nested] 52+ messages in thread

* crypt_blowfish integration, optimization
  2012-08-09 10:53                 ` Solar Designer
  2012-08-09 11:58                   ` Szabolcs Nagy
@ 2012-08-09 21:46                   ` Rich Felker
  2012-08-09 22:21                     ` Solar Designer
  1 sibling, 1 reply; 52+ messages in thread
From: Rich Felker @ 2012-08-09 21:46 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1571 bytes --]

On Thu, Aug 09, 2012 at 02:53:48PM +0400, Solar Designer wrote:
> Rich -
> 
> On Thu, Aug 09, 2012 at 11:29:40AM +0400, Solar Designer wrote:
> > Attached is the smaller and faster code, as discussed on IRC.
> > 
> > This is under 8 KB.  The speed is similar to the original, I measured
> > -3% to +2% on different systems/builds.
> 
> Here's an even smaller version.

I've taken this version and made some minimum changes based on my
version, mainly for integration with musl where I'm testing it. I also
think we've reached the final word on loop unrolling:

Just For Fun, I tried replacing your unrolled BF_ROUND loop with a for
loop and compiling with -O3 on gcc 4.6.3. After noticing the
performance numbers were coming out near-identical, and that the .o
sizes were mysteriously identical, I decided, Just For Fun, to
disassemble both versions with objdump and diff them. They are
identical. That is, modern gcc generates byte-for-byte identical code
with -O3 for the manually unrolled loop and the for loop.

I'm leaving both versions of the code in the attached file so that you
or anyone else interested can try this. This is not the version I
intend to commit; I want to add back some of my size optimizations in
encode/decode and possibly compare how the compiler does if I add back
my non-hand-scheduled version of the BF_ROUND code. There are also
issues (in crypt_*, not just blowfish) I want to address with
returning unmatchable hashes instead of NULL on failure; this should
further reduce the code size by eliminating all the errno accesses,
etc.

Rich

[-- Attachment #2: crypt_blowfish.c --]
[-- Type: text/plain, Size: 29786 bytes --]

/* Modified by Rich Felker in preparation for inclusion in musl libc.
 * This version is based on Solar Designer's second size-optimized
 * version sent to the musl list, with some changes for integration. */

/*
 * The crypt_blowfish homepage is:
 *
 *	http://www.openwall.com/crypt/
 *
 * This code comes from John the Ripper password cracker, with reentrant
 * and crypt(3) interfaces added, but optimizations specific to password
 * cracking removed.
 *
 * Written by Solar Designer <solar at openwall.com> in 1998-2012.
 * No copyright is claimed, and the software is hereby placed in the public
 * domain.  In case this attempt to disclaim copyright and place the software
 * in the public domain is deemed null and void, then the software is
 * Copyright (c) 1998-2012 Solar Designer and it is hereby released to the
 * general public under the following terms:
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted.
 *
 * There's ABSOLUTELY NO WARRANTY, express or implied.
 *
 * It is my intent that you should be able to use this on your system,
 * as part of a software package, or anywhere else to improve security,
 * ensure compatibility, or for any other purpose.  I would appreciate
 * it if you give credit where it is due and keep your modifications in
 * the public domain as well, but I don't require that in order to let
 * you place this code and any modifications you make under a license
 * of your choice.
 *
 * This implementation is mostly compatible with OpenBSD's bcrypt.c (prefix
 * "$2a$") by Niels Provos <provos at citi.umich.edu>, and uses some of his
 * ideas.  The password hashing algorithm was designed by David Mazieres
 * <dm at lcs.mit.edu>.  For more information on the level of compatibility,
 * please refer to the comments in BF_set_key() below and to the included
 * crypt(3) man page.
 *
 * There's a paper on the algorithm that explains its design decisions:
 *
 *	http://www.usenix.org/events/usenix99/provos.html
 *
 * Some of the tricks in BF_ROUND might be inspired by Eric Young's
 * Blowfish library (I can't be sure if I would think of something if I
 * hadn't seen his code).
 */

#include <string.h>
#include <errno.h>

typedef unsigned int BF_word;
typedef signed int BF_word_signed;

/* Number of Blowfish rounds, this is also hardcoded into a few places */
#define BF_N				16

typedef BF_word BF_key[BF_N + 2];

typedef union {
	struct {
		BF_key P;
		BF_word S[4][0x100];
	} s;
	BF_word PS[BF_N + 2 + 4 * 0x100];
} BF_ctx;

/*
 * Magic IV for 64 Blowfish encryptions that we do at the end.
 * The string is "OrpheanBeholderScryDoubt" on big-endian.
 */
static const BF_word BF_magic_w[6] = {
	0x4F727068, 0x65616E42, 0x65686F6C,
	0x64657253, 0x63727944, 0x6F756274
};

/*
 * P-box and S-box tables initialized with digits of Pi.
 */
static const BF_ctx BF_init_state = {{
	{
		0x243f6a88, 0x85a308d3, 0x13198a2e, 0x03707344,
		0xa4093822, 0x299f31d0, 0x082efa98, 0xec4e6c89,
		0x452821e6, 0x38d01377, 0xbe5466cf, 0x34e90c6c,
		0xc0ac29b7, 0xc97c50dd, 0x3f84d5b5, 0xb5470917,
		0x9216d5d9, 0x8979fb1b
	}, {
		{
			0xd1310ba6, 0x98dfb5ac, 0x2ffd72db, 0xd01adfb7,
			0xb8e1afed, 0x6a267e96, 0xba7c9045, 0xf12c7f99,
			0x24a19947, 0xb3916cf7, 0x0801f2e2, 0x858efc16,
			0x636920d8, 0x71574e69, 0xa458fea3, 0xf4933d7e,
			0x0d95748f, 0x728eb658, 0x718bcd58, 0x82154aee,
			0x7b54a41d, 0xc25a59b5, 0x9c30d539, 0x2af26013,
			0xc5d1b023, 0x286085f0, 0xca417918, 0xb8db38ef,
			0x8e79dcb0, 0x603a180e, 0x6c9e0e8b, 0xb01e8a3e,
			0xd71577c1, 0xbd314b27, 0x78af2fda, 0x55605c60,
			0xe65525f3, 0xaa55ab94, 0x57489862, 0x63e81440,
			0x55ca396a, 0x2aab10b6, 0xb4cc5c34, 0x1141e8ce,
			0xa15486af, 0x7c72e993, 0xb3ee1411, 0x636fbc2a,
			0x2ba9c55d, 0x741831f6, 0xce5c3e16, 0x9b87931e,
			0xafd6ba33, 0x6c24cf5c, 0x7a325381, 0x28958677,
			0x3b8f4898, 0x6b4bb9af, 0xc4bfe81b, 0x66282193,
			0x61d809cc, 0xfb21a991, 0x487cac60, 0x5dec8032,
			0xef845d5d, 0xe98575b1, 0xdc262302, 0xeb651b88,
			0x23893e81, 0xd396acc5, 0x0f6d6ff3, 0x83f44239,
			0x2e0b4482, 0xa4842004, 0x69c8f04a, 0x9e1f9b5e,
			0x21c66842, 0xf6e96c9a, 0x670c9c61, 0xabd388f0,
			0x6a51a0d2, 0xd8542f68, 0x960fa728, 0xab5133a3,
			0x6eef0b6c, 0x137a3be4, 0xba3bf050, 0x7efb2a98,
			0xa1f1651d, 0x39af0176, 0x66ca593e, 0x82430e88,
			0x8cee8619, 0x456f9fb4, 0x7d84a5c3, 0x3b8b5ebe,
			0xe06f75d8, 0x85c12073, 0x401a449f, 0x56c16aa6,
			0x4ed3aa62, 0x363f7706, 0x1bfedf72, 0x429b023d,
			0x37d0d724, 0xd00a1248, 0xdb0fead3, 0x49f1c09b,
			0x075372c9, 0x80991b7b, 0x25d479d8, 0xf6e8def7,
			0xe3fe501a, 0xb6794c3b, 0x976ce0bd, 0x04c006ba,
			0xc1a94fb6, 0x409f60c4, 0x5e5c9ec2, 0x196a2463,
			0x68fb6faf, 0x3e6c53b5, 0x1339b2eb, 0x3b52ec6f,
			0x6dfc511f, 0x9b30952c, 0xcc814544, 0xaf5ebd09,
			0xbee3d004, 0xde334afd, 0x660f2807, 0x192e4bb3,
			0xc0cba857, 0x45c8740f, 0xd20b5f39, 0xb9d3fbdb,
			0x5579c0bd, 0x1a60320a, 0xd6a100c6, 0x402c7279,
			0x679f25fe, 0xfb1fa3cc, 0x8ea5e9f8, 0xdb3222f8,
			0x3c7516df, 0xfd616b15, 0x2f501ec8, 0xad0552ab,
			0x323db5fa, 0xfd238760, 0x53317b48, 0x3e00df82,
			0x9e5c57bb, 0xca6f8ca0, 0x1a87562e, 0xdf1769db,
			0xd542a8f6, 0x287effc3, 0xac6732c6, 0x8c4f5573,
			0x695b27b0, 0xbbca58c8, 0xe1ffa35d, 0xb8f011a0,
			0x10fa3d98, 0xfd2183b8, 0x4afcb56c, 0x2dd1d35b,
			0x9a53e479, 0xb6f84565, 0xd28e49bc, 0x4bfb9790,
			0xe1ddf2da, 0xa4cb7e33, 0x62fb1341, 0xcee4c6e8,
			0xef20cada, 0x36774c01, 0xd07e9efe, 0x2bf11fb4,
			0x95dbda4d, 0xae909198, 0xeaad8e71, 0x6b93d5a0,
			0xd08ed1d0, 0xafc725e0, 0x8e3c5b2f, 0x8e7594b7,
			0x8ff6e2fb, 0xf2122b64, 0x8888b812, 0x900df01c,
			0x4fad5ea0, 0x688fc31c, 0xd1cff191, 0xb3a8c1ad,
			0x2f2f2218, 0xbe0e1777, 0xea752dfe, 0x8b021fa1,
			0xe5a0cc0f, 0xb56f74e8, 0x18acf3d6, 0xce89e299,
			0xb4a84fe0, 0xfd13e0b7, 0x7cc43b81, 0xd2ada8d9,
			0x165fa266, 0x80957705, 0x93cc7314, 0x211a1477,
			0xe6ad2065, 0x77b5fa86, 0xc75442f5, 0xfb9d35cf,
			0xebcdaf0c, 0x7b3e89a0, 0xd6411bd3, 0xae1e7e49,
			0x00250e2d, 0x2071b35e, 0x226800bb, 0x57b8e0af,
			0x2464369b, 0xf009b91e, 0x5563911d, 0x59dfa6aa,
			0x78c14389, 0xd95a537f, 0x207d5ba2, 0x02e5b9c5,
			0x83260376, 0x6295cfa9, 0x11c81968, 0x4e734a41,
			0xb3472dca, 0x7b14a94a, 0x1b510052, 0x9a532915,
			0xd60f573f, 0xbc9bc6e4, 0x2b60a476, 0x81e67400,
			0x08ba6fb5, 0x571be91f, 0xf296ec6b, 0x2a0dd915,
			0xb6636521, 0xe7b9f9b6, 0xff34052e, 0xc5855664,
			0x53b02d5d, 0xa99f8fa1, 0x08ba4799, 0x6e85076a
		}, {
			0x4b7a70e9, 0xb5b32944, 0xdb75092e, 0xc4192623,
			0xad6ea6b0, 0x49a7df7d, 0x9cee60b8, 0x8fedb266,
			0xecaa8c71, 0x699a17ff, 0x5664526c, 0xc2b19ee1,
			0x193602a5, 0x75094c29, 0xa0591340, 0xe4183a3e,
			0x3f54989a, 0x5b429d65, 0x6b8fe4d6, 0x99f73fd6,
			0xa1d29c07, 0xefe830f5, 0x4d2d38e6, 0xf0255dc1,
			0x4cdd2086, 0x8470eb26, 0x6382e9c6, 0x021ecc5e,
			0x09686b3f, 0x3ebaefc9, 0x3c971814, 0x6b6a70a1,
			0x687f3584, 0x52a0e286, 0xb79c5305, 0xaa500737,
			0x3e07841c, 0x7fdeae5c, 0x8e7d44ec, 0x5716f2b8,
			0xb03ada37, 0xf0500c0d, 0xf01c1f04, 0x0200b3ff,
			0xae0cf51a, 0x3cb574b2, 0x25837a58, 0xdc0921bd,
			0xd19113f9, 0x7ca92ff6, 0x94324773, 0x22f54701,
			0x3ae5e581, 0x37c2dadc, 0xc8b57634, 0x9af3dda7,
			0xa9446146, 0x0fd0030e, 0xecc8c73e, 0xa4751e41,
			0xe238cd99, 0x3bea0e2f, 0x3280bba1, 0x183eb331,
			0x4e548b38, 0x4f6db908, 0x6f420d03, 0xf60a04bf,
			0x2cb81290, 0x24977c79, 0x5679b072, 0xbcaf89af,
			0xde9a771f, 0xd9930810, 0xb38bae12, 0xdccf3f2e,
			0x5512721f, 0x2e6b7124, 0x501adde6, 0x9f84cd87,
			0x7a584718, 0x7408da17, 0xbc9f9abc, 0xe94b7d8c,
			0xec7aec3a, 0xdb851dfa, 0x63094366, 0xc464c3d2,
			0xef1c1847, 0x3215d908, 0xdd433b37, 0x24c2ba16,
			0x12a14d43, 0x2a65c451, 0x50940002, 0x133ae4dd,
			0x71dff89e, 0x10314e55, 0x81ac77d6, 0x5f11199b,
			0x043556f1, 0xd7a3c76b, 0x3c11183b, 0x5924a509,
			0xf28fe6ed, 0x97f1fbfa, 0x9ebabf2c, 0x1e153c6e,
			0x86e34570, 0xeae96fb1, 0x860e5e0a, 0x5a3e2ab3,
			0x771fe71c, 0x4e3d06fa, 0x2965dcb9, 0x99e71d0f,
			0x803e89d6, 0x5266c825, 0x2e4cc978, 0x9c10b36a,
			0xc6150eba, 0x94e2ea78, 0xa5fc3c53, 0x1e0a2df4,
			0xf2f74ea7, 0x361d2b3d, 0x1939260f, 0x19c27960,
			0x5223a708, 0xf71312b6, 0xebadfe6e, 0xeac31f66,
			0xe3bc4595, 0xa67bc883, 0xb17f37d1, 0x018cff28,
			0xc332ddef, 0xbe6c5aa5, 0x65582185, 0x68ab9802,
			0xeecea50f, 0xdb2f953b, 0x2aef7dad, 0x5b6e2f84,
			0x1521b628, 0x29076170, 0xecdd4775, 0x619f1510,
			0x13cca830, 0xeb61bd96, 0x0334fe1e, 0xaa0363cf,
			0xb5735c90, 0x4c70a239, 0xd59e9e0b, 0xcbaade14,
			0xeecc86bc, 0x60622ca7, 0x9cab5cab, 0xb2f3846e,
			0x648b1eaf, 0x19bdf0ca, 0xa02369b9, 0x655abb50,
			0x40685a32, 0x3c2ab4b3, 0x319ee9d5, 0xc021b8f7,
			0x9b540b19, 0x875fa099, 0x95f7997e, 0x623d7da8,
			0xf837889a, 0x97e32d77, 0x11ed935f, 0x16681281,
			0x0e358829, 0xc7e61fd6, 0x96dedfa1, 0x7858ba99,
			0x57f584a5, 0x1b227263, 0x9b83c3ff, 0x1ac24696,
			0xcdb30aeb, 0x532e3054, 0x8fd948e4, 0x6dbc3128,
			0x58ebf2ef, 0x34c6ffea, 0xfe28ed61, 0xee7c3c73,
			0x5d4a14d9, 0xe864b7e3, 0x42105d14, 0x203e13e0,
			0x45eee2b6, 0xa3aaabea, 0xdb6c4f15, 0xfacb4fd0,
			0xc742f442, 0xef6abbb5, 0x654f3b1d, 0x41cd2105,
			0xd81e799e, 0x86854dc7, 0xe44b476a, 0x3d816250,
			0xcf62a1f2, 0x5b8d2646, 0xfc8883a0, 0xc1c7b6a3,
			0x7f1524c3, 0x69cb7492, 0x47848a0b, 0x5692b285,
			0x095bbf00, 0xad19489d, 0x1462b174, 0x23820e00,
			0x58428d2a, 0x0c55f5ea, 0x1dadf43e, 0x233f7061,
			0x3372f092, 0x8d937e41, 0xd65fecf1, 0x6c223bdb,
			0x7cde3759, 0xcbee7460, 0x4085f2a7, 0xce77326e,
			0xa6078084, 0x19f8509e, 0xe8efd855, 0x61d99735,
			0xa969a7aa, 0xc50c06c2, 0x5a04abfc, 0x800bcadc,
			0x9e447a2e, 0xc3453484, 0xfdd56705, 0x0e1e9ec9,
			0xdb73dbd3, 0x105588cd, 0x675fda79, 0xe3674340,
			0xc5c43465, 0x713e38d8, 0x3d28f89e, 0xf16dff20,
			0x153e21e7, 0x8fb03d4a, 0xe6e39f2b, 0xdb83adf7
		}, {
			0xe93d5a68, 0x948140f7, 0xf64c261c, 0x94692934,
			0x411520f7, 0x7602d4f7, 0xbcf46b2e, 0xd4a20068,
			0xd4082471, 0x3320f46a, 0x43b7d4b7, 0x500061af,
			0x1e39f62e, 0x97244546, 0x14214f74, 0xbf8b8840,
			0x4d95fc1d, 0x96b591af, 0x70f4ddd3, 0x66a02f45,
			0xbfbc09ec, 0x03bd9785, 0x7fac6dd0, 0x31cb8504,
			0x96eb27b3, 0x55fd3941, 0xda2547e6, 0xabca0a9a,
			0x28507825, 0x530429f4, 0x0a2c86da, 0xe9b66dfb,
			0x68dc1462, 0xd7486900, 0x680ec0a4, 0x27a18dee,
			0x4f3ffea2, 0xe887ad8c, 0xb58ce006, 0x7af4d6b6,
			0xaace1e7c, 0xd3375fec, 0xce78a399, 0x406b2a42,
			0x20fe9e35, 0xd9f385b9, 0xee39d7ab, 0x3b124e8b,
			0x1dc9faf7, 0x4b6d1856, 0x26a36631, 0xeae397b2,
			0x3a6efa74, 0xdd5b4332, 0x6841e7f7, 0xca7820fb,
			0xfb0af54e, 0xd8feb397, 0x454056ac, 0xba489527,
			0x55533a3a, 0x20838d87, 0xfe6ba9b7, 0xd096954b,
			0x55a867bc, 0xa1159a58, 0xcca92963, 0x99e1db33,
			0xa62a4a56, 0x3f3125f9, 0x5ef47e1c, 0x9029317c,
			0xfdf8e802, 0x04272f70, 0x80bb155c, 0x05282ce3,
			0x95c11548, 0xe4c66d22, 0x48c1133f, 0xc70f86dc,
			0x07f9c9ee, 0x41041f0f, 0x404779a4, 0x5d886e17,
			0x325f51eb, 0xd59bc0d1, 0xf2bcc18f, 0x41113564,
			0x257b7834, 0x602a9c60, 0xdff8e8a3, 0x1f636c1b,
			0x0e12b4c2, 0x02e1329e, 0xaf664fd1, 0xcad18115,
			0x6b2395e0, 0x333e92e1, 0x3b240b62, 0xeebeb922,
			0x85b2a20e, 0xe6ba0d99, 0xde720c8c, 0x2da2f728,
			0xd0127845, 0x95b794fd, 0x647d0862, 0xe7ccf5f0,
			0x5449a36f, 0x877d48fa, 0xc39dfd27, 0xf33e8d1e,
			0x0a476341, 0x992eff74, 0x3a6f6eab, 0xf4f8fd37,
			0xa812dc60, 0xa1ebddf8, 0x991be14c, 0xdb6e6b0d,
			0xc67b5510, 0x6d672c37, 0x2765d43b, 0xdcd0e804,
			0xf1290dc7, 0xcc00ffa3, 0xb5390f92, 0x690fed0b,
			0x667b9ffb, 0xcedb7d9c, 0xa091cf0b, 0xd9155ea3,
			0xbb132f88, 0x515bad24, 0x7b9479bf, 0x763bd6eb,
			0x37392eb3, 0xcc115979, 0x8026e297, 0xf42e312d,
			0x6842ada7, 0xc66a2b3b, 0x12754ccc, 0x782ef11c,
			0x6a124237, 0xb79251e7, 0x06a1bbe6, 0x4bfb6350,
			0x1a6b1018, 0x11caedfa, 0x3d25bdd8, 0xe2e1c3c9,
			0x44421659, 0x0a121386, 0xd90cec6e, 0xd5abea2a,
			0x64af674e, 0xda86a85f, 0xbebfe988, 0x64e4c3fe,
			0x9dbc8057, 0xf0f7c086, 0x60787bf8, 0x6003604d,
			0xd1fd8346, 0xf6381fb0, 0x7745ae04, 0xd736fccc,
			0x83426b33, 0xf01eab71, 0xb0804187, 0x3c005e5f,
			0x77a057be, 0xbde8ae24, 0x55464299, 0xbf582e61,
			0x4e58f48f, 0xf2ddfda2, 0xf474ef38, 0x8789bdc2,
			0x5366f9c3, 0xc8b38e74, 0xb475f255, 0x46fcd9b9,
			0x7aeb2661, 0x8b1ddf84, 0x846a0e79, 0x915f95e2,
			0x466e598e, 0x20b45770, 0x8cd55591, 0xc902de4c,
			0xb90bace1, 0xbb8205d0, 0x11a86248, 0x7574a99e,
			0xb77f19b6, 0xe0a9dc09, 0x662d09a1, 0xc4324633,
			0xe85a1f02, 0x09f0be8c, 0x4a99a025, 0x1d6efe10,
			0x1ab93d1d, 0x0ba5a4df, 0xa186f20f, 0x2868f169,
			0xdcb7da83, 0x573906fe, 0xa1e2ce9b, 0x4fcd7f52,
			0x50115e01, 0xa70683fa, 0xa002b5c4, 0x0de6d027,
			0x9af88c27, 0x773f8641, 0xc3604c06, 0x61a806b5,
			0xf0177a28, 0xc0f586e0, 0x006058aa, 0x30dc7d62,
			0x11e69ed7, 0x2338ea63, 0x53c2dd94, 0xc2c21634,
			0xbbcbee56, 0x90bcb6de, 0xebfc7da1, 0xce591d76,
			0x6f05e409, 0x4b7c0188, 0x39720a3d, 0x7c927c24,
			0x86e3725f, 0x724d9db9, 0x1ac15bb4, 0xd39eb8fc,
			0xed545578, 0x08fca5b5, 0xd83d7cd3, 0x4dad0fc4,
			0x1e50ef5e, 0xb161e6f8, 0xa28514d9, 0x6c51133c,
			0x6fd5c7e7, 0x56e14ec4, 0x362abfce, 0xddc6c837,
			0xd79a3234, 0x92638212, 0x670efa8e, 0x406000e0
		}, {
			0x3a39ce37, 0xd3faf5cf, 0xabc27737, 0x5ac52d1b,
			0x5cb0679e, 0x4fa33742, 0xd3822740, 0x99bc9bbe,
			0xd5118e9d, 0xbf0f7315, 0xd62d1c7e, 0xc700c47b,
			0xb78c1b6b, 0x21a19045, 0xb26eb1be, 0x6a366eb4,
			0x5748ab2f, 0xbc946e79, 0xc6a376d2, 0x6549c2c8,
			0x530ff8ee, 0x468dde7d, 0xd5730a1d, 0x4cd04dc6,
			0x2939bbdb, 0xa9ba4650, 0xac9526e8, 0xbe5ee304,
			0xa1fad5f0, 0x6a2d519a, 0x63ef8ce2, 0x9a86ee22,
			0xc089c2b8, 0x43242ef6, 0xa51e03aa, 0x9cf2d0a4,
			0x83c061ba, 0x9be96a4d, 0x8fe51550, 0xba645bd6,
			0x2826a2f9, 0xa73a3ae1, 0x4ba99586, 0xef5562e9,
			0xc72fefd3, 0xf752f7da, 0x3f046f69, 0x77fa0a59,
			0x80e4a915, 0x87b08601, 0x9b09e6ad, 0x3b3ee593,
			0xe990fd5a, 0x9e34d797, 0x2cf0b7d9, 0x022b8b51,
			0x96d5ac3a, 0x017da67d, 0xd1cf3ed6, 0x7c7d2d28,
			0x1f9f25cf, 0xadf2b89b, 0x5ad6b472, 0x5a88f54c,
			0xe029ac71, 0xe019a5e6, 0x47b0acfd, 0xed93fa9b,
			0xe8d3c48d, 0x283b57cc, 0xf8d56629, 0x79132e28,
			0x785f0191, 0xed756055, 0xf7960e44, 0xe3d35e8c,
			0x15056dd4, 0x88f46dba, 0x03a16125, 0x0564f0bd,
			0xc3eb9e15, 0x3c9057a2, 0x97271aec, 0xa93a072a,
			0x1b3f6d9b, 0x1e6321f5, 0xf59c66fb, 0x26dcf319,
			0x7533d928, 0xb155fdf5, 0x03563482, 0x8aba3cbb,
			0x28517711, 0xc20ad9f8, 0xabcc5167, 0xccad925f,
			0x4de81751, 0x3830dc8e, 0x379d5862, 0x9320f991,
			0xea7a90c2, 0xfb3e7bce, 0x5121ce64, 0x774fbe32,
			0xa8b6e37e, 0xc3293d46, 0x48de5369, 0x6413e680,
			0xa2ae0810, 0xdd6db224, 0x69852dfd, 0x09072166,
			0xb39a460a, 0x6445c0dd, 0x586cdecf, 0x1c20c8ae,
			0x5bbef7dd, 0x1b588d40, 0xccd2017f, 0x6bb4e3bb,
			0xdda26a7e, 0x3a59ff45, 0x3e350a44, 0xbcb4cdd5,
			0x72eacea8, 0xfa6484bb, 0x8d6612ae, 0xbf3c6f47,
			0xd29be463, 0x542f5d9e, 0xaec2771b, 0xf64e6370,
			0x740e0d8d, 0xe75b1357, 0xf8721671, 0xaf537d5d,
			0x4040cb08, 0x4eb4e2cc, 0x34d2466a, 0x0115af84,
			0xe1b00428, 0x95983a1d, 0x06b89fb4, 0xce6ea048,
			0x6f3f3b82, 0x3520ab82, 0x011a1d4b, 0x277227f8,
			0x611560b1, 0xe7933fdc, 0xbb3a792b, 0x344525bd,
			0xa08839e1, 0x51ce794b, 0x2f32c9b7, 0xa01fbac9,
			0xe01cc87e, 0xbcc7d1f6, 0xcf0111c3, 0xa1e8aac7,
			0x1a908749, 0xd44fbd9a, 0xd0dadecb, 0xd50ada38,
			0x0339c32a, 0xc6913667, 0x8df9317c, 0xe0b12b4f,
			0xf79e59b7, 0x43f5bb3a, 0xf2d519ff, 0x27d9459c,
			0xbf97222c, 0x15e6fc2a, 0x0f91fc71, 0x9b941525,
			0xfae59361, 0xceb69ceb, 0xc2a86459, 0x12baa8d1,
			0xb6c1075e, 0xe3056a0c, 0x10d25065, 0xcb03a442,
			0xe0ec6e0e, 0x1698db3b, 0x4c98a0be, 0x3278e964,
			0x9f1f9532, 0xe0d392df, 0xd3a0342b, 0x8971f21e,
			0x1b0a7441, 0x4ba3348c, 0xc5be7120, 0xc37632d8,
			0xdf359f8d, 0x9b992f2e, 0xe60b6f47, 0x0fe3f11d,
			0xe54cda54, 0x1edad891, 0xce6279cf, 0xcd3e7e6f,
			0x1618b166, 0xfd2c1d05, 0x848fd2c5, 0xf6fb2299,
			0xf523f357, 0xa6327623, 0x93a83531, 0x56cccd02,
			0xacf08162, 0x5a75ebb5, 0x6e163697, 0x88d273cc,
			0xde966292, 0x81b949d0, 0x4c50901b, 0x71c65614,
			0xe6c6c7bd, 0x327a140a, 0x45e1d006, 0xc3f27b9a,
			0xc9aa53fd, 0x62a80f00, 0xbb25bfe2, 0x35bdd2f6,
			0x71126905, 0xb2040222, 0xb6cbcf7c, 0xcd769c2b,
			0x53113ec0, 0x1640e3d3, 0x38abbd60, 0x2547adf0,
			0xba38209c, 0xf746ce76, 0x77afa1c5, 0x20756060,
			0x85cbfe4e, 0x8ae88dd8, 0x7aaaf9b0, 0x4cf9aa7e,
			0x1948c25c, 0x02fb8a8c, 0x01c36ae4, 0xd6ebe1f9,
			0x90d4f869, 0xa65cdea0, 0x3f09252d, 0xc208e69f,
			0xb74e6132, 0xce77e25b, 0x578fdfe3, 0x3ac372e6
		}
	}
}};

static const unsigned char BF_itoa64[64 + 1] =
	"./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

static const unsigned char BF_atoi64[0x60] = {
	64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 0, 1,
	54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 64, 64, 64, 64, 64,
	64, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
	17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 64, 64, 64, 64, 64,
	64, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
	43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 64, 64, 64, 64, 64
};

#define BF_safe_atoi64(dst, src) \
{ \
	tmp = (unsigned char)(src); \
	if ((unsigned int)(tmp -= 0x20) >= 0x60) return -1; \
	tmp = BF_atoi64[tmp]; \
	if (tmp > 63) return -1; \
	(dst) = tmp; \
}

static int BF_decode(BF_word *dst, const char *src, int size)
{
	unsigned char *dptr = (unsigned char *)dst;
	unsigned char *end = dptr + size;
	const unsigned char *sptr = (const unsigned char *)src;
	unsigned int tmp, c1, c2, c3, c4;

	do {
		BF_safe_atoi64(c1, *sptr++);
		BF_safe_atoi64(c2, *sptr++);
		*dptr++ = (c1 << 2) | ((c2 & 0x30) >> 4);
		if (dptr >= end) break;

		BF_safe_atoi64(c3, *sptr++);
		*dptr++ = ((c2 & 0x0F) << 4) | ((c3 & 0x3C) >> 2);
		if (dptr >= end) break;

		BF_safe_atoi64(c4, *sptr++);
		*dptr++ = ((c3 & 0x03) << 6) | c4;
	} while (dptr < end);

	return 0;
}

static void BF_encode(char *dst, const BF_word *src, int size)
{
	const unsigned char *sptr = (const unsigned char *)src;
	const unsigned char *end = sptr + size;
	unsigned char *dptr = (unsigned char *)dst;
	unsigned int c1, c2;

	do {
		c1 = *sptr++;
		*dptr++ = BF_itoa64[c1 >> 2];
		c1 = (c1 & 0x03) << 4;
		if (sptr >= end) {
			*dptr++ = BF_itoa64[c1];
			break;
		}

		c2 = *sptr++;
		c1 |= c2 >> 4;
		*dptr++ = BF_itoa64[c1];
		c1 = (c2 & 0x0f) << 2;
		if (sptr >= end) {
			*dptr++ = BF_itoa64[c1];
			break;
		}

		c2 = *sptr++;
		c1 |= c2 >> 6;
		*dptr++ = BF_itoa64[c1];
		*dptr++ = BF_itoa64[c2 & 0x3f];
	} while (sptr < end);
}

static void BF_swap(BF_word *x, int count)
{
	if ((union { int i; char c; }){1}.c)
	do {
		BF_word tmp = *x;
		tmp = (tmp << 16) | (tmp >> 16);
		*x++ = ((tmp & 0x00FF00FF) << 8) | ((tmp >> 8) & 0x00FF00FF);
	} while (--count);
}

#define BF_ROUND(L, R, N) \
	tmp1 = L & 0xFF; \
	tmp2 = L >> 8; \
	tmp2 &= 0xFF; \
	tmp3 = L >> 16; \
	tmp3 &= 0xFF; \
	tmp4 = L >> 24; \
	tmp1 = ctx->s.S[3][tmp1]; \
	tmp2 = ctx->s.S[2][tmp2]; \
	tmp3 = ctx->s.S[1][tmp3]; \
	tmp3 += ctx->s.S[0][tmp4]; \
	tmp3 ^= tmp2; \
	R ^= ctx->s.P[N + 1]; \
	tmp3 += tmp1; \
	R ^= tmp3;

static BF_word BF_encrypt(BF_ctx *ctx,
    BF_word L, BF_word R,
    BF_word *start, BF_word *end)
{
	BF_word tmp1, tmp2, tmp3, tmp4;
	BF_word *ptr = start;

	do {
		ptr += 2;
		L ^= ctx->s.P[0];
#if 0
		BF_ROUND(L, R, 0);
		BF_ROUND(R, L, 1);
		BF_ROUND(L, R, 2);
		BF_ROUND(R, L, 3);
		BF_ROUND(L, R, 4);
		BF_ROUND(R, L, 5);
		BF_ROUND(L, R, 6);
		BF_ROUND(R, L, 7);
		BF_ROUND(L, R, 8);
		BF_ROUND(R, L, 9);
		BF_ROUND(L, R, 10);
		BF_ROUND(R, L, 11);
		BF_ROUND(L, R, 12);
		BF_ROUND(R, L, 13);
		BF_ROUND(L, R, 14);
		BF_ROUND(R, L, 15);
#else
		for (int i=0; i<16; i+=2) {
			BF_ROUND(L, R, i);
			BF_ROUND(R, L, i+1);
		}
#endif
		tmp4 = R;
		R = L;
		L = tmp4 ^ ctx->s.P[BF_N + 1];
		*(ptr - 1) = R;
		*(ptr - 2) = L;
	} while (ptr < end);

	return L;
}

static void BF_set_key(const char *key, BF_key expanded, BF_key initial,
    unsigned char flags)
{
	const char *ptr = key;
	unsigned int bug, i, j;
	BF_word safety, sign, diff, tmp[2];

/*
 * There was a sign extension bug in older revisions of this function.  While
 * we would have liked to simply fix the bug and move on, we have to provide
 * a backwards compatibility feature (essentially the bug) for some systems and
 * a safety measure for some others.  The latter is needed because for certain
 * multiple inputs to the buggy algorithm there exist easily found inputs to
 * the correct algorithm that produce the same hash.  Thus, we optionally
 * deviate from the correct algorithm just enough to avoid such collisions.
 * While the bug itself affected the majority of passwords containing
 * characters with the 8th bit set (although only a percentage of those in a
 * collision-producing way), the anti-collision safety measure affects
 * only a subset of passwords containing the '\xff' character (not even all of
 * those passwords, just some of them).  This character is not found in valid
 * UTF-8 sequences and is rarely used in popular 8-bit character encodings.
 * Thus, the safety measure is unlikely to cause much annoyance, and is a
 * reasonable tradeoff to use when authenticating against existing hashes that
 * are not reliably known to have been computed with the correct algorithm.
 *
 * We use an approach that tries to minimize side-channel leaks of password
 * information - that is, we mostly use fixed-cost bitwise operations instead
 * of branches or table lookups.  (One conditional branch based on password
 * length remains.  It is not part of the bug aftermath, though, and is
 * difficult and possibly unreasonable to avoid given the use of C strings by
 * the caller, which results in similar timing leaks anyway.)
 *
 * For actual implementation, we set an array index in the variable "bug"
 * (0 means no bug, 1 means sign extension bug emulation) and a flag in the
 * variable "safety" (bit 16 is set when the safety measure is requested).
 * Valid combinations of settings are:
 *
 * Prefix "$2a$": bug = 0, safety = 0x10000
 * Prefix "$2x$": bug = 1, safety = 0
 * Prefix "$2y$": bug = 0, safety = 0
 */
	bug = (unsigned int)flags & 1;
	safety = ((BF_word)flags & 2) << 15;

	sign = diff = 0;

	for (i = 0; i < BF_N + 2; i++) {
		tmp[0] = tmp[1] = 0;
		for (j = 0; j < 4; j++) {
			tmp[0] <<= 8;
			tmp[0] |= (unsigned char)*ptr; /* correct */
			tmp[1] <<= 8;
			tmp[1] |= (BF_word_signed)(signed char)*ptr; /* bug */
/*
 * Sign extension in the first char has no effect - nothing to overwrite yet,
 * and those extra 24 bits will be fully shifted out of the 32-bit word.  For
 * chars 2, 3, 4 in each four-char block, we set bit 7 of "sign" if sign
 * extension in tmp[1] occurs.  Once this flag is set, it remains set.
 */
			if (j)
				sign |= tmp[1] & 0x80;
			if (!*ptr)
				ptr = key;
			else
				ptr++;
		}
		diff |= tmp[0] ^ tmp[1]; /* Non-zero on any differences */

		expanded[i] = tmp[bug];
		initial[i] = BF_init_state.s.P[i] ^ tmp[bug];
	}

/*
 * At this point, "diff" is zero iff the correct and buggy algorithms produced
 * exactly the same result.  If so and if "sign" is non-zero, which indicates
 * that there was a non-benign sign extension, this means that we have a
 * collision between the correctly computed hash for this password and a set of
 * passwords that could be supplied to the buggy algorithm.  Our safety measure
 * is meant to protect from such many-buggy to one-correct collisions, by
 * deviating from the correct algorithm in such cases.  Let's check for this.
 */
	diff |= diff >> 16; /* still zero iff exact match */
	diff &= 0xffff; /* ditto */
	diff += 0xffff; /* bit 16 set iff "diff" was non-zero (on non-match) */
	sign <<= 9; /* move the non-benign sign extension flag to bit 16 */
	sign &= ~diff & safety; /* action needed? */

/*
 * If we have determined that we need to deviate from the correct algorithm,
 * flip bit 16 in initial expanded key.  (The choice of 16 is arbitrary, but
 * let's stick to it now.  It came out of the approach we used above, and it's
 * not any worse than any other choice we could make.)
 *
 * It is crucial that we don't do the same to the expanded key used in the main
 * Eksblowfish loop.  By doing it to only one of these two, we deviate from a
 * state that could be directly specified by a password to the buggy algorithm
 * (and to the fully correct one as well, but that's a side-effect).
 */
	initial[0] ^= sign;
}

static char *BF_crypt(const char *key, const char *setting,
	char *output, BF_word min)
{
	static const unsigned char flags_by_subtype[26] =
		{2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 4, 0};
	struct {
		BF_ctx ctx;
		BF_key expanded_key;
		union {
			BF_word salt[4];
			BF_word output[6];
		} binary;
	} data;
	BF_word count;
	int i;

	if (setting[0] != '$' ||
	    setting[1] != '2' ||
	    setting[2] - 'a' > 25U ||
	    !flags_by_subtype[setting[2] - 'a'] ||
	    setting[3] != '$' ||
	    setting[4] - '0' > 1U ||
	    setting[5] - '0' > 9U ||
	    setting[6] != '$') {
		errno = EINVAL;
		return NULL;
	}

	count = (BF_word)1 << ((setting[4] - '0') * 10 + (setting[5] - '0'));
	if (count < min || BF_decode(data.binary.salt, &setting[7], 16)) {
		errno = EINVAL;
		return NULL;
	}
	BF_swap(data.binary.salt, 4);

	BF_set_key(key, data.expanded_key, data.ctx.s.P,
	    flags_by_subtype[(unsigned int)(unsigned char)setting[2] - 'a']);

	memcpy(data.ctx.s.S, BF_init_state.s.S, sizeof(data.ctx.s.S));

	{
		BF_word L = 0, R = 0;
		BF_word *ptr = &data.ctx.PS[0];
		do {
			L = BF_encrypt(&data.ctx,
			    L ^ data.binary.salt[0], R ^ data.binary.salt[1],
			    ptr, ptr);
			R = *(ptr + 1);
			ptr += 2;

			if (ptr >= &data.ctx.PS[BF_N + 2 + 4 * 0x100])
				break;

			L = BF_encrypt(&data.ctx,
			    L ^ data.binary.salt[2], R ^ data.binary.salt[3],
			    ptr, ptr);
			R = *(ptr + 1);
			ptr += 2;
		} while (1);
	}

	do {
		int done;

		for (i = 0; i < BF_N + 2; i += 2) {
			data.ctx.s.P[i] ^= data.expanded_key[i];
			data.ctx.s.P[i + 1] ^= data.expanded_key[i + 1];
		}

		done = 0;
		do {
			BF_encrypt(&data.ctx, 0, 0,
			    &data.ctx.PS[0],
			    &data.ctx.PS[BF_N + 2 + 4 * 0x100]);

			if (done)
				break;
			done = 1;

			{
				BF_word tmp1, tmp2, tmp3, tmp4;

				tmp1 = data.binary.salt[0];
				tmp2 = data.binary.salt[1];
				tmp3 = data.binary.salt[2];
				tmp4 = data.binary.salt[3];
				for (i = 0; i < BF_N; i += 4) {
					data.ctx.s.P[i] ^= tmp1;
					data.ctx.s.P[i + 1] ^= tmp2;
					data.ctx.s.P[i + 2] ^= tmp3;
					data.ctx.s.P[i + 3] ^= tmp4;
				}
				data.ctx.s.P[16] ^= tmp1;
				data.ctx.s.P[17] ^= tmp2;
			}
		} while (1);
	} while (--count);

	for (i = 0; i < 6; i += 2) {
		BF_word L, LR[2];

		L = BF_magic_w[i];
		LR[1] = BF_magic_w[i + 1];

		count = 64;
		do {
			L = BF_encrypt(&data.ctx, L, LR[1],
			    &LR[0], &LR[0]);
		} while (--count);

		data.binary.output[i] = L;
		data.binary.output[i + 1] = LR[1];
	}

	memcpy(output, setting, 7 + 22 - 1);
	output[7 + 22 - 1] = BF_itoa64[(int)
		BF_atoi64[(int)setting[7 + 22 - 1] - 0x20] & 0x30];

/* This has to be bug-compatible with the original implementation, so
 * only encode 23 of the 24 bytes. :-) */
	BF_swap(data.binary.output, 6);
	BF_encode(&output[7 + 22], data.binary.output, 23);
	output[7 + 22 + 31] = '\0';

	return output;
}

/*
 * Please preserve the runtime self-test.  It serves two purposes at once:
 *
 * 1. We really can't afford the risk of producing incompatible hashes e.g.
 * when there's something like gcc bug 26587 again, whereas an application or
 * library integrating this code might not also integrate our external tests or
 * it might not run them after every build.  Even if it does, the miscompile
 * might only occur on the production build, but not on a testing build (such
 * as because of different optimization settings).  It is painful to recover
 * from incorrectly-computed hashes - merely fixing whatever broke is not
 * enough.  Thus, a proactive measure like this self-test is needed.
 *
 * 2. We don't want to leave sensitive data from our actual password hash
 * computation on the stack or in registers.  Previous revisions of the code
 * would do explicit cleanups, but simply running the self-test after hash
 * computation is more reliable.
 *
 * The performance cost of this quick self-test is around 0.6% at the "$2a$08"
 * setting.
 */
char *__crypt_blowfish(const char *key, const char *setting, char *output)
{
	const char *test_key = "8b \xd0\xc1\xd2\xcf\xcc\xd8";
	const char *test_setting = "$2a$00$abcdefghijklmnopqrstuu";
	static const char * const test_hash[2] =
		{"VUrPmXD6q/nVSSp7pNDhCR9071IfIRe\0\x55", /* $2x$ */
		"i1D709vfamulimlGcq0qq3UvuUasvEa\0\x55"}; /* $2a$, $2y$ */
	char *retval;
	const char *p;
	int save_errno, ok;
	struct {
		char s[7 + 22 + 1];
		char o[7 + 22 + 31 + 1 + 1 + 1];
	} buf;

/* Hash the supplied password */
	retval = BF_crypt(key, setting, output, 16);
	save_errno = errno;

/*
 * Do a quick self-test.  It is important that we make both calls to BF_crypt()
 * from the same scope such that they likely use the same stack locations,
 * which makes the second call overwrite the first call's sensitive data on the
 * stack and makes it more likely that any alignment related issues would be
 * detected by the self-test.
 */
	memcpy(buf.s, test_setting, sizeof(buf.s));
	if (retval)
		buf.s[2] = setting[2];
	memset(buf.o, 0x55, sizeof(buf.o));
	buf.o[sizeof(buf.o) - 1] = 0;
	p = BF_crypt(test_key, buf.s, buf.o, 1);

	ok = (p == buf.o &&
	    !memcmp(p, buf.s, 7 + 22) &&
	    !memcmp(p + (7 + 22),
	    test_hash[(unsigned int)(unsigned char)buf.s[2] & 1],
	    31 + 1 + 1 + 1));

	{
		const char *k = "\xff\xa3" "34" "\xff\xff\xff\xa3" "345";
		BF_key ae, ai, ye, yi;
		BF_set_key(k, ae, ai, 2); /* $2a$ */
		BF_set_key(k, ye, yi, 4); /* $2y$ */
		ai[0] ^= 0x10000; /* undo the safety (for comparison) */
		ok = ok && ai[0] == 0xdb9c59bc && ye[17] == 0x33343500 &&
		    !memcmp(ae, ye, sizeof(ae)) &&
		    !memcmp(ai, yi, sizeof(ai));
	}

	errno = save_errno;
	if (ok)
		return retval;

/* Should not happen */
	errno = EINVAL; /* pretend we don't support this hash type */
	return NULL;
}

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-09 21:44                     ` Solar Designer
@ 2012-08-09 22:08                       ` Rich Felker
  0 siblings, 0 replies; 52+ messages in thread
From: Rich Felker @ 2012-08-09 22:08 UTC (permalink / raw)
  To: musl

On Fri, Aug 10, 2012 at 01:44:31AM +0400, Solar Designer wrote:
> On Thu, Aug 09, 2012 at 05:17:36PM -0400, Rich Felker wrote:
> > After some casual tests, I would say somewhere around 16 is
> > appropriate as the absolute upper cut-off, and 12-14 is probably the
> > "point a good bit lower" we're aiming for. Anyone else have opinions
> > on this? Information on what's in common use in the wild? (I would
> > guess 4-8 is typical in the wild..)
> 
> 4-12 exist in the wild for password authentication, larger values are
> sometimes seen for other uses (you may choose not to support such uses).

Then 12 is probably not a good cut-off, which is frustrating because
12 is getting to the point where it's unreasonable load on a mid-range
system (takes ½ sec on my atom).

> I think the defaults are as follows:
> 
> Solaris - $2a$04 once bcrypt is enabled (it is not by default)
> CommuniGate Pro - $2a$05, ditto
> OpenBSD - $2a$08 for root, $2a$06 for non-root
> Owl - $2y$08 for all by default
> openSUSE - $2y$10 for all by default

Thanks, very informative.

> An example use other than password authentication:
> 
> http://crypto.stackexchange.com/questions/1765/can-i-construct-a-zero-knowledge-proof-that-i-solved-a-project-euler-problem
> 
> This has $2a$16 and $2a$20 samples.
> 
> The paper and slides on scrypt compare it against bcrypt at up to $2a$16
> ("tuned for file encryption").

I think this potentially needs to be something we just don't support.
I can see the interest in being able to use crypt as a general purpose
hashing API, but I think I'd have a hard time convincing myself to
prioritize that over ensuring bounded runtime.

Rich


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt_blowfish integration, optimization
  2012-08-09 21:46                   ` crypt_blowfish integration, optimization Rich Felker
@ 2012-08-09 22:21                     ` Solar Designer
  2012-08-09 22:32                       ` Rich Felker
  0 siblings, 1 reply; 52+ messages in thread
From: Solar Designer @ 2012-08-09 22:21 UTC (permalink / raw)
  To: musl

On Thu, Aug 09, 2012 at 05:46:54PM -0400, Rich Felker wrote:
> I've taken this version and made some minimum changes based on my
> version, mainly for integration with musl where I'm testing it. I also
> think we've reached the final word on loop unrolling:
> 
> Just For Fun, I tried replacing your unrolled BF_ROUND loop with a for
> loop and compiling with -O3 on gcc 4.6.3. After noticing the
> performance numbers were coming out near-identical, and that the .o
> sizes were mysteriously identical, I decided, Just For Fun, to
> disassemble both versions with objdump and diff them. They are
> identical. That is, modern gcc generates byte-for-byte identical code
> with -O3 for the manually unrolled loop and the for loop.

What about -O2?

-O3 is probably not what will be used for most musl builds, is it?

Hmm, for me "gcc -Q -O2 --help=optimizers" and ditto for -O3 both show
"disabled" for -funroll-loops.  Why was the loop unrolled for you?
Did you also have -funroll-loops specified explicitly?  If so, does this
happen for normal musl builds?  I guess not?

As discussed, the problem with avoiding such hand-unrolls is that the
compiler doesn't know just which loops are most important to unroll.

BTW, what speeds are you getting on your Atom?  How does this compare to
the original crypt_blowfish-1.2 with asm code (both on 32-bit)?

Alexander


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt_blowfish integration, optimization
  2012-08-09 22:21                     ` Solar Designer
@ 2012-08-09 22:32                       ` Rich Felker
  2012-08-10 17:18                         ` Solar Designer
  0 siblings, 1 reply; 52+ messages in thread
From: Rich Felker @ 2012-08-09 22:32 UTC (permalink / raw)
  To: musl

On Fri, Aug 10, 2012 at 02:21:03AM +0400, Solar Designer wrote:
> On Thu, Aug 09, 2012 at 05:46:54PM -0400, Rich Felker wrote:
> > I've taken this version and made some minimum changes based on my
> > version, mainly for integration with musl where I'm testing it. I also
> > think we've reached the final word on loop unrolling:
> > 
> > Just For Fun, I tried replacing your unrolled BF_ROUND loop with a for
> > loop and compiling with -O3 on gcc 4.6.3. After noticing the
> > performance numbers were coming out near-identical, and that the .o
> > sizes were mysteriously identical, I decided, Just For Fun, to
> > disassemble both versions with objdump and diff them. They are
> > identical. That is, modern gcc generates byte-for-byte identical code
> > with -O3 for the manually unrolled loop and the for loop.
> 
> What about -O2?
> 
> -O3 is probably not what will be used for most musl builds, is it?
> 
> Hmm, for me "gcc -Q -O2 --help=optimizers" and ditto for -O3 both show
> "disabled" for -funroll-loops.  Why was the loop unrolled for you?

Not sure. I've found -Q --help=optimizers completely unreliable in the
past though. It only reports minimal differences between -Os, -O2, and
-O3, and trying to start with -O3 and reproduce -Os by just changing
the options that are different does not give effects even remotely
similar to -Os.

> Did you also have -funroll-loops specified explicitly?  If so, does this
> happen for normal musl builds?  I guess not?

No, I did not explicitly specify it. At present, -Os is default for
static libc and -O3 is default for shared libc. The reason for this
discrepency is that -fPIC generates a lot of size and speed bloat at
each function call, so the inlining from -O3 comes at reduced cost (it
eliminates wasteful prologue, compensating for some of the size
increase) and much greater performance benefits (again, from killing
prologue).

I've been thinking of making -O3 default across the board rather than
having different defaults for the two, which are ugly from a
build-system perspective, but some people are still against it even
though it's easy to override.

> As discussed, the problem with avoiding such hand-unrolls is that the
> compiler doesn't know just which loops are most important to unroll.

My experience has been that it tends to make good decisions overall,
and that if somebody is using -Os, they really want smallest size, not
performance.

> BTW, what speeds are you getting on your Atom?

I was clocking 0.573 seconds for one run with the 2^12 iterations on
one test, and about 4 million cycles per run with 2^4 iterations. This
is with my version of the code (essentially the same as yours;
compiled at -O3).

> How does this compare to
> the original crypt_blowfish-1.2 with asm code (both on 32-bit)?

I'll have to get the code and try it... The asm doesn't seem to have
ever been present in the code sent to the list.

Rich


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-09 11:58                   ` Szabolcs Nagy
  2012-08-09 16:43                     ` Solar Designer
@ 2012-08-09 23:21                     ` Rich Felker
  2012-08-10 17:04                       ` Solar Designer
  1 sibling, 1 reply; 52+ messages in thread
From: Rich Felker @ 2012-08-09 23:21 UTC (permalink / raw)
  To: musl

On Thu, Aug 09, 2012 at 01:58:12PM +0200, Szabolcs Nagy wrote:
> > #define BF_ROUND(L, R, N) \
> > 	tmp1 = L & 0xFF; \
> > 	tmp2 = L >> 8; \
> > 	tmp2 &= 0xFF; \
> > 	tmp3 = L >> 16; \
> > 	tmp3 &= 0xFF; \
> > 	tmp4 = L >> 24; \
> > 	tmp1 = ctx->s.S[3][tmp1]; \
> > 	tmp2 = ctx->s.S[2][tmp2]; \
> > 	tmp3 = ctx->s.S[1][tmp3]; \
> > 	tmp3 += ctx->s.S[0][tmp4]; \
> > 	tmp3 ^= tmp2; \
> > 	R ^= ctx->s.P[N + 1]; \
> > 	tmp3 += tmp1; \
> > 	R ^= tmp3;
> 
> i guess this is performance critical, but
> i wouldn't spread those expressions over
> several lines
> 
> tmp1 = ctx->S[3][L & 0xff];
> tmp2 = ctx->S[2][L>>8 & 0xff];
> tmp3 = ctx->S[1][L>>16 & 0xff];
> tmp4 = ctx->S[0][L>>24 & 0xff];
> R ^= ctx->P[N+1];
> R ^= ((tmp3 + tmp4) ^ tmp2) + tmp1;

My first modified version to remove the manual scheduling is
significantly slower than the hand-scheduled version. I haven't tried
your version here yet, but it looks nicer and I think it would be
reasonable to compare and see if it's better.

> > 	do {
> > 		ptr += 2;
> > 		L ^= ctx->s.P[0];
> > 		BF_ROUND(L, R, 0);
> > 		BF_ROUND(R, L, 1);
> > 		BF_ROUND(L, R, 2);
> > 		BF_ROUND(R, L, 3);
> > 		BF_ROUND(L, R, 4);
> > 		BF_ROUND(R, L, 5);
> > 		BF_ROUND(L, R, 6);
> > 		BF_ROUND(R, L, 7);
> > 		BF_ROUND(L, R, 8);
> > 		BF_ROUND(R, L, 9);
> > 		BF_ROUND(L, R, 10);
> > 		BF_ROUND(R, L, 11);
> > 		BF_ROUND(L, R, 12);
> > 		BF_ROUND(R, L, 13);
> > 		BF_ROUND(L, R, 14);
> > 		BF_ROUND(R, L, 15);
> > 		tmp4 = R;
> > 		R = L;
> > 		L = tmp4 ^ ctx->s.P[BF_N + 1];
> > 		*(ptr - 1) = R;
> > 		*(ptr - 2) = L;
> > 	} while (ptr < end);
> 
> why increase ptr at the begining?
> it seems the idiomatic way would be
> 
>  *ptr++ = L;
>  *ptr++ = R;

For me, making this change makes it 5% faster. I suspect the
difference comes from the fact that gcc is not smart enough to move
the ptr+=2; across the rest of the loop body, and the fact that it
gets spilled to the stack and reloaded for *both* points of usage
rather than just one. The original version may perform better on
machines with A LOT more registers, but I'm doubtful...

Rich


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-09  1:51         ` Solar Designer
  2012-08-09  3:25           ` Rich Felker
@ 2012-08-09 23:33           ` Rich Felker
  1 sibling, 0 replies; 52+ messages in thread
From: Rich Felker @ 2012-08-09 23:33 UTC (permalink / raw)
  To: musl

On Thu, Aug 09, 2012 at 05:51:04AM +0400, Solar Designer wrote:
> BTW, the extended DES-based hashes that are already supported in musl
> allow for variable iteration counts encoded along with hashes too, and
> that's the way it should be.

The maximum iteration count the format allows is 2^24-1, which
although a bit excessive is still in the range of "sane". It takes 11
seconds to compute on my Atom. Compare this to blowfish count=2^20
taking at least a minute, and therefore count=2^31 (the maximum
allowed) taking over 2048 minutes (1.5 days).

Rich


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-09 23:21                     ` Rich Felker
@ 2012-08-10 17:04                       ` Solar Designer
  2012-08-10 18:06                         ` Rich Felker
  0 siblings, 1 reply; 52+ messages in thread
From: Solar Designer @ 2012-08-10 17:04 UTC (permalink / raw)
  To: musl

On Thu, Aug 09, 2012 at 07:21:32PM -0400, Rich Felker wrote:
> On Thu, Aug 09, 2012 at 01:58:12PM +0200, Szabolcs Nagy wrote:
> > > 	do {
> > > 		ptr += 2;
> > > 		L ^= ctx->s.P[0];
> > > 		BF_ROUND(L, R, 0);
[...]
> > > 		BF_ROUND(R, L, 15);
> > > 		tmp4 = R;
> > > 		R = L;
> > > 		L = tmp4 ^ ctx->s.P[BF_N + 1];
> > > 		*(ptr - 1) = R;
> > > 		*(ptr - 2) = L;
> > > 	} while (ptr < end);
> > 
> > why increase ptr at the begining?
> > it seems the idiomatic way would be
> > 
> >  *ptr++ = L;
> >  *ptr++ = R;
> 
> For me, making this change makes it 5% faster. I suspect the
> difference comes from the fact that gcc is not smart enough to move
> the ptr+=2; across the rest of the loop body, and the fact that it
> gets spilled to the stack and reloaded for *both* points of usage
> rather than just one. The original version may perform better on
> machines with A LOT more registers, but I'm doubtful...

The spilling theory makes sense to me, but it does not fully explain the
5% difference - I think it could explain a 1% difference or so.  More
likely there's some change in register allocation overall, not only for
ptr - or something like it.

Anyhow, this does not match my test results so far, for different
revisions of this code.  What compiler, options, architecture, CPU?

As written, this code did in fact want more registers than 32-bit x86
has - it needs one more register for the context, for thread-safety
introduced in crypt_blowfish as opposed to JtR.  In crypt_blowfish, I
addressed this by some magic in the asm code, and assumed that other
common archs do have more than 8 registers.  With the asm code dropped,
maybe this piece of C does need to be optimized for 32-bit x86 more -
although it performs as well as the asm code on CPUs newer than the
original Pentium (where the asm code is a lot faster) and different than
Atom (where users reported the asm code being significantly faster).

Alexander


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt_blowfish integration, optimization
  2012-08-09 22:32                       ` Rich Felker
@ 2012-08-10 17:18                         ` Solar Designer
  2012-08-10 18:08                           ` Rich Felker
  0 siblings, 1 reply; 52+ messages in thread
From: Solar Designer @ 2012-08-10 17:18 UTC (permalink / raw)
  To: musl

On Thu, Aug 09, 2012 at 06:32:59PM -0400, Rich Felker wrote:
> On Fri, Aug 10, 2012 at 02:21:03AM +0400, Solar Designer wrote:
> > Hmm, for me "gcc -Q -O2 --help=optimizers" and ditto for -O3 both show
> > "disabled" for -funroll-loops.  Why was the loop unrolled for you?
> 
> Not sure. I've found -Q --help=optimizers completely unreliable in the
> past though. It only reports minimal differences between -Os, -O2, and
> -O3, and trying to start with -O3 and reproduce -Os by just changing
> the options that are different does not give effects even remotely
> similar to -Os.

Frankly, this matches my experience.  OK, -Q --help=optimizers is
unreliable.  But is -O3 supposed to include -funroll-loops now?  Does
it?  Or did you get loop unrolling done for some other reason?  I think
this needs to be understood by us.

> > As discussed, the problem with avoiding such hand-unrolls is that the
> > compiler doesn't know just which loops are most important to unroll.
> 
> My experience has been that it tends to make good decisions overall,

Yes, good decisions overall - like measured in terms of geometric mean
or median for performance change across many functions (I wrote a script
called relbench that reports such measurements for JtR builds) - but
sometimes poor decisions for individual performance-critical functions.
So hand-unrolling in those special cases helps.

> and that if somebody is using -Os, they really want smallest size, not
> performance.

Maybe, however:

So far, -Os was often providing good performance as well, on par with -O2.
IIRC, in the relbench tests mentioned above, it was 92% of -O2 on gcc 4.6
on x86_64 for the geometric mean across about 150 separate benchmark
results, but in some cases -Os code was actually faster than -O2.

So someone using -Os may want nearly optimal code that is also slightly
smaller.  If for some function we get a more than ~8% hit with -Os vs.
-O3 (or whatever does the unrolling), this means that the function could
use some hand-optimization to fix that.

Alexander


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt* files in crypt directory
  2012-08-10 17:04                       ` Solar Designer
@ 2012-08-10 18:06                         ` Rich Felker
  0 siblings, 0 replies; 52+ messages in thread
From: Rich Felker @ 2012-08-10 18:06 UTC (permalink / raw)
  To: musl

On Fri, Aug 10, 2012 at 09:04:35PM +0400, Solar Designer wrote:
> > > why increase ptr at the begining?
> > > it seems the idiomatic way would be
> > > 
> > >  *ptr++ = L;
> > >  *ptr++ = R;
> > 
> > For me, making this change makes it 5% faster. I suspect the
> > difference comes from the fact that gcc is not smart enough to move
> > the ptr+=2; across the rest of the loop body, and the fact that it
> > gets spilled to the stack and reloaded for *both* points of usage
> > rather than just one. The original version may perform better on
> > machines with A LOT more registers, but I'm doubtful...
> 
> The spilling theory makes sense to me, but it does not fully explain the
> 5% difference - I think it could explain a 1% difference or so.  More
> likely there's some change in register allocation overall, not only for
> ptr - or something like it.

Indeed, that's possible too. I haven't read the asm diff.

> Anyhow, this does not match my test results so far, for different
> revisions of this code.  What compiler, options, architecture, CPU?

gcc 4.6.3, -O3, generic/i486 code generation, no tuning for my cpu,
which is Atom.

Rich


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt_blowfish integration, optimization
  2012-08-10 17:18                         ` Solar Designer
@ 2012-08-10 18:08                           ` Rich Felker
  2012-08-10 22:52                             ` Solar Designer
  0 siblings, 1 reply; 52+ messages in thread
From: Rich Felker @ 2012-08-10 18:08 UTC (permalink / raw)
  To: musl

On Fri, Aug 10, 2012 at 09:18:03PM +0400, Solar Designer wrote:
> > Not sure. I've found -Q --help=optimizers completely unreliable in the
> > past though. It only reports minimal differences between -Os, -O2, and
> > -O3, and trying to start with -O3 and reproduce -Os by just changing
> > the options that are different does not give effects even remotely
> > similar to -Os.
> 
> Frankly, this matches my experience.  OK, -Q --help=optimizers is
> unreliable.  But is -O3 supposed to include -funroll-loops now?  Does
> it?  Or did you get loop unrolling done for some other reason?  I think
> this needs to be understood by us.

Yes, -O3 includes -funroll-loops, which is intelligent about choosing
which loops to unroll. There's -funroll-all-loops that's much more
aggressive and not included in any -O level by default.

Rich


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: crypt_blowfish integration, optimization
  2012-08-10 18:08                           ` Rich Felker
@ 2012-08-10 22:52                             ` Solar Designer
  0 siblings, 0 replies; 52+ messages in thread
From: Solar Designer @ 2012-08-10 22:52 UTC (permalink / raw)
  To: musl

On Fri, Aug 10, 2012 at 02:08:14PM -0400, Rich Felker wrote:
> Yes, -O3 includes -funroll-loops,

That's good.

BTW, for the version of code I posted last, going from -O2 to -O3 with
gcc 3 actually slows things down by about 8% on 32-bit x86.

> which is intelligent about choosing which loops to unroll.

Yes, except that it can't know if the function is performance-critical
or not.

Alexander


^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2012-08-10 22:52 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-21 15:23 crypt* files in crypt directory Łukasz Sowa
2012-07-21 17:11 ` Solar Designer
2012-07-21 20:17   ` Rich Felker
2012-07-22 16:23   ` Łukasz Sowa
2012-07-25  7:57 ` Rich Felker
2012-08-08  2:24 ` Rich Felker
2012-08-08  4:42   ` Solar Designer
2012-08-08  5:28     ` Rich Felker
2012-08-08  6:27       ` Solar Designer
2012-08-08  7:03         ` Daniel Cegiełka
2012-08-08  7:24           ` Solar Designer
2012-08-08  7:42             ` Daniel Cegiełka
2012-08-08 21:48           ` Rich Felker
2012-08-08 23:08             ` Isaac Dunham
2012-08-08 23:24               ` John Spencer
2012-08-09  1:03                 ` Isaac Dunham
2012-08-09  3:16               ` Rich Felker
2012-08-09  3:36             ` Solar Designer
2012-08-09  7:13               ` orc
2012-08-09  7:28                 ` Rich Felker
2012-08-09  7:29               ` Solar Designer
2012-08-09 10:53                 ` Solar Designer
2012-08-09 11:58                   ` Szabolcs Nagy
2012-08-09 16:43                     ` Solar Designer
2012-08-09 17:30                       ` Szabolcs Nagy
2012-08-09 18:22                       ` Rich Felker
2012-08-09 23:21                     ` Rich Felker
2012-08-10 17:04                       ` Solar Designer
2012-08-10 18:06                         ` Rich Felker
2012-08-09 21:46                   ` crypt_blowfish integration, optimization Rich Felker
2012-08-09 22:21                     ` Solar Designer
2012-08-09 22:32                       ` Rich Felker
2012-08-10 17:18                         ` Solar Designer
2012-08-10 18:08                           ` Rich Felker
2012-08-10 22:52                             ` Solar Designer
2012-08-08  7:52     ` crypt* files in crypt directory Szabolcs Nagy
2012-08-08 13:06       ` Rich Felker
2012-08-08 14:30         ` orc
2012-08-08 14:53           ` Szabolcs Nagy
2012-08-08 15:05             ` orc
2012-08-08 18:10         ` Rich Felker
2012-08-09  1:51         ` Solar Designer
2012-08-09  3:25           ` Rich Felker
2012-08-09  4:04             ` Solar Designer
2012-08-09  5:48               ` Rich Felker
2012-08-09 15:52                 ` Solar Designer
2012-08-09 17:59                   ` Rich Felker
2012-08-09 21:17                   ` Rich Felker
2012-08-09 21:44                     ` Solar Designer
2012-08-09 22:08                       ` Rich Felker
2012-08-09 23:33           ` Rich Felker
2012-08-09  6:03   ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).