thoughts on reallocarray, explicit

mailing list of musl libc
 help / color / mirror / code / Atom feed

* thoughts on reallocarray, explicit_bzero?
@ 2014-05-19 15:31 Isaac Dunham
  2014-05-19 15:43 ` Rich Felker
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Isaac Dunham @ 2014-05-19 15:31 UTC (permalink / raw)
  To: musl

Having read up on the LibreSSL fork of OpenSSL and also recently
backported a nuber of libXfont CVE fixes for integer overflows,
I've seen the risk posed by malloc(n*sizeof(x)) and realloc(ptr,
n*sizeof(x)).
calloc(n, sizeof(x)) can be used in place of malloc(n * sizeof(x)), 
but there's no standard function that does overflow checking for 
realloc(). OpenBSD has provided the extension reallocarray(), which 
provides for bounds checking like calloc() does.

Additionally, there are times when a compiler will optimize away calls
to bzero() on areas that are not used before free(); this can result in
passwords getting left in memory. OpenBSD uses a wrapper function called
explicit_bzero() to keep this from happening, thugh it seems to be possible
to use some ugliness with volatile to stop it.

Should musl provide reallocarray()? 
And what's the best way to ensure that memory gets zeroed out? 

Thanks,
Isaac Dunham

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: thoughts on reallocarray, explicit_bzero?
  2014-05-19 15:31 thoughts on reallocarray, explicit_bzero? Isaac Dunham
@ 2014-05-19 15:43 ` Rich Felker
  2014-05-19 16:19   ` Daniel Cegiełka
  2014-05-19 15:44 ` Daniel Cegiełka
  2014-05-19 16:25 ` Szabolcs Nagy
  2 siblings, 1 reply; 30+ messages in thread
From: Rich Felker @ 2014-05-19 15:43 UTC (permalink / raw)
  To: musl

On Mon, May 19, 2014 at 08:31:31AM -0700, Isaac Dunham wrote:
> Having read up on the LibreSSL fork of OpenSSL and also recently
> backported a nuber of libXfont CVE fixes for integer overflows,
> I've seen the risk posed by malloc(n*sizeof(x)) and realloc(ptr,
> n*sizeof(x)).
> calloc(n, sizeof(x)) can be used in place of malloc(n * sizeof(x)), 
> but there's no standard function that does overflow checking for 
> realloc(). OpenBSD has provided the extension reallocarray(), which 
> provides for bounds checking like calloc() does.
> 
> Additionally, there are times when a compiler will optimize away calls
> to bzero() on areas that are not used before free(); this can result in
> passwords getting left in memory. OpenBSD uses a wrapper function called
> explicit_bzero() to keep this from happening, thugh it seems to be possible
> to use some ugliness with volatile to stop it.
> 
> Should musl provide reallocarray()? 

I don't think so; there's no precedent for it yet except OpenBSD.
LibreSSL should just add a fallback implementation (trivial to write
as a wrapper to realloc) for all non-OpenBSD systems that lack it.

I'm not fundamentally opposed to reallocarray, but I also don't think
it's useful to implement it unless there's a consensus from other
implementations on adding it.

> And what's the best way to ensure that memory gets zeroed out? 

read() from /dev/zero? ;-)

Seriously, this is a tough one, because no matter what you try, a
smart enough compiler will work around it. There's the Annex K
memset_s, but Annex K is generally horrible. The OpenBSD approach of
"explicit_bzero" is also pretty bad since "bzero" is a deprecated
interface.

Note that there's not even a way to _implement_ this kind of
zero-filling without resorting to compiler extensions or asm. The
compiler is free (e.g. with LTO) to look inside the implementation and
determine that it does nothing.

Solar has one clever "solution" to this problem in the bcrypt code he
contributed to musl (see the source for the trick) but it's not an
explicit zeroing; rather it's a reuse of the buffer in a way that
would be very difficult for the compiler to optimize out.

Rich

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: thoughts on reallocarray, explicit_bzero?
  2014-05-19 15:43 ` Rich Felker
@ 2014-05-19 16:19   ` Daniel Cegiełka
  2014-05-20  6:19     ` Rich Felker
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Cegiełka @ 2014-05-19 16:19 UTC (permalink / raw)
  To: musl

2014-05-19 17:43 GMT+02:00 Rich Felker <dalias@libc.org>:

>>
>> Should musl provide reallocarray()?
>
> I don't think so; there's no precedent for it yet except OpenBSD.
> LibreSSL should just add a fallback implementation (trivial to write
> as a wrapper to realloc) for all non-OpenBSD systems that lack it.

The problem is that these functions are now used everywhere in
openbsd's repo., so it's not just LibreSSL but ed, pax, mandoc, m4,
(open)ssh, (open)smtpd, etc.

Thanks,
Daniel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: thoughts on reallocarray, explicit_bzero?
  2014-05-19 16:19   ` Daniel Cegiełka
@ 2014-05-20  6:19     ` Rich Felker
  2014-05-20 15:50       ` Daniel Cegiełka
  0 siblings, 1 reply; 30+ messages in thread
From: Rich Felker @ 2014-05-20  6:19 UTC (permalink / raw)
  To: musl

On Mon, May 19, 2014 at 06:19:11PM +0200, Daniel Cegiełka wrote:
> 2014-05-19 17:43 GMT+02:00 Rich Felker <dalias@libc.org>:
> 
> >>
> >> Should musl provide reallocarray()?
> >
> > I don't think so; there's no precedent for it yet except OpenBSD.
> > LibreSSL should just add a fallback implementation (trivial to write
> > as a wrapper to realloc) for all non-OpenBSD systems that lack it.
> 
> The problem is that these functions are now used everywhere in
> openbsd's repo., so it's not just LibreSSL but ed, pax, mandoc, m4,
> (open)ssh, (open)smtpd, etc.

It looks like glibc is willing to add reallocarray, so that makes at
least two systems now. If there's demand maybe we should add it.

Rich


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: thoughts on reallocarray, explicit_bzero?
  2014-05-20  6:19     ` Rich Felker
@ 2014-05-20 15:50       ` Daniel Cegiełka
  0 siblings, 0 replies; 30+ messages in thread
From: Daniel Cegiełka @ 2014-05-20 15:50 UTC (permalink / raw)
  To: musl

2014-05-20 8:19 GMT+02:00 Rich Felker <dalias@libc.org>:
> On Mon, May 19, 2014 at 06:19:11PM +0200, Daniel Cegiełka wrote:
>> 2014-05-19 17:43 GMT+02:00 Rich Felker <dalias@libc.org>:
>>
>> >>
>> >> Should musl provide reallocarray()?
>> >
>> > I don't think so; there's no precedent for it yet except OpenBSD.
>> > LibreSSL should just add a fallback implementation (trivial to write
>> > as a wrapper to realloc) for all non-OpenBSD systems that lack it.
>>
>> The problem is that these functions are now used everywhere in
>> openbsd's repo., so it's not just LibreSSL but ed, pax, mandoc, m4,
>> (open)ssh, (open)smtpd, etc.
>
> It looks like glibc is willing to add reallocarray, so that makes at
> least two systems now. If there's demand maybe we should add it.

If glibc will support this function, then we can also in musl. These
functions will be distributed together with openssh, libressl, mandoc
or pax etc., so I think it is better to have it well done in libc
(like strlcat and strlcpy).

Daniel

>
> Rich


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: thoughts on reallocarray, explicit_bzero?
  2014-05-19 15:31 thoughts on reallocarray, explicit_bzero? Isaac Dunham
  2014-05-19 15:43 ` Rich Felker
@ 2014-05-19 15:44 ` Daniel Cegiełka
  2014-05-19 16:16   ` Rich Felker
  2014-05-19 16:25 ` Szabolcs Nagy
  2 siblings, 1 reply; 30+ messages in thread
From: Daniel Cegiełka @ 2014-05-19 15:44 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1115 bytes --]

2014-05-19 17:31 GMT+02:00 Isaac Dunham <ibid.ag@gmail.com>:
> Having read up on the LibreSSL fork of OpenSSL and also recently
> backported a nuber of libXfont CVE fixes for integer overflows,
> I've seen the risk posed by malloc(n*sizeof(x)) and realloc(ptr,
> n*sizeof(x)).
> calloc(n, sizeof(x)) can be used in place of malloc(n * sizeof(x)),
> but there's no standard function that does overflow checking for
> realloc(). OpenBSD has provided the extension reallocarray(), which
> provides for bounds checking like calloc() does.
>
> Additionally, there are times when a compiler will optimize away calls
> to bzero() on areas that are not used before free(); this can result in
> passwords getting left in memory. OpenBSD uses a wrapper function called
> explicit_bzero() to keep this from happening, thugh it seems to be possible
> to use some ugliness with volatile to stop it.
>
> Should musl provide reallocarray()?

In my opinion, yes, we should.

btw. no bzero()/bcopy() but memset() and memcpy() etc.

Daniel

> And what's the best way to ensure that memory gets zeroed out?
>
> Thanks,
> Isaac Dunham

[-- Attachment #2: explicit_bzero.diff --]
[-- Type: text/plain, Size: 796 bytes --]

diff -urN musl.orig/include/string.h musl/include/string.h
--- musl.orig/include/string.h	Fri May  9 09:49:36 2014
+++ musl/include/string.h	Fri May  9 09:57:10 2014
@@ -82,6 +82,7 @@
 char *strsep(char **, const char *);
 size_t strlcat (char *, const char *, size_t);
 size_t strlcpy (char *, const char *, size_t);
+void explicit_bzero(void *b, size_t len);
 #endif
 
 #ifdef _GNU_SOURCE
diff -urN musl.orig/src/string/explicit_bzero.c musl/src/string/explicit_bzero.c
--- musl.orig/src/string/explicit_bzero.c	Thu Jan  1 00:00:00 1970
+++ musl/src/string/explicit_bzero.c	Fri May  9 09:57:45 2014
@@ -0,0 +1,8 @@
+#include <string.h>
+
+static void *(*volatile explicit_memset)(void *, int, size_t) = memset;
+
+void explicit_bzero(void *b, size_t len)
+{
+	(*explicit_memset)(b, 0, len);
+}

[-- Attachment #3: reallocarray.diff --]
[-- Type: text/plain, Size: 1131 bytes --]

diff -urN musl.orig/include/stdlib.h musl/include/stdlib.h
--- musl.orig/include/stdlib.h	Thu May  8 09:04:08 2014
+++ musl/include/stdlib.h	Thu May  8 09:11:06 2014
@@ -44,6 +44,9 @@
 void *realloc (void *, size_t);
 void free (void *);
 void *aligned_alloc(size_t alignment, size_t size);
+#ifdef _BSD_SOURCE
+void *reallocarray(void *, size_t, size_t);
+#endif
 
 _Noreturn void abort (void);
 int atexit (void (*) (void));
diff -urN musl.orig/src/stdlib/reallocarray.c musl/src/stdlib/reallocarray.c
--- musl.orig/src/stdlib/reallocarray.c	Thu Jan  1 00:00:00 1970
+++ musl/src/stdlib/reallocarray.c	Thu May  8 09:06:30 2014
@@ -0,0 +1,17 @@
+#include <stdlib.h>
+#include <limits.h>
+#include <errno.h>
+
+/* this is sqrt(SIZE_MAX+1), as s1*s2 <= SIZE_MAX
+ * if both s1 < MUL_NO_OVERFLOW and s2 < MUL_NO_OVERFLOW */
+#define MUL_NO_OVERFLOW	(1UL << (sizeof(size_t) * 4))
+
+void *reallocarray(void *optr, size_t nmemb, size_t size)
+{
+	if ((nmemb >= MUL_NO_OVERFLOW || size >= MUL_NO_OVERFLOW) &&
+	    nmemb > 0 && SSIZE_MAX / nmemb < size) {
+		errno = ENOMEM;
+		return NULL;
+	}
+	return realloc(optr, size * nmemb);
+}

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: thoughts on reallocarray, explicit_bzero?
  2014-05-19 15:44 ` Daniel Cegiełka
@ 2014-05-19 16:16   ` Rich Felker
  2014-05-19 16:30     ` Daniel Cegiełka
                       ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Rich Felker @ 2014-05-19 16:16 UTC (permalink / raw)
  To: musl

On Mon, May 19, 2014 at 05:44:59PM +0200, Daniel Cegiełka wrote:
> 2014-05-19 17:31 GMT+02:00 Isaac Dunham <ibid.ag@gmail.com>:
> > Having read up on the LibreSSL fork of OpenSSL and also recently
> > backported a nuber of libXfont CVE fixes for integer overflows,
> > I've seen the risk posed by malloc(n*sizeof(x)) and realloc(ptr,
> > n*sizeof(x)).
> > calloc(n, sizeof(x)) can be used in place of malloc(n * sizeof(x)),
> > but there's no standard function that does overflow checking for
> > realloc(). OpenBSD has provided the extension reallocarray(), which
> > provides for bounds checking like calloc() does.
> >
> > Additionally, there are times when a compiler will optimize away calls
> > to bzero() on areas that are not used before free(); this can result in
> > passwords getting left in memory. OpenBSD uses a wrapper function called
> > explicit_bzero() to keep this from happening, thugh it seems to be possible
> > to use some ugliness with volatile to stop it.
> >
> > Should musl provide reallocarray()?
> 
> In my opinion, yes, we should.

In the long term I'm not strongly decided one way or the other, but
LibreSSL needs to provide its own fallbacks for these to be portable
to any system but OpenBSD. I don't want to be part of their game of
imposing new nonstandard interfaces where they could just as easily
achieve the same via wrapping standard interfaces.

> btw. no bzero()/bcopy() but memset() and memcpy() etc.
> 
> Daniel
> 
> > And what's the best way to ensure that memory gets zeroed out?
> >
> > Thanks,
> > Isaac Dunham

> diff -urN musl.orig/include/string.h musl/include/string.h
> --- musl.orig/include/string.h	Fri May  9 09:49:36 2014
> +++ musl/include/string.h	Fri May  9 09:57:10 2014
> @@ -82,6 +82,7 @@
>  char *strsep(char **, const char *);
>  size_t strlcat (char *, const char *, size_t);
>  size_t strlcpy (char *, const char *, size_t);
> +void explicit_bzero(void *b, size_t len);
>  #endif
>  
>  #ifdef _GNU_SOURCE
> diff -urN musl.orig/src/string/explicit_bzero.c musl/src/string/explicit_bzero.c
> --- musl.orig/src/string/explicit_bzero.c	Thu Jan  1 00:00:00 1970
> +++ musl/src/string/explicit_bzero.c	Fri May  9 09:57:45 2014
> @@ -0,0 +1,8 @@
> +#include <string.h>
> +
> +static void *(*volatile explicit_memset)(void *, int, size_t) = memset;
> +
> +void explicit_bzero(void *b, size_t len)
> +{
> +	(*explicit_memset)(b, 0, len);
> +}

This is a nice trick, but IIRC I actually observed GCC optimizing out
similar code before (instead of your static volatile, I used a
volatile compound literal). At least the concept is right though: you
want to prevent the compiler from being able to do any flow analysis
at compile time, and making the function pointer volatile achieves
this rather well. On the other hand, GCC will put the volatile pointer
(if it even emits it) in non-constant memory, meaning it's an
additional attack vector for function-pointer-overwrite attacks. And
as mentioned in the other email, I don't really like the idea of
making a new variant of a deprecated/removed function (bzero). The
name "explicit" is rather unclear what it means too.

> diff -urN musl.orig/include/stdlib.h musl/include/stdlib.h
> --- musl.orig/include/stdlib.h	Thu May  8 09:04:08 2014
> +++ musl/include/stdlib.h	Thu May  8 09:11:06 2014
> @@ -44,6 +44,9 @@
>  void *realloc (void *, size_t);
>  void free (void *);
>  void *aligned_alloc(size_t alignment, size_t size);
> +#ifdef _BSD_SOURCE
> +void *reallocarray(void *, size_t, size_t);
> +#endif
>  
>  _Noreturn void abort (void);
>  int atexit (void (*) (void));
> diff -urN musl.orig/src/stdlib/reallocarray.c musl/src/stdlib/reallocarray.c
> --- musl.orig/src/stdlib/reallocarray.c	Thu Jan  1 00:00:00 1970
> +++ musl/src/stdlib/reallocarray.c	Thu May  8 09:06:30 2014
> @@ -0,0 +1,17 @@
> +#include <stdlib.h>
> +#include <limits.h>
> +#include <errno.h>
> +
> +/* this is sqrt(SIZE_MAX+1), as s1*s2 <= SIZE_MAX
> + * if both s1 < MUL_NO_OVERFLOW and s2 < MUL_NO_OVERFLOW */
> +#define MUL_NO_OVERFLOW	(1UL << (sizeof(size_t) * 4))
> +
> +void *reallocarray(void *optr, size_t nmemb, size_t size)
> +{
> +	if ((nmemb >= MUL_NO_OVERFLOW || size >= MUL_NO_OVERFLOW) &&
> +	    nmemb > 0 && SSIZE_MAX / nmemb < size) {
> +		errno = ENOMEM;
> +		return NULL;
> +	}
> +	return realloc(optr, size * nmemb);
> +}

While it's a bit ugly, if your goal is efficiency, it makes a lot more
sense to special-case 32-bit systems and do a 32x32 -> 64 multiply.
This makes it so you don't need division code or any extra branches.
And for 64-bit systems, either nmemb or size being >32bit would be a
pathological corner case (and very slow already anyway), so your check
is efficient.

Also, is there a reason you're using SSIZE_MAX? SIZE_MAX should work
just as well here, but functionally it makes no difference.

Rich


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: thoughts on reallocarray, explicit_bzero?
  2014-05-19 16:16   ` Rich Felker
@ 2014-05-19 16:30     ` Daniel Cegiełka
  2014-05-19 16:32     ` Szabolcs Nagy
  2015-01-28 22:01     ` Daniel Cegiełka
  2 siblings, 0 replies; 30+ messages in thread
From: Daniel Cegiełka @ 2014-05-19 16:30 UTC (permalink / raw)
  To: musl

2014-05-19 18:16 GMT+02:00 Rich Felker <dalias@libc.org>:

>>  _Noreturn void abort (void);
>>  int atexit (void (*) (void));
>> diff -urN musl.orig/src/stdlib/reallocarray.c musl/src/stdlib/reallocarray.c
>> --- musl.orig/src/stdlib/reallocarray.c       Thu Jan  1 00:00:00 1970
>> +++ musl/src/stdlib/reallocarray.c    Thu May  8 09:06:30 2014
>> @@ -0,0 +1,17 @@
>> +#include <stdlib.h>
>> +#include <limits.h>
>> +#include <errno.h>
>> +
>> +/* this is sqrt(SIZE_MAX+1), as s1*s2 <= SIZE_MAX
>> + * if both s1 < MUL_NO_OVERFLOW and s2 < MUL_NO_OVERFLOW */
>> +#define MUL_NO_OVERFLOW      (1UL << (sizeof(size_t) * 4))
>> +
>> +void *reallocarray(void *optr, size_t nmemb, size_t size)
>> +{
>> +     if ((nmemb >= MUL_NO_OVERFLOW || size >= MUL_NO_OVERFLOW) &&
>> +         nmemb > 0 && SSIZE_MAX / nmemb < size) {
>> +             errno = ENOMEM;
>> +             return NULL;
>> +     }
>> +     return realloc(optr, size * nmemb);
>> +}
>
> While it's a bit ugly, if your goal is efficiency, it makes a lot more
> sense to special-case 32-bit systems and do a 32x32 -> 64 multiply.
> This makes it so you don't need division code or any extra branches.
> And for 64-bit systems, either nmemb or size being >32bit would be a
> pathological corner case (and very slow already anyway), so your check
> is efficient.

It was a quick fix only from malloc.c from openbsd :) I use a lot of
programs from openbsd and I had problems with the compilation.

>
> Also, is there a reason you're using SSIZE_MAX? SIZE_MAX should work
> just as well here, but functionally it makes no difference.

yes, I should correct it:
http://www.openbsd.org/cgi-bin/cvsweb/src/lib/libc/stdlib/reallocarray.c?rev=1.1;content-type=text%2Fplain

Thanks,
Daniel

> Rich


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: thoughts on reallocarray, explicit_bzero?
  2014-05-19 16:16   ` Rich Felker
  2014-05-19 16:30     ` Daniel Cegiełka
@ 2014-05-19 16:32     ` Szabolcs Nagy
  2015-01-28 22:01     ` Daniel Cegiełka
  2 siblings, 0 replies; 30+ messages in thread
From: Szabolcs Nagy @ 2014-05-19 16:32 UTC (permalink / raw)
  To: musl

* Rich Felker <dalias@libc.org> [2014-05-19 12:16:54 -0400]:
> On Mon, May 19, 2014 at 05:44:59PM +0200, Daniel Cegie??ka wrote:
> > +#define MUL_NO_OVERFLOW	(1UL << (sizeof(size_t) * 4))
> > +
> > +void *reallocarray(void *optr, size_t nmemb, size_t size)
> > +{
> > +	if ((nmemb >= MUL_NO_OVERFLOW || size >= MUL_NO_OVERFLOW) &&
> > +	    nmemb > 0 && SSIZE_MAX / nmemb < size) {
> > +		errno = ENOMEM;
> > +		return NULL;
> > +	}
> > +	return realloc(optr, size * nmemb);
> > +}
> 
> While it's a bit ugly, if your goal is efficiency, it makes a lot more
> sense to special-case 32-bit systems and do a 32x32 -> 64 multiply.
> This makes it so you don't need division code or any extra branches.
> And for 64-bit systems, either nmemb or size being >32bit would be a
> pathological corner case (and very slow already anyway), so your check
> is efficient.
> 
> Also, is there a reason you're using SSIZE_MAX? SIZE_MAX should work
> just as well here, but functionally it makes no difference.

i think this ugly code is from openbsd

they like to use such obfuscated checks for no apparent reason..


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: thoughts on reallocarray, explicit_bzero?
  2014-05-19 16:16   ` Rich Felker
  2014-05-19 16:30     ` Daniel Cegiełka
  2014-05-19 16:32     ` Szabolcs Nagy
@ 2015-01-28 22:01     ` Daniel Cegiełka
  2015-01-28 22:34       ` Daniel Cegiełka
  2 siblings, 1 reply; 30+ messages in thread
From: Daniel Cegiełka @ 2015-01-28 22:01 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 1539 bytes --]

2014-05-19 18:16 GMT+02:00 Rich Felker <dalias@libc.org>:
> On Mon, May 19, 2014 at 05:44:59PM +0200, Daniel Cegiełka wrote:

>> diff -urN musl.orig/src/string/explicit_bzero.c musl/src/string/explicit_bzero.c
>> --- musl.orig/src/string/explicit_bzero.c     Thu Jan  1 00:00:00 1970
>> +++ musl/src/string/explicit_bzero.c  Fri May  9 09:57:45 2014
>> @@ -0,0 +1,8 @@
>> +#include <string.h>
>> +
>> +static void *(*volatile explicit_memset)(void *, int, size_t) = memset;
>> +
>> +void explicit_bzero(void *b, size_t len)
>> +{
>> +     (*explicit_memset)(b, 0, len);
>> +}
>
> This is a nice trick, but IIRC I actually observed GCC optimizing out
> similar code before (instead of your static volatile, I used a
> volatile compound literal). At least the concept is right though: you
> want to prevent the compiler from being able to do any flow analysis
> at compile time, and making the function pointer volatile achieves
> this rather well. On the other hand, GCC will put the volatile pointer
> (if it even emits it) in non-constant memory, meaning it's an
> additional attack vector for function-pointer-overwrite attacks.

Linux kernel has similar functions and uses a barrier() here:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/lib/string.c?id=refs/tags/v3.19-rc6#n600

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/linux/compiler.h?id=refs/tags/v3.19-rc6#n162

Is such a solution is more correct (and still portable)?

Daniel

> Rich

[-- Attachment #2: explicit_bzero.c --]
[-- Type: text/x-csrc, Size: 220 bytes --]

#include <string.h>

static void *(*volatile explicit_memset)(void *, int, size_t) = memset;

void explicit_bzero(void *b, size_t len)
{
	(*explicit_memset)(b, 0, len);
	__asm__ volatile ( "" : : "r" (b) : "memory" );
}

[-- Attachment #3: explicit_bzero2.c --]
[-- Type: text/x-csrc, Size: 135 bytes --]

#include <string.h>

void explicit_bzero(void *b, size_t len)
{
	memset(b, 0, len);
	__asm__ volatile ( "" : : "r" (b) : "memory" );
}

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: thoughts on reallocarray, explicit_bzero?
  2015-01-28 22:01     ` Daniel Cegiełka
@ 2015-01-28 22:34       ` Daniel Cegiełka
  2015-01-28 22:38         ` Nathan McSween
  2015-01-29  2:19         ` Rich Felker
  0 siblings, 2 replies; 30+ messages in thread
From: Daniel Cegiełka @ 2015-01-28 22:34 UTC (permalink / raw)
  To: musl

2015-01-28 23:01 GMT+01:00 Daniel Cegiełka <daniel.cegielka@gmail.com>:
> 2014-05-19 18:16 GMT+02:00 Rich Felker <dalias@libc.org>:
>> On Mon, May 19, 2014 at 05:44:59PM +0200, Daniel Cegiełka wrote:
>
>>> diff -urN musl.orig/src/string/explicit_bzero.c musl/src/string/explicit_bzero.c
>>> --- musl.orig/src/string/explicit_bzero.c     Thu Jan  1 00:00:00 1970
>>> +++ musl/src/string/explicit_bzero.c  Fri May  9 09:57:45 2014
>>> @@ -0,0 +1,8 @@
>>> +#include <string.h>
>>> +
>>> +static void *(*volatile explicit_memset)(void *, int, size_t) = memset;
>>> +
>>> +void explicit_bzero(void *b, size_t len)
>>> +{
>>> +     (*explicit_memset)(b, 0, len);
>>> +}
>>
>> This is a nice trick, but IIRC I actually observed GCC optimizing out
>> similar code before (instead of your static volatile, I used a
>> volatile compound literal). At least the concept is right though: you
>> want to prevent the compiler from being able to do any flow analysis
>> at compile time, and making the function pointer volatile achieves
>> this rather well. On the other hand, GCC will put the volatile pointer
>> (if it even emits it) in non-constant memory, meaning it's an
>> additional attack vector for function-pointer-overwrite attacks.
>
> Linux kernel has similar functions and uses a barrier() here:
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/lib/string.c?id=refs/tags/v3.19-rc6#n600
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/linux/compiler.h?id=refs/tags/v3.19-rc6#n162
>
> Is such a solution is more correct (and still portable)?

I'm afraid that the only appropriate solution is to use memset_s()
from C11 and the expectation that the compiler will accept it.
barrier() does not give any guarantee that this function will be
secure. Only compiler decides. I'm afraid that OpenBSD goes bad path
with explicit_bzero(). The same applies to the linux kernel and
memzero_explicit().. very stupid name...

Daniel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: thoughts on reallocarray, explicit_bzero?
  2015-01-28 22:34       ` Daniel Cegiełka
@ 2015-01-28 22:38         ` Nathan McSween
  2015-01-28 22:54           ` Daniel Cegiełka
  2015-01-29  2:19         ` Rich Felker
  1 sibling, 1 reply; 30+ messages in thread
From: Nathan McSween @ 2015-01-28 22:38 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 2248 bytes --]

How can a compiler optimize out a memory barrier?
On Jan 28, 2015 2:34 PM, "Daniel Cegiełka" <daniel.cegielka@gmail.com>
wrote:

> 2015-01-28 23:01 GMT+01:00 Daniel Cegiełka <daniel.cegielka@gmail.com>:
> > 2014-05-19 18:16 GMT+02:00 Rich Felker <dalias@libc.org>:
> >> On Mon, May 19, 2014 at 05:44:59PM +0200, Daniel Cegiełka wrote:
> >
> >>> diff -urN musl.orig/src/string/explicit_bzero.c
> musl/src/string/explicit_bzero.c
> >>> --- musl.orig/src/string/explicit_bzero.c     Thu Jan  1 00:00:00 1970
> >>> +++ musl/src/string/explicit_bzero.c  Fri May  9 09:57:45 2014
> >>> @@ -0,0 +1,8 @@
> >>> +#include <string.h>
> >>> +
> >>> +static void *(*volatile explicit_memset)(void *, int, size_t) =
> memset;
> >>> +
> >>> +void explicit_bzero(void *b, size_t len)
> >>> +{
> >>> +     (*explicit_memset)(b, 0, len);
> >>> +}
> >>
> >> This is a nice trick, but IIRC I actually observed GCC optimizing out
> >> similar code before (instead of your static volatile, I used a
> >> volatile compound literal). At least the concept is right though: you
> >> want to prevent the compiler from being able to do any flow analysis
> >> at compile time, and making the function pointer volatile achieves
> >> this rather well. On the other hand, GCC will put the volatile pointer
> >> (if it even emits it) in non-constant memory, meaning it's an
> >> additional attack vector for function-pointer-overwrite attacks.
> >
> > Linux kernel has similar functions and uses a barrier() here:
> >
> >
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/lib/string.c?id=refs/tags/v3.19-rc6#n600
> >
> >
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/linux/compiler.h?id=refs/tags/v3.19-rc6#n162
> >
> > Is such a solution is more correct (and still portable)?
>
> I'm afraid that the only appropriate solution is to use memset_s()
> from C11 and the expectation that the compiler will accept it.
> barrier() does not give any guarantee that this function will be
> secure. Only compiler decides. I'm afraid that OpenBSD goes bad path
> with explicit_bzero(). The same applies to the linux kernel and
> memzero_explicit().. very stupid name...
>
> Daniel
>

[-- Attachment #2: Type: text/html, Size: 3178 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: thoughts on reallocarray, explicit_bzero?
  2015-01-28 22:38         ` Nathan McSween
@ 2015-01-28 22:54           ` Daniel Cegiełka
  2015-01-28 23:02             ` Josiah Worcester
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Cegiełka @ 2015-01-28 22:54 UTC (permalink / raw)
  To: musl

2015-01-28 23:38 GMT+01:00 Nathan McSween <nwmcsween@gmail.com>:
> How can a compiler optimize out a memory barrier?

The problem is not whether to do it or not, but in the fact that the
right solution is to use memset_s().

http://en.wikichip.org/wiki/c/c11#Bounds-checking_interfaces_annex_.28Annex_K.29


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: thoughts on reallocarray, explicit_bzero?
  2015-01-28 22:54           ` Daniel Cegiełka
@ 2015-01-28 23:02             ` Josiah Worcester
  0 siblings, 0 replies; 30+ messages in thread
From: Josiah Worcester @ 2015-01-28 23:02 UTC (permalink / raw)
  To: musl

On Wed, Jan 28, 2015 at 4:54 PM, Daniel Cegiełka
<daniel.cegielka@gmail.com> wrote:
> 2015-01-28 23:38 GMT+01:00 Nathan McSween <nwmcsween@gmail.com>:
>> How can a compiler optimize out a memory barrier?
>
> The problem is not whether to do it or not, but in the fact that the
> right solution is to use memset_s().
>
> http://en.wikichip.org/wiki/c/c11#Bounds-checking_interfaces_annex_.28Annex_K.29
memset_s inherits the typical Annex K brain damage which makes it not
at all library safe or sane to use, though.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: thoughts on reallocarray, explicit_bzero?
  2015-01-28 22:34       ` Daniel Cegiełka
  2015-01-28 22:38         ` Nathan McSween
@ 2015-01-29  2:19         ` Rich Felker
  2015-01-29  4:03           ` Brent Cook
  1 sibling, 1 reply; 30+ messages in thread
From: Rich Felker @ 2015-01-29  2:19 UTC (permalink / raw)
  To: musl

On Wed, Jan 28, 2015 at 11:34:20PM +0100, Daniel Cegiełka wrote:
> 2015-01-28 23:01 GMT+01:00 Daniel Cegiełka <daniel.cegielka@gmail.com>:
> > 2014-05-19 18:16 GMT+02:00 Rich Felker <dalias@libc.org>:
> >> On Mon, May 19, 2014 at 05:44:59PM +0200, Daniel Cegiełka wrote:
> >
> >>> diff -urN musl.orig/src/string/explicit_bzero.c musl/src/string/explicit_bzero.c
> >>> --- musl.orig/src/string/explicit_bzero.c     Thu Jan  1 00:00:00 1970
> >>> +++ musl/src/string/explicit_bzero.c  Fri May  9 09:57:45 2014
> >>> @@ -0,0 +1,8 @@
> >>> +#include <string.h>
> >>> +
> >>> +static void *(*volatile explicit_memset)(void *, int, size_t) = memset;
> >>> +
> >>> +void explicit_bzero(void *b, size_t len)
> >>> +{
> >>> +     (*explicit_memset)(b, 0, len);
> >>> +}
> >>
> >> This is a nice trick, but IIRC I actually observed GCC optimizing out
> >> similar code before (instead of your static volatile, I used a
> >> volatile compound literal). At least the concept is right though: you
> >> want to prevent the compiler from being able to do any flow analysis
> >> at compile time, and making the function pointer volatile achieves
> >> this rather well. On the other hand, GCC will put the volatile pointer
> >> (if it even emits it) in non-constant memory, meaning it's an
> >> additional attack vector for function-pointer-overwrite attacks.
> >
> > Linux kernel has similar functions and uses a barrier() here:
> >
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/lib/string.c?id=refs/tags/v3.19-rc6#n600
> >
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/linux/compiler.h?id=refs/tags/v3.19-rc6#n162
> >
> > Is such a solution is more correct (and still portable)?
> 
> I'm afraid that the only appropriate solution is to use memset_s()
> from C11 and the expectation that the compiler will accept it.
> barrier() does not give any guarantee that this function will be
> secure. Only compiler decides. I'm afraid that OpenBSD goes bad path
> with explicit_bzero(). The same applies to the linux kernel and
> memzero_explicit().. very stupid name...

I see no way memset_s is technically "better". It's unable to find and
clear other temporary copies that have been made, and the barrier
method described above already reliably clears the pointed-to copy.

Rich


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: thoughts on reallocarray, explicit_bzero?
  2015-01-29  2:19         ` Rich Felker
@ 2015-01-29  4:03           ` Brent Cook
  2015-01-29  4:15             ` Rich Felker
  0 siblings, 1 reply; 30+ messages in thread
From: Brent Cook @ 2015-01-29  4:03 UTC (permalink / raw)
  To: musl

On Wed, Jan 28, 2015 at 8:19 PM, Rich Felker <dalias@libc.org> wrote:
> On Wed, Jan 28, 2015 at 11:34:20PM +0100, Daniel Cegiełka wrote:
>> 2015-01-28 23:01 GMT+01:00 Daniel Cegiełka <daniel.cegielka@gmail.com>:
>> > 2014-05-19 18:16 GMT+02:00 Rich Felker <dalias@libc.org>:
>> >> On Mon, May 19, 2014 at 05:44:59PM +0200, Daniel Cegiełka wrote:
>> >
>> >>> diff -urN musl.orig/src/string/explicit_bzero.c musl/src/string/explicit_bzero.c
>> >>> --- musl.orig/src/string/explicit_bzero.c     Thu Jan  1 00:00:00 1970
>> >>> +++ musl/src/string/explicit_bzero.c  Fri May  9 09:57:45 2014
>> >>> @@ -0,0 +1,8 @@
>> >>> +#include <string.h>
>> >>> +
>> >>> +static void *(*volatile explicit_memset)(void *, int, size_t) = memset;
>> >>> +
>> >>> +void explicit_bzero(void *b, size_t len)
>> >>> +{
>> >>> +     (*explicit_memset)(b, 0, len);
>> >>> +}
>> >>
>> >> This is a nice trick, but IIRC I actually observed GCC optimizing out
>> >> similar code before (instead of your static volatile, I used a
>> >> volatile compound literal). At least the concept is right though: you
>> >> want to prevent the compiler from being able to do any flow analysis
>> >> at compile time, and making the function pointer volatile achieves
>> >> this rather well. On the other hand, GCC will put the volatile pointer
>> >> (if it even emits it) in non-constant memory, meaning it's an
>> >> additional attack vector for function-pointer-overwrite attacks.
>> >
>> > Linux kernel has similar functions and uses a barrier() here:
>> >
>> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/lib/string.c?id=refs/tags/v3.19-rc6#n600
>> >
>> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/linux/compiler.h?id=refs/tags/v3.19-rc6#n162
>> >
>> > Is such a solution is more correct (and still portable)?
>>
>> I'm afraid that the only appropriate solution is to use memset_s()
>> from C11 and the expectation that the compiler will accept it.
>> barrier() does not give any guarantee that this function will be
>> secure. Only compiler decides. I'm afraid that OpenBSD goes bad path
>> with explicit_bzero(). The same applies to the linux kernel and
>> memzero_explicit().. very stupid name...
>
> I see no way memset_s is technically "better". It's unable to find and
> clear other temporary copies that have been made, and the barrier
> method described above already reliably clears the pointed-to copy.
>
> Rich

Whatever method you choose, the method of testing is an interesting
one, since seeing if the compiler optimized out a memset (because the
memory was not read after a write) requires tricking the compiler into
believing you aren't reading it. This test is pretty cool, IMO:

https://github.com/libressl-portable/openbsd/blob/master/src/regress/lib/libc/explicit_bzero/explicit_bzero.c

it is described a bit more here:
https://plus.google.com/+MatthewDempsky/posts/KQHFBouxurX

Getting around link-time optimizations required building the
explicit_bzero function with independent compiler flags to ensure LTO
was not enabled.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: thoughts on reallocarray, explicit_bzero?
  2015-01-29  4:03           ` Brent Cook
@ 2015-01-29  4:15             ` Rich Felker
  2015-01-29  9:30               ` Daniel Cegiełka
  0 siblings, 1 reply; 30+ messages in thread
From: Rich Felker @ 2015-01-29  4:15 UTC (permalink / raw)
  To: musl

On Wed, Jan 28, 2015 at 10:03:33PM -0600, Brent Cook wrote:
> >> > Linux kernel has similar functions and uses a barrier() here:
> >> >
> >> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/lib/string.c?id=refs/tags/v3.19-rc6#n600
> >> >
> >> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/linux/compiler.h?id=refs/tags/v3.19-rc6#n162
> >> >
> >> > Is such a solution is more correct (and still portable)?
> >>
> >> I'm afraid that the only appropriate solution is to use memset_s()
> >> from C11 and the expectation that the compiler will accept it.
> >> barrier() does not give any guarantee that this function will be
> >> secure. Only compiler decides. I'm afraid that OpenBSD goes bad path
> >> with explicit_bzero(). The same applies to the linux kernel and
> >> memzero_explicit().. very stupid name...
> >
> > I see no way memset_s is technically "better". It's unable to find and
> > clear other temporary copies that have been made, and the barrier
> > method described above already reliably clears the pointed-to copy.
> 
> Whatever method you choose, the method of testing is an interesting
> one, since seeing if the compiler optimized out a memset (because the
> memory was not read after a write) requires tricking the compiler into
> believing you aren't reading it. This test is pretty cool, IMO:
> 
> https://github.com/libressl-portable/openbsd/blob/master/src/regress/lib/libc/explicit_bzero/explicit_bzero.c
> 
> it is described a bit more here:
> https://plus.google.com/+MatthewDempsky/posts/KQHFBouxurX

The comment that pthread_attr_setstack could be used instead is
interesting and would make the test a lot simpler, I think.

> Getting around link-time optimizations required building the
> explicit_bzero function with independent compiler flags to ensure LTO
> was not enabled.

As long as there's a barrier, LTO is no problem. The asm is a black
box that's required to see the results of memset, since the address of
the object reaches the asm, and the only way to ensure that such a
black box sees the writes is for them to actually be performed.

Rich


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: thoughts on reallocarray, explicit_bzero?
  2015-01-29  4:15             ` Rich Felker
@ 2015-01-29  9:30               ` Daniel Cegiełka
  2015-01-29 10:04                 ` Szabolcs Nagy
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Cegiełka @ 2015-01-29  9:30 UTC (permalink / raw)
  To: musl

The concept of safe memory cleaning was mostly promoted by
cryptographic libraries - eg. secure_memzero(). Unfortunately, we have
currently too many interfaces for the same functionality: memset_s(),
secure_memzero(), explicit_bzero(), memzero_explicit(). This is why I
believe that OpenBSD (and linux developers) goes bad path, introducing
yet another secure_memzero(). A better solution would be to promote a
single standard (eg. memset_s()) and the expectation that the compiler
will respect it.

summing up: we have several options:

* volatile based, but fails with LTO
http://openwall.com/lists/musl/2014/05/19/5

* weak symbols based (from Matthew Dempsky):
https://plus.google.com/+MatthewDempsky/posts/KQHFBouxurX
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libc/string/explicit_bzero.c?rev=1.3&content-type=text/x-cvsweb-markup

* barrier based, but with asm inline:
http://openwall.com/lists/musl/2015/01/28/34

Is the musl will support this feature to improve compatibility with BSD?

Daniel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: thoughts on reallocarray, explicit_bzero?
  2015-01-29  9:30               ` Daniel Cegiełka
@ 2015-01-29 10:04                 ` Szabolcs Nagy
  2015-01-29 10:31                   ` Daniel Cegiełka
  2015-01-29 10:54                   ` Daniel Cegiełka
  0 siblings, 2 replies; 30+ messages in thread
From: Szabolcs Nagy @ 2015-01-29 10:04 UTC (permalink / raw)
  To: musl

* Daniel Cegie??ka <daniel.cegielka@gmail.com> [2015-01-29 10:30:40 +0100]:
> yet another secure_memzero(). A better solution would be to promote a
> single standard (eg. memset_s()) and the expectation that the compiler
> will respect it.
> 

i think you don't know the semantics of memset_s
(it uses nonsense types, has superflous arguments, handles
constraint violations through global state etc)

it is a complicated mess and not a good api to standardize on
if all you want is to avoid information leak in crypto code

(btw no memset based solution can provide complete protection
against info leak: if the crypto function is interrupted by
a signal then all the register state will be copied to the
stack or altstack and kept around for arbitrarily long time
which is plenty information leaked)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: thoughts on reallocarray, explicit_bzero?
  2015-01-29 10:04                 ` Szabolcs Nagy
@ 2015-01-29 10:31                   ` Daniel Cegiełka
  2015-01-29 10:54                   ` Daniel Cegiełka
  1 sibling, 0 replies; 30+ messages in thread
From: Daniel Cegiełka @ 2015-01-29 10:31 UTC (permalink / raw)
  To: musl

2015-01-29 11:04 GMT+01:00 Szabolcs Nagy <nsz@port70.net>:
> * Daniel Cegie??ka <daniel.cegielka@gmail.com> [2015-01-29 10:30:40 +0100]:
>> yet another secure_memzero(). A better solution would be to promote a
>> single standard (eg. memset_s()) and the expectation that the compiler
>> will respect it.
>>
>
> i think you don't know the semantics of memset_s
> (it uses nonsense types, has superflous arguments, handles
> constraint violations through global state etc)
>
> it is a complicated mess and not a good api to standardize on
> if all you want is to avoid information leak in crypto code

I gave this as an example - the intention is to have a single standard
(vs secure_memzero(), explicit_bzero(), memzero_explicit(), ...).

http://openwall.com/lists/musl/2015/01/14/5


btw. libsodium prefers memset_s() over explicit_bzero() and over weak symbols.

https://github.com/jedisct1/libsodium/blob/master/src/libsodium/sodium/utils.c#L56


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: thoughts on reallocarray, explicit_bzero?
  2015-01-29 10:04                 ` Szabolcs Nagy
  2015-01-29 10:31                   ` Daniel Cegiełka
@ 2015-01-29 10:54                   ` Daniel Cegiełka
  1 sibling, 0 replies; 30+ messages in thread
From: Daniel Cegiełka @ 2015-01-29 10:54 UTC (permalink / raw)
  To: musl

2015-01-29 11:04 GMT+01:00 Szabolcs Nagy <nsz@port70.net>:
> * Daniel Cegie??ka <daniel.cegielka@gmail.com> [2015-01-29 10:30:40 +0100]:
>> yet another secure_memzero(). A better solution would be to promote a
>> single standard (eg. memset_s()) and the expectation that the compiler
>> will respect it.
>>
>
> i think you don't know the semantics of memset_s
> (it uses nonsense types, has superflous arguments, handles
> constraint violations through global state etc)


btw. memset_s() is an attempt to solve the same problem. However, this
version will not work with LTO:

ftp://ftp.netbsd.org/pub/NetBSD/misc/apb/memset_s.20120224.diff

#include <sys/cdefs.h>

__RCSID("$NetBSD$");

#define __STDC_WANT_LIB_EXT1__ 1
#include <errno.h>
#include <stdint.h>
#include <string.h>

/*
 * __memset_vp is a volatile pointer to a function.
 * It is initialised to point to memset, and should never be changed.
 */
static void * (* const volatile __memset_vp)(void *, int, size_t)
= (memset);

#undef memset_s /* in case it was defined as a macro */

errno_t
memset_s(void *s, rsize_t smax, int c, rsize_t n)
{
errno_t err = 0;

if (s == NULL) {
err = EINVAL;
goto out;
}
if (smax > RSIZE_MAX) {
err = E2BIG;
goto out;
}
if (n > RSIZE_MAX) {
err = E2BIG;
n = smax;
}
if (n > smax) {
err = EOVERFLOW;
n = smax;
}

/* Calling through a volatile pointer should never be optimised away. */
(*__memset_vp)(s, c, n);

    out:
if (err == 0)
return 0;
else {
errno = err;
/* XXX call runtime-constraint handler */
return err;
}
}


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: thoughts on reallocarray, explicit_bzero?
  2014-05-19 15:31 thoughts on reallocarray, explicit_bzero? Isaac Dunham
  2014-05-19 15:43 ` Rich Felker
  2014-05-19 15:44 ` Daniel Cegiełka
@ 2014-05-19 16:25 ` Szabolcs Nagy
  2014-05-19 16:45   ` Daniel Cegiełka
                     ` (3 more replies)
  2 siblings, 4 replies; 30+ messages in thread
From: Szabolcs Nagy @ 2014-05-19 16:25 UTC (permalink / raw)
  To: musl

* Isaac Dunham <ibid.ag@gmail.com> [2014-05-19 08:31:31 -0700]:
> Having read up on the LibreSSL fork of OpenSSL and also recently
> backported a nuber of libXfont CVE fixes for integer overflows,
> I've seen the risk posed by malloc(n*sizeof(x)) and realloc(ptr,
> n*sizeof(x)).
> calloc(n, sizeof(x)) can be used in place of malloc(n * sizeof(x)), 
> but there's no standard function that does overflow checking for 
> realloc(). OpenBSD has provided the extension reallocarray(), which 
> provides for bounds checking like calloc() does.

i'd use a saturated multiplication, because malloc/realloc
are not the only places where overflowing size calculations
may cause problems and in such cases (size_t)-1 is just as
good as a failure and it can be added to your code without
portability issues

static size_t sizemul(size_t a, size_t b)
{
	return b>1 && a>1 && a>-1/b ? -1 : a*b;
}

> Additionally, there are times when a compiler will optimize away calls
> to bzero() on areas that are not used before free(); this can result in
> passwords getting left in memory. OpenBSD uses a wrapper function called
> explicit_bzero() to keep this from happening, thugh it seems to be possible
> to use some ugliness with volatile to stop it.

i don't see how the openbsd explicit_bzero stops the
compiler to do optimizations..

(i guess they rely on that their gcc does not do lto
or that libc is dynamic linked and the compiler has no
'explicit_bzero' builtin, neither of which is a great
solution..)

the usual approach to this is volatile function pointer:

static void *(*volatile force_memset)(void,int,size_t) = memset;

in general in c one cannot be sure that the secret bits
are not leaked somewhere since the languge spec cannot
give such guarantees

that said either the volatile funcptr or actually reusing
the memory such that it cannot be optimized away works in
practice

> Should musl provide reallocarray()? 
> And what's the best way to ensure that memory gets zeroed out? 
> 
> Thanks,
> Isaac Dunham


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: thoughts on reallocarray, explicit_bzero?
  2014-05-19 16:25 ` Szabolcs Nagy
@ 2014-05-19 16:45   ` Daniel Cegiełka
  2014-05-19 16:58     ` Rich Felker
  2014-05-19 16:55   ` Rich Felker
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 30+ messages in thread
From: Daniel Cegiełka @ 2014-05-19 16:45 UTC (permalink / raw)
  To: musl

2014-05-19 18:25 GMT+02:00 Szabolcs Nagy <nsz@port70.net>:

> i don't see how the openbsd explicit_bzero stops the
> compiler to do optimizations..
>
> (i guess they rely on that their gcc does not do lto
> or that libc is dynamic linked and the compiler has no
> 'explicit_bzero' builtin, neither of which is a great
> solution..)
>
> the usual approach to this is volatile function pointer:
>
> static void *(*volatile force_memset)(void,int,size_t) = memset;
>
> in general in c one cannot be sure that the secret bits
> are not leaked somewhere since the languge spec cannot
> give such guarantees
>
> that said either the volatile funcptr or actually reusing
> the memory such that it cannot be optimized away works in
> practice

first version:

void explicit_bzero(void * const b, const size_t l)
{
    volatile unsigned char *p = (volatile unsigned char *) b;
    size_t i = (size_t) 0U;

    while (i < l) {
        p[i++] = 0U;
    }
}

Of course, if someone has better ideas... I'm very curious :)

Daniel


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: thoughts on reallocarray, explicit_bzero?
  2014-05-19 16:45   ` Daniel Cegiełka
@ 2014-05-19 16:58     ` Rich Felker
  0 siblings, 0 replies; 30+ messages in thread
From: Rich Felker @ 2014-05-19 16:58 UTC (permalink / raw)
  To: musl

On Mon, May 19, 2014 at 06:45:08PM +0200, Daniel Cegiełka wrote:
> 2014-05-19 18:25 GMT+02:00 Szabolcs Nagy <nsz@port70.net>:
> 
> > i don't see how the openbsd explicit_bzero stops the
> > compiler to do optimizations..
> >
> > (i guess they rely on that their gcc does not do lto
> > or that libc is dynamic linked and the compiler has no
> > 'explicit_bzero' builtin, neither of which is a great
> > solution..)
> >
> > the usual approach to this is volatile function pointer:
> >
> > static void *(*volatile force_memset)(void,int,size_t) = memset;
> >
> > in general in c one cannot be sure that the secret bits
> > are not leaked somewhere since the languge spec cannot
> > give such guarantees
> >
> > that said either the volatile funcptr or actually reusing
> > the memory such that it cannot be optimized away works in
> > practice
> 
> first version:
> 
> void explicit_bzero(void * const b, const size_t l)
> {
>     volatile unsigned char *p = (volatile unsigned char *) b;
>     size_t i = (size_t) 0U;
> 
>     while (i < l) {
>         p[i++] = 0U;
>     }
> }
> 
> Of course, if someone has better ideas... I'm very curious :)

I'm pretty sure this does not work. The volatile pointer cast (which
BTW is not necessary; it happens implicitly) does not, as far as I can
tell, mean "access the object via an overlapped volatile object".
Rather, it just means that the compiler cannot _automatically_ assume
the pointed-to object is non-volatile. It's still free to determine
via other means (e.g. inter-procedural analysis/LTO/etc.) that the
pointed-to object is non-volatile (and of course, in cases where this
matters, that its lifetime is ending) and thereby optimize out the
whole thing as dead code.

Rich


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: thoughts on reallocarray, explicit_bzero?
  2014-05-19 16:25 ` Szabolcs Nagy
  2014-05-19 16:45   ` Daniel Cegiełka
@ 2014-05-19 16:55   ` Rich Felker
  2014-05-19 18:12     ` Szabolcs Nagy
  2014-05-19 22:08   ` Andy Lutomirski
  2014-06-11  9:59   ` Thorsten Glaser
  3 siblings, 1 reply; 30+ messages in thread
From: Rich Felker @ 2014-05-19 16:55 UTC (permalink / raw)
  To: musl

On Mon, May 19, 2014 at 06:25:57PM +0200, Szabolcs Nagy wrote:
> i'd use a saturated multiplication, because malloc/realloc
> are not the only places where overflowing size calculations
> may cause problems and in such cases (size_t)-1 is just as
> good as a failure and it can be added to your code without
> portability issues
> 
> static size_t sizemul(size_t a, size_t b)
> {
> 	return b>1 && a>1 && a>-1/b ? -1 : a*b;
> }

On 32-bit this can easily be optimized to just one conditional instead
of three:

uint64_t tmp = (uint64_t)a * b;
return tmp>SIZE_MAX ? SIZE_MAX : tmp;

Of course that requires an ifdef, which is perhaps ugly.

Rich


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: thoughts on reallocarray, explicit_bzero?
  2014-05-19 16:55   ` Rich Felker
@ 2014-05-19 18:12     ` Szabolcs Nagy
  0 siblings, 0 replies; 30+ messages in thread
From: Szabolcs Nagy @ 2014-05-19 18:12 UTC (permalink / raw)
  To: musl

* Rich Felker <dalias@libc.org> [2014-05-19 12:55:23 -0400]:
> On Mon, May 19, 2014 at 06:25:57PM +0200, Szabolcs Nagy wrote:
> > static size_t sizemul(size_t a, size_t b)
> > {
> > 	return b>1 && a>1 && a>-1/b ? -1 : a*b;
> > }
> 
> On 32-bit this can easily be optimized to just one conditional instead
> of three:
> 
> uint64_t tmp = (uint64_t)a * b;
> return tmp>SIZE_MAX ? SIZE_MAX : tmp;
> 
> Of course that requires an ifdef, which is perhaps ugly.

or without ifdef

static size_t sizemul(size_t a, size_t b)
{
	if (2*sizeof a <= sizeof 1ULL) {
		unsigned long long m = 1ULL*a*b;
		return m>>8*sizeof a ? -1 : m;
	}
	return (a|b)>>4*sizeof a && b && a>-1/b ? -1 : a*b;
}


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: thoughts on reallocarray, explicit_bzero?
  2014-05-19 16:25 ` Szabolcs Nagy
  2014-05-19 16:45   ` Daniel Cegiełka
  2014-05-19 16:55   ` Rich Felker
@ 2014-05-19 22:08   ` Andy Lutomirski
  2014-05-20  0:41     ` Szabolcs Nagy
  2014-06-11  9:59   ` Thorsten Glaser
  3 siblings, 1 reply; 30+ messages in thread
From: Andy Lutomirski @ 2014-05-19 22:08 UTC (permalink / raw)
  To: musl

On 05/19/2014 09:25 AM, Szabolcs Nagy wrote:
> * Isaac Dunham <ibid.ag@gmail.com> [2014-05-19 08:31:31 -0700]:
>> Having read up on the LibreSSL fork of OpenSSL and also recently
>> backported a nuber of libXfont CVE fixes for integer overflows,
>> I've seen the risk posed by malloc(n*sizeof(x)) and realloc(ptr,
>> n*sizeof(x)).
>> calloc(n, sizeof(x)) can be used in place of malloc(n * sizeof(x)), 
>> but there's no standard function that does overflow checking for 
>> realloc(). OpenBSD has provided the extension reallocarray(), which 
>> provides for bounds checking like calloc() does.
> 
> i'd use a saturated multiplication, because malloc/realloc
> are not the only places where overflowing size calculations
> may cause problems and in such cases (size_t)-1 is just as
> good as a failure and it can be added to your code without
> portability issues
> 
> static size_t sizemul(size_t a, size_t b)
> {
> 	return b>1 && a>1 && a>-1/b ? -1 : a*b;
> }

Before going nuts trying to optimize this, it may pay to write some
good-enough helper and to use native compiler support for this, which is
already available in Clang [1] and should be coming reasonably soon in
gcc [2].

I suspect that, on all reasonably platforms, if doublesize_t is the
unsigned type that's twice as wide as size_t, then this isn't too bad
either:

doublesize_t total = (doublesize_t)a * (doublesize_t)b;
if (total > SIZE_MAX)
  fail;

For quite a while, gcc has had a 128-bit integer type that works on
64-bit platforms, and gcc should always support a 64-bit type on 32-bit
platforms.  On systems with widening multiply (e.g. x86), even if the
optimizer doesn't detect the idiom, this is only a few cycles slower
than the optimal code.

[1]
http://clang.llvm.org/docs/LanguageExtensions.html#checked-arithmetic-builtins
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61129

--Andy


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Re: thoughts on reallocarray, explicit_bzero?
  2014-05-19 22:08   ` Andy Lutomirski
@ 2014-05-20  0:41     ` Szabolcs Nagy
  0 siblings, 0 replies; 30+ messages in thread
From: Szabolcs Nagy @ 2014-05-20  0:41 UTC (permalink / raw)
  To: musl

* Andy Lutomirski <luto@amacapital.net> [2014-05-19 15:08:11 -0700]:
> On 05/19/2014 09:25 AM, Szabolcs Nagy wrote:
> > i'd use a saturated multiplication, because malloc/realloc
> > are not the only places where overflowing size calculations
> > may cause problems and in such cases (size_t)-1 is just as
> > good as a failure and it can be added to your code without
> > portability issues
> > 
> > static size_t sizemul(size_t a, size_t b)
> > {
> > 	return b>1 && a>1 && a>-1/b ? -1 : a*b;
> > }
> 
> Before going nuts trying to optimize this, it may pay to write some
> good-enough helper and to use native compiler support for this, which is
> already available in Clang [1] and should be coming reasonably soon in
> gcc [2].

it's a shame that clang came up with this nonsese

they managed to add 18 new compiler specific builtins,
without actually addressing the practical issue:
easy to use overflow check of size_t multiplication..
(or checking arithmetics of various other non-builtin types)
(the several new multiprecision arithmetics builtins
are bad too but less problematic in practice)

they didn't make it easy to write backward compatible code
either: historically ifdef hackery was used to "detect"
builtins support using the __GNUC__ version macros, but clang
has incompatible versioning and builtins now making the use of
new builtins more painful

(meanwhile a lot of code has idiomatic overflow checks in iso c
which is not recognized by gcc or clang..
and many c parsing tools don't understand the fancy new builtins)

> I suspect that, on all reasonably platforms, if doublesize_t is the
> unsigned type that's twice as wide as size_t, then this isn't too bad
> either:
> 
> doublesize_t total = (doublesize_t)a * (doublesize_t)b;
> if (total > SIZE_MAX)
>   fail;
> 
> For quite a while, gcc has had a 128-bit integer type that works on
> 64-bit platforms, and gcc should always support a 64-bit type on 32-bit
> platforms.  On systems with widening multiply (e.g. x86), even if the
> optimizer doesn't detect the idiom, this is only a few cycles slower
> than the optimal code.

umm __int128 is only supported in gcc since 4.6 i think
(and even after that there were some related brokenness
in hacked toolchains so >=gcc-4.6 is not enough to check)

otherwise yes with doublesize_t it is easy to do
but the point was to do it in c

for doublesize_t you would need configure time checks..
or nasty ifdef hackery.. and in the end you still need the
fallback for implementations without such a type

(the code i showed can be included in any source file where
size_t is defined)

> [1]
> http://clang.llvm.org/docs/LanguageExtensions.html#checked-arithmetic-builtins
> [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61129


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: thoughts on reallocarray, explicit_bzero?
  2014-05-19 16:25 ` Szabolcs Nagy
                     ` (2 preceding siblings ...)
  2014-05-19 22:08   ` Andy Lutomirski
@ 2014-06-11  9:59   ` Thorsten Glaser
  2014-06-11 12:59     ` Rich Felker
  3 siblings, 1 reply; 30+ messages in thread
From: Thorsten Glaser @ 2014-06-11  9:59 UTC (permalink / raw)
  To: musl

Szabolcs Nagy <nsz <at> port70.net> writes:

> static size_t sizemul(size_t a, size_t b)
> {
> 	return b>1 && a>1 && a>-1/b ? -1 : a*b;
> }

There is no -1 in size_t. (And *you* complain about OpenBSD checks…)

> i don't see how the openbsd explicit_bzero stops the
> compiler to do optimizations..

On OpenBSD: by being in libc which is not built with LTO.
I’ve wondered about how to do this either. Maybe:

void
explicit_bzero(void *s, size_t n)
{
	bzero(s, n);
	__lto_boundary
}

Then you #define __lto_boundary to something like
	__asm__ volatile ("" : : : "memory");
or
	__sync_synchronize();
or some C11 barrier function.

bye,
//mirabilos



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Re: thoughts on reallocarray, explicit_bzero?
  2014-06-11  9:59   ` Thorsten Glaser
@ 2014-06-11 12:59     ` Rich Felker
  0 siblings, 0 replies; 30+ messages in thread
From: Rich Felker @ 2014-06-11 12:59 UTC (permalink / raw)
  To: musl

On Wed, Jun 11, 2014 at 09:59:56AM +0000, Thorsten Glaser wrote:
> Szabolcs Nagy <nsz <at> port70.net> writes:
> 
> > static size_t sizemul(size_t a, size_t b)
> > {
> > 	return b>1 && a>1 && a>-1/b ? -1 : a*b;
> > }
> 
> There is no -1 in size_t. (And *you* complain about OpenBSD checks…)

The standard way (especially for generic programming, but it's usable
anywhere) to get the max value for an unsigned type is to convert -1.
b and a*b both have type size_t, so...

> > i don't see how the openbsd explicit_bzero stops the
> > compiler to do optimizations..
> 
> On OpenBSD: by being in libc which is not built with LTO.
> I’ve wondered about how to do this either. Maybe:

Yeah, that's a poor hack. We still probably have some places where
"extern" is used as a compiler barrier, but it's wrong, and I'm
working to identify and remove them all.

> void
> explicit_bzero(void *s, size_t n)
> {
> 	bzero(s, n);
> 	__lto_boundary
> }
> 
> Then you #define __lto_boundary to something like
> 	__asm__ volatile ("" : : : "memory");
> or
> 	__sync_synchronize();
> or some C11 barrier function.

These are not sufficient. It would probably need to be:

 	__asm__ volatile ("" : : "r"(s) : "memory");

or similar. This is because volatility and memory-clobber only
provide a barrier with respect to objects which exist in memory, and
at the point of the asm (after transformations which do not disturb
the observable behavior of the program), the pointed-to object with
automatic storage does not exist.

Rich

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2015-01-29 10:54 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-19 15:31 thoughts on reallocarray, explicit_bzero? Isaac Dunham
2014-05-19 15:43 ` Rich Felker
2014-05-19 16:19   ` Daniel Cegiełka
2014-05-20  6:19     ` Rich Felker
2014-05-20 15:50       ` Daniel Cegiełka
2014-05-19 15:44 ` Daniel Cegiełka
2014-05-19 16:16   ` Rich Felker
2014-05-19 16:30     ` Daniel Cegiełka
2014-05-19 16:32     ` Szabolcs Nagy
2015-01-28 22:01     ` Daniel Cegiełka
2015-01-28 22:34       ` Daniel Cegiełka
2015-01-28 22:38         ` Nathan McSween
2015-01-28 22:54           ` Daniel Cegiełka
2015-01-28 23:02             ` Josiah Worcester
2015-01-29  2:19         ` Rich Felker
2015-01-29  4:03           ` Brent Cook
2015-01-29  4:15             ` Rich Felker
2015-01-29  9:30               ` Daniel Cegiełka
2015-01-29 10:04                 ` Szabolcs Nagy
2015-01-29 10:31                   ` Daniel Cegiełka
2015-01-29 10:54                   ` Daniel Cegiełka
2014-05-19 16:25 ` Szabolcs Nagy
2014-05-19 16:45   ` Daniel Cegiełka
2014-05-19 16:58     ` Rich Felker
2014-05-19 16:55   ` Rich Felker
2014-05-19 18:12     ` Szabolcs Nagy
2014-05-19 22:08   ` Andy Lutomirski
2014-05-20  0:41     ` Szabolcs Nagy
2014-06-11  9:59   ` Thorsten Glaser
2014-06-11 12:59     ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).