mailing list of musl libc
* buffer overflow in regcomp and a way to find more of those
@ 2015-03-20 20:17 Konstantin Serebryany
  2015-03-20 20:40 ` Rich Felker
                   ` (3 more replies)
  0 siblings, 4 replies; 42+ messages in thread
From: Konstantin Serebryany @ 2015-03-20 20:17 UTC (permalink / raw)
  To: musl, Szabolcs Nagy

Hi,

Following the discussion at the glibc mailing list
(https://sourceware.org/ml/libc-alpha/2015-03/msg00662.html)
I've tried to fuzz musl regcomp and the first bug popped up quickly.
Please let me know if you would be interested in adding the fuzzer
(http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup)
to the musl testing process.

Exact repro steps, just copy-paste (assuming you have fresh clang)
===================  ===============
tar xf ~/Downloads/musl-1.1.7.tar.gz
cd musl-1.1.7
./configure && make -j
cat << EOF > bug1.c
#include <string.h>
#include <stdlib.h>
#include "regex.h"

int main() {
  regex_t preg;
  char a[] = {40, 123, 33, 124, 33, 19, 40, 96, 92, 253, 92, 123, 51,
48, 92, 125, 0};
  char *s = strdup(a);
  if (0 == regcomp(&preg, s, 0)) {
    regfree(&preg);
  }
  free(s);
  return 0;
}
EOF
clang -g -fsanitize=address ./src/regex/reg*.c src/regex/tre*.c \
  src/locale/__lctrans.c src/internal/libc.c -I include -I src/internal/ \
  -Iarch/x86_64 bug1.c
ASAN_OPTIONS=strip_path_prefix=`pwd`/ ./a.out

==33356==ERROR: AddressSanitizer: heap-buffer-overflow on address
0x60200000ef44 at pc 0x0000004d8cb9 bp 0x7fff09d51b10 sp
0x7fff09d51b08
WRITE of size 4 at 0x60200000ef44 thread T0
    #0 0x4d8cb8 in tre_copy_ast src/regex/regcomp.c:1697:27
    #1 0x4cc332 in tre_expand_ast src/regex/regcomp.c:1884:16
    #2 0x4c4de2 in regcomp src/regex/regcomp.c:2739:13
    #3 0x4e9e06 in main bug1.c:9:12
    #4 0x7f49f1086ec4 in __libc_start_main
/build/buildd/eglibc-2.19/csu/libc-start.c:287
    #5 0x416d45 in _start (a.out+0x416d45)

0x60200000ef44 is located 8 bytes to the right of 12-byte region
[0x60200000ef30,0x60200000ef3c)
allocated by thread T0 here:
    #0 0x4a20a4 in calloc
/usr/local/google/home/kcc/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:56:3
    #1 0x4c4bd9 in regcomp src/regex/regcomp.c:2721:28
    #2 0x4e9e06 in main bug1.c:9:12


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-20 20:17 buffer overflow in regcomp and a way to find more of those Konstantin Serebryany
@ 2015-03-20 20:40 ` Rich Felker
  2015-03-20 21:28 ` Szabolcs Nagy
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 42+ messages in thread
From: Rich Felker @ 2015-03-20 20:40 UTC (permalink / raw)
  To: Konstantin Serebryany; +Cc: musl, Szabolcs Nagy

On Fri, Mar 20, 2015 at 01:17:47PM -0700, Konstantin Serebryany wrote:
> Hi,
> 
> Following the discussion at the glibc mailing list
> (https://sourceware.org/ml/libc-alpha/2015-03/msg00662.html)
> I've tried to fuzz musl regcomp and the first bug popped up quickly.
> Please let me know if you would be interested in adding the fuzzer
> (http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup)
> to the musl testing process.
> 
> Exact repro steps, just copy-paste (assuming you have fresh clang)
> ===================  ===============
> tar xf ~/Downloads/musl-1.1.7.tar.gz
> cd musl-1.1.7
> ./configure && make -j
> cat << EOF > bug1.c
> #include <string.h>
> #include <stdlib.h>
> #include "regex.h"
> 
> int main() {
>   regex_t preg;
>   char a[] = {40, 123, 33, 124, 33, 19, 40, 96, 92, 253, 92, 123, 51,
> 48, 92, 125, 0};

Simplified test case:
    char a[] = "\\\375\\{2\\}";

The problem seems to be handling of [backslash], [illegal sequence],
[repetition]. I haven't analyzed the cause, but that was my initial
guess and the minimal example I was able to reduce it to without the
crash disappearing.
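
A minimal sketch of the reduced case as a reusable helper, assuming only the POSIX <regex.h> interface. The names `reduced_pattern` and `try_compile` are hypothetical and do not come from the thread. On a fixed libc the call should simply return; according to the report above, musl 1.1.7 built with AddressSanitizer is where the overflow shows up.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* The reduced pattern from this message: a backslash, the byte 0xFD
 * (octal 375), then the basic-regex interval \{2\}. */
const char reduced_pattern[] = "\\\375\\{2\\}";

/* Compile `pattern` as a basic regex (cflags = 0, as in the original
 * report) and free it again on success.
 * Returns regcomp()'s result: 0 on success, an error code otherwise. */
int try_compile(const char *pattern)
{
    regex_t preg;
    int rc;

    assert(pattern != NULL);
    rc = regcomp(&preg, pattern, 0);
    if (rc == 0)
        regfree(&preg);
    return rc;
}
```

Calling `try_compile(reduced_pattern)` from a small main() and building it the same way as bug1.c should be enough to check whether a given tree still has the problem.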

Rich


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-20 20:17 buffer overflow in regcomp and a way to find more of those Konstantin Serebryany
  2015-03-20 20:40 ` Rich Felker
@ 2015-03-20 21:28 ` Szabolcs Nagy
  2015-03-20 23:48   ` Szabolcs Nagy
  2015-03-20 22:32 ` Rich Felker
  2015-03-20 23:52 ` Szabolcs Nagy
  3 siblings, 1 reply; 42+ messages in thread
From: Szabolcs Nagy @ 2015-03-20 21:28 UTC (permalink / raw)
  To: musl; +Cc: Szabolcs Nagy

* Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 13:17:47 -0700]:
> Following the discussion at the glibc mailing list
> (https://sourceware.org/ml/libc-alpha/2015-03/msg00662.html)
> I've tried to fuzz musl regcomp and the first bug popped up quickly.
> Please let me know if you would be interested in adding the fuzzer
> (http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup)
> to the musl testing process.


thanks for doing this

i think i already know what the bug is

(of course it's in an underspecified extension where applications
expect conflicting behaviour... but the segfault is my bug)

> Exact repro steps, just copy-paste (assuming you have fresh clang)
> ===================  ===============
> tar xf ~/Downloads/musl-1.1.7.tar.gz
> cd musl-1.1.7
> ./configure && make -j
> cat << EOF > bug1.c
> #include <string.h>
> #include <stdlib.h>
> #include "regex.h"
> 
> int main() {
>   regex_t preg;
>   char a[] = {40, 123, 33, 124, 33, 19, 40, 96, 92, 253, 92, 123, 51,
> 48, 92, 125, 0};
>   char *s = strdup(a);
>   if (0 == regcomp(&preg, s, 0)) {
>     regfree(&preg);
>   }
>   free(s);
>   return 0;
> }
> EOF
> clang  -g  -fsanitize=address  ./src/regex/reg*.c src/regex/tre*.c
> src/locale/__lctrans.c src/internal/libc.c -I include -I src/internal/
> -Iarch/x86_64 bug1.c
> ASAN_OPTIONS=strip_path_prefix=`pwd`/ ./a.out
> 

looks simple

i'll set up some glibc based virtual machine to be able to play with it

thanks again

> ==33356==ERROR: AddressSanitizer: heap-buffer-overflow on address
> 0x60200000ef44 at pc 0x0000004d8cb9 bp 0x7fff09d51b10 sp
> 0x7fff09d51b08
> WRITE of size 4 at 0x60200000ef44 thread T0
>     #0 0x4d8cb8 in tre_copy_ast src/regex/regcomp.c:1697:27
>     #1 0x4cc332 in tre_expand_ast src/regex/regcomp.c:1884:16
>     #2 0x4c4de2 in regcomp src/regex/regcomp.c:2739:13
>     #3 0x4e9e06 in main bug1.c:9:12
>     #4 0x7f49f1086ec4 in __libc_start_main
> /build/buildd/eglibc-2.19/csu/libc-start.c:287
>     #5 0x416d45 in _start (a.out+0x416d45)
> 
> 0x60200000ef44 is located 8 bytes to the right of 12-byte region
> [0x60200000ef30,0x60200000ef3c)
> allocated by thread T0 here:
>     #0 0x4a20a4 in calloc
> /usr/local/google/home/kcc/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:56:3
>     #1 0x4c4bd9 in regcomp src/regex/regcomp.c:2721:28
>     #2 0x4e9e06 in main bug1.c:9:12



* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-20 20:17 buffer overflow in regcomp and a way to find more of those Konstantin Serebryany
  2015-03-20 20:40 ` Rich Felker
  2015-03-20 21:28 ` Szabolcs Nagy
@ 2015-03-20 22:32 ` Rich Felker
  2015-03-20 23:52 ` Szabolcs Nagy
  3 siblings, 0 replies; 42+ messages in thread
From: Rich Felker @ 2015-03-20 22:32 UTC (permalink / raw)
  To: musl

On Fri, Mar 20, 2015 at 01:17:47PM -0700, Konstantin Serebryany wrote:
> Hi,
> 
> Following the discussion at the glibc mailing list
> (https://sourceware.org/ml/libc-alpha/2015-03/msg00662.html)
> I've tried to fuzz musl regcomp and the first bug popped up quickly.
> Please let me know if you would be interested in adding the fuzzer
> (http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup)
> to the musl testing process.

Thanks! It's fixed in commit 39dfd58417ef642307d90306e1c7e50aaec5a35c.

Rich



* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-20 21:28 ` Szabolcs Nagy
@ 2015-03-20 23:48   ` Szabolcs Nagy
  0 siblings, 0 replies; 42+ messages in thread
From: Szabolcs Nagy @ 2015-03-20 23:48 UTC (permalink / raw)
  To: musl, Szabolcs Nagy

* Szabolcs Nagy <nsz@port70.net> [2015-03-20 22:28:03 +0100]:
> * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 13:17:47 -0700]:
> > Following the discussion at the glibc mailing list
> > (https://sourceware.org/ml/libc-alpha/2015-03/msg00662.html)
> > I've tried to fuzz musl regcomp and the first bug popped up quickly.
> > Please let me know if you would be interested in adding the fuzzer
> > (http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup)
> > to the musl testing process.

ok

(1) the clean approach would be to have a way to build an
instrumented libc and a separate set of test cases for
various libc apis that the fuzzer could use.

(2) the other approach is to cut parts of the libc out
(the parsers often don't depend on too much libc internals)
and build them with whatever runtime the fuzzer needs

the question is how hard it is to do (1) ?

i assume asan is non-trivial to set up for that (or is it
enough to replace malloc calls? and some startup logic?)

at first it is ok if the fuzzer only catches crashing bugs
so if that's easy to do i'd go for that.

for (1) i can write the test cases and adjust the musl build
system, but i don't know how much difficulty i should expect




* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-20 20:17 buffer overflow in regcomp and a way to find more of those Konstantin Serebryany
                   ` (2 preceding siblings ...)
  2015-03-20 22:32 ` Rich Felker
@ 2015-03-20 23:52 ` Szabolcs Nagy
  2015-03-21  0:06   ` Konstantin Serebryany
  3 siblings, 1 reply; 42+ messages in thread
From: Szabolcs Nagy @ 2015-03-20 23:52 UTC (permalink / raw)
  To: Konstantin Serebryany; +Cc: musl

* Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 13:17:47 -0700]:
> Following the discussion at the glibc mailing list
> (https://sourceware.org/ml/libc-alpha/2015-03/msg00662.html)
> I've tried to fuzz musl regcomp and the first bug popped up quickly.
> Please let me know if you would be interested in adding the fuzzer
> (http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup)
> to the musl testing process.
> 

(now with correct To: header)


(1) the clean approach would be to have a way to build an
instrumented libc and a separate set of test cases for
various libc apis that the fuzzer could use.

(2) the other approach is to cut parts of the libc out
(the parsers often don't depend on too much libc internals)
and build them with whatever runtime the fuzzer needs

the question is how hard it is to do (1) ?

i assume asan is non-trivial to set up for that (or is it
enough to replace malloc calls? and some startup logic?)

at first it is ok if the fuzzer only catches crashing bugs
so if that's easy to do i'd go for that.

for (1) i can write the test cases and adjust the musl build
system, but i don't know how much difficulty i should expect
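
For either approach, the per-API test case could be as small as the sketch below. It assumes the entry point name `LLVMFuzzerTestOneInput`, which is what current libFuzzer looks for; the README linked earlier in the thread may describe a different name, so treat the name as an assumption. The target only catches crashes, which matches the "crashing bugs first" goal above.

```c
#include <regex.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Fuzz target sketch: feed arbitrary bytes to regcomp() as a basic regex.
 * The fuzzer runtime supplies main() and calls this once per input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    regex_t preg;
    char *pattern;

    /* regcomp() expects a NUL-terminated string, so copy and terminate.
     * An embedded NUL byte in the input simply ends the pattern early. */
    pattern = malloc(size + 1);
    if (pattern == NULL)
        return 0;
    if (size > 0)
        memcpy(pattern, data, size);
    pattern[size] = '\0';

    if (regcomp(&preg, pattern, 0) == 0)
        regfree(&preg);

    free(pattern);
    return 0;
}
```

With a recent clang this kind of file is normally built with `-fsanitize=fuzzer,address`; whether that works against an instrumented musl is exactly the open question in this message.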

thanks



* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-20 23:52 ` Szabolcs Nagy
@ 2015-03-21  0:06   ` Konstantin Serebryany
  2015-03-21  0:26     ` Szabolcs Nagy
  0 siblings, 1 reply; 42+ messages in thread
From: Konstantin Serebryany @ 2015-03-21  0:06 UTC (permalink / raw)
  To: Konstantin Serebryany, musl

On Fri, Mar 20, 2015 at 4:52 PM, Szabolcs Nagy <nsz@port70.net> wrote:
> * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 13:17:47 -0700]:
>> Following the discussion at the glibc mailing list
>> (https://sourceware.org/ml/libc-alpha/2015-03/msg00662.html)
>> I've tried to fuzz musl regcomp and the first bug popped up quickly.
>> Please let me know if you would be interested in adding the fuzzer
>> (http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup)
>> to the musl testing process.
>>
>
> (now with correct To: header)
>
>
> (1) the clean approach would be to have a way to build an
> instrumented libc and a separate set of test cases for
> various libc apis that the fuzzer could use.

Correct. Building libc.a is simple:
CC="clang -fsanitize=address -fsanitize-coverage=3 " ./configure && make -j
But then I don't know how to properly link libc.a to a test case.
How do you usually link tests with libc.a on x86_64 linux?

>
> (2) the other approach is to cut parts of the libc out
> (the parsers often don't depend on too much libc internals)
> and build them with whatever runtime the fuzzer needs

That's exactly what I did. Not optimal, I agree.

>
> the question is how hard it is to do (1) ?
>
> i assume asan is non-trivial to set up for that (or is it
> enough to replace malloc calls? and some startup logic?)

asan replaces malloc and a few more libc functions.
It works with various different libcs, so there is a good chance that
it will work here with no or minimal changes.

>
> at first it is ok if the fuzzer only catches crashing bugs
> so if that's easy to do i'd go for that.
>
> for (1) i can write the test cases and adjust the musl build
> system, but i dont know how much difficulty should i expect
>
> thanks



* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-21  0:06   ` Konstantin Serebryany
@ 2015-03-21  0:26     ` Szabolcs Nagy
  2015-03-21  0:46       ` Rich Felker
  0 siblings, 1 reply; 42+ messages in thread
From: Szabolcs Nagy @ 2015-03-21  0:26 UTC (permalink / raw)
  To: musl; +Cc: Konstantin Serebryany

* Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 17:06:18 -0700]:
> On Fri, Mar 20, 2015 at 4:52 PM, Szabolcs Nagy <nsz@port70.net> wrote:
> > * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 13:17:47 -0700]:
> >> Following the discussion at the glibc mailing list
> >> (https://sourceware.org/ml/libc-alpha/2015-03/msg00662.html)
> >> I've tried to fuzz musl regcomp and the first bug popped up quickly.
> >> Please let me know if you would be interested in adding the fuzzer
> >> (http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup)
> >> to the musl testing process.
> >>
> >
> > (now with correct To: header)
> >
> >
> > (1) the clean approach would be to have a way to build an
> > instrumented libc and a separate set of test cases for
> > various libc apis that the fuzzer could use.
> 
> Correct. Building libc.a is simple:
> CC="clang -fsanitize=address -fsanitize-coverage=3 " ./configure && make -j
> But then I don't know how to properly link libc.a to a test case.
> How do you usually link tests with libc.a on x86_64 linux?
> 

we have a musl-gcc script for when the compiler is gcc (it uses
a simple spec file to set things up). i don't know what the
equivalent mechanism is in the clang world, but i think one
can create a simple script based on the first version of
musl-gcc

http://git.musl-libc.org/cgit/musl/commit/?id=58f430c1e0255c0b28aed1e9bf3d892c18c06631

the test system does not know about toolchain details
the user has to provide whatever compiler wrapper script
is needed to make things work

but i think i won't try to integrate this into our libc-test
right away, libc-test is designed to test a posix libc with
minimal assumptions or external dependencies
(the testing process of musl is not very formal or automated
yet anyway)

> > the question is how hard it is to do (1) ?
> >
> > i assume asan is non-trivial to set up for that (or is it
> > enough to replace malloc calls? and some startup logic?)
> 
> asan replaces malloc and a few more libc functions.
> It works with various different libcs, so there is a good chance that
> it will work here with no or minimal changes.
> 

ok i'll try it




* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-21  0:26     ` Szabolcs Nagy
@ 2015-03-21  0:46       ` Rich Felker
  2015-03-21  0:54         ` Konstantin Serebryany
  0 siblings, 1 reply; 42+ messages in thread
From: Rich Felker @ 2015-03-21  0:46 UTC (permalink / raw)
  To: musl, Konstantin Serebryany

On Sat, Mar 21, 2015 at 01:26:16AM +0100, Szabolcs Nagy wrote:
> * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 17:06:18 -0700]:
> > On Fri, Mar 20, 2015 at 4:52 PM, Szabolcs Nagy <nsz@port70.net> wrote:
> > > * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 13:17:47 -0700]:
> > >> Following the discussion at the glibc mailing list
> > >> (https://sourceware.org/ml/libc-alpha/2015-03/msg00662.html)
> > >> I've tried to fuzz musl regcomp and the first bug popped up quickly.
> > >> Please let me know if you would be interested in adding the fuzzer
> > >> (http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup)
> > >> to the musl testing process.
> > >>
> > >
> > > (now with correct To: header)
> > >
> > >
> > > (1) the clean approach would be to have a way to build an
> > > instrumented libc and a separate set of test cases for
> > > various libc apis that the fuzzer could use.
> > 
> > Correct. Building libc.a is simple:
> > CC="clang -fsanitize=address -fsanitize-coverage=3 " ./configure && make -j
> > But then I don't know how to properly link libc.a to a test case.
> > How do you usually link tests with libc.a on x86_64 linux?
> 
> we have a musl-gcc script when the compiler is gcc (it uses
> a simple spec file to set things up), i don't know what's
> the equivalent mechanism in clang world, but i think one
> can create a simple script based on the first version of
> musl-gcc
> 
> http://git.musl-libc.org/cgit/musl/commit/?id=58f430c1e0255c0b28aed1e9bf3d892c18c06631

Do you mean the version removed in that commit? As long as you're just
building simple test files and not large program/library ecosystems, I
think it's even simpler than that. For static linking, just using
-nostdinc, -isystem, and -L should be all you need to compile/link
against the instrumented musl libc.a instead of the host libc.
Assuming the host is musl-based already, -nostdinc and -isystem
shouldn't even be needed. Just -L is sufficient.

> the test system does not know about toolchain details
> the user has to provide whatever compiler wrapper script
> is needed to make things work
> 
> but i think i wont try to integrate this into our libc-test
> right away, libc-test is designed to test a posix libc with
> minimal assumptions or external dependencies
> (the testing process of musl is not very formal or automated
> yet anyway)

Indeed, I don't think fuzzing is something that belongs with regular
functionality/regression tests. It presumably takes a lot more time,
requires different build procedures, and addresses a different need
than the tests we have.

> > > the question is how hard it is to do (1) ?
> > >
> > > i assume asan is non-trivial to set up for that (or is it
> > > enough to replace malloc calls? and some startup logic?)
> > 
> > asan replaces malloc and a few more libc functions.
> > It works with various different libcs, so there is a good chance that
> > it will work here with no or minimal changes.
> 
> ok i'll try it

I would guess it works with no change for static linking, but some
changes might be needed for dynamic linking. I'm perfectly happy with
all the fuzzing being done with static linking anyway; I don't think
dynamic linking would have significant additional code paths whose
coverage needs checking.

Rich



* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-21  0:46       ` Rich Felker
@ 2015-03-21  0:54         ` Konstantin Serebryany
  2015-03-21  1:00           ` Rich Felker
  0 siblings, 1 reply; 42+ messages in thread
From: Konstantin Serebryany @ 2015-03-21  0:54 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

On Fri, Mar 20, 2015 at 5:46 PM, Rich Felker <dalias@libc.org> wrote:
> On Sat, Mar 21, 2015 at 01:26:16AM +0100, Szabolcs Nagy wrote:
>> * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 17:06:18 -0700]:
>> > On Fri, Mar 20, 2015 at 4:52 PM, Szabolcs Nagy <nsz@port70.net> wrote:
>> > > * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 13:17:47 -0700]:
>> > >> Following the discussion at the glibc mailing list
>> > >> (https://sourceware.org/ml/libc-alpha/2015-03/msg00662.html)
>> > >> I've tried to fuzz musl regcomp and the first bug popped up quickly.
>> > >> Please let me know if you would be interested in adding the fuzzer
>> > >> (http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup)
>> > >> to the musl testing process.
>> > >>
>> > >
>> > > (now with correct To: header)
>> > >
>> > >
>> > > (1) the clean approach would be to have a way to build an
>> > > instrumented libc and a separate set of test cases for
>> > > various libc apis that the fuzzer could use.
>> >
>> > Correct. Building libc.a is simple:
>> > CC="clang -fsanitize=address -fsanitize-coverage=3 " ./configure && make -j
>> > But then I don't know how to properly link libc.a to a test case.
>> > How do you usually link tests with libc.a on x86_64 linux?
>>
>> we have a musl-gcc script when the compiler is gcc (it uses
>> a simple spec file to set things up), i don't know what's
>> the equivalent mechanism in clang world, but i think one
>> can create a simple script based on the first version of
>> musl-gcc
>>
>> http://git.musl-libc.org/cgit/musl/commit/?id=58f430c1e0255c0b28aed1e9bf3d892c18c06631
>
> Do you mean the version removed in that commit? As long as you're just
> building simple test files and not large program/library ecosystems, I
> think it's even simpler than that. For static linking, just using
> -nostdinc, -isystem, and -L should be all you need to compile/link
> against the instrumented musl libc.a instead of the host libc.
> Assuming the host is musl-based already, -nostdinc and -isystem
> shouldn't even be needed. Just -L is sufficient.
>
>> the test system does not know about toolchain details
>> the user has to provide whatever compiler wrapper script
>> is needed to make things work
>>
>> but i think i wont try to integrate this into our libc-test
>> right away, libc-test is designed to test a posix libc with
>> minimal assumptions or external dependencies
>> (the testing process of musl is not very formal or automated
>> yet anyway)
>
> Indeed, I don't think fuzzing is something that belongs with regular
> functionality/regression tests. It presumably takes a lot more time,
> requires different build procedures, and addresses a different need
> than the tests we have.
>
>> > > the question is how hard it is to do (1) ?
>> > >
>> > > i assume asan is non-trivial to set up for that (or is it
>> > > enough to replace malloc calls? and some startup logic?)
>> >
>> > asan replaces malloc and a few more libc functions.
>> > It works with various different libcs, so there is a good chance that
>> > it will work here with no or minimal changes.
>>
>> ok i'll try it
>
> I would guess it works with no change for static linking, but some
> changes might be needed for dynamic linking. I'm perfectly happy with
> all the fuzzing being done with static linking anyway; I don't think
> dynamic linking would have significant additional code paths whose
> coverage need checking.

sadly, asan does not support fully static linking.

>
> Rich



* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-21  0:54         ` Konstantin Serebryany
@ 2015-03-21  1:00           ` Rich Felker
  2015-03-21  1:05             ` Konstantin Serebryany
  0 siblings, 1 reply; 42+ messages in thread
From: Rich Felker @ 2015-03-21  1:00 UTC (permalink / raw)
  To: Konstantin Serebryany; +Cc: musl

On Fri, Mar 20, 2015 at 05:54:49PM -0700, Konstantin Serebryany wrote:
> >> > > the question is how hard it is to do (1) ?
> >> > >
> >> > > i assume asan is non-trivial to set up for that (or is it
> >> > > enough to replace malloc calls? and some startup logic?)
> >> >
> >> > asan replaces malloc and a few more libc functions.
> >> > It works with various different libcs, so there is a good chance that
> >> > it will work here with no or minimal changes.
> >>
> >> ok i'll try it
> >
> > I would guess it works with no change for static linking, but some
> > changes might be needed for dynamic linking. I'm perfectly happy with
> > all the fuzzing being done with static linking anyway; I don't think
> > dynamic linking would have significant additional code paths whose
> > coverage need checking.
> 
> sadly, asan does not support fully static linking.

Is this just an oversight or something fundamental that's hard to fix?
The sort of things it wants to do are much less likely to work with
dynamic linking. Dynamic-linked musl requires all internal symbol
references to be resolved at ld-time and does not support interposing
in front of them.

Rich



* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-21  1:00           ` Rich Felker
@ 2015-03-21  1:05             ` Konstantin Serebryany
  2015-03-21  1:10               ` Konstantin Serebryany
  2015-03-21  1:32               ` Rich Felker
  0 siblings, 2 replies; 42+ messages in thread
From: Konstantin Serebryany @ 2015-03-21  1:05 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

On Fri, Mar 20, 2015 at 6:00 PM, Rich Felker <dalias@libc.org> wrote:
> On Fri, Mar 20, 2015 at 05:54:49PM -0700, Konstantin Serebryany wrote:
>> >> > > the question is how hard it is to do (1) ?
>> >> > >
>> >> > > i assume asan is non-trivial to set up for that (or is it
>> >> > > enough to replace malloc calls? and some startup logic?)
>> >> >
>> >> > asan replaces malloc and a few more libc functions.
>> >> > It works with various different libcs, so there is a good chance that
>> >> > it will work here with no or minimal changes.
>> >>
>> >> ok i'll try it
>> >
>> > I would guess it works with no change for static linking, but some
>> > changes might be needed for dynamic linking. I'm perfectly happy with
>> > all the fuzzing being done with static linking anyway; I don't think
>> > dynamic linking would have significant additional code paths whose
>> > coverage need checking.
>>
>> sadly, asan does not support fully static linking.
>
> Is this just an oversight or something fundamental that's hard to fix?

Quite fundamental.
asan needs to be able to intercept certain libc functions and on all
platforms (linux, android, OSX, Windows, etc) it works only when libc
itself is dynamically linked.

(Theoretically, it's possible to fix, but it'll be  too much work :( )

> The sort of things it wants to do are much less likely to work with
> dynamic linking. Dynamic-linked musl requires all internal symbol
> references to be resolved at ld-time and does not support interposing
> in front of them.
>
> Rich



* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-21  1:05             ` Konstantin Serebryany
@ 2015-03-21  1:10               ` Konstantin Serebryany
  2015-03-21  1:23                 ` Szabolcs Nagy
  2015-03-21  1:32               ` Rich Felker
  1 sibling, 1 reply; 42+ messages in thread
From: Konstantin Serebryany @ 2015-03-21  1:10 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

After your fix the fuzzer has not found anything else so far, but it
suffers from slow performance in some cases.
Not sure if this qualifies as a bug, but the following example takes
~2 seconds to run (runs instantly with glibc):
#include <regex.h>

int main() {
  regex_t preg;
  const char *s = ".****\\Z$<\\0)_";
  regmatch_t pmatch[2];
  if (0 == regcomp(&preg, s, 0)) {
    regexec(&preg, s, 0, pmatch, 0);
    regfree(&preg);
  }
  return 0;
}
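
To put a number on the slowdown rather than eyeballing it, the same calls can be wrapped in a small timing helper. This is only a sketch: `time_regex` is a hypothetical name, and clock() measures CPU time, which is an assumption about what is worth measuring here.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <time.h>

/* Compile `pattern` as a basic regex, run it once against `subject`,
 * and return the CPU time spent in seconds. A pattern that fails to
 * compile is still timed; only the regexec() step is skipped. */
double time_regex(const char *pattern, const char *subject)
{
    regex_t preg;
    clock_t start, end;

    assert(pattern != NULL && subject != NULL);
    start = clock();
    if (regcomp(&preg, pattern, 0) == 0) {
        /* nmatch = 0: no match offsets are requested, as in the example. */
        regexec(&preg, subject, 0, NULL, 0);
        regfree(&preg);
    }
    end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Passing the pattern from the example as both arguments reproduces the measurement described above.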


On Fri, Mar 20, 2015 at 6:05 PM, Konstantin Serebryany
<konstantin.s.serebryany@gmail.com> wrote:
> On Fri, Mar 20, 2015 at 6:00 PM, Rich Felker <dalias@libc.org> wrote:
>> On Fri, Mar 20, 2015 at 05:54:49PM -0700, Konstantin Serebryany wrote:
>>> >> > > the question is how hard it is to do (1) ?
>>> >> > >
>>> >> > > i assume asan is non-trivial to set up for that (or is it
>>> >> > > enough to replace malloc calls? and some startup logic?)
>>> >> >
>>> >> > asan replaces malloc and a few more libc functions.
>>> >> > It works with various different libcs, so there is a good chance that
>>> >> > it will work here with no or minimal changes.
>>> >>
>>> >> ok i'll try it
>>> >
>>> > I would guess it works with no change for static linking, but some
>>> > changes might be needed for dynamic linking. I'm perfectly happy with
>>> > all the fuzzing being done with static linking anyway; I don't think
>>> > dynamic linking would have significant additional code paths whose
>>> > coverage need checking.
>>>
>>> sadly, asan does not support fully static linking.
>>
>> Is this just an oversight or something fundamental that's hard to fix?
>
> Quite fundamental.
> asan needs to be able to intercept certain libc functions and on all
> platforms (linux, android, OSX, Windows, etc) it works only when libc
> itself is dynamically linked.
>
> (Theoretically, it's possible to fix, but it'll be  too much work :( )
>
>> The sort of things it wants to do are much less likely to work with
>> dynamic linking. Dynamic-linked musl requires all internal symbol
>> references to be resolved at ld-time and does not support interposing
>> in front of them.
>>
>> Rich



* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-21  1:10               ` Konstantin Serebryany
@ 2015-03-21  1:23                 ` Szabolcs Nagy
  2015-03-21  1:30                   ` Rich Felker
  0 siblings, 1 reply; 42+ messages in thread
From: Szabolcs Nagy @ 2015-03-21  1:23 UTC (permalink / raw)
  To: Konstantin Serebryany; +Cc: Rich Felker, musl

* Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 18:10:18 -0700]:
> After your fix the fuzzer did not find anything else so far, but it
> suffers from slow performance on some cases.
> Not sure if this qualifies for a bug, but the following example takes
> ~2 seconds to run (runs instantly with glibc):

i think the problem is stacked repetitions
tre doesn't handle them in a sane way
and uses a huge amount of ram

for * it would be easy to solve, but
the general case is theoretically impossible to
solve: x{255}{255} will be a 255*255 state machine

this is the only part in the musl regex
engine that's allowed to have superlinear
space/time complexity

(you might want to add some logic to avoid
such stacked repetitions to speed up the search)

(btw the standard does not allow these, but if
the pattern is parenthesized around every repetition
then that's ok: (x*)* is a valid pattern, x** is not,
so there is not much point rejecting these patterns
the problem does not go away since grouping is allowed)
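
The filtering logic suggested above could look roughly like the sketch below, which a fuzz driver might use to skip inputs such as x** before calling regcomp. It is a simplified scanner for basic regex syntax only, and `has_stacked_repetition` is a hypothetical name. As noted, it does not catch the parenthesized form, so it speeds up the search without removing the underlying cost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if a basic regex applies one repetition operator directly
 * to another, e.g. "x**" or "x\{2\}\{3\}". Grouped forms such as
 * "\(x*\)*" are deliberately not flagged. Simplification: character
 * classes like [[:alpha:]] inside brackets are not parsed specially. */
bool has_stacked_repetition(const char *p)
{
    enum { START, ATOM, REPEAT } prev = START;
    size_t i = 0;

    assert(p != NULL);
    while (p[i] != '\0') {
        if (p[i] == '\\' && p[i + 1] == '{') {
            /* Interval \{m,n\}: a repetition operator. */
            if (prev == REPEAT)
                return true;
            i += 2;
            while (p[i] != '\0' && !(p[i] == '\\' && p[i + 1] == '}'))
                i++;
            if (p[i] != '\0')
                i += 2;
            prev = REPEAT;
        } else if (p[i] == '\\' && p[i + 1] == '(') {
            /* Group start: a following '*' is a literal in a BRE. */
            i += 2;
            prev = START;
        } else if (p[i] == '\\' && p[i + 1] != '\0') {
            /* Any other escape, including \), counts as an atom. */
            i += 2;
            prev = ATOM;
        } else if (p[i] == '*' && prev != START) {
            if (prev == REPEAT)
                return true;
            i++;
            prev = REPEAT;
        } else if (p[i] == '[') {
            /* Bracket expression: skip to the closing ']'. A ']' right
             * after '[' or '[^' is a literal member of the set. */
            i++;
            if (p[i] == '^')
                i++;
            if (p[i] == ']')
                i++;
            while (p[i] != '\0' && p[i] != ']')
                i++;
            if (p[i] == ']')
                i++;
            prev = ATOM;
        } else {
            /* Ordinary character, or a leading '*' that is literal. */
            i++;
            prev = ATOM;
        }
    }
    return false;
}
```

A driver would call this on the NUL-terminated input and return early when it reports true.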

> int main() {
>   regex_t preg;
>   const char *s = ".****\\Z$<\\0)_";
>   regmatch_t pmatch[2];
>   if (0 == regcomp(&preg, s, 0)) {
>     regexec(&preg, s, 0, pmatch, 0);
>     regfree(&preg);
>   }
>   return 0;
> }
> 



* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-21  1:23                 ` Szabolcs Nagy
@ 2015-03-21  1:30                   ` Rich Felker
  2015-03-21  2:10                     ` Szabolcs Nagy
  0 siblings, 1 reply; 42+ messages in thread
From: Rich Felker @ 2015-03-21  1:30 UTC (permalink / raw)
  To: Konstantin Serebryany, musl

On Sat, Mar 21, 2015 at 02:23:41AM +0100, Szabolcs Nagy wrote:
> * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 18:10:18 -0700]:
> > After your fix the fuzzer did not find anything else so far, but it
> > suffers from slow performance on some cases.
> > Not sure if this qualifies for a bug, but the following example takes
> > ~2 seconds to run (runs instantly with glibc):
> 
> i think the problem is stacked repetitions
> tre doesnt handle them in a sane way
> and uses huge amount of ram

Sadly there doesn't seem to be any sane way to handle them...

> for * it would be easy to solve, but
> the general case is theoretically impossible to
> solve: x{255}{255} will be a 255*255 state machine
> 
> this is the only part in the musl regex
> engine that's allowed to have super linear
> space/time complexity
> 
> (you might want to add some logic to avoid
> such stacked repetitions to speed up the search)
> 
> (btw the standard does not allow these, but if
> the pattern is parenthesized around every repetition
> then that's ok: (x*)* is a valid pattern, x** is not,
> so there is not much point rejecting these patterns
> the problem does not go away since grouping is allowed)
> 
> > int main() {
> >   regex_t preg;
> >   const char *s = ".****\\Z$<\\0)_";

Isn't the \0 an invalid backreference? Could it be getting processed
in a way that's causing the slowdown, but simply rejected by glibc?

Rich


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-21  1:05             ` Konstantin Serebryany
  2015-03-21  1:10               ` Konstantin Serebryany
@ 2015-03-21  1:32               ` Rich Felker
  2015-03-21  1:37                 ` Konstantin Serebryany
  1 sibling, 1 reply; 42+ messages in thread
From: Rich Felker @ 2015-03-21  1:32 UTC (permalink / raw)
  To: Konstantin Serebryany; +Cc: musl

On Fri, Mar 20, 2015 at 06:05:04PM -0700, Konstantin Serebryany wrote:
> On Fri, Mar 20, 2015 at 6:00 PM, Rich Felker <dalias@libc.org> wrote:
> > On Fri, Mar 20, 2015 at 05:54:49PM -0700, Konstantin Serebryany wrote:
> >> >> > > the question is how hard it is to do (1) ?
> >> >> > >
> >> >> > > i assume asan is non-trivial to set up for that (or is it
> >> >> > > enough to replace malloc calls? and some startup logic?)
> >> >> >
> >> >> > asan replaces malloc and a few more libc functions.
> >> >> > It works with various different libcs, so there is a good chance that
> >> >> > it will work here with no or minimal changes.
> >> >>
> >> >> ok i'll try it
> >> >
> >> > I would guess it works with no change for static linking, but some
> >> > changes might be needed for dynamic linking. I'm perfectly happy with
> >> > all the fuzzing being done with static linking anyway; I don't think
> >> > dynamic linking would have significant additional code paths whose
> >> > coverage need checking.
> >>
> >> sadly, asan does not support fully static linking.
> >
> > Is this just an oversight or something fundamental that's hard to fix?
> 
> Quite fundamental.
> asan needs to be able to intercept certain libc functions and on all
> platforms (linux, android, OSX, Windows, etc) it works only when libc
> itself is dynamically linked.

But if you're compiling libc itself with asan, couldn't it just
hard-insert the interception code into the implementations of these
functions during compiling?

Rich


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-21  1:32               ` Rich Felker
@ 2015-03-21  1:37                 ` Konstantin Serebryany
  2015-03-21  1:56                   ` Rich Felker
  0 siblings, 1 reply; 42+ messages in thread
From: Konstantin Serebryany @ 2015-03-21  1:37 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

On Fri, Mar 20, 2015 at 6:32 PM, Rich Felker <dalias@libc.org> wrote:
> On Fri, Mar 20, 2015 at 06:05:04PM -0700, Konstantin Serebryany wrote:
>> On Fri, Mar 20, 2015 at 6:00 PM, Rich Felker <dalias@libc.org> wrote:
>> > On Fri, Mar 20, 2015 at 05:54:49PM -0700, Konstantin Serebryany wrote:
>> >> >> > > the question is how hard it is to do (1) ?
>> >> >> > >
>> >> >> > > i assume asan is non-trivial to set up for that (or is it
>> >> >> > > enough to replace malloc calls? and some startup logic?)
>> >> >> >
>> >> >> > asan replaces malloc and a few more libc functions.
>> >> >> > It works with various different libcs, so there is a good chance that
>> >> >> > it will work here with no or minimal changes.
>> >> >>
>> >> >> ok i'll try it
>> >> >
>> >> > I would guess it works with no change for static linking, but some
>> >> > changes might be needed for dynamic linking. I'm perfectly happy with
>> >> > all the fuzzing being done with static linking anyway; I don't think
>> >> > dynamic linking would have significant additional code paths whose
>> >> > coverage need checking.
>> >>
>> >> sadly, asan does not support fully static linking.
>> >
>> > Is this just an oversight or something fundamental that's hard to fix?
>>
>> Quite fundamental.
>> asan needs to be able to intercept certain libc functions and on all
>> platforms (linux, android, OSX, Windows, etc) it works only when libc
>> itself is dynamically linked.
>
> But if you're compiling libc itself with asan, couldn't it just
> hard-insert the interception code into the implementations of these
> functions during compiling?

I think it could, it's just quite a bit of work to do. :(
We may end up doing it eventually as I hope to use instrumented glibc
whenever we can,
and at that point intercepting functions from glibc will become rather
silly. But we are not there yet.

>
> Rich


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-21  1:37                 ` Konstantin Serebryany
@ 2015-03-21  1:56                   ` Rich Felker
  2015-03-21  2:14                     ` Konstantin Serebryany
  0 siblings, 1 reply; 42+ messages in thread
From: Rich Felker @ 2015-03-21  1:56 UTC (permalink / raw)
  To: Konstantin Serebryany; +Cc: musl

On Fri, Mar 20, 2015 at 06:37:39PM -0700, Konstantin Serebryany wrote:
> On Fri, Mar 20, 2015 at 6:32 PM, Rich Felker <dalias@libc.org> wrote:
> > On Fri, Mar 20, 2015 at 06:05:04PM -0700, Konstantin Serebryany wrote:
> >> On Fri, Mar 20, 2015 at 6:00 PM, Rich Felker <dalias@libc.org> wrote:
> >> > On Fri, Mar 20, 2015 at 05:54:49PM -0700, Konstantin Serebryany wrote:
> >> >> >> > > the question is how hard it is to do (1) ?
> >> >> >> > >
> >> >> >> > > i assume asan is non-trivial to set up for that (or is it
> >> >> >> > > enough to replace malloc calls? and some startup logic?)
> >> >> >> >
> >> >> >> > asan replaces malloc and a few more libc functions.
> >> >> >> > It works with various different libcs, so there is a good chance that
> >> >> >> > it will work here with no or minimal changes.
> >> >> >>
> >> >> >> ok i'll try it
> >> >> >
> >> >> > I would guess it works with no change for static linking, but some
> >> >> > changes might be needed for dynamic linking. I'm perfectly happy with
> >> >> > all the fuzzing being done with static linking anyway; I don't think
> >> >> > dynamic linking would have significant additional code paths whose
> >> >> > coverage need checking.
> >> >>
> >> >> sadly, asan does not support fully static linking.
> >> >
> >> > Is this just an oversight or something fundamental that's hard to fix?
> >>
> >> Quite fundamental.
> >> asan needs to be able to intercept certain libc functions and on all
> >> platforms (linux, android, OSX, Windows, etc) it works only when libc
> >> itself is dynamically linked.
> >
> > But if you're compiling libc itself with asan, couldn't it just
> > hard-insert the interception code into the implementations of these
> > functions during compiling?
> 
> I think it could, it's just quite a bit of work to do. :(
> We may end up doing it eventually as I hope to use instrumented glibc
> whenever we can,
> and at that point intercepting functions from glibc will become rather
> silly. But we are not there yet.

Sorry to keep bombarding you with questions. One more: is it only asan
that needs dynamic linking? If we're willing to drop asan for now and
just rely on musl itself crashing for heap corruption (musl does a
good job of detecting it usually), can the necessary coverage stuff
still work with static linking?

Rich


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-21  1:30                   ` Rich Felker
@ 2015-03-21  2:10                     ` Szabolcs Nagy
  2015-03-21  2:17                       ` Rich Felker
  0 siblings, 1 reply; 42+ messages in thread
From: Szabolcs Nagy @ 2015-03-21  2:10 UTC (permalink / raw)
  To: Rich Felker; +Cc: Konstantin Serebryany, musl

* Rich Felker <dalias@libc.org> [2015-03-20 21:30:16 -0400]:
> > > int main() {
> > >   regex_t preg;
> > >   const char *s = ".****\\Z$<\\0)_";
> 
> Isn't the \0 an invalid backreference? Could it be getting processed
> in a way that's causing the slowdown, but simply rejected by glibc?

ah you were right the \0 causes the slow down here:
it switches to the backtracking mode and there are
many ways to backtrack on .****


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-21  1:56                   ` Rich Felker
@ 2015-03-21  2:14                     ` Konstantin Serebryany
  2015-03-21  2:20                       ` Rich Felker
  0 siblings, 1 reply; 42+ messages in thread
From: Konstantin Serebryany @ 2015-03-21  2:14 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

>
> Sorry to keep bombarding you with questions.

You are more than welcome!

> One more: is it only asan
> that needs dynamic linking? If we're willing to drop asan for now and
> just rely on musl itself crashing for heap corruption (musl does a
> good job of detecting it usually), can the necessary coverage stuff
> still work with static linking?

I think it can with a reasonable amount of additional work, but not out of the box.
The compiler instrumentation in clang clearly does not care about
dynamic vs static linking.
If you build the source with "-fsanitize=leak -fsanitize-coverage=4
-O1" the compiler will not insert any of the asan instrumentation
and only insert calls to a couple of functions needed for coverage.
Then, instead of linking with the full asan+coverage run-time, you
will need a very simple re-implementation of coverage-only runtime.

But, my previous experience with running fuzzers w/o memory bug
detectors (asan, or others)
suggests that this is a bad idea. Memory bugs tend to accumulate and
show up in the following iterations (if at all).

>
> Rich


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-21  2:10                     ` Szabolcs Nagy
@ 2015-03-21  2:17                       ` Rich Felker
  0 siblings, 0 replies; 42+ messages in thread
From: Rich Felker @ 2015-03-21  2:17 UTC (permalink / raw)
  To: Konstantin Serebryany, musl

On Sat, Mar 21, 2015 at 03:10:18AM +0100, Szabolcs Nagy wrote:
> * Rich Felker <dalias@libc.org> [2015-03-20 21:30:16 -0400]:
> > > > int main() {
> > > >   regex_t preg;
> > > >   const char *s = ".****\\Z$<\\0)_";
> > 
> > Isn't the \0 an invalid backreference? Could it be getting processed
> > in a way that's causing the slowdown, but simply rejected by glibc?
> 
> ah you were right the \0 causes the slow down here:
> it switches to the backtracking mode and there are
> many ways to backtrack on .****

Right. But \0 isn't even a valid backreference. It would refer to "the
whole match" which could never match as a backreference. Valid
backrefs are only the digits 1-9 though. \0 is not defined and should
probably be treated as a literal or a parse error.
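
To make the distinction concrete, here is a minimal sketch of a scanner that flags a \0 in a pattern, on the assumption that only \1 through \9 are meaningful backreferences. The name has_zero_backref is hypothetical and this is not how musl's parser is structured; it just shows the check a test harness could use to separate such inputs:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the pattern contains a backslash
 * followed by '0'. An escaped backslash ("\\" in the pattern) is
 * consumed as a pair, so the '0' after it is not counted. */
bool has_zero_backref(const char *pat)
{
	for (size_t i = 0; pat[i]; i++) {
		if (pat[i] != '\\') continue;
		if (pat[i+1] == '0') return true;
		if (pat[i+1]) i++;   /* skip whatever was escaped */
	}
	return false;
}
```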

Rich


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-21  2:14                     ` Konstantin Serebryany
@ 2015-03-21  2:20                       ` Rich Felker
  2015-03-21  6:05                         ` Konstantin Serebryany
  0 siblings, 1 reply; 42+ messages in thread
From: Rich Felker @ 2015-03-21  2:20 UTC (permalink / raw)
  To: Konstantin Serebryany; +Cc: musl

On Fri, Mar 20, 2015 at 07:14:33PM -0700, Konstantin Serebryany wrote:
> >
> > Sorry to keep bombarding you with questions.
> 
> You are more than welcome!
> 
> > One more: is it only asan
> > that needs dynamic linking? If we're willing to drop asan for now and
> > just rely on musl itself crashing for heap corruption (musl does a
> > good job of detecting it usually), can the necessary coverage stuff
> > still work with static linking?
> 
> I think it can with a reasonable additional work, but not out of the box.
> The compiler instrumentation in clang clearly does not care about
> dynamic vs static linking.
> If you build the source with "-fsanitize=leak -fsanitize-coverage=4
> -O1" the compiler will not insert any of the asan instrumentation
> and only insert calls to a couple of functions needed for coverage.
> Then, instead of linking with the full asan+coverage run-time, you
> will need a very simple re-implementation of coverage-only runtime.

Could the existing runtime be used, just stripped down?

> But, my previous experience with running fuzzers w/o memory bug
> detectors (asan, or others)
> suggests that this is a bad idea. Memory bugs tend to accumulate and
> show up in the following iterations (if at all).

Well static linking with musl does not impose any constraint on
redefining functions, so you could easily use a debugging malloc that
lines up each allocation to end on a page boundary with a guard page
after it. This would of course be slow and use lots of memory but
would catch all heap overflows. And -fstack-protector-all would catch
most stack-based overflows.
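
A minimal sketch of such a debugging allocator, assuming mmap/mprotect are available. The names guard_alloc and guard_free are made up for illustration (a real replacement would define malloc/free and track sizes itself), and the returned pointer is only as aligned as the requested size allows:

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical guard-page allocator: the n-byte block is placed so that
 * its last byte is directly followed by a PROT_NONE page, so writing
 * even one byte past the end faults immediately. */
void *guard_alloc(size_t n)
{
	size_t pg = (size_t)sysconf(_SC_PAGESIZE);
	if (n == 0) n = 1;
	if (n > SIZE_MAX - 2*pg) return NULL;
	size_t data = (n + pg - 1) / pg * pg;   /* round up to whole pages */
	unsigned char *base = mmap(NULL, data + pg, PROT_READ|PROT_WRITE,
	                           MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED) return NULL;
	if (mprotect(base + data, pg, PROT_NONE)) {
		munmap(base, data + pg);
		return NULL;
	}
	return base + data - n;                 /* block ends at the guard */
}

/* The caller passes the same size it allocated with; the mapping start
 * is recovered from the page-aligned end of the block. */
void guard_free(void *p, size_t n)
{
	if (!p) return;
	size_t pg = (size_t)sysconf(_SC_PAGESIZE);
	if (n == 0) n = 1;
	size_t data = (n + pg - 1) / pg * pg;
	munmap((unsigned char *)p + n - data, data + pg);
}
```

Each allocation costs at least two pages and two syscalls, which is where the slowness and memory use come from.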

Rich


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-21  2:20                       ` Rich Felker
@ 2015-03-21  6:05                         ` Konstantin Serebryany
  2015-03-21 13:28                           ` Szabolcs Nagy
  0 siblings, 1 reply; 42+ messages in thread
From: Konstantin Serebryany @ 2015-03-21  6:05 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

On Fri, Mar 20, 2015 at 7:20 PM, Rich Felker <dalias@libc.org> wrote:
> On Fri, Mar 20, 2015 at 07:14:33PM -0700, Konstantin Serebryany wrote:
>> >
>> > Sorry to keep bombarding you with questions.
>>
>> You are more than welcome!
>>
>> > One more: is it only asan
>> > that needs dynamic linking? If we're willing to drop asan for now and
>> > just rely on musl itself crashing for heap corruption (musl does a
>> > good job of detecting it usually), can the necessary coverage stuff
>> > still work with static linking?
>>
>> I think it can with a reasonable additional work, but not out of the box.
>> The compiler instrumentation in clang clearly does not care about
>> dynamic vs static linking.
>> If you build the source with "-fsanitize=leak -fsanitize-coverage=4
>> -O1" the compiler will not insert any of the asan instrumentation
>> and only insert calls to a couple of functions needed for coverage.
>> Then, instead of linking with the full asan+coverage run-time, you
>> will need a very simple re-implementation of coverage-only runtime.
>
> Could the existing runtime be used, just stripped down?

Yes, but for the basic functionality needed by the fuzzer it's simpler
to write it from scratch, see below:

========================================================
svn co http://llvm.org/svn/llvm-project/llvm/trunk/lib/Fuzzer
cat <<EOF >cov-minimal-rt.c
static long counter;
void __sanitizer_cov_with_check(int *guard) {
  if (*guard == 0) {
    counter++;
    *guard=1;
  }
}
long __sanitizer_get_total_unique_coverage() { return counter; }
void __sanitizer_cov_module_init() {}
void __sanitizer_reset_coverage(){}
void __sanitizer_get_coverage_guards(){}
void __sanitizer_get_number_of_counters(){}
void __sanitizer_update_counter_bitset_and_clear_counters(){}
void __sanitizer_set_death_callback(){}
EOF

clang -std=c++11 -c Fuzzer/Fuzzer*.cpp -I Fuzzer
clang -std=c++11  -fsanitize=leak -fsanitize-coverage=3 -mllvm \
  -sanitizer-coverage-block-threshold=0  Fuzzer/test/SimpleTest.cpp -c
clang -c cov-minimal-rt.c
clang++ *.o
./a.out
========================================================
Seed: 1285924057
Shuffle: Size: 1 prefer small: 1
#1      cov: 5  bits: 0 exec/s: 0
Shuffle done: 1 IC: 5
#2      cov: 7  bits: 0 exec/s: 0
#2      NEW: 7 B: 0 L: 64 S: 2 I: 0
#4      cov: 7  bits: 0 exec/s: 0
#8      cov: 7  bits: 0 exec/s: 0
#16     cov: 7  bits: 0 exec/s: 0
#32     cov: 7  bits: 0 exec/s: 0
#64     cov: 7  bits: 0 exec/s: 0
#128    cov: 7  bits: 0 exec/s: 0
#256    cov: 7  bits: 0 exec/s: 0
#512    cov: 7  bits: 0 exec/s: 0
#1024   cov: 7  bits: 0 exec/s: 0
#2048   cov: 7  bits: 0 exec/s: 0
#2107   NEW: 11 B: 0 L: 64 S: 3 I: 0
#2153   NEW: 12 B: 0 L: 1 S: 4 I: 1     H       1: 72
#4096   cov: 12 bits: 0 exec/s: 0
#8192   cov: 12 bits: 0 exec/s: 0
#16384  cov: 12 bits: 0 exec/s: 0
#18091  NEW: 15 B: 0 L: 2 S: 5 I: 4     Hi      2: 72 105
#18122  NEW: 17 B: 0 L: 4 S: 6 I: 0     Hi?i    4: 72 105 8 105
Found the target, exiting

The recently added afl-style counters
(https://code.google.com/p/address-sanitizer/wiki/AsanCoverage#Coverage_counters)
are a bit more involved, but the basic bool-per-edge is quite enough
in most cases.
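
For illustration only, a sketch of what the counter flavour adds over bool-per-edge, assuming the AFL-style scheme where an 8-bit hit count is reduced to one of 8 ranges (1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+). The names counter_bucket and merge_counters are hypothetical and are not the sanitizer run-time's API:

```c
#include <stddef.h>
#include <stdint.h>

/* Map a per-edge hit count to a single range bit; 0 means "never hit". */
uint8_t counter_bucket(uint8_t hits)
{
	if (hits == 0) return 0;
	if (hits <= 3) return (uint8_t)(1u << (hits - 1));
	if (hits <= 7) return 1u << 3;
	if (hits <= 15) return 1u << 4;
	if (hits <= 31) return 1u << 5;
	if (hits <= 127) return 1u << 6;
	return 1u << 7;
}

/* OR this run's range bits into the accumulated bitset and return how
 * many edges gained a bit they did not have before. */
unsigned merge_counters(uint8_t *bitset, const uint8_t *counters, size_t n)
{
	unsigned fresh = 0;
	for (size_t i = 0; i < n; i++) {
		uint8_t b = counter_bucket(counters[i]);
		if (b & ~bitset[i]) {
			fresh++;
			bitset[i] |= b;
		}
	}
	return fresh;
}
```

An input that takes an already-seen edge a different number of times (say 5 instead of 1) then counts as new, which the bool-per-edge variant cannot see.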

The fuzzer itself is written in C++ and uses STL (probably not the
best idea, but it makes the experiments simpler).
Can't tell if it will be a problem with musl, but after all the fuzzer
itself is also trivial (as well as the entire concept).

>
>> But, my previous experience with running fuzzers w/o memory bug
>> detectors (asan, or others)
>> suggests that this is a bad idea. Memory bugs tend to accumulate and
>> show up in the following iterations (if at all).
>
> Well static linking with musl does not impose any constraint on
> redefining functions, so you could easily use a debugging malloc that
> lines up each allocation to end on a page boundary with a guard page
> after it.

Yea... This will slow down fuzzing, and guard pages only protect you
from overflow in one direction (either left or right, but not both).
But this is better than nothing.

> This would of course be slow and use lots of memory but
> would catch all heap overflows. And -fstack-protector-all would catch
> most stack-based overflows.

Only stack-overflow-write by a small amount, but yes, better than nothing.

BTW, writing a minimalistic asan run-time as part of musl should be a
matter of a couple of hours.
Probably much faster than making the current monster work with static linking.
I'd be happy to help with such.

--kcc


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-21  6:05                         ` Konstantin Serebryany
@ 2015-03-21 13:28                           ` Szabolcs Nagy
  2015-03-21 21:03                             ` Szabolcs Nagy
  2015-03-23  4:55                             ` Konstantin Serebryany
  0 siblings, 2 replies; 42+ messages in thread
From: Szabolcs Nagy @ 2015-03-21 13:28 UTC (permalink / raw)
  To: Konstantin Serebryany; +Cc: Rich Felker, musl

[-- Attachment #1: Type: text/plain, Size: 4549 bytes --]

* Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 23:05:13 -0700]:
> On Fri, Mar 20, 2015 at 7:20 PM, Rich Felker <dalias@libc.org> wrote:
> > On Fri, Mar 20, 2015 at 07:14:33PM -0700, Konstantin Serebryany wrote:
> >> If you build the source with "-fsanitize=leak -fsanitize-coverage=4
> >> -O1" the compiler will not insert any of the asan instrumentation
> >> and only insert calls to a couple of functions needed for coverage.
> >> Then, instead of linking with the full asan+coverage run-time, you
> >> will need a very simple re-implementation of coverage-only runtime.
> >
> > Could the existing runtime be used, just stripped down?
> 
> Yes, but for the basic functionality needed by the fuzzer it's simpler
> to write it from scratch, see below:
> 
> ========================================================
> svn co http://llvm.org/svn/llvm-project/llvm/trunk/lib/Fuzzer
> cat <<EOF >cov-minimal-rt.c
> static long counter;
> void __sanitizer_cov_with_check(int *guard) {
>   if (*guard == 0) {
>     counter++;
>     *guard=1;
>   }
> }
> long __sanitizer_get_total_unique_coverage() { return counter; }
> void __sanitizer_cov_module_init() {}
> void __sanitizer_reset_coverage(){}
> void __sanitizer_get_coverage_guards(){}
> void __sanitizer_get_number_of_counters(){}
> void __sanitizer_update_counter_bitset_and_clear_counters(){}
> void __sanitizer_set_death_callback(){}
> EOF
> 
> clang -std=c++11 -c Fuzzer/Fuzzer*.cpp -I Fuzzer
> clang -std=c++11  -fsanitize=leak -fsanitize-coverage=3 -mllvm
> -sanitizer-coverage-block-threshold=0  Fuzzer/test/SimpleTest.cpp -c
> clang -c cov-minimal-rt.c
> clang++ *.o
> ./a.out
> ========================================================

with this i could run the fuzzer against libc.a

it's a bit more work to link to libc.a than adding
a -L so i attached the scripts i used (and an example)
so others can reproduce it

c++ headers cannot be used in the test (that would
require cleaning up the libstdc++ header mess)

but i think there is no reason to use c++ for these
libc api tests anyway

you may need to adjust the directories the scripts use

(the linking may need to change when compiler-rt is
used instead of libgcc)

usage:

cd workdir
./buildfuzz.sh
./buildmusl.sh
./fuzzcompile.sh reg.c
./fuzzlink.sh reg.o
./a.out

of course to make it useful the malloc magic is needed so that
memory bugs are more likely to crash

> The recently added afl-style counters
> (https://code.google.com/p/address-sanitizer/wiki/AsanCoverage#Coverage_counters)
> are a bit more involved, but the basic bool-per-edge is quite enough
> in most cases.
> 

ok

> The fuzzer itself is written in C++ and uses STL (probably, not the
> best idea, but it makes the experiments simpler).
> Can't tell if it will be a problem with musl, but after all the fuzzer
> itself is also trivial (as well as the entire concept)
> 

c++ happens to work because musl is (almost) abi compatible with
glibc on x86 so we can just link to the glibc linked libstdc++

(this can eg fail when the c++ thread local storage destructor
abi is used, that is not implemented in musl yet)

so yes c++ makes things more painful: you need to recompile the
entire toolchain to make it work reliably (and then both gcc
and clang have broken assumptions about the libc so you have to
patch them) which is too much work for running tests

> > Well static linking with musl does not impose any constraint on
> > redefining functions, so you could easily use a debugging malloc that
> > lines up each allocation to end on a page boundary with a guard page
> > after it.
> 
> Yea... This will slow down fuzzing, and guard pages only protect you
> from overflow in one direction (either left or right, but not both).
> But this is better than nothing.
> 

you can run the tests twice (for left and right) :)

> > This would of course be slow and use lots of memory but
> > would catch all heap overflows. And -fstack-protector-all would catch
> > most stack-based overflows.
> 
> Only stack-overflow-write by a small amount, but yes, better than nothing.
> 
> BTW, writing a minimalistic asan run-time as part of musl should be a
> matter of a couple of hours.
> Probably much faster than making the current monster work with static linking.
> I'd be happy to help with such.
> 

how would this look?

compile the tests and libc with asan, but instead of linking the
asan runtime from clang use a musl specific one?

i assume for that we still need to change the libc startup code, malloc
functions and maybe some things around thread stacks

[-- Attachment #2: buildfuzz.sh --]
[-- Type: application/x-sh, Size: 823 bytes --]

[-- Attachment #3: buildmusl.sh --]
[-- Type: application/x-sh, Size: 298 bytes --]

[-- Attachment #4: fuzzcompile.sh --]
[-- Type: application/x-sh, Size: 177 bytes --]

[-- Attachment #5: fuzzlink.sh --]
[-- Type: application/x-sh, Size: 330 bytes --]

[-- Attachment #6: reg.c --]
[-- Type: text/x-csrc, Size: 279 bytes --]

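#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <regex.h>

void TestOneInput(const uint8_t *p, size_t n)
{
	regex_t preg;
	regmatch_t pmatch[2];
	/* the fuzzer input is not NUL-terminated, so work on a copy */
	char *s = strndup((char*)p, n);
	if (!s) return;
	if (!regcomp(&preg, s, REG_EXTENDED)) {
		regexec(&preg, s, 0, pmatch, 0);
		regfree(&preg);
	}
	/* release the copy, otherwise every iteration leaks */
	free(s);
}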


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-21 13:28                           ` Szabolcs Nagy
@ 2015-03-21 21:03                             ` Szabolcs Nagy
  2015-03-21 21:38                               ` Szabolcs Nagy
  2015-03-23  5:02                               ` Konstantin Serebryany
  2015-03-23  4:55                             ` Konstantin Serebryany
  1 sibling, 2 replies; 42+ messages in thread
From: Szabolcs Nagy @ 2015-03-21 21:03 UTC (permalink / raw)
  To: Konstantin Serebryany, Rich Felker, musl

[-- Attachment #1: Type: text/plain, Size: 5167 bytes --]

i wrote some trivial test cases for

__dn_expand
__dns_parse
__pleval
fnmatch
inet_pton
strptime

to try out the concept, i've seen one crash so far:
a bus error when fuzzing inet_pton

probably a stack corruption that overwrites the location
where %rbp is stored and then the memory access relative
to rbp crashes

the fuzzing goes like:

./a.out -seed=1753234605
...
#8388608	cov: 546	bits: 0	exec/s: 838860
#16777216	cov: 546	bits: 0	exec/s: 798915
#27461772	NEW: 548 B: 0 L: 16 S: 22 I: 0	8283::2:2.8.83.3	16: 56 50 56 51 58 58 50 58 50 46 56 46 56 51 46 51 
#27469404	NEW: 549 B: 0 L: 24 S: 23 I: 2	8283::2:283:2.8.83.2.833	24: 56 50 56 51 58 58 50 58 50 56 51 58 50 46 56 46 56 51 46 50 46 56 51 51 
Bus error (core dumped)

is there a way to get a reproducer after such a crash?

in this case i fortunately had the core dump
and i can see the inet_pton argument in %r14
but it would be nice if there were occasional
saved check points from where i can restart
the fuzzer.
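
One low-tech way to get a reproducer, sketched under the assumption that the test function itself may write files: dump the current input before calling the function under test, so the file left behind after a crash is the crashing input. The name save_last_input and the file name in the usage note are made up for illustration and are not fuzzer features:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: overwrite `path` with the n input bytes.
 * Returns 0 on success, -1 on failure. fclose() hands the data to the
 * kernel, so it survives the process being killed by a signal
 * right afterwards. */
int save_last_input(const char *path, const uint8_t *p, size_t n)
{
	FILE *f = fopen(path, "wb");
	if (!f) return -1;
	size_t w = fwrite(p, 1, n, f);
	if (fclose(f) != 0 || w != n) return -1;
	return 0;
}
```

Calling save_last_input("last-input.bin", p, n) as the first statement of TestOneInput would do it, at the cost of one file write per iteration, which lowers exec/s considerably.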

i dont yet see the bug and cannot reproduce the
issue outside the fuzzer (but i didnt try very hard)

attached the fuzz test case and the code that should
reproduce the issue, gdb session below

Core was generated by `./a.out -seed=1753234605'.
Program terminated with signal SIGBUS, Bus error.
#0  0x000000000047a05b in inet_pton (af=<optimized out>, s=<optimized out>, a0=0x20000ffffe000) at src/network/inet_pton.c:65
65			*a++ = ip[j]>>8;
(gdb) bt
#0  0x000000000047a05b in inet_pton (af=<optimized out>, s=<optimized out>, a0=0x20000ffffe000) at src/network/inet_pton.c:65
#1  0x0000000000400769 in TestOneInput ()
#2  0x000000000040c6f3 in fuzzer::Fuzzer::RunOneMaximizeTotalCoverage(std::vector<unsigned char, std::allocator<unsigned char> > const&) ()
#3  0x000000000040c412 in fuzzer::Fuzzer::RunOne(std::vector<unsigned char, std::allocator<unsigned char> > const&) ()
#4  0x000000000040cc7c in fuzzer::Fuzzer::MutateAndTestOne(std::vector<unsigned char, std::allocator<unsigned char> >*) ()
#5  0x000000000040cffb in fuzzer::Fuzzer::Loop(unsigned long) ()
#6  0x0000000000400d4c in fuzzer::FuzzerDriver(int, char**, void (*)(unsigned char const*, unsigned long)) ()
#7  0x00000000004007dc in main ()
(gdb) disass inet_pton,+40
Dump of assembler code from 0x479b40 to 0x479b68:
   0x0000000000479b40 <inet_pton+0>:	push   %rbp
   0x0000000000479b41 <inet_pton+1>:	push   %r15
   0x0000000000479b43 <inet_pton+3>:	push   %r14
   0x0000000000479b45 <inet_pton+5>:	push   %r13
   0x0000000000479b47 <inet_pton+7>:	push   %r12
   0x0000000000479b49 <inet_pton+9>:	push   %rbx
   0x0000000000479b4a <inet_pton+10>:	sub    $0x28,%rsp
   0x0000000000479b4e <inet_pton+14>:	mov    %rdx,%r13
   0x0000000000479b51 <inet_pton+17>:	mov    %rsi,%r14
   0x0000000000479b54 <inet_pton+20>:	mov    %edi,%ebp
   0x0000000000479b56 <inet_pton+22>:	mov    $0x6de364,%edi
   0x0000000000479b5b <inet_pton+27>:	callq  0x4007f0 <__sanitizer_cov_with_check>
   0x0000000000479b60 <inet_pton+32>:	cmp    $0xa,%ebp
   0x0000000000479b63 <inet_pton+35>:	jne    0x479ba6 <inet_pton+102>
   0x0000000000479b65 <inet_pton+37>:	mov    $0x6de3c8,%edi
End of assembler dump.
(gdb) disass /m 0x000000000047a020,+64 
Dump of assembler code from 0x47a020 to 0x47a060:
62			for (j=0; j<7-i; j++) ip[brk+j] = 0;
   0x000000000047a02a <inet_pton+1258>:	callq  0x4007f0 <__sanitizer_cov_with_check>
   0x000000000047a02f <inet_pton+1263>:	xor    %ebx,%ebx
   0x000000000047a031 <inet_pton+1265>:	mov    0x8(%rsp),%rbp
   0x000000000047a036 <inet_pton+1270>:	mov    0x4(%rsp),%r15d
   0x000000000047a03b <inet_pton+1275>:	jmp    0x47a04d <inet_pton+1293>
   0x000000000047a03d <inet_pton+1277>:	nopl   (%rax)

63		}
64		for (j=0; j<8; j++) {
   0x000000000047a040 <inet_pton+1280>:	inc    %rbx
   0x000000000047a043 <inet_pton+1283>:	mov    $0x6de46c,%edi
   0x000000000047a048 <inet_pton+1288>:	callq  0x4007f0 <__sanitizer_cov_with_check>
   0x000000000047a04d <inet_pton+1293>:	mov    $0x6de468,%edi

65			*a++ = ip[j]>>8;
   0x000000000047a052 <inet_pton+1298>:	callq  0x4007f0 <__sanitizer_cov_with_check>
   0x000000000047a057 <inet_pton+1303>:	mov    0x11(%rsp,%rbx,2),%al
=> 0x000000000047a05b <inet_pton+1307>:	mov    %al,0x0(%rbp,%rbx,2)

66			*a++ = ip[j];
   0x000000000047a05f <inet_pton+1311>:	mov    0x10(%rsp,%rbx,2),%al
   0x000000000047a063 <inet_pton+1315>:	mov    %al,0x1(%rbp,%rbx,2)

End of assembler dump.
(gdb) i reg
rax            0x7fffffffdf00	140737488346880
rbx            0x0	0
rcx            0x0	0
rdx            0x0	0
rsi            0x7fffffffdfb2	140737488347058
rdi            0x6de468	7201896
rbp            0x20000ffffe000	0x20000ffffe000
rsp            0x7fffffffdf80	0x7fffffffdf80
r8             0x7fffffffdf3a	140737488346938
r9             0x0	0
r10            0x0	0
r11            0x246	582
r12            0x10	16
r13            0x7	7
r14            0x6e2dc3	7220675
r15            0x1	1
rip            0x47a05b	0x47a05b <inet_pton+1307>
eflags         0x10202	[ IF RF ]
cs             0x33	51
ss             0x2b	43
ds             0x0	0
es             0x0	0
fs             0x63	99
gs             0x0	0
(gdb) p (char*)0x6e2dc3
$3 = 0x6e2dc3 "2.8288;3:33::2.82.83333"
(gdb) 

[-- Attachment #2: inet_pton_fuzz.c --]
[-- Type: text/x-csrc, Size: 223 bytes --]

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <arpa/inet.h>

void TestOneInput(const uint8_t *p, size_t n)
{
	char buf[16];
	char *s = strndup((char*)p, n);
	inet_pton(AF_INET6, s, buf);
	free(s);
}


[-- Attachment #3: inet_pton_reporoduce.c --]
[-- Type: text/x-csrc, Size: 136 bytes --]

#include <arpa/inet.h>

static const char s[] = "2.8288;3:33::2.82.83333";

int main()
{
	char buf[16];
	inet_pton(AF_INET6, s, buf);
}

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-21 21:03                             ` Szabolcs Nagy
@ 2015-03-21 21:38                               ` Szabolcs Nagy
  2015-03-21 22:13                                 ` Szabolcs Nagy
  2015-03-23  5:02                               ` Konstantin Serebryany
  1 sibling, 1 reply; 42+ messages in thread
From: Szabolcs Nagy @ 2015-03-21 21:38 UTC (permalink / raw)
  To: Konstantin Serebryany, Rich Felker, musl

* Szabolcs Nagy <nsz@port70.net> [2015-03-21 22:03:02 +0100]:
...
> r12            0x10	16
> r13            0x7	7
> r14            0x6e2dc3	7220675
> r15            0x1	1
> rip            0x47a05b	0x47a05b <inet_pton+1307>
> eflags         0x10202	[ IF RF ]
> cs             0x33	51
> ss             0x2b	43
> ds             0x0	0
> es             0x0	0
> fs             0x63	99
> gs             0x0	0
> (gdb) p (char*)0x6e2dc3
> $3 = 0x6e2dc3 "2.8288;3:33::2.82.83333"
> (gdb) 


ah.. r14 is incremented as the string is parsed
the original string is

(gdb) p (char*)0x6e2dc3-35
$37 = 0x6e2da0 "8:a:2:8:3:28:8::2:83:20:8:2:833:23:2.8288;3:33::2.82.83333"

with this i can reproduce the crash
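
For reference, a standalone reproducer built around the full string might
look like the sketch below. The helper name call_inet_pton_on_crash_input
is made up for illustration. On the affected musl build the call is
expected to write past the internal ip[] array; a libc without the bug
should simply reject the string.

```c
#include <arpa/inet.h>

/* Full fuzzer input as recovered from the core dump (r14 - 35). */
static const char crash_input[] =
	"8:a:2:8:3:28:8::2:83:20:8:2:833:23:2.8288;3:33::2.82.83333";

/* Hypothetical helper: returns inet_pton's result for the crashing
 * input. buf is the 16 bytes inet_pton needs for an AF_INET6 address. */
int call_inet_pton_on_crash_input(void)
{
	unsigned char buf[16];
	return inet_pton(AF_INET6, crash_input, buf);
}
```

Calling this from main() and linking against the libc under test is enough
to check whether a given build is affected.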



* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-21 21:38                               ` Szabolcs Nagy
@ 2015-03-21 22:13                                 ` Szabolcs Nagy
  2015-03-22  6:36                                   ` Justin Cormack
  0 siblings, 1 reply; 42+ messages in thread
From: Szabolcs Nagy @ 2015-03-21 22:13 UTC (permalink / raw)
  To: Konstantin Serebryany, Rich Felker, musl

[-- Attachment #1: Type: text/plain, Size: 546 bytes --]

* Szabolcs Nagy <nsz@port70.net> [2015-03-21 22:38:25 +0100]:
> ah.. r14 is incremented as the string is parsed
> the original string is
> 
> (gdb) p (char*)0x6e2dc3-35
> $37 = 0x6e2da0 "8:a:2:8:3:28:8::2:83:20:8:2:833:23:2.8288;3:33::2.82.83333"
> 
> with this i can reproduce the crash

i assume

1:2:3:4:5:6:7::

is an invalid ipv6 address

currently musl gets the :: handling wrong at the end and it
goes on clobbering memory, i guess this is a security critical
issue

a possible fix is attached but probably the code should
be made clearer here

[-- Attachment #2: inet_pton.diff --]
[-- Type: text/x-diff, Size: 358 bytes --]

diff --git a/src/network/inet_pton.c b/src/network/inet_pton.c
index 4496b47..e4cdad5 100644
--- a/src/network/inet_pton.c
+++ b/src/network/inet_pton.c
@@ -38,6 +38,7 @@ int inet_pton(int af, const char *restrict s, void *restrict a0)
 
 	for (i=0; ; i++) {
 		if (s[0]==':' && brk<0) {
+			if (i==7) return 0;
 			brk=i;
 			ip[i]=0;
 			if (!*++s) break;


* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-21 22:13                                 ` Szabolcs Nagy
@ 2015-03-22  6:36                                   ` Justin Cormack
  0 siblings, 0 replies; 42+ messages in thread
From: Justin Cormack @ 2015-03-22  6:36 UTC (permalink / raw)
  To: musl, Konstantin Serebryany, Rich Felker

On 21 March 2015 at 22:13, Szabolcs Nagy <nsz@port70.net> wrote:
> * Szabolcs Nagy <nsz@port70.net> [2015-03-21 22:38:25 +0100]:
>> ah.. r14 is incremented as the string is parsed
>> the original string is
>>
>> (gdb) p (char*)0x6e2dc3-35
>> $37 = 0x6e2da0 "8:a:2:8:3:28:8::2:83:20:8:2:833:23:2.8288;3:33::2.82.83333"
>>
>> with this i can reproduce the crash
>
> i assume
>
> 1:2:3:4:5:6:7::
>
> is invalid ipv6 address

No, it is valid, the last :: expands to :0. RFC 2373 says "The "::"
can also be used to compress the leading and/or trailing zeros in an
address."

Justin
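
A small check of the trailing "::" case could look like the sketch below.
The helper name parse_ipv6 is an assumption made for this example; it only
wraps inet_pton and pre-fills the output so that untouched bytes are easy
to spot.

```c
#include <arpa/inet.h>
#include <string.h>

/* Parses addr as IPv6 into out[16] and returns inet_pton's result
 * (1 = valid address, 0 = not a valid address). */
int parse_ipv6(const char *addr, unsigned char out[16])
{
	memset(out, 0xff, 16);	/* make untouched bytes visible */
	return inet_pton(AF_INET6, addr, out);
}
```

For "1:2:3:4:5:6:7::" an implementation that follows the RFC text quoted
above should return 1 and store a zero last group, i.e. out[14] and
out[15] are both 0.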



* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-21 13:28                           ` Szabolcs Nagy
  2015-03-21 21:03                             ` Szabolcs Nagy
@ 2015-03-23  4:55                             ` Konstantin Serebryany
  2015-03-23 12:35                               ` Szabolcs Nagy
  1 sibling, 1 reply; 42+ messages in thread
From: Konstantin Serebryany @ 2015-03-23  4:55 UTC (permalink / raw)
  To: Konstantin Serebryany, Rich Felker, musl

On Sat, Mar 21, 2015 at 6:28 AM, Szabolcs Nagy <nsz@port70.net> wrote:
> * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 23:05:13 -0700]:
>> On Fri, Mar 20, 2015 at 7:20 PM, Rich Felker <dalias@libc.org> wrote:
>> > On Fri, Mar 20, 2015 at 07:14:33PM -0700, Konstantin Serebryany wrote:
>> >> If you build the source with "-fsanitize=leak -fsanitize-coverage=4
>> >> -O1" the compiler will not insert any of the asan instrumentation
>> >> and only insert calls to a couple of functions needed for coverage.
>> >> Then, instead of linking with the full asan+coverage run-time, you
>> >> will need a very simple re-implementation of coverage-only runtime.
>> >
>> > Could the existing runtime be used, just stripped down?
>>
>> Yes, but for the basic functionality needed by the fuzzer it's simpler
>> to write it from scratch, see below:
>>
>> ========================================================
>> svn co http://llvm.org/svn/llvm-project/llvm/trunk/lib/Fuzzer
>> cat <<EOF >cov-minimal-rt.c
>> static long counter;
>> void __sanitizer_cov_with_check(int *guard) {
>>   if (*guard == 0) {
>>     counter++;
>>     *guard=1;
>>   }
>> }
>> long __sanitizer_get_total_unique_coverage() { return counter; }
>> void __sanitizer_cov_module_init() {}
>> void __sanitizer_reset_coverage(){}
>> void __sanitizer_get_coverage_guards(){}
>> void __sanitizer_get_number_of_counters(){}
>> void __sanitizer_update_counter_bitset_and_clear_counters(){}
>> void __sanitizer_set_death_callback(){}
>> EOF
>>
>> clang -std=c++11 -c Fuzzer/Fuzzer*.cpp -I Fuzzer
>> clang -std=c++11  -fsanitize=leak -fsanitize-coverage=3 -mllvm
>> -sanitizer-coverage-block-threshold=0  Fuzzer/test/SimpleTest.cpp -c
>> clang -c cov-minimal-rt.c
>> clang++ *.o
>> ./a.out
>> ========================================================
>
> with this i could run the fuzzer against libc.a
>
> it's a bit more work to link to libc.a than adding
> a -L so i attached the scripts i used (and an example)
> so others can reproduce it
>
> c++ headers cannot be used in the test (that would
> require cleaning up the libstdc++ header mess)
>
> but i think there is no reason to use c++ for these
> libc api tests anyway

Sure.

>
> you may need to adjust the directories the scripts use
>
> (the linking may need to change when compiler-rt is
> used instead of libgcc)
>
> usage:
>
> cd workdir
> ./buildfuzz.sh
> ./buildmusl.sh
> ./fuzzcompile.sh reg.c
> ./fuzzlink.sh reg.o
> ./a.out
>
> of course to make it useful the malloc magic is needed for
> more likely crashes
>
>> The recently added afl-style counters
>> (https://code.google.com/p/address-sanitizer/wiki/AsanCoverage#Coverage_counters)
>> are a bit more involved, but the basic bool-per-edge is quite enough
>> in most cases.
>>
>
> ok
>
>> The fuzzer itself is written in C++ and uses STL (probably, not the
>> best idea, but it makes the experiments simpler).
>> Can't tell if it will be a problem with musl, but after all the fuzzer
>> itself is also trivial (as well as the entire concept)
>>
>
> c++ happens to work because musl is (almost) abi compatible with
> glibc on x86 so we can just link to the glibc linked libstdc++
>
> (this can eg fail when the c++ thread local storage destructor
> abi is used, that is not implemented in musl yet)
>
> so yes c++ makes things more painful: you need to recompile the
> entire toolchain to make it work reliably (and then both gcc
> and clang have broken assumptions about the libc so you have to
> patch them) which is too much work for running tests
>
>> > Well static linking with musl does not impose any constraint on
>> > redefining functions, so you could easily use a debugging malloc that
>> > lines up each allocation to end on a page boundary with a guard page
>> > after it.
>>
>> Yea... This will slow down fuzzing and guard pages only protect you
>> from overflow in one direction (either left or right, but not both).
>> But this is better than nothing.
>>
>
> you can run the tests twice (for left and right) :)
>
>> > This would of course be slow and use lots of memory but
>> > would catch all heap overflows. And -fstack-protector-all would catch
>> > most stack-based overflows.
>>
>> Only stack-overflow-write by a small amount, but yes, better than nothing.
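
A minimal sketch of the guard-page idea discussed above, assuming a
POSIX system with mmap and mprotect. The name guard_alloc is hypothetical,
nothing is ever freed, and the returned pointer is deliberately not
aligned, so this is only a debugging aid and not a malloc replacement.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical debugging allocator: the returned block ends exactly at
 * a page boundary and the page after it is inaccessible, so any access
 * past the end of the block faults immediately. */
void *guard_alloc(size_t n)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t data = (n + page - 1) / page * page;	/* round up to pages */
	if (data == 0) data = page;

	unsigned char *base = mmap(0, data + page, PROT_READ | PROT_WRITE,
	                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED) return 0;

	/* Turn the last page into the guard page. */
	if (mprotect(base + data, page, PROT_NONE) != 0) {
		munmap(base, data + page);
		return 0;
	}
	return base + data - n;	/* block ends where the guard page begins */
}
```

Catching overflows to the left would need the mirrored layout (guard page
first, block starting right after it), which is why two runs are needed to
cover both directions.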
>>
>> BTW, writing a minimalistic asan run-time as part of musl should be a
>> matter of a couple of hours.
>> Probably much faster than making the current monster work with static linking.
>> I'd be happy to help with such.
>>
>
> how would this look?
>
> compile the tests and libc with asan, but instead of linking the
> asan runtime from clang use a musl specific one?

Yes
>
> i assume for that we still need to change the libc startup code, malloc
> functions and may be some things around thread stacks

Try to compile a simple file with asan:

int main(int argc, char **argv) {
  int a[10];
  a[argc * 10] = 0;
  return 0;
}


% clang -fsanitize=address  a.c -c

% nm a.o | grep U
                 U __asan_init_v5
                 U __asan_option_detect_stack_use_after_return
                 U __asan_report_store4
                 U __asan_stack_malloc_1

__asan_report_store4 should print an error message saying that
"bad write of 4 bytes" happened in <current stack trace> on address <param>.
Also provide the other __asan_report_{store,load}{1,2,4,8,16} functions.

__asan_init_v5 will be called by the module initializer.
When called for the first time, it should mmap the shadow memory.
https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerAlgorithm

__asan_option_detect_stack_use_after_return is a global, define it to 0.
__asan_stack_malloc_1 -- just make it an empty function.

Now you can build code with asan and detect stack buffer overflows.
(The reports won't be very detailed, but they will be correct).
If you add poisoned redzones to malloc -- you get heap buffer overflows.
If you delay the reuse of free-d memory -- you get use-after-free.

If you then implement __asan_register_globals (it is called on module
initialization and poisons redzones for globals)
you get global buffer overflows.

The current asan run-time is large and hairy because it attempts to be
thread-friendly,
intercepts lots of libc, and provides very detailed error messages.
W/o all that, the run-time will easily fit in < 100 LOC, which can be
a part of a libc implementation.

hth,
--kcc
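
To illustrate the redzone check, here is a toy model of the shadow
encoding described on the AddressSanitizerAlgorithm page. It is only a
sketch: it uses a small private arena and shadow array instead of the real
shadow mapping, tracks a single allocation, and all names (toy_shadow,
toy_mark_allocated, toy_access_is_bad) are invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* Toy model: a 256-byte arena with one shadow byte per 8 arena bytes.
 *   0      all 8 bytes of the granule are addressable
 *   1..7   only the first k bytes are addressable
 *   <0     the whole granule is poisoned (redzone, freed memory, ...) */
enum { ARENA_SIZE = 256, GRANULE = 8 };
static signed char toy_shadow[ARENA_SIZE / GRANULE];

/* Poison everything, then unpoison [off, off+n). off must be a multiple
 * of 8 and off+n must not exceed ARENA_SIZE. */
void toy_mark_allocated(size_t off, size_t n)
{
	memset(toy_shadow, -1, sizeof toy_shadow);
	size_t g = off / GRANULE;
	for (; n >= GRANULE; n -= GRANULE) toy_shadow[g++] = 0;
	if (n) toy_shadow[g] = (signed char)n;	/* partial last granule */
}

/* The check emitted before an access of size 1, 2, 4 or 8 at arena
 * offset off (naturally aligned, so it stays inside one granule).
 * Returns 1 if the access touches poisoned memory. */
int toy_access_is_bad(size_t off, size_t size)
{
	signed char s = toy_shadow[off / GRANULE];
	if (s == 0) return 0;
	return (int)((off & 7) + size - 1) >= s;
}
```

After toy_mark_allocated(16, 12), a 4-byte access at offset 24 is accepted
and one at offset 28 is rejected. A real runtime would call the matching
__asan_report_* function at that point.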



* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-21 21:03                             ` Szabolcs Nagy
  2015-03-21 21:38                               ` Szabolcs Nagy
@ 2015-03-23  5:02                               ` Konstantin Serebryany
  2015-03-23 12:25                                 ` Szabolcs Nagy
  1 sibling, 1 reply; 42+ messages in thread
From: Konstantin Serebryany @ 2015-03-23  5:02 UTC (permalink / raw)
  To: Konstantin Serebryany, Rich Felker, musl

On Sat, Mar 21, 2015 at 2:03 PM, Szabolcs Nagy <nsz@port70.net> wrote:
> i wrote some trivial test cases for
>
> __dn_expand
> __dns_parse
> __pleval
> fnmatch
> inet_pton
> strptime

Cool! Is there something you plan to have in the repository or share
some other way?

>
> to try out the concept, i've seen one crash so far:
> a bus error when fuzzing inet_pton
>
> probably a stack corruption that overwrites the location
> where %rbp is stored and then the memory access relative
> to rbp crashes
>
> the fuzzing goes like:
>
> ./a.out -seed=1753234605
> ...
> #8388608        cov: 546        bits: 0 exec/s: 838860
> #16777216       cov: 546        bits: 0 exec/s: 798915

This looks good. "exec/s: 798915" means that even with relatively weak search
algorithm you can find lots of paths.


> #27461772       NEW: 548 B: 0 L: 16 S: 22 I: 0  8283::2:2.8.83.3        16: 56 50 56 51 58 58 50 58 50 46 56 46 56 51 46 51
> #27469404       NEW: 549 B: 0 L: 24 S: 23 I: 2  8283::2:283:2.8.83.2.833        24: 56 50 56 51 58 58 50 58 50 56 51 58 50 46 56 46 56 51 46 50 46 56 51 51
> Bus error (core dumped)
>
> is there a way to get a reproducer after such a crash?
>

the fuzzer relies on asan to call at-crash handler -- this is what
__sanitizer_set_death_callback is for.
w/o asan you can set up a signal handler that will print
fuzzer::Fuzzer::CurrentUnit.
If everything else fails, you can of course rerun the fuzzer with
the same seed.
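
A rough sketch of the signal-handler approach in plain C follows. It does
not touch the fuzzer's own classes: set_current_unit, dump_current_unit
and install_crash_handler are hypothetical helpers, and the test driver is
assumed to call set_current_unit before every run. If the crash has
corrupted the stack badly enough, the handler may not get to run at all.

```c
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical bookkeeping: the driver records the input it is about
 * to run so that a crash handler can still get at it. */
static const uint8_t *volatile current_unit;
static volatile size_t current_unit_size;

void set_current_unit(const uint8_t *p, size_t n)
{
	current_unit = p;
	current_unit_size = n;
}

/* Writes the recorded input to fd as hex bytes. Only write(2) is used,
 * so this can be called from a signal handler. */
void dump_current_unit(int fd)
{
	static const char hex[] = "0123456789abcdef";
	const uint8_t *p = current_unit;
	size_t n = current_unit_size;
	for (size_t i = 0; i < n; i++) {
		char out[3] = { hex[p[i] >> 4], hex[p[i] & 15], ' ' };
		if (write(fd, out, sizeof out) < 0) return;
	}
	if (write(fd, "\n", 1) < 0) return;
}

static void crash_handler(int sig)
{
	dump_current_unit(2);	/* stderr */
	_exit(128 + sig);
}

/* Call once at startup, before the fuzzing loop. */
void install_crash_handler(void)
{
	signal(SIGSEGV, crash_handler);
	signal(SIGBUS, crash_handler);
}
```

The hex output can be turned back into a byte array and passed to the test
function directly to reproduce the crash outside the fuzzer.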

> in this case i fortunately had the core dump
> and i can see the inet_pton argument in %r14
> but it would be nice if there were occasional
> saved check points from where i can restart
> the fuzzer.
>
> i dont yet see the bug and cannot reproduce the
> issue outside the fuzzer (but i didnt try very hard)
>
> attached the fuzz test case and the code that should
> reproduce the issue, gdb session below
>
> Core was generated by `./a.out -seed=1753234605'.
> Program terminated with signal SIGBUS, Bus error.
> #0  0x000000000047a05b in inet_pton (af=<optimized out>, s=<optimized out>, a0=0x20000ffffe000) at src/network/inet_pton.c:65
> 65                      *a++ = ip[j]>>8;
> (gdb) bt
> #0  0x000000000047a05b in inet_pton (af=<optimized out>, s=<optimized out>, a0=0x20000ffffe000) at src/network/inet_pton.c:65
> #1  0x0000000000400769 in TestOneInput ()
> #2  0x000000000040c6f3 in fuzzer::Fuzzer::RunOneMaximizeTotalCoverage(std::vector<unsigned char, std::allocator<unsigned char> > const&) ()
> #3  0x000000000040c412 in fuzzer::Fuzzer::RunOne(std::vector<unsigned char, std::allocator<unsigned char> > const&) ()
> #4  0x000000000040cc7c in fuzzer::Fuzzer::MutateAndTestOne(std::vector<unsigned char, std::allocator<unsigned char> >*) ()
> #5  0x000000000040cffb in fuzzer::Fuzzer::Loop(unsigned long) ()
> #6  0x0000000000400d4c in fuzzer::FuzzerDriver(int, char**, void (*)(unsigned char const*, unsigned long)) ()
> #7  0x00000000004007dc in main ()
> (gdb) disass inet_pton,+40
> Dump of assembler code from 0x479b40 to 0x479b68:
>    0x0000000000479b40 <inet_pton+0>:    push   %rbp
>    0x0000000000479b41 <inet_pton+1>:    push   %r15
>    0x0000000000479b43 <inet_pton+3>:    push   %r14
>    0x0000000000479b45 <inet_pton+5>:    push   %r13
>    0x0000000000479b47 <inet_pton+7>:    push   %r12
>    0x0000000000479b49 <inet_pton+9>:    push   %rbx
>    0x0000000000479b4a <inet_pton+10>:   sub    $0x28,%rsp
>    0x0000000000479b4e <inet_pton+14>:   mov    %rdx,%r13
>    0x0000000000479b51 <inet_pton+17>:   mov    %rsi,%r14
>    0x0000000000479b54 <inet_pton+20>:   mov    %edi,%ebp
>    0x0000000000479b56 <inet_pton+22>:   mov    $0x6de364,%edi
>    0x0000000000479b5b <inet_pton+27>:   callq  0x4007f0 <__sanitizer_cov_with_check>
>    0x0000000000479b60 <inet_pton+32>:   cmp    $0xa,%ebp
>    0x0000000000479b63 <inet_pton+35>:   jne    0x479ba6 <inet_pton+102>
>    0x0000000000479b65 <inet_pton+37>:   mov    $0x6de3c8,%edi
> End of assembler dump.
> (gdb) disass /m 0x000000000047a020,+64
> Dump of assembler code from 0x47a020 to 0x47a060:
> 62                      for (j=0; j<7-i; j++) ip[brk+j] = 0;
>    0x000000000047a02a <inet_pton+1258>: callq  0x4007f0 <__sanitizer_cov_with_check>
>    0x000000000047a02f <inet_pton+1263>: xor    %ebx,%ebx
>    0x000000000047a031 <inet_pton+1265>: mov    0x8(%rsp),%rbp
>    0x000000000047a036 <inet_pton+1270>: mov    0x4(%rsp),%r15d
>    0x000000000047a03b <inet_pton+1275>: jmp    0x47a04d <inet_pton+1293>
>    0x000000000047a03d <inet_pton+1277>: nopl   (%rax)
>
> 63              }
> 64              for (j=0; j<8; j++) {
>    0x000000000047a040 <inet_pton+1280>: inc    %rbx
>    0x000000000047a043 <inet_pton+1283>: mov    $0x6de46c,%edi
>    0x000000000047a048 <inet_pton+1288>: callq  0x4007f0 <__sanitizer_cov_with_check>
>    0x000000000047a04d <inet_pton+1293>: mov    $0x6de468,%edi
>
> 65                      *a++ = ip[j]>>8;
>    0x000000000047a052 <inet_pton+1298>: callq  0x4007f0 <__sanitizer_cov_with_check>
>    0x000000000047a057 <inet_pton+1303>: mov    0x11(%rsp,%rbx,2),%al
> => 0x000000000047a05b <inet_pton+1307>: mov    %al,0x0(%rbp,%rbx,2)
>
> 66                      *a++ = ip[j];
>    0x000000000047a05f <inet_pton+1311>: mov    0x10(%rsp,%rbx,2),%al
>    0x000000000047a063 <inet_pton+1315>: mov    %al,0x1(%rbp,%rbx,2)
>
> End of assembler dump.
> (gdb) i reg
> rax            0x7fffffffdf00   140737488346880
> rbx            0x0      0
> rcx            0x0      0
> rdx            0x0      0
> rsi            0x7fffffffdfb2   140737488347058
> rdi            0x6de468 7201896
> rbp            0x20000ffffe000  0x20000ffffe000
> rsp            0x7fffffffdf80   0x7fffffffdf80
> r8             0x7fffffffdf3a   140737488346938
> r9             0x0      0
> r10            0x0      0
> r11            0x246    582
> r12            0x10     16
> r13            0x7      7
> r14            0x6e2dc3 7220675
> r15            0x1      1
> rip            0x47a05b 0x47a05b <inet_pton+1307>
> eflags         0x10202  [ IF RF ]
> cs             0x33     51
> ss             0x2b     43
> ds             0x0      0
> es             0x0      0
> fs             0x63     99
> gs             0x0      0
> (gdb) p (char*)0x6e2dc3
> $3 = 0x6e2dc3 "2.8288;3:33::2.82.83333"
> (gdb)



* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-23  5:02                               ` Konstantin Serebryany
@ 2015-03-23 12:25                                 ` Szabolcs Nagy
  2015-03-23 15:56                                   ` Konstantin Serebryany
  0 siblings, 1 reply; 42+ messages in thread
From: Szabolcs Nagy @ 2015-03-23 12:25 UTC (permalink / raw)
  To: Konstantin Serebryany; +Cc: Rich Felker, musl

* Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-22 22:02:48 -0700]:
> On Sat, Mar 21, 2015 at 2:03 PM, Szabolcs Nagy <nsz@port70.net> wrote:
> > i wrote some trivial test cases for
> >
> > __dn_expand
> > __dns_parse
> > __pleval
> > fnmatch
> > inet_pton
> > strptime
> 
> Cool! Is there something you plan to have in the repository or share
> some other way?
> 

(musl does not have extra tools/docs/tests in the main repo,
this is what you want eg for toolchain builds and packaging)

but i plan to release the tests somewhere
(currently they are just trivial calls into the relevant libc function)

i don't know what's the best way to fuzz more than one argument
eg fnmatch(pattern, string, flags)

is it ok to just split the input data between the args?
(i havent looked under the hood how the fuzzer mutates the input)

> > #27461772       NEW: 548 B: 0 L: 16 S: 22 I: 0  8283::2:2.8.83.3        16: 56 50 56 51 58 58 50 58 50 46 56 46 56 51 46 51
> > #27469404       NEW: 549 B: 0 L: 24 S: 23 I: 2  8283::2:283:2.8.83.2.833        24: 56 50 56 51 58 58 50 58 50 56 51 58 50 46 56 46 56 51 46 50 46 56 51 51
> > Bus error (core dumped)
> >
> > is there a way to get a reproducer after such a crash?
> >
> 
> the fuzzer relies on asan to call at-crash handler -- this is what
> __sanitizer_set_death_callback is for.
> w/o asan you can set up a signal handler that will print
> fuzzer::Fuzzer::CurrentUnit.
> > If everything else fails, you can of course rerun the fuzzer with
> the same seed.
> 

thanks, sounds good




* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-23  4:55                             ` Konstantin Serebryany
@ 2015-03-23 12:35                               ` Szabolcs Nagy
  2015-03-23 14:40                                 ` stephen Turner
  2015-03-28 22:00                                 ` Szabolcs Nagy
  0 siblings, 2 replies; 42+ messages in thread
From: Szabolcs Nagy @ 2015-03-23 12:35 UTC (permalink / raw)
  To: Konstantin Serebryany; +Cc: Rich Felker, musl

* Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-22 21:55:26 -0700]:
> On Sat, Mar 21, 2015 at 6:28 AM, Szabolcs Nagy <nsz@port70.net> wrote:
> > * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 23:05:13 -0700]:
> >> BTW, writing a minimalistic asan run-time as part of musl should be a
> >> matter of a couple of hours.
> >> Probably much faster than making the current monster work with static linking.
> >> I'd be happy to help with such.
> >>
> >
> > how would this look?
> >
> > compile the tests and libc with asan, but instead of linking the
> > asan runtime from clang use a musl specific one?
> 
> Yes
> >
> > i assume for that we still need to change the libc startup code, malloc
> > functions and may be some things around thread stacks
> 
> Try to compile a simple file with asan:
> 
> int main(int argc, char **argv) {
>   int a[10];
>   a[argc * 10] = 0;
>   return 0;
> }
> 
> 
> % clang -fsanitize=address  a.c -c
> 
> % nm a.o | grep U
>                  U __asan_init_v5
>                  U __asan_option_detect_stack_use_after_return
>                  U __asan_report_store4
>                  U __asan_stack_malloc_1
> 
> __asan_report_store4 should print an error message saying that
> "bad write of 4 bytes" happened in <current stack trace> on address <param>.
> Also make  other __asan_report_{store,load}{1,2,4,8,16}
> 
> __asan_init_v5 will be called by the module initializer.
> When called for the first time, it should mmap the shadow memory.
> https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerAlgorithm
> 
> __asan_option_detect_stack_use_after_return is a global, define it to 0.
> __asan_stack_malloc_1 -- just make it an empty function.
> 
> > Now you can build code with asan and detect stack buffer overflows.
> (The reports won't be very detailed, but they will be correct).
> If you add poisoned redzones to malloc -- you get heap buffer overflows.
> If you delay the reuse of free-d memory -- you get use-after-free.
> 
> If you then implement __asan_register_globals (it is called on module
> initialization and poisons redzones for globals)
> you get global buffer overflows.
> 
> > The current asan run-time is large and hairy because it attempts to be
> > thread-friendly,
> > intercepts lots of libc, and provides very detailed error messages.
> W/o all that, the run-time will easily fit in < 100 LOC, which can be
> a part of a libc implementation.
> 

nice

i'm not sure if we want to push this into musl, but it looks useful

i'll try to implement it

> hth,
> --kcc



* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-23 12:35                               ` Szabolcs Nagy
@ 2015-03-23 14:40                                 ` stephen Turner
  2015-03-23 14:53                                   ` Szabolcs Nagy
  2015-03-28 22:00                                 ` Szabolcs Nagy
  1 sibling, 1 reply; 42+ messages in thread
From: stephen Turner @ 2015-03-23 14:40 UTC (permalink / raw)
  To: musl, Konstantin Serebryany, Rich Felker

So musl doesn't have any tests currently to ensure it was built correctly
by testing its responses to calls? I have seen a few packages such as
binutils come with its own built in test which I would gladly make use of
if it was available.

thanks
stephen


* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-23 14:40                                 ` stephen Turner
@ 2015-03-23 14:53                                   ` Szabolcs Nagy
  2015-03-23 15:46                                     ` stephen Turner
  0 siblings, 1 reply; 42+ messages in thread
From: Szabolcs Nagy @ 2015-03-23 14:53 UTC (permalink / raw)
  To: stephen Turner; +Cc: musl, Konstantin Serebryany, Rich Felker

* stephen Turner <stephen.n.turner@gmail.com> [2015-03-23 10:40:01 -0400]:
> So musl doesn't have any tests currently to ensure it was built correctly

it has tests, just not in the main repo

> by testing its responses to calls? I have seen a few packages such as
> binutils come with its own built in test which I would gladly make use of
> if it was available.

you can use the tests, they are available at
http://nsz.repo.hu/git/?p=libc-test

(which was supposed to be a temporary location until
a cleanup is done..)




* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-23 14:53                                   ` Szabolcs Nagy
@ 2015-03-23 15:46                                     ` stephen Turner
  2015-03-23 16:28                                       ` Rich Felker
  0 siblings, 1 reply; 42+ messages in thread
From: stephen Turner @ 2015-03-23 15:46 UTC (permalink / raw)
  To: stephen Turner, musl, Konstantin Serebryany, Rich Felker

On Mon, Mar 23, 2015 at 10:53 AM, Szabolcs Nagy <nsz@port70.net> wrote:

> * stephen Turner <stephen.n.turner@gmail.com> [2015-03-23 10:40:01 -0400]:
> > So musl doesn't have any tests currently to ensure it was built correctly
>
> it has tests, just not in the main repo
>
> > by testing its responses to calls? I have seen a few packages such as
> > binutils come with its own built in test which I would gladly make use of
> > if it was available.
>
> you can use the tests, they are available at
> http://nsz.repo.hu/git/?p=libc-test
>
> (which was supposed to be a temporary location until
> a cleanup is done..)
>
nice, i will give those a spin. Is there any consideration for making them
a feature/available in the release source files?

thanks,
stephen


* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-23 12:25                                 ` Szabolcs Nagy
@ 2015-03-23 15:56                                   ` Konstantin Serebryany
  0 siblings, 0 replies; 42+ messages in thread
From: Konstantin Serebryany @ 2015-03-23 15:56 UTC (permalink / raw)
  To: Konstantin Serebryany, Rich Felker, musl

On Mon, Mar 23, 2015 at 5:25 AM, Szabolcs Nagy <nsz@port70.net> wrote:
> * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-22 22:02:48 -0700]:
>> On Sat, Mar 21, 2015 at 2:03 PM, Szabolcs Nagy <nsz@port70.net> wrote:
>> > i wrote some trivial test cases for
>> >
>> > __dn_expand
>> > __dns_parse
>> > __pleval
>> > fnmatch
>> > inet_pton
>> > strptime
>>
>> Cool! Is there something you plan to have in the repository or share
>> some other way?
>>
>
> (musl does not have extra tools/docs/tests in the main repo,
> this is what you want eg for toolchain builds and packaging)
>
> but i plan to release the tests somewhere
> (currently they are just trivial calls into the relevant libc function)
>
> i don't know what's the best way to fuzz more than one argument
> eg fnmatch(pattern, string, flags)

Yes, splitting the input bytes between the args is the most
straightforward way.
Although sharing the input bytes (e.g. fnmatch(X, X, X[0])) was
surprisingly interesting too.

>
> is it ok to just split the input data between the args?
> (i havent looked under the hood how the fuzzer mutates the input)
>
>> > #27461772       NEW: 548 B: 0 L: 16 S: 22 I: 0  8283::2:2.8.83.3        16: 56 50 56 51 58 58 50 58 50 46 56 46 56 51 46 51
>> > #27469404       NEW: 549 B: 0 L: 24 S: 23 I: 2  8283::2:283:2.8.83.2.833        24: 56 50 56 51 58 58 50 58 50 56 51 58 50 46 56 46 56 51 46 50 46 56 51 51
>> > Bus error (core dumped)
>> >
>> > is there a way to get a reproducer after such a crash?
>> >
>>
>> the fuzzer relies on asan to call at-crash handler -- this is what
>> __sanitizer_set_death_callback is for.
>> w/o asan you can set up a signal handler that will print
>> fuzzer::Fuzzer::CurrentUnit.
>> If everything else fails, you can of course rerun the fuzzer with
>> the same seed.
>>
>
> thanks, sounds good
>



* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-23 15:46                                     ` stephen Turner
@ 2015-03-23 16:28                                       ` Rich Felker
  2015-03-23 17:21                                         ` Nathan McSween
  0 siblings, 1 reply; 42+ messages in thread
From: Rich Felker @ 2015-03-23 16:28 UTC (permalink / raw)
  To: stephen Turner; +Cc: musl, Konstantin Serebryany

On Mon, Mar 23, 2015 at 11:46:04AM -0400, stephen Turner wrote:
> On Mon, Mar 23, 2015 at 10:53 AM, Szabolcs Nagy <nsz@port70.net> wrote:
> 
> > * stephen Turner <stephen.n.turner@gmail.com> [2015-03-23 10:40:01 -0400]:
> > > So musl doesn't have any tests currently to ensure it was built correctly
> >
> > it has tests, just not in the main repo
> >
> > > by testing its responses to calls? I have seen a few packages such as
> > > binutils come with its own built in test which I would gladly make use of
> > > if it was available.
> >
> > you can use the tests, they are available at
> > http://nsz.repo.hu/git/?p=libc-test
> >
> > (which was supposed to be a temporary location until
> > a cleanup is done..)
> >
> nice, i will give those a spin. Is there any consideration for making them
> a feature/available in the release source files?

From a release and build system standpoint, it really makes sense to
do tests separately, not integrated.

The biggest reason is to avoid making cross-compiling a special case,
and instead to treat "libs/binaries generated for the target" as
something non-executable on the host. Other packages generally do a
poor job of this and then either cross-compiling breaks or you need
lots of cross-specific logic in the build system. With separate tests,
musl's build has no reason to care if it's being cross-compiled, and
testing a cross-compiled libc (if you feel a need to) is instead a
matter of how you script the build of everything for the cross
toolchain and environment.

Other than that, nsz has aimed to make all the tests libc-agnostic, so
they can also be used to test other libcs for conformance and bugs.
This works well with glibc already but uclibc has so much missing that
lots of the tests are gratuitously failing.

Rich



* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-23 16:28                                       ` Rich Felker
@ 2015-03-23 17:21                                         ` Nathan McSween
  0 siblings, 0 replies; 42+ messages in thread
From: Nathan McSween @ 2015-03-23 17:21 UTC (permalink / raw)
  To: musl

> From a release and build system standpoint, it really makes sense to
> do tests separately, not integrated.

I agree but only if there is a good automated continuous integration system
implemented to find bugs. I would run analyzers, etc as a git hook.


* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-23 12:35                               ` Szabolcs Nagy
  2015-03-23 14:40                                 ` stephen Turner
@ 2015-03-28 22:00                                 ` Szabolcs Nagy
  2015-03-28 22:32                                   ` Konstantin Serebryany
  1 sibling, 1 reply; 42+ messages in thread
From: Szabolcs Nagy @ 2015-03-28 22:00 UTC (permalink / raw)
  To: Konstantin Serebryany, Rich Felker, musl

* Szabolcs Nagy <nsz@port70.net> [2015-03-23 13:35:40 +0100]:
> * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-22 21:55:26 -0700]:
> > On Sat, Mar 21, 2015 at 6:28 AM, Szabolcs Nagy <nsz@port70.net> wrote:
> > > i assume for that we still need to change the libc startup code, malloc
> > > functions and may be some things around thread stacks
> > 
> > Try to compile a simple file with asan:
> > 
> > int main(int argc, char **argv) {
> >   int a[10];
> >   a[argc * 10] = 0;
> >   return 0;
> > }
> > 
> > 
> > % clang -fsanitize=address  a.c -c
> > 
> > % nm a.o | grep U
> >                  U __asan_init_v5
> >                  U __asan_option_detect_stack_use_after_return
> >                  U __asan_report_store4
> >                  U __asan_stack_malloc_1
> > 
> > __asan_report_store4 should print an error message saying that
> > "bad write of 4 bytes" happened in <current stack trace> on address <param>.
> > Also make  other __asan_report_{store,load}{1,2,4,8,16}
> > 
> > __asan_init_v5 will be called by the module initializer.
> > When called for the first time, it should mmap the shadow memory.
> > https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerAlgorithm
> > 

it seems asan instrumented code with memory access cannot run
before __asan_init_v5 does the shadow mapping (otherwise the
compiler generated shadow access would crash)
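
As a rough illustration of the shadow access the compiler generates for each
load or store, here is a minimal sketch of the check described on the
AddressSanitizerAlgorithm wiki page linked above. The small `shadow` array and
the name `access_is_bad` are hypothetical stand-ins; real instrumented code
reads the shadow byte directly from the mapped shadow region, which is why it
crashes if that region is not mapped yet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for real shadow memory: one shadow byte describes
 * 8 bytes of a small notional arena, addressed here by offset. */
#define ARENA_SIZE 256
static signed char shadow[ARENA_SIZE / 8];

/* Shadow encoding: 0 means all 8 bytes are addressable, k in 1..7 means only
 * the first k bytes are, and a negative value means the granule is poisoned.
 * Returns nonzero if a naturally aligned access of `size` bytes at offset
 * `off` touches a byte that is not addressable. */
static int access_is_bad(size_t off, size_t size)
{
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    assert(off % size == 0 && off + size <= ARENA_SIZE);

    signed char k = shadow[off >> 3];   /* the shadow load */
    if (k == 0)
        return 0;                       /* whole granule addressable */
    /* Bad if the last byte touched lies at or beyond the k-th byte. */
    return (int)((off & 7) + size - 1) >= k;
}
```

With `shadow[1] = 4`, a 4-byte store at offset 8 passes and one at offset 12
is flagged; in a real runtime the flagged case would call
__asan_report_store4.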

this is problematic for dynamic linking because the loader
calls various libc functions so those cannot be instrumented
unless shadow memory is already in place

i managed to make a minimal asan runtime work with static linking
(and then stack corruption is indeed detected).
(i called __asan_init_v5 in the beginning of musl's __libc_start_main)

> > __asan_option_detect_stack_use_after_return is a global, define it to 0.
> > __asan_stack_malloc_1 -- just make it an empty function.
> > 
> > Now, you can build a code with asan and detect stack buffer overflows.
> > (The reports won't be very detailed, but they will be correct).
> > If you add poisoned redzones to malloc -- you get heap buffer overflows.
> > If you delay the reuse of free-d memory -- you get use-after-free.
> > 
> > If you then implement __asan_register_globals (it is called on module
> > initialization and poisons redzones for globals)
> > you get global buffer overflows.
> > 

i haven't tried to do the heap/global poisoning

it's not clear to me what's the best way to manage the shadow
memory: mmap with PROT_NONE the entire 0x7fff8000 .. 0x10007fff8000
range and then mmap with rw the subranges that shadow mmaped memory
in the application?
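
For reference, the range above follows from the usual x86_64 Linux asan
mapping, shadow = (addr >> 3) + 0x7fff8000, applied to the 47-bit user address
space. The sketch below only does the address arithmetic; the helper names and
the 4096-byte page size are assumptions for illustration, not part of any
asan or musl interface.

```c
#include <assert.h>
#include <stdint.h>

#define SHADOW_SCALE  3               /* 8 application bytes per shadow byte */
#define SHADOW_OFFSET 0x7fff8000ULL   /* x86_64 Linux asan shadow offset */
#define PAGE_SIZE_ASSUMED 4096ULL     /* assumption: 4 KiB pages */

/* Address of the shadow byte that describes application address `addr`. */
static uint64_t mem_to_shadow(uint64_t addr)
{
    return (addr >> SHADOW_SCALE) + SHADOW_OFFSET;
}

/* Page-aligned shadow subrange [*sbeg, *send) covering the application
 * mapping [start, start + len). A shadow-managing mmap wrapper would make
 * this subrange readable and writable. */
static void shadow_range(uint64_t start, uint64_t len,
                         uint64_t *sbeg, uint64_t *send)
{
    assert(len > 0);
    *sbeg = mem_to_shadow(start) & ~(PAGE_SIZE_ASSUMED - 1);
    *send = (mem_to_shadow(start + len - 1) + PAGE_SIZE_ASSUMED)
            & ~(PAGE_SIZE_ASSUMED - 1);
}
```

Under these assumptions a 32 KiB application mapping needs a single 4 KiB
shadow page, and address 0x7fffffffffff maps to 0x10007fff7fff, the last byte
of the range quoted above.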

then a modified mmap is needed to manage the shadow maps

so i think for an asan+cov instrumented libc:

- [S]crt1.s should do the initial shadow mmap before any c code gets run
- mmap should be replaced to do shadow management
- malloc etc should be replaced to handle shadow poisoning
- the minimal asan and cov runtimes should be added to libc
(so their symbols are available early in the loader)

and then we can use such a libc for testing and fuzzing
to catch heap/stack corruptions

i guess it is possible to have a /lib/ld-muslasan-x86_64.so.1
and Scrt1asan.o on a system and the compiler/linker could
use those when compiling some code with asan+cov instrumentation
(but this can get ugly if there will be more instrumentations
that need runtime support in the future)


* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-28 22:00                                 ` Szabolcs Nagy
@ 2015-03-28 22:32                                   ` Konstantin Serebryany
  2015-03-28 22:38                                     ` Rich Felker
  0 siblings, 1 reply; 42+ messages in thread
From: Konstantin Serebryany @ 2015-03-28 22:32 UTC (permalink / raw)
  To: Konstantin Serebryany, Rich Felker, musl

On Sat, Mar 28, 2015 at 3:00 PM, Szabolcs Nagy <nsz@port70.net> wrote:
> * Szabolcs Nagy <nsz@port70.net> [2015-03-23 13:35:40 +0100]:
>> * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-22 21:55:26 -0700]:
>> > On Sat, Mar 21, 2015 at 6:28 AM, Szabolcs Nagy <nsz@port70.net> wrote:
>> > > i assume for that we still need to change the libc startup code, malloc
>> > > functions and may be some things around thread stacks
>> >
>> > Try to compile a simple file with asan:
>> >
>> > int main(int argc, char **argv) {
>> >   int a[10];
>> >   a[argc * 10] = 0;
>> >   return 0;
>> > }
>> >
>> >
>> > % clang -fsanitize=address  a.c -c
>> >
>> > % nm a.o | grep U
>> >                  U __asan_init_v5
>> >                  U __asan_option_detect_stack_use_after_return
>> >                  U __asan_report_store4
>> >                  U __asan_stack_malloc_1
>> >
>> > __asan_report_store4 should print an error message saying that
>> > "bad write of 4 bytes" happened in <current stack trace> on address <param>.
>> > Also make  other __asan_report_{store,load}{1,2,4,8,16}
>> >
>> > __asan_init_v5 will be called by the module initializer.
>> > When called for the first time, it should mmap the shadow memory.
>> > https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerAlgorithm
>> >
>
> it seems asan instrumented code with memory access cannot run
> before __asan_init_v5 does the shadow mapping (otherwise the
> compiler generated shadow access would crash)
>
Correct.

> this is problematic for dynamic linking because the loader
> calls various libc functions so those cannot be instrumented
> unless shadow memory is already in place

Yes, I have the same trouble with glibc and have to disable
instrumentation for some of the glibc functions
(by not adding -fsanitize=address), which is not optimal (may lose
bugs on other calls to these functions).

>
> i managed to make a minimal asan runtime work with static linking
> (and then stack corruption is indeed detected).
> (i called __asan_init_v5 in the beginning of musl's __libc_start_main)

Nice!


>
>> > __asan_option_detect_stack_use_after_return is a global, define it to 0.
>> > __asan_stack_malloc_1 -- just make it an empty function.
>> >
>> > Now, you can build a code with asan and detect stack buffer overflows.
>> > (The reports won't be very detailed, but they will be correct).
>> > If you add poisoned redzones to malloc -- you get heap buffer overflows.
>> > If you delay the reuse of free-d memory -- you get use-after-free.
>> >
>> > If you then implement __asan_register_globals (it is called on module
>> > initialization and poisons redzones for globals)
>> > you get global buffer overflows.
>> >
>
> i havent tried to do the heap/global poisoning
>
> it's not clear to me what's the best way to manage the shadow
> memory: mmap with PROT_NONE the entire 0x7fff8000 .. 0x10007fff8000
> range and then mmap with rw the subranges that shadow mmaped memory
> in the application?

You probably can do it because you control all mmap calls from libc
(from malloc and thread stack creation),
but the first time the user calls mmap syscall bypassing libc it will break.
We use MAP_NORESERVE to map the entire range at startup.
This has a drawback that the application uses 16 TB of virtual address
space and tools like "ulimit -v" do not work.
But otherwise this works great.

>
> then a modified mmap is needed to manage the shadow maps
>
> so i think for a asan+cov instrumented libc:
>
> - [S]crt1.s should do the initial shadow mmap before any c code gets run
> - mmap should be replaced to do shadow management

Only if you do not use the MAP_NORESERVE trick.

> - malloc etc should be replaced to handle shadow poisoning
> - the minimal asan and cov runtimes should be added to libc
> (so their symbols are available early in the loader)
>
> and then we can use such a libc for testing and fuzzing
> to catch heap/stack corruptions
>
> i guess it is possible to have a /lib/ld-muslasan-x86_64.so.1
> and Scrt1asan.o on a system and the compiler/linker could
> use those when compiling some code with asan+cov instrumentation

sounds great.

> (but this can get ugly if there will be more instrumentations
> that need runtime support in the future)
Yea. The core of asan run-time is relatively easy to replicate, as
you've seen.
Probably, one can replicate msan and ubsan (MemorySanitizer,
UndefinedBehaviorSanitizer)
with comparable effort since most of the logic for those tools is in
the compiler.
The use-after-return detection in asan relies on a very non-trivial
part of run-time.
tsan (ThreadSanitizer) has much more complex run-time which is hard to
replicate.

Maybe someday we'll make them work with static linking, but not any
time soon. :(

--kcc


* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-28 22:32                                   ` Konstantin Serebryany
@ 2015-03-28 22:38                                     ` Rich Felker
  2015-03-28 23:15                                       ` Szabolcs Nagy
  0 siblings, 1 reply; 42+ messages in thread
From: Rich Felker @ 2015-03-28 22:38 UTC (permalink / raw)
  To: Konstantin Serebryany; +Cc: musl

On Sat, Mar 28, 2015 at 03:32:41PM -0700, Konstantin Serebryany wrote:
> > it seems asan instrumented code with memory access cannot run
> > before __asan_init_v5 does the shadow mapping (otherwise the
> > compiler generated shadow access would crash)
> >
> Correct.
> 
> > this is problematic for dynamic linking because the loader
> > calls various libc functions so those cannot be instrumented
> > unless shadow memory is already in place
> 
> Yes, I have the same trouble with glibc and have to disable
> instrumentation for some of the glibc functions
> (by not adding -fsanitize=address), which is not optimal (may lose
> bugs on other calls to these functions).

We have a similar problem with stack protector now. I want to be able
to enable stack protector for libc with 1.1.9 so maybe we can solve at
least some of the issues asan faces at the same time.

> >> > __asan_option_detect_stack_use_after_return is a global, define it to 0.
> >> > __asan_stack_malloc_1 -- just make it an empty function.
> >> >
> >> > Now, you can build a code with asan and detect stack buffer overflows.
> >> > (The reports won't be very detailed, but they will be correct).
> >> > If you add poisoned redzones to malloc -- you get heap buffer overflows.
> >> > If you delay the reuse of free-d memory -- you get use-after-free.
> >> >
> >> > If you then implement __asan_register_globals (it is called on module
> >> > initialization and poisons redzones for globals)
> >> > you get global buffer overflows.
> >> >
> >
> > i havent tried to do the heap/global poisoning
> >
> > it's not clear to me what's the best way to manage the shadow
> > memory: mmap with PROT_NONE the entire 0x7fff8000 .. 0x10007fff8000
> > range and then mmap with rw the subranges that shadow mmaped memory
> > in the application?
> 
> You probably can do it because you control all mmap calls from libc
> (from malloc and thread stack creation),
> but the first time the user calls mmap syscall bypassing libc it will break.
> We use MAP_NORESERVE to map the entire range at startup.
> This has a drawback that the application uses 16 TB of virtual address
> space and tools like "ulimit -v" do not work.
> But otherwise this works great.

MAP_NORESERVE is a NOP on systems with overcommit disabled. The right
way to achieve a similar result is to use PROT_NONE to reserve the
virtual address range without reserving commit, and only mprotect to
PROT_READ|PROT_WRITE later as needed.
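
A minimal sketch of that reserve-then-commit pattern, assuming Linux mmap and
mprotect semantics. The helper names are hypothetical, and a real runtime
would pass a fixed address for the shadow range rather than letting the kernel
choose one.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve `len` bytes of address space without making them accessible.
 * Returns NULL on failure. */
static void *reserve_range(size_t len)
{
    void *p = mmap(NULL, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Make a page-aligned subrange of a reserved region readable and writable.
 * Returns 0 on success and -1 on failure, like mprotect. */
static int commit_subrange(void *base, size_t off, size_t len)
{
    return mprotect((char *)base + off, len, PROT_READ | PROT_WRITE);
}
```

Both `off` and `len` passed to commit_subrange should be multiples of the page
size reported by sysconf(_SC_PAGESIZE); touching any part of the reservation
that has not been committed faults.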

> > - malloc etc should be replaced to handle shadow poisoning
> > - the minimal asan and cov runtimes should be added to libc
> > (so their symbols are available early in the loader)
> >
> > and then we can use such a libc for testing and fuzzing
> > to catch heap/stack corruptions
> >
> > i guess it is possible to have a /lib/ld-muslasan-x86_64.so.1
> > and Scrt1asan.o on a system and the compiler/linker could
> > use those when compiling some code with asan+cov instrumentation
> 
> sounds great.

I'm not clear why there would be a different dynamic linker pathname
for it. It's not a different ABI from the application's standpoint, is
it? It seems like you might _want_ to install the dynamic linker with
a different name or location just to avoid clobbering the non-asan
build, but I don't think it needs a dedicated name/location like it
would if it were an ABI/ISA.

Rich


* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-28 22:38                                     ` Rich Felker
@ 2015-03-28 23:15                                       ` Szabolcs Nagy
  0 siblings, 0 replies; 42+ messages in thread
From: Szabolcs Nagy @ 2015-03-28 23:15 UTC (permalink / raw)
  To: Rich Felker; +Cc: Konstantin Serebryany, musl

* Rich Felker <dalias@libc.org> [2015-03-28 18:38:33 -0400]:
> On Sat, Mar 28, 2015 at 03:32:41PM -0700, Konstantin Serebryany wrote:
> > >
> > > i guess it is possible to have a /lib/ld-muslasan-x86_64.so.1
> > > and Scrt1asan.o on a system and the compiler/linker could
> > > use those when compiling some code with asan+cov instrumentation
> > 
> > sounds great.
> 
> I'm not clear why there would be a different dynamic linker pathname
> for it. It's not a different ABI from the application's standpoint, is
> it? It seems like you might _want_ to install the dynamic linker with
> a different name or location just to avoid clobbering the non-asan
> build, but I don't think it needs a dedicated name/location like it
> would if it were an ABI/ISA.
> 

if you only instrument libc and not the application then
there is no difference between the two libcs from app pov

but if you want to instrument the application too then
it must use the libc which does the shadow management
and has the asan rt

the name does not have to be dedicated if asan instrumented
binaries are only used locally/temporarily for testing

(an instrumented library can only be used with the
asan libc, but a non-instrumented lib should work with
both libcs)


Thread overview: 42+ messages
2015-03-20 20:17 buffer overflow in regcomp and a way to find more of those Konstantin Serebryany
2015-03-20 20:40 ` Rich Felker
2015-03-20 21:28 ` Szabolcs Nagy
2015-03-20 23:48   ` Szabolcs Nagy
2015-03-20 22:32 ` Rich Felker
2015-03-20 23:52 ` Szabolcs Nagy
2015-03-21  0:06   ` Konstantin Serebryany
2015-03-21  0:26     ` Szabolcs Nagy
2015-03-21  0:46       ` Rich Felker
2015-03-21  0:54         ` Konstantin Serebryany
2015-03-21  1:00           ` Rich Felker
2015-03-21  1:05             ` Konstantin Serebryany
2015-03-21  1:10               ` Konstantin Serebryany
2015-03-21  1:23                 ` Szabolcs Nagy
2015-03-21  1:30                   ` Rich Felker
2015-03-21  2:10                     ` Szabolcs Nagy
2015-03-21  2:17                       ` Rich Felker
2015-03-21  1:32               ` Rich Felker
2015-03-21  1:37                 ` Konstantin Serebryany
2015-03-21  1:56                   ` Rich Felker
2015-03-21  2:14                     ` Konstantin Serebryany
2015-03-21  2:20                       ` Rich Felker
2015-03-21  6:05                         ` Konstantin Serebryany
2015-03-21 13:28                           ` Szabolcs Nagy
2015-03-21 21:03                             ` Szabolcs Nagy
2015-03-21 21:38                               ` Szabolcs Nagy
2015-03-21 22:13                                 ` Szabolcs Nagy
2015-03-22  6:36                                   ` Justin Cormack
2015-03-23  5:02                               ` Konstantin Serebryany
2015-03-23 12:25                                 ` Szabolcs Nagy
2015-03-23 15:56                                   ` Konstantin Serebryany
2015-03-23  4:55                             ` Konstantin Serebryany
2015-03-23 12:35                               ` Szabolcs Nagy
2015-03-23 14:40                                 ` stephen Turner
2015-03-23 14:53                                   ` Szabolcs Nagy
2015-03-23 15:46                                     ` stephen Turner
2015-03-23 16:28                                       ` Rich Felker
2015-03-23 17:21                                         ` Nathan McSween
2015-03-28 22:00                                 ` Szabolcs Nagy
2015-03-28 22:32                                   ` Konstantin Serebryany
2015-03-28 22:38                                     ` Rich Felker
2015-03-28 23:15                                       ` Szabolcs Nagy
