* buffer overflow in regcomp and a way to find more of those
From: Konstantin Serebryany @ 2015-03-20 20:17 UTC
To: musl, Szabolcs Nagy

Hi,

Following the discussion at the glibc mailing list
(https://sourceware.org/ml/libc-alpha/2015-03/msg00662.html)
I've tried to fuzz musl regcomp and the first bug popped up quickly.
Please let me know if you would be interested in adding the fuzzer
(http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup)
to the musl testing process.

Exact repro steps, just copy-paste (assuming you have fresh clang)
=================== ===============
tar xf ~/Downloads/musl-1.1.7.tar.gz
cd musl-1.1.7
./configure && make -j
cat << EOF > bug1.c
#include <string.h>
#include <stdlib.h>
#include "regex.h"

int main() {
  regex_t preg;
  char a[] = {40, 123, 33, 124, 33, 19, 40, 96, 92, 253, 92, 123, 51, 48, 92, 125, 0};
  char *s = strdup(a);
  if (0 == regcomp(&preg, s, 0)) {
    regfree(&preg);
  }
  free(s);
  return 0;
}
EOF
clang -g -fsanitize=address ./src/regex/reg*.c src/regex/tre*.c \
  src/locale/__lctrans.c src/internal/libc.c -I include -I src/internal/ \
  -Iarch/x86_64 bug1.c
ASAN_OPTIONS=strip_path_prefix=`pwd`/ ./a.out

==33356==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200000ef44 at pc 0x0000004d8cb9 bp 0x7fff09d51b10 sp 0x7fff09d51b08
WRITE of size 4 at 0x60200000ef44 thread T0
    #0 0x4d8cb8 in tre_copy_ast src/regex/regcomp.c:1697:27
    #1 0x4cc332 in tre_expand_ast src/regex/regcomp.c:1884:16
    #2 0x4c4de2 in regcomp src/regex/regcomp.c:2739:13
    #3 0x4e9e06 in main bug1.c:9:12
    #4 0x7f49f1086ec4 in __libc_start_main /build/buildd/eglibc-2.19/csu/libc-start.c:287
    #5 0x416d45 in _start (a.out+0x416d45)

0x60200000ef44 is located 8 bytes to the right of 12-byte region [0x60200000ef30,0x60200000ef3c)
allocated by thread T0 here:
    #0 0x4a20a4 in calloc /usr/local/google/home/kcc/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:56:3
    #1 0x4c4bd9 in regcomp src/regex/regcomp.c:2721:28
    #2 0x4e9e06 in main bug1.c:9:12

^ permalink raw reply [flat|nested] 42+ messages in thread
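A minimal sketch of the kind of fuzz target described above, not the harness actually used (the thread does not show it). It assumes the `LLVMFuzzerTestOneInput` entry point documented for current libFuzzer and mirrors bug1.c: the fuzzer's bytes become a NUL-terminated pattern passed to `regcomp` with `cflags` 0.

```c
#include <regex.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of a libFuzzer-style target for regcomp (entry point name as
 * documented for current libFuzzer). The fuzzer supplies arbitrary bytes;
 * regcomp needs a NUL-terminated string, so copy them into a fresh buffer. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    char *pattern = malloc(size + 1);
    if (pattern == NULL)
        return 0;
    if (size > 0)
        memcpy(pattern, data, size);
    pattern[size] = '\0';  /* an embedded NUL simply ends the pattern early */

    regex_t preg;
    if (regcomp(&preg, pattern, 0) == 0)  /* cflags 0: basic RE, as in bug1.c */
        regfree(&preg);

    free(pattern);
    return 0;  /* memory errors are reported by the sanitizer, not the return value */
}
```

With a recent clang such a target would typically be built together with the regex sources using `-fsanitize=fuzzer,address`; the thread itself instruments with `-fsanitize-coverage=3`.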
* Re: buffer overflow in regcomp and a way to find more of those
From: Rich Felker @ 2015-03-20 20:40 UTC
To: Konstantin Serebryany; +Cc: musl, Szabolcs Nagy

On Fri, Mar 20, 2015 at 01:17:47PM -0700, Konstantin Serebryany wrote:
> Hi,
>
> Following the discussion at the glibc mailing list
> (https://sourceware.org/ml/libc-alpha/2015-03/msg00662.html)
> I've tried to fuzz musl regcomp and the first bug popped up quickly.
> Please let me know if you would be interested in adding the fuzzer
> (http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup)
> to the musl testing process.
>
> Exact repro steps, just copy-paste (assuming you have fresh clang)
> =================== ===============
> tar xf ~/Downloads/musl-1.1.7.tar.gz
> cd musl-1.1.7
> ./configure && make -j
> cat << EOF > bug1.c
> #include <string.h>
> #include <stdlib.h>
> #include "regex.h"
>
> int main() {
>   regex_t preg;
>   char a[] = {40, 123, 33, 124, 33, 19, 40, 96, 92, 253, 92, 123, 51,
> 48, 92, 125, 0};

Simplified test case:

char a[] = "\\\375\\{2\\}";

The problem seems to be handling of [backslash], [illegal sequence],
[repetition]. I haven't analyzed the cause, but that was my initial
guess and the minimal example I was able to reduce it to without the
crash disappearing.

Rich
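Decoding the fuzzer's byte array makes the reduction easier to follow: its last eight bytes are a backslash, byte 253 (octal 375), and the interval `\{30\}`. That is the same shape as Rich's literal, with 30 in place of 2. The names below are hypothetical; the sketch only checks that correspondence with `memcmp`.

```c
#include <string.h>

/* The fuzzer input from bug1.c as unsigned bytes (terminating 0 dropped). */
static const unsigned char fuzzer_input[16] = {
    40, 123, 33, 124, 33, 19, 40, 96,   /* '(' '{' '!' '|' '!' 0x13 '(' '`' */
    92, 253, 92, 123, 51, 48, 92, 125   /* '\\' 0xFD '\\' '{' '3' '0' '\\' '}' */
};

/* Rich's reduction: backslash, byte 253 (octal 375), then the interval \{2\}. */
static const char reduced[] = "\\\375\\{2\\}";

/* Nonzero if the fuzzer input ends with the same backslash + byte 253 +
 * interval sequence, just with a count of 30 instead of 2. */
static int input_has_reduced_shape(void)
{
    static const char tail[] = "\\\375\\{30\\}";
    return memcmp(fuzzer_input + 8, tail, sizeof tail - 1) == 0;
}
```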
* Re: buffer overflow in regcomp and a way to find more of those
From: Szabolcs Nagy @ 2015-03-20 21:28 UTC
To: musl; +Cc: Szabolcs Nagy

* Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 13:17:47 -0700]:
> Following the discussion at the glibc mailing list
> (https://sourceware.org/ml/libc-alpha/2015-03/msg00662.html)
> I've tried to fuzz musl regcomp and the first bug popped up quickly.
> Please let me know if you would be interested in adding the fuzzer
> (http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup)
> to the musl testing process.

thanks for doing this

i think i already know what the bug is
(of course it's in an underspecified extension where
applications expect conflicting behaviour... but the
segfault is my bug)

> Exact repro steps, just copy-paste (assuming you have fresh clang)
> =================== ===============
> tar xf ~/Downloads/musl-1.1.7.tar.gz
> cd musl-1.1.7
> ./configure && make -j
> cat << EOF > bug1.c
> #include <string.h>
> #include <stdlib.h>
> #include "regex.h"
>
> int main() {
>   regex_t preg;
>   char a[] = {40, 123, 33, 124, 33, 19, 40, 96, 92, 253, 92, 123, 51,
> 48, 92, 125, 0};
>   char *s = strdup(a);
>   if (0 == regcomp(&preg, s, 0)) {
>     regfree(&preg);
>   }
>   free(s);
>   return 0;
> }
> EOF
> clang -g -fsanitize=address ./src/regex/reg*.c src/regex/tre*.c
> src/locale/__lctrans.c src/internal/libc.c -I include -I src/internal/
> -Iarch/x86_64 bug1.c
> ASAN_OPTIONS=strip_path_prefix=`pwd`/ ./a.out
>

looks simple
i'll set up some glibc based virtual machine to be able to play with it

thanks again

> ==33356==ERROR: AddressSanitizer: heap-buffer-overflow on address
> 0x60200000ef44 at pc 0x0000004d8cb9 bp 0x7fff09d51b10 sp
> 0x7fff09d51b08
> WRITE of size 4 at 0x60200000ef44 thread T0
>     #0 0x4d8cb8 in tre_copy_ast src/regex/regcomp.c:1697:27
>     #1 0x4cc332 in tre_expand_ast src/regex/regcomp.c:1884:16
>     #2 0x4c4de2 in regcomp src/regex/regcomp.c:2739:13
>     #3 0x4e9e06 in main bug1.c:9:12
>     #4 0x7f49f1086ec4 in __libc_start_main
> /build/buildd/eglibc-2.19/csu/libc-start.c:287
>     #5 0x416d45 in _start (a.out+0x416d45)
>
> 0x60200000ef44 is located 8 bytes to the right of 12-byte region
> [0x60200000ef30,0x60200000ef3c)
> allocated by thread T0 here:
>     #0 0x4a20a4 in calloc
> /usr/local/google/home/kcc/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:56:3
>     #1 0x4c4bd9 in regcomp src/regex/regcomp.c:2721:28
>     #2 0x4e9e06 in main bug1.c:9:12
* Re: buffer overflow in regcomp and a way to find more of those
From: Szabolcs Nagy @ 2015-03-20 23:48 UTC
To: musl, Szabolcs Nagy

* Szabolcs Nagy <nsz@port70.net> [2015-03-20 22:28:03 +0100]:
> * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 13:17:47 -0700]:
> > Following the discussion at the glibc mailing list
> > (https://sourceware.org/ml/libc-alpha/2015-03/msg00662.html)
> > I've tried to fuzz musl regcomp and the first bug popped up quickly.
> > Please let me know if you would be interested in adding the fuzzer
> > (http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup)
> > to the musl testing process.

ok

(1) the clean approach would be to have a way to build an
instrumented libc and a separate set of test cases for
various libc apis that the fuzzer could use.

(2) the other approach is to cut parts of the libc out
(the parsers often don't depend on too much libc internals)
and build them with whatever runtime the fuzzer needs

the question is how hard it is to do (1) ?

i assume asan is non-trivial to set up for that (or is it
enough to replace malloc calls? and some startup logic?)

at first it is ok if the fuzzer only catches crashing bugs
so if that's easy to do i'd go for that.

for (1) i can write the test cases and adjust the musl build
system, but i don't know how much difficulty i should expect
* Re: buffer overflow in regcomp and a way to find more of those
From: Rich Felker @ 2015-03-20 22:32 UTC
To: musl

On Fri, Mar 20, 2015 at 01:17:47PM -0700, Konstantin Serebryany wrote:
> Hi,
>
> Following the discussion at the glibc mailing list
> (https://sourceware.org/ml/libc-alpha/2015-03/msg00662.html)
> I've tried to fuzz musl regcomp and the first bug popped up quickly.
> Please let me know if you would be interested in adding the fuzzer
> (http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup)
> to the musl testing process.

Thanks! It's fixed in commit 39dfd58417ef642307d90306e1c7e50aaec5a35c.

Rich
* Re: buffer overflow in regcomp and a way to find more of those
From: Szabolcs Nagy @ 2015-03-20 23:52 UTC
To: Konstantin Serebryany; +Cc: musl

* Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 13:17:47 -0700]:
> Following the discussion at the glibc mailing list
> (https://sourceware.org/ml/libc-alpha/2015-03/msg00662.html)
> I've tried to fuzz musl regcomp and the first bug popped up quickly.
> Please let me know if you would be interested in adding the fuzzer
> (http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup)
> to the musl testing process.
>

(now with correct To: header)


(1) the clean approach would be to have a way to build an
instrumented libc and a separate set of test cases for
various libc apis that the fuzzer could use.

(2) the other approach is to cut parts of the libc out
(the parsers often don't depend on too much libc internals)
and build them with whatever runtime the fuzzer needs

the question is how hard it is to do (1) ?

i assume asan is non-trivial to set up for that (or is it
enough to replace malloc calls? and some startup logic?)

at first it is ok if the fuzzer only catches crashing bugs
so if that's easy to do i'd go for that.

for (1) i can write the test cases and adjust the musl build
system, but i don't know how much difficulty i should expect

thanks
* Re: buffer overflow in regcomp and a way to find more of those
From: Konstantin Serebryany @ 2015-03-21 0:06 UTC
To: Konstantin Serebryany, musl

On Fri, Mar 20, 2015 at 4:52 PM, Szabolcs Nagy <nsz@port70.net> wrote:
> * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 13:17:47 -0700]:
>> Following the discussion at the glibc mailing list
>> (https://sourceware.org/ml/libc-alpha/2015-03/msg00662.html)
>> I've tried to fuzz musl regcomp and the first bug popped up quickly.
>> Please let me know if you would be interested in adding the fuzzer
>> (http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup)
>> to the musl testing process.
>>
>
> (now with correct To: header)
>
>
> (1) the clean approach would be to have a way to build an
> instrumented libc and a separate set of test cases for
> various libc apis that the fuzzer could use.

Correct. Building libc.a is simple:
CC="clang -fsanitize=address -fsanitize-coverage=3 " ./configure && make -j
But then I don't know how to properly link libc.a to a test case.
How do you usually link tests with libc.a on x86_64 linux?

>
> (2) the other approach is to cut parts of the libc out
> (the parsers often don't depend on too much libc internals)
> and build them with whatever runtime the fuzzer needs

That's exactly what I did. Not optimal, I agree.

>
> the question is how hard it is to do (1) ?
>
> i assume asan is non-trivial to set up for that (or is it
> enough to replace malloc calls? and some startup logic?)

asan replaces malloc and a few more libc functions.
It works with various different libcs, so there is a good chance that
it will work here with no or minimal changes.

>
> at first it is ok if the fuzzer only catches crashing bugs
> so if that's easy to do i'd go for that.
>
> for (1) i can write the test cases and adjust the musl build
> system, but i don't know how much difficulty i should expect
>
> thanks
* Re: buffer overflow in regcomp and a way to find more of those
From: Szabolcs Nagy @ 2015-03-21 0:26 UTC
To: musl; +Cc: Konstantin Serebryany

* Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 17:06:18 -0700]:
> On Fri, Mar 20, 2015 at 4:52 PM, Szabolcs Nagy <nsz@port70.net> wrote:
> > * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 13:17:47 -0700]:
> >> Following the discussion at the glibc mailing list
> >> (https://sourceware.org/ml/libc-alpha/2015-03/msg00662.html)
> >> I've tried to fuzz musl regcomp and the first bug popped up quickly.
> >> Please let me know if you would be interested in adding the fuzzer
> >> (http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup)
> >> to the musl testing process.
> >>
> >
> > (now with correct To: header)
> >
> >
> > (1) the clean approach would be to have a way to build an
> > instrumented libc and a separate set of test cases for
> > various libc apis that the fuzzer could use.
>
> Correct. Building libc.a is simple:
> CC="clang -fsanitize=address -fsanitize-coverage=3 " ./configure && make -j
> But then I don't know how to properly link libc.a to a test case.
> How do you usually link tests with libc.a on x86_64 linux?
>

we have a musl-gcc script when the compiler is gcc (it uses
a simple spec file to set things up), i don't know what's
the equivalent mechanism in clang world, but i think one
can create a simple script based on the first version of
musl-gcc

http://git.musl-libc.org/cgit/musl/commit/?id=58f430c1e0255c0b28aed1e9bf3d892c18c06631

the test system does not know about toolchain details
the user has to provide whatever compiler wrapper script
is needed to make things work

but i think i won't try to integrate this into our libc-test
right away, libc-test is designed to test a posix libc with
minimal assumptions or external dependencies
(the testing process of musl is not very formal or automated
yet anyway)

> > the question is how hard it is to do (1) ?
> >
> > i assume asan is non-trivial to set up for that (or is it
> > enough to replace malloc calls? and some startup logic?)
>
> asan replaces malloc and a few more libc functions.
> It works with various different libcs, so there is a good chance that
> it will work here with no or minimal changes.
>

ok i'll try it
* Re: buffer overflow in regcomp and a way to find more of those
From: Rich Felker @ 2015-03-21 0:46 UTC
To: musl, Konstantin Serebryany

On Sat, Mar 21, 2015 at 01:26:16AM +0100, Szabolcs Nagy wrote:
> * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 17:06:18 -0700]:
> > On Fri, Mar 20, 2015 at 4:52 PM, Szabolcs Nagy <nsz@port70.net> wrote:
> > > * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 13:17:47 -0700]:
> > >> Following the discussion at the glibc mailing list
> > >> (https://sourceware.org/ml/libc-alpha/2015-03/msg00662.html)
> > >> I've tried to fuzz musl regcomp and the first bug popped up quickly.
> > >> Please let me know if you would be interested in adding the fuzzer
> > >> (http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup)
> > >> to the musl testing process.
> > >>
> > >
> > > (now with correct To: header)
> > >
> > >
> > > (1) the clean approach would be to have a way to build an
> > > instrumented libc and a separate set of test cases for
> > > various libc apis that the fuzzer could use.
> >
> > Correct. Building libc.a is simple:
> > CC="clang -fsanitize=address -fsanitize-coverage=3 " ./configure && make -j
> > But then I don't know how to properly link libc.a to a test case.
> > How do you usually link tests with libc.a on x86_64 linux?
>
> we have a musl-gcc script when the compiler is gcc (it uses
> a simple spec file to set things up), i don't know what's
> the equivalent mechanism in clang world, but i think one
> can create a simple script based on the first version of
> musl-gcc
>
> http://git.musl-libc.org/cgit/musl/commit/?id=58f430c1e0255c0b28aed1e9bf3d892c18c06631

Do you mean the version removed in that commit? As long as you're just
building simple test files and not large program/library ecosystems, I
think it's even simpler than that. For static linking, just using
-nostdinc, -isystem, and -L should be all you need to compile/link
against the instrumented musl libc.a instead of the host libc.
Assuming the host is musl-based already, -nostdinc and -isystem
shouldn't even be needed. Just -L is sufficient.

> the test system does not know about toolchain details
> the user has to provide whatever compiler wrapper script
> is needed to make things work
>
> but i think i won't try to integrate this into our libc-test
> right away, libc-test is designed to test a posix libc with
> minimal assumptions or external dependencies
> (the testing process of musl is not very formal or automated
> yet anyway)

Indeed, I don't think fuzzing is something that belongs with regular
functionality/regression tests. It presumably takes a lot more time,
requires different build procedures, and addresses a different need
than the tests we have.

> > > the question is how hard it is to do (1) ?
> > >
> > > i assume asan is non-trivial to set up for that (or is it
> > > enough to replace malloc calls? and some startup logic?)
> >
> > asan replaces malloc and a few more libc functions.
> > It works with various different libcs, so there is a good chance that
> > it will work here with no or minimal changes.
>
> ok i'll try it

I would guess it works with no change for static linking, but some
changes might be needed for dynamic linking. I'm perfectly happy with
all the fuzzing being done with static linking anyway; I don't think
dynamic linking would have significant additional code paths whose
coverage needs checking.

Rich
* Re: buffer overflow in regcomp and a way to find more of those
From: Konstantin Serebryany @ 2015-03-21 0:54 UTC
To: Rich Felker; +Cc: musl

On Fri, Mar 20, 2015 at 5:46 PM, Rich Felker <dalias@libc.org> wrote:
> On Sat, Mar 21, 2015 at 01:26:16AM +0100, Szabolcs Nagy wrote:
>> * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 17:06:18 -0700]:
>> > On Fri, Mar 20, 2015 at 4:52 PM, Szabolcs Nagy <nsz@port70.net> wrote:
>> > > * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 13:17:47 -0700]:
>> > >> Following the discussion at the glibc mailing list
>> > >> (https://sourceware.org/ml/libc-alpha/2015-03/msg00662.html)
>> > >> I've tried to fuzz musl regcomp and the first bug popped up quickly.
>> > >> Please let me know if you would be interested in adding the fuzzer
>> > >> (http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup)
>> > >> to the musl testing process.
>> > >>
>> > >
>> > > (now with correct To: header)
>> > >
>> > >
>> > > (1) the clean approach would be to have a way to build an
>> > > instrumented libc and a separate set of test cases for
>> > > various libc apis that the fuzzer could use.
>> >
>> > Correct. Building libc.a is simple:
>> > CC="clang -fsanitize=address -fsanitize-coverage=3 " ./configure && make -j
>> > But then I don't know how to properly link libc.a to a test case.
>> > How do you usually link tests with libc.a on x86_64 linux?
>>
>> we have a musl-gcc script when the compiler is gcc (it uses
>> a simple spec file to set things up), i don't know what's
>> the equivalent mechanism in clang world, but i think one
>> can create a simple script based on the first version of
>> musl-gcc
>>
>> http://git.musl-libc.org/cgit/musl/commit/?id=58f430c1e0255c0b28aed1e9bf3d892c18c06631
>
> Do you mean the version removed in that commit? As long as you're just
> building simple test files and not large program/library ecosystems, I
> think it's even simpler than that. For static linking, just using
> -nostdinc, -isystem, and -L should be all you need to compile/link
> against the instrumented musl libc.a instead of the host libc.
> Assuming the host is musl-based already, -nostdinc and -isystem
> shouldn't even be needed. Just -L is sufficient.
>
>> the test system does not know about toolchain details
>> the user has to provide whatever compiler wrapper script
>> is needed to make things work
>>
>> but i think i won't try to integrate this into our libc-test
>> right away, libc-test is designed to test a posix libc with
>> minimal assumptions or external dependencies
>> (the testing process of musl is not very formal or automated
>> yet anyway)
>
> Indeed, I don't think fuzzing is something that belongs with regular
> functionality/regression tests. It presumably takes a lot more time,
> requires different build procedures, and addresses a different need
> than the tests we have.
>
>> > > the question is how hard it is to do (1) ?
>> > >
>> > > i assume asan is non-trivial to set up for that (or is it
>> > > enough to replace malloc calls? and some startup logic?)
>> >
>> > asan replaces malloc and a few more libc functions.
>> > It works with various different libcs, so there is a good chance that
>> > it will work here with no or minimal changes.
>>
>> ok i'll try it
>
> I would guess it works with no change for static linking, but some
> changes might be needed for dynamic linking. I'm perfectly happy with
> all the fuzzing being done with static linking anyway; I don't think
> dynamic linking would have significant additional code paths whose
> coverage needs checking.

sadly, asan does not support fully static linking.

>
> Rich
* Re: buffer overflow in regcomp and a way to find more of those
From: Rich Felker @ 2015-03-21 1:00 UTC
To: Konstantin Serebryany; +Cc: musl

On Fri, Mar 20, 2015 at 05:54:49PM -0700, Konstantin Serebryany wrote:
> >> > > the question is how hard it is to do (1) ?
> >> > >
> >> > > i assume asan is non-trivial to set up for that (or is it
> >> > > enough to replace malloc calls? and some startup logic?)
> >> >
> >> > asan replaces malloc and a few more libc functions.
> >> > It works with various different libcs, so there is a good chance that
> >> > it will work here with no or minimal changes.
> >>
> >> ok i'll try it
> >
> > I would guess it works with no change for static linking, but some
> > changes might be needed for dynamic linking. I'm perfectly happy with
> > all the fuzzing being done with static linking anyway; I don't think
> > dynamic linking would have significant additional code paths whose
> > coverage needs checking.
>
> sadly, asan does not support fully static linking.

Is this just an oversight or something fundamental that's hard to fix?
The sort of things it wants to do are much less likely to work with
dynamic linking. Dynamic-linked musl requires all internal symbol
references to be resolved at ld-time and does not support interposing
in front of them.

Rich
* Re: buffer overflow in regcomp and a way to find more of those
From: Konstantin Serebryany @ 2015-03-21 1:05 UTC
To: Rich Felker; +Cc: musl

On Fri, Mar 20, 2015 at 6:00 PM, Rich Felker <dalias@libc.org> wrote:
> On Fri, Mar 20, 2015 at 05:54:49PM -0700, Konstantin Serebryany wrote:
>> >> > > the question is how hard it is to do (1) ?
>> >> > >
>> >> > > i assume asan is non-trivial to set up for that (or is it
>> >> > > enough to replace malloc calls? and some startup logic?)
>> >> >
>> >> > asan replaces malloc and a few more libc functions.
>> >> > It works with various different libcs, so there is a good chance that
>> >> > it will work here with no or minimal changes.
>> >>
>> >> ok i'll try it
>> >
>> > I would guess it works with no change for static linking, but some
>> > changes might be needed for dynamic linking. I'm perfectly happy with
>> > all the fuzzing being done with static linking anyway; I don't think
>> > dynamic linking would have significant additional code paths whose
>> > coverage needs checking.
>>
>> sadly, asan does not support fully static linking.
>
> Is this just an oversight or something fundamental that's hard to fix?

Quite fundamental.
asan needs to be able to intercept certain libc functions and on all
platforms (linux, android, OSX, Windows, etc) it works only when libc
itself is dynamically linked.

(Theoretically, it's possible to fix, but it'll be too much work :( )

> The sort of things it wants to do are much less likely to work with
> dynamic linking. Dynamic-linked musl requires all internal symbol
> references to be resolved at ld-time and does not support interposing
> in front of them.
>
> Rich
* Re: buffer overflow in regcomp and a way to find more of those
From: Konstantin Serebryany @ 2015-03-21 1:10 UTC
To: Rich Felker; +Cc: musl

After your fix the fuzzer did not find anything else so far, but it
suffers from slow performance in some cases.
Not sure if this qualifies as a bug, but the following example takes
~2 seconds to run (runs instantly with glibc):

#include <regex.h>

int main() {
  regex_t preg;
  const char *s = ".****\\Z$<\\0)_";
  regmatch_t pmatch[2];
  if (0 == regcomp(&preg, s, 0)) {
    regexec(&preg, s, 0, pmatch, 0);
    regfree(&preg);
  }
  return 0;
}

On Fri, Mar 20, 2015 at 6:05 PM, Konstantin Serebryany
<konstantin.s.serebryany@gmail.com> wrote:
> On Fri, Mar 20, 2015 at 6:00 PM, Rich Felker <dalias@libc.org> wrote:
>> On Fri, Mar 20, 2015 at 05:54:49PM -0700, Konstantin Serebryany wrote:
>>> >> > > the question is how hard it is to do (1) ?
>>> >> > >
>>> >> > > i assume asan is non-trivial to set up for that (or is it
>>> >> > > enough to replace malloc calls? and some startup logic?)
>>> >> >
>>> >> > asan replaces malloc and a few more libc functions.
>>> >> > It works with various different libcs, so there is a good chance that
>>> >> > it will work here with no or minimal changes.
>>> >>
>>> >> ok i'll try it
>>> >
>>> > I would guess it works with no change for static linking, but some
>>> > changes might be needed for dynamic linking. I'm perfectly happy with
>>> > all the fuzzing being done with static linking anyway; I don't think
>>> > dynamic linking would have significant additional code paths whose
>>> > coverage needs checking.
>>>
>>> sadly, asan does not support fully static linking.
>>
>> Is this just an oversight or something fundamental that's hard to fix?
>
> Quite fundamental.
> asan needs to be able to intercept certain libc functions and on all
> platforms (linux, android, OSX, Windows, etc) it works only when libc
> itself is dynamically linked.
>
> (Theoretically, it's possible to fix, but it'll be too much work :( )
>
>> The sort of things it wants to do are much less likely to work with
>> dynamic linking. Dynamic-linked musl requires all internal symbol
>> references to be resolved at ld-time and does not support interposing
>> in front of them.
>>
>> Rich
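One way to measure a slow case like this is to time `regcomp` and `regexec` together. The helper below is hypothetical, not part of musl or the fuzzer. It uses POSIX `clock_gettime` with `CLOCK_MONOTONIC` and, like the example above, compiles the pattern as a basic RE (`cflags` 0).

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <time.h>

/* Hypothetical helper: seconds spent compiling `pattern` as a basic RE and
 * matching it against `subject`, or -1.0 if regcomp rejects the pattern. */
static double time_regex(const char *pattern, const char *subject)
{
    struct timespec t0, t1;
    regex_t re;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (regcomp(&re, pattern, 0) != 0)
        return -1.0;
    regexec(&re, subject, 0, NULL, 0);  /* nmatch 0: pmatch is ignored */
    regfree(&re);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling `time_regex(".****\\Z$<\\0)_", ".****\\Z$<\\0)_")` against each libc would show the kind of difference reported above. No particular timing is claimed here, and glibc may simply reject the pattern, returning -1.0.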
* Re: buffer overflow in regcomp and a way to find more of those
From: Szabolcs Nagy @ 2015-03-21 1:23 UTC
To: Konstantin Serebryany; +Cc: Rich Felker, musl

* Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 18:10:18 -0700]:
> After your fix the fuzzer did not find anything else so far, but it
> suffers from slow performance in some cases.
> Not sure if this qualifies as a bug, but the following example takes
> ~2 seconds to run (runs instantly with glibc):

i think the problem is stacked repetitions
tre doesn't handle them in a sane way
and uses a huge amount of ram

for * it would be easy to solve, but
the general case is theoretically impossible to
solve: x{255}{255} will be a 255*255 state machine

this is the only part in the musl regex
engine that's allowed to have super linear
space/time complexity

(you might want to add some logic to avoid
such stacked repetitions to speed up the search)

(btw the standard does not allow these, but if
the pattern is parenthesized around every repetition
then that's ok: (x*)* is a valid pattern, x** is not,
so there is not much point in rejecting these patterns:
the problem does not go away since grouping is allowed)

> int main() {
>   regex_t preg;
>   const char *s = ".****\\Z$<\\0)_";
>   regmatch_t pmatch[2];
>   if (0 == regcomp(&preg, s, 0)) {
>     regexec(&preg, s, 0, pmatch, 0);
>     regfree(&preg);
>   }
>   return 0;
> }
>
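The growth described above can be written out. This assumes bounded repetitions are expanded by copying the repeated subexpression, which the tre_expand_ast and tre_copy_ast frames in the original ASan trace suggest:

```latex
\[
  \#\{\text{copies of } x \text{ in } x\{m_1\}\{m_2\}\cdots\{m_k\}\} \;=\; \prod_{i=1}^{k} m_i,
  \qquad
  x\{255\}\{255\}:\; 255 \times 255 = 65025,
  \qquad
  x\{255\}\{255\}\{255\}:\; 255^3 = 16\,581\,375 .
\]
```

The pattern text grows only linearly in the nesting depth $k$, while the expanded machine grows with the product of the bounds, that is, exponentially in $k$.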
* Re: buffer overflow in regcomp and a way to find more of those
From: Rich Felker @ 2015-03-21 1:30 UTC
To: Konstantin Serebryany, musl

On Sat, Mar 21, 2015 at 02:23:41AM +0100, Szabolcs Nagy wrote:
> * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 18:10:18 -0700]:
> > After your fix the fuzzer did not find anything else so far, but it
> > suffers from slow performance in some cases.
> > Not sure if this qualifies as a bug, but the following example takes
> > ~2 seconds to run (runs instantly with glibc):
>
> i think the problem is stacked repetitions
> tre doesn't handle them in a sane way
> and uses a huge amount of ram

Sadly there doesn't seem to be any sane way to handle them...

> for * it would be easy to solve, but
> the general case is theoretically impossible to
> solve: x{255}{255} will be a 255*255 state machine
>
> this is the only part in the musl regex
> engine that's allowed to have super linear
> space/time complexity
>
> (you might want to add some logic to avoid
> such stacked repetitions to speed up the search)
>
> (btw the standard does not allow these, but if
> the pattern is parenthesized around every repetition
> then that's ok: (x*)* is a valid pattern, x** is not,
> so there is not much point in rejecting these patterns:
> the problem does not go away since grouping is allowed)
>
> > int main() {
> >   regex_t preg;
> >   const char *s = ".****\\Z$<\\0)_";

Isn't the \0 an invalid backreference? Could it be getting processed
in a way that's causing the slowdown, but simply rejected by glibc?

Rich
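The suggestion quoted above, to avoid stacked repetitions on the fuzzer side, could look roughly like the following. `has_stacked_repetition` is a hypothetical pre-filter, not a musl or libFuzzer API. It only understands basic REs (`cflags` 0, as in these examples), treats a `*` at the start of the pattern, after a leading `^`, or right after `\(` as literal, and skips bracket expressions naively. As the thread notes, grouping defeats it: `\(x*\)*` passes.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical fuzzer-side pre-filter for basic REs: returns true if a
 * repetition operator ('*' or "\{...\}") is applied directly to another
 * repetition, e.g. "x**" or "x\{2\}\{3\}". Heuristic only. */
static bool has_stacked_repetition(const char *p)
{
    bool prev_repeat = false;  /* did the previous token end a repetition? */
    bool at_start = true;      /* at start of pattern or just after "\(" */

    while (*p) {
        if (*p == '^' && at_start) {
            p++;                       /* anchor: a following '*' is literal */
        } else if (*p == '*') {
            if (prev_repeat)
                return true;
            prev_repeat = !at_start;   /* a leading '*' is a literal char */
            at_start = false;
            p++;
        } else if (p[0] == '\\' && p[1] == '{') {
            if (prev_repeat)
                return true;
            const char *end = strstr(p + 2, "\\}");
            if (end == NULL)
                return false;          /* malformed interval: leave it to regcomp */
            p = end + 2;
            prev_repeat = true;
            at_start = false;
        } else if (p[0] == '\\' && p[1] == '(') {
            p += 2;
            prev_repeat = false;
            at_start = true;
        } else if (p[0] == '\\' && p[1] != '\0') {
            p += 2;                    /* any other escape is a single atom */
            prev_repeat = false;
            at_start = false;
        } else if (*p == '[') {
            p++;                       /* skip a bracket expression */
            if (*p == '^')
                p++;
            if (*p == ']')
                p++;                   /* a leading ']' is a literal member */
            while (*p && *p != ']')
                p++;
            if (*p)
                p++;
            prev_repeat = false;
            at_start = false;
        } else {
            p++;                       /* ordinary char, or a trailing '\' */
            prev_repeat = false;
            at_start = false;
        }
    }
    return false;
}
```

A fuzz target could return early when this reports true, which trades some coverage of the repetition code for throughput.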
* Re: buffer overflow in regcomp and a way to find more of those
From: Szabolcs Nagy @ 2015-03-21 2:10 UTC
To: Rich Felker; +Cc: Konstantin Serebryany, musl

* Rich Felker <dalias@libc.org> [2015-03-20 21:30:16 -0400]:
> > > int main() {
> > >   regex_t preg;
> > >   const char *s = ".****\\Z$<\\0)_";
>
> Isn't the \0 an invalid backreference? Could it be getting processed
> in a way that's causing the slowdown, but simply rejected by glibc?

ah you were right the \0 causes the slow down here:
it switches to the backtracking mode and there are
many ways to backtrack on .****
* Re: buffer overflow in regcomp and a way to find more of those
From: Rich Felker @ 2015-03-21 2:17 UTC
To: Konstantin Serebryany, musl

On Sat, Mar 21, 2015 at 03:10:18AM +0100, Szabolcs Nagy wrote:
> * Rich Felker <dalias@libc.org> [2015-03-20 21:30:16 -0400]:
> > > > int main() {
> > > >   regex_t preg;
> > > >   const char *s = ".****\\Z$<\\0)_";
> >
> > Isn't the \0 an invalid backreference? Could it be getting processed
> > in a way that's causing the slowdown, but simply rejected by glibc?
>
> ah you were right the \0 causes the slow down here:
> it switches to the backtracking mode and there are
> many ways to backtrack on .****

Right. But \0 isn't even a valid backreference. It would refer to "the
whole match" which could never match as a backreference. Valid
backrefs are only the digits 1-9 though. \0 is not defined and should
probably be treated as a literal or a parse error.

Rich
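To make the rule concrete: only \1 through \9 can name a subexpression, and only one that precedes the escape. A parser could classify a backslash-digit escape roughly as follows. backref_number is a hypothetical helper, not TRE's parser, and whether a rejected \0 then becomes a literal or an error (for example REG_ESUBREG, the POSIX code for an invalid back-reference number) is left to the caller.

```c
/* Hypothetical helper: classify a backslash escape in a POSIX regex.
   esc points at the backslash; nsub is the number of subexpressions that
   precede the escape. Returns the backreference number (1..9), or 0 if
   the escape is not a usable backreference. That includes "\0" and any
   digit naming a subexpression that does not precede the escape. */
int backref_number(const char *esc, int nsub)
{
    if (esc[0] != '\\' || esc[1] < '1' || esc[1] > '9')
        return 0;
    int n = esc[1] - '0';
    return n <= nsub ? n : 0;
}
```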
* Re: buffer overflow in regcomp and a way to find more of those
From: Rich Felker @ 2015-03-21 1:32 UTC
To: Konstantin Serebryany; +Cc: musl

On Fri, Mar 20, 2015 at 06:05:04PM -0700, Konstantin Serebryany wrote:
> On Fri, Mar 20, 2015 at 6:00 PM, Rich Felker <dalias@libc.org> wrote:
> > On Fri, Mar 20, 2015 at 05:54:49PM -0700, Konstantin Serebryany wrote:
> >> >> > > the question is how hard it is to do (1) ?
> >> >> > >
> >> >> > > i assume asan is non-trivial to set up for that (or is it
> >> >> > > enough to replace malloc calls? and some startup logic?)
> >> >> >
> >> >> > asan replaces malloc and a few more libc functions.
> >> >> > It works with various different libcs, so there is a good chance that
> >> >> > it will work here with no or minimal changes.
> >> >>
> >> >> ok i'll try it
> >> >
> >> > I would guess it works with no change for static linking, but some
> >> > changes might be needed for dynamic linking. I'm perfectly happy with
> >> > all the fuzzing being done with static linking anyway; I don't think
> >> > dynamic linking would have significant additional code paths whose
> >> > coverage need checking.
> >>
> >> sadly, asan does not support fully static linking.
> >
> > Is this just an oversight or something fundamental that's hard to fix?
>
> Quite fundamental.
> asan needs to be able to intercept certain libc functions and on all
> platforms (linux, android, OSX, Windows, etc) it works only when libc
> itself is dynamically linked.

But if you're compiling libc itself with asan, couldn't it just
hard-insert the interception code into the implementations of these
functions during compiling?

Rich
* Re: buffer overflow in regcomp and a way to find more of those
From: Konstantin Serebryany @ 2015-03-21 1:37 UTC
To: Rich Felker; +Cc: musl

On Fri, Mar 20, 2015 at 6:32 PM, Rich Felker <dalias@libc.org> wrote:
> On Fri, Mar 20, 2015 at 06:05:04PM -0700, Konstantin Serebryany wrote:
>> On Fri, Mar 20, 2015 at 6:00 PM, Rich Felker <dalias@libc.org> wrote:
>> > On Fri, Mar 20, 2015 at 05:54:49PM -0700, Konstantin Serebryany wrote:
>> >> >> > > the question is how hard it is to do (1) ?
>> >> >> > >
>> >> >> > > i assume asan is non-trivial to set up for that (or is it
>> >> >> > > enough to replace malloc calls? and some startup logic?)
>> >> >> >
>> >> >> > asan replaces malloc and a few more libc functions.
>> >> >> > It works with various different libcs, so there is a good chance that
>> >> >> > it will work here with no or minimal changes.
>> >> >>
>> >> >> ok i'll try it
>> >> >
>> >> > I would guess it works with no change for static linking, but some
>> >> > changes might be needed for dynamic linking. I'm perfectly happy with
>> >> > all the fuzzing being done with static linking anyway; I don't think
>> >> > dynamic linking would have significant additional code paths whose
>> >> > coverage need checking.
>> >>
>> >> sadly, asan does not support fully static linking.
>> >
>> > Is this just an oversight or something fundamental that's hard to fix?
>>
>> Quite fundamental.
>> asan needs to be able to intercept certain libc functions and on all
>> platforms (linux, android, OSX, Windows, etc) it works only when libc
>> itself is dynamically linked.
>
> But if you're compiling libc itself with asan, couldn't it just
> hard-insert the interception code into the implementations of these
> functions during compiling?

I think it could, it's just quite a bit of work to do. :(
We may end up doing it eventually as I hope to use instrumented glibc
whenever we can,
and at that point intercepting functions from glibc will become rather
silly. But we are not there yet.

>
> Rich
* Re: buffer overflow in regcomp and a way to find more of those
From: Rich Felker @ 2015-03-21 1:56 UTC
To: Konstantin Serebryany; +Cc: musl

On Fri, Mar 20, 2015 at 06:37:39PM -0700, Konstantin Serebryany wrote:
> On Fri, Mar 20, 2015 at 6:32 PM, Rich Felker <dalias@libc.org> wrote:
> > On Fri, Mar 20, 2015 at 06:05:04PM -0700, Konstantin Serebryany wrote:
> >> On Fri, Mar 20, 2015 at 6:00 PM, Rich Felker <dalias@libc.org> wrote:
> >> > On Fri, Mar 20, 2015 at 05:54:49PM -0700, Konstantin Serebryany wrote:
> >> >> >> > > the question is how hard it is to do (1) ?
> >> >> >> > >
> >> >> >> > > i assume asan is non-trivial to set up for that (or is it
> >> >> >> > > enough to replace malloc calls? and some startup logic?)
> >> >> >> >
> >> >> >> > asan replaces malloc and a few more libc functions.
> >> >> >> > It works with various different libcs, so there is a good chance that
> >> >> >> > it will work here with no or minimal changes.
> >> >> >>
> >> >> >> ok i'll try it
> >> >> >
> >> >> > I would guess it works with no change for static linking, but some
> >> >> > changes might be needed for dynamic linking. I'm perfectly happy with
> >> >> > all the fuzzing being done with static linking anyway; I don't think
> >> >> > dynamic linking would have significant additional code paths whose
> >> >> > coverage need checking.
> >> >>
> >> >> sadly, asan does not support fully static linking.
> >> >
> >> > Is this just an oversight or something fundamental that's hard to fix?
> >>
> >> Quite fundamental.
> >> asan needs to be able to intercept certain libc functions and on all
> >> platforms (linux, android, OSX, Windows, etc) it works only when libc
> >> itself is dynamically linked.
> >
> > But if you're compiling libc itself with asan, couldn't it just
> > hard-insert the interception code into the implementations of these
> > functions during compiling?
>
> I think it could, it's just quite a bit of work to do. :(
> We may end up doing it eventually as I hope to use instrumented glibc
> whenever we can,
> and at that point intercepting functions from glibc will become rather
> silly. But we are not there yet.

Sorry to keep bombarding you with questions. One more: is it only asan
that needs dynamic linking? If we're willing to drop asan for now and
just rely on musl itself crashing for heap corruption (musl does a
good job of detecting it usually), can the necessary coverage stuff
still work with static linking?

Rich
* Re: buffer overflow in regcomp and a way to find more of those
From: Konstantin Serebryany @ 2015-03-21 2:14 UTC
To: Rich Felker; +Cc: musl

>
> Sorry to keep bombarding you with questions.

You are more than welcome!

> One more: is it only asan
> that needs dynamic linking? If we're willing to drop asan for now and
> just rely on musl itself crashing for heap corruption (musl does a
> good job of detecting it usually), can the necessary coverage stuff
> still work with static linking?

I think it can with a reasonable additional work, but not out of the box.
The compiler instrumentation in clang clearly does not care about
dynamic vs static linking.
If you build the source with "-fsanitize=leak -fsanitize-coverage=4
-O1" the compiler will not insert any of the asan instrumentation
and only insert calls to a couple of functions needed for coverage.
Then, instead of linking with the full asan+coverage run-time, you
will need a very simple re-implementation of coverage-only runtime.

But, my previous experience with running fuzzers w/o memory bug
detectors (asan, or others)
suggests that this is a bad idea. Memory bugs tend to accumulate and
show up in the following iterations (if at all).

>
> Rich
* Re: buffer overflow in regcomp and a way to find more of those
From: Rich Felker @ 2015-03-21 2:20 UTC
To: Konstantin Serebryany; +Cc: musl

On Fri, Mar 20, 2015 at 07:14:33PM -0700, Konstantin Serebryany wrote:
> >
> > Sorry to keep bombarding you with questions.
>
> You are more than welcome!
>
> > One more: is it only asan
> > that needs dynamic linking? If we're willing to drop asan for now and
> > just rely on musl itself crashing for heap corruption (musl does a
> > good job of detecting it usually), can the necessary coverage stuff
> > still work with static linking?
>
> I think it can with a reasonable additional work, but not out of the box.
> The compiler instrumentation in clang clearly does not care about
> dynamic vs static linking.
> If you build the source with "-fsanitize=leak -fsanitize-coverage=4
> -O1" the compiler will not insert any of the asan instrumentation
> and only insert calls to a couple of functions needed for coverage.
> Then, instead of linking with the full asan+coverage run-time, you
> will need a very simple re-implementation of coverage-only runtime.

Could the existing runtime be used, just stripped down?

> But, my previous experience with running fuzzers w/o memory bug
> detectors (asan, or others)
> suggests that this is a bad idea. Memory bugs tend to accumulate and
> show up in the following iterations (if at all).

Well static linking with musl does not impose any constraint on
redefining functions, so you could easily use a debugging malloc that
lines up each allocation to end on a page boundary with a guard page
after it. This would of course be slow and use lots of memory but
would catch all heap overflows. And -fstack-protector-all would catch
most stack-based overflows.

Rich
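Rich's debugging malloc could look roughly like the sketch below. It is a hypothetical allocator (guard_malloc, guard_free and guard_mode are illustrative names, not musl interfaces) that maps each block separately and puts an inaccessible page directly after it, or directly before it when guard_mode is GUARD_LEFT, so running the tests once per mode covers both overflow directions. It assumes POSIX mmap/mprotect with MAP_ANONYMOUS and a 16-byte malloc alignment, and it is not a drop-in malloc replacement: calloc, realloc and the rest are missing.

```c
#define _DEFAULT_SOURCE           /* for MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical guard-page allocator (not musl's malloc). Every block gets
   its own mapping with one PROT_NONE page on one side of it, so an
   out-of-bounds access on that side faults immediately. */
enum guard_side { GUARD_RIGHT, GUARD_LEFT };

/* Must not change while blocks are live: guard_free relies on it. */
static enum guard_side guard_mode = GUARD_RIGHT;

#define GUARD_ALIGN 16  /* assumed malloc alignment; in GUARD_RIGHT mode up to
                           GUARD_ALIGN-1 padding bytes after a block are unguarded */

struct guard_hdr { void *base; size_t len; };  /* mapping to munmap on free */

static size_t round_up(size_t n, size_t a) { return (n + a - 1) / a * a; }

/* Where the header of block p lives: just below p in GUARD_RIGHT mode,
   at the start of the mapping (two pages below p) in GUARD_LEFT mode. */
static unsigned char *hdr_slot(unsigned char *p, size_t pg)
{
    return guard_mode == GUARD_RIGHT ? p - GUARD_ALIGN : p - 2 * pg;
}

void *guard_malloc(size_t n)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    size_t body = round_up(n ? n : 1, GUARD_ALIGN);
    /* GUARD_RIGHT: [headroom page][body pages][guard page]
       GUARD_LEFT:  [header page][guard page][body pages]  */
    size_t len = pg + round_up(body, pg) + pg;
    unsigned char *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    unsigned char *guard, *p;
    if (guard_mode == GUARD_RIGHT) {
        guard = base + len - pg;
        p = guard - body;          /* block ends exactly at the guard page */
    } else {
        guard = base + pg;
        p = guard + pg;            /* block starts right after the guard page */
    }
    if (mprotect(guard, pg, PROT_NONE) != 0) {
        munmap(base, len);
        return NULL;
    }
    struct guard_hdr h = { base, len };
    memcpy(hdr_slot(p, pg), &h, sizeof h);
    return p;
}

void guard_free(void *ptr)
{
    if (!ptr)
        return;
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    struct guard_hdr h;
    memcpy(&h, hdr_slot(ptr, pg), sizeof h);
    munmap(h.base, h.len);
}
```

One mapping per block is what makes this slow and memory hungry, exactly as Rich says; the upside is that the first byte past the end (or before the start, in GUARD_LEFT mode) faults on the spot instead of corrupting a neighbour.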
* Re: buffer overflow in regcomp and a way to find more of those
From: Konstantin Serebryany @ 2015-03-21 6:05 UTC
To: Rich Felker; +Cc: musl

On Fri, Mar 20, 2015 at 7:20 PM, Rich Felker <dalias@libc.org> wrote:
> On Fri, Mar 20, 2015 at 07:14:33PM -0700, Konstantin Serebryany wrote:
>> >
>> > Sorry to keep bombarding you with questions.
>>
>> You are more than welcome!
>>
>> > One more: is it only asan
>> > that needs dynamic linking? If we're willing to drop asan for now and
>> > just rely on musl itself crashing for heap corruption (musl does a
>> > good job of detecting it usually), can the necessary coverage stuff
>> > still work with static linking?
>>
>> I think it can with a reasonable additional work, but not out of the box.
>> The compiler instrumentation in clang clearly does not care about
>> dynamic vs static linking.
>> If you build the source with "-fsanitize=leak -fsanitize-coverage=4
>> -O1" the compiler will not insert any of the asan instrumentation
>> and only insert calls to a couple of functions needed for coverage.
>> Then, instead of linking with the full asan+coverage run-time, you
>> will need a very simple re-implementation of coverage-only runtime.
>
> Could the existing runtime be used, just stripped down?
Yes, but for the basic functionality needed by the fuzzer it's simpler
to write it from scratch, see below:

========================================================
svn co http://llvm.org/svn/llvm-project/llvm/trunk/lib/Fuzzer
cat <<EOF >cov-minimal-rt.c
static long counter;
void __sanitizer_cov_with_check(int *guard) {
  if (*guard == 0) {
    counter++;
    *guard=1;
  }
}
long __sanitizer_get_total_unique_coverage() { return counter; }
void __sanitizer_cov_module_init() {}
void __sanitizer_reset_coverage(){}
void __sanitizer_get_coverage_guards(){}
void __sanitizer_get_number_of_counters(){}
void __sanitizer_update_counter_bitset_and_clear_counters(){}
void __sanitizer_set_death_callback(){}
EOF

clang -std=c++11 -c Fuzzer/Fuzzer*.cpp -I Fuzzer
clang -std=c++11 -fsanitize=leak -fsanitize-coverage=3 -mllvm -sanitizer-coverage-block-threshold=0 Fuzzer/test/SimpleTest.cpp -c
clang -c cov-minimal-rt.c
clang++ *.o
./a.out
========================================================
Seed: 1285924057
Shuffle: Size: 1 prefer small: 1
#1 cov: 5 bits: 0 exec/s: 0
Shuffle done: 1 IC: 5
#2 cov: 7 bits: 0 exec/s: 0
#2 NEW: 7 B: 0 L: 64 S: 2 I: 0
#4 cov: 7 bits: 0 exec/s: 0
#8 cov: 7 bits: 0 exec/s: 0
#16 cov: 7 bits: 0 exec/s: 0
#32 cov: 7 bits: 0 exec/s: 0
#64 cov: 7 bits: 0 exec/s: 0
#128 cov: 7 bits: 0 exec/s: 0
#256 cov: 7 bits: 0 exec/s: 0
#512 cov: 7 bits: 0 exec/s: 0
#1024 cov: 7 bits: 0 exec/s: 0
#2048 cov: 7 bits: 0 exec/s: 0
#2107 NEW: 11 B: 0 L: 64 S: 3 I: 0
#2153 NEW: 12 B: 0 L: 1 S: 4 I: 1 H 1: 72
#4096 cov: 12 bits: 0 exec/s: 0
#8192 cov: 12 bits: 0 exec/s: 0
#16384 cov: 12 bits: 0 exec/s: 0
#18091 NEW: 15 B: 0 L: 2 S: 5 I: 4 Hi 2: 72 105
#18122 NEW: 17 B: 0 L: 4 S: 6 I: 0 Hi?i 4: 72 105 8 105
Found the target, exiting

The recently added afl-style counters
(https://code.google.com/p/address-sanitizer/wiki/AsanCoverage#Coverage_counters)
are a bit more involved, but the basic bool-per-edge is quite enough
in most cases.
The fuzzer itself is written in C++ and uses STL (probably, not the
best idea, but it makes the experiments simpler).
Can't tell if it will be a problem with musl, but after all the fuzzer
itself is also trivial (as well as the entire concept)

>
>> But, my previous experience with running fuzzers w/o memory bug
>> detectors (asan, or others)
>> suggests that this is a bad idea. Memory bugs tend to accumulate and
>> show up in the following iterations (if at all).
>
> Well static linking with musl does not impose any constraint on
> redefining functions, so you could easily use a debugging malloc that
> lines up each allocation to end on a page boundary with a guard page
> after it.

Yea... This will slow down fuzzing and guard pages only protect you
from overflow in one direction (either left or right, but not both).
But this is better than nothing.

> This would of course be slow and use lots of memory but
> would catch all heap overflows. And -fstack-protector-all would catch
> most stack-based overflows.

Only stack-overflow-write by a small amount, but yes, better than nothing.

BTW, writing a minimalistic asan run-time as part of musl should be a
matter of a couple of hours.
Probably much faster than making the current monster work with static linking.
I'd be happy to help with such.

--kcc
* Re: buffer overflow in regcomp and a way to find more of those
From: Szabolcs Nagy @ 2015-03-21 13:28 UTC
To: Konstantin Serebryany; +Cc: Rich Felker, musl

[-- Attachment #1: Type: text/plain, Size: 4549 bytes --]

* Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 23:05:13 -0700]:
> On Fri, Mar 20, 2015 at 7:20 PM, Rich Felker <dalias@libc.org> wrote:
> > On Fri, Mar 20, 2015 at 07:14:33PM -0700, Konstantin Serebryany wrote:
> >> If you build the source with "-fsanitize=leak -fsanitize-coverage=4
> >> -O1" the compiler will not insert any of the asan instrumentation
> >> and only insert calls to a couple of functions needed for coverage.
> >> Then, instead of linking with the full asan+coverage run-time, you
> >> will need a very simple re-implementation of coverage-only runtime.
> >
> > Could the existing runtime be used, just stripped down?
>
> Yes, but for the basic functionality needed by the fuzzer it's simpler
> to write it from scratch, see below:
>
> ========================================================
> svn co http://llvm.org/svn/llvm-project/llvm/trunk/lib/Fuzzer
> cat <<EOF >cov-minimal-rt.c
> static long counter;
> void __sanitizer_cov_with_check(int *guard) {
>   if (*guard == 0) {
>     counter++;
>     *guard=1;
>   }
> }
> long __sanitizer_get_total_unique_coverage() { return counter; }
> void __sanitizer_cov_module_init() {}
> void __sanitizer_reset_coverage(){}
> void __sanitizer_get_coverage_guards(){}
> void __sanitizer_get_number_of_counters(){}
> void __sanitizer_update_counter_bitset_and_clear_counters(){}
> void __sanitizer_set_death_callback(){}
> EOF
>
> clang -std=c++11 -c Fuzzer/Fuzzer*.cpp -I Fuzzer
> clang -std=c++11 -fsanitize=leak -fsanitize-coverage=3 -mllvm
> -sanitizer-coverage-block-threshold=0 Fuzzer/test/SimpleTest.cpp -c
> clang -c cov-minimal-rt.c
> clang++ *.o
> ./a.out
> ========================================================

with this i could run the fuzzer against libc.a

it's a bit more work to link to libc.a than adding a -L
so i attached the scripts i used (and an example)
so others can reproduce it

c++ headers cannot be used in the test (that would
require cleaning up the libstdc++ header mess)
but i think there is no reason to use c++ for these
libc api tests anyway

you may need to adjust the directories the scripts
use (the linking may need to change when compiler-rt
is used instead of libgcc)

usage:

cd workdir
./buildfuzz.sh
./buildmusl.sh
./fuzzcompile.sh reg.c
./fuzzlink.sh reg.o
./a.out

of course to make it useful the malloc magic is
needed for more likely crashes

> The recently added afl-style counters
> (https://code.google.com/p/address-sanitizer/wiki/AsanCoverage#Coverage_counters)
> are a bit more involved, but the basic bool-per-edge is quite enough
> in most cases.
>

ok

> The fuzzer itself is written in C++ and uses STL (probably, not the
> best idea, but it makes the experiments simpler).
> Can't tell if it will be a problem with musl, but after all the fuzzer
> itself is also trivial (as well as the entire concept)
>

c++ happens to work because musl is (almost) abi
compatible with glibc on x86 so we can just link
to the glibc linked libstdc++

(this can eg fail when the c++ thread local storage
destructor abi is used, that is not implemented in
musl yet)

so yes c++ makes things more painful: you need to
recompile the entire toolchain to make it work
reliably (and then both gcc and clang have broken
assumptions about the libc so you have to patch them)
which is too much work for running tests

> > Well static linking with musl does not impose any constraint on
> > redefining functions, so you could easily use a debugging malloc that
> > lines up each allocation to end on a page boundary with a guard page
> > after it.
>
> Yea... This will slowdown fuzzing and guard pages only protect you
> from overflow in one direction (ether left, of right, but not both).
> But this is better than nothing.
>

you can run the tests twice (for left and right) :)

> > This would of course be slow and use lots of memory but
> > would catch all heap overflows. And -fstack-protector-all would catch
> > most stack-based overflows.
>
> Only stack-overflow-write by a small amount, but yes, better than nothing.
>
> BTW, writing a minimalistic asan run-time as part of musl should be a
> matter of a couple of hours.
> Probably much faster than making the current monster work with static linking.
> I'd be happy to help with such.
>

how would this look?

compile the tests and libc with asan, but instead of
linking the asan runtime from clang use a musl specific
one?
i assume for that we still need to change the libc
startup code, malloc functions and maybe some things
around thread stacks

[-- Attachment #2: buildfuzz.sh --]
[-- Type: application/x-sh, Size: 823 bytes --]

[-- Attachment #3: buildmusl.sh --]
[-- Type: application/x-sh, Size: 298 bytes --]

[-- Attachment #4: fuzzcompile.sh --]
[-- Type: application/x-sh, Size: 177 bytes --]

[-- Attachment #5: fuzzlink.sh --]
[-- Type: application/x-sh, Size: 330 bytes --]

[-- Attachment #6: reg.c --]
[-- Type: text/x-csrc, Size: 279 bytes --]

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <regex.h>

void TestOneInput(const uint8_t *p, size_t n)
{
	regex_t preg;
	regmatch_t pmatch[2];
	char *s = strndup((char*)p, n);
	if (!s)
		return;
	if (!regcomp(&preg, s, REG_EXTENDED)) {
		regexec(&preg, s, 0, pmatch, 0);
		regfree(&preg);
	}
	free(s);	/* release the NUL-terminated copy of the input */
}
* Re: buffer overflow in regcomp and a way to find more of those
From: Szabolcs Nagy @ 2015-03-21 21:03 UTC
To: Konstantin Serebryany, Rich Felker, musl

[-- Attachment #1: Type: text/plain, Size: 5167 bytes --]

i wrote some trivial test cases for

__dn_expand
__dns_parse
__pleval
fnmatch
inet_pton
strptime

to try out the concept, i've seen one crash so far:
a bus error when fuzzing inet_pton

probably a stack corruption that overwrites the location
where %rbp is stored and then the memory access relative
to rbp crashes

the fuzzing goes like:

./a.out -seed=1753234605
...
#8388608 cov: 546 bits: 0 exec/s: 838860
#16777216 cov: 546 bits: 0 exec/s: 798915
#27461772 NEW: 548 B: 0 L: 16 S: 22 I: 0 8283::2:2.8.83.3 16: 56 50 56 51 58 58 50 58 50 46 56 46 56 51 46 51
#27469404 NEW: 549 B: 0 L: 24 S: 23 I: 2 8283::2:283:2.8.83.2.833 24: 56 50 56 51 58 58 50 58 50 56 51 58 50 46 56 46 56 51 46 50 46 56 51 51
Bus error (core dumped)

is there a way to get a reproducer after such a crash?

in this case i fortunately had the core dump
and i can see the inet_pton argument in %r14
but it would be nice if there were occasional
saved check points from where i can restart
the fuzzer.

i dont yet see the bug and cannot reproduce the
issue outside the fuzzer (but i didnt try very hard)

attached the fuzz test case and the code that should
reproduce the issue, gdb session below

Core was generated by `./a.out -seed=1753234605'.
Program terminated with signal SIGBUS, Bus error.
#0  0x000000000047a05b in inet_pton (af=<optimized out>, s=<optimized out>, a0=0x20000ffffe000) at src/network/inet_pton.c:65
65                              *a++ = ip[j]>>8;
(gdb) bt
#0  0x000000000047a05b in inet_pton (af=<optimized out>, s=<optimized out>, a0=0x20000ffffe000) at src/network/inet_pton.c:65
#1  0x0000000000400769 in TestOneInput ()
#2  0x000000000040c6f3 in fuzzer::Fuzzer::RunOneMaximizeTotalCoverage(std::vector<unsigned char, std::allocator<unsigned char> > const&) ()
#3  0x000000000040c412 in fuzzer::Fuzzer::RunOne(std::vector<unsigned char, std::allocator<unsigned char> > const&) ()
#4  0x000000000040cc7c in fuzzer::Fuzzer::MutateAndTestOne(std::vector<unsigned char, std::allocator<unsigned char> >*) ()
#5  0x000000000040cffb in fuzzer::Fuzzer::Loop(unsigned long) ()
#6  0x0000000000400d4c in fuzzer::FuzzerDriver(int, char**, void (*)(unsigned char const*, unsigned long)) ()
#7  0x00000000004007dc in main ()
(gdb) disass inet_pton,+40
Dump of assembler code from 0x479b40 to 0x479b68:
   0x0000000000479b40 <inet_pton+0>:    push   %rbp
   0x0000000000479b41 <inet_pton+1>:    push   %r15
   0x0000000000479b43 <inet_pton+3>:    push   %r14
   0x0000000000479b45 <inet_pton+5>:    push   %r13
   0x0000000000479b47 <inet_pton+7>:    push   %r12
   0x0000000000479b49 <inet_pton+9>:    push   %rbx
   0x0000000000479b4a <inet_pton+10>:   sub    $0x28,%rsp
   0x0000000000479b4e <inet_pton+14>:   mov    %rdx,%r13
   0x0000000000479b51 <inet_pton+17>:   mov    %rsi,%r14
   0x0000000000479b54 <inet_pton+20>:   mov    %edi,%ebp
   0x0000000000479b56 <inet_pton+22>:   mov    $0x6de364,%edi
   0x0000000000479b5b <inet_pton+27>:   callq  0x4007f0 <__sanitizer_cov_with_check>
   0x0000000000479b60 <inet_pton+32>:   cmp    $0xa,%ebp
   0x0000000000479b63 <inet_pton+35>:   jne    0x479ba6 <inet_pton+102>
   0x0000000000479b65 <inet_pton+37>:   mov    $0x6de3c8,%edi
End of assembler dump.
(gdb) disass /m 0x000000000047a020,+64
Dump of assembler code from 0x47a020 to 0x47a060:
62                              for (j=0; j<7-i; j++) ip[brk+j] = 0;
   0x000000000047a02a <inet_pton+1258>: callq  0x4007f0 <__sanitizer_cov_with_check>
   0x000000000047a02f <inet_pton+1263>: xor    %ebx,%ebx
   0x000000000047a031 <inet_pton+1265>: mov    0x8(%rsp),%rbp
   0x000000000047a036 <inet_pton+1270>: mov    0x4(%rsp),%r15d
   0x000000000047a03b <inet_pton+1275>: jmp    0x47a04d <inet_pton+1293>
   0x000000000047a03d <inet_pton+1277>: nopl   (%rax)

63                      }
64                      for (j=0; j<8; j++) {
   0x000000000047a040 <inet_pton+1280>: inc    %rbx
   0x000000000047a043 <inet_pton+1283>: mov    $0x6de46c,%edi
   0x000000000047a048 <inet_pton+1288>: callq  0x4007f0 <__sanitizer_cov_with_check>
   0x000000000047a04d <inet_pton+1293>: mov    $0x6de468,%edi

65                              *a++ = ip[j]>>8;
   0x000000000047a052 <inet_pton+1298>: callq  0x4007f0 <__sanitizer_cov_with_check>
   0x000000000047a057 <inet_pton+1303>: mov    0x11(%rsp,%rbx,2),%al
=> 0x000000000047a05b <inet_pton+1307>: mov    %al,0x0(%rbp,%rbx,2)

66                              *a++ = ip[j];
   0x000000000047a05f <inet_pton+1311>: mov    0x10(%rsp,%rbx,2),%al
   0x000000000047a063 <inet_pton+1315>: mov    %al,0x1(%rbp,%rbx,2)

End of assembler dump.
(gdb) i reg
rax            0x7fffffffdf00   140737488346880
rbx            0x0      0
rcx            0x0      0
rdx            0x0      0
rsi            0x7fffffffdfb2   140737488347058
rdi            0x6de468 7201896
rbp            0x20000ffffe000  0x20000ffffe000
rsp            0x7fffffffdf80   0x7fffffffdf80
r8             0x7fffffffdf3a   140737488346938
r9             0x0      0
r10            0x0      0
r11            0x246    582
r12            0x10     16
r13            0x7      7
r14            0x6e2dc3 7220675
r15            0x1      1
rip            0x47a05b 0x47a05b <inet_pton+1307>
eflags         0x10202  [ IF RF ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x63     99
gs             0x0      0
(gdb) p (char*)0x6e2dc3
$3 = 0x6e2dc3 "2.8288;3:33::2.82.83333"
(gdb)

[-- Attachment #2: inet_pton_fuzz.c --]
[-- Type: text/x-csrc, Size: 223 bytes --]

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <arpa/inet.h>

void TestOneInput(const uint8_t *p, size_t n)
{
	char buf[16];
	char *s = strndup((char*)p, n);
	inet_pton(AF_INET6, s, buf);
	free(s);
}

[-- Attachment #3: inet_pton_reporoduce.c --]
[-- Type: text/x-csrc, Size: 136 bytes --]

#include <arpa/inet.h>

static const char s[] = "2.8288;3:33::2.82.83333";

int main()
{
	char buf[16];
	inet_pton(AF_INET6, s, buf);
}
* Re: buffer overflow in regcomp and a way to find more of those
From: Szabolcs Nagy @ 2015-03-21 21:38 UTC
To: Konstantin Serebryany, Rich Felker, musl

* Szabolcs Nagy <nsz@port70.net> [2015-03-21 22:03:02 +0100]:
...
> r12            0x10     16
> r13            0x7      7
> r14            0x6e2dc3 7220675
> r15            0x1      1
> rip            0x47a05b 0x47a05b <inet_pton+1307>
> eflags         0x10202  [ IF RF ]
> cs             0x33     51
> ss             0x2b     43
> ds             0x0      0
> es             0x0      0
> fs             0x63     99
> gs             0x0      0
> (gdb) p (char*)0x6e2dc3
> $3 = 0x6e2dc3 "2.8288;3:33::2.82.83333"
> (gdb)

ah.. r14 is incremented as the string is parsed
the original string is

(gdb) p (char*)0x6e2dc3-35
$37 = 0x6e2da0 "8:a:2:8:3:28:8::2:83:20:8:2:833:23:2.8288;3:33::2.82.83333"

with this i can reproduce the crash
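Once the crashing string is known, it is convenient to replay it through the same TestOneInput outside the fuzzer instead of writing a separate reproducer by hand. A minimal sketch, assuming the input has been saved to a file: replay_file is a hypothetical helper, and a standalone driver would call it from main for each path on its command line. The target shown is the inet_pton harness from this thread, with a NULL check added.

```c
#define _POSIX_C_SOURCE 200809L   /* strndup, mkstemp, fdopen */
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* The inet_pton fuzz target from this thread, used as the example target. */
void TestOneInput(const uint8_t *p, size_t n)
{
    char buf[16];
    char *s = strndup((const char *)p, n);
    if (!s)
        return;
    inet_pton(AF_INET6, s, buf);
    free(s);
}

/* Hypothetical replay helper (not part of the fuzzer): read the whole file
   at path and run target on it once. Returns the number of bytes replayed,
   or -1 if the file could not be read into memory. */
long replay_file(const char *path, void (*target)(const uint8_t *, size_t))
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t cap = 4096, len = 0, got;
    uint8_t *buf = malloc(cap);
    while (buf && (got = fread(buf + len, 1, cap - len, f)) > 0) {
        len += got;
        if (len == cap) {                  /* buffer full: grow it */
            uint8_t *bigger = realloc(buf, cap * 2);
            if (!bigger) {
                free(buf);
                buf = NULL;
                break;
            }
            buf = bigger;
            cap *= 2;
        }
    }
    int failed = !buf || ferror(f);
    fclose(f);
    if (!failed)
        target(buf, len);
    free(buf);
    return failed ? -1 : (long)len;
}
```

Saving the string found above to a file and replaying it this way runs exactly the code path the fuzzer exercised, which is useful when a hand-written reproducer (like the one attached earlier) turns out not to trigger the bug.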
* Re: buffer overflow in regcomp and a way to find more of those
From: Szabolcs Nagy @ 2015-03-21 22:13 UTC
To: Konstantin Serebryany, Rich Felker, musl

[-- Attachment #1: Type: text/plain, Size: 546 bytes --]

* Szabolcs Nagy <nsz@port70.net> [2015-03-21 22:38:25 +0100]:
> ah.. r14 is incremented as the string is parsed
> the original string is
>
> (gdb) p (char*)0x6e2dc3-35
> $37 = 0x6e2da0 "8:a:2:8:3:28:8::2:83:20:8:2:833:23:2.8288;3:33::2.82.83333"
>
> with this i can reproduce the crash

i assume

1:2:3:4:5:6:7::

is invalid ipv6 address

currently musl gets the :: handling wrong at the end
and it goes on clobbering memory, i guess this is
security critical issue

a possible fix is attached but probably the code
should be made clearer here

[-- Attachment #2: inet_pton.diff --]
[-- Type: text/x-diff, Size: 358 bytes --]

diff --git a/src/network/inet_pton.c b/src/network/inet_pton.c
index 4496b47..e4cdad5 100644
--- a/src/network/inet_pton.c
+++ b/src/network/inet_pton.c
@@ -38,6 +38,7 @@ int inet_pton(int af, const char *restrict s, void *restrict a0)
 
 	for (i=0; ; i++) {
 		if (s[0]==':' && brk<0) {
+			if (i==7) return 0;
 			brk=i;
 			ip[i]=0;
 			if (!*++s) break;
* Re: buffer overflow in regcomp and a way to find more of those
From: Justin Cormack @ 2015-03-22 6:36 UTC
To: musl, Konstantin Serebryany, Rich Felker

On 21 March 2015 at 22:13, Szabolcs Nagy <nsz@port70.net> wrote:
> * Szabolcs Nagy <nsz@port70.net> [2015-03-21 22:38:25 +0100]:
>> ah.. r14 is incremented as the string is parsed
>> the original string is
>>
>> (gdb) p (char*)0x6e2dc3-35
>> $37 = 0x6e2da0 "8:a:2:8:3:28:8::2:83:20:8:2:833:23:2.8288;3:33::2.82.83333"
>>
>> with this i can reproduce the crash
>
> i assume
>
> 1:2:3:4:5:6:7::
>
> is invalid ipv6 address

No, it is valid, the last :: expands to :0. RFC 2373 says "The "::"
can also be used to compress the leading and/or trailing zeros in an
address."

Justin
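Justin's reading implies that a fix should accept a trailing :: rather than reject it; what must be rejected is a :: that would stand for zero groups, as in 1:2:3:4:5:6:7:8::. The expansion step, where the crash happened, can be written with explicit bounds as in the sketch below. It is hypothetical, not musl's code: expand_groups assumes the parser has already stored the n groups it read in ip[0..n-1] and recorded in brk the index at which :: appeared (-1 if it did not).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of the "::" expansion step of an IPv6 parser.
   ip[0..n-1] holds the groups parsed so far and brk is the index where
   "::" occurred, or -1. On success ip holds exactly 8 groups and 1 is
   returned; 0 means the group count cannot form a valid address. */
int expand_groups(uint16_t ip[8], int n, int brk)
{
    if (n < 0 || n > 8 || brk > n)
        return 0;
    if (brk < 0)
        return n == 8;            /* no "::": all eight groups must be present */
    if (n == 8)
        return 0;                 /* "::" must stand for at least one zero group */
    int tail = n - brk;           /* groups written after the "::" */
    memmove(ip + 8 - tail, ip + brk, (size_t)tail * sizeof ip[0]);
    for (int j = brk; j < 8 - tail; j++)
        ip[j] = 0;                /* fill the gap that "::" stands for */
    return 1;
}
```

With this formulation 1:2:3:4:5:6:7:: (n = 7, brk = 7) simply gets a zero last group, and no index ever leaves ip[0..7].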
* Re: buffer overflow in regcomp and a way to find more of those
From: Konstantin Serebryany @ 2015-03-23 5:02 UTC
To: Konstantin Serebryany, Rich Felker, musl

On Sat, Mar 21, 2015 at 2:03 PM, Szabolcs Nagy <nsz@port70.net> wrote:
> i wrote some trivial test cases for
>
> __dn_expand
> __dns_parse
> __pleval
> fnmatch
> inet_pton
> strptime

Cool!
Is there something you plan to have in the repository or share some other way?

>
> to try out the concept, i've seen one crash so far:
> a bus error when fuzzing inet_pton
>
> probably a stack corruption that overwrites the location
> where %rbp is stored and then the memory access relative
> to rbp crashes
>
> the fuzzing goes like:
>
> ./a.out -seed=1753234605
> ...
> #8388608 cov: 546 bits: 0 exec/s: 838860
> #16777216 cov: 546 bits: 0 exec/s: 798915

This looks good. "exec/s: 798915" means that even with a relatively weak search
algorithm you can find lots of paths.

> #27461772 NEW: 548 B: 0 L: 16 S: 22 I: 0 8283::2:2.8.83.3 16: 56 50 56 51 58 58 50 58 50 46 56 46 56 51 46 51
> #27469404 NEW: 549 B: 0 L: 24 S: 23 I: 2 8283::2:283:2.8.83.2.833 24: 56 50 56 51 58 58 50 58 50 56 51 58 50 46 56 46 56 51 46 50 46 56 51 51
> Bus error (core dumped)
>
> is there a way to get a reproducer after such a crash?
>

the fuzzer relies on asan to call the at-crash handler -- this is what
__sanitizer_set_death_callback is for.
w/o asan you can set up a signal handler that will print
fuzzer::Fuzzer::CurrentUnit.
If everything else fails you can of course re-run the fuzzer with the same seed.

> in this case i fortunately had the core dump
> and i can see the inet_pton argument in %r14
> but it would be nice if there were occasional
> saved check points from where i can restart
> the fuzzer.
> > i dont yet see the bug and cannot reproduce the > issue outside the fuzzer (but i didnt try very hard) > > attached the fuzz test case and the code that should > reproduce the issue, gdb session below > > Core was generated by `./a.out -seed=1753234605'. > Program terminated with signal SIGBUS, Bus error. > #0 0x000000000047a05b in inet_pton (af=<optimized out>, s=<optimized out>, a0=0x20000ffffe000) at src/network/inet_pton.c:65 > 65 *a++ = ip[j]>>8; > (gdb) bt > #0 0x000000000047a05b in inet_pton (af=<optimized out>, s=<optimized out>, a0=0x20000ffffe000) at src/network/inet_pton.c:65 > #1 0x0000000000400769 in TestOneInput () > #2 0x000000000040c6f3 in fuzzer::Fuzzer::RunOneMaximizeTotalCoverage(std::vector<unsigned char, std::allocator<unsigned char> > const&) () > #3 0x000000000040c412 in fuzzer::Fuzzer::RunOne(std::vector<unsigned char, std::allocator<unsigned char> > const&) () > #4 0x000000000040cc7c in fuzzer::Fuzzer::MutateAndTestOne(std::vector<unsigned char, std::allocator<unsigned char> >*) () > #5 0x000000000040cffb in fuzzer::Fuzzer::Loop(unsigned long) () > #6 0x0000000000400d4c in fuzzer::FuzzerDriver(int, char**, void (*)(unsigned char const*, unsigned long)) () > #7 0x00000000004007dc in main () > (gdb) disass inet_pton,+40 > Dump of assembler code from 0x479b40 to 0x479b68: > 0x0000000000479b40 <inet_pton+0>: push %rbp > 0x0000000000479b41 <inet_pton+1>: push %r15 > 0x0000000000479b43 <inet_pton+3>: push %r14 > 0x0000000000479b45 <inet_pton+5>: push %r13 > 0x0000000000479b47 <inet_pton+7>: push %r12 > 0x0000000000479b49 <inet_pton+9>: push %rbx > 0x0000000000479b4a <inet_pton+10>: sub $0x28,%rsp > 0x0000000000479b4e <inet_pton+14>: mov %rdx,%r13 > 0x0000000000479b51 <inet_pton+17>: mov %rsi,%r14 > 0x0000000000479b54 <inet_pton+20>: mov %edi,%ebp > 0x0000000000479b56 <inet_pton+22>: mov $0x6de364,%edi > 0x0000000000479b5b <inet_pton+27>: callq 0x4007f0 <__sanitizer_cov_with_check> > 0x0000000000479b60 <inet_pton+32>: cmp $0xa,%ebp > 
0x0000000000479b63 <inet_pton+35>: jne 0x479ba6 <inet_pton+102> > 0x0000000000479b65 <inet_pton+37>: mov $0x6de3c8,%edi > End of assembler dump. > (gdb) disass /m 0x000000000047a020,+64 > Dump of assembler code from 0x47a020 to 0x47a060: > 62 for (j=0; j<7-i; j++) ip[brk+j] = 0; > 0x000000000047a02a <inet_pton+1258>: callq 0x4007f0 <__sanitizer_cov_with_check> > 0x000000000047a02f <inet_pton+1263>: xor %ebx,%ebx > 0x000000000047a031 <inet_pton+1265>: mov 0x8(%rsp),%rbp > 0x000000000047a036 <inet_pton+1270>: mov 0x4(%rsp),%r15d > 0x000000000047a03b <inet_pton+1275>: jmp 0x47a04d <inet_pton+1293> > 0x000000000047a03d <inet_pton+1277>: nopl (%rax) > > 63 } > 64 for (j=0; j<8; j++) { > 0x000000000047a040 <inet_pton+1280>: inc %rbx > 0x000000000047a043 <inet_pton+1283>: mov $0x6de46c,%edi > 0x000000000047a048 <inet_pton+1288>: callq 0x4007f0 <__sanitizer_cov_with_check> > 0x000000000047a04d <inet_pton+1293>: mov $0x6de468,%edi > > 65 *a++ = ip[j]>>8; > 0x000000000047a052 <inet_pton+1298>: callq 0x4007f0 <__sanitizer_cov_with_check> > 0x000000000047a057 <inet_pton+1303>: mov 0x11(%rsp,%rbx,2),%al > => 0x000000000047a05b <inet_pton+1307>: mov %al,0x0(%rbp,%rbx,2) > > 66 *a++ = ip[j]; > 0x000000000047a05f <inet_pton+1311>: mov 0x10(%rsp,%rbx,2),%al > 0x000000000047a063 <inet_pton+1315>: mov %al,0x1(%rbp,%rbx,2) > > End of assembler dump. > (gdb) i reg > rax 0x7fffffffdf00 140737488346880 > rbx 0x0 0 > rcx 0x0 0 > rdx 0x0 0 > rsi 0x7fffffffdfb2 140737488347058 > rdi 0x6de468 7201896 > rbp 0x20000ffffe000 0x20000ffffe000 > rsp 0x7fffffffdf80 0x7fffffffdf80 > r8 0x7fffffffdf3a 140737488346938 > r9 0x0 0 > r10 0x0 0 > r11 0x246 582 > r12 0x10 16 > r13 0x7 7 > r14 0x6e2dc3 7220675 > r15 0x1 1 > rip 0x47a05b 0x47a05b <inet_pton+1307> > eflags 0x10202 [ IF RF ] > cs 0x33 51 > ss 0x2b 43 > ds 0x0 0 > es 0x0 0 > fs 0x63 99 > gs 0x0 0 > (gdb) p (char*)0x6e2dc3 > $3 = 0x6e2dc3 "2.8288;3:33::2.82.83333" > (gdb)
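A sketch of the signal-handler approach kcc describes for runs without asan. It does not touch the fuzzer's internal `fuzzer::Fuzzer::CurrentUnit`; instead it assumes (hypothetically) that the harness copies each input into its own buffer via `fuzz_set_current_input` before calling the target, and dumps that buffer as hex on `SIGSEGV` or `SIGBUS`. The handler runs on an alternate signal stack, since the crash discussed here looked like stack corruption:

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define MAX_INPUT 4096

/* Hypothetical harness state: a copy of the input currently under test. */
static unsigned char current_input[MAX_INPUT];
static size_t current_len;

/* Hex-encode in[0..n) into out (no trailing NUL); returns chars written,
 * at most cap. Makes no library calls, so it is usable in a handler. */
size_t hex_encode(const unsigned char *in, size_t n, char *out, size_t cap)
{
    static const char digits[] = "0123456789abcdef";
    size_t k = 0;

    for (size_t i = 0; i < n && k + 2 <= cap; i++) {
        out[k++] = digits[in[i] >> 4];
        out[k++] = digits[in[i] & 15];
    }
    return k;
}

/* The harness calls this right before each run of the target. */
void fuzz_set_current_input(const unsigned char *data, size_t n)
{
    if (n > MAX_INPUT)
        n = MAX_INPUT;
    memcpy(current_input, data, n);
    current_len = n;
}

/* Print the crashing input to stderr. SA_RESETHAND has already restored
 * the default action, so raise() terminates the process with the same
 * signal (immediately, or when the handler returns if still blocked). */
static void crash_handler(int sig)
{
    char buf[2 * MAX_INPUT + 1];
    size_t k = hex_encode(current_input, current_len, buf, sizeof buf - 1);
    ssize_t r;

    buf[k++] = '\n';
    r = write(STDERR_FILENO, buf, k);
    (void)r;
    raise(sig);
}

void fuzz_install_crash_handler(void)
{
    /* Separate stack, in case the crash came from stack corruption. */
    static char altstack[1 << 16];
    stack_t ss = { .ss_sp = altstack, .ss_size = sizeof altstack };
    struct sigaction sa;

    sigaltstack(&ss, NULL);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = crash_handler;
    sa.sa_flags = SA_ONSTACK | SA_RESETHAND;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
    sigaction(SIGBUS, &sa, NULL);
}
```

The hex line on stderr is enough to rebuild the input and rerun it outside the fuzzer. The names `hex_encode`, `fuzz_set_current_input` and `fuzz_install_crash_handler` are illustrative, not part of LibFuzzer.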
* Re: buffer overflow in regcomp and a way to find more of those 2015-03-23 5:02 ` Konstantin Serebryany @ 2015-03-23 12:25 ` Szabolcs Nagy 2015-03-23 15:56 ` Konstantin Serebryany 0 siblings, 1 reply; 42+ messages in thread From: Szabolcs Nagy @ 2015-03-23 12:25 UTC (permalink / raw) To: Konstantin Serebryany; +Cc: Rich Felker, musl * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-22 22:02:48 -0700]: > On Sat, Mar 21, 2015 at 2:03 PM, Szabolcs Nagy <nsz@port70.net> wrote: > > i wrote some trivial test cases for > > > > __dn_expand > > __dns_parse > > __pleval > > fnmatch > > inet_pton > > strptime > > Cool! Is there something you plan to have in the repository or share > some other way? > (musl does not have extra tools/docs/tests in the main repo, this is what you want eg for toolchain builds and packaging) but i plan to release the tests somewhere (currently they just trivial calls into the relevant libc function) i don't know what's the best way to fuzz more than one argument eg fnmatch(pattern, string, flags) is it ok to just split the input data between the args? (i havent looked under the hood how the fuzzer mutates the input) > > #27461772 NEW: 548 B: 0 L: 16 S: 22 I: 0 8283::2:2.8.83.3 16: 56 50 56 51 58 58 50 58 50 46 56 46 56 51 46 51 > > #27469404 NEW: 549 B: 0 L: 24 S: 23 I: 2 8283::2:283:2.8.83.2.833 24: 56 50 56 51 58 58 50 58 50 56 51 58 50 46 56 46 56 51 46 50 46 56 51 51 > > Bus error (core dumped) > > > > is there a way to get a reproducer after such a crash? > > > > the fuzzer relies on asan to call at-crash handler -- this is what > __sanitizer_set_death_callback is for. > w/o asan you can set up a signal handler that will print > fuzzer::Fuzzer::CurrentUnit. > If everything else fails you can of course re-run the fuzzer with > the same seed. > thanks, sounds good
* Re: buffer overflow in regcomp and a way to find more of those 2015-03-23 12:25 ` Szabolcs Nagy @ 2015-03-23 15:56 ` Konstantin Serebryany 0 siblings, 0 replies; 42+ messages in thread From: Konstantin Serebryany @ 2015-03-23 15:56 UTC (permalink / raw) To: Konstantin Serebryany, Rich Felker, musl On Mon, Mar 23, 2015 at 5:25 AM, Szabolcs Nagy <nsz@port70.net> wrote: > * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-22 22:02:48 -0700]: >> On Sat, Mar 21, 2015 at 2:03 PM, Szabolcs Nagy <nsz@port70.net> wrote: >> > i wrote some trivial test cases for >> > >> > __dn_expand >> > __dns_parse >> > __pleval >> > fnmatch >> > inet_pton >> > strptime >> >> Cool! Is there something you plan to have in the repository or share >> some other way? >> > > (musl does not have extra tools/docs/tests in the main repo, > this is what you want eg for toolchain builds and packaging) > > but i plan to release the tests somewhere > (currently they just trivial calls into the relevant libc function) > > i don't know what's the best way to fuzz more than one argument > eg fnmatch(pattern, string, flags) Yes, splitting the input bytes between the args is the most straightforward way. Although sharing the input bytes (e.g. fnmatch(X, X, X[0])) was surprisingly interesting too. > > is it ok to just split the input data between the args? > (i havent looked under the hood how the fuzzer mutates the input) > >> > #27461772 NEW: 548 B: 0 L: 16 S: 22 I: 0 8283::2:2.8.83.3 16: 56 50 56 51 58 58 50 58 50 46 56 46 56 51 46 51 >> > #27469404 NEW: 549 B: 0 L: 24 S: 23 I: 2 8283::2:283:2.8.83.2.833 24: 56 50 56 51 58 58 50 58 50 56 51 58 50 46 56 46 56 51 46 50 46 56 51 51 >> > Bus error (core dumped) >> > >> > is there a way to get a reproducer after such a crash? >> > >> >> the fuzzer relies on asan to call at-crash handler -- this is what >> __sanitizer_set_death_callback is for. >> w/o asan you can set up a signal handler that will print >> fuzzer::Fuzzer::CurrentUnit. 
>> If everything else fails you can of course re-run the fuzzer with >> the same seed. >> > > thanks, sounds good >
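Both strategies kcc mentions, sketched for `fnmatch`. The target uses the callback signature visible in the backtrace earlier in the thread (`void (*)(unsigned char const*, unsigned long)`, called as `TestOneInput`); the split policy in the hypothetical helper `split_input` (first byte as flags, remaining bytes halved) is just one possible choice:

```c
#include <fnmatch.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: data[0] becomes the flags argument, the rest is
 * split in half into two NUL-terminated strings.
 * Returns 0 on success; the caller frees *pat and *str. */
int split_input(const unsigned char *data, unsigned long size,
                char **pat, char **str, int *flags)
{
    unsigned long half;

    if (size < 1)
        return -1;
    *flags = data[0];
    data++;
    size--;
    half = size / 2;
    *pat = malloc(half + 1);
    *str = malloc(size - half + 1);
    if (!*pat || !*str) {
        free(*pat);
        free(*str);
        return -1;
    }
    memcpy(*pat, data, half);
    (*pat)[half] = '\0';
    memcpy(*str, data + half, size - half);
    (*str)[size - half] = '\0';
    return 0;
}

void TestOneInput(const unsigned char *data, unsigned long size)
{
    char *pat, *str, *whole;
    int flags;

    /* Variant 1: split the input bytes between the arguments. */
    if (split_input(data, size, &pat, &str, &flags) == 0) {
        fnmatch(pat, str, flags);
        free(pat);
        free(str);
    }

    /* Variant 2: share the same bytes, as in fnmatch(X, X, X[0]). */
    whole = malloc(size + 1);
    if (whole && size > 0) {
        memcpy(whole, data, size);
        whole[size] = '\0';
        fnmatch(whole, whole, data[0]);
    }
    free(whole);
}
```

Because the fuzzer mutates bytes without knowing where the split falls, an input that grows or shrinks shifts the boundary between pattern and string; splitting at a delimiter byte (e.g. the first NUL) is an alternative rule with different mutation behavior.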
* Re: buffer overflow in regcomp and a way to find more of those 2015-03-21 13:28 ` Szabolcs Nagy 2015-03-21 21:03 ` Szabolcs Nagy @ 2015-03-23 4:55 ` Konstantin Serebryany 2015-03-23 12:35 ` Szabolcs Nagy 1 sibling, 1 reply; 42+ messages in thread From: Konstantin Serebryany @ 2015-03-23 4:55 UTC (permalink / raw) To: Konstantin Serebryany, Rich Felker, musl On Sat, Mar 21, 2015 at 6:28 AM, Szabolcs Nagy <nsz@port70.net> wrote: > * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 23:05:13 -0700]: >> On Fri, Mar 20, 2015 at 7:20 PM, Rich Felker <dalias@libc.org> wrote: >> > On Fri, Mar 20, 2015 at 07:14:33PM -0700, Konstantin Serebryany wrote: >> >> If you build the source with "-fsanitize=leak -fsanitize-coverage=4 >> >> -O1" the compiler will not insert any of the asan instrumentation >> >> and only insert calls to a couple of functions needed for coverage. >> >> Then, instead of linking with the full asan+coverage run-time, you >> >> will need a very simple re-implementation of coverage-only runtime. >> > >> > Could the existing runtime be used, just stripped down? 
>> >> Yes, but for the basic functionality needed by the fuzzer it's simpler >> to write it from scratch, see below: >> >> ======================================================== >> svn co http://llvm.org/svn/llvm-project/llvm/trunk/lib/Fuzzer >> cat <<EOF >cov-minimal-rt.c >> static long counter; >> void __sanitizer_cov_with_check(int *guard) { >> if (*guard == 0) { >> counter++; >> *guard=1; >> } >> } >> long __sanitizer_get_total_unique_coverage() { return counter; } >> void __sanitizer_cov_module_init() {} >> void __sanitizer_reset_coverage(){} >> void __sanitizer_get_coverage_guards(){} >> void __sanitizer_get_number_of_counters(){} >> void __sanitizer_update_counter_bitset_and_clear_counters(){} >> void __sanitizer_set_death_callback(){} >> EOF >> >> clang -std=c++11 -c Fuzzer/Fuzzer*.cpp -I Fuzzer >> clang -std=c++11 -fsanitize=leak -fsanitize-coverage=3 -mllvm >> -sanitizer-coverage-block-threshold=0 Fuzzer/test/SimpleTest.cpp -c >> clang -c cov-minimal-rt.c >> clang++ *.o >> ./a.out >> ======================================================== > > with this i could run the fuzzer against libc.a > > it's a bit more work to link to libc.a than adding > a -L so i attached the scripts i used (and an example) > so others can reproduce it > > c++ headers cannot be used in the test (that would > require cleaning up the libstdc++ header mess) > > but i think there is no reason to use c++ for these > libc api tests anyway Sure. 
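To see what the minimal runtime above actually measures, here is a hand-instrumented sketch. `cov_hit` and `total_unique_coverage` are hypothetical stand-ins for `__sanitizer_cov_with_check` and `__sanitizer_get_total_unique_coverage`, and the calls inside `branchy` imitate what `-fsanitize-coverage` inserts at each instrumented block:

```c
/* Stand-in for __sanitizer_cov_with_check: one guard per instrumented
 * block; only the first hit of each guard increments the counter. */
static long counter;

void cov_hit(int *guard)
{
    if (*guard == 0) {
        counter++;
        *guard = 1;
    }
}

/* Stand-in for __sanitizer_get_total_unique_coverage. */
long total_unique_coverage(void)
{
    return counter;
}

/* Hand-instrumented target: the compiler would emit these calls. */
static int guards[3];

int branchy(int x)
{
    cov_hit(&guards[0]);        /* entry block */
    if (x > 10) {
        cov_hit(&guards[1]);    /* taken branch */
        return 1;
    }
    cov_hit(&guards[2]);        /* fall-through branch */
    return 0;
}
```

Roughly speaking, an input that raises the total (the `NEW:` lines in the fuzzer log earlier in the thread) is kept for further mutation, and one that does not is dropped; one bit per guard is the "bool-per-edge" coverage kcc refers to.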
> > you may need to adjust the directories the scripts use > > (the linking may need to change when compiler-rt is > used instead of libgcc) > > usage: > > cd workdir > ./buildfuzz.sh > ./buildmusl.sh > ./fuzzcompile.sh reg.c > ./fuzzlink.sh reg.o > ./a.out > > of course to make it useful the malloc magic is needed for > more likely crashes > >> The recently added afl-style counters >> (https://code.google.com/p/address-sanitizer/wiki/AsanCoverage#Coverage_counters) >> are a bit more involved, but the basic bool-per-edge is quite enough >> in most cases. >> > > ok > >> The fuzzer itself is written in C++ and uses STL (probably, not the >> best idea, but it makes the experiments simpler). >> Can't tell if it will be a problem with musl, but after all the fuzzer >> itself is also trivial (as well as the entire concept) >> > > c++ happens to work because musl is (almost) abi compatible with > glibc on x86 so we can just link to the glibc linked libstdc++ > > (this can eg fail when the c++ thread local storage destructor > abi is used, that is not implemented in musl yet) > > so yes c++ makes things more painful: you need to recompile the > entire toolchain to make it work reliably (and then both gcc > and clang have broken assumptions about the libc so you have to > patch them) which is too much work for running tests > >> > Well static linking with musl does not impose any constraint on >> > redefining functions, so you could easily use a debugging malloc that >> > lines up each allocation to end on a page boundary with a guard page >> > after it. >> >> Yea... This will slow down fuzzing and guard pages only protect you >> from overflow in one direction (either left or right, but not both). >> But this is better than nothing. >> > > you can run the tests twice (for left and right) :) > >> > This would of course be slow and use lots of memory but >> > would catch all heap overflows. And -fstack-protector-all would catch >> > most stack-based overflows.
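Rich's debugging-malloc idea, sketched for one direction. This assumes Linux/POSIX `mmap` and `mprotect`; `guard_malloc` and `guard_free` are hypothetical names, and a real replacement would also need `calloc`, `realloc` and friends. Each block ends at a page boundary followed by a `PROT_NONE` page, so the first byte past the (rounded) end faults:

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define HDR 16  /* bookkeeping stored just below each block */

/* Sizes are rounded up to 16 to keep malloc-like alignment, so an
 * overrun into that rounding slack (at most 15 bytes) is not caught. */
void *guard_malloc(size_t n)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t user = (n + 15) & ~(size_t)15;
    size_t data_pages = (user + HDR + page - 1) / page;
    size_t total = (data_pages + 1) * page;
    unsigned char *base, *p;

    base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    /* Make the last page of the mapping inaccessible. */
    if (mprotect(base + data_pages * page, page, PROT_NONE) != 0) {
        munmap(base, total);
        return NULL;
    }
    /* Place the block so that it ends exactly at the guard page. */
    p = base + data_pages * page - user;
    ((size_t *)(p - HDR))[0] = (size_t)(uintptr_t)base;
    ((size_t *)(p - HDR))[1] = total;
    return p;
}

void guard_free(void *ptr)
{
    if (ptr) {
        size_t *hdr = (size_t *)((unsigned char *)ptr - HDR);
        munmap((void *)(uintptr_t)hdr[0], hdr[1]);
    }
}
```

The mirrored layout (block starting right after a guard page, bookkeeping kept elsewhere) would catch underflows instead, which is what running the tests a second time "for left" would use.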
>> >> Only stack-overflow-write by a small amount, but yes, better than nothing. >> >> BTW, writing a minimalistic asan run-time as part of musl should be a >> matter of a couple of hours. >> Probably much faster than making the current monster work with static linking. >> I'd be happy to help with such. >> > > how would this look? > > compile the tests and libc with asan, but instead of linking the > asan runtime from clang use a musl specific one? Yes > > i assume for that we still need to change the libc startup code, malloc > functions and may be some things around thread stacks Try to compile a simple file with asan: int main(int argc, char **argv) { int a[10]; a[argc * 10] = 0; return 0; } % clang -fsanitize=address a.c -c % nm a.o | grep U U __asan_init_v5 U __asan_option_detect_stack_use_after_return U __asan_report_store4 U __asan_stack_malloc_1 __asan_report_store4 should print an error message saying that "bad write of 4 bytes" happened in <current stack trace> on address <param>. Also make other __asan_report_{store,load}{1,2,4,8,16} __asan_init_v5 will be called by the module initializer. When called for the first time, it should mmap the shadow memory. https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerAlgorithm __asan_option_detect_stack_use_after_return is a global, define it to 0. __asan_stack_malloc_1 -- just make it an empty function. Now, you can build a code with asan and detect stack buffer overflows. (The reports won't be very detailed, but they will be correct). If you add poisoned redzones to malloc -- you get heap buffer overflows. If you delay the reuse of free-d memory -- you get use-after-free. If you then implement __asan_register_globals (it is called on module initialization and poisons redzones for globals) you get global buffer overflows. The current asan run-time is large and hairy because it attempts to be thread-friendly, intercepts lots of libc, and provides very detailed error messages.
W/o all that, the run-time will easily fit in < 100 LOC, which can be a part of a libc implementation. hth, --kcc
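The check that instrumented code performs before calling `__asan_report_store4` can be sketched from the AddressSanitizerAlgorithm page linked above. Each shadow byte describes one 8-byte granule. This hedged sketch covers 1-, 2- and 4-byte accesses only (for 8-byte accesses any nonzero shadow byte is an error, and 16-byte accesses test two shadow bytes), and `access_is_bad` is a hypothetical name:

```c
#include <stdint.h>

/* Returns 1 if an access of `size` bytes (1, 2 or 4) at `addr` touches
 * unaddressable memory, given the shadow byte k of addr's granule:
 *   k == 0      -> all 8 bytes of the granule are addressable
 *   1 <= k < 8  -> only the first k bytes are addressable
 *   k < 0       -> whole granule poisoned (redzone, freed memory, ...) */
int access_is_bad(uintptr_t addr, unsigned size, int8_t k)
{
    if (k == 0)
        return 0;
    /* Offset of the last byte touched, within the granule. */
    return (int)((addr & 7) + size - 1) >= k;
}
```

In kcc's `a[argc * 10] = 0` example, the 4-byte store lands just past `a`, in a redzone whose shadow bytes hold a negative value, so the check fires and the instrumented code calls `__asan_report_store4`.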
* Re: buffer overflow in regcomp and a way to find more of those 2015-03-23 4:55 ` Konstantin Serebryany @ 2015-03-23 12:35 ` Szabolcs Nagy 2015-03-23 14:40 ` stephen Turner 2015-03-28 22:00 ` Szabolcs Nagy 0 siblings, 2 replies; 42+ messages in thread From: Szabolcs Nagy @ 2015-03-23 12:35 UTC (permalink / raw) To: Konstantin Serebryany; +Cc: Rich Felker, musl * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-22 21:55:26 -0700]: > On Sat, Mar 21, 2015 at 6:28 AM, Szabolcs Nagy <nsz@port70.net> wrote: > > * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 23:05:13 -0700]: > >> BTW, writing a minimalistic asan run-time as part of musl should be a > >> matter of a couple of hours. > >> Probably much faster than making the current monster work with static linking. > >> I'd be happy to help with such. > >> > > > > how would this look? > > > > compile the tests and libc with asan, but instead of linking the > > asan runtime from clang use a musl specific one? > > Yes > > > > i assume for that we still need to change the libc startup code, malloc > > functions and may be some things around thread stacks > > Try to compile a simple file with asan: > > int main(int argc, char **argv) { > int a[10]; > a[argc * 10] = 0; > return 0; > } > > > % clang -fsanitize=address a.c -c > > % nm a.o | grep U > U __asan_init_v5 > U __asan_option_detect_stack_use_after_return > U __asan_report_store4 > U __asan_stack_malloc_1 > > __asan_report_store4 should print an error message saying that > "bad write of 4 bytes" happened in <current stack trace> on address <param>. > Also make other __asan_report_{store,load}{1,2,4,8,16} > > __asan_init_v5 will be called by the module initializer. > When called for the first time, it should mmap the shadow memory. > https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerAlgorithm > > __asan_option_detect_stack_use_after_return is a global, define it to 0. 
> __asan_stack_malloc_1 -- just make it an empty function. > > Now, you can build a code with asan and detect stack buffer overflows. > (The reports won't be very detailed, but they will be correct). > If you add poisoned redzones to malloc -- you get heap buffer overflows. > If you delay the reuse of free-d memory -- you get use-after-free. > > If you then implement __asan_register_globals (it is called on module > initialization and poisons redzones for globals) > you get global buffer overflows. > > The current asan run-time is large and hairy because it attempts to be > thread-friendly, > intercepts lots of libc, and provides very detailed error messages. > W/o all that, the run-time will easily fit in < 100 LOC, which can be > a part of a libc implementation. > nice i'm not sure if we want to push this into musl, but it looks useful i'll try to implement it > hth, > --kcc
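How a replacement `malloc` might fill the shadow for "poisoned redzones": a hedged sketch that works on a plain shadow array rather than the real shadow mapping. The marker values `HEAP_REDZONE` and `HEAP_FREED` are hypothetical (the access check itself only needs them to be negative), and `poison_chunk` assumes the chunk starts on an 8-byte boundary:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical markers; any negative shadow value means unaddressable. */
#define HEAP_REDZONE ((int8_t)-5)
#define HEAP_FREED   ((int8_t)-3)

/* Shadow for a chunk of n user bytes followed by a right redzone of rz
 * bytes (rz a multiple of 8). shadow[i] describes bytes [8i, 8i+8).
 * Returns the number of shadow bytes written. */
size_t poison_chunk(int8_t *shadow, size_t n, size_t rz)
{
    size_t i = 0;

    for (; i < n / 8; i++)
        shadow[i] = 0;                  /* 8 addressable bytes */
    if (n % 8)
        shadow[i++] = (int8_t)(n % 8);  /* partial last granule */
    for (size_t end = i + rz / 8; i < end; i++)
        shadow[i] = HEAP_REDZONE;       /* right redzone */
    return i;
}

/* On free: poison the whole user area so later accesses are reported. */
void poison_freed(int8_t *shadow, size_t n)
{
    memset(shadow, HEAP_FREED, (n + 7) / 8);
}
```

Use-after-free detection then follows if `free` calls something like `poison_freed` and keeps the chunk in a quarantine instead of reusing it immediately, which is the "delay the reuse" part of kcc's description.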
* Re: buffer overflow in regcomp and a way to find more of those 2015-03-23 12:35 ` Szabolcs Nagy @ 2015-03-23 14:40 ` stephen Turner 2015-03-23 14:53 ` Szabolcs Nagy 2015-03-28 22:00 ` Szabolcs Nagy 1 sibling, 1 reply; 42+ messages in thread From: stephen Turner @ 2015-03-23 14:40 UTC (permalink / raw) To: musl, Konstantin Serebryany, Rich Felker So musl doesn't have any tests currently to ensure it was built correctly by testing its responses to calls? I have seen a few packages such as binutils come with its own built in test which I would gladly make use of if it was available. thanks stephen
* Re: buffer overflow in regcomp and a way to find more of those 2015-03-23 14:40 ` stephen Turner @ 2015-03-23 14:53 ` Szabolcs Nagy 2015-03-23 15:46 ` stephen Turner 0 siblings, 1 reply; 42+ messages in thread From: Szabolcs Nagy @ 2015-03-23 14:53 UTC (permalink / raw) To: stephen Turner; +Cc: musl, Konstantin Serebryany, Rich Felker * stephen Turner <stephen.n.turner@gmail.com> [2015-03-23 10:40:01 -0400]: > So musl doesn't have any tests currently to ensure it was built correctly it has tests, just not in the main repo > by testing its responses to calls? I have seen a few packages such as > binutils come with its own built in test which I would gladly make use of > if it was available. you can use the tests, they are available at http://nsz.repo.hu/git/?p=libc-test (which was supposed to be a temporary location until a cleanup is done..)
* Re: buffer overflow in regcomp and a way to find more of those 2015-03-23 14:53 ` Szabolcs Nagy @ 2015-03-23 15:46 ` stephen Turner 2015-03-23 16:28 ` Rich Felker 0 siblings, 1 reply; 42+ messages in thread From: stephen Turner @ 2015-03-23 15:46 UTC (permalink / raw) To: stephen Turner, musl, Konstantin Serebryany, Rich Felker On Mon, Mar 23, 2015 at 10:53 AM, Szabolcs Nagy <nsz@port70.net> wrote: > * stephen Turner <stephen.n.turner@gmail.com> [2015-03-23 10:40:01 -0400]: > > So musl doesn't have any tests currently to ensure it was built correctly > > it has tests, just not in the main repo > > > by testing its responses to calls? I have seen a few packages such as > > binutils come with its own built in test which I would gladly make use of > > if it was available. > > you can use the tests, they are available at > http://nsz.repo.hu/git/?p=libc-test > > (which was supposed to be a temporary location until > a cleanup is done..) > > nice, i will give those a spin. Is there any consideration for making them a feature/available in the release source files? thanks, stephen
* Re: buffer overflow in regcomp and a way to find more of those 2015-03-23 15:46 ` stephen Turner @ 2015-03-23 16:28 ` Rich Felker 2015-03-23 17:21 ` Nathan McSween 0 siblings, 1 reply; 42+ messages in thread From: Rich Felker @ 2015-03-23 16:28 UTC (permalink / raw) To: stephen Turner; +Cc: musl, Konstantin Serebryany On Mon, Mar 23, 2015 at 11:46:04AM -0400, stephen Turner wrote: > On Mon, Mar 23, 2015 at 10:53 AM, Szabolcs Nagy <nsz@port70.net> wrote: > > > * stephen Turner <stephen.n.turner@gmail.com> [2015-03-23 10:40:01 -0400]: > > > So musl doesn't have any tests currently to ensure it was built correctly > > > > it has tests, just not in the main repo > > > > > by testing its responses to calls? I have seen a few packages such as > > > binutils come with its own built in test which I would gladly make use of > > > if it was available. > > > > you can use the tests, they are available at > > http://nsz.repo.hu/git/?p=libc-test > > > > (which was supposed to be a temporary location until > > a cleanup is done..) > > > nice, i will give those a spin. Is there any consideration for making them > a feature/available in the release source files? From a release and build system standpoint, it really makes sense to do tests separately, not integrated. The biggest reason is to avoid making cross-compiling a special case, by isolating the concept of "libs/binaries generated for the target" as something non-executable on the host. Other packages generally do a poor job of this and then either cross-compiling breaks or you need lots of cross-specific logic in the build system. With separate tests, musl's build has no reason to care if it's being cross-compiled, and testing a cross-compiled libc (if you feel a need to) is a matter of how you script the build of everything for the cross toolchain and environment, rather than something musl's build has to handle. 
Other than that, nsz has aimed to make all the tests libc-agnostic, so they can also be used to test other libcs for conformance and bugs.
This works well with glibc already but uclibc has so much missing that lots of the tests are gratuitously failing. Rich
* Re: buffer overflow in regcomp and a way to find more of those 2015-03-23 16:28 ` Rich Felker @ 2015-03-23 17:21 ` Nathan McSween 0 siblings, 0 replies; 42+ messages in thread From: Nathan McSween @ 2015-03-23 17:21 UTC (permalink / raw) To: musl > From a release and build system standpoint, it really makes sense to > do tests separately, not integrated. I agree but only if there is a good automated continuous integration system implemented to find bugs. I would run analyzers, etc. as a git hook.
* Re: buffer overflow in regcomp and a way to find more of those 2015-03-23 12:35 ` Szabolcs Nagy 2015-03-23 14:40 ` stephen Turner @ 2015-03-28 22:00 ` Szabolcs Nagy 2015-03-28 22:32 ` Konstantin Serebryany 1 sibling, 1 reply; 42+ messages in thread From: Szabolcs Nagy @ 2015-03-28 22:00 UTC (permalink / raw) To: Konstantin Serebryany, Rich Felker, musl * Szabolcs Nagy <nsz@port70.net> [2015-03-23 13:35:40 +0100]: > * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-22 21:55:26 -0700]: > > On Sat, Mar 21, 2015 at 6:28 AM, Szabolcs Nagy <nsz@port70.net> wrote: > > > i assume for that we still need to change the libc startup code, malloc > > > functions and may be some things around thread stacks > > > > Try to compile a simple file with asan: > > > > int main(int argc, char **argv) { > > int a[10]; > > a[argc * 10] = 0; > > return 0; > > } > > > > > > % clang -fsanitize=address a.c -c > > > > % nm a.o | grep U > > U __asan_init_v5 > > U __asan_option_detect_stack_use_after_return > > U __asan_report_store4 > > U __asan_stack_malloc_1 > > > > __asan_report_store4 should print an error message saying that > > "bad write of 4 bytes" happened in <current stack trace> on address <param>. > > Also make other __asan_report_{store,load}{1,2,4,8,16} > > > > __asan_init_v5 will be called by the module initializer. > > When called for the first time, it should mmap the shadow memory. > > https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerAlgorithm > > it seems asan instrumented code with memory access cannot run before __asan_init_v5 does the shadow mapping (otherwise the compiler generated shadow access would crash) this is problematic for dynamic linking because the loader calls various libc functions so those cannot be instrumented unless shadow memory is already in place i managed to make a minimal asan runtime work with static linking (and then stack corruption is indeed detected).
(i called __asan_init_v5 in the beginning of musl's __libc_start_main) > > __asan_option_detect_stack_use_after_return is a global, define it to 0. > > __asan_stack_malloc_1 -- just make it an empty function. > > > > Now, you can build a code with asan and detect stack buffer overflows. > > (The reports won't be very detailed, but they will be correct). > > If you add poisoned redzones to malloc -- you get heap buffer overflows. > > If you delay the reuse of free-d memory -- you get use-after-free. > > > > If you then implement __asan_register_globals (it is called on module > > initialization and poisons redzones for globals) > > you get global buffer overflows. > > i havent tried to do the heap/global poisoning it's not clear to me what's the best way to manage the shadow memory: mmap with PROT_NONE the entire 0x7fff8000 .. 0x10007fff8000 range and then mmap with rw the subranges that shadow mmaped memory in the application? then a modified mmap is needed to manage the shadow maps so i think for an asan+cov instrumented libc: - [S]crt1.s should do the initial shadow mmap before any c code gets run - mmap should be replaced to do shadow management - malloc etc should be replaced to handle shadow poisoning - the minimal asan and cov runtimes should be added to libc (so their symbols are available early in the loader) and then we can use such a libc for testing and fuzzing to catch heap/stack corruptions i guess it is possible to have a /lib/ld-muslasan-x86_64.so.1 and Scrt1asan.o on a system and the compiler/linker could use those when compiling some code with asan+cov instrumentation (but this can get ugly if there will be more instrumentations that need runtime support in the future)
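The shadow range Szabolcs quotes follows from asan's default x86-64 mapping, Shadow = (Mem >> 3) + 0x7fff8000 (assumed here, as described on the AddressSanitizerAlgorithm page). A short worked check, with `mem_to_shadow` as an illustrative name:

```c
#include <stdint.h>

/* Assumed x86-64 asan defaults: 8 application bytes per shadow byte. */
#define ASAN_SHADOW_SCALE  3
#define ASAN_SHADOW_OFFSET 0x7fff8000ULL

uint64_t mem_to_shadow(uint64_t addr)
{
    return (addr >> ASAN_SHADOW_SCALE) + ASAN_SHADOW_OFFSET;
}
```

The shadow of the 47-bit user address space [0, 2^47) therefore spans [0x7fff8000, 0x10007fff8000), i.e. 2^44 bytes or 16 TiB, which is the 16TB of virtual address space kcc mentions. Not all of that range is used as shadow: the part that would hold the shadow of the shadow itself (the "shadow gap") is kept inaccessible.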
* Re: buffer overflow in regcomp and a way to find more of those 2015-03-28 22:00 ` Szabolcs Nagy @ 2015-03-28 22:32 ` Konstantin Serebryany 2015-03-28 22:38 ` Rich Felker 0 siblings, 1 reply; 42+ messages in thread From: Konstantin Serebryany @ 2015-03-28 22:32 UTC (permalink / raw) To: Konstantin Serebryany, Rich Felker, musl On Sat, Mar 28, 2015 at 3:00 PM, Szabolcs Nagy <nsz@port70.net> wrote: > * Szabolcs Nagy <nsz@port70.net> [2015-03-23 13:35:40 +0100]: >> * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-22 21:55:26 -0700]: >> > On Sat, Mar 21, 2015 at 6:28 AM, Szabolcs Nagy <nsz@port70.net> wrote: >> > > i assume for that we still need to change the libc startup code, malloc >> > > functions and may be some things around thread stacks >> > >> > Try to compile a simple file with asan: >> > >> > int main(int argc, char **argv) { >> > int a[10]; >> > a[argc * 10] = 0; >> > return 0; >> > } >> > >> > >> > % clang -fsanitize=address a.c -c >> > >> > % nm a.o | grep U >> > U __asan_init_v5 >> > U __asan_option_detect_stack_use_after_return >> > U __asan_report_store4 >> > U __asan_stack_malloc_1 >> > >> > __asan_report_store4 should print an error message saying that >> > "bad write of 4 bytes" happened in <current stack trace> on address <param>. >> > Also make other __asan_report_{store,load}{1,2,4,8,16} >> > >> > __asan_init_v5 will be called by the module initializer. >> > When called for the first time, it should mmap the shadow memory. >> > https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerAlgorithm >> > > > it seems asan intrumented code with memory access cannot run > before __asan_init_v5 does the shadow mapping (otherwise the > compiler generated shadow access would crash) > Correct. 
> this is problematic for dynamic linking because the loader > calls various libc functions so those cannot be instrumented > unless shadow memory is already in place Yes, I have the same trouble with glibc and have to disable instrumentation for some of the glibc functions (by not adding -fsanitize=address), which is not optimal (may lose bugs on other calls to these functions). > > i managed to make a minimal asan runtime work with static linking > (and then stack corruption is indeed detected). > (i called __asan_init_v5 in the beginning of musl's __libc_start_main) Nice! > >> > __asan_option_detect_stack_use_after_return is a global, define it to 0. >> > __asan_stack_malloc_1 -- just make it an empty function. >> > >> > Now, you can build a code with asan and detect stack buffer overflows. >> > (The reports won't be very detailed, but they will be correct). >> > If you add poisoned redzones to malloc -- you get heap buffer overflows. >> > If you delay the reuse of free-d memory -- you get use-after-free. >> > >> > If you then implement __asan_register_globals (it is called on module >> > initialization and poisons redzones for globals) >> > you get global buffer overflows. >> > > > i havent tried to do the heap/global poisoning > > it's not clear to me what's the best way to manage the shadow > memory: mmap with PROT_NONE the entire 0x7fff8000 .. 0x10007fff8000 > range and then mmap with rw the subranges that shadow mmaped memory > in the application? You probably can do it because you control all mmap calls from libc (from malloc and thread stack creation), but the first time the user calls mmap syscall bypassing libc it will break. We use MAP_NORESERVE to map the entire range at startup. This has the drawback that the application uses 16TB of virtual address space and tools like "ulimit -v" do not work. But otherwise this works great.
> > then a modified mmap is needed to manage the shadow maps > > so i think for an asan+cov instrumented libc: > > - [S]crt1.s should do the initial shadow mmap before any c code gets run > - mmap should be replaced to do shadow management Only if you do not use the MAP_NORESERVE trick. > - malloc etc should be replaced to handle shadow poisoning > - the minimal asan and cov runtimes should be added to libc > (so their symbols are available early in the loader) > > and then we can use such a libc for testing and fuzzing > to catch heap/stack corruptions > > i guess it is possible to have a /lib/ld-muslasan-x86_64.so.1 > and Scrt1asan.o on a system and the compiler/linker could > use those when compiling some code with asan+cov instrumentation sounds great. > (but this can get ugly if there will be more instrumentations > that need runtime support in the future) Yea. The core of the asan run-time is relatively easy to replicate, as you've seen. Probably, one can replicate msan and ubsan (MemorySanitizer, UndefinedBehaviorSanitizer) with comparable effort since most of the logic for those tools is in the compiler. The use-after-return detection in asan relies on a very non-trivial part of the run-time. tsan (ThreadSanitizer) has a much more complex run-time which is hard to replicate. Maybe someday we'll make them work with static linking, but not any time soon. :( --kcc
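kcc's approach, mapping the whole shadow once with `MAP_NORESERVE`, in sketch form. `map_shadow_noreserve` is a hypothetical helper; with `fixed` set it would be called once at startup on the shadow ranges (for example starting at 0x7fff8000). A real runtime maps low and high shadow separately and protects the gap between them, which this sketch does not attempt:

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Map a read/write region without reserving swap/commit for it.
 * With fixed != 0 the region is placed exactly at addr; MAP_FIXED
 * replaces anything already mapped there, so only use it on a range
 * known to be free (e.g. very early at startup). */
void *map_shadow_noreserve(void *addr, size_t len, int fixed)
{
    int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE;
    void *p;

    if (fixed)
        flags |= MAP_FIXED;
    p = mmap(addr, len, PROT_READ | PROT_WRITE, flags, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Anonymous mappings are zero-filled, and a zero shadow byte means "fully addressable", so freshly mapped shadow needs no initialization before use.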
* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-28 22:32 ` Konstantin Serebryany
@ 2015-03-28 22:38 ` Rich Felker
  2015-03-28 23:15 ` Szabolcs Nagy
From: Rich Felker @ 2015-03-28 22:38 UTC
To: Konstantin Serebryany; +Cc: musl

On Sat, Mar 28, 2015 at 03:32:41PM -0700, Konstantin Serebryany wrote:
> > it seems asan instrumented code with memory access cannot run
> > before __asan_init_v5 does the shadow mapping (otherwise the
> > compiler generated shadow access would crash)
> >
> Correct.
>
> > this is problematic for dynamic linking because the loader
> > calls various libc functions so those cannot be instrumented
> > unless shadow memory is already in place
>
> Yes, I have the same trouble with glibc and have to disable
> instrumentation for some of the glibc functions
> (by not adding -fsanitize=address), which is not optimal (we may miss
> bugs in other calls to these functions).

We have a similar problem with stack protector now. I want to be able
to enable stack protector for libc with 1.1.9, so maybe we can solve at
least some of the issues asan faces at the same time.

> >> > __asan_option_detect_stack_use_after_return is a global, define it to 0.
> >> > __asan_stack_malloc_1 -- just make it an empty function.
> >> >
> >> > Now, you can build code with asan and detect stack buffer overflows.
> >> > (The reports won't be very detailed, but they will be correct).
> >> > If you add poisoned redzones to malloc -- you get heap buffer overflows.
> >> > If you delay the reuse of free-d memory -- you get use-after-free.
> >> >
> >> > If you then implement __asan_register_globals (it is called on module
> >> > initialization and poisons redzones for globals)
> >> > you get global buffer overflows.
> >> >
> >
> > i haven't tried to do the heap/global poisoning
> >
> > it's not clear to me what's the best way to manage the shadow
> > memory: mmap with PROT_NONE the entire 0x7fff8000 .. 0x10007fff8000
> > range and then mmap with rw the subranges that shadow mmaped memory
> > in the application?
>
> You probably can do it because you control all mmap calls from libc
> (from malloc and thread stack creation),
> but the first time the user calls the mmap syscall directly, bypassing
> libc, it will break.
> We use MAP_NORESERVE to map the entire range at startup.
> This has the drawback that the application uses 16TB of virtual address
> space and tools like "ulimit -v" do not work.
> But otherwise this works great.

MAP_NORESERVE is a no-op on systems with overcommit disabled. The right
way to achieve a similar result is to use PROT_NONE to reserve the
virtual address range without reserving commit, and only mprotect it
to PROT_READ|PROT_WRITE later as needed.

> > - malloc etc should be replaced to handle shadow poisoning
> > - the minimal asan and cov runtimes should be added to libc
> > (so their symbols are available early in the loader)
> >
> > and then we can use such a libc for testing and fuzzing
> > to catch heap/stack corruptions
> >
> > i guess it is possible to have a /lib/ld-muslasan-x86_64.so.1
> > and Scrt1asan.o on a system and the compiler/linker could
> > use those when compiling some code with asan+cov instrumentation
>
> sounds great.

I'm not clear why there would be a different dynamic linker pathname
for it. It's not a different ABI from the application's standpoint, is
it? It seems like you might _want_ to install the dynamic linker with
a different name or location just to avoid clobbering the non-asan
build, but I don't think it needs a dedicated name/location like it
would if it were an ABI/ISA.

Rich
* Re: buffer overflow in regcomp and a way to find more of those
  2015-03-28 22:38 ` Rich Felker
@ 2015-03-28 23:15 ` Szabolcs Nagy
From: Szabolcs Nagy @ 2015-03-28 23:15 UTC
To: Rich Felker; +Cc: Konstantin Serebryany, musl

* Rich Felker <dalias@libc.org> [2015-03-28 18:38:33 -0400]:
> On Sat, Mar 28, 2015 at 03:32:41PM -0700, Konstantin Serebryany wrote:
> > >
> > > i guess it is possible to have a /lib/ld-muslasan-x86_64.so.1
> > > and Scrt1asan.o on a system and the compiler/linker could
> > > use those when compiling some code with asan+cov instrumentation
> >
> > sounds great.
>
> I'm not clear why there would be a different dynamic linker pathname
> for it. It's not a different ABI from the application's standpoint, is
> it? It seems like you might _want_ to install the dynamic linker with
> a different name or location just to avoid clobbering the non-asan
> build, but I don't think it needs a dedicated name/location like it
> would if it were an ABI/ISA.
>

if you only instrument libc and not the application then there
is no difference between the two libcs from the app's pov

but if you want to instrument the application too then it must
use the libc which does the shadow management and has the asan rt

the name does not have to be dedicated if asan instrumented binaries
are only used locally/temporarily for testing
(an instrumented library can only be used with the asan libc, but
a non-instrumented lib should work with both libcs)
Thread overview: 42+ messages
2015-03-20 20:17 buffer overflow in regcomp and a way to find more of those Konstantin Serebryany
2015-03-20 20:40 ` Rich Felker
2015-03-20 21:28 ` Szabolcs Nagy
2015-03-20 23:48 ` Szabolcs Nagy
2015-03-20 22:32 ` Rich Felker
2015-03-20 23:52 ` Szabolcs Nagy
2015-03-21 0:06 ` Konstantin Serebryany
2015-03-21 0:26 ` Szabolcs Nagy
2015-03-21 0:46 ` Rich Felker
2015-03-21 0:54 ` Konstantin Serebryany
2015-03-21 1:00 ` Rich Felker
2015-03-21 1:05 ` Konstantin Serebryany
2015-03-21 1:10 ` Konstantin Serebryany
2015-03-21 1:23 ` Szabolcs Nagy
2015-03-21 1:30 ` Rich Felker
2015-03-21 2:10 ` Szabolcs Nagy
2015-03-21 2:17 ` Rich Felker
2015-03-21 1:32 ` Rich Felker
2015-03-21 1:37 ` Konstantin Serebryany
2015-03-21 1:56 ` Rich Felker
2015-03-21 2:14 ` Konstantin Serebryany
2015-03-21 2:20 ` Rich Felker
2015-03-21 6:05 ` Konstantin Serebryany
2015-03-21 13:28 ` Szabolcs Nagy
2015-03-21 21:03 ` Szabolcs Nagy
2015-03-21 21:38 ` Szabolcs Nagy
2015-03-21 22:13 ` Szabolcs Nagy
2015-03-22 6:36 ` Justin Cormack
2015-03-23 5:02 ` Konstantin Serebryany
2015-03-23 12:25 ` Szabolcs Nagy
2015-03-23 15:56 ` Konstantin Serebryany
2015-03-23 4:55 ` Konstantin Serebryany
2015-03-23 12:35 ` Szabolcs Nagy
2015-03-23 14:40 ` stephen Turner
2015-03-23 14:53 ` Szabolcs Nagy
2015-03-23 15:46 ` stephen Turner
2015-03-23 16:28 ` Rich Felker
2015-03-23 17:21 ` Nathan McSween
2015-03-28 22:00 ` Szabolcs Nagy
2015-03-28 22:32 ` Konstantin Serebryany
2015-03-28 22:38 ` Rich Felker
2015-03-28 23:15 ` Szabolcs Nagy