From: Konstantin Serebryany
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: buffer overflow in regcomp and a way to find more of those
Date: Sun, 22 Mar 2015 21:55:26 -0700
References: <20150321004637.GQ23507@brightrain.aerifal.cx>
 <20150321010043.GR23507@brightrain.aerifal.cx>
 <20150321013225.GT23507@brightrain.aerifal.cx>
 <20150321015619.GU23507@brightrain.aerifal.cx>
 <20150321022023.GW23507@brightrain.aerifal.cx>
 <20150321132810.GI16260@port70.net>
In-Reply-To: <20150321132810.GI16260@port70.net>
Reply-To: musl@lists.openwall.com
To: Konstantin Serebryany, Rich Felker, musl@lists.openwall.com

On Sat, Mar 21, 2015 at 6:28 AM, Szabolcs Nagy wrote:
> * Konstantin Serebryany [2015-03-20 23:05:13 -0700]:
>> On Fri, Mar 20, 2015 at 7:20 PM, Rich Felker wrote:
>> > On Fri, Mar 20, 2015 at 07:14:33PM -0700, Konstantin Serebryany wrote:
>> >> If you build the source with "-fsanitize=leak -fsanitize-coverage=4
>> >> -O1" the compiler will not insert any of the asan instrumentation
>> >> and only insert calls to a couple of functions needed for coverage.
>> >> Then, instead of linking with the full asan+coverage run-time, you
>> >> will need a very simple re-implementation of coverage-only runtime.
>> >
>> > Could the existing runtime be used, just stripped down?
>>
>> Yes, but for the basic functionality needed by the fuzzer it's simpler
>> to write it from scratch, see below:
>>
>> ========================================================
>> svn co http://llvm.org/svn/llvm-project/llvm/trunk/lib/Fuzzer
>> cat << EOF > cov-minimal-rt.c
>> static long counter;
>> void __sanitizer_cov_with_check(int *guard) {
>>   if (*guard == 0) {
>>     counter++;
>>     *guard=1;
>>   }
>> }
>> long __sanitizer_get_total_unique_coverage() { return counter; }
>> void __sanitizer_cov_module_init() {}
>> void __sanitizer_reset_coverage(){}
>> void __sanitizer_get_coverage_guards(){}
>> void __sanitizer_get_number_of_counters(){}
>> void __sanitizer_update_counter_bitset_and_clear_counters(){}
>> void __sanitizer_set_death_callback(){}
>> EOF
>>
>> clang -std=c++11 -c Fuzzer/Fuzzer*.cpp -I Fuzzer
>> clang -std=c++11 -fsanitize=leak -fsanitize-coverage=3 -mllvm
>> -sanitizer-coverage-block-threshold=0 Fuzzer/test/SimpleTest.cpp -c
>> clang -c cov-minimal-rt.c
>> clang++ *.o
>> ./a.out
>> ========================================================
>
> with this i could run the fuzzer against libc.a
>
> it's a bit more work to link to libc.a than adding
> a -L so i attached the scripts i used (and an example)
> so others can reproduce it
>
> c++ headers cannot be used in the test (that would
> require cleaning up the libstdc++ header mess)
>
> but i think there is no reason to use c++ for these
> libc api tests anyway

Sure.
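
A C-only target for the regcomp case discussed in this thread could look
roughly like the sketch below. It is not the attached reg.c, which is not
reproduced here. The entry-point name LLVMFuzzerTestOneInput is the one
documented for current libFuzzer and is an assumption for the 2015 Fuzzer
checkout, whose SimpleTest.cpp may use a different name. MAX_PATTERN and
the choice of REG_EXTENDED are arbitrary, hypothetical choices.

```c
#include <regex.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cap so the pattern fits in a fixed stack buffer. */
#define MAX_PATTERN 256

/* Entry-point name assumed from current libFuzzer documentation. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  char pattern[MAX_PATTERN + 1];
  regex_t re;

  /* regcomp wants a NUL-terminated string, so copy and terminate. */
  if (size > MAX_PATTERN)
    size = MAX_PATTERN;
  if (size > 0)
    memcpy(pattern, data, size);
  pattern[size] = '\0';

  /* Only successfully compiled patterns own memory that must be freed. */
  if (regcomp(&re, pattern, REG_EXTENDED) == 0)
    regfree(&re);
  return 0;
}
```

The target deliberately ignores regcomp errors: for this kind of fuzzing
only crashes and memory errors inside libc are interesting, not whether
the random pattern was valid.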
>
> you may need to adjust the directories the scripts use
>
> (the linking may need to change when compiler-rt is
> used instead of libgcc)
>
> usage:
>
> cd workdir
> ./buildfuzz.sh
> ./buildmusl.sh
> ./fuzzcompile.sh reg.c
> ./fuzzlink.sh reg.o
> ./a.out
>
> of course to make it useful the malloc magic is needed for
> more likely crashes
>
>> The recently added afl-style counters
>> (https://code.google.com/p/address-sanitizer/wiki/AsanCoverage#Coverage_counters)
>> are a bit more involved, but the basic bool-per-edge is quite enough
>> in most cases.
>>
>
> ok
>
>> The fuzzer itself is written in C++ and uses STL (probably, not the
>> best idea, but it makes the experiments simpler).
>> Can't tell if it will be a problem with musl, but after all the fuzzer
>> itself is also trivial (as well as the entire concept)
>>
>
> c++ happens to work because musl is (almost) abi compatible with
> glibc on x86 so we can just link to the glibc linked libstdc++
>
> (this can eg fail when the c++ thread local storage destructor
> abi is used, that is not implemented in musl yet)
>
> so yes c++ makes things more painful: you need to recompile the
> entire toolchain to make it work reliably (and then both gcc
> and clang have broken assumptions about the libc so you have to
> patch them) which is too much work for running tests
>
>> > Well static linking with musl does not impose any constraint on
>> > redefining functions, so you could easily use a debugging malloc that
>> > lines up each allocation to end on a page boundary with a guard page
>> > after it.
>>
>> Yea... This will slow down fuzzing and guard pages only protect you
>> from overflow in one direction (either left or right, but not both).
>> But this is better than nothing.
>>
>
> you can run the tests twice (for left and right) :)
>
>> > This would of course be slow and use lots of memory but
>> > would catch all heap overflows. And -fstack-protector-all would catch
>> > most stack-based overflows.
>>
>> Only stack-overflow-write by a small amount, but yes, better than nothing.
>>
>> BTW, writing a minimalistic asan run-time as part of musl should be a
>> matter of a couple of hours.
>> Probably much faster than making the current monster work with static linking.
>> I'd be happy to help with such.
>>
>
> how would this look?
>
> compile the tests and libc with asan, but instead of linking the
> asan runtime from clang use a musl specific one?

Yes

>
> i assume for that we still need to change the libc startup code, malloc
> functions and may be some things around thread stacks

Try to compile a simple file with asan:

int main(int argc, char **argv) {
  int a[10];
  a[argc * 10] = 0;
  return 0;
}

% clang -fsanitize=address a.c -c
% nm a.o | grep U
         U __asan_init_v5
         U __asan_option_detect_stack_use_after_return
         U __asan_report_store4
         U __asan_stack_malloc_1

__asan_report_store4 should print an error message saying that a
"bad write of 4 bytes" happened at the given address.
Also implement the other __asan_report_{store,load}{1,2,4,8,16} functions.

__asan_init_v5 will be called by the module initializer.
When called for the first time, it should mmap the shadow memory.
https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerAlgorithm

__asan_option_detect_stack_use_after_return is a global, define it to 0.
__asan_stack_malloc_1 -- just make it an empty function.

Now you can build code with asan and detect stack buffer overflows.
(The reports won't be very detailed, but they will be correct.)
If you add poisoned redzones to malloc, you get heap buffer overflows.
If you delay the reuse of free-d memory, you get use-after-free.
If you then implement __asan_register_globals (it is called on module
initialization and poisons redzones for globals), you get global buffer
overflows.

The current asan run-time is large and hairy because it attempts to be
thread-friendly, intercepts lots of libc, and provides very detailed
error messages.
W/o all that, the run-time will easily fit in < 100 LOC,
which can be a part of a libc implementation.

hth,

--kcc
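
For illustration, the shadow-memory check that such a small run-time is
built around can be sketched as below. This is a toy under stated
assumptions, not the real run-time: all toy_* names are hypothetical, and
the shadow covers only a small static arena so the code runs without
compiler instrumentation. A real implementation would mmap shadow for the
whole address space in __asan_init_v5 and export __asan_report_store4 and
friends. The shadow encoding (0 = all 8 bytes addressable, 1..7 = only the
first k bytes addressable, negative = poisoned) follows the
AddressSanitizerAlgorithm page linked above.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define ARENA_SIZE 4096   /* hypothetical toy arena size */
#define SHADOW_SCALE 3    /* one shadow byte describes 8 application bytes */

/* Union only forces alignment suitable for 4- and 8-byte accesses. */
static union { unsigned char bytes[ARENA_SIZE]; long long align; } arena;
static signed char shadow[ARENA_SIZE >> SHADOW_SCALE];
int toy_report_count;

unsigned char *toy_arena(void) { return arena.bytes; }

static size_t arena_offset(const void *p) {
  return (size_t)((const unsigned char *)p - arena.bytes);
}

/* Toy mapping; real asan computes (addr >> 3) + offset instead. */
static signed char *mem_to_shadow(const void *p) {
  return &shadow[arena_offset(p) >> SHADOW_SCALE];
}

/* Mark [p, p+size) unaddressable; p and size must be multiples of 8. */
void toy_poison(const void *p, size_t size) {
  memset(mem_to_shadow(p), -1, size >> SHADOW_SCALE);
}

/* Mark [p, p+size) addressable; p must be 8-aligned within the arena. */
void toy_unpoison(const void *p, size_t size) {
  signed char *s = mem_to_shadow(p);
  memset(s, 0, size >> SHADOW_SCALE);
  if (size & 7)
    s[size >> SHADOW_SCALE] = (signed char)(size & 7); /* partial granule */
}

/* Nonzero if a naturally aligned access of 1, 2, 4 or 8 bytes is bad. */
int toy_is_bad_access(const void *p, size_t size) {
  int k = *mem_to_shadow(p);
  int last = (int)((arena_offset(p) & 7) + size - 1);
  return k != 0 && last >= k;
}

/* Stand-in for __asan_report_store4: terse but correct report. */
void toy_report_store4(const void *p) {
  toy_report_count++;
  fprintf(stderr, "bad write of 4 bytes at %p\n", (void *)p);
}

/* What instrumented code effectively does before a 4-byte store. */
void toy_checked_store4(int *p, int v) {
  if (toy_is_bad_access(p, 4)) {
    toy_report_store4(p);
    return;
  }
  *p = v;
}
```

With the whole arena poisoned and only a 40-byte "int a[10]" unpoisoned,
stores to a[0]..a[9] pass the check while a store to a[10] is reported,
which mirrors the stack-overflow example above. The real run-time aborts
after reporting; the toy simply skips the store so it can be exercised in
a test.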