From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/7248
Path: news.gmane.org!not-for-mail
From: Konstantin Serebryany <konstantin.s.serebryany@gmail.com>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: buffer overflow in regcomp and a way to find more of those
Date: Sun, 22 Mar 2015 21:55:26 -0700
Message-ID: <CAGQ9bdzihNcZoWzpZghnhwwkt3YdwLtBZr3WA_UKn1xL_gzJdQ@mail.gmail.com>
References: <20150321004637.GQ23507@brightrain.aerifal.cx> <CAGQ9bdxQO-xCLBeL_J7R9ywqN_FMAyHaccQLyWNCNfoBDXfBcA@mail.gmail.com>
 <20150321010043.GR23507@brightrain.aerifal.cx> <CAGQ9bdzxrkdGJb1z4e9m9QC97WKbRgG=jasvy4v1-sf0kX08AQ@mail.gmail.com>
 <20150321013225.GT23507@brightrain.aerifal.cx> <CAGQ9bdxWaaUBv3gmDE3meSaibjJiUmcAfSae_OTrkky1tjVYLQ@mail.gmail.com>
 <20150321015619.GU23507@brightrain.aerifal.cx> <CAGQ9bdwziW09Jn17M=5+qyi5Q-1+LTy4dr0d0Tkm2WP0ao-NzA@mail.gmail.com>
 <20150321022023.GW23507@brightrain.aerifal.cx> <CAGQ9bdwwhkcBse2K612ynZ34SLLCoGvNZTgeMPVK8V-WX56peA@mail.gmail.com>
 <20150321132810.GI16260@port70.net>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
X-Trace: ger.gmane.org 1427086584 20675 80.91.229.3 (23 Mar 2015 04:56:24 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Mon, 23 Mar 2015 04:56:24 +0000 (UTC)
To: Konstantin Serebryany <konstantin.s.serebryany@gmail.com>, Rich Felker <dalias@libc.org>, 
	musl@lists.openwall.com
Original-X-From: musl-return-7261-gllmg-musl=m.gmane.org@lists.openwall.com Mon Mar 23 05:56:09 2015
Return-path: <musl-return-7261-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by plane.gmane.org with smtp (Exim 4.69)
	(envelope-from <musl-return-7261-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1YZuPC-0004Mg-SN
	for gllmg-musl@m.gmane.org; Mon, 23 Mar 2015 05:56:03 +0100
Original-Received: (qmail 31761 invoked by uid 550); 23 Mar 2015 04:56:00 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
Original-Received: (qmail 30713 invoked from network); 23 Mar 2015 04:55:59 -0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20120113;
        h=mime-version:in-reply-to:references:from:date:message-id:subject:to
         :content-type;
        bh=tO3F0R4syNmYuVizKNv31fde87VWfAI/8Aa1xHSp8VI=;
        b=P2+/hM5/n2YMe/Qbh9IDK/KUVWXdE4gshG8jKvr/yYnPM+Xn2AkO6N0CBE57nAWUQG
         244K/YmQ0TOt2Jq1JUceXqWfT95G/ZuJLi+QHa1YwFfYJ+uazSJxYlgmL32MnGVeKohT
         HEApPwWgi4khuOL6lC3OLlNBW6fgzIqEemroayDdEjSeShivsSoDzTAeqjg/k+0c7Dsk
         BtA1ejZg6EKyRqa9aqrdCMQf96560JxvgwOLmGRkktA20X0wi8P9hard4nLVmodPMvKj
         QUUsdo6F+KIln2JXw9SCuemH2UsBenO8HxWur2FSRqSobs6L3kYyMeZRlmo9N7PhIz4p
         lR5g==
X-Received: by 10.52.30.34 with SMTP id p2mr74541506vdh.89.1427086547989; Sun,
 22 Mar 2015 21:55:47 -0700 (PDT)
In-Reply-To: <20150321132810.GI16260@port70.net>
Xref: news.gmane.org gmane.linux.lib.musl.general:7248
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/7248>

On Sat, Mar 21, 2015 at 6:28 AM, Szabolcs Nagy <nsz@port70.net> wrote:
> * Konstantin Serebryany <konstantin.s.serebryany@gmail.com> [2015-03-20 23:05:13 -0700]:
>> On Fri, Mar 20, 2015 at 7:20 PM, Rich Felker <dalias@libc.org> wrote:
>> > On Fri, Mar 20, 2015 at 07:14:33PM -0700, Konstantin Serebryany wrote:
>> >> If you build the source with "-fsanitize=leak -fsanitize-coverage=4
>> >> -O1" the compiler will not insert any of the asan instrumentation
>> >> and only insert calls to a couple of functions needed for coverage.
>> >> Then, instead of linking with the full asan+coverage run-time, you
>> >> will need a very simple re-implementation of coverage-only runtime.
>> >
>> > Could the existing runtime be used, just stripped down?
>>
>> Yes, but for the basic functionality needed by the fuzzer it's simpler
>> to write it from scratch, see below:
>>
>> ========================================================
>> svn co http://llvm.org/svn/llvm-project/llvm/trunk/lib/Fuzzer
>> cat <<EOF >cov-minimal-rt.c
>> static long counter;
>> void __sanitizer_cov_with_check(int *guard) {
>>   if (*guard == 0) {
>>     counter++;
>>     *guard=1;
>>   }
>> }
>> long __sanitizer_get_total_unique_coverage() { return counter; }
>> void __sanitizer_cov_module_init() {}
>> void __sanitizer_reset_coverage(){}
>> void __sanitizer_get_coverage_guards(){}
>> void __sanitizer_get_number_of_counters(){}
>> void __sanitizer_update_counter_bitset_and_clear_counters(){}
>> void __sanitizer_set_death_callback(){}
>> EOF
>>
>> clang -std=c++11 -c Fuzzer/Fuzzer*.cpp -I Fuzzer
>> clang -std=c++11  -fsanitize=leak -fsanitize-coverage=3 -mllvm
>> -sanitizer-coverage-block-threshold=0  Fuzzer/test/SimpleTest.cpp -c
>> clang -c cov-minimal-rt.c
>> clang++ *.o
>> ./a.out
>> ========================================================
>
> with this i could run the fuzzer against libc.a
>
> it's a bit more work to link to libc.a than adding
> a -L so i attached the scripts i used (and an example)
> so others can reproduce it
>
> c++ headers cannot be used in the test (that would
> require cleaning up the libstdc++ header mess)
>
> but i think there is no reason to use c++ for these
> libc api tests anyway

Sure.

>
> you may need to adjust the directories the scripts use
>
> (the linking may need to change when compiler-rt is
> used instead of libgcc)
>
> usage:
>
> cd workdir
> ./buildfuzz.sh
> ./buildmusl.sh
> ./fuzzcompile.sh reg.c
> ./fuzzlink.sh reg.o
> ./a.out
>
> of course to make it useful the malloc magic is needed for
> more likely crashes
>
>> The recently added afl-style counters
>> (https://code.google.com/p/address-sanitizer/wiki/AsanCoverage#Coverage_counters)
>> are a bit more involved, but the basic bool-per-edge is quite enough
>> in most cases.
>>
>
> ok
>
>> The fuzzer itself is written in C++ and uses STL (probably, not the
>> best idea, but it makes the experiments simpler).
>> Can't tell if it will be a problem with musl, but after all the fuzzer
>> itself is also trivial (as well as the entire concept)
>>
>
> c++ happens to work because musl is (almost) abi compatible with
> glibc on x86 so we can just link to the glibc linked libstdc++
>
> (this can eg fail when the c++ thread local storage destructor
> abi is used, that is not implemented in musl yet)
>
> so yes c++ makes things more painful: you need to recompile the
> entire toolchain to make it work reliably (and then both gcc
> and clang have broken assumptions about the libc so you have to
> patch them) which is too much work for running tests
>
>> > Well static linking with musl does not impose any constraint on
>> > redefining functions, so you could easily use a debugging malloc that
>> > lines up each allocation to end on a page boundary with a guard page
>> > after it.
>>
>> Yea... This will slow down fuzzing and guard pages only protect you
>> from overflow in one direction (either left or right, but not both).
>> But this is better than nothing.
>>
>
> you can run the tests twice (for left and right) :)
>
>> > This would of course be slow and use lots of memory but
>> > would catch all heap overflows. And -fstack-protector-all would catch
>> > most stack-based overflows.
>>
>> Only stack-overflow-write by a small amount, but yes, better than nothing.
>>
>> BTW, writing a minimalistic asan run-time as part of musl should be a
>> matter of a couple of hours.
>> Probably much faster than making the current monster work with static linking.
>> I'd be happy to help with such.
>>
>
> how would this look?
>
> compile the tests and libc with asan, but instead of linking the
> asan runtime from clang use a musl specific one?

Yes
>
> i assume for that we still need to change the libc startup code, malloc
> functions and maybe some things around thread stacks

Try to compile a simple file with asan:

int main(int argc, char **argv) {
  int a[10];
  a[argc * 10] = 0;
  return 0;
}


% clang -fsanitize=address  a.c -c

% nm a.o | grep U
                 U __asan_init_v5
                 U __asan_option_detect_stack_use_after_return
                 U __asan_report_store4
                 U __asan_stack_malloc_1

__asan_report_store4 should print an error message saying that a
"bad write of 4 bytes" happened in <current stack trace> at address <param>.
Also provide the other __asan_report_{store,load}{1,2,4,8,16} variants.
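
A rough sketch of what those report functions could look like, assuming
each one takes the faulting address as its only argument. format_report
and report are made-up helper names, the stack trace is left out, and a
real in-libc version would probably call write(2) directly rather than
go through stdio:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: format the one-line report into buf. */
int format_report(char *buf, size_t n, int is_write, int size, uintptr_t addr) {
    return snprintf(buf, n, "asan: bad %s of %d bytes at address %#lx\n",
                    is_write ? "write" : "read", size, (unsigned long)addr);
}

/* Hypothetical helper: print the report and terminate (no stack trace). */
static void report(int is_write, int size, uintptr_t addr) {
    char buf[80];
    int len = format_report(buf, sizeof buf, is_write, size, addr);
    if (len >= (int)sizeof buf) len = (int)sizeof buf - 1;  /* truncated */
    if (len > 0) fwrite(buf, 1, (size_t)len, stderr);
    abort();
}

/* One store/load pair per access size the compiler may reference. */
#define DEFINE_REPORT(n) \
    void __asan_report_store##n(uintptr_t addr) { report(1, n, addr); } \
    void __asan_report_load##n(uintptr_t addr)  { report(0, n, addr); }

DEFINE_REPORT(1)
DEFINE_REPORT(2)
DEFINE_REPORT(4)
DEFINE_REPORT(8)
DEFINE_REPORT(16)
```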

__asan_init_v5 will be called by the module initializer.
When called for the first time, it should mmap the shadow memory.
https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerAlgorithm
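
Something along these lines might do for the init function, as a sketch
only. It assumes x86_64 Linux, where the page above gives
Shadow = (Mem >> 3) + 0x7fff8000, and a 47-bit user address space.
mem_to_shadow and the macro names are mine, and unlike the real
run-time this maps one flat region and does not protect the shadow gap:

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

#define SHADOW_SCALE  3                           /* 8 app bytes per shadow byte */
#define SHADOW_OFFSET ((uintptr_t)0x7fff8000)     /* x86_64 Linux value */
#define HIGH_MEM_END  ((uintptr_t)0x7fffffffffff) /* assumed top of user space */

/* Address of the shadow byte describing the 8-byte granule holding mem. */
uintptr_t mem_to_shadow(uintptr_t mem) {
    return (mem >> SHADOW_SCALE) + SHADOW_OFFSET;
}

/* Called by every instrumented module's initializer; maps the shadow once. */
void __asan_init_v5(void) {
    static int initialized;
    if (initialized) return;
    initialized = 1;

    uintptr_t beg = mem_to_shadow(0);
    uintptr_t end = mem_to_shadow(HIGH_MEM_END) + 1;
    /* MAP_NORESERVE: the range is huge but only touched pages cost memory. */
    void *p = mmap((void *)beg, end - beg, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED | MAP_NORESERVE,
                   -1, 0);
    if (p == MAP_FAILED) abort();
}
```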

__asan_option_detect_stack_use_after_return is a global, define it to 0.
__asan_stack_malloc_1 -- just make it an empty function.
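
In C these two are about as small as it gets. The empty function is
enough because, as far as I know, the instrumented prologue only calls
__asan_stack_malloc_N when the flag is non-zero. The argument list is
deliberately left out of this sketch since the stub is never expected
to run:

```c
/* Read by instrumented function prologues; 0 disables the fake stack. */
int __asan_option_detect_stack_use_after_return = 0;

/* Only needed to satisfy the linker while the flag above stays 0. */
void __asan_stack_malloc_1(void) {}
```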

Now you can build code with asan and detect stack buffer overflows.
(The reports won't be very detailed, but they will be correct.)
If you add poisoned redzones to malloc, you get heap buffer overflow detection.
If you delay the reuse of freed memory, you get use-after-free detection.
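
To make the redzone idea concrete, here is a toy model, not musl's
malloc: a bump allocator over a small static arena with its own
miniature shadow array. All names (rz_malloc, rz_free, bad_access) are
invented for the illustration. Shadow bytes follow the scheme from the
algorithm page: 0 means all 8 bytes are addressable, k in 1..7 means
only the first k are, negative means poisoned. Freed memory is simply
never reused, which is the crudest form of delaying reuse:

```c
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 4096
#define REDZONE    16   /* poisoned bytes on each side; a multiple of 8 */

static _Alignas(8) unsigned char arena[ARENA_SIZE];
static signed char shadow[ARENA_SIZE / 8];  /* one byte per 8 arena bytes */
static size_t arena_used;

void *rz_malloc(size_t size) {
    size_t rounded = (size + 7) & ~(size_t)7;
    size_t total = REDZONE + rounded + REDZONE;
    if (total < size || total > ARENA_SIZE - arena_used) return NULL;
    size_t user = arena_used + REDZONE;
    memcpy(&arena[arena_used], &size, sizeof size);   /* header in left redzone */
    memset(&shadow[arena_used / 8], -1, REDZONE / 8); /* poison left redzone */
    memset(&shadow[user / 8], 0, rounded / 8);        /* unpoison user bytes */
    if (size % 8)                                     /* partial last granule */
        shadow[(user + size) / 8] = (signed char)(size % 8);
    memset(&shadow[(user + rounded) / 8], -1, REDZONE / 8); /* right redzone */
    arena_used += total;
    return &arena[user];
}

void rz_free(void *p) {
    if (!p) return;
    size_t user = (size_t)((unsigned char *)p - arena), size;
    memcpy(&size, &arena[user - REDZONE], sizeof size);
    /* Poison the whole block; it is never handed out again. */
    memset(&shadow[user / 8], -1, ((size + 7) & ~(size_t)7) / 8);
}

/* The check the compiler inlines before an n-byte access (n <= 8). */
int bad_access(const void *p, size_t n) {
    size_t off = (size_t)((const unsigned char *)p - arena);
    signed char k = shadow[off / 8];
    return k != 0 && (int)((off & 7) + n - 1) >= k;
}
```

With a 10-byte allocation p, accesses to p[0] through p[9] pass, while
p[-1] and p[10] land in poisoned shadow, and after rz_free(p) every
access to the block is flagged.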

If you then implement __asan_register_globals (it is called on module
initialization and poisons redzones for globals),
you get global buffer overflow detection as well.
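
The poisoning step for globals could look roughly like this. It is a
model, not the real interface: toy_global is a cut-down stand-in for
the descriptor the compiler emits (the real struct has more fields and
its layout depends on the asan ABI version, so check compiler-rt before
copying anything), and the shadow here covers one small static region
instead of the whole address space. It assumes each global starts on an
8-byte boundary and that size_with_redzone is a multiple of 8:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, cut-down descriptor for one instrumented global. */
struct toy_global {
    uintptr_t beg;             /* address of the global */
    size_t size;               /* its real size in bytes */
    size_t size_with_redzone;  /* size plus trailing redzone */
};

/* Toy shadow over a single region standing in for the data segment. */
#define REGION_SIZE 256
_Alignas(8) unsigned char region[REGION_SIZE];
signed char region_shadow[REGION_SIZE / 8];

static signed char *shadow_of(uintptr_t addr) {
    return &region_shadow[(addr - (uintptr_t)region) / 8];
}

/* Poison the trailing redzone of each of the n globals. */
void toy_register_globals(const struct toy_global *g, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uintptr_t end = g[i].beg + g[i].size;
        uintptr_t rz_end = g[i].beg + g[i].size_with_redzone;
        uintptr_t aligned = (end + 7) & ~(uintptr_t)7;
        if (g[i].size % 8)  /* last granule is only partly addressable */
            *shadow_of(end) = (signed char)(g[i].size % 8);
        memset(shadow_of(aligned), -1, (rz_end - aligned) / 8);
    }
}
```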

The current asan run-time is large and hairy because it attempts to be
thread-friendly,
intercepts lots of libc, and provides very detailed error messages.
Without all that, the run-time will easily fit in < 100 LOC, which can be
part of a libc implementation.

hth,
--kcc