From: Max Neunhoeffer
Newsgroups: gmane.linux.lib.musl.general
Subject: Bug report, concurrency issue on exception with gcc 8.3.0
Date: Tue, 17 Sep 2019 15:44:22 +0200
Message-ID: <20190917134422.aootviums4hdtell@zen.arangodb.com>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

Hello,

I am experiencing problems when linking a large multithreaded C++
application statically against libmusl. I am using Alpine Linux 3.10.1 and
gcc 8.3.0 on X86_64. That is, I am using libmusl 1.1.22-r3 (Alpine Linux
versioning) and gcc 8.3.0-r0.

Before going into details, here is an overview:

 1. libgcc does not detect correctly that the application is multithreaded,
    since `pthread_cancel` is not linked into the executable. As a
    consequence, the lazy initialization of data structures for stack
    unwinding (FDE tables) is executed without protection of a mutex.
    Therefore, if the very first exception in the program happens to be
    thrown in two threads concurrently, the data structures can be
    corrupted, resulting in a busy loop after `main()` is finished.

 2. If I make sure that I explicitly link in `pthread_cancel`, this problem
    is (almost certainly) gone. However, in certain scenarios this leads to
    a crash when the first exception is thrown.

I first reported this problem to gcc as a bug against libgcc, but the
gcc team denies responsibility, see
[this bug report](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91737).

I have produced small sample programs that exhibit the problems; see below
for a more detailed analysis of what happens.

For case 1:

------------------------ snip exceptioncollision.cpp ----------------------
#include <atomic>
#include <chrono>
#include <thread>

std::atomic<int> letsgo{0};

void waiter() {
  // Spin until main() releases both threads at the same moment.
  size_t count = 0;
  while (letsgo == 0) {
    ++count;
  }
  // First throw in this thread; races with the other thread's first throw.
  try {
    throw 42;
  } catch (int const& s) {
  }
}

int main(int, char*[]) {
#ifdef REPAIR
  // Throw once while still single-threaded so the lazy init runs unraced.
  try { throw 42; } catch (int const& i) {}
#endif
  std::thread t1(waiter);
  std::thread t2(waiter);
  std::this_thread::sleep_for(std::chrono::milliseconds(10));
  letsgo = 1;
  t1.join();
  t2.join();
  return 0;
}
------------------------ snip exceptioncollision.cpp ----------------------

Use Alpine Linux 3.10.1, for example in a Docker container, and compile as
follows:

    g++ exceptioncollision.cpp -o exceptioncollision -O0 -Wall -std=c++14 -lpthread -static

Then execute the static executable multiple times:

    while true ; do ./exceptioncollision ; date ; done

After a few tries it will freeze.

For case 2:

----------------------------------- snip exceptionbang.cpp ---------------
#include <pthread.h>
//#include <iostream>

#ifdef REPAIR
void* g(void *p) {
  return p;
}

// Never called at runtime; exists only to pull pthread_cancel into the link.
void f() {
  pthread_t t;
  pthread_create(&t, nullptr, g, nullptr);
  pthread_cancel(t);
  pthread_join(t, nullptr);
}
#endif

int main(int argc, char*[]) {
#ifdef REPAIR
  if (argc == -1) {
    f();
  }
#endif
  //std::cout << "Hello world!" << std::endl;
  try { throw 42; } catch(int const& i) {};
  return 0;
}
----------------------------------- snip exceptionbang.cpp ---------------

Use Alpine Linux 3.10.1, for example in a Docker container, and compile as
follows:

    g++ exceptionbang.cpp -o exceptionbang -Wall -Wextra -O0 -g -std=c++14 -static -DREPAIR=1

Execute `./exceptionbang` and it will crash with a segmentation violation.
Curiously, if you uncomment the line

    //#include <iostream>

then more static initialization code seems to be compiled in and all is
well.

More detailed analysis of what is happening:

Let's look at case 1 first:

libgcc insists that it is a good idea to check for the presence of
`pthread_cancel` to detect if the application is multi-threaded. In my
case, since I do not explicitly use `pthread_cancel` and am linking
statically, `pthread_cancel` is not pulled into the executable (it is in
its own compilation unit), so the libgcc runtime thinks that the program is
single-threaded. As a consequence the mutex
[here](https://github.com/gcc-mirror/gcc/blob/4ac50a4913ed81cc83a8baf865e49a2c62a5fe5d/libgcc/unwind-dw2-fde.c#L1045)
is not actually used. Therefore some code in `libgcc`, which is executed
when an exception is first thrown in the life of the process
([see here](https://github.com/gcc-mirror/gcc/blob/4ac50a4913ed81cc83a8baf865e49a2c62a5fe5d/libgcc/unwind-dw2-fde.c#L1072)),
is not thread-safe and corrupts the data structure `seen_objects`, turning
a singly linked list into a circular one. In the end this leads to a busy
loop
[here](https://github.com/gcc-mirror/gcc/blob/4ac50a4913ed81cc83a8baf865e49a2c62a5fe5d/libgcc/unwind-dw2-fde.c#L221).

Now let's look at case 2:

I tried to "fix" this by using `pthread_cancel` explicitly. This is how I
arrived at the second example program `exceptionbang.cpp`. Here, the
detection correctly recognizes a multi-threaded program. However, it
crashes when the first exception is thrown.
I do not understand the details, but it seems that the libgcc runtime code
stumbles over some data structures which are not properly initialized. When
including the header `iostream`, some more code is compiled in which
initializes the structures and all is well.

Please let me know if you need any more information and please Cc me in
communication about this issue.

Cheers,
  Max.
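
For illustration, the kind of weak-symbol check described under case 1 can
be sketched roughly as follows. This is only a minimal sketch under stated
assumptions, not libgcc's actual code: `hypothetical_thread_marker` is a
made-up stand-in for `pthread_cancel`, the helper names `marker_linked` and
`threading_guess` are invented for this example, and it assumes a GCC or
Clang toolchain targeting ELF, where an undefined weak symbol resolves to a
null address.

```cpp
#include <string>

// Hypothetical stand-in for pthread_cancel (assumption, not a real symbol).
// Declared weak, so if no linked object file defines it, the link still
// succeeds and the symbol's address is null.
extern "C" int hypothetical_thread_marker(void) __attribute__((weak));

// True only if some object in the link actually defines the marker.
bool marker_linked() {
  return &hypothetical_thread_marker != nullptr;
}

// The conclusion a runtime would draw from the check above.
std::string threading_guess() {
  return marker_linked() ? "multi-threaded" : "single-threaded";
}
```

Since nothing defines the marker here, `threading_guess()` is expected to
report "single-threaded" no matter how many threads the program starts,
which mirrors the situation in a static link that never pulls in
`pthread_cancel`.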