From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Bug report, concurrency issue on exception with gcc 8.3.0
Date: Tue, 17 Sep 2019 10:02:27 -0400
Message-ID: <20190917140227.GW9017@brightrain.aerifal.cx>
References: <20190917134422.aootviums4hdtell@zen.arangodb.com>
In-Reply-To: <20190917134422.aootviums4hdtell@zen.arangodb.com>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

On Tue, Sep 17, 2019 at 03:44:22PM +0200, Max Neunhoeffer wrote:
> Hello,
>
> I am experiencing problems when linking a large multithreaded C++ application
> statically against libmusl. I am using Alpine Linux 3.10.1 and gcc 8.3.0
> on X86_64. That is, I am using libmusl 1.1.22-r3 (Alpine Linux versioning)
> and gcc 8.3.0-r0.
>
> Before going into details, here is an overview:
>
>  1. libgcc does not detect correctly that the application is multithreaded,
>     since `pthread_cancel` is not linked into the executable.
>     As a consequence, the lazy initialization of data structures for stack
>     unwinding (FDE tables) is executed without protection of a mutex.
>     Therefore, if the very first exception in the program happens to be
>     thrown in two threads concurrently, the data structures can be corrupted,
>     resulting in a busy loop after `main()` is finished.
>  2. If I make sure that I explicitly link in `pthread_cancel` this problem
>     is (almost certainly) gone, however, in certain scenarios this leads
>     to a crash when the first exception is thrown.
>
> I had first reported this problem to gcc as a bug against libgcc, but the
> gcc team denies responsibility, see
> [this bug report](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91737).

This is a gcc bug and needs to be fixed in libgcc.

Rich

> I have produced small sample programs to exhibit the problems, see below for
> a more detailed analysis as to what happens.
>
> For case 1:
>
> ------------------------ snip exceptioncollision.cpp ----------------------
> #include <atomic>
> #include <chrono>
> #include <thread>
>
> std::atomic<int> letsgo{0};
>
> void waiter() {
>   size_t count = 0;
>   while (letsgo == 0) {
>     ++count;
>   }
>   try {
>     throw 42;
>   } catch (int const& s) {
>   }
> }
>
> int main(int, char*[]) {
> #ifdef REPAIR
>   try { throw 42; } catch (int const& i) {}
> #endif
>   std::thread t1(waiter);
>   std::thread t2(waiter);
>   std::this_thread::sleep_for(std::chrono::milliseconds(10));
>   letsgo = 1;
>   t1.join();
>   t2.join();
>   return 0;
> }
> ------------------------ snip exceptioncollision.cpp ----------------------
>
> Use Alpine Linux 3.10.1, for example in a Docker container, and compile
> as follows:
>
>     g++ exceptioncollision.cpp -o exceptioncollision -O0 -Wall -std=c++14 -lpthread -static
>
> Then execute the static executable multiple times:
>
>     while true ; do ./exceptioncollision ; date ; done
>
> after a few tries it will freeze.
>
>
> For case 2:
>
> ----------------------------------- snip exceptionbang.cpp ---------------
> #include <pthread.h>
> //#include <iostream>
>
> #ifdef REPAIR
> void* g(void *p) {
>   return p;
> }
>
> void f() {
>   pthread_t t;
>   pthread_create(&t, nullptr, g, nullptr);
>   pthread_cancel(t);
>   pthread_join(t, nullptr);
> }
> #endif
>
> int main(int argc, char*[]) {
> #ifdef REPAIR
>   if (argc == -1) { f(); }
> #endif
>   //std::cout << "Hello world!" << std::endl;
>   try { throw 42; } catch(int const& i) {};
>   return 0;
> }
> ----------------------------------- snip exceptionbang.cpp ---------------
>
> Use Alpine Linux 3.10.1, for example in a Docker container, and compile
> as follows:
>
>     g++ exceptionbang.cpp -o exceptionbang -Wall -Wextra -O0 -g -std=c++14 -static -DREPAIR=1
>
> Execute `./exceptionbang` and it will create a segmentation violation.
>
> Curiously, if you uncomment the line
>
>     //#include <iostream>
>
> then more of static initialization code seems to be compiled in and
> all is well.
>
> More detailed analysis of what is happening:
>
> Let's look at case 1 first:
>
> libgcc insists that it is a good idea to check for the presence of
> `pthread_cancel` to detect if the application is multi-threaded. Therefore,
> in my case, since I do not explicitly use `pthread_cancel` and am
> linking statically, the libgcc runtime thinks that the program is
> single-threaded (since `pthread_cancel` is in its own compilation
> unit). As a consequence the mutex
> [here](https://github.com/gcc-mirror/gcc/blob/4ac50a4913ed81cc83a8baf865e49a2c62a5fe5d/libgcc/unwind-dw2-fde.c#L1045) is not actually used.
>
> Therefore some code in `libgcc`, which is executed when an exception is
> first thrown in the life of the process ([see here](https://github.com/gcc-mirror/gcc/blob/4ac50a4913ed81cc83a8baf865e49a2c62a5fe5d/libgcc/unwind-dw2-fde.c#L1072))
> is not thread-safe and ruins the data structure `seen_objects` rendering
> a singly linked list circular.
>
> This in the end leads to a busy loop [here](https://github.com/gcc-mirror/gcc/blob/4ac50a4913ed81cc83a8baf865e49a2c62a5fe5d/libgcc/unwind-dw2-fde.c#L221).
>
>
> Now let's look at case 2:
>
> I tried to "fix" this by using `pthread_cancel` explicitly. This is how
> I arrived at the second example program `exceptionbang.cpp`. Here, the
> detection succeeds in recognizing a multi-threaded program. However,
> it crashes when the first exception is thrown. I do not understand the
> details, but it seems that the libgcc runtime code stumbles over some
> data structures which are not properly initialized. When including the
> header `iostream`, some more code is compiled in which initializes the
> structures and all is well.
>
>
> Please let me know if you need any more information and please Cc me in
> communication about this issue.
>
> Cheers,
> Max.
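
For readers following case 1, the pattern described in the report can be
modelled in a few lines. This is only a rough sketch and not libgcc's actual
code: `Node`, `Registry`, `threads_active` and `run_with_lock` are hypothetical
names made up for illustration. `threads_active` stands in for the result of
the `pthread_cancel` presence check, and the list stands in for `seen_objects`.
The sketch only exercises the branch where the check succeeds and the mutex is
taken.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an entry of the unwinder's object list.
struct Node {
    int id = 0;
    Node* next = nullptr;
};

// Simplified model of a shared singly linked list whose lock is only
// taken when the runtime believes that threads are in use.
struct Registry {
    Node* head = nullptr;
    std::mutex mtx;
    bool threads_active;  // models the outcome of the pthread_cancel probe

    explicit Registry(bool active) : threads_active(active) {}

    // Push a node at the front of the list. The mutex is acquired only
    // if threads_active is true; otherwise the update is unprotected.
    void push(Node* n) {
        std::unique_lock<std::mutex> guard(mtx, std::defer_lock);
        if (threads_active) guard.lock();
        n->next = head;
        head = n;
    }

    // Walk the list and count its nodes (would never return on a
    // circular list, which is the busy loop described in the report).
    std::size_t length() const {
        std::size_t len = 0;
        for (const Node* p = head; p != nullptr; p = p->next) ++len;
        return len;
    }
};

// Insert threads * per_thread nodes concurrently with the lock enabled
// and return the resulting list length.
std::size_t run_with_lock(int threads, int per_thread) {
    Registry reg(true);
    std::vector<Node> nodes(static_cast<std::size_t>(threads) * per_thread);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&reg, &nodes, t, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                Node* n = &nodes[static_cast<std::size_t>(t) * per_thread + i];
                n->id = t * per_thread + i;
                reg.push(n);
            }
        });
    }
    for (auto& th : pool) th.join();
    return reg.length();
}
```

Calling `run_with_lock(4, 1000)` should give a list of 4000 nodes. Constructing
the `Registry` with `false` and running the same threads would be a data race
on `head`, which corresponds to the situation in case 1 where the probe fails
in a static link; that variant is deliberately not run here because its
behavior is undefined.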