mailing list of musl libc
* Bug report, concurrency issue on exception with gcc 8.3.0
From: Max Neunhoeffer @ 2019-09-17 13:44 UTC
  To: musl

Hello,

I am experiencing problems when linking a large multithreaded C++ application
statically against musl libc. I am using Alpine Linux 3.10.1 and gcc 8.3.0
on x86_64. That is, I am using musl 1.1.22-r3 (Alpine Linux versioning)
and gcc 8.3.0-r0.

Before going into details, here is an overview:

1. libgcc does not detect correctly that the application is multithreaded,
   since `pthread_cancel` is not linked into the executable.
   As a consequence, the lazy initialization of data structures for stack
   unwinding (FDE tables) is executed without protection of a mutex.
   Therefore, if the very first exception in the program happens to be
   thrown in two threads concurrently, the data structures can be corrupted,
   resulting in a busy loop after `main()` is finished.
2. If I make sure that `pthread_cancel` is explicitly linked in, this problem
   is (almost certainly) gone. However, in certain scenarios this leads
   to a crash when the first exception is thrown.

I first reported this problem to gcc as a bug against libgcc, but the
gcc team denies responsibility; see
[this bug report](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91737).

I have produced small sample programs that exhibit the problems. A more
detailed analysis of what happens follows further below.

For case 1:

------------------------ snip exceptioncollision.cpp ----------------------
#include <thread>
#include <atomic>
#include <chrono>

std::atomic<int> letsgo{0};

void waiter() {
  size_t count = 0;
  while (letsgo == 0) {
    ++count;
  }
  try {
    throw 42;
  } catch (int const& s) {
  }
}

int main(int, char*[]) {
#ifdef REPAIR
  try { throw 42; } catch (int const& i) {}
#endif
  std::thread t1(waiter);
  std::thread t2(waiter);
  std::this_thread::sleep_for(std::chrono::milliseconds(10));
  letsgo = 1;
  t1.join();
  t2.join();
  return 0;
}
------------------------ snip exceptioncollision.cpp ----------------------

Use Alpine Linux 3.10.1, for example in a Docker container, and compile
as follows:

    g++ exceptioncollision.cpp -o exceptioncollision -O0 -Wall -std=c++14 -lpthread -static

Then execute the static executable multiple times:

    while true ; do ./exceptioncollision ; date ; done

After a few tries it will freeze.
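
The `REPAIR` branch in the program above can be read as a warm-up: one exception is thrown and caught on the main thread before any other thread exists. A minimal sketch of that idea as a helper follows. The name `warm_up_exception_machinery` is hypothetical, and the sketch rests on the assumption that the first throw completes whatever lazy setup the unwinder performs. It is a workaround, not a fix for the underlying detection problem.

```cpp
// Hypothetical helper: throw and catch one exception so that any lazy,
// first-throw setup in the unwinder runs on the calling thread.
// Intended to be called from main() before other threads are started.
int warm_up_exception_machinery() {
  int caught = 0;
  try {
    throw 42;
  } catch (int const& value) {
    caught = value;  // remember the value only to show the catch ran
  }
  return caught;
}
```

Calling this once at the top of `main()` has the same effect as compiling the sample with `-DREPAIR`.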


For case 2:

----------------------------------- snip exceptionbang.cpp ---------------
#include <pthread.h>
//#include <iostream>

#ifdef REPAIR
void* g(void *p) {
  return p;
}

void f() {
  pthread_t t;
  pthread_create(&t, nullptr, g, nullptr);
  pthread_cancel(t);
  pthread_join(t, nullptr);
}
#endif

int main(int argc, char*[]) {
#ifdef REPAIR
  if (argc == -1) { f(); }
#endif
  //std::cout << "Hello world!" << std::endl;
  try { throw 42; } catch(int const& i) {};
  return 0;
}
----------------------------------- snip exceptionbang.cpp ---------------

Use Alpine Linux 3.10.1, for example in a Docker container, and compile
as follows:

    g++ exceptionbang.cpp -o exceptionbang -Wall -Wextra -O0 -g -std=c++14 -static -DREPAIR=1

Execute `./exceptionbang` and it will crash with a segmentation fault.

Curiously, if you uncomment the line

    //#include <iostream>

then more static initialization code seems to be compiled in and
all is well.

More detailed analysis of what is happening:

Let's look at case 1 first:

libgcc insists that it is a good idea to check for the presence of
`pthread_cancel` to detect if the application is multi-threaded. In my
case, I do not explicitly use `pthread_cancel` and am linking statically,
so the symbol is not pulled into the executable (`pthread_cancel` is in
its own compilation unit) and the libgcc runtime thinks that the program
is single-threaded. As a consequence the mutex
[here](https://github.com/gcc-mirror/gcc/blob/4ac50a4913ed81cc83a8baf865e49a2c62a5fe5d/libgcc/unwind-dw2-fde.c#L1045) is not actually used.
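
To illustrate the kind of presence check involved, here is a minimal sketch that detects a symbol through a weak reference. It assumes a GCC-compatible compiler, and `hypothetical_threading_marker` is an invented stand-in for `pthread_cancel`. This is not libgcc's actual code, only the general technique.

```cpp
// Weak reference to a symbol that nothing in this program defines.
// `hypothetical_threading_marker` is an invented name standing in for
// `pthread_cancel`; the weak attribute is a GCC/Clang extension.
extern "C" void hypothetical_threading_marker(void) __attribute__((weak));

// True only if some object file in the link actually provides the symbol.
bool marker_is_linked_in() {
  // An unresolved weak reference ends up with a null address.
  void (*address)(void) = &hypothetical_threading_marker;
  return address != nullptr;
}
```

In a static link, an object file from an archive is only pulled in when something references one of its symbols strongly, so a check of this shape can report "not present" even though the program creates threads.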

Therefore some code in `libgcc`, which is executed when an exception is
first thrown in the life of the process ([see here](https://github.com/gcc-mirror/gcc/blob/4ac50a4913ed81cc83a8baf865e49a2c62a5fe5d/libgcc/unwind-dw2-fde.c#L1072)),
is not thread-safe and corrupts the data structure `seen_objects`, turning
a singly linked list into a circular one.

This in the end leads to a busy loop [here](https://github.com/gcc-mirror/gcc/blob/4ac50a4913ed81cc83a8baf865e49a2c62a5fe5d/libgcc/unwind-dw2-fde.c#L221).
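
One way such a list can become circular is sketched below in a simplified, single-threaded replay. It assumes that two threads picked up the same not-yet-classified object and that each inserted it into a sorted list. The `Object` struct and both function names are hypothetical and only model the idea. This is not libgcc's code, and the real interleaving may differ.

```cpp
#include <cstdint>

// Hypothetical stand-in for a per-object unwind record.
struct Object {
  std::uintptr_t pc_begin;
  Object* next;
};

// Insert `ob` into a singly linked list kept sorted by descending pc_begin.
// Correct as long as `ob` is not already a member of the list.
void insert_sorted(Object** head, Object* ob) {
  Object** p = head;
  while (*p != nullptr && (*p)->pc_begin >= ob->pc_begin) {
    p = &(*p)->next;
  }
  ob->next = *p;
  *p = ob;
}

// Replays on one thread what two unsynchronized threads could do if both
// read the same pending object before either removed it.
bool double_insert_creates_cycle() {
  Object ob{0x1000, nullptr};
  Object* seen = nullptr;
  insert_sorted(&seen, &ob);  // "thread A" classifies ob
  insert_sorted(&seen, &ob);  // "thread B" classifies the same ob again
  return ob.next == &ob;      // the node now points at itself
}
```

Once a node points at itself, any loop of the form `for (p = head; p; p = p->next)` never terminates, which matches the busy loop described above.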


Now let's look at case 2:

I tried to "fix" this by using `pthread_cancel` explicitly. This is how
I arrived at the second example program `exceptionbang.cpp`. Here, libgcc
does detect a multi-threaded program. However, the program crashes when
the first exception is thrown. I do not understand the
details, but it seems that the libgcc runtime code stumbles over some
data structures which are not properly initialized. When including the
header `iostream`, some more code is compiled in which initializes the
structures and all is well.


Please let me know if you need any more information and please Cc me in
communication about this issue.

Cheers,
  Max.

