mailing list of musl libc
* Bug report, concurrency issue on exception with gcc 8.3.0
@ 2019-09-17 13:44 Max Neunhoeffer
  2019-09-17 14:02 ` Rich Felker
  0 siblings, 1 reply; 7+ messages in thread
From: Max Neunhoeffer @ 2019-09-17 13:44 UTC (permalink / raw)
  To: musl

Hello,

I am experiencing problems when linking a large multithreaded C++ application
statically against musl. I am using Alpine Linux 3.10.1 and gcc 8.3.0
on x86_64. That is, I am using musl 1.1.22-r3 (Alpine Linux versioning)
and gcc 8.3.0-r0.

Before going into details, here is an overview:

1. libgcc does not correctly detect that the application is multithreaded,
   since `pthread_cancel` is not linked into the executable.
   As a consequence, the lazy initialization of the data structures for
   stack unwinding (FDE tables) runs without the protection of a mutex.
   Therefore, if the very first exception in the program happens to be
   thrown in two threads concurrently, the data structures can be corrupted,
   resulting in a busy loop after `main()` is finished.
2. If I make sure that I explicitly link in `pthread_cancel`, this problem
   is (almost certainly) gone. However, in certain scenarios this leads
   to a crash when the first exception is thrown.

I had first reported this problem to gcc as a bug against libgcc, but the
gcc team denies responsibility, see 
[this bug report](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91737).

I have produced small sample programs to exhibit the problems, see below for
a more detailed analysis as to what happens.

For case 1:

------------------------ snip exceptioncollision.cpp ----------------------
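#include <thread>
#include <atomic>
#include <chrono>
#include <cstddef>

// Start flag: both threads spin on it so that they throw at (nearly)
// the same moment.
std::atomic<int> letsgo{0};

void waiter() {
  std::size_t count = 0;
  while (letsgo == 0) {
    ++count;
  }
  // First exception in this thread; it races with the other thread's throw.
  try {
    throw 42;
  } catch (int const& s) {
  }
}

int main(int, char*[]) {
#ifdef REPAIR
  // Throw once before any thread exists, so that the lazy initialization
  // in the unwinder has already happened when the threads start.
  try { throw 42; } catch (int const& i) {}
#endif
  std::thread t1(waiter);
  std::thread t2(waiter);
  std::this_thread::sleep_for(std::chrono::milliseconds(10));
  letsgo = 1;  // release both threads
  t1.join();
  t2.join();
  return 0;
}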
------------------------ snip exceptioncollision.cpp ----------------------

Use Alpine Linux 3.10.1, for example in a Docker container, and compile
as follows:

    g++ exceptioncollision.cpp -o exceptioncollision -O0 -Wall -std=c++14 -lpthread -static

Then execute the static executable multiple times:

    while true ; do ./exceptioncollision ; date ; done

After a few tries it will freeze.


For case 2:

----------------------------------- snip exceptionbang.cpp ---------------
#include <pthread.h>
//#include <iostream>

#ifdef REPAIR
void* g(void *p) {
  return p;
}

// Never executed at run time (see main); it exists only so that
// `pthread_cancel` is referenced and gets linked into the executable.
void f() {
  pthread_t t;
  pthread_create(&t, nullptr, g, nullptr);
  pthread_cancel(t);
  pthread_join(t, nullptr);
}
#endif

int main(int argc, char*[]) {
#ifdef REPAIR
  // argc is never -1, so f() is not called.
  if (argc == -1) { f(); }
#endif
  //std::cout << "Hello world!" << std::endl;
  try { throw 42; } catch(int const& i) {};
  return 0;
}
----------------------------------- snip exceptionbang.cpp ---------------

Use Alpine Linux 3.10.1, for example in a Docker container, and compile
as follows:

    g++ exceptionbang.cpp -o exceptionbang -Wall -Wextra -O0 -g -std=c++14 -static -DREPAIR=1

Execute `./exceptionbang` and it will crash with a segmentation fault.

Curiously, if you uncomment the line

    //#include <iostream>

then more static initialization code seems to be linked in and
all is well.

More detailed analysis of what is happening:

Let's look at case 1 first:

libgcc insists that it is a good idea to check for the presence of
`pthread_cancel` to detect if the application is multi-threaded. Therefore,
in my case, since I do not explicitly use `pthread_cancel` and am
linking statically, the libgcc runtime thinks that the program is
single-threaded (since `pthread_cancel` is in its own compilation
unit). As a consequence the mutex
[here](https://github.com/gcc-mirror/gcc/blob/4ac50a4913ed81cc83a8baf865e49a2c62a5fe5d/libgcc/unwind-dw2-fde.c#L1045) is not actually used.
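
To illustrate the detection, here is a minimal sketch of the weak-reference
pattern. It is not libgcc's actual code: the symbol
`hypothetical_thread_marker` (standing in for `pthread_cancel`) and the helper
names are made up for this example, and it assumes GCC or Clang on an ELF
target. Nothing defines the weak symbol, so its address is null after linking
and the locking branch is skipped:

```cpp
#include <mutex>

// Weak declaration: if no object file in the link defines this symbol,
// the linker resolves its address to null instead of failing.
// (Hypothetical name, standing in for `pthread_cancel`.)
extern "C" void hypothetical_thread_marker() __attribute__((weak));

// "Are threads in use?" is answered by asking whether the symbol got linked.
bool threads_detected() {
  return hypothetical_thread_marker != nullptr;
}

// Increments `shared`; returns true only if the mutex was actually taken.
bool guarded_increment(std::mutex& m, int& shared) {
  if (threads_detected()) {
    std::lock_guard<std::mutex> lock(m);
    ++shared;
    return true;
  }
  ++shared;  // unprotected path, only safe if the detection was right
  return false;
}
```

In a static link the answer depends on which archive members happen to be
pulled in, not on whether the program really starts threads.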

Therefore some code in `libgcc`, which is executed when an exception is
first thrown in the life of the process ([see here](https://github.com/gcc-mirror/gcc/blob/4ac50a4913ed81cc83a8baf865e49a2c62a5fe5d/libgcc/unwind-dw2-fde.c#L1072)),
is not thread-safe and corrupts the data structure `seen_objects`, turning
a singly linked list into a circular one.

This in the end leads to a busy loop [here](https://github.com/gcc-mirror/gcc/blob/4ac50a4913ed81cc83a8baf865e49a2c62a5fe5d/libgcc/unwind-dw2-fde.c#L221).
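
One plausible way for such a list to become circular can be shown as a
single-threaded simulation of a bad interleaving. This is a simplified model
with hypothetical names, not the libgcc code (which keeps its list sorted). It
only assumes that two threads both pick up the same not-yet-registered object
and both insert it:

```cpp
// A list node, standing in for one registered unwind object.
struct Object {
  int id;
  Object* next;
};

// Unsynchronized insert at the head of the list.
void insert_seen(Object*& seen, Object* ob) {
  ob->next = seen;
  seen = ob;
}

// Floyd's cycle detection: true if following `next` never reaches null.
bool has_cycle(const Object* head) {
  const Object* slow = head;
  const Object* fast = head;
  while (fast != nullptr && fast->next != nullptr) {
    slow = slow->next;
    fast = fast->next->next;
    if (slow == fast) return true;
  }
  return false;
}

// Replays the race: "thread A" and "thread B" both insert the same object.
bool double_insert_makes_cycle() {
  Object ob{1, nullptr};
  Object* seen = nullptr;
  insert_seen(seen, &ob);  // thread A
  insert_seen(seen, &ob);  // thread B: ob.next now points at ob itself
  return has_cycle(seen);
}

// Control case: a single insert leaves a properly terminated list.
bool single_insert_makes_cycle() {
  Object ob{1, nullptr};
  Object* seen = nullptr;
  insert_seen(seen, &ob);
  return has_cycle(seen);
}
```

Any loop that walks such a list until it hits a null `next` pointer never
terminates, which matches the observed busy loop.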


Now let's look at case 2:

I tried to "fix" this by using `pthread_cancel` explicitly. This is how
I arrived at the second example program `exceptionbang.cpp`. Here, the
detection succeeds and treats the program as multi-threaded. However,
it crashes when the first exception is thrown. I do not understand the
details, but it seems that the libgcc runtime code stumbles over some
data structures which are not properly initialized. When including the
header `iostream`, some more code is linked in which initializes the
structures and all is well.


Please let me know if you need any more information and please Cc me in
communication about this issue.

Cheers,
  Max.


* Re: Bug report, concurrency issue on exception with gcc 8.3.0
  2019-09-17 13:44 Bug report, concurrency issue on exception with gcc 8.3.0 Max Neunhoeffer
@ 2019-09-17 14:02 ` Rich Felker
  2019-09-17 14:35   ` Rich Felker
  0 siblings, 1 reply; 7+ messages in thread
From: Rich Felker @ 2019-09-17 14:02 UTC (permalink / raw)
  To: musl

On Tue, Sep 17, 2019 at 03:44:22PM +0200, Max Neunhoeffer wrote:
> I had first reported this problem to gcc as a bug against libgcc, but the
> gcc team denies responsibility, see 
> [this bug report](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91737).

This is a gcc bug and needs to be fixed in libgcc.

Rich




* Re: Bug report, concurrency issue on exception with gcc 8.3.0
  2019-09-17 14:02 ` Rich Felker
@ 2019-09-17 14:35   ` Rich Felker
  2019-09-18  7:19     ` Max Neunhoeffer
  0 siblings, 1 reply; 7+ messages in thread
From: Rich Felker @ 2019-09-17 14:35 UTC (permalink / raw)
  To: musl

On Tue, Sep 17, 2019 at 10:02:27AM -0400, Rich Felker wrote:
> This is a gcc bug and needs to be fixed in libgcc.

I've updated the gcc tracker with more info, but I seem to lack the
ability to reopen the bug myself.

To add some more context, using weak references to determine if a
library is linked is a dynamic-linking-centric hack and is not
compatible with static linking. GCC has historically done this for
glibc and other systems where libpthread was a separate library to
avoid pulling in a dependency on it, but it's always been broken on
glibc with static linking too. Various distros worked around this with
horrible hacks as described in Andrew Pinski's reply to your bug
report, using binutils tricks to move the whole libpthread.a into a
single .o file so that if any of it gets linked it all gets linked.
It's possible upstream glibc adopted this at some point; I'm not sure.
But they're in the process of moving the mutex functions to libc
instead of libpthread (and maybe even getting rid of libpthread like
musl does), so GCC's hacks here won't even provide any benefit with
future glibc versions.

In any case, this kind of pushback against fixes for clear bugs used
to be expected, but things have gotten a lot better with musl being
more mainstream nowadays. I think the issue will get resolved quickly
once a few more GCC developers look at it. It was actually just
reopened while I was writing this email.

Rich


* Re: Bug report, concurrency issue on exception with gcc 8.3.0
  2019-09-17 14:35   ` Rich Felker
@ 2019-09-18  7:19     ` Max Neunhoeffer
  2019-09-18  9:21       ` Szabolcs Nagy
  0 siblings, 1 reply; 7+ messages in thread
From: Max Neunhoeffer @ 2019-09-18  7:19 UTC (permalink / raw)
  To: musl

Hi Rich,

thanks for the quick response and for lobbying with the gcc folks!

Did you see the second example program in the original bug report? This
seems to indicate that there might be an additional problem, since when
I explicitly use `pthread_cancel` (thereby circumventing the detection
problem), I get a crash when the first exception is thrown.

Do you think this is a libgcc problem, too? Should I report this to the
gcc bug tracker as well?

Cheers,
  Max.


* Re: Bug report, concurrency issue on exception with gcc 8.3.0
  2019-09-18  7:19     ` Max Neunhoeffer
@ 2019-09-18  9:21       ` Szabolcs Nagy
  2019-09-18 12:45         ` Max Neunhoeffer
  0 siblings, 1 reply; 7+ messages in thread
From: Szabolcs Nagy @ 2019-09-18  9:21 UTC (permalink / raw)
  To: musl

* Max Neunhoeffer <max@arangodb.com> [2019-09-18 09:19:31 +0200]:
> thanks for the quick response and for lobbying with the gcc folks!
> 
> Did you see the second example program in the original bug report? This
> seems to indicate that there might be an additional problem, since when
> I explicitly use `pthread_cancel` (thereby circumventing the detection
> problem), I get a crash when the first exception is thrown.

pthread_cancel does not solve the detection problem.

reference to pthread_cancel only helps with dynamic linking.
in case of static linking you have to explicitly add (strong)
reference to symbols that libgcc_eh.a uses:

pthread_cancel
pthread_getspecific
pthread_key_create
pthread_mutex_lock
pthread_mutex_unlock
pthread_once
pthread_setspecific

where pthread_cancel is only needed to make libgcc_eh.a call the
thread functions (but those are all weakrefs so will just be 0
at runtime unless there are other strong references to them).
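
The failure mode described above can be modelled without any linker tricks.
The following is a sketch under stated assumptions, not libgcc code: the
struct and function names are hypothetical, and the scenario where only
`pthread_cancel` got linked is assumed for illustration.

```cpp
#include <string>
#include <vector>

// One weak reference as seen at run time: its name and whether the static
// link pulled in a definition (false means the address is 0).
struct WeakRef {
  std::string name;
  bool linked;
};

// Models the detection: only `pthread_cancel` is consulted.
bool threads_detected(const std::vector<WeakRef>& refs) {
  for (const WeakRef& r : refs) {
    if (r.name == "pthread_cancel") return r.linked;
  }
  return false;
}

// Names of the functions that would be called through a null pointer.
std::vector<std::string> null_calls(const std::vector<WeakRef>& refs) {
  std::vector<std::string> bad;
  if (!threads_detected(refs)) return bad;  // thread functions never called
  for (const WeakRef& r : refs) {
    if (!r.linked) bad.push_back(r.name);
  }
  return bad;
}

// The seven symbols from the list above, all with the same link status
// except `pthread_cancel`, which is set separately.
std::vector<WeakRef> make_refs(bool cancel_linked, bool others_linked) {
  return {
      {"pthread_cancel", cancel_linked},
      {"pthread_getspecific", others_linked},
      {"pthread_key_create", others_linked},
      {"pthread_mutex_lock", others_linked},
      {"pthread_mutex_unlock", others_linked},
      {"pthread_once", others_linked},
      {"pthread_setspecific", others_linked},
  };
}
```

With `make_refs(true, false)` the model reports six null calls (the crash
case), with `make_refs(false, false)` none (no crash, but no locking
either), and with `make_refs(true, true)` none.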


* Re: Bug report, concurrency issue on exception with gcc 8.3.0
  2019-09-18  9:21       ` Szabolcs Nagy
@ 2019-09-18 12:45         ` Max Neunhoeffer
  2019-09-24 23:22           ` Rich Felker
  0 siblings, 1 reply; 7+ messages in thread
From: Max Neunhoeffer @ 2019-09-18 12:45 UTC (permalink / raw)
  To: musl

Hello,

thank you very much for the explanation. This gives me a temporary way
to fix up our application until the bug has been fixed.
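
A minimal sketch of such a temporary workaround, following the symbol list
from the explanation: one extra source file that takes the address of each
function, which creates the strong references. The names `AnyFn`,
`forced_thread_refs` and `resolved_thread_refs` are made up for this example,
and whether this is sufficient on a given toolchain is an assumption that has
to be checked there.

```cpp
#include <pthread.h>
#include <cstddef>

// Generic function pointer type, used only to store the addresses.
using AnyFn = void (*)();

// Taking the address of each function is a strong reference, so a static
// link has to pull in the definitions. `used` keeps the compiler from
// discarding the array even though nothing else has to read it.
__attribute__((used)) AnyFn forced_thread_refs[] = {
    reinterpret_cast<AnyFn>(&pthread_cancel),
    reinterpret_cast<AnyFn>(&pthread_getspecific),
    reinterpret_cast<AnyFn>(&pthread_key_create),
    reinterpret_cast<AnyFn>(&pthread_mutex_lock),
    reinterpret_cast<AnyFn>(&pthread_mutex_unlock),
    reinterpret_cast<AnyFn>(&pthread_once),
    reinterpret_cast<AnyFn>(&pthread_setspecific),
};

// Sanity check: how many of the references resolved to a real address.
std::size_t resolved_thread_refs() {
  std::size_t n = 0;
  for (AnyFn fn : forced_thread_refs) {
    if (fn != nullptr) ++n;
  }
  return n;
}
```

The file only needs to be part of the link; none of the functions is called
through the array.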

Cheers,
  Max.


* Re: Bug report, concurrency issue on exception with gcc 8.3.0
  2019-09-18 12:45         ` Max Neunhoeffer
@ 2019-09-24 23:22           ` Rich Felker
  0 siblings, 0 replies; 7+ messages in thread
From: Rich Felker @ 2019-09-24 23:22 UTC (permalink / raw)
  To: musl

On Wed, Sep 18, 2019 at 02:45:51PM +0200, Max Neunhoeffer wrote:
> Hello,
> 
> thank you very much for the explanation. This gives me a temporary way
> to fix up our application until the bug has been fixed.

I'm adding the attached patch to musl-cross-make; it should fix the
issue adequately on the gcc side.

Rich


[-- Attachment #2: 0001-fix-gthr-weak-refs-for-libgcc.patch --]
[-- Type: text/plain, Size: 1675 bytes --]

From 51a354a0fb54165d505bfed9819c0440027312d9 Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy <nsz@port70.net>
Date: Sun, 22 Sep 2019 23:04:48 +0000
Subject: [PATCH] fix gthr weak refs for libgcc

ideally gthr-posix.h should be fixed not to use weak refs for
single thread detection by default since that's unsafe.

currently we have to opt out explicitly from the unsafe behaviour
in the configure machinery of each target lib that uses gthr and
musl missed libgcc previously.

related bugs and discussions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78017
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87189
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91737
https://sourceware.org/bugzilla/show_bug.cgi?id=5784
https://sourceware.org/ml/libc-alpha/2012-09/msg00192.html
https://sourceware.org/ml/libc-alpha/2019-08/msg00438.html
---
 libgcc/config.host          | 7 +++++++
 libgcc/config/t-gthr-noweak | 2 ++
 2 files changed, 9 insertions(+)
 create mode 100644 libgcc/config/t-gthr-noweak

diff --git a/libgcc/config.host b/libgcc/config.host
index 122113fc519..fe1b9ab93d5 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1500,3 +1500,10 @@ aarch64*-*-*)
 	tm_file="${tm_file} aarch64/value-unwind.h"
 	;;
 esac
+
+case ${host} in
+*-*-musl*)
+  # The gthr weak references are unsafe with static linking
+  tmake_file="$tmake_file t-gthr-noweak"
+  ;;
+esac
diff --git a/libgcc/config/t-gthr-noweak b/libgcc/config/t-gthr-noweak
new file mode 100644
index 00000000000..45a21e9361d
--- /dev/null
+++ b/libgcc/config/t-gthr-noweak
@@ -0,0 +1,2 @@
+# Don't use weak references for single-thread detection
+HOST_LIBGCC2_CFLAGS += -DGTHREAD_USE_WEAK=0
-- 
2.17.1
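
As a rough illustration of what the flag changes: with
`-DGTHREAD_USE_WEAK=0` the gthr layer no longer consults weak references and
treats threads as always active. The sketch below models that build-time
choice as an ordinary parameter. The function and parameter names are
hypothetical and this is not the code from `gthr-posix.h`.

```cpp
// `use_weak` stands for the value of GTHREAD_USE_WEAK; `marker` stands for
// the address a weak reference resolved to (null if it was not linked).
bool threads_active(bool use_weak, const void* marker) {
  if (!use_weak) {
    return true;  // no detection: always take the thread-safe path
  }
  return marker != nullptr;  // detection by weak reference
}
```

With `use_weak` false the result no longer depends on what the static link
happened to pull in, at the cost of always taking the locks.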


Thread overview: 7+ messages
2019-09-17 13:44 Bug report, concurrency issue on exception with gcc 8.3.0 Max Neunhoeffer
2019-09-17 14:02 ` Rich Felker
2019-09-17 14:35   ` Rich Felker
2019-09-18  7:19     ` Max Neunhoeffer
2019-09-18  9:21       ` Szabolcs Nagy
2019-09-18 12:45         ` Max Neunhoeffer
2019-09-24 23:22           ` Rich Felker
