mailing list of musl libc
* [musl] Unwinding multithreaded musl applications with elfutils fails
From: Matt Wozniski @ 2023-03-31  2:43 UTC
  To: musl

I'm unsure if this is an elfutils bug or a musl bug. I suspect both.
I've already reported this to the elfutils maintainers at
https://sourceware.org/bugzilla/show_bug.cgi?id=30272

Using the elfutils eu-stack program or libdw's dwfl_getthread_frames
API to unwind multithreaded applications linked against musl libc on
x86-64 fails, getting stuck on `__clone`:

TID 241:
<uninteresting frames snipped>
#20 0x00007f6f2f74f08b start
#21 0x00007f6f2f75138e __clone
#22 0x00007f6f2f75138e __clone
#23 0x00007f6f2f75138e __clone
...
#253 0x00007f6f2f75138e __clone
#254 0x00007f6f2f75138e __clone
#255 0x00007f6f2f75138e __clone
eu-stack: tid 241: shown max number of frames (256, use -n 0 for unlimited)


GDB seems to detect the condition that libdw gets stuck on. It emits a
warning and stops the backtrace instead of repeating the frame:

<uninteresting frames snipped>
#44 0x00007f8f83e4d08b in start (p=0x7f8f836b8b00) at src/thread/pthread_create.c:203
#45 0x00007f8f83e4f38e in __clone () at src/thread/x86_64/clone.s:22
Backtrace stopped: frame did not save the PC

I suspect the cause for gdb's "frame did not save the PC" warning and
elfutils' repeated emission of the same frame is an invalid DWARF CIE
for __clone in musl.


Reproducer:

docker run -it --privileged python:3.10-alpine sh

And in the container:

apk add --update musl-dbg elfutils
python3.10 -c "import os, threading; threading.Thread(target=lambda: os.system(f'eu-stack --pid={os.getpid()}')).start()"

That spawns a thread that forks a subprocess that runs `eu-stack` on
its parent, and reproduces the issue. If you remove the thread and
just run:

python3.10 -c "import os; os.system(f'eu-stack --pid={os.getpid()}')"

then unwinding succeeds, ending at `_start`.
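
The symptom in the eu-stack output above is a run of frames that all report
the same PC. As a rough illustration only, the following Python sketch flags
such a run in captured output. The helper name `find_stuck_frame`, the
`min_repeats` threshold, and the regular expression are assumptions made for
this example and are not part of elfutils. A repeated PC is also what deep
recursion looks like, so this is a hint rather than proof of a broken unwind.

```python
import re

# Matches lines like "#21 0x00007f6f2f75138e __clone", i.e. the layout shown
# in this report. Other eu-stack options may print frames differently.
FRAME_RE = re.compile(r"^#(\d+)\s+(0x[0-9a-fA-F]+)\s+(\S+)")


def find_stuck_frame(output, min_repeats=3):
    """Return (first_index, pc, symbol) for the first run of at least
    `min_repeats` consecutive frames with the same PC, or None."""
    run_start = None  # first frame of the current run of identical PCs
    run_len = 0
    for line in output.splitlines():
        m = FRAME_RE.match(line.strip())
        if not m:
            continue  # skip headers, "..." and other non-frame lines
        index, pc, symbol = int(m.group(1)), m.group(2), m.group(3)
        if run_start is not None and run_start[1] == pc:
            run_len += 1
        else:
            run_start, run_len = (index, pc, symbol), 1
        if run_len >= min_repeats:
            return run_start
    return None


sample = """\
#20 0x00007f6f2f74f08b start
#21 0x00007f6f2f75138e __clone
#22 0x00007f6f2f75138e __clone
#23 0x00007f6f2f75138e __clone
"""
print(find_stuck_frame(sample))  # (21, '0x00007f6f2f75138e', '__clone')
```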


* Re: [musl] Unwinding multithreaded musl applications with elfutils fails
From: Szabolcs Nagy @ 2023-03-31 11:40 UTC
  To: Matt Wozniski; +Cc: musl

* Matt Wozniski <godlygeek@gmail.com> [2023-03-30 22:43:28 -0400]:
> I'm unsure if this is an elfutils bug or a musl bug. I suspect both.
> I've already reported this to the elfutils maintainers at
> https://sourceware.org/bugzilla/show_bug.cgi?id=30272
> 
> Using the elfutils eu-stack program or libdw's dwfl_getthread_frames
> API to unwind multithreaded applications linked against musl libc on
> x86-64 fails, getting stuck on `__clone`:

musl has limited cfi debug info support (target specific), likely the
unwinder needs a

  .cfi_undefined rip

in the clone start function to know where the stack frames end.
(it could figure out the end with the same heuristic that gdb uses,
but apparently elfutils is not smart enough).

some backtracers may want cleared frame-pointer (rbp=0) to detect
the end. but musl does not guarantee frame-pointers either. rbp=0
may be the reason why backtrace in the main thread works, so it
may be enough to do that in threads too.

musl supports building things without any cfi debug info since c
does not require unwind support, but linux systems nowadays assume
unwind tables are part of the platform abi so musl based distros
should probably include it.
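
to make the end-of-stack point concrete, here is a toy model in python. it is
not elfutils or gdb code: `unwind`, the `step` callback and the addresses are
all made up for illustration. `step` returns the caller's (pc, cfa) pair, or
None when the return address is undefined, which is the effect a
`.cfi_undefined rip` annotation is meant to have. the "no progress" check is
only a simplified stand-in for the kind of heuristic a debugger might apply.

```python
def unwind(start, step, max_frames=256, detect_no_progress=True):
    """Walk frames from `start` until `step` reports the end of the stack,
    the walk stops making progress, or `max_frames` frames are collected."""
    frames = [start]
    reason = "frame limit reached"
    while len(frames) < max_frames:
        current = frames[-1]
        caller = step(current)
        if caller is None:
            reason = "return address undefined"
            break
        if detect_no_progress and caller == current:
            # Same PC and same CFA: the frame "unwinds" to itself.
            reason = "frame did not change"
            break
        frames.append(caller)
    return frames, reason


START, CLONE = 0x08B, 0x38E  # made-up addresses for `start` and `__clone`


def step_without_end_marker(frame):
    pc, cfa = frame
    if pc == START:
        return (CLONE, cfa + 16)
    return (CLONE, cfa)  # no end marker: __clone unwinds to itself


def step_with_end_marker(frame):
    pc, cfa = frame
    if pc == START:
        return (CLONE, cfa + 16)
    return None  # return address marked undefined in the outermost frame


first = (START, 0x1000)
print(unwind(first, step_without_end_marker, detect_no_progress=False)[1])
print(unwind(first, step_without_end_marker)[1])
print(unwind(first, step_with_end_marker)[1])
```

the first call fills all 256 frames like the eu-stack output, the second stops
after two frames because nothing changed, and the third stops after two frames
because the return address is undefined.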




* Re: [musl] Unwinding multithreaded musl applications with elfutils fails
From: Matt Wozniski @ 2023-04-02  2:57 UTC
  To: Matt Wozniski, musl; +Cc: nsz

On Fri, Mar 31, 2023 at 7:40 AM Szabolcs Nagy <nsz@port70.net> wrote:
>
> * Matt Wozniski <godlygeek@gmail.com> [2023-03-30 22:43:28 -0400]:
> > Using the elfutils eu-stack program or libdw's dwfl_getthread_frames
> > API to unwind multithreaded applications linked against musl libc on
> > x86-64 fails, getting stuck on `__clone`:
>
> musl has limited cfi debug info support (target specific), likely the
> unwinder needs a
>
>   .cfi_undefined rip
>
> in the clone start function to know where the stack frames end.
...
> musl supports building things without any cfi debug info since c
> does not require unwind support, but linux systems nowadays assume
> unwind tables are part of the platform abi so musl based distros
> should probably include it.
...
> musl does not guarantee frame-pointers either

So, if I understand what you're saying correctly: musl itself doesn't
guarantee the ability to unwind through it at all (neither using DWARF
unwind tables nor using frame pointers), but musl based distros like
Alpine ought to include proper unwind tables. Does that mean that you
don't consider the lack of CFI for `__clone` a defect in musl, but
that it's still worth reporting to the Alpine musl maintainers as a
defect in Alpine's musl build?

If so, what would distro maintainers have to do in order to remedy
that defect? Would it be patches to the (target specific) `clone.s` to
add appropriate CFI when building musl for the distro?

> (it could figure out the end with the same heuristic that gdb uses,
> but apparently elfutils is not smart enough).
>
> some backtracers may want cleared frame-pointer (rbp=0) to detect
> the end.
...
> rbp=0 may be the reason why backtrace in the main thread works, so it
> may be enough to do that in threads too.

And it sounds like both of these are workarounds that elfutils might
be able to pursue in the absence of correct unwind information built
into musl itself. Thanks, that gives a useful direction to dig in.

Thanks for the reply!


* Re: [musl] Unwinding multithreaded musl applications with elfutils fails
From: Szabolcs Nagy @ 2023-04-03 16:14 UTC
  To: Matt Wozniski; +Cc: musl

* Matt Wozniski <godlygeek@gmail.com> [2023-04-01 22:57:09 -0400]:
> On Fri, Mar 31, 2023 at 7:40 AM Szabolcs Nagy <nsz@port70.net> wrote:
> >
> > * Matt Wozniski <godlygeek@gmail.com> [2023-03-30 22:43:28 -0400]:
> > > Using the elfutils eu-stack program or libdw's dwfl_getthread_frames
> > > API to unwind multithreaded applications linked against musl libc on
> > > x86-64 fails, getting stuck on `__clone`:
> >
> > musl has limited cfi debug info support (target specific), likely the
> > unwinder needs a
> >
> >   .cfi_undefined rip
> >
> > in the clone start function to know where the stack frames end.
> ...
> > musl supports building things without any cfi debug info since c
> > does not require unwind support, but linux systems nowadays assume
> > unwind tables are part of the platform abi so musl based distros
> > should probably include it.
> ...
> > musl does not guarantee frame-pointers either
> 
> So, if I understand what you're saying correctly: musl itself doesn't
> guarantee the ability to unwind through it at all (neither using DWARF
> unwind tables nor using frame pointers), but musl based distros like
> Alpine ought to include proper unwind tables. Does that mean that you
> don't consider the lack of CFI for `__clone` a defect in musl, but
> that it's still worth reporting to the Alpine musl maintainers as a
> defect in Alpine's musl build?
> 
> If so, what would distro maintainers have to do in order to remedy
> that defect? Would it be patches to the (target specific) `clone.s` to
> add appropriate CFI when building musl for the distro?

musl has no cfi annotation by default, but there is a tool that adds
it to asm on some targets and the compiler can generate cfi for c code.

i think distros should enable cfi when building musl (currently it is
only in debug builds i think).

but it seems this is not enough to mark the end of the stack frames.

> > (it could figure out the end with the same heuristic that gdb uses,
> > but apparently elfutils is not smart enough).
> >
> > some backtracers may want cleared frame-pointer (rbp=0) to detect
> > the end.
> ...
> > rbp=0 may be the reason why backtrace in the main thread works, so it
> > may be enough to do that in threads too.
> 
> And it sounds like both of these are workarounds that elfutils might
> be able to pursue in the absence of correct unwind information built
> into musl itself. Thanks, that gives a useful direction to dig in.

it seems __clone already has xor %ebp,%ebp

maybe we need a rule in add-cfi.x86_64.awk to emit cfi based on that.
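
a rough sketch of what such a rule could do, written in python only for
illustration (the real tool is an awk script, and this is not taken from it).
the function name `annotate_end_of_stack` and the sample instructions are
hypothetical. it assumes the thread entry point can be recognised by the
instruction that clears the frame pointer, and it ignores the requirement that
cfi directives sit between `.cfi_startproc` and `.cfi_endproc`.

```python
import re

# Assumption: the end of the stack is marked by an instruction that zeroes
# the frame pointer, spelled like the one quoted in this message.
CLEAR_RBP_RE = re.compile(r"^\s*xor\s+%ebp\s*,\s*%ebp\b")


def annotate_end_of_stack(asm_lines):
    """Return a copy of `asm_lines` with `.cfi_undefined rip` inserted
    after every instruction that zeroes %ebp."""
    out = []
    for line in asm_lines:
        out.append(line)
        if CLEAR_RBP_RE.match(line):
            out.append("\t.cfi_undefined rip")
    return out


# Illustrative fragment, not a copy of musl's clone.s.
example = [
    "\txor %ebp,%ebp",
    "\tpop %rdi",
    "\tcall *%r9",
]
print("\n".join(annotate_end_of_stack(example)))
```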
