* [musl] Deadlock in dynamic linker?
@ 2025-05-24 5:45 Markus Wichmann
From: Markus Wichmann @ 2025-05-24 5:45 UTC
To: musl
Hi all,
I have a question about the handling of shutting_down in the dynamic
linker. Namely, I saw that do_init_fini() will go into an infinite wait
loop once that flag is set. The idea was probably to park initializing
threads while the process is shutting down, but can't this lead to a
deadlock situation?
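For reference, here is a condensed paraphrase of the loop I mean in
ldso/dynlink.c (from memory, simplified and not verbatim; has_fini()
and run_ctors() stand in for the actual DT_INIT/DT_FINI handling):

static void do_init_fini(struct dso **queue)
{
    pthread_t self = __pthread_self();
    struct dso *p;
    size_t i;

    pthread_mutex_lock(&init_fini_lock);
    for (i=0; (p=queue[i]); i++) {
        /* Once shutting_down is set, this wait can never end:
         * nothing ever clears the flag again. */
        while ((p->ctor_visitor && p->ctor_visitor != self) || shutting_down)
            pthread_cond_wait(&ctor_cond, &init_fini_lock);
        if (p->ctor_visitor || p->constructed) continue;
        p->ctor_visitor = self;

        /* The dso goes onto the fini list *before* its ctors run. */
        if (has_fini(p)) {
            p->fini_next = fini_head;
            fini_head = p;
        }

        pthread_mutex_unlock(&init_fini_lock);
        run_ctors(p);               /* may itself call dlopen() */
        pthread_mutex_lock(&init_fini_lock);

        p->constructed = 1;
        p->ctor_visitor = 0;
        pthread_cond_broadcast(&ctor_cond);
    }
    pthread_mutex_unlock(&init_fini_lock);
}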
I'm thinking something like this: Thread A initializes liba.so. liba.so
has initializers and finalizers, so thread A adds liba.so to the fini
list before calling the initializers. The liba initializer calls
dlopen("libb.so"). libb.so also has initializers.
While thread A is not holding the init_fini_lock, thread B calls exit().
That progresses until __libc_exit_fini() sets shutting_down to 1. Then
it tries to destroy all the libraries, but the loop stops when it comes
to liba.
liba.so has a ctor_visitor, namely thread A, so thread B cannot advance.
Thread A meanwhile is hanging in the infinite wait loop trying to
initialize libb.so. The situation cannot change, and the process hangs
indefinitely.
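Here is an untested sketch of a reproducer for this interleaving
(whether it actually hangs depends on scheduling; the sleeps only
widen the race window):

/* libb.c -- build as libb.so; just needs a ctor */
__attribute__((constructor)) static void b_init(void) {}

/* liba.c -- build as liba.so; its ctor dlopens libb.so */
#include <dlfcn.h>
#include <unistd.h>
__attribute__((constructor)) static void a_init(void)
{
    sleep(1);                     /* widen the race window */
    dlopen("libb.so", RTLD_NOW);  /* thread A parks in do_init_fini() */
}

/* main.c -- thread A loads liba.so, thread B (main) calls exit() */
#include <dlfcn.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

static void *load(void *arg)
{
    dlopen("liba.so", RTLD_NOW);
    return 0;
}

int main(void)
{
    pthread_t a;
    pthread_create(&a, 0, load, 0);
    usleep(100000);  /* let thread A get into liba's ctor */
    exit(0);         /* thread B: __libc_exit_fini() stops at liba */
}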
A simple way out of this pickle could be to add liba.so to the fini list
only after it was initialized. That way, thread B cannot hang on it, or
more generally, the finalizing thread cannot be halted by an incomplete
initialization in another thread. This might change the order of nodes
on the fini list, but only to account for dynamic dependencies. Isn't
that a good thing?
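In terms of the paraphrase above, the change would just defer the
fini-list insertion until after the ctors have returned (again only a
sketch):

        pthread_mutex_unlock(&init_fini_lock);
        run_ctors(p);
        pthread_mutex_lock(&init_fini_lock);

        /* moved here from before the ctor calls, so a
         * half-constructed dso is never on the fini list: */
        if (has_fini(p)) {
            p->fini_next = fini_head;
            fini_head = p;
        }

        p->constructed = 1;
        p->ctor_visitor = 0;
        pthread_cond_broadcast(&ctor_cond);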
Ciao,
Markus
* Re: [musl] Deadlock in dynamic linker?
@ 2025-05-27 16:59 Markus Wichmann
From: Markus Wichmann @ 2025-05-27 16:59 UTC
To: musl
On Tue, May 27, 2025 at 11:20:07AM -0400, Rich Felker wrote:
> On Sat, May 24, 2025 at 07:45:45AM +0200, Markus Wichmann wrote:
> > I'm thinking something like this: Thread A initializes liba.so. liba.so
> > has initializers and finalizers, so thread A adds liba.so to the fini
> > list before calling the initializers. The liba initializer calls
> > dlopen("libb.so"). libb.so also has initializers.
> >
> > While thread A is not holding the init_fini_lock, thread B calls exit().
> > That progresses until __libc_exit_fini() sets shutting_down to 1. Then
> > it tries to destroy all the libraries, but the loop stops when it comes
> > to liba.
> >
> > liba.so has a ctor_visitor, namely thread A, so thread B cannot advance.
> > Thread A meanwhile is hanging in the infinite wait loop trying to
> > initialize libb.so. The situation cannot change, and the process hangs
> > indefinitely.
>
> I see. In particular you're assuming the dlopen of libb happened after
> the exit started.
>
I had completely neglected to look at the global ldso lock. But
looking at it again, I am actually assuming that the dlopen() *starts*
before __libc_exit_fini() (so that thread B hangs waiting for the
lock), and that thread B then overtakes thread A between the latter's
release of the global lock and its taking of the init_fini_lock.
This does mean that taking the init_fini_lock before releasing the
global lock would entirely prevent the issue. Not sure if that's
acceptable, though.
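Something like this at the tail of dlopen(), handing over from one
lock to the other without a window in between (sketch only; the
do_init_fini_locked() variant that does not take the lock itself is
hypothetical, and I have not thought through other lock-order
consequences):

    pthread_mutex_lock(&init_fini_lock);  /* take this first... */
    pthread_rwlock_unlock(&lock);         /* ...then drop the global ldso lock */
    do_init_fini_locked(queue);           /* hypothetical variant that assumes
                                             the lock is already held */
    pthread_mutex_unlock(&init_fini_lock);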
> > A simple way out of this pickle could be to add liba.so to the fini list
> > only after it was initialized. That way, thread B cannot hang on it, or
> > more generally, the finalizing thread cannot be halted by an incomplete
> > initialization in another thread. This might change the order of nodes
> > on the fini list, but only to account for dynamic dependencies. Isn't
> > that a good thing?
>
> No, I think it's non-conforming, and also unsafe, as it can result in
> failure to run a dtor for something whose ctor already ran but did not
> finish. This is a worse outcome than a deadlock in a situation that's
> arguably undefined to begin with.
>
But __libc_exit_fini() refuses to destroy libraries that haven't been
constructed completely. If p->constructed is zero, a node is skipped
even if it is on the fini list. And that flag is set in do_init_fini()
only after all constructors have returned.
By adding nodes to the fini list only when construction has finished, we
would also get rid of the need to have __libc_exit_fini() wait for
construction to finish first.
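For reference, a condensed paraphrase (again from memory, not
verbatim) of the __libc_exit_fini() loop in question, with run_dtors()
standing in for the actual DT_FINI/DT_FINI_ARRAY handling:

    pthread_mutex_lock(&init_fini_lock);
    shutting_down = 1;
    for (p=fini_head; p; p=p->fini_next) {
        /* in the scenario above, thread B blocks here on liba: */
        while (p->ctor_visitor && p->ctor_visitor != self)
            pthread_cond_wait(&ctor_cond, &init_fini_lock);
        if (!p->constructed) continue;  /* skip half-built dsos */
        run_dtors(p);
    }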
> What might be acceptable, though, is moving the setting of
> shutting_down to take place after the last dtor is peeled off the
> list. However, this probably requires splitting shutting_down into two
> variables, due to lock order issues. The value is needed under the
> global ldso lock in dlopen() to make dlopen return with an error if
> exit has already begun (this one should be kept before the dtor loop,
> I think), and the value is needed in do_init_fini to block execution
> of new ctors (this one should only take effect after all dtors have
> been run).
>
> Does that sound right?
>
After __libc_exit_fini() has run its course, there is no need to record
anything, because it keeps the init_fini_lock. Anything that could query
shutting_down at that point would already hang taking that lock.
Ciao,
Markus
* Re: [musl] Deadlock in dynamic linker?
@ 2025-05-27 18:26 Rich Felker
From: Rich Felker @ 2025-05-27 18:26 UTC
To: Markus Wichmann; +Cc: musl
On Tue, May 27, 2025 at 06:59:12PM +0200, Markus Wichmann wrote:
> On Tue, May 27, 2025 at 11:20:07AM -0400, Rich Felker wrote:
> > On Sat, May 24, 2025 at 07:45:45AM +0200, Markus Wichmann wrote:
> > > I'm thinking something like this: Thread A initializes liba.so. liba.so
> > > has initializers and finalizers, so thread A adds liba.so to the fini
> > > list before calling the initializers. The liba initializer calls
> > > dlopen("libb.so"). libb.so also has initializers.
> > >
> > > While thread A is not holding the init_fini_lock, thread B calls exit().
> > > That progresses until __libc_exit_fini() sets shutting_down to 1. Then
> > > it tries to destroy all the libraries, but the loop stops when it comes
> > > to liba.
> > >
> > > liba.so has a ctor_visitor, namely thread A, so thread B cannot advance.
> > > Thread A meanwhile is hanging in the infinite wait loop trying to
> > > initialize libb.so. The situation cannot change, and the process hangs
> > > indefinitely.
> >
> > I see. In particular you're assuming the dlopen of libb happened after
> > the exit started.
> >
>
> I had completely neglected to look at the global ldso lock. But
> looking at it again, I am actually assuming that the dlopen() *starts*
> before __libc_exit_fini() (so that thread B hangs waiting for the
> lock), and that thread B then overtakes thread A between the latter's
> release of the global lock and its taking of the init_fini_lock.
>
> This does mean that taking the init_fini_lock before releasing the
> global lock would entirely prevent the issue. Not sure if that's
> acceptable, though.
>
> > > A simple way out of this pickle could be to add liba.so to the fini list
> > > only after it was initialized. That way, thread B cannot hang on it, or
> > > more generally, the finalizing thread cannot be halted by an incomplete
> > > initialization in another thread. This might change the order of nodes
> > > on the fini list, but only to account for dynamic dependencies. Isn't
> > > that a good thing?
> >
> > No, I think it's non-conforming, and also unsafe, as it can result in
> > failure to run a dtor for something whose ctor already ran but did not
> > finish. This is a worse outcome than a deadlock in a situation that's
> > arguably undefined to begin with.
> >
>
> But __libc_exit_fini() refuses to destroy libraries that haven't been
> constructed completely. If p->constructed is zero, a node is skipped
> even if it is on the fini list. And that flag is set in do_init_fini()
> only after all constructors have returned.
p->constructed being zero can only happen and mean "incompletely
constructed" in the case where visitor is self (call to exit from
your own ctor). It's not a condition you can encounter from
concurrency, since in that case you would not get past the condvar
wait due to there being a visitor.
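For illustration, the self-visitor case is simply something like this
(untested sketch; build as a shared library and load it with dlopen):

#include <stdlib.h>

__attribute__((constructor))
static void init(void)
{
    /* constructed is still 0 for this dso; exit() here reaches
     * __libc_exit_fini() with ctor_visitor == self, so the visitor
     * wait is skipped and this dso's dtors are not run. */
    exit(1);
}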
> > What might be acceptable, though, is moving the setting of
> > shutting_down to take place after the last dtor is peeled off the
> > list. However, this probably requires splitting shutting_down into two
> > variables, due to lock order issues. The value is needed under the
> > global ldso lock in dlopen() to make dlopen return with an error if
> > exit has already begun (this one should be kept before the dtor loop,
> > I think), and the value is needed in do_init_fini to block execution
> > of new ctors (this one should only take effect after all dtors have
> > been run).
> >
> > Does that sound right?
>
> After __libc_exit_fini() has run its course, there is no need to record
> anything, because it keeps the init_fini_lock. Anything that could query
> shutting_down at that point would already hang taking that lock.
Indeed, that sounds right.
However, I'm beginning to doubt this solution really works. The problem
is that, if A depends on B, A's ctor can be waiting to run until B's
ctor has finished. But if a concurrent exit calls B's dtor, then A's
ctor is no longer free to run, because it depends on B being
constructed. We really do need to block execution of any ctor whose
dependencies may already have been destructed.
Rich