[musl] Making exit actually thread-safe

mailing list of musl libc
 help / color / mirror / code / Atom feed

* [musl] Making exit actually thread-safe
@ 2024-07-24 19:09 Rich Felker
  2024-07-24 20:53 ` [musl] Re: [libc-coord] " Konstantin Belousov
  2024-07-24 21:00 ` [musl] " Zack Weinberg
  0 siblings, 2 replies; 13+ messages in thread
From: Rich Felker @ 2024-07-24 19:09 UTC (permalink / raw)
  To: libc-coord; +Cc: musl, libc-alpha

Inspired by Rust issue 126600, there seems to be renewed interest in
the (non-) thread-safety of exit, namely the C (and copied in POSIX
text) stipulation that calling exit "more than once" results in
undefined behavior.

This language predates threads and thus was certainly written with
recursive calls (via atexit handlers) in mind as the way one might
come to call exit "more than once".

My view is that making calls to exit from multiple threads (one or
more additional threads while the first caller is still acting on
exit) undefined is a mistake and is harmful. There is a clear unique
reasonable behavior for this situation: any remaining callers block
until the first call to exit has finished, at which point they no
longer exist. Leaving exit via longjmp, exceptions, etc. is undefined,
so there are no concerns about what happens in such a situation. The
additional calls to exit all behave as expected: the process
terminates.

Currently there is a glibc tracker item, #31997, and an Austin Group
(POSIX) tracker item, #1845, open for this topic, as well as the
original Rust tracker thread. I have participated in all of these, and
would like to move this forward towards consensus on whether any new
thread-safety requirement should be adopted, and if so, what the
specific semantics should be.

Links:

https://github.com/rust-lang/rust/issues/126600
https://sourceware.org/bugzilla/show_bug.cgi?id=31997
https://austingroupbugs.net/view.php?id=1845

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [musl] Re: [libc-coord] Making exit actually thread-safe
  2024-07-24 19:09 [musl] Making exit actually thread-safe Rich Felker
@ 2024-07-24 20:53 ` Konstantin Belousov
  2024-07-24 21:21   ` enh
  2024-07-24 21:00 ` [musl] " Zack Weinberg
  1 sibling, 1 reply; 13+ messages in thread
From: Konstantin Belousov @ 2024-07-24 20:53 UTC (permalink / raw)
  To: libc-coord; +Cc: musl, libc-alpha

On Wed, Jul 24, 2024 at 03:09:48PM -0400, Rich Felker wrote:
> Inspired by Rust issue 126600, there seems to be renewed interest in
> the (non-) thread-safety of exit, namely the C (and copied in POSIX
> text) stipulation that calling exit "more than once" results in
> undefined behavior.
> 
> This language predates threads and thus was certainly written with
> recursive calls (via atexit handlers) in mind as the way one might
> come to call exit "more than once".
> 
> My view is that making calls to exit from multiple threads (one or
> more additional threads while the first caller is still acting on
> exit) undefined is a mistake and is harmful. There is a clear unique
> reasonable behavior for this situation: any remaining callers block
> until the first call to exit has finished, at which point they no
> longer exist. Leaving exit via longjmp, exceptions, etc. is undefined,
> so there are no concerns about what happens in such a situation. The
> additional calls to exit all behave as expected: the process
> terminates.
> 
> Currently there is a glibc tracker item, #31997, and an Austin Group
> (POSIX) tracker item, #1845, open for this topic, as well as the
> original Rust tracker thread. I have participated in all of these, and
> would like to move this forward towards consensus on whether any new
> thread-safety requirement should be adopted, and if so, what the
> specific semantics should be.
> 
> Links:
> 
> https://github.com/rust-lang/rust/issues/126600
> https://sourceware.org/bugzilla/show_bug.cgi?id=31997
> https://austingroupbugs.net/view.php?id=1845

I do think that the wording in the austin group bug is reasonable, and
intend to do that for FreeBSD, https://reviews.freebsd.org/D46108

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [musl] Re: Making exit actually thread-safe
  2024-07-24 19:09 [musl] Making exit actually thread-safe Rich Felker
  2024-07-24 20:53 ` [musl] Re: [libc-coord] " Konstantin Belousov
@ 2024-07-24 21:00 ` Zack Weinberg
  1 sibling, 0 replies; 13+ messages in thread
From: Zack Weinberg @ 2024-07-24 21:00 UTC (permalink / raw)
  To: Rich Felker, libc-coord; +Cc: musl, GNU libc development

On Wed, Jul 24, 2024, at 3:09 PM, Rich Felker wrote:
There is a clear unique
> reasonable behavior for this situation: any remaining callers block
> until the first call to exit has finished, at which point they no
> longer exist. Leaving exit via longjmp, exceptions, etc. is undefined.

I support this change.

The specification text should probably say explicitly that the exit status of the process is the value passed to the first call to exit, unless an atexit handler calls _exit etc.

zw

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [musl] Re: [libc-coord] Making exit actually thread-safe
  2024-07-24 20:53 ` [musl] Re: [libc-coord] " Konstantin Belousov
@ 2024-07-24 21:21   ` enh
  2024-07-25  0:19     ` Rich Felker
  0 siblings, 1 reply; 13+ messages in thread
From: enh @ 2024-07-24 21:21 UTC (permalink / raw)
  To: libc-coord; +Cc: musl, libc-alpha

you didn't want to go with the recursive mutex variant mentioned? i'm
convinced by this change for Android too, but was leaning towards the
recursive mutex myself...

On Wed, Jul 24, 2024 at 4:53 PM Konstantin Belousov <kostikbel@gmail.com> wrote:
>
> On Wed, Jul 24, 2024 at 03:09:48PM -0400, Rich Felker wrote:
> > Inspired by Rust issue 126600, there seems to be renewed interest in
> > the (non-) thread-safety of exit, namely the C (and copied in POSIX
> > text) stipulation that calling exit "more than once" results in
> > undefined behavior.
> >
> > This language predates threads and thus was certainly written with
> > recursive calls (via atexit handlers) in mind as the way one might
> > come to call exit "more than once".
> >
> > My view is that making calls to exit from multiple threads (one or
> > more additional threads while the first caller is still acting on
> > exit) undefined is a mistake and is harmful. There is a clear unique
> > reasonable behavior for this situation: any remaining callers block
> > until the first call to exit has finished, at which point they no
> > longer exist. Leaving exit via longjmp, exceptions, etc. is undefined,
> > so there are no concerns about what happens in such a situation. The
> > additional calls to exit all behave as expected: the process
> > terminates.
> >
> > Currently there is a glibc tracker item, #31997, and an Austin Group
> > (POSIX) tracker item, #1845, open for this topic, as well as the
> > original Rust tracker thread. I have participated in all of these, and
> > would like to move this forward towards consensus on whether any new
> > thread-safety requirement should be adopted, and if so, what the
> > specific semantics should be.
> >
> > Links:
> >
> > https://github.com/rust-lang/rust/issues/126600
> > https://sourceware.org/bugzilla/show_bug.cgi?id=31997
> > https://austingroupbugs.net/view.php?id=1845
>
> I do think that the wording in the austin group bug is reasonable, and
> intend to do that for FreeBSD, https://reviews.freebsd.org/D46108

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [musl] Re: [libc-coord] Making exit actually thread-safe
  2024-07-24 21:21   ` enh
@ 2024-07-25  0:19     ` Rich Felker
  2024-07-25 12:39       ` enh
  0 siblings, 1 reply; 13+ messages in thread
From: Rich Felker @ 2024-07-25  0:19 UTC (permalink / raw)
  To: libc-coord; +Cc: musl, libc-alpha

On Wed, Jul 24, 2024 at 05:21:00PM -0400, enh wrote:
> you didn't want to go with the recursive mutex variant mentioned? i'm
> convinced by this change for Android too, but was leaning towards the
> recursive mutex myself...

The change I'm advocating for first is a minimal one, just making
calls from other threads well-defined by blocking until the process
terminates. This is a trivial change that any implementation can adopt
without breaking anything else, and doesn't have any potential
far-reaching consequences.

While some implementations may want to allow (or feel they already
allow, often by accident) recursive calls to exit, imposing a
requirement to do this without a deep dive into where that might lead
seems like a bad idea to me. Even if it is desirable, it's something
that could be considered separately without having the thread-safety
issue blocked on it.

By leaving the recursive case undefined as it was before, any
implementations that want to do that or keep doing that are free to do
so.

Rich



> On Wed, Jul 24, 2024 at 4:53 PM Konstantin Belousov <kostikbel@gmail.com> wrote:
> >
> > On Wed, Jul 24, 2024 at 03:09:48PM -0400, Rich Felker wrote:
> > > Inspired by Rust issue 126600, there seems to be renewed interest in
> > > the (non-) thread-safety of exit, namely the C (and copied in POSIX
> > > text) stipulation that calling exit "more than once" results in
> > > undefined behavior.
> > >
> > > This language predates threads and thus was certainly written with
> > > recursive calls (via atexit handlers) in mind as the way one might
> > > come to call exit "more than once".
> > >
> > > My view is that making calls to exit from multiple threads (one or
> > > more additional threads while the first caller is still acting on
> > > exit) undefined is a mistake and is harmful. There is a clear unique
> > > reasonable behavior for this situation: any remaining callers block
> > > until the first call to exit has finished, at which point they no
> > > longer exist. Leaving exit via longjmp, exceptions, etc. is undefined,
> > > so there are no concerns about what happens in such a situation. The
> > > additional calls to exit all behave as expected: the process
> > > terminates.
> > >
> > > Currently there is a glibc tracker item, #31997, and an Austin Group
> > > (POSIX) tracker item, #1845, open for this topic, as well as the
> > > original Rust tracker thread. I have participated in all of these, and
> > > would like to move this forward towards consensus on whether any new
> > > thread-safety requirement should be adopted, and if so, what the
> > > specific semantics should be.
> > >
> > > Links:
> > >
> > > https://github.com/rust-lang/rust/issues/126600
> > > https://sourceware.org/bugzilla/show_bug.cgi?id=31997
> > > https://austingroupbugs.net/view.php?id=1845
> >
> > I do think that the wording in the austin group bug is reasonable, and
> > intend to do that for FreeBSD, https://reviews.freebsd.org/D46108

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [musl] Re: [libc-coord] Making exit actually thread-safe
  2024-07-25  0:19     ` Rich Felker
@ 2024-07-25 12:39       ` enh
  2024-07-25 12:48         ` Adhemerval Zanella Netto
  2024-07-26 19:38         ` Markus Wichmann
  0 siblings, 2 replies; 13+ messages in thread
From: enh @ 2024-07-25 12:39 UTC (permalink / raw)
  To: libc-coord; +Cc: musl, libc-alpha

On Wed, Jul 24, 2024 at 8:19 PM Rich Felker <dalias@libc.org> wrote:
>
> On Wed, Jul 24, 2024 at 05:21:00PM -0400, enh wrote:
> > you didn't want to go with the recursive mutex variant mentioned? i'm
> > convinced by this change for Android too, but was leaning towards the
> > recursive mutex myself...
>
> The change I'm advocating for first is a minimal one, just making
> calls from other threads well-defined by blocking until the process
> terminates. This is a trivial change that any implementation can adopt
> without breaking anything else, and doesn't have any potential
> far-reaching consequences.
>
> While some implementations may want to allow (or feel they already
> allow, often by accident)

yeah, i think that was what made me lean toward the recursive mutex
--- the assumption that _that's_ the option least likely to break
anyone. (i actually thought that was the point of even mentioning it
in the proposal --- the assumption that someone somewhere has an
atexit() handler that calls exit(). normally at this point i'd say "if
i've learned one thing in a decade+ of dealing with Android's libc and
the third-party binary app ecosystem, it's that no matter how crazy a
thing, if you can imagine it, someone's relying on it already", but
since "exiting" isn't really a thing on Android -- you're either
backgrounded or kill -9'ed, and don't typically have any kind of
"quit" functionality yourself -- this is one place where it seems
relatively unlikely.)

> recursive calls to exit, imposing a
> requirement to do this without a deep dive into where that might lead
> seems like a bad idea to me. Even if it is desirable, it's something
> that could be considered separately without having the thread-safety
> issue blocked on it.
>
> By leaving the recursive case undefined as it was before, any
> implementations that want to do that or keep doing that are free to do
> so.

aye, but a program that calls exit() from an atexit() handler is
working for me right now on Android, glibc, and macOS. so there's a
user-visible behavior change here for any of those libcs that goes
with a non-recursive mutex. (i think the same is true for musl too,
but don't have a musl-based system to test on.)

> Rich
>
>
>
> > On Wed, Jul 24, 2024 at 4:53 PM Konstantin Belousov <kostikbel@gmail.com> wrote:
> > >
> > > On Wed, Jul 24, 2024 at 03:09:48PM -0400, Rich Felker wrote:
> > > > Inspired by Rust issue 126600, there seems to be renewed interest in
> > > > the (non-) thread-safety of exit, namely the C (and copied in POSIX
> > > > text) stipulation that calling exit "more than once" results in
> > > > undefined behavior.
> > > >
> > > > This language predates threads and thus was certainly written with
> > > > recursive calls (via atexit handlers) in mind as the way one might
> > > > come to call exit "more than once".
> > > >
> > > > My view is that making calls to exit from multiple threads (one or
> > > > more additional threads while the first caller is still acting on
> > > > exit) undefined is a mistake and is harmful. There is a clear unique
> > > > reasonable behavior for this situation: any remaining callers block
> > > > until the first call to exit has finished, at which point they no
> > > > longer exist. Leaving exit via longjmp, exceptions, etc. is undefined,
> > > > so there are no concerns about what happens in such a situation. The
> > > > additional calls to exit all behave as expected: the process
> > > > terminates.
> > > >
> > > > Currently there is a glibc tracker item, #31997, and an Austin Group
> > > > (POSIX) tracker item, #1845, open for this topic, as well as the
> > > > original Rust tracker thread. I have participated in all of these, and
> > > > would like to move this forward towards consensus on whether any new
> > > > thread-safety requirement should be adopted, and if so, what the
> > > > specific semantics should be.
> > > >
> > > > Links:
> > > >
> > > > https://github.com/rust-lang/rust/issues/126600
> > > > https://sourceware.org/bugzilla/show_bug.cgi?id=31997
> > > > https://austingroupbugs.net/view.php?id=1845
> > >
> > > I do think that the wording in the austin group bug is reasonable, and
> > > intend to do that for FreeBSD, https://reviews.freebsd.org/D46108

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [musl] Re: [libc-coord] Making exit actually thread-safe
  2024-07-25 12:39       ` enh
@ 2024-07-25 12:48         ` Adhemerval Zanella Netto
  2024-07-26 19:56           ` Rich Felker
  2024-07-26 21:00           ` Carlos O'Donell
  2024-07-26 19:38         ` Markus Wichmann
  1 sibling, 2 replies; 13+ messages in thread
From: Adhemerval Zanella Netto @ 2024-07-25 12:48 UTC (permalink / raw)
  To: libc-coord, enh; +Cc: musl, libc-alpha



On 25/07/24 09:39, enh wrote:
> On Wed, Jul 24, 2024 at 8:19 PM Rich Felker <dalias@libc.org> wrote:
>>
>> On Wed, Jul 24, 2024 at 05:21:00PM -0400, enh wrote:
>>> you didn't want to go with the recursive mutex variant mentioned? i'm
>>> convinced by this change for Android too, but was leaning towards the
>>> recursive mutex myself...
>>
>> The change I'm advocating for first is a minimal one, just making
>> calls from other threads well-defined by blocking until the process
>> terminates. This is a trivial change that any implementation can adopt
>> without breaking anything else, and doesn't have any potential
>> far-reaching consequences.
>>
>> While some implementations may want to allow (or feel they already
>> allow, often by accident)
> 
> yeah, i think that was what made me lean toward the recursive mutex
> --- the assumption that _that's_ the option least likely to break
> anyone. (i actually thought that was the point of even mentioning it
> in the proposal --- the assumption that someone somewhere has an
> atexit() handler that calls exit(). normally at this point i'd say "if
> i've learned one thing in a decade+ of dealing with Android's libc and
> the third-party binary app ecosystem, it's that no matter how crazy a
> thing, if you can imagine it, someone's relying on it already", but
> since "exiting" isn't really a thing on Android -- you're either
> backgrounded or kill -9'ed, and don't typically have any kind of
> "quit" functionality yourself -- this is one place where it seems
> relatively unlikely.)
> 
>> recursive calls to exit, imposing a
>> requirement to do this without a deep dive into where that might lead
>> seems like a bad idea to me. Even if it is desirable, it's something
>> that could be considered separately without having the thread-safety
>> issue blocked on it.
>>
>> By leaving the recursive case undefined as it was before, any
>> implementations that want to do that or keep doing that are free to do
>> so.
> 
> aye, but a program that calls exit() from an atexit() handler is
> working for me right now on Android, glibc, and macOS. so there's a
> user-visible behavior change here for any of those libcs that goes
> with a non-recursive mutex. (i think the same is true for musl too,
> but don't have a musl-based system to test on.)

I think it is reasonable to not add the constraint to allow recursive
exit, although making this implementation defined will most likely 
pressure to eventually have the resolution on the most used behavior
(unless it is broken by design).

At least for glibc, my plan is to keep current support of allowing it
so mostly likely we will use a recursive mutex. 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] Re: [libc-coord] Making exit actually thread-safe
  2024-07-25 12:39       ` enh
  2024-07-25 12:48         ` Adhemerval Zanella Netto
@ 2024-07-26 19:38         ` Markus Wichmann
  2024-07-26 19:58           ` Rich Felker
  1 sibling, 1 reply; 13+ messages in thread
From: Markus Wichmann @ 2024-07-26 19:38 UTC (permalink / raw)
  To: musl; +Cc: libc-coord, libc-alpha

Am Thu, Jul 25, 2024 at 08:39:27AM -0400 schrieb enh:
> aye, but a program that calls exit() from an atexit() handler is
> working for me right now on Android, glibc, and macOS. so there's a
> user-visible behavior change here for any of those libcs that goes
> with a non-recursive mutex. (i think the same is true for musl too,
> but don't have a musl-based system to test on.)

For musl, calling exit() from an atexit handler will work somewhat
normally, at the moment. __funcs_on_exit() can be called re-entrant from
a handler without a problem. I think that was mainly to enable calls to
atexit() from atexit handlers (whatever sense that makes), but a side
effect is that calling exit() should also work.

Also note that calling exit() from a destructor (_fini() or DT_FINI or
such like) has significantly more adverse effects. In the dynamic
linking case, it reliably leads to a deadlock on the init_fini_lock. In
the static linking case, it leads to a re-entrant call to
libc_exit_fini(), which will call the same destructor again. Unless the
destructor took measures to prevent this, this will lead to infinite
recursion, and finally a process crash.

Ciao,
Markus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] Re: [libc-coord] Making exit actually thread-safe
  2024-07-25 12:48         ` Adhemerval Zanella Netto
@ 2024-07-26 19:56           ` Rich Felker
  2024-07-26 21:00           ` Carlos O'Donell
  1 sibling, 0 replies; 13+ messages in thread
From: Rich Felker @ 2024-07-26 19:56 UTC (permalink / raw)
  To: Adhemerval Zanella Netto; +Cc: libc-coord, enh, musl, libc-alpha

On Thu, Jul 25, 2024 at 09:48:46AM -0300, Adhemerval Zanella Netto wrote:
> 
> 
> On 25/07/24 09:39, enh wrote:
> > On Wed, Jul 24, 2024 at 8:19 PM Rich Felker <dalias@libc.org> wrote:
> >>
> >> On Wed, Jul 24, 2024 at 05:21:00PM -0400, enh wrote:
> >>> you didn't want to go with the recursive mutex variant mentioned? i'm
> >>> convinced by this change for Android too, but was leaning towards the
> >>> recursive mutex myself...
> >>
> >> The change I'm advocating for first is a minimal one, just making
> >> calls from other threads well-defined by blocking until the process
> >> terminates. This is a trivial change that any implementation can adopt
> >> without breaking anything else, and doesn't have any potential
> >> far-reaching consequences.
> >>
> >> While some implementations may want to allow (or feel they already
> >> allow, often by accident)
> > 
> > yeah, i think that was what made me lean toward the recursive mutex
> > --- the assumption that _that's_ the option least likely to break
> > anyone. (i actually thought that was the point of even mentioning it
> > in the proposal --- the assumption that someone somewhere has an
> > atexit() handler that calls exit(). normally at this point i'd say "if
> > i've learned one thing in a decade+ of dealing with Android's libc and
> > the third-party binary app ecosystem, it's that no matter how crazy a
> > thing, if you can imagine it, someone's relying on it already", but
> > since "exiting" isn't really a thing on Android -- you're either
> > backgrounded or kill -9'ed, and don't typically have any kind of
> > "quit" functionality yourself -- this is one place where it seems
> > relatively unlikely.)
> > 
> >> recursive calls to exit, imposing a
> >> requirement to do this without a deep dive into where that might lead
> >> seems like a bad idea to me. Even if it is desirable, it's something
> >> that could be considered separately without having the thread-safety
> >> issue blocked on it.
> >>
> >> By leaving the recursive case undefined as it was before, any
> >> implementations that want to do that or keep doing that are free to do
> >> so.
> > 
> > aye, but a program that calls exit() from an atexit() handler is
> > working for me right now on Android, glibc, and macOS. so there's a
> > user-visible behavior change here for any of those libcs that goes
> > with a non-recursive mutex. (i think the same is true for musl too,
> > but don't have a musl-based system to test on.)
> 
> I think it is reasonable to not add the constraint to allow recursive
> exit, although making this implementation defined will most likely 
> pressure to eventually have the resolution on the most used behavior
> (unless it is broken by design).
> 
> At least for glibc, my plan is to keep current support of allowing it
> so mostly likely we will use a recursive mutex. 

Part of the reason I'm hesitant to suggesst specifying any behavior is
that it's a lot messier than we'd probably like to think it is.

There actually is a fairly "good" motivation for wanting recursive
exit to work: it lets atexit handlers (or global dtors) override the
exit code, for example if a write error is detected during cleanup.
While that can already be done by using _exit/_Exit (this is the way
gnulib does it, noting in the comments that exit cannot be called from
an atexit handler without invoking UB), it's somewhat unsatisfying
because it precludes any further execution of other atexit handlers
and precludes leaving stdio streams in a consistent state at exit
(flushed and with underlying fd position updated to match the logical
FILE position).

One might think recursive exit is a good solution here. The natural
behavior is that the currently executing atexit handler will alread
have been popped off the handler registration stack, so that when you
call exit again, things will pick up with the next handler.

The problem is that global dtors are not each their own handler. In
most real-world implementations, there is a single handler that runs
at the end of the atexit handler stack that processes all the global
dtors in the dtor_array or similar. This means that calling exit from
any one of them will skip execution of *all subsequent* dtors, not
just the currently executing one.

I don't see any *reasonable* way to specify this behavior; it's a
consequence of implementation details, not anything present in the
abstract machine.

Probably the closest we could get to a reasonable specification is
stipulating a behavior for exit from an actual abstract-machine atexit
handler (execution picks up at the next handler) but leaving the case
of exit from a global dtor undefined.

There may be other nasty surprises in this area too I haven't yet
thought of, but this is the main one that comes to mind so far.

Rich

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] Re: [libc-coord] Making exit actually thread-safe
  2024-07-26 19:38         ` Markus Wichmann
@ 2024-07-26 19:58           ` Rich Felker
  2024-07-26 21:06             ` Carlos O'Donell
  0 siblings, 1 reply; 13+ messages in thread
From: Rich Felker @ 2024-07-26 19:58 UTC (permalink / raw)
  To: Markus Wichmann; +Cc: musl, libc-coord, libc-alpha

On Fri, Jul 26, 2024 at 09:38:54PM +0200, Markus Wichmann wrote:
> Am Thu, Jul 25, 2024 at 08:39:27AM -0400 schrieb enh:
> > aye, but a program that calls exit() from an atexit() handler is
> > working for me right now on Android, glibc, and macOS. so there's a
> > user-visible behavior change here for any of those libcs that goes
> > with a non-recursive mutex. (i think the same is true for musl too,
> > but don't have a musl-based system to test on.)
> 
> For musl, calling exit() from an atexit handler will work somewhat
> normally, at the moment. __funcs_on_exit() can be called re-entrant from
> a handler without a problem. I think that was mainly to enable calls to
> atexit() from atexit handlers (whatever sense that makes), but a side
> effect is that calling exit() should also work.
> 
> Also note that calling exit() from a destructor (_fini() or DT_FINI or
> such like) has significantly more adverse effects. In the dynamic
> linking case, it reliably leads to a deadlock on the init_fini_lock. In
> the static linking case, it leads to a re-entrant call to
> libc_exit_fini(), which will call the same destructor again. Unless the
> destructor took measures to prevent this, this will lead to infinite
> recursion, and finally a process crash.

Yes. If we were going to allow recursive calls to exit, we would need
to either leave the case of call from dtor undefined, or add some
mechanism so that a call from a dtor did something predictable and
safe (like skipping all future dtors and going right to __stdio_exit,
or behaving like _exit, etc.)

Short of having a way to define it that we've reasoned to be safe and
consistent, having it trap/abort seems the best course of action.

Rich

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] Re: [libc-coord] Making exit actually thread-safe
  2024-07-25 12:48         ` Adhemerval Zanella Netto
  2024-07-26 19:56           ` Rich Felker
@ 2024-07-26 21:00           ` Carlos O'Donell
  2024-07-26 22:40             ` [musl] Re: [libc-coord] " Gabriel Ravier
  1 sibling, 1 reply; 13+ messages in thread
From: Carlos O'Donell @ 2024-07-26 21:00 UTC (permalink / raw)
  To: musl, Adhemerval Zanella Netto, libc-coord, enh; +Cc: libc-alpha

On 7/25/24 8:48 AM, Adhemerval Zanella Netto wrote:
> 
> 
> On 25/07/24 09:39, enh wrote:
>> On Wed, Jul 24, 2024 at 8:19 PM Rich Felker <dalias@libc.org> wrote:
>>>
>>> On Wed, Jul 24, 2024 at 05:21:00PM -0400, enh wrote:
>>>> you didn't want to go with the recursive mutex variant mentioned? i'm
>>>> convinced by this change for Android too, but was leaning towards the
>>>> recursive mutex myself...
>>>
>>> The change I'm advocating for first is a minimal one, just making
>>> calls from other threads well-defined by blocking until the process
>>> terminates. This is a trivial change that any implementation can adopt
>>> without breaking anything else, and doesn't have any potential
>>> far-reaching consequences.
>>>
>>> While some implementations may want to allow (or feel they already
>>> allow, often by accident)
>>
>> yeah, i think that was what made me lean toward the recursive mutex
>> --- the assumption that _that's_ the option least likely to break
>> anyone. (i actually thought that was the point of even mentioning it
>> in the proposal --- the assumption that someone somewhere has an
>> atexit() handler that calls exit(). normally at this point i'd say "if
>> i've learned one thing in a decade+ of dealing with Android's libc and
>> the third-party binary app ecosystem, it's that no matter how crazy a
>> thing, if you can imagine it, someone's relying on it already", but
>> since "exiting" isn't really a thing on Android -- you're either
>> backgrounded or kill -9'ed, and don't typically have any kind of
>> "quit" functionality yourself -- this is one place where it seems
>> relatively unlikely.)
>>
>>> recursive calls to exit, imposing a
>>> requirement to do this without a deep dive into where that might lead
>>> seems like a bad idea to me. Even if it is desirable, it's something
>>> that could be considered separately without having the thread-safety
>>> issue blocked on it.
>>>
>>> By leaving the recursive case undefined as it was before, any
>>> implementations that want to do that or keep doing that are free to do
>>> so.
>>
>> aye, but a program that calls exit() from an atexit() handler is
>> working for me right now on Android, glibc, and macOS. so there's a
>> user-visible behavior change here for any of those libcs that goes
>> with a non-recursive mutex. (i think the same is true for musl too,
>> but don't have a musl-based system to test on.)
> 
> I think it is reasonable to not add the constraint to allow recursive
> exit, although making this implementation defined will most likely 
> pressure to eventually have the resolution on the most used behavior
> (unless it is broken by design).
> 
> At least for glibc, my plan is to keep current support of allowing it
> so mostly likely we will use a recursive mutex. 

I agree, it's the most conservative thing. I just reviewed your patch on libc-alpha.
Thanks for working on that.

I'm *tempted* to try adding an abort() in the case of atexit() calling exit() and
running a Fedora mass-rebuild to see what triggers :}

-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [musl] Re: [libc-coord] Making exit actually thread-safe
  2024-07-26 19:58           ` Rich Felker
@ 2024-07-26 21:06             ` Carlos O'Donell
  0 siblings, 0 replies; 13+ messages in thread
From: Carlos O'Donell @ 2024-07-26 21:06 UTC (permalink / raw)
  To: musl, Rich Felker, Markus Wichmann; +Cc: libc-coord, libc-alpha

On 7/26/24 3:58 PM, Rich Felker wrote:
> On Fri, Jul 26, 2024 at 09:38:54PM +0200, Markus Wichmann wrote:
>> Am Thu, Jul 25, 2024 at 08:39:27AM -0400 schrieb enh:
>>> aye, but a program that calls exit() from an atexit() handler is
>>> working for me right now on Android, glibc, and macOS. so there's a
>>> user-visible behavior change here for any of those libcs that goes
>>> with a non-recursive mutex. (i think the same is true for musl too,
>>> but don't have a musl-based system to test on.)
>>
>> For musl, calling exit() from an atexit handler will work somewhat
>> normally, at the moment. __funcs_on_exit() can be called re-entrant from
>> a handler without a problem. I think that was mainly to enable calls to
>> atexit() from atexit handlers (whatever sense that makes), but a side
>> effect is that calling exit() should also work.
>>
>> Also note that calling exit() from a destructor (_fini() or DT_FINI or
>> such like) has significantly more adverse effects. In the dynamic
>> linking case, it reliably leads to a deadlock on the init_fini_lock. In
>> the static linking case, it leads to a re-entrant call to
>> libc_exit_fini(), which will call the same destructor again. Unless the
>> destructor took measures to prevent this, this will lead to infinite
>> recursion, and finally a process crash.
> 
> Yes. If we were going to allow recursive calls to exit, we would need
> to either leave the case of call from dtor undefined, or add some
> mechanism so that a call from a dtor did something predictable and
> safe (like skipping all future dtors and going right to __stdio_exit,
> or behaving like _exit, etc.)
> 
> Short of having a way to define it that we've reasoned to be safe and
> consistent, having it trap/abort seems the best course of action.

I lean slightly towards trap/abort, but I don't think I can make that change today
in glibc without breaking applications... even if the "skip subsequent destructors"
case you mentioned upthread is poor behaviour.

-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [musl] Re: [libc-coord] Re: [musl] Re: [libc-coord] Making exit actually thread-safe
  2024-07-26 21:00           ` Carlos O'Donell
@ 2024-07-26 22:40             ` Gabriel Ravier
  0 siblings, 0 replies; 13+ messages in thread
From: Gabriel Ravier @ 2024-07-26 22:40 UTC (permalink / raw)
  To: libc-coord, Carlos O'Donell, musl, Adhemerval Zanella Netto, enh
  Cc: libc-alpha

On 7/26/24 11:00 PM, Carlos O'Donell wrote:
> On 7/25/24 8:48 AM, Adhemerval Zanella Netto wrote:
>>
>>
>> On 25/07/24 09:39, enh wrote:
>>> On Wed, Jul 24, 2024 at 8:19 PM Rich Felker <dalias@libc.org> wrote:
>>>>
>>>> On Wed, Jul 24, 2024 at 05:21:00PM -0400, enh wrote:
>>>>> you didn't want to go with the recursive mutex variant mentioned? i'm
>>>>> convinced by this change for Android too, but was leaning towards the
>>>>> recursive mutex myself...
>>>>
>>>> The change I'm advocating for first is a minimal one, just making
>>>> calls from other threads well-defined by blocking until the process
>>>> terminates. This is a trivial change that any implementation can adopt
>>>> without breaking anything else, and doesn't have any potential
>>>> far-reaching consequences.
>>>>
>>>> While some implementations may want to allow (or feel they already
>>>> allow, often by accident)
>>>
>>> yeah, i think that was what made me lean toward the recursive mutex
>>> --- the assumption that _that's_ the option least likely to break
>>> anyone. (i actually thought that was the point of even mentioning it
>>> in the proposal --- the assumption that someone somewhere has an
>>> atexit() handler that calls exit(). normally at this point i'd say "if
>>> i've learned one thing in a decade+ of dealing with Android's libc and
>>> the third-party binary app ecosystem, it's that no matter how crazy a
>>> thing, if you can imagine it, someone's relying on it already", but
>>> since "exiting" isn't really a thing on Android -- you're either
>>> backgrounded or kill -9'ed, and don't typically have any kind of
>>> "quit" functionality yourself -- this is one place where it seems
>>> relatively unlikely.)
>>>
>>>> recursive calls to exit, imposing a
>>>> requirement to do this without a deep dive into where that might lead
>>>> seems like a bad idea to me. Even if it is desirable, it's something
>>>> that could be considered separately without having the thread-safety
>>>> issue blocked on it.
>>>>
>>>> By leaving the recursive case undefined as it was before, any
>>>> implementations that want to do that or keep doing that are free to do
>>>> so.
>>>
>>> aye, but a program that calls exit() from an atexit() handler is
>>> working for me right now on Android, glibc, and macOS. so there's a
>>> user-visible behavior change here for any of those libcs that goes
>>> with a non-recursive mutex. (i think the same is true for musl too,
>>> but don't have a musl-based system to test on.)
>>
>> I think it is reasonable to not add the constraint to allow recursive
>> exit, although making this implementation defined will most likely
>> pressure to eventually have the resolution on the most used behavior
>> (unless it is broken by design).
>>
>> At least for glibc, my plan is to keep current support of allowing it
>> so mostly likely we will use a recursive mutex.
> 
> I agree, it's the most conservative thing. I just reviewed your patch on libc-alpha.
> Thanks for working on that.
> 
> I'm *tempted* to try adding an abort() in the case of atexit() calling exit() and
> running a Fedora mass-rebuild to see what triggers :}
> 

An example I can give, though I have no idea whether it would trigger 
during a Fedora mass-rebuild, is that GNU findutils's xargs 
implementation will, upon seeing several processes exit with status 255 
while running in parallel mode (with `-P n` where `n > 0`), call exit() 
from an atexit() handler.

That is, I'm pretty sure that on Fedora (or any other system with GNU 
findutils), with your proposed addition of abort(), running the command:


echo $(seq 1 100) | xargs -P5 -n1 sh -c 'sleep "$0"; echo "$0"; exit 255'


would result in abort() being called. (note: 
https://savannah.gnu.org/bugs/?func=detailitem&item_id=64451 touches on 
this issue, with the exact behavior on this having changed from GNU 
findutils 4.9.0 to its latest git, but it still calls exit() twice)

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-07-26 22:40 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-07-24 19:09 [musl] Making exit actually thread-safe Rich Felker
2024-07-24 20:53 ` [musl] Re: [libc-coord] " Konstantin Belousov
2024-07-24 21:21   ` enh
2024-07-25  0:19     ` Rich Felker
2024-07-25 12:39       ` enh
2024-07-25 12:48         ` Adhemerval Zanella Netto
2024-07-26 19:56           ` Rich Felker
2024-07-26 21:00           ` Carlos O'Donell
2024-07-26 22:40             ` [musl] Re: [libc-coord] " Gabriel Ravier
2024-07-26 19:38         ` Markus Wichmann
2024-07-26 19:58           ` Rich Felker
2024-07-26 21:06             ` Carlos O'Donell
2024-07-24 21:00 ` [musl] " Zack Weinberg

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).