mailing list of musl libc
 help / color / mirror / code / Atom feed
* [musl] Restrictions on child context after multithreaded fork
@ 2020-08-14 21:41 Rich Felker
  2020-08-14 22:02 ` Florian Weimer
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Rich Felker @ 2020-08-14 21:41 UTC (permalink / raw)
  To: musl

musl 1.2.1 has exposed bugs in several applications and libraries
caused by async-signal-unsafe code between (multithreaded) fork and
subsequent exec. So far, dbus library code, pulseaudio library code,
and libvirt have been found to be affected. A couple of the bug
reports (with incomplete information) are:

https://gitlab.alpinelinux.org/alpine/aports/-/issues/11602
https://gitlab.alpinelinux.org/alpine/aports/-/issues/11815

Fixing the affected library code looks very straightforward; it's just
a matter of doing proper iterations of existing data/state rather than
allocating lists and using opendir on procfs and such. I've discussed
fixes with Alpine Linux folks and I believe fixes have been tested,
but I don't see any patches in aports yet.

I've seen suspicions that the switch to mallocng exposed this, but I'm
pretty sure it was:

https://git.musl-libc.org/cgit/musl/commit/?id=e01b5939b38aea5ecbe41670643199825874b26c

Before this commit, the (incorrect) lock skipping logic allowed the
child process to access inconsistent state left from the parent if it
violated the requirement not to call AS-unsafe functions. Now, the
lock attempt in the child rightly deadlocks before accessing state
that was being modified under control of the lock in the parent. This
is not specific to malloc but common with anything using libc-internal
locks.

I'll follow up on this thread once there are patches for the known
affected libraries.

Note that this is a type of bug that's possibly hard to get upstreams
to take seriously. libvirt in particular, despite having multiple
comments throughout the source warning developers that they can't do
anything AS-unsafe between fork and exec, is somehow deeming malloc an
exception to that rule because they want to use it (despite it clearly
not being necessary).

And the dbus issue has been known for a long time; see open bug:

https://gitlab.freedesktop.org/dbus/dbus/-/issues/173
(originally: https://bugs.freedesktop.org/show_bug.cgi?id=100843)

This is largely because glibc attempts to make the erroneous usage by
these libraries work (more on that below).

The next issue of POSIX (Issue 8) will drop the requirement that fork
be AS-safe, as a result of Austin Group tracker issue #62. This makes
the glibc behavior permissible/conforming, but there does not seem to
be any effort on the POSIX side to drop the requirement on
applications not to do AS-unsafe things in the child before exec, so
regardless of this change, what these libraries are doing is still
wrong.

In order to make the child environment unrestricted after fork, either
fork must hold *all* locks at the time the actual fork syscall takes
place, or it must be able to reset any state protected by a lock that
was held in the parent (or some mix of the two). It's fundamentally
impossible to do this completely (in a way that lets the child run
unrestricted), since some locks in the parent may be held arbitrarily
long such that fork waiting on them would deadlock. In particular, any
stdio FILE lock may be held indefinitely because there's a blocking
operation in progress on the underlying fd, or because the application
has called flockfile. Thus, at best, the implementation can give the
child an environment where fflush(0) and exit() still deadlock.

In case we do want to follow a direction of trying to provide some
degree of relaxation of restrictions on the child (taking the liberty
of POSIX-future drop of fork's AS-safety requirement), I did a quick
survey of libc-internal locks, and found:

- at_quick_exit
- atexit
- dlerror
- gettext
- malloc
- pthread_atfork (already necessarily held at fork)
- random
- sem_open
- stdio open file list (vs individual FILEs)
- syslog
- timezone

This list looks tractable. Aside from malloc, whose locks would need
to be taken last since the others may call malloc, these don't seem to
have any lock order dependencies between them, and each one's lock
functions could be provided as strong overrides to weak no-op
definitions in fork.c.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] Restrictions on child context after multithreaded fork
  2020-08-14 21:41 [musl] Restrictions on child context after multithreaded fork Rich Felker
@ 2020-08-14 22:02 ` Florian Weimer
  2020-08-14 22:14   ` Rich Felker
  2020-08-15  0:47 ` A. Wilcox
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Florian Weimer @ 2020-08-14 22:02 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

* Rich Felker:

> In case we do want to follow a direction of trying to provide some
> degree of relaxation of restrictions on the child (taking the liberty
> of POSIX-future drop of fork's AS-safety requirement), I did a quick
> survey of libc-internal locks, and found:

pthread_once is another source of problems, although there seem to be
few users in musl.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] Restrictions on child context after multithreaded fork
  2020-08-14 22:02 ` Florian Weimer
@ 2020-08-14 22:14   ` Rich Felker
  0 siblings, 0 replies; 20+ messages in thread
From: Rich Felker @ 2020-08-14 22:14 UTC (permalink / raw)
  To: musl

On Sat, Aug 15, 2020 at 12:02:01AM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> > In case we do want to follow a direction of trying to provide some
> > degree of relaxation of restrictions on the child (taking the liberty
> > of POSIX-future drop of fork's AS-safety requirement), I did a quick
> > survey of libc-internal locks, and found:
> 
> pthread_once is another source of problems, although there seem to be
> few users in musl.

Indeed, there are a few internal uses of pthread_once and public
locking functions that I missed (I only checked internal LOCK()). So
at least the following are also involved:

- timer_create (will go away once I rip out signal-based SIGEV_THREAD
  and implement SIGEV_THREAD timers entirely in userspace)
- aio (I'm not sure it's worth considering since it's probably another
  area that's impossible to make safe to use in forked context)
- newlocale
- pthread_mutexattr_setprotocol
- pthread_mutexattr_setrobust

For the most part, I believe the "once" operations performed in these
are essentially idempotent, so it would probably suffice to have them
just provide "reset" code that zeros the once object in the child if
needed.

Rich

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] Restrictions on child context after multithreaded fork
  2020-08-14 21:41 [musl] Restrictions on child context after multithreaded fork Rich Felker
  2020-08-14 22:02 ` Florian Weimer
@ 2020-08-15  0:47 ` A. Wilcox
  2020-08-15  2:40   ` Rich Felker
  2020-08-15  2:07 ` Ariadne Conill
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: A. Wilcox @ 2020-08-15  0:47 UTC (permalink / raw)
  To: musl


[-- Attachment #1.1: Type: text/plain, Size: 5043 bytes --]

On 14/08/2020 16:41, Rich Felker wrote:
> musl 1.2.1 has exposed bugs in several applications and libraries
> caused by async-signal-unsafe code between (multithreaded) fork and
> subsequent exec. So far, dbus library code, pulseaudio library code,
> and libvirt have been found to be affected. 
> 
> Fixing the affected library code looks very straightforward; it's just
> a matter of doing proper iterations of existing data/state rather than
> allocating lists and using opendir on procfs and such. I've discussed
> fixes with Alpine Linux folks and I believe fixes have been tested,
> but I don't see any patches in aports yet.


Over at Adélie we've been monitoring the situation intently.  We're not
bumping musl to 1.2.1 until we have time to do a full tree test (similar
to time64) due to this.

Fixing dbus and pa are on my list of post-1.0 things to do, since they
are so vital to modern desktop usage.  If someone else patches it in the
meantime, that's even better.


> I've seen suspicions that the switch to mallocng exposed this, but I'm
> pretty sure it was:
> 
> https://git.musl-libc.org/cgit/musl/commit/?id=e01b5939b38aea5ecbe41670643199825874b26c


Doesn't make sense to me.  We backported that to Adélie's musl 1.2.0
package the day it was committed to musl, and Firefox has been rock
solid for me for months.  No deadlocks, no weird behaviour.  Similar
with other apps that use D-Bus and Pulse.  At least on ppc64, x86_64,
and aarch64 (my daily driver arches), it's been through almost three
months of usage without issue.

Perhaps it's the mixture of that commit + malloc-ng?


> I'll follow up on this thread once there are patches for the known
> affected libraries.


Thanks!  I'll do the same, if I manage to write patches.


> Note that this is a type of bug that's possibly hard to get upstreams
> to take seriously. libvirt in particular, despite having multiple
> comments throughout the source warning developers that they can't do
> anything AS-unsafe between fork and exec, is somehow deeming malloc an
> exception to that rule because they want to use it (despite it clearly
> not being necessary).


libvirt already isn't usable on musl, so this isn't really something we
are going to spend any time on.  Others are free to do it.


> And the dbus issue has been known for a long time; see open bug:
> 
> https://gitlab.freedesktop.org/dbus/dbus/-/issues/173
> (originally: https://bugs.freedesktop.org/show_bug.cgi?id=100843)
> 
> This is largely because glibc attempts to make the erroneous usage by
> these libraries work (more on that below).
> 
> The next issue of POSIX (Issue 8) will drop the requirement that fork
> be AS-safe, as a result of Austin Group tracker issue #62. This makes
> the glibc behavior permissible/conforming, but there does not seem to
> be any effort on the POSIX side to drop the requirement on
> applications not to do AS-unsafe things in the child before exec, so
> regardless of this change, what these libraries are doing is still
> wrong.
> 
> In order to make the child environment unrestricted after fork, either
> fork must hold *all* locks at the time the actual fork syscall takes
> place, or it must be able to reset any state protected by a lock that
> was held in the parent (or some mix of the two). It's fundamentally
> impossible to do this completely (in a way that lets the child run
> unrestricted), since some locks in the parent may be held arbitrarily
> long such that fork waiting on them would deadlock. In particular, any
> stdio FILE lock may be held indefinitely because there's a blocking
> operation in progress on the underlying fd, or because the application
> has called flockfile. Thus, at best, the implementation can give the
> child an environment where fflush(0) and exit() still deadlock.
> 
> In case we do want to follow a direction of trying to provide some
> degree of relaxation of restrictions on the child (taking the liberty
> of POSIX-future drop of fork's AS-safety requirement), I did a quick
> survey of libc-internal locks, and found:
> 
> - at_quick_exit
> - atexit
> - dlerror
> - gettext
> - malloc
> - pthread_atfork (already necessarily held at fork)
> - random
> - sem_open
> - stdio open file list (vs individual FILEs)
> - syslog
> - timezone
> 
> This list looks tractable. Aside from malloc, whose locks would need
> to be taken last since the others may call malloc, these don't seem to
> have any lock order dependencies between them, and each one's lock
> functions could be provided as strong overrides to weak no-op
> definitions in fork.c.


Some of these seem more important than others.  Including the list in
your follow-up, I would say a priority would be

- malloc
- gettext
- random

in roughly that order.  That's just based on my experience with these
not-great codebases.

Best,
--arw


-- 
A. Wilcox (awilfox)
Project Lead, Adélie Linux
https://www.adelielinux.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] Restrictions on child context after multithreaded fork
  2020-08-14 21:41 [musl] Restrictions on child context after multithreaded fork Rich Felker
  2020-08-14 22:02 ` Florian Weimer
  2020-08-15  0:47 ` A. Wilcox
@ 2020-08-15  2:07 ` Ariadne Conill
  2020-08-15  3:02   ` Rich Felker
  2020-08-15  6:51 ` Timo Teras
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Ariadne Conill @ 2020-08-15  2:07 UTC (permalink / raw)
  To: musl

Hello,

On 2020-08-14 15:41, Rich Felker wrote:
> musl 1.2.1 has exposed bugs in several applications and libraries
> caused by async-signal-unsafe code between (multithreaded) fork and
> subsequent exec. So far, dbus library code, pulseaudio library code,
> and libvirt have been found to be affected. A couple of the bug
> reports (with incomplete information) are:

I suspect the dbus library code has been broken for some amount of time 
because we've been tracking an issue where dbus stalls on KDE Wayland 
for a short period of time causing the session to come up strangely. 
1.2.1 did make this stall more pronounced though.

> https://gitlab.alpinelinux.org/alpine/aports/-/issues/11602
> https://gitlab.alpinelinux.org/alpine/aports/-/issues/11815
> 
> Fixing the affected library code looks very straightforward; it's just
> a matter of doing proper iterations of existing data/state rather than
> allocating lists and using opendir on procfs and such. I've discussed
> fixes with Alpine Linux folks and I believe fixes have been tested,
> but I don't see any patches in aports yet.

I'll work on getting patches in next week.

> I've seen suspicions that the switch to mallocng exposed this, but I'm
> pretty sure it was:
> 
> https://git.musl-libc.org/cgit/musl/commit/?id=e01b5939b38aea5ecbe41670643199825874b26c

We carried this patch before upgrade to 1.2.1 and it was mostly fine, 
has been carried since 1.1.24-r7 AFAIK.

> Before this commit, the (incorrect) lock skipping logic allowed the
> child process to access inconsistent state left from the parent if it
> violated the requirement not to call AS-unsafe functions. Now, the
> lock attempt in the child rightly deadlocks before accessing state
> that was being modified under control of the lock in the parent. This
> is not specific to malloc but common with anything using libc-internal
> locks.
> 
> I'll follow up on this thread once there are patches for the known
> affected libraries.
> 
> Note that this is a type of bug that's possibly hard to get upstreams
> to take seriously. libvirt in particular, despite having multiple
> comments throughout the source warning developers that they can't do
> anything AS-unsafe between fork and exec, is somehow deeming malloc an
> exception to that rule because they want to use it (despite it clearly
> not being necessary).
> 
> And the dbus issue has been known for a long time; see open bug:
> 
> https://gitlab.freedesktop.org/dbus/dbus/-/issues/173
> (originally: https://bugs.freedesktop.org/show_bug.cgi?id=100843)
> 
> This is largely because glibc attempts to make the erroneous usage by
> these libraries work (more on that below).
> 
> The next issue of POSIX (Issue 8) will drop the requirement that fork
> be AS-safe, as a result of Austin Group tracker issue #62. This makes
> the glibc behavior permissible/conforming, but there does not seem to
> be any effort on the POSIX side to drop the requirement on
> applications not to do AS-unsafe things in the child before exec, so
> regardless of this change, what these libraries are doing is still
> wrong.
> 
> In order to make the child environment unrestricted after fork, either
> fork must hold *all* locks at the time the actual fork syscall takes
> place, or it must be able to reset any state protected by a lock that
> was held in the parent (or some mix of the two). It's fundamentally
> impossible to do this completely (in a way that lets the child run
> unrestricted), since some locks in the parent may be held arbitrarily
> long such that fork waiting on them would deadlock. In particular, any
> stdio FILE lock may be held indefinitely because there's a blocking
> operation in progress on the underlying fd, or because the application
> has called flockfile. Thus, at best, the implementation can give the
> child an environment where fflush(0) and exit() still deadlock.
> 
> In case we do want to follow a direction of trying to provide some
> degree of relaxation of restrictions on the child (taking the liberty
> of POSIX-future drop of fork's AS-safety requirement), I did a quick
> survey of libc-internal locks, and found:

I think it is better to fix programs to not depend on AS-unsafe 
functions at fork time.  Being lax on this requirement is an indicator 
of other bad engineering decisions in these programs, especially libvirt 
and pulseaudio.

Ariadne

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] Restrictions on child context after multithreaded fork
  2020-08-15  0:47 ` A. Wilcox
@ 2020-08-15  2:40   ` Rich Felker
  0 siblings, 0 replies; 20+ messages in thread
From: Rich Felker @ 2020-08-15  2:40 UTC (permalink / raw)
  To: musl

On Fri, Aug 14, 2020 at 07:47:28PM -0500, A. Wilcox wrote:
> On 14/08/2020 16:41, Rich Felker wrote:
> > musl 1.2.1 has exposed bugs in several applications and libraries
> > caused by async-signal-unsafe code between (multithreaded) fork and
> > subsequent exec. So far, dbus library code, pulseaudio library code,
> > and libvirt have been found to be affected. 
> > 
> > Fixing the affected library code looks very straightforward; it's just
> > a matter of doing proper iterations of existing data/state rather than
> > allocating lists and using opendir on procfs and such. I've discussed
> > fixes with Alpine Linux folks and I believe fixes have been tested,
> > but I don't see any patches in aports yet.
> 
> Over at Adélie we've been monitoring the situation intently.  We're not
> bumping musl to 1.2.1 until we have time to do a full tree test (similar
> to time64) due to this.

Just be aware: any bugs exposed by this as deadlocks were already
present, just manifesting as access to inconsistent state. It's
possible for them to yield freelist corruption, etc. So even if not
upgrading musl yet, fixing them should be treated as high-priority.

> > I've seen suspicions that the switch to mallocng exposed this, but I'm
> > pretty sure it was:
> > 
> > https://git.musl-libc.org/cgit/musl/commit/?id=e01b5939b38aea5ecbe41670643199825874b26c
> 
> Doesn't make sense to me.  We backported that to Adélie's musl 1.2.0
> package the day it was committed to musl, and Firefox has been rock
> solid for me for months.  No deadlocks, no weird behaviour.  Similar
> with other apps that use D-Bus and Pulse.  At least on ppc64, x86_64,
> and aarch64 (my daily driver arches), it's been through almost three
> months of usage without issue.
> 
> Perhaps it's the mixture of that commit + malloc-ng?

I was about to say I'm strongly recommending e01b5939b3 for
backporting, but you've already got that. With oldmalloc each size
class has a separate lock, so instead of always getting a deadlock if
parent was using malloc at the time of fork, you'd expect to only get
it if parent was allocating in the same class the child requests, or
if parent was performing free (when a global free-lock is held
momentarily). I'm still rather surprised you don't hit them at all,
rather than just hitting them less frequently.

> > Note that this is a type of bug that's possibly hard to get upstreams
> > to take seriously. libvirt in particular, despite having multiple
> > comments throughout the source warning developers that they can't do
> > anything AS-unsafe between fork and exec, is somehow deeming malloc an
> > exception to that rule because they want to use it (despite it clearly
> > not being necessary).
> 
> libvirt already isn't usable on musl, so this isn't really something we
> are going to spend any time on.  Others are free to do it.

Alpine is and has been using it, apparently. Are you aware of others
blockers for compat with musl?

> > And the dbus issue has been known for a long time; see open bug:
> > 
> > https://gitlab.freedesktop.org/dbus/dbus/-/issues/173
> > (originally: https://bugs.freedesktop.org/show_bug.cgi?id=100843)
> > 
> > This is largely because glibc attempts to make the erroneous usage by
> > these libraries work (more on that below).
> > 
> > The next issue of POSIX (Issue 8) will drop the requirement that fork
> > be AS-safe, as a result of Austin Group tracker issue #62. This makes
> > the glibc behavior permissible/conforming, but there does not seem to
> > be any effort on the POSIX side to drop the requirement on
> > applications not to do AS-unsafe things in the child before exec, so
> > regardless of this change, what these libraries are doing is still
> > wrong.
> > 
> > In order to make the child environment unrestricted after fork, either
> > fork must hold *all* locks at the time the actual fork syscall takes
> > place, or it must be able to reset any state protected by a lock that
> > was held in the parent (or some mix of the two). It's fundamentally
> > impossible to do this completely (in a way that lets the child run
> > unrestricted), since some locks in the parent may be held arbitrarily
> > long such that fork waiting on them would deadlock. In particular, any
> > stdio FILE lock may be held indefinitely because there's a blocking
> > operation in progress on the underlying fd, or because the application
> > has called flockfile. Thus, at best, the implementation can give the
> > child an environment where fflush(0) and exit() still deadlock.
> > 
> > In case we do want to follow a direction of trying to provide some
> > degree of relaxation of restrictions on the child (taking the liberty
> > of POSIX-future drop of fork's AS-safety requirement), I did a quick
> > survey of libc-internal locks, and found:
> > 
> > - at_quick_exit
> > - atexit
> > - dlerror
> > - gettext
> > - malloc
> > - pthread_atfork (already necessarily held at fork)
> > - random
> > - sem_open
> > - stdio open file list (vs individual FILEs)
> > - syslog
> > - timezone
> > 
> > This list looks tractable. Aside from malloc, whose locks would need
> > to be taken last since the others may call malloc, these don't seem to
> > have any lock order dependencies between them, and each one's lock
> > functions could be provided as strong overrides to weak no-op
> > definitions in fork.c.
> 
> 
> Some of these seem more important than others.  Including the list in
> your follow-up, I would say a priority would be
> 
> - malloc
> - gettext
> - random
> 
> in roughly that order.  That's just based on my experience with these
> not-great codebases.

If we go this route, I'll cover them all; it's not any harder than
doing just some of them. My enumeration of them was more to determine
whether the problem is tractable and whether any have special
properties than make them especially difficult. Assuming that's not
the case, there's a standard pattern. In the source file for "foo":

void __foo_atfork(int x)
{
	if (x<0) LOCK(mylock);
	else UNLOCK(mylock);
}

and in fork.c:

static void dummy_atfork(int x)
{
}

weak_alias(dummy_atfork, __foo_atfork);

with __foo_atfork(-1) before the fork and __foo_atfork(!ret) after, in
proper ordering (after __fork_handler, before __block_all_sigs for the
locking, and reverse order for the unlocking).

malloc needs to be last, and there need to be either 2 versions to
cover the bump allocator or some special machinery so the bump
allocator's function gets replaced by the real one if real allocator
is linked.

Rich

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] Restrictions on child context after multithreaded fork
  2020-08-15  2:07 ` Ariadne Conill
@ 2020-08-15  3:02   ` Rich Felker
  0 siblings, 0 replies; 20+ messages in thread
From: Rich Felker @ 2020-08-15  3:02 UTC (permalink / raw)
  To: musl

On Fri, Aug 14, 2020 at 08:07:26PM -0600, Ariadne Conill wrote:
> Hello,
> 
> On 2020-08-14 15:41, Rich Felker wrote:
> >musl 1.2.1 has exposed bugs in several applications and libraries
> >caused by async-signal-unsafe code between (multithreaded) fork and
> >subsequent exec. So far, dbus library code, pulseaudio library code,
> >and libvirt have been found to be affected. A couple of the bug
> >reports (with incomplete information) are:
> 
> I suspect the dbus library code has been broken for some amount of
> time because we've been tracking an issue where dbus stalls on KDE
> Wayland for a short period of time causing the session to come up
> strangely. 1.2.1 did make this stall more pronounced though.

Yes, the open bug on their tracker goes back something like 3 years or
more, with no resolution or progress.

> >https://gitlab.alpinelinux.org/alpine/aports/-/issues/11602
> >https://gitlab.alpinelinux.org/alpine/aports/-/issues/11815
> >
> >Fixing the affected library code looks very straightforward; it's just
> >a matter of doing proper iterations of existing data/state rather than
> >allocating lists and using opendir on procfs and such. I've discussed
> >fixes with Alpine Linux folks and I believe fixes have been tested,
> >but I don't see any patches in aports yet.
> 
> I'll work on getting patches in next week.

Great!

> >I've seen suspicions that the switch to mallocng exposed this, but I'm
> >pretty sure it was:
> >
> >https://git.musl-libc.org/cgit/musl/commit/?id=e01b5939b38aea5ecbe41670643199825874b26c
> 
> We carried this patch before upgrade to 1.2.1 and it was mostly
> fine, has been carried since 1.1.24-r7 AFAIK.

A. Wilcox reported similar findings, so either something more subtle
is going on, or the per-bin locks in oldmalloc just made it less
likely to hit the same lock in the child.

> >In case we do want to follow a direction of trying to provide some
> >degree of relaxation of restrictions on the child (taking the liberty
> >of POSIX-future drop of fork's AS-safety requirement), I did a quick
> >survey of libc-internal locks, and found:
> 
> I think it is better to fix programs to not depend on AS-unsafe
> functions at fork time.  Being lax on this requirement is an
> indicator of other bad engineering decisions in these programs,
> especially libvirt and pulseaudio.

My leaning is the same -- the code is doing sketchy stuff the authors
know is wrong, and should be fixed. And with current musl, it should
reliably deadlock if there's a problem, rather than corrupting memory
like in the past, so there's not a safety motivation for changing
this.

Since POSIX is dropping the AS-safety requirement from fork, though,
there is more legitimacy to arguments that libc "should" support this
usage (even though POSIX does not seem to be taking any action to
support such an expectation, just to make it so glibc is no longer
non-conforming in this regard).

With that said, I think it's at least something where we should watch
what other implementations are doing, and whether it becomes a QoI
issue. One compelling argument for doing it is that it makes it more
possible to link code with conflicting expectations, and so use of
threads as an implementation detail in libraries doesn't "leak"
impositions on the application using the library. Right now there are
sort of two competing perspectives:

1. Traditional use of fork is a very normal thing, and because
   secretly using threads behind the scenes can break that, you
   shouldn't secretly use threads behind the scenes.

2. Using threads as an implementation detail of library code is a very
   normal thing, and because forking and doing non-AS-safe stuff after
   you fork can break that, you shouldn't use fork in that way.

I'm strongly in camp 2 (for other reasons as well; I have plenty
reasons to hate fork and want to see it deprecated and replaced by
whatever enhancements to posix_spawn are needed to make users happy)
but I do see value in reconciling the two camps, too.

Rich

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] Restrictions on child context after multithreaded fork
  2020-08-14 21:41 [musl] Restrictions on child context after multithreaded fork Rich Felker
                   ` (2 preceding siblings ...)
  2020-08-15  2:07 ` Ariadne Conill
@ 2020-08-15  6:51 ` Timo Teras
  2020-08-15 11:51   ` Natanael Copa
  2020-08-16  3:57 ` Rich Felker
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Timo Teras @ 2020-08-15  6:51 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

On Fri, 14 Aug 2020 17:41:38 -0400
Rich Felker <dalias@libc.org> wrote:

> musl 1.2.1 has exposed bugs in several applications and libraries
> caused by async-signal-unsafe code between (multithreaded) fork and
> subsequent exec. So far, dbus library code, pulseaudio library code,
> and libvirt have been found to be affected. A couple of the bug
> reports (with incomplete information) are:
> 
> https://gitlab.alpinelinux.org/alpine/aports/-/issues/11602
> https://gitlab.alpinelinux.org/alpine/aports/-/issues/11815

Add to that list glib and libvte.

XFCE4 became quite unusable due to glib. Fortunately, it was fixed
quite fast, and is merged for Alpine already:
https://gitlab.gnome.org/GNOME/glib/-/issues/2140

Unfortunately, libvte duplicates the some of the code and the issue:
there's https://gitlab.gnome.org/GNOME/vte/-/issues/263
That got fixed relatively fast too in git master, but is not backported
to any stables branch. So that's not merged yet in Alpine. And is
causing still random lock ups in e.g. xfce4-terminal.

- Timo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] Restrictions on child context after multithreaded fork
  2020-08-15  6:51 ` Timo Teras
@ 2020-08-15 11:51   ` Natanael Copa
  2020-08-15 16:25     ` Rich Felker
  0 siblings, 1 reply; 20+ messages in thread
From: Natanael Copa @ 2020-08-15 11:51 UTC (permalink / raw)
  To: musl; +Cc: Rich Felker



> 15. aug. 2020 kl. 08:51 skrev Timo Teras <timo.teras@iki.fi>:
> 
> On Fri, 14 Aug 2020 17:41:38 -0400
> Rich Felker <dalias@libc.org> wrote:
> 
>> musl 1.2.1 has exposed bugs in several applications and libraries
>> caused by async-signal-unsafe code between (multithreaded) fork and
>> subsequent exec. So far, dbus library code, pulseaudio library code,
>> and libvirt have been found to be affected. A couple of the bug
>> reports (with incomplete information) are:
>> 
>> https://gitlab.alpinelinux.org/alpine/aports/-/issues/11602
>> https://gitlab.alpinelinux.org/alpine/aports/-/issues/11815
> 
> Add to that list glib and libvte.
> 
> XFCE4 became quite unusable due to glib. Fortunately, it was fixed
> quite fast, and is merged for Alpine already:
> https://gitlab.gnome.org/GNOME/glib/-/issues/2140
> 
> Unfortunately, libvte duplicates the some of the code and the issue:
> there's https://gitlab.gnome.org/GNOME/vte/-/issues/263
> That got fixed relatively fast too in git master, but is not backported
> to any stables branch. So that's not merged yet in Alpine. And is
> causing still random lock ups in e.g. xfce4-terminal.
> 

xfce4-terminal was more or less completely useless so I had to add a workaround for it in two patches:

First uncover a useless setenv. Even the comment in the code says that it has no effect:
https://git.alpinelinux.org/aports/commit/community/vte3?id=ad687b01b2a5fa9d53fcd9d0ee3743882f3542b4

In the second patch I use some of the hunks in the upstream and replace malloc with alloca:
https://git.alpinelinux.org/aports/commit/community/vte3?id=161434fcb87807dae40dffdd332db1624b747bc7

After that xfce4-terminal becomes useable again.


-nc



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] Restrictions on child context after multithreaded fork
  2020-08-15 11:51   ` Natanael Copa
@ 2020-08-15 16:25     ` Rich Felker
  2020-08-16  1:27       ` Rich Felker
  0 siblings, 1 reply; 20+ messages in thread
From: Rich Felker @ 2020-08-15 16:25 UTC (permalink / raw)
  To: musl

On Sat, Aug 15, 2020 at 01:51:00PM +0200, Natanael Copa wrote:
> 
> 
> > 15. aug. 2020 kl. 08:51 skrev Timo Teras <timo.teras@iki.fi>:
> > 
> > On Fri, 14 Aug 2020 17:41:38 -0400
> > Rich Felker <dalias@libc.org> wrote:
> > 
> >> musl 1.2.1 has exposed bugs in several applications and libraries
> >> caused by async-signal-unsafe code between (multithreaded) fork and
> >> subsequent exec. So far, dbus library code, pulseaudio library code,
> >> and libvirt have been found to be affected. A couple of the bug
> >> reports (with incomplete information) are:
> >> 
> >> https://gitlab.alpinelinux.org/alpine/aports/-/issues/11602
> >> https://gitlab.alpinelinux.org/alpine/aports/-/issues/11815
> > 
> > Add to that list glib and libvte.
> > 
> > XFCE4 became quite unusable due to glib. Fortunately, it was fixed
> > quite fast, and is merged for Alpine already:
> > https://gitlab.gnome.org/GNOME/glib/-/issues/2140
> > 
> > Unfortunately, libvte duplicates the some of the code and the issue:
> > there's https://gitlab.gnome.org/GNOME/vte/-/issues/263
> > That got fixed relatively fast too in git master, but is not backported
> > to any stables branch. So that's not merged yet in Alpine. And is
> > causing still random lock ups in e.g. xfce4-terminal.
> > 
> 
> xfce4-terminal was more or less completely useless so I had to add a workaround for it in two patches:
> 
> First uncover a useless setenv. Even the comment in the code says that it has no effect:
> https://git.alpinelinux.org/aports/commit/community/vte3?id=ad687b01b2a5fa9d53fcd9d0ee3743882f3542b4
> 
> In the second patch I use some of the hunks in the upstream and replace malloc with alloca:
> https://git.alpinelinux.org/aports/commit/community/vte3?id=161434fcb87807dae40dffdd332db1624b747bc7
> 
> After that xfce4-terminal becomes useable again.

I'm not sure whether the alloca is safe. The parent could just do this
allocation *before fork* rather than waiting til the last minute to do
it. The data does not change between fork and exec. Note that the glib
fix cited above did this right.

The hardcoded fallback for fd limit seems like a bad idea, and I don't
think I understand your fallback sequence. It looks like RLIM_INFINITY
gets reinterpreted as 4096. I'm not sure how it should be interpreted;
in principle, probably as INT_MAX+1U. This is again exposing how the
whole "close-all" idiom is horribly wrong and these libraries should
just be documenting that you must use O_CLOEXEC correctly if you don't
want them to leak file descriptors.

For what it's worth, I don't really see any reason prlimit should be
treated as AS-safe but getrlimit and sysconf shouldn't. I'm fairly
well in favor of having at least sysconf be one of the functions musl
makes AS-safe as an extension, though perhaps we should discuss that
more. (Actually I'd like to make a project of documenting everything
we make AS-safe, including things like strerror, dprintf/snprintf,
etc.)

Rich

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] Restrictions on child context after multithreaded fork
  2020-08-15 16:25     ` Rich Felker
@ 2020-08-16  1:27       ` Rich Felker
  2020-08-16 12:48         ` Natanael Copa
  0 siblings, 1 reply; 20+ messages in thread
From: Rich Felker @ 2020-08-16  1:27 UTC (permalink / raw)
  To: musl

On Sat, Aug 15, 2020 at 12:25:47PM -0400, Rich Felker wrote:
> On Sat, Aug 15, 2020 at 01:51:00PM +0200, Natanael Copa wrote:
> > xfce4-terminal was more or less completely useless so I had to add a workaround for it in two patches:
> > 
> > First uncover a useless setenv. Even the comment in the code says that it has no effect:
> > https://git.alpinelinux.org/aports/commit/community/vte3?id=ad687b01b2a5fa9d53fcd9d0ee3743882f3542b4
> > 
> > In the second patch I use some of the hunks in the upstream and replace malloc with alloca:
> > https://git.alpinelinux.org/aports/commit/community/vte3?id=161434fcb87807dae40dffdd332db1624b747bc7
> > 
> > After that xfce4-terminal becomes useable again.
> 
> I'm not sure whether the alloca is safe. The parent could just do this
> allocation *before fork* rather than waiting til the last minute to do
> it. The data does not change between fork and exec. Note that the glib
> fix cited above did this right.
> 
> The hardcoded fallback for fd limit seems like a bad idea, and I don't
> think I understand your fallback sequence. It looks like RLIM_INFINITY
> gets reinterpreted as 4096. I'm not sure how it should be interpreted;
> in principle, probably as INT_MAX+1U. This is again exposing how the
> whole "close-all" idiom is horribly wrong and these libraries should
> just be documenting that you must use O_CLOEXEC correctly if you don't
> want them to leak file descriptors.

Fortunately it looks like Linux doesn't let you set RLIM_INFINITY for
RLIM_NOFILE. The value of the rlimit is limited by sysctl fs.nr_open,
and fs.nr_open is limited between sysctl_nr_open_min (==BITS_PER_LONG)
and sysctl_nr_open_max. The latter is INT_MAX/8*8 on 64-bit archs, and
SIZE_MAX/16*4 on 32-bit archs. But in any case, on Linux, you can rely
on sysctl(_SC_OPEN_MAX) to give an actual meaningful number that fits
in the range of long, not -1/unlimited, and thus the invalid fallback
case is never hit.

Rich

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] Restrictions on child context after multithreaded fork
  2020-08-14 21:41 [musl] Restrictions on child context after multithreaded fork Rich Felker
                   ` (3 preceding siblings ...)
  2020-08-15  6:51 ` Timo Teras
@ 2020-08-16  3:57 ` Rich Felker
  2020-08-16  9:10   ` Florian Weimer
  2020-08-16  7:05 ` Pirmin Walthert
  2020-09-30 18:38 ` Rich Felker
  6 siblings, 1 reply; 20+ messages in thread
From: Rich Felker @ 2020-08-16  3:57 UTC (permalink / raw)
  To: musl

On Fri, Aug 14, 2020 at 05:41:38PM -0400, Rich Felker wrote:
> This is largely because glibc attempts to make the erroneous usage by
> these libraries work (more on that below).
> 
> [...]
> 
> In case we do want to follow a direction of trying to provide some
> degree of relaxation of restrictions on the child (taking the liberty
> of POSIX-future drop of fork's AS-safety requirement), I did a quick
> survey of libc-internal locks, and found:
> 
> - at_quick_exit
> - atexit
> - dlerror
> - gettext
> - malloc
> - pthread_atfork (already necessarily held at fork)
> - random
> - sem_open
> - stdio open file list (vs individual FILEs)
> - syslog
> - timezone
> 
> This list looks tractable. Aside from malloc, whose locks would need
> to be taken last since the others may call malloc, these don't seem to
> have any lock order dependencies between them, and each one's lock
> functions could be provided as strong overrides to weak no-op
> definitions in fork.c.

On some inspection, glibc does not actually attempt to make the child
environment unrestricted. The only things it does around fork are:

- obtain/release malloc locks (makes malloc work reliably in the
  child)

- obtain stdio file list lock in parent, release in parent, reset in
  child (probably makes fopen work in the child, but I see no reason
  why the child resets the lock rather than just unlocking it)

- reset FILE locks in child (necessarily leads to accessing
  inconsistent or corrupt state if they're ever used in the child)

- reset dynamic linker lock in child (AFAICT, necessarily leads to
  accessing inconsistent or corrupt state in child if any dynamic
  linker functions are ever used in the child, possibly including via
  the lazy resolver (!!))

So pretty much the *only* thing glibc attempts to make work
"correctly" in the MT-forked-child context is malloc. Everything else
is either ignored (see all of the above I found for musl, plus lots
more things that are glibc-specific) or actively broken by the special
handling at fork time.

In light of this, I think it's very reasonable that the new POSIX
direction is just allowing implementations that make fork non-AS-safe,
but not allowing the application to assume anything new. "It's
AS-unsafe, except malloc works" is a really weird and arbitrary
restriction. This reassures me that we really should be working to get
the broken application/library code here fixed, rather than trying to
accommodate it, unless there's a major change in direction where
multiple implementors want to agree to make this really work "right".

Rich

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] Restrictions on child context after multithreaded fork
  2020-08-14 21:41 [musl] Restrictions on child context after multithreaded fork Rich Felker
                   ` (4 preceding siblings ...)
  2020-08-16  3:57 ` Rich Felker
@ 2020-08-16  7:05 ` Pirmin Walthert
  2020-08-16 16:55   ` Rich Felker
  2020-09-30 18:38 ` Rich Felker
  6 siblings, 1 reply; 20+ messages in thread
From: Pirmin Walthert @ 2020-08-16  7:05 UTC (permalink / raw)
  To: musl, Rich Felker

Dear Rich,
> I've seen suspicions that the switch to mallocng exposed this, but I'm
> pretty sure it was:
>
> https://git.musl-libc.org/cgit/musl/commit/?id=e01b5939b38aea5ecbe41670643199825874b26c

I needed to patch asterisk because of similar issues (you gave me some 
hints on this mailing list after I'd described the issue):

https://issues.asterisk.org/jira/browse/ASTERISK-28776

However this happened with a combination of musl 1.2.0, which did not 
have the mentioned commit applied, and mallocng (LD_PRELOAD method). 
Asterisk with 1.2.0 and old malloc was running just fine.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] Restrictions on child context after multithreaded fork
  2020-08-16  3:57 ` Rich Felker
@ 2020-08-16  9:10   ` Florian Weimer
  2020-08-16 16:56     ` Rich Felker
  0 siblings, 1 reply; 20+ messages in thread
From: Florian Weimer @ 2020-08-16  9:10 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

* Rich Felker:

> On some inspection, glibc does not actually attempt to make the child
> environment unrestricted. The only things it does around fork are:

I think pthread_once initializers that have partially executed are
also executed from the start in the child if initialization is
requested again.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] Restrictions on child context after multithreaded fork
  2020-08-16  1:27       ` Rich Felker
@ 2020-08-16 12:48         ` Natanael Copa
  0 siblings, 0 replies; 20+ messages in thread
From: Natanael Copa @ 2020-08-16 12:48 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

On Sat, 15 Aug 2020 21:27:00 -0400
Rich Felker <dalias@libc.org> wrote:

> On Sat, Aug 15, 2020 at 12:25:47PM -0400, Rich Felker wrote:
> > On Sat, Aug 15, 2020 at 01:51:00PM +0200, Natanael Copa wrote:  
> > > xfce4-terminal was more or less completely useless so I had to add a workaround for it in two patches:
> > > 
> > > First uncover a useless setenv. Even the comment in the code says that it has no effect:
> > > https://git.alpinelinux.org/aports/commit/community/vte3?id=ad687b01b2a5fa9d53fcd9d0ee3743882f3542b4
> > > 
> > > In the second patch I use some of the hunks in the upstream and replace malloc with alloca:
> > > https://git.alpinelinux.org/aports/commit/community/vte3?id=161434fcb87807dae40dffdd332db1624b747bc7
> > > 
> > > After that xfce4-terminal becomes useable again.  
> > 
> > I'm not sure whether the alloca is safe. The parent could just do this
> > allocation *before fork* rather than waiting til the last minute to do
> > it. The data does not change between fork and exec. Note that the glib
> > fix cited above did this right.

That's why I called it a workaround and not a fix. They did refactor
the code but didn't want to backport it.

> > 
> > The hardcoded fallback for fd limit seems like a bad idea, and I don't
> > think I understand your fallback sequence. It looks like RLIM_INFINITY
> > gets reinterpreted as 4096. I'm not sure how it should be interpreted;
> > in principle, probably as INT_MAX+1U. This is again exposing how the
> > whole "close-all" idiom is horribly wrong and these libraries should
> > just be documenting that you must use O_CLOEXEC correctly if you don't
> > want them to leak file descriptors.  
> 
> Fortunately it looks like Linux doesn't let you set RLIM_INFINITY for
> RLIM_NOFILE. The value of the rlimit is limited by sysctl fs.nr_open,
> and fs.nr_open is limited between sysctl_nr_open_min (==BITS_PER_LONG)
> and sysctl_nr_open_max. The latter is INT_MAX/8*8 on 64-bit archs, and
> SIZE_MAX/16*4 on 32-bit archs. But in any case, on Linux, you can rely
> on sysctl(_SC_OPEN_MAX) to give an actual meaningful number that fits
> in the range of long, not -1/unlimited, and thus the invalid fallback
> case is never hit.
> 
> Rich

I think you have good points. Would be good to forward those upstream:
https://gitlab.gnome.org/GNOME/vte/-/commit/ed78b9e2ec47675a9dccf045caca4d7a0b6c9fe8

-nc

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] Restrictions on child context after multithreaded fork
  2020-08-16  7:05 ` Pirmin Walthert
@ 2020-08-16 16:55   ` Rich Felker
  0 siblings, 0 replies; 20+ messages in thread
From: Rich Felker @ 2020-08-16 16:55 UTC (permalink / raw)
  To: musl

On Sun, Aug 16, 2020 at 09:05:28AM +0200, Pirmin Walthert wrote:
> Dear Rich,
> >I've seen suspicions that the switch to mallocng exposed this, but I'm
> >pretty sure it was:
> >
> >https://git.musl-libc.org/cgit/musl/commit/?id=e01b5939b38aea5ecbe41670643199825874b26c
> 
> I needed to patch asterisk because of similar issues (you gave me
> some hints on this mailing list after I'd described the issue):
> 
> https://issues.asterisk.org/jira/browse/ASTERISK-28776
> 
> However this happened with a combination of musl 1.2.0, which did
> not have the mentioned commit applied, and mallocng (LD_PRELOAD
> method). Asterisk with 1.2.0 and old malloc was running just fine.

Yes, if you were using mallocng out-of-tree it doesn't integrate with
musl's internal locking, and would always perform locks regardless of
whether multithreaded. This is even stronger than the above-cited
change.

Rich

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] Restrictions on child context after multithreaded fork
  2020-08-16  9:10   ` Florian Weimer
@ 2020-08-16 16:56     ` Rich Felker
  2020-08-16 17:11       ` Florian Weimer
  0 siblings, 1 reply; 20+ messages in thread
From: Rich Felker @ 2020-08-16 16:56 UTC (permalink / raw)
  To: musl

On Sun, Aug 16, 2020 at 11:10:37AM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> > On some inspection, glibc does not actually attempt to make the child
> > environment unrestricted. The only things it does around fork are:
> 
> I think pthread_once initializers that have partially executed are
> also executed from the start in the child if initialization is
> requested again.

I don't follow how pthread_once is related. The vast majority of the
things I found glibc doing no specific handling for are actual mutable
state not just on-demand initialization.

Rich

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] Restrictions on child context after multithreaded fork
  2020-08-16 16:56     ` Rich Felker
@ 2020-08-16 17:11       ` Florian Weimer
  2020-08-16 18:33         ` Rich Felker
  0 siblings, 1 reply; 20+ messages in thread
From: Florian Weimer @ 2020-08-16 17:11 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

* Rich Felker:

> On Sun, Aug 16, 2020 at 11:10:37AM +0200, Florian Weimer wrote:
>> * Rich Felker:
>> 
>> > On some inspection, glibc does not actually attempt to make the child
>> > environment unrestricted. The only things it does around fork are:
>> 
>> I think pthread_once initializers that have partially executed are
>> also executed from the start in the child if initialization is
>> requested again.
>
> I don't follow how pthread_once is related. The vast majority of the
> things I found glibc doing no specific handling for are actual mutable
> state not just on-demand initialization.

If a fork happens during a pthread_once initialization, the subsystem
related to that becomes unavailable after fork.  The pthread_once_t
reinitialization logic intends to avoid that.  Like resetting locks
after fork in the new process, it is rather questionable.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] Restrictions on child context after multithreaded fork
  2020-08-16 17:11       ` Florian Weimer
@ 2020-08-16 18:33         ` Rich Felker
  0 siblings, 0 replies; 20+ messages in thread
From: Rich Felker @ 2020-08-16 18:33 UTC (permalink / raw)
  To: musl

On Sun, Aug 16, 2020 at 07:11:37PM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> > On Sun, Aug 16, 2020 at 11:10:37AM +0200, Florian Weimer wrote:
> >> * Rich Felker:
> >> 
> >> > On some inspection, glibc does not actually attempt to make the child
> >> > environment unrestricted. The only things it does around fork are:
> >> 
> >> I think pthread_once initializers that have partially executed are
> >> also executed from the start in the child if initialization is
> >> requested again.
> >
> > I don't follow how pthread_once is related. The vast majority of the
> > things I found glibc doing no specific handling for are actual mutable
> > state not just on-demand initialization.
> 
> If a fork happens during a pthread_once initialization, the subsystem
> related to that becomes unavailable after fork.  The pthread_once_t
> reinitialization logic intends to avoid that.  Like resetting locks
> after fork in the new process, it is rather questionable.

Yes but initialization is hardly the interesting case. All of the
subsystems I highlighted were not initialization but mutable state:

- adding (or removing, if you have dlclose remove them like glibc
  does) exit handlers.

- loading (or unloading) shared libraries

- adding textdomains to gettext or changing the active default one

- opening named semaphores (has a lock because POSIX requires opening
  the same one more than once to return the same sem_t pointer rather
  than a second mapping of it).

- using syslog (there's at least some state with regard to the log fd
  and log levels)

- using any time functions that depend on the timezone

The pthread_once-like initializations are another set of potential
deadlocks on top of that.

Most of the above happen fairly infrequently, especially compared to
malloc, so they're less likely to be bit, but they are deadlock
hazards that prevent the child environment from being unrestricted.
syslog and time are probably the most likely to be hit.

Rich

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [musl] Restrictions on child context after multithreaded fork
  2020-08-14 21:41 [musl] Restrictions on child context after multithreaded fork Rich Felker
                   ` (5 preceding siblings ...)
  2020-08-16  7:05 ` Pirmin Walthert
@ 2020-09-30 18:38 ` Rich Felker
  6 siblings, 0 replies; 20+ messages in thread
From: Rich Felker @ 2020-09-30 18:38 UTC (permalink / raw)
  To: musl

On Fri, Aug 14, 2020 at 05:41:38PM -0400, Rich Felker wrote:
> The next issue of POSIX (Issue 8) will drop the requirement that fork
> be AS-safe, as a result of Austin Group tracker issue #62. This makes
> the glibc behavior permissible/conforming, but there does not seem to
> be any effort on the POSIX side to drop the requirement on
> applications not to do AS-unsafe things in the child before exec, so
> regardless of this change, what these libraries are doing is still
> wrong.
> 
> In order to make the child environment unrestricted after fork, either
> fork must hold *all* locks at the time the actual fork syscall takes
> place, or it must be able to reset any state protected by a lock that
> was held in the parent (or some mix of the two). It's fundamentally
> impossible to do this completely (in a way that lets the child run
> unrestricted), since some locks in the parent may be held arbitrarily
> long such that fork waiting on them would deadlock. In particular, any
> stdio FILE lock may be held indefinitely because there's a blocking
> operation in progress on the underlying fd, or because the application
> has called flockfile. Thus, at best, the implementation can give the
> child an environment where fflush(0) and exit() still deadlock.
> 
> In case we do want to follow a direction of trying to provide some
> degree of relaxation of restrictions on the child (taking the liberty
> of POSIX-future drop of fork's AS-safety requirement), I did a quick
> survey of libc-internal locks, and found:
> 
> - at_quick_exit
> - atexit
> - dlerror
> - gettext
> - malloc
> - pthread_atfork (already necessarily held at fork)
> - random
> - sem_open
> - stdio open file list (vs individual FILEs)
> - syslog
> - timezone
> 
> This list looks tractable. Aside from malloc, whose locks would need
> to be taken last since the others may call malloc, these don't seem to
> have any lock order dependencies between them, and each one's lock
> functions could be provided as strong overrides to weak no-op
> definitions in fork.c.

Based on continued frustration over the possible scope of applications
-- and perhaps more importantly, language runtimes, which may have no
plausible road to fixing them -- that are broken this regard, I've
been looking further into what it would take to support an
unrestricted execution environment in the MT-forked child.

This isn't a path I really like, but it's one I'm considering because
users/distributions have gotten into a nasty sitation where at least
some (and possibly a lot, and possibly some of them unfixable) things
that appeared to be working in musl 1.2.0 and earlier weren't, and now
(since e01b5939b38a) they very conspicuously hang instead, and neither
upgrading nor sticking with the silent breakage is viable.

So far my findings:

First, there are some fundamental limitations to how "unrestricted"
the child can be. Nothing can be done about resources that are held in
a potentially inconsistent state under application control. This
includes FILEs (but not the open file list) which the application may
have locked with flockfile or which may be locked for unbounded time
just because a blocking operation on them is in progress. As such, the
child cannot call fflush(0) or exit and expect them to work as normal.
Either they deadlock, or fork does something to exclude any affected
FILEs.

A less obvious instance of "under application control" is constructors
being executed by dlopen. If a library was being constructed by
another thread at the moment of fork, there is no way for that
construction to ever be completed in the child. fork cannot just "wait
for all ctors in progress" since that may take unboundedly long, and
would potentially break existing correct software that only wants to
do AS-safe things in the child but needs fork to make forward
progress. In fact, in a worst case, it's even possible that fork was
called from a ctor. So here the best we can hope for is for dlopen to
fail to load any library that was left (or whose dependencies were
left) partially constructed in the parent. And this best-we-can-hope
seems achievable if desired.

Aside from the above two, I don't think there are any further libc
resources with "potentially inconsistent state under application
control". So in theory, everything else can be made to work
"unrestricted" in the child.

As noted before, most of the libc components with internal locking
have no dependencies between one another, and they use the LOCK/__lock
mechanism. These can all be handled with an array of weak pointers to
their locks, and a loop that locks them all in the parent, unlocks
them all in both after fork. Nothing fancy here. Arguably the child
should reset the locks rather than unlocking, though, to get rid of
any stale waiter counts.

The nontrivial cases seem to be just:

- malloc, which has to be locked as late as possible since other
  libc components whose locks are being taken might depend on it.

- oldmalloc, which has multiple locks

- order requirements between ldso and thread related locks:
  - ldso main lock -> init_fini lock
  - ldso main lock -> dlerror lock
  - ldso main lock -> ptc_inhibit -> thread list lock

- thread list lock, which can only be taken with app signals blocked
  but implementation-internal ones unblocked

These don't seem to have any constraints that would actually conflict,
though, just ones that require some attention to detail.

It was noted in a branch of this thread that pthread_once usage in
pthread_mutexattr_set{robust,protocol} and timer_create was also a
problem. It would be easy enough to just reset these three once
objects in the MT-forked child, but it was even easier just to get rid
of them. Regardless of whether any fork changes actually end up being
made, I've changed pthread_mutexattr_set{robust,protocol} (pending
push) to just do a non-pthread_once lazy-init idiom, and dropped the
signal handler setup entirely from timer_create since it was obsolete.

I have a very rough draft, on top of some unpushed commits, that in
theory makes this all work but it's untested and a mess right now. To
give a rough idea of what's involved (for gauging whether it's
objectionable etc.), here are the diff stats:

 ldso/dynlink.c                | 18 +++++++++++++
 src/exit/at_quick_exit.c      |  2 ++
 src/exit/atexit.c             |  2 ++
 src/ldso/dlerror.c            |  2 ++
 src/locale/dcngettext.c       |  5 +++-
 src/malloc/lite_malloc.c      |  5 +++-
 src/malloc/mallocng/glue.h    |  9 ++++++-
 src/malloc/oldmalloc/malloc.c | 19 ++++++++++++++
 src/misc/syslog.c             |  2 ++
 src/prng/random.c             |  2 ++
 src/process/fork.c            | 60 +++++++++++++++++++++++++++++++++++++++++++
 src/stdio/ofl.c               |  2 ++
 src/thread/sem_open.c         |  2 ++
 src/time/__tz.c               |  2 ++
 14 files changed, 129 insertions(+), 3 deletions(-)

Most files changed are just to make locks that were presently
static-only visible outside their translation units. There's also a
small fork_impl.h not included since it's not tracked; it just
declares the symbols for linkage and marks them hidden.

If we proceed with this, the result is much stronger than what glibc
is giving -- glibc seems to only make an effort to make malloc work in
the MT-forked child, and leaves everything else deadlocking or failing
in unpredictable ways. Admittedly it would be slightly easier to just
make things work to the extent they work in glibc, but I think the
difference in effor required (and maintenance cost) is rather small,
and this kind of haphazard "let's make some things work, but not
really have a consistent model for what you can do" is not how things
are supposed to be done with musl.

Over the next few days I plan to review the changes I already have
queued (which are mostly bug fixes and cleanup, and ok regardless of
whether we move forward with the above) and do some testing. If things
look good, I'll post a draft of the possible changes. In the mean
time, comments on this (especially any problems I might have missed)
are welcome.

Rich

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-09-30 18:38 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-14 21:41 [musl] Restrictions on child context after multithreaded fork Rich Felker
2020-08-14 22:02 ` Florian Weimer
2020-08-14 22:14   ` Rich Felker
2020-08-15  0:47 ` A. Wilcox
2020-08-15  2:40   ` Rich Felker
2020-08-15  2:07 ` Ariadne Conill
2020-08-15  3:02   ` Rich Felker
2020-08-15  6:51 ` Timo Teras
2020-08-15 11:51   ` Natanael Copa
2020-08-15 16:25     ` Rich Felker
2020-08-16  1:27       ` Rich Felker
2020-08-16 12:48         ` Natanael Copa
2020-08-16  3:57 ` Rich Felker
2020-08-16  9:10   ` Florian Weimer
2020-08-16 16:56     ` Rich Felker
2020-08-16 17:11       ` Florian Weimer
2020-08-16 18:33         ` Rich Felker
2020-08-16  7:05 ` Pirmin Walthert
2020-08-16 16:55   ` Rich Felker
2020-09-30 18:38 ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).