mailing list of musl libc
 help / color / mirror / code / Atom feed
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Subject: Re: [musl] Restrictions on child context after multithreaded fork
Date: Wed, 30 Sep 2020 14:38:08 -0400	[thread overview]
Message-ID: <20200930183808.GJ17637@brightrain.aerifal.cx> (raw)
In-Reply-To: <20200814214136.GP3265@brightrain.aerifal.cx>

On Fri, Aug 14, 2020 at 05:41:38PM -0400, Rich Felker wrote:
> The next issue of POSIX (Issue 8) will drop the requirement that fork
> be AS-safe, as a result of Austin Group tracker issue #62. This makes
> the glibc behavior permissible/conforming, but there does not seem to
> be any effort on the POSIX side to drop the requirement on
> applications not to do AS-unsafe things in the child before exec, so
> regardless of this change, what these libraries are doing is still
> wrong.
> 
> In order to make the child environment unrestricted after fork, either
> fork must hold *all* locks at the time the actual fork syscall takes
> place, or it must be able to reset any state protected by a lock that
> was held in the parent (or some mix of the two). It's fundamentally
> impossible to do this completely (in a way that lets the child run
> unrestricted), since some locks in the parent may be held arbitrarily
> long such that fork waiting on them would deadlock. In particular, any
> stdio FILE lock may be held indefinitely because there's a blocking
> operation in progress on the underlying fd, or because the application
> has called flockfile. Thus, at best, the implementation can give the
> child an environment where fflush(0) and exit() still deadlock.
> 
> In case we do want to follow a direction of trying to provide some
> degree of relaxation of restrictions on the child (taking the liberty
> of POSIX-future drop of fork's AS-safety requirement), I did a quick
> survey of libc-internal locks, and found:
> 
> - at_quick_exit
> - atexit
> - dlerror
> - gettext
> - malloc
> - pthread_atfork (already necessarily held at fork)
> - random
> - sem_open
> - stdio open file list (vs individual FILEs)
> - syslog
> - timezone
> 
> This list looks tractable. Aside from malloc, whose locks would need
> to be taken last since the others may call malloc, these don't seem to
> have any lock order dependencies between them, and each one's lock
> functions could be provided as strong overrides to weak no-op
> definitions in fork.c.

Based on continued frustration over the possible scope of applications
-- and perhaps more importantly, language runtimes, which may have no
plausible road to fixing them -- that are broken this regard, I've
been looking further into what it would take to support an
unrestricted execution environment in the MT-forked child.

This isn't a path I really like, but it's one I'm considering because
users/distributions have gotten into a nasty sitation where at least
some (and possibly a lot, and possibly some of them unfixable) things
that appeared to be working in musl 1.2.0 and earlier weren't, and now
(since e01b5939b38a) they very conspicuously hang instead, and neither
upgrading nor sticking with the silent breakage is viable.

So far my findings:

First, there are some fundamental limitations to how "unrestricted"
the child can be. Nothing can be done about resources that are held in
a potentially inconsistent state under application control. This
includes FILEs (but not the open file list) which the application may
have locked with flockfile or which may be locked for unbounded time
just because a blocking operation on them is in progress. As such, the
child cannot call fflush(0) or exit and expect them to work as normal.
Either they deadlock, or fork does something to exclude any affected
FILEs.

A less obvious instance of "under application control" is constructors
being executed by dlopen. If a library was being constructed by
another thread at the moment of fork, there is no way for that
construction to ever be completed in the child. fork cannot just "wait
for all ctors in progress" since that may take unboundedly long, and
would potentially break existing correct software that only wants to
do AS-safe things in the child but needs fork to make forward
progress. In fact, in a worst case, it's even possible that fork was
called from a ctor. So here the best we can hope for is for dlopen to
fail to load any library that was left (or whose dependencies were
left) partially constructed in the parent. And this best-we-can-hope
seems achievable if desired.

Aside from the above two, I don't think there are any further libc
resources with "potentially inconsistent state under application
control". So in theory, everything else can be made to work
"unrestricted" in the child.

As noted before, most of the libc components with internal locking
have no dependencies between one another, and they use the LOCK/__lock
mechanism. These can all be handled with an array of weak pointers to
their locks, and a loop that locks them all in the parent, unlocks
them all in both after fork. Nothing fancy here. Arguably the child
should reset the locks rather than unlocking, though, to get rid of
any stale waiter counts.

The nontrivial cases seem to be just:

- malloc, which has to be locked as late as possible since other
  libc components whose locks are being taken might depend on it.

- oldmalloc, which has multiple locks

- order requirements between ldso and thread related locks:
  - ldso main lock -> init_fini lock
  - ldso main lock -> dlerror lock
  - ldso main lock -> ptc_inhibit -> thread list lock

- thread list lock, which can only be taken with app signals blocked
  but implementation-internal ones unblocked

These don't seem to have any constraints that would actually conflict,
though, just ones that require some attention to detail.

It was noted in a branch of this thread that pthread_once usage in
pthread_mutexattr_set{robust,protocol} and timer_create was also a
problem. It would be easy enough to just reset these three once
objects in the MT-forked child, but it was even easier just to get rid
of them. Regardless of whether any fork changes actually end up being
made, I've changed pthread_mutexattr_set{robust,protocol} (pending
push) to just do a non-pthread_once lazy-init idiom, and dropped the
signal handler setup entirely from timer_create since it was obsolete.

I have a very rough draft, on top of some unpushed commits, that in
theory makes this all work but it's untested and a mess right now. To
give a rough idea of what's involved (for gauging whether it's
objectionable etc.), here are the diff stats:

 ldso/dynlink.c                | 18 +++++++++++++
 src/exit/at_quick_exit.c      |  2 ++
 src/exit/atexit.c             |  2 ++
 src/ldso/dlerror.c            |  2 ++
 src/locale/dcngettext.c       |  5 +++-
 src/malloc/lite_malloc.c      |  5 +++-
 src/malloc/mallocng/glue.h    |  9 ++++++-
 src/malloc/oldmalloc/malloc.c | 19 ++++++++++++++
 src/misc/syslog.c             |  2 ++
 src/prng/random.c             |  2 ++
 src/process/fork.c            | 60 +++++++++++++++++++++++++++++++++++++++++++
 src/stdio/ofl.c               |  2 ++
 src/thread/sem_open.c         |  2 ++
 src/time/__tz.c               |  2 ++
 14 files changed, 129 insertions(+), 3 deletions(-)

Most files changed are just to make locks that were presently
static-only visible outside their translation units. There's also a
small fork_impl.h not included since it's not tracked; it just
declares the symbols for linkage and marks them hidden.

If we proceed with this, the result is much stronger than what glibc
is giving -- glibc seems to only make an effort to make malloc work in
the MT-forked child, and leaves everything else deadlocking or failing
in unpredictable ways. Admittedly it would be slightly easier to just
make things work to the extent they work in glibc, but I think the
difference in effor required (and maintenance cost) is rather small,
and this kind of haphazard "let's make some things work, but not
really have a consistent model for what you can do" is not how things
are supposed to be done with musl.

Over the next few days I plan to review the changes I already have
queued (which are mostly bug fixes and cleanup, and ok regardless of
whether we move forward with the above) and do some testing. If things
look good, I'll post a draft of the possible changes. In the mean
time, comments on this (especially any problems I might have missed)
are welcome.

Rich

      parent reply	other threads:[~2020-09-30 18:38 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-14 21:41 Rich Felker
2020-08-14 22:02 ` Florian Weimer
2020-08-14 22:14   ` Rich Felker
2020-08-15  0:47 ` A. Wilcox
2020-08-15  2:40   ` Rich Felker
2020-08-15  2:07 ` Ariadne Conill
2020-08-15  3:02   ` Rich Felker
2020-08-15  6:51 ` Timo Teras
2020-08-15 11:51   ` Natanael Copa
2020-08-15 16:25     ` Rich Felker
2020-08-16  1:27       ` Rich Felker
2020-08-16 12:48         ` Natanael Copa
2020-08-16  3:57 ` Rich Felker
2020-08-16  9:10   ` Florian Weimer
2020-08-16 16:56     ` Rich Felker
2020-08-16 17:11       ` Florian Weimer
2020-08-16 18:33         ` Rich Felker
2020-08-16  7:05 ` Pirmin Walthert
2020-08-16 16:55   ` Rich Felker
2020-09-30 18:38 ` Rich Felker [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200930183808.GJ17637@brightrain.aerifal.cx \
    --to=dalias@libc.org \
    --cc=musl@lists.openwall.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).