From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-3.3 required=5.0 tests=MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 29752 invoked from network); 30 Sep 2020 18:38:24 -0000 Received: from mother.openwall.net (195.42.179.200) by inbox.vuxu.org with ESMTPUTF8; 30 Sep 2020 18:38:24 -0000 Received: (qmail 29871 invoked by uid 550); 30 Sep 2020 18:38:21 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Reply-To: musl@lists.openwall.com Received: (qmail 29853 invoked from network); 30 Sep 2020 18:38:20 -0000 Date: Wed, 30 Sep 2020 14:38:08 -0400 From: Rich Felker To: musl@lists.openwall.com Message-ID: <20200930183808.GJ17637@brightrain.aerifal.cx> References: <20200814214136.GP3265@brightrain.aerifal.cx> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200814214136.GP3265@brightrain.aerifal.cx> User-Agent: Mutt/1.5.21 (2010-09-15) Subject: Re: [musl] Restrictions on child context after multithreaded fork On Fri, Aug 14, 2020 at 05:41:38PM -0400, Rich Felker wrote: > The next issue of POSIX (Issue 8) will drop the requirement that fork > be AS-safe, as a result of Austin Group tracker issue #62. This makes > the glibc behavior permissible/conforming, but there does not seem to > be any effort on the POSIX side to drop the requirement on > applications not to do AS-unsafe things in the child before exec, so > regardless of this change, what these libraries are doing is still > wrong. > > In order to make the child environment unrestricted after fork, either > fork must hold *all* locks at the time the actual fork syscall takes > place, or it must be able to reset any state protected by a lock that > was held in the parent (or some mix of the two). It's fundamentally > impossible to do this completely (in a way that lets the child run > unrestricted), since some locks in the parent may be held arbitrarily > long such that fork waiting on them would deadlock. In particular, any > stdio FILE lock may be held indefinitely because there's a blocking > operation in progress on the underlying fd, or because the application > has called flockfile. Thus, at best, the implementation can give the > child an environment where fflush(0) and exit() still deadlock. > > In case we do want to follow a direction of trying to provide some > degree of relaxation of restrictions on the child (taking the liberty > of POSIX-future drop of fork's AS-safety requirement), I did a quick > survey of libc-internal locks, and found: > > - at_quick_exit > - atexit > - dlerror > - gettext > - malloc > - pthread_atfork (already necessarily held at fork) > - random > - sem_open > - stdio open file list (vs individual FILEs) > - syslog > - timezone > > This list looks tractable. Aside from malloc, whose locks would need > to be taken last since the others may call malloc, these don't seem to > have any lock order dependencies between them, and each one's lock > functions could be provided as strong overrides to weak no-op > definitions in fork.c. Based on continued frustration over the possible scope of applications -- and perhaps more importantly, language runtimes, which may have no plausible road to fixing them -- that are broken this regard, I've been looking further into what it would take to support an unrestricted execution environment in the MT-forked child. This isn't a path I really like, but it's one I'm considering because users/distributions have gotten into a nasty sitation where at least some (and possibly a lot, and possibly some of them unfixable) things that appeared to be working in musl 1.2.0 and earlier weren't, and now (since e01b5939b38a) they very conspicuously hang instead, and neither upgrading nor sticking with the silent breakage is viable. So far my findings: First, there are some fundamental limitations to how "unrestricted" the child can be. Nothing can be done about resources that are held in a potentially inconsistent state under application control. This includes FILEs (but not the open file list) which the application may have locked with flockfile or which may be locked for unbounded time just because a blocking operation on them is in progress. As such, the child cannot call fflush(0) or exit and expect them to work as normal. Either they deadlock, or fork does something to exclude any affected FILEs. A less obvious instance of "under application control" is constructors being executed by dlopen. If a library was being constructed by another thread at the moment of fork, there is no way for that construction to ever be completed in the child. fork cannot just "wait for all ctors in progress" since that may take unboundedly long, and would potentially break existing correct software that only wants to do AS-safe things in the child but needs fork to make forward progress. In fact, in a worst case, it's even possible that fork was called from a ctor. So here the best we can hope for is for dlopen to fail to load any library that was left (or whose dependencies were left) partially constructed in the parent. And this best-we-can-hope seems achievable if desired. Aside from the above two, I don't think there are any further libc resources with "potentially inconsistent state under application control". So in theory, everything else can be made to work "unrestricted" in the child. As noted before, most of the libc components with internal locking have no dependencies between one another, and they use the LOCK/__lock mechanism. These can all be handled with an array of weak pointers to their locks, and a loop that locks them all in the parent, unlocks them all in both after fork. Nothing fancy here. Arguably the child should reset the locks rather than unlocking, though, to get rid of any stale waiter counts. The nontrivial cases seem to be just: - malloc, which has to be locked as late as possible since other libc components whose locks are being taken might depend on it. - oldmalloc, which has multiple locks - order requirements between ldso and thread related locks: - ldso main lock -> init_fini lock - ldso main lock -> dlerror lock - ldso main lock -> ptc_inhibit -> thread list lock - thread list lock, which can only be taken with app signals blocked but implementation-internal ones unblocked These don't seem to have any constraints that would actually conflict, though, just ones that require some attention to detail. It was noted in a branch of this thread that pthread_once usage in pthread_mutexattr_set{robust,protocol} and timer_create was also a problem. It would be easy enough to just reset these three once objects in the MT-forked child, but it was even easier just to get rid of them. Regardless of whether any fork changes actually end up being made, I've changed pthread_mutexattr_set{robust,protocol} (pending push) to just do a non-pthread_once lazy-init idiom, and dropped the signal handler setup entirely from timer_create since it was obsolete. I have a very rough draft, on top of some unpushed commits, that in theory makes this all work but it's untested and a mess right now. To give a rough idea of what's involved (for gauging whether it's objectionable etc.), here are the diff stats: ldso/dynlink.c | 18 +++++++++++++ src/exit/at_quick_exit.c | 2 ++ src/exit/atexit.c | 2 ++ src/ldso/dlerror.c | 2 ++ src/locale/dcngettext.c | 5 +++- src/malloc/lite_malloc.c | 5 +++- src/malloc/mallocng/glue.h | 9 ++++++- src/malloc/oldmalloc/malloc.c | 19 ++++++++++++++ src/misc/syslog.c | 2 ++ src/prng/random.c | 2 ++ src/process/fork.c | 60 +++++++++++++++++++++++++++++++++++++++++++ src/stdio/ofl.c | 2 ++ src/thread/sem_open.c | 2 ++ src/time/__tz.c | 2 ++ 14 files changed, 129 insertions(+), 3 deletions(-) Most files changed are just to make locks that were presently static-only visible outside their translation units. There's also a small fork_impl.h not included since it's not tracked; it just declares the symbols for linkage and marks them hidden. If we proceed with this, the result is much stronger than what glibc is giving -- glibc seems to only make an effort to make malloc work in the MT-forked child, and leaves everything else deadlocking or failing in unpredictable ways. Admittedly it would be slightly easier to just make things work to the extent they work in glibc, but I think the difference in effor required (and maintenance cost) is rather small, and this kind of haphazard "let's make some things work, but not really have a consistent model for what you can do" is not how things are supposed to be done with musl. Over the next few days I plan to review the changes I already have queued (which are mostly bug fixes and cleanup, and ok regardless of whether we move forward with the above) and do some testing. If things look good, I'll post a draft of the possible changes. In the mean time, comments on this (especially any problems I might have missed) are welcome. Rich