From: Ariadne Conill
Date: Fri, 14 Aug 2020 20:07:26 -0600
Subject: Re: [musl] Restrictions on child context after multithreaded fork
In-Reply-To: <20200814214136.GP3265@brightrain.aerifal.cx>

Hello,

On 2020-08-14 15:41, Rich Felker wrote:
> musl 1.2.1 has exposed bugs in several applications and libraries
> caused by async-signal-unsafe code between (multithreaded) fork and
> subsequent exec. So far, dbus library code, pulseaudio library code,
> and libvirt have been found to be affected. A couple of the bug
> reports (with incomplete information) are:

I suspect the dbus library code has been broken for some time, because
we've been tracking an issue where dbus stalls briefly on KDE Wayland,
causing the session to come up strangely. 1.2.1 did make this stall
more pronounced, though.

> https://gitlab.alpinelinux.org/alpine/aports/-/issues/11602
> https://gitlab.alpinelinux.org/alpine/aports/-/issues/11815
>
> Fixing the affected library code looks very straightforward; it's just
> a matter of doing proper iterations of existing data/state rather than
> allocating lists and using opendir on procfs and such. I've discussed
> fixes with Alpine Linux folks and I believe fixes have been tested,
> but I don't see any patches in aports yet.

I'll work on getting patches in next week.

> I've seen suspicions that the switch to mallocng exposed this, but I'm
> pretty sure it was:
>
> https://git.musl-libc.org/cgit/musl/commit/?id=e01b5939b38aea5ecbe41670643199825874b26c

We carried this patch before the upgrade to 1.2.1 and it was mostly
fine; it has been carried since 1.1.24-r7, AFAIK.

> Before this commit, the (incorrect) lock skipping logic allowed the
> child process to access inconsistent state left from the parent if it
> violated the requirement not to call AS-unsafe functions.
> Now, the lock attempt in the child rightly deadlocks before accessing state
> that was being modified under control of the lock in the parent. This
> is not specific to malloc but common with anything using libc-internal
> locks.
>
> I'll follow up on this thread once there are patches for the known
> affected libraries.
>
> Note that this is a type of bug that's possibly hard to get upstreams
> to take seriously. libvirt in particular, despite having multiple
> comments throughout the source warning developers that they can't do
> anything AS-unsafe between fork and exec, is somehow deeming malloc an
> exception to that rule because they want to use it (despite it clearly
> not being necessary).
>
> And the dbus issue has been known for a long time; see open bug:
>
> https://gitlab.freedesktop.org/dbus/dbus/-/issues/173
> (originally: https://bugs.freedesktop.org/show_bug.cgi?id=100843)
>
> This is largely because glibc attempts to make the erroneous usage by
> these libraries work (more on that below).
>
> The next issue of POSIX (Issue 8) will drop the requirement that fork
> be AS-safe, as a result of Austin Group tracker issue #62. This makes
> the glibc behavior permissible/conforming, but there does not seem to
> be any effort on the POSIX side to drop the requirement on
> applications not to do AS-unsafe things in the child before exec, so
> regardless of this change, what these libraries are doing is still
> wrong.
>
> In order to make the child environment unrestricted after fork, either
> fork must hold *all* locks at the time the actual fork syscall takes
> place, or it must be able to reset any state protected by a lock that
> was held in the parent (or some mix of the two). It's fundamentally
> impossible to do this completely (in a way that lets the child run
> unrestricted), since some locks in the parent may be held arbitrarily
> long such that fork waiting on them would deadlock.
> In particular, any stdio FILE lock may be held indefinitely because
> there's a blocking operation in progress on the underlying fd, or
> because the application has called flockfile. Thus, at best, the
> implementation can give the child an environment where fflush(0) and
> exit() still deadlock.
>
> In case we do want to follow a direction of trying to provide some
> degree of relaxation of restrictions on the child (taking the liberty
> of POSIX-future drop of fork's AS-safety requirement), I did a quick
> survey of libc-internal locks, and found:

I think it is better to fix programs to not depend on AS-unsafe
functions at fork time. Being lax on this requirement is an indicator
of other bad engineering decisions in these programs, especially
libvirt and pulseaudio.

Ariadne
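
For illustration, here is a minimal sketch of the pattern discussed in this thread: do every AS-unsafe step in the parent, and restrict the child to async-signal-safe calls between fork and exec. The helper name spawn_as_safe and the 4096 descriptor cap are assumptions made for the example; this is not the actual dbus, pulseaudio, or libvirt fix, and closing descriptors by plain iteration is just one simple way to avoid opendir on procfs.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` with `argv`/`envp` and return its exit
 * status, or -1 on failure. Everything the child needs is computed in
 * the parent, before fork(). */
int spawn_as_safe(const char *path, char *const argv[], char *const envp[])
{
    /* Parent side: no restrictions here, so gather state now. */
    long max_fd = sysconf(_SC_OPEN_MAX);
    if (max_fd < 0 || max_fd > 4096)
        max_fd = 4096;  /* assumption: arbitrary cap for this sketch */

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child side: only async-signal-safe calls until exec.
         * No malloc, no stdio, no opendir("/proc/self/fd"). */
        for (int fd = 3; fd < max_fd; fd++)
            close(fd);  /* EBADF on unused descriptors is harmless */
        execve(path, argv, envp);
        _exit(127);  /* exec failed; _exit avoids flushing stdio */
    }

    /* Parent: wait for the child, retrying if interrupted by a signal. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would pass a NULL-terminated argv and envp, for example running /bin/sh with "-c" and a short command. The point of the sketch is that the child never takes a libc-internal lock, so it cannot deadlock on one that another thread held in the parent at fork time.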