From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/14720 Path: news.gmane.org!.POSTED.blaine.gmane.org!not-for-mail From: Rich Felker Newsgroups: gmane.linux.lib.musl.general Subject: Re: Bug report: Memory corrupion due to stale robust_list.head pointer Date: Wed, 25 Sep 2019 08:21:10 -0400 Message-ID: <20190925122110.GU9017@brightrain.aerifal.cx> References: Reply-To: musl@lists.openwall.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Injection-Info: blaine.gmane.org; posting-host="blaine.gmane.org:195.159.176.226"; logging-data="146284"; mail-complaints-to="usenet@blaine.gmane.org" User-Agent: Mutt/1.5.21 (2010-09-15) Cc: musl@lists.openwall.com To: d.dorau@avm.de Original-X-From: musl-return-14736-gllmg-musl=m.gmane.org@lists.openwall.com Wed Sep 25 14:21:34 2019 Return-path: Envelope-to: gllmg-musl@m.gmane.org Original-Received: from mother.openwall.net ([195.42.179.200]) by blaine.gmane.org with smtp (Exim 4.89) (envelope-from ) id 1iD6IT-000bpj-Al for gllmg-musl@m.gmane.org; Wed, 25 Sep 2019 14:21:29 +0200 Original-Received: (qmail 13553 invoked by uid 550); 25 Sep 2019 12:21:26 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Original-Received: (qmail 13535 invoked from network); 25 Sep 2019 12:21:26 -0000 Content-Disposition: inline In-Reply-To: Original-Sender: Rich Felker Xref: news.gmane.org gmane.linux.lib.musl.general:14720 Archived-At: On Wed, Sep 25, 2019 at 12:05:18PM +0200, d.dorau@avm.de wrote: > Hello, > > I recently came across a memory corruption in the member "tsd" of struct > pthread > in a scenario where a pthread mutex is intentionally held during fork(). > I experienced this using the lastest release 1.1.23. > > I found that during fork musl resets self->robust_list.pending and > self->robust_list.off, > but not the robust_list.head. The stale pointer to the previously held and > reset > mutex turned out to be the cause for the following corruption. > > I therefore suggest to also reset the list head on fork as such: > > --- a/src/process/fork.c.orig 2019-09-23 11:41:01.381626360 +0200 > +++ b/src/process/fork.c 2019-09-23 11:41:26.657819473 +0200 > @@ -27,6 +27,7 @@ > self->tid = __syscall(SYS_gettid); > self->robust_list.off = 0; > self->robust_list.pending = 0; > + self->robust_list.head = &self->robust_list.head; > self->next = self->prev = self; > __thread_list_lock = 0; > libc.threads_minus_1 = 0; > > > This resolves the issue. > > I am very well aware of the fact that aquiring a mutex during fork and > re-initializing > in the child appears to result in undefined behaviour (as of > pthread_mutex_init(3posix)) > or to be controversial at least. > > However I believe that it should't result in a memory corruption as a > result. This is definitely undefined behavior, for exactly the types of reason you encountered. I agree it's useful to limit the scope of memory corruption that might occur on such invalid usage though. Your patch is probably ok but before merging I'd like to take some time to consider whether there are any valid usage cases it breaks (so far, I don't think so) and whether any other easily-caught cases are missed. For example, it is possible that the child attempts to use the mutex in non-UB ways like locking or unlocking it, but those should produce errors before even touching the robust list. In theory there's a tid-reuse problem that could allow child to unlock a mutex that used to be held by another thread in the parent, but (1) this can only be hit if the child is doing non-AS-safe things after a MT fork, and (2) musl already avoids this problem by using the robust list even for non-robust mutexes. I don't think (2) is 100% complete in the case where a MT program forks with non-pshared mutexes and the child makes new threads, but this is very very far into UB territory. > To reproduce I wrote a small example which triggers and shows the > curruption. > It also contains a description of the program flow and memory corruption. > > Please find it attached to this mail. > > Please note that the routine to print the robust_list is hacked using > hardcoded > offsets which are aimed at my 32-Bit platform. Thanks for the detailed report. Rich