From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: dlopen deadlock
Date: Thu, 14 Jan 2016 17:41:15 -0500
Message-ID: <20160114224115.GW238@brightrain.aerifal.cx>
In-Reply-To: <20160113110937.GE13558@port70.net>
References: <20160113110937.GE13558@port70.net>
To: musl@lists.openwall.com
Reply-To: musl@lists.openwall.com
User-Agent: Mutt/1.5.21 (2010-09-15)

On Wed, Jan 13, 2016 at 12:09:37PM +0100, Szabolcs Nagy wrote:
> This bug I reported against glibc also affects musl:
> https://sourceware.org/bugzilla/show_bug.cgi?id=19448
>
> in case of musl it's not the global load lock, but the
> init_fini_lock that causes the problem.

The deadlock happens when a ctor creates a thread that calls dlopen,
and the ctor does not return until the new thread's dlopen returns,
right?

> the multi-threadedness detection is also problematic in
> do_init_fini:
>
>     need_locking = has_threads
>     if (need_locking)
>         lock(init_fini_lock)
>     for all deps
>         run_ctors(dep)
>         if (!need_locking && has_threads)
>             need_locking = 1
>             lock(init_fini_lock)
>     if (need_locking)
>         unlock(init_fini_lock)
>
> checking for threads after ctors are run is too late if
> the ctors may start new threads that can dlopen libs with
> common deps with the currently loaded lib.

The logic seems unnecessary now that there's no lazy/optional thread
pointer initialization (originally it was a problem because
pthread_mutex_lock with a recursive mutex needed to access TLS for the
owner tid, but TLS might not have been initialized when the ctors
ran), but I don't immediately see how it's harmful. The only state the
lock protects is p->constructed and the fini chain (fini_head,
p->fini_next), which are all used before the ctors run, and the need
for locking is re-evaluated after the ctors run.

> one solution i can think of is to have an init_fini_lock
> for each dso, then the deadlock only happens if a ctor
> tries to dlopen its own lib (directly or indirectly)
> which is nonsense (the library depends on itself being
> loaded)

The lock has to protect the fini chain linked list (used to control
the order of dtors), so I don't think having it be per-dso is a
possibility.

Rich
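
[Editor's sketch] As a concrete illustration of the scenario described
at the top of the reply (a ctor that creates a thread calling dlopen
and waits for it before returning), here is a minimal sketch of such a
library. The library names (libfoo.so, libdep.so) and the thread/join
structure are hypothetical, chosen only to reproduce the shape of the
reported bug; this is not code from the bug report or from musl.

    /* foo.c -- build as a hypothetical libfoo.so, e.g.
     *   cc -shared -fPIC foo.c -o libfoo.so -lpthread -ldl
     * Its constructor spawns a thread that dlopens another library
     * and joins it before returning. */
    #include <dlfcn.h>
    #include <pthread.h>

    static void *thread_fn(void *arg)
    {
        (void)arg;
        /* hypothetical library sharing deps with libfoo.so */
        return dlopen("libdep.so", RTLD_NOW);
    }

    __attribute__((constructor))
    static void ctor(void)
    {
        pthread_t t;
        void *ret;
        if (pthread_create(&t, 0, thread_fn, 0))
            return;
        /* The ctor does not return until the new thread's dlopen
         * returns. */
        pthread_join(t, &ret);
    }

Per the quoted pseudocode, if the process already has threads when
libfoo.so is dlopened, do_init_fini takes init_fini_lock before
running this ctor; the second thread's dlopen then blocks on the same
lock while the ctor blocks in pthread_join, which is the deadlock
being discussed.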
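
[Editor's sketch] For readers unfamiliar with the code being quoted,
the pseudocode corresponds roughly to the following C shape. This is
an illustrative sketch, not musl's actual dynlink source: the helpers
has_threads, has_dtors and run_ctors are placeholders, and only the
names mentioned in the discussion (init_fini_lock, fini_head,
p->constructed, p->fini_next) come from the thread.

    #include <pthread.h>

    struct dso {
        struct dso *prev;       /* deps are walked via prev */
        struct dso *fini_next;  /* links the fini (dtor-order) chain */
        int constructed;
    };

    /* In the discussion above the real lock is a recursive mutex, so
     * that a ctor may itself call dlopen; a plain mutex is used here
     * only to keep the sketch short. */
    static pthread_mutex_t init_fini_lock = PTHREAD_MUTEX_INITIALIZER;
    static struct dso *fini_head;

    int has_threads(void);          /* placeholder: process has threads? */
    int has_dtors(struct dso *p);   /* placeholder: p registers dtors? */
    void run_ctors(struct dso *p);  /* placeholder: run p's init arrays */

    void do_init_fini(struct dso *p)
    {
        int need_locking = has_threads();
        if (need_locking)
            pthread_mutex_lock(&init_fini_lock);
        for (; p; p = p->prev) {
            if (p->constructed) continue;
            p->constructed = 1;
            /* the state the lock protects -- p->constructed and the
             * fini chain -- is touched here, before the ctors run */
            if (has_dtors(p)) {
                p->fini_next = fini_head;
                fini_head = p;
            }
            run_ctors(p);
            /* a ctor may have created threads; re-check and lock for
             * the remainder of the walk */
            if (!need_locking && has_threads()) {
                need_locking = 1;
                pthread_mutex_lock(&init_fini_lock);
            }
        }
        if (need_locking)
            pthread_mutex_unlock(&init_fini_lock);
    }

This makes the two points in the reply visible: the lock only guards
p->constructed and the fini chain, which are updated before the ctors
are invoked, and the need for locking is re-checked after each dep's
ctors run.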