From: Szabolcs Nagy
To: musl@lists.openwall.com
Cc: Alexey Izbyshev
Subject: Re: __synccall: deadlock and reliance on racy /proc/self/task
Date: Sat, 9 Feb 2019 22:40:45 +0100
Message-ID: <20190209214045.GO21289@port70.net>
In-Reply-To: <6e0306699add531af519843de20c343a@ispras.ru>
References: <1cc54dbe2e4832d804184f33cda0bdd1@ispras.ru>
 <20190207183626.GQ23599@brightrain.aerifal.cx>
 <20190208183357.GX23599@brightrain.aerifal.cx>
 <20190209162101.GN21289@port70.net>
 <6e0306699add531af519843de20c343a@ispras.ru>
User-Agent: Mutt/1.10.1 (2018-07-13)

* Alexey Izbyshev [2019-02-09 21:33:32 +0300]:
> On 2019-02-09 19:21, Szabolcs Nagy wrote:
> > * Rich Felker [2019-02-08 13:33:57 -0500]:
> > > On Fri, Feb 08, 2019 at 09:14:48PM +0300, Alexey Izbyshev wrote:
> > > > On 2/7/19 9:36 PM, Rich Felker wrote:
> > > > >Does it work if we force two iterations of the readdir loop with no
> > > > >tasks missed, rather than just one, to catch the case of missed
> > > > >concurrent additions? I'm not sure. But all this makes me really
> > > > >uncomfortable with the current approach.
> > > >
> > > > I've tested with 0, 1, 2 and 3 retries of the main loop if miss_cnt
> > > > == 0. The test eventually failed in all cases, with 0 retries
> > > > requiring only a handful of iterations, 1 -- on the order of 100, 2
> > > > -- on the order of 10000 and 3 -- on the order of 100000.
> > >
> > > Do you have a theory on the mechanism of failure here? I'm guessing
> > > it's something like this: there's a thread that goes unseen in the
> > > first round, and during the second round, it creates a new thread and
> > > exits itself. The exit gets seen (again, it doesn't show up in the
> > > dirents) but the new thread it created still doesn't. Is that right?
> > >
> > > In any case, it looks like the whole mechanism we're using is
> > > unreliable, so something needs to be done. My leaning is to go with
> > > the global thread list and atomicity of list-unlock with exit.
> >
> > yes that sounds possible, i added some instrumentation to musl
> > and the trace shows situations like that before the deadlock,
> > exiting threads can even cause old (previously seen) entries to
> > disappear from the dir.
> >
> Thanks for the thorough instrumentation! Your traces confirm both my theory
> about the deadlock and the unreliability of /proc/self/task.
>
> I'd also done a very light instrumentation just before I got your email, but
> it took me a while to understand the output I got (see below).

the attached patch fixes the issue on my machine.
i don't know if this is just luck.

the assumption is that if /proc/self/task is read twice such that
all tids in it seem to be active and caught, then all the active
threads of the process are caught (no new threads that are already
started but not yet visible there).
(a standalone sketch of this rule is appended after the patch.)

> Now, about the strange output I mentioned. Consider one of the above
> fragments:
> --iter: 4
> exit 15977
> retry 0
> tid 15977
> tid 15978
> exit 15978
> retry 1
> tid 15978
> tgkill: ESRCH
> mismatch: tid 15979: 0 != 23517
>
> Note that "tid 15978" is printed two times. Recall that it's printed only if
> we haven't seen it in the chain. But how is it possible that we haven't seen
> it *two* times? Since we waited on the futex the first time and we got the
> lock, the signal handler must have unlocked it. There is even a comment
> before the futex() call:
>
> /* Obtaining the lock means the thread responded. ESRCH
>  * means the target thread exited, which is okay too. */
>
> If the signal handler reached the futex unlock code, it must have updated the
> chain, and we must see the tid in the chain on the next retry and not
> print it.
>
> Apparently, there is another reason for futex(FUTEX_LOCK_PI) success: the
> owner is exiting concurrently (which is also indicated by the subsequent
> failure of tgkill with ESRCH). So obtaining the lock doesn't necessarily
> mean that the owner responded: it may also mean that the owner is (about to
> be?) dead.

so tgkill succeeds but the target exits before handling the signal.
i'd expect ESRCH then, not success, from the futex. interesting.
(a guess at the mechanism is sketched after the patch.)

anyway i had to retry until there were no exiting threads in the dir
to reliably fix the deadlock.

Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="0001-more-robust-synccall.patch"

From ed101ece64b645865779293eb48109cad03e9c35 Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy
Date: Sat, 9 Feb 2019 21:13:35 +0000
Subject: [PATCH] more robust synccall

---
 src/thread/synccall.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/src/thread/synccall.c b/src/thread/synccall.c
index cc66bd24..7f275114 100644
--- a/src/thread/synccall.c
+++ b/src/thread/synccall.c
@@ -102,6 +102,7 @@ void __synccall(void (*func)(void *), void *ctx)
 
 	/* Loop scanning the kernel-provided thread list until it shows no
 	 * threads that have not already replied to the signal. */
+	int all_threads_caught = 0;
 	for (;;) {
 		int miss_cnt = 0;
 		while ((de = readdir(&dir))) {
@@ -120,6 +121,7 @@ void __synccall(void (*func)(void *), void *ctx)
 			for (cp = head; cp && cp->tid != tid; cp=cp->next);
 			if (cp) continue;
 
+			miss_cnt++;
 			r = -__syscall(SYS_tgkill, pid, tid, SIGSYNCCALL);
 
 			/* Target thread exit is a success condition. */
@@ -142,10 +144,16 @@ void __synccall(void (*func)(void *), void *ctx)
 			/* Obtaining the lock means the thread responded. ESRCH
 			 * means the target thread exited, which is okay too. */
 			if (!r || r == ESRCH) continue;
-
-			miss_cnt++;
 		}
-		if (!miss_cnt) break;
+		if (miss_cnt)
+			all_threads_caught = 0;
+		else
+			all_threads_caught++;
+		/* when all visible threads are stopped there may be newly
+		 * created threads that are not in dir yet, so only assume
+		 * we are done when we see no running threads twice. */
+		if (all_threads_caught > 1)
+			break;
 		rewinddir(&dir);
 	}
 	close(dir.fd);
-- 
2.19.1
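to make the retry rule concrete, here is a minimal standalone sketch of
the same "two clean passes" idea. thread_is_caught() and signal_thread()
are made-up placeholders for the chain lookup and the tgkill/futex
handshake; the real change is the patch above, this is only an
illustration.

#include <ctype.h>
#include <dirent.h>
#include <stdlib.h>

int thread_is_caught(int tid);  /* hypothetical: has this tid replied? */
void signal_thread(int tid);    /* hypothetical: tgkill + wait on futex */

/* rescan /proc/self/task until two consecutive passes see no uncaught tid */
void catch_all_threads(int self)
{
	DIR *dir = opendir("/proc/self/task");
	int clean_passes = 0;
	if (!dir) return;
	while (clean_passes < 2) {
		struct dirent *de;
		int missed = 0;
		while ((de = readdir(dir))) {
			if (!isdigit((unsigned char)de->d_name[0])) continue;
			int tid = atoi(de->d_name);
			if (tid == self || thread_is_caught(tid)) continue;
			missed++;
			signal_thread(tid);
		}
		/* a pass that found an uncaught thread resets the count */
		clean_passes = missed ? 0 : clean_passes + 1;
		rewinddir(dir);
	}
	closedir(dir);
}

requiring two consecutive clean passes is the same condition as the
all_threads_caught > 1 check in the patch.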
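about the unexpected futex success: my guess (not verified against the
kernel source) is that this is the PI futex owner-died path, where the
kernel hands the lock to the top waiter and sets the FUTEX_OWNER_DIED
bit in the futex word when the owner exits. if that is right, a caller
could in principle tell "handler responded" from "owner died" roughly as
below; target_lock is a made-up name for the per-target futex word.

#include <errno.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static int target_lock;  /* set to the target tid before signalling it */

static int wait_for_target(void)
{
	/* block until the lock word is released or its owner goes away */
	long r = syscall(SYS_futex, &target_lock, FUTEX_LOCK_PI, 0, 0, 0, 0);
	if (r == -1 && errno == ESRCH)
		return -1;  /* owner tid no longer exists */
	if (__atomic_load_n(&target_lock, __ATOMIC_RELAXED) & FUTEX_OWNER_DIED)
		return -1;  /* we got the lock because the owner exited */
	return 0;           /* the handler really ran and unlocked the word */
}

either way, a successful FUTEX_LOCK_PI alone cannot be taken as proof
that the signal handler ran, which matches the observed trace.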