From: Alexey Izbyshev
To: Rich Felker, musl@lists.openwall.com
Subject: Re: __synccall: deadlock and reliance on racy /proc/self/task
Date: Fri, 8 Feb 2019 21:14:48 +0300

On 2/7/19 9:36 PM, Rich Felker wrote:
>> For some reason __synccall accesses the list without a barrier (line
>> 120), though I don't see why one wouldn't be necessary for correct
>> observability of head->next. However, I'm testing on x86_64, so
>> acquire/release semantics work without barriers.
>
> The formal intent in musl is that all a_* are full seq_cst barriers.
> On x86[_64] this used to not be the case; we just used a normal store,
> but that turned out to be broken because in some places (and
> apparently here in __synccall) there was code that depended on a_store
> having acquire semantics too. See commits
> 3c43c0761e1725fd5f89a9c028cbf43250abb913 and
> 5a9c8c05a5a0cdced4122589184fd795b761bb4a.
>
> If not for this fix, I could see this being related (but again, it
> should see it after timeout anyway). But since the barrier is there
> now, it shouldn't happen.

Thanks for the explanation about a_store(); I didn't know that it has
seq_cst semantics. However, I was talking about a barrier between the
load of head and the subsequent loads of cp->tid and cp->next:

for (cp = head; cp && cp->tid != tid; cp=cp->next);

In my understanding, we need at least consume semantics to observe
correct values of tid and next after we load head. Leaving Alpha aside,
this probably works without barriers on most current architectures, but
I don't know what policy musl has for such cases.
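To make what I mean concrete, here is a minimal C11-style sketch of the
ordering I have in mind (illustrative only: musl's real code uses its
own a_* primitives rather than <stdatomic.h>, and the names here are
made up):

#include <stdatomic.h>

struct chain {
    int tid;
    struct chain *next;
};

/* Hypothetical atomic list head, published by the writer with a
 * release store after the node's fields are initialized. */
static _Atomic(struct chain *) head;

static struct chain *find_tid(int tid)
{
    /* The acquire load is what guarantees that tid and next of the
     * published node are seen initialized by the traversal below. */
    struct chain *cp = atomic_load_explicit(&head, memory_order_acquire);
    while (cp && cp->tid != tid)
        cp = cp->next;
    return cp;
}

Consume would be sufficient in principle, since the later loads are
data-dependent on head, but compilers currently promote consume to
acquire anyway; and on x86 a plain load already has the required
ordering, which is why my testing doesn't exercise this.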
>> Of course, the larger problem remains: if we may miss some threads
>> because of /proc, we may fail to call the setuid() syscall in those
>> threads. And that indeed easily happens in my second test
>> (attached: test-setuid-mismatch.c; expected to be run as a suid
>> binary; note that I tested both with and without "presignalling").
>
> Does it work if we force two iterations of the readdir loop with no
> tasks missed, rather than just one, to catch the case of missed
> concurrent additions? I'm not sure. But all this makes me really
> uncomfortable with the current approach.

I've tested with 0, 1, 2 and 3 retries of the main loop if miss_cnt ==
0. The test eventually failed in all cases: with 0 retries it took only
a handful of iterations, with 1 on the order of 100, with 2 on the
order of 10000, and with 3 on the order of 100000.

>> Both tests run on glibc (2.27) without any problem.
>
> I think glibc has a different problem: there's a window at thread exit
> where setxid can return success without having run the id change in
> the exiting thread. In this case, assuming an attacker has code
> execution in the process after dropping root, they can mmap malicious
> code over top of the thread exit code and obtain code execution as
> root.

Yes, it appears to be true: glibc won't signal a thread if it's marked
as exiting [1, 2].

[1] https://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/allocatestack.c;h=d8e8570a7d9b9622309555b03cc98b3dd22e11c9;hb=HEAD#l1035
[2] https://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/allocatestack.c;h=d8e8570a7d9b9622309555b03cc98b3dd22e11c9;hb=HEAD#l1075

Alexey
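P.S. For concreteness, one way such a "retry until N extra clean
passes" scan over /proc/self/task could look is sketched below
(illustrative only, not the actual patch or musl's code; seen()/mark()
are placeholders for the bookkeeping of already-signalled tids):

#include <dirent.h>
#include <stdlib.h>

/* Rescan /proc/self/task until `extra` additional passes in a row find
 * no tid we have not handled yet. extra == 0 corresponds to stopping
 * after the first pass with no misses. */
static int scan_tasks(int extra, int (*seen)(int), void (*mark)(int))
{
    int clean = 0;
    while (clean <= extra) {
        DIR *d = opendir("/proc/self/task");
        if (!d) return -1;
        int missed = 0;
        struct dirent *de;
        while ((de = readdir(d))) {
            if (de->d_name[0] == '.') continue;
            int tid = atoi(de->d_name);
            if (!seen(tid)) {
                mark(tid);  /* this is where the tid would be signalled */
                missed = 1;
            }
        }
        closedir(d);
        clean = missed ? 0 : clean + 1;
    }
    return 0;
}

Even with extra passes this only narrows the window: a thread can still
be created concurrently after the last pass, which is consistent with
more retries merely pushing the failure out to more test iterations.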