From: Rich Felker
To: musl@lists.openwall.com
Reply-To: musl@lists.openwall.com
Subject: fork, set*id(synccall), cancellation -- nasty interaction
Date: Fri, 26 Apr 2013 13:27:22 -0400
Message-ID: <20130426172722.GA21854@brightrain.aerifal.cx>

I've run across a nasty set of race conditions I'm trying to solve:

1. __synccall, needed for multi-threaded set*id, needs to obtain a
lock that prevents thread creation and other things. However, setuid
and setgid are specified to be async-signal-safe. They will presently
hang if called from a signal handler that interrupted pthread_create
or several other functions.

2. When forking, the calling thread's pid/tid change, and there is a
window of time during which the child process has the wrong pid/tid in
its thread descriptor. Considering that the next version of POSIX will
remove fork from the list of async-signal-safe functions, and glibc's
fork is already non-async-signal-safe, it seems that we could just go
ahead and consider it unsafe. There are only two async-signal-safe
code paths that care about the pid/tid in the thread descriptor:
__synccall (called from setuid/setgid) and the cancellation signal
handler (called asynchronously). We could prevent these from ever
seeing wrong values in the thread descriptor by blocking signals
before making the fork syscall and restoring them afterwards.

This is probably the correct solution. However, there is still at
least one other problem:

3. If thread B forks while thread A is in the middle of starting a
__synccall operation, pthread_create, or anything else dealing with
the lock mentioned in point 1 above, the child process will inherit an
inconsistent state. Normally this would not be a big deal since
multi-threaded programs are forbidden from doing anything
async-signal-unsafe after forking. However, setuid and setgid are
specified to be async-signal-safe, but cannot handle the inconsistent
state. The case where they're called after fork returns in the child
is not a problem, since libc.threads_minus_1 will be 0 and the
__synccall logic will not be used, but if fork was called from a
signal handler that interrupted setuid() after it was determined that
__synccall is needed but before __synccall actually began, we still
hit the trouble case. Perhaps this can be solved though: if __synccall
first blocks all signals except its own internal signal (blocking its
own before taking the lock would lead to deadlock) then tests
libc.threads_minus_1 before proceeding, it could determine that the
process has switched to single-threaded mode and avoid the whole
__synccall logic. Due to signals being blocked, there is no way fork
could be called in this window.

The hardest problem seems to be #1, and the only immediate solution I
see is just making the operation fail if the lock is not available.
This technically makes it async-signal-safe, but it's an undesirable
failure case...

Anyway, I'm mainly posting this to have a good record of the issues,
but I would also be happy to hear any ideas towards fixing them.

Rich