From: Rich Felker
To: musl@lists.openwall.com
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: sem_wait and EINTR
Date: Thu, 6 Dec 2018 11:23:36 -0500
Message-ID: <20181206162336.GB23599@brightrain.aerifal.cx>
In-Reply-To: <20181206155756.GB32233@voyager>

On Thu, Dec 06, 2018 at 04:57:56PM +0100, Markus Wichmann wrote:
> On Wed, Dec 05, 2018 at 10:17:56PM -0500, Rich Felker wrote:
> > I'd like it if we could avoid the pre-linux-2.6.22 bug of spurious
> > EINTR from SYS_futex, but I don't see any way to do so except possibly
> > wrapping all signal handlers and implementing restart-vs-EINTR
> > ourselves. So if we need to change this, it might just be a case where
> > we say "well, sorry, your kernel is broken" if someone is using a
> > broken kernel.
> >
> > Thoughts?
> >
> > Rich
>
> I really don't know what you are getting at, here. In the hypothetical
> case you detected an EINTR return without a signal having been handled,
> you could just retry the syscall. The problem is getting that
> information in the first place.

See the commit c0ed5a201b2bdb6d1896064bec0020c9973db0a1 which
introduced the EINTR suppression, deliberately:

    per POSIX, the EINTR condition is an optional error for these
    functions, not a mandatory one. since old kernels (pre-2.6.22) failed
    to honor SA_RESTART for the futex syscall, it's dangerous to trust
    EINTR from the kernel. thankfully POSIX offers an easy way out.

(Ignore the apparently wrong claim about POSIX.) The concern is that
perfectly correct programs can use sem_wait without a retry loop if
they do not install interrupting signal handlers (and most programs
refrain from doing that, because it's awful). However, if run on an
old kernel (<2.6.22), these correct programs would wrongly make
forward progress without finishing the sem_wait.

One ugly hack that might be worth doing is simply tracking whether any
signal handler has been installed without SA_RESTART, and keeping the
retry-on-EINTR logic if not. Retrying under such conditions could not
break conformance and would preserve safety on old kernels for
programs which don't use interrupting signals at all. It would not
preserve the safety of *all possible* programs on such kernels, since
a program could install interrupting signal handlers but leave the
corresponding signals blocked in all threads that use sem_wait, but I
suspect that's a much less likely scenario.

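A minimal sketch of that hack, layered on the public API rather than
written inside libc. The names tracked_sigaction, tracked_sem_wait,
interrupting_handler_seen and on_usr1 are hypothetical and exist only
for illustration; none of this is existing musl code, and a real
change would presumably live in the internal sigaction and wait paths
instead:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>

/* Hypothetical flag: becomes 1 once any handler is installed without
 * SA_RESTART. It is never cleared, which errs on the side of
 * reporting EINTR to the caller. */
static atomic_int interrupting_handler_seen;

/* Hypothetical stand-in for the sigaction entry point. */
int tracked_sigaction(int sig, const struct sigaction *sa,
                      struct sigaction *old)
{
    /* SIG_DFL and SIG_IGN are not handlers, so they do not count. */
    if (sa && !(sa->sa_flags & SA_RESTART)
        && sa->sa_handler != SIG_DFL && sa->sa_handler != SIG_IGN) {
        /* Set the flag before the handler can possibly run. */
        atomic_store(&interrupting_handler_seen, 1);
    }
    return sigaction(sig, sa, old);
}

/* Retry on EINTR only while no interrupting handler has ever been
 * installed; in that state an EINTR can only be spurious. */
int tracked_sem_wait(sem_t *sem)
{
    int r;
    while ((r = sem_wait(sem)) != 0 && errno == EINTR
           && !atomic_load(&interrupting_handler_seen))
        ; /* spurious EINTR, wait again */
    return r;
}

/* Example handler used only to demonstrate the tracking. */
void on_usr1(int sig)
{
    (void)sig;
}
```

Installing on_usr1 through tracked_sigaction with SA_RESTART leaves
the flag clear, so tracked_sem_wait keeps retrying. Installing it
with sa_flags set to 0 sets the flag, and from then on EINTR is
passed through to the caller. As noted above, this does not cover a
program that installs interrupting handlers but keeps the signals
blocked in the threads calling sem_wait.
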

> Practically, I see a lot of work for little gain. Wrapping all signal
> handlers means we need to save up to _NSIG function pointers. Access to
> those doesn't need serialization any more than sigaction() does. Though,
> what does it mean if someone changes the signal handler while we are in
> the wrapper?

This is not an actual proposal at this time (although the need has
been considered for other reasons at various times, which is why I'm
familiar with the concept). It was just a statement that I don't think
the problem can be worked around without such an extreme measure.

> Speaking of calls that shouldn't fail but do: Is futex_wake() affected
> by the same bug?

It shouldn't be because it shouldn't enter any interruptible sleep.

Rich