From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/13544 Path: news.gmane.org!.POSTED!not-for-mail From: Rich Felker Newsgroups: gmane.linux.lib.musl.general Subject: Re: sem_wait and EINTR Date: Sat, 8 Dec 2018 21:51:40 -0500 Message-ID: <20181209025140.GL23599@brightrain.aerifal.cx> References: <20181205191605.72492698@orivej.orivej.org> <20181205194759.GA32233@voyager> <20181205212716.sx6ra2xqhuei735q@core.my.home> <20181205215826.GX23599@brightrain.aerifal.cx> <20181206024340.202e0fc4@orivej.orivej.org> <20181206031756.GZ23599@brightrain.aerifal.cx> <20181206155756.GB32233@voyager> <20181206162336.GB23599@brightrain.aerifal.cx> <20181206170359.GC32233@voyager> <20181206173337.GD32233@voyager> Reply-To: musl@lists.openwall.com NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Trace: blaine.gmane.org 1544323792 14329 195.159.176.226 (9 Dec 2018 02:49:52 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Sun, 9 Dec 2018 02:49:52 +0000 (UTC) User-Agent: Mutt/1.5.21 (2010-09-15) To: musl@lists.openwall.com Original-X-From: musl-return-13560-gllmg-musl=m.gmane.org@lists.openwall.com Sun Dec 09 03:49:48 2018 Return-path: Envelope-to: gllmg-musl@m.gmane.org Original-Received: from mother.openwall.net ([195.42.179.200]) by blaine.gmane.org with smtp (Exim 4.84_2) (envelope-from ) id 1gVpAA-0003bR-9E for gllmg-musl@m.gmane.org; Sun, 09 Dec 2018 03:49:46 +0100 Original-Received: (qmail 32731 invoked by uid 550); 9 Dec 2018 02:51:54 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Original-Received: (qmail 32707 invoked from network); 9 Dec 2018 02:51:54 -0000 Content-Disposition: inline In-Reply-To: <20181206173337.GD32233@voyager> Original-Sender: Rich Felker Xref: news.gmane.org gmane.linux.lib.musl.general:13544 Archived-At: On Thu, Dec 06, 2018 at 06:33:37PM +0100, Markus Wichmann wrote: > Hi all, > > is the attached patch acceptable? A word about the bitfields: I > generally dislike them for most things, but I didn't want to destroy the > alignment struct __libc had going on, and these other flags really only > are 0 or 1. > > Patch is untested for want of an old kernel. > > Ciao, > Markus > >From 816abf54fbbb02923331b69b62333b8a0edb4181 Mon Sep 17 00:00:00 2001 > From: Markus Wichmann > Date: Thu, 6 Dec 2018 18:30:26 +0100 > Subject: [PATCH 3/3] Add workaround for old linux bug. > > --- > src/internal/libc.h | 4 +--- > src/signal/sigaction.c | 2 ++ > src/thread/sem_timedwait.c | 5 ++++- > 3 files changed, 7 insertions(+), 4 deletions(-) > > diff --git a/src/internal/libc.h b/src/internal/libc.h > index ac97dc7e..2e6737d9 100644 > --- a/src/internal/libc.h > +++ b/src/internal/libc.h > @@ -18,9 +18,7 @@ struct tls_module { > }; > > struct __libc { > - int can_do_threads; > - int threaded; > - int secure; > + unsigned can_do_threads:1, threaded:1, secure:1, handling_sigs:1; > volatile int threads_minus_1; > size_t *auxv; > struct tls_module *tls_head; > diff --git a/src/signal/sigaction.c b/src/signal/sigaction.c > index af47195e..5f02b99b 100644 > --- a/src/signal/sigaction.c > +++ b/src/signal/sigaction.c > @@ -43,6 +43,8 @@ int __libc_sigaction(int sig, const struct sigaction *restrict sa, struct sigact > SIGPT_SET, 0, _NSIG/8); > unmask_done = 1; > } > + if (!(sa->sa_flags & SA_RESTART)) > + libc.handling_sigs = 1; > } > /* Changing the disposition of SIGABRT to anything but > * SIG_DFL requires a lock, so that it cannot be changed > diff --git a/src/thread/sem_timedwait.c b/src/thread/sem_timedwait.c > index 8132eb1b..e76ae9de 100644 > --- a/src/thread/sem_timedwait.c > +++ b/src/thread/sem_timedwait.c > @@ -22,7 +22,10 @@ int sem_timedwait(sem_t *restrict sem, const struct timespec *restrict at) > pthread_cleanup_push(cleanup, (void *)(sem->__val+1)); > r = __timedwait_cp(sem->__val, -1, CLOCK_REALTIME, at, sem->__val[2]); > pthread_cleanup_pop(1); > - if (r && r != EINTR) { > + /* Linux pre-2.6.22 bug: Sometimes SYS_futex returns with EINTR when it should not. > + * Workaround: Retry on EINTR unless someone installed handlers before. > + */ > + if (r && (r != EINTR || libc.handling_sigs)) { > errno = r; > return -1; > } > -- > 2.19.1 > Conceptually the logic looks correct, but I'm trying to phase out the __libc structure, and more importantly, introducing a bitfield here is not safe unless access to all bitfield members is controlled by a common mutex or other synchronization mechanism. There is a data race if any access to the other fields is accessed when sigaction is callable from another thread, and the impact of a data race on any of these is critical. Unfortunately, even without the bitfield, there is a data race between access to the new flag from sem_timedwait and from sigaction. In theory, this potentially results in a race window where sem_wait retries on EINTR because it doesn't yet see the updated handling_sigs. In reality, I don't think that happens -- at least not on kernels without the bug -- because the store to handling_sigs is sequenced before sigaction, which is sequenced before the EINTR-generating signal delivery, which is sequenced before the load in sig_timedwait, with memory barriers (necessarily from an implementation standpoint) imposed by whatever kernel mechanism delivers the signal. However there is still a race between concurrent/unsequenced sigaction calls. The "right" fix would be to use an AS-safe lock to protect the handling_sigs object, or make it atomic (a_store on the sigaction side, a_barrier before load on the sem side). Alternatively we could assume the above reasoning about sequencing of sigaction before EINTR and just use a lock on the sigaction side, but the atomic is probably simplest and "safer". A couple other small nits: comments especially should not exceed 80 columns (preferably <75 or so) assuming tab width 8; code should avoid it too but I see it slightly does in a few places there. Spaces should not be used for indention. And the commit message should reflect the change made; it's not adding a workaround, but reducing the scope of a previous workaround (which should be cited by commit id) to fix a conformance bug. The name "handling_sigs" is also a bit misleading; what it's indicating is that interrupting signal handlers are or have been present. I'm happy to fix up these issues and commit if there aren't mistakes in the above comments that still need to be addressed. Rich