mailing list of musl libc
 help / color / mirror / code / Atom feed
* sem_wait and EINTR
@ 2018-12-05 19:16 Orivej Desh
  2018-12-05 19:47 ` Markus Wichmann
  2018-12-05 22:03 ` Rich Felker
  0 siblings, 2 replies; 19+ messages in thread
From: Orivej Desh @ 2018-12-05 19:16 UTC (permalink / raw)
  To: musl

Hi,

musl differs from glibc in that it does not return from sem_wait() on EINTR.
This mail [1] explains that this is useful to safeguard the software that does
not check sem_wait() return code. However, since glibc does return EINTR, such
bugs in the open source software seem to be eventually noticed and fixed.

The musl behaviour has a disadvantage in that it makes sem_wait() difficult to
interrupt (and delays the return from sem_timedwait() until the timeout), which
is relied upon in particular by multithreaded fuse for breaking out of the
main thread waiting loop [2]. IMHO the fuse implementation is sensible, since it
looks better than the alternatives I could imagine, and I'm inclined to patch
musl like this [3] to meet its expectations.

Am I missing some implications? Would you reconsider returning from sem_wait()
on EINTR? Could you suggest a good fix for fuse that does not change musl?

[1] https://www.openwall.com/lists/musl/2018/02/24/3
[2] https://github.com/libfuse/libfuse/blob/fuse-3.3.0/lib/fuse_loop_mt.c#L332
[3] https://github.com/orivej/musl/commit/c4c38aaab4fc55c23669f7b81386b615609cc3e1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: sem_wait and EINTR
  2018-12-05 19:16 sem_wait and EINTR Orivej Desh
@ 2018-12-05 19:47 ` Markus Wichmann
  2018-12-05 21:27   ` Ondřej Jirman
  2018-12-05 22:03 ` Rich Felker
  1 sibling, 1 reply; 19+ messages in thread
From: Markus Wichmann @ 2018-12-05 19:47 UTC (permalink / raw)
  To: musl

On Wed, Dec 05, 2018 at 07:16:05PM +0000, Orivej Desh wrote:
> Hi,
> 
> musl differs from glibc in that it does not return from sem_wait() on EINTR.
> This mail [1] explains that this is useful to safeguard the software that does
> not check sem_wait() return code. However, since glibc does return EINTR, such
> bugs in the open source software seem to be eventually noticed and fixed.
> 
> The musl behaviour has a disadvantage in that it makes sem_wait() difficult to
> interrupt (and delays the return from sem_timedwait() until the timeout), which
> is relied upon in particular by multithreaded fuse for breaking out of the
> main thread waiting loop [2]. IMHO the fuse implementation is sensible, since it
> looks better than the alternatives I could imagine, and I'm inclined to patch
> musl like this [3] to meet its expectations.
> 
> Am I missing some implications? Would you reconsider returning from sem_wait()
> on EINTR? Could you suggest a good fix for fuse that does not change musl?
> 
> [1] https://www.openwall.com/lists/musl/2018/02/24/3
> [2] https://github.com/libfuse/libfuse/blob/fuse-3.3.0/lib/fuse_loop_mt.c#L332
> [3] https://github.com/orivej/musl/commit/c4c38aaab4fc55c23669f7b81386b615609cc3e1

I wanted to suggest a reworking of libfuse to instead of waiting on a
semaphore maybe just wait on the actual thread. Then I read the source
of pthread_join() and noticed that it, too, would hang itself in a loop
it can't break out of due to EINTR.

Maybe the simplest solution would be to simply tell libfuse users to
call fuse_session_exit() from the SIGINT handler if they want this
behavior to be portable. If fuse_session_exit() is not
async-signal-safe, then handle SIGINT in another thread using
pthread_sigmask() and sigwaitinfo().

In any case, libfuse is relying on behavior not guarenteed by the
interface. The fact that a certain implementation of the interface
happens to provide that behavior is irrelevant.

On a practical note, I certainly never expected sem_wait() to be capable
of failing due to errors other than bad programming before. Coding that
in would make even simple things like the consumer-producer example by
Dijkstra look horrible!

Ciao,
Markus


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: sem_wait and EINTR
  2018-12-05 19:47 ` Markus Wichmann
@ 2018-12-05 21:27   ` Ondřej Jirman
  2018-12-05 21:58     ` Rich Felker
  0 siblings, 1 reply; 19+ messages in thread
From: Ondřej Jirman @ 2018-12-05 21:27 UTC (permalink / raw)
  To: musl

On Wed, Dec 05, 2018 at 08:47:59PM +0100, Markus Wichmann wrote:
> On Wed, Dec 05, 2018 at 07:16:05PM +0000, Orivej Desh wrote:
> > Hi,
> > 
> > musl differs from glibc in that it does not return from sem_wait() on EINTR.
> > This mail [1] explains that this is useful to safeguard the software that does
> > not check sem_wait() return code. However, since glibc does return EINTR, such
> > bugs in the open source software seem to be eventually noticed and fixed.
> > 
> > The musl behaviour has a disadvantage in that it makes sem_wait() difficult to
> > interrupt (and delays the return from sem_timedwait() until the timeout), which
> > is relied upon in particular by multithreaded fuse for breaking out of the
> > main thread waiting loop [2]. IMHO the fuse implementation is sensible, since it
> > looks better than the alternatives I could imagine, and I'm inclined to patch
> > musl like this [3] to meet its expectations.
> > 
> > Am I missing some implications? Would you reconsider returning from sem_wait()
> > on EINTR? Could you suggest a good fix for fuse that does not change musl?
> > 
> > [1] https://www.openwall.com/lists/musl/2018/02/24/3
> > [2] https://github.com/libfuse/libfuse/blob/fuse-3.3.0/lib/fuse_loop_mt.c#L332
> > [3] https://github.com/orivej/musl/commit/c4c38aaab4fc55c23669f7b81386b615609cc3e1
> 
> I wanted to suggest a reworking of libfuse to instead of waiting on a
> semaphore maybe just wait on the actual thread. Then I read the source
> of pthread_join() and noticed that it, too, would hang itself in a loop
> it can't break out of due to EINTR.
> 
> Maybe the simplest solution would be to simply tell libfuse users to
> call fuse_session_exit() from the SIGINT handler if they want this
> behavior to be portable. If fuse_session_exit() is not
> async-signal-safe, then handle SIGINT in another thread using
> pthread_sigmask() and sigwaitinfo().
> 
> In any case, libfuse is relying on behavior not guarenteed by the
> interface. The fact that a certain implementation of the interface
> happens to provide that behavior is irrelevant.

Hello,

It's specified by POSIX:

https://pubs.opengroup.org/onlinepubs/9699919799/functions/sem_wait.html

Sates: "The sem_wait() function is interruptible by the delivery of a signal."

> On a practical note, I certainly never expected sem_wait() to be capable
> of failing due to errors other than bad programming before. Coding that
> in would make even simple things like the consumer-producer example by
> Dijkstra look horrible!

There's a difference between pedagogy and production code.

Just wanted to share that I was bitten myself by blindly retrying on EINTR instead
of handling it correctly for the particular program.

For example: When using signalfd, it is critical to be handling EINTR correctly
by getting back to the main loop so that signal handler can be executed (as it
is not executed asynchronously). I certainly would not expect the C library to be
eating EINTRs.

regards,
  o.j.

> Ciao,
> Markus


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: sem_wait and EINTR
  2018-12-05 21:27   ` Ondřej Jirman
@ 2018-12-05 21:58     ` Rich Felker
  2018-12-06  2:43       ` Orivej Desh
  0 siblings, 1 reply; 19+ messages in thread
From: Rich Felker @ 2018-12-05 21:58 UTC (permalink / raw)
  To: musl

On Wed, Dec 05, 2018 at 10:27:16PM +0100, Ondřej Jirman wrote:
> On Wed, Dec 05, 2018 at 08:47:59PM +0100, Markus Wichmann wrote:
> > On Wed, Dec 05, 2018 at 07:16:05PM +0000, Orivej Desh wrote:
> > > Hi,
> > > 
> > > musl differs from glibc in that it does not return from sem_wait() on EINTR.
> > > This mail [1] explains that this is useful to safeguard the software that does
> > > not check sem_wait() return code. However, since glibc does return EINTR, such
> > > bugs in the open source software seem to be eventually noticed and fixed.
> > > 
> > > The musl behaviour has a disadvantage in that it makes sem_wait() difficult to
> > > interrupt (and delays the return from sem_timedwait() until the timeout), which
> > > is relied upon in particular by multithreaded fuse for breaking out of the
> > > main thread waiting loop [2]. IMHO the fuse implementation is sensible, since it
> > > looks better than the alternatives I could imagine, and I'm inclined to patch
> > > musl like this [3] to meet its expectations.
> > > 
> > > Am I missing some implications? Would you reconsider returning from sem_wait()
> > > on EINTR? Could you suggest a good fix for fuse that does not change musl?
> > > 
> > > [1] https://www.openwall.com/lists/musl/2018/02/24/3
> > > [2] https://github.com/libfuse/libfuse/blob/fuse-3.3.0/lib/fuse_loop_mt.c#L332
> > > [3] https://github.com/orivej/musl/commit/c4c38aaab4fc55c23669f7b81386b615609cc3e1
> > 
> > I wanted to suggest a reworking of libfuse to instead of waiting on a
> > semaphore maybe just wait on the actual thread. Then I read the source
> > of pthread_join() and noticed that it, too, would hang itself in a loop
> > it can't break out of due to EINTR.
> > 
> > Maybe the simplest solution would be to simply tell libfuse users to
> > call fuse_session_exit() from the SIGINT handler if they want this
> > behavior to be portable. If fuse_session_exit() is not
> > async-signal-safe, then handle SIGINT in another thread using
> > pthread_sigmask() and sigwaitinfo().
> > 
> > In any case, libfuse is relying on behavior not guarenteed by the
> > interface. The fact that a certain implementation of the interface
> > happens to provide that behavior is irrelevant.
> 
> Hello,
> 
> It's specified by POSIX:
> 
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/sem_wait.html
> 
> Sates: "The sem_wait() function is interruptible by the delivery of a signal."

This seems contradictory with EINTR being a "may fail" error, and, if
interpreted the way you want to interpret it, seems to be
contradictory with SA_RESTART semantics, since it doesn't say anything
about whether that signal is an interrupting one. I think we should
attempt to obtain a clarification on what the intent is here. Does "is
interruptible" mean that it needs to fail on signals (only without
SA_RESTART?) or simply that signal handlers must be permitted to run
(i.e. the wait can't happen with signals blocked)?

> > On a practical note, I certainly never expected sem_wait() to be capable
> > of failing due to errors other than bad programming before. Coding that
> > in would make even simple things like the consumer-producer example by
> > Dijkstra look horrible!
> 
> There's a difference between pedagogy and production code.
> 
> Just wanted to share that I was bitten myself by blindly retrying on EINTR instead
> of handling it correctly for the particular program.
> 
> For example: When using signalfd, it is critical to be handling EINTR correctly
> by getting back to the main loop so that signal handler can be executed (as it
> is not executed asynchronously). I certainly would not expect the C library to be
> eating EINTRs.

Most attempts to use EINTR (any without backoff-and-resignal) are
subject to race conditions, so it's hard to expect that you can do
thing kind of thing safely. Also, the state of stdio FILEs after EINTR
does not seem to be predictable/recoverable.

With that said, it is my intent to implement the specified
requirements as best we can, even if I disagree over whether they're
useful. So let's see if we can get answers on what's supposed to
happen.

Rich


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: sem_wait and EINTR
  2018-12-05 19:16 sem_wait and EINTR Orivej Desh
  2018-12-05 19:47 ` Markus Wichmann
@ 2018-12-05 22:03 ` Rich Felker
  2018-12-06  2:43   ` Orivej Desh
  1 sibling, 1 reply; 19+ messages in thread
From: Rich Felker @ 2018-12-05 22:03 UTC (permalink / raw)
  To: musl

On Wed, Dec 05, 2018 at 07:16:05PM +0000, Orivej Desh wrote:
> Hi,
> 
> musl differs from glibc in that it does not return from sem_wait() on EINTR.
> This mail [1] explains that this is useful to safeguard the software that does
> not check sem_wait() return code. However, since glibc does return EINTR, such
> bugs in the open source software seem to be eventually noticed and fixed.
> 
> The musl behaviour has a disadvantage in that it makes sem_wait() difficult to
> interrupt (and delays the return from sem_timedwait() until the timeout), which
> is relied upon in particular by multithreaded fuse for breaking out of the
> main thread waiting loop [2]. IMHO the fuse implementation is sensible, since it
> looks better than the alternatives I could imagine, and I'm inclined to patch
> musl like this [3] to meet its expectations.
> 
> Am I missing some implications? Would you reconsider returning from sem_wait()
> on EINTR? Could you suggest a good fix for fuse that does not change musl?
> 
> [1] https://www.openwall.com/lists/musl/2018/02/24/3
> [2] https://github.com/libfuse/libfuse/blob/fuse-3.3.0/lib/fuse_loop_mt.c#L332
> [3] https://github.com/orivej/musl/commit/c4c38aaab4fc55c23669f7b81386b615609cc3e1
> 
> diff --git a/src/thread/sem_timedwait.c b/src/thread/sem_timedwait.c
> index 8132eb1b..58d3ebfe 100644
> --- a/src/thread/sem_timedwait.c
> +++ b/src/thread/sem_timedwait.c
> @@ -22,7 +22,7 @@ int sem_timedwait(sem_t *restrict sem, const struct timespec *restrict at)
>  		pthread_cleanup_push(cleanup, (void *)(sem->__val+1));
>  		r = __timedwait_cp(sem->__val, -1, CLOCK_REALTIME, at, sem->__val[2]);
>  		pthread_cleanup_pop(1);
> -		if (r && r != EINTR) {
> +		if (r) {
>  			errno = r;
>  			return -1;
>  		}
> diff --git a/src/thread/synccall.c b/src/thread/synccall.c
> index cc66bd24..d9ab40cb 100644
> --- a/src/thread/synccall.c
> +++ b/src/thread/synccall.c
> @@ -37,10 +37,10 @@ static void handler(int sig)
>  	if (a_cas(&target_tid, ch.tid, 0) == (ch.tid | 0x80000000))
>  		__syscall(SYS_futex, &target_tid, FUTEX_UNLOCK_PI|FUTEX_PRIVATE);
>  
> -	sem_wait(&ch.target_sem);
> +	while (sem_wait(&ch.target_sem) && errno != EINTR);
>  	callback(context);
>  	sem_post(&ch.caller_sem);
> -	sem_wait(&ch.target_sem);
> +	while (sem_wait(&ch.target_sem) && errno != EINTR);
>  
>  	errno = old_errno;
>  }
> @@ -153,7 +153,7 @@ void __synccall(void (*func)(void *), void *ctx)
>  	/* Serialize execution of callback in caught threads. */
>  	for (cp=head; cp; cp=cp->next) {
>  		sem_post(&cp->target_sem);
> -		sem_wait(&cp->caller_sem);
> +		while (sem_wait(&cp->caller_sem) && errno != EINTR);
>  	}
>  
>  	sa.sa_handler = SIG_IGN;

I think the changes to __synccall are unnecessary noise. It
necessarily runs with all signals, even implementation-internal ones,
blocked. Did you just miss this or do you think there's a reason the
checks need to be added?

Rich


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: sem_wait and EINTR
  2018-12-05 22:03 ` Rich Felker
@ 2018-12-06  2:43   ` Orivej Desh
  0 siblings, 0 replies; 19+ messages in thread
From: Orivej Desh @ 2018-12-06  2:43 UTC (permalink / raw)
  To: musl

* Rich Felker <dalias@libc.org> [2018-12-05]
> On Wed, Dec 05, 2018 at 07:16:05PM +0000, Orivej Desh wrote:
> > [3] https://github.com/orivej/musl/commit/c4c38aaab4fc55c23669f7b81386b615609cc3e1
> > 
> > diff --git a/src/thread/sem_timedwait.c b/src/thread/sem_timedwait.c
> > index 8132eb1b..58d3ebfe 100644
> > --- a/src/thread/sem_timedwait.c
> > +++ b/src/thread/sem_timedwait.c
> > @@ -22,7 +22,7 @@ int sem_timedwait(sem_t *restrict sem, const struct timespec *restrict at)
> >  		pthread_cleanup_push(cleanup, (void *)(sem->__val+1));
> >  		r = __timedwait_cp(sem->__val, -1, CLOCK_REALTIME, at, sem->__val[2]);
> >  		pthread_cleanup_pop(1);
> > -		if (r && r != EINTR) {
> > +		if (r) {
> >  			errno = r;
> >  			return -1;
> >  		}
> > diff --git a/src/thread/synccall.c b/src/thread/synccall.c
> > index cc66bd24..d9ab40cb 100644
> > --- a/src/thread/synccall.c
> > +++ b/src/thread/synccall.c
> > @@ -37,10 +37,10 @@ static void handler(int sig)
> >  	if (a_cas(&target_tid, ch.tid, 0) == (ch.tid | 0x80000000))
> >  		__syscall(SYS_futex, &target_tid, FUTEX_UNLOCK_PI|FUTEX_PRIVATE);
> >  
> > -	sem_wait(&ch.target_sem);
> > +	while (sem_wait(&ch.target_sem) && errno != EINTR);
> >  	callback(context);
> >  	sem_post(&ch.caller_sem);
> > -	sem_wait(&ch.target_sem);
> > +	while (sem_wait(&ch.target_sem) && errno != EINTR);
> >  
> >  	errno = old_errno;
> >  }
> > @@ -153,7 +153,7 @@ void __synccall(void (*func)(void *), void *ctx)
> >  	/* Serialize execution of callback in caught threads. */
> >  	for (cp=head; cp; cp=cp->next) {
> >  		sem_post(&cp->target_sem);
> > -		sem_wait(&cp->caller_sem);
> > +		while (sem_wait(&cp->caller_sem) && errno != EINTR);
> >  	}
> >  
> >  	sa.sa_handler = SIG_IGN;  
> 
> I think the changes to __synccall are unnecessary noise. It
> necessarily runs with all signals, even implementation-internal ones,
> blocked. Did you just miss this or do you think there's a reason the
> checks need to be added?

You are right, I did not notice that. There is no need to change synccall.c.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: sem_wait and EINTR
  2018-12-05 21:58     ` Rich Felker
@ 2018-12-06  2:43       ` Orivej Desh
  2018-12-06  3:17         ` Rich Felker
  0 siblings, 1 reply; 19+ messages in thread
From: Orivej Desh @ 2018-12-06  2:43 UTC (permalink / raw)
  To: musl

* Rich Felker <dalias@libc.org> [2018-12-05]
> On Wed, Dec 05, 2018 at 10:27:16PM +0100, Ondřej Jirman wrote:
> > On Wed, Dec 05, 2018 at 08:47:59PM +0100, Markus Wichmann wrote:  
> > 
> > It's specified by POSIX:
> > 
> > https://pubs.opengroup.org/onlinepubs/9699919799/functions/sem_wait.html
> > 
> > Sates: "The sem_wait() function is interruptible by the delivery of a signal."  
> 
> This seems contradictory with EINTR being a "may fail" error, and, if
> interpreted the way you want to interpret it, seems to be
> contradictory with SA_RESTART semantics, since it doesn't say anything
> about whether that signal is an interrupting one. I think we should
> attempt to obtain a clarification on what the intent is here. Does "is
> interruptible" mean that it needs to fail on signals (only without
> SA_RESTART?) or simply that signal handlers must be permitted to run
> (i.e. the wait can't happen with signals blocked)?

There is a definition of interruptible functions on the sigaction page:

    SA_RESTART

    This flag affects the behavior of interruptible functions; that is, those
    specified to fail with errno set to [EINTR].

    If set, and a function specified as interruptible is interrupted by this
    signal, the function shall restart and shall not fail with [EINTR] unless
    otherwise specified.

    If the flag is not set, interruptible functions interrupted by this signal
    shall fail with errno set to [EINTR].

https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigaction.html


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: sem_wait and EINTR
  2018-12-06  2:43       ` Orivej Desh
@ 2018-12-06  3:17         ` Rich Felker
  2018-12-06 15:57           ` Markus Wichmann
  0 siblings, 1 reply; 19+ messages in thread
From: Rich Felker @ 2018-12-06  3:17 UTC (permalink / raw)
  To: musl

On Thu, Dec 06, 2018 at 02:43:40AM +0000, Orivej Desh wrote:
> * Rich Felker <dalias@libc.org> [2018-12-05]
> > On Wed, Dec 05, 2018 at 10:27:16PM +0100, Ondřej Jirman wrote:
> > > On Wed, Dec 05, 2018 at 08:47:59PM +0100, Markus Wichmann wrote:  
> > > 
> > > It's specified by POSIX:
> > > 
> > > https://pubs.opengroup.org/onlinepubs/9699919799/functions/sem_wait.html
> > > 
> > > Sates: "The sem_wait() function is interruptible by the delivery of a signal."  
> > 
> > This seems contradictory with EINTR being a "may fail" error, and, if
> > interpreted the way you want to interpret it, seems to be
> > contradictory with SA_RESTART semantics, since it doesn't say anything
> > about whether that signal is an interrupting one. I think we should
> > attempt to obtain a clarification on what the intent is here. Does "is
> > interruptible" mean that it needs to fail on signals (only without
> > SA_RESTART?) or simply that signal handlers must be permitted to run
> > (i.e. the wait can't happen with signals blocked)?
> 
> There is a definition of interruptible functions on the sigaction page:
> 
>     SA_RESTART
> 
>     This flag affects the behavior of interruptible functions; that is, those
>     specified to fail with errno set to [EINTR].
> 
>     If set, and a function specified as interruptible is interrupted by this
>     signal, the function shall restart and shall not fail with [EINTR] unless
>     otherwise specified.
> 
>     If the flag is not set, interruptible functions interrupted by this signal
>     shall fail with errno set to [EINTR].
> 
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigaction.html

OK, this seems correct. I still don't understand why EINTR is a "may
fail" error; it's been that way at least back to SUSv2:

http://pubs.opengroup.org/onlinepubs/7908799/xsh/sem_wait.html

I'd like it if we could avoid the pre-linux-2.6.22 bug of spurious
EINTR from SYS_futex, but I don't see any way to do so except possibly
wrapping all signal handlers and implementing restart-vs-EINTR
ourselves. So if we need to change this, it might just be a case where
we say "well, sorry, your kernel is broken" if someone is using a
broken kernel.

Thoughts?

Rich


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: sem_wait and EINTR
  2018-12-06  3:17         ` Rich Felker
@ 2018-12-06 15:57           ` Markus Wichmann
  2018-12-06 16:23             ` Rich Felker
  0 siblings, 1 reply; 19+ messages in thread
From: Markus Wichmann @ 2018-12-06 15:57 UTC (permalink / raw)
  To: musl

On Wed, Dec 05, 2018 at 10:17:56PM -0500, Rich Felker wrote:
> I'd like it if we could avoid the pre-linux-2.6.22 bug of spurious
> EINTR from SYS_futex, but I don't see any way to do so except possibly
> wrapping all signal handlers and implementing restart-vs-EINTR
> ourselves. So if we need to change this, it might just be a case where
> we say "well, sorry, your kernel is broken" if someone is using a
> broken kernel.
> 
> Thoughts?
> 
> Rich

I really don't know what you are getting at, here. In the hypothetical
case you detected an EINTR return without a signal having been handled,
you could just retry the syscall. The problem is getting that
information in the first place.

Practically, I see a lot of work for little gain. Wrapping all signal
handlers means we need to save up to _NSIG function pointers. Access to
those doesn't need serialization any more than sigaction() does. Though,
what does it mean if someone changes the signal handler while we are in
the wrapper?

Due to SA_SIGINFO we'd need two different wrappers. Or we just always
set SA_SIGINFO in the kernel. I don't know if that incurs a runtime cost
in the kernel, though. On the plus side, restorers would get simpler.

Does the "actual signal arrived" bit need to be thread local? And what
about race conditions? Well, OK, most races I can think of aren't
actually a problem. Any signal arrival between resetting the flag before
the syscall and checking it after the syscall would then count.

So, all in all we need between 64 and 128 additional machine words in
the data section, sigaction() becomes more complicated, all users of
__timedwait_cp() become more complicated (you do want to make this
behavior consistent, right?), signal handling becomes more complicated
over all, and all of that just so that sem_wait() can fail but not
under all circumstances.

Speaking of calls that shouldn't fail but do: Is futex_wake() affected
by the same bug?

Ciao,
Markus


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: sem_wait and EINTR
  2018-12-06 15:57           ` Markus Wichmann
@ 2018-12-06 16:23             ` Rich Felker
  2018-12-06 17:03               ` Markus Wichmann
  0 siblings, 1 reply; 19+ messages in thread
From: Rich Felker @ 2018-12-06 16:23 UTC (permalink / raw)
  To: musl

On Thu, Dec 06, 2018 at 04:57:56PM +0100, Markus Wichmann wrote:
> On Wed, Dec 05, 2018 at 10:17:56PM -0500, Rich Felker wrote:
> > I'd like it if we could avoid the pre-linux-2.6.22 bug of spurious
> > EINTR from SYS_futex, but I don't see any way to do so except possibly
> > wrapping all signal handlers and implementing restart-vs-EINTR
> > ourselves. So if we need to change this, it might just be a case where
> > we say "well, sorry, your kernel is broken" if someone is using a
> > broken kernel.
> > 
> > Thoughts?
> > 
> > Rich
> 
> I really don't know what you are getting at, here. In the hypothetical
> case you detected an EINTR return without a signal having been handled,
> you could just retry the syscall. The problem is getting that
> information in the first place.

See the commit c0ed5a201b2bdb6d1896064bec0020c9973db0a1 which
introduced the EINTR suppression, deliberately:

    per POSIX, the EINTR condition is an optional error for these
    functions, not a mandatory one. since old kernels (pre-2.6.22) failed
    to honor SA_RESTART for the futex syscall, it's dangerous to trust
    EINTR from the kernel. thankfully POSIX offers an easy way out.

(Ignore the apparently wrong claim about POSIX.)

The concern is that perfectly correct programs can use sem_wait
without a retry loop if they do not install interrupting signal
handlers (and most programs refrain from doing that, because it's
awful). However, if run on an old kernel (<2.6.22), these correct
programs would wrongly make forward progress without finishing the
sem_wait.

One ugly hack that might be worth doing is simply tracking whether any
signal handler has been installed without SA_RESTART, and keeping the
retry-on-EINTR logic if not. Retrying under such conditions could not
break conformance and would preserve safety on old kernels for
programs which don't use interrupting signals at all. It would not
preserve the safety of *all possible* programs on such kernels, since
a program could install interrupting signal handlers but leave the
corresponding signals blocked in all threads that use sem_wait, but I
suspect that's a much less likely scenario.

> Practically, I see a lot of work for little gain. Wrapping all signal
> handlers means we need to save up to _NSIG function pointers. Access to
> those doesn't need serialization any more than sigaction() does. Though,
> what does it mean if someone changes the signal handler while we are in
> the wrapper?

This is not an actual proposal at this time (although the need has
been considered for other reasons at various times, which is why I'm
familiar with the concept). It was just a statement that I don't think
the problem can be worked around without such an extreme measure.

> Speaking of calls that shouldn't fail but do: Is futex_wake() affected
> by the same bug?

It shouldn't be because it shouldn't enter any interruptible sleep.

Rich


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: sem_wait and EINTR
  2018-12-06 16:23             ` Rich Felker
@ 2018-12-06 17:03               ` Markus Wichmann
  2018-12-06 17:33                 ` Markus Wichmann
  0 siblings, 1 reply; 19+ messages in thread
From: Markus Wichmann @ 2018-12-06 17:03 UTC (permalink / raw)
  To: musl

On Thu, Dec 06, 2018 at 11:23:36AM -0500, Rich Felker wrote:
> One ugly hack that might be worth doing is simply tracking whether any
> signal handler has been installed without SA_RESTART, and keeping the
> retry-on-EINTR logic if not.

Oh, that's what you meant. So the effort suddenly went from "high" to
"trivial". This is just a flag that is 0 on process start and becomes 1
on the installation of a signal handler. And nothing else.

Honestly, as hacks go, I've seen way worse. The ugly thing is the bug
we're working around. Might be worth a comment, though.

Ciao,
Markus


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: sem_wait and EINTR
  2018-12-06 17:03               ` Markus Wichmann
@ 2018-12-06 17:33                 ` Markus Wichmann
  2018-12-06 20:31                   ` Orivej Desh
  2018-12-09  2:51                   ` Rich Felker
  0 siblings, 2 replies; 19+ messages in thread
From: Markus Wichmann @ 2018-12-06 17:33 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 290 bytes --]

Hi all,

is the attached patch acceptable? A word about the bitfields: I
generally dislike them for most things, but I didn't want to destroy the
alignment struct __libc had going on, and these other flags really only
are 0 or 1.

Patch is untested for want of an old kernel.

Ciao,
Markus

[-- Attachment #2: 0003-Add-workaround-for-old-linux-bug.patch --]
[-- Type: text/x-diff, Size: 2021 bytes --]

From 816abf54fbbb02923331b69b62333b8a0edb4181 Mon Sep 17 00:00:00 2001
From: Markus Wichmann <nullplan@gmx.net>
Date: Thu, 6 Dec 2018 18:30:26 +0100
Subject: [PATCH 3/3] Add workaround for old linux bug.

---
 src/internal/libc.h        | 4 +---
 src/signal/sigaction.c     | 2 ++
 src/thread/sem_timedwait.c | 5 ++++-
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/src/internal/libc.h b/src/internal/libc.h
index ac97dc7e..2e6737d9 100644
--- a/src/internal/libc.h
+++ b/src/internal/libc.h
@@ -18,9 +18,7 @@ struct tls_module {
 };
 
 struct __libc {
-	int can_do_threads;
-	int threaded;
-	int secure;
+	unsigned can_do_threads:1, threaded:1, secure:1, handling_sigs:1;
 	volatile int threads_minus_1;
 	size_t *auxv;
 	struct tls_module *tls_head;
diff --git a/src/signal/sigaction.c b/src/signal/sigaction.c
index af47195e..5f02b99b 100644
--- a/src/signal/sigaction.c
+++ b/src/signal/sigaction.c
@@ -43,6 +43,8 @@ int __libc_sigaction(int sig, const struct sigaction *restrict sa, struct sigact
 					SIGPT_SET, 0, _NSIG/8);
 				unmask_done = 1;
 			}
+			if (!(sa->sa_flags & SA_RESTART))
+				libc.handling_sigs = 1;
 		}
 		/* Changing the disposition of SIGABRT to anything but
 		 * SIG_DFL requires a lock, so that it cannot be changed
diff --git a/src/thread/sem_timedwait.c b/src/thread/sem_timedwait.c
index 8132eb1b..e76ae9de 100644
--- a/src/thread/sem_timedwait.c
+++ b/src/thread/sem_timedwait.c
@@ -22,7 +22,10 @@ int sem_timedwait(sem_t *restrict sem, const struct timespec *restrict at)
 		pthread_cleanup_push(cleanup, (void *)(sem->__val+1));
 		r = __timedwait_cp(sem->__val, -1, CLOCK_REALTIME, at, sem->__val[2]);
 		pthread_cleanup_pop(1);
-		if (r && r != EINTR) {
+                /* Linux pre-2.6.22 bug: Sometimes SYS_futex returns with EINTR when it should not.
+                 * Workaround: Retry on EINTR unless someone installed handlers before.
+                 */
+		if (r && (r != EINTR || libc.handling_sigs)) {
 			errno = r;
 			return -1;
 		}
-- 
2.19.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: sem_wait and EINTR
  2018-12-06 17:33                 ` Markus Wichmann
@ 2018-12-06 20:31                   ` Orivej Desh
  2018-12-09  2:51                   ` Rich Felker
  1 sibling, 0 replies; 19+ messages in thread
From: Orivej Desh @ 2018-12-06 20:31 UTC (permalink / raw)
  To: musl

* Markus Wichmann <nullplan@gmx.net> [2018-12-06]
> Hi all,
> 
> is the attached patch acceptable? A word about the bitfields: I
> generally dislike them for most things, but I didn't want to destroy the
> alignment struct __libc had going on, and these other flags really only
> are 0 or 1.
> 
> Patch is untested for want of an old kernel.

Thank you! I have checked that your patch works as intended.
I have emulated an old kernel with this:

--- a/src/thread/__timedwait.c
+++ b/src/thread/__timedwait.c
@@ -28 +28 @@ int __timedwait_cp(volatile int *addr, int val,
-	r = -__syscall_cp(SYS_futex, addr, FUTEX_WAIT|priv, val, top);
+	r = EINTR;


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: sem_wait and EINTR
  2018-12-06 17:33                 ` Markus Wichmann
  2018-12-06 20:31                   ` Orivej Desh
@ 2018-12-09  2:51                   ` Rich Felker
  2018-12-09  6:50                     ` Markus Wichmann
  1 sibling, 1 reply; 19+ messages in thread
From: Rich Felker @ 2018-12-09  2:51 UTC (permalink / raw)
  To: musl

On Thu, Dec 06, 2018 at 06:33:37PM +0100, Markus Wichmann wrote:
> Hi all,
> 
> is the attached patch acceptable? A word about the bitfields: I
> generally dislike them for most things, but I didn't want to destroy the
> alignment struct __libc had going on, and these other flags really only
> are 0 or 1.
> 
> Patch is untested for want of an old kernel.
> 
> Ciao,
> Markus

> >From 816abf54fbbb02923331b69b62333b8a0edb4181 Mon Sep 17 00:00:00 2001
> From: Markus Wichmann <nullplan@gmx.net>
> Date: Thu, 6 Dec 2018 18:30:26 +0100
> Subject: [PATCH 3/3] Add workaround for old linux bug.
> 
> ---
>  src/internal/libc.h        | 4 +---
>  src/signal/sigaction.c     | 2 ++
>  src/thread/sem_timedwait.c | 5 ++++-
>  3 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/src/internal/libc.h b/src/internal/libc.h
> index ac97dc7e..2e6737d9 100644
> --- a/src/internal/libc.h
> +++ b/src/internal/libc.h
> @@ -18,9 +18,7 @@ struct tls_module {
>  };
>  
>  struct __libc {
> -	int can_do_threads;
> -	int threaded;
> -	int secure;
> +	unsigned can_do_threads:1, threaded:1, secure:1, handling_sigs:1;
>  	volatile int threads_minus_1;
>  	size_t *auxv;
>  	struct tls_module *tls_head;
> diff --git a/src/signal/sigaction.c b/src/signal/sigaction.c
> index af47195e..5f02b99b 100644
> --- a/src/signal/sigaction.c
> +++ b/src/signal/sigaction.c
> @@ -43,6 +43,8 @@ int __libc_sigaction(int sig, const struct sigaction *restrict sa, struct sigact
>  					SIGPT_SET, 0, _NSIG/8);
>  				unmask_done = 1;
>  			}
> +			if (!(sa->sa_flags & SA_RESTART))
> +				libc.handling_sigs = 1;
>  		}
>  		/* Changing the disposition of SIGABRT to anything but
>  		 * SIG_DFL requires a lock, so that it cannot be changed
> diff --git a/src/thread/sem_timedwait.c b/src/thread/sem_timedwait.c
> index 8132eb1b..e76ae9de 100644
> --- a/src/thread/sem_timedwait.c
> +++ b/src/thread/sem_timedwait.c
> @@ -22,7 +22,10 @@ int sem_timedwait(sem_t *restrict sem, const struct timespec *restrict at)
>  		pthread_cleanup_push(cleanup, (void *)(sem->__val+1));
>  		r = __timedwait_cp(sem->__val, -1, CLOCK_REALTIME, at, sem->__val[2]);
>  		pthread_cleanup_pop(1);
> -		if (r && r != EINTR) {
> +                /* Linux pre-2.6.22 bug: Sometimes SYS_futex returns with EINTR when it should not.
> +                 * Workaround: Retry on EINTR unless someone installed handlers before.
> +                 */
> +		if (r && (r != EINTR || libc.handling_sigs)) {
>  			errno = r;
>  			return -1;
>  		}
> -- 
> 2.19.1
> 

Conceptually the logic looks correct, but I'm trying to phase out the
__libc structure, and more importantly, introducing a bitfield here is
not safe unless access to all bitfield members is controlled by a
common mutex or other synchronization mechanism. There is a data race
if any access to the other fields is accessed when sigaction is
callable from another thread, and the impact of a data race on any of
these is critical.

Unfortunately, even without the bitfield, there is a data race between
access to the new flag from sem_timedwait and from sigaction. In
theory, this potentially results in a race window where sem_wait
retries on EINTR because it doesn't yet see the updated handling_sigs.

In reality, I don't think that happens -- at least not on kernels
without the bug -- because the store to handling_sigs is sequenced
before sigaction, which is sequenced before the EINTR-generating
signal delivery, which is sequenced before the load in sig_timedwait,
with memory barriers (necessarily from an implementation standpoint)
imposed by whatever kernel mechanism delivers the signal. However
there is still a race between concurrent/unsequenced sigaction calls.

The "right" fix would be to use an AS-safe lock to protect the
handling_sigs object, or make it atomic (a_store on the sigaction
side, a_barrier before load on the sem side). Alternatively we could
assume the above reasoning about sequencing of sigaction before EINTR
and just use a lock on the sigaction side, but the atomic is probably
simplest and "safer".

A couple other small nits: comments especially should not exceed 80
columns (preferably <75 or so) assuming tab width 8; code should avoid
it too but I see it slightly does in a few places there. Spaces should
not be used for indention. And the commit message should reflect the
change made; it's not adding a workaround, but reducing the scope of a
previous workaround (which should be cited by commit id) to fix a
conformance bug. The name "handling_sigs" is also a bit misleading;
what it's indicating is that interrupting signal handlers are or have
been present.

I'm happy to fix up these issues and commit if there aren't mistakes
in the above comments that still need to be addressed.

Rich


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: sem_wait and EINTR
  2018-12-09  2:51                   ` Rich Felker
@ 2018-12-09  6:50                     ` Markus Wichmann
  2018-12-12  0:32                       ` Rich Felker
  0 siblings, 1 reply; 19+ messages in thread
From: Markus Wichmann @ 2018-12-09  6:50 UTC (permalink / raw)
  To: musl

On Sat, Dec 08, 2018 at 09:51:40PM -0500, Rich Felker wrote:
> Conceptually the logic looks correct, but I'm trying to phase out the
> __libc structure,

Oh? Well, I suppose it does offer no benefit the linker doesn't already
give you. I suppose we could make the flag a static int in sigaction(),
with a hidden getter function.

> and more importantly, introducing a bitfield here is
> not safe unless access to all bitfield members is controlled by a
> common mutex or other synchronization mechanism. There is a data race
> if any access to the other fields is accessed when sigaction is
> callable from another thread, and the impact of a data race on any of
> these is critical.
> 

The flags can_do_thrads, threaded, and secure are all set before the
program becomes multi-threaded and they keep their values throughout the
rest of the program. So the only accesses are read accesses.

handling_sigs on the other hand might be written by multiple threads at
the same time, but to the same value. Can that ever be an issue?

Wait... I think I figured it out. On some architectures you can have
incoherent caches. Threads need to synchronize to access shared
ressources, and those synchronization functions take care of that
problem, but I circumvented them. Now, if the caches are set to
write-back mode (which is common), then writing to handling_sigs only
writes to the cache line for the address of handling_sigs, but not to
main memory. So another thread on another core will not see the update.

Worse: If by a bad roll of the dice the writing thread's cache line is
evicted before the reading thread's cache line is (it might have written
somewhere else in that cache line), that latter eviction will overwrite
handling_sigs back to 0.

Am I close?

> Unfortunately, even without the bitfield, there is a data race between
> access to the new flag from sem_timedwait and from sigaction. In
> theory, this potentially results in a race window where sem_wait
> retries on EINTR because it doesn't yet see the updated handling_sigs.
> 

In that case, the sem_wait() and the sigaction() would be unserialized,
and any correct program cannot depend on the order of operations. In
particular, the caller if the sem_wait() can't depend on the sigaction()
having installed the signal handler yet.

> The "right" fix would be to use an AS-safe lock to protect the
> handling_sigs object, or make it atomic (a_store on the sigaction
> side, a_barrier before load on the sem side). Alternatively we could
> assume the above reasoning about sequencing of sigaction before EINTR
> and just use a lock on the sigaction side, but the atomic is probably
> simplest and "safer".
> 

Plus, a lock for a single int that can only ever go from 0 to 1 would be
a bit overkill.

> A couple other small nits: comments especially should not exceed 80
> columns (preferably <75 or so) assuming tab width 8; code should avoid
> it too but I see it slightly does in a few places there. Spaces should
> not be used for indention.

Ah crap. I forgot the comment the first time I was in the file, and when
I started vim again, I forgot to re-apply the musl settings.

> And the commit message should reflect the
> change made; it's not adding a workaround, but reducing the scope of a
> previous workaround (which should be cited by commit id) to fix a
> conformance bug. The name "handling_sigs" is also a bit misleading;
> what it's indicating is that interrupting signal handlers are or have
> been present.
> 
> I'm happy to fix up these issues and commit if there aren't mistakes
> in the above comments that still need to be addressed.
> 
> Rich

Well, it certainly was a learning experience for me. Just goes to show
that you never learn as much as when you're making a mistake and have to
figure it out.

Ciao,
Markus


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: sem_wait and EINTR
  2018-12-09  6:50                     ` Markus Wichmann
@ 2018-12-12  0:32                       ` Rich Felker
  2018-12-12  5:15                         ` Markus Wichmann
  0 siblings, 1 reply; 19+ messages in thread
From: Rich Felker @ 2018-12-12  0:32 UTC (permalink / raw)
  To: musl

On Sun, Dec 09, 2018 at 07:50:33AM +0100, Markus Wichmann wrote:
> On Sat, Dec 08, 2018 at 09:51:40PM -0500, Rich Felker wrote:
> > Conceptually the logic looks correct, but I'm trying to phase out the
> > __libc structure,
> 
> Oh? Well, I suppose it does offer no benefit the linker doesn't already
> give you. I suppose we could make the flag a static int in sigaction(),
> with a hidden getter function.

For the record, the original purpose of the __libc structure was, in
the days of the first-gen dynamic linker, making it possible to access
the members in a pure PC-relative manner before relocations had taken
place, and without assuming the toolchain supported visibility. In the
fallback case, libc was #defined like errno, as (*__libc_location())
or similar, and the struct was static in the file where that function
was defined. It later got used for other globals we wanted to make
cheap to access, and, to some extent, to micro-optimize the placement
of atomics relative to cache lines in ways that didn't actually
matter.

Now that we just use hidden visibility for internal things, access is
just as efficient without putting them in the special hidden-vis libc
structure, and static linking is more efficient since the linker can
avoid pulling in members that don't actually get used.

> > and more importantly, introducing a bitfield here is
> > not safe unless access to all bitfield members is controlled by a
> > common mutex or other synchronization mechanism. There is a data race
> > if any access to the other fields is accessed when sigaction is
> > callable from another thread, and the impact of a data race on any of
> > these is critical.
> > 
> 
> The flags can_do_thrads, threaded, and secure are all set before the
> program becomes multi-threaded and they keep their values throughout the
> rest of the program. So the only accesses are read accesses.
> 
> handling_sigs on the other hand might be written by multiple threads at
> the same time, but to the same value. Can that ever be an issue?
> 
> Wait... I think I figured it out. On some architectures you can have
> incoherent caches. Threads need to synchronize to access shared
> ressources, and those synchronization functions take care of that
> problem, but I circumvented them. Now, if the caches are set to
> write-back mode (which is common), then writing to handling_sigs only
> writes to the cache line for the address of handling_sigs, but not to
> main memory. So another thread on another core will not see the update.
> 
> Worse: If by a bad roll of the dice the writing thread's cache line is
> evicted before the reading thread's cache line is (it might have written
> somewhere else in that cache line), that latter eviction will overwrite
> handling_sigs back to 0.
> 
> Am I close?

There might or might not be mechanisms on very weakly-ordered machines
by which this could blow up, but the reasoning here is more simple and
high-level: we just don't assume properties about the machine's memory
ordering properties, and instead treat data races as the undefined
behavior they are, with the assumption that "anything could happen".

> > Unfortunately, even without the bitfield, there is a data race between
> > access to the new flag from sem_timedwait and from sigaction. In
> > theory, this potentially results in a race window where sem_wait
> > retries on EINTR because it doesn't yet see the updated handling_sigs.
> 
> In that case, the sem_wait() and the sigaction() would be unserialized,
> and any correct program cannot depend on the order of operations. In
> particular, the caller if the sem_wait() can't depend on the sigaction()
> having installed the signal handler yet.

Consider the case where thread A is blocked in sem_wait, and thread B
installs an interrupting signal handler then calls pthread_kill to
send the signal to thread A. Despite any operations that synchronize
memory, since the pthread_kill is ordered after the sigaction in
thread B, it seems a requirement that the signal generate EINTR. There
necessarily is some underlying synchronization in kernelspace that
would ensure this even without any effort by userspace, but it's not
part of the abstract model I'm working from. We could just assume it's
there or add a_barrier() before the check and make it explicit.

> > The "right" fix would be to use an AS-safe lock to protect the
> > handling_sigs object, or make it atomic (a_store on the sigaction
> > side, a_barrier before load on the sem side). Alternatively we could
> > assume the above reasoning about sequencing of sigaction before EINTR
> > and just use a lock on the sigaction side, but the atomic is probably
> > simplest and "safer".
> > 
> 
> Plus, a lock for a single int that can only ever go from 0 to 1 would be
> a bit overkill.

It's not a big deal, but I think a_store would work just as well and
be simpler.

> > A couple other small nits: comments especially should not exceed 80
> > columns (preferably <75 or so) assuming tab width 8; code should avoid
> > it too but I see it slightly does in a few places there. Spaces should
> > not be used for indention.
> 
> Ah crap. I forgot the comment the first time I was in the file, and when
> I started vim again, I forgot to re-apply the musl settings.
> 
> > And the commit message should reflect the
> > change made; it's not adding a workaround, but reducing the scope of a
> > previous workaround (which should be cited by commit id) to fix a
> > conformance bug. The name "handling_sigs" is also a bit misleading;
> > what it's indicating is that interrupting signal handlers are or have
> > been present.
> > 
> > I'm happy to fix up these issues and commit if there aren't mistakes
> > in the above comments that still need to be addressed.
> 
> Well, it certainly was a learning experience for me. Just goes to show
> that you never learn as much as when you're making a mistake and have to
> figure it out.

One other thought: would it be preferable for the EINTR suppression in
the absence of interruptible signal handlers to be in __timedwait
rather than sem_timedwait? Then any code using __timedwait would
benefit from it. I'm not sure if there are other callers where it
would help but it wouldn't hurt either.

Rich


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: sem_wait and EINTR
  2018-12-12  0:32                       ` Rich Felker
@ 2018-12-12  5:15                         ` Markus Wichmann
  2018-12-14 19:45                           ` Rich Felker
  0 siblings, 1 reply; 19+ messages in thread
From: Markus Wichmann @ 2018-12-12  5:15 UTC (permalink / raw)
  To: musl

On Tue, Dec 11, 2018 at 07:32:38PM -0500, Rich Felker wrote:
> One other thought: would it be preferable for the EINTR suppression in
> the absence of interruptible signal handlers to be in __timedwait
> rather than sem_timedwait? Then any code using __timedwait would
> benefit from it. I'm not sure if there are other callers where it
> would help but it wouldn't hurt either.
> 
> Rich

Nope, that would not help. I had a look at all users of SYS_futex that
might be impacted by the kernel bug you mentioned (and followed them
back to the public interfaces that use them). Only __timedwait does
anything with the return value at all, and of all the users of
__timedwait(), sem_timedwait() is the only function even specified to
return EINTR. All others are specified to *never* return EINTR. At least
according to the manpages I have here (manpages-posix).

So only sem_timedwait() needs this patch. For the other users it would
hurt conformance.

Ciao,
Markus


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: sem_wait and EINTR
  2018-12-12  5:15                         ` Markus Wichmann
@ 2018-12-14 19:45                           ` Rich Felker
  2018-12-15  9:45                             ` Markus Wichmann
  0 siblings, 1 reply; 19+ messages in thread
From: Rich Felker @ 2018-12-14 19:45 UTC (permalink / raw)
  To: musl

On Wed, Dec 12, 2018 at 06:15:59AM +0100, Markus Wichmann wrote:
> On Tue, Dec 11, 2018 at 07:32:38PM -0500, Rich Felker wrote:
> > One other thought: would it be preferable for the EINTR suppression in
> > the absence of interruptible signal handlers to be in __timedwait
> > rather than sem_timedwait? Then any code using __timedwait would
> > benefit from it. I'm not sure if there are other callers where it
> > would help but it wouldn't hurt either.
> 
> Nope, that would not help. I had a look at all users of SYS_futex that

Perhaps help was the wrong word; I think you're right that there's
nowhere else it matters and that all other callers already ignore
EINTR unconditionally because they're supposed to. The only plausible
improvement is avoiding spurious dec/inc cycle on the waiter count in
some places. On the other hand it might be a nicer factorization (less
ugly and linux-bug-specific logic in high level code, i.e.
sem_timedwait) if the workaround were buried in low-level stuff
(__timedwait).

x> might be impacted by the kernel bug you mentioned (and followed them
> back to the public interfaces that use them). Only __timedwait does
> anything with the return value at all, and of all the users of
> __timedwait(), sem_timedwait() is the only function even specified to
> return EINTR. All others are specified to *never* return EINTR. At least
> according to the manpages I have here (manpages-posix).
> 
> So only sem_timedwait() needs this patch. For the other users it would
> hurt conformance.

I don't see how it could hurt conformance.

Rich


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: sem_wait and EINTR
  2018-12-14 19:45                           ` Rich Felker
@ 2018-12-15  9:45                             ` Markus Wichmann
  0 siblings, 0 replies; 19+ messages in thread
From: Markus Wichmann @ 2018-12-15  9:45 UTC (permalink / raw)
  To: musl

On Fri, Dec 14, 2018 at 02:45:16PM -0500, Rich Felker wrote:
> Perhaps help was the wrong word; I think you're right that there's
> nowhere else it matters and that all other callers already ignore
> EINTR unconditionally because they're supposed to. The only plausible
> improvement is avoiding spurious dec/inc cycle on the waiter count in
> some places. On the other hand it might be a nicer factorization (less
> ugly and linux-bug-specific logic in high level code, i.e.
> sem_timedwait) if the workaround were buried in low-level stuff
> (__timedwait).
> 
> [...]
> 
> I don't see how it could hurt conformance.
> 
> Rich

I was misunderstanding you. I thought you were about to put the
maybe-retry on EINTR into __timedwait() and then remove the
corresponding check from all the users of __timedwait(). But apparently
you want leave the users relatively unmolested. Yeah, that sounds better
than what I pictured. Go right ahead, I say.

Ciao,
Markus


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2018-12-15  9:45 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-05 19:16 sem_wait and EINTR Orivej Desh
2018-12-05 19:47 ` Markus Wichmann
2018-12-05 21:27   ` Ondřej Jirman
2018-12-05 21:58     ` Rich Felker
2018-12-06  2:43       ` Orivej Desh
2018-12-06  3:17         ` Rich Felker
2018-12-06 15:57           ` Markus Wichmann
2018-12-06 16:23             ` Rich Felker
2018-12-06 17:03               ` Markus Wichmann
2018-12-06 17:33                 ` Markus Wichmann
2018-12-06 20:31                   ` Orivej Desh
2018-12-09  2:51                   ` Rich Felker
2018-12-09  6:50                     ` Markus Wichmann
2018-12-12  0:32                       ` Rich Felker
2018-12-12  5:15                         ` Markus Wichmann
2018-12-14 19:45                           ` Rich Felker
2018-12-15  9:45                             ` Markus Wichmann
2018-12-05 22:03 ` Rich Felker
2018-12-06  2:43   ` Orivej Desh

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).