mailing list of musl libc
* nfs-utils broken with musl: "select: Bad file descriptor"
@ 2015-08-18  2:30 Tastky
  2015-08-18  3:00 ` Rich Felker
  0 siblings, 1 reply; 11+ messages in thread
From: Tastky @ 2015-08-18  2:30 UTC (permalink / raw)
  To: musl

As by this OpenWRT bugreport:
https://dev.openwrt.org/ticket/20038

On various architectures – at least a mips and powerpc one – nfs-utils 
is broken with musl, yielding a never ending stream of "my_svc_run() - 
select: Bad file descriptor" in the system log.

The message originates in this file:
http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=utils/statd/svc_run.c

"Downgrading" to uClibc has the issue vanish.

I verified this myself with recent git versions of both musl and the 
utils on a fresh ar71xx OpenWRT compilation.



* Re: nfs-utils broken with musl: "select: Bad file descriptor"
  2015-08-18  2:30 nfs-utils broken with musl: "select: Bad file descriptor" Tastky
@ 2015-08-18  3:00 ` Rich Felker
  2015-08-18 16:50   ` Tastky
  0 siblings, 1 reply; 11+ messages in thread
From: Rich Felker @ 2015-08-18  3:00 UTC (permalink / raw)
  To: Tastky; +Cc: musl

On Tue, Aug 18, 2015 at 04:30:21AM +0200, Tastky wrote:
> As by this OpenWRT bugreport:
> https://dev.openwrt.org/ticket/20038
> 
> On various architectures – at least a mips and powerpc one –
> nfs-utils is broken with musl, yielding a never ending stream of
> "my_svc_run() - select: Bad file descriptor" in the system log.
> 
> The message originates in this file:
> http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=utils/statd/svc_run.c
> 
> "Downgrading" to uClibc has the issue vanish.
> 
> I verified this myself with recent git versions of both musl and the
> utils on a fresh ar71xx OpenWRT compilation.

Here's my quick guess at what's going wrong. This file:

http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=utils/statd/system.h;h=a1739c491474179c16a64f7a2bbfde8f651085c6;hb=HEAD

contains nonsense to define SVC_FDSET as int rather than using fd_set
on "systems which don't have fd_set" (which don't exist).
Unfortunately, it's checking #ifdef FD_SETSIZE without including the
header that defines it, sys/select.h. If this is the problem, adding:

#include <sys/select.h>

to the top of that file should fix the error.

Note that compiling with -Werror=implicit-function-declaration would
catch such bogus code right away.

If this turns out not to be the problem, can you send an strace of the
failing program up to the first failure message?

Rich



* Re: nfs-utils broken with musl: "select: Bad file descriptor"
  2015-08-18  3:00 ` Rich Felker
@ 2015-08-18 16:50   ` Tastky
  2015-08-18 17:49     ` Rich Felker
  0 siblings, 1 reply; 11+ messages in thread
From: Tastky @ 2015-08-18 16:50 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

Just checked with said include and everything recompiled. Unfortunately 
the same error persists.

Running OpenWrt's command
/usr/sbin/rpc.statd -p 32778 -o 32779 -F
manually (following the script's prior steps, ofc) results in a loop of:

sm-notify: Version 1.3.2 starting
sm-notify: Already notifying clients; Exiting

With strace: http://pastebin.com/raw.php?i=9ypUbmsp

On 18.08.2015 05:00, Rich Felker wrote:
> On Tue, Aug 18, 2015 at 04:30:21AM +0200, Tastky wrote:
>> As by this OpenWRT bugreport:
>> https://dev.openwrt.org/ticket/20038
>>
>> On various architectures – at least a mips and powerpc one –
>> nfs-utils is broken with musl, yielding a never ending stream of
>> "my_svc_run() - select: Bad file descriptor" in the system log.
>>
>> The message originates in this file:
>> http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=utils/statd/svc_run.c
>>
>> "Downgrading" to uClibc has the issue vanish.
>>
>> I verified this myself with recent git versions of both musl and the
>> utils on a fresh ar71xx OpenWRT compilation.
>
> Here's my quick guess at what's going wrong. This file:
>
> http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=utils/statd/system.h;h=a1739c491474179c16a64f7a2bbfde8f651085c6;hb=HEAD
>
> contains nonsense to define SVC_FDSET as int rather than using fd_set
> on "systems which don't have fd_set" (which don't exist).
> Unfortunately, it's checking #ifdef FD_SETSIZE without including the
> header that defines it, sys/select.h. If this is the problem, adding:
>
> #include <sys/select.h>
>
> to the top of that file should fix the error.
>
> Note that compiling with -Werror=implicit-function-declaration would
> catch such bogus code right away.
>
> If this turns out not to be the problem, can you send an strace of the
> failing program up to the first failure message?
>
> Rich
>




* Re: nfs-utils broken with musl: "select: Bad file descriptor"
  2015-08-18 16:50   ` Tastky
@ 2015-08-18 17:49     ` Rich Felker
  2015-08-18 18:07       ` Tastky
  2015-08-18 18:20       ` Felix Janda
  0 siblings, 2 replies; 11+ messages in thread
From: Rich Felker @ 2015-08-18 17:49 UTC (permalink / raw)
  To: Tastky; +Cc: musl

On Tue, Aug 18, 2015 at 06:50:54PM +0200, Tastky wrote:
> Just checked with said include and everything recompiled.
> Unfortunately the same error persists.
> 
> Running OpenWrt's command
> /usr/sbin/rpc.statd -p 32778 -o 32779 -F
> manually (following the script's prior steps, ofc) results in a loop of:
> 
> sm-notify: Version 1.3.2 starting
> sm-notify: Already notifying clients; Exiting
> 
> With strace: http://pastebin.com/raw.php?i=9ypUbmsp

From the strace, I see that a nonsensical fd #105 is in the fd_set
readfds that comes from SVC_FDSET. I don't know where the latter is
defined or modified.

Rich



* Re: nfs-utils broken with musl: "select: Bad file descriptor"
  2015-08-18 17:49     ` Rich Felker
@ 2015-08-18 18:07       ` Tastky
  2015-08-18 18:08         ` Tastky
  2015-08-18 18:20       ` Felix Janda
  1 sibling, 1 reply; 11+ messages in thread
From: Tastky @ 2015-08-18 18:07 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

I'm afraid that all I can add right now is that when rpc.statd is 
started with -d (write to stderr not syslog) the following line also 
appears before the error:

rpc.statd: Waiting for client connections

Meaning it's hung in the if(notify) block of svc_run's my_svc_run().

On 18.08.2015 19:49, Rich Felker wrote:
>  From the strace, I see that a nonsensical fd #105 is in the fd_set
> readfds that comes from SVC_FDSET. I don't know where the latter is
> defined or modified.
>
> Rich
>




* Re: nfs-utils broken with musl: "select: Bad file descriptor"
  2015-08-18 18:07       ` Tastky
@ 2015-08-18 18:08         ` Tastky
  0 siblings, 0 replies; 11+ messages in thread
From: Tastky @ 2015-08-18 18:08 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl

In the block's else clause, ofc.

On 18.08.2015 20:07, Tastky wrote:
> I'm afraid that all I can add right now is that when rpc.statd is
> started with -d (write to stderr not syslog) the following line also
> appears before the error:
>
> rpc.statd: Waiting for client connections
>
> Meaning it's hung in the if(notify) block of svc_run's my_svc_run().




* Re: nfs-utils broken with musl: "select: Bad file descriptor"
  2015-08-18 17:49     ` Rich Felker
  2015-08-18 18:07       ` Tastky
@ 2015-08-18 18:20       ` Felix Janda
  2015-08-18 19:18         ` Szabolcs Nagy
  1 sibling, 1 reply; 11+ messages in thread
From: Felix Janda @ 2015-08-18 18:20 UTC (permalink / raw)
  To: musl; +Cc: Tastky

Rich Felker wrote:
> On Tue, Aug 18, 2015 at 06:50:54PM +0200, Tastky wrote:
> > Just checked with said include and everything recompiled.
> > Unfortunately the same error persists.
> > 
> > Running OpenWrt's command
> > /usr/sbin/rpc.statd -p 32778 -o 32779 -F
> > manually (following the script's prior steps, ofc) results in a loop of:
> > 
> > sm-notify: Version 1.3.2 starting
> > sm-notify: Already notifying clients; Exiting
> > 
> > With strace: http://pastebin.com/raw.php?i=9ypUbmsp
> 
> From the strace, I see that a nonsensical fd #105 is in the fd_set
> readfds that comes from SVC_FDSET. I don't know where the latter is
> defined or modified.

It is defined in system.h (now hopefully) to be svc_fdset, which seems
to be defined as a global variable by the rpc headers.

Felix



* Re: nfs-utils broken with musl: "select: Bad file descriptor"
  2015-08-18 18:20       ` Felix Janda
@ 2015-08-18 19:18         ` Szabolcs Nagy
  2015-08-18 20:51           ` Rich Felker
  2015-08-19  9:34           ` Natanael Copa
  0 siblings, 2 replies; 11+ messages in thread
From: Szabolcs Nagy @ 2015-08-18 19:18 UTC (permalink / raw)
  To: musl, Tastky

* Felix Janda <felix.janda@posteo.de> [2015-08-18 20:20:14 +0200]:

> Rich Felker wrote:
> > On Tue, Aug 18, 2015 at 06:50:54PM +0200, Tastky wrote:
> > > Just checked with said include and everything recompiled.
> > > Unfortunately the same error persists.
> > > 
> > > Running OpenWrt's command
> > > /usr/sbin/rpc.statd -p 32778 -o 32779 -F
> > > manually (following the script's prior steps, ofc) results in a loop of:
> > > 
> > > sm-notify: Version 1.3.2 starting
> > > sm-notify: Already notifying clients; Exiting
> > > 
> > > With strace: http://pastebin.com/raw.php?i=9ypUbmsp
> > 
> > From the strace, I see that a nonsensical fd #105 is in the fd_set
> > readfds that comes from SVC_FDSET. I don't know where the latter is
> > defined or modified.
> 
> It is defined in system.h (now hopefully) to be svc_fdset, which seems
> to be defined as a global variable by the rpc headers.
> 

i think this call goes wrong:

http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=utils/statd/rmtcall.c;hb=HEAD#l56

it loops for 100 iterations and if all ports are used
according to getservbyport then it FD_SET(sockfd, &SVC_FDSET);
with some random high sockfd (eg. 105) that is closed.

...so should getservbyport fail there?

(according to strace it tries ports 883 to 982)



* Re: nfs-utils broken with musl: "select: Bad file descriptor"
  2015-08-18 19:18         ` Szabolcs Nagy
@ 2015-08-18 20:51           ` Rich Felker
  2015-08-19  9:34           ` Natanael Copa
  1 sibling, 0 replies; 11+ messages in thread
From: Rich Felker @ 2015-08-18 20:51 UTC (permalink / raw)
  To: musl; +Cc: Tastky

On Tue, Aug 18, 2015 at 09:18:10PM +0200, Szabolcs Nagy wrote:
> * Felix Janda <felix.janda@posteo.de> [2015-08-18 20:20:14 +0200]:
> 
> > Rich Felker wrote:
> > > On Tue, Aug 18, 2015 at 06:50:54PM +0200, Tastky wrote:
> > > > Just checked with said include and everything recompiled.
> > > > Unfortunately the same error persists.
> > > > 
> > > > Running OpenWrt's command
> > > > /usr/sbin/rpc.statd -p 32778 -o 32779 -F
> > > > manually (following the script's prior steps, ofc) results in a loop of:
> > > > 
> > > > sm-notify: Version 1.3.2 starting
> > > > sm-notify: Already notifying clients; Exiting
> > > > 
> > > > With strace: http://pastebin.com/raw.php?i=9ypUbmsp
> > > 
> > > From the strace, I see that a nonsensical fd #105 is in the fd_set
> > > readfds that comes from SVC_FDSET. I don't know where the latter is
> > > defined or modified.
> > 
> > It is defined in system.h (now hopefully) to be svc_fdset, which seems
> > to be defined as a global variable by the rpc headers.
> > 
> 
> i think this call goes wrong:
> 
> http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=utils/statd/rmtcall.c;hb=HEAD#l56
> 
> it loops for 100 iterations and if all ports are used
> according to getservbyport then it FD_SET(sockfd, &SVC_FDSET);
> with some random high sockfd (eg. 105) that is closed.
> 
> ...so should getservbyport fail there?
> 
> (according to strace it tries ports 883 to 982)

I think the application's expectation is that it fail rather than
returning a decimal-string-only service entity. However it looks like
the code is written to handle the case where all 100 iterations fail
to get an anonymous port. The problem seems to be that, when the loop
stops due to hitting the iteration count rather than exiting with
break, i has already been incremented past the last tmp_socket slot,
so the close loop closes the fd that they actually want to use, later
causing EBADF. This is purely an application bug, but it happens not
to get noticed if getservbyport fails anywhere along the way, which
they expect to happen in the usual case.

Rich



* Re: nfs-utils broken with musl: "select: Bad file descriptor"
  2015-08-18 19:18         ` Szabolcs Nagy
  2015-08-18 20:51           ` Rich Felker
@ 2015-08-19  9:34           ` Natanael Copa
  1 sibling, 0 replies; 11+ messages in thread
From: Natanael Copa @ 2015-08-19  9:34 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: musl, Tastky

On Tue, 18 Aug 2015 21:18:10 +0200
Szabolcs Nagy <nsz@port70.net> wrote:

> * Felix Janda <felix.janda@posteo.de> [2015-08-18 20:20:14 +0200]:
> 
> > Rich Felker wrote:
> > > On Tue, Aug 18, 2015 at 06:50:54PM +0200, Tastky wrote:
> > > > Just checked with said include and everything recompiled.
> > > > Unfortunately the same error persists.
> > > > 
> > > > Running OpenWrt's command
> > > > /usr/sbin/rpc.statd -p 32778 -o 32779 -F
> > > > manually (following the script's prior steps, ofc) results in a loop of:
> > > > 
> > > > sm-notify: Version 1.3.2 starting
> > > > sm-notify: Already notifying clients; Exiting
> > > > 
> > > > With strace: http://pastebin.com/raw.php?i=9ypUbmsp
> > > 
> > > From the strace, I see that a nonsensical fd #105 is in the fd_set
> > > readfds that comes from SVC_FDSET. I don't know where the latter is
> > > defined or modified.
> > 
> > It is defined in system.h (now hopefully) to be svc_fdset, which seems
> > to be defined as a global variable by the rpc headers.
> > 
> 
> i think this call goes wrong:
> 
> http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=utils/statd/rmtcall.c;hb=HEAD#l56
> 
> it loops for 100 iterations and if all ports are used
> according to getservbyport then it FD_SET(sockfd, &SVC_FDSET);
> with some random high sockfd (eg. 105) that is closed.


Yeah. Alpine Linux works around it this way:
http://git.alpinelinux.org/cgit/aports/plain/main/nfs-utils/musl-getservbyport.patch

musl's getservbyport will always return something, so we cannot skip
the ports for which it returns non-NULL.

diff --git a/utils/statd/rmtcall.c b/utils/statd/rmtcall.c
index fd576d9..d72a0bf 100644
--- a/utils/statd/rmtcall.c
+++ b/utils/statd/rmtcall.c
@@ -90,8 +90,10 @@ statd_get_socket(void)
 					__func__);
 			break;
 		}
+#if 0
 		se = getservbyport(sin.sin_port, "udp");
 		if (se == NULL)
+#endif
 			break;
 		/* rather not use that port, try again */


-nc



* Re: nfs-utils broken with musl: "select: Bad file descriptor"
@ 2015-08-19  1:05 Chuck Lever
  0 siblings, 0 replies; 11+ messages in thread
From: Chuck Lever @ 2015-08-19  1:05 UTC (permalink / raw)
  To: musl

>> i think this call goes wrong:
>>
>> http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=utils/statd/rmtcall.c;hb=HEAD#l56
>>
>> it loops for 100 iterations and if all ports are used
>> according to getservbyport then it FD_SET(sockfd, &SVC_FDSET);
>> with some random high sockfd (eg. 105) that is closed.
>> 
>> ...so should getservbyport fail there?
>> 
>> (according to strace it tries ports 883 to 982)
> 
> I think the application's expectation is that it fail rather than
> returning a decimal-string-only service entity. However it looks like
> the code is written to handle the case where all 100 iterations fail
> to get an anonymous port. The problem seems to be that, when the loop
> stops due to hitting the iteration count rather than exiting with
> break, i has already been incremented past the last tmp_socket slot,
> so the close loop closes the fd that they actually want to use, later
> causing EBADF. This is purely an application bug, but it happens not
> to get noticed if getservbyport fails anywhere along the way, which
> they expect to happen in the usual case.

statd_get_socket() is hunting for a privileged source port that
is not just unused at the moment, but that is also not going to be
used by some other well-known service. This is a long-lived socket
that statd uses to communicate with the kernel. It must use a
privileged port.

If getservbyport(3) is returning something for every port that
is tried, then statd_get_socket() will fail to find a usable
port.

If it's returning 105, that suggests it has run out of retries.
It should return -1 in this case. That is a logic bug.

But is it true that every port returned by bindresvport(3) is
actually defined in /etc/services? Surely there is one open
port that can be used. What port does bindresvport(3) start
with?

--
Chuck Lever





