mailing list of musl libc
* Re: nfs-utils broken with musl: "select: Bad file descriptor"
From: Chuck Lever @ 2015-08-19  1:05 UTC
  To: musl

>> i think this call goes wrong:
>> 
>> 
> http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=utils/statd/rmtcall.c;hb=HEAD#l56
> 
>> 
>> it loops for 100 iterations and if all ports are used
>> according to getservbyport then it FD_SET(sockfd, &SVC_FDSET);
>> with some random high sockfd (eg. 105) that is closed.
>> 
>> ...so should getservbyport fail there?
>> 
>> (according to strace it tries ports 883 to 982)
> 
> I think the application's expectation is that it fail rather than
> returning a decimal-string-only service entity. However it looks like
> the code is written to handle the case where all 100 iterations fail
> to get an anonymous port. The problem seems to be that, when the loop
> stops due to hitting the iteration count rather than exiting with
> break, i has already been incremented past the last tmp_socket slot,
> so the close loop closes the fd that they actually want to use, later
> causing EBADF. This is purely an application bug, but it happens not
> to get noticed if getservbyport fails anywhere along the way, which
> they expect to happen in the usual case.

statd_get_socket() is hunting for a privileged source port that
is not just unused at the moment, but that is also not going to be
used by some other well-known service. This is a long-lived socket
that statd uses to communicate with the kernel. It must use a
privileged port.

If getservbyport(3) is returning something for every port that
is tried, then statd_get_socket() will fail to find a usable
port.

If it's returning 105, that suggests it has run out of retries.
It should return -1 in this case. That is a logic bug.

But is it true that every port returned by bindresvport(3) is
actually defined in /etc/services? Surely there is one open
port that can be used. What port does bindresvport(3) start
with?

--
Chuck Lever






* Re: Re: nfs-utils broken with musl: "select: Bad file descriptor"
From: Rich Felker @ 2015-08-19  1:24 UTC
  To: Chuck Lever; +Cc: musl

On Tue, Aug 18, 2015 at 06:05:01PM -0700, Chuck Lever wrote:
> >> i think this call goes wrong:
> >> 
> >> 
> > http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=utils/statd/rmtcall.c;hb=HEAD#l56
> > 
> >> 
> >> it loops for 100 iterations and if all ports are used
> >> according to getservbyport then it FD_SET(sockfd, &SVC_FDSET);
> >> with some random high sockfd (eg. 105) that is closed.
> >> 
> >> ...so should getservbyport fail there?
> >> 
> >> (according to strace it tries ports 883 to 982)
> > 
> > I think the application's expectation is that it fail rather than
> > returning a decimal-string-only service entity. However it looks like
> > the code is written to handle the case where all 100 iterations fail
> > to get an anonymous port. The problem seems to be that, when the loop
> > stops due to hitting the iteration count rather than exiting with
> > break, i has already been incremented past the last tmp_socket slot,
> > so the close loop closes the fd that they actually want to use, later
> > causing EBADF. This is purely an application bug, but it happens not
> > to get noticed if getservbyport fails anywhere along the way, which
> > they expect to happen in the usual case.
> 
> statd_get_socket() is hunting for a privileged source port that
> is not just unused at the moment, but that is also not going to be
> used by some other well-known service. This is a long-lived socket
> that statd uses to communicate with the kernel. It must use a
> privileged port.
> 
> If getservbyport(3) is returning something for every port that
> is tried, then statd_get_socket() will fail to find a usable
> port.
> 
> If it's returning 105, that suggests it has run out of retries.
> It should return -1 in this case. That is a logic bug.
> 
> But is it true that every port returned by bindresvport(3) is
> actually defined in /etc/services? Surely there is one open
> port that can be used. What port does bindresvport(3) start
> with?

The logic bug is the count-down loop that closes all the temp sockets.
In the case where the loop terminates via break, it leaves the last
one open and only closes the extras. But in the case where the
loop terminates via the end condition in the for statement, the close
loop closes all the sockets including the one it intends to use.

Rich



* Re: Re: nfs-utils broken with musl: "select: Bad file descriptor"
From: Chuck Lever @ 2015-08-19  1:44 UTC
  To: Rich Felker; +Cc: musl


On Aug 18, 2015, at 6:24 PM, Rich Felker <dalias@libc.org> wrote:

> On Tue, Aug 18, 2015 at 06:05:01PM -0700, Chuck Lever wrote:
>>>> i think this call goes wrong:
>>>> 
>>>> 
>>> http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=utils/statd/rmtcall.c;hb=HEAD#l56
>>> 
>>>> 
>>>> it loops for 100 iterations and if all ports are used
>>>> according to getservbyport then it FD_SET(sockfd, &SVC_FDSET);
>>>> with some random high sockfd (eg. 105) that is closed.
>>>> 
>>>> ...so should getservbyport fail there?
>>>> 
>>>> (according to strace it tries ports 883 to 982)
>>> 
>>> I think the application's expectation is that it fail rather than
>>> returning a decimal-string-only service entity. However it looks like
>>> the code is written to handle the case where all 100 iterations fail
>>> to get an anonymous port. The problem seems to be that, when the loop
>>> stops due to hitting the iteration count rather than exiting with
>>> break, i has already been incremented past the last tmp_socket slot,
>>> so the close loop closes the fd that they actually want to use, later
>>> causing EBADF. This is purely an application bug, but it happens not
>>> to get noticed if getservbyport fails anywhere along the way, which
>>> they expect to happen in the usual case.
>> 
>> statd_get_socket() is hunting for a privileged source port that
>> is not just unused at the moment, but that is also not going to be
>> used by some other well-known service. This is a long-lived socket
>> that statd uses to communicate with the kernel. It must use a
>> privileged port.
>> 
>> If getservbyport(3) is returning something for every port that
>> is tried, then statd_get_socket() will fail to find a usable
>> port.
>> 
>> If it's returning 105, that suggests it has run out of retries.
>> It should return -1 in this case. That is a logic bug.
>> 
>> But is it true that every port returned by bindresvport(3) is
>> actually defined in /etc/services? Surely there is one open
>> port that can be used. What port does bindresvport(3) start
>> with?

> The logic bug is the count-down loop that closes all the temp sockets.
> In the case where the loop terminates via break, it leaves the last
> one open and only closes the extras. But in the case where the
> loop terminates via the end condition in the for statement, the close
> loop closes all the sockets including the one it intends to use.

OK. Do you have a patch?

Still not clear why it would take 100 tries exactly.


--
Chuck Lever






* Re: Re: nfs-utils broken with musl: "select: Bad file descriptor"
From: Rich Felker @ 2015-08-19  1:51 UTC
  To: Chuck Lever; +Cc: musl

On Tue, Aug 18, 2015 at 06:44:46PM -0700, Chuck Lever wrote:
> 
> On Aug 18, 2015, at 6:24 PM, Rich Felker <dalias@libc.org> wrote:
> 
> > On Tue, Aug 18, 2015 at 06:05:01PM -0700, Chuck Lever wrote:
> >>>> i think this call goes wrong:
> >>>> 
> >>>> 
> >>> http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=utils/statd/rmtcall.c;hb=HEAD#l56
> >>> 
> >>>> 
> >>>> it loops for 100 iterations and if all ports are used
> >>>> according to getservbyport then it FD_SET(sockfd, &SVC_FDSET);
> >>>> with some random high sockfd (eg. 105) that is closed.
> >>>> 
> >>>> ...so should getservbyport fail there?
> >>>> 
> >>>> (according to strace it tries ports 883 to 982)
> >>> 
> >>> I think the application's expectation is that it fail rather than
> >>> returning a decimal-string-only service entity. However it looks like
> >>> the code is written to handle the case where all 100 iterations fail
> >>> to get an anonymous port. The problem seems to be that, when the loop
> >>> stops due to hitting the iteration count rather than exiting with
> >>> break, i has already been incremented past the last tmp_socket slot,
> >>> so the close loop closes the fd that they actually want to use, later
> >>> causing EBADF. This is purely an application bug, but it happens not
> >>> to get noticed if getservbyport fails anywhere along the way, which
> >>> they expect to happen in the usual case.
> >> 
> >> statd_get_socket() is hunting for a privileged source port that
> >> is not just unused at the moment, but that is also not going to be
> >> used by some other well-known service. This is a long-lived socket
> >> that statd uses to communicate with the kernel. It must use a
> >> privileged port.
> >> 
> >> If getservbyport(3) is returning something for every port that
> >> is tried, then statd_get_socket() will fail to find a usable
> >> port.
> >> 
> >> If it's returning 105, that suggests it has run out of retries.
> >> It should return -1 in this case. That is a logic bug.
> >> 
> >> But is it true that every port returned by bindresvport(3) is
> >> actually defined in /etc/services? Surely there is one open
> >> port that can be used. What port does bindresvport(3) start
> >> with?
> 
> > The logic bug is the count-down loop that closes all the temp sockets.
> > In the case where the loop terminates via break, it leaves the last
> > one open and only closes the extras. But in the case where the
> > loop terminates via the end condition in the for statement, the close
> > loop closes all the sockets including the one it intends to use.
> 
> OK. Do you have a patch?

Something like this:

+        if (i == loopcnt) i--;
         while (--i >= 0)
                 close(tmp_sockets[i]);

I don't have a checkout of the source to make a proper diff, and I've
got too many things I'm trying to do at the moment already. But it
should be easy to make a real patch from what I wrote above.
 
> Still not clear why it would take 100 tries exactly.

Because musl's getservbyport never fails, so the loop always uses up
all 100 iterations, and that code path (no port found for which
getservbyport fails) was never tested. We can (and
probably should) make it fail in musl when there's no named service to
go with the port, but that doesn't change that there's a bug in this
previously-untested codepath of nfs-utils that's the source of the
EBADF error from select.

Rich



* Re: Re: nfs-utils broken with musl: "select: Bad file descriptor"
From: Chuck Lever @ 2015-08-19  2:13 UTC
  To: Rich Felker; +Cc: musl


On Aug 18, 2015, at 6:51 PM, Rich Felker <dalias@libc.org> wrote:

> On Tue, Aug 18, 2015 at 06:44:46PM -0700, Chuck Lever wrote:
>> 
>> On Aug 18, 2015, at 6:24 PM, Rich Felker <dalias@libc.org> wrote:
>> 
>>> On Tue, Aug 18, 2015 at 06:05:01PM -0700, Chuck Lever wrote:
>>>>>> i think this call goes wrong:
>>>>>> 
>>>>>> 
>>>>> http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=utils/statd/rmtcall.c;hb=HEAD#l56
>>>>> 
>>>>>> 
>>>>>> it loops for 100 iterations and if all ports are used
>>>>>> according to getservbyport then it FD_SET(sockfd, &SVC_FDSET);
>>>>>> with some random high sockfd (eg. 105) that is closed.
>>>>>> 
>>>>>> ...so should getservbyport fail there?
>>>>>> 
>>>>>> (according to strace it tries ports 883 to 982)
>>>>> 
>>>>> I think the application's expectation is that it fail rather than
>>>>> returning a decimal-string-only service entity. However it looks like
>>>>> the code is written to handle the case where all 100 iterations fail
>>>>> to get an anonymous port. The problem seems to be that, when the loop
>>>>> stops due to hitting the iteration count rather than exiting with
>>>>> break, i has already been incremented past the last tmp_socket slot,
>>>>> so the close loop closes the fd that they actually want to use, later
>>>>> causing EBADF. This is purely an application bug, but it happens not
>>>>> to get noticed if getservbyport fails anywhere along the way, which
>>>>> they expect to happen in the usual case.
>>>> 
>>>> statd_get_socket() is hunting for a privileged source port that
>>>> is not just unused at the moment, but that is also not going to be
>>>> used by some other well-known service. This is a long-lived socket
>>>> that statd uses to communicate with the kernel. It must use a
>>>> privileged port.
>>>> 
>>>> If getservbyport(3) is returning something for every port that
>>>> is tried, then statd_get_socket() will fail to find a usable
>>>> port.
>>>> 
>>>> If it's returning 105, that suggests it has run out of retries.
>>>> It should return -1 in this case. That is a logic bug.
>>>> 
>>>> But is it true that every port returned by bindresvport(3) is
>>>> actually defined in /etc/services? Surely there is one open
>>>> port that can be used. What port does bindresvport(3) start
>>>> with?
>> 
>>> The logic bug is the count-down loop that closes all the temp sockets.
>>> In the case where the loop terminates via break, it leaves the last
>>> one open and only closes the extras. But in the case where the
>>> loop terminates via the end condition in the for statement, the close
>>> loop closes all the sockets including the one it intends to use.
>> 
>> OK. Do you have a patch?
> 
> Something like this:
> 
> +        if (i == loopcnt) i--;
>         while (--i >= 0)
>                 close(tmp_sockets[i]);
> 
> I don't have a checkout of the source to make a proper diff, and I've
> got too many things I'm trying to do at the moment already. But it
> should be easy to make a real patch from what I wrote above.

I'll see if I can put together a proper patch for Steve, then.
But we're both out of the office through at least next week.


>> Still not clear why it would take 100 tries exactly.
> 
> Because musl's getservbyport never fails, so the loop always uses up
> all 100 iterations, and that code path (no port found for which
> getservbyport fails) was never tested. We can (and
> probably should) make it fail in musl when there's no named service to
> go with the port,

Ah ha. So musl's getservbyport(3) never returns NULL?

http://pubs.opengroup.org/onlinepubs/9699919799/functions/endservent.html

suggests that's incorrect behavior.

Why wasn't this a problem before now? The port hunting logic hasn't
changed since January 2013 (commit eb8229338f).


> but that doesn't change that there's a bug in this
> previously-untested codepath of nfs-utils that's the source of the
> EBADF error from select.

Agreed. Just trying to get a full picture.


--
Chuck Lever





