mailing list of musl libc
 help / color / mirror / code / Atom feed
* [musl] Bug in getaddrinfo causing spurious returns with wrong error values
@ 2021-11-23 14:47 Nieminen, Jussi
  2021-11-23 15:05 ` Rich Felker
  2022-07-20  1:54 ` Rich Felker
  0 siblings, 2 replies; 4+ messages in thread
From: Nieminen, Jussi @ 2021-11-23 14:47 UTC (permalink / raw)
  To: musl

Hi,

I'm a developer from the performance monitoring company Dynatrace, and I've been
recently investigating curious problems at our customers' environments where a
call to musl's getaddrinfo appears to spuriously return ENOENT when called from
a node.js application that is being monitored with the Dynatrace agent.

I managed to pinpoint the problem to the code that performs the AI_ADDRCONFIG
check. If an address family that is not enabled on the host is specified, a call
to "connect" in that code fails, the socket fd is closed, and the value of
"errno" is then evaluated.

The problem is that the call to "close" can change the value of errno, which
will break the switch-case that follows it. Especially if aio is used (which is
the case when the Dynatrace agent is included in the application), the call to
close will end up setting errno to ENOENT by default (even without a failure)
within the "aio_cancel" function if an aio operation is active. In such a case
getaddrinfo will then incorrectly return EAI_SYSTEM with errno set to ENOENT.

(After some error code translations within libuv, node.js will then print an
error message claiming that getaddrinfo failed with ENOENT which is rather
confusing.)

Even if aio is not used, the code might fail whenever "close" gets interrupted
and returns with errno set to EINTR. As the return value of close is not
checked, the errno might thus "silently" change before getting evaluated with
the assumption that it still contains the value set when "connect" failed.

Below is a simple patch that should take care of this problem. Let me know if I
can provide any more information or if there is anything else I can help with.

Thanks,
Jussi


-------------------------------------------------------------------------------
diff --git a/src/network/getaddrinfo.c b/src/network/getaddrinfo.c
index efaab306..71809856 100644
--- a/src/network/getaddrinfo.c
+++ b/src/network/getaddrinfo.c
@@ -16,6 +16,7 @@ int getaddrinfo(const char *restrict host, const char *restrict serv, const stru
        char canon[256], *outcanon;
        int nservs, naddrs, nais, canon_len, i, j, k;
        int family = AF_UNSPEC, flags = 0, proto = 0, socktype = 0;
+       int saved_errno = 0;
        struct aibuf *out;

        if (!host && !serv) return EAI_NONAME;
@@ -66,11 +67,14 @@ int getaddrinfo(const char *restrict host, const char *restrict serv, const stru
                                pthread_setcancelstate(
                                        PTHREAD_CANCEL_DISABLE, &cs);
                                int r = connect(s, ta[i], tl[i]);
+                               /* The call to "close" might change errno, especially if aio is in use;
+                                * save the value set by "connect" for the later comparison. */
+                               if (r < 0) saved_errno = errno;
                                pthread_setcancelstate(cs, 0);
                                close(s);
                                if (!r) continue;
                        }
-                       switch (errno) {
+                       switch (saved_errno) {
                        case EADDRNOTAVAIL:
                        case EAFNOSUPPORT:
                        case EHOSTUNREACH:
-------------------------------------------------------------------------------

This email may contain confidential information. If it appears this message was sent to you by mistake, please let us know of the error. In this case, we also ask that you do not further forward the content and delete it. Thank you for your cooperation and understanding. Dynatrace Austria GmbH (registration number FN 91482h) is a company registered in Linz whose registered office is at 4020 Linz, Austria, Am Fünfundzwanziger Turm 20.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [musl] Bug in getaddrinfo causing spurious returns with wrong error values
  2021-11-23 14:47 [musl] Bug in getaddrinfo causing spurious returns with wrong error values Nieminen, Jussi
@ 2021-11-23 15:05 ` Rich Felker
  2022-07-19  2:57   ` Rich Felker
  2022-07-20  1:54 ` Rich Felker
  1 sibling, 1 reply; 4+ messages in thread
From: Rich Felker @ 2021-11-23 15:05 UTC (permalink / raw)
  To: Nieminen, Jussi; +Cc: musl

On Tue, Nov 23, 2021 at 02:47:49PM +0000, Nieminen, Jussi wrote:
> Hi,
> 
> I'm a developer from the performance monitoring company Dynatrace, and I've been
> recently investigating curious problems at our customers' environments where a
> call to musl's getaddrinfo appears to spuriously return ENOENT when called from
> a node.js application that is being monitored with the Dynatrace agent.
> 
> I managed to pinpoint the problem to the code that performs the AI_ADDRCONFIG
> check. If an address family that is not enabled on the host is specified, a call
> to "connect" in that code fails, the socket fd is closed, and the value of
> "errno" is then evaluated.
> 
> The problem is that the call to "close" can change the value of errno, which
> will break the switch-case that follows it. Especially if aio is used (which is
> the case when the Dynatrace agent is included in the application), the call to
> close will end up setting errno to ENOENT by default (even without a failure)
> within the "aio_cancel" function if an aio operation is active. In such a case
> getaddrinfo will then incorrectly return EAI_SYSTEM with errno set to ENOENT.
> 
> (After some error code translations within libuv, node.js will then print an
> error message claiming that getaddrinfo failed with ENOENT which is rather
> confusing.)

Indeed, this all makes sense.

> Even if aio is not used, the code might fail whenever "close" gets interrupted
> and returns with errno set to EINTR. As the return value of close is not
> checked, the errno might thus "silently" change before getting evaluated with
> the assumption that it still contains the value set when "connect" failed.

close can't return with EINTR but can return with EINPROGRESS which
would give the same result here, I think.

> Below is a simple patch that should take care of this problem. Let me know if I
> can provide any more information or if there is anything else I can help with.

I think this patch is probably okay. I wondered if close was in the
set of functions POSIX-future intends to require not to touch errno on
success, but it doesn't seem to be, and moreover the EINPROGRESS
semantics would undermine that anyway. So saving errno before calling
close is probably the right thing to do here.

> Thanks,
> Jussi

Thanks for the clear analysis and patch!

> -------------------------------------------------------------------------------
> diff --git a/src/network/getaddrinfo.c b/src/network/getaddrinfo.c
> index efaab306..71809856 100644
> --- a/src/network/getaddrinfo.c
> +++ b/src/network/getaddrinfo.c
> @@ -16,6 +16,7 @@ int getaddrinfo(const char *restrict host, const char *restrict serv, const stru
>         char canon[256], *outcanon;
>         int nservs, naddrs, nais, canon_len, i, j, k;
>         int family = AF_UNSPEC, flags = 0, proto = 0, socktype = 0;
> +       int saved_errno = 0;
>         struct aibuf *out;
> 
>         if (!host && !serv) return EAI_NONAME;
> @@ -66,11 +67,14 @@ int getaddrinfo(const char *restrict host, const char *restrict serv, const stru
>                                 pthread_setcancelstate(
>                                         PTHREAD_CANCEL_DISABLE, &cs);
>                                 int r = connect(s, ta[i], tl[i]);
> +                               /* The call to "close" might change errno, especially if aio is in use;
> +                                * save the value set by "connect" for the later comparison. */
> +                               if (r < 0) saved_errno = errno;
>                                 pthread_setcancelstate(cs, 0);
>                                 close(s);
>                                 if (!r) continue;
>                         }
> -                       switch (errno) {
> +                       switch (saved_errno) {
>                         case EADDRNOTAVAIL:
>                         case EAFNOSUPPORT:
>                         case EHOSTUNREACH:
> -------------------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [musl] Bug in getaddrinfo causing spurious returns with wrong error values
  2021-11-23 15:05 ` Rich Felker
@ 2022-07-19  2:57   ` Rich Felker
  0 siblings, 0 replies; 4+ messages in thread
From: Rich Felker @ 2022-07-19  2:57 UTC (permalink / raw)
  To: Nieminen, Jussi; +Cc: musl

On Tue, Nov 23, 2021 at 10:05:37AM -0500, Rich Felker wrote:
> On Tue, Nov 23, 2021 at 02:47:49PM +0000, Nieminen, Jussi wrote:
> > Hi,
> > 
> > I'm a developer from the performance monitoring company Dynatrace, and I've been
> > recently investigating curious problems at our customers' environments where a
> > call to musl's getaddrinfo appears to spuriously return ENOENT when called from
> > a node.js application that is being monitored with the Dynatrace agent.
> > 
> > I managed to pinpoint the problem to the code that performs the AI_ADDRCONFIG
> > check. If an address family that is not enabled on the host is specified, a call
> > to "connect" in that code fails, the socket fd is closed, and the value of
> > "errno" is then evaluated.
> > 
> > The problem is that the call to "close" can change the value of errno, which
> > will break the switch-case that follows it. Especially if aio is used (which is
> > the case when the Dynatrace agent is included in the application), the call to
> > close will end up setting errno to ENOENT by default (even without a failure)
> > within the "aio_cancel" function if an aio operation is active. In such a case
> > getaddrinfo will then incorrectly return EAI_SYSTEM with errno set to ENOENT.
> > 
> > (After some error code translations within libuv, node.js will then print an
> > error message claiming that getaddrinfo failed with ENOENT which is rather
> > confusing.)
> 
> Indeed, this all makes sense.
> 
> > Even if aio is not used, the code might fail whenever "close" gets interrupted
> > and returns with errno set to EINTR. As the return value of close is not
> > checked, the errno might thus "silently" change before getting evaluated with
> > the assumption that it still contains the value set when "connect" failed.
> 
> close can't return with EINTR but can return with EINPROGRESS which
> would give the same result here, I think.

I was mistaken about this. EINPROGRESS was removed back in commit
6bea5dc69892cd9ff0c222474e7dd468c29dfa75 (2015) and is no longer
possible. However, as the original report stated, ENOENT comes from
__aio_close where errno is clobbered even on success.

On looking at this again, I'm thinking we might should rework the
stuff causing that to happen, but the patch submitted still looks
reasonable since close does not have any contract not to clobber
errno.

> > Below is a simple patch that should take care of this problem. Let me know if I
> > can provide any more information or if there is anything else I can help with.
> 
> I think this patch is probably okay. I wondered if close was in the
> set of functions POSIX-future intends to require not to touch errno on
> success, but it doesn't seem to be, and moreover the EINPROGRESS
> semantics would undermine that anyway. So saving errno before calling
> close is probably the right thing to do here.
> 
> > Thanks,
> > Jussi
> 
> Thanks for the clear analysis and patch!

If nobody objects/sees anything wrong, I'll go ahead and take this
now. Sorry for the long delay!

Rich

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [musl] Bug in getaddrinfo causing spurious returns with wrong error values
  2021-11-23 14:47 [musl] Bug in getaddrinfo causing spurious returns with wrong error values Nieminen, Jussi
  2021-11-23 15:05 ` Rich Felker
@ 2022-07-20  1:54 ` Rich Felker
  1 sibling, 0 replies; 4+ messages in thread
From: Rich Felker @ 2022-07-20  1:54 UTC (permalink / raw)
  To: Nieminen, Jussi; +Cc: musl

On Tue, Nov 23, 2021 at 02:47:49PM +0000, Nieminen, Jussi wrote:
> Hi,
> 
> I'm a developer from the performance monitoring company Dynatrace, and I've been
> recently investigating curious problems at our customers' environments where a
> call to musl's getaddrinfo appears to spuriously return ENOENT when called from
> a node.js application that is being monitored with the Dynatrace agent.
> 
> I managed to pinpoint the problem to the code that performs the AI_ADDRCONFIG
> check. If an address family that is not enabled on the host is specified, a call
> to "connect" in that code fails, the socket fd is closed, and the value of
> "errno" is then evaluated.
> 
> The problem is that the call to "close" can change the value of errno, which
> will break the switch-case that follows it. Especially if aio is used (which is
> the case when the Dynatrace agent is included in the application), the call to
> close will end up setting errno to ENOENT by default (even without a failure)
> within the "aio_cancel" function if an aio operation is active. In such a case
> getaddrinfo will then incorrectly return EAI_SYSTEM with errno set to ENOENT.
> 
> (After some error code translations within libuv, node.js will then print an
> error message claiming that getaddrinfo failed with ENOENT which is rather
> confusing.)
> 
> Even if aio is not used, the code might fail whenever "close" gets interrupted
> and returns with errno set to EINTR. As the return value of close is not
> checked, the errno might thus "silently" change before getting evaluated with
> the assumption that it still contains the value set when "connect" failed.
> 
> Below is a simple patch that should take care of this problem. Let me know if I
> can provide any more information or if there is anything else I can help with.
> 
> Thanks,
> Jussi
> 
> 
> -------------------------------------------------------------------------------
> diff --git a/src/network/getaddrinfo.c b/src/network/getaddrinfo.c
> index efaab306..71809856 100644
> --- a/src/network/getaddrinfo.c
> +++ b/src/network/getaddrinfo.c
> @@ -16,6 +16,7 @@ int getaddrinfo(const char *restrict host, const char *restrict serv, const stru
>         char canon[256], *outcanon;
>         int nservs, naddrs, nais, canon_len, i, j, k;
>         int family = AF_UNSPEC, flags = 0, proto = 0, socktype = 0;
> +       int saved_errno = 0;
>         struct aibuf *out;
> 
>         if (!host && !serv) return EAI_NONAME;
> @@ -66,11 +67,14 @@ int getaddrinfo(const char *restrict host, const char *restrict serv, const stru
>                                 pthread_setcancelstate(
>                                         PTHREAD_CANCEL_DISABLE, &cs);
>                                 int r = connect(s, ta[i], tl[i]);
> +                               /* The call to "close" might change errno, especially if aio is in use;
> +                                * save the value set by "connect" for the later comparison. */
> +                               if (r < 0) saved_errno = errno;
>                                 pthread_setcancelstate(cs, 0);
>                                 close(s);
>                                 if (!r) continue;
>                         }
> -                       switch (errno) {
> +                       switch (saved_errno) {
>                         case EADDRNOTAVAIL:
>                         case EAFNOSUPPORT:
>                         case EHOSTUNREACH:
> -------------------------------------------------------------------------------

A couple minor problems with the patch:

- The errno from socket() is not used if the failure was from
  socket(). I'm not sure yet if that matters but I think it may if
  IPv6 was disabled in a way that makes socket() fail.

- In the case where EAI_SYSTEM is returned, the error was not restored
  back into errno, so the caller cannot get the cause of error if it
  was clobbered by close.

I'll work on a fixed version. I think the right thing to do is just
save/restore errno itself rather than switching on saved_errno.

Rich

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-07-20  1:55 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-23 14:47 [musl] Bug in getaddrinfo causing spurious returns with wrong error values Nieminen, Jussi
2021-11-23 15:05 ` Rich Felker
2022-07-19  2:57   ` Rich Felker
2022-07-20  1:54 ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).