Date: Tue, 19 Jul 2022 21:54:59 -0400
From: Rich Felker
To: "Nieminen, Jussi"
Cc: "musl@lists.openwall.com"
Message-ID: <20220720015457.GC7074@brightrain.aerifal.cx>
Subject: Re: [musl] Bug in getaddrinfo causing spurious returns with wrong error values

On Tue, Nov 23, 2021 at 02:47:49PM +0000, Nieminen, Jussi wrote:
> Hi,
>
> I'm a developer from the performance monitoring company Dynatrace, and I've been
> recently investigating curious problems at our customers' environments where a
> call to musl's getaddrinfo appears to spuriously return ENOENT when called from
> a node.js application that is being monitored with the Dynatrace agent.
>
> I managed to pinpoint the problem to the code that performs the AI_ADDRCONFIG
> check. If an address family that is not enabled on the host is specified, a call
> to "connect" in that code fails, the socket fd is closed, and the value of
> "errno" is then evaluated.
>
> The problem is that the call to "close" can change the value of errno, which
> will break the switch-case that follows it.
> Especially if aio is used (which is
> the case when the Dynatrace agent is included in the application), the call to
> close will end up setting errno to ENOENT by default (even without a failure)
> within the "aio_cancel" function if an aio operation is active. In such a case
> getaddrinfo will then incorrectly return EAI_SYSTEM with errno set to ENOENT.
>
> (After some error code translations within libuv, node.js will then print an
> error message claiming that getaddrinfo failed with ENOENT which is rather
> confusing.)
>
> Even if aio is not used, the code might fail whenever "close" gets interrupted
> and returns with errno set to EINTR. As the return value of close is not
> checked, the errno might thus "silently" change before getting evaluated with
> the assumption that it still contains the value set when "connect" failed.
>
> Below is a simple patch that should take care of this problem. Let me know if I
> can provide any more information or if there is anything else I can help with.
>
> Thanks,
> Jussi
>
>
> -------------------------------------------------------------------------------
> diff --git a/src/network/getaddrinfo.c b/src/network/getaddrinfo.c
> index efaab306..71809856 100644
> --- a/src/network/getaddrinfo.c
> +++ b/src/network/getaddrinfo.c
> @@ -16,6 +16,7 @@ int getaddrinfo(const char *restrict host, const char *restrict serv, const stru
>  	char canon[256], *outcanon;
>  	int nservs, naddrs, nais, canon_len, i, j, k;
>  	int family = AF_UNSPEC, flags = 0, proto = 0, socktype = 0;
> +	int saved_errno = 0;
>  	struct aibuf *out;
>  
>  	if (!host && !serv) return EAI_NONAME;
> @@ -66,11 +67,14 @@ int getaddrinfo(const char *restrict host, const char *restrict serv, const stru
>  				pthread_setcancelstate(
>  					PTHREAD_CANCEL_DISABLE, &cs);
>  				int r = connect(s, ta[i], tl[i]);
> +				/* The call to "close" might change errno, especially if aio is in use;
> +				 * save the value set by "connect" for the later comparison. */
> +				if (r < 0) saved_errno = errno;
>  				pthread_setcancelstate(cs, 0);
>  				close(s);
>  				if (!r) continue;
>  			}
> -			switch (errno) {
> +			switch (saved_errno) {
>  			case EADDRNOTAVAIL:
>  			case EAFNOSUPPORT:
>  			case EHOSTUNREACH:
> -------------------------------------------------------------------------------

A couple minor problems with the patch:

- The errno from socket() is not used if the failure was from socket().
  I'm not sure yet if that matters but I think it may if IPv6 was
  disabled in a way that makes socket() fail.

- In the case where EAI_SYSTEM is returned, the error was not restored
  back into errno, so the caller cannot get the cause of error if it
  was clobbered by close.

I'll work on a fixed version. I think the right thing to do is just
save/restore errno itself rather than switching on saved_errno.

Rich
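The save/restore approach described in the reply can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not code from musl: `close_keep_errno`, `probe_family`, `family_unconfigured`, and `demo` are hypothetical names, the probe mirrors the AI_ADDRCONFIG check only loosely (a plain UDP socket connected to a caller-supplied address), and the error list contains only the three cases visible in the quoted diff context. The demo uses `close(-1)`, which fails with EBADF, as a deterministic stand-in for close() clobbering errno.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: close fd without letting close() overwrite errno.
 * close() can change errno when it fails (e.g. EINTR) and, per the report
 * above, even when it succeeds if aio_cancel runs inside it. */
static void close_keep_errno(int fd)
{
	int saved_errno = errno;
	close(fd);
	errno = saved_errno;
}

/* Hypothetical probe: returns 0 if a UDP socket of `family` can be
 * connected to `sa`, else -1 with errno holding the error from whichever
 * of socket() or connect() failed. */
static int probe_family(int family, const struct sockaddr *sa, socklen_t len)
{
	int s = socket(family, SOCK_DGRAM, IPPROTO_UDP);
	if (s < 0) return -1;   /* errno comes from socket() */
	int r = connect(s, sa, len);
	close_keep_errno(s);    /* on failure, errno is still connect()'s */
	return r;
}

/* Errors meaning "family not configured" rather than a real failure.
 * Only the cases shown in the quoted diff context are listed. */
static int family_unconfigured(int err)
{
	switch (err) {
	case EADDRNOTAVAIL:
	case EAFNOSUPPORT:
	case EHOSTUNREACH:
		return 1;
	default:
		return 0;
	}
}

/* A bare close() clobbers a previously set error; the helper does not. */
static void demo(void)
{
	errno = EHOSTUNREACH;
	close(-1);
	assert(errno == EBADF);         /* clobbered by close() */

	errno = EHOSTUNREACH;
	close_keep_errno(-1);
	assert(errno == EHOSTUNREACH);  /* preserved */
}
```

With this shape the caller can switch on errno directly after `probe_family()` fails, and if the surrounding code then returns EAI_SYSTEM, errno still holds the cause. That covers both points raised in the reply: a socket() failure is reported through errno, and a connect() failure survives the close().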