mailing list of musl libc
* [musl] Behavior change in getaddrbyname() with AF_UNSPEC
@ 2023-01-14 22:56 Barry Bond
  2023-01-18 15:26 ` Rich Felker
  0 siblings, 1 reply; 5+ messages in thread
From: Barry Bond @ 2023-01-14 22:56 UTC
  To: musl

This is related to this change, made back in 2020: https://git.musl-libc.org/cgit/musl/commit/?id=5cf1ac2443ad0dba263559a3fe043d929e0e5c4c

In the repro case, getaddrbyname() with AF_UNSPEC sends out two requests ("A" and "AAAA") but gets back only a single response, containing the IPv4 address. There is no IPv6 on the network.

name_from_dns() contains the relevant code. After __res_msend_rc() returns, 'nq' is 2 and 'alens' is [96, 0], indicating an IPv4 response of 96 bytes and no response for IPv6. Then the validation code runs:

        for (i=0; i<nq; i++) {
                if (alens[i] < 4 || (abuf[i][3] & 15) == 2) return EAI_AGAIN;
                if ((abuf[i][3] & 15) == 3) return 0;
                if ((abuf[i][3] & 15) != 0) return EAI_FAIL;
        }

and the result is EAI_AGAIN, because alens[1]==0.

Before this patch, the code would have parsed the IPv4 response via __dns_parse(), failed to parse the empty second response because alens[1] < 12, and returned with ctx.cnt == 1.

I propose adding one new check at the top of the for() loop:
                if (alens[i] == 0) continue; /* response timed out */
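To make the difference concrete, here is a standalone re-creation of the validation loop with the proposed check added behind a flag. It is a sketch, not musl's source: `check_answers`, `skip_missing` and the `ANSWERS_USABLE` sentinel are hypothetical. The `& 15` extracts the RCODE from the low four bits of byte 3 of the DNS header, where 0 is NOERROR, 2 is SERVFAIL and 3 is NXDOMAIN.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>

/* DNS RCODE values (low 4 bits of header byte 3). */
enum { RCODE_NOERROR = 0, RCODE_SERVFAIL = 2, RCODE_NXDOMAIN = 3 };

/* Hypothetical sentinel meaning "all replies look usable, go on to parse
 * them"; chosen to differ from 0 and from the EAI_* codes returned below. */
#define ANSWERS_USABLE 1

/* abuf[i] holds the reply to query i, alens[i] its length (0 if no reply
 * arrived). If skip_missing is nonzero, the proposed check runs first and
 * a query with no reply is skipped instead of failing the lookup. */
static int check_answers(const unsigned char *const abuf[], const int alens[],
                         int nq, int skip_missing)
{
	for (int i = 0; i < nq; i++) {
		if (skip_missing && alens[i] == 0) continue;   /* proposed check */
		if (alens[i] < 4 || (abuf[i][3] & 15) == RCODE_SERVFAIL)
			return EAI_AGAIN;                      /* no usable reply */
		if ((abuf[i][3] & 15) == RCODE_NXDOMAIN)
			return 0;                              /* NXDOMAIN: zero results */
		if ((abuf[i][3] & 15) != RCODE_NOERROR)
			return EAI_FAIL;                       /* any other error */
	}
	return ANSWERS_USABLE;
}
```

With alens = [96, 0] and a NOERROR IPv4 reply, the current loop returns EAI_AGAIN, while the proposed `continue` lets the IPv4 answer through to parsing. Whether that second outcome is correct is what the rest of the thread disputes.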

Thanks!
Barry - Microsoft Azure Sphere


* Re: [musl] Behavior change in getaddrbyname() with AF_UNSPEC
  2023-01-14 22:56 [musl] Behavior change in getaddrbyname() with AF_UNSPEC Barry Bond
@ 2023-01-18 15:26 ` Rich Felker
  2023-01-19 14:58   ` [musl] RE: [EXTERNAL] " Barry Bond
  0 siblings, 1 reply; 5+ messages in thread
From: Rich Felker @ 2023-01-18 15:26 UTC
  To: Barry Bond; +Cc: musl

On Sat, Jan 14, 2023 at 10:56:28PM +0000, Barry Bond wrote:
> This is related to this change:  https://git.musl-libc.org/cgit/musl/commit/?id=5cf1ac2443ad0dba263559a3fe043d929e0e5c4c made back in 2020.
> 
> In the repro case, getaddrbyname() with AF_UNSPEC sends out two
> requests, but only gets back a single response, with the ipv4
> address. There is no ipv6 on the network.
> 
> name_from_dns() contains the relevant code. After __res_msend_rc()
> returns, 'nq' is 2, and 'alens' is [96, 0], indicating that there
> was an ipv4 response of 96 bytes, but no response for ipv6. Then the
> validation code runs:
> 
>                 for (i=0; i<nq; i++) {
>                                 if (alens[i] < 4 || (abuf[i][3] & 15) == 2) return EAI_AGAIN;
>                                 if ((abuf[i][3] & 15) == 3) return 0;
>                                 if ((abuf[i][3] & 15) != 0) return EAI_FAIL;
>                 }
> 
> and the result is EAI_AGAIN, because alens[1]==0.
> 
> Before this patch, the code would have parsed the ipv4 response via
> __dns_parse(), failed to parse the empty second response because
> alens[1]<12, and the function would return with ctx.cnt==1.

That was the wrong behavior that this patch fixed. Previously, the
query was timing out, but because there was an answer to the other
query, we were erroneously hiding the failure from the application and
presenting a timing-/network-congestion-dependent incorrect result
(wrongly claiming only A or only AAAA exist when in fact we didn't get
enough information to make that determination).
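Seen from an application, the fix means getaddrinfo() with AF_UNSPEC can report EAI_AGAIN when one of the two queries goes unanswered, instead of returning a partial result. A minimal caller-side sketch that treats EAI_AGAIN as a temporary failure rather than a final answer; the helpers `lookup` and `classify_gai` and the `lookup_status` enum are hypothetical, while getaddrinfo(), gai_strerror() and the EAI_* codes are standard POSIX.

```c
#define _POSIX_C_SOURCE 200112L
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical three-way outcome for an address lookup. */
enum lookup_status { LOOKUP_OK, LOOKUP_TRY_LATER, LOOKUP_FAILED };

/* Map a getaddrinfo() return code to an outcome. EAI_AGAIN is a temporary
 * failure (e.g. one of the A/AAAA queries got no reply), so the caller may
 * retry later instead of treating the name as having no addresses. */
static enum lookup_status classify_gai(int rc)
{
	if (rc == 0) return LOOKUP_OK;
	if (rc == EAI_AGAIN) return LOOKUP_TRY_LATER;
	return LOOKUP_FAILED;
}

/* Resolve `host` for both address families. On LOOKUP_OK, *res must be
 * released with freeaddrinfo(). */
static enum lookup_status lookup(const char *host, int flags,
                                 struct addrinfo **res)
{
	struct addrinfo hints;
	memset(&hints, 0, sizeof hints);
	hints.ai_family = AF_UNSPEC;      /* ask for IPv4 and IPv6 */
	hints.ai_socktype = SOCK_STREAM;
	hints.ai_flags = flags;

	int rc = getaddrinfo(host, NULL, &hints, res);
	if (rc != 0)
		fprintf(stderr, "%s: %s\n", host, gai_strerror(rc));
	return classify_gai(rc);
}
```

Calling `lookup("127.0.0.1", AI_NUMERICHOST, &res)` makes no DNS query at all, which is a convenient way to exercise the plumbing without a network.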

"There is no ipv6 on the network" is not a reason for the AAAA query
to time out. The ability to look up a particular RR type has nothing
to do with which network protocols are supported on your network. Can
you describe the environment this is happening in and why it might be
happening?

Rich

* [musl] RE: [EXTERNAL] Re: [musl] Behavior change in getaddrbyname() with AF_UNSPEC
  2023-01-18 15:26 ` Rich Felker
@ 2023-01-19 14:58   ` Barry Bond
  2023-01-23  1:34     ` [musl] " Barry Bond
  0 siblings, 1 reply; 5+ messages in thread
From: Barry Bond @ 2023-01-19 14:58 UTC
  To: Rich Felker; +Cc: musl

OK, let me get more data about exactly what happened to the second query.

Barry

* [musl] Re: [EXTERNAL] Re: [musl] Behavior change in getaddrbyname() with AF_UNSPEC
  2023-01-19 14:58   ` [musl] RE: [EXTERNAL] " Barry Bond
@ 2023-01-23  1:34     ` Barry Bond
  2023-01-24  8:54       ` Rich Felker
  0 siblings, 1 reply; 5+ messages in thread
From: Barry Bond @ 2023-01-23  1:34 UTC
  To: Rich Felker; +Cc: musl

I now have more data. The DNS queries are being multicast to a customer's mDNS server implementation. I don't have access to its source code.

Its behavior is to correctly respond to "A" requests, but drop "AAAA" requests completely.

So __res_msend_rc() sends two request messages ("A" and "AAAA") and gets only one response back ("A"). The "AAAA" request is resent, then times out. That leaves alens[0]==54 and alens[1]==0, which leads to the EAI_AGAIN return in name_from_dns().

RFC 8906 (https://datatracker.ietf.org/doc/rfc8906/) seems relevant: section 3.1.2 says DNS servers are expected to answer queries for unknown RR types as if they have no data for them. But that's a best practice, not a hard requirement. Before the relevant change, musl was OK with a non-reply to the AAAA request; now it requires the response.
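For reference, the kind of reply RFC 8906 section 3.1.2 asks for is an ordinary NODATA answer: the query echoed back with the QR bit set, RCODE 0 (NOERROR) and no answer records. A minimal unicast-DNS-style sketch, assuming a plain query with exactly one question and no answer, authority or additional records (so no EDNS OPT record); the helper name `make_nodata_response` is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: build a NODATA reply to query `q` in `out`.
 * Returns the reply length, or -1 if the query is not of the assumed
 * shape (QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0) or `out` is too small. */
static int make_nodata_response(const unsigned char *q, int qlen,
                                unsigned char *out, int outlen)
{
	if (qlen < 12 || outlen < qlen) return -1;
	if (q[4] != 0 || q[5] != 1) return -1;       /* QDCOUNT must be 1 */
	for (int i = 6; i < 12; i++)                 /* other counts must be 0 */
		if (q[i] != 0) return -1;

	memcpy(out, q, qlen);    /* same ID, opcode, RD bit and question */
	out[2] |= 0x80;          /* QR = 1: this is a response */
	out[2] &= 0xFD;          /* TC = 0: not truncated */
	out[3] &= 0xF0;          /* RCODE = 0 (NOERROR) */
	/* ANCOUNT/NSCOUNT/ARCOUNT stay 0: "this name has no such records" */
	return qlen;
}
```

Because its RCODE is 0 and it is longer than 4 bytes, a reply like this passes the name_from_dns() checks quoted at the start of the thread and simply contributes no AAAA records.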

* Re: [musl] Re: [EXTERNAL] Re: [musl] Behavior change in getaddrbyname() with AF_UNSPEC
  2023-01-23  1:34     ` [musl] " Barry Bond
@ 2023-01-24  8:54       ` Rich Felker
  0 siblings, 0 replies; 5+ messages in thread
From: Rich Felker @ 2023-01-24  8:54 UTC
  To: Barry Bond; +Cc: musl

On Mon, Jan 23, 2023 at 01:34:10AM +0000, Barry Bond wrote:
> I now have more data. The DNS queries are being multicast to a
> customer's mDNS server implementation. I don't have source code
> access to it.

In general, pointing the stub resolver at mDNS is not expected to
work. It might work under some conditions if it's pointed *only* at
mDNS, but what people commonly try is pointing resolv.conf at both a
real recursive resolver and mDNS, hoping this will yield a unioned
namespace, which is not how it works. Doing that really requires a
unioning proxy resolver (which can also provide functionality such as
DNSSEC validation, if desired).

> Its behavior is to correctly respond to "A" requests, but drop
> "AAAA" requests completely.
> 
> So __res_msend_rc() is sending two request messages out ("A" and
> "AAAA"), and getting only 1 response back ("A"). The "AAAA" request
> is resent, then times out. That leaves alens[0]==54 and alens[1]==0.
> That leads to the EAI_AGAIN return in name_from_dns().
> 
> RFC 8906 (https://datatracker.ietf.org/doc/rfc8906/) seems relevant:
> section 3.1.2 says that it is expected the DNS servers return a
> response as if it has no data, for unknown rr types. But that's a
> best practice and not a hard requirement. MUSL before the relevant
> change was OK with a non-reply to the AAAA request, but now requires
> the response.

There are lots of things that are perfectly valid for a nameserver to
do but that do not answer the question asked by the stub resolver.
And if the question is not answered, the resolver cannot draw any
conclusion about what to give the application.

Timing out and then returning success anyway was a bug. It only
"worked" for this setup before insofar as someone was happy waiting
5 seconds for each query to "succeed". I'm not sure what the right way
to do what they want is, because I'm not sure exactly what they're
trying to do, but the old behavior isn't it.

Rich
