* [musl] lookup_name issue with search domains @ 2022-12-04 4:02 Kenny MacDermid 2022-12-04 5:45 ` Markus Wichmann 0 siblings, 1 reply; 9+ messages in thread From: Kenny MacDermid @ 2022-12-04 4:02 UTC (permalink / raw) To: musl Hello, I'm seeing an issue in resolving hosts when any resolv.conf search domain returns a no-data response. In debugging I believe it's caused by the check in network/lookup_name.c, line 225: if (cnt) return cnt; The code is looping through the search domains trying each one. This works fine for some of my search domains because the DNS response will have reply code flags set to 3, which causes name_from_dns() to return 0. The issue arises when it queries my cloudflare hosted domain (which also uses dnssec). That query does not have the reply code flags set to 3. Instead it's set to 0. This results in name_from_dns() returning EAI_NODATA. Because of the above mentioned check, this value is directly returned and subsequent domains (and most importantly the domain without anything appended) are not tested. When I replaced the condition with `(cnt > 0)` it worked for me. I'm not sure that's the best solution, but I also can't see a reason to stop attempting to lookup the host because an unrelated host caused some error. To add some context, this was seen in a golang program running on a kind/Kubernetes cluster. In these clusters ndots is set to 5 so pretty much every name is first checked against the search list. When using the golang resolver with `GODEBUG=netdns=go` I do not see the same issue. ^ permalink raw reply [flat|nested] 9+ messages in thread
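To make the reported behaviour concrete, here is a minimal sketch of the two stop conditions being compared. It is not musl's code: `search_results`, `SKETCH_EAI_NODATA` and the array of per-domain results are hypothetical stand-ins for the loop over search domains and for what name_from_dns() returns for each one (a positive count, 0 for "name does not exist", or a negative error such as EAI_NODATA).

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for EAI_NODATA; the real constant comes from
 * <netdb.h> and its value is implementation-defined. */
#define SKETCH_EAI_NODATA (-5)

/* results[i] is the hypothetical outcome of looking the name up under
 * search domain i: > 0 addresses found, 0 name does not exist, < 0 error.
 * `bare` is the outcome for the name with nothing appended, which is
 * only reached if no search domain stops the loop.
 * stop_on_error != 0 models `if (cnt) return cnt;`
 * stop_on_error == 0 models `if (cnt > 0) return cnt;` */
static int search_results(const int *results, size_t n, int bare,
                          int stop_on_error)
{
    assert(results != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int cnt = results[i];
        if (stop_on_error ? cnt != 0 : cnt > 0)
            return cnt;
    }
    return bare;
}
```

With results of {0, SKETCH_EAI_NODATA, 0} and a bare name that resolves to one address, the first condition returns the error from the second domain, while the second condition falls through and returns 1.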
* Re: [musl] lookup_name issue with search domains 2022-12-04 4:02 [musl] lookup_name issue with search domains Kenny MacDermid @ 2022-12-04 5:45 ` Markus Wichmann 2022-12-04 15:31 ` Rich Felker 0 siblings, 1 reply; 9+ messages in thread From: Markus Wichmann @ 2022-12-04 5:45 UTC (permalink / raw) To: musl On Sun, Dec 04, 2022 at 12:02:54AM -0400, Kenny MacDermid wrote: > The issue arises when it queries my cloudflare hosted domain (which also > uses dnssec). That query does not have the reply code flags set to 3. > Instead it's set to 0. This results in name_from_dns() returning > EAI_NODATA. I think we had that report before. The problem is that cloudflare is wrong here. DNS response with empty data section and NOERROR status means the domain name exists, but has no records of the requested type. If cloudflare is reporting that for a name where that isn't true, they are making a mistake. This is a cloudflare-specific break with the DNS standards (don't ask me which, though), so we probably won't change musl to deal with this. Simplest solution for the known-bad actor is to write a proxy server that turns the wrong answers into correct ones. Ciao, Markus ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [musl] lookup_name issue with search domains 2022-12-04 5:45 ` Markus Wichmann @ 2022-12-04 15:31 ` Rich Felker 2022-12-04 23:04 ` Kenny MacDermid 0 siblings, 1 reply; 9+ messages in thread From: Rich Felker @ 2022-12-04 15:31 UTC (permalink / raw) To: Markus Wichmann; +Cc: musl On Sun, Dec 04, 2022 at 06:45:59AM +0100, Markus Wichmann wrote: > On Sun, Dec 04, 2022 at 12:02:54AM -0400, Kenny MacDermid wrote: > > The issue arises when it queries my cloudflare hosted domain (which also > > uses dnssec). That query does not have the reply code flags set to 3. > > Instead it's set to 0. This results in name_from_dns() returning > > EAI_NODATA. > > I think we had that report before. The problem is that cloudflare is > wrong here. DNS response with empty data section and NOERROR status > means the domain name exists, but has no records of the requested type. > If cloudflare is reporting that for a name where that isn't true, they > are making a mistake. > > This is a cloudflare-specific break with the DNS standards (don't ask me > which, though), so we probably won't change musl to deal with this. > Simplest solution for the known-bad actor is to write a proxy server > that turns the wrong answers into correct ones. It's not that we just won't accommodate what Cloudflare is doing, but that Cloudflare is returning data that *means something different* and for which the only correct behavior (that wouldn't break consistency for other results where the provider is using DNS semantics correctly) is what we're doing. Cloudflare is lying "this name exists but has no RRs of the type you requested" when it should be saying "this name does not exist". This is a consequence of an optimization they did to make it easier for them to implement DNSSEC dynamically without having to follow the way NSEC records work right. Rich ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [musl] lookup_name issue with search domains 2022-12-04 15:31 ` Rich Felker @ 2022-12-04 23:04 ` Kenny MacDermid 2022-12-05 13:26 ` Rich Felker 0 siblings, 1 reply; 9+ messages in thread From: Kenny MacDermid @ 2022-12-04 23:04 UTC (permalink / raw) To: musl On Sun, Dec 04, 2022 at 10:31:33AM -0500, Rich Felker wrote: > On Sun, Dec 04, 2022 at 06:45:59AM +0100, Markus Wichmann wrote: > > On Sun, Dec 04, 2022 at 12:02:54AM -0400, Kenny MacDermid wrote: > > > The issue arises when it queries my cloudflare hosted domain > > > (which also uses dnssec). That query does not have the reply code > > > flags set to 3. Instead it's set to 0. This results in > > > name_from_dns() returning EAI_NODATA. > > > > I think we had that report before. The problem is that cloudflare is > > wrong here. DNS response with empty data section and NOERROR status > > means the domain name exists, but has no records of the requested > > type. If cloudflare is reporting that for a name where that isn't > > true, they are making a mistake. > > > > This is a cloudflare-specific break with the DNS standards (don't > > ask me which, though), so we probably won't change musl to deal with > > this. Simplest solution for the known-bad actor is to write a proxy > > server that turns the wrong answers into correct ones. > > It's not that we just won't accommodate what Cloudflare is doing, but > that Cloudflare is returning data that *means something different* and > for which the only correct behavior (that wouldn't break consistency > for other results where the provider is using DNS semantics correctly) > is what we're doing. Well, I guess the “It’s always DNS” meme strikes again. Do you happen to have a reference to the RFC that Cloudflare isn't following by returning what they do? The blog post I found on the topic /claims/ they're compliant[1]. Either way it's unfortunate that musl handles this differently than others like glibc, the BSD libc, and Go. 
[1]: https://blog.cloudflare.com/black-lies/
* Re: [musl] lookup_name issue with search domains 2022-12-04 23:04 ` Kenny MacDermid @ 2022-12-05 13:26 ` Rich Felker 2022-12-05 20:11 ` Kenny MacDermid 0 siblings, 1 reply; 9+ messages in thread From: Rich Felker @ 2022-12-05 13:26 UTC (permalink / raw) To: Kenny MacDermid; +Cc: musl On Sun, Dec 04, 2022 at 07:04:10PM -0400, Kenny MacDermid wrote: > On Sun, Dec 04, 2022 at 10:31:33AM -0500, Rich Felker wrote: > > On Sun, Dec 04, 2022 at 06:45:59AM +0100, Markus Wichmann wrote: > > > On Sun, Dec 04, 2022 at 12:02:54AM -0400, Kenny MacDermid wrote: > > > > The issue arises when it queries my cloudflare hosted domain > > > > (which also uses dnssec). That query does not have the reply code > > > > flags set to 3. Instead it's set to 0. This results in > > > > name_from_dns() returning EAI_NODATA. > > > > > > I think we had that report before. The problem is that cloudflare is > > > wrong here. DNS response with empty data section and NOERROR status > > > means the domain name exists, but has no records of the requested > > > type. If cloudflare is reporting that for a name where that isn't > > > true, they are making a mistake. > > > > > > This is a cloudflare-specific break with the DNS standards (don't > > > ask me which, though), so we probably won't change musl to deal with > > > this. Simplest solution for the known-bad actor is to write a proxy > > > server that turns the wrong answers into correct ones. > > > > It's not that we just won't accommodate what Cloudflare is doing, but > > that Cloudflare is returning data that *means something different* and > > for which the only correct behavior (that wouldn't break consistency > > for other results where the provider is using DNS semantics correctly) > > is what we're doing. > > Well, I guess the “It’s always DNS” meme strikes again. > > Do you happen to have a reference to the RFC that Cloudflare isn't > following by returning what they do? The blog post I found on the > topic /claims/ they're compliant[1]. 
>
> Either way it's unfortunate that musl handles this differently than
> others like glibc, the BSD libc, and Go.
>
> [1]: https://blog.cloudflare.com/black-lies/

You're not going to find anything saying they're not "compliant"
because that's not the problem. The responses they're giving are
well-formed, consistent, and not breaking any rules of DNS from the
perspective of someone making queries who does not have any prior
expectation for what the queried zones contain. The problem is just
that the responses *mean something different than what you intended*.

As an analogy, you could imagine a DNS provider adding some sort of
TXT records to every name in your zone. Nothing about DNS says they
can't -- these are valid records that can exist anywhere -- but they'd
be serving something different than what you asked them to.

In this case, Cloudflare is effectively making *every possible* name
under your zone exist, but with no RRs defined for it unless you
provided some. This is contrary to your intent that names you didn't
define simply not exist.

The solutions here are basically:

- Turn off DNSSEC (not good), or

- Use a different DNS provider that doesn't munge your zones, or

- Don't use any functionality that depends on ability to distinguish
  NODATA from NxDomain for the names under your zone, and accept that
  everything is going to be NODATA. (In particular, don't use "search"
  on it.)

If you want to search out other sources on the topic, "nodata vs
nxdomain" is a good query.

Rich
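For readers unfamiliar with the NODATA/NxDomain distinction, the following is a rough sketch of how a stub resolver could tell them apart from the 12-byte DNS header alone. The names `classify_reply` and `enum reply_kind` are made up for illustration and are not musl's. It is also a simplification: a reply whose answer section holds only a CNAME is counted as an answer here, although it carries no address either.

```c
#include <assert.h>
#include <stddef.h>

enum reply_kind {
    REPLY_MALFORMED,   /* shorter than a DNS header */
    REPLY_ANSWER,      /* RCODE 0, at least one answer RR */
    REPLY_NODATA,      /* RCODE 0, empty answer section */
    REPLY_NXDOMAIN,    /* RCODE 3, the name does not exist */
    REPLY_OTHER_ERROR  /* any other RCODE (SERVFAIL, REFUSED, ...) */
};

static enum reply_kind classify_reply(const unsigned char *msg, size_t len)
{
    assert(msg != NULL || len == 0);
    if (len < 12)
        return REPLY_MALFORMED;

    int rcode = msg[3] & 0x0f;                         /* low 4 bits of byte 3 */
    unsigned ancount = (unsigned)msg[6] << 8 | msg[7]; /* big-endian ANCOUNT */

    if (rcode == 3)
        return REPLY_NXDOMAIN;
    if (rcode != 0)
        return REPLY_OTHER_ERROR;
    return ancount ? REPLY_ANSWER : REPLY_NODATA;
}
```

The replies described in this thread fall into the REPLY_NODATA case for names that were never defined in the zone, where REPLY_NXDOMAIN would have been expected.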
* Re: [musl] lookup_name issue with search domains 2022-12-05 13:26 ` Rich Felker @ 2022-12-05 20:11 ` Kenny MacDermid 2022-12-05 22:25 ` Quentin Rameau 0 siblings, 1 reply; 9+ messages in thread From: Kenny MacDermid @ 2022-12-05 20:11 UTC (permalink / raw) To: musl On Mon, Dec 05, 2022 at 08:26:05AM -0500, Rich Felker wrote: > As an analogy, you could imagine a DNS provider adding some sort of > TXT records to every name in your zone. Nothing about DNS says they > can't -- these are valid records that can exist anywhere -- but they'd > be serving something different than what you asked them to. > > In this case, Cloudflare is effectively making *every possible* name > under your zone exist, but with no RRs defined for it unless you > provided some. This is contrary to your intent that names you didn't > define simply not exist. Thank you for all the information Rich. I'm in no way trying to be argumentative here, and am not claiming to be a DNS expert. I'm just trying to provide another view of the issue. In providing a different perspective I think the analogy is a good place to start. Let's say we take it a bit further and say it wasn't the DNS provider changing things. Say I added an MX record to a domain. The API that's in question is called `gethostbyname*`. It's not getTXT, or getMX or anything like that. When calling that I don't care if a name exists, I care if a host exists. As such I expect the API to only look at host records (and possibly dnssec that protect them). I wouldn't really care if there was 10 odd new record types, if there's no host records then there's no host at that name. From my understanding of what you're saying: if the query response doesn't contain error flags , it's indicating the name exists. That's fine, the name exists. That doesn't mean the host exists. The response that comes back has zero 'Answer RRs'. If searching should now stop because the host was found, what's it's address? 
Reading a Linux man page on `resolv.conf` it says of the "Search list for host-name lookup": >> Resolver queries having fewer than ndots dots (default is 1) in them >> will be attempted using each component of the search path in turn >> until a match is found. In the case where I have 3 search list entries, has a host match been found because the second domain has an MX record? It doesn't seem like it to me. From a glance for empty answers in RFC1034 I see section 6.2.4 has: NAME=SRI-NIC.ARPA, QTYPE=NS This query could return without any error but the RFC says: >> The only difference between the response and the query is the AA and >> RESPONSE bits in the header. The interpretation of this response is >> that the server is authoritative for the name, and the name exists, >> but no RRs of type NS are present there. That sounds to me like what Cloudflare is doing. They're saying they're the authority for the name, and no A records exist. So I guess it comes down to the question: Does this match a host? ^ permalink raw reply [flat|nested] 9+ messages in thread
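As a small illustration of the ndots rule quoted from the man page, the sketch below decides whether a name would be tried against the search list at all. The helper names `count_dots` and `wants_search` are hypothetical, and treating a trailing dot as "fully qualified, skip the search list" is an assumption based on common resolver convention rather than on anything stated in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of '.' characters in the name. */
static size_t count_dots(const char *name)
{
    size_t dots = 0;
    for (; *name; name++)
        if (*name == '.')
            dots++;
    return dots;
}

/* Returns 1 if the name has fewer than `ndots` dots and should
 * therefore be tried with the search domains appended. */
static int wants_search(const char *name, size_t ndots)
{
    assert(name != NULL);
    size_t len = strlen(name);
    if (len > 0 && name[len - 1] == '.')
        return 0; /* assumed: trailing dot marks an absolute name */
    return count_dots(name) < ndots;
}
```

With ndots set to 5, as in the Kubernetes clusters mentioned at the start of the thread, a name such as api.example.com (two dots) goes through the search list; with the default of 1 it does not.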
* Re: [musl] lookup_name issue with search domains
From: Quentin Rameau @ 2022-12-05 22:25 UTC
To: musl

Hi Kenny,

> The API that's in question is called `gethostbyname*`. It's not getTXT,
> or getMX or anything like that. When calling that I don't care if a name
> exists, I care if a host exists. As such I expect the API to only look
> at host records (and possibly dnssec that protect them). I wouldn't
> really care if there was 10 odd new record types, if there's no host
> records then there's no host at that name.

Indeed, and that's what you get there.
The DNS server is telling you it's authoritative
(you won't get a better or different answer from anybody else),
the name exists, but it's without an (IPv4) address.

You get the error NO_DATA and your request ends there,
as the authoritative entity of the domain told you so.

> From my understanding of what you're saying: if the query response
> doesn't contain error flags, it's indicating the name exists. That's
> fine, the name exists. That doesn't mean the host exists. The response
> that comes back has zero 'Answer RRs'. If searching should now stop
> because the host was found, what's its address?

Searching ends there because the host was found by name,
and the server said it doesn't have an associated address.

> Reading a Linux man page on `resolv.conf` it says of the "Search list
> for host-name lookup":
>
> >> Resolver queries having fewer than ndots dots (default is 1) in them
> >> will be attempted using each component of the search path in turn
> >> until a match is found.

> So I guess it comes down to the question: Does this match a host?

This matches a host, with no configured AF_INET address.
* Re: [musl] lookup_name issue with search domains 2022-12-05 22:25 ` Quentin Rameau @ 2022-12-06 5:19 ` Kenny MacDermid 2022-12-06 9:57 ` Quentin Rameau 0 siblings, 1 reply; 9+ messages in thread From: Kenny MacDermid @ 2022-12-06 5:19 UTC (permalink / raw) To: musl On Mon, Dec 05, 2022 at 11:25:06PM +0100, Quentin Rameau wrote: > Hi Kenny, > > > The API that's in question is called `gethostbyname*`. It's not > > getTXT, or getMX or anything like that. When calling that I don't > > care if a name exists, I care if a host exists. As such I expect the > > API to only look at host records (and possibly dnssec that protect > > them). I wouldn't really care if there was 10 odd new record types, > > if there's no host records then there's no host at that name. > > Indeed, and that's what you get there. > The DNS server is telling you it's authoritative > (you'll get no better different answer from somebody else), > the name exists, but its without an (IPv4) address. The name exists, yes, but does the _host_ exist? > Searching ends there because the host was found by name, > and the server said it doesn't have an associated address. Except a host wasn't found, just the name. To put an example to it, please point to the host that is 'notahost.macdermid.ca'. There is a TXT record for that domain name, yet I don't see how that creates a host. > > So I guess it comes down to the question: Does this match a host? > > This matches a host, with no configured AF_INET address. That would only be the case if we considered every domain name a host. I haven't found anything that specifies that particular limitation on DNS. If anything it seems MX records would be a counter-example. Also from RFC 1034: >>> We should be able to use names to retrieve host addresses, mailbox >>> data, and other as yet undetermined information. All data >>> associated with a name is tagged with a type, and queries can be >>> limited to a single type. Note it doesn't say 'data associated with a host'. 
I hope you don't feel I'm just being pedantic here. I'm simply trying to
explain how we see domain names differently, and why I don't understand
this particular difference between libc implementations.

To me I own a domain and can create records in that domain. If I happen
to point some names at hosts using A/AAAA records, great. If other names
have TXT, MX, or some other record type, well I don't feel I've created
a host-missing-an-A/AAAA.

And maybe I'm wrong. Maybe other libc's should be following musl and for
a name to exist automatically makes it a host (although in that case,
would musl be being pedantic in not supporting cloudflare?). Either way
hopefully you understand better why it's confusing to me, and why people
are bitten by this decision.
* Re: [musl] lookup_name issue with search domains 2022-12-06 5:19 ` Kenny MacDermid @ 2022-12-06 9:57 ` Quentin Rameau 0 siblings, 0 replies; 9+ messages in thread From: Quentin Rameau @ 2022-12-06 9:57 UTC (permalink / raw) To: musl > The name exists, yes, but does the _host_ exist? Yes it exists, that's what the authoritative DNS server answered. The host exists because it was identified by its name. The server, though, also told you that this host doesn't have a record of the type you queried for, and that's final. > > Searching ends there because the host was found by name, > > and the server said it doesn't have an associated address. > > Except a host wasn't found, just the name. To put an example to it, > please point to the host that is 'notahost.macdermid.ca'. There is a > TXT record for that domain name, yet I don't see how that creates a host. Yes, the host was found, otherwise the server would have answered with an “NXDomain” meaning it doesn't know this name. So what it tells you here, is that it is responsible for a host with the name notahost.macdermid.ca. If there are no address records associated with that name and you ask for some, then the server will tell you there isn't any with “NoData”, no error, but the data you asked for doesn't exist. If you ask for that TXT record, it'll give the answer. In both cases, it tells you that the host exists (no error). > > > So I guess it comes down to the question: Does this match a host? > > > > This matches a host, with no configured AF_INET address. > > That would only be the case if we considered every domain name a host. > I haven't found anything that specifies that particular limitation on > DNS. If anything it seems MX records would be a counter-example. Also > from RFC 1034: > > >>> We should be able to use names to retrieve host addresses, mailbox > >>> data, and other as yet undetermined information. 
All data
> >>> associated with a name is tagged with a type, and queries can be
> >>> limited to a single type.
>
> Note it doesn't say 'data associated with a host'.

Indeed! All data of a host is associated with a (host) name.
So if you get a positive answer to a host name,
but there is no actual data associated with it,
then there is no such data.

> I hope you don't feel I'm just being pedantic here. I'm simply trying to
> explain how we see domain names differently, and why I don't understand
> this particular difference between libc implementations.

I kind of feel it's actually the opposite.
It seems that you interpret “host” as an independent virtual concept.

In the context of DNS, a host is identified by a name.
If a server answers it's responsible for that host name, then it exists.
If it also tells you there is no record of the type you queried for,
then it doesn't have any of those.

> To me I own a domain and can create records in that domain. If I happen
> to point some names at hosts using A/AAAA records, great. If other names
> have TXT, MX, or some other record type, well I don't feel I've created
> a host-missing-an-A/AAAA.

But you actually did, that's the point.

> And maybe I'm wrong. Maybe other libc's should be following musl and for
> a name to exist automatically makes it a host (although in that case,
> would musl be being pedantic in not supporting cloudflare?). Either way
> hopefully you understand better why it's confusing to me, and why people
> are bitten by this decision.

Yes, again, hosts, names, and addresses do not exist independently
of each other.
A host is identified by a name, it's a host name.
It can have addresses, it's called a host address.
It can have other properties, it'd be called a host property.

--

It seems that this whole discussion is not really about nodata or
nxdomain, but about what you yourself expect from gethostbyname(3),
and the search directive of resolv.conf.
Note that the former is deprecated, and the latter not standardized.

Regarding the API, it's pretty clear:

- [HOST_NOT_FOUND] No such host is known.

  Meaning that this server isn't responsible for that host
  (and you would ask another one if you're searching for it)

- [NO_DATA] The server recognized the request and the name, but no
  address is available. Another type of request to the name server
  for the domain might return an answer.

  Meaning you found the correct server responsible for that host.
  This host doesn't have an address associated with it,
  but it might have another type associated with it, like an MX address.

Regarding the resolv.conf search directive, as it's not properly
agreed on nor well written (documentation-wise), it is up to
interpretation and one's own idea of what's correct and sane to do.

Should the resolver spam all servers of the directive until some
(most likely none) answers your actual request, even if the first one
told you it's responsible for it and your requested data doesn't exist?
Or should it respect what the server tells you in the first place?
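The reading of the two error codes given above can be written down as a small decision function. This is only a sketch of that interpretation, not code from musl or any other libc; `should_try_next_domain` is a hypothetical name, and the HOST_NOT_FOUND and NO_DATA constants are the h_errno values from <netdb.h>, which may need a feature-test macro such as _DEFAULT_SOURCE to be visible.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* expose the h_errno constants in <netdb.h> */
#endif
#include <assert.h>
#include <netdb.h>

/* Under the interpretation argued in this message: keep walking the
 * search list only when the server said the name does not exist.
 * NO_DATA means the name was recognized, so the answer is final. */
static int should_try_next_domain(int err)
{
    assert(err != 0); /* 0 means success; there is nothing to decide */
    switch (err) {
    case HOST_NOT_FOUND:
        return 1; /* name unknown here, try the next candidate */
    case NO_DATA:
        return 0; /* name exists but has no address: stop */
    default:
        return 0; /* TRY_AGAIN, NO_RECOVERY, ...: report the failure */
    }
}
```

The opposing view in the thread amounts to returning 1 for NO_DATA as well, which is what the `(cnt > 0)` change in the first message would do.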