mailing list of musl libc
* [musl] lookup_name issue with search domains
From: Kenny MacDermid @ 2022-12-04  4:02 UTC
  To: musl

Hello,

I'm seeing an issue in resolving hosts when any resolv.conf search
domain returns a no-data response. In debugging I believe it's caused by
the check in network/lookup_name.c, line 225:

if (cnt) return cnt;

The code is looping through the search domains trying each one.
This works fine for some of my search domains because the DNS response
will have reply code flags set to 3, which causes name_from_dns() to
return 0.

The issue arises when it queries my cloudflare hosted domain (which also
uses dnssec). That query does not have the reply code flags set to 3.
Instead it's set to 0. This results in name_from_dns() returning
EAI_NODATA.

Because of the above mentioned check, this value is directly returned
and subsequent domains (and most importantly the domain without anything
appended) are not tested.

When I replaced the condition with `(cnt > 0)` it worked for me. I'm not
sure that's the best solution, but I also can't see a reason to stop
attempting to look up the host because an unrelated host caused some
error.
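
To make the difference between the two conditions concrete, here is a
minimal, self-contained sketch of the search loop. It is not musl's
code: fake_name_from_dns(), search_lookup(), SKETCH_NODATA and the
example domains are hypothetical names made up for illustration. The
stub only imitates the return convention described above (a positive
count on success, 0 for "no such name", a negative value for an error
such as EAI_NODATA).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical error value for this sketch; not musl's EAI_NODATA. */
#define SKETCH_NODATA (-5)

/* Stand-in for name_from_dns(). Assumed behaviour, for illustration only:
 * every name under "example.net" gets a NOERROR reply with no records,
 * the bare name "db.example.org" has one address, and anything else
 * gets NXDOMAIN (reported as 0). */
int fake_name_from_dns(const char *fqdn)
{
    const char *suffix = ".example.net";
    size_t n = strlen(fqdn), s = strlen(suffix);

    if (n >= s && strcmp(fqdn + n - s, suffix) == 0) return SKETCH_NODATA;
    if (strcmp(fqdn, "db.example.org") == 0) return 1;
    return 0;
}

/* Try name.domain for each search domain, then the bare name.
 * stop_on_any_nonzero != 0 models `if (cnt) return cnt;`
 * stop_on_any_nonzero == 0 models `if (cnt > 0) return cnt;` */
int search_lookup(const char *name, const char *const *domains,
                  size_t ndomains, int stop_on_any_nonzero)
{
    char fqdn[256];

    assert(name != NULL && domains != NULL);
    for (size_t i = 0; i < ndomains; i++) {
        int len = snprintf(fqdn, sizeof fqdn, "%s.%s", name, domains[i]);
        if (len < 0 || (size_t)len >= sizeof fqdn) continue; /* too long */

        int cnt = fake_name_from_dns(fqdn);
        if (stop_on_any_nonzero ? cnt != 0 : cnt > 0) return cnt;
    }
    /* Finally, the name with nothing appended. */
    return fake_name_from_dns(name);
}
```

With the search list {"svc.cluster.local", "example.net"} and the name
"db.example.org", the first variant stops at the second domain and
returns SKETCH_NODATA, while the second variant falls through to the
bare name and returns 1. Note that the second variant, as sketched,
also discards any error seen along the way, which may or may not be
desirable.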

To add some context, this was seen in a golang program running on a
kind/Kubernetes cluster. In these clusters ndots is set to 5 so pretty
much every name is first checked against the search list. When using the
golang resolver with `GODEBUG=netdns=go` I do not see the same issue.
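
For readers unfamiliar with ndots, the rule can be sketched as a small
C function. This is only an illustration of the rule as documented for
resolv.conf (names with fewer than ndots dots go through the search
list first); uses_search_list() is a hypothetical helper, and treating
a trailing dot as "already fully qualified" is stated here as an
assumption about typical resolver behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if `name` would be tried against the search list first,
 * 0 if it would be queried as-is. Hypothetical helper for illustration. */
int uses_search_list(const char *name, unsigned ndots)
{
    size_t len = 0;
    unsigned dots = 0;

    assert(name != NULL);
    for (; name[len]; len++)
        if (name[len] == '.') dots++;

    if (len == 0) return 0;
    if (name[len - 1] == '.') return 0; /* trailing dot: fully qualified */
    return dots < ndots;
}
```

With ndots set to 5, a name like "api.example.com" (two dots) goes
through the search list; with the default of 1 it would not.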

* Re: [musl] lookup_name issue with search domains
From: Markus Wichmann @ 2022-12-04  5:45 UTC
  To: musl

On Sun, Dec 04, 2022 at 12:02:54AM -0400, Kenny MacDermid wrote:
> The issue arises when it queries my cloudflare hosted domain (which also
> uses dnssec). That query does not have the reply code flags set to 3.
> Instead it's set to 0. This results in name_from_dns() returning
> EAI_NODATA.

I think we had that report before. The problem is that cloudflare is
wrong here. DNS response with empty data section and NOERROR status
means the domain name exists, but has no records of the requested type.
If cloudflare is reporting that for a name where that isn't true, they
are making a mistake.

This is a cloudflare-specific break with the DNS standards (don't ask me
which, though), so we probably won't change musl to deal with this.
Simplest solution for the known-bad actor is to write a proxy server
that turns the wrong answers into correct ones.

Ciao,
Markus

* Re: [musl] lookup_name issue with search domains
From: Rich Felker @ 2022-12-04 15:31 UTC
  To: Markus Wichmann; +Cc: musl

On Sun, Dec 04, 2022 at 06:45:59AM +0100, Markus Wichmann wrote:
> On Sun, Dec 04, 2022 at 12:02:54AM -0400, Kenny MacDermid wrote:
> > The issue arises when it queries my cloudflare hosted domain (which also
> > uses dnssec). That query does not have the reply code flags set to 3.
> > Instead it's set to 0. This results in name_from_dns() returning
> > EAI_NODATA.
> 
> I think we had that report before. The problem is that cloudflare is
> wrong here. DNS response with empty data section and NOERROR status
> means the domain name exists, but has no records of the requested type.
> If cloudflare is reporting that for a name where that isn't true, they
> are making a mistake.
> 
> This is a cloudflare-specific break with the DNS standards (don't ask me
> which, though), so we probably won't change musl to deal with this.
> Simplest solution for the known-bad actor is to write a proxy server
> that turns the wrong answers into correct ones.

It's not that we just won't accommodate what Cloudflare is doing, but
that Cloudflare is returning data that *means something different* and
for which the only correct behavior (that wouldn't break consistency
for other results where the provider is using DNS semantics correctly)
is what we're doing.

Cloudflare is lying "this name exists but has no RRs of the type you
requested" when it should be saying "this name does not exist". This
is a consequence of an optimization they did to make it easier for
them to implement DNSSEC dynamically without having to follow the way
NSEC records work right.
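
The distinction between the two replies is visible in the DNS header
alone. The following is a rough sketch, not musl's parser:
classify_reply() and the reply_kind enum are hypothetical names, and
the sketch only looks at RCODE and ANCOUNT as laid out in RFC 1035
section 4.1.1. It ignores details such as CNAME chains.

```c
#include <assert.h>
#include <stddef.h>

enum reply_kind { REPLY_ANSWER, REPLY_NODATA, REPLY_NXDOMAIN, REPLY_OTHER };

/* Classify a raw DNS reply from its 12-byte header:
 * RCODE is the low 4 bits of byte 3, ANCOUNT is bytes 6-7 (big-endian). */
enum reply_kind classify_reply(const unsigned char *msg, size_t len)
{
    assert(msg != NULL);
    if (len < 12) return REPLY_OTHER;       /* truncated header */

    unsigned rcode = msg[3] & 0x0f;
    unsigned ancount = ((unsigned)msg[6] << 8) | msg[7];

    if (rcode == 3) return REPLY_NXDOMAIN;  /* "this name does not exist" */
    if (rcode != 0) return REPLY_OTHER;     /* SERVFAIL, REFUSED, ... */
    /* NOERROR: the name exists; it may or may not have the requested RRs. */
    return ancount ? REPLY_ANSWER : REPLY_NODATA;
}
```

A NOERROR reply with ANCOUNT of zero lands in REPLY_NODATA, which is
the case under discussion; only RCODE 3 lands in REPLY_NXDOMAIN.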

Rich

* Re: [musl] lookup_name issue with search domains
From: Kenny MacDermid @ 2022-12-04 23:04 UTC
  To: musl

On Sun, Dec 04, 2022 at 10:31:33AM -0500, Rich Felker wrote:
> On Sun, Dec 04, 2022 at 06:45:59AM +0100, Markus Wichmann wrote:
> > On Sun, Dec 04, 2022 at 12:02:54AM -0400, Kenny MacDermid wrote:
> > > The issue arises when it queries my cloudflare hosted domain
> > > (which also uses dnssec). That query does not have the reply code
> > > flags set to 3.  Instead it's set to 0. This results in
> > > name_from_dns() returning EAI_NODATA.
> > 
> > I think we had that report before. The problem is that cloudflare is
> > wrong here. DNS response with empty data section and NOERROR status
> > means the domain name exists, but has no records of the requested
> > type.  If cloudflare is reporting that for a name where that isn't
> > true, they are making a mistake.
> > 
> > This is a cloudflare-specific break with the DNS standards (don't
> > ask me which, though), so we probably won't change musl to deal with
> > this.  Simplest solution for the known-bad actor is to write a proxy
> > server that turns the wrong answers into correct ones.
> 
> It's not that we just won't accommodate what Cloudflare is doing, but
> that Cloudflare is returning data that *means something different* and
> for which the only correct behavior (that wouldn't break consistency
> for other results where the provider is using DNS semantics correctly)
> is what we're doing.

Well, I guess the “It’s always DNS” meme strikes again.

Do you happen to have a reference to the RFC that Cloudflare isn't
following by returning what they do? The blog post I found on the
topic /claims/ they're compliant[1].

Either way it's unfortunate that musl handles this differently than
others like glibc, the BSD libc, and Go.

[1]: https://blog.cloudflare.com/black-lies/

* Re: [musl] lookup_name issue with search domains
From: Rich Felker @ 2022-12-05 13:26 UTC
  To: Kenny MacDermid; +Cc: musl

On Sun, Dec 04, 2022 at 07:04:10PM -0400, Kenny MacDermid wrote:
> On Sun, Dec 04, 2022 at 10:31:33AM -0500, Rich Felker wrote:
> > On Sun, Dec 04, 2022 at 06:45:59AM +0100, Markus Wichmann wrote:
> > > On Sun, Dec 04, 2022 at 12:02:54AM -0400, Kenny MacDermid wrote:
> > > > The issue arises when it queries my cloudflare hosted domain
> > > > (which also uses dnssec). That query does not have the reply code
> > > > flags set to 3.  Instead it's set to 0. This results in
> > > > name_from_dns() returning EAI_NODATA.
> > > 
> > > I think we had that report before. The problem is that cloudflare is
> > > wrong here. DNS response with empty data section and NOERROR status
> > > means the domain name exists, but has no records of the requested
> > > type.  If cloudflare is reporting that for a name where that isn't
> > > true, they are making a mistake.
> > > 
> > > This is a cloudflare-specific break with the DNS standards (don't
> > > ask me which, though), so we probably won't change musl to deal with
> > > this.  Simplest solution for the known-bad actor is to write a proxy
> > > server that turns the wrong answers into correct ones.
> > 
> > It's not that we just won't accommodate what Cloudflare is doing, but
> > that Cloudflare is returning data that *means something different* and
> > for which the only correct behavior (that wouldn't break consistency
> > for other results where the provider is using DNS semantics correctly)
> > is what we're doing.
> 
> Well, I guess the “It’s always DNS” meme strikes again.
> 
> Do you happen to have a reference to the RFC that Cloudflare isn't
> following by returning what they do? The blog post I found on the
> topic /claims/ they're compliant[1].
> 
> Either way it's unfortunate that musl handles this differently than
> others like glibc, the BSD libc, and Go.
> 
> [1]: https://blog.cloudflare.com/black-lies/

You're not going to find anything saying they're not "compliant"
because that's not the problem. The responses they're giving are
well-formed, consistent, and not breaking any rules of DNS from the
perspective of someone making queries who does not have any prior
expectation for what the queried zones contain. The problem is just
that the responses *mean something different than what you intended*.

As an analogy, you could imagine a DNS provider adding some sort of
TXT records to every name in your zone. Nothing about DNS says they
can't -- these are valid records that can exist anywhere -- but they'd
be serving something different than what you asked them to.

In this case, Cloudflare is effectively making *every possible* name
under your zone exist, but with no RRs defined for it unless you
provided some. This is contrary to your intent that names you didn't
define simply not exist.

The solutions here are basically:

- Turn off DNSSEC (not good), or

- Use a different DNS provider that doesn't munge your zones, or

- Don't use any functionality that depends on ability to distinguish
  NODATA from NxDomain for the names under your zone, and accept that
  everything is going to be NODATA. (In particular, don't use "search"
  on it.)

If you want to search out other sources on the topic, "nodata vs
nxdomain" is a good query.

Rich

* Re: [musl] lookup_name issue with search domains
From: Kenny MacDermid @ 2022-12-05 20:11 UTC
  To: musl

On Mon, Dec 05, 2022 at 08:26:05AM -0500, Rich Felker wrote:
> As an analogy, you could imagine a DNS provider adding some sort of
> TXT records to every name in your zone.  Nothing about DNS says they
> can't -- these are valid records that can exist anywhere -- but they'd
> be serving something different than what you asked them to.
> 
> In this case, Cloudflare is effectively making *every possible* name
> under your zone exist, but with no RRs defined for it unless you
> provided some. This is contrary to your intent that names you didn't
> define simply not exist.

Thank you for all the information Rich. I'm in no way trying to be
argumentative here, and am not claiming to be a DNS expert. I'm just
trying to provide another view of the issue.

In providing a different perspective I think the analogy is a good place
to start. Let's say we take it a bit further and say it wasn't the DNS
provider changing things. Say I added an MX record to a domain.

The API that's in question is called `gethostbyname*`. It's not getTXT,
or getMX or anything like that. When calling that I don't care if a name
exists, I care if a host exists. As such I expect the API to only look
at host records (and possibly dnssec that protect them). I wouldn't
really care if there was 10 odd new record types, if there's no host
records then there's no host at that name.

From my understanding of what you're saying: if the query response
doesn't contain error flags, it's indicating the name exists. That's
fine, the name exists. That doesn't mean the host exists. The response
that comes back has zero 'Answer RRs'. If searching should now stop
because the host was found, what's its address?

Reading a Linux man page on `resolv.conf` it says of the "Search list
for host-name lookup":

>> Resolver queries having fewer than ndots dots (default is 1) in them
>> will be attempted using each component of the search path in turn
>> until a match is found.

In the case where I have 3 search list entries, has a host match been
found because the second domain has an MX record? It doesn't seem like
it to me.

From a quick search for empty answers in RFC 1034, I see section 6.2.4 has:

NAME=SRI-NIC.ARPA, QTYPE=NS

This query could return without any error but the RFC says:

>> The only difference between the response and the query is the AA and
>> RESPONSE bits in the header.  The interpretation of this response is
>> that the server is authoritative for the name, and the name exists,
>> but no RRs of type NS are present there.

That sounds to me like what Cloudflare is doing. They're saying they're
the authority for the name, and no A records exist.

So I guess it comes down to the question: Does this match a host?

* Re: [musl] lookup_name issue with search domains
From: Quentin Rameau @ 2022-12-05 22:25 UTC
  To: musl

Hi Kenny,

> The API that's in question is called `gethostbyname*`. It's not getTXT,
> or getMX or anything like that. When calling that I don't care if a name
> exists, I care if a host exists. As such I expect the API to only look
> at host records (and possibly dnssec that protect them). I wouldn't
> really care if there was 10 odd new record types, if there's no host
> records then there's no host at that name.

Indeed, and that's what you get there.
The DNS server is telling you it's authoritative
(you'll get no better or different answer from somebody else),
the name exists, but it's without an (IPv4) address.

You get the error NO_DATA and your request ends there,
as the authoritative entity of the domain told you so.

> From my understanding of what you're saying: if the query response
> doesn't contain error flags, it's indicating the name exists. That's
> fine, the name exists. That doesn't mean the host exists. The response
> that comes back has zero 'Answer RRs'. If searching should now stop
> because the host was found, what's its address?

Searching ends there because the host was found by name,
and the server said it doesn't have an associated address.

> Reading a Linux man page on `resolv.conf` it says of the "Search list
> for host-name lookup":
> 
> >> Resolver queries having fewer than ndots dots (default is 1) in them
> >> will be attempted using each component of the search path in turn
> >> until a match is found.  

> So I guess it comes down to the question: Does this match a host?

This matches a host, with no configured AF_INET address.

* Re: [musl] lookup_name issue with search domains
From: Kenny MacDermid @ 2022-12-06  5:19 UTC
  To: musl

On Mon, Dec 05, 2022 at 11:25:06PM +0100, Quentin Rameau wrote:
> Hi Kenny,
> 
> > The API that's in question is called `gethostbyname*`. It's not
> > getTXT, or getMX or anything like that. When calling that I don't
> > care if a name exists, I care if a host exists. As such I expect the
> > API to only look at host records (and possibly dnssec that protect
> > them). I wouldn't really care if there was 10 odd new record types,
> > if there's no host records then there's no host at that name.
> 
> Indeed, and that's what you get there.
> The DNS server is telling you it's authoritative
> (you'll get no better or different answer from somebody else),
> the name exists, but it's without an (IPv4) address.

The name exists, yes, but does the _host_ exist?

> Searching ends there because the host was found by name,
> and the server said it doesn't have an associated address.

Except a host wasn't found, just the name. To put an example to it,
please point to the host that is 'notahost.macdermid.ca'. There is a
TXT record for that domain name, yet I don't see how that creates a host.

> > So I guess it comes down to the question: Does this match a host?
> 
> This matches a host, with no configured AF_INET address.

That would only be the case if we considered every domain name a host.
I haven't found anything that specifies that particular limitation on
DNS. If anything it seems MX records would be a counter-example. Also
from RFC 1034:

>>> We should be able to use names to retrieve host addresses, mailbox
>>> data, and other as yet undetermined information.  All data
>>> associated with a name is tagged with a type, and queries can be
>>> limited to a single type.

Note it doesn't say 'data associated with a host'.

I hope you don't feel I'm just being pedantic here. I'm simply trying to
explain how we see domain names differently, and why I don't understand
this particular difference between libc implementations.

To me I own a domain and can create records in that domain. If I happen
to point some names at hosts using A/AAAA records, great. If other names
have TXT, MX, or some other record type, well I don't feel I've created
a host-missing-an-A/AAAA.

And maybe I'm wrong. Maybe other libc's should be following musl and for
a name to exist automatically makes it a host (although in that case,
would musl be being pedantic in not supporting cloudflare?). Either way
hopefully you understand better why it's confusing to me, and why people
are bitten by this decision.

* Re: [musl] lookup_name issue with search domains
From: Quentin Rameau @ 2022-12-06  9:57 UTC
  To: musl

> The name exists, yes, but does the _host_ exist?

Yes it exists, that's what the authoritative DNS server answered.
The host exists because it was identified by its name.
The server, though, also told you that this host doesn't have a record
of the type you queried for, and that's final.

> > Searching ends there because the host was found by name,
> > and the server said it doesn't have an associated address.  
> 
> Except a host wasn't found, just the name. To put an example to it,
> please point to the host that is 'notahost.macdermid.ca'. There is a
> TXT record for that domain name, yet I don't see how that creates a host.

Yes, the host was found, otherwise the server would have answered with
an “NXDomain” meaning it doesn't know this name.
So what it tells you here, is that it is responsible for a host
with the name notahost.macdermid.ca. If there are no address records
associated with that name and you ask for some,
then the server will tell you there isn't any with “NoData”,
no error, but the data you asked for doesn't exist.
If you ask for that TXT record, it'll give the answer.
In both cases, it tells you that the host exists (no error).

> > > So I guess it comes down to the question: Does this match a host?  
> > 
> > This matches a host, with no configured AF_INET address.  
> 
> That would only be the case if we considered every domain name a host.
> I haven't found anything that specifies that particular limitation on
> DNS. If anything it seems MX records would be a counter-example. Also
> from RFC 1034:
>
> >>> We should be able to use names to retrieve host addresses, mailbox
> >>> data, and other as yet undetermined information.  All data
> >>> associated with a name is tagged with a type, and queries can be
> >>> limited to a single type.  
> 
> Note it doesn't say 'data associated with a host'.

Indeed! All data of a host is associated with a (host) name.
So if you get a positive answer to a host name, but there is no actual
data associated with it, then there is no such data.

> I hope you don't feel I'm just being pedantic here. I'm simply trying to
> explain how we see domain names differently, and why I don't understand
> this particular difference between libc implementations.

I kind of feel it's actually the opposite.
It seems that you interpret “host” as an independent virtual concept.
In the context of DNS, a host is identified by a name.
If a server answers it's responsible for that host name,
then it exists.
If it also tells you there is no record of the type you queried for,
then it doesn't have any of those.

> To me I own a domain and can create records in that domain. If I happen
> to point some names at hosts using A/AAAA records, great. If other names
> have TXT, MX, or some other record type, well I don't feel I've created
> a host-missing-an-A/AAAA.

But you actually did, that's the point.

> And maybe I'm wrong. Maybe other libc's should be following musl and for
> a name to exist automatically makes it a host (although in that case,
> would musl be being pedantic in not supporting cloudflare?). Either way
> hopefully you understand better why it's confusing to me, and why people
> are bitten by this decision.

Yes; again, hosts, names, and addresses do not exist
independently of each other.
A host is identified by a name, it's a host name.
It can have addresses, it's called a host address.
It can have other properties, it'd be called a host property.

--

It seems that this whole discussion is not really about NODATA or
NXDOMAIN, but about what you yourself expect from gethostbyname(3)
and the search directive of resolv.conf.
Note that the former is deprecated, and the latter is not standardized.

Regarding the API, it's pretty clear:

- [HOST_NOT_FOUND] No such host is known.
Meaning that this server isn't responsible for that host
(and you would ask another one if you're searching for it)

- [NO_DATA] The server recognized the request and the name, but no
  address is available. Another type of request to the name server
  for the domain might return an answer.
Meaning you found the correct server responsible for that host.
This host doesn't have an address associated with it, but it might
have another type of record associated with it, like an MX record.
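
Read that way, the two error codes would map onto a search decision
roughly as follows. This is only a sketch of the interpretation argued
in this message, not a description of what any particular libc does;
should_try_next_domain() is a hypothetical helper, and the
_DEFAULT_SOURCE define is an assumption made so that the h_errno
constants are visible in <netdb.h> on common C libraries.

```c
#define _DEFAULT_SOURCE /* assumption: exposes HOST_NOT_FOUND / NO_DATA */
#include <assert.h>
#include <netdb.h>

/* Returns 1 if, under this interpretation, the resolver should move on
 * to the next search domain, 0 if the answer should be taken as final. */
int should_try_next_domain(int herr)
{
    switch (herr) {
    case HOST_NOT_FOUND:
        return 1; /* no such host known here: keep searching */
    case NO_DATA:
        return 0; /* name recognized, no address: authoritative, stop */
    default:
        return 0; /* other failures are not "keep searching" signals */
    }
}
```

The one-line change proposed at the start of the thread amounts to
returning 1 for the NO_DATA case as well.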

Regarding the resolv.conf search directive, as it's not properly
agreed on nor well written (documentation-wise), it is up to
interpretation and one's own idea of what's correct and sane to do.

Should the resolver spam all servers of the directive until some
(most likely none) answers your actual request, even if the first one
told you it's responsible for it and your requested data doesn't
exist?

Or should it respect what the server tells you in the first place?
