mailing list of musl libc
* [PATCH] resolver: only exit the search path loop if there are a positive number of results given
@ 2018-03-30 18:52 William Pitcock
  2018-03-30 19:14 ` Rich Felker
  0 siblings, 1 reply; 12+ messages in thread
From: William Pitcock @ 2018-03-30 18:52 UTC (permalink / raw)
  To: musl; +Cc: William Pitcock

In the event of no results being given by any of the lookup modules, EAI_NONAME will still
be thrown.

This is intended to mitigate problems that occur when zones are hosted by weird DNS servers,
such as the one Cloudflare have implemented, and appear in the search path.
---
 src/network/lookup_name.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c
index 209c20f0..b068bb92 100644
--- a/src/network/lookup_name.c
+++ b/src/network/lookup_name.c
@@ -202,7 +202,7 @@ static int name_from_dns_search(struct address buf[static MAXADDRS], char canon[
 			memcpy(canon+l+1, p, z-p);
 			canon[z-p+1+l] = 0;
 			int cnt = name_from_dns(buf, canon, canon, family, &conf);
-			if (cnt) return cnt;
+			if (cnt > 0) return cnt;
 		}
 	}
 
-- 
2.16.2


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] resolver: only exit the search path loop if there are a positive number of results given
  2018-03-30 18:52 [PATCH] resolver: only exit the search path loop if there are a positive number of results given William Pitcock
@ 2018-03-30 19:14 ` Rich Felker
  2018-03-30 19:19   ` William Pitcock
  0 siblings, 1 reply; 12+ messages in thread
From: Rich Felker @ 2018-03-30 19:14 UTC (permalink / raw)
  To: musl

On Fri, Mar 30, 2018 at 06:52:25PM +0000, William Pitcock wrote:
> In the event of no results being given by any of the lookup modules, EAI_NONAME will still
> be thrown.
> 
> This is intended to mitigate problems that occur when zones are hosted by weird DNS servers,
> such as the one Cloudflare have implemented, and appear in the search path.
> ---
>  src/network/lookup_name.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c
> index 209c20f0..b068bb92 100644
> --- a/src/network/lookup_name.c
> +++ b/src/network/lookup_name.c
> @@ -202,7 +202,7 @@ static int name_from_dns_search(struct address buf[static MAXADDRS], char canon[
>  			memcpy(canon+l+1, p, z-p);
>  			canon[z-p+1+l] = 0;
>  			int cnt = name_from_dns(buf, canon, canon, family, &conf);
> -			if (cnt) return cnt;
> +			if (cnt > 0) return cnt;
>  		}
>  	}

This patch is incorrect, and the reason should be an FAQ item if it's
not already. Only a return value of 0 means that the requested name
does not exist and that it's permissible to continue search. Other
nonpositive return values indicate either that the name does exist but
does not have a record of the requested type, or that a transient error
occurred, making it impossible to determine whether the search can be
continued and thus requiring the error to be reported to the caller.
Anything else results in one or both of the following bugs:

- Nondeterministically returning different results for the same query
  depending on transient unavailability of the nameservers to answer
  on time.

- Returning inconsistent results (for different search components)
  depending on whether AF_INET, AF_INET6, or AF_UNSPEC was requested.
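
To make that contract concrete, here is a minimal sketch of the search
loop it implies (illustrative C only, not the actual musl source; the
names are made up):

/* Sketch of the return-value contract described above; not musl code.
 *   cnt > 0  -> results found: stop and return them
 *   cnt == 0 -> NxDomain: the name does not exist, try the next suffix
 *   cnt < 0  -> NODATA or transient error: stop and report to the caller
 */
static int search_domains(int (*lookup)(const char *fqdn),
                          const char *const fqdns[], int n)
{
	int i;
	for (i = 0; i < n; i++) {
		int cnt = lookup(fqdns[i]);
		if (cnt) return cnt; /* results *and* errors end the search */
	}
	return 0; /* nothing found anywhere; caller maps this to EAI_NONAME */
}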

I'm aware that at least rancher-dns and Cloudflare's nameservers have
had bugs related to this issue. I'm not sure what the status on
getting them fixed is, and for Cloudflare I don't know exactly what it
is they're doing wrong or why. But I do know the problem is that
they're returning semantically incorrect dns replies.

Rich


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] resolver: only exit the search path loop if there are a positive number of results given
  2018-03-30 19:14 ` Rich Felker
@ 2018-03-30 19:19   ` William Pitcock
  2018-03-30 19:35     ` Rich Felker
  0 siblings, 1 reply; 12+ messages in thread
From: William Pitcock @ 2018-03-30 19:19 UTC (permalink / raw)
  To: musl

Hello,

On Fri, Mar 30, 2018 at 2:14 PM, Rich Felker <dalias@libc.org> wrote:
> On Fri, Mar 30, 2018 at 06:52:25PM +0000, William Pitcock wrote:
>> In the event of no results being given by any of the lookup modules, EAI_NONAME will still
>> be thrown.
>>
>> This is intended to mitigate problems that occur when zones are hosted by weird DNS servers,
>> such as the one Cloudflare have implemented, and appear in the search path.
>> ---
>>  src/network/lookup_name.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c
>> index 209c20f0..b068bb92 100644
>> --- a/src/network/lookup_name.c
>> +++ b/src/network/lookup_name.c
>> @@ -202,7 +202,7 @@ static int name_from_dns_search(struct address buf[static MAXADDRS], char canon[
>>                       memcpy(canon+l+1, p, z-p);
>>                       canon[z-p+1+l] = 0;
>>                       int cnt = name_from_dns(buf, canon, canon, family, &conf);
>> -                     if (cnt) return cnt;
>> +                     if (cnt > 0) return cnt;
>>               }
>>       }
>
> This patch is incorrect, and the reason should be an FAQ item if it's
> not already. Only a return value of 0 means that the requested name
> does not exist and that it's permissible to continue search. Other
> nonpositive return values indicate either that the name does exist but
> does not have a record of the requested type, or that a transient error
> occurred, making it impossible to determine whether the search can be
> continued and thus requiring the error to be reported to the caller.
> Anything else results in one or both of the following bugs:
>
> - Nondeterministically returning different results for the same query
>   depending on transient unavailability of the nameservers to answer
>   on time.
>
> - Returning inconsistent results (for different search components)
>   depending on whether AF_INET, AF_INET6, or AF_UNSPEC was requested.
>
> I'm aware that at least rancher-dns and Cloudflare's nameservers have
> had bugs related to this issue. I'm not sure what the status on
> getting them fixed is, and for Cloudflare I don't know exactly what it
> is they're doing wrong or why. But I do know the problem is that
> they're returning semantically incorrect dns replies.

Kubernetes imposes a default search path with the cluster domain last, so:

  - local.prod.svc.whatever
  - prod.svc.whatever
  - svc.whatever
  - yourdomain.com

The Cloudflare issue is that they send a SUCCESS code with 0 replies,
which causes musl to error when it hits yourdomain.com.
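
For concreteness, the resolv.conf that Kubernetes generates inside a
pod looks roughly like this (the nameserver address and cluster domain
are illustrative; ndots:5 is the usual k8s default):

	search local.prod.svc.whatever prod.svc.whatever svc.whatever yourdomain.com
	nameserver 10.96.0.10
	options ndots:5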

Do you have any suggestions on a mitigation which would be more
palatable?  We need to ship a mitigation for this in Alpine 3.8
regardless.  I would much rather carry a patch that is upstreamable,
but I am quite willing to carry one that isn't, in order to solve this
problem.

William


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] resolver: only exit the search path loop if there are a positive number of results given
  2018-03-30 19:19   ` William Pitcock
@ 2018-03-30 19:35     ` Rich Felker
  2018-03-30 19:44       ` William Pitcock
  0 siblings, 1 reply; 12+ messages in thread
From: Rich Felker @ 2018-03-30 19:35 UTC (permalink / raw)
  To: musl

On Fri, Mar 30, 2018 at 02:19:48PM -0500, William Pitcock wrote:
> Hello,
> 
> On Fri, Mar 30, 2018 at 2:14 PM, Rich Felker <dalias@libc.org> wrote:
> > On Fri, Mar 30, 2018 at 06:52:25PM +0000, William Pitcock wrote:
> >> In the event of no results being given by any of the lookup modules, EAI_NONAME will still
> >> be thrown.
> >>
> >> This is intended to mitigate problems that occur when zones are hosted by weird DNS servers,
> >> such as the one Cloudflare have implemented, and appear in the search path.
> >> ---
> >>  src/network/lookup_name.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c
> >> index 209c20f0..b068bb92 100644
> >> --- a/src/network/lookup_name.c
> >> +++ b/src/network/lookup_name.c
> >> @@ -202,7 +202,7 @@ static int name_from_dns_search(struct address buf[static MAXADDRS], char canon[
> >>                       memcpy(canon+l+1, p, z-p);
> >>                       canon[z-p+1+l] = 0;
> >>                       int cnt = name_from_dns(buf, canon, canon, family, &conf);
> >> -                     if (cnt) return cnt;
> >> +                     if (cnt > 0) return cnt;
> >>               }
> >>       }
> >
> > This patch is incorrect, and the reason should be an FAQ item if it's
> > not already. Only a return value of 0 means that the requested name
> > does not exist and that it's permissible to continue search. Other
> > nonpositive return values indicate either that the name does exist but
> > does not have a record of the requested type, or that a transient error
> > occurred, making it impossible to determine whether the search can be
> > continued and thus requiring the error to be reported to the caller.
> > Anything else results in one or both of the following bugs:
> >
> > - Nondeterministically returning different results for the same query
> >   depending on transient unavailability of the nameservers to answer
> >   on time.
> >
> > - Returning inconsistent results (for different search components)
> >   depending on whether AF_INET, AF_INET6, or AF_UNSPEC was requested.
> >
> > I'm aware that at least rancher-dns and Cloudflare's nameservers have
> > had bugs related to this issue. I'm not sure what the status on
> > getting them fixed is, and for Cloudflare I don't know exactly what it
> > is they're doing wrong or why. But I do know the problem is that
> > they're returning semantically incorrect dns replies.
> 
> Kubernetes imposes a default search path with the cluster domain last, so:
> 
>   - local.prod.svc.whatever
>   - prod.svc.whatever
>   - svc.whatever
>   - yourdomain.com
> 
> The Cloudflare issue is that they send a SUCCESS code with 0 replies,
> which causes musl to error when it hits yourdomain.com.

Yes, that makes sense. Do you know why they're doing it? If they
refuse to fix it, the only clean fix I know is a local proxy
configured to fix the records for the specific broken domains you care
about. But of course that's not convenient.

> Do you have any suggestions on a mitigation which would be more
> palatable?  We need to ship a mitigation for this in Alpine 3.8
> regardless.  I would much rather carry a patch that is upstreamable,
> but I am quite willing to carry one that isn't, in order to solve this
> problem.

A theoretically-non-horrible (but somewhat costly) solution is to
always query both A and AAAA, rather than only doing it for AF_UNSPEC.
Then if you see a reply with 0 (total, between both) records, you can
opt to interpret that the same way as NxDomain without breaking
consistency properties. If Cloudflare refuses to fix the bug, maybe we
should consider adding an _option_ (in the resolv.conf options line)
to do this. I don't think it should be the default behavior because it
mildly slows down lookups, especially if you have nontrivial packet
loss since probability of failure is now 1-(1-p)²=2p-p² rather than p
(where p is the packet loss rate).
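
For example, at p = 0.02 the failure probability rises from 2% to
1-(0.98)² ≈ 3.96%, i.e. it nearly doubles.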

Rich


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] resolver: only exit the search path loop if there are a positive number of results given
  2018-03-30 19:35     ` Rich Felker
@ 2018-03-30 19:44       ` William Pitcock
  2018-03-30 20:24         ` Szabolcs Nagy
                           ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: William Pitcock @ 2018-03-30 19:44 UTC (permalink / raw)
  To: musl

Hello,

On Fri, Mar 30, 2018 at 2:35 PM, Rich Felker <dalias@libc.org> wrote:
> On Fri, Mar 30, 2018 at 02:19:48PM -0500, William Pitcock wrote:
>> Hello,
>>
>> On Fri, Mar 30, 2018 at 2:14 PM, Rich Felker <dalias@libc.org> wrote:
>> > On Fri, Mar 30, 2018 at 06:52:25PM +0000, William Pitcock wrote:
>> >> In the event of no results being given by any of the lookup modules, EAI_NONAME will still
>> >> be thrown.
>> >>
>> >> This is intended to mitigate problems that occur when zones are hosted by weird DNS servers,
>> >> such as the one Cloudflare have implemented, and appear in the search path.
>> >> ---
>> >>  src/network/lookup_name.c | 2 +-
>> >>  1 file changed, 1 insertion(+), 1 deletion(-)
>> >>
>> >> diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c
>> >> index 209c20f0..b068bb92 100644
>> >> --- a/src/network/lookup_name.c
>> >> +++ b/src/network/lookup_name.c
>> >> @@ -202,7 +202,7 @@ static int name_from_dns_search(struct address buf[static MAXADDRS], char canon[
>> >>                       memcpy(canon+l+1, p, z-p);
>> >>                       canon[z-p+1+l] = 0;
>> >>                       int cnt = name_from_dns(buf, canon, canon, family, &conf);
>> >> -                     if (cnt) return cnt;
>> >> +                     if (cnt > 0) return cnt;
>> >>               }
>> >>       }
>> >
>> > This patch is incorrect, and the reason should be an FAQ item if it's
>> > not already. Only a return value of 0 means that the requested name
>> > does not exist and that it's permissible to continue search. Other
>> > nonpositive return values indicate either that the name does exist but
>> > does not have a record of the requested type, or that a transient error
>> > occurred, making it impossible to determine whether the search can be
>> > continued and thus requiring the error to be reported to the caller.
>> > Anything else results in one or both of the following bugs:
>> >
>> > - Nondeterministically returning different results for the same query
>> >   depending on transient unavailability of the nameservers to answer
>> >   on time.
>> >
>> > - Returning inconsistent results (for different search components)
>> >   depending on whether AF_INET, AF_INET6, or AF_UNSPEC was requested.
>> >
>> > I'm aware that at least rancher-dns and Cloudflare's nameservers have
>> > had bugs related to this issue. I'm not sure what the status on
>> > getting them fixed is, and for Cloudflare I don't know exactly what it
>> > is they're doing wrong or why. But I do know the problem is that
>> > they're returning semantically incorrect dns replies.
>>
>> Kubernetes imposes a default search path with the cluster domain last, so:
>>
>>   - local.prod.svc.whatever
>>   - prod.svc.whatever
>>   - svc.whatever
>>   - yourdomain.com
>>
>> The Cloudflare issue is that they send a SUCCESS code with 0 replies,
>> which causes musl to error when it hits yourdomain.com.
>
> Yes, that makes sense. Do you know why they're doing it? If they
> refuse to fix it, the only clean fix I know is a local proxy
> configured to fix the records for the specific broken domains you care
> about. But of course that's not convenient.

My contacts at Cloudflare indicate that their environment depends on
this behaviour, so they have no interest in fixing it.

A local proxy isn't going to be workable, because most people are
going to just say "but Debian or Fedora doesn't require this," and
then just go use a glibc distribution.

There is a talk in a few weeks at Kubecon (the Kubernetes conference),
explicitly titled "Don't Use Alpine If You Care About DNS."  The talk
largely centers around how musl's overly strict behaviour makes Alpine
a bad choice for "the real world."  I would like to turn this into a
story where we can announce that Alpine 3.8 mitigates this problem
instead; doing so will be good for both Alpine and the musl
ecosystem as a whole, as it defangs a point of possible FUD.

>
>> Do you have any suggestions on a mitigation which would be more
>> palatable?  We need to ship a mitigation for this in Alpine 3.8
>> regardless.  I would much rather carry a patch that is upstreamable,
>> but I am quite willing to carry one that isn't, in order to solve this
>> problem.
>
> A theoretically-non-horrible (but somewhat costly) solution is to
> always query both A and AAAA, rather than only doing it for AF_UNSPEC.
> Then if you see a reply with 0 (total, between both) records, you can
> opt to interpret that the same way as NxDomain without breaking
> consistency properties. If Cloudflare refuses to fix the bug, maybe we
> should consider adding an _option_ (in the resolv.conf options line)
> to do this. I don't think it should be the default behavior because it
> mildly slows down lookups, especially if you have nontrivial packet
> loss since probability of failure is now 1-(1-p)²=2p-p² rather than p
> (where p is the packet loss rate).

It seems to me we could just send ANY and filter out the records we
don't care about.  This is what I did with charybdis's asynchronous
DNS resolver when it had a similar problem.  What are your thoughts on
that?

William


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] resolver: only exit the search path loop if there are a positive number of results given
  2018-03-30 19:44       ` William Pitcock
@ 2018-03-30 20:24         ` Szabolcs Nagy
  2018-03-30 20:33           ` William Pitcock
  2018-03-30 20:35         ` Rich Felker
  2018-03-31 10:42         ` Florian Weimer
  2 siblings, 1 reply; 12+ messages in thread
From: Szabolcs Nagy @ 2018-03-30 20:24 UTC (permalink / raw)
  To: musl

* William Pitcock <nenolod@dereferenced.org> [2018-03-30 14:44:44 -0500]:
> There is a talk in a few weeks at Kubecon (the Kubernetes conference),
> explicitly titled "Don't Use Alpine If You Care About DNS."  The talk
> largely centers around how musl's overly strict behaviour makes Alpine
> a bad choice for "the real world."  I would like to turn this into a
> story where we can announce that Alpine 3.8 mitigates this problem
> instead; doing so will be good for both Alpine and the musl
> ecosystem as a whole, as it defangs a point of possible FUD.

musl has had a good track record catching bugs in dns
implementations and in dns setups. i don't think this
should be mitigated; it's good to know that cloudflare
has a broken dns server, or that kubernetes had dns
setup bugs. these are real reliability and performance
issues affecting all systems, not just musl.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] resolver: only exit the search path loop if there are a positive number of results given
  2018-03-30 20:24         ` Szabolcs Nagy
@ 2018-03-30 20:33           ` William Pitcock
  0 siblings, 0 replies; 12+ messages in thread
From: William Pitcock @ 2018-03-30 20:33 UTC (permalink / raw)
  To: musl

Hello,

On Fri, Mar 30, 2018 at 3:24 PM, Szabolcs Nagy <nsz@port70.net> wrote:
> * William Pitcock <nenolod@dereferenced.org> [2018-03-30 14:44:44 -0500]:
>> There is a talk in a few weeks at Kubecon (the Kubernetes conference),
>> explicitly titled "Don't Use Alpine If You Care About DNS."  The talk
>> largely centers around how musl's overly strict behaviour makes Alpine
>> a bad choice for "the real world."  I would like to turn this into a
>> story where we can announce that Alpine 3.8 mitigates this problem
>> instead; doing so will be good for both Alpine and the musl
>> ecosystem as a whole, as it defangs a point of possible FUD.
>
> musl has had a good track record catching bugs in dns
> implementations and in dns setups. i don't think this
> should be mitigated; it's good to know that cloudflare
> has a broken dns server, or that kubernetes had dns
> setup bugs. these are real reliability and performance
> issues affecting all systems, not just musl.

We can add a strict option to turn off the mitigation, but Alpine 3.8
will ship with a mitigation regardless.

William


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] resolver: only exit the search path loop if there are a positive number of results given
  2018-03-30 19:44       ` William Pitcock
  2018-03-30 20:24         ` Szabolcs Nagy
@ 2018-03-30 20:35         ` Rich Felker
  2018-03-30 21:09           ` William Pitcock
  2018-03-31 10:42         ` Florian Weimer
  2 siblings, 1 reply; 12+ messages in thread
From: Rich Felker @ 2018-03-30 20:35 UTC (permalink / raw)
  To: musl

On Fri, Mar 30, 2018 at 02:44:44PM -0500, William Pitcock wrote:
> Hello,
> 
> On Fri, Mar 30, 2018 at 2:35 PM, Rich Felker <dalias@libc.org> wrote:
> > On Fri, Mar 30, 2018 at 02:19:48PM -0500, William Pitcock wrote:
> >> Hello,
> >>
> >> On Fri, Mar 30, 2018 at 2:14 PM, Rich Felker <dalias@libc.org> wrote:
> >> > On Fri, Mar 30, 2018 at 06:52:25PM +0000, William Pitcock wrote:
> >> >> In the event of no results being given by any of the lookup modules, EAI_NONAME will still
> >> >> be thrown.
> >> >>
> >> >> This is intended to mitigate problems that occur when zones are hosted by weird DNS servers,
> >> >> such as the one Cloudflare have implemented, and appear in the search path.
> >> >> ---
> >> >>  src/network/lookup_name.c | 2 +-
> >> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >> >>
> >> >> diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c
> >> >> index 209c20f0..b068bb92 100644
> >> >> --- a/src/network/lookup_name.c
> >> >> +++ b/src/network/lookup_name.c
> >> >> @@ -202,7 +202,7 @@ static int name_from_dns_search(struct address buf[static MAXADDRS], char canon[
> >> >>                       memcpy(canon+l+1, p, z-p);
> >> >>                       canon[z-p+1+l] = 0;
> >> >>                       int cnt = name_from_dns(buf, canon, canon, family, &conf);
> >> >> -                     if (cnt) return cnt;
> >> >> +                     if (cnt > 0) return cnt;
> >> >>               }
> >> >>       }
> >> >
> >> > This patch is incorrect, and the reason should be an FAQ item if it's
> >> > not already. Only a return value of 0 means that the requested name
> >> > does not exist and that it's permissible to continue search. Other
> >> > nonpositive return values indicate either that the name does exist but
> >> > does not have a record of the requested type, or that a transient error
> >> > occurred, making it impossible to determine whether the search can be
> >> > continued and thus requiring the error to be reported to the caller.
> >> > Anything else results in one or both of the following bugs:
> >> >
> >> > - Nondeterministically returning different results for the same query
> >> >   depending on transient unavailability of the nameservers to answer
> >> >   on time.
> >> >
> >> > - Returning inconsistent results (for different search components)
> >> >   depending on whether AF_INET, AF_INET6, or AF_UNSPEC was requested.
> >> >
> >> > I'm aware that at least rancher-dns and Cloudflare's nameservers have
> >> > had bugs related to this issue. I'm not sure what the status on
> >> > getting them fixed is, and for Cloudflare I don't know exactly what it
> >> > is they're doing wrong or why. But I do know the problem is that
> >> > they're returning semantically incorrect dns replies.
> >>
> >> Kubernetes imposes a default search path with the cluster domain last, so:
> >>
> >>   - local.prod.svc.whatever
> >>   - prod.svc.whatever
> >>   - svc.whatever
> >>   - yourdomain.com
> >>
> >> The Cloudflare issue is that they send a SUCCESS code with 0 replies,
> >> which causes musl to error when it hits yourdomain.com.
> >
> > Yes, that makes sense. Do you know why they're doing it? If they
> > refuse to fix it, the only clean fix I know is a local proxy
> > configured to fix the records for the specific broken domains you care
> > about. But of course that's not convenient.
> 
> My contacts at Cloudflare indicate that their environment depends on
> this behaviour, so they have no interest in fixing it.
> 
> A local proxy isn't going to be workable, because most people are
> going to just say "but Debian or Fedora doesn't require this," and
> then just go use a glibc distribution.
> 
> There is a talk in a few weeks at Kubecon (the Kubernetes conference),
> explicitly titled "Don't Use Alpine If You Care About DNS."  The talk
> largely centers around how musl's overly strict behaviour makes Alpine
> a bad choice for "the real world."  I would like to turn this into a
> story where we can announce that Alpine 3.8 mitigates this problem
> instead; doing so will be good for both Alpine and the musl
> ecosystem as a whole, as it defangs a point of possible FUD.

I understand that this is a frustrating position to be in, where
someone who (from your description so far, at least) seems like a
troll is planning to publicly malign your distro. I don't think
rushing quick, wrong workarounds helps. Rather it makes it look like
you knew there was a problem and only acted on it because someone
threatened to make you look bad in public. The reality is that
Cloudflare is returning wrong DNS results that aren't suitable for the
search settings k8s users want to use.

I think a better response would be demonstrating why the glibc
behavior they're relying on is buggy. This should be possible by
configuring two search domains:

	ex1.example.com
	ex2.example.com

and adding the following records to them:

	foo.ex1.example.com.	A	10.0.0.1
	foo.ex2.example.com.	A	192.168.0.1
	foo.ex2.example.com.	AAAA	fd00::1

Then, try glibc's getaddrinfo for the name "foo" with AF_INET,
AF_INET6, and AF_UNSPEC. The results I expect you'll see are:

	AF_INET:	10.0.0.1
	AF_INET6:	192.168.0.1, fd00::1
	AF_UNSPEC:	10.0.0.1 (possibly also fd00::1 but doubtful)

That is, *which fqdn you see results for* depends on which
address family you made the query with. This is clearly wrong.

With musl you'll consistently see results for foo.ex1.example.com:

	AF_INET:	10.0.0.1
	AF_INET6:	no results
	AF_UNSPEC:	10.0.0.1

since it's the first definition of foo in the search domains.
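
Here is a minimal test program for reproducing the comparison (it
assumes the search domains and records above are configured; output
formatting is simplified):

#include <stdio.h>
#include <string.h>
#include <netdb.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Query the bare name "foo" with a given address family and print
 * whatever getaddrinfo returns. */
static void probe(int family, const char *label)
{
	struct addrinfo hints, *res, *p;
	memset(&hints, 0, sizeof hints);
	hints.ai_family = family;
	hints.ai_socktype = SOCK_STREAM;
	int err = getaddrinfo("foo", NULL, &hints, &res);
	if (err) {
		printf("%s:\t%s\n", label, gai_strerror(err));
		return;
	}
	for (p = res; p; p = p->ai_next) {
		char buf[INET6_ADDRSTRLEN];
		const void *addr = p->ai_family == AF_INET
			? (const void *)&((struct sockaddr_in *)p->ai_addr)->sin_addr
			: (const void *)&((struct sockaddr_in6 *)p->ai_addr)->sin6_addr;
		printf("%s:\t%s\n", label, inet_ntop(p->ai_family, addr, buf, sizeof buf));
	}
	freeaddrinfo(res);
}

int main(void)
{
	probe(AF_INET, "AF_INET");
	probe(AF_INET6, "AF_INET6");
	probe(AF_UNSPEC, "AF_UNSPEC");
	return 0;
}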

> >> Do you have any suggestions on a mitigation which would be more
> >> palatable?  We need to ship a mitigation for this in Alpine 3.8
> >> regardless.  I would much rather carry a patch that is upstreamable,
> >> but I am quite willing to carry one that isn't, in order to solve this
> >> problem.
> >
> > A theoretically-non-horrible (but somewhat costly) solution is to
> > always query both A and AAAA, rather than only doing it for AF_UNSPEC.
> > Then if you see a reply with 0 (total, between both) records, you can
> > opt to interpret that the same way as NxDomain without breaking
> > consistency properties. If Cloudflare refuses to fix the bug, maybe we
> > should consider adding an _option_ (in the resolv.conf options line)
> > to do this. I don't think it should be the default behavior because it
> > mildly slows down lookups, especially if you have nontrivial packet
> > loss since probability of failure is now 1-(1-p)²=2p-p² rather than p
> > (where p is the packet loss rate).
> 
> It seems to me we could just send ANY and filter out the records we
> don't care about.  This is what I did with charybdis's asynchronous
> DNS resolver when it had a similar problem.  What are your thoughts on
> that?

If ANY requests actually worked we would have done this years ago.
They don't. Not only is the ANY result unlikely to fit in a UDP reply
(and lacking any reason to include the records you need if it doesn't
fit); A and AAAA queries are also special in that they resolve CNAMEs
and automatically include the A or AAAA records for the name the CNAME
points to in the reply. With ANY you'd have to reimplement the whole
CNAME logic as a second (slow) query from the stub resolver rather
than having the recursive resolver give you the answer it already knew
when you made the first query.
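
The CNAME behavior is easy to see with dig against a recursive
resolver (the example name is assumed to be a CNAME):

	$ dig +short A www.example.org    # CNAME is chased; the target's A records come back too
	$ dig +short ANY www.example.org  # at best the bare CNAME, with no records for its target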

Rich


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] resolver: only exit the search path loop if there are a positive number of results given
  2018-03-30 20:35         ` Rich Felker
@ 2018-03-30 21:09           ` William Pitcock
  0 siblings, 0 replies; 12+ messages in thread
From: William Pitcock @ 2018-03-30 21:09 UTC (permalink / raw)
  To: musl

Hello,

On Fri, Mar 30, 2018 at 3:35 PM, Rich Felker <dalias@libc.org> wrote:
> On Fri, Mar 30, 2018 at 02:44:44PM -0500, William Pitcock wrote:
>> Hello,
>>
>> On Fri, Mar 30, 2018 at 2:35 PM, Rich Felker <dalias@libc.org> wrote:
>> > On Fri, Mar 30, 2018 at 02:19:48PM -0500, William Pitcock wrote:
>> >> Hello,
>> >>
>> >> On Fri, Mar 30, 2018 at 2:14 PM, Rich Felker <dalias@libc.org> wrote:
>> >> > On Fri, Mar 30, 2018 at 06:52:25PM +0000, William Pitcock wrote:
>> >> >> In the event of no results being given by any of the lookup modules, EAI_NONAME will still
>> >> >> be thrown.
>> >> >>
>> >> >> This is intended to mitigate problems that occur when zones are hosted by weird DNS servers,
>> >> >> such as the one Cloudflare have implemented, and appear in the search path.
>> >> >> ---
>> >> >>  src/network/lookup_name.c | 2 +-
>> >> >>  1 file changed, 1 insertion(+), 1 deletion(-)
>> >> >>
>> >> >> diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c
>> >> >> index 209c20f0..b068bb92 100644
>> >> >> --- a/src/network/lookup_name.c
>> >> >> +++ b/src/network/lookup_name.c
>> >> >> @@ -202,7 +202,7 @@ static int name_from_dns_search(struct address buf[static MAXADDRS], char canon[
>> >> >>                       memcpy(canon+l+1, p, z-p);
>> >> >>                       canon[z-p+1+l] = 0;
>> >> >>                       int cnt = name_from_dns(buf, canon, canon, family, &conf);
>> >> >> -                     if (cnt) return cnt;
>> >> >> +                     if (cnt > 0) return cnt;
>> >> >>               }
>> >> >>       }
>> >> >
>> >> > This patch is incorrect, and the reason should be an FAQ item if it's
>> >> > not already. Only a return value of 0 means that the requested name
>> >> > does not exist and that it's permissible to continue search. Other
>> >> > nonpositive return values indicate either that the name does exist but
>> >> > does not have a record of the requested type, or that a transient error
>> >> > occurred, making it impossible to determine whether the search can be
>> >> > continued and thus requiring the error to be reported to the caller.
>> >> > Anything else results in one or both of the following bugs:
>> >> >
>> >> > - Nondeterministically returning different results for the same query
>> >> >   depending on transient unavailability of the nameservers to answer
>> >> >   on time.
>> >> >
>> >> > - Returning inconsistent results (for different search components)
>> >> >   depending on whether AF_INET, AF_INET6, or AF_UNSPEC was requested.
>> >> >
>> >> > I'm aware that at least rancher-dns and Cloudflare's nameservers have
>> >> > had bugs related to this issue. I'm not sure what the status on
>> >> > getting them fixed is, and for Cloudflare I don't know exactly what it
>> >> > is they're doing wrong or why. But I do know the problem is that
>> >> > they're returning semantically incorrect dns replies.
>> >>
>> >> Kubernetes imposes a default search path with the cluster domain last, so:
>> >>
>> >>   - local.prod.svc.whatever
>> >>   - prod.svc.whatever
>> >>   - svc.whatever
>> >>   - yourdomain.com
>> >>
>> >> The Cloudflare issue is that they send a SUCCESS code with 0 replies,
>> >> which causes musl to error when it hits yourdomain.com.
>> >
>> > Yes, that makes sense. Do you know why they're doing it? If they
>> > refuse to fix it, the only clean fix I know is a local proxy
>> > configured to fix the records for the specific broken domains you care
>> > about. But of course that's not convenient.
>>
>> My contacts at Cloudflare indicate that their environment depends on
>> this behaviour, so they have no interest in fixing it.
>>
>> A local proxy isn't going to be workable, because most people are
>> going to just say "but Debian or Fedora doesn't require this," and
>> then just go use a glibc distribution.
>>
>> There is a talk in a few weeks at Kubecon (the Kubernetes conference),
>> explicitly titled "Don't Use Alpine If You Care About DNS."  The talk
>> largely centers around how musl's overly strict behaviour makes Alpine
>> a bad choice for "the real world."  I would like to turn this into a
>> story where we can announce that Alpine 3.8 mitigates this problem
>> instead; doing so will be good for both Alpine and the musl
>> ecosystem as a whole, as it defangs a point of possible FUD.
>
> I understand that this is a frustrating position to be in, where
> someone who (from your description so far, at least) seems like a
> troll is planning to publicly malign your distro. I don't think
> rushing quick, wrong workarounds helps. Rather it makes it look like
> you knew there was a problem and only acted on it because someone
> threatened to make you look bad in public. The reality is that
> Cloudflare is returning wrong DNS results that aren't suitable for the
> search settings k8s users want to use.

I agree with this conclusion, but also think that mitigating the k8s
problem is worthwhile, too, as there have been many people who have
complained about it.  Strangely, no bug report has ever been filed
with Alpine for this though.

In IRC, we talked about conditionally enabling a mitigation based on
split A/AAAA lookup.  I believe this is the way to go, and am working
on a patch for it.  I believe we should enable the mitigation where
ndots > 1 or n_search_domains > 1, as discussed in IRC.
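
Roughly, the gating condition would look like this (sketch only; the
struct and names are hypothetical, not musl internals):

/* Hypothetical sketch of the discussed gating; not actual musl code. */
struct lookup_cfg { int ndots; int nsearch; };

/* Apply the split-A/AAAA NODATA workaround only where a search
 * path can actually mask later entries. */
static int want_nodata_workaround(const struct lookup_cfg *cfg)
{
	return cfg->ndots > 1 || cfg->nsearch > 1;
}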

> I think a better response would be demonstrating why the glibc
> behavior they're relying on is buggy. This should be possible by
> configuring two search domains:
>
>         ex1.example.com
>         ex2.example.com
>
> and adding the following records to them:
>
>         foo.ex1.example.com.    A       10.0.0.1
>         foo.ex2.example.com.    A       192.168.0.1
>         foo.ex2.example.com.    AAAA    fd00::1
>
> Then, try glibc's getaddrinfo for the name "foo" with AF_INET,
> AF_INET6, and AF_UNSPEC. The results I expect you'll see are:
>
>         AF_INET:        10.0.0.1
>         AF_INET6:       192.168.0.1, fd00::1
>         AF_UNSPEC:      10.0.0.1 (possibly also fd00::1 but doubtful)
>
> That is, *which fqdn you see results for* depends on which
> address family you made the query with. This is clearly wrong.

Yes, I agree with this as a counter-example.  I intend to write a blog
about it, and that is a good scenario to demonstrate.

> With musl you'll consistently see results for foo.ex1.example.com:
>
>         AF_INET:        10.0.0.1
>         AF_INET6:       no results
>         AF_UNSPEC:      10.0.0.1
>
> since it's the first definition of foo in the search domains.

Yes, I agree that the consistency is good to demonstrate also.

>> >> Do you have any suggestions on a mitigation which would be more
>> >> palatable?  We need to ship a mitigation for this in Alpine 3.8
>> >> regardless.  I would much rather carry a patch that is upstreamable,
>> >> but I am quite willing to carry one that isn't, in order to solve this
>> >> problem.
>> >
>> > A theoretically-non-horrible (but somewhat costly) solution is to
>> > always query both A and AAAA, rather than only doing it for AF_UNSPEC.
>> > Then if you see a reply with 0 (total, between both) records, you can
>> > opt to interpret that the same way as NxDomain without breaking
>> > consistency properties. If Cloudflare refuses to fix the bug, maybe we
>> > should consider adding an _option_ (in the resolv.conf options line)
>> > to do this. I don't think it should be the default behavior because it
>> > mildly slows down lookups, especially if you have nontrivial packet
>> > loss since probability of failure is now 1-(1-p)²=2p-p² rather than p
>> > (where p is the packet loss rate).
>>
>> It seems to me we could just send ANY and filter out the records we
>> don't care about.  This is what I did with charybdis's asynchronous
>> DNS resolver when it had a similar problem.  What are your thoughts on
>> that?
>
> If ANY requests actually worked we would have done this years ago.
> They don't. Not only is the ANY result unlikely to fit in a UDP reply
> (and lacking any reason to include the records you need if it doesn't
> fit); A and AAAA queries are also special in that they resolve CNAMEs
> and automatically include the A or AAAA records for the name the CNAME
> points to in the reply. With ANY you'd have to reimplement the whole
> CNAME logic as a second (slow) query from the stub resolver rather
> than having the recursive resolver give you the answer it already knew
> when you made the first query.

Right, I forgot about CNAMEs.  In charybdis, we wound up just using
A/AAAA conditionally, because of them.
DNS gives me a headache.

William


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] resolver: only exit the search path loop if there are a positive number of results given
  2018-03-30 19:44       ` William Pitcock
  2018-03-30 20:24         ` Szabolcs Nagy
  2018-03-30 20:35         ` Rich Felker
@ 2018-03-31 10:42         ` Florian Weimer
  2018-03-31 14:01           ` Rich Felker
  2 siblings, 1 reply; 12+ messages in thread
From: Florian Weimer @ 2018-03-31 10:42 UTC (permalink / raw)
  To: William Pitcock; +Cc: musl

* William Pitcock:

> A local proxy isn't going to be workable, because most people are
> going to just say "but Debian or Fedora doesn't require this," and
> then just go use a glibc distribution.

Some parts of the glibc behavior are clearly wrong and not even
internally consistent.  Rich is right that for correctness, you can
only proceed on the search path if you have received a successful
reply.  However, making changes in this area is difficult, both due to
the current state of the glibc code and to existing deployments
depending on corner cases which are not well understood.

I'm not entirely convinced that using different search path domains
for different address families is necessarily wrong.  Historically,
the NODATA/NXDOMAIN signaling has been really inconsistent anyway, and
I suspect it still is for some users.

What Cloudflare is doing appears to be some kind of protection against
NSEC-based zone enumeration, and that requires synthesizing NODATA
response.  They are unlikely to change that, and they won't be the
only ones doing this.

(going further upthread)

> Kubernetes imposes a default search path with the cluster domain last, so:
> 
>   - local.prod.svc.whatever
>   - prod.svc.whatever
>   - svc.whatever
>   - yourdomain.com

Do you have a source for that?

Considering that glibc had for a long time a hard limit at six
entries, I find that approach rather surprising.  This leaves just
three domains in the end user's context.  That's not going to be
sufficient for many users.  Anyway …

> The Cloudflare issue is that they send a SUCCESS code with 0 replies,
> which causes musl to error when it hits yourdomain.com.

Is the long search path really the problem here?  Isn't it ndots:5?
It means that queries destined to the general DNS tree hit the
organizational tree first, where the search stops due to the NODATA
response.  So you never get the expected response from the public
tree.

Is this what's happening?


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] resolver: only exit the search path loop if there are a positive number of results given
  2018-03-31 10:42         ` Florian Weimer
@ 2018-03-31 14:01           ` Rich Felker
  2018-03-31 16:08             ` Florian Weimer
  0 siblings, 1 reply; 12+ messages in thread
From: Rich Felker @ 2018-03-31 14:01 UTC (permalink / raw)
  To: Florian Weimer; +Cc: William Pitcock, musl

On Sat, Mar 31, 2018 at 12:42:17PM +0200, Florian Weimer wrote:
> * William Pitcock:
> 
> > A local proxy isn't going to be workable, because most people are
> > going to just say "but Debian or Fedora doesn't require this," and
> > then just go use a glibc distribution.
> 
> Some parts of the glibc behavior are clearly wrong and not even
> internally consistent.  Rich is right that for correctness, you can
> only proceed on the search path if you have received a successful
> reply.  However, making changes in this area is difficult, both due to
> the current state of the glibc code and to existing deployments
> depending on corner cases which are not well understood.

The behavior of path search on failures is a separate issue from the
behavior on "NODATA" so we can probably stick to the latter for now.

> I'm not entirely convinced that using different search path domains
> for different address families is necessarily wrong.

It breaks the completely reasonable application expectation that the
results produced by AF_INET and AF_INET6 queries are subsets of the
results produced by AF_UNSPEC. The proper application idiom is to use
AF_UNSPEC (or no hints) and respect the order the results are returned
in, in order to honor RFC 3484/gai.conf or any other means by which
getaddrinfo determines which order results should be tried in. It's
(IMO at least) utterly wrong to try to merge results from different
search domains, but I can see applications trying both queries
separately when they encounter the inconsistency...

> Historically,
> the NODATA/NXDOMAIN signaling has been really inconsistent anyway, and
> I suspect it still is for some users.

Do you have a reference for this? AFAIK it was very consistent in all
historical implementations. It's also documented (in RFC-????...I
forget where but I looked it up during this).

> What Cloudflare is doing appears to be some kind of protection against
> NSEC-based zone enumeration, and that requires synthesizing NODATA
> responses.  They are unlikely to change that, and they won't be the
> only ones doing this.

Thanks for the explanation.

> > Kubernetes imposes a default search path with the cluster domain last, so:
> > 
> >   - local.prod.svc.whatever
> >   - prod.svc.whatever
> >   - svc.whatever
> >   - yourdomain.com
> 
> Do you have a source for that?
> 
> Considering that glibc had for a long time a hard limit at six
> entries, I find that approach rather surprising.  This leaves just
> three domains in the end user's context.  That's not going to be
> sufficient for many users.  Anyway …

k8s isn't software you install as a package on your user system. It's
cloud/container stuff, where it wouldn't make sense to add more search
domains beyond the ones for your application.

> > The Cloudflare issue is that they send a SUCCESS code with 0 replies,
> > which causes musl to error when it hits yourdomain.com.
> 
> Is the long search path really the problem here?  Isn't it ndots:5?
> It means that queries destined to the general DNS tree hit the
> organizational tree first, where the search stops due to the NODATA
> response.  So you never get the expected response from the public
> tree.
> 
> Is this what's happening?

Yes. ndots>1 is utterly awful -- it greatly increases latency of every
lookup, and has failure modes like what we're seeing now -- but the
k8s folks designed stuff around it. Based on conversations when musl
added search domains, I think there are people on the k8s side that
realize this was a bad design choice and want to fix it, but that
probably won't be easy to roll out to everyone and I have no idea if
it's really going to happen.

FWIW, if ndots<=1 and there is only one search domain, the
NODATA/NXDOMAIN issue does not make any difference to the results
(assuming no TLDs with top-level A/AAAA records :). But if ndots>1 or
there are at least 2 search domains, the result does change. In the
former case, global lookups get broken; in the latter, subsequent
search domains get missed.

Rich


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] resolver: only exit the search path loop if there are a positive number of results given
  2018-03-31 14:01           ` Rich Felker
@ 2018-03-31 16:08             ` Florian Weimer
  0 siblings, 0 replies; 12+ messages in thread
From: Florian Weimer @ 2018-03-31 16:08 UTC (permalink / raw)
  To: Rich Felker; +Cc: William Pitcock, musl

* Rich Felker:

>> I'm not entirely convinced that using different search path domains
>> for different address families is necessarily wrong.
>
> It breaks the completely reasonable application expectation that the
> results produced by AF_INET and AF_INET6 queries are subsets of the
> results produced by AF_UNSPEC. The proper application idiom is to use
> AF_UNSPEC (or no hints) and respect the order the results are returned
> in, in order to honor RFC 3484/gai.conf or any other means by which
> getaddrinfo determines which order results should be tried in. It's
> (IMO at least) utterly wrong to try to merge results from different
> search domains, but I can see applications trying both queries
> separately when they encounter the inconsistency...

Well, yes, but I'm not sure you can get consistent behavior without
always issuing two queries.  At least not if you want to stay
compatible with certain forms of DNSSEC online signing.

>> Historically,
>> the NODATA/NXDOMAIN signaling has been really inconsistent anyway, and
>> I suspect it still is for some users.
>
> Do you have a reference for this? AFAIK it was very consistent in all
> historical implementations. It's also documented (in RFC-????...I
> forget where but I looked it up during this).

Today, I expect that it is consistent among the major
implementations, mainly due to DNSSEC influence.

Some load balancers returned NXDOMAIN for AAAA queries.  I'm not sure
if F5 was one of them, but this document suggests something in this
direction:

  https://support.f5.com/csp/article/K2345

Here's a report of this issue:

  https://www.nanog.org/mailinglist/mailarchives/old_archive/2002-04/msg00559.html

Here's a more concrete bug report about MaraDNS:

  http://maradns.samiam.org/old-list-archive/2009-October/000476.html

(Which is surprisingly recent, but then, non-lower-case domain names
are probably quite rare.)

Peter van Dijk reports something else, some form of NODATA-to-NXDOMAIN
escalation:

  https://blog.powerdns.com/2015/03/02/from-noerror-to-refused/

Although that doesn't happen on the stub resolver interface, it shows
that the behavior still isn't as uniform as we would like it to be.

>> > Kubernetes imposes a default search path with the cluster domain last, so:
>> > 
>> >   - local.prod.svc.whatever
>> >   - prod.svc.whatever
>> >   - svc.whatever
>> >   - yourdomain.com
>> 
>> Do you have a source for that?
>> 
>> Considering that glibc had for a long time a hard limit at six
>> entries, I find that approach rather surprising.  This leaves just
>> three domains in the end user's context.  That's not going to be
>> sufficient for many users.  Anyway …
>
> k8s isn't software you install as a package on your user system. It's
> cloud/container stuff, where it wouldn't make sense to add more search
> domains beyond the ones for your application.

From what I've heard, quite a few people use it to run older software
which interacts with the corporate network.  Even before, the six
domain limit was quite low for some deployments (and some sites
apparently stuck to NIS because of its server-side search list
processing).  All I'm saying is that it's a curious choice due to the
compatibility and performance issues involved.

> Yes. ndots>1 is utterly awful -- it greatly increases latency of every
> lookup, and has failure modes like what we're seeing now -- but the
> k8s folks designed stuff around it. Based on conversations when musl
> added search domains, I think there are people on the k8s side that
> realize this was a bad design choice and want to fix it, but that
> probably won't be easy to roll out to everyone and I have no idea if
> it's really going to happen.

They probably do not want to maintain an NSS module for that. 8-> 

In the past, in the container context, there have also been reports
about injecting the recursive resolver endpoint, so that it appears in
the container under a 127.0.0.0/8 address.  I don't know if that has
been solved completely.  I suspect a DNS transport over a UNIX domain
socket would help here.

For the search path problems, we would need a DNS protocol extension
which transfers search path expansion to the recursive resolver.  I'm
not sure if this is worth the effort, and how many glibc-based
distributions would be willing to backport a patch for that in a
relatively timely fashion.


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-03-31 16:08 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-30 18:52 [PATCH] resolver: only exit the search path loop if there are a positive number of results given William Pitcock
2018-03-30 19:14 ` Rich Felker
2018-03-30 19:19   ` William Pitcock
2018-03-30 19:35     ` Rich Felker
2018-03-30 19:44       ` William Pitcock
2018-03-30 20:24         ` Szabolcs Nagy
2018-03-30 20:33           ` William Pitcock
2018-03-30 20:35         ` Rich Felker
2018-03-30 21:09           ` William Pitcock
2018-03-31 10:42         ` Florian Weimer
2018-03-31 14:01           ` Rich Felker
2018-03-31 16:08             ` Florian Weimer

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/
