mailing list of musl libc
From: William Pitcock <nenolod@dereferenced.org>
To: musl@lists.openwall.com
Subject: Re: [PATCH] resolver: only exit the search path loop there are a positive number of results given
Date: Fri, 30 Mar 2018 14:44:44 -0500
Message-ID: <CA+T2pCFyTEE6rWkbiZH42i5PjwoG8CX=31e2zZjDR5i7Nk9bpA@mail.gmail.com>
In-Reply-To: <20180330193548.GT1436@brightrain.aerifal.cx>

Hello,

On Fri, Mar 30, 2018 at 2:35 PM, Rich Felker <dalias@libc.org> wrote:
> On Fri, Mar 30, 2018 at 02:19:48PM -0500, William Pitcock wrote:
>> Hello,
>>
>> On Fri, Mar 30, 2018 at 2:14 PM, Rich Felker <dalias@libc.org> wrote:
>> > On Fri, Mar 30, 2018 at 06:52:25PM +0000, William Pitcock wrote:
>> >> In the event of no results being given by any of the lookup modules, EAI_NONAME will still
>> >> be thrown.
>> >>
>> >> This is intended to mitigate problems that occur when zones are hosted by weird DNS servers,
>> >> such as the one Cloudflare have implemented, and appear in the search path.
>> >> ---
>> >>  src/network/lookup_name.c | 2 +-
>> >>  1 file changed, 1 insertion(+), 1 deletion(-)
>> >>
>> >> diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c
>> >> index 209c20f0..b068bb92 100644
>> >> --- a/src/network/lookup_name.c
>> >> +++ b/src/network/lookup_name.c
>> >> @@ -202,7 +202,7 @@ static int name_from_dns_search(struct address buf[static MAXADDRS], char canon[
>> >>                       memcpy(canon+l+1, p, z-p);
>> >>                       canon[z-p+1+l] = 0;
>> >>                       int cnt = name_from_dns(buf, canon, canon, family, &conf);
>> >> -                     if (cnt) return cnt;
>> >> +                     if (cnt > 0) return cnt;
>> >>               }
>> >>       }
>> >
>> > This patch is incorrect, and the reason should be an FAQ item if it's
>> > not already. Only a return value of 0 means that the requested name
>> > does not exist and that it's permissible to continue search. Other
>> > nonpositive return values indicate either that the name does exist but
>> > does not have a record of the requested type, or that a transient error
>> > occurred, making it impossible to determine whether the search can be
>> > continued and thus requiring the error to be reported to the caller.
>> > Anything else results in one or both of the following bugs:
>> >
>> > - Nondeterministically returning different results for the same query
>> >   depending on transient unavailability of the nameservers to answer
>> >   on time.
>> >
>> > - Returning inconsistent results (for different search components)
>> >   depending on whether AF_INET, AF_INET6, or AF_UNSPEC was requested.
>> >
>> > I'm aware that at least rancher-dns and Cloudflare's nameservers have
>> > had bugs related to this issue. I'm not sure what the status on
>> > getting them fixed is, and for Cloudflare I don't know exactly what it
>> > is they're doing wrong or why. But I do know the problem is that
>> > they're returning semantically incorrect dns replies.
>>
>> Kubernetes imposes a default search path with the cluster domain last, so:
>>
>>   - local.prod.svc.whatever
>>   - prod.svc.whatever
>>   - svc.whatever
>>   - yourdomain.com
>>
>> The Cloudflare issue is that they send a SUCCESS code with 0 records,
>> which causes musl to return an error when it hits yourdomain.com.
>
> Yes, that makes sense. Do you know why they're doing it? If they
> refuse to fix it, the only clean fix I know is a local proxy
> configured to fix the records for the specific broken domains you care
> about. But of course that's not convenient.
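
For reference, the difference between the two loop conditions under discussion can be sketched roughly as below. This is only an illustration under stated assumptions: the `MOCK_*` codes, the function names, and the idea of feeding canned per-domain results into the loop are hypothetical and are not musl's actual code or constants. A return of 0 from either function means the search path produced nothing and the caller would go on to try the name as given.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative result codes, not musl's actual constants:
 *   > 0   number of addresses found
 *   == 0  name does not exist, so the next search domain may be tried
 *   < 0   name exists without data of this type, or a transient failure */
enum { MOCK_NODATA = -1, MOCK_TRANSIENT = -2 };

/* Unpatched rule, `if (cnt) return cnt;`: any nonzero result ends the walk.
 * results[i] is the canned outcome for the i-th search domain. */
static int search_loop_unpatched(const int *results, size_t n)
{
	assert(results != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (results[i]) return results[i];
	return 0;
}

/* Patched rule, `if (cnt > 0) return cnt;`: only a positive count ends the
 * walk, so negative results are skipped like "name does not exist". */
static int search_loop_patched(const int *results, size_t n)
{
	assert(results != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (results[i] > 0) return results[i];
	return 0;
}
```

With canned results of 0, 0, 0, `MOCK_NODATA` for the four Kubernetes search domains above, the unpatched loop reports `MOCK_NODATA` while the patched loop falls through. With `MOCK_TRANSIENT` followed by a positive count, the patched loop returns the later domain's answer, which is the nondeterminism the review objects to.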

My contacts at Cloudflare indicate that their environment depends on
this behaviour, so they have no interest in fixing it.

A local proxy isn't going to be workable, because most people are
going to just say "but Debian or Fedora doesn't require this," and
then just go use a glibc distribution.

There is a talk in a few weeks at KubeCon (the Kubernetes conference),
explicitly titled "Don't Use Alpine If You Care About DNS."  The talk
largely centers around how musl's overly strict behaviour makes Alpine
a bad choice for "the real world."  I would like to turn this into a
story where we can instead announce that Alpine 3.8 mitigates this
problem.  Doing so will be good for both Alpine and the musl ecosystem
as a whole, as it defangs a possible point of FUD.

>
>> Do you have any suggestions on a mitigation which would be more
>> palatable?  We need to ship a mitigation for this in Alpine 3.8
>> regardless.  I would much rather carry a patch that is upstreamable,
>> but I am quite willing to carry one that isn't, in order to solve this
>> problem.
>
> A theoretically-non-horrible (but somewhat costly) solution is to
> always query both A and AAAA, rather than only doing it for AF_UNSPEC.
> Then if you see a reply with 0 (total, between both) records, you can
> opt to interpret that the same way as NxDomain without breaking
> consistency properties. If Cloudflare refuses to fix the bug, maybe we
> should consider adding an _option_ (in the resolv.conf options line)
> to do this. I don't think it should be the default behavior because it
> mildly slows down lookups, especially if you have nontrivial packet
> loss since probability of failure is now 1-(1-p)²=2p-p² rather than p
> (where p is the packet loss rate).
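
A rough sketch of the interpretation described in the quoted proposal, plus the failure-probability arithmetic, might look like the following. Everything here is an assumption made for illustration: `struct mock_reply`, `enum want`, the `MOCK_*` codes, and both function names are hypothetical, and nothing below sends a real query or mirrors musl's internals.

```c
#include <assert.h>

/* Which address family the caller asked for; hypothetical stand-ins for
 * AF_INET, AF_INET6, and AF_UNSPEC. */
enum want { WANT_A, WANT_AAAA, WANT_BOTH };

/* Illustrative result codes, not musl's actual constants. */
enum { MOCK_NONAME = 0, MOCK_NODATA = -1, MOCK_TRANSIENT = -2 };

/* Outcome of one query: answered is 0 if no reply arrived, nxdomain is
 * nonzero for an NxDomain reply, count is the number of records. */
struct mock_reply { int answered, nxdomain, count; };

/* Always look at both the A and the AAAA reply, whatever was requested. */
static int interpret_pair(struct mock_reply a, struct mock_reply aaaa,
                          enum want want)
{
	assert(a.count >= 0 && aaaa.count >= 0);
	/* Without both replies nothing can be concluded. */
	if (!a.answered || !aaaa.answered) return MOCK_TRANSIENT;
	if (a.nxdomain || aaaa.nxdomain) return MOCK_NONAME;
	/* Success with zero records in total: treat it like NxDomain. */
	if (a.count + aaaa.count == 0) return MOCK_NONAME;
	int cnt = (want != WANT_AAAA ? a.count : 0)
	        + (want != WANT_A ? aaaa.count : 0);
	/* The name exists but has nothing of the requested type. */
	return cnt ? cnt : MOCK_NODATA;
}

/* Chance that at least one of two independent queries is lost when each
 * is lost with probability p: 1-(1-p)^2 = 2p-p^2. */
static double failure_probability(double p)
{
	return 1.0 - (1.0 - p) * (1.0 - p);
}
```

Because the decision to continue the search depends only on the combined record count, it comes out the same whichever family was requested. For p = 0.1 the failure probability works out to 0.19, close to double the single-query rate.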

It seems to me we could just send ANY and filter out the records we
don't care about.  This is what I did with charybdis's asynchronous
DNS resolver when it had a similar problem.  What are your thoughts on
that?
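
To make the idea concrete, here is a minimal sketch of the filtering step only. It assumes the reply to the ANY query has already been parsed into an array and that the server actually included every record type in it, which is not guaranteed. `struct mock_rr` and `filter_addr_records` are hypothetical names for illustration and are not taken from charybdis or musl; the type codes are the standard DNS values.

```c
#include <assert.h>
#include <stddef.h>

/* Standard DNS record type codes. */
enum { RR_A = 1, RR_MX = 15, RR_TXT = 16, RR_AAAA = 28 };

/* Hypothetical parsed record: a type code plus its textual data. */
struct mock_rr { int type; const char *data; };

/* Copy the A and/or AAAA records from in[] to out[], preserving order,
 * and return how many were copied (at most out_cap). */
static size_t filter_addr_records(const struct mock_rr *in, size_t n,
                                  int want_a, int want_aaaa,
                                  struct mock_rr *out, size_t out_cap)
{
	size_t kept = 0;
	assert(in != NULL || n == 0);
	assert(out != NULL || out_cap == 0);
	for (size_t i = 0; i < n && kept < out_cap; i++) {
		int keep = (want_a && in[i].type == RR_A)
		        || (want_aaaa && in[i].type == RR_AAAA);
		if (keep) out[kept++] = in[i];
	}
	return kept;
}
```

A result of zero kept records would then correspond to the "0 total records" case discussed above, since the single reply covers both address types.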

William


Thread overview: 12+ messages
2018-03-30 18:52 William Pitcock
2018-03-30 19:14 ` Rich Felker
2018-03-30 19:19   ` William Pitcock
2018-03-30 19:35     ` Rich Felker
2018-03-30 19:44       ` William Pitcock [this message]
2018-03-30 20:24         ` Szabolcs Nagy
2018-03-30 20:33           ` William Pitcock
2018-03-30 20:35         ` Rich Felker
2018-03-30 21:09           ` William Pitcock
2018-03-31 10:42         ` Florian Weimer
2018-03-31 14:01           ` Rich Felker
2018-03-31 16:08             ` Florian Weimer
