From: William Pitcock
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: [PATCH] resolver: only exit the search path loop there are a positive number of results given
Date: Fri, 30 Mar 2018 14:44:44 -0500
References: <20180330185225.29656-1-nenolod@dereferenced.org> <20180330191452.GS1436@brightrain.aerifal.cx> <20180330193548.GT1436@brightrain.aerifal.cx>
In-Reply-To: <20180330193548.GT1436@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
To: musl@lists.openwall.com

Hello,

On Fri, Mar 30, 2018 at 2:35 PM, Rich Felker wrote:
> On Fri, Mar 30, 2018 at 02:19:48PM -0500, William Pitcock wrote:
>> Hello,
>>
>> On Fri, Mar 30, 2018 at 2:14 PM, Rich Felker wrote:
>> > On Fri, Mar 30, 2018 at 06:52:25PM +0000, William Pitcock wrote:
>> >> In the event of no results being given by any of the lookup modules, EAI_NONAME will still
>> >> be thrown.
>> >>
>> >> This is intended to mitigate problems that occur when zones are hosted by weird DNS servers,
>> >> such as the one Cloudflare have implemented, and appear in the search path.
>> >> ---
>> >>  src/network/lookup_name.c | 2 +-
>> >>  1 file changed, 1 insertion(+), 1 deletion(-)
>> >>
>> >> diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c
>> >> index 209c20f0..b068bb92 100644
>> >> --- a/src/network/lookup_name.c
>> >> +++ b/src/network/lookup_name.c
>> >> @@ -202,7 +202,7 @@ static int name_from_dns_search(struct address buf[static MAXADDRS], char canon[
>> >>                       memcpy(canon+l+1, p, z-p);
>> >>                       canon[z-p+1+l] = 0;
>> >>                       int cnt = name_from_dns(buf, canon, canon, family, &conf);
>> >> -                     if (cnt) return cnt;
>> >> +                     if (cnt > 0) return cnt;
>> >>               }
>> >>       }
>> >
>> > This patch is incorrect, and the reason should be an FAQ item if it's
>> > not already. Only a return value of 0 means that the requested name
>> > does not exist and that it's permissible to continue search. Other
>> > nonpositive return values indicate either that the name does exist but
>> > does not have a record of the requested type, or that a transient error
>> > occurred, making it impossible to determine whether the search can be
>> > continued and thus requiring the error to be reported to the caller.
>> > Anything else results in one or both of the following bugs:
>> >
>> > - Nondeterministically returning different results for the same query
>> >   depending on transient unavailability of the nameservers to answer
>> >   on time.
>> >
>> > - Returning inconsistent results (for different search components)
>> >   depending on whether AF_INET, AF_INET6, or AF_UNSPEC was requested.
>> >
>> > I'm aware that at least rancher-dns and Cloudflare's nameservers have
>> > had bugs related to this issue. I'm not sure what the status on
>> > getting them fixed is, and for Cloudflare I don't know exactly what it
>> > is they're doing wrong or why. But I do know the problem is that
>> > they're returning semantically incorrect dns replies.
>>
>> Kubernetes imposes a default search path with the cluster domain last, so:
>>
>> - local.prod.svc.whatever
>> - prod.svc.whatever
>> - svc.whatever
>> - yourdomain.com
>>
>> The cloudflare issue is that they send SUCCESS code with 0 replies,
>> which causes musl to error when it hits the yourdomain.com.
>
> Yes, that makes sense. Do you know why they're doing it? If they
> refuse to fix it, the only clean fix I know is a local proxy
> configured to fix the records for the specific broken domains you care
> about. But of course that's not convenient.

My contacts at Cloudflare indicate that their environment depends on
this behaviour, so they have no interest in fixing it.

A local proxy isn't going to be workable, because most people are
going to just say "but Debian or Fedora doesn't require this," and
then just go use a glibc distribution.

There is a talk in a few weeks at Kubecon (the Kubernetes conference),
explicitly titled "Don't Use Alpine If You Care About DNS." The talk
largely centers around how musl's overly strict behaviour makes Alpine
a bad choice for "the real world." I would like to turn this into a
story where we can instead announce that Alpine 3.8 mitigates this
problem. Doing so would be good for both Alpine and the musl ecosystem
as a whole, as it defangs a point of possible FUD.

>
>> Do you have any suggestions on a mitigation which would be more
>> palatable? We need to ship a mitigation for this in Alpine 3.8
>> regardless. I would much rather carry a patch that is upstreamable,
>> but I am quite willing to carry one that isn't, in order to solve this
>> problem.
>
> A theoretically-non-horrible (but somewhat costly) solution is to
> always query both A and AAAA, rather than only doing it for AF_UNSPEC.
> Then if you see a reply with 0 (total, between both) records, you can
> opt to interpret that the same way as NxDomain without breaking
> consistency properties.
> If Cloudflare refuses to fix the bug, maybe we
> should consider adding an _option_ (in the resolv.conf options line)
> to do this. I don't think it should be the default behavior because it
> mildly slows down lookups, especially if you have nontrivial packet
> loss since probability of failure is now 1-(1-p)²=2p-p² rather than p
> (where p is the packet loss rate).

It seems to me we could just send ANY and filter out the records we
don't care about. This is what I did with charybdis's asynchronous
DNS resolver when it had a similar problem. What are your thoughts on
that?

William
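
For readers following the thread, the return-value convention Rich describes for the search loop can be sketched in isolation. This is a simplified model with a stubbed resolver, not musl's code. The RES_* names, struct canned, stub_lookup() and search_lookup() are all hypothetical and exist only for illustration. The strict flag switches between the existing test (`if (cnt) return cnt;`) and the patched one (`if (cnt > 0) return cnt;`).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative result classes; the names and values are hypothetical,
 * chosen only to mirror the cases described in the thread. */
enum {
    RES_NXDOMAIN  = 0,   /* name does not exist: safe to keep searching */
    RES_NODATA    = -1,  /* name exists, but no record of the requested type */
    RES_TRANSIENT = -2   /* timeout or server failure: outcome unknown */
};

/* One canned answer of the stub resolver. */
struct canned { const char *fqdn; int result; };

/* Stub lookup: a positive value is an address count, otherwise a RES_* code.
 * Names missing from the table are treated as nonexistent. */
static int stub_lookup(const struct canned *tab, size_t n, const char *fqdn)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].fqdn, fqdn) == 0) return tab[i].result;
    return RES_NXDOMAIN;
}

/* Try host under each search domain in order, then the bare name.
 * strict != 0 models `if (cnt) return cnt;`
 * strict == 0 models the proposed `if (cnt > 0) return cnt;` */
static int search_lookup(const struct canned *tab, size_t n, const char *host,
                         const char *const *domains, size_t ndomains, int strict)
{
    char fqdn[256];
    assert(host && domains);
    for (size_t i = 0; i < ndomains; i++) {
        snprintf(fqdn, sizeof fqdn, "%s.%s", host, domains[i]);
        int cnt = stub_lookup(tab, n, fqdn);
        if (strict ? cnt != 0 : cnt > 0) return cnt;
    }
    return stub_lookup(tab, n, host);
}
```

With the four-entry search path quoted above, suppose the stub answers RES_NODATA for example.org.yourdomain.com (the "SUCCESS code with 0 replies" case) and 2 for example.org. The strict loop then stops at yourdomain.com and reports the error, while the patched loop falls through and returns 2. The cost Rich points out shows up with a transient failure: if db.prod.svc.whatever returns RES_TRANSIENT and db.yourdomain.com returns 1, the strict loop reports the failure, but the patched loop silently returns the address for db.yourdomain.com. The same query can therefore resolve to a different host depending on whether a nameserver answered in time.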
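
The packet-loss figure quoted above can be worked through as follows. This assumes the A and AAAA exchanges are each lost independently with probability p and that no retries are made. Both are simplifying assumptions for illustration, not a description of the resolver's actual retry behaviour. The value p = 0.05 is an arbitrary example.

```latex
\begin{align*}
P(\text{both answers arrive})      &= (1-p)^2 \\
P(\text{at least one is lost})     &= 1 - (1-p)^2 = 1 - (1 - 2p + p^2) = 2p - p^2 \\
\text{for } p = 0.05:\quad 2p - p^2 &= 0.10 - 0.0025 = 0.0975
\end{align*}
```

For small p the failure rate is close to double that of a single query, which is the slowdown the quoted text refers to.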