From mboxrd@z Thu Jan 1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/12665
Path: news.gmane.org!.POSTED!not-for-mail
From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: [PATCH] resolver: only exit the search path loop there are a positive number of results given
Date: Sat, 31 Mar 2018 10:01:57 -0400
Message-ID: <20180331140157.GV1436@brightrain.aerifal.cx>
References: <20180330185225.29656-1-nenolod@dereferenced.org> <20180330191452.GS1436@brightrain.aerifal.cx> <20180330193548.GT1436@brightrain.aerifal.cx> <874lkw1q7q.fsf@mid.deneb.enyo.de>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Trace: blaine.gmane.org 1522504817 11710 195.159.176.226 (31 Mar 2018 14:00:17 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Sat, 31 Mar 2018 14:00:17 +0000 (UTC)
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: William Pitcock, musl@lists.openwall.com
To: Florian Weimer
Original-X-From: musl-return-12679-gllmg-musl=m.gmane.org@lists.openwall.com Sat Mar 31 16:00:13 2018
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200]) by blaine.gmane.org with smtp (Exim 4.84_2) (envelope-from ) id 1f2H3E-0002wN-JN for gllmg-musl@m.gmane.org; Sat, 31 Mar 2018 16:00:12 +0200
Original-Received: (qmail 32173 invoked by uid 550); 31 Mar 2018 14:02:15 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
Original-Received: (qmail 32146 invoked from network); 31 Mar 2018 14:02:15 -0000
Content-Disposition: inline
In-Reply-To: <874lkw1q7q.fsf@mid.deneb.enyo.de>
Original-Sender: Rich Felker
Xref: news.gmane.org gmane.linux.lib.musl.general:12665

On Sat, Mar 31, 2018 at 12:42:17PM +0200, Florian Weimer wrote:
> * William Pitcock:
>
> > A local proxy isn't going to be workable, because most people are
> > going to just say "but Debian or Fedora doesn't require this," and
> > then just go use a glibc distribution.
>
> Some parts of the glibc behavior are clearly wrong and not even
> internally consistent. Rich is right that for correctness, you can
> only proceed on the search path if you have received a successful
> reply. However, making changes in this area is difficult, both due to
> the current state of the glibc code, and existing deployments
> depending on corner cases which are not well-understood.

The behavior of path search on failures is a separate issue from the
behavior on "NODATA" so we can probably stick to the latter for now.

> I'm not entirely convinced that using different search path domains
> for different address families is necessarily wrong.

It breaks the completely reasonable application expectation that the
results produced by AF_INET and AF_INET6 queries are subsets of the
results produced by AF_UNSPEC. The proper application idiom is to use
AF_UNSPEC (or no hints) and respect the order the results are returned
in, in order to honor RFC 3484/gai.conf or any other means by which
getaddrinfo determines which order results should be tried in.

It's (IMO at least) utterly wrong to try to merge results from
different search domains, but I can see applications trying both
queries separately when they encounter the inconsistency...

> Historically,
> the NODATA/NXDOMAIN signaling has been really inconsistent anyway, and
> I suspect it still is for some users.

Do you have a reference for this? AFAIK it was very consistent in all
historical implementations. It's also documented (in RFC-????...I
forget where but I looked it up during this).

> What Cloudflare is doing appears to be some kind of protection against
> NSEC-based zone enumeration, and that requires synthesizing NODATA
> response. They are unlikely to change that, and they won't be the
> only ones doing this.

Thanks for the explanation.

> > Kubernetes imposes a default search path with the cluster domain last, so:
> >
> > - local.prod.svc.whatever
> > - prod.svc.whatever
> > - svc.whatever
> > - yourdomain.com
>
> Do you have a source for that?
>
> Considering that glibc had for a long time a hard limit at six
> entries, I find that approach rather surprising. This leaves just
> three domains in the end user's context. That's not going to be
> sufficient for many users. Anyway …

k8s isn't software you install as a package on your user system. It's
cloud/container stuff, where it wouldn't make sense to add more search
domains beyond the ones for your application.

> > The cloudflare issue is that they send SUCCESS code with 0 replies,
> > which causes musl to error when it hits the yourdomain.com.
>
> Is the long search path really the problem here? Isn't it ndots:5?
> It means that queries destined to the general DNS tree hit the
> organizational tree first, where the search stops due to the NODATA
> response. So you never get the expected response from the public
> tree.
>
> Is this what's happening?

Yes. ndots>1 is utterly awful -- it greatly increases latency of every
lookup, and has failure modes like what we're seeing now -- but the
k8s folks designed stuff around it. Based on conversations when musl
added search domains, I think there are people on the k8s side that
realize this was a bad design choice and want to fix it, but that
probably won't be easy to roll out to everyone and I have no idea if
it's really going to happen.

FWIW, if ndots<=1 and there is only one search domain, the
NODATA/NXDOMAIN issue does not make any difference to the results
(assuming no TLDs with top-level A/AAAA records :). But if ndots>1 or
there are at least 2 search domains, the result does change. In the
former case, global lookups get broken; in the latter, subsequent
search domains get missed.

Rich
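
To make the two failure cases in the last paragraph concrete, here is a
rough toy model of search-path iteration. It is only a sketch and not
musl's or glibc's actual code: the names make_lookup, candidates and
resolve, the stop_on_nodata flag, the stub zone data, the a.example and
b.example domains, and the 192.0.2.x addresses are all made up for
illustration. It assumes a name with fewer than ndots dots is tried
against each search domain in order and then as-is, and that NXDOMAIN
always advances to the next candidate.

```python
NXDOMAIN = "NXDOMAIN"   # the name does not exist at all
NODATA = "NODATA"       # success rcode, but zero answer records


def make_lookup(records, nodata_names=()):
    """Build a stub resolver: returns an address list, NODATA, or NXDOMAIN."""
    def lookup(fqdn):
        if fqdn in records:
            return records[fqdn]
        if fqdn in nodata_names:
            return NODATA
        return NXDOMAIN
    return lookup


def candidates(name, search, ndots):
    """Names to try, in order (simplified model of ndots handling)."""
    if name.endswith("."):
        return [name.rstrip(".")]          # explicitly absolute name
    if name.count(".") >= ndots:
        return [name]                      # enough dots: global lookup only
    return [f"{name}.{d}" for d in search] + [name]


def resolve(name, search, ndots, lookup, stop_on_nodata=True):
    """Walk the candidates; NXDOMAIN always moves on to the next one."""
    for fqdn in candidates(name, search, ndots):
        answer = lookup(fqdn)
        if answer == NXDOMAIN:
            continue
        if answer == NODATA:
            if stop_on_nodata:
                return NODATA              # strict: the name exists, stop here
            continue                       # lenient: treat like NXDOMAIN
        return answer
    return NXDOMAIN


# Case 1: ndots>1. A NODATA reply under the last search domain hides the
# global name.
k8s_search = ["local.prod.svc.whatever", "prod.svc.whatever",
              "svc.whatever", "yourdomain.com"]
k8s_lookup = make_lookup({"example.org": ["192.0.2.1"]},
                         {"example.org.yourdomain.com"})
strict_global = resolve("example.org", k8s_search, 5, k8s_lookup)
lenient_global = resolve("example.org", k8s_search, 5, k8s_lookup,
                         stop_on_nodata=False)

# Case 2: two search domains. NODATA under the first hides a record under
# the second.
two_lookup = make_lookup({"db.b.example": ["192.0.2.2"]}, {"db.a.example"})
strict_search = resolve("db", ["a.example", "b.example"], 1, two_lookup)
lenient_search = resolve("db", ["a.example", "b.example"], 1, two_lookup,
                         stop_on_nodata=False)
```

With stop_on_nodata=True both cases end in NODATA. With it set to False
the first case reaches the global example.org record and the second
reaches db.b.example, which is the difference in results described
above.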