From mboxrd@z Thu Jan  1 00:00:00 1970
X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/8715
Path: news.gmane.org!not-for-mail
From: Tim Hockin <thockin@google.com>
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Re: Would love to see reconsideration for domain and search
Date: Thu, 22 Oct 2015 15:36:47 -0700
Message-ID: <CAO_Rewa5vpYUZYoShoSrgJn-fv2xvMPhFjpLQsuMekHVs+grtQ@mail.gmail.com>
References: <CAO_RewaaFU=RmDPHNwpaCg=FsoG+dqr4NV-QXY=fAkVYnsS-Jw@mail.gmail.com>
 <20151022215608.GA8645@brightrain.aerifal.cx>
Reply-To: musl@lists.openwall.com
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
X-Trace: ger.gmane.org 1445553456 4884 80.91.229.3 (22 Oct 2015 22:37:36 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Thu, 22 Oct 2015 22:37:36 +0000 (UTC)
Cc: musl@lists.openwall.com
To: Rich Felker <dalias@libc.org>
Original-X-From: musl-return-8728-gllmg-musl=m.gmane.org@lists.openwall.com Fri Oct 23 00:37:27 2015
Return-path: <musl-return-8728-gllmg-musl=m.gmane.org@lists.openwall.com>
Envelope-to: gllmg-musl@m.gmane.org
Original-Received: from mother.openwall.net ([195.42.179.200])
	by plane.gmane.org with smtp (Exim 4.69)
	(envelope-from <musl-return-8728-gllmg-musl=m.gmane.org@lists.openwall.com>)
	id 1ZpOU5-00010o-JM
	for gllmg-musl@m.gmane.org; Fri, 23 Oct 2015 00:37:21 +0200
Original-Received: (qmail 24188 invoked by uid 550); 22 Oct 2015 22:37:19 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
List-Post: <mailto:musl@lists.openwall.com>
List-Help: <mailto:musl-help@lists.openwall.com>
List-Unsubscribe: <mailto:musl-unsubscribe@lists.openwall.com>
List-Subscribe: <mailto:musl-subscribe@lists.openwall.com>
Original-Received: (qmail 24167 invoked from network); 22 Oct 2015 22:37:18 -0000
In-Reply-To: <20151022215608.GA8645@brightrain.aerifal.cx>
Xref: news.gmane.org gmane.linux.lib.musl.general:8715
Archived-At: <http://permalink.gmane.org/gmane.linux.lib.musl.general/8715>

On Thu, Oct 22, 2015 at 2:56 PM, Rich Felker <dalias@libc.org> wrote:
> On Thu, Oct 22, 2015 at 02:24:11PM -0700, Tim Hockin wrote:
>> Hi all,
>>
>> I saw this thread on the web archive but am not sure how to respond to
>> the thread directly as a new joinee of the ML.  I hope this finds its
>> way...
>
> No problem; just starting a new thread like this and quoting the old
> one is fine.
>
>> I am one of the developers of Kubernetes and I own the DNS portion, in
>> particular.  I desperately want to use Alpine Linux (based on musl)
>> but for now I have to warn people NOT to use it because of this issue.
>>
>> On Fri, Sep 04, 2015 at 02:04:29PM -0400, Rich Felker wrote:
>> > On Fri, Sep 04, 2015 at 12:11:36PM -0500, Andy Shinn wrote:
>> >> I'm writing to the wonderful musl project today to open discussion
>> >> about the future possibility of DNS search and domain keyword
>> >> support. We've been using musl libc (by way of Alpine Linux) for
>> >> new development of applications as containers that discover each
>> >> other through DNS and other software defined networking.
>> >>
>> >> In particular, we are starting to use applications like SkyDNS,
>> >> Consul, and Kubernetes, all of which rely on local name
>> >> resolution in some way using search paths. Many users of the
>> >> Alpine Linux container image have also expressed their desire for
>> >> this feature at
>> >> https://github.com/gliderlabs/docker-alpine/issues/8.
>> >>
>> >> The "functional differences from glibc" page says of the domain
>> >> and search keywords: "Support may be added in the future if there
>> >> is demand". So please consider this request an addition to whatever
>> >> demand for the feature already exists.
>> >>
>> >> Thank you for your time and great work on the musl libc project!
>> >
>> > I think this is a reasonable request. I'll look into it more.
>> >
>> > One property I do not want to break is deterministic results, so
>> > when a search is performed, if any step of the search ends with
>> > an error rather than a positive or negative result, the whole
>> > lookup needs to stop and report the error rather than falling
>> > back. Falling back is not safe and creates a situation where DoS
>> > can be used to control which results are returned.
>>
>> I understand your point, though the world at large tends to disagree.
>> Everyone has a primary and secondary `nameserver` record (or should).
>> If the first one times out, try the second.  Most resolver libs seem
>> to accept a SERVFAIL response or a timeout as a signal to try the next
>> server, and I would encourage you to do the same.
>
> musl intentionally does not do this because it yields abysmal
> performance. If the first nameserver is overloaded or the packet is
> lost, you suffer several-second lookup latency.

But at least it works eventually.  You're faced with a choice: wait 2
seconds for ns1 to time out and then fail in a way that most apps don't
handle well, or wait 2 seconds and then (usually) get a fast response
from ns2.

It seems better in every way to eventually succeed, though I agree
it's a bit less visible.

>> Stopping on positive response or NXDOMAIN seems to be commonly
>> accepted with a caveat.  You can't query all nameservers and just take
>> the first NXDOMAIN to respond.  You can only accept NXDOMAIN if all of
>> the higher-priority (listed first in resolv.conf) nameservers have
>> timed out or SERVFAIL'ed.  You can issue queries in parallel, but you
>> must process responses in order, which is what you describe below.
>
> Timeout or servfail is not sufficient to accept an nxdomain from a
> lower-priority server. To preserve consistency of results under
> transient failure, you actually have to wait for the nxdomain from the

I have to disagree.  Some non-forwarding DNS servers use SERVFAIL to
indicate "I am not serving for that domain" specifically to make the
client move to their next nameserver.  If ns1 returns SERVFAIL, try
ns2.  If ns1 times out, try ns2.  Otherwise, what good is ns2?
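The fallback rule argued for here can be sketched in C as a function that walks per-server outcomes in resolv.conf order. This is a minimal, hypothetical illustration (the enum, the `PICK_*` constants and `pick_nameserver_result` are invented names, not musl or glibc code), and it encodes this message's position that a timeout or SERVFAIL moves on to the next server; Rich's stricter position would treat those as errors instead. Because a still-pending higher-priority server blocks the decision, the same function also covers issuing queries in parallel while processing responses in order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-nameserver outcome of one query. */
enum ns_status {
    NS_PENDING,   /* no reply yet (query may still be in flight) */
    NS_TIMEOUT,   /* gave up waiting on this server */
    NS_SERVFAIL,  /* server replied SERVFAIL */
    NS_ANSWER,    /* positive answer */
    NS_NXDOMAIN   /* "no such name" */
};

#define PICK_WAIT       (-1) /* a higher-priority server is still pending */
#define PICK_ALL_FAILED (-2) /* every server timed out or SERVFAILed */

/*
 * st[0..n-1] holds each nameserver's status in resolv.conf order.
 * Returns the index of the server whose reply is the result, or one
 * of the PICK_* codes. A lower-priority reply (positive or NXDOMAIN)
 * is only accepted once every higher-priority server has failed.
 */
int pick_nameserver_result(const enum ns_status *st, size_t n)
{
    assert(st != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        switch (st[i]) {
        case NS_PENDING:
            return PICK_WAIT;   /* cannot decide yet; keep waiting */
        case NS_TIMEOUT:
        case NS_SERVFAIL:
            continue;           /* fall back to the next nameserver */
        case NS_ANSWER:
        case NS_NXDOMAIN:
            return (int)i;      /* first definite reply in priority order */
        }
    }
    return PICK_ALL_FAILED;
}
```

A caller would re-run this each time a reply or timeout arrives and stop waiting once it returns something other than `PICK_WAIT`.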

> higher-priority server. Either way, this very much pessimizes usage
> cases like running "netstat" with huge numbers of connections where
> many of the ip addresses fail to reverse. Being able to return
> immediately as soon as any one of the nameservers responds with
> nxdomain makes the difference between a <1s netstat run and a 5-10s
> netstat run.

Sure it's faster but it's WRONG.  Returning a random number would be
faster, too, but it is equally wrong.  This is why netstat (and myriad
other tools) has a `-n` flag.

> Thus, if we add extensions to support the kind of result unioning you
> want across multiple nameservers, I think they should be configurable
> and off-by-default. A simple option in resolv.conf could turn them on.
> And there could be options for requiring nxdomain from all servers
> (true union) or just for highest-priority when accepting negative
> results.

I can't agree with this.  It's reasonable to make options for these,
but I think it's the non-standard behaviors that should be off by
default.  Consider this from the point of view of a system like Docker
or Kubernetes, which generate resolv.conf for you: they have no idea
what libc your apps are using, so it's unreasonable to ask them to
turn off libc-specific flags.  However, the end user knows, and it is
perfectly sane to ask them to explicitly opt in to non-standard
optimized behaviors.

>> > While it would be possible to parallelize the search while
>> > serializing the results (i.e. waiting to accept a result from the
>> > second query until the first query finished with a negative
>> > result), I think the consensus during the last round of
>> > discussion of this topic was that the complexity cost is too
>> > great and the benefit too small. Ideally, the first query should
>> > always succeed, anyway.
>>
>> The real world is not ideal.  Not all nameservers are identically
>> scoped - you MUST respect the ordering in resolv.conf - to do
>> otherwise is semantically broken.  If implementation simplicity means
>> literally doing queries in serial, then that is what you should do.
>
> Even legacy resolvers had the option to rotate the nameservers for
> load-balancing, so I think it's a stretch to say the ordering is
> supposed to be semantic. My view has always been that multiple
> nameservers in resolv.conf are for redundancy, not for serving
> conflicting records.

You argued above that you should not try a secondary server in case of
timeout or SERVFAIL.  Obviously you would not try it on success or
NXDOMAIN.  When do you see a secondary being used at all?

As for rotate, note that it is an option and OFF by default in every
mainstream resolver implementation.

But this point is sort of academic for us - we're moving to a
forwarding nameserver, so really there is only the primary nameserver.
We just need you to ask the first nameserver first.

>> Similarly, you can't just search all search domains in parallel and
>> take the first response.  The ordering is meaningful.
>
> Indeed, search domains are like that, because they inherently produce
> ambiguity/overlapping namespaces with different definitions. This is
> why I and others who weighed in on the original question of
> supporting them were against it, but left the option open to revisit
> the topic if users who need them show up.

Yeah, I scanned the related threads.  I understand the issue in
theory, but in practice these are things configured by admins.  If
there's a conflict or ambiguity, you should solve that, not jettison
powerful functionality.
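For search domains, the ordering requirement and the determinism rule from earlier in the thread (stop and report on any error rather than falling back) combine into a simple loop. A minimal sketch in C, assuming a hypothetical callback `lookup_fn` that resolves one fully qualified name; `search_in_order`, `lookup_rc` and the `canned_lookup` test double are illustrative names, not an existing API.

```c
#include <assert.h>
#include <stddef.h>

enum lookup_rc { LOOKUP_OK, LOOKUP_NXDOMAIN, LOOKUP_ERROR };

/* Hypothetical resolver hook: look up one fully qualified name. */
typedef enum lookup_rc (*lookup_fn)(const char *fqdn, void *ctx);

/*
 * Try candidates[0..n-1] strictly in order. Returns the index of the
 * first name that resolved, -1 if every candidate was NXDOMAIN, or -2
 * as soon as any lookup fails indeterminately. Stopping on error means
 * a DoS against one query cannot steer which later candidate "wins".
 */
int search_in_order(const char *const *candidates, size_t n,
                    lookup_fn lookup, void *ctx)
{
    assert(lookup != NULL);
    for (size_t i = 0; i < n; i++) {
        switch (lookup(candidates[i], ctx)) {
        case LOOKUP_OK:
            return (int)i;   /* first positive result in search order */
        case LOOKUP_NXDOMAIN:
            break;           /* definite negative: try the next suffix */
        case LOOKUP_ERROR:
        default:
            return -2;       /* indeterminate: report, don't fall back */
        }
    }
    return -1;
}

/* Test double (hypothetical): replays canned results in call order. */
struct canned { const enum lookup_rc *rc; size_t calls; };

static enum lookup_rc canned_lookup(const char *fqdn, void *ctx)
{
    struct canned *c = ctx;
    (void)fqdn;
    return c->rc[c->calls++];
}
```

Counting calls in the test double shows that no query is issued past the first positive answer or the first error.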

>> > I also have a few questions:
>> >
>> > 1. Do you need multiple search items, or just a single domain?
>> >    Any setup with multiple searches necessarily has suboptimal
>> >    performance because ndots is not sufficient to make the right
>> >    initial choice of query. If you do need this functionality, a
>> >    unioning proxy dns server may be a better option than resolv.conf
>> >    domain search; it would give much better performance.
>>
>> We use multiple search paths and ndots > 1.  I'm not sure what you
>> mean by "unioning" here.  Search path ordering is as meaningful as
>> nameserver ordering.  You can't avoid making the query for each search
>> suffix in the worst case, and it has the same restriction as
>> nameservers - the search order must be respected.
>>
>> There do seem to be some implementations that search for the
>> "naked" query first and others that search for it last, though.  I
>> think the semantically correct (but pessimal-performance) choice is
>> to search for it last.
>
> The traditional behavior is to do the naked query first if the query
> string has at least 'ndots' dots, and to do the search domains first
> otherwise. Also I believe a final dot always suppresses search.
>
> My point was that with ndots=1 (default) and only a single search
> domain, the _expected_ result is that the first query succeed. But if
> you have ndots>1 or multiple search domains, you expect a portion of
> your queries to fall back at least once. This adds significant
> latency.

It adds latency, but the magnitude is very much determined by the
installation.  In our case it is negligible and well worth the cost.
I fear you're optimizing without data - it should be the site-admin's
problem to configure things in an acceptable way.  libc doesn't get to
decide what "acceptable" means.
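The traditional ordering described in the quoted text (bare name first when it has at least `ndots` dots, search domains first otherwise, no search after a final dot), plus the "skip the search list entirely" variant from question 3 in this thread, can be sketched as a candidate-list builder. This is an illustrative sketch only: `build_candidates`, `MAX_NAME` and the `search_if_dotted` flag are hypothetical, and the flag is a knob for comparing the two behaviors, not a real resolv.conf option.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAME 256  /* assumption: generous bound on a name in text form */

static size_t count_dots(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (*s == '.')
            n++;
    return n;
}

/*
 * Fill out[] with the names to query, in order, and return how many.
 * - A trailing dot means "already fully qualified": query it as written.
 * - With >= ndots dots, query the bare name first; append the search
 *   list after it only if search_if_dotted is true (traditional) and
 *   skip it otherwise (the question-3 variant).
 * - With fewer dots, try each search suffix in order, then the bare name.
 * out must have room for nsearch + 1 entries.
 */
size_t build_candidates(const char *name, const char *const *search,
                        size_t nsearch, size_t ndots, bool search_if_dotted,
                        char out[][MAX_NAME])
{
    assert(name != NULL && out != NULL);
    size_t len = strlen(name), k = 0;

    if (len > 0 && name[len - 1] == '.') {
        snprintf(out[k++], MAX_NAME, "%s", name);
        return k;
    }
    bool dotted = count_dots(name) >= ndots;
    if (dotted)
        snprintf(out[k++], MAX_NAME, "%s", name);
    if (!dotted || search_if_dotted)
        for (size_t i = 0; i < nsearch; i++)
            snprintf(out[k++], MAX_NAME, "%s.%s", name, search[i]);
    if (!dotted)
        snprintf(out[k++], MAX_NAME, "%s", name);
    return k;
}
```

With `ndots:5` and six search domains, a name like `www.example.com` (two dots) produces seven candidates with the bare name last, which is the latency cost being debated here.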

> In such a situation, you can avoid the additional latency (except on
> the first query of a given record) by running a local caching
> nameserver that does the search and unioning for you, rather than
> having the stub resolver in libc do it. Then subsequent queries
> succeed immediately using the cache. The reason I asked about usage
> case (ndots=1 vs ndots>1, single vs multiple search) is that, in the
> multi-fallback case, it might make more sense (from a performance and
> clean design standpoint) to implement this with a caching nameserver
> on localhost rather than in musl.

We might be moving to a per-machine local DNS agent, which would cache
as you describe.  HOWEVER, there's a pretty important piece that I
guess I left out.  Docker and Kubernetes and similar systems run many
containers per machine.  Each container has a potentially different
search path.  I might run 100 or more containers on a single machine -
I can't run 100 DNS caches, and I can't put that back on users.

So from our perspective the search paths MUST come from the containers
themselves, even if we run a machine local cache to mitigate latency
and SPOF.
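If each container carries its own resolv.conf, the libc inside that container is what has to read its search list and ndots value. A minimal sketch of that parsing in C, assuming only the `search`, `domain` and `options ndots:N` keywords matter here; `struct resolv_conf_lite`, `parse_resolv_conf` and `next_token` are hypothetical names. The six-entry cap, "last of `domain`/`search` wins", and the ndots cap of 15 follow the traditional resolv.conf(5) documentation rather than anything stated in this thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SEARCH 6    /* assumption: traditional search-list limit */
#define MAX_DOMAIN 256

struct resolv_conf_lite {
    char search[MAX_SEARCH][MAX_DOMAIN];
    size_t nsearch;
    unsigned ndots;     /* resolver default is 1 */
};

/* Copy the next token on the current line into tok and advance *p.
   Returns 0 (leaving *p at '\n' or '\0') when the line has no more. */
static int next_token(const char **p, char *tok, size_t cap)
{
    const char *s = *p;
    size_t n = 0;
    while (*s != '\n' && isspace((unsigned char)*s))
        s++;
    if (*s == '\0' || *s == '\n') {
        *p = s;
        return 0;
    }
    while (*s && !isspace((unsigned char)*s)) {
        if (n + 1 < cap)
            tok[n++] = *s;  /* silently truncate over-long tokens */
        s++;
    }
    tok[n] = '\0';
    *p = s;
    return 1;
}

void parse_resolv_conf(const char *text, struct resolv_conf_lite *rc)
{
    assert(text != NULL && rc != NULL);
    char tok[MAX_DOMAIN];
    const char *p = text;

    rc->nsearch = 0;
    rc->ndots = 1;
    while (*p) {
        if (next_token(&p, tok, sizeof tok)) {
            int is_search = strcmp(tok, "search") == 0;
            if (is_search || strcmp(tok, "domain") == 0) {
                size_t max = is_search ? MAX_SEARCH : 1;
                rc->nsearch = 0;  /* the last domain/search line wins */
                while (rc->nsearch < max && next_token(&p, tok, sizeof tok))
                    strcpy(rc->search[rc->nsearch++], tok);
            } else if (strcmp(tok, "options") == 0) {
                while (next_token(&p, tok, sizeof tok))
                    if (strncmp(tok, "ndots:", 6) == 0) {
                        unsigned long v = strtoul(tok + 6, NULL, 10);
                        rc->ndots = v > 15 ? 15 : (unsigned)v;
                    }
            }
            /* other keywords and '#'/';' comment lines are ignored */
        }
        while (*p && *p != '\n')  /* skip the rest of the line */
            p++;
        if (*p == '\n')
            p++;
    }
}
```

Since the parse runs per process against that container's own file, 100 containers with 100 different search paths need no shared per-search-path cache for correctness.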

>> > 2. For your intended applications, is there a need to support
>> >    ndots>1?  Such configurations are generally not friendly to
>> >    applications that expect to be able to resolve normal internet
>> >    domain lookups, and performance for such lookups will be very bad
>> >    (because the search domains first have to fail).
>>
>> DNS is a very lightweight protocol.  We have not measured any
>> practical detriment for having 6 search domains and ndots=5.  In the
>> normal case it fails very quickly.  That aside, it should be my
>> business if I want to (mis)configure my system that way :)
>
> I suspect we have different definitions of quick... :)

Quick is situational.  In a cloud-based mostly-webapp stack, 50ms to
do a name lookup ain't so bad, given the relative infrequency of that
operation.  Also most names actually DO resolve on the first or second
search path.

>> > 3. The glibc behavior is just to swap the order of search when
>> >    the query string has >=ndots dots in it, but would it be
>> >    acceptable never to try the search domains at all in this case?
>> >    That would yield much better performance for nxdomain results and
>> >    avoid unexpected positive results due to weird subdomains
>> >    existing in your search domain (e.g. a wildcard for
>> >    *.us.example.com would cause *.us to wrongly resolve for
>> >    non-existent .us domains).
>>
>> I think that would be correct.  If I have 3 dots and ndots=2, search
>> paths should be ignored.
>
> Glad we agree on this.
>
> I hope you feel like this conversation is productive. I don't want to
> rule out anything/"say no" right away, but rather try to get a better
> understanding of your requirements first and figure out what makes the
> most sense to do on musl's side.

Absolutely.  I'm happy to engage.  Obviously our use case is a bit
outside of what musl was really aiming for, but it offers a really
nice base for very efficient containers.  A lot of people want to use
Alpine and it breaks my heart to tell them they can't.