Hi Jameel,

Also on the same subject (since you have specifically pointed out Consul) is a thread I started at http://www.openwall.com/lists/musl/2015/09/04/3, which may interest you. I maintain the official Docker Alpine Linux image at https://github.com/gliderlabs/docker-alpine, and we are tracking related information in a similar thread at https://github.com/gliderlabs/docker-alpine/issues/8.

I'd also be interested in the conversation that took place on IRC (I'm in the channel but must have missed it). Rich, could you give approximate dates/times so I can go back through my own IRC client and check out what was discussed?

-Andy

On Tue, Sep 15, 2015 at 5:56 AM, Jameel Al-Aziz wrote:
> Thanks for the response!
>
> I would love to know more about the conversation on IRC.
>
> I almost feel like there are valid arguments on both sides. In a
> distributed environment, where machines and services come and go, it's
> pretty difficult to guarantee consistent records both reliably and quickly.
>
> While I was able to semi-solve my problem by enabling recursors through
> Consul DNS, I realized that I have a chicken-and-egg problem. The caveat
> here is that this is particular to Docker and some of the decisions they've made.
>
> The basic issue is that I have some containers that need to be run with
> "--net=host" and some that do not. The "--net=host" containers
> effectively copy over the host's resolv.conf. In order to make sure
> everything can be resolved, I need to guarantee that Consul is set up as
> early as possible. However, if the setup process itself needs DNS,
> you run into a problem. I could do some clever hackery to use the default
> host DNS and overwrite the host's /etc/resolv.conf after setting up Consul
> DNS, but that's not the greatest solution. This problem can also occur with
> bridged-networking containers if you choose to specify the "dynamic" DNS
> server as a default DNS option to the Docker daemon.
>
> Put in simpler terms, we need normal DNS resolution while
> bootstrapping, then as services register themselves, we need dependent
> services to be able to look up the newly registered entries. Effectively,
> the consistency is delayed at best.
>
> The other issue here is that having recursion enabled just feels wrong and
> insecure. Sure, this is all behind a VPC, but I like to err on the side of
> caution.
>
> I am probably wrong here, but it seems that the musl logic is only valid
> when all nameservers are consistent. However, with dynamic service
> registration, that consistency comes at the cost of speed.
>
> The behavior we would ideally want is as you mentioned:
> "Assuming no _conflicting_ positive responses, it would need to do
> something like forward positive responses as soon as it has at least
> one positive response from upstream, but only forward negative
> responses once it has a negative response from _all_ upstream sources."
>
> I'm almost certain we can accomplish what we want by having dnsmasq or
> some other DNS proxy/cache try Consul DNS first and then fall back upstream
> for non-authoritative domains. The proxy has to be available very early on,
> which is entirely doable in our scenario. However, it does add another
> layer of indirection, which is just another potential failure point.
>
> All that being said, I definitely understand why the decision was made;
> it would just be nice to have an option to enable the "robust" logic! :)
>
> On Mon, Sep 14, 2015 at 9:43 PM Rich Felker wrote:
>
>> On Tue, Sep 15, 2015 at 03:25:20AM +0000, Jameel Al-Aziz wrote:
>> > I'm sure this has been brought up before, but just thought I'd reach out
>> > to see if there's a solution.
>> >
>> > I use musl on Alpine via Docker. I encountered issues today where DNS
>> > wasn't resolving the way we expect in our images. I finally managed to
I finally managed to >> > trace it down to musl's resolver ( >> > >> http://wiki.musl-libc.org/wiki/Functional_differences_from_glibc#Name_Resolver_.2F_DNS >> > ). >> > >> > We configure resolv.conf with three DNS servers: Consul DNS, AWS VPC >> DNS, >> > Google DNS. It turns out that the AWS VPC DNS is the fastest to respond >> and >> > therefore causes results to fail even though they can be served via >> Consul >> > DNS. Putting aside that the musl resolver logic breaks convention (which >> > many people rely on), it seems that in this case it is more >> unpredictable >> > than simply following the order. >> > >> > The host DNS is Consul, and while we could just setup Consul with >> > recursors, we run the risk of failing to resolve anything if Consul >> fails. >> > Setting up a local caching DNS is also overkill (we're in Docker >> > containers). >> > >> > Is there no way to force musl to follow the order of nameservers in >> > resolv.conf? Or even if not, to allow musl to accept the first >> successful >> > response instead of failing on the first response? It seems to me that >> we >> > have to give up reliability for predictability, which is not what this >> > feature was intended to do from my understanding. >> > >> > Any help on this matter would be greatly appreciated! >> >> Someone else raised this question on our IRC channel a week or two >> ago, and in short, the answer is no. Basically this setup does not >> make sense, even if you do have a resolver (glibc) that does do >> ordered fallback: >> >> - If you expect to sometimes need the second or third nameserver for >> queries the first cannot answer, then you're going to have terrible >> performance (multi-second delay before falling back to the second >> one). >> >> - Unless all the nameservers agree on the records they're serving (in >> which case you wouldn't care about order), your query results will >> be unstable/inconsistent when the first server fails to respond. 
>>   The typical result is that you will wrongly get NxDomain instead of a
>>   failed/timed-out query.
>>
>> The second issue is really the motivation for what musl is doing: musl
>> is assuming that all the nameservers have consistent records, because
>> if they didn't, actual positive/negative results would be affected by
>> transient failures rather than transient failures being reported to
>> the calling program. This is a serious class of robustness (and
>> possibly security, since DoS can translate into false results)
>> failure.
>>
>> If you really need to union inconsistent records from multiple
>> nameservers, the right way to do this is with a DNS proxy/cache.
>> Assuming no _conflicting_ positive responses, it would need to do
>> something like forward positive responses as soon as it has at least
>> one positive response from upstream, but only forward negative
>> responses once it has a negative response from _all_ upstream sources.
>> Of course these are the constraints to do it "right"/robustly. If all
>> you want is something that works at least as well as glibc is working
>> for you now, dnsmasq is probably sufficient.
>>
>> The conversation about all this on IRC was actually quite interesting.
>> We have a no-public-logging policy so there are no logs posted
>> anywhere, but if you're interested in more of what was discussed I
>> could try to summarize it or see if the people involved would be OK
>> with sharing a log excerpt.
>>
>> Rich
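To make the difference between the two policies in this thread concrete, here is a small simulation in Python. It is a hypothetical sketch, not musl's resolver code and not how dnsmasq or any other real proxy is implemented. Upstream replies are modeled as a list in arrival order. The response kinds, the `Reply` class, the function names, the server labels and the 10.0.0.5 address are all made up for illustration. `first_answer_wins` approximates the behavior discussed above (accept the first positive or negative answer, skip failures). `union_proxy` follows Rich's proxy rule: forward a positive answer as soon as any upstream has one, but forward a negative answer only once all upstreams agree.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Illustrative response kinds for this sketch; not a real DNS library API.
POSITIVE = "positive"  # upstream returned records
NEGATIVE = "negative"  # upstream returned NXDOMAIN / no data
FAILURE = "failure"    # upstream timed out or returned SERVFAIL

Answer = Tuple[str, Tuple[str, ...]]


@dataclass(frozen=True)
class Reply:
    server: str
    kind: str
    records: Tuple[str, ...] = ()


def first_answer_wins(replies: Iterable[Reply]) -> Answer:
    """Approximation of the behavior discussed in the thread: all nameservers
    are queried at once and the first positive *or* negative answer wins.
    Failures are skipped, so FAILURE is reported only if every upstream fails."""
    for reply in replies:  # replies are in arrival order
        if reply.kind in (POSITIVE, NEGATIVE):
            return (reply.kind, reply.records)
    return (FAILURE, ())


def union_proxy(replies: Iterable[Reply], upstreams: List[str]) -> Answer:
    """Proxy rule from the thread: forward a positive answer as soon as any
    upstream has one, but forward a negative answer only after *all*
    upstreams have answered negatively. Assumes no conflicting positives."""
    negative_from = set()
    for reply in replies:  # replies are in arrival order
        if reply.kind == POSITIVE:
            return (POSITIVE, reply.records)
        if reply.kind == NEGATIVE:
            negative_from.add(reply.server)
            if negative_from >= set(upstreams):
                return (NEGATIVE, ())
    # Some upstream never answered definitively: report a transient
    # failure instead of a possibly false NXDOMAIN.
    return (FAILURE, ())


if __name__ == "__main__":
    upstreams = ["consul", "aws-vpc", "google"]
    # Hypothetical scenario: the VPC resolver answers first and does not
    # know the Consul-only name; Consul answers last.
    replies = [
        Reply("aws-vpc", NEGATIVE),
        Reply("google", NEGATIVE),
        Reply("consul", POSITIVE, ("10.0.0.5",)),
    ]
    print(first_answer_wins(replies))       # ('negative', ())
    print(union_proxy(replies, upstreams))  # ('positive', ('10.0.0.5',))
```

If Consul times out instead of answering, `union_proxy` returns `('failure', ())` and the caller sees a transient error. `first_answer_wins` still returns the VPC's negative answer, which is the false-NxDomain outcome described above. A real proxy would also need per-upstream timeouts, caching, and a policy for conflicting positive answers, all of which this sketch leaves out.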