mailing list of musl libc
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com
Cc: Andrey Arapov <andrey.arapov@nixaid.com>
Subject: Re: DNS FQDN query issue in musl starting 1.1.13
Date: Fri, 13 Sep 2019 08:15:28 -0400
Message-ID: <20190913121528.GE9017@brightrain.aerifal.cx>
In-Reply-To: <bca0b9302a2485a52ee69d4bd7d26ecc@nixaid.com>

On Fri, Sep 13, 2019 at 07:43:28AM +0000, Andrey Arapov wrote:
> Hello,
> 
> I've noticed that musl libc, starting with 1.1.13, doesn't try to resolve the FQDN first;
> instead it tries <FQDN>.<search_domain_found_in_/etc/resolv.conf_file> first, which differs
> from how the GNU C library works.

This is only the case if the name contains fewer than ndots dots and
does not end in a dot. This behavior should match all other resolvers.
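
As a rough illustration of that rule (a sketch in Python, not musl's
actual code; the function name candidate_queries is made up for this
example), the order in which names get queried can be modelled like
this, using the resolv.conf values quoted further down:

```python
def candidate_queries(name, search, ndots=1):
    """Return the names to query, in order, under the rule described above.

    Hypothetical helper for illustration only; not part of any resolver API.
    """
    # A trailing dot marks the name as absolute, and a name with at least
    # ndots dots is queried as given, so search is skipped in both cases.
    if name.endswith("."):
        return [name[:-1]]
    if name.count(".") >= ndots:
        return [name]
    # Otherwise each search domain is tried first, then the name as given.
    return [f"{name}.{domain}" for domain in search] + [name]


search = [
    "kube-system.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "some.brokendnsserver.com",
]
name = "ucp-controller.kube-system.svc.cluster.local"  # contains 4 dots

# With ndots:5 the name has too few dots, so the search domains come first.
for query in candidate_queries(name, search, ndots=5):
    print(query)

# A trailing dot bypasses search regardless of ndots.
print(candidate_queries(name + ".", search, ndots=5))
```

With ndots:5 the name as given is only the last candidate; with a
trailing dot it is the only one.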

> Also, since musl C library is "never falling back to search, which glibc would do" according to
> https://wiki.musl-libc.org/functional-differences-from-glibc.html#Name-Resolver/DNS
> 
> this poses an issue when DNS server is misconfigured.
> 
> For example, when the DNS server returns SERVFAIL (no SOA), musl simply stops without
> attempting the FQDN.

If one lookup ends in ServFail, it's indeterminate and must be
reported as an error to the caller. Otherwise the successful result of
a lookup yields different values depending on transient failure of a
nameserver. This is dangerously wrong regardless of whether other
implementations do it.
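
A minimal sketch of that policy, in Python rather than C and not taken
from musl's source: the lookup callback, the status strings, the
exception name and the example zone data are all assumptions made for
illustration. Only a definitive negative answer lets the search move on
to the next candidate; an indeterminate one is reported to the caller.

```python
class IndeterminateLookup(Exception):
    """Raised when a candidate's lookup fails without a definitive answer."""


def resolve(candidates, lookup):
    # lookup(name) is a hypothetical callback returning one of
    # ("ok", addrs), ("nxdomain", None) or ("servfail", None).
    for query in candidates:
        status, addrs = lookup(query)
        if status == "ok":
            return addrs
        if status == "servfail":
            # We cannot tell whether this name exists, so falling through
            # to a later candidate could return a different host's address.
            raise IndeterminateLookup(query)
        # "nxdomain": the name definitely does not exist, try the next one.
    return []


def make_lookup(zone, failing=()):
    """Build a fake lookup over a dict, with some names forced to ServFail."""
    def lookup(query):
        if query in failing:
            return ("servfail", None)
        if query in zone:
            return ("ok", zone[query])
        return ("nxdomain", None)
    return lookup


# Made-up data: the same short name exists under two search domains.
zone = {"db.prod.example": ["192.0.2.10"], "db.example": ["203.0.113.7"]}
candidates = ["db.prod.example", "db.example"]

print(resolve(candidates, make_lookup(zone)))
try:
    resolve(candidates, make_lookup(zone, failing={"db.prod.example"}))
except IndeterminateLookup as exc:
    print("lookup failed, result unknown:", exc)
```

If the ServFail were skipped instead, the second call would hand back
the address of db.example, a different host than the one returned when
the nameserver is healthy.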

This was all discussed (with people involved in the Docker-related
projects using this kind of search trick, as I remember it) at the
time search was added. Addition of search was explicitly conditional
on *not* reproducing buggy/dangerous behavior other implementations
have.

> So having a wrong record in the /etc/resolv.conf will cause musl C
> resolver to break way too fast.
> 
> I was wondering whether this is an expected behavior or not? And can
> this be changed in a way so musl C lib is trying the FQDN first?

Don't set ndots>1. ndots>1 has all sorts of problems.

> This behavior is making some people resort to using short hostnames instead of FQDNs, such as
> ad-hoc patching ucp-metrics (Alpine based container) --
> https://forums.docker.com/t/ucp-dashboard-shows-no-data/72337/4
> 
> 
> To expand the issue with the ucp-metrics:
> 
> So when resolv.conf is set to the following configuration:
> nameserver 10.96.0.10
> search kube-system.svc.cluster.local svc.cluster.local cluster.local some.brokendnsserver.com
> options ndots:5
> 
> An attempt to resolve
> ucp-controller.kube-system.svc.cluster.local is turned into an
> attempt to resolve
> ucp-controller.kube-system.svc.cluster.local.some.brokendnsserver.com
> first.
> 
> Workaround people use in the wild is: ucp-controller.kube-system.svc.cluster.local => ucp-controller
> 
> I've already informed Docker Support about this issue; they are
> working on a knowledge base article about it, so that people are
> aware of it and can decide to fix either their search-domain DNS
> server (if they have access to it) or their resolv.conf
> entry.

Ideally they just would not use ndots>1, since it necessarily yields
this and lots of other problems (like extra round trips and timeout
delay for each lookup, even if the lookup works). I'm not sure what to
recommend as an alternative since I don't entirely understand the
usage constraints here, but I know these issues were all known on the
Docker and Kubernetes side back when search was first implemented in
musl, and that folks understood that these uses of search domains with
multiple components were a problem and planned to phase them out. I
don't know what happened with that.
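
To put a number on the extra round trips, here is a back-of-the-envelope
sketch in Python. The function name lookups_needed is made up, and it
assumes every search-domain attempt comes back as a clean NXDOMAIN; it
counts how many names must be looked up before the name as given is
finally tried:

```python
def lookups_needed(name, search, ndots):
    """Count names looked up until the name as given is reached.

    Hypothetical helper; assumes each search-domain attempt returns NXDOMAIN.
    """
    # Absolute names and names with at least ndots dots skip search entirely.
    if name.endswith(".") or name.count(".") >= ndots:
        return 1
    # Otherwise every search domain is tried before the name itself.
    return len(search) + 1


search = [
    "kube-system.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "some.brokendnsserver.com",
]
name = "ucp-controller.kube-system.svc.cluster.local"

print("ndots:5 ->", lookups_needed(name, search, ndots=5))
print("ndots:1 ->", lookups_needed(name, search, ndots=1))
```

Each of the extra lookups is also one more chance to hit a timeout or a
ServFail before the real name is ever queried.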

> I think that this should be fixed, since even with a good search-domain
> DNS server the system is prone to errors should that server fail
> (or partially fail, returning SERVFAIL/[no SOA]) at any point
> in time.

This is entirely intentional. If one of the servers fails, the
application needs to know that it can't get a meaningful result for
its query. Not silently get the wrong result.

Rich


Thread overview: 4+ messages
2019-09-13  7:43 Andrey Arapov
2019-09-13 12:15 ` Rich Felker [this message]
2019-09-13 14:19 ` Andrey Arapov
2019-09-13 15:11   ` Rich Felker
