Date: Wed, 1 Dec 2021 12:49:07 +0000 (GMT)
From: Mark Hills
To: musl@lists.openwall.com
Message-ID: <2112011203500.21490@stax.localdomain>
Subject: [musl] DNS resolver fails prematurely when server reports failure?

With multiple DNS servers in /etc/resolv.conf, the docs [1] are clear:

  "musl's resolver queries them all in parallel and accepts whichever
  response arrives first."

So dual configuration is expected to give greater resiliency:

  nameserver 213.186.33.99 # OVH
  nameserver 1.1.1.1 # Cloudflare

However, 1.1.1.1 appears quite prone to some kind of internal SERVFAIL
(possibly internal load shedding, though we are not making excessive DNS
queries).

With glibc's cascading behaviour (or perhaps that of another OS) this may be
dealt with by the client. But if the wiki is read literally and the first
response received is "this server has failed", is a good response from
another server then ignored?

And indeed this seems to be the behaviour we experience, as removing 1.1.1.1
restored reliability.

I tried to confirm this in the source [2] but found I'd need more time to
understand this code.

Also, diagnosis was made more difficult by a colleague diligently following
the resolv.conf(5) man page on the host (installed via man-pages on Alpine
Linux), but this documents glibc. Perhaps musl could/should provide its own,
but I expect there is a policy for this and similar issues.

Thanks

[1] https://wiki.musl-libc.org/functional-differences-from-glibc.html
[2] https://git.musl-libc.org/cgit/musl/tree/src/network/lookup_name.c#n296

-- 
Mark
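
To make the question concrete, here is a toy model of the two policies I have in mind. It is only a sketch and is not taken from musl's source; I have not confirmed which of the two musl actually follows. The struct, the function names and the fallback rule in the second function are all hypothetical, and only the RCODE values come from the DNS specification.

```c
#include <stddef.h>

/* DNS response codes (values as defined in RFC 1035). */
enum { RCODE_NOERROR = 0, RCODE_SERVFAIL = 2, RCODE_NXDOMAIN = 3 };

/* Hypothetical record of one reply; an array of these is in arrival order. */
struct reply {
    int server; /* index of the nameserver line in resolv.conf */
    int rcode;  /* response code carried by the reply */
};

/*
 * Literal reading of the wiki sentence: the first reply to arrive is
 * accepted whatever its rcode. Returns an index into r, or -1 if no
 * reply arrived at all.
 */
int pick_first(const struct reply *r, size_t n)
{
    (void)r; /* contents are deliberately not inspected */
    return n ? 0 : -1;
}

/*
 * Assumed alternative: skip SERVFAIL replies and take the first reply
 * with any other rcode. If every server reported SERVFAIL, fall back to
 * the first reply so the caller still sees the failure.
 */
int pick_skip_servfail(const struct reply *r, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (r[i].rcode != RCODE_SERVFAIL)
            return (int)i;
    return n ? 0 : -1;
}
```

With replies arriving as { server 1: SERVFAIL, server 0: NOERROR }, pick_first() selects the SERVFAIL from 1.1.1.1, while pick_skip_servfail() selects the good answer from the other server. The first case matches the symptom we see.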