Date: Wed, 1 Dec 2021 10:23:11 -0500
From: Rich Felker
To: Mark Hills
Cc: musl@lists.openwall.com
Subject: Re: [musl] DNS resolver fails prematurely when server reports failure?

On Wed, Dec 01, 2021 at 12:49:07PM +0000, Mark Hills wrote:
> With multiple DNS servers in /etc/resolv.conf, the docs [1] are clear:
>
>   "musl's resolver queries them all in parallel and accepts whichever
>   response arrives first."
>
> So a dual configuration is expected to give greater resiliency:
>
>   nameserver 213.186.33.99  # OVH
>   nameserver 1.1.1.1        # Cloudflare
>
> However, 1.1.1.1 appears quite prone to some kind of internal SERVFAIL
> (perhaps internal load shedding, though we are not making excessive DNS
> queries).
>
> With glibc's cascading behaviour (or perhaps on another OS) this may be
> dealt with by the client.
>
> But if the wiki is read literally and the first response received is
> "this server has failed", then a good response from another server is
> ignored?

No. ServFail is an inconclusive response, treated basically the same as
if no packet had arrived at all. (Slight difference: it triggers an
immediate retry, up to a limited number of times.)

> And indeed this seems to be the behaviour we experience, as removing
> 1.1.1.1 restored reliability.

Have you looked at a packet capture of what's happening? Likely 1.1.1.1
was returning a false conclusive result (NxDomain or NODATA) rather
than ServFail.

Rich
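
P.S. To make the conclusive/inconclusive distinction concrete, it's
roughly the following (a sketch of the behavior described above, not
the actual musl source; the names are made up):

    /* Sketch only: how a reply's RCODE maps to what the resolver does.
     * NOERROR (with or without answer records, i.e. including NODATA)
     * and NXDOMAIN conclude the lookup; the first such reply wins.
     * SERVFAIL is ignored as if the packet had been lost, except that
     * it also triggers an immediate retransmit, a limited number of
     * times. */
    enum reply_action { CONCLUDE, RETRY_NOW, IGNORE };

    static enum reply_action classify_reply(int rcode)
    {
        switch (rcode) {
        case 0:               /* NOERROR (answer or NODATA) */
        case 3:               /* NXDOMAIN */
            return CONCLUDE;
        case 2:               /* SERVFAIL */
            return RETRY_NOW;
        default:              /* other codes not covered in this sketch */
            return IGNORE;
        }
    }

So a ServFail from 1.1.1.1 can't mask a good answer from the other
server; only a conclusive reply can.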
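
And if setting up a capture is a hassle, a quick standalone probe along
these lines (again just a sketch: the query name "example.com" is a
placeholder, and there's no real error handling) will show what 1.1.1.1
itself sends back for a name that's failing for you:

    /* Send one A query over UDP and print the reply's RCODE and answer
     * count, to distinguish ServFail (2) from NxDomain (3) and from
     * NOERROR (0) with zero answers, i.e. NODATA. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/time.h>
    #include <unistd.h>

    int main(void)
    {
        /* DNS header: ID 0x1234, RD bit set, one question */
        unsigned char q[512] = { 0x12, 0x34, 0x01, 0x00, 0, 1, 0, 0, 0, 0, 0, 0 };
        size_t len = 12;

        /* QNAME as length-prefixed labels: 7"example"3"com"0 */
        const char *labels[] = { "example", "com", 0 };
        for (int i = 0; labels[i]; i++) {
            size_t l = strlen(labels[i]);
            q[len++] = l;
            memcpy(q + len, labels[i], l);
            len += l;
        }
        q[len++] = 0;                 /* root label terminates QNAME */
        q[len++] = 0; q[len++] = 1;   /* QTYPE  A  */
        q[len++] = 0; q[len++] = 1;   /* QCLASS IN */

        struct sockaddr_in ns = { .sin_family = AF_INET, .sin_port = htons(53) };
        inet_pton(AF_INET, "1.1.1.1", &ns.sin_addr);   /* server under test */

        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        struct timeval tv = { .tv_sec = 3 };
        setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
        sendto(fd, q, len, 0, (struct sockaddr *)&ns, sizeof ns);

        unsigned char r[512];
        ssize_t n = recvfrom(fd, r, sizeof r, 0, 0, 0);
        close(fd);
        if (n < 12) {
            fprintf(stderr, "no (or short) reply\n");
            return 1;
        }
        printf("rcode=%d (0=NOERROR 2=SERVFAIL 3=NXDOMAIN) answers=%d\n",
               r[3] & 0x0f, r[6] << 8 | r[7]);
        return 0;
    }

If that prints rcode=0 with answers=0 (NODATA) or rcode=3 for a name
that should resolve, the problem is a bogus conclusive answer from
1.1.1.1 rather than anything on the resolver side.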