Date: Fri, 24 Jun 2022 21:56:56 -0400
From: Rich Felker
To: Markus Geiger
Cc: musl@lists.openwall.com
Message-ID: <20220625015655.GR7074@brightrain.aerifal.cx>
References: <20220624145936.GP7074@brightrain.aerifal.cx>
Subject: Re: [musl] [BUG] Non-FQDN domain resolving failure on musl-1.2.x

On Fri, Jun 24, 2022 at 07:14:10PM +0200, Markus Geiger wrote:
> Sorry: not Amazon DNS – 10.204.109.209 is a BIND server in our network
> we've setup to work with our global VPN/DNS.
>
> BUT the strange thing is that the domain lookup works with musl-1.1.24
> while with some musl-1.2.x just quits with an error.
>
> a comparison with the docker runs and `sudo tcpdump -v -i docker0 udp port
> 53 or tcp port 53` did not bring up any diffs except the list of A records
> returned is in a different order (which i think is completely normal). the
> order of requests is the same
>
> tcpdump from working version:
>
> bind-us-east-1a.XXXXXXXXXXXXXX.domain > 172.17.0.3.45501: 18685 9/13/8
> slack.com. A 3.95.117.96, slack.com. A 34.231.24.224, slack.com. A
> 54.163.235.119, slack.com. A 54.147.59.169, slack.com. A 34.193.255.5,
> slack.com. A 34.204.109.226, slack.com. A 34.225.62.185, slack.com. A
> 34.203.97.10, slack.com. A 54.92.199.186 (510)
>
> tcpdump from non-working version:
>
> bind-us-east-1a.XXXXXXXXXXXXXX.domain > 172.17.0.3.59951: 49211 9/13/8
> slack.com. A 34.225.62.185, slack.com. A 54.163.235.119, slack.com. A
> 34.231.24.224, slack.com. A 54.147.59.169, slack.com. A 34.193.255.5,
> slack.com. A 34.204.109.226, slack.com. A 54.92.199.186, slack.com. A
> 3.95.117.96, slack.com. A 34.203.97.10 (510)
>
> Complete log:
>
> 172.17.0.3.59951 > bind-us-east-1a.XXXXXXXXXXXXXXXXXXXXXXXXXx.domain:
> 49211+ A? slack.com. (27)
> 18:56:19.990087 IP (tos 0x0, ttl 64, id 10210, offset 0, flags [DF], proto
> UDP (17), length 55)
> 172.17.0.3.59951 > bind-us-east-1a.XXXXXXXXXXXXXXXXXXXXXXXXXx.domain:
> 49334+ AAAA? slack.com. (27)
> 18:56:20.154990 IP (tos 0x0, ttl 250, id 17825, offset 0, flags [none],
> proto UDP (17), length 538)
> bind-us-east-1a.XXXXXXXXXXXXXXXXXXXXXXXXXx.domain > 172.17.0.3.59951:
> 49211 9/13/8 slack.com. A 34.225.62.185, slack.com. A 54.163.235.119,
> slack.com. A 34.231.24.224, slack.com. A 54.147.59.169, slack.com. A
> 34.193.255.5, slack.com. A 34.204.109.226, slack.com. A 54.92.199.186,
> slack.com. A 3.95.117.96, slack.com. A 34.203.97.10 (510)
> 18:56:20.241377 IP (tos 0x0, ttl 250, id 17846, offset 0, flags [none],
> proto UDP (17), length 55)
> bind-us-east-1a.XXXXXXXXXXXXXXXXXXXXXXXXXx.domain > 172.17.0.3.59951:
> 49334 ServFail 0/0/0 (27)
> 18:56:20.241501 IP (tos 0x0, ttl 64, id 10233, offset 0, flags [DF], proto
> UDP (17), length 55)

Here's your problem -- the server is returning ServFail rather than an
answer for some of the queries. This makes musl's resolver continue
retrying for an answer.
In an old version, there may have been a bug whereby, after the
retries timed out, the fact that one query failed was sometimes
overlooked. This logic was improved between the versions you tested as
part of ensuring DNSSEC integrity.

In any case, you just need to find the cause of the ServFail (maybe a
hack someone put in place to try to suppress use of IPv6?) and fix it.

Rich
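
To check which query type draws the ServFail without going through any libc resolver, something like the following could be used. This is only a rough sketch and is not musl's resolver code. It sends one A and one AAAA query over UDP and prints the response code from each reply. The names build_query, parse_header and probe, and the script name in the usage line, are made up for illustration. The nameserver address is whatever you pass on the command line.

```python
import os
import socket
import struct
import sys

QTYPE_A = 1
QTYPE_AAAA = 28
RCODE_NAMES = {0: "NoError", 1: "FormErr", 2: "ServFail",
               3: "NXDomain", 4: "NotImp", 5: "Refused"}


def build_query(name, qtype, txid):
    """Build a minimal DNS query (one question, class IN, RD flag set)."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels)
    return header + qname + b"\x00" + struct.pack("!HH", qtype, 1)


def parse_header(reply):
    """Return (transaction id, rcode, answer count) from a DNS reply."""
    txid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    return txid, flags & 0x000F, ancount


def probe(server, name, qtype, timeout=3.0):
    """Send one UDP query to server:53 and report (rcode name, answers)."""
    txid = int.from_bytes(os.urandom(2), "big")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qtype, txid), (server, 53))
        reply, _addr = sock.recvfrom(4096)
    reply_id, rcode, ancount = parse_header(reply)
    if reply_id != txid:
        raise ValueError("reply does not match the query id")
    return RCODE_NAMES.get(rcode, str(rcode)), ancount


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python3 dnsprobe.py <nameserver-ip>
    for label, qtype in (("A", QTYPE_A), ("AAAA", QTYPE_AAAA)):
        print(label, probe(sys.argv[1], "slack.com", qtype))
```

If the AAAA probe reports ServFail while the A probe reports NoError, that matches the tcpdump log above and points at the server configuration rather than at the client.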