From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-3.3 required=5.0 tests=MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 23479 invoked from network); 3 Jun 2022 12:27:21 -0000 Received: from mother.openwall.net (195.42.179.200) by inbox.vuxu.org with ESMTPUTF8; 3 Jun 2022 12:27:21 -0000 Received: (qmail 30058 invoked by uid 550); 3 Jun 2022 12:27:18 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Reply-To: musl@lists.openwall.com Received: (qmail 30018 invoked from network); 3 Jun 2022 12:27:17 -0000 Date: Fri, 3 Jun 2022 08:27:04 -0400 From: Rich Felker To: Sascha Braun Cc: musl@lists.openwall.com Message-ID: <20220603122704.GZ7074@brightrain.aerifal.cx> References: <20220601141453.GW7074@brightrain.aerifal.cx> <20220601213022.GX7074@brightrain.aerifal.cx> <20220602142504.GY7074@brightrain.aerifal.cx> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Subject: Re: [musl] Problably Issue in name_from_dns // __res_msend_rc (lookup_name.c) On Fri, Jun 03, 2022 at 12:28:40AM +0200, Sascha Braun wrote: > Hi Rich, >=20 > I think I narrowed the thing down. Below is a dump of what happens in a > 'normal' situation and what happened when the sporadic issue appeared. >=20 > Normal: > begin lookup3... > __syscall_send_internal 020000350808040400000000000000008.8.4.4:53] > ecea010000010000000000000377777706676f6f676c6503636f6d0000010001 > __syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:53] > ecea010000010000000000000377777706676f6f676c6503636f6d0000010001 > __syscall_send_internal 020000350909090900000000000000009.9.9.9:53] > ecea010000010000000000000377777706676f6f676c6503636f6d0000010001 > __syscall_send_internal 020000350808040400000000000000008.8.4.4:53] > ecea010000010000000000000377777706676f6f676c6503636f6d00001c0001 > __syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:53] > ecea010000010000000000000377777706676f6f676c6503636f6d00001c0001 > __syscall_send_internal 020000350909090900000000000000009.9.9.9:53] > ecea010000010000000000000377777706676f6f676c6503636f6d00001c0001 > __syscall_recv begin EP[0.0.0.0:0] > __syscall_recv'd_internal [020000000000000000000000000000009.9.9.9:53] > ecea818000010001000000000377777706676f6f676c6503636f6d0000010001c00c00010= 001000000390004d83ad4840000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000 > __syscall_recv begin EP[9.9.9.9:53] > __syscall_recv'd_internal [020000350909090900000000000000009.9.9.9:53] > ecea818000010001000000000377777706676f6f676c6503636f6d00001c0001c00c001c0= 0010000003400102a0014504001080800000000000020040000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000 > connect:0200ffffd83ad4840000000000000000 > connect:0a00ffff000000002a00145040010808000000000000200400000000 >=20 > So we did sent the bytes > ecea010000010000000000000377777706676f6f676c6503636f6d0000010001 and the > other sequence (hex encoded, 2 characters per byte) to each of the DNS and > we received back one 'short' and one 'long' reply from 9.9.9.9. > I guess the short one is IPv4, long one IPv6(?). That's the case with all > successful lookups, i.e. the 99% ok ones - (at least) one short - (at > least) one long. >=20 > Now the problematic one: >=20 > begin lookup1... > __syscall_send_internal 020000350808040400000000000000008.8.4.4:53] > 94d90100000100000000000003777777037765620264650000010001 > __syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:53] > 94d90100000100000000000003777777037765620264650000010001 > __syscall_send_internal 020000350909090900000000000000009.9.9.9:53] > 94d90100000100000000000003777777037765620264650000010001 > __syscall_send_internal 020000350808040400000000000000008.8.4.4:53] > 94d901000001000000000000037777770377656202646500001c0001 > __syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:53] > 94d901000001000000000000037777770377656202646500001c0001 > __syscall_send_internal 020000350909090900000000000000009.9.9.9:53] > 94d901000001000000000000037777770377656202646500001c0001 >=20 > __syscall_recv begin EP[0.0.0.0:0] > __syscall_recv'd_internal [020000000000000000000000000000009.9.9.9:53] > 94d981800001000100010000037777770377656202646500001c0001c00c0005000100000= 04e000f0377777708672d68612d776562c014c02c00060001000000340031036e733102706f= 0675692d646e73c0140a686f73746d6173746572c04378860f1200002a3000000e1000093a8= 00000003c000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000 > __syscall_recv begin EP[9.9.9.9:53] > __syscall_recv'd_internal [020000350909090900000000000000008.8.4.4:53] > 94d981800001000100010000037777770377656202646500001c0001c00c0005000100000= 114000f0377777708672d68612d776562c014c02c000600010000002c0031036e733102706f= 0675692d646e73c0140a686f73746d6173746572c04378860f1200002a3000000e1000093a8= 00000003c000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000 > getaddrinfo1: Address in use >=20 > We received two 'long' responses, one from 9.9.9.9; one from 8.8.4.4 >=20 > All occurrences of the problem show this constellation - two 'long' > responses received. > As a note, of course my implementation of recv returns the correct number > of bytes received. The zeros you see are only from the dump function, it's > dumping the 512 byte buffer. >=20 > I hope this is helpful in some manner. >=20 > I came across this, I seems glibc had a similar issue (I did not look > in-depth, just want to share the link) > https://bugzilla.redhat.com/show_bug.cgi?id=3D1044628 > https://sourceware.org/legacy-ml/libc-alpha/2014-04/msg00321.html OK, I found your problem. It's that the query ids for both the A and AAAA are the same, probably because you have a low-resolution or non-working clock_gettime. If the host environment does not provide a way to get a high resolution clock, I think you should still apply a monotonic increasing increment of the nanoseconds on each call where the host environment's time did not increase so that the clock is strictly monotonic. However musl's resolver should also deal with this case since it's always possible to get identical query ids (with low probability). We should just check if they're equal, and if so, increment the second one. I'll write a patch to do this. Rich