From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-3.3 required=5.0 tests=MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 24299 invoked from network); 3 Jun 2022 12:34:55 -0000 Received: from mother.openwall.net (195.42.179.200) by inbox.vuxu.org with ESMTPUTF8; 3 Jun 2022 12:34:55 -0000 Received: (qmail 5915 invoked by uid 550); 3 Jun 2022 12:34:53 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Reply-To: musl@lists.openwall.com Received: (qmail 5871 invoked from network); 3 Jun 2022 12:34:51 -0000 Date: Fri, 3 Jun 2022 08:34:38 -0400 From: Rich Felker To: Sascha Braun Cc: musl@lists.openwall.com Message-ID: <20220603123437.GA7074@brightrain.aerifal.cx> References: <20220601141453.GW7074@brightrain.aerifal.cx> <20220601213022.GX7074@brightrain.aerifal.cx> <20220602142504.GY7074@brightrain.aerifal.cx> <20220603122704.GZ7074@brightrain.aerifal.cx> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="Qa0ccP92Gc0Ko3P8" Content-Disposition: inline In-Reply-To: <20220603122704.GZ7074@brightrain.aerifal.cx> User-Agent: Mutt/1.5.21 (2010-09-15) Subject: Re: [musl] Problably Issue in name_from_dns // __res_msend_rc (lookup_name.c) --Qa0ccP92Gc0Ko3P8 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Jun 03, 2022 at 08:27:04AM -0400, Rich Felker wrote: > On Fri, Jun 03, 2022 at 12:28:40AM +0200, Sascha Braun wrote: > > Hi Rich, > >=20 > > I think I narrowed the thing down. Below is a dump of what happens in a > > 'normal' situation and what happened when the sporadic issue appeared. > >=20 > > Normal: > > begin lookup3... > > __syscall_send_internal 020000350808040400000000000000008.8.4.4:53] > > ecea010000010000000000000377777706676f6f676c6503636f6d0000010001 > > __syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:= 53] > > ecea010000010000000000000377777706676f6f676c6503636f6d0000010001 > > __syscall_send_internal 020000350909090900000000000000009.9.9.9:53] > > ecea010000010000000000000377777706676f6f676c6503636f6d0000010001 > > __syscall_send_internal 020000350808040400000000000000008.8.4.4:53] > > ecea010000010000000000000377777706676f6f676c6503636f6d00001c0001 > > __syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:= 53] > > ecea010000010000000000000377777706676f6f676c6503636f6d00001c0001 > > __syscall_send_internal 020000350909090900000000000000009.9.9.9:53] > > ecea010000010000000000000377777706676f6f676c6503636f6d00001c0001 > > __syscall_recv begin EP[0.0.0.0:0] > > __syscall_recv'd_internal [020000000000000000000000000000009.9.9.9:53] > > ecea818000010001000000000377777706676f6f676c6503636f6d0000010001c00c000= 10001000000390004d83ad48400000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 00000000000000000000000000000000000000000000000000000 > > __syscall_recv begin EP[9.9.9.9:53] > > __syscall_recv'd_internal [020000350909090900000000000000009.9.9.9:53] > > ecea818000010001000000000377777706676f6f676c6503636f6d00001c0001c00c001= c00010000003400102a00145040010808000000000000200400000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 00000000000000000000000000000000000000000000000000000 > > connect:0200ffffd83ad4840000000000000000 > > connect:0a00ffff000000002a00145040010808000000000000200400000000 > >=20 > > So we did sent the bytes > > ecea010000010000000000000377777706676f6f676c6503636f6d0000010001 and the > > other sequence (hex encoded, 2 characters per byte) to each of the DNS = and > > we received back one 'short' and one 'long' reply from 9.9.9.9. > > I guess the short one is IPv4, long one IPv6(?). That's the case with a= ll > > successful lookups, i.e. the 99% ok ones - (at least) one short - (at > > least) one long. > >=20 > > Now the problematic one: > >=20 > > begin lookup1... > > __syscall_send_internal 020000350808040400000000000000008.8.4.4:53] > > 94d90100000100000000000003777777037765620264650000010001 > > __syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:= 53] > > 94d90100000100000000000003777777037765620264650000010001 > > __syscall_send_internal 020000350909090900000000000000009.9.9.9:53] > > 94d90100000100000000000003777777037765620264650000010001 > > __syscall_send_internal 020000350808040400000000000000008.8.4.4:53] > > 94d901000001000000000000037777770377656202646500001c0001 > > __syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:= 53] > > 94d901000001000000000000037777770377656202646500001c0001 > > __syscall_send_internal 020000350909090900000000000000009.9.9.9:53] > > 94d901000001000000000000037777770377656202646500001c0001 > >=20 > > __syscall_recv begin EP[0.0.0.0:0] > > __syscall_recv'd_internal [020000000000000000000000000000009.9.9.9:53] > > 94d981800001000100010000037777770377656202646500001c0001c00c00050001000= 0004e000f0377777708672d68612d776562c014c02c00060001000000340031036e73310270= 6f0675692d646e73c0140a686f73746d6173746572c04378860f1200002a3000000e1000093= a800000003c0000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 00000000000000000000000000000000000000000000000000000 > > __syscall_recv begin EP[9.9.9.9:53] > > __syscall_recv'd_internal [020000350909090900000000000000008.8.4.4:53] > > 94d981800001000100010000037777770377656202646500001c0001c00c00050001000= 00114000f0377777708672d68612d776562c014c02c000600010000002c0031036e73310270= 6f0675692d646e73c0140a686f73746d6173746572c04378860f1200002a3000000e1000093= a800000003c0000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 00000000000000000000000000000000000000000000000000000 > > getaddrinfo1: Address in use > >=20 > > We received two 'long' responses, one from 9.9.9.9; one from 8.8.4.4 > >=20 > > All occurrences of the problem show this constellation - two 'long' > > responses received. > > As a note, of course my implementation of recv returns the correct numb= er > > of bytes received. The zeros you see are only from the dump function, i= t's > > dumping the 512 byte buffer. > >=20 > > I hope this is helpful in some manner. > >=20 > > I came across this, I seems glibc had a similar issue (I did not look > > in-depth, just want to share the link) > > https://bugzilla.redhat.com/show_bug.cgi?id=3D1044628 > > https://sourceware.org/legacy-ml/libc-alpha/2014-04/msg00321.html >=20 > OK, I found your problem. It's that the query ids for both the A and > AAAA are the same, probably because you have a low-resolution or > non-working clock_gettime. If the host environment does not provide a > way to get a high resolution clock, I think you should still apply a > monotonic increasing increment of the nanoseconds on each call where > the host environment's time did not increase so that the clock is > strictly monotonic. However musl's resolver should also deal with this > case since it's always possible to get identical query ids (with low > probability). We should just check if they're equal, and if so, > increment the second one. I'll write a patch to do this. See if the attached patch fixes it. --Qa0ccP92Gc0Ko3P8 Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="queryid.diff" diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c index aa558c19..e20ad6db 100644 --- a/src/network/lookup_name.c +++ b/src/network/lookup_name.c @@ -155,6 +155,8 @@ static int name_from_dns(struct address buf[static MAXADDRS], char canon[static if (qlens[nq] == -1) return EAI_NONAME; qbuf[nq][3] = 0; /* don't need AD flag */ + if (nq && qbuf[nq][0] == qbuf[0][0]) + qbuf[nq][0]++; nq++; } } --Qa0ccP92Gc0Ko3P8--