From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-3.4 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FROM,HTML_MESSAGE,HTML_OBFUSCATE_05_10, MAILING_LIST_MULTI,NORMAL_HTTP_TO_IP,NUMERIC_HTTP_ADDR, RCVD_IN_DNSWL_MED,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL, T_SCC_BODY_TEXT_LINE,WEIRD_PORT autolearn=ham autolearn_force=no version=3.4.4 Received: (qmail 22449 invoked from network); 5 Jun 2022 18:27:41 -0000 Received: from mother.openwall.net (195.42.179.200) by inbox.vuxu.org with ESMTPUTF8; 5 Jun 2022 18:27:41 -0000 Received: (qmail 1949 invoked by uid 550); 5 Jun 2022 18:27:36 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Reply-To: musl@lists.openwall.com Received: (qmail 30013 invoked from network); 5 Jun 2022 18:15:14 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=6BK3zwAVyZT8rzWhZ4Gvy3LOe0MKcUzBfyMe9p6jzDA=; b=Qs84MSnXEIoyuLmwkFzv1Rluh1oLFKd3rON4Lb0TV2rUCeH9yDgHp55UboRDEdiLyS gf4vu4CeZcFCqrF9srluwT+tOwNgLm5Tz7F4XYv8lUJ4P3kEJ6YOXDYfIdvyLgAYQMcw LKIzliF0kGMtnk9O4yH4dAVA393c82TZ2tz2dOGn9pr24Ah1uJ1fcxxiMv0Dmwl4CC/m R2A9/h/eQrL5dMv21UiSL32+VgOF22S+C1e+//2dhzKkN/7dDHoGI5gUIRDUszgsN2lS tth94IUhNkrwqa+NGKPVjEWi2940Lhtiwg0cJDWKSDBWkI+GT5TyoUKiwYINZwZYHCa4 wXUA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=6BK3zwAVyZT8rzWhZ4Gvy3LOe0MKcUzBfyMe9p6jzDA=; b=xcbpX3Y8GswaUlDkhZL9cAL7ORecXIM2uuOiz5D+p1jziaG8XPTHd2IL14UiirP+9H 4p0+LbGVRcYsTrt8kVAgJjeTIJgTU02BRExpiPKm6OJZKpP9xu4w/v/6MU684kI5iI+t ntvzGXzQjr6lpyBvqc/je7JHa4m9bfr9AIz2yvmUuGkXVJeoMDXa8NIrnL+zeWlIe6wz mtRX1nYRptz/MfUe+AzPpp9kDihH+c2NzxHmMmx6NfX6NHDaldn0wsfesDAchcAXUT21 6ki1s7tX1PgYhQQ4Ga1qntlIjh66Dgn1fribtRKg9nSca8LOBrFqvXo9UFwfav1vIXEI aQiw== X-Gm-Message-State: AOAM532kXrXgAVnC/fPYRGQHMqIGHki/43+x5Vu9eP/Lh7DBeVoLKNgM 9ZfmcUhfwnm/xxRJnQqXxtqEdnMjVKpls7LdRjM= X-Google-Smtp-Source: ABdhPJx7YlC0THnbCMCC5vfHlyBbb5opyCwvZq9RDIfZYaDT1k0JaIldZMIP5Fc2OaYaAk9PGCnDX675mKMJQv2JZVk= X-Received: by 2002:a1f:1dcf:0:b0:35d:5b78:4d0a with SMTP id d198-20020a1f1dcf000000b0035d5b784d0amr6450246vkd.9.1654452902038; Sun, 05 Jun 2022 11:15:02 -0700 (PDT) MIME-Version: 1.0 References: <20220601141453.GW7074@brightrain.aerifal.cx> <20220601213022.GX7074@brightrain.aerifal.cx> <20220602142504.GY7074@brightrain.aerifal.cx> <20220603122704.GZ7074@brightrain.aerifal.cx> <20220603123437.GA7074@brightrain.aerifal.cx> In-Reply-To: <20220603123437.GA7074@brightrain.aerifal.cx> From: Sascha Braun Date: Sun, 5 Jun 2022 20:14:49 +0200 Message-ID: To: Rich Felker Cc: musl@lists.openwall.com Content-Type: multipart/alternative; boundary="0000000000008612f605e0b755c0" Subject: Re: [musl] Problably Issue in name_from_dns // __res_msend_rc (lookup_name.c) --0000000000008612f605e0b755c0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi Rich, I tested your patch against my new (high res clock_realtime 100nsec) and my old (low res 1msec) code and in both cases it's working, no more problems. Many Thanks Sascha Am Fr., 3. Juni 2022 um 14:34 Uhr schrieb Rich Felker : > On Fri, Jun 03, 2022 at 08:27:04AM -0400, Rich Felker wrote: > > On Fri, Jun 03, 2022 at 12:28:40AM +0200, Sascha Braun wrote: > > > Hi Rich, > > > > > > I think I narrowed the thing down. Below is a dump of what happens in= a > > > 'normal' situation and what happened when the sporadic issue appeared= . > > > > > > Normal: > > > begin lookup3... > > > __syscall_send_internal 020000350808040400000000000000008.8.4.4:53] > > > ecea010000010000000000000377777706676f6f676c6503636f6d0000010001 > > > __syscall_send_internal 02000035d043dede00000000000000 > 00208.67.222.222:53] > > > ecea010000010000000000000377777706676f6f676c6503636f6d0000010001 > > > __syscall_send_internal 020000350909090900000000000000009.9.9.9:53] > > > ecea010000010000000000000377777706676f6f676c6503636f6d0000010001 > > > __syscall_send_internal 020000350808040400000000000000008.8.4.4:53] > > > ecea010000010000000000000377777706676f6f676c6503636f6d00001c0001 > > > __syscall_send_internal 02000035d043dede00000000000000 > 00208.67.222.222:53] > > > ecea010000010000000000000377777706676f6f676c6503636f6d00001c0001 > > > __syscall_send_internal 020000350909090900000000000000009.9.9.9:53] > > > ecea010000010000000000000377777706676f6f676c6503636f6d00001c0001 > > > __syscall_recv begin EP[0.0.0.0:0] > > > __syscall_recv'd_internal [020000000000000000000000000000009.9.9.9:53= ] > > > > ecea818000010001000000000377777706676f6f676c6503636f6d0000010001c00c00010= 001000000390004d83ad4840000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000 > > > __syscall_recv begin EP[9.9.9.9:53] > > > __syscall_recv'd_internal [020000350909090900000000000000009.9.9.9:53= ] > > > > ecea818000010001000000000377777706676f6f676c6503636f6d00001c0001c00c001c0= 0010000003400102a0014504001080800000000000020040000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000 > > > connect:0200ffffd83ad4840000000000000000 > > > connect:0a00ffff000000002a00145040010808000000000000200400000000 > > > > > > So we did sent the bytes > > > ecea010000010000000000000377777706676f6f676c6503636f6d0000010001 and > the > > > other sequence (hex encoded, 2 characters per byte) to each of the DN= S > and > > > we received back one 'short' and one 'long' reply from 9.9.9.9. > > > I guess the short one is IPv4, long one IPv6(?). That's the case with > all > > > successful lookups, i.e. the 99% ok ones - (at least) one short - (at > > > least) one long. > > > > > > Now the problematic one: > > > > > > begin lookup1... > > > __syscall_send_internal 020000350808040400000000000000008.8.4.4:53] > > > 94d90100000100000000000003777777037765620264650000010001 > > > __syscall_send_internal 02000035d043dede00000000000000 > 00208.67.222.222:53] > > > 94d90100000100000000000003777777037765620264650000010001 > > > __syscall_send_internal 020000350909090900000000000000009.9.9.9:53] > > > 94d90100000100000000000003777777037765620264650000010001 > > > __syscall_send_internal 020000350808040400000000000000008.8.4.4:53] > > > 94d901000001000000000000037777770377656202646500001c0001 > > > __syscall_send_internal 02000035d043dede00000000000000 > 00208.67.222.222:53] > > > 94d901000001000000000000037777770377656202646500001c0001 > > > __syscall_send_internal 020000350909090900000000000000009.9.9.9:53] > > > 94d901000001000000000000037777770377656202646500001c0001 > > > > > > __syscall_recv begin EP[0.0.0.0:0] > > > __syscall_recv'd_internal [020000000000000000000000000000009.9.9.9:53= ] > > > > 94d981800001000100010000037777770377656202646500001c0001c00c0005000100000= 04e000f0377777708672d68612d776562c014c02c00060001000000340031036e733102706f= 0675692d646e73c0140a686f73746d6173746572c04378860f1200002a3000000e1000093a8= 00000003c000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000 > > > __syscall_recv begin EP[9.9.9.9:53] > > > __syscall_recv'd_internal [020000350909090900000000000000008.8.4.4:53= ] > > > > 94d981800001000100010000037777770377656202646500001c0001c00c0005000100000= 114000f0377777708672d68612d776562c014c02c000600010000002c0031036e733102706f= 0675692d646e73c0140a686f73746d6173746572c04378860f1200002a3000000e1000093a8= 00000003c000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000 > > > getaddrinfo1: Address in use > > > > > > We received two 'long' responses, one from 9.9.9.9; one from 8.8.4.4 > > > > > > All occurrences of the problem show this constellation - two 'long' > > > responses received. > > > As a note, of course my implementation of recv returns the correct > number > > > of bytes received. The zeros you see are only from the dump function, > it's > > > dumping the 512 byte buffer. > > > > > > I hope this is helpful in some manner. > > > > > > I came across this, I seems glibc had a similar issue (I did not look > > > in-depth, just want to share the link) > > > https://bugzilla.redhat.com/show_bug.cgi?id=3D1044628 > > > https://sourceware.org/legacy-ml/libc-alpha/2014-04/msg00321.html > > > > OK, I found your problem. It's that the query ids for both the A and > > AAAA are the same, probably because you have a low-resolution or > > non-working clock_gettime. If the host environment does not provide a > > way to get a high resolution clock, I think you should still apply a > > monotonic increasing increment of the nanoseconds on each call where > > the host environment's time did not increase so that the clock is > > strictly monotonic. However musl's resolver should also deal with this > > case since it's always possible to get identical query ids (with low > > probability). We should just check if they're equal, and if so, > > increment the second one. I'll write a patch to do this. > > See if the attached patch fixes it. > > --0000000000008612f605e0b755c0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi Rich,

I tested your patch against my new (high res clock_realtime 100nsec) and = my old (low res 1msec) code and in both cases it's working, no more pro= blems.

Many Thanks
Sascha

Am Fr., 3. Juni 2022 um 14:34=C2=A0Uhr schrieb Rich Felke= r <dalias@libc.org>:
=
On Fri, Jun 03, 2022 at 0= 8:27:04AM -0400, Rich Felker wrote:
> On Fri, Jun 03, 2022 at 12:28:40AM +0200, Sascha Braun wrote:
> > Hi Rich,
> >
> > I think I narrowed the thing down. Below is a dump of what happen= s in a
> > 'normal' situation and what happened when the sporadic is= sue appeared.
> >
> > Normal:
> > begin lookup3...
> > __syscall_send_internal 020000350808040400000000000000008.8.4.4:53]
> > ecea010000010000000000000377777706676f6f676c6503636f6d0000010001<= br> > > __syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:53]
> > ecea010000010000000000000377777706676f6f676c6503636f6d0000010001<= br> > > __syscall_send_internal 020000350909090900000000000000009.9.9.9:53]
> > ecea010000010000000000000377777706676f6f676c6503636f6d0000010001<= br> > > __syscall_send_internal 020000350808040400000000000000008.8.4.4:53]
> > ecea010000010000000000000377777706676f6f676c6503636f6d00001c0001<= br> > > __syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:53]
> > ecea010000010000000000000377777706676f6f676c6503636f6d00001c0001<= br> > > __syscall_send_internal 020000350909090900000000000000009.9.9.9:53]
> > ecea010000010000000000000377777706676f6f676c6503636f6d00001c0001<= br> > > __syscall_recv begin EP[0.0.0.0:0]
> > __syscall_recv'd_internal [020000000000000000000000000000009.9.9.9:53]
> > ecea818000010001000000000377777706676f6f676c6503636f6d0000010001c= 00c00010001000000390004d83ad48400000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 00000000000000000000000000000000000000000000000000000000000
> > __syscall_recv begin EP[9.9.9.9:53]
> > __syscall_recv'd_internal [020000350909090900000000000000009.9.9.9:53]
> > ecea818000010001000000000377777706676f6f676c6503636f6d00001c0001c= 00c001c00010000003400102a00145040010808000000000000200400000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 00000000000000000000000000000000000000000000000000000000000
> > connect:0200ffffd83ad4840000000000000000
> > connect:0a00ffff000000002a00145040010808000000000000200400000000<= br> > >
> > So we did sent the bytes
> > ecea010000010000000000000377777706676f6f676c6503636f6d0000010001 = and the
> > other sequence (hex encoded, 2 characters per byte) to each of th= e DNS and
> > we received back one 'short' and one 'long' reply= from 9.9.9.9.
> > I guess the short one is IPv4, long one IPv6(?). That's the c= ase with all
> > successful lookups, i.e. the 99% ok ones - (at least) one short -= (at
> > least) one long.
> >
> > Now the problematic one:
> >
> > begin lookup1...
> > __syscall_send_internal 020000350808040400000000000000008.8.4.4:53]
> > 94d90100000100000000000003777777037765620264650000010001
> > __syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:53]
> > 94d90100000100000000000003777777037765620264650000010001
> > __syscall_send_internal 020000350909090900000000000000009.9.9.9:53]
> > 94d90100000100000000000003777777037765620264650000010001
> > __syscall_send_internal 020000350808040400000000000000008.8.4.4:53]
> > 94d901000001000000000000037777770377656202646500001c0001
> > __syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:53]
> > 94d901000001000000000000037777770377656202646500001c0001
> > __syscall_send_internal 020000350909090900000000000000009.9.9.9:53]
> > 94d901000001000000000000037777770377656202646500001c0001
> >
> > __syscall_recv begin EP[0.0.0.0:0]
> > __syscall_recv'd_internal [020000000000000000000000000000009.9.9.9:53]
> > 94d981800001000100010000037777770377656202646500001c0001c00c00050= 0010000004e000f0377777708672d68612d776562c014c02c00060001000000340031036e73= 3102706f0675692d646e73c0140a686f73746d6173746572c04378860f1200002a3000000e1= 000093a800000003c0000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 00000000000000000000000000000000000000000000000000000000000
> > __syscall_recv begin EP[9.9.9.9:53]
> > __syscall_recv'd_internal [020000350909090900000000000000008.8.4.4:53]
> > 94d981800001000100010000037777770377656202646500001c0001c00c00050= 00100000114000f0377777708672d68612d776562c014c02c000600010000002c0031036e73= 3102706f0675692d646e73c0140a686f73746d6173746572c04378860f1200002a3000000e1= 000093a800000003c0000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 000000000000000000000000000000000000000000000000000000000000000000000000000= 00000000000000000000000000000000000000000000000000000000000
> > getaddrinfo1: Address in use
> >
> > We received two 'long' responses, one from 9.9.9.9; one f= rom 8.8.4.4
> >
> > All occurrences of the problem show this constellation - two '= ;long'
> > responses received.
> > As a note, of course my implementation of recv returns the correc= t number
> > of bytes received. The zeros you see are only from the dump funct= ion, it's
> > dumping the 512 byte buffer.
> >
> > I hope this is helpful in some manner.
> >
> > I came across this, I seems glibc had a similar issue (I did not = look
> > in-depth, just want to share the link)
> > https://bugzilla.redhat.com/show_bug.= cgi?id=3D1044628
> > https://sourceware.org/le= gacy-ml/libc-alpha/2014-04/msg00321.html
>
> OK, I found your problem. It's that the query ids for both the A a= nd
> AAAA are the same, probably because you have a low-resolution or
> non-working clock_gettime. If the host environment does not provide a<= br> > way to get a high resolution clock, I think you should still apply a > monotonic increasing increment of the nanoseconds on each call where > the host environment's time did not increase so that the clock is<= br> > strictly monotonic. However musl's resolver should also deal with = this
> case since it's always possible to get identical query ids (with l= ow
> probability). We should just check if they're equal, and if so, > increment the second one. I'll write a patch to do this.

See if the attached patch fixes it.

--0000000000008612f605e0b755c0--