mailing list of musl libc
From: Rich Felker <dalias@libc.org>
To: Sascha Braun <sascha.braun.lpz@googlemail.com>
Cc: musl@lists.openwall.com
Subject: Re: [musl] Problably Issue in name_from_dns // __res_msend_rc (lookup_name.c)
Date: Fri, 3 Jun 2022 08:34:38 -0400
Message-ID: <20220603123437.GA7074@brightrain.aerifal.cx>
In-Reply-To: <20220603122704.GZ7074@brightrain.aerifal.cx>

[-- Attachment #1: Type: text/plain, Size: 8706 bytes --]

On Fri, Jun 03, 2022 at 08:27:04AM -0400, Rich Felker wrote:
> On Fri, Jun 03, 2022 at 12:28:40AM +0200, Sascha Braun wrote:
> > Hi Rich,
> > 
> > I think I narrowed the thing down. Below is a dump of what happens in a
> > 'normal' situation and what happened when the sporadic issue appeared.
> > 
> > Normal:
> > begin lookup3...
> > __syscall_send_internal 020000350808040400000000000000008.8.4.4:53]
> > ecea010000010000000000000377777706676f6f676c6503636f6d0000010001
> > __syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:53]
> > ecea010000010000000000000377777706676f6f676c6503636f6d0000010001
> > __syscall_send_internal 020000350909090900000000000000009.9.9.9:53]
> > ecea010000010000000000000377777706676f6f676c6503636f6d0000010001
> > __syscall_send_internal 020000350808040400000000000000008.8.4.4:53]
> > ecea010000010000000000000377777706676f6f676c6503636f6d00001c0001
> > __syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:53]
> > ecea010000010000000000000377777706676f6f676c6503636f6d00001c0001
> > __syscall_send_internal 020000350909090900000000000000009.9.9.9:53]
> > ecea010000010000000000000377777706676f6f676c6503636f6d00001c0001
> > __syscall_recv begin EP[0.0.0.0:0]
> > __syscall_recv'd_internal [020000000000000000000000000000009.9.9.9:53]
> > ecea818000010001000000000377777706676f6f676c6503636f6d0000010001c00c00010001000000390004d83ad4840000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> > __syscall_recv begin EP[9.9.9.9:53]
> > __syscall_recv'd_internal [020000350909090900000000000000009.9.9.9:53]
> > ecea818000010001000000000377777706676f6f676c6503636f6d00001c0001c00c001c00010000003400102a0014504001080800000000000020040000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> > connect:0200ffffd83ad4840000000000000000
> > connect:0a00ffff000000002a00145040010808000000000000200400000000
> > 
> > So we sent the bytes
> > ecea010000010000000000000377777706676f6f676c6503636f6d0000010001 and the
> > other sequence (hex encoded, 2 characters per byte) to each of the DNS
> > servers, and we received back one 'short' and one 'long' reply from 9.9.9.9.
> > I guess the short one is IPv4, long one IPv6(?). That's the case with all
> > successful lookups, i.e. the 99% ok ones - (at least) one short - (at
> > least) one long.
> > 
> > Now the problematic one:
> > 
> > begin lookup1...
> > __syscall_send_internal 020000350808040400000000000000008.8.4.4:53]
> > 94d90100000100000000000003777777037765620264650000010001
> > __syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:53]
> > 94d90100000100000000000003777777037765620264650000010001
> > __syscall_send_internal 020000350909090900000000000000009.9.9.9:53]
> > 94d90100000100000000000003777777037765620264650000010001
> > __syscall_send_internal 020000350808040400000000000000008.8.4.4:53]
> > 94d901000001000000000000037777770377656202646500001c0001
> > __syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:53]
> > 94d901000001000000000000037777770377656202646500001c0001
> > __syscall_send_internal 020000350909090900000000000000009.9.9.9:53]
> > 94d901000001000000000000037777770377656202646500001c0001
> > 
> > __syscall_recv begin EP[0.0.0.0:0]
> > __syscall_recv'd_internal [020000000000000000000000000000009.9.9.9:53]
> > 94d981800001000100010000037777770377656202646500001c0001c00c000500010000004e000f0377777708672d68612d776562c014c02c00060001000000340031036e733102706f0675692d646e73c0140a686f73746d6173746572c04378860f1200002a3000000e1000093a800000003c000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> > __syscall_recv begin EP[9.9.9.9:53]
> > __syscall_recv'd_internal [020000350909090900000000000000008.8.4.4:53]
> > 94d981800001000100010000037777770377656202646500001c0001c00c0005000100000114000f0377777708672d68612d776562c014c02c000600010000002c0031036e733102706f0675692d646e73c0140a686f73746d6173746572c04378860f1200002a3000000e1000093a800000003c000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> > getaddrinfo1: Address in use
> > 
> > We received two 'long' responses, one from 9.9.9.9; one from 8.8.4.4
> > 
> > All occurrences of the problem show this constellation - two 'long'
> > responses received.
> > As a note, of course my implementation of recv returns the correct number
> > of bytes received. The zeros you see are only from the dump function, it's
> > dumping the 512 byte buffer.
> > 
> > I hope this is helpful in some manner.
> > 
> > I came across this; it seems glibc had a similar issue (I did not look
> > in depth, I just want to share the links)
> > https://bugzilla.redhat.com/show_bug.cgi?id=1044628
> > https://sourceware.org/legacy-ml/libc-alpha/2014-04/msg00321.html
> 
> OK, I found your problem. It's that the query ids for both the A and
> AAAA are the same, probably because you have a low-resolution or
> non-working clock_gettime. If the host environment does not provide a
> way to get a high resolution clock, I think you should still bump the
> nanoseconds by a small increment on each call where the host
> environment's time did not increase, so that the clock is strictly
> monotonic. However, musl's resolver should also deal with this case,
> since it's always possible to get identical query ids (with low
> probability). We should just check if they're equal, and if so,
> increment the second one. I'll write a patch to do this.

See if the attached patch fixes it.
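
On the clock side, the suggestion quoted above (bump the nanoseconds whenever
the host time did not advance) could look roughly like the following. This is
only a sketch under assumptions: host_time() is a hypothetical stand-in for
whatever the host environment provides, here hard-wired to a clock that never
advances, and strict_monotonic_now() is an illustrative name, not part of musl.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical host hook: stands in for a coarse or frozen host clock.
 * It always returns the same instant, which is the worst case. */
static struct timespec host_time(void)
{
	struct timespec ts = { .tv_sec = 1000, .tv_nsec = 0 };
	return ts;
}

/* Return a timestamp strictly greater than the one returned by the
 * previous call, even when the host clock did not advance. */
static struct timespec strict_monotonic_now(void)
{
	static struct timespec last; /* previous result, zero-initialized */
	struct timespec now = host_time();

	assert(now.tv_nsec >= 0 && now.tv_nsec < 1000000000L);

	if (now.tv_sec < last.tv_sec ||
	    (now.tv_sec == last.tv_sec && now.tv_nsec <= last.tv_nsec)) {
		/* Host time stalled or went backwards: step 1ns past last. */
		now = last;
		if (++now.tv_nsec == 1000000000L) {
			now.tv_nsec = 0;
			now.tv_sec++;
		}
	}
	last = now;
	return now;
}
```

Called repeatedly against this frozen clock, the wrapper returns
1000.000000000, then 1000.000000001, then 1000.000000002. The static state is
not thread-safe; a real version would need a lock or atomics.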


[-- Attachment #2: queryid.diff --]
[-- Type: text/plain, Size: 438 bytes --]

diff --git a/src/network/lookup_name.c b/src/network/lookup_name.c
index aa558c19..e20ad6db 100644
--- a/src/network/lookup_name.c
+++ b/src/network/lookup_name.c
@@ -155,6 +155,8 @@ static int name_from_dns(struct address buf[static MAXADDRS], char canon[static
 			if (qlens[nq] == -1)
 				return EAI_NONAME;
 			qbuf[nq][3] = 0; /* don't need AD flag */
+			if (nq && qbuf[nq][0] == qbuf[0][0])
+				qbuf[nq][0]++;
 			nq++;
 		}
 	}
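
For illustration, here is a standalone sketch of what the check in the diff
does, applied to the two queries from the failing dump quoted above (the A and
AAAA queries for www.web.de, both carrying id 0x94d9). The helper names
dns_id(), dns_qtype() and dedup_ids() are made up for this example and do not
exist in musl; only the comparison and increment on the first id byte mirror
the patch.

```c
#include <assert.h>
#include <stddef.h>

/* The two 28-byte queries from the failing dump: same id, QTYPE 1 vs 28. */
static unsigned char q_a[] = {
	0x94,0xd9,0x01,0x00,0x00,0x01,0x00,0x00,0x00,0x00,0x00,0x00,
	0x03,'w','w','w',0x03,'w','e','b',0x02,'d','e',0x00,
	0x00,0x01,0x00,0x01
};
static unsigned char q_aaaa[] = {
	0x94,0xd9,0x01,0x00,0x00,0x01,0x00,0x00,0x00,0x00,0x00,0x00,
	0x03,'w','w','w',0x03,'w','e','b',0x02,'d','e',0x00,
	0x00,0x1c,0x00,0x01
};

/* The 16-bit query id is the first two bytes of a DNS message. */
static unsigned dns_id(const unsigned char *msg)
{
	return (unsigned)msg[0] << 8 | msg[1];
}

/* QTYPE of the first question, or -1 if the message is malformed.
 * Skips the 12-byte header and the uncompressed label sequence. */
static int dns_qtype(const unsigned char *msg, size_t len)
{
	size_t i = 12;
	while (i < len && msg[i]) {
		if (msg[i] >= 0xc0) return -1; /* no compression in a query */
		i += (size_t)msg[i] + 1;
	}
	if (i + 2 >= len) return -1;
	return msg[i+1] << 8 | msg[i+2];
}

/* Same test as the patch: if the first id byte of the second query
 * equals that of the first query, increment it so the ids differ. */
static void dedup_ids(const unsigned char *first, unsigned char *second)
{
	assert(first != second);
	if (second[0] == first[0])
		second[0]++;
}
```

With these inputs, dedup_ids(q_a, q_aaaa) leaves the A query at id 0x94d9 and
moves the AAAA query to 0x95d9. Comparing only the first byte is enough to
guarantee the two ids differ afterwards, at the cost of sometimes changing an
id that was already distinct in its second byte.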
