Hi Rich,
I think I have narrowed the problem down. Below is a dump of what happens in a 'normal' lookup and of what happened when the sporadic issue appeared.
Normal:
begin lookup3...
__syscall_send_internal 020000350808040400000000000000008.8.4.4:53] ecea010000010000000000000377777706676f6f676c6503636f6d0000010001
__syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:53] ecea010000010000000000000377777706676f6f676c6503636f6d0000010001
__syscall_send_internal 020000350909090900000000000000009.9.9.9:53] ecea010000010000000000000377777706676f6f676c6503636f6d0000010001
__syscall_send_internal 020000350808040400000000000000008.8.4.4:53] ecea010000010000000000000377777706676f6f676c6503636f6d00001c0001
__syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:53] ecea010000010000000000000377777706676f6f676c6503636f6d00001c0001
__syscall_send_internal 020000350909090900000000000000009.9.9.9:53] ecea010000010000000000000377777706676f6f676c6503636f6d00001c0001
__syscall_recv begin EP[
0.0.0.0:0]
__syscall_recv'd_internal [020000000000000000000000000000009.9.9.9:53] ecea818000010001000000000377777706676f6f676c6503636f6d0000010001c00c00010001000000390004d83ad4840000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
__syscall_recv begin EP[
9.9.9.9:53]
__syscall_recv'd_internal [020000350909090900000000000000009.9.9.9:53] ecea818000010001000000000377777706676f6f676c6503636f6d00001c0001c00c001c00010000003400102a0014504001080800000000000020040000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
connect:0200ffffd83ad4840000000000000000
connect:0a00ffff000000002a00145040010808000000000000200400000000
So we did send the bytes
ecea010000010000000000000377777706676f6f676c6503636f6d0000010001 and a second sequence that differs only in the query type field (001c instead of 0001) to each of the DNS servers (hex encoded, 2 characters per byte), and we received back one 'short' and one 'long' reply from 9.9.9.9.
I guess the short one is the IPv4 reply and the long one the IPv6 reply(?); the two connect lines above would fit that. This is the case with all successful lookups, i.e. the 99% that are OK: (at least) one short and (at least) one long reply.
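To double-check the short/long guess, the first answer record of each reply can be decoded. Below is a minimal Python sketch, not anything taken from musl. It assumes each reply carries exactly one question and at least one answer, and the helper names (skip_name, first_answer, describe) are made up for illustration. The hex strings are the two replies from the dump above with the trailing zero padding cut off.

```python
import ipaddress
import struct

# The two replies from 9.9.9.9 in the 'normal' dump, zero padding removed.
SHORT = ("ecea818000010001000000000377777706676f6f676c6503636f6d0000010001"
         "c00c00010001000000390004d83ad484")
LONG = ("ecea818000010001000000000377777706676f6f676c6503636f6d00001c0001"
        "c00c001c00010000003400102a001450400108080000000000002004")

def skip_name(msg, off):
    """Return the offset just past the (possibly compressed) name at off."""
    while True:
        length = msg[off]
        if length == 0:              # root label terminates the name
            return off + 1
        if length & 0xC0 == 0xC0:    # compression pointer, always 2 bytes
            return off + 2
        off += 1 + length

def first_answer(hex_dump):
    """Return (qtype, rtype, rdata) of the question and first answer record."""
    msg = bytes.fromhex(hex_dump)
    off = skip_name(msg, 12)         # the 12-byte header precedes the question
    qtype, _qclass = struct.unpack_from("!2H", msg, off)
    off = skip_name(msg, off + 4)    # owner name of the first answer record
    rtype, _rclass, _ttl, rdlen = struct.unpack_from("!HHIH", msg, off)
    off += 10
    return qtype, rtype, msg[off:off + rdlen]

def describe(hex_dump):
    qtype, rtype, rdata = first_answer(hex_dump)
    if rtype == 1 and len(rdata) == 4:        # A record
        addr = str(ipaddress.IPv4Address(rdata))
    elif rtype == 28 and len(rdata) == 16:    # AAAA record
        addr = str(ipaddress.IPv6Address(rdata))
    else:
        addr = rdata.hex()
    return qtype, rtype, addr

print(describe(SHORT))  # (1, 1, '216.58.212.132')
print(describe(LONG))   # (28, 28, '2a00:1450:4001:808::2004')
```

Type 1 is A and type 28 (0x1c) is AAAA, and the decoded addresses are the same bytes that show up in the two connect lines.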
Now the problematic one:
begin lookup1...
__syscall_send_internal 020000350808040400000000000000008.8.4.4:53] 94d90100000100000000000003777777037765620264650000010001
__syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:53] 94d90100000100000000000003777777037765620264650000010001
__syscall_send_internal 020000350909090900000000000000009.9.9.9:53] 94d90100000100000000000003777777037765620264650000010001
__syscall_send_internal 020000350808040400000000000000008.8.4.4:53] 94d901000001000000000000037777770377656202646500001c0001
__syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:53] 94d901000001000000000000037777770377656202646500001c0001
__syscall_send_internal 020000350909090900000000000000009.9.9.9:53] 94d901000001000000000000037777770377656202646500001c0001
__syscall_recv'd_internal [020000000000000000000000000000009.9.9.9:53] 94d981800001000100010000037777770377656202646500001c0001c00c000500010000004e000f0377777708672d68612d776562c014c02c00060001000000340031036e733102706f0675692d646e73c0140a686f73746d6173746572c04378860f1200002a3000000e1000093a800000003c000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
__syscall_recv begin EP[
9.9.9.9:53]
__syscall_recv'd_internal [020000350909090900000000000000008.8.4.4:53] 94d981800001000100010000037777770377656202646500001c0001c00c0005000100000114000f0377777708672d68612d776562c014c02c000600010000002c0031036e733102706f0675692d646e73c0140a686f73746d6173746572c04378860f1200002a3000000e1000093a800000003c000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
getaddrinfo1: Address in use
We received two 'long' responses: one from 9.9.9.9 and one from 8.8.4.4.
All occurrences of the problem show this pattern: two 'long' responses received.
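The same observation can be stated in terms of query types: compare the types that were sent with the question types echoed back in the replies. The Python sketch below is only an illustration under assumptions. It assumes one uncompressed question per message, the function name question_type is made up, and for the two replies only the header and question part is copied from the dump, since that prefix is identical in both. It shows what the dump contains and says nothing about where the fault lies.

```python
import struct

# Queries sent in the problematic lookup (same bytes went to all three servers).
SENT = [
    "94d90100000100000000000003777777037765620264650000010001",
    "94d901000001000000000000037777770377656202646500001c0001",
]

# Header + question of the two replies received (from 9.9.9.9 and 8.8.4.4).
# The answer and authority records that follow in the dump are omitted here.
RECEIVED = [
    "94d981800001000100010000037777770377656202646500001c0001",
    "94d981800001000100010000037777770377656202646500001c0001",
]

def question_type(hex_dump):
    """Return (id, qtype) of a DNS message with one uncompressed question."""
    msg = bytes.fromhex(hex_dump)
    ident = struct.unpack_from("!H", msg, 0)[0]
    off = 12                          # question starts after the header
    while msg[off] != 0:              # walk the length-prefixed labels
        off += 1 + msg[off]
    qtype = struct.unpack_from("!H", msg, off + 1)[0]
    return ident, qtype

sent_types = {question_type(m)[1] for m in SENT}
recv_types = {question_type(m)[1] for m in RECEIVED}
missing = sent_types - recv_types

print(sorted(sent_types))  # [1, 28]
print(sorted(recv_types))  # [28]
print(sorted(missing))     # [1]
```

So both replies in the failing case answer the type 28 (AAAA) question, and no reply to the type 1 (A) question appears in the dump.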
Note that my implementation of recv does return the correct number of bytes received. The trailing zeros you see come only from the dump function, which dumps the whole 512-byte buffer.
I hope this is helpful in some manner.
I came across this; it seems glibc had a similar issue (I did not look at it in depth, I just want to share the link).
Thanks
Sascha