From: Satadru Pramanik
Date: Mon, 7 Feb 2022 14:19:05 -0500
To: Rich Felker
Cc: musl@lists.openwall.com
Subject: Re: [musl] Re: musl getaddr info breakage on older kernels
References: <20220206213032.GU7074@brightrain.aerifal.cx> <20220206234405.GW7074@brightrain.aerifal.cx> <20220207024056.GY7074@brightrain.aerifal.cx>
In-Reply-To: <20220207024056.GY7074@brightrain.aerifal.cx>

The test programs are being run from...
glibc 2.23 -> bash (crosh shell)
crosh shell -> invokes ruby -> invokes bash to run the test programs.

tcpdump on the router (run as "tcpdump -i any -vvv host (IP address)") shows no network activity at all when running the test program.
When I run the test program with strace, though, I see this:
14:06:24.617860 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 56)
    192.168.0.121.46846 > office.lan.53: [udp sum ok] 16051+ A? google.com. (28)
14:06:24.622352 IP (tos 0x0, ttl 64, id 15884, offset 0, flags [DF], proto UDP (17), length 72)
    office.lan.53 > 192.168.0.121.46846: [bad udp cksum 0x8210 -> 0x7bc1!] 16051 q: A? google.com. 1/0/0 google.com. [1m32s] A 142.251.40.110 (44)
14:06:24.688610 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 56)
    192.168.0.121.42267 > office.lan.53: [udp sum ok] 35406+ A? google.com. (28)
14:06:24.688931 IP (tos 0x0, ttl 64, id 15887, offset 0, flags [DF], proto UDP (17), length 72)
    office.lan.53 > 192.168.0.121.42267: [bad udp cksum 0x8210 -> 0x4209!] 35406 q: A? google.com. 1/0/0 google.com. [1m32s] A 142.251.40.110 (44)
14:06:24.689018 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 56)
    192.168.0.121.42267 > office.lan.53: [udp sum ok] 13657+ AAAA? google.com. (28)
14:06:24.689186 IP (tos 0x0, ttl 64, id 15888, offset 0, flags [DF], proto UDP (17), length 84)
    office.lan.53 > 192.168.0.121.42267: [bad udp cksum 0x821c -> 0xc77e!] 13657 q: AAAA? google.com. 1/0/0 google.com. [20s] AAAA 2607:f8b0:4006:80b::200e (56)
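
In case it helps with reading captures like the one above, here is a rough Ruby sketch that pairs each DNS query with its reply by transaction ID and reports the answer count. The helper name pair_dns_transactions is made up for illustration, and the regexes assume tcpdump -vvv output shaped exactly like these lines; other tcpdump versions or record types may need adjustments.

```ruby
# Pair DNS queries with their replies in `tcpdump -vvv` text output.
# pair_dns_transactions is a hypothetical helper written for this thread,
# not part of tcpdump or musl. Only A and AAAA queries are recognised.

# Query line, e.g. "... 16051+ A? google.com. (28)"
QUERY_RE = %r{\s(\d+)\+ (AAAA|A)\? (\S+) \(\d+\)\s*\z}
# Reply line, e.g. "... 16051 q: A? google.com. 1/0/0 google.com. ..."
# The first number of "1/0/0" is the answer count.
ANSWER_RE = %r{\s(\d+) q: (AAAA|A)\? (\S+) (\d+)/\d+/\d+ }

def pair_dns_transactions(lines)
  transactions = {}
  lines.each do |line|
    if (m = QUERY_RE.match(line))
      transactions[m[1].to_i] = { type: m[2], name: m[3], answers: nil }
    elsif (m = ANSWER_RE.match(line))
      # A reply with no recorded query still gets an entry.
      entry = transactions[m[1].to_i] ||= { type: m[2], name: m[3], answers: nil }
      entry[:answers] = m[4].to_i
    end
  end
  transactions
end

# The payload lines from the capture above (IP header lines omitted).
CAPTURE = <<~'TCPDUMP'
  192.168.0.121.46846 > office.lan.53: [udp sum ok] 16051+ A? google.com. (28)
  office.lan.53 > 192.168.0.121.46846: [bad udp cksum 0x8210 -> 0x7bc1!] 16051 q: A? google.com. 1/0/0 google.com. [1m32s] A 142.251.40.110 (44)
  192.168.0.121.42267 > office.lan.53: [udp sum ok] 35406+ A? google.com. (28)
  office.lan.53 > 192.168.0.121.42267: [bad udp cksum 0x8210 -> 0x4209!] 35406 q: A? google.com. 1/0/0 google.com. [1m32s] A 142.251.40.110 (44)
  192.168.0.121.42267 > office.lan.53: [udp sum ok] 13657+ AAAA? google.com. (28)
  office.lan.53 > 192.168.0.121.42267: [bad udp cksum 0x821c -> 0xc77e!] 13657 q: AAAA? google.com. 1/0/0 google.com. [20s] AAAA 2607:f8b0:4006:80b::200e (56)
TCPDUMP

pair_dns_transactions(CAPTURE.lines).each do |id, t|
  status = t[:answers] ? "#{t[:answers]} answer(s)" : 'no reply seen'
  puts format('%5d %-4s %s %s', id, t[:type], t[:name], status)
end
```

A query that shows up with "no reply seen" during a failing run would point at dnsmasq or the network; if every query is answered and getaddrinfo still returns "Try again", the problem is more likely on the client side.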

On Sun, Feb 6, 2022 at 9:40 PM Rich Felker <dalias@aerifal.cx> wrote:
On Sun, Feb 06, 2022 at 08:29:16PM -0500, Satadru Pramanik wrote:
> Here are illustrative logs of output and strace logs.
>
> Note that while the musl toolchain is built in a container on a much more
> powerful machine, this "musl_getaddrinfo_test" app is built locally on the
> machine with the 3.8 kernel.
>
> I ran the following to get the output on the smaller i686 machine
> immediately after the app is built.
> Apologies for the ruby code wrapping the shell commands.
>
>     @musl_ver = `#{CREW_MUSL_PREFIX}/lib/libc.so 2>&1 >/dev/null | head -2 | tail -1 | awk '{print $2}'`.chomp
>     puts 'Testing the musl resolver to see if it can resolve google.com: '.lightblue
>     system "./musl_getaddrinfo_test google.com set_ai_family 2>&1 |tee -a /tmp/musl_#{@musl_ver}_getaddrinfo_test_google.com_set_ai_family.txt"
>     system "./musl_getaddrinfo_test google.com 2>&1 |tee -a /tmp/musl_#{@musl_ver}_getaddrinfo_test_google.com.txt"
>     system "strace -o /tmp/musl_#{@musl_ver}_getaddrinfo_test_google.com_set_ai_family_STRACE.txt ./musl_getaddrinfo_test google.com set_ai_family"
>     system "strace -o /tmp/musl_#{@musl_ver}_getaddrinfo_test_google.com_STRACE.txt ./musl_getaddrinfo_test google.com"
>
> And here is the output for each run before running again via strace. Note
> how IPv6 addresses show up sporadically, and for 1.2.2 nothing at all shows
> up, but everything works fine according to the strace logs. (Strace is
> built against glibc 2.23.)
>
> ==> musl_1.2.0-git-17-g33338ebc_getaddrinfo_test_google.com_set_ai_family.txt <==
> AF_INET: 142.251.40.110
>
> ==> musl_1.2.0-git-17-g33338ebc_getaddrinfo_test_google.com.txt <==
> AF_INET: 142.251.40.110
>
> ==> musl_1.2.0-git-39-g5cf1ac24_getaddrinfo_test_google.com_set_ai_family.txt <==
> AF_INET: 142.251.40.142
>
> ==> musl_1.2.0-git-39-g5cf1ac24_getaddrinfo_test_google.com.txt <==
> getaddrinfo: Try again
>
> ==> musl_1.2.0-git-40-g1b4e84c5_getaddrinfo_test_google.com_set_ai_family.txt <==
> AF_INET: 142.251.40.206
>
> ==> musl_1.2.0-git-40-g1b4e84c5_getaddrinfo_test_google.com.txt <==
> AF_INET6: 2607:f8b0:4006:81f::200e
> AF_INET: 142.251.40.206
>
> ==> musl_1.2.0-git-6-g2f2348c9_getaddrinfo_test_google.com_set_ai_family.txt <==
> AF_INET: 142.250.65.206
>
> ==> musl_1.2.0-git-6-g2f2348c9_getaddrinfo_test_google.com.txt <==
> AF_INET: 142.250.65.206
>
> ==> musl_1.2.1_getaddrinfo_test_google.com_set_ai_family.txt <==
> AF_INET: 142.251.40.110
>
> ==> musl_1.2.1_getaddrinfo_test_google.com.txt <==
> getaddrinfo: Try again
>
> ==> musl_1.2.2_getaddrinfo_test_google.com_set_ai_family.txt <==
> getaddrinfo: Try again
>
> ==> musl_1.2.2_getaddrinfo_test_google.com.txt <==
> getaddrinfo: Try again
>
> Regards,

OK, I don't see anything in the strace suggesting a cause. The kernel version (or whether a container was used) present on the system where
you built musl or the test programs should make no difference
whatsoever; musl has no build dependencies on the host kernel or
kernel headers or anything like that (and doesn't even need to be
built on a Linux host).

A couple questions:

Are the test programs on the i686 machine running under Docker or any
other container environment?

Can you tcpdump the traffic between the test program and the dnsmasq
during a failing run, with verbose display of the packet contents
(-vvv or something like that)?

I don't see any plausible explanation for the result varying between runs and with timing like this unless dnsmasq is doing something
odd/wrong. I thought it might be related to something blocking time64
syscalls but that doesn't seem to be the case -- according to the
strace logs they're getting ENOSYS as expected with fallback to the
legacy 32-bit clock_gettime etc. which is fine.