From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (qmail 11447 invoked from network); 17 Feb 2022 13:18:03 -0000
Received: from mother.openwall.net (195.42.179.200) by inbox.vuxu.org with ESMTPUTF8; 17 Feb 2022 13:18:03 -0000
Received: (qmail 8084 invoked by uid 550); 17 Feb 2022 13:18:00 -0000
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
Reply-To: musl@lists.openwall.com
Received: (qmail 8045 invoked from network); 17 Feb 2022 13:17:59 -0000
MIME-Version: 1.0
References: <20220214182952.GI7074@brightrain.aerifal.cx> <20220214220043.GK7074@brightrain.aerifal.cx> <20220215174420.GL7074@brightrain.aerifal.cx> <20220216014153.GM7074@brightrain.aerifal.cx> <20220216213335.GO7074@brightrain.aerifal.cx>
From: Satadru Pramanik
Date: Thu, 17 Feb 2022 08:17:36 -0500
To: Rich Felker
Cc: musl@lists.openwall.com
Content-Type: multipart/alternative; boundary="000000000000aea56605d836972a"
Subject: Re: [musl] Re: musl getaddr info breakage on older kernels

--000000000000aea56605d836972a
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Looks like reverting that commit works, but interestingly, only stochastically.
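
The source of musl_getaddrinfo_test is not included in this message. For readers following along, a minimal sketch of a test of this kind is given here, assuming it simply calls getaddrinfo() and prints each returned address with its family. The helper name print_addrs and the choice of hints are hypothetical and not taken from the thread; the "Try again" text in the transcript is consistent with how musl reports EAI_AGAIN.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not from the thread): resolve `host` and write one
 * "<family>: <address>" line per result to `out`.
 * Returns 0 on success, or the getaddrinfo() error code on failure. */
int print_addrs(const char *host, FILE *out)
{
    struct addrinfo hints, *res, *p;
    char buf[INET6_ADDRSTRLEN];

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* request both IPv4 and IPv6 results */
    hints.ai_socktype = SOCK_STREAM; /* avoid one result per socket type */

    int rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) {
        /* A transient resolver failure is reported as EAI_AGAIN. */
        fprintf(out, "getaddrinfo: %s\n", gai_strerror(rc));
        return rc;
    }

    for (p = res; p != NULL; p = p->ai_next) {
        const void *addr;
        const char *label;

        if (p->ai_family == AF_INET) {
            addr = &((struct sockaddr_in *)p->ai_addr)->sin_addr;
            label = "AF_INET";
        } else if (p->ai_family == AF_INET6) {
            addr = &((struct sockaddr_in6 *)p->ai_addr)->sin6_addr;
            label = "AF_INET6";
        } else {
            continue; /* ignore any other address family */
        }
        if (inet_ntop(p->ai_family, addr, buf, sizeof buf) != NULL)
            fprintf(out, "%s: %s\n", label, buf);
    }

    freeaddrinfo(res);
    return 0;
}
```

A main() that calls print_addrs(argv[1], stdout) and exits nonzero on failure would produce output of the same shape as the transcript. Whether the real test sets any hints or flags is not known from this thread.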

chronos@localhost /usr/local/tmp/crew/musl_getaddrinfo_test.20220217125953.dir $ ./musl_getaddrinfo_test google.com
getaddrinfo: Try again
chronos@localhost /usr/local/tmp/crew/musl_getaddrinfo_test.20220217125953.dir $ ./musl_getaddrinfo_test google.com
AF_INET6: 2607:f8b0:4006:81c::200e
AF_INET: 142.250.80.46
chronos@localhost /usr/local/tmp/crew/musl_getaddrinfo_test.20220217125953.dir $ ./musl_getaddrinfo_test google.com
AF_INET6: 2607:f8b0:4006:81c::200e
AF_INET: 142.250.80.46
chronos@localhost /usr/local/tmp/crew/musl_getaddrinfo_test.20220217125953.dir $ ./musl_getaddrinfo_test google.com
getaddrinfo: Try again
chronos@localhost /usr/local/tmp/crew/musl_getaddrinfo_test.20220217125953.dir $ ./musl_getaddrinfo_test google.com
AF_INET6: 2607:f8b0:4006:81c::200e
AF_INET: 142.250.80.46
chronos@localhost /usr/local/tmp/crew/musl_getaddrinfo_test.20220217125953.dir $ ./musl_getaddrinfo_test google.com
AF_INET6: 2607:f8b0:4006:81c::200e
AF_INET: 142.250.80.46
chronos@localhost /usr/local/tmp/crew/musl_getaddrinfo_test.20220217125953.dir $ ./musl_getaddrinfo_test google.com
getaddrinfo: Try again
chronos@localhost /usr/local/tmp/crew/musl_getaddrinfo_test.20220217125953.dir $ ./musl_getaddrinfo_test google.com
AF_INET6: 2607:f8b0:4006:81c::200e
AF_INET: 142.250.80.46
chronos@localhost /usr/local/tmp/crew/musl_getaddrinfo_test.20220217125953.dir $ ./musl_getaddrinfo_test google.com
getaddrinfo: Try again
chronos@localhost /usr/local/tmp/crew/musl_getaddrinfo_test.20220217125953.dir $ ./musl_getaddrinfo_test google.com
getaddrinfo: Try again
chronos@localhost /usr/local/tmp/crew/musl_getaddrinfo_test.20220217125953.dir $ ./musl_getaddrinfo_test google.com
AF_INET6: 2607:f8b0:4006:81c::200e
AF_INET: 142.250.80.46

On Wed, Feb 16, 2022 at 11:14 PM Satadru Pramanik wrote:

> Oops. Looks like I reversed the patch wrong. Rebuilding again...
>
> On Wed, Feb 16, 2022 at 8:44 PM Satadru Pramanik wrote:
>
>> Looks like I need to do more than just reverse that commit to get this to
>> build...
>>
>> ../src_musl/src/internal/syscall.h:47:53: note: in expansion of macro ‘__socketcall_cp’
>>    47 | #define socketcall_cp(nm,a,b,c,d,e,f) __syscall_ret(__socketcall_cp(nm,a,b,c,d,e,f))
>>       |                                                     ^~~~~~~~~~~~~~~
>> ../src_musl/src/network/accept.c:6:16: note: in expansion of macro ‘socketcall_cp’
>>     6 |         return socketcall_cp(accept, fd, addr, len, 0, 0, 0);
>>       |                ^~~~~~~~~~~~~
>> ../src_musl/src/internal/syscall.h:62:54: note: each undeclared identifier is reported only once for each function it appears in
>>    62 | #define __socketcall_cp(nm,a,b,c,d,e,f) __syscall_cp(SYS_##nm, a, b, c, d, e, f)
>>       |                                                      ^~~~
>> ../src_musl/src/internal/syscall.h:55:53: note: in definition of macro ‘__syscall_cp6’
>>    55 | #define __syscall_cp6(n,a,b,c,d,e,f) (__syscall_cp)(n,__scc(a),__scc(b),__scc(c),__scc(d),__scc(e),__scc(f))
>>       |                                                     ^
>> ../src_musl/src/internal/syscall.h:57:27: note: in expansion of macro ‘__SYSCALL_DISP’
>>    57 | #define __syscall_cp(...) __SYSCALL_DISP(__syscall_cp,__VA_ARGS__)
>>       |                           ^~~~~~~~~~~~~~
>> ../src_musl/src/internal/syscall.h:62:41: note: in expansion of macro ‘__syscall_cp’
>>    62 | #define __socketcall_cp(nm,a,b,c,d,e,f) __syscall_cp(SYS_##nm, a, b, c, d, e, f)
>>       |                                         ^~~~~~~~~~~~
>> ../src_musl/src/internal/syscall.h:47:53: note: in expansion of macro ‘__socketcall_cp’
>>    47 | #define socketcall_cp(nm,a,b,c,d,e,f) __syscall_ret(__socketcall_cp(nm,a,b,c,d,e,f))
>>       |                                                     ^~~~~~~~~~~~~~~
>> ../src_musl/src/network/accept.c:6:16: note: in expansion of macro ‘socketcall_cp’
>>     6 |         return socketcall_cp(accept, fd, addr, len, 0, 0, 0);
>>       |                ^~~~~~~~~~~~~
>> ../src_musl/src/network/accept.c:7:1: warning: control reaches end of non-void function [-Wreturn-type]
>>     7 | }
>>       | ^
>> make[2]: *** [Makefile:159: obj/src/network/accept.lo] Error 1
>> make[2]: Leaving directory '/usr/local/tmp/crew/musl_native_toolchain.20220216215322.dir/build/local/i686-linux-musl/obj_musl'
>> make[1]: *** [Makefile:249: obj_musl/.lc_built] Error 2
>> make[1]: Leaving directory '/usr/local/tmp/crew/musl_native_toolchain.20220216215322.dir/build/local/i686-linux-musl'
>> make: *** [Makefile:194: all] Error 2
>>
>> On Wed, Feb 16, 2022, 4:53 PM Satadru Pramanik wrote:
>>
>>> I was looking at that commit too. I've started a build with that
>>> reverted and should be able to check back on that tomorrow.
>>>
>>> On Wed, Feb 16, 2022 at 4:33 PM Rich Felker wrote:
>>>
>>>> On Wed, Feb 16, 2022 at 01:44:35PM -0500, Satadru Pramanik wrote:
>>>> > The only change to socket.c I'm seeing is use __socketcall to simplify
>>>> > socket()
>>>> > <https://git.musl-libc.org/cgit/musl/commit/?id=7063c459e7dbd63c2c94e04413743abab5272001>,
>>>> > so maybe it would make sense for me to try building with that reversed?
>>>>
>>>> That should not be a functional change, but you may be overlooking
>>>> commit c2feda4e2ea61f4da73f2f38b2be5e327a7d1a91, which was: using the
>>>> new (added in 4.3) individual socket syscalls instead of the legacy
>>>> multiplexed SYS_socketcall. It's supposed to fall back to using the
>>>> old ones, but perhaps something goes wrong on your kernel that's
>>>> preventing it. I'm not sure what the mechanism by which it works when
>>>> straced/single-stepped could be, though, but if it's a weird kernel
>>>> bug anything is possible.
>>>>
>>>> Reverting that commit should be entirely safe, if it turns out to be
>>>> what's triggering your problem, but I'd like to get to the root cause
>>>> and see if there's anything we can do to ensure this doesn't come up
>>>> again.
>>>>
>>>>
>>>> > On Wed, Feb 16, 2022 at 1:37 PM Satadru Pramanik wrote:
>>>> >
>>>> > >
>>>> > >>
>>>> > >> - Whether any network traffic occurs when it fails (in the real
>>>> > >>   environment not a replicated one elsewhere).
>>>> > >>
>>>> > >>
>>>> > > There is no network traffic in the real environment.
>>>> > >
>>>> > >
>>>> > >> - Whether it fails or succeeds under strace (in the real
>>>> > >>   environment not a replicated one elsewhere).
>>>> > >>
>>>> > >> It succeeds in strace (in the real environment)
>>>> > >
>>>> > >
>>>> > >
>>>> > >> - Whether the real environment involves Docker or not.
>>>> > >>
>>>> > >> The real environment does not involve docker.
>>>> > >
>>>> > >
>>>> > >
>>>> > >> - What's in resolv.conf (in the real environment not a replicated one
>>>> > >>   elsewhere) and what nameserver software (if known) is running on the
>>>> > >>   nameserver(s) listed in there.
>>>> > >>
>>>> > >> The nameserver is picked up from dhcp. The contents of the file are as
>>>> > > follows:
>>>> > > nameserver 192.168.0.1
>>>> > > search lan.
>>>> > > options single-request timeout:1 attempts:5
>>>> > >
>>>> > >
>>>> > >> - Anything else that might be relevant.
>>>> > >>
>>>> > >> DNS server is dnsmasq running on a current OpenWRT device.
>>>> > >
>>>> > >
>>>> > >> It's really hard to offer any productive advice when the problem is
>>>> > >> unclear.
>>>> > >>
>>>> > >> Apologies for the confusion.
>>>> > > I'm really just trying to debug this getaddrinfo breakage on this older
>>>> > > hardware. The docker containers setup is something we use to build packages
>>>> > > for this hardware, and our frustration is that the software works perfectly
>>>> > > fine in the docker containers, but not on the hardware.
>>>> > >
>>>> > > > Any other suggestions on how to track down this issue?
>>>> > >>
>>>> > >> Rather than stepping through, I would put a single breakpoint at a
>>>> > >> place you want to see whether execution reaches before running the
>>>> > >> test program, then start it and see if the breakpoint fires or not.
>>>> > >> Then remove the breakpoint, add a different one, and repeat. For
>>>> > >> example, see if __res_msend is ever called, and if so, whether
>>>> > >> particular lines of it are reached (or just put breakpoints on some of
>>>> > >> the functions it calls, like socket, bind, recvfrom, poll, etc. to see
>>>> > >> if they're called).
>>>> > >>
>>>> > >> It might also be useful to put a breakpoint on clock_gettime and then
>>>> > >> 'finish' to see what it returns (in case the problem is something
>>>> > >> time64-related).
>>>> > >>
>>>> > >>
>>>> > > The only breakpoint which fixed the execution was for line 20 (which
>>>> > > invokes getaddrinfo). Stepping through the __kernel_vsyscall and then
>>>> > > continuing is the only way it does not result in failure.
>>>> > >
>>>> > > Any later breakpoints fail.
>>>> > >
>>>> > > I went through the other breakpoints as requested.
>>>> > > clock_gettime did not fire.
>>>> > >
>>>> > > Breakpoint 1 at 0x5c2f7: file ../src_musl/compat/time32/clock_gettime32.c,
>>>> > > line 9.
>>>> > > __res_msend, setsockopt also did not fire.
>>>> > > The ones that did fire were: socket, bind, recvfrom, poll, __res_msend_rc,
>>>> > > memset, sendto, __get_resolv_conf, pthread_setcancelstate,
>>>> > > __pthread_setcancelstate, __lookup_serv, __lookup_name, memcpy
>>>> > >
>>>> > > When breaking on socket, stepping through the __kernel_vsyscall call after
>>>> > > socket and then continuing succeeds.
>>>> > >
>>>> > > Is it possible that the socket is not waiting long enough for a response
>>>> > > from __kernel_vsyscall? Has that changed?
>>>> > > Breaking, stepping, and continuing on every other function above fails.
>>>> > >
>>>> > > The gdb log is attached.
>>>> > >
>>>> > > Regards,
>>>> > >
>>>> > > Satadru
>>>> > >
>>>> > >
>>>>
>>>

--000000000000aea56605d836972a
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--000000000000aea56605d836972a--