The only change to socket.c I'm seeing is "use __socketcall to simplify
socket()", so maybe it would make sense for me to try building with that
reverted?

satadru

On Wed, Feb 16, 2022 at 1:37 PM Satadru Pramanik wrote:

>> - Whether any network traffic occurs when it fails (in the real
>>   environment not a replicated one elsewhere).
>
> There is no network traffic in the real environment.
>
>> - Whether it fails or succeeds under strace (in the real
>>   environment not a replicated one elsewhere).
>
> It succeeds in strace (in the real environment).
>
>> - Whether the real environment involves Docker or not.
>
> The real environment does not involve docker.
>
>> - What's in resolv.conf (in the real environment not a replicated one
>>   elsewhere) and what nameserver software (if known) is running on the
>>   nameserver(s) listed in there.
>
> The nameserver is picked up from dhcp. The contents of the file are as
> follows:
> nameserver 192.168.0.1
> search lan.
> options single-request timeout:1 attempts:5
>
>> - Anything else that might be relevant.
>
> DNS server is dnsmasq running on a current OpenWRT device.
>
>> It's really hard to offer any productive advice when the problem is
>> unclear.
>
> Apologies for the confusion.
> I'm really just trying to debug this getaddrinfo breakage on this older
> hardware. The docker containers setup is something we use to build packages
> for this hardware, and our frustration is that the software works perfectly
> fine in the docker containers, but not on the hardware.
>
>>> Any other suggestions on how to track down this issue?
>>
>> Rather than stepping through, I would put a single breakpoint at a
>> place you want to see whether execution reaches before running the
>> test program, then start it and see if the breakpoint fires or not.
>> Then remove the breakpoint, add a different one, and repeat. For
>> example, see if __res_msend is ever called, and if so, whether
>> particular lines of it are reached (or just put breakpoints on some of
>> the functions it calls, like socket, bind, recvfrom, poll, etc. to see
>> if they're called).
>>
>> It might also be useful to put a breakpoint on clock_gettime and then
>> 'finish' to see what it returns (in case the problem is something
>> time64-related).
>
> The only breakpoint which fixed the execution was for line 20 (which
> invokes getaddrinfo). Stepping through the __kernel_vsyscall and then
> continuing is the only way it does not result in failure.
>
> Any later breakpoints fail.
>
> I went through the other breakpoints as requested.
> clock_gettime did not fire.
>
> Breakpoint 1 at 0x5c2f7: file ../src_musl/compat/time32/clock_gettime32.c,
> line 9.
> __res_msend, setsockopt also did not fire.
> The ones that did fire were: socket, bind, recvfrom, poll, __res_msend_rc,
> memset, sendto, __get_resolv_conf, pthread_setcancelstate,
> __pthread_setcancelstate, __lookup_serv, __lookup_name, memcpy
>
> When breaking on socket, stepping through the __kernel_vsyscall call after
> socket and then continuing succeeds.
>
> Is it possible that the socket is not waiting long enough for a response
> from __kernel_vsyscall? Has that changed?
> Breaking, stepping, and continuing on every other function above fails.
>
> The gdb log is attached.
>
> Regards,
>
> Satadru
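
For anyone trying to reproduce this, here is a minimal sketch of the kind of getaddrinfo test discussed above. It is not the actual test program from this thread: the helper name try_resolve, the hints it sets, and the timing output are all assumptions made for illustration. It only uses standard POSIX calls (getaddrinfo, gai_strerror, freeaddrinfo, clock_gettime) and reports the return code together with the elapsed time, which may help tell an immediate failure apart from one that waits on the resolver.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/*
 * Hypothetical helper (not from the original thread): resolve `host` once,
 * print getaddrinfo's return code and the elapsed wall-clock time to stderr,
 * and return that code (0 on success, an EAI_* value on failure).
 */
int try_resolve(const char *host, int flags)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    struct timespec t0, t1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 or IPv6 results */
    hints.ai_socktype = SOCK_STREAM; /* assumption: any socket type would do */
    hints.ai_flags = flags;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    int rc = getaddrinfo(host, NULL, &hints, &res);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    long ms = (long)(t1.tv_sec - t0.tv_sec) * 1000
            + (t1.tv_nsec - t0.tv_nsec) / 1000000;
    fprintf(stderr, "%s: rc=%d (%s), %ld ms\n",
            host, rc, rc ? gai_strerror(rc) : "ok", ms);

    if (rc == 0)
        freeaddrinfo(res); /* only valid to free on success */
    return rc;
}
```

Calling try_resolve("example.com", 0) from main (the hostname is a placeholder) once normally, once under strace, and once under gdb would give comparable output for the three cases. Passing AI_NUMERICHOST with a literal address skips DNS entirely, which is a quick check that the failure is in the lookup path rather than in getaddrinfo itself.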