Date: Tue, 15 Feb 2022 20:41:54 -0500
From: Rich Felker
To: Satadru Pramanik
Cc: musl@lists.openwall.com
Subject: Re: [musl] Re: musl getaddr info breakage on older kernels
Message-ID: <20220216014153.GM7074@brightrain.aerifal.cx>
References: <20220207024056.GY7074@brightrain.aerifal.cx>
 <20220207210223.GZ7074@brightrain.aerifal.cx>
 <20220214182952.GI7074@brightrain.aerifal.cx>
 <20220214220043.GK7074@brightrain.aerifal.cx>
 <20220215174420.GL7074@brightrain.aerifal.cx>

On Tue, Feb 15, 2022 at 05:56:53PM -0500, Satadru Pramanik wrote:
> > OK, then in that case it's surely Docker's seccomp filters that are
> > the problem. I think --security-opt seccomp=unconfined is the part you
> > need to work around it.
>
> That's the command line I was using, which leads to the application NOT
> breaking, and thus doesn't allow me to replicate the problem:
> docker run --security-opt seccomp=unconfined --platform linux/386
> --cap-add SYS_PTRACE --rm -v $(pwd)/pkg_cache:/usr/local/tmp/packages -v
> $(pwd):/output -h $(hostname)-i686 -it satmandu/crewbuild:alex-i686.m58
> /usr/local/bin/setarch i686 sudo -i -u chronos /usr/local/bin/bash -i
>
> The goal with docker was to try to replicate the breakage on the actual
> hardware, which is the place we are having this problem.

OK, you haven't been clear from the beginning about where the problem
actually happens. I was under the impression all along that the problem
happened only in a Docker environment. Before we continue, can you
please clarify the exact environment the problem happens in, including:

- Whether any network traffic occurs when it fails (in the real
  environment, not a replicated one elsewhere).

- Whether it fails or succeeds under strace (in the real environment,
  not a replicated one elsewhere).

- Whether the real environment involves Docker or not.

- What's in resolv.conf (in the real environment, not a replicated one
  elsewhere) and what nameserver software (if known) is running on the
  nameserver(s) listed in there.

- Anything else that might be relevant.

It's really hard to offer any productive advice when the problem is
unclear.

> I ran the process through gdb on the hardware, and stepped through it with
> the timeit function from here: https://stackoverflow.com/a/48412363
>
> Of note perhaps is the very long time it takes for some of these calls to
> return in gdb?
> (The program does run in gdb when stepping through the
> function, but not when run without the break point)
> my commands were in essence the following in gdb:
> add symbol table from file "/usr/local/share/musl/lib/libc.so"
> break main
> run google.com 2>>gdb.out.txt
> ti (repeated until the program exited)
> (I ran this twice, and both runs succeed with long delays)
> Then I ran (this, which fails):
> clear main
> run google.com 2>>gdb.out.txt
>
> Any other suggestions on how to track down this issue?

Rather than stepping through, I would put a single breakpoint at a place
you want to see whether execution reaches before running the test
program, then start it and see if the breakpoint fires or not. Then
remove the breakpoint, add a different one, and repeat. For example, see
if __res_msend is ever called, and if so, whether particular lines of it
are reached (or just put breakpoints on some of the functions it calls,
like socket, bind, recvfrom, poll, etc. to see if they're called).

It might also be useful to put a breakpoint on clock_gettime and then
'finish' to see what it returns (in case the problem is something
time64-related).
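
To complement the breakpoint approach, here is a minimal C sketch of two
probes that could be built against the same libc and run on the affected
hardware. It assumes the failing test program is essentially a single
getaddrinfo call on its first argument; the actual test program is not
shown in this thread. The helper names print_clock and resolve_once are
hypothetical and chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: print the current CLOCK_REALTIME value so that an
 * implausible time (one possible time64 symptom) is visible at a glance.
 * Returns 0 on success, -1 if clock_gettime itself fails. */
int print_clock(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_REALTIME, &ts) != 0) {
        perror("clock_gettime");
        return -1;
    }
    printf("clock_gettime: %lld.%09ld\n", (long long)ts.tv_sec, ts.tv_nsec);
    return 0;
}

/* Hypothetical helper: resolve one host name and report the raw
 * getaddrinfo return code. Returns that code (0 means success). */
int resolve_once(const char *host)
{
    struct addrinfo hints, *res = NULL;
    int rc, n = 0;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 and IPv6 results */
    hints.ai_socktype = SOCK_STREAM;  /* one entry per address, not per socket type */

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "getaddrinfo(%s): %s\n", host, gai_strerror(rc));
        return rc;
    }

    /* Count the returned addresses, then release the list. */
    for (struct addrinfo *p = res; p != NULL; p = p->ai_next)
        n++;
    printf("getaddrinfo(%s): %d result(s)\n", host, n);
    freeaddrinfo(res);
    return 0;
}
```

Calling print_clock() and then resolve_once(argv[1]) from main gives a
small program that can be run directly, under strace, or under gdb with
one breakpoint at a time, so the same binary is used for every
comparison.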