From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm
Precedence: bulk
Reply-To: musl@lists.openwall.com
MIME-Version: 1.0
References: <20220207024056.GY7074@brightrain.aerifal.cx>
 <20220207210223.GZ7074@brightrain.aerifal.cx>
 <20220214182952.GI7074@brightrain.aerifal.cx>
 <20220214220043.GK7074@brightrain.aerifal.cx>
 <20220215174420.GL7074@brightrain.aerifal.cx>
 <20220216014153.GM7074@brightrain.aerifal.cx>
From: Satadru Pramanik
Date: Wed, 16 Feb 2022 13:44:35 -0500
To: Rich Felker
Cc: musl@lists.openwall.com
Content-Type: multipart/alternative; boundary="00000000000036bd4c05d8270b4f"
Subject: Re: [musl] Re: musl getaddr info breakage on older kernels

--00000000000036bd4c05d8270b4f
Content-Type: text/plain; charset="UTF-8"

The only change to socket.c I'm seeing is "use __socketcall to simplify
socket()", so maybe it would make sense for me to try building with that
reversed?

satadru

On Wed, Feb 16, 2022 at 1:37 PM Satadru Pramanik wrote:

> >> - Whether any network traffic occurs when it fails (in the real
> >>   environment not a replicated one elsewhere).
>
> There is no network traffic in the real environment.
>
> >> - Whether it fails or succeeds under strace (in the real
> >>   environment not a replicated one elsewhere).
>
> It succeeds in strace (in the real environment).
>
> >> - Whether the real environment involves Docker or not.
>
> The real environment does not involve docker.
>
> >> - What's in resolv.conf (in the real environment not a replicated one
> >>   elsewhere) and what nameserver software (if known) is running on the
> >>   nameserver(s) listed in there.
>
> The nameserver is picked up from dhcp. The contents of the file are as
> follows:
> nameserver 192.168.0.1
> search lan.
> options single-request timeout:1 attempts:5
>
> >> - Anything else that might be relevant.
>
> DNS server is dnsmasq running on a current OpenWRT device.
>
> >> It's really hard to offer any productive advice when the problem is
> >> unclear.
>
> Apologies for the confusion.
> I'm really just trying to debug this getaddrinfo breakage on this older
> hardware. The docker containers setup is something we use to build packages
> for this hardware, and our frustration is that the software works perfectly
> fine in the docker containers, but not on the hardware.
>
> >> > Any other suggestions on how to track down this issue?
> >>
> >> Rather than stepping through, I would put a single breakpoint at a
> >> place you want to see whether execution reaches before running the
> >> test program, then start it and see if the breakpoint fires or not.
> >> Then remove the breakpoint, add a different one, and repeat. For
> >> example, see if __res_msend is ever called, and if so, whether
> >> particular lines of it are reached (or just put breakpoints on some of
> >> the functions it calls, like socket, bind, recvfrom, poll, etc. to see
> >> if they're called).
> >>
> >> It might also be useful to put a breakpoint on clock_gettime and then
> >> 'finish' to see what it returns (in case the problem is something
> >> time64-related).
>
> The only breakpoint which fixed the execution was for line 20 (which
> invokes getaddrinfo). Stepping through the __kernel_vsyscall and then
> continuing is the only way it does not result in failure.
>
> Any later breakpoints fail.
>
> I went through the other breakpoints as requested.
> clock_gettime did not fire.
>
> Breakpoint 1 at 0x5c2f7: file ../src_musl/compat/time32/clock_gettime32.c,
> line 9.
> __res_msend, setsockopt also did not fire.
> The ones that did fire were: socket, bind, recvfrom, poll, __res_msend_rc,
> memset, sendto, __get_resolv_conf, pthread_setcancelstate,
> __pthread_setcancelstate, __lookup_serv, __lookup_name, memcpy
>
> When breaking on socket, stepping through the __kernel_vsyscall call after
> socket and then continuing succeeds.
>
> Is it possible that the socket is not waiting long enough for a response
> from __kernel_vsyscall? Has that changed?
> Breaking, stepping, and continuing on every other function above fails.
>
> The gdb log is attached.
>
> Regards,
>
> Satadru

--00000000000036bd4c05d8270b4f
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--00000000000036bd4c05d8270b4f--