From mboxrd@z Thu Jan 1 00:00:00 1970
Reply-To: musl@lists.openwall.com
MIME-Version: 1.0
References: <20220216213335.GO7074@brightrain.aerifal.cx> <20220217132434.GP7074@brightrain.aerifal.cx> <20220217134651.GQ7074@brightrain.aerifal.cx> <20220217155351.GR7074@brightrain.aerifal.cx> <20220217160501.GS7074@brightrain.aerifal.cx>
In-Reply-To: <20220217160501.GS7074@brightrain.aerifal.cx>
From: Satadru Pramanik
Date: Thu, 17 Feb 2022 11:36:31 -0500
Message-ID:
To: Rich Felker
Cc: musl@lists.openwall.com
Content-Type: multipart/alternative; boundary="0000000000000bbe7e05d8395f65"
Subject: Re: [musl] Re: musl getaddr info breakage on older kernels

--0000000000000bbe7e05d8395f65
Content-Type: text/plain; charset="UTF-8"

This machine is an EOL Samsung Series 5 Chromebook, code-named Alex. It is
the target device for our i686 builds for Chromebrew.

It is running a 3.8.11 kernel, and I believe the kernel source for it is
here:
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/refs/heads/chromeos-3.8

Getting a signed kernel update for an EOL kernel on an EOL machine from
Google is close to impossible, so we're just trying to work around these
issues in userspace to maintain some functionality for any users who may
still be using the device.

The simplest possible workaround would be ideal. It is interesting, though,
that the sample program works fine when built against near-stock glibc
2.23, no?


Satadru

On Thu, Feb 17, 2022 at 11:05 AM Rich Felker wrote:
> On Thu, Feb 17, 2022 at 10:53:52AM -0500, Rich Felker wrote:
> > On Thu, Feb 17, 2022 at 09:49:45AM -0500, Satadru Pramanik wrote:
> > > Apologies for not being as familiar with gdb as I ought to be.
> > > I used the __clock_gettime64 breakpoint and did a backtrace and finish
> > > repeatedly.
> > > I couldn't figure out how to best get the timespec struct info.
> > >
> > > Alternately if you want to throw out a sample test program for me to build
> > > and run, and what gdb commands to run to get the right info, happy to do
> > > that too.
> > >
> > > gdb output is attached.
> >
> > If gdb reported it correctly, clock_gettime returned 403, which should
> > be impossible. It can only return 0 or -1. Incidentally, 403 is the
> > syscall number for SYS_clock_gettime64, which suggests your kernel is
> > simply *returning the syscall number* instead of -ENOSYS for syscalls
> > that don't exist on it. Is this a stock kernel (3.8 IIRC) or does it
> > have any sort of weird vendor patching? Any LSMs loaded?
> >
> > If you'd like to run a test just to make sure we're accurately seeing
> > what's happening, the attached should work. It should print 0 followed
> > by the current time in seconds and nanoseconds.
>
> It looks like you hit the bug introduced in commit
> 554086d85e71f30abe46fc014fea31929a7c6a8a and fixed in commit
> 8142b215501f8b291a108a202b3a053a265b03dd. It looks like, since the
> former was a CVE fix, somebody backported it to the kernel you're
> using, but they failed to backport the fix-for-the-fix, so you have a
> kernel that operates dangerously incorrectly for syscall numbers it's
> unaware of.
>
> This really needs to be fixed in the kernel if you can. On our side
> (musl) we probably need to find out if such kernels are actually out
> in the wild, and if so, whether there's any reasonable way to detect
> the false success and treat it as failure.
>
> > > On Thu, Feb 17, 2022 at 8:46 AM Rich Felker wrote:
> > >
> > > > On Thu, Feb 17, 2022 at 08:30:47AM -0500, Satadru Pramanik wrote:
> > > > > *This is a failure:*
> > > > > tcpdump -i any -vvv host 192.168.0.115
> > > > > tcpdump: listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
> > > > > 08:29:38.043849 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 56)
> > > > >     192.168.0.115.60625 > office.lan.53: [udp sum ok] 0+ A? google.com. (28)
> > > > > 08:29:38.044237 IP (tos 0x0, ttl 64, id 11463, offset 0, flags [DF], proto UDP (17), length 72)
> > > > >     office.lan.53 > 192.168.0.115.60625: [bad udp cksum 0x820a -> 0x5c7d!] 0 q: A? google.com. 1/0/0 google.com. [2m15s] A 142.250.80.110 (44)
> > > > > 08:29:38.047754 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 56)
> > > > >     192.168.0.115.60625 > office.lan.53: [udp sum ok] 0+ AAAA? google.com. (28)
> > > > > 08:29:38.048078 IP (tos 0x0, ttl 64, id 11464, offset 0, flags [DF], proto UDP (17), length 84)
> > > > >     office.lan.53 > 192.168.0.115.60625: [bad udp cksum 0x8216 -> 0xb42f!] 0 q: AAAA? google.com. 1/0/0 google.com. [4m26s] AAAA 2607:f8b0:4006:80d::200e (56)
> > > > > 08:29:38.048955 IP (tos 0xc0, ttl 64, id 59728, offset 0, flags [none], proto ICMP (1), length 112)
> > > > >     192.168.0.115 > office.lan: ICMP 192.168.0.115 udp port 60625 unreachable, length 92
> > > >   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > >
> > > > OK, this shows that the client has requested both answers and the
> > > > nameserver replied almost immediately (about 0.5ms later), but when
> > > > the second reply arrives (to the AAAA), the client has already closed
> > > > the listening port, despite only a few ms having passed.
> > > > The only way I see this could happen is by "timing out". This
> > > > suggests that something is wrong with telling time.
> > > >
> > > > Can you either put a breakpoint in __clock_gettime64 (this is the name
> > > > you have to use for a breakpoint -- sorry I messed it up last time)
> > > > and then see what it returns when you "finish" it and what's in the
> > > > timespec struct after that? Or just write a test program to call
> > > > clock_gettime(CLOCK_REALTIME, &ts) (note: you do NOT need or want to
> > > > use the time64 symbol name here) and print the results (return value
> > > > and contents of the timespec struct).
> > > >
> > > > >         IP (tos 0x0, ttl 64, id 11464, offset 0, flags [DF], proto UDP (17), length 84)
> > > > >     office.lan.53 > 192.168.0.115.60625: [udp sum ok] 0 q: AAAA? google.com. 1/0/0 google.com. [4m26s] AAAA 2607:f8b0:4006:80d::200e (56)
> > > > > 08:29:39.476101 IP (tos 0x0, ttl 64, id 12690, offset 0, flags [DF], proto TCP (6), length 52)
> > > > >     192.168.0.115.51204 > lga34s35-in-f3.1e100.net.80: Flags [.], cksum 0xa666 (correct), seq 1466707759, ack 3358943837, win 115, options [nop,nop,TS val 198422160 ecr 2351261566], length 0
> > > > > 08:29:39.478914 IP (tos 0x80, ttl 122, id 6227, offset 0, flags [none], proto TCP (6), length 52)
> > > > >     lga34s35-in-f3.1e100.net.80 > 192.168.0.115.51204: Flags [.], cksum 0xa5b7 (correct), seq 1, ack 1, win 282, options [nop,nop,TS val 2351306585 ecr 198377148], length 0
> > > > > ^C
> > > > > 7 packets captured
> > > > > 7 packets received by filter
> > > > > 0 packets dropped by kernel
> > > >
> >
> > #include <time.h>
> > #include <stdio.h>
> > int main()
> > {
> > 	struct timespec ts;
> > 	printf("%d", clock_gettime(CLOCK_REALTIME, &ts));
> > 	printf(" %lld %.9ld\n", (long long)ts.tv_sec, ts.tv_nsec);
> > }

--0000000000000bbe7e05d8395f65--