Just adding this to arch/i386/syscall_arch.h solves the build...

#define ENOSYS 38 /* Invalid system call number */

On Thu, Feb 17, 2022 at 4:39 PM Satadru Pramanik wrote:

> I'm noticing one small issue with this suggested patch:
>
> In file included from ../src_musl/src/internal/syscall.h:6,
>                  from ../src_musl/src/dirent/opendir.c:6:
> ../src_musl/arch/i386/syscall_arch.h: In function ‘__syscall0’:
> ../src_musl/arch/i386/syscall_arch.h:17:28: error: ‘ENOSYS’ undeclared (first use in this function)
>    17 |         if (n>350) return -ENOSYS;
>       |                            ^~~~~~
> ../src_musl/arch/i386/syscall_arch.h:17:28: note: each undeclared identifier is reported only once for each function it appears in
> ../src_musl/arch/i386/syscall_arch.h: In function ‘__syscall1’:
> ../src_musl/arch/i386/syscall_arch.h:25:28: error: ‘ENOSYS’ undeclared (first use in this function)
>    25 |         if (n>350) return -ENOSYS;
>       |                            ^~~~~~
> ../src_musl/arch/i386/syscall_arch.h: In function ‘__syscall2’:
> ../src_musl/arch/i386/syscall_arch.h:33:28: error: ‘ENOSYS’ undeclared (first use in this function)
>    33 |         if (n>350) return -ENOSYS;
>
> Should we be adding an include or just defining this locally?
>
> Satadru
>
> On Thu, Feb 17, 2022 at 1:17 PM Rich Felker wrote:
>
>> On Thu, Feb 17, 2022 at 11:36:31AM -0500, Satadru Pramanik wrote:
>> > This machine is an EOL Samsung Series 5 Chromebook
>> > <https://www.chromium.org/chromium-os/developer-information-for-chrome-os-devices/samsung-series-5-chromebook/>
>> > code named Alex
>> > <https://www.chromium.org/chromium-os/developer-information-for-chrome-os-devices/#:~:text=Series%205%20Chromebook-,Alex,-x86%2Dalex%20%26%20x86>.
>> > It is the target device for our i686 builds for Chromebrew.
>> >
>> > It is running a 3.8.11 kernel, and I believe the kernel source for that is
>> > here:
>> > https://chromium.googlesource.com/chromiumos/third_party/kernel/+/refs/heads/chromeos-3.8
>> >
>> > Getting a signed kernel update for an EOL kernel for an EOL machine is
>> > close to impossible from Google, so we're just trying to work around these
>>
>> If these are machines you're in control of, you may be able to load a
>> module to patch it. If this is something you're deploying to users
>> stuck on that kernel who don't want to fix their systems, then of
>> course that's not a practical option.
>>
>> > issues in userspace to maintain some functionality for any users who may
>> > still be using the device.
>> >
>> > The simplest workaround possible would be ideal.
>>
>> If you're shipping binaries specifically for these devices, the
>> simplest fix is just to emulate the failure that should happen in the
>> kernel in userspace, using the attached patch. DO NOT deploy this
>> patch in binaries meant to be used on modern systems, since they will
>> break when Y2038 rolls around. (Your old Chromebooks will break then
>> too.)
>>
>> > It is interesting though that the sample program works fine when built
>> > against near-stock glibc 2.23, no?
>>
>> No. If your kernel has a bug that makes something behave wildly wrong,
>> whether you do or don't see that as visible breakage with a particular
>> piece of software is not particularly interesting.
>>
>> I'm pretty sure, however, that you just haven't tested enough to see
>> any failures. glibc 2.23 is from 2016, so any functionality in it
>> using syscalls added after 2011 (3.8 kernel) is going to blow up
>> badly, thinking the syscall succeeded and returned some positive value
>> when actually the kernel lacks it.
>>
>> In the particular case of clock_gettime, it's just that your glibc
>> 2.23 has a hard Y2038 EOL and does not use/support the missing time64
>> syscalls.
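A minimal sketch of the "emulate the failure in userspace" idea, assuming (from the quoted `n>350` check) that 350 is the highest syscall number the 3.8-era i386 kernel implements. `guarded_syscall0` and `OLD_KERNEL_MAX_SYSCALL` are hypothetical names, not the actual attached patch. Including <errno.h> is the usual way to get ENOSYS; the #ifndef fallback mirrors the "define it locally" fix at the top of the thread. Per the warning above, a cutoff like this belongs only in binaries built for the old devices.

```c
#define _GNU_SOURCE
#include <errno.h>       /* normally provides ENOSYS */
#include <sys/syscall.h> /* SYS_* numbers */
#include <unistd.h>      /* syscall(), getpid() */

/* Local fallback, mirroring the one-line fix at the top of the thread.
 * On Linux, ENOSYS ("Invalid system call number") is 38. */
#ifndef ENOSYS
#define ENOSYS 38
#endif

/* Hypothetical cutoff: assumed highest syscall number the old kernel
 * implements, taken from the quoted `n>350` check. */
#define OLD_KERNEL_MAX_SYSCALL 350

/* Issue syscall n with no arguments and return the result in the raw
 * kernel convention (non-negative on success, -errno on failure).
 * Numbers above the cutoff fail with -ENOSYS in userspace, which is what
 * a correctly fixed kernel would return, instead of reaching a kernel
 * that echoes the number back as a bogus "success". */
static long guarded_syscall0(long n)
{
	if (n > OLD_KERNEL_MAX_SYSCALL)
		return -ENOSYS;
	long r = syscall(n);
	return r == -1 ? -errno : r;
}
```

For example, guarded_syscall0(403), the number of SYS_clock_gettime64 on i386, returns -ENOSYS without entering the kernel, while a low-numbered syscall such as SYS_getpid passes through unchanged.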
>>
>>
>> > On Thu, Feb 17, 2022 at 11:05 AM Rich Felker wrote:
>> >
>> > > On Thu, Feb 17, 2022 at 10:53:52AM -0500, Rich Felker wrote:
>> > > > On Thu, Feb 17, 2022 at 09:49:45AM -0500, Satadru Pramanik wrote:
>> > > > > Apologies for not being as familiar with gdb as I ought to be.
>> > > > > I used the __clock_gettime64 breakpoint and did a backtrace and finish
>> > > > > repeatedly.
>> > > > > I couldn't figure out how to best get the timespec struct info.
>> > > > >
>> > > > > Alternately if you want to throw out a sample test program for me to
>> > > > > build and run, and what gdb commands to run to get the right info,
>> > > > > happy to do that too.
>> > > > >
>> > > > > gdb output is attached.
>> > > >
>> > > > If gdb reported it correctly, clock_gettime returned 403, which should
>> > > > be impossible. It can only return 0 or -1. Incidentally, 403 is the
>> > > > syscall number for SYS_clock_gettime64, which suggests your kernel is
>> > > > simply *returning the syscall number* instead of -ENOSYS for syscalls
>> > > > that don't exist on it. Is this a stock kernel (3.8 IIRC) or does it
>> > > > have any sort of weird vendor patching? Any LSMs loaded?
>> > > >
>> > > > If you'd like to run a test just to make sure we're accurately seeing
>> > > > what's happening, the attached should work. It should print 0 followed
>> > > > by the current time in seconds and nanoseconds.
>> > >
>> > > It looks like you hit the bug introduced in commit
>> > > 554086d85e71f30abe46fc014fea31929a7c6a8a and fixed in commit
>> > > 8142b215501f8b291a108a202b3a053a265b03dd. It looks like, since the
>> > > former was a CVE fix, somebody backported it to the kernel you're
>> > > using, but they failed to backport the fix-for-the-fix, so you have a
>> > > kernel that operates dangerously incorrectly for syscall numbers it's
>> > > unaware of.
>> > >
>> > > This really needs to be fixed in the kernel if you can.
>> > > On our side (musl) we probably need to find out if such kernels are
>> > > actually out in the wild, and if so, whether there's any reasonable way
>> > > to detect the false success and treat it as failure.
>> > >
>> > > > > On Thu, Feb 17, 2022 at 8:46 AM Rich Felker wrote:
>> > > > >
>> > > > > > On Thu, Feb 17, 2022 at 08:30:47AM -0500, Satadru Pramanik wrote:
>> > > > > > > *This is a failure:*
>> > > > > > > tcpdump -i any -vvv host 192.168.0.115
>> > > > > > > tcpdump: listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
>> > > > > > > 08:29:38.043849 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 56)
>> > > > > > >     192.168.0.115.60625 > office.lan.53: [udp sum ok] 0+ A? google.com. (28)
>> > > > > > > 08:29:38.044237 IP (tos 0x0, ttl 64, id 11463, offset 0, flags [DF], proto UDP (17), length 72)
>> > > > > > >     office.lan.53 > 192.168.0.115.60625: [bad udp cksum 0x820a -> 0x5c7d!]
>> > > > > > >     0 q: A? google.com. 1/0/0 google.com. [2m15s] A 142.250.80.110 (44)
>> > > > > > > 08:29:38.047754 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 56)
>> > > > > > >     192.168.0.115.60625 > office.lan.53: [udp sum ok] 0+ AAAA? google.com. (28)
>> > > > > > > 08:29:38.048078 IP (tos 0x0, ttl 64, id 11464, offset 0, flags [DF], proto UDP (17), length 84)
>> > > > > > >     office.lan.53 > 192.168.0.115.60625: [bad udp cksum 0x8216 -> 0xb42f!]
>> > > > > > >     0 q: AAAA? google.com. 1/0/0 google.com.
>> > > > > > >     [4m26s] AAAA 2607:f8b0:4006:80d::200e (56)
>> > > > > > > 08:29:38.048955 IP (tos 0xc0, ttl 64, id 59728, offset 0, flags [none], proto ICMP (1), length 112)
>> > > > > > >     192.168.0.115 > office.lan: ICMP 192.168.0.115 udp port 60625 unreachable, length 92
>> > > > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> > > > > >
>> > > > > > OK, this shows that the client has requested both answers and the
>> > > > > > nameserver replied almost immediately (about 0.5ms later), but when
>> > > > > > the second reply arrives (to the AAAA), the client has already closed
>> > > > > > the listening port, despite only a few ms having passed. The only way
>> > > > > > I see this could happen is by "timing out". This suggests that
>> > > > > > something is wrong with telling time.
>> > > > > >
>> > > > > > Can you either put a breakpoint in __clock_gettime64 (this is the name
>> > > > > > you have to use for a breakpoint -- sorry I messed it up last time)
>> > > > > > and then see what it returns when you "finish" it and what's in the
>> > > > > > timespec struct after that? Or just write a test program to call
>> > > > > > clock_gettime(CLOCK_REALTIME, &ts) (note: you do NOT need or want to
>> > > > > > use the time64 symbol name here) and print the results (return value
>> > > > > > and contents of the timespec struct).
>> > > > > >
>> > > > > > > IP (tos 0x0, ttl 64, id 11464, offset 0, flags [DF], proto UDP (17), length 84)
>> > > > > > >     office.lan.53 > 192.168.0.115.60625: [udp sum ok]
>> > > > > > >     0 q: AAAA? google.com. 1/0/0 google.com.
>> > > > > > >     [4m26s] AAAA 2607:f8b0:4006:80d::200e (56)
>> > > > > > > 08:29:39.476101 IP (tos 0x0, ttl 64, id 12690, offset 0, flags [DF], proto TCP (6), length 52)
>> > > > > > >     192.168.0.115.51204 > lga34s35-in-f3.1e100.net.80: Flags [.], cksum 0xa666 (correct), seq 1466707759, ack 3358943837, win 115, options [nop,nop,TS val 198422160 ecr 2351261566], length 0
>> > > > > > > 08:29:39.478914 IP (tos 0x80, ttl 122, id 6227, offset 0, flags [none], proto TCP (6), length 52)
>> > > > > > >     lga34s35-in-f3.1e100.net.80 > 192.168.0.115.51204: Flags [.], cksum 0xa5b7 (correct), seq 1, ack 1, win 282, options [nop,nop,TS val 2351306585 ecr 198377148], length 0
>> > > > > > > ^C
>> > > > > > > 7 packets captured
>> > > > > > > 7 packets received by filter
>> > > > > > > 0 packets dropped by kernel
>> > > > > >
>> > > >
>> > >
>> > > > #include <stdio.h>
>> > > > #include <time.h>
>> > > > int main()
>> > > > {
>> > > >     struct timespec ts;
>> > > >     printf("%d", clock_gettime(CLOCK_REALTIME, &ts));
>> > > >     printf(" %lld %.9ld\n", (long long)ts.tv_sec, ts.tv_nsec);
>> > > > }