* [musl] Question about musl's time() implementation in time.c @ 2022-06-07 12:25 Zev Levy Stevenson 2022-06-07 14:29 ` Arnd Bergmann 0 siblings, 1 reply; 14+ messages in thread From: Zev Levy Stevenson @ 2022-06-07 12:25 UTC (permalink / raw) To: musl [-- Attachment #1: Type: text/plain, Size: 723 bytes --] Hi all, While running the libc-test test suite on a customized clang+musl build, I had trouble with some of the tests because of issues with time accuracy. I can go in detail if needed, but the problem seemed to boil down to the time() function in musl (in src/time/time.c) using a clock_gettime syscall (without vdso) instead of using the Linux time syscall that we expected it to use. Some other libc implementations use this syscall, and indeed after switching the syscall used in time () the tests passed, seemingly because the accuracy of the clocks used matched up. My main question is why musl's implementation doesn't use the time syscall, I'd be happy to hear if there was a special reason for this. Thanks, Zev [-- Attachment #2: Type: text/html, Size: 918 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [musl] Question about musl's time() implementation in time.c 2022-06-07 12:25 [musl] Question about musl's time() implementation in time.c Zev Levy Stevenson @ 2022-06-07 14:29 ` Arnd Bergmann 2022-06-07 16:30 ` Rich Felker 0 siblings, 1 reply; 14+ messages in thread From: Arnd Bergmann @ 2022-06-07 14:29 UTC (permalink / raw) To: musl On Tue, Jun 7, 2022 at 2:25 PM Zev Levy Stevenson <zevlevys@gmail.com> wrote: > > Hi all, > > While running the libc-test test suite on a customized clang+musl build, I had trouble with some of the tests because of issues with time accuracy. > I can go in detail if needed, but the problem seemed to boil down to the time() function in musl (in src/time/time.c) using a clock_gettime syscall (without vdso) instead of using the Linux time syscall that we expected it to use. Some other libc implementations use this syscall, and indeed after switching the syscall used in time () the tests passed, seemingly because the accuracy of the clocks used matched up. > My main question is why musl's implementation doesn't use the time syscall, I'd be happy to hear if there was a special reason for this. The time() syscall on 32-bit architectures returns a 32-bit integer, which overflows in y2038, only clock_gettime() has the required range. Arnd ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [musl] Question about musl's time() implementation in time.c 2022-06-07 14:29 ` Arnd Bergmann @ 2022-06-07 16:30 ` Rich Felker 2022-06-10 8:52 ` Zev Levy Stevenson 0 siblings, 1 reply; 14+ messages in thread From: Rich Felker @ 2022-06-07 16:30 UTC (permalink / raw) To: Arnd Bergmann; +Cc: musl, Zev Levy Stevenson On Tue, Jun 07, 2022 at 04:29:28PM +0200, Arnd Bergmann wrote: > On Tue, Jun 7, 2022 at 2:25 PM Zev Levy Stevenson <zevlevys@gmail.com> wrote: > > > > Hi all, > > > > While running the libc-test test suite on a customized clang+musl > > build, I had trouble with some of the tests because of issues with > > time accuracy. > > I can go in detail if needed, but the problem seemed to boil down > > to the time() function in musl (in src/time/time.c) using a > > clock_gettime syscall (without vdso) instead of using the Linux > > time syscall that we expected it to use. Some other libc > > implementations use this syscall, and indeed after switching the > > syscall used in time () the tests passed, seemingly because the > > accuracy of the clocks used matched up. > > My main question is why musl's implementation doesn't use the time > > syscall, I'd be happy to hear if there was a special reason for > > this. > > The time() syscall on 32-bit architectures returns a 32-bit integer, > which overflows in y2038, only > clock_gettime() has the required range. This is indeed a good reason it can't be changed. Historically, though it was just a matter of avoiding code duplication. Due to the desire to support vdso and, clock_gettime requires a good deal of logic to find and use the vdso function and perform fallbacks if it's not available, or if the newer syscalls are not available. If time() did not use clock_gettime as its backend, but instead used separate kernel interfaces, this logic would need to be duplicated in time() too. It would also impose weird incentives for new archs to provide a time() syscall or vdso function, or would impose a requirement that we *also* duplicate the vdso logic to consume the vdso clock_gettime in time() if there's no vdso time(). If you've created an alternate kernel/syscall implementation where clock_gettime behaves badly and a legacy time syscall (or vdso function?) behaves good, that really doesn't seem like a good implementation choice. Especially if they produce mismatching output. I guess you could make a stretch argument that the implementation behaves as if someone is constantly changing the clock, but short of that, think it's even nonconforming (there's a single realtime clock and they're both supposed to return times in terms of it). Are there reasons you're trying to do things that way? Rich ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [musl] Question about musl's time() implementation in time.c 2022-06-07 16:30 ` Rich Felker @ 2022-06-10 8:52 ` Zev Levy Stevenson 2022-06-14 16:50 ` Arnd Bergmann 2022-06-16 14:57 ` Rich Felker 0 siblings, 2 replies; 14+ messages in thread From: Zev Levy Stevenson @ 2022-06-10 8:52 UTC (permalink / raw) To: Rich Felker; +Cc: Arnd Bergmann, musl [-- Attachment #1: Type: text/plain, Size: 3894 bytes --] Thank you for the responses, those reasons make sense to me. We are using a very customized toolchain but the kernel itself is standard. We looked into it a bit further and we were able to reproduce the issue with a clean musl-gcc toolchain for x86_64 (version 1.2.2) on a Linux kernel that we took from a standard Ubuntu distribution. Specifically, tests in the libc-test suite ( https://wiki.musl-libc.org/libc-test.html) using the time() function fail sometimes, e.g. src/functional/utime.c, which fails on about ~3-4 runs in every 1,000 runs. This can be reduced to this type of code failing: t = time(0); if(futimens(fd, ((struct timespec[2]){{.tv_nsec=UTIME_NOW},{.tv_nsec=UTIME_OMIT}})) != 0) return 1; if (fstat(fd, &st) != 0) return 1; if (st.st_atim.tv_sec < t) printf("time inconsistency\n"); When replacing the call to time(0) with a raw call to the Linux time() syscall the issue seems to disappear. On the other hand, using the clock_gettime syscall results in the same issue. Perhaps this is an issue with the Linux implementation of these syscalls / vdso functions, in which case further research may be required, or maybe such consistency when using different methods for measuring the system time doesn't have to be guaranteed, in which case the tests should probably be modified to allow for small inaccuracies such as the one described above. On Tue, Jun 7, 2022, 19:30 Rich Felker <dalias@libc.org> wrote: > On Tue, Jun 07, 2022 at 04:29:28PM +0200, Arnd Bergmann wrote: > > On Tue, Jun 7, 2022 at 2:25 PM Zev Levy Stevenson <zevlevys@gmail.com> > wrote: > > > > > > Hi all, > > > > > > While running the libc-test test suite on a customized clang+musl > > > build, I had trouble with some of the tests because of issues with > > > time accuracy. > > > I can go in detail if needed, but the problem seemed to boil down > > > to the time() function in musl (in src/time/time.c) using a > > > clock_gettime syscall (without vdso) instead of using the Linux > > > time syscall that we expected it to use. Some other libc > > > implementations use this syscall, and indeed after switching the > > > syscall used in time () the tests passed, seemingly because the > > > accuracy of the clocks used matched up. > > > My main question is why musl's implementation doesn't use the time > > > syscall, I'd be happy to hear if there was a special reason for > > > this. > > > > The time() syscall on 32-bit architectures returns a 32-bit integer, > > which overflows in y2038, only > > clock_gettime() has the required range. > > This is indeed a good reason it can't be changed. Historically, though > it was just a matter of avoiding code duplication. Due to the desire > to support vdso and, clock_gettime requires a good deal of logic to > find and use the vdso function and perform fallbacks if it's not > available, or if the newer syscalls are not available. If time() did > not use clock_gettime as its backend, but instead used separate kernel > interfaces, this logic would need to be duplicated in time() too. It > would also impose weird incentives for new archs to provide a time() > syscall or vdso function, or would impose a requirement that we *also* > duplicate the vdso logic to consume the vdso clock_gettime in time() > if there's no vdso time(). > > If you've created an alternate kernel/syscall implementation where > clock_gettime behaves badly and a legacy time syscall (or vdso > function?) behaves good, that really doesn't seem like a good > implementation choice. Especially if they produce mismatching output. > I guess you could make a stretch argument that the implementation > behaves as if someone is constantly changing the clock, but short of > that, think it's even nonconforming (there's a single realtime clock > and they're both supposed to return times in terms of it). Are there > reasons you're trying to do things that way? > > Rich > [-- Attachment #2: Type: text/html, Size: 4843 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [musl] Question about musl's time() implementation in time.c 2022-06-10 8:52 ` Zev Levy Stevenson @ 2022-06-14 16:50 ` Arnd Bergmann 2022-06-14 17:00 ` Rich Felker 2022-06-16 14:57 ` Rich Felker 1 sibling, 1 reply; 14+ messages in thread From: Arnd Bergmann @ 2022-06-14 16:50 UTC (permalink / raw) To: musl; +Cc: Rich Felker On Fri, Jun 10, 2022 at 10:53 AM Zev Levy Stevenson <zevlevys@gmail.com> wrote: > > Thank you for the responses, those reasons make sense to me. We are using a very customized toolchain but the kernel itself is standard. > We looked into it a bit further and we were able to reproduce the issue with a clean musl-gcc toolchain for x86_64 (version 1.2.2) on a Linux kernel that we took from a standard Ubuntu distribution. > Specifically, tests in the libc-test suite (https://wiki.musl-libc.org/libc-test.html) using the time() function fail sometimes, e.g. src/functional/utime.c, which fails on about ~3-4 runs in every 1,000 runs. This can be reduced to this type of code failing: > > t = time(0); > if(futimens(fd, ((struct timespec[2]){{.tv_nsec=UTIME_NOW},{.tv_nsec=UTIME_OMIT}})) != 0) return 1; > if (fstat(fd, &st) != 0) return 1; > if (st.st_atim.tv_sec < t) printf("time inconsistency\n"); > > When replacing the call to time(0) with a raw call to the Linux time() syscall the issue seems to disappear. On the other hand, using the clock_gettime syscall results in the same issue. > Perhaps this is an issue with the Linux implementation of these syscalls / vdso functions, in which case further research may be required, or maybe such consistency when using different methods for measuring the system time doesn't have to be guaranteed, in which case the tests should probably be modified to allow for small inaccuracies such as the one described above. I think the problem here is likely the inconsistency between the COCK_REALTIME and CLOCK_REALTIME_COARSE domains. The time() syscall in the kernel is based on CLOCK_REALTIME_COARSE, while musl uses CLOCK_REALTIME. The futimens() syscall in turn uses CLOCK_REALTIME_COARSE. The coarse time can be up to one timer tick behind, so reading CLOCK_REALTIME first can give you the exact second with a small nanosecond value, while the utime will still set the previous value. Can you change the test case to check if the later time is less than clock_getres(CLOCK_REALTIME_COARSE, ...) behind? Arnd ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [musl] Question about musl's time() implementation in time.c 2022-06-14 16:50 ` Arnd Bergmann @ 2022-06-14 17:00 ` Rich Felker 2022-06-14 20:37 ` Arnd Bergmann 0 siblings, 1 reply; 14+ messages in thread From: Rich Felker @ 2022-06-14 17:00 UTC (permalink / raw) To: Arnd Bergmann; +Cc: musl On Tue, Jun 14, 2022 at 06:50:40PM +0200, Arnd Bergmann wrote: > On Fri, Jun 10, 2022 at 10:53 AM Zev Levy Stevenson <zevlevys@gmail.com> wrote: > > > > Thank you for the responses, those reasons make sense to me. We are using a very customized toolchain but the kernel itself is standard. > > We looked into it a bit further and we were able to reproduce the issue with a clean musl-gcc toolchain for x86_64 (version 1.2.2) on a Linux kernel that we took from a standard Ubuntu distribution. > > Specifically, tests in the libc-test suite (https://wiki.musl-libc.org/libc-test.html) using the time() function fail sometimes, e.g. src/functional/utime.c, which fails on about ~3-4 runs in every 1,000 runs. This can be reduced to this type of code failing: > > > > t = time(0); > > if(futimens(fd, ((struct timespec[2]){{.tv_nsec=UTIME_NOW},{.tv_nsec=UTIME_OMIT}})) != 0) return 1; > > if (fstat(fd, &st) != 0) return 1; > > if (st.st_atim.tv_sec < t) printf("time inconsistency\n"); > > > > When replacing the call to time(0) with a raw call to the Linux time() syscall the issue seems to disappear. On the other hand, using the clock_gettime syscall results in the same issue. > > Perhaps this is an issue with the Linux implementation of these syscalls / vdso functions, in which case further research may be required, or maybe such consistency when using different methods for measuring the system time doesn't have to be guaranteed, in which case the tests should probably be modified to allow for small inaccuracies such as the one described above. > > I think the problem here is likely the inconsistency between the > COCK_REALTIME and > CLOCK_REALTIME_COARSE domains. > > The time() syscall in the kernel is based on CLOCK_REALTIME_COARSE, while musl > uses CLOCK_REALTIME. The futimens() syscall in turn uses > CLOCK_REALTIME_COARSE. > > The coarse time can be up to one timer tick behind, so reading > CLOCK_REALTIME first > can give you the exact second with a small nanosecond value, while the > utime will still > set the previous value. > > Can you change the test case to check if the later time is less than > clock_getres(CLOCK_REALTIME_COARSE, ...) behind? This seems like a bug that the kernel uses the wrong clock for setting file timestamps. It can result in seeing events out-of-order (exactly as described in this thread). This should really be fixed or at least made switchable so users who care can fix it. Rich ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [musl] Question about musl's time() implementation in time.c 2022-06-14 17:00 ` Rich Felker @ 2022-06-14 20:37 ` Arnd Bergmann 2022-06-14 20:49 ` Rich Felker 0 siblings, 1 reply; 14+ messages in thread From: Arnd Bergmann @ 2022-06-14 20:37 UTC (permalink / raw) To: musl On Tue, Jun 14, 2022 at 7:00 PM Rich Felker <dalias@libc.org> wrote: > On Tue, Jun 14, 2022 at 06:50:40PM +0200, Arnd Bergmann wrote: > > The coarse time can be up to one timer tick behind, so reading > > CLOCK_REALTIME first > > can give you the exact second with a small nanosecond value, while the > > utime will still > > set the previous value. > > > > Can you change the test case to check if the later time is less than > > clock_getres(CLOCK_REALTIME_COARSE, ...) behind? > > This seems like a bug that the kernel uses the wrong clock for setting > file timestamps. It can result in seeing events out-of-order (exactly > as described in this thread). This should really be fixed or at least > made switchable so users who care can fix it. I can't find any reference to what the correct clock is here, are you sure that this is specified at all? The decision to use the coarse time in the kernel is definitely intentional, as reading the hardware clocksource can be expensive (depending on the hardware), and changing the behavior would likely break applications that rely on it being the coarse clock. Arnd ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [musl] Question about musl's time() implementation in time.c 2022-06-14 20:37 ` Arnd Bergmann @ 2022-06-14 20:49 ` Rich Felker 2022-06-14 21:11 ` Arnd Bergmann 0 siblings, 1 reply; 14+ messages in thread From: Rich Felker @ 2022-06-14 20:49 UTC (permalink / raw) To: Arnd Bergmann; +Cc: musl On Tue, Jun 14, 2022 at 10:37:25PM +0200, Arnd Bergmann wrote: > On Tue, Jun 14, 2022 at 7:00 PM Rich Felker <dalias@libc.org> wrote: > > On Tue, Jun 14, 2022 at 06:50:40PM +0200, Arnd Bergmann wrote: > > > The coarse time can be up to one timer tick behind, so reading > > > CLOCK_REALTIME first > > > can give you the exact second with a small nanosecond value, while the > > > utime will still > > > set the previous value. > > > > > > Can you change the test case to check if the later time is less than > > > clock_getres(CLOCK_REALTIME_COARSE, ...) behind? > > > > This seems like a bug that the kernel uses the wrong clock for setting > > file timestamps. It can result in seeing events out-of-order (exactly > > as described in this thread). This should really be fixed or at least > > made switchable so users who care can fix it. > > I can't find any reference to what the correct clock is here, > are you sure that this is specified at all? The decision to use the coarse > time in the kernel is definitely intentional, as reading the hardware > clocksource can be expensive (depending on the hardware), and > changing the behavior would likely break applications that rely on > it being the coarse clock. POSIX specifies operations that set the file timestamps in terms of the system (CLOCK_REALTIME) clock, not a weird implementation-defined alternate clock. Maybe you're right that getting the correct clock is costly on some archs, but it's almost surely not on any arch that admits vdso clock_gettime. And "race that causes applications to see wrong ordering of filesystem operations with respect to other activity for the sake of performance" does not seem like a good idea. Rich ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [musl] Question about musl's time() implementation in time.c 2022-06-14 20:49 ` Rich Felker @ 2022-06-14 21:11 ` Arnd Bergmann 2022-06-14 23:28 ` Rich Felker 0 siblings, 1 reply; 14+ messages in thread From: Arnd Bergmann @ 2022-06-14 21:11 UTC (permalink / raw) To: musl On Tue, Jun 14, 2022 at 10:49 PM Rich Felker <dalias@libc.org> wrote: > On Tue, Jun 14, 2022 at 10:37:25PM +0200, Arnd Bergmann wrote: > > On Tue, Jun 14, 2022 at 7:00 PM Rich Felker <dalias@libc.org> wrote: > > > On Tue, Jun 14, 2022 at 06:50:40PM +0200, Arnd Bergmann wrote: > > > > The coarse time can be up to one timer tick behind, so reading > > > > CLOCK_REALTIME first > > > > can give you the exact second with a small nanosecond value, while the > > > > utime will still > > > > set the previous value. > > > > > > > > Can you change the test case to check if the later time is less than > > > > clock_getres(CLOCK_REALTIME_COARSE, ...) behind? > > > > > > This seems like a bug that the kernel uses the wrong clock for setting > > > file timestamps. It can result in seeing events out-of-order (exactly > > > as described in this thread). This should really be fixed or at least > > > made switchable so users who care can fix it. > > > > I can't find any reference to what the correct clock is here, > > are you sure that this is specified at all? The decision to use the coarse > > time in the kernel is definitely intentional, as reading the hardware > > clocksource can be expensive (depending on the hardware), and > > changing the behavior would likely break applications that rely on > > it being the coarse clock. > > POSIX specifies operations that set the file timestamps in terms of > the system (CLOCK_REALTIME) clock, not a weird implementation-defined > alternate clock. > > Maybe you're right that getting the correct clock is costly on some > archs, but it's almost surely not on any arch that admits vdso > clock_gettime. And "race that causes applications to see wrong > ordering of filesystem operations with respect to other activity for > the sake of performance" does not seem like a good idea. The thing is that a lot of file systems would still behave the same way because they round times down to a filesystem specific resolution, often one microsecond or one second, while the kernel time accounting is in nanoseconds. There have been discussions about an interface to find out what the actual resolution on a given mount point is (similar to clock_getres), but that never made it in. The guarantees that you get from file systems at the moment are: - the timestamp is always rounded down, not up, so a newly created file never gets a timestamp that is newer than either CLOCK_REALTIME or CLOCK_REALTIME_COARSE as reported by a subsequent clock_gettime()/gettimeofday()/time(). - the in-memory timestamp is the same that you read back after umount/mount, and gets adjusted for both resolution and range of the on-disk representation. - any file system that supports timestamps (some always report tv_sec=0) set the timestamps to at most three seconds before the current time as read by an earlier time() syscall. Making it use CLOCK_REALTIME instead of CLOCK_REALTIME_COARSE would improve the third guarantee so it could be within two seconds (or one second on file systems with full-second resolution like ext3), but would break the first rule by making it report timestamps that can be either before or after the time reported by the time() syscall. Arnd ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [musl] Question about musl's time() implementation in time.c 2022-06-14 21:11 ` Arnd Bergmann @ 2022-06-14 23:28 ` Rich Felker 2022-06-15 12:09 ` Arnd Bergmann 0 siblings, 1 reply; 14+ messages in thread From: Rich Felker @ 2022-06-14 23:28 UTC (permalink / raw) To: Arnd Bergmann; +Cc: musl On Tue, Jun 14, 2022 at 11:11:32PM +0200, Arnd Bergmann wrote: > On Tue, Jun 14, 2022 at 10:49 PM Rich Felker <dalias@libc.org> wrote: > > On Tue, Jun 14, 2022 at 10:37:25PM +0200, Arnd Bergmann wrote: > > > On Tue, Jun 14, 2022 at 7:00 PM Rich Felker <dalias@libc.org> wrote: > > > > On Tue, Jun 14, 2022 at 06:50:40PM +0200, Arnd Bergmann wrote: > > > > > The coarse time can be up to one timer tick behind, so reading > > > > > CLOCK_REALTIME first > > > > > can give you the exact second with a small nanosecond value, while the > > > > > utime will still > > > > > set the previous value. > > > > > > > > > > Can you change the test case to check if the later time is less than > > > > > clock_getres(CLOCK_REALTIME_COARSE, ...) behind? > > > > > > > > This seems like a bug that the kernel uses the wrong clock for setting > > > > file timestamps. It can result in seeing events out-of-order (exactly > > > > as described in this thread). This should really be fixed or at least > > > > made switchable so users who care can fix it. > > > > > > I can't find any reference to what the correct clock is here, > > > are you sure that this is specified at all? The decision to use the coarse > > > time in the kernel is definitely intentional, as reading the hardware > > > clocksource can be expensive (depending on the hardware), and > > > changing the behavior would likely break applications that rely on > > > it being the coarse clock. > > > > POSIX specifies operations that set the file timestamps in terms of > > the system (CLOCK_REALTIME) clock, not a weird implementation-defined > > alternate clock. > > > > Maybe you're right that getting the correct clock is costly on some > > archs, but it's almost surely not on any arch that admits vdso > > clock_gettime. And "race that causes applications to see wrong > > ordering of filesystem operations with respect to other activity for > > the sake of performance" does not seem like a good idea. > > The thing is that a lot of file systems would still behave the same way > because they round times down to a filesystem specific resolution, > often one microsecond or one second, while the kernel time accounting > is in nanoseconds. There have been discussions about an interface > to find out what the actual resolution on a given mount point is (similar > to clock_getres), but that never made it in. The guarantees that you > get from file systems at the moment are: It's normal that they may be rounded down the the filesystem timestamp granularity. I thought what was going on here was worse. > - the timestamp is always rounded down, not up, so a newly > created file never gets a timestamp that is newer than either > CLOCK_REALTIME or CLOCK_REALTIME_COARSE as > reported by a subsequent clock_gettime()/gettimeofday()/time(). > > - the in-memory timestamp is the same that you read back > after umount/mount, and gets adjusted for both resolution > and range of the on-disk representation. > > - any file system that supports timestamps (some always > report tv_sec=0) set the timestamps to at most three > seconds before the current time as read by an earlier > time() syscall. > > Making it use CLOCK_REALTIME instead of > CLOCK_REALTIME_COARSE would improve the third > guarantee so it could be within two seconds (or one second > on file systems with full-second resolution like ext3), but would > break the first rule by making it report timestamps that can > be either before or after the time reported by the time() syscall. OK, the time syscall doing the wrong thing here (using a different clock that's not correctly ordered with respect to CLOCK_REALTIME) seems to be the worst problem here -- if I'm understanding it right. The filesystem issue might be a non-issue if it's truly equivalent to just having coarser fs timestamp granularity, which is allowed. Rich ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [musl] Question about musl's time() implementation in time.c 2022-06-14 23:28 ` Rich Felker @ 2022-06-15 12:09 ` Arnd Bergmann 2022-06-15 16:55 ` Adhemerval Zanella 2022-06-16 9:06 ` Thomas Gleixner 0 siblings, 2 replies; 14+ messages in thread From: Arnd Bergmann @ 2022-06-15 12:09 UTC (permalink / raw) To: musl Cc: John Stultz, Stephen Boyd, Linux Kernel Mailing List, Thomas Gleixner, Adhemerval Zanella On Wed, Jun 15, 2022 at 1:28 AM Rich Felker <dalias@libc.org> wrote: > On Tue, Jun 14, 2022 at 11:11:32PM +0200, Arnd Bergmann wrote: > > > > The thing is that a lot of file systems would still behave the same way > > because they round times down to a filesystem specific resolution, > > often one microsecond or one second, while the kernel time accounting > > is in nanoseconds. There have been discussions about an interface > > to find out what the actual resolution on a given mount point is (similar > > to clock_getres), but that never made it in. The guarantees that you > > get from file systems at the moment are: > > It's normal that they may be rounded down the the filesystem timestamp > granularity. I thought what was going on here was worse. It gets rounded down twice: first down to the start of the current timer tick, which is at an arbitrary nanosecond value in the past 10ms, and then to the resolution of the file system. The result is that the file timestamp can point to a slightly earlier value, up to max(timer tick cycle, fs resolution) before the actual nanosecond value. We don't advertise the granule of the file system though, so I would expect this to be within the expected behavior. > OK, the time syscall doing the wrong thing here (using a different > clock that's not correctly ordered with respect to CLOCK_REALTIME) > seems to be the worst problem here -- if I'm understanding it right. > The filesystem issue might be a non-issue if it's truly equivalent to > just having coarser fs timestamp granularity, which is allowed. Adding the kernel timekeeping maintainers to Cc. I think this is a reasonable argument, but it goes against the current behavior. We have four implementations of the time() syscall that one would commonly encounter: - The kernel syscall, using (effectively) CLOCK_REALTIME_COARSE - The kernel vdso, using (effectively) CLOCK_REALTIME_COARSE - The glibc interface, calling __clock_gettime64(CLOCK_REALTIME_COARSE, ...) - The musl interface, calling __clock_gettime64(CLOCK_REALTIME, ...) So even if everyone agrees that the musl implementation is the correct one, I think both linux and glibc are more likely to stick with the traditional behavior to avoid breaking user space code such as the libc-test case that Zev brought up initially. At least Adhemerval's time() implementation in glibc[1] appears to have done this intentionally, while the Linux implementation has simply never changed this in an incompatible way since Linux-0.01 added time() and 0.99.13k added the high-resolution gettimeofday(). Arnd [1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=0d56378349 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [musl] Question about musl's time() implementation in time.c 2022-06-15 12:09 ` Arnd Bergmann @ 2022-06-15 16:55 ` Adhemerval Zanella 2022-06-16 9:06 ` Thomas Gleixner 1 sibling, 0 replies; 14+ messages in thread From: Adhemerval Zanella @ 2022-06-15 16:55 UTC (permalink / raw) To: Arnd Bergmann Cc: musl, John Stultz, Stephen Boyd, Linux Kernel Mailing List, Thomas Gleixner > On 15 Jun 2022, at 05:09, Arnd Bergmann <arnd@kernel.org> wrote: > > On Wed, Jun 15, 2022 at 1:28 AM Rich Felker <dalias@libc.org> wrote: >> On Tue, Jun 14, 2022 at 11:11:32PM +0200, Arnd Bergmann wrote: >>> >>> The thing is that a lot of file systems would still behave the same way >>> because they round times down to a filesystem specific resolution, >>> often one microsecond or one second, while the kernel time accounting >>> is in nanoseconds. There have been discussions about an interface >>> to find out what the actual resolution on a given mount point is (similar >>> to clock_getres), but that never made it in. The guarantees that you >>> get from file systems at the moment are: >> >> It's normal that they may be rounded down the the filesystem timestamp >> granularity. I thought what was going on here was worse. > > It gets rounded down twice: first down to the start of the current > timer tick, which is at an arbitrary nanosecond value in the past 10ms, > and then to the resolution of the file system. The result is that the > file timestamp can point to a slightly earlier value, up to max(timer tick > cycle, fs resolution) before the actual nanosecond value. We don't > advertise the granule of the file system though, so I would expect > this to be within the expected behavior. > >> OK, the time syscall doing the wrong thing here (using a different >> clock that's not correctly ordered with respect to CLOCK_REALTIME) >> seems to be the worst problem here -- if I'm understanding it right. >> The filesystem issue might be a non-issue if it's truly equivalent to >> just having coarser fs timestamp granularity, which is allowed. > > Adding the kernel timekeeping maintainers to Cc. I think this is a > reasonable argument, but it goes against the current behavior. > > We have four implementations of the time() syscall that one would > commonly encounter: > > - The kernel syscall, using (effectively) CLOCK_REALTIME_COARSE > - The kernel vdso, using (effectively) CLOCK_REALTIME_COARSE > - The glibc interface, calling __clock_gettime64(CLOCK_REALTIME_COARSE, ...) > - The musl interface, calling __clock_gettime64(CLOCK_REALTIME, ...) > > So even if everyone agrees that the musl implementation is the > correct one, I think both linux and glibc are more likely to stick with > the traditional behavior to avoid breaking user space code such as the > libc-test case that Zev brought up initially. At least Adhemerval's > time() implementation in glibc[1] appears to have done this intentionally, > while the Linux implementation has simply never changed this in an > incompatible way since Linux-0.01 added time() and 0.99.13k added > the high-resolution gettimeofday(). > > Arnd > > [1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=0d56378349 Indeed I have changed glibc to be consistent on all architectures to mimic kernel behavior time syscall and avoid this very issue. We did not have a consistent implementation before, so glibc varied depending of architecture and kernel version whether it uses CLOCK_REALTIME or CLOCK_REALTIME_COARSE. If kernel does change to make time() use CLOCK_REALTIME, it would make sense to make glibc __clock_gettime64 to use it as well. We will also need to either disable time vDSO usage on x86 and powerpc or make kernel implementation to use CLOCK_REALTIME as well. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [musl] Question about musl's time() implementation in time.c 2022-06-15 12:09 ` Arnd Bergmann 2022-06-15 16:55 ` Adhemerval Zanella @ 2022-06-16 9:06 ` Thomas Gleixner 1 sibling, 0 replies; 14+ messages in thread From: Thomas Gleixner @ 2022-06-16 9:06 UTC (permalink / raw) To: Arnd Bergmann, musl Cc: John Stultz, Stephen Boyd, Linux Kernel Mailing List, Adhemerval Zanella On Wed, Jun 15 2022 at 14:09, Arnd Bergmann wrote: > On Wed, Jun 15, 2022 at 1:28 AM Rich Felker <dalias@libc.org> wrote: > Adding the kernel timekeeping maintainers to Cc. I think this is a > reasonable argument, but it goes against the current behavior. > > We have four implementations of the time() syscall that one would > commonly encounter: > > - The kernel syscall, using (effectively) CLOCK_REALTIME_COARSE > - The kernel vdso, using (effectively) CLOCK_REALTIME_COARSE > - The glibc interface, calling __clock_gettime64(CLOCK_REALTIME_COARSE, ...) > - The musl interface, calling __clock_gettime64(CLOCK_REALTIME, ...) > > So even if everyone agrees that the musl implementation is the > correct one, I think both linux and glibc are more likely to stick with > the traditional behavior to avoid breaking user space code such as the > libc-test case that Zev brought up initially. At least Adhemerval's > time() implementation in glibc[1] appears to have done this intentionally, > while the Linux implementation has simply never changed this in an > incompatible way since Linux-0.01 added time() and 0.99.13k added > the high-resolution gettimeofday(). That's correct. Assumed this call order: clock_gettime(REALTIME, &tr); clock_gettime(REALTIME_COARSE, &tc); tt = time(); You can observe tr->sec > tc->sec tr->sec > tt but you can never observe tc->sec > tt The reason for this is historical and time() has a distinct performance advantage as it boils down to a single read and does not require the sequence count (at least on 64bit). Coarse REALTIME requires the seqcount, but avoids the hardware read and the larger math. The costy part is the hardware read. Before TSC became usable, the hardware read was a matter of microseconds, so avoiding it was a significant performance gain. With a loop of 1e9 reads (including the loop overhead) as measured with perf on a halfways recent SKL the average per invocation is: time() 7 cycles clock_gettime(REAL_COARSE) 21 cycles clock_gettime(REAL) TSC 60 cycles clock_gettime(REAL) HPET 6092 cycles (~2000 cycles syscall overhead) clock_gettime(REAL) ACPI_PM 4096 cycles (~2000 cycles syscall overhead) So at the very end it boils down to performance and expectations. File systems have chosen their granularity and the underlying mechanism to get the timestamp according to that. It's clearly not well documented, but I doubt that we can change the implementation without running into measurable performance regressions. VDSO based time() vs. clock_gettime(REAL) TSC is almost an order of magnitude... Thanks, tglx ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [musl] Question about musl's time() implementation in time.c 2022-06-10 8:52 ` Zev Levy Stevenson 2022-06-14 16:50 ` Arnd Bergmann @ 2022-06-16 14:57 ` Rich Felker 1 sibling, 0 replies; 14+ messages in thread From: Rich Felker @ 2022-06-16 14:57 UTC (permalink / raw) To: Zev Levy Stevenson; +Cc: Arnd Bergmann, musl On Fri, Jun 10, 2022 at 11:52:50AM +0300, Zev Levy Stevenson wrote: > Thank you for the responses, those reasons make sense to me. We are using a > very customized toolchain but the kernel itself is standard. > We looked into it a bit further and we were able to reproduce the issue > with a clean musl-gcc toolchain for x86_64 (version 1.2.2) on a Linux > kernel that we took from a standard Ubuntu distribution. > Specifically, tests in the libc-test suite ( > https://wiki.musl-libc.org/libc-test.html) using the time() function fail > sometimes, e.g. src/functional/utime.c, which fails on about ~3-4 runs in > every 1,000 runs. This can be reduced to this type of code failing: > > t = time(0); > if(futimens(fd, ((struct > timespec[2]){{.tv_nsec=UTIME_NOW},{.tv_nsec=UTIME_OMIT}})) != 0) return 1; > if (fstat(fd, &st) != 0) return 1; > if (st.st_atim.tv_sec < t) printf("time inconsistency\n"); > > When replacing the call to time(0) with a raw call to the Linux time() > syscall the issue seems to disappear. On the other hand, using the > clock_gettime syscall results in the same issue. > Perhaps this is an issue with the Linux implementation of these syscalls / > vdso functions, in which case further research may be required, or maybe > such consistency when using different methods for measuring the system time > doesn't have to be guaranteed, in which case the tests should probably be > modified to allow for small inaccuracies such as the one described above. Going back to the issue that motivated this thread: perhaps libc-test should not use time(0) here but instead a second tmpfile() with a write to it followed by fstat, to get a timestamp out of the (same, temp) filesystem. Then it's not encoding any assumption about correct ordering between realtime clock and fs timestamps, which might be broken by fs timestamp granularity or the kernel issue discussed in this thread. It's only encoding an assumption that timestamps on the same filesystem are correctly ordered with respect to each other (which is necessary in order for make, etc. to work right). Rich ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2022-06-16 14:57 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-06-07 12:25 [musl] Question about musl's time() implementation in time.c Zev Levy Stevenson 2022-06-07 14:29 ` Arnd Bergmann 2022-06-07 16:30 ` Rich Felker 2022-06-10 8:52 ` Zev Levy Stevenson 2022-06-14 16:50 ` Arnd Bergmann 2022-06-14 17:00 ` Rich Felker 2022-06-14 20:37 ` Arnd Bergmann 2022-06-14 20:49 ` Rich Felker 2022-06-14 21:11 ` Arnd Bergmann 2022-06-14 23:28 ` Rich Felker 2022-06-15 12:09 ` Arnd Bergmann 2022-06-15 16:55 ` Adhemerval Zanella 2022-06-16 9:06 ` Thomas Gleixner 2022-06-16 14:57 ` Rich Felker
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/musl/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).