mailing list of musl libc
 help / color / mirror / code / Atom feed
* [musl] Question about musl's time() implementation in time.c
@ 2022-06-07 12:25 Zev Levy Stevenson
  2022-06-07 14:29 ` Arnd Bergmann
  0 siblings, 1 reply; 14+ messages in thread
From: Zev Levy Stevenson @ 2022-06-07 12:25 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 723 bytes --]

Hi all,

While running the libc-test test suite on a customized clang+musl build, I
had trouble with some of the tests because of issues with time accuracy.
I can go in detail if needed, but the problem seemed to boil down to the
time() function in musl (in src/time/time.c) using a clock_gettime syscall
(without vdso) instead of using the Linux time syscall that we expected it
to use. Some other libc implementations use this syscall, and indeed after
switching the syscall used in time () the tests passed, seemingly because
the accuracy of the clocks used matched up.
My main question is why musl's implementation doesn't use the time syscall,
I'd be happy to hear if there was a special reason for this.

Thanks, Zev

[-- Attachment #2: Type: text/html, Size: 918 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [musl] Question about musl's time() implementation in time.c
  2022-06-07 12:25 [musl] Question about musl's time() implementation in time.c Zev Levy Stevenson
@ 2022-06-07 14:29 ` Arnd Bergmann
  2022-06-07 16:30   ` Rich Felker
  0 siblings, 1 reply; 14+ messages in thread
From: Arnd Bergmann @ 2022-06-07 14:29 UTC (permalink / raw)
  To: musl

On Tue, Jun 7, 2022 at 2:25 PM Zev Levy Stevenson <zevlevys@gmail.com> wrote:
>
> Hi all,
>
> While running the libc-test test suite on a customized clang+musl build, I had trouble with some of the tests because of issues with time accuracy.
> I can go in detail if needed, but the problem seemed to boil down to the time() function in musl (in src/time/time.c) using a clock_gettime syscall (without vdso) instead of using the Linux time syscall that we expected it to use. Some other libc implementations use this syscall, and indeed after switching the syscall used in time () the tests passed, seemingly because the accuracy of the clocks used matched up.
> My main question is why musl's implementation doesn't use the time syscall, I'd be happy to hear if there was a special reason for this.

The time() syscall on 32-bit architectures returns a 32-bit integer,
which overflows in y2038, only
clock_gettime() has the required range.

       Arnd

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [musl] Question about musl's time() implementation in time.c
  2022-06-07 14:29 ` Arnd Bergmann
@ 2022-06-07 16:30   ` Rich Felker
  2022-06-10  8:52     ` Zev Levy Stevenson
  0 siblings, 1 reply; 14+ messages in thread
From: Rich Felker @ 2022-06-07 16:30 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: musl, Zev Levy Stevenson

On Tue, Jun 07, 2022 at 04:29:28PM +0200, Arnd Bergmann wrote:
> On Tue, Jun 7, 2022 at 2:25 PM Zev Levy Stevenson <zevlevys@gmail.com> wrote:
> >
> > Hi all,
> >
> > While running the libc-test test suite on a customized clang+musl
> > build, I had trouble with some of the tests because of issues with
> > time accuracy.
> > I can go in detail if needed, but the problem seemed to boil down
> > to the time() function in musl (in src/time/time.c) using a
> > clock_gettime syscall (without vdso) instead of using the Linux
> > time syscall that we expected it to use. Some other libc
> > implementations use this syscall, and indeed after switching the
> > syscall used in time () the tests passed, seemingly because the
> > accuracy of the clocks used matched up.
> > My main question is why musl's implementation doesn't use the time
> > syscall, I'd be happy to hear if there was a special reason for
> > this.
> 
> The time() syscall on 32-bit architectures returns a 32-bit integer,
> which overflows in y2038, only
> clock_gettime() has the required range.

This is indeed a good reason it can't be changed. Historically, though
it was just a matter of avoiding code duplication. Due to the desire
to support vdso and, clock_gettime requires a good deal of logic to
find and use the vdso function and perform fallbacks if it's not
available, or if the newer syscalls are not available. If time() did
not use clock_gettime as its backend, but instead used separate kernel
interfaces, this logic would need to be duplicated in time() too. It
would also impose weird incentives for new archs to provide a time()
syscall or vdso function, or would impose a requirement that we *also*
duplicate the vdso logic to consume the vdso clock_gettime in time()
if there's no vdso time().

If you've created an alternate kernel/syscall implementation where
clock_gettime behaves badly and a legacy time syscall (or vdso
function?) behaves good, that really doesn't seem like a good
implementation choice. Especially if they produce mismatching output.
I guess you could make a stretch argument that the implementation
behaves as if someone is constantly changing the clock, but short of
that, think it's even nonconforming (there's a single realtime clock
and they're both supposed to return times in terms of it). Are there
reasons you're trying to do things that way?

Rich

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [musl] Question about musl's time() implementation in time.c
  2022-06-07 16:30   ` Rich Felker
@ 2022-06-10  8:52     ` Zev Levy Stevenson
  2022-06-14 16:50       ` Arnd Bergmann
  2022-06-16 14:57       ` Rich Felker
  0 siblings, 2 replies; 14+ messages in thread
From: Zev Levy Stevenson @ 2022-06-10  8:52 UTC (permalink / raw)
  To: Rich Felker; +Cc: Arnd Bergmann, musl

[-- Attachment #1: Type: text/plain, Size: 3894 bytes --]

Thank you for the responses, those reasons make sense to me. We are using a
very customized toolchain but the kernel itself is standard.
We looked into it a bit further and we were able to reproduce the issue
with a clean musl-gcc toolchain for x86_64 (version 1.2.2) on a Linux
kernel that we took from a standard Ubuntu distribution.
Specifically, tests in the libc-test suite (
https://wiki.musl-libc.org/libc-test.html) using the time() function fail
sometimes, e.g. src/functional/utime.c, which fails on about ~3-4 runs in
every 1,000 runs. This can be reduced to this type of code failing:

t = time(0);
if(futimens(fd, ((struct
timespec[2]){{.tv_nsec=UTIME_NOW},{.tv_nsec=UTIME_OMIT}})) != 0) return 1;
if (fstat(fd, &st) != 0) return 1;
if (st.st_atim.tv_sec < t) printf("time inconsistency\n");

When replacing the call to time(0) with a raw call to the Linux time()
syscall the issue seems to disappear. On the other hand, using the
clock_gettime syscall results in the same issue.
Perhaps this is an issue with the Linux implementation of these syscalls /
vdso functions, in which case further research may be required, or maybe
such consistency when using different methods for measuring the system time
doesn't have to be guaranteed, in which case the tests should probably be
modified to allow for small inaccuracies such as the one described above.

On Tue, Jun 7, 2022, 19:30 Rich Felker <dalias@libc.org> wrote:

> On Tue, Jun 07, 2022 at 04:29:28PM +0200, Arnd Bergmann wrote:
> > On Tue, Jun 7, 2022 at 2:25 PM Zev Levy Stevenson <zevlevys@gmail.com>
> wrote:
> > >
> > > Hi all,
> > >
> > > While running the libc-test test suite on a customized clang+musl
> > > build, I had trouble with some of the tests because of issues with
> > > time accuracy.
> > > I can go in detail if needed, but the problem seemed to boil down
> > > to the time() function in musl (in src/time/time.c) using a
> > > clock_gettime syscall (without vdso) instead of using the Linux
> > > time syscall that we expected it to use. Some other libc
> > > implementations use this syscall, and indeed after switching the
> > > syscall used in time () the tests passed, seemingly because the
> > > accuracy of the clocks used matched up.
> > > My main question is why musl's implementation doesn't use the time
> > > syscall, I'd be happy to hear if there was a special reason for
> > > this.
> >
> > The time() syscall on 32-bit architectures returns a 32-bit integer,
> > which overflows in y2038, only
> > clock_gettime() has the required range.
>
> This is indeed a good reason it can't be changed. Historically, though
> it was just a matter of avoiding code duplication. Due to the desire
> to support vdso and, clock_gettime requires a good deal of logic to
> find and use the vdso function and perform fallbacks if it's not
> available, or if the newer syscalls are not available. If time() did
> not use clock_gettime as its backend, but instead used separate kernel
> interfaces, this logic would need to be duplicated in time() too. It
> would also impose weird incentives for new archs to provide a time()
> syscall or vdso function, or would impose a requirement that we *also*
> duplicate the vdso logic to consume the vdso clock_gettime in time()
> if there's no vdso time().
>
> If you've created an alternate kernel/syscall implementation where
> clock_gettime behaves badly and a legacy time syscall (or vdso
> function?) behaves good, that really doesn't seem like a good
> implementation choice. Especially if they produce mismatching output.
> I guess you could make a stretch argument that the implementation
> behaves as if someone is constantly changing the clock, but short of
> that, think it's even nonconforming (there's a single realtime clock
> and they're both supposed to return times in terms of it). Are there
> reasons you're trying to do things that way?
>
> Rich
>

[-- Attachment #2: Type: text/html, Size: 4843 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [musl] Question about musl's time() implementation in time.c
  2022-06-10  8:52     ` Zev Levy Stevenson
@ 2022-06-14 16:50       ` Arnd Bergmann
  2022-06-14 17:00         ` Rich Felker
  2022-06-16 14:57       ` Rich Felker
  1 sibling, 1 reply; 14+ messages in thread
From: Arnd Bergmann @ 2022-06-14 16:50 UTC (permalink / raw)
  To: musl; +Cc: Rich Felker

On Fri, Jun 10, 2022 at 10:53 AM Zev Levy Stevenson <zevlevys@gmail.com> wrote:
>
> Thank you for the responses, those reasons make sense to me. We are using a very customized toolchain but the kernel itself is standard.
> We looked into it a bit further and we were able to reproduce the issue with a clean musl-gcc toolchain for x86_64 (version 1.2.2) on a Linux kernel that we took from a standard Ubuntu distribution.
> Specifically, tests in the libc-test suite (https://wiki.musl-libc.org/libc-test.html) using the time() function fail sometimes, e.g. src/functional/utime.c, which fails on about ~3-4 runs in every 1,000 runs. This can be reduced to this type of code failing:
>
> t = time(0);
> if(futimens(fd, ((struct timespec[2]){{.tv_nsec=UTIME_NOW},{.tv_nsec=UTIME_OMIT}})) != 0) return 1;
> if (fstat(fd, &st) != 0) return 1;
> if (st.st_atim.tv_sec < t) printf("time inconsistency\n");
>
> When replacing the call to time(0) with a raw call to the Linux time() syscall the issue seems to disappear. On the other hand, using the clock_gettime syscall results in the same issue.
> Perhaps this is an issue with the Linux implementation of these syscalls / vdso functions, in which case further research may be required, or maybe such consistency when using different methods for measuring the system time doesn't have to be guaranteed, in which case the tests should probably be modified to allow for small inaccuracies such as the one described above.

I think the problem here is likely the inconsistency between the
COCK_REALTIME and
CLOCK_REALTIME_COARSE domains.

The time() syscall in the kernel is based on CLOCK_REALTIME_COARSE, while musl
uses CLOCK_REALTIME. The futimens() syscall in turn uses
CLOCK_REALTIME_COARSE.

The coarse time can be up to one timer tick behind, so reading
CLOCK_REALTIME first
can give you the exact second with a small nanosecond value, while the
utime will still
set the previous value.

Can you change the test case to check if the later time is less than
clock_getres(CLOCK_REALTIME_COARSE, ...) behind?

        Arnd

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [musl] Question about musl's time() implementation in time.c
  2022-06-14 16:50       ` Arnd Bergmann
@ 2022-06-14 17:00         ` Rich Felker
  2022-06-14 20:37           ` Arnd Bergmann
  0 siblings, 1 reply; 14+ messages in thread
From: Rich Felker @ 2022-06-14 17:00 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: musl

On Tue, Jun 14, 2022 at 06:50:40PM +0200, Arnd Bergmann wrote:
> On Fri, Jun 10, 2022 at 10:53 AM Zev Levy Stevenson <zevlevys@gmail.com> wrote:
> >
> > Thank you for the responses, those reasons make sense to me. We are using a very customized toolchain but the kernel itself is standard.
> > We looked into it a bit further and we were able to reproduce the issue with a clean musl-gcc toolchain for x86_64 (version 1.2.2) on a Linux kernel that we took from a standard Ubuntu distribution.
> > Specifically, tests in the libc-test suite (https://wiki.musl-libc.org/libc-test.html) using the time() function fail sometimes, e.g. src/functional/utime.c, which fails on about ~3-4 runs in every 1,000 runs. This can be reduced to this type of code failing:
> >
> > t = time(0);
> > if(futimens(fd, ((struct timespec[2]){{.tv_nsec=UTIME_NOW},{.tv_nsec=UTIME_OMIT}})) != 0) return 1;
> > if (fstat(fd, &st) != 0) return 1;
> > if (st.st_atim.tv_sec < t) printf("time inconsistency\n");
> >
> > When replacing the call to time(0) with a raw call to the Linux time() syscall the issue seems to disappear. On the other hand, using the clock_gettime syscall results in the same issue.
> > Perhaps this is an issue with the Linux implementation of these syscalls / vdso functions, in which case further research may be required, or maybe such consistency when using different methods for measuring the system time doesn't have to be guaranteed, in which case the tests should probably be modified to allow for small inaccuracies such as the one described above.
> 
> I think the problem here is likely the inconsistency between the
> COCK_REALTIME and
> CLOCK_REALTIME_COARSE domains.
> 
> The time() syscall in the kernel is based on CLOCK_REALTIME_COARSE, while musl
> uses CLOCK_REALTIME. The futimens() syscall in turn uses
> CLOCK_REALTIME_COARSE.
> 
> The coarse time can be up to one timer tick behind, so reading
> CLOCK_REALTIME first
> can give you the exact second with a small nanosecond value, while the
> utime will still
> set the previous value.
> 
> Can you change the test case to check if the later time is less than
> clock_getres(CLOCK_REALTIME_COARSE, ...) behind?

This seems like a bug that the kernel uses the wrong clock for setting
file timestamps. It can result in seeing events out-of-order (exactly
as described in this thread). This should really be fixed or at least
made switchable so users who care can fix it.

Rich

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [musl] Question about musl's time() implementation in time.c
  2022-06-14 17:00         ` Rich Felker
@ 2022-06-14 20:37           ` Arnd Bergmann
  2022-06-14 20:49             ` Rich Felker
  0 siblings, 1 reply; 14+ messages in thread
From: Arnd Bergmann @ 2022-06-14 20:37 UTC (permalink / raw)
  To: musl

On Tue, Jun 14, 2022 at 7:00 PM Rich Felker <dalias@libc.org> wrote:
> On Tue, Jun 14, 2022 at 06:50:40PM +0200, Arnd Bergmann wrote:
> > The coarse time can be up to one timer tick behind, so reading
> > CLOCK_REALTIME first
> > can give you the exact second with a small nanosecond value, while the
> > utime will still
> > set the previous value.
> >
> > Can you change the test case to check if the later time is less than
> > clock_getres(CLOCK_REALTIME_COARSE, ...) behind?
>
> This seems like a bug that the kernel uses the wrong clock for setting
> file timestamps. It can result in seeing events out-of-order (exactly
> as described in this thread). This should really be fixed or at least
> made switchable so users who care can fix it.

I can't find any reference to what the correct clock is here,
are you sure that this is specified at all? The decision to use the coarse
time in the kernel is definitely intentional, as reading the hardware
clocksource can be expensive (depending on the hardware), and
changing the behavior would likely break applications that rely on
it being the coarse clock.

        Arnd

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [musl] Question about musl's time() implementation in time.c
  2022-06-14 20:37           ` Arnd Bergmann
@ 2022-06-14 20:49             ` Rich Felker
  2022-06-14 21:11               ` Arnd Bergmann
  0 siblings, 1 reply; 14+ messages in thread
From: Rich Felker @ 2022-06-14 20:49 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: musl

On Tue, Jun 14, 2022 at 10:37:25PM +0200, Arnd Bergmann wrote:
> On Tue, Jun 14, 2022 at 7:00 PM Rich Felker <dalias@libc.org> wrote:
> > On Tue, Jun 14, 2022 at 06:50:40PM +0200, Arnd Bergmann wrote:
> > > The coarse time can be up to one timer tick behind, so reading
> > > CLOCK_REALTIME first
> > > can give you the exact second with a small nanosecond value, while the
> > > utime will still
> > > set the previous value.
> > >
> > > Can you change the test case to check if the later time is less than
> > > clock_getres(CLOCK_REALTIME_COARSE, ...) behind?
> >
> > This seems like a bug that the kernel uses the wrong clock for setting
> > file timestamps. It can result in seeing events out-of-order (exactly
> > as described in this thread). This should really be fixed or at least
> > made switchable so users who care can fix it.
> 
> I can't find any reference to what the correct clock is here,
> are you sure that this is specified at all? The decision to use the coarse
> time in the kernel is definitely intentional, as reading the hardware
> clocksource can be expensive (depending on the hardware), and
> changing the behavior would likely break applications that rely on
> it being the coarse clock.

POSIX specifies operations that set the file timestamps in terms of
the system (CLOCK_REALTIME) clock, not a weird implementation-defined
alternate clock.

Maybe you're right that getting the correct clock is costly on some
archs, but it's almost surely not on any arch that admits vdso
clock_gettime. And "race that causes applications to see wrong
ordering of filesystem operations with respect to other activity for
the sake of performance" does not seem like a good idea.

Rich

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [musl] Question about musl's time() implementation in time.c
  2022-06-14 20:49             ` Rich Felker
@ 2022-06-14 21:11               ` Arnd Bergmann
  2022-06-14 23:28                 ` Rich Felker
  0 siblings, 1 reply; 14+ messages in thread
From: Arnd Bergmann @ 2022-06-14 21:11 UTC (permalink / raw)
  To: musl

On Tue, Jun 14, 2022 at 10:49 PM Rich Felker <dalias@libc.org> wrote:
> On Tue, Jun 14, 2022 at 10:37:25PM +0200, Arnd Bergmann wrote:
> > On Tue, Jun 14, 2022 at 7:00 PM Rich Felker <dalias@libc.org> wrote:
> > > On Tue, Jun 14, 2022 at 06:50:40PM +0200, Arnd Bergmann wrote:
> > > > The coarse time can be up to one timer tick behind, so reading
> > > > CLOCK_REALTIME first
> > > > can give you the exact second with a small nanosecond value, while the
> > > > utime will still
> > > > set the previous value.
> > > >
> > > > Can you change the test case to check if the later time is less than
> > > > clock_getres(CLOCK_REALTIME_COARSE, ...) behind?
> > >
> > > This seems like a bug that the kernel uses the wrong clock for setting
> > > file timestamps. It can result in seeing events out-of-order (exactly
> > > as described in this thread). This should really be fixed or at least
> > > made switchable so users who care can fix it.
> >
> > I can't find any reference to what the correct clock is here,
> > are you sure that this is specified at all? The decision to use the coarse
> > time in the kernel is definitely intentional, as reading the hardware
> > clocksource can be expensive (depending on the hardware), and
> > changing the behavior would likely break applications that rely on
> > it being the coarse clock.
>
> POSIX specifies operations that set the file timestamps in terms of
> the system (CLOCK_REALTIME) clock, not a weird implementation-defined
> alternate clock.
>
> Maybe you're right that getting the correct clock is costly on some
> archs, but it's almost surely not on any arch that admits vdso
> clock_gettime. And "race that causes applications to see wrong
> ordering of filesystem operations with respect to other activity for
> the sake of performance" does not seem like a good idea.

The thing is that a lot of file systems would still behave the same way
because they round times down to a filesystem specific resolution,
often one microsecond or one second, while the kernel time accounting
is in nanoseconds. There have been discussions about an interface
to find out what the actual resolution on a given mount point is (similar
to clock_getres), but that never made it in. The guarantees that you
get from file systems at the moment are:

- the timestamp is always rounded down, not up, so a newly
  created file never gets a timestamp that is newer than either
  CLOCK_REALTIME or CLOCK_REALTIME_COARSE as
  reported by a subsequent clock_gettime()/gettimeofday()/time().

- the in-memory timestamp is the same that you read back
  after umount/mount, and gets adjusted for both resolution
  and range of the on-disk representation.

- any file system that supports timestamps (some always
  report tv_sec=0) set the timestamps to at most three
  seconds before the current time as read by an earlier
  time() syscall.

Making it use CLOCK_REALTIME instead of
CLOCK_REALTIME_COARSE would improve the third
guarantee so it could be within two seconds (or one second
on file systems with full-second resolution like ext3), but would
break the first rule by making it report timestamps that can
be either before or after the time reported by the time() syscall.

        Arnd

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [musl] Question about musl's time() implementation in time.c
  2022-06-14 21:11               ` Arnd Bergmann
@ 2022-06-14 23:28                 ` Rich Felker
  2022-06-15 12:09                   ` Arnd Bergmann
  0 siblings, 1 reply; 14+ messages in thread
From: Rich Felker @ 2022-06-14 23:28 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: musl

On Tue, Jun 14, 2022 at 11:11:32PM +0200, Arnd Bergmann wrote:
> On Tue, Jun 14, 2022 at 10:49 PM Rich Felker <dalias@libc.org> wrote:
> > On Tue, Jun 14, 2022 at 10:37:25PM +0200, Arnd Bergmann wrote:
> > > On Tue, Jun 14, 2022 at 7:00 PM Rich Felker <dalias@libc.org> wrote:
> > > > On Tue, Jun 14, 2022 at 06:50:40PM +0200, Arnd Bergmann wrote:
> > > > > The coarse time can be up to one timer tick behind, so reading
> > > > > CLOCK_REALTIME first
> > > > > can give you the exact second with a small nanosecond value, while the
> > > > > utime will still
> > > > > set the previous value.
> > > > >
> > > > > Can you change the test case to check if the later time is less than
> > > > > clock_getres(CLOCK_REALTIME_COARSE, ...) behind?
> > > >
> > > > This seems like a bug that the kernel uses the wrong clock for setting
> > > > file timestamps. It can result in seeing events out-of-order (exactly
> > > > as described in this thread). This should really be fixed or at least
> > > > made switchable so users who care can fix it.
> > >
> > > I can't find any reference to what the correct clock is here,
> > > are you sure that this is specified at all? The decision to use the coarse
> > > time in the kernel is definitely intentional, as reading the hardware
> > > clocksource can be expensive (depending on the hardware), and
> > > changing the behavior would likely break applications that rely on
> > > it being the coarse clock.
> >
> > POSIX specifies operations that set the file timestamps in terms of
> > the system (CLOCK_REALTIME) clock, not a weird implementation-defined
> > alternate clock.
> >
> > Maybe you're right that getting the correct clock is costly on some
> > archs, but it's almost surely not on any arch that admits vdso
> > clock_gettime. And "race that causes applications to see wrong
> > ordering of filesystem operations with respect to other activity for
> > the sake of performance" does not seem like a good idea.
> 
> The thing is that a lot of file systems would still behave the same way
> because they round times down to a filesystem specific resolution,
> often one microsecond or one second, while the kernel time accounting
> is in nanoseconds. There have been discussions about an interface
> to find out what the actual resolution on a given mount point is (similar
> to clock_getres), but that never made it in. The guarantees that you
> get from file systems at the moment are:

It's normal that they may be rounded down the the filesystem timestamp
granularity. I thought what was going on here was worse.

> - the timestamp is always rounded down, not up, so a newly
>   created file never gets a timestamp that is newer than either
>   CLOCK_REALTIME or CLOCK_REALTIME_COARSE as
>   reported by a subsequent clock_gettime()/gettimeofday()/time().
> 
> - the in-memory timestamp is the same that you read back
>   after umount/mount, and gets adjusted for both resolution
>   and range of the on-disk representation.
> 
> - any file system that supports timestamps (some always
>   report tv_sec=0) set the timestamps to at most three
>   seconds before the current time as read by an earlier
>   time() syscall.
> 
> Making it use CLOCK_REALTIME instead of
> CLOCK_REALTIME_COARSE would improve the third
> guarantee so it could be within two seconds (or one second
> on file systems with full-second resolution like ext3), but would
> break the first rule by making it report timestamps that can
> be either before or after the time reported by the time() syscall.

OK, the time syscall doing the wrong thing here (using a different
clock that's not correctly ordered with respect to CLOCK_REALTIME)
seems to be the worst problem here -- if I'm understanding it right.
The filesystem issue might be a non-issue if it's truly equivalent to
just having coarser fs timestamp granularity, which is allowed.

Rich

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [musl] Question about musl's time() implementation in time.c
  2022-06-14 23:28                 ` Rich Felker
@ 2022-06-15 12:09                   ` Arnd Bergmann
  2022-06-15 16:55                     ` Adhemerval Zanella
  2022-06-16  9:06                     ` Thomas Gleixner
  0 siblings, 2 replies; 14+ messages in thread
From: Arnd Bergmann @ 2022-06-15 12:09 UTC (permalink / raw)
  To: musl
  Cc: John Stultz, Stephen Boyd, Linux Kernel Mailing List,
	Thomas Gleixner, Adhemerval Zanella

On Wed, Jun 15, 2022 at 1:28 AM Rich Felker <dalias@libc.org> wrote:
> On Tue, Jun 14, 2022 at 11:11:32PM +0200, Arnd Bergmann wrote:
> >
> > The thing is that a lot of file systems would still behave the same way
> > because they round times down to a filesystem specific resolution,
> > often one microsecond or one second, while the kernel time accounting
> > is in nanoseconds. There have been discussions about an interface
> > to find out what the actual resolution on a given mount point is (similar
> > to clock_getres), but that never made it in. The guarantees that you
> > get from file systems at the moment are:
>
> It's normal that they may be rounded down the the filesystem timestamp
> granularity. I thought what was going on here was worse.

It gets rounded down twice: first down to the start of the current
timer tick, which is at an arbitrary nanosecond value in the past 10ms,
and then to the resolution of the file system. The result is that the
file timestamp can point to a slightly earlier value, up to max(timer tick
cycle, fs resolution) before the actual nanosecond value. We don't
advertise the granule of the file system though, so I would expect
this to be within the expected behavior.

> OK, the time syscall doing the wrong thing here (using a different
> clock that's not correctly ordered with respect to CLOCK_REALTIME)
> seems to be the worst problem here -- if I'm understanding it right.
> The filesystem issue might be a non-issue if it's truly equivalent to
> just having coarser fs timestamp granularity, which is allowed.

Adding the kernel timekeeping maintainers to Cc. I think this is a
reasonable argument, but it goes against the current behavior.

We have four implementations of the time() syscall that one would
commonly encounter:

- The kernel syscall, using (effectively) CLOCK_REALTIME_COARSE
- The kernel vdso, using (effectively) CLOCK_REALTIME_COARSE
- The glibc interface, calling __clock_gettime64(CLOCK_REALTIME_COARSE, ...)
- The musl interface, calling __clock_gettime64(CLOCK_REALTIME, ...)

So even if everyone agrees that the musl implementation is the
correct one, I think both linux and glibc are more likely to stick with
the traditional behavior to avoid breaking user space code such as the
libc-test case that Zev brought up initially. At least Adhemerval's
time() implementation in glibc[1] appears to have done this intentionally,
while the Linux implementation has simply never changed this in an
incompatible way since Linux-0.01 added time() and 0.99.13k added
the high-resolution gettimeofday().

       Arnd

[1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=0d56378349

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [musl] Question about musl's time() implementation in time.c
  2022-06-15 12:09                   ` Arnd Bergmann
@ 2022-06-15 16:55                     ` Adhemerval Zanella
  2022-06-16  9:06                     ` Thomas Gleixner
  1 sibling, 0 replies; 14+ messages in thread
From: Adhemerval Zanella @ 2022-06-15 16:55 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: musl, John Stultz, Stephen Boyd, Linux Kernel Mailing List,
	Thomas Gleixner



> On 15 Jun 2022, at 05:09, Arnd Bergmann <arnd@kernel.org> wrote:
> 
> On Wed, Jun 15, 2022 at 1:28 AM Rich Felker <dalias@libc.org> wrote:
>> On Tue, Jun 14, 2022 at 11:11:32PM +0200, Arnd Bergmann wrote:
>>> 
>>> The thing is that a lot of file systems would still behave the same way
>>> because they round times down to a filesystem specific resolution,
>>> often one microsecond or one second, while the kernel time accounting
>>> is in nanoseconds. There have been discussions about an interface
>>> to find out what the actual resolution on a given mount point is (similar
>>> to clock_getres), but that never made it in. The guarantees that you
>>> get from file systems at the moment are:
>> 
>> It's normal that they may be rounded down the the filesystem timestamp
>> granularity. I thought what was going on here was worse.
> 
> It gets rounded down twice: first down to the start of the current
> timer tick, which is at an arbitrary nanosecond value in the past 10ms,
> and then to the resolution of the file system. The result is that the
> file timestamp can point to a slightly earlier value, up to max(timer tick
> cycle, fs resolution) before the actual nanosecond value. We don't
> advertise the granule of the file system though, so I would expect
> this to be within the expected behavior.
> 
>> OK, the time syscall doing the wrong thing here (using a different
>> clock that's not correctly ordered with respect to CLOCK_REALTIME)
>> seems to be the worst problem here -- if I'm understanding it right.
>> The filesystem issue might be a non-issue if it's truly equivalent to
>> just having coarser fs timestamp granularity, which is allowed.
> 
> Adding the kernel timekeeping maintainers to Cc. I think this is a
> reasonable argument, but it goes against the current behavior.
> 
> We have four implementations of the time() syscall that one would
> commonly encounter:
> 
> - The kernel syscall, using (effectively) CLOCK_REALTIME_COARSE
> - The kernel vdso, using (effectively) CLOCK_REALTIME_COARSE
> - The glibc interface, calling __clock_gettime64(CLOCK_REALTIME_COARSE, ...)
> - The musl interface, calling __clock_gettime64(CLOCK_REALTIME, ...)
> 
> So even if everyone agrees that the musl implementation is the
> correct one, I think both linux and glibc are more likely to stick with
> the traditional behavior to avoid breaking user space code such as the
> libc-test case that Zev brought up initially. At least Adhemerval's
> time() implementation in glibc[1] appears to have done this intentionally,
> while the Linux implementation has simply never changed this in an
> incompatible way since Linux-0.01 added time() and 0.99.13k added
> the high-resolution gettimeofday().
> 
>       Arnd
> 
> [1] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=0d56378349

Indeed I have changed glibc to be consistent on all architectures to mimic kernel
behavior time syscall and avoid this very issue. We did not have a consistent
implementation before, so glibc varied depending of architecture and kernel
version whether it uses CLOCK_REALTIME or CLOCK_REALTIME_COARSE.

If kernel does change to make time() use CLOCK_REALTIME, it would make
sense to make glibc __clock_gettime64 to use it as well. We will also need to
either disable time vDSO usage on x86 and powerpc or make kernel implementation	
to use CLOCK_REALTIME as well.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [musl] Question about musl's time() implementation in time.c
  2022-06-15 12:09                   ` Arnd Bergmann
  2022-06-15 16:55                     ` Adhemerval Zanella
@ 2022-06-16  9:06                     ` Thomas Gleixner
  1 sibling, 0 replies; 14+ messages in thread
From: Thomas Gleixner @ 2022-06-16  9:06 UTC (permalink / raw)
  To: Arnd Bergmann, musl
  Cc: John Stultz, Stephen Boyd, Linux Kernel Mailing List, Adhemerval Zanella

On Wed, Jun 15 2022 at 14:09, Arnd Bergmann wrote:
> On Wed, Jun 15, 2022 at 1:28 AM Rich Felker <dalias@libc.org> wrote:
> Adding the kernel timekeeping maintainers to Cc. I think this is a
> reasonable argument, but it goes against the current behavior.
>
> We have four implementations of the time() syscall that one would
> commonly encounter:
>
> - The kernel syscall, using (effectively) CLOCK_REALTIME_COARSE
> - The kernel vdso, using (effectively) CLOCK_REALTIME_COARSE
> - The glibc interface, calling __clock_gettime64(CLOCK_REALTIME_COARSE, ...)
> - The musl interface, calling __clock_gettime64(CLOCK_REALTIME, ...)
>
> So even if everyone agrees that the musl implementation is the
> correct one, I think both linux and glibc are more likely to stick with
> the traditional behavior to avoid breaking user space code such as the
> libc-test case that Zev brought up initially. At least Adhemerval's
> time() implementation in glibc[1] appears to have done this intentionally,
> while the Linux implementation has simply never changed this in an
> incompatible way since Linux-0.01 added time() and 0.99.13k added
> the high-resolution gettimeofday().

That's correct. Assumed this call order:

       clock_gettime(REALTIME, &tr);
       clock_gettime(REALTIME_COARSE, &tc);
       tt = time();

You can observe

  tr->sec > tc->sec
  tr->sec > tt

but you can never observe

      tc->sec > tt

The reason for this is historical and time() has a distinct performance
advantage as it boils down to a single read and does not require the
sequence count (at least on 64bit). Coarse REALTIME requires the
seqcount, but avoids the hardware read and the larger math.

The costy part is the hardware read. Before TSC became usable, the
hardware read was a matter of microseconds, so avoiding it was a
significant performance gain. With a loop of 1e9 reads (including the
loop overhead) as measured with perf on a halfways recent SKL the
average per invocation is:

time()                           7 cycles
clock_gettime(REAL_COARSE)      21 cycles
clock_gettime(REAL) TSC         60 cycles
clock_gettime(REAL) HPET      6092 cycles (~2000 cycles syscall overhead)
clock_gettime(REAL) ACPI_PM   4096 cycles (~2000 cycles syscall overhead)

So at the very end it boils down to performance and expectations. File
systems have chosen their granularity and the underlying mechanism to
get the timestamp according to that.

It's clearly not well documented, but I doubt that we can change the
implementation without running into measurable performance regressions.

VDSO based time() vs. clock_gettime(REAL) TSC is almost an order of
magnitude...

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [musl] Question about musl's time() implementation in time.c
  2022-06-10  8:52     ` Zev Levy Stevenson
  2022-06-14 16:50       ` Arnd Bergmann
@ 2022-06-16 14:57       ` Rich Felker
  1 sibling, 0 replies; 14+ messages in thread
From: Rich Felker @ 2022-06-16 14:57 UTC (permalink / raw)
  To: Zev Levy Stevenson; +Cc: Arnd Bergmann, musl

On Fri, Jun 10, 2022 at 11:52:50AM +0300, Zev Levy Stevenson wrote:
> Thank you for the responses, those reasons make sense to me. We are using a
> very customized toolchain but the kernel itself is standard.
> We looked into it a bit further and we were able to reproduce the issue
> with a clean musl-gcc toolchain for x86_64 (version 1.2.2) on a Linux
> kernel that we took from a standard Ubuntu distribution.
> Specifically, tests in the libc-test suite (
> https://wiki.musl-libc.org/libc-test.html) using the time() function fail
> sometimes, e.g. src/functional/utime.c, which fails on about ~3-4 runs in
> every 1,000 runs. This can be reduced to this type of code failing:
> 
> t = time(0);
> if(futimens(fd, ((struct
> timespec[2]){{.tv_nsec=UTIME_NOW},{.tv_nsec=UTIME_OMIT}})) != 0) return 1;
> if (fstat(fd, &st) != 0) return 1;
> if (st.st_atim.tv_sec < t) printf("time inconsistency\n");
> 
> When replacing the call to time(0) with a raw call to the Linux time()
> syscall the issue seems to disappear. On the other hand, using the
> clock_gettime syscall results in the same issue.
> Perhaps this is an issue with the Linux implementation of these syscalls /
> vdso functions, in which case further research may be required, or maybe
> such consistency when using different methods for measuring the system time
> doesn't have to be guaranteed, in which case the tests should probably be
> modified to allow for small inaccuracies such as the one described above.

Going back to the issue that motivated this thread: perhaps libc-test
should not use time(0) here but instead a second tmpfile() with a
write to it followed by fstat, to get a timestamp out of the (same,
temp) filesystem. Then it's not encoding any assumption about correct
ordering between realtime clock and fs timestamps, which might be
broken by fs timestamp granularity or the kernel issue discussed in
this thread. It's only encoding an assumption that timestamps on the
same filesystem are correctly ordered with respect to each other
(which is necessary in order for make, etc. to work right).

Rich

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-06-16 14:57 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-07 12:25 [musl] Question about musl's time() implementation in time.c Zev Levy Stevenson
2022-06-07 14:29 ` Arnd Bergmann
2022-06-07 16:30   ` Rich Felker
2022-06-10  8:52     ` Zev Levy Stevenson
2022-06-14 16:50       ` Arnd Bergmann
2022-06-14 17:00         ` Rich Felker
2022-06-14 20:37           ` Arnd Bergmann
2022-06-14 20:49             ` Rich Felker
2022-06-14 21:11               ` Arnd Bergmann
2022-06-14 23:28                 ` Rich Felker
2022-06-15 12:09                   ` Arnd Bergmann
2022-06-15 16:55                     ` Adhemerval Zanella
2022-06-16  9:06                     ` Thomas Gleixner
2022-06-16 14:57       ` Rich Felker

Code repositories for project(s) associated with this inbox:

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).