Hello,

while looking for the reason for a failed DNS resolve, I noticed that
the mtime function, which is used to calculate and decide on the
timeout, uses the wall clock instead of a monotonic clock:

static unsigned long mtime()
{
	struct timespec ts;
	clock_gettime(CLOCK_REALTIME, &ts);
	return (unsigned long)ts.tv_sec * 1000
		+ ts.tv_nsec / 1000000;
}

http://git.musl-libc.org/cgit/musl/tree/src/network/res_msend.c#n28

Is this a bug or intentional?

Thanks,
Gregor
On Thu, Dec 01, 2022 at 12:24:47PM +0100, Gregor Jasny wrote:
> Is this a bug or intentional?
It was intentional, based on a belief that the monotonic clock might
not be present on all kernels. That seems to be incorrect for the
range of versions we "support" (>=2.6.0) but some archs unofficially
work back to mid 2.4.x or earlier with limited functionality (no
threads). Note for example that clock_gettime has fallback to the
gettimeofday syscall despite all kernels >=2.6.0 having clock_gettime
(though was it perhaps gated under some CONFIG_ for "realtime
features" at some point? this probably calls for some research...)
Switching to monotonic here has been on my radar for a while. I see
two decent ways to do it without any possibility of regression:
1. Have the above mtime() function fall back to CLOCK_REALTIME on
ENOSYS, or
2. Go through with integrating a fallback for CLOCK_MONOTONIC I've had
in draft for a long time that works on ancient kernels. It works by
combining the seconds-resolution time from SYS_sysinfo uptime with
the finer-grained-but-wrapping jiffy count from SYS_times to get a
monotonic jiffies-resolution uptime.
The latter is cute/fun but a little bit of work to get right and I'm
not sure it's sufficiently useful to justify doing it. Option 1 seems
very reasonable.
Rich
On Dec 1, 2022, at 9:11 AM, Rich Felker <dalias@libc.org> wrote:
> Note for example that clock_gettime has fallback to the
> gettimeofday syscall despite all kernels >=2.6.0 having clock_gettime
> (though was it perhaps gated under some CONFIG_ for "realtime
> features" at some point? this probably calls for some research...)
clock_gettime was added in 2.5.63 (and so is present from 2.6.0) and
was always obj-y, i.e. built unconditionally. There is no CONFIG knob
that controls it in early 2.6, though I stopped my search at 2.6.16.
I have written the patch, but I don't have a box without a monotonic
clock, so I can't test the fallback path. I will send the patch to the
list for review.
Best,
-A.
Before this commit, DNS timeouts always used CLOCK_REALTIME, which could
produce errors if wall time changed for whatever reason. Now we try
CLOCK_MONOTONIC and only fall back to CLOCK_REALTIME when it is
unavailable.
---
 src/network/res_msend.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/network/res_msend.c b/src/network/res_msend.c
index 11c6aa0e..fef7e3a2 100644
--- a/src/network/res_msend.c
+++ b/src/network/res_msend.c
@@ -25,7 +25,8 @@ static void cleanup(void *p)
 static unsigned long mtime()
 {
 	struct timespec ts;
-	clock_gettime(CLOCK_REALTIME, &ts);
+	if (clock_gettime(CLOCK_MONOTONIC, &ts) < 0 && errno == ENOSYS)
+		clock_gettime(CLOCK_REALTIME, &ts);
 	return (unsigned long)ts.tv_sec * 1000
 		+ ts.tv_nsec / 1000000;
 }
-- 
2.37.1 (Apple Git-137.1)