* CPU usage - clock related issue
@ 2024-07-28 5:56 Marcel Telka
From: Marcel Telka @ 2024-07-28 5:56 UTC (permalink / raw)
To: developer
Hi,
It looks like something went wrong between changesets 6e0c6e37fb and
8b913f79fc in the illumos-gate.
After upgrade of OpenIndiana from
osnet-incorporation@0.5.11-2024.0.0.22264 (illumos-6e0c6e37fb)
to
osnet-incorporation@0.5.11-2024.0.0.22271 (illumos-8b913f79fc)
I see two processes each consuming a full CPU: nwamd and mariadbd. The machine
is a qemu/kvm guest (the host is Rocky 9).
# dtrace -n 'profile-101 /pid == $target/ { @[ustack()] = count() } tick-10s{exit(0)}' -p $(pgrep -x nwamd) | tail -n 40
dtrace: description 'profile-101 ' matched 2 probes
nwamd`in_past+0x23
nwamd`nwamd_event_dequeue+0x1bf
nwamd`nwamd_event_handler+0x162
nwamd`main+0x1b0
nwamd`_start_crt+0x9a
nwamd`_start+0x1a
24
nwamd`in_past+0x26
nwamd`nwamd_event_dequeue+0x1bf
nwamd`nwamd_event_handler+0x162
nwamd`main+0x1b0
nwamd`_start_crt+0x9a
nwamd`_start+0x1a
25
libc.so.1`__cp_gethrtime+0x5e
libc.so.1`__cp_clock_gettime_realtime+0x77
libc.so.1`__clock_gettime+0x72
libc.so.1`clock_gettime+0x26
nwamd`in_past+0x23
nwamd`nwamd_event_dequeue+0x1bf
nwamd`nwamd_event_handler+0x162
nwamd`main+0x1b0
nwamd`_start_crt+0x9a
nwamd`_start+0x1a
27
libc.so.1`__cp_tsc_read+0x19
libc.so.1`__cp_gethrtime+0x39
libc.so.1`__cp_clock_gettime_realtime+0x77
libc.so.1`__clock_gettime+0x72
libc.so.1`clock_gettime+0x26
nwamd`in_past+0x23
nwamd`nwamd_event_dequeue+0x1bf
nwamd`nwamd_event_handler+0x162
nwamd`main+0x1b0
nwamd`_start_crt+0x9a
nwamd`_start+0x1a
403
# dtrace -n 'profile-101 /pid == $target/ { @[ustack()] = count() } tick-10s{exit(0)}' -p $(pgrep -x mariadbd) | tail -n 40
dtrace: description 'profile-101 ' matched 2 probes
mariadbd`_ZN5tpool19thread_pool_generic14wait_for_tasksERSt11unique_lockISt5mutexEPNS_11worker_dataE+0xb8
mariadbd`_ZN5tpool19thread_pool_generic8get_taskEPNS_11worker_dataEPPNS_4taskE+0x8a
mariadbd`_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x65
libstdc++.so.6.0.32`execute_native_thread_routine+0x10
libc.so.1`_thrp_setup+0x77
libc.so.1`_lwp_start
126
libc.so.1`__cp_tsc_read+0xf
libc.so.1`clock_gettime+0x15
libstdc++.so.6.0.32`_ZNSt6chrono3_V212steady_clock3nowEv+0x16
mariadbd`_ZN5tpool19thread_pool_generic14wait_for_tasksERSt11unique_lockISt5mutexEPNS_11worker_dataE+0xb0
mariadbd`_ZN5tpool19thread_pool_generic8get_taskEPNS_11worker_dataEPPNS_4taskE+0x8a
mariadbd`_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x65
libstdc++.so.6.0.32`execute_native_thread_routine+0x10
libc.so.1`_thrp_setup+0x77
libc.so.1`_lwp_start
130
libc.so.1`__cp_tsc_read+0xf
libc.so.1`clock_gettime+0x15
libstdc++.so.6.0.32`_ZNSt6chrono3_V212system_clock3nowEv+0x16
mariadbd`_ZN5tpool19thread_pool_generic14wait_for_tasksERSt11unique_lockISt5mutexEPNS_11worker_dataE+0x103
mariadbd`_ZN5tpool19thread_pool_generic8get_taskEPNS_11worker_dataEPPNS_4taskE+0x8a
mariadbd`_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x65
libstdc++.so.6.0.32`execute_native_thread_routine+0x10
libc.so.1`_thrp_setup+0x77
libc.so.1`_lwp_start
135
libc.so.1`__cp_tsc_read+0xf
libc.so.1`clock_gettime+0x15
libstdc++.so.6.0.32`_ZNSt6chrono3_V212steady_clock3nowEv+0x16
mariadbd`_ZN5tpool19thread_pool_generic14wait_for_tasksERSt11unique_lockISt5mutexEPNS_11worker_dataE+0xa8
mariadbd`_ZN5tpool19thread_pool_generic8get_taskEPNS_11worker_dataEPPNS_4taskE+0x8a
mariadbd`_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x65
libstdc++.so.6.0.32`execute_native_thread_routine+0x10
libc.so.1`_thrp_setup+0x77
libc.so.1`_lwp_start
137
#
The obvious suspect is:
commit 8b6b46dcb073dba71917d6a7309f0df7bad798a2
Author: Robert Mustacchi <rm@fingolfin.org>
Date: Tue Jul 23 14:44:22 2024 +0000
14237 Want support for pthread_cond_clockwait() and friends
Reviewed by: Andy Fiddaman <illumos@fiddaman.net>
Approved by: Gordon Ross <gordon.w.ross@gmail.com>
but I haven't bisected yet.
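For context, that change adds clock-selectable condition variable waits. A
minimal sketch of the new interface as I understand it (my own illustration,
not code from the commit) looks like this:

#include <pthread.h>
#include <time.h>

/*
 * Illustration of pthread_cond_clockwait(3C): wait on a condition
 * variable with an absolute deadline measured against an explicitly
 * chosen clock (here CLOCK_MONOTONIC), overriding whatever clock was
 * bound to the condvar at initialization time.
 */
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int ready;

int
wait_ready(int timeout_sec)
{
	struct timespec deadline;
	int ret = 0;

	(void) clock_gettime(CLOCK_MONOTONIC, &deadline);
	deadline.tv_sec += timeout_sec;

	(void) pthread_mutex_lock(&mtx);
	while (ready == 0 && ret == 0) {
		ret = pthread_cond_clockwait(&cv, &mtx, CLOCK_MONOTONIC,
		    &deadline);
	}
	(void) pthread_mutex_unlock(&mtx);
	return (ret);	/* 0 on wakeup, ETIMEDOUT on deadline expiry */
}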
Regards.
--
+-------------------------------------------+
| Marcel Telka e-mail: marcel@telka.sk |
| homepage: http://telka.sk/ |
+-------------------------------------------+
* Re: [developer] CPU usage - clock related issue
From: Robert Mustacchi @ 2024-07-28 16:37 UTC (permalink / raw)
To: illumos-developer, Marcel Telka
On 7/27/24 22:56, Marcel Telka wrote:
> Hi,
>
> It looks like something went wrong between changesets 6e0c6e37fb and
> 8b913f79fc in the illumos-gate.
>
> After upgrade of OpenIndiana from
> osnet-incorporation@0.5.11-2024.0.0.22264 (illumos-6e0c6e37fb)
> to
> osnet-incorporation@0.5.11-2024.0.0.22271 (illumos-8b913f79fc)
>
> I see two processes each consuming a full CPU: nwamd and mariadbd. The machine
> is a qemu/kvm guest (the host is Rocky 9).
>
> [...]
>
>
> The obvious suspect is:
>
> commit 8b6b46dcb073dba71917d6a7309f0df7bad798a2
> Author: Robert Mustacchi <rm@fingolfin.org>
> Date: Tue Jul 23 14:44:22 2024 +0000
>
> 14237 Want support for pthread_cond_clockwait() and friends
> Reviewed by: Andy Fiddaman <illumos@fiddaman.net>
> Approved by: Gordon Ross <gordon.w.ross@gmail.com>
>
>
> but I haven't bisected yet.
Thanks for the report, Marcel. I will dig in and see what I can find.
Apologies for the trouble.
Robert
* Re: [developer] CPU usage - clock related issue
From: Robert Mustacchi @ 2024-07-28 17:50 UTC (permalink / raw)
To: illumos-developer, Marcel Telka
On 7/28/24 09:37, Robert Mustacchi wrote:
> On 7/27/24 22:56, Marcel Telka wrote:
>> Hi,
>>
>> It looks like something went wrong between changesets 6e0c6e37fb and
>> 8b913f79fc in the illumos-gate.
>>
>> After upgrade of OpenIndiana from
>> osnet-incorporation@0.5.11-2024.0.0.22264 (illumos-6e0c6e37fb)
>> to
>> osnet-incorporation@0.5.11-2024.0.0.22271 (illumos-8b913f79fc)
>>
>> I see two processes each consuming a full CPU: nwamd and mariadbd. The machine
>> is a qemu/kvm guest (the host is Rocky 9).
>>
>> [...]
>>
>>
>> The obvious suspect is:
>>
>> commit 8b6b46dcb073dba71917d6a7309f0df7bad798a2
>> Author: Robert Mustacchi <rm@fingolfin.org>
>> Date: Tue Jul 23 14:44:22 2024 +0000
>>
>> 14237 Want support for pthread_cond_clockwait() and friends
>> Reviewed by: Andy Fiddaman <illumos@fiddaman.net>
>> Approved by: Gordon Ross <gordon.w.ross@gmail.com>
>>
>>
>> but I haven't bisected yet.
>
> Thanks for the report, Marcel. I will dig in and see what I can find.
> Apologies for the trouble.
I've root-caused this and written it up in
https://www.illumos.org/issues/16683. The short form is that I
incorrectly handled how the default initializer set the internal clock
id in the cond_t. With the fix in place I have verified that nwamd is no
longer in a 100% CPU loop, and I have confirmed that when my tests use
static initializers they catch the issue before the fix and pass after
it. I'll be sending a review request with additional regression tests in
a separate thread.
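For anyone who wants to double-check their own binaries, the affected pattern
is roughly the following (a minimal sketch for illustration only, not the
actual nwamd code): a condition variable set up with the default/static
initializer and then used for a timed wait.

#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Sketch of the affected pattern: a statically initialized condition
 * variable used for a timed wait.  With the regression, the internal
 * clock id in the cond_t was set incorrectly by the default initializer,
 * so timed waits on such condvars misbehaved and callers that loop
 * around them could end up spinning on the CPU.
 */
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;	/* default init */

int
main(void)
{
	struct timespec deadline;
	int ret;

	(void) clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += 2;	/* expect to block for about 2 seconds */

	(void) pthread_mutex_lock(&mtx);
	ret = pthread_cond_timedwait(&cv, &mtx, &deadline);
	(void) pthread_mutex_unlock(&mtx);

	/* A healthy libc reports ETIMEDOUT here after ~2 seconds. */
	(void) printf("pthread_cond_timedwait: %s\n",
	    ret == ETIMEDOUT ? "ETIMEDOUT (expected)" : strerror(ret));
	return (0);
}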
Again, I'm sorry for the trouble and inconvenience that this caused.
Thank you for reporting this, Marcel.
Robert
* Re: [developer] CPU usage - clock related issue
From: Marcel Telka @ 2024-07-28 23:16 UTC (permalink / raw)
To: illumos-developer
On Sun, Jul 28, 2024 at 10:50:40AM -0700, Robert Mustacchi wrote:
> I've root-caused this and written it up in
> https://www.illumos.org/issues/16683. The short form is that I
> incorrectly handled how the default initializer set the internal clock
> id in the cond_t. With the fix in place I have verified that nwamd is no
> longer in a 100% CPU loop, and I have confirmed that when my tests use
> static initializers they catch the issue before the fix and pass after
> it. I'll be sending a review request with additional regression tests in
> a separate thread.
Thanks a lot for the very fast root-causing and the fix.
I'm looking forward to seeing it integrated soon.
Thank you for your hard and extremely valuable work, Robert!
--
+-------------------------------------------+
| Marcel Telka e-mail: marcel@telka.sk |
| homepage: http://telka.sk/ |
+-------------------------------------------+