public inbox for developer@lists.illumos.org (since 2011-08)
* CPU usage - clock related issue
@ 2024-07-28  5:56 Marcel Telka
  2024-07-28 16:37 ` [developer] " Robert Mustacchi
  0 siblings, 1 reply; 4+ messages in thread
From: Marcel Telka @ 2024-07-28  5:56 UTC (permalink / raw)
  To: developer

Hi,

It looks like something went wrong between changesets 6e0c6e37fb and
8b913f79fc in the illumos-gate.

After upgrading OpenIndiana from
	osnet-incorporation@0.5.11-2024.0.0.22264 (illumos-6e0c6e37fb)
to
	osnet-incorporation@0.5.11-2024.0.0.22271 (illumos-8b913f79fc)

I see two processes eating full CPU: nwamd and mariadbd.  The machine is
a qemu/kvm guest (host is Rocky 9).

# dtrace -n 'profile-101 /pid == $target/ { @[ustack()] = count() } tick-10s{exit(0)}' -p $(pgrep -x nwamd) | tail -n 40
dtrace: description 'profile-101 ' matched 2 probes
              nwamd`in_past+0x23
              nwamd`nwamd_event_dequeue+0x1bf
              nwamd`nwamd_event_handler+0x162
              nwamd`main+0x1b0
              nwamd`_start_crt+0x9a
              nwamd`_start+0x1a
               24

              nwamd`in_past+0x26
              nwamd`nwamd_event_dequeue+0x1bf
              nwamd`nwamd_event_handler+0x162
              nwamd`main+0x1b0
              nwamd`_start_crt+0x9a
              nwamd`_start+0x1a
               25

              libc.so.1`__cp_gethrtime+0x5e
              libc.so.1`__cp_clock_gettime_realtime+0x77
              libc.so.1`__clock_gettime+0x72
              libc.so.1`clock_gettime+0x26
              nwamd`in_past+0x23
              nwamd`nwamd_event_dequeue+0x1bf
              nwamd`nwamd_event_handler+0x162
              nwamd`main+0x1b0
              nwamd`_start_crt+0x9a
              nwamd`_start+0x1a
               27

              libc.so.1`__cp_tsc_read+0x19
              libc.so.1`__cp_gethrtime+0x39
              libc.so.1`__cp_clock_gettime_realtime+0x77
              libc.so.1`__clock_gettime+0x72
              libc.so.1`clock_gettime+0x26
              nwamd`in_past+0x23
              nwamd`nwamd_event_dequeue+0x1bf
              nwamd`nwamd_event_handler+0x162
              nwamd`main+0x1b0
              nwamd`_start_crt+0x9a
              nwamd`_start+0x1a
              403
# dtrace -n 'profile-101 /pid == $target/ { @[ustack()] = count() } tick-10s{exit(0)}' -p $(pgrep -x mariadbd) | tail -n 40
dtrace: description 'profile-101 ' matched 2 probes
              mariadbd`_ZN5tpool19thread_pool_generic14wait_for_tasksERSt11unique_lockISt5mutexEPNS_11worker_dataE+0xb8
              mariadbd`_ZN5tpool19thread_pool_generic8get_taskEPNS_11worker_dataEPPNS_4taskE+0x8a
              mariadbd`_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x65
              libstdc++.so.6.0.32`execute_native_thread_routine+0x10
              libc.so.1`_thrp_setup+0x77
              libc.so.1`_lwp_start
              126

              libc.so.1`__cp_tsc_read+0xf
              libc.so.1`clock_gettime+0x15
              libstdc++.so.6.0.32`_ZNSt6chrono3_V212steady_clock3nowEv+0x16
              mariadbd`_ZN5tpool19thread_pool_generic14wait_for_tasksERSt11unique_lockISt5mutexEPNS_11worker_dataE+0xb0
              mariadbd`_ZN5tpool19thread_pool_generic8get_taskEPNS_11worker_dataEPPNS_4taskE+0x8a
              mariadbd`_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x65
              libstdc++.so.6.0.32`execute_native_thread_routine+0x10
              libc.so.1`_thrp_setup+0x77
              libc.so.1`_lwp_start
              130

              libc.so.1`__cp_tsc_read+0xf
              libc.so.1`clock_gettime+0x15
              libstdc++.so.6.0.32`_ZNSt6chrono3_V212system_clock3nowEv+0x16
              mariadbd`_ZN5tpool19thread_pool_generic14wait_for_tasksERSt11unique_lockISt5mutexEPNS_11worker_dataE+0x103
              mariadbd`_ZN5tpool19thread_pool_generic8get_taskEPNS_11worker_dataEPPNS_4taskE+0x8a
              mariadbd`_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x65
              libstdc++.so.6.0.32`execute_native_thread_routine+0x10
              libc.so.1`_thrp_setup+0x77
              libc.so.1`_lwp_start
              135

              libc.so.1`__cp_tsc_read+0xf
              libc.so.1`clock_gettime+0x15
              libstdc++.so.6.0.32`_ZNSt6chrono3_V212steady_clock3nowEv+0x16
              mariadbd`_ZN5tpool19thread_pool_generic14wait_for_tasksERSt11unique_lockISt5mutexEPNS_11worker_dataE+0xa8
              mariadbd`_ZN5tpool19thread_pool_generic8get_taskEPNS_11worker_dataEPPNS_4taskE+0x8a
              mariadbd`_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x65
              libstdc++.so.6.0.32`execute_native_thread_routine+0x10
              libc.so.1`_thrp_setup+0x77
              libc.so.1`_lwp_start
              137
#


The obvious suspect is:

commit 8b6b46dcb073dba71917d6a7309f0df7bad798a2
Author: Robert Mustacchi <rm@fingolfin.org>
Date:   Tue Jul 23 14:44:22 2024 +0000

    14237 Want support for pthread_cond_clockwait() and friends
    Reviewed by: Andy Fiddaman <illumos@fiddaman.net>
    Approved by: Gordon Ross <gordon.w.ross@gmail.com>


but I didn't bisect yet.


Regards.

-- 
+-------------------------------------------+
| Marcel Telka   e-mail:   marcel@telka.sk  |
|                homepage: http://telka.sk/ |
+-------------------------------------------+


* Re: [developer] CPU usage - clock related issue
  2024-07-28  5:56 CPU usage - clock related issue Marcel Telka
@ 2024-07-28 16:37 ` Robert Mustacchi
  2024-07-28 17:50   ` Robert Mustacchi
  0 siblings, 1 reply; 4+ messages in thread
From: Robert Mustacchi @ 2024-07-28 16:37 UTC (permalink / raw)
  To: illumos-developer, Marcel Telka

On 7/27/24 22:56, Marcel Telka wrote:
> Hi,
> 
> It looks like something went wrong between changesets 6e0c6e37fb and
> 8b913f79fc in the illumos-gate.
> 
> After upgrading OpenIndiana from
>         osnet-incorporation@0.5.11-2024.0.0.22264 (illumos-6e0c6e37fb)
> to
>         osnet-incorporation@0.5.11-2024.0.0.22271 (illumos-8b913f79fc)
> 
> I see two processes eating full CPU: nwamd and mariadbd.  The machine is
> a qemu/kvm guest (host is Rocky 9).
> 
> # dtrace -n 'profile-101 /pid == $target/ { @[ustack()] = count() } tick-10s{exit(0)}' -p $(pgrep -x nwamd) | tail -n 40
> dtrace: description 'profile-101 ' matched 2 probes
>               nwamd`in_past+0x23
>               nwamd`nwamd_event_dequeue+0x1bf
>               nwamd`nwamd_event_handler+0x162
>               nwamd`main+0x1b0
>               nwamd`_start_crt+0x9a
>               nwamd`_start+0x1a
>                24
> 
>               nwamd`in_past+0x26
>               nwamd`nwamd_event_dequeue+0x1bf
>               nwamd`nwamd_event_handler+0x162
>               nwamd`main+0x1b0
>               nwamd`_start_crt+0x9a
>               nwamd`_start+0x1a
>                25
> 
>               libc.so.1`__cp_gethrtime+0x5e
>               libc.so.1`__cp_clock_gettime_realtime+0x77
>               libc.so.1`__clock_gettime+0x72
>               libc.so.1`clock_gettime+0x26
>               nwamd`in_past+0x23
>               nwamd`nwamd_event_dequeue+0x1bf
>               nwamd`nwamd_event_handler+0x162
>               nwamd`main+0x1b0
>               nwamd`_start_crt+0x9a
>               nwamd`_start+0x1a
>                27
> 
>               libc.so.1`__cp_tsc_read+0x19
>               libc.so.1`__cp_gethrtime+0x39
>               libc.so.1`__cp_clock_gettime_realtime+0x77
>               libc.so.1`__clock_gettime+0x72
>               libc.so.1`clock_gettime+0x26
>               nwamd`in_past+0x23
>               nwamd`nwamd_event_dequeue+0x1bf
>               nwamd`nwamd_event_handler+0x162
>               nwamd`main+0x1b0
>               nwamd`_start_crt+0x9a
>               nwamd`_start+0x1a
>               403
> # dtrace -n 'profile-101 /pid == $target/ { @[ustack()] = count() } tick-10s{exit(0)}' -p $(pgrep -x mariadbd) | tail -n 40
> dtrace: description 'profile-101 ' matched 2 probes
>               mariadbd`_ZN5tpool19thread_pool_generic14wait_for_tasksERSt11unique_lockISt5mutexEPNS_11worker_dataE+0xb8
>               mariadbd`_ZN5tpool19thread_pool_generic8get_taskEPNS_11worker_dataEPPNS_4taskE+0x8a
>               mariadbd`_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x65
>               libstdc++.so.6.0.32`execute_native_thread_routine+0x10
>               libc.so.1`_thrp_setup+0x77
>               libc.so.1`_lwp_start
>               126
> 
>               libc.so.1`__cp_tsc_read+0xf
>               libc.so.1`clock_gettime+0x15
>               libstdc++.so.6.0.32`_ZNSt6chrono3_V212steady_clock3nowEv+0x16
>               mariadbd`_ZN5tpool19thread_pool_generic14wait_for_tasksERSt11unique_lockISt5mutexEPNS_11worker_dataE+0xb0
>               mariadbd`_ZN5tpool19thread_pool_generic8get_taskEPNS_11worker_dataEPPNS_4taskE+0x8a
>               mariadbd`_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x65
>               libstdc++.so.6.0.32`execute_native_thread_routine+0x10
>               libc.so.1`_thrp_setup+0x77
>               libc.so.1`_lwp_start
>               130
> 
>               libc.so.1`__cp_tsc_read+0xf
>               libc.so.1`clock_gettime+0x15
>               libstdc++.so.6.0.32`_ZNSt6chrono3_V212system_clock3nowEv+0x16
>               mariadbd`_ZN5tpool19thread_pool_generic14wait_for_tasksERSt11unique_lockISt5mutexEPNS_11worker_dataE+0x103
>               mariadbd`_ZN5tpool19thread_pool_generic8get_taskEPNS_11worker_dataEPPNS_4taskE+0x8a
>               mariadbd`_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x65
>               libstdc++.so.6.0.32`execute_native_thread_routine+0x10
>               libc.so.1`_thrp_setup+0x77
>               libc.so.1`_lwp_start
>               135
> 
>               libc.so.1`__cp_tsc_read+0xf
>               libc.so.1`clock_gettime+0x15
>               libstdc++.so.6.0.32`_ZNSt6chrono3_V212steady_clock3nowEv+0x16
>               mariadbd`_ZN5tpool19thread_pool_generic14wait_for_tasksERSt11unique_lockISt5mutexEPNS_11worker_dataE+0xa8
>               mariadbd`_ZN5tpool19thread_pool_generic8get_taskEPNS_11worker_dataEPPNS_4taskE+0x8a
>               mariadbd`_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x65
>               libstdc++.so.6.0.32`execute_native_thread_routine+0x10
>               libc.so.1`_thrp_setup+0x77
>               libc.so.1`_lwp_start
>               137
> 
> 
> The obvious suspect is:
> 
> commit 8b6b46dcb073dba71917d6a7309f0df7bad798a2
> Author: Robert Mustacchi <rm@fingolfin.org>
> Date:   Tue Jul 23 14:44:22 2024 +0000
> 
>     14237 Want support for pthread_cond_clockwait() and friends
>     Reviewed by: Andy Fiddaman <illumos@fiddaman.net>
>     Approved by: Gordon Ross <gordon.w.ross@gmail.com>
> 
> 
> but I didn't bisect yet.

Thanks for the report, Marcel. I will dig in and see what I can find.
Apologies for the trouble.

Robert



* Re: [developer] CPU usage - clock related issue
  2024-07-28 16:37 ` [developer] " Robert Mustacchi
@ 2024-07-28 17:50   ` Robert Mustacchi
  2024-07-28 23:16     ` Marcel Telka
  0 siblings, 1 reply; 4+ messages in thread
From: Robert Mustacchi @ 2024-07-28 17:50 UTC (permalink / raw)
  To: illumos-developer, Marcel Telka

On 7/28/24 09:37, Robert Mustacchi wrote:
> On 7/27/24 22:56, Marcel Telka wrote:
>> Hi,
>>
>> It looks like something went wrong between changesets 6e0c6e37fb and
>> 8b913f79fc in the illumos-gate.
>>
>> After upgrading OpenIndiana from
>>         osnet-incorporation@0.5.11-2024.0.0.22264 (illumos-6e0c6e37fb)
>> to
>>         osnet-incorporation@0.5.11-2024.0.0.22271 (illumos-8b913f79fc)
>>
>> I see two processes eating full CPU: nwamd and mariadbd.  The machine is
>> a qemu/kvm guest (host is Rocky 9).
>>
>> # dtrace -n 'profile-101 /pid == $target/ { @[ustack()] = count() } tick-10s{exit(0)}' -p $(pgrep -x nwamd) | tail -n 40
>> dtrace: description 'profile-101 ' matched 2 probes
>>               nwamd`in_past+0x23
>>               nwamd`nwamd_event_dequeue+0x1bf
>>               nwamd`nwamd_event_handler+0x162
>>               nwamd`main+0x1b0
>>               nwamd`_start_crt+0x9a
>>               nwamd`_start+0x1a
>>                24
>>
>>               nwamd`in_past+0x26
>>               nwamd`nwamd_event_dequeue+0x1bf
>>               nwamd`nwamd_event_handler+0x162
>>               nwamd`main+0x1b0
>>               nwamd`_start_crt+0x9a
>>               nwamd`_start+0x1a
>>                25
>>
>>               libc.so.1`__cp_gethrtime+0x5e
>>               libc.so.1`__cp_clock_gettime_realtime+0x77
>>               libc.so.1`__clock_gettime+0x72
>>               libc.so.1`clock_gettime+0x26
>>               nwamd`in_past+0x23
>>               nwamd`nwamd_event_dequeue+0x1bf
>>               nwamd`nwamd_event_handler+0x162
>>               nwamd`main+0x1b0
>>               nwamd`_start_crt+0x9a
>>               nwamd`_start+0x1a
>>                27
>>
>>               libc.so.1`__cp_tsc_read+0x19
>>               libc.so.1`__cp_gethrtime+0x39
>>               libc.so.1`__cp_clock_gettime_realtime+0x77
>>               libc.so.1`__clock_gettime+0x72
>>               libc.so.1`clock_gettime+0x26
>>               nwamd`in_past+0x23
>>               nwamd`nwamd_event_dequeue+0x1bf
>>               nwamd`nwamd_event_handler+0x162
>>               nwamd`main+0x1b0
>>               nwamd`_start_crt+0x9a
>>               nwamd`_start+0x1a
>>               403
>> # dtrace -n 'profile-101 /pid == $target/ { @[ustack()] = count() } tick-10s{exit(0)}' -p $(pgrep -x mariadbd) | tail -n 40
>> dtrace: description 'profile-101 ' matched 2 probes
>>               mariadbd`_ZN5tpool19thread_pool_generic14wait_for_tasksERSt11unique_lockISt5mutexEPNS_11worker_dataE+0xb8
>>               mariadbd`_ZN5tpool19thread_pool_generic8get_taskEPNS_11worker_dataEPPNS_4taskE+0x8a
>>               mariadbd`_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x65
>>               libstdc++.so.6.0.32`execute_native_thread_routine+0x10
>>               libc.so.1`_thrp_setup+0x77
>>               libc.so.1`_lwp_start
>>               126
>>
>>               libc.so.1`__cp_tsc_read+0xf
>>               libc.so.1`clock_gettime+0x15
>>               libstdc++.so.6.0.32`_ZNSt6chrono3_V212steady_clock3nowEv+0x16
>>               mariadbd`_ZN5tpool19thread_pool_generic14wait_for_tasksERSt11unique_lockISt5mutexEPNS_11worker_dataE+0xb0
>>               mariadbd`_ZN5tpool19thread_pool_generic8get_taskEPNS_11worker_dataEPPNS_4taskE+0x8a
>>               mariadbd`_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x65
>>               libstdc++.so.6.0.32`execute_native_thread_routine+0x10
>>               libc.so.1`_thrp_setup+0x77
>>               libc.so.1`_lwp_start
>>               130
>>
>>               libc.so.1`__cp_tsc_read+0xf
>>               libc.so.1`clock_gettime+0x15
>>               libstdc++.so.6.0.32`_ZNSt6chrono3_V212system_clock3nowEv+0x16
>>               mariadbd`_ZN5tpool19thread_pool_generic14wait_for_tasksERSt11unique_lockISt5mutexEPNS_11worker_dataE+0x103
>>               mariadbd`_ZN5tpool19thread_pool_generic8get_taskEPNS_11worker_dataEPPNS_4taskE+0x8a
>>               mariadbd`_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x65
>>               libstdc++.so.6.0.32`execute_native_thread_routine+0x10
>>               libc.so.1`_thrp_setup+0x77
>>               libc.so.1`_lwp_start
>>               135
>>
>>               libc.so.1`__cp_tsc_read+0xf
>>               libc.so.1`clock_gettime+0x15
>>               libstdc++.so.6.0.32`_ZNSt6chrono3_V212steady_clock3nowEv+0x16
>>               mariadbd`_ZN5tpool19thread_pool_generic14wait_for_tasksERSt11unique_lockISt5mutexEPNS_11worker_dataE+0xa8
>>               mariadbd`_ZN5tpool19thread_pool_generic8get_taskEPNS_11worker_dataEPPNS_4taskE+0x8a
>>               mariadbd`_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x65
>>               libstdc++.so.6.0.32`execute_native_thread_routine+0x10
>>               libc.so.1`_thrp_setup+0x77
>>               libc.so.1`_lwp_start
>>               137
>>
>>
>> The obvious suspect is:
>>
>> commit 8b6b46dcb073dba71917d6a7309f0df7bad798a2
>> Author: Robert Mustacchi <rm@fingolfin.org>
>> Date:   Tue Jul 23 14:44:22 2024 +0000
>>
>>     14237 Want support for pthread_cond_clockwait() and friends
>>     Reviewed by: Andy Fiddaman <illumos@fiddaman.net>
>>     Approved by: Gordon Ross <gordon.w.ross@gmail.com>
>>
>>
>> but I didn't bisect yet.
> 
> Thanks for the report, Marcel. I will dig in and see what I can find.
> Apologies for the trouble.

I've root caused this and written it up in
https://www.illumos.org/issues/16683. The short form is that I
incorrectly handled how the default initializer set the internal clock
id in the cond_t. I have verified that nwamd is no longer in a 100% CPU
loop with the fix in place, and confirmed that when my tests use static
initializers they properly catch the issue before the fix and pass with
it applied. I'll be sending a review request with additional regression
tests in a separate thread.
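
For anyone who wants to poke at this, below is a rough sketch of the kind
of usage that exercises the path described above (a hypothetical example I
made up for illustration, not the actual nwamd code or the new regression
tests): a cond_t created with the default static initializer and then used
in a timed wait. If the internal clock id in the cond_t is wrong, a wait
like this can misbehave (for example returning immediately rather than
sleeping), which is consistent with the busy loops in the stacks above.

#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;	/* default initializer */

int
main(void)
{
	struct timespec ts;
	int ret;

	/* Absolute timeout two seconds from now. */
	(void) clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += 2;

	(void) pthread_mutex_lock(&mtx);
	/* Nothing ever signals cv, so this should block until the timeout. */
	ret = pthread_cond_timedwait(&cv, &mtx, &ts);
	(void) pthread_mutex_unlock(&mtx);

	printf("pthread_cond_timedwait: %s (%d)\n",
	    ret == ETIMEDOUT ? "timed out as expected" : "returned early", ret);
	return (0);
}

Building that with something like `cc -o cvrepro cvrepro.c` (the file name
is made up) and running it before and after the fix is one quick sanity
check: it should sleep for roughly two seconds and report ETIMEDOUT.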

Again, I'm sorry for the trouble and inconvenience this caused.
Thank you for reporting this, Marcel.

Robert


* Re: [developer] CPU usage - clock related issue
  2024-07-28 17:50   ` Robert Mustacchi
@ 2024-07-28 23:16     ` Marcel Telka
  0 siblings, 0 replies; 4+ messages in thread
From: Marcel Telka @ 2024-07-28 23:16 UTC (permalink / raw)
  To: illumos-developer

On Sun, Jul 28, 2024 at 10:50:40AM -0700, Robert Mustacchi wrote:
> I've root caused this and written it up in
> https://www.illumos.org/issues/16683. The short form is that I
> incorrectly handled how the default initializer set the internal clock
> id in the cond_t. I have verified that nwamd is no longer in a 100% CPU
> loop with the fix in place, and confirmed that when my tests use static
> initializers they properly catch the issue before the fix and pass with
> it applied. I'll be sending a review request with additional regression
> tests in a separate thread.

Thanks a lot for the very fast root cause analysis and the fix.
Looking forward to seeing it integrated soon.


Thank you for your hard and extremely valuable work, Robert!

-- 
+-------------------------------------------+
| Marcel Telka   e-mail:   marcel@telka.sk  |
|                homepage: http://telka.sk/ |
+-------------------------------------------+

