public inbox for developer@lists.illumos.org (since 2011-08)
From: Robert Mustacchi <rm@fingolfin.org>
To: illumos-developer <developer@lists.illumos.org>,
	Marcel Telka <marcel@telka.sk>
Subject: Re: [developer] CPU usage - clock related issue
Date: Sun, 28 Jul 2024 10:50:40 -0700
Message-ID: <e32abba4-e351-4907-986a-ed442749a49a@fingolfin.org>
In-Reply-To: <4d9797a5-b68c-4c50-9a13-66a9d0d63bd0@fingolfin.org>

On 7/28/24 09:37, Robert Mustacchi wrote:
> On 7/27/24 22:56, Marcel Telka wrote:
>> Hi,
>>
>> It looks like something went wrong between changesets 6e0c6e37fb and
>> 8b913f79fc in the illumos-gate.
>>
>> After upgrading OpenIndiana from
>>         osnet-incorporation@0.5.11-2024.0.0.22264 (illumos-6e0c6e37fb)
>> to
>>         osnet-incorporation@0.5.11-2024.0.0.22271 (illumos-8b913f79fc)
>>
>> I see two processes eating a full CPU each: nwamd and mariadbd.  The
>> machine is a qemu/kvm guest (host is Rocky 9).
>>
>> # dtrace -n 'profile-101 /pid == $target/ { @[ustack()] = count() } tick-10s{exit(0)}' -p $(pgrep -x nwamd) | tail -n 40
>> dtrace: description 'profile-101 ' matched 2 probes
>>               nwamd`in_past+0x23
>>               nwamd`nwamd_event_dequeue+0x1bf
>>               nwamd`nwamd_event_handler+0x162
>>               nwamd`main+0x1b0
>>               nwamd`_start_crt+0x9a
>>               nwamd`_start+0x1a
>>                24
>>
>>               nwamd`in_past+0x26
>>               nwamd`nwamd_event_dequeue+0x1bf
>>               nwamd`nwamd_event_handler+0x162
>>               nwamd`main+0x1b0
>>               nwamd`_start_crt+0x9a
>>               nwamd`_start+0x1a
>>                25
>>
>>               libc.so.1`__cp_gethrtime+0x5e
>>               libc.so.1`__cp_clock_gettime_realtime+0x77
>>               libc.so.1`__clock_gettime+0x72
>>               libc.so.1`clock_gettime+0x26
>>               nwamd`in_past+0x23
>>               nwamd`nwamd_event_dequeue+0x1bf
>>               nwamd`nwamd_event_handler+0x162
>>               nwamd`main+0x1b0
>>               nwamd`_start_crt+0x9a
>>               nwamd`_start+0x1a
>>                27
>>
>>               libc.so.1`__cp_tsc_read+0x19
>>               libc.so.1`__cp_gethrtime+0x39
>>               libc.so.1`__cp_clock_gettime_realtime+0x77
>>               libc.so.1`__clock_gettime+0x72
>>               libc.so.1`clock_gettime+0x26
>>               nwamd`in_past+0x23
>>               nwamd`nwamd_event_dequeue+0x1bf
>>               nwamd`nwamd_event_handler+0x162
>>               nwamd`main+0x1b0
>>               nwamd`_start_crt+0x9a
>>               nwamd`_start+0x1a
>>               403
>> # dtrace -n 'profile-101 /pid == $target/ { @[ustack()] = count() } tick-10s{exit(0)}' -p $(pgrep -x mariadbd) | tail -n 40
>> dtrace: description 'profile-101 ' matched 2 probes
>>               mariadbd`_ZN5tpool19thread_pool_generic14wait_for_tasksERSt11unique_lockISt5mutexEPNS_11worker_dataE+0xb8
>>               mariadbd`_ZN5tpool19thread_pool_generic8get_taskEPNS_11worker_dataEPPNS_4taskE+0x8a
>>               mariadbd`_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x65
>>               libstdc++.so.6.0.32`execute_native_thread_routine+0x10
>>               libc.so.1`_thrp_setup+0x77
>>               libc.so.1`_lwp_start
>>               126
>>
>>               libc.so.1`__cp_tsc_read+0xf
>>               libc.so.1`clock_gettime+0x15
>>               libstdc++.so.6.0.32`_ZNSt6chrono3_V212steady_clock3nowEv+0x16
>>               mariadbd`_ZN5tpool19thread_pool_generic14wait_for_tasksERSt11unique_lockISt5mutexEPNS_11worker_dataE+0xb0
>>               mariadbd`_ZN5tpool19thread_pool_generic8get_taskEPNS_11worker_dataEPPNS_4taskE+0x8a
>>               mariadbd`_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x65
>>               libstdc++.so.6.0.32`execute_native_thread_routine+0x10
>>               libc.so.1`_thrp_setup+0x77
>>               libc.so.1`_lwp_start
>>               130
>>
>>               libc.so.1`__cp_tsc_read+0xf
>>               libc.so.1`clock_gettime+0x15
>>               libstdc++.so.6.0.32`_ZNSt6chrono3_V212system_clock3nowEv+0x16
>>               mariadbd`_ZN5tpool19thread_pool_generic14wait_for_tasksERSt11unique_lockISt5mutexEPNS_11worker_dataE+0x103
>>               mariadbd`_ZN5tpool19thread_pool_generic8get_taskEPNS_11worker_dataEPPNS_4taskE+0x8a
>>               mariadbd`_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x65
>>               libstdc++.so.6.0.32`execute_native_thread_routine+0x10
>>               libc.so.1`_thrp_setup+0x77
>>               libc.so.1`_lwp_start
>>               135
>>
>>               libc.so.1`__cp_tsc_read+0xf
>>               libc.so.1`clock_gettime+0x15
>>               libstdc++.so.6.0.32`_ZNSt6chrono3_V212steady_clock3nowEv+0x16
>>               mariadbd`_ZN5tpool19thread_pool_generic14wait_for_tasksERSt11unique_lockISt5mutexEPNS_11worker_dataE+0xa8
>>               mariadbd`_ZN5tpool19thread_pool_generic8get_taskEPNS_11worker_dataEPPNS_4taskE+0x8a
>>               mariadbd`_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x65
>>               libstdc++.so.6.0.32`execute_native_thread_routine+0x10
>>               libc.so.1`_thrp_setup+0x77
>>               libc.so.1`_lwp_start
>>               137
>>
>>
>> The obvious suspect is:
>>
>> commit 8b6b46dcb073dba71917d6a7309f0df7bad798a2
>> Author: Robert Mustacchi <rm@fingolfin.org>
>> Date:   Tue Jul 23 14:44:22 2024 +0000
>>
>>     14237 Want support for pthread_cond_clockwait() and friends
>>     Reviewed by: Andy Fiddaman <illumos@fiddaman.net>
>>     Approved by: Gordon Ross <gordon.w.ross@gmail.com>
>>
>>
>> but I haven't bisected yet.
> 
> Thanks for the report, Marcel. I will dig in and see what I can find.
> Apologies for the trouble.

I've root-caused this and written it up in
https://www.illumos.org/issues/16683. The short form is that I
mishandled how the default initializer sets the internal clock id in
the cond_t. With the fix in place, I have verified that nwamd no longer
spins at 100% CPU. I also confirmed that when my tests use static
initializers, they catch the issue before the fix and pass afterwards.
I'll be sending a review request with additional regression tests in a
separate thread.

Again, I'm sorry for the trouble and inconvenience this caused.
Thank you for reporting this, Marcel.

Robert

Thread overview: 4+ messages
2024-07-28  5:56 Marcel Telka
2024-07-28 16:37 ` [developer] " Robert Mustacchi
2024-07-28 17:50   ` Robert Mustacchi [this message]
2024-07-28 23:16     ` Marcel Telka
