* CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend) [not found] ` <20220628041252.GV1790663@paulmck-ThinkPad-P17-Gen-1> @ 2022-06-28 15:02 ` Alex Xu (Hello71) 2022-06-28 15:13 ` Jason A. Donenfeld 2022-06-28 18:54 ` Paul E. McKenney 0 siblings, 2 replies; 12+ messages in thread From: Alex Xu (Hello71) @ 2022-06-28 15:02 UTC (permalink / raw) To: paulmck, rcu, urezki, uladzislau.rezki, Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos, Martijn Coenen, Joel Fernandes, Christian Brauner, Hridya Valsaraju, Suren Baghdasaryan, linux-kernel, Jason A. Donenfeld, wireguard, Theodore Ts'o Cc: alexander.deucher, christian.koenig, Xinhui.Pan, amd-gfx Excerpts from Paul E. McKenney's message of June 28, 2022 12:12 am: > On Mon, Jun 27, 2022 at 09:50:53PM -0400, Alex Xu (Hello71) wrote: >> Ah, I see. I have selected the default value for >> CONFIG_RCU_EXP_CPU_STALL_TIMEOUT, but that is 20 if ANDROID. I am not >> using Android; I'm not sure there exist Android devices with AMD GPUs. >> However, I have set CONFIG_ANDROID=y in order to use >> ANDROID_BINDER_IPC=m for emulation. >> >> In general, I think CONFIG_ANDROID is not a reliable method for >> detecting if the kernel is for an Android device; for example, Fedora >> sets CONFIG_ANDROID, but (AFAIK) its kernel is not intended for use with >> Android userspace. >> >> On the other hand, it's not clear to me why the value 20 should be for >> Android only anyways. If, as you say in >> https://lore.kernel.org/lkml/20220216195508.GM4285@paulmck-ThinkPad-P17-Gen-1/, >> it is related to the size of the system, perhaps some other heuristic >> would be more appropriate. > > It is related to the fact that quite a few Android guys want these > 20-millisecond short-timeout expedited RCU CPU stall warnings, but no one > else does. Not yet anyway. > > And let's face it, the intent and purpose of CONFIG_ANDROID=y is extremely > straightforward and unmistakeable. 
So perhaps people not running Android > devices but wanting a little bit of the Android functionality should do > something other than setting CONFIG_ANDROID=y in their .config files. Me, > I am surprised that it took this long for something like this to bite you. > > But just out of curiosity, what would you suggest instead? Both Debian and Fedora set CONFIG_ANDROID, specifically for binder. If major distro vendors are consistently making this "mistake", then perhaps the problem is elsewhere. In my own opinion, assuming that binderfs means Android vendor is not a good assumption. The ANDROID help says: > Enable support for various drivers needed on the Android platform It doesn't say "Enable only if building an Android device", or "Enable only if you are Google". Isn't the traditional Linux philosophy a collection of pieces to be assembled, without gratuitous hidden dependencies? For example, [0] removes the unnecessary Android dependency, it doesn't block the whole thing with "depends on ANDROID". It seems to me that the proper way to set some configuration for Android kernels is or should be to ask the Android kernel config maintainers, not to set it based on an upstream kernel option. There is, after all, no CONFIG_FEDORA or CONFIG_UBUNTU or CONFIG_HANNAH_MONTANA. WireGuard and random also use CONFIG_ANDROID in a similar "proxy" way as rcu, there to see if suspends are "frequent". This seems dubious for the same reasons. I wonder if it might be time to retire CONFIG_ANDROID: the only remaining driver covered is binder, which originates from Android but is no longer used exclusively on Android systems. Like ufs-qcom, binder is no longer used exclusively on Android devices; it is also used for Android device emulators, which might be used on Android-like mobile devices, or might not. 
My understanding is that both Android and upstream kernel developers intend to add no more Android-specific drivers, so binder should be the only one covered for the foreseeable future. > For that matter, why the private reply? Mail client issues, not intentional. Lists re-added, plus Android, WireGuard, and random. Thanks, Alex. [0] https://lore.kernel.org/all/20220321151853.24138-1-krzk@kernel.org/ ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend) 2022-06-28 15:02 ` CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend) Alex Xu (Hello71) @ 2022-06-28 15:13 ` Jason A. Donenfeld 2022-06-28 18:54 ` Paul E. McKenney 1 sibling, 0 replies; 12+ messages in thread From: Jason A. Donenfeld @ 2022-06-28 15:13 UTC (permalink / raw) To: Alex Xu (Hello71) Cc: paulmck, rcu, urezki, uladzislau.rezki, Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos, Martijn Coenen, Joel Fernandes, Christian Brauner, Hridya Valsaraju, Suren Baghdasaryan, linux-kernel, wireguard, Theodore Ts'o, alexander.deucher, christian.koenig, Xinhui.Pan, amd-gfx Hi Alex, On Tue, Jun 28, 2022 at 11:02:40AM -0400, Alex Xu (Hello71) wrote: > WireGuard and random also use CONFIG_ANDROID in a similar "proxy" way as > rcu, there to see if suspends are "frequent". This seems dubious for the > same reasons. I'd be happy to take a patch in WireGuard and random.c to get rid of the CONFIG_ANDROID usage, if you can conduct an analysis and conclude this won't break anything inadvertently. Jason ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend) 2022-06-28 15:02 ` CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend) Alex Xu (Hello71) 2022-06-28 15:13 ` Jason A. Donenfeld @ 2022-06-28 18:54 ` Paul E. McKenney 2022-06-28 19:28 ` Alex Xu (Hello71) 1 sibling, 1 reply; 12+ messages in thread From: Paul E. McKenney @ 2022-06-28 18:54 UTC (permalink / raw) To: Alex Xu (Hello71) Cc: rcu, urezki, uladzislau.rezki, Greg Kroah-Hartman, Arve Hjønnevåg, Todd Kjos, Martijn Coenen, Joel Fernandes, Christian Brauner, Hridya Valsaraju, Suren Baghdasaryan, linux-kernel, Jason A. Donenfeld, wireguard, Theodore Ts'o, alexander.deucher, christian.koenig, Xinhui.Pan, amd-gfx On Tue, Jun 28, 2022 at 11:02:40AM -0400, Alex Xu (Hello71) wrote: > Excerpts from Paul E. McKenney's message of June 28, 2022 12:12 am: > > On Mon, Jun 27, 2022 at 09:50:53PM -0400, Alex Xu (Hello71) wrote: > >> Ah, I see. I have selected the default value for > >> CONFIG_RCU_EXP_CPU_STALL_TIMEOUT, but that is 20 if ANDROID. I am not > >> using Android; I'm not sure there exist Android devices with AMD GPUs. > >> However, I have set CONFIG_ANDROID=y in order to use > >> ANDROID_BINDER_IPC=m for emulation. > >> > >> In general, I think CONFIG_ANDROID is not a reliable method for > >> detecting if the kernel is for an Android device; for example, Fedora > >> sets CONFIG_ANDROID, but (AFAIK) its kernel is not intended for use with > >> Android userspace. > >> > >> On the other hand, it's not clear to me why the value 20 should be for > >> Android only anyways. If, as you say in > >> https://lore.kernel.org/lkml/20220216195508.GM4285@paulmck-ThinkPad-P17-Gen-1/, > >> it is related to the size of the system, perhaps some other heuristic > >> would be more appropriate. > > > > It is related to the fact that quite a few Android guys want these > > 20-millisecond short-timeout expedited RCU CPU stall warnings, but no one > > else does. 
Not yet anyway. > > > > And let's face it, the intent and purpose of CONFIG_ANDROID=y is extremely > > straightforward and unmistakeable. So perhaps people not running Android > > devices but wanting a little bit of the Android functionality should do > > something other than setting CONFIG_ANDROID=y in their .config files. Me, > > I am surprised that it took this long for something like this to bite you. > > > > But just out of curiosity, what would you suggest instead? > > Both Debian and Fedora set CONFIG_ANDROID, specifically for binder. If > major distro vendors are consistently making this "mistake", then > perhaps the problem is elsewhere. > > In my own opinion, assuming that binderfs means Android vendor is not a > good assumption. The ANDROID help says: > > > Enable support for various drivers needed on the Android platform > > It doesn't say "Enable only if building an Android device", or "Enable > only if you are Google". Isn't the traditional Linux philosophy a > collection of pieces to be assembled, without gratuitous hidden > dependencies? For example, [0] removes the unnecessary Android > dependency, it doesn't block the whole thing with "depends on ANDROID". > > It seems to me that the proper way to set some configuration for Android > kernels is or should be to ask the Android kernel config maintainers, > not to set it based on an upstream kernel option. There is, after all, > no CONFIG_FEDORA or CONFIG_UBUNTU or CONFIG_HANNAH_MONTANA. > > WireGuard and random also use CONFIG_ANDROID in a similar "proxy" way as > rcu, there to see if suspends are "frequent". This seems dubious for the > same reasons. > > I wonder if it might be time to retire CONFIG_ANDROID: the only > remaining driver covered is binder, which originates from Android but > is no longer used exclusively on Android systems. 
Like ufs-qcom, binder
> is no longer used exclusively on Android devices; it is also used for
> Android device emulators, which might be used on Android-like mobile
> devices, or might not.
>
> My understanding is that both Android and upstream kernel developers
> intend to add no more Android-specific drivers, so binder should be the
> only one covered for the foreseeable future.

Thank you for the perspective, but you never did suggest an alternative.

So here is what I suggest given the current setup:

config RCU_EXP_CPU_STALL_TIMEOUT
	int "Expedited RCU CPU stall timeout in milliseconds"
	depends on RCU_STALL_COMMON
	range 0 21000
	default 20 if ANDROID
	default 0 if !ANDROID
	help
	  If a given expedited RCU grace period extends more than the
	  specified number of milliseconds, a CPU stall warning is printed.
	  If the RCU grace period persists, additional CPU stall warnings
	  are printed at more widely spaced intervals.  A value of zero
	  says to use the RCU_CPU_STALL_TIMEOUT value converted from
	  seconds to milliseconds.

The default, and only the default, is controlled by ANDROID.

All you need to do to get the previous behavior is to add something like
this to your defconfig file:

CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000

Any reason why this will not work for you?

> > For that matter, why the private reply?
>
> Mail client issues, not intentional. Lists re-added, plus Android,
> WireGuard, and random.

Thank you!

							Thanx, Paul

> Thanks,
> Alex.
>
> [0] https://lore.kernel.org/all/20220321151853.24138-1-krzk@kernel.org/

^ permalink raw reply	[flat|nested] 12+ messages in thread
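[Editorial sketch, not part of the thread.] The Kconfig entry quoted in the message above can be read as a small decision rule: the default is 20 when ANDROID is set and 0 otherwise, and a value of zero falls back to RCU_CPU_STALL_TIMEOUT converted from seconds to milliseconds. The Python below only models that help text; the function names are hypothetical, and the value 21 used for RCU_CPU_STALL_TIMEOUT in the usage lines is an assumption, not something stated in the thread. It is not the kernel's implementation, which may apply further adjustments.

```python
# Minimal model of the Kconfig text quoted above. Function names are
# hypothetical; only the numbers 20, 0, and "range 0 21000" come from the thread.

EXP_TIMEOUT_RANGE_MS = (0, 21000)  # "range 0 21000"


def default_exp_stall_timeout_ms(android: bool) -> int:
    """Kconfig default: 20 if ANDROID, 0 if !ANDROID."""
    return 20 if android else 0


def effective_exp_stall_timeout_ms(exp_timeout_ms: int, cpu_stall_timeout_s: int) -> int:
    """Resolve the timeout as the help text describes.

    Zero means "use RCU_CPU_STALL_TIMEOUT converted from seconds to
    milliseconds"; any other in-range value is used as given.
    """
    low, high = EXP_TIMEOUT_RANGE_MS
    if not low <= exp_timeout_ms <= high:
        raise ValueError(f"timeout {exp_timeout_ms} outside range {low}..{high}")
    if exp_timeout_ms == 0:
        return cpu_stall_timeout_s * 1000
    return exp_timeout_ms


if __name__ == "__main__":
    # Assumption for illustration: RCU_CPU_STALL_TIMEOUT is 21 seconds.
    for android in (True, False):
        chosen = default_exp_stall_timeout_ms(android)
        print(android, chosen, effective_exp_stall_timeout_ms(chosen, 21))
```

Under these assumptions, a distro kernel that sets CONFIG_ANDROID=y only for binder ends up with the 20 ms default unless it overrides the option explicitly, which is the situation described at the start of the thread.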
* Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend) 2022-06-28 18:54 ` Paul E. McKenney @ 2022-06-28 19:28 ` Alex Xu (Hello71) 2022-06-28 20:11 ` Uladzislau Rezki 0 siblings, 1 reply; 12+ messages in thread From: Alex Xu (Hello71) @ 2022-06-28 19:28 UTC (permalink / raw) To: paulmck Cc: alexander.deucher, amd-gfx, Arve Hjønnevåg, Christian Brauner, christian.koenig, Greg Kroah-Hartman, Hridya Valsaraju, Jason A. Donenfeld, Joel Fernandes, linux-kernel, Martijn Coenen, rcu, Suren Baghdasaryan, Todd Kjos, Theodore Ts'o, uladzislau.rezki, urezki, wireguard, Xinhui.Pan Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm: > All you need to do to get the previous behavior is to add something like > this to your defconfig file: > > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000 > > Any reason why this will not work for you? As far as I know, I do not require any particular RCU debugging features intended for developers; as an individual user and distro maintainer, I would like to select the option corresponding to "emit errors for unexpected conditions which should be reported upstream", not "emit debugging information for development purposes". Therefore, I think 0 is a suitable setting for me and most ordinary (not tightly controlled) distributions. My concern is that other users and distro maintainers will also have confusion about what value to set and whether the warnings are important, since the help text does not say anything about Android, and "make oldconfig" does not indicate that the default value is different for Android. My suggestion is that the default be set to 0, and if a non-zero value is appropriate for Android, that should be communicated to the Android developers, not made conditional on CONFIG_ANDROID. Thanks, Alex. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)
  2022-06-28 19:28     ` Alex Xu (Hello71)
@ 2022-06-28 20:11       ` Uladzislau Rezki
  2022-07-04 11:30         ` Christian König
  0 siblings, 1 reply; 12+ messages in thread
From: Uladzislau Rezki @ 2022-06-28 20:11 UTC (permalink / raw)
  To: Alex Xu (Hello71)
  Cc: paulmck, alexander.deucher, amd-gfx, Arve Hjønnevåg,
	Christian Brauner, christian.koenig, Greg Kroah-Hartman,
	Hridya Valsaraju, Jason A. Donenfeld, Joel Fernandes,
	linux-kernel, Martijn Coenen, rcu, Suren Baghdasaryan, Todd Kjos,
	Theodore Ts'o, uladzislau.rezki, urezki, wireguard, Xinhui.Pan

> Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm:
> > All you need to do to get the previous behavior is to add something like
> > this to your defconfig file:
> >
> > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000
> >
> > Any reason why this will not work for you?
>
> As far as I know, I do not require any particular RCU debugging features
> intended for developers; as an individual user and distro maintainer, I
> would like to select the option corresponding to "emit errors for
> unexpected conditions which should be reported upstream", not "emit
> debugging information for development purposes".
>
Sorry, but we need to make some assumption; to me, CONFIG_ANDROID
indicates that a kernel runs on an Android-like device. When you enable
this option on your specific box, it is assumed that some Android-related
code is also activated on your device, which may lead to some side effects.

>
> Therefore, I think 0 is a suitable setting for me and most ordinary
> (not tightly controlled) distributions. My concern is that other users
> and distro maintainers will also have confusion about what value to set
> and whether the warnings are important, since the help text does not say
> anything about Android, and "make oldconfig" does not indicate that the
> default value is different for Android.
>
<snip>
diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
index 9b64e55d4f61..ced0d1f7c675 100644
--- a/kernel/rcu/Kconfig.debug
+++ b/kernel/rcu/Kconfig.debug
@@ -94,7 +94,8 @@ config RCU_EXP_CPU_STALL_TIMEOUT
 	  If the RCU grace period persists, additional CPU stall warnings
 	  are printed at more widely spaced intervals.  A value of zero
 	  says to use the RCU_CPU_STALL_TIMEOUT value converted from
-	  seconds to milliseconds.
+	  seconds to milliseconds. If CONFIG_ANDROID is set for non-Android
+	  platform and you unsure, set the RCU_EXP_CPU_STALL_TIMEOUT to zero.
 
 config RCU_TRACE
 	bool "Enable tracing for RCU"
<snip>

Will it work for you?

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)
  2022-06-28 20:11       ` Uladzislau Rezki
@ 2022-07-04 11:30         ` Christian König
  2022-07-06 17:48           ` Uladzislau Rezki
  0 siblings, 1 reply; 12+ messages in thread
From: Christian König @ 2022-07-04 11:30 UTC (permalink / raw)
  To: Uladzislau Rezki, Alex Xu (Hello71)
  Cc: wireguard, Jason A. Donenfeld, Joel Fernandes, paulmck,
	Greg Kroah-Hartman, Xinhui.Pan, linux-kernel, amd-gfx,
	Suren Baghdasaryan, rcu, Hridya Valsaraju, Arve Hjønnevåg,
	Theodore Ts'o, alexander.deucher, Todd Kjos, uladzislau.rezki,
	Martijn Coenen, christian.koenig, Christian Brauner

Hi guys,

On 28.06.22 at 22:11, Uladzislau Rezki wrote:
>> Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm:
>>> All you need to do to get the previous behavior is to add something like
>>> this to your defconfig file:
>>>
>>> CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000
>>>
>>> Any reason why this will not work for you?

Sorry for jumping in so late, I was on vacation for a week.

Well, when any RCU period is longer than 20ms and amdgpu is in the
backtrace, my educated guess is that we messed up some timeout waiting
for the hw.

We usually do wait a few us, but it can be that somebody is waiting for
ms instead.

So there are some todos here as far as I can see, and it would be helpful
to get a cleaner backtrace if possible.

Regards,
Christian.

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)
  2022-07-04 11:30         ` Christian König
@ 2022-07-06 17:48           ` Uladzislau Rezki
  2022-07-06 17:58             ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Uladzislau Rezki @ 2022-07-06 17:48 UTC (permalink / raw)
  To: Christian König
  Cc: Uladzislau Rezki, Alex Xu (Hello71), wireguard,
	Jason A. Donenfeld, Joel Fernandes, paulmck, Greg Kroah-Hartman,
	Xinhui.Pan, linux-kernel, amd-gfx, Suren Baghdasaryan, rcu,
	Hridya Valsaraju, Arve Hjønnevåg, Theodore Ts'o,
	alexander.deucher, Todd Kjos, uladzislau.rezki, Martijn Coenen,
	Christian Brauner

Hello.

On Mon, Jul 04, 2022 at 01:30:50PM +0200, Christian König wrote:
> Hi guys,
>
> Am 28.06.22 um 22:11 schrieb Uladzislau Rezki:
> > > Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm:
> > > > All you need to do to get the previous behavior is to add something like
> > > > this to your defconfig file:
> > > >
> > > > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000
> > > >
> > > > Any reason why this will not work for you?
>
> sorry for jumping in so later, I was on vacation for a week.
>
> Well when any RCU period is longer than 20ms and amdgpu in the backtrace my
> educated guess is that we messed up some timeout waiting for the hw.
>
> We usually do wait a few us, but it can be that somebody is waiting for ms
> instead.
>
> So there are some todos here as far as I can see and It would be helpful to
> get a cleaner backtrace if possible.
>
Actually, it looks like CONFIG_ANDROID is going to be removed, so
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT will no longer have any dependency on
CONFIG_ANDROID:

https://lkml.org/lkml/2022/6/29/756

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend) 2022-07-06 17:48 ` Uladzislau Rezki @ 2022-07-06 17:58 ` Paul E. McKenney 2022-07-06 18:09 ` Uladzislau Rezki 0 siblings, 1 reply; 12+ messages in thread From: Paul E. McKenney @ 2022-07-06 17:58 UTC (permalink / raw) To: Uladzislau Rezki Cc: Christian König, Alex Xu (Hello71), wireguard, Jason A. Donenfeld, Joel Fernandes, Greg Kroah-Hartman, Xinhui.Pan, linux-kernel, amd-gfx, Suren Baghdasaryan, rcu, Hridya Valsaraju, Arve Hjønnevåg, Theodore Ts'o, alexander.deucher, Todd Kjos, uladzislau.rezki, Martijn Coenen, Christian Brauner On Wed, Jul 06, 2022 at 07:48:20PM +0200, Uladzislau Rezki wrote: > Hello. > > On Mon, Jul 04, 2022 at 01:30:50PM +0200, Christian König wrote: > > Hi guys, > > > > Am 28.06.22 um 22:11 schrieb Uladzislau Rezki: > > > > Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm: > > > > > All you need to do to get the previous behavior is to add something like > > > > > this to your defconfig file: > > > > > > > > > > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000 > > > > > > > > > > Any reason why this will not work for you? > > > > sorry for jumping in so later, I was on vacation for a week. > > > > Well when any RCU period is longer than 20ms and amdgpu in the backtrace my > > educated guess is that we messed up some timeout waiting for the hw. > > > > We usually do wait a few us, but it can be that somebody is waiting for ms > > instead. > > > > So there are some todos here as far as I can see and It would be helpful to > > get a cleaner backtrace if possible. > > > Actually CONFIG_ANDROID looks like is going to be removed, so the CONFIG_RCU_EXP_CPU_STALL_TIMEOUT > will not have any dependencies on the CONFIG_ANDROID anymore: > > https://lkml.org/lkml/2022/6/29/756 But you can set the RCU_EXP_CPU_STALL_TIMEOUT Kconfig option, if you wish. 
Setting this option to 20 will get you the behavior previously obtained by setting the now-defunct ANDROID Kconfig option. Thanx, Paul ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend) 2022-07-06 17:58 ` Paul E. McKenney @ 2022-07-06 18:09 ` Uladzislau Rezki 2022-07-06 20:42 ` Paul E. McKenney 0 siblings, 1 reply; 12+ messages in thread From: Uladzislau Rezki @ 2022-07-06 18:09 UTC (permalink / raw) To: Paul E. McKenney Cc: Uladzislau Rezki, Christian König, Alex Xu (Hello71), wireguard, Jason A. Donenfeld, Joel Fernandes, Greg Kroah-Hartman, Xinhui.Pan, linux-kernel, amd-gfx, Suren Baghdasaryan, rcu, Hridya Valsaraju, Arve Hjønnevåg, Theodore Ts'o, alexander.deucher, Todd Kjos, uladzislau.rezki, Martijn Coenen, Christian Brauner On Wed, Jul 06, 2022 at 10:58:36AM -0700, Paul E. McKenney wrote: > On Wed, Jul 06, 2022 at 07:48:20PM +0200, Uladzislau Rezki wrote: > > Hello. > > > > On Mon, Jul 04, 2022 at 01:30:50PM +0200, Christian König wrote: > > > Hi guys, > > > > > > Am 28.06.22 um 22:11 schrieb Uladzislau Rezki: > > > > > Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm: > > > > > > All you need to do to get the previous behavior is to add something like > > > > > > this to your defconfig file: > > > > > > > > > > > > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000 > > > > > > > > > > > > Any reason why this will not work for you? > > > > > > sorry for jumping in so later, I was on vacation for a week. > > > > > > Well when any RCU period is longer than 20ms and amdgpu in the backtrace my > > > educated guess is that we messed up some timeout waiting for the hw. > > > > > > We usually do wait a few us, but it can be that somebody is waiting for ms > > > instead. > > > > > > So there are some todos here as far as I can see and It would be helpful to > > > get a cleaner backtrace if possible. 
> > > > > Actually CONFIG_ANDROID looks like is going to be removed, so the CONFIG_RCU_EXP_CPU_STALL_TIMEOUT > > will not have any dependencies on the CONFIG_ANDROID anymore: > > > > https://lkml.org/lkml/2022/6/29/756 > > But you can set the RCU_EXP_CPU_STALL_TIMEOUT Kconfig option, if you > wish. Setting this option to 20 will get you the behavior previously > obtained by setting the now-defunct ANDROID Kconfig option. > Right. Or over boot parameter. So for us it is not a big issue :) -- Uladzislau Rezki ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend) 2022-07-06 18:09 ` Uladzislau Rezki @ 2022-07-06 20:42 ` Paul E. McKenney 2022-07-07 7:30 ` Christian König 0 siblings, 1 reply; 12+ messages in thread From: Paul E. McKenney @ 2022-07-06 20:42 UTC (permalink / raw) To: Uladzislau Rezki Cc: Christian König, Alex Xu (Hello71), wireguard, Jason A. Donenfeld, Joel Fernandes, Greg Kroah-Hartman, Xinhui.Pan, linux-kernel, amd-gfx, Suren Baghdasaryan, rcu, Hridya Valsaraju, Arve Hjønnevåg, Theodore Ts'o, alexander.deucher, Todd Kjos, uladzislau.rezki, Martijn Coenen, Christian Brauner On Wed, Jul 06, 2022 at 08:09:49PM +0200, Uladzislau Rezki wrote: > On Wed, Jul 06, 2022 at 10:58:36AM -0700, Paul E. McKenney wrote: > > On Wed, Jul 06, 2022 at 07:48:20PM +0200, Uladzislau Rezki wrote: > > > Hello. > > > > > > On Mon, Jul 04, 2022 at 01:30:50PM +0200, Christian König wrote: > > > > Hi guys, > > > > > > > > Am 28.06.22 um 22:11 schrieb Uladzislau Rezki: > > > > > > Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm: > > > > > > > All you need to do to get the previous behavior is to add something like > > > > > > > this to your defconfig file: > > > > > > > > > > > > > > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000 > > > > > > > > > > > > > > Any reason why this will not work for you? > > > > > > > > sorry for jumping in so later, I was on vacation for a week. > > > > > > > > Well when any RCU period is longer than 20ms and amdgpu in the backtrace my > > > > educated guess is that we messed up some timeout waiting for the hw. > > > > > > > > We usually do wait a few us, but it can be that somebody is waiting for ms > > > > instead. > > > > > > > > So there are some todos here as far as I can see and It would be helpful to > > > > get a cleaner backtrace if possible. 
> > > > > > > Actually CONFIG_ANDROID looks like is going to be removed, so the CONFIG_RCU_EXP_CPU_STALL_TIMEOUT > > > will not have any dependencies on the CONFIG_ANDROID anymore: > > > > > > https://lkml.org/lkml/2022/6/29/756 > > > > But you can set the RCU_EXP_CPU_STALL_TIMEOUT Kconfig option, if you > > wish. Setting this option to 20 will get you the behavior previously > > obtained by setting the now-defunct ANDROID Kconfig option. > > > Right. Or over boot parameter. So for us it is not a big issue :) Specifically rcupdate.rcu_exp_cpu_stall_timeout, for those just now tuning in. ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 12+ messages in thread
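[Editorial sketch, not part of the thread.] The boot parameter named in the message above, rcupdate.rcu_exp_cpu_stall_timeout, is given on the kernel command line as a name=value token. The Python below is only an illustration of forming and reading back such a token from a command-line string; the helper names are hypothetical, and the choice to let a later occurrence override an earlier one is an assumption of this sketch rather than something the thread states.

```python
# Hypothetical helpers for the boot parameter mentioned in the thread.
from typing import Optional

PARAM = "rcupdate.rcu_exp_cpu_stall_timeout"  # name taken from the thread


def boot_arg(timeout_ms: int) -> str:
    """Format the kernel command-line token for a timeout in milliseconds."""
    return f"{PARAM}={timeout_ms}"


def parse_exp_stall_timeout(cmdline: str) -> Optional[int]:
    """Return the timeout found in a command-line string, or None if absent.

    Assumption: if the parameter appears more than once, the last
    well-formed occurrence is the one reported.
    """
    value = None
    for token in cmdline.split():
        key, sep, raw = token.partition("=")
        if key == PARAM and sep:
            try:
                value = int(raw)
            except ValueError:
                continue  # ignore malformed values in this sketch
    return value


if __name__ == "__main__":
    line = "root=/dev/sda1 quiet " + boot_arg(20)
    print(line)
    print(parse_exp_stall_timeout(line))
```

On a running system the string could come from /proc/cmdline; here it is passed in explicitly so the sketch stays self-contained.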
* Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend) 2022-07-06 20:42 ` Paul E. McKenney @ 2022-07-07 7:30 ` Christian König 2022-07-07 13:29 ` Paul E. McKenney 0 siblings, 1 reply; 12+ messages in thread From: Christian König @ 2022-07-07 7:30 UTC (permalink / raw) To: paulmck, Uladzislau Rezki Cc: Alex Xu (Hello71), wireguard, Jason A. Donenfeld, Joel Fernandes, Greg Kroah-Hartman, Xinhui.Pan, linux-kernel, amd-gfx, Suren Baghdasaryan, rcu, Hridya Valsaraju, Arve Hjønnevåg, Theodore Ts'o, alexander.deucher, Todd Kjos, uladzislau.rezki, Martijn Coenen, Christian Brauner Am 06.07.22 um 22:42 schrieb Paul E. McKenney: > On Wed, Jul 06, 2022 at 08:09:49PM +0200, Uladzislau Rezki wrote: >> On Wed, Jul 06, 2022 at 10:58:36AM -0700, Paul E. McKenney wrote: >>> On Wed, Jul 06, 2022 at 07:48:20PM +0200, Uladzislau Rezki wrote: >>>> Hello. >>>> >>>> On Mon, Jul 04, 2022 at 01:30:50PM +0200, Christian König wrote: >>>>> Hi guys, >>>>> >>>>> Am 28.06.22 um 22:11 schrieb Uladzislau Rezki: >>>>>>> Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm: >>>>>>>> All you need to do to get the previous behavior is to add something like >>>>>>>> this to your defconfig file: >>>>>>>> >>>>>>>> CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000 >>>>>>>> >>>>>>>> Any reason why this will not work for you? >>>>> sorry for jumping in so later, I was on vacation for a week. >>>>> >>>>> Well when any RCU period is longer than 20ms and amdgpu in the backtrace my >>>>> educated guess is that we messed up some timeout waiting for the hw. >>>>> >>>>> We usually do wait a few us, but it can be that somebody is waiting for ms >>>>> instead. >>>>> >>>>> So there are some todos here as far as I can see and It would be helpful to >>>>> get a cleaner backtrace if possible. 
>>>>>
>>>> Actually CONFIG_ANDROID looks like is going to be removed, so the CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
>>>> will not have any dependencies on the CONFIG_ANDROID anymore:
>>>>
>>>> https://lkml.org/lkml/2022/6/29/756
>>> But you can set the RCU_EXP_CPU_STALL_TIMEOUT Kconfig option, if you
>>> wish. Setting this option to 20 will get you the behavior previously
>>> obtained by setting the now-defunct ANDROID Kconfig option.
>>>
>> Right. Or over boot parameter. So for us it is not a big issue :)
> Specifically rcupdate.rcu_exp_cpu_stall_timeout, for those just now
> tuning in. ;-)

I was just about to write a response asking for that :)

Thanks, I will suggest to our QA to add this parameter while doing some
tests.

Regards,
Christian.

>
> 							Thanx, Paul

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend) 2022-07-07 7:30 ` Christian König @ 2022-07-07 13:29 ` Paul E. McKenney 0 siblings, 0 replies; 12+ messages in thread From: Paul E. McKenney @ 2022-07-07 13:29 UTC (permalink / raw) To: Christian König Cc: Uladzislau Rezki, Alex Xu (Hello71), wireguard, Jason A. Donenfeld, Joel Fernandes, Greg Kroah-Hartman, Xinhui.Pan, linux-kernel, amd-gfx, Suren Baghdasaryan, rcu, Hridya Valsaraju, Arve Hjønnevåg, Theodore Ts'o, alexander.deucher, Todd Kjos, uladzislau.rezki, Martijn Coenen, Christian Brauner On Thu, Jul 07, 2022 at 09:30:39AM +0200, Christian König wrote: > Am 06.07.22 um 22:42 schrieb Paul E. McKenney: > > On Wed, Jul 06, 2022 at 08:09:49PM +0200, Uladzislau Rezki wrote: > > > On Wed, Jul 06, 2022 at 10:58:36AM -0700, Paul E. McKenney wrote: > > > > On Wed, Jul 06, 2022 at 07:48:20PM +0200, Uladzislau Rezki wrote: > > > > > Hello. > > > > > > > > > > On Mon, Jul 04, 2022 at 01:30:50PM +0200, Christian König wrote: > > > > > > Hi guys, > > > > > > > > > > > > Am 28.06.22 um 22:11 schrieb Uladzislau Rezki: > > > > > > > > Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm: > > > > > > > > > All you need to do to get the previous behavior is to add something like > > > > > > > > > this to your defconfig file: > > > > > > > > > > > > > > > > > > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000 > > > > > > > > > > > > > > > > > > Any reason why this will not work for you? > > > > > > sorry for jumping in so later, I was on vacation for a week. > > > > > > > > > > > > Well when any RCU period is longer than 20ms and amdgpu in the backtrace my > > > > > > educated guess is that we messed up some timeout waiting for the hw. > > > > > > > > > > > > We usually do wait a few us, but it can be that somebody is waiting for ms > > > > > > instead. 
> > > > > >
> > > > > > So there are some todos here as far as I can see and It would be helpful to
> > > > > > get a cleaner backtrace if possible.
> > > > > >
> > > > > Actually CONFIG_ANDROID looks like is going to be removed, so the CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
> > > > > will not have any dependencies on the CONFIG_ANDROID anymore:
> > > > >
> > > > > https://lkml.org/lkml/2022/6/29/756
> > > > But you can set the RCU_EXP_CPU_STALL_TIMEOUT Kconfig option, if you
> > > > wish. Setting this option to 20 will get you the behavior previously
> > > > obtained by setting the now-defunct ANDROID Kconfig option.
> > > >
> > > Right. Or over boot parameter. So for us it is not a big issue :)
> > Specifically rcupdate.rcu_exp_cpu_stall_timeout, for those just now
> > tuning in. ;-)
>
> I was just about to write a response asking for that :)
>
> Thanks, I will suggest to our QA to add this parameter while doing some
> tests.

Very good! Please let me know how it goes.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 12+ messages in thread