Development discussion of WireGuard
* CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)
       [not found]       ` <20220628041252.GV1790663@paulmck-ThinkPad-P17-Gen-1>
@ 2022-06-28 15:02         ` Alex Xu (Hello71)
  2022-06-28 15:13           ` Jason A. Donenfeld
  2022-06-28 18:54           ` Paul E. McKenney
  0 siblings, 2 replies; 12+ messages in thread
From: Alex Xu (Hello71) @ 2022-06-28 15:02 UTC (permalink / raw)
  To: paulmck, rcu, urezki, uladzislau.rezki, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, linux-kernel, Jason A. Donenfeld, wireguard,
	Theodore Ts'o
  Cc: alexander.deucher, christian.koenig, Xinhui.Pan, amd-gfx

Excerpts from Paul E. McKenney's message of June 28, 2022 12:12 am:
> On Mon, Jun 27, 2022 at 09:50:53PM -0400, Alex Xu (Hello71) wrote:
>> Ah, I see. I have selected the default value for 
>> CONFIG_RCU_EXP_CPU_STALL_TIMEOUT, but that is 20 if ANDROID. I am not 
>> using Android; I'm not sure there exist Android devices with AMD GPUs. 
>> However, I have set CONFIG_ANDROID=y in order to use 
>> ANDROID_BINDER_IPC=m for emulation.
>> 
>> In general, I think CONFIG_ANDROID is not a reliable method for 
>> detecting if the kernel is for an Android device; for example, Fedora 
>> sets CONFIG_ANDROID, but (AFAIK) its kernel is not intended for use with 
>> Android userspace.
>> 
>> On the other hand, it's not clear to me why the value 20 should be for 
>> Android only anyways. If, as you say in 
>> https://lore.kernel.org/lkml/20220216195508.GM4285@paulmck-ThinkPad-P17-Gen-1/,
>> it is related to the size of the system, perhaps some other heuristic 
>> would be more appropriate.
> 
> It is related to the fact that quite a few Android guys want these
> 20-millisecond short-timeout expedited RCU CPU stall warnings, but no one
> else does.  Not yet anyway.
> 
> And let's face it, the intent and purpose of CONFIG_ANDROID=y is extremely
> straightforward and unmistakeable.  So perhaps people not running Android
> devices but wanting a little bit of the Android functionality should do
> something other than setting CONFIG_ANDROID=y in their .config files.  Me,
> I am surprised that it took this long for something like this to bite you.
> 
> But just out of curiosity, what would you suggest instead?

Both Debian and Fedora set CONFIG_ANDROID, specifically for binder. If 
major distro vendors are consistently making this "mistake", then 
perhaps the problem is elsewhere.

In my own opinion, assuming that binderfs means Android vendor is not a 
good assumption. The ANDROID help says:

> Enable support for various drivers needed on the Android platform

It doesn't say "Enable only if building an Android device", or "Enable 
only if you are Google". Isn't the traditional Linux philosophy a 
collection of pieces to be assembled, without gratuitous hidden 
dependencies? For example, [0] removes the unnecessary Android 
dependency; it doesn't block the whole thing with "depends on ANDROID".

It seems to me that the proper way to set some configuration for Android 
kernels is or should be to ask the Android kernel config maintainers, 
not to set it based on an upstream kernel option. There is, after all, 
no CONFIG_FEDORA or CONFIG_UBUNTU or CONFIG_HANNAH_MONTANA.

WireGuard and random also use CONFIG_ANDROID in a similar "proxy" way as 
rcu, there to see if suspends are "frequent". This seems dubious for the 
same reasons.

I wonder if it might be time to retire CONFIG_ANDROID: the only 
remaining driver covered is binder, which originates from Android. 
Like ufs-qcom, binder is no longer used exclusively on Android devices; 
it is also used for Android device emulators, which might or might not 
run on Android-like mobile devices.

My understanding is that both Android and upstream kernel developers 
intend to add no more Android-specific drivers, so binder should be the 
only one covered for the foreseeable future.

> For that matter, why the private reply?

Mail client issues, not intentional. Lists re-added, plus Android, 
WireGuard, and random.

Thanks,
Alex.

[0] https://lore.kernel.org/all/20220321151853.24138-1-krzk@kernel.org/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)
  2022-06-28 15:02         ` CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend) Alex Xu (Hello71)
@ 2022-06-28 15:13           ` Jason A. Donenfeld
  2022-06-28 18:54           ` Paul E. McKenney
  1 sibling, 0 replies; 12+ messages in thread
From: Jason A. Donenfeld @ 2022-06-28 15:13 UTC (permalink / raw)
  To: Alex Xu (Hello71)
  Cc: paulmck, rcu, urezki, uladzislau.rezki, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, linux-kernel, wireguard, Theodore Ts'o,
	alexander.deucher, christian.koenig, Xinhui.Pan, amd-gfx

Hi Alex,

On Tue, Jun 28, 2022 at 11:02:40AM -0400, Alex Xu (Hello71) wrote:
> WireGuard and random also use CONFIG_ANDROID in a similar "proxy" way as 
> rcu, there to see if suspends are "frequent". This seems dubious for the 
> same reasons.

I'd be happy to take a patch in WireGuard and random.c to get rid of the
CONFIG_ANDROID usage, if you can conduct an analysis and conclude this
won't break anything inadvertently.

Jason


* Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)
  2022-06-28 15:02         ` CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend) Alex Xu (Hello71)
  2022-06-28 15:13           ` Jason A. Donenfeld
@ 2022-06-28 18:54           ` Paul E. McKenney
  2022-06-28 19:28             ` Alex Xu (Hello71)
  1 sibling, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2022-06-28 18:54 UTC (permalink / raw)
  To: Alex Xu (Hello71)
  Cc: rcu, urezki, uladzislau.rezki, Greg Kroah-Hartman,
	Arve Hjønnevåg, Todd Kjos, Martijn Coenen,
	Joel Fernandes, Christian Brauner, Hridya Valsaraju,
	Suren Baghdasaryan, linux-kernel, Jason A. Donenfeld, wireguard,
	Theodore Ts'o, alexander.deucher, christian.koenig,
	Xinhui.Pan, amd-gfx

On Tue, Jun 28, 2022 at 11:02:40AM -0400, Alex Xu (Hello71) wrote:
> Excerpts from Paul E. McKenney's message of June 28, 2022 12:12 am:
> > On Mon, Jun 27, 2022 at 09:50:53PM -0400, Alex Xu (Hello71) wrote:
> >> Ah, I see. I have selected the default value for 
> >> CONFIG_RCU_EXP_CPU_STALL_TIMEOUT, but that is 20 if ANDROID. I am not 
> >> using Android; I'm not sure there exist Android devices with AMD GPUs. 
> >> However, I have set CONFIG_ANDROID=y in order to use 
> >> ANDROID_BINDER_IPC=m for emulation.
> >> 
> >> In general, I think CONFIG_ANDROID is not a reliable method for 
> >> detecting if the kernel is for an Android device; for example, Fedora 
> >> sets CONFIG_ANDROID, but (AFAIK) its kernel is not intended for use with 
> >> Android userspace.
> >> 
> >> On the other hand, it's not clear to me why the value 20 should be for 
> >> Android only anyways. If, as you say in 
> >> https://lore.kernel.org/lkml/20220216195508.GM4285@paulmck-ThinkPad-P17-Gen-1/,
> >> it is related to the size of the system, perhaps some other heuristic 
> >> would be more appropriate.
> > 
> > It is related to the fact that quite a few Android guys want these
> > 20-millisecond short-timeout expedited RCU CPU stall warnings, but no one
> > else does.  Not yet anyway.
> > 
> > And let's face it, the intent and purpose of CONFIG_ANDROID=y is extremely
> > straightforward and unmistakeable.  So perhaps people not running Android
> > devices but wanting a little bit of the Android functionality should do
> > something other than setting CONFIG_ANDROID=y in their .config files.  Me,
> > I am surprised that it took this long for something like this to bite you.
> > 
> > But just out of curiosity, what would you suggest instead?
> 
> Both Debian and Fedora set CONFIG_ANDROID, specifically for binder. If 
> major distro vendors are consistently making this "mistake", then 
> perhaps the problem is elsewhere.
> 
> In my own opinion, assuming that binderfs means Android vendor is not a 
> good assumption. The ANDROID help says:
> 
> > Enable support for various drivers needed on the Android platform
> 
> It doesn't say "Enable only if building an Android device", or "Enable 
> only if you are Google". Isn't the traditional Linux philosophy a 
> collection of pieces to be assembled, without gratuitous hidden 
> dependencies? For example, [0] removes the unnecessary Android 
> dependency; it doesn't block the whole thing with "depends on ANDROID".
> 
> It seems to me that the proper way to set some configuration for Android 
> kernels is or should be to ask the Android kernel config maintainers, 
> not to set it based on an upstream kernel option. There is, after all, 
> no CONFIG_FEDORA or CONFIG_UBUNTU or CONFIG_HANNAH_MONTANA.
> 
> WireGuard and random also use CONFIG_ANDROID in a similar "proxy" way as 
> rcu, there to see if suspends are "frequent". This seems dubious for the 
> same reasons.
> 
> I wonder if it might be time to retire CONFIG_ANDROID: the only 
> remaining driver covered is binder, which originates from Android. 
> Like ufs-qcom, binder is no longer used exclusively on Android devices; 
> it is also used for Android device emulators, which might or might not 
> run on Android-like mobile devices.
> 
> My understanding is that both Android and upstream kernel developers 
> intend to add no more Android-specific drivers, so binder should be the 
> only one covered for the foreseeable future.

Thank you for the perspective, but you never did suggest an alternative.

So here is what I suggest given the current setup:

config RCU_EXP_CPU_STALL_TIMEOUT
	int "Expedited RCU CPU stall timeout in milliseconds"
	depends on RCU_STALL_COMMON
	range 0 21000
	default 20 if ANDROID
	default 0 if !ANDROID
	help
	  If a given expedited RCU grace period extends more than the
	  specified number of milliseconds, a CPU stall warning is printed.
	  If the RCU grace period persists, additional CPU stall warnings
	  are printed at more widely spaced intervals.  A value of zero
	  says to use the RCU_CPU_STALL_TIMEOUT value converted from
	  seconds to milliseconds.

The default, and only the default, is controlled by ANDROID.

All you need to do to get the previous behavior is to add something like
this to your defconfig file:

CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000

Any reason why this will not work for you?
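
As a rough illustration of the Kconfig entry above, the rule it describes can be modelled in a few lines of Python. This is only a sketch: the function and constant names are hypothetical, it is not the kernel's implementation, and the 21-second regular stall timeout is an assumption chosen to match the 21000 ms figure used in this thread.

```python
# Illustrative model only; these names are hypothetical, not kernel symbols.
RCU_CPU_STALL_TIMEOUT_S = 21  # assumed regular stall timeout, in seconds


def default_exp_timeout_ms(android: bool) -> int:
    """Mirror the Kconfig defaults quoted above: 20 if ANDROID, else 0."""
    return 20 if android else 0


def effective_exp_timeout_ms(exp_timeout_ms: int,
                             stall_timeout_s: int = RCU_CPU_STALL_TIMEOUT_S) -> int:
    """Return the expedited stall timeout in milliseconds.

    Zero means "use RCU_CPU_STALL_TIMEOUT converted from seconds to
    milliseconds", as the help text states.
    """
    if not 0 <= exp_timeout_ms <= 21000:
        raise ValueError("outside the Kconfig range 0..21000")
    if exp_timeout_ms == 0:
        return stall_timeout_s * 1000
    return exp_timeout_ms
```

Under these assumptions, an explicit value of 21000 and the non-Android default of 0 resolve to the same 21-second timeout, while the Android default of 20 stays at 20 ms.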

> > For that matter, why the private reply?
> 
> Mail client issues, not intentional. Lists re-added, plus Android, 
> WireGuard, and random.

Thank you!

							Thanx, Paul

> Thanks,
> Alex.
> 
> [0] https://lore.kernel.org/all/20220321151853.24138-1-krzk@kernel.org/


* Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)
  2022-06-28 18:54           ` Paul E. McKenney
@ 2022-06-28 19:28             ` Alex Xu (Hello71)
  2022-06-28 20:11               ` Uladzislau Rezki
  0 siblings, 1 reply; 12+ messages in thread
From: Alex Xu (Hello71) @ 2022-06-28 19:28 UTC (permalink / raw)
  To: paulmck
  Cc: alexander.deucher, amd-gfx, Arve Hjønnevåg,
	Christian Brauner, christian.koenig, Greg Kroah-Hartman,
	Hridya Valsaraju, Jason A. Donenfeld, Joel Fernandes,
	linux-kernel, Martijn Coenen, rcu, Suren Baghdasaryan, Todd Kjos,
	Theodore Ts'o, uladzislau.rezki, urezki, wireguard,
	Xinhui.Pan

Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm:
> All you need to do to get the previous behavior is to add something like
> this to your defconfig file:
> 
> CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000
> 
> Any reason why this will not work for you?

As far as I know, I do not require any particular RCU debugging features 
intended for developers; as an individual user and distro maintainer, I 
would like to select the option corresponding to "emit errors for 
unexpected conditions which should be reported upstream", not "emit 
debugging information for development purposes".

Therefore, I think 0 is a suitable setting for me and most ordinary 
(not tightly controlled) distributions. My concern is that other users 
and distro maintainers will also have confusion about what value to set 
and whether the warnings are important, since the help text does not say 
anything about Android, and "make oldconfig" does not indicate that the 
default value is different for Android.
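
One way to notice this situation is to scan a kernel .config for the combination described here. The sketch below is a hypothetical helper, not an existing kernel tool. It assumes the usual "CONFIG_NAME=value" line format and flags a config that sets CONFIG_ANDROID=y while the expedited stall timeout is at the Android default of 20.

```python
def parse_config(text: str) -> dict:
    """Parse 'CONFIG_FOO=value' lines; comment lines such as
    '# CONFIG_FOO is not set' and blank lines are skipped."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            options[name] = value
    return options


def android_default_in_effect(options: dict) -> bool:
    """True if ANDROID is enabled and the timeout equals the Android default."""
    return (options.get("CONFIG_ANDROID") == "y"
            and options.get("CONFIG_RCU_EXP_CPU_STALL_TIMEOUT") == "20")
```

A distribution maintainer could run such a check over a generated config to decide whether to override the timeout explicitly.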

My suggestion is that the default be set to 0, and if a non-zero value 
is appropriate for Android, that should be communicated to the Android 
developers, not made conditional on CONFIG_ANDROID.

Thanks,
Alex.


* Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)
  2022-06-28 19:28             ` Alex Xu (Hello71)
@ 2022-06-28 20:11               ` Uladzislau Rezki
  2022-07-04 11:30                 ` Christian König
  0 siblings, 1 reply; 12+ messages in thread
From: Uladzislau Rezki @ 2022-06-28 20:11 UTC (permalink / raw)
  To: Alex Xu (Hello71)
  Cc: paulmck, alexander.deucher, amd-gfx, Arve Hjønnevåg,
	Christian Brauner, christian.koenig, Greg Kroah-Hartman,
	Hridya Valsaraju, Jason A. Donenfeld, Joel Fernandes,
	linux-kernel, Martijn Coenen, rcu, Suren Baghdasaryan, Todd Kjos,
	Theodore Ts'o, uladzislau.rezki, urezki, wireguard,
	Xinhui.Pan

> Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm:
> > All you need to do to get the previous behavior is to add something like
> > this to your defconfig file:
> > 
> > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000
> > 
> > Any reason why this will not work for you?
> 
> As far as I know, I do not require any particular RCU debugging features 
> intended for developers; as an individual user and distro maintainer, I 
> would like to select the option corresponding to "emit errors for 
> unexpected conditions which should be reported upstream", not "emit 
> debugging information for development purposes".
> 
Sorry, but we need to make some assumption: to me, CONFIG_ANDROID
indicates that the kernel runs on an Android-like device. When you enable
this option on your specific box, it is assumed that some Android-related
code is also activated on your device, which may lead to side effects.

>
> Therefore, I think 0 is a suitable setting for me and most ordinary 
> (not tightly controlled) distributions. My concern is that other users 
> and distro maintainers will also have confusion about what value to set 
> and whether the warnings are important, since the help text does not say 
> anything about Android, and "make oldconfig" does not indicate that the 
> default value is different for Android.
> 
<snip>
diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
index 9b64e55d4f61..ced0d1f7c675 100644
--- a/kernel/rcu/Kconfig.debug
+++ b/kernel/rcu/Kconfig.debug
@@ -94,7 +94,8 @@ config RCU_EXP_CPU_STALL_TIMEOUT
          If the RCU grace period persists, additional CPU stall warnings
          are printed at more widely spaced intervals.  A value of zero
          says to use the RCU_CPU_STALL_TIMEOUT value converted from
-         seconds to milliseconds.
+         seconds to milliseconds. If CONFIG_ANDROID is set for a non-Android
+         platform and you are unsure, set RCU_EXP_CPU_STALL_TIMEOUT to zero.

 config RCU_TRACE
        bool "Enable tracing for RCU"
<snip>

Will it work for you?

--
Uladzislau Rezki


* Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)
  2022-06-28 20:11               ` Uladzislau Rezki
@ 2022-07-04 11:30                 ` Christian König
  2022-07-06 17:48                   ` Uladzislau Rezki
  0 siblings, 1 reply; 12+ messages in thread
From: Christian König @ 2022-07-04 11:30 UTC (permalink / raw)
  To: Uladzislau Rezki, Alex Xu (Hello71)
  Cc: wireguard, Jason A. Donenfeld, Joel Fernandes, paulmck,
	Greg Kroah-Hartman, Xinhui.Pan, linux-kernel, amd-gfx,
	Suren Baghdasaryan, rcu, Hridya Valsaraju,
	Arve Hjønnevåg, Theodore Ts'o, alexander.deucher,
	Todd Kjos, uladzislau.rezki, Martijn Coenen, christian.koenig,
	Christian Brauner

Hi guys,

Am 28.06.22 um 22:11 schrieb Uladzislau Rezki:
>> Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm:
>>> All you need to do to get the previous behavior is to add something like
>>> this to your defconfig file:
>>>
>>> CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000
>>>
>>> Any reason why this will not work for you?

Sorry for jumping in so late; I was on vacation for a week.

Well, when any RCU grace period is longer than 20 ms and amdgpu is in the 
backtrace, my educated guess is that we messed up some timeout waiting for 
the hw.

We usually wait a few us, but it can be that somebody is waiting for 
ms instead.

So there are some todos here as far as I can see, and it would be helpful 
to get a cleaner backtrace if possible.

Regards,
Christian.



* Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)
  2022-07-04 11:30                 ` Christian König
@ 2022-07-06 17:48                   ` Uladzislau Rezki
  2022-07-06 17:58                     ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Uladzislau Rezki @ 2022-07-06 17:48 UTC (permalink / raw)
  To: Christian König
  Cc: Uladzislau Rezki, Alex Xu (Hello71),
	wireguard, Jason A. Donenfeld, Joel Fernandes, paulmck,
	Greg Kroah-Hartman, Xinhui.Pan, linux-kernel, amd-gfx,
	Suren Baghdasaryan, rcu, Hridya Valsaraju,
	Arve Hjønnevåg, Theodore Ts'o, alexander.deucher,
	Todd Kjos, uladzislau.rezki, Martijn Coenen, Christian Brauner

Hello.

On Mon, Jul 04, 2022 at 01:30:50PM +0200, Christian König wrote:
> Hi guys,
> 
> Am 28.06.22 um 22:11 schrieb Uladzislau Rezki:
> > > Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm:
> > > > All you need to do to get the previous behavior is to add something like
> > > > this to your defconfig file:
> > > > 
> > > > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000
> > > > 
> > > > Any reason why this will not work for you?
> 
> Sorry for jumping in so late; I was on vacation for a week.
> 
> Well, when any RCU grace period is longer than 20 ms and amdgpu is in the
> backtrace, my educated guess is that we messed up some timeout waiting for
> the hw.
> 
> We usually wait a few us, but it can be that somebody is waiting for ms
> instead.
> 
> So there are some todos here as far as I can see, and it would be helpful to
> get a cleaner backtrace if possible.
> 
Actually, it looks like CONFIG_ANDROID is going to be removed, so
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT will no longer depend on CONFIG_ANDROID:

https://lkml.org/lkml/2022/6/29/756

--
Uladzislau Rezki



* Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)
  2022-07-06 17:48                   ` Uladzislau Rezki
@ 2022-07-06 17:58                     ` Paul E. McKenney
  2022-07-06 18:09                       ` Uladzislau Rezki
  0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2022-07-06 17:58 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Christian König, Alex Xu (Hello71),
	wireguard, Jason A. Donenfeld, Joel Fernandes,
	Greg Kroah-Hartman, Xinhui.Pan, linux-kernel, amd-gfx,
	Suren Baghdasaryan, rcu, Hridya Valsaraju,
	Arve Hjønnevåg, Theodore Ts'o, alexander.deucher,
	Todd Kjos, uladzislau.rezki, Martijn Coenen, Christian Brauner

On Wed, Jul 06, 2022 at 07:48:20PM +0200, Uladzislau Rezki wrote:
> Hello.
> 
> On Mon, Jul 04, 2022 at 01:30:50PM +0200, Christian König wrote:
> > Hi guys,
> > 
> > Am 28.06.22 um 22:11 schrieb Uladzislau Rezki:
> > > > Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm:
> > > > > All you need to do to get the previous behavior is to add something like
> > > > > this to your defconfig file:
> > > > > 
> > > > > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000
> > > > > 
> > > > > Any reason why this will not work for you?
> > 
> > Sorry for jumping in so late; I was on vacation for a week.
> > 
> > Well, when any RCU grace period is longer than 20 ms and amdgpu is in the
> > backtrace, my educated guess is that we messed up some timeout waiting for
> > the hw.
> > 
> > We usually wait a few us, but it can be that somebody is waiting for ms
> > instead.
> > 
> > So there are some todos here as far as I can see, and it would be helpful to
> > get a cleaner backtrace if possible.
> > 
> Actually, it looks like CONFIG_ANDROID is going to be removed, so
> CONFIG_RCU_EXP_CPU_STALL_TIMEOUT will no longer depend on CONFIG_ANDROID:
> 
> https://lkml.org/lkml/2022/6/29/756

But you can set the RCU_EXP_CPU_STALL_TIMEOUT Kconfig option, if you
wish.  Setting this option to 20 will get you the behavior previously
obtained by setting the now-defunct ANDROID Kconfig option.

							Thanx, Paul


* Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)
  2022-07-06 17:58                     ` Paul E. McKenney
@ 2022-07-06 18:09                       ` Uladzislau Rezki
  2022-07-06 20:42                         ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Uladzislau Rezki @ 2022-07-06 18:09 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Uladzislau Rezki, Christian König, Alex Xu (Hello71),
	wireguard, Jason A. Donenfeld, Joel Fernandes,
	Greg Kroah-Hartman, Xinhui.Pan, linux-kernel, amd-gfx,
	Suren Baghdasaryan, rcu, Hridya Valsaraju,
	Arve Hjønnevåg, Theodore Ts'o, alexander.deucher,
	Todd Kjos, uladzislau.rezki, Martijn Coenen, Christian Brauner

On Wed, Jul 06, 2022 at 10:58:36AM -0700, Paul E. McKenney wrote:
> On Wed, Jul 06, 2022 at 07:48:20PM +0200, Uladzislau Rezki wrote:
> > Hello.
> > 
> > On Mon, Jul 04, 2022 at 01:30:50PM +0200, Christian König wrote:
> > > Hi guys,
> > > 
> > > Am 28.06.22 um 22:11 schrieb Uladzislau Rezki:
> > > > > Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm:
> > > > > > All you need to do to get the previous behavior is to add something like
> > > > > > this to your defconfig file:
> > > > > > 
> > > > > > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000
> > > > > > 
> > > > > > Any reason why this will not work for you?
> > > 
> > > Sorry for jumping in so late; I was on vacation for a week.
> > > 
> > > Well, when any RCU grace period is longer than 20 ms and amdgpu is in the
> > > backtrace, my educated guess is that we messed up some timeout waiting for
> > > the hw.
> > > 
> > > We usually wait a few us, but it can be that somebody is waiting for ms
> > > instead.
> > > 
> > > So there are some todos here as far as I can see, and it would be helpful to
> > > get a cleaner backtrace if possible.
> > > 
> > Actually, it looks like CONFIG_ANDROID is going to be removed, so
> > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT will no longer depend on CONFIG_ANDROID:
> > 
> > https://lkml.org/lkml/2022/6/29/756
> 
> But you can set the RCU_EXP_CPU_STALL_TIMEOUT Kconfig option, if you
> wish.  Setting this option to 20 will get you the behavior previously
> obtained by setting the now-defunct ANDROID Kconfig option.
> 
Right. Or via a boot parameter. So for us it is not a big issue :)

--
Uladzislau Rezki


* Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)
  2022-07-06 18:09                       ` Uladzislau Rezki
@ 2022-07-06 20:42                         ` Paul E. McKenney
  2022-07-07  7:30                           ` Christian König
  0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2022-07-06 20:42 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Christian König, Alex Xu (Hello71),
	wireguard, Jason A. Donenfeld, Joel Fernandes,
	Greg Kroah-Hartman, Xinhui.Pan, linux-kernel, amd-gfx,
	Suren Baghdasaryan, rcu, Hridya Valsaraju,
	Arve Hjønnevåg, Theodore Ts'o, alexander.deucher,
	Todd Kjos, uladzislau.rezki, Martijn Coenen, Christian Brauner

On Wed, Jul 06, 2022 at 08:09:49PM +0200, Uladzislau Rezki wrote:
> On Wed, Jul 06, 2022 at 10:58:36AM -0700, Paul E. McKenney wrote:
> > On Wed, Jul 06, 2022 at 07:48:20PM +0200, Uladzislau Rezki wrote:
> > > Hello.
> > > 
> > > On Mon, Jul 04, 2022 at 01:30:50PM +0200, Christian König wrote:
> > > > Hi guys,
> > > > 
> > > > Am 28.06.22 um 22:11 schrieb Uladzislau Rezki:
> > > > > > Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm:
> > > > > > > All you need to do to get the previous behavior is to add something like
> > > > > > > this to your defconfig file:
> > > > > > > 
> > > > > > > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000
> > > > > > > 
> > > > > > > Any reason why this will not work for you?
> > > > 
> > > > Sorry for jumping in so late; I was on vacation for a week.
> > > > 
> > > > Well, when any RCU grace period is longer than 20 ms and amdgpu is in the
> > > > backtrace, my educated guess is that we messed up some timeout waiting for
> > > > the hw.
> > > > 
> > > > We usually wait a few us, but it can be that somebody is waiting for ms
> > > > instead.
> > > > 
> > > > So there are some todos here as far as I can see, and it would be helpful to
> > > > get a cleaner backtrace if possible.
> > > > 
> > > Actually, it looks like CONFIG_ANDROID is going to be removed, so
> > > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT will no longer depend on CONFIG_ANDROID:
> > > 
> > > https://lkml.org/lkml/2022/6/29/756
> > 
> > But you can set the RCU_EXP_CPU_STALL_TIMEOUT Kconfig option, if you
> > wish.  Setting this option to 20 will get you the behavior previously
> > obtained by setting the now-defunct ANDROID Kconfig option.
> > 
> Right. Or via a boot parameter. So for us it is not a big issue :)

Specifically rcupdate.rcu_exp_cpu_stall_timeout, for those just now
tuning in.  ;-)
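
For readers who want to check whether that parameter is present on a given boot, here is a minimal Python sketch. The helper name is hypothetical. It assumes the command line is a whitespace-separated string of "name=value" tokens (as found in /proc/cmdline), ignores quoting, and assumes that the last occurrence wins and that the value is in milliseconds like the Kconfig option.

```python
from typing import Optional

PARAM = "rcupdate.rcu_exp_cpu_stall_timeout"


def exp_stall_timeout_from_cmdline(cmdline: str) -> Optional[int]:
    """Return the last value given for PARAM, or None if it is absent."""
    found = None
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if sep and name == PARAM:
            found = int(value)  # assumed to be milliseconds
    return found
```

On a running system the input string could be read from /proc/cmdline; a result of None would mean the Kconfig default is in effect.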

							Thanx, Paul


* Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)
  2022-07-06 20:42                         ` Paul E. McKenney
@ 2022-07-07  7:30                           ` Christian König
  2022-07-07 13:29                             ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Christian König @ 2022-07-07  7:30 UTC (permalink / raw)
  To: paulmck, Uladzislau Rezki
  Cc: Alex Xu (Hello71),
	wireguard, Jason A. Donenfeld, Joel Fernandes,
	Greg Kroah-Hartman, Xinhui.Pan, linux-kernel, amd-gfx,
	Suren Baghdasaryan, rcu, Hridya Valsaraju,
	Arve Hjønnevåg, Theodore Ts'o, alexander.deucher,
	Todd Kjos, uladzislau.rezki, Martijn Coenen, Christian Brauner

Am 06.07.22 um 22:42 schrieb Paul E. McKenney:
> On Wed, Jul 06, 2022 at 08:09:49PM +0200, Uladzislau Rezki wrote:
>> On Wed, Jul 06, 2022 at 10:58:36AM -0700, Paul E. McKenney wrote:
>>> On Wed, Jul 06, 2022 at 07:48:20PM +0200, Uladzislau Rezki wrote:
>>>> Hello.
>>>>
>>>> On Mon, Jul 04, 2022 at 01:30:50PM +0200, Christian König wrote:
>>>>> Hi guys,
>>>>>
>>>>> Am 28.06.22 um 22:11 schrieb Uladzislau Rezki:
>>>>>>> Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm:
>>>>>>>> All you need to do to get the previous behavior is to add something like
>>>>>>>> this to your defconfig file:
>>>>>>>>
>>>>>>>> CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000
>>>>>>>>
>>>>>>>> Any reason why this will not work for you?
>>>>> Sorry for jumping in so late; I was on vacation for a week.
>>>>>
>>>>> Well, when any RCU grace period is longer than 20 ms and amdgpu is in the
>>>>> backtrace, my educated guess is that we messed up some timeout waiting for
>>>>> the hw.
>>>>>
>>>>> We usually wait a few us, but it can be that somebody is waiting for ms
>>>>> instead.
>>>>>
>>>>> So there are some todos here as far as I can see, and it would be helpful to
>>>>> get a cleaner backtrace if possible.
>>>>>
>>>> Actually, it looks like CONFIG_ANDROID is going to be removed, so
>>>> CONFIG_RCU_EXP_CPU_STALL_TIMEOUT will no longer depend on CONFIG_ANDROID:
>>>>
>>>> https://lkml.org/lkml/2022/6/29/756
>>> But you can set the RCU_EXP_CPU_STALL_TIMEOUT Kconfig option, if you
>>> wish.  Setting this option to 20 will get you the behavior previously
>>> obtained by setting the now-defunct ANDROID Kconfig option.
>>>
>> Right. Or via a boot parameter. So for us it is not a big issue :)
> Specifically rcupdate.rcu_exp_cpu_stall_timeout, for those just now
> tuning in.  ;-)

I was just about to write a response asking for that :)

Thanks, I will suggest to our QA to add this parameter while doing some 
tests.

Regards,
Christian.

>
> 							Thanx, Paul



* Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)
  2022-07-07  7:30                           ` Christian König
@ 2022-07-07 13:29                             ` Paul E. McKenney
  0 siblings, 0 replies; 12+ messages in thread
From: Paul E. McKenney @ 2022-07-07 13:29 UTC (permalink / raw)
  To: Christian König
  Cc: Uladzislau Rezki, Alex Xu (Hello71),
	wireguard, Jason A. Donenfeld, Joel Fernandes,
	Greg Kroah-Hartman, Xinhui.Pan, linux-kernel, amd-gfx,
	Suren Baghdasaryan, rcu, Hridya Valsaraju,
	Arve Hjønnevåg, Theodore Ts'o, alexander.deucher,
	Todd Kjos, uladzislau.rezki, Martijn Coenen, Christian Brauner

On Thu, Jul 07, 2022 at 09:30:39AM +0200, Christian König wrote:
> Am 06.07.22 um 22:42 schrieb Paul E. McKenney:
> > On Wed, Jul 06, 2022 at 08:09:49PM +0200, Uladzislau Rezki wrote:
> > > On Wed, Jul 06, 2022 at 10:58:36AM -0700, Paul E. McKenney wrote:
> > > > On Wed, Jul 06, 2022 at 07:48:20PM +0200, Uladzislau Rezki wrote:
> > > > > Hello.
> > > > > 
> > > > > On Mon, Jul 04, 2022 at 01:30:50PM +0200, Christian König wrote:
> > > > > > Hi guys,
> > > > > > 
> > > > > > Am 28.06.22 um 22:11 schrieb Uladzislau Rezki:
> > > > > > > > Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm:
> > > > > > > > > All you need to do to get the previous behavior is to add something like
> > > > > > > > > this to your defconfig file:
> > > > > > > > > 
> > > > > > > > > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000
> > > > > > > > > 
> > > > > > > > > Any reason why this will not work for you?
> > > > > > Sorry for jumping in so late; I was on vacation for a week.
> > > > > > 
> > > > > > Well, when any RCU grace period is longer than 20 ms and amdgpu is in the
> > > > > > backtrace, my educated guess is that we messed up some timeout waiting for
> > > > > > the hw.
> > > > > > 
> > > > > > We usually wait a few us, but it can be that somebody is waiting for ms
> > > > > > instead.
> > > > > > 
> > > > > > So there are some todos here as far as I can see, and it would be helpful to
> > > > > > get a cleaner backtrace if possible.
> > > > > > 
> > > > > Actually, it looks like CONFIG_ANDROID is going to be removed, so
> > > > > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT will no longer depend on CONFIG_ANDROID:
> > > > > 
> > > > > https://lkml.org/lkml/2022/6/29/756
> > > > But you can set the RCU_EXP_CPU_STALL_TIMEOUT Kconfig option, if you
> > > > wish.  Setting this option to 20 will get you the behavior previously
> > > > obtained by setting the now-defunct ANDROID Kconfig option.
> > > > 
> > > Right. Or via a boot parameter. So for us it is not a big issue :)
> > Specifically rcupdate.rcu_exp_cpu_stall_timeout, for those just now
> > tuning in.  ;-)
> 
> I was just about to write a response asking for that :)
> 
> Thanks, I will suggest to our QA to add this parameter while doing some
> tests.

Very good!  Please let me know how it goes.

							Thanx, Paul



Thread overview: 12+ messages
     [not found] <1656357116.rhe0mufk6a.none.ref@localhost>
     [not found] ` <1656357116.rhe0mufk6a.none@localhost>
     [not found]   ` <20220627204139.GL1790663@paulmck-ThinkPad-P17-Gen-1>
     [not found]     ` <1656379893.q9yb069erk.none@localhost>
     [not found]       ` <20220628041252.GV1790663@paulmck-ThinkPad-P17-Gen-1>
2022-06-28 15:02         ` CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend) Alex Xu (Hello71)
2022-06-28 15:13           ` Jason A. Donenfeld
2022-06-28 18:54           ` Paul E. McKenney
2022-06-28 19:28             ` Alex Xu (Hello71)
2022-06-28 20:11               ` Uladzislau Rezki
2022-07-04 11:30                 ` Christian König
2022-07-06 17:48                   ` Uladzislau Rezki
2022-07-06 17:58                     ` Paul E. McKenney
2022-07-06 18:09                       ` Uladzislau Rezki
2022-07-06 20:42                         ` Paul E. McKenney
2022-07-07  7:30                           ` Christian König
2022-07-07 13:29                             ` Paul E. McKenney
