* [ISSUE] amdgpu system freeze
@ 2025-01-01 14:42 narodnik
2025-01-01 22:03 ` TeusLollo
` (9 more replies)
0 siblings, 10 replies; 11+ messages in thread
From: narodnik @ 2025-01-01 14:42 UTC (permalink / raw)
To: ml
New issue by narodnik on void-packages repository
https://github.com/void-linux/void-packages/issues/53787
Description:
### Is this a new report?
Yes
### System Info
Void 6.9.12_1 x86_64 AuthenticAMD notuptodate rDDF
### Package(s) Affected
linux-firmware-amd-20241210_1.x86_64
### Does a report exist for this bug with the project's home (upstream) and/or another distro?
https://gitlab.freedesktop.org/drm/amd/-/issues/3863
See also: https://github.com/void-linux/void-packages/issues/53434#issuecomment-2564815160
### Expected behaviour
I'm using Wayland with a minimalistic window manager (DWL). After 30 minutes of use, I get a full system freeze.
### Actual behaviour
The system fully freezes. I can still hear music playing, and I can reset using REISUB. But there is no response to input from the WM itself.
I've managed to fix it by doing these steps:
1. `xdowngrade mesa-24.2.3_2.x86_64.xbps libglapi-24.2.3_2.x86_64.xbps libOSMesa-24.2.3_2.x86_64.xbps mesa-libgallium-24.2.3_2.x86_64.xbps libgbm-24.2.3_2.x86_64.xbps libgbm-devel-24.2.3_2.x86_64.xbps MesaLib-devel-24.2.3_2.x86_64.xbps`
2. `xdowngrade linux-firmware-amd-20241110_1.x86_64.xbps`
3. Using Linux 6.9
4. Making a completely new user.
The last step is very unusual and makes me think it's due to a stale mesa cache somewhere. I tried clearing out my normal user's home directory, and logging on as that user. But I still get the crash.
However with a completely new user, the system is completely stable. This is the configuration I've been using so far. When installing linux6.9, I got these messages:
```
File descriptor 21 (/home/myuser/.cache/mesa_shader_cache_db/part0/mesa_cache.db) leaked on lvs invocation. Parent PID 76833: /bin/sh
```
Which made me think it's that cache. So I removed `~/.cache/` completely but it didn't fix the issue. Only a completely new user does!
Please advise me on the steps to triage this bug and help get it fixed. I have no idea if it's an issue with Linux, amdgpu or mesa.
### Steps to reproduce
Just use my computer. Strangely, if I only use the foot terminal and don't play videos or use the browser, it's fine.
* Re: amdgpu system freeze
2025-01-01 14:42 [ISSUE] amdgpu system freeze narodnik
@ 2025-01-01 22:03 ` TeusLollo
2025-01-01 22:04 ` TeusLollo
` (8 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: TeusLollo @ 2025-01-01 22:03 UTC (permalink / raw)
To: ml
New comment by TeusLollo on void-packages repository
https://github.com/void-linux/void-packages/issues/53787#issuecomment-2567160184
Comment:
The Mesa shader cache is populated by, as the name says, shaders, which are an important, yet secondary, aspect of modern graphics processing.
If shaders were to fail somehow, you'd probably get botched graphics somewhere, but most rendering engines would continue running. Even then, `DWL` should probably just crash and send you back to the console, not freeze in a non-responsive state. The Linux kernel was still running, however, since it responded to REISUB.
A file descriptor leak is when an application using a given file fails to properly close it: https://pradeesh-kumar.medium.com/you-must-be-aware-of-file-descriptor-leaking-600cee607dd6
Indeed, shaders are put into that cache because they need to be pre-generated by rendering engines based on GPU capabilities/compatibility, and caching them allows skipping the generation phase after the first time.
The fact that the file is reported as leaking may not be related to the freezes; it may just be a bug in `DWL`.
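For context on the message quoted in the report: as far as I know, LVM tools such as `lvs` print this warning when they start with descriptors other than stdin/stdout/stderr already open, i.e. inherited from the parent process (`/bin/sh` here). A minimal sketch of that kind of inheritance, assuming a Linux `/proc` filesystem; `child_sees_fd` is a hypothetical helper name and descriptor 9 is an arbitrary choice:

```sh
# Hypothetical helper: start a child shell with FILE held open on fd 9
# and report whether the child can see that descriptor.
child_sees_fd() {
    sh -c 'if [ -e /proc/self/fd/9 ]; then echo inherited; else echo closed; fi' 9<"$1"
}

# The child did not open anything itself, yet fd 9 is there.
child_sees_fd /dev/null
```

The `lvm(8)` man page lists `LVM_SUPPRESS_FD_WARNINGS` for silencing this warning. By itself the warning only says that some process kept the cache file open; it says nothing about the GPU state.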
A couple of questions:
1) After manually deleting the contents of `~/.cache/`, did you actually reboot your entire system? Some shaders can be loaded into memory on userspace start-up, so they could still have been in memory even after their files were deleted (this is especially likely if `DWL` was already running). That would neatly explain why a new user gave no crashes: in that case the system was forced to re-generate and re-load its shaders, which would not happen if you just deleted the cache (pre-loaded shaders stay in memory, and are only cleared after a cache re-generation and a system/application restart).
2) The fact that you get no crashes while not using the browser (modern browsers rely on graphical acceleration) or videos (most media players rely on graphical acceleration, and some even on hardware-based audio acceleration) hints at a problem with graphical acceleration itself. Is there a way to run `DWL` in a software-only (no graphical acceleration) mode? If there is one, and you can confirm that no freezes happen, it may indeed be a bug in `DWL`.
3) I can reproduce fairly reliably some hard WM freezes (Although not "black screen" kind of freezes) with `sway` when I run a WINE/PROTON setup and attempt to utilize some Windows-only applications built in .NET Framework. This may hint at some underlying problem with the AMDGPU driver. Indeed, on Windows, OpenGL support on AMD is said to be botched on several levels, so don't expect the AMDGPU driver to be perfect in all of its components, especially if `DWL` relies on some OpenGL compatibility features, or some more obscure functions which haven't been tested in years. If `DWL` developers do most of their tests on an architecture different than AMDGPU, it wouldn't be too surprising that some unforeseen behaviours manifest on AMDGPU-based setups.
4) Are you absolutely sure that `DWL` generates its own shaders and puts them into `~/.cache/`? (Indeed, does it make any use of shaders at all?) Because, if it does, it should have a built-in routine checking the installed `mesa` version against the compatibility of cached shaders, and a forced re-generation of those shaders if something in the user setup changes (`Mesa` update, `DWL` update with modifications to the graphical acceleration backend, even a change of GPU device). I would be surprised if you were the first to get these problems if there are no safeguards in place.
5) Lastly, if you could try a different WM for some time (I know, it can take weeks to redo a satisfying setup on a different WM), and you can't replicate this, then it's pretty much guaranteed that `DWL` is having a problem.
6) Remember that all applications run from ~ will make use of the Mesa shader cache, so something else may be polluting that cache. `steam`-launched applications, however, should be using their own cache, unless something's changed in the meantime, or you did some funky symlinking in your config.
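Questions 1 and 2 can be tested without touching any files. This is only a sketch under assumptions: `MESA_SHADER_CACHE_DISABLE` and `LIBGL_ALWAYS_SOFTWARE` are Mesa environment variables, `WLR_RENDERER=pixman` is a wlroots one, and since `DWL` is built on wlroots I'm assuming it honours the latter. The two function names are hypothetical.

```sh
# Run a command with Mesa's on-disk shader cache disabled.
run_without_shader_cache() {
    env MESA_SHADER_CACHE_DISABLE=true "$@"
}

# Run a command with software rendering: WLR_RENDERER=pixman asks wlroots
# for its CPU renderer, LIBGL_ALWAYS_SOFTWARE=true asks Mesa to use a
# software GL driver in every process that inherits the environment.
run_software_rendered() {
    env WLR_RENDERER=pixman LIBGL_ALWAYS_SOFTWARE=true "$@"
}

# Example (from a tty, assuming the compositor binary is called dwl):
#   run_without_shader_cache dwl
#   run_software_rendered dwl
```

If the freeze still happens with the cache disabled, the cache is probably not the culprit. If it disappears only with software rendering, that points at the accelerated path (driver, firmware or how the WM uses it).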
* Re: amdgpu system freeze
2025-01-01 14:42 [ISSUE] amdgpu system freeze narodnik
2025-01-01 22:03 ` TeusLollo
2025-01-01 22:04 ` TeusLollo
@ 2025-01-01 22:13 ` TeusLollo
2025-01-02 1:16 ` CaioFrancisco
` (6 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: TeusLollo @ 2025-01-01 22:13 UTC (permalink / raw)
To: ml
New comment by TeusLollo on void-packages repository
https://github.com/void-linux/void-packages/issues/53787#issuecomment-2567160184
Comment:
EDIT: Looking at https://codeberg.org/dwl/dwl/issues/707, it would seem that `DWL` is at least using OpenGL ES 3.0, although AMDGPU support goes up to 4.6 depending on hardware. Maybe `DWL` is built against previous OpenGL versions for compatibility reasons, yet something funky happens when more recent hardware forces compatibility mode for those?
* Re: amdgpu system freeze
2025-01-01 14:42 [ISSUE] amdgpu system freeze narodnik
` (2 preceding siblings ...)
2025-01-01 22:13 ` TeusLollo
@ 2025-01-02 1:16 ` CaioFrancisco
2025-01-02 7:47 ` narodnik
` (5 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: CaioFrancisco @ 2025-01-02 1:16 UTC (permalink / raw)
To: ml
New comment by CaioFrancisco on void-packages repository
https://github.com/void-linux/void-packages/issues/53787#issuecomment-2567210373
Comment:
i too am suffering from this bug. currently running XFCE on X11, i also ran kde plasma X11 some time ago, which also crashed just the same way. weirdly enough, it might just be luck, but kde plasma wayland never crashed on me.
my system specs are a ryzen 5 2400g and an nvidia GTX 1650 GPU, and the crash can happen randomly. i can sometimes do video intensive tasks for hours without a single hiccup, but sometimes i can crash 10 minutes after booting up while using my browser.
as a last note, the dmesg logs have some errors when the system "freezes" (i can still ssh my way in with my phone). they usually are pretty quiet up until something like this happens:
```
[ 422.608105] amdgpu 0000:08:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
[ 422.876854] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:73:crtc-0] hw_done or flip_done timed out
[ 433.117233] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:77:crtc-1] hw_done or flip_done timed out
[ 434.689642] amdgpu 0000:08:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
[ 454.621453] amdgpu 0000:08:00.0: amdgpu: Dumping IP State
```
followed by the watchdog freaking out at the CPU threads getting stuck until i reisub.
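A minimal sketch of pulling just these lines out of the kernel log. `filter_gpu_lines` is a hypothetical helper name, and the pattern is an assumption based on the lines above, not an exhaustive list of GPU-related messages:

```sh
# Keep only kernel log lines that mention the amdgpu driver or the DRM layer.
filter_gpu_lines() {
    grep -E 'amdgpu|\[drm'
}

# Example (reading dmesg may need root):
#   dmesg | filter_gpu_lines
```

Over ssh, the same filter can be applied to whatever the machine still reports while the display is frozen.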
* Re: amdgpu system freeze
2025-01-01 14:42 [ISSUE] amdgpu system freeze narodnik
` (3 preceding siblings ...)
2025-01-02 1:16 ` CaioFrancisco
@ 2025-01-02 7:47 ` narodnik
2025-01-02 16:31 ` TeusLollo
` (4 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: narodnik @ 2025-01-02 7:47 UTC (permalink / raw)
To: ml
New comment by narodnik on void-packages repository
https://github.com/void-linux/void-packages/issues/53787#issuecomment-2567385810
Comment:
Thanks a lot @TeusLollo
1. To test, I just rebooted my system, then in tty cleared `~/.cache/` before launching the original user. The freeze happened again. Now I'm back on my alt user with no problems. I honestly don't think it's the cache and maybe nothing in the home dir at all since I completely cleared it then only moved back the configs I need.
2. Interesting. I didn't realize it could be DWL itself. That's an interesting theory. I wonder why it works for one user and not the other. I'll try to debug if it's the WM.
* Re: amdgpu system freeze
2025-01-01 14:42 [ISSUE] amdgpu system freeze narodnik
` (4 preceding siblings ...)
2025-01-02 7:47 ` narodnik
@ 2025-01-02 16:31 ` TeusLollo
2025-01-02 16:32 ` TeusLollo
` (3 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: TeusLollo @ 2025-01-02 16:31 UTC (permalink / raw)
To: ml
New comment by TeusLollo on void-packages repository
https://github.com/void-linux/void-packages/issues/53787#issuecomment-2568046151
Comment:
> i too am suffering from this bug. currently running XFCE on X11, i also ran kde plasma X11 some time ago, which also crashed just the same way. weirdly enough, it might just be luck, but kde plasma wayland never crashed on me.
>
> my system specs are a ryzen 5 2400g and an nvidia GTX 1650 GPU, and the crash can happen randomly. i can sometimes do video intensive tasks for hours without a single hiccup, but sometimes i can crash 10 minutes after booting up while using my browser.
>
> as a last note, the dmesg logs have some errors when the system "freezes" (i can still ssh my way in with my phone). they usually are pretty quiet up until something like this happens:
>
> ```
> [ 422.608105] amdgpu 0000:08:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
> [ 422.876854] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:73:crtc-0] hw_done or flip_done timed out
> [ 433.117233] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:77:crtc-1] hw_done or flip_done timed out
> [ 434.689642] amdgpu 0000:08:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
> [ 454.621453] amdgpu 0000:08:00.0: amdgpu: Dumping IP State
> ```
>
> followed by the watchdog freaking out at the CPU threads getting stuck until i reisub.
Back when these errors started cropping up, it was speculated that some out-of-sync interactions between `AMDGPU`, `amd-firmware`, and the GPU's clock state were resulting in freezes. But you managed to get those with a CPU-integrated GPU, which is interesting to say the least.
In your case, you may just want to disable your CPU-embedded GPU in BIOS/UEFI and stick with the Nvidia discrete adapter, although its Wayland support is basically not there, and its driver support is in a state of flux: they're open-sourcing their driver, but support is still early.
@narodnik Try and see also if you can get some `dmesg` logs just like CaioFrancisco did.
Also ensure you have set up a `syslog` config, and see if anything comes up there: https://docs.voidlinux.org/config/services/logging.html?highlight=log#logging (Less likely, but you never know what weird cascade effects may crop up)
I know `dmesg` can be a pain to read, but something like the log above hints at some driver problem in `AMDGPU`, which really should be brought up to their engineers. It's not the first time something like this has come up.
Both of you may have found some deep-seated bug.
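Collecting the `dmesg` output suggested above can be sketched as follows, assuming util-linux `dmesg` (for `--follow`). `capture_log` and the output path are hypothetical names chosen for illustration:

```sh
# Hypothetical helper: append the output of a command to a file while
# still showing it on the terminal.
# Usage: capture_log OUTFILE COMMAND [ARGS...]
capture_log() {
    out=$1
    shift
    "$@" | tee -a "$out"
}

# Example (blocks until interrupted; reading dmesg may need root):
#   capture_log "$HOME/dmesg-freeze.log" dmesg --follow
```

Starting this before the freeze (in a spare tty or over ssh) leaves a copy on disk. The S in REISUB asks the kernel to sync before the reboot, which should help the last lines survive.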
* Re: [ISSUE] [CLOSED] amdgpu system freeze
2025-01-01 14:42 [ISSUE] amdgpu system freeze narodnik
` (6 preceding siblings ...)
2025-01-02 16:32 ` TeusLollo
@ 2025-01-08 18:40 ` narodnik
2025-01-08 18:40 ` narodnik
2025-01-13 17:46 ` CaioFrancisco
9 siblings, 0 replies; 11+ messages in thread
From: narodnik @ 2025-01-08 18:40 UTC (permalink / raw)
To: ml
Closed issue by narodnik on void-packages repository
https://github.com/void-linux/void-packages/issues/53787
Description:
### Steps to reproduce
Just use my computer. Strangely, if I only use the foot terminal and don't play videos or use the browser, it's fine, even with the latest linux6.12, mesa and linux-firmware-amd.
* Re: amdgpu system freeze
2025-01-01 14:42 [ISSUE] amdgpu system freeze narodnik
` (7 preceding siblings ...)
2025-01-08 18:40 ` [ISSUE] [CLOSED] " narodnik
@ 2025-01-08 18:40 ` narodnik
2025-01-13 17:46 ` CaioFrancisco
9 siblings, 0 replies; 11+ messages in thread
From: narodnik @ 2025-01-08 18:40 UTC (permalink / raw)
To: ml
New comment by narodnik on void-packages repository
https://github.com/void-linux/void-packages/issues/53787#issuecomment-2578368312
Comment:
I updated my WM to the latest version and the error is gone. I'm now on the latest mesa and Linux with no issues. Sorry about the trouble.
* Re: amdgpu system freeze
2025-01-01 14:42 [ISSUE] amdgpu system freeze narodnik
` (8 preceding siblings ...)
2025-01-08 18:40 ` narodnik
@ 2025-01-13 17:46 ` CaioFrancisco
9 siblings, 0 replies; 11+ messages in thread
From: CaioFrancisco @ 2025-01-13 17:46 UTC (permalink / raw)
To: ml
New comment by CaioFrancisco on void-packages repository
https://github.com/void-linux/void-packages/issues/53787#issuecomment-2587780265
Comment:
i'm still suffering from this bug. in fact, it got worse after i just updated to the latest amd microcode.
it was steadily improving over the last few weeks, with a crash very rarely happening. now it's happening once every hour again. i think i might just stop updating until the source of this bug is figured out.
@narodnik can you please open this issue again?