Linux 7.0 AMDGPU Fixing Idle Power Issue For RDNA4 GPUs After Compute Workloads
#Hardware

Hardware Reporter

A fix is coming to Linux 7.0 for RDNA4 GPUs that stay at 100% GPU usage after compute workloads, causing high idle power consumption.

A fix is headed to the Linux 7.0 kernel today for an idle power issue on AMD RDNA4 GPUs, which can keep reporting high power consumption and full utilization even when "idle" after compute workloads such as Llama.cpp.

The fix stems from a Llama.cpp bug report filed back in November, in which the HIP back-end left the Radeon R9700 GPU at 100% GPU usage after the workload had finished and the GPU should have been idle. The kernel-side fix is coming for at least Linux 7.0, and newer GPU MES firmware is also on the way to address the underlying issue.

The patch notes that current and older MES firmware may cause "abnormal" GPU power consumption on RDNA4/GFX12 hardware when running inference tasks on the GPU with software like ollama / Llama.cpp. The result can be high power use while idle and incorrect GPU load information. AMD is working to publicly release new MES firmware that addresses the issue. In the meantime, the AMDGPU kernel driver has added a check that adjusts the MES oversubscription timer to work around the problem, which helps without a firmware change and covers those stuck on an older firmware version.
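For readers who want to check whether a card shows the symptom, here is a minimal Python sketch that reads the load and power figures the AMDGPU driver exposes through sysfs. It is not taken from the patch. The `card0` path, the 90% threshold, and the helper names are assumptions made for illustration, and the power file name can differ between kernels and ASICs.

```python
from pathlib import Path
from typing import Optional


def read_int(path: Path) -> Optional[int]:
    """Return the integer stored in a sysfs-style file, or None if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def gpu_idle_snapshot(device_dir: Path) -> dict:
    """Collect reported GPU load (%) and board power (W) for one device."""
    busy = read_int(device_dir / "gpu_busy_percent")
    power_w = None
    # hwmon reports power in microwatts; the file name varies by ASIC/kernel,
    # so both common names are tried (an assumption, not an exhaustive list).
    for hwmon in sorted(device_dir.glob("hwmon/hwmon*")):
        for name in ("power1_average", "power1_input"):
            raw = read_int(hwmon / name)
            if raw is not None:
                power_w = raw / 1_000_000
                break
        if power_w is not None:
            break
    return {"busy_percent": busy, "power_watts": power_w}


def looks_stuck(snapshot: dict, busy_threshold: int = 90) -> bool:
    """Heuristic only: high reported load while nothing should be running."""
    busy = snapshot["busy_percent"]
    return busy is not None and busy >= busy_threshold


if __name__ == "__main__":
    # card0 is an assumption; a discrete RDNA4 card may be card1 next to an iGPU.
    snap = gpu_idle_snapshot(Path("/sys/class/drm/card0/device"))
    print(snap, "stuck:", looks_stuck(snap))
```

Run it after an inference job has exited and no other GPU work is active. A load reading that stays near 100% in that state would match the behavior described in the bug report, though the heuristic cannot tell a stuck counter from real background work.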

This idle power fix for compute workloads was sent in via this week's AMDGPU fixes for Linux 7.0 and should be hitting Linux 7.0 Git in the next day or so as part of the broader DRM fixes pull for the week. This week's AMDGPU code also has some SMU13 and SMU14 fixes and other minor fixes.
