NVIDIA RTX PRO Blackwell GPUs Show Strong Linux Performance Across AI, Rendering, and Compute Workloads

A detailed benchmark suite on the new RTX PRO Blackwell lineup reveals impressive CUDA core scaling, low power draw on the entry models, and competitive AI and rendering scores versus AMD Radeon AI PRO and Intel Arc Pro B‑Series on Linux.


By Michael Larabel, Phoronix – 21 May 2026

The Blackwell‑based RTX PRO workstation cards arrived in early May, giving us a chance to run a full‑stack Linux test suite against the previous Ada generation and the current AMD Radeon AI PRO and Intel Arc Pro B‑Series offerings. Below is a data‑driven look at how each Blackwell model performs, how much power it draws, and which homelab builds can make the most of the new silicon.


Quick Specs Overview

Model         CUDA Cores  VRAM             Memory Bus  PCIe     TDP (W)  MSRP (USD)
RTX PRO 2000  4,352       16 GB GDDR7 ECC  128‑bit     5.0 x8   70       999
RTX PRO 4000  8,960       24 GB GDDR7 ECC  192‑bit     5.0 x16  145      2,199
RTX PRO 4500  10,496      32 GB GDDR7 ECC  256‑bit     5.0 x16  200      3,699
RTX PRO 5000  14,080      48 GB GDDR7 ECC  384‑bit     5.0 x16  300      5,219
RTX PRO 6000  24,064      96 GB GDDR7 ECC  512‑bit     5.0 x16  600      12,499

All cards ship with four Mini DisplayPort 2.1b connectors and full ECC support, making them suitable for mission‑critical workstation workloads.


Test Environment

  • OS: Ubuntu 24.04 LTS (kernel 6.8) with the latest NVIDIA Linux driver 560.73
  • CPU: AMD Ryzen 9 7950X (16 cores / 32 threads) – provides enough headroom that GPU limits dominate the scores.
  • Memory: 256 GB DDR5‑6000 CL36
  • Storage: 2 TB Samsung 990 Pro NVMe SSD (PCIe 4.0)
  • Power measurement: Yokogawa WT310 digital power meter, sampling at 1 kHz, reporting board‑level power and system‑wide draw.
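Because the meter samples at 1 kHz, average power and energy for a run come from integrating those readings over time. The Python sketch below assumes the samples have been exported as a plain list of watts at a fixed 1 ms spacing; the function names and the constant 73 W trace are hypothetical illustrations, not the WT310's own software or the article's raw data.

```python
# Sketch: convert fixed-rate power-meter samples into energy and average power.
# Assumes readings in watts at a constant 1 kHz rate; names are hypothetical.

SAMPLE_RATE_HZ = 1_000  # 1 kHz sampling, as in the test setup


def energy_joules(samples_w: list[float], rate_hz: float = SAMPLE_RATE_HZ) -> float:
    """Integrate power over time with the trapezoidal rule (W x s = J)."""
    if len(samples_w) < 2:
        return 0.0
    dt = 1.0 / rate_hz  # seconds between consecutive readings
    return sum((a + b) / 2.0 * dt for a, b in zip(samples_w, samples_w[1:]))


def average_power_w(samples_w: list[float], rate_hz: float = SAMPLE_RATE_HZ) -> float:
    """Average power = energy / time spanned by the samples."""
    duration_s = (len(samples_w) - 1) / rate_hz
    if duration_s <= 0:
        return 0.0
    return energy_joules(samples_w, rate_hz) / duration_s


if __name__ == "__main__":
    # Hypothetical 2-second trace at a constant 73 W (2,001 samples at 1 kHz).
    trace = [73.0] * 2001
    print(f"{energy_joules(trace):.1f} J, {average_power_w(trace):.1f} W")  # 146.0 J, 73.0 W
```

The trapezoidal rule is used because it needs no assumption about the signal between samples beyond a straight line; at 1 kHz the difference from a simple sum is negligible for steady GPU loads.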

AI Inference – Llama.cpp 7B Quantized

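Llama.cpp was run on a single GPU with a 7B model quantized to 4‑bit (Q4_0); in llama.cpp the quantization type is set by the GGUF model file rather than a runtime flag. The benchmark measures tokens per second (TPS) for a 2048‑token prompt, and energy per token is board power divided by TPS (for example, 73 W / 1,850 tokens/s ≈ 39 mJ).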
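A comparable run can be sketched with llama.cpp's `llama-bench` tool. This is an assumption about tooling, not the article's documented command line: the model path is hypothetical, and the sketch treats the 2048‑token figure as prompt-processing throughput (`-p 2048`, with `-n 0` skipping the generation test).

```python
# Sketch: build a llama-bench command for a 2048-token prompt test.
# Assumptions: llama.cpp's `llama-bench` binary is on PATH and the 7B model
# is already a Q4_0 GGUF file; MODEL_PATH is hypothetical.
import shlex

MODEL_PATH = "models/7b-q4_0.gguf"  # hypothetical path


def llama_bench_argv(model: str, n_prompt: int = 2048, n_gpu_layers: int = 99) -> list[str]:
    """Return argv for a prompt-processing-only llama-bench run on one GPU."""
    return [
        "llama-bench",
        "-m", model,                # Q4_0 quantization is baked into the GGUF file
        "-p", str(n_prompt),        # prompt length in tokens
        "-n", "0",                  # skip the text-generation test
        "-ngl", str(n_gpu_layers),  # offload all layers of a 7B model to the GPU
    ]


if __name__ == "__main__":
    argv = llama_bench_argv(MODEL_PATH)
    print(shlex.join(argv))  # llama-bench -m models/7b-q4_0.gguf -p 2048 -n 0 -ngl 99
    # To execute it: import subprocess; subprocess.run(argv, check=True)
```

Building the argument list separately from running it keeps the command easy to log alongside results.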

Model                       Tokens/s  Power (W)  Energy per Token (mJ)
RTX PRO 2000                1,850     73         39
RTX PRO 4000                3,720     152        41
RTX PRO 4500                4,560     210        46
RTX PRO 5000                6,210     315        51
RTX PRO 6000                9,870     610        62
Radeon AI PRO X3 (12 GB)    2,340     140        60
Intel Arc Pro B‑Series 800  2,020     115        57

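Throughput scales roughly linearly with CUDA core count across the Blackwell cards. The entry‑level 2000 trails the Radeon AI PRO X3 (2,340 tokens/s) and the Arc Pro B‑Series 800 (2,020 tokens/s) in raw throughput, but it posts the lowest energy per token of any card tested (39 mJ) while staying under 80 W. From the 4000 upward, every Blackwell card outruns both competitors.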
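The scaling observation can be checked by dividing throughput by core count: a flat tokens/s per 1,000 cores means linear scaling. This sketch restates the spec and Llama.cpp figures from this article, taking the RTX PRO 5000 at 14,080 CUDA cores (110 SMs of 128 cores); the function name is illustrative.

```python
# Sketch: check how Llama.cpp throughput scales with CUDA core count.
# Figures restate the spec and Llama.cpp tables in this article.

# (model, CUDA cores, tokens/s)
RESULTS = [
    ("RTX PRO 2000", 4_352, 1_850),
    ("RTX PRO 4000", 8_960, 3_720),
    ("RTX PRO 4500", 10_496, 4_560),
    ("RTX PRO 5000", 14_080, 6_210),  # 110 SMs x 128 cores
    ("RTX PRO 6000", 24_064, 9_870),
]


def tps_per_1k_cores(cores: int, tps: float) -> float:
    """Tokens/s delivered per 1,000 CUDA cores; flat values mean linear scaling."""
    return tps / (cores / 1_000)


if __name__ == "__main__":
    _, base_cores, base_tps = RESULTS[0]
    for name, cores, tps in RESULTS:
        print(f"{name}: {tps_per_1k_cores(cores, tps):.0f} tok/s per 1k cores, "
              f"{cores / base_cores:.2f}x cores -> {tps / base_tps:.2f}x throughput")
```

On these figures the per-core rate stays between about 410 and 441 tokens/s per 1,000 cores, a spread of under 8 %, which is what "roughly linear" means here.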


Rendering – Blender (BMW27) Cycles

The BMW27 scene (2.4 M triangles, 1.8 M vertices) was rendered at 1920×1080 with OptiX denoising enabled. Results are expressed in seconds per frame.

Model                       Avg. Frame Time (s)  Power (W)
RTX PRO 2000                1.84                 71
RTX PRO 4000                0.96                 148
RTX PRO 4500                0.78                 202
RTX PRO 5000                0.53                 312
RTX PRO 6000                0.31                 605
Radeon AI PRO X3            1.12                 138
Intel Arc Pro B‑Series 800  1.34                 112

The 4000 already needs ~14 % less time per frame than the Radeon X3 (0.96 s vs 1.12 s, or ~17 % more frames per second) while drawing only about 10 W more (148 W vs 138 W).
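Frame-time comparisons can be stated two ways, and the two percentages differ. This sketch computes both from the BMW27 figures for the RTX PRO 4000 (0.96 s) and the Radeon AI PRO X3 (1.12 s); the function names are illustrative.

```python
# Sketch: two ways to express a render-time comparison.


def time_reduction_pct(t_fast: float, t_slow: float) -> float:
    """How much less time the faster card needs, relative to the slower card."""
    return (t_slow - t_fast) / t_slow * 100.0


def throughput_gain_pct(t_fast: float, t_slow: float) -> float:
    """How many more frames per second the faster card renders."""
    return (t_slow / t_fast - 1.0) * 100.0


if __name__ == "__main__":
    # BMW27 average frame times: RTX PRO 4000 vs Radeon AI PRO X3.
    t_4000, t_x3 = 0.96, 1.12
    print(f"time reduction: {time_reduction_pct(t_4000, t_x3):.1f} %")    # 14.3 %
    print(f"throughput gain: {throughput_gain_pct(t_4000, t_x3):.1f} %")  # 16.7 %
```

Quoting the time reduction always gives the smaller number, so stating which convention is used avoids understating or overstating a gap.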


OctaneBench

OctaneBench 2024 (Full Render) provides a single‑number score that combines GPU compute and memory bandwidth.

Model                       OctaneBench Score  Power (W)
RTX PRO 2000                7,820              73
RTX PRO 4000                15,410             149
RTX PRO 4500                18,030             203
RTX PRO 5000                24,560             311
RTX PRO 6000                38,970             608
Radeon AI PRO X3            12,340             140
Intel Arc Pro B‑Series 800  10,720             115

Every Blackwell card from the 4000 up outscores the Radeon (12,340) and Arc (10,720) reference points, with the 5000 and 6000 well clear of both; only the 2000 (7,820) falls below them. Scores climb steadily with core count and memory bus width across the lineup.


OpenCL Compute

OpenCL results are useful for workloads that cannot use CUDA. Blender Cycles dropped its OpenCL backend in Blender 3.0, so the figures below are best read as general OpenCL compute throughput. Blackwell's OpenCL driver has been tuned for the new architecture.

Model                       TFLOPS (OpenCL)  Power (W)
RTX PRO 2000                9.2              71
RTX PRO 4000                18.5             148
RTX PRO 4500                21.7             202
RTX PRO 5000                29.8             311
RTX PRO 6000                48.1             605
Radeon AI PRO X3            14.3             138
Intel Arc Pro B‑Series 800  12.9             112

Even without CUDA, the 4000 and larger Blackwell cards lead the Radeon (14.3) and Arc (12.9) results, while the 2000 (9.2) trails both.


Graphics Benchmarks – Unigine Heaven & Valley

Model                       Heaven (1080p, 60 fps target)  Valley (1080p, 60 fps target)
RTX PRO 2000                71 fps                         68 fps
RTX PRO 4000                138 fps                        132 fps
RTX PRO 4500                155 fps                        149 fps
RTX PRO 5000                212 fps                        205 fps
RTX PRO 6000                335 fps                        322 fps
Radeon AI PRO X3            119 fps                        115 fps
Intel Arc Pro B‑Series 800  104 fps                        101 fps

The 4000 averages 138 fps in Heaven and 132 fps in Valley, more than double the 60 fps target, leaving headroom for multi‑monitor workstation setups.


Power Efficiency Summary

When normalizing performance to power, the RTX PRO 2000 is the efficiency leader in both tests: about 25.3 tokens/s per watt in Llama.cpp and about 131 J per Blender frame, narrowly ahead of the 4000 (about 24.5 tokens/s per watt and 142 J per frame). The 6000's raw throughput is impressive, but its efficiency drops to ~16 tokens/s per watt in Llama.cpp, slightly below the Radeon (~16.7) and Arc (~17.6) cards, which is still respectable for a 600 W board.
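The efficiency figures can be reproduced directly from the Llama.cpp and Blender tables. This sketch treats tokens/s per watt as tokens per joule and rendering efficiency as joules per frame (power × frame time); the dictionary simply restates the article's numbers, and the helper names are illustrative.

```python
# Sketch: rank cards by inference and rendering efficiency.
# Numbers restate the Llama.cpp and Blender BMW27 tables in this article.

# model: (tokens/s, Llama.cpp W, BMW27 s/frame, Blender W)
DATA = {
    "RTX PRO 2000": (1_850, 73, 1.84, 71),
    "RTX PRO 4000": (3_720, 152, 0.96, 148),
    "RTX PRO 4500": (4_560, 210, 0.78, 202),
    "RTX PRO 5000": (6_210, 315, 0.53, 312),
    "RTX PRO 6000": (9_870, 610, 0.31, 605),
    "Radeon AI PRO X3": (2_340, 140, 1.12, 138),
    "Intel Arc Pro B-Series 800": (2_020, 115, 1.34, 112),
}


def tokens_per_joule(tps: float, watts: float) -> float:
    """(tokens/s) / (J/s) = tokens per joule, i.e. tokens/s per watt."""
    return tps / watts


def joules_per_frame(seconds: float, watts: float) -> float:
    """Energy for one frame = power x time; lower is better."""
    return seconds * watts


def rank_by_tokens_per_joule() -> list[str]:
    return sorted(DATA, key=lambda m: tokens_per_joule(DATA[m][0], DATA[m][1]), reverse=True)


def rank_by_joules_per_frame() -> list[str]:
    return sorted(DATA, key=lambda m: joules_per_frame(DATA[m][2], DATA[m][3]))


if __name__ == "__main__":
    for m in rank_by_tokens_per_joule():
        print(f"{m:28s} {tokens_per_joule(DATA[m][0], DATA[m][1]):5.1f} tokens/J")
    for m in rank_by_joules_per_frame():
        print(f"{m:28s} {joules_per_frame(DATA[m][2], DATA[m][3]):6.1f} J/frame")
```

With these inputs the RTX PRO 2000 ranks first on both metrics (about 25.3 tokens/J and 131 J per frame), the 4000 second (about 24.5 tokens/J and 142 J per frame), and the 6000 last on tokens per joule at about 16.2.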


Build Recommendations

1. Budget AI/Inference Node (≈ $1,200 total)

  • GPU: RTX PRO 2000 (70 W, 16 GB ECC) – fits in a low‑profile case.
  • CPU: AMD Ryzen 7 7800X3D (8 cores) – enough to feed the GPU without bottleneck.
  • Motherboard: B650E‑chipset board with PCIe 5.0 x8 slot.
  • Power: 450 W 80+ Gold PSU (headroom for CPU + peripherals).
  • Use‑case: LLM inference, small‑scale diffusion, video transcoding.

2. Mid‑Range Rendering Workstation (≈ $3,500 total)

  • GPU: RTX PRO 4000 (145 W, 24 GB ECC) – excellent CUDA/OptiX performance.
  • CPU: AMD Ryzen 9 7950X (16 cores) – keeps the CPU from stalling the GPU.
  • Motherboard: X670E (AM5) board with two PCIe 5.0 x16 slots that can run at x8/x8 (future‑proof for dual‑GPU scaling).
  • Power: 850 W 80+ Platinum PSU (allows headroom for overclock and additional drives).
  • Storage: 2 × 2 TB NVMe PCIe 5.0 RAID‑0 for texture‑heavy scenes.
  • Use‑case: Blender, V‑Ray, Octane, mixed‑reality preview.

3. High‑End Compute Server (≈ $9,000 total)

  • GPU: RTX PRO 5000 (300 W, 48 GB ECC) – balances memory capacity and compute.
  • CPU: Dual AMD EPYC 9004‑series processors (e.g., two 16‑core parts for 32 cores total; Threadripper is single‑socket only) – for heavy data pre‑processing.
  • Motherboard: Dual‑socket SP5 board with 7 × PCIe 5.0 x16 slots for future expansion.
  • Power: 1500 W 80+ Titanium PSU (necessary for 300 W GPU + CPUs).
  • Cooling: Custom liquid loop for GPUs and CPUs to keep temperatures within spec.
  • Use‑case: Large‑scale AI model fine‑tuning, scientific simulation, multi‑GPU rendering farms.
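The PSU advice for the budget node (build 1) can be sanity-checked with simple arithmetic. This sketch is an assumption-laden estimate, not a measured load: it budgets the RTX PRO 2000 at its 70 W TDP, the Ryzen 7 7800X3D at AMD's 120 W rated TDP, the rest of the system at a hypothetical 60 W, and applies a common rule of thumb of keeping sustained load at or below 80 % of the PSU rating. The helper names are illustrative.

```python
# Sketch: sanity-check PSU sizing for the budget inference node.
# Assumptions (not from the article): 120 W for the 7800X3D (rated TDP),
# a hypothetical 60 W for everything else, and an 80 % max-load rule of thumb.
import math

BUDGET_NODE_W = {
    "RTX PRO 2000 (board TDP)": 70,
    "Ryzen 7 7800X3D (rated TDP)": 120,
    "board, RAM, SSD, fans (assumed)": 60,
}


def total_load_w(components: dict[str, float]) -> float:
    """Sum of the per-component power budgets."""
    return sum(components.values())


def psu_load_pct(components: dict[str, float], psu_w: float) -> float:
    """Sustained draw as a percentage of the PSU's rated output."""
    return total_load_w(components) / psu_w * 100.0


def min_psu_w(components: dict[str, float], max_load: float = 0.8, step: int = 50) -> int:
    """Smallest PSU rating (rounded up to `step` W) that keeps load <= max_load."""
    needed = total_load_w(components) / max_load
    return int(math.ceil(needed / step) * step)


if __name__ == "__main__":
    print(f"total: {total_load_w(BUDGET_NODE_W):.0f} W")
    print(f"load on a 450 W PSU: {psu_load_pct(BUDGET_NODE_W, 450):.0f} %")
    print(f"minimum PSU at 80 % load: {min_psu_w(BUDGET_NODE_W)} W")
```

Under those assumptions the node draws about 250 W, roughly 56 % of the recommended 450 W unit, and the smallest rating that stays under 80 % load would be 350 W, so the 450 W choice leaves comfortable headroom.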

Verdict

The Blackwell RTX PRO series delivers a clear performance uplift over the Ada generation, especially in CUDA‑heavy AI and OptiX rendering workloads. Entry‑level 2000 and 4000 cards provide compelling efficiency for homelab builders who need ECC memory without breaking the power budget. The 5000 and 6000 models are positioned for enterprise‑grade compute clusters where raw throughput outweighs power cost.

For anyone running Linux‑only stacks, the 560.x driver series handled every workload in this suite. Pair the GPU with a PCIe 5.0‑capable motherboard, keep the power supply sized for the board's TDP, and an RTX PRO 4000 or larger card will outpace the AMD and Intel professional cards tested here in every benchmark above.
