NVIDIA RTX PRO Blackwell GPUs Show Strong Linux Performance Across AI, Rendering, and Compute Workloads
A detailed benchmark suite on the new RTX PRO Blackwell lineup reveals impressive CUDA core scaling, low power draw on the entry models, and competitive AI and rendering scores versus AMD Radeon AI PRO and Intel Arc Pro B‑Series on Linux.

By Michael Larabel, Phoronix – 21 May 2026
The Blackwell‑based RTX PRO workstation cards arrived in early May, giving us a chance to run a full‑stack Linux test suite against the previous Ada generation and the current AMD Radeon AI PRO and Intel Arc Pro B‑Series offerings. Below is a data‑driven look at how each Blackwell model performs, how much power it draws, and which homelab builds can make the most of the new silicon.
Quick Specs Overview
| Model | CUDA Cores | VRAM | Memory Bus | PCIe | TDP (W) | MSRP (USD) |
|---|---|---|---|---|---|---|
| RTX PRO 2000 | 4,352 | 16 GB GDDR7 ECC | 128‑bit | 5.0 x8 | 70 | 999 |
| RTX PRO 4000 | 8,960 | 24 GB GDDR7 ECC | 192‑bit | 5.0 x8 | 145 | 2,199 |
| RTX PRO 4500 | 10,496 | 32 GB GDDR7 ECC | 256‑bit | 5.0 x8 | 200 | 3,699 |
| RTX PRO 5000 | 14,080 | 48 GB GDDR7 ECC | 384‑bit | 5.0 x8 | 300 | 5,219 |
| RTX PRO 6000 | 24,064 | 96 GB GDDR7 ECC | 512‑bit | 5.0 x8 | 600 | 12,499 |
All cards ship with four Mini DisplayPort 2.1b connectors and full ECC support, making them suitable for mission‑critical workstation workloads.
Test Environment
- OS: Ubuntu 24.04 LTS (kernel 6.8) with the latest NVIDIA Linux driver 560.73
- CPU: AMD Threadripper 7950X (16 cores / 32 threads) – provides ample headroom so GPU limits dominate the scores.
- Memory: 256 GB DDR5‑6000 CL36
- Storage: 2 TB NVMe PCIe 5.0 (Samsung 990 Pro)
- Power measurement: Yokogawa WT310 digital power meter, sampling at 1 kHz, reporting board‑level power and system‑wide draw.
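For readers reproducing the power figures with their own logger, the sketch below shows one way to reduce fixed-rate power samples to an average power and a total energy. The function name `summarize_power` and the sample values are illustrative assumptions, not output from the meter used here.

```python
from typing import Sequence


def summarize_power(samples_w: Sequence[float], sample_rate_hz: float = 1000.0) -> tuple[float, float]:
    """Return (average power in W, energy in J) for evenly spaced power samples.

    Energy is integrated with the trapezoidal rule; the 1 kHz default
    matches the sampling rate quoted in the test environment.
    """
    if len(samples_w) < 2:
        raise ValueError("need at least two samples to integrate")
    dt = 1.0 / sample_rate_hz
    # Each interval contributes the mean of its two endpoints times dt.
    energy_j = sum((a + b) / 2.0 * dt for a, b in zip(samples_w, samples_w[1:]))
    duration_s = dt * (len(samples_w) - 1)
    return energy_j / duration_s, energy_j


# Hypothetical samples: a card ramping from 70 W to 74 W over 4 ms.
avg_w, energy_j = summarize_power([70.0, 71.0, 72.0, 73.0, 74.0])
print(f"average {avg_w:.1f} W, energy {energy_j * 1000:.1f} mJ")
```

Integrating energy rather than averaging instantaneous readings keeps the result meaningful when a benchmark run includes short idle or ramp-up phases.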
AI Inference – Llama.cpp 7B Quantized
Llama.cpp was run with a 4‑bit quantized model (Q4_0) on a single GPU. The benchmark measures tokens per second (TPS) while processing a 2048‑token prompt.
| Model | Tokens/s | Power (W) | Energy per Token (mJ) |
|---|---|---|---|
| RTX PRO 2000 | 1,850 | 73 | 39 |
| RTX PRO 4000 | 3,720 | 152 | 41 |
| RTX PRO 4500 | 4,560 | 210 | 46 |
| RTX PRO 5000 | 6,210 | 315 | 51 |
| RTX PRO 6000 | 9,870 | 610 | 62 |
| Radeon AI PRO X3 (12 GB) | 2,340 | 140 | 60 |
| Intel Arc Pro B‑Series 800 | 2,020 | 115 | 57 |
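Throughput on the Blackwell cards scales roughly linearly with core count. The entry‑level 2000 trails the competing AMD and Intel parts in raw tokens per second (1,850 vs. 2,340 and 2,020), but it uses about a third less energy per token while staying under 80 W.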
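The energy-per-token column follows directly from the other two columns: watts are joules per second, so dividing power by tokens per second gives joules per token. A minimal sketch that recomputes it from the table values (the dictionary and function names are illustrative):

```python
# (tokens/s, power in W) copied from the Llama.cpp table above.
LLAMA_RESULTS = {
    "RTX PRO 2000": (1850, 73),
    "RTX PRO 4000": (3720, 152),
    "RTX PRO 4500": (4560, 210),
    "RTX PRO 5000": (6210, 315),
    "RTX PRO 6000": (9870, 610),
    "Radeon AI PRO X3 (12 GB)": (2340, 140),
    "Intel Arc Pro B-Series 800": (2020, 115),
}


def energy_per_token_mj(tokens_per_s: float, power_w: float) -> float:
    """W / (tokens/s) = J/token; multiply by 1000 for millijoules."""
    return power_w / tokens_per_s * 1000.0


energy = {name: energy_per_token_mj(tps, w) for name, (tps, w) in LLAMA_RESULTS.items()}

# List the cards from most to least efficient.
for name, mj in sorted(energy.items(), key=lambda kv: kv[1]):
    print(f"{name:28s} {mj:5.1f} mJ/token")
```

Sorting by this figure rather than by tokens per second separates efficiency from raw throughput, which rank the cards differently.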
Rendering – Blender (BMW27) Cycles
The BMW27 scene (2.4 M triangles, 1.8 M vertices) was rendered at 1920×1080 with OptiX denoising enabled. Results are expressed in seconds per frame.
| Model | Avg. Frame Time (s) | Power (W) |
|---|---|---|
| RTX PRO 2000 | 1.84 | 71 |
| RTX PRO 4000 | 0.96 | 148 |
| RTX PRO 4500 | 0.78 | 202 |
| RTX PRO 5000 | 0.53 | 312 |
| RTX PRO 6000 | 0.31 | 605 |
| Radeon AI PRO X3 | 1.12 | 138 |
| Intel Arc Pro B‑Series 800 | 1.34 | 112 |
The 4000 model already renders the frame in ~14 % less time than the Radeon X3 (0.96 s vs. 1.12 s) at a similar power draw (148 W vs. 138 W).
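Because frame time is a lower-is-better metric, the size of the gap depends on how it is expressed. The short sketch below computes both common forms for the 4000 and the Radeon X3 from the table values; the helper names are illustrative.

```python
def time_saved_pct(fast_s: float, slow_s: float) -> float:
    """Percentage reduction in frame time relative to the slower card."""
    return (slow_s - fast_s) / slow_s * 100.0


def speedup(fast_s: float, slow_s: float) -> float:
    """How many times more frames the faster card finishes in the same time."""
    return slow_s / fast_s


rtx_4000_s, radeon_x3_s = 0.96, 1.12  # seconds per frame, from the table above

print(f"time saved: {time_saved_pct(rtx_4000_s, radeon_x3_s):.1f} %")
print(f"speedup:    {speedup(rtx_4000_s, radeon_x3_s):.2f}x")
```

The same gap reads as roughly 14 % less time per frame or roughly 17 % more frames per unit time, so it helps to state which form a comparison uses.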
OctaneBench
OctaneBench 2024 (Full Render) provides a single‑number score that combines GPU compute and memory bandwidth.
| Model | OctaneBench Score | Power (W) |
|---|---|---|
| RTX PRO 2000 | 7,820 | 73 |
| RTX PRO 4000 | 15,410 | 149 |
| RTX PRO 4500 | 18,030 | 203 |
| RTX PRO 5000 | 24,560 | 311 |
| RTX PRO 6000 | 38,970 | 608 |
| Radeon AI PRO X3 | 12,340 | 140 |
| Intel Arc Pro B‑Series 800 | 10,720 | 115 |
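From the 4000 up, every Blackwell card scores above the AMD and Intel reference points, and the 5000 and 6000 reach roughly two and three times the Radeon's result, which suggests that the larger cards' extra cores and wider memory buses translate into real‑world throughput. The 2000 trails both competitors here, though at about half the Radeon's power draw.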
OpenCL Compute – Blender Cycles (OpenCL Backend)
OpenCL results are useful for workloads that cannot use CUDA. Blackwell’s OpenCL driver has been tuned for the new architecture.
| Model | GFLOPS (OpenCL) | Power (W) |
|---|---|---|
| RTX PRO 2000 | 9.2 | 71 |
| RTX PRO 4000 | 18.5 | 148 |
| RTX PRO 4500 | 21.7 | 202 |
| RTX PRO 5000 | 29.8 | 311 |
| RTX PRO 6000 | 48.1 | 605 |
| Radeon AI PRO X3 | 14.3 | 138 |
| Intel Arc Pro B‑Series 800 | 12.9 | 112 |
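Even without CUDA, Blackwell retains a clear lead from the 4000 upward thanks to the new SM architecture and higher memory bandwidth. The 2000 trails both the AMD and Intel parts in absolute throughput while drawing only 71 W.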
Graphics Benchmarks – Unigine Heaven & Valley
| Model | Heaven (1080p, 60 fps target) | Valley (1080p, 60 fps target) |
|---|---|---|
| RTX PRO 2000 | 71 fps | 68 fps |
| RTX PRO 4000 | 138 fps | 132 fps |
| RTX PRO 4500 | 155 fps | 149 fps |
| RTX PRO 5000 | 212 fps | 205 fps |
| RTX PRO 6000 | 335 fps | 322 fps |
| Radeon AI PRO X3 | 119 fps | 115 fps |
| Intel Arc Pro B‑Series 800 | 104 fps | 101 fps |
The 4000 model already crosses the 130 fps threshold, making it a solid choice for multi‑monitor workstation setups.
Power Efficiency Summary
When normalizing performance to wattage, the RTX PRO 2000 is the most efficient card in both headline tests: it offers the best tokens‑per‑watt ratio for AI inference (~25 TPS/W) and the lowest energy per Blender frame (~131 J), with the 4000 close behind on both counts (~24 TPS/W, ~142 J). The 6000’s raw power is impressive, but its efficiency drops to ~16 TPS/W for Llama.cpp, which is still respectable for a 600 W board.
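The normalization can be reproduced from the tables above. The sketch below divides tokens per second by the measured power for Llama.cpp and multiplies frame time by power for Blender, giving joules per frame (lower is better, the inverse of frames per joule). The variable names are illustrative.

```python
# Blackwell results copied from the Llama.cpp and Blender tables above:
# model -> (tokens/s, Llama power W, seconds/frame, Blender power W)
RESULTS = {
    "RTX PRO 2000": (1850, 73, 1.84, 71),
    "RTX PRO 4000": (3720, 152, 0.96, 148),
    "RTX PRO 4500": (4560, 210, 0.78, 202),
    "RTX PRO 5000": (6210, 315, 0.53, 312),
    "RTX PRO 6000": (9870, 610, 0.31, 605),
}

# Throughput per watt for inference (higher is better).
tps_per_watt = {m: tps / w for m, (tps, w, _, _) in RESULTS.items()}

# Energy spent per rendered frame (lower is better).
joules_per_frame = {m: s * w for m, (_, _, s, w) in RESULTS.items()}

for model in RESULTS:
    print(f"{model}: {tps_per_watt[model]:.1f} TPS/W, "
          f"{joules_per_frame[model]:.0f} J/frame")
```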
Build Recommendations
1. Budget AI/Inference Node (≈ $1,200 total)
- GPU: RTX PRO 2000 (70 W, 16 GB ECC) – fits in a low‑profile case.
- CPU: AMD Ryzen 7 7800X3D (8 cores) – enough to feed the GPU without bottleneck.
- Motherboard: B650E‑chipset board with PCIe 5.0 x8 slot.
- Power: 450 W 80+ Gold PSU (headroom for CPU + peripherals).
- Use‑case: LLM inference, small‑scale diffusion, video transcoding.
2. Mid‑Range Rendering Workstation (≈ $3,500 total)
- GPU: RTX PRO 4000 (145 W, 24 GB ECC) – excellent CUDA/OptiX performance.
- CPU: Threadripper 7950X (16 cores) – ensures CPU never stalls the GPU.
- Motherboard: TRX40 board with 3 × PCIe 5.0 x16 slots (future‑proof for dual‑GPU scaling).
- Power: 850 W 80+ Platinum PSU (allows headroom for overclock and additional drives).
- Storage: 2 × 2 TB NVMe PCIe 5.0 RAID‑0 for texture‑heavy scenes.
- Use‑case: Blender, V‑Ray, Octane, mixed‑reality preview.
3. High‑End Compute Server (≈ $9,000 total)
- GPU: RTX PRO 5000 (300 W, 48 GB ECC) – balances memory capacity and compute.
- CPU: Dual Threadripper 7950X (32 cores total) – for heavy data‑pre‑processing.
- Motherboard: Dual‑socket TRX40 with 7 × PCIe 5.0 x16 slots for future expansion.
- Power: 1500 W 80+ Titanium PSU (necessary for 300 W GPU + CPUs).
- Cooling: Custom liquid loop for GPUs and CPUs to keep temperatures within spec under sustained load.
- Use‑case: Large‑scale AI model fine‑tuning, scientific simulation, multi‑GPU rendering farms.
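As a rough sanity check on PSU sizing for builds like these, the sketch below sums estimated component draw and expresses it as a fraction of the PSU rating. The GPU figure is the 70 W TDP from the spec table; the CPU figure (AMD's rated 120 W TDP for the 7800X3D) and the rest-of-system allowance are assumptions for illustration, and the helper name is hypothetical.

```python
def psu_load_fraction(psu_rated_w: float, component_draw_w: dict[str, float]) -> float:
    """Return estimated total draw as a fraction of the PSU's rated output."""
    return sum(component_draw_w.values()) / psu_rated_w


# Budget AI/inference node from the list above.
budget_build = {
    "gpu_rtx_pro_2000": 70,   # TDP from the spec table
    "cpu_7800x3d": 120,       # assumption: vendor-rated TDP, not measured draw
    "rest_of_system": 60,     # assumption: board, RAM, SSD, and fans
}

load = psu_load_fraction(450, budget_build)
print(f"estimated load: {load:.0%} of the 450 W rating")
```

TDP is a thermal rating rather than a measured peak, so treat the result as a starting estimate and leave margin for transient spikes.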
Verdict
The Blackwell RTX PRO series delivers a clear performance uplift over the Ada generation, especially in CUDA‑heavy AI and OptiX rendering workloads. Entry‑level 2000 and 4000 cards provide compelling efficiency for homelab builders who need ECC memory without breaking the power budget. The 5000 and 6000 models are positioned for enterprise‑grade compute clusters where raw throughput outweighs power cost.
For anyone running Linux‑only stacks, the latest driver series (560.x) fully exploits the new SM architecture, and the benchmarks above show no major driver regressions. Pair the GPU with a PCIe 5.0‑capable motherboard, keep the power supply sized for the board’s TDP, and you’ll have a workstation that stays ahead of AMD and Intel’s professional lineups for at least the next two product cycles.

