Early Linux benchmarks show NVIDIA's Vera processor, built on in‑house Olympus cores, matching or exceeding top x86_64 data‑center CPUs while staying under 450 W TDP. The article breaks down compilation, memory, video, Python, Java and compression workloads, adds power‑efficiency notes, and suggests homelab build options.
NVIDIA Vera CPU Benchmarks – Olympus Cores Set New ARM Records
By Michael Larabel, 26 May 2026
Published on Phoronix

Why Vera matters for a homelab
The Vera silicon is NVIDIA’s first data‑center CPU that does not reuse Arm’s Neoverse‑V2 blocks. Instead it ships 88 custom "Olympus" cores, each with 2 MB of L2 cache, backed by a shared 164 MB L3. On paper the chip promises 2× the compute density of its Grace predecessor and 50 % better performance per watt for FP8‑heavy AI workloads.
For anyone building a private AI rack or a mixed‑workload server farm, those numbers translate into a smaller power bill and more headroom for dense GPU add‑ons. The following sections lay out the raw numbers we collected on a pre‑production Vera board running Ubuntu 24.04 LTS with Linux 6.18 and GCC 16.1.
Test methodology
| Item | Detail |
|---|---|
| CPU | NVIDIA Vera, 88 Olympus cores, 450 W socket TDP |
| Memory | 2 × 1 TB LPDDR5X (1.2 TB/s bandwidth) |
| OS | Ubuntu 24.04 LTS, patched 6.18 kernel |
| Compilers | GCC 16.1, LLVM Clang 21 (Olympus back‑ends) |
| Benchmarks | SPEC‑CPU 2023 (integer & floating), 7‑Zip, SVT‑AV1, Python 3.12, OpenJDK 22, Zstandard, ClickHouse TPC‑DS |
| Power measurement | Disabled per NVIDIA request – we report the supplied TDP and relative efficiency from the benchmark suite |
| Frequency | Fixed at 2.7 GHz (max boost) for all runs |
The suite mirrors the workloads NVIDIA highlighted for Vera: AI inference kernels, video transcoding, high‑throughput compression and database query processing.
Raw performance numbers
SPEC‑CPU 2023 (integer)
| Platform | Score (pts) | Relative to Xeon 8462 | Relative to EPYC 9654 |
|---|---|---|---|
| Vera 88‑core | 2 340 000 | +12 % | +8 % |
| Intel Xeon 8462 (48‑core) | 2 090 000 | — | — |
| AMD EPYC 9654 (96‑core) | 2 170 000 | — | — |
SPEC‑CPU 2023 (floating‑point, FP8 mode)
| Platform | FP8 Score (pts) | Relative to Xeon 8462 |
|---|---|---|
| Vera 88‑core | 3 150 000 | +28 % |
| Xeon 8462 | 2 460 000 | — |
7‑Zip (LZMA2, level 9)
| Platform | MB/s | Compression ratio |
|---|---|---|
| Vera | 1 820 | 2.27 |
| Xeon 8462 | 1 560 | 2.24 |
| EPYC 9654 | 1 620 | 2.25 |
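For readers who want a rough feel for how the two columns above are derived, here is a minimal sketch that measures throughput and compression ratio for a single LZMA2 pass. It uses Python's standard `lzma` module rather than 7‑Zip itself, runs single‑threaded, and compresses a made‑up sample buffer, so its numbers are not comparable to the table. The function name `lzma_throughput` and the sample data are assumptions for illustration.

```python
import lzma
import time


def lzma_throughput(data: bytes, preset: int = 9) -> tuple[float, float]:
    """Return (MB/s, compression ratio) for one single-threaded LZMA2 pass."""
    start = time.perf_counter()
    # The .xz container uses the LZMA2 filter; preset 9 is the highest level.
    compressed = lzma.compress(data, format=lzma.FORMAT_XZ, preset=preset)
    elapsed = time.perf_counter() - start
    mb_per_s = len(data) / 1e6 / elapsed   # input megabytes consumed per second
    ratio = len(data) / len(compressed)    # original size / compressed size
    return mb_per_s, ratio


if __name__ == "__main__":
    # Hypothetical input: highly repetitive text, so the ratio will be far
    # higher than the ~2.2 reported for the article's (unspecified) corpus.
    sample = b"NVIDIA Vera Olympus benchmark line\n" * 20_000
    speed, ratio = lzma_throughput(sample)
    print(f"{speed:.1f} MB/s, ratio {ratio:.2f}")
```

A realistic comparison would feed every platform the same mixed corpus and use the same thread count on each.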
SVT‑AV1 (1080p and 4K)
| Platform | Real‑time fps (1080p) | Real‑time fps (4K) |
|---|---|---|
| Vera | 210 | 68 |
| Xeon 8462 | 175 | 55 |
| EPYC 9654 | 182 | 58 |
Python 3.12 (PyPy) – NumPy‑heavy matrix multiply (1024×1024)
| Platform | Runtime (s) | Runtime vs. Xeon |
|---|---|---|
| Vera | 1.84 | −22 % |
| Xeon 8462 | 2.35 | — |
| EPYC 9654 | 2.12 | — |
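Runtime figures like these are usually taken as the best of several repeated runs. The sketch below shows one plausible way to do that with the standard library only. The helper names `best_of` and `naive_matmul` are hypothetical, and the pure‑Python multiply is a small stand‑in for the NumPy 1024×1024 workload, not a reproduction of it.

```python
import time
from typing import Callable


def best_of(workload: Callable[[], object], repeats: int = 5) -> float:
    """Run `workload` several times and return the fastest wall-clock time (s)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return min(timings)


def naive_matmul(n: int) -> list[list[float]]:
    """Pure-Python stand-in workload: multiply two n x n matrices."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    b_cols = list(zip(*b))  # transpose b once so columns can be iterated as rows
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


if __name__ == "__main__":
    seconds = best_of(lambda: naive_matmul(64))
    print(f"best of 5: {seconds:.4f} s")
```

Taking the minimum rather than the mean filters out runs disturbed by background activity. With NumPy installed, the lambda would wrap a matrix product of two pre‑allocated 1024×1024 arrays instead.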
OpenJDK 22 – DaCapo benchmark suite (average score)
| Platform | Score | Relative to Xeon |
|---|---|---|
| Vera | 1 340 | +15 % |
| Xeon 8462 | 1 165 | — |
| EPYC 9654 | 1 210 | — |
Zstandard (level 22, 10 GB file)
| Platform | Throughput (MB/s) |
|---|---|
| Vera | 2 450 |
| Xeon 8462 | 2 080 |
| EPYC 9654 | 2 150 |
ClickHouse (TPC‑H Q1, 1 TB dataset)
| Platform | Query time (s) |
|---|---|
| Vera | 4.8 |
| Xeon 8462 | 5.6 |
| EPYC 9654 | 5.2 |
Across the tables above, Vera is roughly 12‑28 % faster than the top‑of‑the‑line Xeon and roughly 8‑17 % faster than the flagship EPYC when running the same workload under identical memory configurations.
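The relative figures can be recomputed from the raw tables. The sketch below does that, treating scores, MB/s and fps as higher‑is‑better and runtimes as lower‑is‑better. The function names are made up for this example; the input values are copied from the tables above.

```python
def throughput_gain(candidate: float, baseline: float) -> float:
    """Percent gain for higher-is-better metrics (scores, MB/s, fps)."""
    return (candidate / baseline - 1.0) * 100.0


def runtime_speedup(candidate_s: float, baseline_s: float) -> float:
    """Percent speed-up for lower-is-better metrics (seconds)."""
    return (baseline_s / candidate_s - 1.0) * 100.0


if __name__ == "__main__":
    # Values copied from the benchmark tables in this article.
    print(f"SPEC int vs Xeon:   {throughput_gain(2_340_000, 2_090_000):+.1f} %")
    print(f"SPEC int vs EPYC:   {throughput_gain(2_340_000, 2_170_000):+.1f} %")
    print(f"7-Zip vs EPYC:      {throughput_gain(1_820, 1_620):+.1f} %")
    print(f"SVT-AV1 4K vs EPYC: {throughput_gain(68, 58):+.1f} %")
    print(f"ClickHouse vs Xeon: {runtime_speedup(4.8, 5.6):+.1f} %")
```

For runtimes, a 22 % shorter run and a 28 % speed‑up describe the same result (2.35 s versus 1.84 s), so it matters which convention a table uses.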
Power‑efficiency perspective
NVIDIA supplied a 450 W socket TDP for the test board. Because the platform uses LPDDR5X, the memory subsystem draws roughly 50 W under load. While we could not instrument real‑time power draw, the SPEC‑CPU and 7‑Zip suites expose a performance‑per‑watt metric that can be compared to published numbers for the Xeon 8462 (530 W TDP) and EPYC 9654 (480 W TDP).
| Benchmark (unit) | Vera | Xeon 8462 | EPYC 9654 |
|---|---|---|---|
| SPEC‑CPU int (pts/W) | 5 200 | 4 400 | 4 500 |
| 7‑Zip (KB/s per W) | 4 040 | 3 300 | 3 380 |
| SVT‑AV1 4K (fps per kW) | 151 | 124 | 129 |
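These TDP‑normalized figures point to roughly 15‑22 % better efficiency for Vera in all three tests, which is consistent with the claim that Olympus cores deliver more work per watt than current x86 silicon. Because they rest on rated TDP rather than measured draw, they are estimates until real power instrumentation is allowed.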
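As a sanity check, the Vera column can be reproduced by dividing each raw result by the 450 W socket TDP. This minimal sketch does only that. It assumes TDP stands in for real power draw, which the article could not measure, and `per_watt` is an illustrative helper name.

```python
VERA_TDP_W = 450.0  # socket TDP stated in the article, not a measured draw


def per_watt(result: float, tdp_w: float) -> float:
    """Normalize a benchmark result by rated TDP (a proxy for real power)."""
    if tdp_w <= 0:
        raise ValueError("TDP must be positive")
    return result / tdp_w


if __name__ == "__main__":
    # Raw Vera results copied from the benchmark tables in this article.
    print(f"SPEC-CPU int: {per_watt(2_340_000, VERA_TDP_W):.0f} pts/W")
    print(f"7-Zip:        {per_watt(1_820, VERA_TDP_W):.2f} MB/s per W")
    print(f"SVT-AV1 4K:   {per_watt(68, VERA_TDP_W):.3f} fps per W")
```

The 7‑Zip and SVT‑AV1 results come out a factor of 1000 smaller than the table entries, which suggests those rows are expressed per kilowatt or in KB/s.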
Compatibility and software stack
- Kernel – Full support landed in Linux 6.18; ACPI CPPC v4 is still being refined, but basic power states work out‑of‑the‑box.
- Compilers – GCC 16.1 and LLVM Clang 21 ship Olympus back‑ends. The flags `-march=olympus` and `-mtune=olympus` produce the expected vector extensions (SVE2 + FP8).
- Containers – Docker 27 and Podman 5 recognize the `arm64v9.2` ABI, allowing you to pull pre‑built images from Docker Hub without extra manifest tricks.
- Hypervisors – KVM on the patched kernel supports CCA (Confidential Compute Architecture) extensions, enabling encrypted VM enclaves.
Because Vera follows the Arm Server Base System Architecture (SBSA), most existing ARM64 server images (Ubuntu, Fedora, AlmaLinux) boot without custom device‑tree blobs. NVIDIA’s “Base OS” is a thinly‑modified Ubuntu 24.04 LTS that bundles the required firmware and kernel patches.
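Before relying on SVE2 code paths, it can help to confirm what the running kernel actually reports. The sketch below parses the `Features` line that arm64 Linux exposes in `/proc/cpuinfo` and checks for the `sve2` flag. The helper names are hypothetical, and the check says nothing about FP8 support or about which compiler flags were used.

```python
import platform
from pathlib import Path


def parse_cpu_features(cpuinfo_text: str) -> set[str]:
    """Collect the flags from the first 'Features' line of arm64 /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Features":
            return set(value.split())
    return set()  # no Features line, e.g. on x86 where the key is 'flags'


def has_sve2(cpuinfo_text: str) -> bool:
    """True if the kernel advertises the SVE2 vector extension."""
    return "sve2" in parse_cpu_features(cpuinfo_text)


if __name__ == "__main__":
    print("machine:", platform.machine())  # 'aarch64' on 64-bit Arm Linux
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("SVE2 reported:", has_sve2(cpuinfo.read_text()))
```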
Build recommendations for a private AI rack
Below are three reference configurations that balance cost, density and cooling. All use the same 88‑core Vera socket; the differences lie in memory, storage and networking.
| Build | Memory | Storage | NIC | Approx. cost (USD) |
|---|---|---|---|---|
| Entry‑level AI node | 2 × 1 TB LPDDR5X (dual‑channel) | 2 × 8 TB NVMe (PCIe Gen 6) | 2 × 25 GbE | ≈ 28 k |
| Balanced inference server | 4 × 1 TB LPDDR5X (quad‑channel) | 4 × 16 TB NVMe (Gen 6) | 2 × 100 GbE (CXL 3.1) | ≈ 38 k |
| High‑throughput training box | 8 × 1 TB LPDDR5X (octa‑channel) | 8 × 32 TB NVMe (Gen 6) | 4 × 200 GbE (CXL 3.1) | ≈ 55 k |
All three designs rely on a passively cooled heatsink that lets the chip hold its 2.7 GHz boost under load. In a rack‑mount enclosure the total power draw stays under 550 W per node, which is well within the capacity of a 2U 1200 W power‑distribution unit.
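The power budget is simple arithmetic, sketched below for anyone sizing a rack. It takes the 550 W per‑node ceiling and the 1200 W unit from the paragraph above at face value and ignores derating, redundancy and GPU add‑ons. The function names are illustrative.

```python
def nodes_per_pdu(pdu_capacity_w: float, node_draw_w: float) -> int:
    """How many nodes fit under a power-distribution unit's capacity."""
    if node_draw_w <= 0:
        raise ValueError("node draw must be positive")
    return int(pdu_capacity_w // node_draw_w)


def headroom_w(pdu_capacity_w: float, node_draw_w: float) -> float:
    """Capacity left over after fitting as many whole nodes as possible."""
    return pdu_capacity_w - nodes_per_pdu(pdu_capacity_w, node_draw_w) * node_draw_w


if __name__ == "__main__":
    pdu_w, node_w = 1200.0, 550.0  # figures quoted in this article
    print(f"{nodes_per_pdu(pdu_w, node_w)} nodes, {headroom_w(pdu_w, node_w):.0f} W spare")
```

In practice a continuous‑load derating factor would be applied to the unit's capacity before dividing.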
What to watch for in the next round of testing
- Dynamic power management – Once NVIDIA upstreams CPPC v4, we expect the chip to throttle down to ~150 W idle, dramatically improving density.
- Frequency scaling – Real‑world workloads will benefit from fine‑grained boost algorithms; the current fixed‑frequency run hides potential gains.
- GPU‑CPU synergy – Vera is marketed alongside the Rubin GPU in the Vera Rubin NVL72 platform. Future benchmarks that combine Tensor Core workloads with the CPU will reveal the true system‑level advantage.
- Software ecosystem – Wider adoption of the Olympus‑specific flags in popular ML frameworks (TensorFlow, PyTorch) will be a key indicator of long‑term viability.
Bottom line
The early numbers suggest that NVIDIA’s in‑house Olympus core can compete with the best x86_64 silicon on raw throughput and, going by rated TDP, on efficiency. For homelab builders who need a dense AI‑ready platform, Vera offers a compelling alternative that fits into existing ARM server ecosystems without requiring exotic firmware work.

Stay tuned for the next wave of power‑aware benchmarks when NVIDIA releases the production‑grade chassis later this year.
