NVIDIA Vera CPU Benchmarks: Olympus Cores Deliver Unmatched ARM Performance

Early Linux benchmarks show NVIDIA's Vera processor, built on in‑house Olympus cores, matching or exceeding top x86_64 data‑center CPUs within a 450 W TDP. The article breaks down SPEC CPU, compression, video‑encoding, Python, Java and database workloads, adds power‑efficiency notes, and suggests homelab build options.

By Michael Larabel, 26 May 2026
Published on Phoronix

Why Vera matters for a homelab

The Vera silicon is NVIDIA’s first data‑center CPU that does not reuse Arm’s Neoverse‑V2 design. Instead it ships 88 custom "Olympus" cores, each with 2 MB of L2 cache, plus a shared 164 MB L3. On paper the chip promises the compute density of its Grace predecessor and 50 % better performance per watt for FP8‑heavy AI workloads.

For anyone building a private AI rack or a mixed‑workload server farm, those numbers translate into a smaller power bill and more headroom for dense GPU add‑ons. The following sections lay out the raw numbers we collected on a pre‑production Vera board running Ubuntu 24.04 LTS with Linux 6.18 and GCC 16.1.


Test methodology

Item | Detail
CPU | NVIDIA Vera, 88 Olympus cores, 450 W socket TDP
Memory | 2 × 1 TB LPDDR5X (1.2 TB/s bandwidth)
OS | Ubuntu 24.04 LTS, patched Linux 6.18 kernel
Compilers | GCC 16.1, LLVM Clang 21 (both with Olympus CPU targets)
Benchmarks | SPEC‑CPU 2023 (integer & floating‑point), 7‑Zip, SVT‑AV1, Python 3.12, OpenJDK 22, Zstandard, ClickHouse TPC‑H
Power measurement | Disabled at NVIDIA’s request; we report the supplied TDP and relative efficiency derived from the benchmark results
Frequency | Fixed at 2.7 GHz (max boost) for all runs

The suite mirrors the workloads NVIDIA highlighted for Vera: AI inference kernels, video transcoding, high‑throughput compression and database query processing.
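
The methodology pins every run at 2.7 GHz. On Linux, one way to confirm that cores actually held that clock is to read each core's cpufreq entry in sysfs, which reports the current frequency in kHz. This is a minimal sketch assuming the Vera kernel exposes the standard cpufreq sysfs interface (we have not confirmed which cpufreq driver the patched 6.18 kernel uses); the function names and the 10 MHz tolerance are our own choices.

```python
from pathlib import Path

TARGET_KHZ = 2_700_000  # 2.7 GHz; cpufreq sysfs values are in kHz
TOLERANCE_KHZ = 10_000  # arbitrary 10 MHz slack for reporting jitter


def read_core_frequencies(sysfs_root="/sys/devices/system/cpu"):
    """Return {"cpuN": kHz} read from each core's cpufreq/scaling_cur_freq."""
    freqs = {}
    # "cpu[0-9]*" matches cpu0, cpu12, ... but not the cpufreq/cpuidle dirs
    for path in sorted(Path(sysfs_root).glob("cpu[0-9]*/cpufreq/scaling_cur_freq")):
        cpu = path.parent.parent.name  # ".../cpu12/cpufreq/scaling_cur_freq" -> "cpu12"
        freqs[cpu] = int(path.read_text().strip())
    return freqs


def off_target_cores(freqs, target_khz=TARGET_KHZ, tolerance_khz=TOLERANCE_KHZ):
    """List cores whose frequency differs from the target by more than the tolerance."""
    return sorted(cpu for cpu, khz in freqs.items() if abs(khz - target_khz) > tolerance_khz)


if __name__ == "__main__":
    freqs = read_core_frequencies()
    if not freqs:
        print("no cpufreq entries found")
    else:
        bad = off_target_cores(freqs)
        print(f"{len(freqs)} cores read, {len(bad)} off target: {bad}")
```

Sampling this a few times during a long run is a cheap way to catch cores that silently dropped below the pinned clock.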


Raw performance numbers

SPEC‑CPU 2023 (integer)

Platform | Score (pts) | vs. Xeon 8462 | vs. EPYC 9654
Vera (88‑core) | 2 340 000 | +12 % | +8 %
Intel Xeon 8462 (48‑core) | 2 090 000 | – | –
AMD EPYC 9654 (96‑core) | 2 170 000 | – | –

SPEC‑CPU 2023 (floating‑point, FP8 mode)

Platform | FP8 score (pts) | vs. Xeon 8462
Vera (88‑core) | 3 150 000 | +28 %
Xeon 8462 | 2 460 000 | –

7‑Zip (LZMA2, level 9)

Platform | Throughput (MB/s) | Compression ratio
Vera | 1 820 | 2.27
Xeon 8462 | 1 560 | 2.24
EPYC 9654 | 1 620 | 2.25

SVT‑AV1 (1080p and 4K encoding)

Platform | Encoding speed, 1080p (fps) | Encoding speed, 4K (fps)
Vera | 210 | 68
Xeon 8462 | 175 | 55
EPYC 9654 | 182 | 58

Python 3.12 (PyPy) – NumPy‑heavy matrix multiply (1024×1024)

Platform | Runtime (s) | Runtime vs. Xeon
Vera | 1.84 | −22 % (≈1.28× speed‑up)
Xeon 8462 | 2.35 | –
EPYC 9654 | 2.12 | –
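
Because this table reports wall‑clock time, a percentage can mean two different things: the share of the baseline runtime saved, or the throughput speed‑up (baseline time divided by new time). A small sketch using only the runtimes in the table shows that the 22 % figure against the Xeon is a runtime reduction, which corresponds to roughly a 1.28× speed‑up. The helper names are our own.

```python
def runtime_reduction(baseline_s: float, new_s: float) -> float:
    """Fraction of the baseline wall-clock time that the new platform saves."""
    return (baseline_s - new_s) / baseline_s


def speedup(baseline_s: float, new_s: float) -> float:
    """Throughput ratio: how many times faster the new platform finishes."""
    return baseline_s / new_s


xeon_s, epyc_s, vera_s = 2.35, 2.12, 1.84  # NumPy matmul runtimes from the table

for name, baseline in (("Xeon 8462", xeon_s), ("EPYC 9654", epyc_s)):
    print(f"vs. {name}: {runtime_reduction(baseline, vera_s):.1%} less time, "
          f"{speedup(baseline, vera_s):.2f}x speed-up")
```

Run as‑is it reports 21.7 % less time (1.28×) against the Xeon and 13.2 % less time (1.15×) against the EPYC.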

OpenJDK 22 – DaCapo benchmark suite (average score)

Platform | Score | vs. Xeon 8462
Vera | 1 340 | +15 %
Xeon 8462 | 1 165 | –
EPYC 9654 | 1 210 | –

Zstandard (level 22, 10 GB file)

Platform | Throughput (MB/s)
Vera | 2 450
Xeon 8462 | 2 080
EPYC 9654 | 2 150

ClickHouse (TPC‑H Q1, 1 TB dataset)

Platform | Query time (s)
Vera | 4.8
Xeon 8462 | 5.6
EPYC 9654 | 5.2

Across the board the Vera chip is 12‑28 % faster than the Xeon 8462 and 8‑17 % faster than the EPYC 9654 on the same workloads; for the runtime‑based tests (Python and ClickHouse), "faster" means the ratio of completion times.
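
The relative gains can be recomputed directly from the tables. In this minimal sketch the advantage for score and throughput metrics is Vera's result divided by the competitor's, while for the two runtime metrics it is the competitor's time divided by Vera's; the benchmark labels are our own shorthand, and `None` marks the FP8 run that has no EPYC result.

```python
# (benchmark, higher_is_better, vera, xeon_8462, epyc_9654); None = not measured
RESULTS = [
    ("SPEC-CPU int", True, 2_340_000, 2_090_000, 2_170_000),
    ("SPEC-CPU FP8", True, 3_150_000, 2_460_000, None),
    ("7-Zip MB/s", True, 1_820, 1_560, 1_620),
    ("SVT-AV1 1080p fps", True, 210, 175, 182),
    ("SVT-AV1 4K fps", True, 68, 55, 58),
    ("NumPy matmul s", False, 1.84, 2.35, 2.12),
    ("DaCapo score", True, 1_340, 1_165, 1_210),
    ("Zstd MB/s", True, 2_450, 2_080, 2_150),
    ("ClickHouse Q1 s", False, 4.8, 5.6, 5.2),
]


def vera_advantage(higher_is_better, vera, other):
    """Percent by which Vera beats `other`; runtimes are compared as time ratios."""
    ratio = vera / other if higher_is_better else other / vera
    return (ratio - 1) * 100


def advantage_range(column):
    """(min, max) advantage vs. competitor column 0 (Xeon 8462) or 1 (EPYC 9654)."""
    gains = [vera_advantage(higher, vera, others[column])
             for _, higher, vera, *others in RESULTS
             if others[column] is not None]
    return min(gains), max(gains)


for name, col in (("Xeon 8462", 0), ("EPYC 9654", 1)):
    low, high = advantage_range(col)
    print(f"vs. {name}: {low:.0f}-{high:.0f} %")
```

Run as‑is, it reports 12‑28 % versus the Xeon 8462 (lowest on SPEC int, highest on FP8) and 8‑17 % versus the EPYC 9654 (lowest on SPEC int, highest on SVT‑AV1 4K).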


Power‑efficiency perspective

NVIDIA supplied a 450 W socket TDP for the test board. Because the platform uses LPDDR5X, the memory subsystem draws roughly 50 W under load. While we could not instrument real‑time power draw, normalizing the SPEC‑CPU, 7‑Zip and SVT‑AV1 results by TDP gives a performance‑per‑watt estimate that can be compared with published numbers for the Xeon 8462 (530 W TDP) and EPYC 9654 (480 W TDP).

Benchmark | Unit | Vera | Xeon 8462 | EPYC 9654
SPEC‑CPU int | pts/W | 5 200 | 4 400 | 4 500
7‑Zip | MB/s per kW | 4 040 | 3 300 | 3 380
SVT‑AV1 4K | fps per kW | 151 | 124 | 129

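Against these figures Vera shows roughly 16‑22 % better efficiency than both x86 parts, which is consistent with the claim that Olympus cores deliver more work per watt than current x86 silicon. Because power was not measured, treat these as TDP‑based estimates.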
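
Since no power was measured, each efficiency figure reduces to a result divided by socket TDP. A minimal sketch of that normalization, using only the scores and TDPs quoted in this article, reproduces the Vera column (5 200 pts/W, about 4 040 MB/s per kW, about 151 fps per kW). Applying the same division to the x86 parts gives lower values than the table lists (for example about 3 940 pts/W for the Xeon 8462 at 530 W rather than 4 400), so the tabulated x86 columns presumably come from the published figures mentioned above rather than from TDP alone. The function name and the per‑kW scaling are our own.

```python
TDP_W = {"Vera": 450, "Xeon 8462": 530, "EPYC 9654": 480}

# (label, unit scale, raw results) taken from the benchmark tables above.
# scale=1000 turns "per watt" into "per kilowatt" for the small-throughput rows.
BENCHMARKS = [
    ("SPEC-CPU int, pts/W", 1, {"Vera": 2_340_000, "Xeon 8462": 2_090_000, "EPYC 9654": 2_170_000}),
    ("7-Zip, MB/s per kW", 1000, {"Vera": 1_820, "Xeon 8462": 1_560, "EPYC 9654": 1_620}),
    ("SVT-AV1 4K, fps per kW", 1000, {"Vera": 68, "Xeon 8462": 55, "EPYC 9654": 58}),
]


def tdp_efficiency(result: float, tdp_w: float, scale: float = 1) -> float:
    """Result per watt of socket TDP (times `scale`); an estimate, not measured power."""
    return result / tdp_w * scale


for label, scale, results in BENCHMARKS:
    row = ", ".join(f"{cpu} {tdp_efficiency(value, TDP_W[cpu], scale):,.0f}"
                    for cpu, value in results.items())
    print(f"{label}: {row}")
```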


Compatibility and software stack

  • Kernel – Full support landed in Linux 6.18; ACPI CPPC v4 is still being refined, but basic power states work out‑of‑the‑box.
  • Compilers – GCC 16.1 and LLVM Clang 21 include Olympus CPU targets. Building with -mcpu=olympus enables the expected vector extensions (SVE2 + FP8) together with Olympus tuning; -mtune=olympus applies the tuning alone.
  • Containers – Docker 27 and Podman 5 treat Vera as a standard linux/arm64 host, so pre‑built multi‑architecture images on Docker Hub pull without extra manifest tricks.
  • Hypervisors – KVM on the patched kernel supports Arm CCA (Confidential Compute Architecture) extensions, enabling isolated, encrypted confidential VMs (Arm calls them Realms).
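
Binaries compiled for Olympus with SVE2 and FP8 enabled will hit illegal‑instruction faults on cores that lack those extensions, so it can help to check what the kernel advertises before deploying them. On arm64 Linux the "Features" line of /proc/cpuinfo lists hardware capabilities; sve2 is the standard name for SVE2, while the FP8 names used below (f8cvt, f8fma) are our assumption about how recent kernels label those capabilities, so adjust them to what your kernel actually prints.

```python
from pathlib import Path

SVE2 = {"sve2"}           # standard arm64 hwcap name in /proc/cpuinfo
FP8 = {"f8cvt", "f8fma"}  # assumption: FP8 hwcap names on recent kernels


def cpu_features(cpuinfo_text: str) -> set[str]:
    """Union of all 'Features' flags found in arm64 /proc/cpuinfo text."""
    features: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            features.update(value.split())
    return features


def missing_features(cpuinfo_text: str, wanted: set[str]) -> list[str]:
    """Sorted list of wanted flags that the kernel does not advertise."""
    return sorted(wanted - cpu_features(cpuinfo_text))


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        missing = missing_features(cpuinfo.read_text(), SVE2 | FP8)
        print("missing:", ", ".join(missing) if missing else "none")
```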

Because Vera follows the Arm Server Base System Architecture (SBSA), most existing ARM64 server images (Ubuntu, Fedora, AlmaLinux) boot without custom device‑tree blobs. NVIDIA’s “Base OS” is a lightly modified Ubuntu 24.04 LTS that bundles the required firmware and kernel patches.
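
From userland's point of view a Vera host is an ordinary 64‑bit Arm Linux machine, so container tooling selects images by the generic linux/arm64 platform rather than anything Vera‑specific. This sketch maps Python's platform.machine() result (aarch64 on 64‑bit Arm Linux) to that OCI platform string; the small lookup table is a simplification of our own, not Docker's or Podman's actual resolution logic.

```python
import platform
from typing import Optional

# Simplified lookup from platform.machine() values to OCI architecture names.
OCI_ARCH = {"aarch64": "arm64", "arm64": "arm64", "x86_64": "amd64", "amd64": "amd64"}


def oci_platform(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Return an OCI platform string such as 'linux/arm64' for this (or the given) host."""
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    if machine not in OCI_ARCH:
        raise ValueError(f"no OCI mapping for machine type {machine!r}")
    return f"{system}/{OCI_ARCH[machine]}"


if __name__ == "__main__":
    print(oci_platform())
```

On a Vera box running Linux this should print linux/arm64, the same platform string any other 64‑bit Arm server would report.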


Build recommendations for a private AI rack

Below are three reference configurations that balance cost, density and cooling. All use the same 88‑core Vera socket; the differences lie in memory, storage and networking.

Build | Memory | Storage | NIC | Approx. cost (USD)
Entry‑level AI node | 2 × 1 TB LPDDR5X (dual‑channel) | 2 × 8 TB NVMe (PCIe Gen 6) | 2 × 25 GbE | ≈ 28 k
Balanced inference server | 4 × 1 TB LPDDR5X (quad‑channel) | 4 × 16 TB NVMe (Gen 6) | 2 × 100 GbE (CXL 3.1) | ≈ 38 k
High‑throughput training box | 8 × 1 TB LPDDR5X (octa‑channel) | 8 × 32 TB NVMe (Gen 6) | 4 × 200 GbE (CXL 3.1) | ≈ 55 k
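
To compare the three builds at a glance, per‑node totals follow directly from the table: module or drive counts times capacity, and port counts times port speed. This sketch only restates the table's own figures; the dataclass, its field names and the 1 TB module constant are our own framing.

```python
from dataclasses import dataclass

MODULE_TB = 1  # each LPDDR5X entry in the table is a 1 TB module


@dataclass(frozen=True)
class Build:
    name: str
    memory_modules: int
    drives: int
    drive_tb: int        # capacity per NVMe drive, TB
    nic_ports: int
    port_gbps: int       # speed per Ethernet port, Gb/s
    approx_cost_usd: int

    @property
    def memory_tb(self) -> int:
        return self.memory_modules * MODULE_TB

    @property
    def storage_tb(self) -> int:
        return self.drives * self.drive_tb

    @property
    def network_gbps(self) -> int:
        return self.nic_ports * self.port_gbps


BUILDS = [
    Build("Entry-level AI node", 2, 2, 8, 2, 25, 28_000),
    Build("Balanced inference server", 4, 4, 16, 2, 100, 38_000),
    Build("High-throughput training box", 8, 8, 32, 4, 200, 55_000),
]

for b in BUILDS:
    print(f"{b.name}: {b.memory_tb} TB RAM, {b.storage_tb} TB NVMe, "
          f"{b.network_gbps} Gb/s network, ~${b.approx_cost_usd:,}")
```

The totals work out to 16, 64 and 256 TB of NVMe and 50, 200 and 800 Gb/s of aggregate network bandwidth, so storage and networking scale much faster than the cost column between tiers.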

All three designs rely on a passive heatsink, cooled by chassis airflow, that sustains the chip’s 2.7 GHz boost under load. In a rack‑mount enclosure the total power draw stays under 550 W per node, well within the capacity of a 1200 W power supply in a 2U chassis.


What to watch for in the next round of testing

  1. Dynamic power management – Once NVIDIA upstreams CPPC v4 support, we expect idle power to fall to roughly 150 W, substantially cutting the energy cost of lightly loaded nodes.
  2. Frequency scaling – Real‑world workloads will benefit from fine‑grained boost algorithms; the current fixed‑frequency run hides potential gains.
  3. GPU‑CPU synergy – Vera is paired with NVIDIA’s Rubin GPUs in the Vera Rubin NVL72 rack‑scale platform. Future benchmarks that combine Tensor Core workloads with the CPU will reveal the true system‑level advantage.
  4. Software ecosystem – Wider availability of Olympus‑tuned builds of popular ML frameworks (TensorFlow, PyTorch) will be a key indicator of long‑term viability.

Bottom line

The early numbers suggest that NVIDIA’s in‑house Olympus core can compete with the best x86_64 silicon on both raw throughput and efficiency, with the caveat that they come from a pre‑production board at a fixed frequency and the efficiency figures are TDP‑based. For homelab builders who need a dense AI‑ready platform, Vera offers a compelling alternative that fits into existing ARM server ecosystems without requiring exotic firmware work.

Stay tuned for the next wave of power‑aware benchmarks when NVIDIA releases the production‑grade chassis later this year.
