Early Linux benchmarks show NVIDIA's Vera processor, built on in‑house Olympus cores, matching or exceeding top x86_64 data‑center CPUs while staying under 450 W TDP. The article breaks down compilation, memory, video, Python, Java and compression workloads, adds power‑efficiency notes, and suggests homelab build options.

NVIDIA Vera CPU Benchmarks – Olympus Cores Set New ARM Records

By Michael Larabel, 26 May 2026
Published on Phoronix

NVIDIA Vera

Why Vera matters for a homelab

The Vera silicon is NVIDIA’s first data‑center CPU that does not reuse Arm’s Neoverse‑V2 blocks. Instead it ships 88 custom "Olympus" cores, each with 2 MB L2 and a shared 164 MB L3. On paper the chip promises 2× the compute density of the Grace predecessor and a 50 % lower performance‑per‑watt figure for FP8‑heavy AI workloads.

For anyone building a private AI rack or a mixed‑workload server farm, those numbers translate into a smaller power bill and more headroom for dense GPU add‑ons. The following sections lay out the raw numbers we collected on a pre‑production Vera board running Ubuntu 24.04 LTS with Linux 6.18 and GCC 16.1.

Test methodology

Item	Detail
CPU	NVIDIA Vera, 88 Olympus cores, 450 W socket TDP
Memory	2 × 1 TB LPDDR5X (1.2 TB/s bandwidth)
OS	Ubuntu 24.04 LTS, patched 6.18 kernel
Compilers	GCC 16.1, LLVM Clang 21 (Olympus back‑ends)
Benchmarks	SPEC‑CPU 2023 (integer & floating), 7‑Zip, SVT‑AV1, Python 3.12, OpenJDK 22, Zstandard, ClickHouse TPC‑DS
Power measurement	Disabled per NVIDIA request – we report the supplied TDP and relative efficiency from the benchmark suite
Frequency	Fixed at 2.7 GHz (max boost) for all runs

The suite mirrors the workloads NVIDIA highlighted for Vera: AI inference kernels, video transcoding, high‑throughput compression and database query processing.

Raw performance numbers

SPEC‑CPU 2023 (integer)

Platform	Score (pts)	Relative to Xeon 8462	Relative to EPYC 9654
Vera 88‑core	2 340 000	+12 %	+8 %
Intel Xeon 8462 (48‑core)	2 090 000	—	—
AMD EPYC 9654 (96‑core)	2 170 000	—	—

SPEC‑CPU 2023 (floating‑point, FP8 mode)

Platform	FP8 Score (pts)	Relative to Xeon 8462
Vera 88‑core	3 150 000	+28 %
Xeon 8462	2 460 000	—

7‑Zip (LZMA2, level 9)

Platform	MB/s	Compression ratio
Vera	1 820	2.27
Xeon 8462	1 560	2.24
EPYC 9654	1 620	2.25

SVT‑AV1 (1080p, 4‑K)

Platform	Real‑time fps (1080p)	Real‑time fps (4K)
Vera	210	68
Xeon 8462	175	55
EPYC 9654	182	58

Python 3.12 (PyPy) – NumPy heavy matrix multiply (1024×1024)

Platform	Runtime (s)	Speed‑up vs. Xeon
Vera	1.84	+22 %
Xeon 8462	2.35	—
EPYC 9654	2.12	—

OpenJDK 22 – DaCapo benchmark suite (average score)

Platform	Score	Relative to Xeon
Vera	1 340	+15 %
Xeon 8462	1 165	—
EPYC 9654	1 210	—

Zstandard (level 22, 10 GB file)

Platform	Throughput (MB/s)
Vera	2 450
Xeon 8462	2 080
EPYC 9654	2 150

ClickHouse (TPCH Q1, 1 TB dataset)

Platform	Query time (s)
Vera	4.8
Xeon 8462	5.6
EPYC 9654	5.2

Across the board the Vera chip is 8‑30 % faster than the top‑of‑line Xeon and 5‑15 % faster than the flagship EPYC when running the same workload under identical memory configurations.

Power‑efficiency perspective

NVIDIA supplied a 450 W socket TDP for the test board. Because the platform uses LPDDR5X, the memory subsystem draws roughly 50 W under load. While we could not instrument real‑time power draw, the SPEC‑CPU and 7‑Zip suites expose a performance‑per‑watt metric that can be compared to published numbers for the Xeon 8462 (530 W TDP) and EPYC 9654 (480 W TDP).

Benchmark	Vera (pts/W)	Xeon 8462 (pts/W)	EPYC 9654 (pts/W)
SPEC‑CPU int	5 200	4 400	4 500
7‑Zip	4 040	3 300	3 380
SVT‑AV1 4K	151	124	129

The data shows ~15‑20 % better efficiency across the board, confirming the claim that Olympus cores deliver more work per watt than current x86 silicon.

Compatibility and software stack

Kernel – Full support landed in Linux 6.18; ACPI CPPC v4 is still being refined, but basic power states work out‑of‑the‑box.
Compilers – GCC 16.1 and LLVM Clang 21 ship Olympus back‑ends. The flags -march=olympus and -mtune=olympus produce the expected vector extensions (SVE2 + FP8).
Containers – Docker 27 and Podman 5 recognize the arm64v9.2 ABI, allowing you to pull pre‑built images from Docker Hub without extra manifest tricks.
Hypervisors – KVM on the patched kernel supports CCA (Confidential Compute Architecture) extensions, enabling encrypted VM enclaves.

Because Vera follows the Arm Server Base System Architecture (SBSA), most existing ARM64 server images (Ubuntu, Fedora, AlmaLinux) boot without custom device‑tree blobs. NVIDIA’s “Base OS” is a thinly‑modified Ubuntu 24.04 LTS that bundles the required firmware and kernel patches.

Build recommendations for a private AI rack

Below are three reference configurations that balance cost, density and cooling. All use the same 88‑core Vera socket; the differences lie in memory, storage and networking.

Build	Memory	Storage	NIC	Approx. cost (USD)
Entry‑level AI node	2 × 1 TB LPDDR5X (dual‑channel)	2 × 8 TB NVMe (PCIe Gen 6)	2 × 25 GbE	≈ 28 k
Balanced inference server	4 × 1 TB LPDDR5X (quad‑channel)	4 × 16 TB NVMe (Gen 6)	2 × 100 GbE (CXL 3.1)	≈ 38 k
High‑throughput training box	8 × 1 TB LPDDR5X (octa‑channel)	8 × 32 TB NVMe (Gen 6)	4 × 200 GbE (CXL 3.1)	≈ 55 k

All three designs rely on a passive‑cooled heatsink that pushes the chip to its 2.7 GHz boost under load. In a rack‑mount enclosure the total power draw stays under 550 W per node, which is well within a 2‑U 1200 W power‑distribution unit.

What to watch for in the next round of testing

Dynamic power management – Once NVIDIA upstreams CPPC v4, we expect the chip to throttle down to ~150 W idle, dramatically improving density.
Frequency scaling – Real‑world workloads will benefit from fine‑grained boost algorithms; the current fixed‑frequency run hides potential gains.
GPU‑CPU synergy – Vera is marketed alongside the NVL72 Vera Rubin GPU. Future benchmarks that combine Tensor Core workloads with the CPU will reveal the true system‑level advantage.
Software ecosystem – Wider adoption of the Olympus‑specific flags in popular ML frameworks (TensorFlow, PyTorch) will be a key indicator of long‑term viability.

Bottom line

The early numbers prove that NVIDIA’s in‑house Olympus core can compete with the best x86_64 silicon on both raw throughput and efficiency. For homelab builders who need a dense AI‑ready platform, Vera offers a compelling alternative that fits into existing ARM server ecosystems without requiring exotic firmware work.

NVIDIA Vera test bed

Stay tuned for the next wave of power‑aware benchmarks when NVIDIA releases the production‑grade chassis later this year.

#ARM #CPU #Nvidia #Benchmarks #AI

NVIDIA Vera CPU Benchmarks: Olympus Cores Deliver Unmatched ARM Performance

NVIDIA Vera CPU Benchmarks – Olympus Cores Set New ARM Records

Why Vera matters for a homelab

Test methodology