Google Cloud N4 Series Benchmarks: Axion vs. Xeon vs. EPYC Performance Showdown

Hardware Reporter

Google Cloud's new N4A instances with custom Axion ARM64 processors face off against Intel Xeon Emerald Rapids and AMD EPYC Turin in comprehensive 16 vCPU benchmarks, revealing performance-per-dollar advantages and architectural trade-offs.

Google Cloud recently expanded its N4 series with the launch of N4A instances powered by the company's in-house Axion ARM64 processors. These custom-designed chips represent Google's push into vertical integration, following in the footsteps of other cloud giants developing proprietary silicon. The N4A series promises significant performance improvements over previous-generation ARM64 offerings, but how do they stack up against the established x86 competition?

To find out, I benchmarked 16 vCPU instances across all three N4 variants: the N4A with Google Axion, the N4 with Intel Xeon Platinum 8581C Emerald Rapids, and the N4D with AMD EPYC 9B45 Turin. Each instance was configured with 400 GB of storage and ran Ubuntu 25.10 to provide a modern, optimized software stack. Testing took place in Google Cloud's Iowa region, and the performance-per-dollar metrics use the pricing current at the time of testing.

Architecture and Configuration

The three processors reach 16 vCPUs in two different ways. The Intel N4 and AMD N4D instances each expose eight physical cores with two hardware threads per core: Hyper-Threading on the Xeon and Simultaneous Multi-Threading (SMT) on the EPYC. In contrast, the Axion-powered N4A provides 16 physical cores and has no SMT, so every vCPU maps to a full core.

This architectural difference has implications for both performance and pricing. The N4A instance costs $0.71 per hour, the N4D EPYC VM $0.77 per hour, and the N4 Xeon instance $0.82 per hour. These prices suggest that Google is positioning Axion as the cost-effective option, particularly for workloads that scale well across full physical cores rather than SMT threads.
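To make the core-count difference concrete, the short Python sketch below derives physical cores and the hourly price per physical core from the figures quoted in this article. The dictionary layout and function names are illustrative only and are not part of any Google Cloud API.

```python
# Figures quoted in this article: vCPUs, hardware threads per core, hourly price (USD).
INSTANCES = {
    "N4A (Axion)": {"vcpus": 16, "threads_per_core": 1, "usd_per_hour": 0.71},
    "N4D (EPYC Turin)": {"vcpus": 16, "threads_per_core": 2, "usd_per_hour": 0.77},
    "N4 (Xeon Emerald Rapids)": {"vcpus": 16, "threads_per_core": 2, "usd_per_hour": 0.82},
}


def physical_cores(spec):
    """Physical cores behind the vCPU count (vCPUs divided by threads per core)."""
    return spec["vcpus"] // spec["threads_per_core"]


def usd_per_core_hour(spec):
    """Hourly price divided by the number of physical cores."""
    return spec["usd_per_hour"] / physical_cores(spec)


for name, spec in INSTANCES.items():
    cores = physical_cores(spec)
    print(f"{name}: {cores} physical cores, ${usd_per_core_hour(spec):.4f} per core-hour")
```

Price per physical core says nothing about how fast each core is. It only shows that the N4A delivers twice as many physical cores for a lower hourly price.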

Performance Across Workloads

The benchmark suite covered a wide range of workloads to evaluate performance across different scenarios:

Computational Finance and HPC: QuantLib benchmarks revealed the strengths and weaknesses of each architecture. The Axion processor showed competitive performance in certain financial calculations, though the x86 processors maintained advantages in specific algorithmic patterns that benefit from SMT.

Data Processing: JSON parsing benchmarks highlighted the Axion's efficiency in handling structured data formats. The custom ARM64 design appears optimized for modern data processing workloads, with the N4A showing particularly strong results in parsing throughput.

Security and Cryptography: Crypto benchmarks demonstrated that all three processors handle encryption and decryption tasks competently, though with different performance characteristics. The Axion's dedicated cryptographic instructions showed promise in certain algorithms.

Media Encoding: AV1 and x265 video encoding tests revealed interesting trade-offs. While the x86 processors maintained leads in some encoding scenarios, the Axion showed competitive performance-per-watt characteristics, suggesting potential advantages for large-scale encoding operations.

Compression and Compilation: 7-Zip compression benchmarks and code compilation tests provided insights into general-purpose computing performance. The results varied significantly based on workload characteristics, with some tests favoring the raw core count of Axion and others benefiting from the threading capabilities of x86 processors.

Database Performance: ClickHouse, CockroachDB, and PostgreSQL benchmarks evaluated database server performance. These tests are particularly relevant for cloud workloads, as many cloud applications rely heavily on database operations. The Axion showed competitive performance in read-heavy workloads while the x86 processors maintained advantages in complex query scenarios.

Web Serving: Nginx HTTPS web server benchmarks tested the processors' ability to handle concurrent connections and SSL/TLS termination. The results varied based on connection patterns and payload sizes, with each architecture showing strengths in different scenarios.

AI/ML Workloads: Llama.cpp benchmarks provided early insights into how these processors handle AI inference workloads. As ARM64 processors gain traction in the AI space, these results offer valuable data points for organizations considering ARM-based infrastructure for machine learning applications.

Performance-Per-Dollar Analysis

The pricing differential between the three instance types creates interesting economic considerations. At $0.71 per hour for N4A, $0.77 for N4D, and $0.82 for N4, the Axion instance is roughly 13% cheaper than the Xeon and roughly 8% cheaper than the EPYC at equivalent vCPU counts.
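The percentages follow directly from the hourly prices. A minimal Python sketch of the arithmetic, with the saving expressed relative to the more expensive instance (the function name is just for illustration):

```python
# Hourly prices (USD) quoted in this article.
PRICES = {"N4A": 0.71, "N4D": 0.77, "N4": 0.82}


def savings_pct(cheaper, pricier):
    """Percent saved by choosing `cheaper`, relative to the `pricier` price."""
    return (pricier - cheaper) / pricier * 100


print(f"N4A vs N4:  {savings_pct(PRICES['N4A'], PRICES['N4']):.1f}% cheaper")
print(f"N4A vs N4D: {savings_pct(PRICES['N4A'], PRICES['N4D']):.1f}% cheaper")
```

This works out to about 13.4% against the Xeon and 7.8% against the EPYC.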

However, performance-per-dollar calculations reveal a more nuanced picture. In some workloads, the Axion's performance advantage combined with its lower price creates compelling economics. In others, the x86 processors' superior performance in specific scenarios can justify their higher cost.

For example, in database-heavy workloads where the Axion shows competitive performance, the cost savings can be significant at scale. Conversely, for workloads that heavily benefit from SMT or specific x86 optimizations, the higher-priced instances may deliver better overall value despite their increased hourly cost.
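One way to reason about this trade-off is to ask how much faster a pricier instance must be before it ties the N4A on performance-per-dollar. The Python sketch below computes that break-even factor from the hourly prices. The benchmark scores at the end are hypothetical placeholders for illustration, not measured results, and the sketch assumes a higher score is better.

```python
def perf_per_dollar(score, usd_per_hour):
    """Benchmark score per dollar-hour; assumes a higher score is better."""
    return score / usd_per_hour


def break_even_speedup(price, baseline_price):
    """Factor by which an instance must outscore the baseline to tie on perf-per-dollar."""
    return price / baseline_price


# Hourly prices (USD) quoted in this article.
N4A, N4D, N4 = 0.71, 0.77, 0.82

print(f"N4 must score {break_even_speedup(N4, N4A):.3f}x the N4A result to tie")
print(f"N4D must score {break_even_speedup(N4D, N4A):.3f}x the N4A result to tie")

# Hypothetical scores for illustration only, NOT measured results.
hypothetical = {"N4A": (100.0, N4A), "N4": (110.0, N4)}
for name, (score, price) in hypothetical.items():
    print(f"{name}: {perf_per_dollar(score, price):.1f} points per dollar-hour")
```

At these prices the Xeon instance needs a lead of roughly 15% and the EPYC instance a lead of roughly 8% over the N4A on a given benchmark just to break even. In the hypothetical case above, a 10% faster Xeon would still trail the N4A on performance-per-dollar.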

Architectural Implications

The benchmarking reveals several important architectural considerations:

Core Count vs. Threading: The Axion's approach of using 16 physical cores without SMT represents a different philosophy from the x86 processors' use of threading technologies. This design choice appears to prioritize predictable performance and potentially better power efficiency over the ability to handle more threads.

Instruction Set Optimization: The custom nature of the Axion processor allows Google to tune the design for its target workloads. The strong performance in data processing and certain computational tasks suggests successful optimization efforts.

Software Ecosystem Maturity: While ARM64 support has improved dramatically in recent years, some workloads still show performance gaps compared to x86. This highlights the ongoing importance of software optimization for ARM architectures.

Use Case Recommendations

Based on the benchmark results, different processors emerge as optimal choices for specific use cases:

Cost-Sensitive Workloads: For organizations prioritizing cost efficiency and running workloads that scale well across physical cores, the N4A with Axion offers compelling economics.

Thread-Intensive Applications: Workloads that benefit significantly from SMT or require the highest single-threaded performance may still favor the x86 options, particularly the Xeon with its established software ecosystem.

Balanced Performance: The N4D with EPYC Turin provides a middle ground, offering strong performance across a wide range of workloads with reasonable pricing.

Specialized Workloads: Organizations running specific workloads that align with the architectural strengths of each processor should evaluate the detailed benchmark results for their particular use cases.

Future Considerations

Google's entry into custom processor design with Axion represents a significant development in the cloud computing landscape. The competitive performance and attractive pricing of the N4A instances suggest that custom ARM64 processors can effectively compete with established x86 offerings in many scenarios.

As the software ecosystem continues to mature and optimize for ARM64, we can expect the performance gap between architectures to narrow further. Additionally, future generations of Axion processors may address current limitations while building on demonstrated strengths.

For organizations evaluating cloud infrastructure, these benchmark results provide valuable data for making informed decisions. The choice between Axion, Xeon, and EPYC should be based on specific workload requirements, performance needs, and budget constraints rather than architectural preferences alone.

The comprehensive benchmarking across diverse workloads demonstrates that no single processor architecture dominates across all scenarios. Instead, each offers distinct advantages that make it suitable for different use cases and organizational priorities.
