Early-Z testing is a foundational GPU optimization that dramatically reduces unnecessary pixel shading by culling occluded fragments early. This deep dive explores its intricate mechanics, surprising limitations with modern shader techniques, and practical strategies for maximizing rendering performance.

The Hidden Engine of GPU Efficiency: Demystifying Early-Z Testing

For decades, Early-Z testing has been the silent workhorse of real-time graphics, enabling techniques like depth pre-passes that keep forward rendering viable. Yet its nuanced interactions with modern shader features remain widely misunderstood. As GPUs evolve, mastering Early-Z becomes critical for unlocking peak performance in complex rendering pipelines.

The Logical Pipeline vs. Hardware Reality

Graphics APIs depict a logical pipeline where depth operations occur after pixel shading—a historical artifact from when depth buffers primarily resolved visibility. In reality, drivers analyze shaders and states to determine if depth testing can safely move before pixel shading (Early-Z), culling fragments without executing expensive shaders:

"This ‘sneaky’ optimization works because for opaque geometry, culling before shading produces identical results to the logical pipeline—just faster," explains the analysis. "The magic lies in drivers guaranteeing correctness while exploiting hardware parallelism."

When Early-Z Thrives… and Stumbles

The Ideal Scenario

With standard opaque shaders (no discards, depth exports, or UAV writes), Early-Z shines. Front-to-back rendering slashes pixel shader invocations dramatically, as demonstrated by the author's test app:

Back-to-front draw: 648,000 shader invocations
Front-to-back draw: 440,640 invocations (32% reduction)
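
To make the effect of draw order concrete, here is a minimal software sketch of an Early-Z depth test. It is a toy model under stated assumptions (a dictionary as the depth buffer, a LESS depth test, one counter increment standing in for a pixel shader run), not how any real GPU or the author's test app is implemented. The `render` function and the tiny scene are hypothetical; only the two invocation counts in the final arithmetic come from the text.

```python
def render(fragments, early_z):
    """Rasterize opaque fragments into a tiny framebuffer.

    Each fragment is (pixel, depth, color); smaller depth is closer.
    Returns (color_buffer, shader_invocations).
    """
    depth_buffer = {}
    color_buffer = {}
    invocations = 0
    for pixel, depth, color in fragments:
        passes = depth < depth_buffer.get(pixel, float("inf"))
        if early_z and not passes:
            continue  # culled before the (expensive) shader would run
        invocations += 1  # stands in for one pixel shader invocation
        if passes:
            depth_buffer[pixel] = depth
            color_buffer[pixel] = color
    return color_buffer, invocations


# Two overlapping opaque surfaces covering the same four pixels.
near = [(p, 0.2, "red") for p in range(4)]
far = [(p, 0.8, "blue") for p in range(4)]

front_to_back_image, front_to_back_count = render(near + far, early_z=True)
back_to_front_image, back_to_front_count = render(far + near, early_z=True)
late_image, late_count = render(near + far, early_z=False)

# Reduction implied by the two counts quoted in the text.
reduction = (648_000 - 440_640) / 648_000
```

In this toy scene all three runs produce the same image, but only the front-to-back Early-Z run skips the occluded surface. Back-to-front order gives Early-Z nothing to reject, because every fragment is the closest one seen so far.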

The Disruptors

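Discard/Alpha Test: Forces partial Late-Z when depth writes are enabled, crippling culling efficiency. The depth test can still run early, but the depth write must wait until the shader has decided whether to discard. The driver makes this choice from the compiled shader, so a discard that never executes at runtime still disables full Early-Z:
```
// Still disables full Early-Z, even if the branch is never taken at runtime.
// (A literal `if (false)` would simply be stripped by the shader compiler.)
if (alphaCutoff < 0.0) discard; // alphaCutoff: hypothetical shader constant
```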
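Depth Export: Pixel shader depth overrides (SV_Depth, gl_FragDepth) force full Late-Z, because the GPU cannot know the final depth before the shader runs. Conservative variants (SV_DepthGreaterEqual, SV_DepthLessEqual) offer a limited reprieve by bounding the exported value in one direction.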
UAV/Storage Writes: Side effects break Early-Z's "pure function" assumption. Without explicit forcing, drivers default to Late-Z to preserve correctness.
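
The reprieve from conservative depth can be sketched as follows. This is an illustrative model, not driver code: it assumes a LESS depth test and a shader that promises its exported depth is greater than or equal to the interpolated depth (the SV_DepthGreaterEqual case). Both function names are hypothetical.

```python
def can_reject_early(interpolated_depth, stored_depth):
    """Early rejection under a LESS test with a 'greater or equal' promise.

    The shader promises exported_depth >= interpolated_depth. If the
    interpolated depth already fails the test, every legal export fails too,
    so the fragment can be culled without running the shader.
    """
    return interpolated_depth >= stored_depth


def late_test_passes(exported_depth, stored_depth):
    """The authoritative LESS test, run after the shader exports its depth."""
    return exported_depth < stored_depth


# Rejected early: interpolated depth 0.6 is already behind the stored 0.5.
rejected = can_reject_early(0.6, 0.5)

# Not rejectable early, yet the export (0.7) still fails the late test.
undecided = can_reject_early(0.3, 0.5)
survives = late_test_passes(0.7, 0.5)
```

The second case shows why the reprieve is limited in this model: a fragment that passes the early check can still fail once its real depth is known, so the final test and the depth write have to stay after the shader.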

Taking Control: Forcing Early-Z

D3D exposes the HLSL attribute [earlydepthstencil] to override the driver's decision (GLSL's counterpart is layout(early_fragment_tests)). This enables Early-Z with UAVs, which is crucial for techniques like Order-Independent Transparency, but it introduces caveats:

Depth exports are ignored
Discard doesn't prevent depth writes
Without ROVs, UAV writes race across overlapping fragments
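
The second caveat is easy to underestimate, so here is a small sketch of it. It is a toy model with assumed names (`draw`, a per-fragment `discards` flag, a LESS depth test), not real pipeline code: with forced Early-Z the depth write is committed before the shader runs, so a fragment that later discards still occludes whatever is drawn behind it.

```python
def draw(fragments, forced_early_z):
    """fragments: (pixel, depth, color, discards). Smaller depth is closer."""
    depth_buffer, color_buffer = {}, {}
    for pixel, depth, color, discards in fragments:
        if depth >= depth_buffer.get(pixel, float("inf")):
            continue  # fails the depth test in either mode
        if forced_early_z:
            depth_buffer[pixel] = depth  # depth committed before the shader runs
            if discards:
                continue  # color is dropped, but the depth write already landed
        else:
            if discards:
                continue  # default behavior: discard kills color and depth
            depth_buffer[pixel] = depth
        color_buffer[pixel] = color
    return color_buffer


# A near alpha-tested fragment that discards, then a far opaque fragment.
scene = [(0, 0.2, "leaf", True), (0, 0.8, "wall", False)]

default_image = draw(scene, forced_early_z=False)
forced_image = draw(scene, forced_early_z=True)
```

In the default path the wall shows through the discarded leaf fragment. In the forced path the pixel ends up empty in this model, because the leaf's depth was written before its discard took effect.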

Rasterizer Order Views: The Savior?

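Rasterizer Order Views (ROVs) in D3D, and the equivalent fragment shader interlock (FSI) in Vulkan and OpenGL, enforce submission-order UAV writes, restoring expected depth-test behavior when forcing Early-Z: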

"ROVs guarantee UAV writes only occur for visible fragments and respect draw order, making forced Early-Z viable for advanced techniques—with a parallelism penalty."

The Decision Matrix

Shader Features	Depth Write	Implicit Early-Z?	Forced Early-Z Behavior
None	Off	✅ Likely	Correct
Discard	On	⚠️ Partial (reduced)	❌ Depth write ignores discard
UAV Writes	Off	❌ Late-Z	✅ Writes if visible (unordered)
UAV + ROV	On	❌ Late-Z	✅ Correct with ROVs
Depth Export	Any	❌ Late-Z	❌ Export ignored
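
For quick reference, the "Implicit Early-Z?" column can be written as a small lookup. This is only a restatement of the table as a hypothetical helper, not a query against any driver. One case is an assumption rather than a table row: discard with depth writes off is treated as full Early-Z, because the text ties the discard penalty to depth writes being enabled.

```python
def implicit_depth_mode(discard=False, depth_export=False,
                        uav_writes=False, depth_write=True):
    """Return 'early', 'partial', or 'late' following the table's rows."""
    if depth_export or uav_writes:
        return "late"  # output depth unknown, or side effects must be preserved
    if discard and depth_write:
        return "partial"  # early test, but the depth write waits for the shader
    return "early"  # includes discard without depth writes (assumption)


rows = {
    "none": implicit_depth_mode(depth_write=False),
    "discard": implicit_depth_mode(discard=True, depth_write=True),
    "uav": implicit_depth_mode(uav_writes=True, depth_write=False),
    "uav_rov": implicit_depth_mode(uav_writes=True, depth_write=True),
    "depth_export": implicit_depth_mode(depth_export=True),
}
```

Actual behavior is decided by the driver and hardware, so treat the result as a starting expectation to verify on target GPUs.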

Strategic Insights

Prepass Wisely: Depth-only passes maximize Early-Z efficiency for opaque geometry.
Isolate Disruptors: Batch non-discard opaques first to prime the depth buffer.
ROVs > Atomics: For OIT, prefer ROVs over depth+payload atomics when forcing Early-Z.
Mobile Caveat: Behavior varies—test target hardware aggressively.

As rendering complexity escalates, understanding Early-Z transitions from optimization to necessity. The difference between theory and hardware reality isn't just academic—it's the gap between stutter and silky frames.

Source: To Early-Z or Not to Early-Z by Michał Iwanicki (Principal Engine Architect)
