The Hidden Engine of GPU Efficiency: Demystifying Early-Z Testing

For decades, Early-Z testing has been the silent workhorse of real-time graphics, enabling techniques like depth pre-passes that keep forward rendering viable. Yet its nuanced interactions with modern shader features remain widely misunderstood. As GPUs evolve, mastering Early-Z becomes critical for unlocking peak performance in complex rendering pipelines.

The Logical Pipeline vs. Hardware Reality

Graphics APIs depict a logical pipeline in which depth operations occur after pixel shading, a historical artifact from when depth buffers primarily resolved visibility. In reality, drivers analyze shaders and pipeline state to determine whether depth testing can safely move before pixel shading (Early-Z), culling fragments before their expensive shaders ever run.

"This ‘sneaky’ optimization works because for opaque geometry, culling before shading produces identical results to the logical pipeline—just faster," explains the analysis. "The magic lies in drivers guaranteeing correctness while exploiting hardware parallelism."

When Early-Z Thrives… and Stumbles

The Ideal Scenario

With standard opaque shaders (no discards, depth exports, or UAV writes), Early-Z shines. Front-to-back rendering slashes pixel shader invocations dramatically, as demonstrated by the author's test app:

  • Back-to-front draw: 648,000 shader invocations
  • Front-to-back draw: 440,640 invocations (32% reduction)
<img src="https://news.lavx.hu/api/uploads/unlocking-gpu-performance-the-hidden-mechanics-of-early-z-testing_20250909_050945_image.jpg" 
     alt="Article illustration 5" 
     loading="lazy">

The Disruptors

  1. Discard/Alpha Test: Forces partial Late-Z when depth writes are enabled, crippling culling efficiency: the depth test can still run early, but the depth write has to wait until the shader has decided whether to discard. The mere presence of a `discard` instruction in the shader disables full Early-Z, even if it never executes:

       // This discard never fires at runtime, yet it still disables full Early-Z.
       // (A compile-time `if (false)` would likely be optimized away, so the
       // condition here is a hypothetical runtime value that is always false.)
       if (neverTrue) discard;

  2. Depth Export: Pixel shader depth overrides (`SV_Depth`, `gl_FragDepth`) force full Late-Z, because the GPU cannot know the final depth before the shader runs. Conservative variants (such as `SV_DepthGreaterEqual`) offer a limited reprieve.
  3. UAV/Storage Writes: Side effects break Early-Z's "pure function" assumption. Unless Early-Z is explicitly forced, drivers default to Late-Z to preserve correctness.

Taking Control: Forcing Early-Z

APIs like D3D offer the `[earlydepthstencil]` attribute to override the driver's decision. This enables Early-Z with UAVs, which is crucial for techniques like Order-Independent Transparency, but introduces caveats:

  • Depth exports are ignored
  • Discard doesn't prevent depth writes
  • Without ROVs, UAV writes race across overlapping fragments
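
Two of the mechanisms described above, conservative depth output and forced Early-Z, can be sketched in HLSL. This is a minimal illustration rather than code from the original analysis: the resource name `gVisibleCount`, the entry points `PSConservativeDepth` and `PSForcedEarlyZ`, the depth bias, and the `u1` binding are all assumptions made for the example.

```hlsl
// Hypothetical per-pixel counter (R32_UINT). Slot u1 assumes a D3D11-style
// binding in which render target 0 already occupies output slot 0.
RWTexture2D<uint> gVisibleCount : register(u1);

// (a) Conservative depth export.
// SV_DepthGreaterEqual promises that the exported depth is never smaller than
// the rasterized depth. Assuming a standard (non-reversed) LESS-style depth
// test, the hardware may therefore still reject occluded fragments early.
// With a reversed-Z GREATER test, SV_DepthLessEqual would be the counterpart.
float4 PSConservativeDepth(float4 pos : SV_Position,
                           out float depth : SV_DepthGreaterEqual) : SV_Target
{
    // Push the fragment slightly away from the camera (arbitrary example bias).
    depth = saturate(pos.z + 0.001f);
    return float4(1.0f, 1.0f, 1.0f, 1.0f);
}

// (b) Forced Early-Z with a UAV write.
// [earlydepthstencil] makes the depth/stencil test (and depth write) run
// before this shader, so only fragments that passed the test reach the
// atomic increment below.
[earlydepthstencil]
float4 PSForcedEarlyZ(float4 pos : SV_Position) : SV_Target
{
    uint2 pixel = uint2(pos.xy);

    // Atomic, so concurrent fragments on the same pixel do not lose updates,
    // but the order in which they arrive is still unspecified without ROVs.
    InterlockedAdd(gVisibleCount[pixel], 1u);

    // Note: a discard placed here would not undo the depth write, which has
    // already happened by the time the shader runs.
    return float4(0.0f, 1.0f, 0.0f, 1.0f);
}
```

The counter in (b) tolerates unordered execution because addition is commutative. Anything order-dependent, such as blending into the UAV, would need the ordering guarantees of ROVs.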
<img src="https://news.lavx.hu/api/uploads/unlocking-gpu-performance-the-hidden-mechanics-of-early-z-testing_20250909_050948_image.jpg" 
     alt="Article illustration 2" 
     loading="lazy">

Rasterizer Ordered Views: The Savior?

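ROVs (Rasterizer Ordered Views in D3D) and fragment shader interlock (FSI, the Vulkan and OpenGL counterpart) order UAV accesses to each pixel by primitive submission order, restoring the expected behavior when forcing Early-Z: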

"ROVs guarantee UAV writes only occur for visible fragments and respect draw order, making forced Early-Z viable for advanced techniques—with a parallelism penalty."

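As a rough sketch of what this looks like in a shader, the example below combines `[earlydepthstencil]` with an ROV so that a plain, non-atomic read-modify-write becomes safe. The names `gLayerCount` and `PSCountLayers`, the `u1` slot, and the divisor used for the tint are assumptions made for illustration. ROVs require Shader Model 5.1 or later and hardware support.

```hlsl
// Hypothetical per-pixel layer counter (R32_UINT), declared as an ROV.
// Slot u1 assumes render target 0 occupies output slot 0 (D3D11-style binding).
RasterizerOrderedTexture2D<uint> gLayerCount : register(u1);

// Depth/stencil runs first, so only fragments that pass the test execute this.
[earlydepthstencil]
float4 PSCountLayers(float4 pos : SV_Position) : SV_Target
{
    uint2 pixel = uint2(pos.xy);

    // With a plain RWTexture2D this read-modify-write would race between
    // overlapping fragments. Through an ROV, accesses to the same pixel are
    // serialized in primitive submission order, at some cost in parallelism.
    uint layers = gLayerCount[pixel];
    gLayerCount[pixel] = layers + 1u;

    // Visualize how many surviving fragments preceded this one
    // (8 is an arbitrary normalization constant for the example).
    float shade = saturate(layers / 8.0f);
    return float4(shade, shade, shade, 1.0f);
}
```

The same pattern extends to order-dependent operations such as blending into a per-pixel buffer, which is where the ordering guarantee matters most.
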
The Decision Matrix

| Shader Features | Depth Write | Implicit Early-Z? | Forced Early-Z Behavior |
|---|---|---|---|
| None | Off | ✅ Likely | Correct |
| Discard | On | ⚠️ Partial (reduced) | ❌ Depth write ignores discard |
| UAV Writes | Off | ❌ Late-Z | ✅ Writes if visible (unordered) |
| UAV + ROV | On | ❌ Late-Z | ✅ Correct with ROVs |
| Depth Export | Any | ❌ Late-Z | ❌ Export ignored |

Strategic Insights

  1. Prepass Wisely: Depth-only passes maximize Early-Z efficiency for opaque geometry.
  2. Isolate Disruptors: Batch non-discard opaques first to prime the depth buffer.
  3. ROVs > Atomics: For OIT, prefer ROVs over depth+payload atomics when forcing Early-Z.
  4. Mobile Caveat: Behavior varies—test target hardware aggressively.

As rendering complexity escalates, understanding Early-Z transitions from optimization to necessity. The difference between theory and hardware reality isn't just academic—it's the gap between stutter and silky frames.

Source: To Early-Z or Not to Early-Z by Michał Iwanicki (Principal Engine Architect)