GitHub's Performance Revolution: How Diff Lines Became 74% Faster
#Frontend

Cloud Reporter

GitHub tackled severe performance problems in pull request reviews by rebuilding its diff line architecture, cutting interaction latency (INP) on large pull requests by about 78% and memory usage by roughly 50%.

When GitHub engineers set out to improve pull request performance, they faced a daunting challenge: diff views that could contain thousands of files and millions of lines of code were becoming unusably slow. The problem wasn't just theoretical—users were experiencing JavaScript heap sizes exceeding 1 GB, DOM node counts surpassing 400,000, and interaction latencies that made large pull requests practically unusable.

The Scale of the Problem

Pull requests are the beating heart of GitHub's collaborative workflow. At GitHub's scale, these can range from tiny one-line fixes to massive changes spanning thousands of files. The Files changed tab, now the default experience for all users, needed to handle this extreme variability while maintaining responsiveness.

The metrics were sobering. Interaction to Next Paint (INP), a key measure of responsiveness, was consistently above acceptable levels. Users could feel the input lag, and in extreme cases the page became too sluggish to use.

First Iteration: What Worked and What Didn't

The original v1 architecture made sense at first glance. Each diff line was rendered using multiple React components, with unified diffs requiring roughly 10 DOM elements and split views needing closer to 15. Syntax highlighting added even more complexity with numerous <span> tags.

At the React layer, unified diffs typically contained at least eight components per line, while split views had a minimum of 13. This approach seemed reasonable when porting from the classic Rails view—lots of small, reusable React components maintaining DOM tree structure.

But the reality was brutal at scale. A single diff line could carry 20+ event handlers, multiplied across thousands of lines. The component tree became unwieldy:

  • Minimum of 10-15 DOM tree elements per line
  • Minimum of 8-13 React components per line
  • Minimum of 20 React event handlers per line
  • Lots of small reusable React components

This architecture proved unsustainable for large pull requests, where larger sizes directly led to slower INP and increased JavaScript heap usage.
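To make the scaling concrete, here is a rough back-of-the-envelope sketch that multiplies the per-line minimums listed above by a line count. The helper name `estimateV1Footprint` and the type names are hypothetical, the split-view figures use the "closer to 15" and "minimum of 13" values quoted earlier, and syntax-highlighting spans are not counted, so the totals are lower bounds rather than measurements.

```typescript
// Per-line minimums quoted in the article for the v1 architecture.
type DiffView = "unified" | "split";

interface LineFootprint {
  domElements: number;
  components: number;
  eventHandlers: number;
}

const V1_MINIMUMS: Record<DiffView, LineFootprint> = {
  unified: { domElements: 10, components: 8, eventHandlers: 20 },
  split: { domElements: 15, components: 13, eventHandlers: 20 },
};

// Hypothetical helper: lower-bound totals for a diff of `lineCount` lines.
function estimateV1Footprint(view: DiffView, lineCount: number): LineFootprint {
  const perLine = V1_MINIMUMS[view];
  return {
    domElements: perLine.domElements * lineCount,
    components: perLine.components * lineCount,
    eventHandlers: perLine.eventHandlers * lineCount,
  };
}

console.log(estimateV1Footprint("split", 10_000));
// → { domElements: 150000, components: 130000, eventHandlers: 200000 }
```

Every count grows linearly with the number of lines, which is why trimming even one or two nodes per line matters at this scale.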

The Breakthrough: v2 Architecture

The team realized that no single silver bullet would solve the problem. Instead, they developed multiple targeted approaches for different pull request sizes and complexities.

Small Changes, Massive Impact

Sometimes the smallest optimizations compound into the biggest wins. The team removed unnecessary <code> tags from line number cells—dropping just two DOM nodes per diff line. Across 10,000 lines, that's 20,000 fewer nodes in the DOM.

This became a guiding principle: every opportunity for improvement mattered, no matter how small.

Component Simplification

The most dramatic change was reducing eight components per diff line to just two. The v1 components were thin wrappers that shared code between Split and Unified views, but each wrapper carried logic for both views even though only one rendered at a time.

In v2, each view got its own dedicated component. Some code duplication resulted, but the simplification paid massive dividends in performance and maintainability.
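The idea of one dedicated renderer per view can be sketched as follows. This is not GitHub's code: plain functions returning HTML strings stand in for React components so the example runs without any dependencies, and every name (`DiffLine`, `renderUnifiedLine`, `renderSplitLine`) is hypothetical.

```typescript
interface DiffLine {
  oldNumber: number | null;
  newNumber: number | null;
  kind: "added" | "removed" | "context";
  text: string;
}

const marker = { added: "+", removed: "-", context: " " } as const;

const escapeHtml = (s: string): string =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

// Unified view: one row with both line numbers and a single code cell.
function renderUnifiedLine(line: DiffLine): string {
  return (
    `<tr class="${line.kind}">` +
    `<td>${line.oldNumber ?? ""}</td><td>${line.newNumber ?? ""}</td>` +
    `<td>${marker[line.kind]}${escapeHtml(line.text)}</td></tr>`
  );
}

// Split view: old side on the left, new side on the right.
// Either side may be empty when a line exists on only one side.
function renderSplitLine(left: DiffLine | null, right: DiffLine | null): string {
  const side = (line: DiffLine | null, num: number | null): string =>
    line
      ? `<td>${num ?? ""}</td>` +
        `<td class="${line.kind}">${marker[line.kind]}${escapeHtml(line.text)}</td>`
      : `<td></td><td></td>`;
  return `<tr>${side(left, left?.oldNumber ?? null)}${side(right, right?.newNumber ?? null)}</tr>`;
}
```

The two functions repeat a little markup, but neither carries branches for the view it does not render, which mirrors the trade-off described above.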

State Management Revolution

The most impactful change was moving complex state for commenting and context menus into their respective components. Given GitHub's scale, where some pull requests run to many thousands of lines, it wasn't practical for every line to carry complex commenting state when only a small subset would ever have comments or menus open.

This aligned with the Single Responsibility Principle—the diff-line component's main responsibility became just rendering code.
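One way to picture this, as a minimal sketch rather than GitHub's implementation: comment state lives in a separate store that only has entries for lines where a comment form is actually open, so the cost scales with open forms and not with the number of rendered lines. The `CommentDraftStore` class and its methods are hypothetical names.

```typescript
type LineId = string;

interface CommentDraft {
  body: string;
}

// Hypothetical store: holds state only for lines with an open comment form.
// Diff lines themselves carry no commenting state at all.
class CommentDraftStore {
  private drafts = new Map<LineId, CommentDraft>();

  open(lineId: LineId): void {
    if (!this.drafts.has(lineId)) {
      this.drafts.set(lineId, { body: "" });
    }
  }

  update(lineId: LineId, body: string): void {
    const draft = this.drafts.get(lineId);
    if (draft) draft.body = body;
  }

  close(lineId: LineId): void {
    this.drafts.delete(lineId);
  }

  get(lineId: LineId): CommentDraft | undefined {
    return this.drafts.get(lineId);
  }

  get size(): number {
    return this.drafts.size;
  }
}

// A diff may have 10,000 lines, but only the two with open forms cost anything.
const store = new CommentDraftStore();
store.open("file.ts:R42");
store.open("file.ts:R97");
store.update("file.ts:R42", "Could this be a Map?");
console.log(store.size); // → 2
```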

Data Access Optimization

V1 had accumulated O(n) lookups across shared data stores and component state, with useEffect hooks scattered throughout the component tree causing extra re-rendering.

V2 adopted a two-part strategy:

  1. Restricted useEffect usage to the top level of diff files
  2. Established linting rules to prevent useEffect hooks in line-wrapping React components
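The article does not say how the lint rule was written. One plausible sketch uses ESLint's built-in `no-restricted-syntax` rule with an AST selector, scoped to the files that hold line components. The file glob and the message text are assumptions, and the object would need to be exported from an ESLint flat config file to take effect.

```typescript
// Hypothetical ESLint flat-config entry. The `files` glob is an assumed
// location for line-level components, not a path from the article.
const noEffectsInDiffLines = {
  files: ["src/diff/lines/**/*.tsx"],
  rules: {
    "no-restricted-syntax": [
      "error",
      {
        // Matches bare `useEffect(...)` calls. A qualified call such as
        // `React.useEffect(...)` would need an additional selector.
        selector: "CallExpression[callee.name='useEffect']",
        message: "Keep useEffect at the diff-file level, not in line components.",
      },
    ],
  },
};
```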

They also redesigned the global and diff state machines to use constant-time, O(1) lookups backed by JavaScript Maps, enabling fast, consistent selectors for common operations like line selection and comment management.
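The difference between the two lookup styles can be illustrated with a small sketch. The data shape (`CommentThread`) and both function names are hypothetical. The point is only that a one-time index turns each later lookup from a scan over all threads into a single Map access.

```typescript
interface CommentThread {
  id: string;
  lineId: string;
}

// v1-style lookup: scans every thread on each call, O(n) per line.
function findThreadsByScan(threads: CommentThread[], lineId: string): CommentThread[] {
  return threads.filter((thread) => thread.lineId === lineId);
}

// v2-style lookup: build the index once in O(n), then each lookup is O(1) on average.
function indexThreadsByLine(threads: CommentThread[]): Map<string, CommentThread[]> {
  const index = new Map<string, CommentThread[]>();
  for (const thread of threads) {
    const bucket = index.get(thread.lineId);
    if (bucket) {
      bucket.push(thread);
    } else {
      index.set(thread.lineId, [thread]);
    }
  }
  return index;
}

const threads: CommentThread[] = [
  { id: "t1", lineId: "L10" },
  { id: "t2", lineId: "L250" },
  { id: "t3", lineId: "L10" },
];

const byLine = indexThreadsByLine(threads);
console.log(byLine.get("L10")?.map((t) => t.id)); // → [ 't1', 't3' ]
console.log(byLine.get("L999") ?? []); // → []
```

If every rendered line runs the scan, total work grows with lines times threads. With the index, it grows with lines plus threads.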

The Results: Numbers That Speak Volumes

The improvements were dramatic and measurable:

Metric | v2 | Improvement
Total lines of code | 2,000 | 27% less
Total unique component types | 10 | 47% fewer
Total components rendered | ~50,004 | 74% fewer
Total DOM nodes | ~180,000 | 10% fewer
Total memory usage | 80-120 MB | 50% less
INP on large pull request (M1 MacBook Pro with 4x slowdown) | 100 ms | 78% faster

These weren't just incremental improvements—they represented a fundamental transformation in how GitHub handles large-scale code review.

Virtualization for the Largest Pull Requests

For the largest pull requests (p95+ with over 10,000 diff lines), even the most efficient components struggled when tens of thousands of lines were rendered at once. Window virtualization became essential.

By integrating TanStack Virtual, GitHub ensured only the visible portion of the diff list was present in the DOM at any time. The impact was transformative:

  • 10x reduction in JavaScript heap usage and DOM nodes
  • INP fell from 275–700+ milliseconds to just 40–80 ms
  • Users could interact with content immediately without waiting for massive loads
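The core idea of window virtualization can be shown with a hand-rolled sketch. This is not TanStack Virtual's API and not GitHub's code: it assumes every row has the same fixed height, whereas a real diff has variable-height rows that a library measures for you. The function name `visibleRange` and the default overscan of 5 rows are assumptions for illustration.

```typescript
interface VisibleRange {
  start: number; // first row index to render (inclusive)
  end: number; // last row index to render (exclusive)
}

// Computes which rows intersect the viewport, plus a small overscan buffer
// above and below so that scrolling does not reveal blank space.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  rowCount: number,
  overscan = 5,
): VisibleRange {
  const first = Math.floor(scrollTop / rowHeight);
  const last = Math.ceil((scrollTop + viewportHeight) / rowHeight);
  return {
    start: Math.max(0, first - overscan),
    end: Math.min(rowCount, last + overscan),
  };
}

// 10,000 rows of 20 px in an 800 px viewport, scrolled to row 2,000.
const range = visibleRange(40_000, 800, 20, 10_000);
console.log(range); // → { start: 1995, end: 2045 }
console.log(range.end - range.start); // → 50
```

Only the 50 rows in the window are mounted instead of all 10,000. That is where the reduction in DOM nodes and heap usage comes from.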

Additional Performance Optimizations

The team didn't stop with the core architecture changes. They tackled several major areas:

React Re-renders: Trimmed unnecessary re-renders and honed state management, cutting wasted computation and making UI updates noticeably faster.

Styling Improvements: Replaced heavy CSS selectors (e.g., :has(...)) and re-engineered drag and resize handling to use GPU-accelerated transforms, eliminating forced layouts and the sluggishness they caused.

Monitoring Enhancement: Implemented interaction-level INP tracking, diff-size segmentation, and memory tagging in a Datadog dashboard, giving developers real-time, actionable metrics.

Server-Side Optimization: Optimized rendering to hydrate only visible diff lines, slashing time-to-interactive and keeping memory usage in check.

Progressive Loading: Implemented smart background fetches so users could see and interact with content sooner, eliminating the need to wait for massive numbers of diffs to finish loading.

The Bigger Picture

This performance journey demonstrates that targeted refactoring, even within large and mature codebases, can deliver meaningful benefits to all users. The team learned that sometimes focusing on small, simple improvements can have the largest impact.

The new architecture made the UI feel lighter, faster, and ready for anything users throw at it. For GitHub's millions of developers who spend significant time in pull requests, these improvements translate directly into productivity gains and reduced frustration.

As the team notes, the measurable gains show that performance optimization isn't just about handling edge cases—it's about creating a consistently excellent experience across the entire spectrum of use cases, from tiny fixes to massive refactors.

The diff lines are definitely better now—and the techniques developed here will likely influence performance optimization strategies across the entire GitHub platform.
