In February 2026, GitHub experienced six major incidents that degraded performance across multiple services, including Dependabot, Actions, Codespaces, and core services, impacting teams and workflows worldwide. The company has since published detailed post-mortems outlining root causes, mitigation steps, and longer-term resilience improvements.
Dependabot Service Degradation
Duration: January 31, 2026, 00:30 UTC to February 2, 2026, 18:00 UTC
During this period, Dependabot failed to create 10% of its automated pull requests after a database cluster failover left the service connected to a read-only replica. The team mitigated the issue by pausing Dependabot's queues until traffic was routed back to healthy, writable clusters, then identified and restarted all failed jobs.
Key improvements:
- Added new monitors and alerts to reduce detection time
- Implemented safeguards to prevent similar database connectivity issues
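The core of the mitigation — pause the queue rather than let jobs fail against a read-only replica, then restart once traffic is healthy — can be sketched as below. All names here are hypothetical; the post-mortem does not describe GitHub's actual queueing code.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Database:
    read_only: bool  # a failover can leave clients pointed at a read-only replica


@dataclass
class JobQueue:
    db: Database
    pending: deque = field(default_factory=deque)
    paused: bool = False

    def enqueue(self, job: str) -> None:
        self.pending.append(job)

    def process(self) -> list[str]:
        """Run jobs only when the database accepts writes; otherwise pause
        the queue so pull-request creation is deferred instead of failing."""
        if self.db.read_only:
            self.paused = True
            return []
        self.paused = False
        done = list(self.pending)
        self.pending.clear()
        return done
```

Because paused jobs stay in the queue, recovery is simply a matter of calling `process()` again once the failover resolves, which mirrors the "identified and restarted" step described above.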
GitHub Actions and Codespaces Outage
Duration: February 2, 2026, 18:35 UTC to February 3, 2026, 00:30 UTC (standard runners) and 00:15 UTC (Codespaces)
This was one of the most widespread outages, affecting GitHub Actions hosted runners, Codespaces, and several dependent services including Copilot coding agent, CodeQL, Dependabot, GitHub Enterprise Importer, and GitHub Pages. All regions and runner types were impacted.
The outage was traced to a loss of telemetry at GitHub's compute provider, which led to security policies being mistakenly applied to backend storage accounts. These policies blocked access to critical VM metadata, causing all VM create, delete, reimage, and related operations to fail.
Mitigation steps:
- Rolled back policy changes starting at 22:15 UTC
- Worked with compute provider to improve incident response and early detection
- Enhanced safe rollout procedures for future changes
GitHub Core Services Degradation
Duration: February 9, 2026, 16:12 UTC to 17:39 UTC and 18:53 UTC to 20:09 UTC
GitHub experienced two related periods of degraded availability affecting github.com, the GitHub API, GitHub Actions, Git operations, and GitHub Copilot. Users encountered errors loading pages, HTTPS Git operation failures, and GitHub Actions workflow run failures.
Both incidents shared the same root cause: a configuration change to a user settings caching mechanism caused a large volume of cache rewrites to occur simultaneously. In the first incident, asynchronous rewrites overwhelmed a shared infrastructure component, leading to cascading failures and connection exhaustion in the Git HTTPS proxy.
The second incident occurred when an additional source of cache updates introduced high-volume synchronous writes, causing replication delays and similar cascade failures.
Immediate improvements implemented:
- Optimized caching mechanism to avoid write amplification
- Added self-throttling during bulk updates
- Enhanced safeguards for caching system changes
- Fixed connection exhaustion in Git HTTPS proxy layer
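The "self-throttling during bulk updates" item amounts to spreading a burst of cache rewrites over time instead of issuing them all at once, so a single configuration change cannot trigger a write storm. A minimal sketch, with hypothetical names (the post-mortem does not describe GitHub's implementation):

```python
def throttled_rewrite(keys, write, budget_per_tick):
    """Rewrite cache entries in fixed-size batches, one batch per tick,
    instead of issuing every write simultaneously. Returns the number of
    ticks used."""
    ticks = 0
    for start in range(0, len(keys), budget_per_tick):
        for key in keys[start:start + budget_per_tick]:
            write(key)
        ticks += 1  # in production this would wait out a real interval
    return ticks
```

With a budget sized to what the shared infrastructure component can absorb, the same volume of rewrites completes without the simultaneous spike that caused the cascading failures described above.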
Codespaces Provisioning Failures
Duration: February 12, 2026, 00:51 UTC to 09:35 UTC
Users attempting to create or resume Codespaces experienced elevated failure rates across Europe, Asia, and Australia, peaking at a 90% failure rate. The issue started in UK South and progressively impacted other regions.
The failures were caused by an authorization claim change in a core networking dependency, which broke codespace pool provisioning. Alerts fired, but they were classified at too low a severity, delaying escalation and response.
Improvements made:
- Enhanced validation of changes to backend services
- Updated monitoring during rollout
- Improved alerting thresholds
- Enhanced automated failover mechanisms
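The alerting-threshold fix comes down to routing: a sustained 90% failure rate should page on-call rather than land in a low-priority channel. A toy classifier, with made-up thresholds (the post-mortem only says the alerts' severity was too low):

```python
def classify_alert(failure_rate, sustained_minutes):
    """Escalate to a page only when failures are both severe and sustained;
    thresholds here are illustrative, not GitHub's actual values."""
    if failure_rate >= 0.5 and sustained_minutes >= 5:
        return "page"    # wake on-call immediately
    if failure_rate >= 0.1:
        return "ticket"  # actionable, but not urgent
    return "log"
```

Under rules like these, the 90% failure rate seen in this incident would have paged within minutes instead of waiting on manual triage.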
Repository Archive Download Errors
Duration: February 12, 2026, 09:16 UTC to 11:01 UTC
A small percentage of users attempting to download repository archives (tar.gz/zip) that include Git LFS objects received errors. Standard repository archives without LFS objects were not affected. The archive download error rate averaged 0.0042% and peaked at 0.0339% of requests.
The incident was caused by the deployment of an incorrect network configuration to the LFS Service, which made service health checks fail and caused an internal service to be incorrectly marked as unreachable.
Mitigation:
- Manually applied corrected network settings
- Added checks for configuration corruption
- Implemented auto-rollback detection
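The auto-rollback item can be pictured as a deploy step that applies a candidate configuration, runs a health check, and reverts to the last known-good version on failure. This is a generic sketch, not GitHub's deployment system; every name here is an assumption.

```python
def deploy_with_rollback(known_good, candidate, apply, health_check):
    """Apply `candidate`; if the health check fails, re-apply `known_good`
    and report that a rollback occurred."""
    apply(candidate)
    if health_check():
        return ("deployed", candidate)
    apply(known_good)  # auto-rollback to the last known-good config
    return ("rolled_back", known_good)
```

Had a check like this gated the LFS Service config deploy, the failing health checks would have triggered an automatic revert instead of requiring corrected settings to be applied manually.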
Looking Forward
GitHub has acknowledged the impact these outages have had on teams and workflows, and is prioritizing both near-term and long-term investments to improve system resilience. The company has released a comprehensive blog post outlining the root causes of recent incidents and the steps being taken to prevent future occurrences.
For real-time updates on service status and post-incident recaps, users can follow GitHub's status page. The engineering team continues to share detailed technical insights and improvement plans through the GitHub Blog.
These incidents highlight the complexity of maintaining large-scale distributed systems and the importance of robust monitoring, rapid incident response, and continuous improvement in cloud infrastructure.
