In February 2026, GitHub experienced six major incidents that degraded performance across multiple services, including Dependabot, Actions, Codespaces, and core services, impacting teams and workflows worldwide. The company has since published detailed post-mortems outlining root causes, mitigation steps, and longer-term resilience improvements.
Dependabot Service Degradation
Duration: January 31, 2026, 00:30 UTC to February 2, 2026, 18:00 UTC
During this period, Dependabot failed to create 10% of its automated pull requests after a database cluster failover left the service connected to a read-only replica. The team mitigated the issue by pausing Dependabot's queues until traffic was routed back to healthy, writable clusters, then identified and restarted all failed jobs.
Key improvements:
- Added new monitors and alerts to reduce detection time
- Implemented safeguards to prevent similar database connectivity issues
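The core of the mitigation — pause the queue rather than let jobs fail against a read-only replica, then restart once traffic is healthy — can be sketched as below. All names here are hypothetical; the post-mortem does not describe GitHub's actual queueing code.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Database:
    read_only: bool  # a failover can leave clients pointed at a read-only replica


@dataclass
class JobQueue:
    db: Database
    pending: deque = field(default_factory=deque)
    paused: bool = False

    def enqueue(self, job: str) -> None:
        self.pending.append(job)

    def process(self) -> list[str]:
        """Run jobs only when the database accepts writes; otherwise pause
        the queue so pull-request creation is deferred instead of failing."""
        if self.db.read_only:
            self.paused = True
            return []
        self.paused = False
        done = list(self.pending)
        self.pending.clear()
        return done
```

Because paused jobs stay in the queue, recovery is simply a matter of calling `process()` again once the failover resolves, which mirrors the "identified and restarted" step described above.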
GitHub Actions and Codespaces Outage
Duration: February 2, 2026, 18:35 UTC to February 3, 2026, 00:30 UTC (standard runners) and 00:15 UTC (Codespaces)
This was one of the most widespread outages, affecting GitHub Actions hosted runners, Codespaces, and several dependent services including Copilot coding agent, CodeQL, Dependabot, GitHub Enterprise Importer, and GitHub Pages. All regions and runner types were impacted.
The outage was traced to a loss of telemetry at GitHub's compute provider, which led to security policies being mistakenly applied to backend storage accounts. These policies blocked access to critical VM metadata, causing all VM create, delete, reimage, and related operations to fail.
Mitigation steps:
- Rolled back policy changes starting at 22:15 UTC
- Worked with compute provider to improve incident response and early detection
- Enhanced safe rollout procedures for future changes
GitHub Core Services Degradation
Duration: February 9, 2026, 16:12 UTC to 17:39 UTC and 18:53 UTC to 20:09 UTC
GitHub experienced two related periods of degraded availability affecting github.com, the GitHub API, GitHub Actions, Git operations, and GitHub Copilot. Users encountered errors loading pages, HTTPS Git operation failures, and GitHub Actions workflow run failures.
Both incidents shared the same root cause: a configuration change to a user settings caching mechanism caused a large volume of cache rewrites to occur simultaneously. In the first incident, asynchronous rewrites overwhelmed a shared infrastructure component, leading to cascading failures and connection exhaustion in the Git HTTPS proxy.
The second incident occurred when an additional source of cache updates introduced high-volume synchronous writes, causing replication delays and similar cascade failures.
Immediate improvements implemented:
- Optimized caching mechanism to avoid write amplification
- Added self-throttling during bulk updates
- Enhanced safeguards for caching system changes
- Fixed connection exhaustion in Git HTTPS proxy layer
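The "self-throttling during bulk updates" item amounts to spreading a burst of cache rewrites over time instead of issuing them all at once, so a single configuration change cannot trigger a write storm. A minimal sketch, with hypothetical names (the post-mortem does not describe GitHub's implementation):

```python
def throttled_rewrite(keys, write, budget_per_tick):
    """Rewrite cache entries in fixed-size batches, one batch per tick,
    instead of issuing every write simultaneously. Returns the number of
    ticks used."""
    ticks = 0
    for start in range(0, len(keys), budget_per_tick):
        for key in keys[start:start + budget_per_tick]:
            write(key)
        ticks += 1  # in production this would wait out a real interval
    return ticks
```

With a budget sized to what the shared infrastructure component can absorb, the same volume of rewrites completes without the simultaneous spike that caused the cascading failures described above.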
Codespaces Provisioning Failures
Duration: February 12, 2026, 00:51 UTC to 09:35 UTC
Users attempting to create or resume Codespaces experienced elevated failure rates across Europe, Asia, and Australia, peaking at a 90% failure rate. The issue started in UK South and progressively impacted other regions.
The failures were caused by an authorization claim change in a core networking dependency, which broke codespace pool provisioning. Alerts fired, but they were classified at too low a severity, delaying escalation and response.
Improvements made:
- Enhanced validation of changes to backend services
- Updated monitoring during rollout
- Improved alerting thresholds
- Enhanced automated failover mechanisms
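The alerting-threshold fix comes down to routing: a sustained 90% failure rate should page on-call rather than land in a low-priority channel. A toy classifier, with made-up thresholds (the post-mortem only says the alerts' severity was too low):

```python
def classify_alert(failure_rate, sustained_minutes):
    """Escalate to a page only when failures are both severe and sustained;
    thresholds here are illustrative, not GitHub's actual values."""
    if failure_rate >= 0.5 and sustained_minutes >= 5:
        return "page"    # wake on-call immediately
    if failure_rate >= 0.1:
        return "ticket"  # actionable, but not urgent
    return "log"
```

Under rules like these, the 90% failure rate seen in this incident would have paged within minutes instead of waiting on manual triage.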
Repository Archive Download Errors
Duration: February 12, 2026, 09:16 UTC to 11:01 UTC
A small percentage of users attempting to download repository archives (tar.gz/zip) that include Git LFS objects received errors. Standard repository archives without LFS objects were not affected. The archive download error rate averaged 0.0042% and peaked at 0.0339% of requests.
The incident was caused by the deployment of an incorrect network configuration to the LFS Service, which made service health checks fail and caused an internal service to be incorrectly marked as unreachable.
Mitigation:
- Manually applied corrected network settings
- Added checks for configuration corruption
- Implemented auto-rollback detection
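The auto-rollback item can be pictured as a deploy step that applies a candidate configuration, runs a health check, and reverts to the last known-good version on failure. This is a generic sketch, not GitHub's deployment system; every name here is an assumption.

```python
def deploy_with_rollback(known_good, candidate, apply, health_check):
    """Apply `candidate`; if the health check fails, re-apply `known_good`
    and report that a rollback occurred."""
    apply(candidate)
    if health_check():
        return ("deployed", candidate)
    apply(known_good)  # auto-rollback to the last known-good config
    return ("rolled_back", known_good)
```

Had a check like this gated the LFS Service config deploy, the failing health checks would have triggered an automatic revert instead of requiring corrected settings to be applied manually.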
Looking Forward
GitHub has acknowledged the impact these outages have had on teams and workflows, and is prioritizing both near-term and long-term investments to improve system resilience. The company has released a comprehensive blog post outlining the root causes of recent incidents and the steps being taken to prevent future occurrences.
For real-time updates on service status and post-incident recaps, users can follow GitHub's status page. The engineering team continues to share detailed technical insights and improvement plans through the GitHub Blog.
These incidents highlight the complexity of maintaining large-scale distributed systems and the importance of robust monitoring, rapid incident response, and continuous improvement in cloud infrastructure.
