On February 20, 2026, at 17:48 UTC, Cloudflare experienced a significant service outage when a configuration change caused approximately 1,100 Bring Your Own IP (BYOIP) prefixes to be unintentionally withdrawn from the network. This incident impacted 25% of BYOIP prefixes and resulted in services and applications being unreachable for a subset of Cloudflare's customers for approximately 6 hours and 7 minutes.
What Happened
The outage was triggered by a change Cloudflare made to how its network manages IP addresses onboarded through the BYOIP pipeline. The change caused Cloudflare to unintentionally withdraw customer prefixes, resulting in timeouts and connection failures across Cloudflare deployments that used BYOIP. A subset of the 1.1.1.1 resolver service, specifically the destination one.one.one.one, was also impacted.
Once failures were observed, Cloudflare engineers reverted the change and prefixes stopped being withdrawn. The total duration of the incident was 6 hours and 7 minutes, with most of that time spent restoring prefix configurations to their state prior to the change.
Impact on Customers
During the incident, 1,100 of the roughly 6,500 prefixes on the network were withdrawn from 17:56 to 18:46 UTC, representing about 25% of the 4,306 total BYOIP prefixes. The incident did not impact all BYOIP customers because the configuration change was applied iteratively rather than instantaneously across all customers.
Affected services included:
- Core CDN and Security Services: Traffic was not attracted to Cloudflare, causing connection failures
- Spectrum: Spectrum apps on BYOIP failed to proxy traffic
- Dedicated Egress: Customers using BYOIP with Gateway Dedicated Egress or Dedicated IPs for CDN egress could not send traffic to their destinations
- Magic Transit: End users connecting to applications protected by Magic Transit experienced connection timeouts
Some customers were able to restore their own service by using the Cloudflare dashboard to re-advertise their IP addresses. However, approximately 300 prefixes couldn't be remediated through the dashboard due to a software bug that removed service configurations from edge servers.
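For the customers who could self-remediate, the dashboard action amounts to turning a prefix's advertisement back on. A rough sketch of the equivalent API call is below; the endpoint path and payload reflect Cloudflare's public addressing API as I understand it and should be verified against current documentation, and the account ID, prefix ID, and token are placeholders rather than values from the post-mortem.

```python
# Hedged sketch: re-advertise a BYOIP prefix via Cloudflare's addressing API.
# The endpoint path and payload are an approximation of the public
# "prefix advertisement status" API; verify against current Cloudflare docs.
import requests

ACCOUNT_ID = "your_account_id"   # placeholder
PREFIX_ID = "your_prefix_id"     # placeholder
API_TOKEN = "your_api_token"     # placeholder

url = (
    "https://api.cloudflare.com/client/v4/"
    f"accounts/{ACCOUNT_ID}/addressing/prefixes/{PREFIX_ID}/bgp/status"
)
resp = requests.patch(
    url,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"advertised": True},   # ask Cloudflare to announce the prefix again
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```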
Technical Root Cause
The specific change that broke was an attempt to automate the customer action of removing prefixes from Cloudflare's BYOIP service. This automation was part of Cloudflare's Code Orange: Fail Small initiative, which aims to move manual processes into safe, automated workflows.
The issue stemmed from a bug in an API query. The cleanup sub-task was supposed to query for prefixes pending deletion, but due to how the API interpreted an empty string parameter, it instead retrieved all BYOIP prefixes. The system then interpreted all returned prefixes as queued for deletion and began systematically deleting them along with their dependent objects, including service bindings.
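The post-mortem does not include the actual query code, but the failure pattern is a common one. The minimal Python sketch below (with hypothetical function and field names) illustrates it: an empty-string filter is silently treated as "no filter", so the cleanup task receives every prefix and deletes each one along with its dependent objects.

```python
# Hypothetical sketch of the failure pattern: an empty-string filter is
# treated as "no filter", so the query returns every prefix instead of
# only those pending deletion.

def list_prefixes(prefixes, status_filter=""):
    """Return prefixes matching status_filter; an empty filter matches everything."""
    if not status_filter:          # "" is falsy, so the filter is silently dropped
        return list(prefixes)
    return [p for p in prefixes if p["status"] == status_filter]

def withdraw_and_delete(prefix):
    print(f"withdrawing {prefix['cidr']} and removing its service bindings")

def cleanup_pending_deletions(prefixes):
    # Intended call: list_prefixes(prefixes, status_filter="pending_deletion")
    # The buggy call passed an empty string, so *all* prefixes came back...
    for prefix in list_prefixes(prefixes, status_filter=""):
        withdraw_and_delete(prefix)   # ...and each one was withdrawn and deleted

if __name__ == "__main__":
    demo = [
        {"cidr": "203.0.113.0/24", "status": "active"},
        {"cidr": "198.51.100.0/24", "status": "pending_deletion"},
    ]
    cleanup_pending_deletions(demo)
```

Run as-is, the cleanup withdraws both prefixes rather than only the one pending deletion, mirroring how the task runner treated all returned prefixes as queued for deletion.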
Why the Bug Wasn't Caught
Several factors contributed to the bug making it to production:
- The staging environment's mock data was insufficient to catch this scenario
- While tests existed for this functionality, coverage for this specific scenario was incomplete
- Initial testing and code review focused on the BYOIP self-service API journey and were completed successfully, but didn't cover scenarios where the task-runner service would independently execute changes without explicit input
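As an illustration of the coverage gap, a regression test along the following lines (pytest style, reusing the hypothetical list_prefixes from the sketch in the previous section) fails against the buggy behavior and passes once empty filters are rejected. This is an assumption about what such coverage could look like, not Cloudflare's actual test suite.

```python
# Hypothetical regression tests for the empty-filter case (pytest style).
from addressing_cleanup import list_prefixes  # hypothetical module from the sketch above

PREFIXES = [
    {"cidr": "203.0.113.0/24", "status": "active"},
    {"cidr": "198.51.100.0/24", "status": "pending_deletion"},
]

def test_empty_filter_does_not_match_everything():
    # An empty or missing filter must be rejected or match nothing,
    # never silently widened to "all prefixes".
    result = list_prefixes(PREFIXES, status_filter="")
    assert result != PREFIXES, "empty filter must not return every prefix"

def test_pending_deletion_filter_returns_only_pending():
    result = list_prefixes(PREFIXES, status_filter="pending_deletion")
    assert [p["cidr"] for p in result] == ["198.51.100.0/24"]
```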
Recovery Process
Recovery was complicated because affected BYOIP prefixes were in different states:
- Most customers: Only had prefixes withdrawn and could restore service via dashboard
- Some customers: Had prefixes withdrawn and some bindings removed (partial recovery state)
- Some customers: Had prefixes withdrawn and all service bindings removed (required intensive data recovery)
The final 300 prefixes that couldn't be restored through the dashboard required a global configuration update to reapply service bindings across Cloudflare's edge network.
Timeline of Events
- 2026-02-05 21:53: Broken sub-process merged into code base
- 2026-02-20 17:46: Addressing API release containing the broken sub-process completes
- 2026-02-20 17:56: Impact starts - prefixes begin to be withdrawn
- 2026-02-20 18:13: Cloudflare engaged for failures on one.one.one.one
- 2026-02-20 18:18: Internal incident declared
- 2026-02-20 18:21: Addressing API team paged
- 2026-02-20 18:46: Issue identified - broken sub-process terminated
- 2026-02-20 19:19: Some prefixes mitigated - customers begin self-remediation via dashboard
- 2026-02-20 20:30: Final mitigation process begins - engineers complete release to restore withdrawn prefixes
- 2026-02-20 23:03: Configuration update completed - remaining prefixes restored
Connection to Code Orange: Fail Small
This incident occurred while Cloudflare was implementing changes as part of the Code Orange: Fail Small initiative, which has three main goals:
- Require controlled rollouts for configuration changes propagated to the network
- Change internal "break glass" procedures and remove circular dependencies
- Review, improve, and test failure modes of all systems handling network traffic
The change that caused the outage fell under the first goal - moving risky manual changes to safe, automated configuration updates. While preventative measures weren't fully deployed before the outage, teams were actively working on these systems when the incident occurred.
Remediation and Follow-up Steps
Cloudflare has outlined several improvements to prevent similar incidents:
API Schema Standardization
- Improve API schema to ensure better standardization
- Make it easier for tests and automated systems to validate whether API calls are properly formed (a sketch follows this list)
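As a rough illustration of what such validation could look like (not Cloudflare's schema; field names and allowed values are assumptions), a JSON-Schema check can reject an ambiguous query, such as an empty status filter, before it ever reaches the task runner:

```python
# Minimal sketch of schema validation for cleanup queries. Field names and
# allowed values are hypothetical; the jsonschema library is used for illustration.
from jsonschema import ValidationError, validate

CLEANUP_QUERY_SCHEMA = {
    "type": "object",
    "properties": {
        "status": {
            "type": "string",
            "enum": ["active", "pending_deletion", "withdrawn"],
        },
    },
    "required": ["status"],         # the filter must be present...
    "additionalProperties": False,  # ...and nothing unexpected is allowed
}

def validated_query(params: dict) -> dict:
    """Raise instead of silently widening an ambiguous query."""
    validate(instance=params, schema=CLEANUP_QUERY_SCHEMA)
    return params

validated_query({"status": "pending_deletion"})    # ok
try:
    validated_query({"status": ""})                # empty string: rejected by the enum
except ValidationError as err:
    print(f"rejected malformed query: {err.message}")
```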
Better Separation Between Operational and Configured State
- Redesign rollback mechanism and database configuration
- Introduce layers between customer configuration and production
- Implement database snapshots that can be applied through health-mediated deployments
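A minimal sketch of the health-mediated idea follows, under the assumption that configuration can be captured as a snapshot and applied in batches with a health probe between batches. All names, thresholds, and the probe itself are hypothetical, not Cloudflare's implementation.

```python
# Conceptual sketch: apply a new prefix configuration in small batches,
# checking health after each batch and reverting to the pre-change snapshot
# on regression.
import copy

def health_ok(config) -> bool:
    """Placeholder health probe, e.g. reachability checks on sampled prefixes."""
    return all(entry.get("advertised", False) for entry in config.values())

def rollout(snapshot, desired, batch_size=50):
    """Apply `desired` on top of `snapshot`, batch by batch, with health gates."""
    applied = copy.deepcopy(snapshot)
    changed = [key for key in desired if desired[key] != snapshot.get(key)]
    for start in range(0, len(changed), batch_size):
        for key in changed[start:start + batch_size]:
            applied[key] = copy.deepcopy(desired[key])
        if not health_ok(applied):
            print(f"health regression in batch {start // batch_size}; reverting to snapshot")
            return copy.deepcopy(snapshot)
    return applied

# Example: a withdrawal that breaks the health probe is rolled back.
snapshot = {"203.0.113.0/24": {"advertised": True}}
desired = {"203.0.113.0/24": {"advertised": False}}
print(rollout(snapshot, desired, batch_size=1))
```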
Better Arbitrate Large Withdrawal Actions
- Improve monitoring to detect when changes are happening too fast or too broadly
- Implement circuit breakers to stop out-of-control processes
- Use customer service behavior signals to trigger circuit breakers
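A minimal sketch of such a circuit breaker is below, with hypothetical thresholds (for example, tripping if more than 1% of prefixes are withdrawn within a five-minute window); it is illustrative only.

```python
# Minimal circuit-breaker sketch: block further withdrawals once a process
# withdraws more than a small fraction of prefixes in a short window.
import time

class WithdrawalCircuitBreaker:
    def __init__(self, total_prefixes, max_fraction=0.01, window_seconds=300):
        self.limit = max(1, int(total_prefixes * max_fraction))
        self.window = window_seconds
        self.events = []          # timestamps of recent withdrawals
        self.tripped = False

    def allow_withdrawal(self) -> bool:
        now = time.monotonic()
        self.events = [t for t in self.events if now - t < self.window]
        if self.tripped or len(self.events) >= self.limit:
            self.tripped = True   # halt the runaway process until a human resets it
            return False
        self.events.append(now)
        return True

breaker = WithdrawalCircuitBreaker(total_prefixes=4306)
for i in range(100):
    if not breaker.allow_withdrawal():
        print(f"circuit breaker tripped after {i} withdrawals; halting automation")
        break
```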
Conclusion
Cloudflare has acknowledged the severity of this incident and its impact on customers and the broader internet. The company has committed to these improvements to strengthen stability going forward and to prevent similar problems from recurring. This incident serves as a reminder of the critical nature of internet infrastructure and the importance of robust testing, deployment processes, and rollback mechanisms in maintaining service reliability.
