Anthropic drops the central promise of its flagship Responsible Scaling Policy: to halt AI development when it cannot guarantee adequate safety measures. The company cites competitive pressures and scientific uncertainty.
Anthropic, once considered the most safety-conscious AI company, has abandoned the central pledge of its Responsible Scaling Policy (RSP), marking a significant shift in its approach to AI development. The company that positioned itself as the responsible alternative to OpenAI is now explicitly choosing to continue training AI models even when it cannot guarantee adequate safety measures are in place.
The Original Promise and Its Abandonment
In 2023, Anthropic committed never to train an AI system unless it could guarantee in advance that its safety measures were adequate. This promise became the cornerstone of the RSP and was repeatedly touted by company leaders as evidence of their commitment to responsible AI development. Jared Kaplan, Anthropic's chief science officer, told TIME that the company no longer considers this approach viable.
"We felt that it wouldn't actually help anyone for us to stop training AI models," Kaplan explained. "We didn't really feel, with the rapid advance of AI, that it made sense for us to make unilateral commitments … if competitors are blazing ahead."
The New Reality of AI Competition
The policy change comes as Anthropic experiences unprecedented commercial success. The company raised $30 billion in new investments in February, valuing it at approximately $380 billion, with revenue growing roughly tenfold year over year. Its Claude models, particularly the software-writing tool Claude Code, have attracted a devoted following.
This commercial momentum appears to have influenced Anthropic's strategic calculus. The company's core business model of selling directly to businesses is viewed by many investors as more credible than OpenAI's consumer-focused approach. However, Kaplan denies the change represents a capitulation to market pressures.
"I don't think we're making any kind of U-turn," he stated, framing the decision instead as a pragmatic response to emerging political and scientific realities.
The Scientific and Political Context
The original RSP was designed partly to encourage rivals to adopt similar safety measures and potentially serve as a blueprint for national regulations or international treaties. However, these hoped-for outcomes never materialized. The Trump Administration has endorsed an aggressive approach to AI development, even attempting to nullify state regulations, with no federal AI law on the horizon.
Meanwhile, the science of AI evaluations has proven more complex than anticipated. In 2025, Anthropic announced it could not rule out the possibility that its models could facilitate bio-terrorist attacks, yet lacked strong scientific evidence to definitively prove such dangers exist. This created a "fuzzy gradient" rather than the "bright red line" the company had envisioned.
The New Policy Framework
The revised RSP includes several key changes:
- Transparency commitments: Additional disclosures about how Anthropic's models perform in safety testing
- Competitive alignment: A promise to match or surpass competitors' safety efforts
- Conditional delays: Development will be delayed only if Anthropic considers itself the leader in the AI race AND judges the risk of catastrophe to be significant
- Frontier Safety Roadmaps: Regular documents outlining detailed goals for future safety measures
- Risk Reports: More in-depth assessments published every three to six months
The new policy explicitly states that if one developer pauses while others continue without strong mitigations, "responsible developers would lose their ability to do safety research."
Industry Reactions and Concerns
Chris Painter, director of policy at METR, a nonprofit focused on evaluating AI models for risky behavior, reviewed an early draft of the policy. He views the change as understandable but concerning.
"The change to the RSP shows Anthropic believes it needs to shift into triage mode with its safety plans, because methods to assess and mitigate risk are not keeping up with the pace of capabilities," Painter told TIME. "This is more evidence that society is not prepared for the potential catastrophic risks posed by AI."
Painter expressed concern that moving away from binary thresholds could enable a "frog-boiling" effect, where danger gradually increases without clear alarm points.
The Safety Research Paradox
The decision reflects a fundamental tension in Anthropic's founding premise: the belief that proper AI safety research requires building models at the frontier of capability, even though this approach might accelerate the arrival of the dangers they fear. Kaplan argues that continuing development while competitors race ahead actually serves safety goals.
"If all of our competitors are transparently doing the right thing when it comes to catastrophic risk, we are committed to doing as well or better," Kaplan said. "But we don't think it makes sense for us to stop engaging with AI research, AI safety, and most likely lose relevance as an innovator who understands the frontier of the technology."
Implications for AI Safety
The policy shift represents a significant retreat from Anthropic's previous position as the industry's safety standard-bearer. While the company maintains it can preserve incentives to build safety measures by restricting the release of new models, removing categorical barriers to training marks a fundamental change in approach.
The decision suggests that even companies that positioned themselves as prioritizing safety over speed are ultimately unwilling to slow down in the face of competitive pressures and scientific uncertainty. As the AI race intensifies between companies and nations, the question of how to balance innovation with safety becomes increasingly urgent.
The new RSP may create a forcing function for safety work, but it also removes the clearest mechanism for pausing development when risks are unclear. Whether this approach will prove more effective at managing catastrophic AI risks remains to be seen, but it represents a significant evolution in how one of the field's leading companies approaches the challenge of safe AI development.