Anthropic drops its safety pause pledge: what RSP v3.0 actually changes
Anthropic removed its commitment to halt AI training if risks aren't mitigated. Here is what RSP v3.0 changes and why it matters.
TL;DR: Anthropic's Responsible Scaling Policy v3.0, effective February 24, 2026, removes the hard commitment to pause AI training if the company cannot guarantee adequate safety mitigations first. The new policy only triggers a delay if Anthropic is simultaneously leading the AI race and judges the risk of catastrophe to be material. The company frames this as a collective action problem, not a retreat from safety.
When Anthropic introduced its Responsible Scaling Policy in 2023, the core commitment was unusually concrete for a tech company. The company pledged that it would not train AI models more capable than a certain threshold unless it could demonstrate, in advance, that its safety measures were adequate to handle the risks those models could pose.
This was a firm, unilateral line. It did not depend on what OpenAI was doing. It did not depend on whether the regulatory environment was favorable. Anthropic was saying: if we cannot prove safety first, we stop. Full stop.
The policy was built around "AI Safety Levels," or ASLs, a tiering system roughly analogous to biosafety levels used in laboratory settings. Each level corresponded to a range of model capabilities and a set of required mitigations. The logic was simple. Before crossing into a higher capability tier, you earn the right to be there by proving your safety framework can handle it.
That commitment was what made Anthropic credibly different from its competitors, at least in the eyes of the AI safety community and much of the press. It was enforceable, public, and required no political will from anyone else to activate.
RSP v3.0, effective February 24, 2026, removes that.
As Winbuzzer reported, the headline change is the elimination of the categorical pause trigger. Under the old framework, Anthropic's inability to demonstrate adequate safety mitigations was, by itself, sufficient to halt development. That is no longer the case.
What replaces it is a conditional framework. And what Anthropic adds on the transparency side is genuinely new.
The policy now introduces two required outputs: Frontier Safety Roadmaps and Risk Reports.
Frontier Safety Roadmaps are public documents laying out Anthropic's concrete plans across four domains: Security, Alignment, Safeguards, and Policy. The goals are meant to be ambitious but achievable, and the company commits to updating them regularly.
Risk Reports are published every three to six months and go beyond model capability descriptions. They cover threat models, active mitigations, and Anthropic's overall assessment of risk level at the time of publication. In certain circumstances, external expert reviewers get access to these reports before they are published.
The RSP also now formally separates two categories of commitment: what Anthropic will do regardless of what competitors do, and what it believes the broader industry should adopt. This is an honest structural change. The previous version treated both as equivalent. The new one admits they are not.
The new pause trigger is a two-part test. First, Anthropic's leadership must judge that the company is the leader in the AI race. Second, they must assess that the risk of catastrophe is material. Both conditions must hold simultaneously; neither alone is sufficient to trigger a delay.
This is a considerably higher bar than the original. Under RSP v2, the inability to demonstrate adequate safety mitigations was enough on its own. You did not need to also be winning the race. Under v3, even if risks are material, Anthropic can continue developing if it does not consider itself the frontrunner. And even if it is the frontrunner, it can continue if it does not consider the catastrophe risk material.
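To make the logical shift concrete, here is a minimal sketch of the two decision rules as described above. This is purely illustrative; the function and parameter names are mine, not Anthropic's terminology.

```python
# Toy illustration of the two pause triggers as described in this article.
# Names and structure are the author's own, not Anthropic's.

def should_pause_v2(mitigations_proven_adequate: bool) -> bool:
    # Old framework: failing to demonstrate adequate mitigations was,
    # by itself, sufficient to halt training.
    return not mitigations_proven_adequate

def should_pause_v3(is_frontier_leader: bool, catastrophe_risk_material: bool) -> bool:
    # New framework: both judgments must hold at the same time.
    # Either one alone is not enough to trigger a delay.
    return is_frontier_leader and catastrophe_risk_material

# Example: material risk, but not the frontrunner -> development continues under v3.
assert should_pause_v2(mitigations_proven_adequate=False) is True
assert should_pause_v3(is_frontier_leader=False, catastrophe_risk_material=True) is False
```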
The subjective nature of both judgments matters here. Who decides if Anthropic is "leading"? The same executives whose financial incentives are tied to shipping products. Who decides if catastrophe risk is "material"? The same company whose chief science officer just told TIME that stopping development "wouldn't actually help anyone."
Jared Kaplan, Anthropic's chief science officer, gave TIME the clearest on-record explanation. "We felt that it wouldn't actually help anyone for us to stop training AI models," he said, adding that "with the rapid advance of AI, it made sense for us to make unilateral commitments" and that "if competitors are blazing ahead" a unilateral pause would simply cede ground.
Anthropic's own policy documentation cites three forces that made the original structure untenable.
The first is what the company calls a "zone of ambiguity," the difficulty of defining precisely when a capability threshold has been crossed. Bright lines are clean in policy documents and messy in practice.
The second is the political climate. In plain terms, the regulatory environment in 2026 is not friendly to companies that voluntarily slow themselves down. The incentive structure has shifted.
The third is the most intellectually honest acknowledgment in the document: some of the mitigations required at higher ASLs simply cannot be implemented by one company alone. They require industry-wide coordination, and that coordination does not yet exist.
This brings the company to what it calls the collective action problem. If Anthropic pauses and OpenAI, Google DeepMind, xAI, and others continue, the net effect on global AI risk is unclear and potentially negative. The company with the weakest safety practices sets the pace. A unilateral pause might feel principled and accomplish nothing.
That is a real argument. It is also one that every competitor would be happy for Anthropic to believe.
RSP v3.0 did not drop in a vacuum. The week of its release, Anthropic was simultaneously in a tense public standoff with the Pentagon.
According to CNN, Defense Secretary Pete Hegseth met with Anthropic CEO Dario Amodei and demanded that the company lift its usage restrictions so Claude could be used for "all lawful use" by the military. Anthropic had drawn two red lines: AI-controlled weapons and mass domestic surveillance of American citizens. The Pentagon wanted both removed.
The company holds a $200 million government contract. According to reporting from Axios and Bloomberg, Hegseth threatened to terminate that contract by Friday if Anthropic did not comply. The administration also reportedly planned to label Anthropic a "supply chain risk," a designation that would function as a government blacklist. There was discussion of invoking the Defense Production Act to compel cooperation.
This is not a peripheral detail. Every other major AI lab, including OpenAI, Google, and Elon Musk's xAI, had already agreed to lift guardrails for Pentagon work. Anthropic was the outlier. And as the Pentagon turned up the pressure, the company released a policy document reframing its safety commitments as contingent rather than absolute.
Anthropic has not publicly confirmed any link between the Pentagon pressure and the RSP timing. But as CNN noted, the proximity of the two events has not gone unnoticed.
The Register and Engadget both covered the dual storyline, with Engadget's headline directly connecting the two: "Anthropic weakens its safety pledge in the wake of the Pentagon's pressure campaign."
The response from researchers close to this space has been a mix of critique and cautious acknowledgment of the policy's new transparency mechanisms.
The most striking signal came two weeks before the RSP release. Mrinank Sharma, who led Anthropic's safeguards research team, resigned on February 9, 2026, posting a public letter on X stating "the world is in peril," as reported by Semafor. Sharma wrote that he had "repeatedly seen how hard it is to truly let our values govern our actions" at Anthropic, pointing to a gap between the company's public commitments and internal practice. He specifically cited concerns about bioterrorism and AI-assisted catastrophic risks.
Sharma's resignation was not framed as being about the RSP specifically. But its timing, two weeks before a policy revision that softened Anthropic's hardest commitments, invited its own conclusions.
On Hacker News, the discussion thread on the TIME piece generated significant traffic. Analysis on LessWrong and other forums drew comparisons to Google abandoning "Don't be evil." Others noted that removing visible commitments weakens broader industry norms, since those norms have historically depended on peer pressure and public accountability. When the most safety-focused company drops its hardest line, it becomes harder for anyone else to hold theirs.
"They said they were different. Now they're not." This sentiment, repeated in various forms across forums and comment threads, captures the emotional charge of the response. The factual version is more complex.
The policy's new Frontier Safety Roadmaps and Risk Reports are not nothing. External review of Risk Reports under certain conditions is a concrete accountability mechanism. Publishing every three to six months on threat models and active mitigations gives researchers and journalists something to check against. The old RSP had a bright line. The new one has more transparency. Whether that is an even trade depends on what you believe actually reduces risk.
| Feature | RSP v2 | RSP v3 |
|---|---|---|
| Hard pause trigger | ✓ Cannot train without proven mitigations | ✗ Replaced with dual condition |
| Dual condition required | ✗ Not applicable | ✓ Must lead AI race AND judge risk material |
| Frontier Safety Roadmap | ✗ Not required | ✓ Published publicly, updated regularly |
| Risk Reports | ✗ Not required | ✓ Every 3-6 months, with threat model details |
| External review of reports | ✗ Not required | ✓ Required under certain circumstances |
| Unilateral commitment scope | ✓ Full, regardless of competitors | ✗ Narrowed; broader industry track separated |
| Collective action framing | ✗ Not addressed | ✓ Explicitly acknowledged |
| Separation of company vs industry commitments | ✗ Treated as equivalent | ✓ Now formally separated |
The table tells the story. What Anthropic removed was a hard constraint. What it added was visibility. Whether the AI safety community accepts that trade will depend on whether the new transparency mechanisms are actually enforced.
If you build products on Claude today, the practical implications are limited in the short term. The API is the same. The usage policies are the same. Anthropic is not suggesting that Claude's capabilities or safety guardrails have changed.
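For illustration, the integration surface is untouched: a standard Messages API call through Anthropic's official Python SDK works exactly as it did before the policy change. The model name below is a placeholder; use whichever model you already target.

```python
# A standard Anthropic Messages API call, unchanged by RSP v3.0.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder model name
    max_tokens=256,
    messages=[{"role": "user", "content": "Summarize RSP v3.0 in two sentences."}],
)
print(response.content[0].text)
```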
The longer-term signal is subtler. Anthropic has repositioned itself from a company that would unilaterally stop before crossing certain lines, to one that will match or surpass competitor safety efforts and publish detailed documentation of where it stands. The bet is that transparency plus competitive parity produces better outcomes than a unilateral hard limit in a market where others will not follow.
For developers, this means the safety guarantees baked into Claude's design remain intact for now. But the framework that was supposed to prevent the company from ever building beyond a safe threshold has become conditional on competitive dynamics that the company itself assesses.
That is a different kind of promise. It may still be kept. But you should know it changed.
Frequently asked questions

What is the Responsible Scaling Policy?

The Responsible Scaling Policy, or RSP, is Anthropic's internal framework governing how it develops and deploys increasingly capable AI models. It defines AI Safety Levels (ASLs) tied to model capability thresholds, and specifies what safety mitigations must be in place before Anthropic can train or deploy models at each level. The policy was first introduced in 2023 and is intended to be a public, binding commitment on the company's behavior.
What did the original RSP commit Anthropic to?

Under the original framework (versions prior to v3.0), Anthropic committed to halting development of more capable AI models if it could not demonstrate in advance that its safety measures were adequate to handle the risks those models posed. This was a unilateral commitment, meaning it applied regardless of what other AI companies were doing.
What changed in RSP v3.0?

RSP v3.0 replaces the categorical pause trigger with a two-part test. Anthropic will only delay AI development if its leadership concludes, simultaneously, that Anthropic is leading the AI race and that the risk of catastrophe is material. Both conditions must be true at the same time. If either is absent, development continues.
Why did Anthropic make the change?

Anthropic cited three reasons: a "zone of ambiguity" in defining capability thresholds precisely, an increasingly anti-regulatory political climate, and the practical impossibility of implementing some higher-level mitigations without industry-wide coordination. The company frames the change as a response to a collective action problem, not an abandonment of safety values.
What is the collective action problem?

Anthropic's argument is that AI catastrophe risk depends on the behavior of the entire ecosystem of frontier AI developers, not just one company. If Anthropic pauses while OpenAI, Google, xAI, and others continue without equivalent safety measures, the net result could be a world where companies with weaker safety practices set the development pace. A unilateral pause, in this view, imposes costs on Anthropic without producing safety benefits for the world.
What are Frontier Safety Roadmaps and Risk Reports?

These are two new transparency mechanisms introduced in RSP v3.0. Frontier Safety Roadmaps are public documents describing Anthropic's concrete plans across Security, Alignment, Safeguards, and Policy. Risk Reports are published every three to six months and include threat model analysis, active mitigations, and Anthropic's overall risk assessment for deployed models. External experts can review Risk Reports under certain circumstances before publication.
Who is Mrinank Sharma, and why does his resignation matter?

Mrinank Sharma was Anthropic's head of safeguards research. He resigned on February 9, 2026, two weeks before RSP v3.0 was published, and posted a public letter stating "the world is in peril." He wrote that he had repeatedly seen how difficult it is for Anthropic to let its values govern its actions in practice, and cited concerns about AI-assisted catastrophic risks including bioterrorism. His resignation is relevant because it suggests internal tension about the gap between Anthropic's public safety commitments and internal culture, even before the policy change was announced.
What is the Pentagon standoff about?

The U.S. Department of Defense holds a $200 million contract with Anthropic. Defense Secretary Pete Hegseth reportedly demanded that Anthropic lift its usage restrictions to allow Claude to be used for "all lawful use" by the military, including areas Anthropic had flagged as red lines: AI-controlled weapons and mass domestic surveillance. The Pentagon reportedly threatened to terminate the contract and label Anthropic a "supply chain risk," a de facto government blacklist, if Anthropic refused. All other major AI labs had already agreed to lift their standard guardrails for Pentagon work. The standoff became public the same week RSP v3.0 was released.
Does the policy change affect Claude today?

No, not directly. The RSP governs Anthropic's internal development decisions, not the day-to-day behavior of Claude in the API or consumer products. Claude's usage policies, system prompt behavior, and built-in safety training are separate from the RSP framework. The policy change affects how Anthropic decides whether to build and release more capable future models, not what the current model does or refuses to do.
Is this a betrayal of Anthropic's safety commitments?

That depends on your model of how safety actually gets achieved in a competitive market. Critics argue that removing the hard unilateral pause weakens a credible, enforceable commitment and sets a bad precedent for the broader industry. Defenders argue that a pause no other company would follow is not actually safe, and that the new transparency mechanisms (Frontier Safety Roadmaps, Risk Reports, and external review) create more durable accountability. Both positions are coherent. The honest answer is that it trades one kind of assurance for another, and we will not know which bet was right until we are further into the capability curve.