Anthropic drops its safety pause pledge: what RSP v3.0 actually changes
Anthropic removed its commitment to halt AI training if risks aren't mitigated. Here is what RSP v3.0 changes and why it matters.
TL;DR: Anthropic's Responsible Scaling Policy v3.0, effective February 24, 2026, removes the hard commitment to pause AI training if the company cannot guarantee adequate safety mitigations first. The new policy only triggers a delay if Anthropic is simultaneously leading the AI race and judges the risk of catastrophe to be material. The company frames this as a collective action problem, not a retreat from safety.
When Anthropic introduced its Responsible Scaling Policy in 2023, the core commitment was unusually concrete for a tech company. The company pledged that it would not train AI models more capable than a certain threshold unless it could demonstrate, in advance, that its safety measures were adequate to handle the risks those models could pose.
This was a firm, unilateral line. It did not depend on what OpenAI was doing. It did not depend on whether the regulatory environment was favorable. Anthropic was saying: if we cannot prove safety first, we stop. Full stop.
The policy was built around "AI Safety Levels," or ASLs, a tiering system roughly analogous to biosafety levels used in laboratory settings. Each level corresponded to a range of model capabilities and a set of required mitigations. The logic was simple. Before crossing into a higher capability tier, you earn the right to be there by proving your safety framework can handle it.
That commitment was what made Anthropic credibly different from its competitors, at least in the eyes of the AI safety community and much of the press. It was enforceable, public, and required no political will from anyone else to activate.
RSP v3.0, effective February 24, 2026, removes that.
As Winbuzzer reported, the headline change is the elimination of the categorical pause trigger. Under the old framework, Anthropic's inability to demonstrate adequate safety mitigations was, by itself, sufficient to halt development. That is no longer the case.
What replaces it is a conditional framework. And what Anthropic adds on the transparency side is genuinely new.
The policy now introduces two required outputs: Frontier Safety Roadmaps and Risk Reports.
Frontier Safety Roadmaps are public documents laying out Anthropic's concrete plans across four domains: Security, Alignment, Safeguards, and Policy. The goals are meant to be ambitious but achievable, and the company commits to updating them regularly.
Risk Reports are published every three to six months and go beyond model capability descriptions. They cover threat models, active mitigations, and Anthropic's overall assessment of risk level at the time of publication. In certain circumstances, external expert reviewers get access to these reports before they are published.
The RSP also now formally separates two categories of commitment: what Anthropic will do regardless of what competitors do, and what it believes the broader industry should adopt. This is an honest structural change. The previous version treated both as equivalent. The new one admits they are not.
The new pause trigger is a two-part test. First, Anthropic's leadership must judge that the company is the leader in the AI race. Second, they must assess that the risk of catastrophe is material. Both conditions must hold simultaneously; neither alone is sufficient to trigger a delay.
This is a considerably higher bar than the original. Under RSP v2, the inability to demonstrate adequate safety mitigations was enough on its own. You did not need to also be winning the race. Under v3, even if risks are material, Anthropic can continue developing if it does not consider itself the frontrunner. And even if it is the frontrunner, it can continue if it does not consider the catastrophe risk material.
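To make the logical shift concrete, here is a minimal sketch of the two decision rules as described above. This is purely illustrative; the function and parameter names are mine, not Anthropic's terminology.

```python
# Toy illustration of the two pause triggers as described in this article.
# Names and structure are the author's own, not Anthropic's.

def should_pause_v2(mitigations_proven_adequate: bool) -> bool:
    # Old framework: failing to demonstrate adequate mitigations was,
    # by itself, sufficient to halt training.
    return not mitigations_proven_adequate

def should_pause_v3(is_frontier_leader: bool, catastrophe_risk_material: bool) -> bool:
    # New framework: both judgments must hold at the same time.
    # Either one alone is not enough to trigger a delay.
    return is_frontier_leader and catastrophe_risk_material

# Example: material risk, but not the frontrunner -> development continues under v3.
assert should_pause_v2(mitigations_proven_adequate=False) is True
assert should_pause_v3(is_frontier_leader=False, catastrophe_risk_material=True) is False
```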
The subjective nature of both judgments matters here. Who decides if Anthropic is "leading"? The same executives whose financial incentives are tied to shipping products. Who decides if catastrophe risk is "material"? The same company whose chief science officer just told TIME that stopping development "wouldn't actually help anyone."
Jared Kaplan, Anthropic's chief science officer, gave TIME the clearest on-record explanation. "We felt that it wouldn't actually help anyone for us to stop training AI models," he said, adding that "with the rapid advance of AI, it made sense for us to make unilateral commitments" and that "if competitors are blazing ahead" a unilateral pause would simply cede ground.
Anthropic's own policy documentation cites three forces that made the original structure untenable.
The first is what the company calls a "zone of ambiguity," the difficulty of defining precisely when a capability threshold has been crossed. Bright lines are clean in policy documents and messy in practice.
The second is the political climate. In plain terms, the regulatory environment in 2026 is not friendly to companies that voluntarily slow themselves down. The incentive structure has shifted.
The third is the most intellectually honest acknowledgment in the document: some of the mitigations required at higher ASLs simply cannot be implemented by one company alone. They require industry-wide coordination, and that coordination does not yet exist.
This brings the company to what it calls the collective action problem. If Anthropic pauses and OpenAI, Google DeepMind, xAI, and others continue, the net effect on global AI risk is unclear and potentially negative. The company with the weakest safety practices sets the pace. A unilateral pause might feel principled and accomplish nothing.
That is a real argument. It is also one that every competitor would be happy for Anthropic to believe.
RSP v3.0 did not drop in a vacuum. The week of its release, Anthropic was simultaneously in a tense public standoff with the Pentagon.
According to CNN, Defense Secretary Pete Hegseth met with Anthropic CEO Dario Amodei and demanded that the company lift its usage restrictions so Claude could be used for "all lawful use" by the military. Anthropic had drawn two red lines: AI-controlled weapons and mass domestic surveillance of American citizens. The Pentagon wanted both removed.
The company holds a $200 million government contract. According to reporting from Axios and Bloomberg, Hegseth threatened to terminate that contract by Friday if Anthropic did not comply. The administration also reportedly planned to label Anthropic a "supply chain risk," a designation that would function as a government blacklist. There was discussion of invoking the Defense Production Act to compel cooperation.
This is not a peripheral detail. Every other major AI lab, including OpenAI, Google, and Elon Musk's xAI, had already agreed to lift guardrails for Pentagon work. Anthropic was the outlier. And as the Pentagon turned up the pressure, the company released a policy document reframing its safety commitments as contingent rather than absolute.
Anthropic has not publicly confirmed any link between the Pentagon pressure and the RSP timing. But as CNN noted, the proximity of the two events has not gone unnoticed.
The Register and Engadget both covered the dual storyline, with Engadget's headline directly connecting the two: "Anthropic weakens its safety pledge in the wake of the Pentagon's pressure campaign."
The response from researchers close to this space has been a mix of critique and cautious acknowledgment of the policy's new transparency mechanisms.
The most striking signal came two weeks before the RSP release. Mrinank Sharma, who led Anthropic's safeguards research team, resigned on February 9, 2026, posting a public letter on X stating "the world is in peril," as reported by Semafor. Sharma wrote that he had "repeatedly seen how hard it is to truly let our values govern our actions" at Anthropic, pointing to a gap between the company's public commitments and internal practice. He specifically cited concerns about bioterrorism and AI-assisted catastrophic risks.
Sharma's resignation was not framed as being about the RSP specifically. But its timing, two weeks before a policy revision that softened Anthropic's hardest commitments, invited its own conclusions.
On Hacker News, the discussion thread on the TIME piece generated significant traffic. Analysis on LessWrong and other forums drew comparisons to Google abandoning "Don't be evil." Others noted that removing visible commitments weakens broader industry norms, since those norms have historically depended on peer pressure and public accountability. When the most safety-focused company drops its hardest line, it becomes harder for anyone else to hold theirs.
"They said they were different. Now they're not." This sentiment, repeated in various forms across forums and comment threads, captures the emotional charge of the response. The factual version is more complex.
The policy's new Frontier Safety Roadmaps and Risk Reports are not nothing. External review of Risk Reports under certain conditions is a concrete accountability mechanism. Publishing every three to six months on threat models and active mitigations gives researchers and journalists something to check against. The old RSP had a bright line. The new one has more transparency. Whether that is an even trade depends on what you believe actually reduces risk.
| Feature | RSP v2 | RSP v3 |
|---|---|---|
| Hard pause trigger | ✓ Cannot train without proven mitigations | ✗ Replaced with dual condition |
| Dual condition required | ✗ Not applicable | ✓ Must lead AI race AND judge risk material |
| Frontier Safety Roadmap | ✗ Not required | ✓ Published publicly, updated regularly |
| Risk Reports | ✗ Not required | ✓ Every 3-6 months, with threat model details |
| External review of reports | ✗ Not required | ✓ Required under certain circumstances |
| Unilateral commitment scope | ✓ Full, regardless of competitors | ✗ Narrowed; broader industry track separated |
| Collective action framing | ✗ Not addressed | ✓ Explicitly acknowledged |
| Separation of company vs industry commitments | ✗ Treated as equivalent | ✓ Now formally separated |
The table tells the story. What Anthropic removed was a hard constraint. What it added was visibility. Whether the AI safety community accepts that trade will depend on whether the new transparency mechanisms are actually enforced.
If you build products on Claude today, the practical implications are limited in the short term. The API is the same. The usage policies are the same. Anthropic is not suggesting that Claude's capabilities or safety guardrails have changed.
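For illustration, the integration surface is untouched: a standard Messages API call through Anthropic's official Python SDK works exactly as it did before the policy change. The model name below is a placeholder; use whichever model you already target.

```python
# A standard Anthropic Messages API call, unchanged by RSP v3.0.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder model name
    max_tokens=256,
    messages=[{"role": "user", "content": "Summarize RSP v3.0 in two sentences."}],
)
print(response.content[0].text)
```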
The longer-term signal is subtler. Anthropic has repositioned itself from a company that would unilaterally stop before crossing certain lines, to one that will match or surpass competitor safety efforts and publish detailed documentation of where it stands. The bet is that transparency plus competitive parity produces better outcomes than a unilateral hard limit in a market where others will not follow.
For developers, this means the safety guarantees baked into Claude's design remain intact for now. But the framework that was supposed to prevent the company from ever building beyond a safe threshold has become conditional on competitive dynamics that the company itself assesses.
That is a different kind of promise. It may still be kept. But you should know it changed.
Frequently asked questions

What is the Responsible Scaling Policy?

The Responsible Scaling Policy, or RSP, is Anthropic's internal framework governing how it develops and deploys increasingly capable AI models. It defines AI Safety Levels (ASLs) tied to model capability thresholds, and specifies what safety mitigations must be in place before Anthropic can train or deploy models at each level. The policy was first introduced in 2023 and is intended to be a public, binding commitment on the company's behavior.
What did the original RSP commit Anthropic to?

Under the original framework (versions prior to v3.0), Anthropic committed to halting development of more capable AI models if it could not demonstrate in advance that its safety measures were adequate to handle the risks those models posed. This was a unilateral commitment, meaning it applied regardless of what other AI companies were doing.
What changed in RSP v3.0?

RSP v3.0 replaces the categorical pause trigger with a two-part test. Anthropic will only delay AI development if its leadership concludes, simultaneously, that Anthropic is leading the AI race and that the risk of catastrophe is material. Both conditions must be true at the same time. If either is absent, development continues.
Why did Anthropic make the change?

Anthropic cited three reasons: a "zone of ambiguity" in defining capability thresholds precisely, an increasingly anti-regulatory political climate, and the practical impossibility of implementing some higher-level mitigations without industry-wide coordination. The company frames the change as a response to a collective action problem, not an abandonment of safety values.
What is the collective action problem?

Anthropic's argument is that AI catastrophe risk depends on the behavior of the entire ecosystem of frontier AI developers, not just one company. If Anthropic pauses while OpenAI, Google, xAI, and others continue without equivalent safety measures, the net result could be a world where companies with weaker safety practices set the development pace. A unilateral pause, in this view, imposes costs on Anthropic without producing safety benefits for the world.
What are Frontier Safety Roadmaps and Risk Reports?

These are two new transparency mechanisms introduced in RSP v3.0. Frontier Safety Roadmaps are public documents describing Anthropic's concrete plans across Security, Alignment, Safeguards, and Policy. Risk Reports are published every three to six months and include threat model analysis, active mitigations, and Anthropic's overall risk assessment for deployed models. External experts can review Risk Reports under certain circumstances before publication.
Who is Mrinank Sharma, and why does his resignation matter?

Mrinank Sharma was Anthropic's head of safeguards research. He resigned on February 9, 2026, two weeks before RSP v3.0 was published, and posted a public letter stating "the world is in peril." He wrote that he had repeatedly seen how difficult it is for Anthropic to let its values govern its actions in practice, and cited concerns about AI-assisted catastrophic risks including bioterrorism. His resignation is relevant because it suggests internal tension about the gap between Anthropic's public safety commitments and internal culture, even before the policy change was announced.
What is the Pentagon standoff about?

The U.S. Department of Defense holds a $200 million contract with Anthropic. Defense Secretary Pete Hegseth reportedly demanded that Anthropic lift its usage restrictions to allow Claude to be used for "all lawful use" by the military, including areas Anthropic had flagged as red lines: AI-controlled weapons and mass domestic surveillance. The Pentagon reportedly threatened to terminate the contract and label Anthropic a "supply chain risk," a de facto government blacklist, if Anthropic refused. All other major AI labs had already agreed to lift their standard guardrails for Pentagon work. The standoff became public the same week RSP v3.0 was released.
Does the policy change affect Claude today?

No, not directly. The RSP governs Anthropic's internal development decisions, not the day-to-day behavior of Claude in the API or consumer products. Claude's usage policies, system prompt behavior, and built-in safety training are separate from the RSP framework. The policy change affects how Anthropic decides whether to build and release more capable future models, not what the current model does or refuses to do.
Is this a betrayal of Anthropic's safety commitments?

That depends on your model of how safety actually gets achieved in a competitive market. Critics argue that removing the hard unilateral pause weakens a credible, enforceable commitment and sets a bad precedent for the broader industry. Defenders argue that a pause no other company would follow is not actually safe, and that the new transparency mechanisms (Frontier Safety Roadmaps, Risk Reports, and external review) create more durable accountability. Both positions are coherent. The honest answer is that it trades one kind of assurance for another, and we will not know which bet was right until we are further into the capability curve.