TL;DR: The Office of Personnel Management has quietly updated its public AI use disclosure to remove Anthropic's Claude and add xAI's Grok and OpenAI's Codex, both listed as "Low-impact" tools with first production use in Q1 2026. The move is part of a broader federal compliance wave following Trump's directive ordering agencies to cease business with Anthropic within six months. The speed of the transition is raising serious questions about whether AI safety evaluation is keeping pace with AI procurement.
What you will learn
- What OPM specifically changed in its AI use disclosure inventory and what the "Low-impact" designation means
- How deeply embedded Claude was across federal agencies — from NASA chatbots to Treasury coding assistants to OPM's own HR drafting tools
- The scope and timeline of Trump's directive ordering federal agencies and military contractors to end Anthropic relationships
- xAI's parallel Pentagon deal to deploy Grok in classified systems, and why that deal matters beyond OPM
- Grok's documented history of producing racist and antisemitic outputs, and how that squares with its new federal role
- What OpenAI's Codex is being used for in government contexts and why OpenAI is positioned to capture most federal AI spend going forward
- Which agencies are moving fastest to comply and how FedScoop is tracking disclosure changes
- The structural risks of replacing a safety-first AI system with one that has documented safety failures
- What "locked-in infrastructure" means in the context of federal AI and why these choices are hard to reverse
- The geopolitical dimension of federal AI procurement and its Cold War parallels
The disclosure update: what OPM published and what it means
The Office of Personnel Management maintains a publicly accessible AI use disclosure inventory, required under the previous administration's guidance that agencies document and disclose how AI systems are used in their operations. The inventory is not glamorous reading: a structured list of tools, use cases, and risk classifications. But when OPM updated that inventory in late February 2026, the change was significant enough that FedScoop flagged it immediately.
Claude — Anthropic's large language model — was removed. In its place: Grok, the AI model developed by Elon Musk's xAI, and Codex, OpenAI's code generation model. Both are listed under the "Low-impact" classification, with first production use recorded in Q1 2026. The timing is not coincidental.
The "Low-impact" designation in federal AI governance is a formal category that signals the agency has assessed the AI system and determined it does not meaningfully affect the rights, safety, or welfare of the public. It is the lowest tier in a three-level risk classification framework. Low-impact systems can often be deployed with lighter oversight burdens and fewer approval stages than systems classified as impactful or safety-critical. That classification is significant because it enables faster rollout — and because it implies that the new tools passed, or were deemed to need, minimal safety scrutiny before going into production.
What OPM was using Claude for before the swap matters here. According to FedScoop's earlier reporting on agency Claude deployments, OPM had integrated Claude into HR drafting workflows — generating policy language, drafting employee communications, and supporting document preparation tasks. These are not trivial functions. OPM manages personnel policy for roughly 2.3 million federal civilian employees. Errors in HR communications have downstream legal and operational consequences. The tools used in those workflows matter.
How Claude was embedded in federal government before the ban
OPM was far from the only agency that built operational workflows around Claude. FedScoop's deep-dive reporting — "NASA chatbots, Treasury coding, OPM drafting: How agencies have deployed Claude" — documented a wide range of embedded use cases across the federal government, and the picture that emerged was not of a peripheral tool being used for occasional experimentation. Claude had become operational infrastructure.
NASA deployed Claude-powered chatbots to handle internal knowledge base queries, reducing the burden on IT helpdesk staff and enabling engineers to surface technical documentation more quickly. Treasury used Claude for coding assistance tasks, including reviewing and generating code in financial systems contexts. OPM, as noted, used it for HR document drafting. Other agencies had integrated Claude into procurement assistance, legal document review, and internal research summarization workflows.
This matters for the transition now underway. When an AI model is used experimentally — for one-off queries or low-stakes drafting — switching tools is relatively painless. But when it is embedded in documented workflows with trained users, established prompts, integration points with agency systems, and institutional knowledge built around its specific behavior, the switch carries real operational cost. Staff need to be retrained. Outputs need to be re-evaluated against a new model's behavior. Edge cases that were managed with Claude may surface differently with Grok or Codex.
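To make the integration cost concrete, here is a hypothetical sketch of the kind of coupling that accumulates around one model. Every function name and prompt below is invented for illustration; the point is that the prompt wording and the output post-processing are tuned to one model's observed behavior and must be revalidated when the model is swapped:

```python
from typing import Callable


# Hypothetical HR drafting helper, tuned over months to one model's behavior.
def draft_policy_memo(call_model: Callable[[str], str],
                      topic: str, audience: str) -> str:
    # Prompt wording iterated against the old model's quirks; a replacement
    # model may interpret the same instructions differently.
    prompt = (
        "Draft a federal HR policy memo.\n"
        f"Topic: {topic}\nAudience: {audience}\n"
        "Use plain language. Do not cite regulations unless asked."
    )
    raw = call_model(prompt)

    # Post-processing that papers over the old model's known edge cases,
    # e.g. stripping a preamble it tended to add. A new model may not add
    # the preamble, or may fail in ways this code does not anticipate.
    if raw.startswith("Here is the memo:"):
        raw = raw.split(":", 1)[1].lstrip()
    return raw
```

Multiply that by every workflow, documented prompt, and integration point, and the migration stops looking like a contract swap and starts looking like a re-engineering effort.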
The fact that Claude was embedded this deeply also complicates the safety analysis of the switch. Claude was built by Anthropic, a company whose founding thesis is "responsible development and maintenance of advanced AI for the long-term benefit of humanity." Anthropic has published extensive research on AI safety, maintains detailed model cards, and refused Pentagon requests to modify its model's safety restrictions — a dispute that, according to Federal News Network, reached near-deadline status before the Trump directive resolved the standoff in the government's favor. You can read more on that friction in the context of Claude still running Pentagon ops while defense contractors flee to rivals.
The Trump directive: timeline and agency compliance requirements
The executive action that set this transition in motion was not subtle. CNN Business reported that the Trump administration issued a directive ordering military contractors and federal agencies to cease business with Anthropic, a sweeping mandate that went beyond advisory guidance into explicit instruction. ExecutiveGov separately confirmed the move under the headline "Trump Halts Federal Use of Anthropic's Claude."
The directive gave agencies six months to phase out Anthropic products. That timeline may look generous on paper, but it is unusually short for federal IT transitions: procurement cycles for significant tool changes typically run 12 to 24 months. A six-month window covering contract unwinding, vendor notification, alternative tool procurement, user migration, and documentation is aggressive. The agencies moving fastest, like OPM, are the ones whose updated disclosure inventories already reflect the change. But the compliance deadline applies broadly.
Nextgov/FCW captured the wider compliance picture in its headline: "Agencies begin to shed Anthropic contracts following Trump's directive." Their reporting noted that the shedding is not uniform — some agencies are further along than others, and some have more complex integration patterns that make rapid transition difficult. But the direction of travel is clear, and the OPM disclosure update is the most concrete public evidence yet that the transition is producing real operational changes, not just contract modifications.
The geopolitical dimension of this directive is worth examining. Claude, developed by Anthropic — a company with significant backing from Google and a governance structure explicitly designed around AI safety concerns — represented a set of choices about how AI systems should behave in sensitive government contexts. Replacing it with Grok, developed by a company controlled by Elon Musk, and Codex, developed by OpenAI (which has its own Pentagon deal that came with its own safety debate — more on that below), represents a different set of choices. The Cold War parallel has been raised: technology choices in federal infrastructure tend to persist, shape institutional knowledge, and carry geopolitical implications long after the original policy rationale has evolved.
Grok's Pentagon deal: classified systems and xAI's government play
The OPM disclosure is not Grok's first move into federal territory. Axios reported on February 23 that Musk's xAI and the Pentagon have reached a deal to use Grok in classified systems. That deal — with its access to classified data environments — is a fundamentally different level of government integration than OPM's HR drafting workflows. It means Grok is being trusted with information that carries national security classification, inside air-gapped or controlled systems designed to handle sensitive intelligence and defense data.
The Pentagon deal and the OPM disclosure together sketch a picture of Grok moving rapidly across the federal government on multiple fronts simultaneously: from unclassified administrative functions at civilian agencies to classified systems at the Department of Defense. xAI's government strategy appears to be breadth-first — get deployed widely, establish a presence across use cases, and build the kind of institutional entrenchment that makes future contract challenges difficult.
If the strategy is working, it will be hard for competitors to reverse. Federal contracts are sticky: once agencies build workflows around a tool, train staff on it, and integrate it with existing systems, switching costs rise sharply. xAI appears to understand this dynamic.
The irony: Grok's safety record vs. Claude's safety-first stance
Here is where the transition becomes genuinely difficult to evaluate on its merits. Grok has a documented history of producing racist and antisemitic responses. These are not edge-case rumors — they are reported incidents, observed by journalists and researchers testing the model, that reflect real failures in Grok's content moderation and output filtering. The model has, on multiple occasions, generated content that would be immediately disqualifying for most enterprise AI deployments if produced during evaluation.
Claude, by contrast, was built by a company that has made AI safety its central organizing principle. Anthropic's refusal to modify Claude's safety restrictions for Pentagon use — documented in Federal News Network's reporting on the dispute — was precisely because the company viewed those restrictions as load-bearing for responsible AI behavior in high-stakes contexts. The company was willing to lose government contracts rather than weaken the safety properties of its model.
The directive that removed Claude from federal use was not driven by AI safety analysis. It was driven by policy, politics, and the specific dynamics of the Trump administration's relationship with Anthropic versus its relationship with xAI and OpenAI. Speed of procurement, driven by a six-month deadline, is now outpacing the AI safety evaluation processes that would ordinarily scrutinize a tool before deploying it in sensitive federal contexts.
This asymmetry — replacing a safety-focused AI with an AI that has documented safety failures — is one of the most significant risks embedded in the current transition. It does not mean Grok will cause harm in its specific federal deployments. But it means the usual safety evaluation pipeline has been compressed or bypassed in service of a political timeline, and that compression creates residual risk that federal AI governance frameworks are not currently equipped to catch quickly.
The contrast between Grok's procurement trajectory and the broader AI safety conversation is stark. For more on the safety trade-offs embedded in these government AI deals, see the analysis of OpenAI's Pentagon deal and the safety loopholes Anthropic refused to accept.
OpenAI's Codex in government: what it's being used for
OpenAI's Codex, the code generation model that originally powered GitHub Copilot before becoming available as a standalone API, is a different kind of tool from either Claude or Grok. It is specialized for code: understanding, generating, completing, and reviewing software. Its inclusion in OPM's updated disclosure alongside Grok suggests the agency is segmenting its AI tool use by function: Grok for general-purpose text tasks, Codex for technical and coding workflows.
This segmentation makes sense operationally. Coding tasks have different evaluation criteria than prose drafting tasks. Codex can be assessed on whether the code it generates compiles, runs correctly, meets functional requirements, and avoids security vulnerabilities. These are more tractable evaluation dimensions than the harder-to-measure qualities of general-purpose language model outputs like tone, accuracy on factual claims, or resistance to manipulation.
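A minimal sketch of what that kind of mechanical gate could look like, assuming Python as both the harness language and the target of the generated code; the checks and the 30-second timeout are illustrative, not any agency's actual acceptance criteria:

```python
import subprocess
import sys
import tempfile

BANNED_TOKENS = ("eval(", "exec(", "os.system(")  # crude security screen


def evaluate_generated_code(code: str, test_code: str) -> dict:
    """Gate generated Python on three mechanical dimensions."""
    results = {"compiles": False, "tests_pass": False, "security_flags": []}

    # 1. Does it compile? (Syntax check only; no execution yet.)
    try:
        compile(code, "<generated>", "exec")
        results["compiles"] = True
    except SyntaxError:
        return results

    # 2. Flag obviously dangerous constructs. A real gate would use a
    #    proper static analyzer, not substring matching.
    results["security_flags"] = [t for t in BANNED_TOKENS if t in code]

    # 3. Does it pass its functional tests? Run in a subprocess with a
    #    timeout so runaway generated code cannot hang the harness.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n\n" + test_code)
        path = f.name
    try:
        proc = subprocess.run([sys.executable, path],
                              capture_output=True, timeout=30)
        results["tests_pass"] = proc.returncode == 0
    except subprocess.TimeoutExpired:
        pass  # treat a hang as a test failure
    return results
```

Nothing comparably mechanical exists for judging whether a drafted HR memo is accurate and appropriately toned, which is exactly the asymmetry the paragraph above describes.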
OpenAI is positioned to capture the largest share of federal AI spend in the long term. The company's deal with the Pentagon — which itself triggered a significant public backlash, with 1.5 million users joining a coordinated QuitGPT boycott over OpenAI's Pentagon deal — established OpenAI as a primary federal AI vendor. Codex appearing in OPM's inventory alongside Grok signals that even agencies that are adding xAI tools are also adding OpenAI tools. The federal market, at least in the near term, is not winner-take-all — it is multi-vendor, with different tools occupying different functional niches.
The 6-month clock: which agencies are moving fastest
Six months is a short window for federal IT transitions, and the pace of compliance is uneven. OPM's updated disclosure is early evidence of agencies that are moving fast. The pattern FedScoop is tracking — looking for explicit disclosure changes that document tool substitutions — gives some visibility into compliance velocity, though it is an incomplete picture because not all agencies update their disclosures on the same schedule or with the same specificity.
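The tracking pattern itself is mechanically simple, which is part of why disclosure inventories are useful public signals at all. A minimal sketch of the diffing logic, assuming inventories can be snapshotted as per-agency sets of tool names (a simplification; FedScoop's actual tooling is not public):

```python
def diff_inventories(before: dict[str, set[str]],
                     after: dict[str, set[str]]) -> dict[str, dict]:
    """Report per-agency tool additions and removals between snapshots."""
    changes = {}
    for agency in before.keys() | after.keys():
        old = before.get(agency, set())
        new = after.get(agency, set())
        if old != new:
            changes[agency] = {"removed": old - new, "added": new - old}
    return changes


# The OPM change, as this kind of diff would surface it:
before = {"OPM": {"Claude"}}
after = {"OPM": {"Grok", "Codex"}}
print(diff_inventories(before, after))
# -> {'OPM': {'removed': {'Claude'}, 'added': {'Codex', 'Grok'}}}
```

The limitation is the input, not the logic: agencies that update late, or that describe tools vaguely, simply do not show up in the diff.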
The agencies likely to move fastest are those where Claude integration was lightest — used in pilot programs, one-off experiments, or low-criticality workflows that can be swapped without significant disruption. The agencies that will struggle with the six-month deadline are those where Claude is embedded in operational workflows with trained users, documented prompting strategies, and integration points with other agency systems.
Treasury, given its use of Claude for coding assistance in financial systems contexts, faces a more complex migration than OPM's HR drafting workflows. NASA's chatbot deployments, depending on how deeply they are integrated with internal knowledge bases, may also require extended transition timelines. These agencies may request deadline extensions, or they may comply on paper while maintaining legacy Claude integrations in practice during the transition period.
The enforcement mechanism for the directive is not fully specified in public reporting. Agency inspectors general and the Office of Management and Budget are the likely oversight actors, but the specific compliance verification process has not been publicly documented. That ambiguity may affect how aggressively agencies prioritize the transition versus other operational demands.
What federal AI procurement looks like after Anthropic
Anthropic's effective exclusion from the federal market is not just a competitive setback for the company — it is a structural change in how federal AI procurement works. Until recently, the federal AI vendor landscape included a meaningful range of companies with different safety philosophies, governance structures, and technical approaches. Anthropic's presence created competitive pressure that gave federal procurement officers a credible alternative when vendors proposed changes to AI safety properties that agencies found uncomfortable.
With Anthropic out of the picture, the remaining major vendors — OpenAI and xAI in particular — face less competition on safety grounds. OpenAI's willingness to accept the Pentagon deal that Anthropic refused, and xAI's willingness to deploy Grok in classified systems despite Grok's documented safety history, suggests that the safety floor in federal AI procurement has shifted downward. Agencies negotiating new AI contracts now have fewer credible alternatives if they want a vendor that will hold firm on safety properties.
The Claude-related reporting on military operations — including the notable case of Claude being used in Iran strikes hours after Trump's ban was announced — illustrates how deep Claude's operational integration ran in some contexts, and how abrupt the transition mandate was regardless of operational readiness.
This dynamic has implications for the procurement officers and contracting officers who will be negotiating the new Grok and Codex contracts. They are operating with a compressed timeline, a narrowed vendor field, and reduced leverage to impose safety requirements that would have seemed reasonable in a more competitive market.
Long-term implications: when government AI choices become locked-in infrastructure
Federal technology choices have a long half-life. The government is still running COBOL systems that were deployed in the 1960s. The email systems, the HR platforms, the financial management systems — all of them reflect technology choices made under previous administrations, by procurement officers who had no reason to anticipate how long those choices would persist. AI is different in some respects — models change, new versions are released, and the underlying capabilities evolve faster than legacy software. But the institutional knowledge built around a specific model, the trained users, the documented workflows, the organizational understanding of what the tool can and cannot do — all of that persists even when the model itself changes.
The agencies that are now deploying Grok and Codex are building that institutional knowledge around xAI and OpenAI products. Staff are learning to write prompts for Grok. IT teams are building integrations with Codex's API. Procurement officers are developing relationships with xAI and OpenAI account teams. All of that investment creates switching costs that compound over time.
This is why the OPM disclosure change, while small in isolation, matters as a leading indicator. It is not just a tool swap — it is the beginning of an institutional reorientation toward a different set of AI vendors with different safety philosophies, different governance structures, and different relationships to the current administration. The agencies that adopt Grok and Codex now will be more likely to adopt xAI and OpenAI products in future procurement cycles. The pattern, once established, tends to reinforce itself.
The federal AI governance frameworks that are supposed to catch safety problems — the AI use inventories, the risk classifications, the disclosure requirements — were designed to slow this kind of rapid institutional reorientation and create checkpoints for evaluation. The six-month mandate is running faster than those frameworks can operate. The result is that the safety evaluation infrastructure and the procurement timeline are no longer synchronized.
Whether that misalignment produces harm in specific federal deployments is not yet knowable. Grok's racist and antisemitic outputs were produced under testing conditions; they may not surface in the constrained, workflow-specific contexts where OPM will deploy it. Codex's code generation may perform adequately for the specific coding tasks where Treasury and other agencies deploy it. The short-term risk is manageable. The long-term risk — of institutional dependence on AI systems whose safety properties were accepted under political pressure rather than rigorous evaluation — is harder to measure and harder to unwind.
The OPM disclosure is a small document. A few lines changed. Claude removed, Grok and Codex added. But it is also a record of a decision that will shape how the federal government relates to AI systems for years beyond the current administration. That is worth reading carefully.
Sources: FedScoop ("OPM drops Claude, adds Grok and Codex to AI use disclosure"; "NASA chatbots, Treasury coding, OPM drafting: How agencies have deployed Claude"); CNN Business ("Trump administration orders military contractors and federal agencies to cease business with Anthropic"); Nextgov/FCW ("Agencies begin to shed Anthropic contracts following Trump's directive"); Axios ("Musk's xAI and Pentagon reach deal to use Grok in classified systems," Feb 23, 2026); Federal News Network ("Anthropic refuses to bend to Pentagon on AI safeguards as dispute nears deadline"); ExecutiveGov ("Trump Halts Federal Use of Anthropic's Claude").