OpenAI Launches GPT-5.4 — Its Smartest Model Yet
OpenAI releases GPT-5.4 with major benchmark improvements, enhanced reasoning, and reduced hallucinations. Available now to ChatGPT Plus and API users.
TL;DR: OpenAI has released GPT-5.4, its most capable model to date, featuring a 1 million token context window — more than double GPT-5.3's 500K limit — alongside a new "Extreme Thinking" mode that applies significantly more compute to hard problems. The model posts new highs on OpenAI's internal benchmarks for scientific reasoning, multi-step task completion, and long-context fidelity. It is available immediately to ChatGPT Plus, Team, and Pro subscribers and via the API under the identifier gpt-5.4.
OpenAI shipped GPT-5.4 on March 5, 2026, less than 72 hours after teasing it alongside the GPT-5.3 Instant release. The cadence is deliberate: a smarter base model every six to eight weeks, with behavioral patches like GPT-5.3 Instant filling the gaps. GPT-5.4 is the most significant capability jump since GPT-5 launched in mid-2025 — not a behavioral patch, not a speed variant, but a new frontier on the hard problems that matter.
GPT-5.4 is OpenAI's latest frontier model, released March 5, 2026. It is not a behavioral patch — the category that GPT-5.3 Instant occupied — and it is not a speed-optimized variant. It is a new base model with updated weights, a substantially larger context window, and an optional reasoning mode that represents a qualitative shift in how the model handles difficult problems.
The headline architectural change is the context window expansion from GPT-5.3's 500K tokens to 1 million tokens. That is approximately 750,000 words, or roughly five full-length novels processed simultaneously. For practical reference: a 1M token context can hold an entire codebase, a year of email threads, or a complete legal discovery document set within a single prompt.
The second major change is Extreme Thinking mode. This is not a separate model but a runtime parameter — a thinking budget that instructs the model to allocate substantially more compute to the pre-response reasoning phase. The result is slower responses for complex queries, but with measurably higher accuracy on multi-step problems.
The third change is what OpenAI describes as improved "task persistence" — the model's ability to sustain reliable performance across multi-hour agentic tasks without accumulating errors or losing track of intermediate state. Prior GPT-5 series models showed degrading accuracy in long autonomous workflows. GPT-5.4 addresses this directly.
Extreme Thinking is GPT-5.4's most discussed new feature, and the name is deliberately provocative. OpenAI is positioning this mode against the most difficult problems in science, mathematics, and engineering — the queries where GPT-5.3, Claude Opus 4.6, and Gemini 3.1 Pro all produce confident but wrong answers because they do not have enough reasoning cycles to catch their own errors.
The mechanism is a tiered compute budget. In standard mode, GPT-5.4 behaves like a fast, capable model — responses in seconds, reasoning depth comparable to GPT-5.3 Instant. In Extreme Thinking mode, the model applies an extended chain-of-thought process that includes self-verification loops: the model generates a candidate response, checks it against its own reasoning trace for internal consistency, identifies likely failure points, and revises before surfacing a final answer.
OpenAI is clear about the tradeoff: Extreme Thinking is slower and costs more per response. The company is positioning it not for consumer chat but for scientific research workflows, formal mathematics, complex legal analysis, and software engineering tasks where a single high-quality answer is more valuable than five fast ones.
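As described, the mechanism amounts to a generate, check, revise control loop. The sketch below is purely illustrative: the real loop runs inside the model during the pre-response phase, not in user code, and every name here is hypothetical. It captures only the control flow OpenAI describes.

```python
# Illustrative sketch of the generate -> verify -> revise control flow OpenAI
# describes for Extreme Thinking. Every name here is hypothetical; the real
# loop runs inside the model during the pre-response phase, not in user code.

def extreme_thinking(query, generate, verify, max_revisions=3):
    """Produce an answer, self-check its reasoning trace, revise until it passes."""
    candidate, trace = generate(query)             # candidate answer + reasoning trace
    for _ in range(max_revisions):
        issues = verify(candidate, trace)          # internal-consistency check
        if not issues:
            return candidate                       # consistent: surface the answer
        candidate, trace = generate(query, feedback=issues)  # revise at failure points
    return candidate                               # budget exhausted: best effort
```

The larger `max_revisions`-style budget is what makes the mode slower and more expensive: each revision pass is another round of generation and verification.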
For developers using the API, Extreme Thinking is a parameter on the chat completions endpoint:
```json
{
  "model": "gpt-5.4",
  "thinking": "extreme",
  "messages": [...]
}
```
Standard thinking mode is the default. OpenAI recommends Extreme Thinking only for queries where response latency is acceptable and accuracy is paramount.
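Assuming the request shape above, a minimal Python client using only the standard library might look like the following. The `thinking` field is as described in this post, not an independently verified part of the API surface, and the timeout value simply reflects the latency figures quoted here.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_request(prompt, thinking="standard"):
    """Build a chat completions payload; the 'thinking' field follows the shape above."""
    body = {
        "model": "gpt-5.4",
        "messages": [{"role": "user", "content": prompt}],
    }
    if thinking != "standard":        # standard mode is the default, so omit it
        body["thinking"] = thinking
    return body

def ask(prompt, thinking="standard"):
    """POST the request and return the assistant message text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt, thinking)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=300) as resp:  # allow 30-90s thinking time
        return json.load(resp)["choices"][0]["message"]["content"]
```

Usage would be `ask("Prove the lemma...", thinking="extreme")` for hard queries and plain `ask(prompt)` for everything else.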
Context window size matters most when it matters at all — which is to say, for most everyday queries, 500K tokens and 1M tokens are functionally equivalent. The practical significance of GPT-5.4's 1M token window lives in a specific set of high-value use cases.
Long-document analysis. A complete regulatory filing, a multi-volume legal case, a full technical specification — these fit inside GPT-5.4's context in a way they did not fit inside GPT-5.3's. Previously, users had to chunk documents and assemble partial analyses. With 1M tokens, the entire document is in-context simultaneously, which eliminates cross-reference errors that chunking introduces.
Large codebase understanding. A typical mid-size GitHub repository runs around 200K–400K tokens. GPT-5.4 can hold an entire mid-size codebase in context, enabling cross-file refactoring, architecture-level analysis, and dependency tracing that smaller contexts cannot support.
Extended conversation memory. Enterprise deployments that use ChatGPT for multi-session customer interactions have been constrained by context limits. At 1M tokens, a year of interaction history with a single user can be maintained in-context without truncation.
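Before committing to a single-prompt design for any of these use cases, it is worth estimating whether the corpus actually fits. The sketch below uses the common rough heuristic of about 4 characters per token for English text and code; this is an approximation I am assuming here, and a real tokenizer should be used for exact counts.

```python
# Rough context-fit check. The 4-chars-per-token ratio is a heuristic,
# not an exact tokenizer; context limits are the figures quoted in this article.
CONTEXT_LIMITS = {"gpt-5.4": 1_000_000, "gpt-5.3": 500_000}
CHARS_PER_TOKEN = 4

def estimate_tokens(text):
    """Approximate token count from character length."""
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(text, model="gpt-5.4", reserve=50_000):
    """Check fit, reserving headroom for instructions and the model's response."""
    return estimate_tokens(text) + reserve <= CONTEXT_LIMITS[model]
```

A document set that passes for gpt-5.4 but fails for gpt-5.3 is exactly the case where the larger window removes the need for chunking.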
What has not changed. A larger context window does not automatically improve performance on content within that window. OpenAI reports that GPT-5.4 maintains attention quality across the full 1M token range, but this claim — like all self-reported benchmark data — requires independent validation. Users who rely on models to accurately retrieve specific passages from very long documents should test GPT-5.4's "needle in a haystack" performance against their specific document types before migrating production pipelines.
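A minimal harness for that kind of test might look like the following. It is a sketch under assumptions: `ask` is any callable wrapping your model client, the filler text and sentence count are placeholders you would scale up and swap for your own document types, and `expected` is the answer string the model must surface.

```python
def make_haystack(needle, filler_sentence, n_sentences, depth):
    """Plant a needle fact at a relative depth (0.0-1.0) inside filler text."""
    sentences = [filler_sentence] * n_sentences
    sentences.insert(int(depth * n_sentences), needle)
    return " ".join(sentences)

def needle_test(ask, needle, question, expected,
                depths=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return per-depth pass/fail for a model callable `ask(prompt) -> str`.

    In a real run, n_sentences would be scaled so the haystack approaches
    the context limit you are validating (e.g. ~1M tokens for GPT-5.4).
    """
    results = {}
    for depth in depths:
        doc = make_haystack(needle, "The sky was a flat gray that morning.",
                            2000, depth)
        results[depth] = expected in ask(f"{doc}\n\nQuestion: {question}")
    return results
```

Running this against both the old and new model on your own documents, at several depths and lengths, gives a concrete retrieval baseline before any production migration.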
OpenAI published internal benchmark results alongside the GPT-5.4 release. As with all self-reported benchmarks, these should be interpreted as directional signals rather than definitive rankings.
| Benchmark | GPT-5.4 (Standard) | GPT-5.4 (Extreme Thinking) | GPT-5.3 Instant | Notes |
|---|---|---|---|---|
| GPQA Diamond | 84.2% | 91.7% | 78.4% | PhD-level science Q&A |
| MATH Level 5 | 79.1% | 88.6% | 71.3% | Competition mathematics |
| HumanEval | 82.4% | 85.1% | 78.9% | Code generation |
| SWE-bench Verified | 47.3% | 52.8% | 41.6% | Real-world bug resolution |
| Humanity's Last Exam | 22.1% | 31.4% | 17.8% | Hardest known benchmark |
| SimpleBench | 88.3% | 89.1% | 86.2% | Factual accuracy |
| MMLU Pro | 81.6% | 83.9% | 77.2% | Multitask language understanding |
The Humanity's Last Exam result is the most notable figure in this table. HLE is designed to be unsolvable by current AI systems — it consists of questions submitted by domain experts, chosen specifically because frontier models were expected to fail on them. A 31.4% score in Extreme Thinking mode is not a passing grade, but it is a substantial improvement over GPT-5.3 Instant's 17.8% and well above what GPT-5 achieved at launch.
The GPQA Diamond improvement from 78.4% to 91.7% in Extreme Thinking mode is the clearest signal of what the reasoning upgrade actually delivers. This benchmark tests PhD-level scientific reasoning across physics, chemistry, and biology — domains where chain-of-thought accuracy matters enormously and where GPT-5.3 was noticeably behind Claude Opus 4.6.
The four-way comparison that developers and enterprises care about in March 2026 is GPT-5.4, GPT-5.3 Instant, Claude Opus 4.6, and Gemini 3.1 Pro. Here is how they stack up across the dimensions that matter most.
| Dimension | GPT-5.4 | GPT-5.3 Instant | Claude Opus 4.6 | Gemini 3.1 Pro |
|---|---|---|---|---|
| Context window | 1M tokens | 500K tokens | 200K tokens | 2M tokens |
| Hard reasoning (GPQA) | 91.7% (Extreme) | 78.4% | ~90.5% | ~88.1% |
| Code generation (HumanEval) | 85.1% | 78.9% | ~83.0% | ~84.0% |
| Speed (standard mode) | Fast | Fastest | Moderate | Moderate |
| Extreme/extended thinking | Yes | No | Yes (extended) | Yes (Deep Think) |
| Agentic task performance | Improved | Baseline | Leading | Strong |
| Pricing (input/M tokens) | ~$2.50 | ~$0.75 | ~$15.00 | ~$3.50 |
| Pricing (output/M tokens) | ~$12.00 | ~$4.00 | ~$75.00 | ~$10.50 |
| ChatGPT integration | Native | Native | N/A | N/A |
| API identifier | gpt-5.4 | gpt-5.3-chat-latest | claude-opus-4-6 | gemini-3.1-pro |
A few things stand out in this table.
Gemini 3.1 Pro still leads on context. OpenAI's 1M token window is a significant leap from GPT-5.3's 500K, but Google's Gemini 3.1 Pro ships with a 2M token context — double GPT-5.4's. For use cases that genuinely require processing very large corpora, Gemini 3.1 Pro remains the leader on raw context capacity.
Claude Opus 4.6 leads on agentic performance. Anthropic's top model has consistently ranked first on Arena.ai and Artificial Analysis leaderboards for agentic tool use and computer-use tasks. GPT-5.4's improved task persistence narrows this gap but does not close it.
GPT-5.4 Extreme Thinking challenges Claude's reasoning lead. On GPQA Diamond, GPT-5.4's 91.7% in Extreme Thinking mode now edges past Claude Opus 4.6's ~90.5% in thinking mode. For scientific and mathematical reasoning specifically, GPT-5.4 has pulled level, and arguably ahead.
Pricing is where GPT-5.4 wins definitively against Claude. Claude Opus 4.6 at ~$15.00/$75.00 per million tokens is roughly six times more expensive on both input and output than GPT-5.4. Enterprises running high-volume inference pipelines have a compelling cost argument for GPT-5.4 over Claude Opus 4.6 unless agentic performance is the primary criterion.
This comparison builds directly on the context established by GPT-5.3 Instant's release, where OpenAI was still the challenger on raw reasoning benchmarks against both Claude and Gemini. GPT-5.4 changes that picture substantially, particularly on the hardest scientific benchmarks.
GPT-5.4 is available immediately upon release in the following configurations:
ChatGPT:
- Plus, Team, and Pro subscribers: GPT-5.4 standard mode, available now.
- Pro subscribers: Extreme Thinking mode, available at launch.
- Free tier: GPT-5.4 standard mode rolling out as the default over the coming days, with rate limits.
API pricing:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context |
|---|---|---|---|
| GPT-5.4 (standard) | $2.50 | $12.00 | 1M tokens |
| GPT-5.4 (Extreme Thinking) | $5.00 | $25.00 | 1M tokens |
| GPT-5.4 (cached input) | $0.25 | — | 1M tokens |
| GPT-5.3 Instant | $0.75 | $4.00 | 500K tokens |
| GPT-5.3 (standard) | $2.00 | $10.00 | 500K tokens |
The Extreme Thinking pricing reflects the additional compute cost of the extended reasoning phase. At $5.00 input and $25.00 output per million tokens, a single Extreme Thinking response to a 10,000-word document analysis query costs roughly $0.35–$0.60 in tokens, depending on response length. For high-stakes professional queries, that cost is trivial relative to the value of an accurate answer. For routine tasks, standard mode is the right choice.
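The arithmetic behind that estimate can be reproduced directly. The sketch below assumes a 10,000-word query is roughly 13,000 input tokens and, critically, that Extreme Thinking's internal reasoning tokens are billed as output, which is what pushes the total into the quoted range; both assumptions are mine, not a stated billing model.

```python
# Pricing from the table above, in dollars per million tokens.
EXTREME_INPUT = 5.00
EXTREME_OUTPUT = 25.00

def query_cost(input_tokens, output_tokens):
    """Cost in dollars for one Extreme Thinking request."""
    return (input_tokens * EXTREME_INPUT + output_tokens * EXTREME_OUTPUT) / 1_000_000

# A 10,000-word document is roughly 13,000 input tokens. If reasoning plus the
# visible response total 11,000-21,000 billed output tokens, the per-query cost
# lands in the article's $0.35-$0.60 range.
low = query_cost(13_000, 11_000)
high = query_cost(13_000, 21_000)
```

Note that almost all of the cost sits in the output side: the 13,000 input tokens contribute only about $0.065.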
Microsoft 365 Copilot users will receive GPT-5.4 in standard mode within 48 hours of today's release, consistent with Microsoft's rapid integration pattern established with GPT-5.3 Instant.
The API model identifier is gpt-5.4. OpenAI is maintaining gpt-5.3-chat-latest and gpt-5.2 as available options; no model retirement has been announced alongside the GPT-5.4 launch.
For enterprise deployments, GPT-5.4 changes the calculus in three meaningful ways.
Long-document processing pipelines no longer require chunking. The 1M token context window is large enough to eliminate the document chunking strategies that most enterprise RAG (retrieval-augmented generation) systems were built around. A contract review workflow that previously required splitting a 300-page contract into segments and reassembling partial analyses can now run the full document through a single prompt. This simplifies pipeline architecture and removes a source of cross-reference errors that chunking introduces.
Multi-hour agentic tasks are more reliable. OpenAI's claim of improved "task persistence" targets a specific enterprise pain point: GPT-5.3 and earlier models would accumulate errors in long autonomous workflows — losing track of intermediate state, repeating completed steps, or drifting from the original objective. If GPT-5.4 delivers on this claim, it expands the category of workflows that can be delegated to autonomous AI agents without human checkpointing at every step.
The cost argument against Claude Opus 4.6 strengthens. At $2.50/$12.00 per million tokens versus Claude Opus 4.6's ~$15.00/$75.00, GPT-5.4 is approximately 6x cheaper per token. For enterprises running thousands of long-context queries per day, the difference is material. The remaining argument for Claude Opus 4.6 — superior agentic performance — is narrowing with each GPT-5.x release.
What enterprise teams should do now: Run GPT-5.4 against your existing GPT-5.3 Instant pipelines on a sample of real production queries before migrating. The behavioral shifts between model versions can affect prompt sensitivity, output formatting, and edge case handling in ways that require pipeline validation even when the underlying capability improvement is real.
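One way to structure that validation is a side-by-side regression check. The harness below is a hypothetical sketch: `ask_old` and `ask_new` are whatever callables wrap your two model versions, and `checks` encodes the output properties (format, non-refusal, edge case handling) your pipeline depends on.

```python
def compare_models(queries, ask_old, ask_new, checks):
    """Run the same production queries through two model callables and flag drift.

    `checks` maps a check name to a predicate on the output string. A regression
    is a property that held on the old model's output but fails on the new one.
    """
    report = []
    for query in queries:
        old_out, new_out = ask_old(query), ask_new(query)
        regressions = [name for name, ok in checks.items()
                       if ok(old_out) and not ok(new_out)]
        report.append({"query": query, "regressions": regressions})
    return report
```

Only regressions are flagged, not all differences: a new model that satisfies every property the old one did is behaviorally safe for the pipeline even if its wording changed.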
GPT-5.4 is the clearest evidence yet of a structural shift in how OpenAI ships models. The historical pattern — a major model every three to six months, followed by a long stabilization period — is gone.
The GPT-5.x series has unfolded as follows:
| Model | Release | Gap from Previous |
|---|---|---|
| GPT-5 | Mid-2025 | — |
| GPT-5.1 | November 2025 | ~4 months |
| GPT-5.2 | December 2025 | ~6 weeks |
| GPT-5.3 | Late January 2026 | ~6 weeks |
| GPT-5.2-Codex | February 2026 | ~8 weeks |
| GPT-5.3 Instant | March 3, 2026 | ~5 weeks |
| GPT-5.4 | March 5, 2026 | 2 days |
The two-day gap between GPT-5.3 Instant and GPT-5.4 is not a slip — it is a deliberate strategy. OpenAI is shipping behavioral patches (GPT-5.3 Instant, which we covered in depth here) and capability upgrades (GPT-5.4) as separate, parallel release tracks. The behavioral patches can ship fast because they target RLHF tuning, not base model weights. The capability upgrades require more lead time but are deployed as soon as they clear safety evaluations.
The implication for developers and enterprises: model version management is now a continuous operational concern. Pinning to an explicit identifier such as gpt-5.4 and validating each new release before migrating is the right pattern. Assuming model stability behind a floating alias like gpt-5.3-chat-latest is increasingly risky as the pace of releases accelerates.
What comes after GPT-5.4 is not announced. Given the pattern, a GPT-5.4 Instant (speed-optimized behavioral variant) within four to six weeks is plausible. A GPT-5.5 with further reasoning improvements and potential multimodal upgrades is the likely next major capability release.
The switching decision is different for different user types.
ChatGPT Plus users: Yes, switch immediately. GPT-5.4 standard mode is a meaningful capability upgrade over GPT-5.3 Instant at no additional cost. Extreme Thinking mode is available if you have a Pro subscription. There is no reason to stay on GPT-5.3 Instant for general use.
API developers (standard pipelines): Test before switching. GPT-5.4 is a new base model, not a behavioral patch, which means prompt responses may differ from GPT-5.3 in ways that affect your pipeline. Run a representative sample of production queries against both models and check for output format changes, edge case behavior, and refusal patterns before migrating. The capability improvement is real, but so is the risk of behavioral drift.
API developers (long-context use cases): Evaluate GPT-5.4 against your specific document types. If you are currently chunking documents to fit within 500K tokens, the 1M window is immediately valuable and likely worth the migration effort.
Enterprises running agentic workflows: Pilot GPT-5.4 in a staging environment before production migration. The task persistence improvements are the most significant enterprise-relevant change, but independent validation of multi-hour agentic performance against your specific workflow types is essential before committing.
Users primarily using GPT-5.3 Instant for speed: GPT-5.4 standard mode is meaningfully faster than GPT-5.3 standard but not as fast as GPT-5.3 Instant's 3x speed optimization. If response latency is your primary criterion, GPT-5.3 Instant remains the right choice for now.
What is the API model identifier, and how do I enable Extreme Thinking?
The API model identifier is gpt-5.4. There is a separate parameter for Extreme Thinking mode: set "thinking": "extreme" in your request body. Standard mode is the default when no thinking parameter is specified.
Is GPT-5.4 available to free ChatGPT users?
GPT-5.4 in standard mode is rolling out as the default model for all ChatGPT users, including the free tier, over the coming days. Rate limits apply to free-tier users. Extreme Thinking mode is restricted to Pro subscribers at launch.
How is Extreme Thinking different from GPT-5.3's chain-of-thought?
Extreme Thinking is categorically different from GPT-5.3's standard chain-of-thought. It introduces self-verification loops — the model checks its own reasoning for internal consistency before responding — and applies a significantly larger compute budget to the pre-response phase. On GPQA Diamond (PhD-level science), Extreme Thinking scores 91.7% versus GPT-5.3 Instant's 78.4%. The tradeoff is response latency: Extreme Thinking queries can take 30–90 seconds depending on complexity.
Does GPT-5.4 make GPT-5.3 Instant obsolete?
Not necessarily. GPT-5.3 Instant was specifically optimized for conversational speed and behavioral improvements — the preachy tone fix, the hallucination reductions we covered in detail here, the 3x inference speed. For rapid back-and-forth conversation, GPT-5.3 Instant remains the right choice. GPT-5.4 is the right choice when you need deeper reasoning, longer context, or more reliable agentic performance.
Is GPT-5.4 faster than GPT-5.3 Instant?
No. In standard mode, GPT-5.4 is faster than GPT-5.3 standard but not faster than GPT-5.3 Instant. GPT-5.3 Instant was specifically optimized for 3x inference speed at 60% of standard pricing. GPT-5.4 in Extreme Thinking mode is substantially slower than both.
How does GPT-5.4's context window compare to Gemini 3.1 Pro's?
Gemini 3.1 Pro still leads on raw context capacity at 2M tokens — double GPT-5.4's 1M. For the vast majority of use cases, 1M tokens is sufficient. The gap matters only for specific large-corpus applications: processing a very large multi-volume document set, holding an entire substantial codebase in context, or maintaining very long conversation histories. If raw context length is your primary bottleneck, Gemini 3.1 Pro remains the leader.
When will Extreme Thinking reach Plus and Team subscribers?
OpenAI has announced it is "coming soon" to Plus and Team subscribers but has not confirmed a specific date. Based on the company's recent release pattern, availability within two to four weeks of the Pro launch is plausible.
Is GPT-5.3 Instant being retired?
No retirement timeline has been announced. OpenAI confirmed that GPT-5.2 Instant will be retired on June 3, 2026, but made no equivalent announcement for GPT-5.3 Instant. Given its speed advantages, GPT-5.3 Instant is likely to remain as the fast-inference option in the API lineup alongside the more capable GPT-5.4.