TL;DR: On March 17, 2026, OpenAI released two new models: GPT-5.4 Mini — more than 2x faster than its predecessor with benchmark scores approaching the flagship GPT-5.4 — and GPT-5.4 Nano, OpenAI's cheapest model ever at $0.20 per million input tokens, purpose-built for agentic subagent workflows. Free and Go tier ChatGPT users get the Mini upgrade immediately. Nano is API-only and targets the classification, extraction, and ranking tasks that make up the grunt work of autonomous agent pipelines. Both models share a 400,000-token context window and full multimodal, tool-use support. The launch marks OpenAI's clearest signal yet that the agentic economy — where AI models delegate to other AI models at scale — has moved from concept to production priority.
What you will learn
- What GPT-5.4 Mini and Nano actually are
- Benchmark performance: Mini vs flagship
- The free tier upgrade: what changes today
- Nano: the $0.20/M agent economy model
- Why this matters for agentic workflows
- How Mini and Nano compare to competitors
- The pricing reality: capability gains vs cost creep
- Enterprise implications: the ROI inflection point
- What developers should do now
- Frequently asked questions
What GPT-5.4 Mini and Nano actually are
OpenAI's March 17 announcement introduced two distinct models serving different positions in the inference cost-performance spectrum.
GPT-5.4 Mini is a midrange model designed for high-volume, fast workloads that previously required making a hard tradeoff between capability and cost. It runs more than 2x faster than GPT-5 Mini while approaching GPT-5.4's performance on key benchmarks — a combination that was genuinely unavailable at this price point before March 17. It is available in ChatGPT (Free and Go tiers), Codex, and the API. For users hitting the GPT-5.4 Thinking rate limit, Mini now serves as the fallback.
GPT-5.4 Nano is something different: the cheapest model in the GPT-5.4 family, priced at $0.20 per million input tokens and $1.25 per million output tokens. OpenAI explicitly built it for subagent use cases — the classification, data extraction, document ranking, and coding subtasks that a larger orchestration model delegates downward. It is currently API-only, with no ChatGPT integration.
Both models support:
- Text and image inputs (multimodal)
- Tool use and function calling
- Web search and file search
- Computer use
- A 400,000-token context window
- Structured output and JSON mode
The context window alone is significant: 400K tokens means both models can process long documents, multi-turn conversation histories, and large codebases in a single pass — a capability previously restricted to premium tiers.
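As a back-of-envelope way to check whether a document fits in one pass, a character-based estimate is often enough. The ~4 characters-per-token heuristic below is an approximation, not an official tokenizer; a real tokenizer (e.g. tiktoken) gives exact counts.

```python
# Rough check of whether a document fits the 400K-token window in one pass.
# Assumes the common ~4 characters-per-token heuristic for English text.

CONTEXT_WINDOW = 400_000

def rough_token_count(text: str) -> int:
    return len(text) // 4

def fits_in_one_pass(text: str, reserved_for_output: int = 8_000) -> bool:
    # Leave headroom for the model's response within the shared window.
    return rough_token_count(text) + reserved_for_output <= CONTEXT_WINDOW

doc = "x" * 1_200_000   # roughly 300K tokens of text
print(fits_in_one_pass(doc))  # True: fits with room to spare
```

For production pipelines, swap the heuristic for an exact tokenizer before relying on the result near the boundary.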
Benchmark performance: Mini vs flagship
The benchmark numbers tell a clear story: GPT-5.4 Mini punches well above its class on the evaluations that matter most for real-world use.
SWE-Bench Pro (coding):
GPT-5.4 Mini scores 54.4%, versus 45.7% for GPT-5 Mini and 57.7% for the full GPT-5.4. That 8.7-point gain over its predecessor — and the 3.3-point gap to the flagship — means Mini is operating at roughly 94% of GPT-5.4's coding capability at a fraction of the cost.
OSWorld-Verified (computer use):
This is where the upgrade is most dramatic. GPT-5.4 Mini hits 72.1%, compared to 42.0% for GPT-5 Mini and 75.0% for the full GPT-5.4. A 30-point gain in computer use capability represents a genuine architectural improvement, not incremental tuning. For developers building agents that interact with GUIs, execute shell commands, or automate browser workflows, this number matters.
Toolathlon (tool use):
GPT-5.4 Mini scores 42.9% versus GPT-5 Mini's 26.9% — a 16-point improvement on the benchmark most directly relevant to agentic pipelines. Tool use accuracy is the linchpin of multi-step agent workflows; a model that calls the wrong function or formats arguments incorrectly breaks the entire chain.
GPQA Diamond (graduate-level reasoning):
Mini scores 88.0%, up from 81.6% for GPT-5 Mini. For reference, many researchers use GPQA Diamond as a proxy for "can this model handle complex domain reasoning?" An 88% score puts Mini well into territory previously requiring much larger, more expensive models.
Telecom tau2-bench (customer service workflows):
Mini scores 93.4% versus 74.1% for GPT-5 Mini — a 19-point leap that signals strong gains in structured dialogue and multi-turn task completion, the backbone of enterprise automation use cases.
GPT-5.4 Nano benchmarks:
Nano's benchmark profile is deliberately more focused. It scores 52.4% on SWE-Bench Pro (higher than GPT-5 Mini's 45.7%) and 39.0% on OSWorld-Verified (slightly below GPT-5 Mini). Its Terminal-Bench 2.0 score is 46.3%, versus Mini's 60.0%. Nano trades deep reasoning for speed and cost — it is not designed to handle the same tasks as Mini.
One notable gap: On long-context needle retrieval tasks (OpenAI MRCR v2, 8-needle, 64K-128K range), Mini scores 47.7% versus 86.0% for the full GPT-5.4. If your use case involves finding specific facts across very long documents, Mini underperforms significantly compared to the flagship. This is the most important limitation to understand before migrating workloads.
The free tier upgrade: what changes today
For ChatGPT's free users, March 17 marked a meaningful capability jump.
GPT-5.4 Mini now powers the Free and Go tier experience, accessible through ChatGPT's "Thinking" mode feature. The upgrade brings materially better coding assistance, improved multi-step reasoning, and stronger multimodal understanding — without any change in subscription cost.
What this means practically:
- Free users get access to a model that outperforms GPT-5 Mini on every major coding and reasoning benchmark
- The 400K context window is available in the Free tier — a context length that was previously premium-gated
- Computer use and tool-calling capabilities, while primarily relevant to API users, underpin some of the agentic features rolling out in ChatGPT products
What hasn't changed for free users:
- Rate limits remain in place — GPT-5.4 Mini serves as a fallback when GPT-5.4 Thinking limits are hit
- Nano is not available in ChatGPT; it remains API-only
- Advanced coding workflows in Codex require paid API access
The free tier upgrade matters beyond the individual user. It accelerates adoption curves by letting developers prototype with Mini-grade capability before committing to API spending — a deliberate on-ramp strategy OpenAI has used successfully since GPT-3.5.
Nano: the $0.20/M agent economy model
At $0.20 per million input tokens — with cached input at $0.02 per million tokens — GPT-5.4 Nano is OpenAI's first model priced specifically for the economics of autonomous agent pipelines.
To understand why the pricing matters, consider how agentic systems actually operate. A typical multi-agent workflow involves:
- A large orchestrator model (GPT-5.4 or equivalent) handling planning, synthesis, and decisions requiring deep reasoning
- Multiple subagent models performing discrete, bounded tasks — classifying a document, extracting structured data from a web page, ranking a list of results, or running a specific code snippet
The orchestrator runs infrequently and justifies its higher cost. The subagents run constantly — sometimes dozens or hundreds of times per user request. At $2.50 per million input tokens for GPT-5.4, routing subagent work to the flagship is economically unsustainable at scale. Nano changes that equation.
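The per-request math above can be sketched directly. Prices are the per-million input rates given in this article; the call counts and token sizes are illustrative, and output costs are omitted for simplicity.

```python
# Subagent cost per user request: routing to the flagship vs to Nano.
FLAGSHIP_IN = 2.50   # GPT-5.4, $/M input tokens
NANO_IN = 0.20       # GPT-5.4 Nano, $/M input tokens

def subagent_cost(calls: int, tokens_per_call: int, price_per_m: float) -> float:
    return calls * tokens_per_call * price_per_m / 1_000_000

# Hypothetical workload: 100 subagent calls of 2,000 input tokens per request.
flagship = subagent_cost(100, 2_000, FLAGSHIP_IN)  # $0.50 per request
nano = subagent_cost(100, 2_000, NANO_IN)          # $0.04 per request
print(f"${flagship:.2f} vs ${nano:.2f} per user request")
```

At thousands of requests per day, that 12.5x gap is the difference between a viable pipeline and an unsustainable one.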
Simon Willison's practical cost analysis illustrated this concretely: describing a single museum photo using Nano costs approximately 0.069 cents (2,751 input + 112 output tokens). Scaling that to his 76,000-photo archive yields a total cost of roughly $52.44. Processing a collection of that size was previously either prohibitively expensive or required using a model that couldn't match Nano's capability-to-cost ratio.
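Willison's arithmetic can be reproduced from the published Nano rates:

```python
# Reproducing the per-photo cost estimate from the article.
NANO_IN, NANO_OUT = 0.20, 1.25   # $/M input and output tokens

def photo_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * NANO_IN + output_tokens * NANO_OUT) / 1_000_000

per_photo = photo_cost(2_751, 112)
print(f"${per_photo:.5f} per photo")                  # ~0.069 cents
print(f"${per_photo * 76_000:.2f} for 76,000 photos") # ~$52, matching the estimate
```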
OpenAI's recommended Nano use cases:
- Document classification (routing emails, support tickets, legal filings)
- Data extraction (parsing structured information from unstructured text)
- Ranking and scoring (relevance ranking, sentiment classification, entity recognition)
- Coding subagents handling simple, well-defined subtasks
- Bulk image analysis and captioning
Where Nano is not suitable:
- Complex reasoning chains requiring sustained context tracking
- Multi-step computer use tasks (OSWorld-Verified score of 39.0% makes this clear)
- Deep document analysis across long contexts
- Planning tasks requiring judgment about ambiguous requirements
The key insight is that Nano is not a degraded Mini — it is a model purpose-built for a different set of tasks. Treating it as a budget Mini and routing complex work to it will produce poor results. Treating it as a specialized subagent for bounded tasks will produce excellent results at near-commodity pricing.
Why this matters for agentic workflows
The dual release of Mini and Nano is not just a product update — it is OpenAI's architectural statement about how production AI systems should be built.
Multi-model agent architectures have become the dominant pattern in serious AI deployments. The pattern is straightforward: expensive models do the thinking, cheap models do the doing. What has historically blocked adoption of this pattern at scale is the inference economics — cheap models were too weak to be reliably useful, and capable models were too expensive to run at subagent frequency.
GPT-5.4 Mini and Nano together close that gap. Mini's 54.4% SWE-Bench Pro score makes it a credible orchestration layer for coding workflows where GPT-5.4 is overkill. Nano's 52.4% SWE-Bench Pro score — higher than the previous-generation GPT-5 Mini — makes it a credible worker model for discrete coding subtasks.
The Toolathlon benchmark is particularly relevant here. Mini's 42.9% versus GPT-5 Mini's 26.9% represents a 60% relative improvement in tool-calling accuracy. In agentic pipelines, tool-calling errors compound: a single wrong function call can derail a multi-step workflow and require expensive retries. A model that calls tools reliably is worth more than its benchmark score suggests, because it reduces failure rates multiplicatively across pipeline steps.
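The compounding effect is easy to quantify: if each tool call succeeds with probability p, an n-step pipeline completes without error with probability p**n. The per-call accuracies below are illustrative, not the Toolathlon scores themselves.

```python
# Why per-call tool-use reliability compounds across pipeline steps.
def pipeline_success(per_call_accuracy: float, steps: int) -> float:
    return per_call_accuracy ** steps

for p in (0.90, 0.98):
    print(f"{p:.0%} per call -> {pipeline_success(p, 10):.1%} over 10 steps")
```

A modest-looking 8-point gain per call roughly doubles the end-to-end success rate of a 10-step workflow, which is why tool-calling accuracy is worth more than its headline number.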
The 400K context window also matters for agentic applications in ways it doesn't for simple chat. Agent workflows frequently accumulate large context windows through conversation history, retrieved documents, tool call results, and intermediate reasoning. Fitting this into a 128K window required constant context management overhead. At 400K tokens, that overhead largely disappears — agents can maintain richer state without expensive context trimming operations.
How Mini and Nano compare to competitors
The small model market is genuinely competitive in early 2026. GPT-5.4 Mini and Nano enter a landscape with three serious rivals: Anthropic's Claude 4.5 Haiku, Google's Gemini 3.1 Flash-Lite, and Meta's open-source Llama 4 Scout.
Claude 4.5 Haiku is priced at $1.00/M input and $5.00/M output — meaningfully more expensive than Mini ($0.75/$4.50) and significantly more expensive than Nano ($0.20/$1.25). Its 73.3% on SWE-Bench Verified cannot be compared directly with Mini's 54.4% on SWE-Bench Pro — they are different benchmarks with different difficulty. Haiku is particularly strong on multi-step agentic workflows and instruction following, but it has no clear cost advantage against Nano for classification and extraction tasks, where Nano's 5x lower input pricing is hard to overcome.
Gemini 3.1 Flash-Lite is priced at $0.25/M input and $1.50/M output, sits between Nano and Mini on pricing, and offers a 1M-token context window — a genuine advantage over both OpenAI models for extremely long document processing. It scores 86.9% on GPQA Diamond (versus Mini's 88.0%) and can exceed 380 tokens per second in throughput-optimized deployments. For bulk processing workloads where context length or raw throughput matters more than agentic tool-calling accuracy, Gemini 3.1 Flash-Lite remains competitive.
Llama 4 Scout and other open-source options add another dimension: for teams that can host models or use providers like Together AI or Fireworks, open-source alternatives can undercut even Nano's pricing at sufficient scale. The tradeoff is operational complexity, reliability SLAs, and multimodal capability gaps.
The competitive landscape summary:
- Pure cost: Gemini 3.1 Flash-Lite and open-source models can match or beat Nano for bulk text tasks
- Computer use: Mini's 72.1% OSWorld-Verified score is a genuine differentiator — no competitor small model comes close
- Ecosystem integration: Mini's availability in ChatGPT, Codex, and the full API toolchain reduces integration friction for existing OpenAI customers
- Tool-calling reliability: Mini's Toolathlon gain (+16 points over GPT-5 Mini) is a differentiator for pipeline-critical applications
The pricing reality: capability gains vs cost creep
OpenAI's release comes with a pricing caveat that developers should understand clearly before planning migrations.
GPT-5.4 Mini costs $0.75 per million input tokens. GPT-5 Mini cost $0.25 — a 3x increase. Output tokens went from $2.00 to $4.50, a 2.25x increase. GPT-5.4 Nano costs $0.20 per million input tokens versus $0.05 for GPT-5 Nano — a 4x increase. Output tokens went from $0.40 to $1.25, a 3.1x increase.
The capability gains are real. The cost increases are also real. For workloads currently running on GPT-5 Mini or GPT-5 Nano, migrating to GPT-5.4 equivalents at equivalent volume will increase your API spend meaningfully — unless the improved capabilities allow you to reduce call frequency, use shorter prompts, or eliminate retry overhead.
The economic calculus is workload-specific:
Cases where the upgrade clearly pays: If your workflow currently uses GPT-5 Mini for computer use tasks, the 30-point OSWorld gain (42.0% → 72.1%) will dramatically reduce failure rates and retry costs. The net cost per successful task completion may be lower even at 3x the per-token price.
Cases where the upgrade needs analysis: Bulk text classification running at high volume on GPT-5 Nano at $0.05/M input only breaks even on GPT-5.4 Nano at $0.20/M if the new model cuts total token consumption, retries, or downstream failures by enough to offset the 4x price increase. If the accuracy improvement is not that large for your task, the economics are tight.
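One way to make this concrete is to compare cost per successful output rather than per token: each success effectively pays for 1/success_rate attempts once retries are counted. The success rates below are hypothetical placeholders for your own measurements.

```python
# Break-even sketch: does an accuracy gain offset a 4x token price rise?
def cost_per_success(price_per_m: float, tokens: int, success_rate: float) -> float:
    # Failed calls are retried, so each success pays for 1/success_rate attempts.
    return price_per_m * tokens / 1_000_000 / success_rate

old = cost_per_success(0.05, 500, 0.92)   # GPT-5 Nano, hypothetical 92% success
new = cost_per_success(0.20, 500, 0.97)   # GPT-5.4 Nano, hypothetical 97% success
print(f"{new / old:.2f}x")  # still ~3.8x more expensive per success here
```

In this hypothetical, a 5-point accuracy gain nowhere near closes a 4x price gap; the upgrade only pays when failures carry large downstream costs beyond the retry itself.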
Cases where staying put makes sense: Simple text generation, summarization, or question-answering tasks that GPT-5 Mini already handles reliably at high accuracy will see cost increases without proportional quality gains.
The smart approach is staged migration: run a sample of your production workload on both the old and new model, measure accuracy and failure rates, calculate the real cost per successful output, and make the migration decision based on that number rather than the per-token price alone.
Enterprise implications: the ROI inflection point
For enterprise AI teams, the GPT-5.4 Mini and Nano launch represents a genuine inflection point — but not for the reason most coverage suggests.
The important number is not $0.20 per million tokens. The important number is what happens when you multiply that pricing by the volume of subagent calls in a production multi-agent system.
Consider a document processing workflow that runs 10,000 classification decisions per day. At GPT-5 Nano's previous pricing of $0.05/M input tokens, with average inputs of 500 tokens, that costs roughly $0.25/day — already cheap. At GPT-5.4 Nano's $0.20/M, the same workflow costs $1.00/day. That 4x increase matters less than the fact that Nano's improved accuracy reduces the downstream error rate from (hypothetical) 8% to 3%, cutting expensive human-review interventions by 62%.
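The workflow arithmetic above checks out directly:

```python
# Reproducing the document-classification cost figures from this section.
CALLS_PER_DAY = 10_000
TOKENS_PER_CALL = 500   # average input size

def daily_token_cost(price_per_m: float) -> float:
    return CALLS_PER_DAY * TOKENS_PER_CALL * price_per_m / 1_000_000

print(daily_token_cost(0.05))  # $0.25/day on GPT-5 Nano
print(daily_token_cost(0.20))  # $1.00/day on GPT-5.4 Nano

# Hypothetical error-rate drop from 8% to 3% cuts human-review volume by:
print(1 - 0.03 / 0.08)  # 0.625, the "62%" in the text
```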
The real ROI calculation for enterprise AI teams is: does the capability improvement reduce the cost of downstream error handling, human review, and workflow retries by more than the token price increase? In most agentic workflows with meaningful accuracy gaps, the answer is yes — often by a significant margin.
The 400K context window also carries enterprise-specific value. Legal, financial, and regulatory document processing workflows frequently bump against 128K context limits, requiring either document truncation (which reduces accuracy) or expensive chunking pipelines. At 400K tokens, most real-world enterprise documents fit in a single pass, simplifying both architecture and billing.
Migration considerations:
- Start with workloads currently failing due to model capability limits, not cost-optimized workloads running well
- Benchmark accuracy improvement against your specific task distribution, not published benchmarks
- For mixed orchestrator-subagent architectures, evaluate which tier (Mini or Nano) best fits each role
- Build cost monitoring before migrating to catch unexpected volume increases from changed user behavior
What developers should do now
API access: Both models are live in the OpenAI API as of March 17. GPT-5.4 Mini is available as gpt-5.4-mini and GPT-5.4 Nano as gpt-5.4-nano in the completions and chat API. Both support all existing API parameters — no migration of request format is required.
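As a sketch of what routing a bounded subtask to Nano might look like, the payload below uses the gpt-5.4-nano identifier from the announcement in a standard chat-style request body. The helper function, prompt, and labels are hypothetical, not an official snippet; in practice you would send this dict via the official SDK or your HTTP client.

```python
# Illustrative request payload for a Nano classification subagent.
# Model name is from the announcement; everything else is an example.

def nano_classification_request(ticket_text: str) -> dict:
    return {
        "model": "gpt-5.4-nano",
        "messages": [
            {
                "role": "system",
                "content": 'Classify the support ticket. Respond with JSON: '
                           '{"label": "billing" | "bug" | "other"}.',
            },
            {"role": "user", "content": ticket_text},
        ],
        # JSON mode, listed above among the supported features:
        "response_format": {"type": "json_object"},
    }

payload = nano_classification_request("I was charged twice this month.")
print(payload["model"])  # gpt-5.4-nano
```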
ChatGPT and Codex: GPT-5.4 Mini is live in ChatGPT for Free and Go users via the Thinking feature selector. Codex users get Mini automatically. Nano requires API access and is not available in ChatGPT interfaces.
Immediate evaluation steps:
- Identify your subagent workloads. Any workflow where you are currently routing classification, extraction, or ranking tasks to a larger model is a Nano candidate. Pull a representative sample of 100-500 inputs and run them through Nano to measure accuracy against your quality bar.
- Benchmark computer use workflows. If you are building anything with computer use, OpenAI's 30-point OSWorld improvement for Mini is worth empirically validating on your specific task. The benchmark score is representative, but your task distribution may differ.
- Test tool-calling reliability. For pipeline-critical tool use, run your function-calling suite against Mini and compare failure rates to GPT-5 Mini. The Toolathlon improvement is significant enough that many pipelines will see material reliability gains.
- Run a cost model before migrating. Use OpenAI's pricing page to project costs at your actual token volumes. Factor in the accuracy improvement's impact on retry rates and downstream error costs. The 3-4x token price increase requires a real capability benefit to justify.
- Check context window fit. If you have been doing context management or chunking to fit within 128K limits, your architecture may simplify significantly at 400K. This is worth testing — simpler pipelines reduce latency, error surface, and maintenance overhead.
- Don't overlook cached input pricing. Nano's cached input price is $0.02 per million tokens — ten times cheaper than the standard rate. For workflows with repeated system prompts, fixed documents, or shared context, caching can reduce effective input costs dramatically. This changes the break-even analysis significantly for high-volume use cases.
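The caching effect on a prompt with a large shared prefix is worth quantifying. The rates below are from this article; the token split is a hypothetical example.

```python
# Effective input cost when a large shared prefix hits the prompt cache.
NANO_IN, NANO_CACHED = 0.20, 0.02   # $/M tokens, standard vs cached

def input_cost(cached_tokens: int, fresh_tokens: int) -> float:
    return (cached_tokens * NANO_CACHED + fresh_tokens * NANO_IN) / 1_000_000

# 4,000-token fixed system prompt and document, 200 tokens of new input:
uncached = input_cost(0, 4_200)
cached = input_cost(4_000, 200)
print(f"{uncached / cached:.1f}x cheaper with caching")  # 7.0x
```

For prefix-heavy workloads like this one, caching narrows much of the 4x price gap against the previous Nano generation.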
Frequently asked questions
What is GPT-5.4 Mini and how does it differ from GPT-5 Mini?
GPT-5.4 Mini is OpenAI's updated mid-range model, released March 17, 2026. It runs more than 2x faster than GPT-5 Mini while scoring significantly higher on every major benchmark — 54.4% on SWE-Bench Pro (vs. 45.7%), 72.1% on OSWorld-Verified (vs. 42.0%), and 42.9% on Toolathlon (vs. 26.9%). It is available in the API, Codex, and ChatGPT (Free and Go tiers). The tradeoff is pricing: $0.75/M input versus GPT-5 Mini's $0.25/M.
What is GPT-5.4 Nano designed for?
Nano is OpenAI's cheapest model, priced at $0.20/M input tokens, and is purpose-built for subagent tasks in multi-agent pipelines: document classification, data extraction, ranking, and simple coding subtasks. It is not a budget replacement for Mini — it is a specialized model for bounded, high-volume tasks where a larger model is too expensive to run at scale. It is currently available via API only.
Is GPT-5.4 Nano actually better than GPT-5 Mini?
Yes, on the tasks it targets. Despite being cheaper than GPT-5 Mini, Nano scores 52.4% on SWE-Bench Pro versus GPT-5 Mini's 45.7%. On the tasks it is designed for — classification and extraction — it outperforms the previous generation. It underperforms on computer use (OSWorld-Verified: 39.0% vs. GPT-5 Mini's 42.0%), which reflects its narrower design focus.
What is the context window size for both models?
Both GPT-5.4 Mini and GPT-5.4 Nano support a 400,000-token context window, a significant increase over the 128K window available in the previous Mini generation. This enables processing of long documents, extended conversation histories, and large codebases in a single API call.
How does GPT-5.4 Mini compare to Claude 4.5 Haiku and Gemini 3.1 Flash-Lite?
Claude 4.5 Haiku is priced higher ($1.00/M input vs. $0.75/M for Mini) and scores strongly on instruction-following in agentic workflows. Gemini 3.1 Flash-Lite ($0.25/M) is cheaper and offers a 1M-token context window, but lacks Mini's computer use capability (72.1% OSWorld-Verified). Mini's main differentiators are its computer use performance and native integration with OpenAI's Codex, ChatGPT, and tool ecosystem.
Is the price increase from GPT-5 Mini worth it?
It depends on your workload. The 3x input price increase is offset by meaningful capability gains for computer use (30-point OSWorld improvement) and tool calling (16-point Toolathlon improvement). Workflows that currently experience high failure rates due to model capability gaps will often see net cost reductions. Workloads that are already accurate and cost-optimized on GPT-5 Mini should do a careful cost-benefit analysis before migrating.
Sources: OpenAI, March 17, 2026, 9to5Mac, The Decoder, Simon Willison, Adam Holter