China's MiniMax M2.5 rivals Claude at one-tenth the price and UBS just initiated a buy
MiniMax M2.5 is reaching a third of Claude's usage at 10% of the cost. UBS initiated a buy rating. Five new Chinese AI models are reshaping the competition.
TL;DR: MiniMax M2.5, released February 12, scores 80.2% on SWE-Bench Verified — within 0.6 points of Claude Opus 4.6 — at roughly one-tenth the cost per task. Its real-world usage has already hit one-third of Claude's volume. UBS initiated coverage with a Buy and an optimistic case pegging global enterprise revenue potential at $41 billion. Meanwhile, four more Chinese labs dropped frontier-level models in the same compressed window. The gap is closing faster than most Western analysts expected.
When Wall Street analysts describe a competitive threat, they usually speak in terms of potential. UBS did something more pointed: it cited actual traction. According to a mid-February UBS note, MiniMax's AI usage had already reached one-third of Anthropic's Claude, at one-tenth the price.
That is not a projection. It is a ratio describing what enterprise developers had already chosen to do with their API budgets before the UBS report was published. The model behind that number, MiniMax M2.5, released February 12, 2026, had been live for only days before it began reshaping analyst commentary.
The pricing arithmetic is not subtle. Claude Opus 4.6 costs $5 per million input tokens and $25 per million output tokens. MiniMax M2.5 costs $0.30 per million input tokens and $1.20 per million output tokens. MiniMax's own framing puts the per-task cost at approximately $0.15 versus $3.00 for Anthropic's top model — a 20x differential at the task level, depending on token volume.
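The arithmetic is easy to check. A minimal sketch of the per-task comparison, using the published token rates; the token counts are illustrative assumptions chosen to match the cited $0.15 and $3.00 figures, not disclosed usage profiles:

```python
# Published per-million-token rates (USD)
RATES = {
    "MiniMax M2.5":    {"input": 0.30, "output": 1.20},
    "Claude Opus 4.6": {"input": 5.00, "output": 25.00},
}

def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one task at the listed token rates."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# Illustrative agentic task: ~100k input tokens (context, retries), ~100k output
claude = task_cost("Claude Opus 4.6", 100_000, 100_000)
minimax = task_cost("MiniMax M2.5", 100_000, 100_000)
print(f"Opus 4.6: ${claude:.2f}  M2.5: ${minimax:.2f}  ratio: {claude / minimax:.0f}x")
```

At that assumed task profile the per-task costs land exactly on the cited $3.00 versus $0.15, a 20x gap; the ratio shifts with the input/output mix, but stays in the same order of magnitude.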
MiniMax calls M2.5 "the first frontier model where users do not need to worry about cost." That is a marketing claim, but the benchmark numbers make it hard to dismiss.
MiniMax is a Beijing-based AI company founded in early 2022 by Yan Junjie, a former executive at SenseTime. The company went public in Hong Kong on January 9, 2026, raising $619 million in its IPO — a debut that preceded the M2.5 release by five weeks.
M2.5 is a Mixture-of-Experts (MoE) model with 230 billion total parameters, but only 10 billion active parameters during inference. That architectural choice is the primary reason the pricing works. The model runs at full capability while activating a fraction of its weights per forward pass, dramatically reducing compute cost per token without sacrificing the quality that 230 billion parameters enables during training.
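The economics of that design can be seen in a toy sketch. The following is illustrative only, not MiniMax's implementation: a tiny top-k routed MoE layer in which per-token compute scales with the experts activated, not the experts that exist.

```python
import numpy as np

# Toy mixture-of-experts layer. Many experts exist, but each token is routed
# to only TOP_K of them, so inference FLOPs scale with active experts.
# All shapes and counts here are illustrative, not MiniMax's configuration.
rng = np.random.default_rng(0)
N_EXPERTS, TOP_K, D = 64, 2, 128

experts = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(N_EXPERTS)]
router = rng.standard_normal((D, N_EXPERTS)) / np.sqrt(D)

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route a single token vector through its top-k experts."""
    logits = x @ router
    top = np.argsort(logits)[-TOP_K:]        # indices of the chosen experts
    gates = np.exp(logits[top])
    gates = gates / gates.sum()              # softmax over the chosen experts
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.standard_normal(D)
out = moe_forward(token)

# Fraction of expert parameters touched per token:
active_fraction = TOP_K / N_EXPERTS
```

The same principle, scaled up, is how a 230B-parameter model can bill like a much smaller one: only the routed slice of the network runs per forward pass.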
The model was trained with reinforcement learning across hundreds of thousands of complex real-world environments — not just synthetic code tasks, but agentic workflows involving browser searches, API calls, and file system operations. MiniMax built a proprietary training framework called Forge with a 40x training speedup, alongside the CISPO algorithm for MoE model stability and a process reward mechanism for long-context credit assignment.
Two model variants were released simultaneously:
| Variant | Input (per 1M tokens) | Output (per 1M tokens) | Throughput |
|---|---|---|---|
| MiniMax M2.5 | $0.30 | $1.20 | ~50 tokens/sec |
| MiniMax M2.5 Lightning | $0.30 | $2.40 | ~100 tokens/sec |
| Claude Opus 4.6 | $5.00 | $25.00 | ~22 tokens/sec |
| GPT-5 | ~$5.00 | ~$20.00 | varies |
M2.5 Lightning doubles throughput at twice the output token cost — still a fraction of competing frontier models. At 100 tokens per second, MiniMax calculates the full runtime cost at approximately $1 per hour. That number is designed to reframe the conversation: not cost per token, but cost per hour of autonomous operation.
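The cost-per-hour figure is easy to reproduce. A quick check, simplifying to output-token billing at Lightning's rate (input tokens add a little on top):

```python
TOKENS_PER_SEC = 100   # M2.5 Lightning throughput
OUTPUT_RATE = 2.40     # USD per 1M output tokens

tokens_per_hour = TOKENS_PER_SEC * 3600
cost_per_hour = tokens_per_hour / 1_000_000 * OUTPUT_RATE
print(f"{tokens_per_hour:,} tokens/hour -> ${cost_per_hour:.2f}/hour")
```

Sustained generation works out to 360,000 tokens and roughly $0.86 per hour on output alone, which is where the "approximately $1 per hour" framing comes from once input tokens are included.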
The headline benchmark is SWE-Bench Verified, the standard software engineering evaluation where models are given real GitHub issues and asked to produce working patches.
| Model | SWE-Bench Verified | Multi-SWE-Bench | BrowseComp |
|---|---|---|---|
| MiniMax M2.5 | 80.2% | 51.3% | 76.3% |
| Claude Opus 4.6 | 80.8% | — | — |
| GPT-5 | ~80.0% | — | — |
| Gemini 3 Pro | ~78.0% | — | — |
The 0.6-point gap between M2.5 and Claude Opus 4.6 on SWE-Bench is essentially a tie within measurement noise. M2.5 completes the evaluation at roughly the same speed as Opus 4.6 — 37% faster than its predecessor M2.1 — while costing a fraction of the price per task.
On BrowseComp, a benchmark evaluating a model's ability to research the web and retrieve specific information through multi-hop queries, M2.5 scores 76.3% with context management. This is an agentic metric — it measures not raw language ability but the model's effectiveness at completing multi-step tasks in real environments. That is where the enterprise value proposition lives.
M2.5 also outperforms Opus 4.6 on two secondary evaluations: Droid (79.7 vs 78.9) and OpenCode (76.1 vs 75.9). These are narrower benchmarks, but they matter because they represent the kind of recurring, high-volume agentic tasks that enterprise customers pay for at scale.
UBS initiated coverage of MiniMax with a Buy-equivalent rating shortly after the M2.5 release — notable because the investment bank was positioning on a freshly-listed Hong Kong AI company weeks after its IPO, in a market still sorting through the implications of DeepSeek's January disruption.
The key line from the UBS note: "We recently initiated coverage on MiniMax, and consider the company well positioned to benefit from the AI tailwinds in China and global markets."
The bull case rests on a specific scenario: if MiniMax achieves 3% of the global market for enterprise AI services, that corresponds to segment revenue of $41 billion. That figure is not a current valuation — it is the optimistic scenario in UBS's model for what the enterprise AI software market could allocate to a cost-competitive Chinese frontier lab with open-weight distribution.
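Implied in that scenario is a total market size. A back-of-envelope check (my arithmetic from the two published numbers, not UBS's disclosed model):

```python
# UBS optimistic scenario: 3% share corresponds to $41B segment revenue
segment_revenue = 41e9   # USD
market_share = 0.03

implied_market = segment_revenue / market_share
print(f"Implied global enterprise AI services market: ${implied_market / 1e12:.2f} trillion")
```

In other words, the bull case presumes an enterprise AI services market of roughly $1.4 trillion, of which MiniMax captures a small slice.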
The logic is structural. Enterprise software buyers face a simple tradeoff: frontier capability at high cost, or lesser capability at low cost. M2.5 is the first Chinese model to credibly threaten that binary. If you can access near-Claude performance at one-tenth the cost, the procurement calculus changes, and it changes at scale.
UBS is not alone. Morgan Stanley and Jefferies have also initiated Buy-equivalent coverage of MiniMax. That kind of simultaneous analyst attention on a newly public company in a contested geopolitical technology sector is unusual. It reflects consensus that the cost-performance gap between Chinese and US frontier models is narrower than the pricing gap suggests.
MiniMax M2.5 did not arrive in isolation. February 2026 produced a coordinated — or at least coincident — wave of releases from Chinese labs timed around the Lunar New Year. UBS flagged five models in total. Here is what the other four represent:
**Qwen 3.5 (Alibaba).** Alibaba released Qwen 3.5 hours before the Lunar New Year on February 16. Qwen3.5-Plus is the first native multimodal flagship from Alibaba, handling text, images, and other modalities in a single model rather than bolting on vision as an afterthought. It is up to 60% cheaper than its predecessor Qwen 2.5 and continues Alibaba's strategy of open-source distribution to maximize developer adoption globally.
**GLM-5 (Zhipu AI).** Zhipu released GLM-5 on February 11, positioning it around "agentic intelligence, advanced multi-step reasoning, and frontier-level performance" in coding, creative writing, and problem-solving. The detail that drew the most attention: GLM-5 was trained entirely on Huawei Ascend chips, with no US-manufactured semiconductor hardware involved. That is a capability and geopolitical statement combined.
**Seedance 2.0 (ByteDance).** ByteDance unveiled Seedance 2.0, a video generation model positioned for professional film production. The Motion Picture Association criticized it for operating on copyrighted works at scale, a signal that the model is capable enough to trigger industry-level legal concern, which is its own form of benchmark validation.
**ERNIE 5.0 (Baidu).** Baidu released ERNIE 5.0 as part of the broader wave. Combined with Qwen 3.5 and others, ERNIE 5.0 represents what several analysts described as a "generational inflection" in open-weight LLMs: the models are no longer text-only, they are increasingly multimodal, and they are increasingly trained independent of US chip supply chains.
What is notable about all five releases is the simultaneity. These are not labs in loose coordination — they are competing fiercely against each other within China. The Lunar New Year window functions as an unofficial launch date, and the result in February 2026 was five frontier-adjacent models arriving within days of each other, each with distinct architectural choices and market positioning.
The framing of US-China AI competition has typically centered on three dimensions: compute, model capability, and chip access. The MiniMax moment suggests a fourth: cost structure.
When Claude Opus 4.6 debuted, it set a benchmark for what a frontier model could do. The implicit assumption was that closing the performance gap required comparable investment — comparable training runs, comparable chip access, comparable operational costs. MiniMax M2.5 challenges that assumption directly.
The MoE architecture is not a shortcut. It is a legitimate architectural innovation that produces competitive performance at lower inference cost. That Chinese labs have pushed it hardest partly reflects pricing pressure: they face cost constraints that American labs, flush with enterprise software pricing power, do not. When your cost of capital is different, your architecture choices are different.
"Chinese labs are no longer just catching up to US frontier models; in specific domains like open-source availability, agent orchestration, and cost efficiency, they are setting the pace."
That framing, which appeared across multiple analyst notes in February, represents a genuine shift from the post-DeepSeek narrative of January 2025. DeepSeek proved that Chinese labs could train competitive models more efficiently. MiniMax is proving that they can also deploy them more cheaply. Those are different advantages, and together they constitute a durable competitive position.
The enterprise implication is direct. A company building an agentic software product today faces a build-or-buy decision on the underlying model. If a model with Claude-level performance is available at one-tenth the price, with open weights, under an MIT license, the procurement argument for the premium US model weakens. Not disappears — trust, compliance, support, and integration ecosystem all matter — but weakens measurably.
MiniMax's own internal adoption data reinforces the enterprise story: 80% of all new code at MiniMax is generated by M2.5, and 30% of daily internal tasks are completed autonomously by the model. Those figures suggest a company that is dogfooding at scale — and building the internal automation case that it then sells externally.
Token-rate comparisons only go so far; for enterprise buyers, the per-task framing is more useful. UBS's cited figure, $0.15 per task for M2.5 versus $3.00 for Claude Opus 4.6, assumes a typical agentic software engineering task involving multi-step reasoning, code generation, and tool use.
At those rates, the economics shift substantially at volume:
| Monthly tasks | Claude Opus 4.6 cost | MiniMax M2.5 cost | Savings |
|---|---|---|---|
| 10,000 | $30,000 | $1,500 | $28,500 |
| 100,000 | $300,000 | $15,000 | $285,000 |
| 1,000,000 | $3,000,000 | $150,000 | $2,850,000 |
These are directional estimates based on the per-task figures UBS cited. Actual costs vary with context window size and task complexity. But the order-of-magnitude difference is durable across reasonable parameterizations.
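The table above is a straightforward multiplication and can be regenerated for any volume. A short sketch using the same UBS-cited per-task figures:

```python
# UBS-cited per-task costs (USD) for a typical agentic engineering task
PER_TASK = {"Claude Opus 4.6": 3.00, "MiniMax M2.5": 0.15}

def monthly_costs(tasks: int) -> dict:
    """Monthly spend per model at a given task volume, plus the difference."""
    costs = {model: tasks * cost for model, cost in PER_TASK.items()}
    costs["savings"] = costs["Claude Opus 4.6"] - costs["MiniMax M2.5"]
    return costs

for volume in (10_000, 100_000, 1_000_000):
    c = monthly_costs(volume)
    print(f"{volume:>9,} tasks: Claude ${c['Claude Opus 4.6']:>12,.0f}  "
          f"M2.5 ${c['MiniMax M2.5']:>10,.0f}  saved ${c['savings']:>12,.0f}")
```

Swapping in your own blended per-task figures (which will vary with context size and retry rates) is a one-line change.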
For startups building on top of frontier models, the cost structure difference is existential. A company that can run a million agentic tasks per month for $150,000 instead of $3 million has a fundamentally different unit economics story — and a different ability to pass savings through to customers.
MiniMax has already announced that M2.5 is a stepping stone. The company's product roadmap points toward deeper integration of Office Skills — Word, Excel, PowerPoint workflows running natively inside the model's agentic loop — and continued expansion of its Expert marketplace, which already hosts 10,000+ user-built domain-specific agents.
The open-weight distribution under MIT license matters for the longer arc. Enterprise customers who need to self-host for compliance reasons, or who want to fine-tune on proprietary data, can do so without licensing friction. That is a meaningful differentiator against closed frontier models from Anthropic, OpenAI, and Google.
The UBS Buy rating puts a credibility stamp on what developer usage data was already showing. When a major Swiss investment bank initiates coverage of a freshly-listed Beijing AI company with explicit language about global enterprise market positioning, it signals that institutional capital is beginning to price in a scenario where the frontier model market is not a US duopoly.
Whether MiniMax sustains its performance advantage as Anthropic and OpenAI respond — presumably with their own architectural and pricing moves — is an open question. The history of the past eighteen months suggests the frontier compresses faster than anyone predicts. But for buyers making decisions today, the numbers are real, the benchmarks are verified, and the usage data is in production.
One-third of Claude's usage. One-tenth of the price. That is the sentence the rest of the industry is now reading.
**What is MiniMax M2.5 and when was it released?** MiniMax M2.5 is a large language model released by Beijing-based MiniMax on February 12, 2026. It is a Mixture-of-Experts model with 230 billion total parameters (10 billion active during inference), trained with reinforcement learning on real-world agentic tasks. It is open weights under the MIT license.
**How does MiniMax M2.5 compare to Claude Opus 4.6 on benchmarks?** M2.5 scores 80.2% on SWE-Bench Verified versus Claude Opus 4.6's 80.8%, a gap of 0.6 percentage points, which is within benchmark noise. M2.5 also outperforms Opus 4.6 on Droid (79.7 vs 78.9) and OpenCode (76.1 vs 75.9). On BrowseComp, which tests multi-step web research, M2.5 scores 76.3%.
**Why is MiniMax M2.5 so much cheaper than Claude?** The Mixture-of-Experts architecture activates only 10 billion of its 230 billion parameters during inference, dramatically reducing compute cost per token. M2.5 costs $0.30 per million input tokens and $1.20 per million output tokens, compared to Claude Opus 4.6's $5.00 input and $25.00 output. The per-task cost difference is approximately $0.15 for M2.5 versus $3.00 for Opus 4.6 on typical agentic tasks.
**What did UBS say about MiniMax?** UBS initiated Buy-equivalent coverage on MiniMax, describing the company as "well positioned to benefit from the AI tailwinds in China and global markets." In an optimistic scenario, UBS estimates MiniMax could achieve 3% of the global enterprise AI services market, corresponding to $41 billion in segment revenue. Morgan Stanley and Jefferies also initiated Buy-equivalent coverage.
**What other Chinese AI models were released alongside M2.5?** The February 2026 wave included: Qwen 3.5 from Alibaba (natively multimodal, 60% cheaper than its predecessor), GLM-5 from Zhipu AI (trained entirely on Huawei chips, no US semiconductors), Seedance 2.0 from ByteDance (professional video generation), and ERNIE 5.0 from Baidu. Together with M2.5, these five releases represent the most concentrated frontier model launch window from Chinese labs to date.