NVIDIA brings Rubin platform with six new chips to MWC 2026
NVIDIA puts the Rubin computing architecture center stage at MWC Barcelona, with six new chips spanning agentic AI, physical AI, autonomous vehicles, robotics, and biomedical research.
TL;DR: NVIDIA's Rubin platform headlined MWC Barcelona 2026: six new chips replacing Blackwell in H2 2026. The flagship GPU packs 336 billion transistors and delivers 50 PFLOPS of FP4 inference, and NVIDIA claims a 10x cost-per-token reduction over Blackwell for MoE inference. The full Vera Rubin NVL72 rack pushes 3.6 EFLOPS, with AWS, Google Cloud, Azure, and Oracle confirmed as launch partners.
336 billion transistors. 50 petaflops of FP4 inference. 10x lower cost per token than Blackwell. Five AI model families spanning six industries. One rack-scale supercomputer. This is what NVIDIA brought to MWC Barcelona 2026 — and it fundamentally changes the math on AI infrastructure for the next two years.
NVIDIA announced the Rubin computing platform at CES 2026 in January and placed it front and center at MWC Barcelona in early March, pairing the hardware story with a targeted telecom and 6G narrative. The framing matters: when NVIDIA says "six new chips," it does not mean six GPU variants. It means six distinct silicon components that only reach their advertised performance figures when deployed as one co-designed system.
The six chips:
- Vera CPU — custom Arm processor for agentic orchestration
- Rubin GPU — the dual-die compute engine
- NVLink 6 Switch — the scale-up fabric
- ConnectX-9 SuperNIC — the scale-out NIC
- BlueField-4 DPU — infrastructure offload
- Spectrum-6 Ethernet Switch — the AI-factory Ethernet fabric
All six ship together in the flagship Vera Rubin NVL72 configuration: 72 Rubin GPUs paired with 36 Vera CPUs in a single rack. Partner systems from the full range of OEMs and cloud providers ship in H2 2026.
The Vera CPU is NVIDIA's first CPU built on fully custom Arm cores (Grace used off-the-shelf Neoverse designs), purpose-built to run agentic reasoning pipelines alongside GPU inference rather than serving as a separate general-purpose host processor.
Key specifications:
- 88 custom "Olympus" Arm cores
- 1.8 TB/s of coherent CPU-to-GPU bandwidth via NVLink-C2C
- 36 Vera CPUs per NVL72 rack, one per two Rubin GPUs
The Olympus core design diverges from standard Neoverse cores. NVIDIA has not released a full microarchitecture paper, but the emphasis on coherent, low-latency GPU attachment via NVLink-C2C positions Vera as a CPU that shares memory state with the GPU rather than operating at arm's length across PCIe. For agentic workloads — where a reasoning model must call tools, retrieve context, and re-enter inference repeatedly — reducing CPU-GPU round-trip latency is a material performance lever.
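To see why that matters, picture the shape of an agentic task: dozens of short inference steps interleaved with CPU-side tool calls and retrievals, so the CPU-GPU handoff cost is paid on every hop. A schematic sketch with invented latencies (nothing below is an NVIDIA API; the numbers exist only to show how per-hop overhead compounds):

```python
# Schematic agentic loop: short GPU inference steps interleaved with
# CPU-side tool calls. All latencies are invented for illustration.
HOPS_PER_TASK = 40  # tool calls + retrievals in one agent trajectory

def task_latency_ms(hop_ms: float, infer_ms: float = 5.0,
                    tool_ms: float = 2.0) -> float:
    # Each hop costs one GPU inference step, a handoff to the CPU,
    # the tool work, and a handoff back. The hop latency is paid twice.
    return HOPS_PER_TASK * (infer_ms + tool_ms + 2 * hop_ms)

for name, hop_ms in (("PCIe-attached host CPU", 2.0),
                     ("coherent NVLink-C2C", 0.1)):
    print(f"{name}: {task_latency_ms(hop_ms):.0f} ms per agent task")
```

The shorter each inference step gets, the larger the share of wall-clock time the handoffs consume, which is exactly the regime agentic workloads live in.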
The Rubin GPU is a dual-die package built on TSMC's 3nm node. The transistor count of 336 billion compares to 208 billion in the Blackwell B200 — a 62% increase, absorbed largely by the expanded Tensor Core matrix, the new Transformer Engine, and the on-chip memory subsystem.
Compute performance:
| Metric | Rubin GPU | Blackwell GB200 | Hopper H100 |
|---|---|---|---|
| FP4 inference (PFLOPS) | 50 | 10 | — |
| FP8 training (PFLOPS) | 35 | ~10 | ~4 |
| HBM capacity | 288 GB HBM4 | 192 GB HBM3e | 80 GB HBM3 |
| Memory bandwidth | 22 TB/s | ~8 TB/s | 3.35 TB/s |
| Transistors | 336B | 208B | 80B |
| Process node | TSMC 3nm | TSMC 4nm | TSMC 4nm |
The 3rd-generation Transformer Engine inside Rubin supports hardware-accelerated adaptive precision, switching dynamically between NVFP4 and FP8 depending on layer sensitivity. NVIDIA's published claim for this feature is narrow but meaningful: it avoids the accuracy loss that would otherwise force an FP16 fallback on select attention heads.
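The selection logic can be pictured as a per-layer policy: quantize with the cheapest format whose measured error stays under a tolerance, and fall back to FP16 only where it does not. A minimal sketch of that idea (not NVIDIA's hardware implementation; the grid sizes and threshold are hypothetical stand-ins):

```python
import numpy as np

def quantize(x: np.ndarray, levels: int) -> np.ndarray:
    """Crude stand-in for block-scaled quantization: normalize to the
    tensor's max, then round onto a grid with `levels` steps."""
    scale = float(np.abs(x).max()) or 1.0
    return np.round(x / scale * levels) / levels * scale

def choose_precision(w: np.ndarray, tol: float = 0.25) -> str:
    """Pick the cheapest format whose relative error stays under `tol`.
    Grid sizes are rough proxies: 'FP4' ~ 8 levels, 'FP8' ~ 128."""
    for fmt, levels in (("NVFP4", 8), ("FP8", 128)):
        err = np.abs(w - quantize(w, levels)).mean() / np.abs(w).mean()
        if err < tol:
            return fmt
    return "FP16"  # fallback for precision-sensitive layers

rng = np.random.default_rng(0)
layers = {
    "mlp_up": rng.normal(0, 0.02, 4096),        # well-behaved weights
    "attn_qk": rng.normal(0, 0.02, 4096) ** 3,  # heavy-tailed, sensitive
}
for name, w in layers.items():
    print(f"{name}: {choose_precision(w)}")
```

Run on these toy tensors, the well-behaved layer lands on NVFP4 while the heavy-tailed one is pushed up to FP8, which is the behavior the Transformer Engine performs in hardware per layer.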
The 224 Streaming Multiprocessors carry 5th-generation Tensor Cores. The SFU (Special Function Unit) count has expanded alongside new execution pipelines for sparse attention and activation functions — directly targeting the compute patterns of modern MoE (Mixture-of-Experts) models, which now dominate frontier AI training runs.
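For intuition on the MoE pattern those pipelines target: a router sends each token to only the top-k of N experts, so most of the layer's parameters stay resident in memory but untouched for any given token. A toy version with hypothetical sizes, in plain NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 1024, 64, 2

# Large total parameter count, sparsely activated per token.
experts = [rng.normal(0, 0.02, (d_model, d_model)) for _ in range(n_experts)]
router = rng.normal(0, 0.02, (d_model, n_experts))

def moe_layer(token: np.ndarray) -> np.ndarray:
    scores = token @ router                # route the token
    chosen = np.argsort(scores)[-top_k:]   # keep only the top-k experts
    gates = np.exp(scores[chosen])
    gates /= gates.sum()                   # softmax over the chosen experts
    # Only 2 of 64 expert matmuls execute: ~3% of parameters touched.
    return sum(g * (token @ experts[i]) for g, i in zip(gates, chosen))

print(moe_layer(rng.normal(0, 1, d_model)).shape)  # (1024,)
```

Compute per token is tiny relative to total parameters, which is why memory capacity and bandwidth, not raw FLOPS, dominate MoE performance.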
NVLink 6 is the interconnect that makes the Vera Rubin NVL72 function as one logical compute unit rather than 72 independent GPUs sharing a network. The NVLink 6 switch delivers 3.6 TB/s of bidirectional GPU-to-GPU bandwidth, double the 1.8 TB/s of the fifth-generation NVLink in Blackwell.
At the full NVL72 rack scale:
- 3.6 EFLOPS of NVFP4 inference
- ~2.5 EFLOPS of training throughput
- 20.7 TB of HBM4 with 1.6 PB/s of aggregate bandwidth
- 3.6 TB/s of NVLink 6 bandwidth per GPU, across all 72 GPUs
For context: the GB200 NVL72 Blackwell rack delivers approximately 720 PFLOPS of FP8 training. Rubin NVL72 at 2.5 EFLOPS in NVFP4 is not directly apples-to-apples, but NVIDIA's own claim is 3.5x better training throughput and 5x better inference throughput versus Blackwell when normalized to identical workloads.
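The rack figures are straightforward multiples of the per-GPU numbers in the specification table above; a quick sanity check:

```python
gpus = 72  # Rubin GPUs per Vera Rubin NVL72 rack

fp4_pflops, train_pflops = 50, 35  # per-GPU figures from the table above
hbm_gb, hbm_tbps = 288, 22

print(f"FP4 inference: {gpus * fp4_pflops / 1000:.1f} EFLOPS")    # 3.6
print(f"Training:      {gpus * train_pflops / 1000:.2f} EFLOPS")  # 2.52
print(f"HBM capacity:  {gpus * hbm_gb / 1000:.1f} TB")            # 20.7
print(f"HBM bandwidth: {gpus * hbm_tbps / 1000:.2f} PB/s")        # ~1.6
```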
The three networking chips get less coverage, but they are what lets NVIDIA call the Rubin platform a supercomputer rather than just a dense GPU cluster.
ConnectX-9 SuperNIC: 1.6 Tb/s of scale-out network bandwidth per GPU, double the ConnectX-8's 800 Gb/s (see the comparison table below).
BlueField-4 DPU: offloads networking, storage, and security processing from the host CPUs, as in prior BlueField generations, keeping Vera and Rubin cycles on the workload.
Spectrum-6 Ethernet Switch: the Ethernet scale-out fabric for AI factories, and the target of the co-packaged optics partnership with Coherent announced at MWC (detailed below).
The hardware platform shipped alongside five open model families at CES 2026, which NVIDIA brought to MWC 2026 with specific telecom extensions. Each model family targets a distinct vertical:
| Model Family | Domain | Flagship Model |
|---|---|---|
| Nemotron | Agentic AI, language, speech | Nemotron Large Telco Model (30B) |
| Cosmos | Physical AI, robotics, simulation | Cosmos world foundation model |
| Alpamayo | Autonomous vehicles | Alpamayo 1 (VLA, open-source) |
| Isaac GR00T | Humanoid robotics | GR00T N1.6 (full-body control) |
| Clara | Biomedical, drug discovery | Protein/RNA structure prediction models |
Nemotron is the broadest family. Nemotron Speech delivers real-time, low-latency automatic speech recognition; Nemotron RAG includes multilingual and multimodal embedding and reranking models. At MWC 2026, NVIDIA and AdaptKey AI released the Nemotron Large Telco Model (LTM) — a 30-billion-parameter open model fine-tuned on telecom standards, synthetic logs, and fault isolation workflows. The model reasons through remediation plans for network faults and is designed for autonomous network operations centers.
Cosmos is NVIDIA's physical AI platform — a world foundation model trained on video, simulation data, and robotics trajectories. It serves as a backbone for training robot and vehicle models that need to reason about physical environments without billions of labeled real-world examples.
Alpamayo is the first open, large-scale reasoning vision-language-action (VLA) model for autonomous vehicle development. NVIDIA contributed one of the world's largest open autonomous vehicle datasets alongside it: 100 terabytes of vehicle sensor data covering diverse road conditions.
Isaac GR00T N1.6 is a reasoning VLA for humanoid robots, specifically addressing simultaneous locomotion and manipulation — the hardest unsolved problem in embodied robotics. The model is trained on 500,000 robot trajectories from NVIDIA's open dataset contribution.
Clara expands NVIDIA's biomedical platform with dedicated models for protein design, drug synthesis planning, safety testing, and RNA structure prediction. NVIDIA contributed 455,000 open protein structures to the research community alongside the Clara release.
MWC Barcelona 2026 (March 2–5) was where NVIDIA made the telecom and 6G layer of the Rubin platform explicit. The hardware story had landed at CES; MWC was about making the case that the same Rubin infrastructure powering frontier AI training would also power the radio access networks of the next decade.
The 6G AI-native coalition: NVIDIA secured commitments from more than a dozen global operators and vendors to build 6G on open, secure, AI-native platforms. Confirmed participants include BT Group, Deutsche Telekom, Ericsson, Nokia, SK Telecom, SoftBank, T-Mobile, Cisco, and Booz Allen Hamilton.
Jensen Huang's framing at MWC: "AI is redefining computing and driving the largest infrastructure buildout in human history — and telecommunications is next."
AI-RAN field results: T-Mobile US demonstrated concurrent AI workloads and RAN processing on NVIDIA's AI-RAN platform using Nokia's CUDA-accelerated RAN software stack — running video streaming, generative AI applications, and AI-powered captioning alongside live 5G traffic over the air. This is the first large-scale public demonstration of AI-RAN on production hardware.
Spectrum-X with co-packaged optics: NVIDIA and Coherent announced a strategic partnership at MWC 2026 to develop co-packaged optics technology for the Spectrum-6 switch. Co-packaged optics reduces the power and latency cost of long-distance signaling inside AI factories — an important cost driver at the scale of a 100,000-GPU cluster.
| Specification | Vera Rubin NVL72 | GB200 NVL72 (Blackwell) | H100 NVL8 (Hopper) |
|---|---|---|---|
| GPU count | 72 Rubin | 72 B200 | 8 H100 |
| CPU count | 36 Vera | 36 Grace | Host CPU |
| FP4 inference | 3.6 EFLOPS | ~720 PFLOPS* | N/A |
| HBM capacity (total) | 20.7 TB HBM4 | ~13.8 TB HBM3e | 640 GB HBM3 |
| HBM bandwidth | 1.6 PB/s | ~576 TB/s | ~26.8 TB/s |
| Scale-up fabric | NVLink 6 (3.6 TB/s) | NVLink 5 (1.8 TB/s) | NVLink 4 (900 GB/s) |
| Scale-out NIC | ConnectX-9 (1.6 Tb/s) | ConnectX-8 (800 Gb/s) | ConnectX-7 (400 Gb/s) |
| Process node | TSMC 3nm | TSMC 4nm | TSMC 4nm |
| Availability | H2 2026 | Now | Now |
| Cost per token vs prior gen | 10x lower (MoE inference) | ~30x lower vs H100 | Baseline |
*Blackwell figure shown is the rack's FP8 training throughput, included as a normalized point of scale rather than a native FP4 measurement.
The training efficiency claim deserves a closer look. NVIDIA states that training a MoE model on Rubin requires one-quarter the GPU count compared to Blackwell for the same job. The mechanism is not raw flops — it is memory capacity and bandwidth. MoE models have large parameter counts with sparse activation patterns. Fitting a larger fraction of the model in HBM without fragmentation across more GPUs reduces inter-GPU communication overhead, which is frequently the bottleneck in distributed training beyond a few thousand GPUs.
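A back-of-envelope version of the capacity argument, using a hypothetical 2-trillion-parameter MoE model stored at one byte per parameter and ignoring activations, KV cache, and optimizer state: Rubin's 288 GB per GPU shrinks the minimum shard count, and every shard removed is inter-GPU traffic that never happens. Capacity alone is only a 1.5x factor; the rest of the claimed 4x comes from the bandwidth jump (22 vs ~8 TB/s) and the communication it avoids.

```python
import math

params = 2e12                  # hypothetical 2T-parameter MoE model
weight_tb = params * 1 / 1e12  # 1 byte/param (FP8) => 2 TB of weights

for name, hbm_gb in (("Blackwell B200", 192), ("Rubin", 288)):
    min_gpus = math.ceil(weight_tb * 1000 / hbm_gb)
    print(f"{name}: >= {min_gpus} GPUs just to hold the weights")
```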
Rubin is in production as of March 2026. Volume shipments of Vera Rubin NVL72 systems begin in H2 2026 through the following channels:
Hyperscalers: AWS, Google Cloud, Microsoft Azure, Oracle Cloud Infrastructure
Specialist cloud providers: CoreWeave, Lambda, Nebius, Nscale
OEM partners: The full set of NVIDIA's DGX, HGX, and MGX system partners
For teams currently evaluating whether to build on Blackwell or wait for Rubin: the practical answer depends on your timeline. If you have a production workload today, Blackwell is the right choice — software improvements have added up to 1.4x higher training throughput since GB200's initial launch with no hardware change. If your deployment is H2 2026 or later, or if you are designing an AI factory with a 3-5 year planning horizon, Rubin changes the cost model dramatically at the MoE inference tier.
The 10x cost-per-token reduction for MoE inference is the most commercially significant number in the Rubin announcement. Frontier model inference is the fastest-growing cloud cost category for AI-native companies in 2026. A 10x reduction does not mean smaller GPU bills — it means the same GPU bill buys 10x more inference throughput, which changes the economics of agents, real-time personalization, and any use case that requires high-frequency model calls.
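To make the "same bill, 10x more throughput" point concrete, here is the arithmetic with an invented price point (only the 10x ratio comes from NVIDIA's claim):

```python
monthly_budget_usd = 100_000
blackwell_usd_per_m_tokens = 0.50                         # invented price
rubin_usd_per_m_tokens = blackwell_usd_per_m_tokens / 10  # claimed 10x

for name, price in (("Blackwell", blackwell_usd_per_m_tokens),
                    ("Rubin", rubin_usd_per_m_tokens)):
    # budget/price is in millions of tokens; /1e6 converts to trillions
    trillions = monthly_budget_usd / price / 1e6
    print(f"{name}: {trillions:.1f}T tokens/month at ${price:.2f}/1M tokens")
```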
NVIDIA's publicly stated roadmap beyond Rubin:
| Generation | Codename | Estimated Timeframe | Key Claim |
|---|---|---|---|
| Current | Blackwell (GB200) | Now — H2 2026 | 30x inference vs H100 |
| Next | Vera Rubin (NVL72) | H2 2026 | 5x inference vs Blackwell |
| Following | Rubin Ultra | 2027 | 2x Rubin (100 PFLOPS FP4) |
| Future | Feynman | Post-2027 | Not disclosed |
Rubin Ultra doubles the Rubin GPU's FP4 inference from 50 PFLOPS to 100 PFLOPS by adding a second GPU die to the superchip package. The NVL72 configuration scales accordingly. Feynman is named after physicist Richard Feynman; no architecture details have been disclosed.
The annual cadence NVIDIA committed to at GTC 2024 — one new architecture per year — remains intact through this roadmap.
Alongside the hardware, NVIDIA made one of the largest single contributions of open AI training data in 2026:
- 100 terabytes of autonomous vehicle sensor data, released with Alpamayo
- 500,000 robot trajectories, released with Isaac GR00T
- 455,000 open protein structures, released with Clara
The data contribution is not philanthropy — it is ecosystem lock-in of a different kind. If the dominant open robotics, autonomous vehicle, and biomedical models are trained on NVIDIA-curated datasets using NVIDIA simulation tools on NVIDIA hardware, the optimization path for those workloads runs directly through NVIDIA's stack. Researchers building on Cosmos, Isaac, or Clara start from a baseline that is already tuned for Rubin-class hardware.
What is the NVIDIA Rubin platform? Rubin is NVIDIA's next-generation AI computing platform, consisting of six co-designed chips: the Vera CPU, Rubin GPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU, and Spectrum-6 Ethernet Switch. Delivered as the Vera Rubin NVL72, a 72-GPU, 36-CPU rack-scale system, it succeeds the Blackwell platform and ships in volume in H2 2026.
How much faster is Rubin compared to Blackwell? NVIDIA claims 5x the FP4 inference throughput of a Blackwell GB200 NVL72 rack, 3.5x the training throughput for equivalent workloads, and 10x lower cost per token for MoE inference. For training MoE models specifically, Rubin requires one-quarter the number of GPUs to achieve the same throughput.
What is the Vera CPU and why does it matter? Vera is a custom 88-core Arm CPU with 1.8 TB/s of coherent bandwidth to the Rubin GPU via NVLink-C2C. Unlike standard host CPUs that sit behind PCIe, Vera shares coherent memory state with the GPU. This matters for agentic AI workloads where the CPU orchestrates tool calls, memory retrieval, and inference in tight loops — reducing round-trip latency changes end-to-end throughput.
What did NVIDIA announce at MWC 2026 specifically? At MWC Barcelona (March 2–5, 2026), NVIDIA focused on the telecom layer of the Rubin platform. Key announcements: a 6G AI-native coalition with 12+ global operators including Deutsche Telekom, Ericsson, Nokia, and T-Mobile; the Nemotron Large Telco Model (30B parameters, open source); live AI-RAN field results with T-Mobile on Nokia's CUDA-accelerated stack; and the Coherent partnership for co-packaged optics on Spectrum-6.
When will Rubin systems be available? Volume availability of Vera Rubin NVL72 systems from cloud providers and OEM partners begins H2 2026. AWS, Google Cloud, Microsoft Azure, Oracle Cloud Infrastructure, CoreWeave, Lambda, Nebius, and Nscale are confirmed launch partners.
Should I wait for Rubin instead of deploying on Blackwell? If your workload is live or planned before H2 2026, deploy on Blackwell now — software improvements have already added up to 1.4x training throughput at no hardware cost. If you are planning a new AI factory with a timeline of late 2026 or beyond, the Rubin cost model — particularly for MoE inference — is materially better and worth building toward.
What is the Nemotron Large Telco Model? A 30-billion-parameter open language model fine-tuned on telecom standards documents, synthetic network logs, and fault isolation workflows. Built by NVIDIA and AdaptKey AI on the Nemotron 3 foundation, it is designed to reason through network fault diagnosis and autonomous remediation planning for telecom NOCs.
What comes after Rubin? Rubin Ultra in 2027 doubles FP4 inference performance to 100 PFLOPS per GPU. After that, the Feynman architecture — timeline and specs not yet disclosed. NVIDIA's roadmap commits to one new architecture per year through at least the Feynman generation.