NVIDIA brings Rubin platform with six new chips to MWC 2026
NVIDIA puts the Rubin computing architecture center stage at MWC Barcelona, with six new chips spanning agentic AI, physical AI, autonomous vehicles, robotics, and biomedical research.
TL;DR: NVIDIA's Rubin platform headlined MWC Barcelona 2026: six new chips replacing Blackwell in H2 2026. The flagship GPU packs 336 billion transistors and delivers 50 PFLOPS of FP4 inference, and NVIDIA claims a 10x cost-per-token reduction over Blackwell for MoE inference. The full Vera Rubin NVL72 rack pushes 3.6 EFLOPS, with AWS, Google Cloud, Azure, and Oracle confirmed as launch partners.
336 billion transistors. 50 petaflops of FP4 inference. 10x lower cost per token than Blackwell. Five AI model families spanning six industries. One rack-scale supercomputer. This is what NVIDIA brought to MWC Barcelona 2026 — and it fundamentally changes the math on AI infrastructure for the next two years.
NVIDIA announced the Rubin computing platform at CES 2026 in January and placed it front and center at MWC Barcelona in early March, pairing the hardware story with a targeted telecom and 6G narrative. The framing matters: when NVIDIA says "six new chips," it does not mean six GPU variants. It means six distinct silicon components that only reach their advertised performance figures when deployed as one co-designed system.
The six chips:
- Vera CPU — custom Arm processor for agentic orchestration
- Rubin GPU — the dual-die compute engine
- NVLink 6 Switch — the scale-up fabric
- ConnectX-9 SuperNIC — the scale-out NIC
- BlueField-4 DPU — infrastructure offload
- Spectrum-6 Ethernet Switch — the AI-factory Ethernet fabric
All six ship together in the flagship Vera Rubin NVL72 configuration: 72 Rubin GPUs paired with 36 Vera CPUs in a single rack. Partner systems from the full range of OEMs and cloud providers ship in H2 2026.
The Vera CPU is NVIDIA's first CPU built on fully custom Arm cores (Grace used off-the-shelf Neoverse designs), purpose-built to run agentic reasoning pipelines alongside GPU inference rather than serving as a separate general-purpose host processor.
Key specifications:
- 88 custom "Olympus" Arm cores
- 1.8 TB/s of coherent CPU-to-GPU bandwidth via NVLink-C2C
- 36 Vera CPUs per NVL72 rack, one per two Rubin GPUs
The Olympus core design diverges from standard Neoverse cores. NVIDIA has not released a full microarchitecture paper, but the emphasis on coherent, low-latency GPU attachment via NVLink-C2C positions Vera as a CPU that shares memory state with the GPU rather than operating at arm's length across PCIe. For agentic workloads — where a reasoning model must call tools, retrieve context, and re-enter inference repeatedly — reducing CPU-GPU round-trip latency is a material performance lever.
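To see why that matters, picture the shape of an agentic task: dozens of short inference steps interleaved with CPU-side tool calls and retrievals, so the CPU-GPU handoff cost is paid on every hop. A schematic sketch with invented latencies (nothing below is an NVIDIA API; the numbers exist only to show how per-hop overhead compounds):

```python
# Schematic agentic loop: short GPU inference steps interleaved with
# CPU-side tool calls. All latencies are invented for illustration.
HOPS_PER_TASK = 40  # tool calls + retrievals in one agent trajectory

def task_latency_ms(hop_ms: float, infer_ms: float = 5.0,
                    tool_ms: float = 2.0) -> float:
    # Each hop costs one GPU inference step, a handoff to the CPU,
    # the tool work, and a handoff back. The hop latency is paid twice.
    return HOPS_PER_TASK * (infer_ms + tool_ms + 2 * hop_ms)

for name, hop_ms in (("PCIe-attached host CPU", 2.0),
                     ("coherent NVLink-C2C", 0.1)):
    print(f"{name}: {task_latency_ms(hop_ms):.0f} ms per agent task")
```

The shorter each inference step gets, the larger the share of wall-clock time the handoffs consume, which is exactly the regime agentic workloads live in.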
The Rubin GPU is a dual-die package built on TSMC's 3nm node. The transistor count of 336 billion compares to 208 billion in the Blackwell B200 — a 62% increase, absorbed largely by the expanded Tensor Core matrix, the new Transformer Engine, and the on-chip memory subsystem.
Compute performance:
| Metric | Rubin GPU | Blackwell GB200 | Hopper H100 |
|---|---|---|---|
| FP4 inference (PFLOPS) | 50 | 10 | — |
| FP8 training (PFLOPS) | 35 | ~10 | ~4 |
| HBM capacity | 288 GB HBM4 | 192 GB HBM3e | 80 GB HBM3 |
| Memory bandwidth | 22 TB/s | ~8 TB/s | 3.35 TB/s |
| Transistors | 336B | 208B | 80B |
| Process node | TSMC 3nm | TSMC 4nm | TSMC 4nm |
The 3rd-generation Transformer Engine inside Rubin supports hardware-accelerated adaptive precision, switching dynamically between NVFP4 and FP8 depending on layer sensitivity. NVIDIA's published claim for this feature is narrow but meaningful: it avoids the accuracy loss that would otherwise force an FP16 fallback on select attention heads.
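The selection logic can be pictured as a per-layer policy: quantize with the cheapest format whose measured error stays under a tolerance, and fall back to FP16 only where it does not. A minimal sketch of that idea (not NVIDIA's hardware implementation; the grid sizes and threshold are hypothetical stand-ins):

```python
import numpy as np

def quantize(x: np.ndarray, levels: int) -> np.ndarray:
    """Crude stand-in for block-scaled quantization: normalize to the
    tensor's max, then round onto a grid with `levels` steps."""
    scale = float(np.abs(x).max()) or 1.0
    return np.round(x / scale * levels) / levels * scale

def choose_precision(w: np.ndarray, tol: float = 0.25) -> str:
    """Pick the cheapest format whose relative error stays under `tol`.
    Grid sizes are rough proxies: 'FP4' ~ 8 levels, 'FP8' ~ 128."""
    for fmt, levels in (("NVFP4", 8), ("FP8", 128)):
        err = np.abs(w - quantize(w, levels)).mean() / np.abs(w).mean()
        if err < tol:
            return fmt
    return "FP16"  # fallback for precision-sensitive layers

rng = np.random.default_rng(0)
layers = {
    "mlp_up": rng.normal(0, 0.02, 4096),        # well-behaved weights
    "attn_qk": rng.normal(0, 0.02, 4096) ** 3,  # heavy-tailed, sensitive
}
for name, w in layers.items():
    print(f"{name}: {choose_precision(w)}")
```

Run on these toy tensors, the well-behaved layer lands on NVFP4 while the heavy-tailed one is pushed up to FP8, which is the behavior the Transformer Engine performs in hardware per layer.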
The 224 Streaming Multiprocessors carry 5th-generation Tensor Cores. The SFU (Special Function Unit) count has expanded alongside new execution pipelines for sparse attention and activation functions — directly targeting the compute patterns of modern MoE (Mixture-of-Experts) models, which now dominate frontier AI training runs.
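For intuition on the MoE pattern those pipelines target: a router sends each token to only the top-k of N experts, so most of the layer's parameters stay resident in memory but untouched for any given token. A toy version with hypothetical sizes, in plain NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 1024, 64, 2

# Large total parameter count, sparsely activated per token.
experts = [rng.normal(0, 0.02, (d_model, d_model)) for _ in range(n_experts)]
router = rng.normal(0, 0.02, (d_model, n_experts))

def moe_layer(token: np.ndarray) -> np.ndarray:
    scores = token @ router                # route the token
    chosen = np.argsort(scores)[-top_k:]   # keep only the top-k experts
    gates = np.exp(scores[chosen])
    gates /= gates.sum()                   # softmax over the chosen experts
    # Only 2 of 64 expert matmuls execute: ~3% of parameters touched.
    return sum(g * (token @ experts[i]) for g, i in zip(gates, chosen))

print(moe_layer(rng.normal(0, 1, d_model)).shape)  # (1024,)
```

Compute per token is tiny relative to total parameters, which is why memory capacity and bandwidth, not raw FLOPS, dominate MoE performance.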
NVLink 6 is the interconnect that makes the Vera Rubin NVL72 function as one logical compute unit rather than 72 independent GPUs sharing a network. The NVLink 6 switch delivers 3.6 TB/s of bidirectional GPU-to-GPU bandwidth, double the 1.8 TB/s of the fifth-generation NVLink in Blackwell.
At the full NVL72 rack scale:
- 3.6 EFLOPS of NVFP4 inference
- ~2.5 EFLOPS of training throughput
- 20.7 TB of HBM4 with 1.6 PB/s of aggregate bandwidth
- 3.6 TB/s of NVLink 6 bandwidth per GPU, across all 72 GPUs
For context: the GB200 NVL72 Blackwell rack delivers approximately 720 PFLOPS of FP8 training. Rubin NVL72 at 2.5 EFLOPS in NVFP4 is not directly apples-to-apples, but NVIDIA's own claim is 3.5x better training throughput and 5x better inference throughput versus Blackwell when normalized to identical workloads.
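The rack figures are straightforward multiples of the per-GPU numbers in the specification table above; a quick sanity check:

```python
gpus = 72  # Rubin GPUs per Vera Rubin NVL72 rack

fp4_pflops, train_pflops = 50, 35  # per-GPU figures from the table above
hbm_gb, hbm_tbps = 288, 22

print(f"FP4 inference: {gpus * fp4_pflops / 1000:.1f} EFLOPS")    # 3.6
print(f"Training:      {gpus * train_pflops / 1000:.2f} EFLOPS")  # 2.52
print(f"HBM capacity:  {gpus * hbm_gb / 1000:.1f} TB")            # 20.7
print(f"HBM bandwidth: {gpus * hbm_tbps / 1000:.2f} PB/s")        # ~1.6
```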
The three networking chips get less coverage, but they are what lets NVIDIA call the Rubin platform a supercomputer rather than just a dense GPU cluster.
ConnectX-9 SuperNIC: 1.6 Tb/s of scale-out network bandwidth per GPU, double the ConnectX-8's 800 Gb/s (see the comparison table below).
BlueField-4 DPU: offloads networking, storage, and security processing from the host CPUs, as in prior BlueField generations, keeping Vera and Rubin cycles on the workload.
Spectrum-6 Ethernet Switch: the Ethernet scale-out fabric for AI factories, and the target of the co-packaged optics partnership with Coherent announced at MWC (detailed below).
The hardware platform shipped alongside five open model families at CES 2026, which NVIDIA brought to MWC 2026 with specific telecom extensions. Each model family targets a distinct vertical:
| Model Family | Domain | Flagship Model |
|---|---|---|
| Nemotron | Agentic AI, language, speech | Nemotron Large Telco Model (30B) |
| Cosmos | Physical AI, robotics, simulation | Cosmos world foundation model |
| Alpamayo | Autonomous vehicles | Alpamayo 1 (VLA, open-source) |
| Isaac GR00T | Humanoid robotics | GR00T N1.6 (full-body control) |
| Clara | Biomedical, drug discovery | Protein/RNA structure prediction models |
Nemotron is the broadest family. Nemotron Speech delivers real-time, low-latency automatic speech recognition; Nemotron RAG includes multilingual and multimodal embedding and reranking models. At MWC 2026, NVIDIA and AdaptKey AI released the Nemotron Large Telco Model (LTM) — a 30-billion-parameter open model fine-tuned on telecom standards, synthetic logs, and fault isolation workflows. The model reasons through remediation plans for network faults and is designed for autonomous network operations centers.
Cosmos is NVIDIA's physical AI platform — a world foundation model trained on video, simulation data, and robotics trajectories. It serves as a backbone for training robot and vehicle models that need to reason about physical environments without billions of labeled real-world examples.
Alpamayo is the first open, large-scale reasoning vision-language-action (VLA) model for autonomous vehicle development. NVIDIA contributed one of the world's largest open autonomous vehicle datasets alongside it: 100 terabytes of vehicle sensor data covering diverse road conditions.
Isaac GR00T N1.6 is a reasoning VLA for humanoid robots, specifically addressing simultaneous locomotion and manipulation — the hardest unsolved problem in embodied robotics. The model is trained on 500,000 robot trajectories from NVIDIA's open dataset contribution.
Clara expands NVIDIA's biomedical platform with dedicated models for protein design, drug synthesis planning, safety testing, and RNA structure prediction. NVIDIA contributed 455,000 open protein structures to the research community alongside the Clara release.
MWC Barcelona 2026 (March 2–5) was where NVIDIA made the telecom and 6G layer of the Rubin platform explicit. The hardware story had landed at CES; MWC was about making the case that the same Rubin infrastructure powering frontier AI training would also power the radio access networks of the next decade.
The 6G AI-native coalition: NVIDIA secured commitments from more than a dozen global operators and vendors to build 6G on open, secure, AI-native platforms. Confirmed participants include BT Group, Deutsche Telekom, Ericsson, Nokia, SK Telecom, SoftBank, T-Mobile, Cisco, and Booz Allen Hamilton.
Jensen Huang's framing at MWC: "AI is redefining computing and driving the largest infrastructure buildout in human history — and telecommunications is next."
AI-RAN field results: T-Mobile US demonstrated concurrent AI workloads and RAN processing on NVIDIA's AI-RAN platform using Nokia's CUDA-accelerated RAN software stack — running video streaming, generative AI applications, and AI-powered captioning alongside live 5G traffic over the air. This is the first large-scale public demonstration of AI-RAN on production hardware.
Spectrum-X with co-packaged optics: NVIDIA and Coherent announced a strategic partnership at MWC 2026 to develop co-packaged optics technology for the Spectrum-6 switch. Co-packaged optics reduces the power and latency cost of long-distance signaling inside AI factories — an important cost driver at the scale of a 100,000-GPU cluster.
| Specification | Vera Rubin NVL72 | GB200 NVL72 (Blackwell) | H100 NVL8 (Hopper) |
|---|---|---|---|
| GPU count | 72 Rubin | 72 B200 | 8 H100 |
| CPU count | 36 Vera | 36 Grace | Host CPU |
| FP4 inference | 3.6 EFLOPS | ~720 PFLOPS* | N/A |
| HBM capacity (total) | 20.7 TB HBM4 | ~13.8 TB HBM3e | 640 GB HBM3 |
| HBM bandwidth | 1.6 PB/s | ~576 TB/s | ~26.8 TB/s |
| Scale-up fabric | NVLink 6 (3.6 TB/s) | NVLink 5 (1.8 TB/s) | NVLink 4 (900 GB/s) |
| Scale-out NIC | ConnectX-9 (1.6 Tb/s) | ConnectX-8 (800 Gb/s) | ConnectX-7 (400 Gb/s) |
| Process node | TSMC 3nm | TSMC 4nm | TSMC 4nm |
| Availability | H2 2026 | Now | Now |
| Cost per token vs prior gen | 10x lower (MoE inference) | ~30x lower vs H100 | Baseline |
*Blackwell figure shown is the rack's FP8 training throughput, included as a normalized point of scale rather than a native FP4 measurement.
The training efficiency claim deserves a closer look. NVIDIA states that training a MoE model on Rubin requires one-quarter the GPU count compared to Blackwell for the same job. The mechanism is not raw flops — it is memory capacity and bandwidth. MoE models have large parameter counts with sparse activation patterns. Fitting a larger fraction of the model in HBM without fragmentation across more GPUs reduces inter-GPU communication overhead, which is frequently the bottleneck in distributed training beyond a few thousand GPUs.
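A back-of-envelope version of the capacity argument, using a hypothetical 2-trillion-parameter MoE model stored at one byte per parameter and ignoring activations, KV cache, and optimizer state: Rubin's 288 GB per GPU shrinks the minimum shard count, and every shard removed is inter-GPU traffic that never happens. Capacity alone is only a 1.5x factor; the rest of the claimed 4x comes from the bandwidth jump (22 vs ~8 TB/s) and the communication it avoids.

```python
import math

params = 2e12                  # hypothetical 2T-parameter MoE model
weight_tb = params * 1 / 1e12  # 1 byte/param (FP8) => 2 TB of weights

for name, hbm_gb in (("Blackwell B200", 192), ("Rubin", 288)):
    min_gpus = math.ceil(weight_tb * 1000 / hbm_gb)
    print(f"{name}: >= {min_gpus} GPUs just to hold the weights")
```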
Rubin is in production as of March 2026. Volume shipments of Vera Rubin NVL72 systems begin in H2 2026 through the following channels:
Hyperscalers: AWS, Google Cloud, Microsoft Azure, Oracle Cloud Infrastructure
Specialist cloud providers: CoreWeave, Lambda, Nebius, Nscale
OEM partners: The full set of NVIDIA's DGX, HGX, and MGX system partners
For teams currently evaluating whether to build on Blackwell or wait for Rubin: the practical answer depends on your timeline. If you have a production workload today, Blackwell is the right choice — software improvements have added up to 1.4x higher training throughput since GB200's initial launch with no hardware change. If your deployment is H2 2026 or later, or if you are designing an AI factory with a 3-5 year planning horizon, Rubin changes the cost model dramatically at the MoE inference tier.
The 10x cost-per-token reduction for MoE inference is the most commercially significant number in the Rubin announcement. Frontier model inference is the fastest-growing cloud cost category for AI-native companies in 2026. A 10x reduction does not mean smaller GPU bills — it means the same GPU bill buys 10x more inference throughput, which changes the economics of agents, real-time personalization, and any use case that requires high-frequency model calls.
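To make the "same bill, 10x more throughput" point concrete, here is the arithmetic with an invented price point (only the 10x ratio comes from NVIDIA's claim):

```python
monthly_budget_usd = 100_000
blackwell_usd_per_m_tokens = 0.50                         # invented price
rubin_usd_per_m_tokens = blackwell_usd_per_m_tokens / 10  # claimed 10x

for name, price in (("Blackwell", blackwell_usd_per_m_tokens),
                    ("Rubin", rubin_usd_per_m_tokens)):
    # budget/price is in millions of tokens; /1e6 converts to trillions
    trillions = monthly_budget_usd / price / 1e6
    print(f"{name}: {trillions:.1f}T tokens/month at ${price:.2f}/1M tokens")
```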
NVIDIA's publicly stated roadmap beyond Rubin:
| Generation | Codename | Estimated Timeframe | Key Claim |
|---|---|---|---|
| Current | Blackwell (GB200) | Now — H2 2026 | 30x inference vs H100 |
| Next | Vera Rubin (NVL72) | H2 2026 | 5x inference vs Blackwell |
| Following | Rubin Ultra | 2027 | 2x Rubin (100 PFLOPS FP4) |
| Future | Feynman | Post-2027 | Not disclosed |
Rubin Ultra doubles the Rubin GPU's FP4 inference from 50 PFLOPS to 100 PFLOPS by adding a second GPU die to the superchip package. The NVL72 configuration scales accordingly. Feynman is named after physicist Richard Feynman; no architecture details have been disclosed.
The annual cadence NVIDIA committed to at GTC 2024 — one new architecture per year — remains intact through this roadmap.
Alongside the hardware, NVIDIA made one of the largest single contributions of open AI training data in 2026:
- 100 terabytes of autonomous vehicle sensor data, released with Alpamayo
- 500,000 robot trajectories, released with Isaac GR00T
- 455,000 open protein structures, released with Clara
The data contribution is not philanthropy — it is ecosystem lock-in of a different kind. If the dominant open robotics, autonomous vehicle, and biomedical models are trained on NVIDIA-curated datasets using NVIDIA simulation tools on NVIDIA hardware, the optimization path for those workloads runs directly through NVIDIA's stack. Researchers building on Cosmos, Isaac, or Clara start from a baseline that is already tuned for Rubin-class hardware.
What is the NVIDIA Rubin platform? Rubin is NVIDIA's next-generation AI computing platform, consisting of six co-designed chips: the Vera CPU, Rubin GPU, NVLink 6 Switch, ConnectX-9 SuperNIC, BlueField-4 DPU, and Spectrum-6 Ethernet Switch. Delivered as the Vera Rubin NVL72, a 72-GPU, 36-CPU rack-scale system, it succeeds the Blackwell platform and ships in volume in H2 2026.
How much faster is Rubin compared to Blackwell? NVIDIA claims 5x the FP4 inference throughput of a Blackwell GB200 NVL72 rack, 3.5x the training throughput for equivalent workloads, and 10x lower cost per token for MoE inference. For training MoE models specifically, Rubin requires one-quarter the number of GPUs to achieve the same throughput.
What is the Vera CPU and why does it matter? Vera is a custom 88-core Arm CPU with 1.8 TB/s of coherent bandwidth to the Rubin GPU via NVLink-C2C. Unlike standard host CPUs that sit behind PCIe, Vera shares coherent memory state with the GPU. This matters for agentic AI workloads where the CPU orchestrates tool calls, memory retrieval, and inference in tight loops — reducing round-trip latency changes end-to-end throughput.
What did NVIDIA announce at MWC 2026 specifically? At MWC Barcelona (March 2–5, 2026), NVIDIA focused on the telecom layer of the Rubin platform. Key announcements: a 6G AI-native coalition with 12+ global operators including Deutsche Telekom, Ericsson, Nokia, and T-Mobile; the Nemotron Large Telco Model (30B parameters, open source); live AI-RAN field results with T-Mobile on Nokia's CUDA-accelerated stack; and the Coherent partnership for co-packaged optics on Spectrum-6.
When will Rubin systems be available? Volume availability of Vera Rubin NVL72 systems from cloud providers and OEM partners begins H2 2026. AWS, Google Cloud, Microsoft Azure, Oracle Cloud Infrastructure, CoreWeave, Lambda, Nebius, and Nscale are confirmed launch partners.
Should I wait for Rubin instead of deploying on Blackwell? If your workload is live or planned before H2 2026, deploy on Blackwell now — software improvements have already added up to 1.4x training throughput at no hardware cost. If you are planning a new AI factory with a timeline of late 2026 or beyond, the Rubin cost model — particularly for MoE inference — is materially better and worth building toward.
What is the Nemotron Large Telco Model? A 30-billion-parameter open language model fine-tuned on telecom standards documents, synthetic network logs, and fault isolation workflows. Built by NVIDIA and AdaptKey AI on the Nemotron 3 foundation, it is designed to reason through network fault diagnosis and autonomous remediation planning for telecom NOCs.
What comes after Rubin? Rubin Ultra in 2027 doubles FP4 inference performance to 100 PFLOPS per GPU. After that, the Feynman architecture — timeline and specs not yet disclosed. NVIDIA's roadmap commits to one new architecture per year through at least the Feynman generation.