TL;DR: At GTC 2026, NVIDIA unveiled GR00T N2 — a next-generation humanoid robot foundation model that completes unfamiliar tasks twice as often as leading vision-language-action models. Alongside it came Cosmos 3 (a unified world simulator), the Newton physics engine, and Isaac Lab 3.0. Disney debuted its Olaf robot built on NVIDIA's stack, with Boston Dynamics, Caterpillar, and Franka Robotics among the launch partners. The underlying thesis: swap the robotics field's chronic data shortage for a compute-intensive simulation problem that NVIDIA's GPUs are purpose-built to solve.
Table of contents
- GR00T N2: what changed from N1
- DreamZero: the research behind the 2x improvement
- Cosmos 3: world simulation meets visual reasoning
- Newton physics engine: GPU-accelerated simulation at scale
- Disney's Olaf robot and the Kamino simulator
- Isaac Lab 3.0 and DGX infrastructure for robot training
- Industry partners: Boston Dynamics, Caterpillar, and more
- Sim-to-real transfer: why it is still the hard part
- Competitive landscape: Tesla Optimus and Figure AI
- What this means for the robotics industry
- 15 frequently asked questions
GR00T N2: what changed from N1
NVIDIA's GR00T (Generalist Robot 00 Technology) project has been building toward a single ambition since its introduction at GTC 2024: a foundation model for humanoid robots that generalizes across hardware, tasks, and environments the way large language models generalize across text.
GR00T N1.5, released earlier this cycle, offered early access with a commercial license and introduced generalized dexterity skills — the ability for robots to manipulate unfamiliar objects without retraining from scratch. That was the capability demo. GR00T N2 is the production step.
The headline benchmark is a 2x improvement in task completion rate in unfamiliar environments compared to leading vision-language-action (VLA) models. The comparison class matters here. VLA models — which map visual input and language instructions directly to robot actions — are the current competitive baseline. They are what most serious robotics teams are building on in 2026. Doubling their out-of-distribution task success rate is not an incremental improvement; it represents a shift in what robots can plausibly be deployed to do without exhaustive environment-specific training data.
GR00T N2 is built around a dual-system architecture. A slow reasoning system processes high-level scene understanding and instruction following. A fast reactive system handles low-latency motor control. The two systems communicate through a learned interface that allows the reasoning layer to hand off fine-grained execution without becoming a bottleneck. This mirrors the architecture proposed in several recent academic papers on robot foundation models, but GR00T N2 is the first commercially licensed implementation at this scale.
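NVIDIA has not published GR00T N2's internals, but the dual-system pattern itself is easy to sketch: a slow planner that refreshes a latent goal infrequently, and a fast controller that tracks the latest goal on every tick. Everything below (the class names, the toy proportional-control dynamics) is illustrative, not NVIDIA's API:

```python
class SlowReasoner:
    """High-level system: runs at low frequency, emits a latent goal."""
    def plan(self, observation):
        # Stand-in for scene understanding and instruction following.
        return {"target": observation["object_pos"]}

class FastController:
    """Low-level system: runs every tick, tracks the latest latent goal."""
    def act(self, state, latent_goal):
        # Toy proportional controller toward the target (illustrative only).
        return [0.1 * (g - s) for s, g in zip(state, latent_goal["target"])]

def control_loop(steps=30, slow_period=10):
    reasoner, controller = SlowReasoner(), FastController()
    state = [0.0, 0.0]
    obs = {"object_pos": [1.0, -1.0]}
    goal = reasoner.plan(obs)
    for t in range(steps):
        if t % slow_period == 0:              # reasoning layer updates infrequently
            goal = reasoner.plan(obs)
        action = controller.act(state, goal)  # reactive layer runs every tick
        state = [s + a for s, a in zip(state, action)]
    return state
```

The key property this sketch preserves is the one the architecture description emphasizes: the fast loop never waits on the slow loop, so reasoning latency cannot stall motor control.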
The model supports a wide range of humanoid hardware. NVIDIA has not locked it to a proprietary robot platform — a deliberate strategic choice that positions GR00T N2 as infrastructure rather than a product, and lets NVIDIA capture value at the training compute and developer tooling layer.
Sources: NVIDIA newsroom, GlobeNewswire
DreamZero: the research behind the 2x improvement
GR00T N2's performance gains trace back to a research project called DreamZero. The core insight in DreamZero is that robots, like language models, can learn from synthetic experience — but only if the synthetic experience is physically coherent enough that the learned behaviors transfer to the real world.
Prior synthetic training approaches for robotics suffered from a well-documented problem: the simulator was good enough to produce plausible-looking trajectories but not good enough to produce physically accurate contact dynamics, material properties, or lighting variation. Models trained on this data would fail in deployment because the real world looked and felt different in ways that mattered for control.
DreamZero addresses this with a two-part approach. First, it uses world model pretraining — training a generative model to predict realistic future states of physical environments, not just visually plausible ones. Second, it uses the pretrained world model as a data engine: sample robot behaviors, evaluate them inside the world model, filter for the ones that succeed, and use those as training signal for the action policy.
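The sample, evaluate, filter loop can be sketched in a few lines. Here the "world model" is a trivial stand-in (exact 1-D kinematics) so the example runs anywhere; in DreamZero the rollout would go through a learned generative model, and the goal, tolerance, and sample count below are arbitrary illustrative values:

```python
import random

def world_model_rollout(action_seq, start=0.0):
    """Stand-in for a learned world model: predict the final state
    reached by an action sequence (here, exact 1-D kinematics)."""
    pos = start
    for a in action_seq:
        pos += a
    return pos

def generate_filtered_dataset(n_samples=2000, goal=2.0, tol=0.5, seed=0):
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_samples):
        # 1. Sample a candidate behavior (here: a random action sequence).
        actions = [rng.uniform(-1, 1) for _ in range(10)]
        # 2. Evaluate it inside the world model.
        final = world_model_rollout(actions)
        # 3. Keep only behaviors the world model predicts will succeed...
        if abs(final - goal) < tol:
            # 4. ...and use them as training signal for the action policy.
            dataset.append((actions, final))
    return dataset
```

The filtering step is what turns a generative model into a data engine: most sampled behaviors fail, but the survivors form a supervision set far larger and more varied than real-world collection could produce.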
The result is a policy that has been implicitly trained on a vastly larger and more diverse set of physically grounded scenarios than any real-world data collection effort could produce. The 2x task completion improvement is, in this framing, primarily a function of coverage: GR00T N2 has seen more of the distribution of possible situations than its predecessors, because the world model can generate that distribution on demand.
Cosmos 3: world simulation meets visual reasoning
Cosmos 3 is NVIDIA's third-generation world foundation model, and it represents the most significant architectural expansion since the project began. Previous Cosmos versions focused primarily on synthetic data generation — producing photorealistic video of physical environments that robots could train on. Cosmos 3 adds two capabilities on top of that base: visual reasoning and action simulation.
Visual reasoning means the model can answer questions about a scene — what objects are present, how they relate spatially, what actions are likely to change their configuration — rather than just rendering plausible futures. This makes Cosmos 3 useful not only as a training data generator but as a component in a robot's inference pipeline, where it can serve as a scene understanding module or a plan verification system.
Action simulation means the model can take a proposed robot action sequence and predict whether it will succeed, before the robot attempts it in the real world. This is valuable at deployment time: a robot encountering a novel task can simulate several candidate plans inside Cosmos 3, rank them by predicted success probability, and execute only the most promising one.
The combination of these three capabilities — synthetic world generation, visual reasoning, and action simulation — positions Cosmos 3 as a unified platform for the full lifecycle of robot intelligence, from training data creation through deployment-time inference.
Newton physics engine: GPU-accelerated simulation at scale
Alongside Cosmos 3, NVIDIA announced Newton, a new GPU-accelerated physics engine designed specifically for robot simulation. Newton is built around NVIDIA's existing simulation technology but adds multiphysics support — the ability to simulate not just rigid body dynamics but also soft bodies, fluids, granular materials, and their interactions.
The new NVIDIA PhysX SDK that underpins Newton provides dexterous manipulation simulation that is physically accurate enough to generate training data for fine-grained tasks like cable routing, fabric folding, and liquid handling. These are exactly the categories where current robot foundation models perform worst, because the real-world physics is complex and prior simulators approximated it too coarsely.
Newton also supports simulation at scale. A single DGX system can run thousands of physics simulation instances in parallel, which is necessary for reinforcement learning approaches that require millions of environment interactions to converge. This is the infrastructure side of the "swapping the data problem for a compute problem" framing that has become NVIDIA's core pitch for physical AI.
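The scaling pattern here is the familiar batched-environment one: every simulation instance advances in lockstep through one vectorized call, which is exactly the shape of work GPUs excel at. A CPU sketch using NumPy, with point-mass dynamics standing in for real physics (the environment counts and dynamics are illustrative, not Newton's API):

```python
import numpy as np

def step_batch(positions, velocities, actions, dt=0.01):
    """Advance N independent point-mass environments in one vectorized call.
    In a GPU engine, each row would map to a parallel simulation instance."""
    velocities = velocities + actions * dt
    positions = positions + velocities * dt
    return positions, velocities

def rollout(n_envs=4096, n_steps=100, seed=0):
    rng = np.random.default_rng(seed)
    pos = np.zeros((n_envs, 3))
    vel = np.zeros((n_envs, 3))
    for _ in range(n_steps):
        actions = rng.standard_normal((n_envs, 3))  # placeholder policy
        pos, vel = step_batch(pos, vel, actions)
    return pos
```

Because the per-step cost is one batched operation rather than N sequential ones, adding more environments is nearly free up to the hardware's parallelism limit, which is why reinforcement learning runs needing millions of interactions become tractable.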
Disney's Olaf robot and the Kamino simulator
The most visible demonstration at GTC 2026 was Disney's Olaf robot — an animatronic character robot built on NVIDIA's physical AI stack. Olaf is designed for theme park interaction: it can navigate crowds, recognize guests, respond to natural language, and express the full range of the Frozen character's physical mannerisms.
Disney built its Kamino simulator — the internal simulation environment used to train and validate Olaf's behaviors — on top of NVIDIA's Warp framework, a GPU-accelerated Python framework for differentiable simulation. Kamino is now integrated into Newton, which means Disney's simulation infrastructure and NVIDIA's training infrastructure share a common physics layer.
This matters because it eliminates a major source of sim-to-real error: when the simulator used for training and the simulator used for testing are different, behaviors that work in one may not transfer to the other. By building Kamino on Warp and integrating it into Newton, Disney and NVIDIA have created a unified simulation environment from data generation through final deployment validation.
Olaf debuts at Disneyland Paris on March 29, 2026. The deployment is notable not just as a novelty but as a real-world stress test: a theme park environment — uncontrolled crowds, varied lighting, unpredictable guest behavior, continuous multi-hour operation — is one of the most demanding deployment contexts for a character robot. If Olaf performs reliably, it will serve as a strong validation of the entire physical AI stack.
Isaac Lab 3.0 and DGX infrastructure for robot training
Isaac Lab 3.0, available in early access, is NVIDIA's platform for large-scale robot learning on DGX infrastructure. The 3.0 release brings three main improvements over earlier versions.
First, training throughput. Isaac Lab 3.0 can run robot learning experiments at significantly larger scale than previous versions, taking advantage of multi-node DGX configurations and the parallelism improvements in Newton. For reference, training a capable generalist manipulation policy previously required weeks of simulation time on a large cluster. Isaac Lab 3.0 targets a reduction in that timeline to days.
Second, curriculum support. The platform now includes built-in tools for designing and managing training curricula — sequences of increasingly complex tasks that guide a robot policy from basic competence to expert-level performance. This is important because training directly on hard tasks is sample-inefficient; curricula allow the policy to build up capability incrementally, which reduces total compute needed.
Third, integration with GR00T N2. Isaac Lab 3.0 is designed to fine-tune and adapt GR00T N2 foundation models for specific hardware and deployment contexts. The workflow is: start from GR00T N2 as a base, run targeted simulation training in Isaac Lab 3.0 to adapt the model to your specific robot and task distribution, then deploy. This mirrors the fine-tuning paradigm that made large language models accessible to application developers, applied to robotics.
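The curriculum idea from the second point can be sketched as a simple scheduler that promotes the policy to a harder task once its recent success rate clears a threshold. The task names, window size, and threshold below are illustrative, not Isaac Lab's API:

```python
class Curriculum:
    """Promote to the next task once the recent success rate on the
    current task clears a threshold (illustrative sketch)."""
    def __init__(self, tasks, threshold=0.8, window=100):
        self.tasks, self.threshold, self.window = tasks, threshold, window
        self.level, self.results = 0, []

    def current_task(self):
        return self.tasks[self.level]

    def record(self, success):
        self.results.append(success)
        recent = self.results[-self.window:]
        if (len(recent) == self.window
                and sum(recent) / self.window >= self.threshold
                and self.level < len(self.tasks) - 1):
            self.level += 1
            self.results = []   # reset statistics for the new task
```

Used as `Curriculum(["reach", "grasp", "stack"])`, the policy trains on "reach" until it succeeds reliably, then graduates to "grasp", and so on, which is the sample-efficiency argument in miniature: easy tasks supply dense learning signal that hard tasks initially lack.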
Industry partners: Boston Dynamics, Caterpillar, and more
NVIDIA announced a roster of launch partners for the physical AI stack that spans humanoid robotics, industrial equipment, consumer electronics, and collaborative robotics.
Boston Dynamics is integrating GR00T N2 into its development pipeline. Boston Dynamics is the most credible robotics brand in the market — its Atlas humanoid and Spot quadruped are the benchmark platforms that other teams compare against. The partnership signals that even a company with deep proprietary robotics IP sees value in building on NVIDIA's foundation model layer rather than developing entirely in-house.
Caterpillar's involvement represents the industrial automation segment. Heavy equipment automation — construction sites, mining operations, agriculture — involves physical environments that are even more variable and less controlled than factory floors. GR00T N2's generalization capability is directly relevant to this use case, and Caterpillar's data and deployment scale would provide valuable feedback to NVIDIA's training pipeline.
Franka Robotics brings collaborative robot (cobot) expertise. Franka's arms are among the most widely used research and light industrial platforms in the world, which means GR00T N2 will be validated across a large and diverse installed base.
LG Electronics and NEURA Robotics round out the partnership list, representing consumer and service robotics respectively. The breadth of the partner ecosystem is deliberate: NVIDIA needs real-world deployment feedback across many hardware configurations and task domains to continue improving GR00T N2.
Sources: GlobeNewswire
Sim-to-real transfer: why it is still the hard part
The entire NVIDIA physical AI stack is, at its core, an attempt to solve the sim-to-real transfer problem. This is worth understanding clearly: it is the central technical challenge in robotics, and it is why NVIDIA's approach of swapping data collection for simulation compute is a substantial improvement rather than a complete solution.
Sim-to-real transfer fails when the simulator makes assumptions that do not hold in the real world. These assumptions fall into several categories: visual (lighting, textures, material appearances), physical (contact dynamics, friction, deformation), and perceptual (sensor noise, calibration errors, latency). Every simulation, no matter how sophisticated, makes some simplifying assumptions. The question is whether the policy trained in simulation is robust enough to the errors introduced by those assumptions.
Newton and Cosmos 3 address the physical and visual layers more completely than prior simulators. The new PhysX SDK's multiphysics support means that soft bodies and granular materials — historically major failure modes — are simulated more accurately. Cosmos 3's world model learns realistic visual appearances from real-world data, reducing the visual domain gap.
But the residual gap remains. It is smaller than it was, and DreamZero's world model filtering approach helps by training policies that are robust to variation. However, every GR00T N2 deployment will still require some real-world fine-tuning data to close the remaining gap for the specific hardware and environment. Isaac Lab 3.0's fine-tuning pipeline is designed to make that step as efficient as possible — but it does not eliminate it.
The honest framing from The Decoder captures it well: NVIDIA is swapping the data problem for a compute problem, not solving the underlying physics. That is a genuine advance, but it is a change in the nature of the bottleneck, not its elimination.
Competitive landscape: Tesla Optimus and Figure AI
NVIDIA's physical AI announcements land in the middle of an increasingly competitive humanoid robotics race. Two companies define the competitive context most clearly: Tesla and Figure AI.
Tesla's Optimus program is the highest-profile humanoid effort outside NVIDIA's ecosystem. Tesla develops its own simulation infrastructure, its own training data pipeline (leveraging vehicle fleet data for visual understanding), and its own hardware platform. Tesla does not use NVIDIA's foundation models — it is building the full stack internally. The advantage of this approach is integration; the disadvantage is that it does not benefit from the breadth of the partner ecosystem that NVIDIA is assembling.
Figure AI has pursued a different path: close partnerships with OpenAI for reasoning capabilities, with Microsoft for cloud infrastructure, and with BMW for deployment environments. Figure's approach acknowledges that no single company is likely to win every layer of the robot AI stack, and focuses Figure's engineering on hardware and integration rather than foundation model research. This positions Figure as a potential GR00T N2 adopter rather than a direct competitor at the model layer.
The broader competitive picture is that NVIDIA is attempting to occupy the infrastructure layer of the robotics stack — the position that CUDA has occupied in the GPU computing stack for two decades. If it succeeds, every robotics company becomes a customer, regardless of whether they win or lose in the hardware market. That is a more durable competitive position than building a single robot platform, and it is why the breadth of the partner roster matters as much as the technical specifications of GR00T N2.
What this means for the robotics industry
Three implications stand out from the GTC 2026 physical AI announcements.
The data bottleneck is shifting. For most of robotics' history, the limiting factor was real-world training data: robot time is expensive, human demonstration collection is slow, and environments are not instrumented for automated data collection. GR00T N2 and the simulation stack reduce how much real-world data is required for a capable policy. This does not eliminate the data problem, but it changes the economics substantially. Teams that previously could not afford to build competitive robot AI because they lacked data infrastructure can now start from a strong pretrained baseline and use simulation to fill the gaps.
Hardware diversity is a feature, not a bug. NVIDIA's decision to make GR00T N2 hardware-agnostic and to build a broad partner ecosystem means that the physical AI stack will be stress-tested across more robot configurations than any single company could produce. This accelerates the generalization capability of future GR00T versions, because every partner deployment generates feedback data that NVIDIA can use to improve the foundation model. The network effects here favor NVIDIA over vertically integrated competitors.
Theme parks before factories. The Disney Olaf deployment is an unusual choice for a first high-profile real-world test of physical AI infrastructure. Theme parks are chaotic, safety-critical, and highly visible. Success at Disneyland Paris will do more for physical AI adoption than a controlled factory pilot, because it demonstrates that the technology works in genuinely unstructured environments under public scrutiny. NVIDIA and Disney are betting that the stack is ready for that test. The March 29 debut will be closely watched.
15 frequently asked questions
What is GR00T N2?
GR00T N2 is NVIDIA's second-generation foundation model for humanoid robots. It provides a pretrained base that robot manufacturers can fine-tune for their specific hardware and tasks, reducing the real-world data required to build capable robot AI.
How much better is GR00T N2 than GR00T N1?
NVIDIA's headline figure compares GR00T N2 against leading vision-language-action (VLA) models rather than against N1 directly: N2 completes unfamiliar tasks in new environments 2x as often as those baselines. GR00T N1.5 was the early access commercial release; N2 is the next major capability step.
What is DreamZero?
DreamZero is the research project underlying GR00T N2's improved generalization. It uses a pretrained world model as a synthetic data engine: generate candidate robot behaviors, simulate them in the world model, filter for successful ones, and use those as training signal for the action policy.
What is Cosmos 3?
Cosmos 3 is NVIDIA's third-generation world foundation model. It combines synthetic world generation, visual reasoning, and action simulation into a unified platform for robot training and deployment-time inference.
What is the Newton physics engine?
Newton is a new GPU-accelerated physics engine from NVIDIA that supports multiphysics simulation — rigid bodies, soft bodies, fluids, and granular materials. It is built on the NVIDIA PhysX SDK and is designed to generate training data for dexterous manipulation tasks.
What is Isaac Lab 3.0?
Isaac Lab 3.0 is NVIDIA's platform for large-scale robot learning on DGX infrastructure. The 3.0 release improves training throughput, adds curriculum support, and is designed to fine-tune GR00T N2 for specific hardware and deployment contexts.
What is Disney's Kamino simulator?
Kamino is Disney's internal simulation environment for training and validating robot behaviors. It was built on NVIDIA's Warp framework and is now integrated into Newton, giving Disney and NVIDIA a shared physics layer from data generation through deployment validation.
When does Disney's Olaf robot debut publicly?
The Olaf robot debuts at Disneyland Paris on March 29, 2026.
What hardware does GR00T N2 support?
GR00T N2 is hardware-agnostic and supports a range of humanoid robot platforms. Current announced partners include Boston Dynamics, Franka Robotics, LG Electronics, and NEURA Robotics.
Is GR00T N2 commercially available?
GR00T N1.5 is available in early access with a commercial license. GR00T N2 was announced at GTC 2026; commercial availability details are available through NVIDIA's developer program.
What is the NVIDIA PhysX SDK?
The new NVIDIA PhysX SDK is the multiphysics simulation library that underpins Newton. It adds support for dexterous manipulation simulation including soft bodies and granular materials, which were poorly handled by prior physics engines.
What does "swapping the data problem for a compute problem" mean?
It means that instead of needing large amounts of real-world robot training data (which is expensive and slow to collect), NVIDIA's approach uses GPU-accelerated simulation to generate synthetic training data at scale. The bottleneck shifts from data collection to simulation compute — which NVIDIA's hardware is designed to provide.
How does GR00T N2 handle sim-to-real transfer?
GR00T N2 uses DreamZero's world model filtering to train policies that are robust to variation, reducing the sim-to-real gap. However, real-world fine-tuning is still required for specific hardware and environments. Isaac Lab 3.0's fine-tuning pipeline is designed to make this step as efficient as possible.
How does NVIDIA's physical AI stack compare to Tesla Optimus?
Tesla builds a fully integrated internal stack — hardware, simulation, training, and deployment — without using NVIDIA's foundation models. NVIDIA's approach is infrastructure-layer: provide the foundation model and simulation platform that any robotics company can build on, creating network effects across the partner ecosystem.
What is the significance of the Boston Dynamics partnership?
Boston Dynamics is the most credible brand in humanoid and quadruped robotics. Its decision to integrate GR00T N2 into its development pipeline signals that even companies with deep proprietary robotics expertise see value in building on NVIDIA's foundation model layer, validating NVIDIA's infrastructure-layer strategy.