TL;DR: At GTC 2026, NVIDIA revealed that its network of NVIDIA Cloud Partners (NCPs) has cumulatively deployed more than one million GPUs across AI factories globally — representing 1.7 gigawatts of aggregate AI compute capacity. That figure is up from the 400,000 GPUs and 550 megawatts reported at GTC 2025, reflecting year-over-year growth of roughly 2.5x in GPU count and 3x in power capacity. The expansion spans sovereign AI deployments in the United States, Australia, Germany, Indonesia, and India, with AWS separately committing to deploy more than one million GPUs (spanning NVIDIA Blackwell and Rubin parts alongside Groq 3 LPUs) in its own infrastructure. The milestone is not a single deployment but a cumulative ledger of NCP-operated compute capacity built on NVIDIA architecture, and it signals that the AI factory model Jensen Huang introduced in 2023 is becoming the dominant paradigm for national and commercial cloud infrastructure.
What you will learn
- What the one million GPU milestone actually represents
- GTC 2025 to GTC 2026: the 12-month scale-up
- What is an AI factory and why the terminology matters
- The NCP program: structure, partners, and selection criteria
- Sovereign AI: the geopolitical dimension of GPU deployment
- AWS and the one million GPU commitment
- Blackwell, Rubin, and Groq 3: the hardware stack powering AI factories
- 1.7 gigawatts: what that power figure actually implies
- Competitive dynamics: NVIDIA's partner ecosystem as a moat
- What comes next: the road to ten million GPUs
- Frequently asked questions
What the one million GPU milestone actually represents
The one million GPU figure announced at GTC 2026 is a cumulative deployment count across the entire NVIDIA Cloud Partner ecosystem. It is not a single data center, a single customer, or a single country. It is the aggregate of every NCP-operated GPU cluster built on NVIDIA hardware — from hyperscale deployments running thousands of nodes in Virginia and Frankfurt to sovereign AI installations running Blackwell racks in Jakarta and Hyderabad.
The distinction matters. One million GPUs deployed across a partner ecosystem is a different signal than one million GPUs in a single facility. It reflects the breadth of the AI factory model's adoption: dozens of operators, across dozens of markets, building infrastructure according to NVIDIA's reference architecture and operating it under the NCP certification program. The ecosystem interpretation also means the number compounds from all sides simultaneously — existing partners expanding capacity, new partners joining, and sovereign mandates driving government-sponsored deployments that add capacity outside the traditional hyperscaler channel.
For context on scale: NVIDIA shipped approximately 3.76 million data center GPUs in fiscal Q4 2026 alone, generating the bulk of its reported $39.1 billion quarterly data center revenue. Set against that shipment pace, the one million NCP GPU figure is a different kind of number: an installed base, accumulated across multiple years and hardware generations, permanently racked and operating in partner facilities. It continues to grow as each quarter's shipments are allocated to NCP operators.
The 1.7 gigawatt capacity figure maps to the power draw of the installed GPU fleet. A single Blackwell B200 GPU has a thermal design power of roughly 1,000 watts. One million GPUs at ~1,700 watts average per GPU (accounting for HGX/MGX node overhead, networking, and cooling) works out to approximately 1.7 GW of aggregate facility draw. This is not peak capacity — it is the operational power envelope of the installed base at normal utilization.
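That calculation fits in a few lines. A minimal sketch, carrying over the ~1.7 kW per-GPU facility figure as the working assumption stated above:

```python
# Sanity check: one million GPUs at ~1.7 kW average facility draw.
# The 1.7 kW figure (chip TDP plus node, networking, and cooling
# overhead) is the article's working assumption, not a reported spec.

GPU_COUNT = 1_000_000
AVG_FACILITY_WATTS_PER_GPU = 1_700

aggregate_gw = GPU_COUNT * AVG_FACILITY_WATTS_PER_GPU / 1e9
print(f"Aggregate facility draw: {aggregate_gw:.2f} GW")  # -> 1.70 GW
```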
GTC 2025 to GTC 2026: the 12-month scale-up
At GTC 2025, NVIDIA reported that its NCP network had deployed approximately 400,000 GPUs consuming 550 megawatts of capacity. The GTC 2026 figures — one million-plus GPUs at 1.7 gigawatts — represent growth of:
- 2.5x in GPU count (400K to 1M+)
- 3.1x in power capacity (550 MW to 1,700 MW)
The power capacity growing faster than GPU count is meaningful. It reflects a generation transition in the installed base. The GPUs added between GTC 2025 and GTC 2026 are largely Blackwell-class hardware operating at significantly higher thermal envelopes than the A100 and H100 equipment that dominated the GTC 2025 base. A Blackwell B200 draws roughly 1.4x the power of an H100 at the chip level (about 1,000 W versus 700 W), and more again at the system level once NVLink switch trays and denser networking are counted, meaning each new GPU added to the fleet contributes more wattage than its predecessor did. The divergence between unit growth (2.5x) and power growth (3.1x) is exactly what a large-scale transition from H100-class to Blackwell-class deployments would produce: the implied average facility power per installed GPU rises from roughly 1.4 kW to 1.7 kW.
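That per-GPU shift can be read directly out of the two milestone figures. In this sketch the announced totals come from the keynotes; the per-GPU facility draw is a derived estimate, not a disclosed metric:

```python
# Implied average facility power per installed GPU at each milestone.
snapshots = {
    "GTC 2025": {"gpus": 400_000, "megawatts": 550},
    "GTC 2026": {"gpus": 1_000_000, "megawatts": 1_700},
}

for label, s in snapshots.items():
    kw_per_gpu = s["megawatts"] * 1_000 / s["gpus"]
    print(f"{label}: {kw_per_gpu:.2f} kW per installed GPU")
# GTC 2025: 1.38 kW, GTC 2026: 1.70 kW. The fleet-wide average rises
# as Blackwell-class nodes displace H100-class nodes in the mix.

unit_growth = 1_000_000 / 400_000   # 2.5x
power_growth = 1_700 / 550          # ~3.1x
print(f"Unit growth {unit_growth:.1f}x vs power growth {power_growth:.1f}x")
```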
The 12-month timeframe is also significant because it aligns with the period when sovereign AI became a policy priority globally. Multiple governments moved from feasibility studies to funded deployments between mid-2025 and early 2026. That policy acceleration drove NCP deployments in markets that were not on the board a year earlier, adding capacity in Southeast Asia, South Asia, and Central Europe that contributed meaningfully to the cumulative GPU count.
What is an AI factory and why the terminology matters
Jensen Huang first used the term AI factory at GTC 2023, and it has since become the organizing concept for NVIDIA's entire go-to-market narrative. Understanding why NVIDIA uses this term — rather than "data center" or "cloud" — clarifies why the one million GPU milestone is framed the way it is.
A conventional data center is, at its core, a place where applications run. The computing resource is general-purpose: the same rack that runs a web server today can run a database tomorrow. The data center metaphor implies a warehouse of resources available for allocation.
The AI factory metaphor implies something different: an industrial facility with a defined input, a defined process, and a defined output. Raw data and compute go in. Trained models and inference results come out. The factory does not do arbitrary work — it specializes in producing intelligence at scale. Like a physical factory, its throughput is measurable, its efficiency is optimizable, and its output has direct economic value.
This framing has several strategic implications. First, it positions AI compute as production infrastructure rather than IT overhead. A factory is a capital asset that generates revenue; an IT department is a cost center. Customers who internalize the AI factory concept are more likely to treat GPU infrastructure as a strategic investment requiring dedicated capacity, rather than a commodity to be right-sized and outsourced to the lowest bidder.
Second, the factory metaphor implies vertical integration of the production stack. A factory does not just need machines — it needs the full production line, from raw material intake through finished output delivery. NVIDIA's response to this implication is a full-stack AI infrastructure portfolio: GPUs, NVLink switches, DGX systems, networking fabric, software frameworks (CUDA, cuDNN, TensorRT, NIM), and now the NCP ecosystem that operates it all on behalf of customers who want to buy AI output rather than manage infrastructure.
Third, the factory concept supports sovereign AI narratives. Nations that want to produce AI domestically — to avoid depending on foreign cloud providers for strategic intelligence workloads — need their own AI factories. The NCP program provides a path: a government or domestic cloud operator becomes an NCP, deploys NVIDIA hardware under NVIDIA's reference architecture, and operates an AI factory that is physically located within the nation's jurisdiction.
The NCP program: structure, partners, and selection criteria
The NVIDIA Cloud Partner program is NVIDIA's framework for certifying and supporting organizations that build and operate NVIDIA-based AI infrastructure for third-party customers. NCPs are not simply resellers of NVIDIA hardware — they are operators of NVIDIA-architected AI factories that provide GPU compute as a service.
To qualify as an NCP, an organization must meet requirements across several dimensions: minimum GPU deployment scale, NVIDIA-certified network and storage integration, technical training and certification for operations staff, and compliance with NVIDIA's reference architecture for DGX SuperPOD or equivalent configurations. The program provides NCPs with access to prioritized hardware allocation, joint go-to-market support, NVIDIA AI Enterprise software licenses, and co-branded credibility.
The NCP ecosystem spans a range of operator types:
- Hyperscalers: AWS, Google Cloud, Microsoft Azure, and Oracle Cloud all participate in the NCP framework for their NVIDIA-based GPU offerings, though their scale and direct relationship with NVIDIA go beyond standard NCP terms.
- Regional cloud providers: Operators like CoreWeave (US), Lambda Labs (US), Hetzner (Germany), and numerous Asia-Pacific regional providers have built significant NVIDIA-based GPU clouds under the NCP program.
- Telecom-affiliated operators: National telecoms in markets like Australia, Indonesia, India, and Germany have partnered with system integrators to deploy NCP-certified AI factories, often with sovereign AI mandates driving the investment.
- Sovereign AI infrastructure operators: Government-sponsored entities in multiple countries have deployed NCP-certified infrastructure explicitly to ensure domestic AI capability and data sovereignty.
The cumulative one million GPU figure aggregates across all of these operator types. The distribution is not uniform — a handful of large operators account for the majority of deployed GPUs, while the long tail of smaller regional NCPs contributes the geographic diversity behind NVIDIA's claim of deployments across six continents.
Sovereign AI: the geopolitical dimension of GPU deployment
The sovereign AI dimension of the NCP milestone may be its most strategically significant aspect. Sovereign AI — the principle that nations should control their own AI infrastructure, data, and model development rather than depending entirely on foreign cloud providers — has moved from think-tank discussion to policy priority across multiple major economies in the past two years.
NVIDIA has been the most aggressive hardware vendor in embracing the sovereign AI narrative, for obvious reasons: sovereign AI mandates create demand for domestic GPU deployments that bypass the hyperscaler channel, generating direct NCP sales that might otherwise be mediated by AWS or Azure.
The GTC 2026 announcement highlighted sovereign AI deployments across five countries specifically: United States, Australia, Germany, Indonesia, and India. Each represents a distinct policy driver:
United States: The US sovereign AI build-out is driven by national security requirements. Federal agencies, defense contractors, and intelligence community operators need GPU infrastructure that operates outside commercial cloud environments with specific data handling, access control, and security certifications. NCP-certified operators with appropriate government clearances fill this role.
Australia: Australia has been one of the most active sovereign AI investors globally, with the federal government's Australian Sovereign AI Capability program funding domestic compute infrastructure. The rationale is dual: economic (keeping AI value-creation onshore) and security (ensuring that sensitive government workloads run on domestically controlled infrastructure rather than foreign cloud regions).
Germany: Germany's sovereign AI push is driven by EU data sovereignty requirements and the broader European concern about strategic dependence on US hyperscalers for critical AI infrastructure. GDPR compliance for AI training workloads is simpler when the training happens on domestically operated infrastructure. German industrial companies — particularly automotive and manufacturing — have also been early enterprise adopters of sovereign AI factories for proprietary model training.
Indonesia: Southeast Asia's largest economy has articulated an explicit goal of becoming a regional AI hub, and its government has been investing in domestic AI infrastructure as part of that strategy. NCP deployments in Indonesia reflect both that government investment and the broader commercial demand from one of the world's largest digital economies.
India: India's AI mission has backed domestic compute infrastructure buildout as a national priority. NCP deployments in India connect to both government-sponsored initiatives and the substantial commercial demand from India's technology industry for GPU access that does not require data egress to US or European cloud regions.
The six-continent claim — including deployments in Africa and South America beyond the five highlighted countries — underscores that the AI factory model is genuinely global, not concentrated in the traditional hyperscaler heartlands of North America and Western Europe.
AWS and the one million GPU commitment
Separate from the aggregate NCP milestone, Amazon Web Services announced at GTC 2026 a commitment to deploy more than one million NVIDIA GPUs within its own infrastructure. This figure deserves unpacking because it encompasses multiple hardware generations and represents a different commitment type than the cumulative NCP figure.
The AWS one million GPU deployment spans three distinct hardware lines:
Blackwell GPUs (B100, B200, GB200): The current-generation NVIDIA compute platform, featuring the Blackwell architecture with 208 billion transistors, NVLink 5.0, and the integrated Transformer Engine that accelerates attention operations central to large language model inference. AWS's Blackwell deployment is the near-term core of the commitment, with B200-based EC2 instances already in production and GB200 NVL72 configurations in the deployment pipeline.
Rubin GPUs: The next-generation NVIDIA architecture following Blackwell, announced at GTC 2025 for delivery in 2026. AWS's commitment to Rubin-class deployments reflects multi-year capacity planning agreements with NVIDIA — the same kind of supply chain certainty that the NCP program is designed to provide. Including Rubin in the commitment signals AWS has visibility into its own demand trajectory far enough out to justify reserving capacity for hardware that is not yet in volume production.
Groq 3 LPUs (Language Processing Units): This is the most unusual element of the AWS commitment. Groq is an AI inference chip startup that designs purpose-built chips for LLM inference — not for training, but specifically for the autoregressive token generation that dominates deployed model serving. Including Groq 3 LPUs in an NVIDIA-framed one million GPU announcement suggests either that AWS is treating LPUs as a complement to GPU capacity in a mixed inference fleet, or that the "one million GPU" figure uses "GPU" loosely to mean "AI accelerator unit" inclusive of non-GPU architectures.
The AWS commitment, if executed, would by itself add a number comparable to the entire current NCP installed base to a single operator's fleet. AWS is already among the largest NVIDIA customers globally; this commitment would extend that relationship across at least two hardware generations.
Blackwell, Rubin, and Groq 3: the hardware stack powering AI factories
The AI factory ecosystem is not a static hardware install — it is a generational refresh cycle where the compute density available per square meter of data center floor doubles approximately every two years.
Blackwell (current generation): The NVIDIA B200 GPU delivers 20 petaflops of FP4 AI compute in a roughly 1,000 W TDP package. The GB200 NVL72 configuration — 72 B200-class GPUs connected by NVLink 5.0 in a single rack-scale system — delivers 1.4 exaflops of FP4 inference compute in a 120 kW rack. For AI factory operators, the GB200 NVL72 is the reference unit of compute: one rack, 1.4 exaflops, one deployable unit. The NCP ecosystem is currently transitioning from H100 DGX pods to Blackwell configurations as the dominant installation unit; a rack-level sanity check on these figures follows the Groq entry below.
Rubin (next generation): NVIDIA's Rubin architecture, announced for 2026 delivery, introduces several advances over Blackwell: NVLink 6.0 with higher bandwidth, HBM4 memory with greater capacity and bandwidth, and the Rubin Ultra configuration that doubles the compute density of GB200 in a similar power envelope. Early Rubin deployments will target hyperscalers and the largest NCPs with established infrastructure for 120–200 kW racks. The AWS Rubin commitment suggests volume deployments beginning in late 2026 or 2027.
Groq 3 LPUs: Groq's Language Processing Unit architecture takes a fundamentally different approach to AI inference than GPUs. Where a GPU is a massively parallel processor with flexible data paths and high-bandwidth memory hierarchy, a Groq LPU is a deterministic, latency-optimized processor designed specifically for the sequential token generation patterns of autoregressive LLMs. Groq chips deliver predictable, consistent token generation speeds with very low latency variance — characteristics that matter enormously for customer-facing AI applications where response time consistency affects user experience. The Groq 3 generation targets competitive throughput per chip while maintaining the latency advantages of the LPU architecture.
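Circling back to the Blackwell figures: treating the GB200 NVL72 as the unit of deployment makes the ecosystem-level numbers easy to sanity-check. The hypothetical all-NVL72 fleet below is a simplification (the real installed base mixes H100-era and Blackwell hardware), but the totals land close to the announced figures:

```python
import math

# GB200 NVL72 reference-rack arithmetic, using the spec figures quoted above.
GPUS_PER_RACK = 72
RACK_KILOWATTS = 120
PFLOPS_FP4_PER_GPU = 20

rack_ef = GPUS_PER_RACK * PFLOPS_FP4_PER_GPU / 1_000
print(f"Per rack: {rack_ef:.2f} EF FP4")   # -> 1.44 EF, the "1.4 exaflops" figure

# If the entire one-million-GPU installed base were NVL72 racks (it is not):
racks = math.ceil(1_000_000 / GPUS_PER_RACK)
fleet_gw = racks * RACK_KILOWATTS / 1e6
print(f"{racks:,} racks drawing {fleet_gw:.2f} GW")  # -> ~13,889 racks, ~1.67 GW
```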
1.7 gigawatts: what that power figure actually implies
The 1.7 gigawatt capacity figure attached to the NCP milestone deserves analysis beyond the headline. It is simultaneously a measure of AI compute density, a statement about energy infrastructure requirements, and a forecast of economic impact.
For reference on power scale: the Hoover Dam generates approximately 2 gigawatts of hydroelectric power at full capacity, so the NCP AI factory ecosystem draws roughly 85% of that output — continuously, not as a peak. Put another way, three large nuclear reactors produce approximately 3 gigawatts combined; the AI factory ecosystem NVIDIA has enabled now draws a bit more than half of that.
This is not offered as a criticism — it is offered as a calibration. Energy grid planning organizations in every country where significant AI infrastructure is being deployed are incorporating these figures into long-range capacity forecasts. The question of where the electricity comes from — and at what carbon intensity — is becoming a first-order constraint on AI factory siting decisions.
From an economic perspective, 1.7 GW of continuously operating AI compute generates revenue. At market rates for GPU cloud services (roughly $2–4 per GPU-hour for H100-class hardware, higher for B200), one million GPUs operating at 70% average utilization generate approximately $12–25 billion in annualized revenue for the NCP ecosystem. The wide range reflects the heterogeneity of pricing across operator types, geographies, and contract structures, but the order of magnitude indicates that the NCP program has created a multi-billion-dollar annual revenue ecosystem on NVIDIA hardware.
The power figure also drives data center real estate and infrastructure investment. Building out 1 GW of data center capacity requires approximately $5–8 billion in facility, cooling, and power infrastructure investment, separate from the GPU hardware cost. The 1.7 GW NCP ecosystem has therefore catalyzed roughly $8.5–13.6 billion in supporting infrastructure investment by NCP operators and their facility partners.
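Both estimates reduce to one-line arithmetic. A sketch using the ranges quoted above, where every input is a rough market assumption from the text rather than a reported figure:

```python
# Order-of-magnitude economics of the NCP installed base.
GPU_COUNT = 1_000_000
UTILIZATION = 0.70
HOURS_PER_YEAR = 8_760
RATE_USD_PER_GPU_HOUR = (2.0, 4.0)   # assumed H100-class on-demand range

billable = GPU_COUNT * UTILIZATION * HOURS_PER_YEAR
low, high = (billable * r / 1e9 for r in RATE_USD_PER_GPU_HOUR)
print(f"Annualized revenue: ${low:.0f}B - ${high:.0f}B")     # -> ~$12B - $25B

GIGAWATTS = 1.7
CAPEX_PER_GW_BILLIONS = (5.0, 8.0)   # facility, cooling, power only
lo_cap, hi_cap = (GIGAWATTS * c for c in CAPEX_PER_GW_BILLIONS)
print(f"Facility capex: ${lo_cap:.1f}B - ${hi_cap:.1f}B")    # -> $8.5B - $13.6B
```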
Competitive dynamics: NVIDIA's partner ecosystem as a moat
The NCP milestone is not merely a deployment statistic — it is evidence of a competitive moat strategy executing at scale.
AMD, Intel, and custom silicon programs at Google, Amazon, and Microsoft are all NVIDIA's competitors for AI compute workloads. They compete on the merits of their hardware: compute density, memory bandwidth, software ecosystem, total cost of ownership. What none of them has replicated is NVIDIA's partner ecosystem infrastructure — the network of certified operators, sovereign AI relationships, and enterprise customer deployments that the NCP program represents.
Building GPU hardware competitive with Blackwell is a multi-year, multi-billion-dollar engineering effort that AMD's MI300X and Intel's Gaudi 3 are pursuing with credible results. Building a global ecosystem of one million deployed GPUs operated by certified partners across six continents is a relationship and supply chain achievement that cannot be replicated by shipping better chips.
The NCP program creates switching costs at multiple levels:
- Operator level: An NCP operator has invested in NVIDIA-certified infrastructure, trained staff on NVIDIA tools, and built customer relationships based on NVIDIA product roadmap. Migrating to an alternative GPU platform requires re-certification, re-training, and re-validation of workloads. The cost is not just hardware replacement — it is operational disruption.
- Customer level: Enterprise customers who have tuned workloads, built internal tooling, and trained internal teams on NVIDIA CUDA and NIM frameworks face significant migration costs to move to alternative GPU platforms, even if the alternative hardware delivers equivalent performance.
- Sovereign level: National AI factory programs that have committed public funds to NVIDIA-based infrastructure are politically difficult to unwind. A government that has announced a sovereign AI factory built on NVIDIA architecture has, in effect, staked domestic AI credibility on the NVIDIA roadmap. Re-platforming is not just a technical decision — it is a political one.
This ecosystem lock-in is what NVIDIA's management means when they describe the NCP program as strategically important beyond its direct revenue contribution. Every NCP deployment is a reference installation that makes the next NCP deployment more likely, whether in the same geography or in a neighboring market that benchmarks itself against its peers.
What comes next: the road to ten million GPUs
The current trajectory of NCP deployments and the announced capacity commitments from major cloud operators suggest the next major milestone is ten million cumulative GPUs in the NCP ecosystem. At the current growth rate of roughly 2.5x per year, that milestone would arrive in approximately two to three years — potentially at GTC 2028 or 2029.
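A crude projection shows how sensitive that timing is to the assumed growth rate. This is an illustration of the extrapolation, not a forecast:

```python
import math

# Years to grow the installed base 10x under sustained annual growth.
for annual_multiple in (2.0, 2.5, 3.0):
    years = math.log(10) / math.log(annual_multiple)
    print(f"{annual_multiple}x/year -> ~{years:.1f} years from 1M to 10M GPUs")
# 2.0x -> ~3.3 years, 2.5x -> ~2.5 years, 3.0x -> ~2.1 years,
# bracketing the GTC 2028-2029 window suggested above.
```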
The path there involves several compounding forces:
Generational density increase: Each GPU generation delivers more compute per chip. If the NCP ecosystem reaches ten million GPU-equivalents, but those GPUs are Rubin or post-Rubin chips delivering 4–8x the compute of a Blackwell B200, the actual AI compute capacity grows by a larger multiple than the GPU count. Ten million Rubin GPUs at even 2x Blackwell density represent twenty million Blackwell-equivalent compute units.
Sovereign AI pipeline: The sovereign AI deployments announced at GTC 2026 represent early-stage commitments. As those initial deployments prove out, the nations that have made initial investments will expand capacity and the nations that have been studying the space will move to deployment. The addressable market for sovereign AI factories — countries with GDP above $500 billion that have articulated AI strategy goals — includes more than thirty nations that have not yet made major NCP commitments.
Enterprise AI factory adoption: The NCP ecosystem is currently dominated by cloud operators serving multiple tenants. The next growth phase involves enterprises deploying dedicated AI factory capacity for their own workloads — pharmaceutical companies training drug discovery models, automotive companies training autonomous driving systems, financial institutions training proprietary risk models. These deployments are often operated by NCPs under managed service arrangements, contributing to NCP GPU counts without the enterprise appearing in any public deployment announcement.
Edge and inference expansion: Current NCP deployments are predominantly training-oriented — large-scale clusters for model development. As AI transitions from development to deployment, the inference workload grows faster than the training workload. Inference AI factories — potentially smaller, geographically distributed, closer to end users — represent a second wave of NCP deployments that has not yet reached significant scale but is the logical next chapter as every enterprise application embeds AI model serving.
The road from one million to ten million GPUs is not linear extrapolation — it is a structural expansion of who operates AI factories, for what purposes, and in how many markets.
Frequently asked questions
What is the NVIDIA Cloud Partner program?
The NVIDIA Cloud Partner program is a certification framework for organizations that build and operate NVIDIA-based AI compute infrastructure for third-party customers. NCPs must meet minimum deployment standards, technical certifications, and architectural requirements set by NVIDIA. In return, they receive access to prioritized hardware allocation, joint marketing support, and NVIDIA AI Enterprise software licensing. The program includes hyperscalers, regional cloud providers, telecom-affiliated operators, and sovereign AI infrastructure operators.
Does "one million GPUs" mean one million individual GPU chips?
The figure refers to cumulative GPU deployments — the total count of individual GPU chips installed and operating in NCP-certified facilities. In practical data center terms, GPUs are deployed in HGX server nodes (8 GPUs per node) or DGX systems (8 GPUs per system), so one million GPUs represents approximately 125,000 server nodes. These nodes are distributed across hundreds of facilities operated by dozens of NCPs worldwide.
NVIDIA shipped approximately 3.76 million data center GPUs in fiscal Q4 2026 alone. The one million NCP GPU figure is a cumulative installed base across multiple years and hardware generations — not a single quarter's shipments. It represents the GPUs currently operating in NCP-certified facilities, which includes H100-era deployments alongside newer Blackwell installations. The installed base grows with each quarter's shipments allocated to NCP operators.
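The node arithmetic from the answer above is a one-liner (8 GPUs per node is the standard HGX/DGX packaging assumed here):

```python
# One million GPUs packed into standard 8-GPU HGX/DGX nodes.
GPUS = 1_000_000
GPUS_PER_NODE = 8

print(f"{GPUS // GPUS_PER_NODE:,} server nodes")  # -> 125,000 nodes
```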
What countries have the largest sovereign AI deployments?
Based on announced programs and public investment commitments, the United States, Germany, Australia, India, and Indonesia lead in sovereign AI factory deployments. France, Japan, UAE, Saudi Arabia, and Singapore have also made significant sovereign AI investments that include NVIDIA-based NCP deployments. The six-continent claim from NVIDIA's GTC 2026 announcement includes facilities in Africa and South America, though specific country details for those regions were not disclosed.
What does 1.7 gigawatts of AI capacity mean in practical terms?
1.7 gigawatts is the aggregate power draw of the NCP AI factory ecosystem at normal operating utilization. For context: a single Blackwell B200 GPU draws approximately 1,000 watts at the chip level, so one million B200-equivalent GPUs at roughly 1.7 kW average facility draw (including node overhead, networking, and cooling) produce approximately 1.7 GW. That is roughly 85% of the Hoover Dam's full output, or a bit more than half of what three large nuclear reactor units produce combined. It translates to approximately 14.9 terawatt-hours of annual energy consumption across the NCP ecosystem.
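The terawatt-hour figure is the continuous draw multiplied out over a year, assuming the fleet runs at that envelope around the clock:

```python
# Continuous 1.7 GW draw converted to annual energy consumption.
GIGAWATTS = 1.7
HOURS_PER_YEAR = 24 * 365          # 8,760

annual_twh = GIGAWATTS * HOURS_PER_YEAR / 1_000
print(f"~{annual_twh:.1f} TWh per year")  # -> ~14.9 TWh
```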
What is the difference between an AI factory and a conventional data center?
A conventional data center is a general-purpose compute facility where diverse workloads — web serving, databases, business applications — run on shared infrastructure. An AI factory is a specialized facility architected specifically for AI training and inference workloads, with hardware configurations (high-density GPU clusters with NVLink interconnects), networking (InfiniBand or high-bandwidth Ethernet fabrics), storage (NVMe and high-throughput parallel file systems), and cooling (liquid cooling for 60–120 kW racks) optimized for AI workloads. Jensen Huang introduced the term to signal that AI compute requires purpose-built infrastructure, not repurposed general data center capacity.
How does the AWS one million GPU commitment relate to the NCP milestone?
The NCP cumulative one million GPU figure and the AWS one million GPU commitment are distinct metrics. The NCP figure represents the aggregate installed base across all NVIDIA Cloud Partners globally. The AWS commitment is a forward-looking deployment target for Amazon Web Services specifically, encompassing Blackwell, Rubin, and Groq 3 LPUs. If AWS executes its commitment, it would potentially double the NCP-category installed base as Rubin deployments ramp — though the two figures are tracked on different bases (cumulative installed vs. planned forward deployment).
What hardware does an AI factory use beyond GPUs?
An AI factory built on NVIDIA reference architecture includes: GPU compute nodes (HGX or DGX systems), NVLink switch fabric for intra-rack and rack-scale connectivity, InfiniBand or high-bandwidth Ethernet for inter-rack networking, NVMe-based storage arrays for training data and model checkpoints, liquid cooling infrastructure for high-density racks, and software stack (CUDA, cuDNN, NCCL, TensorRT, NIM microservices) for workload execution. The full NCP certification covers all of these components, not just the GPU chips.
Is Groq part of the NCP program?
Groq is a separate hardware vendor — its LPUs are distinct from NVIDIA GPUs. The inclusion of Groq 3 LPUs in AWS's "one million GPU" commitment reflects AWS's mixed accelerator strategy for inference workloads, not a direct NVIDIA-Groq partnership. NVIDIA and Groq are competitors in the AI accelerator inference market. The AWS announcement bundles both hardware types under a single headline number, which may reflect AWS framing more than any formal relationship between NVIDIA and Groq.
What is a sovereign AI factory and who funds them?
A sovereign AI factory is an AI compute facility operated within a nation's jurisdiction to serve domestic government, research, or commercial AI needs without dependence on foreign cloud infrastructure. They are typically funded through national AI strategies — government investment programs that provide capital for hardware procurement and facility construction. Operations are usually contracted to a domestic cloud operator or telecom (certified as an NCP) that manages the infrastructure on behalf of government and enterprise customers. The sovereign AI mandate typically includes requirements around data residency, access controls, and supply chain provenance of hardware components.
How does NVIDIA price GPU access through NCPs compared to direct hyperscaler access?
NCP operators set their own pricing for GPU cloud services, which creates a competitive market separate from hyperscaler pricing. In practice, NCP GPU pricing tends to be competitive with or slightly above hyperscaler pricing for on-demand access, with NCPs offering differentiated value through geographic availability (sovereign deployment locations not served by hyperscalers), contract flexibility (reserved capacity terms, compliance certifications), and specialized service offerings (managed MLOps, custom inference optimization). For enterprise customers with sovereignty requirements, NCP pricing may carry a premium that reflects the value of domestic data residency.
What is NVIDIA's Rubin architecture and when will it be available in AI factories?
Rubin is NVIDIA's post-Blackwell GPU architecture, announced at GTC 2025 and targeted for delivery in 2026. It introduces NVLink 6.0, HBM4 memory, and an enhanced Transformer Engine. Early Rubin deployments will reach hyperscalers and large NCPs in late 2026, with broader NCP availability expected through 2027. The Rubin Ultra configuration doubles compute density relative to GB200 NVL72 in a comparable power envelope, making it the target architecture for next-generation AI factory builds.
How does AMD compete with the NCP ecosystem?
AMD's Instinct MI300X and MI350 series GPUs compete with NVIDIA Blackwell on hardware benchmarks for specific workloads, including large model inference where the MI300X's unified HBM memory architecture delivers advantages for models that fit in memory. However, AMD does not have an equivalent to the NCP program — a certified partner ecosystem for AI factory operations at scale. AMD's data center GPU deployments are primarily mediated through hyperscaler custom deployments (Microsoft Azure's MI300X deployments, for example) rather than a broad partner ecosystem. The NCP program's cumulative one million GPU milestone has no direct AMD equivalent to benchmark against.
Will AI factory deployments slow as model training plateaus?
The trajectory of AI factory deployments does not depend on a training-only demand model. Even if large model pre-training reaches diminishing returns (a debated hypothesis), inference demand grows with every new enterprise AI application deployment. A single widely used AI application serving millions of users requires orders of magnitude more GPU-hours for inference than for the original training run. The transition from AI development (training-dominated) to AI deployment (inference-dominated) is expected to dramatically increase aggregate GPU demand over the next five years, sustaining AI factory build-outs even as the training dynamics evolve.
Where can I learn more about NVIDIA's GTC 2026 announcements?
NVIDIA's official GTC 2026 news coverage is available at https://blogs.nvidia.com/blog/gtc-2026-news/, which aggregates announcements across compute platforms, AI software, partner programs, and sovereign AI initiatives announced at the conference.