TubeReads

NVIDIA GTC Keynote 2026

NVIDIA is betting a trillion dollars on a radical shift in computing—from data centers that store files to AI factories that manufacture tokens. The company claims per-task inference demand has exploded 10,000× in two years, while the value of frontier models has plummeted from exclusive to commoditized. Now, with the Vera Rubin platform and the Grok acquisition, Jensen Huang is promising a 350× increase in token generation speed in just 24 months. Can the world's infrastructure absorb this pace? And will OpenClaw's overnight explosion as «the most popular open source project in human history» force every company to rethink its entire software stack before it's ready?

Video length: 2:56:06 · Published March 16, 2026 · Video language: English
10–11 min read · 16,643 words spoken, summarized in 2,085 words (8×)

1

Key Points

1

The «inference inflection» has arrived: AI now performs productive work requiring 10,000× more compute than two years ago, while usage has grown 100×—a combined million-fold demand increase Jensen attributes to reasoning (O1), code-generation agents (Claude Code), and agentic workflows.

2

NVIDIA pivoted from chip vendor to vertically integrated AI factory architect, delivering Vera Rubin systems that generate 350× more tokens per gigawatt than Hopper (H200) in the same 24-month span—far exceeding Moore's Law's expected 1.5× gain.

3

OpenClaw «open-sourced the operating system of agentic computers,» forcing every SaaS company to become a «gas» (agent-as-a-service) company; NVIDIA wrapped it in NemoClaw with security, privacy routing, and enterprise guardrails to make corporate deployment safe.

4

Physical AI is scaling across 18 million robo-taxi-ready vehicles per year (BYD, Hyundai, Nissan, plus prior Mercedes, Toyota, GM) and 110 robot partners; NVIDIA's AlpaMayo reasoning model and Cosmos World Foundation enable «compute is data» synthetic training at scale.

5

NVIDIA now produces «multi-gigawatts of AI factories per month» with Vera Rubin already running at Azure; the company insists its install base, 20-year CUDA flywheel, and horizontal openness make it the only platform customers can deploy «with complete confidence» at trillion-dollar scale.

In Brief

NVIDIA has engineered a 40-million-times compute increase in a decade and now sees $1 trillion in committed demand through 2027, built on three pillars: AI factories optimized for token economics, OpenClaw as the «operating system» for agentic computing, and physical AI deployed at planetary scale—from autonomous vehicles to humanoid robots to space data centers.


2

The Inference Inflection: Why Demand Exploded 1,000,000×

Three model breakthroughs—ChatGPT, O1 reasoning, Claude Code—triggered a million-fold compute jump in two years.

Jensen opened by declaring 2025 «NVIDIA's year of inference» and explained why computing demand has surged beyond all forecasts. He traced three inflection points: ChatGPT (generative AI era), OpenAI's O1 (reasoning models that reflect, plan, and decompose problems), and Anthropic's Claude Code (the first agentic model that reads, compiles, tests, and iterates code). Each wave multiplied both token input (context) and output (thinking) by orders of magnitude.

He quantified the shock: «The computing demand of the work has gone up by 10,000 times, and usage has gone up by 100 times. That's why computing demand has increased by one million times in the last two years.» This is no longer training-dominated—AI must now reason, generate, and act in production, which means continuous inference at massive scale. Every startup, every hyperscaler, every enterprise is constrained not by training capacity but by inference throughput.
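The quoted figures compose multiplicatively; a minimal sketch of the arithmetic, using only the numbers from the quote above:

```python
# Figures quoted in the keynote: compute per task is up 10,000x,
# and usage is up 100x, both over the last two years.
compute_per_task_growth = 10_000
usage_growth = 100

# Total inference demand scales as the product of the two.
total_demand_growth = compute_per_task_growth * usage_growth
print(f"{total_demand_growth:,}x")  # 1,000,000x
```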

The consequence: 100% of NVIDIA engineers now use a combination of ChatGPT, Claude Code, Codex, and Cursor. «There's not one software engineer today who is not assisted by one or many AI agents,» Jensen said. The shift from perception to generation to reasoning to action has made AI «able to do productive work»—and that productive work consumes tokens at rates no one anticipated even twelve months ago.


3

NVIDIA's Three Platforms and the CUDA Flywheel

🔧
CUDA-X Libraries
Over 100 domain-specific libraries (RTX, cuDF, cuVS, cuDNN, cuOpt) turn raw compute into activated solutions. NVIDIA is «an algorithm company»—continuous software updates reduce compute cost for the entire install base, extending useful life and ROI.
🖥️
AI Factory Systems
Vertical integration from chip to rack-scale liquid-cooled supercomputers (NVLink 72, Vera Rubin, Grok LPX). NVIDIA now ships «thousands of systems per week, multi-gigawatts of AI factories per month.» Every component co-designed for token throughput and perf-per-watt.
🌐
Horizontal Openness
«Vertically integrated, horizontally open.» NVIDIA integrates into every cloud (AWS, Azure, GCP, Oracle, CoreWeave), every OEM (Dell, HPE), and every AI native. This flywheel attracts developers, drives breakthroughs, expands install base, and repeats—accelerating faster each cycle.

4

«Hopper pricing is going up»: Why Old GPUs Still Print Money

Six-year-old Ampere cloud pricing rises because CUDA's reach and continuous optimization sustain value.

💡


Jensen revealed an under-discussed fact: «Ampere, shipped six years ago, the pricing of Ampere in the cloud is going up.» Why? NVIDIA's install base spans hundreds of millions of GPUs, all architecturally compatible. Every new software optimization released by NVIDIA benefits millions of deployed systems simultaneously. The combination of high reach, continuous updates, and accelerating application diversity means older GPUs retain—and even increase—their utility. This dynamic is unique to NVIDIA and explains why customers can deploy infrastructure «with complete confidence» that useful life will be long and total cost of ownership low.


5

Grace Blackwell to Vera Rubin: 350× Token Speed in 24 Months

NVIDIA's roadmap leapfrogs Moore's Law with architecture, not transistors—delivering 35× to 50× per generation.

Hopper H200 → Grace Blackwell Performance Gain
35× tokens per watt
Semi-Analysis measured 50× in practice; Jensen: «He accused me of sandbagging.»
Grace Blackwell → Vera Rubin Token Throughput Increase (ISO power)
10× at premium tier
NVLink 72 scale-up and FP4 precision enable highest-value inference workloads to scale massively.
Hopper → Vera Rubin + Grok Total Speed-Up (one gigawatt factory)
350× token generation rate
From 2 million to 700 million tokens per gigawatt in two years; Moore's Law would deliver ~1.5×.
NVIDIA Compute Growth (10 years)
40 million times
From DGX-1 (170 teraflops, 2016) to Vera Rubin (3.6 exaflops per NVLink 72 rack, 2026).
Committed Demand Through 2027
$1 trillion
Up from $500 billion through 2026 announced one year ago. Jensen: «We are going to be short.»
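The headline ratios above are easy to sanity-check; a quick sketch using only figures from the table (the ~1.5× Moore's Law baseline is the one quoted above):

```python
# Tokens generated per gigawatt factory (figures from the table above).
hopper_tokens = 2_000_000             # Hopper baseline
vera_rubin_grok_tokens = 700_000_000  # Vera Rubin + Grok, two years later

speedup = vera_rubin_grok_tokens / hopper_tokens
print(speedup)  # 350.0 -- the claimed 350x

# Moore's Law over the same 24 months would deliver roughly 1.5x,
# so the architectural gain is ~233x beyond transistor scaling alone.
print(round(speedup / 1.5))  # 233
```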

6

The Token Factory: Throughput × Speed = Revenue

Every data center is now power-limited; optimizing tokens-per-watt and inference latency directly determines profitability.

Jensen introduced the mental model every AI CEO will adopt: «Your data center is no longer a file repository—it's a factory that generates tokens.» He drew a two-axis chart: vertical = throughput (tokens per watt), horizontal = token speed (latency/interactivity). The smarter the AI (longer context, more reasoning), the lower the throughput; the faster the inference, the higher the price tier.

NVIDIA segments the market into four tiers: free (high throughput, low speed), $3/million tokens (medium), $6/million, and $45–$150/million for premium, ultra-low-latency research workloads. «As a research team, $150 per million tokens using 50 million tokens per day is not even a thing,» Jensen said. The key insight: Grace Blackwell lifted every tier, but Vera Rubin + Grok created an entirely new premium segment—35× faster than Blackwell at the highest-value tier. This architectural leap translates directly into revenue expansion: a simplified model shows Blackwell generating 5× more revenue than Hopper in the same gigawatt, and Vera Rubin another 5× over Blackwell.

Jensen's punchline: «What you do this year will show up precisely next year as your revenues. This chart is what it's all about.» Every hyperscaler, cloud provider, and AI startup will now manage their infrastructure as a manufacturing operation, balancing power allocation across tiers to maximize token margin.
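The tier-and-power framing lends itself to a simple model. The sketch below is illustrative only: the per-million-token prices follow the tiers quoted above, but the power split and per-tier throughput figures are invented for the example.

```python
# Illustrative token-factory economics for a power-limited site.
# Prices ($/million tokens) follow the tiers quoted above; the
# tokens-per-megawatt numbers and the power split are assumptions.

# tier -> (price per million tokens, tokens per megawatt-day)
tiers = {
    "free":    (0.0,  9e9),   # high throughput, low speed
    "basic":   (3.0,  5e9),
    "mid":     (6.0,  3e9),
    "premium": (45.0, 5e8),   # ultra-low latency, lowest throughput
}

# Hypothetical split of a 1 GW (1,000 MW) factory across tiers.
allocation_mw = {"free": 300, "basic": 400, "mid": 200, "premium": 100}

daily_revenue = sum(
    mw * tiers[tier][1] * tiers[tier][0] / 1e6
    for tier, mw in allocation_mw.items()
)
print(f"${daily_revenue:,.0f}/day")  # $11,850,000/day
```

Raising the premium tier's throughput (the Vera Rubin + Grok effect) or shifting power toward it is exactly the revenue-expansion lever the chart describes.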


7

Vera Rubin System Architecture: Seven Chips, Five Racks, One Supercomputer

NVLink 72 GPU Rack
72 Rubin GPUs, 3.6 exaflops, 260 TB/s all-to-all bandwidth. Sixth-gen NVLink spine switch, liquid-cooled at 45°C hot water. Install time reduced from two days to two hours; all cables eliminated.
🤖
Grok3 LPX Token Rack
Eight Grok LP30 chips per rack, massive on-chip SRAM, deterministic dataflow, compiler-scheduled. Tightly coupled to Vera Rubin via low-latency Ethernet for disaggregated inference—35× speed-up at premium tier, ideal for ultra-fast token decode.
💾
STX AI-Native Storage
Vera CPU + CX9 + Bluefield 4 DPUs. Accelerates cuDF (structured data), cuVS (vector stores), and KV cache. «100% of the world's storage industry is joining us»—agents pound storage harder than humans ever did.
🔗
Spectrum X CPO Ethernet
Co-packaged optics (electrons → photons on-chip) invented with TSMC, world's first in production. Spectrum 6 scale-out networking increases energy efficiency and resiliency; copper and optical scale-up paths both supported through Rubin Ultra and Feynman generations.

8

OpenClaw: The Operating System for Agentic AI

Open-source framework surpassed Linux adoption in weeks; NVIDIA wrapped it in NemoClaw for enterprise security.

Jensen called OpenClaw «the most popular open source project in the history of humanity… it exceeded what Linux did in 30 years, in just a few weeks.» Peter Steinberger's framework provides scheduling, resource management, multi-modal I/O, tool access, and agent orchestration—essentially an OS for agentic computers. Within days of launch, «hundreds of people» were queuing up, and a ClawCon conference materialized.

The implication is massive: every SaaS company must now have an «OpenClaw strategy,» just as they once needed Linux, Kubernetes, or HTTP strategies. NVIDIA responded by releasing NemoClaw, a secure, enterprise-ready reference design built on OpenClaw. It adds OpenShell (network guardrails, privacy router, policy engine integration) to prevent agents from accessing sensitive data or executing unsafe code. «Access sensitive information, execute code, communicate externally—obviously this can't possibly be allowed,» Jensen warned.

NVIDIA also bundled its open frontier models (NemoTron, Cosmos, Groot, BioNemo, AlpaMayo, Earth-2) and launched the NemoTron Coalition—partners include Black Forest Labs, Cursor, Mistral, Perplexity, and others committing to co-develop NemoTron 4. The goal: enable every enterprise to deploy domain-specific agents safely, using world-class base models that NVIDIA will «keep working on every single day… NemoTron 3 followed by NemoTron 4, Cosmos 1 by Cosmos 2.»


9

From SaaS to GaaS: The Enterprise IT Renaissance

Every software company will rent agents, not tools; engineers will need annual token budgets.

BEFORE OPENCLAW
Data Centers, Tools, Digital Workers
Enterprise IT stored files, ran codified workflows in tools (ERP, CRM), and humans operated those tools. Consultants integrated systems. Software companies sold licenses. Value captured through seat count and support contracts.
AFTER OPENCLAW
Token Factories, Agents, GaaS Revenue
Every SaaS company becomes «gas»—agent-as-a-service. Enterprises allocate token budgets to employees: «I'm going to give them probably half their base pay in tokens so they could be amplified 10×.» Recruiting pitch: «How many tokens come with my job?» Revenue shifts from seats to token consumption.
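The token-budget idea reduces to simple arithmetic. A hedged sketch: the $6/million-token price is the mid tier quoted earlier in this summary, while the salary figure is a made-up example.

```python
# "Half their base pay in tokens": hypothetical budget math.
base_pay = 200_000            # assumed annual base salary, $
token_budget = base_pay / 2   # the "half base pay" allocation
price_per_million = 6.0       # $/million tokens, mid tier from the summary

tokens_per_year = token_budget / price_per_million * 1e6
print(f"{tokens_per_year / 1e9:.1f}B tokens/year")   # 16.7B
print(f"{tokens_per_year / 365 / 1e6:.0f}M tokens/day")  # ~46M
```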

10

Physical AI at Planetary Scale: Robotaxis, Humanoids, Space Data Centers

18 million robo-ready cars/year, 110 robot partners, T-Mobile AI-RAN, and Vera Rubin Space One.

1

Autonomous Vehicles: AlpaMayo & Robo-Taxi Fleet
BYD, Hyundai, Nissan join Mercedes, Toyota, GM—18 million robo-taxi-ready vehicles per year. Uber partnership for multi-city deployment. AlpaMayo reasoning models narrate actions, explain decisions, follow voice commands.

2

Humanoids & Industrial Robots: Isaac Lab, Cosmos, Groot
110 robot partners (ABB, KUKA, Universal Robots, Foxconn, Disney Research). Isaac Lab for training, Cosmos World Models for synthetic data, Groot foundation models for reasoning. Disney's Olaf learned to walk in Omniverse using the Newton physics solver on NVIDIA Warp.

3

AI-RAN & Edge Compute: NVIDIA Aerial
T-Mobile partnership: base stations become «AI infrastructure platforms.» Aerial AI-RAN reasons about traffic, adjusts beamforming, saves energy. «That radio tower used to be a radio tower—now it's a robotics radio tower.»

4

Space Data Centers: Vera Rubin Space One
Thor is radiation-approved and already flying in satellites for imaging. Next: data centers in space. «In space, no conduction, no convection, just radiation.» NVIDIA engineers are solving thermal challenges for orbital compute at scale.


11

DSX: Digital Twin Platform for AI Factory Design & MaxQ Optimization

Omniverse-based blueprint saves «a factor of two» in power efficiency across trillion-dollar buildouts.

NVIDIA introduced DSX, a digital twin platform where all vendors—compute, cooling, electrical, networking—meet virtually before hardware ships. Partners include Siemens (Star CCM Plus thermals), Cadence (Reality internal simulation), ETAP (electrical), PTC Windchill (PLM), Dassault (3D Experience systems engineering), Jacobs (custom Omniverse apps), and Procore (virtual commissioning).

Once live, the twin becomes an AI agent-driven operator. DSX MaxQ dynamically orchestrates infrastructure: Phaedra agents manage cooling and electrical, Emerald agents interpret grid demand and adjust power. Jensen: «There's no question in my mind there's a factor of two in here. And the factor of two at the scale we're talking about is gigantic.» At trillion-dollar capex scale, a 2× efficiency gain is worth hundreds of billions in recovered capacity and avoided buildout. DSX aims to eliminate squandered power, maximize token throughput, and compress construction timelines across every AI factory deployment globally.
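The scale of that claim can be sized with one line of arithmetic. In the sketch below, the $1T figure is the committed-demand number from the keynote; treating the factor-of-two efficiency gain as halving the infrastructure needed for a given token output is a simplifying assumption.

```python
# Sizing the "factor of two" at trillion-dollar scale.
committed_capex = 1e12   # $1T committed demand through 2027 (keynote figure)
efficiency_gain = 2.0    # DSX MaxQ's claimed factor of two

# Same token output from half the footprint frees the other half.
recovered_value = committed_capex * (1 - 1 / efficiency_gain)
print(f"${recovered_value / 1e9:,.0f}B")  # $500B
```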


12

The Roadmap: Rubin Ultra (Kyber), Feynman (Rosa CPU, LP40, CPO Scale-Up)

📦
Rubin (2026) & Rubin Ultra
Rubin: NVLink 72 (Oberon rack, copper) or NVLink 576 (optical scale-up). Rubin Ultra: Kyber rack, 144 GPUs vertically mounted, mid-plane NVLink spine switches—no rear cabling. Grok LP35 adds FP4 tensor cores for «another few X-factor speed-up.»
🚀
Feynman (2027+)
New GPU, LP40 LPU (uniting NVIDIA + Grok team), Rosa CPU (named for Rosalind Franklin), Bluefield 5, CX-10. Kyber CPO for co-packaged optical scale-up alongside copper. Spectrum 6 for scale-out. «Every single year, brand new architecture.»
🔩
Backward Compatibility & Multi-Path
Oberon standard racks remain available for customers preferring evolutionary upgrades. NVIDIA supports copper scale-up (Kyber) and optical scale-up (CPO) simultaneously. «Copper going to still be important? Yes. Optical scale-up? Yes. Scale-out optical? Yes. We need a lot more capacity.»

13

«If You Have the Wrong Architecture, Even Free Isn't Cheap Enough»

Jensen's closing argument: $40B gigawatt factory amortized over 15 years demands best perf/watt from day one.

I've said before, if you have the wrong architecture, even if it's free, it's not cheap enough. And the reason for that is because no matter what happens, you still have to build a gigawatt data center. You still have to build a gigawatt factory, and that gigawatt factory, for 15 years amortized across, that gigawatt factory is about $40 billion. Even when you put nothing on, it's $40 billion in. You better make for darn sure you put the best computer system on that thing so that you could have the best token cost. NVIDIA's token cost is world class, untouchable at the moment.

Jensen Huang


14

Tickers Mentioned

NVDA · NVIDIA Corporation

15

People

Jensen Huang
CEO, NVIDIA
host
Sarah Goh
Conviction (pre-game host)
mentioned
Alfred Lim
Sequoia Capital (NVIDIA's first venture capitalist)
mentioned
Gavin Baker
NVIDIA's first major institutional investor
mentioned
Dylan Patel
Semi-Analysis (analyst)
mentioned
Peter Steinberger
Creator, OpenClaw
mentioned
Kimberly Powell
NVIDIA (healthcare AI lead)
mentioned
Satya Nadella
CEO, Microsoft
mentioned

Glossary
NVLink 72: NVIDIA's sixth-generation scale-up interconnect switch fabric enabling 72 GPUs to operate as one unified domain with 130–260 TB/s all-to-all bandwidth.
FP4 Precision: 4-bit floating-point tensor core format introduced in Blackwell; delivers massive inference speed and energy-efficiency gains without loss of accuracy.
Tokens per Watt: the key AI-factory metric, the number of tokens generated per watt of power consumed; determines throughput at ISO power in a power-constrained data center.
Co-Packaged Optics (CPO): technology integrating optical transceivers directly onto switch silicon, translating electrons to photons on-chip; the NVIDIA + TSMC process is the world's first in production (Spectrum X).
Dynamo: NVIDIA's disaggregated inference orchestration software; pipelines pre-fill (Vera Rubin) and decode (Grok) across heterogeneous processors to maximize throughput while minimizing latency.

Disclaimer: This is an AI-generated summary of a YouTube video for educational and reference purposes. It does not constitute investment, financial, or legal advice. Always verify information against original sources before making decisions. TubeReads is not affiliated with the content creator.