NVIDIA GTC Keynote 2026
NVIDIA is betting a trillion dollars on a radical shift in computing—from data centers that store files to AI factories that manufacture tokens. The company claims inference demand has exploded 10,000 times in two years, while the value of frontier models has plummeted from exclusive to commoditized. Now, with the Vera Rubin platform and the Grok acquisition, Jensen Huang is promising a 350× increase in token generation speed in just 24 months. Can the world's infrastructure absorb this pace? And will OpenClaw's overnight explosion as «the most popular open source project in human history» force every company to rethink its entire software stack before it's ready?
Key Takeaways
The «inference inflection» has arrived: AI now performs productive work requiring 10,000× more compute than two years ago, while usage has grown 100×—a combined million-fold demand increase Jensen attributes to reasoning (O1), code-generation agents (Claude Code), and agentic workflows.
NVIDIA pivoted from chip vendor to vertically integrated AI factory architect, delivering Vera Rubin systems that generate 350× more tokens per gigawatt than Hopper (H200) in the same 24-month span—far exceeding Moore's Law's expected 1.5× gain.
OpenClaw «open-sourced the operating system of agentic computers,» forcing every SaaS company to become a «GaaS» (agent-as-a-service) company; NVIDIA wrapped it in NemoClaw with security, privacy routing, and enterprise guardrails to make corporate deployment safe.
Physical AI is scaling across 18 million robo-taxi-ready vehicles per year (BYD, Hyundai, Nissan, plus prior Mercedes, Toyota, GM) and 110 robot partners; NVIDIA's AlpaMayo reasoning model and Cosmos World Foundation enable «compute is data» synthetic training at scale.
NVIDIA now produces «multi-gigawatts of AI factories per month» with Vera Rubin already running at Azure; the company insists its install base, 20-year CUDA flywheel, and horizontal openness make it the only platform customers can deploy «with complete confidence» at trillion-dollar scale.
In Brief
NVIDIA has engineered a 40-million-times compute increase in a decade and now sees $1 trillion in committed demand through 2027, built on three pillars: AI factories optimized for token economics, OpenClaw as the «operating system» for agentic computing, and physical AI deployed at planetary scale—from autonomous vehicles to humanoid robots to space data centers.
The Inference Inflection: Why Demand Exploded 1,000,000×
Three model breakthroughs—ChatGPT, O1 reasoning, Claude Code—triggered a million-fold compute jump in two years.
Jensen opened by declaring 2025 «NVIDIA's year of inference» and explained why computing demand has surged beyond all forecasts. He traced three inflection points: ChatGPT (generative AI era), OpenAI's O1 (reasoning models that reflect, plan, and decompose problems), and Anthropic's Claude Code (the first agentic model that reads, compiles, tests, and iterates code). Each wave multiplied both token input (context) and output (thinking) by orders of magnitude.
He quantified the shock: «The computing demand of the work has gone up by 10,000 times, and usage has gone up by 100 times. That's why computing demand has increased by one million times in the last two years.» This is no longer training-dominated—AI must now reason, generate, and act in production, which means continuous inference at massive scale. Every startup, every hyperscaler, every enterprise is constrained not by training capacity but by inference throughput.
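The million-fold figure is simply the product of the two growth factors Jensen cites; a minimal sketch in Python, using only the numbers from the quote above:

```python
# Combined inference-demand growth, per the keynote figures:
# compute per task is up 10,000x and usage is up 100x in two years.
compute_per_task_growth = 10_000
usage_growth = 100

total_demand_growth = compute_per_task_growth * usage_growth
print(f"{total_demand_growth:,}x")  # the combined million-fold increase
```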
The consequence: 100% of NVIDIA engineers now use a combination of ChatGPT, Claude Code, Codex, and Cursor. «There's not one software engineer today who is not assisted by one or many AI agents,» Jensen said. The shift from perception to generation to reasoning to action has made AI «able to do productive work»—and that productive work consumes tokens at rates no one anticipated even twelve months ago.
NVIDIA's Three Platforms and the CUDA Flywheel
«Hopper pricing is going up»: Why Old GPUs Still Print Money
Six-year-old Ampere cloud pricing rises because CUDA's reach and continuous optimization sustain value.
Jensen revealed an under-discussed fact: «Ampere, shipped six years ago, the pricing of Ampere in the cloud is going up.» Why? NVIDIA's install base spans hundreds of millions of GPUs, all architecturally compatible. Every new software optimization released by NVIDIA benefits millions of deployed systems simultaneously. The combination of high reach, continuous updates, and accelerating application diversity means older GPUs retain—and even increase—their utility. This dynamic is unique to NVIDIA and explains why customers can deploy infrastructure «with complete confidence» that useful life will be long and total cost of ownership low.
Grace Blackwell to Vera Rubin: 350× Token Speed in 24 Months
NVIDIA's roadmap leapfrogs Moore's Law with architecture, not transistors—delivering 35× to 50× per generation.
The Token Factory: Throughput × Speed = Revenue
Every data center is now power-limited; optimizing tokens-per-watt and inference latency directly determines profitability.
Jensen introduced the mental model every AI CEO will adopt: «Your data center is no longer a file repository—it's a factory that generates tokens.» He drew a two-axis chart: vertical = throughput (tokens per watt), horizontal = token speed (latency/interactivity). The smarter the AI (longer context, more reasoning), the lower the throughput; the faster the inference, the higher the price tier.
NVIDIA segments the market into four tiers: free (high throughput, low speed), $3 per million tokens (medium), $6 per million tokens, and $45–$150 per million tokens for premium, ultra-low-latency research workloads. «As a research team, $150 per million tokens using 50 million tokens per day is not even a thing,» Jensen said. The key insight: Grace Blackwell lifted every tier, but Vera Rubin + Grok created an entirely new premium segment—35× faster than Blackwell at the highest-value tier. This architectural leap translates directly into revenue expansion: a simplified model shows Blackwell generating 5× more revenue than Hopper in the same gigawatt, and Vera Rubin another 5× over Blackwell.
Jensen's punchline: «What you do this year will show up precisely next year as your revenues. This chart is what it's all about.» Every hyperscaler, cloud provider, and AI startup will now manage their infrastructure as a manufacturing operation, balancing power allocation across tiers to maximize token margin.
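The factory framing reduces to simple arithmetic: daily revenue is the power allocated to each tier, times that tier's token throughput, times its price. In the toy model below, the per-million-token prices come from the keynote; the throughput figures and allocation splits are invented placeholders, so this is a sketch of the accounting, not NVIDIA's numbers:

```python
# Toy token-factory model: revenue = tokens generated x price per token,
# split across pricing tiers under a fixed power budget.
POWER_BUDGET_GW = 1.0

# tier -> (price in $ per 1M tokens [from the keynote], hypothetical
#          throughput in millions of tokens per GW-day [placeholder])
TIERS = {
    "free":     (0.0,  9000),
    "standard": (3.0,  6000),
    "plus":     (6.0,  4000),
    "premium":  (45.0,  500),
}

def daily_revenue(allocation: dict) -> float:
    """Dollars per day for a power allocation (fractions summing to 1)."""
    assert abs(sum(allocation.values()) - 1.0) < 1e-9, "allocate 100% of power"
    revenue = 0.0
    for tier, fraction in allocation.items():
        price, tokens_m = TIERS[tier]
        revenue += fraction * POWER_BUDGET_GW * tokens_m * price
    return revenue

# Shifting power away from the free tier toward premium tiers is what
# "managing the factory for token margin" means in practice:
print(daily_revenue({"free": 0.5, "standard": 0.3, "plus": 0.2, "premium": 0.0}))
print(daily_revenue({"free": 0.1, "standard": 0.3, "plus": 0.3, "premium": 0.3}))
```

In this framing, Vera Rubin's contribution is a new high-price tier with usable throughput, which is why the keynote equates this year's architecture choice with next year's revenue.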
Vera Rubin System Architecture: Seven Chips, Five Racks, One Supercomputer
OpenClaw: The Operating System for Agentic AI
Open-source framework surpassed Linux adoption in weeks; NVIDIA wrapped it in NemoClaw for enterprise security.
Jensen called OpenClaw «the most popular open source project in the history of humanity… it exceeded what Linux did in 30 years, in just a few weeks.» Peter Steinberger's framework provides scheduling, resource management, multi-modal I/O, tool access, and agent orchestration—essentially an OS for agentic computers. Within days of launch, hundreds of people were queuing up, and a ClawCon conference materialized.
The implication is massive: every SaaS company must now have an «OpenClaw strategy,» just as they once needed Linux, Kubernetes, or HTTP strategies. NVIDIA responded by releasing NemoClaw, a secure, enterprise-ready reference design built on OpenClaw. It adds OpenShell (network guardrails, privacy router, policy engine integration) to prevent agents from accessing sensitive data or executing unsafe code. «Access sensitive information, execute code, communicate externally—obviously this can't possibly be allowed,» Jensen warned.
NVIDIA also bundled its open frontier models (NemoTron, Cosmos, Groot, BioNemo, AlpaMayo, Earth-2) and launched the NemoTron Coalition—partners include Black Forest Labs, Cursor, Mistral, Perplexity, and others committing to co-develop NemoTron 4. The goal: enable every enterprise to deploy domain-specific agents safely, using world-class base models that NVIDIA will «keep working on every single day… NemoTron 3 followed by NemoTron 4, Cosmos 1 by Cosmos 2.»
From SaaS to GaaS: The Enterprise IT Renaissance
Every software company will rent agents, not tools; engineers will need annual token budgets.
Physical AI at Planetary Scale: Robotaxis, Humanoids, Space Data Centers
18 million robo-ready cars/year, 110 robot partners, T-Mobile AI-RAN, and Vera Rubin Space One.
Autonomous Vehicles: AlpaMayo & Robo-Taxi Fleet
BYD, Hyundai, Nissan join Mercedes, Toyota, GM—18 million robo-taxi-ready vehicles per year. Uber partnership for multi-city deployment. AlpaMayo reasoning models narrate actions, explain decisions, follow voice commands.
Humanoids & Industrial Robots: Isaac Lab, Cosmos, Groot
110 robot partners (ABB, KUKA, Universal Robotics, Foxconn, Disney Research). Isaac Lab for training, Cosmos World Models for synthetic data, Groot foundation models for reasoning. Disney's Olaf learned to walk in Omniverse using the Newton physics solver on NVIDIA Warp.
AI-RAN & Edge Compute: NVIDIA Aerial
T-Mobile partnership: base stations become «AI infrastructure platforms.» Aerial AI-RAN reasons about traffic, adjusts beamforming, saves energy. «That radio tower used to be a radio tower—now it's a robotics radio tower.»
Space Data Centers: Vera Rubin Space One
Thor is radiation-approved and flying in satellites for imaging. Next: data centers in space. «In space, no conduction, no convection, just radiation.» NVIDIA engineers are solving thermal challenges for orbital compute at scale.
DSX: Digital Twin Platform for AI Factory Design & MaxQ Optimization
Omniverse-based blueprint saves «a factor of two» in power efficiency across trillion-dollar buildouts.
NVIDIA introduced DSX, a digital twin platform where all vendors—compute, cooling, electrical, networking—meet virtually before hardware ships. Partners include Siemens (Star CCM Plus thermals), Cadence (Reality internal simulation), ETAP (electrical), PTC Windchill (PLM), Dassault (3D Experience systems engineering), Jacobs (custom Omniverse apps), and Procore (virtual commissioning).
Once live, the twin becomes an AI agent-driven operator. DSX MaxQ dynamically orchestrates infrastructure: Phaedra agents manage cooling and electrical, Emerald agents interpret grid demand and adjust power. Jensen: «There's no question in my mind there's a factor of two in here. And the factor of two at the scale we're talking about is gigantic.» At trillion-dollar capex scale, a 2× efficiency gain is worth hundreds of billions in recovered capacity and avoided buildout. DSX aims to eliminate squandered power, maximize token throughput, and compress construction timelines across every AI factory deployment globally.
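The «factor of two» claim can be sanity-checked with back-of-envelope arithmetic, under the simplifying assumption that buildout cost scales with the power a given token output requires:

```python
# Value of a 2x power-efficiency gain at trillion-dollar buildout scale.
planned_capex_usd = 1e12   # the ~$1T scale cited in the keynote
efficiency_gain = 2.0      # the DSX MaxQ "factor of two"

# The same token output now needs only 1/gain of the planned capacity,
# assuming capex scales with the power required (a simplification):
capex_needed_usd = planned_capex_usd / efficiency_gain
avoided_buildout_usd = planned_capex_usd - capex_needed_usd

print(f"${avoided_buildout_usd / 1e9:.0f}B of buildout avoided")
```

At $1T of planned capex this yields $500B of avoided buildout, which is the "hundreds of billions" order of magnitude the keynote claims.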
The Roadmap: Rubin Ultra (Kyber), Feynman (Rosa CPU, LP40, CPO Scale-Up)
«If You Have the Wrong Architecture, Even Free Isn't Cheap Enough»
Jensen's closing argument: $40B gigawatt factory amortized over 15 years demands best perf/watt from day one.
«I've said before, if you have the wrong architecture, even if it's free, it's not cheap enough. And the reason for that is because no matter what happens, you still have to build a gigawatt data center. You still have to build a gigawatt factory, and that gigawatt factory, for 15 years amortized across, that gigawatt factory is about $40 billion. Even when you put nothing on, it's $40 billion in. You better make for darn sure you put the best computer system on that thing so that you could have the best token cost. NVIDIA's token cost is world class, untouchable at the moment.»
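The arithmetic behind the quote, with the $40 billion and 15-year figures taken from it; the annual token output is a hypothetical placeholder, used only to show how cost per token falls out of the amortized fixed cost:

```python
# Amortizing a $40B gigawatt AI factory over 15 years, per the quote.
factory_cost_usd = 40e9
amortization_years = 15

annual_cost_usd = factory_cost_usd / amortization_years  # ~$2.67B/year

# Cost per token scales inversely with output: the more tokens the same
# gigawatt produces, the cheaper each one is -- hence "even free isn't
# cheap enough" for an architecture with poor tokens-per-watt.
tokens_per_year = 1e15  # hypothetical placeholder, not an NVIDIA figure

cost_per_million_tokens = annual_cost_usd / (tokens_per_year / 1e6)
print(round(annual_cost_usd / 1e9, 2), "billion USD per year")
print(round(cost_per_million_tokens, 4), "USD per million tokens")
```

Doubling tokens-per-watt halves the cost per million tokens against the same $2.67B annual fixed cost, which is the whole argument for picking the best-performing system from day one.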
Securities Mentioned
People
Glossary
Disclaimer: This is an AI-generated summary of a YouTube video for educational and reference purposes. It does not constitute investment, financial, or legal advice. Always verify information against the original sources before making decisions. TubeReads is not affiliated with the content creator.