TubeReads

The history and future of AI at Google, with Sundar Pichai

Just over a decade into his tenure as CEO, Sundar Pichai faces one of the defining tensions in tech history: Google invented the Transformer architecture that powers modern AI, yet OpenAI productized it first with ChatGPT. Now, as Alphabet plans to spend $175 billion in CapEx in 2026 — a staggering six-fold increase from just a few years ago — Pichai must navigate memory shortages, wafer capacity bottlenecks, and a capital allocation puzzle that spans autonomous vehicles, quantum computing, and the race to AGI. How does the company that pioneered so much of the AI stack think about its future when the constraints aren't just technical, but physical — from electrician shortages to data center permitting? And can Google's culture of methodical, safety-first product development coexist with the breakneck pace of consumer AI innovation?

Video length: 1:09:33 · Published Apr 7, 2026 · Video language: English
9–10 min read · 11,068 spoken words summarized in 1,839 words (6x)

1

Key points

1

Google didn't miss Transformers — the architecture was immediately deployed in Search via BERT and MUM, driving some of the largest quality jumps in the product's history. The company had LaMDA (an early ChatGPT analog) internally but couldn't ship it due to toxicity and higher product quality bars.

2

2026 AI infrastructure is supply-constrained across wafer starts, memory (especially critical), power, and permitting. Google could not spend $400 billion even if it wanted to; the bottleneck is physical capacity, not capital or demand.

3

Speed and latency remain core to Google's product DNA. Search teams now operate with millisecond-level latency budgets, and Gemini on TPUs is designed for frontier capability at dramatically lower inference cost — a competitive moat built on vertical integration.

4

Waymo succeeded because Google evaluated it at the "deeper technology level," tracking safety metrics and driver performance curves, and stayed committed through plateaus. A team starting Waymo today, by contrast, would benefit from end-to-end deep learning breakthroughs unavailable 15 years ago.

5

By 2027, Pichai expects profound workflow shifts across non-engineering teams, with agents handling tasks like budget forecasting. The current bottleneck isn't model capability but diffusion: prompting skills, data access, permissions, role definitions, and change management in large organizations.

In brief

Google is betting $175 billion that vertical integration — from TPUs to Transformers to products like Search, Waymo, and Gemini — will define the AI era, even as near-term supply constraints (memory, power, permits) force every player into a high-stakes game of capital allocation and creative compaction.


2

The Transformer Paradox: Why Google Shipped the Tech but Not the Product

Google deployed Transformers internally in Search years before ChatGPT launched.

Sundar Pichai pushes back on the narrative that Google invented Transformers but failed to productize them. "Transformers were done to solve a specific product need," he explains — the research emerged from tackling translation quality and scaling speech recognition to two billion users with limited chip capacity. The architecture was immediately applied to Search through BERT and MUM, driving "some of the biggest jumps in search quality" during a period when Google pulled ahead of competitors.

Google also had LaMDA — essentially an early ChatGPT — running internally, famously prompting an engineer to claim it was sentient. The company even launched AI Test Kitchen at Google I/O 2022 as a constrained version of LaMDA. The roadblock wasn't vision or capability; it was product quality bars. "The version I saw was a lot more toxic at a level. We couldn't have possibly put it out at that time," Pichai recalls. As a company with "this search quality bias," Google held itself to a higher standard for what counted as shippable.

Pichai acknowledges that OpenAI's launch timing (the week of Thanksgiving 2022, somewhat buried) caught the industry by surprise. But he contextualizes it: "If you're in consumer internet, you're going to have surprises." He draws parallels to YouTube emerging when Google had Video Search, or Instagram appearing when Facebook dominated social. The lesson: consumer internet allows small teams to prototype rapidly and create breakout moments in ways that hardware products like the iPhone cannot replicate.


3

Speed as Strategy: Latency Budgets in the Milliseconds

Latency as Product DNA
Speed has always been a core differentiator for Google — from the original Search results page displaying query time, to Gmail's fast search, to Chrome's performance. Pichai views latency as "one of the distinguishing features of a great product" and a signal that the technical underpinnings are sound.
📊
Millisecond-Level Budgeting
Search sub-teams now have formal latency budgets in the milliseconds. Ship something that shaves 3ms? You earn 1.5ms for your budget (50% credit), and 1.5ms goes to users. Depending on the feature, teams get 10–30ms budgets with rigorous reviews.
🚀
30% Faster in Five Years
Despite massive increases in Search functionality and AI integration, Google has improved Search latency by 30% over the past five years. This reflects the discipline of balancing capability expansion with performance optimization.
🔮
Flash Models for Speed
Gemini Flash models deliver "90% the capability of the pro models, but much faster, much more effective to serve." Vertical integration with TPUs enables this speed advantage, which Pichai sees as a moat in the AI product landscape.
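As a hypothetical illustration, the 50/50 latency-credit rule described in this section can be sketched as a simple ledger. The class and method names below are assumptions for the sketch, not anything Google has described:

```python
class LatencyBudget:
    """Sketch of a team-level latency budget with 50% optimization credit."""

    def __init__(self, budget_ms: float):
        self.budget_ms = budget_ms      # headroom the team may spend on new features
        self.user_savings_ms = 0.0      # latency improvements passed through to users

    def record_optimization(self, saved_ms: float) -> None:
        """A shipped saving is split: half credits the team, half goes to users."""
        self.budget_ms += saved_ms * 0.5
        self.user_savings_ms += saved_ms * 0.5

    def spend_on_feature(self, cost_ms: float) -> bool:
        """A new feature ships only if its latency cost fits the remaining budget."""
        if cost_ms > self.budget_ms:
            return False                # would regress latency: blocked at review
        self.budget_ms -= cost_ms
        return True


# Example matching the numbers above: a 10ms budget (in the 10–30ms range),
# and an optimization that shaves 3ms off serving time.
budget = LatencyBudget(budget_ms=10.0)
budget.record_optimization(3.0)          # team banks 1.5ms; users get 1.5ms
print(budget.budget_ms)                  # 11.5
print(budget.spend_on_feature(12.0))     # False: over budget, blocked
print(budget.spend_on_feature(11.0))     # True: fits, ships
```

The point of the 50% credit is incentive design: teams that make the product faster earn room to add functionality, while users still pocket half of every improvement.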

4

The $175 Billion Constraint Problem

Google faces physical bottlenecks on wafer starts, memory, power, and permitting.

2026 Planned CapEx
$175–185 billion
A dramatic scale-up from approximately $30 billion just a few years ago, reflecting AI infrastructure demands.
CapEx Ceiling
$400 billion (impossible)
Pichai notes Google could not spend $400 billion even if desired — memory, wafer capacity, electricians, and permitting are hard limits.
Memory as Critical Constraint
Short-term bottleneck
"Memory is definitely one of the most critical components now." Leading memory companies cannot dramatically expand capacity quickly, creating near-term supply constraints.
Search Latency Improvement
30% reduction
Achieved over the past five years despite massive functionality increases, demonstrating disciplined engineering.
Wing Delivery Reach Goal
40 million Americans
Pichai expects this level of Wing drone delivery access "in some reasonable time period" — not years out.

5

Capital Allocation in the Age of TPUs

Compute is now the scarcest resource; Pichai spends an hour weekly reviewing TPU budgets.

Capital allocation at Google has fundamentally shifted. "The scarce resource is compute in a lot of cases," Pichai explains, and managing it has become a weekly discipline. He spends a dedicated hour reviewing compute allocation at a granular level — tracking which projects and teams are using TPU resources and assessing returns. This mirrors the rigor historically applied to headcount planning, but with higher stakes: in a supply-constrained environment, every TPU hour deployed on one project is an opportunity cost for another.

The framework is methodical. Google forward-plans compute needs for both internal products and Google Cloud customers. Any customer commitment is "sacrosanct" — contractual obligations are met first. The Cloud team, like internal teams, operates in a constrained world where demand exceeds supply. But Pichai rejects zero-sum thinking: "It feels so far from a zero-sum game to me. The value of what people are going to be able to do is also on some crazy curve."

Historically, Google made early-stage bets with small funding but long-term commitment — Waymo, TPUs, quantum computing. Pichai evaluates these at the "deeper technology level," tracking underlying metrics like Waymo's driver safety curves or quantum logical qubit error rates. "As long as you're seeing that underlying tech," progress justifies continued investment. He increased Waymo funding two to three years ago when competitors backed off, betting on the team's ability to break through plateaus. The vertical integration from research to infrastructure to platforms creates "a very leveraged way to make progress" — one breakthrough in AI accelerates Search, YouTube, Cloud, and Waymo simultaneously.


6

Search, Agents, and the Future of Information Work

Search will evolve into an agent manager running asynchronous, long-running tasks.

TODAY
Search as Query → Results
The traditional model: a one-line prompt returns ranked results or a single answer. But this is already shifting — AI Mode in Search now handles deep research queries that don't fit the old paradigm. People have adapted to new modalities, and Pichai expects this to continue evolving rather than disappearing.
TOMORROW
Search as Agent Manager
In 10 years, Search will likely be "an agent manager in which you're doing a lot of things." Information-seeking queries will become agentic task completion with many threads running asynchronously. Device form factors and I/O will change radically, making it hard to predict exact interfaces — but the expansion is clear.

7

Why Waymo Survived When Others Were Cut

Evaluating underlying technology curves — not quarterly results — justified long-term commitment.

The more you're able to evaluate things at that deeper technology level, I think you tend to make those decisions better, or at least that's how I have tried to do it.

Sundar Pichai


8

The Intelligence Overhang: Why AI Adoption Lags Capability

Models are capable, but diffusion is limited by prompting skills, data access, and permissions.

1

Learning to Prompt
Engineers must develop general prompting skills, then company-specific expertise (e.g., which Stripe or Google tools to invoke). This learning curve slows adoption even when models are ready.

2

Collaboration on AI-Generated Code
High code velocity from AI creates blast-radius problems. Teams rewrite code multiple times before shipping, making it hard for many people to collaborate on a rapidly changing codebase.

3

Data Access and Permissions
Agentic answers require access to internal data (e.g., "How many times a day do people ask about deal status?"). Permissions engines must be rebuilt to handle AI querying sensitive information securely.

4

Role and Workflow Redefinition
Traditional Eng/PM/Design roles stem from a prior era. As AI handles more of each function, roles may merge. Change management in large organizations becomes the bottleneck, not model capability.

5

Security and Cost of Mistakes
At Google scale, mistakes are expensive. The company takes security seriously, adding another layer of caution that slows diffusion but will result in more robust products when solved.


9

The 2027 Inflection: From Engineering to Enterprise-Wide AI

Pichai expects profound workflow shifts across non-engineering functions by next year.

💡


When asked about fully agentic forecasting — where AI produces quarterly business projections with no human in the loop — Pichai targets 2027 as "an important inflection point for certain things." Engineering teams are early adopters, already living in agent manager workflows internally (called "Jet Ski" at Google, "Antigravity" externally). But the real transformation comes when non-engineering functions — finance, operations, design — adopt these tools at scale. "Even the people doing it, that is the workflow through which they would produce it," Pichai says of forecasting. For a while, teams may check AI output conventionally, but the crossover is imminent. The fixed cost of diffusion — retraining, transformation, permissions, identity access controls — is being paid now. By 2027, the jumps in what people can do will be dramatic.


10

Long Shots: Data Centers in Space and Small Bets That Compound

Google starts moonshots with tiny teams and small budgets, scaling only after milestones.

Pichai's favorite small project is audacious: "We are in the earliest stages of thinking about data centers in space." He frames it as constraint-driven creativity — if you take a 20-year outlook, where will you put data centers when terrestrial limits on power, land, and permitting become binding? It's a Waymo-in-2010 scale bet, staffed today by "literally a few people with a small budget to go to the first milestone." The discipline is to start small even for big ideas, then compound over time.

Other bets in the portfolio: scaling Wing drone delivery to 40 million Americans "in some reasonable time period," advancing quantum computing with logical qubit error correction, and robotics via Google DeepMind's Gemini Robotics models (state-of-the-art on spatial reasoning). Isomorphic Labs is rethinking drug discovery by applying AI models across the full pipeline, not just molecular design. These are methodical, long-term projects that benefit from one common accelerant: progress in AI models improves all of them simultaneously.


11

Tickers mentioned

GOOGL · Alphabet Inc.

12

People

Sundar Pichai
CEO, Alphabet & Google
guest
Patrick Collison
CEO, Stripe
host
Elad Gil
Investor & Entrepreneur
host

Glossary
Transformers: A neural network architecture (published in Google's 2017 "Attention Is All You Need" paper) that uses self-attention mechanisms and is the foundation for modern large language models like GPT and Gemini.
BERT and MUM: Google Search technologies built on Transformers. BERT (Bidirectional Encoder Representations from Transformers) and MUM (Multitask Unified Model) dramatically improved query and document understanding.
LaMDA: Language Model for Dialogue Applications, Google's conversational AI model and an early analog to ChatGPT, which famously prompted an engineer to claim it was sentient.
TPU: Tensor Processing Unit, Google's custom-designed AI accelerator chip optimized for machine learning workloads, first announced in 2016 and now in its seventh generation.
Logical qubit: In quantum computing, a qubit protected from errors through quantum error correction, enabling more stable and reliable computation than physical qubits alone.

Disclaimer: This is an AI-generated summary of a YouTube video for educational and reference purposes. It does not constitute investment, financial, or legal advice. Always verify information against the original sources before making decisions. TubeReads is not affiliated with the content creator.