Transcript
Speaker 1:
[00:00] Catalyst is supported by FischTank PR, an award-winning PR firm focused on climate and energy tech, renewables and sustainability. FischTank is known for generating prominent and effective media coverage for the brands they work with. If you want a PR partner that's thoughtful, shoots straight and gets results, you'll like FischTank PR. To learn more about FischTank's approach, visit fischtankpr.com. That's F-I-S-C-H, fischtankpr.com.
Speaker 2:
[00:27] When utilities need flexible capacity they can count on, they turn to EnergyHub. EnergyHub works with more than 170 utilities, coordinating over 2.5 million devices to manage 3.4 gigawatts of flexibility, built for the moments when utilities can't afford uncertainty. EnergyHub builds and operates virtual power plants that utilities actually stake their grid planning on, coordinating EVs, batteries, thermostats, and more through a single platform built for utility scale. Predictive, verifiable, and designed to perform when it counts. Learn more at energyhub.com.
Speaker 3:
[01:00] Trillions of dollars are flowing into clean and critical infrastructure. But those investments aren't driven by technology alone. They're shaped by markets, by policy, by capital, and by the institutions that connect them. I'm Alfred Johnson, CEO of Crux and host of a brand new podcast, Critical Capital. Each episode, I talk with people deploying capital, shaping policy, and building the clean economy. Tune in as we unpack how progress is actually made. Listen to Critical Capital on Spotify, Apple, or wherever you get your podcasts.
Speaker 1:
[01:33] Latitude Media, covering the new frontiers of the energy transition.
Speaker 4:
[01:38] I'm Shayle Kann. Welcome to Catalyst Live. Thank you so much. Okay. I am here with Amin Vahdat, who's sitting next to me here. Amin is the Chief Technologist for AI Infrastructure at Google. Amin, welcome.
Speaker 5:
[01:57] Thank you for having me. Excited to be here.
Speaker 4:
[02:00] Okay. I want to provide a little bit of context for the conversation we're about to have here. I know this is why everybody is here in this room, at this conference, but there's a lot going on in AI infrastructure at the moment, particularly as it pertains to energy. Amin leads the infrastructure team at Google. In the Q4 2025 earnings report, Google announced its intent to spend somewhere between $175 and $185 billion in CAPEX this year. It's not all for AI infrastructure, but let's assume a decent portion of it is just for this purpose right now. Let me offer you some context for that number. We had a big election in Hungary this week. That number is roughly the GDP of Hungary. For numbers that are more relevant to this audience: we spend about $25 to $35 billion a year in CAPEX on electricity transmission infrastructure in the United States. So this is five to seven times that amount, just from Google, just in one year. If you want to talk about big infrastructure projects, let's talk about Vogtle. Vogtle is the notoriously, extremely expensive nuclear plant that's the first nuclear project built in the United States in decades. Vogtle cost about $30 billion. So this is five or six Vogtles per year. If you want to move outside energy, just for one fun one: I was in San Diego last week, which happened to be when the lunar mission splashed down. So I looked up NASA. NASA's annual budget is $25 billion. So this is seven NASAs that Amin is responsible for spending each year, or at least this year, on infrastructure. So with a lot of infrastructure and great CAPEX come a lot of great questions. I have many. Let's dive into some of them. The first one, I guess, is one of the big ones that's been on my mind, and I want your perspective on it. We clearly have been living in a world where the scale of the individual data centers has been a driving force, right? We've gone from, you guys were probably building tens of megawatts per data center years ago, to hundreds of megawatts, to now gigawatts. I think probably everybody here appreciates that for training purposes, for model training purposes, scale is really important. This is why we're getting these huge data centers. But for inference, I've heard mixed things. As we shift more into an inference world, it may or may not be true that you need that level of individual scale. In your mind, how much does scale matter when it comes to inference computing? When I say scale, I mean scale of the individual data center.
Speaker 5:
[04:37] Yeah, it's a great question, and I think you have it spot on. I remember when Google announced its first data center in Oregon, in The Dalles, this was 23, 24 years ago, before I was at Google: 10 megawatts, and people were just stunned that a little company would go build a 10-megawatt data center. That was a big number, and actually no one else was building data centers for their own compute infrastructure at the time. It's just grown from there: 100 megawatts, a gigawatt, etc. It's a really good question in terms of the split between training and serving. So here's where, to me, it gets perhaps most interesting. At the scale that we're operating, we want the latest, greatest, most efficient, most capable training cluster essentially on an annual basis. If you look at our announcements for TPUs and NVIDIA's announcements for GPUs, the latest, greatest is coming out every year. Every year, the latest, greatest is by definition better than last year's. Let's pick this gigawatt number. Let's say you buy the latest, greatest and you put a gigawatt somewhere, and maybe you put a couple of these down. After a few years, one, two, probably not much more than that, whoever is doing the training is going to want the new latest, greatest, and then they're going to want a gigawatt somewhere else. Now you've got a gigawatt of capacity that used to be used for training. What are you going to do with it? Probably going to serve on it. So now the question is, could you get away with lower scale? Yes, absolutely. In fact, we have lots of smaller deployments, lots of data centers with much less than a gigawatt of capacity, 10 megawatts in certain places.
Speaker 4:
[06:18] That serves equally well for inference?
Speaker 5:
[06:21] Inference in general, now, for our largest, most capable models, they are going to run on many chips, not just one chip, simultaneously. But you don't strictly need a gigawatt of capacity to be able to do useful work. You probably don't even need 100 megawatts of capacity. It gets a little bit more interesting than that because of, let's say, co-located compute and storage and networking and everything else. In other words, it's not just the accelerator. But no, strictly speaking, you could go to much smaller deployments and still be able to do inference. The life cycle aspect of it that I just described, as people cycle workloads over the capacity, is the more interesting one in terms of the footprint for serving.
Speaker 4:
[06:59] There's two interesting pieces to that. One is, as you're saying, just intrinsically for inference, you don't need the same scale effect. But there is probably some minimum scale that's viable, as you said, because you are co-locating it with other things. So you're probably not doing 10 kilowatt deployments.
Speaker 5:
[07:15] No.
Speaker 4:
[07:16] Okay. So we're in the tens of megawatts, or hundreds, but not gigawatts necessarily.
Speaker 5:
[07:20] These racks today are trending toward hundreds of kilowatts, just the same.
Speaker 4:
[07:24] Right, for the rack.
Speaker 5:
[07:25] For the rack, with multiple chips in it. But I mean, absolutely, you're going to need some minimum scale.
Speaker 4:
[07:32] Okay. Then the second interesting piece is what you said about repurposing. There, I guess, it's a question of demand. You put a few gigawatts down for training, you move on to the next few gigawatts for training of whatever the next TPU or GPU is, but is that enough to serve the booming inference demand? I think the assumption has been, look, we're training now, but that is going to result in inference demand shooting upward, right? So then that would imply it's not nearly going to be enough.
Speaker 5:
[08:01] Exactly. I think we're at that transition point. I mean, we said last year that we're entering the age of inference. I think with agents exploding today, that's well and truly happening. So the analogy I would use is from Google's early days with web search. It used to be that most of the compute at Google was dedicated to building the search index. Pretty quickly, you hoped, and it fortunately turned out to be true, most of the capacity needed to be used to serve that index. Same thing here. Most of our capacity maybe earlier on was used for building the model, but you would hope that it transitions to serving the model pretty quickly, and you're absolutely right that we're there. So I do think that over time also, as the efficiency and latency of these models improve, more disparate deployments are going to be valuable. What I mean by that is, today, each individual token that is generated by the model takes a fair amount of latency. So much so that you might not be able to tell the difference here, let's say in San Francisco, if you're accessing content on the East Coast, maybe even in Europe sometimes, relative to San Francisco. In general, for, let's say, maps or search or ads, that's not true. The computing is sufficiently efficient and the latency is sufficiently low that you will notice the speed-of-light propagation delay of the network if you're going to a faraway site. So as these services become more interactive, as they become more efficient, and that is still going to be a journey, we're not there today, you're going to want geographic locality. That's also going to impact reliability. Because, again, you can think of it as a highway system: the less distance you have to go, the more likely it is that you're going to find the capacity you need for your request.
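As a rough illustration of that speed-of-light point, here is a back-of-the-envelope sketch; the route lengths and fiber refractive index are illustrative assumptions, not figures from the conversation:

```python
# Round-trip propagation delay over assumed fiber routes.
C_VACUUM_KM_S = 299_792   # speed of light in vacuum, km/s
FIBER_INDEX = 1.47        # typical refractive index of optical fiber

def fiber_rtt_ms(route_km: float) -> float:
    """Round-trip propagation delay over a fiber route, in milliseconds."""
    v = C_VACUUM_KM_S / FIBER_INDEX  # ~204,000 km/s in glass
    return 2 * route_km / v * 1000

for label, km in [("SF to San Jose", 80),
                  ("SF to East Coast", 4_700),
                  ("SF to Europe", 9_500)]:
    print(f"{label}: ~{fiber_rtt_ms(km):.1f} ms round trip")
```

At roughly 46 ms coast to coast, a model that takes longer than that to produce each token hides the network entirely; a maps or ads query served in a few milliseconds does not.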
Speaker 4:
[09:45] So I guess, wrapping up this piece of it, the core question that I've been trying to think about, and that I think a lot of folks in this world that intersects energy and AI have been thinking about as well: as we shift more and more into inference, where you can make an argument for smaller pixel sizes making sense for data centers, does it end up being easier in three years, five years, something like that, to go build a new gigawatt data center and find a site on the grid where you can interconnect that gigawatt data center? Or does it become easier and/or faster to build fifty 20-megawatt data centers or something like that?
Speaker 5:
[10:23] Yeah, that's a good question. In general, we've found over the years that it's easier to build a smaller number of larger sites. There are still asterisks there. You don't want to be too concentrated, again, from a fault tolerance and geographic locality perspective. In other words, the argument of build as big a site as you can in one place breaks down rather quickly. But having 1,000 sites, each with 0.1 percent of your capacity, has other overheads associated with it in terms of management. So I think that it'll really come down to geographic locality, and probably a medium number of medium-sized data centers, sorry for the lack of precision there, augmented with a small number of large data centers.
Speaker 4:
[11:08] Right, which makes sense. Okay. So the next question that's been on my mind about the future of this infrastructure, one that has a lot of direct relevance to the energy side of the equation, is about reliability. It's historically just been gospel, I would say, that data centers require the highest reliability, three nines or whatever the number is, to the extent that the standard footprint of a normal data center in the Cloud world, pre-AI, but even the early AI data centers as well, has a UPS system and backup generators and all this kind of stuff, just to make sure that reliability is that high. Two questions for you. One, why? Why is the reliability requirement so high? And two, is there any argument for that changing in the future? Because that reliability requirement causes so much challenge and CAPEX, right? Why is it such a problem that we have long lead times on gas generators, all this kind of stuff? It is because of the reliability requirement. So is it intrinsic to something about what you're doing, or is it just a function of how the business has evolved?
Speaker 5:
[12:11] Yeah, a fantastic question. If I were to send one message here, it is: no, it is not intrinsic, and we should be thinking about lower-reliability power delivery overall. I'll tell you why it has been that way, but I'll also get to why it has changed substantially. For most modern software services, the compute is actually a relatively small fraction of your cost. So it makes sense to over-provision it. You want to have 99.999 percent, five nines, reliability for your software services. You don't need quite that, but many of our data centers aim for four nines, minutes of downtime a year at maximum, which, as you said, has a large amount of cost associated with it. Now, as of today, given how constrained resources are and how costly they are, a much larger fraction of your overall service cost is in the compute. So if I were to go to my internal customers and say, would you rather have four nines of availability and half the capacity, or two nines of availability and twice the capacity, which do you pick? Very often, not always, but very often, they'll say, oh my gosh, give me the 2x capacity. If I have 99 percent, and 99 percent sounds good, you all know the math, that's 3.65 days of downtime a year. That's a lot. Three and a half days, half a week every year, you're down. You don't have the capacity. But if the other 51 and a half weeks I get twice the capacity, many people would say, sign me up.
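The arithmetic behind those nines, and the capacity trade-off posed above, is simple to check directly (a minimal sketch; pure arithmetic, no Google figures):

```python
# Downtime implied by each availability level ("nines").
HOURS_PER_YEAR = 24 * 365

for availability in (0.99, 0.999, 0.9999, 0.99999):
    downtime_h = (1 - availability) * HOURS_PER_YEAR
    print(f"{availability:.3%} available -> {downtime_h:6.2f} hours "
          f"(~{downtime_h / 24:.2f} days) down per year")

# The trade-off posed above, in expected delivered capacity-hours:
print(f"half capacity at four nines:  {0.5 * 0.9999:.3f}x")
print(f"double capacity at two nines: {2.0 * 0.99:.3f}x")
```

99 percent availability really is 3.65 days a year of downtime, and the two-nines option still delivers nearly four times the expected capacity-hours.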
Speaker 4:
[13:52] Yet, I don't see that happening. Is it happening and I'm not seeing it?
Speaker 5:
[13:57] Without saying too much, it's happening. I would say that co-design with our customers at Google has actually been one of our sources of significant efficiency.
Speaker 4:
[14:07] Okay. So that's a good segue into my next question, which is behind-the-meter power: generation, storage, whatever it might be. There are multiple reasons that one might put something behind the meter, right? It can be for reliability purposes; that is one. But oftentimes now people are talking about bridge power and things like that. What is your view on this? There's an enormous amount of planned behind-the-meter power. Is that the direction of travel? Will it be the direction of travel for an extended period of time?
Speaker 5:
[14:38] It's a very important opportunity for us, and it is one of latency. Again, a different kind of latency; in other words, what is the time to delivery of capacity? What I'll say, though, before going down that path, is that we at Google would actually prefer grid-connected capacity. Why?
Speaker 4:
[14:58] I was going to say, why? Is it reliability?
Speaker 5:
[15:00] It is, in the end, provisioning for a given level of reliability. If you're behind the meter, you're going to have to do all that provisioning yourself. Now, an aspect of this is actually quite powerful for us, and to give an example, going back to the reliability question: in March, we actually hit a significant milestone in agreements with utilities for a gigawatt of demand response across our fleet. What does demand response mean? It means that for the utility, for the one week of the year where they have maximum demand, we're willing to brown down. That also goes to the availability commitment that we make to our customers. Why? Because that allows them to provision not for their worst, coldest, hottest, whatever it is, week of the year, but to provision for the 98th percentile, whatever it is. We'll give up that capacity in exchange for, in the end, more availability of power and less cost, both for us, but also for the ratepayers in the region. So if we have to do all the reliability work ourselves, rather than being able to shift capacity back and forth when we're not using it... like, let's say that we actually have behind-the-meter power generation, and we will, behind the meter in quotes. What if we can, when we're not using it, give it back to the utility? In general, the way we look at it is, we like behind the meter if it means that we get the capacity up most quickly, but we're always going to look to invest with the utilities to bring the transmission. Maybe it's a year after, maybe it's two years after. But the point is, this gets us the capacity we need, and maybe we need some bridge power in the interim. But that bridge power, in the limit, could actually be mobile.
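A minimal sketch of the provisioning arithmetic behind that demand-response idea, using an entirely synthetic load shape; the numbers are illustrative assumptions, not utility or Google data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic hourly system demand for one year, in MW (illustrative only)
hours = 8760
base = 8_000 + 1_500 * np.sin(np.linspace(0, 2 * np.pi, hours))
demand = base + rng.normal(0, 400, hours)

peak = demand.max()
p98 = np.percentile(demand, 98)
print(f"provision for absolute peak:   {peak:,.0f} MW")
print(f"provision for 98th percentile: {p98:,.0f} MW")
print(f"headroom a flexible load can cover: {peak - p98:,.0f} MW, "
      f"needed only ~{0.02 * hours:.0f} hours/year")
```

The flexible load gives up capacity for roughly the top two percent of hours, and in exchange the utility avoids building generation and wires for the absolute peak.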
Speaker 4:
[16:40] Tying these two things together, one thing I haven't fully wrapped my head around with bridge power is the reliability question. If you're still in a world where you're demanding, let's say it's not four nines, let's say it's two nines of reliability, but you need those two nines with just on-site generation for some period of time, however long that bridge is, you've got to build a lot of on-site stuff, right? You end up over-provisioning really heavily, and then eventually you get the grid connection, and now what do you do with all this stuff? So during that bridge power period, are you offering a different level of service somehow? Or are you actually provisioning for your two nines, or whatever your ultimate reliability requirement is going to be, from day one with on-site resources?
Speaker 5:
[17:19] It's both. One way to look at it is that most people have trouble, unless they've operated at scale, thinking in terms of these numbers: what's the difference between 99.9 and 99.5 or 99.99 in a given year? And in a given year, they might actually be identical. So some people are just going to say, I'm going to roll the dice, I hope I get lucky. And sometimes they will, and they actually won't experience any issues. What I would say, though, is that we also look to see, okay, beyond some of this bridge power that we're going to need, what are the more permanent sources? Would we use solar, wind, nuclear, other sources that will be permanent but might not be able to get us all the way to the power capacity that we might need? And then we have to augment with, whatever it might be, turbines, gas or something else.
Speaker 4:
[18:05] Which could be mobile, as you said.
Speaker 5:
[18:06] Which could be mobile.
Speaker 4:
[18:07] Yeah. I guess the question for me then is, do you feel that we're going to end up with all this stranded on-site generation as a result of this? Is there any world where we build excess generating capacity? Or are we just so far underwater now that it doesn't matter?
Speaker 5:
[18:32] You know, I'd love to have that problem. I'd love to have that problem. I think that one of the things that we aim for at Google, and I think you all as well, is a world of energy abundance. And I think that the world would be a better place if energy were abundant. It's not. I'm not just talking about AI or data centers or anything. Energy is a limiter. I think we're so far away from that world that I'd love to have the conversation. I don't think it's the next few years where we have too much.
Speaker 1:
[19:02] Are you tired of overpaying for big name PR firms, but not really knowing what they're delivering? Is your comms team wasting time reviewing lengthy messaging briefs and decks, instead of engaging journalists or producing content? Are you wondering why your competitors are getting press and you aren't? FischTank PR is an award-winning climate and energy tech, renewables and sustainability focused PR firm dedicated to elevating the work of both early stage and established companies. Whether you need to position yourself as a thought leader in between project announcements or translate complex ideas and technologies into tangible, compelling stories that resonate with the media, FischTank can help. Check out fischtankpr.com. That's F-I-S-C-H, fischtankpr.com.
Speaker 2:
[19:44] Virtual power plants are becoming a reliable way for utilities to manage capacity, but enrolling devices is just the start. What really matters is confidence: knowing those resources will perform when dispatched, and being able to prove it from the control room to the living room. EnergyHub's platform handles the full picture, from near real-time forecasting to locational dispatch to the kind of rigorous verification that holds up when regulators, grid operators, or leadership ask, did it deliver? Easy enrollment creates momentum; proven performance builds trust. That's why more than 170 utilities rely on EnergyHub to manage over 2.5 million devices, delivering 3.4 gigawatts of flexible capacity. See what that looks like at energyhub.com.
Speaker 3:
[20:28] We're living through a profound economic shift, and energy sits at the center of all of it. Trillions of dollars are flowing into power plants, transmission lines, battery factories, data centers. But the future of energy isn't shaped by technology alone. It's shaped by markets, by policy, by capital, and by the institutions that connect them. I'm Alfred Johnson, CEO of Crux, the capital platform for the clean economy. Join me for my brand new show, Critical Capital, as I talk with people deploying capital, shaping policy and building projects. Together, we unpack how risk is priced, how incentives are structured, and how progress is actually made. Listen to Critical Capital on Spotify, Apple, or wherever you get your podcasts.
Speaker 4:
[21:14] Let's talk about the different resources that you might put behind the meter. You mentioned you can build on-site solar or wind or whatever, you can do nuclear, you can get your generation that way. You can get your generation with gas as well, and then you can build batteries to buffer. For the data centers you're going to build that do have on-site infrastructure beyond just the UPS and the backup generator, do they end up looking like little microgrids, where you're optimizing across a bunch of resources? Or is it generally going to be a lot like, I don't know, the xAI data center that got built, which is basically just a bunch of gas generators?
Speaker 5:
[21:50] The microgrid and the software control here are going to be absolutely key. This is a place where I think we as a community are underinvested today. If you think about that demand response scenario I talked about, if we need to do a brown down, it's not that the whole site goes away. It's, okay, maybe we need to give up 20%, 30%, 40% of our capacity. Okay, which 20%, 30%, 40%? What's the signal to the software? What do we drain from where? What SLOs do we shift? Do we say that for the next week we're going to need to fail over 20% of requests from this location to somewhere else? Maybe a whole building gets powered down for a week, or most of a building. It is going to look exactly like a microgrid, and now, can you distribute the power dynamically, also, by the way, in response to the workload? We talked about training versus serving; the power footprints of the two are very different.
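As a toy illustration of that "which 20%?" question, here is a minimal sketch of a brown-down planner that drains the lowest-priority workloads first; the workload names, priorities, and power figures are all hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    power_mw: float
    priority: int  # lower number = shed or migrate first

def plan_brown_down(workloads: list[Workload], reduce_mw: float) -> list[str]:
    """Return the workloads to drain, lowest priority first,
    until the requested power reduction is met."""
    shed, freed = [], 0.0
    for w in sorted(workloads, key=lambda w: w.priority):
        if freed >= reduce_mw:
            break
        shed.append(w.name)
        freed += w.power_mw
    return shed

fleet = [
    Workload("batch-training", 40.0, priority=0),
    Workload("overnight-agents", 25.0, priority=1),
    Workload("batch-inference", 20.0, priority=2),
    Workload("interactive-chat", 15.0, priority=3),
]
# Give up ~30% of a 100 MW site for the utility's peak week
print(plan_brown_down(fleet, reduce_mw=30.0))  # -> ['batch-training']
```

In practice the same signal would also drive the SLO shifts and request failover described above, not just draining.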
Speaker 4:
[22:45] I assume the latency sensitivity is super different as well. Even within inference, as you said, there are some things that are going to be super latency sensitive and some that very much will not be.
Speaker 5:
[22:55] If you've got your overnight agent running, then it might be all serving, but it might be batch serving that's not sensitive from a human-in-the-loop perspective. But others, your chat interactions or whatever, might be very latency sensitive.
Speaker 4:
[23:08] Is there an extent to which you are uniquely capable of executing on this, in the sense that Google is certainly the most vertically integrated player? You go from the TPUs through the Cloud service, you have Gemini, you're running your own workloads and so on. So if part of what is required in order to reach this future, where data centers are flexible and can operate at slightly lower reliability and all those kinds of things, is that you have to differentiate among the workloads, such that some can operate as necessary at really low latency and others at higher latency, Google can do all that in-house. I mean, you have customers for Gemini, so you have to serve those customers, but you have more capability than most. How do you think it disseminates out beyond Google?
Speaker 5:
[23:52] So I think that it's a good question. It's something that we think a lot about. In other words, what we want to do is design end-to-end systems that, taken together, create capabilities. This word, capabilities, is actually essential to what we discuss internally a lot, so I appreciate the question. Create capabilities that otherwise wouldn't be possible. I do think that it comes down to this vertical integration. In other words, for, let's say, our TPUs, we co-design them with the building. We co-design them with the power generation source. We co-design them with the DeepMind team that builds the Gemini models. So it's the software above, the models above that, the chip design, which we do in my team as well. That's integrated with the rack, that's integrated with the data center, that's integrated with the power source. And if between each of these boundaries you have a custom optimized interface that gets you a few percent, those few percent up and down start adding up, multiplying out, in fact, to something meaningful. And that is exactly what we go after.
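A toy illustration of how those few percent at each boundary multiply out; the boundaries and percentages below are made up purely to show the arithmetic:

```python
# Hypothetical efficiency gain at each co-designed interface
layers = {
    "model <-> chip":        1.04,
    "chip <-> rack":         1.03,
    "rack <-> data center":  1.05,
    "data center <-> power": 1.03,
}

total = 1.0
for boundary, gain in layers.items():
    total *= gain
    print(f"{boundary}: +{gain - 1:.0%}")
print(f"end-to-end: +{total - 1:.1%}")
```

Four modest gains of three to five percent compound to roughly 16 percent end to end, which at the CAPEX scale discussed earlier would be worth tens of billions of dollars a year.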
Speaker 4:
[24:53] Okay. So I'm going to ask you to rank some things. There's been a little bit of a public debate that I have found interesting about what the rate limiter on the growth of AI is. Let's assume for the moment it's not demand; relative to supply today, there's essentially infinite demand. Maybe that changes at some point in the future, and I'd be interested in your perspective on if and when that might happen, but it's certainly not the case today. So it's going to be something else. There has been an argument that it is chips and the chip supply chain, particularly some of the things upstream in the chip supply chain, like EUV tools for lithography and so on. This is a room full of power grid people, so there's certainly also an argument that it is power. I think there's a third argument, maybe, that it could be labor at some point. You can add a fourth to that if you want, but if you had to rank order, what is the biggest rate limiter to growth between power, chips and labor? How would you rank them?
Speaker 5:
[25:55] Yeah, and I would add sort of data center construction and delivery as…
Speaker 4:
[26:00] So EPC as a broad category, and not just labor, you mean?
Speaker 5:
[26:04] Yes. Labor is one component of it, but even the supply chain associated with it, electricals, mechanicals, cooling, etc., is another aspect beyond the chip supply chain. I would say that in delivering the end-to-end system, we unfortunately don't have the luxury of focusing on a single limiter. I would say, very sincerely and honestly: at 10 a.m. it's labor, at noon it's power, and at 2 p.m. it's chips, every single day.
Speaker 4:
[26:34] All right. I'm going to force you to answer the question in a different way. You're supposed to spend whatever it is, $175 to $185 billion this year building out new infrastructure. If you woke up tomorrow and Sundar said, you've got to spend $300 billion now, what would you go try to solve?
Speaker 5:
[26:56] I'm not trying to dodge the question, but I very sincerely feel that we'd actually have to go scale all of them, and that every single one of those is at the limit of what we can do for the envelope that we have. Is one of them inherently easier to scale than the others among the options? Honestly, no. All three of those are major, major issues for us. I'm sure that there is an answer, but I'm not relaxed about any of them. This is a real thing. I couldn't pick one. I would say, Sundar, wow, $300 billion. Okay, I'll get back to you as to what the exact issues are going to be.
Speaker 4:
[27:32] On the labor and EPC one, I'm curious about your perspective, not just related to data center construction but in general, on the rise of physical AI as a category, the rise of robotics, in whatever form factor that ends up taking. It has been a second wave. There was an LLM wave of excitement in the public; I'm sure in your world, it's been going on longer. But I would say we had this wave of digital AI excitement, and now a physical AI wave as well. Do you have a heuristic in your head for how demand shapes up between those two, or how infrastructure will get built relative to those two?
Speaker 5:
[28:14] Yeah, it's a good question. I mean, I think that on the digital side, rather than the physical side, the demand obviously today is much, much, much larger. The architecture for the physical side is still in development. I would say the best examples of it right now are self-driving cars. In other words, if you think about it, these self-driving cars really are robots on four wheels. And this use case in particular, you can imagine, is actually one of the hardest use cases. Safety is paramount. Safety is absolutely paramount. What that means is that you actually give up some capability, some scale, for certainty and reliability. To my knowledge, without speaking about any of the specifics, this means more of the Edge use cases are relevant there, because the multiplexing associated with the Cloud is probably less desirable. In other words, if you have a blip and you're really counting on some computation... if you're doing a chat and your chat app, whatever it is, is down for five seconds, fine. Do something else for five seconds, you come back. If the robot can't get its answer in five seconds, depending on the use case, that could be catastrophic.
Speaker 4:
[29:33] How much of that happens on device, or, in the case of Waymo, in the car? How much of the compute that occurs in a Waymo, or in a humanoid robot in the future, is going to happen inside that instantiation of the physical AI device versus getting pulled from the Cloud?
Speaker 5:
[29:49] Well, without talking about any specific use case, I believe that a lot of it is going to have to be on device and dedicated to that use case. Not all; again, there are going to be different kinds of use cases. If it's, what kind of music do I want to play for my passenger? I don't know, maybe that's okay if it blips for a few seconds. But if it's, which turn do I take now in an evasive maneuver? It seems like you want that on device.
Speaker 4:
[30:12] Right. Which then makes the argument for the Edge infrastructure stuff a little bit weaker. Because, the thing is, people have made the argument, back to that Edge versus medium-number-of-medium-sized versus hyperscale question. I think people thought the strongest argument for Edge, small localized compute, was things like Waymo. But if the really safety-sensitive stuff or the really latency-sensitive stuff is all going to happen on device, then maybe when you do pull from the Cloud, you can handle the latency of going to the East Coast.
Speaker 5:
[30:48] That's a very good question. I would need to think through it more, but if you think about some other related use cases, like factory automation: in that case, would you have an Edge deployment, something that looks more like an Edge deployment, provisioned for handling 100 or 1,000 or whatever it is robots for that particular use case at the Edge? Again, good question. I'm not giving you an authoritative answer.
Speaker 4:
[31:10] No. In that case, you might do that for cost-saving reasons, right? Putting all that computing into every individual robot is expensive, in dollars or in power.
Speaker 5:
[31:18] Right. Because putting that much compute into every one of these robot arms may be prohibitive.
Speaker 4:
[31:24] Right. For the infrastructure inside the robot.
Speaker 5:
[31:26] Yes.
Speaker 4:
[31:27] Okay. I want to finish up by asking you about something that I feel like I don't hear as much talk about as you'd expect, though in the long term there should be, which is CAPEX and cost savings in data center infrastructure. Right now, we're just in a world of, we need to build as much as we possibly can, and it seems like speed is the only thing that matters. But in the long arc of history, one presumes, ultimately the cost of that CAPEX is going to be important. Where do you see the biggest opportunities? Like, if you think out into the future, if you were to build the same amount of capacity in megawatts in five years as you are today, is there a world where you turn that $175 to $185 billion into $100 billion? And what are the things that could get you there?
Speaker 5:
[32:10] We're looking at this all the time. In other words, it is probably one of the biggest focus areas in my team. I won't say the biggest, but it's top three for sure. It might be the biggest. So when we say we're spending X dollars, we're saying that if we had to have done this work with last year's technology, we would have had to spend, making up the number, 1.2X. In other words, every year we're looking to deliver substantial efficiencies, such that if we had to do it again, it would be way more efficient. This starts with software, and again, there's a lot of opportunity on the software side, but lots of opportunity on the hardware side too. Let me give a very simple example. What is the ratio of power to space in your data center? In other words, if you have, let me pick a number, 100 megawatts, how big a building do you build? How big a building do you build for the 25-year lifetime of that building? Not just one generation of TPU or GPU or whatever, but maybe five or six generations of them. Now, you could be conservative and build an infinitely sized building and say, okay, whatever comes next, I'm going to be set. Or... now, if you think about the watts per linear foot of a disk rack versus a GPU rack, it's radically different. Like, I don't know, 100X different between disk and GPU. What are you going to assume? So if we could actually co-design and optimize and say, you know what, this building is going to be a GPU building, that building is going to be a TPU building, and that building is going to be a disk building: huge opportunity. But now I've limited my fungibility. If I change my mind in five years' time and I have a disk building and I want to put some TPUs there, there's going to be a lot of empty space. A lot of empty space. So I think we figure these things out, not perfectly, but every year, every generation, we're looking to drive that co-design, for that optimization, managing the flexibility while optimizing the cost.
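As a minimal sketch of that power-to-space question, here is the floor-space arithmetic under a few rack-density assumptions; every number below is an illustrative guess, not a Google figure:

```python
SITE_POWER_W = 100e6  # 100 MW of critical IT power

# (watts per rack, square feet per rack slot) -- assumed values
rack_profiles = {
    "disk rack":        (5_000, 10),
    "CPU rack":         (15_000, 10),
    "accelerator rack": (500_000, 14),  # "hundreds of kilowatts," per the talk
}

for name, (watts, sqft) in rack_profiles.items():
    racks = SITE_POWER_W / watts
    print(f"{name}: {racks:,.0f} racks, ~{racks * sqft:,.0f} sq ft of white space")
```

Under these assumptions, the same 100 megawatts fills roughly seventy times more floor space as disk racks than as accelerator racks, which is exactly the fungibility bet a 25-year building has to make.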
Speaker 4:
[34:14] It's interesting; from the outside, I think I would have assumed that you had already basically optimized to a T for power density. Everybody talks so much about density inside these data centers, for a variety of reasons. Some of that is because for training it actually is a performance thing, but for cost reasons as well. I would have assumed you're already at the maximum possible density given today's technology. It sounds like you're saying that hasn't always been the case, in part because we've been designing data centers to be more multi-purpose tools.
Speaker 5:
[34:46] Exactly. I would say five years ago, 10 years ago, it didn't pay to have that hyper-optimized, because you lost too much in flexibility, in a world where compute wasn't the dominant portion of your costs. If compute is not the dominant portion of your cost, you actually want to have flexibility and fungibility. When compute becomes a more dominant portion of your cost, you actually start thinking, okay, what am I going to do this year, next year, and the year after to make sure that I optimize it super well? The difference between storage and compute was at most 10x. The difference between storage and accelerators is approaching 100x. So the problem just got wider, and it's getting wider. The disks aren't consuming any more power; every generation, the accelerators are. And these kinds of problems exist even within an accelerator: if you look at the power footprints, and this is where the microgrids also come in, of serving versus training, they're radically different. If you just look at how much power we draw from the utility, or from our batteries, or from whatever, for one workload versus another, it could be a factor of two.
Speaker 4:
[35:55] Well, and the profiles of those workloads are very different, right? Training is notoriously very on-off, spiky, and, I don't know if this is still true within Google's data centers, but you solve for that by basically padding with filler workloads to try to make it smooth.
Speaker 5:
[36:10] We don't do this, but yes, others do.
Speaker 4:
[36:13] Some do. Yeah. The profile of the workloads ultimately impacts what other infrastructure you need on-site, your buffer system, all the power infrastructure, all those kinds of things.
Speaker 5:
[36:22] Yes.
Speaker 4:
[36:23] All right. I mean, this was very fun, very informative as I expected. Thank you so much for being here.
Speaker 5:
[36:28] Thanks for having me. This was great.
Speaker 4:
[36:35] Amin Vahdat is the Chief Technologist for AI Infrastructure at Google. This show is a production of Latitude Media. You can head over to latitudemedia.com for links to today's topics. This episode was produced by Max Savage Levenson, with mixing and theme song by Sean Marquand. Stephen Lacey is our Executive Editor. I'm Shayle Kann, and this is Catalyst.