transcript
Speaker 1:
[00:00] A couple of weeks ago, we covered the Claude Mythos release, the model that found decade-old security flaws overnight and scared the hell out of basically anyone who was following the AI story. So much so that the federal government is involved. But the part that we didn't get into is the backend that powered this model. Mythos was built on a chip from March 2024 that Jensen pulled out of his pocket on stage at GTC, which was the Blackwell chip. It had 208 billion transistors, everyone treated it like the future had arrived, and yet it took two years for us to get the first manifestation of that, which is Claude Mythos. 24 months from keynote to a working model. It happened with Hopper, it happened again with Blackwell, and it's going to happen again with their future chips. But the difference is we have a series of future chips that have already been announced today, which we can map out to where we're heading based on the trajectory we've seen with the previous chips. It's pretty awe-inspiring to see where we're going to go, considering there are three generations of chips that have already been announced since Blackwell: Vera Rubin, Rubin Ultra, and Feynman, each one many multiples more powerful than the last. And when you look at what Blackwell already produced in the very first version, it gets impossible to imagine a world where we don't reach AGI on hardware that's already been designed. Everything that's been announced and is going into production almost certainly is going to produce models indistinguishable from AGI, at least that's what it seems like on the surface.
Speaker 2:
[01:18] Yeah, so the story here in a single sentence is: AGI-like AI models are already here. We just haven't distributed them because we haven't powered up the GPUs that enable them. So everyone is obsessed with AI models. We talk about our favorite models, how we prompt them, how intelligent they are. But very few people are talking about the fact that the hardware is the thing that powers these things. It trains them, it inferences them, and it's still maybe 70 percent of the influence on how intelligent your model is. And the prime, most recent example of that has been Anthropic's Mythos release, right? You just mentioned it, it's discovered a bunch of different cybersecurity flaws. It is this all-powerful thing that governments around the world, including the US government and the Federal Reserve, are holding meetings with the top banks about, to talk about the craziness of this model and how we must prepare. There's a lot of doomer news out there about the future. Little do people know that this is powered by, or rather trained on, a GPU that was built 20 months ago. So we're talking about almost two years ago. It's called Blackwell, and I want to give you guys an idea of the timeline of what this looked like. So in March 2024, at Nvidia GTC, which is their developer conference, Jensen Huang comes on stage and he presents this gargantuan slab of metal. It looks very pretty, by the way. And he goes, this is Blackwell, GB200, GB300, a brand new GPU. We can train frontier models on it. Everyone gets so excited, their stock price absolutely ascends, right? The thing is, people couldn't get their hands on this until exactly a year later. So to give you guys an idea of the timeline: he announces it in March 2024. Then by the middle of the year, they discover there's a bit of a design flaw and they amend that. And then by the end of 2024, early 2025, they start shipping these units of Blackwell GPUs out to the top frontier AI labs. But there's an important nuance here, which is it's just the GPUs sitting in a data center. They aren't actually powered up. It's not until 6 to 12 months after that that these GPUs were finally powered up and used to train models, which is why we now start to see these new AGI-like models like OpenAI Spud and Claude Mythos come to fruition. So the point is there is a long gap between the frontier GPUs being announced and rolled out, and them actually being powered on to train the models. We've talked about Elon Musk and xAI a lot on this show before. They actually have the largest arsenal of these Blackwell GPUs. They bought about a million of them. The crazy part is that there are now not one, not two, but three new Nvidia GPU generations that have been announced at the most recent Nvidia GTC. So there is a major lag between frontier hardware and the new AI models that are being released. People don't understand this and we want to tell you the story.
Speaker 1:
[04:06] Just remember GPT-4, how long ago that was, and how that felt like the most pivotal model that OpenAI ever released. I mean, that was the big one right after ChatGPT came out. That was trained using the Hopper chips. The most recent model...
Speaker 2:
[04:20] Hopper is a word I haven't heard in a while, Josh.
Speaker 1:
[04:22] Yeah. Well, you know GPT-54, the most recent model that we're using every single day on ChatGPT? That was also trained on Hopper chips. The same chips trained models from GPT-4 all the way to GPT-54. It's a testament to how the efficiency gains of software can actually increase the throughput of hardware. I want to use that as an example because what we just got recently with Mythos from Anthropic seems to be the first real implementation of a true Blackwell model. Rumors are that Spud, the new OpenAI model that's coming, is going to be in the same class of power as their first Blackwell model. Even if we don't actually iterate on the hardware, with the amount of progress we're going to get from Blackwell models alone, it's difficult to imagine it doesn't become some sort of AGI, right? It's like when you think about the difference in intelligence between GPT-4 and GPT-54, how far we've come, that applied to Blackwell at this new scale seems crazy. But that's not even the crazy part, because we have an entire roadmap of three generations of chips that are coming, and we can very clearly map them to the gains we're going to see. And I think that's when things get particularly disturbing, because on the chart that we're looking at on screen now, we have Blackwell. That's where we are right now. Blackwell is a significant improvement over the previous generation. But then we have Vera Rubin, which jumps from 20 petaflops to 50 petaflops. That's a two-and-a-half to five times multiple on the compute. Then we have Rubin Ultra, which is scheduled for the second half of 2027. That is a 14 times multiple. And then we have Feynman in 2028, which is an estimated 30 to 50 times multiple on the current chip stack that we have today. And that's assuming we get no software progress at all. What we saw with the Hopper chips is that we got a tremendous amount of progress just from software. So when you combine this 30 to 50 times multiple with maybe another 100 times multiple on software, if we make another breakthrough, we're looking at some pretty insane improvements here that are really hard to wrap your head around.
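To make those compounding claims concrete, here's a minimal back-of-envelope sketch. The petaflop figures, dates, and per-generation multiples are simply the ones quoted in this segment, not official specs, and the 100x software factor is purely a hypothetical assumption.

```python
# Back-of-envelope sketch of the compounding multiples quoted in this segment.
# The petaflop figures, dates, and per-generation multiples are the ones cited
# on the show, not official specs; the 100x software factor is hypothetical.

blackwell_pflops = 20  # per-chip compute figure quoted for Blackwell

hardware_multiples = {
    "Vera Rubin":            50 / blackwell_pflops,  # 50 petaflops quoted -> ~2.5x
    "Rubin Ultra (H2 2027)": 14,                     # ~14x, as quoted
    "Feynman (2028)":        40,                     # midpoint of the quoted 30-50x
}

hypothetical_software_gain = 100  # assumed "another 100x on software", not announced

for name, hw in hardware_multiples.items():
    combined = hw * hypothetical_software_gain
    print(f"{name}: ~{hw:.1f}x hardware alone, ~{combined:,.0f}x with the assumed software gain")
```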
Speaker 2:
[06:30] I want to point out that these improvements, these multiples that you just mentioned, are just on the speed and power of these hardware modules. So it's going to work 3x harder or 14x harder, but it's also going to cost you a lot less to train the same type of intelligence or model. So the intelligence-per-density, which is a unit that we completely made up, and we don't know if it exists, but it somehow rhymes in my head at least, is improving, and it's going to get cheaper with each successive model. But if you want a bit of context as to what that looks like in terms of the models that you use today and what it's going to look like tomorrow, we have this other table here which maps that out. So with Blackwell today, you get about a 2-3x more intelligent, crazier model. That's what Claude Mythos is supposedly meant to be. It's a larger size, it's trained on these Blackwells, and you're going to see a bunch of similar models come out from OpenAI and xAI over the next couple of months.
Speaker 1:
[07:25] Just to pause you there, these are already models deemed too dangerous to release to the public.
Speaker 2:
[07:30] Yes. There are literally emergency meetings being called by the Fed Chair with the top banks. Actually, I read something yesterday that the NSA, the Pentagon, and the US Defense Department have re-engaged with Anthropic, after previously banning and blacklisting Anthropic, because it's so powerful.
Speaker 1:
[07:52] That's where we are today.
Speaker 2:
[07:53] That's today. So that's right here, 2026, 2-3X, right?
Speaker 1:
[07:56] Yes, crazy.
Speaker 2:
[07:57] Now, you might notice that by next year, we have a larger multiple on the original multiple. By next year, we're going to have a 10-15X improvement purely through Vera Rubin GPUs. Now, I must emphasize, this does not include post-training. This doesn't include all the fancy techniques that the AI labs themselves will implement to make a smarter model. This is just the hardware. It's like buying the hardware and training a model: today versus next year, you're going to get a 10-15X more intelligent model. But it gets even scarier. 2028, 30-50X. 2029, 100-200X. Now, I haven't seen these kinds of multiples in any other industry for any performance or hardware improvement, so I can't wrap my head around this, because it looks like just a few small numbers that are getting larger, but these are multiples of the predecessor, which means that we're probably going to get AGI honestly by the start of next year, and it'll be trained on hardware that currently exists and is rolling out. I'm just scared reading all of this, to be honest, because what happens when we have universal access to this? There's going to be a load of malicious actors who can use these models for various different things, but also, I don't know what these models are going to be capable of. They're going to be so much smarter than humans themselves.
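As a rough illustration of why the wording matters, here's a sketch of that chart as read out above. Whether each range is measured against today's Blackwell-era baseline or chains against the previous generation isn't stated explicitly, so both readings are shown; the ranges themselves are just the ones quoted on the show.

```python
# Sketch of the improvement chart as read out above. The ranges are simply the
# ones quoted; whether they compound generation-over-generation or are all
# measured against today's baseline isn't stated, so both readings are shown.

ranges = {2026: (2, 3), 2027: (10, 15), 2028: (30, 50), 2029: (100, 200)}

# Reading 1: each range is relative to today's Blackwell-era models.
for year, (lo, hi) in ranges.items():
    print(f"{year}: {lo}-{hi}x vs. today")

# Reading 2: each range multiplies the previous generation (the multiples chain).
lo_c = hi_c = 1
for year, (lo, hi) in ranges.items():
    lo_c, hi_c = lo_c * lo, hi_c * hi
    print(f"{year}: {lo_c:,}-{hi_c:,}x vs. today if the multiples chain")
```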
Speaker 1:
[09:13] The disturbing thing is that this technology is here. It's no longer an engineering problem or a physics problem, necessarily. It's just a matter of actually producing the thing, plugging it into an outlet, and putting it online. This is coming. There are no novel breakthroughs required to make this a reality. Now, what that looks like on the other side, I don't know, but I think it's safe to assume the velocity of improvement we're going to get is certainly not slowing down. It's turning to more closely resemble a vertical line than anything else. I think it raises the question: at what point do we reach AGI, and how do we even define that? Because I'm not sure we've spoken about that much on the show, but Ejaaz, when you say AGI, what do you mean by AGI? What would you be looking for to declare, okay, we have finally reached AGI?
Speaker 2:
[09:58] Okay. This is my own made-up definition, but it's what will make me go, okay, this is AGI. It would be a single AI model, not many, but a single AI model that advances the frontier of three key major industries autonomously. I'll pick these industries as examples. The financial industry, so it trades better than the average Wall Street... sorry, than the best hedge fund or investor. It is able to make assessments better than any of the financial analysts, the top experts, et cetera, in that industry. In science, it has discovered a bunch of medical cures for some major diseases, such as cancer, Alzheimer's, and stuff like that, that top scientists at their top level could not figure out. It accelerates their research. And maybe one other industry that I can't think of right now. But it's when these models start doing things that the best of the best humans right now couldn't have figured out or foreseen themselves. Do you have a similar definition?
Speaker 1:
[10:59] Yeah, I think that sounds right. I think and again, it's very fuzzy. Everyone has their own custom definition of what they believe AGI is going to be. For me, it's just AI that's smarter than the smartest human at pretty much any cognitive task that exists. So you can go to this model and it will be better than anyone else who you can ask on planet Earth about anything. The problem with models today is they're very spiky. You can do this for code probably, and it can code better than every human on Earth. But if you ask it a generalized question about something that you really know a lot about, there's a lot of times where it's not completely accurate or it will respond as if it has the intelligence of a three-year-old. It fails the reasoning tests of a lot of simple things. It still feels like it's this very spiky entity. Once it is fully developed, once it is actually better at every cognitive task, that includes physical things too. That includes understanding physics of the real world, world models, that feels like AGI. Then artificial super intelligence, ASI, feels like it is smarter than all humans combined. It's like if we put all of our brains together, no matter how long we tried, we can never come up with the things that artificial super intelligence will come up with. I mean, will we get there using this chip architecture? Possibly. I'm seeing a 50X multiple, not including the software multiples, and those compounding on top of each other, at the rate that we're moving, seems like the only real constraint is going to be physical. It's going to be actually rolling out these models and powering them on.
Speaker 2:
[12:23] Well, another crazy thing is, I think a lot of people, including myself, would assume that with every chip upgrade, it's going to be more expensive and it's going to be bigger. It's going to be clunkier, right? Like the data centers are going to get bigger, it's going to be more expensive. I wish I had a chart to show this, but it's actually the complete inverse. And I'll give you some numbers to explain that, right? So a reasoning task that costs $1 on Blackwell costs 20 cents on Vera Rubin, which is rolling out as we speak or later this year. And it'll only cost 7 cents on Rubin Ultra, which starts to get released by the start of next year. So the cost is going down pretty massively. Now, for 2028, Jensen announced the Feynman GPU, right? A single rack of that, so we're talking about just a couple of these stacked on top of each other, will process more compute than was required to train GPT-4, which you mentioned earlier, Josh. So the point is less is more, somehow more powerful, and also somehow cheaper relative to the intelligence that you're building. If you assume this intelligence is going to reach this ASI, AGI-like state, it's going to make you money as well. So you end up having, I'm afraid to say this, but the best of both worlds. I don't know what humans are going to be doing, but it's great for AI, basically.
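Here's the quick arithmetic behind those per-task costs, as a minimal sketch. The $1, 20-cent, and 7-cent figures are the ones quoted just above; the $1M budget is an arbitrary number for illustration only, not anything from the episode.

```python
# Quick arithmetic on the per-task costs quoted above: $1 on Blackwell,
# 20 cents on Vera Rubin, 7 cents on Rubin Ultra. The $1M budget is an
# arbitrary illustrative number, not a figure from the episode.

cost_per_task = {"Blackwell": 1.00, "Vera Rubin": 0.20, "Rubin Ultra": 0.07}
budget = 1_000_000  # hypothetical inference spend in dollars

for chip, cost in cost_per_task.items():
    tasks = budget / cost
    vs_blackwell = cost_per_task["Blackwell"] / cost
    print(f"{chip}: {tasks:,.0f} tasks for ${budget:,}, ~{vs_blackwell:.1f}x cheaper per task than Blackwell")
```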
Speaker 1:
[13:48] Yeah. There's no world in which things don't get better, and it feels like right now we're really just constrained by this compute power. There was this great meme that I saw online. It said Mythos is too powerful for public release, but the reality is that they're just completely out of compute, and Anthropic can't actually supply the tokens required to give Mythos to the world. These optimizations, these cost structures... yeah, there it is, we've got it on screen now. Great meme. But the cost structures that come with these new chip generations are going to completely remove that constraint, at least for now, until whatever that next-generation model is that's so powerful it's constraining GPUs again. The interesting thing is that OpenAI has the exact same thing going on. All these models are converging on the same spot, but they all seem to be compute constrained.
Speaker 2:
[14:33] I think what critics will push back on though, Josh, for everything that we've said so far is, okay, cool, you can buy these new hardware things, but why would you do that if you could just wait a few months or six months and buy the next thing? Jensen's just shipping out these products, making a lot more money. It doesn't make sense. These things are depreciating assets. By the time you've bought the first one and you've ramped that up with power and training your next model, there's already three other new chip architectures. He would be right, that critic would be right, except that they're massively wrong and we have proof for that. GPUs have now become this anti-depreciation machine.
Speaker 1:
[15:10] One of the most amazing things about this phenomenon, and it feels like a narrative violation, is the idea that the GPUs that were released three years ago are actually more valuable today than they were at the time they launched, which is a pretty bizarre idea. We have this artifact on screen that shows a chart: an H100 from Nvidia cost $30,000 when it launched in 2023. At its peak, because of the scarcity, because everyone needs these things, it was selling for a four times multiple at $120,000 per H100. That's kind of outrageous, a little exorbitant; we don't need to be paying that much money. But now that they're old, they haven't depreciated, even though there's much better hardware out there. They're still holding their price at $30,000. In fact, you can see a rebound that happens in late 2025, where the cost of these H100 GPUs actually ticks upwards. And a lot of people, Michael Burry most famously, the guy behind The Big Short, created an entire short thesis around the idea that the depreciation schedule of these GPUs wasn't aggressive enough, that they were actually going to lose their value, and therefore the market was going to deflate because the companies weren't marking these down properly. The reality is that not only are they not going down, they're starting to trend back up, because the incremental cost for a token is so low with these, and everyone's so desperate for compute that they're like, well, might as well spend some extra money, get the H100s, and start generating inference tokens with them. It's this pretty amazing phenomenon that's happening.
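To show why that short thesis cuts against the chart, here's a minimal sketch contrasting a textbook straight-line book-value schedule with the market prices quoted above. The 5-year book life is an assumed accounting convention used only for illustration; the prices are the rough figures cited in this segment, not a verified dataset.

```python
# Sketch contrasting a textbook straight-line depreciation schedule with the
# H100 market prices quoted in this segment. The 5-year book life is an
# assumption for illustration; the prices are rough figures cited on the show.

launch_price = 30_000       # quoted 2023 launch price
book_life_years = 5         # assumed straight-line depreciation schedule

observed_market_price = {   # rough figures from the chart discussed above
    2023: 30_000,           # launch
    2024: 120_000,          # scarcity peak, ~4x launch
    2025: 30_000,           # back near launch price, ticking up late in the year
}

for year, market in observed_market_price.items():
    age = year - 2023
    book_value = launch_price * max(0.0, 1 - age / book_life_years)
    print(f"{year}: straight-line book value ~${book_value:,.0f} vs. observed market ~${market:,.0f}")
```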
Speaker 2:
[16:36] Yeah, so if you're wondering why this is happening, explicitly it's that AI demand is growing faster than chip supply can expand. We don't have enough fabs, or the manufacturing prowess, or the energy grid to support creating more GPUs to satiate the demand that we're seeing for AI across all these different industries. It's a very pervasive bit of technology. Now, the data that we're showing you on the screen right now isn't siloed to a few research papers. This is happening in the market right now, and it's incredibly liquid. So a new phenomenon in AI, companies whose stocks have all skyrocketed, are these things called neoclouds. Think of them like AWS: they supply compute to train your AI models by setting up their own data centers, and they provide it to you in a cloud or data-center-specific structure. An example would be CoreWeave. The idea here is that 70 percent of the GPUs these providers are running are the old GPUs that we're showing you on screen right now, and they're booked out, I'm not exaggerating, six to 12 months in advance. In fact, they're booked out under contracts, and customers renew those contracts three months before they expire, just to make sure they keep access to these older GPUs. The point I'm trying to make, and you mentioned this just now, Josh, is that all that matters is: can I get AI tokens generated to do the thing that my company needs, or answer the prompt that I have? If the answer is yes, and it's for a reasonable price, I'm down to go for that, because the value that you can build and earn on top of that is invaluable. The providers can have a large markup on it. It makes sense that these assets are in high demand. To your earlier point, Michael Burry shorted the entire market saying that these are depreciating assets, and he got that completely wrong. His thesis was specifically based on the idea that they can't train frontier models. He's actually right about that: the older chips can't train frontier models. But what they are being used for is one thing very specifically, inference, which is, if someone has a question, how do I get them the answer? How do I process the prompt? That's what the older GPUs are being used for, and they're really damn good at it. The reason this is important, and essential even for the AI labs that are training models, who you might think would only want the expensive new chips, is that they have a ton of inference. They even use inference to train the new models. It's this new paradigm where all these old GPU architectures are being repurposed for this really important thing that is inference. So, important context to understand if you're investing in some of these companies, for example.
Speaker 1:
[19:11] Yeah, and why is it so valuable? Well, it's a testament to the software improvements, right? We have those software efficiency improvements that we didn't have three years ago, so that same hardware generates a lot more value. And if we scroll down to the value multiplier section of this artifact, it shows that the cost of chatbot inference in 2023 was $3 an hour, and now autonomous agents completing complex tasks run $30 to $300 per hour. So the value that you can charge for these tokens is significantly higher than it was in the past, and the amount of tokens you're able to generate efficiently at that higher quality is much higher as well. So there are all these converging forces that are just making the market desperate for compute. Nobody has the compute that they want, and Nvidia is trying to put it online as fast as they can, but it's not fast enough. And I assume as we go through this, we're going to continue to see varying bottlenecks, and the efficiencies will move to wherever the bottlenecks are, which creates new bottlenecks. Right now, we're seeing some convergence around CPUs, and CPUs seem like they're going to be hitting a shortage somewhat soon, because we're out of GPUs, so let's move to CPUs. It's this really interesting dynamic. But that is the idea of this Nvidia episode, or just the chip episode in general: it is hard to imagine a world in which we don't reach AGI given the currently announced infrastructure. It doesn't require any breakthroughs. If Nvidia just does what they announced on stage through Jensen Huang, through these next three chips, it is almost impossible to imagine what the world of intelligence is going to look like. And I think the important thing to understand is that Mythos is trained on a two-year-old chip, and no one's really talking about that. It blew my mind; hopefully it blew yours as well, or at least you found it a little bit fascinating. And that is our episode today. Thank you guys so much for watching. We really appreciate it.
Speaker 2:
[20:51] And I know some of you are probably thinking there are a bunch of challenges here, and Josh actually just mentioned one of them: you've got CPUs, we don't have enough energy, we don't have enough memory. That's another episode we can get into. So all of those things, we assume, will be leveled out at some point, and we're going to see all those industries grow rather than stay constrained. People are throwing trillions of dollars into this industry, so all of those problems should theoretically be fixed. But rest assured, we will be the first show to cover it and give you those thoughts before it happens, by the way. And Intel is a sneaky one to get into, but we'll talk about that another time. Thank you so much for listening. If you're not subscribed to us, please subscribe. It helps us out massively. We're having banger weeks on YouTube, Spotify, Apple, and wherever you listen to us. Please rate us, leave us a comment, we love hearing your feedback. There are thousands of newbies listening to the show. Welcome! And also give us feedback about stuff that we may not be covering that you want to hear more of. We're always open to feedback. But until then, I guess we'll see you on the next one.