transcript
Speaker 1:
[00:00] The most powerful model in the world is here right now. In fact, it's so good that it beats Claude Mythos. OpenAI just released ChatGPT 5.5, and it crushes Claude on every single benchmark. It's the new number one coding model. It can do 20-hour tasks that expert software engineers sometimes can't do. It's already discovered groundbreaking solutions in maths and frontier sciences such as genetics, and it's cheaper than GPT 5.4. This is the result of two years' worth of frontier research released in this one single model. In fact, it's so good that an NVIDIA engineer said, and I quote, losing access to GPT 5.5 feels like I've had a limb amputated.
Speaker 2:
[00:41] I think a lot of people are going to compare this to Opus 4.7, and that's fair, but I really think the true comparison is to Mythos. Because Sam Altman recently posted something, just as the model was coming out, that felt very much like a jab at Mythos, and we're going to get into the benchmarks comparing them, many of which will actually beat the Claude model. But what I find most interesting about this post is the second paragraph, where he says, we believe in democratization, and he mentions specifically, we have been tracking cybersecurity as a preparedness category for a long time, and have built mitigations we believe in that enable us to make capable models broadly available. So this is very much a dig at Mythos, which is, as we all know, privately available, gated to only the companies that are granted access to it. ChatGPT and OpenAI are like, hey, we're going to give you the powerful cybersecurity, we're just going to bake the precautions into the model so that everyone can have it. And it ends with this really sweet thing, it's like, we love you and we want you to win, we believe in everyone having access to this intelligence. And I really respect that. I think it's an awesome way to set the precedent for what the next generation of these models is going to look like. But before we go any further, let's talk about the model itself. It's out right now. If you have a ChatGPT membership, you can go and use it, go and play with it. Ejaaz, what's the TLDR? What are the high-level things that everyone should know? What's most new and noteworthy about GPT 5.5?
Speaker 1:
[01:54] Okay, so inspired by your Mythos comparison, the first question that pops into my head is: I use Claude Opus 4.7 every single day, so is it better than this? Should I be switching back to ChatGPT right now? The answer might be yes. If we look at the benchmark scores right here, GPT 5.5, on the left over here, absolutely crushes all the standard benchmarks that these frontier models are measured against. If you look on the right over here, Claude Opus 4.7 either doesn't even measure in a particular category or is completely beaten by GPT 5.5. In fact, the only stat where GPT 5.5 doesn't beat Opus 4.7 is something called Software Engineering Benchmark Verified Pro, or something like that. It's like the pro software coding situation. But there's a footnote at the bottom of this blog where OpenAI states that Anthropic has publicly said they might have gamed that particular benchmark and it needs to be reevaluated. So we might have a complete clean sweep for 5.5 as we see today. It's an incredibly powerful model. But a question that popped into my head is, does it actually beat Mythos? And we have a direct comparison right here.
Speaker 2:
[03:00] Yeah. So it shows that it does across some benchmarks. Now, again, these benchmarks are pretty fuzzy. We don't know which ones are gamed to do what. But there is a world in which GPT 5.5 will outperform Mythos on some things. Which ones, we're not entirely sure. I think, as we figure out ways to describe GPT 5.5, it seems as if it's their first attempt at making a model built for autonomy instead of answers. A lot of the benchmarks they're optimizing for are agentic coding, things like handling tasks that are 20 hours long. We'll get into that. It's doing 85 percent of OpenAI's internal work already, and it also helped rewrite the infrastructure that built it. There was this amazing quote in the blog post: OpenAI says 5.5 itself helped optimize the stack that serves it. Codex analyzed weeks of production traffic and wrote custom heuristics for load balancing that boosted token generation speed by over 20 percent. So they're using the model to actually build the model and make it maximally efficient, based on the data it has collected from users like us who interact with it daily. It's very smart, it's very clever. It's not just there to give you answers, it's there to think deeply and actually solve problems for you, in a way that I think Mythos and a lot of these other frontier models are pivoting towards now.
Speaker 1:
[04:10] The great thing about this model release is that it reveals a few advantages OpenAI has over a frontier lab like, say, Anthropic. It's clear looking at these benchmarks compared to Mythos, which, by the way, the entire world is spiraling over because it supposedly has the cybersecurity capability to take over any kind of government system. This model is pretty close, and Sam, or rather OpenAI, is going to be releasing it publicly for everyone to use. So a question that pops into my head is, does this mean it's just a matter of compute, and OpenAI simply has more of it? Certainly, if you compare Sam Altman's ability to acquire compute and spend all these trillions of dollars versus Anthropic: Anthropic has been extremely conservative, and now they're struggling. They recently signed a $5 billion deal with Amazon, which we'll get to later on. But the point is, this is a tale of two stories. Either OpenAI has enough compute and they're about to leapfrog Claude because of it, and they're proving that through this model, which is a very good answer to Mythos; or, and this is the alternative side, Anthropic's Mythos model is just plainly better than 5.5, and these benchmarks aren't actually verified, which is technically true, because I don't know how official these things are; they're just tests that a small set of users have run. It's probably a bit of both. I'm sure Anthropic is watching this and thinking, maybe we should roll out Mythos, but they don't have the compute.
Speaker 2:
[05:30] Yeah, they don't have the inference. In fact, speaking of inference, Sam actually made a post praising the really excellent work by the inference team to serve this model so efficiently. He wants to highlight the fact that, to a significant degree, they've become an AI inference company now. I think that's a really big difference from what was previously stated. Anthropic has a really tough time serving compute, and we see that. Even if they had Mythos available in a way that was safe, they can't serve it. OpenAI can. We see it reflected in pricing, because we have some pricing for this model. It seems as if it's roughly on par with 4.7, if not slightly better.
Speaker 1:
[06:04] It's slightly more expensive, but not by much. Input pricing is the same for both Anthropic's Opus 4.7 and GPT 5.5, at $5 per million tokens. But the output is $30 per million tokens for 5.5 versus $25 per million tokens for 4.7.
Speaker 2:
[06:22] It's a little more expensive.
Speaker 1:
[06:23] It's a little more expensive, but here's where you actually get more of a bargain using the more expensive model, 5.5: it's cheaper than GPT 5.4, and it uses tokens way more efficiently to think. So what does that mean? If you're an enterprise that wants to plug in this AI model, not worry about it, and just have it power your entire profit engine, you end up using fewer tokens, so you hit your rate limits at a much slower rate, which means you get more bang for your buck, as long as you use the model 24/7 or use it effectively. If you're just out there using 5.5 to ask questions that you should maybe be asking Google, this is probably not the model for you. But otherwise, it's a super powerful one.
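The pricing trade-off described here is easy to make concrete: a model with a pricier output rate can still be cheaper per task if it thinks more token-efficiently. The per-million prices below are the ones quoted in the episode; the token counts are made-up illustrative numbers, not real measurements.

```python
def task_cost(input_tokens, output_tokens, price_in, price_out):
    """Cost of one task in dollars, given per-million-token prices."""
    return (input_tokens / 1e6) * price_in + (output_tokens / 1e6) * price_out

# Per-million-token prices quoted in the episode.
GPT_55 = dict(price_in=5.0, price_out=30.0)    # GPT 5.5
OPUS_47 = dict(price_in=5.0, price_out=25.0)   # Claude Opus 4.7

# Hypothetical token usage for the same task: assume 5.5 "thinks"
# more efficiently and emits fewer output tokens (illustrative only).
cost_55 = task_cost(200_000, 400_000, **GPT_55)
cost_47 = task_cost(200_000, 600_000, **OPUS_47)

print(f"GPT 5.5:  ${cost_55:.2f}")   # $13.00
print(f"Opus 4.7: ${cost_47:.2f}")   # $16.00
```

Under these assumed token counts, the nominally pricier model comes out cheaper per task, which is the "more bang for your buck" point being made.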
Speaker 2:
[07:07] Yeah, and if these prices don't mean anything to you, that's fine. As long as you have a $20 a month subscription. In fact, this is going to be available to free users fairly soon, I believe. But anyone who is a subscriber has access to this. You don't need to use the API, there's nothing fancy. You open up your app on your phone, you go to the web browser, it's there, it's available, ready to go. Now, there's a few interesting things that you can do with this model that haven't previously been possible. Although we don't quite have access to it just yet, we're recording this right as the model got launched. We do have a blog post from OpenAI themselves who are showcasing a few demos. Again, take these with a grain of salt. These are straight from OpenAI, but they are seemingly pretty impressive and pretty noteworthy as to what they're capable of doing, starting with the space mission application, which is pretty cool and very reminiscent of the moon mission that we just had.
Speaker 1:
[07:51] Yeah. If you guys don't know, Josh has many secrets on this show. One is he's a massive space fan, and when he's not hanging out with me, he's doing space simulations on whatever he can, right? Well, okay, maybe part of that is a bit of a lie, but this new app that we're seeing in front of us right now was completely vibe-coded using 5.5, and it's used to simulate a specific space mission. Now, if this looks very familiar, it's because we just had a space mission where we went back to the moon for the first time in 53 years, pretty big deal, and we can see a pretty accurate simulation going on right here. As you can see, there are various different toggles, and the physics of the entire thing is very important. And that's another point I want to make about this model: it is being used for frontier research, not just in AI, but in mathematics and in genetics. It made frontier progress on both of those fronts. So what we're showing here is a model that goes way beyond just text and telling you what could be. It actually implements things and understands the world around it, which is extremely powerful. But we have another one here. We have an earthquake tracker.
Speaker 2:
[08:56] For anyone who wants to make websites, it's so good at making websites, and this appears to be one of its strong suits. There are a few things to highlight on this earthquake tracker. One, it's just a pretty elegantly designed website. Two, all of the graphics are interactive. You'll notice that they update dynamically as you hover over them and as you click; it looks very clean. I assume it's pulling up-to-date information from an API somewhere that it set up. It is just truly competent and capable of doing these kinds of longer-tail tasks that are a bit more complicated than a static landing page, tasks that have dynamic data and the richness you would expect from a high-end, high-quality, polished website, except built with an AI model by someone who doesn't need to know anything about coding at all. Then for the gamers, there's another great example of a dungeon game, which they're describing as a playable 3D dungeon arena prototype built with Codex and GPT models. Now, I think there's something novel in this setup: Codex handles the game architecture, the combat systems, and the enemy encounters, while the character models, textures, and animations were created with third-party asset generation tools using something like ImageGen 2.0. This is also one of the earlier signs that you can merge a lot of these tools together to build something dynamic in a way that you previously couldn't have. Yeah.
Speaker 1:
[10:09] Actually, the quality of this game looks like something out of League of Legends, or something like that. At least that's what it reminds me of. These games are getting way more high-def than I expected. I know it's pretty basic for anyone watching this who wants to pick at it with a fine eye, but it's cool. And for those of you who prefer the more traditional side of games, this might be something that you can vibe code in a couple of minutes. Now, it may look basic, but theoretically this is a 3D, spatially aware game, and that's not something you could achieve easily, at least with previous models. What I love as well is that they've also included the prompt for all of these things. This is something that you can try right now. Look at this. The prompt is no more than, what's this, one, two, three, four, like 12 lines. 12 lines, dude. You can have a fully functioning game. You can probably then add an extra prompt saying, hey, can you deploy this to Vercel, and send that to your friends. Now you have a game. You're a game creator, you're a game developer. The applications for this model cannot be overstated. I'm going to be very honest: I thought this model was going to be just an iterative upgrade. I didn't think it would get anywhere near Claude Mythos. Two stories have now revealed themselves: one, it's the answer to Claude Mythos, and two, it's really damn good. I am now convinced that compute is everything, but not in the way I thought it would be useful. I thought it would be largely for pre-training. But going back to Sam's tweet earlier and to a recent interview with Greg Brockman, the president of OpenAI, they're going all in on inference, test-time compute, which just means that if you have more compute and a good enough model, it can do the thing. Like I said, it built itself. It's a self-improving model. Very, very impressive.
Speaker 2:
[11:49] It's good for solving hard problems. It's good for thinking for a long time. In fact, they marketed it as a model that can now think coherently for 20 hours straight, which is almost a full day that it can work on a problem. What you're noticing from this prompt on screen is that it doesn't take much to get it going. You don't need to spoon-feed it all the way through anymore. It can make decisions on its own. It can infer conclusions about what you want just based on the knowledge architecture that it currently has. It's amazingly impressive. In fact, one of the people who got access to it early is posting live on X as his prompt runs: it's seven hours into its task. It has been running for over seven hours. He said, this has literally never happened before. The models would maybe run for 30 minutes or so.
Speaker 1:
[12:28] Wow.
Speaker 2:
[12:29] Or, if you really pushed them, two to three hours. But he's at seven-plus hours. I think this is going to be fun for people with complicated things. If you really want to make a AAA-feeling video game, or a simulator, or a really complex website, this is the model to try out, to use with Codex, and to see how all these things piece together. It's really, I mean, I didn't have my hopes very high based on the incremental improvement from Opus 4.6 to 4.7. This seems like a very solid improvement over 5.4.
Speaker 1:
[12:57] Absolutely. Listen, if you are listening to this and you're like, listen, I'm not a gamer, I can't waste my time with that, I focus on more serious things. Well, for you serious people, if you're a manager at a top company or whatever that might be, this isn't just a toy or a model for coders. A lot of the examples we just gave are around coding, but you can use this for admin stuff or managerial work too. This model is capable of thinking more strategically and long-term, and of understanding the context of the tasks that you're working towards. Like we said earlier, for coding specifically, it can work on 20-hour-long expert tasks. That also applies to administrative stuff, or more generalized white-collar work. In this example, Noam Brown says, I'm a manager at OpenAI, but I'm using this model to basically manage my entire team and make sure we're focused on the right things. Guess what? The output of this team and this product has been pretty amazing. All around really excellent work by the entire team, and the inference team specifically, as Sam Altman says here. I'm looking forward to using this thing. I don't have access to it right now. I've refreshed my account probably five times at this point and it hasn't appeared, so maybe it's a slow rollout. If you're listening to this and you've tried it out, let us know what you're using it for. Let us know what amazes you. I really want to hear more.
Speaker 2:
[14:12] Yeah, OpenAI has had a pretty incredible week, and this comes on the back of their new ImageGen model that they just released, which was also unbelievable. If you haven't seen that episode, we just recorded it yesterday, so I'd advise you to go watch it, because, oh my God, it is amazing. We also recorded an episode this week on Apple's new CEO and what that means for the company, as well as the hardware race and how this model, not Opus, this is GPT, GPT 5.5, is very much part of the AGI class of models built on Blackwell chips. We've recorded an entire episode all about that. Very interesting, very fascinating. Also interesting and fascinating because, as always, this is the weekly roundup, and we have a few other topics to talk about. We have some news out of SpaceX, which is a pseudo-acquisition. Now, they haven't quite acquired Cursor, the company in question, but they have at least partnered with them, with the option to buy Cursor for $60 billion, or to pay $10 billion for the right to actually work together. This seems like a big deal. I mean, xAI, or we could call it SpaceX, but SpaceX AI is taking AI very seriously. They're currently behind, and they clearly don't want to be behind. This is a huge step and a huge vote of confidence in Cursor, with this minimum of $10 billion going into accelerating their progress and trying to get themselves into this game.
Speaker 1:
[15:24] This is actually a genius deal, and there are a few reasons why. Let me explain. If you're SpaceX AI, which by the way is a ridiculous name now, so we'll just call them xAI, you are currently harboring one to 1.5 million frontier GPUs, mainly NVIDIA, in a warehouse. There's one issue: you're not really utilizing all of it, because xAI has had a bit of a slow start training their models. So what's a genius idea? If I rent those out to another company to train their own model, then we can make money from that. That's win number one for SpaceX. But then they thought of another thing, which is: Grok isn't really good at coding, and we are losing the race every single day we don't update our model, because Anthropic and GPT 5.5 are completely running away with it. How do they leapfrog and get ahead? They acquire the company that is using their own GPUs to train a frontier coding model. Then the question becomes, well, who the hell is Cursor? What moat do they have? Why do they have a good shot at training a better coding model than Anthropic and GPT 5.5? Aren't those two companies way ahead? Well, not quite. Cursor, for the longest time, was the number one platform and tool for people to do their vibe coding. Why? Not only did they have access to frontier coding models from Claude and ChatGPT, they also had something called an agent harness. Now, you'll notice GPT 5.5 is really good at coding because of something called agentic coding. That is something that Cursor pretty much pioneered. It's basically the harness, the prompts, the environment that they mold around the model that makes it so good and intuitive and able to remember context across every single project, including menial things like understanding your GitHub branches and working on separate flows at the same time.
A lot of the top software engineers in the world right now use tools like Cursor and agentic coding to pull this off. So Elon Musk thought, hmm, if I give you the GPUs to train a better coding model, which gives you a better product, I should have the option to acquire you. In acquiring you, I can integrate you with Grok, and Grok somehow becomes the number one coding model over the next year or so, depending on whether this deal goes through. And if the deal falls through and they create a really bad model, well, I've paid you $10 billion for the service either way. Not a bad deal.
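The "agent harness" described above is, at its core, a loop: the model proposes an action, the harness executes it as a tool call, and the observation is folded back into the context until the model declares the task done. A minimal sketch, where `call_model` is a stub standing in for a real LLM call, and none of this reflects any actual Cursor or OpenAI API:

```python
# Minimal agent-harness loop. `call_model` is a fake stand-in for an
# LLM: it reads a file once, then declares the task finished.
def call_model(history):
    if not any(h["role"] == "tool" for h in history):
        return {"action": "read_file", "arg": "main.py"}
    return {"action": "done", "arg": "patched main.py"}

# Tools the harness exposes to the model (stubbed for the sketch).
TOOLS = {
    "read_file": lambda path: f"<contents of {path}>",
}

def run_harness(task, max_steps=10):
    """Loop: ask the model for an action, run it, feed the result back."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        step = call_model(history)
        if step["action"] == "done":
            return step["arg"], history
        observation = TOOLS[step["action"]](step["arg"])
        history.append({"role": "tool", "content": observation})
    raise RuntimeError("agent did not finish within max_steps")

result, trace = run_harness("fix the failing test")
print(result)  # patched main.py
```

The real value Cursor adds is everything hidden inside `call_model` and `TOOLS` in this sketch: the prompts, the project context, and the environment the model runs in.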
Speaker 2:
[17:53] Yeah, it seems like they're going to continue working with other companies to accelerate in places where they're currently weak, because they're so strong at building out the hardware and creating these huge data centers that they need someone who can take advantage of all those GPUs. Hopefully, this will help serve that cause. And that's not the only SpaceX news this week. The other is that they have officially filed an S-1, which, for those who are not familiar, means they're going public. It's official, 100 percent, they will be going public this year. If there were any doubts, please let them be relinquished. Here we have it: SpaceX will be going public. The most interesting thing from this was, I think, the share structure of how they're going to be organizing this for daddy Elon, who's going to be getting quite a big payday if he does well. So we have on screen here a series of some of the financials. I mean, we know Starlink as a business has been doing unbelievably well. They have about $25 billion in cash, $92 billion in assets, and $50 billion in liabilities.
Speaker 1:
[18:45] That's quite a lot of liabilities on this. My God.
Speaker 2:
[18:48] They got a lot of debt, man. I don't know. We'll see once they finally publish everything. I'm very excited for the first earnings report, where you really get a true peek behind the scenes of what's going on there. But it looks like it's going to be going public at a $1.75 trillion valuation. Now, in terms of pay structure, Elon is supposed to get 60 million shares, split into 11 tranches, vesting in $500 billion market-cap increments from $1.1 trillion up to $6.6 trillion in market cap. So for those unfamiliar, the current ceiling, I think, is NVIDIA. NVIDIA is what, five trillion? Under five trillion, close to five trillion?
Speaker 1:
[19:25] It's like 4.3.
Speaker 2:
[19:25] Under five. Okay, so not even close. They're like 20 percent away from five trillion. SpaceX needs to be, what is that, like 20-something percent more valuable than the most valuable company in the world. But if they do it, Elon gets 60 million shares. Now, I haven't done the math on exactly how much that is, but if we make some assumptions here, the total value at vest looks like it could be about a quarter of a trillion dollars. So a pretty good payday for Elon. I think the most important thing is that he's getting a lot of control over this. It seems as if he's going to have 40-something percent control of the company, which is ultimately what was most important to him as they went public. So really exciting news. I'm hopeful that it happens this June, which we can expect, and it's without a shadow of a doubt going to be the largest IPO in history. I think everyone's going to be talking about it. There's a new vehicle in which some people are investing, and we're actually going to have the founder on the show soon, so keep an eye out for that one. Yeah, the SpaceX news is very exciting.
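The tranche structure is easy to sanity-check: stepping up from a $1.1 trillion base in $500 billion increments, the 11th tranche lands exactly at $6.6 trillion, matching the figures quoted. A quick check, working in billions to keep the arithmetic exact:

```python
TRANCHES = 11
INCREMENT_B = 500    # market-cap step per tranche, in $ billions
BASE_B = 1_100       # starting market cap, in $ billions
SHARES = 60_000_000  # total shares across all tranches (per the episode)

# Market-cap threshold (in $ billions) at which each tranche vests.
thresholds = [BASE_B + INCREMENT_B * (i + 1) for i in range(TRANCHES)]

print(thresholds[0], thresholds[-1])  # 1600 6600, i.e. $1.6T to $6.6T
```

Note the even split across tranches is an assumption for the sketch; the episode only gives the total share count and the increment size.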
Speaker 1:
[20:21] Now, in the world of AI hardware, many people think that NVIDIA has run away with the win. And you could argue that: with a $4.3 trillion market cap, not many people are competing. Except there is one company: Google. Now, you might be thinking, Google does all my search engine stuff. Well, Google is the only vertically integrated Mag7 company that has a frontier capability at every single layer of the AI stack. Right at the bottom are these things called Google TPUs, Tensor Processing Units, which are their version of the GPU. In fact, fun fact: Google's Gemini models have never been trained on an NVIDIA GPU. It's all been their own internal warehouse infrastructure, and they've been working on this for 10 years. Now, just today, or rather this week, they released their latest generation of TPUs, the TPU-8T and the TPU-8I. The TPU-8T, where T stands for training, is highly optimized for the pre-training part of an AI model. This is the bulk, arguably the more expensive part of training a model. It's like teaching it: hey, these are words, these are the general fundamental facts that you need to know before we can put you out into the world and present you to our users. The TPU-8I is hyperspecialized in inference specifically. Now, the important part about inference is that it's being used for so many different things. Number one, it answers all your prompts. Whenever you write a prompt and submit it to an AI model, that's inference: it needs to query the model, do the right kinds of thinking, and give you the right answer. But the other part of inference is post-training, where a lot of labs train the model and then do more training after the fact, using inference to help the model reason and consider alternatives before it presents you the actual answer. That's what that second TPU is for.
Now, Google's TPUs have been used extensively. In fact, their largest customer is a little-known AI lab called Anthropic, which currently runs 1.5 million TPUs. The argument can be made that TPUs are largely responsible for Claude and Opus's success. Very impressive all around, but there are some other facts about this, right?
Speaker 2:
[22:32] Yeah. Well, I love the dual-architecture training setup that they have here. Being hyper-specific, the 8T chip in particular is built to reduce frontier model development cycles, they say, from months to weeks. Then we have the 8I, the reasoning engine, which is built specifically for agentic use, to deliver tokens as fast as possible. As we know, Anthropic is working closely with them. Also, Google is making these for themselves. I think whoever is working with Google, whoever is focused on these accelerators, is probably in for a nice little windfall, in terms of both increased training velocity and increased ability to distribute these models. As we know, Anthropic is having a very difficult time with this. Now, NVIDIA and Jensen are probably feeling a little shook. They've got to be feeling a little bit of pressure here. It seems as if that's why they're pushing open source, because if you are in a closed-source world, where everyone is making closed-source models on their own architecture, then the NVIDIA edge very quickly disappears. I mean, I'm looking at these chips in hand. They look beautiful. They're taped out, ready to be manufactured. I think you can start getting excited about this new world of accelerated hardware. We're seeing this happen again and again, because Amazon just made another big investment in who else but Anthropic, and the deal, I think, has to be close to a record deal. They're owning a tremendous amount of this company now.
Speaker 1:
[23:51] The news here is Amazon announced they're investing $5 billion into Anthropic. If you ask Anthropic, they've just raised $5 billion. Congrats. The reason why this is important is, well, there are a few reasons. Number one, Anthropic knows that they don't have enough compute. The argument could be made that's why Claude Mythos hasn't been rolled out. Well, hey presto, now you have $5 billion more worth of compute. Now, for those of you who didn't know, Amazon is already a primary investor in Anthropic. Before this announcement, they owned around 17 percent of Anthropic. After this announcement, it's closer to 20 percent. So we're talking about one company that's publicly traded right now owning a fifth of the world's leading AI lab, which is pretty crazy. Now, if we look at the stats of this, it's a five-gigawatt deal, which is more than any single data center that's currently live. It's actually a multiple of five. I think xAI's Colossus 2 is the largest right now, with about one million GPUs. So it's going to be 5x larger than the largest data center that we're seeing right now for AI specifically. And they're aiming to get one gigawatt online by the end of the year. Now, the reason why this is so good for both teams is that Anthropic already has a close relationship with AWS, Amazon's cloud computing department. So spinning up more compute clusters is going to be so easy for them. They have a working relationship, they already train models on this infrastructure, so it shouldn't be too hard to ramp this up. If you're Amazon, hey, welcome back: that $5 billion is going to come right back to you. So I don't know what kind of circular economy this is, but it's back and it's very impressive for them.
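Those ownership numbers imply a rough valuation, which makes a nice sanity check: if a $5 billion cash-for-stock investment moves a stake from about 17 percent to about 20 percent, simple dilution algebra pins down the implied pre-money value. Both percentages are the episode's approximations, so treat the output as ballpark only.

```python
def implied_pre_money(invest_b, stake_before, stake_after):
    """Solve (stake_before * V + invest) / (V + invest) = stake_after for V.

    Dollar amounts in billions; stakes as fractions of the company.
    """
    return invest_b * (1 - stake_after) / (stake_after - stake_before)

# Episode's rough numbers: $5B in, stake moving from 17% to ~20%.
v = implied_pre_money(5.0, 0.17, 0.20)
print(f"implied pre-money valuation: ~${v:.0f}B")  # roughly $133B
```

The result is very sensitive to those rounded percentages, which is why it should only be read as an order-of-magnitude figure.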
Speaker 2:
[25:29] Is it ironic that today Amazon hit an all-time high? Oh, maybe, maybe not.
Speaker 1:
[25:33] I'm holding the stock. I got the stock.
Speaker 2:
[25:34] Clearly, clearly they're doing something right. Amazon is a phenomenal company. They're the largest shareholder in Anthropic. It's hard not to be bullish on them. It's hard not to be bullish on the accelerated computing stack. I think that's probably what Jensen is getting nervous about. That's why NVIDIA is pushing open source. Yes. The good news is he has some help. He has some assistance from the folks overseas in China, who have been pumping out unbelievable models all week long, namely Kimi and Qwen, our Chinese favorites. We have Kimi K2.6 and Qwen 3.6. There are a lot of digits and numbers. All you need to know is that the best open-source models in the world didn't exist last week, they exist now, and they are better at pretty much everything, but exceptional at coding. In fact, word on the street is that some of these models are as good as GPT 5.4 was, and only a few points off Claude. I mean, these are pretty amazing open-source models that, again, are free to run locally on your machine, if your machine is capable of running them. That's a big game changer.
Speaker 1:
[26:30] Okay. Typically, the story we tell with these open-source models is, wow, aren't they so amazing? Yeah, they're the good younger brother, but they're not as good as the frontier AI labs. That completely changed this week. Kimi K2.6 is the latest model from a Chinese lab called Moonshot, I believe it's Moonshot or Moonshot AI. They released their model, which ends up being as good at coding as Opus 4.7, and it's 100 percent open source, like you mentioned, Josh, which means that maybe you could run this on a local device. Now, the answer that you would typically get back is, hey, listen, it's too large to run on my laptop. That is true, but with the latest Qwen model, the 3.6 version, you can run it as an 18-gigabyte, slightly quantized model on your laptop today. So the point I want to make about these models isn't exactly the specifics, but that across all benchmarks, they're not as good as the frontier AI labs, yet they're within a few points. That gap has closed massively over the last couple of months, which tells me two things. Number one, China has figured out some groundbreaking way to train their models that they haven't told the West about, and they're going to keep it closed and eventually close-source their model releases going forward. And number two, they've figured out a new way to use inference to their benefit. One thing I'll highlight here is that this new Kimi K2.6 model can code continuously for 12 hours straight using 300 agents. So the unlock here isn't one model by itself, it's spinning up 300 versions of itself and getting them to attack the problem. That's something Sam realized and is implementing in 5.5. That's something the Opus 4.7 team realized, and they're probably doing something similar with Mythos. So I have this question, which is, how the F did China do this?
Well, I think every three months there's a new open model that gets released, and they're making these jumps because they're using these models to train themselves. We proved that with Kimi K2.5. There are too many 2-point-whatevers. The same thing is happening with Qwen. It's just all-around pretty amazing stuff.
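To sanity-check that "18-gigabyte quantized model on your laptop" point, a model's footprint is roughly parameters times bits per weight divided by eight. The parameter count and bit width below are illustrative assumptions, not published specs for any of the models mentioned:

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough on-disk/VRAM footprint of a quantized model: params * bits / 8.

    Real builds add overhead (embeddings kept at higher precision,
    KV cache, runtime buffers), so treat this as a lower bound.
    """
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A hypothetical ~36B-parameter model quantized to 4 bits per weight
# lands right at the ~18 GB figure mentioned above:
print(quantized_size_gb(36, 4))  # -> 18.0
```

The same arithmetic explains why full-precision frontier models don't fit on laptops: at 16 bits per weight, even a 36B model needs roughly 72 GB before any runtime overhead.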
Speaker 2:
[28:27] Yeah, Qwen is crushing. Before we go, we have two quick things to hit. The first is one we missed last week and need to touch on quickly: Anthropic has a design tool now. If you are a designer, if you are interested in building web pages, videos, graphics, slideshows, pitch decks, any type of visual asset, Claude now has an entire design suite built just for this purpose. It's called Claude Design. It exists separately; you can access it through the desktop app or in your browser. It basically allows you to build visual assets in a way that you couldn't previously. Previously with Claude, you had artifacts. An artifact you could generate was something dynamic; it could kind of build you a web page. This takes it to a whole new level. You can generate wireframes if you want to try something out and use fewer tokens, or you can fill it out and create proper prototypes that are actually clickable. It's amazing. The video we're seeing on screen highlights a few of them. Unfortunately, there was a big loser in this, because this sounds like a lot of what that little design company named Figma does.
Speaker 1:
[29:21] Yeah, that little company.
Speaker 2:
[29:22] And the stock market did not love the reaction to that, did it?
Speaker 1:
[29:26] Nope, nope. It is down almost 20% on the week. I actually tracked the stock price after the announcement was made, when it wasn't even readily available yet; it was literally just the tweet. Twenty minutes after it was tweeted, the stock was down 6%. So the point being, whether this is market speculation or not: listen, Claude Design isn't as good as Figma, and they're working with a few different partners such as Canva. But two weeks ago, one of Anthropic's foremost execs left the board of Figma, and the rumor was that it was because they were building a competitor. So it's pretty clear Anthropic is going after every single sector. Whether you're a designer, a software engineer, a mathematician, a research scientist, it doesn't matter; they're going after everything, because the model is applicable to everything. And I don't know what this means for certain moats that companies like Figma hold, but it's certainly going to affect the stock price.
Speaker 2:
[30:14] Can you do me a favor and click the Max button real quick, just to show the chart? Hell yeah. Oh! Yeah, minus 86% since IPO, for those who are not watching on screen. It's been a pretty rough run for Figma.
Speaker 1:
[30:27] We have to start naming Anthropic the stock killer, Josh. This is like every single tweet is tanking a stock.
Speaker 2:
[30:33] No, it's tough. It's brutal. We had one last thing that you wanted to mention, I know. We got to end on this strong.
Speaker 1:
[30:38] What do we have? How good is your accent or impersonation of your president, of our president, Josh?
Speaker 2:
[30:45] Pretty horrible. Not good.
Speaker 1:
[30:46] Okay. Well, then we're not going to attempt it.
Speaker 2:
[30:48] I'd love to hear your British take on it though, if you're feeling ambitious.
Speaker 1:
[30:51] Okay. My British take on this is: this is hilarious, albeit somewhat terrifying, coming from the President of the United States. He commented on the government's relationship with Anthropic. If you're wondering why on earth he's commenting on it: they're going to be releasing this Claude Mythos model, it might be a security risk, and it's probably good for the government to have access to this thing and prepare accordingly. The government has been having very important conversations with bankers and governments all around the world to figure out how best to prepare for this. After having an in-depth discussion with Dario Amodei, who, by the way, he had previously blacklisted, along with Anthropic entirely, from government use, he's now rekindling the relationship and saying maybe there's a deal on the line. He goes, and I quote, and I'm not going to do the accent: we'll get along with Anthropic just fine, Trump said on CNBC. Wait, you got to let me try.
Speaker 2:
[31:39] We'll get along with Anthropic just fine. I think they can be of great use to us. They're high IQ people. Very good. They tend to be on the left, radical lefts, but we get along with them. I don't know.
Speaker 1:
[31:50] That's all I got.
Speaker 2:
[31:50] But that is what he said.
Speaker 1:
[31:51] Were you practicing that? That was actually pretty good.
Speaker 2:
[31:53] I was practicing in my head. I was rehearsing.
Speaker 1:
[31:55] Damn. I closed my eyes whilst you were doing that, whilst I was laughing.
Speaker 2:
[31:58] Did it feel right? It sounded like him going, I'm glad.
Speaker 1:
[32:00] It channeled his spirit. It was there. It was a good effort. But I believe that's it. That is the end of the roundup.
Speaker 2:
[32:07] What a whirlwind man.
Speaker 1:
[32:08] Josh and I are recording this, FYI, at 4 PM over here. Typically, we're early birds; we deliver this in the morning, but we waited for the GPT 5.5 announcement just for you guys. We're going to be bringing you the cutting-edge news every single week. As Josh mentioned, we had three other amazing episodes that we filmed earlier this week. Definitely go check them out. Each is about 20 minutes long: it's your commute to work, it's your gym session if you're not that active. Definitely go check them out and let us know what you think. But yeah, Josh, any final thoughts?
Speaker 2:
[32:36] Call me crazy, but I like the afternoon recordings. I've got good energy, I'm awake, I'm at 100 percent right now. I'm rocking and rolling, I'm feeling good. So maybe we'll have to lean into this a little more, but that's everything. If you've made it this far, if you're still listening and you've heard our other episodes, you're caught up, you're done for the week. You can go touch grass and enjoy your weekend. There will be a lot more to talk about next week, but for now, you have fully synchronized with all of the chaos happening on the frontier of AI and technology. Thank you so much for watching; as always, we very much appreciate it. If you enjoyed this episode or any of our previous episodes from this week, don't forget to share them with a friend who might also enjoy them. We have a newsletter on Substack that goes live twice a week: it just went live yesterday and goes live again tomorrow. The Friday issue is a recap of everything that happened this week, which is always fun and exciting. In fact, I'm going to go write that as soon as we finish this episode. So thank you all for watching. As always, don't forget to subscribe, like, comment, all the good things, and we will see you guys next week.