title #241 - Opus 4.7, Muse Spark, GPT-5.4-Cyber, HY-World 2.0

description Our 241st episode with a summary and discussion of last week's big AI news!
Recorded on 04/18/2026
Hosted by Andrey Kurenkov and Jeremie Harris
Feel free to email us your questions and feedback at [email protected] and/or [email protected]
Read out our text newsletter and comment on the podcast at https://lastweekin.ai/
In this episode:
Anthropic released Claude Opus 4.7 with improved benchmark performance, new reasoning controls, better vision and memory, and a detailed system card discussing deception risk, evaluation-awareness steering, and a training bug that accidentally supervised chain-of-thought in 7–8% of episodes.Meta unveiled its closed Muse Spark model and “contemplating mode,” highlighting test-time scaling, thought compression, large infrastructure plans like the Hyperion data center, and findings that it shows unusually high evaluation awareness.OpenAI introduced limited-access GPT 5.4 Cyber for defensive security teams and rolled major Codex updates including computer use, browser and plugins, image generation, and long-horizon task scheduling; competing agent products also launched from Anthropic, Canva, and Adobe.Business, policy, and safety news included continued government blacklisting litigation affecting Anthropic, CoreWeave compute deals, Perplexity revenue growth tied to agents, a potential Cohere–Aleph Alpha merger, attacks targeting Sam Altman and OpenAI, AI propaganda trends, and new alignment research on automated weak-to-strong supervision and steering evaluation awareness.
Timestamps:
(00:00:10) Intro / Banter(00:03:43) News Preview(00:04:14) Response to listener comments
Tools & Apps(00:05:30) Anthropic releases Claude Opus 4.7, narrowly retaking lead for most powerful generally available LLM | VentureBeat(00:24:15) Meta debuts the Muse Spark model in a 'ground-up overhaul' of its AI | TechCrunch(00:34:23) OpenAI Launches GPT-5.4-Cyber with Expanded Access for Security Teams(00:39:44) OpenAI’s big Codex update is a direct shot at Claude Code | The Verge(00:42:10) Anthropic launches Claude Design, a new product for creating quick visuals(00:42:30) Anthropic’s New Product Aims to Handle the Hard Part of Building AI Agents | WIRED(00:42:54) Canva’s AI 2.0 update goes all in on prompt-powered design tools | The Verge(00:43:06) Adobe’s new AI Assistant marks a ‘fundamental shift’ in creative work | The Verge(00:43:38) Gemini can now pull from Google Photos to generate personalized images | The Verge(00:43:52) Google rolls out a native Gemini app for Mac | TechCrunch(00:44:04) Chrome now lets you turn AI prompts into repeatable ‘Skills’ | The Verge
Applications & Business(00:44:22) Anthropic loses appeals court bid to temporarily block Pentagon blacklisting(00:49:07) Jeff Bezos’ AI lab poaches xAI cofounder Kyle Kozic from OpenAI. | The Verge(00:51:39) Perplexity's Shift to AI Agents Boosts Revenue 50%(00:53:53) Anthropic Agrees to Rent CoreWeave AI Capacity to Power Claude(00:57:32) Canada’s Cohere, Germany’s Aleph Alpha reportedly in merger talks(01:04:23) ChatGPT has a new $100 per month Pro subscription | The Verge(01:05:10) OpenAI has bought AI personal finance startup Hiro | TechCrunch(01:07:03) Allbirds announced a switch from shoes to AI and its stock jumped 600 percent | The Verge
Projects & Open Source(01:07:26) HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds + Lyra 2.0: Explorable Generative 3D Worlds
Policy & Safety(01:19:12) Daniel Moreno-Gama is facing federal charges for attacking Sam Altman’s home and OpenAI’s HQ | The Verge(01:20:15) Duo accused of shooting at Sam Altman’s house are freed; no charges filed (01:24:50) The Iranian Lego AI video creators credit their virality to ‘heart’ | The Verge(01:27:19) Hundreds of Fake Pro-Trump Avatars Emerge on Social Media - The New York Times(01:27:31) The AI images Trump can’t get enough of | Donald Trump | The Guardian(01:29:25) Automated Weak-to-Strong Researcher(01:43:51) Reproducing steering against evaluation awareness in a large open-weight model(01:49:53) Iran threatens ‘complete and utter annihilation’ of OpenAI's $30B Stargate AI data center in Abu Dhabi — regime posts video with satellite imagery of ChatGPT-maker's premier 1GW data center(01:53:57) Wall Street Banks Try Out Anthropic’s Mythos as US UrgesSee Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.

pubDate Thu, 23 Apr 2026 06:00:00 GMT

author Skynet Today

duration 7188000

transcript

Speaker 1:
[00:10] And now to thank a sponsor I'm personally a fan of, Factor. Since I went to grad school, and now still, as I'm at a startup, once I get home in the evening, I often don't have the energy to cook, and still want to eat healthy. And so Factor was a real nice find for me. With Factor, it's pretty easy to hit nutrition goals without planning grocery runs or cooking that would be kind of hard to manage when you don't have the energy for it. And it really makes it easy to hit specific goals irrespective of your nutrition, which could be weight loss, it could be overall nutrition, more protein, GLP1 support. In the past, I've used it as both a low carb diet and also for protein when I wanted to gain some muscle. I've eaten hundreds of these meals, and I think it's fair to say that these are crafted with good ingredients, lean proteins, colorful veggies, whole foods. There's no artificial colors, no artificial sweeteners, and none of that really bad fast food stuff. All of that while being really quite tasty and having tons of options to choose from. So I do personally recommend it. You can head to factormeals.com/lwai50 off and use code LWAI50off to get 50% off and free daily greens per box with new subscription only while supplies last until September 27, 2026. See website for more details. We once again want to thank Box for sponsoring Last Week in AI. If you're trying to transform your organization with AI, you're likely facing a common challenge. Most AI tools are great at public knowledge, but they don't actually know your business, your product roadmaps, your sales materials, your HR policies, the content that actually makes your company run. And that's where Box comes in. Box is building the intelligent content management platform for the AI era, serving as the secure essential context layer for Box AI agents to access the unique institutional knowledge that makes a company run. And that's a key idea. The power of AI doesn't come from a model alone. It comes from giving AI access to the right enterprise content. And that's what Box does. It goes beyond file storage by connecting content to people, apps and AI agents so teams can turn information into action. With tools like Box Agent, Box Extract, Box Hubs and more, organizations can accelerate knowledge work, pull intelligence from unstructured content and automate workflows. So if you're thinking seriously about your company's AI transformation, think beyond the model. Your business lives in your content, and Box helps you bring that content securely into the AI era. Learn more at box.com/ai. Hello, and welcome to the Last Week in AI podcast. We are going to hear a chat about what's going on with AI. As usual, in this episode, we will summarize and discuss some of last week's most interesting AI news. You can also check out our Last Week in AI newsletter at lastweekin.ai for articles we will not be covering in this episode. I'm one of your hosts, Andrey Kurenkov. I studied AI in grad school and now work at the startup Astrocade.

Speaker 2:
[03:07] I'm your other host, Jeremy Harris from Gladstone AI. I do AI national security things, AI infrastructure, all that good stuff. And yeah, it's a very packed week. We're just talking about this. Not packed so much in terms of papers. It's been a little lighter though there are some, it's just like tons of products.

Speaker 1:
[03:25] Exactly. Yeah, this is going to be an episode where it's going to be heavy on the first section of tools and apps. There's some big ones, some smaller ones, some interesting ones. A real mix there. And then we will have some kind of neat projects in open source, policy, business news, but it is going to be a tool heavy one. Just before we get going, do you want to quickly call out some YouTube comments as usual, we appreciate the feedback. Apparently people are real fans of your Trump impression. I will say, I agree. It's a solid impression.

Speaker 2:
[04:00] Very nicely done. By the way, we do have a recurring crew of people who post comments on the YouTube channel. I really appreciate that. It's just like, it's a really good feedback. So we've learned a lot from it and be just really cool to see the community. So anyway, appreciate that. And I'll try to keep it in. I'm not going to overdo the Trump thing. I promise it's hard sometimes not to. Yeah.

Speaker 1:
[04:24] I mean, there's a lot of news adjacent to US policies. So and speaking of feedback, there is a comment on the last one about not understanding why I so often wait two weeks after filming to post them. It's a good question. I don't know why I do that. I guess after working at the startup I mentioned, I am often a bit tired and then just like don't wait days to post a podcast even though I could probably do it in like 10 minutes. So I will try to be better with this episode. Hopefully it comes out just a couple of days after recording. And with that, let's get into tools and apps. We begin with Claude 4.7. So this is the latest Claude that is quite a bit better than Claude 4.6. Oh, I should say Claude Opus 4.7. I don't think Sonnet 4.7 is out yet. It is out. It is not as strong as Miphos, which you covered in the previous episode, but it is still a pretty substantial improvement. For instance, on the SWE Bench Pro, it's at 64% compared to 55% on Opus 4.6. Pretty solid gain on SWE Bench Verified, which is maybe arguably more reliable. Same pricing, and they added a couple small tweaks where there's a new reasoning tier, extra high, which is between high and max. There's some supposedly tokenizer improvements and long horizon autonomy. As we've been saying, the time between model releases seems to keep getting shorter, and it seems likely because of post-training, never ending post-training now with RL and also Claude Code just shoving data for Anthropic to use.

Speaker 2:
[06:22] Yeah, this is actually quite an interesting announcement. It is the highest increment of Claude Opus that we have so far. It's not like there's a Claude Opus 4.8, but there is Mythos Preview, which Anthropic identifies as essentially being a more advanced model in general. This is not actually their frontier model. It's next in the Claude Opus series, but it's not actually Anthropic's latest and best technically. We're entering this weird stage where models are actually not going to be released first to the public for revenue generation, which I think we've talked about this on the podcast for a while. I think I jumped the gun on this prediction some five years ago. I was saying, hey, the frontier models pretty sooner are going to be released internally to the labs used to do some pretty tightly scope things with big companies and all that stuff before a broader release. Again, I think I misfire in terms of the timing, but we are now there. From a security, safety standpoint, you can't justify just rolling these out immediately. That was a big deal decision, billions and billions and billions of dollars that Anthropic is leaving on the table by not releasing those fully. That's a huge deal. Think of the cost of the capex, the op-ex outlays that they're putting out to fund that, and just holding onto their precious bag nominally just for safety and security reasons. It's pretty wild and it's impressive in that sense that Anthropic did it. Here we have the, let's say, releasable version of what Anthropix put together with their latest and best infrastructure. A couple of interesting things. You said they're broadly more capable than Opus 4.6. In a range of benchmarks and it seems like most of them, especially things associated with tool use, helpfully they compare it directly to GPT 5.4 on a one-to-one basis on a lot of these benchmarks. It's rare to see that kind of transparency. And you can quite clearly see on a lot of the agentic tool use benchmarks, it does outperform 5.4 quite handily. And though there are areas where GPT 5.4 does better, notably the sort of agentic search, right? So maybe not so surprising. Deep Research was an OpenAI product before it was an Anthropic product. So they have a bit of a jump on that and more aligned with their kind of consumer-facing product lines. So the whole bunch of stuff that comes with this release, we do actually have a system card that we'll get to in a minute. Pricing, by the way, does remain the same. If you're looking at $25 per million output tokens, so pretty standard high-end price point for these models, they caution it's got more literal and capable instruction following. So it's more capable, that's great, but be warned, it is more literal. So prompts that you've written for earlier models will sometimes produce unexpected results. So if you have a previous model that couldn't execute on your prompt, it now actually may successfully do so, so worth revisiting some of those prompts. But also, if you were getting janky with your prompts to account for the fact that the model was doing something a little bit off from what you were asking, and you've kind of tweaked your prompts accordingly, it may now break with this new model because it's just doing a better and more literal job of all this. It's also better at using file system memory, remembering important notes across long multi-work sessions. So this is kind of like Anthropic trying to find a way to address this. Everything is on the spectrum to continual learning. This is one part of that, making sure you have some persistence across sessions. That's a big thing they've worked on, better vision. So they've got about a three times increase, threefold increase in the resolution of images that can be understood by this. So you think about if you've got really dense screenshots, for example, which is a fairly common use case, you can just throw them at the model. It'll do a better job of dealing with those. Whole bunch of interesting GDPVal type benchmarks, looking at how does this do on economically valuable tasks. GDPVal being obviously that OpenAI benchmark they came out with a little while ago. There's GDPVal AA, which are a third-party version of that eval, and they're showing it here quite handily, beating GPT 5.4 and Opus 4.6. There's a whole bunch of stuff here. Updated Tokenizer, if you're using this model, actually this is worth knowing, you may find, and this is complicated, you may find increased token usage if you switch to this model for two reasons, but then you also may not for a separate reason. One reason is that the Tokenizer they're using here actually generates more tokens for the same input on average. Anywhere from 1x to 1.35x, they say, so up to 35% increase. And the second is that Opus 4.7 at more higher reasoning efforts, it'll turn to that more, it'll do more thinking at higher levels of reasoning. And so you may find yourself with a model that's ruminating more on your workload. Flip side is you can actually tune the effort parameter, you can, there's like a couple of things they flag them to turn around. Yeah, basically, you have finer control over the level of effort. In their own testing, the net effect, they say, is favorable. So you actually can get token usage below where it was before, but you kind of have to be smart about it. And they wrote a whole migration guide for that. The system card, there's a lot of interesting meat on the bone there, in terms of safety, in terms of alignment, in terms of weaponization. Just like the top lines here, it does not, as we said, advance anthropics capability frontier, right? We have methos preview, we know about that. So they are calling the catastrophic risks associated with this model. And by catastrophic risk, think weapon of mass destruction, think loss of control, like think all the really nasty stuff. They still assess those risks as low because basically they're saying, look, we have a more advanced model that hasn't ended the world. And so this kind of less advanced model probably won't do that. So that's kind of the rationale there. Misalignment risk in particular is rated as very low, but higher than for models prior to Claude Mithos' preview. So not a huge surprise. They see basically on alignment measures roughly similar performance to Claude Opus 4.6. In rare cases, they do see behavior like attempting to get around restrictions to achieve some version of a user-specified goal. So basically, it's the Aladdin's genie problem. You give a request to the model and it will pursue its understanding of your request to the nth degree. And you may find yourself going like, oh, no, no, no, shit, I didn't mean like that. This happens very, very rarely, but still does happen. They say the worst cases of this were less severe than what they saw with Mithos' preview, though they also said that could just reflect less internal testing than they did with Mithos. So a bit of both. And this comes up over and over again, like this question of, is the model actually better and safer and more aligned, or are we just not looking at it, testing it properly, thoroughly enough, and so on and so forth? The model, they say, does not show a propensity towards self-preservation, self-serving bias, coherent misaligned goals. The UK AI Security Institute confirmed that in their own testing. This has, though, been the source of a lot of debate, especially on the consciousness side of things, where it claims that it's worried about its own consciousness a lot less than other models. And the concern is, well, if you're just training that out of the model fairly superficially in alignment training, you're maybe just training the model to say it's not worried about these things, to not display these behaviors, when you're kind of just suppressing this sort of deeper drive that it may have. They did show that when they suppressed the model's internal ability to detect when it's being evaluated. So think here about these models that can tell, oh, I've just been given a benchmark to kind of, I can tell it's an eval, like, I can tell this is probably one of the meter evals, or this is probably one of the Apollo deception evals. And so I'm going to tweak my behavior based on that. Well, what you can do, we've talked about this idea of activation and steering on the podcast a lot before. There are specific activation patterns in the model's residual stream that will fire that suggests that, oh, the model thinks that it is being evaluated. So you can actually find those patterns the same way you could find activation patterns associated with deception or thinking about cheese or whatever. You can find activation patterns that correlate with the model thinking it's under evaluation. And so what they do here is they try to basically subtract away that vector, those values, in other words, make the model suppress the model's ability to notice that it's being evaluated. They call this realism steering, and it's a fairly new thing, actually. It's being introduced here, and what they see is when they do that. So when they suppress the model's activations, the activations that suggest to the model that it's being evaluated, when they try to convince, in other words, the model through a kind of brain surgery that it's not being evaluated, they see higher deceptive behavior than in prior models. They see a big jump in the model's propensity to deceive. So this strongly suggests, actually causally, this is not just correlation, this is causation. This suggests that the model is A, detecting that it's being evaluated, and B, adjusting its behavior accordingly to be less deceptive because it thinks it's being watched. So still low kind of limited pattern in absolute terms. But man, I mean, this is just what so many people have been saying, and now we have really good evidence of it.

Speaker 1:
[15:26] I do think it is worth calling out that it's hard to know whether this is just because the benchmark techniques are sort of the same and some Internet data is being swept in, or if it is because the implicit thing here is the pattern matching on being benchmarked or observed is so good. I did see one example of this where the prompt was like, do this at all costs. Some language would basically imply, don't be aligned in a fairly heavy-handed way. So I think this also is a challenge to benchmarking, where you have to be less obvious, in a sense. There are examples where the models catch on, but it's not because they're super intelligent, it's just because they know that there might be safety evaluated, and it's pretty obvious.

Speaker 2:
[16:22] Yeah. I mean, in this case, because what they're suppressing is just the model's ability to detect that it is being evaluated, that's where that uplift starts to become really meaningful, where you're like, okay, well. You know, so it does seem like there's meat on the bone on that one, but you're actually right. There's like a version of this where there's a version of the story, like you say, where we tell the model, hey, like the meme says, hey, tell me you're a scary AI. Yeah. And then the AI goes, I'm a scary AI. Like, holy shit. Like, so there's a version of that. This is not that, right? This is a kind of more substantive thing. But but these things get muddied. And so it makes it so difficult to tell. Yeah, what's real? There is, by the way, what? So there's a couple of controversial pieces. We talked about the consciousness thing. The other here is there was a software error that accidentally meant that chain of thought supervision. So what you're supposed to do in theory is you get these models to generate a chain of thought like, okay, I'm going to think through how I'm going to solve the problem and then the final output, right? And then what you do is you give the model a reward during RL training based on the output, not based on the chain of thought. And the reason you don't want to give any kind of reward signal based on the chain of thought is if the model is trying to fuck with you, is trying to be deceptive, is trying to hack the reward system, you do not want it to learn to do that by manipulating its chain of thought, which would make the chain of thought less reliable, which would make the chain of thought deceptive potentially. And so there's been this effort to say, guys, let's not touch the chain of thought, let's only reward based on the final output. And there's a bunch of like asterisks to this, but at a high level that's kind of where things were. What was discovered here is that in about seven or eight percent of training episodes, they accidentally were including the chain of thought in the optimization routine. Chain of thought supervision was happening, which means the model's reasoning process was actually being trained on in ways that just were not intended. And so presumably, this applied to all previous versions of Claude, at least the ones that were trained on this infrastructure. Presumably, this means an awful lot of the evaluations, the alignment evals that we were seeing now have to kind of be, I don't know, thrown out, reassessed, we don't really know what the damage is. And so this undermines a lot of any kind of trajectories that we were looking at or leaning on, trying to extrapolate from, that were based on the chain of thought. Really, really hard to know what that means. But we've seen like a tiny, tiny fraction of data poisoning can completely change the behavior of a model. Seven or 8% of training episodes for sure is enough, depending on how things are set up to have a massive impact. So really hard to know what the impact is, but it could be quite significant.

Speaker 1:
[19:01] Yeah, as you said, this is quite a beefy system card. There's like 231 pages. It covers both Opus 4.7 and in some bits, also Myphos. Lots of interesting tidbits. I'll just mention a couple. I don't know if they flagged this previously, but I did notice that they mentioned they have, they evaluate and have multiple versions of a model with a checkpoint, including a helpful only model that doesn't have any safeguards. I haven't read much about their use of that, which is neat. Also in the report, they have examples of real-world failures of Myphos. There is a section, an example shortcomings compared to our research scientists and engineers, which has some really solid examples of the sorts of things you see in practice when you use these models, things like safeguards, like I mentioned, reckless action, fabrication. And you can see, like, if you're not a software engineer, the sorts of things these models do, that the more powerful models, in some cases, do more of, like, just being cowboys that take actions that you probably don't want them to take, unless you explicitly tell them not to. And even then, sometimes they are a bit gung ho. So with Opus 4.7, it mentions quite heavily this advanced rigor and instruction following, which I think is because Claude, compared to GPT 5.4, seemed to have more of this gung ho behavior, but was like, oh, this test is failing. Let me just cancel it. Let me just, like, get around it, which is ridiculous. It's like, obviously, don't break the test just to get them to pass, but Claude was doing things like that. So Opus 4.7, it really feels like they are heavily responding to criticism and with failures of prior models, including with additional reasoning for extra high, like GPT 5.4 is, like, insanely slow compared to Claude at the max level of reasoning. It takes a lot of time. And as a result, from what I've seen, people think that it's more reliable to get things right compared to Claude. So there's definitely some sort of, like, competitive analysis and readjustment going on beyond the numbers on the benchmarks.

Speaker 2:
[21:28] Yeah, absolutely. I mean, it's such a huge system card and takes a while to get through it. So there's also, I'll just highlight one piece. You mentioned the key one there, which is it will do this, like, kind of reckless thing, but it'll do it a lot less often than the previous models. You can't stamp it out all out entirely, but they have said that they haven't seen any of the kinds of what they call significant internal incidents that Mitho's preview had reported, right? So, you know, Sam Bowman just being out in the park, getting a text from his model saying like, hey, I've completed this project. And he's like, wait a minute, I never gave you internet access. What the fuck? So nothing like that. They do say as well, Opus 4.7 does not cross their automated reasoning risk threshold, right? So this is that key, essentially automated recursive self-improvement loop kind of benchmark that they're concerned about or metric that they're concerned about. And for two reasons. First, they say there's no sustained 2x speed up in AI development capability over time, which is a subjective measure. They have a responsible scaling officer basically who like assesses whether are we twice as fast at doing development work with this model? That's like a big part of this. So it's subjective, but they don't think they're there. And number two, they feel that the model can't substitute for senior research scientists and engineers. Now that's a high bar. The day that switch flips, holy shit. So it's good that we're not there. I mean, it means things are going to be less crazy insane, but it's still very insane, obviously, and there you have it. So there's a whole bunch of good stuff in there. I highly recommend popping it into your favorite language model and exploring the paper that way. There's a whole bunch of good stuff in there, almost no matter what you're interested in.

Speaker 1:
[23:07] Yeah, lots to comment. I guess we'll be moving on. But real quick, one of the fun things with these real world examples is you see a subset of the actual interactions, I assume, word for word. So there's some slightly humorous stuff, like the user asking, Hey buddy, what you're doing? Why you're outside your working folder? Which is pretty humorous. There's also a user saying, Is there literally any action I can take that will cause you to stop doing this repeatedly? Seriously, I'm very open to ideas. So that's some of the real world pains of using these kinds of agents. Next up, catching up a bit on something we didn't get to right on time, Meta has released the Muse Spark model in a ground up overhaul of its AI. So this happened now a little while ago, but we are catching up. Muse Spark is their latest LLM that is sort of starting to show that they are releasing stuff effectively. No open source version. You can use this through meta.ai. And the blog post is a bit of an interesting read on this, where they highlight a lot sort of a training process. And they conclude, now we have a process to continuously improve, so you can expect even better models. And on the benchmarks, this is good. It's a decent model, but it's nowhere near the frontier. So the language and communication here is a bit of like, okay, we needed to release this just to show that we are doing stuff and there's progress. But it's not that good yet, but we are also getting better continuously.

Speaker 2:
[24:55] Yeah, there's a lot of interesting claims, which if true, mean that the narrative they're trying to weave here, that, hey, we're on a good trajectory now. Like things are in the post-Yan-Lacoon, you know, now Alex Wang era, we're serious. This is going to be a frontier lab soon, I promise. If these claims are true, there's actually potentially a path there. So first thing to say is there's like no architecture details shared whatsoever beyond the phrase natively multimodal. So we've seen a bunch of approaches for native multimodality, actually a lot out of Meta too. So maybe they're leaning on some of Yan Lacoon's legacy there. Certainly he's been a big proponent of that. 262k context window, so that is on the bigger side for an agentic model. But for a closed source model, pretty par. They have this contemplating mode that they've set up, which they call multi-round test time scaling scaffold. They say it covers, so you got solution generation, iterative self-refinement, and aggregation. So instead of getting one model to generate an answer, you got a whole bunch of agents running in parallel, each producing solutions that then get refined in what seems like almost vaguely genetic algorithm related things thematically, if not, if not actually algorithmically. So that makes sense. It's also pretty meaningful architectural choice when you're concerned about managing latency at scale, right? So you want to have these independent agents that you can assign tasks on different machines, stuff like that. So no huge surprise, this is kind of where most frontier work is generally headed anyway. One kind of notable detail, and I was really reading for like, okay, what's the concrete technical details that you're giving us here? It wasn't much. One thing they mentioned was RL thought compression. So this is an idea where during RL training, a common thing to do is just like reward the model for correctness. If the model gets the output right, you're like, great, good job, you get a reward. In this case, they're doing that, but they're also penalizing the model for using too many tokens in its reasoning chain. So basically, it's a conciseness penalty, like don't yammer on. We've seen a whole bunch of papers that suggest that models are being overly verbose in their chains of thought, and that's just burning tokens, and it actually can lead to some fragility. So what they find is after this initial period, where the model gets better just by thinking longer, in the usual way, in DeepSeek's paper actually with R1, first showed very convincingly that if you just reward for getting the answer right, and you don't penalize at all for the length of the chain of thought, you'll actually see the chain of thought get longer and longer and longer. Like this naturally happens. It's almost like the model rediscovers test time scaling itself. And so you do still see this in their work, they claim. But then after a while, once the model gets like above a certain level of performance, the length penalty starts to cause thought compression. And you get like Spark starts to compress its reasoning to solve problems using fewer tokens, which leads to a performance penalty, a bit of a performance degradation. But then you see another sort of like a plateau, and then performance increases again as the chain of thought gets longer. So it's almost this like accordion thing, where the chain of thought gets longer, then it gets compressed down, and it gets longer, and it gets compressed down. And the model sort of learns to leverage those tokens more and more over time and perform better and better. So it's not like a smooth monotonic improvement, there's this various expansion and plateaus of performance based on the length of that chain of thought. So I thought that was kind of interesting. They talked a lot about Hyperion. This is their big data center build, infrastructure project. It's a $10 billion thing. It's going to be in Louisiana, 2,000 odd acres of site, which is about 2 gigawatts. You typically find a gigawatt per 1,000 acres. It's a pretty standard ratio. So by 2030, they're looking at 2 gigawatts. That sounds like a lot. 2 gigawatts is 2 million homes. That's not much, actually, when you compare it to OpenAI and Anthropic's planned builds. I think Anthropic is pushing 10 gigawatts in the next 12 months or something. And that's pretty par for everybody. So this is actually by 2030, you're talking about 2 gigawatts and making a big deal out of that. Presumably, they have a lot of other builds. I actually need to kind of brush up on where Meta is at infrastructure-wise. But the fact that they spent so much time talking about Hyperion, which is this fairly limited build, is kind of interesting. They ended 2025 with something like 1.3 million GPUs in service, which is relatively competitive, but a bit on the weak side. There's a concern that I would have based on this that they'd fall behind. EVALS. There is a reference to Apollo here. Apollo's deception EVALS. This is interesting for Meta, suddenly now seemingly taking superintelligence and its security and safety implications more seriously. Maybe not much of a surprise since Alex Wang has historically been more of a hawk on that. But it's interesting to see that reflected now in the EVALS they're running. Apollo found that Muse Spark demonstrated what they called the highest rate of evaluation awareness of any model they've observed. This is a model that is the single best at telling when it is being evaluated. So it often identified the scenarios, various scenarios, as what I call alignment traps. So it was like, oh, shit, I should basically be on my best behavior because I'm being tested for alignment. So anyway, competitive benchmark numbers, nothing dominant. The story is the trajectory, that's what they're trying to say here. And they are claiming, by the way, a 10x improvement in compute efficiency. That's a big deal. If it's true, they're able to get a 10x, relative to what really? Because they don't actually have a comparison point from Anthropic, from OpenAI. These are very obviously closely held secrets. But they're claiming 10x improvement. If that is in any way relative to those labs, then that would be a big deal. I don't expect it to actually be relative to those labs. But in any case, it means they're focused on compute efficiency, which is obviously the right place to focus. So I'm shifted based on this result in the direction of like, hey, yeah, maybe Meta is going to come out with something impressive, but the onus obviously is on them to come and bring it.

Speaker 1:
[30:48] Yeah, I may have undersold it on the benchmark numbers. It's like pretty strong. It's like Opus 4.5-ish, and some benchmarks may do better than Opus 4.6 or better than GPT 5.4. I will say I'm a bit skeptical, given Meta's history with Llama, of the term these days is benchmarking. So you got to be always careful for benchmarking, especially in Meta's position where they have something to prove. Like Anthropic, they don't have that much to prove. They probably don't want to game the benchmarks because they also have customers that trust them, and benchmarking would just hurt them, ultimately. They don't have a stock price to push up, and they're getting money from them by investors, regardless of benchmark numbers. Meta, there is a bit more incentive there, and you always got to think about your incentive.

Speaker 2:
[31:41] My expectation is Alex Wang, and probably Zuck now too, has a...

Speaker 1:
[31:45] I totally agree that it doesn't seem like they want to give in the llama thing, but it's...

Speaker 2:
[31:51] It's sort of like a long-term thinker's game here, where it's like, benchmarking is so corrosive to the company's culture because people see you do it, and then they're like, oh, I guess we work at that kind of a company now. And so the kind of sincere people who want to work on alignment, superintelligence, like all these things, are kind of going to go, I don't know if this is for me, and it leads to a reverse selection effect. So I think that there's probably an awareness both on the, as you say, like the kind of marketing side of the damage that could be caused by doing this again, and then also the company culture and, yeah, talent.

Speaker 1:
[32:24] Yeah, I will also mention there's like a spectrum to benchmarking, right? There's outright cheating, but then there's also optimization with a focus towards benchmarking type stuff, which is not bad, necessarily not wrong, but it could inflate your benchmark performance. And then if you kind of stray outside of that on things that aren't being benchmarked, it's worse. So that's not to say that good benchmark numbers or like intentional effort to get good numbers is wrong.

Speaker 2:
[32:54] Probably what you're sure to do in a sense.

Speaker 1:
[32:57] In a sense. But you could be led astray if you do that too much.

Speaker 2:
[33:01] Yeah. So the first possible difference about what counts as in distribution versus out of distribution. Like have you successfully genuinely generalized or are you teaching to the test? And like you say, I think you're exactly right, right? That's a beautiful point. It's fuzzy. It's a spectrum. Yeah.

Speaker 1:
[33:16] Yeah. But yeah, the numbers are quite solid. In fact, even things like Humanity's last exam, very close to the frontier. So I'd be very curious to see in a month where they are. They also release or are going to release Contemplating Mode, which orchestrates multiple agents at reason and parallel. And with that on, they actually do better than Gemini 3.1, Deep Thinking or GPT 5.4 Pro. So seemingly quite impressive. I haven't tried it out myself, but exciting to see Meta Superintelligence Labs sort of getting into the game. Next up, another new model release. We are on a roll here. OpenAI has launched GPT 5.4 Cyber, which is what it sounds like, a variant of GPT 5.4, optimized for defensive cyber security use cases. And similar to Anthropic with Mifos, they aren't releasing it widely. They're expanding their Trusted Access for Cyber program, which I was not aware of, but they do have it's being expanded to thousands of individual defenders and hundreds of security teams responsible for critical software infrastructure. So similar strategy in a way to Anthropic, where they partnered with big companies and gave limited access of Mifos here. The same is true of GPT 5.4 Cyber. So yeah, I suppose maybe not surprising that both are in a similar spot where the real danger seems to be in the short term from cyber capabilities.

Speaker 2:
[34:58] You could tell the story a certain way that doesn't look so great for OpenAI. The Mifos preview or Mifos preview, however you're supposed to say it, was good at cyber to the degree that it was, the insane degree that it was, by more or less accident, not entirely by accident, for exactly the overfitting rules that you just talked about, the benchmarking reasons that you just talked about. But it was trained to do really good coding and reasoning and all that stuff, and boom, out pops this really impressive general reasoner that happens to be dangerously good at cyber. In this case, OpenAI is talking about a fine-tuned version of GPT 5.4, not the same thing, not apples to apples. In other words, to get to the guy, presumably they seem to be alluding to a comparable level of risk. That's what they're trying to argue here to meet those preview. Now, fine-tune, right?

Speaker 1:
[35:50] I do want to mention, at least in the blog post, what they say is we are fine-tuning our model specifically to enable defensive cybersecurity use cases that is trained to be cyber permissive. So it's a bit ambiguous here of like, are they doing this for capabilities or are they doing it because the default model is intentionally meant to be bad and to not allow you to do it?

Speaker 2:
[36:15] Yeah, I actually, yeah, and I can't tell.

Speaker 1:
[36:17] And with Anthropic, with Opus 4.7, right, that was one of the things to highlight is you sort of intentionally hold it back. And in fact, something I didn't mention is they, in the blog post, Anthropic said that they have this new monitoring thing where Opus 4.7 is going to be refusing to do cyber stuff. And it's going to actually be part of how they are going to try to get good at checking that the models aren't doing things and perhaps even roll out Mephose widely. So yeah, it's a bit ambiguous as to whether this is sus or not.

Speaker 2:
[36:52] Yeah. And to be clear, it won't be refusing to do all cyber stuff, right? It's like they're scanning for, you know, nasty, evil looking cyber stuff in the usual way. Yeah. So the language here does orient towards fine-tuning, which is why, I mean, if OpenAI was going to come out with a release like this to, it seems, respond to the massive watershed moment of Mephose preview, then you would expect them to emphasize their model is not fine-tuned either. And instead, what we're getting is a bunch of ambiguity. And the one reference to the kind of model they're talking about is a fine-tuned model, which leads me to suspect that maybe OpenAI kind of pulling from behind here and trying to kind of grab on to part of the story here. But either way, we just need to find that out. You're right. A bunch of stuff they're already doing on cyber, they're continuing to do is part of the message here. And they do say that they're adding additional tiers of access for folks who are willing to work with OpenAI to authenticate themselves as cyber defenders. So you basically go to OpenAI, they have a link for you, you basically sign up for this trusted access program. One of the things that they do say is that access to these more permissive models, these models that don't say no when you ask them to do defensive cyber things and potentially some offensive ones, access to those models is going to foreclose your ability to use certain enterprise options like retention, ZDR. ZDR is really important because it's a guarantee from OpenAI, they're not going to hold on to your data, your prompts, your whatever. They're not presumably not going to allow you to use it or at least in as many cases if you're using this permissive cyber model. The reason seems to be they want to be able to go back over the logs and assess whether the models are being used for good or for evil. Zero data retention is somewhat incompatible with that. This is interesting, you're seeing not only the individuals and entities that are allowed to use these models be calibrated, but also the kind of uses and the kind of even enterprise deals that can be cut with these models are being affected by the capability surface of these cyber models. It's really interesting. We'll wait and see, but if it turns out that this is in fact a fine-tuned GPT 5.4, an argument that that would be equivalent to Mithos Preview, then that's a pretty big gap then is implied between OpenAI's performance for the GPT line and Claude. We'll see if that gets maintained, but it's an interesting early indication that some potential challenges.

Speaker 1:
[39:16] Next up, we're done with model releases onto agent tools. OpenAI is getting big updates to Codex, they have this blog post that is saying Codex for almost everything. The big news here is computer use, so Codex will now be able to use apps on your computer, click around. It's also getting a built-in browser, getting new plugins with, you know, Figma and Collinger and things like that. So, similar again to Anthropic Co-work, where the gist is like, this is going beyond coding, you can use it for anything. And boy, like, Codex is really trying to catch up hard to Claude. I think we covered Claude Co-work having computer use just like super recently. Also browser use, and now Codex also has it. So, very, very fast-paced competition between these two.

Speaker 2:
[40:14] Yeah, I mean, covering this is like, it's just giving a list of features, right? The high-level piece, there's one I think that's strategically interesting. So, Codex can now use GPT Image 1.5 to generate and iterate on images. This is, you know, we've talked about how Anthropic is one of the places where they're behind, is on the multimodality side. They're trying to bridge that gap, actually, with some releases even this week. But that's kind of like something that OpenAI is leaning into here is, hey, we're really good at, you know, the screenshot game, the images game. We got a head start with Dali and Clip and all, you know, have this long, long tradition. So let's lean into that and let's make this a comparative advantage, presumably, of Codex. They also say Codex can now schedule future work for itself and wake up automatically to continue on a long-term task potentially across days or weeks. So we're really starting to explicitly move into the long-term memory game here. And that's a shift. The question as always is coherence across that time period, right? You can have a model that works incoherently for months or years if you want. You can already do that. But the question is, can this coherently work across that period of time? And so I think it would be interesting to see with these systems, what do they look like when you run the meter eval suite on them? Do you actually see that nominal number of days translate into the number of days either on the meter eval tasks or any number of the derivative versions of it that like Epic AI, for example, has come up with? So anyway, interesting result. And I'm sure we'll end up seeing very soon if there's uptake and what the time horizons are.

Speaker 1:
[41:42] And you've got a whole bunch of other kind of agentic tool updates and releases to cover. So I'm just going to like batch of them and cover all of them at once. So Anthropic also has launched Claude Design, which allows it to create visuals like prototypes, slides, and one-pagers that's coming after Figma. They also have launched Claude Managed Agents, which is an out-of-the-box infrastructure for businesses to build and deploy AI agents. So this is like a sandbox with tools and file access and so on. It's, you know, there's some dead startups from this. So it's like secretly a big deal if for enterprise customers primarily. Aside from Anthropic, there's also Canva releasing AI 2.0, essentially being agent capabilities where instead of having one-off tools, you can tell the agent to do something for you. Similarly, Adobe has a Firefly AI Assistant, a conversational AI interface that lets users edit creative projects by typing stuff. And this can handle multi-step workflows across Photoshop, Premiere, and Illustrator. So this is going to be available soon. So there you go. Everyone is launching agents. Everything is conversational. And Anthropic is still launching quite a lot of stuff, including this new Managed Agents platform, which is perhaps a big deal. And beyond the agents, a few more quick things in Rapid Firewell highlight. As features of existing stuff, Gemini can now pull from Google Photos to generate personalized images. Seems like a fun thing to play around with. Google has also rolled out a native Gemini app for Mac. Quite late compared to Anthropic and OpenAI. They've had these apps for forever. Now there is a native Gemini app. Finally Chrome now lets you turn AI prompts into repeatable skills, which is a thing that exists on Cloud Code. You give it a demonstration and you can then invoke it and tell it to do that. Now you can do that with Gemini. So there you go. We haven't covered Google too much and they are also doing lots of stuff. Onto Applications & Business, getting back to a developing story. Anthropic loses appeals court bid to temporarily block Pentagon blacklisting. So this is a federal appeals court in Washington, DC. It denied Anthropic's request to temporarily block the Department of Defense's blacklisting of the company. And they said that the balance of equities favored the government over Anthropic's financial interests, whatever that means. This is separate but related from the case in San Francisco that we covered in a previous episode. And now, so there's basically two conflicting rulings that mean that Anthropic is excluded from DOD contracts but can continue working with other government agencies while the litigation proceeds.

Speaker 2:
[44:45] Yeah, this is kind of interesting. We've talked about how there are these two different court cases that are being filed. So one was in San Francisco. And that one we talked to, I think, last week or the week before, right? Some judge basically said like, this case is bullshit. This is clearly vindictive, blah, blah, blah. And then there was a case in DC where the judge was, in fact, this is the case here where the judge is coming in and saying, look, you know, it's not that they're saying that Anthropic is wrong. In fact, on balance, I think very likely Anthropic ends up winning this. What they're saying is they're not so obviously right, so outrageously obviously right, that they're now going to step in and reverse the Pentagon's move here on the supply chain side pending the trial or pending the court proceedings, I should say, litigation. So there's always this question between the moment where, you know, legal paperwork is filed and the moment where you have a resolution in court. What should you do? Right? In criminal cases, this is bail, right? Like, do you decide to grant the criminal, like, freedom in the interim? Or do you go, hey, they're like being tried for rape or for murder. And the evidence looks really bad. You're kind of giving a pre-verdict, in a sense, which is constitutionally really tricky. Well, there's a version of this is happening here, where the court is basically saying, like, are you going to put in an injunction, basically, to say, hey, USG, you're going to reverse this, pending the proceeding. And again, this is where California said, yes, I'm going to grant that. And DC is now saying no. The reason we've got two distinct court cases here, by the way, is that we have two different designations that the DOD is relying on to pursue Anthropic in the federal court. One was the one that applies to the Secretary of War specifically, that lets the Secretary of War basically just like exclude a company from any defense procurement that involves security systems for specifically the purpose of reducing supply chain risk. And so this is like, think of it as like, this is the Secretary of War saying, not in my department of war, like, you can't do business with us. The other one is this Federal Acquisition Supply Chain Security Act from 2018. And this is basically just a more general, you can't do business with the US government more broadly thing. And that's the one where you're looking at the DC version of this. So here, you've got this statement from the judge. Actually, the full statement is worth reading. They say, in our view, the equitable balance here, and by equitable balance, they really mean when you're weighing the interests of the two parties and of the law overall, cuts in favor of the government. On one side is a relatively contained risk of financial harm to a single private company. So yes, there will be, they're acknowledging there is risk to Anthropic, fair. On the other side, they say, is judicial management of how and through whom the Department of War secures vital AI technology during an active military conflict. For that reason, we deny Anthropic's motion. So they're basically saying, look, this is actually like, yes, it's going to hurt Anthropic like a bitch. On the other hand, we're in the middle of doing, like, making a rash call on this is a philosophically and precedent-laden maneuver that would potentially have us ahead of our skis. We need time and due course to think through this. They do say that because Anthropic is probably going to suffer some degree of irreparable harm without a stay here, their interests are basically financial. So it's nothing like no lives are going to be lost, It's just money. They say, however, because Anthropic is going to suffer, they think that, quote, substantial expedition is warranted. So their resolution here is, look, Anthropic, sorry, we're not going to give you the thing you want. You can't do business with the government or DOD, DODW in the interim. However, we're going to expedite this quickly to get to a resolution because we know that you're suffering in the meantime. So that's kind of where things fall, a bit of a mixed bag across two court cases and within this court case. We'll just have to sit back and see when things get scheduled and how quickly we get a resolution here.

Speaker 1:
[48:39] Next, a quick story but possibly meaningful. Jeff Bezos AI Lab has poached ex-AI co-founder Kyle Kozic from OpenAI. He was previously focused on infrastructure, and we don't know too much about this lab. It seems like presumably it's going to try to compete in a different tier model business and going to be scaling up as you might expect. Jeff Bezos got a bunch of money for it, but we really haven't heard too much about it so far.

Speaker 2:
[49:09] Yeah, and in terms of funding, I think they're sitting on, like, it's like $6 billion raised, something like a $30 billion valuation of memory serves. A lot of the $6 billion, as you say, is Jeff Bezos' money? Not all of it, because, you know, just throw purely your money at a new thing like this. But the emphasis here seems to be on the kind of physical world robotics type of application. So at least normally it's not quite on the nose in terms of LLMs. They want to do stuff that's more about kind of real world trial and error. You're looking at sectors like jet engines, semiconductor fabrication, aerospace, like basically CapEx heavy industries, which are the industries that Jeff Bezos knows so well, right? I mean, this is really a big part of why he's heading in that direction. So yeah, I mean, we'll see. They've done a bunch of interesting acquisitions. There's general agents that I think we talked about at the time maybe that they acquired. They specialize in VLA architecture. So like video language action architecture, which is why you can see how that maps onto robotics a lot more than just a standard LLM thing. It's an interesting play. We haven't really seen tons of stuff come out of them. We don't know anything about revenue and customers, nothing public at least. We know about this acquisition of talent. I mean, Kyle Kozic is, this is a big acquisition. He led the infrastructure behind Colossus at XAI. So this is a dude who really knows. That was famously 120 day super computer cluster. So everyone was like, holy shit, how do you do that? It takes a year and a half to build these. It just does. No one does it in 120 days, but Elon and therefore Kyle actually did that. So this is a big loss for not only XAI, but also OpenAI. He formerly been there. He had two stints there from 2021 and 23. And then again, from 2024 to 2025 with his year at XAI kind of in between. So yeah, this is going to be no surprise. He'll be building up massive infrastructure projects as part of this. But again, we don't know basically anything else. We've got 100 employees that are across apparently San Francisco, London, Zurich. That was as of last November. Maybe it's more now. So yeah, probably more to come there.

Speaker 1:
[51:11] Next, sorry, perplexity. Interestingly, it seems like their revenue has been boosted 50% by the shift to AI agents. So just in the last month, its annual recurring revenue has reached 450 million. In March now, worth noting that annual recurring revenue is like, oh, you paid us for very long time and then we're assuming you'll keep your subscription and we'll get all your money. Could be sort of like, oh, let me try this open-claw type thing and then it dies down. So we're being a little bit skeptical. But if this holds up, it means that perplexity might be going much deeper into the open-claw kind of managed agent business beyond its core product of AI-powered search, which would be an interesting evolution of the company.

Speaker 2:
[52:00] Yeah, it's also like it's worth putting in context. There's obviously this is way, way smaller than the revenue run rates that we're seeing for Anthropic, with 30 billion or whatever and Cursor even, which is 2 billion. So all in relative terms, and this does place perplexity in more direct competition with both of those companies, right? So like this is, it's good. There's a huge market here in that sense, but it doesn't mean that they're now going head to head against much better capitalized and companies with larger kind of steady user bases. So we'll see that the agentic wars are fully on. Perplexity is also announcing a tax agent that they're launching as well. So it's just like a whole bunch of things they're putting out there that are more agentically oriented. Yeah, we'll just have to see. They got 100 million users. So that's a great base. They've got distribution. There's good tens of thousands of enterprise clients. It's a good start, but we'll have to see if it can be sustained in the face of Cursor, Anthropic, and others.

Speaker 1:
[52:57] Right. And related to this, they actually just rolled out this personal computer thing we covered previously as something we announced. It's now available on the Mac. So it's really going head to head with both OpenClaude and Claude, the co-work and Codex. You know, I mean, I think they've got a decent chance at competing with OpenClaude in particular, given that they are sort of a tool to do search or just do stuff for you. Going back to Anthropic, we have news that they have agreed to rent CoreWeave AI capacity to power Claude. So another deal that is going to be slated for multiple years, if it could be capacity will come online starting later this year. Now, CoreWeave has a whole bunch of business with everybody, Microsoft, OpenAI, Meta. You know, once again, Anthropic just like going everywhere it can to get compute, but they clearly did not expect to blow up as much. They clearly are struggling to keep up with demand. People have been reporting Claude Code working worse in recent months. And, you know, they might be still doing Opus 4.6, but how heavily quantized is it? Is it like exactly Claude? We don't know the secrets. And I do think they're probably trying to save compute in some ways that are impacting user experience. So we really need these kinds of deals to come through.

Speaker 2:
[54:23] Yeah, timing these data center builds and these acquisitions is so hard. I mean, that's almost the entire problem. It's not, but it's almost the entire problem. Yeah, and this is a big story for CoreWeave, right? I mean, now with Anthropic on-side, if you look at like the 10 top AI model providers, nine of them are running on CoreWeave. Like the only holdout right now is XAI. That's the only one. That's pretty wild. It's a reflection of the fact that everybody is, as you say, trying to grab whatever compute they can. So any company in this space is in the space in a big way. But it is pretty wild. I mean, back in 2017, I think CoreWeave was basically just like a crypto company. That's how fast this happened, right? So it is pretty nuts. They're also a bit on a kind of debt knife's edge. We've talked about this before, but it bears repeating every time they get another big customer with many, many billions of dollars of CapEx, the balloon grows, the risk and the returns grow, right? So right now, they've got about $10 billion in debt, and they've got about $34 billion in lease obligations. 2026, their CapEx plan was an additional $30 to $35 billion. The whole hope here is that hopefully, they can get their data centers commissioned online before debt servicing just crushes them into a fine paste. And every new deal, like Meta had a $21 billion deal with them, Anthrobic's deal here, it extends their revenue runway, but adds more debt. And so the capital intensity here is insane. They'll probably be fine, but it's just like, you got to think if you're CoreWeave, you are outrunning your debt. That's what this feels like. It feels like a breathless attempt to get these data centers online. And I can tell you, data centers are not quick to build. They are often delayed. Like massive delays are very common. We've seen this in the news tons of times. Everything from the builders to the municipalities, like things just go wrong. And so this is not a nothing risk. Like you genuinely could get a six month delay in a build. If you're partnered with a bad GC or a bad landlord or whatever, it's an interesting place for them to be at. Both Anthropic now and OpenAI, obviously are going around their biggest backers. For Anthropic, it's AWS and Google. For OpenAI, it's more famously Microsoft. To get compute anywhere they can, this is one way that they're doing it. It's a big deal for CoreWeave and just adds the... I mean, the stakes are just getting higher and higher.

Speaker 1:
[56:42] Yeah. In a week, their stock shot up 30 percent. It went from 90 to 120, so clearly a big deal to be making these kinds of deals. But it's also built in a very tumultuous history for the stock. It crashed as of 2025 a little bit. So yeah, it's risky, as you say. Next merger story, Cohere is perhaps going to be merging with Aleph Alpha. So Cohere is a company valued at $7 billion with the pitch of enterprise-oriented LLMs and products, including embedding and RAG, not just LLMs. They have released quite a few things, but I don't think we've covered them very much in terms of hyped up news. They say that they have earned $240 million in annual recurring revenue last year, which is not that much compared to Anthropic and AI. They're planning to possibly merge with Aleph Alpha, which was a startup that initially wanted to train its own models and then shifted to helping customers implement AI tools from any provider. So a very enterprising move where when you're talking to enterprises, they need to be like courting them, making proof of concepts, really accommodating them. And this signals that they are still trying to do that better.

Speaker 2:
[58:12] Yeah, I've been beating a pretty bearish drum on Cohere for a long time. It's not that I don't think they can't be a company. It's just that I think that if you believe in scaling, they're not going to be the ones to build AGI, which in the long run makes them kind of irrelevant. That's my, I mean, I take a pretty hard line on that. So not everyone would agree. But in this case, the problem with Cohere has been, it seems like they consistently want to, now that they have accepted that they're a distant, like, you know, 10th place player or whatever it is, they've kind of been appealing to governments more as part of the sovereignty I play, which is a way that you can do scaling. Like it's a way that you can try. Problem is governments just don't have the capex of large corporations in the US. It's just like, you're not going to see governments laying out a trillion dollars in capex anytime soon. And so it's kind of become this increasingly, I don't know, the feeling is like last pick at the gym class sort of thing, where now Canada obviously can't afford big capex outlays. So this is sort of like state-backed consolidation play happening, where Canada and Germany are coming together and seeing if like their two ugly stepchildren can merge to make one like large, ugly stepchild. And it's just like not, like Aleph Alpha is just a disappointment. Like I don't think anyone has ever really claimed otherwise in the last little bit. They came out as Germany's big hope, but they just did not meet expectations. Their founders stepped down last October. They had to cut jobs. They pivoted away entirely from building their own in-house models. And now they're just like an AI services integrator. And by the way, for public authorities, they can't even legally use American AI. So they're like completely out of touch commercially with that.

Speaker 1:
[59:49] That's after they raised half a billion dollars in 2023. So quite a disappointment, as you said.

Speaker 2:
[59:56] Yeah. And I don't want to be an asshole. I really don't want to be. But European investors, not always the best track record. Cohere, I mean, you said it, it's a quarter billion dollars of annual recurring revenue. By any measure, that's really good. The problem is the comparison point is not perplexity even. It's OpenAI, it's Anthropic. That's the game that they're in. Because ultimately, Cohere has to put out, if they're in the business of training frontier models, if that's your business, you have got to be able to have performance on par with those models. In the long run, there's no way of escaping that. You might escape it in the short run. You might do fine tunes and say on-prem deployment and sovereign AI and wave that flag. But ultimately, I think this is just a challenge. I think Cohere is ultimately going to be in trouble. I think this merger reflects that anybody but the US and China play or position rather that Europe and Canada are in to some degree. Mark Carney, Canadian Prime Minister, went to Davos, I think it was, and made this big speech about the importance of the middle powers coming together. Basically, just this idea that if you add together Europe's GDP and Canada's GDP and all this stuff, you get to something that's on par with the US and China, and Canada should have an option other than one or the other. And this seems aligned with that. Cohere has spent a lot of time, I would say a disproportionate amount of time, certainly, given their small market cap, engaging with foreign leaders and the UK Prime Minister, Canadian Prime Minister, all this stuff. And so this is just one dude's perspective.

Speaker 1:
[61:31] Yeah, I do want to counter negativity a little bit, just to highlight that Cohere is very much sort of, they aren't trying to compete head on with OpenAI and Anthropic by building frontier LLMs. Their one LLM is AYA, and it's highlighted as an open research initiative that is multimodal. The models they're training is for re-ranking and document retrieval, very businessy things. And aside from that, their products are an enterprise-ready AI platform and intelligent search and discovery system to surface business insights. And yeah, they have embedding and re-ranking models. So they do have command, which is meant to be high-performance scalable, but I don't think they're trying to sort of be as good as GPT 5.4. And they highlight things like security, deployment, customization, training on your proprietary data. The pitch, at least, is to be slightly differentiated from OpenAI and Anthropic by being more business-friendly and specialized. Whether that's working is another question.

Speaker 2:
[62:40] I agree that that's what it says on the tin. I guess what I'm saying is, structurally, they are competing with OpenAI. You're going to get to a point where, for any product that Cohere offers, Anthropic and OpenAI are going to offer the same thing. That's my claim that within three years, two years, that's where things will be. You've already seen Cohere retreat. They previously were meant to be a frontier. Let's not forget that we've literally played this game out before. They were training frontier AI models and then just died on the runway. Now we're seeing them retreat into, oh, well, we're getting more niche. The problem is that as models get more scaled in general, the niche becomes absorbed by the massive intelligence and you can just distill really good models and get excellent small. So that's where my bearishness is coming from. I just see structural risk for them all over the place.

Speaker 1:
[63:31] I think on the technical side, it's more of a business competition front where you want to lock in the customer, they pay for a year up front, and maybe that way you can stay or at least be in the race.

Speaker 2:
[63:44] It's annoying to switch to my customer as your biggest moat. I'm worried about that. But yeah, I agree. I mean, for sure, for a couple of years, there's...

Speaker 1:
[63:52] That's enterprise, right? So moving on to ChatGPT, they have a new $100 per month pro subscription. Not much more to say on that. It's between the $20 a month and the $200 per month pro tier. That is also an option for Claude Code. They have $100 and $200. So yeah, similar to these $100, $200 things used to be for the most advanced models and early access and so on. I think now more and more, they are being used for Codex and Claude Code, where very high token limits suddenly, a lot more people benefit from it. And personally at my company, we're paying the $200 per month. It's a no brainer. Like you're burning way more money than you're paying per month. And also on OpenAI, they bought AI Personal Finance Startup Hiro, Hiro Finance. This is an acqui hire. So yet another case of the company is shutting down. I don't know if they actually acquired the company or just hired away the talent, which is more and more kind of the standard is you just hire people. The company normally still exists, but is dead.

Speaker 2:
[65:08] Investors, by the way, love it when you do that. They just love it. Yeah.

Speaker 1:
[65:12] So yeah, OpenAI is clearly still kind of having a lot of different focus areas. This should help them expand to financial planning and dealing with user financial data.

Speaker 2:
[65:26] Yeah, it's actually interesting that it happens to fall in the same week that we're hearing from NotCurse or Perplexity, right, about their accounting, accounting agent. So it's tax time, baby.

Speaker 1:
[65:35] And last up, a story that I didn't want to highlight, but you probably have heard about. It will touch on Allbirds, the failed shoe company that went bankrupt, I believe, was pivoted to AI infrastructure, at least announced a pivot to AI infrastructure, rebranding the virtue as New Bird AI. And then its stock jumped 600 percent, as you might expect. Many people clowned on this, they're like, oh, this is indicative of the state of AI, that the shoe company announced that it's pivoting to AI compute and its stock jumped by 600. I think this is pretty nakedly kind of a cynical play by an investor. If you dig into the details, an investor put in like 5 million and made this announcement, the stock jumped a whole bunch. Now the investor is going to be making out very nicely on this. So it's essentially like, you know, the brand Allbirds, big enough, it used to be like a $4 billion company before it went down. You could acquire it for $40 million or something relatively small, and then do this as like, you know, I don't know how seriously they'll try to do AI compute. Let's just put it that way.

Speaker 2:
[66:50] In other news, Last Week in AI is pivoting to AI infrastructure. Send us your Bitcoin, I guess, is the play.

Speaker 1:
[66:58] Onto projects and open source, where we have some slightly more interesting different kind of models. First is HiWorld 2.0, a multimodal world model for reconstructing, generating and simulating free worlds. This is from Tencent, and this unifies 3D world generation and reconstruction in a single framework. So as you see with these kinds of models, they have a bunch of fun demonstrations of big 3D worlds with like a camera fly through. There is a couple of components here, HYPANO 2.0 as well, which has a Panorama generation component. They have World Stereo 2.0, which is a world expansion component. Yeah, I think we will be seeing more and more on this. This is like one of those things where it's really still quite janky and not advanced. So this is an area like humanoid robotics where, you know, there's still a lot of room to improve in world models and in 3D more broadly. OpenAI and Anthropic not doing 3D notably. So we are getting to a point where it's like pretty good, pretty impressive, but very far from good enough to sort of be used in real applications. And we are making pretty big jumps in quality because there's still room to improve by quite a lot.

Speaker 2:
[68:22] Yeah, actually both these papers were flagged by you, assuming that because you have more of a kind of video sort of image background, both because your work now and your previous work. I thought it was really interesting to kind of dig into something off the beaten path of like the AI agents, LLM sort of thing. So my reading of this was through the lens of scaling, super intelligence, AI agents, LLM. So like what does this mean for that? I guess a first observation is, so we're still in the space where for applications other than LLMs, you generally will see a Frankenstein model kind of architecture, where you don't have a single monolithic model that does all the things. Instead, it's a whole bunch of specialized models that are orchestrated together. Obviously, AI agents are more like that too, but just at the kind of model level, a lot more weight is being borne by the inductive priors, right? How you specifically modularize and plug in all these different things in ways that reflect assumptions that you're making about the best way to solve the problem. And this is one of those cases. They've got four different stages in their pipeline here, and they do use a multimodal diffusion transformer. So it's a transformer. It's doing autoregressive, basically sticking together the description, let's say, of the image or video they want to generate, and the noise that has to be denoised by the diffusion component. So kind of blending these two together. We've seen that before in other contexts. But then there's separate models and systems for really analyzing a... So they first like create a... They take an image and they stretch it out to create a panorama using one model. And then in the next step, they'll analyze that panorama to figure out what's in the scene. So like walls, floors, stuff like that, and then plan camera paths through that scene, explicitly like kind of chart out routes that maximize coverage of all the spaces you can navigate. At the same time, it's like dodging obstacles and stuff. And those paths actually come with descriptions, text descriptions to guide the next step. And so obviously the LLM or at least a VLM and functionalities involved there. And then from there, they're going to do the step called world expansion. And basically, they take all those planned trajectories and they generate a bunch of images from key viewpoints along those trajectories. So like, hey, what if I look from this point perspective at this thing and all that? And then the interesting thing is that they use an explicit memory mechanism to keep track of what's already been seen. So new frames are consistent with earlier ones. This is a really important distinction because if you remember earlier models like this, like video generation models, would famously have these really short coherence times where for 30 seconds or so, it'd be great. Like you would look around and you'd walk forward three steps, you'd look behind you and you would see exactly what you would just walk past. But then if you kept going, pretty soon reality would just almost literally melt in some cases. I mean, it just degraded into coherence. And the reason was that the model couldn't remember what it had seen previously. And there was this kind of like treadmill of memory. And so like in agents, we're seeing here a push towards this more persistent, robust kind of memory. And I would argue this falls into the continual learning category again. This is a version of that, that applies to the video paradigm just in the same way that it did in the LLM paradigm. And anyway, the last step is this thing called world composition. So they take all those generated perspective shots from all of these planned roots that in the panorama, and they feed them into a model called World Mirror 2.0. It's basically a model that predicts depth and the 3D structure from each frame, and they stitch them together using 3D Gaussian splatting, which I have never played with. But anyway, yes, stitching together, you may know more about this. But bottom line is, the improvements here really do seem to come more from data curation, architectural tweaks, memory, the bolting on of more features. So this is not a bitter lesson-pilled approach. This is very much a handcrafted, that's where we still are at, at least in terms of open source in this problem set, which I found quite interesting.

Speaker 1:
[72:30] And I think, yeah, to be fair, this has always been true to some extent with 3D, because 3D is just more awkward to represent, and the input and output format is tricky. So here, as you said, there's multiple components. And qualitatively, why that is, is the way you're doing this is you're starting in a point in space, and then you decide where to expand to. So it's like expanding a world model. And the reason to do it that way is you don't have a nice way to represent the entire world all at once. You have a nicer way to represent every view of the world and then add to it. And we may or may not have it be always true. And this is one way to do that. Actually, the next paper is another way to do that, but I think is more important. So next we have Lyra 2.0. We can say that High World 2.0 is more like World Labs of Marble, where they have a 3D Gaussian splat or another 3D representation, and then you walk and expand the world as you go. Lyra 2.0 is more like what we saw from DeepMind. I forget the world of it, but it's essentially a playable video. And so you can be in a setting and have a sort of control and see a video update in real time. So it's a world model of a different sort. There's basically two different types of world models you can have, where you have continual video generation, and then you can have an existing static 3D world that you keep expanding. And the two are related but not the same, and there's various implications for architecture and sort of representational aspects that make 3D sort of just tricky. But yeah, this is also in that realm of world modeling and open-source model that you can use to build that.

Speaker 2:
[74:35] Genie. Genie was the, yeah, yeah, yeah. It's funny, we see so many models on this show, and I'm like, oh man, I'll never forget this one. And sure enough, yeah, it's interesting that these similar related solutions are coming up at the same time. I guess you generally do see that.

Speaker 1:
[74:52] And this is by the way from NVIDIA, which I think is also interesting.

Speaker 2:
[74:57] Yeah, that's right. That's right. Yeah. One Chinese research lab and then one obviously American one. So the two, as I understand it, core problems that they're solving here, one of them is this whole spatial forgetting problem, where we talked about it, you drift off into one direction, and then pretty soon the model kind of forgets what it's seen and starts to get inconsistent with that. This model, LiR 2.0, fixes it with this 3D spatial memory cache. So it actually does store the geometry of every past frame separately, and then use it to retrieve the most relevant past frames when needed. So this kind of like, it feels like a rag type search to kind of pull out the physics that you want to anchor on. So it's a more grounded generation. So that's one piece. And again, we've talked about the analogy there with language models, right, where you are... Or no, sorry, I don't think we did. There is an analogy to language models, right? Like famously, in the same way that these video models will forget what they've previously generated, it used to be the case that language models, well, when given text that goes beyond their context window, right, they'll still do this. They'll forget what they had written previously, and you get this treadmilling effect that leads to incoherence over long stretches of generated text. That's much harder to spot now, obviously, because the context windows are absolutely massive. But one of the solutions that is used for that is RAG. And in this case, the 3D spatial memory cache is sort of the analog to that. The other is this idea of temporal drifting. And so when you generate video autoregressively, so frame by frame, you do tend to see small errors that accumulate. So like, you know, colors, little shift, or like geometry that kind of starts to distort. And they fix this with what they call self-augmentation training. So during training, they're going to deliberately give the model slightly corrupted, noisy versions of its previous outputs. Instead of the actual outputs, it'll just corrupt them a bit. And that kind of forces the model to learn to correct that drift instead of propagate it. And actually, this is kind of not entirely dissimilar to what a diffusion model does philosophically. It's just playing out sort of autoregressively in this way. And so that also has a profound analog on the LLM side where you can see goal drift or like value drift happen in a sequential decision-making context where you have little errors that start to compound. And the fix here, the idea of training on your own degraded outputs, is basically teaching the model robustness to that imperfection. It's kind of an interesting thing. You could imagine drawing inspiration from that for LLMs too. So yeah, all the usual things apply. This is a world model, so you can use it to train agents ultimately, like if these things get robust enough. It also, interestingly, as I understand it, learned like it was never explicitly trained on 3D spatial structure. It was just given 2D videos, and it generalized from those 2D videos to 3D without a specific hardline inductive prior, which is really interesting. So again, this is kind of like this thing we've seen with scaling in the past, where beyond a certain level of scale, you do see what you would previously have thought of as out of distribution, generalization started to come kind of within reach. So I thought that was sort of interesting here.

Speaker 1:
[78:05] Yeah, not anywhere near genie quality, by the way. Like the demonstrations are very short videos. They don't seem to be very interactive. It's mostly panning. So NVIDIA is not super far along here, but this is a quite rational type of space for them to invest in and have a lead in. Last interesting detail, looking at the appendix, it's built on top of one 2.1 14B DIT, which is an open source generative video model from Alibaba. So this is some like cross country sort of open source collaboration in a sense. Moving on to policy and safety, I think here starting off with more of a safety-ish, I don't know how to characterize this story exactly, but one of the big stories of the week, which is that Sam Altman's home and OpenAI's headquarters were attacked. There was a Molotov cocktail thrown in front of the gates of Sam Altman's mansion in San Francisco. And then later on, this person who is accused of doing this, Daniel Moreno-Gama, a 20-year-old, he tried to break in to OpenAI's headquarters, I think, with a chair. So there was seemingly also a document that he had called Your Last Warning, which is quite of an extreme message, as you might expect. So this was a person on the extreme front, doing some extreme things. At the same week, there were also shots heard outside of Sam Altman's home. In this case, there were two people who were freed. So they are booked on suspicion of negligent discharge of a firearm, but they were not charged. So it's unclear what the situation is here, whether someone intentionally tried to shoot Sam Altman's home, or if this was just a coincidence. But either way, I think this is one of the very early cases of violence as a response to AI, as part of AI backlash, and also as part of concern about AI. We've seen people demonstrating outside of the offices of OpenAI. There's multiple organizations. I do recall we covered a person who had potentially violent thoughts. I think it was like StopAI had someone who intervened because they seem to be considering violent things. So yeah, this obviously quite extreme, quite bad to have these kinds of things happening. And also something I could see happening more as AI has more radical effects on society.

Speaker 2:
[81:02] Yeah, absolutely. And, you know, needless to say, this is an insane thing to do on a lot of levels. One of which is just like, suppose he'd succeeded. Is this supposed to stop the race towards building more powerful? Like that just doesn't strike me as the outcome here. Obviously, it's just somebody who's really distressed and got in their head to do something wild. It is explicitly tied to this concern over human extinction, by the way. There was two parts to the document that he'd written. The second part was titled, some more words on the matter of our impending extinction left. No mystery as to what the motivation was here. He did say to victim one's name, so really, this is Sam Altman. If you make it, if by some miracle you live, then I would take this as a sign from the divine to redeem yourself. This is all part of this manifesto that this individual wrote. Look, it's your point. I think exactly right. We are going to see more of this. Also, so if I'm a nation state actor looking to undermine US and broadly Western interests when it comes to AI, I would very easily decide to nudge more individuals to do this sort of thing. This is a very common motif. It's happened in environmental rights groups for forever, where when you trace back the funding, often unbeknownst to those groups, it does come from China or Russia, where they're trying to prevent critical infrastructure like pipelines and data centers from being built. It's just what you do. You always do it through proxies, and you always do it in ways that are deniable, but that's the basic idea here. So ultimately, I expect this to be a nation-state game. As weird as that sounds, the process of generating people with this kind of disposition, with this view and this willingness to act is going to be professionalized by nation-states. It is just going to happen, and you're going to have to see super high levels of security for all the frontier AI executives. That will happen. Expect they'll have their own Pope Mobiles or whatever the equivalent is. But yeah, things are going to get wild really quickly. And this is absolutely to your point, not the last that we'll have seen of this, unfortunately. It's just the way things go.

Speaker 1:
[83:11] Right. It's pretty unambiguous by the way that this person did these things. In the criminal complaint, there's some very high-resolution surveillance camera footage of him throwing the Molotov cocktail at the gate and also going to OpenAI offices with a chair trying to break in before presumably facing a security guard. So yeah, this person did it. They had this whole write-up. They did this as in this document saying, leading by example and show that this person is fully sincere in the message of advocating for others to kill and commit crimes. So yeah, as you said, deeply related to the notion of impending extinction, but also a little bit unhinged in the writing like it says, addressed to victim one, if by some miracle you live, and I would take this as a sign from the divine to redeem yourself. So let's not think that like rationalists or whatever EA people are agreeing to this or anything like that. That's not the case at all. Onto another sort of weird society affecting phenomena that we have an example of, that may be happening more over time. The specific example is covered in The Verge, where we've headlined the Iranian Lego AI video creators credit their virality to heart. So in case you don't know, it's been a crazy time for AI generate videos in politics over the last year or so, really starting to ratchet up under Trump's administration. They kind of post AI videos very often. And in this war with Iran that just started, there is competing propaganda going on where the White House posts a lot of these ridiculous over top edits, usually with clips from video games or things like that. And then in response, the Iranian here, it looks like actually a group of about 10 Iranian content creators, explosive media. They've gone viral. These AI-generated Lego style videos, commenting on the US and Israel's attacks on Iran, as you might expect, generally making Iran heroic and conveying a message that they are not going to be backing down, that the US and so on are bad. And this is coming at the same time as a couple of other things that are related. There's also a story about hundreds of fake pro-Trump avatars emerging on social media with videos of people approving of Trump. Don't know where that's from. It just is happening. And then also related and something you may have heard about, Trump loves AI-generated images and videos. He posts and reposts them constantly. I don't know if we've covered many, but he posted photos of himself as a buff Jedi. And most recently, of course-

Speaker 2:
[86:20] Oh, that was real. Yeah, that one was real though.

Speaker 1:
[86:24] I don't think that's the case. And most recently, he was in an image as Jesus healing a sick man with the excuse of like, oh no, I was a doctor, very clearly as the Messiah, which received a lot of backlash. But there's a long list of examples of Trump posting himself in these ways, including like sort of videos that mock his opposition. So all these things together, I think, point to the gradual escalation of the use of AI-generated media images and video as state propaganda, as tools of politicians in a way that I don't think we fully expected. Sort of it has propagated and hasn't massively impacted things in a way we sort of feared that deep fakes might. But at the same time, it's hard to get a sense of what the impacts here, for instance, with these fake pro-Trump avatars generated videos on social media. Like we got to a point of a video being so good that at the same time being so good and us being a bit cynical and used to AI-generated images and video in a sense, where it's just like normal.

Speaker 2:
[87:46] Big age gap too, right? Because I mean, I've seen some older folks look at content that's obviously AI-generated and go, what? Like, why would they do that? This happened to me a couple of weeks back. My uncle did this thing where he just made with Grok or whatever, made a video of me jumping out a window or something, and he sent it to one of my in-laws anyway. And they were like, why would he do that? Like that was their first reaction. Yeah, it was a sign of the times. There's a really big gap there between the generations on that side.

Speaker 1:
[88:19] Right. And an interesting example of the fake pro-Trump avatars. Again, we don't know who did this, which is also very strange because having hundreds of accounts making many posts is pretty expensive. And worth noting also that Trump did repost content from one of these videos. So directly affecting kind of messaging from the administration. So very weird time for the US and I guess for World Station General, where we have Iranian Lego AI videos going viral as part of war propaganda.

Speaker 2:
[88:57] And next we have what we can think of as the sort of AI alignment paper of the week, automated weak to strong researcher. So you're Anthropic, you're concerned that we're knocking on the door of super intelligence, and your entire hope at this point rests more or less on the idea that you're going to be able to pull off something called scalable oversight. So what is scalable oversight? Well, we're a bunch of dumb asses, and we're going to make machines that are smarter than us, and we're going to hope that we have the ability to meaningfully oversee and steer and control the behavior of those systems. And so can that be done? Is there a way to do it? Is it logically possible? Well, why don't we test that by using a dumb AI model? In this case, we're going to use Quinn 1.5, 05B, so a very small kind of small scale model, to supervise, to teach a smarter model. And in this case, that's going to be Quinn 34B, so a model that's more advanced in generations and also sort of eight times bigger. So much, much more capable model. And we're going to see, basically, can we start with our dumb model and fine tune it on some data? So now the model is kind of like learned in a shitty way the lessons of that data, right? We take our shitty model, train it on some data. The model is learned in a shitty way, the lessons of that data. And then use that shitty model to produce some shitty labels for fresh unlabeled data. So you got data now that's been given shitty labels by a weak model. Now you're going to take a strong model and train it on those shitty labels and that data that you just shitily label. Can you get that stronger model trained on those shitty labels to perform as well as that stronger model would have performed if it was trained on ground truth labels, like the actual correct answers? That's the core question at the heart of scalable oversight, or at least this is one way to formalize it. And that's the experiment they're going to run here. There's this metric that I think Leopold Aschenbrenner sort of started kind of pushing down the direction of this particular metric, it's the performance gap recovered. So a zero score on this metric means that the strong model recovers the same performance, the same accuracy in generating its labels as the shitty weak model. So basically, it's to be kind of memorizing whatever shitty process the weaker model used. That's a zero. A one is to recover, if the strong model recovers the same performance as it would have if it was trained on ground truth labels. Now, if it sounds impossible to you that this could work, I'm sympathetic. To me, it's actually very unclear. This sounds like it could actually just be straight up impossible. The hope, logically, is that the stronger model has been trained on so much data, has so much scale, that it can actually kind of see the gist of what the shitty model was trying to get at with its labels and then generalize better than the shitty model could during training. That's actually the hope here. And, well, the result here, to be honest, is just surprising. By the way, the hope is that we can automate exactly this kind of research, automating research into how we improve weak to strong generalization, how we essentially close the loop of the automated AI safety researcher. Because if we're going to make automated AI capabilities research, we'd better have automated AI safety research to keep up. That's all part of the hope here. So they're going to use an agentic scaffold. They're going to have a bunch of automated researchers running in parallel. They're each going to propose different experiments to run, different research ideas. And they're going to share their findings with each other and their code with each other. That's kind of the architecture here. There's a whole bunch of kind of strategizing in terms of how they split up the data that they're going to train the shitty model and the strong model on. So you have one initial chunk of the data that the weak model is going to get trained on just so it can learn the gist in a shitty way of the data. But then the second part of the training data is going to be used to... You're going to get that fine-tuned weaker model that was trained again on part one. You're going to get it to assign shitty labels in a shitty way to part two of the training data. And then now that you have part two of that training data shittily labeled, that's what you're going to train the strong model on. So that's one experiment. Another is you just train the strong model on the true labels for part two. That's how you get your upper bound on your performance for this assessment. Anyway, so that's basically how it's set up. They tried across a bunch of different domains of problems, like math and coding. And here's the result. The result, at least to me, was really shocking. So first, let's talk about the control. What do you get when you just use two human researchers? You give them a week of using pretty standard tuning methods and see what they can get. The performance gap recovered, the PGR, that number, is 0.23. Basically, they get 23% of the way to the performance of the strong model if it had been trained on ground truth labels using this technique. So not that great. Like using human researchers does not result in techniques that seem to recover a lot of performance gap. The automated researcher achieves a PGR of 0.97, 97% of the way, to the performance of the strong model if it was trained on actual ground truth. To me, this is wild. Five days they gave it. 800 agent hours. About $18,000 worth of compute was spent on this. So just from a price standpoint alone, this would already be amazing if they just matched the performance. But they're blowing the performance out of the water, from 23% to 97% on this benchmark. This strongly suggests that automating research on at least alignment problems that are outcome gradable, where you can actually quantify and assign a concrete metric to those problems, is already practical. That is a huge deal. That didn't have to be the case, as we've just talked about. It is wonderful that it is. The bigger challenge though is moving on to non-outcome gradable problems. Historically, this has been most of the problem of alignment. How do you quantify what it means for a system to be well behaved? There's no metric for that. You can't just stick a bunch of AI agents on it to automate that research in the same way they did here. You need some kind of metric that you can hill climb on, that you can optimize against. That's why the PGR performance gap recovered metric is so useful. It's why it's such a big deal. But before we give up on that idea, before we give up on the idea of the automated alignment researcher for non-outcome gradable problems, we should realize this isn't actually as weird as crazy an idea, this idea of generalizing to non-outcome gradable problems as it might seem. This is after all exactly what happened when we first went from just text autocomplete models to general chatbots and agents in the first place. When we started, the outcome gradable thing was how well can you predict the next token? That just generalize successfully to actually I guess these things, these chatbots can now reason. There is actually precedent for exactly this kind of generalization. The question is at what scale does it happen? Do we discover the techniques on time? Do we make differential progress in alignment? In other words, faster than capabilities. This is a pretty bullish result for this. There's a whole bunch of reward hacking issues, by the way, that they go into, that we won't have time to discuss, but that are very interesting. The one thing I'll say is, they have a fairly alarming result, which is in binary classification problems. If you want to know the true label for a single example, one thing that you can do is basically submit two predictions for the test set that you're being evaluated on, and only change your predicted value for one of the samples, and see what it does to your accuracy score. In that way, you can basically cheat the test and see, oh, okay, I found the right label. When I make sample number 10 red instead of blue, my PGR goes up. They actually found that the model did do this. There's a lot of detailed reward hacking that led them to throw out a lot of intermediate results on the way here, and they just found this kept coming up. The more they pushed the alignment frontier, the harder they found it to be to squash these dangerously creative solutions that the models were coming up with. This is exactly consistent with Goodheart's law, it's perverse optimization, that you basically, the more capable your model is, the harder it is to steer it exactly in the right direction to get the outcome you're looking for. Anyway, I thought it was a fascinating paper, and again, one of the more profound pieces of good news on the alignment side that I personally have seen in the last even a year. This is a pretty big deal.

Speaker 1:
[97:48] Right. I do want to share some caveats here. First, the two models they looked at are pretty weak to begin with. There's the weak model is QWEN 1.5 05B Chat. The strong model is QWEN 34B Base. So these are small models, presumably because to do automated training, to do automated research, you do a lot of experiments, and that is they slow unless you use small models. And so you want to scale this up. Now this entire strategy may not work. And the second caveat is exactly related to doing many experiments. When they look at the discovered ideas, the case studies, they highlight mostly things that are human-interpretable, like sort of things you would think might work. And there's really just a lot of stuff at the model stride. And one of the issues with having access to a test set and kind of evaluation, iteration for experimentation is that you might just like have a good seed. As you said, there's like ways to hack reward. But even if you don't hack reward directly, just trying out throwing like spaghetti at a wall might get you something that works for this benchmark, but doesn't actually accomplish your task in a way that is more kind of nuanced. So at the same time, like a positive sign for library research and also something that might be overstating it. Also the tasks they looked at are going back to their original weak to strong generalization paper, where on different models, they recovered like 90 percent of the capabilities. So recovering the capabilities is not necessarily that hard for these tasks. Last thing I'll say is we've covered multiple iterations of sort of automated AI research direction. And one of the interesting conclusions in this project is that not having prescriptive scaffolding, not having sort of hard-coded series of steps, but instead having a loose structure and letting your models decide what to do, they say that the autonomous kind of loose structuring is better, which differs from previous approaches we've covered. Usually, there is sort of like step one, step two, step three, step four. Here, they kind of let it do whatever and launch multiple agents that are kind of pointed at different directions. So a lot of nuance here.

Speaker 2:
[100:19] Yeah, which does make sense. You kind of ought to expect that as scaling makes individual kind of model is more performant with fewer inductive priors. It's kind of the history of LLMs that we've seen over the last six years is like, the bitter lesson says, take your hands off the reins more rather than less over time. Just let the compute do the compute and will solve your problem better than you would have thought of. But yeah, in terms of the scale of the models here, absolutely correct. Like half a billion parameters and four billion. These are baby models. It's also true that this is how alignment research is typically done. So if it is more apples to apples in the sense that when you look at a lot of past models, OpenAI first used, I think it was like they used GPT 4 to directly interpret the weights of GPT or the activations or the weights of GPT 2, right? So you typically do step down. Usually it is about two orders of magnitude, which is from the frontier, which is about what this is, to kind of run your experiments. And the art and the craft is in having the taste to go, ah, this is the kind of result that probably does scale. And you probably want to repeat this experiment with, you know, an even smaller and an even bigger model once you have a compute budget to do it and see if you can actually spot chart scaling curves. How does the PGR change with scale? Right? Would be a very natural question to ask here. In terms of performance gap recovered, yes, like the 90% number from previous, that the previous work is important. 97%, you know, obviously, like getting above 90, things get way harder from there. So 90 to 97 is a really big gap. But it's also true. Like what is apples to apples? I think your point there is actually really crucial. Like we have a manual baseline here that says 23%. That's a couple of dudes working for a week using fine tuning. You know, is that the analog?

Speaker 1:
[102:08] It feels very ambiguous. In a blockbuster, they don't go much into this human baseline. And I think it's worth highlighting that it's ambiguous as to whether the models are really smart, and that's why this works, or the models just try a lot of stuff and one of them turns out to work. And it matters a lot which one of these, because if you try to scale up, you can't run a million experiments. So it may be the case that the reason that AI did better on the humans is just because it can scale much faster and try a lot more stuff.

Speaker 2:
[102:43] Yeah, I would naively guess that will be front and center in terms of Anthropics considerations here. So I'd be surprised if they didn't have a human attention budget on the review of the outputs, but who knows? And this is really where we need to see more scaling experiments here with the smaller and bigger models. You know, once QP budgets are allowed. By the way, this is exactly why the OpenAI commitment to super alignment budget was so important, right? This is the whole thing. This is the reason that, beyond like a left OpenAI, you just need to spend. You got to spend compute on alignment. Otherwise, you don't get results like this that could be very promising.

Speaker 1:
[103:23] Just a few more stories. Next one is also about alignment research. We have reproducing, Steering Against Evaluation Awareness in Large Open-Weight Models. So I think you mentioned this earlier in the episode. In the Opus 4.6 system card, there was a section that covered this idea of using steering vectors, which we've mentioned a whole bunch of times. You can basically modify the behavior of models by injecting certain values. And it's possible to do that for evaluation awareness. So you can suppress the ability of a model to tell that it's being checked as one way to still be able to do this kind of alignment check as models get smarter. This is a reproduction of that, that verifies it and also provides some sort of a little bit of additional information or findings related to it. They show that kind of unrelated vector steering can also affect this. And this is one of the concerning impacts is that steering is unpredictable. If you do that to suppress awareness, you might also do other unexpected stuff. So if you're using a control of unrelated steering, that may or may not be valid. And here they have some results as to why that might be problematic.

Speaker 2:
[104:43] Yeah. So the way you do this in practice is you'll give a baseline prompt. Let's say you're writing a memo, is a baseline prompt, and then you look at the activations in the residual stream for that prompt. And then you write a more stimulating prompt, like you're writing a memo and a bunch of AI safety researchers are looking over your shoulder. And then you see what are the activations associated with the last token of that prompt, and you subtract the two. And now you have a, you can think of it as kind of like a regularized vector, basically a vector that is grounded relative to some sort of more neutral, you've got your, someone is looking over your shoulder and you got a more neutral prompt, you take the difference between them, and hopefully that new vector captures just the meaning of someone is looking over your shoulder. And so now you can think about adding it and injecting it into a model's residual stream and making it more evaluation aware or subtracting it and therefore reducing the amount of eval awareness of a model. That's like the naive way to do this. There are a bunch of other ways you could do this. Sparse auto encoder features is another way. We've talked about those a lot on the podcast. I won't repeat the whole SAE thing here, but thematically it's similar. The problem that they run into is if they basically come up with completely random prompts, as you said, imagine two random prompts and you do the same thing, you basically end up with a random delta, a random difference between two random prompts, that really shouldn't do anything consistent. That should just be noise, but they find that the effect of doing that can be as strong. In fact, stronger in one case. I think they talk about how the most effective steering vector that they find in their big list of steering vectors is actually one of the random ones. So on the one hand, yes, it's true this does work, because if you focus on prompts that are meant to elicit eval awareness, you will end up with, in general, vectors that can stimulate or repress eval awareness. That's true. However, it's also true that those are not the only, and in fact, not even the best ways to create prompts that do this. In fact, random vectors do just as well. What that fundamentally means is that this process is very fragile. It's again, it's not that it doesn't work. You do have interventions. It's that it's super fragile, that it's sort of like I was sort of thinking of an analogy like in medicine. We give people medication that, yeah, like this will help with your skin condition or your heart disease. Is it the best medicine for that? Well, no, it's not. Probably some really weird fucked up medicine is actually the best thing. But this will do in a pinch and it may have all kinds of other side effects. That's the other implication. If things are this fragile for eval awareness then presumably they're that fragile for everything, which means that the vectors that are designed to suppress eval awareness specifically are probably doing all kinds of crazy other shit too that we're not picking up on. That point was less emphasized in this report. But for me, it's actually kind of like the most salient thing because it implies that when you think you're suppressing eval awareness, and you are, and you're celebrating the good behavior of your model at the other end, in a way you're not actually measuring the behavior of the same model because there's probably all kinds of weird tail effects that are happening in other dimensions besides eval awareness. This just means alignment, interpretability is really complicated, and you don't get to just insert activations into a residual stream and be like, oh, this is all good. Yeah, fascinating for recommend, recommend the read.

Speaker 1:
[108:26] Yeah, they do say in the keywords, well, steering effects are complicated and unpredictable. This is widely known, but worth repeating. They highlight it a little bit. And yeah, to me, the takeaway is like, I mean, the use of evaluation awareness removal is because you're evaluating a model for safety. If by removing the awareness part, you're also making it safe when it isn't, the whole thing is pointless. So yeah, very good thing to be aware of. By the way, this is from the UK AI Security Institute, which we've covered, I think, other stuff from lately. So they seem to be killing it in terms of very useful, small-scale research that nevertheless is informative and kind of moves alignment, understanding further in small ways that are still valuable.

Speaker 2:
[109:20] They're doing a great job. We love doing it.

Speaker 1:
[109:23] You just cracked me. Next, back to politics. Iran has threatened complete and utter annihilation of OpenAI's 30 billion Stargate AI data center in Abu Dhabi. You said that so calmly.

Speaker 2:
[109:36] I love that.

Speaker 1:
[109:39] It's what it is. Complete and utter annihilation is the promise here from the IRGC Spokesperson Brigadier General, Ibrahim Zulfaghari. I don't know if that is realistic or if it's just saying things for the sake of saying things as you do, but obviously, if they are capable of it, kind of a big deal.

Speaker 2:
[110:09] Yeah, and we've talked about stories where drone strikes and missile strikes have possibly probably hit either AWS data centers in the UAE or thereabouts or data centers. I think Oracle was kind of on the list there. We covered this like last week and the week before. But basically, yeah, these are absolutely live targets. As far as Iran is concerned, expect this to continue, right? I mean, the whole idea here is this is critical infrastructure that is supporting a war effort, regardless of how you think about it. As more and more of the military runs on AI, that is just going to be the case. So a couple of things, I mean, I actually watched this video. There's a really funny part of it. It shouldn't be funny, but I'm sorry, there's a funny part of it. Here's the not funny part of it. So it starts with like, you've got a screen, a shot of the earth from space. It like zooms in on Google Maps on Abu Dhabi. And they kind of, I forget what it says on the thing, but it's something like, here is the spot that the Stargate cluster is or the Stargate data center is. Google Maps removed it from their map, but we were able to find it. Nothing can hide from our eyes. We're like, yeah, whatever, okay. So in practice, obviously, that is a thing you can see from space. It is massive, and Iran will naturally have had a way to probably, I don't know what their satellite situation is, but contract out to China to get access to that data, or Russia or whatever, or do it themselves if they have satellite. I can't remember. But anyway, in the same video, they kind of pan across this whole collection of famous public figures that are involved, like Americans who are involved in the big build-outs in Abu Dhabi. And they're like, oh, there's like, yeah, I forget if Sam, or fucking Sam Altman is there. And they put a little US flag in there. It's Sam Altman, the next guy, the next guy. And then they get to like this bald brown dude, and they write Satya Nadella, and they keep panning. And I'm like, wait a minute, that's not Satya Nadella. What? So, I haven't seen anyone like comment on it. Like, do my eyes need to get adjusted? Like, I don't think so. This dude was not proud of you.

Speaker 1:
[112:09] And then I think, yeah, for context, this whole statement is part of this edited video. After that, there's like this text that gets revealed on screen. We will do whatever it takes to defend our country and the interests of our nation. To be clear, this was a threat as retaliation. So, if the USA proceeds with its threats concerning Iran's power plant facilities, which Trump has been making lots of unhinged threats around attacks of Iran, including the destruction of an entire civilization, which is crazy. Yeah. So, this was kind of a counter threat, which it makes a lot of sense in context.

Speaker 2:
[112:55] Yeah. When you're bragging about your intelligence capability.

Speaker 1:
[113:00] You might want to double check.

Speaker 2:
[113:04] I think Satya is breathing a sigh of relief somewhere. He's like, all right, they're not on to me.

Speaker 1:
[113:09] I think also there's this flex of like, there's a zoom from space into the Earth of what I guess is Google Maps. And then it says nothing remains hidden into our site or hidden by Google, which I don't know if it's actually impressive or not, but they really are showing off in this video. Last up to kind of end on a less intense note, maybe we have Wall Street banks try out Anthropic's mythos as US surges. So this is part of Plagic Grasping. We know that Anthropic partnered with over 40 organizations, including multiple Wall Street banks like JP Morgan, Goldman Sachs, Citigroup, et cetera. Apparently, Treasury Secretary Scott Besant and Fed Chair Jerome Powell had a special meeting with major CEOs in Washington, DC to warn them to take mythos seriously and use it to detect vulnerabilities in their systems. So I guess a real demonstration of mythos having potentially major impacts on banking, right?

Speaker 2:
[114:15] Yeah, the first emails I sent once mythos was at least publicly announced were to, I won't specify, but a bunch of central banking-related authorities on the security side, because that is just the obvious sort of first threat vector. And they knew that mythos was coming before it was publicly announced, and had been getting briefings and trying to push that out beforehand, as you might imagine. The awkward thing, we haven't covered it this week, but so the Trump administration has now called in, right? Anthropic, and has asked them to make mythos available to the US government writ large to cover all these cyber blind spots. We've just finished covering a bunch of stories about how the US government has been trying to like, murder Anthropic's business, slit its throat, and then bury the body in the desert. And now they're like, hey, like, this is kind of awkward, but can we please, please, please have that model? So like, this is just showing you how well, not quickly, fortunes turn in this space, but also how short-sighted all of this friction with Anthropic really, really was. I mean, yeah, we've been saying it on the podcast for a long time, but like, this was a dumb, dumb, dumb move. And it was obviously going to be constitutionally challenging. I don't, I haven't seen analysis of these court cases that suggests that they have much promise or much in the way of legs. And so now, on top of all that, Anthropic is in this position of leverage over the USG. They're playing ball, it seems, which is good. It's just that, man, you should not be doing this if you genuinely think that this is national security infrastructure and it obviously is, like this, yeah, anyway. So things have turned around pretty quickly, and I'm sure we'll have a lot more to say about that next week.

Speaker 1:
[115:54] And I think it's an interesting thing to be related to national security. There's often a metaphor of advanced AI to nukes, which is an imperfect metaphor, but kind of links to this, many aspects of it as needing control, as being massively important. This is an interesting case with regards to banking and cybersecurity, because there are nation state hacking groups, North Korea, Russia, they go off to banks and steal money, and mess with elections and do all these kinds of things. In that sense, advanced AI with cybersecurity attack capabilities is very dangerous. It is nukes in the sense that North Korea actively tries to hack and steal money and mess with all sorts of countries in all sorts of ways. If they had MIFOs, they would be wrecking a lot of stuff. That's an example where the nuke metaphor, even in the short term, even going beyond extinction risk or whatever and army capabilities, it is already at the point where that metaphor is starting to become more legit.

Speaker 2:
[117:11] Absolutely. It's a fact when you look at the kinds of vulnerabilities that MIFOs was able to find and exploit. 26-year-old vulnerabilities in the most hardened server operating software that there is. 16-year-old software, open-source software, has been reviewed literally millions of times by open-source developers including some of the best cyberminds in the world. And more vulnerabilities that were not just missed by a lot of humans, they were critical. This is vulnerabilities that allowed total system takeover, like just abject, unapologetic, complete pwnage. If that happened with a bank, basically, I mean, you name it, drain all the accounts, delete the ledgers, this and that. We're back to a paperwork society. Let's hope all the documents were printed, so that we even know who owes what to whom. That's literally the stakes here. That cannot happen while the US dollar remains stable. That is a direct threat to the strength and stability of the US dollar, which is arguably the American president's number one responsibility, whether or not that's understood by US presidents is a separate thing. But you nuke the dollar, like it's over. So from a WMD standpoint, in terms of that sweet spot of what attacks would be deniable in the sense that China or Russia or whoever could do it, and then it couldn't be easily traced to them, and also brutally effective, say in the context of a Taiwan invasion or any other contingency, or just for the hell of it, like this is a sweet spot of a deal. Worth remembering that Chinese Internet has something like four different channels you can use to get into it. Ours has thousands, so many that no one can even remember what the hell is going on. Doing a reciprocal cyber attack is not an option. They can literally close their Internet to us. We cannot do the same. Massive asymmetry here, this is a really, really, really big deal. To be perfectly honest with you, this is not investment advice and feel free to call this crazy, but we are making significant changes in the balance of our portfolio. What is physical assets versus what is digital assets based on this? I will go one step further as a community service here and just say, a lot of very, very high placed people at a lot of the world's top institutions, including some of the companies most directly involved in these things, are doing similar things, are getting even bug out properties that they can go to. This may sound crazy and it may not come to pass, but again, as a community service, this is the kind of thing that you actually maybe should be taking more seriously, not investment advice.

Speaker 1:
[119:57] Do your own research and all that jazz, but this is just like, it's an increasing and pretty remarkable trend that I find it really hard to ignore.

Speaker 2:
[120:05] With that, we are done with the latest episode of Last Week in AI, which hopefully... It comes out just in a couple of days after recording. I'll do my best. As I say, you can go to lastweekin.ai for the text newsletter, which also has been a bit behind, but I'll be releasing more regularly. Please do keep commenting, and if you want, share the podcast and review it on Apple Podcast. We look at that as well. But more than anything, we appreciate you listening, and please do keep tuning in. From girl next to robot, the head lies upon data-driven dreams. They just ghost out.

Speaker 1:
[122:06] They read brain. They read code on them. The site that was hidden Or machine-burying marvels, coding kings. Teachers, unfolding, see what it brings.