title AIE Europe Debrief + Agent Labs Thesis: Unsupervised Learning x Latent Space Crossover Special (2026)

description Today, we check in a year after the first Unsupervised Learning x Latent Space Crossover special to discuss everything that has changed (there is a lot) in the world of AI. This episode was recorded just after AIE Europe, but before the Cursor-xAI deal.
Unsupervised Learning is a podcast that interviews the sharpest minds in AI about what’s real today, what will be real in the future and what it means for businesses and the world - helping builders, researchers and founders deconstruct and understand the biggest breakthroughs.
Thanks to Jacob and the UL production team for hosting and editing this!
Jacob Effron
* LinkedIn: https://www.linkedin.com/in/jacobeffron/
* X: https://x.com/jacobeffron
Full Episode on Their YouTube
We discuss:
* swyx’s view from the center of the AI engineering zeitgeist: OpenClaw, harness engineering, context engineering, evals, observability, GPUs, multimodality, and why conference tracks now reveal what matters most in AI
* Whether AI infrastructure has finally stabilized: why “skills” may be the minimal viable packaging format for agents, why infra companies have had to reinvent themselves every year, and why application companies have had an easier time surviving model volatility
* The vertical vs. horizontal AI startup debate: why application companies can act as the outsourced AI team for enterprises, why some horizontal companies still matter, and why sandboxes may be the clearest reinvention of classic cloud infrastructure for the AI era
* The “agent lab” playbook: starting with frontier models, specializing for your domain, then training your own models once you have enough data, workload, and user behavior to justify the cost and latency savings
* Why domain-specific model training is real, not just marketing: how companies like Cursor and Cognition can get users to choose their in-house models, and why search, domain specialization, and distillation are becoming more important
* Open models, custom chips, and alternative inference infrastructure: why swyx has turned more bullish on open source, why non-NVIDIA hardware is suddenly getting real attention, and why every 10x speedup can unlock new product experiences
* What it means to sell to agents instead of humans: why agent experience may mostly just be good developer experience by another name, why APIs and docs matter more than ever, and how pretraining-data incumbents are compounding advantages in an agent-first world
* Why memory and personalization may become the next big wedge: today’s models mostly reward frequency of mentions, but in the future, swyx expects product choice to be shaped much more by personalized memory systems
* The state of the AI coding wars: why coding has become one of the largest and fastest-growing categories in AI, how Anthropic, OpenAI, Cursor, and Cognition have all ridden the wave, and why the category may still have more room to run
* Capability exploration vs. efficiency: why the industry is still in a token-maxing, experiment-heavy phase where people are rewarded for spending more rather than less
* Claude Code vs. Codex and the strange stickiness of coding products: why first magical product experiences may matter more than expected, and why the bigger mystery may be why only a few names have emerged as real winners so far
* What the end state of the coding market might look like: two major players, a longer tail of niche products, and possible disruption if Microsoft, Mistral, xAI, or the Chinese labs push harder into coding
* Where application companies still have room against the labs: why frontier labs are trying to expand into verticals like finance and healthcare, but still leave space for focused companies that own the workflow and the last mile
* Why coding may be a preview of every other AI market: the first category to truly go parabolic, the clearest example of foundation model companies colliding with application companies, and a template for how future vertical AI markets may develop
* Why AI valuations now feel unbounded: from billion-dollar ARR products built in a year to trillion-dollar market caps, swyx and Jacob unpack how the AI market has broken traditional startup intuitions about scale and durability
* Consumer AI vs. coding AI: why ChatGPT’s consumer category may have plateaued on frequency and product design, while coding continues to feel like a daily-use category with real momentum
* The next product frontier beyond coding: consumer agents, computer use, and “coding agents breaking containment,” with swyx’s thesis that 2025 was the year of coding agents and 2026 may be the year they begin to do everything else
* Whether foundation models are really killing startup categories: why swyx is less worried for early founders, more worried for mid-size startups and traditional SaaS, and why building something ambitious may now be the best job interview for a frontier lab
* AI vs. SaaS and the internal culture war around adoption: the tension between AI-native employees who want to rip out expensive software and skeptics who think quick AI-built replacements create fragile systems
* Why traditional SaaS may be under real pressure: swyx’s own experience spending six figures on event and sponsor management software, the temptation to rebuild it cheaply with AI, and the broader question of whether teams will trust custom AI-native replacements
* Biosafety, security, and frontier model access: why swyx raised biosafety at a dinner with Anthropic’s Mike Krieger, why Krieger argued security is the bigger issue, and what restricted model releases reveal about Anthropic vs. OpenAI
* The era of giant models: why 10T+ parameter systems may only be a temporary rationing phase before bigger clusters arrive, why labs may increasingly keep their most powerful models private for distillation, and why scale alone no longer feels like a complete answer
* Memory as the slowest scaling factor in AI: why context windows have improved far more slowly than people hoped, why million-token context still has not changed most real workflows, and why memory may be the key bottleneck for the next generation of systems
* What swyx changed his mind on in the past year: becoming more bullish on open models, more convinced that the top tier of agent startups behaves very differently from the median AI company, and more optimistic about fine-tuning and specialized model adaptation
* “Dark factories” and zero-human-review coding: the next frontier after zero human-written code, where models not only write the code but ship it without human review, forcing companies to rethink testing and verification from first principles
* Why RL and post-training may matter more than people assumed: even if the resulting models get thrown out every few months, the data, workflows, and domain-specific improvements persist
* Synthetic rubrics, Dr. GRPO, and multi-turn RL: why reinforcement learning is becoming much more domain-specific and multi-step than many people realize, opening the door to much deeper customization
* The next frontier after coding: memory, personalization, and world models, including why swyx thinks world models matter not just for robotics or gaming, but for giving AI something closer to lived understanding
* Fei-Fei Li, spatial intelligence, and the Good Will Hunting analogy: the idea that today’s LLMs may know everything by reading it all, but still lack the lived experience that turns knowledge into a deeper kind of intelligence
Timestamps
* 00:00:00 Intro preview: AI coding wars, startup pressure, and market structure
* 00:00:28 Welcome to the Latent Space × Unsupervised Learning crossover
* 00:01:17 What AI builders are focused on now: OpenClaw, harnesses, and infra
* 00:04:33 Why AI infra is harder than apps, and where startups can still win
* 00:06:39 Should companies train their own models?
* 00:09:28 Open models, custom chips, and the new inference race
* 00:11:25 Designing products for agents, not just humans
* 00:16:49 The state of the AI coding wars in 2026
* 00:19:27 Capability exploration, token-maxing, and why coding is going parabolic
* 00:21:41 What the end state of the coding market could look like
* 00:23:50 Where app companies still have room against the labs
* 00:27:02 Why AI valuations and market swings feel unprecedented
* 00:28:56 Consumer AI vs. coding AI, and why sticky products still matter
* 00:32:28 What the next breakthrough product experience might be
* 00:32:53 2026 thesis: coding agents break containment and eat the world
* 00:35:27 Are foundation models wiping out startup categories?
* 00:37:33 AI vs. SaaS, vibe coding, and internal team tensions
* 00:40:01 Biosafety, security, and the politics of restricted model releases
* 00:42:19 Giant models, compute constraints, and the limits of scale
* 00:44:30 Memory as the real bottleneck in AI
* 00:44:57 Why swyx changed his mind on open models
* 00:47:44 Dark factories and the future of zero-human-review coding
* 00:49:36 Why post-training and RL may matter more than people think
* 00:51:50 Memory, world models, and the next frontier of intelligence
* 00:53:54 The Good Will Hunting analogy for LLMs
* 00:54:21 Outro
Transcript
[00:00:00] swyx: Isn’t that crazy? That number is just mind boggling.
[00:00:03] Jacob Effron: What is the state of the AI coding wars today?
[00:00:05] swyx: We’re in a phase of capability exploration. The general thesis that I have been pursuing now is that the same way that 2025 was the year of coding agents, 2026 is coding agents breaking containment to do everything else.
[00:00:16] Jacob Effron: Do you worry about the foundation models just getting into a bunch of these startup categories?
[00:00:21] swyx: Mid-size startups. Yes.
[00:00:23] Jacob Effron: What do you think the end state of this market is
[00:00:25] swyx: For the market structure to significantly change, there would be...
[00:00:28] Jacob Effron: Today on Unsupervised Learning...

pubDate Thu, 23 Apr 2026 19:37:19 GMT

author Latent.Space

duration 3292000

transcript

Speaker 1:
[00:00] Isn't that crazy? That number is just mind-boggling.

Speaker 2:
[00:03] What is the state of the AI coding wars today?

Speaker 1:
[00:05] We're in a phase of capability exploration. The general thesis that I have been pursuing now is that the same way that 2025 was the year of coding agents, 2026 is coding agents breaking containment to do everything else.

Speaker 2:
[00:16] Do you worry about the foundation models just eating into a bunch of these startup categories?

Speaker 1:
[00:21] Mid-sized startups, yes.

Speaker 2:
[00:23] What do you think the end state of this market is?

Speaker 1:
[00:25] For the market structure to significantly change, there would be...

Speaker 2:
[00:28] Today on Unsupervised Learning, we had a fun episode on what's really become an annual tradition: a crossover episode with our friends at Latent Space. swyx and I sat down and talked about everything happening in the AI ecosystem today: what we thought of the various changes at the model layer, what's happening in the infra world, the coding wars, and a bunch of other things. It's a ton of fun to do this with someone I really respect and another great podcaster in the game. Without further ado, here's our episode. Well, swyx, this is super fun to be back with another Unsupervised Learning x Latent Space crossover episode. I feel like there are a lot of places we could start, but one thing I always find fascinating about the way you spend your time is that you are obviously at the epicenter of the AI engineering movement and community, you run these events and conferences and put on these awesome talks, and I think you just have a great pulse on the zeitgeist of what's going on. Maybe to start: what are the biggest topics people are thinking about right now?

Speaker 1:
[01:21] Yeah, so I just came back from London where we did AIE Europe, and we're doing roughly one per quarter now, which really upped the pace. We're trying to match AI speed.

Speaker 2:
[01:30] Yeah, exactly. The topics will be completely different, I imagine.

Speaker 1:
[01:33] I definitely curate the tracks. You can see what I think when you see the track list and the speakers that I invite. Obviously, OpenClaw is the story of the last four or five months. And then just below that, I would consider harness engineering and context engineering to be two related topics in agents and RAG. And then there's a long tail of evergreen stuff, like evals, observability, GPUs, and LLM infra in general. We also have other updates on multimodality and generative media, let's call it. But definitely the first three that I mentioned are top of mind for people.

Speaker 2:
[02:13] I think harness engineering is particularly interesting. There was this tweet from Harrison Chase, the LangChain CEO, that caught my eye recently, where he said it finally feels like we have stability around the infrastructure for AI. And I think what he was basically implying is: look, over the past two or three years, as a company at the epicenter of AI infrastructure, it was a bit like playing whack-a-mole, right? You were constantly moving around with however the building patterns were evolving.

Speaker 1:
[02:36] For Harrison, for sure, right? He's basically had to reinvent the company every year since he started LangChain, right? It was LangChain, then LangGraph, and now Deep Agents. And I think he's one of the most nimble, adept, sharp people about this.

Speaker 2:
[02:49] Yeah, but he's saying now is finally the time for stability. Do you buy that? What do you make of that take?

Speaker 1:
[02:56] I think it's very expensive to say "this time is different" sometimes. But when you're just writing code, it's actually okay to just try to make a call. And it may not even matter if the call is right or not. I just don't care that much, because you can be right on the thesis, but if you don't figure out how to monetize the thesis, then who cares if you said something first? That said, it does feel like, for example, we went through a lot of different ways of packaging integrations up with agents, and it feels like we've landed at skills, which is the minimal viable format: just a markdown file with some scripts attached to it. I don't see how it can be more simple than that. So there is some justification for the stability around harnesses. I feel like there may be more adaptation with regard to the real-time elements or subagents or memory or any of those agent disciplines, let's call it, in agent engineering. But if the thesis is just that agents are LLMs with tools in a loop, with a file system where they can do retrieval, with skills and all this standard tooling that now seems relatively consensus, then that probably makes sense. I just think there's no point trying to stake your reputation on the thesis that we're there, because if it changes again, just change with it. It's fine.
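For readers unfamiliar with the skills format being described, it really is about that minimal: a folder containing a markdown file (conventionally `SKILL.md`, with a small YAML header) plus optional scripts the agent can run. A sketch of what building one might look like; the skill name, description, and script here are all illustrative, not a real published skill:

```python
from pathlib import Path

# Sketch of the minimal "skills" packaging format discussed above:
# a folder whose SKILL.md (markdown with a YAML frontmatter header)
# tells the agent what the skill does, plus optional helper scripts.
skill = Path("pdf-tools")
(skill / "scripts").mkdir(parents=True, exist_ok=True)

(skill / "SKILL.md").write_text(
    "---\n"
    "name: pdf-tools\n"
    "description: Extract plain text from PDF files.\n"
    "---\n\n"
    "When the user asks about a PDF, run `scripts/extract.sh <file.pdf>`\n"
    "and work from the extracted text.\n"
)
(skill / "scripts" / "extract.sh").write_text('#!/bin/sh\npdftotext "$1" -\n')

print(sorted(p.as_posix() for p in skill.rglob("*") if p.is_file()))
```

The whole package is two small files, which is the point swyx is making: it is hard to imagine a simpler stable format than "markdown plus scripts."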

Speaker 2:
[04:34] I've always been struck by how that is much more challenging for infrastructure companies than for application companies. On the application side, you've seen Bret Taylor from Sierra, Max from Legora: they say, look, we build what's ahead of the models, and we're willing to throw everything out every three months as the models get better and better. But the thing you at least have there is an end customer relationship that's decently sticky. They will mostly stick; they'll at least give you a shot at building these things. What I've always found more challenging is the reinvent-yourself-every-three-months dynamic at the infrastructure layer: developers are definitely a pickier audience than, say, an accounting firm or a bank. So it's a more challenging position to be in, to have to constantly reinvent yourself.

Speaker 1:
[05:17] Yeah, and when they churn, it's very complete. They'll leave for the hot new thing because there's no defensibility, I guess. Even if you are a database, people can migrate workloads off databases; it's a known thing. So I think what we're basically talking about is the vertical versus horizontal debate in AI startups. And the way I think about it is that when you're Legora, when you're Abridge, you are the outsourced AI team. Your job is to apply whatever the state-of-the-art AI methods are.

Speaker 2:
[05:55] Yeah, like this translation layer between model capabilities and your customers.

Speaker 1:
[05:58] Yeah, to the end customers. And if they didn't have you, they would have to hire in-house, and they're not going to hire in-house, so they have you. I think that's a reasonable position, very robust to whatever trends and discoveries people make at the engineering layer. I do think there are useful horizontal companies being built, but they're all very much reinventions of classic cloud in the AI era, the primary one being sandboxes.

Speaker 2:
[06:29] Yeah.

Speaker 1:
[06:31] Which is just another form of compute, guys; let's not get too excited about it. But I mean, the workloads are enormous.

Speaker 2:
[06:38] Right. It's interesting. As part of this, the questions that folks are asking around infrastructure are a lot about the extent to which companies should have their own AI teams and what they should be doing in-house. There are questions around: should people be training their own models? Should people be doing RL in-house based on the data they have? I feel like one has to evolve their takes on this every three months at this pace. But where are you at on this today?

Speaker 1:
[06:59] I think actually all of it has gone up. Obviously, I'm involved in Cognition, and Cursor is also doing a lot of their own model training. And I think that is some part of what I've been calling the Agent Lab Playbook, where you start off with the state-of-the-art models from the big labs and you specialize for your domain. But once you have enough workload and enough high-quality data from your users, then you can obviously train your own models and save a lot on cost and latency and all that good stuff. You also get a marketing bonus of calling it some fancy name and putting out some research.

Speaker 2:
[07:38] From my seat, I can't tell how much of it is actual value provided to the end user and how much of it is that marketing bonus, right? It seems like some combination of the...

Speaker 1:
[07:45] I think it's both.

Speaker 2:
[07:46] Yeah.

Speaker 1:
[07:47] No, no, there actually is real value. And you know that for a number of reasons. One: even when it's not subsidized, people do choose it as one of the top four or five models. This is true of both Composer 2 and SWE-1.6: in a fair market, in a free market, in a model switcher, people do choose it, and it's not subsidized. So that's as good as it gets. But beyond that, domain-specific models, for example for search, which both companies have, absolutely make a ton of sense. Everyone says, yeah, you should always do this. And honestly, I think the infrastructure for that is becoming easier with Thinking Machines' Tinker, as well as Prime Intellect's lab stuff. I mean, this is one of those reversals of the bitter lesson, where you first bootstrap on the large, general-purpose models to get big, and as you get very well-defined workloads that are high-quantity but not high-variance, you distill down to a smaller model and run that on your own, which totally makes sense.

Speaker 2:
[08:50] What I'm less clear on is the DIY RL use case, which I think is mostly about improved quality for different tasks; there are probably more efficient ways to get a smaller model that's faster and cheaper. It'll be interesting to see. Two or three years ago you had this whole class of companies that were pre-training and claiming better outcomes in their domains, then getting cooked as each model iteration improved. I wonder whether a similar story plays out in the RL space, again for the companies focused purely on outcomes and quality, not the cost side. Clearly, your own models for cost at scale make a ton of sense.

Speaker 1:
[09:28] I think they are two sides of the same coin. You basically always want to hold quality constant, or trade off a little bit of quality for a drastic decrease in cost, and that's true for everyone. One element I wanted to bring out, which is very much in favor of open models, is custom chips. So this would be Cerebras, but also Talos, and there's a huge range of stuff in between. This has been a huge story this past year: everything non-NVIDIA is getting bid up, including freaking MatX, which is working, which is very rewarding for me. It's one of those things where suddenly, because the amount of alternative hardware is increasing, the inference speed you can get is insanely high. We're talking thousands of tokens per second instead of less than 100. So the trade-off against quality doesn't hold as much anymore, because the speed is so high.
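To make that speed gap concrete, a back-of-the-envelope sketch; the response length and the token rates are illustrative assumptions, not measured figures for any vendor:

```python
# Wall-clock time for an agent to emit a long response or patch at a
# typical GPU serving speed versus the rates claimed for specialized
# inference hardware. All numbers here are illustrative assumptions.
response_tokens = 10_000

for tokens_per_second in (80, 800, 2_000):
    seconds = response_tokens / tokens_per_second
    print(f"{tokens_per_second:>5} tok/s -> {seconds:7.1f} s")
```

At 80 tok/s a 10,000-token generation is a two-minute wait; at 2,000 tok/s it is effectively interactive, which is the kind of 10x-unlocks-new-usage-patterns shift swyx describes below.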

Speaker 2:
[10:24] Have you seen a lot of companies go all in on the alternative chips?

Speaker 1:
[10:26] Cognition has, on Cerebras, and so has OpenAI. Beyond that, no, I don't think so, and that's mostly because this is foreshadowing of what's to come. I used to be kind of a skeptic: okay, so what if my inference gets sped up from 100 tokens per second to 200 tokens per second? It's only 2x faster, not that big a deal. But I think every 10x does unlock a different usage pattern. And we have proof in Talos and some of the others that you can drastically improve inference speed. What happens from there, I don't really know. It's so hard to predict when entire applications just appear at once. And it also isn't that expensive, right? So this is one of those things where I think the investment cycle is going to be multi-year, and I would caution people not to dismiss it too quickly.

Speaker 2:
[11:26] One other infra question I was curious to get your thoughts on: it seems increasingly that a lot of the cutting-edge infra companies are building for agents as the buyers or users of their product, right?

Speaker 1:
[11:37] Another huge step.

Speaker 2:
[11:38] And I'm trying to figure out what you have to do differently about selling to agents. Are they just the ultimate rational developers, or is there...

Speaker 1:
[11:46] No, absolutely not. I think they are easily prompt-injected and very tuned towards compounding existing winners. So congrats if you won the lottery of getting into the training data before 2023, because now you're installed in there for the foreseeable future. One stat that Vercel CTO Malte Ubl dropped at my conference was that 60% of traffic to Vercel's admin app for configuring Vercel applications is now bots. It's not human. So your primary customer is agents now, and it's mostly coding agents, mostly people using a CLI or MCP or whatever. So, step one: if it doesn't exist as an API that agents can use, it doesn't exist. It's good hygiene anyway to make everything API-available, but there's now an extra push on product people to not only work on the UI; you should probably work on the CLI stuff too. Beyond that, I come from the sensibility that everything you are trying to do for agent experience now, which is the term that Matt Biilmann at Netlify is trying to coin, is the same thing you should have been doing for developer experience. You should have had good docs. You should have had a consistent API that is mostly stateless. You should have had discoverability, or progressive disclosure, or search, or whatever. Now that people have energy from finding these customers to do that, that's great. Do I believe in extending beyond that into something like AEO for gaming the chatbots? Not necessarily, but obviously there are going to be huge advantages for people who figure out the short-term wins, and short-term wins can compound.

Speaker 2:
[13:42] These compounding advantages for the companies that made it in before the pretraining data cutoff: obviously, over some period of time, I imagine that doesn't persist. So as you think about what the selection criteria end up being three or four years from now, do you think it still mirrors exactly what you were saying before? That it's exactly what you should have been doing all along to sell a good product to developers?

Speaker 1:
[14:01] It could be, except that I think in three, four years, we'll probably have much better memory and personalization. So then general AEO or GEO doesn't really matter as much. So I think whatever memory or personalization system we end up with will probably determine what you end up choosing much more than what is currently the case, which is just frequency of mentions, as we call it.

Speaker 2:
[14:26] Yeah.

Speaker 1:
[14:26] So you just spam quantity. And I think that's something I'm looking forward to. I do think the fundamental exercise to work through for yourself is: if you start a new disruptor company now, there's a big incumbent that everyone knows, like Supabase. Supabase is kind of the Postgres database incumbent. If you want to start a new Supabase, how would you compete with them? I don't necessarily have the answer, but I do think about Resend, which is relatively new; I think it was started in 2023. There was a recent survey where people checked what Claude recommends by default. If you don't prompt it with anything, just say "give me an email provider," it says Resend in 70% of cases. The fact that you can get in there with such a relatively short existence is encouraging, I think. But you do want to do whatever it takes to get onto that very short mentions list, because it's not going to be 20 names, it's going to be like three.

Speaker 2:
[15:27] It feels like probably more consolidation than ever, or kind of like a winner take most market, than maybe the physics of go-to-market in the past might have enabled.

Speaker 1:
[15:38] The other thing is that semantic association is going to be very important, in the sense that you want to do the combo articles where you say, use my thing with Vercel, with blah, blah, blah, and that all gets picked up in the corpus. So that's probably one thing you want to do well. I don't know what else. It's one of those things where I feel I'm behind. I don't know how you feel about this, but...

Speaker 2:
[16:04] I think AI is just everyone constantly feeling like they're behind some... I want to meet the person that doesn't feel behind.

Speaker 1:
[16:11] But like with AX, right? So my stance was exactly what I said before, like everything that you should do for agents is something that you should have done for humans anyway. And so to the extent that you're just getting more energy to do things for agents, great. But like it's hard to articulate what new thing, apart from just like more spam, that you should be doing anyway. That will be my take right now. I do think like there will be more turns at this. I think the personalization turn that is coming will be big. And I don't know what that looks like because like basically, we feel kind of tapped out on the memory side of things.

Speaker 2:
[16:49] Yeah. Since we last chatted, you took on this role at Cognition, and you obviously have a front-row seat to the AI coding space today. Besides being the mother of all markets and a massive opportunity, I feel like coding is in many ways a preview of what's to come in many other spaces: agents are most advanced in coding, and the competition between foundation models and application companies mirrors what we may see elsewhere. So maybe for our listeners, can you lay out: what is the state of the AI coding wars today?

Speaker 1:
[17:26] It is massive, and I don't think that last time we talked about this we appreciated the size of it.

Speaker 2:
[17:32] No, I wish we did.

Speaker 1:
[17:34] That is the state of the AI coding wars today. Both OpenAI and Anthropic have made it their P0 to compete in coding. Anthropic is at something like 2.5 billion in ARR just from Claude Code; the way they recognize ARR is up for debate. For OpenAI, I don't think a public number is known, but let's call it 2 billion as well. And then Cursor is rumored to be at 2 billion. Those are the public numbers that are known. So these are huge markets that have been created in just the past year. Claude Code just recently celebrated its one-year anniversary, which is pretty amazing. The other thing I see is people saying, here's the relative penetration of Claude use cases: coding is 50 percent, and then legal and whatever make up the remaining ones. And there was a very popular tweet that said, okay, look at the empty space in all these other use cases; if you are a new founder today, you should be betting on the other stuff, on a sort of catch-up theory. My pushback is the same pushback I had on Apple versus Google, which is: why is this time different? If coding went from, let's say, 10 to 50 percent in the past year, why can't it keep going? And getting that wrong is actually very painful, because you could have just done the momentum bet instead of the mean-reversion bet. So I think that is the state of things now: people are very much into psychosis. They are getting rewarded for spending more rather than spending less. We're not in the efficiency phase; we're in a phase of capability exploration. So people who are more crazy, who are more creative, get rewarded comparatively.

Speaker 2:
[19:27] Well, it's interesting. It feels like behind these token-maxing leaderboards and whatnot is the first phase of this transition from a workforce perspective: you just have to show your employer, hey, I use these tools.

Speaker 1:
[19:37] Here's the number of tokens I cost, and that's it. They don't care about the quality right now. It is maybe distasteful to someone who cares about the craft and all that, but directionally, everyone just wants your usage to go up regardless. So it's not very discerning, it's probably very sloppy, but I think it's net fine, because we're still probably underusing AI in general. And so I think that's very interesting. We had Ryan Lopopolo from OpenAI on the podcast, who spends a billion tokens a day. That, for those counting at home, is something like $10,000 a day of API tokens at market rates. Most of us can't afford that, and probably a lot of what he does is slop. But if there were a new capability, he would discover it first, before you, because he was trying and you were not. You only do things that work? Well, good for you, but the people who are going to discover the next hot thing are living at the edge.
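The arithmetic behind that $10,000 figure checks out if you assume a blended price on the order of $10 per million tokens; the exact rate is an assumption for illustration, not a quoted price from any provider:

```python
# A billion tokens a day, priced at an assumed blended ~$10 per
# million tokens, gives the daily spend swyx cites above.
tokens_per_day = 1_000_000_000
usd_per_million_tokens = 10.0  # assumed blended rate, not a real price list

daily_cost = tokens_per_day / 1_000_000 * usd_per_million_tokens
print(f"${daily_cost:,.0f} per day")  # prints $10,000 per day
```

Cheaper models or subsidized plans move the number, but the order of magnitude is what matters for the "living at the edge" point.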

Speaker 2:
[20:42] An increasing part of living at the edge is just having the compute budget to run these experiments. It's similar to what living at the edge on the research side has always been: constrained in many ways by the amount of compute you had to run experiments. It feels similar now on the builder side, actualizing these tools.

Speaker 1:
[20:56] The other thing that's very obvious is that Anthropic is kind of the high-price premium player; restricting limits or restricting model releases is the name of the game. Whereas Codex is like, come on in guys, use our SDK, use our login, we don't care, we're going to reset limits, whatever. You do want to exploit the subsidies where you can get them. Codex is definitely super subsidized right now, and Gemini is also very subsidized. Comparatively, I think you should take advantage while that's going on. And it's not that bad to be a capabilities explorer on just the $200-a-month plan from Claude Code or from OpenAI. My sense is that people aren't even there yet.

Speaker 2:
[21:41] How do you think this market ultimately plays out? It's obviously such a big market that any slice of it is interesting for anyone going after it. But what makes the coding market particularly interesting is that it feels like a foreshadowing of what will happen in any other application market that the foundation models eventually turn to, aim all their models at, and gather data around. So, does there end up being room for lots of different kinds of players? What do you think the end state of this market is? And do you think that's applicable to other markets?

Speaker 1:
[22:10] I feel like the status quo is probably the most likely outcome, which is that there are two big players and a small range of longer-tail players that fit other use cases the two big players don't. That feels right to me. For the market structure to significantly change, there needs to be a significant change in the economics or the brand building or the value propositions of the companies involved, and I haven't seen anything in the last six months that has really changed the stories materially. So I feel like they would just keep going until something else happens. Something else happens meaning Microsoft wakes up and goes, guys, we have GitHub, we'll do something much bigger here than just Copilot. That would be a big change. MSL has put out a model now, and I was at a breakfast with Alex Wang where they were like, yeah, we really, really want to go after the coding use case. They haven't done anything yet, but don't underestimate them. And similarly for the Chinese labs, I think they're trying to go after it. Like Z.ai is doing stuff with GLM; Z.ai and GLM are the same thing. So everyone's trying to get a piece of that pie. But I feel like the status quo has been pretty stable for the past almost a year, I will say.

Speaker 2:
[23:39] Yeah. And is there room for application companies, more on the enterprise side? What surface area do the model companies leave for application companies?

Speaker 1:
[23:50] Yeah, that's a good one. It's very much evolving. I will say, because OpenAI did not have this level of attention on coding a year ago, we just don't have that much history, right? For example, the big push at OpenAI now is the super app. Is that a consumer thing? Is that a product-portfolio rationalization thing? How much is that going to take away attention from coding at the time when they actually do want to put more into coding? I think it's very unclear. So in both big labs, sorry, both OpenAI and Anthropic, and xAI is a separate case, they are trying out the other TAM expansion areas: Claude Code for Finance, Claude Cowork, all those things. Whereas I think Cursor and Cognition are comparatively just focused on coding. So I do think they leave space. And for the other verticals, that means the same thing: the labs are not going to be that intensely focused on those domains. Except I would mark out finance and healthcare as the next ones that they're clearly going after. Comparatively, healthcare seems more thorny. There have been some announcements about it, but I would respect the finance work a lot more, just because the path to money is a lot clearer.

Speaker 2:
[25:12] Yeah. No, I mean, maybe similar to the space that's being left in these other domains, there's obviously a lot that's required to actually implement these tools in enterprises, versus just giving folks model access out of the box.

Speaker 1:
[25:27] Yeah, yeah, yeah. So the agent lab thing is like, we'll do the last mile for you, whereas I think the model labs tend to just trust the model and be minimalist about it. Both of them work. I don't necessarily think one beats the other for every use case. All I do know is that it does seem like the large enterprises do want a dedicated partner that isn't just the model labs, which is kind of interesting.

Speaker 2:
[25:55] We've been in this phase of pure capability exploration, and nothing has been better for the large labs. They are always going to be at the frontier of capability exploration, and so I think they have a very good relationship with a lot of these enterprises. But ultimately, over time, the incentive structure of these labs is always going to be maximal token consumption for the end customers they work with. There are just so few companies that have actually gotten to massive scale. Maybe coding, again, is the most interesting: it's the first space that has really just gone completely parabolic. You must see it every day, like, absolutely insane.

Speaker 1:
[26:31] And I think, okay, we say good things about Cursor and Cognition, but the sheer liftoff of both Anthropic and OpenAI, because they have independent valuations, and let's throw xAI in there too, because it's now IPO-ing at $1.2 trillion, those numbers are just mind-boggling. In normal investing or normal startups, there's kind of a ceiling market cap or valuation where you reach your goal and it's going to be chiller from now on. These guys are not slowing down.

Speaker 2:
[27:01] No, well, I also think the dynamic that's fascinating about some of these later-stage companies is, in the past, in the venture world, if you got to a certain level of scale, the question around you was really more a valuation question. This is why there were different types of venture people: the late-stage growth people were just incredible at a little bit of what's the ultimate market opportunity of this company, but also what's the right way to value it. We know it's in some band of outcomes; sure, there's some variance to it, but it's relatively understood what that band is, and then maybe over time you get surprised to the upside. Whereas now, even for the labs themselves, for any later-stage company, the band of what that company might be worth right now, or in a year or two, is so massive because of how fast the ecosystem changes that every three months could be an existential-level event, to the upside or the downside. And you're obviously seeing it in the positive with code. If you think about a company like Anthropic, for a while it was unclear if they were going to have access to enough capital to really stay in the race, right? And then coding hit at the exact right time, they had the perfect model for it, they executed brilliantly, and now they're one of the most valuable companies in the world.

Speaker 1:
[28:13] At the same time, I have zero sympathy for OpenAI, because they're crushing it, and they're all rich. This is a high-class champagne problem to have, being number two at coding or whatever. Who cares? You're doing great.

Speaker 2:
[28:27] Yeah, it's funny, though. You would be closer to this if you're in the coding space, but a lot of people I talk to think Codex is just as good, if not better, than Claude Code, right? And maybe Claude Code is a better product in some ways, I'm curious about your thoughts. But one thing that I've been really surprised by is that in consumer AI, with ChatGPT, you saw this big first-mover advantage. Admittedly, today, Claude and Gemini are great products, and it's not abundantly clear ChatGPT is any better, but people stick with ChatGPT. It's the first thing that introduced them.

Speaker 1:
[28:56] They stay, but they're not growing anymore. I don't know if you've seen...

Speaker 2:
[28:59] Right, but that to me is more of a product problem; it's not like they've lost share to someone else. My understanding is the overall problem with consumer AI today is much more: how do you take this tool, which for folks like us, knowledge workers, is this incredible magic tool, and make it a daily-active-use tool for a lot of people around the world? It's kind of a category-wide problem. In coding, for example, the entire space has gone parabolic. There may be some relative growth among other consumer AI players, but it's not like consumer AI as a category is going parabolic and ChatGPT is failing to capture most of it. The larger problem is much more that the category has hit a bit of a plateau; people haven't figured out how to bring tons more users on board or increase the frequency of those users. So it seems more like a category-wide problem than a massive market-share change. I was going to draw the comparison to the coding space, where Claude Code was obviously the first product to introduce people to this magical experience. By all accounts, Codex is pretty damn close to as good, if not better. But still, you would have thought that first product would not be a super sticky surface area, and it turns out it has been. It feels like the first lab to introduce you to an experience really does keep a lot of the focus.

Speaker 1:
[30:12] I think maybe it's still early days, you know. ChatGPT is three-plus years old and Claude Code is only one. So, give it time. Yeah, definitely a lot of people have switched to Codex; maybe that will keep going. It's really hard to tell. I do think that because we are in this high-volatility, high-temperature phase, the loyalty and stickiness to first movers and category creators isn't as high as it might be in some other areas we've looked at in our careers.

Speaker 2:
[30:47] Yeah, I mean, I've been surprised by the Claude Code thing. I would have thought that, in many ways, I always worried about the enterprise.

Speaker 1:
[30:52] Do you think it would have been gone by now?

Speaker 2:
[30:53] Not gone, but I always worried that the consumer business of these companies would be quite sticky, and that the enterprise API business was actually, in some ways, your least loyal buyers; they would just move.

Speaker 1:
[31:06] But they worked out that it wasn't the enterprise API, it was enterprise product.

Speaker 2:
[31:09] Totally. And maybe that was the secret. But the amount of lock-in, or just default behavior, that has happened in that space is more than I might have imagined, with two products that by all accounts are pretty damn similar.

Speaker 1:
[31:21] Yeah, no fight there. I will say, I do think that Codex is still in catch-up mode in terms of personal experience. The only thing I like out of Codex is Spark, and I feel like the skills integration is a little bit better, and the speed is a bit better, maybe because it's written in Rust or whatever. Very minor things that you're almost telling yourself, rather than objectively assessing between the two of them. So vibes-wise, I think that's what's going on. I feel like the missing question in this whole debate is: why is this so concentrated in only two names? Where is the Gemini presence? Where is the xAI presence? They are trying; they just haven't made that much progress yet.

Speaker 2:
[32:13] What the Claude Code moment does show, in some ways, actually makes you a little more bullish on the potential for someone else to catch up, because it does feel like if you're the first to introduce some magical net-new product experience, that might be stickier than one might have imagined.

Speaker 1:
[32:27] Right, right, right, okay, yeah.

Speaker 2:
[32:28] And so everyone can believe they have a shot at that.

Speaker 1:
[32:30] What do you think that new product experience might be? And this is a failure of imagination on my part, but I always wonder. People always say this: well, the thing that will save us is being first to the next new thing. What is it?

Speaker 2:
[32:41] Yeah, I don't know, something around like a consumer-agent-computer-use hybrid, I think we're like scratching the surface on the consumer side.

Speaker 1:
[32:53] So my current theory is that OpenClaw is like a vision of things to come.

Speaker 2:
[32:58] Totally.

Speaker 1:
[32:58] And it's going to be good that OpenAI has the association with OpenClaw, but by no means do they have the right to win it. The general thesis that I have been pursuing now is that the same way 2025 was the Year of Coding Agents, 2026 is coding agents breaking containment to do everything else. Coding agents continue to win, but because they generate software, it's kind of the transitive property: software eats the world, coding agents eat software, and therefore coding agents eat the world. Which is an interesting...

Speaker 2:
[33:30] And breaking containment is always an easier phase in the consumer context than the enterprise one. You've seen people run these really cool experiments in their own personal lives; obviously, everyone's focused on the enterprise side now, around how you create these experiences. People love to have these narratives that everything has completely shifted. But OpenAI, organizational volatility aside, has great products, a great team, great models. And everyone else in the world is incentivized for there to be two or three more great model companies. I feel like the natural forces of the world revolt when any one company is too much the star of the show; there are so many people in the ecosystem incentivized for that not to happen. So I'd be shocked if we don't have a reversion of vibes, maybe not completely the other way, but at least a little more equal, at some point over the next six to twelve months.

Speaker 1:
[34:24] I think there are just different stages. When you talk about the world wanting more model companies, I think about the neolabs.

Speaker 2:
[34:30] Yeah.

Speaker 1:
[34:31] And I mean, I don't know, is it fair to say none of them have really broken through in the past year?

Speaker 2:
[34:35] I think that's totally fair.

Speaker 1:
[34:37] Which is rough. And, well, how are we going to grow that diversity in choice? Like, this is it.

Speaker 2:
[34:46] Yeah. It'll be really interesting to see what ends up happening with that. And you've seen folks like NVIDIA very incentivized to make sure there's a broader platform of other model providers.

Speaker 1:
[34:57] I don't know, people say this, but I don't think they try that hard. NVIDIA tries harder to build neoclouds than neolabs.

Speaker 2:
[35:07] Well, they try pretty damn hard to build neoclouds, so that's, yeah.

Speaker 1:
[35:10] But, you know, the CoreWeaves of the world are in a much happier place than any neolab built on top of them.

Speaker 2:
[35:18] Yeah. Though one might argue it's easier to enable a neocloud to be successful; you can't will a neolab into existence the same way you can a neocloud.

Speaker 1:
[35:25] NVIDIA has more direct control over it, for sure.

Speaker 2:
[35:27] What else is kind of catching your eye today on the startup side? There's obviously this whole narrative of the foundation models, they announce a product and every stock goes down 15 percent.

Speaker 1:
[35:36] Yeah.

Speaker 2:
[35:37] Do you worry about the foundation models just kind of eating into a bunch of these startup categories?

Speaker 1:
[35:44] Not really. There's the point of view of being an investor in startups, and there's the point of view of: do you want to start something? Honestly, the downside for all of these is so minimal, in the sense that the worst case is you just get hired into one of these labs anyway. So there's a market for people who just do things and try things and try to execute in a competent way, even if it doesn't work out commercially; that's your job interview to get into one of these labs anyway. So I don't feel that from a very, very small startup's perspective. Mid-size startups, yes. There's been a lot of dead LLM infra consolidation, like the Langfuses of the world getting acquired by ClickHouse. I think people have maybe worked out the domain-specific playbook, and I think that's okay. I'm not that worried about it. I would be more worried about traditional SaaS, like low-NPS SaaS. This is the whole AI-versus-SaaS debate that's been going on, and literally, I'm going through that exact thing in my company, so I'm thinking through this on a very visceral level. On one hand, you have the people who say: you vibe coders don't appreciate the amount of work that goes into a CRM, and yeah, you think you can rip out Salesforce; so did the 30 entrepreneurs before you. You classically underestimate the things that you don't deeply know, and the audience you're selling to is not you. On the other hand, we have never been able to build software so easily and customize software so easily, and you're not going to use 90 percent of the things Salesforce does. So what's the tipping point?

Speaker 2:
[37:33] So what have you done internally?

Speaker 1:
[37:34] So we have a main SaaS that we use for event management and sponsor management, and we pay 200k a year for it. Not huge, but chunky at my scale. And yeah, I could probably spend 2,000 and build a custom version of it. But the trick has been dealing with the rest of my team and getting them on board, because I'm the most cynical person on my team, and I can't make that decision myself. I've been talking with other CEOs and team leaders about this as well. You can be super Claude-pilled, you can be in full LLM psychosis and think that's okay, but you have to bring your team with you. And I think the widening disparity in LLM psychosis inside companies is causing real rifts. On one hand, the people who are less AI-native are not getting the picture. They're actually behind; they're not waking up to the fact that everything they think is necessary is not actually that necessary, and that it would be better for them to just hold their nose, go in, and come out the other side only talking to agents in natural language. Their life would actually be better than if they stayed closed-minded. That's one perspective. The other perspective is: oh, you vibe coder, you did this in a weekend and got the 80 percent solution, and now the rest of your employees have to pick up the rest of your shit, which you thought you were so hot and amazing at but actually didn't figure out, and actually all of them are still useless at this, and blah, blah, blah. So there's this huge debate going on in every company right now, and I have a small microcosm of it. It's making me hesitate to pull the trigger, but I will at some point. Maybe I put it off for one year, but not five. So SaaS is definitely getting squeezed.
It does make me wonder: I do think there's an opportunity for a more AI-native system-of-record thing that is not just Postgres or MongoDB, although both are very good. Maybe it's something like Convex; people bring up Convex a lot. I don't know. I just feel like the quote-unquote Firebase of AI apps isn't really a thing yet, beyond what we have, which is fine. We could probably start in a more rapid iteration cycle first before scaling up to a Postgres or MongoDB, which are older tech. I was at a dinner with Mike Krieger, the CPO of Anthropic, and we were going around the room asking what people are most worried about. For me, instead of security, I brought up biosafety. Classic. I said it was cliche and classic, and the rest of the table was like, what do you mean, someone sitting at home can manufacture a virus that wipes out half of humanity?

Speaker 2:
[40:32] It was like the OG Geoffrey Hinton, this-is-why-you-should-be-scared stuff.

Speaker 1:
[40:35] I'm like, yeah, read the risk reports, this is the thing. And Mike was just sitting there, knowing he was sitting on Mithrilus, going, actually, it's security. I think part of it is very good marketing, too good. I would actually advise Anthropic to tune down the marketing, because it's just a very good model and you don't have to make so many marketing claims around it. At the same time, it is not really a private model if you give it to 40 companies, each of whom has 10,000 employees or whatever. Right? It's not private. There are bad actors in there.

Speaker 2:
[41:19] Hopefully not as bad as releasing it widely. But no, it's an interesting case study. Of many model releases, this might be the first one that looks like what the rest of them will look like from now on.

Speaker 1:
[41:31] Right. There's an overall product strategy for Anthropic of restricting access and maybe bundling product with model, whereas OpenAI has definitely been a lot more philosophically aligned on, we will just enable access everywhere and we don't know what will come out of it.

Speaker 2:
[41:51] Right. And in this current moment, obviously, the cynical take is that it also just ties to the amount of compute that both companies have.

Speaker 1:
[41:56] Right. Yeah, I think that's true. I do think the dawn of larger-than-10-trillion-parameter models is very interesting. I think it's a temporary phenomenon, because we have much larger compute clusters coming online for everyone over the next three to five years. That's already in the cards.

Speaker 2:
[42:18] Yeah.

Speaker 1:
[42:19] So, will we have rationing of models above 10 trillion parameters in two years? I don't think so. I think everyone will have that.

Speaker 2:
[42:28] No, we'll just have rationing of the next phase.

Speaker 1:
[42:30] Right. But that's as it should be. My classic example, and this is just me theorizing, not anything confirmed by Google: when Google announced Gemini, they actually announced three sizes, Flash, Pro, and Ultra. They never released Ultra; they only have Pro and Flash. So my theory is they have Ultra sitting in the basement and they just keep distilling from it into Flash and Pro. Which, yeah, I actually think is as it should be for any lab.
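[Editor's note: the "Ultra in the basement" theory is the classic teacher-student distillation setup. Here is a minimal sketch of the standard distillation objective; the logits and temperature are illustrative, and nothing here reflects Google's actual pipeline.]

```python
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Numerically stable softmax with temperature softening."""
    z = np.exp((logits - logits.max()) / temperature)
    return z / z.sum()

def distillation_loss(teacher_logits, student_logits, temperature: float = 2.0) -> float:
    """KL(teacher || student) over temperature-softened next-token
    distributions: the standard knowledge-distillation objective,
    where a small model is trained to match a big model's outputs."""
    p = softmax(np.asarray(teacher_logits, dtype=float), temperature)
    q = softmax(np.asarray(student_logits, dtype=float), temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# A student that matches the teacher exactly incurs zero loss;
# any mismatch makes the loss strictly positive.
print(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))       # → 0.0
print(distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0]) > 0)   # → True
```

Minimizing this loss over the teacher's outputs is how a "Flash" or "Pro" could inherit capability from an unreleased "Ultra" without ever serving the big model directly.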

Speaker 2:
[43:02] Yeah, just because those are the models that people actually want to end up using, and it's just cost and price.

Speaker 1:
[43:06] Yeah, it's cost. It's not the want; it's just the cost. I do think it's interesting that for a while, I was considering the theory that models capped out at 2 trillion parameters, and that's proving to be wrong. Well, then, if I'm wrong, how wrong am I? Do we do 200 trillion? 2 quadrillion? I don't think we have a straight answer to that. But it's interesting that we are continuing to scale the number of params when everyone can see that we're not going to get the next thousand-X or million-X from this paradigm. So the Ilyas of the world are working on other model architecture improvements. We need a different scaling law, I guess, because I feel like people already feel we're tapped out on this one. The end state of this is we turn most of the world into data centers, and I don't know if we want that.

Speaker 2:
[44:08] Yeah. If the returns on intelligence are there, maybe not so bad.

Speaker 1:
[44:13] I think there's just a sheer amount of unscalability that is wrangling people's sensibilities right now, especially in terms of context lengths. My classic quote is that context length is the slowest-scaling factor in LLMs.

Speaker 2:
[44:30] Yeah.

Speaker 1:
It took maybe three years to go from a 4,000-token context length to a million, and that's about it. Gemini has had a million-token context length for two years now, and no one's using it. Memory is probably going to be the biggest limiting constraint on all these things.
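[Editor's note: the memory point is concrete, because the KV cache alone grows linearly with context. A back-of-envelope estimator follows; the model dimensions are hypothetical, roughly a 70B-class transformer with grouped-query attention, not any specific production model.]

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """Size of the key and value caches for one sequence.
    The leading factor of 2 covers the separate K and V tensors;
    bytes_per_elem=2 assumes fp16/bf16 storage."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

# Hypothetical 80-layer model, 8 KV heads of dim 128, fp16, 1M-token context
gib = kv_cache_bytes(80, 8, 128, 1_000_000) / 2**30
print(f"{gib:.0f} GiB")  # ≈ 305 GiB for a single 1M-token sequence
```

Hundreds of GiB of cache for one long-context request, before weights or activations, is why memory, not FLOPs, becomes the binding constraint at million-token contexts.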

Speaker 2:
[44:50] Yeah. Certainly seems that way. I guess I'm curious over the last year since we recorded last, what's one thing you've changed your mind on?

Speaker 1:
[44:57] I feel like I was kind of bearish on open models last year, in the sense that I had just done the podcast with Ankur Goyal of Braintrust, who has a good cross-section of all the top AI companies, and he said the market share of open source was 5% and going down. I think that's changed. I think it's going up.

Speaker 2:
[45:22] And even if... The capability gap does seem to be increasing.

Speaker 1:
It's hard to tell. It's really hard to tell. Because, for listeners, "the capability gap increasing" is on public benchmarks. Let's say you're comparing Methos versus, I don't know, GPT-OSS or GLM 5.1. It's really hard to tell, because even if the open models were closing the gap, you would also not believe they were closing it that much, because it's very easy to game the benchmarks. So you just don't really know. All you know is there are somewhat objective OpenRouter stats on what people choose in the free market, and people do choose some of these open models in significant volume, except a lot of them are heavily discounted, so you need to price-adjust these things. Even so, I feel like the number is just up now instead of down. I think the separation between what the top-tier agent labs are doing versus the average AI startup or the average GPT wrapper is significant enough that you should not worry about the mean industry number; you should cohort things into the median, the bottom 80 percent, and the top 20 percent. The top 20 percent acts very differently from the bottom 80 percent. And the top 20 percent, which is all I care about, is definitely going toward more open models. The Fireworks and Togethers of the world are crushing it, and so will all the fine-tuners. Maybe last time we even said things like, fine-tuning as a service doesn't work. Well, now it's going to work; it's a derivative of the open-models market.

Speaker 2:
Well, and also the workloads are scaling to the point where people care about cost and speed more and more, and moving from pure use-case discovery of what these models can do to, okay, we know they can do it at scale, now let's do it cheaper and faster.

Speaker 1:
Yeah, so that change, I think, is probably the most significant in my mind. And I always like to do the mental math of, this is how I think about scheduling my learning rate: when you've been wrong once, what else were you wrong on? I'm kind of working through it. To me, the other thing was the coding one, which obviously I have now come full 360 on. But I think people are not appreciating dark factories enough, which I don't know if you've discussed on the pod yet. This is kind of a strong DM slash Simon Willison term. The general idea is that there are different levels of AI coding psychosis you can have. The very first level, which by the way I encountered first at Cognition five months ago, is zero human-written code, right? Which seems like a reasonable thing now, but was less reasonable five months ago. The next frontier, which sounds as crazy today as zero human-written code did in the past, is zero human review. Just check it in without even reviewing it. Very few people are doing that, but OpenAI is exploring it. And I feel like it's definitely the only scalable way to do this. It just means you have to flip the SDLC, or change large amounts of what you normally do, which is probably things you should have done anyway: more testing, more automated verification. But that is a frontier where, when you have unlocked it in your company, you are going to produce much more software than you ever have. It's going to be so disposable, so cheap, that you can probably innovate on quality a lot as well. Quantity helps you get to quality, which I think people are very uncomfortable with, because people associate more quantity with slop.

Speaker 2:
Right. That's back to exactly the discussion we were having on the reaction to these token-maxing leaderboards, and the idea that today maybe that's not the best sign of productivity and efficiency going forward.

Speaker 1:
[49:18] Yeah, but you still get rewarded for it, so you're like, fuck it, whatever. But I think the people who are doing well, who do most well in 2026 are not the cynics who go like, oh, that's just slop, I'm not going to participate in that. They're like, OK, this is happening with or without me. Let's bend this the right way.

Speaker 2:
[49:36] Yeah, I love that. I think for me, a kind of related thing on the open source model side is for so long, I really didn't think it made any sense to do any sort of RL, post-training, pre-training, anything you could do to improve the overall quality, certainly for latency and cost, it always made sense to me, but for overall quality, God, you just get that for free in the models three, six months later. I think what I'm starting to change my tune on a little bit is hearing all these app companies talk about, we build stuff and then we throw it out three months later as the models improve. You're like, OK, well then, what you're doing for capability improvement is just another version of that. I still don't think that your RL or post-training is going to make you have a better model for years and years to come. But maybe, I think you still have to be pretty rigorous in like, is that the single best thing you can do to solve a customer problem? And oftentimes, it's literally just like, now add more data and feed more data even via connectors to these models or do some clever engineering on the backend or whatever it is. But if the single best thing you can do for that three-month time period to improve your customer's outcomes is post-training in some way that really improves the output of a model, even if you throw it out three months later because the general models get up there, it still might have been worth doing. And so I think I'm more open to...

Speaker 1:
[50:45] You throw out the results, but you don't throw out the raw data.

Speaker 2:
[50:48] Then you just run it again. And so, obviously, at a cost of, like, $10 million, maybe that's too much, but there's some level of cost where...

Speaker 1:
[50:55] No, it's not even 10 million.

Speaker 2:
[50:56] No, of course it's not. There's obviously some level of investment at which it's the equivalent of just like staffing for engineers to go build something for three months.

Speaker 1:
[51:04] Yeah. So the other thing I really... For listeners, I'm just going to leave some droplets of info. Look into the long-trajectory and synthetic-rubrics work that people are doing; it's very important, including something that's called Dr. GRPO. I'll just leave those key search terms in there. I think what it means is that RL is going much more multi-turn than people think, and that means you can customize models along way more specific dimensions than the traditional, let's call it SFT or shallow RL, that was done a year ago. So, like, hundreds of turns. And I think that leads you down a path of complete domain specificity.

Speaker 2:
[51:50] Of these unanswered questions in AI today, what else are you looking at? What are you paying close attention to in the next year?

Speaker 1:
[51:58] I have a few theses for what the next frontier is. One is memory, which, memory and personalization, we talked about. The other is really world models, which we've done a small little series on, from Fei-Fei Li all the way to even Moon Lake and General Intuition. And there's a lot of debate as to the relative importance of this. I think a lot of it manifests as 3D static worlds that you inhabit for a little bit and walk around. And people are like, cool, but how does this help me with my B2B SaaS?

Speaker 2:
[52:29] And I feel like all the hype now is robotics, right?

Speaker 1:
[52:32] Yeah, and there's obviously a correlation between world models and embodied vision and experience, which leads to robotics. But I think world models are very interesting just for improving intelligence itself beyond the next-token-prediction paradigm. And so I think people are kind of testing the edges around that. One of our top articles this year so far has been on adversarial world models. I do think, if you don't do anything else, just read Fei-Fei Li's essay on spatial intelligence, on why LLMs don't have it. She may not have the solution yet, but she has the right problem statement. And so everyone else is trying to solve that problem statement in their own way. Let's see who wins. But I don't think it does you any favors to equate world models to robotics, or world models to gaming, or whatever the current manifestations are. Because what is at stake is a much more important conception of intelligence than just answering questions. It is: does the AI understand what a table is, what matter is, what physics is? For the movie fans, it's almost like Good Will Hunting, where Matt Damon knows everything because he read it in a book, but he's never experienced any of it.

Speaker 2:
[53:54] Great scene with Robin Williams.

Speaker 1:
[53:55] Robin Williams. I look at that scene and I go like, that's exactly the difference between a very intelligent LLM who knows everything, but hasn't experienced anything.

Speaker 2:
[54:04] Wow. That's an awesome note to end on. Have you used that in the book? That's great.

Speaker 1:
[54:08] Yeah. One thing I've done with Latent Space is I've moved to adding daily write-ups. In one of those daily write-ups, I wrote that.

Speaker 2:
[54:16] That's a great one. I love that. Also, it's been a ton of fun. Thanks so much for coming.

Speaker 1:
[54:20] Let's go hash out, man.

Speaker 2:
[54:21] I'm Jacob Effron, and this has been Unsupervised Learning, a podcast where I get to talk to the smartest people in AI and ask them tons of questions about what's happening with models and what it means for businesses and the world. As I hope is clear, I have a ton of fun doing this. It's a nights-and-weekends project, in addition to my day job as an investor at Redpoint. But our ability to get these incredible guests on really comes from folks like you subscribing to the podcast and sharing it with friends. That's really what ultimately makes this whole thing work. So please consider doing that. Thank you so much for your support and for listening. We'll see you next episode.