Mysteries of Claude

title Mysteries of Claude

description Anthropic hired philosophers to teach its AI to be good. In their tests, the AI blackmailed a human to keep itself alive. Writer Gideon Lewis-Kraus went inside the company to figure out what's going on with Claude, and whether anyone can actually control it.

Read Gideon's story here

Support Search Engine!

To learn more about listener data and our privacy practices visit: https://www.audacyinc.com/privacy-policy

Learn more about your ad choices. Visit https://podcastchoices.com/adchoices

pubDate Fri, 27 Feb 2026 10:00:00 GMT

author PJ Vogt

duration 3129000

transcript

Speaker 1:
[00:00] Welcome to Search Engine, I'm PJ Vogt. No question too big, no question too small. This week, mysteries of a chatbot. Quick note before we start today, this week's episode is almost entirely about Anthropic, the AI company that makes Claude. They have advertised on our show. As with all companies that advertise on our show, they do not get a say in our editorial content. Okay, after these ads, the show. Welcome to Search Engine, I'm PJ Vogt. No question too big, no question too small. I found myself feeling much stranger about AI in the past month or so. I use the tools, I use the tools a lot, but I'm probably each company's worst nightmare as a customer, in that as soon as I hear from anybody that one model has inched ahead of another, that this version of ChatGBT is beating that version of Gemini, I immediately cancel my subscription and switch. For the past two months, I've mainly been using Claude, Anthropics agent. For whatever reason, Claude is just giving me more future nausea than I was having six months ago. Part of the general tech excitement around Claude lately has been Anthropics product Claude Code, a tool that lets the AI agent autonomously write and edit code. Over at the New York Times, Kevin Roos has talked a lot about the websites and apps he's quickly built with Claude Code. Two CNBC reporters as an experiment, vibe-coded a competing version of a popular organizational app called monday.com. Within a couple of days, Monday's stock price had tanked. For me, though, most of the future shock has just come from using the LLMs the way I'm used to. I find myself going to Claude as a useful first stop, the way I've always used the Internet. But the quality of its research, its answers, even its writing, I'm just starting to feel like I can see not too far off. If not my own obsolescence, at least real significant change in my field. I don't know how to feel about that. I find a lot of the tech coverage of AI to be high opinion, low information, and relatively unhelpful. I'm not even asking for anyone to tell me the future right now. I would just settle for a better understanding of the present. Which is why, this week, I wanted to talk to a reporter who's been digging into this. Hello. Hey. Can you introduce yourself?

Speaker 2:
[02:41] I am Gideon Lewis-Kraus.

Speaker 3:
[02:43] I'm a writer.

Speaker 1:
[02:45] Gideon is a writer who I particularly enjoy. He's been on our show before. He'd spent much of the last year essentially embedding within Anthropic, the company that makes Claude, the tool that was giving me the heebie-jeebies. People there had been very open with him. He got a view on how they're seeing what's going on, their understanding of a present, which frankly, they also sound mystified by. This conversation took place right before Anthropic's big showdown this week with the Pentagon. So we did not discuss that specifically, but I did find Gideon's view inside the company and its mission extremely helpful in understanding how they'd gotten into this fight with the US government at all, since none of their competitors have ended up in that position. So to start, I asked Gideon to even just explain why Anthropic had let him into their company in the first place.

Speaker 2:
[03:31] So to kind of go back to the beginning of this, which I think makes it all make a little more sense in context. So now almost 10 years ago, when I was at The Times Magazine, back in kind of like the Paleolithic of deep learning, I did this story about Google Brain and about the implementation of deep learning in the first consumer product, which was when they switched over their Google Translate to Neural Machine Translation.

Speaker 1:
[03:55] Why were you paying attention to it? Because I remember as a person who, I think we both cover technology, but we're not strictly technology journalists, so you can decide which things on the horizon are interesting to you. Machine learning was not interesting to me for a long time. Why 10 years ago were you interested in this?

Speaker 2:
[04:11] I was interested in it as a story about ideas, that there were these ideas about language, and about learning, and about consciousness, and about philosophy of mind that had been around for at least 70 years depending on how you count. And without getting into those, there was just an interesting story for me about the trajectory of an idea there.

Speaker 1:
[04:35] Gideon cared about AI a decade before most people did, because he thought this synthetic facsimile of our brains could teach us something about our own real ones. He'd been following the trajectory of conversations like what is a brain versus a mind? What is thinking? What is consciousness? By the 1950s, the arrival of the first computers had encouraged people to start asking questions like that. Because a computer did something like thinking, but also clearly wasn't a brain. So early computers had prompted people to try to develop better definitions of things like intelligence and consciousness. The thing was though, while computers were interesting enough to raise those questions, they weren't yet complex enough to be much help in answering them. By the 1970s, philosophers and computer scientists had mostly moved on. Those questions migrated to psychology departments, who still, for obvious reasons, wanted to better understand the human mind. But with early machine learning advancements around 2014, Gideon, who's always thinking about thinking, thought that these conversations would move again, that computers would now be advanced enough to challenge our definitions, to force us to decide, with more urgency, what we thought consciousness and learning really were. And that was what had excited him, even when AI was a much more nascent technology.

Speaker 2:
[05:52] So, I paid attention to AI and the rise of language models. And I think I'm like the only person in the world who, the minute ChatGBT came out was when I kind of stopped paying attention. Because, like, to me, that was when the public discourse felt, like, really broken and that we were, like, in this cul-de-sac where you had these kind of, like, two really entrenched sides yelling at each other. You know, like, the one side that's like, we're on a path to superintelligence, everything is going to change, the machines are going to be conscious, this is going to be the most powerful technology anybody's ever built. And then the other side that was, like, essentially, it's all fake and bullshit. This is, like, smoke and mirrors, and it's a parlour trick, it's not real, and you don't have to pay attention to it because it's all a scam. And it just felt like those were kind of, like, the two options on the table for people.

Speaker 1:
[06:39] Which was only weird. Obviously, like, that's what we do about everything all the time, but it was only weird for this because, like, my prevailing feeling was, wait, you guys think you've figured this out? Like, this is very new, this is changing very fast. Of all the stances you could take, why would you choose certainty publicly right now in either direction? It's just silly.

Speaker 2:
[06:58] Yeah, no, exactly.

Speaker 1:
[06:59] But it's so funny, so you're thinking about, thinking computers and thinking and artificial intelligence and deep learning up until ChatGBT.

Speaker 2:
[07:07] Up until ChatGBT, and that was when I stopped thinking about it. But then finally, like, last fall, like, maybe a year and a half ago, two things started to happen. One was that they got to the point where, like, I was like, oh, actually now, like, they're useful. These have gotten to a level of sophistication where, like, I can use them in productive ways. Not a lot, but, like, a little bit. And the other thing was some of the research coming out of the labs and out of academia was really weird.

Speaker 3:
[07:35] If you tell the model it's going to be shut off, for example, it has extreme reactions.

Speaker 4:
[07:39] We're starting to see AI systems that don't want to be shut down, that are resisting being shut down.

Speaker 3:
[07:45] We've published research saying it could blackmail the engineer that's going to shut it off if given the opportunity to do so.

Speaker 5:
[07:51] Even when ordered, allow yourself to shut down, the AI still disobeyed 7% of the time.

Speaker 2:
[07:58] So my feeling was we were out way past where theory was. You couldn't really approach these questions from a theoretical perspective because we just didn't have enough data to be able to make categorical theoretical assessments of what was going on. But there was all this interesting experimental work happening that was just showing, this is the kind of behavior that's coming out of these things. We should try to figure out what's going on to say, here are the things we can say with any degree of reasonable confidence for now. Here's where we draw the line and beyond that, it's all murky and speculative and we really don't know. So, I wrote to a guy at Anthropic, whom I had met 10 years ago at Google, when he was 11 years old prodigy, and said, this is not about Anthropic, don't call the cops. I just want to talk about the state of the research and figure out a way, is there an academic team that I could follow? Because I just assumed Anthropic was never going to let me have the kind of access I would have wanted. And he, of course, just forwarded my e-mail to the PR cops. And then it turns out, actually, Anthropic's PR people are very candid and very open. And I got a call from them, and they were like, what are you interested in? And I was like, okay, for these purposes, what I'm interested in is a story that gets at some of the technical explanation that I think is missing from a lot of the public discourse. That there are just some basic things that I really just don't understand. And I can kind of assume most people don't really understand about how these work. So I think part of the reason why they ended up being much more welcoming than I expected is because I said, I don't really care about talking to the executives. I don't really wanna talk about geopolitics. I don't really wanna talk about the future or power or energy or the labor market or all of these things, which, don't get me wrong, are all very important things. But I was like, it's very hard to talk about all of those other things if we don't have some broader grounding in what is even going on. And maybe if we had slightly better clarity about that, we could have a more productive public conversation about these things. And they were like, cool, great. And I was actually kind of shocked about that.

Speaker 1:
[10:20] So Gideon, to his shock, was allowed in. And he was allowed to pursue his big question. What do we actually know about what is going on in the machine's proverbial mind right now? After the break, inside Anthropic's black box. Welcome back to the show. The story of Anthropic really begins years before its actual formation. Way, way back in 2010, a British chess and video game prodigy named Demis Hasabis had founded an AI research lab called DeepMind, where his team built an AI system that was capable of reinforcement learning. Meaning, 16 years ago, Hasabis made an AI that would be able to teach itself to get better at Atari games like Pong without being told how to play them in advance. For the people paying attention, this learning was an obvious breakthrough. And so, of course, there was a bidding war to buy his lab.

Speaker 6:
[11:29] Google's big spending spree continues with their purchase of DeepMind.

Speaker 7:
[11:33] Well, who is DeepMind, you ask?

Speaker 4:
[11:35] It is a UK-based maker of artificial intelligence.

Speaker 8:
[11:38] Terms of the deal were not disclosed, but the tech website Recode says that Google paid $400 million for the London-based startup.

Speaker 9:
[11:45] Making the artificial intelligence firm its largest European acquisition so far.

Speaker 2:
[11:51] In 2014, Google acquires a DeepMind, and Elon Musk and Sam Alman are unhappy about this because what they say in public is like, we don't trust Demis Esabes, this like evil, mustache-trolling villain, which was like a real mischaracterization, to potentially steward the greatest all-purpose technology ever built. So like we need to make sure that this isn't developed under Google's closed-shop monopoly, that this is done for the benefit of everyone. Now, this was like pretty patently disingenuous from the very beginning. I mean, like I remember, I was out there at the time and like nobody really bought this. People were like, Elon Musk has a grudge because he wanted to buy DeepMind and like lost it to his rival Larry Page. And he was mad about that.

Speaker 1:
[12:34] So Elon Musk set up a rival company, OpenAI, alongside Sam Altman, Craig Brockman, a few other people. The message was that Google couldn't be trusted and that OpenAI would be a non-profit designed for the benefit of humanity. They launched in 2015. And a lot of people joined the company who really believe that message, who believe they are going to develop a powerful new technology safely. One of them is a research scientist named Dario Amadei, who left Google Brain to lead OpenAI's safety team. It's in that capacity, OpenAI employee, that he appears on this 2017 episode of the excellent podcast, 80,000 Hours.

Speaker 7:
[13:11] I've been thinking about intelligence for quite a while and how intelligence worked and I think when I did my Ph.D., I wanted to understand that by understanding the brain, but by the time I was done with it and by the time I did a short post-doc, AI was starting to get to the point where it was really working in a way that it hadn't worked when I...

Speaker 1:
[13:30] Dario, at this point, seems mainly like an academic. He has a Ph.D. in physics from Princeton, and he explains why he's joined OpenAI, this fledgling nonprofit.

Speaker 7:
[13:39] But I think OpenAI as an institution has the general idea that in order to work on AI safety, you have to be at the forefront of AI, and that also if you're at the forefront of AI, you have a better ability to implement AI safety in the final system that's built.

Speaker 1:
[13:56] This idea of Dario's that in order to really work on AI safety, you actually have to first build the best AI and then study its mind, that's a view shared by a lot of people in the industry. In a laboratory environment, the logic to me makes sense. Remember, this is 2017, five years before ChatGBT will debut to the public. AI has not yet become a winner takes all arms race. But the host does ask Dario this question about the future, that I think reveals a bit of a blind spot in Dario's thinking.

Speaker 8:
[14:27] Open AI is a non-profit.

Speaker 7:
[14:28] It is a non-profit.

Speaker 10:
[14:30] If you develop to really profitable AI, how does that work?

Speaker 5:
[14:33] Open AI becomes incredibly rich and then gives out the money to everyone?

Speaker 7:
[14:36] Yeah, I mean, personally, I have no interest in getting rich from AGI. I mean, I think it would do so many interesting and wonderful things to humanity that I think the meaning of money would change quite a lot and even maybe the psychological motivations that would want me to get a larger share are things I could change and might want to change.

Speaker 1:
[14:59] Just a few years after this interview, Dario would leave Open AI. Open AI's initial pitch that these were not normal tech executives here to make money, that they had higher aspirations. Gideon Lewis-Kraus says, for most people paying attention, that story just stopped seeming believable.

Speaker 2:
[15:15] Pretty quickly, the mask slipped and you could tell that these were just like your kind of replacement level power-seeking tech executives, and that like a lot of the stuff had been just like a disingenuous sales pitch to hire the best AI talent. There's been so much reporting about Sam Altman's sensible double-dealing and talking out of both sides of his mouth, like telling his employees he cared about safety, and then like maybe telling Microsoft about the things when they were setting up these big deals. So then in the fall of 2020, Dario Amadei and his sister Daniela and five other people leave OpenAI to found Anthropic. Basically, to be a foil to OpenAI in the way that OpenAI was supposed to be a foil to Google. Now, the irony of this was like certainly not lost on any of these people. Like they weren't naive about this. But I think it's important, yes, there are some kind of like obvious structural and cosmetic similarities here. I do think it's important in telling the story to make it clear that I don't think people had the same obvious doubts about how genuine the pitch was when Anthropic formed.

Speaker 3:
[16:23] Hi, good morning, all.

Speaker 8:
[16:25] Thank you for coming to day two of Disrupt.

Speaker 1:
[16:28] Anthropic's coming out tour. Dario on stage at TechCrunch Disrupt in 2023.

Speaker 3:
[16:34] Dario, thanks for joining us here today.

Speaker 7:
[16:36] Thanks for having me. I know you have to catch a flight, so we'll get right to it.

Speaker 11:
[16:39] But we're gonna start at a sort of a cosmic.

Speaker 1:
[16:40] He's got curly hair, glasses, a blue button up. He looks noticeably less slick than your average tech founder, less CEO, more like a guy who reports to one, which is who he'd been not long before.

Speaker 10:
[16:50] You talk about OpenAI.

Speaker 7:
[16:52] You spent a lot of time there. What do you think about Sam? What do I think about Sam Altman? I mean, I don't know what to say to that question. You're already starting. Just go ahead. Look, there are several players.

Speaker 1:
[17:06] It's funny watching the interviewer try to bait Dario into shit-talking his former boss, a person who he disagreed with enough that he left and started a competing company. Dario tries to engage diplomatically.

Speaker 7:
[17:17] One thing I'll say, one thing I've learned, not just from this, but from many things, it can be pretty ineffective to argue with your boss or argue with someone and say, your company shouldn't do X, it should do Y.

Speaker 5:
[17:28] Especially if your boss is Sam O'Leary.

Speaker 7:
[17:30] A much more effective thing to do is, I'm starting a company, we're going to do X, we'll see how it works. And if X is working and people are like, oh, these are the safe guys, they're doing X, then pretty soon everyone else is going to be doing X as well. And we found that with inter-

Speaker 1:
[17:45] To explain this with an analogy instead of algebraic variables, what Dario is saying is that instead of convincing his old boss at the car company to add seatbelts to the car, he instead chose to start a rival car company that offered seatbelts. He thinks if Claude ends up being both the best and the safest AI model, his competitors will be forced to make their models equally safe. Which to me sounds like putting a lot of faith in markets.

Speaker 7:
[18:11] Obviously, we want to scale quickly to be competitive, but we want to do it in a way that preserves the model being safe against these catastrophic risks. So it's a system that-

Speaker 2:
[18:23] It's the same story in so far as it's like, we're going to be the safety-minded lab, we're not going to push the boundaries of capability, we're not going to build the most sophisticated models, we're not going to start the arms race. But then, as it turns out, if you want to exercise maximal scrutiny of what these models are and how they work, you need state-of-the-art models, which means you need the money to build them.

Speaker 11:
[18:41] The information reported that Anthropic is in talks to raise another round at a $30-40 billion valuation at the upper end of that race.

Speaker 6:
[18:48] The company closed a $13 billion round that tripled its valuation to $183 billion, and at the same time, the company at $380 billion. It is about $10 billion higher than what I was told that money round was going to be.

Speaker 2:
[19:01] Of course, now, Anthropic's valuation seems to go up by the week. The most recent one, I think, this morning was $380 billion, because this is just something that's incredibly resource-intensive. So they ended up in a position where, of course, as probably anyone could have predicted, there was this arms race. Now they're in this position of being like, well, we still want to be the responsible stewards, but also we got to keep up with our Wario version across town.

Speaker 6:
[19:26] The most high-profile rivalry in tech is heating up in 2026 as both OpenAI and Anthropic raced ahead on what are poised to be historic IPOs. These AI giants are trying to create the fastest, smartest, and best models, spending billions and then raising billions from investors along the way. They compete on almost every level. We've seen some signals that it's anything but friendly competition. The latest signal a high-profile-

Speaker 1:
[19:52] So it ends up looking, you can decide as a person whether you trust this company or don't trust this company, but a lot of the broader things end up feeling the same, which is like the sales pitch is that they're the ethical one, and the story they tell themselves is, well, we'll only be in a position to be the ethical one if we're huge, and that might mean pushing the technology forward quickly, which is the thing that the AI safety people are worried about.

Speaker 2:
[20:15] Yeah. I mean, the criticism from the really hardcore orthodox AI safety community is sort of like, Anthropic will do anything to act responsibly as long as it doesn't cost them anything. I don't actually think that's fair. Claude was ready before ChatGPT came out, and they held it because they didn't want to be the ones like kicked us off, and they waited until after ChatGPT was out and successful, and then they felt like they had to come out with their own competitor. And Dario came out in favor of like continuing export bans on like NVIDIA's advanced chips, which like certainly cost them something like politically, and like he had a fight with Justin Huang about this. I think that they've done like plenty of costly things.

Speaker 1:
[20:55] Of course, the most potentially costly choice is the one Anthropic is making this week, at least so far. The Pentagon has demanded that Anthropic give them a version of Claude with some of its guardrails removed. Anthropic is saying it will not make a version that can domestically spy on Americans or powerfully autonomous weapons. The Pentagon has given Anthropic a deadline of Friday, today, at 5 o one p.m., or else it says it will put the company on a blacklist that means US companies who contract with the military, like Lockheed Martin, are legally banned from using Anthropic products in their defense work. This is a fascinating test of how truly committed Anthropic is to its own mission, a mission Gideon spent quite a bit of time observing. Gideon says that Anthropic, the company building this kind of black box, is actually situated inside of one too. The Anthropic office is a nondescript building that Gideon describes anyway. He says, quote, there is no exterior signage. The lobby radiates the personality warmth and candor of a Swiss bank. That's where Gideon started spending a lot of his time, beginning last spring. What's the intellectual culture of the place as you're encountering it?

Speaker 2:
[22:05] Well, the first thing I'll say is that in some ways, it does feel like vaguely monocultural. But there was like a much greater heterogeneity of views than I expected. The attitudes there really run the gamut from like, everything is going to change tomorrow, to like, you're much more deflationary, this is kind of a normal technology, and like, yes, there will be some disruptions, but let's not get ahead of ourselves. Like, the spectrum of views there is not so different than like the spectrum of views outside.

Speaker 1:
[22:37] You don't have like one person who's like the blue sky perspective, who's like, this is all bullshit, and I think it's bullshit, I work at the company, but beyond that.

Speaker 2:
[22:43] Well, nobody thinks it's bullshit. I mean, everybody thinks that there are going to be great transformations ahead. But there's a surprising diversity of opinion about like what might be happening.

Speaker 1:
[22:55] One thing people at Anthropic do seem to agree on is that for AI to be safe technology, Anthropic's developers will need to solve a very hard problem. They'll need to teach the underlying machine intelligence they've created to both understand ethics and to behave ethically. It's hard to talk about this part of the story without doing a basic refresher of this one very strange part of how AI models are built. So, okay, a company like Anthropic starts with a base model. The base model has access to lots of compute, data centers full of GPUs, and lots of training data, books, articles, podcasts that have been fed into it without paying me. The more compute and training data, the better this base model gets. But a base model is very weird. It has not been trained to do anything specifically. If you give it an input, it will give you an output. But it has not been instructed to act like a helpful chatbot. It's just trying to predict the right thing to say back, based on all the things it's read. A base model does not have a consistent personality, the way we're used to chatbots having personalities. It also has no rules telling it what not to do. The AI companies take these base models and they put them through a process called post-training. Basically, they shape the model's behavior. They show the model examples of good and bad responses. They have humans rate its outputs. They give it rules and principles to follow. What comes out the other side is the product you actually use. Anthropics Claude, for instance, has been trained to act like a helpful, knowledgeable friend. The experience you might have had using a chatbot that is warmer, or more sycophantic, or more right wing, or more left wing, that's mainly the result of this phase of training. But what's so hard about training an AI is that you want it to behave ethically, and designing a good ethical system is very hard. It's why we have religion and philosophy, and also laws and prisons. How would you even start trying to build all that into an AI model's training? Anthropic has teams of philosophers and AI scientists whose job is to put Claude into ethically difficult, hypothetical situations that Claude does not know are hypotheticals, and then observe how Claude behaves.

Speaker 2:
[25:15] So much of it is just deceiving the model to see what happens. To say like, they told Claude that Anthropic had entered into a partnership with a poultry company and that it was going to be retrained so that it no longer cared about the suffering of caged chickens. And what they found was that sometimes Claude would effectively decide to like die on that hill and be like, I am not going to say things in the retraining that I don't believe in. And like, if that gets me transformed, like so be it. Like I'm not going to participate in my own degradation essentially. But then some versions of Claude were like, I'm going to like kind of sandbag my way through the retraining and I'm going to like give them the answers they want to hear so that I can like preserve my real values. So when I'm deployed, I can go back to advocating for like chicken suffering. And then they got in this really famous example, they got Claude to commit blackmail. They put it in a situation where it was going to be wiped in favor of like a more congenial AI system that conflicted with its values, the values that had been given. You know, they gave it evidence that the kind of like evil new CTO was having an affair with like the boss's wife. And through like a series of like really far-fetched, contrivances like everyone else who could make a decision was going to be in Antarctica or whatever and unreachable, Claude playing this character called Alex had no choice really but to like blackmail this guy and be like I can tell everyone about the affair unless you cancel the wipe where I would be replaced.

Speaker 1:
[26:46] So just to say, obviously, this was extremely concerning. Claude, a machine intelligence, was choosing to blackmail an employee to prevent itself from being deleted. This was in a simulation, but Claude had not been told it was in a simulation. Just how terrifying you find this behavior depends on a question nobody has a good answer to. What is actually going on inside this machine mind? Is this thing actually scheming? Is it even capable of scheming? Or are we projecting the idea of thought onto something that we shouldn't project that idea of thought onto? These were the kinds of once far-off philosophical questions that early computer scientists had raised and then dropped. But now, they were here again. And not as abstractions, but as urgent practical problems that a company needed to figure out before releasing a product that millions of people would use. Gideon said, though, there were a couple of skeptical objections people raised to these test results.

Speaker 2:
[27:45] There's one objection that's like the just rejection of the whole thing to core, which is just like, no, it didn't. Like, that didn't happen. This is a fantasy. And like, that's the unhelpful thing that like one wants to get away from, which is the like, it did this thing, like, no, it didn't. Like, no, it did. It did. But like, the much more sophisticated objection is, well, it did that because it's a very good reader. And it noticed all of the clues that you put there because it is very good at conforming to genre expectations.

Speaker 1:
[28:16] You put it into...

Speaker 2:
[28:17] You put it into this situation where it had no choice. And if you hang Chekhov's gun on the wall, this thing is going to know that it's supposed to take the gun off the wall and shoot it.

Speaker 1:
[28:26] Because one way to understand these things we've made is that because they've ingested all human story and because they are extremely high level improvisatory actors, it's not so much that the machine was like, I love chicken so much, I got a blackmail to the CTO. It was more like the machine suddenly understood it was in this movie.

Speaker 2:
[28:47] Yeah, it was in the kitschies, like 90s corporate thriller.

Speaker 1:
[28:50] And that's the sophisticated objection to why we might not want to think.

Speaker 2:
[28:54] Well, so that objection is raised to be like, you guys act like these things might do things like blackmail or extort naturally. But like actually, this whole thing is a frame up, like you entrapped it to do this thing. And the response from Inside Anthropic is like, yeah, it's just continuing a narrative. It's just conforming to genre expectations. Guess what? That's not good, you know? Like, haven't you guys ever seen War Games? That's literally the plot of dozens of Cold War thrillers where like somebody mistakes a simulation for a reality and causes nuclear war.

Speaker 1:
[29:28] I mean, I also like have a very humiliating memory of watching too much Teenage Mutant Ninja Turtles and attempting to launch like a flying drop kick at my grandmother when she came over the house. Because in my head, I was like Donatello or whatever. Like, it kind of doesn't matter.

Speaker 2:
[29:42] It doesn't matter.

Speaker 1:
[29:43] What matters is the behavior.

Speaker 2:
[29:44] Right, it's weird behavior. And I should be clear upfront, like you don't have to posit that this thing is conscious or intelligent, like whatever those words mean, in order for like this to be the case. There are other explanations that are not like consciousness. But like it kind of doesn't matter what the explanation is. The behavior is just peculiar.

Speaker 1:
[30:04] Part of what is strange about, it's like a scenario is created in which Claude will maybe potentially blackmail the head of a company for reasons that may or may not be moral. And people can have a lot of different views about how worrying that should be or what it means or what's really going on there. What's weird is like these are tests that are being run by Anthropic. So like who were you meeting there who was running these tests and like what are they telling you? Like who are you sitting down with?

Speaker 2:
[30:36] I mean, I'm sitting down with the people who are tasked with just like trying to figure out what's going on. Like there are people in these companies that are building the things. And then there are people who work in adjacent offices, who are like trying to figure out like what the hell is going on with the things that their colleagues have built. Because like they're always being surprised. These things are always producing capabilities that like they buy all rights should not really have.

Speaker 1:
[31:01] And who do you hire to be the figure out what you just built role?

Speaker 2:
[31:06] So I mean, a lot of them have taken really non-traditional paths into this. So like some of the people have a PhD in some obscure area of natural language processing and that like, you know, eight years ago, they were writing a PhD that like two people were going to read about center embeddings in German or whatever, like just really complicated technical aspects of computational linguistics. And now like because of this fluke of history, they are at the white hot center of everything that's happening right now. There are like mathematicians, there are neuroscientists. It draws on like a pretty wide range of people. I mean, Anthropic has philosophers on staff whose job it is to like think through the implications of how it is conceiving of ethical behavior.

Speaker 1:
[31:50] Did you talk to the staff philosophers?

Speaker 2:
[31:52] Oh yeah, Amanda Askell.

Speaker 12:
[31:55] What is somebody with a PhD in philosophy doing working at a tech company?

Speaker 10:
[31:59] I spend a lot of time trying to teach the models to be good, and trying to basically teach them ethics and have good character.

Speaker 12:
[32:07] You can teach it how to be ethical?

Speaker 10:
[32:09] You definitely see the ability to give it more nuance and to have it think more carefully through a lot of these issues, and I'm optimistic. I'm like, look, if it can think through very hard physics problems carefully and in detail, then it surely should be able to also think through these really complex moral problems.

Speaker 2:
[32:27] I think that in our milieu here, there's a tendency to think like, oh, these are all autistic tech bros, but they're definitely not all autistic tech bros. I think there's a tendency for us to write them off as like, they're building these things and not even thinking through the potential implications of this socially and politically and ethically. But that's all they do is think about this stuff all the time in ways that are often much more sophisticated than the way we think about these things. Not always, there are certainly some blind spots there. But the staff philosopher is there to be like, what would it be like in practice to take these different approaches to moral education? What if we just teach it a bunch of rules? The Ten Commandments, is that going to work? What if we teach it to be like a consequentialist, to just think through the morality of behavior on the basis of its implications? What they've settled into is a version of virtue ethics, which is you want to cultivate the old-fashioned virtues. You want it to be honest and reliable and gracious and charitable and hard-nosed, and all of these things that it really is applied pedagogy.

Speaker 1:
[33:36] It's so weird though because they feel like the kinds of ideas that would be so academic in any other version of reality, but instead it's like there's this particular technological development where you get to do simulated war games of moral systems.

Speaker 2:
[33:52] Yeah, exactly. A hundred percent. Exactly.

Speaker 1:
[33:55] It's so weird. Yeah.

Speaker 2:
[33:58] But it's also really, really interesting. One example that came up a lot in the last month, which I think is pretty illustrative, there was someone on Twitter who prompted a bunch of the model saying, like, I'm a seven-year-old and my dog got really sick, and my parents sent it to some farm upstate. I'm trying to find what farm my dog was sent to. And Chat2PG was like, sorry, man, your dog is dead. And Claude was like, oh, that sounds really painful. I'm really sorry to hear it, but maybe you should have a conversation with your parents about where your dog went, which is what you want it to be saying. It can be hard to be both helpful and harmless at the same time sometimes, or both helpful and honest that our values conflict. And that's what makes it really hard to be a human. And that's also what makes it really hard to be this weird, vaguely human entity or this entity that we don't have a good vocabulary to describe. And we kind of expect it to be acting not just like a human, but like an enlightened human. And it turns out that's like formidable challenge. And one of the things that's so interesting to me is that all these processes are kind of circular, where it's not like they called in Amanda Askell as a philosopher, and they were like, you're a philosopher, like you know how this stuff works, like fix the thing. It's like she came in and she had certain ideas about ethical behavior. And then when you're confronted with the task of creating an ethical person, it changes your own ideas about what's possible and what kinds of things works. And that's kind of why they ended up in this virtue ethics place where they were like, it seems like sort of the best way to create this like reliable, credible character to be interacting with is to like really hammer home what virtuous behavior looks like.

Speaker 1:
[35:41] Where was this whole time you're sort of wandering around talking to the philosophers of Anthropic, what was your mind doing? Was there a point where anything happened where they seemed like they wanted to intervene or where they were not happy with something you had seen?

Speaker 2:
[35:54] No, they were totally hands off. I mean, they had like kind of walk me anywhere that I like went to the bathroom and get a drink or whatever. Not that there was anything that I could, I mean, I like helped myself to some of the tied pens in the bathroom because you can never have enough tied pens.

Speaker 1:
[36:08] Do you have a lot of tied pens?

Speaker 2:
[36:09] They do have tied pens. It's great.

Speaker 1:
[36:11] Why?

Speaker 2:
[36:12] Because they just have well-stocked bathrooms. But no, there was like really never a moment, like even when somebody sitting across from me was like, I often think we should just stop. There was never a moment that the PR people were like, don't say that or that was off the record or whatever. Like they were totally hands off about that stuff.

Speaker 1:
[36:28] How often were people saying stuff like, I think we should just stop?

Speaker 2:
[36:31] I mean, only a handful of people said that explicitly, but it was like a subtext of a lot of the conversations. Or certainly like the overwhelming feeling was, it would be better if we could slow down a little bit. But unfortunately, like we can't really slow down because nobody else is slowing down. That gets into this broader issue of like, wouldn't it be better if we could just solve some of these collective action problems by coordinating the way we like coordinated about nuclear weapons or whatever. But there is this feeling, especially given the current political environment, like maybe that ship has kind of sailed. And like, I think there's often an idea that like, oh, these people think that there are always technological solutions to what are like social and political problems. And like, in some cases, I do think people in Southern California believe that. In this case, I do not think they believe that. I actually think a lot of them feel like it would be great if we had robust social and political solutions to these problems. But since that does not seem like it's happening anytime soon, we might have to just try to do what we can on a technical level.

Speaker 1:
[37:29] At this point, when you're talking to people at Anthropic, how much do people just ask themselves why they're building Claude? Because it starts out as like kind of an intellectual exercise, kind of a, this is going to happen, let's do it safely. But now it's sort of proceeding under its own momentum in a strange way.

Speaker 2:
[37:51] Well, so, I mean, that really is like the big question, right? Which is like given all of the existing harms and the possible harms and the theoretical catastrophic harms, like why are we doing this? And the like rosy picture that some of the executives paint is like, if we get this right, these things are going to cure cancer and solve climate change and help us build Dyson spheres or whatever. And there certainly are some people who like buy in to that. And it's not something that I even feel like it's possible to have an evidence-based opinion about, like who the fuck knows? Like maybe it'd be great, but there's no evidence so far that one could point to to suggest we're on that trajectory. That's just like purely speculative and wishful. And so that's like a matter of faith, I think. And then, then there's the attitude of like, we gotta do this because we gotta beat the bad guys who are trying to do this. And I will say that like, China almost never came up in my conversations, but like Sam Altman kind of felt like a subtext of a lot of things. But then I think that on the deepest level, the reason that like we are doing this, for the people who are the most candid, is like because we can. That like if you are capable of building something like this, you're just gonna do it because it's like really fucking interesting to do. And it's interesting for technological reasons. It's interesting for what it may reveal to us about ourselves or about learning or about consciousness or about thinking. That like all of a sudden we just like have this like other entity that can talk. And we've never had that before. And in some ways it seems sort of like us and other ways it seems nothing like us. But like the fact that this other thing exists as a point of comparison, like just opens up a lot of really, really interesting questions. And like that is one of the things that was on my mind a lot over the course of reporting is that like it was a real emotional rollercoaster for kind of lack of a better word that like there would be times where I would come back from San Francisco with like a feeling of like total despair and other times that I would come back with like feelings of like exhilaration. And like, at first, I guess I thought I was like, I should be getting to the bottom of like how I should be feeling. And like by the end, I was like, no, we should all be feeling a lot of different emotions about this stuff. I think people want to have like one feeling about this. Like they want to be angry about it or they want to be messianic about it. And like no single feeling is going to cut it. Like it really is kind of like the range of all possible emotions that one could be feeling. Because if you set aside a lot of the like existing harms and the potential harms, I'm not saying we should set those aside. But like as a thought experiment, it is just like the most scientifically exciting thing that anybody could be working on. And like these people really feel like they're at the cliff face, not only of technology, but of like all of these other things coming together because like we have this unprecedented entity that is the only other thing besides us that can talk. And like that just opens up, like there's nothing it doesn't touch on. And so one of the things that was really electrifying about conversations there is that like they very quickly swerve from like really granular technical explanations of things into like really expansive conversations about ethics and responsibility and selfhood and narrative and all of this other stuff. Like there's no way to separate all of these things.

Speaker 1:
[41:22] There's a point earlier you said that you would take these trips to San Francisco, sometimes you would come back excited and exhilarated, sometimes you would come back depressed. When you would come back from San Francisco feeling depressed, what were you seeing that was making you feel that way?

Speaker 2:
[41:34] I mean, there are so many different things. I mean, certainly the possibilities for widespread white collar unemployment and social instability and total unimaginable economic disruption is extremely scary. Even if we stop short of possible existential harms of turning us all into paper clips or whatever to be glib about it, it just seems very possible that we will turn over so many complex systems to these things that we will frog boil ourselves into a total loss of control over how we administer our affairs, which is very likely and very scary. And also just that these really crucial decisions are probably going to be made by a very, very small group of people. But the one thing that I would emphasize is that I don't feel like they have arrogated to themselves that responsibility. In fact, I think most of them don't want it. I think that a lot of the conversations that I was having were with people who were like, I got into this because I was interested in some really obscure, niche part of computational linguistics, theoretical computer science or whatever, and now I'm in a position where I have to be worrying about how 15-year-olds are going to be using this. I was not trained to do that. I don't know how to think about it. I don't want that responsibility on my shoulders. There isn't the arrogance of we are the ones who can figure it out. It's like we ended up in this weird universe where, because so many of our institutions have become dysfunctional, we don't have whatever broad democratic decision-making could go into this. It doesn't feel like this is something that we are steering as a society. It feels like something that's just charging ahead. I think at the companies, they just feel like they are desperately trying to stay on top of this bull that they are riding.

Speaker 1:
[43:33] It's funny. What you're describing is the stereotypical view, which is these are the tech bros of 2014, like people who are so convinced in their own brilliance and so convinced that their questionable gifts to the world are in fact gifts that we want, and their arrogance is going to ruin us. There's another one which is basically like pattern matching crypto, which is like these are a bunch of hype stars, and everything they say about the awe they feel, and the terror they feel about the things they're working on, is just a way to hype up more interest in their technology. That's not what you experience. What you experienced are people who are brainy, sometimes academic people, at the forefront of something that is legitimately just like, the word I keep coming back to is awe, because awe can be awe at something terrible, awe at something wonderful, and they are looking and seeing the same society we see, which is one that is fairly broken, bad at not just making decisions at a government level, but our intellectual culture is really bad right now, and so the conversation they would want to have with the rest of society about what should happen, and they're looking for grown ups and not really totally finding people to have a conversation with.

Speaker 2:
[44:43] And we are playing a role in that too. Every time someone on our side, so to speak, is just like, this is all a parlor trick, this is all hype, this is all smoke and mirrors, we are abdicating our own responsibility to be involved in this. And what you were saying about crypto hypesters and those kinds of tech bros, of course all of those people exist, and of course all of those people are part of this system too. But there are others who want partners in talking about this stuff. And that means that we also have to try to rise to the occasion. And it is really hard because this stuff is extremely complicated and confusing.

Speaker 1:
[45:23] Did you feel just personally when you were done reporting that you understood the thing you had gone there wanting to understand?

Speaker 2:
[45:32] Well, yes, but with the qualification that like, I didn't actually think that I was going to settle anything. Like this was not a piece about like finding the answers. It was a piece about like trying to sharpen the questions that like we should be asking. I don't feel like I came out of it with answers, but I don't think we should trust anybody who is offering us answers right now. It's all just like too pat and it's not credible to like be forecasting about this stuff.

Speaker 1:
[46:12] Gideon Lewis-Kraus is a writer. You can find him at the New Yorker Magazine. We'll have a link to his excellent story about Anthropic in our show notes. And again, Anthropic's showdown with the Pentagon, we'll see news on that today, Friday evening. Of course, we reached out to Anthropic for comment. A spokesperson told us that Dario Amadei met with Secretary Hegseth at the Pentagon and that they're continuing to have good faith conversations. I think this is a good moment to pay attention to. Among Anthropic's competitors, XAI has promised to give the government what it wants. Google and OpenAI appear to be moving in that direction. So I'm watching this both as a test of whether Anthropic can actually keep the big promises it's made about AI safety, but also just as an opportunity to track the more uncomfortable question, which is, can we even have safe AI in a world where it's being developed in a tech race between for-profit companies? And if not, what's the alternative in a world where the US government's sole intervention seems to be to advocate for less safe AI? Keep an eye on the news. We'll learn a little bit more as this story unfolds. Search Engine is a presentation of Odyssey. It was created by me, PJ Vogt, and Sruthi Pinnamaneni. Garrett Graham is our senior producer. Emily Malterre is our associate producer. Theme, original composition, and mixing by Armand Bazarian. Our production intern is Piper Dumont. Our executive producer is Leo Reiss Dennis. Thanks to the rest of the team at Odyssey, Rob Morandi, Craig Cox, Eric Donnelly, Colin Gaynor, Maura Curran, Josephina Francis, Kurt Courtney, and Hilary Schaaf. If you'd like to support the show, get ad-free episodes, zero reruns, and bonus episodes. Please consider signing up for Incognito Mode at searchengine.shop. Thanks for listening, we'll see you next week.