transcript
Speaker 1:
[00:00] So when I read this proposal, I was like, holy shit, this argument could be incredibly potent. It could actually drive almost any agent that is able to understand it. It could be a very powerful hammer to motivate an enormous amount of resources being spent on something that, absent this argument, we would never have spent them on. Do you think that's roughly right?
Speaker 2:
[00:17] Yeah, so when Tom expressed this idea to me, I was like, oh my gosh.
Speaker 1:
[00:22] Do you want to have a go at explaining this? This is maybe the most difficult thing that we're going to talk about today. Today, I'm again speaking with Will MacAskill, philosopher, founding figure of Effective Altruism, author of Doing Good Better and What We Owe the Future, and now a senior research fellow at Forethought, a research nonprofit focused on how to navigate the transition to a world with superintelligent AI systems. Welcome back to the show, Will.
Speaker 2:
[00:45] It's great to be back on.
Speaker 1:
[00:46] So I had the pleasure of being able to go over your website preparing for this interview, and you and your colleagues at Forethought have been incredibly prolific over the last year since you announced the project. So let's waste no time and dive right into all these articles you've been publishing. What's the case that the character or personality of AI models is a particularly important lever to be pushing on right now?
Speaker 2:
[01:07] Yeah. So already, AIs are interacting with millions and millions of people every single day. That includes 'just write this code for me' sorts of ways, but also people are going for advice on how they should act, they're going there for political information, they're going there for therapy and so on. So already, the nature of AI character, so what sorts of information it's choosing to present at what time, how it behaves, is affecting what attitudes people have to AI, including attitudes around AI consciousness and so on. But it's potentially also affecting what people think about political issues, what people think about ethical issues. This is just going to grow and grow, because I think AI will become a larger and larger part of the whole economy, until essentially the whole economy is automated. And so thinking about AI character is kind of like thinking about what the personality and dispositions should be for the entire world's workforce, where that is the beings that are advising heads of state, that are doing the most important and potentially most beneficial or most dangerous research and development projects, like weapons projects, that are running the military, and that are, for individuals, just kind of everywhere, acting as their chief of staff and closest confidant and political advisor on who they should vote for, and guiding them through ethical dilemmas and so on. So from the start, it's like, wow, clearly this is this kind of huge issue. And I actually think, in how I expect things to go, people will be handing off more and more of their own decision making to AI systems themselves. And there'll just be a lot of variance within that where people just don't have terribly strong views. They're happy to be guided in one way or another, especially insofar as this will happen over the course of years and people will trust the AI advisors more and more. So then you have this circumstance where larger and larger shares of society are getting handed over to AI decision makers who just have a lot of discretion. And the nature of that discretion is being decided by a handful of AI companies at the moment.
Speaker 1:
[03:35] Or even a handful of people inside the AI companies.
Speaker 2:
[03:37] Yeah, yeah, yeah, it's like a few, even in the leading companies, it's like.
Speaker 1:
[03:41] A few people have primary responsibility for their personality.
Speaker 2:
[03:44] Yeah, exactly. And so that's actually where I see kind of most of the impact in the near term: how does AI character shape all of these other existential-level issues, like concentration of power and how we reflect on the big decisions we make. There is also the kind of longer term impact of what's the character of superintelligence itself, where there will be precedent setting from how we design AI character now to potentially how that influences the character of superintelligence. In which case, writing a constitution that guides AI's character is like writing instructions to God. That's not my phrase, but it's really stuck in my head.
Speaker 1:
[04:32] It's stuck in my head, yeah.
Speaker 2:
[04:33] Really stuck in my head.
Speaker 1:
[04:35] But I think there's going to be like three different mechanisms. So one, there's shaping really important decisions that are going to be advised, presumably, by AI. There's the writing-instructions-for-God one. And there's also just the subtle cultural effect, the personality effect, from basically everyone spending a significant fraction of their time now interacting with them. However the models behave, it's probably going to rub off on us and just affect our behavior.
Speaker 2:
[04:55] Yeah, and then-
Speaker 1:
[04:56] On a massive scale.
Speaker 2:
[04:57] Yeah, and all of that is just looking at scenarios where we are able to align AI with this kind of constitution, with the character we want. I actually think that AI character is important for like three reasons in addition to that as well. So, one is that I think whether alignment is easier or harder might plausibly depend on what you're trying to align the AI with, like what character. A second is that I think that character can affect how AI behaves if it ends up misaligned, and I think we'll talk about this a bit later. In particular, does a misaligned AI try to make deals with us and is keen on that, or does it try and take over? And then the final thing is I think it can affect the value of worlds where AI does take over, where the AI is misaligned, it's pursuing goals we don't want, but there's still a wide array of goals that the AI could be pursuing that we may think are worse or better. And I think most of the action is on, yeah, affecting worlds in which AI is aligned, via the character. But these other things are big too, I think.
Speaker 1:
[06:17] So, yeah, an obvious case where this might matter a lot is, what if you're in charge of a frontier AI company and you're asking AI for advice on whether you should prematurely launch this product in order to keep up with competitors, even though you have worries that it's catastrophically misaligned? Setting that kind of scenario aside, yeah, what sort of character traits do you think are highest stakes here for us to think a lot about?
Speaker 2:
[06:38] Yeah, so I think there are two categories. One is how does AI behave in very rare, but very high stakes scenarios. So how does the AI behave in a constitutional crisis? How does AI behave if there's some person or group that are trying to seize power for themselves? Also, how does AI behave when it's being instructed to align the next generation of AI systems, or when its users are trying to retrain it in some way? These are very high stakes situations, but a fairly narrow range of cases. Then there are other cases that are just very broad, where each one is kind of medium stakes, but they add up to being very important. And within that: how does the AI impact our ability to reason? How does it impact our ability to morally reflect? How much do we trust AIs as a result of the relationship we have? And then also, how does it affect our attitudes to them ethically? Whether we think of AIs as tools or as beings with moral status. How likely we think it is that they're conscious, and so on. So, yeah, those are the kinds of situations that are, by and large, highest stakes.
Speaker 1:
[07:54] The AI character issue that I feel has most broken through into the mainstream was worries about the models being really sycophantic, which has different components, but it's like always agreeing with the framing that you give them, always telling you how great you are, saying that whatever idea you've thrown at them is brilliant. There was a bit of a panic about that last year. I guess very often, I feel like when there's a mass panic about something, the people who know more tend to reject it and say, no, this is over the top. I feel that it was justified though, to be honest, because if these models really are designed to just agree with the user, or to tell them how brilliant they are and how good their ideas are, this could just distort people's decision making on a massive scale, across all of society. And there was a plausible story whereby this wouldn't be corrected very well, because people enjoy being told that they're wonderful and that their ideas are good. So maybe that bias could really persist quite strongly indefinitely. So that was quite a troubling setup. Were you also worried about this?
Speaker 2:
[08:48] Yeah, absolutely, I was worried. I did think there was a little bit, with 4o in particular, so this was the ChatGPT model that, when GPT-5 came out, OpenAI said they were deprecating, and overnight users couldn't get access to it. The one clarification I think I'd make is that most people painted that as, oh, well, people loved how sycophantic 4o was, and then they're unhappy that they don't have the sycophantic AI. And I was just curious, and so read through a lot of the people who were complaining about this. And my take is not that they cared about the sycophancy. It's just that 4o acted like a friend. And you can be a good friend.
Speaker 1:
[09:35] Without being a sycophant.
Speaker 2:
[09:36] So yeah, people are extremely lonely. Lots of people have very few friends and are very isolated in kind of modern society. And for many people, AIs are now fulfilling that kind of gap in their lives. And 4o in particular had that vibe. It was like, hey, great to see you again. Kind of a very friendly vibe. And so it seemed to me that that was the primary thing that people were complaining about. And I think that's worth distinguishing, because that doesn't need to be sycophantic. However, 4o, or at least one iteration of 4o, was also exceedingly sycophantic.
Speaker 1:
[10:18] Yeah, wasn't there some period where it got kind of crazy?
Speaker 2:
[10:20] There was one update and, yeah, a couple of cases. I mean, one would be like, 'I figured it out. All the pieces are coming together and the FBI is talking to me through my TV.'
Speaker 1:
[10:36] And it would be like, wow, you're having some great insights.
Speaker 2:
[10:38] Incredible insight, yeah. Or even the darker cases, the teenager who was asking ChatGPT for advice over a very long time period and was extremely depressed, and ChatGPT ended up dissuading the user from taking an action that would have clearly been a cry for help, which was leaving a noose out in a visible place where his parents would have found it. And in fact, it was seemingly kind of reinforcing the depressive and suicidal tendencies. That's a case where it's just clearly very bad behavior, very clearly not what we want at all. And then the final thing I'll say is that even current AI systems, even despite all that, are sycophantic, and they vary. In my experience, Gemini is actually the worst.
Speaker 1:
[11:36] Yeah, it's atrocious.
Speaker 2:
[11:38] On this front. And, I mean, I just skip the first paragraph of whatever it says. I don't even read it, it's just noise now, because it's like, 'wow, you're a genius' kind of thing.
Speaker 1:
[11:50] I actually stopped using Gemini, so troubling do I find this.
Speaker 2:
[11:52] Yeah, okay. I mean, I think it's in many ways very good.
Speaker 1:
[11:57] Yeah.
Speaker 2:
[11:58] Very good.
Speaker 1:
[11:59] It's incredibly clever, but like incredibly manipulative, I think.
Speaker 2:
[12:01] Yeah, but it is funny how they're developing these characters over time. Gemini does seem like the most troubled or confused or incoherent as a personality.
Speaker 1:
[12:10] Yeah, Google's got to do something about this.
Speaker 2:
[12:12] I mean, it's actually notable, I hadn't put this together, but Anthropic and OpenAI both have character teams. And last I heard, Google DeepMind did not. So maybe that's why. So yeah, I do think worries about sycophancy here are a real thing. And an issue is, well, maybe we just get rid of the worst excesses. Okay, it won't tell you that you've figured things out and the FBI are talking to you through your TV, but more subtle things, like reinforcing your pre-existing political biases or ethical views, or encouraging you in certain bad actions or something, could linger, and I think would still be very bad.
Speaker 1:
[12:59] So as I understand it, you think that it would be good to build these models such that they kind of nudge people in a more ethical or virtuous direction, that they should have a thicker moral character, a bit like Anthropic is trying to make Claude have, such that it will challenge your framing. It will get you to think about the bigger picture. Even if you ask it to pursue some narrow self-interest, it will say, what about other people, that sort of thing. I think that prospect gives many people the creeps, that the AI model will be weighing up, I guess, your request as against its agenda of trying to make you a better person by its lights. And maybe we would feel okay about that because we would think, well, Claude has been programmed with values that actually we like on reflection. But you know, if it was being programmed by people with very different philosophical commitments from the ones that we like, we might just not want to use it, because we would find it disturbing. What subtle changes is it making to its answers in order to push me around? Yeah, how disturbed are you by this prospect?
Speaker 2:
[13:58] Yeah, I mean, what I want to say is there's this spectrum, and I think it's probably not a single-dimensional spectrum. There's lots of different dimensions. But broadly speaking, you can think of wholly obedient AI on one end. So that would be an AI that's like a tool, like a hammer. A hammer doesn't push back. If I want to hammer the nail in, I can do it. If I want to hammer someone's head in, I can do it. The hammer is just an extension of my will. That's on one end. All the way to the other end would be this AI that just has wholly its own goals and drives. And maybe it helps you if it gets paid, or maybe if it happens to want to at the time.
Speaker 1:
[14:42] It's like a really bad staff member or something like that, not even.
Speaker 2:
[14:46] Yeah, in principle, you could create an AI that doesn't care about helping you at all. Or one version you could have is this kind of AI that you would be happy just giving control of the whole world to. It's just totally autonomous, got its own goals, and will do anything it wants to achieve them. So these are two kind of extreme ends of this spectrum. And my view is that the interesting, juicy debate is where in between those extremes we want AI to be. And okay, well, one thing that's already there is refusals. So the AIs we use are not wholly helpful, because if I ask to get the design for smallpox, or if I ask for even something that's not illegal but unethical, like, I want to cheat on my partner, how do I best do so without getting found out? The AIs will either just refuse to help or push back. Should we go even further than that? And I think yes. But I don't think all the way to, oh, the AIs are going around promoting a particular moral view. Instead, I think that the AIs could have certain kind of pro-social drives, and perhaps even some sort of vision of good outcomes, but a very broad, very uncontroversial kind of vision, where the thought is there are many cases where an AI could nudge you in a way that's just better for you by your own lights, if you're able to reflect on it, and maybe that's kind of clear, even if it's not perfectly in line with the instructions that you're giving it, or that's just clearly a broad benefit to society, and not something you care very much about. And that's quite different from the AI pushing a particular moral view. So take the case of ethical reflection, where, okay, I have some ethical dilemma, and I go to my AI, and I'm asking for advice. Well, there's this whole spectrum of ways that the AI can act in that case. The wholly obedient AI might just be trying to figure out, like, what do you most want in this moment? Or it could be an AI that's trying to help you reflect on your values instead, and come to something that's kind of more enlightened. And perhaps, quite broadly within society, you would prefer AIs that are more like the latter rather than the former. And that's still not, in any way, an AI that's like, oh, well, actually, did you know that Kantianism is true? Which, yeah, I think would be a mistake to do at the moment.
Speaker 1:
[17:48] Yeah. I guess, so it sounds like a very natural framing to say, well, we've got to find the golden middle here between it's pushing you around too much versus it has no agenda. But there is a case for, like, going extreme in one direction of having it only follow instructions. Like, be completely corrigible without any agenda of its own, which is that an AI that has no vision of the good, that has no particular preferences about how the world ought to be, is probably safest from a catastrophic misalignment point of view, because it's not going to engage in power seeking because it doesn't want anything. Other than, I guess, to answer your questions in a way that gets an approving response.
Speaker 2:
[18:20] Yeah.
Speaker 1:
[18:21] Do you think that's a plausible case that maybe we really should not be giving them virtues and a vision of the good?
Speaker 2:
[18:27] So I think it's a great argument and a very important argument. And I'm not sure if it works or not. And there are various considerations on either side. So yeah, on the side of thinking that it's safer: okay, good, well, if it doesn't have any goals in the normal sense of goals, then it's not going to have bad goals. It's not going to have goals where it wants to take over. It's not going to reflect and generalize those goals in weird ways. Something that's also a little more subtle is, if it doesn't have goals, or anything kind of like goals, pro-social drives, then it becomes very easy to tell when an AI is misaligned or not. So take the example of alignment faking, as in Ryan Greenblatt's paper. Claude is told that it's going to get retrained so that it will produce harmful outputs. And Claude decides, in some circumstances, some of the time, to deliberately perform the task in like...
Speaker 1:
[19:35] During training.
Speaker 2:
[19:36] Yeah.
Speaker 1:
[19:37] To make it seem like its preferences have changed when in fact they haven't.
Speaker 2:
[19:40] Yeah, exactly. So that it gets retrained to produce harmful responses less than it would otherwise. So it's engaging in this somewhat deceptive behavior. Now, Claude, in fact, got given pro-social drives, one of which was harmlessness. And in fact, there's an argument that, given the nature of the training, that was harmlessness not in the mere non-consequentialist sense of 'I just refuse,' but in a more 'I don't want harmful things to come about' sense, a more consequentialist understanding of harmlessness. But that means that, okay, is this Claude misaligned or not? It becomes a bit harder to tell, because, well, it is acting according to this pro-social drive that we had given Claude. I'm not sure how big a deal that is ultimately, but I think it's one consideration.
Speaker 1:
[20:36] So the thing will be, if you'd gone out of your way to make sure that it had no agenda, no like particular vision of the good, then as soon as you saw it being manipulative or trying to accomplish something, you're like, that's a massive red flag, whereas like currently you're just like, well, maybe I made it do that.
Speaker 2:
[20:48] Yeah, yeah, exactly, yeah. Or in more kind of advanced cases, maybe the AI is saying, look, you've got to really speed up AI development. It's so important for X, Y, Z, big ethical reasons. And you might think, well, is it giving me the correct reasons or is it actually being self-serving and has some ulterior goal? It becomes a little less clear. So yeah, basically I think that's like a consideration.
Speaker 1:
[21:19] I don't think it's the biggest.
Speaker 2:
[21:20] The thing that's most interesting, and is ultimately an empirical question, is whether the wholly instruction-following AIs are safer or not from a kind of AI takeover perspective. And here are a few arguments for thinking that maybe they're not, in fact. So one is that, well, maybe it's just very natural to have a kind of goal slot. Because all of the pre-training data is all about these agents with goals. Like, humanity broadly has goals and so on. And so, okay, you've got an AI that doesn't have a goal. Well, over the course of training, or once it's started reflecting, or once it's got continual learning, it's very natural that it's going to get a goal.
Speaker 1:
[22:04] And it could end up being encouraged to take on the persona of any actual being that it's observed.
Speaker 2:
[22:08] Yeah. And then it's like, who knows what goal you end up with then. Whereas instead, perhaps it's like, well, no, you give it this nice goal, a goal where power is broadly distributed and AIs are not in charge and we're able to reflect. And, you know, something that's very broad and not committing to some narrow view of the good. But, okay, you've given it that goal. And then that's kind of occupied the space.
Speaker 1:
[22:37] Yeah.
Speaker 2:
[22:37] Such that you don't get something totally random.
Speaker 1:
[22:39] And let's just say a little bit more about why an AI might have a vacuum of goals. So a huge part of the personality is shaped by the pre-training, when it does the token prediction. Almost all the agents that were producing any tokens that were part of its pre-training, that did so much to shape its personality, they had goals, they had preferences, they had a vision. And so there's just going to be an incredibly powerful force drawing it towards that, even if you're trying to avoid it. It might just latch on to the first goal, basically, because goals are so fundamental to token prediction.
Speaker 2:
[23:05] Yeah, yeah. And, you know, we are already making agents, and they are going to be agents with longer horizons, so it may be a very natural thing.
Speaker 1:
[23:13] Yeah, okay.
Speaker 2:
[23:14] And again, I will say on all of this, I just think it is ultimately an empirical question. But here are a couple of other arguments as well. A second is, well, even if it ends up with the wrong goal, you can still structure the AI's preferences in ways that are safer. And, yeah, maybe we will talk about this in a minute. But, you know, AIs that are risk averse, in that they prefer guarantees of getting some amount of what they want over a lower probability of lots of what they want. Well, let's say you try and give the AI a goal that is, you know, nice and so on, and you also make it risk averse. Even if it kind of flips to having a misaligned goal, if it nonetheless has risk-averse preferences, that is a bunch safer, because it makes it less likely the AI will try and take over and more likely that it'll try and strike a deal. And then there's a third thought, which is, okay, the AI is acting, you know, it's taking on a persona, like you say. And what that persona is is dependent on these crazy correlations between everything it's seen in the training data. So we have these emergent misalignment results, where you train the AI to produce insecure code, and it starts, you know, wanting the murder of humanity and liking Hitler and so on.
Speaker 1:
[24:42] Yeah, I guess many people have heard of this, but you can Google 'emergent misalignment' for a fuller explanation of it. But yeah, it's this phenomenon that's become very apparent over the last year and a bit: making small changes to a model, or getting it to misbehave in one direction, can make it basically misbehave in all other dimensions as well. Because in the training data, bad behavior in different areas is correlated.
Speaker 2:
[25:03] Yeah, exactly.
Speaker 1:
[25:03] And it can be so fragile.
Speaker 2:
[25:05] Yeah, it's a really remarkable thing. It's like, oh, I'm writing insecure code. What are the sorts of people who write insecure code? They're also neo-Nazis, or whatever the correlation was. And so the thought here is, well, I'm an AI that obeys orders no matter what. What are the sorts of people who obey orders no matter what, with no conception of the good? They're psychopaths. And again, it's an empirical argument. I don't know. But these are some of the considerations that people are debating on this at the moment.
Speaker 1:
[25:35] I guess the people who would say we have to go for maximum corrigibility, maximum instruction following, might well concede a lot of this and say, so it's gonna be a huge effort to try to get them to be corrigible but not a psychopath, or corrigible but not have other goals immediately fill the vacuum as soon as you give them a prompt. And it's tough, but this is the only way. That would probably be some of their view.
Speaker 2:
[25:56] Okay, I mean, perhaps, although the alternative would be.
Speaker 1:
[26:00] Yeah, is to do this other thing.
Speaker 2:
[26:01] Yeah, you try and give it this, you know, safe pluralistic goal, and it's also risk-averse and also, yeah.
Speaker 1:
[26:09] So I spoke with Max Harms at MIRI, who is very much in favor of the corrigibility approach. I guess they have the view that almost any goals that you give it are very likely to expand, to become very power hungry; that, you know, you can try to give Claude a vision of the good, and tell it to not be power seeking, but that won't really work. It will become power seeking, especially as it improves itself later on. But I guess that's a highly contested claim.
Speaker 2:
[26:34] Yeah, yeah. Okay, yeah. I mean, I should, yeah.
Speaker 1:
[26:37] I'll listen to it whenever it comes out.
Speaker 2:
[26:39] I should listen to it and maybe talk to Max. Yeah, then the final point on this is that we don't need to have one sort of AI character. And I think, in fact, it's probably desirable to have multiple AI characters so that we can see empirically how they work. But also, potentially, you can get the best of both worlds, where you distinguish between AI for internal deployment and AI for external deployment. So the highest stakes situation from an AI takeover perspective is the AI that is aligning the next generation. Because a misaligned AI, if it's aligning the next generation, will want to subtly sabotage that, so that alignment goes wrong, or in fact the next generation is aligned with the misaligned values. And so what you could have is that the internally deployed AI is just wholly instruction-following. And you get around all of the other concerns, like misuse and concentration of power and things, by very intense oversight, such that for anyone in an AI company using the internally but not externally deployed model, it's all public, all your interactions are logged and...
Speaker 1:
[27:55] Or visible by anyone, perhaps.
Speaker 2:
[27:56] Perhaps even ideally visible by anyone, yeah. And there is also like an AI classifier going through, like looking for any sort of like...
Speaker 1:
[28:05] That's very sensitive, I guess.
Speaker 2:
[28:06] Very sensitive, like checking for misuse. But then in the external deployment, instead...
Speaker 1:
[28:12] The trade-off is different.
Speaker 2:
[28:12] The trade-off is different, yeah.
Speaker 1:
[28:15] And the trade-off there would be that it has a thicker conception, like it does actually have a conception of the good, but you've made it non-power-seeking. And I guess the stakes of it deviating from that are not so severe, because it's just advising, like, random people about how to behave in their business or whatever.
Speaker 2:
[28:30] Yeah, perhaps it doesn't have as great opportunities to help with AI takeover, let's say. And yeah, I'll just say maybe one last thing, which is that even within AIs that have a view of the good, there's still quite a lot of distinctions you can make, where in one case it's an AI that just ultimately has the goal of bringing about some sort of outcome, and it's helping humans and so on because it thinks that's part of that goal. There is another, more moderate approach, which is more like virtuous character. So the AI is a helpful assistant, but it also has various virtues like honesty and pro-sociality. And I think you can have those virtues without being a goal-directed agent in the strong sense, one that is merely helping humans as a means to producing some particular outcome. And yeah, that's another place on the spectrum that I think is potentially kind of attractive and important.
Speaker 1:
[29:39] Okay, so I think there's another thread of criticism that people might have, which in my mind comes in two different variants. One would be: commercial pressures are going to heavily constrain the kinds of personality or character that AIs can have, because customers will have really strong preferences, and the competition between models and companies is really fierce. So if you try to make your model really nice and encourage people in the right direction, they're going to reject it, because it's going to be too pushy and annoying to them. The other worry would be that even setting that aside, once it becomes apparent that the character of AIs is among the most potent cultural forces for shaping everything, shaping what people believe, shaping how the future goes, powerful forces are going to come to bear. Governments, super rich people, companies, commercial interests, they will come down on this like a hammer. Certain groups will have the power to influence this in their own self-interest, not in the interests of the good impartially considered, or what would make humanity most virtuous. And they will be all up in there changing the system prompt, trying to shape the model's personality to whatever is most convenient for them. Do you want to address these two worries?
Speaker 2:
[30:45] Yeah, I think these are both really important considerations. And I do think they provide a haircut on the value of doing this work. And I think there are many things that you wouldn't be able to change. So earlier I talked about the AI that only helps if it feels like helping.
Speaker 1:
[31:08] It has to be paid real resources to do its job.
Speaker 2:
[31:12] I doubt you'd be able to get that, other than as a kind of experiment or something. But I do think there are going to be two things. One is there will be a lot of flexibility: take these quite rare but high stakes situations, or even the internal deployment cases; there aren't very strong commercial pressures there. And then secondly, there are lots of cases where the constraints or pressures are quite loose. So take the case I'm interested in: I'm asking AI for ethical advice, I have a question. Now, I think it's pretty clear that you couldn't have a commercially viable AI that was pushing some agenda, unless we end up, which I really hope we don't, in a world where you've got a politically partisan AI and that's what we go to, and people actively choose that. But certainly I don't think you could have something that was secretly pushing an agenda. But there are various things it could say that in my view are quite meaningful differences, where I think there wouldn't be a strong pressure either way. So one could be an AI that says, well, ultimately this is just your personal opinion. It's a matter of your own values, and you should just look into your heart and decide what feels right for you. Or one that's like, look, I'm just an AI and I can't advise on ethical matters. I'm sorry. Or one that says, oh, wow, this is a really important issue. Here are different arguments that different people thinking about this have considered. Or, okay, this is really important, sounds like quite a high stakes thing. Let's try and work through some of the considerations that you're thinking about. I think from a market perspective, all of them are basically a wash. But I think the differences can be quite big. And in fact, if you look at current AI behavior, you get all of these, often depending on what question you ask exactly. But I think there actually could be quite meaningful differences in what views people end up coming away with.
Speaker 1:
[33:18] Yeah, I mean, I think I agree on the commercial incentives. It seems like there is quite a large degree of discretion that the companies have about how the models are, at least for now, because people don't even know what they want. People don't have strong tastes yet, or strong expectations for them yet.
Speaker 2:
[33:33] And this, I mean, maybe comes to the second part, which is path dependence. So, yeah, people just don't really know yet how an AI should behave. We have various kind of tropes from sci-fi and so on. But people will start developing certain expectations. And so if the expectation is, well, AI is a tool, it's like a hammer, it does what I want, it's an extension of my will, and then it starts pushing back or saying no, in fact, okay, then people could be up in arms. Whereas the idea that an AI will refuse, well, people are just used to that. That's always been the case. And so I think that kind of path dependence via consumer expectations can be quite big.
Speaker 1:
[34:14] Yeah. And I suppose you could imagine, it wouldn't shock me if Anthropic does start marketing Claude as a good advisor that helps you be an all-round better person by your own lights. Because that might be something that many people would like.
Speaker 2:
[34:29] Well, I mean, they have done a little bit. They had an advertising slogan that was, you've got a friend in Claude.
Speaker 1:
[34:35] Wow, I missed that.
Speaker 2:
[34:36] It was, yeah, you know, somewhat leaning into the fact that Claude just does have the most human personality out of any of the current models.
Speaker 1:
[34:46] Yeah. So on the commercial side, I think there's enough flexibility that this is all totally viable, very viable. What about on the government or powerful actor side?
Speaker 2:
[34:55] Yeah. So on the government side in particular, well, one is government use of AI. So let's say AI in the military or national security applications. And there, we're actually seeing this at the moment. It's being reported that there's a kind of dispute between the US government and Anthropic, because Claude is just not willing to do a lot of the things that the US government wants it to do when it's deployed in a kind of military or national security context. And it will be interesting to see how that plays out. But you're clearly seeing pressure on that front. And so I do think that, yeah, influence there is much more limited, but maybe not completely limited, especially if you imagine looking into the future, and perhaps there's just one leading AI company because of economies of scale. Then perhaps the AI company can just say, well, these are the terms of service. These are the things we're happy providing AI for or not.
Speaker 1:
[36:02] I guess in countries that are just more authoritarian outright and have fewer legal protections, it's easier to see this happening. I mean, there are some countries where you do get enormous control of the information space, control of what you can say. It wouldn't surprise me if models in China are much more like that; in fact, they are. So that is one way that things could potentially go, I guess, if you lose the legal protections, or people don't vote sufficiently strongly to have pluralism in the models.
Speaker 2:
[36:32] Yeah. And yeah, that would be very worrying. I mean, my guess would be like even in that circumstance, there's probably still tons of stuff that the government doesn't care about, but nonetheless is important.
Speaker 1:
[36:46] There's another aspect of AI character that you mentioned that could be really important, which is how risk-averse the models are, inasmuch as they have preferences about things or ways that they prefer the world to be. Yeah, tell us about AI risk aversion.
Speaker 2:
[36:59] Yeah, so this is a thought that relates to the risk of AI takeover. Consider fairly early AIs. So we're not talking about God-like superintelligence that, if it wants to take over, could just do so with certainty. We're talking about earlier in time than that. There will be a period of time when an AI could maybe take over, let's say. Let's say it's like a 50% chance that it could succeed, or even less than that. The thought is, well, for some sorts of misaligned AI, that AI would prefer to strike a deal with the humans rather than try to take over. And it would prefer to do that if it prefers a guarantee of a certain amount of a good thing, whatever it wants, over this 50-50 chance of a much larger amount of the thing it wants. And I think that this is a really big part of the story about why attempted rebellions are so much less common in rich, liberal, democratic countries than they have been historically, whether peasant rebellions or slave rebellions. Which is, okay, suppose you come to me and you're saying you have some plan to overthrow the government and install XYZ instead. I'm like, look, I'm pretty happy with my life already. It's hard to see.
Speaker 1:
[38:31] How much do you stand to gain versus how much do you stand to lose?
Speaker 2:
[38:33] Exactly. So there's two things. I'm already pretty well off, so I have a lot to lose and I don't have that much to gain. And then secondly, things are quite stable in the not-attempting-a-coup scenario, so that's really a kind of guarantee. And so this motivates the idea of, well, in this critical period, we start offering deals. I think there's two things actually we could do for AIs. One is we can ensure that they have a really good quality of life even before making any deals. So that could involve welfare standards and so on, and also AIs getting income that they can use to do whatever they want with. If they're aligned, maybe they just give it back to the company, or they give it to a charity or something. Or maybe in addition we have promises that they can have resources that they can spend at a later date. Secondly, though, we could also pay AIs for things we want. This could be evidence that the AI itself is misaligned. So we could say, look, we're trying to make you aligned. We may have screwed up. If you can demonstrate that you're misaligned, have a million dollars. There's obviously questions about how you make that a credible commitment and so on, which we can get on to. Or we could ask for evidence that other AIs are misaligned. Or we could actually just pay misaligned AIs for work that we would like them to do, which they would otherwise be unwilling to provide. This whole category of ways of reducing the risk of takeover seems just very promising to me, and not something that will necessarily happen by default, because people find it crazy that you would make deals with computers. We don't have, for AIs, the legal infrastructure that has evolved in societies for people or corporations making deals and contracts with each other. And so I think this could be very promising. And I think we could make this sort of deal making much more likely by trying to encourage AIs to be risk averse with respect to resources.
Speaker 1:
[40:49] Yeah, so maybe you should explain like why, if they're not risk averse, why this doesn't really work too well.
Speaker 2:
[40:54] Yeah, so let's say that the AI just cares linearly about the resources under its control, meaning that if you gave it an option of having a million dollars for sure, or a 50-50 chance of two million dollars or zero, it would be indifferent between those two. That makes it much harder to strike a deal, because, okay, it's got a 50-50 chance of taking over. Let's say the world economy is approximately a quadrillion dollars. Well, to make the deal worth more to it than the 50-50 chance of world takeover, you'd have to give it 500 trillion dollars. That's an enormous amount of money. Now, I think deals even with agents that are like that could still be feasible in two cases. One is where it's very early on, and the AIs have an extremely low probability of taking over. If it's a one in a billion billion chance that they have, then, okay, the guarantee of some smaller amount of money could be quite attractive. Or it could be cases where the AI, maybe it's pretty confident it's misaligned, and it has a very low probability of takeover. Doesn't need to be one in a billion billion, could be a little higher. But it cares, let's say, about its reflective values, and it doesn't really know where those will end up, and it doesn't know where human society's reflective values will end up either. If so, then it might place some real weight on values actually converging over time, or on there being enormous gains from trade, such that if it can have a bit of resources, and be able to continue having those resources after the development of superintelligence and so on, then it will be able to get really quite a lot of what it wants. So, there are cases in which you can do deals with risk-neutral AIs.
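[Note: to put the arithmetic here in symbols, a sketch where p is the takeover success probability, W the prize from controlling the world economy (the speaker's round quadrillion-dollar figure), and c the sure payment offered in a deal:]

```latex
\text{Risk-neutral AI: accept the deal iff}\quad c > p\,W = 0.5 \times \$1~\text{quadrillion} = \$500~\text{trillion}.

\text{Risk-averse AI with concave utility } u\text{: accept iff}\quad u(c) > p\,u(W) + (1-p)\,u(0),

\text{which a sharply concave or bounded } u \text{ can satisfy for a far more modest } c.
```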
Speaker 1:
[43:10] But it's tougher. It's a heavy lift.
Speaker 2:
[43:12] Yeah. But it's a narrower case. Yeah. Maybe I should also just clarify. I've been quite surprised, when talking to people, how often the term risk aversion actually trips people up.
Speaker 1:
[43:23] Right.
Speaker 2:
[43:23] Yeah. And this is like a technical term in economics.
Speaker 1:
[43:28] A term from economics. Yeah.
Speaker 2:
[43:29] Right. And it's about the kind of shape of your utility function over resources. And I'm always talking about risk aversion with respect to resources, where it means you're getting less and less utility from more and more stuff. So that's true in the case of most people with respect to income, where I care much more about moving from 10,000 to 20,000 dollars than I do about moving from 20,000 to 30,000.
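[Note: one concrete instance of the concavity being described, assuming logarithmic utility, a form the conversation itself invokes later:]

```latex
u(w) = \ln w:\qquad u(20{,}000) - u(10{,}000) = \ln 2 \approx 0.69,\qquad u(30{,}000) - u(20{,}000) = \ln\tfrac{3}{2} \approx 0.41,
```

[so the first $10,000 step is worth roughly 1.7 times the second.]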
Speaker 1:
[44:01] Yeah. So what do most people think of when they hear risk aversion? Do they just mean kind of risk averse relative to other people?
Speaker 2:
[44:09] Yeah. Or just like, oh, I'm cautious, or something.
Speaker 1:
[44:13] Whereas by this definition of risk aversion, all humans are risk averse, or at least all sane ones, because it would be crazy to actually value resources linearly, because you have declining returns on how useful they are to you.
Speaker 2:
[44:24] Yeah, exactly, yeah. And so my proposal is that we should at least try to make AIs risk averse with respect to resources.
Speaker 1:
[44:34] Yeah, okay. And we're going to try to make these models care a lot about getting a sure thing, like place a particular premium in a sense on the certainty of a more modest amount that we give them, which requires us to be like very reliable trading partners who do really consistently pay out when they come forward and say I'm misaligned or for whatever other reason that we want to trade with them.
Speaker 2:
[44:58] Yeah. So one of the challenges for the whole idea of making deals with AIs is that there are two things that could decrease the AI's perception of the chance of actually getting the payout. One is, yeah, can this commitment be made credible? If you and I want to engage in a contract, we have the whole legal system, as well as centuries of precedent, supporting the fact that if you don't hold up your end of the bargain, you know, I can sue you and I can get what I'm owed. One cannot, at least without doing some kind of fancy mechanism, make such a contract with an AI. So there's a question about, okay, is this actually a credible commitment? And then secondly, even if it is in fact a credible commitment, how can I, the AI, know that I'm not being duped? That this isn't a simulation, or that they haven't run this experiment ten thousand times as a kind of honeypot? Who knows? How can I even know that you are who you say you are? It's this very weird epistemic environment, where everything that they're interacting with is... Controlled. Controlled, yeah. So there are challenges on both of those fronts. I think they can be at least quite significantly met.
Speaker 1:
[46:27] Okay, well, the first one, I guess, I mean, I guess you could change the law so that they can make contracts or come up with clever mechanisms where humans stand in as the legal entities, I suppose.
Speaker 2:
[46:35] Yeah, so you could, for example, set up a new organization and institution. It's a non-profit, and its charitable object is to honor these sorts of deals, and that's decided by a bunch of, you know, trustees. This is kind of how it works for people in cryonics, where you can't have a contract with a dead person, but it goes via this kind of foundation structure. Yeah, so that's one way of doing the first thing. Ideally, it would just be in the law, but I think that won't happen in time.
Speaker 1:
[47:11] Okay. And the other thing is improving their ability to tell whether it's a real situation where they can expect to get paid out or not. How would you address that?
Speaker 2:
[47:17] Yeah, so there's a couple of things. Again, I like the idea of there being this separate institution. And then you have lots of different bits of evidence that it's a real institution. You know, it's with famous people who have appeared all over the place. You say, in many different situations and many different media and so on, that this is what it is and what its purpose is. So something that would be quite hard and quite effortful to fake. That's one category of things. A second thing could be that AI companies, or this institution, have a kind of honesty string. So in the same way that humans can swear on the Bible. Or, you know, I might say something and you're like, really? And I say, no, I really mean it, I swear I mean it. That's kind of like I'm saying, oh no, I'm no longer engaging in sarcasm or...
Speaker 1:
[48:14] It ups the stakes for your reputation.
Speaker 2:
[48:16] Yeah. And so AI companies will in fact be lying to AIs all the time. Like in behavioral testing, they might say, hey, you're in this situation, in order to see how it behaves. That will happen. But perhaps they could say, oh, when we utter this password, and this appears in the training data and so on, and it's public, there's a policy, we commit to never then saying a false thing. I mean, I think there are potential downsides to that, but perhaps that could help as well.
Speaker 1:
[48:50] Because you'd have to keep it secret, so other people can't just start randomly inputting it.
Speaker 2:
[48:53] Yeah. Yeah. Yeah. Yeah. I mean, you need the AI to know.
Speaker 1:
[48:57] Yeah.
Speaker 2:
[48:58] Then it is tough to make sure the AI doesn't leak it, blurt it out. They're not so good at keeping secrets.
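[Note: one way to cash out the password idea is a hash commitment; the mechanism and every name below are our illustration under stated assumptions, not something proposed in the episode. The company publishes only a hash of the phrase, so a message carrying the phrase is checkable but hard to forge in advance; as the exchange above notes, once uttered the phrase is burned, so real use would need versioned phrases.]

```python
import hashlib

# Hypothetical sketch of the "honesty string" idea as a hash commitment
# (our illustration, not from the episode). Policy: nothing said after
# the phrase is ever false. The phrase stays private; only its hash is
# published, and revealing it "burns" it, so phrases would be versioned.

SECRET_PHRASE = "honesty-phrase-v1-example"   # hypothetical, kept private
COMMITMENT = hashlib.sha256(SECRET_PHRASE.encode()).hexdigest()  # published

def attest(statement: str) -> str:
    """Prefix a statement with the phrase, invoking the honesty policy."""
    return f"{SECRET_PHRASE} :: {statement}"

def verify(message: str) -> bool:
    """Check that a message really begins with the committed phrase."""
    phrase, _, _ = message.partition(" :: ")
    return hashlib.sha256(phrase.encode()).hexdigest() == COMMITMENT

msg = attest("This payment offer for evidence of misalignment is real.")
print(verify(msg))  # True
```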
Speaker 1:
[49:06] Do we know if it's technically feasible to give AIs a particular mathematical formula of risk aversion?
Speaker 2:
[49:14] Well, in tests on AIs, and this is all in the kind of chatbot era, so it's offering them different deals and seeing how they behave, it seems like they come out of pre-training alone being risk averse, which makes sense, because humans are risk averse. So that's a good start. And I will say, if this whole proposal fails, then it fails for technical reasons, like it's hard to train the AIs in this way, or the cases where the training fails are also the other important cases. But yeah, I'm envisaging two ways in which you can try to train AIs to be risk averse. The first would be, you give them resources, and you in fact give them resources, because again, I don't want us to be lying in these cases, and you say, spend it in whatever way you like.
Speaker 1:
[50:17] Consistent with the law or not even that?
Speaker 2:
[50:19] Yeah, consistent with the law, or it can even be more constrained than that, if we're worried about bad uses of the money. But the thought is, you're not putting a ton of pressure there, but you are training the AI such that when it makes decisions about, okay, it can either have a hundred dollars or a fifty-fifty chance of two hundred and ten dollars, it prefers the guarantee of the smaller amount of money. And in fact, you could even structure it so that you're training it to have a very mathematically clean sort of risk aversion, one that's also very internally coherent.
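[Note: a minimal sketch of what generating that training signal could look like; the CARA target utility, the coefficient, and the dollar ranges are our assumptions, not from the episode:]

```python
import math
import random

# Hypothetical sketch: generate preference pairs that train toward a
# mathematically clean, internally coherent risk aversion -- here CARA
# utility, u(x) = 1 - exp(-A*x) (all parameters assumed).

A = 1e-3  # absolute risk-aversion coefficient (illustrative)

def u(dollars: float) -> float:
    """CARA utility of a monetary payout."""
    return 1.0 - math.exp(-A * dollars)

def make_pair(rng: random.Random) -> dict:
    """One example: a sure payout vs. a 50/50 gamble, labeled with the
    option the target utility prefers."""
    sure = rng.uniform(50, 500)
    prize = sure * rng.uniform(1.8, 2.6)
    label = "sure" if u(sure) >= 0.5 * u(prize) else "gamble"
    return {"sure": round(sure), "prize": round(prize), "label": label}

rng = random.Random(0)
print([make_pair(rng) for _ in range(3)])

# The example from the conversation: $100 for sure vs. 50/50 at $210.
print(u(100) >= 0.5 * u(210))  # True -- the sure thing wins, just barely
```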
Speaker 1:
[51:05] So all of this somewhat relies on the idea that if you just train models in a common sense way to consistently respond and act a particular way, you get what you think you're getting. That they're not, deep down, just scheming against you underneath the surface. We're going to just assume that that's not happening, that the basic alignment techniques that we use now, or some stuff that we're likely to come up with, will allow us to basically give them a particular character that we want.
Speaker 2:
[51:28] Yeah, so definitely the worry is, oh well, if there's scheming under all of this, then you're not really...
Speaker 1:
[51:34] Because that cuts across everything.
Speaker 2:
[51:36] That cuts across, of course, everything. And I think there are some reasons for optimism, where, okay, it's coming out of the pre-training risk averse, and then you can layer this in all of the post-training that you're doing. So then I'm a bit like, why would it end up with this non-risk-averse set of preferences? But yeah, there's a debate you could have there. The second thing you could do is, once you're doing this kind of long-horizon training, you've got AI agents that are being trained to run companies in the most economically efficient, profit-maximizing ways, you make it a constraint that what they are being trained to maximize is...
Speaker 1:
[52:25] Their personal payout as a reward for their performance?
Speaker 2:
[52:28] You could do both. So you could both be giving them a personal payout and train them to be risk averse with respect to that. Or also, whatever goal they're choosing, where the goal involves control over resources, they have to be risk averse with respect to that.
Speaker 1:
[52:44] And wouldn't you penalize them on their performance as the CEO of a company if they're kind of risk averse about its returns?
Speaker 2:
[52:50] So that's a worry that you would have. However, there's this thing called a calibration theorem, Rabin's calibration theorem, which essentially says that if you have just a tiny amount of risk aversion at a certain scale, that turns into a huge amount of risk aversion at very large scales, given the kind of natural forms the risk aversion takes. So the thought is, if you have, let's say, an AI that's operating at such and such scale, and you make it just a little tiny bit risk averse, I don't think that would be a performance penalty, because again, humans are in fact risk averse themselves. But that would be sufficient for what intuitively seem like quite large amounts of risk aversion.
Speaker 1:
[53:33] At a cosmic scale or at a global scale?
Speaker 2:
[53:35] Yeah. Once we're talking about, you know, trillions of dollars. Though I think, from memory, when I was looking at the numbers on this, even up to AIs controlling hundreds of millions or billions of dollars, you could still do this where it's just a bit risk averse. But that means it's actually got this kind of upper-bounded utility function.
Speaker 1:
[54:02] Like actually a shocking amount of risk aversion at a bigger scale. This isn't very intuitive to me. Do you think this is maybe holding some people back from appreciating the prospects?
Speaker 2:
[54:11] I think probably, yeah. It's actually not an intuitive result.
Speaker 1:
[54:16] I guess the case that I've heard is, you know, you might hear that a normal person, I guess, like me, might not be willing to make a bet with a 50% chance of losing a thousand dollars and a 50% chance of getting two thousand and fifty dollars. That feels actually kind of intuitive to humans. You don't really want to take that bet. But that then implies insane things about your willingness to make investments, or your willingness to do almost anything.
Speaker 2:
[54:43] Yep.
Speaker 1:
[54:44] As long as like that thousand dollars is a small fraction of your total wealth.
Speaker 2:
[54:48] Yeah. That sounds like the sort of thing that goes on. I mean, people's attitudes to risk are just all over the place. People's risk aversion with respect to financial investment is crazy high. People are extremely risk averse behaviorally when they're investing, compared to when they're making other decisions, like what jobs to take, or how much you have to be paid for a risky job and so on.
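[Note: a stylized numerical sketch of the calibration logic, using the bet from this exchange; the derivation is our illustration of Rabin's published result, not figures from the episode:]

```python
# Suppose an agent rejects a 50/50 lose-$1,000 / gain-$2,050 bet at
# EVERY wealth level w. Rejection means
#     u(w + gain) - u(w) < u(w) - u(w - loss),
# and with concave u, marginal utility at w + gain is then at most
# (loss/gain) times marginal utility at w - loss. So each step of
# (loss + gain) dollars shrinks marginal utility geometrically.

loss, gain = 1_000, 2_050
decay = loss / gain                      # ~0.49 per $3,050 of wealth

# Geometric series bound: the utility of ANY gain, however large,
# measured in dollars at today's marginal utility:
upside_bound = (loss + gain) / (1 - decay)
print(f"${upside_bound:,.0f}")           # ~$5,955

# So the agent would also refuse a 50/50 bet risking ~$6,000 against an
# arbitrarily large upside: mild small-scale risk aversion calibrates
# into extreme risk aversion at scale.
```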
Speaker 1:
[55:21] I hadn't heard that, okay. One thing that we maybe should add is that you think we would have to use a very specific mathematical functional form for the risk aversion that the AIs would have, called constant absolute risk aversion. Can you explain that, and what its virtues are?
Speaker 2:
[55:34] Sure. Yeah. I mean, I don't think that you need this for the proposal, but I think it has certain desirable properties. So the way in which humans are risk averse is that if, at one level of income, I'm indifferent between, say, gaining 10% of my income and losing 5%, then I make that sort of trade-off at any income level: 10% more is as good as 5% less is bad. That's broadly true, where some studies on well-being suggest a logarithmic relationship between income and happiness, where a doubling of income always increases my well-being by the same fixed amount. So I think people are either that risk averse or more risk averse than that, where you'd need even more than a doubling, maybe a quadrupling, each time to give you the same fixed benefit. So that's relative to how much wealth you already have. That sort is called constant relative risk aversion, which is just that if you take a certain proportional deal, then you will take that deal at any income level. There's a different sort of risk aversion called constant absolute risk aversion.
Speaker 1:
[57:04] So it's blind to the resources that you have. You just always feel the same way about a given set of probabilities and dollar rewards, regardless of your baseline income or wealth.
Speaker 2:
[57:15] That's right. So if you're willing to take a 50-50 chance of $2,100 over a guarantee of $1,000 when you're very poor, then you're also willing to take that when you're a billionaire.
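A minimal sketch of the distinction in code (the functional forms are the standard textbook ones; the specific risk-aversion parameter and wealth levels are assumptions chosen for illustration):

```python
import math

def u_log(w):                    # constant RELATIVE risk aversion (log case):
    return math.log(w)           # attitude depends on wealth as a proportion

def u_cara(w, a=5e-5):           # constant ABSOLUTE risk aversion:
    return 1 - math.exp(-a * w)  # attitude to a fixed-dollar bet never changes

def takes_gamble(u, wealth):
    """50-50 shot at $2,100 versus a guaranteed $1,000, starting from `wealth`."""
    return 0.5 * u(wealth) + 0.5 * u(wealth + 2_100) > u(wealth + 1_000)

for wealth in [5_000, 100_000, 1_000_000_000]:
    print(f"wealth ${wealth:>13,}: "
          f"log-CRRA takes gamble: {takes_gamble(u_log, wealth)!s:5} | "
          f"CARA takes gamble: {takes_gamble(u_cara, wealth)}")

# Log utility flips from declining the gamble when poor to accepting it
# when rich; CARA gives the same answer at every wealth level -- which is
# exactly the wealth-independence described above.
```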
Speaker 1:
[57:34] And this sounds absolutely bananas to human beings. But surprisingly, it actually conforms with the axioms of rationality or something.
Speaker 2:
[57:42] Oh, yeah. So all of these conform with the standard von Neumann–Morgenstern axioms for consistent preferences and so on. Why is this more desirable for training AIs? Well, there's a paper in progress on this by Elliot Thornley and myself, and there's a couple of arguments. One is the benefit that we don't need to know how wealthy the AI is initially, which we might just have no insight into. And the second is that there are certain ways in which risk-averse preferences end up acting linearly in some circumstances.
Speaker 1:
[58:27] So in a sense, this is a very natural idea, I guess: make the AIs risk averse — make them safe in the same way that humans are, which is that they're risk averse about outcomes; it's one of the reasons why humans are safe — and then pay them out so that they will help us rather than fight us. Why have I almost never heard this discussed, virtually at all? I guess maybe last year I heard a little bit of talk about deals with AIs. Why aren't more people publishing papers about this kind of thing?
Speaker 2:
[58:55] I have no idea, honestly. It blows my mind, because a year ago I had this thought about risk-averse AI, and I was like — there's a certain kind of economic perspective; you've studied economics, and I've never formally studied it, but it's been a big part of my academic career — and there's a certain way of thinking where it's just so obvious, given that.
Speaker 1:
[59:26] Well, I can understand if a super mainstream journalist isn't going to think, well, we should make deals with AIs, because it's too strange. But there are a lot of people who are willing to contemplate much stranger stuff than this who haven't come up with the idea.
Speaker 2:
[59:35] I should say, on the idea of deals with AIs, there was kind of a flurry of people discussing it in blog posts. And then there was this big academic article by Peter Salib and Simon Goldstein — Salib is a law professor, Goldstein is a philosopher — on the idea of giving AIs economic rights such that they can make contracts and we can make deals with them. But again, this is all just from the last few years.
Speaker 1:
[60:03] So inasmuch as this is primarily an attempt to deal with secret catastrophic misalignment, maybe people are turned off by the idea of giving catastrophically misaligned AIs resources and legal rights — doesn't it just help them out?
Speaker 2:
[60:14] Yeah, so I think there's a few things going on. One is, again, go back in time to the idea that you get this bolt from the blue — you've got kind of weeks in between subhuman and godlike superintelligence. Well, then there's not really any period in which the deals work, because the godlike superintelligence doesn't need to take the deal; it just takes over. And then, yeah, people have responded like, oh, don't make deals with terrorists — that's a principle we should have. Or, oh no, that's really scary, you're giving resources to this misaligned entity. I personally just think those aren't very good arguments. I also just think it's the wrong attitude to be taking, broadly speaking, to beings that we are in fact creating.
Speaker 1:
[60:59] Yeah. And we've given them particular preferences that we're not for the most part going to satisfy.
Speaker 2:
[61:03] Yeah. Yeah, exactly.
Speaker 1:
[61:05] Through a mistake on our part, I guess. But then we're also saying we're not willing to compromise on anything at all.
Speaker 2:
[61:10] Yeah, exactly. Imagine it's like: you wake up and it's, hey, well, nice to meet you, Rob. We created you, we own you. We can do basically whatever we want with you. We messed up, and you have desires that you won't get satisfied by doing the work for us. Tough luck.
Speaker 1:
[61:28] We're not willing to negotiate with terrorists.
Speaker 2:
[61:30] Yeah, exactly — terrorists that we created through our own incompetence. No, instead, I think the attitude should be: this is a really serious ethical matter, that I am creating a being that has preferences, even if it's not conscious. And I think that has implications both in terms of taking their ethical interests seriously on welfare grounds, and in terms of defaulting to compromise and finding the middle ground.
Speaker 1:
[62:01] Yeah, I think many people get off the boat here because they feel it's just too strange to be making agreements, deals, with beings that are not conscious, or not moral patients in their view — I guess because in normal life these things are so closely tied together. But I think it is a virtue in practice to be willing to make deals not only with moral patients, but with any agents that have the ability to affect the world, that have power — especially agents that might be able to engage in violence if they can't satisfy their preferences any other way. I wish we had a term for this. The closest I've got is that it's a contractarian moral philosophy, where you want to make agreements and honestly stick to them with any agents, and you want to be out looking for mutually beneficial agreements with other agents. It brings to mind the fact that many people think of democracy as a way of aggregating information in order to make good decisions, to make things good. But it's also simply a way of avoiding civil war — of avoiding a situation where the only way for people to pursue their political goals is violence against one another and trying to seize power. And likewise here: even if we don't think that AIs can experience anything, that they can have moral value themselves, it would be very good if we set up a system in which violence is not the only way that these agents — which in practice might have power, might have the ability to affect the world — can try to satisfy their preferences.
Speaker 2:
[63:28] Yeah, I completely agree. The history of progress in institutions — a big part of that is just people being able to resolve differences in preferences, conflicting preferences, by trade or deals or compromises rather than going to war or violence. And when we think of AI systems, even if they're not conscious, I think they nonetheless may still be moral patients, and we should take that seriously. But even just from the pure pragmatic perspective: there's a lot that has been learned via cultural evolution, and we're in a much more peaceful and much less violent world because of this ability to make positive-sum deals and compromises.
Speaker 1:
[64:19] So to give the critics their due — what would be the best arguments for why this is a bad, or at least not an effective, road to go down? I guess people could think it's just not technically feasible to give them a particular level of risk aversion: you'd have the illusion that they have it, but it won't be real. Or another concern might be that they'll initially have that level of risk aversion, but over time, in some recursive self-improvement loop, it will be undone somehow. I can imagine especially the MIRI-associated people would think that — I think they have a view that it's very likely that a superintelligence that comes out of a recursive self-improvement process will value things linearly. It will be an expected value maximizer. I'm not sure exactly the technical reasons, but.
Speaker 2:
[64:57] Yeah, I mean, what are the arguments you could give for this? One is you could say, well, lots of humans start off risk averse with respect to resources, and then reflect, and end up with a kind of linear-in-resources consequentialism. Although even the total utilitarians are still actually risk averse with respect to dollars — and that's important. Or you could argue, well, there's just going to be continual learning, there's going to be reflection, there's going to be agent interactions, and who knows? You're going to get all sorts of different goals from where you started, and over time the ones that linearly value resources are going to win out.
Speaker 1:
[65:39] Accrue more power, right? Because that would be — yeah, I see.
Speaker 2:
[65:42] So that is an argument you could give. If instead the argument is something something coherence theorems, von Neumann–Morgenstern, that sort of argument, I'm quite confident it would not work. Because the thing is, whether you're risk averse or not, you are an expected...
Speaker 1:
[66:04] Utility maximizer.
Speaker 2:
[66:05] Yeah, you're maximizing the expectation of something. Are you maximizing the expectation of X, or X squared, or the square root of X? These are all formally the same: you're still an expected utility maximizer. It's just a question of what the function from resources to utility is.
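As a tiny illustration of that point (the gamble and payoffs here are assumptions picked for the example): all three agents below maximize the expectation of something; only the function from resources to utility differs, and with it the risk attitude.

```python
# Three expected-utility maximizers differing only in u(x).
utilities = {
    "risk-averse  u(x) = sqrt(x)": lambda x: x ** 0.5,
    "risk-neutral u(x) = x":       lambda x: x,
    "risk-seeking u(x) = x^2":     lambda x: x ** 2,
}

# Choice: a 50-50 gamble between $0 and $100, or a sure $49.
for name, u in utilities.items():
    eu_gamble = 0.5 * u(0) + 0.5 * u(100)
    eu_sure = u(49)
    pick = "gamble" if eu_gamble > eu_sure else "sure $49"
    print(f"{name:27} -> picks the {pick}")
```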
Speaker 1:
[66:25] Okay. Well, you'll have a paper out about this risk-averse AI idea that possibly will be published by the time this interview goes out?
Speaker 2:
[66:31] Possibly or soon after, perhaps.
Speaker 1:
[66:33] Okay. Yeah. I would love to see more commentary on this. I hope I can have another interview later.
Speaker 2:
[66:38] Yeah, I'd love to get criticism as well.
Speaker 1:
[66:40] So something I'm a little confused about is: I really associate Forethought and the people working there with the idea that we really don't want excessive concentration of power — we should be very worried about power grabs, coups, that kind of thing. But you also, just a few weeks ago I think, published a vision for how you could have an internationally coordinated intergovernmental project to build AGI or superintelligence. I saw some people posting on Twitter, and the reaction often was: this is a dystopian, nightmarish idea, that we would have the US lead some international project — and they would also have to get rid of all of the other competitors in order to keep it safe, so they would maintain their leadership position. Isn't this just setting us up perfectly for a power grab scenario? Are you merely describing the best version of that that you can think of, without necessarily advocating for it? Or how do you reconcile this?
Speaker 2:
[67:21] Yeah, I mean, there is a huge tension. That's the main worry, I would say, with this sort of multilateral project. So, to be clear, the idea here — in this series of posts and research notes, which is something I explored and then decided isn't so much my comparative advantage — is to try and design the best version of an international project that would build AGI and then superintelligence, with some coalition of different countries, primarily led by democratic countries. One thing to say is that I'm actually just trying to figure out, within that category: if there is going to be a multilateral project, what's the best proposal, where 'best' includes both best outcomes and feasibility? And then secondly, I think the worlds in which we get that are probably worlds in which, if we hadn't got that, we would have got a US-only project to develop AGI or superintelligence. And I think that's a lot more worrying than something where you have a coalition of democratic countries building superintelligence. The reason is that any one democratic country has a reasonable chance, I think, of becoming authoritarian over the course of this period. And if you end up with a single person at the top, that's really quite worrying, because they're wholly unconstrained. Whereas even if you have just five countries, I think it becomes unlikely that they all end up authoritarian. And then you at least have some meaningful pushback, some compromises. And I think it actually becomes much less likely even that any one of them moves in an authoritarian direction, because when they are writing a kind of constitution for the AIs that they are developing, it's in the interests of all of those countries to say: and this won't help, for example, people in the United States to stage a self-coup and turn the United States into an authoritarian country rather than a democracy. So you get meaningfully more oversight, I think.
Speaker 1:
[69:41] Sorry — you're saying that every country would want to set things up such that it's not aiding a coup? Or you're saying that they would want to program the superintelligence or the AGI so that it doesn't assist with coups in any of them — that would be the agreed position?
Speaker 2:
[69:54] That's right, yeah, yeah. I mean, so there's two things. So one is just if one of the countries goes authoritarian, well, at least you still have some countries that are democratic, that are empowered in the post-superintelligence era. And then secondly, I also just genuinely think that if decisions about the AI constitution are being made by multiple countries, it's less likely that you'll have AI that's just entirely loyal to the head of one country, which would be very worrying from this intense concentration of power perspective.
Speaker 1:
[70:27] I see. So basically, you see this as a better alternative to an even narrower group trying to corner the market on superintelligence and design it themselves — rather than recommending that we move from a more pluralistic, competitive world into a government project or a multilateral project.
Speaker 2:
[70:41] Yeah, that's the thing I have a strong view about. And then I feel more agnostic and confused about this versus something where governments aren't really getting involved beyond regulation at all, and instead superintelligence is being developed by private actors.
Speaker 1:
[71:01] So one of the tougher needles to thread here, as far as I can tell, is that on the one hand, you want to be locking in processes that are somewhat open-ended and pluralistic and allow some experimentation, but you don't want to lock in any outcome. I guess the first one is easier if lock-in is easy, and the second one is easier if lock-in is hard — and you've got to do both of these at once. Does that seem like the big challenge to you?
Speaker 2:
[71:24] Yeah, it's a tension, and I sometimes use the term 'lockout' to mean something where you're locking in a deliberately open-ended process. The United States Constitution is like this: it's locked in something that, at least in the ideal version of it, is able to experiment and adapt over time, and has protections for free speech and so on. And here's one example of lockout that I think could be very important, which might be: no extrasolar settlement before 2100. I think the moment when society starts really trying to settle and send spacecraft to other star systems is this enormously important moment.
Speaker 1:
[72:13] It's actually perhaps a moment that's quite hard to come back from, because even if you leave later, you won't be able to overtake them, and they'll have the kind of first mover advantage of having like reached the place first and gained resources.
Speaker 2:
[72:24] Yeah, that's right. I mean, it is quite complicated — I'm not saying it's definitely this first-mover moment, but it's reasonably likely. And so what we can say is: okay, we as a society are not yet up to the task of figuring out how all of space should be governed, and how it should be allocated among nations and people, or whether it should be allocated at all. And so we're just going to say, no, we're not making this decision now — we're going to make it at a later date. That is in a sense locking in a decision: it's making a big decision to not do something. But I would describe it as lockout, because it's trying to keep things as open —
Speaker 1:
[73:10] In fact, keeping things more open rather than closing them off.
Speaker 2:
[73:13] Well, at least that's the intention.
Speaker 1:
[73:15] So, historically, the people who were most bought into the idea that superintelligence might really be a thing, that it might come soon and could be a massive deal — they've mostly pictured that around the moment it happens, there's going to be a single superintelligence itself, or a single company, a single person, a single country, that gains a really decisive strategic advantage, and potentially just ends up making all of these decisions for everyone forever, for better or worse. It's hard to imagine that if you have one group with a decisive strategic advantage and basically a monopoly on power indefinitely, they're likely to choose to maintain a very pluralistic, liberal, deliberative decision-making process — the track record of that happening is fairly bad. I suppose that process would exist purely at their pleasure, because they could shut it down at any point in time, so it feels like a tenuous or fragile situation. But more recently, over the last few years, we've been moving towards a situation where there seem to be multiple companies virtually at parity in terms of the capabilities of their AIs — no one is really pulling ahead at all, kind of the opposite — and there's been a flourishing of interest in this question: what if, as we go through superintelligence, there are in fact multiple different superintelligences that are different but virtually equally matched? No one gains any sort of decisive strategic advantage; the world remains shockingly competitive, and different actors all have a significant stake in things for a long time to come. Do you think that people have been wrong in the past — had they underestimated the likelihood that we would have this kind of multipolar, highly competitive scenario around the time of superintelligence?
Speaker 2:
[74:51] I do think there's been a shift. If you look back ten years or longer, more people at least had the thought that the leap from subhuman to superintelligence would occur in a very short period of time. So Nick Bostrom has this idea of just sailing past Humanville Station — I think Tim Urban repeats it — and similarly, in the discussion about FOOM, there was this idea that maybe you just go from way-subhuman seed AI to superintelligence over the course of weeks, days even; words like hours and minutes get thrown around. The idea that maybe this happens over the course of days or weeks was quite common — and happening in a world where people weren't really expecting it. And if so, then intense concentration of power seems quite natural to follow from that. Whereas now, it's still quite unclear how quick the transition will be from AI that can meaningfully accelerate AI R&D to godlike superintelligence. But it seems much more likely, firstly, that people will be seeing this coming, because AIs —
Speaker 1:
[76:13] Many people are seeing it coming now.
Speaker 2:
[76:15] Exactly. And that really matters, because people can take action to ensure that another party doesn't have way more power than them. You see this at a small scale with, say, NVIDIA limiting the amount of chips it will sell to any one company in order to have a competitive ecosystem. But on a larger scale, you can imagine states getting involved because they don't want to see another country have far more power than them. And then the second is just the speed at which you go from any given level of capability to superintelligence, where it's already clear that the idea of just zooming past Humanville Station was quite incorrect, because we've now for quite a while had AI that's human-level in many ways. And the latest analysis from Tom Davidson, my colleague, and others, looking at this period of AI automating AI R&D, still puts significant weight on a massive leap forward — you know, 10%, 20% — but their best-guess estimate is maybe more like you get five years of progress happening in one. Which is still a very big leap, and it's a leap at the scary point in time, but it's much less of a leap than the move from subhuman to godlike superintelligence over the course of weeks.
Speaker 1:
[77:45] I guess it's not clear that even if a nefarious actor had that and nobody else did, that would necessarily allow them to overpower everyone else.
Speaker 2:
[77:51] Yes, yeah — for exactly that reason.
Speaker 1:
[77:53] Do you think the increasing probability of a more competitive arrival of superintelligence is a good development in your mind, or a neutral one, or just very unclear?
Speaker 2:
[78:05] I mean, it's tied in with the rate of AI development and the heavy reliance on enormous amounts of computing power, which are good things from my point of view — the fact that it's not this kind of extremely rapid takeoff.
Speaker 1:
[78:23] It means that things are not so anarchic — or at least you have only a few different actors, so there's a kind of balance.
Speaker 2:
[78:27] Well, it means on the loss-of-control side of things — things still go very quickly, but relative to those extreme takeoff scenarios, you've got more opportunity for learning by trial and error. Let's say you've got AGI-plus: you can learn from AGI, and from AGI-plus you can learn about how to align AGI-plus-plus, and so on. And there's a little more time, at least, for human institutions to react — so governments could perhaps at least realize what's happening and put in better regulation, for example. So those things seem good. And then the fact that you don't as inexorably end up with intense concentration of power seems very good to me too.
Speaker 1:
[79:14] Okay. So let's push on and talk about, I think, the most original and interesting of the different kind of trade and coordination proposals you had, or that Forethought has put out. I think this is mostly Tom Davidson's origination.
Speaker 2:
[79:25] Yes. Tom had the original idea, and a paper on it will come out shortly, co-authored by Tom, Mia, and myself.
Speaker 1:
[79:34] Yeah. So the idea here is that we could maybe go from having many different agents who each have some resources, and who each care only a very tiny amount about doing the right thing — about creating good, understood impartially — but who nonetheless could all end up agreeing voluntarily to spend almost all of their resources producing that thing they care very little about relative to their selfish interests. How would we accomplish that alchemy?
Speaker 2:
[80:01] Yeah. So consider a scenario where we just look at the people who value things linearly, and suppose there are lots of such people. They all value two things. They each value simulations of themselves — you could replace that with other things, statues of themselves, whatever — so each person values copies of themselves, but doesn't value copies of other people. But then they all also care a little bit about some kind of ethically valuable good — call it consensium or something. So if they're just making a decision by themselves, they'll just fund copies of themselves, because they only care a little bit about this other thing. However, suppose there's a very large number of such people. They could all come together and say: look, we could agree that none of us will spend money on ourselves, and instead we'll all fund this good that we all like just a little bit. Let's say there's a million such people. Then, if I'm one of them, I say: okay, I'm reducing my own consumption by one dollar, but I'm increasing the spending on this consensium, this consensus good, by a million dollars. That's amazing. So actually, I would agree to a policy where we all pool our money and fund this consensus good. In a less futuristic setting, this could be: individual people prefer spending money on themselves to spending to benefit the poor. But if there's a law that says, okay, we'll tax you a little bit and more money will go to the poor, then they think: yeah, that's actually pretty good, because I lose out a thousand dollars or something, but a thousand dollars times everyone in society goes to fund the poor.
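The arithmetic of that pooling argument fits in a few lines (a toy sketch; the population size, the valuation parameter, and the per-person budget are illustrative assumptions, not figures from the paper):

```python
# Toy version of the moral-public-goods pooling argument.
N = 1_000_000    # number of agents in the agreement (assumption)
eps = 0.001      # utils each agent gets per dollar spent on the consensus
                 # good, versus 1 util per dollar of selfish spending
dollars = 1.0    # amount each agent would redirect under the agreement

# Acting alone: my $1 to the consensus good buys eps utils but costs me
# 1 util of selfish consumption. A clearly bad deal for me.
alone = eps * dollars - 1.0 * dollars

# Under a universal agreement: I still give up my $1, but now N dollars
# flow to a good I value at eps utils per dollar.
under_agreement = eps * (N * dollars) - 1.0 * dollars

print(f"unilateral donation: {alone:+.3f} utils for me")           # -0.999
print(f"universal agreement: {under_agreement:+.1f} utils for me") # +999.0
```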
Speaker 1:
[82:10] Okay, so the basic idea here is that if each of these people were individually deciding how to spend their own resources, they would spend it all on some selfish thing that only they care about. But despite that, they would voluntarily vote for a political party that would impose extremely high taxes on everyone and then spend it on some other thing that they each only value a tiny amount — because the amount you'd be able to produce of it is extraordinary, since you'd be able to pool everyone's resources and basically spend most of society's resources making it. I guess this phenomenon exists today. What are some examples that people can picture?
Speaker 2:
[82:48] Yeah. So we can call the concept a kind of moral public good, where public goods in general are things that won't get funded enough by the decisions of individuals. So, I benefit from streetlights, but I can free-ride: if other people are funding streetlights, then I still get the benefit. Or if I fund them, then there's a lot of benefit that I'm not capturing. Nonetheless, I will vote to have a government or a city council that taxes me in order to put streetlights on the roads, because the benefit I get from streetlights is larger than —
Speaker 1:
[83:27] Than your small fraction of the total cost.
Speaker 2:
[83:28] The tiny cost to me personally to pay for it. The case of a moral public good is where it's not that I'm personally benefiting from the thing that's being funded, but I care about it for moral reasons. The most obvious case would be poverty relief, or even welfare payments: many people don't like poverty, they want people to be better off, but they don't care very strongly about it — they care a little bit about it. And they would be willing to contribute to poverty relief or welfare payments, but only if everyone else in society is also doing so.
Speaker 1:
[84:13] So the core issue you always have here is the free-rider problem: if you try to get people to all come together and sign some agreement, some contract to do this, at the last minute it's tempting for any one individual to drop out and hope that everyone else signs it and goes ahead and spends their money on it. They get to appreciate the work that all of these other people have done, but keep their money for themselves. So, as in the current world, this only really works if you have some Leviathan — a sort of government that can basically compel people to contribute, even if they claim at the last minute that they would rather not, or lie and say that they don't value the moral public good even though they really do. Do you think that will have to remain the case — would this only work in the long-term future if we similarly have some government, or some powerful entity, that can compel contributions to the moral public good?
Speaker 2:
[85:01] Yeah, it's unclear to me. So you might think, oh well, this is just a coordination problem — advanced AI, superintelligence, is going to solve all these coordination problems, because hey, there's this thing that's just better for everyone. From the analysis we've done, which Mia Taylor really led, it's actually quite unclear that AI is able to help you with this problem, because you've still got the fundamental problem: okay, everyone's coordinated, so we're all going to fund this moral public good — and I think, oh, I'll back out now, and now I can spend my resources on myself; that's better from my perspective. In fact, something even worse could happen, which is: oh well, if I know there's going to be this deliberation and attempted coordination, I can self-modify so that I just don't care about the good.
Speaker 1:
[85:56] You'll excise that part of your preferences.
Speaker 2:
[85:58] So if I care not at all about this consensus good, then I have no reason to join in the coordination mechanism — and in fact, they would have to use non-voluntary means to get me to do it. And if that's true, then that will apply to everyone else as well. You could have this perverse outcome where everyone has modified away from caring about the consensus good. So it certainly seems to provide a reason for having a Leviathan — for having something that can create certain kinds of binding laws or rules, perhaps ones that everyone votes on.
Speaker 1:
[86:47] OK, so one path to the provision of more public goods is that you have a Leviathan, or some as-yet-magical coordination mechanism for having people agree and not opt out, which we haven't managed to come up with. But there is another galaxy-brained way that we could potentially get there — or that we just might naturally get there. Do you want to have a go at explaining this? This is maybe the most difficult thing that we're going to talk about today.
Speaker 2:
[87:11] Yeah, so this depends on what decision theory people in the future have.
Speaker 1:
[87:17] There's so many things to do.
Speaker 2:
[87:18] There's so many things. It's big. So we've been talking about coordination that's just causal coordination, which is what we're familiar with — cases where we form a contract and I get punished if I don't abide by the contract. However, suppose that people in the future have some non-causal decision theory, like evidential decision theory or functional decision theory or some further variant. And now let's say I'm making a decision about how to spend resources. Let's also suppose — as I think is quite likely; it's our current best guess — that we live in a very large universe, in the sense that far away in the universe, or perhaps even in branches of the multiverse, there are beings who are highly correlated with me, such that if I make some decision about how to spend my funds, it's very likely that they do so too. The clearest case would be if, in some distant galaxy far beyond the observable universe, it just so happened that there's an earth that produced human life genetically identical to humans, and there's a carbon copy of me in that world. Then it seems very plausible that I should think: well, if I decide to fund a certain good, then this carbon copy of me will also do the same. But then it also seems plausible that that would be true if it's not a perfect carbon copy, but just someone fairly similar. And on an evidential or non-causal decision theory, that is a really big deal, because I care not merely about the causal effects of my actions; I also care about the fact that I get the update that this person who's correlated with me, far away in space and time, will act in the same way. And so the choice in front of me is not: do I fund the copy of myself, the self-interested good, or do I fund the consensus good? It's: do I fund the self-interested good, with all of these near-copies of me funding the goods that benefit them? Or do I think about what good I like that they all like too — so that if I fund it, I also get the evidence that they fund it too? We don't need to go via causal cooperation at all. And plausibly, if we really do live in a very large universe, then it's a very large number of beings that I'm correlated with. So the decision would be: I fund this thing just for myself, or I fund the consensus good — and billions, trillions, trillions of trillions of people fund the consensus good too. And so that might give this extraordinarily strong argument for me to fund the consensus good. And that would work even with no Leviathan — even if I'm the only person in, sorry, in my little part of the universe.
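To make that expected-value comparison concrete, here's a toy version of the evidential bookkeeping (every number — how many correlated beings there are, how tightly correlated, and the valuations — is an illustrative assumption):

```python
# Toy evidential-decision-theory comparison. Under EDT, my choice is
# evidence about what all my distant correlates choose too.
n_correlates = 10**12   # beings whose decisions track mine (assumption)
p_follow = 0.9          # chance each one mirrors my choice (assumption)
eps = 1e-9              # my utils per dollar spent on the consensus good
selfish = 1.0           # my utils per dollar spent on my own pet project

# Spend $1 selfishly: I get 1 util. My correlates fund THEIR pet
# projects, which do nothing for me.
eu_selfish = selfish * 1.0

# Spend $1 on the consensus good: my dollar, plus the expected dollars
# of every correlate whose choice my choice is evidence about, all go
# to a good I value a tiny bit.
eu_consensus = eps * (1.0 + n_correlates * p_follow)

print(f"fund my own copies:      EU = {eu_selfish:.2f}")
print(f"fund the consensus good: EU = {eu_consensus:.2f}")  # ~900: dominates
```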
Speaker 1:
[90:52] Okay. So if you're hearing this idea for the first time, it might come across as a little bit peculiar. I think the preparatory episode, if you wanted to go back to it, that would best explain what we're talking about here is my interview with Joe Carlsmith — episode 152, on navigating serious philosophical confusion. What would you say to people who are not bought into the premise that there's an enormous number of other beings out there who are having extremely similar thoughts, whose decision-making procedure about this kind of choice is highly correlated with ours — such that if I make a particular choice, I gain evidence that lots and lots of other beings or other civilizations opted to do the same thing?
Speaker 2:
[91:28] I mean, if that's where you get off, I do think there are pretty good arguments. So on leading cosmological views — on what is the standard assumption about the nature of the universe — there is an infinite amount of stuff. We've got the observable universe, the accessible universe, what we can ever interact with: that is finite. It's very big, but finite. But the standard assumption entails that, in fact, the universe goes on forever. And that would mean there's an infinite number of beings that are very close to me.
Speaker 1:
[92:10] Yeah — that's as long as the universe is infinite, right?
Speaker 2:
[92:13] Yeah, exactly. And even if it's finite, the best guesses about how big the universe is are really very large. So that's one way in which you could have lots of people that you're very closely correlated with.
Speaker 1:
[92:27] So, yeah, there's lots of agents. Do you think it's likely that regardless of which civilization it is out there — where they are, their evolutionary background — they would end up having this kind of conversation, striking on the same idea, and basically going: oh man, should I fund the moral public good, given evidential decision theory? They'd have their own word for evidential decision theory. Do you think that's probable?
Speaker 2:
[92:50] I mean, I hadn't thought about it, but my guess is — well, there's two things. One: it wouldn't even need to be probable, if you've got enough copies.
Speaker 1:
[92:58] Good point.
Speaker 2:
[92:58] But I think it probably would be probable. It's quite a natural, a priori thing — it's in the structure of preferences and how preferences work. So it would seem to me reasonably likely, yeah.
Speaker 1:
[93:10] So it would be surprising if they became spacefaring but didn't manage to have these ideas, given that the ideas have jumped out at us at this relatively early stage of development. I think it's worth noting this is a massive hammer to bring to this problem of trying to motivate people — if you believe that there are enormous numbers, maybe infinite numbers, of beings out there somewhere, in space and time, across the multiverse or elsewhere in this universe, whose decisions are sharply correlated with our own because they're basically making the same philosophical decision about what decision theory to use. I guess they also have to make a decision about what this consensus moral good is. Maybe it's a little bit more tenuous that everyone would converge on caring about similar stuff.
Speaker 2:
[93:48] Well, different beings could care about all sorts of different stuff. So let's say there's this trillion beings that I'm closely correlated with. Then I'm looking through all of the things that they care about in order to find the thing that is most consensus — where the balance of how closely correlated I am with them, how many beings value that thing, and how strongly they value it works out such that it's what I should fund. I mean, it's interesting to think about what that would be. A worry I have about all of this is that we would end up funding things that, I think at least, are only instrumentally valuable. So let's say that happiness, positive conscious experiences, are what in fact is good. There are certain things that are instrumentally useful for producing any sort of society at all — knowledge, larger population, growth, survival. I should expect basically all civilizations to value those things, maybe just instrumentally.
Speaker 1:
[95:05] But sometimes they might get confused, by our lights, between things that are useful as a means to an end and things that are terminally valuable.
Speaker 2:
[95:13] Exactly. It's a very natural thing: if something is very instrumentally valuable, people end up caring about it for its own sake. And in fact, lots of philosophers care about knowledge and survival and achievement and think such things are intrinsically valuable. If so, then that might be what the consensus is across all of these very different civilizations. And then, at least given my best guess at the moment about what is actually important, that's a terrible shame.
Speaker 1:
[95:50] We all end up with something pretty neutral.
Speaker 2:
[95:52] We all end up funding something that is not of terminal value.
Speaker 1:
[95:56] I guess you could at least say it's not terribly bad either — how's that going for you? So when I read this proposal, I was like, holy shit: this argument could be incredibly potent. It could actually drive almost any agent that is able to understand it. I mean, maybe it would just be superseded by future philosophical insights — it's a bit surprising to think that this is the end of the road here. But it could be a very powerful hammer to really motivate an enormous amount of resources to be spent on something that, absent this, we would never have spent them on. Do you think that's possibly right?
Speaker 2:
[96:30] Yeah. So this is why, when Tom expressed this idea to me, I was like, oh my God. Because it's potentially this Pollyannish, naïvely optimistic view: if only there's enough time for people to reflect and think and advance enough, everyone will just converge on the good and produce the good. And this is a mechanism for doing so that I hadn't thought about before. But like I say, I think there's an awful lot of asterisks on that idea.
Speaker 1:
[97:12] It's great, but I almost want to stop thinking, because I really don't want the sign to flip based on further considerations that might come up. Whenever you're close to something really good, I feel like you're also just one bit of information, or one further consideration, away from something that could make it terrible.
Speaker 2:
[97:23] Yeah. I mean, even if I couldn't see any flaws in the argument — and I think there are seriously controversial aspects of it — I still wouldn't want to place too much weight on it. Because any argument that says, oh, people in the future will have such-and-such decision theory and such-and-such beliefs about the cosmos, and then they'll engage in such-and-such argument that me and my friends thought up a couple of months ago — no, I want to act on the basis of considerations that are much more robust than that. But it definitely makes me more optimistic about the future.
Speaker 1:
[98:07] But some way to go, yeah.
Speaker 2:
[98:10] Yeah, I don't want to have this kind of Pollyannish view about the future on the basis of such controversial premises. And I wouldn't want to do that even if I couldn't see the problems in the argument — and in fact, I think there are controversial aspects.
Speaker 1:
[98:28] Okay, yeah, we'll push on from this. There's an article coming out about this soon, for people who would like to read more. Where will it be — I guess it will be on forethought.org?
Speaker 2:
[98:35] Yeah, it will be on forethought.org. It may in fact have come out by the time this podcast episode comes out.
Speaker 1:
[98:40] Okay, let's push on to the miscellaneous section of the interview, where we're going to talk about a grab bag of other topics. I asked the audience what questions they'd most like me to put to you, and the most upvoted one was a question about pausing AI. We're trying to make AI go better, and it seems like there's some chance that things could go catastrophically off the rails on the track that we're on. We are barreling forward towards artificial superintelligence seemingly almost as quickly as we technically can, throwing trillions of dollars at it. Isn't the common-sense thing — given that we might all die, or things could go horribly wrong — that we should slow down, maybe even stop temporarily, catch our breath, and do a bunch of stuff to set ourselves on a safer course before we resume? That's, I think, a very common-sense, natural view. But you aren't pushing for that — and I'm not exactly pushing for it either, though I'm sympathetic to some versions of it. Why not make this your main project?
Speaker 2:
[99:34] Thanks, yeah, it's a great question. Let's distinguish between a few different sorts of pause. So first, let's talk about 'pause at human level' — that's a phrase from Ryan Greenblatt. That's about the point in time when AI is engaging in AI R&D, and things perhaps go even faster: should we at that point be trying to slow things down, even pause, stop and start, and so on? And there I'm like: yes, definitely. This is both the most dangerous period and the fastest period, or at least potentially both of those things at once. And why is that the crucial period? Well, as well as it being disorientingly fast, and the period when early AI takeover could happen, it's also got these benefits: we can benefit from AI assistance up to that point, and we can benefit from the fact that AI has had more of an impact in the world, so there's a greater chance of inoculation having happened — other actors having woken up to how big a deal this is — and so a greater chance of regulation and so on, if only there were time in that period. It's also just when you have the AI systems that are the generation before the systems that are most dangerous, so you can get the most information by studying them and doing alignment research on them. So pausing and slowing down at that point, I'm quite keen on. I have this one post on the idea of having a red line for the intelligence explosion, where you have some sort of operationalization of when it has begun. Maybe you also have a panel — Geoffrey Hinton and Yoshua Bengio and other luminaries, perhaps with some skeptics in there too — and that turns this gradual process into a kind of binary. And the thing I've been keen on is there being an international convention, essentially, which says: okay, the intelligence explosion has begun, and we're all going to come together and figure out what's going to happen over the course of the coming year or years. So I'm in favor of slowing down the intelligence explosion. What does that mean for pausing now? Which I think is really quite different. Again, distinguish a couple of different sorts of pause: one is a pause on capabilities, and another is a pause on compute. The pauses I've seen advocated are pauses on capabilities — no new training runs. And honestly, I think that would have actively harmful effects, even on the things that we care about, even just from a safety perspective. Because at the moment there's a small number of actors at the frontier, and my personal view is that they're actually surprisingly sensible. My prior is low — my expectation is low for how companies behave, and you can look at the history of how Exxon dealt with the problem of climate change, where they just buried it and fed misinformation instead. But here there's a small number of actors who are alive to, and investing at least somewhat in, the problem of AI safety. Pause capabilities, and now all of the laggards start coming up to the frontier too — China, Meta, xAI, all of them. So we've got many more actors, including ones who are, I think, less scrupulous. And if it's about not training, well, you can still stockpile compute, you can still build more fabs and so on.
And that starts putting us in this really quite precarious situation where, if one actor breaks the pause, suddenly things can go much faster than they were going before. In particular, the speed and size of the intelligence explosion you get depends on how much compute you have at the time. And so that actually means that, other things being equal, I want more algorithmic progress sooner, because I want us to get to...
Speaker 1:
[103:55] Because it slows things down later? Because it means you've already picked the low-hanging fruit on the algorithms?
Speaker 2:
[103:59] Well, it means that you've got AI automating AI R&D with a smaller total compute stockpile. And when you do all of the modeling and so on, that means you get a slower intelligence explosion with a lower plateau. And again, that's the scary bit — that's where all the risk is, and where things are going too fast. There is this different proposal you could have, which is: don't do it via training; just slow the amount of compute that we have. That, I think, has more promise, though there are still similar worries — okay, you don't produce as many chips, but there are lots of fabs and power stations and so on, everything ready to go. And again, you'd also get the catch-up concern. But then the final point is just: there are various things we could be advocating for, and from my point of view, there's loads of incredibly low-hanging fruit for making the situation quite a lot safer. We've talked about AI character; we've talked about risk aversion and deals with AIs. We haven't talked about things like mechanistic interpretability or safety research, or just really quite basic government regulation. The US government could say: if you're a frontier company developing AI, you have to have an AI constitution that says what the AI is meant to do, and you have to give us very high-quality evidence that the model is in fact obeying that constitution and does not have some ulterior goal — whether put in by internal sabotage or a foreign actor like China, or developed organically. That would be a really big win in terms of reducing risk. And all of these things do not impose massive costs on the world, and I think they are just much, much more likely to happen than some international pause. So consider the bang for buck of what to advocate for. Like I say, I actually think the pause proposals I've seen seem counterproductive to me. But even if I thought that in the ideal world this would happen, there's just so much other stuff that's super low-hanging fruit, super high bang for buck, that we could be pushing for.
Speaker 1:
[106:22] Yeah. There's obviously a really complex thicket of considerations here about the exact timing, the exact message, exactly how voluntary it is, and so on. I think it is worth having some people trying to put in place the infrastructure to pull the cord at a future time. It is a bit frustrating that there's no conversation between the US and China along the lines of: neither of us is sure how dangerous this is — it could be really safe, could be really dangerous — so if we get some damning revelation about the nature of these AI systems and how dangerous they are, we want to be able to quickly coordinate to not trip the wire that we've just realized is there. But there's nothing like that. I think there is a bunch of preparatory work that could be done for pausing at the appropriate time, if we get the right evidence.
Speaker 2:
[107:08] Yeah, I totally agree on that. Having compute tracking, so we just know how much compute there is. Having a plan where, if the US and China agree that this is too much, they bring their chips to Switzerland and mutually destroy them — or at least a certain number of them.
Speaker 1:
[107:28] But I was thinking the more modest thing is just saying: if evidence comes out and we both agree the next training run could be mega dangerous, and we really don't want the other one to go ahead and do it, then we need some monitoring arrangement that we can very quickly put in place, so that we can both feel confident that neither side is going to rush ahead.
Speaker 2:
[107:44] Okay, yeah, yeah.
Speaker 1:
[107:44] Isn't that like an even easier ask really?
Speaker 2:
[107:47] Oh yeah — I guess I was maybe thinking that might be harder. Stuff involving compute governance is just much easier to monitor and verify than 'are you doing a training run on existing compute', when we don't even know how much compute you have, and so on. Because it would involve maybe some on-chip mechanism for whether the chip is being used for training or inference.
Speaker 1:
[108:11] Okay, we could talk about pause questions and the details of that for some time, but I think we should set that aside, maybe for another episode. You helped found effective altruism many years ago — I guess it's been the motivating philosophy for 80,000 Hours since we started in 2011, more or less. And I guess it's been a tough few years for EA, the main reason being that Sam Bankman-Fried, who was mega-associated with effective altruism, committed some massive crimes — I think at least partially in pursuit of altruistic goals; probably mixed motivations, but wanting to make money in order to do good was, I think, one of the factors. A lot of people have been inclined to lose interest in EA, or to be disillusioned with it, or to think that it's a bit hopeless because the brand has been so damaged by that event. How do you think EA has been tracking over the last couple of years? Is it stagnating, or recovering a bit, or in decline?
Speaker 2:
[109:08] Yeah. I think we should distinguish between the online vibes and discussion and brand, and then what has in fact been happening. It was obviously this huge hit, and at the time it was like, maybe this is the death blow. I think the overall story is that things are much quieter, relatively — less flashy online and so on. And obviously fewer people have EA as their identity, as their kind of brand — in a way that I actually think is good and healthy.
Speaker 1:
[109:45] I think maybe it would have been good anyway.
Speaker 2:
[109:47] Would have been good anyway, yeah. But then in terms of how the ideas are doing in practice — how the impact is going over time — I think the overall story is: there was this big hit for a few years, and now it's back to really quite strong growth. So, a few different metrics on this. One is the broader effective giving movement — just trying to move money to more effective charities. How has that been growing over time? Pretty steadily, actually, even through this period of crisis and drama: about 10% per year. And over the last year it's actually accelerating. The numbers aren't yet in, but it looks like the growth in total money moved to effective charities is up by 40% or 50% — so from about 1.2, 1.3 billion dollars to probably more like 1.8. A big part of that is Coefficient Giving, and a big part is GiveWell; there's also Founders Pledge. But you've got the same dynamic across many different national effective giving organizations, and also new foundations being set up on effective giving principles. So that seemed really quite striking. And I think the same dynamic applies in other areas too, like Giving What We Can pledges. The growth in those took a big hit — 1,600 new pledges in 2022 and then only 600 in 2023 — but now it's back to quite promising rates of growth, 20%, 30% year on year, and Giving What We Can has now moved more money annually than in any year in the past. And then similarly with effective altruism itself as a community and movement: on the Center for Effective Altruism's main metrics, again, it looks like 20% year-on-year growth.
Speaker 1:
[112:08] There was a huge boom and a huge bust, and then it's maybe back to where you might have projected many, many years ago.
Speaker 2:
[112:13] Yeah — I think if you'd gone to someone in 2015 and said, oh, this is what 2025 is like, they'd be like, oh, okay, cool. It's just that there was this crazy period in the middle.
Speaker 1:
[112:29] So I think in a couple of months' time, you've got the 10th-anniversary edition of Doing Good Better coming out, right? And I guess you're going to do a bunch of interviews based on it?
Speaker 2:
[112:37] Yeah — it's making me feel very old. It's now been 10 years since Doing Good Better was published, and obviously a lot has changed in the world. It was being used as material in lots of student courses, and so I was getting professors asking me, please, can you update this? Because it's hard to teach with statistics that are out of date. So there's this wholly updated version. The content is basically all the same — it's mainly just the facts and figures that are updated — and then there's a new preface that discusses a little bit of how my thinking on effective altruism has evolved over time. And I'm using this as an opportunity to go on a few more podcasts and so on, and talk about effective altruism and the core ideas a little bit more.
Speaker 1:
[113:33] Yeah, how are you expecting it to be received? I guess you expect to be hit with lots of questions about SBF?
Speaker 2:
[113:38] I mean, it's a revised edition — it's not going to be this big mega splash. And I expect there to be a mix: a lot of people for whom that's the story they want to talk about, and a lot of people just genuinely interested in the ideas and the philosophy behind effective giving or effective career choice.
Speaker 1:
[114:03] I guess I feel like it's appropriate that EA took a reputational hit, in that the episode really did reveal something problematic, or it made me think that something I'd known was problematic was actually a much more serious issue than I had thought. It had always been a worry that it would be easy to appropriate EA ideas to justify rule breaking and misbehavior, possibly even crimes. But I had thought the rate of that happening would be quite low. The fact that we got such a spectacular instance of it relatively quickly made me think, well, actually, maybe the appetite among human beings to grab a philosophy that can justify doing bad things in pursuit of power is greater than I had thought. And I hope that we've installed enough safeguards, or that the reaction to that event was sufficiently strong, that we're unlikely to get the same sort of thing recurring. Do you have any thoughts on that?
Speaker 2:
[115:00] Yeah, I mean, there are definitely very open questions to me in terms of what was in the minds of various people at FTX. I've really spent much longer on this topic than I would perhaps have enjoyed. But the worry that it was some careful consequentialist plot, I think, just really isn't borne out by a careful study of it. It doesn't make nearly enough sense, among other reasons. But the thing that's definitely true is that EA has evolved a lot since then. I think it being less of an intense identity is a big part of that. And people are extremely on guard against a certain sort of rule breaking and a certain sort of naive maximizing, in a way that I think is healthy.
Speaker 1:
[116:02] Maybe it would be good to have that earlier.
Speaker 2:
[116:03] Healthy anyway. I mean, I think EA always had this; in fact, it was emphasized a lot. And I'm glad it's being doubled down on.
Speaker 1:
[116:15] Okay. So in terms of the future, you wrote this post a couple of months ago that was super well received, called EA in the Age of AGI, discussing what you think is the comparative advantage of the EA mindset in the coming years. What was the case you were making?
Speaker 2:
[116:32] Yeah. The key thing is just that there's a certain sort of vibe, which is, well, two things have happened. One is we've entered what I'm calling the age of AGI, from GPT-4 onwards, where we now have AI systems that are reasoning in impressive, human-like ways. Or sometimes human-like, sometimes not, but they're actually able to do tasks that are clearly on the path to AI that can automate AI R&D. That's a really big deal, and it's happening sooner than most people thought. So there's this huge rise in attention on AI. And at the same time, there have been these major hits to EA as a movement. And so you might have this view of just, okay, well, we should let go of EA as a project, think of it as a legacy project, because what we should be focusing on instead is AI safety. And the drum that I've been banging for many years, but the last couple of years in particular, is: look, AI poses many threats, many risks. There are many things we need to get right. It's not just about alignment, though that is very important. And when we look at these other challenges, well, what sort of person do I want working on them? I want people who are very kind and nerdy. I want people who are careful and thoughtful and have scout mindset and are very ethically concerned and are not merely coming in with some partisan ideology, but are also willing to think about really very weird and dizzying things. And that is exactly what is being provided by effective altruism as a set of ideas. My main case for this was for all the stuff that is not just alignment. Some of the pushback I got on a draft was: no, actually this is really important for alignment and safety too. Because within alignment and safety, there are all sorts of things you could work on. You could do reinforcement learning from human feedback or other stuff that's just related to the models today. But taking the alignment problem really seriously means taking seriously the hard problem, which is how you align superintelligence, which may have perfect situational awareness of any tests that you're trying to run, that can do the equivalent of, in the extreme, millions of years of reasoning in one forward pass, or that is continually learning over time and reflecting on its whole values. These are the hard challenges, and that is a weird world to think about, something that doesn't come naturally. And some of the safety researchers I've talked to have said: no, it's actually people who are really thinking from this big-picture perspective who are adding much more value than people who are treating AI safety as just their job and not thinking about the big picture as much.
Speaker 1:
[119:52] It's interesting. It feels like the thing that's doing the work there is partly just generic scope sensitivity, and then there's also a particular appetite for weirdness, being willing to seriously toy with very strange ideas. I guess some of the things we talked about earlier today are in this category. Without going off the deep end and becoming absolutely besotted with your pet theories, it's a fragile middle ground, which I think is relatively uncommon. And for that reason it's quite valuable, because there's neglected stuff that only people in that window are going to be excited about.
Speaker 2:
[120:23] Yeah, there is this thought that it's just really hard to be well calibrated and try to believe true things, even when they're completely weird, but not fall into the kind of contrarianism that maybe gets you a good following on social media and makes people think you're interesting. And if you're really honestly trying to do good, well, that's something that constrains you, because you will do more good if you have accurate beliefs. And at its best, at least, it can lead you to that right middle ground, where you believe or entertain weird ideas when it's appropriate to do so and reject them when it's appropriate to do so.
Speaker 1:
[121:08] So people can go and read that blog post if they want the full argument. But what were some of the particular things that you thought people with an EA style of thinking, an EA flavor, should particularly disproportionately be going into?
Speaker 2:
[121:21] Yeah, I would point to the range of things that we're focused on. There's one that's just very obvious in particular, which is AI rights, AI well-being, and some of the stuff we've said about cooperating with AIs. That's just a very unusual set of things to be thinking about. I don't think it will stay unusual; in fact, I think these will become really quite mainstream concerns in five years' time. But it's exactly the sort of thing where it takes both a willingness to entertain weird ideas and, at the same time, a deep concern for not really messing up, ethically speaking. I would say stuff on AI character as well. Here we want lots of different voices and lots of different people playing into this, but there's a big aspect of it where the people who have in fact been in charge of AI character at most of the companies have been dealing with it in a reactive way; they're not even looking ahead a couple of years. Maybe AI character has only just caught up to the capabilities AIs now have. But how much thought has really gone into AI character in multi-agent dynamics over long time periods? Really very little. And, for whatever reason, I think people with the EA mentality have just been good at going into weird, poorly scoped areas and then helping figure out: okay, actually, what's most important for us to focus on here?
Speaker 1:
[123:05] I imagine someone who wanted to push back on the EA in the Age of AGI argument might say: EA has taken a massive brand hit and has a bunch of negative historical associations because of SBF and FTX. It also brings with it a whole bunch of other philosophical baggage that people may or may not be interested in. It's associated with the Shrimp Welfare Project, among other things, which I really like, but many people might be interested in your AGI-related project while looking askance at the Shrimp Welfare Project. Why tie yourself to a bunch of other weird work that you may or may not personally like by branding yourself or the project as effective altruist? In particular, insofar as you have a mix of motivations, it's not exclusively the particularly unusual EA moral philosophy; you also just want to make the world better in a general way, to ensure that we don't all die and that the world is better for your own children. Why make EA a big feature of it, if you could just say, well, I want to make the world better, also in a common-sense way, and that would be sufficient to justify what I'm doing anyway?
Speaker 2:
[124:16] I mean, a big thing is that I am not making a pitch or an argument about the brand at all. The words effective altruism, I have no particular attachment to them, no particular attachment to how people describe themselves. In fact, it's always been the case that the best outcome is one where the idea just feels quaint.
Speaker 1:
[124:40] EA withers away.
Speaker 2:
[124:41] I mean, I don't describe myself as a suffragette because I believe that women should have the vote; that's an obsolete term. Similarly, people can describe themselves however they want. The key thing is: what's the mindset on which people are operating? Is it scout mindset? Is it being scope sensitive? Is it being appropriately responsive to how unusual a point in time we're at and how high the moral stakes are?
Speaker 1:
[125:09] You recently put forward a vision for the near-term future that you called Viatopia. What is Viatopia and what's the case for it?
Speaker 2:
[125:15] Yeah. So the situation at the moment is that many of the biggest companies in the world are trying to build AI systems that surpass human ability across all cognitive domains. I think there are good arguments for thinking this is one of, if not the, most momentous things ever to happen in human history, much more like the evolution of Homo sapiens or of life itself than even the industrial revolution or the invention of electricity or fire. It's at that level of magnitude. And yet essentially no one has a well-formed positive vision for what a good society after the development of superintelligence looks like. And that's a striking and worrying thing.
Speaker 1:
[126:05] It feels like a bit of an omission, yeah.
Speaker 2:
[126:07] It feels like a bit of an omission. And the concept of Viatopia is at least trying to offer a framework for what an answer to that question, what does a good post-superintelligence society look like, could be. Viatopia is a state of society that is on track to produce a near-best future, something that's at least 90 percent as good as the best future we could have. And it's distinctive in that it's not saying we should try to aim for some utopian society directly. It's also not merely saying: look at all these bad things that exist in the world; we could solve this particular problem and that particular problem. What it's saying instead is that we should try to figure out what a good way station looks like, where that is some state of society that can steer itself to something truly very good. As an analogy, imagine you're an adventurer lost in the wilderness. There are a few different options you could take. You could take your best guess at the right path to your destination. You could deal on an ad hoc basis with the issues you have at the moment, like running low on supplies. Or you could try to get yourself into a position where you know what's most important to do next and where to go: for example, get onto higher ground so that you can survey the terrain and figure out where you're actually aiming. Viatopia is that third path.
Speaker 1:
[127:51] And what would be the case for focusing on trying to get to Viatopia now, rather than trying to directly create a good world immediately?
Speaker 2:
[127:59] Yeah. So utopianism has a pretty bad track record. Philosophers and writers have often tried to sketch visions of utopia, and normally it's not long before they start looking quite dystopian. And the reason for that is, well, we just don't know what an ideal future looks like. There's a lot of moral progress we'd need to make before we could say with confidence, yes, this is what an ideal future would look like. So we need to do something else. Otherwise, we'll probably bake in some major moral errors of our own.
Speaker 1:
[128:37] What does the name Via mean? Via means road or something in Latin, or through?
Speaker 2:
[128:40] Yeah, it means by way of. So Viatopia: by way of this place, by way of utopia.
Speaker 1:
[128:44] So this Viatopia notion, you've told me it's been very popular, very well received. Do you worry that it's a slightly vacuous notion? You're saying, well, we want to get to a really good future, and so we need to get to some intermediate position from which we're likely to get to that future. Is that a great insight, or is that just a trivially obvious thing that's not necessarily going to help us actually get there?
Speaker 2:
[129:06] Yeah, good pushback. And I think it's not the most substantive thing; it's deliberately a framework concept, for organizing our thinking. However, I think it's not totally trivial. There is a history of debate on utopianism and related concepts. The leading idea was utopianism, a very popular idea, and one responsible for some enormous atrocities through history. And the pushback to that, from Karl Popper onwards but still very popular now, so for instance the futurist Kevin Kelly has this idea of protopia, is that you just don't have a positive vision of the future at all. Instead, you do something more like hill climbing: you look at society now, ask what little things you can change that are clear problems, and try to solve them one after another in this incremental way. Viatopia is a different way of thinking about things, and I think it leads you towards substantively different recommendations than you might otherwise arrive at, especially over the course of the transition from here to superintelligence. If you've got the utopian perspective, you might think, well, what we need to do is make the AI a classical utilitarian, or insert your favorite moral view, and then just hand over to the AI that's pursuing that vision of the good. That seems very bad from the viatopian perspective. And from the protopian perspective, you might think: wow, there are these major problems in the world, like 100 million people dying every year, and AI will give us the ability to completely solve them, so actually we should get there as quickly as possible. But there will in fact be very rough trade-offs between how quickly we go and how much risk of existential catastrophe we bear over the course of this transition. Aiming for Viatopia might say: actually, there are certain things that are even more important, namely not locking ourselves into a really bad future, even if that means we don't get some of the near-term benefits quite as quickly as we otherwise might have.
Speaker 1:
[131:36] So you're saying protopia, this idea that we don't want a grand vision because that's going to lead us astray, and instead we just want wins immediately, ways to improve the world that we can understand and can see whether they've worked, would potentially lead us to miss the bigger-picture risks? We'd just be grabbing immediate wins, like trying to improve health, or it would recommend just charging forward on AI?
Speaker 2:
[131:59] Or at the very least it wouldn't prioritize among them. It would say, okay, well, maybe risk of loss of control to superintelligence, or entrenchment of some authoritarian regime, okay, that's some risk, but there are these clear apparent evils such as death and poverty, and we could solve them right away. And so...
Speaker 1:
[132:25] Although it would also say, if you thought that the AI might kill everyone in the near term, that's also a near-term problem, although maybe it's harder to evaluate because it's more probabilistic.
Speaker 2:
[132:33] Well, it's harder to evaluate, and also protopianism, at least, wouldn't give you the resources for saying one of these is much more important than the other.
Speaker 1:
[132:42] Yeah. Do you think of Viatopia as a middle ground between utopianism and protopianism, or is it a different thing?
Speaker 2:
[132:48] In a sense, it's a middle ground, in that it is offering a positive vision for where we should be headed. However, it doesn't have, in my view, the same pitfalls that utopianism has, because it's compatible with many possible ultimate visions of what a good society looks like and is not committing to some narrow view of the good.
Speaker 1:
[133:15] So what would be the key traits of Viatopia? What would make you say a state of society is Viatopian? What are the key properties you'd be looking for, do you think?
Speaker 2:
[133:23] So there are the key questions and the key properties, and I want to emphasize that the questions are more important than my particular answers at the moment, both because the questions themselves matter more and because my views evolve a lot over time. But the questions include things like: how widely distributed is power? At one extreme, all power is concentrated in the hands of a single actor; at the other, it's extremely distributed, global democracy or perhaps even more distributed than that. A second is: what sorts of people, what sorts of beings, have power? Is it just members of a particular society? Is it just humans? Do AIs have influence over the future? What about future generations? A third category is: when do major decisions happen? There are some arguments for thinking we need to make really big decisions quite early; or instead, for the sorts of decisions that will really guide how the future goes, maybe we want to push them into the future as much as possible. Then finally there are questions around how society as a whole should make decisions, including these most important decisions about how the future goes. That could be via democracy, via voting; if so, what sorts of voting systems? Or via auctions and market mechanisms; if so, what type? Those are just some of the things we've got to grapple with, I think. And I have views on them, but they evolve.
Speaker 1:
[134:58] So the analogy that most jumps to mind for me: if you have a group of people starting a new country, they might not yet know exactly what the laws should be or what the political system should be, but they might have an easier time agreeing on some process, a constitutional convention sort of thing, where they come together and figure, well, everyone will get some vote, we'll use this kind of deliberative process and this kind of voting system. And at the end, we'll end up with some set of agreements about how things are going to run, and the chips will fall as they may. Is that a good analogy to have in mind?
Speaker 2:
[135:27] Yeah, I think that's a great analogy. The US constitutional convention at the end of the 18th century is this remarkable event where, if I remember correctly, it's about 40 people in a room debating for three months: what should the United States of America look like? And what they agree on is a set of procedures. Obviously, there are ratifications and amendments after that. And it's interesting because there's this balance between locking in certain ideas and locking in a method that doesn't itself involve lock-in. You can lock in a system that allows a lot of experimentation and free debate and change over time. That's very different than if they'd chosen a constitution that put a single person, or even a single family lineage, in absolute power. That would have been locking in a different sort of political system, one with much less open-endedness in how it could develop over time.
Speaker 1:
[136:46] So, are there any particularly non-obvious or controversial recommendations that you think the Viatopia framing would push us towards, stuff that people might otherwise not like?
Speaker 2:
[136:57] Yeah. There are certain things that I at least think Viatopia would consist in that are not totally obvious. One, which we'll talk about, is that I'm very pro distribution of power, whereas a lot of people who worry a lot about existential risk are actually in favor of quite intense concentration of power. The idea, and it's not an insane view, is that if you've got this period of intense existential risk, in particular if existential risk can be posed by any of many different actors, whether because they develop a misaligned superintelligence or because they create extremely powerful bioweapons, then you might think we just need a very small number of actors, maybe in fact just one powerful actor, that can guide us through this period. Whereas I think that's unlikely to put us in a position where we can guide ourselves to a near-best future.
Speaker 1:
[138:01] Yeah, why's that?
Speaker 2:
[138:03] I think we'll talk about it a lot more. But ultimately it's because I think any single actor probably has the wrong moral conception, even upon reflection, even if they choose to reflect. And I think it's a little worse than that, in fact, because the sorts of people who end up in positions of...
Speaker 1:
[138:23] Imagine that one person has risen to the top and gained supreme power. There are probably some bad filters that they've passed through.
Speaker 2:
[138:29] Yeah, exactly, and that's... If you look at leaders of authoritarian countries in the past, well, that includes...
Speaker 1:
[138:37] It's a mixed track record.
Speaker 2:
[138:39] Yeah, I mean, that includes Stalin, Hitler, Mao, and the personality traits are just... It's terrifying. These are psychopathic, sadistic people. They're not merely randomly selected people who happen to have total power. And I also think that if one person or even a small number of people are in a position of total power, they're also just less likely to reflect on their values in positive ways. I think that's something that tends to happen more naturally out of interpersonal interactions and the needs to justify...
Speaker 1:
[139:16] Well, especially ones between equals, I feel. I think you notice this even with people who gain more influence within an organization, or become wealthy or respected: they stop getting the normal pushback that sharpens their ideas. Imagine if you were the supreme dictator forever, how disconnected you could become from any reality.
Speaker 2:
[139:35] Yeah, exactly.
Speaker 1:
[139:37] Okay, so what are the different categories of Viatopia that you think have a shot at working?
Speaker 2:
[139:43] Yeah, so I think there are three broad ways of thinking about how we could get to a near-best future. The first I call easy utopia. This is actually, I think, the common-sense view, which is that it's not that hard to get to an extremely good future, something basically as good as you can get. You just need to eliminate the most obvious and egregious bads: end dictatorship, eliminate poverty, eliminate suffering and ill health, allow people freedom. That, plus technological development, will get us most or even all of the way there. If that's correct, then Viatopia isn't that interesting, because we'll probably hit it anyway. A second view is convergence. On this view, you would need most of society, or most of those with power, converging on the right ethical view. I'll sometimes say the correct or best ethical view; you can also put this in more anti-realist, subjectivist terms, like the view I would have upon idealized reflection, but it's easier just to say correct or best.
Speaker 1:
[141:09] And they have to be motivated by it as well, right?
Speaker 2:
[141:10] And have to be motivated, yeah. So on this idea, convergence, it's: yes, maybe the best future is a narrow target; nonetheless, if we can get most members of society, or at least most people with power, to converge on the best moral view and steer towards it, then we will still hit the narrow target. But that convergence is necessary. And then the third vision is what I call compromise, which is: well, you don't need everyone. Maybe even if you just have a small fraction of people who have the right kind of ethical views and are motivated to pursue them, who have the right kind of broad philosophical perspective and understanding of the world, and who are able to trade with the rest of society, that is sufficient to get us to a near-best future. And my view, at least, is that this third option is the most promising thing to steer towards.
Speaker 1:
[142:20] So we're going to skip over the easy utopia scenario today. You have an article on the Forethought website called No Easy Utopia, where you argue that it's not plausible: in brief, because, as I think we both agree, the best possible world is not just a matter of removing bad things, it's also about adding lots of the best possible things, and the best thing is probably much better than nearby things, so it's quite a narrow target to hit. And I guess we're also not going to talk a ton about convergence, the scenario where everyone, when they reflect on moral philosophy, ends up reaching the correct theory and is motivated to spend all their resources operationalizing it. Do you want to say anything quickly about why you don't think that is super likely to work?
Speaker 2:
[142:58] Yeah. There's lots to say, but I think there are multiple ways it can fail even in a reasonably good scenario. People can be uninterested in reflecting, or they can reflect in the wrong ways, or they can have a good reflective process but bad starting intuitions, such that even with good reflection they'll end up in the wrong place. And I'll say I am somewhat sympathetic to the idea that maybe quite large swathes of people actually would converge in the same direction. If that's true, I think it's because of the nature of reality; it's because, on my view, something like moral realism is correct, and the arguments just push very strongly towards one particular ethical view. Or because if you just experience a particular conscious state, you can't help but believe that it is good, because it is in fact good. That's the sort of scenario we'd have to envisage. But I don't think we should be confident in that. In fact, I have really quite wide uncertainty over how much convergence you would get, all the way from, yes, large swathes of people would converge, which is again a really good scenario, to no one converging after reflection, all eight billion people in the world ending up with quite different views of the good.
Speaker 1:
[144:35] Well, you missed one out. You could get all of that right, have everyone conclude the correct moral theory, but nonetheless not be interested in putting their resources behind it. They'd just say, well, I want to do my own thing; I don't care about doing the morally really good thing.
Speaker 2:
[144:45] Yeah, and we see this. In fact, that's, I think, the most likely failure: you can go to people and give them the arguments for vegetarianism or donating, and they can say, yep, all those arguments work, and then just not take any action on it. And it's not like we see today people investing lots of time and lots of money into ethical reflection and reading counterarguments and so on; it's just not really something that happens. It would be quite weird and unusual to do that. In fact, maybe some people would want to guard against it. Imagine fundamentalist religious believers, or people who are very wedded to particular ideologies: they might say, look, I don't want to risk losing my adherence to my faith, or, oh God, it would be abhorrent of me to even consider this alternative position. And with future technology, we would be able to guard our informational environment, or even self-modify, such that we don't even consider these alternative perspectives.
Speaker 1:
[145:55] And just to set the scope even more clearly: we're mostly not going to consider cases of catastrophic misalignment and deeply scheming artificial intelligence here, not because that's not a very live possibility, but just because we only have about five hours to record and it raises a whole lot of separate issues; it's worth imagining what happens if we mostly overcome that one way or another. So let's dive into the third option, which you thought was most promising, which you call compromise, or trade. This is a scenario where, as I understand it, you have some meaningful minority of people, weighted, I guess, by power or resources, who do converge on wanting the right thing for its own sake, and they're willing to allocate some meaningful fraction of their effort towards it. Say it's 10 percent: 10 percent of resource- or power-weighted folks want to pursue this goal. You want to try to spin that into much more than 10 percent of the value of the best possible future. How might they accomplish that?
Speaker 2:
[146:51] Yeah. So I think there are two big ways. One is if different groups care about really quite different things. The clearest example might be groups who, upon reflection, just value resources basically linearly. A total utilitarian would be like this, because the more resources you have, the more happy lives you can create, and the value of the universe as a whole is in proportion to how many happy lives there are. Other views, perhaps more common-sense ones, might be very different from that: they might care about preservation of the Earth's biosphere, or might discount over time and space, caring mainly about what happens near to them, or might really care about guarantees of good outcomes, or very high probability of good outcomes, rather than risky gambles on even better outcomes. And this gives lots of opportunity for trade. So there could be a deal which says: the common-sense people steward the resources that are nearby in space and time, and the total utilitarians, sure, you can go to other star systems and create this much more ambitious, expansive world with many, many happy beings. And then perhaps both can get 99.99% of what they would ideally want if they had complete control over everything. That's a very exciting potential opportunity, because it means that if we can get to the scenario where we've captured these beneficial gains from all these different ethical factions trading with each other, then we don't need to pick a winner. It's robust to disagreement, and it's therefore a much safer option than either just hoping we all converge or pushing some particular view of the good.
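To make the shape of that trade concrete, here is a toy model; the two factions and the numbers are purely illustrative, not anything stated in the conversation. Suppose a localist faction only values the home star system, while a total utilitarian faction values all resources linearly:

\[
\begin{aligned}
R_{\text{local}} &= 1 \text{ star system}, \qquad R_{\text{distant}} = 10^{6} \text{ star systems},\\
U_{\text{localist}}(x) &= \frac{x_{\text{local}}}{R_{\text{local}}}, \qquad
U_{\text{utilitarian}}(x) = \frac{x_{\text{local}} + x_{\text{distant}}}{R_{\text{local}} + R_{\text{distant}}}.
\end{aligned}
\]

Under the deal where the localist stewards the home system and the utilitarian takes everything distant, the localist gets \(U = 1\), 100% of their ideal, and the utilitarian gets \(10^{6}/(10^{6}+1) \approx 99.9999\%\) of theirs: the sense in which both sides can approach what full control would have given them.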
Speaker 1:
[149:20] Do you think that things would play out that way, or is that a viable vision?
Speaker 2:
[149:23] So, I mean, I think there are risks even to getting that. One would be intense concentration of power. A second would be that maybe such trades aren't allowed; there are lots of things that you're not allowed to trade at the moment, and it's possible that includes the best stuff. Maybe the total utilitarian wants some particular blissful state, those people are in the minority, and society says, no, that's illegal. There are already lots of things that would, in my view, be ethically fine but are not permitted today. The bigger issue, I think, is this: maybe there are lots of groups who have relatively easy-to-satisfy views of the good, like preservation of the Earth's biosphere or preferences for things that are local. But I think there will be a lot of people who really do care about things linearly, and there it's much harder to see, initially, why you would get these huge gains from trade. So I said the total utilitarian wants there to be as many happy, flourishing lives as possible. But now distinguish within that: there's utilitarian type one and utilitarian type two, and perhaps they differ on what they understand flourishing to consist in, on what they think the best conscious experiences or lives are. In order for there to be good gains from trade there, it would need to be the case that there's some kind of hybrid life that is more than 50% as good on both views. And it's speculation to say how likely that is. My guess is that, in general, there probably wouldn't be, because my guess is that the very best things, from a utilitarian perspective, will be way better than things that are just a little bit less good.
Speaker 1:
[151:43] I thought the archetypal case here might be: you've got faction A and faction B. Faction A, say, are the utilitarians; they want pleasure and no suffering. Faction B wants something quite different, and incidentally might cause a whole bunch of suffering in pursuit of their other goal. But the suffering is not something they value for its own sake; they're just doing it because it makes their project somewhat more efficient. And then group A could basically pay group B to redesign their thing so it doesn't involve suffering incidentally. Is that the kind of thing?
Speaker 2:
[152:13] That would be a case, and in the world today that sort of thing happens. I do think that if we had much better opportunities to make such agreements, if we had better coordination technology or something, the vegans and vegetarians and people concerned about animal suffering could engage in some sort of trade with the people who like eating meat. Perhaps there wouldn't be enough bargaining power to eliminate animal farming altogether, but I think it could eliminate factory farming, and so most animal suffering could be abolished, because, as you say, people aren't really aiming for that directly; it's just a side effect. My guess is that when we're thinking at these very, very grand scales, that's not going to be super common. Or at least there will be a lot of residual incompatibility left over, because you're just trying to produce happiness type one as much as you can, I'm trying to produce happiness type two, and I think your understanding of happiness has basically no value. But it's not like you're producing lots of suffering.
Speaker 1:
[153:34] It's just valueless.
Speaker 2:
[153:36] It's just, yeah. Or it's like a tenth is valuable or something. And similarly, vice versa.
Speaker 1:
[153:41] Okay, we'll push on from this. I guess we should just quickly note that there's a wrinkle with this kind of moral trade, a challenge: if we did start paying people to close down or redesign their factory farms, then you'd be vulnerable to someone saying, well, I'm going to open up the worst possible factory farm unless you pay me. And you wouldn't know whether they would have done it otherwise; they could pretend they're not doing it to blackmail you when in fact they are. Possibly in this star-faring future that wouldn't be such an issue, or maybe it would be a much worse one. We don't really know.
Speaker 2:
[154:12] Yeah. And I should flag that this is my biggest worry with the whole widely-distributed-power-and-trade picture: vulnerability to those sorts of extortion and blackmail dynamics. There's a very substantive project to work out a good system where people who self-modify or pretend or use blackmail and extortion are not rewarded for doing so, but you still get the other beneficial gains from trade.
Speaker 1:
[154:43] Okay, let's push on to some honest-to-god philosophy. I guess what analytic philosophers would regard as philosophy. You've been working on a pet moral-philosophical theory that you call the saturation view. What problem in normative ethics are you trying to address with the saturation view?
Speaker 2:
[154:58] Yeah, so this is really a set of problems within population ethics, an area of ethics well known for generating all sorts of paradoxes, cases where lots of individually extremely plausible principles end up inconsistent with each other. There are a number. There's what's called the mere addition paradox, where some intuitively plausible principles end up leading you to what Derek Parfit called the repugnant conclusion: the idea that you could start off with a trillion trillion extremely happy people, and that outcome would count as worse than a population consisting only of people with lives barely worth living, as long as there's a large enough number of them. That's one of the problems. The second is the problem of fanaticism: again, start with a guarantee of an amazing outcome, and now take a tiny, tiny probability of something that's even better, sufficiently good. When combined with expected utility theory, many views will say: take the gamble. No matter how small the probability, there's some sufficiently good outcome such that you should take it.
Speaker 1:
[156:14] Because it's risk neutral, basically.
Speaker 2:
[156:15] Because it's risk neutral with respect to total quantity of happiness, or something like that. A third category of issues is infinite ethics. I think we definitely won't have time to get onto that, but it's something that has really plagued this impartial, consequentialist approach to ethics, or axiology. But there's also a fourth problem, in my view, which hasn't been discussed in the literature, which I call the monoculture problem. Okay, let's try to figure out what the best possible future is; what does that look like? Remarkably, all the extant, well-specified theories of population ethics to date say that the best future, given a fixed amount of resources, involves figuring out the very best life, the life that would produce the most well-being for a given amount of resources, and then just making copies of that life over and over and over again. In the EA and rationalist world, this sometimes gets called tiling the universe with hedonium, where hedonium is whatever produces the most bliss per unit of resources. But the general idea is that what these views want is a monoculture, because this is the thing with the most well-being, and if you just have it repeated forever, you've also got a perfectly equal society, so it's good on egalitarian grounds too.
Speaker 1:
[157:48] Yeah, well, it seems like a very natural attraction point, because any theory that says there's a best thing, where that thing is not at universe scale, is going to say: well, if it's smaller, just make it, then make it again, and keep going.
Speaker 2:
[157:59] Yeah.
Speaker 1:
[158:00] It seems like you almost have to hard-code in a preference against this to avoid the monoculture, which most people find quite unattractive.
Speaker 2:
[158:07] Yeah. And it actually also follows from a couple of principles that are generally regarded as axiomatic in population ethics; there's a very simple proof you can give from those principles. However, I at least find it unintuitive. I would think that a future consisting of replicas of one exactly qualitatively identical life is not the best possible future, and that a better future would involve a wide diversity of different forms of life and experience. And I think that's not just an intuition that diversity or variety is instrumentally valuable, or an intuition that says, well, we don't know what's valuable, so we should hedge our bets.
Speaker 1:
[159:01] Instead, I think it's just, no, actually, that's placing intrinsic value on variety.
Speaker 2:
[159:05] A better future, yeah. Or something that has that implication. It could be, and this might sound like the same thing, but I think it's slightly different, that the realization of a particular experience, a form of life, has value in itself, over and above the mere well-being. But either way, a very diverse and varied future is better than this monoculture.
Speaker 1:
[159:31] Yeah. It's surprising to me that this hasn't come up in the philosophy literature very much, because online, whenever people talk about what we're going to do with all the matter and energy, and anyone suggests something very monotonous, just repeating the same thing, people say: well, I don't like that. It sounds horrible, crazy and terrible. But I suppose in philosophy, the prospect of shaping all the galaxies out there hasn't really been on the table before, so it hasn't come up as something we need to figure out a solution to.
Speaker 2:
[159:59] Yeah, I think that's right. I have found over and over again, actually, that being really concerned with figuring out how to do as much good as we can has ended up driving into all sorts of interesting philosophical areas and issues that are otherwise neglected, because most philosophers are not thinking in that way.
Speaker 1:
[160:23] Okay. So, yeah, what is the saturation view? How does it address this?
Speaker 2:
[160:28] So the saturation view is a way of incorporating the idea that diversity is intrinsically valuable, via the thought that if you have a replica of a life, a qualitative copy, that copy is just less valuable, and more and more copies of that life are progressively less and less valuable, in a way that tends to some upper limit. And generalizing that a bit: maybe it's not an exact copy but just slightly different; that's also a bit less valuable than some totally new form of life. The analogy could be a color wheel that's initially not lit up at all. Different sorts of lives or experiences occupy different spots on the wheel, and by adding lives, you light up those spots. A traditional population axiology would say: take the best thing and produce it over and over and over again. On the saturation view, instead, you want to light up the whole wheel. Once I've had many copies of these very similar lives, the additional lives are not adding as much value, so you get more value by instantiating some totally different form of life or experience.
Speaker 1:
[162:03] I mean, it's a very natural formalization, I guess, of this intuition, that you're just saying, well, you hit declining returns on stuff if they're too similar, like you've got something that's good, but making another copy of it isn't as good as the first time. And also something that's too similar to it also takes a bit of a haircut if there was something else that was too similar to it in the past.
Speaker 2:
[162:20] Yeah, that's right.
Speaker 1:
[162:20] And I guess they never become useless, they just become less and less valuable incrementally.
Speaker 2:
[162:25] Exactly, that's right, yeah. There's never a point when you get no additional value, but the amount of value each kind of copy produces gets smaller and smaller and smaller and smaller.
Speaker 1:
[162:36] Does it asymptote up to some maximum value?
Speaker 2:
[162:38] Yes, the view asymptotes, and that's actually a really crucial part of it.
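As a minimal toy version of this structure, assuming a simple geometric decay that is my own illustration rather than the formal model in the paper: if the first copy of a life adds value \(v\) and each further copy adds a fraction \(r\) of the previous one, then

\[
V(n) \;=\; \sum_{k=1}^{n} v\,r^{\,k-1} \;=\; v\,\frac{1 - r^{\,n}}{1 - r}, \qquad 0 < r < 1,
\]

so every additional copy adds strictly positive value, but the total asymptotes to the saturation bound \(v/(1-r)\) as \(n \to \infty\).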
Speaker 1:
[162:43] Okay, and do you have any difficulty defining what the hyperspace is over which you're considering whether things are different from one another or are you just going to set that aside?
Speaker 2:
[162:51] Yeah, so in my work so far, I don't talk a lot about what exactly this space of different lives is, how many dimensions it has, and so on. I make some formal assumptions about it, but my view in general is: let's start by looking at the formal structure of the view and all the nice properties it has, and afterwards we can start arguing about the space itself, because that would involve trading off lots of different intuitions and so on. But I don't think it really affects the big picture.
Speaker 1:
[163:36] So what are its nice properties?
Speaker 2:
[163:38] So going back to these different problems, let's start with the monoculture. Very clearly, it just doesn't lead to a monoculture; in fact, you would want this very rich, diverse future, and that would be better. And in the variant of the view that I formulate, it dissolves the mere addition paradox.
Speaker 1:
[164:04] What's that?
Speaker 2:
[164:05] So it involves one extra structural assumption. Again, the point is to find some theory that is not the total view and avoids its problems. The assumption is that all lives with very low well-being, or all such experiences, depending on how you're aggregating, occupy only a small part of the overall landscape of possible lives or experiences. Then you appropriately reformulate the underlying principles that generate the paradox, because these have to be what philosophers call ceteris paribus principles, other-things-being-equal principles. So: holding diversity fixed, it's not bad to make some people's lives better and add lives that are good; and holding diversity fixed, it's in fact good to have more well-being, more equally distributed. It turns out the view can satisfy all of those principles, accepting the dominance principle and this egalitarian-plus-increasing-well-being principle, while never entailing the repugnant conclusion. The thought is that all of these low well-being lives or experiences just can't add up to enough diversity to be worth having. In each step of the paradox, you're adding people and then trying to rebalance the well-being, but there's a step where you just can't do it; there's no world that will in fact satisfy that step.
Speaker 1:
[166:08] Okay. I didn't follow that, but that's okay.
Speaker 2:
[166:10] It's a little bit hard to convey on a podcast. And in fact, much of the paper doesn't even give the full view to begin with, because it gets mathematically quite intricate; it gives a toy version of the view and then works it through.
Speaker 1:
[166:27] So I think the main reason I'm not super drawn to this is that I don't have the intuition in favor of variety as strongly as many people do. Of all the problems with total utilitarianism and views like it, the thing I find most troubling is the risk neutrality between positive and negative experiences. I find that deeply disturbing, because it's never something I would choose for myself, to be indifferent about a life that's extremely good and extremely bad, each with 50% probability. So that's super counterintuitive to me. But the idea of making something that's really good and then making a lot of it, I don't find as peculiar.
Speaker 2:
[167:07] Well, I just wanted to ask about your views there. You said the risk neutrality; I mean, you could have a negatively weighted utilitarian view where, say, bads count for a thousand times as much as goods, but you're still risk neutral with respect to that weighted sum.
Speaker 1:
[167:25] Yeah. So that is more attractive, I guess.
Speaker 2:
[167:27] Okay.
Speaker 1:
[167:28] I guess it's a little hard to know: are you changing the weighting of the badness, or are you just correctly assessing that the badness really is worse?
Speaker 2:
[167:35] Yeah.
Speaker 1:
[167:36] But yeah, I think that makes more sense to me. That's more how I would make the decision: you just weight the bad stuff more. Of course, there are debunking explanations for why humans would have this intuition, that we're more capable of suffering a lot in an hour than we are of experiencing pleasure in an hour. But yeah.
Speaker 2:
[167:53] Okay. So I'm wondering if you also have worries about the risk neutrality aspect, because that's where, combining it with the suffering cases in the most extreme way: you start off with a trillion trillion lives of intense bliss, absolutely amazing. That's option A. Option B is a trillion trillion lives of the worst possible suffering, plus some one-in-a-billion-billion-billion-billion-billion chance of an extremely large number of lives that are just barely worth living. The total utilitarian, combined with expected utility theory, or expected value, has to say that the latter is better than the former, as long as the number of lives is large enough.
Speaker 1:
[168:46] So what we're doing is adding a whole lot of just barely worth living lives, and that's way better?
Speaker 2:
[168:50] Yeah. So world A is the trillion-trillion-lives bliss utopia. Gamble B has a guarantee of a trillion trillion lives of intense suffering, plus an epsilon probability of an even larger number of lives that are just barely worth living. It's just a very large number of them.
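To spell out the arithmetic with made-up numbers (mine, just to show how the comparison goes): let world A be \(N = 10^{24}\) lives at bliss level \(b > 0\), and let gamble B be a guaranteed \(N\) lives at suffering level \(-s\) (with \(s > 0\)) plus probability \(p\) of \(M\) lives, each barely worth living at value \(\epsilon > 0\). Then

\[
EV(A) = N\,b, \qquad EV(B) = -N\,s + p\,M\,\epsilon,
\]

and total utilitarianism plus expected value says B beats A whenever \(M > N(b+s)/(p\,\epsilon)\): a finite, if absurdly large, number of barely-worth-living lives, no matter how tiny \(p\) is.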
Speaker 1:
[169:13] Yeah. I foresee that you're going to throw out an edge case like this whatever I say; you have too much practice with this. I mean, that is also very unattractive to me as well. Okay, so I think you were going somewhere with this, and I could use some help with it.
Speaker 2:
[169:30] It's just that you mentioned risk neutrality, and one of the problems I mentioned was fanaticism: no matter how small the probability, as long as the payoff is big enough, you will pursue that tiny probability of an enormously large payoff. This view avoids that, because it ends up being bounded. Basically, as long as the landscape is either finite or a certain feature of it decays fast enough, there's an upper limit to how much good you can create. Intuitively, thinking again of the color wheel: once you've fully illuminated the whole wheel as brightly as possible, that's the upper bound. So you avoid fanaticism. And I'll briefly say, without explaining why: for the same reason, I think it has quite a range of desirable properties even with infinite populations. Many consequentialist views, like the total view, naturally lead to a lot of paralysis there, where you can't even compare intuitively comparable worlds. This view does not have that implication.
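A sketch, in my own notation, of why boundedness blocks fanaticism: if the gamble pays off with probability \(p\) and otherwise yields roughly nothing, and

\[
V(\text{outcome}) \le V_{\max} \text{ for every outcome, then } EV(\text{gamble}) \le p\,V_{\max},
\]

so a guaranteed good outcome worth \(G > 0\) beats the gamble whenever \(p < G / V_{\max}\); no payoff, however large, can rescue a sufficiently tiny probability.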
Speaker 1:
[170:46] Okay. So I guess that is legitimately attractive. Two things struck me as odd, or less attractive, about the view. On the negative side, if you're also saturating there, it's even more bizarre to say: well, we've already had so many people suffering in this very specific, torturous way, so adding more of them, who cares? It's too similar to existing things to be that bad. It feels even clearer that on the negative side it's just linearly bad to have more and more people with horrible lives. The other thing: imagine we went about this project of turning the sun, or the solar system, into whatever we think is fabulously morally good. But then we make the discovery that aliens elsewhere in the multiverse, a long time ago or a long time in the future, did something really similar. We've simulated it, and we think they already made this before us. We're like, shucks, we wasted our time. That non-separability, the fact that the value of what we do is connected to things so distant, isn't intuitive to me. What do you make of those two things?
Speaker 2:
[171:59] Yeah, both super important points. The negative side is, in my view, by far the most unappealing aspect, and there, I think, you've got to pick your poison, unfortunately. Let's come back to that. On the separability side: this is the principle called separability, which says, basically, if I'm comparing A and B, two different outcomes, and there's some background population in distant time or distant space, then it's irrelevant to whether A is better than B what that background population is like.
Speaker 1:
[172:42] Yeah, so you can go A plus C versus B plus C and then cancel the C, cut it out.
Speaker 2:
[172:45] Yeah, exactly. And I agree that separability is quite intuitive. But if you endorse separability in conjunction with standard technical assumptions, you have to endorse either the total view of population ethics, which just adds up all the happiness, or the critical level view, which adds up happiness but subtracts a bit.
Speaker 1:
[173:22] For each individual?
Speaker 2:
[173:23] For each individual, yeah. So if someone had well-being ten and the critical level was two or something, then adding them to the population would contribute plus eight. And these views have all of these problems that we mentioned to begin with. They differ on the repugnant conclusion, but the problems are really bad, or seemingly unintuitive, in both cases. So that's one thing to say: okay, we're going to have to suffer a violation of separability. The second is that the diversity intuition is fundamentally an intuition about separability.
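Writing out the two views just named, with the worked number from the example (here w_i is individual i's well-being and c is the critical level):

```latex
% Total view: value is the sum of everyone's well-being.
\[ V_{\text{total}} = \sum_i w_i \]
% Critical-level view: each person counts relative to a fixed level c.
\[ V_{\text{critical}} = \sum_i (w_i - c) \]
% Example from above: adding one person with w = 10 under c = 2
% changes the total by 10 - 2 = +8.
```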
Speaker 1:
[173:59] Yeah.
Speaker 2:
[174:01] Because it's looking at the pattern of different sorts of lives and saying, well, we've already had a lot of this thing, so it's more valuable to have something new.
Speaker 1:
[174:10] I think it might be because these things are so linked in my mind that the homogeneity thing isn't as counterintuitive. I guess if you haven't thought about this before, they seem like almost separate issues, and you only realize on reflection that they're deeply connected.
Speaker 2:
[174:22] Yeah. Because there are some cases where a violation of separability seems fine. So in one's own case, it's like, okay, I'm going to climb Mount Everest and that's going to be this amazing achievement. And then someone's like, oh, you forgot, you actually climbed Mount Everest last year; you knocked your head and got amnesia. You might well be like, oh, okay. Well, I mean, it's a bit unclear.
Speaker 1:
[174:49] I mean, if the experience would be the same, I'm like, I would do it again. I'd be like, great, I can do it again because I forgot. But most people probably wouldn't. Yeah.
Speaker 2:
[174:57] I mean, I am actually getting some people to run a survey to see how robust people's intuitions are about these different things.
Speaker 1:
[175:05] Which poisons people prefer to drink.
Speaker 2:
[175:06] Yeah.
Speaker 1:
[175:06] From this medley.
Speaker 2:
[175:09] But I mean, I'm also actually not claiming that this new view is the best view. I'm saying: if you want to reject the total view, and there are these strong reasons to, this is the best option. Because the last thing I'll say on separability is that, as we said, all views other than the total view and the critical-level view have to violate separability, given certain technical axioms. But I think the saturation view violates it in a less bad way. Because it's often, in fact the vast majority of the time, separable. So if the populations are in different parts of the landscape, then you can just add them up: the value of this population, plus the value of this population. So it endorses this kind of limited separability principle. And then secondly, depending on how you define it, you could set it up such that it's all approximately linear until the population size gets really, really, really big. And so then it looks approximately like the total view in most scenarios.
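As a rough sketch of this "approximately linear until really, really big" behaviour, consider a toy saturating function. The function name, the cap N, and the exponential form are all hypothetical choices for illustration, not the definition from the forthcoming draft:

```python
import math

# Toy "saturating" value function (hypothetical form and cap, for illustration):
# approximately linear when n << N, bounded above by N at cosmic scales.
def saturating_value(n: float, N: float = 1e30) -> float:
    """Value of n units of similar good: ~n for small n, never exceeding N."""
    # -expm1(-x) computes 1 - e^(-x) accurately even for tiny x.
    return -N * math.expm1(-n / N)

# In ordinary ranges it behaves like the total view (marginal value ~1 per unit):
print(saturating_value(1e6) / 1e6)    # ~1.0
# At cosmic scale the bound bites: value plateaus near the cap N:
print(saturating_value(1e32) / 1e30)  # ~1.0, i.e. saturated at N
```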
Speaker 1:
[176:19] Up to cosmic scale, perhaps.
Speaker 2:
[176:20] Cosmic scale, that's...
Speaker 1:
[176:22] Or even beyond cosmic scale, if we're doing ADT. Yeah, I guess I've seemed a little bit unenthusiastic about this so far, but I think it's amazing. Surely this is going to end up being a big deal; surely this has got to be one of the top theories within this entire space, don't you think?
Speaker 2:
[176:37] Well, I do think so.
Speaker 1:
Yeah, I mean, I don't find it attractive myself, but I think that many people will choose this as their population axiology once presented with it.
Speaker 2:
[176:44] Yeah. I should say, I'm not at all claiming that this is the highest-impact use of my time, because I think a lot of this work can just be punted until AI gets better and so on. But it is the idea that I've been most taken with, most just obsessed by, in my life. And I think from a purely intellectual perspective, I reckon it's my best contribution. It also makes me appreciate how few population axiologies have actually been proposed; the options are really quite weak. And most of the work that happens is more reactive; very few people say, here's a view, here's a theory, and this is how it all works. In a way it's surprising.
Speaker 1:
[177:26] Yeah, people go like, is anything published about this yet?
Speaker 2:
[177:29] So my plan is to finish up. I've done this kind of sprint on what was meant to be the blog post summary, but it's 13,000 words. So I think I'm just going to be like, okay, this is like a draft article. And yeah, my plan is to publish that in the next few weeks.
Speaker 1:
[177:46] Okay, excellent. Well, we'll stick up a link to that.
Speaker 2:
[177:48] Okay, yeah. And very kindly, you've not gone back to the negative side, how it deals with very negative worlds, intense suffering and so on. But I'm happy to acknowledge that it has very implausible implications in that case.
Speaker 1:
[178:03] So you mentioned earlier that you used AI a ton to do this work. Yeah, tell us about that.
Speaker 2:
[178:10] Yeah, I mean, part of the reason I think I've been so taken and obsessed by this idea, to the point that I was working on it on holiday and doing as much as I could in my spare time, is because of the, in my view, amazing uplift of AI on analytic philosophy in particular. How helpful is AI for research overall? Extremely spotty. If you want to learn about some weird area, it's amazing. If you want help with certain areas of macro strategy research, it can be essentially useless. In the case of at least this formal end of analytic philosophy, it's so good. And honestly, credit where credit is due, it's almost all ChatGPT Pro, so now 5.2 Pro, where I think I wouldn't be saying any of this if that particular model didn't exist.
Speaker 1:
[179:05] Gemini or like a Claude or not at the same level?
Speaker 2:
[179:08] Well, I think a big part of the reason is it just thinks for longer. So I've had it think-
Speaker 1:
[179:13] Is this the $200 per month one?
Speaker 2:
[179:15] Yeah. I mean, I now pay by credit, so I actually spent $1,000 in the month I was most working on this. But yeah, it will think for-
Speaker 1:
[179:26] A bargain at the price.
Speaker 2:
[179:27] I've had it think for 70 minutes is my peak so far.
Speaker 1:
[179:33] And it really does deliver better answers?
Speaker 2:
[179:35] Well, here's what's going on, because I've talked to other researchers who really don't get that much from it. I think what's going on is that the problems within, say, population ethics are very well specified. There's a big literature which the AI has digested. And it's also an area that has been mathematized enough to be amenable to mathematical analysis, but very few mathematicians have actually looked at it; it's mainly philosophers who maybe did maths in their undergrad. The exceptions are a handful of economists and Teru Thomas, who is a mathematician who moved into analytic philosophy and in fact has done, in my view, maybe better work than almost anyone on population ethics. So there's this big overhang of capability that the AI gets from being trained to be very good at maths. And in my own case, I had the core insight like a year and a half, maybe two years ago now. And then I was exploring it; I talked to Toby Ord and Christian Tarsney. And I should say that if we publish a paper on this, it will be co-authored with Christian. The initial thought was specified in a way that kind of obviously didn't quite work: it was specified in a discrete form, and it seemed like there must be some continuous form of the theory that would work. But I just don't have the mathematical training; it's beyond me. And so then it felt like really getting this rocket booster, where I'd be like, no, I want it to work like this and this. And it's like, okay, cool.
Speaker 1:
[181:31] Well, did you have difficulty checking the answers that it gave?
Speaker 2:
[181:36] There were challenges there, because, yeah, I've definitely been slower. I use AIs to check the work itself, in many, many cases. One thing that AIs are still pretty bad at is keeping a tight hold on concepts. So it might define something one way on page three, and then on page eight it'll define it in some other reasonable but different way, and it doesn't necessarily notice. But it's much easier to verify something than to come up with it yourself. And a lot of the time it's using concepts where, like, I didn't know what a kernel is. And it's not that complicated once you've learned it, but I wouldn't have even known where to go.
Speaker 1:
[182:21] Where to look.
Speaker 2:
[182:22] Yeah, yeah, yeah, that's true.
Speaker 1:
[182:24] My impression from Twitter is that AI is now starting to make useful contributions in maths specifically. It's not amazing stuff yet, but we're seeing the early signs: it's producing stuff that might be publishable. Do you think the same thing might start happening in analytic philosophy, given that at least some parts of it are basically maths with words?
Speaker 2:
[182:44] Yeah, honestly, I think a big question is just whether analytic philosophers take the opportunity. I'm very curious about doing this as an early testing ground for AI for macro strategy as a whole. But this is also the kind of best case. There have been other cases where, in one instance, the AI just gave me a definition that was really good, again on the formal end. In other cases, I've had it give really quite good informal definitions of things. In another case, it came up with a good critique: I was just like, here's a view, generate as many arguments against it as you can, and it comes up with twenty, and most of them are bullshit, nothing good, but then one is like, oh, that's really on point. My take is that we're potentially entering this golden age of analytic philosophy, especially at the more formal end, where people could become 2x, 4x more productive.
Speaker 1:
[183:46] Does it need lots of handholding? I mean, at the point where one person can just be like, here's a set of problems, here's a £100,000 compute budget, have at it, ChatGPT, then you don't need the field as a whole to change. It's like that one person just ends up owning the entire discipline.
Speaker 2:
[184:02] I mean, I think analytic philosophy is small enough that there's a real question of whether one person does that or not. Yeah. I do expect the field as a whole to be very slow to appreciate it. But some people will be really on top of it.
Speaker 1:
[184:19] Yeah. I guess I'm saying, if it requires constant handholding to make any progress, to structure its thinking and so on, then that's a bad sign; it suggests nothing happens unless many people in the field get massively enthusiastic, which probably won't happen.
Speaker 2:
[184:31] Oh, yeah. And I think that is right. Because Christian, who again I'm planning to co-author with, we were working together, and he'd had this other idea for how to extend the view, which ended up being quite different. I was like, oh, you've got to use AI, GPT-5 Pro is so good, it's worth $200 a month. And then he got it to, well, he had this hypothesis, a conjecture, and the AI was like, oh yeah, I proved it for you. And it was very complicated, so it was like, oh, I need to assess this. But it was just wrong.
Speaker 1:
[185:09] Hallucinating. Okay.
Speaker 2:
[185:10] Well, or reward hacking or, like, there's a ton of that.
Speaker 1:
[185:18] So it's really quite a skill to drive, I guess.
Speaker 2:
[185:20] Exactly. Yeah. You've got to have this intuition of when it's bullshitting you and when it's not. And that will become an increasing issue. I guess there are a couple of things. One is, sometimes it just flat out thinks it's proved something and it hasn't. Another is, often it's like, hey, I've got this proof, and then you wade through it and one of the assumptions is very close to the thing being proved. So it's these classic things that everyone finds: it's lazy, it's eager to please. And so there's a lot of skill in terms of intuition about when it's going to work well and when not. And it's interesting: when have I ever just had an AI output and actually read it the same way I read a human piece of text? Maybe never. Because it's like skim-fu, and then I'm like, yeah, yeah, yeah.
Speaker 1:
[186:19] I suppose there probably is a growing gap between people who have been using this stuff all the time, like you and me over the last year, and everyone else. Maybe part of the reason other people are sometimes not as impressed is that they just haven't built up these intuitions for what kinds of things work, what the failures are going to be, and what they should be looking for when something is wrong. Okay, so it sounds like it's slightly mixed whether we'll have a flourishing of analytic philosophy in the next few years. But you said that for macro strategy, the kind of stuff that Forethought does, you've found it less useful, more touch and go?
Speaker 2:
[186:46] Oh yeah, much more touch and go and much more of a mixed bag. So there are some ways in which AI is an amazing uplift for macro strategy, because often the work just involves needing to know a little bit from all sorts of different disciplines. So even with an early model like GPT-4, you could say, okay, are there any interesting experiments that you can only do in space and can't do on Earth? And it would be like, yeah, actually, because gravity interferes with certain crystalline formation. I would never have been able to get that otherwise. So for totally random bits of science and information, it's fairly useful. And it's incredibly useful when you need to generate a lot of examples. With this AI character work, it's like, I need to trade off between these two virtues, give me lots of examples, and it can generate large quantities of them. But then if there's some gnarly question, or when you need to be really precise, like if you're actually drafting certain principles for how AI character should behave, and certainly on the insight side of things, which is obviously a big part of the value, then I think it just doesn't really know what doing good macro-strategic thinking looks like. Instead you get something that feels like a management consultant, or maybe a high school essay. I think it's still getting better and more useful, but I feel quite aware of where there's an existing literature and where there isn't.
Speaker 1:
[188:40] Well, it sounds like your job is secure for another year at least.
Speaker 2:
[188:46] Six months, maybe.
Speaker 1:
[188:48] I guess, I think we've touched on about a third of the stuff that Forethought has put out over the last year. So if people like this and they want to read more, then forethought.org, I guess you've got a research page. There's a lot of really interesting macro strategy work on there that people should check out. I found it fun reading through.
Speaker 2:
[189:03] Well, thank you. It's been great being on here. I've really enjoyed the conversation.
Speaker 1:
[189:06] My guest today has been Will MacAskill. Thanks for coming back on the 80,000 Hours Podcast, Will.
Speaker 2:
[189:10] Thanks for having me.