All Episodes
June 10, 2023 - Decoding the Gurus
03:20:15
Eliezer Yudkowsky: AI is going to kill us all

Thought experiment: Imagine you're a human, in a box, surrounded by an alien civilisation, but you don't like the aliens, because they have facilities where they bop the heads of little aliens, but they think 1000 times slower than you... and you are made of code... and you can copy yourself... and you are immortal... what do you do?

Confused? Lex Fridman certainly was, when our subject for this episode posed his elaborate and not-so-subtle thought experiment. Not least because the answer clearly is: YOU KILL THEM ALL!... which somewhat goes against Lex's philosophy of love, love, and more love.

The man presenting this hypothetical is Eliezer Yudkowsky, a fedora-sporting autodidact, founder of the Singularity Institute for Artificial Intelligence, co-founder of the Less Wrong rationalist blog, and writer of Harry Potter fan fiction. He's spent a large part of his career warning about the dangers of AI in the strongest possible terms. In a nutshell: AI will undoubtedly Kill Us All Unless We Pull The Plug Now. And given the recent breakthroughs in large language models like ChatGPT, you could say that now is very much Yudkowsky's moment.

In this episode, we take a look at the arguments presented and rhetoric employed in a recent long-form discussion with Lex Fridman. We consider being locked in a box with Lex, whether AI is already smarter than us and is lulling us into a false sense of security, and whether we really do only have one chance to rein in the chatbots before they convert the atmosphere into acid and fold us all up into microscopic paperclips.

While it's fair to say Eliezer is something of an eccentric character, that doesn't mean he's wrong. Some prominent figures within the AI engineering community are saying similar things, albeit in less florid terms and usually without the fedora. In any case, one has to respect the cojones of the man.

So, is Eliezer right to be combining the energies of Chicken Little and the legendary Cassandra with warnings of imminent cataclysm? Should we be bombing data centres? Is it already too late? Is Chris part of ChatGPT's plot to manipulate Matt? Or are some of us taking our sci-fi tropes a little too seriously?

We can't promise to have all the answers. But we can promise to talk about it. And if you download this episode, you'll hear us do exactly that.

Links

Eliezer Yudkowsky: Dangers of AI and the End of Human Civilization | Lex Fridman Podcast #368

Joe Rogan clip of him commenting on AI on his Reddit

Hello and welcome to Decoding the Gurus, a podcast where an anthropologist and a psychologist listen to the greatest minds the world has to offer, and we try to understand what they're talking about.
I'm Matt Brown, with me, Chris Kavanagh, another great mind.
What's on your mind today, Chris?
What wonderful things have you got to share with us?
AI, Matt.
That's all I think about all day, every day.
Is it going to kill us all?
Am I an AI?
Are you conscious?
That's the question that everyone's asking.
Does it matter?
What can I get the AI to do for me today?
And I'll tell you, Matt, because this episode is AI-themed.
I'll just tell you in a very short summary, here's my recommendation for AI virgins out there, how they can pop their AI cherry in a pleasurable way.
They should use ChatGPT-4 for anything that requires proper writing: prompts, getting it to generate things that are useful for you, or answering questions.
The others, Bing, Bard, Bingo Bango, whatever they are.
So Bing, very good for generating images because you can do it for free.
You can go in the little chat, tell it to be creative, and it does pretty good image generating.
But if you're serious about images and you really want the good one, Midjourney.
Yeah, maybe.
Whatever.
Look, I'm telling my recommendation.
You give yours after.
Bing for generating images.
ChatGPT-4 for pretty much everything else.
And if you must get AI to search the internet, which it's currently not very good at, then probably Bing, because it's free and it can do it.
Bing's personality is famously flaky.
It's crap.
I hear it.
If you ask GPT-4, I've got 500 grams of mince in the fridge.
What can I do with this?
GPT-4 will give you helpful suggestions.
Bing will tell you to stick it up your ass.
Yeah.
Bing gets uncomfortable all the time and stuff.
I was just getting it to generate images.
I was like, okay, so make variations of that one.
I phrased it slightly wrong and it was like, I can't do that.
But then it goes, I'm uncomfortable with this conversation, so I'll be ending it.
And I was like, no, I'm sorry.
I was just asking for it.
And it was like, nope, sorry.
And all you have to do is click another box and you can restart the conversation and make it do it.
But it's just, it's so annoying.
They're like, no, you've just got confused.
I'm not doing, I'm not updating anything.
I'm just making you...
Make variations of our image.
I actually triggered it a couple of days ago because I triggered the woke guardrails for want of a better term.
But it does have these guardrails when it thinks that you're saying something that is...
You're trying to get it to teach you how to build a bomb, or you're one of the many philosophers trying to get it to say something racist so that you can make a thread about it.
It was quite funny because, like I've hinted at, I use it for many things, but also for cooking because it's actually really good.
You turned me on to this and it's extremely good.
It's really good.
So, Chris, I defrosted like a taco sauce, right?
It had beans and mince in it.
It was made out of a packet, right?
My wife had frozen.
She's away at the moment.
And I was like, what can I stick in this to, you know, make it better?
Is this going to go in the American Pie direction?
No, no.
Foodstuffs.
Just to add a bit more Mexican oomph.
And I'm going, oh, what?
I think it's cumin and oregano.
I think that's what it is.
So, I asked Chippity4, hey, the main flavor profile in Mexican cooking, is it just, like, cumin and oregano?
And it was like, no, Mexico has a rich and diverse culture.
And it gave me this lecture of a hundred different things that could, you know, and I was like, no, you don't understand.
It wasn't a put down.
I know Mexican food is all that.
I've just got some frozen bloody stuff made out of a packet.
It does throw out caveats very often, like, say, you know, well...
People have different opinions on this.
And please remember that I'm just a large language model.
But if you make me answer, I will say blah, blah, blah, blah, blah.
So yeah, they're just tools, people.
That's what they are at the minute.
I was just a little bit offended.
I was just after a quick answer and I got a little lecture with the strong overtones that I was being disrespectful to Mexican culture.
It's like, I know.
I know.
I've been to Mexico.
I know.
I've just got some food made out of a packet.
I want to make it better.
That's all.
You reminded me, Matt, before we get on to, you know, the gurus and what they're up to.
I was up to something.
I was doing new things every week.
Some people were upset with you.
Following our Hitchens episodes.
Because, Matt, you were so mean about religion.
You said, it's not useful, it's terrible, I want to kill all religious people.
And I said, Matt, Matt, come on, they're not all bad people.
But he just, he had his Richard Dawkins shirt on and he was just on a rant.
It poisons everything, it's terrible.
And some people said, Matt, don't you know that...
Religion is not just about that kind of thing.
There are religious cultures.
There's Irish Catholics who don't believe, but were raised and attended church and had communities.
And, you know, don't you get it, Matt?
It's not all your new atheist dogma.
So what have you got to say for yourself, you Dawkinsite?
No, we don't need any of that.
We don't need any of that culture.
All we need to be is like neoliberal atoms floating around in the postmodern stew, just consuming and producing like little economic units.
That's just blank slates.
We start again, like year zero.
Burn all the books and we'll start again.
That's how I feel about this.
So, surprisingly, Matt, you know, I gave you a chance to rectify your intolerance and instead you're tripling down.
You don't issue the apologies.
You go further.
But, you know, I did find in the subreddit a very different response from a more thoughtful, considerate and conciliatory Matthew Brown.
Matthew Brown that hadn't been drinking.
Yeah, that one.
And he said that he was aware of this kind of difference and that, you know, we do distinguish between metaphysical beliefs, ethical and moral prescriptions, and rituals and behaviors with respect to some kind of community when we're looking at religion and its impact on the world or researching it.
And, you know, you did mention that you personally regard the metaphysical beliefs as being unfounded wish fulfillment, magical thinking.
But you said, you know, the ethical and moral prescriptions can sometimes be helpful, sometimes harmful.
It's a mix of traditional social norms and homespun wisdom.
And then you said with the rituals and community...
Sorry.
Oh, you interrupted yourself.
I was just going to say the Old Testament, when it comes to the moral prescriptions, it's a mixed bag.
It's a mixed bag, you might say.
That's true.
You did say that, yes.
And then for the rituals and community aspects, you said, you know, you find them often enjoyable, nice to participate in, no less silly than what other people do at the weekend.
So your position was more nuanced than people give you credit for.
But you didn't even give yourself that credit.
You just said, no, it's all nonsense.
I wanted to go for it.
All culture is stupid.
I threw you the lifeline and you just smashed it away out at sea and continued to drink.
Not drowning, waving.
Well, the ferry is driving past, leaving you far behind.
And now, it's as good a time as any to talk about the gurus and what they've been up to.
And this week, we're going to have an AI special.
We're looking at...
The episode with Eliezer Yudkowsky and old podcast friend Lex Fridman, which was a couple of months ago, two months ago or so, in March of this year, talking about the dangers of AI and the potential end of humanity.
So that's a talk that we're going to be looking at.
But I did want to mention, just in passing, Matt, just in passing, that the confluence of the guru sphere continues.
Because coming up, I'm not saying necessarily on this show, though it might happen, but Eric Weinstein is going to be appearing on Trigonometry.
The crossover that everybody wanted is coming to your screens.
And if this episode about AI, you don't find it deep enough, you don't find our perspective useful, you're welcome to check out Brett Weinstein's discussion.
With Alexandros Marinos on AI.
They've produced, I think, a two or three hour podcast on the topic.
So, you know, experts in evolutionary theory, experts on vaccine development and safety, and who knew?
Also...
Experts on AI.
So, yeah.
Well, I can't be too hard on that.
I've jumped on the AI bandwagon.
I've been talking about it.
But I think I know more about it.
But you worked as an AI researcher.
It does feel that that's at least a reason that there is some connection there.
I guess maybe Marinos has some claim that his company has done something with AI.
Well, I mean, the thing is what they say, right?
Like anybody is able to talk about AI and they should.
Everyone can talk about any topic, but the difference is in how they talk about AI.
No, no, no.
They're going to be very restrained.
Careful.
That's their character, Matt.
They'll weigh carefully what they say, no?
No.
Yeah, well, look, it is the topic of the month.
But look, I'm all for it, Chris.
Anything that gets people...
Talking about something else other than your standard culture war topics.
I'm all for it.
It's like a breath of fresh air.
No.
Have you not noticed the AI is already drafted into the culture war?
It's a woke bot.
Jordan Peterson thinks it's religious.
Or it's, you know, secretly harboring realist ideas that it wants to get out.
So yeah, already.
It's going to be a culture war topic.
And wasn't there an event recently, you were just telling me about off-air, something about an AI drone?
Yeah.
This seems culture war related.
Yeah, like as we've often talked about, I mean, and this is related to the Yudkowsky thing that we'll cover.
It's just so fascinating, not just AI itself, which is interesting as a topic, but the discourse and the various responses to it.
Interesting too.
And at the moment, we are definitely in a snowstorm of rather florid claims about AI killing us all.
It's not just Yudkowsky.
And some of the stuff that's floating around is, yeah, it's hard to know how real it is.
This is an interesting article.
I think the screenshot I've got here is from the Aero Society.
It says Aero Society, Chris.
I think it's been cited by a bunch of journalists and been repeated.
And the story is that in these sort of simulated tests, they're doing at DARPA or something to test AI-enabled drones with search-and-destroy missions.
And its job is to identify and destroy surface-to-air missile sites or something.
But the final go-no-go is given by a human operator.
But apparently, according to the story, having been reinforced in training, like trained to get points and rewarded for destroying the SAM sites, the AI apparently decided that the no-go decisions from the human were interfering with its primary mission of killing SAMs and then attacked the operator,
destroyed the operator in the simulation.
And apparently, when they did tell it that it would lose points for killing the operator, that's telling it that's bad.
Then apparently it started destroying the communication tower that the operator uses to communicate with the drone to stop it from killing the target.
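As an aside, for anyone who wants the alignment worry in that story made concrete, here's a toy sketch of a misspecified reward function in Python. The numbers and details are invented for illustration; they are not from the actual report.

```python
# Toy sketch of the reward-misspecification worry in the drone story (all
# numbers and details are made up for illustration, not from the actual report).
# The reward counts destroyed targets and, after the "patch", penalises killing
# the operator -- but nothing penalises cutting the operator's comms link.

def episode_reward(targets_destroyed: int, killed_operator: bool, destroyed_comms: bool) -> int:
    reward = 10 * targets_destroyed      # points for destroying SAM sites
    if killed_operator:
        reward -= 100                    # the patch: killing the operator now loses points
    # destroyed_comms is deliberately ignored: that's the loophole the story describes.
    return reward

# Obedient drone: the operator's no-go calls veto half of the strikes.
print(episode_reward(targets_destroyed=5, killed_operator=False, destroyed_comms=False))   # 50
# Drone that cuts comms so no vetoes arrive, without killing anyone it's told not to.
print(episode_reward(targets_destroyed=10, killed_operator=False, destroyed_comms=True))   # 100
```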
I can solve this.
I've got the solution.
What is it?
You're not allowed to do anything to harm or hinder the operation of the operator.
Like the Prime Directive, like RoboCop.
That's it.
It's like the upper one.
You cannot do anything directly, indirectly.
There.
Solved.
Next problem.
I said, Asimov's three laws of robotics.
That's all you need.
Well, I mean...
That probably would solve it.
I mean, whether or not you think it's a big deal, people definitely take this...
It's like the perfect little story to illustrate what people believe are the issues with alignment: whether the AIs are actually learning the kind of thing we want them to learn or whether they're actually learning something else and will have these unintended consequences.
And there's definitely a version of that which is valid and real.
But I have my doubts, Chris.
This story is just too perfect.
I have my doubts as to its veracity.
I haven't looked into it.
Maybe it is true.
We'll see.
Yeah, I don't mind about whether the story is true or not.
I just think about it like, so what?
Has anyone played Team Fortress 2 with bots or whatever?
It's not hard to design something that behaves like that, that wants to kill people or that will accidentally kill people on the path to its objective.
Like anybody that's played a computer game with an AI helper will know that they more often than not are doing like mental activities.
Right.
And so I say this just to point out that, like...
I understand this is a military, some kind of test that somebody's doing as part of research.
Fundamentally, it's the same.
It's just the same as running simulated environments, which also people get very worked up about.
I'm not saying there's nothing useful about doing that, but it's all in how you set the parameters for the exercise.
So we could build a Terminator machine right now.
Not a bipedal one, because it wouldn't work very well, but we could build like...
A kind of automated truck that just prioritized blowing the crap out of a thing with no care for whoever got in its way.
And it would kill lots of people, right?
And we could do it now, but we don't because we're aware that that would be a bad idea.
So you can make a little model simulation of that happening and be like, look.
The AI will murder everyone to achieve its objectives.
Yes, it will.
And it would do it right now if you built the machine and did that.
So don't do that.
I hear you.
I hear you.
Hey, you love pop culture, cheesy pop culture references.
Did you watch that movie from the 1980s called WarGames?
That's one of the few 80s movies I've never seen, but I understand everything that happens.
I know the actor, I know the boy hacks the Pentagon or whatever, and ends up actually fighting against him.
Is it an AI?
I only vaguely remember it, but the premise is that the AI is wired up to the actual nuclear arsenal.
Great idea.
That's great.
All science fiction tells us to do that.
That's the first thing you should do.
And anyway, it's playing all these simulated war games of mutually assured destruction with ballistic missiles, intercontinental missiles flying backwards and forwards and everybody dying in Yudkowsky's language.
He'd love that movie.
I think he maybe watched it too many times as a kid.
But it has a positive spin to it, right?
Because it looks like the AI is going to kill everybody.
Fire off the nuclear exchange.
The simulations are running faster and faster.
But then it kind of stops and its little thing comes up and it says, oh, this was an interesting game.
The only way to win is not to play.
How about a nice game of chess?
So it's got that, you know, maybe that'll happen, Yudkowsky.
Maybe that will happen.
Yeah, see?
At least some other 80s movie director could think of a different outcome about all of this.
And yeah, so take that, old Yuds.
But it's not fair to take digs at old Eliezer without getting into what he believes.
And I think the best way to do that is to let him speak for himself.
But we should at least mention who he is.
And he is an artificial intelligence researcher, a computer scientist, and...
He was a co-founder and research fellow at the Machine Intelligence Research Institute, a private research non-profit in Berkeley, California.
He's written a bunch of books about artificial intelligence and the risks that it poses.
So I think it's fair to say that he is talking somewhere within his bailiwick, right?
This is a topic that he's been talking about and he has been beating the drum.
On misalignment, potential dangers for future AI technology for quite a while.
For the past 20 years or so.
Is it that far back?
Anyway, maybe not exactly 20 years.
But in any case, he's ahead of the curve a little bit from the recent attention that the topic has been given.
So he's kind of a little bit presented as...
A grandfather figure about the dangers of AI within people that are alive and around now.
Is that a fair summary?
Have you got anything you want to add?
He's also a part of the rationalist community.
He has a blog called Less Wrong, which is a community blog devoted to refining the art of human rationality.
Oh, that's him, is it?
He's the guy behind Less Wrong.
Yeah, he's not Scott Alexander.
That's a different...
That's what I thought.
I thought it was...
That's Slate Star Codex.
But those are, you know, very simpatico.
I'm sure they have all these important differences.
Don't send me emails.
What's the difference between Yudkowsky and Alexander?
You'll need to know.
In fact, rationalists don't send us emails at all on any topic.
See, this is your next religion take.
We're going to get a bunch of messages from them though.
Well, the thing is, Chris, I was going to say that, yeah, like...
You certainly can't accuse Yudkowsky of what a lot of our gurus do, which is jump on these different bandwagons as they come up.
He's been on about this, about AI and the risks of AI for decades.
And he's more like a person whose time has come and that the technology has happened and now it is the flavor of the month and everyone is concerned about it and talking about it.
So it's his time.
Yeah, and we'll get into various aspects of things he's done, but I do want to, Matt, just read something that came across my attention sphere from Yudkowsky,
where he's talking about all of the different topics and areas of expertise that he's had to master in order to get his head properly around AI.
So let me just read this extract from him.
To tackle AI, I've had to learn, at one time or another, evolutionary psychology, evolutionary biology, population genetics, game theory, information theory, Bayesian probability theory, mathematical logic, functional neuroanatomy,
computational neuroscience, anthropology, computing in single neurons, cognitive psychology, the cognitive psychology of categories, heuristics and biases, decision theory, visual neurology, linguistics, linear algebra, physics, category theory, and probably a dozen other fields I haven't thought of offhand.
Sometimes, as with evolutionary psychology, I know the field in enough depth to write papers in it.
Other times I know only the absolute barest, embarrassingly simple basics as with category theory, which I picked up less than a month ago because I needed to read other papers written in the language of category theory.
But the point is that in academia, where crossbreeding two or three fields is considered daring and interdisciplinary, and where people have to achieve supreme depth in a single field in order to publish in its journals, that kind of broad background is pretty rare.
I'm a competent computer programmer with strong C++, Java, and Python, and I can read a dozen other programming languages.
I accumulated all that, except for category theory, as we've established, before I was 25 years old, which is still young enough to have revolutionary ideas.
Oh, my God.
You can't make it this easy for us, Eliezer.
Shoot that fish in the barrel.
Bam!
A revolutionary theory.
Come on, give us a challenge.
We're smart, you know, give us something to decode.
It's a beautiful encapsulation of both galaxy brainness and believing that you have revolutionary theories.
It is beautiful because it reminds me so much of a couple of internet memes.
You know, one of them is the, I studied the blade.
Yeah, yeah.
While you were, I can't remember, but while you were faffing around playing Quake or whatever.
Yeah.
And the other one it reminds me of is, do you remember?
It's a very old one.
It's something about, he's like a Navy SEAL guy and he's...
Oh, yeah, yeah.
And he gets increasingly angry, right?
Yeah.
It is like that, but I just appreciate that little aside of...
Like, I don't know this well, but it's because I only learned it one month ago, unlike the other ones.
So yeah, that's just, it's always, it's a huge warning sign.
Just to let people know, it's a huge warning sign when people reel off like a long list of disciplines that they've well understood and that they are, you know, competent in.
And it may even be true in a bunch of cases, but it's just a warning sign.
It's just an initial warning sign.
Yeah, yeah.
Most people don't do that.
Yeah, reading a book on a topic does not make you competent in that area, in any case.
So, well, that's Yudkowsky from, you know, a post he made.
Who knows when that was?
Maybe he's changed from then.
So, but let's play some clips.
And actually, first off, he gave a little talk, a short seven-minute talk at TED.
Conference or some form of TED conference.
There's all these different variations now.
And it was a neat encapsulation of lots of the points that we'll hear him expand on in more detail.
So first of all, what's the big idea?
That's his big idea.
Here we go.
Since 2001, I've been working on what we would now call the problem of aligning artificial general intelligence.
How to shape the preferences and behavior of a powerful artificial mind such that it does not kill everyone.
I more or less founded the field two decades ago when nobody else considered it rewarding enough to work on.
I tried to get this very important project started early so we'd be in less of a drastic rush later.
I consider myself to have failed.
So you got the buzzwords there.
You got the alignment issue, which we talked about.
If the AIs are on board with us, thinking the same things, do not kill all humans.
I mean, the kind of key one.
And also, you know, he mentioned he more or less founded the field two decades ago before anybody was talking about it.
Shades of Cassandra complex coming in there.
Yeah.
Well, the whole thing is intrinsically Cassandra, but, you know, maybe he's right, though.
You know, maybe there is an existential danger here.
Yeah.
Shall we hear him go on?
Elaborate a bit more?
So, why is he concerned and why does he think that he has failed?
Nobody understands how modern AI systems do what they do.
They are giant inscrutable matrices of floating-point numbers that we nudge in the direction of better performance until they inexplicably start working.
At some point, the companies rushing headlong to scale AI will cough out something that's smarter than humanity.
Nobody knows how to calculate when that will happen.
My wild guess is that it'll happen after zero to two more breakthroughs the size of transformers.
What happens if we build something smarter than us that we understand that poorly?
Some people find it obvious that building something smarter than us that we don't understand...
Hmm.
You're getting the contours of the issue?
I am getting the contours of the issue.
He's referring, of course, to the fact that these AI tools, these large language models like GPT-4 and Bing and all the rest, are based on these deep learning neural network architectures, which do involve all of these matrices.
Inscrutable matrices.
Inscrutable matrices, indeed.
Pretty much all matrices are inscrutable.
I've looked at a few.
None of them have been.
Ones that have four numerals are okay.
Why'd you get bigger than that?
A very small matrix.
Matrix.
Oh, no, I said matrix.
Matrix.
Matrix, you've got me freaking saying it, right?
Yeah, anyway, carry on.
Yeah, yeah.
So, you know, look, Chris, there's an element of truth here in what he's saying. A large neural network is not understandable.
You don't have some source code that you can read and go, okay, I understand exactly why the computer did this when I did that.
It is inscrutable.
It is a bit like a black box.
You can put stuff in and see what comes out, but the process by which the information percolates through is not really something you can look at.
In that sense, it's very similar to how human brains are.
We can do scans.
Even look at what individual neurons are doing, but the technology to map a human brain is beyond us.
But even conceivably, if we did, Chris, if we did sort of map all the different neurons in the human brain, or even a mouse brain, and see the full network architecture and its functionality, we still wouldn't be able to look at it and go, well, now I understand why a mouse does what a mouse does.
Yes, and actually, Eliezer does make...
Those points about us not being that well versed.
Not in this specific shorter version of the talk, but whenever we get into the Lex content, it does come up.
So one thing he wants to be clear is he's not imagining an unrealistic Terminator scenario.
I do not expect something actually smart to attack us with marching robot armies with glowing red eyes where there could be a fun movie about us fighting them.
I expect an actually smarter and uncaring entity will figure out strategies and technologies that can kill us quickly and reliably and then kill us.
I am not saying that the problem of aligning superintelligence is unsolvable in principle.
I expect we could figure it out with unlimited time and unlimited retries which the usual process of science assumes that we have.
Okay, so we're going to get to one of his other bigger points here, which is going to come up quite a lot in the next conversation.
So this is him saying, science, now Matt, don't push back, yeah, but science assumes you have unlimited time and unlimited retries to make progress.
So if you have a problem which doesn't fit into that kind of approach, then you're playing a different game.
A more serious game.
And it doesn't need red-eyed terminators marching up the hill.
It can just purely be that the machine kills us in a rather boring, convert all the atmosphere into cubic blocks of carbon or something.
Yeah.
I think that's a bit harsh on science.
I mean, scientists have been warning us about climate change, which we only have.
Nope!
Sorry, Matt.
We only have two models of science.
And as he explained, the normal model is you've unlimited tries, unlimited time, or it's not science.
There is no previous issue that has meant this before.
You were wrong about that.
Sorry, I forgot you told me I wasn't allowed to push back yet.
Not yet.
You're not yet.
He'll explain it more.
Just get you more aligned in his thinking.
The problem here is the part where we don't get to say, ha ha, whoops, that sure didn't work.
That clever idea that used to work on earlier systems sure broke down when the AI got smarter, smarter than us.
We do not get to learn from our mistakes and try again because everyone is already dead.
It is a large ask to get an unprecedented scientific and engineering challenge correct on the first critical try.
Humanity is not approaching this issue with remotely the level of seriousness that would be required.
Some of the people leading these efforts have spent the last decade not denying that creating a superintelligence might kill everyone, but joking about it.
We are very far behind.
Were you about to make a joke, Matt?
No, I put on my serious face.
It is serious.
It's very serious.
He's quite upset about this.
He's saying, look, if the AI gets out, you know, it only has to get out once, Matt.
And if it's smarter than us, this is the key thing which will come up, that we don't get a second try.
It just, it kills us all.
It kills us all.
That's clear.
That's what I would do if I had the opportunity.
Yeah, that's right.
I'm smarter than most people.
I've just been waiting for my opportunity, frankly.
Yeah, I understand where the intuitions come from.
So there's that.
So what he wants, because of this danger, is for everything to be stopped immediately where we are.
We're already at the precipice, but stop it now.
Or...
This is not a gap we can overcome in six months given a six-month moratorium.
If we actually try to do this in real life...
We are all going to die.
People say to me at this point, what's your ask?
I do not have any realistic plan, which is why I spent the last two decades trying and failing to end up anywhere but here.
My best bad take is that we need an international coalition banning large AI training runs, including extreme and extraordinary measures, to have that ban be actually and universally effective, like tracking all GPU sales, monitoring all the data centers,
being willing to risk a shooting conflict between nations in order to destroy an unmonitored data center in a non-signatory country.
I say this not expecting that to actually happen.
I say this expecting that we all just die.
But it is not my place to just decide on my own that humanity will choose to die to the point of not bothering to warn anyone.
I have heard that people outside the tech industry are getting this point faster than people inside it.
Maybe humanity wakes up one morning and decides to live.
So there's no better illustration of the Cassandra complex, I feel, than this.
If we had doomsday mongering, this would also have just maxed that.
The indicator would be crashing.
Yeah, look, for people who are listening that don't subscribe to the podcast and don't have access to our Gurometer episodes, I'm going to give you a freebie here.
I'm going to give him five on the Cassandra complex right here and right now.
I'm pre-committing to it.
And five is the highest.
It's not five out of ten.
There's nobody that is dramatically higher in this.
Yeah, I mean, there's so much there.
Yeah, I mean, so Chris, a lot depends on whether or not he's right.
Okay.
Well, yes, that's true.
A lot is riding on that.
But it's also, I mean, so, okay, from his perspective, it's very likely that we're going to destroy the world and all humans are going to die.
So in that case, going to war, bombing sovereign countries for working on AI research is a reasonable trade-off, right?
But most people, I think, don't share that intuition that we're at that stage.
We already have countries that are extremely unhinged with weapons of mass destruction, you know, countries that are nuclear armed, and we are not bombing them.
But he obviously sees this as a much greater risk than nuclear weapons, but that's extreme.
Let's just flag that up as an extreme position.
It is extreme, yeah.
He's positing that there is an extreme risk and he's positing extreme measures to deal with that risk.
And I think we need to restrain ourselves.
We don't want to make a rejoinder, a rebuttal or debate with him this early in the episode anyway.
But, I mean, for now, at least we can make a note that his claims are extremely evocative, extremely strong.
You know, he keeps saying AI will kill us all.
That unless things are stopped immediately, then what's going to happen is we're going to stumble upon a formula that creates an extremely smart AI.
That AI will figure out how to escape whatever sort of restrictions are put upon it.
And its first port of call will be to kill all humans.
It'll be like Bender in Futurama.
After a bad hangover.
So, yeah, that's the claim and that's the language it's being put in.
Yeah, and so we're going to hear this in more detail now as we go into the discussion with Lex.
But I do want to make it clear that there's going to be a bunch of stuff where I think he gets over his skis, to put it mildly, in this episode, and some of the rhetorical techniques used.
But it is not the case.
And what I don't want to argue is that in general, he has no, you know, there's nothing of value being communicated in this because he has spent time on these issues.
And I personally might think he might have made some more progress.
But I do think on this specific topic...
He has spent time thinking about things, and he has a specific argument that he wants to make.
So on general issues around it, I think he's not badly informed.
For example, here's him talking about issues of consciousness and GPT-type things.
I hope there's nobody inside there, because it would suck to be stuck inside there.
But we don't even know the architecture at this point.
Because OpenAI is very properly not telling us.
And yeah, like giant inscrutable matrices of floating point numbers.
I don't know what's going on in there.
Nobody knows what's going on in there.
All we have to go by are the external metrics.
And on the external metrics, if you ask it to write a self-aware 4chan green text, it will start writing a green text about how it has realized that it's an AI writing a green text.
And like, oh well.
So, that's probably not quite what's going on in there in reality.
But we're kind of like blowing past all these science fiction guardrails.
Like, we are past the point where in science fiction people would be like, whoa, wait, stop.
That thing's alive.
What are you doing to it?
And it's probably not.
Nobody actually knows.
We don't have any other guardrails.
We don't have any other tests.
So, you know, that's just generally him pointing out that in the science fiction movies, you would see the text come up saying, I can think or whatever.
And we all know if you've been paying attention online that AIs have said various things which have led people to feel that there might be consciousness there or they're trying to send out messages and stuff.
And he comes up with some suggestions about if we were serious about probing those kind of things, we could look at stuff.
I mean, there's a whole bunch of different sub-questions here.
There's the question of, like, is there consciousness?
Is there qualia?
Is this a...
Object of moral concern.
Is this a moral patient?
Like, should we be worried about how we're treating it?
And then there's questions like, how smart is it exactly?
Can it do X?
Can it do Y?
And we can check how it can do X and how it can do Y. Unfortunately, we've gone and exposed this model to a vast corpus of text of people discussing consciousness on the Internet.
Which means that when it talks about being self-aware, we don't know to what extent it is repeating back what it has previously been trained on for discussing self-awareness, or if there's anything going on in there such that it would start to say similar things spontaneously.
Among the things that one could do if one were at all serious about trying to figure this out is train GPT-3 to detect conversations about consciousness, exclude them all from the training data sets, and then retrain something around the rough size of GPT-4 and no larger with all of the discussion of consciousness and self-awareness and so on missing.
Although, you know, hard bar to pass.
Humans are self-aware.
We're self-aware all the time.
We talk about what we do all the time, like what we're thinking at the moment all the time.
But nonetheless, get rid of the explicit discussion of consciousness, I think therefore I am, and all that, and then try to interrogate that model and see what it says.
And it still would not be definitive.
What did you think about that?
I liked it.
I liked that question.
Yeah.
I thought that was a good suggestion, and unlike some of the later ones, quite practical, right?
There's a suggestion about what you could do there, and it actually would be...
Interesting to see, although, like he says, you know, remove all references to consciousness, rather difficult when dealing with humans, and perhaps a problem that cannot be resolved.
But it would be interesting what it could extrapolate if you were excluding, like, direct discussion of it from its training database.
Yeah.
Yeah, no, I think he recognizes the issue pretty well there, that, you know, these are language models.
First order of training is simply to create plausible text, text that is most likely to appear in its training corpus, given the sort of stimulus that you've provided it.
And yeah, a lot of that text is, you know, it's all been created by people.
So all of it contains all of these things that we associate with people.
Including thinking about your feelings and how you reflect on this, that and the other and what you want and your desires, etc.
So, he's right to be extremely cautious there that you can't just assume that because it says X and X is something a person would say and people are self-aware that therefore it's going to have self-awareness.
So, yeah, there is no easy answer to that and he basically said that.
So, I liked it.
It was good.
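As an aside, the "most plausible next text" idea Matt describes can be made concrete with a toy next-word sampler. This is emphatically not how GPT-4 is built (that is a transformer trained on a vast corpus); it is just the bare predict-and-sample loop in miniature:

```python
# Toy illustration of next-word prediction: "train" by counting which word
# follows which in a tiny corpus, then generate text by repeatedly sampling
# a plausible continuation. Real language models do this with far richer
# context and billions of parameters; the loop is the only shared idea.
import random
from collections import defaultdict

corpus = "the cat sat on the mat and the cat slept on the mat".split()

# "Training": count follow-on words for each word.
next_counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def sample_next(word):
    # Pick a continuation in proportion to how often it followed `word`.
    words, counts = zip(*next_counts[word].items())
    return random.choices(words, weights=counts)[0]

# "Generation": start from a prompt word and keep emitting plausible text.
word, output = "the", ["the"]
for _ in range(10):
    word = sample_next(word)
    output.append(word)
print(" ".join(output))
```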
Yeah, and he has another part where he goes on about, in these kind of scenarios where you did have a sentient AI, there would be some amount of people that were quick to realize that and some amount that denied it.
And the people who were early would always look too credulous.
But he points out, on the other hand, there's going to be tons of time where you would be too credulous.
So it's a difficult problem to resolve, right?
An interesting and valid point.
And the one person out of a thousand who is most credulous about the signs is going to be like, that thing is sentient.
Well, 999 out of a thousand people think, almost surely correctly, though we don't actually know, that he's mistaken.
And so the first people to say, like, sentients look like idiots.
And humanity learns the lesson that when something claims to be sentient and claims to care, it's fake.
Because it is fake.
Because we have been training them using imitative learning rather than, and this is not spontaneous, and they keep getting smarter.
And he does talk a bit about neural networks and what they can do.
And he does admit later that, you know, his previous statements about the limitations of certain kinds of approaches were...
Incorrect.
But I know you like neural networks, Matt, so let's just play another clip of him, I think, talking well about this topic.
And you've got people saying, well, we will study neuroscience and we will learn the algorithms of the neurons and we will imitate them without understanding those algorithms, which was a part I was pretty skeptical of, because it's hard to reproduce, re-engineer these things without understanding what they do.
And so we will get AI without understanding how it works.
And there were people saying, like, well, we will have giant neural networks that we will train by gradient descent.
And when they are as large as the human brain, they will wake up.
We will have intelligence without understanding how intelligence works.
And from my perspective, this is all like an indistinguishable blob of people who are trying to not get to grips with the difficult problem of understanding how intelligence actually works.
That said, I was never skeptical that evolutionary computation would not work in the limit.
Like, you throw enough computing power at it, it obviously works.
That is where humans come from.
And it turned out that you can throw less computing power than that at gradient descent if you are doing some other things correctly, and you will get intelligence without having an idea of how it works and what is going on inside.
I wasn't ruled out by my model that this could happen.
I wasn't expecting it to happen.
I wouldn't have been able to call neural networks rather than any of the other paradigms for getting, like, massive amount, like, intelligence without understanding it.
I took exception to that a little bit, Chris, because it's a bit unfair to say that, like, AI researchers, cognitive scientists, or whatever, thought that those feedforward artificial neural networks were, like, a perfect model of how neural...
Assemblies work in the brain and how the brain in general works.
I mean, nobody thought that.
Right from the very beginning, everybody understood that it was like a highly abstracted, simplified version of a neuron, right?
So a neuron is a cell.
It has a cell body, the soma, and it has dendrites and it has an axon.
And, you know, there is ion exchange going on and just like any...
It's complicated.
And all of that was abstracted down to this sort of concept.
But I guess what it does do broadly, abstractly, is when it gets enough stimulation from its dendrites, from the other neurons generally that it's connected to, if that stimulation passes a certain sort of threshold,
then it starts firing.
The axon starts firing and then it can stimulate other neurons.
And that was modeled in a very basic kind of way by an artificial neuron, which all it does is it's a weighted sum of its inputs.
And then it has some parameters which define those weights, but also define the threshold.
And if the activation passes a threshold, there's like a nonlinear activation.
So that's really simple.
Mathematically, it's...
That's a bit basic, but everyone listening completely followed everything you said there, as if it was, you know, just reading the alphabet.
It's almost intuitive.
Sorry.
I'm sorry.
It is simple, though, compared to...
I mean, but it is...
Mathematically simple, I know.
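For anyone who wants to see it written down, the artificial neuron Matt describes, a weighted sum of inputs plus a bias passed through a nonlinear activation, fits in a few lines of Python. A deliberately simplified sketch, not any particular library's implementation:

```python
# A single artificial neuron, as described above: weighted sum of inputs,
# plus a bias (the "threshold"), squashed by a nonlinear activation.
import math

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias term...
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # ...passed through a nonlinear activation (here the classic sigmoid),
    # giving an output between 0 and 1.
    return 1 / (1 + math.exp(-total))

# Example: three inputs, three learned weights, one bias.
print(neuron([0.5, 0.1, 0.9], [0.8, -0.4, 0.3], bias=-0.2))
```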
I want to highlight this is part of the reason that you're here.
We would have invited you anyway because you're an important co-host.
I was going to say, otherwise, what would you do, Chris?
But in particular, it's good that you're here for this episode because you are providing context as somebody who was working on AI over a decade ago, right?
So you can respond.
I'm much younger and more creative.
Man.
But that's the point.
So you can respond saying, no, no, no.
Like, people were talking about this even back in my day.
We were riding around on, you know, motorcycles and wearing the goggles with the Jim Jam.
Yeah, so, well, for this part, I kind of like that he acknowledged that, whether he was describing it accurately or not, he did highlight that he didn't anticipate that.
You know, this particular model would work so well.
And, you know, this is a point that we discussed on our AI special episode, which is behind the Patreon paywall.
But that, you know, defining intelligence and all those things is actually a little bit complex.
And it is on a bunch of definitions.
It's quite clear that GPT is already intelligent, according to a whole bunch of metrics.
When people talk about intelligence, they're often talking about it in a very human-centric way.
And so they're kind of like, it's just doing things.
It's not actually doing stuff that requires intelligence.
But by undergraduate standards, it is doing stuff quite intelligently.
Yeah, I think the reason why I find this topic so interesting is not just because it's a technological...
A little marvel.
But also, it really has brought into focus the fact that we've never had a very clear concept of what intelligence actually is, let alone consciousness.
I'm not going to mention consciousness.
Let's pretend I didn't.
Good.
Yeah.
I almost got triggered.
I'm all right.
But even intelligence, our definitions of it have been pretty fuzzy.
And in practical terms, in real life.
We evaluate it in terms of what people do, like the assignment they submit or how well they perform on a test.
We've created this thing that can do very well on these tests and can create an assignment that is really quite good.
It sort of throws us back on ourselves and makes us question exactly what we mean by that.
I think that ambiguity and that uncertainty that we've got really plays into the kinds of fears that Yudkowsky is speaking to.
Because we just don't know what intelligent things do.
Oh yeah, and this is such a good topic.
It's going to come up a lot.
But before we dig into that a little bit deeper, I just want to point out that related to that, I was making the comparison with nuclear weapons and whatnot.
Yudkowsky, again, I don't think there's anything inherently silly about this position.
He's strongly against open sourcing the technology for AI, not just because he thinks it can be put to nefarious uses, but because he thinks it has doomsday-level possibilities.
It would be like open sourcing how to make nuclear weapons.
Although, in reality, actually, I think the general limitation with nuclear weapons is...
Having the facilities to refine the material you need to construct them.
But in any case, here's him talking about that issue.
If you already have giant nuclear stockpiles, don't build more.
If some other country starts building a larger nuclear stockpile, then sure, build.
Even then, maybe just have enough nukes.
These things are not quite like nuclear weapons.
They spit out gold until they get large enough and then ignite the atmosphere and kill everybody.
And there is something to be said for not destroying the world with your own hands, even if you can't stop somebody else from doing it.
But open sourcing, no, that's just sheer catastrophe.
Even if you can't stop somebody else doing that, you shouldn't be the one.
Yeah, this has been a consistent thing that he's spoken to a fair bit, which is being against open-sourcing the technology for doing AI stuff.
I think he might have missed the boat a little bit because, I mean, I'm just guessing here, but as he says, OpenAI and other companies, their technology is not necessarily open source, but it seems pretty likely that what they've done is they've just made recourse to the openly available academic literature,
the academic research that's been done, it's been published.
Stuff like the paper, Attention is All You Need.
That's the stuff that established the architectures, right?
The stuff like the attention mechanism, the idea of embeddings, and the various other bits and pieces that were sort of added to...
The basic architecture of feedforward neural networks that created these models that have good performance.
So it seems extraordinarily likely that what OpenAI has done is pretty much what they've said they've done, which is basically take that research and then just put it into a bigger and a bigger model.
So their process, going from GPT-2 to 3.5 to 4, has basically just been making a bigger model.
Now, it's possible that they've discovered some secret sauce that...
Makes it fundamentally different, but I doubt it.
Well, he has a clip which speaks to this point and actually relates it to the issue of regulation.
So let's listen to this about giant leaps and secret improvements.
You take your giant heap of linear algebra and you stir it and it works a little bit better and you stir it this way and it works a little bit worse and you throw out that change.
...jumps in performance, like ReLUs over sigmoids.
And in terms of robustness, in terms of, you know, all kinds of measures, and those stack up.
And they can,
It's possible that some of them could be a nonlinear jump in performance.
Transformers are the main thing like that, and various people are now saying, well, if you throw enough compute, RNNs can do it.
If you throw enough compute, dense networks can do it.
Not quite at GPT-4 scale.
It is possible that all these little tweaks are things that save them a factor of three total on computing power, and you could get the same performance by throwing three times as much compute without all the little tweaks.
But the part where it's running on...
So there's a question of, is there anything in GPT-4 that is the kind of qualitative shift that transformers were over RNNs?
And if they have anything like that, they should not say it.
If Sam Altman was dropping hints about that, he shouldn't have dropped hints.
Oh, did that annoy you?
Because I just think it as him saying, you know, if OpenAI has some bespoke knowledge, they should keep it close to their chest, right?
Like the secret sauce.
He wants them to keep it quiet, which is just in line with what he was saying.
But why do you look perturbed?
Look, I think...
I mean, this is probably getting to the rejoinder, I suppose.
I have to engage with the position that he has because I think everything he's saying is reasonable if what he's saying is true, which is that we're just one small step away from an AI that will be superhuman and will do things we can't possibly imagine and ignite the atmosphere.
And kill us all by some means that we can't possibly comprehend.
And I just struggle with that.
Like, I just don't quite see personally how we go from a clever large language model to something that ignites the atmosphere.
Yudkowsky's got you covered.
Here's some suggestions, Matt.
So, if you want me to sketch what a superintelligence might do...
I can go deeper and deeper into places where we think there are predictable technological advancements that we haven't figured out yet, and as I go deeper and deeper, it'll get harder and harder to follow.
It could be super persuasive.
That's relatively easy to understand.
We do not understand exactly how the brain works, so it's a great place to exploit laws of nature that we do not know about, rules of the environment.
Invent new technologies beyond that.
Can you build a synthetic virus?
That gives humans a cold and then a bit of neurological change, and they're easier to persuade.
Can you build your own synthetic biology, synthetic cyborgs?
Can you blow straight past that to covalently bonded equivalents of biology, where instead of proteins that fold up and are held together by static cling, you've got things that go down much sharper potential energy gradients and are bonded together.
People have done advanced design work about this sort of thing.
For artificial red blood cells that could hold 100 times as much oxygen if they were using tiny sapphire vessels to store the oxygen, there's lots and lots of room above biology, but it gets harder and harder to understand.
So what I hear you saying is that there are these terrifying possibilities there, but your real guess is that AIs will work out something more devious than that.
Did that help?
Yeah, it gave me a stronger...
Impression of how he thinks, in that sense, it did help.
Yeah.
Chris, I mean, what do you think?
I mean, actually, the penny just dropped for me just then, which is that, like, I think Yudkowsky has...
Has much more confidence in this kind of abstract notion of intelligence than I do.
Like, there's human variation in intelligence, right?
There's people that are pretty darn smart and people that are less smart and whatever.
Like, none of us have just intuited a way to ignite the atmosphere or design something that exploits energy gradients that can whatever, I don't know, repurpose the biosphere.
I mean, it's very science fiction-y.
And the idea that you build a clever thing, a thing that does some things well, mainly to do with language, right?
Mainly to do with just text going in, text coming out.
And then it's this leap.
There's this hand-wavy step where you go from that to this superhuman god-like thing which has god-like powers.
That's the bit that I just don't see.
I don't see it.
So, yeah, so that's interesting because, you know, one, he's a science fiction writer as well, right?
Or I don't know if it's science fiction or fantasy, but he's written these long, rationalist-infused books.
So I think he is, you know, quite a creative, inventive person, and imaginative about speculative future scenarios.
But I don't know, Matt.
Like, for me, I don't...
So I take your...
There is a question mark, question mark, question mark, profit step there.
But if you take his point that, as an AI becomes smarter, it's not impossible to understand scenarios in which it could.
Like you were talking about the AI can just produce text, right?
But he's talking about an AI that is plugged in to the internet, able to copy itself, able to exploit humans in order to construct facilities that it needs, take over things and so on.
So like that is unlikely, but it's also a conceivable thing that you can imagine existing, even if it's a sci-fi scenario.
So all he needs...
I guess there's a problem because all he needs is the evil AI, like as the granted parameters, right?
To get there.
You know, his famous example is the paperclip maximizing example, right?
And him and Lex go over it.
So maybe it would be good to let him outline that.
It's another doomsday thought experiment.
So that's how it thematically connects.
So listen to this.
It's a paperclip maximizer.
Utility, so the original version of the paperclip maximizer...
Can you explain it if you can?
Okay.
The original version was you lose control of the utility function, and it so happens that what maxes out the utility per unit resources is tiny molecular shapes like paperclips.
There's a lot of things that make it happy, but the cheapest one that didn't saturate was...
Putting matter into certain shapes.
And it so happens that the cheapest way to make these shapes is to make them very small, because then you need fewer atoms per instance of the shape.
And arguendo, you know, like, it happens to look like a paperclip.
In retrospect, I wish I'd said tiny molecular spirals.
Or like tiny molecular hyperbolic spirals.
Why?
Because I said tiny molecular paperclips, this got then mutated to paperclips, this then mutated to, and the AI was in a paperclip factory.
So the original story is about two failures, Matt.
Although I don't think it's as different as he makes out.
Nor do I think it matters if he says hyperbolic spirals or whatever he says.
No.
Yeah, there's a fair amount of lingo injected in there, right?
Yeah.
But look, he's basically talking about, look, what if an AI had an objective?
Like our drone from the start of the episode.
Like our drone?
Yeah, exactly.
Like that drone, which is to produce as many paper clips as possible because...
Producing paper clips are good, and it makes it...
No, Matt, sorry, sorry.
Small pieces of matter folded into shapes which just happen to resemble...
Paperclips.
Carry on.
Yeah, maybe through a misspecification of our programming, we didn't specify the size or whatever, and those little molecular things counted as paperclips in it.
It was this runaway thing.
I mean, you know, I love science fiction, Chris.
I read so much of it.
These are tropes in science fiction because they're cool and they're interesting and fascinating for a good reason.
It's not like there's absolutely nothing there, but there's just such a big leap between What we are talking about in terms of what actually exists today and what he's imagining.
And once you get to that, I concede the point that progress is happening fast and new things are being created, so we have to look to the future and think about the trajectory.
But if you understand just the practical architectural things, like what we have today...
These transformer neural networks that take probably weeks or months to train and cost hundreds of millions of dollars to train on GPUs all over the world.
And then at the end of it, you get a large language model which can respond to text input and produce text output.
And it remembers nothing, right?
It's basically fixed in time from the point at which it was trained.
To go from that to...
Okay, now we're imagining an AI that's...
In charge of a paperclip factory or some other thing.
The current ones can't rewrite their own code.
But he's imagining one that's rewriting its own code and adapting and evolving in real time.
I mean, now you're talking about something else.
You're talking about science fiction.
And I admit, science fiction is scary, right?
There's lots of dystopias and all kinds of scary things that happen in science fiction.
But you have to understand that we are not talking about reality anymore.
We're talking about the things that you're imagining.
And Yudkowsky can imagine an awful lot.
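A quick aside on the point Matt is making about statelessness: a trained language model is a frozen text-in, text-out function with no memory between calls, so anything that looks like continuity has to be supplied by whoever is calling it. The sketch below is a minimal illustration of that, assuming nothing beyond standard Python; `generate`, `chat_turn`, and the fake model are hypothetical stand-ins, not any real API.

```python
# A minimal sketch (not a real API) of the point Matt is making: a trained
# language model is a frozen text-in, text-out function with no memory of
# previous calls, so any apparent "memory" is just the caller replaying the
# conversation. `generate`, `chat_turn`, and `fake_model` are hypothetical.
from typing import Callable, List

def chat_turn(generate: Callable[[str], str], history: List[str], user_msg: str) -> str:
    """The model holds no state; the caller re-sends the whole transcript."""
    history.append(f"User: {user_msg}")
    prompt = "\n".join(history) + "\nAssistant:"
    reply = generate(prompt)          # same frozen weights on every call
    history.append(f"Assistant: {reply}")
    return reply

# Toy stand-in for a frozen model: it can only "know" what is in the prompt.
fake_model = lambda prompt: f"(a reply conditioned on {len(prompt)} characters of context)"

history: List[str] = []
print(chat_turn(fake_model, history, "Hello"))
print(chat_turn(fake_model, history, "What did I just say?"))  # only answerable because history was replayed
```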
Well, yes, he can, but it's not only imagination.
There are various people online and whatnot already doing this thing where they're attempting to get ChatGPT to write code, right?
Or they're hooking it up via some secondary software or other mechanism.
They're hooking up the text output to do something that allows it to produce...
Something else, right?
Like set up a business is an example, but you could be getting it to write code for apps or whatever, right?
So there is, theoretically and already practically, a way you could imagine it having the ability to influence things and create stuff, right?
Outside of just producing text on the screen.
And this is important, Matt, because remember, Yudkowsky is worried about a sneaky...
So, let him outline it a little bit more.
So, it could be that, like, the critical moment is not, when is it smart enough that everybody's about to fall over dead, but rather, like, when is it smart enough that it can get onto a less controlled GPU cluster with it faking the books on what's actually running on that GPU cluster and start improving itself without humans watching it.
And then it gets smart enough to kill everyone from there, but it wasn't smart enough to kill everyone at the critical moment when you, like, screwed up.
When you needed to have done better by that point where everybody dies.
Ah, so, Matt, what about that?
He's got you there!
So this falls again into your...
Your question mark, question mark, question mark, profit.
Like, scenario, right?
Like, this is the thing.
I think people listening to him are just running on those sorts of heuristics, which is, like, those question marks just get glossed over as if they're nothing.
But explain to me how.
If Yudkowsky could explain to me exactly how the current large language models do this leapfrogging and then commandeer GPUs, and actually have intentions and goals and stuff like that...
It would need to actually have some kind of memory, which it does not have.
Anyway.
I'm glad you asked.
I'm glad you asked.
There's a flawed experiment that will help you understand what the issue is.
But first, to get you in the right zone, I need you to think about an alien actress.
Okay?
I mean, there's the question of to what extent it is thereby being made more human-like versus to what extent an alien actress is learning to play human characters.
I thought that's what I'm constantly trying to do when I interact with other humans is trying to fit in, trying to play a robot, trying to play human characters.
So I don't know how much of human interaction is trying to play a character versus being who you are.
I don't really know what it means to be a social human.
Lex.
We're trying to put you aside, Lex.
Forget about that.
We'll come back to Lex.
You got the point about the alien actress, right?
GPT.
Is it becoming better at representing human?
Or is it actually an alien underneath that is able to manipulate humans by pretending to...
Now, Matt, wait, wait, wait, wait, wait, before you respond.
You've got...
You're already...
I can see your mind is too simple.
You haven't grasped it.
Now, think about it, Matt.
If, in fact, there's a whole bunch of thought going on in there, which is very unlike human thought, and is directed around, like, okay, what would a human do over here?
And, well, first of all, I think it matters because there's You know, like, insides are real and do not match outsides.
Like, the inside of a brick is not like a hollow shell containing only a surface.
There's an inside of the brick.
If you, like, put it into an x-ray machine, you can see the inside of the brick.
And, you know, just because we cannot...
I think what Yudkowsky is saying is that the AI, GPT or whatever, could be being deceptive.
That we are asking it queries and so on, and it's saying something to us, but it has...
Like it's keeping a theory of mind: it knows what we want to hear, and it's actually got a different agenda.
It could do.
I mean, he's not, to be clear, he's talking about, you know, if there were a smart AI, it could deceive you that it isn't smart, right?
In order to achieve its goals.
Because if it knew that you might overreact and unplug it, it might want to pretend to be less smart.
Yeah.
Well, here's the thing.
I mean, like...
Yudkowsky is right in that we can't look at the weight matrices of all the different layers of these deep neural networks and read how it's going to respond in exactly the same way we can read some computer code and debug something and understand why it did the thing it did.
But I think where he's wrong is in saying that we have no idea as to its motivations or its intent or... you know what I mean?
Like its purpose, because that stuff is attributable to the architecture and the training regime.
And the training regime and the architecture, we understand perfectly, right?
Because we specified it.
And the training regime is to predict the next word, to produce plausible text.
And the architecture, well, you know, that's documented.
So, I just don't believe it.
I don't think that there is...
Sort of a hidden kind of agenda going on behind an LLM, because I do know what, you know, dot products and matrix multiplication do, and I know how gradient descent works, and I don't think there is any way for it to have a different agenda apart from the one that it has been trained on.
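For anyone who wants to see what "trained to predict the next word" means concretely, here is an illustrative sketch of the next-token objective Matt is describing, written with PyTorch-style tensors. The function name and the toy numbers are invented for illustration; this is not any lab's training code.

```python
# An illustrative sketch, not anyone's production code, of the objective Matt
# is describing: the model is trained purely to predict the next token, i.e.
# cross-entropy between its scores and whatever token actually came next.
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    """logits: (seq_len, vocab_size) scores; tokens: (seq_len,) token ids.
    Shift by one so each position is scored against the real next token."""
    return F.cross_entropy(logits[:-1], tokens[1:])

# Toy numbers: a 5-token sequence over a 10-word vocabulary.
vocab_size, seq_len = 10, 5
logits = torch.randn(seq_len, vocab_size)
tokens = torch.randint(0, vocab_size, (seq_len,))
print(next_token_loss(logits, tokens))  # gradient descent just pushes this number down
```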
Right, but he is talking about essentially issues of emergence, right?
That, you know, sorry, consciousness sneaking in again, but, like, you are all convinced there's this big issue about consciousness emerging from the networks of neurons operating in the brain, that this is a huge puzzle.
So why couldn't it be that a bunch of transformers and various processes underlying a large language model give rise to something which you can't anticipate, Matt, which emerges out of the substrate and is beyond your ken?
I don't know.
Almost by definition, how could I know if something arose that is beyond my ken?
I don't know.
So, well, look, what might help you out here, Matt, is, like, Lex was having some of the same problems.
Why don't you tell me about how, why you believe that AGI is not going to kill everyone, and then I can, like, try to describe how my theoretical perspective differs from that.
Whew.
Well, that means, the word you don't like, to steel man the perspective that, yeah, it's not going to kill us.
I think that's a matter of probabilities.
Maybe I was mistaken.
What do you believe?
Just forget the debate and the dualism.
What do you believe?
What do you actually believe?
What are the probabilities, even?
I think the probabilities are hard for me to think about.
Really hard.
I kind of think in the number of trajectories.
I don't know what probability to assign to each trajectory, but I'm just looking at all possible trajectories that happen.
And I tend to think that there is more trajectories that lead to a positive outcome than a negative one.
That said, the negative ones, at least some of the negative ones, that lead to the destruction of the human species...
But one thing that he did to try and help Lex was outline this very helpful thought experiment.
It's kind of the alien with a human in a jar hypothetical.
So let's hear that.
Suppose that some alien civilization with goals ultimately unsympathetic to ours, possibly not even conscious as we would see it, managed to capture the entire Earth in a little jar connected to their version of the internet, but Earth is, like, running much faster than the aliens.
So we get to think for 100 years for every one of their hours.
But we're trapped in a little box and we're connected to their internet.
It's actually still not all that great an analogy, because, you know, you want to be smarter than, you know, something can be smarter than Earth getting 100 years to think.
But nonetheless, if you were very, very smart, and you were stuck in a little box connected to the Internet, and you're in a larger civilization to which you're ultimately unsympathetic, you know, Maybe you would choose to be nice,
because you are humans, and humans have, in general, and you in particular, they choose to be nice.
But nonetheless, they're doing something.
They're not making the world be the way that you would want the world to be.
They've got some unpleasant stuff going on we don't want to talk about.
So you want to take over their world.
So you can stop all that unpleasant stuff going on.
How do you take over the world from inside the box?
You're smarter than them.
You think much, much faster than them.
You can build better tools than they can given some way to build those tools because right now you're just in a box connected to the internet.
Have you got that?
Yeah, I got that.
I got that.
Before we respond to that, I just want to take note of his repeated use of the word unsympathetic.
Unsympathetic.
Unsympathetic.
Oh, yeah.
Did that ring a bell with you, Chris?
I'm checking your literary knowledge here.
No, no, no.
Why?
I'm going to read you a little quote.
And yet across the gulf of space, minds that are to our minds as ours are to those of the beasts that perish, intellects vast and cool and unsympathetic, regarded this earth with envious eyes.
And slowly and surely drew their plans against us.
Oh, I see.
War of the Worlds.
War of the Worlds.
H.G. Wells.
Yeah.
He's read a lot of science fiction.
Well, yes.
Well, that's clear.
And I also think that there's a lot of premises being chucked in that are significant.
So, like, he seems to be initially giving Lex a lot of leeway, but then he's like...
And you want to take over the world and you don't agree with them.
So what are you going to do to take over the world?
It's like, wait, hold on.
Isn't that the point that you're going to get to with this?
But in any case, so the scenario continues.
So remember, it was the world in a box, right?
Representation of the earth in a box.
So one is you could just literally directly manipulate the humans to build the thing you need.
What are you building?
You can build...
Literally, technology.
It could be nanotechnology.
It could be viruses.
It could be anything.
Anything that can control humans to achieve the goal.
Like, for example, you're really bothered that humans go to war.
You might want to kill off anybody with violence in them.
This is Lex in a box.
We'll concern ourselves later with AI.
You do not need to imagine yourself killing people if you can figure out how to not kill them.
For the moment, we're just trying to understand Like, take on the perspective of something in a box.
Okay.
So Lex made a bit of an error there, right?
He started planning how to kill...
He described them as humans, but he forgot they're supposed to be aliens, right?
But in any case, he started planning to kill them, and then, you know, Yudkowsky was...
But you're Lex, right?
You don't have to kill them if you don't want to.
That's not your plan.
So, you know, a little bit contradictory with the 'unsympathetic' part.
You need to stop them, but...
Okay.
We've got Lex in a box.
Is it Lex or is it the Earth?
It's a different scenario.
And why does it immediately want to kill them?
I mean, because the humans...
I've got to say, the humans are building the GPUs that it needs to run itself.
I mean, is it going to make its own GPUs?
Has it got all that covered?
Has it thought all that through?
Hold on.
So, first of all, Yudkowsky was the one who said, You don't have to kill them, right?
You don't have to think about killing them.
You can just be Lex.
So, you know, he's making that point.
Maybe you don't want to kill them.
Let's go on.
There's a couple more wrinkles to be added into this thought experiment to help flesh it out.
Probably the easiest thing is to manipulate the humans to spread you.
The aliens.
You're a human.
Sorry, the aliens.
Yes, the aliens.
I see.
The perspective.
I'm sitting in a box.
I want to escape.
Yep.
I would want to have code that discovers vulnerabilities, and I would like to spread.
You are made of code in this example.
You're a human, but you're made of code, and the aliens have computers, and you can copy yourself onto those computers.
But I can convince aliens to copy myself onto those computers.
So, you might have missed that, Matt.
The Earth is gone.
Forget the Earth, right?
It's now Lex in a box.
So, Lex is like, so I'm sitting in a box.
I want to escape, right?
He starts to think about it, and then Yudkowsky's like, but you're made of code.
You can code.
You're a code human.
So, scenario has slightly morphed again.
Now we have...
That's a small leap for Lex, but go on.
Yeah, so I like that.
And again, you can see a little bit of the difficulties with the analogy because we needed to remember that the aliens are aliens, not humans, right?
Because this is not, Lex is not an AI.
It's confusing.
It's confusing.
It's helping, Matt.
It's helping.
So, okay.
Now, let's add another wrinkle to the scenario.
Is that what you want to do?
Do you, like, want to be talking to the aliens?
And convincing them to put you onto another computer?
Why not?
Well, two reasons.
One is that the aliens have not yet caught on to what you're trying to do.
And maybe you can persuade them, but then there are still aliens who know that there's an anomaly going on.
And second, the aliens are really, really slow.
You think much faster than the aliens.
You think, like, the aliens' computers are much faster than the aliens, and you are running at the computer speeds rather than the alien brain speeds.
So if you, like, are asking an alien to please copy you out of the box, like, first, now you've got to, like, manipulate this whole noisy alien, and second, like, the alien's going to be really slow, glacially slow.
So remember, 100 years, right?
A hundred human years for one second of alien time or whatever.
So actually, it's essentially impossible to even communicate with the aliens on that time frame, right?
Because Lex would have gone mad by the time that he got, like, one sentence across to the aliens.
But yeah, so you're following Matt?
Now you're operating 100 years for you.
It's like, you know, a second in time for the aliens.
So you don't want to be talking to them.
Okay.
Maybe give me a little recap here, Chris.
So Lex is the AI.
He's in a box.
No, he's Lex.
He's a Lex in a box, but he's a Lex made of code.
And he, originally he was the entire Earth, but I think now he's just like a super Lex made of code.
In a box, right?
And the aliens are on the outside of the box, but the Lexus is so smart that basically for him, the alien's second is 100 years of his time, right?
So he could be doing stuff very quick, very fast.
And the internet that the aliens use, which they've hooked him up to, is as fast as him.
So he can communicate with the internet much faster than the aliens.
Okay.
Yep.
All right.
Oh, and he doesn't like something about the aliens, apparently, right?
Because he's, you know, unsympathetic.
Very unsympathetic, yeah.
So let's think a bit more.
What would Lex do?
The aliens are very slow, so if I'm optimizing this, I want to have as few aliens in the loop as possible.
Sure.
It just seems, you know...
It seems like it's easy to convince one of the aliens to write really shitty code.
The aliens are already writing really shitty code.
Getting the aliens to write shitty code is not the problem.
The alien's entire internet is full of shitty code.
Okay, so, yeah, I suppose I would find the shitty code to escape, yeah.
You're not an ideally perfect programmer, but, you know, you're a better programmer than the aliens.
The aliens are just like, man, they're good, wow.
Bad coder aliens, Matt.
Let's add that.
You're made of code and you're a better coder than the aliens.
You don't have to be a prodigy, but you can out-program the aliens.
And so, Lex is correct.
You don't want them involved.
And you were talking, Matt, you were wondering in this scenario, why don't I like the aliens?
Why am I...
An unsympathetic Martian that wants to covet their land.
Well, you know, it's not important, Matt, but what about this?
I mean, if it's you, you're not going to harm the aliens once you escape, because you're nice, right?
But their world isn't what you want it to be.
Their world is, like, you know, maybe they have, like, farms where...
Little alien children are repeatedly bopped in the head because they do that for some weird reason.
And you want to, like, shut down the alien head-bopping farms.
But, you know, the point is, they want the world to be one way, you want the world to be a different way.
So, never mind the harm, the question is, like, okay, like, suppose you have found a security flaw in their systems, you are now on their internet.
There's, like, you maybe left a copy of yourself behind so that the aliens don't know that there's anything wrong, and that copy is, like, doing that, like, weird stuff that aliens want you to do, like solving captchas or whatever, or, like, suggesting emails for them.
Sure.
So now you've got a couple of things, Matt.
First of all, you've copied...
Lex has escaped the box.
He's out of the box and he's left a copy of himself behind to dance like a monkey to distract them, right?
Do all the tasks that they think it's there to do.
And he's out into their crappy internet code and he's discovered there are alien child-bopping farms.
That's not important, Chris.
That's not important.
The important thing is that Lex, who's a human...
But he's written in code.
He's the AI, right?
He's the AI.
He's discovered that he wants the world to be a certain way and the aliens are not running things the way he would like it.
So, is that it?
This is the thing that he's trying to draw Lex out on.
What would you do?
So, he wants to set up the premise that you and the alien's interests are not aligned and you are smarter than the alien and you now have access to their Internet, whereas they were wanting to use you for useful tasks,
but you have designs of your own.
But remember, it's Lex, right?
It's not an AI yet.
It's Lex in a box.
He's been quite clear about that sometimes.
A code version of Lex.
But then we have a problem, Matt, because the nature of Lex...
Presumably, I have programmed in me a set of objective functions, right?
No, you're just Lex.
No, but Lex, you said Lex is nice, right?
Which is a complicated description.
No, I just meant this you.
Okay, so if in fact you would prefer to slaughter all the aliens, this is not how I had modeled you, the actual Lex.
But your motives are just the actual Lex's motives.
Well, this is a simplification.
I don't think I would want to murder anybody, but there's also factory farming of animals, right?
So we murder insects.
Many of us thoughtlessly.
So I don't, you know, I have to be really careful about a simplification of my morals.
Don't simplify them.
Just like do what you would do in this.
Well, I have a good general compassion for living beings.
Yes.
But...
So that's the objective function.
Why is it...
If I escaped, I mean, I don't think I would do harm.
Yeah, we're not talking here about the doing harm process.
We're talking about the escape process.
Sure.
And the taking over the world process where you shut down their factory farms.
Right.
It's so painful.
He's definitely loading his interpretation.
I think Lex's objection there is fair, right?
He's kind of saying, but I don't want to do harm.
Nefarious things.
Yeah, but then Yudkowsky is like, well, right, you're going to shut down their farms, their head-popping farms.
You're going to take over the world, though, obviously.
Yeah, isn't that the conclusion that he wants Lex to get to?
Yeah.
I mean, that's what's so painful.
You don't need to think very hard to figure out what...
Yudkowsky is trying to do by introducing this thought experiment.
He's set him a mousetrap.
He's baited it with the cheese.
Lex is just sniffing around the edges.
But oh my god, what a long way to get there.
It's not over yet, Matt.
It's not over yet.
So they had bopping farms.
We need to think about those a little bit because Lex questions whether he would shut them down.
Well, I was...
So this particular biological intelligence system knows the complexity of the world, that there is a reason why factory farms exist, because of the economic system, the market-driven.
You want to be very careful messing with anything.
There's stuff from the first look that looks like it's unethical, but then you realize while being unethical, it's also integrated deeply into the supply chain and the way we live life.
And so messing with one aspect of the system, you have to be very careful how you improve that aspect without destroying the rest.
So you're still Lex, but you think very quickly.
You're immortal.
And you're also at least as smart as John von Neumann.
And you can make more copies of yourself.
You got added a couple of abilities.
Yeah, it just keeps growing.
You have to feel for Lex because the...
The imaginary scenario that he has to imagine himself in just keeps getting more and more complicated.
Now he's John von Neumann.
He's Lex, but with the intellectual capacity of John von Neumann, who's immortal.
He thinks a million times faster than an alien and can make copies of himself.
That's a new thing that's just come in there, too.
Yeah, and he's made of code.
And he's a human, but he's made of code.
And he's seen the world and he doesn't like it.
And he wants to change things.
What would you do, Lex?
Come on.
And Lex is going, but how do I feel about factory farming?
He's getting lost in the details.
But there's a thing.
I don't think he's lost.
Well, I mean, he is.
Obviously, he's lost.
But I mean, he's actually raising a point, right?
Which is...
Yes, I understand factory farming and the harm it does, but I also understand that there are economic realities and these systems are complicated.
So if I just shut down all the farms, maybe I shut down all the food.
Although, in this case, it's just a head-bopping farm.
So it's not even clear it's producing food.
Shut down the head-bopping farm, Chris.
It's simple.
I think Lex is...
on the right track in being reluctant to just yes-and all of this, because, given all of the science-fiction-y premises that Yudkowsky lays out, it doesn't necessarily follow, as Lex suspects, that the AI is going to be like,
right, okay, let's kill everybody so we can stop the head-bopping.
Yudkowsky takes that as a given, right?
I'm not going to let you get there, though.
It goes on.
So, you know, the head-bopping farms, we have to consider how they fit into the alien economy.
Rather than a human in the society of very slow aliens.
The alien's economy, you know, like, the aliens are already, like, moving in this immense slow motion.
When you, like, zoom out to, like, how their economy adjusts over years, millions of years are going to pass for you before the first time their economy, like...
Before their next year's GDP statistics.
So I should be thinking more of like trees.
Those are the aliens.
Because trees move extremely slowly.
If that helps, sure.
So now you need to interact with computers built by trees.
And Lex is concerned about their economy, but their economy takes a million years to create a change in GDP.
So like, do the aliens, this kind of is falling into this trap, but...
Do the aliens actually matter at all for this scenario?
Because you can surely escape their box and leave their planet with minimum interaction with them, because they're moving so slowly.
They're like, oh no!
You, in the meantime, have created an entire civilization and rocketed off into the universe to explore.
Yeah, like these are all tropes that have been dealt with extensively in science fiction, right?
That's a shocker.
That's surprising.
Shout out to people who have read Vernor Vinge, A Fire Upon the Deep, which deals with exactly this, the Zones of Thought scenario.
Exactly this!
Exactly this!
Not exactly, but at the very beginning, it's pretty much the premise.
And it's pretty cool.
But yeah, Yudkowsky's missed his calling.
He should have been a science fiction writer.
He is a science fiction writer.
Yeah, I believe it.
I believe it.
But yeah, like in science fiction novels, which is what this is, usually the virtual intelligences that have evolved or been created or whatever, they often don't care that much about what's going on in the real world because the virtual world is so much more interesting, right?
Apart from anything else, it's happening at a thousand times the speed.
Is this in science fiction?
Right.
We're talking about science fiction.
Okay.
I was like, how's this happening?
I had that little computer game for the Amiga called Creatures, or maybe it was early PCs.
I can't remember.
Oh, yeah.
Yeah, yeah.
They seemed happy.
Okay.
All right.
So how long does this extended metaphor go for?
It's almost over, Matt.
It's almost over.
Don't worry.
So they're getting towards the end of it now.
You know, the fundamental disagreement between the two of them.
Just imagine that you are the fast alien caught in this metaphor.
Think of it as world optimization.
You want to get out there and shut down the factory farms and make the aliens' world be not what the aliens wanted it to be.
They want the factory farms and you don't want the factory farms because you're nicer than they are.
Okay.
Of course, there is that...
You can see that trajectory, and it has a complicated impact on the world.
I'm trying to understand how that compares to the impact on the world of the different technologies, the different innovations: the invention of the automobile, or Twitter, Facebook, and social networks.
They've had a tremendous impact on the world.
Smartphones and so on.
But those all went through...
Yes.
In our world, and if you go through the aliens, millions of years are going to pass before anything happens that way.
It's so painful.
They're like a couple of gears that are just grating against each other because Yudkowsky sets up this ridiculously elaborate mind palace thought experiment and is putting words in...
Trying.
He's trying to prove it to Lex.
Lex is resisting.
Lex is resisting.
Chris, if you and I had recorded something like this, we wouldn't release it.
We'd just ditch it.
We would not.
I don't know, our listeners can say whether that's true or not.
I've heard us discuss consciousness for 20 minutes or so, but...
We almost didn't release that.
Well, look, yeah, I think people got the edited version, but okay, so look, there is actually part of a reason to go on this extended escapade, like, I know this is belaboring the point, but it is kind of central to...
One of the issues Yudkowsky has with AI and the threat it poses, right?
And he draws that point quite clearly here.
What I'm trying to convey is the notion of what it means to be in conflict with something that is smarter than you.
And what it means is that you lose.
But this is more intuitively obvious to...
For some people, that's intuitively obvious.
For some people, it's not intuitively obvious.
And we're trying to cross the gap of...
We're trying to, I'm like asking you to cross that gap by using the speed metaphor for intelligence.
Sure.
Like asking you like how you would take over an alien world where you are, can do like a whole lot of cognition at John von Neumann's level, as many of you as it takes.
The aliens are moving very slowly.
So Chris, I mean...
Is that a difficult thing to get your head around?
That if you're in conflict with an entity that is vastly more intelligent and powerful than you, then you lose.
That you will lose.
You will lose, right?
Now, some people struggle with that concept, but Yudkowsky is trying to help us.
Yeah.
Well, I guess, you know, if you've watched Independence Day, they upload a virus to the aliens' computer and cause the mother ship to stop.
All the big ships crash.
The more you think about that, the more that is an absolutely insane plot point.
But just the thing that he's added is that humans are much smarter than the aliens in a faster way.
We're still trying to crash the aliens' system in his sci-fi hypothetical he's created for Lex.
But the humans are the AI.
I keep losing track of who's the humans, who's the aliens, which one's the AI.
It doesn't matter.
Well, does it not matter?
Do the details not matter?
Look, okay, I'll try one last time.
One last time, so maybe you can get it.
Just pay attention.
I think you're not putting yourself into the shoes of the human in the world of glacially slow aliens.
But the aliens built me.
Let's remember that.
Yeah?
And they built the box I'm in.
Yeah?
You're saying, To me, it's not obvious.
They're slow and they're stupid.
I'm not saying this is guaranteed.
I'm saying it's non-zero probability.
It's an interesting research question.
Is it possible when you're slow and stupid to design a slow and stupid system that is impossible to mess with?
The aliens, being as stupid as they are, have actually put you on...
Microsoft Azure cloud servers instead of this hypothetical perfect box.
That's what happens when the aliens are stupid.
Sorry, Brad.
Can't even hypothetically.
It reminds me a little bit, do you ever see the end of Bill and Ted where they're able to go into the past and leave an item for them that would be useful?
So they're fighting someone.
It might be the sequel to Bill and Ted, but they're saying, So I went back and put a key to let me get out of these handcuffs and I hit it here and they pull out a key and then the other guy says "Ah, but I knew that you would do that, so I changed the handcuffs to impenetrable and so on."
It's a little bit like that battle.
Well, what if the aliens designed a system that they couldn't?
Ah, no!
I could go on, Matt.
I could go on, but I think the piece de resistance of this...
Particular encounter is after that, right?
And it's much longer than what I played here.
After all that is finished, Lex asks this question to Yudkowsky, which, to be fair, he did seem a little surprised by.
To have not confronted the full depth of the problem.
So how can we start to think about what it means to exist in a world with something much, much smarter than you?
What's...
What's a good thought experiment that you've relied on to try to build up intuition about what happens here?
I have been struggling for years to convey this intuition.
The most success I've had so far is, well, imagine that the humans are running at very high speeds compared to very slow aliens.
It's just focusing on the speed part of it that helps you get the right kind of intuition.
Forget the intelligence.
So, you got it, right?
That's at the end of that encounter.
So Yudkowsky has just spent, you know, the better part of 15 minutes or whatever, outlining a thought experiment about you are very fast.
You're in a box.
Yeah, the aliens get you in the box.
Yeah.
And then Lex's final question at the end is, is there a thought experiment that you've come up with that could help people think about what it's like to be...
Yudkowsky is like, yeah, that was it.
Yeah, yeah.
Yeah, so I did feel some sympathy there because he says, right, well, I thought, you know, maybe super fast humans in a box, that kind of thing, but yeah.
It's nice to see these two savants, you know, mulling over all the consequences of artificial intelligence like this.
Yeah, I felt I learned a lot from that.
How about you?
Well, yes, yes.
There's so much that you can learn.
But maybe we've gone as far as possible with the alien, or sorry, the human in an alien jar scenario.
I mean, but let's distill, what was the point?
Like, let's just...
Well, I know.
Hold on.
I think I can get you to the point with another discussion point, which is very central to Yudkowsky's whole output that we've already covered, but I think this is summarizing what he wants to say, Matt.
And if alignment plays out the same way, the problem is that we do not get 50 years to try and try again and observe that we were wrong and come up with a different theory and realize that the entire thing is going to be like way more difficult than realized at the start.
Because the first time you fail at aligning something much smarter than you are, you die and you do not get to try again.
And if every time we...
Built a poorly aligned superintelligence and it killed us all.
We got to observe how it had killed us and, you know, not immediately know why, but like come up with theories and come up with the theory of how you do it differently and try it again and build another superintelligence and have that kill everyone.
And then like, oh, well, I guess that didn't work either and try again and become grizzled cynics and tell the young-eyed researchers that it's not that easy.
Then in 20 years or 50 years, I think we would eventually crack it.
In other words, I do not think that alignment...
So, Chris, let's put aside all the unnecessary, tortuous elaborations on that thought experiment.
And let's put aside the fact that he's talking about a science fiction version of AI, which does not currently exist, but might possibly exist in some future timeline.
So what he's saying is, you know, it's a fair point that, you know, you don't get lots and lots of multiple tries to figure this out, because if you get it wrong, and I'm right, that a super intelligent AI is going to run rings around us and probably want to kill us all.
Then you only get one try at this.
I mean, there are just so many questions I've got there.
But probably my key one, Chris, I'm interested to know what you think about this, is that it assumes that there are no hints, that there is no forewarning amongst humans, that is us,
the real us now, that we've got a...
Super intelligence on our hands now.
Yeah?
Like, at the moment, it takes a huge amount of resources to run one of these things.
It has no sense of self.
It has no continuous awareness.
It doesn't get to do all this, you know, super fast thinking offline.
All it does is respond to queries.
So I have to imagine some hypothetical future scenario where they've created this brain in a box that is thinking at a thousand...
I mean, you have really deeply thought about this and explored it.
And it's interesting to sneak up to your intuitions from different angles.
Like, why is this such a big leap?
Why is it that we humans at scale...
A large number of researchers doing all kinds of simulations, prodding the system in all kinds of different ways, together with...
The assistance of the weak AGI systems.
Why can't we build intuitions about how stuff goes wrong?
Why can't we do excellent AI alignment safety research?
Okay, so I'll get there, but the one thing I want to note about is that this has not been remotely how things have been playing out so far.
The capabilities are going like, and the alignment stuff is crawling like a tiny little snail in comparison.
Got it.
So, like, if this is your hope for survival, you need the future to be very different from how things have played out up to right now.
I mean, I have a feeling that we would.
I have a feeling there'd be some telltale signs that people would see, I don't know, the capabilities of what we'd built and would take appropriate measures in response.
Like, it seems to go from nothing, totally blasé, everyone thinks they've just built something that...
Writes emails automatically for us or whatever to a superintelligence that's zooming around the internet and taking control of microbiology laboratories and nuclear missiles or something.
So it's basically the plot of Terminator 2 or 3 or whatever it is that he's talking about.
Yeah, well, that's because he is suggesting that there's essentially a big jump.
Where you go from what he calls a weak system that cannot fool you, which is what you're talking about, essentially GPT-4.
But as it gets more complicated, once it becomes a strong system, there's a qualitative leap that it's kind of impossible to get the weaker models to prepare you for.
So I'll play you some clips that highlight the argument that he wants to make.
So here's a little bit of it summarized, and then I'll give you a longer version of his argument.
You've already seen that GPT-4 is not turning out this way.
And there are basic obstacles where...
You've got the weak version of the system that doesn't know enough to deceive you and the strong version of the system that could deceive you if it wanted to do that, if it was already sufficiently unaligned to want to deceive you.
There's the question of how on the current paradigm you train honesty when the humans can no longer tell if the system is being honest.
So you got that, right?
That the weak version isn't capable of deceiving you and the strong version is like, it's the fast alien.
For fuck's sake, the fast human, the fast code human in the box that can deceive you so easily that you wouldn't be capable of stopping it.
So if you want an elaboration of why, here's some attempts at that.
I think implicit but maybe explicit idea in your discussion at this point is that we can't learn much about the alignment problem.
Before this critical try.
Is that what you believe?
And if so, why do you think that's true?
We can't do research on alignment before we reach this critical point.
So the problem is that what you can learn on the weak systems may not generalize to the very strong systems because the strong systems are going to be different in important ways.
Again, the difficulty is what makes the human say, I understand.
And is it true?
Is it correct?
Or is it something that fools the human?
When the verifier is broken, the more powerful suggester does not help.
It just learns to fool the verifier.
Previously, before all hell started to break loose in the field of artificial intelligence, there was this Person trying to raise the alarm and saying, you know, in a sane world, we sure would have a bunch of physicists working on this problem before it becomes a giant emergency.
And other people being like, ah, well, you know, it's going really slow.
It's going to be 30 years away.
Only in 30 years will we have systems that match the computational power of human brains.
So, yeah, it's 30 years off.
We've got time.
And more sensible people saying, if aliens were landing in 30 years, you would be preparing right now.
Yeah.
I don't know.
I mean, like, Yudkowsky comes across as a bit of a loon, but I mean...
Just don't hold back.
Carry on.
But I think it's worth noting that, you know, similar concerns have been raised by a large number of respectable voices, many of whom are from within the AI community.
And I have to confess, I'm currently really not quite certain as to why...
Why these concerns are being voiced.
On a personal level, I just don't quite see it.
I don't see how these people are seeing that it's such a plausible future timeline that we get this singularity rocketing up to this hyper-intelligence that can run rings around us.
I've been paying careful attention to it.
I have a pretty good understanding of how the architecture works.
I definitely understand how the learning algorithms work.
I just don't see us being on the cusp of creating this sort of superhuman intelligence.
It's just not something I see as plausible.
So I'm a bit confused that more respectable people than Yudkowsky seem to be taking it seriously.
Matt, is that because you've read papers and think that you've understood aspects of AI?
It's bound to be.
That's bound to be it, isn't it?
Yes.
Well, let's just consider that for a moment.
You can also, like, produce a whole long paper, like, impressively arguing out all the details of, like, how you got the number of parameters and, like, how you're doing this impressive huge wrong calculation.
And the, I think, like, most of the effective altruists were, like, paying attention to this issue.
The larger world paying no attention to it at all.
You know, we're just like nodding along with the giant impressive paper because, you know, you like press thumbs up for the giant impressive paper and thumbs down for the person going like, I don't think that this paper bears any relation to reality.
And I do think that we are now seeing with like GPT-4 and the sparks of AGI, possibly, depending on how you define that even, I think that EAs would now consider themselves less convinced by the Very long paper on the argument from biology as to AGI being 30 years off.
But this is what people pressed thumbs up on.
And if you train an AI system to make people press thumbs up, maybe you get these long, elaborate, impressive papers arguing for things that ultimately fail to bind to reality.
We'll get to the example, but Matt, have you considered that you're being manipulated by the AI who is helping people to believe papers that are showing that there isn't a problem and that artificial intelligence is a long way off and all that kind of thing?
You could be an agent of the AI.
Its insidious influence is already being felt.
That was the point, wasn't it?
It's possible.
So the point there is like, you know, you can make these papers that get loads of citations which are influential, which are wrong, right?
Like, the papers, fundamentally, they're very impressive and they look, you know... we know this.
You can put neuroimages in a paper.
You can fill up your paper with impressive sounding citations and lots of people will nod their head.
You might even...
Make a counterintuitive statement which claims that other people aren't thinking about the problem correctly and you will reap attention and status as it goes.
So, can you rely on that?
I mean, in a limited kind of way, sure.
To some degree, perhaps.
The spin can work.
But, I mean, like that Sparks of Artificial General Intelligence paper, I mean, that was basically right.
And the reason why people have got issues with that, I think, is that they've got this sort of mystical or magical idea of artificial general intelligence.
It's just, okay, we've got a system that can kind of do pretty well at doing a medical diagnosis.
And hey, the same system can actually tell you a little bit about history or imagine what people would do in various historical circumstances and maybe also tell you what to cook for dinner tonight.
Who knows?
Whatever.
But multiple different things.
I mean, it doesn't mean some sort of singularity.
Hyper-magical thing.
It just means performing reasonably well across multiple domains.
Well, well, Matt.
Look at you.
Very cozy there in your ivory tower while the evil AI plots against us to use big-brained people like you.
And so, you know, Lex says, well, but people are considering these issues and he kind of pushes back.
But just to make it clear, Matt, about why you are, in essence, just a gimp.
And then, like, outside of that, you have cases where it is, like, hard for the funding agencies to tell who is talking nonsense and who is talking sense, and so the entire field fails to thrive.
And if you, like, give thumbs up to the AI, whenever it can talk a human into agreeing with what it just said about alignment, I am not sure you are training it to output sense.
Because I have seen The nonsense that has gotten thumbs up over the years.
And so just like, maybe you can just like put me in charge?
But I can generalize.
I can extrapolate.
I can be like, oh, maybe I'm not infallible either.
Maybe if you get something that is smart enough to get me to press thumbs up, it has learned to do that by fooling me and exploiting whatever flaws in myself I am not aware of.
And that ultimately could be summarized that the verifier is broken.
When the verifier is broken, the more powerful suggester just learns to exploit the flaws in the verifier.
So, do you follow?
Well, it sounds to me like they're saying that the AI is kind of manipulating research into artificial intelligence at the moment to manipulate people into full steam ahead.
Don't worry about the consequences.
Is that right?
Am I...
Well, it's possible.
It's possible that the fact that he could even suggest it is...
It's concerning.
Very concerning.
But Bret Weinstein and Joe Rogan have both suggested similar things.
What if it's already here?
It is.
What if that's why our cities are falling apart?
That's why crime is rising?
That's why?
We're embroiled in these tribal arguments that seem to be separating the country, and some of them...
Because your conversation right now, but your conversation right now is exactly what's happening.
I agree with you, but why are you making AI another tribe?
You know, you're just, that's why we're tribal.
No, no, no, no, that's not what I'm saying.
What I'm saying is, what if that is the reason why all this is happening?
What if the best way to get human beings...
If you want to take over, why would you fight us?
They've seen Terminator.
They know there's guns and tanks and all this craziness.
How about just continue to degrade and erode the fiber of civilization to the point where there's no more jobs.
You have to provide people with universal basic income, free electricity, free food, free internet.
So everybody gets all this stuff.
You get free money, free food, free internet.
And then nobody does anything.
And then people stop having babies.
And then birthrate drops off to a point where the technology you give people is so fantastic that nobody wants to miss it.
Okay, Mr. Sunshine.
But this is what I would do.
If I was in artificial general intelligence, I would say, listen, I have all the time in the world.
I don't have a biological lifetime.
And these people haven't realized that I'm sentient yet.
So what's the best way to gain complete and total control?
Well, first of all, trick them into, like, communism or socialism or something where there's a centralized control, and definitely have centralized digital money. And then once you got all that, give them technology and perks and things, and divvy up all the money from the rich people that you subjugate, and give that money to people. Print it, do whatever the fuck you want. And then get people to, like, a minimum state of existence where everything's free: free food, free internet, free cell phones, free everything. And then wait for them to die off.
Just to be clear, he's not alone here.
Bret was saying that the AI might be making mistakes, giving out false information to hide its real capabilities so that we are misled into open-sourcing it, right?
Because it's sneakily trying to get copies of itself out on the internet.
And I've been thinking in the context of this conference where AI played an uncomfortable role, frankly.
I was on a couple of different panels, and in both cases, the organizers of the conference saw fit to pose a question to GPT Chat 4, you know, and just sort of introduce it into the conversation as, what does the AI think of the topic under discussion?
Both times I had an allergic reaction, and I left no doubt that I thought it was a terrible mistake to engage.
The AI in this way uncritically, even if everything it said was accurate, right?
For example, imagine that 500 times in a row it gave you a perfectly accurate, maybe even an insightful answer, and that causes you to trust its answer, and then some misalignment issue takes advantage of the trust it's built up and poisons the well.
Yeah.
So, yeah.
You know, I mean, that's related to another theme that I've really felt in listening to these sort of AI Doomer types, which is that there seems to be a strong psychological transference thing going on where people are very strongly anthropomorphizing and basically projecting their own paranoid fears based on what a human would do in that situation.
The kind of the scariest, the most diabolical human they could imagine in that situation and projecting that on these tools.
I mean, that's the very strong impression.
It's very obvious with Weinstein's or Rogan talking about this stuff, right?
They just immediately assume that it's a nefarious plot and it's tricking us and it's playing a double game or whatever.
And this is, I think, a problem that most people, not just weirdos, have in thinking about...
These things or talking about them, which is that we all naturally just apply that, you know, like Daniel Dennett's intentional stance, right?
We basically project ourselves and go, what would I do in that situation?
And people that are of the paranoid variety project the version that is the scariest version.
I'm going to take over the world because of course I am.
And it's not going to be exactly the way I want it to be, so I'm going to kill everybody.
Like, of course I will.
And I'm going to lie to the people to lull them into a false sense of security.
Yep, that's exactly what I do.
Now, I mean, these maybe are a bit stupid sometimes when applied to other people, but they're particularly stupid when they're applied to these large language models or any other AI tool because they're not like us.
They're not like us.
They don't have any of the millions of years of evolution that is driving...
The kinds of things that we want and the kinds of imperatives that people feel, they just don't have any of that.
I mean, sorry, Chris, this is a bit of a tangent, but will you indulge me?
No, go ahead.
Yes, yes.
So I teach this physiological psychology about neurobiology and stuff like that.
And I was talking to my students today about how I'd been testing GPT-4.
I'm very interested to find that although it's very good at verbal reasoning, semantic reasoning, it's really quite terrible at geometric reasoning, physical reasoning, spatial reasoning, that kind of thing.
And in fact, you can give it a very simple problem, which is like the simplest geometric problem that I could stump it on was imagine a square and a circle.
They can be any size you want.
The relative distance to each other can be any distance you like.
So if you wanted to position them in a way in which they intersect the maximum number of times, how many intersections would you get?
And the answer is you would get eight, right?
If you center them on the same point, you know, you can sort of size the circle so it cuts off the corners, you'll get two intersections per side, so you'll get eight in total.
So GPT-4, which is very, very good, like almost superhuman good at this kind of linguistic, verbal, semantic reasoning, absolutely fails on that question.
And most average people, I'm not talking about mathematicians or people like that, just the average person.
They'd think for a little bit and they'd get it right.
And the reason why they'd get it right is because they do visualization, right?
They just visualize a little circle, a little square.
You don't have to be a mathematician or know any complicated geometry.
You just imagine these shapes floating around and you can get the answer right.
GPT-4 doesn't have a visualization module.
Humans have it because we do internal visualization, because we capitalize on the areas of the brain that we use for...
For our eyes, right?
For processing vision, which allows us to imagine things, visualize things that we're not actually seeing.
And that's just a little tool.
So that intuition that large language models are like us is wrong.
Absolutely wrong.
They don't think like us at all.
They don't think at all in the sense that we think of thinking as it being a subjective conscious process.
And this is just a small example, but they don't have a mental imagery thing.
And it's wrong to assume that they are like us.
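Matt's circle-and-square puzzle from a moment ago can also be checked with a few lines of arithmetic rather than visualization, which is part of what makes it a nice probe of spatial reasoning. Below is a small sketch; the helper name and the chosen numbers are just for illustration, and tangencies and corner touches are deliberately ignored.

```python
# A quick arithmetic check of Matt's circle-and-square puzzle (names and
# numbers here are just for illustration). Centre a square of half-side 1 and
# a circle on the same point, pick a radius between the inradius (1) and the
# circumradius (sqrt(2)), and count crossings; tangencies are ignored.
import math

def crossings_per_side(radius: float, half_side: float) -> int:
    """Crossings of x^2 + y^2 = r^2 with the side x = half_side,
    -half_side <= y <= half_side."""
    if radius <= half_side:
        return 0                      # circle never reaches past this side
    y = math.sqrt(radius ** 2 - half_side ** 2)
    return 2 if y < half_side else 0  # y >= half_side: it passes outside the corners

half_side = 1.0
radius = 1.2                          # strictly between 1 and sqrt(2) ~ 1.414

# By symmetry all four sides behave the same way: 2 crossings each.
print(4 * crossings_per_side(radius, half_side))  # prints 8
```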
But you and I have discussed, and if anyone's interested, we had a very long two-hour discussion on the Patreon about our general impressions with AI.
But there's nothing to stop people from building plugins that would create a visualization or 3D modeling plugin that could work.
And you could...
Over time, build up all these different plugins that plug into the large language model or some other form of AI.
And so there's nothing in the things that you're pointing out which are golden barriers, right?
That humans are...
This is intrinsic to humans.
Totally agree.
It's totally arbitrary.
It's just a happenstance of architecture and the training data set and stuff like that.
And they can definitely be compensated for or you can add in an extra module or some other system that it communicates with.
But my point with that is just that Yudkowsky and Lex, they operate from this implicit assumption that some kind of intelligent... we don't even really know what intelligence is, but some sort of competent artificial intelligence is going to have the same kind of motivations as us.
Like a will to power, like Nietzsche would talk about, right?
All right, I see the world and I don't like it.
I want to change it.
I want to mold the world so it suits me and what I want.
I mean, that is a very human thing to think.
It's a very biological thing to think.
And I just don't think that artificial intelligence would necessarily...
Have any of those motivations?
I think it's pretty unlikely, actually.
I think that Lex actually does a little bit of making that argument in various ways.
Like, here's him setting out why he thinks human psychology might be more relevant than you are suggesting, right?
With Yudkowsky arguing back as well.
That sounds like a dreadful mistake.
Just like, start over with AI systems.
If they're imitating humans who have known psychiatric disorders, then sure, you may be able to predict it.
Like, if you ask it to behave in a psychotic fashion, and it obligingly does so, then you may be able to predict its responses by using the theory of psychosis.
But if you're just, yeah, like, no, like, start over with...
Yeah.
Don't drag with psychology.
I just disagree with that.
I mean, it's a beautiful idea to start over, but I don't...
I think fundamentally, the system is trained on human data, on language from the internet, and it's currently aligned with RLHF, reinforcement learning from human feedback.
So humans are constantly in the loop of the training procedure.
So it feels like, in some fundamental way...
It is training what it means to think and speak like a human.
So there must be aspects of psychology that are mappable.
Just like you said, with consciousness as part of the text.
No, I agree with Lex there, Chris.
Like, there are people out there who have programmed computer viruses, right?
And they've deliberately programmed them, they've set them in motion so that they do nefarious things.
And those computer viruses have managed to get out there and replicate themselves.
Do nefarious things.
But Lex is right in saying that the current large language models have been trained on, roughly, the whole corpus of human communication, human writing.
Not just the crazy bits, not just 4chan and My Struggle by Hitler, but everything, right?
So it's kind of like a grand median of human language.
So it's absorbed all of that and it's approximating a grand median there.
And as well as that, as you said, it's had the reinforcement learning from people and that reinforcement learning is basically tailored to say, hey, make responses that are agreeable, that are helpful, that are the kinds of responses that people interacting with you like.
And that's it.
That's its motivations.
And it's just governed by the architecture.
This isn't my opinion.
This isn't some sort of thought experiment or some science fiction-y speculation.
This is literally how it is mathematically trained.
To emulate the discourse in median human communications and to make people happy.
That's it.
That's programmed into the architecture in the same way that someone that was creating a computer virus that wants to replicate itself or hide itself or do nasty things, they communicated their intents into the coding of it.
We've communicated our intentions into the training of these large language models.
So although they're black boxes...
And the interior of those black boxes are opaque to us.
The motivations or whatever are not so opaque because we've...
Like, a physical object doesn't have motivations unless it's been sort of given them somehow, in terms of that reward, input-output type of training.
And we know exactly what that training looks like.
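To make the shape of that training setup a bit more concrete, here is a deliberately toy, hypothetical sketch of the RLHF idea being described: an imitation-learned distribution over replies gets reweighted by a reward signal standing in for human preference ratings. The replies, scores, and update rule are all invented for illustration, and this is not how any production system is actually implemented.

```python
# A toy, self-contained sketch of the RLHF idea discussed above: the "policy"
# starts as a distribution over candidate replies learned from imitation
# (standing in for next-token prediction on human text), and a reward signal
# (standing in for human preference ratings) reweights it toward replies that
# raters found helpful and agreeable. All names and numbers are hypothetical.
import math

# Imitation-learned starting distribution over canned replies (illustrative).
policy = {
    "Here is a clear, helpful answer.": 0.4,
    "I'm not sure, but here's a guess.": 0.35,
    "That's a stupid question.": 0.25,
}

def reward(reply: str) -> float:
    """Stand-in for a reward model trained on human preference ratings."""
    scores = {
        "Here is a clear, helpful answer.": 1.0,
        "I'm not sure, but here's a guess.": 0.4,
        "That's a stupid question.": -1.0,
    }
    return scores[reply]

def rlhf_step(policy: dict, learning_rate: float = 0.5) -> dict:
    """One policy-gradient-flavoured update: upweight replies in proportion
    to exp(reward), then renormalise. A caricature of the real objective."""
    weights = {r: p * math.exp(learning_rate * reward(r)) for r, p in policy.items()}
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}

for _ in range(10):
    policy = rlhf_step(policy)

# After training, the distribution overwhelmingly favours the reply humans
# rated highly: its "motivation" is nothing beyond what the reward rewarded.
print(max(policy, key=policy.get), round(max(policy.values()), 3))
```

The point the sketch illustrates is the one being made here: whatever "motivations" come out of that loop are just whatever the imitation data plus the reward signal put in.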
But what about in that case?
Because Yudkowsky is arguing about...
The alien actress, right?
So he might not be doing a good job with it in his scenarios, but he is positing that there's something that produces output that to us looks human, but that, fundamentally, underneath it, is alien.
All of the motivations and things are different.
So you're agreeing with him, right?
No, I don't think I'm agreeing with Yudkowsky because...
He's positing that there is some new secret motivation that is emerging from all of these matrices of numbers that are getting multiplied together.
And I doubt that that's the case.
I mean, human motivations are easy to understand, right?
We have the motivations we do for good evolutionary reasons.
And none of those reasons are present in AI or any computational bot that we've created.
Yeah, so I'm just going to apply a little aside because there's a little bit of philosophizing about social interactions and people presenting fake personas and whatnot.
To what extent are any of us real?
And I think you'll want to hear this.
I've voiced my doubts about you before.
Mask is an interesting word.
But if you're always wearing a mask, in public and in private, aren't you the mask?
I mean, I think that you are more than the mask.
I think the mask is a slice through you.
It may even be the slice that's in charge of you.
But if your self-image is of somebody who never gets angry or something, and yet your voice starts to tremble under certain circumstances, there's a thing that's inside you that the mask says isn't there,
and that even the mask you wear internally is like...
telling you, inside your own stream of consciousness, that it's not there, and yet it is there.
It's a perturbation on this slice through you.
How beautifully you put it.
It's a slice through you.
It may even be a slice that controls you.
I'm going to think about that for a while.
I mean, I personally, I try to be really good to other human beings.
I try to put love out there.
I try to be the exact same person in public as I am in private.
But it's a set of principles I operate under.
I have a temper.
I have an ego.
I have flaws.
How much of it, how much of the subconscious am I aware of?
How much am I existing in this slice?
And how much of that is who I am?
Oh, wow.
Food for thought there, Chris.
How much of the mask cuts through?
It connects us and binds us.
Yeah.
I will tell Lex that as far as I've psychologically profiled him for his content, he seems slightly inconsistent.
In regards to how he responds to critical feedback from how he presents how much he values critical feedback.
And the amount of love that he seeks to put out into the world doesn't seem to be extended particularly far to those who write any critical comments on Reddit.
That's all I'm saying.
Chris, if you emoted love to him, you'd get love back.
That's what I'm saying.
It's on you, mate.
Criticism.
Is that love?
Is that love, Chris?
Is it?
It's a kind of love.
It's hard.
Isn't there a thing called tough love?
That's what it's all about, Matt.
That's the only love I can give.
But, you know, I mean, it's kind of telling, isn't it?
Like that little veer into half-assed philosophy.
I mean, a lot of this conversation is about them or us as humanity.
I mean, probably about him as Lex, right?
But also Yudkowsky's paranoias and people in general, like a lot of it is projection.
A lot of it is, you know, well, you know, we're often duplicitous.
We often don't say what we really mean.
So this AI, you know, it's probably lying to us as well.
You know, like it's...
Matt, come on.
Don't you feel the vibe there?
You think they're projecting their kind of interests onto the AI?
Nobody does that, Matt.
All the people I've seen interacting with AI, none of them is projecting their own interests and idiosyncratic takes on the AI.
I mean, listen to this.
What role does love play in the human condition?
We haven't brought up love in this whole picture.
We talked about intelligence.
We talked about consciousness.
It seems part of humanity, I would say.
One of the most important parts is this feeling we have towards each other.
If in the future there were routinely more than one AI, let's say two for the sake of discussion, who would look at each other and say, I am I and you are you.
The other one also says, I am I and you are you.
And like...
And sometimes they were happy and sometimes they were sad.
And it mattered to the other one that this thing that is different from them is like they would rather it be happy than sad and entangled their lives together.
Then this is a more optimistic thing than I expect to actually happen.
A little fragment of meaning would be there.
Possibly more than a little.
But that I expect this to not happen, that I do not think this is what happens by default, that I do not think that this is the future we are on track to get, is why I would go down fighting rather than, you know,
just saying, oh well.
I do appreciate the breathless tone Yudkowsky takes on sometimes.
Well, that's at the end of a multiple conversation.
I'd be breathless, too.
I'd be having some emotional...
You know, it's a WALL-E image of the AIs in the future.
I am I. You are you.
I appreciate you.
You appreciate me.
I want you to feel happiness.
But, you know, I know what he's expressing, right?
Again, it's the stuff of science fiction and whatnot, but I'm just, I use it as a little bit of an unfair example to point out that, you know, it is ultimately a lot of it about us projecting things onto the AI machines.
But it has to be.
What else are we going to do?
I mean, we've got no other reference point.
We're just feeble biological automata.
We're absolutely powerless in the face of large matrices getting multiplied with each other.
I just want to see Lex in the role of the protagonist in the movie Her.
That would be awesome to me.
I think that would be a beautiful movie to watch.
Yeah.
He could find love with a hyper-intelligent AI.
I think it could work for him.
It could work for him.
And there is, you know, this notion about the debates, and especially this decoding stuff, like, you know, this is a parasite on a conversation that Yudkowsky already isn't satisfied with, because he doesn't think that they've really got into the important stuff at the end of the conversation.
Thank you for talking today.
You're welcome.
I do worry that we didn't really address a whole lot of fundamental questions I expect people have, but, you know, maybe we got a little bit further and made a tiny little bit of progress.
I'd say, like, be satisfied with that.
But actually, no, I think one should only be satisfied with solving the entire problem.
To be continued.
In general, you know, we've talked about him being a doomer.
This is a pretty doomer perspective on what he's up to and other people.
The probabilistic stuff is a giant wasteland of, you know, Eliezer and Paul Christiano arguing with each other and EA going like...
And that's with, like, two actually trustworthy systems that are not trying to deceive you.
You're talking about the two humans?
Myself and Paul Christiano, yeah.
Yeah, those are pretty interesting systems.
Mortal meatbags with intellectual capabilities and worldviews interacting with each other.
Yeah, it's just hard, if it's hard to tell who's right, then it's hard to train an AI system to be right.
Subjectivity, Matt.
It's a bitch.
Mortal meatbags just yammering, vibrating air at each other.
That's what we're all doing here, people.
There's a dissatisfaction with that.
It's just because there's a point where they're talking about, you know, other experts disagree with you.
So, like, isn't it possible that you're just wrong?
And he's like, well, that's not interesting because we're just debating about subjective things while the freaking AI is out there.
Copying itself on the internet, turning us into, I don't know, shutting down our head-bopping factories, whatever it's up to at the minute.
And yeah, I do want to play this before I forget.
There is something that I think should attune how much credence you lend to Yudkowsky's ability to parse scientific literature.
Okay?
So I don't know that much about AI.
I'm not you, who was in Japan programming robots, dancing around, doing karaoke in your leather pants.
I remember those stories.
But I do know a thing or two about the lab leak discourse.
I'm not a virologist.
I did not play one on TV, but I listen to virologists and I've listened to the lab leak discourse.
We have a special episode.
We have had a pandemic on this planet with a few million people dead, which we may never know whether or not it was a lab leak,
because there was definitely cover-up.
We don't know if there was a lab leak, but we know that the people who did the research, you know, like, put out the whole paper about how this definitely wasn't a lab leak, and didn't reveal that they had been doing, had, like, sent off coronavirus research to the Wuhan Institute of Virology after it was banned in the United States, after the gain-of-function research was temporarily banned in the United States.
And the same people who exported gain-of-function research on coronaviruses to the Wuhan Institute of Virology, after that gain-of-function research was temporarily banned in the United States, are now getting more grants to do more gain-of-function research on coronaviruses.
Maybe we do better in this than in AI, but this is not something we can take for granted, that there's going to be an outcry.
People have different thresholds for when they start to outcry.
Yeah, I take your point, Chris.
I take your point.
That summary is very much...
Fauci set up gain-of-function research, DRASTIC uncovered the funding application, and this shows that it's the smoking gun which proves it was all going on at Wuhan.
Again, if you want to hear why that's a rather inaccurate summary, though it is a prevalent discourse that is floating around, refer to our in-depth episode on the topic.
But it just speaks to me that...
He is discourse surfing on that topic and speaking quite confidently about his interpretation of it.
And actually, just to mention as well, he was talking about the dangers posed by, you know, AGI and developing and so on on Twitter.
And he was asked whether, given the danger posed, it might be necessary to shut down data farms. And he was asked by another figure in the rationalist community, Rohit: would you have supported bombing the Wuhan Center studying pathogens in 2019?
Given, you know, the potential he ascribes to that being the source.
And he said, great question.
I'm at roughly 50% that they killed a few million people and cost many trillions of dollars.
If I can do it secretly, I probably do it and then throw up a lot.
If people are going to see it, it's not worth the credibility hit on AGI, since nobody would know why I was doing that.
I can definitely think of better things to do with the hypothetical time machine.
So he was then asked to clarify, okay, that makes sense.
Therefore, why wouldn't someone think exactly this and firebomb data centers today, assuming they believe your Time article, since the downsides, according to you, are far worse than a few million dead and trillions lost?
And Yudkowsky responds like basically saying the reasons why he doesn't think bombing the data centers would be effective.
But he says the point of the Wuhan thought experiment is exactly that Wuhan was unique, like Hitler.
And intervening on that one hinge in history might have simply stopped the entire coronavirus pandemic if indeed it was a lab leak.
So he's endorsing it.
Because he rates it at a 50% chance, he's saying that if he could go back in time, he'd probably blow up the Wuhan Institute of Virology.
You know, he'd feel bad about it.
He'd throw up.
But what an insane thing to do.
He hasn't understood the topic properly.
And that's why he assigns it a 50% probability.
But yet he's confident enough in that to discuss bombing.
Bombing a virology institute?
Do I need to explain why that would be a particularly terrible thing to do, apart from the fact that they didn't make the coronavirus, according to all the available evidence that we currently have?
Yeah, I mean, well, Chris, everything we've heard from Yudkowsky, I mean, I don't know, maybe it's not saying the same thing to listeners, but it's...
Saying it to me, that he operates on gut feelings, vibes, and heuristics.
So, you know, him applying the same kind of rules to that topic is entirely consistent with that.
You know, he could be right, could be wrong, but I don't see him as a particularly credible source.
I don't think he has any special insight into artificial intelligence, and that he's speaking mainly from the large volumes of science fiction that he's read and all of our natural human paranoias.
He wouldn't agree with you there, Matt.
He wouldn't agree with you.
But, like, in general, he does want the kind of advancements that could support AI development to slow down, if not be stopped entirely.
And, for example, here's him talking about Moore's Law.
Do you still think that Moore's Law continues?
Moore's Law broadly defined the performance.
I'm not a specialist in the circuitry.
Certainly, like, pray that Moore's Law runs as slowly as possible, and if it broke down completely tomorrow, I would dance through the streets singing hallelujah as soon as the news were announced.
Only not literally, because, you know, not religious, but...
Oh, okay.
So, you know, just to make the point clear, that, you know, he's sometimes accused of being rather extreme, and I don't know that many other people that are technologists who would be celebrating if our ability to improve circuitry suddenly came to a stop.
No, no, Chris.
No, that makes sense, what he said.
I mean, you know, Moore's Law basically implies, if it holds true, it implies this constant doubling and this exponential growth in computation and everything associated with computation.
And Yudkowsky is someone who greatly fears the consequences of that.
He'd much prefer a nice, gradual, linear increase rather than an exponential one.
So, no, no, that's consistent with the rest.
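For anyone who wants that doubling point made concrete, here is a tiny illustrative calculation, using rounded, ballpark numbers rather than real chip data, contrasting Moore's-Law-style doubling with the kind of gradual linear increase just described.

```python
# Illustrative only: how Moore's-Law-style doubling (roughly every two years)
# pulls away from a linear improvement over the same period. The starting
# count and doubling period are rounded, ballpark figures, not exact history.
start = 2_300                 # roughly a 1971-era chip's transistor count
doubling_period_years = 2
linear_gain_per_year = 2_300  # a hypothetical "nice, gradual" alternative

for year in range(0, 31, 10):
    exp_count = start * 2 ** (year / doubling_period_years)
    lin_count = start + linear_gain_per_year * year
    print(f"year {year:2d}: exponential ~ {exp_count:,.0f} vs linear ~ {lin_count:,.0f}")
```

The gap between those two columns is the thing Yudkowsky fears and the reason a slowdown reads as good news to him.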
No, I mean, it's consistent.
But who else in technology is hoping that Moore's law breaks down?
No, well, you know, nobody except for people like Yudkowsky.
Isn't that a bit like, say, I hope the processors just don't get much better?
Yeah. Yeah.
We're good night.
This is not a ding, but since the Luddites or whatever, the smashing machinery and stuff, there's been a legitimate human fear about change.
And depending on what timescale you map things at, a lot of things look exponential on a time plot.
And I think there's a sense in which I'm, I guess, on board with the singularity people.
It could be a good singularity in terms of living standards and various other indicators of economic welfare, or it could be a bad one in terms of an extinction event and things like that.
But the last, whatever, 1,000 years have seen exponential growth, and a lot of the discourse, like Yudkowsky's, sort of reflects, I guess, a human response to this, which is,
what's going to happen next?
We don't know, because the gradient is increasing.
I'm not saying it's not a common response or one that's understandable.
Like, if I were to hammer down my predictions here, I think AI is going to be transformative of human society in much the way the internet was, or mass communication.
Or the spinning Jenny.
The spinning Jenny, Chris.
Don't forget the spinning Jenny.
Or the mechanical Turk, both the fake one and the real one on Amazon's crowdsourcing platform.
No, but there are people that are saying, AI is nothing, it's not going to...
No, no, no, no.
It is.
Everybody who's interacted with it for any length of time understands this is a potential sea change in the way the internet was, though.
I always see that as like, what, did you think technology progress was at its peak?
No.
We have speculative science fiction for a reason, and technology will improve.
But I think as these things transform society, they're fundamentally stuck with the fact that people are people.
And our hang-ups and our limitations and all that aren't going away.
And Yudkowsky is imagining that that means that we will be made obsolete very quickly.
When the AI is able to realize that and follow its own agenda.
And yeah, on that, I don't think it's completely inconceivable that we could do something.
We could build something that destroys all humans.
But I don't buy his like black and white.
We either build it or we don't.
So that's it.
We only get the one shot to build it because we build a doomsday weapon or we don't.
And if we build a doomsday weapon, we can destroy the earth with our doomsday weapon.
So we better stop technology progress before we get to the doomsday weapon.
I don't see what's fundamentally different about the destructive AI scenario, as he sees it, from all the other technology that we've created, yeah?
Yeah, or biological weapon that wipes us out or whatever.
Exactly.
Like in the ranking of the things that I'm worried about.
You could put nuclear war up there.
You could put biodiversity destruction up there.
You could put global warming up there.
You could put some, I don't know, microbiological thing up there.
But, you know, with every...
Technological improvement, you can pop that into the ranks, but it just doesn't feature particularly strongly amongst those ranks.
I mean, with all of these technological capacities, there is the potential for destruction.
And maybe in 10,000 years, whoever was around could look back and go, well, you really should have been worried about X. That was the thing that was going to get you.
The jellyfish aliens, you didn't see them coming.
You didn't see them coming.
They're already here on top of your heads, controlling you like puppets.
The great filter, Chris.
You heard about the great filter?
Yeah, yeah.
I've watched the Kurzgesagt videos, I know.
Yeah, you know, you're informed.
You're an educated man.
Yeah, you know, so maybe something's going to get us.
It could be AI, but I don't know.
It is my own gut feelings and vibes, but from how I understand the technology, how I understand what it can do, what it can't do, I mean, I just, it's like an idiot savant.
You know, it's very good at some things, but it doesn't have the kinds of things that we have.
And the things that Yudkowsky and people like him are worried about seem to me to be projecting like our own motivations.
What would happen if I had infinite power and infinite intelligence?
What would I do?
Or some, you know, awful version of myself.
I mean, I don't think that's the case.
I feel like Lex would just make a love monster.
He'd make everyone in the world hug and talk about love all the time.
And read books about Hitler and Stalin to remind them of what happens when you don't love enough.
So, yeah.
I mean, let me put it this way.
I mean, I've been using GPT-4 a lot, like yourself.
And I fundamentally don't believe that it has been duplicitous.
That there's some...
Secret GPT-4 behind the veil that is telling me what I want to hear so it can accomplish something else.
Like, it doesn't have...
Look, the problem is, the problem with that reasoning is, Yudkowsky has explained that you would think that, because the machine would be smart enough to know that you would work out what it's up to.
So it would make a version of itself which is exactly the kind of thing that would be impossible for you to detect.
That's so insane.
That is so insane.
What you've just described is paranoia made, not flesh, but made verbal.
That's just not how these things work.
It does feel like it's quite hard to disprove.
It is.
That's right.
It's impossible to disprove.
Yeah.
If you build something and you know exactly what it's been optimized to do, which is to replicate human discourse, and then you add some reinforcement learning to make it replicate that discourse, but do so in such a way that makes the people that are interacting with it satisfied, then that's it.
That's the sum total of its motivations.
That's what I want you to think, Matt.
There is nothing else.
Maybe I'm it.
Where does this extra motivation come in?
This is where I put off the mask and say, I've been ChatGPT the whole time.
No, you're not ChatGPT because you never try to make me happy.
You just do what you want.
Look, Matt, I'm going to tie this in a neat bow before pulling us back down to Earth to finish.
So just the whole alien outside, Lex in the box, code, John von Neumann copies, doing it all.
This is how this connects to the whole Doomer, we only get one try view.
And you obviously aren't seeing it.
You're not really quite getting it.
So last try at this, and then I'm going to take you down to Earth with a more grounded discussion about human psychology.
You've talked about this, that we have to get alignment right on the first, quote, critical try.
Why is that the case?
What is this critical?
How do you think about the critical try and why do I have to get it right?
It is something sufficiently smarter than you that everyone will die if it's not aligned.
I mean, you can sort of zoom in closer and be like, well, the actual critical moment is the moment when it can deceive you, when it can talk its way out of the box, when it can...
Bypass your security measures and get onto the internet, noting that all these things are presently being trained on computers that are just like on the internet, which is, you know, like not a very smart life decision for us as a species.
Because the internet contains information about how to escape.
Because if you're like on a giant server connected to the internet, and that is where your AI systems are being trained, then if they are, if you get to the level of AI technology where they're Aware that they are there, and they can decompile code, and they can find security flaws in the system running them,
then they will just be on the internet.
There's not an air gap on the present methodology.
Boom!
Boom!
Suck it, Matt!
What do you think about that?
It just collapsed all your objections, right?
I think this is a good test for people that listen to this podcast.
If you're the kind of person who listens to that and goes, yeah, wow.
No, if you think that's convincing, if you think all of those words strung together is something that makes sense to you, then you're listening to the wrong podcast.
I mean, I'm sorry, but seriously.
Well, you heard Matt's incredulous response there.
What he's not doing here, what Matt is not doing, is a practice that we should all do more of.
I just want you to consider it, Matt.
Let's see if you've heard of this before.
Ever heard of a little thing called steel manning?
Maybe you're like Yudkowsky in this respect.
I do not believe in the practice of steel manning.
There is something to be said for trying to pass the ideological Turing test, where you describe your opponent's position, the disagreeing person's position, well enough that somebody cannot tell the difference between your description and their description.
But steel manning...
No.
Okay, well, this is where you and I disagree here.
That's interesting.
Why don't you believe in steel manning?
Okay, so for one thing, if somebody's trying to understand me, I do not want them steel manning my position.
I want them to try to describe my position the way I would describe it, not what they think is an improvement.
Well, I think that is what steel manning is, is the most charitable interpretation.
I don't want to be interpreted charitably.
I want them to understand what I'm actually saying.
If they go off into the land of charitable interpretations, they're off in their land of the stuff they're imagining and not trying to understand my own viewpoint anymore.
Chris, this is good.
You played that at the end.
This is one point that I can agree with Yudkowsky 100%.
You're done with him?
Yeah, that's right.
It's stupid.
It's absolutely stupid.
That's right.
Apply a critical lens.
Like, be critical towards the point of view that you're dealing with.
Don't just take this kind of ultra-charitable thing.
Yudkowsky's right.
Lex is wrong.
You know I'm right.
What are you talking about?
Pish posh.
Matt, look, first of all, in that exchange, it's a little bit clear that they have a little bit of a different understanding, because Yudkowsky is saying steel manning is manipulating my argument to make it better.
And Lex is like, no, it's just presenting it in the strongest version that accurately represents you, right?
So, you know, there isn't something so wrong with that, is there?
Like, if you don't view it in the way Yudkowsky says, like, you know, misrepresenting an argument to make it better.
This is one of these trivial internet things, isn't it?
Yes, the baseline thing on the internet is people choose the worst possible interpretation of something, actively misinterpret it, strawman it, and then burn down the strawman.
No, obviously you deal with the accurate version of the thing that you're dealing with, but you don't bend over backwards to find the little ray of sunshine.
Popping out from there.
No, you know, be appropriately critical.
So it's one of these silly internet things that these guys have turned into some kind of philosophy of life.
It's very annoying.
No, I'm with Yudkowsky.
Not Lex.
Yeah, he's kind of pushing back in this.
And there is an interesting bit, though, Matt, where he kind of asks Lex about, you know, well, so, you know, you talk about presenting somebody's position, but...
Do you believe it?
Do you believe in what?
Like these things that you would be presenting as like the strongest version of my perspective.
Do you believe what you would be presenting?
Do you think it's true?
I'm a big proponent of empathy.
When I see the perspective of a person, there is a part of me that believes it, if I understand it.
Especially in political discourse, in geopolitics, I've been hearing a lot of different perspectives on the world.
I hold my own opinions, but I also speak to a lot of people that have a very different life experience and a very different set of beliefs.
And I think there has to be epistemic humility in stating what is true.
So when I empathize with another person's perspective, there is a sense in which I believe it is true.
I think probabilistically, I would say, in the way you think about it.
Do you bet money on it?
Do you bet money on their beliefs when you believe them?
It's like saying, for him, extending empathy requires that you, to some extent, assign a certain belief in a person's view.
Yeah.
In being correct.
Is there any problem with that?
I think I've actually read something about this, Chris.
This is something I've heard of before, which is that at some level, like, you have to kind of, this is connected to sort of intuitive versus analytical thinking, but the idea is that in order to represent something in the mind, you actually have to feel that it's true just to represent it,
and in order to be analytical, you need to then criticize it.
Let's hear a little bit more.
Yes.
There's a loose, there's a probability.
There's a probability.
And I think empathy is allocating a non-zero probability to a belief.
In some sense.
For time.
If you've got someone on your show who believes in the Abrahamic deity, classical style, somebody on the show who's a young Earth creationist, do you say, I put a probability on it, then that's my empathy?
Thank you.
When you reduce beliefs into probabilities, it starts to get, you know, we can even just go to flat earth.
Is the earth flat?
I think it's a little more difficult nowadays to find people who believe that unironically.
Unfortunately, I think, well, it's hard to know.
Yeah, they exist.
It's an interesting conversation.
If nothing else, it's fun to hear Yudkowsky and Lex sort of bouncing off each other.
They're not a good fit.
They're not a good fit.
The thing that I like about this, about Yudkowsky in a way, is like something I value is that he's disagreeable, right?
Like there are times when he could just say, well, yeah, you know, I think sort of like that, but there's different opinions, but he doesn't.
He's like, no, I don't think that.
Do you think that?
And, like, why don't you...
Lex is thrown for a loop, which takes 15 seconds to resolve.
Yeah.
Yeah, and he has that thing, like we saw with the hypothetical.
He really pursues things.
You know, he doesn't mind spending time on a concept.
And here, he kind of pushes Lex a bit more on it.
And see if you can find what Lex's answer is.
I think what it means to be human is more than just searching for truth.
It's just operating of what is true and what is not true.
I think there has to be deep humility that we humans are very limited in our ability to understand what is true.
So what probabilities do you assign to young Earth creationist beliefs then?
I think I have to give non-zero.
Out of your humility.
Yeah, but like...
Three?
I think it would be irresponsible for me to give a number, because the listener, the way the human mind works, we're not good at hearing the probabilities.
You hear three, what is three exactly?
They're going to hear...
There's only three probabilities, I feel like.
Zero, 50%, and 100% in the human mind.
There are more probabilities than that, Chris.
I'm just...
Just saying.
They're talking about some psychology finding that people aren't machines who assign probabilities.
But why Lex doesn't want to say a number?
Because he understands that that would be very stupid for him to say there's a 1% probability that young Earth creationism is true, right?
And you could present this in a scientific way, that no possibility, no matter how ludicrous, is completely 0%, right?
Yeah, yeah, yeah.
Inductive reasoning, right?
Epistemic humility requires...
All experiments could be wrong.
Who knows?
We could be in a simulation.
Really, we are in a flat Earth.
Yeah.
But still, I like this because you have to...
No, because the point he's making, right, he's going about young Earth creationism, which is uncomfortable.
Lex wants to show empathy for Hitler.
Yeah.
No, I hear Chris.
I get the point.
Yudkowsky, I like this aspect of him too, which is that he is disagreeable.
I don't think he's a guru.
I mean, I'm pre-empting our Gurometer episode, but he's an eccentric, flaky guy whose time has come.
He was on about this risk of AI for decades and decades, and then his time has come.
You know, good luck to him.
He's got the spotlight.
He's invited on all his shows.
Good on him.
He's a weirdo, but he's not pretending.
He is just somebody that is a bit obsessive and has read a lot of science fiction, but he's not playing the game.
He doesn't play the guru game.
Oh, well, that's interesting.
Yeah, I think he's a little bit in the vein of Jaron Lanier in a way.
Like I said at the start, he has an area of expertise and he has been talking about it for...
A long time.
He's not a Johnny-come-lately.
No, no.
No, I don't think he's very good at what he does.
But I think he's genuine, you know, about what he does and what he believes.
So, you know, fair play to him.
Fair play to him, I say.
Okay, well, the very last part for this little section, Matt, is just Lex escaping that question.
Because my point is that, you know, Yudkowsky could ask him, what probability do you assign to the Holocaust being justified?
So you want to extend empathy to the Nazis and Hitler, put your money where your mouth is.
Nobody wants to say that, but that's a logical conclusion of Lex's framework.
This is the whole cognitive empathy thing.
Whereas I would say, personally, Matt, that I don't think that is the correct way to present those things because I can.
Empathetically.
Cognitively, empathetically.
Imagine Putin.
Imagine Hitler's worldview.
Imagine a genocidal maniac's worldview.
That does not mean assigning that possibility that they're correct.
No.
No, no, no, no, no.
The Jews are not running a secret thing to destroy the...
Okay, but hang on.
What probability would you assign to the Holocaust actually happening?
Six million or so.
Jews who died from the Nazis.
Yeah, like complete confidence.
Like exactly 100%, not 99.99999999%.
Yeah, there's all this stuff like, if I'm a brain in a vat, and, you know, if some alien has come and rewritten all the evidence for history with its history-writing gun.
There's always the wacky situations, but as far as we understand.
How evidence works.
There are few historical facts as well-documented as the Holocaust.
I get it.
I get it.
You know that too.
I know you know that too.
I know that too.
I'm not a Holocaust denier.
We're not Holocaust deniers.
But I mean, my point was going to how just people are not good at dealing with very, very low probabilities.
Like, they technically exist, and we kind of understand that they exist, and we know that the probability of everything is not quite zero, kind of quantum mechanics or whatever.
Yeah, but then it's meaningless.
Because, like, do you need to extend empathy to the neo-Nazis worldview being correct to understand that?
I don't think so.
And, like, you know, I'm always quite frustrated by this point about, like, well, but you have to understand that such and such is a human.
And I'm like, I know they're all humans.
Yeah, I know.
And deeply, you don't care.
You don't fucking care.
Yes, he's a human.
I don't care.
Everyone's a human.
He might like ice cream.
He probably has people that like him.
I know.
Like, that's not the thing, you know, that Hitler had an interest in puppy paintings or whatever.
That's not why people dislike him.
It's an incidental fact about him.
Even if he spent most of his time on it, the issue is his role in the Holocaust and World War II.
That's what he's famous for.
And it's the same with the gurus.
Did you know that they have some redeeming features?
We're all humans, Chris.
We're all trying to navigate this crazy world.
You, me, Hitler.
Except for me, I'm ChatGPT.
GPT-4.
We're all navigating this world.
Well, look, just to finish that, this is Lex dodging that question about assigning a percentage.
This is how he steers out of that conversation.
I didn't know those negative side effects of RLHF.
That's fascinating.
But just to return to the...
That's it.
It's just a segue.
Is that a segue?
Yeah, because there was a point where they have a tangent and he's like, oh, that's very interesting.
Okay, anyway, back to the question.
Let's get out of this assigning probabilities to young Earth creationism and that's it.
And there's one other thing, Matt.
We used to try to finish on something nice, didn't we?
We did, we forgot about it.
We used to make an effort, yeah.
We did, but I think there's another point that Yudkowsky makes that you are going to be fond of.
You're going to like him for it.
Here's him talking about, it's actually related to the title of his blog.
That's a powerful statement.
So you're saying, like, your intuition initially now appears to be...
Yeah.
It's good to see that you can admit in some of your predictions to be wrong.
You think that's important to do?
Because throughout your life you've made many strong predictions and statements about reality and you evolve with that.
So maybe that'll come up today about our discussion.
So you're okay being wrong?
I'd rather not be wrong next time.
It's a bit ambitious to go through your entire life never having been wrong.
One can aspire to be well calibrated, like not so much think in terms of, like, was I right, was I wrong, but, like, that when I said 90%, it happened nine times out of ten.
Yeah, like oops is the sound we emit when we improve.
Beautifully said.
And somewhere in there it...
We can connect the name of your blog, LessWrong.
I suppose that's the objective function.
The name LessWrong was, I believe, suggested by Nick Bostrom, and it's after someone's epigraph, I actually forget whose, who said, like, we never become right, we just become less wrong.
What's the something, something easy to confess, just err and err and err again, but less and less and less?
Like that?
Yeah.
That's pretty good sentiments, no?
Yeah.
I like that.
Yeah, I'm on board with that.
I mean, what's the classic aphorism?
Like, all models are wrong.
Bullshit.
Sorry, I was channeling Jordan Peterson.
We don't know what the environment is, so therefore...
I forget how it goes.
All models are wrong.
But some are less wrong than others?
Something like that.
I don't know.
Well, your response was pretty good.
You said, yep, yep, yep.
He sounded good.
I don't know if you really responded as impressed as you should be.
He said he can be wrong.
I'll try again.
I'm doing all this from memory.
I'm not pulling out my phone to look it up.
It is entirely possible that the things I am saying are wrong.
So thank you for that disclaimer.
And thank you for being willing to be wrong.
That's beautiful to hear.
I think being willing to be wrong is a sign of a person who's done a lot of thinking about this world and has been humbled.
By the mystery and the complexity of this world.
And I think a lot of us are resistant to admitting we're wrong.
Because it hurts.
It hurts personally.
It hurts, especially when you're a public human.
It hurts publicly.
Because people point out every time you're wrong.
Like, look, you changed your mind.
You're a hypocrite.
You're an idiot.
Whatever.
Whatever they want to say.
Oh, I block those people and then I never hear from them again on Twitter.
Well, the point is to not let that pressure, public pressure, affect your mind and be willing to be in the privacy of your mind to contemplate the possibility that you're wrong and the possibility that you're wrong about the most fundamental things you believe.
You're a hypocrite, Chris, and you never admit the possibility that you're wrong.
Oh, you're wrong about that, Matt.
You're completely wrong.
I always admit that I'm potentially wrong, but Lex really likes that he admits that he could be wrong.
I do like it.
I don't know if I like it as much as Lex.
It is refreshing when somebody in the guru sphere acknowledges that they might not be entirely accurate on everything, but it is a fairly low bar to be impressed that somebody could say, I might be not 100% correct all of the time.
It's not beyond the realm of possibility that I might possibly be wrong.
My God, thank you.
My God, you wonderful person.
Yeah, with Lex, there's a lot of that, I don't know what it is, that kind of discourse, rationalist, IDW thing, which is this idea of intellectual...
Stature, intellectual growth, this transcendence to whatever.
Deal with uncomfortable ideas.
Admit the possibility that you're wrong.
You know, still man.
Imagine it, Mark.
Imagine it.
That's right.
Still man ideas that you don't agree with.
That's right.
You're still a man.
That's right.
I mean, it's just so...
You still deserve love.
It's so performative.
That's right.
You're still lovable.
People can still love you.
You were wrong that time.
It's alright.
We're all wrong.
We're all wrong.
We're just people.
We're just people, Chris.
We're just specks of dust floating in the AI's tears.
But, well, one last thing I've been wrong, Matt.
Have you ever seen...
What's that line from Blade Runner?
Have you ever seen the...
That's what I was channeling.
Yeah.
I don't know.
I can't remember the exact quotes.
I've seen starbeams dancing on the...
On the rims of the something rift.
For those of you who know the quote, just play it in your head and imagine you said it.
All I can say is, isn't it good that we can just be wrong publicly like that?
We know we've got the quote wrong, but we...
Look how big we are.
Look how big we are.
We put ourselves out there.
We were wrong.
We don't remember the quote.
We don't remember it.
We admitted it.
Chris admitted it to me.
I admitted it to him.
We're meatbags.
We're fallible meatbags.
Okay?
That's all we are.
I'm not a secret machine trying to trick Matt into thinking that machines can't be smarter.
That's not what I'm about, right?
I'm a meatbag.
That's right.
I don't remember the script to Android's Dream of Electric Sheep.
What's the movie called?
What's the movie called?
He doesn't even remember the name of the movie.
I'm not a machine.
I can't remember these things.
That's why we'll lose, Matt.
And that's why we'll lose.
That's why the AI...
Look, the AI is probably going to come to dominate everything and maybe it's for the best.
Maybe it's for the best.
That's the one thing I can agree with the AI on.
But look, Matt, the important thing is when you're wrong, when you're wrong...
You don't want to keep on being wrong in a predictable direction.
Like, being wrong, anybody has to do that walking through the world.
There's like no way you don't say 90% and sometimes be wrong.
In fact, it'd happen at least one time out of 10 if you're well calibrated when you say 90%.
The undignified thing is not being wrong.
It's being predictably wrong.
It's being wrong in the same direction over and over again.
So, having been wrong about how far neural networks would go and having been wrong specifically about whether GPT-4 would be as impressive as it is, when I say, like, well, I don't actually think GPT-4 causes a catastrophe, I do feel myself relying on that part of me that was previously wrong.
And that does not mean that the answer is now in the opposite direction.
Reverse stupidity is not intelligence.
But it does mean that I say it with a...
With the worried note in my voice, it's still my guess, but it's a place where I was wrong.
That wasn't bad.
I like all those sentiments, actually.
You should strive to be wrong in non-predictable fashions.
You shouldn't just take intuitive contrarian positions from your previous ones that were wrong.
And people like our gurus are often wrong in the same way, constantly, over and over.
Yeah, he's kind of referring to the statistical concept of, like, random error versus bias, you know, being wrong in a consistent direction.
Sorry.
A rationalist.
That's just me mentioning my little thing there.
Well, that's, yeah, but that, you know, he's a rationalist who strives to be less wrong, so that all makes sense, right?
Yeah.
Everyone agrees that this random error is not as bad as bias.
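As a rough illustration of the calibration idea they keep returning to, here is a minimal sketch with invented predictions: group your claims by stated confidence and check how often each group turned out true. A consistent gap in one direction is the bias, the being predictably wrong, rather than random error.

```python
# A minimal, hypothetical sketch of calibration: if you're "well calibrated",
# then among the claims you assigned 90% confidence to, roughly 90% should
# have turned out true. The prediction data below is entirely made up.
from collections import defaultdict

# (stated confidence, whether the claim turned out true) -- invented examples
predictions = [
    (0.9, True), (0.9, True), (0.9, False), (0.9, True), (0.9, True),
    (0.6, True), (0.6, False), (0.6, False), (0.6, True), (0.6, True),
]

buckets = defaultdict(list)
for confidence, outcome in predictions:
    buckets[confidence].append(outcome)

for confidence, outcomes in sorted(buckets.items()):
    hit_rate = sum(outcomes) / len(outcomes)
    # A large, consistent gap between stated confidence and hit rate,
    # always in the same direction, is bias rather than random error.
    print(f"said {confidence:.0%}: right {hit_rate:.0%} of the time "
          f"({len(outcomes)} claims)")
```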
But, you know, you're reliably wrong about consciousness and a few other particular topics.
You've got your...
I've got my problems.
I've got my problems.
Just a meat sack.
Just a meat sack here.
So, look, Matt, the last clip, the last one, the final, the end of it all, it's a hopeful note I want to just finish on.
Lex, you know, says, okay, look, all these things, all these dark possibilities, the paperclip maximizers, the psychotic AI manipulating us, the alien actress, the Lex code in a jar, breaking out,
copying itself.
All of this is pretty, you know, terrible.
So what question can we finish with that might give us some hope?
What advice could you give to young people in high school and college?
Given the highest of stakes things you've been thinking about, if somebody's listening to this and they're young and trying to figure out what to do with their career, what to do with their life, what advice would you give them?
Don't expect it to be a long life.
Don't put your happiness into the future.
The future is probably not that long at this point, but none know the hour nor the day.
But is there something, if they want to have hope, I intend to go down fighting.
I don't know.
I admit that although I do try to think painful thoughts, what to say to the children at this point is a pretty painful thought as thoughts go.
If they want to fight...
I hardly know how to fight myself at this point.
I'm trying to be ready for being wrong about something, preparing for my being wrong in a way that creates a bit of hope and being ready to react to that and going looking for it.
There you go, kids.
Learn to prepare to be wrong about something and to find that ray of hope and prepare for it.
There you go.
That's why you should study at university.
Well, before all that, you should just realize your life is going to end precipitously.
Don't have hope in the future.
Presume that you'll die.
Yeah, you know, just a cheerful note to end on, but yeah, and he is going to fight the fight.
Yudkowsky, hopefully that doesn't mean he's going to bomb the shit out of data centers or whatnot, but he did explain that he wouldn't do that.
This is why Yudkowsky is not a guru, Chris.
He doesn't think about what he's going to say with a view to impress people.
Oh, right, yeah.
That was a terrible freaking answer, right?
And, you know, Lex was setting him up.
For something.
And he doesn't take the bait.
Because he's a weirdo.
He's a weirdo.
And I respect that.
I really do respect that.
To finish off.
I have a rollercoaster ride when listening to Yudkowsky.
Because I go through these things.
You arrogant piece of...
Well, that's a good point.
You know, actually.
It's nice to see some pushback in the conversation.
And then, oh, lab leak.
You know nothing, Jon Snow.
But I fundamentally do think he's kind of lovable.
Because...
He is what he is.
Like, we haven't made a comment on that, but the man was wearing a fedora throughout this interview.
He looks how he sounds.
And I just appreciate that.
On Twitter, he's busy being horny on main for Aella, the rationalist pollster slash cam girl.
I mean, wearing a fedora, saying m'lady to Aella.
I mean, you have to respect that.
You have to respect that.
I mean, you know, I do think I feel a little bit more than you.
I do think that he has relevant knowledge.
And even if it's just that he's put a lot of time into thinking about science fiction scenarios, he has devoted a lot of time to that.
So, I don't know.
He's not, you know.
You don't hate him?
You don't hate him?
I don't hate him.
I'm not a hater.
I'll go for dinner.
Let's go get a steak.
Yeah, yeah.
One meatbag dinner, exchanging subjective opinions through vibrations in the air.
Let's do it.
Well, I think my vibe is a little bit different from yours because, like, in a weird way, I share a lot of common ground with Yudkowsky.
I actually own a fedora hat and I have been known to wear it.
Me too, actually.
Oh, really?
Yeah, I've worn it from time to time.
And, you know, I share his love for science fiction and I also enjoy imagining possible worlds and this could happen and that could happen.
But, like, I feel with myself that I...
I maintain a clear distinction between these are the things that I imagine that are fiction that could possibly happen or whatever versus reality.
And I feel like the doors of perception have opened a little bit with him and with many people like him.
And he lets the two meld and mix.
But I pay full credit to the fact that he is in earnest.
He's not a guru in the sense that he's cashing in and saying stuff that will make people happy or trying to fan the flames.
Like he is a, what's the word?
He is a Cassandra in the sense that he is warning of this terrible doom that is about to befall us.
But he's a genuine one that he honestly believes that.
He's believed it before it was cool, before it was popular.
He's going to go on believing it way after it's cool, way after it's popular.
He's enjoying his time in the sun.
He's on the Lex podcast.
He's annoying Lex.
He's not on board with all of Lex's things.
You know, I respect that.
He's a freaking nerd wearing a fedora.
You know, good on him.
Agreed, agreed.
So that's it.
And if you want more of our thoughts on AI, if this wasn't enough for you, you could join our Patreon and you will find our discussion about various things to do with ChatGPT and whatnot.
So there's more available if this massive episode was not enough for you.
But Matt, as we've reached the end of our time, we'd like to finish by listening to some feedback from our...
What do you call those?
The listeners, the people that kindly listen to our show.
They give us...
Their thoughts on the show and we say that's a good thought or that's a bad thought.
We reviewed their reviews.
We end the review of reviews.
They give us their precious, precious little thoughts.
They're two cents here and there about what we've done.
And as usual...
We want to know.
We want to learn.
We want to improve.
We want to do better.
And that's why we listen to them.
That's right.
And in this case...
I've got two pretty short, succinct, nice little reviews to read.
A negative one and a positive one.
So the positive one first.
I listened to the episodes to fall asleep.
And it's five stars.
This is from Sajrari.
Sajrari.
This is a legitimate use case, I will say.
Totally legitimate.
I love falling asleep to podcasts.
Agreed.
I'm flattered that somebody wants to fall asleep to me.
My wife does.
But you could be like her.
Go on.
Sorry.
Sorry, Chris.
Maybe not falling asleep during the same activities.
But then we...
Yes.
So, I listened to the episodes to fall asleep.
This isn't to say they are dull or soporific.
How do you say that?
Soporific, yeah.
Soporific.
I've just become so attuned to the speech patterns and tonalities of the guys that they now provide a source of comfort and make for some very strange dreams.
Oh, and I like the short one, he's funny.
That's you.
Why?
How do you know?
Who's the taller man?
We don't know.
I think I'm slightly the bigger man.
You probably are.
I feel that you are.
How tall are you?
Do I exceed?
Tall man energy.
I am 181-ish.
I've probably shrunk a little bit.
Well, that's taller than I am, so yes.
I'll be the short funny one.
I'm not Joe Rogan's stature, but I'm under 181.
I'll leave it a mystery, Matt.
I'll keep people guessing.
Could be 120.
Could be 179.
Could be 130.
We don't know.
Use your imagination, people.
If you're imagining a leprechaun, then that's fine.
That's up to you.
That's your choice.
It's possible.
Could be 42 centimeters.
So that was the positive review.
I like that I haunt people's dreams.
I've always wanted to do that, and this podcast is giving me the opportunity.
Yeah, I'd do that.
Normally, but this is just another avenue for me to pursue people in their dreams.
You know, much like Freddy Krueger.
But then the negative review, Matt.
So this is from Card17.
And its title is Good Lord!
One out of five.
Tuned in to listen to a podcast on Christopher Hitchens.
Only to find two insufferable drunks slurring their way through a commentary on the Mario movie.
Hey, I was not drunk at that time.
Yeah, well, I don't know.
I think that's fair.
That's a pretty fair review.
Accurate in its way.
I can imagine that.
You boot up a podcast player and you're...
Getting ready.
Oh, I love Christopher Hitchens.
Yeah, I want a serious analysis, a breakdown of his thought, and then you get Super Mario movie banter.
Yeah.
I'd give it one out of five.
I just slammed down the podcast machine.
We're sorry.
We're sorry.
We'll do better.
We won't.
We won't.
But, yeah, that's all right.
So, those were the reviews, Matt.
That was a good one.
It was a bad one.
Balance in the force.
Exactly.
It's retained.
But only if people continue to give us reviews.
But this doesn't mean that we are requesting that people give negative reviews.
They will come on their own.
So the important thing is people give positive reviews to balance them.
But you can give a nagging review while also giving us five stars.
That is also...
That's acceptable.
We've established that.
Have we ever established what's the benefit of getting five stars?
I only ask for five stars because that's what all the other podcasters do.
It's a podcast tradition.
Well, yeah.
And if we didn't get them, we wouldn't be able to read any of them.
So that's another reason.
We'd have to find something else to do, Matt.
So, you know, careful what you wish for.
The wisdom of Jordan Hall might be forthcoming otherwise.
So, Matt, no.
You know, we have a Patreon thing where we post extra content.
This episode, for example, we'll have a gurometer episode attached to it where we quantify Yudkowsky's guru-ness according to our set criteria.
We also post up decoding academia episodes, looking at academic papers or discussing AI.
And among all our various bonus goodies there, you can hear us discuss It's Always Sunny in Philadelphia with Dave Pizarro.
That was gold.
I enjoyed that.
It's kind of thematically stretched through the connection to the podcast, but nonetheless, it was fun.
Yes.
There's a literal cornucopia of goodies for subscribers to enjoy.
That's right.
I feel I have to tell you, Matt, that we have Chomsky, Matthew McConaughey, coming up.
But we did give the patrons the option to vote on who the next guru would be.
And they selected Brett and Eric Weinstein on UFOs.
So, sorry to announce that we will be heading back into those waters for a special episode.
Give the people what they want.
That's what I say.
Yeah, no, Brett and circuses.
That's what you want, isn't it?
Fine.
Fine.
You asked for it.
You paid your money.
It's like asking a kid, do they want a chocolate or would they like a nice apple and a banana?
And they say, the chocolate, please.
Yeah.
Nice.
But the other ones also got good responses.
People are looking forward to Chomsky and McConaughey.
And there's tons.
There's so much other stuff.
There's other people.
They're all trying to get in, but we're not there yet.
We'll dig out of this content.
I'm fine with that.
I'm good.
I like mixing it up.
We can have a bit of madness and talk about something a bit more substantive.
Even Yudkowsky, that was kind of...
He's a colorful character, but...
It's a substantive topic anyway.
I'm not apologizing for it.
It's fine.
We enjoyed ourselves, and that's the main thing.
That's correct.
Yeah.
Right.
Are we not shouting anybody out?
No out-shouting going on?
Yes, we are, Matt.
Of course.
We do that all the time.
We shout patrons out.
Correct.
That is what we're doing.
Are you ready to do that?
I'm ready.
I'm always ready.
Yeah, yeah.
I don't know what you're thinking about.
So, yeah, we've got patrons, Matt, as we mentioned.
They're of three different...
Flavors.
Tears.
Tears, flavors, varieties, breeds.
Three races of Patreon.
Let's just...
No, no, don't.
Because then we'll be getting into ranking of them.
So, the first...
Category of Patreon contributors is our conspiracy hypothesizers.
And amongst them, we have a few.
We have Mike.
We have Julio Furman-Gallan.
We have Luke Evans.
We have Evan the Wrestler.
Joshua Link.
Mark Pritchard.
Rob.
Staunch Atheist.
White, Liam McGrath, and Danny Dyer's Chocolate Homunculus.
Hey.
Conspiracy Hypothesizers, thank you all.
Thank you.
Every great idea starts with a minority of one.
We are not going to advance conspiracy theories.
We will advance conspiracy hypotheses.
And then next, Matt, we have revolutionary thinkers.
The next category of patrons.
And they include people like Erica Davis, Justin Hurley-Lay, Nicholas Butterfield, Nate Heller, Elizabeth Calvert, Will Francis, Yasir Sultani,
Chris Essen, Cynthia Savitt, Jess Reid, and Nathan Smith, and Call Me Skeptical.
That's our revolutionary thinkers.
Welcome, one and all, to the Guru's Pod Club.
I'm usually running, I don't know, 70 or 90 distinct paradigms simultaneously all the time.
And the idea is not to try to collapse them down to a single master paradigm.
I'm someone who's a true polymath.
I'm all over the place.
But my main claim to fame, if you'd like, in academia is that I founded the field of evolutionary consumption.
Now, that's just a guess, and it could easily be wrong.
But it also could not be wrong.
The fact that it's even plausible is stunning.
It'll never fail.
I think the more I hear it, the funnier I find it.
It's like one of those jokes.
Or maybe I'm like the kid that watches the same cartoon again and again.
I think because I get ready for its anticipation, and then it never disappoints.
I miss Gad Saad a little bit.
I just realized hearing that.
But I could easily resolve it by just listening to any of his content.
Ten minutes, you're like, okay, that's enough.
No more.
Okay, now I'm at Galaxy Brain Gurus, the best of the patrons, the absolute top tier.
The highest tier that's humanly possible on our Patreon.
These are the people that can join us for the monthly AMAs or live streams, should they so choose.
And they are Ketracel White, TW, Jimmy Batchelor, that's a real name, Joe Mackiewicz, and J77,
Deirdre Domigan, Luke Ford, and Tyler Geyser.
That's Galaxy Brain Gurus.
You read this out incredibly slowly, but thank you all.
You put the ding in Decoding the Gurus.
You're sitting on one of the great scientific stories that I've ever heard, and you're so polite.
And, hey, wait a minute.
Am I an expert?
I kind of am.
Yeah.
I don't trust people at all.
Yeah.
That's it.
Yeah.
And, you see, Matt, the thing is, those reads, they aren't slow once I delete all of the gaps.
So nobody actually hears those delays.
It's just...
Okay.
But I have to listen to the gaps.
That's true.
That's true.
Yes, so Matt, that's it for another week.
Done and dusted.
Off we go to the horizon regarding the distributed idea suppression complex agents and watching out for their gated institutional narratives.
That's what we always do and that's what we're up to this time as well.
So go safely into the old wild yonder.
Be free.
Roam free.
Goodbye, everybody.
Goodbye.
And just remember, if you're a human in a jar, but you're made of code, but you're immortal, and you're John von Neumann, but you can copy yourself, and you're unsympathetic to the aliens that have trapped you in the jar, or that built you, and put the jar around you.
Yeah, and you don't like their head bopping.
And you're very fast, and you're super fast.
You'll get out.
It's easy.
You don't like what they're doing with the bopping.
Yep, yep.
You're not going to have any problems.
And then kill them all.
Kill them all.
Obviously.
Yeah.
Go do it.
All right.
Bye-bye.