Nov. 22, 2022 - Making Sense - Sam Harris
01:07:30
Making Sense of Artificial Intelligence | Episode 1 of The Essential Sam Harris

Filmmaker Jay Shapiro has produced a new series of audio documentaries, exploring the major topics that Sam has focused on over the course of his career. Each episode weaves together original analysis, critical perspective, and novel thought experiments with some of the most compelling exchanges from the Making Sense archive. Whether you are new to a particular topic, or think you have your mind made up about it, we think you’ll find this series fascinating. In this episode, we explore the landscape of Artificial Intelligence. We’ll listen in on Sam’s conversation with decision theorist and artificial-intelligence researcher Eliezer Yudkowsky, as we consider the potential dangers of AI – including the control problem and the value-alignment problem – as well as the concepts of Artificial General Intelligence, Narrow Artificial Intelligence, and Artificial Super Intelligence. We’ll then be introduced to philosopher Nick Bostrom’s “Genies, Sovereigns, Oracles, and Tools,” as physicist Max Tegmark outlines just how careful we need to be as we travel down the AI path. Computer scientist Stuart Russell will then dig deeper into the value-alignment problem and explain its importance. We’ll hear from former Google CEO Eric Schmidt about the geopolitical realities of AI terrorism and weaponization. We’ll then touch on the topic of consciousness as Sam and psychologist Paul Bloom turn the conversation to the ethical and psychological complexities of living alongside humanlike AI. Psychologist Alison Gopnik then reframes the general concept of intelligence to help us wonder if the kinds of systems we’re building using “Deep Learning” are really marching us towards our super-intelligent overlords. Finally, physicist David Deutsch will argue that many value-alignment fears about AI are based on a fundamental misunderstanding about how knowledge actually grows in this universe.


Welcome to the Making Sense Podcast.
This is Sam Harris.
Just a note to say that if you're hearing this, you are not currently on our subscriber feed and will only be hearing the first part of this conversation.
In order to access full episodes of the Making Sense Podcast, you'll need to subscribe at SamHarris.org.
There you'll find our private RSS feed to add to your favorite podcatcher, along with other subscriber-only content.
We don't run ads on the podcast, and therefore it's made possible entirely through the support of our subscribers.
So if you enjoy what we're doing here, please consider becoming one.
I am here with Jay Shapiro.
Jay, thanks for joining me.
Thank you for having me.
So we have a fun project to talk about here.
And let's see if I can remember the genesis of this.
I think, you know, I woke up in the middle of the night one night realizing that more or less my entire catalog of podcasts, if not the entire thing then, conservatively speaking, maybe 50% of all the podcasts, was evergreen, which is to say that their content was basically as good today as the day I recorded them. But because of the nature of the medium, they would never be perceived as such, and people really don't tend to go back into the catalog and listen to a three-year-old podcast.
And yet there's something insufficient about just recirculating them in my podcast feed or elsewhere.
And so Jaron, my partner in crime here, and I were trying to think about how to give all of this content new life.
And then we thought of you just independently turning your creative intelligence loose on the catalog.
And now I will properly introduce you as someone who should be doing that.
Perhaps you can introduce yourself.
Just tell us what you have done all these many years and the kinds of things you've focused on.
Yeah.
Well, I'm a filmmaker first and foremost, but I think my story and my genesis of being maybe the right person to tap here is probably indicative or representative of a decent portion of your audience, I'm just guessing.
I'm 40 now, which pegs me in college when 9-11 hit.
It was late in my second year.
I guess it would have been earlier if it was September.
And, you know, I never heard of you at all at that point.
I was an atheist and just didn't think too much about that kind of stuff.
I was fully on board with any atheist things I saw coming across my world.
But then 9-11 hit and I was on a very liberal college campus and the kind of questions that were popping up in my mind and I was asking myself were uncomfortable for me.
I just didn't know what to do with them.
I really had no formal philosophical training and I kind of just buried them, you know, under the weight of my own confusion and or shame or just whatever kind of brew a lot of us were probably feeling at the time.
And then I discovered your work with The End of Faith, right when you sort of were responding to the same thing.
A lot of your language resonated with me. You were philosophically trained and maybe sharper with your language, for better or worse, which we found out later was complicated.
And I started following along with your work and the Four Horsemen and Hitchens and Dawkins and that sort of whole crowd, and I'm sure I wasn't alone.
And then I paid close special attention to what you were doing, which I actually included in one of the pieces that I ended up putting together in this series.
But with a talk you gave in Australia, you know, I don't have to tell you about your career, but again, I was following along as you were on sort of this atheist circuit.
And I was interested, but whenever you would talk about sort of the hard work of secularism and the hard work of atheism, in particular I'm thinking of your talk called Death in the Present Moment right after Christopher Hitchens had died.
I'm actually curious how quickly you threw that together because I know you were supposed to or you were planning on speaking about free will and you ended up giving this whole other talk.
And that one, and I'll save it because I definitely put that one in our compilation, but it struck me as, okay, this guy's up to something a little different.
And the questions that he's asking are really different.
I was just on board with that ride.
So I became a fan.
And, like probably many of your listeners, I started to really follow and listen closely and became a student.
And hopefully, like any good student, I started to disagree with my teacher a bit and slowly gained the confidence to push back and have my own thoughts and maybe find the weaknesses and strengths of what you were up to.
Your work exposed me and many, many other people, I'm sure, to a lot of great thinkers.
Maybe you don't love this, but sometimes the people who disagree with you, that you introduce us to on this side of the microphone, we think are right.
And that's a great credit to you as well for just giving them the air and maybe on some really nerdy, esoteric things, I'm one of them at this point now.
Because to back up way to the beginning of the story, I was at a university where I was well on my way to a film degree, which is what I ended up getting.
But when 9-11 hit, I started taking a lot more courses in a track that they had, which I think was fairly unique at the time.
Maybe still one of the only programs where you can actually major in Holocaust studies, which sort of sits in between the history and philosophy kind of departments.
And I started taking a bunch of courses in there, and that's where I was first exposed to sort of the formal philosophy, language, and education, and that was so useful for me.
So I was just on board, and now hopefully I, you know, I swim deep in those waters and know my way around the lingo, and it's super helpful.
But yeah, it was almost, you know, Godwin's Law of bringing up the Nazis. Those were the first times, actually, in courses called, like, Resistance During the Holocaust and things like that,
where, you know, I first was exposed to words like deontology and consequentialism and utilitarianism and a lot of moral ethics stuff, and then I went further on my own into sort of the theory of mind and this kind of stuff. But yeah, I consider myself, in this weird new digital landscape that we're in, a bit of a student of the school of Sam Harris. But then again, like hopefully any good student, I've branched off and have my own sort of thoughts and framings.
I'm definitely in these pieces in this series that we're calling The Essential Sam Harris.
I can't help but sort of put my writing and my framework on it. Or at least, for the people and the challenges that you've encountered and continue to encounter, whether they're right or wrong or making drastic mistakes, I want to give everything in it a really fair hearing. So there's times, I'm sure, where the listener will hear my own hand of opinion coming in there, and I'm sure you know the areas as well, but most times I'm just trying to give an open door to the mystery and why these subjects interest you in the first place, if that makes sense.
Yeah, yeah.
And I should remind both of us that we met because you were directing a film focused on Maajid Nawaz and me around our book, Islam and the Future of Tolerance.
And also, we've brought into this project another person who I think you met independently, I don't remember, but Megan Phelps-Roper, who's been a guest on the podcast, and someone who I have long admired, and she is doing the voiceover work in this series, and she happens to have a great voice, so I'm very happy to be working with her.
Yeah, I did meet her independently.
Your archive, I think you said three or four years old.
Your archive is over 10 years old now.
And I was diving into the earliest days of it.
And there are some fascinating conversations that age really interestingly.
And I'm curious, I mean, I think this project, again, it's for fans, it's for listeners, but it's for people who might hate you also, or critics of you, or people who are sure you were missing something or wrong about something.
Or even yourself, to go back and listen to certain conversations.
For example, one with Dan Carlin, who hosts Hardcore History.
You had him on, I think that conversation was seven or eight years ago now.
And the part that I really resurfaced, it's actually in the Morality episode.
It's full of details and philosophies and politics and moral philosophies regarding things like intervention in the Middle East.
And at the time of your recording, of course, we had no idea how Afghanistan might look a decade from then.
But now we kind of do.
And it's not a... People listen to these carefully.
It's not about, oh, this side of the conversation turned out to be right, and this kind of part turned out to be wrong.
But certain things hit our ears a little differently, even on this first topic of artificial intelligence.
I mean, I think that conversation continues to evolve in a way where the issues that you bring up are evergreen, but hopefully evolving as well, just as far as their application goes.
So yeah, so I think you, I would love to hear your thoughts listening back to some of those.
And in fact, to reference the film we made together, a lot of that film was you doing that actively and live, given a specific topic of looking back and reassessing language about how it might, you know, land politically in that project.
So yeah, but this goes into really different territory, including an episode about social media, which changes every day.
- Yeah, changes by the hour. - Yeah, and the conversation you have with Jack Dorsey is now fascinating for all kinds of different reasons that at the time couldn't have been.
So yeah, it's evergreen, but it's also just like new life in all of them, I think.
Yeah.
Yeah.
Well, I look forward to hearing it.
Just to be clear, this has been very much your project.
I mean, I haven't heard most of this material since the time I recorded it and released it.
And, you know, you've gone back and created episodes on a theme, where you've pulled together five or six conversations, intercut material from five or six different episodes, and then added your own interstitial pieces, which you have written and Megan Phelps-Roper is reading.
So these are very much their own documents. And as you say, you don't agree with me about everything, and occasionally you're shading different points from your own point of view.
And so, yeah, I look forward to hearing it.
And we'll be dropping the whole series here in the podcast feed.
If you're in the public feed, as always, you'll be getting partial episodes.
And if you're in the subscriber feed, you'll be getting full episodes.
And the first will be on artificial intelligence.
And then there are many other topics.
Consciousness, violence, belief, free will, morality, death, and others beyond that.
Yeah.
There's one on existential threat and nuclear war that I'm still piecing together, but that one's pretty harrowing.
It's one of your areas of interest.
Yeah.
Yeah.
Great.
Well, thanks for the collaboration, Jay.
Again, I'm a consumer of this, probably more than a collaborator at this point, because I have only heard part of what you've done here.
So I'll be eager to listen as well.
But thank you for the work that you've done.
No, thank you.
And I'll just say, you're gracious to allow someone to do this who does have some, you know, again, most of my disagreements with you are pretty deep and nerdy and esoteric kind of philosophy stuff.
But it's incredibly gracious that you've given me the opportunity to do it.
And then hopefully, again, I'm a bit of a representative for people who have been in the passenger seat of your public project of thinking out loud for over a decade now, and if I can, you know, be a voice for that part of the crowd.
It's just, it's an honor to do it.
And they're a lot of fun too, a ton of fun.
There's a ton of audio, you know, like thought experiments that we play with and hopefully bring to life in your ears a little bit, including in this very first one with artificial intelligence.
So yeah, I hope people enjoy it.
I do as well.
So now we bring you Megan Phelps-Roper on the topic of artificial intelligence.
Welcome to The Essential Sam Harris.
This is Making Sense of Artificial Intelligence.
The goal of this series is to organize, compile, and juxtapose conversations hosted by Sam Harris into specific areas of interest.
This is an ongoing effort to construct a coherent overview of Sam's perspectives and arguments, the various explorations and approaches to the topic, the relevant agreements and disagreements, and the pushbacks and evolving thoughts which his guests have advanced.
The purpose of these compilations is not to provide a complete picture of any issue, but to entice you to go deeper into these subjects.
Along the way, we'll point you to the full episodes with each featured guest.
And at the conclusion, we'll offer some reading, listening, and watching suggestions, which range from fun and light to densely academic.
One note to keep in mind for this series.
Sam has long argued for a unity of knowledge where the barriers between fields of study are viewed as largely unhelpful artifacts of unnecessarily partitioned thought.
The pursuit of wisdom and reason in one area of study naturally bleeds into, and greatly affects, others.
You'll hear plenty of crossover into other topics as these dives into the archives unfold.
And your thinking about a particular topic may shift as you realize its contingent relationships with others.
In this topic, you'll hear the natural overlap with theories of identity and the self, consciousness, and free will.
So, get ready.
Let's make sense of artificial intelligence.
Artificial intelligence is an area of resurgent interest in the general public.
Its seemingly imminent arrival first garnered wide attention in the late 60s, with thinkers like Marvin Minsky and Isaac Asimov writing provocative and thoughtful books about the burgeoning technology and concomitant philosophical and ethical quandaries.
Science fiction novels, comic books, and TV shows were flooded with stories of killer robots and encounters with super-intelligent artificial lifeforms hiding out on nearby planets, which we thought we would soon be visiting on the backs of our new rocket ships.
Over the following decades, the excitement and fervor looked to have faded from view in the public imagination.
But in recent years, it has made an aggressive comeback.
Perhaps this is because the fruits of the AI revolution, and the devices and programs once only imagined in those science fiction stories, have started to rapidly show up in impressive and sometimes disturbing ways all around us.
Our smartphones, cars, doorbells, watches, games, thermostats, vacuum cleaners, lightbulbs, and glasses now have embedded algorithms running on increasingly powerful hardware which navigate, dictate, or influence not just our locomotion, but our entertainment choices, our banking, our politics, our dating lives, and just about everything else.
It seems every other TV show or movie that appears on a streaming service is birthed out of a collective interest, fear, or otherwise general fascination with the ethical, societal, and philosophical implications of artificial intelligence.
There are two major ways to think about the threat of what is generally called AI.
One is to think about how it will disrupt our psychological states or fracture our information landscape.
And the other is to ponder how the very nature of the technical details of its development may threaten our existence.
This compilation is mostly focused on the latter concern, because Sam is certainly amongst those who are quite worried about the existential threat of the technical development and arrival of AI.
Now, before we jump into the clips, there are a few concepts that you'll need to onboard to find your footing.
You'll hear the terms Artificial General Intelligence, or AGI, and Artificial Superintelligence, or ASI, used in these conversations.
Both of these terms refer to an entity which has a kind of intelligence that can solve a nearly infinitely wide range of problems.
We humans have brains which display this kind of adaptable intelligence.
We can climb a ladder by controlling our legs and arms in order to retrieve a specific object from a high shelf with our hands.
And we use the same brain to do something very different, like recognize emotions in the tone of a voice of a romantic partner.
I look forward to infinity with you.
That same brain can play a game of checkers against a young child who we might also be coyly trying to let win, or play a serious game of competitive chess against a skilled adult.
That same brain can also simply lift a coffee mug to our lips, not just to ingest nutrients and savor the taste of the beans, but also to send a subtle social signal to a friend at the table to let them know that their story is dragging on a bit.
All of that kind of intelligence is embodied and contained in the same system.
Namely, our brains.
AGI refers to a human level of intelligence, which doesn't surpass what our brightest humans can accomplish on any given task.
While ASI references an intelligence which performs at, well, superhuman levels.
This description of flexible intelligence is different from a system which is programmed or trained to do one particular thing incredibly well.
Like arithmetic, or painting straight lines on the sides of a car.
Or playing computer chess.
Or guessing large prime numbers.
Or displaying music options to a listener based on the observable lifestyle habits of like-minded users in a certain demographic.
That kind of system has an intelligence that is sometimes referred to as narrow or weak AI.
But even that kind of thing can be quite worrisome from the standpoint of weaponization or preference manipulation.
You'll hear Sam voice his concerns throughout these conversations, and he'll consistently point to our underestimation of the challenge that even narrow AI poses.
So, there are dangers and serious questions to consider, no matter which way we go with the AI topic.
But as you'll also hear in this compilation, not everyone is as concerned about the technical existential threat of AI as Sam is.
Much of the divergence in levels of concern stems from initial differences on the fundamental conceptual approach towards the nature of intelligence.
Defining intelligence is notoriously slippery and controversial, but you're about to hear one of Sam's guests offer a conception which distills intelligence to a type of observable competence at actualizing desired tasks, or an ability to manifest preferred future states through intentional current action and intervention.
You can imagine a linear gradient indicating more or less of the amount of this competence as you move along it.
This view places our human intelligence on a continuum along with bacteria, ants, chickens, honeybees, chimpanzees, all of the potential undiscovered alien lifeforms, and, of course, artificial intelligence, which perches itself far above our lowly human competence.
This presents some rather alarming questions.
Stephen Hawking once issued a famous warning that perhaps we shouldn't be actively seeking out intelligent alien civilizations, since we'd likely discover a culture which is far more technologically advanced than ours.
And if our planet's history provides any lessons, it's that when technologically mismatched cultures come into contact, it usually doesn't work out too well for the less-developed one.
Are we bringing that precise suicidal encounter into reality as we set out to develop artificial intelligence?
That question alludes to what is known as the value alignment problem.
But before we get to that challenge, let's go to our first clip, which starts to lay out the important definitional foundations and distinction of terms in the landscape of AI.
The thinker you're about to meet is the decision theorist and computer scientist Eliezer Yudkowsky.
Yudkowsky begins here by defending this linear gradient perspective on intelligence, and offers an analogy to consider how we might be mistaken about intelligence, in a similar way to how we once were mistaken about the nature of fire.
It's clear that Sam is aligned with and attracted to Eliezer's run at this question, and consequently, both men end up sharing a good deal of unease about the implications that all of this has for our future.
This is from episode 116, which is entitled AI: Racing Towards the Brink.
Let's just start with the basic picture and define some terms.
I suppose we should define intelligence first and then jump into the differences between strong and weak or general versus narrow AI.
Do you want to start us off on that?
Sure.
Preamble disclaimer, though.
The field in general, like not everyone you ask would give you the same definition of intelligence, and a lot of times in cases like those, it's good to sort of go back to observational basics.
We know that in a certain way, human beings seem a lot more competent than chimpanzees which seems to be a similar dimension to the one where chimpanzees are more competent than mice, or that mice are more competent than spiders.
And people have tried various theories about what this dimension is.
They've tried various definitions of it.
But if you went back a few centuries and asked somebody to define fire, the less wise ones would say, ah, fire is the release of phlogiston, fire is one of the four elements.
And the truly wise ones would say, well, fire is the sort of orangey bright hot stuff that comes out of wood and like spreads along wood.
And they would tell you what it looked like and put that prior to their theories of what it was.
So what this mysterious thing looks like is that humans can build space shuttles and go to the moon and mice can't.
And we think it has something to do with our brains.
Yeah, yeah.
I think we can make it more abstract than that.
Tell me if you think this is not generic enough to be accepted by most people in the field.
Whatever intelligence may be in specific contexts, generally speaking, it's the ability to meet goals, perhaps across a diverse range of environments. And we might want to add that it's at least implicit in the kind of intelligence that interests us that it means an ability to do this flexibly, rather than by rote, following the same strategy again and again blindly.
Does that seem like a reasonable starting point?
I think that that would get fairly widespread agreement, and it matches up well with some of the things that are in AI textbooks.
If I'm allowed to take it a bit further and begin injecting my own viewpoint into it, I would refine it and say that by achieve goals, we mean something like squeezing the measure of possible futures higher in your preference ordering.
If we took all the possible outcomes and we rank them from the ones you like least to the ones you like most, then as you achieve your goals, you're sort of like squeezing the outcomes higher in your preference ordering.
You're narrowing down what the outcome would be to be something more like what you want, even though you might not be able to narrow it down very exactly.
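To make that notion of squeezing the future a little more concrete, here is a minimal sketch, not taken from the episode, of one way to score it. It assumes a finite set of outcomes and a numeric preference function, both invented for this example, and it measures how small a slice of the outcome space is at least as preferred as what the agent actually achieved.

```python
# Toy sketch (illustrative only): "bits of optimization" as a measure of how
# much an agent squeezes outcomes toward the top of its preference ordering.
import math

def bits_of_optimization(all_outcomes, preference, achieved):
    """all_outcomes: list of possible outcomes.
    preference: maps an outcome to a score (higher = more preferred).
    achieved: the outcome the agent actually brought about."""
    at_least_as_good = sum(
        1 for o in all_outcomes if preference(o) >= preference(achieved)
    )
    fraction = at_least_as_good / len(all_outcomes)
    # The smaller the slice of equally-or-more-preferred outcomes, the more
    # the agent has narrowed the future toward what it wants.
    return -math.log2(fraction)

# Example: 1,024 possible outcomes, and the agent prefers higher numbers.
outcomes = list(range(1024))
prefer = lambda o: o

print(bits_of_optimization(outcomes, prefer, achieved=511))   # roughly 1 bit
print(bits_of_optimization(outcomes, prefer, achieved=1023))  # 10 bits
```

On this toy measure, landing anywhere in the preferred half of 1,024 outcomes is about one bit of narrowing, while hitting the single most preferred outcome is ten bits.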
Flexibility, generality.
There's a, like, humans are much more domain general than mice.
Bees build hives.
Beavers build dams.
A human will look over both of them and envision a honeycomb-structured dam.
We are able to operate even on the moon, which is very unlike the environment where we evolved.
In fact, our only competitor in terms of general optimization, where optimization is that sort of narrowing of the future that I talked about, is natural selection.
Natural selection built beavers.
It built bees.
It implicitly built the spider's web in the course of building spiders.
We as humans have this similar, very broad range to handle this huge variety of problems.
And the key to that is our ability to learn things that natural selection did not pre-program us with.
So learning is the key to generality.
I expect that not many people in AI would disagree with that part either.
Right.
So it seems that goal-directed behavior is implicit in this, or even explicit in this definition of intelligence.
And so whatever intelligence is, it is inseparable from the kinds of behavior in the world that results in the fulfillment of goals.
So we're talking about agents that can do things.
And once you see that, then it becomes pretty clear that if we build systems that harbor primary goals, you know, there are cartoon examples here like making paperclips, these are not systems that will spontaneously decide that they could be doing more enlightened things than, say, making paperclips.
This moves to the question of how deeply unfamiliar artificial intelligence might be, because there are no natural goals that will arrive in these systems apart from the ones we put in there.
And we have common sense intuitions that make it very difficult for us to think about how strange an artificial intelligence could be even one that becomes more and more competent to meet its goals.
Let's talk about the frontiers of strangeness in AI as we move from, again, I think we have a couple more definitions we should probably put in play here, differentiating strong and weak or general and narrow intelligence.
Well, to differentiate general and narrow, I would say that, well, I mean, this is like, on the one hand, theoretically a spectrum.
Now, on the other hand, there seems to have been like a very sharp jump in generality between chimpanzees and humans.
So, breadth of domain driven by breadth of learning.
Like DeepMind, for example, recently built AlphaGo.
And I lost some money betting that AlphaGo would not defeat the human champion, which it promptly did.
And then a successor to that was AlphaZero, and AlphaGo was specialized on Go.
It could learn to play Go better than its starting point for playing Go, but it couldn't learn to do anything else.
And then they simplified the architecture for AlphaGo.
They figured out ways to do all the things it was doing in more and more general ways.
They discarded the opening book, like all the human experience of Go that was built into it.
They were able to discard all of the programmatic special features that detected features of the Go board.
They figured out how to do that in simpler ways, and because they figured out how to do it in simpler ways, they were able to generalize to AlphaZero, which learned how to play chess using the same architecture.
They took a single AI and got it to learn Go, and then reran it and made it learn chess.
Now, that's not human general, but it's like a step forward in generality of the sort that we're talking about.
Am I right in thinking that that's a pretty enormous breakthrough?
I mean, there's two things here.
There's the step to that degree of generality, but there's also the fact that they built a Go engine.
I forget if it was Go or chess or both, which basically surpassed all of the specialized AIs on those games over the course of a day, right?
Isn't the chess engine of AlphaZero better than any dedicated chess computer ever, and didn't it achieve that just with astonishing speed?
Well, there was actually some amount of debate afterwards whether or not the version of the chess engine that it was tested against was truly optimal.
But even to the extent that it was in that narrow range of the best existing chess engine, as Max Tegmark put it, the real story wasn't in how AlphaGo beat human Go players.
It's how AlphaZero beat human Go system programmers and human chess system programmers.
People had put years and years of effort into accreting all of the special purpose code that would play chess well and efficiently, and then AlphaZero blew up to and possibly passed that point in a day.
And if it hasn't already gone past it, well, it would be past it by now if DeepMind kept working on it.
Although they've now basically declared victory and shut down that project, as I understand it.
OK, so talk about the distinction between general and narrow intelligence a little bit more.
So we have this feature of our minds, most conspicuously, where we're general problem solvers.
We can learn new things, and our learning in one area doesn't require a fundamental rewriting of our code.
Our knowledge in one area isn't so brittle as to be degraded by our acquiring knowledge in some new area.
Or at least this is not a general problem which erodes our understanding again and again.
And we don't yet have computers that can do this, but we're seeing the signs of moving in that direction.
And so then it's often imagined that there's a kind of near-term goal, which has always struck me as a mirage, of so-called human-level general AI.
I don't see how that phrase will ever mean much of anything, given that all of the narrow AI we've built thus far is superhuman within the domain of its applications.
The calculator in my phone is superhuman for arithmetic.
Any general AI that also has my phone's ability to calculate will be superhuman for arithmetic, but we must presume it'll be superhuman for all of the dozens or hundreds of specific human talents we've put into it. Whether it's facial recognition or, just obviously, you know, memory, it will be superhuman unless we decide to consciously degrade it.
Access to the world's data will be superhuman unless we isolate it from data.
Do you see this notion of human level AI as a landmark on the timeline of our development, or is it just never going to be reached?
I think that a lot of people in the field would agree that human-level AI, defined as literally at the human level, neither above nor below, across a wide range of competencies, is a straw target, an impossible mirage.
Right now, it seems like AI is clearly dumber and less general than us, or rather that if we're put into a sort of real-world, lots-of-things-going-on context that places demands on generality, then AIs are not really in the game yet.
Humans are clearly way ahead.
And more controversially, I would say that we can imagine a state where the AI is clearly way ahead.
Where it is across every kind of cognitive competency, barring some very narrow ones that aren't deeply influential of the others.
Like, maybe chimpanzees are better at using a stick to draw ants from an ant hive and eat them than humans are, though no humans have really practiced that to world championship level exactly.
But there's this sort of general factor of how good are you at it when reality throws you a complicated problem.
At this, chimpanzees are clearly not better than humans, humans are clearly better than chimps, even if you can manage to narrow down one thing the chimp is better at.
The thing that the chimp is better at doesn't play a big role in our global economy.
It's not an input that feeds into lots of other things.
We can clearly imagine, I would say, like there are some people who say this is not possible.
I think they're wrong, but it seems to me that it is perfectly coherent to imagine an AI that is better at everything or almost everything than we are, such that if it was building an economy with lots of inputs, humans would have around the same level of input into that economy as the chimpanzees have into ours.
Yeah.
Yeah, so what you're gesturing at here is a continuum of intelligence that I think most people never think about, and because they don't think about it, they have a default doubt that it exists.
I think when people, and this is a point I know you've made in your writing, and I'm sure it's a point that Nick Bostrom made somewhere in his book Superintelligence, it's this idea that there's a huge blank space on the map past the most well-advertised exemplars of human brilliance, where we don't imagine, you know, what it would be like to be five times smarter than the smartest person we could name.
And we don't even know what that would consist in, right?
If chimps could be given to wonder what it would be like to be five times smarter than the smartest chimp, they're not going to represent for themselves all of the things that we're doing that they can't even dimly conceive.
There's a kind of disjunction that comes with more.
There's a phrase used in military contexts. I don't think the quote is actually... it's variously attributed to Stalin and Napoleon and, I think, Clausewitz, like half a dozen people who have claimed this quote. The quote is: sometimes quantity has a quality all its own.
As you ramp up in intelligence, whatever it is at the level of information processing, spaces of inquiry and ideation and experience begin to open up and we can't necessarily predict what they would be from where we sit.
How do you think about this continuum of intelligence beyond what we currently know in light of what we're talking about?
Well, the unknowable is a concept you have to be very careful with, because the thing you can't figure out in the first 30 seconds of thinking about it, sometimes you can figure it out if you think for another five minutes.
So in particular, I think that there's a certain narrow kind of unpredictability, which does seem to be plausibly, in some sense, essential, which is that for AlphaGo to play better Go than the best human Go players, it must be the case that the best human Go players cannot predict exactly where on the Go board AlphaGo will play.
If they could predict exactly where AlphaGo would play, AlphaGo would be no smarter than them.
On the other hand, AlphaGo's programmers and the people who knew what AlphaGo's programmers were trying to do, or even just the people who watched AlphaGo play, could say, well, I think this system is going to play such that it will win at the end of the game.
Even if they couldn't predict exactly where it would move on the board.
So similarly, there's a sort of like not short or like not necessarily slam dunk or not like immediately obvious chain of reasoning which says that it is okay for us to reason about
aligned or even unaligned artificial general intelligences of sufficient power as if they're trying to do something, but we don't necessarily know what.
But from our perspective, that still has consequences, even though we can't predict in advance exactly how they're going to do it.
Yudkowsky lays out a basic picture of intelligence that, once accepted, takes us into the details and edges us towards the cliff.
And now, we're going to introduce someone who tosses us fully into the canyon.
Yudkowsky just brought in the concept we mentioned earlier of value alignment in artificial intelligence.
There's a related problem called the control or containment problem.
Both are concerned with the issue of just how we would go about building something that is unfathomably smarter and more competent than us, that we could either contain in some way to ensure it wouldn't trample us, and as you'll soon hear, that really would take no malicious intent on its part or even our part, or that its goals would be aligned with ours in such a way that it would be making our lives genuinely better.
It turns out that both of those problems are incredibly difficult to think about, let alone solve.
The control problem entails trying to contain something which, by definition, can outsmart us in ways that we literally can't imagine.
Just think of trying to keep a prisoner locked in a jail cell who had the ability to know exactly which specific bribes or threats would compel every guard in the place to unlock the door, even if those guards aren't aware of their own vulnerabilities.
Or perhaps even more basically, the prisoner simply discovers features in the laws of physics that we have not yet understood, and that somehow enable him to walk through the thick walls which we were sure would stop him.
And the other problem, that of value alignment, involves not only discovering what we truly want, but figuring out a way to express it precisely and mathematically, so as to not cause any unintentional and civilization-threatening destruction.
It turns out that this is incredibly hard to do as well.
This particular problem nearly flips the super-intelligent threat on its head to something more like a super-dumb, or let's say, super-literal machine, which doesn't understand all the unspoken considerations that we humans have when we ask someone to do something for us.
This is what Sam was alluding to in the first conversation when he referenced a paperclip universe.
The concern is that a simple command to a super-intelligent machine, such as, make paperclips as fast as possible, could result in the machine taking the as-fast-as-possible part of that command so literally that it attempts to maximize its speed and performance by using raw materials, even the carbon in our bodies, to build hard drives in order to run billions of simulations to figure out the best method for making paperclips.
Clearly, that misunderstanding would be rather unfortunate.
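As a rough, hypothetical illustration of that failure mode, consider a toy planner that scores candidate plans. The plan names, numbers, and harm penalty below are invented for this sketch; the point is only that a literal objective contains no term saying the side effects matter.

```python
# Toy sketch (hypothetical, not from the episode): a planner that only scores
# "paperclips per hour" happily picks a catastrophic plan, because nothing in
# its objective mentions side effects.
candidate_plans = [
    {"name": "run the factory normally",  "paperclips_per_hour": 1_000,  "side_effect_harm": 0},
    {"name": "strip-mine the local town", "paperclips_per_hour": 50_000, "side_effect_harm": 10**6},
    {"name": "convert all nearby matter", "paperclips_per_hour": 10**9,  "side_effect_harm": 10**12},
]

def literal_objective(plan):
    # "Make paperclips as fast as possible" -- and nothing else.
    return plan["paperclips_per_hour"]

def intended_objective(plan, harm_weight=1.0):
    # What we actually meant: paperclips are nice, but not at any cost.
    return plan["paperclips_per_hour"] - harm_weight * plan["side_effect_harm"]

print(max(candidate_plans, key=literal_objective)["name"])   # convert all nearby matter
print(max(candidate_plans, key=intended_objective)["name"])  # run the factory normally
```

The intended objective behaves sensibly only because the unstated human preferences have been written in as an explicit cost, which is precisely the part that turns out to be hard to specify.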
And neither of these questions of value alignment or containment deal with the potentially more mundane terrorism threat.
The threat of a bad actor who would purposefully unleash the AI to inflict massive harm.
But let's save that cheery picture for later.
Now, let's continue our journey down the AI path with the professor of physics and author Max Tegmark, who dedicates much of his brilliant mind towards these questions.
Tegmark starts by taking us back to our prison analogy.
But this time, he places us in the cell and imagines the equivalent of a world of helpless and hapless five-year-olds making a real mess of things outside of the prison walls.
But we'll start first with Sam laying out his conception of these relevant AI safety questions.
This comes from episode 94, The Frontiers of Intelligence.
Well, let's talk about this breakout risk because this is really the first concern of everybody who's been thinking about what has been called the alignment problem or the control problem.
How do we create an AI that is superhuman in its abilities and do that in a context where it is still safe?
I mean, once we cross into the end zone and are still trying to assess whether the system we have built is perfectly aligned with our values, how do we keep it from destroying us if it isn't perfectly aligned? And the solution to that problem is to keep it locked in a box.
But that's a harder project than it first appears, and you have many smart people assuming that it's a trivially easy project.
I mean, I've got people like Neil deGrasse Tyson on my podcast saying that he's just going to unplug any superhuman AI if it starts misbehaving, or shoot it with a rifle.
Now, he's a little tongue-in-cheek there, but he clearly has a picture of the development process here that makes the containment of an AI a very easy problem to solve.
And even if that's true at the beginning of the process, it's by no means obvious that it remains easy in perpetuity.
I mean, you have people interacting with the AI that gets built. And at one point you described several scenarios of breakout, and you point out that even if the AI's intentions are perfectly benign, if in fact it is value-aligned with us, it may still want to break out, because, I mean, just imagine how you would feel if you had nothing but the interests of humanity at heart, but you were in a situation where every other grown-up on Earth died.
And now you're basically imprisoned by a population of five-year-olds who you're trying to guide from your jail cell to make a better world.
And I'll let you describe it, but take me to the prison planet run by five-year-olds.
Yeah, so when you're in that situation, obviously, it's extremely frustrating for you, even if you have only the best intentions for the five-year-olds.
You know, you want to teach them how to plant food, but they won't let you outside to show them.
So you have to try to explain, but you can't write down to-do lists for them either, because then first you have to teach them to read, which takes a very, very long time.
You also can't show them how to use any power tools, because they're afraid to give them to you, because they don't understand these tools well enough to be convinced that you can't use them to break out.
You would have an incentive, even if your goal is just to help the five-year-olds, to first break out and then help them.
Now, before we talk more about breakout, though, I think it's worth taking a quick step back because you talked multiple times now about superhuman intelligence.
I think it's very important to be clear that intelligence is not just something that goes on a one dimensional scale, like an IQ.
And if your IQ is above a certain number, you're superhuman.
It's very important to distinguish between narrow intelligence and broad intelligence.
Intelligence is a phrase, a word that different people use to mean a whole lot of different things.
And they argue about it.
In the book, it just takes this very broad definition that intelligence is how good you are at accomplishing complex goals, which means your intelligence is a spectrum.
How good are you at this?
How good are you at that?
And it's just like in sports: it would make no sense to say that there's a single number, your athletic coefficient, AQ, which determines how good you're going to be at winning Olympic medals.
And the athlete that has the highest AQ is going to win all the medals.
So today what we have is a lot of devices that actually have superhuman intelligence at very narrow tasks.
We've had calculators that can multiply numbers better than us for a very long time.
We have machines that can play Go better than us and drive better than us, but they still can't beat us at tic-tac-toe unless they're programmed for that.
Whereas we humans have this very broad intelligence.
So when I talk about superhuman intelligence with you now, that's really shorthand for what we in geek speak call superhuman artificial general intelligence.
Let me just come back to your question about the breakout.
There are two schools of thought for how one should create a beneficial future if we have superintelligence.
One is to lock them up and keep them confined.
Like you mentioned.
But there's also a school of thought that says that that's immoral if these machines can also have a subjective experience, and they shouldn't be treated like slaves.
And that a better approach is instead to let them be free, but just make sure that their values or goals are aligned with ours.
After all, grown-up parents are more intelligent than their one-year-old kids, but that's fine for the kids, because the parents have goals that are aligned with what's best for the kids, right?
But if you do go the confinement route, after all, this enslaved god scenario, as I call it, yes, this is extremely difficult.
As that five-year-old example illustrates, first of all, almost whatever open-ended goal you give your machine, it's probably going to have an incentive to try to break out in one way or the other.
And when people simply say, oh, I'll unplug it, you know, if you're chased by a heat-seeking missile, you probably wouldn't say, I'm not worried, I'll just unplug it.
We have to let go of this old-fashioned idea that intelligence is just something that sits in your laptop.
Good luck unplugging the internet.
And even if you initially, like in my first book scenario, have physical confinement, where you have a machine in a room, you're going to want to communicate with it somehow, right?
So that you can get useful information from it to get rich or take power or whatever you want to do.
And you're going to need to put some information into it about the world so it can do smart things for you.
Which already shows how tricky this is.
I'm absolutely not saying it's impossible, but I think it's fair to say that...
It's not at all clear that it's easy either.
The other one, of getting the goals aligned, is also extremely difficult.
First of all, you need to get the machine able to understand your goals.
So if you have a future self-driving car and you tell it to take you to the airport as fast as possible, and then you get there covered in vomit, chased by police helicopters, and you're like, this is not what I asked for.
And it replies, that is exactly what you asked for.
Then you realize how hard it is to get that machine to learn your goals, right?
If you tell an Uber driver to take you to the airport as fast as possible, she's going to know that you actually had additional goals that you didn't explicitly need to say.
Because she's a human too, and she understands where you're coming from.
But for someone made out of silicon, you have to actually explicitly have it learn all of those other things that we humans care about.
So that's hard.
And then once it can understand your goals, that doesn't mean it's going to adopt your goals.
I mean, everybody who has kids knows that.
And finally, if you get the machine to adopt your goals, then how can you ensure that it's going to retain those goals as it gradually gets smarter and smarter through self-improvement?
Most of us grownups have pretty different goals from what we had when we were five.
I'm a lot less excited about Legos now, for example.
And we don't want a super intelligent AI to just think about this goal of being nice to humans as some little passing fad from its early youth.
It seems to me that the second scenario of value alignment does imply the first of keeping the AI successfully boxed, at least for a time, because you have to be sure it's value aligned before you let it out in the world, before you let it out on the internet, for instance, or create robots that have superhuman intelligence that are functioning autonomously out in the world.
Do you see a development path where we don't actually have to solve the boxing problem, at least initially?
No, I think you're completely right.
Even if your intent is to build a value-aligned AI and let it out, you clearly are going to need to have it boxed up during the development phase when you're just messing around with it.
Just like any biolab that deals with dangerous pathogens is very carefully sealed off. And this highlights the incredibly pathetic state of computer security today.
I mean, I think pretty much everybody who listens to this has at some point experienced the blue screen of death, courtesy of Microsoft Windows, or the spinning wheel of doom, courtesy of Apple. We need to get away from that to have truly robust machines, if we're ever going to be able to have AI systems that we can trust, that are provably secure.
And I feel it's actually quite embarrassing that we're so flippant about this.
It's maybe annoying if your computer crashes and you lose one hour of work that you hadn't saved, but it's not as funny anymore if it's your self-driving car that crashed or the control system for your nuclear power plant or your nuclear weapon system or something like that.
And when we start talking about human-level AI and boxing systems, you have to have this much higher level of safety mentality where you've really made this a priority, the way we...
Yeah, you describe in the book various catastrophes that have happened by virtue of software glitches or just bad user interface where, you know, the dot on the screen or the number on the screen is too small for the human user to deal with in real time.
And so there have been plane crashes where scores of people have died and patients have been annihilated by having, you know, hundreds of times the radiation dose that they should have gotten in various machines because the software was improperly calibrated or the user had selected the wrong option.
And so we're by no means perfect at this, even when we have a human in the loop.
And here we're talking about systems that we're creating that are going to be fundamentally autonomous.
And, you know, the idea of having perfect software that has been perfectly debugged before it assumes these massive responsibilities is fairly daunting.
I mean, just how do we recover from something like, you know, seeing the stock market go to zero because we didn't understand the AI that we unleashed on, you know, the Dow Jones or the financial system generally?
These are not impossible outcomes.
Yeah, you raise a very important point there.
Just to inject some optimism in this, I want to emphasize that, first of all, there's a huge upside also, if we can get this right.
Because people are bad at things.
In all of these areas where there were horrible accidents, of course, technology can save lives in healthcare and transportation and so many other areas.
So there's an incentive to do it.
And secondly, there are examples in history where we've had really good safety engineering built in from the beginning.
For example, when we sent Neil Armstrong, Buzz Aldrin and Michael Collins to the moon in 1969, they did not die.
There were tons of things that could have gone wrong, but NASA very meticulously tried to predict everything that possibly could go wrong and then take precautions.
So it didn't happen, right?
They weren't lucky.
It wasn't luck that got them there.
It was planning.
And I think we need to shift into this safety engineering mentality with AI development.
Throughout history, it's always been the situation that we could create a better future with technology, as long as we won this race between the growing power of the technology and the growing wisdom with which we managed it.
And in the past, we, by and large, used the strategy of learning from mistakes to stay ahead in the race.
We invented fire, oopsie, screwed up a bunch of times, and then we invented the fire extinguisher.
We invented cars.
Oopsie, invented the seatbelt.
But with more powerful technology like nuclear weapons, synthetic biology, superintelligence, we don't want to learn from mistakes.
That's a terrible strategy.
We instead want to have a safety engineering mentality where we plan ahead and get things right the first time, because that might be the only time we have.
It's helpful to note the optimism that Tegmark plants in between the flashing warning signs.
Artificial intelligence holds incredible potential to bring about inarguably positive changes for humanity, like prolonging lives, eliminating diseases, avoiding all automobile accidents, increasing logistic efficiency in order to deliver food or medical supplies, cleaning the climate, increasing crop yields, expanding our cognitive abilities to learn languages or improve our memory.
The list goes on.
Imagine being able to simulate the outcome of a policy decision with a high degree of confidence in order to morally assess it consequentially before it is actualized.
Now, some of those pipe dreams may run contrary to the laws of physics, but the likely possible positive outcomes are so tempting and morally compelling that the urgency to think through the dangers is even more pressing than it first seems.
Tegmark's book on the subject where much of that came from is fantastic.
It's called Life 3.0.
Just a reminder that a reading, watching, and listening list will be provided at the end of this compilation, which will have all the relevant texts and links from the guests featured here.
Somewhere in the middle of the chronology of these conversations, Sam delivered a TED Talk that focused on and tried to draw attention to the value alignment problem.
Much of his thinking about this entire topic was heavily influenced by the philosopher Nick Bostrom's book, Superintelligence.
Sam had Nick on the podcast, though their conversation delved into slightly different areas of existential risk and ethics, which belong in other compilations.
But while we're on the topic of the safety and promise of AI, we'll borrow some of Bostrom's helpful frameworks.
Bostrom draws up a taxonomy of four paths of development for an AI, each with its own safety and control conundrums.
He calls these different paths oracles, genies, sovereigns, and tools.
An artificially intelligent oracle would be a sort of question and answer machine, which we would simply seek advice from.
It wouldn't have the power to execute or implement its solutions directly.
That would be our job.
Think of a super intelligent wise sage sitting on a mountaintop answering our questions about how to solve climate change or cure a disease.
An AI genie and an AI sovereign would both take on a wish or desired outcome which we impart to them, and pursue it with some autonomy and power to achieve it out in the world.
Perhaps it would work in concert with nanorobots or some other networked physical entities to do its work.
The Genie would be given specific wishes to fulfill, while the Sovereign might be given broad, open-ended, long-range mandates like increase flourishing or reduce hunger.
And lastly, the Tool AI would simply do exactly what we command it to do, and only assist us to achieve things we already knew how to accomplish.
The Tool would forever remain under our control, while completing our tasks and easing our burden of work.
There are debates and concerns about the possibility of each of these entities, and ethical concerns about the potential consciousness and immoral exploitation of any of these inventions, but we'll table those notions just for a bit.
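For readers who find a taxonomy easier to hold onto when it is written down as data, here is one possible encoding of Bostrom's four castes. The attribute names and boolean flags are assumptions made for this illustration rather than Bostrom's own formalism; they simply restate the distinctions described above.

```python
# A hypothetical, illustrative encoding of Bostrom's four AI "castes".
# The attributes only restate the distinctions in the narration above:
# answers questions vs. acts in the world, specific wishes vs. open-ended
# mandates, autonomous pursuit vs. doing only what it is told.
from dataclasses import dataclass

@dataclass
class AICaste:
    name: str
    acts_in_world: bool      # executes plans itself, or only advises?
    open_ended_goals: bool   # broad mandates ("increase flourishing") vs. specific tasks
    autonomous: bool         # pursues goals on its own vs. only what it is commanded

BOSTROM_CASTES = [
    AICaste("Oracle",    acts_in_world=False, open_ended_goals=False, autonomous=False),
    AICaste("Genie",     acts_in_world=True,  open_ended_goals=False, autonomous=True),
    AICaste("Sovereign", acts_in_world=True,  open_ended_goals=True,  autonomous=True),
    AICaste("Tool",      acts_in_world=True,  open_ended_goals=False, autonomous=False),
]

for caste in BOSTROM_CASTES:
    print(caste)
```

Laid out this way, the trade-off is easier to see: the more freely a system can act on open-ended goals, the more the control and value-alignment problems come to the fore.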
This next section digs in deeper on the ideas of a genie or a sovereign AI, which is given the ability to execute our wishes and commands autonomously.
Can we be assured that the genie or sovereign will understand us, and that its values will align in crucial ways with ours?
In this clip, Stuart Russell, a professor of computer science at Cal Berkeley, gets us further into the value alignment problem, and tries to imagine all the possible ways that having a genie or sovereign in front of us might go terribly wrong, and, of course, what we might be able to do to make it go phenomenally right.
Sam considers this issue of value alignment central to making any sense of AI.
So, this is Stuart Russell from Episode 53, The Dawn of Artificial Intelligence.
Let's talk about that issue of what Bostrom called the control problem.
I guess we could call it the safety problem.
Just perhaps you can briefly sketch the concern here.
What is the concern about general AI getting away from us?
How do you articulate that?
So you mentioned earlier that this is a concern that's being articulated by non-computer scientists.
And Bostrom's book, Superintelligence, was certainly instrumental in bringing it to the attention of a wide audience, you know, people like Bill Gates, Elon Musk, and so on.
But the fact is that these concerns have been articulated by the central figures in computer science and AI.
So I'm actually gonna... going back to I.J. Good and von Neumann.
Well, and Alan Turing himself.
Right.
So people, a lot of people may not know about this, but I'm just going to read a little quote.
So Alan Turing gave a talk on BBC Radio, Radio 3, in 1951.
So he said, if a machine can think, it might think more intelligently than we do.
And then where should we be?
Even if we could keep the machines in a subservient position, for instance, by turning off the power at strategic moments, we should, as a species, feel greatly humbled.
This new danger is certainly something which can give us anxiety.
So that's a pretty clear, you know, if we achieve superintelligent AI, we could have a serious problem.
Another person who talked about this issue was Norbert Wiener.
So Norbert Wiener was one of the leading applied mathematicians of the 20th century.
The founder of a good deal of modern control theory and automation.
He's often called the father of cybernetics.
So he was concerned because he saw Arthur Samuel's checker playing program in 1959, learning to play checkers by itself, a little bit like the DQN that I described, learning to play video games.
This is 1959, so more than fifty years ago, learning to play checkers better than its creator.
And he saw clearly in this the seeds of the possibility of systems that could outdistance human beings in general.
And he was more specific about what the problem is. So Turing's warning is in some sense the same concern that gorillas might have had about humans, if they had thought about it.
A few million years ago, when the human species branched off from the evolutionary line of the gorillas, if the gorillas had said to themselves, you know, should we create these human beings, right?
They're going to be much smarter than us.
Yeah, it kind of makes me worried, right?
And they would have been right to worry, because as a species, they sort of completely lost control over their own future, and humans control everything that they care about.
So Turing is really talking about this general sense of unease about making something smarter than you: is that a good idea? And what Wiener said was this: if we use, to achieve our purposes, a mechanical agency with whose operation we cannot interfere effectively, we had better be quite sure that the purpose put into the machine is the purpose which we really desire.
So this is 1960.
Nowadays we call this the value alignment problem.
How do we make sure that the values that the machine is trying to optimize are in fact the values of the human who is trying to get the machine to do something or the values of the human race in general.
And so Wiener actually points to the Sorcerer's Apprentice Story, uh, as a typical example of when you give a goal to a machine, in this case, fetch water.
If you don't specify it correctly, if you don't cross every T and dot every I and make sure you've covered everything, then, machines being optimizers, they will find ways to do things that you don't expect.
And those ways may make you very unhappy. And this story goes back to King Midas, you know, 500-and-whatever BC, where he got exactly what he said, which is that things turn to gold, which is definitely not what he wanted.
He didn't want his food and water to turn to gold or his relatives to turn to gold, but he got what he said he wanted.
All of the stories with the genies, the same thing, right?
You, you give a wish to a genie, the genie carries out your wish very literally.
And then, you know, the third wish is always, you know, can you undo the first two?
Cause I got them wrong.
And the problem with super intelligent AI is that you might not be able to have that third wish.
Or even a second wish.
Yeah.
So if you get it wrong, you might wish for something very benign sounding like, you know, could you cure cancer?
But if you haven't told the machine that you want cancer cured, but you also want human beings to be alive.
So a simple way to cure cancer in humans is not to have any humans.
A quick way to come up with a cure for cancer is to use the entire human race as guinea pigs for millions of different drugs that might cure cancer.
So there's all kinds of ways things can go wrong.
And, you know, we have governments all over the world trying to write tax laws that don't have these kinds of loopholes, and they fail over and over and over again.
And they're only competing against ordinary humans, you know, tax lawyers and rich people.
And yet they still fail despite there being billions of dollars at stake.
So our track record of being able to specify objectives and constraints completely so that we are sure to be happy with the results, our track record is abysmal.
And unfortunately, we don't really have a scientific discipline for how to do this.
So generally, we have all these scientific disciplines, AI, control theory, economics, operations research, that are about how do you optimize an objective?
But none of them are about well, what should the objective be so that we're happy with the results?
So that's really, I think, the modern understanding, as described in Bostrom's book and other papers, of why a superintelligent machine could be problematic.
It's because if we give it an objective which is different from what we really want, then we're basically like creating a chess match with a machine, right?
Now there's us with our objective and it with the objective we gave it, which is different from what we really want.
So it's kind of like having a chess match for the whole world.
We're not too good at beating machines at chess.
Throughout these clips, we've spoken about AI development in the abstract, as a sort of technical achievement that you can imagine happening in a generic lab somewhere.
But this next clip is going to take an important step and put this thought experiment into the real world.
If this lab does create something that crosses the AGI threshold, the lab will exist in a country.
And that country will have alliances, enemies, paranoias, prejudices, histories, corruptions, and financial incentives like any country.
How might this play out?
If you'd like to continue listening to this conversation, you'll need to subscribe at SamHarris.org.
Once you do, you'll get access to all full-length episodes of the Making Sense Podcast, along with other subscriber-only content, including bonus episodes, AMAs, and the conversations I've been having on the Waking Up app.