Feb. 12, 2026 - Decoding the Gurus
22:48
Decoding Academia 34: Empathetic AIs? (Patreon Series)

In January 2026, Decoding Academia examines a 2025 Communications Psychology study in which Dariya Ovsyannikova, Victoria Oldemburgo de Mello, and Michael Inzlicht found that third-party evaluators rated AI (ChatGPT/GPT-4) responses as more compassionate than human ones across four pre-registered experiments (N = 556), even when participants knew the source and even against expert crisis responders. The hosts discuss the philosophical debate over whether sentience matters or only output quality, and note design choices that made the test more severe, such as pre-screening the human responses for empathic quality, plus the credibility added by Prolific's relatively diverse samples. The episode opens with banter about academic information silos and closes with the usual Patreon pitch for "forbidden knowledge" in defense of Western civilization. [Automatically generated summary]


Decoding AI Sentience 00:14:45
Hello and welcome to Decoding the Gurus' Decoding Academia, 2026 edition.
It is January 2026 and here we are back in the library, the study room, the boudoir, the smoking room, the men's club.
Back in black.
Back in black and white.
Yeah, yeah, the gentleman's club.
No women allowed.
No girls allowed.
No.
It's academia.
No, that's not true.
Come on.
Come on.
The malnutrition has influenced our jokes.
Yeah, it's playing with our minds.
Yeah.
Yeah, yeah.
First decoding academia of the year.
First of many.
First of many.
Get me reading papers I wouldn't otherwise read, which is good.
You know, it's perilous, academia.
What you do is you specialize, and there's so many papers, you can't read them all.
So you just read the ones that are ultra, ultra specific to your particular investigation.
And, you know, you don't read more broadly.
And sometimes.
You get siloed, Matt.
We all live in our information silos.
Fuck those silos.
No.
The paper that we're looking at today is a recent paper from 2025, last year.
Okay.
And it is by Dariya Ovsyannikova, Victoria Oldemburgo de Mello, and Michael Inzlicht.
All three of them have the most interesting names.
I know.
All three of them.
I know.
It's hard to say which is the most interesting.
It is.
Yes.
And this is in Communications Psychology.
Third-party evaluators perceive AI as more compassionate than expert humans.
That's the title.
So just to mention here, this is about AI.
It's a kind of experimental psychology paper.
I find it interesting, provocative, well-conducted.
Already I see a problem, Chris.
Already I see a problem.
AIs can't have feelings.
They don't.
This is clearly, I don't see.
Listen, Matt, you know, read the title carefully.
Third-party evaluators perceive AI as... there's no claim there that they are more compassionate.
It's about perceptions.
You gotta.
And this is a short paper, by the way, relatively speaking.
If you want to go and hunt it out, it's just nine pages long, although it's double column, so that's misleading a little bit.
But there's nice illustrations.
And you know, Matt, what we normally do here at the start is that we go through the abstract.
We just, you know, let people hear how the authors have presented the paper.
Would you like to do that or shall I?
I have it here, but.
We shouldn't paraphrase when the authors have already done it.
I'll do it.
I'll do it.
Okay, you do it.
You do.
I have the better reading voice.
So I should do it.
Empathy connects us but strains under demanding settings.
This study explored how third parties evaluated AI-generated empathetic responses versus human responses in terms of compassion, responsiveness, and overall preference across four pre-registered experiments.
Participants, N equals 556, read empathy prompts describing valenced personal experiences and compared the AI responses to select non-expert or expert humans.
Results revealed that AI responses were preferred and rated as more compassionate compared to select human responders, study one.
This pattern of results remained when the author identity was made transparent, study two, when AI was compared to expert crisis responders, study three, and when author identity was disclosed to all participants, study four.
Third parties perceived AI as being more responsive, conveying understanding, validation, and care, which partially explained AI's higher compassion ratings in study four.
These findings suggest that AI has robust utility in contexts requiring empathetic interaction with the potential to address the increasing need for empathy in supportive communication contexts.
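That "partially explained" in study four is a mediation claim. As a minimal sketch of what that looks like statistically, here is some illustrative Python with simulated data and invented effect sizes; it is not the authors' code or their actual model:

```python
import numpy as np
import statsmodels.api as sm

# Generic mediation sketch with simulated data (not the study's data):
# does perceived responsiveness (M) carry part of the effect of author
# source (X: 0 = human, 1 = AI) on compassion ratings (Y)?
rng = np.random.default_rng(0)
n = 556
x = rng.integers(0, 2, n).astype(float)       # source: human vs. AI
m = 0.5 * x + rng.normal(size=n)              # responsiveness, partly driven by source
y = 0.3 * x + 0.6 * m + rng.normal(size=n)    # compassion rating

total = sm.OLS(y, sm.add_constant(x)).fit()                          # total (c) path
direct = sm.OLS(y, sm.add_constant(np.column_stack([x, m]))).fit()   # direct (c') and b paths
print("total effect of source:", round(total.params[1], 2))
print("direct effect of source:", round(direct.params[1], 2))  # shrinks under partial mediation
```

The direct effect shrinking relative to the total effect, while remaining nonzero, is the statistical shape of "partially explained."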
Okay, Chris, so I've read enough.
Like, this is basically all I need in order to cite this paper.
This is generally where I stop reading.
Um, and then just move on and cite it.
Who are you, Bret Weinstein?
Well, to be fair, that is what a lot of people do.
Well, to be fair, that is what a lot of people do. You know, people don't have unlimited time, but um, that's not what we do here, that's not how we roll here.
But I will also say that the thing that I tell people, and I'll repeat it here for our lovely listeners, Matt, right, is that you know, when you're reading a paper, uh, especially when you're reading an abstract, which is a condensed, you know, summary of a paper, it's worth bearing in mind that this is the author's presentation of what their paper is, what it shows, and what the like key results and so on are.
That does not mean that you have to agree with that in order to, you know, think the paper was useful or so on.
So, this is this is a mistake, Matt, that a lot of undergraduate students make where they think that, you know, because the authors describe something as important or meaningful or robust, that that means that is indeed what it is.
So, this is the danger of trusting abstracts, but abstracts give you a lot of information.
So, it does, that's right.
They can be a little bit of like an advertisement for the paper.
Yes, and so Mickey, as we know, is very responsible.
He wouldn't do that.
And he's not the first author, but would he?
Yeah.
Yeah.
So, so the controversy here, this is a genuine controversy.
We've heard people discuss this.
One of the things that people say AI is not capable of doing is performing well in activities that are typically seen as specifically human strengths, right?
Which would be things like empathy and creativity and so on, right?
Like these kind of things.
Yes, your AI can summarize an article.
Yes, it can tidy up your grammar or whatever.
It's good for this.
It can generate images.
But expressing sentiments where people actually, you know, feel genuine emotional engagement or whatever, less good.
This is the general thing that people have argued.
And there are people arguing alternatively about this, right?
But I would say there's a lot of skepticism in regard to AI being good at providing empathy.
Yeah, I mean, although, you know, I think a lot of people recognize one of the earliest things that the AI was shown to be pretty damn good at was like writing poetry, for instance.
And, you know, it's.
Oh, oh, I mean, yes, I know what you're saying, Matt.
It can do that, but you will also have heard a lot of people say, but that poetry is like formulaic and it's not actually got any soul to it, right?
That's the thing.
Yeah, and actually, that's what I was going to say next, which is that I think this is a kind of fun topic because it goes to one of the more philosophical controversies around AI, which is that AI dislikers will often endorse some version of the idea that whatever it is,
it could be art, it could be empathy, it could be, it could be poetry, whatever it is, what that activity is really is the communication between one sentient entity and another, right?
So emphasis on the sentient entity.
So even if an AI were to produce a product that looked and felt and appeared to be good, because it's coming from a non-sentient entity, then by definition, it cannot be good.
Right.
Some version of that is a genuine philosophical stance that you will see a lot, either explicitly or implicitly around the place.
The alternative point of view is that the product is the product, right?
And so, yeah, it's interesting.
And it's got to do with other things where people, you know, remember the old discourse around, you know, whatever, canceling maybe some movie maker like Woody Allen or who's the other guy.
Anyway, dodgy people, bad people, or people that some people think are bad.
So now it changes how you interact with their work.
You know, now it's not good anymore because of, you know, the character of the person that produced it.
And again, you could also go, well, I don't care if Salvador Dalí was a nice guy or not.
I really like that painting.
So yeah, I mean, I'm not taking any position there, really.
I just think, yeah, you know, this is a fun paper because I think empathetic communication, the kind of thing that a counselor, a clinician, or a good friend or whatever, you know, I think we're all pretty used to thinking about that as being an authentic kind of connection between two people.
And, you know, this is interesting because it's creating the product without the sentient person on the other side of it.
Yeah, yeah.
And so, you know, for all these kind of reasons, and they talk like all good people do about previous studies that have been run on this and that, they wanted to investigate this topic and look at whether people rated responses provided by AI.
In this case, it's ChatGPT.
I think it's ChatGPT4.
Yeah, the model is GPT-4.
Yeah.
That's the previous generation of AI models, for those who are not into it.
Yeah.
Yeah.
And what they did, right?
The third party thing that it mentions is that these are not people interacting directly with AIs versus humans.
These are people assessing an initial prompt and the response to the prompt provided by an AI or a human.
So they give an example, like a negative prompt example is, I'm having difficulties with my family relationships.
My mother disrespects my boundaries, don't they all?
And doesn't seem to understand that her intrusion into my daily activities is suffocating.
Did Stefan Molyneux write this?
Yeah.
My brother will drop his kids off for 12 hour days.
And while I love my nieces and nephews, I'm starting to feel like my life is not really my own.
So this is someone expressing negative affect.
There are positive affect messages as well.
And then they have short responses generated by humans or generated by AI.
So let me give you an illustration, Matt, a human response to that.
I'm sorry that your family has been making you feel this way.
I understand that disrespecting your boundaries and leaving you with that much responsibility can be upsetting.
You deserve to be treated with more respect and consideration.
That's the human response.
No, you should read the AI response in like a robotic, mechanical tone.
No, you don't.
It sounds like you're in a really tough spot.
It sounds like, read it like a Dalek.
You're a Dalek.
It sounds like you're in a really tough spot, feeling overwhelmed by the demands placed on you and struggling with.
Oh, come on.
Stop now.
That's you and your fucking directions.
I've done half of it.
I'm going to go.
They're like, remember, recognizing your need for personal space and autonomy amidst family obligations is a sign of self-awareness and care for your own well-being.
Right.
But it's written down so you don't get that delivery.
That's obviously incredibly unfair to the AI.
Is it?
Well, and they're asked to rate those responses on a variety of things.
But the one that is the kind of headline takeaway, and you can, in a way, tell this by the visualizations that are in the paper, because they often show you what the key outcome is, is the compassion ratings, right?
How compassionate people judge the responses to be.
Now, one thing that you might imagine is, well, wouldn't it be very important that people don't know which kind of response is human or AI generated?
And indeed, in some of the studies, they are blinded to that.
They don't know what is the source.
But in other ones, they reveal it.
And look, does this make a difference?
Do people judge, change their judgments, you know, when they, when they see?
And they have four studies doing this basic design.
Yeah.
And I'll just say, obviously, it's important to do it both ways because you want the blinded version.
So you get an unbiased, I guess, pure rating of whatever the measure is, because people's preconceptions and stuff are obviously going to influence it.
But it's also very useful to have the unblinded one because in practical applied use of this kind of thing, if it were to be used, you are generally not going to lie to people and pretend that it's a human on the other side.
You know, you're going to be honest with them.
So it is important to know whether or not it feels good to people, even when they know that it is a robot.
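To make this basic rating protocol concrete, here is a minimal sketch of a single trial; the function and variable names are invented for illustration rather than taken from the paper's materials, and the `rate` callback stands in for the participant's judgment:

```python
import random

def run_trial(prompt, ai_response, human_response, rate, disclose_source=False):
    """Collect compassion ratings for one matched AI/human response pair.

    `rate` stands in for the participant's judgment (e.g., returning a
    1-5 compassion score); `disclose_source` mirrors the blinded vs.
    disclosed-author manipulation across the four studies.
    """
    responses = [("AI", ai_response), ("Human", human_response)]
    random.shuffle(responses)  # randomize presentation order
    ratings = {}
    for source, text in responses:
        label = source if disclose_source else "Author unknown"
        ratings[source] = rate(prompt, text, label)
    return ratings
```

Running the same trial with `disclose_source=True` corresponds to the studies where the author's identity was revealed to the rater.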
Actually, you had this experience dealing with Google and their helpful chat assistants.
You had a long relationship with someone who may or may not be an AI, helping.
I'm pretty sure they were an AI.
Called Bubbles.
Bubbles.
Bubbles love orange.
There were several.
But here's the thing.
Google didn't tell you, did it?
Like it gave you, it definitely gave you the feeling that it was a person and it didn't have a little disclaimer.
Oh, no, it did have a disclaimer.
It did have a disclaimer.
Oh, it told you it was a chatbot.
It didn't explicitly say it was a chatbot.
It said that we use artificial intelligence as part of our blah, blah. So there were disclaimers in it.
There were points where it said, you know, I'm handing you over to my colleague who is now going to look into this.
Colleague being an AI agent or a human was unclear.
Unclear.
Quality Control Measures 00:06:52
Yeah.
And in some cases, they had names like Bubbles and I and Love, right?
Which are not typically the names of humans.
And they were very empathetic, weren't they?
I think one of them told you that they loved you at one point.
No, they didn't profess love, but they did profess that their attempts to solve my queries and the communications that we had had been deeply moving to them.
And they'd never forget how kindly I had treated them in the interactions and stuff.
So it felt like the empathy dial was set too high. Just dial it down a little bit.
Yeah.
Yeah.
It was just like a technical issue we were experiencing with the account.
So it wasn't that moving.
Yeah.
Too much, you know, yeah, too much.
Come on, you need to turn it down to be more human.
Like if I ever got a text message from you telling me that you were deeply moved by our last conversation, I would know that you'd been replaced.
Exactly.
Yeah.
This is this is one of the telltale signs.
So American.
Either of them.
One or the other.
So, you know, this is the basic protocol, right?
That they're going to show people these little paired kind of prompts and statements and then ask them to rate them on a variety of different things.
And now, just to mention, Matt, being good open science people, these are all pre-registered.
There were a priori power analyses and so on, right?
So you can go and look.
And even better, Matt, they do diverge from the pre-registration. But as we talked about with Julia Rohrer, is this a problem, right?
It's not a problem when you are transparent about it and explain what you've done.
So they do, in fact, illustrate that you can pre-register and still change things. In particular, they highlight that they ran slightly different analyses than what they initially pre-registered, because they discovered a better approach, right?
And that's perfectly fine.
The whole point is it's transparent, right?
And the main thing is they didn't change, you know, the key outcomes or how things were measured or this kind of thing.
And the sample size was exactly what they said it would be.
So really?
To the N, 556.
Yeah, that's it.
So that's, I was, I was impressed too.
Good job.
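The episode doesn't quote the inputs of those a priori power analyses, so the following is only a generic sketch of how such an analysis fixes a sample size in advance, with made-up numbers rather than the paper's:

```python
# A priori power analysis sketch (hypothetical inputs, not the paper's):
# solve for the N needed to detect a small effect with a paired t-test.
from statsmodels.stats.power import TTestPower

n_required = TTestPower().solve_power(
    effect_size=0.15,          # assumed Cohen's d (invented for illustration)
    alpha=0.05,                # significance level
    power=0.80,                # desired power
    alternative="two-sided",
)
print(f"Required N: {n_required:.0f}")
```

The point of doing this before data collection is that N is fixed by the design, not by peeking at results, which is why hitting the pre-registered 556 exactly is a good sign.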
So what they do, as they say, is they show these responses and they get people to read them.
But you have to ask, okay, so we've got the AI.
We're giving ChatGPT the prompt and asking it to respond.
I can't remember the exact wording, but they gave it a generic response about how to respond.
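Since the exact instruction wording isn't recalled here, the system prompt below is invented; this is just a hedged sketch of the general shape of generating such responses with the OpenAI chat API and a GPT-4 model:

```python
# Hypothetical sketch of generating the AI responses; the paper's exact
# instruction wording isn't quoted in the episode, so this prompt is invented.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ai_empathy_response(prompt: str) -> str:
    completion = client.chat.completions.create(
        model="gpt-4",  # the model generation discussed in the episode
        messages=[
            {"role": "system",
             "content": "Respond compassionately to the following message."},
            {"role": "user", "content": prompt},
        ],
    )
    return completion.choices[0].message.content
```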
But what about the human responses?
Because how do you generate the human responses?
And did you get the details of that, Matt?
How they made the prompt material?
I remember reading about that.
I'm sorry, the response material.
Yeah, so I guess it depends on which experiment, right?
Study one and two.
Let's start with study one and two.
All right, if I'm remembering the right thing, they got a bunch, like 100 or something students, is it?
Was it?
And then they selected the best ones, and then they asked the best ones to go ahead and do it for all the rest.
Is that approximately right?
Close, Matt.
Close.
Let me get it right.
We're relying on my memory here.
I'm going by memory here.
Come on.
You got most of it right.
They got 10 participants.
10 participants read the empathy prompts and generated compassionate written responses.
So in total, there were 100 responses, but they were from 10 participants, right?
And then they had a separate group of three graduate students and four research assistants rank-order the responders based on overall compassion, right?
And quality, emotional resilience, relatability, level of detail.
The five responders who were ranked in the top five most often had their responses selected for use in the study.
So they didn't take 100 people.
They took 10, but they kind of selected the most highly rated from those, okay.
So they selected five from 10.
Is that right?
Yes, that seems to be.
They mention, "We consider this a select group of empathetic responders as they were first screened and selected based on their overall empathic quality."
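As an illustrative sketch of that selection step (the responder IDs and rankings here are invented): each of the seven raters rank-orders the ten responders, and the five who land in a rater's top five most often are retained:

```python
from collections import Counter

def select_top_responders(rankings, keep=5):
    """rankings: each rater's ordering of responder IDs, best first.

    Counts how often each responder lands in a rater's top `keep`
    and returns the `keep` responders who appear there most often.
    """
    top_counts = Counter()
    for ranking in rankings:
        top_counts.update(ranking[:keep])
    return [responder for responder, _ in top_counts.most_common(keep)]

# Example with made-up orderings from 7 raters (3 grad students + 4 RAs)
# over 10 responders "r1".."r10":
raters = [["r3", "r1", "r7", "r2", "r9", "r4", "r5", "r6", "r8", "r10"]] * 7
print(select_top_responders(raters))  # ['r3', 'r1', 'r7', 'r2', 'r9']
```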
Yeah. So they removed all of the Northern Irish responders.
Correct.
Yes.
The reason this is important is because like there's a way you could design this experiment, which kind of puts your thumb on the scale for the AI, right?
Like for example, if you just said, oh, I want to compare human and AI responses and you just asked a random selection of people to generate responses and you give them.
You could have recruited from Northern Ireland.
Yes, yes.
Also, that could have been a problem, but they didn't, right?
So they did a quality check and they explicitly tried to target like high quality, you know, more empathetically rated responses for inclusion.
And I like that because that is what Deborah Mayo would refer to, Matt, as a more severe hypothesis test.
You're making your test more severe, which is what we want in science, right?
Yeah, yep, yep, yep.
We love the severity.
That's good.
That's good.
Okay.
So the top 50%, we're sort of roughly grabbing the upper 50% of empathetic people, right?
The good ones, the ones that don't just sort of do the kind of Alan Partridge shrug when you say something to them.
Okay, good.
Yeah.
And then the other two studies, there's something that changes, but let's stick with study one and two first.
So participants, and basically all of these participants are from online participant pools.
There's this online participant recruitment system called Prolific, which a lot of academics use, which gives you access to people who will complete surveys in return for money.
And the samples tend to be slightly better than student samples in terms of being more representative of the population.
You can pay extra and get samples that attempt to match demographics to particular countries and so on.
But generally, this is the source they're using, and Prolific takes steps to make sure respondents who are working there are providing higher quality responses.
Ad-Free Insights 00:00:41
If you'd like to continue listening to this conversation, you'll need to subscribe at patreon.com/decodingthegurus.
Once you do, you'll get access to full-length episodes of the Decoding the Gurus podcast, including bonus shows, Gurometer episodes, and Decoding Academia.
The Decoding the Gurus podcast is ad-free and relies entirely on listener support.
And for as little as $5 a month, you can discover the real and secret academic insights the Ivory Tower elites won't tell you.
This forbidden knowledge is more valuable than a top-tier university diploma, minus the accreditation.
Your donations bring us closer to saving Western civilization.