Jan. 30, 2026 - Decoding the Gurus
01:31:40
Open Science, Psychology, and the Art of Not Quite Claiming Causality with Julia Rohrer

In a rare departure from our usual diet of online weirdos, this episode features an academic who is very much not a guru. We’re joined by Julia Rohrer, a psychologist at Leipzig University whose work straddles the disciplinary boundaries of open science, research transparency, and causal inference. Julia is also an editor at Psychological Science and has spent much of the last decade politely pointing out that psychologists often don’t quite know what they’re estimating, why, or under which assumptions.

We talk about the state of psychology after the replication crisis, whether open science reforms have genuinely improved research practice (or just added new boxes to tick), and why causal thinking is unavoidable even when researchers insist they are “only describing associations.” Julia explains why the standard dance of imply causality → deny causality → add boilerplate disclaimer is unhelpful, and argues instead for being explicit about the causal questions researchers actually care about and the assumptions required to answer them.

Along the way we discuss images of scientists in the public and amongst the gurus, how post-treatment bias sneaks into even well-intentioned experimental designs, why specifying the estimand matters more than running ever-fancier models, and how psychology’s current norms can potentially punish honesty about uncertainty. We also touch on her work on birth-order effects and offer some possible reasons for optimism.

With all the guru talk, people sometimes ask us to recommend things that we like, and Julia's work is one such example!

Links
Julia Rohrer’s website
The 100% CI blog
Rohrer, J. M. (2024). Causal inference for psychologists who think that causal inference is not for them. Social and Personality Psychology Compass, 18(3), e12948.
Rohrer, J. M., Tierney, W., Uhlmann, E. L., DeBruine, L. M., Heyman, T., Jones, B., ... & Yarkoni, T. (2021). Putting the self in self-correction: Findings from the loss-of-confidence project. Perspectives on Psychological Science, 16(6), 1255-1269.
Rohrer, J. M., Egloff, B., & Schmukle, S. C. (2015). Examining the effects of birth order on personality. Proceedings of the National Academy of Sciences, 112(46), 14224-14229.
BEMC MAY 2024 - Julia Rohrer - "Causal confusions correlate with casual conclusions"
Dr. Tobias Dienlin - Less casual causal inference for experiments and longitudinal data: Research talk by Julia Rohrer


Broader Model Challenges 00:13:54
Hello and welcome to Decoding the Gurus, a podcast where a psychologist and an anthropologist of sorts look at online gurus and assorted weirdos.
But occasionally we don't do that and we sometimes try to speak to people who we think are actually doing good things and have interesting approaches to topics.
And so we have an interview today and a guest joining us who is Julia Rohrer, an academic Matt and I are both fans of, who works in the Wilhelm Wundt Institute for Psychology at Leipzig University, has been active in the open science movement for as long as I have paid attention to it, and now more recently does a lot of work on causal inference.
So we are having you on, Julia, not as an intervention for emerging guru-ness, but we're allowed to model that not all academics are the terrible bastards that the gurus keep saying they are.
So yeah, thank you.
Thank you for having me.
And I was a bit concerned when you talked about online weirdos that this introduction would go in another direction.
No, that's a perfectly accurate introduction.
Thank you.
So Julia, you've been working in causal inference and the whole area of, I guess, reforming psychology, educating psychologists, trying to get us to do better methods.
Do you still feel optimistic and excited about the future of the field or are we wallowing and beyond salvation?
You're already starting with a mean question.
I think maybe like 10 years ago, I wrote a blog post when I was like very much into open science, how like the field wasn't doing great and I was still very optimistic about the future.
I'm not sure how optimistic I am.
I think we are like making good progress.
I'm like, as I'm getting older, I think I see more maybe fundamental conceptual issues.
It is also related to the causal inference stuff.
That does make me sometimes wonder, like, right?
I see all these manuscripts that get submitted to Psychological Science.
I'm like, I'm not entirely sure we all know what we are doing here and why.
So I'd say it's mixed, but it also depends on the angle and on the other person whether I come across as optimistic or pessimistic.
I think my coherent structure for this is that the causal inference stuff can be slightly blackpilling.
But whenever I've heard you, Julia, talk about the open science movement, you know, like you say, it depends on the framing and the audience, whether you want to frame it positively or negatively.
But for those in our audience, I think most people would understand the replication crisis and the concern about methodological reform in psychology and other sciences and social sciences.
But it's been now like 14 years since Daryl Bem's Feeling the Future paper, and the open science movement is now like a standard fixture.
Badges are attached to lots of journals and stuff.
So in terms of open science, setting aside causality and causal inference, how about that?
Is there cause for optimism there in the spread of open science and methodological reform?
Or is it similarly a tale of half measures?
So I would be more positive on that side, I guess.
So it has been like 14 years.
So actually, I started like my undergraduate degree when all of this started in 2011.
And I think there have actually been a lot of changes to training, to practices, to also what's just the default norm.
So if you're now like, I'm not going to show you my data, that's none of your business, that is going to raise eyebrows now.
And I think that is huge progress.
And sometimes it's easy to forget that because you would think, oh, by now, all data get shared by default if it's possible and so on.
And that is not the case.
However, we now have a norm that people understand that data ought to be shared and it's normal to request that and it's okay to check and so on.
And if you compare that to other fields, I think where it's still like, oh, why would I give my data to anybody else, right?
Like this is not how it works.
So I think there have been like huge shifts in norms and it's easy to forget about them because I think it's also kind of easy to forget how bad things used to be.
And so sometimes there's like a really bad paper where you can just tell, okay, they tortured the data until they found something across 20 studies.
And the hypothesis is wild to begin with, right?
And then we're all like, oh yeah, let's party like it's 2010, because we know that this doesn't fly anymore, at the good outlets at least.
And so I do think there has been like huge progress and it's hard to deny.
It has taken 14 years.
So the question is whether that's quick or slow.
I don't know.
I mean, it has been a large chunk of my life, but I think it's fairly like fast because the scientific norms are incredibly sticky and slow to change.
I think there's one small cause for optimism there, and it's a very personal one, which is that I think for me, one of the obstacles to sharing the data and also making the research paper fully replicable and sharing all of the code that produces the key results and figures and so on, was just that it was just a lot of extra work to document it and make sure it's all presentable rather than the sort of hack together thing that most scientists do.
But I've actually found that a little bit of help from the old AIs there actually takes a huge burden off.
They can quite quickly create, you know, very presentable scripts from your things.
You can check for yourself that, yes, it's produced the same results.
And boom, done.
It doesn't have to be a huge job, right?
Yeah, I think the technology has gotten a lot more accessible.
So the open science framework is part of that as well.
But I think also like training has shifted a lot.
So I think like a famous Leipzig anthropologist, Richard McElreath, sometimes likes to talk about how when he was like a student, you know, you could maybe like run a t-test or something like that.
And now I think the next generation really does have a much higher skill level also when it comes to all things computational stuff.
And so on.
I can see that like during my own career, like I learned no R during my studies.
And then at some point we started introducing it, and the master students, we had to teach them.
Now they are taught by default in statistics.
So the level just keeps rising.
And a lot of things where maybe some years ago I would have said, well, students don't need to bother with stuff like version control or making sure that your code is fully documented in all steps.
And now the students are asking me, like, how does this Git thing work?
Can you explain that to us?
And so on.
So I think that has really accelerated.
So I struggle a lot, for example, with version control, which ensures that you can reconstruct how you changed your code over time and so on.
And Git is kind of very confusing, but it's the industry standard thing to do that.
And that is made so much more easy by LLMs that can just like answer your stupid questions in a manner that online forums just can't because people tend to presume more knowledge than I tend to have.
And so it's become so much easier.
And you can see that that is, of course, like a big part of it, making it just easy.
And if it's easy, people will do it.
It's a big challenge, though, isn't it?
Like in terms of teaching psychologists and social scientists, even God forbid anthropologists, how to do the technical stuff, notably statistics.
Because what happens still where I am is that students get taught a grab bag of recipes, including t-tests and multiple regression and so on, that are kind of, you know, turn the handle and get the answer.
Assumption testing is taught as like a checklist.
And you know the complaint.
It's not taught as it would be taught to a mathematical statistician.
But of course, it can't be taught properly because these are social scientists, right?
Their main field of study is psychology or something similar.
And the statistics and the technical stuff is like an extra thing.
And it's probably not, in many cases, it's not the skill set that they may have, or it's not the passion that drew them to the field of study.
So I feel like we've got a challenge there in terms of how much people can be taught.
What's your take on that?
So first of all, I'm not sure whether mathematical statisticians are trained in a manner that they are better prepared to actually apply statistics to answer substantive research questions.
Oh, that's true.
That's true.
So sometimes there are those like studies with completely wacky model fitting, whatever.
And then they actually had a mathematician, right?
And it's like, yeah, I mean, the mathematician is like the one responsible for making the mathematics work, but it doesn't mean that they make sense.
But yeah, in any case, I still hear about that a lot.
And in particular, so it's early career researchers starting to teach and they learned all these things.
Oh, wait, it's all a linear model.
So we don't need to teach these 100 things and flowcharts and so on.
But then they are in a context where you have to stick to a certain curriculum and it's like set up in a certain way.
That being said, I think this is changing quite a bit.
At least in Germany for psychology, I can observe it that a lot of the courses have shifted the structure.
So it's less like, I mean, we're still teaching t-tests and so on here in Leipzig, but there's always like an eye on, okay, now that's the broader model and that's the broader model.
And this is how to think about it.
And this is what p-values really mean and so on.
So I think it has improved a lot.
I also think there is, for example, I'm teaching research methods for undergraduates right now.
And this is a distinct lecture from statistics, but it's the same course.
And there aren't great materials actually to teach that, I would say.
So it's a lot of like, okay, so here is a list of features of experiments and here is a list of this and here is a list of that and so on.
And I kind of feel like we are like missing like a cohesive perspective on that.
So for example, when I teach my lecture, it's essentially all causal inference.
I mean, in the beginning, there is some like theory of science and so on, which is a fun part.
But then it's a lot of, okay, so we need a causal angle on the world for everything, not just if you want to make causal claims.
But for example, if you just want to document, oh, how many percent of people are depressed, you still need to kind of understand like what generated the data, like what affects how people respond to the questionnaire, what affects whether they end up in your survey in the first place, right?
Like depressed people are maybe less likely to participate in surveys and so on.
And so I am personally trying to build something where it's like kind of like it's all like that causal perspective and you draw like a little graph and you write down the things in the world that you think that matter and how they affect each other and try to reason with that.
But it's not been like spelled out anywhere in a textbook.
So this is just like making it up as I go.
And I think if there was a very nice textbook that taught this approach, I think people would start using it because people like using things that just work out of the box.
So there is some stuff missing there.
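A minimal sketch of that survey-participation point, in Python, with purely illustrative numbers (nothing here is from an actual study): even a purely descriptive estimate, the share of people who are depressed, goes wrong when depression itself makes people less likely to end up in the sample.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# True population: 20% of people are depressed (illustrative number).
depressed = rng.random(n) < 0.20

# Participation depends on depression status: depressed people are assumed
# to be less likely to answer the survey (60% vs. 90%).
p_participate = np.where(depressed, 0.60, 0.90)
participates = rng.random(n) < p_participate

# The naive descriptive estimate uses only the people who showed up.
print(f"True prevalence:      {depressed.mean():.3f}")
print(f"Estimate from sample: {depressed[participates].mean():.3f}")
# The sample estimate comes out too low even though no causal claim was made:
# the process that generated the data (who ends up in the survey) still matters.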
I also do see a lot of good statistics teaching actually.
And I actually think the statistics teaching is on average way ahead of the practices of the papers that I see submitted to journals.
So I actually think like the other parts may be lagging behind more than teaching.
I found that when I went through like statistics instruction as a graduate student, because I transferred from anthropology, I did a second master's specifically to do like basic quantitative analysis.
And the course that I took there, they were fine.
They were what Matt's describing, the kind of old approach, right?
Here's how to do ANOVAs, here's how to do t-tests.
And then after I finished my PhD or during it, I got interested in open science and so on and took Daniel Lakens' online MOOC and other resources that were online, like online material.
And probably if I didn't have the foundation of being forced to sit statistics lectures, it would have been more of a struggle.
But I did kind of go back then and understand what I had been told because I was putting it in the context of like, here is basically like causal inference and like scientific theory, right, around things.
And I teach research methods and intro stats now as well.
And I know this is getting to the causal inference part a little bit, but I find that in teaching students about papers, not design, but like reading the papers, in almost all psychology papers, you can write like X to Y as the, that's what they're doing in the paper, whether they say it's causal or not, like that is what's there.
And students learning to identify what is the X to the Y being claimed.
And then they've operationalized that in like a very, a very specific, sometimes insane way.
But in my case, I'm teaching psychology students and sometimes they haven't done that, even though they've been through foundation courses.
So it's kind of interesting because they have all this knowledge about psychology, but not very much about the building block approach, I guess I would put it as.
Causal Claims and Randomization 00:15:21
And this is in Japan. I'm in Japan, so I can't speak for the rest of the courses in Japan.
It could just be the students of my courses.
But yeah.
I think there's a very funny thing happening there.
So I teach first-year undergraduates, right?
They are not even learning inferential statistics yet, like next semester and so on.
And for them, when I'm like, yeah, and usually people are interested in how things affect each other, that is like the bulk of research.
It's not everything.
It's maybe not even the most important part, but it is a large part of the studies in psychology.
And so, of course, you're interested in how this affects that.
And that is your research question.
Yeah, yeah, sure, of course, of course.
Of course, that's what we're doing.
It makes perfect sense to them.
But then if you have like master students who have spent like years immersed in that literature where you're like, oh, no, we are not interested whether X affects Y.
We are interested in whether there are contingencies between the within-subject changes in X and Y and so on.
They get really confused.
And I mean, they mostly just get confused.
But some of them go so far as to absorb that this is how you talk about these things and presume that these are like meaningful research questions about intra-individual contingencies and so on.
And it's kind of funny.
So I think it's something that comes very naturally to you if you haven't been trained for a very long time that you do not talk about causal effects unless you have an experiment.
Then it's fine, right?
But it also leads to funny things.
So there was recently that paper about the effects of multilingualism and it was a bit of a mess.
So it supposedly showed that multilingualism led to slower biological aging.
And there were many, many things wrong with that paper.
Among them that they didn't even know whether people spoke multiple languages.
So in the end, it was just the country.
So it was a paper that found that people in Luxembourg are healthy because that is the data point that drove everything.
But in any case, so they even said like something, right?
So everything was obviously interpreted causally in all subsequent reporting as well.
And then they had a sentence in there, oh, that future randomized control trials are needed.
And if you just pause for a second, like what is the randomized clinical trial to investigate the effects of country level multilingualism?
Like, are we going to randomize countries and some of them become Luxembourgs and others don't?
And it's just like, if you, if you think about it for a second, just as like a naive person, you're like, wait, well, how does that, what precisely do you have in mind here, right?
Like, is it multilingualism, what you would randomize here?
Like, how's that supposed to work?
But if you spend a lot of time in psychology, you will just be, yeah, of course, you know, randomized clinical, like randomized control trials will solve that.
And they just write it, like they add it to every paper.
And I see this so often.
And as a reviewer, I always have to say, like, could you please spell out what you have in mind here?
Because usually the paper is not a randomized control trial because the thing they are interested in cannot be randomized.
Yeah.
Yeah.
Part of what you're speaking to there is, of course, you know, about causal inference and so on.
But also part of it is our, is the enculturation that happens in a discipline like psychology.
Like we learn the correct phrases.
We learn that this is the way you normally lead an introduction.
These are the caveats you make and the limitations.
And the danger, I guess, is that people just at some point stop thinking and say all the right things like a catechism, like a religious rite.
And the whole system sort of encourages people to do that, right?
Because if you don't make those obligatory sort of statements in some shape or form, generally reviewer two will smack you over the wrist.
Yeah.
And so I'm trying to kind of work against that as a reviewer and as an editor, right?
So I will be the one pushing back.
But then sometimes I give introductory talks on causal inference and people are like, okay, so how am I supposed to deal with that in my paper, right?
And what are the reviewers going to say?
And then I can just say, okay, so it really depends on who that reviewer will be.
And so some reviewer just wants you to add that one sentence.
This was not a randomized experiment.
So we cannot interpret it causally.
Here's our causal interpretation anyway.
And then there's reviewers like me who will be like, well, this is just inconsistent, right?
Please instead spell out your assumptions.
I think there are ways to balance that and like leave everybody happy, including the one who's just going through the checklist.
But it is more effortful than just working off the checklist, right?
And adding that sentence that it wasn't randomized.
So future longitudinal and experimental studies are needed.
And then nowadays it's also like you need to say that your findings might not generalize beyond WEIRD populations and so on, which everybody knows, of course, but you still need to add it.
And so I think it's a bit like different people are pushing into different directions.
And it's always the easy way out to just add the boilerplate and so on.
I'm trying to push against it and, like, I'm trying to normalize: write the papers as if you mean it.
Actually, stand behind what you're saying, but it is a bit of a process.
And I always feel sorry if people then end up having bad experiences.
So this is like a systemic problem, I think, what the peer review process pushes people to do, even if they actually would like to do it better.
Yeah, they get caught in the middle with the kind of "those are the standards of the field" thing.
They didn't meet them and they're just like kind of trying to get along, right?
So yeah.
And actually, I thought, Julia, when I've been talking to, like, people who haven't acquired bad habits yet, whenever talking about basic causal inference, a lot of people have heard like two things, even if it's just from their introductory course: one is that correlation doesn't equal causation, right?
And then the second is you can't infer causality without experiments, right?
Controlled experiments and randomization.
And in those cases, I've asked people first that, you know, can you infer causality without experiments?
And they're like, no.
And then how did we find out that smoking caused cancer?
I mean, I know there are experiments in that literature, but the primary finding is the like longitudinal data of people who were smoking, like epidemiological studies, right?
And then animal studies and whatnot.
But you can't randomize people into like heavy smoking conditions or not and see if they develop cancer, right?
For ethical reasons.
Or similarly, the fact that the majority of scientists agree that a large asteroid collided with Earth, and that's why there aren't so many dinosaurs except for birds around.
But that's like a causal historical fact that is part of science, right?
But it doesn't rely on randomized experiment.
And most people seem to accept, oh, yeah, yeah, that is right.
Like you can talk about causal things that have happened and you can discover things that are like causally related.
But it's like social sciences and psychology has created a rule, which is, but not here, not here.
You won't be allowed to infer anything unless you have a controlled experiment.
And, you know, there are some good reasons behind that.
But yeah, it seems like people are with examples able to override that heuristic or at least realize there's exceptions.
So yeah, I think this is really like a training issue that what sticks with people is that correlation does not equal causation, which is technically correct, right?
And that randomized controlled studies are a great thing.
And I agree.
I think they are the closest thing to magic we have in causal inference, which is amazing for many purposes.
Unfortunately, not for everything.
And I think one problem with teaching that is so, I mean, obviously this causes issues for people who do non-experimental research because then it's like, okay, you're not allowed to make any causal claims.
And causal claims are what is actually interesting.
So yeah, it's just tough luck.
Your research is not going to be interesting, or you have to come up with some workaround to pretend you're doing something else.
But I actually think it's also bad on the experimentalist side of things.
And that's because I do think, so to some extent, people start to believe that randomization will be sufficient to warrant any type of causal claim, because the thing that matters for causality is the randomization.
And that, I think, leads to all sorts of bad stuff.
So for example, people won't understand that you can only get precisely the causal effect of the thing that you randomized.
And the randomization might not necessarily do what you want it to do.
Some people might not adhere to the instructions and so on.
And that causes all sorts of issues.
And then, for example, in psychology, there's a thing where you just kick out the people that do not follow your instructions.
So, you assign them to do something, some people don't do it.
And you're like, yeah, okay, I'm just excluding them from my study, right?
Because they didn't do what they are supposed to do.
And there is very little awareness that the second you do that, you are actually no longer operating with an experiment because you are essentially doing something depending on what happens after randomization.
And that actually means it's no longer a randomized experiment.
So, the one, I think, the simplest example I can come up with.
So, you might be interested in whether reading an article about causal inference increases your well-being at the end of the day, or maybe even decreases it, right?
And so, you randomize students to either read that article or do something else.
And then you see that, like, I don't know, half of the students don't actually read the article.
And then you just exclude those who did not do that.
But excluding those who didn't read the article will probably selectively exclude those who lacked motivation and maybe had other stuff going on.
So, suddenly, you're comparing like the most motivated students who had a good day, who read the article, to the control group that did something else.
And suddenly, it looks like, oh, yeah, actually, it makes people like super motivated and happy to read about causal inference.
And that's just because you kicked everybody out who wasn't, right?
And so, this happens like a lot.
This is called post-treatment bias because you introduce information about what happened after the treatment, like whether they actually followed or not.
And that's a huge issue.
And I think it's very hard for psychologists to wrap their heads around it.
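A minimal Python sketch of that exclusion problem, with made-up numbers (an illustration of post-treatment bias in general, not a re-analysis of any study): reading the article has zero effect by construction, but dropping the assigned students who didn't read it manufactures an apparent benefit.

import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Baseline mood/motivation differs between students.
baseline = rng.normal(size=n)

# Random assignment: half are asked to read the causal-inference article.
assigned = rng.random(n) < 0.5

# Compliance is NOT random: students with higher baseline mood/motivation
# are more likely to actually read the article when assigned.
p_comply = 1 / (1 + np.exp(-baseline))
read_article = assigned & (rng.random(n) < p_comply)

# True data-generating process: reading the article has ZERO effect.
wellbeing = baseline + rng.normal(size=n)

# Intention-to-treat comparison (keep everyone as randomized): unbiased, ~0.
itt = wellbeing[assigned].mean() - wellbeing[~assigned].mean()

# Comparison after excluding assigned non-readers: conditions on a
# post-treatment variable and is biased upward.
pp = wellbeing[read_article].mean() - wellbeing[~assigned].mean()

print(f"Intention-to-treat difference: {itt:+.2f}")
print(f"After excluding non-readers:   {pp:+.2f}")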
So, sometimes the experimentalists like get some inkling and then they find out, oh my God, this has huge implications for mediation analysis.
But they lack the terminology because they've never been trained in like systematic causal inference.
So, they notice that something is off.
They can't quite put their finger on it.
And often they will just not notice actually.
So, I often have experimentalists who are at some point like, oh, but wait, so what does it mean if I exclude people who did not like pass the manipulation check?
Where we assume, oh, the manipulation didn't work, but maybe these people are also different and so on.
And so, you usually like people have some ideas.
I mean, experimentalists are smart people.
They work out stuff.
It would be so much easier for them to work it all out if we had ever taught them about causal inference in the first place.
Now, this is probably an obvious question, but clearly there are other sources of evidence for causality other than randomized controlled trials or experiments.
They may not be, you know, no single piece of evidence will probably be the smoking gun.
It won't be definitive.
But, you know, what are some other ways in which a researcher could gather evidence for causality?
So, yeah, that will strongly depend on the specific topics.
But, for example, since you already brought up smoking, so when it comes to smoking, right, by now we have a good mechanistic understanding on what the cigarettes contain, what that stuff can do in your body, how it accumulates, and so on.
So, this actually corresponds to a thing that is formalized in causal inference.
So, you can identify causal effects if you know all the mechanisms.
It's called front-door identification.
If you want to do it like fully worked out on paper, the assumptions are very strong and so on.
But in everyday life, we take a much weaker form of it.
And this is that, okay, we know that this thing affects this other thing, and this is part of a mechanism that leads to this and so on.
And I mean, for a lot of medication, it is essentially like, okay, it seems kind of plausible that it could work here because we know how the body works.
And I think one great example for that, so there is a beauty YouTuber, like Lab Muffin Beauty.
And she has like full videos about like how industry influences.
And then there's p-hacking and so on.
But also why she still believes in certain things like retinols for anti-aging.
And it's just like, no, our understanding is very good of what these things do in the skin and so on.
So it makes sense to assume that even if there are no randomized control studies for economic reasons, because there's no incentive to do that for like the general thing that works, because then other companies can just snatch it up and so on.
So mechanisms are, of course, like one huge part there.
And then it might even just be associations.
And that is the thing that throws people off, but it's just like associations that are hard to explain away otherwise.
So if there is something that you cannot explain in your current worldview, but this one like theory fully accounts for it, well, then you're like in the realm of, okay, so let's do hypothesis testing.
And then this is actually strong evidence.
And this works as well.
And I don't think it is actually distinct from causal inference because the important part is like you can't explain the association otherwise, which is the same as saying, okay, we don't have any like alternative confounders and so on that can explain this.
So my take is always a bit like this: causal inference, I think, if you like go really deep into the formalization and the different ways to identify the effects and so on, it is kind of all the same.
Like the things that intuitively work, they also work formally.
If you spell it out, why you think that works, right?
There are assumptions, like we can't explain this in any other way and so on.
And clearly people do have like an intuitive grasp on how these things work in certain circumstances.
And then the problem is that it's always fallible.
So you can always come up with an example where you're like, oh, everybody thought like the COVID vaccine was going to do this and that, but instead it did this.
Isn't this counterintuitive?
Assumptions and Uncertainty 00:10:20
And you can only find that out through randomized control trials.
And that is the part where causal inference is generally, like, fallible.
Like things go wrong because you put in assumptions and assumptions can always be wrong and so on.
So the randomization still plays a very important role.
So I know that some causal inference people go all the way to the other side and they're like, oh no, randomized control studies are horrible.
They lack external validity and so on.
And to me, it's more like, no, it is like it is close to magic.
It is like one very important part of the toolkit that does a lot of heavy lifting.
Unfortunately, it can never do all of the heavy lifting.
And you kind of need the whole toolkit to really understand how you can also like work across scientific fields and so on.
So I think if you're a psychologist, it's just like many of the causes we're interested in.
We can't manipulate them directly.
This would be different if we were interested in the effects of certain pills.
And that is just the nature of the field.
So it's just going to make the causal inference a bit harder.
Yeah.
I mean, it's a topic dear to my heart because I work in addiction.
And you cannot give people addictions.
It's not possible.
Can't be done.
Not yet.
You can.
You can give them addictions.
You're just not allowed to.
Yeah.
Companies do it all the time.
Trust me.
I've asked my ethics board.
They said I can't.
Trust me.
And what you've also spoken about at length and written about is that these heuristics we have around causality and just not taking, I guess, a more nuanced approach means that you have a weird sort of schizophrenic approach to many papers where researchers will write the entire motivation for the paper and the theoretical treatment and introduction in causal thinking,
interpret their results largely implicitly causally, and then provide a disclaimer that, oh, by the way, we can't infer this.
Someday somebody should do an experiment.
So, yeah, like, what is the way to fix this?
I mean, what is the way to not erroneously infer causality when we shouldn't?
On the other hand, be honest about what it is that is motivating us to do the research and I guess, yeah, hit the right tone.
So I think the important question here is there will be a lot of uncertainty when you do causal inference with observational data in particular, but even in experimental setups.
And the question is where you put that uncertainty, like how do you cope with it?
And so I think the current paradigm, and it's something I do believe that has evolved to cope with contradictory requirements.
You need to tell a nice causal story.
You must never claim causality based on observational data.
So you write that weird paper that kind of tries to have it both ways with always like, I think the focus here is always like on plausible deniability.
So you get away with the causal storytelling.
When you're challenged, you can always say, but I just said it predicts it.
And I added that paragraph that everybody adds.
So clearly I didn't do anything wrong.
And so I think this is like the adaptive response to the contradictory requirements.
And it puts all the uncertainty in the research question.
Like, what the heck are you even doing here?
Oh, I'm interested in whether this predicts that longitudinally.
I'm not quite sure.
Maybe I'm also interested in causality.
And I think the way to move forward here is to take all that uncertainty and push it into the conclusions.
So it would be like, I am interested in this causal effect.
And this is like easy to justify because usually the causal effect is the interesting thing that matters for science.
And then you say, okay, so how can I get at this causal effect?
And this will usually involve, well, it will always involve assumptions, in any context.
And sometimes these assumptions might be quite lightweight in a randomized experiment.
It might be, okay, the manipulation actually like works the way we think it works and does not affect anything else.
Maybe we're even just interested in the pill that we can randomize as is and so on.
But often it will involve more assumptions.
And even in a randomized study, this will be assumptions that your manipulation actually does what it does, which you cannot always empirically confirm.
In an observational study, it will usually be assumptions of the type of, okay, we did not forget any unobserved confounders here.
And this is a super strong assumption.
So I tell people to think of it more in degrees.
Like we didn't forget any huge confounders here.
I think this is the important part.
We did not miss anything that's obvious, that everybody in the literature is talking about.
We also did not accidentally control for the wrong variables because that can also make things go really wrong and so on.
And these are the assumptions.
And under these assumptions, you have conclusions.
And you can draw these conclusions, but then the question is, should you trust them?
And you should trust them depending on the degree to which you trust your assumptions.
So now the uncertainty is not, oh, is this a correlation or does this reflect a causal effect?
But rather, it's so this could be an estimate of the causal effect, but its credibility hinges on these assumptions and there will be uncertainty about whether these are true or not.
And people might even disagree, right?
And when I edit papers or review them, I try to be like maximally permissive.
So I'm like, I'm fine with this like heroic mediation analysis, as long as you tell me the assumptions, right?
Like you're saying here that negative affect and depression are not confounded by any common causes.
It's okay, please spell that out, right?
And then just say, okay, this is the estimate of the indirect effect under the assumption that negative affect and depression that are conceptually overlapping do not share any common causes apart from this one variable in the study.
And then just spell that out, right?
And I think this way you are being honest about what you're trying to do.
You are also fully transparent of what needs to go into it.
And the reader can evaluate on their own whether they go with the assumptions or not.
And you can actually even pinpoint where people disagree.
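To make that concrete, here is a small Python sketch of the kind of assumption at stake (hypothetical variable names and numbers, not from any paper Julia has edited): the true indirect effect is zero, but an unmeasured common cause of the mediator and the outcome makes the standard product-of-coefficients estimate look substantial.

import numpy as np

rng = np.random.default_rng(7)
n = 50_000

# Randomized treatment X, so the X -> Y total effect itself is clean.
x = (rng.random(n) < 0.5).astype(float)

# Unmeasured common cause U of the mediator and the outcome.
u = rng.normal(size=n)

# Mediator: affected by the treatment AND by U.
m = 0.5 * x + 1.0 * u + rng.normal(size=n)

# Outcome: affected by X directly and by U, but NOT by the mediator,
# so the true indirect effect through M is exactly zero.
y = 0.3 * x + 1.0 * u + rng.normal(size=n)

# Standard product-of-coefficients mediation analysis.
a_hat = np.polyfit(x, m, 1)[0]                        # a-path: M on X
design = np.column_stack([np.ones(n), m, x])
b_hat = np.linalg.lstsq(design, y, rcond=None)[0][1]  # b-path: Y on M and X

print(f"Estimated indirect effect (a*b): {a_hat * b_hat:+.3f}")
print("True indirect effect:            +0.000")
# The entire 'effect' comes from the unmeasured confounder U, i.e. from
# violating exactly the assumption ('no common causes of mediator and
# outcome') that the paper would need to state explicitly.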
Now, the huge issue with that.
So I try to write my papers like that.
I try to encourage others to write papers like that, which involves being a bit more lenient when a paper does spell out assumptions.
That sounds ridiculous, but everybody's making them anyway.
So let's not punish people for talking about it.
The problem is, and that's in a paper by Bareinboim and Pearl, that assumptions are self-defeating in their honesty.
If you say, oh, I'm assuming this and that, then it's always like somebody will be like, oh, no, but that's not plausible, right?
And I think it is like just how we train people that you should always find like the weak spot and then go at it.
And so if people offer up their weak spots, their assumptions, then reviewers will go at them.
Right.
And this is a problem.
I think it's a problem of training.
It's a problem of culture that people don't notice the assumptions when they are hidden, but notice when they are put out in the open.
But I still kind of firmly believe, okay, this is the way forward.
The assumptions need to be out in the open.
And I do think it's possible to get there.
And I'm mainly saying this also because economics has that, I mean, it has its own problems, but it does have that very strong norm that you focus on the plausibility of the causal conclusions and then you spell out your identification strategy, which essentially are the assumptions that you need for the causal inferences.
And that works because if you don't do that, the reviewers will just pounce at you, right?
And I mean, economists like doing that and call each other out and so on.
And so if the reviewers are able to spot the assumptions when they are still hidden, then they can tell people to spell them out.
And they won't just punish people for being transparent because they can tell when something is being hidden.
But that is obviously like a huge training task that we are facing right now.
I think we are making some progress.
I sometimes see other reviewers like pointing out these points or saying just things like, oh, you know, there's this paper that says that statistical control requires causal justification.
Could you spell out your justification here and so on?
So I think this is slowly changing.
It's clearly not changing for researchers who have played the game for a very long time and are very used to not handling things that way.
But this is also something where I might be mildly optimistic that we could get there eventually.
I think I can give you a note of optimism, Julia.
So like one thing is for any of our listeners who aren't familiar with that academic style of writing, this is a common thing where people will say X causes Y or they're claiming a causal relationship and then the reviewer requests them to remove the causal language.
So they go through and replace it with X is associated with Y, but the rest of the paper stays exactly the same.
So it's like they've just done find and replace for any time causal language is there.
But the whole logic of the paper implies causality.
So you might think that's strange.
Like wouldn't that not actually do anything?
And the answer is yes, but it is a common request.
Hey, Chris, I have to interrupt you because I think, Julia, you might find this funny.
Because I read on your blog that you're interested in the so-called Granger causality and longitudinal models, cross-lagged models generally.
And for everyone listening, it's basically you've got a couple of time series.
And if X happens before Y regularly, then you can say you've got a certain prediction in time.
It doesn't really provide strong evidence for causality because there's lots of reasons for that temporal relationship.
I actually published a paper last year, which was trying to improve on that a little bit.
And then I saw the same concept in your blog just today, which is that as well as doing that, you can check for inverse Granger causality.
Yeah.
Not just X predicting Y in the future, but if Y also predicts X in the future, then it's kind of a wash, right?
There's no evidence that X precedes Y more than the converse, right?
So you could improve on it a little bit to give temporal asymmetry.
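A minimal Python sketch of that both-directions check (a generic cross-lagged comparison on simulated data, not the method from the actual paper): regress each series on the other's lagged values and see whether the forward relationship is clearly stronger than the reverse one.

import numpy as np

rng = np.random.default_rng(1)
t = 500

# Simulate two time series where X genuinely drives Y at lag 1.
x = rng.normal(size=t)
y = np.zeros(t)
for i in range(1, t):
    y[i] = 0.6 * x[i - 1] + rng.normal()

def lagged_slope(cause, effect):
    # Slope of effect[t] on cause[t-1], controlling for effect[t-1].
    design = np.column_stack([np.ones(t - 1), cause[:-1], effect[:-1]])
    coef, *_ = np.linalg.lstsq(design, effect[1:], rcond=None)
    return coef[1]

forward = lagged_slope(x, y)   # does past X predict future Y?
reverse = lagged_slope(y, x)   # does past Y predict future X?

print(f"X[t-1] -> Y[t]: {forward:+.2f}")
print(f"Y[t-1] -> X[t]: {reverse:+.2f}")
# A clear asymmetry (forward much larger than reverse) is consistent with X
# temporally preceding Y; a symmetric pattern would be 'a wash'. Either way,
# this speaks to temporal precedence, not to causality: a common cause acting
# at different lags could produce the same asymmetry.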
So anyway, this was a core concept of the paper.
It was a methods paper all about causal inference.
It had causal inference in the title.
Temporal Predictions vs. Causality 00:16:19
And the reviewer made me do a find and replace to remove all of the things where I was talking about causality.
I had to remove the word.
I literally had to remove the word causal in certain spots and replace it with directed prediction or some stupid phrase.
Directed association.
It only goes one way, not the other.
I thought it was funny.
That's pretty good.
But I'm still going to attempt to give a silver lining because we listen to like genuinely terrible people on the internet and often people that have some form of academic background, like people like Andrew Huberman or there's a psychiatrist, Dr. K, and so on.
So these are people that invoke, or Jordan Peterson, right, you know, in that mode.
And these are people that invoke academic terminology and academic expertise a lot.
And oftentimes what Matt and I are doing on the podcast is literally just trying to work out what they are claiming, like what they are saying caused this to happen.
And they live in the realm of dream associations.
It's kind of like a Jungian waltz where they will throw out connections between multiple concepts and they're very clearly and sometimes very explicitly describing causal relationships or, you know, saying that this is caused usually by the woke mind virus or whatever the case might be.
But the thing for me is that when I was listening to your talks in preparation for the interview, that you're talking to academics who all agree we want to get it right.
And we know we're doing things wrong and there's bad practices and we got to reform it and so on.
But it made me think that like in the wider world that's out there, there's people making terrible, terrible causal inferences constantly.
And they don't care at all about the quality of evidence.
It's all kind of vibes based.
And in that respect, I feel like sometimes academics might be too negative about themselves, because all the stuff that you're talking about now is a legitimate concern, but it's just so much better than what passes for causal inference, you know, in the world.
And in those cases, those people often have like much bigger influence and influence on grant getting.
And, you know, with the US in the current administration, they are actually selecting research priorities based on RFK Jr.'s causal inference.
So this is just to say I'm not giving academics a pass, but I've just said it could be so much worse.
To be honest, that is a bit of a, I mean, first of all, I'm glad you did your homework and listened to some talks of mine.
I mean, it is quite a low bar.
Yeah, because I would, I guess if I were like an outsider, I would presume that psychologists had a better training in causal inference given all the causal claims they make.
And sometimes people will be like, they will just presume that psychologists had good reasons to believe something.
And it's also like people in other fields will be like, economists will sometimes be like, yeah, how do you psychologists actually know that?
Because these variables, they are so endogenous. And we say, oh, no, it's just a correlation.
And they're like, what?
Like, how does it pass?
It's different norms and so on.
But no, yeah, of course.
And I mean, in academic research, there are also people who are very motivated to support certain claims, right?
And that strong prior will affect also how they employ causal inference.
And so initially, I used to make the mistake of looking at who cites my primer to causal graphs.
And there were a lot of people who just wanted to make the claim that something affects something.
And so they will just say, oh, you should not control for these variables because they are all mediators of the effect.
And then they are citing like Rohrer 2017.
I'm like, oh, no, just stop looking at people citing me.
No, but this is bound to happen.
So when people have strong motivations to find out either way, they are going to use all their smarts in a targeted manner to come out at one way or the other.
Now, the thing is, for the general public, to some extent, I guess I do have some hope in that I think actually many people are interested in how to actually draw causal inferences.
And to think of one example, so something that I really like routinely notice is like people notice they have some digestive issue and they're like, okay, what is causing that?
And this is something where like causal inference can go so wrong and people will just assume they can't have this and they will stop eating that for 50 years or something.
When it wouldn't have been necessary.
And actually, I think there would be space to provide those people with the right toolset.
And there are already nerds that go down that rabbit hole and are like, oh, I do an elimination diet where I remove everything and then just eat plain rice for some time to establish a baseline and then add ingredients one by one.
And this is essentially like doing like trying to do controlled causal inference.
And I mean, I think the most extreme nerds even like try to randomize themselves, right?
Which is kind of hard to blind yourself, but it is possible.
And so I think people do like people are naturally motivated to find out about cause and effect because it is really relevant, right?
Like for your life to figure out how you can like affect your outcomes and so on.
So I think there is like interest in that.
But then if the scientists are already so bad at it, I guess we shouldn't expect people to figure out how to do proper causal inference in their own lives.
And I think one area where that is very obvious is people with infants and babies and toddlers, which is like the noisiest situation you could imagine because kids are just very noisy in the sense of like kind of unpredictable and so many variables affecting them and so on.
And people come up with the most bizarre things, right?
People will be like, oh no, I'm breastfeeding.
Can't eat gummy bears because my kid will like scream all night long, and so on, and I think there there would be like a lot of space for like public education, to um, teach people so that they don't mislead themselves as well, because I think people generally crave accurate information about how the world works and we are just like not teaching them how to do that.
I'm going to agree, but I'm also going to say that, like, my experience online, looking at the people who cultivate the huge audiences, is that the message that academics have is: look, we're bad at it, but we're trying, okay, and we know there are ways to get it better, and there is good research out there. And you're right about people having a hunger to work out, you know, hear about science or hear about how to make inferences right. But the problem is that the people you're competing with for attention are telling them that the approach which feels good is actually the correct way to do it, that all the things that you're saying are wrong, and also that the people who will tell you that's wrong, they're liars, or they're kind of elitists in the Ivory Tower.
So I'm just thinking, Matt, about the case.
We were listening to an influencer who has got like a genuine health thing.
They've had sudden hearing loss and they want to work out how to fix it, and so they're doing multiple treatments, experimental treatments, at the same time, and they released a video to their audience explaining, you know, all the treatments they're going through.
They talked about possible links to COVID vaccines.
They weren't saying it outright, they were just talking about how there might be an association, they don't know.
But they then were like quite confident that whatever they do and if it works, they'll feed it back to their audience and this will, you know, help people that are in the same situation.
So like, for tons of reasons, that's a terrible thing, taking multiple experimental treatments, and then if you return to baseline, you know, where most people with sudden hearing loss do get their hearing back, you will infer that it's because of the things you did.
But in the comment section, and I know people caution around that, there's so many people that are like, thank you so much, you know, you're doing the real kind of investigations that we need.
This is what scientists won't tell you.
And then when I see like scientists talk about it, they're kind of like, yeah, look at the crap the journals are pumping out.
And it's true, I have the same feeling, but I'm kind of like.
But their message is, we are empowering our audience to find all these things that like scientists are lying to you about.
And then the scientists tell them scientists are pretty shit at their job and they're not even good at like extruding the variables.
So like I'm just wondering if this is why we are sometimes accused of being establishment shills, because we're saying, but it's better than what they're doing.
Um, yeah, yeah, I see the dilemma. And I also see, like, you're in a different spot, right?
Because you are paying so much attention to the worst going on.
I try to be like more focused on the positive.
And so, for example, I have a side thing that I'm actually doing substantive research about like birth order effects.
And I quite often talk to journalists because this topic is just like catnip.
Like everybody, oh, like what does it mean to be a bigger sister and so on.
And so, for example, in my experience, and that might be, there's probably some selection bias going on there, but the journalists I usually talk to are like super interested in getting it right.
They're paying a lot of attention.
If the paper is open access, they will even read it.
I even had people, like, challenge me about the methodology of like one classroom experiment we did and so on.
And so my focus is more on the, oh, there are so many people who actually do want to understand things and get it right.
But then you are focusing on the gurus, right?
And that is kind of like conditioning on the worst of it.
But I think, so I think there are multiple issues here.
So one thing is if it's all just storytelling, then you just always do better when you're not constrained by reality.
Right.
Because you can craft really nice narratives about, I don't know, the evils of modernity and so on, if you're not by any means constrained by reality.
And I think if this collides with people who are desperate, which I think is also like many people with health issues and so on, who follow these people, then they might just want to buy the hope, right?
I mean, this is actually more your topic than my topic, right?
And this is like an unfortunate combination.
I don't know whether this is like, I have no idea what's the average dynamic of how these processes play out.
I don't even know like what's the average like attempt of a non-scientist to find out something about something, right?
Like I don't know what would be representative, but there's definitely like huge heterogeneity in there.
I can, for example, understand that the public health people, I think in particular, are often quite like insistent on like you should tell people to trust the science.
And that I think that might have positive effects to just be like, trust the science.
And there was actually a huge issue also initially with the open science movement where people were complaining, right?
Like psychologists are like laundering their dirty clothing in public and everybody will be like, oh, we can't trust science anymore.
And then they will stop getting vaccinated.
And I really hope this didn't happen because I really hope people are capable of mentally separating like academic psychology, which is very different from medical research and so on.
But no, it's always a real risk.
Like if you criticize science and push scientists to do better, that might be perceived by the public as, oh, they don't have their house in order.
And I wouldn't disagree with that.
The problem is that there's like a lot of nuance there, right?
And there are things that you should believe and ought to believe for very good reasons and so on.
And other things that you shouldn't.
And we often have this with our secretary who's super interested in all of psychology.
And sometimes she will be like, oh, man, that all sounds really horrible.
Like, I don't know what to believe anymore.
And then I'm trying to provide her some like heuristics.
Like, you shouldn't believe this type of psychological research without asking me first, because then I look at the study for you.
But then there are other things where I'm like, I'm not going to research that because I trust the mechanisms in that field and I trust all the regulatory processes and so on.
So I will just take any vaccine I can get actually, right?
And I do see how it's hard to get that nuance right, in particular in a polarized environment, right?
Where they are either like, oh, yeah, science, that's the best thing in the world, or like just like, oh, science is horrible.
Those are all shills and so on.
So it's hard.
I'm not that negative about it, but this might also be biased because I'm not in the US.
The situation in the US might feel different.
I'm not paying much attention to the online gurus, although I sometimes enjoy listening to your episodes just to get some impression of the amazing human variety that is out there.
But I'm just saying, Julia, you could make a lot of money if you just changed your tone a little bit.
You would be like the greatest weapon of somebody that could legitimately criticize.
But I completely agree that it doesn't mean that therefore you have to present an overly rosy picture.
You shouldn't talk about the problems.
You shouldn't argue for reforms.
I don't think that's the solution.
And I think that generally all of the efforts that people make and the criticisms, the very public criticisms, are necessary and important.
It's just I kind of have sympathy in a way for the people in the middle.
I don't always think they're just unwilling receivers of this because I think it's flattering to be told that you, despite doing no research ever or spending a day learning about methodological things, you know, can completely criticize entire fields by just your intuitions.
Like, I get that because I have the same drives, but I do think that it is hard to explain to people that sometimes, like you said with your secretary, if you want to explain to someone how to identify good and bad research, there's heuristics, but in general, it actually would take a little bit of time to learn how to separate this is a good,
very well-conducted study with reasonable inferences versus this is a low-quality, like quite hyperbolic study, right?
Because it can look in structure almost identical if you don't have the training to distinguish.
That is absolutely correct.
And I mean, even scientists sometimes seem to be lacking the training to do this reliably.
And I think, I guess, my hope would be, right?
I mean, I am training psychologists, and my hope would be that we can like ramp up the norms and people's ability to discern, right?
That the field as a whole becomes more credible.
So that it is more like, okay, now if the experts actually agree on that, you can believe it, because we do have this mechanism, we do have all these controls.
And that's a bit aligned to the take of Simine Vazire, who's always like, okay, we want to be credible, but the credibility needs to be earned.
And then there is always the issue, like, how do you know whether a field is credible to begin with or not?
Descriptive Differences in Birth Order Research 00:05:06
And so on.
And these are legit concerns.
I guess I'm just less prone to believe that we will all be out-competed by the online gurus, which I don't know.
I don't know.
From my perspective, they are like a fascinating, like niche phenomenon.
But also, my social sample might be biased because I don't befriend people.
I hope you're right.
I wish you were right.
I hope that we're just over-negatively biased by our sample.
But yeah, I fear.
Hey, Julia, just to get back to your birth order paper, just to double check there, you did analyze a lot of data, and I think you found essentially no real effects from birth order.
Is that correct?
Yeah, so I can give you like the short summary of the birth order literature.
So it's a mess.
So first of all, so this is definitely, I mean, this is part of the reason why I think the open science stuff was so relevant to me because the old studies are just a mess from many, many different angles.
But then there is like, there is like a legit research interest in whether there are systematic birth order differences.
And then there's actually a different question whether these are effects of birth order.
But I think the descriptive differences are already interesting enough.
The one thing that we do find that is fairly reliable in Western samples, I should say, is that firstborns tend to be a bit smarter.
So they have like, it's maybe an average difference of two to three IQ points.
So what I always say when I talk to journalists.
So what you're saying is all else being equal, if you knew nothing else about us, you would assume that I had an IQ two or three points greater than Chris.
That's... actually, I'm the second.
And my brother is a physicist and I'm a lawyer.
So yeah.
We're not making erroneous causal inferences here, Julia.
This is science.
We're just doing science.
The thing is, what I always tell people, so first of all, if you've ever talked to somebody, you will probably have more information than the birth order position provides.
And then this is like a very small average difference.
So if I tested anybody like twice on consecutive days, the measurement error alone would exceed the average difference that you get over thousands of sibling pairs.
And then it's not deterministic.
I always have to say that because you're spoiling Matt's fun.
Because I'm a firstborn, right?
And so when I published like the first paper and that got picked up a lot by the media, my sister's back-then boyfriend was like, so your sister scientifically demonstrated that she's smarter than you.
And then she sent out a press release to all the major German news outlets.
And so in my defense, even in the paper, we just check, so we have data from Germany and we just look at dyads, so people who have like one sibling, and then where we actually have data for both siblings.
And if there was no association, it would be like a coin flip who's smarter, right?
It would be 50-50.
But it's not 50-50.
In our data, it's more like 60 to 40.
But that still means that for like 40% of the sibling pairs, the later born is the smarter one.
So this is what I always aim at, also in defense of my little sister.
But just in general, like this isn't a huge story.
And this is just the, I think, the most reliable effect that we have.
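For readers who want to see how a small average difference turns into this kind of "who is smarter in a given pair" split, here is a minimal sketch of that arithmetic. The mean advantage, the IQ standard deviation, and the sibling correlation below are assumed, illustrative values rather than the figures from Rohrer's data, so it lands around 55 to 57 percent rather than exactly 60-40.

```python
from scipy.stats import norm

# Illustrative sketch (not Rohrer et al.'s analysis): translate a small average
# firstborn advantage in IQ points into the probability that, in a random
# sibling pair, the firstborn scores higher than the later-born sibling.
mean_advantage = 2.5   # assumed average firstborn advantage in IQ points
sd = 15.0              # conventional IQ standard deviation
rho = 0.5              # assumed within-pair (sibling) IQ correlation

# Standard deviation of the within-pair score difference under these assumptions
sd_diff = (2 * sd**2 * (1 - rho)) ** 0.5

# P(firstborn scores higher), assuming the difference is normally distributed
p_firstborn_higher = norm.cdf(mean_advantage / sd_diff)
print(f"P(firstborn scores higher) ≈ {p_firstborn_higher:.2f}")  # about 0.55-0.57
```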
And then we usually don't find much going on for personality.
In particular, once you take into account that firstborns are by definition older.
So if you compare somebody who's a bit older to somebody who's a bit younger, you will find the effects of personality maturation, that personality, at least in self-reports, gets on average a bit more mature.
So people will appear more conscientious and so on.
And so this is like another story for my sister, where one time she was like sitting in my living room and just looking around.
She was like, do you think like in three years I will also have like a dining table and plants in my room?
You just need three more years and then we can see, right?
It would be unfair to compare it now.
Like you have to compare us at the same ages.
So there is not much going on.
And honestly, so this was the topic I started working on.
And that was just by coincidence.
So I applied to become a student research assistant and the professor who's still my current boss was like, yeah, I had that one thing I always wanted to look into, birth order.
And I'm like, yeah, sure, why not?
And I found nothing, except for the intelligence effect, we published that.
So this was my bachelor's thesis, but we later published it in PNAS.
And then people were like, oh, wow, you published in PNAS.
You need to do this for the rest of your life.
But we didn't find anything, right?
And so this is part of the reason why I was like, okay, I need to do something else, like substantively, because I can't be doing this for the rest of my life if I don't believe that there's actually much of interest or even of practical relevance going on there.
Family Narratives Matter 00:03:33
But in terms of people preferring a nice story: this is something which is true in the guru sphere and in the public discourse.
And it's also a problem in academia as well.
We like a sexy story.
It's easy to publish and it's tempting to bend things towards that direction.
I mean, did you find, for instance, that when the journalists were interviewing you about the sexy birth order topic, a sort of disappointment came over their faces when you told them, actually, there's not much to it?
There's nothing much happening with personality.
So actually, I think, and this might again be biased by which journalists are willing to report about these things, but they were really into the, oh, there's actually nothing going on.
I think it's just like, maybe because it is controversial to say that it doesn't matter systematically.
So I had the impression they were really into that.
And that is maybe also part why I think that the journalists are great.
And there was even, so you might know like, what's her name?
Catherine, like the princess of Wales.
And she had a, she had a third child.
Right.
And at that time, like, I got like journalists contacting me whether I could talk about what it means to be a thirdborn.
Right.
And I just gave interviews.
Yes, birth order doesn't seem to have any effects and so on.
And then later on, when that boy started school, I got contacted again.
So six years later and gave an interview.
And it was really like the questions were like, yeah, what does it mean to be like a thirdborn?
I was like, I mean, usually we don't find any effects, but in a royal family, it will, of course, probably be different, right?
Because the firstborns play a special role.
And then, of course, they are like in the media and people are speculating and whatever.
So surely this is going to be different in the individual case, but I can't make any statements about that, right?
I need thousands of people to say anything about birth order effects because if there's anything, it's tiny.
I can't say anything about that.
And they were really like lapping it up and they printed that in a newspaper.
And so I think it is also like, it is also like a cool like narrative that people enjoy being a bit like, oh, we are debunking things or even like there are interesting aspects to it, right?
Because there will be like you will have a different role in childhood if you're the eldest daughter, right?
So this will make a difference for your family relationships.
And that might even, like... so I also get interviewed when Christmas season approaches, because at Christmas, people meet their families.
And then it's like, oh, it's just like 20 years ago, right?
And the oldest daughter will like do all the cleaning and so on.
And then I can be like, yeah, no, that is probably like an accurate observation in many families.
We just don't see that it affects personality, which might speak to the fact that we need to distinguish between personality and how people act in certain contexts, which might as well be like hugely shaped by their birth order.
It might strongly depend on the family.
And I think people do also think of that as a narrative, right?
It's just like more nuanced than, oh, because you're a firstborn, you're smarter now, which is something I think the Lad Bible said about my study.
So there are different, I think, narratives you can eke out there.
And I don't think the simplest, sexiest one is the one that people like best, because I think people also like some degree of sophistication.
And again, I might be biased because I'm just like being contacted by journalists who think that the finding that there's mostly nothing going on is newsworthy.
But in general, I have the feeling that this has an appeal to people.
Incentives and Pre-Registration 00:12:01
And I was asked to write a book about it.
And I was like, I'm not the right person to write a book about siblings because I can only tell you that where they don't matter, right?
But people are interested in that as well.
Yeah, and I think like also, you know, there's been plenty of coverage of like the replication crisis or like the Stanford prison experiment re-evaluations and so on.
So there is an interest in those kind of counter narratives, maybe perhaps a very strong interest in some quarters.
But Julia, I had a question that I was hoping you could help me resolve, because I can't put the two things together in my head about it.
So in the open science movement, which I've been generally like very supportive and in favor of, even with the various little controversies that have come about within the open science movement arguing with itself, did you pre-register your own study well enough and so on.
But I have seen the papers, you know, where people, and I've done it myself with students, where you get them to compare the pre-registrations to the actual finished paper and you lo and behold, there are usually deviations that aren't mentioned in the paper, right?
Depending on the person, or that things that are in the open data set are completely uninterpretable because nothing's been labeled and so on, right?
It's just there and you get the badge.
And the thing is that I know there's been legitimate complaints about this because this is an issue, right?
This is kind of people focusing on the badge or receiving the accolade without doing the work. Open washing.
Oh, is that what it's called?
I hadn't heard that.
But at the same time, I see a lot of papers.
The famous one, which is kind of pre any of this, is the one looking at null effects in NHLBI clinical trials, right?
This is the one where the requirement to register your trial was actually introduced in 2000, for, I think it was cardiovascular trials or something like that.
And then you see before it, there's all successful results.
And then after, it's basically all null except for like one or two studies.
And similarly, your blog colleague and Daniël Lakens were looking at registered reports versus standard reports and so on.
And the pattern is always the same: the effect sizes decrease, the amount of null results shoots up.
But basically, researchers become much better.
They're much more similar to me in being able to collect null results at an impressive rate.
But if it were the case that everybody, or like a large amount of people, are just kind of going through the motions to get the badge, how can those two things, you know, kind of come together, where we're saying people are doing that imperfectly and not well, but yet it clearly works when it happens?
Yeah, so I think I can give you a partial answer already.
So, I mean, the badge and the pre-registration thing, this is usually just pre-registered studies.
So you have a pre-registration, you collect your data, you write it up, you publish it.
What Anne Scheel and Daniël Lakens looked at were specifically registered reports.
Registered reports.
And I think these are a lot more potent.
So this is where you write up your plan and then the plan gets reviewed, and you get issued an in-principle acceptance once the reviewers are happy with your plan, and then you implement that.
And then you have zero incentive to spin anything.
Whereas if you pre-register, you still have all the incentives in the world to spin your story to make a nice narrative.
And then because we know that pre-registrations aren't routinely checked by reviewers and so on, it's actually you can do the spin and then you even get a badge, right?
So this is the open washing aspect.
And this is where people are like, oh, everything is going to get gamed.
And I think this is true.
Like everybody is trying to game everything.
Like people are also trying to game rigorous causal inference now and so on.
So I think this is just something that happens.
It's also something where I think we know in principle how to react.
And this is actually checking the pre-registration during the review process, which is something that we do now at Psychological Science, where we also check whether the data and the code can actually reproduce the findings.
So when a paper gets accepted, it gets accepted pending the reproducibility check, right?
And if you made it to that point, like, Psychological Science is still a prestigious outlet, so it's like, okay, now this paper is accepted, but only if they can actually get the numbers out of the data.
So there's a lot of incentive to make this run as smoothly as possible.
It is still a lot of work, like it's a huge amount of effort.
People will be like oh, who's going to do that?
And so we have a huge team of STAR editors.
So that is statistics, transparency, and rigor.
But actually even that is not enough.
So we need like a volunteer network because it is so much work to check somebody else's results.
So it is like a huge investment.
I also think it actually should just be the norm, for just normative reasons: if you read a paper in a peer-reviewed journal, I think as a layperson, it would be justified for you to assume that somebody checked the numbers and did the math, and at the very least that if you take the code and the data, you get the results that are reported.
I think this is a low bar actually, right?
And still, when I tell people about Psych Science, they're like, oh, but who's supposed to do that?
Right, like it can't be the reviewers.
We can't expect that from reviewers.
But I'm like, why? Why actually not, right? Or, like, from journals?
Because it's like it's about quality control, but we don't even do that quality control.
We are just used to churning out papers, and this is something, I think, where it's a hard shift, but it's also something where I think we might finally catch up with public perception.
So there is a fun anecdote there about my boss, from when he had his very first paper, back in the day.
So he asked his advisor, how do I send them the data for the peer review process?
Is that like a floppy disk, right?
Like what format do I need?
And the advisor was like, no, no, no, you don't do that.
And he was like, what do you mean, I don't do that? How would they check my findings if they don't have my data?
And it was like a no-brainer for him, and I think it should be a no-brainer. This is how we should want to do science.
This is not how we are currently doing it.
And he was, I don't know, 30 years ahead of his time maybe, but I think we are getting there.
I know that some political science journals are doing that.
It is a lot of upfront investment.
I also see how the cost could just go down over time, because if you want to get into those journals, you will make sure that the code just runs and outputs the results in a manner that you can easily look at them.
And I mean, I did one of those reproducibility checks, and there were some R package issues, but when that was done, I clicked like one button and all the results were reproduced.
So it's absolutely possible to do that.
It is hard initially.
It's much easier now because you can use so many tools to make coding easier and so on, and if we turn that into the norm, then also checking the numbers gets easier.
You can automate parts of the process right.
There are also people using LLM tools now to like check whether the pre-registration and the published study match, and so on.
So there's so much we can gain there and make it smoother.
We just need to like make it clear that, okay, actually, we want this.
Like we want to be sure that published studies have numbers that can actually be reproduced from the data and so on.
It's not a high bar.
I think everybody would agree that this would be the ideal state.
So we just need to put in the effort and maybe also reward people who put in the effort and so on and just maybe try.
So this is what we're doing at Psychological Science.
Like we're trying all the radical stuff.
Let's see how it will go.
It is a lot of work, but I think it's a worthwhile attempt.
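As a rough illustration of what "click one button and all the results are reproduced" can look like in practice, here is a minimal sketch of a single-entry-point analysis script. The file names, column names, and the t-test are hypothetical placeholders, not Psychological Science's actual checking workflow; the point is only that one command regenerates every reported number from the shared data.

```python
"""Sketch of a 'one-button' reproducible analysis: running `python analysis.py`
rebuilds every reported number from the shared data file. All names and the
analysis itself are hypothetical placeholders for illustration."""
from pathlib import Path

import pandas as pd
from scipy.stats import ttest_ind

DATA = Path("data/raw_scores.csv")          # hypothetical shared data file
OUT = Path("results/reported_numbers.txt")  # where the reported stats are written


def main() -> None:
    df = pd.read_csv(DATA)                  # assumed columns: 'group', 'score'
    treated = df.loc[df["group"] == "treatment", "score"]
    control = df.loc[df["group"] == "control", "score"]
    t_stat, p_value = ttest_ind(treated, control)

    OUT.parent.mkdir(exist_ok=True)
    OUT.write_text(f"t = {t_stat:.2f}, p = {p_value:.3f}, n = {len(df)}\n")


if __name__ == "__main__":
    main()  # a checker should only need the data plus this one command
```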
But speaking about people, you know, the incentives and people gaming any system, you know, what you're describing sounds great, but it is all kind of pro bono, the volunteer hard work by yourself and others.
So, you know, I'm also thinking of the, you know, the way the bigger publishers work in terms of their monetization model and also the pressures on academics to publish, publish, publish, publish.
Or teach courses.
Some of us teach courses.
Some of us teach courses.
I don't know.
That's too big a question, I suppose.
But do you feel like it's rewarded?
Like, are you rewarded for doing this work?
Is it really done?
So, I mean, I'm a postdoc, so I'm probably not getting rewarded for it.
But I do think, so yes, I do think the incentives matter.
I also think that a lot of the incentives are social in nature.
And that also means they can shift over time, right?
As can expectations.
And so sometimes I think people have that like, oh, it's a race to the bottom.
Like everybody churns out as many papers as possible and that will get like the maximum reward and so on.
But that's clearly not what we are observing on average.
So I think there is like you have these people who just have like 200 papers in predatory journals, but they are not outperforming others for like the actually prestigious positions.
And there is like some people out-competing others by cutting corners.
So I just think it needs to be like shifted on multiple levels.
There needs to be soft pressure.
So I had a PhD mentor who was an economist.
And the economists, I think, are more aware of like reputational costs and how you can use these to enforce norms.
So there could be a thing.
We are very far away from that in psychology.
But I think in economics, if you publish too much, it looks suspicious.
And people will look at the papers and they're like, it's not even in a top five outlet.
What are you doing with your life there, right?
Like publishing two papers a year.
I like, Julia, that you're probably not aware of a fellow called Gary Stevenson from Gary's Economics, because our listeners will have heard recently like his presentation of the economic field.
And the economic field, as you said, you know, it has its own issues, but his presentation is not of a robust empirical science, but a pure voodoo science thing.
So when you say that, you know, I know that you're not endorsing economists as the model to hold up, but you're also not presenting the picture he does, which is that there is a single model that controls all economists and it is the single agent model, Matt, right?
That's the only model and you're not allowed to model inequality.
That's the main thing he says about, and that all economists are rich and that's why they refuse to report any results that say there's any inequality.
And I think this is like absolutely fair.
So I'm thinking less of it that we should, I think we should emulate like the best parts of what economists are doing, right?
As well as the economists should emulate some parts of psychology because they have huge issues as well.
But I'm thinking more in terms of, okay, there is a counterfactual science where actually having your name on more stuff can impose costs if those papers are bad.
Like it is possible to build like a community that operates more like that as opposed to the more papers, the better.
And I think like just seeing that there is variation between the fields means it's okay, you can find an equilibrium elsewhere where it's not just quantity.
Potential vs. Publication Prestige 00:07:46
Now, I do think we have like lots of issues also with the prestige outlets.
And I think they are now just cashing in on their names, right?
So all the kind of bad Nature and Science brand journals that really are publishing horrible research, and researchers pay a lot of money to get in there, and it's like, yeah, okay, you're cashing in on the reputation that has accumulated over 100 years with this usually tightly guarded outlet.
And I think that is an issue.
I kind of hope that this fixes itself by just like destroying the brand name over time.
It will take some time.
So it still works, right?
I do remember, so there was that one journal that started Nature Human Behavior.
And I mean, literally what they had initially was the nature brand name, right?
And it can only take off if people submit work that is potentially high impact and so on.
And it totally worked out.
And I was kind of observing that.
It's like, it's kind of curious, right?
And people were like, oh, it's a nature journal.
Yeah, but it doesn't have an impact factor yet.
But yeah, but maybe it's a nature journal.
Everybody will submit there, right?
And so on.
And so it still kind of works.
I'm really just hoping that it will collapse at some point where you like exploited your reputation to an extent that the reputation just fades.
And I know that, for example, Psychological Science had a very good reputation, but it was also the outlet for the sexy, cool findings.
And then that kind of gave it a bad name.
And then they came around and had like editors who were like strong reformers and so on.
And slowly rebuilding, it's still not back to where it was.
And now we have all these additional requirements if you want to get in there.
But I think these things do change over time.
Maybe I sometimes wish they changed faster.
Maybe I sometimes wish people would realize like to which extent journals are also like scamming them without providing added value apart from career progression.
Right.
So yeah, I think it's mixed.
I try not to judge people too much if they are just like trying to play along to maybe eke out a living, but I do judge people harshly who are at the point where they have a tenured position and still play along.
Right.
And then they're like, oh, but my PhD students... And I'm like, yeah, but you're still just hacking your way through.
So I cannot respect that.
But most importantly, I think for me, it's more like, okay, I wouldn't want to get into the position where I look back onto like an academic career.
And it was all kind of bogus.
I think that would be frustrating.
So I'm trying to like focus on the positive stuff.
I feel there are a lot of people in that position and they're quite protective and not sad about it.
But you know, I have something that I think is possibly a positive note towards the end, so as not to hold you hostage too much longer, Julia.
But, you know, Matt, I'm not usually the one offering like sunny takes, right?
But you were asking Julia about who's going to do all this work.
And I think that's a legitimate question.
I do think part of the answer is AIs.
But the other answer, and I've seen you give talks about this, Julia, talking about examples of it, that now we have a bunch of new journals that are very friendly to open science and kind of other improvements, registered reports and so on, and like open peer review.
And I'm always amazed at the amount of initiatives by academics where they are doing free labor.
They've like come up with a little thing where they're going to review papers for journals, you know, and they'll give them a badge from their collaboration network, to outsource the work for journals.
And not just that, but the example you give, which I hadn't heard of recently, was people setting bounties on their papers to look for statistical errors.
And so far it's like four out of four, or at least it was at the time that you talked about it.
So not a huge incentive to do that.
But I thought that's the kind of impressive thing, because, like, I know I really do have an issue in that I'm looking at the worst parts of the internet and talking about it.
But it's so counter to the image of academics that's presented in the guru material we have, which is that they're all out just for themselves.
They're all constantly, you know, lying, producing low quality research, and they don't care if things are true.
And I see so much around the open science movement, around like reforms, that it's clear academics do care and they do unpaid labor and they set up blogs and they, you know, they go on podcasts without being paid.
And yeah, it just, it's so counter to the image of the avaricious, ideologically invested researchers that I think it's worth noting.
And you, Julia, you know, you're a self-deprecating academic, but you are somebody that really walks the walk in that regard.
So yeah, I think you are part of the solution, even if you can't say so yourself.
That is very kind.
No, but it's also my impression, like in particular in the early open science movement.
So like in the beginning of the Society for the Improvement of Psychological Science and so on, there is so many people who do care.
And this also includes senior academics who are essentially like blowing the whistle on their own work as well.
Right.
And so I do think there are a lot of people like interested in doing the right thing.
Of course, at the same time, people don't want to hurt themselves.
It's also very, I think, just natural, right?
And so I don't buy that, because if it were the case that everybody was just ruthlessly optimizing the metrics to like have a career... I mean, there are individuals that do that, right?
But I am a personality psychologist, so I always see like the range of behaviors.
And I think we need to make sure that we don't just like select the very worst cases.
But I think we are already not doing that because at some point people are just like, oh, that person is just a scam, right?
And everybody can see it.
And so I see a lot of initiatives that try to do something differently.
And it's a lot of just trying out stuff, right?
And sometimes I'm like, yeah, no, no, that's not going to work, right?
And then some other stuff works.
It's hard to predict, right?
Just because you brought up the bug bounty program.
So this is run by my co-blogger, Malte Elson.
So he got a huge grant and they are doing essentially reviews of published papers searching for errors.
And I can tell you by now, five reviews have been completed and there were five papers with errors.
So this is absolutely the norm.
And I think it's good that we are talking about it.
It's also something that the norms are shifting.
So I think now you can talk to academics and say, oh, actually, there are like errors in all papers.
And most would be, yeah, probably.
I mean, maybe not mine, but yeah, no, they are probably like issues that we are working on.
So overall, yeah, maybe I'm still like optimistic.
I see a lot of potential, a lot of people doing good stuff, a lot of problems also, like problems in how we like select for publications, select for people who get into positions of power and so on.
But overall, I think the last 14 years were a positive development.
I don't have any comparison because I started my career in 2011.
There's a positive association, it's not a causation.
It's just a relationship in the positive direction.
And by the way, Matt, you muted yourself.
I was going to propose just placing an incredibly small bounty on finding errors in my papers.
Like, I'll pay like $5, but no more.
Don't look that hard.
Just have a quick scan.
Positive Developments Amidst Challenges 00:06:18
No, I second all that.
I think it's a positive vibe.
Yeah, Julia, we didn't really have time to get into it.
But actually, I think for people listening, some of this stuff around causality, in particular, the directed acyclic graphs and just essentially making a visual diagram of what you think causes what, how things are related.
It is intuitively how most people think about everyday life and things that matter to them.
And I think it comes very naturally to us if we just apply some of these tools.
So we'll link people to some of your materials there, Julia.
And maybe one day we'll have a more focused chat about it.
Not just academics communicating with fellow academics.
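For listeners curious what "a visual diagram of what you think causes what" looks like once it is written down, here is a minimal sketch of a directed acyclic graph encoded in code. The variables and the use of networkx are illustrative assumptions, not anything taken from Rohrer's own materials.

```python
import networkx as nx

# A toy causal diagram (DAG) with hypothetical variables: family background is
# assumed to affect both the exposure (tutoring) and the outcome (test score),
# which makes it a confounder you would want to adjust for.
dag = nx.DiGraph()
dag.add_edges_from([
    ("family_background", "tutoring"),
    ("family_background", "test_score"),
    ("tutoring", "test_score"),
])

# 'Acyclic' is the A in DAG: no variable can be its own cause.
assert nx.is_directed_acyclic_graph(dag)

# In this simple graph, parents of the exposure are the candidate confounders.
print(sorted(dag.predecessors("tutoring")))  # ['family_background']
```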
I actually never thought of that as a suggestion that originates from me.
So I think this is just coming from the causal inference literature where you usually have like a well-defined intervention, right?
You have like control and treatment.
And then you look at the difference in the outcome and you're trying to make sense of the unit there and whether that is a lot or not.
And the outcome measure is just the outcome measure.
And I think this is a particular weakness of psychology that we always jump to the latent construct.
So instead of saying like, oh, being later born, right, means that on average your IQ is two to three IQ points smaller or something like that.
It would be like, oh, there's an association between birth order and intelligence.
And then I've even seen paper calculate correlation coefficients for that.
It's like, what is the correlation between like birth order and intelligence?
Like what does it even mean, right?
Like we're comparing people here.
And so this is just naturally like something that arises from the causal inference perspective, where you're thinking of even just like hypothetical intervention, right?
Like what if I gave you five points more grit?
What would that do to your educational attainment and so on?
And trying to tie it to more specific metrics.
And I think this, I mean, this comes partly from fields like public health and so on, where you do want to have a tangible outcome metric, right?
It's not just people are healthier, but it's like, oh, we are like saving that many lives or that many years and so on.
And this is important because it's also important for policy.
And I think it would be healthy for psychology to go into that direction.
Maybe also to sometimes realize, yeah, maybe this isn't that important, even if it's there.
I mean, it's already a big step to say that even if it's there, it's not important.
And I mean, this is also something I've seen in the open science movement that people, right, like you start caring about p-hacking, all these studies are p-hacked.
But then you also notice, but even if they weren't p-hacked, like, what would they be good for?
What would we learn from them?
Would they have any applicability, right?
And so this is part of the birth order research.
And I think one economist even asked me, I mean, what does it even mean?
Like, you cannot, like, surely you wouldn't adjust the number of children to tweak their birth order position to reap the benefits, right?
So, yeah, it's a good point.
And so I'm more moving into that direction, I think, that you do need to quantify effects that actually matter in some way.
And that might also be on a theoretical level, but even then you need to like consider, okay, how much does this account for in the outcome?
Is this important?
Is this practically irrelevant?
Can we do anything with it?
And I think that's a helpful angle.
I think a lot of open science people have developed that, right?
Because they see so many bad studies where they're like, even if it were true, it wouldn't be interesting, right?
But also from the causal inference angle, where you're like, well, it's impossible to identify that effect, but then why precisely are we interested in it?
And sometimes there are basic science reasons why we might be interested, but sometimes it's like, okay, we wouldn't be able to do anything with that knowledge because we cannot intervene on that thing anyway.
Then I think it's fine.
I mean, I always respect it if people really rabbit-hole and are like, no, but this is the thing I'm interested in.
I can respect that if you do it well.
And I think it's also like a process of realizing, okay, maybe birth order is not the most exciting topic in the world.
And then I did well-being research.
I was like, oh, that field is a bit of a mess.
And they do have measurement issues, right?
And like trying to find something where you feel like you are actually contributing something.
I know that this is also maybe a privilege of being a bit early in career, that you're not yet fully booked on that one thing you're going to do for the rest of your life because everybody wants you to do that.
But I've also seen like senior people, right?
Like switching fields or maybe turning more to meta-research, which is maybe something that happens, you know, at a certain point where you're like, okay, looking back on what I did...
Oh my God, what are we going to do now?
So, and I think these are all healthy developments.
I recommend being an applied statistician.
That way, you can change what you're working on every few years, and it really doesn't matter.
The topic, the topic is beside the point.
But honestly, I just think there's so many of the issues that you focus on, like measurement theory, you know, having a clear concept of what you're measuring versus the construct, the concept of what you hope you're measuring, having a clear model in mind, not just throwing statistics at it, but thinking about a model.
And by all means, a DAG, a graphical one is the best way to start there, I think.
Like these are the sorts of issues.
Like I consult with PhD students all the time.
People send them to me when they don't know what to do.
And the problems they're having all revolve around the sorts of stuff that you focus on.
So I think you should, yeah, you should definitely write that textbook that you were talking about.
Just doing your spare time.
It's on my to-do list.
It's on my to-do list.
And also to warn psychologists.
I went to a couple of conferences with cross-cultural psychologists.
And because of these issues, there were people giving talks like, what we need to do is extended fieldwork with a small community for multiple years and not try to generalize out.
It's like you're describing anthropology.
That's not the solution.
I know that field.
So there's dragons that way too.
But I'm just saying, be careful about looking for the solution in the wrong places.
There's space for ethnographic work, but it's not always the greatest science producer.
Thanks For Coming 00:00:33
But Julia, thanks so much for coming on and talking to us and listening to us waffle about gurus and so on as well.
It has genuinely been a pleasure.
And we'll link, like Matt said, out to your blog and your work and some of the talks that we reference.
But yeah, I do think for people listening, they often say, oh, you guys hear all this stuff.
What sort of stuff do you like?
Stuff like thanks, Julia.