Dec. 30, 2021 - Decoding the Gurus
58:34
*Patreon Preview* Decoding Academia #2: False Positive Psychology

The New Year is upon us and the gurologists thought they should contribute to the holiday cheer in the only way they can... releasing a rambling podcast about academic minutiae!

This is not a new guru episode; instead it is a preview of our Patreon bonus series 'Decoding Academia', in which we discuss research that has influenced us & we think is relevant for understanding the gurus (or in this case approaching research critically).

The paper in question is a classic social psychology paper that slightly pre-empted the Replication Crisis with very timely warnings about lax methodological standards. The paper is titled 'False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant' and is by Simmons, Nelson, & Simonsohn (2011). If you want to read the paper yourself it can be freely accessed here. It introduces the concept of 'Researcher Degrees of Freedom' and also statistically *proves* that listening to certain music can make you physically younger. How? Join us and find out!

Finally... just a quick note to say Happy New Year from Chris & Matt! We will be back early next year with our Robert Wright episode and many gurus thereafter.


Welcome, everybody.
Another episode of Decoding Academia, where Chris and I, at the moment anyway, are taking turns picking research articles that we enjoy and talking about them.
If you're watching this on YouTube, you'll see that Chris has a very nice stack of academic-y books behind him.
There's nothing behind me because I'm at my brother's house at the moment.
He only has one working USB port, which is being used for my microphone, so I cannot...
Use the keyboard to search for an image.
It's because the real knowledge is all up there.
You don't need any Zoom background.
There's no virtue signaling for you.
Chris, what have you got for us today?
You've got something...
I've got something more up to date after your ancient publication from the 1980s.
I'm taking us back just 10 years ago, a decade ago now, though, to 2011, and a paper...
That has proved extremely influential, cited thousands of times, called False Positive Psychology.
Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant, by Joseph Simmons, Leif Nelson, and Uri Simonsohn.
Now, Chris, we had a bit of confusion about which was the right article for me to read.
If you recall, you recommended...
Did we?
We in inverted commas did, because you told me that there was an article, and you gave me the rough title, and you told me the author was Ioannidis.
Oh, yeah, right.
And so I read an article called Why Most Published Research Findings Are False, very similar to what you said, by John Ioannidis.
And in fact...
The article that you're recommending isn't by Ioannidis, it's by Simmons, Nelson, and Simonsohn.
So that would explain why I read the wrong article, Chris.
I just realized that as well.
Well, they're very similar.
They're very similar in tone.
So what was the title of his one?
Why most research findings are wrong, right?
Something like that.
Yeah.
Yeah.
So, I mean, the two are very interlinked and often get cited together.
But the cautionary tale that we will be unable to focus on, because he's not an author, is that Ioannidis has become something of a COVID contrarian, or at least he's been arguing that the virus is not as bad as is popularly made out.
And he's published various studies to make that argument.
And he's generally regarded as having been very bad in the coronavirus era and as engaging in many of the things that he was previously warning people about, in the selective way that he analyzes data and so on.
So he's a cautionary tale, but he's not an author of this paper.
He's not an author in the end.
Yeah, so the confusion.
Once again, Chris, I have to apologize because I got so confused.
I read the wrong article.
That's all right, but you know this article.
You've seen this article.
This is a classic of the genre.
For me, the reason I like this article is that I often assign it to the first-year students that I'm teaching about methods, because I think it is a very nice introduction to a lot of the issues that are central to the replication crisis.
And it also gives a practical demonstration of how they can be applied to presenting the results of papers, by giving these two fake experiments, but we'll get into them.
So what about you, Matt?
Do you have anything general about this paper?
General comments to begin with.
Many people listening would be familiar with the replication crisis, which particularly affected experimental social psychology, but arguably affects a lot of areas in psychology and the social sciences generally.
So the other bit of background knowledge for people is that, you know, in a prototypical experimental study there will be a hypothesis and a research methodology, some sort of experiment or survey; some measures are selected, some procedures are done, participants are recruited and measurements are made.
And then the data is subjected to some statistical analysis.
And most people will have heard of p-values.
This is...
I always get this wrong.
I shouldn't get it wrong.
The correct definition of what a p-value actually is.
Oh, good luck.
But yeah, I'm not going to try.
I'm upset.
Yeah, yeah, they're stupid, the definition of it.
So, but look, long story short, p-values get used by researchers to determine whether or not a finding would be unlikely to have happened by chance.
And they often choose a threshold of p as less than or equal to 0.05.
So that would correspond to a 5% probability of observing a result at least that extreme, assuming the null hypothesis is true.
If they get something smaller than that, then they reject the null hypothesis, accept the alternative hypothesis, and Bob's your uncle.
You can go ahead, write the paper up and get some citations.
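As a concrete illustration of that workflow (not from the paper; the numbers and group labels below are invented), here is a minimal Python sketch of running a two-group comparison and applying the conventional p < .05 rule:

```python
# A minimal sketch of the workflow described above: two groups, a t-test,
# and the conventional p < .05 decision rule. The data are made up
# purely for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(loc=100, scale=15, size=30)    # scores in a control condition
treatment = rng.normal(loc=108, scale=15, size=30)  # scores in a treatment condition

result = stats.ttest_ind(treatment, control)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3f}")

alpha = 0.05
if result.pvalue < alpha:
    print("Reject the null hypothesis (difference unlikely under chance alone).")
else:
    print("Fail to reject the null hypothesis.")
```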
Yeah.
But Chris, you want to take us through it a bit more or some more comments from you?
So the concern is like an acceptable rate for false positives, right?
If you're getting...
I like the analogies that compare like getting a cancer diagnosis, right?
We wouldn't usually accept a 5% error rate in a cancer diagnosis, because the results are quite impactful for people who receive them.
But for other things it's different, so the error rate that you'll accept on a given test should be variable.
But in a lot of social sciences, a general rule of a 5% error, 1 in 20 error is applied.
And p-values are, like you say, there's a lot of nuance in the specific way you describe them or whatever.
But another feature about them, which is important, is that they are a probability over time, right?
So you can trust p-values to give you the kind of correct answer if you are applying them over time.
Any individual p-value should be treated with some suspicion, right?
Because just the way the probabilities work.
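A quick sketch of that long-run point (our own illustration, not anything from the paper): if the null hypothesis is true and the analysis is fixed in advance, only about 5% of experiments come out "significant" at p < .05.

```python
# Simulate many experiments where the null is true and the analysis is fixed
# in advance: the false positive rate settles at roughly the nominal 5%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_experiments = 10_000
false_positives = 0

for _ in range(n_experiments):
    a = rng.normal(size=20)  # two groups drawn from the same distribution,
    b = rng.normal(size=20)  # so any "effect" is a false positive
    if stats.ttest_ind(a, b).pvalue < 0.05:
        false_positives += 1

print(f"False positive rate: {false_positives / n_experiments:.3f}")  # ~0.05
```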
But in any case, it isn't all about p-values, this paper.
There's plenty of papers which are about p-values.
What this paper is primarily about is a separate concept, which has been just as influential in psychology, but is not so commonly known.
And they describe it as researcher degrees of freedom.
And it essentially means the choices that people make in the course of collecting and analyzing the data, and in deciding what gets reported and what doesn't.
So it's all the choices that researchers make, like how big will the sample size be?
What kind of measures will I use?
What kind of statistical test will I use?
What results will I report in the end paper?
Which things will I not mention in the end paper?
Which journal will I target it for?
So on and so forth.
All of these hundreds of decisions go into research, and each one has multiple choices attached to it.
So these are the degrees of freedom that you have.
And this paper is saying why we need to pay more attention and take steps to try and control them or be more transparent about what they are.
Yeah.
So it's obviously a technical topic, but it's one that's near and dear to the heart of practicing scientists and researchers, because everybody wants the research literature to be true.
This is the idea that everything that is published reflects a true kind of finding.
Now, the problem is that when you collect data, there's inherent randomness involved of various kinds.
You don't get absolute true or not true evidence.
What you get is some sort of measure of certainty.
So, you know, as you said, by convention in the social sciences they've tended towards setting this p-value, which is about the chance of seeing an apparent difference or relationship when in actual fact there isn't one.
They've set this arbitrary threshold of 0.05, and if it crosses that, then it's taken as, well, okay, this is probably true.
At least the probability of it being true is judged high enough that we're happy for this to be published and for it to enter the literature.
So that's all very well and good unless somebody has their, their finger on the tiller.
So that's what this is about.
So one of the examples they give to illustrate researcher degrees of freedom is about excluding participants, right?
You run a study and you've got your data, and then you might think, well, some of these results are pretty crappy.
The data is low quality.
So how do we remove the bad quality responses?
And they point out that when they looked at 30 articles in psych science, they had a whole bunch of different reasons for excluding participants.
In particular, they were focused on like time, right?
The time it takes to complete a task.
And they were saying that people apply different standards.
Some people take the fastest 2.5% or the fastest 5% and remove them.
Other people take a certain number of standard deviations from the mean as the cutoff.
Other people use a whole variety of different criteria, right?
But there's no standardized set of criteria. If the researchers decided this in advance, or for some principled reason, it can be fine, but it could also be the case that, knowingly or unknowingly, they are selecting cutoff points that enable them to remove results that are inconvenient or that help push things towards significance.
And when there's such an incentive, when you're more likely to get a paper published and the results are more interesting whenever you find a result that is statistically significant, this can add up, right?
These little decisions that you make push things one way or another.
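To make the flexibility concrete, here is a small sketch (our own invented reaction-time data, not the Psych Science articles they surveyed) applying three of the exclusion rules mentioned above to the same data set:

```python
# Three defensible exclusion rules, applied to the same reaction-time data,
# keep different subsets of participants. Deciding which rule to use *after*
# seeing the results is another researcher degree of freedom.
import numpy as np

rng = np.random.default_rng(2)
reaction_times = rng.lognormal(mean=6.5, sigma=0.4, size=200)  # milliseconds

drop_fastest_2_5 = reaction_times > np.percentile(reaction_times, 2.5)
drop_fastest_5   = reaction_times > np.percentile(reaction_times, 5)
within_2_sd      = np.abs(reaction_times - reaction_times.mean()) < 2 * reaction_times.std()

for label, keep in [("fastest 2.5% removed", drop_fastest_2_5),
                    ("fastest 5% removed", drop_fastest_5),
                    ("within 2 SD rule", within_2_sd)]:
    print(f"{label}: keeps {keep.sum()} of {len(reaction_times)} participants")
```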
Yeah, yeah.
So to take this example, as you said, there isn't an obvious standard rule that should be applied.
And it's probably okay for multiple rules to be applied.
I think the real issue here is whether it's post hoc or a priori, right?
So this is something I'm sure we'll talk about more.
But when you talk about these researcher degrees of freedom, so there's all these decision points, as you said, and they're kind of fuzzy.
It's kind of arguable.
Should you, you know, reject these responses because they took much longer than most other participants?
Should you reject it if they sped through it and they completed it in the top percentile or whatever?
Or do you set some number in milliseconds?
You know, you could argue for any of those decisions.
And it's arguably, again, not a problem for you to choose any of those choices.
But the issue is whether you make that decision a priori, before you look at the data and start, you know, having a play with it and doing some statistical analysis.
Or whether you analyze your data first and find your results are, you know, kind of maybe in the right direction, but not really significant.
And then you might go back and revisit some of those decisions to, in scare quotes, improve the quality of your data set.
And you stop fiddling when you get the result that you're looking for.
Yeah.
And so they give a nice demonstration.
This is one of the reasons I really like this paper.
So they report two studies that they actually ran.
So study one is called Musical Contrast and Subjective Age.
And they get 30 participants and they randomly assign them to listen to Kalimba by Mr. Scruff, which came with the Windows 7 operating system.
So this dates it.
Or a children's song, Hot Potato, performed by the band that Matt often recommends, The Wiggles.
So a kind of children's song or more adult.
You know, adults could listen to it.
Adults could listen to the Wiggles too, Chris, just saying.
It's fine.
Yes.
And they then report how old they feel right now after listening to the music, and people feel older after listening to Hot Potato than after listening to the control song, right?
And p equals 0.03 is the relevant p-value.
But that's the prelude to study two, where they say, we decided to conceptually replicate and extend these findings.
And they look at whether listening to a song about older age actually makes people younger, physically younger.
So they use the same method with 20 undergraduate students.
They listen to Kalimba again by Mr. Scruff or When I'm 64 by The Beatles.
And then they indicate their birth date and their father's age.
And they find that, as predicted, according to birth dates, people were a year and a half younger on average after listening to When I'm 64 than they were after Kalimba, p equals 0.04.
So they say the studies were conducted with real participants, employed legitimate statistical analyses, and are reported truthfully.
Nevertheless, they seem to support hypotheses that are unlikely, in study one, or necessarily false, in study two.
And how they did this is by exercising researcher degrees of freedom.
So do you want to explain what kind of things they did, Matt?
We don't have to cover it all, but how could they make these results without actually finding these interesting findings?
Well, before I do, let's just be super explicit here.
They purportedly found that listening to one of these songs made people physically younger, as per their self-reported birth dates, right?
So people in one condition actually turned out to be younger in age than the people in the other condition.
This is, of course, logically impossible.
There is no way for listening to a song to make you physically younger.
Whereas, as they say, they did legitimate statistics.
They didn't lie in anything that they reported, and yet they found F(1, 17) = 4.92.
That's an F statistic.
And they report a P equals 0.04, which is just below the 0.05 threshold of significance.
So it's obviously dodgy.
Can't be right.
So how did they accomplish that?
Well, there's a few things you can notice about this.
One thing is they've got relatively few participants.
They've only got 20, right?
So I think it's 10 in each condition.
Is that right?
I guess so.
I'm not sure the conditions, but probably around that.
That's going to make their means bounce around a little bit more.
It's not going to make it more significant if you like, but it maybe gives a little bit more wriggle room.
But then really, the main thing that's pushing it in the direction of significance is the stuff that they don't say.
So even though they didn't lie, they made a lot of little decisions, these researcher degrees of freedom, along the way, in order to bring those p-values down.
So for instance, they measured multiple dependent variables.
So the dependent variable is the thing that you're expecting to change as a result of the manipulation, which here is age.
So they report age, but they measured other things, didn't they, Chris?
What else did they measure?
How old they felt, how much they would enjoy eating at a diner, the square root of 100,
their agreement with the statement that computers are complicated machines, their mother's age, their father's age, whether they would take advantage of an early bird special, their political orientation, which of four Canadian quarterbacks they believed won an award, how often they refer to the past as the good old days, and their gender.
Yep, yep.
So they measured a bunch of things and they could, presumably, have run their statistical test on all of these outcomes.
So with every one of those statistical tests they're getting another shot at that p-value, and assuming the null hypothesis is true, that there's no effect, which is a pretty safe assumption here.
They're going to have a 5% chance every time of having that p-value land under that magical threshold.
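The arithmetic behind "another shot at the p-value" is worth spelling out. Assuming independent tests and no true effect, the chance that at least one of k tests comes out significant at the .05 level is 1 - 0.95^k (real dependent variables are correlated, so the true inflation is somewhat smaller, but the direction is the same):

```python
# How the chance of at least one "significant" result grows with the number
# of independent tests when there is no true effect.
alpha = 0.05
for k in (1, 2, 5, 10):
    print(f"{k:2d} tests -> P(at least one p < .05) = {1 - (1 - alpha) ** k:.2f}")
# 1 test  -> 0.05
# 2 tests -> 0.10
# 5 tests -> 0.23
# 10 tests -> 0.40
```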
Yeah.
And so the other thing that they did, and they detail this in a table, is they include, in gray text, all of the information that they left out.
And if you put it back in, it looks very different.
So that's how I read off all those alternative variables, which they don't mention in the results.
They also dropped 14 participants because they had a separate condition with a different song, Hot Potato by The Wiggles.
So if they had happened to find that people were younger in that condition, they could have taken that one instead.
Another thing that they do is they conduct the analysis after every experimental session of approximately 10 participants, they say.
They did not decide in advance when to terminate data collection.
So they're basically checking when things reach statistical significance, and then they're going to choose the cutoff point, right?
Instead of collecting all the data and analyzing it at the end.
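Here is a rough simulation of that optional-stopping problem (our own sketch, not the paper's simulation): analyse after every batch of participants and stop as soon as p < .05, and the false positive rate climbs well above the nominal 5%.

```python
# "Peeking" at the data after every batch and stopping at the first p < .05
# inflates the false positive rate even when there is no true effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_experiments, batch, max_n = 5_000, 10, 50
false_positives = 0

for _ in range(n_experiments):
    a, b = [], []
    for _ in range(max_n // batch):
        a.extend(rng.normal(size=batch))  # both groups drawn from the same
        b.extend(rng.normal(size=batch))  # distribution: the null is true
        if stats.ttest_ind(a, b).pvalue < 0.05:
            false_positives += 1
            break

print(f"False positive rate with peeking: {false_positives / n_experiments:.3f}")
# noticeably above 0.05
```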
So there are various things that they're doing behind the scenes, and they don't lie about any of it.
They just don't report it.
And of course, this is an extreme version where they're trying to reach an outcome that is on the face of it ridiculous.
But the point that they want to make is a broader one: you can have this exact same circumstance in a legitimate study, where people don't report variables because, they say, well, these didn't really have an impact on our results, so we don't need to report them.
And maybe they drop some participants, like we talked about, for quality control, or because some condition didn't really work.
And the issue is if you drop out all that information, it can make your story read more smoothly, be more persuasive, but you're taking all this information away from the reader of your article.
So instead of learning that you did 20 statistical tests and one of them reached significance, which would be a lot less impressive, you're only given the information about one statistical test, and that's important.
Yeah.
And just for completeness, I think there was one other researcher degree of freedom there, which was the decision whether or not to include a covariate in their analysis.
So they ended up controlling for father's age.
But not mother's age or anything else.
Yeah, so you can imagine if you measured a whole bunch of other potential independent variables, demographics or whatever, you could try putting some in, taking some out, all these different permutations, each of which would sort of give you another shot at finding a p-value below 0.05.
So it's just an illustration, but it illustrates that when you do exercise those researcher degrees of freedom, then it's not that hard to find a spurious, significant effect.
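A sketch of the covariate version of the same problem, using hypothetical variable names and invented data (it assumes pandas and statsmodels are available): try every subset of measured covariates and keep whichever model gives the smallest p-value for the effect of interest.

```python
# Covariate fishing: fit a regression for every subset of covariates and
# keep the smallest p-value for the effect of interest. All names are
# hypothetical and the data are random, so "condition" has no true effect.
from itertools import combinations
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 40
df = pd.DataFrame({
    "condition": np.repeat([0, 1], n // 2),
    "y": rng.normal(size=n),
    "fathers_age": rng.normal(55, 8, size=n),
    "gender": rng.integers(0, 2, size=n),
    "political_orientation": rng.normal(size=n),
})

covariates = ["fathers_age", "gender", "political_orientation"]
best_p = 1.0
for k in range(len(covariates) + 1):
    for subset in combinations(covariates, k):
        formula = "y ~ condition" + "".join(f" + {c}" for c in subset)
        p = smf.ols(formula, data=df).fit().pvalues["condition"]
        best_p = min(best_p, p)

print(f"Smallest p-value for 'condition' across {2 ** len(covariates)} models: {best_p:.3f}")
```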
And I think one of the morals of the story here, of course, is that people really need to not do this omission thing.
Because as you say, it'd be far less convincing.
In fact, it'd be totally unconvincing.
It wouldn't get published in the first place if they had reported all of those degrees of freedom.
And this is why there's a push, with the open science movement in the wake of the replication crisis, to make research more transparent, to specify in advance what you're going to record, and to make all of the data available online for other people to see.
Because I think it prevents or it kind of ties your hands to a certain extent to present things in a misleading way.
The more transparent you are.
It doesn't prevent you from creating results and producing them, but just means that people can put them in more context.
On its own, I think that would be a good contribution for this paper, like a nice simple illustration.
And I often assign the text of these two studies to undergraduates and ask them to try and identify methodological issues and then provide them the table with the researcher degrees of freedom.
So it's a nice little exercise that you can show people to be skeptical of research findings, but they also add in this neat simulation study.
Where they conduct simulations that vary the four things which they say are crucial in researcher degrees of freedom.
So choosing among dependent variables, as we discussed, choosing sample size, using covariates, and reporting subsets of experimental conditions, right?
So these, they label them A, B, C, and D. And they look at if you exercise these on data that is non...
That is just random, right?
How often can you create false positive results at different levels of p-values?
And they have like p under 0.1, p under 0.05, and p under 0.01, right?
So kind of increasing levels of harshness.
And they basically show that for each one you can increase the chance by between 10 and 20%, depending on the thresholds; but for the normal p under 0.05, with each of these individual things, like controlling for gender or an interaction with gender, you can increase the likelihood of getting a significant result to something like 12%.
And if you combine them, if you do multiple versions, once you're adjusting the dependent variables that you report, using selective covariates, adjusting sample size and so on, you get up to a 60 or 80% chance, depending on your thresholds, of getting a false positive result.
So they're really saying that these individual choices, they might move the needle only 10% or 5%, but if you combine them, you suddenly become more likely than not to get a false positive result.
And this is a problem.
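For flavour, here is a rough simulation in the spirit of theirs, though not their exact procedure: it combines only two of the four degrees of freedom they vary, a choice between two correlated dependent variables and optional stopping, and counts how often a purely null effect comes out "significant".

```python
# Combining two researcher degrees of freedom (pick whichever of two
# correlated DVs "works", and peek after every batch) on null data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n_experiments, batch, max_n, r = 2_000, 20, 60, 0.5
cov = np.array([[1, r], [r, 1]])  # two correlated outcome measures
false_positives = 0

for _ in range(n_experiments):
    a = np.empty((0, 2))
    b = np.empty((0, 2))
    significant = False
    while len(a) < max_n and not significant:
        a = np.vstack([a, rng.multivariate_normal([0, 0], cov, size=batch)])
        b = np.vstack([b, rng.multivariate_normal([0, 0], cov, size=batch)])
        for dv in (0, 1):  # report whichever dependent variable "works"
            if stats.ttest_ind(a[:, dv], b[:, dv]).pvalue < 0.05:
                significant = True
                break
    false_positives += significant

print(f"False positive rate with two DVs plus peeking: {false_positives / n_experiments:.3f}")
```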
It is a problem.
Yeah.
So look, in the little toy example that's reported in the paper, it's reasonably obvious that what is being done is a little bit shady, and we already know in advance that the result couldn't possibly be true anyway, right?
But it does apply to the real world, because often the data sets are much bigger and more complicated.
The questions are more complicated.
The statistics that are being used are more complicated.
So the researcher degrees of freedom are real, and it's a sobering tale, because it's not just researchers who are twirling their evil mustaches and deliberately fiddling with the books in order to make these false results.
Those people probably do exist, but far more common, I would say, is an earnest researcher dealing with a complicated design, some complicated statistics, and being genuinely unsure about these questions.
Like, oh, you know, which cases should I discard?
Should I control for this or that?
Genuinely not being sure, they could quite easily, even with the best of intentions, treat finding the effect that they're looking for as the signal that they're analyzing it in the right way.
And that if they're not finding stuff, it's like, oh gee, especially an inexperienced researcher or a graduate student could think to themselves, well, I've obviously made a mistake here with my analysis.
I need to go back and fix it up.
And the senior academics can say, maybe you can reanalyze it, or look at subsets of the data and so on.
And there's actually Daryl Bem, an influential social psychologist, who provided a methodology guidebook for undergraduates.
It talked about analyzing data, and it basically said: cut your data up, look at it from different angles, add covariates, take them away.
You need to torture your data until it tells you the story.
And it might not be the story you started out with.
It's, you know, what your data teaches you.
And that's the concept of data fishing, or fishing for significance, which is, you know, a cautionary tale.
But that's exactly what's being described, right?
If you're just playing around with numbers and you're not testing things in that kind of predetermined manner, humans are pattern seekers.
And when we're motivated to get particular answers, when you've got lots of different ways to analyze things and lots of different tools that you can use to test things, you can find creative results.
And this is a big part of why we had the replication crisis because it's not individual researchers.
There are cases of fraud where people falsified the data, but that's more rare than people engaging in motivated analytical choices and then getting the results to a level that they're ready to publish in a paper.
And all the incentives are geared towards that, incentives which the students or the researchers did not create.
Journals would not accept null results.
People were unhappy with negative findings.
So it wasn't just individuals making bad choices.
It's that the kind of incentive structure, especially around journal publications, incentivized positive, splashy, counter-intuitive results.
And so what you got was a load of results that were created via these kinds of methods and that simply cannot hold up to rigorous replication because once you do quite strict tests and don't engage in those,
the results don't appear again, right?
Yeah, as you say.
The underlying issue here is the incentive system that is at play.
And yeah, the individuals involved didn't create it.
It's very understandable for journals and the community generally to be more interested in new, exciting findings, something different, not just, oh, we did this carefully constructed experiment and we didn't find anything.
It didn't work.
It can work.
Even that's the point that my framing of that is wrong, right?
Because it didn't work implies it should have worked.
In fact, I was going to mention the same thing, which is that I've often caught graduate students that I've supervised using the same language, which is that it didn't work.
In other words, I failed, right?
This research that I tried to do was a failure.
And it's just really the wrong framing.
We have to try to learn a different framing.
The other thing, too, is the incentives for the individual researchers, especially junior researchers.
The job market...
Publish or perish.
Publish or perish.
Everyone knows it's difficult to establish yourself in academia.
And for graduate students or early career researchers these days, the pressures are intense.
And you'll have this advice given to you, which is that, you know, you need to publish, you know, get at least four or five papers published by this amount of time.
And then you'll have a chance at getting this kind of fellowship or this scholarship and so on.
And there's this great filter going on where people are filtered out unless they meet those thresholds.
So the pressure on them is intense.
And also a supervisor, a more established academic who might be advising one of these people, they obviously want the best for them.
They could quite naturally in a meeting say, oh, I see.
Okay.
It was not significant.
Hey, well, did you try doing it this way?
Or are you sure that was the right decision?
And it can all be done with the best of intentions and it can lead to the problem that we're talking about.
I can also say that it's not always like that; often the senior researchers are the ones that want the positive result more than the graduate students, because there's an increasing generational divide where the younger researchers are okay with null results,
okay with open science practices.
And the people that are more hesitant are the more established researchers who often have their own theories that they want to protect.
And Matt, one thing I want to point out here is all of this might sound in a way similar to the critique of some of the gurus that they raise about academia, right?
That it's this rigged system, that the incentive structure is there not to produce truth but to create jobs, right?
And to like silence people who are producing inconvenient results.
But the difference that I want to make here, or the contrast I want to highlight, and it relates to this paper as well, is first of all that, despite all these flaws, which people acknowledge, a paper like this gets published in academia, right?
Because people want to self-correct.
And this is not the first time these issues have been raised.
They're known about.
And there are steps taken to address it.
And even though there are all these skewed incentives, because science is self-correcting, because people try to replicate things and so on, the literature does detect them.
And similarly, as we'll get into, I think, at the tail end of this paper, this paper has suggestions for how to resolve the issue, recommendations for researchers about what to do.
And they're practical.
They have started to be implemented across academia and various social sciences, and they're producing effects.
And it's this pragmatic approach of recognizing humans' limitations, including researchers' limitations and all the incentives, and trying to devise systems that address it.
But they're not about these self-aggrandizing takes where the researchers are talking about how they're much better than everyone else.
No, they're giving a set of suggestions about how things can...
Yeah.
And not being unrealistic about it.
So this is why the gurus are so infuriating in some respect, because they're taking away the ability to discuss this topic with nuance and just tying it to all this culture war nonsense.
And it isn't like there's no issues with academia or no one's talking about it.
It's just that like this kind of stuff, it doesn't interest them because it's technical.
Yeah.
It's boring to them because they...
That's what they care about.
Yeah, I mean, totally agree with that.
This might sound superficially similar to the kinds of things they're saying, but it's just not.
The gurus wave their hand and make these blanket statements about everything being corrupt.
It's not, right?
This is a problem, but it's not the only thing that's going on.
Not all research fails to replicate.
A lot of findings are perfectly fine, despite these issues.
With these researcher degrees of freedom, for instance, many researchers, a lot of the time, do pretty well at avoiding most of these pitfalls anyway, right?
Yeah.
As you said, Chris, the replication crisis started in social psychology and the people who found it, the people who raised attention to it were psychologists in the field.
Yeah.
And the people that have been madly going about replicating stuff and checking whether it's true, publishing papers like this, establishing open science protocols, are people within the discipline.
So there's been absolutely no conspiracy of silence, right?
This is something that, you know, we want to fix.
Well, I'm speaking too strongly.
Yeah, but there are conservative elements which are critical of it, and there are debates, like about the tone of criticism and various issues, but the point stands regardless.
And I also think that when you say that the replication crisis started in social science, you mean like the phenomenon, right?
As in, it was detected in social science.
And then there has been subsequent efforts to look at economics, to look at cancer research and so on.
And they find problems.
They've also found better rates of replication in some disciplines, which are heartening.
But still, they find these issues and the same solutions apply.
And the other thing I say too, Chris, about this epistemic paradigm of quantitative science, whether it's social science or some other kind of science, involving statistics and involving measures, involving these kinds of research designs, it lends itself to being checked.
It's possible to check and find these things.
Now, there are other disciplines that will remain nameless, that can't be checked in this way.
So it's just important to get the balance right.
It's obviously a good thing, the replication crisis to be identified, more of these open science protocols and these guidelines being followed to avoid these issues, constraining these researcher degrees of freedom.
A lot of positive things we can do.
It's also possible to take it in a kind of puritanical way a little bit too far.
And as you sort of hinted at, it's possible to do things like, imagine taking a graduate student who's published maybe their second or their third article ever and who has made some methodological mistakes and then naming and shaming them and demanding their articles retract,
all that stuff.
Like, I don't think there's any need for a kind of pogrom to eliminate the undesired.
Yeah, and these are debates that go on about, you know, like a paper is published and it may have methodological flaws, but not be egregious.
And then is the tone of the people, or is the criticism that it attracts fair and proportionate?
And there are legitimate debates to be had there.
But another point, I think, in reference to our gurus, is about the superficial critique.
Although there are parallels, one thing is, and you see this a lot with Bret Weinstein and Heather Heying, for example, that they claim to be skeptical about research, but they're not, right?
When they find a paper which fits their priors, they're extremely credulous about it.
And what I try to emphasize to my students, and basically to everyone, is that you should read papers critically.
You should be skeptical and you should be asking questions about what information was missing or so on.
But not just for the papers that you don't like the results of.
All papers should be taken as provisional.
And this isn't a new thing that I've developed.
It's the attitude of science, and methodological textbooks have made this point for as long as they've existed.
It's not new knowledge.
It's hard to put into practice.
And the gurus here, like Jordan Peterson or whatever, they have no interest in that.
All they use papers for is these kind of rhetorical weapons that they can fling across.
And the methodological limitations might be of interest if they can undermine the study that they don't like, but they'll never highlight the methodological issues with studies that they like.
And this is why this is using research wrong and approaching research wrong.
It's using it as a rhetorical weapon instead of as the record of a study that was conducted and a finding, right?
Instead of seeing it as part of a cumulative endeavor.
Yeah.
I was heading in the same direction myself.
I mean, the way in which this kind of thing can be weaponized is to have these extraordinarily high standards and then selectively apply them to whatever studies do not fit your preferred narrative, for want of a better word.
And that's the way in which it could be, you know, a terrible thing, because, you know, no study is perfect.
No methodology is perfect.
And if you selectively apply extremely high standards, then, as you say, it's really just a form of scientism, using a sciency critique to make a rhetorical point.
Now, as you say, it's hard, right?
The whole thing is hard. Doing it well, doing it right, is hard, both doing the research and also critiquing the research.
And the thing that makes it hard is that people have prejudices.
People have preferences.
People have desires.
And I've always been against sort of fusing together what we want and what we desire, what we think would be good and beneficial activism, for want of a better word, with the process of finding out what is.
I know there's a lot of caveats to be made there, but I think it's really important to be dispassionate.
And the sort of thing you were describing, which is to apply an even-handed threshold and even-handed standard universally requires like a level of dispassion and not to have a preference to like delegitimize this type of research and promote that kind of research.
I'd just add one point there, Matt, and I think you'd be completely on board with it, but it's no problem to be passionate and have a preferred result.
You have to be dispassionate in the analysis and the design of the study.
So you can have what you want to be the case, but to be a good scientist, you have to respect what the data says and not put your thumb on the scale.
So you can be an activist, but you can't massage your data to support your preferred conclusion.
Absolutely.
So those two roles can coexist in the same person.
You could be those two people at different times, but it's like wearing two hats, you know.
You can put on this hat and you put on that hat, but there's no other way around it.
And it's not easy.
That's the point.
It's easy to say, to be dispassionate, but it's easier said than done.
There's somebody who's kind of terrible online called Lee Jussim.
He's a social psychologist and also a kind of warrior in the culture wars, but he is correct sometimes when he diagnoses problems with methodological issues, and he talks about the uneven application of standards of rigor across different topics.
And regardless of Lee's broader view, he's completely right about that.
It has to be applied consistently in order to be science.
Matt, maybe a good thing to move to is to look at what they suggest as the solution, as they put it, a simple solution to the problem of false-positive publications.
And they give six requirements for authors and four guidelines for academic reviewers.
So maybe I'll take the authors' requirements and you the recommendations, and then we can talk about the guidelines for reviewers quickly.
So they say...
Authors must decide the rule for terminating data collection before data collection begins; must collect at least 20 observations per cell; must list all variables collected in the study; must report all experimental conditions; if observations are eliminated, must report what the statistical results are if those observations are included; and if an analysis includes a covariate, must report the results of the analysis without the covariate.
So all of those are sort of technical in nature, but the fundamental principle is just disclose everything that you do.
Be transparent and have minimum standards, right?
And another important thing, decide things in advance, not on the fly.
Yeah, absolutely.
So being transparent, the other thing is, as you say, deciding things in advance, that a priori versus post-hoc decision-making, reporting all the information and not making these sins of omission.
Just a bit more culturally, like in terms of cultural advice, not technical advice.
Culturally, the advice for researchers and for editors and reviewers is to embrace null results, embrace boring stuff, and to not put such a high priority on the results being some flashy, new, exciting, shiny thing, but rather to bed down and solidify and give confidence in the core findings of the field.
And, you know, this is probably too much to ask for, but if we can move away from the idea...
I mean, I don't know.
Actually, Chris, I'm going to ask you this, right?
Because I don't think there's an easy solution to this one.
Presently, publications and citations to those publications is the ultimate benchmark by which academics are judged.
Unfortunately, stuff that gets published, stuff that gets cited, tends to be shiny, new, exciting, and therefore creates this massive incentive, you know, these unhelpful incentives.
Yes.
I struggle to think of an alternative way to decide which academic to hire for a position or to award a scholarship to or whatever.
How do we leave those metrics behind?
Well, I don't think we can leave the metrics behind, but I think we can factor in the quality of studies, like as a metric.
So for example, there can be a famous study which is badly conducted and hugely influential, and a very careful, much larger pre-registered replication effort finds null results and gets far fewer citations.
But I think amongst certain kinds of researchers, and they're increasingly common, the negative replication is valued.
It might not be valued in the citation metrics, but I think there's an increasing awareness, and there are increasing moves from funding bodies to require things like open science protocols when applying for grants and stuff.
So it's never going to be perfect, but I think there is, at least in all the hiring panels and stuff that I've been on, there's scope for considering other aspects than just the citation metrics.
And in most cases where I've been, it hasn't come down to, is there a paper in Psych Science or something like that?
It's been more, this person isn't a good fit because they don't have the right kind of skills.
That's a good point, actually, now I think about it.
When you think about these citation metrics and flashy papers, that's something that often comes into play 20 years later in your career.
It's not generally something that grants you entry.
As you say, if you're on a hiring committee, if you're on a scholarship committee, if you're a reviewer or an editor...
It's possible to value different things, and it takes a little bit more effort to actually read the papers and pay careful attention.
It takes more work rather than just looking at the Google Scholar citations, but it absolutely can be done.
And, you know, as you were speaking, I realized that we do exactly that in my little neck of the woods.
So, yeah, it's not as pessimistic as I made it out to be.
Well, I think the other thing that you've emphasized that's really important is a greater tolerance amongst researchers, reviewers, and the people consuming the research for messy results, non-significant results, or null results, right?
This greater willingness to accept that a non-sexy result does not make the study worthless and is also valuable, right?
And I think that is becoming more common, where now it's acceptable to find null results.
You still have, of course... it'll never get to the point where people are happy to get negative results for their theory.
It's just not going to happen like that.
It doesn't even happen in the hard sciences, with, you know, people who have very many things riding on the calculations that they make.
But it's not the way our psychology is, but it can be the way that our scientific standards are.
And, you know, we keep referencing open science.
We'll probably cover it in later episodes in more detail, what kind of things can be involved there, but I see lots of reasons to be optimistic.
Yeah, yeah, I do too.
And in fact, this is something I'm trying to speak about more and more with students and early career researchers.
I mean, just in a very superficial way, the result you want is not always a significant result.
Just to give you an example, Chris, we recently did an analysis, because in Western Australia there are very few pokie machines, slot machines.
And in other states in Australia there are, right?
Everything else about those places is pretty much the same.
So it sort of forms like a natural experiment.
All the other forms of gambling are available.
There's this one that's been removed or made less available via regulation.
So what we can do is create a model and see if we can attribute the lower rates of gambling problems in Western Australia to the lack of access to that particular gambling form, right?
So as part of that analysis, we had a model of the likelihood of a person having gambling problems given their participation in a wide range of different forms.
Now we wanted, I mean, you know, wanted in scare quotes.
Yeah.
It suited us, according to our theory, conditional on participation.
We wouldn't expect there to be a difference between those two states.
Okay.
Conditional on participation.
So if there had been a significant difference between the states with respect to conditional participation, so one of those covariates, then that would have been disappointing for us, because we didn't have a good explanation for that.
Like, remember our theory assumed that it was lack of access, right?
Lack of participation that led to that difference.
So, I mean, look, that's just a very superficial example, but for all social scientists who may not be super experienced with statistics, the lesson from that is that what matters is your model.
What matters is the formal structure that you're using to describe a phenomenon.
And a parsimonious structure, one that involves fewer degrees of freedom and fewer relationships between things, is actually a good thing, right?
It makes the model more parsimonious.
So even in very superficial ways, and this is getting to my point, I guess, educating researchers that you can be designing your studies and thinking about your analyses not in terms of trying to find a significant result, which is basically just describing a difference or a relationship, but rather in terms of trying to find a parsimonious description of the phenomena that you're interested in.
Yeah.
Yeah.
I sign on to all such endeavors.
Like in many respects we are preaching to the choir here, but I think it's good for people to hear about possible solutions, and why it is, for example, that we keep harking on about, you know, approaching research objectively and stuff.
There's another moral to this story, which is related to the gurus and so on, and just people doing their own research on the internet, right?
Which is that one thing to learn from this is that evaluating the validity or how much credence to give to a result that's reported in a study in the academic literature is not straightforward.
You certainly cannot get it from the abstract.
If you do not have a solid grounding in so many methodological issues and the topic area at hand, then probably your chances of making an independent judgment of that particular paper are relatively low.
I'm not saying don't attempt it.
By all means, get involved.
But just have an appropriate level of confidence in one's own abilities.
Because one of the things we see with the gurus and their followers is they massively overestimate their own abilities.
And also they make relatively little effort to actually take a particular result and put it in the context of the entire research field, let alone what you were describing, Chris, which is actually to take into account the rigor of the evidence in that study and all the rest.
So the point I'm making is not give up, it's too hard or whatever.
My point is just that it's difficult.
It takes training, it takes experience, and it takes an awful lot of time, such that even me, a professor of statistics...
Many people say I'm clever.
My mum says I'm handsome.
My mum says I'm very clever.
I don't try to do it a lot of the time because I know it's beyond me without investing like six months of my life, right?
Rather, I actually do put some trust in other relevant people, people with credentials, people who do have the necessary backgrounds, experts, and so on.
And, you know, you never give complete trust.
The gurus like to just, you know, make it a binary kind of, oh, so you blindly trust, you know, blah, blah, blah.
No, it's not that.
It's about recognizing your own limitations to undertake this kind of thing and giving an appropriate level of trust to people who have put the decades of work in.
That's, you know, something you learn as well when you start doing this kind of thing: meta-analyses seem like the solution initially, because these are studies where they collect together lots of studies and look for overall patterns.
And then, if there are bad studies, they can be weeded out from the overall effect.
But then you quickly realize that it doesn't work if there are things like publication bias, where there's an incentive to find positive results.
And also, people can exercise researcher degrees of freedom in what studies they include and whatnot.
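A toy illustration of that point (entirely our own, not from the paper or from the ivermectin literature): simulate many small studies of a null effect, "publish" only the ones that come out significant in the expected direction, and the pooled meta-analytic estimate looks convincingly non-zero.

```python
# Publication bias in miniature: only "significant" studies in the expected
# direction get published, then a simple fixed-effect (inverse-variance)
# pooling of the published effects overstates a true effect of zero.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
published_effects, published_variances = [], []

for _ in range(500):
    a = rng.normal(size=25)
    b = rng.normal(size=25)  # the true effect is exactly zero
    diff = a.mean() - b.mean()
    se = np.sqrt(a.var(ddof=1) / 25 + b.var(ddof=1) / 25)
    if stats.ttest_ind(a, b).pvalue < 0.05 and diff > 0:  # the "file drawer" filter
        published_effects.append(diff)
        published_variances.append(se ** 2)

w = 1 / np.array(published_variances)
pooled = np.sum(w * np.array(published_effects)) / np.sum(w)
print(f"{len(published_effects)} of 500 studies published; pooled effect = {pooled:.2f}")
```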
And we've seen this in the meta-analyses of ivermectin.
And again, there, you know, it's a cautionary tale, but the people who understand methodological critiques and are expert at detecting, you know, issues in research, they looked at the ivermectin studies.
This was not their area of expertise, right?
Actually, a lot of them were primarily psychology-trained people.
And they noted, oh, there's warning signs here.
There's warning signs in the data.
There's warning signs in the way the studies are reported and so on.
There's a warning sign that the table being used is a default Excel table with the labels still visible.
These sound like stupid things, but they matter because they're a sign that there's something wrong here.
And when you try to present that to people, they regard it as, oh, you know, you're just trying to, you are exercising your degrees of freedom to dismiss studies that you don't like.
And it can be that.
But it can also be that you actually are able to correctly identify good quality and low quality studies.
So there are instances, as you say, where sort of generalist knowledge, like in that case, psychologists applying sort of basic statistical epidemiological principles to a different area.
There is overlap between disciplines and there are ways in which a generalist can contribute and provide a helpful critique.
And there are ways in which...
It's just specifically in the area of methodology, right?
All I want to say is that methodologists can go across boundaries to a certain extent when they're talking about methodology, but they don't know about the genetics data or, you know, that kind of thing.
Yeah.
So just be very wary of the people that claim to be this polymath who can figure out everything in an afternoon and is now qualified.
You know, that's different from what Chris was just saying before.
Yeah.
And so to take it back to this paper to finish off.
So the nice thing about this paper: we were saying the last paper, the Abelson one, was readable.
This is a readable paper, a bit more technical, but they have this nice part when they get to the discussion where they preempt criticisms.
They have a section called criticisms, not far enough or too far.
And they're taking these kinds of different positions that people might critique them for.
And then they have a section called non-solutions, in which they discuss alternatives that people might suggest and why they don't find them convincing, even if they're useful things, right?
They want to say that these, on their own, aren't a solution to the researcher degrees of freedom issue.
Things like using Bayesian statistics, which incorporates prior probability.
Now, that is potentially part of a solution, but the issue is that you have to assign prior probability in Bayesian statistics.
So there is another degree of freedom that you can insert your preferences into, right?
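A small beta-binomial sketch of that point (our own example, not the paper's): the same data look more or less compelling depending on the prior the analyst chooses.

```python
# The same data, 16 "successes" out of 20 trials, evaluated under three
# different priors on the success rate: the prior itself is a choice.
from scipy import stats

successes, trials = 16, 20
priors = {"flat Beta(1, 1)": (1, 1),
          "sceptical Beta(10, 10)": (10, 10),
          "optimistic Beta(8, 2)": (8, 2)}

for name, (a, b) in priors.items():
    # Conjugate update: posterior is Beta(a + successes, b + failures).
    posterior = stats.beta(a + successes, b + trials - successes)
    print(f"{name}: P(rate > 0.5 | data) = {posterior.sf(0.5):.3f}")
```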
And there's other things that they highlight.
But I just like this paper, again, because it's readable.
It presents an argument, identifies a problem, gives solutions, and it addresses nuances, right?
Here's some other solutions that we think are relevant, but maybe not the answer.
And it's short.
It's like the paper in total is like six pages or so, although it's double column.
So it's actually double that.
Yeah, it's really, I love this paper.
And rereading that reminded me of why I think it's such a good paper.
So there we go, Matt.
What about you concluding remarks or overall thoughts?
It's a good paper.
And the lesson to be taken from it is that there's just a variety of pragmatic things.
Some simple, some a bit more complicated, some more difficult than others.
There's just a variety of practical things we can do.
This is a problem.
It's kind of partly technical, partly sociological.
It's never going to be perfect.
The world isn't going to be perfect.
The practice of science is not going to be perfect.
And, you know, it doesn't have to be.
It just has to be pretty good.
Pretty, pretty, pretty.
I think that's a very good point, though, is that, like, in some sense, this combines rigorous scientific objectivity, kind of, you need to be methodologically cautious.
You need to pre-register studies and so on.
Along with the acknowledgement that researchers are people and they exercise choices.
And we have to factor that in when analyzing results and when contending with scientific data.
So there is something to be said for the postmodern approach to emphasize science as a sociological or situated thing constructed by imperfect people.
And it's a good illustration that you can take those insights and apply them in a practical, positive way.
Rather than just a deconstructive, useless, jibber-jabber way.
Yeah.
Yeah, that's right.
I mean, I'm totally on board with the critiques on the sociology of science, but I think a lot of them are focused on these political things and gender and, you know, whatever, imperialism, whatever, when actually the problems are like the stuff we talked about right here.
There's no imperialism required.
It's just a bunch of people being fallible.
The other final point too is that exploratory science is okay too.
You just have to say it.
Just have to say it's exploratory and just be clear about what it is you're doing.
It doesn't all have to be this kind of, oh, we specified these hypotheses in advance, blah, blah, blah.
You can set it up such that you have heaps of degrees of freedom.
As long as you correct for that to the extent that you can statistically and as long as they're made totally transparent.
So yeah, I think everything's going to be fine.
We're going to implement all of these recommendations.
Don't read Stuart Ritchie's book about people dying from transplants that shouldn't be done and all that kind of thing.
Science is okay.
It's all right.
It was a bit of a speed bump.
Bit of a blip.
Don't worry.
Well, that's our second paper done.
For all of you who have reached the end, thank you for sticking around.
And do you know the next paper, Matt?
I don't remember the name of it off the top of my head, but I can say that it's going to be super cool, and it's going to be about the iterated prisoner's dilemma, which is a fun little curious thing in computer science, but has wonderful applications to game theory and understanding society.
That's it.
Wow.
We've given them...
I look forward to learning about it myself after that introduction.
So if you thought this was dry, that's going to be juicy.
Just you wait.
Even more.
But yeah.
So that's it.
Thanks, everybody.
Thank you.
Bye.