It’s easy to focus on the big flaws in science, like p-values, and forget about the little ones. Scientists are people, after all, and the technical nature of their analyses makes it easy to flub up. Nicholas Brown and James Heathers found a simple way to check their results, in some cases: look for impossible averages.

[T]he mean of the 28 participants in the experimental condition, reported as 5.19, cannot be correct. Since all responses were integers between 1 and 7, the total of the response scores across all participants must also be an integer in the range 28–196. The two integers that give a result closest to the reported mean of 5.19 (which will typically have been subjected to rounding) are 145 and 146. However, 145 divided by 28 is 5.17857142, which conventional rounding returns as 5.18. Likewise, 146 divided by 28 is 5.21428571, which rounds to 5.21. That is, there is no combination of responses to the question that can give a mean of 5.19 when correctly rounded. Similar considerations apply to the reported mean of 3.87 in the control condition. Multiplying this value by the sample size (27) gives 104.49, suggesting that the total score across participants must have been either 104 or 105. But 104 divided by 27 is 3.851, which rounds to 3.85, and 105 divided by 27 is 3.888, which rounds to 3.89.[1]
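The arithmetic in the passage above can be sketched in a few lines of Python. This is a hypothetical helper, `grim_consistent`, not code from the paper: multiply the reported mean by the sample size, then check whether any nearby integer total rounds back to the reported mean. (One caveat: Python's `round` uses banker's rounding, while the paper assumes conventional rounding; the two only differ at exact half-way values, which rarely arise from these divisions.)

```python
def grim_consistent(reported_mean, n, decimals=2):
    """Check whether a reported mean could arise from n integer scores.

    Since every response is an integer, the total across participants
    must be an integer near reported_mean * n. If no such total,
    divided by n and rounded, reproduces the reported mean, the
    reported value is GRIM-inconsistent.
    """
    target = round(reported_mean * n)
    # Check the candidate totals on either side of the nearest integer.
    for total in (target - 1, target, target + 1):
        if round(total / n, decimals) == round(reported_mean, decimals):
            return True
    return False
```

Running it on the two means from the quote reproduces the paper's conclusion: `grim_consistent(5.19, 28)` and `grim_consistent(3.87, 27)` both come back `False`, while a correctly rounded mean like 5.18 with n = 28 passes.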

It’s a trivial check, yet half of the published papers they examined had at least one “inconsistent” result, in their terminology, and one in five had multiple. When they requested raw data from the authors of 21 papers, all nine of those who provided it turned out to have some sort of flaw in their data analysis. Most were thankful for the second check.

Wait, fewer than half of the authors got back to them? Unfortunately, it’s pretty common for researchers to drag their feet on data requests.

Unfortunately, 6 months later, after writing more than 400 e-mails—and sending some corresponding authors detailed descriptions of our study aims, approvals of our ethical committee, signed assurances not to share data with others, and even our full resumes—we ended up with a meager 38 positive reactions and the actual data sets from 64 studies (25.7% of the total number of 249 data sets). This means that 73% of the authors did not share their data.

Interestingly, the current response rate shows a remarkable similarity to the response rate (24%) that Wolins (1962) reported over 40 years ago after his student had requested data from authors of 37 articles in several APA journals. Moreover, in a time when data were not electronically available, not readily copied, and not sent easily by e-mail, Craig and Reese (1973) received 38% of the data sets they requested from research published in four APA journals.[2]

This isn’t as big a deal as the replication crisis, and much of the non-response can be chalked up to overworked researchers and ordinary academic churn. Nonetheless, it creates great conditions for fraud, especially given how hard it is to publish replications.

An analysis of all studies published over the period of five years in the 100 top-ranked educational science journals revealed that the proportion of replication studies was 0.13% (221 of 164,589) only. Remarkably, almost two thirds of these studies replicated the results of the original studies. This relatively large proportion however is put into perspective by the fact that more than half of the replication studies were published by the same authors who were also responsible for the original studies. When only those studies were analyzed that had no overlap of authors the proportion of successful replications declined to about 50%.[3]

Simple checks like GRIM might seem trivial, but they’re essential if we want to improve science. GRIM in particular is easy enough to carry out with a pocket calculator, so I hope you’ll keep it in mind the next time you read a scientific paper.

[1] Brown, Nicholas J. L., and James A. J. Heathers. “The GRIM Test: A Simple Technique Detects Numerous Anomalies in the Reporting of Results in Psychology.” PeerJ Preprints, May 23, 2016. https://peerj.com/preprints/2064

[2] Wicherts, Jelte M., Denny Borsboom, Judith Kats, and Dylan Molenaar. “The Poor Availability of Psychological Research Data for Reanalysis.” American Psychologist 61, no. 7 (2006): 726–28. doi:10.1037/0003-066X.61.7.726.

[3] Fabry, Götz, and Martin R. Fischer. “Replication – The Ugly Duckling of Science?” GMS Zeitschrift Für Medizinische Ausbildung 32, no. 5 (November 16, 2015). doi:10.3205/zma000999.