It’s an all too familiar situation when carrying out research. After weeks of arduous lab work, you test your hypotheses using the statistical tools that underpin all modern science. Alas, it comes to naught: no significance. In theory, you should write up the report anyway, showing how your grand theory was proven wrong. But who is going to do that? Whatever the reasons for your research, the same truth holds: significance is king. If your research doesn’t demonstrate statistical significance, I don’t rate your chances.
The next step is all too common: testing different combinations of variables until you find something with the magic p-value, a practice known as p-hacking. It’s poor science, but everyone does it, whether or not you have done the background work for that set of variables. A few hours later your entire project has fundamentally changed: you’ve come up with some half-plausible theory to underpin your new idea, and so you begin to write a report.
Science is in the middle of a crisis of confidence. The scientific method relies on a few core concepts, and perhaps the most important of these is that all good science should be replicable. When research is published, it’s a reasonable assumption that if someone else were to redo that work, they would get the same or similar results. Evidence is mounting that this simply isn’t the case and that the majority of studies aren’t repeatable; reviews in the pharmaceutical industry show that up to three quarters of the most-cited oncology studies cannot be successfully replicated [1,2]. As a result, we’re finding it harder than ever to develop new drugs.
So, what’s going wrong? I personally think it’s our attitude towards statistics: they are being treated as sacred markers of research quality. They are not. A p-value is just a statement of how likely it is that we would see data at least as extreme as ours if there were no real effect, that is, if the null hypothesis were true. Without strong theory backing up the data, statistics are meaningless. And when you start to compare variables at random, there is a high chance that you will find something ‘significant’: at the conventional 0.05 threshold, each test on pure noise has a 5% chance of passing, so across 20 independent tests you would expect about one spurious ‘significant’ result by chance alone.
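That arithmetic is easy to check by simulation. The sketch below (hypothetical numbers of tests and simulations; it assumes only that a well-calibrated p-value is uniformly distributed between 0 and 1 when the null hypothesis is true) runs many 20-test “projects” on pure noise and counts the false positives.

```python
import random

random.seed(42)

ALPHA = 0.05          # conventional significance threshold
TESTS = 20            # independent comparisons per simulated "project"
SIMULATIONS = 100_000

# Under the null hypothesis, a well-calibrated p-value is uniform on
# [0, 1], so each test has a 5% chance of looking "significant" by chance.
false_positives = 0
projects_with_a_hit = 0
for _ in range(SIMULATIONS):
    p_values = [random.random() for _ in range(TESTS)]
    hits = sum(p < ALPHA for p in p_values)
    false_positives += hits
    projects_with_a_hit += hits > 0

# Roughly one false positive per 20 tests on average...
print(f"average 'significant' results per project: {false_positives / SIMULATIONS:.2f}")
# ...and most projects contain at least one (1 - 0.95**20 is about 64%).
print(f"projects with at least one hit: {projects_with_a_hit / SIMULATIONS:.1%}")
```

In other words, a researcher who fishes through 20 comparisons has better-than-even odds of finding something publishable in pure noise, which is exactly why uncorrected multiple testing is so corrosive.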
And yet, to get research published it is essential to show significance. A paper reporting a statistically significant result is over three times more likely to be published than one which does not [3], even if the negative result has a better theoretical basis and method. This puts incredible pressure on scientists, whose entire careers depend on regular publications, to ensure that their research always turns out significant results. Of the numerous kinds of scientific fraud, randomly testing for significance is perhaps the most widespread but the least insidious. It is also common to structure an experiment in such a way as to guarantee significance, and these methodological flaws often go unscrutinised. In the lab, the pressure to publish can put students and assistants in an uncomfortable position: one survey found that a third had been pressured to support their mentor’s hypothesis even when they felt the data didn’t support it, and around a fifth felt they had been pushed to produce sub-par data themselves [4].
So it’s clear that data is often handled poorly in the lab and that the drive to publish leads to widespread dubious practice. The last safety net against poor science is peer review. Papers are sent out to experts in the field, who are supposed to read them with a highly critical eye and weed out the errors. And yet, when a spoof paper riddled with blatant errors was submitted to 304 journals in 2013, over half of them accepted it for publication [5]. Science is competitive, and the number of papers published every year has skyrocketed thanks to the growing success of open-access journals and online publishing. This means peer reviewers are given more and more papers to review whilst facing increasing demands on their time to do their own research. Consequently, papers receive increasingly cursory reviews, or are even sent to non-experts in the field. It’s no surprise, then, that papers with major flaws slip through the net because they tick the ‘statistics box’ irrespective of the quality of those statistics.
There seem to be few real solutions to this issue. The journal Basic and Applied Social Psychology has moved to an outright ban on the use of p-values in its publications, on the basis that such values can be used as a ‘crutch’ for scientists with weak data [6]. Whilst this might go some way towards addressing the issue, in my view it still fundamentally misunderstands the problem. Statistics have a valid place in science: they are exceptionally useful tools for determining whether your data is consistent with the null hypothesis. Removing those tools doesn’t solve the fundamental problem, which is that many scientists are statistically illiterate.
A better solution that has been proposed is to pre-register hypotheses and methods with the journal before the experiment is done. This guarantees publication even if the data is negative, and removes the incentive to p-hack or to alter the hypothesis to fit the conclusions. In addition, directing funding towards replication efforts would go a long way to identifying irreproducible results that do get published. Finally, official paid peer-review positions would encourage more stringent reviews before studies are accepted for publication, thereby weeding out poor science. By far the most important reform, however, is to change the way statistics are taught to prospective scientists. Until statistics are taught properly across the board, we cannot hope to fix this problem.
The reproducibility crisis is deeply complex, with far more underlying causes than can be addressed in one article. But when we cannot rely on even a majority of published papers being reproducible, it becomes infinitely harder to refute the “anti-vaxxers” of the world, to convince sceptics that anthropogenic climate change is occurring, and to fight the message that “alternative healing” is somehow better than evidence-based medicine. Our over-reliance on statistics, and the way it pushes us to corrupt research, is a fundamental problem that undermines the trustworthiness of science, with far-reaching consequences.
1. Begley CG, Ellis LM. Drug development: raise standards for preclinical cancer research. Nature. 2012;483(7391):531-533.
2. Prinz F, Schlange T, Asadullah K. Believe it or not: how much can we rely on published data on potential drug targets? Nat Rev Drug Discov. 2011;10(9):712.
3. Dickersin K, Chan S, Chalmers TC, Sacks HS, Smith H. Publication bias and clinical trials. Control Clin Trials. 1987;8(4):343-353.
4. Mobley A, Linder SK, Braeuer R, Ellis LM, Zwelling L. A survey on data reproducibility in cancer research provides insights into our limited ability to translate findings from the laboratory to the clinic. PLoS ONE. 2013;8(5):e63221.
5. Bohannon J. Who’s afraid of peer review? Science. 2013;342(6154):60-65.
6. Trafimow D, Marks M. Basic Appl Soc Psych. 2015;37(1):1-2.