(Originally published in Nautilus Magazine)
A few years ago, I became aware of a rarely discussed problem in science: the irreproducibility crisis. A group of researchers at Amgen, an American pharmaceutical company, attempted to replicate 53 landmark cancer discoveries in close collaboration with the authors. Many of these papers were published in high-impact journals and came from prestigious academic institutions. To the surprise of everyone involved, they were able to replicate only six of those papers, approximately 11%. As expected, this observation had wide reverberations throughout the scientific community. The inability to independently replicate scientific findings threatens to undermine public and governmental trust in the institution of science itself.
Yet, as an experimental biologist, my initial reaction to this crisis was dismissive. I reaffirmed to myself that science is self-correcting, and that wrong ideas have a place within scientific discourse. After all, this is the very characteristic that distinguishes science from other human endeavors and gives it its nobility.
But as it turns out, irreproducibility in itself was not the problem—rather, it was its extent, which is becoming more apparent due to the exponential rise in scientific output (over 1.1 million scientific papers were indexed in PubMed in 2015 alone).
The cause of this problem is often misconceived as intentional fraud—which does occur, and is documented by Retraction Watch. But in reality, a majority of irreproducible research stems from a complex matrix of statistical, technical, and psychological biases that are rampant within the scientific community.
Also, the institutionalization of science in the early decades of the 20th century created a scientific sub-culture, with its own reward systems, behaviors, and social norms. The rest of society rarely acknowledges its existence: In popular culture, scientists are portrayed as selfless individuals who are solely motivated by curiosity and a hunger for knowledge. However, the existence of the irreproducibility crisis implies that other motives may also exist.
The first question to ask, in addressing this problem, is: Why do scientists do science?
This question itself is the subject of an entire academic discipline. Sociologists of science have consistently identified “public recognition” as scientists’ main motivating factor. Of course, other drivers do exist, such as puzzle-solving, knowledge building, and financial gain. But recognition seems to represent the common essential driver. Scientists’ behavior on an individual level is consistent with this view. We are obsessed with discovering things first, affiliating with prestigious institutions, publishing in recognized journals, getting cited by the masses, winning awards, and standing on stages. Scientists, like the rest of humanity, crave attention and respect from their peers and role models.
The inability of scientists to admit this humanizing fact is understandable: The implication that their motives are self-serving may diminish the very nobility of science as a pursuit of knowledge. However, as the well-recognized sociologist Robert Merton pointed out, scientists’ need for recognition may stem from their need to be assured that what they know is worth knowing, and that they are capable of original thought. According to this view, recognition may play a role in boosting intellectual confidence, which is essential for discovery and innovation.
The true nature of scientific motivation is also evident in the incentive systems built to ensure that scientists are rewarded with what they seek. These rewards often come in some form of validation, such as awards, titles, and press coverage, which are then translated by the institution of science into career advancement and opportunities for greater prestige. Guidelines for promotion in several academic centers where I have worked have listed “Broader reputation than local area” as one of two promotion criteria for associate professors. In other words, the promotion of an assistant professor to an associate professor requires them to be famous within their field.
Currently, publishing in prestigious journals and getting extensively cited represent the height of recognition in the scientific community. These two metrics (publication and citation) imply quality, but have long been shown to be hollow. Papers in high-impact journals, for example, suffer from irreproducibility at almost the same rate as papers in lower-impact journals. And when high-profile papers are retracted, they are usually cited considerably before the retractions occur, and even after!
The inconvenient truth is that scientists can achieve fame and advance their careers through various accomplishments that do not incorporate or prioritize the quality of their work. If recognition is not based on quality, then scientists will not modify their behaviors to select for it. In the culture of modern science, it is better to be wrong than to be second.
This does not mean that quality is completely neglected. For example, the Nobel Prize—the most coveted form of recognition—is associated with scientific discoveries of the highest caliber. But for the tens of thousands of scientists fighting over shrinking research budgets, winning much less visible awards becomes an obsession needed for promotions and grants.
You might think that there’s a strong counter-balance to this. The majority of the assessment metrics for quality in modern science are, after all, based on citations, such as impact factor and h-index. Conceptually, citations represent a good approximation of quality. However, they are greatly influenced by the sociological dynamics of the scientific community and can thus be gamed. For example, peer reviewers can ask authors to cite their papers as an implied condition for favorable critique. Also, journal editors encourage citation of relevant papers published in the same journal to drive up its impact factor. Interestingly, savvy scientists often add citations to their papers preemptively to appease potential reviewers and editors.
The gaming of those metrics should not be viewed as merely a consequence of a flawed publishing model but as a reflection of academic motives. So introducing new publishing platforms, or changes in the peer-review process (such as the innovations pioneered by F1000 and PLOS ONE), although very important and timely, may not lead to drastic changes in scientists’ behaviors and thus may not improve reproducibility. That will only happen when the behavior’s outcome becomes more closely aligned with the coveted reward: recognition.
To realign the craving for recognition with good science, we need quality metrics independent of sociological norms. Above all, objective quality should be based on the concept of independent replication: A finding would not be accepted as true unless it is independently verified. This conceptual distinction between replicated and un-replicated studies will change how science is reported and discussed.
Quality metrics based on such objective criteria will increase the visibility of both strong and weak papers. They will incentivize scientists to only publish findings they have confidence in, and discourage publishing for the sake of publishing.
The impact of this new system would also be far-reaching. Institutions would want to hire faculty with stellar quality records to build trust with industrial and governmental funders. Funding agencies would be inclined to support grants whose hypotheses are built on strong premises and are submitted by investigators and institutions both known for quality. The public would become more skeptical of un-replicated science, preventing the wide adoption of false scientific ideas.
Of course, the transition to an institutionalized process to assess replication-based quality will require structural changes. First, scientists need to be incentivized to perform replication studies, through recognition and career advancement. Second, a database of replication studies needs to be curated, which will require direct involvement from the scientific community. Third, mathematical derivations of replication-based metrics need to be developed and tested. Fourth, the new metrics need to be integrated into the scientific process without disrupting its flow.
It is our responsibility as scientists to create transparency on how academic science is incentivized, produced, and evaluated. As Brian Nosek and colleagues from the Center for Open Science once said, “Openness is not needed because we [scientists] are untrustworthy; it is needed because we are human.”