Two laboratories thought they’d found the perfect workaround to the ethically thorny issue of using stem cells from human embryos for research. In 1999 and 2000, they reported that they’d figured out how to convert bone marrow cells into many different kinds of tissues.
The field went wild. Within just a few years, by biologist Sean Morrison’s count, hundreds of labs reported exciting results where bone marrow cells “transdifferentiated” into many useful varieties. Scientists scrapped their ongoing research plans to dive into this rapidly growing field.
But was it real? Amy Wagers and colleagues at Stanford University decided to find out. They ran a series of carefully crafted experiments and concluded in 2002 that transdifferentiation of bone-marrow cells essentially didn’t exist (beyond the cells’ well-known ability to change into various types of blood cells).
The entire endeavor popped like a soap bubble.
“This episode illustrated how the power of suggestion could cause many scientists to see things in their experiments that weren’t really there and how it takes years for a field to self-correct,” Morrison wrote in an editorial in the journal eLife.
Morrison, a Howard Hughes Medical Institute investigator at the University of Texas Southwestern Medical Center, isn't simply concerned about the effort wasted in this one line of research. He worries that problems such as this pervade the biomedical literature and contribute to what's become known as the "reproducibility crisis."
I’d been covering science for 30 years for National Public Radio, and as stories such as these began to accumulate, I decided to spend a year systematically investigating the problem of poor-quality science. I describe what I learned in my new book, Rigor Mortis: How Sloppy Science Creates Worthless Cures, Crushes Hope, and Wastes Billions. And though the problems I uncovered made it clear that the biomedical research system faces serious challenges, I was also surprised and inspired by the openness with which almost everyone I interviewed was willing to talk about the crisis, its origins, and possible solutions.
But which half?
By some accounts, as much as half of what gets published in the biomedical literature is deeply flawed, if not outright false. Of course, we should not expect perfection from research labs. Science is by its very nature an error-prone enterprise. And so it should be. Safe ideas don’t push the field forward. Risk begets reward (or, of course, failure).
Only a few studies have tried to measure the magnitude of this problem directly. In one, scientists at the MD Anderson Cancer Center asked their colleagues whether they’d ever had trouble reproducing a study. Two-thirds of the senior investigators answered yes. Asked whether the differences were ever resolved, only about one-third said they had been. “This finding is very alarming as scientific knowledge and advancement are based upon peer-reviewed publications, the cornerstone of access to ‘presumed’ knowledge,” the authors wrote when they published the survey findings.
In another effort, the American Society for Cell Biology surveyed its members in 2014 and found that 71% of those who responded had at some point been unable to replicate a published result; 40% of the time, they reported, the conflict was never resolved. Two-thirds of the time, the scientists suspected that the original finding had been a false positive or had been tainted by “a lack of expertise or rigor.” The society adds an important caveat: of the 8,000 members surveyed, it heard back from only 11%, so its numbers aren’t conclusive. That said, Nature surveyed more than 1,500 scientists in the spring of 2016 and saw very similar results: more than 70% of scientists who responded had tried and failed to reproduce an experiment, and about half of those agreed that there’s a “significant crisis” of reproducibility. Only 10% said there was no crisis at all, or that they had no opinion.
“I don’t think anyone gets up in the morning and goes to work with the intention to do bad science or sloppy science,” says Malcolm Macleod of the University of Edinburgh. He has been writing and thinking about this problem for more than a decade. He started off wondering why almost no treatment for stroke has succeeded (with the exception of the drug tPA, which dissolves blood clots but doesn’t act on damaged nerve cells), despite many seemingly promising leads from animal studies.
As he dug into this question, he came to a sobering conclusion. Unconscious bias among scientists arises at every step of the way: in choosing the number of animals for a study, in deciding which results to include and which to toss aside, and in analyzing the final results. Each step of that process introduces considerable uncertainty. Macleod said that when you compound those sources of bias and error, only around 15% of published studies may be correct. In many cases, the reported effect may be real but considerably weaker than the study concludes.
These problems are rarely deliberate attempts to produce misleading results. Unconscious bias, like the wishful thinking that drove the transdifferentiation frenzy, is a common explanation. That’s partly a consequence of human nature.
“We might think of an experiment as a conversation with nature, where we ask a question and listen for an answer,” Martin Schwartz of Yale University wrote in an essay titled “The Importance of Indifference in Scientific Research,” published in the Journal of Cell Science.
This process is unavoidably personal because the scientist asks the question and then interprets the answer. When making the inevitable judgments involved in this process, Schwartz says, scientists would do well to remain passionately disinterested. “Buddhists call it non-attachment,” he wrote. “We all have hopes, desires and ambitions. Non-attachment means acknowledging them, accepting them and then not inserting them into a process that at some level has nothing to do with you.”
That is more easily said than done. As physicist Richard Feynman famously told a graduating class at Caltech as he talked about the process of science, “The first principle is that you must not fool yourself—and you are the easiest person to fool.”
235 reasons why
And there’s no shortage of ways to go astray. Surveying papers from biomedical science in 2010, David Chavalarias and John Ioannidis cataloged 235 forms of bias, which they published in the Journal of Clinical Epidemiology. Yes, 235 ways scientists can fool themselves, with sober names such as confounding, selection bias, recall bias, reporting bias, ascertainment bias, sex bias, cognitive bias, measurement bias, verification bias, publication bias, observer bias, and on and on.
But though biases may typically be unconscious, this is not simply a story of human nature. Scientists are also more likely to fool themselves into believing splashy findings because the reward system in biomedical research encourages them to do so.
Some of the pressure results from the funding crunch facing biomedical research. The National Institutes of Health budget doubled between 1998 and 2003, leading to a vast expansion of the enterprise. That included a 50% increase in biomedical lab space at universities. But in 2003, Congress stopped feeding the beast. Adjusting for an inflation rate calculated for biomedical research and development, funding declined by 20% in the following decade. That pressure means that less than one in five grants gets funded. And that creates an incentive for scientists to burnish their results.
“Most people who work in science are working as hard as they can. They are working as long as they can in terms of the hours they are putting in,” says Brian Martinson, a sociologist at HealthPartners Institute in Minneapolis. “They are often going beyond their own physical limits. And they are working as smart as they can. And so if you are doing all those things, what else can you do to get an edge, to get ahead, to be the person who crosses the finish line first? All you can do is cut corners. That’s the only option left you.”
Martinson was a member of the National Academies of Sciences, Engineering, and Medicine committee that in April 2017 published a report on scientific integrity, updating a report produced 25 years earlier. According to committee member C. K. Gunsalus, the previous study focused on the “bad apples” in research—those few scientists who were actively engaging in inappropriate behavior. The 2017 study looks more closely, as Gunsalus puts it, at the barrel itself and the barrel makers: the incentives that are driving scientists toward conclusions that don’t survive the test of time.
One of the central problems revolves around publishing. Top journals want exciting findings to publish, because hot papers bolster their “impact factor,” which ultimately can translate into profits. University deans, in turn, look to those publications as a surrogate for scientific achievement.
Veronique Kiermer served as executive editor of Nature and its allied journals from 2010 to 2015, when this issue came to a boil. She’s dismayed that the editors at Nature are essentially determining scientists’ fates when choosing which studies to publish. Editors “are looking for things that seem particularly interesting. They often get it right, and they often get it wrong. But that’s what it is. It’s a subjective judgment,” she told me. “The scientific community outsources to them the power that they haven’t asked for and shouldn’t really have.” Impact factor may gauge the overall stature of a journal, she said, “but the fact that it has increasingly been used as a reflection of the quality of a single paper in the journal is wrong. It’s incredibly wrong.”
The last experiment
Sometimes gaming the publication system can be as easy as skipping a particular experiment. Olaf Andersen, a journal editor and professor at Weill Cornell Medical College, has seen this type of omission. “You have a story that looks very good. You’ve not done anything wrong. But you know the system better than anybody, and you know that there’s an experiment that’s going to, with a yes or no, tell you whether you’re right or wrong,” Andersen told me. “Some people are not willing to do that experiment.” A journal can crank up the pressure even more by telling scientists that it will likely accept their paper if they can conduct one more experiment backing up their findings. Just think of the incentive that creates to produce exactly what you’re looking for. To Kiermer, the former Nature editor, “That is dangerous. That is really scary.”
Something like that apparently happened in a celebrated case of scientific misconduct in 2014. Researchers in Japan claimed to have developed an easy technique for producing extraordinarily useful stem cells. A simple stress, such as giving cells an acid bath or squeezing them through a tiny glass pipe, could reprogram them to become amazingly versatile. The paper was reportedly rejected by the journals Science, Nature, and Cell.
Undaunted, the researchers modified it and then resubmitted to Nature, which published it. Nature won’t say what changes the authors had made to enable it to pass muster on a second peer review, but the paper didn’t stand the test of time. Labs around the world tried and failed to reproduce the work (and ultimately suggested how the original researchers may have been fooled into believing that they had a genuine effect). RIKEN, the Japanese lab where the research was done, retracted the paper and found the first author guilty of scientific misconduct. Her widely respected supervisor committed suicide as the story unfolded in the public spotlight.
There is no question that the pressures built up in the system are having a corrosive effect on the output from scientific labs. But Henry Bourne, an emeritus professor at the University of California, San Francisco, also believes scientists themselves need to change. “I think that is what the real problem is—balancing ambition and delight,” he told me. Scientists need both ambition and delight to succeed, but right now the money crunch has tilted them far too much in the direction of personal ambition.
“Without curiosity, without the delight in figuring things out, you are doomed to make up stories,” Bourne said. “Occasionally they’ll be right, but frequently they will not be. And the whole history of science before the experimental age is essentially that. They’d make up stories, and there wouldn’t be anything to most of them. Biomedical science was confined to the four humors. You know how wonderful that was!” Hippocrates’s system based on those humors—blood, yellow and black bile, and phlegm—didn’t exactly create a solid foundation for understanding disease.
Bourne argued that if scientists don’t focus on the delight of discovery, “what you have is a whole bunch of people who are just like everybody else: they want to get ahead, put food on the table, enjoy themselves. In order to do so, they feel like they have to publish papers. And they do, because they can’t get any money if they don’t.” But papers themselves don’t move science forward if they spring from flimsy ideas.
Fixing this will also require a new attitude among deans, funding panels, journal editors, and tenure committees, who all have competing needs. Nobody is particularly happy with the current state of affairs, but the situation is unlikely to correct itself. Perhaps it is time for leading scientists, heads of scientific societies and academies, university presidents, journal editors, funding agency leaders, and policy makers to come together and work toward specific policies and practices that can begin to free scientists from the perverse and baked-in incentives of today’s scientific culture—to free them to put the delight of discovery above the ambition to get yet another grant and add yet another publication to their curriculum vitae.
Richard Harris is a science correspondent at NPR News.