Advances in our understanding of the genome suggest that for most public health challenges it may not be.
The future of health care, we are increasingly promised, rests on “precision” genomic medicine, based on the idea that what we are is in our DNA. The pervasiveness of this belief can be seen in the National Institutes of Health’s Precision Medicine Initiative, which promises to bring “precision medicine to all areas of health and healthcare on a large scale.” That promise in turn rests on the view that genes cause disease and that identifying them will allow doctors to predict an individual’s future disease and customize treatment precisely. The promise of “precision” suggests that we need exhaustive enumeration of genetic variants, requiring essentially open-ended projects with enormous samples—“Big Data” collected explicitly without being based on specific hypotheses.
This research paradigm is conceptually locking up ever more of the nation’s investment in biomedical science. It’s fair to ask, however, what justifies the underlying vision. Does what we’ve learned about genetics in the past half century support this promise of precision? If not, are there better ways to direct our finite fiscal and intellectual research resources?
Many diseases and other human traits are predominantly caused by single genes. Indeed, it was such traits in pea plants that led to our basic understanding of genes. But a century’s research trajectory has now shown that most traits, including common diseases such as diabetes, autism, and schizophrenia, or even just height, are due to the effects of not one but tens or even hundreds of genes, and every case is different. We can understand why this is so, but how should research be redirected as a result of this knowledge?
Ideas as well as organisms evolve. What we think today is a product of what was thought yesterday. For historical reasons, genes have become the iconic idea of the causes of our being: every day we hear that something is “in our DNA.” It is but a small step to the promise of precision medicine. But as we’ll see, the science itself tells us plainly what the politics of science hyperbole and science funding keep hidden: genes are important, but precision medicine is likely a false icon.
Crossing Mendel with Darwin
Let’s begin at the beginning. In 1856, a Moravian monk named Gregor Mendel set out to improve the yield of pea plants. He chose to work with pea traits that were qualitative, that is, that took on distinct states (for example, green/yellow or smooth/wrinkled) that didn’t change over generations. He knew that other traits were less predictable, but the ones he chose bred “true,” which made them reliable for farmers.
In many ways we are what we learn, and when Mendel was a student in Vienna, he heard lectures on a new theory that all chemical elements were multiples of the element oxygen. I think this led him to expect similar units of inheritance, and to attribute the variation he studied to discrete units of causation that he, too, called “elements.”
Science builds on what history provides, and Mendel’s work provided the basis for searching for these units of inheritance, which were later named “genes.” It is important that they were seen as units, because that expectation allowed the early geneticist Thomas Hunt Morgan and others to show that they corresponded to specific locations along chromosomes in the nucleus of cells. That in turn led others to show that genes were codes for specific proteins, the fundamental causal units of life. The code works because genes are strings of individual elements called nucleotides, of which there are four different kinds, whose sequential order in specific locations on a chromosome specifies, among other things, the order of a string of amino acids that will be assembled in the cell to make a specific protein. Variation in these codes arises among humans and other organisms because of mutation, changes in the nucleotide string that occur from time to time.
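The coding logic just described can be sketched in a few lines. This is a minimal, illustrative sketch rather than real bioinformatics: the codon table here is truncated to five of the 64 real codons, and the sequence is invented.

```python
# Illustrative sketch: how a string of nucleotides encodes a protein.
# The real genetic code has 64 codons; only five appear here.
CODON_TABLE = {
    "ATG": "Met",   # also the "start" codon
    "TGG": "Trp",
    "GAA": "Glu",
    "AAA": "Lys",
    "TAA": "Stop",
}

def translate(dna: str) -> list[str]:
    """Read the sequence three nucleotides (one codon) at a time."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein

print(translate("ATGGAAAAATGGTAA"))  # ['Met', 'Glu', 'Lys', 'Trp']
```

Because the mapping runs from sequence to protein, a change of even one nucleotide—a mutation—can substitute one amino acid for another or truncate the protein early, which is how the heritable variation discussed here arises.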
But there was already a fly in the causative ointment, and that fly was Mendel’s British contemporary Charles Darwin. His idea was that organisms evolved in a gradual flow of continuous, quantitative change, in which the contributions of the traits of parents blended to form the traits of their offspring. That seemed incompatible with the apparent permanence and qualitative nature of Mendel’s elements, which definitely did not blend.
By the 1930s, biologists had come to understand Darwin’s gradualism as the net effect of contributions of countless separate, but individually very small, Mendelian effects. This synthesis opened the door to modern genetic investigation. Decades of success at finding genes, their specific locations on chromosomes, their coding functions, and the regulation of their expression—when and in what cells a gene is used—followed. An important result of these advances is the view that what we are is affected by our genes, our individual sets of these causative points. Technologies based on this assumption have enabled us to do genomewide mapping, that is, using various statistical methods to search the genome—our 23 pairs of chromosomes (one set inherited from each parent)—for locations in which DNA sequence variation among individuals is associated with a trait of interest, such as a specific disease, or range of blood pressure, height, or even some purported measures of behavior or intelligence.
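The statistical core of genomewide mapping can be illustrated with a toy case-control comparison. This is a hypothetical sketch with invented counts, not a real study: at each variant site, one asks whether allele counts differ between people who have the trait (“cases”) and people who do not (“controls”), using a test statistic such as chi-square.

```python
# Toy sketch of the association idea behind genomewide mapping.
# All counts are invented for illustration.

def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a 2x2 allele-count table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Counts of (variant allele, other allele) at two hypothetical sites:
sites = {
    "site_1": {"cases": (60, 40), "controls": (40, 60)},  # frequencies differ
    "site_2": {"cases": (50, 50), "controls": (51, 49)},  # essentially identical
}

for name, counts in sites.items():
    a, b = counts["cases"]
    c, d = counts["controls"]
    # A large statistic (site_1 gives 8.0) suggests association;
    # a tiny one (site_2 gives about 0.02) suggests none.
    print(name, round(chi_square_2x2(a, b, c, d), 2))
```

Real studies run this kind of test at millions of sites across the genome, which is why they demand enormous samples and very strict significance thresholds—the Big Data logic described above.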
Interpreted through the lens of today’s computational Big Data worldview, mapping has led to the belief that wholesale enumeration of these causal points, on a genomewide scale, will lead us out of the wilderness of causal incomprehension and into an era of precise understanding of genes and their actions. Supported by massive funding, we geneticists have gotten our wish: a tsunami of data that will, we still insist, be the source of the “precision” in precision medicine.
But I wonder if this may more properly be viewed as a failed success. That’s because the data are revealing that what we wanted to find, and thought would be simple enough, is generally not how life works. To see what I mean, it will help to think figuratively, and perhaps in some ways literally, in terms of causal dimensions.
Genetic variation in any species, including humans, is the result of population history. That history is a process of descent, with genetic transmission that connects individuals and their functions generation after generation. An essential aspect of population history is heritable variation, whose ultimate source is genetic, that is, mutations in our DNA.
Because genes have specific locations on chromosomes, it is tempting to liken genetic variation to points of causal light that are either on or off, green or yellow. This was essentially Mendel’s bright idea in choosing the traits he would study in peas. That assumption lets us focus on each trait’s causal point and ignore the rest of the genome. But for the complex, later-onset, and environmentally affected diseases, whose genetic basis is the subject of the big-data swoon today, that assumption usually doesn’t hold. Obvious examples of non-Mendelian, non-discrete traits are heart disease, obesity, height, weight, intelligence, schizophrenia, blood pressure, late-onset diabetes—and perhaps even the tendency of some of us to write cranky assessments of the situation. Even though parents and offspring resemble each other for such traits to some degree, none have Mendelian two-state point causes.
Furthermore, treating genes as individual points ignores important aspects of how genes are used. The human genome is home to tens of thousands of different genes, but that’s not all: short DNA sequences near each gene control that gene’s expression, that is, when and in what cells the gene is expressed. These sequences are binding sites for the assembly of tens of regulatory proteins. The use of a given gene depends on the arrangement of these nearby sites along the line of nucleotides that is a chromosome, and that adds a linear dimension to what otherwise might seem to be a string of independent causal points. In fact, chromosomes contain many other types of sequence strings, whose functions depend on their location and arrangement along the chromosome.
A rather amazing fact is that all of our cells contain the same genome, which we inherited from our parents, but we are differentiated organisms with many different tissues and organs, and even each tissue does different things under different conditions. That means that gene expression patterns vary cell by cell and tissue by tissue: a gene isn’t always just “on” or “off,” but must instead respond to context and circumstances that change. This adds a time dimension of genetic causation. But there is more.
Population history generates webs of redundancy in genetic functions, meaning that the causal space is so complex that many different genetic pathways, involving different genes or patterns of gene usage, can achieve similar results. Indeed, perhaps the most important finding of gene mapping studies so far is the extent to which many different genotypes yield similar traits such as stature, blood pressure, diabetes, or intelligence. In general, no two individuals have the same trait for the same genomic reason. The contributions of individual causally related variants are elusive, because the variant’s effects are typically very small, the variant is rare, or both, and these vary among human populations.
Further complicating this picture is that mutations arise in the cells of each of the tissues in our body during our lives. These mutations are transmitted within the tissue when the cells divide, and they can affect the cells’ behavior, sometimes quite seriously. Cancer is the clearest example. However, these are called somatic mutations because though they can affect our traits, they are not in the germline (sperm or egg cells) transmitted from parent to offspring. Yet it is that inherited genome sequence on which mapping is typically based because it is presumed to apply to all of a person’s cells.
Genomes have codes for a repertoire of regulatory genes, whose coded proteins bind DNA near some other gene to affect that gene’s expression. But this is not a one-for-one kind of control. Instead, a regulatory protein is typically used in many different contexts. What makes regulation gene-specific is the combination of these factors that binds to nearby DNA to regulate the gene’s usage. This is like using some common keywords together to get a combination that yields a precise hit in a Google search. And since the regulatory genes are themselves each coded in a different place in the genome, their assembly in locally specific combinations elsewhere in the genome requires them to navigate to get there, and that makes genetic action three-dimensional within the cell. We may also think of action by combination as adding even another, rather abstract logical dimension to the genetic causal landscape.
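The “action by combination” logic just described—common regulatory proteins reused across many genes, with the particular combination making control gene-specific—can be sketched with simple set logic. Gene and factor names here are invented for illustration.

```python
# Sketch of combinatorial gene regulation: each gene fires only when
# its particular combination of regulatory factors is bound nearby,
# even though each factor is reused across many genes.
# All names are invented for illustration.

REQUIRED_FACTORS = {
    "gene_A": {"factor_1", "factor_2"},
    "gene_B": {"factor_2", "factor_3"},
    "gene_C": {"factor_1", "factor_3", "factor_4"},
}

def expressed_genes(factors_present: set[str]) -> set[str]:
    """A gene is expressed only if all of its required factors are present."""
    return {gene for gene, needed in REQUIRED_FACTORS.items()
            if needed <= factors_present}

# factor_2 is shared by gene_A and gene_B, but the combination decides:
print(sorted(expressed_genes({"factor_1", "factor_2"})))              # ['gene_A']
print(sorted(expressed_genes({"factor_1", "factor_2", "factor_3"})))  # ['gene_A', 'gene_B']
```

As in the Google-search analogy, no single keyword is specific, but the combination is—and different cellular contexts supply different combinations.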
More knowledge, less precision
Despite a steady drumbeat of promises linking genomics to precision medicine, mapping studies show, in exquisitely clear detail, the opposite of what would be needed to fulfill those promises: rampant causal imprecision. Furthermore, we are seeing only the proverbial tip of the genomic causal iceberg because mapping done to date has mainly involved Europeans. Yet we know very well that mutations are always arising in every local area, and those few that spread to geographically distant locations take countless generations to do so. That means that much if not most of the causation of the same trait will vary among populations: what we find in Europe will apply only partly to other places, meaning that even if genomic Big Data prediction were to work, separate large-scale mapping studies would be required in each place.
However, even this is far from the most disturbing aspect of genetic complexity and its unpredictability. The deluge of variation specifically identified by mapping typically accounts for only a fraction—usually a small fraction—of the trait’s overall heritability, that is, the estimated part of its overall causation that is genetic. The unaccounted part involves the genetic variants with weak or rare individual effects that I referred to above. This “leaf litter” of countless unidentifiable individually minor variants will vary among individuals and populations. Identifying more of these sites is a typical rationale for requesting funds for expensive mapping studies involving hundreds of thousands of people. Increasing sample sizes and numbers of studies will mainly just add to the inexhaustible cacophony of variation that we find.
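The gap just described can be made concrete with a toy arithmetic sketch. The numbers are invented, though they are of the rough magnitude reported for traits like height: a heritability estimate near 0.8, with each mapped variant explaining only a tiny sliver of trait variance.

```python
# Toy "missing heritability" arithmetic. All numbers are invented
# for illustration of the magnitudes involved.

heritability = 0.80            # estimated genetic share of trait variance
num_mapped_variants = 700      # variant sites found by mapping
variance_per_variant = 0.0003  # tiny share of variance explained by each

explained = num_mapped_variants * variance_per_variant
print(f"explained by mapped variants: {explained:.2f}")                 # 0.21
print(f"unaccounted 'leaf litter':    {heritability - explained:.2f}")  # 0.59
```

The arithmetic shows why ever-larger samples mainly add more tiny contributors: even hundreds of mapped sites, each explaining a sliver of variance, leave most of the estimated genetic causation unaccounted for.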
We should not be surprised by this. A major reason for the plethora of rare and weak effects is not the Darwinian one of relentless competition and hence precisely eagle-eyed natural selection. That is a convenient ideology that doesn’t fit the reality. Natural selection quickly favors strong positive effects that assure survival, and removes harmful ones, but is hard-pressed to discriminate among a multitude of tiny effects of less than existential significance. Instead of competition, the typically weak effects of individual mapped sites are more likely due to what has passed the screen of cooperation among gene products during the development of the embryo: by and large, what is born healthy already must basically function properly.
The variants that are survivable and hence available for mapping to find will generally have the residual weak genetic effects that are compatible with life. And of course the plethora of weak effects is by far most of what has been found. And you might notice that so far I have not even mentioned the environment and our lifestyles, which can hardly be measured accurately and yet which affect many if not most of our traits.
Could mapping now mainly be a very expensive exercise in chasing rainbows? It has successfully revealed something about biological causation: that it goes well beyond arrays of genes along a chromosomal line acting as individual, independent point causes with nearby regulatory elements. These complexities throw into question the very idea of genomics-based precision medicine, a doubt fundamentally related to the concept of precision itself, about which more below.
And yet there are still deeper challenges. What connects and coordinates the tens to thousands of factors that contribute to a trait we are trying to understand? The mechanisms underlying the complex interactions among these factors remain largely unaccounted for. I think this at least raises the possibility that our causal landscape has some additional unrecognized dimensionality.
Spooky action at a (very short) distance
Albert Einstein famously couldn’t accept that physical effects could occur instantaneously across long distances in space—an idea called “entanglement.” He called it “spooky action at a distance.” However, he was wrong: in physics, distant effects really are important. In genetics, too, we can ask how interactions are accounted for over the very short distances across the genomic causal landscape.
Gene action is organized into networks, in which one gene activates or inhibits one or more other genes, which in turn affect yet others. In any cell, at any time, many networks are active, with thousands of genes being expressed. Each of these genes’ local chromosome region is bound by an appropriate combination of regulatory proteins. Do these molecules simply dart randomly around so rapidly and at such suitable concentrations that they all more or less automatically find each other fast enough and in the right combinations to trigger the right gene expression for that cell, just by chance? Or might something else be needed for our understanding, some other dimensional glue, to attract each complex of factors to its appropriate place and time? If such phenomena exist, they will not be found by enumerating countless weak variants in endless megastudies. A few examples of communication at a distance will show what I mean.
Our genomes contain hundreds of genes that enable us to smell different odors. These genes are located in clusters of varying size, scattered across almost all of our chromosomes. Yet, despite having hundreds to choose from, each odor-detecting cell in our nose uses just one of these genes. An odor molecule sniffed in will be detected only by cells expressing some particularly suitable detector gene, which sends a very specific “I smell it!” message to the brain, and it is the combination of signals that makes “lemon” something we can identify. What kind of communication within the nucleus selects just one of these odor-detecting genes in each cell in the lining of the nose, shutting down all the others?
Another example of currently unexplained communication in our cells is that when a gonadal cell is about to divide to produce a sperm or egg cell, the two copies of each chromosome (one that was inherited from each parent) line up with each other, and they then separate as the cell divides, so that the resulting cells each contain just one copy. How do they find each other in the nucleus, to align in this way?
Not all gene regulatory networks work within individual cells. Complicated organisms like us exist because there is communication among the cells in and between our different tissues and organs. The way this works is that cells monitor their external environment, detecting and responding to signal molecules that were produced by other cells in the body. That is how hormones work. The communication is two-way: receiving cells also send molecules to signal other cells, and cells may even monitor the relative amounts of different signal molecules that are passing by. This means that gene action, and its results, cannot be understood from looking at gene usage in the dimensions within cells alone. This may not be spooky, but it is action at a distance.
How do thousands of different genes scattered across the chromosomes become activated at any given time, based on signals they generate or detect from outside? The nucleus may seem like a bowl of spaghetti in which the many chromosomes are freely floating around on long, tangled strands. But clearly the thousands of molecules and their combinations avoid being so tangled as to interfere with their local production and assemblies, or with their impressive ability to vary within cells as circumstances change and among cells of different types that, nonetheless, are doing this with copies of the person’s same genome. Something must be organizing this four-dimensional space-and-time pattern whose changeability shows that it must include contingency standby factors.
Could something other than sequences of gene-by-gene activation—some as yet unidentified causal dimensions—be involved? It is at least fair to suggest that our deeply entrenched enumerative view of genomics as the means of revealing causality is blinding us to other possibilities, by locking us in research pathways, and their associated policy implications and promises, that may be past their prime.
We naturally hunger for frameworks that explain the world, but these often become dogmas. A dominant genetic thought-mode today rests on the enumeration of point causes. Yet, and despite the gravitational pull of its history, I’ve tried to use the idea of complex causal dimensions to show there are reasons to doubt that an understanding of biological traits, much less their precise prediction, can be reached by racing down the enumerative Mendelian track that dominates genetic science today. We should use what tools we have to be as precise as possible, of course, but that’s not what the word “precision” really means, as we should understand if better policy is our goal.
The classical criterion for genetic control of a trait was its presence in families in specific Mendelian patterns. Those inheritance patterns distinguish genes from other factors in our lives. But adequate samples of multiply affected families are hard to come by, and one of the rationales for Big Data mapping was that common diseases are caused by common variants with strong effect that could be identified in huge population samples without the need for chasing down sets of families.
There was never a good reason to believe that this would work, and mapping has clearly confirmed that simple genetic causes are not generally responsible for common diseases. Indeed, in an ironic twist, some mapping investigators are now defending megascale projects by stressing the importance of searching for rare, not common, variants with strong enough effects to be detected, believe it or not, in occasional families. There will of course be some successful searches, for reasons that were clear a century ago, because some rare genetic variants do have strong effect on their own. But by far most cases of common diseases are not each caused by a different single-gene effect.
There’s a deeper point, too. The idea of precision implies that there is a truth out there, and as we make our measurement instruments better, our estimates will approach that truth ever more closely. That works when what we want to measure has a true value the way, say, the speed of light does. Could the comparable medical fact be that a genetic variant has some true probability of causing a disease, rather than always doing so? Unfortunately, not even this is so, and for a subtle reason.
We can see this by asking why, if the variant is a cause, doesn’t the disease always result when the variant is present, rather than only with some probability? The reason is that the outcome depends not on that variant alone, but on the combination of many additional factors that I’ve discussed that are also present. Not only is that combination different for each person, but many of those factors, such as mutation and lifestyles, will arise in the future, and are not predictable, not even in principle.
This clearly means that we have no way to know how imprecise our predictions are. And that in turn shows why the basis for the promised genomic precision medicine simply does not exist, no matter how much we might wish otherwise.
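The argument above can be put as a toy model—entirely invented numbers, not real risk estimates: the same variant is associated with different outcome probabilities depending on the rest of a person’s causal background, so there is no single “true” risk for our instruments to converge on.

```python
# Toy model of why a variant has no single "true" probability of
# causing disease. All numbers are invented for illustration.

def disease_risk(has_variant: bool, background: float) -> float:
    """The variant adds a fixed increment, but the causal background
    (other genes, lifestyle, future mutations) sets the baseline,
    and that background differs for every person."""
    return min(1.0, background + (0.10 if has_variant else 0.0))

# Same variant, three hypothetical people, three different "risks":
for background in (0.02, 0.30, 0.85):
    print(f"background {background:.2f} -> risk {disease_risk(True, background):.2f}")
```

And since part of each person’s background (future mutations, future lifestyle) does not yet exist, the input to any such model is itself unknowable in advance—which is the essay’s point.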
It is reasonable to ask whether the complexity of genetic causation is a new discovery. Let’s look back nearly two decades, to the year 2000. In two books widely read by both scientists and the general public, Richard Lewontin and Evelyn Fox Keller noted the iconic status that “genes” had attained, and they warned about oversimplifying or centralizing the role of genes in life, ignoring or downplaying both the organisms that contain genes and the environments in which organisms must function. In that same year, the geneticist Joseph Terwilliger and I cautioned about these issues specifically in the context of the then-blooming romance with genomewide mapping.
Even earlier, in 1993, I concluded my own book by noting “enumeration is … a rapidly obsolescing way to think about the relationship between genotype and phenotype.” As I then said, the ultimate goal should be synthesis, and I’ve tried here to explain the nature of the genomic causal landscape that we need to confront if we are to go beyond enumeration.
The sometimes cosmic-scale complexity of the possible interactions within and among genomic dimensions is out there for all to see. Thoughtful geneticists understand these things perfectly well. So, on what basis can we promise precision predictability from DNA sequences?
Unfortunately, much of the answer is that the reality of improving the yield of publicly sponsored science is about the money, not the science. Underlying that reality is that when scientists must get their very salaries, and universities their operating funds, from individual grants, a conservative, defensive, safe, assembly-line, and eventually sclerotic system that always promises future miracles is as inevitable as sunrise. It’s what we have today.
For more, and more flexible, progress to be made, research resources should be moved away from Big Data studies that are too large, open-ended, and entrenched. Major funding change always meets staunch resistance, of course, but there should be no welfare system for geneticists if we refuse one for coal miners. Scientists are capable people who can, and would, adapt to a system that funds more focused and innovative ideas. The same resources could be applied in better ways to ease human suffering more directly and increase the chances of truly innovative discovery.
For starters, a great many life-devastating diseases really are genetic in every sense of the word. Cystic fibrosis, muscular dystrophy, and Huntington’s disease are just a few well-known examples. We should make intensive investment in genetic-engineering technologies to fix such problems. When that has succeeded, as I think it often will, the engineering methods could then be applied to weaker, more subtle genetic effects, although their multiplicity and individually unique combinations mean that may have less probability of success. Meanwhile, we already know that for many or most common, complex diseases, by far the best medicine is prevention, which is about lifestyles, and that is where urgent investment should be made. Without toxic lifestyles, the remaining cases really would be genetic.
Science is hard, all the more so when problems seem urgent, as in our natural desire to prevent or treat disease. We can’t just go on Amazon and order discoveries of a fundamentally new sort that will revolutionize the future of medicine. If changes in the research funding system relieved investigators from the relentless scramble for funds, they could be freed to do truly creative work. If that creativity were informed by the challenges and opportunities in the clinical setting, rather than by the technological imperative to keep sequencing genes, then the grip of Mendelian fundamentalism might be loosened. A way should be found to shift funding toward more, longer, even if smaller grants, to support projects where scientific creativity brings together learning at the bench and the bedside.
The resulting freedom would enable projects to be more diverse and less safely me-too. Most ideas will fail, because that’s how science is. But some will almost certainly succeed, and yield bigger rewards for human knowledge and well-being than churning out more of the same. Change can be difficult, but life itself has evolved through change. That is a lesson we could apply to the evolution of the research enterprise, too.
Kenneth M. Weiss is Evan Pugh Professor Emeritus of Anthropology and Genetics, Department of Anthropology, Penn State University.
Evelyn Fox Keller, The Century of the Gene (Cambridge, MA: Harvard University Press, 2000).
R. C. Lewontin, The Triple Helix: Gene, Organism, and Environment (Cambridge, MA: Harvard University Press, 2000).
Kenneth M. Weiss, Genetic Variation and Human Disease: Principles and Evolutionary Approaches (New York, NY: Cambridge University Press, 1993).
Kenneth M. Weiss and Joseph D. Terwilliger, “How many diseases does it take to map a gene with SNPs?” Nature Genetics 26 (2000): 151-157.