From Human Genome Research to Personalized Health Care

The potential is widely recognized, but much more knowledge is needed to make the science clinically useful.

“Big Science” in the life sciences was launched in 1986 with a bold plan to develop the technologies to determine the sequence of the 3 billion nucleotide base pairs (letters of DNA code) in the human genome. The Human Genome Project declared success by 2001 and has stimulated a wealth of related research. Analyses of the genomes of many organisms have yielded powerful evidence of sequences conserved during evolution, and analyses of microorganisms have set the stage for studies of pathogen/host interactions. Essentially all fields of life sciences research have been transformed by knowledge of protein-coding genes, recognition of genomic variation across individuals, discovery of new mechanisms that regulate gene expression, and insight into how patterns of proteins and metabolites generate the features of living organisms. From the beginning, there have been high expectations that such knowledge would enhance clinical and public health practice through understanding of predispositions to disease, identification of molecular signatures and biomarkers for stratification of patients with different subtypes of a disease, earlier diagnoses, and discovery of molecular targets for therapeutic and preventive interventions.

There has been compelling evidence for at least 150 years that genetics plays a major role in many traits and diseases. Identical twins are much more likely to manifest similar traits and develop similar diseases than are fraternal twins (or regular siblings). Modern researchers first tested individual candidate genes that seemed biologically related to a particular disease. Now gene chips can probe 500,000 sites throughout the genome for variation at single-nucleotide polymorphisms (SNPs) and in segments of chromosomes. Genome-wide association studies have demonstrated genetic influence on height; glucose, cholesterol, and blood pressure levels; and risks for childhood-onset and adult-onset diabetes, macular degeneration of the retina, various cancers, coronary heart disease, mental illnesses, inflammatory bowel disease, and other diseases. Enthusiasm about these statistical associations stimulated the formation of companies offering testing services promoted directly to consumers. However, the market was leaping far ahead of the science.
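A genome-wide association study repeats one simple statistical test at every probed site. As a minimal sketch, here is one common form of that test: a Pearson chi-square on a 2x2 table of allele counts in cases and controls. The case and control numbers are hypothetical and are not taken from any study mentioned here.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Rows are cases and controls; columns are the variant (alt) and reference alleles.
    Returns (chi_square, p_value).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if allele and disease status were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (table[i][j] - expected) ** 2 / expected
    # The upper tail of a chi-square distribution with 1 df equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical study: 1,000 cases and 1,000 controls, i.e., 2,000 alleles per group.
chi_square, p_value = allelic_chi_square(case_alt=700, case_ref=1300, ctrl_alt=600, ctrl_ref=1400)
print(f"chi-square = {chi_square:.1f}, p = {p_value:.1e}")
```

In this hypothetical example, 35% of case alleles carry the variant, compared with 30% of control alleles. The result is a chi-square of about 11.4 and p below 0.001. That would look persuasive for a single candidate gene, but a chip repeats the test hundreds of thousands of times. Published analyses usually also adjust for covariates such as ancestry, typically with regression models.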

Serious limitations in this approach have now been recognized. First, stringent statistical criteria are required to reduce the likelihood of false-positive associations, since such large numbers of genomic variants (SNPs) are tested. Second, very few of the highly associated genomic variants actually alter protein-coding gene sequences; this is no surprise, since our 20,000 protein-coding genes take up only 1.5% of the genome sequence. Tying genomic variants to nearby protein-coding genes is highly speculative, making predictions of the functional effects of the variation quite uncertain. Third, the 20 genomic variants associated with height together account for only 3% of the actual variation in height; similarly, 20 or more genomic variants associated with a risk of diabetes account for less than 10% of the risk. The results are not a sufficient basis for predictive medicine. Undeterred, geneticists are screening a far larger set of SNPs to identify more variants of small effect and are searching for less common variants that might have larger effects on disease risk. They are also using new sequencing methods that aim to find all variation, not just sample the SNP sites. The cost of SNP genotyping is now under $1,000 per person. The cost of sequencing, meanwhile, has dropped from the original investment of $3 billion to obtain the first sequence to an estimated $10,000 to sequence an individual with the latest technology, and may reach $1,000 in the next few years.
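The need for stringent criteria follows from simple arithmetic. Suppose none of 500,000 tested SNPs were truly associated with a disease. The conventional per-test cutoff of p < 0.05 would still be expected to flag about 25,000 of them by chance. The sketch below compares that outcome with a Bonferroni correction, one standard way to control the chance of any false positive. For simplicity it assumes independent tests, although neighboring SNPs are in fact correlated. Many genome-wide studies use a related fixed threshold of 5 × 10^-8.

```python
N_TESTS = 500_000  # number of SNPs probed by a chip of the size described in the text


def expected_false_positives(n_tests, per_test_alpha):
    """Expected chance 'hits' if no tested SNP is truly associated."""
    return n_tests * per_test_alpha


def bonferroni_threshold(n_tests, family_alpha=0.05):
    """Per-test cutoff that keeps the chance of one or more false positives at or below family_alpha."""
    return family_alpha / n_tests


def family_wise_error(n_tests, per_test_alpha):
    """Probability of at least one false positive, assuming independent tests."""
    return 1 - (1 - per_test_alpha) ** n_tests


naive = expected_false_positives(N_TESTS, 0.05)
threshold = bonferroni_threshold(N_TESTS)
print(f"Expected false positives at p < 0.05: {naive:,.0f}")
print(f"Bonferroni per-SNP threshold: {threshold:.0e}")
print(f"Chance of any false positive at that threshold: {family_wise_error(N_TESTS, threshold):.3f}")
```

The corrected threshold is 10^-7 per SNP, which keeps the overall chance of even one false positive near 5%. The price is statistical power. Variants of small effect, like those associated with height and diabetes, need very large samples to clear such a threshold.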

I believe that much of the unexplained variation in susceptibility will be explained by nongenetic environmental and behavioral risk factors that interact with genetic variation to mediate the risk and severity of disease. We will return to this topic of “ecogenetics” and its policy implications below.

Functional genomics

DNA sequences code inherited information. Proteins and RNA molecules interact with the DNA and histone proteins in chromosomes to regulate the expression of genes. All nucleated cells in an individual start with the same DNA; gene regulation and mutations during embryonic and later development, and throughout the rest of life, create differences among organs and cells. In concert with nongenetic variables, these processes influence the risk of various diseases. Just as we now have technologies to sequence genomic DNA, along with databases and informatics tools to interpret the laboratory output, we have developed proteomics technologies to characterize large numbers of proteins. Proteins are much more challenging to analyze, because they undergo numerous chemical modifications that generate many different forms of each protein, often with major differences in function. There may be as many as 1 million protein forms generated from the 20,000 genes. One way that we have evolved such complex functions with far fewer genes than the 50,000 to 100,000 that scientists expected to find is alternative splicing of RNA transcripts, which generates additional protein products from a single gene; these splice isoforms represent a new class of potential protein biomarkers for cancers and other diseases.
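A toy model shows how splicing multiplies protein products from one gene. This is a simplification and does not describe any real gene. Constitutive exons appear in every transcript, while each "cassette" exon may be included or skipped independently. A gene with k cassette exons can then yield up to 2^k transcripts. The exon names below are hypothetical. Real splicing also uses mutually exclusive exons and alternative splice sites, and it is regulated rather than independent.

```python
from itertools import product


def splice_isoforms(exons):
    """List possible transcripts for a gene in a simple cassette-exon model.

    `exons` is an ordered list of (name, is_cassette) pairs. Constitutive exons
    (is_cassette=False) appear in every transcript; each cassette exon is either
    included or skipped, independently of the others.
    """
    cassette = [name for name, is_cassette in exons if is_cassette]
    isoforms = []
    # Try every include/skip combination for the cassette exons.
    for choices in product([True, False], repeat=len(cassette)):
        included = dict(zip(cassette, choices))
        isoforms.append(
            tuple(name for name, is_cassette in exons if not is_cassette or included[name])
        )
    return isoforms


# Hypothetical gene with four exons, two of which are cassette exons.
gene = [("E1", False), ("E2", True), ("E3", True), ("E4", False)]
for isoform in splice_isoforms(gene):
    print("-".join(isoform))
```

The four printed isoforms (E1-E2-E3-E4, E1-E2-E4, E1-E3-E4, E1-E4) all come from one gene. Ten independent cassette exons would allow 1,024.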

Powerful computational methods are required for multidimensional analyses that capture variation in genome sequence, chromosome structure, gene regulation, proteins, and metabolites. Such molecular signatures can be useful for deeper understanding of the complex biology of the cell and for tests of diagnosis and prognosis. However, it has been difficult to design and validate clinical tests with the high specificity (few false positives) and high sensitivity (few false negatives) needed to be useful in screening populations with low prevalence of a disease; the Food and Drug Administration (FDA) has approved very few new diagnostic tests in the past decade. Numerous publications have reported molecular signatures based on gene or protein expression for cancers and other diseases, but replication of this work in additional patients and laboratories depends on promising new technologies.
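Bayes' rule explains why screening in low-prevalence populations demands such high specificity. The sketch below computes the positive predictive value of a screening test, that is, the share of positive results that are true positives. The sensitivity, specificity, and prevalence figures are hypothetical and were chosen only to show the effect.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (positive predictive value, negative predictive value) via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical disease affecting 1 person in 1,000.
for specificity in (0.99, 0.9999):
    ppv, _ = predictive_values(sensitivity=0.99, specificity=specificity, prevalence=0.001)
    print(f"specificity {specificity}: {ppv:.0%} of positive results are true positives")
```

At this prevalence, a test that is 99% sensitive and 99% specific yields only about 9 true positives for every 100 positive results. Raising specificity to 99.99% lifts that to about 91, because false positives among the many healthy people shrink a hundredfold.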

Systems biology/systems medicine

Complex biological functions may be disrupted by mutations in individual genes. Diagnosing these disorders, which are usually rare, has been quite successful, often with specific gene tests followed by counseling for families. Common diseases are much harder. We now recognize that the generation of complex functions requires many gene-gene and gene-environment interactions acting over time. The field of systems biology is devoted to identifying and characterizing the pathways, networks, and modules of these genes and gene-regulatory functions. The significance of this field is profound, because therapies or preventive interventions in medicine may require subtle modification of entire networks rather than highly targeted, high-dose action on just one gene product such as a cell receptor or an enzyme. This concept will drastically alter our approaches to drug discovery and may enhance the ratio of therapeutic benefit to adverse effects. Understanding the interactions of pathways in cancers, cardiovascular diseases, nervous system disorders, or inflammatory disorders is likely to lead us to target more than one pathway. In the case of cancers, we should take a hint from combination therapy for microbial infections and design multi-target therapies that could both hit cancer stem cells and prevent the emergence of resistant cancer cells. These approaches may require major revisions of FDA policies governing the drug-development and drug-approval process, which currently pose barriers to combination therapies.
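The reasoning borrowed from antimicrobial combination therapy can be stated as a short calculation. Suppose resistance to each drug arises rarely and independently. Then the chance that a single cell resists every drug in a combination is the product of the individual rates. The tumor size and resistance rates below are hypothetical round numbers. Real tumors can violate the independence assumption through cross-resistance and cell-to-cell heterogeneity.

```python
import math


def expected_resistant_cells(population, *resistance_rates):
    """Expected number of cells already resistant to every drug in a regimen.

    Assumes resistance to each drug arises independently, so the per-cell
    probability of resisting all of them is the product of the rates.
    """
    return population * math.prod(resistance_rates)


TUMOR_CELLS = 1e9       # hypothetical tumor burden
RATE_A = RATE_B = 1e-7  # hypothetical per-cell resistance rates for drugs A and B

print(f"Drug A alone:  {expected_resistant_cells(TUMOR_CELLS, RATE_A):.0f} resistant cells expected")
print(f"Drugs A and B: {expected_resistant_cells(TUMOR_CELLS, RATE_A, RATE_B):.0e} resistant cells expected")
```

With one drug, about 100 resistant cells are expected to exist before treatment starts, and they can regrow the tumor. With two drugs, the expected number falls to roughly one double-resistant cell per 100,000 tumors of this size.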

Pharmacogenetics and pharmacogenomics

Although it is well known that patients vary remarkably in their responses to most drugs, drug development and clinicians’ prescriptions generally are still designed for the average patient. For example, effective tests developed 50 years ago to identify patients at high risk for potentially lethal effects from muscle relaxants used in anesthesia are still not incorporated into standard medical practice. Recently, the FDA recommended the use of two gene tests to help doctors choose the initial dose for the anticoagulant warfarin, but many physicians are rightly skeptical about the practical value of the tests because they may not yield information in time for the initial doses and are often no more informative than the response to a standard first dose. Knowing the genotype in advance might help the few percent of patients with very high (or very low) sensitivity to the drug. Comparative effectiveness/cost-benefit analyses for such testing are under way.

Genomics and proteomics will eventually be important in earlier detection of adverse effects of drugs and dietary supplements in susceptible individuals, transforming toxicology from a descriptive to a predictive science.

Ecogenetics

One of the biggest challenges for realizing the medical and public health benefits of genomics is the capture of the variation in nongenetic environmental and behavioral risk factors for disease and the discovery of gene-environment interactions. Environmental factors include infections, diet, nutrition, stress, physical activity, pollutants, pesticides, radiation, noise and other physical agents, herbal medicines, smoking, alcohol, and other prescription and nonprescription drugs. Such exposures may cause mutations in genes and transient or heritable modifications in the methylation patterns of histone proteins and DNA in the chromosomes. These variables also affect responses to therapy.

Infectious diseases offer numerous opportunities for personalized treatment, because both the patient and the infectious agent can be genotyped, and interactions may be critical for the choice of therapy. The development of vaccines for particularly troublesome infections such as HIV, tuberculosis, malaria, and influenza requires much more knowledge than we currently have about the pathogens and susceptible human subgroups. Genomics is being incorporated into surveillance outposts around the world to detect the emergence of new strains of pathogens in animals and animal handlers, which may reduce the risks of future pandemics.

Personalized health care

This phrase is understandably very popular. It reflects the admirable goal of tailoring the treatment to the patient and the fact that different people with the same diagnosis may have multiple underlying mechanisms of disease and may require quite different therapies. With many widely used drugs, fewer than 30% of patients treated actually experience a benefit, and some of these may be getting better on their own or through placebo effects. The path to the ideal of predictive, preventive, personalized, and participatory (P4) health care must proceed through several complex steps. There must be sufficient evidence at molecular, physiological, and clinical levels to subtype patient groups and stratify them for targeted therapy or prevention. For example, specific subgroups of leukemia, breast cancer, and colon cancer patients can now be treated with molecularly targeted drugs. By contrast, anticancer drugs that target epidermal growth factor receptors offer no benefit to the approximately one-half of colon cancer patients who have a particular gene variant in a complementary pathway. There is a big leap from carefully selected patients in a randomized clinical trial of efficacy to evidence of effectiveness in patients with many coexisting diseases being cared for in the community. Similarly, the comparative effectiveness of medical devices and surgical procedures may depend on many practical details of access to timely care in the real world. In addition, physicians may continue to use a drug or device with even a low probability of benefit if no better therapy is available. Proving no benefit is difficult, and patients often demand a specific treatment.

Information about variation in responses to drugs and tests to guide clinical decisionmaking is available in the PharmacoGenomics Knowledge Base (www.pharmGKB.org). With the present push to install electronic health records, complex results for individual patients could be click-linked to resources for the interpretation of such tests. As information from molecular tests and imaging becomes much more complex, the routine admonition to “ask your doctor” must be supplemented by effective guidance to the doctor to tap into additional online resources.

Policy challenges

The long-awaited passage in 2008 of the Genetic Information Nondiscrimination Act (GINA) helps clarify the rules for ensuring the privacy and confidentiality of personal health information and prohibits discrimination in health insurance and employment tied to genetic traits. Senator Ted Kennedy (D-MA) described GINA as “the first major new civil rights bill of the new century.” Of course, such protections should apply to all personal health information, especially in this electronic age; many privacy issues remain unresolved. The Department of Health and Human Services (DHHS) Office of the National Coordinator for Health Information Technology and the DHHS and National Institutes of Health (NIH) Offices of Protection of Participants in Biomedical Research will be important players as medical information becomes increasingly electronic.

A major federal policy plan with commitment to interagency cooperation is needed in the domain of ecogenetics. Linking medical and environmental data sets is complicated because patient information in genomic studies is routinely de-identified to protect patient privacy and confidentiality. Proper management of coded information could facilitate links between genomic labs and large-scale monitoring programs such as the periodic National Health and Nutrition Examination Survey of the Centers for Disease Control and Prevention; the air, water, and waste-site pollution monitoring conducted by the Environmental Protection Agency and state and metropolitan agencies; and population-based epidemiology studies of conditions such as childhood cancers. Statisticians have methods for imputing reasonable estimates of exposures in neighborhoods and for individuals; these data could be merged with information from increasingly affordable molecular and genomic assays. The Genes and Environment Initiative within NIH, co-led by the National Institute of Environmental Health Sciences and the National Human Genome Research Institute, has invested in new exposure-measurement technologies. Making these links work is critical to realizing the benefits of our rapidly accelerating knowledge of genomic variation.
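One common way to manage coded information is keyed hashing. It is sketched here as an illustration, not as any agency's actual procedure. A trusted intermediary holds a secret key and replaces each identifier with an HMAC code. Genomic and exposure records can then be joined on the code without either data set carrying names. The key, identifiers, and field names below are hypothetical. Coding alone does not guarantee anonymity, because rare genotypes or detailed addresses can still identify people, so access controls remain necessary.

```python
import hashlib
import hmac


def pseudonym(secret_key: bytes, person_id: str) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    The same person always gets the same code, but the code cannot be
    recomputed from a known identifier without the secret key.
    """
    return hmac.new(secret_key, person_id.encode("utf-8"), hashlib.sha256).hexdigest()


def link_by_code(genomic_rows, exposure_rows):
    """Join coded genomic and exposure records that share a pseudonym."""
    exposure_by_code = {row["code"]: row for row in exposure_rows}
    linked = []
    for row in genomic_rows:
        match = exposure_by_code.get(row["code"])
        if match is not None:
            linked.append({**row, **match})  # both rows carry the same "code" value
    return linked


# Hypothetical key held only by the intermediary, never by the research teams.
KEY = b"example-secret-held-by-intermediary"

genomic = [{"code": pseudonym(KEY, "patient-001"), "genotype": "AG"}]
exposures = [
    {"code": pseudonym(KEY, "patient-001"), "pm25_ug_per_m3": 12.0},
    {"code": pseudonym(KEY, "patient-002"), "pm25_ug_per_m3": 8.5},
]

for record in link_by_code(genomic, exposures):
    print(record["genotype"], record["pm25_ug_per_m3"])
```

Only the intermediary can map a code back to a person. The genomic laboratory and the exposure-monitoring team each see codes and their own measurements, yet the merged record pairs a genotype with an exposure estimate.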

The new concepts for drug discovery and biomarkers that emerge from systems biology and pharmacogenomics are receiving attention at the FDA. Standardized requirements for clinical chemistry must be incorporated into academic/industry partnerships for drug studies and trials of biomarkers.

The National Coalition for Health Professional Education in Genetics aims to increase genetic literacy as a foundation for consumer discussions and decisions. The attention that several state health departments and consumer protection agencies are giving to personalized genomic risk tests and their advertising is timely, because the genome variants associated with particular diseases presently account for too small a portion of the risk to support credible conclusions about the risks of individuals.

With the necessary research of all kinds and with effective interagency partnerships, we can expect to see the following benefits emerge in the near future:

Enormous expansion of information about the complex molecular biology of many common diseases from the sequencing of DNAs and RNAs and the study of proteins and metabolites, with costs as low as $1,000 for an individual human genome sequence;
Gene-, organ-, and cause-specific molecular signature tests for several diseases;
Systems/pathways/networks bases for some new drugs and drug combinations, probably for treatment of cancers, brain disorders, and cardiovascular and liver diseases;
Advances in pharmacogenomics for drug approvals and e-prescribing guidelines, providing advice for patients centered on more-refined diagnoses and more-effective, safer therapies;
Information about modifiable environmental and behavioral factors tied to genotypes and disease risks for public health and personal actions; and
A better basis for consumer genomics, starting with advice that broad public health measures—increased physical activity, good nutrition, and control of blood pressure, cholesterol, weight, blood glucose, and infectious and chemical exposures—have multi-organ benefits that surely swamp the effects of statistically associated genome-based risk factors. Hopefully, we will gain evidence about whether knowledge of genetic predispositions motivates people to pursue healthier behaviors.
