The Limits of Knowledge: Personal and Public

One of the most basic assumptions underlying much of Western thinking is that individuals are rational beings, able to form judgments based on empirical information and logical deliberations in their quest for a course of action most suited to advancing their goals. This is assumed to be true for personal choices and for societal ones—that is, for public policies. A common narrative is that people used to be swayed by myths, folktales, and rituals (with religion sometimes added in), but the Enlightenment ushered in the Age of Reason, in which we are increasingly freed from traditional beliefs and instead rely on the findings of science. Progress is hence in the cards, driven by evidence. This assumption was first applied to nature, as we learned to crack its codes and employ its resources. For the past 200 years or so, it has also been applied to society. We no longer take society for granted as something to which we have to adapt, but we seek to remake it in line with our designs. For many people, this means such things as improving relations among the races, reducing income inequalities, and redefining marriage, among other actions.

Economics, by far the most influential social science, has strongly supported the assumption of rationality. It sees individuals as people who have preferences and seek to choose among alternative purchases, careers, investments, and other options in ways that best “maximize” whatever they desire. This assumption has also come to be shared by major segments of other social sciences, including not just significant parts of political science (for instance, in the view that voters make rational choices) and sociology (people date to improve their status), but even law (laws are viewed as restructuring incentives) and history (changes in the organization of institutions can be explained in terms of the rational interests of individuals seeking to structure the world so as to maximize net benefits).

But this message is being upended by insights from the relatively new field of behavioral economics, which has demonstrated beyond reasonable doubt that people are unable to act rationally and are hardwired to make erroneous judgments that even specialized training cannot correct. Being created by people, governments have similar traits that spell trouble for rational policymaking and the progress that is supposed to follow. Still, a closer examination suggests that the findings of behavioral economics are not so much a reason for despair as an indication of the need for a rather different strategy. Once we fully accept our intellectual limitations, we can improve our personal decisionmaking as well as our public policies.

Scientific sea change

Some segments of social science never really bought into the progress and rationality assumption. Oswald Spengler, a German philosopher and mathematician best known for his book The Decline of the West, published in two volumes between 1926 and 1928, held that history is basically running in circles, repeating itself rather than marching forward. Social psychologists showed that people can be made to see things differently, even such “obvious” things as the length of lines, if other people around them take different positions. Psychologists demonstrated that we are driven by motives that lurk in our subconscious, which we neither understand nor control. Sociologists found that billions of people in many parts of the world continue to be swayed by old beliefs. However, the voices of these social scientists were long muted, especially in the public realm.

Different reasons may explain why those who might be called the “rationalist” social scientists drowned out the “nonrationalist” ones. These reasons include the can-do attitude generated by major breakthroughs in the natural sciences, the vanquishing of major diseases, and strong economic growth. Progress—driven by reason, rational decisionmaking, and above all, science—seemed self-evident. The fact that the rationalist social sciences used mathematical models and had the appearance of physics, while the nonrationalist ones drew more on narratives and qualitative data, also worked in the rationalists’ favor.

Behavioral economics began to come into its own as doubts increased about society’s ability to vanquish the “remaining” diseases (witness the war against cancer) and ensure economic progress, and as we became more aware of the challenges that science and technology pose. Above all, behavioral economics assembled a very robust body of data, much of it based on experiments. Recently, behavioral economics caught the attention of policymakers and the media, especially after its widely recognized leading scholar, Daniel Kahneman, was awarded the 2002 Nobel Prize in economics, the queen of rationalistic sciences, despite the fact that his training and research were in psychology.

Because the main findings of behavioral economics have become rather familiar, it is necessary to review them only briefly. The essential finding is that human beings are not able to make rational decisions. They misread information and draw inappropriate or logically unwarranted conclusions from it. Their failings come in two different forms. One takes place when we think fast. In his book Thinking, Fast and Slow, Kahneman called this System 1 thinking—thinking based on intuition. For instance, when we ask what two plus two equals, no processing of information and no deliberations are involved. The answer jumps out at us. However, when we engage in slow, or System 2, thinking, which we are reluctant to do because it is demanding, laborious, and costly, we often fail. In short, we are not rational thinkers.

In seeking to explain individuals’ real-life choices, in contrast to the optimal decisionmaking that they often fail to perform, Kahneman and Amos Tversky, a frequent collaborator, developed “prospect theory,” which has three major bundles of findings.

First, individuals’ evaluations are made with respect to a reference point, which Kahneman defines as an “earlier state relative to which gains and losses are evaluated.” When it comes to housing transactions, for example, many people use the purchase price of their house as the reference point, and they are less likely to sell a house that has lost value than one that has appreciated in value, disregarding changes in the conditions of the market.

The second major element is that evaluations of changes are subject to the principle of diminishing sensitivity. For example, the difference between $900 and $1,000 is subjectively less than that between $100 and $200, even though from a rational viewpoint both differences are the same $100. This principle helps to explain why most individuals would prefer to take a 50% chance of losing $1,000 rather than accept a $500 loss: the pain of losing $500 is more than 50% of the pain of losing $1,000.

The third element is that individuals tend to exhibit strong loss aversion, with losses looming larger in their calculations than gains. For example, most people would not gamble on a coin toss in which they would lose $100 on tails and win $125 on heads. It is estimated that the “loss-aversion ratio” for most people is roughly between 1.5 and 2.5, so they would need to be offered about $200 on heads to take the bet.
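
For readers who want to see how these three bundles fit together, the short sketch below encodes an illustrative value function of the kind Kahneman and Tversky proposed. It is a toy, not their published model: the curvature and loss-aversion parameters (roughly 0.88 and 2.25) are the estimates commonly cited from their later work, and the dollar amounts simply replay the examples above.

```python
# Illustrative prospect-theory value function (a sketch, not Kahneman and
# Tversky's published model). Outcomes are gains or losses measured from
# the reference point.
ALPHA = 0.88    # diminishing sensitivity: equal dollar steps feel smaller as amounts grow
LAMBDA = 2.25   # loss aversion: losses loom roughly twice as large as gains

def value(outcome):
    """Subjective value of a gain (positive) or loss (negative)."""
    if outcome >= 0:
        return outcome ** ALPHA
    return -LAMBDA * ((-outcome) ** ALPHA)

# Diminishing sensitivity: the step from $900 to $1,000 feels smaller
# than the step from $100 to $200, even though both are $100.
print(value(1000) - value(900), value(200) - value(100))

# Why most people prefer a 50% chance of losing $1,000 to a sure $500 loss:
print(0.5 * value(-1000), value(-500))   # the gamble "hurts" less

# Loss aversion: a 50/50 bet to lose $100 or win $125 still feels bad.
print(0.5 * value(125) + 0.5 * value(-100))   # negative
```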

Proof repeated

Replication is considered an essential requirement of robust science. However, in social science research this criterion is not often met. Hence, it is a notable achievement of behavioral economics that its key findings have often been replicated. For instance, Kahneman and Tversky found that responses to an obscure question (for example, what percentage of African nations are members of the United Nations) were systematically influenced by something that one would not expect people to be affected by if they were thinking rationally, namely a random number that had been generated in front of them. When a big number was generated, the subjects’ responses were larger on average than when a small number was generated. This finding indicates that the perception of an initial value, even one unrelated to the matter at hand, affects the final judgments of the participants, an irrational connection.

The effect demonstrated by this experiment has been replicated with a variety of stimuli and subjects. For instance, Karen Jacowitz and Kahneman found that subjects’ estimates of a city’s population could be systematically influenced by an “anchoring” question: Estimates were higher when subjects were asked to consider whether the city in question had at least 5 million people and were lower when subjects were instead asked whether the city had at least 200,000 people. J. Edward Russo and Paul Schoemaker further demonstrated this effect, finding that when asked to estimate the date that Attila the Hun was defeated in Europe, subjects’ answers were influenced by an initial anchor constructed from their phone numbers. Also, Drazen Prelec, Dan Ariely, and George Loewenstein found that when subjects wrote down the last two digits of their Social Security numbers next to a list of items up for auction, those with the highest numbers were willing to bid three times as much on average as those with the lowest.

Other studies have repeatedly replicated another phenomenon, known as “endowment,” observed by behavioral economists. In endowment, people place a higher value on goods they own than on identical ones they do not. For example, Kahneman, Jack Knetsch, and Richard Thaler found that when half the students in a room were given mugs, and those with mugs were then invited to sell them and those without were invited to buy them, the mug owners demanded roughly twice as much to part with their mugs as the others were willing to pay for them. Similarly, Robert Franciosi and colleagues found that when subjects could trade mugs for cash and vice versa, those endowed with mugs were less willing to trade than would be predicted by standard economic theory.

True in real life

Many behavioral economics studies are conducted as experiments under laboratory conditions. This method is preferred by scientists because it allows extraneous variables to be controlled. However, extensive reliance on lab studies has led some critics to suggest that behavioral economics’ key findings may apply only, or at least much more strongly, under the artificial conditions of the lab and not in the field (that is, in real life).

Recent work in behavioral economics, however, has shown that its findings do hold outside of the lab. For instance, a study by Brigitte Madrian and Dennis Shea illustrates how what is called the “status quo bias,” a common finding of behavioral economics, shapes employee decisions on whether to participate in 401(k) retirement savings programs. Because of this bias, many millions of individuals do not contribute to these savings programs, even though the contributions are clearly in their self-interest. In another experiment conducted in the field, Uri Gneezy and Aldo Rustichini found that neoclassical theory expectations regarding incentives and punishments did not predict the behavior of parents at daycare centers in Israel. When Israeli daycare centers struggling with the problem of parents arriving after closing time to pick up their children implemented a fine of 10 shekels to discourage lateness, the number of parents arriving late actually increased—an example of nonrational economic behavior in action.

Shlomo Benartzi, Alessandro Previtero, and Richard Thaler studied what economists call the “annuity puzzle,” or the tendency of people to forgo annuitizing their wealth when they retire, even though doing so would assure them of more annual income for the rest of their lives and reduce their risk of outliving their retirement savings. In a survey of 450 401(k) retirement plans, only 6% of participants chose an annuity when it was available.

Resistant mistakes

Behavioral economics provides little solace for those who believe in progress. Data show that education and training do not help people overcome their cognitive limitations. For example, 85% of doctoral students in the decision science program at the Stanford Graduate School of Business, who had extensive training in statistics, still made basic mistakes in combining two probabilities. Studies also have shown that even people specifically alerted to their cognitive blinders are still affected by them in their deliberations.
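
The arithmetic behind such mistakes is elementary: the probability that two events both occur can never exceed the probability of either event on its own, yet people routinely judge a vivid combination as more likely than one of its parts. The lines below are only a generic illustration with made-up numbers, not a reconstruction of the Stanford exercise.

```python
# Generic illustration with hypothetical numbers (not the Stanford study):
# P(A and B) = P(A) * P(B given A), which can never exceed P(A) alone.
p_a = 0.30           # chance that event A happens
p_b_given_a = 0.50   # chance that B happens once A has happened
p_a_and_b = p_a * p_b_given_a

print(p_a_and_b)          # 0.15
print(p_a_and_b <= p_a)   # always True; ranking the combination above
                          # either part is the basic error in combining
                          # two probabilities
```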

My own work shows that decisionmaking is often nonrational not only because of people’s cognitive limitations, but also because their choices are affected by their values and emotions. Thus, whereas from an economic viewpoint a poor devout Muslim or Jew should purchase pork if it costs much less than other sources of protein, this is not an option these decisionmakers consider. This decision is a priori blocked out for them by their beliefs. As I see it, this is neither slow nor fast thinking, but not thinking. The same holds for numerous other decisions, such as whether to sell oneself for sex, spy for a foreign power, or choose to live in a distant place. True, if the price differential is very high, some people will not heed their beliefs. However, some will honor them at any price, up to giving up their life. What is particularly relevant for decisionmaking theory is that most individuals in this group will not even consider the option, and those who violate their belief will feel guilty, which often will lead them to act irrationally in one way or another.

Emotions rather than reasoning also significantly affect individuals’ political beliefs and behavior. For example, when people in the United States were asked in a Washington Post–ABC News poll whether President Barack Obama can do anything to lower gas prices, roughly two-thirds of Republicans said he can, whereas two-thirds of Democrats said that he cannot. When George W. Bush was in the White House and the same question was asked, these numbers were reversed. Citizens thus tend to put their political loyalties ahead of the facts, even flip-flopping their views when loyalty demands it.

Policymakers, who make decisions based not merely on their individual intellectual capacities and beliefs but also on the work of their staff, nevertheless often devise or follow policies that disregard major facts. For example, policymakers have supported austerity programs to reduce deficits when economies are slowing down, instead of adding stimulus and committing to reduce deficits later, as most economic studies would suggest. They have repeatedly engaged in attempts to build democratic governments by running elections in places, such as Afghanistan, where the other elements essential for building such governments are missing. And they have assumed that self-regulation will work even when those who need to be restrained have strong motives to act against the public interest and their own long-term interest. It may seem a vast overstatement, until one looks around, to say that most public policies fall far short of the goals they set for themselves, cost much more than expected, and have undesirable and unexpected side effects. We seem to have as much difficulty making rational public policies as we do making rational personal ones.

Adapting to limits and failings

The findings of behavioral economics have led to some adaptations in the rationalist models. For instance, economics no longer assumes that information is instantly absorbed without any costs (an adaptation that arguably preceded behavioral economics and was not necessarily driven by it). Thus, it now is considered rational if someone in the market for a specific car stops comparative shopping after visiting, say, three places, because spending more time looking around is held to “cost” more than the additional benefit from finding a somewhat lower price. Aside from such modifications in the rationalist models, behavioral economics has had some effects on the ways in which public policies are formed.

Richard Thaler, a professor at the University of Chicago, is a highly regarded behavioral economist. He argued in his influential book Nudge: Improving Decisions about Health, Wealth, and Happiness (coauthored with Cass Sunstein) that people do not make decisions in a vacuum, based on their own analysis of the information and in line with their preferences. They inevitably act within an environment that affects their processing of information and decisionmaking. For instance, if an employer offers his workers health insurance and a choice between two programs, they are not going to analyze or seek out many others. They are a bit more likely to do so if the employer will partly reimburse the costs of a program other than the ones offered by their workplace.

Thaler hence suggests restructuring “external” factors so as to ease and improve the decisionmaking processes of people, whether they are consumers, workers, patients, or voters. His most often–cited example is signing people up for a 401(k) retirement program but allowing them to opt out rather than asking them if they want to opt in. This policy is directly based on the behavioral economics finding that people do not act in their best interest, which would be to sign up for a pension program as soon as possible. Due largely to Thaler’s influence, Great Britain will be implementing legislation in late 2012 that will change the default option for corporate pension funds, with employees being automatically enrolled unless they elect to opt out.

Thaler called this restructuring “nudging,” because this approach, unlike traditional regulation, does not force anybody to toe a line, but merely encourages them to do what is considered rational, without requiring them to perform the analysis themselves. Thaler noted up front, and critics stressed, that this approach will work well only as long as those who nudge have the interests of those being nudged at heart.

The other author of Nudge, Cass Sunstein, has been called “the nudgemeister.” President Obama appointed him to head the Office of Information and Regulatory Affairs in the White House. Sunstein has been working to remove regulations that are unnecessary, obsolete, or unduly burdensome and to foster new ones. One of his main achievements based on behavioral economics has been to simplify the information released to the public, so as to take into account individuals’ limited capacity to digest data. This was achieved most visibly in the redesigned dietary recommendations and fuel efficiency stickers for cars.

Stumbling forward

As Kahneman, who among other posts is a Senior Scholar at the Woodrow Wilson School of Public and International Affairs at Princeton University, explained in a personal correspondence, the reason why behavioral economics has not taken over is that “at this point there is no behavioral macroeconomics, no forecasting models based on behavioral foundations, etc. It is probably too early to conclude that these achievements are impossible. In any event, I think it is fair to [say] that behavioral approaches have prevailed wherever they have competed—but they have not competed in many central domains of economics, and the standard model remains dominant by default. It turns out to be extremely difficult to do good economics with more complex assumptions, although steady progress is being made.”

As I see it, behavioral economics suggests that we need a radical change in our approach to personal and collective decisionmaking, an intellectual shift of a Copernican magnitude. I can here merely illustrate the contours of a much less demanding form of decisionmaking. The basic approach turns the rationalistic assumptions on their head; it takes as a starting point that people are unable to gain and process all the relevant information, and they are unable to draw logical conclusions from the data they do command; in other words, that the default is nonrational decisionmaking. It assumes that given the complexity of the social world, we must move forward not like people equipped with powerful headlights on a night brightly lit by a full moon, but like people who stumble forward in a dark cave, with a two-volt flashlight: very carefully and always ready to change course.

If they follow my line of thinking, nonrationalists will assume that they are likely to make the wrong choice, and hence they will seek to provide for as many opportunities as possible to adapt course as a project unfolds and more information becomes available and to make as few irrevocable commitments as possible at the starting point. A simple example: If you are building a house, do not sign off on the architect’s plans; insist that you be allowed to make changes as you go, because digging the foundations reveals surprises, the cost of some materials rises unexpectedly while that of others falls, new ideas occur to you, and so on. In other words, we are better adapted to our limitations if we can fracture our decisions and stretch them out over time rather than front-load them (which is one major reason for the common failure of long-term, especially central, planning).

The less we know, I suggest, the larger the reserves we need. (It should be noted that our inability to know and to process what we know is smaller in some areas than in others, such as in dealing with infectious diseases versus mental illness.) We should expect unexpected difficulties to arise and retain uncommitted resources to deal with these difficulties. This holds for militaries as well as for people who start a new business and for most everyone else.

The less we know, the more we should hedge. Studies of investment have long shown that people achieve better results if they do not try to determine which investment instrument will do better and invest in that instrument, but divide their investments among various instruments. As the U.S. Securities and Exchange Commission has noted in a “beginners’ guide” to investing: “Historically, the returns of the three major asset categories [stocks, bonds, and cash] [show that by] investing in more than one asset category, you’ll reduce the risk that you’ll lose money and your portfolio’s overall investment returns will have a smoother ride.” The less we know, the more we should not merely hedge, but hedge more widely; for instance, by not merely distributing our investments among different asset categories (with some financial advisers recommending investing in real estate as well as stocks and bonds) but also within each category (investing in at least a dozen stocks rather than just four or five). The same concept applies to decisions by military planners (who should not rely, for example, on one new type of fighter airplane) and to decisions by economic planners (who should not rely on choosing winners and losers, a process otherwise known as “industrial policy”).
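
A rough way to see why wider hedging pays is to simulate it. The sketch below rests on assumptions chosen purely for illustration: each holding returns 10% a year on average with 20% volatility, and holdings move independently. Real assets are correlated, so the benefit in practice is smaller, but the direction of the effect is the same: the volatility of an equally weighted portfolio shrinks roughly with the square root of the number of holdings.

```python
import random
import statistics

# Hypothetical assumptions for illustration only: each holding returns 10%
# a year on average with 20% volatility, and holdings are independent.
MEAN, VOL, TRIALS = 0.10, 0.20, 20000

def portfolio_volatility(num_holdings):
    """Simulated volatility of an equally weighted portfolio."""
    yearly_returns = []
    for _ in range(TRIALS):
        draws = [random.gauss(MEAN, VOL) for _ in range(num_holdings)]
        yearly_returns.append(sum(draws) / num_holdings)
    return statistics.stdev(yearly_returns)

for n in (1, 4, 12):
    print(n, "holdings: volatility about", round(portfolio_volatility(n), 3))
# Prints roughly 0.20, 0.10, and 0.06: spreading the same money over a
# dozen holdings does not change the expected return, but it sharply
# reduces the chance of a severe loss.
```

On these made-up numbers, a dozen holdings cut the year-to-year swings to roughly a third of what a single bet would produce, which is the logic behind spreading investments both across and within asset categories.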

An important element of the nonrational approach to policymaking is to acknowledge our limitations and not to overpromise results. In order to gain capital from venture funds, credits from banks, or appropriations from legislatures, those who seek support often portray their expected gains in the most optimistic way possible. If the preceding analysis is correct, such overselling leads to disappointments and resentments when the new program to improve, say, the reading or science scores of the nation’s high-school students does not pan out and we must redesign the program and seek more resources. All of the parties involved would be better off if all such new programs were framed from the start as experiments. It should be understood that their need to be adapted is not a sign of failure but the only way to make progress, such as it is, in the social world: slowly, one step back for every one or two steps forward, at greater costs and sacrifices than expected, and with fewer results.

All this looks grim only if one ignores the lessons of behavioral economics. If one truly takes them in, rather than assumes that we are born with wings ready to fly, we shall learn to do as well as human beings can in a world in which we must learn to crawl before we can run.

Archives – Fall 2012

These paintings by Albert Herter adorn the walls of the Members’ Room of the National Academy of Sciences Building in Washington, D.C. Opened in 1924, the building recently underwent a major two-year restoration project during which time these panels, as well as other works by Herter throughout the building, were restored to their original grandeur. These paintings are two of eight stylized insignia representing historic universities. The age of the figures suggests the relative age of the schools represented. For example, Harvard University, founded in 1636, is represented by a young man, in contrast to the more mature figure representing Cambridge University, established circa 1209.

The Climate Struggle Heats Up

In The Hockey Stick and the Climate Wars, Michael E. Mann offers a personal assessment of the controversies and shenanigans that have surrounded the issue of global warming during the past decade and a half. The “hockey stick” refers to a famous graph, produced by the author and others in 1998 and refined in 1999, that shows the average global temperature during the past 1,000 years spiking upward in the late 20th century, exceeding the levels reached during the Medieval Warm Period. The “climate wars” that followed were largely generated by global-warming deniers, who sought to discredit not only Mann’s graph but also the scientific foundations on which it rested. Considering the hostile rhetoric of climate-change skeptics and the personal threats leveled against the author and his colleagues, the military analogy is apt. As Mann notes, the late “right-wing provocateur Andrew Breitbart had ‘tweeted’: ‘Capital punishment for Dr. James Hanson. Climategate is high treason.’”

The “Climategate” issue in question, more properly known as the “Climatic Research Unit Email Controversy,” was the key episode in the larger struggle and thus forms the ultimate focus of Mann’s book. In November 2009, several weeks before the opening of the United Nations Climate Change Conference in Copenhagen, a computer server at the University of East Anglia in the United Kingdom was hacked, giving opponents access to thousands of private emails that had circulated among climatologists. Several exchanges that seemingly indicated scientific misconduct soon proliferated across the Web, convincing many right-wing commentators that climate change was little more than a hoax perpetrated by environmental zealots masquerading as dispassionate scientists.

Mann seeks above all to set the record of this incident straight, specifying the actual content of the relatively innocuous and purposely misconstrued messages. As he notes, “Climategate” is a poor label, because it implies that the wrongdoing was at the hands of the scientists and not the hackers—and their supporters—who illegally gained private information and then twisted it out of context to score political points.

Call it what you will, the 2009 email scandal was momentous, turning mainstream conservative opinion in the United States against the very concept of anthropogenic climate change. A mere half-decade ago, the Republican Party leadership not only accepted global warming but also embraced far-reaching carbon-control efforts, provided that they remained market-oriented. Today, most GOP stalwarts scoff at the mere possibility of human-caused climate change, regarding all suggested responses as veiled attempts to shackle the U.S. economy. This shift reflects the general rightward swing of the conservative movement that occurred with the economic crisis of 2008 and the election of Barack Obama, but the seemingly conspiratorial emails helped propel the transition. On right-wing Web sites to this day, climate-change concerns are commonly dismissed as having been debunked by the “Climategate” revelations. The fact that no fewer than eight scientific committees subsequently examined the scandal and found no evidence of misbehavior is either ignored or dismissed. Party-line thinking, it now sometimes seems, has come to doubt the integrity of the entire scientific establishment, viewing the exoneration of the maligned climatologists as additional evidence of a vastly larger plot.

The passage of the Republican establishment into such antiscientific terrain has troubling implications. Although the conservative movement in the United States has long harbored a fundamentally antiscientific “creationist” wing, until recently its core constituency fully embraced reason and science. A mere 20 years ago, opposition to the scientific method was more closely associated with the far left. At the time, eco-radicals, radical feminists, and anticolonialists castigated science as an inherently violent, “masculinist,” and imperial project designed to dominate nature and control subjugated peoples. Those of us who participated in the mid-1990s conference called The Flight From Science and Reason, devoted to countering this antirational onslaught, found ourselves pilloried by the academic left for supporting a reactionary cause and accepting funds (out of necessity) from conservative foundations. Although hostility to science has by no means entirely evaporated from the left, it has long since ceased to simmer. Today, it is the political right that is inclined to regard science with contempt, with the left acting, although not consistently, as its champion. Such a state of affairs is unlikely to serve the interests of the conservative movement; those who deny science in the end refute reality, a difficult position to maintain in the long run.

The larger significance of the climate wars, however, is largely bypassed by Mann. Yet the book as it stands is still powerful and important, offering a damning indictment of the political campaigns of the climate-change deniers. The author does an admirable job of explaining climatological technicalities and their statistical foundations for a lay audience, and his dissection of the “Climategate” controversy is masterful, establishing the essential innocence of the researchers in a clear and convincing manner. Following in the footsteps of Naomi Oreskes and Erik M. Conway’s Merchants of Doubt, Mann also does a fine job of exposing the complex machinations of the denial apparatus, outlining the many connections among foundations, bloggers and other pundits, politicians, and a few “maverick” scientists of varying repute. He also shows how journalists deepen the confusion by framing the climate wars as balanced scientific debates, when in fact virtually all reputable climatologists fully accept the reality of anthropogenic climate change.

Personal excursions

As detailed as Mann’s expositions of climate-research techniques and controversies are, they are still not adequate to fill an entire book. But rather than supplying the necessary bulk by taking on larger political and conceptual issues, he turns in a personal direction. As a result, The Hockey Stick and the Climate Wars is in the end a journalistic-scientific account conjoined with a sketchy autobiography. Unfortunately, the biographical material does nothing to advance the author’s larger arguments.

Considering the personal abuse that he received, it is perhaps not surprising that Mann would have taken his book in a personal direction. Not only did powerful interests try to torpedo his tenure, but both he and his family were physically threatened. One email read, “You and your colleagues who have promoted this scandal ought to be shot, quartered, and fed to the pigs along with your whole damn families.” Although such a message might be dismissed as the ravings of a deranged fanatic, Mann shows that it fits into a larger pattern of character assassination employed by many climate-change denialists. In what he aptly deems the “Serengeti strategy,” opponents select individual climatologists for assault, much as lions pick off single zebras, trusting that the naïve scientific community will be unable to mount an adequate defense of its most beleaguered members.

Although Mann’s account of such attacks is powerful and chilling, his larger strategy of couching his arguments within an autobiographical framework was not well advised. Five pages into the first chapter, we are whisked away from compelling issues of science and subterfuge into Mann’s unexceptional childhood, learning, for example, about his fascination with the possibility of faster-than-light travel. What bearing such information might have on the climate controversy is unclear. Thankfully, self-revelation diminishes after the first chapter, although a distracting personal focus pervades the entire text.

One can imagine that the author included such superfluous information at the urging of an editor or agent. “Human interest” is thought by many to be the key to brisk book sales, but to function as promised, the biographical passages must at least be interesting. Pandering to imagined audience desires, moreover, hardly seems fitting for a university press book, which should presumably aim for a higher common denominator.

At times, moreover, Mann also unduly simplifies technical issues, occasionally to the point of error. On page 32, for example, he tells us that, “tropical tree species typically do not have annual growth rings (look at a palm tree stump sometime if you don’t believe this).” Actually, the lack of growth rings in palm trees has nothing to do with their location in the tropics; those growing in temperate northern California also lack annular patterns. Instead, palms have no growth rings because they are monocotyledons that do not produce true wood. Considering the fact that tree-ring data were crucial to the “hockey-stick” climatic reconstruction, such a confused explanation is more than a little troubling.

A wallflower no longer

Although Mann generally sticks to a straightforward narrative of events interspersed with technical explanations and personal details, he does gesture in a broader direction near the end of the book. His main concern here is the proper role of the scientist in public policy debates. Mann claims to have experienced a personal transformation in this regard over the course of his ordeal. He claims that before the climate wars, “taking anything remotely resembling a position regarding climate change policy was, to me, anathema.” Being unintentionally thrown onto the public stage and subjected to personal vilification brought a change of mind: “Everything I have experienced since then has gradually convinced me that my former viewpoint was misguided.” Mann now advocates insistent political engagement by the climatological community.

Mann’s revised position on this matter seems reasonable. The idea of the disinterested scientist single-mindedly pursuing truth while remaining oblivious to wider issues has long struck many as an ideal that can never be fully realized, and one that only the naïve would wholeheartedly embrace in the first place. But despite his conversion to a more activist perspective, Mann still writes as if his views derive entirely from scientific inquiry and rational reflection, uncontaminated by the ideological blinders and self-serving motivations that so distort those of his opponents. Insofar as his personal scientific endeavors are concerned, such strict adherence to the canons of reason most likely does obtain. But The Hockey Stick and the Climate Wars strays well outside the confines of pure research into highly contestable political terrain. Here Mann’s own ideological presuppositions guide, and at times deform, his larger arguments.

Consider, for example, Mann’s discussion of what he calls shooting the messenger: the tactic of viciously attacking those who bring accurate but unwelcome environmental news. Mann traces this ploy to denunciations of Rachel Carson’s Silent Spring (1962) and Paul Ehrlich’s The Population Bomb (1968). Because Ehrlich’s book, Mann claims, “has ultimately proven prophetic,” condemnations of its “alarmism” by the likes of Julian Simon can only be regarded as early examples of invidious “swiftboating.” This claim is preposterous. Mann sees prophetic insight because The Population Bomb depicted humanity and nature as locked on a “collision course,” a ubiquitous concept in environmental circles at the time that merely formed the backdrop of the book, not its thesis. Ehrlich’s actual prophecy was of an impending global catastrophe, aptly summarized by his famous opening lines:

“The battle to feed all of humanity is over. In the 1970s and 1980s hundreds of millions of people will starve to death in spite of any crash programs embarked upon now.” That prediction, like almost all others made in the book, was not fulfilled. Taken on its own terms, The Population Bomb can only be regarded as a spectacularly anti-prophetic work.

A more contemporary example of Mann’s ideological blinders is found in his expressed surprise that “even the conservative Foundation for Individual Rights in Education (FIRE)” denounced Virginia Attorney General Ken Cuccinelli’s “witch hunt against climate scientists.” Even? Because FIRE is a strictly nonpartisan organization devoted to protecting all forms of free expression on U.S. campuses, one could hardly have expected anything else. Merely describing FIRE as “conservative” reflects either willful ignorance of the foundation or fundamental confusion about the meaning of the term. Admittedly, FIRE more often defends right-leaning students, professors, and campus organizations against restrictions imposed by left-leaning administrators than the reverse, but that is only because First Amendment rights on campus are thwarted more often by far-left than far-right restrictive penchants. Denigrating an unwavering First Amendment association because it advocates on behalf of constitutionally protected speech that one disagrees with or finds distasteful can only be regarded as a betrayal of a core value of liberal society.

As his title indicates, Mann sees the conflict over climate change as an intellectual “war,” with the fate of Earth itself hanging on its resolution. As his final chapter demonstrates, he now sees himself as a fighter in this portentous struggle, joining battle with a combative book. The defense that he puts up is strong, and he effectively demolishes many of the bulwarks of his opponents. But his effort remains something of a rearguard action, one that will not likely make much of a difference in the larger struggle. Mann’s passion and his climatological expertise are clearly evident, but his ability to gain a wide readership, much less to sway a broad swath of public opinion, remains limited. Winning the climate wars will require convincing the bulk of the population that global warming presents a grave threat that can be successfully met through public policy reforms. Given its unrelenting partisanship, unbalanced ideological proclivities, and insistence on personal excursions, the role of The Hockey Stick and the Climate Wars is likely to be circumscribed.

Back to Basics on Energy Policy

In June 1973, President Richard Nixon addressed the emerging energy crisis, saying that “the answer to our long-term needs lies in developing new forms of energy.” He asked Congress for a five-year, $10 billion budget to “ensure the development of technologies vital to meeting our future energy needs.” With this speech, the federal government set out to engineer a fundamental transformation of our energy supply.

All seven subsequent presidents have endorsed Nixon’s goal, and during the past 40 years, the federal government has spent about $150 billion (in 2012 dollars) on energy R&D, offered $35 billion in loan guarantees, and imposed numerous expensive energy mandates in an effort to develop new energy sources. During this time, many talented and dedicated people have worked hard, done some excellent science, and learned a great deal. Yet federal energy technology policy has failed to reshape the U.S. energy market in any meaningful way.

The major failure has been in efforts to commercialize technologies, with many billions of dollars essentially wasted on loan guarantees, tax credits, and other subsidies that never produced results. We have failed to learn that commercialization cannot be forced and must wait until the technologies are competitive enough to support private investment on a market basis.

It’s time to refocus the nation’s efforts on what the federal government has traditionally done best: supporting conceptual and technical research. Commercialization should be left to the marketplace.

A rocky road

Fossil fuels (oil, coal, and natural gas) are the dominant source of energy today and will be for decades to come (figure below). The federal push to develop alternatives in the two other major energy categories—nuclear and renewables—has been rocky, to say the least.

The nuclear era began with the brilliant physics and engineering of the Manhattan Project and the subsequent development of nuclear reactors by the U.S. Navy for submarines and aircraft carriers. The road to civilian applications, however, has been less successful. The aftermath of World War II brought a wide range of ideas for the peaceful application of nuclear power, including nuclear-powered aircraft, rockets, and even cars (the Ford Nucleon). In 1959, the U.S. government launched the N.S. (Nuclear Ship) Savannah, a nuclear-powered passenger-cargo ship, to demonstrate the commercial potential of nuclear power. The ship was a technical success but an economic failure, and no other commercial nuclear vessels were ever built in the United States.

Electricity generation emerged as the best civilian application for nuclear fission. In the immediate postwar period, coal was the dominant source of electricity, but it created serious air quality problems. Beginning in 1965, electric power companies began building generating plants burning high-sulfur heavy fuel oil made from imported crude. Nuclear power offered the potential for an almost infinite fuel supply with no pollution or dependence on foreign oil. Technically, nuclear power plants can be built almost anywhere and in any number, and can be operated with high load factors, often more than 90%. At first, the economics of nuclear power looked promising, prompting Atomic Energy Commission Chairman Lewis Strauss to hope that electricity would someday be “too cheap to meter.” The federal government offered massive support. Since 1973, roughly 30% of federal energy R&D has been devoted to nuclear power. Billions of dollars were also spent on subsidies for plant construction and commercial and regulatory support. There was also extensive spending for military applications.

Civilian nuclear power made some real progress, but only for a while. The United States has 104 nuclear power plants with a total capacity of around 100 gigawatts (GW). These plants were built in three roughly equal tranches: the first broke ground between 1964 and 1969, the second between 1969 and 1974, and the third between 1974 and 1977. The first tranche showed good economics and an average construction time of 6.5 years. Prospects began to dim quickly, however, with reduced government subsidies, increasingly strict safety standards, and growing public opposition. The second tranche of nuclear plants took an average of 10.5 years to build, and the third tranche nearly 12 years. The Watts Bar #1 plant of the Tennessee Valley Authority came on line in May 1996, 23 years after construction started. Long delays are deadly to the economics of any capital-intensive project. Groundbreaking for new nuclear power plants came to a complete halt in March 1977, and hopes for a restart died with the Three Mile Island accident in 1979. Sixty-three planned nuclear power plants were canceled between 1974 and 1995.

Many of the completed nuclear power plants suffered severe cost overruns, but the canceled plants were often even more expensive. The Shoreham plant on Long Island cost $6 billion, about 30 times the original estimate. Even worse, the completed plant was abandoned and decommissioned before startup in the face of overwhelming public opposition. The burden of this stillborn plant fell on Long Island electricity consumers.

The commercial nuclear program also produced another problem: radioactive waste. A typical nuclear plant generates about 20 tons of radioactive waste annually. Almost all the waste products are currently stored at power plant sites, but there are limitations to this approach. The federal government has always anticipated a long-term solution through either a central storage site or fuel reprocessing. But these options have proven to be politically difficult, and the planned waste repository at Yucca Mountain in Nevada has been scrubbed after more than $10 billion in sunk costs. There is no solution in sight.

In 1973, Nixon predicted that nuclear energy would provide 25% of the nation’s electricity supply by 1985 and 50% by 2000. The actual share was about 15% in 1985, plateauing at 20% in 1991. The nuclear program thus proved much more modest and much more expensive than expected.

The context for nuclear power has now changed. Not only are nuclear plants much more expensive than anticipated, but their primary competitor is no longer dirty coal or imported oil but instead clean, inexpensive domestic natural gas. The problem of nuclear waste remains unsolved, and public opposition, rekindled by last year’s Fukushima disaster in Japan, will probably frustrate the nuclear industry for the foreseeable future.

Only two new nuclear plants are currently under construction. The latest Energy Information Administration (EIA) outlook projects 10 GW of new nuclear capacity by 2035, plus another 7 GW of productivity improvements in existing plants. Unfortunately, the EIA also anticipates the gradual retirement of older nuclear plants, leading to an absolute decline in nuclear capacity after 2029. In current parlance, nuclear is no longer scalable.

The outlook for renewables

The limitations on nuclear power leave us with renewables as our only alternative to fossil fuels. In 1973, renewables accounted for 4 quads or about 6% of our energy consumption. By 2010, renewables had doubled to 8 quads and their share had increased to 8%. By the year 2035, the EIA projects another 50% increase to about 12 quads. It’s tempting to see the growth in renewables as evidence of success in developing alternative fuels, but when we look at the disparate components of the renewables category, the picture is not quite so rosy.

The current 8 quads of U.S. renewables include: hydropower (2.5 quads), wood (2 quads), municipal solid waste (MSW) (0.5 quad), corn ethanol (2 quads), geothermal (0.2 quad), wind (1 quad), and solar (0.1 quad). Let’s look at each of these energy sources.

Hydropower provides clean and virtually carbon-free power, but is expensive to build, and its output varies by season and by year. Both scalability and economics are limited by geography. The United States has 1,426 hydro plants with a combined capacity of 78 GW. About 40% of this capacity, however, is in 16 massive dams on major rivers, such as the Grand Coulee (7 GW), Chief Joseph (2.5 GW), John Day (2 GW), and Dalles (2 GW) dams on the Columbia River in Washington State and Oregon. Almost all of the low-cost sites are already developed, and the EIA projects an increase of only 0.5 quad by 2035.

The economics of wood are excellent, but only if you happen to be in the lumber or paper industry. About two-thirds of the 2 quads of wood energy is in the 15 main timber-producing states in the Southeast and Northwest, and growth will track lumber and paper production. MSW contributes another 0.5 quad to our energy supply. Producing electricity from burning trash may help cities deal with landfill limitations, but it’s not a very efficient, economical, or environmentally friendly form of energy. MSW electricity costs about four times as much as natural gas combined-cycle power and, according to the Environmental Protection Agency, releases toxic chemicals, mercury, and dioxin. Not much growth potential here.

Corn ethanol (2 quads) is the only renewable energy source competing with oil, but it’s expensive and diverts crops from the food supply. In 2011, the United States used nearly one-third of its corn crop to replace only 5% of its oil supply—not a very good tradeoff. The resulting corn price increases have hurt not only U.S. consumers but also consumers in poor countries that rely on U.S. corn exports.

Ethanol is not technically complex. In fact, virtually every society during the past 5,000 years has mastered the technology of distilling ethanol from plants. Ethanol shows great economics for wine and spirits, but not for fuel. Fuel ethanol requires buying huge amounts of corn, transporting it to a distillery, operating the distillery, and then distributing the final product into the gasoline pool. Ethanol contains only two-thirds as much energy per gallon as gasoline, and moving it by pipeline presents a number of technical problems. Despite these problems, the federal government forced ethanol into the market, initially through federal excise tax breaks, plus a tariff to keep out Brazilian imports. So far, U.S. consumers have paid about $50 billion in subsidies plus at least an additional $5 billion per year in higher food prices.

Although direct ethanol subsidies were eliminated in 2011, Congress has retained a mandate requiring that the gasoline supply contain increasing amounts of ethanol. The 2022 requirement of 36 billion gallons would consume almost the entire corn crop. To square this circle, Congress mandated that ethanol from cellulose feedstocks contribute at least 16 billion gallons by 2022. That makes the arithmetic work, but unfortunately, there is no viable technology to produce cellulosic ethanol, and corn-based ethanol is hitting its limits.

Next on the list is geothermal (0.2 quad). In many seismically active areas, high-temperature, high-pressure water trapped below the surface can be accessed by shallow drilling. Although its environmental footprint is small, geothermal energy is limited by geography and thus not scalable. About 85% of U.S. geothermal capacity is in California and another 13% in Nevada. The EIA projects growth to about 0.5 quad by 2035, a negligible contribution to the energy balance.

Wind energy (1 quad) has shown rapid growth, with electricity-generating capacity increasing from less than 2.5 GW in 2000 to over 43 GW today. Wind economics have improved somewhat, mainly by making turbines larger and building ever larger wind farms, but wind power is still expensive. Onshore wind power costs about 70% more than natural gas combined-cycle power, and offshore wind costs about 300% more. Wind power is also intermittent, has low load factors (around 30%), and is disproportionately available at night, when utilities have other low-cost units sitting idle.

Perhaps most important, state-of-the-art wind turbines, some taller than the Washington Monument, bring into conflict two basic tenets of the environmental movement: support for clean energy and opposition to disturbing pristine areas. Although the environmental community supports wind power in general, large wind farms near populated areas tend to generate substantial local opposition, often from staunch environmentalists. As a result, most wind turbines are built in remote areas, requiring expensive long-distance transmission lines.

There would be no wind power in the United States without massive federal and state support, including a 2.2-cent per kilowatt-hour federal production tax credit and Renewable Portfolio Standards in various states that require electric utilities to acquire a certain percentage of their power from approved renewable sources, regardless of cost. These subsidies and mandates cost consumers/taxpayers on the order of $3.5 billion to $4 billion a year. The EIA outlook shows only modest growth in wind power from 1 quad today to about 2 quads in 2035, which is still less than 2% of energy supply.

Solar, the icon of the green movement, contributes only 0.1 quad, or about 0.1% of the U.S. energy supply. The first solar cells, developed by Bell Labs in 1954, were inefficient (4% conversion) and expensive, but they were a real engineering breakthrough. The search for more cost-effective solar became a cornerstone of the federal energy R&D effort in 1977 with the establishment of the Department of Energy’s (DOE’s) Solar Energy Research Institute, now part of the National Renewable Energy Laboratory.

Today’s silicon crystal solar panels are a dramatic improvement over the original Bell Labs designs, with some cells achieving a conversion efficiency of about 35%, but the power output is still intermittent and load factors are very low, around 15%. On balance, solar energy is way too expensive for widespread application. Residential rooftop solar panels or large-scale photovoltaic power plants generate electricity at 7 to 10 times the cost of grid power. Even with continued heavy subsidies at the federal and state levels, the EIA expects solar energy to grow from 0.1 quad today to only about 0.4 quad by 2035, which is less than 0.5% of our energy supply 80 years after the solar cell’s invention.

Why have we failed?

On balance, 40 years of intensive federal research have produced no new technologies that could be called transformative. Why have all the hard work and taxpayer money failed to meet the national goals set by the past eight presidents?

Successful technologies pass through three distinct stages. In the conceptual phase, we develop a solid understanding of the science involved. In the technical phase, we learn how to build machines that actually work. In the final commercial phase, we figure out how to make products whose cost and performance convince consumers to prefer them over competing products. Few technologies can move all the way through this progression, and it’s not easy to pick the ultimate winners.

The United States has a long history of extraordinary technological achievements in all aspects of life, and the federal government has played a critical role in this process, driven mainly by national security needs. Having learned the hard way in World War II the disastrous consequences of military inferiority, the United States has invested trillions of dollars in our defense capabilities. The economics of national security, however, are different from the economics of the civilian economy. Technical superiority is mission-critical in many military situations, and the Pentagon is often willing to pay a high premium for relatively small advantages. The military also has a high tolerance for program failures, cost overruns, and an inefficient procurement process, because the consequences of military failure can be catastrophic. As a result, the military tends to focus on technical success. Fortuitously, some but by no means all military advances turn out to have major commercial applications. Four-engine bombers easily became airliners. Jet engines, computers, the Internet, and the Global Positioning System moved easily into the civilian economy. Commercial applications, however, are incidental to military research, not its objective.

In the civilian economy, new technologies face a much more severe economic test. The Department of Defense’s (DOD’s) fiscal year (FY) 2012 budget, excluding the Iraq and Afghanistan wars, is $531 billion. U.S. consumers, however, pay approximately $1.5 trillion for energy every year. The economy is therefore very sensitive to energy costs, and forcing the use of more expensive forms of energy can have serious consequences for growth.

The mantra of the energy R&D program has always been, “If we can put a man on the Moon, we can do anything,” but this comparison is wrong. Apollo was a conceptual and technical triumph with no commercial aspirations. Between 1969 and 1972, the United States landed 12 astronauts on the Moon at a cost of $12.5 billion (in 2012 dollars) per astronaut. The purpose of the program was to accomplish a technically difficult feat a few times despite the enormous cost. Civilian technology requires the exact opposite: the ability to do something on a large scale at a low cost. More than 40 years after Neil Armstrong’s giant leap, spaceflight is still too expensive for the average citizen, at least those unable to pay Virgin Galactic $200,000 for a 15-minute suborbital ride.

Supersonic flight faced the same problems. On October 14, 1947, Air Force pilot Chuck Yeager broke the sound barrier in the Bell X-1. The U.S. military quickly developed supersonic fighter and ultimately bomber aircraft and has been flying them successfully for more than 60 years. In contrast, there are no supersonic airliners in civilian service. Everyone is familiar with President John Kennedy’s famous May 1961 speech in which he committed the United States “to achieving the goal, before this decade is out, of landing a man on the moon and returning him safely to Earth.” Less well known is Kennedy’s commencement address to the U.S. Air Force Academy in June 1963, in which he committed us to the development “of a commercially successful supersonic transport superior to that being built in any other country of the world.” The difference between Apollo’s success and the supersonic transport’s failure lies in the single word “commercial.”

Jet engines are superior to piston engines in terms of efficiency, performance, and cost, thus creating the perfect conditions for commercial application. Supersonic flight, however, is very expensive, because the only way to break the sound barrier is to burn massive amounts of fuel. Furthermore, supersonic aircraft cost more to build and maintain and produce annoying sonic booms over populated areas. For the military, supersonic flight can mean the difference between mission success and failure and between life and death for flight crews. The high cost of high speed is therefore justified. For civilians, however, supersonic travel means the ability to get from New York to London in 3.5 hours rather than 6.5. The clientele for supersonic airliners is the tiny group of people who value their time at $1,000 to $2,000 per hour. The supersonic Concorde was a technical marvel but a commercial disaster, barely covering its cash operating costs until its retirement in 2003. Its European investors never recovered their investment.
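
A rough sketch of the value-of-time arithmetic behind that claim, using illustrative fare premiums that are assumptions rather than historical Concorde prices:

```python
# Value-of-time arithmetic behind the Concorde comparison: roughly 3 hours
# saved on New York-London (6.5 h subsonic vs. 3.5 h supersonic).
# The fare premiums below are illustrative assumptions, not historical prices.
hours_saved = 6.5 - 3.5

for premium in (3_000, 6_000):       # assumed premium over a subsonic ticket, in dollars
    print(f"${premium:,} premium -> time valued at ${premium / hours_saved:,.0f}/hour")
# Only travelers who value their time at roughly $1,000-$2,000 per hour
# come out ahead, which is the clientele the text describes.
```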

Transformation of the nation’s energy balance requires technologies that succeed not just in the conceptual and technical phases but in the commercial phase as well, and government commercialization efforts have been a complete failure. The root cause of the problem is the inherently political nature of government programs.

The market is far superior to government as a vehicle for commercialization. Private investors spend their own money and tend to watch it carefully. Lots of private commercialization efforts fail. In the marketplace, however, investors demand that management recognize failure quickly and stop throwing good money after bad. The government, on the other hand, has an almost infinite supply of other people’s money to spend. Elected officials are reluctant to say to voters, “The solution to this problem should be left to the marketplace.” They would much rather say, “If you elect me, I will fix this problem for you through government action.” The promise itself often brings the desired short-term political benefits, even if the policy itself ultimately fails.

In his 2006 State of the Union message, President Bush made a pitch for cellulosic ethanol, stating, “Our goal is to make this new kind of ethanol practical and competitive within six years.” It’s now six years later. The government spent some money and put in place a cellulosic ethanol mandate, but there is still no economically viable way to make cellulosic ethanol.

President Obama’s “Blueprint for an Energy Strategy” claims as its centerpiece a Clean Energy Standard, which would require that an increasing share of electric power come from “clean sources.” The White House claims that “With this requirement in place, clean sources would account for 80% of our electricity by 2035.” The administration has also launched the SunShot Initiative “to make solar energy cost-competitive with other forms of energy by the end of the decade.” The objectives of the DOE’s FY 2013 budget request include reducing the cost of car batteries by three-quarters by 2020. These are appealing promises, but how exactly will the government make these things happen, and why should we believe them? By the time we know whether these objectives can be met, Obama will be long retired from the White House.

Moreover, the actual allocation of government funds is always heavily influenced by short-term political considerations. Most federal employees are capable, honest, and professional. Congress, however, tends to write laws to direct funds to favored constituencies, and the president controls the appointments of all senior officials in the Executive Branch, creating an understandable sensitivity to the viewpoint of the White House.

Since 1973, the federal government has spent $40 billion (in 2012 dollars) on coal R&D, including President Bush’s $2 billion Coal Research Initiative designed to “improve coal’s competitiveness in future energy supply markets.” Why are U.S. consumers better off with more coal and less domestic natural gas in the energy balance? Why not let the coal industry compete on its own? The answer, of course, is jobs. Government coal programs support employment in several critical swing states, including West Virginia, Illinois, Ohio, Pennsylvania, Montana, and Missouri.

In order to support the commercialization of renewable energy, the Energy Policy Act of 2005 authorized loan guarantees, supplemented by President Obama with additional money from the economic stimulus program. Outstanding DOE loan guarantees now total $34.7 billion. Despite the best efforts of DOE staff, considerations of the political connections of loan applicants, the districts in which they operate, and the number of jobs they provide find their way into the decisionmaking process.

The loan guarantees include $8.4 billion for electric cars and their batteries. The technology for low-cost electric cars does not yet exist, and it’s unlikely that forcing the manufacture of the current generation of vehicles will create a technological breakthrough. In fact, forced scale-up may actually impede technological progress. Tesla Motors, for example, received a $465 million government loan guarantee. Its first model, the all-electric Roadster, was priced at more than $100,000. The Model S, Tesla’s new all-electric offering for 2012, is a mid-size sedan selling for $50,000, or twice the price of comparable conventional cars such as the Toyota Camry or Ford Fusion. Even with the current $7,500 federal tax credit, Tesla’s cars are toys for rich people. Rather than working to design commercially viable electric vehicles, electric car and battery companies lobby the federal government for subsidies. Why innovate when you can make money on inferior technology? Why spend R&D dollars when lobbying dollars generate more profit?

Two new nuclear power plants have been granted $10.3 billion in loan guarantees. What is to be gained from expensive subsidies to nuclear plants without solving the outstanding problems of economics, waste disposal, and public opposition?

More than $1 billion of program money has gone to solar manufacturing companies, including the infamous Solyndra. Unfortunately, the solar market created by state and federal subsidies for U.S. homeowners has now been taken over by the Chinese, whose manufacturing costs are lower. U.S. homeowners are now buying expensive solar systems, and the U.S. solar industry is going bankrupt, both at taxpayer expense.

An increasingly popular argument in defense of federal renewable commercialization efforts is the need to compensate for government support to fossil fuels. A 2011 study by Management Information Services, Inc. (MISI) calculated federal energy incentives between 1950 and 2010, including tax policy, regulation, R&D, market activity, and government services, and concluded that 44% of these incentives went to oil, 12% to coal, 14% to natural gas, 11% to hydro, 9% to nuclear, and only 10% to wind and solar. There are serious methodological problems with this study, such as defining partial relief from oil price controls as a subsidy, but let’s take the numbers at face value. The study counts $369 billion (in 2010 dollars) in government support for oil, implying that oil has been underpriced and therefore unfairly advantaged in the marketplace as compared to renewables, which enjoyed only $81 billion in support. This analysis ignores the fact that oil is the most heavily taxed product on Earth. Between 1950 and 2010, U.S. excise taxes on motor fuels totaled $1.2 trillion at the federal level and another $2 trillion at the state level (in 2010 dollars). Renewables are not burdened with any significant taxes.
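
Putting the incentive figures next to the excise-tax figures makes the asymmetry explicit; the sketch below uses only the numbers quoted in this paragraph (2010 dollars):

```python
# Net government treatment of oil vs. renewables, 1950-2010, using only the
# figures cited above (2010 dollars).
oil_incentives = 369e9                    # MISI estimate of support for oil
oil_excise_taxes = 1.2e12 + 2.0e12        # federal plus state motor-fuel taxes
renewable_incentives = 81e9               # MISI estimate of support for wind and solar

oil_net = oil_incentives - oil_excise_taxes
print(f"Oil, net of excise taxes: -${abs(oil_net) / 1e12:.1f} trillion (a net tax)")
print(f"Wind and solar, essentially untaxed: +${renewable_incentives / 1e9:.0f} billion net support")
```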

Eight steps to reform

So given the disappointing results of government energy R&D to date, where should the United States go from here? Here are eight suggestions for restructuring the federal energy technology program.

First, focus the federal energy technology budget on conceptual and technical research. Private companies don’t do enough conceptual research, because it’s difficult to capture the benefits of better science. Further, laboratory work is relatively inexpensive, and the government has a real comparative advantage in this area.

Second, give up the loan guarantees, production tax credits, renewable portfolio standards, government/industry partnerships, and mandates. These programs are the most expensive and least effective components of our energy policy. They try to force premature commercialization.

Third, stop trying to pick winners. Government energy R&D should seek a wide range of fundamentally new approaches and ideas. For example, the current technologies of crystalline silicon or thin-film solar cells may never be commercially successful. Federal R&D dollars would be better spent working on fundamentally new ways to capture sunlight. The concepts produced from federal research should be made available free of charge to U.S. companies for development by the private sector.

Fourth, fix the Research and Experimentation Tax Credit (RETC). The 20% RETC reduces the cost of private research, but it applies only to incremental R&D expenditures, and Congress periodically allows it to lapse. A permanent 5 to 10% tax credit for all R&D would allow companies to plan properly and boost overall R&D, including in energy.

Fifth, deal with externalities explicitly rather than implicitly. Fossil fuels do indeed have external costs that are not reflected in their prices, and renewable technology does offer potential advantages. U.S. air and water quality have improved dramatically during the past several decades because the federal government set reasonable standards for criteria pollutants, such as sulfur dioxide, carbon monoxide, and particulates, and allowed the affected companies to choose the most effective and least expensive means to comply.

Other externalities, however, are harder to address. U.S. dependence on the global oil market raises national security concerns because oil price volatility has an impact on the economy. The United States needs a comprehensive strategy to deal with this problem, involving not only alternative fuels but domestic oil resource development, diplomacy, and defense. Forcing small amounts of expensive ethanol into the gasoline pool offers little if any relief. Replacing coal or domestic natural gas with wind or solar has no impact at all on the oil market.

Carbon dioxide (CO2) is an even more perplexing problem. Climate change scientists argue that increasing CO2 emissions will have a catastrophic impact on humanity. If this view is correct, the solution must involve substantial and cooperative carbon reductions on a global scale. No government anywhere in the world, including the United States, has shown any willingness to bear the economic costs of such reductions. All the solar and wind power in the market today has reduced U.S. CO2 emissions by 30 to 35 million metric tons per year, which is less than one-tenth of 1% of the current worldwide total. China is increasing its carbon emissions by that amount every month.
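
A rough check of that share, assuming worldwide CO2 emissions of roughly 33 billion metric tons per year, an approximate early-2010s figure that is not given in the text:

```python
# Share of worldwide CO2 emissions offset by current U.S. wind and solar,
# assuming global emissions of roughly 33 billion metric tons per year
# (an approximate figure for the early 2010s, not given in the text).
global_emissions_mt = 33_000              # million metric tons CO2 per year (assumed)

for reduction_mt in (30, 35):             # range cited in the text
    share = reduction_mt / global_emissions_mt
    print(f"{reduction_mt} Mt offset -> {share:.2%} of global emissions")
# Either way, the offset is on the order of one-tenth of 1 percent.
```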

The congressional cap-and-trade proposals discussed during the past few years would have limited the price of carbon to $25 to $50 per metric ton out of fear that higher CO2 prices would impede economic growth. Such a price is well below the level required to change consumer behavior or to bring the current generation of renewable technologies into the market. Addressing climate change requires an open and honest debate about the severe tradeoffs involved and the feasibility of achieving meaningful results on a global scale. The forced commercialization of tiny amounts of high-cost wind and solar is simply throwing money away.

Sixth, stop using job creation as a rationale for renewable energy. Any federal program by definition creates jobs for the people who administer it and for those who receive its largesse. Jobs are also destroyed, however, by the removal of those funds from private capital markets by taxation or government borrowing. It’s impossible to determine whether this substitution has a net positive effect on the economy. Government spending should be judged solely on whether it meets program objectives.

Seventh, let the U.S. military focus on defending the country. Covering military installations with solar panels and wind turbines and testing biofuels in military aircraft may enhance the DOD’s relations with Congress and the White House, but such efforts do nothing to enhance military capabilities or reduce costs. The considerable prestige of the military cannot rescue expensive, poor-performing energy technologies.

Finally, and perhaps most importantly, don’t overpromise. For 40 years now, political leaders of both parties have been proclaiming that government can plan and engineer a fundamental transformation of the energy industry. Some elected officials continue to promise specific new technologies within specific time frames. Although these promises always sound good when they are made, none has ever been kept. When the promises prove to be empty, the politicians who made them are long gone from the scene.

The United States won the Cold War because it did not succumb to the temptations of central planning. Instead, it put its trust in the strength of its economic institutions to find new technologies and bring them to fruition, with the government and the private sector each doing what it does best. It’s quite possible, perhaps even likely, that the next 100 years will bring energy revolutions as profound and consequential as electricity or the automobile, but the technologies that spark these revolutions may not even be on today’s list of government-sponsored candidates. A real energy supply transformation may involve technologies we can’t even imagine today.

In 1540, Francisco Vasquez de Coronado set out from Mexico into what is now the U.S. Southwest in search of the Seven Cities of Gold. He failed not because his project was underfunded, but because he was seeking something that didn’t exist. In contrast, in 1804 President Thomas Jefferson sent Lewis and Clark’s Corps of Discovery to explore the U.S. Northwest—not to find anything specific but to find what was actually there. Unlike Coronado, Jefferson understood the essence of research.

Applying New Research to Improve Science Education

Science, technology, engineering, and mathematics (STEM) education is critical to the U.S. future because of its relevance to the economy and the need for a citizenry able to make wise decisions on issues faced by modern society. Calls for improvement have become increasingly widespread and desperate, and there have been countless national, local, and private programs aimed at improving STEM education, but there continues to be little discernible change in either student achievement or student interest in STEM. Articles and letters in the spring and summer 2012 editions of Issues extensively discussed STEM education issues. Largely absent from these discussions, however, is attention to learning.

This is unfortunate because there is an extensive body of recent research on how learning is accomplished, with clear implications for what constitutes effective STEM teaching and how that differs from typical current teaching at the K-12 and college levels. Failure to understand this learning-focused perspective is also a root cause of the failures of many reform efforts. Furthermore, the incentive systems in higher education, in part driven by government programs, act to prevent the adoption of these research-based ideas in teaching and teacher training.

A new approach

The current approach to STEM education is built on the assumption that students come to school with different brains and that education is the process of immersing these brains in knowledge, facts, and procedures, which those brains then absorb to varying degrees. The extent of absorption is largely determined by the inherent talent and interest of the brain. Thus, those with STEM “talent” will succeed, usually easily, whereas the others have no hope. Research advances in cognitive psychology, brain physiology, and classroom practices are painting a very different picture of how learning works.

We are learning that complex expertise is a matter not of filling up an existing brain with knowledge, but of brain development. This development comes about as the result of intensive practice of the cognitive processes that define the specific expertise, and effective teaching can greatly reduce the impact of initial differences among the learners.

This research has established important underlying causes and principles and important specific results, but it is far from complete. More research is needed on how to accomplish the desired learning most effectively over the full range of STEM skills and potential learners in our classrooms, as well as how to best train teachers.

What is learning STEM?

The appropriate STEM educational goal should be to maximize the extent to which the learners develop expertise in the relevant subject, where expertise is defined by what scientists and engineers do. This is not to say that every learner should become a scientist or engineer, or that they could become one by taking any one class, but rather that the value of the educational experiences should be measured by their effectiveness at changing the thinking of the learner to be more like that of an expert when solving problems and making decisions relevant to the discipline. As discussed in the National Research Council study Taking Science to School, modern research has shown that children have the capability to begin this process and learn complex reasoning at much earlier ages than previously thought, at least from the beginning of their formal schooling. Naturally, it is necessary and desirable for younger children to learn less specialized expertise encompassing a broader range of disciplines than would be the case for older learners.

Expertise has been extensively studied across a variety of disciplines. Experts in any given discipline have large amounts of knowledge and particular discipline-specific ways in which they organize and apply that knowledge. Experts also have the capability to monitor their own thinking when solving problems in their discipline, testing their understanding and the suitability of different solution approaches, and making corrections as appropriate. There are a number of more specific components of expertise that apply across the STEM disciplines. These include the use of:

  • Discipline- and topic-specific mental models involving relevant cause and effect relationships that are used to make predictions about behavior and solve problems.
  • Sophisticated criteria for deciding which of these models do or don’t apply in a given situation, and processes for regularly testing the appropriateness of the model being used.
  • Complex pattern-recognition systems for distinguishing between relevant and irrelevant information.
  • Specialized representations.
  • Criteria for selecting the likely optimum solution method to a given problem.
  • Self-checking and sense making, including the use of discipline-specific criteria for checking the suitability of a solution method and a result.
  • Procedures and knowledge, some discipline-specific and some not, that have become so automatic with practice that they can be used without requiring conscious mental processing. This frees up cognitive resources for other tasks.

Many of these components involve making decisions in the presence of limited information—a vital but often educationally neglected aspect of expertise. All of these components are embedded in the knowledge and practices of the discipline, but that knowledge is linked with the process and context, which are essential elements for knowledge to be useful. Similarly, measuring the learning of most elements of this expertise is inherently discipline-specific.

How is learning achieved?

Researchers are also making great progress in determining how expertise is acquired, with the basic conclusion being that those cognitive processes that are explicitly and strenuously practiced are those that are learned. The learning of complex expertise is thus quite analogous to muscle development. In response to the extended strenuous use of a muscle, it grows and strengthens. In a similar way, the brain changes and develops in response to its strenuous extended use. Advances in brain science have now made it possible to observe some of these changes.

Specific elements, collectively called “deliberate practice,” have been identified as key to acquiring expertise across many different areas of human endeavor. This involves the learner solving a set of tasks or problems that are challenging but doable and that involve explicitly practicing the appropriate expert thinking and performance. The tasks must be sufficiently difficult to require intense effort by the learner if progress is to be made, and hence must be adjusted to the current state of expertise of the learner. Deliberate practice also includes internal reflection by the learner and feedback from the teacher/coach, during which the achievement of the learner is compared with a standard, and there is an analysis of how to make further progress. The level of expert-like performance has been shown to be closely linked to the duration of deliberate practice. Thousands of hours of deliberate practice are typically required to reach an elite level of performance.

This research has a number of important implications for STEM education. First, it means that learning is inherently difficult, so that motivation plays a large role. To succeed, the learner must be convinced of the value of the goal and believe that hard work, not innate talent, is critical. Second, activities that do not demand substantial focus and effort provide little educational value. Listening passively to a lecture, doing many easy, repetitive tasks, or practicing irrelevant skills produces little learning. Third, although there are distinct differences among learners, for the great majority the amount of time spent in deliberate practice outweighs any other variable in determining learning outcomes.

Implications for teaching

From the learning perspective, effective teaching is that which maximizes the learner’s engagement in cognitive processes that are necessary to develop expertise. As such, the characteristics of an effective teacher are very analogous to those of a good athletic coach: designing effective practice activities that break down and collectively embody all the essential component skills, motivating the learner to work hard on them, and providing effective feedback.

The effective STEM teacher must:

  • Understand expert thinking and design suitable practice tasks.
  • Target student thinking and learning needs. Such tasks must be appropriate to the level of the learner and be effective at building on learners’ current thinking to move them to higher expertise. The teacher must be aware of and connect with the prior thinking of the learner as well as have an understanding of the cognitive difficulties posed by the material.
  • Motivate the student to put in the extensive effort that is required for learning. This involves generating a sense of self-efficacy and ownership of the learning; making the subject interesting, relevant, and inspiring; developing a sense of identity in the learner as a STEM expert; and other factors that affect motivation. How to do this in practice is dependent on the subject matter and the characteristics of the learner—their prior experience, level of mastery, and individual and sociocultural values.
  • Provide effective feedback that is timely and directly addresses the student’s thinking. This requires the teacher to recognize the student’s thought processes, be aware of the typical cognitive challenges with the material, and prepare particular questions, tasks, and examples to help the learner overcome those challenges. Research has shown several effective means of providing feedback, including short, focused lectures if the student has been carefully prepared to learn from that lecture.
  • Understand how learning works, and use that to guide all of their activities. In addition to the research on learning expertise, this includes other well-established principles regarding how the human brain processes and remembers information that are relevant to education, such as the limitations of the brain’s short-term memory and what processes enhance long-term retention.

Although many of these instructional activities are easier to do one on one, there are a variety of pedagogical techniques and simple technologies that extend the capabilities of the teacher to provide these elements of instruction to many students at once in a classroom, often by productively using student-student interactions. Examples of approaches that have demonstrated their effectiveness can be found in recommended reading articles by Michelle Smith and by Louis Deslauriers et al.

Effective STEM teaching is a specific learned expertise that includes, and goes well beyond, STEM subject expertise. Developing such teaching expertise should be the focus of STEM teacher training. Teachers must have a deep mastery of the content so they know what expert thinking is, but they also must have “pedagogical content knowledge.” This is an understanding of how students learn the particular content and the challenges and opportunities for facilitation of learning at a topic-specific level.

This view of STEM teaching as optimizing the development of expertise provides clearer and more detailed guidance than what is currently available from the classroom research on effective teaching. Most of the classroom research on effective teaching looks at K-12 classrooms and attempts to link student progress on standardized tests with various teacher credentials, traits, or training. Although there has been progress, it is limited because of the challenges of carrying out educational research of this type. There are a large number of uncontrolled variables in the K-12 school environment that affect student learning; the standardized tests are often of questionable validity for measuring learning; teacher credentials and training are at best tenuous measures of their content mastery and pedagogical content mastery; and the general level of these masteries is low in the K-12 teacher population. The level of mastery is particularly low among elementary- and middle-school teachers. All of these factors conspire to make the signals small and easily masked by other variables.

At the college level, the number of uncontrolled variables is much smaller, and as reviewed in the NRC report Discipline-Based Education Research, it is much clearer that those teachers who practice pedagogy that supports deliberate practice by the students show substantially greater learning gains than are achieved with traditional lectures. For example, the learning of concepts for all students is improved, with typical increases of 50 to 100%, and the dropout and failure rates are roughly halved.

Shortcomings of the current system

Typical K-16 STEM teaching contrasts starkly with what I have just described as effective teaching. At the K-12 level, although there are notable exceptions, the typical teacher starts out with a very weak idea of what it means to think like a scientist or engineer. Very few K-12 teachers, even those who were STEM majors, acquire sufficient domain expertise in their preparation. Hence, the typical teacher begins with very little capability to properly design the requisite learning tasks. Furthermore, their lack of content mastery, combined with a lack of pedagogical content knowledge, prevents them from properly evaluating and guiding the students’ thinking. Much of the time, students in class are listening passively or practicing procedures that neither involve the desired cognitive elements nor require the strenuous effort that is important for learning.

Teachers at both the K-12 and undergraduate levels also have limited knowledge of the learning process and what is known about how the mind functions, resulting in common educational practices that are clearly counter to what research shows is optimum, both for processing and learning information in the classroom environment and for achieving long-term retention. Another shortcoming of teaching at all levels is the strong tendency to teach “anti-creativity.” Students are taught and tested on solving well-defined artificial problems posed by the teacher, where the goal is to use the specific procedure the teacher intended to produce the intended answer. This requires essentially the opposite cognitive process from STEM creativity, which is primarily recognizing the relevance of previously unappreciated relationships or information to solve a problem in a novel way.

At the undergraduate level, STEM teachers generally have a high degree of subject expertise. Unfortunately, this is not reflected in the cognitive activities of the students in the classroom, which again consist largely of listening, with very little cognitive processing needed or possible. Students do homework and exam problems that primarily involve practicing solution procedures, albeit complex and/or mathematically sophisticated ones. However, the assigned problems almost never explicitly require the sorts of cognitive tasks that are the critical components of expertise described above. Instructors also often suffer from “expert blindness,” failing to recognize and make explicit many mental processes that they have practiced so much that they are automatic.

Another problem at the postsecondary level is the common belief that effective teaching is only a matter of providing information to the learner, with everything else being the responsibility of the learners and/or their innate limitations. It is common to assume that motivation, and even curiosity about a subject, are entirely the responsibility of the student, even when the student does not yet know much about the subject.

Failure of reform efforts

The perspective on learning that I have described also explains the failure of many STEM reform efforts.

Belief in the importance of innate talent or other characteristics. Schools have long focused educational resources on learners that have been identified in some manner as exceptional. Although the research shows that all brains learn expertise in fundamentally the same way, that is not to say that all learners are the same. Many different aspects affect the learning of a particular student. Previous learning experiences and sociocultural background and values obviously play a role. There is a large and contentious literature as to the relative significance of innate ability/talent or the optimum learning style of each individual, with many claims and fads supported by little or questionable research.

Researchers have tried for decades to demonstrate that success is largely determined by some innate traits and that by measuring those traits with IQ tests or other means, one can preselect children who are destined for greatness and then focus educational resources on them. This field of research has been plagued by difficulties with selection bias and the lack of adequate controls. Although there continues to be some debate, the bulk of the research is now showing that, excepting the lower tail of the distribution consisting of students with pathologies, the predictive value of any such early tests of intellectual capability is very limited. From an educational policy point of view, the most important research result is that any predictive value is small compared to the later effects of the amount and quality of deliberate practice undertaken by the learner. That predictive value is also small compared to the effects of the learners’ and teachers’ beliefs about learning and the learners’ intellectual capabilities. Although early measurements of talent, or IQ, independent of other factors have at best small correlation with later accomplishment, simply labeling someone as talented or not has a much larger correlation. It should be noted that in many schools students who are classified as deficient by tests with very weak predictive value are put into classrooms that provide much less deliberate practice than the norm, whereas the opposite is true for students who are classified as gifted. The subsequent difference in learning outcomes for the two groups provides an apparent validation for what is merely a self-fulfilling prophecy. Given these findings, human capital is clearly maximized by assuming that, except for students with obvious pathologies, every student is capable of great achievement in STEM and should be provided with the educational experiences that will maximize their learning.

The idea that for each individual there is a unique learning style is surprisingly widespread given the lack of supporting evidence for this claim, and in fact significant evidence showing the contrary, as reviewed by Hal Pashler of the University of California at San Diego and others.

Because of the presence of many different factors that influence a student’s success in STEM, including the mind’s natural tendency to learn, some students do succeed in spite of the many deficiencies in the educational system. Most notably, parents can play a major role in both early cognitive development and STEM interest, which are major contributors to later success. However, optimizing the teaching as I described would allow success for a much larger fraction of the population, as well as allowing those students who are successful in the current system to do even better.

Poor standards and accountability. Standards have had a major role in education reform efforts, but they are very much a double-edged sword. Although good definitions and assessments of the desired learning are essential, bad definitions are very harmful. There are tremendous pitfalls in developing good, widely used standards and assessments. The old concept of learning, combined with expert blindness and individual biases, exerts a constant pressure on standards to devolve into a list of facts covering everyone’s areas of interest, with little connection to the essential elements of expertise. The shortcomings in the standards are then reinforced by the large-scale assessment systems, because measuring a student’s knowledge of memorized facts and simple procedures is much cheaper and easier than authentic measurements of expertise. So although good standards and good assessment must be at the core of any serious STEM education improvement effort, poor standards and poor assessments can have very negative consequences. The recent National Academy of Sciences–led effort on new science standards, starting with a carefully thought-out guiding framework, is an excellent start, but this must avoid all the pitfalls as it is carried through to large-scale assessments of student mastery. Finally, good standards and assessments will never by themselves result in substantial improvement in STEM education, because they are only one of several essential components to achieving learning.

Competitions and other informal science programs: Attempting to separate the inspiration from the learning. Motivation in its entirety, including the elements of inspiration, is such a fundamental requirement for learning that any approach that separates it from any aspect of the learning process is doomed to be ineffective. Unfortunately, a large number of government and private programs that support the many science and engineering competitions and out-of-school programs assume that they are separable. The assumption of such programs is that by inspiring children through competitions or other enrichment experiences, they will then thrive in formal school experiences that provide little motivation or inspiration and still go on to achieve STEM success. Given the questionable assumptions about the learning process that underlie these programs, we should not be surprised that there is little evidence that such programs ultimately succeed, and some limited evidence to the contrary. The past 20 years have seen an explosion in the number of participants in engineering-oriented competitions such as FIRST Robotics, while the fraction of the population getting college degrees in engineering has remained constant. A study by Rena Subotnik and colleagues that tracked high-school Westinghouse (now Intel) talent search winners, an extraordinarily elite group already deeply immersed in science, found that a substantial fraction, including nearly half of the women, had switched out of science within a few years, largely because of their experiences in the formal education system. It is not that such enrichment experiences are bad, just that they are inherently limited in their effectiveness. Programs that introduce these motivational elements as an integral part of every aspect of the STEM learning process, particularly in formal schooling, would probably be more effective.

Silver-bullet solutions. A number of prominent scientists, beginning as far back as the Sputnik era, have introduced new curricula based on their understanding of the subject. The implicit assumption of such efforts is that someone with a high level of subject expertise can simply explain to novices how an expert thinks about the subject, and the novices (either students or K-12 teachers) will then embrace and use that way of thinking and be experts themselves. This assumption is strongly contradicted by the research on expertise and learning, and so the failure of such efforts is no surprise.

A number of elements such as school organization, teacher salaries, working conditions, and others have been put forth as the element that, if changed, will fix STEM education. Although some of these may well be a piece of a comprehensive reform, they are not particularly STEM-specific and by themselves will do little to address the basic shortcomings in STEM teaching and learning.

The conceptual flaws of STEM teacher in-service professional development. The federal government spends a few hundred million dollars each year on in-service teacher professional development in STEM, with states and private sources providing additional funding. Suzanne Wilson’s review of the effectiveness of such professional development activities finds evidence of little success and identifies structural factors that inhibit effectiveness. From the perspective of learning expertise, it is clear why teacher professional development is fundamentally ineffective and expensive. If these teachers failed to master the STEM content as full-time students in high school and college, it is unrealistic to think they will now achieve that mastery as employees through some intermittent, part-time, usually voluntary activity on top of their primary job.

Why change is hard

First, nearly everyone who has gone to school perceives himself or herself to be an expert on education, resulting in a tendency to seize on solutions that overlook the complexities of the education system and how the brain learns. Second, there are long-neglected structural elements and incentives within the higher education system that actively inhibit the adoption of better teaching methods and the better training of teachers. These deserve special attention.

Improving undergraduate STEM teaching to produce better-educated graduates and better-trained future K-12 teachers is a necessary first step in any serious effort to improve STEM education, but there are several barriers to accomplishing this. First, the tens of billions of dollars of federal research funding going to academic institutions, combined with no accountability for educational outcomes at the levels of the department or the individual faculty member, have shaped the university incentive system to focus almost entirely on research. Thus, STEM departments and individual faculty members, regardless of their personal inclinations, are forced to prioritize their time accordingly, with the adoption of better teaching practices, improved student outcomes, and contributing to the training of future K-12 STEM teachers ranking very low. Second, to the limited extent that there are data, STEM instructional practices appear to be similarly poor across the range of postsecondary institutions. This is probably because the research-intensive universities produce most of the Ph.D.s, who become the faculty at all types of institutions, and so the educational values and standards of the research-intensive universities have become pervasive. Third, with a few exceptions, the individual academic departments retain nearly absolute control over what they teach and how they teach. Deans, provosts, and especially presidents have almost no authority over, or even knowledge of, educational practices in use by the faculty. Any successful effort to change undergraduate STEM teaching must change the incentives and accountability at the level of the academic department and the individual faculty member in the research-intensive universities.

A possible option would be to make a department’s eligibility to receive federal STEM research funds contingent on the reporting and publication of undergraduate teaching practices and student outcomes. A standard reporting format would make it possible to compare the extent to which departments and institutions employ best practices. Prospective students could then make more-informed decisions about which institution and department would provide them with the best education.

Most K-12 teacher preparation programs have a local focus, and they make money for the institutions of which they are a part. There is no accepted professional standard for teacher training, and there is a financial incentive for institutions to accept and graduate as many education majors as possible. This has resulted in low standards, particularly in math and science, with teacher education programs frequently having the lowest math and science requirements of any major at the institution. This also means that they attract students with the greatest antipathy toward math and science. Research by my colleagues has found that elementary education majors have far more novice-like attitudes about physics than do students in any other major at the university. Federal programs to support the training of K-12 STEM teachers provide easily available scholarship money, which reinforces the status quo by ensuring a plentiful supply of students in spite of the programs’ low quality. Rewarding institutions that produce graduates with the expertise needed to be highly effective teachers is an essential step in bringing about the massive change that is needed in the preparation of STEM teachers.

Focusing on STEM learning and how it is achieved provides a valuable perspective for understanding the shortcomings of the educational system and how it can be improved. It clarifies why the current system is producing poor results and explains why current and past efforts to improve the situation have had little effect. However, it also offers hope. Improvement is contingent on changes in the incentive system in higher education to bring about the widespread adoption of STEM teaching methods and the training of K-12 teachers that embody what research has shown is important for effective learning. These tasks are admittedly challenging, but the results would be dramatic. The United States would go from being a laggard in STEM education to the world leader.

Controlling the Arms Bazaar

In July 2012, after years of preliminary effort, United Nations (UN) member nations gathered in New York to draft a treaty that would provide the foundation for regulating the international arms trade. Time ran out on the Arms Trade Treaty negotiations, but treaty proponents have promised to continue the effort.

The arms trade includes everything from handguns to ballistic missiles and a wide range of supporting equipment and technology. After an extended slump following the end of the Cold War, the trade is currently growing. The Congressional Research Service estimates that in 2011, worldwide arms agreements, which include sales and assistance, totaled more than $85 billion, almost twice the 2010 total. Small Arms Survey, a well-regarded nongovernmental organization based in Geneva, Switzerland, estimates the size of the legal trade in small arms as at least $8.5 billion a year. For obvious reasons, no one has a reliable estimate for the size of the illicit arms trade.

In The Shadow World: Inside the Global Arms Trade, Andrew Feinstein makes the case for bringing arms transfers under greater control. This is probably the most comprehensive investigative account of the global arms trade since Anthony Sampson’s The Arms Bazaar in the mid-1970s. Like The Arms Bazaar, it is filled with accounts of the world of arms brokers, shady deals, and covert assistance. Feinstein also addresses the economic pressures that drive arms procurement and exports in the regular defense budget and policy process. Above all, it is a story of corruption and its deadly consequences.

Feinstein writes about the impact of corruption in the arms trade from personal experience. As a legislator in South Africa’s early post-apartheid government, he found himself the ranking member of the African National Congress on the parliament’s Public Accounts Committee. In the course of his service he confronted evidence of corruption and mismanagement in multibillion-dollar arms deals—by a government initially pledged to cut military spending—and was eventually forced to give up his seat after he refused to stop pushing for an investigation of the allegations.

The Shadow World is first-rate advocacy research with extensive and well-documented evidence assembled in service of his indictment of the current state of the global arms trade. “The trade in weapons,” he writes, “is a parallel world of money, corruption, deceit, and death. It operates according to its own rules, largely unscrutinized, bringing enormous benefits to the chosen few, and suffering and immiseration to millions. The trade corrodes our democracies, weakens already fragile states and often undermines the very national security it purports to strengthen.” Readers will have to decide, however, whether Feinstein’s understandable anger at the ruinous consequences of the arms trade at times pushes him to argue his case in ways that may make it hard to persuade those not already inclined to share his views.

Several recurring storylines will give readers insights into facets of the global trade. Some illustrate the workings of the trade in major weapons systems, and some focus on the illicit trade, mostly in small arms and light weapons, and its devastating effects in conflicts around the world; both underscore the political role that arms transfers continue to play for suppliers and recipients.

One of the storylines concerns the Middle East and especially Saudi Arabia. In the wake of the 1973 Arab-Israeli war, the boycott by the Organization of the Petroleum Exporting Countries and subsequent price increases generated vast revenues for oil-producing states, giving them the wherewithal and the security incentives to expand their arsenals. For arms-producing states, the incentives to cement political relationships and “soak up petrodollars” with arms exports were equally strong. The United States, the United Kingdom, and other producers, including the Soviet Union/Russia, vied to sell huge quantities of weapons, particularly advanced aircraft and naval vessels. Israel and Egypt, at the center of continuing efforts to craft a lasting Middle East peace settlement, also receive huge quantities of weapons as security assistance. Even with the recent rise of China and India as major arms importers, a substantial portion of the world’s weapons exports in any given year will be destined for the Middle East.

At the center of Feinstein’s Middle East story is Saudi Arabia, with its apparently insatiable appetite for modern weaponry and pervasive culture of corruption that has tainted its relations with major arms producers. The Al Yamamah (Arabic for “The Dove”) deal between the United Kingdom and the Saudis captures the essence and the high politics of these transactions; personal interventions by Prime Minister Margaret Thatcher helped to seal the deal. Over 20 years, advanced fighters from BAE Systems, the prime contractor, went to the Saudis, and in return the UK government received guaranteed supplies of oil. Although the full details of the sale, the largest UK arms export deal ever, have never been revealed, BAE made tens of billions of pounds.

Charges of bribery and multimillion-dollar slush funds dogged the deal from the beginning, and Feinstein provides many details drawn from investigative reporting. A UK government investigation of BAE practices was launched in 2004, but when Saudi Arabia objected, Prime Minister Blair ordered it shut down in 2006 in the name of maintaining good relations. Another major installment in the Al Yamamah deal was signed shortly afterward. The UK investigation ended, but in 2010 BAE was fined $400 million as the outcome of a U.S. Justice Department investigation of a U.S. bank charged with funneling BAE’s bribes to a Saudi prince.

One of the persistent problems facing efforts to tackle corruption is the belief that “this is how things work” in overseas arms deals. Until relatively recently, many countries permitted companies to deduct bribes from their taxes as part of the cost of doing business, a practice that the Organization for Economic Co-operation and Development Convention on Combating Bribery, which took effect in 1999, and a number of new national laws and regulations have helped to curb. More general efforts against money laundering in the name of counterterrorism or counternarcotics have also made bribery more difficult.

How much things may have improved in the wake of these efforts is unclear. The 2011 Transparency International (TI) Bribe Payers Index, which ranks both countries and sectors by the likelihood of paying bribes to gain overseas business, ranks the arms, defense, and military sector 10th, a significant improvement from earlier surveys. Earlier TI research cited by Feinstein, however, concluded that the arms trade was “hard-wired for corruption” and accounted for 40% of all corruption in global trade.

The civilian toll

Another recurring theme in Feinstein’s work is the terrible price paid by the civilians caught in conflict fueled and sustained by the illicit trade in weapons. The book opens with an account of the horror that consumed Sierra Leone during the civil war of the late 1990s; the stories of slaughter and mutilation by the brutal Revolutionary United Front (RUF) fighters are difficult to read. Feinstein focuses particularly on Africa, where many of the world’s poorest nations are also those most afflicted with recurring conflicts that sap any hope for development. The genocide in Rwanda, the continuing conflicts in Zaire/DR Congo, and the civil wars in Angola and Liberia all share common features: the web of brokers and dealers who supply the arms and the weak or corrupt leaders who tolerate or often facilitate the trade. Here one encounters Viktor Bout, perhaps the best-known of the breed of arms brokers from the former Soviet Union, who turned the experience and contacts gained in the covert campaigns of the Cold War to profitable new ventures in post–Cold War conflicts. Another is Leonid Minin, the Ukrainian broker who supplied the RUF in Sierra Leone, with the active connivance of Liberian president Charles Taylor and the leaders of other nearby states. Both eventually fell afoul of the law, but their fates illustrate the differences in current national regulations. Minin spent two years in jail in Italy awaiting trial but was ultimately released because his crimes had taken place overseas. In contrast, U.S. law may assert extraterritorial jurisdiction. In early 2012, Bout, who was arrested in a sting operation in Thailand and extradited to the United States, was convicted and sentenced to 25 years in prison for conspiring to sell weapons to a U.S.-designated foreign terrorist group based in Colombia.

Rather than address other aspects of the global arms trade—the book contains little if any coverage of Russia or China, for example, although both countries are part of the global problem and essential to any solutions—Feinstein devotes a significant portion of the book to U.S. domestic defense procurement and the corrosive effects of the military-industrial complex on U.S. foreign policy. In doing so, he makes some provocative charges. The U.S. military is the most powerful fighting force in the world, he writes, but the system for setting its budgets and buying its weapons is “the most expensive and arguably the most systemically corrupt. … While corruption in export deals has declined since the toughening of legislation and enforcement, the importance of the domestic market, combined with elected representatives’ dual need to deliver jobs to their constituents and to raise money for biennial elections, has led to systemic legal bribery.”

At this point the book becomes less of an account of the arms trade and more of an examination of the roots of U.S. militarism in the interlocking interests of Congress, the Pentagon, and defense companies in bigger budgets and the wars that justify them. The problems in the procurement system are well recognized and the bulk of the evidence Feinstein cites is part of the mainstream policy debate. His argument that the U.S. system is irredeemably corrupt is certainly not universally accepted and will weaken his case for a good many readers. Whatever one thinks of these arguments, they have the effect of making an already huge problem seem so enormous that one despairs of solutions. And that is part of the dilemma that Feinstein faces at the end: What is to be done?

Feinstein, like almost anyone who cares about the damage caused by the arms trade, is faced with the problem of finding solutions that could actually make a difference. If one accepts, as he does, that “some form of arms industry is required in the dangerous and unpredictable world we inhabit,” and that “obviously, the manufacture of weapons and related materiel may contribute to our general security,” then one is immediately in the world of balancing interests and making tradeoffs.

Feinstein makes a powerful case that at present the balance is dangerously skewed, but he does not offer many remedies. An Arms Trade Treaty could offer an overall framework for more coherent national action. The international treaties banning landmines and cluster munitions have shown a willingness by some states to outlaw weapons whose harm to civilians both during and long after conflict outweighs any military utility, although the United States has so far refused to sign either agreement. There have been encouraging regional and national efforts to improve regulations for parts of the arms trade, and the UN deserves great credit for its willingness to tackle the problems posed by the illicit trade in small arms and light weapons.

But the global arms trade is so vast and complicated that one could exhaust oneself trying to control it. Limited efforts are almost inevitable, yet they are vulnerable to the charge that their constrained impact is not worth the effort when measured against the number of conflicts and the scope of damage that fall outside the controls. The Shadow World does not offer many answers, but Feinstein deserves great credit for documenting why the questions must be asked.

Affordable Access to Space

The high cost of reaching orbit is the major factor preventing the large-scale exploration and exploitation of space. When I fly from College Station, Texas, to almost anywhere in the United States, I pay $4 to $8 per kilogram (kg) of me. When a satellite is launched into space, the customer (or taxpayer) pays approximately $10,000 to $20,000/kg. Space travel will not become affordable until the age of rocketry is replaced by an age of new propulsion technology—and only government action will make that happen.

Since Sputnik inaugurated the space age in 1957, chemical rockets have propelled every payload into orbit and beyond. Rockets work well, but they are expensive. Their high costs have restricted access to space to the governments, corporations, and organizations that can afford tens or hundreds of millions of dollars to launch a satellite. Consequently, half a century after Sputnik, only a few hundred tons of payload, the equivalent of two Boeing 747 freighter flights, reach orbit annually. The number of people who have reached orbit since Yuri Gagarin in 1961 could fit into one Airbus A380.

Nor are rockets fully reliable. Their failure rate while carrying communications satellites to geosynchronous orbit in 1997–2006 was 8%. Taurus booster failures in 2009 and 2011 cost the National Aeronautics and Space Administration (NASA) $700 million in lost satellites. Insuring a communications satellite from launch through its first year of operation costs 11 to 20% of the satellite's total cost, a rate two orders of magnitude greater than for insuring a Boeing 747.

For $125 million, an Atlas V will lift 9,000 kg to low Earth orbit, or roughly $14,000/kg, considerably less than the $19,000/kg for the 1,300 kg carried by a $25 million Taurus. Future developments promise some improvement, but even reducing costs by an order of magnitude, a goal rocket advocates do not envision reaching in the next few decades, still leaves a dauntingly high price. The much-heralded Virgin Galactic space tours cost $200,000 per person (approximately $2,000/kg) but will go only 60 miles up, far below Earth orbit and requiring an order of magnitude less energy.
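For readers who want to check the arithmetic, the short sketch below (in Python, purely for convenience) recomputes the per-kilogram figures from the list prices and payloads cited above. The 80-kg passenger and $480 ticket used for the airline comparison are illustrative assumptions, not figures from any cited source.

```python
# Back-of-envelope launch-cost arithmetic using the figures cited in the text.
launch_vehicles = {
    "Atlas V": (125e6, 9_000),  # (list price in dollars, payload to low Earth orbit in kg)
    "Taurus": (25e6, 1_300),
}

for name, (price, payload_kg) in launch_vehicles.items():
    print(f"{name}: ${price / payload_kg:,.0f} per kg")

# Illustrative airline comparison: an assumed 80-kg passenger on an assumed $480 ticket,
# which falls inside the $4 to $8 per kg range mentioned at the start of this article.
print(f"Airline passenger: ${480 / 80:,.0f} per kg")
```

Rounded, these are the roughly $14,000/kg and $19,000/kg figures quoted above, three to four orders of magnitude more than an airline passenger pays.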

Under current trends, the technology for reaching orbit in 2030 and beyond will be essentially unchanged from 1957. This continued dependence on rockets is not for lack of effort. Since the introduction of the space shuttle in 1981, NASA alone has spent over $21 billion on cancelled rocket programs such as the X-33. The military also has its share of cancelled projects, such as the Responsive Access, Small Cargo, Affordable Launch (RASCAL) program.

Efforts by private firms to develop rockets over the past two decades have largely foundered or become dependent on government funding. The problem is not the incompetence of governments, corporations, or individuals (although overly optimistic statements have created unrealistic expectations), but the sheer challenge of leaving Earth. The phrase “It’s not rocket science” is part of popular culture for a reason. Designing, building, and launching a rocket into a harsh, unforgiving environment is very demanding.

Why, if the cost and reliability of rockets limit space exploitation and exploration, have alternatives not been developed? First, rockets fulfill existing limited demand sufficiently well to deter the development of alternatives. Indeed, the entire space industry revolves around chemical rockets. The situation is analogous to airplane engine technology in the 1930s, when the efficiency and output of piston engines increased even as their theoretical limits were becoming increasingly apparent. The military demands of World War II and the Cold War greatly hastened the development of the jet engine. No such pressing urgency exists today for rockets.

Second, proposed alternatives to rockets are technologically immature. Moving from the laboratory to practical application will demand billions of dollars over many years. The perceived benefits are too distant for industry or nonprofits to invest serious resources. Only the federal government can provide the sustained commitment over many years that is necessary for development.

And now for something completely different

The goal is not to develop new technologies for technology’s sake, but to develop technologies to drastically decrease the cost of reaching orbit.

One reason why rockets cost so much is that over 90% of a rocket’s weight is fuel and expendable rocket stages. The actual payload is only a few percent. The alternative to the rocket is a ground-based system (GBS), which keeps the engine and most of the fuel on the ground, so the spacecraft is almost all payload, not propellant. As well as being more efficient, GBS is inherently safer than rockets, because the capsules will not carry liquid fuels and their complex equipment, eliminating the danger of an explosion.

As with any technology in its formative phase, a range of possibilities exists. Leading contenders include beamed energy propulsion and space elevators. Magnetic levitation and light gas guns have less potential. Most important, the alternatives have the potential to reduce the cost per kilogram by up to two orders of magnitude to $200/kg.

In beamed energy propulsion, a microwave or laser beam from the ground station strikes the bottom of the capsule. The resultant heat causes the air or solid fuel there to expand explosively, providing lift and guidance. Researchers in the United States and Japan have propelled small models with lasers and microwaves, demonstrating proof of the concept.

Space elevators employ a thin tether attached to a satellite serving as a counterbalance tens of thousands of kilometers above Earth. A platform holding the payload crawls up the tether. Generating more publicity and better art than beamed energy, this concept depends on the development of materials strong and light enough to serve as the tether.

Magnetic levitation and magnetic propulsion systems would give a high initial velocity to a spaceplane, which would then use a scramjet or rocket to propel itself into orbit. These are not true GBSs, but ways to replace the lower stages of a rocket with a more efficient, less costly way of reaching the upper atmosphere.

The idea of employing a gigantic gun to launch space capsules received a very public unveiling from Jules Verne in his 1865 novel From the Earth to the Moon. Serious development occurred a century later, when the U.S. and Canadian governments funded Gerald Bull’s High Altitude Research Project (HARP) in the 1960s and the Super High Altitude Research Project (SHARP) at Lawrence Livermore National Laboratory in the 1980s and 1990s. Instead of igniting a conventional propellant, the SHARP gun compressed a low-molecular-weight gas such as hydrogen to produce a higher muzzle velocity. The small projected payload of only 1 kg helped lead to the project’s cancellation. Growing interest in picosatellites, which weigh less than 1 kg, may nonetheless revive interest in very small payloads.

A game changer

The concept of GBS encompasses a range of technologies with payloads ranging from a kilogram to hundreds or thousands of kilograms. All assume a high frequency of launches, so that a GBS system could launch thousands of tons per year, an order of magnitude more than current launchers.


GBS should greatly change thinking about satellite design, function, and operations. The high cost of reaching orbit means satellites today are built to maximize yield per kilogram, which results in high costs to develop, assemble, and test satellites. GBS would provide designers with several options: an unchanged satellite with sharply lower launch costs; heavier but less expensive satellites; bigger, more capable satellites; and smaller, less capable satellites.

GBS will strengthen current trends toward distributing functions among many satellites and building picosatellites, nanosatellites (which weigh between 1 and 10 kg), and microsatellites (which weigh between 10 and 100 kg). The satellite systems of 2030 may consist of clusters of satellites, each specialized but operating together.

Satellite operators may decide to launch a satellite by traditional rocket but send its fuel by GBS. NASA and the Department of Defense have started to investigate orbital refueling and replenishing. GBS could provide an incentive to develop these and related technologies to rendezvous, dock, transfer liquids, and build larger satellites in Earth orbit. Similarly, launching arrays of solar cells by GBS and attaching them to a satellite in orbit might meet satellites’ growing demand for more electricity.

The lower launch costs will reduce the need to maximize operational efficiency per kilogram. Satellites may grow in size and weight but drop in cost as engineers rethink their criteria for success. The development of standard structures, components, and modules may finally bring the efficiency of large-scale production to satellites, further reducing satellite costs and thus encouraging more actors to explore and exploit space.

Expanding existing services such as communications and remote sensing is an obvious market for GBS. By significantly reducing the barriers to entry posed by high launch costs, GBS should also create new markets, such as providing propellants, water, and other bulk supplies to satellites and larger facilities in orbit.

The truly radical promise of GBS lies in creating new markets made possible by both sharply reduced launch costs and the ability to launch thousands of tons annually. Just as the Erie Canal aided the western expansion of the United States in the 1820s by reducing the cost and risk of moving people, products, and produce, so too will GBS encourage and promote the commercial exploitation and scientific exploration of space.

The 1994 NASA Commercial Space Transportation Study identified many potential markets, including the long-expected space manufacturing and biopharmaceuticals, but also more provocative ideas such as orbiting movie studios, space debris removal, and nuclear waste disposal. The report expected these markets to emerge only if launch costs dropped to hundreds of dollars per kilogram.

Perhaps the two most intriguing potential “killer apps” are solar power generation and nuclear waste disposal in solar orbit, two markets that could consume thousands of tons of launch capacity annually.

Space-based solar power (SBSP) promises gigawatts of electric power with minimal environmental damage. The idea was too ambitious when proposed in 1968, but technological advances and growing concern about providing environmentally friendly baseload electricity have renewed interest in collecting solar energy in orbit and transmitting it to Earth via microwave. Studies by the National Space Security Office of the Department of Defense in 2007 and the International Academy of Astronautics in 2011 concluded that constructing a one-gigawatt solar power station in geosynchronous orbit was technically feasible. The economics of launch costs, however, were another matter.

At $20,000/kg, launching the 3,000 metric tons of material and equipment for a SBSP station would cost an impractical $60 billion. At $200/kg, the launch cost would be $600 million, a much less daunting financial obstacle. For SBSP to become a reality, reducing the cost of reaching orbit is as important as the technology.
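A minimal sketch of that sensitivity, using only the station mass and the two launch prices given in the preceding paragraph, is shown below.

```python
# Launch cost of a 3,000-metric-ton solar power station at the two launch prices cited above.
station_mass_kg = 3_000 * 1_000  # 3,000 metric tons expressed in kilograms

for price_per_kg in (20_000, 200):  # today's rocket price versus the GBS target
    total_dollars = station_mass_kg * price_per_kg
    print(f"At ${price_per_kg:,}/kg, the launch bill is ${total_dollars / 1e9:,.1f} billion")
```

At the GBS target price, launch shrinks from the dominant cost of the project to a comparatively minor line item.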

A less technologically and politically challenging market for GBS to serve is beaming electricity from a small SBSP to other satellites. The Air Force’s energy plan through 2025 considers this a desirable and transformational technology.

Even more unconventional is safely disposing of nuclear waste, a political, technical, and economic nightmare with no widely accepted solution. Historically, garbage has been buried or recycled. The idea of launching waste into space instead of burying it seems counterintuitive and dangerous. From an aerospace perspective, space is for satellites, not garbage. Neither the aerospace nor nuclear engineering community is advocating for space-based nuclear waste disposal, but GBS could change their thinking.

GBS could make disposing of nuclear waste in space economically and technically feasible. The concept is simple: A GBS system would launch capsules either directly to their destination (solar orbit or into the Sun) or into a high Earth orbit. If the latter, a solar sail or electric engine would then propel the capsule to its destination. The capsule, of course, would consist primarily of shielding and other technology to ensure the safety and integrity of the waste.

Space-based disposal may not only permanently solve a problem that threatens the future of nuclear energy but also fund GBS development and deployment. The Department of Energy expects to spend over $100 billion burying 50,000 tons of high-level spent commercial fuel, a burial cost of roughly $1,000 to $2,000 per kilogram. Reactors elsewhere in the world have produced another 200,000 tons. A huge market exists to eliminate this waste. If space-based disposal proves politically, economically, and technically feasible, then shifting some of the billions of dollars destined for underground storage could fund GBS.
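The attraction can be framed as a back-of-envelope comparison between burial costs and the GBS launch target. The sketch below uses only figures that appear in this article; the notion of per-kilogram “headroom” for capsules, shielding, and in-space propulsion is an illustrative simplification, not an engineering estimate.

```python
# Rough comparison of burial costs with the GBS launch-cost target, using figures from the text.
burial_cost_per_kg = (1_000, 2_000)          # $/kg range implied by the DOE burial figures above
gbs_launch_cost_per_kg = 200                 # $/kg, the GBS target cited earlier in the article
spent_fuel_kg = (50_000 + 200_000) * 1_000   # U.S. plus rest-of-world inventories, in kilograms

for cost in burial_cost_per_kg:
    headroom = cost - gbs_launch_cost_per_kg
    print(f"Burial at ${cost:,}/kg leaves ${headroom:,}/kg of headroom for capsules, "
          f"shielding, and in-space propulsion")

market_dollars = spent_fuel_kg * burial_cost_per_kg[0]
print(f"Disposal market at ${burial_cost_per_kg[0]:,}/kg: roughly ${market_dollars / 1e9:,.0f} billion")
```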

Roadmap to space

If GBS is such a good idea, why has it not been developed? The good news is that researchers have demonstrated that GBS concepts are theoretically feasible; the bad news is that these concepts remain in the laboratory. On the nine-level Technology Readiness Level (TRL) scale that NASA and the military use to judge the maturity of a technology, GBS technologies sit at TRL 1 or 2, still in the early stages of proving their practicality and worth. GBS also faces the classic technological chicken-and-egg conundrum: Demand is too low to justify developing new ways to reach orbit, yet demand stays low precisely because the cost of reaching orbit by rocket is so high. Only government action can break this cycle.

For GBS to evolve into a mature, functioning system will require a sustained commitment of billions of dollars over many years. Developing GBS is a legitimate and necessary role of the federal government. Historically, the federal government has supported the development, construction, and operation of transportation infrastructure, including roads, canals, railroads, airways, and highways. Most pertinent, by 1957 the U.S. military had spent more than $12 billion (over $90 billion in current dollars) developing rockets. Without government funding in the 1950s, there would have been no NASA space program in the 1960s.

Rocket development received government funding because the private sector and nonprofit organizations understandably could not do the job, a classic market failure. In the 1920s and 1930s, individuals and private groups in Europe and the United States tried building their own rockets. They quickly discovered that rockets demand a commitment of financial, scientific, technical, and human resources far beyond what they could muster. Only a government could provide those resources. Since the 1930s, every country that has developed rockets has had the state play the major role in funding and guiding those efforts, whether civilian or military.

Moving GBS from lab to launch will be a long journey of many steps. The immediate goal is to establish strategic roadmaps with metrics that enable proponents, patrons, users, and analysts to judge progress across competing approaches. A partial model to emulate is the 2011 conference organized by the Defense Advanced Research Projects Agency and NASA to study what technologies would be needed to build a starship.

Crucially, a conference to develop a GBS roadmap must include potential as well as existing rocket users, who need to consider how a radical reduction in launch costs would alter how they conceive of spacecraft and their applications. GBS designers need to know what existing and new users want, such as the minimum acceptable payload weight and maximum acceptable acceleration. Involving users will also create stakeholders to support GBS.

The roadmap conference will create criteria for judging the development of GBS. Such paper studies will cost very little. The next step will be laboratory studies to climb the TRL ladder. Depending on the desired pace, this stage will demand millions to a few tens of millions of dollars annually.

The current low levels of technological maturity offer both challenge and opportunity. GBS covers several competing approaches, with many alternatives within each approach. The United States should not repeat the mistake that the nuclear power industry and the Atomic Energy Commission made in the 1950s and 1960s, when they focused on the technology closest to commercialization instead of the type of reactor best suited for civilian power generation. The government must carefully maintain a level playing field among competing approaches lest it foreclose a promising long-term technology.

Only after a few years will funding need to increase significantly, as development moves from component and proof-of-concept testing to integrated prototypes. At that point, decisions must be made about whether GBS is sufficiently promising to merit significant funding and, if so, which concept or concepts to advance. This research may prove that expectations were unrealistically optimistic. If so, better to know sooner rather than later, and to accept that rocket technology will indeed continue to dominate access to space.

GBS needs an agency to nurture it until it is ready for commercialization. Candidates are NASA and the Department of Energy (DOE). NASA is the obvious but not necessarily the best sponsor because of its focus on rockets. The agency’s 2010 Technology Area Breakdown Structure of its Space Technology Roadmaps includes only “ground launch assist” as part of possible future launch propulsion systems, indicating a lack of current institutional support for GBS.

The DOE may offer a more inviting home, especially if its new Advanced Research Projects Agency–Energy (ARPA-E) administers the research to move GBS up the TRL ladder. Because the DOE has less invested institutionally in rockets, it may prove a stronger sponsor. Furthermore, many GBS technologies are electrical in nature, requiring a different skill set and mindset than those of NASA’s rocket engineers and scientists. Indeed, changing launch systems will demand changing existing ways of organizing, funding, thinking, and acting, as well as developing the new technologies.

The search for alternatives to rockets is not exclusively a U.S. endeavor. The International Symposia on Beamed Energy Propulsion (ISBEP) have attracted engineers and scientists from Japan, China, Russia, Brazil, South Korea, and other countries. Because the level of investment is so low at this stage, international cooperation in exploring GBS alternatives should be easy to arrange, and a bit of friendly national competition might even accelerate GBS development.

GBS may prove to offer more promise than payoff. The technological challenges may be overwhelming, or the costs too similar to those of rockets to justify development. But unless a dedicated effort tests that potential, low-cost access to space will remain in the realm of science fiction.

Developing GBS will be expensive, but the failure to create low-cost access to orbit will be even more expensive by delaying the large-scale exploration and exploitation of space. As with nuclear fusion research, the potential is great. Unlike fusion research, the time to success will be much shorter—if the effort is made. Just as government funding developed the technology that enabled humanity’s first footsteps into space, so too can the government development of GBS make the second half-century of the space age even more exciting than the first.

Decoupling Water and Violent Conflict

As the saying goes, water is the stuff of life. It is a basic human need, the lifeblood of critical ecosystems, and a basis for livelihoods and cultures for countless communities around the world. There are no substitutes for water’s most important uses. Recognizing water’s importance, the United Nations (UN) has declared it to be a human right.

Water is also the stuff of social conflict. Control of this strategic resource shapes global trade and investment in diverse sectors such as agriculture, minerals processing, apparel, and electronics. Reconciling the many uses of water is never easy, and as demand grows, competition among them heats up. Climate change, with its potential for profoundly affecting the water cycle globally and locally, adds a volatile new element to this mix.

Apparent intensified global competition around water has led some observers to foresee growing risks of violent conflict. The Secretary-General of the UN, Ban Ki-moon, warned the World Economic Forum in 2008 that “environmental stress, due to lack of water, may lead to conflict, and would be greater in poor nations. . . . As the global economy grows, so will its thirst. Many more conflicts lie just over the horizon.” In the United States, a report released by the National Intelligence Council in May 2012 also drew a stark picture, warning that “During the next 10 years, many countries important to the United States will experience water problems—shortages, poor water quality, or floods—that will risk instability and state failure, increase regional tensions, and distract them from working with the United States on important U.S. policy objectives.” Although deeming it unlikely that countries would wage war over water in the near term, the assessment cautioned that “water problems—when combined with poverty, social tensions, environmental degradation, ineffectual leadership, and weak political institutions—contribute to social disruptions that can result in state failure.”

Is there a looming threat of violence and instability around water? Are rival uses and growing demand for limited supplies poised to take a dangerous turn? What are the real risks, and what can be done to minimize them? Current knowledge about these questions is far from complete, and factors such as climate change and economic uncertainty confound confident forecasting. Nevertheless, what is known suggests important and perhaps surprising answers to these questions.

On the one hand, there is already extensive violence around water in today’s world, and little has been done to institutionalize the means to prevent or effectively manage water-related conflicts. On the other hand, the greatest risks come not from countries going to war over shared river basins or from rival factions launching civil wars to control water supplies. Rather, the danger is in a more complicated mix of bad water management, the marginalization of water-related livelihoods, and the undercutting of critical services provided by freshwater ecosystems, compounded by the inability of many governments to manage such challenges legitimately and effectively. These problems have little to do with growing water scarcity per se; rather, they are rooted in shortsighted, archaic, or unfair policies that affect water. The risks are real, but they are also fundamentally different than typically portrayed.

Is water growing scarce?

Obviously, water is scarce for anyone who lacks safe, affordable, and reliable access to it. Despite some progress in improving access, almost 1 billion people lack safe, reliable supplies of drinking water. More than 2 billion lack adequate sanitation. An estimated 2 million people die annually from preventable water-related diseases. The World Health Organization has estimated that 10% of the global disease burden could be eliminated simply by improving access to clean water and sanitation. Like people, ecosystems also face health risks when denied adequate supplies of water. Studies suggest that freshwater ecosystems, which provide vital goods and services such as clean water, biodiversity, flood protection, and recreation, are as endangered as any other type of ecosystem on the planet, given all the damming, dumping, draining, and diverting to which they have been subjected.

In global terms, freshwater availability is limited by the water cycle. Humans currently trap and use less than one-fifth of global runoff (the portion of the water cycle between rainfall and the sea), suggesting substantial possibilities for future growth. But after subtracting floodwaters that cannot be captured and the water in extremely remote river basins, the proportion trapped and used by humans rises to about half of what is realistically available. Tapping groundwater offers a temporary supplement, but those resources are being drawn down at unsustainable rates in many aquifers. Desalination of seawater is growing in some locales, but will probably remain a niche solution for the foreseeable future, given its cost and energy intensity. And none of this considers the substantial proportion of runoff that is best left alone to sustain the health of essential freshwater ecosystems. This combination of factors gives rise to what some experts have dubbed the gloomy arithmetic of water: growing populations and growing demand encounter a more or less fixed supply. Projections suggest that as much as one-third of the world’s people will live in river basins marked by significant water deficits within a few decades.

But before accepting these circumstances as a guarantee of scarcity and conflict, several cautions are in order. First, water is used with astonishing inefficiency in many agricultural, industrial, and municipal applications. The United States uses less water today than it did in 1980, despite a one-third increase in population and a more than doubling of gross national product, primarily by reducing some of the grossest inefficiencies, particularly in agriculture. Tightening supplies and rising prices can generate social conflict, to be sure. But they can also stimulate innovation and more efficient use. A recent study found that up to one-third of California’s urban water use could be saved with current technologies, and 85% of those savings are available at lower cost than that of securing new supplies.

Second, scarcity can be caused as much by the laws, policy decisions, and economic practices that govern water as by the physical limits of supply. Water supplies can be protected through water-quality measures and prudent land-use practices that keep watershed ecosystem services functioning. The failure to do so makes less water available for its most valuable uses. Inappropriate incentives, such as heavily subsidized electricity for pumping groundwater, can cause overuse of water. Unwise subsidies may also lead water users away from effective, longstanding practices that enhance supplies, such as rainwater harvesting or the maintenance of community water tanks.

Third, and more controversially, so-called “virtual water” must be considered in any assessment of water scarcity. The term refers to the water embodied in goods manufactured or harvested in one place but consumed in another. Simply put, many processes that use water can be accomplished with less water use, or with less pressure on scarce water supplies, simply by doing them elsewhere. For example, cotton grown in the deserts of Central Asia or the U.S. southwest relies on costly irrigation schemes and makes little sense in water terms when compared with, say, the rainfed cotton crops of West Africa. Relying on virtual water by importing water-intensive products promises its own tensions and conflicts, particularly when an imposed reliance on volatile world food markets hurts local food security, or when land grabs lock up large tracts of prime farmland for remote consumers. This practice offers a reminder, once again, that there are important social variables—laws, policies, economic incentives, and management decisions—that sit between how much water is physically available and how it is ultimately used.

The wild card in all of this is climate change, which promises profound, if sometimes hard to specify, consequences for the water cycle. Climate models generally agree that precipitation and evaporation will be accelerated with global warming; that current weather patterns will tend to shift toward the poles; and that, very roughly speaking, dry regions will tend to get drier and wet ones wetter. Water supplies from glaciers and snow cover will see a net decrease in a warmer world, and precipitation patterns are projected to become more variable and more concentrated, with a higher proportion of total rainfall coming in storms of higher intensity. Climate science also suggests an increase in extreme weather events, although the evidence here is less clear-cut. The big message, of course, is that all of this will impose very difficult adjustments on water systems and uses around the world, even in places where water does not become less available.

The adjustment problem is complicated by the fact that water as a resource is already notoriously difficult to manage well. Water supplies are highly unevenly distributed across the landscape and can be highly variable from season to season and year to year. And because water flows, it is hard to capture and even harder to establish effective and enforceable rights to its access and use. Water can be moved beyond the river basins, lakes, and aquifers in which it resides, but often only at great economic, social, and political cost. Moreover, water is notoriously difficult to price properly, because of the many externalities that surround its production and the many public-goods aspects of its use. Managing water well is as difficult as it is essential.

Need for proper focus

Given all of these factors, fears about a future of violent conflict around water are not surprising. Most of the concern has centered on the specter of dangerous competition for water supplies in international river basins. Two-thirds of the world’s major rivers either cross borders or form borders between countries, and virtually all of the world’s non-island nations share at least one important river with one or more neighbors. This can create complex and difficult hydropolitical relations. Shared use by two or three countries can create stark asymmetries in the balance of power between upstream states that control the headwaters and downstream states in the floodplain or delta. In cases in which large numbers of countries share a river, such as the 6 countries arrayed along the Mekong, the 10 through which the Danube passes, or the 12 sitting in the Nile watershed, the complex challenges of collective action may inhibit cooperation and heighten tensions.

The good news is that although countries may sometimes use bellicose rhetoric when discussing water supplies, there are no significant examples in the historical record of countries going to war over water. The most comprehensive study to date, which looked at water-related events in shared river basins during the second half of the 20th century, found that cooperative events, such as treaties, scientific exchanges, or verbal declarations of cooperation, outnumbered instances of conflict, such as verbal hostility, coercive diplomacy, or troop mobilization, by roughly two to one; and that even the most severe episodes of conflict stopped short of outright warfare. Moreover, when conflict episodes did occur, they were typically not the result of water scarcity. Rather, the key factor was the inability of governments to adapt to rapid changes, such as when part of a country split off to become a new one or when a decision to build a large dam was made without consulting downstream neighbors.

The reasons for the lack of violent conflict are not surprising: War between nations is an increasingly rare event in world politics, water relations are embedded in broader relations between countries, and there are far less costly alternatives to war for improving water availability or efficiency of use. Well-designed cooperative agreements can go a long way toward managing shared rivers in a fair and peaceful manner.

But there is also bad news. First, the absence of historical water wars provides no guarantees about the future, especially if the world is entering an era of unprecedented competition for water supplies. Even if war is unlikely, there is a risk that governments will increasingly use water as a tool of coercive diplomacy, injecting tensions into international diplomacy and complicating opportunities to find cooperative water solutions. Second, well-institutionalized cooperation is not common and can be difficult to create. Fewer than half of the world’s shared rivers have a treaty in place governing water uses other than transportation, and many of the existing accords are archaic instruments that lack a foundation in best practices for water management. Catalyzed by the World Bank, the Nile basin states have been negotiating since 1999 but have not hit on a formula for a basinwide treaty. Several upstream states in the basin signed a separate deal in 2010, but it was vehemently opposed by downstream Egypt and Sudan.

There are some examples of enduring agreements in tense neighborhoods, including those on the Indus, Mekong, and Jordan rivers. But even among basins that do have modern agreements, few contain all countries in the basin, and many of the more impressive agreements on paper have not been implemented effectively. In 1997, the UN General Assembly approved a framework convention setting out core legal principles that should guide basin-level cooperation, including equitable water use, avoiding significant harm to other basin states, sound environmental management, information sharing, and peaceful dispute resolution. But the convention has languished, with only a handful of UN member states having ratified it.

Thus, there is a clear need to strengthen institutionalized cooperation. But the deeper problem is that some agreements may actually increase the risk of conflict. Most of the violence around water plays out on a local scale, such as a localized catchment, rather than on the scale of an international drainage basin. This means that many of the most immediate risks of escalating water conflict are found at the subnational level rather than in relations between nations. A mutually acceptable agreement may smooth tensions between countries (arguably, the smaller conflict risk). But if that agreement embraces poorly conceived water development plans or imposes unmanageable costs of adjustment on local communities, it may worsen the larger conflict risk of tensions within the watershed and complicate the challenge of managing water equitably and effectively.

The Mekong, which rises on the Tibetan Plateau and ends some 4,000 kilometers downstream in the delta region of southern Vietnam, provides an important example. A cooperative arrangement among the four lower riparian countries (Cambodia, Laos, Thailand, and Vietnam) was established in 1957, produced a treaty at the end of the Vietnam War in 1975, and evolved into a stronger, modernized agreement in 1995. The agreement has been a vehicle for the exchange of information among participants and for efforts to engage the upstream nonmembers China and Burma. Bringing those countries fully into the regime would certainly stabilize regional interstate relations. But what if the effect of doing so is an acceleration of poorly planned dam development throughout the basin? The consequences for the millions of people who live in the delta and depend on the river’s unique seasonal ebb and flow for their livelihoods in fishing and agriculture could be severe in terms of dislocation, human insecurity, and destabilizing social conflict.

Incentives for conflict

The emphasis on conflict risks in shared river basins also overlooks the changing character of violent conflict in world affairs in recent decades, with domestic or civil conflict outstripping interstate war as the primary threat to peace. Do a country’s water circumstances enhance the risk of civil war? Two decades of scholarly research and debate, although not resolving the issue, have reframed thinking about this important question. There are many documented instances in which scarcities of renewable resources, including water, but also forests, grazing land, and fisheries, have generated localized violent conflict. But efforts to correlate water scarcities or drought with the onset of civil war have for the most part not found a statistically significant link. Civil wars are driven primarily by factors such as political exclusion and economic marginalization, often accompanied by or channeled through ethnic, religious, or other forms of cultural bias. Although resource scarcities may be an element in creating those conditions, it is clear that important social variables, such as the quality of governance, the inclusiveness of institutional processes, and the equity of economic arrangements, play a key mediating role between environmental pressures and any social outcomes. Another way of stating the findings: Countries that have these properties have substantial ability to manage resource conflicts peacefully, whereas countries that lack them are already at much greater risk of civil conflict, irrespective of resource pressures.

Indeed, to the extent that there is a direct connection between natural resources and civil war, it seems to stem from resource abundance, not environmentally induced scarcity. According to the UN Environment Programme, at least 18 civil wars since 1990 have had a direct link to natural resources, either because the conflict was about control of the resource, or because control of the resource generated revenues that enabled combatants to sustain the conflict. The link between resource abundance and civil conflict—a variation of the so-called resource curse—is particularly pronounced for petroleum, but has also been documented with regard to high-value resources that are easily looted, such as alluvial diamonds or hardwood timber.

This link raises a provocative question about water. As demand growth and climate change make existing water supplies more valuable, perhaps dramatically so, could water become a conflict resource along the lines of oil, precious minerals, or hardwood timber? To answer this question requires understanding the causal mechanisms at work behind the association between resources and conflict. Much attention has been given to the idea that high-value resources can lead to civil violence by creating economic incentives for secession or insurgency: splitting off to create a new state if the resource is in a remote region, or seizing control of the state itself if necessary to control the resource. Here, the difficulty of capturing and controlling water resources suggests that there may be little in common with oil, although the salience of water in driving the burgeoning phenomenon of agricultural land grabs represents a different but no less troubling sort of resource capture.

However, the heart of the problem is not simply an incentive to struggle for control of economically useful resources, but rather the type of governance that springs up around such resources. The problem is not simply that revenues are large (in principle, a good thing for the host government and the nation). The problem is that such revenues fluctuate wildly in unstable world markets, breed corruption and patronage, and exempt governments from doing the hard political work of legitimate taxation, a critical source of broader political legitimacy. As researcher Michael Ross has put it, “States whose revenues are massive, unstable, opaque, and do not come from taxes, tend to have some strange qualities”—including proneness to civil conflict.

Although revenues from water are unlikely to rival those from oil any time soon, there are ways in which water projects can emulate some of the worst features of the resource curse. Dams and infrastructure for large-scale hydroelectric or irrigation schemes are vast capital-intensive public works projects that lend themselves to corruption and the cultivation of patronage networks. Moreover, an abundance of high-value resources can lure governments into embracing bad development models, with excessive borrowing against presumed revenues, social illegitimacy, and a failure to manage risk properly. And if the project is to export electricity or agricultural commodities, it links the country’s economic fate to highly volatile international markets in food and energy.

An example is the controversial Belo Monte dam project in the Brazilian Amazon. Although Brazil’s government has embraced Big Hydro as a centerpiece of the country’s energy future, the myriad uncertainties around projects designed to work for half a century or more make them highly risky ventures. An independent study that modeled plausible scenarios for the dam’s future, based on several key parameters that will shape its ultimate profitability, illustrates the problem. The researchers found that in less than one-third of the scenarios they modeled did Belo Monte turn out to be an economically rational undertaking over the full course of its usable life. In all of the other scenarios, profitability was derailed, sometimes disastrously, by some unfavorable combination of energy prices, inadequate stream flow, better economic uses of the river and surrounding land, and other uncontrollable factors. The dam has also fostered intense protest and dissent, not unlike the local controversies surrounding oil extraction in the Niger delta, the western Amazon, and elsewhere. Thus, the larger problem is not avoiding outright civil war over scarce water supplies but avoiding the toll that the resource curse takes on the political-economic capacity and social legitimacy of governments, by finding a prudent path of water development.

Finding a prudent path

The picture emerging from what is being learned about water and conflict, at both the international and national levels, is that the problem is more complex than simply zero-sum competition for scarce water supplies. If there is a risk of large-scale violence, it stems not from physical water scarcity but from the institutionalized arrangements that shape how actors govern water. The risks—of destructive competition, failures of cooperation, and perhaps even violent conflict—increase when such arrangements are ineffective, illegitimate, or unable to adapt to changing circumstances. International cooperation initiatives and national policy reforms that frame the problem narrowly as water scarcity will not be effective conflict-management tools.

Indeed, poorly designed responses at the national or international level could easily make matters worse. Most of the expressions of grievance, confrontational events, and episodes of actual violence that occur around water take place on a local scale. Comprehensive data on episodes of water-related protest, rioting, strikes, and other forms of contention and confrontation are lacking, but accumulating evidence suggests that they are abundant. Globally, an estimated 40 million to 80 million people have been displaced as a result of dam construction, many forcibly and most without adequate compensation. A keyword search of world news coverage for any given month will turn up several episodes of protest around water pricing issues, poor water service delivery, contentious infrastructure projects, water pollution controversies, or land grabs that cut off water access. An events database compiled by researchers at the Peace Research Institute Oslo identified 1,850 “conflictive events” around water in the Middle East/North Africa region from 1997 to 2009, or one episode every 2.5 days.

Most contentious episodes around water are not violent. Nor are they necessarily a bad thing, because protest may be a healthy expression of citizen concerns and may provide a needed stimulus for positive change. But such episodes underscore the high stakes for people’s livelihoods and the health of communities and can be taken as early warning signals of failed or illegitimate policies. Nor is it enough to fall back on calls for broader political reform. Democratization does not by itself reduce the frequency of contentious episodes around water and may even increase them, because of the greater political space it affords for expressing dissent.

There is also worrisome evidence that water-related stresses, including but not limited to scarcity of supply, may stimulate more than just peaceful protest. One recent study, covering 40 countries in sub-Saharan Africa over a 20-year period, found that deviations from normal rainfall patterns led to an increase in social conflict, including demonstrations, riots, strikes, antigovernment violence, and communal conflict between different social groups. The researchers found an upsurge during both unusually wet and unusually dry years, with the link to violent forms of conflict being stronger in unusually wet years. Another study, targeted more narrowly on East Africa, found a similar pattern of increased conflict around rainfall extremes, with higher rates of antigovernment rebel activity in unusually dry years and higher rates of intergroup conflict in unusually wet ones.

It remains unclear whether social tension and violence in these instances are triggered by the effects of extremely dry or wet years on people’s lives and livelihoods, by the grievances that result from how governments respond or fail to respond, by the perception that certain conditions are political opportunities to be exploited, or by all of the above. But again, the message is that the problem is not simply one of scarce water. The dangers lie in the turbulence of adjusting to change and in the risks that result when legitimate, well-adapted responses are not forthcoming.

Moving ahead

Spurred by a fear of conflict over scarce water supplies, the international community, including bilateral donors, the World Bank, and nongovernmental organizations such as Green Cross International and the World Wildlife Fund, has sought to strengthen international water diplomacy in several ways. Efforts include promoting ratification of the UN Watercourses Convention and fostering dialogue efforts such as the Nile Basin Initiative, the German-funded River Basin Dialogue for Africa, and a host of basin-specific initiatives. To be helpful in managing water-related conflicts, such efforts must recognize the local and national dimensions of conflict risk, rather than simply the international dimension. And they must tackle the complex roots of a problem that cannot be reduced to preventing scarcity-induced water wars.

Perhaps the most important policy reform is to strengthen people’s rights to water and its benefits and give them a greater voice in water governance. In this light, the UN took an important step by recognizing the human right to water and the obligation of states to respect, protect, and fulfill water rights. The human right to water is also implicit in internationally recognized rights to food, survival, adequate living standards, and peoples’ right to manage their own resources. Although the strengthening of the rights-based approach to water needs is a welcome trend, it must be complemented by stronger participatory mechanisms that give stakeholders a clearer and more audible voice in water governance decisions. Most of the attention here has gone to national-level reforms in water law. Although such changes are important, they need to be reinforced at other scales, including local community-based natural resource management and international river basin negotiations.

Toward that end, the UN Watercourses Convention should not simply be ratified but improved. The convention provides a useful framework for guiding negotiations between governments to minimize conflict over shared basins, based on principles of information sharing, prior notification, dispute resolution mechanisms, and participation by all basin states. However, the convention lacks any commitments to the human security of local communities affected by international water dealmaking or provisions for voice and redress on the part of those communities (some of which were removed from the original draft articles when the convention came to the UN General Assembly). The convention is also inadequate in its environmental provisions, which stress pollution control but not ecological flows and ecosystem management. Without these innovations, the convention may facilitate better dividing of water supplies among nations but offers little guidance on genuinely cooperative watershed management for sustainability and human security. Without such reforms, the rush to promote international cooperation may simply accelerate poorly conceived exploitation of water resources.

There is also substantial room to improve water-related foreign aid. In terms of overall aid flows, water has been a second-tier priority as compared with other sectors, garnering only about 5% of total aid flows. Moreover, water aid tends to privilege drinking water over sanitation, new sources over existing systems, supply over demand, and resource development over ecosystem services and water quality.

Approaches to this problem must be flexible, especially given the highly uncertain climate ahead, financially as well as climatologically. Capital-intensive, large-scale water projects have had highly variable performance, often failing to return the benefits projected for them, and this has been the case even under more predictable financial and hydrologic conditions than those that lie ahead. Nor does it make sense to rely on silver-bullet solutions, be they desalination, genetically engineered crops, or virtual water. These innovations may matter on the margins, but they probably will not be transformative during the next few decades, if ever. A diverse portfolio of responses, stressing demand management and locally adapted solutions, promises a more flexible, resilient approach. Moreover, this rule of thumb applies to policies and institutions as much as technologies. Much of the attention during the past decade has been devoted to institutional reform for better-integrated water resources management, in the sense of stronger coordination among different user sectors and different levels of governance. Although such reforms have been much needed, care must be taken that they do not create such inflexible, hard-wired structures of water decisionmaking as to be unable to adjust to a rapidly changing terrain.

Finally, it is critically important to tap water’s cooperative, peace-building potential, on scales ranging from local communities to international rivers. Strengthening dispute resolution mechanisms in shared international basins is of course a key element. But the lack of effective dispute-resolution tools is one of the weaker links in the chain of good water governance at all scales, national and subnational as well as international. There is also a need to link these levels more effectively, because transnational forces such as land acquisition or price fluctuations in global food and energy markets often drive local disputes. Current mechanisms, such as the Permanent Court of Arbitration, the World Bank’s inspection panel, or the World Trade Organization’s dispute resolution process, are ill-suited to address transnational disputes that involve a wider cast of characters than just national governments.

Looking ahead, there are twin dangers. The first and more obvious is an inadequate response that fails to address water poverty, the onslaught facing freshwater ecosystems, and the weakly institutionalized cooperation concerning so many of the world’s rivers. The second peril—perhaps more subtle, but no less important—is the danger of doing the wrong things: sacrificing water sustainability to food and energy needs, using international cooperation in a way that forces the adjustment costs of poorly conceived development onto local communities, or rushing to replace Big Fossil with Big Hydro for a greenhouse future. If water governance is to be truly conflict-sensitive, it must navigate effectively between these two dangers.

Global Bioethics: Hopes, Fears, and New Voices

During the 1990s, James Grifo, a physician and researcher at New York University, had been working to develop a technique to help treat certain kinds of infertility. Although in vitro fertilization (IVF) treatments had been successful for many of his patients, IVF could not help women whose eggs were genetically sound and could be fertilized, but were not viable enough to grow into a healthy embryo. In such cases, Grifo imagined it might be possible to remove the nucleus from a donor egg from a healthy woman, replace it with the gene-carrying nucleus taken from the patient’s egg, and then implant the reconstructed egg into the patient’s uterus where it would continue to develop. Because the implanted egg would retain the mother’s DNA, she would give birth to a biologically related child.

Although the idea had never been tested, it gave hope to one of Grifo’s patients who desperately wanted a biologically related child. Willing to gamble on this coveted goal, she gave Grifo half a million dollars over 10 years to work on the technique.

It was a basic human desire combined with unfortunate circumstances, but also with the extraordinary potential that scientific research seemed to offer. These are common ingredients in questions of bioethics. Health science research is driven by many kinds of desires and is often coupled with a sense of urgency. Previously unimagined techniques seem to put distant hopes suddenly within reach.

Complexities arise

Grifo first conducted a series of experiments in mice. Once he had perfected the technique of nuclear transfer between eggs, he wanted to see if the eggs could produce viable offspring. His team implanted the eggs in mice. It worked. Several litters of healthy baby mice were born.

The time felt right to try the technique in humans. Grifo and his team had become adept at the precise and fastidious technique of nuclear transfer, and his patient, having waited while the technique was developed and perfected, was getting older. The team tried the experiment in five patients, including the woman who had funded the research.

It failed. “The eggs made with nuclear transfer fertilized and made embryos, but no one got pregnant,” Grifo explained. The eggs, it seemed, were too immature.

At New York University, Grifo is the director of the Division of Reproductive Endocrinology, the director of the Fertility Center Program, and a professor of Obstetrics and Gynecology. His line of work meets a real demand. According to the Centers for Disease Control and Prevention, nearly 7.4 million U.S. women between the ages of 15 and 44, or roughly 12% of this demographic group, have sought treatment or services for infertility. Behind these statistics are individuals and families struggling with difficult news and asking about what new treatments might become available. Although most women lack the wealth and willingness to go to such extreme lengths as Grifo’s patient did, infertility evokes deep human emotions, desires, and hopes. It also brings out deep fears.

So do some new scientific procedures, especially when they relate to creating, sustaining, or ending human life. And here bioethics gets complicated. Here, profound individual experiences of hope, desire, and fear meet with disparate societal hopes and fears, ethical questions, and a fair measure of the unknown.

To many people, bioethics sounds like an abstract idea, something official panels and committees discuss. But bioethical problems start with a story, or usually many stories, often about people having hope despite long odds. Hope to overcome a disease, to conceive, to heal from an injury. And when that story has conflict, as all good stories do, the conflict often comes in the form of fear: fear of the unknown, fear of cultural change, fear of technology, fear of ethical or moral slippery slopes.

Grifo and his team ran headfirst into that fear. One day in 2001, Grifo received a call from Susan Blumenthal, who was then the U.S. Assistant Surgeon General.

“I’ll tell you exactly what she said,” Grifo recalls: “‘What the hell do you think you’re doing up there?’ So I explained the history, the fact that we had IRB approval for all aspects of it. And she said, ‘You need to do this in monkeys first.’ Well, monkey IVF is way behind human IVF, and I don’t have any monkeys who want it.”

IRB approval—approval by an institutional review board—is a cornerstone of ethical and responsible research. Before any research can be done on animals or humans, the institution (New York University in this case) must conduct a review of the proposed research to ensure that it conforms to ethical standards. Grifo’s research had received such approval every step of the way.

A week after the telephone call, Grifo received a letter from the U.S. Food and Drug Administration (FDA) telling him that he had to file a new drug application. He was shocked. “We weren’t doing drug research,” he says. “The FDA doesn’t regulate this kind of research. They dared me to keep doing it.” In fact, in 2001, the FDA did claim jurisdiction over nuclear transfer research. It had become clear that Grifo would have a hard time continuing this research in the United States.

Here, bioethics gets even more complicated: Science and bioethics are globalizing. Researchers collaborate across universities, countries, continents, and cultures. Worldwide, people such as Grifo’s patient face health challenges and raise hopes that drive research. Lawmakers in different countries are making different decisions about the ethics of such research. As research travels, it runs into different ethical and legal boundaries and also potentially transgresses or circumvents those boundaries. This can feed xenophobic stereotypes in which some countries are depicted as overly permissive, as fundamentally unethical. But it has been amply demonstrated that stereotypes often obscure more than they reveal.

Scene shift

At the time of Grifo’s telephone call from the FDA, John Zhang, now a well-known IVF physician in New York, was a senior research fellow in training with Grifo. Zhang had colleagues in China, and Grifo and Zhang decided to offer the researchers in China the chance to continue the work. None of them anticipated what would happen next.

On October 14, 2003, major media outlets, including The New York Times and The Washington Post, reported that a research team at Sun Yat-sen University in Guangzhou, China, had successfully impregnated a woman using eggs made by nuclear transfer. This was the team, led by Guanglun Zhuang, to which Grifo and Zhang had given their research. Although no baby was born (the three fetuses that developed from implanted eggs were delivered too early to thrive), the research nonetheless suggested that the technique was sound. Grifo recounts that the lack of success was due to obstetrical problems rather than problems with the procedure itself. “It worked,” he says emphatically.

The media focused on several concerns. One was that since human eggs contain small energy centers called mitochondria, which exist outside the nucleus and carry their own tiny amount of DNA inherited solely from the woman who produced the egg, children born of this technique could be said to have three genetic parents: the egg donor, the woman who supplied the nucleus and carried the implanted egg to term, and the man whose sperm was used to fertilize the egg. Also, concerns were expressed that research using the nuclear transfer technique was a step toward genetic engineering of human beings and human cloning. A third concern was that the technique, still experimental, might pose unknown risks to the safety of the mother and any children. Media reports highlighted the newness and riskiness of the technique, framing it as a story of questionable scientists and questionable ethics. They asked, ought we to do this kind of work? Is it too risky?

Grifo was shocked at the emergent controversy. For him, the media reports fueled public outrage and misunderstanding. He is adamant that the procedure does not constitute cloning. “Cloning is making a copy of a human being who already exists,” he said in a 2003 interview with The New York Times. “This is nuclear transfer, one element of cloning. It allows a couple to have their genetic baby, not a clone. They shouldn’t even be discussed in the same sentence.”

It is important here to clarify the distinction between reproductive and therapeutic cloning. Mention of human cloning tends to evoke the image of an identical person, but this has not yet been shown to be possible in humans. Therapeutic cloning, which is the aim of most human embryonic stem cell research, involves producing an embryo whose DNA is identical to the patient’s; stem cells are then harvested from that embryo and used, it is hoped, to treat the patient’s condition without risk of an immune reaction. Reproductive cloning has the intention of creating a genetically identical human being and is banned in most countries. Thus the debate about the use of human embryos for stem cell research involves therapeutic cloning but not reproductive cloning, even though they share techniques.

To Grifo, the issues raised in the mainstream press represented a misunderstanding of the science, the kind of misunderstanding that is often at the center of bioethical debate. The researchers saw their work as straightforward and in the interest of patients. But other people had more visceral reactions, along with complex questions about how, when, and under what conditions scientists ought to intervene, for instance, in matters such as human reproduction.

It is also hard to separate politics, economics, and culture from the controversy. Individual experiences, cultural elements, national politics, economic competition, and global politics all shape bioethics together, and each of those is somewhat influenced by, and also influences, media portrayals.

Viewing events in retrospect, Grifo says he would never have published anything until the technique produced a baby. He knows it makes a difference when and how people hear about a technique in the media. Recalling the first attempts in the 1970s to produce a baby via IVF, he notes that the first was an ectopic pregnancy and the second a miscarriage. If the press had reported on these results in today’s environment, he reasons, government regulators would have stepped in and researchers would not have been allowed to make the progress that they have in IVF. By now, more than 3 million babies have been born through IVF.

Traveling science, traveling bioethics

Bioethics gets even more complicated when deeply personal disruptions become entangled with national, international, or indeed global considerations. Bioethics frequently addresses questions of global significance that consider human flourishing and risk on a grand scale. But the experiences that it draws and deliberates on are often, at their core, deeply personal: bearing a child, watching a loved one suffer, living with a devastating disease, facing death.

The nuclear transfer experiment was, at its core, about real women with all the personal challenges that go along with pregnancy and infertility. It was a familiar story: A woman wanted to have a baby of her own and had fertility problems. She wanted the baby to be genetically related to her, not to the egg donor. This mattered to her personally, not as an abstract and theoretical question of ethics. It also mattered to the women who participated in the China study.

To Grifo, the research is ethical in that it answers to a serious problem; it is the regulations that are not ethical. Of the patients struggling with fertility problems, “I sit here and listen to them weep,” he told The Wall Street Journal. “That is powerful. And not one person writing the laws understands that.”

For Grifo and Zhuang, the tears and hopes made transnational partnership worthwhile. But if the media storm that followed was hard to foresee, so perhaps were the stereotypes embedded in that storm.

Wild East?

Reproductive biomedical research is not just about the ethics of conception; it is also about the ethics of misconception. When the West generates stereotypes about Asia, there are personal repercussions for Asian researchers, for the global research community and its supporters, and for people wanting to bear children and manage disease. The way people in the United States perceive Asia has implications for the future, for Asia, for the United States, for science, and for questions of global bioethics.

How do national boundaries matter as scientific research becomes increasingly global? As the Grifo case illustrated, it is hard enough within a single country to agree on bioethical questions. As researchers and research increasingly cross national boundaries, and because biological research increasingly has implications for all of humanity, people are asking questions about how it might be possible to establish international standards of bioethics in light of cultural differences and scientific competition.

For some people, the Grifo-Zhuang experiment smacked of ethical outsourcing. It gave rise to fears that Asia was like a new Wild West, or Wild East, of unfettered, unethical scientific practices. A 2003 Wall Street Journal report on the work pointed to the light enforcement of regulations governing fertility clinics in China, “making China a growing haven for freewheeling research into reproductive medicine and cutting-edge genetics.” Jeffrey Kahn, the director of the Center for Bioethics at the University of Minnesota, said in a 2003 New York Times interview that he sees this kind of transnational collaboration as a way of skirting ethical issues and regulations, “as an end run around oversight and restrictions within the United States.” To the extent that bioethics is shaped by hope and fear, this is the face of the fear: fear of the unknown and often a xenophobic fear.

News representations of Chinese biotechnology at that time reflected such fear. Reports said that Chinese biologists had engaged in human cloning, that embryologists had transferred human cell nuclei into rabbit eggs, and that relatively little public debate was taking place. Such reports fueled an impassioned and fearful response in the United States. As a typical example, the New Atlantis journal ran an article in 2003 titled “Chinese Bioethics? ‘Voluntary’ Eugenics and the Prospects for Reform.” The authors referred to recent experiments in China that “raise yet more troubling questions about the ethics of biotechnology in that still authoritarian country,” and they concluded that “it is therefore a distinct possibility that the Chinese government will permit and perhaps secretly encourage the creation of cloned or genetically modified children for the ‘good of society.’”

Such research projects do indeed merit serious attention. They should provoke intense scrutiny and ongoing public and governmental consideration wherever they are conducted. Although many of these same kinds of research projects were underway at Western sites, in the media the Western researchers were mainly characterized as rogue scientists who were seeking fame and fortune, or as marginalized “sects” or “cults”—in other words, as individuals rather than representatives of a country. But in discussions of Asia, and of China in particular, questions of bioethics were framed at the level of a people, culture, country, or region.

Bioethical institutions were developing in China even as these controversies were taking place. Ole Doering, a China specialist and philosopher, reports that a “new wave of infrastructure building to regulate and monitor biomedical activities in China took off in 1998.” He, too, writes about ethical outsourcing, but from a different angle. Doering quotes a semi-official Chinese daily newspaper warning in 2003 that “we must be aware that some scientists from developed countries make use of the ignorance and eagerness of their colleagues in the developing countries to carry out experiments banned in their own nations.” In this view, ethical resistance to the Grifo collaboration from a Chinese perspective might not so much question unscrupulous Chinese researchers, but unscrupulous and exploitative foreign, and implicitly Western, collaborators.

Racing ahead

The implications of the Grifo-Zhuang nuclear transfer aftermath reached far beyond fertility treatments and reproduction. The nuclear transfer technique was also seen as central to the promise of stem cell research. The hope was that if one could replace the nucleus of a human egg with a nucleus from a patient’s cell, then let the reconstructed egg develop for about a week, one could derive stem cells carrying that patient’s DNA. These stem cells could potentially be used to regenerate almost any kind of damaged tissue without prompting an immune response. The potential to treat formerly intractable conditions seemed close at hand.

Nuclear transfer thus holds high stakes and high potential in stem cell research. And stem cell research is frequently characterized as a race: among competing scientists, laboratories, and countries, as well as for cures, money, and fame. The phrase “stem cell race” abounds in the press. The phrase’s popularity was fueled, in part, by the restrictions that President George W. Bush placed in 2001 on federal funding for human embryonic stem cell research. The restrictions limited federal funding to a few existing human embryonic stem cell lines, the so-called presidential lines.

Scientists quickly expressed concerns that the restrictions would threaten this field of medical science. Patients and families worried that treatments and cures would be delayed. Politicians and venture capitalists worried that their regions and investments would be hurt by restricted research funding. Fears and hopes continue to cycle through these bioethical debates.

Scientific globalization evokes images of international competition, every country trying to get ahead. Yet while many Westerners worried that Asian countries would race ahead, unfettered by research regulations or ethical constraints, the inverse may actually be happening in some places. In countries where no regulations yet cover such practices as nuclear transfer or stem cell research, some researchers feel reluctant or even afraid to work in controversial fields without a green light from policymakers, ethicists, and the public. Indeed, policymakers in many countries are working hard to develop ethical research guidelines. Although some people still think of regulations as stifling research, a lack of formal guidelines could be worse if it means that researchers are not certain what is culturally or legally permissible, now or later.

In China, regulators moved quickly in the aftermath of the Grifo-Zhuang nuclear transfer pregnancy story to ban the procedure. Despite China’s quick response, stereotypes persist, as pointed out by Erica Jonlin, clinical research administrator and regulatory manager at the University of Washington Department of Medicine, whose daily work involves questions of ethics, research, and stem cells. On one hand, she says, “Scientists can collaborate. Scientists like to collaborate.” But she says there remains a stereotype in the U.S. scientific community that scientists in China can do anything. In fact, if there ever were regulatory advantages to doing research in China, they’ve largely gone away. But the fear of unfettered Asian research continues.

Experiences in Taiwan

In some ways, China acted as a stand-in for a broader U.S. cultural fear about Asia, and East Asia in particular. Indeed, many people in Asian countries did see stem cell research, and biotechnology more generally, as a new hope, a way to catch up with the West on the global stage of science. They also saw it as a way to bolster their economies. Singapore developed Biopolis, a state-of-the-art biotech site based on the model of Silicon Valley and famous for recruiting high-level scientists from the West. South Korea developed a well-funded stem cell research laboratory at Seoul National University and seemed poised to become a global leader in stem cell research until a scandal involving its leader, Hwang Woo-suk, broke in late 2005.

Taiwan, too, announced in 2005 a national project to develop the country as “Biomedtech Island,” an Asian hub for biomedical technology. Underscoring the urgency, a minister of Taiwan’s Science and Technology Advisory Group said in a 2005 report in Taiwan News, “We are under pressure of time to get the ‘Taiwan–Biomedtech Island’ plan going as soon as possible.” Pointing out similar projects underway in China and Singapore, the official said Taiwan hoped to “compete well in the advanced biomedical fields and become the leader in the field in Asia.” Stem cell research was an important part of this plan.

At Academia Sinica, Taiwan’s most prestigious research institution, broad open spaces and rows of palm trees frame the state-of-the-art science facilities. There, until recently, John Yu headed the stem cell research program. He and his wife, Alice Yu, left successful scientific careers in San Diego to help build Taiwan’s biotech sector.

In practice, John Yu spends much of his time not at the laboratory bench but on the development of ethical research protocols. He founded the Taiwan Society for Stem Cell Research, which developed a scientific network and holds discussions on how best to regulate research. He served as Taiwan’s representative at the International Society for Stem Cell Research and was a member of the task force that developed in 2006 the society’s “Guidelines for the Conduct of Human Embryonic Stem Cell Research,” a global standard for ethical stem cell practice.

He is a vocal critic of unregulated stem cell research and therapeutics. Work in Taiwan or elsewhere that is perceived as unethical risks bringing public opprobrium not only on the individual researcher or physician but also on the science itself. And although many people in the United States worried that an Asian lack of regulation and ethical constraint would create an atmosphere of unfettered and unethical research, in Taiwan the opposite seemed to occur.

Consider the case of one young Taiwanese stem cell scientist. (Given the sensitivity of his position, he would rather not be named, so he will be called Dr. Li.) Beneath his soft-spoken and unassuming demeanor, Dr. Li exudes a passion for his work. For him, stem cell research has both deeply personal and national stakes. He grew up in Taiwan, then completed his education and training as a stem cell biologist in the United Kingdom and the United States. He began a family and was developing a promising career in the United States when he returned to Taiwan in 2004. Like John and Alice Yu, Dr. Li returned to help build biotech in his home country. “Maybe this will sound naïve,” he says, “but originally I came back to Taiwan because I had this idea that it’s my duty; that maybe I can help Taiwan a little bit on stem cell research.”

In his previous work, Dr. Li had used dozens, perhaps hundreds, of human embryos. But in Taiwan, by 2007, when guidelines were still waiting for government authorization, he had not used a single one.

Instead, he helped to establish such guidelines and found himself in deep reflection about the ethicality of his own research. Rather than speeding up his research, the lack of clear policy in Taiwan slowed it down. It seemed that the established policies in the United Kingdom and the United States had enabled him to focus on his research, shielding him perhaps from deep ethical reflection of the type that now holds his attention. He also attributes this shift to more personal factors, such as his maturation and the birth of his first child. He recognized the potentiality that inheres in the human embryo. No longer seeing an embryo as just a research object, he came to see its potential, given just the right set of extremely contingent circumstances, to become someone’s child. He understands the hopes that stem cell research inspires, and the fears, too.

Stem cell research has numerous risks. Individual scientists risk their reputations, careers, and even their freedom if they conduct work that is deemed unethical. Treatments are risky for patients. The science itself relies on public support. Researchers worry that hype and premature human treatment might ultimately diminish support. John Yu of Academia Sinica says this is the greatest worry for stem cell researchers: “We don’t want society to expect too much in terms of what we can achieve now.” He cites a U.S. survey which suggests that the general public’s expectations about therapy developments from stem cells are much more optimistic than those of stem cell scientists. His concern is that hype and “unregulated” physicians will lead the public to expect too much too soon, thus setting the stage for the fragile support of stem cell research to be undermined when therapeutic production is slower.

Dr. Li is also concerned about public attitudes toward stem cell research, saying that “everybody that works in this field, they really want to know what is the public opinion.” It is personal for him: “Myself, I want to know. I really want to know, what do they think about this.” Although stem cell research is still not a major topic of public conversation in Taiwan, some insights about public attitudes may emerge from a study led by Shui-chuen Lee, a Confucian philosopher and bioethicist, and Duujian Tsai, a sociologist and community organizer. A team led by these two professors conducted surveys to identify public knowledge and public concerns about stem cell research.

As Dr. Li and John Yu know, it is not enough to progress scientifically; science has to be done carefully and correctly at every step. Taiwan became a full electoral democracy in 1996, after a 12-year transition period. Before that, Taiwan had been ruled under martial law for 38 years. So, domestically, public inclusion has become an important topic of governance, both political and scientific. And internationally, public inclusion has become an important component of responsible scientific decisionmaking. It is not enough to have bioethical and research policies; increasingly, such policies have to both represent broad public consensus and conform to international standards.

The California experience

While people in Taiwan were taking surveys to assess their knowledge of and support for stem cell research, across the Pacific, Californians were showing their support at the ballot box. In a heavily funded campaign, proponents of Proposition 71, the California Stem Cell Research and Cures Initiative, asked the public to support stem cell research.

The campaign was successful. Passed in 2004, the initiative mandated state investment in stem cell research: $3 billion over 10 years. Proposition 71 represented a new kind of public engagement with science. With federal funding for most human embryonic stem cell research restricted in 2001, California and several other states, including Connecticut, Illinois, New Jersey, New York, and Maryland, subsequently took it upon themselves to fund this type of research.

In California, many supporters saw a vote for Proposition 71 as a hopeful vote against President Bush and what they perceived as an anti-science, ideologically driven, and fear-building regime. They saw their vote as progressive, pro-science, and pro-cures, with real people’s lives at stake. They also saw it as a more democratic, if risky, way to fund science. In California, public input has come to be seen as a necessary element in doing ethical science, in both research and its funding.

In a sense, then, bioethics has become explicitly context-specific. Classically, the field of ethics poses such questions as “What should I do?” and “What constitutes the good life?” But when people encounter fast-changing biomedical technologies, these questions can be especially difficult to answer.

For Jeff Sheehy, advocacy is the answer, as witnessed by his active involvement in various aspects of HIV/AIDS work. He successfully established organ transplantation programs for people living with HIV in California and nationally. He is open about his own struggles in living with HIV. At a 2010 meeting of the California Institute for Regenerative Medicine (CIRM), he said, “For instance, I’m 53, so I’m here”—pointing to a graph of life expectancy for those living with HIV/AIDS—“and it’s a real bet for me whether I’m going to make my five-year-old daughter’s wedding, unless …”

In November 2004, Sheehy received a call from the leader of the California State Senate, John Burton, asking him to accept an appointment as a patient advocate to the governing board of CIRM, which was established by the passage of Proposition 71. Still on the board, he is also now director for communications at the University of California, San Francisco AIDS Research Institute.

Uniquely, CIRM included a mandate to include the state’s diverse communities in every aspect of its decisionmaking process. As a result, these communities help in addressing a range of issues, such as determining which supply companies to use and setting mandates for preferential pricing for the state on any procedures and products to emerge from CIRM-funded research. Proposition 71 was seen to hold more than just the potential to produce cures for various medical conditions; it was seen as a way for the state to gain a foothold in what held promise to become a burgeoning field of the biotech economy.

Writ more broadly, debates in the United States about stem cell research have mainly centered on human embryonic stem cell research and questions about the moral status of the human embryo. In a way unique to this country, the debates are shaped strongly by the divisive abortion issue. For some U.S. residents, the destruction of a human embryo on the research bench is equivalent to something like murder.

For Jeff Sheehy, this is a false argument. “It seems like the whole embryo argument here has been misunderstood,” he says. He says that the embryos are not created for research, but are excess embryos “created to fulfill people who wanted to have children” using IVF. It would be better, he adds, for people who oppose the research to also support public funding of IVF, thereby reducing the strong financial incentive to create as many embryos as possible with each IVF cycle. This would, after all, reduce the overall number of embryos created.

Ultimately, Sheehy suggests, his voice rising slightly, the decision of whether to destroy these excess embryos, to donate them to science, or to give them to others seeking IVF should lie with the parents. “These are ethical choices for parents,” he says. “They should have the autonomy as Americans and as parents.”

Here, Sheehy appeals to the values of anti-paternalism, individual autonomy, and parental decisionmaking that he sees as hallmarks of the United States. He also brings up some broader questions: What kinds of matters should be private and what kinds should be public—and what kinds of things should be publicly funded? The ongoing health care debates and the recession have revealed much about the deep divisions that exist about such topics.

For countries, scientists, and patients, the stem cell race is afoot. Each group feels a sense of urgency, but none more so than the patients waiting and hoping for treatments and cures. Sheehy feels this acutely; he has seen his community devastated by HIV/AIDS and, after all, he hopes to make it to his daughter’s wedding.

Stem cells, it turns out, may even be able to cure HIV infection. In 2008, doctors in Berlin reported that a stem cell transplant had functionally cured a patient with HIV. In 2009, CIRM committed up to $20 million for a study to replicate the results. This would not have happened without Sheehy on the governing board. Many of the board’s members thought that such a sizeable investment was unnecessary. After all, HIV/AIDS in California is being relatively well managed by combinations of antiretroviral drugs.

Sheehy argued, however, that these drugs are problematic and that he and many of his friends would happily trade them for the hope offered by a stem cell therapy. He described for the board the significant side effects of these medications and recounted in personal terms the increased rates of heart disease, non–HIV-related cancers, and neurological deficits that accompany HIV/AIDS infection. When critics discourage funding for stem cell therapies because they do not think anyone with HIV will participate in a clinical trial of such experimental procedures, he is there to say, “I would.”

Changing landscape

The stories of Jeff Sheehy’s activism, public dialogues in Taiwan, and James Grifo’s patient all suggest that the relationship between the scientific sphere and the public sphere is changing. No longer are scientists seen as appropriately self-regulating. CIRM’s inclusion of 10 patient advocates on its governing board also signals a new way of funding and guiding science.

It is also becoming increasingly clear that context matters—cultural, geographic, economic contexts surely, but also the specific details of each case. The mainstream media framed the Grifo-Zhuang case as controversial science, but it left out the context in which a woman, desperate for a biologically related child, prompted and funded the research. Although this detail may well raise additional questions about the ethicality of such funding arrangements, the details nonetheless matter. Individual patients are shaping emerging research.

On the international stage, despite variations in how different countries approach bioethics, the guidelines for human embryonic stem cell research developed by the International Society for Stem Cell Research have found relative acceptance in almost all countries where such research is being conducted. Also, in 2011, nearly a decade after the Grifo-Zhuang controversy, Britain’s esteemed Nuffield Council on Bioethics approved a new IVF technique that involves replacement of the mitochondria rather than the entire nucleus of a patient’s egg. Though this approach raises very similar ethical concerns, the media response to date has been fairly neutral.

Slowly changing mores are not comforting to someone hoping for a cure to a disease or a chance to bear a child. Nor are they comforting to people who see them as a slippery slope that threatens human integrity and flourishing. But increasingly, locally and globally, bioethical decisions are including more voices, of individual scientists and patients and activists alongside scientific leaders and formal ethicists. Science and bioethics are indeed global endeavors, and now new kinds of relationships and new voices are emerging within and across borders.

From the Hill – Fall 2012

R&D funding picture remains mixed, as budget negotiations stall

The laborious process of crafting a federal budget for fiscal year (FY) 2013 appeared set to grind to a halt when Senate Majority Leader Harry Reid (D-NV) and House Speaker John Boehner (R-OH) said on July 31 that they had reached an agreement on a continuing resolution to fund the government through March 2013. The agreement became necessary when congressional leaders recognized that with the election looming, neither chamber was likely to approve the 12 spending bills before the beginning of FY 2013 on October 1.

The agreement also appeared to settle a running dispute between the parties over total FY 2013 spending. The 2011 debt-ceiling agreement established a discretionary spending cap of $1.047 trillion. Although the administration and Senate Democrats have abided by this agreed-on limit, the House GOP passed a budget resolution that capped overall spending at $1.028 trillion. The lower cap had drawn the ire of Democrats, and the White House had consistently promised to veto any spending bill that abided by the lower cap. The House position has been jettisoned, at least for now.

Despite the apparent settlement of the dispute, Congress remains at an impasse on negotiations to avert the “sequestration” cutbacks required for both defense and nondefense spending set to begin in January 2013. Negotiations are under way to avert the across-the-board 10% cuts to defense and 8% cuts to nondefense spending, and a number of Republicans have said they may be willing to consider revenue increases as a part of the package. Even so, support for deep cuts remains strong in some quarters, particularly for nondefense discretionary spending, a category that includes virtually all federal spending outside of defense and entitlement spending and virtually all nondefense R&D. Nondefense discretionary spending was targeted for cuts in the House-passed budget resolution, and similar proposals to protect defense spending at the expense of nondefense spending have been attached as riders to other bills.

To combat these attempted cuts, approximately 3,000 organizations from across the public interest spectrum recently sent a letter to Congress asking for a responsible deficit-reduction approach that does not include further cuts to nondefense discretionary spending, which has already been cut by about 10% since FY 2010. Projecting the effects of this approach into the future, the American Association for the Advancement of Science (AAAS) has estimated that shifting the planned cuts entirely to nondefense areas could result in a reduction of 18%, or $52 billion, in nondefense R&D funding at science agencies over the next five years.

Pressure also remains intense on the defense side, as defense contractors, who have long argued that the cuts will force them to fire thousands of employees, have said they will be required under federal law to issue layoff notices by November 2, which is 60 days before the cuts begin to take effect and just a few days before the elections. Reports by the National Association of Manufacturers and the Aerospace Industry Association have placed budget cut-induced job losses at one to two million. Congress passed legislation that requires the administration and Pentagon to explain exactly how the across-the-board cuts would be allocated, as the Office of Management and Budget begins consultations with the agencies on these questions.

Even as the overall picture remains cloudy, there nevertheless has been some progress on spending bills in both chambers. On August 2, the Senate Appropriations Committee passed its version of the FY 2013 Defense Appropriations bill, after the full House passed its own version two weeks earlier. According to AAAS estimates, the Senate bill would reduce Department of Defense (DOD) R&D by $1.9 billion, or 2.5%, below FY 2012 levels, nearly equal to the overall cut proposed by the administration. In contrast, the House version would reduce overall DOD R&D by about half as much. Basic research across all military departments and agencies would be funded at roughly $2.1 billion under all three proposals, whereas applied research funding in the Senate bill is more generous than in either the administration or the House proposals. As in prior years, and mirroring the House, the Senate Committee has restored substantial funding to R&D in the Defense Health Program, which the administration had targeted for a 46.9% cut. The FY 2013 Interior/Environment spending bill is now the lone R&D-heavy spending legislation yet to be taken up by the Senate Appropriations Committee.

A House appropriations subcommittee passed the Labor, Health, and Human Services spending bill on July 18, which would keep National Institutes of Health (NIH) funding flat for FY 2013, similar to the Senate version of the bill. The House bill would terminate the Agency for Healthcare Research and Quality and mandate a 90/10 extramural/intramural research split and a 55% share for basic research at NIH, in an attempt to ensure that both kinds of research remain priorities for the agency. The subcommittee would also reduce the maximum salaries that institutions can charge to NIH as a cost-saving measure.

On June 29, the full House voted 261 to 163 to approve the FY 2013 Transportation and Housing and Urban Development spending bill. According to AAAS estimates, the Department of Transportation would receive approximately $1 billion for R&D funding in FY 2013, an increase of $70 million or 7.4% above FY 2012 levels, although less than the president’s request. The Federal Highway Administration would receive most of the R&D boost sought by the administration, reaching $494 million, 20.2% higher than in FY 2012.

On June 28, the House Appropriations Committee approved its FY 2013 Interior and Environment appropriations bill, which would slash R&D funding at the Department of Interior, the Environmental Protection Agency (EPA), and the Forest Service. According to AAAS estimates, the bill would fund Interior R&D at approximately $740 million, $122 million or 14.2% below the president’s budget request and $56 million or 7.1% below FY 2012 levels. U.S. Geological Survey (USGS) R&D would be cut by 14.4% below the president’s request and 8% below FY 2012. EPA funding would be reduced by 10.2% below the president’s request and 8.9% below FY 2012. The cut is almost entirely in EPA science and technology, and the committee also passed several amendments to limit the EPA’s ability to regulate greenhouse gas (GHG) emissions and toxins.

On June 19, the House Appropriations Committee approved its FY 2013 agriculture funding bill (H.R. 5973). According to AAAS estimates, the bill would cut U.S. Department of Agriculture R&D by 4.5% below the president’s request and 5.9% below FY 2012, although part of this apparent cut is attributable to the end of the biomass R&D program, which is up for reauthorization in FY 2013 in the current farm bill. Although the Agricultural Research Service and the National Institute of Food and Agriculture would be cut, the Agriculture and Food Research Initiative would receive a boost of 4.6% above current-year funding. The bill generally falls short of the president’s request and the current Senate version of the bill (S. 2375), which was passed by committee on April 26 and awaits floor action.

House Republicans hold controversial hearings on EPA rules

The House held two controversial hearings on the EPA on June 6, addressing the effects of recent rules on the oil and gas industries. The Committee on Energy and Commerce’s Subcommittee on Energy and Power invited several stakeholders to express their concerns about EPA enforcement in Region 6, which includes Arkansas, Louisiana, New Mexico, Oklahoma, and Texas. The House Science, Space, and Technology Committee’s Subcommittee on Energy and Environment’s hearing focused on the costs and benefits of recent EPA rules, featuring witnesses from industry groups.

House Republicans used the hearings to criticize the EPA, saying it interferes with state regulations, unfairly burdens coal and other fossil fuel industries, and creates standards based on faulty science. At the Energy Committee hearing, Barry Smitherman, chairman of the Texas Railroad Commission, discussed the Range Resources case. In December 2010, the EPA issued an emergency endangerment order to the Range Resources Company against the advice of the commission, which serves as a state regulatory agency. It was later discovered that Range Resources was not responsible for the groundwater pollution that the EPA had detected, and the order was lifted, but only after the company spent millions of dollars defending itself, according to Smitherman.

At the Science Committee hearing, Tom Wolf, executive director of the Illinois Chamber of Commerce Energy Council, said the New Source Performance Standards for carbon dioxide emissions worked for natural gas plants but were impossible for coal-powered plants to meet with the currently available technology. He said the large leap in standards would create a roadblock for coal producers, instead of the intended incentive for innovation.

At the same hearing, Energy and Environment Subcommittee Chairman Andy Harris (R-MD), and Michael Honeycutt, chief toxicologist at the Texas Commission on Environmental Quality, discussed what they saw as the EPA’s overestimation of the benefits conferred by its rules.

Democrats on both subcommittees criticized the hearings’ intentions. Energy and Environment Subcommittee Ranking Member Brad Miller (D-NC) called the Science Committee’s hearing “one more forum for specific big industries to air their grievances about the EPA.” Energy Committee Ranking Member Henry Waxman (D-CA) called his colleagues’ opening statements “part of the fact-free, anti-EPA rhetoric of the Republicans.”

Bills introduced to improve forensic science

Kirk Odom served 20 years in prison for a crime he did not commit. He was convicted on the basis of a mistaken victim identification and faulty forensics. Thirty years later, DNA testing on a hair found at the crime scene, as well as stains on pillowcases and the victim’s clothing, proved that he was innocent. Odom was the third person in three years to have his conviction overturned because of unreliable hair analyses in Washington, DC.

Nationwide, there have been 292 post-conviction DNA exonerations in the United States since 1989, and, according to the Innocence Project, a nonprofit that helps prisoners who were wrongly convicted, about half of those wrongful convictions were due at least in part to poor forensic science.

Congress is considering several new bills in response to these recent events, as well as an investigative report by the Washington Post and a 2009 National Research Council (NRC) study, Strengthening Forensic Science in the United States: A Path Forward.

Last year, Sen. Patrick Leahy (D-VT) introduced the Criminal Justice and Forensic Science Reform Act (S. 132). It would establish an Office of Forensic Science in the Department of Justice (DOJ) that would be responsible for creating and implementing uniform standards and enforcing regulations, as well as a Forensic Science Board to determine research priorities. (The placement of such an office in the DOJ runs counter to the NRC report, which said that a new and independent organization would be needed to regulate the forensic community, because no existing government agency has a relevant mission statement or the appropriate resources to take on this task.) The bill would also require that any labs or individuals receiving federal funding be accredited based on standards outlined by the new Board and Office of Forensic Science. The bill is currently being considered by the Senate Committee on the Judiciary.

In July, Sen. John D. Rockefeller IV (D-WV) introduced the Forensic Science and Standards Act of 2012 (S. 3378), which directs the National Institute of Standards and Technology (NIST) to develop standards for forensic scientists and establish a Forensic Science Advisory Committee. The committee would be composed of research scientists, forensic scientists, and members of the legal and law enforcement communities and would be chaired by the director of NIST and the attorney general. Rockefeller’s bill would also establish a National Forensic Science Coordinating Office in the National Science Foundation to develop a research strategy and provide grant money for forensic science research centers. S. 3378 is currently being reviewed by the Senate Committee on Commerce, Science, and Transportation. Rep. Eddie Bernice Johnson (D-TX) introduced a companion bill, H.R. 6106, which has been referred to the House Committees on Science, Space, and Technology, as well as the Judiciary Committee.

The proponents of the bills believe that nationally recognized standards and a strong peer-review process, much like the one that helps to regulate the rest of the scientific community, will result in better research and more accurate analyses and lead to fewer wrongful convictions.

Senate committee examines EPA rule on air pollution from fracking

The Senate Committee on Environment and Public Works’ Subcommittee on Clean Air and Nuclear Safety held a hearing on June 19 to review new EPA air standards for hydraulically fractured natural gas wells and oil and natural gas storage.

The EPA rule, which was finalized on April 18, includes the first-ever national standards on air pollution from gas produced in wells using a process known as fracking. Members of the oil and gas industry have criticized the new rule for its potential impact on domestic natural gas production.

Opening statements and comments at the hearing fell along party lines. Subcommittee Chairman Thomas Carper (D-DE) praised the EPA for addressing the lack of fracking regulations in most states, and Sen. Benjamin Cardin (D-MD) insisted that air pollution is a national issue because it does not follow state boundaries. On the other side, Subcommittee Ranking Member John Barrasso (R-WY) criticized the Obama administration for working against the natural gas industry, despite its “all-of-the-above” energy rhetoric. Committee Ranking Member James Inhofe (R-OK) brought up recent EPA controversies and argued that the regulation of fractured wells should be left to the states.

The first panel featured Gina McCarthy, the EPA’s assistant administrator for the Office of Air and Radiation, who said the recent rule on air emissions was achievable, would result in cost savings, would reduce air pollution, and would not slow natural gas production. She also described changes made to the final version of the rule in response to industry feedback, which included the introduction of a transition period before the use of reduced emission completions (also known as “green completions”) would be required. Green completions capture natural gas that is emitted during a well’s flowback period, preventing the release of volatile organic compounds into the atmosphere. The final rule also includes a new subcategory of wells in low-pressure areas. These wells are not required to use green completions, because the new technology is not cost-effective in those cases.

On the second panel, Fred Krupp, a member of the Secretary of Energy’s Advisory Board Natural Gas Subcommittee, outlined the subcommittee’s findings that oil and natural gas production results in the emission of toxic air pollutants such as carcinogenic benzene, ground-level ozone, and methane, which he said causes global warming at a rate 72 times higher than that of carbon dioxide.

John Corra and William Allison, representatives of state regulatory agencies in Wyoming and Colorado, respectively, highlighted the current environmental regulations in their states, which the EPA used as the basis for the new rule. Both stressed the importance of allowing states flexibility during implementation, because the use of green completions is not technologically or economically feasible at some sites.

Tisha Schuller, the president and chief executive officer of the Colorado Oil and Gas Association, expressed her concerns that the EPA overestimated the benefits of the rule by overestimating the emissions from fractured wells and overestimating the cost savings from the rule, while underestimating the costs for new equipment and regulatory and administrative requirements. Darren Smith, the environmental manager at Devon Energy Corporation, suggested that the EPA overestimated the current emissions from fractured wells.

Carper ended the hearing by hailing the rule as “common sense.” Although the witnesses agreed that the rule needs some tweaks before it is fully implemented in 2015, the debate overall seemed to be, as McCarthy said, a question of whether the rule was “good or very good.”

Federal science and technology in brief

  • The AAAS Office of Government Relations has developed a Web site (http://elections.aaas.org/2012/) that describes and tracks the presidential candidates’ positions on science, technology, and innovation issues.
  • New rules proposed by the Small Business Administration for the Small Business Innovation Research program and the Small Business Technology Transfer program are causing a stir among industry advocates. The rules could eliminate a requirement that grant applicants be majority-owned by a U.S. resident or company. Instead, they propose that applicants must operate primarily in the United States or “make a significant contribution to the U.S. economy,” without a U.S. ownership requirement. There are also concerns that the new proposal could create a loophole that opens the door for large companies, in addition to small businesses, to receive grants.

“From the Hill” is adapted from the newsletter Science and Technology in Congress, published by the Office of Government Relations of the American Association for the Advancement of Science (www.aaas.org) in Washington, DC.

Do High-Stakes Tests Improve Learning?

Test-based incentives, which reward or sanction schools, teachers, and students based on students’ test scores, have dominated U.S. education policy for decades. But a recent study suggests that they should be used with caution and carefully evaluated.

The United States has long performed at a middling level on international assessments of students’ math, reading, and science knowledge, trailing many other high-income countries. In their efforts to improve K-12 education, U.S. policymakers have increasingly turned to offering incentives—either to schools, to teachers, or to students themselves—to increase students’ standardized test scores.

For example, the No Child Left Behind (NCLB) law, which has governed public education for more than 10 years, sanctions schools whose students do not perform well on standardized tests. More recently, states and school districts have experimented with awarding bonuses to teachers if their students’ test scores climb. Twenty-five states target the incentives to students themselves by requiring them to pass an exit exam before receiving their diploma.

All of these policies share a fundamental principle: They reward or sanction students, teachers, or schools based on how well students score on standardized tests. Policymakers hope that by holding various players in the education system accountable for how much students learn, they will be motivated to improve student performance. But do test-based incentives actually drive improvements in student learning?

In an effort to answer that question, a recent study by the National Research Council took a comprehensive look at the available research on how incentives affect student learning. The study committee, composed of experts in education, economics, and psychology, examined a range of studies on the effects of many types of incentive programs. What it found was not encouraging: The incentive systems that have been carefully studied have had only small effects, and in many cases no effect, on student learning.

Measuring student learning

At best, any test can measure students’ knowledge of only a subset of the content in a particular subject area; it is also generally more difficult to design test items at higher levels of cognitive complexity. These limitations take on greater significance when incentives are tied to the test results. Research has shown that incentives can encourage teachers to “teach to the test” by narrowing their focus to the material most likely to appear on the test. As a result, their students’ scores may be artificially inflated because the score reflects their knowledge of only part of the material the students should know about the subject.

For example, if teachers move from covering the full range of material in eighth-grade mathematics to focusing only on the portion included on the test, their students’ test scores may rise even as their learning in the untested part of the subject stays the same or even declines.

In measuring how incentives affect student learning of a subject, it is important to look at students’ scores not on the high-stakes test that is tied to the incentives but on low-stakes tests that are designed to provide a general picture of the quality of learning and do not have direct consequences for schools, teachers, or students. Because there is no incentive that would motivate teachers to narrow their instruction to the material tested on low-stakes tests, the scores on those tests, such as the National Assessment of Educational Progress (NAEP), are less likely to be inflated and can give a more reliable picture of student learning in a subject area. In conducting its review of the research, the committee focused mainly on studies that based their assessment on low-stakes tests.

The committee also limited its evaluation to studies that allowed researchers to draw causal conclusions about the effects of test-based incentives. This means that studies had to have a comparison group of students, teachers, or schools that were not subject to incentives or rewards, and that individuals or groups could not self-select into the comparison group. In addition, the committee looked only at studies of programs that had existed long enough to supply meaningful results, which means that some programs, particularly many involving performance pay for teachers, were too new to evaluate.

Effects small, variable

The committee examined research on 15 programs with a range of configurations to assess the effects when incentives are given to schools, teachers, and students. Findings on some of these incentive programs are summarized below, and the effect sizes of all of them are shown in Figure 1.

Incentives for schools. Many state programs, as well as NCLB, reward or sanction schools based on the improvements made by their students. Under NCLB, for example, schools that do not show adequate yearly progress in improving student test scores face escalating consequences. Schools must first file improvement plans, make curriculum changes, and offer students school choice or tutoring; if progress is not shown, they are required to restructure in various ways. Some programs tie incentives to test score gains among students at all scoring levels, whereas others tie the incentive to the number of students who move from nonproficient to proficient levels in a subject area.

To understand how these types of incentives affect student learning, the committee looked at a synthesis of 14 studies of state-level incentive programs for schools before NCLB, as well as two studies on the impact of NCLB itself. Across subjects and grade levels, the research indicates an effect size of about 0.08 on student learning—equivalent to raising a student’s performance from the 50th to the 53rd percentile—when evaluated using the NAEP, a low-stakes test. The positive effect was strongest for fourth-grade mathematics.

Incentives for teachers. Many programs in this category are simply too new to be meaningfully evaluated, but those that have been assessed reveal effects even smaller than those for school-based incentives. One program that has existed long enough to be evaluated is a Nashville-based program that offered teachers bonuses of $5,000 to $15,000 for improvements in their students’ test scores. The proportion of participating teachers who received a bonus increased from one-third in the first year to one-half in the third year. However, over three years and four grades, researchers found an average effect size of 0.04 standard deviations on the high-stakes test, which was not statistically significant.

Another initiative, the Teacher Advancement Program (TAP), is a nationwide, foundation-developed program that offers teachers bonuses of up to $12,000 based on students’ test score gains; it also offers professional development to teachers. As of 2007, the program had been implemented in more than 180 U.S. schools. One evaluation of the program found no statistically significant effect on student test scores as measured by the high-stakes tests themselves. Another evaluation that looked at student math scores found that TAP schools increased test score gains on a low-stakes test by one to two points in grades 2-5, a statistically significant gain. In grades 6-10, the changes were either statistically insignificant or showed decreases of one to three points. Across grades, the average effect was 0.01 standard deviations.

One program did find significant positive results. A Texas program offered high-school teachers bonuses of $500 to $1,000 for each of their students who scored a 3 or higher (out of 5) on an Advanced Placement (AP) exam; students also received smaller cash bonuses for a score of 3 or higher. The program included teacher training and a curriculum for earlier grades to prepare students to take AP classes. In schools that implemented the program, the number of students who scored at least 1,100 on the SAT or 24 on the ACT—in this context, the low-stakes tests—increased by two percentage points in the first year of the program and by one point each in the second and third years. By year three, enrollment in AP programs increased by 34%, and the number of students attending college increased by 5.3%.

Incentives for students. In an incentive experiment carried out over two years in New York City, fourth and seventh graders were offered cash rewards (up to $25 for the fourth graders and $50 for the seventh graders) based on scores on math and reading tests. Evaluators found that across eight combinations of subject and grade level, the average effect size on student learning was 0.01 when measured by New York state tests in reading and mathematics, which were low-stakes tests in this context because they offered no cash rewards.

The most widely used test-based incentives targeted at students are high-school exit exams, which are now required of almost two-thirds of public high-school students. Although the exams and subjects covered vary from state to state, students typically must pass tests in multiple subjects before they are awarded a high school diploma. The committee found that exit exams decrease the rate of high-school graduation by about two percentage points but do not increase student learning when measured by NAEP scores.

Incentive programs in other countries. The committee also examined six studies of incentive programs in India, Israel, and Kenya and found effects on achievement ranging from 0.01 to 0.19 standard deviations. However, most of the studies measured student achievement using the high-stakes tests attached to the incentives. Moreover, the India and Kenya programs operated in developing countries whose educational context, which included high rates of teacher absenteeism and high student dropout rates in middle school, differed markedly from that of developed nations, making the studies' lessons for the United States unclear.

Conclusions

Looking across all of the combinations of incentives, the committee found that when evaluated using low-stakes tests, incentives’ overall effects on achievement tend to be small and are effectively zero for a number of programs. Even when evaluated using the high-stakes tests attached to the incentives, a number of programs show only small effects.

The largest effects resulted from incentives applied to schools, such as those used in NCLB. Even here, however, the effect size of 0.08 is the equivalent of moving a student performing at the 50th percentile to the 53rd percentile. Raising student achievement in the United States to reach the level of the highest-performing nations would require a gain equivalent to moving a student at the 50th percentile to the 84th percentile. Unfortunately, no intervention has been demonstrated to produce an increase that dramatic. The improvement generated by school-based incentives is no less than that shown by other successful educational interventions.
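The percentile equivalences cited here follow from treating test scores as approximately normally distributed, an assumption of this illustration rather than a detail from the report. A minimal sketch using Python's SciPy library reproduces both figures: an effect size is added to a student's z-score and converted back to a percentile with the normal distribution function.

# Illustrative only: converting effect sizes (in standard deviations) into
# percentile shifts, assuming normally distributed test scores.
from scipy.stats import norm

def percentile_after_gain(effect_size_sd, start_percentile=50.0):
    """Percentile reached by a student who starts at start_percentile and
    gains effect_size_sd standard deviations."""
    start_z = norm.ppf(start_percentile / 100.0)       # z-score of the starting percentile
    return 100.0 * norm.cdf(start_z + effect_size_sd)  # percentile after the gain

print(round(percentile_after_gain(0.08)))  # ~53, the school-incentive effect noted above
print(round(percentile_after_gain(1.0)))   # ~84, the gain needed to match top-performing nations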

However, although some types of incentives perform as well as other interventions, given the immense amount of policy emphasis that incentives have received during the past three decades, the amount of improvement they have produced so far is strikingly small. The study committee concluded that despite using incentives in various forms for 30 years, policymakers and educational administrators still do not know how to use them to consistently generate positive effects on student achievement and drive improvements in education.

What’s next?

Should policymakers give up on test-based incentives? Although the study’s findings do not necessarily mean that it is impossible to use incentives successfully, the small benefits they have produced so far suggest that they should be used with caution and carefully evaluated for effectiveness when they are used.

The study committee recommends a path of careful experimentation with new uses of incentives, combined with a more balanced approach to educational interventions. Evidence does not support staking so much of our hope for educational improvement on this single method; rather, it suggests that we should be moving some of our eggs out of the incentives basket and into other complementary efforts to improve education.

To those ends, the committee’s report urges that policymakers and educators take the following steps:

Experiment with using test-based incentives in more sophisticated ways, as one part of a richer accountability and improvement system. For example, some have proposed using school-based incentives with broader performance measures, such as graduation rates or measures of students’ project-based work. Others have proposed using test results as a “trigger” mechanism for fuller evaluations of schools. Under such a system, teachers or schools with low test scores would not automatically be sanctioned. Instead, the test results would identify schools that may need a review of their organizational and instructional practices and more intensive support for teachers.

Design tests in ways that discourage narrow teaching. The design of any incentive-based system should start with a description of the most valued educational goals, and the tests and indicators used should reflect those goals. However, the precise content and format should not remain the same over the years; a test that asks very similar questions in the same formats from year to year will become predictable and encourage teaching to the test. Even if the questions were initially an excellent gauge of student performance, over time the test scores are likely to become distorted as a result. To reduce the inclination to teach to the test, tests should be designed to sample subject matter broadly and to include continually changing content and item formats. Individual test items should be reused only rarely and unpredictably.

Carefully evaluate the effectiveness of any new incentive program pursued. Incentives’ effectiveness may depend on the particular features of the program: whether it is schools, teachers, or students who are offered incentives, for example, and which tests or performance measures are used. These features should be carefully documented so that their effects can be considered when the program is assessed.

Consider using test scores in an informational role rather than attaching explicit rewards or sanctions. The policy discussion during the past decade has rested on the assumption that using tests in this way—to give educators and the public information about performance without explicit consequences—is not enough to produce change. But psychological research suggests that informational uses may, in some situations, be more effective at motivating students and educators.

Balance further experimentation with incentives with complementary efforts to improve other parts of the educational system. In continuing to explore options with test-based incentives, policymakers should keep in mind the costs of doing so. During the past two decades, substantial attention and resources have been devoted to using incentives in an attempt to strengthen education, an experiment that was worthwhile because it seemed to offer a promising route to improvement. Further investment still seems to be worthwhile because there are more sophisticated proposals for using test-based incentives that offer hope for improvement and deserve to be tried. But the available evidence on incentives’ effects so far does not justify a single-minded focus on them as a primary tool of education policy without a complementary focus on other aspects of the education system.

As policymakers continue to explore incentive approaches that have not been tried, they should avoid draining resources and attention away from other aspects of the education system, such as efforts to improve standards, curricula, instructional methods, and teachers' skills. Without these complementary efforts, no incentives are likely to work.


Michael Hout is the Natalie Cohen Chair of Sociology and Demography at the Berkeley Population Center, University of California, Berkeley. He chaired the study committee that produced the report from which this article is drawn, Incentives and Test-Based Accountability in Education, available from National Academies Press. Stuart Elliott directed the study and serves as director of the National Research Council’s Board on Testing and Assessment. Sara Frueh is a writer for the National Research Council.

Escape from the Great Distress: The Role of Rules

The early part of my career was focused on the elusive notion of an idea and its economic power. I worked at both the theoretical level and the policy level in thinking through what we should do in our policies to take full advantage of the power of ideas.

What makes ideas so remarkable is their capacity for shared use. A bottle of valuable medicine can heal one person, but the formula used to make the medicine can be used to heal everyone on Earth. Economists call this concept "non-rivalry." Working through both the theoretical and policy implications of non-rivalry took up the first decade or more of my career.

There is a saying that you all know that we use to capture this character of non-rivalry: If you give someone a fish, you feed them for a day, but if you teach someone to fish, you destroy another aquatic ecosystem.

My new work really springs from this, from a recognition that progress, in the broadest sense, is a function not just of the development of new technologies but also of the development of rules that make sure that we use those technologies appropriately to make us all better off.

My goal is to explore this co-evolution of rules and technologies. In basic control theory, a distinction is made between control variables and state variables. The state variable could be the amount of water in a tank, and the control variable the setting of the valve that feeds it. The science policy community—and I certainly include myself in my early work—thinks of rules as control variables. We use legislative actions such as the Bayh-Dole Act to change the rules on patenting, or we make new rules about education subsidies to attract more people to science and engineering. If we just come up with a good rule and figure out how to get it implemented, that will have implications for the dynamics of technology.
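To pin down the vocabulary, here is a minimal, purely illustrative simulation in Python; the tank parameters and the simple proportional rule are my own stand-ins, not anything from the talk. The water level is the state variable that evolves over time, while the valve opening is the control variable that a rule adjusts at each step.

# Toy model of the tank-and-valve example: the level is the state variable,
# the valve opening is the control variable, and the rule below nudges the
# valve in proportion to how far the level is from the target.
def simulate_tank(target_level, steps=20, level=0.0, gain=0.5, leak_rate=0.1):
    history = []
    for _ in range(steps):
        error = target_level - level              # gap between goal and current state
        valve = max(0.0, min(1.0, gain * error))  # control variable, clamped to [0, 1]
        inflow = valve * 2.0                      # a fully open valve adds 2 units per step
        level = max(0.0, level + inflow - leak_rate * level)  # state update
        history.append(level)
    return history

print([round(x, 1) for x in simulate_tank(target_level=10.0)])  # level rises toward the target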

The point I want to argue today is that we can step back and think of rules and technologies as state variables that influence each other and co-evolve. The deep control variables are something I am going to call metarules. The metarules are the rules by which we change the rules that we need to change on a day-to-day basis. In a dynamic world, we need to keep changing those rules all the time.

The Charter Cities initiative is an attempt on my part to propose a different metarule for changing the rules in developing countries, one that could, in some sense, circumvent many of the roadblocks that stop changes in rules. The underlying strategy behind Charter Cities is to use the power of a start-up in contrast to something like reform of an existing institution. The start-up firms are the way in which very important new innovations come into an industry. Perhaps, start-up political jurisdictions could have that same effect in the developing world. The Charter Cities Web site has information about the theory and its practical implications in Honduras, which is starting an entirely new political jurisdiction and city.

But my purpose here is not to address the political hurdles in developing countries but to examine the dynamic relationship between rules and technologies in the United States. To frame this, let me start with an example of a rule that we decided that we wanted to change decisively in the 1960s and 1970s: the rule that said that blacks should be treated differently than whites in the United States.

Calling that a rule highlights something that might not be obvious. Rules capture any social regularity. They can be created by legislation, regulation, and enforcement, but they can also be the result of social norms. Discrimination was the result of both of those processes.

The nation went about changing that rule in a variety of ways. One way was through the filing of lawsuits that eventually made their way to the Supreme Court. These lawsuits appealed to fundamental principles in the Constitution and led to the early steps in desegregation in schooling.

Another way was through congressional action that culminated in the Civil Rights Act of 1964. This legislation is enforced by detailed rules, issued by organizations such as the Equal Employment Opportunity Commission, that stipulate what steps an employer must take to implement an affirmative action program.

A third way in which the nation changed these rules is illustrated by the actions taken by the U.S. Army after the Vietnam War, when it sought to eliminate discrimination and achieve better racial harmony as it made the transition to an all-volunteer army. Charles C. Moskos, a Northwestern University sociologist renowned for his study of the military, tells the story of a cafeteria at Fort Hood, where all the black soldiers would regularly eat together at one table. One day, a white sergeant went over to the black soldiers and told them to go sit at other tables and that in the future there would be no table that would be considered exclusively for black or white use. Moskos observed that this type of rule enforcement, which was appropriate for the military, would not be effective on his college campus.

Even though many people might object to the military system of enforcing rules, the fact is that the Army has been the most successful U.S. institution at abolishing formal discrimination, reducing informal discrimination, and achieving thorough integration. We should be open to the possibility that there are different ways to go about changing rules. The familiar paths of elections, legislative action, and formal judicial procedures are not the only viable and legitimate options.

Consider another example that is more recent and more familiar: stabilizing an economy in the face of a dramatic crisis. The nation has two systems of metarules for responding to a crisis. One is that Congress can pass special legislation, as it did in 2008 with the Troubled Asset Relief Program. The other is that the Federal Reserve Board has the authority to take unilateral action on many financial matters, just as the Army can in managing its troops. The Fed responded much more quickly and aggressively than did Congress, and arguably, it deserves the credit for saving the United States and the world from a disastrous global financial panic.

We need to keep an open mind about metarules and to be rigorously scientific in identifying goals and assessing the mechanisms for achieving them.

The power of metarules

Having articulated this framework of the co-evolution of rules and technologies and certain metarules that define how rules are updated, what does it imply for the nation’s response to the Great Distress, the coming 5 to 10 years of economic turmoil into which the United States is heading? Can an understanding of metarules make it possible to understand this period in a longer historical context?

One possibility is that the legislative and regulatory processes that the nation uses to set many of the rules for its economy may not be producing the desired outcomes. In particular, they may be giving too much power to the hierarchical organizations that collect net income from the government. Consider a few examples and their connection to the Great Distress.

The financial sector in this country is regulated by legislation and by regulations issued by agencies such as the Securities and Exchange Commission. Congress recently completed a round of reform legislation and some reorganization of these agencies because of the evidence that existing rules and regulations were not effective in preventing the recent financial crisis. The nation is making a serious effort to update its rules, but it is far from certain that the current approach will be effective.

Look at the healthcare sector, which has just been through a major round of legislation designed to reform its rules, a task that has been on the national agenda for decades. A movement to revise the relationship between national and state governments and their employees is gaining strength. Several states are trying to limit the range of items that are subject to union collective bargaining agreements. Many states and local jurisdictions have eliminated elected school boards and given control of the schools to a governor, mayor, or other official. In higher education, the rapidly expanding for-profit institutions are trying to influence the legislation and the regulations that influence their industry.

What is common to these activities? In each case, well-organized entities with some kind of hierarchical structure stand to see significant increases in net income if the government makes one set of decisions, or reductions in that income if it makes a different set. In all cases, one can argue that these entities have been too effective in promoting their own self-interest, with the result that the national interest has suffered. All of these attempts at reform are trying to redress the balance.

For decades now, a background theme in economic discussions has been the increase in income inequality. Although economists are engaged in a long-running debate about the details of how best to measure income, the underlying fact about which there is no dispute is that income inequality has been growing in the United States. Even if technology has been improving rapidly enough to raise average income, somehow the benefits of that are being distributed in a way that favors those on the upper rungs of the economic ladder so that median income is not growing.

Tyler Cowen of George Mason University interprets the underlying problem as a problem in technology. He speculates that perhaps technology is not coming in fast enough or maybe it is not the right form of technology.

An alternative perspective illuminates the importance of rules. The key factor is that finance and health, the two sectors where concerns have arisen about the quality and effectiveness of their governing rules, have captured a rapidly increasing share of GDP.

Finance has grown from roughly 6% of GDP in the 1970s to more than 15% today. Healthcare has grown from 5-6% to 12-13%. Government's share is worth noting for reference. This measure of government counts essentially the workers that governments hire and the buildings that they build; it does not count taxes collected and then returned to consumers through programs such as Medicare, which are treated as transfers. It might surprise many Americans to learn that government's share of GDP has actually decreased during this period when the shares of finance and healthcare were growing dramatically.

If there is something wrong with the rules in finance and healthcare and if those rules in some sense have actually grown worse since the 1970s, and these sectors are becoming much larger fractions of GDP, it is conceivable that they are behind some of the broader measures of distress in the economy.

For example, compensation and returns in finance are highly concentrated among a very small number of people. Having even more of GDP going to finance and having that be very concentrated among a few people is clearly contributing to increases in income inequality. Some of the compensation in healthcare, especially among medical professionals, is also concentrated at the upper end of the income scale and thus also may be contributing to inequality.
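A back-of-the-envelope calculation shows the mechanics of this claim. The sector shares below come from the GDP figures cited above, but the within-sector concentration numbers are hypothetical placeholders chosen only to illustrate the arithmetic, not estimates of actual income shares.

# Hypothetical illustration: if top earners capture 40% of income inside finance
# and 10% elsewhere (made-up figures), then finance growing from ~6% to ~15% of
# GDP mechanically raises the economy-wide share going to top earners.
def top_earner_share(sector_share, top_share_in_sector, top_share_elsewhere):
    """Economy-wide income share of top earners, as a weighted average."""
    return (sector_share * top_share_in_sector
            + (1.0 - sector_share) * top_share_elsewhere)

print(top_earner_share(0.06, 0.40, 0.10))  # finance at ~6% of GDP  -> roughly 0.118
print(top_earner_share(0.15, 0.40, 0.10))  # finance at ~15% of GDP -> roughly 0.145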

The most interesting question, however, is not the details of the forces behind income distribution but what is happening to the quality of the rules that govern finance and health care. Has quality been deteriorating, and if so, what are the implications?

In the recent healthcare debate, little mention was made of the fact that most major U.S. health insurers, such as the Blue Shields and the Blue Crosses, used to be nonprofits. Then in the 1990s, the country witnessed a massive wave of privatization of those organizations. At the time, there was concern about whether private organizations would provide the same charitable care as the nonprofits had, but other than that, no one raised any concern about the broader implications of this change. Instead of having a healthcare sector that looks primarily like Kaiser or the Mayo Clinic, we have very large corporate interests in both insurance and hospitals. The insurance part actually shows up in the finance share of GDP; the hospitals' employees show up in healthcare.

The possibility here is that having corporate entities whose executives' personal compensation depends on corporate profits created political pressure to adjust, at the margin, any national law or regulation that influenced those organizations' profits. It created very strong incentives to tweak those items in the direction of increasing compensation for people in this sector. Some of the resistance to national healthcare reforms, and some of the very effective political action right now to undermine the most recent healthcare reform, may in fact be the long-run consequence of having switched from nonprofit to for-profit organization in this sector.

Failure to pay attention to the dynamics of rule-setting and the nature of the operative metarules could have blinded us to what was happening in areas such as setting compensation levels for Medicare. It may have allowed us to make what could end up being the biggest budgetary mistake in our nation’s history. It is not something that will be easy to roll back.

Why might the for-profit entity be problematic here? Part of it is just because so much of its compensation comes directly from the government or comes in ways that are influenced by government action, such as tax subsidies for employer-provided healthcare. But part of it may be because we have a system of metarules for setting the laws and regulations that relies so heavily on legislatures.

Remember, we could have had a system like the military that sets the rules. We could have had a system like the Supreme Court. But what we have is a Congress that passes bills and then agencies that implement those bills with regulations. That legislative process may be too vulnerable to manipulation by very well financed entities with an enormous amount of wealth and income at stake. Every dollar of cost savings in healthcare means a dollar in reduced income for somebody in that sector.

How would this show up? When Congress was at a budget impasse in the spring of 2011 and the government was in danger of shutting down, it managed to pass a bill at the very last minute. One measure in that complex legislation repealed a provision of the healthcare reform bill that provided free-choice vouchers, even though those vouchers had no effect on the budget. Someone, and we don't know who, inserted this repeal measure at the last minute.

There was also a rollback of $2.2 billion that had been set aside in the healthcare reform act to support the creation of a few private health cooperatives, which were designed precisely to rebuild some of the nonprofit providers in the healthcare sector that had been wiped out in the 1990s. That funding, too, was eliminated as part of this budget compromise.

Healthcare interests, which have a huge stake in this area, managed to catch the government at this point of maximum vulnerability and extract a few key concessions for the industry. I think you can anticipate that we will see this scenario repeated in each of the next few crises that will take place in the Congress in the months to come.

Another example involves the for-profit higher education sector. Many of these institutions enroll large numbers of students who depend on government-guaranteed loans to pay their tuition. But students who attend these schools default on their loans at a higher rate than those who attend nonprofit institutions. Experts attribute this high default rate to the for-profits’ aggressive recruiting techniques that attract many unqualified students who do not pass their courses and therefore have trouble finding jobs or who find that the education they receive does not make them more employable.

The Department of Education is trying to promulgate regulations that would force disclosure of employment prospects for students, prohibit certain dishonest marketing tactics in recruiting students, and threaten a cutoff of access to guaranteed student loans for institutions with the worst default rates. Even though these regulations would clearly save the government money, a hundred members of Congress signed a letter to President Obama expressing their disappointment that the budget legislation did not include language prohibiting the Department of Education from implementing its regulations.

This issue will not go away. We may be in the early stages of a shift toward for-profit provision of education, analogous to the 1990s shift to for-profit provision of healthcare. The budgetary and other social effects could be significant.

The one point to make about government employee unions is that this is not a left/right issue. These unions have a very big stake in net income from the government. Some observers worry that the unions have been able to use contract provisions, such as defined-benefit retirement programs and disability benefits, to extract payments that were not fully disclosed or that circumvented the usual accounting and budgetary mechanisms.

All of these examples go beyond left and right, Democrat and Republican. They reflect a general vulnerability of legislative-like mechanisms to manipulation by these kinds of interests. A mayor, a governor, or a president is understood to be individually responsible for the outcome of decisions and policies, but members of elected bodies are able to elude this direct responsibility.

Rewriting the rules

So what are the possible responses to this? One that is fascinating is a move to change the role of Congress. In the case of the Base Realignment and Closure Commission, Congress delegated to a group of people the job of making the tough decisions and developing a comprehensive plan that would then receive an up-or-down vote. This insulated members of Congress from the back-office negotiations, special deals, last-minute insertions, and other opportunities for organized special interests to exert undue influence.

One of the most important measures in the healthcare reform act is the Independent Payment Advisory Board, which will provide this kind of function for proposed cost reductions in the Medicare program. It will be very interesting to see if this survives the next few budget crises. If it does, there is a chance that it could lead to real change in how we make rules in the healthcare sector. The Greenspan commission was supposed to perform a similar function for Social Security, and the recent debt limit legislation creates a special congressional group to develop a budget plan that will be submitted for a simple up-or-down vote by the full Congress. Thus, we may see the legislature evolve to emphasize its investigatory and up-or-down voting functions while becoming much less involved in the detailed crafting of the legislation that influences the dollars that come in and go out.

Another option is to try to rebuild the nonprofit sector in healthcare or to take steps to protect the nonprofit sector in higher education. It will be very interesting to watch whether official policy moves in this direction.

One final point that is worth emphasizing is that top U.S. public officials are very poorly paid compared with their counterparts in other countries. The top civil servant in any department in Singapore receives a base salary of $1.5 million a year with the possibility of bonuses of as much as $500,000. There are promotion tournaments for the people who aspire to achieve those top positions. That means there is a serious financial incentive for talented people to stay in the government and work their way to the top.

An agency such as the Securities and Exchange Commission (SEC) has to deal with pressure from the financial sector for regulations that let it earn profits at the expense of the stability of the economy. If the SEC staff included people capable of earning millions of dollars in the private sector but who had devoted their entire careers to government service, the agency might have been better positioned to resist a very well organized campaign to adopt regulations that served the interests of the financial industry.

The critical point is that in trying to change the metarules we should not limit ourselves to the obvious paths of judicial action or legislative reform. The Army acts differently. The Fed acts differently. (And if these ill-conceived proposals to abolish the Fed succeed, the economy will encounter disasters we cannot even imagine.) All of these approaches have costs and benefits. Our responsibility is to be objective and scientific about what the goals are, what the data say, what the theory suggests, and to be willing to ask broader questions about what would be a good way to organize ourselves.

If we do that, we will make better progress. Progress on rules will be much more important for helping us get back to what we think of as the golden era in the United States than measures designed simply to speed up the rate of technological change.

Qualitative Metrics in Science Policy: What Can’t Be Counted, Counts

The past half-century has ushered in a veritable revolution in the science of metrics, as the surprisingly long life of Moore’s Law and related advances in information technology have led to a vast reservoir of quantitative information ripe for study with powerful analytical tools. For instance, captured in the research literature are methods to measure the quality of one’s health, to quantify athletic performance, and to determine something as innately intangible as consumer confidence. Even in the domain of science and technology, economists have made efforts to assess the return on investment (ROI) of science and engineering research funded by government and industry, finding that university-level research is one of the best long-term investments that can be made. Yet, in the United States, academic research—and research universities more broadly—have long remained exempt from any real adherence to performance- and ROI-based metrics despite this nation’s quantitative revolution.

The reasons for this are as much historical as institutional, going beyond just the difficulty of measuring research ROI. Under the auspices of Vannevar Bush, the chief scientist-policymaker in the Roosevelt and Truman administrations, science and engineering R&D, having demonstrated their value in World War II, gained an elevated stature in the public sphere as critical and unimpeachable assets to the U.S. superpower state. A social compact of sorts was struck between science and society—the federal government would support scientific research, primarily in universities, and the benefits would flow to students (via education) and the general public (via technological innovation). Accordingly, few questioned the value that research universities offered the nation, and there seemed little need for systematic evaluation, let alone metrics.

A changing compact

Now, more than six decades after World War II and more than 20 years after the Cold War, the social compact between research universities and the federal government is being questioned. Research universities are being asked to answer two difficult questions: Is the investment in university research paying off, and is current university research well structured to meet the challenges of the future? Today, the answers to these questions are no longer being taken for granted in the affirmative.

The shift from agnosticism to skepticism about scientific research is perhaps exemplified most clearly in the appropriations data. Research funding as a portion of the federal budget and as a percentage of gross domestic product (GDP) has fallen by more than 40% since 1970, with funding for the physical sciences and engineering cut by 50% during the same period. In recent years, even biomedical research funding, which has been popular with the public and politicians, has lost ground to inflation. Partly because of constraints on funding, federal agencies have increased pressure on the various research communities to defend funding levels and set priorities, particularly in fields such as particle and nuclear physics, astronomy, and atmospheric and ocean sciences that require expensive experimental facilities. And although rigorous application of quantitative evaluation metrics has not yet become a routine part of budget planning for federal R&D programs and their advisors in the research communities, change is on the way. Already, the Government Performance and Results Act (GPRA) of 1993 requires that all funding agencies develop strategic plans, set performance goals, define metrics, assess progress, and explain any failure to meet goals. GPRA does not require metrics for every research project that an agency funds, but it clearly has altered the landscape. For example, the National Science Foundation (NSF) revised its review criteria in 1997 to better reflect its GPRA strategic plan by including a second "broader impacts" criterion.

To be sure, research isn't the only aspect of the modern university's portfolio that is being questioned; academic institutions have also lately come under fire for emphasizing the quality of research over education. For example, the President's Council of Advisors on Science and Technology (PCAST) recently issued the report Engage to Excel: Producing One Million Additional College Graduates with Degrees in Science, Technology, Engineering, and Mathematics (STEM), which includes specific recommendations on how teaching quality at research universities can be bolstered. The report focuses on improving STEM teaching in the first two years of university study, with one objective being the retention of larger numbers of intended STEM majors. Historically, 40% of STEM majors change to non-STEM disciplines before graduation. It's difficult, however, to assess the efficacy of the report's recommendations without a standardized means of evaluation. Unless some mechanism is developed—through collaboration between the universities and federal agencies, not by imposing new regulations—to evaluate the effectiveness of innovative approaches to undergraduate education, it's unlikely that those approaches will be properly implemented.

Of course, there is an even clearer impetus for change: The current U.S. debt crisis, with federal deficits on the order of $1 trillion, promises to severely squeeze discretionary spending, especially non-defense budgets. Historically, federal research funding to universities has tended to rise and fall with overall domestic discretionary spending. President Obama and many members of Congress in both parties appreciate the special importance of investments in science and engineering research. However, they need ammunition in the form of compelling analysis that demonstrates the contribution that academic research makes to the well-being of Americans. Indeed, in this climate of tight purse strings, budgetary pressures, and anemic domestic growth, science policy is no longer exempt from data-driven accountability, and there is growing interest in identifying appropriate tools to measure and document the outcomes of investments in research universities; in short, a “science of science policy.”

A science of science policy

The perceived need for a science of science policy has not gone unaddressed. As we write, a multiagency, collaborative effort, spearheaded by NSF, is attempting to meticulously catalogue the socioeconomic effects of science R&D investments at the university level through a process called “STAR METRICS.” It is a step in the right direction and may herald the advent of powerful new tools to guide science policymaking on the federal level while demonstrating that U.S. research universities are upholding their end of the social compact. The American people are generally in favor of federal funding for academic research, but they still want to know what their tax dollars are buying, and carefully chosen metrics are a way to do that.

However, this progress in recognizing and measuring the societal impacts of federally funded research universities, although substantive, comes with a caveat. Finding metrics that accurately assess the true value of research universities—the qualitative contributions and long-range potential as well as the more easily measured impacts—is enormously difficult.

Industry leaders and many policymakers are understandably focused on the role of research in the nation's economic competitiveness and are thus tempted to emphasize short-term economic gains as a measure of research impact. But doing so frames this values debate in a manner that ignores the rich history of U.S. innovation and threatens to damage the nation's leadership in science and technology and its vital role in the innovation ecosystem. Although the scientific community may not lose such a battle outright—who wants to argue against nurturing the next Google, whose founders' pioneering algorithmic work at Stanford University was originally funded by the NSF?—the debate becomes, at the very least, an uphill battle. By intrinsically limiting the scope of the benefits that scientific research provides for the nation, we would fundamentally narrow the utility and significance of such debate and impose a handicap on research and universities where there should be none.

Broader impacts

Ultimately, the true societal value of the nation’s intellectual capital coming from scientific research and research universities cannot be monetized, packaged, and fit neatly into dollars and cents. Can biodiversity, clean oceans, and clearer skies really be expressed in terms of jobs or income? And what of revolutionary discoveries and advances in medicine and security? Shouldn’t we know more about the fundamental makeup of the universe and our place in it? Furthermore, even for discoveries and inventions that do have the promise of future commercial applications, it often takes decades to realize the results.
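The lag argument can be made concrete with a simple discounting sketch. The payoff size, delay, and discount rate below are hypothetical, chosen only to show why metrics with a short horizon barely register returns that arrive decades after the original discovery.

# Hypothetical numbers: a single $1 billion commercial payoff arriving 30 years
# after the underlying research, discounted at 7% per year, is worth only about
# $130 million today, and nothing at all to a metric that looks out five years.
def present_value(payoff, years, discount_rate):
    """Value today of a single payoff received `years` from now."""
    return payoff / (1.0 + discount_rate) ** years

print(f"${present_value(1_000_000_000, 30, 0.07) / 1e6:.0f} million")  # ~$131 million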

With respect to STEM education, there is a compelling case to be made for the broader advantages that research universities confer on the educational process. After all, in a world in which information of all kinds is readily and inexpensively accessible, process knowledge increasingly trumps content knowledge; facts, lectures, and textbook-based learning must necessarily be subordinate to hands-on, design-based experiences. As such, it is intuitive that the opportunities that universities provide to undergraduates for independent research—such as those funded by the NSF's Research Experiences for Undergraduates program—are a critical aspect of the educational process of becoming a truly 21st-century-ready scientist or engineer. It is difficult, however, to imagine this benefit accruing in the form of higher test scores or starting salaries.

And that’s only the lip of the test tube. The social compact inspired by Vannevar Bush’s vision and fueled by the investment of federal funds in higher education has delivered on its promise by creating a government-supported, universitydriven research system that has graduated generations of scientists, engineers, and other professionals who became the nation’s innovators and business leaders. The impact does not end at the U.S. border. Widely acknowledged as the best in the world, U.S. universities have educated countless non-native intellectuals, foreign officials, and even heads of state and their children. Given the sheer ubiquity of U.S. training among the international intelligentsia, one can only imagine the degree to which U.S. perspectives dictate world discourse, simply by virtue of cultural and intellectual diffusion. This “soft power” can lead to progress where standard diplomatic channels are blocked. During the darkest days of the Cold War, U.S. physicists and their Russian colleagues continued to collaborate in an effort to peel back the underlying mechanisms of the natural universe. This is what science diplomacy is all about. However, although the benefits seem clear, the impact is difficult to quantify.

Efforts of other nations

The challenge of evaluating the perhaps unquantifiable impacts of research universities need not prove intractable. Indeed, the United States is not alone in this endeavor, and this nation can learn from the experiences of other nations as U.S. researchers, working with the federal funding agencies, seek to develop research performance metrics most appropriate for this country.

For example, in 2008, Australia, one of the world’s leaders in the percentage of GDP it devotes to scientific research, replaced its quantitative metrics paradigm with a more qualitative “Research Quality Framework,” which includes panel assessments of “impact in the form of the social, economic, environmental and cultural returns of research beyond the academic peer community.” This national policy is based on the latest research in context-dependent metrics and places particular emphasis on the social implications and effects of university-led scientific research. It is noteworthy that the Australian approach does not jettison quantitative metrics entirely; rather, it attempts to deftly merge the subjective with the objective, with evaluations grounded firmly on the basis of expert opinion.

Australia is not alone. New Zealand and the Netherlands have developed and incorporated impact assessments of scientific research that extend beyond markets and academia. The United Kingdom has also taken steps to include broader impact evaluations and qualitative metrics for research universities alongside traditional quantitative measures with its Research Assessment Exercise and Research Excellence Framework, although not without controversy. Sweden, France, and Singapore have begun devising hybrid science policy measurement schemes of their own as well.

These examples make it clear that there is a large pool of best practices (or at least experiences) from around the world to draw on, improve on, and adapt to U.S. needs. Combining these ideas with current pilot initiatives, such as STAR METRICS and the public value-mapping and sociotechnical integration research being carried out at the Consortium for Science, Policy, and Outcomes, would allow for a practical and relatively seamless transition to this framework while leveraging federal resources to permit scalability.

A way forward

Many U.S. academic researchers are questioning whether current efforts to define research evaluation metrics are likely to be fruitful and are leery of getting involved. Academic researchers understandably worry that the likely result of efforts to define research metrics and evaluation mechanisms will be an increased emphasis on “directed research” and a consequent loss of freedom to explore fundamental aspects of nature. This is a legitimate concern because the trends in recent decades have been worrisome. But the issue of research assessment is not likely to go away. U.S. researchers, like their colleagues in other parts of the world, need to work with the funding agencies to do two things: ensure that fundamental basic research remains high on the list of priorities and help develop the most appropriate metrics and mechanisms to evaluate research effects, both those that are quantifiable and, arguably more important, those that are not. The need to act is becoming more urgent in the face of a worldwide budgetary crisis that is likely to be around for some time.

One area that deserves particular attention is the impact of research on the quality of university education, undergraduate as well as graduate. The need for such measurements is motivated by the aforementioned White House report that may come to represent a tipping point for STEM education at research universities. This may be an ideal opportunity for universities and agencies to work together by launching experiments with different approaches to educational evaluation in parallel with research-impact assessments.

Whatever evaluative processes the research communities and federal agencies select, market-based metrics must not be the primary consideration. The American people will be best served by metrics that capture broad social contributions of research universities in a holistic, contextual manner. Moreover, U.S. leadership in science, engineering, and technology will be best served by metrics that are clear, sensible, and attentive to the long-term value that can result from breakthrough research. Although research metrics will vary across federal agencies, according to their respective roles and missions, there are some fundamentals that define the quality of research and that should be reflected in standard research metrics that apply across government; indeed, that are universal.

Unless the U.S. research community engages in the process of determining appropriate quantitative and qualitative metrics as well as assessment mechanisms that are based on expert opinion, rather than columns of numbers, the nation could end up saddled with a system that suppresses innovation and drives the best minds out of science and engineering or out of the country. The American people would be the ultimate losers.

To wit, it was Einstein himself who said “Everything that can be counted does not necessarily count; everything that counts cannot necessarily be counted.”


Rahul Rekhi is a senior in the Departments of Bioengineering and Economics at Rice University, as well as at the James A. Baker III Institute for Public Policy. Neal Lane is the Malcolm Gillis University Professor and Senior Fellow in Science and Technology Policy at the James A. Baker III Institute for Public Policy at Rice University.

Forum – Summer 2012

National education standards

Martin West’s “Global Lessons for Improving U.S. Education” ( Issues, Spring 2012) reaffirms the importance of developing and implementing rigorous standards and transparent data-driven accountability in our schools.

The extent of students’ capabilities carries an immense economic and social impact for America’s families, communities, and states. As the United States continues to rank below the world’s high-performing nations in core subjects, we must transform our education system to ensure that every student gains the power of knowledge.

We cannot fund our way to the top. West points out that the United States achieves second-rate results while spending significantly more per student than higher-performing countries. Our nation has the resources, financial and human, to host the greatest education system in the world. And that is what our students deserve: an education system designed to equip every child to achieve their God-given potential.

As West notes, education reform should improve the quality of education available rather than merely increase the quantity that students consume. To accomplish this, complete transformation is needed.

We need higher standards. West reports that by age 15, the average U.S. student is a year behind the average student in six countries in math. Recognizing the need for change, state leaders developed new benchmarks focused on preparing today’s students to thrive in the 21st-century workforce. Thanks to those efforts, most states have adopted the Common Core State Standards, and beginning in the 2014–2015 school year, math skills and concepts will be more focused across the nation. The first step, adopting the new standards, has largely been accomplished. The second step, implementing these standards and preparing teachers and school infrastructure, is critical.

We must hold schools accountable for the learning of each and every student. Transparent data-driven accountability that recognizes both progress and performance is necessary. This provides leaders, communities, and parents with a clear understanding of their school’s true state of education. Without assessments to compare students’ achievements against the standard of knowledge and skills they need to be successful, teachers, school leaders, and parents are ill-equipped to help each child learn.

We must also hold all students to high expectations and end the damaging practice of social promotion. Too many students leave high school only to face college remediation. Worse yet, many students never graduate because they become bored or fall behind.

A state-led movement to fundamentally shift education policy toward student learning and cognitive development is sweeping the nation. In the past two years, nearly half the states have adopted reforms to improve student achievement and transform education for their students. This is exciting, but we must remember that successful education reform is a process, not an event.

Only after lawmakers, communities, and educators recognize the urgency of the matter and commit to a long-term investment in our students' future will our states unleash an unprecedented flood of human potential.

JEB BUSH

The author was governor of Florida from 1999 to 2007 and chairs the Foundation for Excellence in Education.


Martin West builds a careful case that our nation’s productivity is closely tied to our educational outcomes. Although it is true, as West notes, that global academic competition is not a zero-sum game, the failure of our education system is profound. Even in our usual area of competitive advantage, higher education, we face rapid growth from competing nations. In K-12 schools, moreover, the situation is increasingly bleak.

West flags some consequences, including increased gross domestic product, that are achievable with better education outcomes. I think we can state the risks even more bluntly, and here I add at least two. The Council on Foreign Relations Independent Task Force on Education, which I co-chaired with Condoleezza Rice, identified a critical national security risk produced by the inability to recruit sufficiently educated troops. More broadly, as students drop out of high school in droves and cannot earn a living wage, we begin to experience a threat to U.S. unity and cohesion, and consequently an erosion of our social fabric.

These conditions require immediate and drastic action. We can surely learn from international policy, but there are some steps we must take to address our unique system woes. Our Council report identified three paths. First, common core standards, to be voluntarily adopted by the states, must raise the game and include subjects critical to national security. Currently, most states set the bar too low, and all too often a high-school diploma is not worth the paper on which it is printed. Second, we must see an expansion of choice in public education, allowing all parents to choose their child’s school regardless of income level, leveraging the uniquely American system of competition and innovation. Finally, we need a national security audit to hold policymakers accountable and to deepen public knowledge of the risks to our security apparatus that exist without high-quality education. Although we can continue to document our nation’s dismal education performance on standardized tests, until this issue becomes a national priority of great urgency our students will continue to fall behind their global peers.

JOEL KLEIN

News Corporation

New York, New York


Martin West has written a very insightful piece about comparisons of U.S. student performance with that of students in other leading countries and the relationship of student achievement to countries' economic competitiveness. I have little quarrel with his analysis as far as it goes. His concluding policy lessons are important, although I would substitute a discussion of school choice in the United States for his "private-school competition" notion. U.S. private schools operate very differently from those in other nations, and as he notes, the latter are more like U.S. charter schools.

My problem with the West piece is that he never mentions three major differences between high-performing countries and the United States: the uniquely decentralized U.S. system of K-12 education governance; the grossly inequitable ways that schools are funded; and the lack of support for social services, especially preschool education and health care, which children in most leading countries receive routinely.

No other advanced country is as decentralized in governing education as the United States, with its system of local, state, and federal funding and responsibilities organized among 15,000 school districts with mostly elected school boards, 5,000 charter schools, and state and federal regulators. Many of the high-performing nations were not always so successful educationally or economically. Yet these countries broke with their pasts, embraced the national redesign of their now centrally operated school systems, and instituted fairer funding schemes. And they did this relatively quickly, over two or three decades. Radical change in the design of public education has never taken place in the United States.

With regard to school finance, in the United States, schools with large concentrations of low-income students usually receive less funding than other schools with fewer low-income students in their districts. And districts with low property wealth in most states have fewer resources unless a state chooses to pick up a significant share of education costs. The federal government pays for at best 10% of education costs and barely makes a dent in differences in education spending among states even when adjusted for cost differences. Other advanced countries fund their schools more centrally and allocate funds to schools based on student needs. As Andreas Schleicher of the Organisation for Economic Co-operation and Development has pointed out, "It is noteworthy that spending patterns in many of the world's successful education systems are markedly different from the U.S. These countries invest the money where the challenges are greatest rather than making resources contingent on the economic context of the local communities in which schools are located, and they put in place incentives and support systems that attract the most talented school teachers to the most difficult classrooms." (testimony March 9, 2010, before the Senate Health, Education, Labor, and Pensions Committee; http://help.senate.gov/imo/media/doc/Schleicher.pdf).

Of course, monetary investments alone, in the United States and abroad, don't track neatly with school achievement scores. As an advocate for fiscal equity, I'm constantly challenged about high spending in several low-achieving districts; for example, Newark, New Jersey. However, no other successful country provides less funding for low-income or primary-language learners than for more affluent students and native-language speakers. But the United States does. What we need is a state-based system of school funding based on student need, with an accompanying accountability system for results and productivity.

Finally, most advanced countries make sure all young children have access to high-quality early childhood programs. And they guarantee necessary health care. In the United States, only 66% of low-income four-year-olds attend preschool, whereas 90% of more affluent students do.

CYNTHIA G. (CINDY) BROWN

Vice President for Education Policy

Center for American Progress

Washington, DC


An incomplete view of adolescence

On a recent trip to the Pocono Mountains in northeastern Pennsylvania, I found myself perusing a book at our small bed-and-breakfast about the lives of Pennsylvania coal miners in the 19th century. On the cover was an old daguerreotype of a formally dressed and solemn-looking pair, decked out with flowers and holding hands. The caption revealed that the couple had just been married. The bridegroom was 16 years old and the bride barely 14.

Laurence Steinberg's "Should the Science of Adolescent Brain Development Inform Public Policy?" (Issues, Spring 2012) put me in mind of this photo. It should not be forgotten that in our country today, the study of adolescent behavior is almost exclusively the study of the behavior of our own adolescents. Although psychologists routinely attend to the spectrum of conduct across prevailing social conditions, they are much less sensitive to the social realities of the past. The reasons for this are not hard to fathom. The teens who lived in prior centuries are dead. Their parents, mentors, and teachers are too. None can be interviewed and their brains cannot be scanned. At most we have historical or literary records of their thoughts and doings. Almost always this qualifies as anecdote, not data.

Nonetheless, we have reason to believe that the past was a radically different place for the young. As the picture of the solemn newlyweds suggests, teens in previous centuries were often expected to and in fact did accede to weighty responsibilities at a relatively early age. Courtiers, military men, and future leaders emerged in their midteens and engaged in practical endeavors we now associate with adulthood. Males routinely took on work, community, and political roles and were charged with family support. Girls married early and were immediately treated as women. For most of human history, teen moms have been the norm rather than the exception. By and large, these young men and women had no choice. The signature institutions of today’s prolonged adolescence, and one might say prolonged immaturity, simply did not exist. For the vast majority, book learning and formal education were simply unavailable. In this radically different cultural and institutional setting, teens were expected to behave with far greater maturity and to take on adult roles. In response to those expectations, many (indeed almost all) actually did.

Why does this matter? Scientists who study human development should not lose sight of the phenomenon of range restriction; that is, the failure to canvass all the possible environmental and cultural influences that bear on human behavior. Range restriction is, of course, a notorious source of distortion in our understanding of human nature. But that distortion is not just cross-sectional; it is longitudinal as well. Because psychosocial research today virtually ignores history, its picture of development is necessarily incomplete. In noting that teens tend to be more impulsive, shortsighted, risk-seeking, and self-centered, Steinberg nods to past differences but does not give them nearly enough weight. By assigning few serious responsibilities to teens and demanding relatively little of them, we permit certain tendencies to emerge and even encourage them to thrive. The behaviors we currently associate with adolescence are, in effect, a historical luxury, enabled by an affluent, individualistic, self-expressive, and permissive culture that represents a restricted range of possibilities. Psychologists and brain scientists should never forget that the past is a foreign country. The adolescence we observe, and that we are now actively seeking to correlate with neuroscientific findings, is not the way it has to be.

AMY L. WAX

Robert Mundheim Professor of Law

University of Pennsylvania Law School

Philadelphia, Pennsylvania


California: Radical carbon cuts needed?

In “The 80% Solution: Radical Carbon Emission Cuts for California” (Issues, Spring 2012), Jane C. S. Long and Jeffery Greenblatt summarize an admirable analysis by the California Council on Science and Technology (CCST) on the technical feasibility of California’s bold target to cut greenhouse gases to 80% below 1990 levels by 2050. The full report merits careful study, particularly the spreadsheet that lays out the analysis and lets you work through alternative assumptions.

The study’s most important insight is one that is easy to overlook or misinterpret: the demonstration that it is feasible to cut California emissions by 60% using only available and near-available technologies. Granted this falls short of the 80% target, but it is still an enormous reduction, particularly considering projected population and economic growth through 2050. The result is a powerful demonstration of just how much can be achieved with technologies that are already commercial or nearly so. More distant advances promise even larger reductions, of course, and merit serious pursuit. Indeed, the authors find that further advances are needed for the last 20% of the target. But this result forcefully rebuts the popular delusion that we can limit climate change risks by ignoring near-term technologies and the incentives and institutional changes needed to deploy them, relying instead on research that chases big, distant, breakthrough innovations.

The study’s projected gains are probably conservative because its mandate was limited to technical feasibility. This required the participants to finesse several key issues of economics and policy related to the cost of technologies. In separating near-available from more-distant technologies, they used only a rough and implicit cost threshold, excluding options judged likely to remain “excessively costly” through 2050. But of the technologies they judged feasible, many will cost more than the high-emitting alternatives they supplant. Achieving the rapid deployment on which their results depend will thus require policy-generated incentives that favor low- and nonemitting technologies over conventional high-emitting ones, including economy-wide measures (for example, emission taxes or tradable emission-permit systems) that put a price on emissions.

Such policies have two effects. On the supply side, they give investors incentives to develop and deploy emission-cutting technologies. At the same time, they affect demand by raising the price of fossil energy and the products using it. Of these two effects, the first figures in the CCST analysis, but only implicitly. Its actual effect on emissions could be more or less than their “feasible” reduction, depending on whether the incentives enacted lie above or below the rough cost cutoff marking their boundary of feasible technologies. The second, demand-side effect will also reduce emissions, as consumers and other economic actors adjust their decisions to the higher cost of emission-intensive energy. But the CCST feasibility analysis did not consider such behavioral adjustments, because they were outside their mandate. These thus represent an additional consequence of the policy-generated incentives needed to motivate the innovation and technology deployment on which the CCST analysis focuses. Although these pose significant social, economic, and political challenges, mainly related to their distributive effects, they represent an additional environmental benefit of well-designed policy. They also provide much of the answer to the question the authors appropriately pose, of how to avoid losing the gains from technological progress through behavioral rebounds.

EDWARD A. PARSON

Joseph L. Sax Collegiate Professor of Law

Professor of Natural Resources and Environment

University of Michigan

Ann Arbor, Michigan


Jane C. S. Long and Jeffery Greenblatt observe that California’s plan for radical decarbonization of its economy is unachievable, even with massive social and technological reengineering. I suggest that the decarbonization agenda is also unwise, illiberal, and regressive.

First, let’s be clear: There is very little environmental benefit to be had in California’s pursuit of decarbonization. California produces about 1% of global greenhouse gas emissions. Even if California cut its emissions to zero, there would be no significant impact on global average temperatures.

But there will most certainly be costs. As I document in The Myth of Green Energy Jobs: The European Experience, we know what happens when governments pursue policies that raise the costs of energy and compromise competitive energy markets: Net jobs are lost, economies contract, industry flees, individuals are pushed into energy poverty, and corruption blooms. The effects of such policies fall most heavily on society’s most vulnerable populations: the elderly, the poor, and the infirm.

Californians should know this. As demographer Joel Kotkin has documented, California is in sharp decline, economically and socially, partly if not largely because of its pursuit of draconian environmental regulations. Pursuing green-at-all-cost policies has produced a highly regressive two-tiered economy that increasingly consists of the ultrawealthy and a poorly paid serving class. This is exactly the opposite of what California’s “progressive” politics promised.

Yes, California has a beautiful environment that must be protected, but it must be protected pragmatically in a way that balances human values with environmental values; in a way that is compatible with a flourishing society of people who are empowered socially, politically, and economically. Chasing after radical decarbonization is not a part of that picture.

KENNETH P. GREEN

Resident Scholar

American Enterprise Institute

Washington, DC


Better STEM for all

Robert D. Atkinson’s proposals for a new science, technology, engineering, and mathematics (STEM) education reform strategy, if enacted, would shortchange students and imperil our nation’s economic recovery (“Why the Current Education Reform Strategy Won’t Work,” Issues, Spring 2012). His arguments reflect a selective and myopic reading of the data and seem to completely ignore significant developments in research on student learning and the economy. Atkinson suggests that we move from some STEM for all to all STEM for some. As the executive director of Project Kaleidoscope, a national organization that has worked on improving STEM education in colleges and universities for more than two decades, I recommend that readers consider a broader set of research studies that are guiding the STEM reform movement. This economic and educational research makes clear that we need more widespread adoption of engaging teaching and learning strategies so that all students will graduate with a better understanding of our science- and technology-driven world. These strategies have been shown to motivate students to stay in STEM and help them achieve higher levels of proficiency, particularly students who are traditionally underrepresented in higher education and especially in STEM fields.

Educational, policy, and business leaders from many sectors have been calling for years for a multifaceted reform strategy to improve STEM learning outcomes for all students and, at the same time, increase the numbers of students graduating with STEM degrees. It isn’t an either/or proposition. Atkinson focuses exclusively on what he calls STEM jobs. He seems to have missed, however, the findings of the recent study from the Georgetown University Center on Education and the Workforce. That study’s authors suggest that we do face a shortage of workers with high-level STEM degrees, but as they put it, “the deeper problem is a broader scarcity of workers with basic STEM competencies across the entire economy.” In fact, they note, “the demand for workers in STEM-related occupations is increasing at every education level. The STEM supply problem goes beyond the need for more professional scientists, engineers, and mathematicians. We also need more qualified technicians and skilled STEM workers in Advanced Manufacturing, Utilities and Transportation, Mining, and other technology-driven industries.”

Atkinson ignores this and other data that suggest that employers want greater focus on STEM learning for aspiring STEM students and others. In a national survey of employers commissioned by the Association of American Colleges and Universities in 2009, 70% of business leaders said they want colleges and universities to place more emphasis on science and technology for all college students. Moreover, STEM learning is essential for increasing our nation’s civic capacity. Atkinson handily ignores the ways in which scientific and quantitative literacy are essential for responsible citizenship in a world as steeped in technology and scientific innovation as our own.

Atkinson is also mistaken about what it takes to become a scientist or engineer. He notes that “being a scientist or engineer requires above-average intelligence.” Even if we accept this premise, there is abundant evidence that many students with high levels of intelligence lack access to programs that support effective STEM learning. The research on learning has shown that students from a variety of backgrounds are capable of excelling when given the opportunity. Our best future doctors and engineers may be at the poorest schools in our nation, and without an approach of some STEM for all, they may never emerge to cure cancer or create better technological solutions to pressing national and global challenges.

Atkinson’s proposal to focus resources on more STEM for fewer students would perpetuate the current state of exclusivity and favor those who already have access to strong preparation. We agree with his assertion that “society currently does a poor job in high school and college of helping those students get all the way to a STEM degree.” But we need to also draw in those students who may have an aptitude for STEM but are not already in the pipeline. This means we must focus on students from minority groups traditionally underserved by the K-12 school systems and by colleges and universities.

SUSAN ELROD

Executive Director

Project Kaleidoscope

Association of American Colleges & Universities

Washington, DC


Robert D. Atkinson’s essay is timely and important. There is a broad consensus that the economic success of the United States depends on effective education in STEM. The consensus extends to top business leaders, as shown by the 2005 Business Roundtable report Tapping America’s Potential: The Education for Innovation Initiative, and to national politicians and university presidents, as shown by the 2005 Council on Competitiveness report Innovate America. The National Science Foundation, one of the largest sources of research funding for science and engineering research, considers STEM education to be a core part of its mission.

Atkinson’s article ranges over many topics, but his main point is that we should devote most of our STEM education resources to a small group of talented students who have demonstrated the potential and motivation to excel at STEM. And although he is not explicit about this, he also seems to believe that we should stop teaching STEM to the majority of students who do not have such aptitude or interest, or at least that doing so is not a compelling national interest.

Atkinson’s article is well reasoned and grounded in solid data, but I would modify his argument somewhat. I agree that the main reason talented college students leave STEM majors is the way those disciplines are taught: in large lecture classes focused on memorization of material to prepare for midterm and final exams. However, I worry that it may be too simplistic to attribute this to poor teaching, as Atkinson does. Rather, the problem is that college STEM teaching is often not based on findings from the learning sciences. From learning-sciences research, we know that deeper learning results from active, project-based learning, as students engage in solving real-world problems in teams with other learners. Atkinson mentions the Olin College of Engineering, although he doesn’t note that it is one of the leaders in transforming curricula away from lectures and toward these learning-sciences-based models. Better teachers cannot fix the broken model of lecture-based instruction; we need a radical transformation toward research-based teaching.

I agree with Atkinson that one goal of national education policy should be to provide the fuel needed to power a science- and technology-driven U.S. economy, and to achieve this goal, one could reasonably argue for Atkinson’s all STEM for some framework. Atkinson points out that only 5% of jobs are STEM jobs, and that more than 70% of these jobs are in computer science and engineering. But I would also argue for a second goal of national educational policy: to prepare an educated citizenry to vote intelligently on STEM issues. On average, U.S. citizens are woefully unprepared to understand issues such as global climate change, evolution by natural selection, fluoridation of drinking water, and immunization of children. A powerful nation needs both some STEM for all and all STEM for some.

KEITH SAWYER

Associate Professor

Department of Education

Washington University

St. Louis, Missouri


I read Robert Atkinson’s article with great interest. His catchy contrast of “STEM for all” with “all STEM for some” provokes the reader and challenges some of our most popular beliefs about what the nation needs to do to increase STEM innovation.

We need both engineers and professional piano players, but by relying on the strained analogy of music education and STEM education, Atkinson risks losing his audience. The valuable core of his argument may be dismissed by people who hold and intensely advance many of the ideas he identifies as myths: In a globalized society, all students need to be job-ready in STEM, K-12 STEM teachers need to be paid more, and students need to be convinced STEM is “cool.” Because space does not permit me to challenge or applaud Atkinson’s arguments fully, I have selected three points for commentary.


First, innovations that lead to large-scale improvements in our standard of living grow out of the STEM disciplines. In this regard, Atkinson is quite right. As a nation, we have not allocated sufficient resources to the development of high-level STEM talents, which are the source of these desirable innovations. Although U.S. graduate schools with their labs and research teams are the best in the world at cultivating STEM talent among students who have the interests and the abilities to profit from them, it is a long wait and a long pipeline to that experience. Atkinson is understandably impatient.

I agree with Atkinson’s argument that we have neglected the students who have the interest and ability to become innovative scientists, engineers, mathematicians, and inventors. Our federal investment in them, the tiny Jacob K. Javits Gifted and Talented Students Program, was zeroed out in the latest federal budget. Allocating resources to talented children and adolescents has always been a tough policy sell. With the exception of the post-Sputnik panic, the nation has not directed sufficient funds to the development of STEM talent during the precollegiate years. Atkinson’s proposal that more specialized STEM high schools be established is one useful strategy and appears to have garnered current political support at the federal and state levels. We do have specialized STEM high schools now, and we appear to be on the road to establishing more of them. While we are at it, we should fund them more generously, too. Now, however, how are we to encourage students who wish to take on a challenging STEM course of study at the high school or undergraduate collegiate levels? STEM innovators do not spring fully formed from Zeus’s forehead, ready at age four to major in physics.

Second, I therefore disagree with Atkinson’s apparent assumption that STEM education is not necessary for all children and adolescents. Here’s why. Atkinson is quite correct when he identifies interest, ability, and motivation to become STEM professionals as important psychological variables. We know through the study of talent development in many disciplines (including the STEM disciplines of science and mathematics in particular) and through intervention studies in schools that children’s interests appear early and develop through exposure to opportunity. Cultivation of STEM talent at the elementary grade levels and subsequently through the full K-12 schooling experience is the nursery out of which STEM innovators grow. Not to devote resources to teacher preparation in STEM at the primary and elementary grade levels is shortsighted. Not to devote resources to teacher preparation for the identification of STEM interests and talents in young children is shortsighted. In general, our educational research community and our educational professionals are abundantly shortsighted about talent development; we do not need more of the perspective that bright and motivated youngsters can make it on their own through benign policy neglect or minimal educational services. I applaud Atkinson’s willingness to place the uncomfortable elephants directly in the center of the room. He simply needs to allow them to be in the room earlier in the K-12 educational trajectory.

Third, Atkinson relies on economic arguments to advance his perspective. According to his argument, we cannot afford to take the STEM for all approach, so we must retrench and take the all STEM for some approach. Sadly, this false resource allocation dichotomy, too, is shortsighted. Perhaps a clarification of our national goals for STEM might assist us in disentangling the economic arguments. We must have twin goals for STEM. Other nations do, and it appears to be working for them. Surely we can follow suit. The twin goals are, indeed, STEM innovation and STEM literacy.

A recent report from the National Science Board, Preparing the Next Generation of STEM Innovators: Identifying and Developing Our Nation’s Human Capital, put forth convincing arguments buttressed by research, along with policy recommendations. It recommended developing greater numbers of specialized STEM programs with research support, as well as more captivating and challenging science experiences for young children, guided by confident teachers. These actions provide the nurseries out of which the next generation of STEM innovators grows.

In addition to high-level STEM innovators, our nation does need a STEM-literate citizenry. We have citizens who fear fluoride in the water, are duped by the misleading use of statistics in political debates, and fail to support the acquisition of tax revenue to replace bridges ready to collapse from age. STEM literacy is necessary if our nation wishes to engage in enlightened political discourse, improve our standard of living, and be willing to vote for the enlightened tax policies that make economic investment in education—the kind of education that fosters STEM talent development—possible. In the same way that our society is the richer for developing a citizenry that appreciates good music whether or not each person develops into a professional jazz musician, our society is the richer for cultivating the seedbed of STEM literacy. The foundation of STEM literacy sets our future STEM innovators on the pathway to talent development and advanced achievement in and passion for “doing” science, technology, engineering, and mathematics.

ANN ROBINSON

Professor and Director

Jodie Mahony Center

University of Arkansas at Little Rock

Past President, National Association for Gifted Children


The article by Robert D. Atkinson should be read together with the one by Brian Bosworth on “Expanding Certificate Programs” in the Fall 2011 Issues. They both tell the same story: Instead of a one-size-fits-all education path from kindergarten through college, we need to identify at an early stage each child’s inclination and provide an appropriate education path toward his/her goal. For those inclined toward math and science, provide academic challenges to stimulate their interest. Children with manual skills may pursue a certificate program in a selected trade. Others with an inclination toward the arts could be rewarded by appropriate stimuli. All should receive a basic education, including effective communication, but the curriculum should be adjusted to provide stimuli in the subjects that each individual finds most productive and rewarding.

This approach presents two major challenges: identifying at an early age the inclinations and interests of each child, and creating the required diversity in curriculum, especially in high school. I believe that child development research provides the capability for the former and that the diversity already exists in the education system; it only needs to be applied differently to different students.

The positive effects of such an approach are obvious: Everyone learns faster when the subject matter is interesting and they recognize its potential usefulness. The potential negative effect is that the classification of individual inclinations results in value grading. Children are particularly prone to “I’m better than you, because …” Parents and educators must work to counter this tendency. We must teach that people of various skills are different, not better or worse. We need carpenters as well as mathematicians; after a tornado, we need carpenters more than mathematicians.

This approach should start in elementary school, although the impact would be greatest in high school. The initial emphasis should be on illustrating the type of activities associated with each subject. Invited presentations by practitioners in many fields could be useful, with emphasis on “What do you do during a typical day?” Later, the subject progresses to developing the necessary skills. I estimate that approximately half the high-school curriculum would be for basic subjects required for all students, with the rest focused on the chosen career. A change in course after the student develops a better understanding of the options should be facilitated.

These two articles and others emphasize the need to improve our education system. In my opinion, the key is to get each child hooked on some subjects early in their development and then provide continuing stimulation, so that school is a rewarding experience. It’s difficult to overcome in high school and college the effect of years of poor achievement for those who are not academically inclined, or of boredom for those who have never been challenged.

VICTOR VAN LINT

Consultant

La Jolla, California


Praise for SEMATECH

Conventional economic theory holds that nations should concentrate on their natural and comparative advantages. But nations are also capable of creating advantage where none otherwise exists, by nurturing, supporting, and protecting particular industries. SEMATECH, a collaboration of U.S. semiconductor companies and the U.S. Department of Defense (DOD), has rightly been credited with generating significant national advantage.

Richard Van Atta and Marko M. G. Slusarczuk recount the history of SEMATECH in restoring the preeminence of the U.S. semiconductor industry in the 1980s, and with some notable exceptions, get the story right (“The Tunnel at the End of the Lights,” Issues, Spring 2012). SEMATECH enabled the U.S. semiconductor industry to overcome a major manufacturing deficiency relative to the Japanese industry, something that no company or government could have done on its own.

In short order, the U.S. industry regained technological leadership and was back to more than 50% of world market share. The DOD achieved savings and a stronger industrial base for advanced microelectronics. The nation benefited from the high growth and productivity of a strong, strategic industry.

The criticisms of SEMATECH referred to by the authors are misplaced. Rather than promoting one semiconductor device technology over another, the mission of SEMATECH was to strengthen the semiconductor manufacturing infrastructure generally. The critical rationale for DOD participation was to be able to rely on the manufacturing capability of a commercial semiconductor industry rather than the abysmally low yields that prime defense contractors were achieving in their manufacture of semiconductors for defense systems.

Any project as broad and challenging as SEMATECH will not satisfy all of the wide-ranging objectives of individual participants. SEMATECH sought to meet a specific and common need and, in combination with other measures, which included action against Japanese unfair trade practices, joint support of advanced university research through the Semiconductor Research Corporation, and innovative product development by individual companies, created a valuable national advantage.

The authors are correct that the SEMATECH model can be only a partial answer to sustaining U.S. technology leadership. Although they raise important questions, they stop short of drawing any lessons from the SEMATECH case. That is unfortunate, because several important principles emerge from the SEMATECH experience:

  • By joining with industry rather than merely funding industry, government gains leverage, scale, and broader talent to attack a particular challenge.
  • With a narrow focus derived from a government mission rather than an intervention based on broad industrial policy, the government can harness the self-interest of industry and make its own substantive contributions to collaboration.
  • Unlike in procurement, where it is in the driver’s seat, the government must be unusually flexible and able to make tradeoffs when collaborating with industry.
  • To obtain the greatest advantage for a nation, a collaborative initiative needs to be part of broader government efforts to attract and incentivize industry through inducements such as improved education, tax, and regulatory policies.

Collaboration does not come naturally to government, and to succeed it must be carefully structured to meet a specific challenge. Because it can be an effective avenue to meet legitimate government needs and create advantage for the nation, government/industry collaboration, consistent with the SEMATECH principles, deserves much wider consideration.

W. CLARK MCFADDEN II

Senior Counsel

Orrick, Herrington & Sutcliffe

Washington, DC

The author is counsel to the Semiconductor Industry Association.


Lessons from nuclear disasters

I write this comment two weeks after my first trip to Chernobyl, where I had the opportunity to visit the nuclear power plant facility and to speak with villagers whose lives were permanently touched by the disaster 26 years ago. From the perspective of the plant’s management, catastrophe has become a success, given that construction is under way for a new confinement shelter for Reactor No. 4, and a new onsite storage facility for the 20,000 used fuel rods is planned. Management considers itself expert in managing the risks of nuclear reactor accidents, and it has hosted delegations from Fukushima. My “lesson learned” is that the enormity of the consequences of a reactor accident is unacceptable, even if the immediate number of deaths is small and the future burden of excess cancer is estimated to be so small that it will not be detected.

The Fukushima disaster is another opportunity to learn, adding to Chernobyl and Three Mile Island, and also to earlier incidents involving the Windscale reactor in the United Kingdom and the Kyshtym accident in the Urals. In “Learning from Fukushima” (Issues, Spring 2012), Sebastian M. Pfotenhauer and colleagues find three general lessons in their analysis of the Fukushima disaster: the interplay of politics and technology, the reach of the disaster beyond Japan, and the limitations of models. They are correct in noting the critical relevance of context; postmortems on each disaster have pointed to diverse contributing factors beyond engineering and operations. There is, however, a unifying overconfidence in engineering and risk predictions.

Pfotenhauer and colleagues seem to have maintained their belief in using models, if only better models can be developed that capture the broader system more fully and incorporate enhanced inputs. They echo Jasanoff’s previous call for humility, but not loudly enough. We should step back from the details and ask whether current models have proved useful at all and whether they can be refined to provide predictions that can be trusted. Regardless of probabilities, scenarios, and uncertainties, real events have happened.

They also argue that “understanding of the interconnections between society and technology” is requisite for the sustainability of nuclear power. The argument seems correct, but there is no single, unified society involved with nuclear power; rather, there are the diverse societies of the 31 nations with nuclear power plants. We see this diversity in the responses of different countries to Fukushima, with some phasing out nuclear power and others bringing new plants online. The call for enhanced global governance is one potential solution to this diversity.

The Fukushima disaster has brought attention to reactor accidents, but there are other points of risk in the nuclear fuel cycle. The health and environmental hazards of uranium mining and milling are overlooked, although thousands of lung cancer deaths have occurred among uranium miners exposed to high levels of radon. Among the most recent cohort of U.S. uranium miners, those working in New Mexico from the 1950s through the 1980s, we found a fourfold increase in lung cancer mortality as compared with that in the general population. At the end of the cycle, the problem of waste remains unsolved. Stored fuel rods were a problem at Fukushima. At Chernobyl, management expects a technological solution for disposal within 50 years, and in the United States, high-level waste disposal remains unsolved. We are making an unacceptable intergenerational transfer of waste.

Pfotenhauer and colleagues end by addressing the sustainability of nuclear power. Beyond greater “understanding of the interconnections between society and technology,” society needs processes that will allow the engagement of its many stakeholders. After all, their lives and the futures of their children are at stake.

JONATHAN M. SAMET

Professor and Flora L. Thornton Chair

Department of Preventive Medicine

Keck School of Medicine

Director, Institute for Global Health

University of Southern California

Los Angeles, California


The history of nuclear power in Japan is now definitely history, as all reactors have been shut down and will have difficulty getting approval to restart. After Nagasaki and Hiroshima, the Japanese public understandably came to fear radiation (in contrast with Germany). I cannot comment on the history of Japanese nuclear power, but I do have several comments on other parts of the article by Sebastian M. Pfotenhauer and colleagues.

The primary cause of the enormous damage and loss of life at the Fukushima event was the tsunami, which the article glides over. It states that “One common refrain has been that Fukushima happened because politics interfered with technology,” but a review of several reports finds no such “common refrain.” These include: Japanese Earthquake and Tsunami: Implications for the UK Nuclear Industry (Final Report, HM Chief Inspector of Nuclear Installations, September 2011); Fukushima Daiichi: ANS Committee Report (report by the American Nuclear Society Special Committee on Fukushima, March 2012); Fukushima Conclusions & Lessons (International Atomic Energy Agency International Fact Finding Expert Mission); and Executive Summary of the Interim Report ([Japanese] Investigation Committee on the Accident at Fukushima Nuclear Power Stations of Tokyo Electric Power Company, December 2011).

The article states that “Inadequate risk assessment models have been identified as another main culprit.” I agree with the authors that “models provide useful but incomplete guidance,” particularly if the guidance is not followed. The tsunami did produce an enormous wave, but the models had predicted that. One nearby utility had followed that advice, and its plants were safe (see Safe Shutdown of Onagawa Nuclear Power Station, The Closest BWRs to the 3/11/11 Epicenter, Isao Kato, presentation at MIT, March 2012).

Finally, the suggestion to make the International Atomic Energy Agency (IAEA) an international licensing agency suggests the authors’ lack of knowledge about the agency, its performance, and its status.

I agree that transparency in decisionmaking and involving the public are wise approaches, if cultural differences are recognized. Both approaches have been recommended for use in the United States by several committees of the National Academies [see Improving Risk Communication (National Academies Press, 1989) and Understanding Risk: Informing Decisions in a Democratic Society (National Academies Press, 1996) (one of the authors of this article was a member of the committee that produced this report)].

I believe that the major lessons from the Fukushima disaster are those noted by the IAEA and American Nuclear Society reviews: the necessity to develop coordinated government/industry response procedures and the willingness to address the severe accident analyses, by either spending the necessary funds or closing the plants.

JOHN F. AHEARNE

Chapel Hill, North Carolina


Internet freedom: Not a foreign-policy issue

On January 21, 2010, Secretary of State Hillary Clinton turned “Internet freedom” into a rallying cry for U.S. foreign policy. Two years later, she remains firmly convinced of the importance of this cause, as evidenced in “Internet Freedom and Human Rights” (Issues, Spring 2012).

Unfortunately, a foreign-policy emphasis on Internet freedom is ill-advised for a simple if counterintuitive reason: The Internet is never the primary cause of critical changes in governance or human rights.

This statement goes against the dominant punditry regarding the supposed social-media revolutions of the Middle East, so it’s worth reconsidering the Arab Spring. Technology cheerleaders highlight protests in Egypt that were organized on Facebook and viral videos that captured oppressive governments red-handed. Undoubtedly, the Internet facilitates communication, a critical element of collective action.

Consider, however, scenes from other countries: In 2009, the State Department specifically asked Twitter to postpone a scheduled update so that protests in Iran could continue. Twitter complied, the protests continued, but there was no Iranian Twitter revolution. The Iranian government quashed it. In Syria, President Assad shut down two-thirds of the Internet in June 2011, but protesters there continue the fight without social media. In Saudi Arabia, Facebook activists announced a “Day of Rage” that was stillborn under a monarchy that bans civil society. In Libya, Gaddafi cut off the Internet but was roundly defeated.

Naturally, there are complex political, social, economic, and military reasons for these diverse outcomes. But there is no consistent way to explain them that takes the Internet or any other electronic communication medium to be a significant cause. Meanwhile, in China, there are more than 900 million mobile phone accounts, 500 million Internet users, and no serious attempts at revolution. It’s clear that the Internet is neither necessary nor sufficient to encourage democracy.

Claims that communication technologies are the cause or the catalyst of large-scale political change are based on the classic confusion of correlation with cause. It’s not that tweeting foments rebellion, but that today, all rebellions are tweeted. In previous eras, rebellions were phoned, printed, and even lit by lantern (well known from Paul Revere’s famous ride at the outset of the American Revolution). Where there’s self-confidence and frustration, protesters find a way.

Ultimately, politics trumps cyberspace and bullets beat bits, and I say this as a computer scientist. In America, we enjoy an open Internet where anyone can blog or tweet or post as they like, because we have a tradition of free speech and a free press, not because we have a formal policy of Internet freedom; in fact, we don’t.

I am apprehensive of meddling in the affairs of sovereign nations, especially when our own moral high ground seems to be eroding. Nevertheless, Clinton would better achieve her own goals if she doubled down on influencing the laws and social norms that protect free speech in general, rather than seeking to enforce them in a virtual world that for the most part merely reflects the physical one.

KENTARO TOYAMA

Visiting Researcher, University of California, Berkeley

Redmond, Washington


Over the course of the past few months Secretary of State Hillary Clinton has given three major speeches articulating the U.S. policy toward Internet freedom in terms of the U.N. Declaration of Human Rights. The speeches come at a moment of great celebration of the Internet for its role in the Arab Spring and a moment of deep concern about the future of the Internet as some nations have launched a campaign to radically change the nature of Internet governance.

As the Internet moves to the center of economic, social, and political life in the 21st century, the challenges of keeping it open will increase. It is vitally important to frame the issue properly. Although I applaud Secretary Clinton’s strong commitment to an open Internet, I have concerns about her formulation.

We need to acknowledge that the impact of the Internet on the Articles of the Declaration is uneven and accept that the Internet neither creates all problems nor can solve them all. The digital revolution has done more to advance some principles of the Declaration than its drafters and signatories could ever have dreamed. However, there are articles of the Declaration where its effect is mixed. Secretary Clinton mentions theft of (intellectual) property (Article 17) as a problem magnified by the Internet, but we could easily add concerns about theft of identity (Article 3) and invasions of privacy (Article 12).

The challenge is to defend the open communications network that has delivered immense progress toward freedom of speech, assembly, and cultural participation, without letting it be held hostage to the other goals of the Declaration, where the outcome is more mixed and the argument more complex. It is possible and essential to distinguish the Internet and its governance as a communications standard from the other social, political, and economic goals of the Declaration. The more activity that a standard enables, the better. Public policy then sorts out which activities are to be encouraged or discouraged.

The issue should not be framed as a struggle against repressive governments. This undervalues the importance of the Internet and invites the politicization of Internet governance. The digital communications revolution affords people who live under non-repressive regimes a much greater ability to exercise their rights.

The issue should not be framed as one of holding Internet companies to a higher standard because of the nature of the Internet. This risks sounding naïve and depending on voluntary corporate behavior to achieve what governments and civil society have failed to achieve.

We should not allow technology to parade as public policy and vice versa. Although technology and policy are never entirely separate, the core Internet governance issues have been substantially technical and non-governmental. Although there is always room for improvement in the openness and transparency of the governance process, we need to avoid the temptation to manipulate the technology to achieve our goals, lofty as we believe them to be, since that tends to legitimize the manipulation of the technology for far less lofty purposes.

MARK COOPER

Research Director

Consumer Federation of America

Washington, DC

Communicating Uncertainty: Fulfilling the Duty to Inform

Experts’ knowledge has little practical value unless its recipients know how sound it is. It may even have negative value if it induces unwarranted confidence or is so hesitant that other, overstated claims push it aside. As a result, experts owe decisionmakers candid assessments of what they do and do not know.

Decisionmakers ignore those assessments of confidence (and uncertainty) at their own peril. They may choose to act with exaggerated confidence, in order to carry the day in political debates, or with exaggerated hesitancy, in order to avoid responsibility. However, when making decisions, they need to know how firm the ground is under their science, lest opponents attack unsuspected weaknesses or seize initiatives that might have been theirs.

Sherman Kent, the father of intelligence analysis, captured these risks in his classic essay “Words of Estimative Probability,” showing how leaders can be misled by ambiguous expressions of uncertainty. As a case in point, he takes a forecast from National Intelligence Estimate 29-51, Probability of an Invasion of Yugoslavia in 1951: “Although it is impossible to determine which course the Kremlin is likely to adopt, we believe that the extent of Satellite military and propaganda preparations indicates that an attack on Yugoslavia in 1951 should be considered a serious possibility.” When he asked other members of the Board of National Estimates “what odds they had had in mind when they agreed to that wording,” their answers ranged from 1:4 to 4:1. Political or military leaders who interpret that forecast differently might take very different actions, as might leaders who make different assumptions about how much the analysts agree.

Figure 1 shows the same problem in a very different domain. In fall 2005, as avian flu loomed, epidemiologist Larry Brilliant convened a meeting of public health experts, able to assess the threat, and technology experts, able to assess the options for keeping society going, if worst came to worst. In preparation for the meeting, we surveyed their beliefs. The figure shows their answers to the first question on our survey, eliciting the probability that the virus would become an efficient human-to-human transmitter in the next three years.

FIGURE 1

Judgments of “the probability that H5N1 will become an efficient human-to-human transmitter (capable of being propagated through at least two epidemiological generations of humans) some time during the next 3 years.”

Data collected in October 2005. [Source: W. Bruine de Bruin, B. Fischhoff, L. Brilliant, and D. Caruso, “Expert Judgments of Pandemic Influenza,” Global Public Health 1, no. 2 (2006): 178–193]

The public health experts generally saw a probability of around 10%, with a minority seeing a higher one. The technology experts, as smart a lay audience as one could imagine, saw higher probabilities. Possibly, they had heard both groups of public health experts and had sided with the more worried one. More likely, though, they had seen the experts’ great concern and then assumed a high probability. However, with an anticipated case-fatality rate of 15% (provided in response to another question), a 10% chance of efficient transmission is plenty of reason for concern.

Knowing that probability is essential to orderly decisionmaking. However, it is nowhere to be found in the voluminous coverage of H5N1. Knowing that probability is also essential to evaluating public health officials’ performance. If they seemed to be implying a 70% chance of a pandemic, then they may seem to have been alarmist, given that none occurred. That feeling would have been reinforced if they seemed equally alarmed during the H1N1 mobilization, unless they said clearly that they perceived a low-probability event with very high consequences should it come to pass.

For an expert who saw a 10% chance, a pandemic would have been surprising. For an expert who saw a 70% chance, the absence of a pandemic would have been. However, neither surprise would render the prediction indefensible. That happens only when an impossible event (probability = 0%) occurs or a sure thing (probability = 100%) does not. Thus, for an individual event, any probability other than 0% or 100% is something of a hedge, because it cannot be invalidated. For a set of events, though, probability judgments should be calibrated, in the sense that events with a 70% chance happen 70% of the time, 10% events 10% of the time, and so on.
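
As a minimal sketch of what such a check involves, one might tabulate a set of probability judgments against outcomes, grouping the judgments by stated probability; the code below is illustrative only, and the function name and toy numbers are hypothetical.

    # A rough calibration check: group judgments by stated probability and
    # compare each group's stated chance with how often the event occurred.
    from collections import defaultdict

    def calibration_table(forecasts, outcomes):
        # forecasts: stated probabilities (0 to 1); outcomes: 1 if the event happened, else 0
        groups = defaultdict(list)
        for p, o in zip(forecasts, outcomes):
            groups[round(p, 1)].append(o)  # bin judgments to the nearest 10%
        return {p: sum(os) / len(os) for p, os in sorted(groups.items())}

    # For well-calibrated judges, events given a 70% chance happen about 70% of the time.
    print(calibration_table([0.7, 0.7, 0.7, 0.1, 0.1], [1, 1, 0, 0, 0]))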

After many years of eliciting probability judgments from foreign policy experts, then seeing what happened, Philip Tetlock concluded that they were generally overconfident. That is, events often did not happen when the experts were confident that they would or happened when the experts were confident that they would not. In contrast, Allan Murphy and Robert Winkler found that probability-of-precipitation (PoP) forecasts express appropriate confidence. It rains 70% of the time that they give it a 70% chance.

Thus, decisionmakers need to know not only how confident their experts are, but also how to interpret those expressions of confidence. Are they like Tetlock’s experts, whose confidence should be discounted, or like Murphy and Winkler’s, who know how much they know? Without explicit statements of confidence, evaluated in the light of experience, there is no way of knowing.

Kent ended his essay by lamenting analysts’ reluctance to be that explicit, a situation that largely persists today. And not just in intelligence. For example, climate scientists have wrestled with similar resistance, despite being burned by forecasts whose imprecision allowed them to be interpreted as expressing more confidence than was actually intended. Even PoP forecasts face sporadic recidivism from weather forecasters who would rather communicate less about how much they know.

Reasons for reluctance

Experts’ reluctance to express their uncertainty has understandable causes, and identifying those causes can help us develop techniques to make expert advice more useful.

Experts see uncertainty as misplaced imprecision. Some experts feel that being explicit about their uncertainties is wrong, because it conveys more precision than is warranted. In such cases, the role of uncertainty in decisionmaking has not been explained well. Unless decisionmakers are told how strong or shaky experts’ evidence is, then they must guess. If they guess wrong, then the experts have failed them, leading to decisions made with too much or too little confidence and without understanding the sources of their uncertainty. Knowing about those sources allows decisionmakers to protect themselves against vulnerabilities, commission the analyses needed to reduce them, and design effective actions. As a result, decisionmakers have a right, and a responsibility, to demand that uncertainty be disclosed.

In other cases, experts are willing to share their uncertainty but see no need because it seems to go without saying: Why waste decisionmakers’ valuable time by stating the obvious? Such reticence reflects the normal human tendency to exaggerate how much of one’s knowledge goes without saying. Much of cognitive social psychology’s stock in trade comes from documenting variants of that tendency. For example, the “common knowledge effect” is the tendency to believe that others share one’s beliefs. It creates unpleasant surprises when others fail to read between the lines of one’s unwittingly incomplete messages. The “false consensus effect” is the tendency to believe that others share one’s attitudes. It creates unpleasant surprises when those others make different choices, even when they see the same facts because they have different objectives.

Because we cannot read one another’s minds, all serious communication needs some empirical evaluation, lest its effectiveness be exaggerated. That is especially true when communicating about unusual topics to unfamiliar audiences, where communicators can neither assume shared understanding nor guess how they might be misinterpreted. That is typically the lot of experts charged with conveying their knowledge and its uncertainties to decisionmakers. As a result, they bear a special responsibility to test their messages before transmission. At the least, they can run them by some nonexperts somewhat like the target audience, asking them to paraphrase the content, “just to be sure that it was clear.”

When decisionmakers receive personal briefings, they can ask clarifying questions or make inferences that reveal how well they have mastered the experts’ message. If decisionmakers receive a communication directed at a general audience, though, they can only hope that its senders have shown due diligence in disambiguating the message. If not, then they need to add a layer of uncertainty, as a result of having to guess what they might be missing in what the experts are trying to say. In such cases, they are not getting full value from the experts’ work.

Experts do not expect uncertainties to be understood. Some experts hesitate to communicate uncertainties because they do not expect nonexpert audiences to benefit. Sometimes, that skepticism comes from overestimating how much they need to communicate in order to bring value. Decisionmakers need not become experts in order to benefit from knowing about uncertainties. Rather, they can often derive great marginal utility from authoritative accounts conveying the gist of the key issues, allowing them to select those where they need to know more.

In other cases, though, experts’ reluctance to communicate uncertainties comes from fearing that their audience would not understand them. They might think that decisionmakers lack the needed cognitive abilities and substantive background or that they cannot handle the truth. Therefore, the experts assume that decisionmakers need the sureties of uncertainty-free forecasts. Such skepticism takes somewhat different forms with the two expressions of uncertainty: summary judgments of its extent (for example, credible intervals around possible values) and analyses of its sources (for example, threats to the validity of theories and observations).

With summary judgments, skeptical experts fear that nonexperts lack the numeracy needed to make sense of probabilities. There are, indeed, tests of numeracy that many laypeople fail. However, these tests typically involve abstract calculations without the context that can give people a feeling for the numbers. Experts can hardly support the claim that lay decisionmakers are so incompetent that they should be denied information relevant to their own well-being.

The expressions of uncertainty in Figure 1 not only have explicit numeric probabilities but are also attached to an event specified precisely enough that one can, eventually, tell whether it has happened. That standard is violated in many communications, which attach vague verbal quantifiers (such as rare, likely, or most) to vague events (such as environmental damage, better health, or economic insecurity). Unless the experts realize the ambiguity in their communications, they may blame their audience for not “getting” messages that it had little chance to understand.

PoP forecasts are sporadically alleged to confuse the public. The problem, though, seems to lie with the event, not the probability attached to it. People have a feeling for what 70% means but are often unsure whether it refers to the fraction of the forecast period that it will rain, the fraction of the area that it will cover, or the chance of a measurable amount at the weather station. (In the United States, it is the last.) Elite decisionmakers may be able to demand clarity and relevance from the experts who serve them. Members of the general public have little way to defend themselves against ambiguous communications or against accusations of having failed to comprehend the incomprehensible.

Analogous fears about lay incompetence underlie some experts’ reluctance to explain their uncertainty. As with many intuitions about others’ behavior, these fears have some basis in reality. What expert has not observed nonexperts make egregious misstatements about essential scientific facts? What expert has not heard, or expressed, concerns about the decline of STEM (science, technology, engineering, and mathematics) education and literacy? Here too, intuitions can be misplaced, unless supported by formal analysis of what people need to know and empirical study of what they already do know.

Decisionmakers need to know the facts (and attendant uncertainties) relevant to the decisions they face. It is nice to know many other things, especially if such background knowledge facilitates absorbing decision-relevant information. However, one does not need coursework in ecology or biochemistry to grasp the gist of the uncertainty captured in statements such as, “We have little experience with the effects of current ocean acidification (or saline runoff from hydrofracking or the effects of large-scale wind farms on developing-country electrical grids, or the off-label use of newly approved pharmaceuticals).”

Experts anticipate being criticized for communicating uncertainty. Some experts hesitate to express uncertainties because they see disincentives for such candor. Good decisionmakers want to know the truth, however bad and uncertain. That knowledge allows them to know what risks they are taking, to prepare for surprises, and to present their choices with confidence or caution. However, unless experts are confident that they are reporting to good decisionmakers, they need assurance that they will be protected if they report uncertainties.

Aligning the incentives of experts and decisionmakers is a basic challenge for any organization, analyzed in the recent National Research Council report Intelligence Analysis for Tomorrow, sponsored by the Office of the Director of National Intelligence. In addition to evaluating approaches to analysis, the report considers the organizational behavior research relevant to recruiting, rewarding, and retaining analysts capable of using those approaches, and their natural talents, to the fullest. Among other things, it recommends having analysts routinely assign probabilities for their forecasts precisely enough to be evaluated in the light of subsequent experience. Table 1 describes distinctions among forms of expert performance revealed in probabilistic forecasts.

TABLE 1

Brier Score Decomposition for evaluating probabilistic forecasts

Three kinds of performance:

  • Knowledge. Forecasts are better the more often they come true (for example, it rains when rain is forecasted and not when it is not). However, that kind of accuracy does not tell the whole story, without considering how hard the task is (for example, forecasts in the Willamette Valley would often be correct just predicting rain in the winter and none in the summer).
  • Resolution. Assessing uncertainty requires discriminating among different states of knowledge. It is calculated as the variance in the percentage of correct predictions associated with different levels of expressed confidence.
  • Calibration. Making such discriminations useful to decisionmakers requires conveying the knowledge associated with each level. Perfect calibration means being correct XX% of the time when one is XX% confident. Calibration scores reflect the squared difference between those two percentages, penalizing those who are especially over- or underconfident. [See A. H. Murphy, “A New Vector Partition of the Probability Score,” Journal of Applied Meteorology 12 (1973): 595-600.]
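
To make the decomposition in Table 1 concrete, here is a minimal sketch in Python of Murphy's partition of the Brier score. The forecasts and outcomes are invented purely for illustration; the variable names are ours, not drawn from any operational forecasting record.

    # Minimal sketch of the Murphy (1973) decomposition of the Brier score.
    # Forecasts are probabilities; outcomes are 1 (event occurred) or 0 (it did not).
    # All numbers below are invented for illustration.
    from collections import defaultdict

    forecasts = [0.9, 0.9, 0.7, 0.7, 0.7, 0.5, 0.3, 0.1, 0.1, 0.1]
    outcomes  = [1,   1,   1,   0,   1,   0,   0,   0,   1,   0]

    n = len(forecasts)
    base_rate = sum(outcomes) / n   # how hard the task is (overall event frequency)

    # Group cases by the confidence level that was expressed.
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)

    # Calibration: squared gap between stated confidence and observed frequency.
    calibration = sum(len(v) * (f - sum(v) / len(v)) ** 2 for f, v in groups.items()) / n
    # Resolution: how much observed frequencies differ across confidence levels.
    resolution = sum(len(v) * (sum(v) / len(v) - base_rate) ** 2 for v in groups.values()) / n
    # Uncertainty: variability inherent in the events themselves.
    uncertainty = base_rate * (1 - base_rate)

    brier = calibration - resolution + uncertainty
    direct = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n
    print(f"Brier score {brier:.3f} (check: {direct:.3f}); "
          f"calibration {calibration:.3f}, resolution {resolution:.3f}, uncertainty {uncertainty:.3f}")

Because the decomposition conditions on each distinct confidence level, the three terms recombine exactly into the overall Brier score, which is what makes the partition useful for diagnosing whether poor scores reflect limited knowledge, poor resolution, or poor calibration.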

Without such policies, experts may realistically fear that decisionmakers will reward bravado or waffling over candor. Those fears can permeate work life far beyond the moments of truth where analyses are completed and communicated. Experts are people, too, and subject to rivalries and miscommunication among individuals and groups, even when working for the same cause. Requiring accurate assessment of uncertainty reduces the temptation to respond strategically rather than honestly. Conversely, it protects sincere experts from the demoralization that comes with seeing others work the system and prosper.

Even when their organization provides proper incentives, some experts might fear their colleagues’ censure should their full disclosure of uncertainty reveal their field’s “trade secrets.” Table 2 shows some boundary conditions on results from the experiments that underpin much decisionmaking research, cast in terms of how features of those studies affect the quality of the performance that they reveal. Knowing them is essential to applying that science in the right places and with the appropriate confidence. However, declaring them acknowledges limits to science that has often had to fight for recognition against more formal analyses (such as economics or operations research), even though the latter necessarily neglect phenomena that are not readily quantified. Decisionmakers need equal disclosure from all disciplines, lest they be unduly influenced by those that oversell their wares.

TABLE 2

Boundary conditions on experimental tasks studying decisionmaking performance

  • The tasks are clearly described. That can produce better decisions, if it removes the clutter of everyday life, or worse decisions, if that clutter provides better context, such as what choices other people are making.
  • The tasks have low stakes. That can produce better decisions, if it reduces stress, or worse decisions, if it reduces motivation.
  • The tasks are approved by university ethics committees. That can produce better decisions, if it reduces worry about being deceived, or worse decisions, if it induces artificiality.
  • The tasks focus on researchers’ hypotheses. That can produce better decisions, if researchers are looking for decisionmakers’ insights, or worse decisions, if they are studying biases.

[Adapted from B. Fischhoff and J. Kadvany, Risk: A Very Short Introduction (Oxford: Oxford University Press, 2011), p. 110]

Experts do not know how to express their uncertainties. A final barrier faces some experts who realize the value of assessing uncertainty, trust decisionmakers to understand well-formulated communications, and expect to be rewarded for doing so: They are uncertain how to perform those tasks to a professional standard. Although all disciplines train practitioners to examine their evidence and analyses critically, not all provide training in summarizing their residual uncertainties in succinct, standard form. Indeed, some disciplines offer only rudimentary statistical training for summarizing the variability in observations, which is one input to overall uncertainty.

For example, the complexity of medical research requires such specialization that subject-matter experts might learn just enough to communicate with the experts in a project’s “statistical core,” entrusted with knowing the full suite of statistical theory and methods. Although perhaps reasonable on other grounds, that division of labor can mean that subject-matter experts understand little more than rudimentary statistical measures such as P values. Moreover, they may have limited appreciation of concepts, such as how statistical significance differs from practical significance, assumes representative sampling, and depends on both sample size and measurement precision.
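
A small numerical illustration may help here. The Python sketch below uses invented numbers (a 0.2-unit difference, a within-group standard deviation of 10, and 100,000 participants per group) to show how a practically trivial difference can still yield a very small P value when the sample is large enough.

    # Invented illustration: statistical significance is not practical significance.
    import math

    observed_diff = 0.2    # tiny difference between two groups (illustrative units)
    sd = 10.0              # within-group standard deviation
    n_per_group = 100_000  # participants per group

    se = sd * math.sqrt(2 / n_per_group)          # standard error of the difference
    z = observed_diff / se
    p_two_sided = math.erfc(z / math.sqrt(2))     # two-sided P value, normal approximation

    print(f"z = {z:.2f}, P = {p_two_sided:.1e}")  # 'significant' by any conventional cutoff
    print("Yet a 0.2-unit difference may be too small to matter in practice.")

Run the same arithmetic with a few dozen participants per group and the identical 0.2-unit difference falls far short of statistical significance, which is why sample size, measurement precision, and practical importance all need to be reported alongside the P value.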

Figure 2, taken from the 2007 National Intelligence Estimate for Iraq, shows an important effort by U.S. analysts to clarify the uncertainties in their work. It stays close to its experts’ intuitive ways of expressing themselves, which obliges it to explain those terms to decisionmakers. Figure 3 represents an alternative strategy, requiring experts to take a step toward decisionmakers and summarize their uncertainties in explicit, decision-relevant terms. Namely, it asks experts for minimum and maximum possible values, along with some intermediate ones. As with PoP forecasts, sets of such judgments can be calibrated in the light of experience. For example, actual values should turn out to be higher than the 0.95 fractile only 5% of the time. Experts are overconfident if more than 10% of actual events fall outside the 0.05 to 0.95 range; they are underconfident if there are too few such surprises.

FIGURE 2
Explanation of uncertainty terms in intelligence analysis

What we mean when we say: An explanation of estimative language

When we use words such as “we judge” or “we assess”—terms we use synonymously—as well as “we estimate,” “likely” or “indicate,” we are trying to convey an analytical assessment or judgment. These assessments, which are based on incomplete or at times fragmentary information, are not a fact, proof, or knowledge. Some analytical judgments are based directly on collected information; others rest on previous judgments, which serve as building blocks. In either type of judgment, we do not have “evidence” that shows something to be a fact or that definitively links two items or issues.

Intelligence judgments pertaining to likelihood are intended to reflect the Community’s sense of the probability of a development or event. Assigning precise numerical ratings to such judgments would imply more rigor than we intend. The chart below provides a rough idea of the relationship of terms to each other.

We do not intend the term “unlikely” to imply an event will not happen. We use “probably” and “likely” to indicate there is a greater than even chance. We use words such as “we cannot dismiss,” “we cannot rule out,” and “we cannot discount” to reflect an unlikely—or even remote—event whose consequences are such it warrants mentioning. Words such as “may be” and “suggest” are used to reflect situations in which we are unable to assess the likelihood generally because relevant information is nonexistent, sketchy, or fragmented.

[Source: Office of the Director of National Intelligence, Prospects for Iraq’s Stability: A Challenging Road Ahead (Washington, DC: 2007)]

FIGURE 3
Recommended box plot for expressing uncertainty

Source: P. Campbell, “Understanding the Receivers and the Receptions of Science’s Uncertain Messages,” Philosophical Transactions of the Royal Society 369 (2011): 4891-4912.
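
As a concrete illustration of the calibration check described above, here is a minimal Python sketch that scores a set of elicited 0.05 and 0.95 fractiles against later observations. The intervals and actual values are invented, and with so few judgments the comparison is only suggestive.

    # Invented example: checking elicited 0.05-0.95 credible intervals against outcomes.
    intervals = [(10, 30), (5, 12), (100, 250), (0.2, 0.9), (40, 80)]  # (0.05, 0.95) fractiles
    actuals = [28, 15, 180, 0.5, 95]                                   # what later happened

    surprises = sum(1 for (lo, hi), x in zip(intervals, actuals) if x < lo or x > hi)
    surprise_rate = surprises / len(actuals)

    # A well-calibrated judge should be surprised about 10% of the time.
    print(f"Surprise rate: {surprise_rate:.0%} (target is roughly 10%)")
    if surprise_rate > 0.10:
        print("Intervals look too narrow: a sign of overconfidence.")
    elif surprise_rate < 0.10:
        print("Intervals look too wide: a sign of underconfidence.")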

Some experts object in principle to making such judgments, arguing that all probabilities should be “objective” estimates of the relative frequency of identical repeated events (such as precipitation). Advocates of the alternative “subjectivist” principle argue that even calculated probabilities require judgments (such as deciding that climate conditions are stable enough to calculate meaningful rates). Whatever the merits of these competing principles, as a practical matter, decisionmakers need numeric probabilities (or verbal ones with clear, consensual numeric equivalents). Indeed, subjective probabilities are formally defined (in utility theory) in practical terms; namely, the gambles that people will take based on them. Experts are only human if they prefer to express themselves in verbal terms, even when they receive quantitative information. However, doing so places their preferences above decisionmakers’ needs.

The decisionmakers’ role

Just as decisionmakers need help from experts in order to make informed choices, experts need help from decisionmakers in order to provide the most useful information. Here are steps that decisionmakers can take for overcoming each source of experts’ reluctance to express uncertainty. Implementing them is easier for elite decisionmakers, who can make direct demands on experts, than for members of the general public, who can only hope for better service.

If experts see little value in expressing their uncertainty, show how decisions depend on it. Assume that experts cannot guess the nature of one’s decisions or one’s current thinking about them. Experts need direction, regarding all three elements of any decision: the decisionmakers’ goals, options, and beliefs (about the chances of achieving each goal with each option). The applied science of translating decisionmakers’ perceptions into formal terms is called decision analysis. Done well, it helps decisionmakers clarify their thinking and their need to understand the uncertainty in expert knowledge. However, even ordinary conversations can reveal much about decisionmakers’ information needs: Are they worried about that? Don’t they see those risks? Aren’t they considering that option? Decisionmakers must require those interactions with experts.

Whether done directly or through intermediaries such as survey researchers or decision analysts, those interactions should follow the same pattern. Begin by having decisionmakers describe their decisions in their own natural terms, so that any issue can emerge. Have them elaborate on whatever issues they raise, in order to hear them out. Then probe for other issues that experts expected to hear, seeing if those issues were missed or the experts were mistaken. Making decisions explicit should clarify the role of uncertainty in them.

If experts fear being misunderstood, insist that they trust their audience. Without empirical evidence, experts cannot be expected to know what decisionmakers currently know about a topic or could know if provided with well-designed communications. Human behavior is too complex for even behavioral scientists to make confident predictions. As a result, they should discipline their predictions with data, avoiding the sweeping generalizations found in popular accounts (“people are driven by their emotions,” “people are overconfident,” “people can trust their intuitions”).

Moreover, whatever their skepticism about the public’s abilities, experts often have a duty to inform. They cannot, in Bertolt Brecht’s terms, decide that the public has “forfeited [their] confidence and could only win it back by redoubled labor.” Even if experts might like a process that “dissolved the people and elected another one,” earning the public’s trust requires demonstrating their commitment to its well-being and their competence in their work. One part of that demonstration is assessing and communicating their uncertainty.

If experts anticipate being punished for candor, stand by them. Decisionmakers who need to know about uncertainty must protect those who provide it. Formal protection requires personnel policies that reward properly calibrated expressions of uncertainty, not overstated or evasive analyses. It also requires policies that create the feedback needed for learning and critical internal discussion. It requires using experts to inform decisions, not to justify them.

Informal protection requires decisionmakers to demonstrate a commitment to avoid hindsight bias, which understates uncertainty in order to blame experts for not having made difficult predictions, and its opposite, which overstates uncertainty in order to avoid blame by claiming that no one could have predicted the unhappy events following decisionmakers’ choices. Clearly explicating uncertainty conveys that commitment. It might even embolden experts to defy the pressures arising from professional norms and interdisciplinary competition to understate uncertainty.

If experts are unsure how to express themselves, provide standard means. At least since the early 18th century, when Daniel Bernoulli introduced his ideas on probability, scientists have struggled to conceptualize chance and uncertainty. There have been many thoughtful proposals for formalizing those concepts, but few that have passed both the theoretical test of rigorous peer review and the practical test of applicability. Decisionmakers should insist that experts use those proven methods for expressing uncertainty. Whatever its intuitive appeal, a new method is likely to share flaws with some of the other thoughtful approaches that have fallen by the wayside. Table 3 answers some questions experts might have.

TABLE 3

FAQ for experts worried about providing subjective probability judgments

Concern 1: People will misinterpret them, inferring greater precision than I intended.

Response: Behavioral research has found that most people like receiving explicit quantitative expressions of uncertainty (such as credible intervals), can interpret them well enough to extract their main message, and misinterpret verbal expressions of uncertainty (such as “good” evidence or “rare” side effect). For most audiences, misunderstanding is more likely with verbal expressions.

Concern 2: People cannot use probabilities.

Response: Behavioral research has found that even laypeople can provide high-quality probability judgments, if they are asked clear questions and given the chance to reflect on them. That research measures the quality of those judgments in terms of their internal consistency (or coherence) and external accuracy (or calibration).

Concern 3: My credible intervals will be used unfairly in performance evaluations.

Response: Probability judgments can protect experts by having them express the extent of their knowledge, so that they are not unfairly accused of being too confident or not confident enough.

Decisionmakers should then meet the experts halfway by mastering the basic concepts underlying those standard methods. The National Research Council report mentioned earlier identified several such approaches whose perspective should be familiar to any producer or consumer of intelligence analyses. For example, when decisionmakers know that their opponents are trying to anticipate their choices, they should realize that game theory addresses such situations, commission analyses where needed, and know how far to trust their conclusions. A companion volume, Intelligence Analysis: Behavioral and Social Science Foundations, provides elementary introductions to standard methods, written for decisionmakers.

Experts are sometimes reluctant to provide succinct accounts of the uncertainties surrounding their work. They may not realize the value of that information to decisionmakers. They may not trust decisionmakers to grasp those accounts. They may not expect such candor to be rewarded. They may not know how to express themselves.

Decisionmakers with a need, and perhaps a right, to know about uncertainty have ways, and perhaps a responsibility, to overcome that reluctance. They should make their decisions clear, require such accounts, reward experts who provide them, and adopt standard modes of expression. If successful, they will make the critical thinking that is part of experts’ normal work accessible to those who can use it.

Mother of Invention

High-handed corporate monopoly and high-minded national treasure, the American Telephone and Telegraph Company (AT&T) was a unique product of this country’s pragmatism and for decades the envy of the world in extending low-cost local telephone service.

At the heart of AT&T was its R&D unit, Bell Laboratories, the world’s greatest entity of its kind, and a giant manufacturing arm, Western Electric. Jon Gertner’s The Idea Factory is hardly the first book about AT&T history or the first about Bell Labs research. The company published a massive survey near its peak in 1977 (a second edition appeared in 1983), Engineering and Operations in the Bell System. On the topic of its pure science research, the physicist Jeremy Bernstein’s Three Degrees above Zero (1984) told the story of the discovery of the cosmic background radiation at the Holmdel branch. In 1997, Michael Riordan and Lillian Hoddeson published Crystal Fire: The Birth of the Information Age, an award-winning history of the lab’s greatest technological triumph, the transistor.

The Idea Factory is still welcome. It is the first study of Bell Labs that puts its history in its full organizational, political, and administrative context. AT&T was a company striving to expand and maintain a privileged empire under a government that saw it alternatively as a trusted military/industrial partner and an anticompetitive threat. This ambiguous embrace, Gertner suggests, inadvertently encouraged a culture that combined a gifted and diverse workforce with a long-term outlook, creating the foundations of a new information economy, which in turn made radical changes in the charter of the parent company inevitable.

Gertner’s story centers on the interaction among three leaders of Bell Labs in its critical years—Mervin Kelly, Jim Fisk, and William Oliver Baker—and three of its greatest scientific minds: William Shockley, Claude Shannon, and John Pierce. Of all these men, Kelly may have been the most influential. When Bell Labs was established in 1925, many vital components of the telephone system needed radical upgrading if the network was to continue growing. Bell Labs had to develop and constantly improve its own equipment and processes, such as the manufacture and sheathing of cables and wires. It had to invent and build its own testing apparatus. Even the leather belts of telephone line workers were rigorously studied and specified. The integration of R&D with Western Electric manufacturing ensured that challenges in supply chains and industrial processes could be met at early stages.

For an ultimately revolutionary organization, Bell Labs had remarkably conservative values: Equipment was designed, manufactured, and tested to last for 40 years. By today’s Wall Street and Silicon Valley standards, it was also remarkably egalitarian. The best-paid staff members earned no more than 10 times the annual wages of the lowest paid. Yet it was also far from a civil-service culture. Kelly’s crucial move after the end of World War II was a virtual coup, demoting experienced managers so that new groups led by younger researchers such as William Shockley could take charge, reducing at least one of the veterans to tears. Many of the materials these new groups needed did not yet exist. Shockley’s fixation on taking personal credit, so contrary to the collegial norms of the Labs, was also tolerated because of his ability to catalyze ideas developed by others.

The transistor also illustrated how the Bell system’s sensitive political position encouraged rapid diffusion of its inventions. In the midst of today’s patent wars, it’s striking to see how AT&T managers licensed what Gertner and others have considered the greatest invention of the 20th century. Ceremoniously distributing samples around the world, they made the technology available to all manufacturers for a license fee of $25,000, almost a token amount considering the development expenses and the profits to be made. It was the Bell system’s monopoly status that encouraged such generosity, justifying their view of their own company as a public resource rather than a selfish old-style trust.

The Bell system could not deploy the transistor without another revolutionary innovation that the Labs were developing at around the same time: the theory of information, which enabled engineers to ensure the highest volume and quality of transmission possible. It was one of the Labs’ most gifted researchers, Claude Shannon, who in 1948 published one of the most influential mathematical papers of all time, “A Mathematical Theory of Communication.” It was Shannon who coined the word and concept of a bit and who showed not only how ordinary communication was full of redundancy that could be suppressed for the sake of speed, but also how the addition of redundant information in the form of error-correcting codes could overcome glitches in transmission. Shannon’s work, theoretical as it was, also owed much to the Labs’ military contracts. He had been a major figure in the development of cryptography and in the automation of counterattacks against German missiles attacking London during World War II.

The Cold War was also a strategic boon for Bell Labs, helping deflect planned antitrust actions against the Western Electric monopoly on AT&T equipment supply. By performing vital work on the Distant Early Warning radar line and the Nike missile systems and by managing Sandia Laboratories, the Bell system could continue arguing for a privileged legal position as a national strategic asset. One of Mervin Kelly’s main challenges, as he saw it, was to balance the expansion of civilian telecommunication service, and its role in prosperity and economic growth, with the company’s military role.

Not all of Bell Labs’ investments paid off. Although managed by one of the most brilliant members of the Labs, John Pierce, the satellite communication program was doomed when the federal government decided to create COMSAT in 1962, excluding AT&T. The Picturephone seemed to be the future of business and ultimately residential service, after Bell Labs’ success in reducing the costs of what were originally elite services, but the service never reached a critical mass of users. And AT&T was slow to recognize the value of fiber optics, originally developed in Europe, partly because AT&T was focused on long-distance and Picturephone transmissions, for which it believed hollow pipes (waveguides) were more practical. Gertner shows how the Labs, for all their maverick genius, were not immune to corporate inertia and the tyranny of sunk costs in older systems.

Still, Gertner may be too harsh in judging the Picturephone only as a fiasco. The historian Kenneth Lipartito has suggested that the fate of major inventions is subject not to an all-knowing market but to many contingencies. The Picturephone initiative might be regarded as a kind of proto-Web, designed for sharing images and documents as much as for face-to-face electronic conversations. Although the Picturephone may not have had a chance to become more than a niche product, the project was like many other failures in creating the conditions for ultimate successes. The real point is not that the Picturephone was a folly, but that even by the 1960s, the acceptance of technological innovation had become so complex that it could no longer be planned reliably, even by the best minds.

The paradox of Bell Labs in its original form was that its very success upset the sensitive equilibrium of its charter. The greater the scope of its products and services, the more would-be competitors could cry foul. As early as 1943, Kelly wrote to his colleagues that despite the Bell system’s conservative philosophy, “our basic technology is becoming increasingly similar to that of a high-value, annual model, highly competitive, young, vigorous and growing industry.” As the management guru Peter Drucker wrote in 1984, the applications of the Labs’ innovations were beyond the ability of any one company to realize. One scenario Drucker imagined beyond the breakup—an independent, self-supporting laboratory licensing patents to all companies—seemed too visionary even to him. As Gertner notes, some of the Labs’ greatest breakthroughs were responses to challenges of telephone service, not developments of pure science. In fact, Bell Labs survived through repeated reorganization, retaining its ties to AT&T and the operating companies and enjoying spectacular prosperity as Lucent in the 1990s, before the collapse of the dot-com boom destroyed most of its market value.

Key omissions

As a history of some of Bell Labs’ greatest ideas and an analysis of the company’s strengths and shortcomings, The Idea Factory is the best all-around account to date. Yet it slights some individuals and innovations that today’s software industry considers among the Labs’ greatest contributions: the UNIX operating system and the C programming language, both credited in part to the late computer scientist Dennis Ritchie. Nor does it describe the Labs’ remarkable work in human factors, such as the long succession of handsets designed in consultation with Henry Dreyfuss, which were so robust and comfortable to use that many are still in service. Dreyfuss’s designs for the Bell System, realized in collaboration with the Labs and Western Electric, remain a foundation of today’s human/electronic interfaces.

On balance, Gertner sees Bell Labs as part of a bygone business model that cannot and should not be revived in an age of rapid, consumer-oriented electronic product development and global supply chains. He cites a 1995 paper by a Bell Labs mathematician, Andrew Odlyzko, suggesting that dominating narrow market segments, not pursuing breakthrough inventions, was now the main way to corporate profits. Incremental improvements rather than giant steps were the way to go.

That remains an accurate description of the current research environment. But is this situation entirely inevitable? Breakthrough industrial research continues. IBM’s Watson Research Center’s artificial intelligence program, for example, is on a par with the tradition of the old Bell Labs. Even Toyota, then considered a conservative “fast follower” company, created the radically new Prius hybrid car in only four years in the mid-1990s. Gertner might also have considered the vigorous academic debate on whether markets promote underinvestment in research, dating from Kenneth Arrow’s paper “Economic Welfare and the Allocation of Resources for Invention” 50 years ago. If government rules and constraints stimulated creativity at Bell Labs, isn’t it possible that changes in the tax code to shift rewards to long-term investment might benefit not only shareholders of individual companies but the broader world economy?

Yet in one respect, Gertner is right: The Labs were a unique creature of their times. As he points out, they concentrated a generation of brilliant, ambitious, often eccentric young men from the ranks of 1920s Midwestern rural and small-town tinkerers. During the Depression, the Labs were able to assemble a critical mass of sheer talent, especially because they paid up to twice academic salaries at a time when few universities were hiring. Hard times may even have accelerated researchers’ ideas. Gertner observes that when hours were reduced to save money, staff members used their extra time to audit courses taught by famous physicists at Columbia University, a subway ride from the Labs’ original location in lower Manhattan. Mervin Kelly and his successors also understood how to manage this diverse cohort and above all how to promote an ethic of generally unstinting cooperation, open doors, and what is now called mentoring. Its only parallel in academia was the Massachusetts Institute of Technology’s Building 20 after World War II. As Gertner observes, the design of the Holmdel laboratory, with its vast atrium and exterior corridors with views of the countryside taking the place of the more spontaneous interactions of the Murray Hill campus, was already beginning to undermine the ethos of the Labs. Universities, for all their vaunted interdisciplinary initiatives, are still organized around proudly independent departments.

The wonder of Bell Labs is that it could be so focused on carefully specified corporate goals in expanding its network, yet so open to chance encounters (in hiring, too) and fortunate accidents. The study of the organization’s past will thus remain an indispensable resource for thinking about the future of serendipity.

Decision Support for Developing Energy Strategies

The United States clearly needs a new energy strategy. In fact, many industrialized nations are in the same position. But this raises an obvious question: What is an energy strategy? In our view, it is a framework that will guide comprehensive and logical discussions about energy development and delivery. It is a deliberative process that encourages involvement from all key stakeholders and gives each of them a legitimate voice in the decisions at hand. It is a way to organize information and dialogue about energy options and their anticipated consequences. And it is a way to structure decisionmaking about energy choices in a manner that facilitates and easily incorporates learning.

What is not an energy strategy? In contrast to most efforts now under way in North America, it is not about promoting specific actions, such as drilling for oil offshore or exploiting unconventional oil and gas resources on land. It is not about advocating energy transportation options, such as oil and gas pipelines. It is not a plan to build infrastructure for renewable energy sources, such as wind or solar farms, or a rationale for providing subsidies for ethanol producers or for setting a price on carbon emissions. It is not a way to advance efficiency standards or carbon capture and storage, or an education plan aimed at demand-side management. Overall, an energy strategy is not about what can be done or (in the eyes of some observers) should be done. Instead, it is a process for organizing analyses, encouraging deliberations, and making decisions in a scientifically rigorous, transparent, and defensible manner.

A good analogy for an energy strategy is that of an individual’s financial investments: Different people have different investment objectives and different tolerances for accepting risks, both of which change through time. So it makes sense that investment strategies will differ across individuals and through time. An energy strategy is also specific to the objectives of the decision participants, and a useful strategy is one that establishes a framework for helping people—policymakers, scientists and innovators, and the public—to answer questions about which components of an energy system are preferred. Specifically, an energy strategy should inform choices about the desired level of investment in each element of an energy portfolio, where these investments should be made geographically, and the signals or tipping points that will trigger the reallocation of funds and attention from one resource (coal, for example) to another (say, renewables) over time. It should distinguish between sources that are ready for development and those that require additional research. Overlaid on these questions, which themselves are not easy to answer, are questions about the level of risk and uncertainty that policymakers and the public are willing to tolerate.

Barriers to good decisionmaking

Decisionmaking, although seemingly intuitive, is fraught with complexity. Staying with the example of financial planning, consider the choices that people must make about their investment portfolio. Most people have a sense of what they want to achieve with their decisions—for example, high rates of return, stability, low uncertainty, and social responsibility. People also tend to know at least a subset of their options. But despite this knowledge, the vast majority of people have made investment decisions that they have regretted. In our view, such behavior has five main causes, as demonstrated by a wealth of research:

First, people are not strict maximizers of overall utility during decisionmaking. Rather than evaluating alternatives by carefully weighing the importance of the various attributes—costs and benefits in terms of economic, environmental, health-related, and social considerations, for example—people take shortcuts. Even though these shortcuts are commonplace, many people fail to recognize their existence or the systematic biases that accompany them. It is true that these shortcuts are an essential aspect of human decisionmaking; without them, most of the decisions people face in their daily lives would be overwhelming. On the other hand, as the consequences associated with high-stakes decisions increase, as is the case in making national energy choices, so too does the level of effort and accuracy required on the part of decisionmakers.

Second, decisionmakers typically do a rather poor job of fully characterizing and appropriately bounding the decision problems (or opportunities) they are being asked to confront. In many cases, problems are cast too narrowly, such that single objectives (such as maximizing economic opportunities or minimizing carbon emissions) become the sole focus, to the detriment of other objectives that also deserve attention. In other cases, decisions are cast so broadly, with dozens of competing stakeholders and objectives, that the result is paralysis and, ultimately, inaction. And for the goals and objectives that are considered during decisionmaking, people tend not to do a terribly good job of determining accurately and precisely how to measure their performance or achievement.

Third, people tend to anchor too easily on certain alternatives and typically do not do a good job of thinking broadly and creatively about the full range of options they can and should be considering. Too often, decisionmakers focus on alternatives that fit neatly with deeply held ideologies, that most easily come to mind, or that have been implemented previously. Decisionmakers also often possess a strong bias toward being unnecessarily faithful to existing investments, even when trading them in for others makes more sense in light of public, business, or national interests (decision researchers call this the sunk cost bias). Each of these tendencies is problematic for decisionmaking. Given the gravity of decisions related to energy, the alternatives under consideration must go beyond the status quo, or the obvious and familiar. They should be responsive to markedly different objectives and strategies, thereby presenting decisionmakers with real options and choices.

Fourth, when these factors—judgmental shortcuts, poorly specified problems, and insufficient creativity when thinking about alternatives—are combined, it becomes difficult, if not impossible, for decisionmakers to confront the tradeoffs that inevitably arise when choosing among options. Policymakers talk often about “win-win” alternatives and consensus. But the fact is that the design of a defensible energy strategy will always involve tradeoffs, giving up something valued in exchange for something else that is also valued, and this threatens consensus and renders win-win alternatives impossible.

Fifth, decisionmakers often fail to adequately learn from their past successes and failures or from the successes and failures of others. Rather than treating decisionmaking as a series of one-off events, there is need for a more adaptive approach designed specifically to help decisionmakers and policymakers learn about systems in which they work by carefully monitoring the outcomes of decisions through time. A good adaptive framework will also help decisionmakers draw lessons from multiple decisions across several jurisdictions as a means of identifying the next and best moves in what is viewed as a series of linked policy decisions.

Constructive in nature

These observations challenge a common assumption held by pollsters, social scientists, and policy analysts, among others, that people possess a pool of preexisting preferences that they simply uncover during the process of making judgments. It is true that in a variety of contexts, preexisting preferences can indeed be identified; people prefer red wine to white, or baseball to football. However, recent research in the decision sciences has demonstrated that there are also many situations where the preferences or preference orders needed to inform decisions are insufficient or altogether absent.

Generally, these decision contexts share one or more of three characteristics. First, the decision context may be foreign, with the implication that preexisting preferences do not exist. Second, decisionmakers may be faced with the relatively common situation in which the evaluation of competing alternatives causes two or more preexisting preferences to conflict. In other words, tradeoffs become necessary, which requires the construction of new preferences based on how decisionmakers balance or rebalance conflicting priorities. Third, decisionmakers may be required to translate qualitative expressions of preference into quantitative ones (and vice versa). Moving from the recommendation, for example, that a carbon market be created to actually setting a price on carbon requires a constructive process. Decisions about energy strategy typically include all three of these features.

Under these conditions, people are unable to evaluate decision problems and alternatives by simply drawing on preexisting and stable preferences. Instead, they must construct their preferences, and by extension, the judgments and decisions that result from them, in response to cues that are available during the decisionmaking process itself. Some of these cues will be internal, reflecting deeply held worldviews or ideologies. And some will be external, in the sense that they are associated with the information that accompanies a decision problem; for example, these cues may take the form of technical information presented by experts about problems or alternatives, or they may only become apparent in light of recent events (as the risks associated with nuclear power became much more salient after the meltdown at Japan’s Fukushima Daiichi plant in 2011). From this perspective, deliberative processes convened by researchers and policymakers, be they experimental or practical, or employed by individuals or groups, have the de facto purpose of serving as engineers of judgment and decisionmaking rather than as tools for simply revealing preexisting preferences.

The implications of preference construction for decisions about an energy strategy are far-reaching. On the one hand, the constructive nature of judgments can be viewed as a “bad news” story, in that it suggests that people can be easily manipulated by interest groups or by industry. One need not look far (the protests around Canadian oil sands and the Keystone XL pipeline, for example) to see how easily and quickly public opinion and related policy preferences can be shaped by a well-organized social movement or public relations effort.

On the other hand, the constructive nature of energy strategy judgments is also very much a “good news” story. For example, the notion of constructed judgments means that decision support processes (and institutions) can be designed so that they do a better job of accounting for how information and decisionmaking strategies are used or misused during the construction of judgments. By recognizing that decisionmakers rely heavily on contextual cues that are available to them as they construct judgments, it becomes possible for analysts and facilitators to provide a defensible context or structure for decisionmaking. Indeed, it is our view that those who lead such decisionmaking processes are obligated to employ decision processes that will help people construct the highest-quality judgments possible in light of the various constraints they face, including access to high-quality information, time to think carefully and deliberate about options, adequate funding, and information-processing capabilities.

Structuring decisions

If one accepts the argument that a national energy strategy is akin to a long-range investment (or in some cases, divestment) program that requires carefully constructed judgments, then a broad-based and iterative decisionmaking process will be required to engage stakeholders over an extended period.

In designing such a process, it is worth noting that many advocates of inclusivity in decisionmaking worry that too much structure will lead to biased input and will unnecessarily constrain the breadth of ideas and expertise. This is the “error of commission” argument. Although we acknowledge this concern, we argue that when incorporating stakeholder views relating to important energy choices, far more is needed than just an invitation for the interested parties to participate and share their opinions. Such an approach, typical of many public involvement processes, will have substantial shortcomings in terms of helping people to make thoughtful and defensible decisions in complex or unfamiliar contexts. This is called the “error of omission” argument. To bring this latter point to life, one need only look at the chaos and frustration accompanying the approximately 4,000 10-minute testimonies before the Joint Review Panel that is considering (on behalf of Canada’s National Energy Board) different options for transporting bitumen from the oil sands in Alberta to tidewater in northwestern British Columbia (and then by ship to Asia).

Decision researchers have long demonstrated that in a variety of loosely structured situations, both individuals and groups grapple with a predictable set of difficulties when making complex decisions that are related to how information is framed and how emotions interact with, and often preempt, more in-depth analysis. One of the fundamental conclusions is that people often end up making decisions that, at best, only partially address the full range of their concerns and subsequently fail to confront required tradeoffs when evaluating competing alternatives.

These findings also suggest that along with the provision of information about the likely consequences of proposed actions, a carefully structured framework for decisionmaking is needed to provide the context required to understand the complex social, economic, and environmental issues that are commonplace in discussions about energy. Such a framework is composed of six basic elements, each one supporting the others in ways that are dictated by the specific decision context. These elements serve to:

  • Define clearly the decision problem that is to be the focus of analysis while taking into account the bounds and constraints under which decisions must be made.
  • Identify objectives that will guide the decisionmaking process, including the performance measures that will be used to gauge success or failure in terms of meeting them.
  • Create logical and creative alternatives that directly address these objectives.
  • Establish the predicted consequences that are associated with alternative courses of action, including key sources of uncertainty.
  • Confront inevitable tradeoffs when selecting among alternatives.
  • Implement decisions, monitor outcomes (as measured by the achievement of objectives), and adapt to changing conditions.

Regional case in point

These lessons are evident in recent research in which several of us developed and tested a framework for crafting an energy strategy for Michigan State University (MSU). (For further information, see http://energytransition.msu.edu.) MSU has a cogeneration facility located on campus that converts the thermal energy from burning coal, natural gas, and biomass into electricity and steam. With a peak electrical output of 99.3 megawatts and a pressurized steam generation capacity of up to 1.3 million pounds per hour, it is the largest on-campus coal-burning power plant in the United States. The facility is the principal energy provider to the main campus and is capable of meeting approximately 97% of all electricity demand. Steam that is generated is distributed at high pressure to the campus to provide heating and cooling to a campus spread over approximately 5,000 acres.

In 2008, MSU commissioned the development of a process for crafting a new strategy for long-range energy generation on the campus. The goal was to transition away from a fossil fuel–based (coal and natural gas) energy strategy to one based entirely on renewables by approximately mid-century. A parallel goal was to help establish a multistakeholder decision support process that could serve as a template for similar energy strategy decisions in Michigan, elsewhere in the United States, and abroad.

The research team began by holding a series of meetings with university officials to define the decision problem (for example, the desire to transition from fossil fuels to renewables) and identify the boundary conditions for the decisionmaking process (for example, identifying stakeholders whose ideas would be critical to the process). We followed these meetings with several workshops and focus groups to identify the range of objectives that were important to key stakeholders on and off campus (for example, students, staff, faculty, and neighboring communities) and potential performance measures that would be useful for tracking their achievement. Through additional workshops and a lengthy engineering review process, we narrowed the objectives and their associated performance measures to a short list of critical considerations that would be used as part of a strategy development process cast widely across the community.

In a critical step at this stage, we created an energy system model capable of forecasting the anticipated outcomes of alternative energy strategies in terms of the key objectives and related performance measures. This model became the centerpiece of an online decision support platform that people—policymakers, experts, and the public—would use as a means of participating in the development of the energy strategy. The online platform built on recommendations from the National Research Council, issued in 2009, about how best to present information relevant to decisions about energy in a decision-focused environment. The platform was designed to engage people in the process of learning about energy systems, including their environmental, economic, and social considerations.

Beyond simply educating people, however, the decision support framework provided users with an opportunity to design their own alternative energy system. In constructing their energy system of the future, users could mix and match individual energy generation (and supporting) technologies for deployment at different times over the course of the energy strategy. The technologies for consideration included centralized power plant options (for example, coal, natural gas, biomass, or nuclear power), decentralized options (solar, natural gas, microturbines), energy from the national power grid (relying on either conventional fuels or renewables), carbon management techniques (for example, carbon capture and storage), and levels of effort expended on building efficiency. As users built their energy strategies, they were able to monitor their ability to meet future energy demand, and they could track, via the energy system model, the forecasted performance of their strategy, as measured against the agreed-on objectives and performance measures.
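
The mechanics behind that kind of tracking can be sketched simply. The Python fragment below is not the MSU model itself; it is a toy with invented technologies, outputs, costs, and emissions factors, meant only to show how a user-assembled portfolio might be checked against demand and scored on a few performance measures.

    # Toy portfolio check with invented numbers; not the actual MSU energy system model.
    DEMAND_GWH = 500  # assumed annual electricity demand to be met

    # Per-technology assumptions: annual output (GWh), cost ($M/yr), emissions (kt CO2e/yr)
    technologies = {
        "natural gas cogeneration": {"output": 300, "cost": 18, "emissions": 110},
        "biomass": {"output": 100, "cost": 12, "emissions": 15},
        "solar plus storage": {"output": 80, "cost": 15, "emissions": 2},
        "grid purchases": {"output": 40, "cost": 5, "emissions": 25},
    }

    portfolio = ["natural gas cogeneration", "biomass", "solar plus storage", "grid purchases"]

    supply = sum(technologies[t]["output"] for t in portfolio)
    cost = sum(technologies[t]["cost"] for t in portfolio)
    emissions = sum(technologies[t]["emissions"] for t in portfolio)

    status = "meets" if supply >= DEMAND_GWH else "falls short of"
    print(f"Supply of {supply} GWh {status} demand of {DEMAND_GWH} GWh")
    print(f"Annual cost: ${cost} million; emissions: {emissions} kt CO2e")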

In addition to simply suggesting a desired energy strategy, this decision support framework also challenged users to evaluate their portfolios in comparison with a broad array of others representing markedly different priorities. In doing so, people were required to be explicit about the pros and cons of each of the energy strategy options under consideration; for example, how much additional cost were they willing to bear in exchange for reduced greenhouse gas emissions or the warm glow that comes with being at the leading edge of innovation? Conversely, to what extent were users willing to compromise on air quality or employment as a means of keeping costs near the status quo?

In order to inform these comparisons, the decision support platform included a module that helped users confront tradeoffs and make internally consistent choices (that is, choices that reflected objectives of greatest concern). We built this module, which uses tools from multicriteria decision analysis, on the notion that internally consistent choices begin by having a clear sense of how important individual objectives are to decisionmakers. With this information in hand, users could apply the energy system model and determine a rank order of energy strategy alternatives based on the degree to which each one best satisfied the most important objectives.
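
To show what such a module does computationally, here is a minimal weighted-sum sketch in Python. Multicriteria decision analysis offers more sophisticated methods (including explicit value functions and swing weighting), and the alternatives, scores, and weights below are invented solely to illustrate the ranking step.

    # Minimal weighted-sum multicriteria ranking sketch; all data are invented.
    # Performance of each hypothetical strategy on each objective
    # (cost in $/MWh, emissions in kt CO2e/yr, jobs supported).
    consequences = {
        "status quo": {"cost": 60, "emissions": 900, "jobs": 1200},
        "gas plus efficiency": {"cost": 70, "emissions": 600, "jobs": 1300},
        "all renewables": {"cost": 95, "emissions": 150, "jobs": 1500},
    }

    # Stakeholder importance weights (sum to 1); direction of preference per objective.
    weights = {"cost": 0.4, "emissions": 0.4, "jobs": 0.2}
    higher_is_better = {"cost": False, "emissions": False, "jobs": True}

    def normalize(objective, value):
        """Rescale a raw score to 0-1, where 1 is always the preferred end."""
        values = [c[objective] for c in consequences.values()]
        lo, hi = min(values), max(values)
        scaled = (value - lo) / (hi - lo)
        return scaled if higher_is_better[objective] else 1 - scaled

    scores = {
        name: sum(weights[obj] * normalize(obj, val) for obj, val in perf.items())
        for name, perf in consequences.items()
    }

    for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:20s} weighted score = {score:.2f}")

In a real elicitation, the weights would come from the stakeholders themselves, and sensitivity analysis would show how the ranking shifts as those weights change.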

A scaled-down version of this decision support system is now on display at the Marian Koshland Science Museum of the National Academy of Sciences in Washington, DC. It can be used by museum visitors of all ages and all levels of education to simulate the creation of a national-level energy strategy in the United States. At the time that the MSU and Koshland frameworks were designed, they were intended for making discrete decisions required for the creation of an energy strategy. For making and revising decisions through time, users would need to revisit the decision support tool (and update the energy system model, if necessary) at various intervals during the rollout of an energy strategy. By doing so, decisionmakers could evaluate existing aspects of an energy strategy by the degree to which they still reflected the current state of the science around energy systems. And, importantly, they could evaluate an existing energy strategy by the degree to which it still reflected objectives of greatest, perhaps national, concern.

Approaching decisions about energy in this way may seem like a tall order and, worse, a recipe for making large investments (for example, in infrastructure) that cannot easily be reversed. It is true that energy strategies will require large investments of this type. But technically speaking, there are ways forward. In the case of our work with MSU, for example, energy alternatives that incorporated flexible infrastructure, such as swappable fuel power-generation units, were favored over technologies that would lock decisionmakers into a particular fuel type for decades. Practically speaking, this meant that flexibility and reversibility became high-priority objectives (trumping others related to cost, for example) in the eyes of planners and policymakers.

Another example comes from the hydroelectric utility in British Columbia, where the provincial energy strategy was designed to include regular reviews of all decisions pertaining to water releases (and, therefore, electricity generation) at hydroelectric dams. These reviews are required to ensure that energy projects remain in line with the objectives of key stakeholders and the changing state of scientific knowledge about the broader social and environmental systems in which energy infrastructure resides. In both of these cases, and in others, policymakers are also beginning to recognize that following through on sunk costs, even when those costs are tied to projects that cannot easily be reversed or retasked, is not a sensible strategy in many energy decisions, because sunk costs are irrelevant to the outcome that ought to matter most—namely, future benefits.

Overall, an energy strategy needs to be flexible and adaptive so that it can incorporate what is learned over time. Admittedly, however, decisionmaking over time and the adaptive demands of making a sequence of choices add additional challenges to already difficult decisions. Fortunately, the kind of decisionmaking approach we are describing provides science-based guidance, and much-needed structure, to energy strategy development that will by necessity require multiyear (or multidecade) investments.

To this end, we are currently in the process of creating an upgraded version of the MSU decision support framework for use in developing a national energy strategy in Canada. This version of the framework includes an opportunity for decisionmakers to project decisions farther into the future, taking into account the changing tenor of the energy debate in the country. Such changes may include, for example, evolving assumptions about emerging technologies and the need for infrastructure, and the national and international demand for Canadian energy resources, which may be affected by concerns about climate change, adoption of policies that put a price on carbon, or changes in policies or behavior that may affect energy recovery, processing, or use. We are also using a similar approach to lend insight to decisions about hydraulic fracturing in oil and gas development (which has cumulative effects on environmental, economic, and social systems), pipeline-permitting processes, and carbon and climate management initiatives domestically (such as carbon capture and storage or geoengineering) and in the developing world (for example, through the United Nations Collaborative Initiative on Reducing Emissions from Deforestation and Forest Degradation).

In sum, the decision support framework outlined here encapsulates the critical decision support elements described above: clarifying problems, thinking clearly about objectives, designing creative alternatives, modeling consequences, confronting tradeoffs, and monitoring outcomes so that decisions can adapt over time. It works by breaking what is a very complex decision—the creation of an energy strategy—into a series of smaller, more manageable parts that are less prone to error and bias. Research conducted to evaluate this framework has shown that it leads to higher-quality decisions (measured by the degree to which users’ choices are internally consistent), more-satisfied and better-educated decisionmakers, and, importantly, greater trust and transparency in the process.

The road ahead

Because of the complexities associated with the energy decisions that policymakers and society face, we strongly recommend that policymakers (and researchers) turn their attention toward enhancing decision support capabilities around energy and related concerns, such as climate change. The Obama administration’s creation of a climate services portal within the National Oceanic and Atmospheric Administration, as well as other independent initiatives focused on energy, are important first steps toward this goal in that they place up-to-date information about problems and opportunities in the hands of decisionmakers. However, thoughtful and defensible decisions concerning the development of energy strategies will require more than high-quality scientific information. Energy strategies, whether local, regional, or national, will also require a process for incorporating the values and risk tolerances of stakeholders and for linking values and facts as part of a series of thoughtful decisions over time and space.

An energy strategy needs to be flexible and adaptive so that it can incorporate what is learned over time.

In this regard, energy strategies (and the decisions that underlie them) are not vastly different from strategies that many people are familiar with and support: those relating to national defense. Strategies for national defense require investments, reinvestments, and divestments across different branches of the military. Defense strategies must also recognize the need for different investment decisions on a geographic scale, understanding that there is no one-size-fits-all approach to securing the nation. And defense strategies must be nimble in the sense that they are flexible and can shift (sometimes quickly and sometimes more slowly) in response to existing and emerging national security threats.

Likewise, the development of energy strategies will require different levels of investment in different kinds of energy-generating technologies (and perhaps in technologies for managing carbon dioxide and other greenhouse gases). In a country as large as the United States, those decisions will need to be responsive to and respectful of different needs and constraints in different geographic locations. And as boundary conditions (policies, market demands, and environmental concerns, among others) change, so too will the need for investments in different energy technologies.

Even under the best circumstances, members of the public and policymakers alike will need help in making these kinds of complex and interlocking decisions. As we have argued, decision processes are often prone to shortcuts, error, and bias. In the case of choices as important as those concerning national energy strategies, failing to address these challenges in a credible way is as irresponsible as relying on out-of-date and substandard technologies. Failing to make strides in the science and application of decision support approaches for energy development choices would be as foolish as continuing to rely on kerosene to illuminate the nation’s streets and homes.

In the end, what will separate the successful actors from the unsuccessful ones in the new world energy order is the recognition that a focus on a single approach or even a bundle of approaches at a single point in time is not the answer. Moreover, successful nations will recognize that they need to go well beyond simply providing people with a menu of energy-related offerings. The real need is to provide people with a mechanism for making a series of difficult and interrelated choices among them over time. This is the only way to avoid ideological stalemate. When viewed in this light, the real product of a national energy strategy is not a particular outcome. Instead, it is a sensible, credible, and defensible decisionmaking process.

Archives – Summer 2012

J. P. WILSON, The Academy by Moonlight, Oil on canvas, 28.25 x 34.25 inches, 1925.

The Academy by Moonlight

Draftsman James Perry Wilson (1889-1976) was an associate of architect Bertram Grosvenor Goodhue during the design and construction of the National Academy of Sciences building. He was also a self-taught plein air painter, well known for painting backdrops such as the ones in the American Museum of Natural History’s Hall of North American Mammals. This painting is unusual because Wilson combined his architectural rendering skills with his fascination with capturing the quality of moonlight.

The NAS building, which had been closed for extensive restoration, is now open to the public.

What Makes U.S. Energy Consumers Tick?

On October 6, 1997, during the run-up to the Kyoto Protocol negotiations in Japan later that year, President William J. Clinton described barriers to the adoption of energy-efficient technologies at a White House conference on climate change. Saying that he was “plagued” by the example of the compact fluorescent light bulb in the reading lamp in his living room, he asked himself, “Why isn’t every light bulb in the White House like this?”

The president then put his finger on a central question about consumer behavior, asking, “Why are we not all doing this? … we’d have to pay 60% more for the light bulb, but it would have three times the useful life. Therefore … we’d pay more up front, we’d save more money in the long run, and we’d use a whole lot less carbon. And why don’t we do it? Why do we have any other kind of light bulbs in our homes? So when you get right down to it, now, this is where the rubber meets the road.”
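
The arithmetic behind Clinton’s point is easy to make concrete. The following sketch, with illustrative prices and wattages (assumptions, not figures from the speech), compares the lifecycle cost of 1,000 hours of light from an incandescent bulb and from a compact fluorescent that costs 60% more but lasts three times as long:

```python
# A minimal lifecycle-cost sketch of the light-bulb example; the prices,
# wattages, and lifetimes are illustrative assumptions, not figures from
# the speech.
ELECTRICITY_PRICE = 0.10          # assumed price per kilowatt-hour, dollars

def cost_per_1000_hours(bulb_price, watts, life_hours):
    """Cost of 1,000 hours of light: bulb replacements plus electricity."""
    bulbs_needed = 1000 / life_hours
    energy_kwh = watts * 1000 / 1000          # W x 1,000 h / 1,000 = kWh
    return bulbs_needed * bulb_price + energy_kwh * ELECTRICITY_PRICE

# Incandescent: cheap bulb, short life, high wattage
print(cost_per_1000_hours(bulb_price=1.00, watts=60, life_hours=1000))   # ~$7.00
# Compact fluorescent: costs 60% more, lasts three times as long, draws less power
print(cost_per_1000_hours(bulb_price=1.60, watts=15, life_hours=3000))   # ~$2.03
```

Even with the higher purchase price, the efficient bulb wins handily on lifecycle cost, which is precisely the puzzle the rest of this article explores.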

His remarks still resonate today. They highlight fundamental and persistent challenges in the United States to the use of energy efficiency as a potent tool for efforts to mitigate climate change, strengthen national energy security, and realize the economic benefits of a comprehensive, forward-thinking energy policy. These obstacles include individual and collective attitudes and behavior, household economics, and a paucity of readily available information on the benefits of energy-efficient consumer technologies. Understanding and managing such obstacles is the realm of the social sciences rather than technology and engineering, yet the social science of energy efficiency remains underappreciated by those who are best positioned to institute policies to promote energy efficiency as a solution to energy and climate problems.

Every president since Richard Nixon has devised a plan for changing the U.S. energy system, yet each one has failed to meet its objectives. The good news is that during the past four decades, the commercial availability of many advanced, efficient, and cleaner energy technologies has increased while their costs have fallen substantially. Partly as a result of the new technologies, the United States has steadily reduced the energy intensity of its economy. Nevertheless, it still ranks 134th among nations in the overall energy efficiency of its economy. Even among its industrialized economic competitors, the United States compares poorly in terms of energy intensity: According to data from the U.S. Energy Information Administration, it is about 38% more energy-intensive than Germany and Japan.

Countless careful studies of the energy system, including those from the National Academies and the President’s Council of Advisors on Science and Technology, have provided a clear idea of the technologies needed to transition to a less carbon-intensive economy, yet the U.S. energy system of today looks much like the one of four decades ago. The Department of Energy (DOE), though tasked with funding and conducting “use-inspired” research, devotes little time or investment to studying how newly developed energy technologies ultimately succeed or fail in the marketplace and how they affect U.S. society.

Furthermore, the vast majority of the DOE’s investment in energy research, development, and demonstration has focused on supply-side technologies. Since 1985, the DOE has dedicated only 19% of its research, development, and demonstration (RD&D) spending to energy-efficient technologies, and nearly half of the total has gone to advanced vehicle technologies. The federal government has paid virtually no attention to the energy-related social sciences, yet the energy savings achievable through behavioral changes and the adoption of existing technologies are in many cases larger, cheaper, and more immediate than those achievable through further technology development, at least in the near term. More federal support for technological RD&D is certainly sorely needed, but those investments could be significantly leveraged through the application of social and behavioral research on technology acceptance and use.

Picking low-hanging fruit

Household energy consumption for space heating, appliances, lighting, and personal transportation is responsible for nearly 40% of carbon emissions in the United States. The potential benefits of greater energy efficiency in the household sector are large; a 2009 study by Thomas Dietz and colleagues found that annual greenhouse gas emissions from the residential sector could be reduced by 20% within 10 years by employing 17 types of behavioral interventions, such as weatherizing houses or properly maintaining vehicles and heating, ventilation, and air conditioning equipment. This level of reduction equates to 7.5% of total U.S. greenhouse gas emissions. The study did not assume 100% adoption of each intervention; rather, drawing on empirical data of adoption rates for previous health and environmental behavior interventions, the researchers estimated potential adoption rates ranging from 15% for carpooling to 90% for weatherization.
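
For readers who want to see where the 7.5% figure comes from, the arithmetic is simply the household share of national emissions multiplied by the achievable household reduction (shares approximated from the text, not recomputed from the Dietz et al. data):

```python
# Rough arithmetic behind the 7.5% figure, using the approximate shares
# quoted in the text rather than the underlying study data.
household_share = 0.38    # households account for "nearly 40%" of U.S. emissions
household_cut = 0.20      # achievable cut in household emissions within 10 years
print(f"{household_share * household_cut:.1%} of total U.S. emissions")  # ~7.6%
```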

Notably, the analysis included only interventions that are currently and broadly available, are low- to no-cost, and do not require major lifestyle changes. Even greater reductions could be achieved through actions that require greater lifestyle adjustments, such as living closer to work or telecommuting, and through the adoption of novel technologies that are currently on the verge of mass-market penetration, such as heat-pump water heating and air conditioning.

The study also did not include the emissions savings achievable through federally mandated improvements in the energy efficiency of appliances and lighting use, such as the phase-out of inefficient light bulbs stipulated by Congress in the Energy Independence and Security Act of 2007. Indeed, the emissions savings achievable through new energy-efficiency regulations could equal those possible through behavioral interventions. Thus, changes in household behavior and advanced technology adoption, coupled with new government efficiency standards, could reduce household greenhouse gas emissions by 15% of total U.S. emissions within 10 years, with low costs or even positive financial returns to consumers.

Engaging consumers

Achieving these results will require developing more effective strategies to promote the adoption of energy-efficient technologies and practices. Up-front cost is a major barrier to the adoption of more-efficient technologies, and this consideration often outweighs the potential for long-term savings in consumer decisionmaking processes. On the behavioral side, barriers to household actions include existing regulations, infrastructure issues, limited consumer choice, and a lack of information about the energy (and cost) savings achievable through behavioral changes.

To overcome current barriers, behavioral interventions will need to be coupled with properly designed policies aimed at facilitating their adoption. Drawing on 30 years of social science research on consumer behavior and decisionmaking, the researchers who conducted the 2009 study developed six design principles for effective technology deployment programs.

First, outreach programs should focus on the actions and technologies that are likely to have the greatest impact; that is, those with the most technical potential and the greatest potential to change behaviors and attitudes among the largest number of individuals. Second, where applicable, the financial incentives must be sufficient to get people’s attention. Third, an effective marketing campaign must be put into action. Fourth, credible and accessible information must be made available to the consumer. Fifth, participation in the program must be simple and easy. Finally, a trustworthy quality-control mechanism must be in place to ensure that products and services meet expectations.

It is instructive to compare two recent federal programs in light of their fidelity to these principles: the low-income weatherization assistance program and energy-efficiency tax credits funded by the 2009 American Reinvestment and Recovery Act (ARRA), and the 2009 vehicle trade-in program known as “Cash for Clunkers,” which like ARRA was designed primarily as an economic stimulus measure.

Although serious questions remain about whether Cash for Clunkers achieved its stated policy goals in a cost-effective manner, there is little question that it was wildly popular: The initial $1 billion allocation was claimed within a month, and a supplemental $2 billion appropriation was used up shortly thereafter. Nearly 700,000 cars were scrapped in only two months. The program was successful in catalyzing public acceptance because it met the criteria listed above: The financial reward (a $3,500 to $4,500 rebate) was large and immediate; trustworthy information on the product was readily available; the program was publicized through an extensive, industry-financed marketing campaign; participation was easy, since car dealers handled most of the paperwork; and the product was of known quality.

In contrast, although the financial incentives for participation in ARRA-funded weatherization and energy efficiency tax credit programs are generous, these programs often suffer from poor marketing, delayed incentives, burdensome paperwork, and uncertain product quality.

Greater attention should be paid to addressing these barriers by applying existing behavioral and social science research to energy policy and by fostering new research on a number of critical questions. The most productive strategy will be to identify and promote the behaviors and technologies that can have the greatest impact on energy consumption and simultaneously to address the many barriers to these choices through major outreach campaigns. The failure to fulfill any of the six stated principles can block progress, yet many programs focus on satisfying only one or two tenets and thus do not gain much headway at the consumer level. As President Clinton suggested, simply providing consumers with pertinent information on energy savings, though important, is not sufficient to effect change.

Energy efficiency in transportation

Cash for Clunkers unintentionally illustrated the difficulty of achieving cost-effective emissions reductions through financial incentives. To be sure, a primary goal of the program was to rescue the U.S. auto industry, but a second policy objective was to improve the efficiency of the U.S. passenger fleet. Analyses of the results of the program by researchers from Stanford University and the University of California at Davis found that the program’s cost was between $162 and $500 per ton of carbon dioxide emissions avoided. This range considerably exceeds the estimated $28 per ton carbon dioxide abatement cost of the cap-and-trade regime included in the painstakingly negotiated American Clean Energy and Security Act of 2009, as computed by the Congressional Budget Office. Research on incentivizing hybrid vehicle purchases suggests that reducing the program’s cost to be comparable with other carbon-mitigation strategies on a per-ton basis would probably have required lowering the rebate to such an extent that it would have discouraged public participation.
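
The cost-per-ton figure is just the subsidy divided by the emissions it displaces. A back-of-the-envelope version, using assumed values rather than the numbers in the Stanford and UC Davis analyses, shows how easily such a program lands in the hundreds of dollars per ton:

```python
# Illustrative calculation only: the rebate and the avoided-emissions figures
# below are assumptions, not values from the Stanford or UC Davis analyses.
rebate = 4_000.0              # assumed average rebate per vehicle, dollars
tons_avoided = (8.0, 25.0)    # assumed lifetime CO2 avoided per trade-in, tons

for tons in tons_avoided:
    print(f"{tons:.0f} tons avoided -> ${rebate / tons:,.0f} per ton of CO2")
# Roughly $160 to $500 per ton, bracketing the range reported in the studies.
```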

Why don’t vehicle purchasers simply demand greater fuel efficiency in the marketplace? Their resistance runs counter to rational economic thought, because the extra cost of efficiency technologies, such as hybrid gas-electric power trains, can be recouped over time through lifetime fuel savings, even without taking into account vehicle purchasing credits and other incentives. Barriers to greater acceptance include the difficulty of estimating fuel savings, the inherent complexity of the process of purchasing vehicles, and the many competing attributes that consumers look for in a vehicle, such as body styling, safety appliances, and luxury features.

A further barrier is the huge variability in fuel savings across hybrid vehicle models. Not all hybrids are the same. In many cases, the greater fuel efficiency of hybrid gas-electric engines has been used to boost engine horsepower rather than to improve fuel economy, as measured by miles traveled per gallon of gas. Dissatisfaction with the fuel economy of certain hybrid models, and a lack of understanding of the difference between fuel efficiency and fuel economy, may at least partially explain why only 35% of hybrid car owners buy another hybrid vehicle.

Compounding the problem, federal income tax incentives for purchasing more-efficient cars do not appear to have been effective, because their returns are delayed and uncertain. A 2011 Harvard Kennedy School study of U.S. hybrid vehicle purchases found that the type of tax incentive offered was as important as the generosity of the incentive: sales tax waivers were associated with more than a tenfold increase in hybrid sales relative to income tax credits of similar value. Rising gasoline prices were associated with greater hybrid vehicle sales, but this effect operated almost entirely through vehicles with very high fuel economy, because “mild” hybrids, such as the Honda Civic, offered only marginal fuel economy improvements. Social preferences, most notably environmentalism, and access to high-occupancy vehicle lanes were also found to be significant factors in consumer adoption.

Social forces at work

Consumers often turn to trusted acquaintances, such as friends and neighbors, for information on the comparative benefits of energy-efficiency improvements. Social networks and early technology adopters are thus important mechanisms by which accurate information on energy efficiency is disseminated within neighborhoods and other social circles. To continue with the hybrid car example, social norms have proven essential to the diffusion of hybrid vehicle technology. Matthew Kahn at the University of California at Los Angeles has examined the effect of high concentrations of “green” voters in California counties on the ownership of “green” and “nongreen” vehicles. His regression analysis found that as the Green Party share of voters increased from 0% to 4%, the predicted count of registered Priuses per census tract increased from 2.2 to 46.2. Strikingly, only the Prius among hybrid cars showed such a large positive correlation with political affiliation. Kahn attributed this effect to social interactions and the perception of the Prius as the “greenest” vehicle to buy.
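
To illustrate the kind of count regression being described, the sketch below fits a Poisson model to simulated census-tract data. The coefficients and the data are hypothetical, chosen only so that the predictions echo the 2.2-to-46.2 range Kahn reported; they do not reproduce his model or his controls.

```python
import numpy as np
import statsmodels.api as sm

# Simulated tract-level data (assumptions for illustration only)
rng = np.random.default_rng(0)
n = 500
green_share = rng.uniform(0.0, 0.06, n)       # hypothetical Green Party vote share
lam = np.exp(0.8 + 75 * green_share)          # assumed log-linear mean Prius count
prius_count = rng.poisson(lam)

# Poisson regression of Prius counts on green vote share
X = sm.add_constant(green_share)
fit = sm.GLM(prius_count, X, family=sm.families.Poisson()).fit()

# Predicted Prius counts per tract at 0% and 4% green vote share
new_x = sm.add_constant(np.array([0.0, 0.04]), has_constant="add")
print(fit.predict(new_x))                     # roughly 2.2 and 45 in this toy setup
```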

In an intriguing 2007 survey of hybrid vehicle owners that deals a blow to economic rationality, Ken Kurani and Thomas Turrentine of the University of California at Davis found not a single household that analyzed its fuel costs in a systematic way, and almost none that factored gasoline costs into the household budget. The high fuel economy of the hybrids purchased by these households signified some other important value. The researchers found that some buyers of highly efficient vehicles were attracted by the new technology, others by the environmental benefits or a sense of “living lighter,” but that potential financial savings motivated none.

In light of these observations, does providing consumers with more information actually enable them to make more-informed decisions about energy efficiency? Here, the main issues are information overload (too much information can lead to analytic paralysis) and the structural obstacles that consumers face in adopting new, highly efficient technologies. More research is needed on how to design information to be easily understood, how to disseminate this information through trusted sources, and where well-designed information has the highest impact.

There also is the question of who should provide the information. A developing area of interest is the potential role of service providers, including realtors, mortgage agents, and service technicians, in educating households on opportunities to improve building efficiency, either during the home purchasing process or during large renovations or additions.

From the consumer standpoint, technology adoption is affected not only by price but also by payback time: What annual return in energy savings do consumers require before they will use a technology? Because this rate is profoundly affected by social norms and behavioral considerations, social science research can provide guidelines on how to reduce it so the up-front cost of energy efficiency becomes less of a deterrent.
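
One simple way to express this quantity is as a payback period or, equivalently, an implied annual return; the figures below are hypothetical:

```python
# A sketch of the payback framing described above; the cost premium and
# savings figures are hypothetical.
upfront_premium = 1_200.0     # assumed extra cost of the efficient option, dollars
annual_savings = 200.0        # assumed yearly energy-bill savings, dollars

simple_payback_years = upfront_premium / annual_savings    # 6.0 years
implied_annual_return = annual_savings / upfront_premium   # ~17% per year

print(f"payback: {simple_payback_years:.1f} years, "
      f"implied return: {implied_annual_return:.0%} per year")
```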

To overcome current barriers, behavioral interventions will need to be coupled with properly designed policies aimed at facilitating their adoption.

More research is also needed on the “rebound” effect, in which the energy savings from efficiency improvements are partially offset by corresponding increases in energy use. Two types of rebound, direct and indirect, have been observed. Direct rebound refers to situations in which more-efficient energy technologies lead to increased use of those same technologies. One example is when households use the financial savings from more efficient home heating equipment to heat their home to a greater extent. Indirect rebound refers to cases where the savings from efficiency gains are used to purchase other energy-intensive goods and services, either at the individual level or because of increased economic activity across society.

In theory, the rebound effect can be quite large, as can happen, for example, if drivers of hybrid vehicles find that they can now drive twice as far on the same gallon of gasoline (a case of direct rebound). In practice, direct rebound effects rarely approach the savings from energy-efficiency improvements: For air conditioning, space heating, and transportation, the effect is generally in the range of 10 to 30%, whereas for lighting and appliances, it typically is less than 20%.
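
The net effect of direct rebound is straightforward to compute; in the sketch below, the 10 to 30% range comes from the text, while the gross-savings figure is an illustrative assumption:

```python
# Net savings under a direct rebound effect, using the 10-30% range quoted
# in the text; the gross-savings figure is an illustrative assumption.
gross_savings_kwh = 1_000.0
for rebound in (0.10, 0.30):
    net = gross_savings_kwh * (1 - rebound)
    print(f"direct rebound of {rebound:.0%} -> net savings of {net:,.0f} kWh")
```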

Indirect rebound is often more difficult to measure, but in some cases it may exceed the magnitude of direct rebound. It should be kept in mind, however, that to the extent that indirect rebound effects are coupled to increased economic activity or standards of living, they are not necessarily an undesirable phenomenon, particularly for economically disadvantaged populations. Nevertheless, there is interest among researchers in exploring ways to reduce rebound. For example, what would discourage drivers from responding to increased fuel economy by driving more miles? Would a gradually escalating gas tax be as effective as suggested by current models, or is there an effective upper limit to vehicle use that eliminates the need to create financial disincentives for increased energy use?

Major challenges remain

Reducing the energy intensity of the U.S. economy is a huge undertaking that will certainly require strong and committed leadership. Political will to devise ambitious and strategic energy policy is feeble, resistance from some interests is formidable, and the public does not appear to pay much attention to energy policy. Compared with current knowledge about the technological options, there is only a rough understanding of how society shapes the energy system, and how the system, in turn, affects society. In the United States, local leadership as well as public and corporate support for major energy policy changes have enabled some local and state governments to enact strategic and dramatic changes in policy. Similar successes have occurred in other countries.

A report from the American Academy of Arts and Sciences, Beyond Technology: Strengthening Energy Policy through Social Science, issued in 2011, concluded that the social sciences could promote energy efficiency and other advanced energy technologies through better-informed energy policy decisions. The report highlighted several existing social science tools that could be used immediately to make energy policy and programs more effective, and the study committee also raised critical questions that still need to be examined rigorously. Questions included the following: Where large up-front financial barriers to the adoption of energy-efficient technologies exist, what policies most effectively persuade consumers to adopt those technologies? Which policies do so most cost-effectively? Numerous jurisdictions around the country have experimented with policies to reduce or eliminate upfront financial barriers, but which of these are the most successful and why? How can governments and utilities effectively market energy-efficiency programs?

Existing social science knowledge can help formulate marketing campaigns that take into account intrinsic and extrinsic motivations, personally relevant and nontechnical information, social norms and relationships, and technologies that are accommodated by pervasive routines and lifestyles. Still other energy policies, however, suffer from a lack of social science research about their effectiveness. Although jurisdictions around the country have adopted building standards for energy efficiency, few data are available about how well they are enforced. The nation relies increasingly on voluntary industry standards, such as LEED and Energy Star, but far too little is known about who participates in the programs, how to improve participation rates, and whether the programs deliver the kinds of returns they promise.

Indeed, scientific investigation of people, firms, institutions, and behavior can help generate better understanding of how to reduce the energy intensity of the nation’s economy. To this end, the President’s Council of Advisors on Science and Technology, reporting in 2010 on its in-depth examination of the U.S. energy innovation system, recommended that the DOE and the National Science Foundation launch an interdisciplinary social science research program to address the nontechnical barriers to cleaner and more efficient energy technologies. The National Academies’ study America’s Climate Choices, published in 2011, also recommended establishing an integrative, interdisciplinary research enterprise that includes the social sciences.

Charting an agenda

The American Academy’s Beyond Technology report presents a preliminary social science research agenda to understand the societal and institutional challenges to achieving an alternative energy future, recognize the pioneering work in this area, and identify knowledge gaps. The agenda is grouped into three clusters. The first grouping, comprising individual behavior, decisionmaking, and technology acceptance, includes questions such as: How does the private sector market products, and what is the applicability for products with social benefits? How are energy-related norms and behaviors influenced by social networks? What are the effective and ineffective strategies for engaging the consumer in pricing strategies (such as time-of-use electric billing, which charges higher rates during peak energy-use periods and lower rates during off-peak periods) that contribute to more efficient energy use?

The second grouping asks how to incorporate behavior into policy analysis: How do people actually use and respond to household technologies such as smart meters, and how does this response differ from what models assume? What behavioral changes have the most economic and technical potential?

The third cluster of research questions relates to policy development and regulations. Key questions include: How do jurisdictional conflicts (especially between state and federal policies) impede public/private partnerships? What is the relative effectiveness of existing energy policies? How does the United States compare with other countries? Ongoing work at the American Academy is highlighting these research needs and tackling some of the most pressing questions.

Striving for consilience

As a bottom line, the reason for fortifying the link between energy policy development and the social sciences boils down to two crucial observations. First, although transforming the energy system could provide vast social benefits, achieving large reductions in the energy intensity of the U.S. economy will require significant societal changes. The nation can change not only its sources of energy but also how energy is used, delivered, priced, and regulated. The social and behavioral sciences can illuminate much about how social processes might help shape—and drive—that change. Second, the social sciences can help in overcoming barriers to taking sensible steps to change the system in the near term, a classic example being the “efficiency paradox,” in which residential and business consumers choose not to make improvements in energy efficiency that would in fact bring rapid financial benefits.

In his book Consilience, published in 1998, the biologist E. O. Wilson posited that most real-world problems exist at the intersection of different disciplines: “Only fluency across boundaries will provide a clear picture of the world as it really is, not as seen through the lens of ideologies and religious dogmas or commanded by myopic response to immediate need.” Energy problems—perhaps most clearly in the case of climate change—are excellent examples. Indeed, as the President’s Council of Advisors on Science and Technology, the National Academies, and the American Academy have all recognized, truly interdisciplinary research on the U.S. energy system is long past due. Many observers are now asking how to create green jobs, how to better compete in the global marketplace, and how policy hinders or helps the innovation enterprise. But research funding to answer such questions in a rigorous way is essentially nonexistent. At present, there are few obvious federal sources of support for social science or interdisciplinary research on energy, nor are there abundant private resources.

As the United States struggles to responsibly address massive energy-related challenges, including climate change, energy poverty, energy security, and imperatives for economic growth, it will be necessary to balance intellectual and financial investment in the physical and natural sciences and engineering with a commitment to the social sciences. The consilience of all available intellectual resources is necessary if the nation is to achieve an alternative energy future better suited to current and future needs.

From the Hill – Summer 2012

R&D funding receives some good news amid major uncertainty

Despite continuing calls from many in Congress to cut spending, the fiscal year (FY) 2013 appropriations process has generally been positive for a number of R&D agencies, with some key science funders seeing surprising increases in the early going.

However, any good news must be tempered by the realities of the current fiscal climate. No funding legislation has been finalized to date; nor is it clear how many spending bills, if any, will be finalized and signed into law before the November elections. Equally unclear is the path to addressing the automatic across-the-board cuts known as sequestration, scheduled to begin in January 2013. These cuts would amount to roughly 8% for nondefense spending and 10% for defense, and although Congress and the administration are clearly concerned about the extent of these cuts, no agreement to roll them back is yet in sight. Further, several proposals have emerged to shift the onus of these cuts to nondefense spending in order to protect the military. If this happens, it could result in severe long-run consequences for R&D funding.

Thus far, a handful of spending bills, including those funding the National Science Foundation (NSF), the National Aeronautics and Space Administration (NASA), and the Departments of Commerce, Energy, Homeland Security, and Veterans Affairs have been passed by the full House. None have passed the Senate, though several have made it through the Senate Appropriations Committee.

The House NSF/NASA bill is more generous in many areas than might have been expected, although R&D at most agencies would fall somewhat short of President Obama’s request. NSF R&D would receive a $221 million or 3.9% boost over FY 2012 levels, more than the Senate Appropriations Committee has approved in its version of the bill. The House also defeated a proposed floor amendment to cut an additional $1 billion from NSF, although amendments prohibiting funding for the Climate Change Education Program and for political science research at NSF passed.

NASA R&D would receive an increase of 1.3% or $123 million in the House-passed bill, with cuts to space exploration more than offset by increases elsewhere. The current Senate version would be somewhat more generous, providing a 2.5% or $230 million increase. Both chambers approved more spending than the administration requested for the Science Directorate, and both would seek to restore funding to NASA’s planetary science program in response to the administration’s proposed cuts. Many of the increases the administration has sought for R&D at the Department of Commerce appear to be holding up, with the House and Senate voting to increase R&D spending by more than 10% at the National Oceanic and Atmospheric Administration and the National Institute of Standards and Technology (NIST).

The House has been less generous at the Department of Energy. Although atomic defense R&D would be increased by 8.4% or $361 million, Office of Science R&D would be cut by 1.6% or $69 million, and clean energy and energy efficiency R&D would be slashed, as would the Advanced Research Projects Agency-Energy. The current Senate version of the bill, conversely, grants several programs modest increases from FY 2012, getting somewhat closer to the president’s request in many instances. Elsewhere, the House Appropriations Committee passed the Defense spending bill on a voice vote. The bill would reduce Department of Defense R&D spending by 1.1% or $793 million. However, this figure would still be 1.5% or $1.1 billion more than the administration’s request. Basic research would remain flat.

The Senate committee has voted to keep Department of Agriculture funding flat, although some key research programs would receive substantial boosts.

In the Department of Homeland Security, Science and Technology Directorate R&D would receive a 30% increase under both the House and Senate bills.

The EPA issues new air pollution rule; key Republicans object

Despite objections from some key Republican lawmakers, the Environmental Protection Agency (EPA) issued a final rule on air pollution for the oil and natural gas industry on April 18, including the first-ever national standards on air pollution from hydraulically fractured gas wells.

The New Source Performance Standards (NSPS) for crude oil and natural gas production require onshore gas wells to use reduced emissions completions (also known as “green completions”) to capture volatile organic compound (VOC) emissions that escape during the fracturing process. However, as a concession to industry groups that expressed concerns about having access to the technology in time for the rule’s enactment, the regulation allows for a two-year transition period, during which completion combustion devices, such as flares, can be used to burn off any gas that is released.

The EPA rule also provides emission standards for storage vessels and certain controllers and compressors, while revising previous VOC and sulfur dioxide emission standards for natural gas processing plants. In total, the EPA estimates that the new rule will reduce annual emissions of VOCs by 190,000 to 290,000 tons, air toxics by 12,000 to 20,000 tons, and methane by 1 to 1.7 million short tons by the time it is fully enacted in January 2015. This translates to an estimated cost savings of $11 million to $19 million, according to the agency.

Howard Feldman, the director of Regulatory and Scientific Affairs for the American Petroleum Institute, signaled support, although not a definitive endorsement, for the changes in the regulation. Some environmental groups applauded the new standards but pushed for further federal regulation of hydraulic fracturing.

Some members of Congress objected to the change. A few days before the final ruling was issued, House Energy and Commerce Chairman Fred Upton (R-MI), Ed Whitfield (R-KY), who chairs the panel’s Energy and Power Subcommittee, and committee chairman emeritus Joe Barton (R-TX) sent an open letter to EPA Administrator Lisa Jackson expressing their reservations about the proposed ruling. “We are concerned about the rule’s potential to adversely impact both the near- and long-term production of oil and natural gas, including unconventional resources, at a time when domestic production is increasingly important to our national economy, jobs and consumers,” they said. Sen. James Inhofe (R-OK), ranking member on the Senate Committee on Environment and Public Works, wrote a similar letter, criticizing the EPA’s process for developing the standards.

Senate panel considers national standards for forensic evidence

The Senate Committee on Commerce, Science, and Transportation held a hearing on March 28 to address the federal government’s role in establishing scientific standards for forensic evidence.

Currently, there are no national standards in forensic science, leaving interpretation of evidence, such as DNA and fingerprint matching, up to individual scientists and technicians. The rapid development of evidence-based standards is crucial because, said Sen. Tom Udall (D-NM), many prosecutors and lawmakers assume that forensic evidence has undergone rigorous scientific review, a misconception propagated by popular television shows.

Chairman Jay Rockefeller (D-WV) opened the hearing by outlining the unique position of forensic science: Although there are many fields of forensics, the discipline does not have a culture of science with a peer review process. Rockefeller announced his intention to prioritize the introduction of science into forensics, using the work of NSF and NIST for guidance. Sen. John Boozman (R-AR) discussed the importance of science-based forensic standards for homeland security and the U.S. justice system, citing a 2009 National Academy of Sciences report, which called for the standardization of forensics.

The three witnesses at the hearing cited the need for more research to develop these standards. Eric Lander, co-chair of the President’s Council of Advisors on Science and Technology, described his experience as a scientific expert during one of the first DNA fingerprinting cases in the United States. He emphasized the need for collaboration between the science and law communities to develop standards for what constitutes a “match” and associated probabilities in forensic evidence.

NIST Director Patrick Gallagher agreed, saying his mission was to develop a science-based national system of measurements for forensics, in collaboration with the Department of Justice (DOJ). He defended the $5 million request for the initiative in the president’s FY 2013 budget and discussed allotment of the funding to priority program areas, such as the development of new reference methods and technologies for understanding crime scenes and identifying criminals.

NSF Director Subra Suresh highlighted the role of his agency in the development of forensics standards through its funding of basic research. Between 2009 and 2011, more than 100 NSF grants were awarded to support forensics research and education.

There was also discussion of necessary infrastructure to facilitate standards development. Boozman’s proposal to create an independent Office of Forensic Science at the DOJ was hailed as a good mechanism for interdisciplinary communication and collaboration by Gallagher, although Lander wondered if it would be able to do more than simply identify areas in need of work.

Cybersecurity bills advance in House, Senate

Congress continues to debate cybersecurity legislation, with the House passing the Cyber Intelligence Sharing and Protection Act (CISPA, H.R. 3523) and members of the Senate pushing for consideration of the Cybersecurity Act of 2012 (S. 2105).

Sponsored by House Select Intelligence Committee Chairman Mike Rogers (R-MI) and ranking member Dutch Ruppersberger (D-MD), CISPA passed the House by a vote of 248 to 168, with 42 Democrats voting for the bill and 28 Republicans voting against it. The legislation would remove legal barriers preventing the government and private companies from sharing information regarding cyberattacks and network security. It would also limit the federal government’s jurisdiction in seeking cybersecurity information.

CISPA passed a day after the Office of Management and Budget released a statement saying that President Obama would veto the bill if it reached his desk. The White House expressed concern that the bill does not adequately address individual privacy concerns and national infrastructure vulnerabilities. The bill has been referred to the Select Committee on Intelligence for consideration.

CISPA is similar to a Senate bill, the Strengthening and Enhancing Cybersecurity by Using Research, Education, Information, and Technology Act of 2012 (SECURE IT, S. 2151), in its focus on voluntary information sharing. SECURE IT, sponsored by Sen. John McCain (R-AZ) and seven other Republicans, also includes clauses meant to strengthen criminal penalties for cybercrimes. The legislation has been referred to the Senate Committee on Commerce, Science, and Transportation. A House version of the bill (H.R. 4263), sponsored by Rep. Mary Bono Mack (R-CA), is currently in several committees.

Meanwhile, the bipartisan Cybersecurity Act of 2012, sponsored by Homeland Security and Governmental Affairs Committee Chairman Joe Lieberman (I-CT) and Ranking Member Susan Collins (R-ME), would empower the Secretary of Homeland Security to conduct a top-level assessment of cybersecurity risks and develop requirements for securing critical infrastructure. It would also provide more privacy protection than CISPA or SECURE IT.

The White House has endorsed the Cybersecurity Act of 2012, although a coalition of several civil liberties groups has come out against it, saying the bill does not provide adequate privacy protection. The coalition, including the American Civil Liberties Union and the Center for Democracy and Technology, is concerned that the bill would allow military spy agencies access to personal information and permit the federal government to use that information during unrelated criminal investigations. Sen. Ron Wyden (D-OR) has expressed similar concerns about the bill’s privacy protections. Although Democratic senators have said they are willing to add in more privacy protection clauses, House Republicans say they will not support any legislation that includes new regulations.

In the meantime, the House has passed several other cybersecurity bills. The Federal Information Security Amendments Act of 2012 (H.R. 4257) would update the Federal Information Security Management Act (FISMA) to increase the responsibility of federal agencies to update information security infrastructure. The Cybersecurity Enhancement Act of 2012 (H.R. 2096) directs federal agencies participating in the National High-Performance Computing Program to draft and implement a Congress-approved cybersecurity R&D plan. The bill was introduced by Rep. Michael McCaul (R-TX) and passed the House by a vote of 396 to 10. House Science, Space, and Technology Committee Chairman Ralph Hall (R-TX) sponsored the Advancing America’s Networking and Information Technology Research and Development Act of 2012 (H.R. 3834), which passed the House by a voice vote. The legislation would update the High-Performance Computing Act of 1991, which established the National High-Performance Computing Program.

Federal science and technology in brief

  • The National Institutes of Health has announced a new program to match researchers with a selection of pharmaceutical industry compounds in order to promote academic research to search for new treatments. The National Center for Advancing Translational Sciences will initially partner with Pfizer, AstraZeneca, and Eli Lilly and Co., which have agreed to make dozens of their compounds available for a pilot phase. The initiative, Discovering New Therapeutic Uses for Existing Molecules, provides templates for handling intellectual property used in or developed through the program. Industry partners will retain ownership of their compounds, while academic research partners will own any intellectual property they discover, with the right to publish the results of their work.
  • The Obama administration released its National Bioeconomy Blueprint, outlining steps that agencies will take to drive economic activity powered by research and innovation in the biosciences. Areas of focus include energy, translational medicine, agriculture, and homeland security.
  • The U.S. Global Change Research Program has released the administration’s 10-year strategic plan for global change research. According to the press release, the strategy will expand “to incorporate the complex dynamics of ecosystems and human social-economic activities and how those factors influence global change.”
  • The EPA released its 17th annual inventory of overall emissions for six greenhouse gases. The total emissions for 2010, equivalent to 6,822 million metric tons of carbon dioxide, represent a 3.2% increase over 2009 levels. The EPA attributes the increase to increased energy use in all sectors of the economy, increased energy demand related to an expanding economy, and warmer weather during the summer of 2010.
  • The Food and Drug Administration has published three documents in the Federal Register to promote changes in the ways in which medically important antibiotics are used in food-producing animals. The first, a final guidance for industry called “The Judicious Use of Medically Important Antimicrobial Drugs in Food-Producing Animals,” recommends phasing out the agricultural use of medically important drugs. The second is a draft guidance for sponsors of certain new animal drug products, and the third is a draft proposed regulation for veterinary feed directives.
  • On April 24, House Science, Space, and Technology Committee Ranking Member Eddie Bernice Johnson (D-TX) introduced the Broadening Participation in STEM Education Act (H.R. 4483) at a conference organized by the National Action Council for Minorities in Engineering. The bill aims to expand the number of minorities in undergraduate science, technology, engineering, and math fields and would authorize NSF “to award grants to colleges and universities that want to implement or expand innovative, research-based approaches to recruit and retain students from underrepresented minority groups.”

“From the Hill” is adapted from the newsletter Science and Technology in Congress, published by the Office of Government Relations of the American Association for the Advancement of Science (www.aaas.org) in Washington, DC.

Data Deluge and the Human Microbiome Project

A specter is haunting science: the specter of data overload. All the powers of the scientific establishment have entered into a holy alliance to exorcise this specter: the National Institutes of Health (NIH), the National Science Foundation (NSF), and the Department of Energy, among others. What funding agency has not called for novel software to distill meaning from a torrent of data – for example, from the 700 megabytes (MB) of data per second produced by the Large Hadron Collider, the 1,600 gigabytes (GB) generated each day by NASA’s Solar Dynamics Observatory, the 140 terabytes (TB) to flow every day from the Large Synoptic Survey Telescope, or the 480 petabytes (PB) expected daily from the Square Kilometre Array? To deal with “the fast-growing volume of digital data,” the Obama administration on March 29, 2012, announced a Big Data Initiative, with $200 million in new commitments by research agencies and an additional $250 million investment by the Department of Defense to “improve the tools and techniques needed to access, organize, and glean discoveries from huge volumes of digital data.”
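
To put these instrument rates on a common footing, simple unit arithmetic (using the figures quoted above) converts each to a daily volume:

```python
# Converting the quoted instrument rates to a common daily volume; the
# figures are those cited in the text, the arithmetic is the only addition.
SECONDS_PER_DAY = 86_400
lhc_tb_per_day = 700e6 * SECONDS_PER_DAY / 1e12   # 700 MB/s     -> ~60 TB/day
sdo_tb_per_day = 1_600 / 1_000                    # 1,600 GB/day -> 1.6 TB/day
lsst_tb_per_day = 140.0                           # quoted directly as a daily volume
print(lhc_tb_per_day, sdo_tb_per_day, lsst_tb_per_day)
```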

In the biological sciences, the data deluge has become especially suffocating. Technological advances make it continually faster and cheaper to produce genomic sequence data than to store, manage, and analyze them. To keep up with the flow of data, some biologists have called for a change in the scientific method: from the traditional practice of posing a hypothesis and then gathering the data needed to test it, to a novel approach that uses mathematical tools to scan data for interesting associations. Eric Lander of the Broad Institute wrote in Nature magazine that the greatest impact of data-rich genomics “has been the ability to investigate biological phenomena in a comprehensive, unbiased, hypothesis-free manner.” Chris Anderson in Wired magazine wrote an article titled “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete.” Anne Thessen and David Patterson at the Marine Biological Laboratory in Woods Hole have called for the emergence of a “Big New Biology, focused on aggregating and querying existing data in novel ways.”

The sorcerer’s apprentice

Of the many instances of data overload, the press has given most attention to gene sequencing, which identifies the pattern of base pairs of nucleotides in a DNA fragment. In an article titled “Will Computers Crash Genomics?” Elizabeth Pennisi, writing in Science magazine in February 2011, reported that sequencing centers have produced data sets so large they have to be mailed physically on disks and drives because it could take weeks to transfer them electronically over the Internet. “A single DNA sequencer can now generate in a day what it took 10 years to collect for the Human Genome Project,” Pennisi wrote. She quoted a Canadian bioinformaticist, Lincoln Stein, who predicted that the torrent of DNA data “will swamp our storage systems and crush our computer clusters.”

“The field of genomics is caught in a data deluge,” Andrew Pollack wrote on November 30, 2011, in the New York Times. The story quotes C. Titus Brown, a bioinformatics specialist at Michigan State University. According to Brown, the NIH-sponsored Human Microbiome Project (HMP), which samples and sequences microbial populations found in the human gut and other bodily sites, has already generated about a million times as much sequence data as did the initial Human Genome Project. “It’s not at all clear what you do with that data,” he said.

Eric Green, director of the National Human Genome Research Institute (NHGRI), expressed a similar concern in February 2011 in a videotaped lecture. Green used a picture of a boy trying to drink from a fire hose and the famous image of the Great Wave by the 19th-century Japanese artist Hokusai to illustrate what he described as a “large onslaught of data sets” and a “massive tsunami” associated with the HMP and other sequencing activities. Green wrote with a coauthor in Nature, “Computational tools are quickly becoming inadequate for analyzing the amount of genomic data that can now be generated, and this mismatch will worsen.” The website of just one of many NIH sequencing initiatives—the 1000 Genomes Project—says that its data set “is currently around 130 Tb in size and growing.”

Although biologists may be uncertain how to respond to the data tsunami, they agree about its cause. A fascinating NIH webpage (http://www.genome.gov/sequencingcosts/) describes the declining “cost of determining one megabase (Mb; a million bases) of DNA sequence of a specified quality.” Advances in gene sequencing technology, many of which were prompted by NIH, have pushed down the cost from about $1,000 per Mb in January 2008 to about 10 cents today. This dime-a-Mb price includes: “Labor, administration, management, utilities, reagents, and consumables; sequencing instruments … informatics activities directly related to sequence production … ; submission of data to a public database;” as well as “indirect” costs. Who can resist a bargain like that?

The cost of storage, maintenance, and transfer, however, forms a bottleneck that keeps data from potential users. According to a principle known as Kryder’s Law, the price of data storage falls by half every 14 months. Matthew Dublin at Genome Technology magazine has written, “At present, the per-base cost of sequencing is dropping by about half every five months, and this trend shows no sign of slowing down. … Factor in all of these variables, and the logarithmic graphs start looking like signs of a data-management doomsday.”
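
The mismatch between those two halving times is what makes the graphs look apocalyptic. A minimal sketch, assuming the 5-month and 14-month halving times quoted above, shows how quickly the storage bill for a fixed sequencing budget balloons:

```python
import numpy as np

# Illustrative comparison based on the halving times quoted in the text:
# sequencing cost per base halves every ~5 months, storage cost per byte
# halves every ~14 months (Kryder's Law).
months = np.arange(0, 61, 12)                      # five years, yearly steps
seq_cost = 0.5 ** (months / 5.0)                   # relative cost of producing data
store_cost = 0.5 ** (months / 14.0)                # relative cost of storing data

# For a fixed sequencing budget, the data volume grows as 1/seq_cost, so the
# storage bill for that volume grows as store_cost / seq_cost.
storage_burden = store_cost / seq_cost
for m, b in zip(months, storage_burden):
    print(f"month {m:3d}: storage bill is {b:6.1f}x the year-0 level")
```

Under these assumptions the storage bill grows to roughly 200 times its starting level within five years, which is the squeeze the quotation describes.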

Data-management doomsday appeared to arrive in February 2011 when the National Center for Biotechnology Information (NCBI) announced that because of budgetary constraints, it would phase out its Sequence Read Archive (SRA) and other database resources. This was remarkable because the HMP in its Data Release and Resource Sharing Guidelines requires that all sequence data along with annotations and identifications be submitted to NCBI on a weekly basis. The decision to close the SRA was reversed, but for a time it seemed that genomic sequence data, like nuclear waste, would have to be stored on site until a national depository could be found. With hypothesis-driven science, investigators are often able to obtain whatever data they need either in their own laboratories or through one of many commercial DNA sequencing services. Why search through unwieldy public databases when it is so inexpensive to do your own sequencing once you know why you are doing it? In 2011, for example, a deadly outbreak of enterohemorrhagic Escherichia coli in Germany sickened almost 4,000 people, about 50 of whom died. Early in the outbreak, biologists, using a commercial sequencing service, took only a few days to identify the culprit E. coli strain and trace it to its source.

Data tsunamis build when researchers sequence first and look for questions to ask later. Is it possible to identify a logical stopping place for generating genomic data? The number of different microbiota in and around human beings, for example, is for all practical purposes infinite. The deluge rises in the absence of a framework for deciding which ones to sequence and for containing, organizing, and interpreting the data that result.

The blog of one major HMP participant, the J. Craig Venter Institute, reported on early results from “700 samples from hundreds of individuals taken from up to 16 distinct body sites.” The “data produced from the sequences exceeds 10 terabytes,” which had to be stored. “Ultimately researchers want to relate this information to healthy versus disease states in humans,” the Venter Institute blog hopefully opined, although “ultimately” does not suggest when or how this might be done. In an interview with Science magazine on June 6, 2012, George Weinstock, a principal investigator for the HMP, said, “Despite the huge amount of the work that has been done on the human microbiome, the number of rigorously proved connections between disease and microbiome are few to none.”

In a famous poem, “Der Zauberlehrling” (“The Sorcerer’s Apprentice”), Goethe describes a wizard who, as he leaves his workshop, assigns his apprentice chores, including bringing water from a well. The apprentice enchants a broom to fetch water for him, but he cannot stop the broom as it endlessly repeats its task and floods the workshop. The apprentice hacks the broom in pieces, but each piece becomes a new broom and brings more water. By forcing down the cost of gene sequencing, NIH and others have created magic brooms that endlessly fetch genomic data. At BGI, a large sequencing center, Bingqiang Wang lamented, “We are drowning in the genome data that our high-throughput sequencing machines create every day.”

The Human Microbiome Project

The HMP illustrates the problem that data deluge poses when a context or framework for interpreting those data is lacking. According to Lita Proctor, the working group coordinator, the HMP was “specifically devised and implemented to create a set of data, reagents, or other material whose primary utility will be as a resource for the broad scientific community.” Officially launched by NIH in October 2007 as a $157 million, five-year effort, the project seeks to “characterize the microbial communities found at several different sites on the human body, including nasal passages, oral cavities, skin, gastrointestinal tract, and urogenital tract.” The microbiota in and on a healthy human are thought to contain 10 times as many cells as the human body itself and to include bacteria, viruses, archaea, protozoans, and fungi. According to Proctor, the HMP will survey “a cohort of healthy adults to produce a reference dataset of baseline microbiomes.” But as microbiologist George Weinstock, associate director of the Genome Institute at Washington University in St. Louis, pointed out in a presentation, “Probably [there is] not a ‘reference’ microbiome.” The project will collect “sequences of reference strains,” although the idea or purpose of a “reference strain” is not defined.

There are genomics projects that are complex and challenging but still rely on well-understood rules of induction to try to answer a specific question, such as which genetic mutations cause Mendelian (monogenic) disease. A more ambitious effort, the genome-wide association study (GWAS), involves scanning the genomes of many people to find variations associated with phenotypic differences while controlling for environmental factors. Phenotypic traits may resist definition (“asthma,” for example, could refer to a great number of problems), and environmental factors (“lifestyle,” for example) may confound the analysis. Even if it is hard to control for genotypic, phenotypic, and environmental variance, however, at least one has some idea of what these concepts mean in GWAS research.

With the HMP it is different. Distinctions between genotype and phenotype or between genomic and environmental factors are impossible to understand conceptually, much less control experimentally. It is not clear, for example, whether the microbiome should be merged with the human genome or considered part of its environment. Are the phenotypic traits of the associated bacteria human traits or not? Are the genomes of the microbiota part of the human genome, whereas the organisms themselves are part of its environment? Questions such as these are so imponderable, so up for grabs, that they are not worth asking. In evolutionary biology, the concept of a reference organism is entirely clear; it provides the model to which developmental and other phenomena in similar organisms are compared. The HMP speaks in terms of a “reference set” of perhaps 3,000 genetic sequences but it is not clear what “reference” in this context may mean.

To create a “reference set of microbial gene sequences,” the HMP began with a “jumpstart” phase that funded four large-scale sequencing centers. The announcement stated, “This initiative will begin with the sequencing of up to 600 genomes from both cultured and uncultured bacteria, plus several non-bacterial microbes.” As costs fell, the number of reference microbial genomes rose; the HMP Working Group stated that the project “will add at least 900 additional reference bacterial genome sequences to the public database.” In a more recent document, Proctor refers to a “target catalog of 3,000 microbial genome sequences.”

The HMP jumpstart phase has made rapid progress not only in reaching its target of 600, 1,000, or 3,000 reference gene sequences but also, to quote the initial announcement, in continuing “with metagenomic analysis to characterize the complexity of microbial communities at individual body sites.” Metagenomic analysis involves sequencing mixed fragments of DNA or RNA detected in samples of organic material. In one European study, researchers produced 576.7 Gb of sequences detected in stool samples. The research team discovered 3.3 million nonredundant microbial genes, primarily bacterial, associated with what they dubbed “the fecal microbial community.” That is roughly 150 times the number of genes identified in the human genome proper. Opportunities for further sequencing abound. According to another study, “For every bacterium in our body, there’s probably 100 phages, with an estimated 10 billion of these viruses packed into each gram of human stool.”

To get some conceptual purchase on the HMP, its Working Group asked “whether there is a core microbiome at each body site.” There has been a lively debate over whether this is a meaningful question. A group of geneticists reported in Genome Biology in 2011 that “there is pronounced variability in an individual’s microbiota across months, weeks, and even days. Additionally, only a small fraction of the total taxa found within a single body site appear to be present across all time points, suggesting that no core temporal microbiome exists.” It appears safe to say that bacteria of the broad phyla Bacteroidetes and Firmicutes are found in every individual. Beyond this, according to Yale microbiologists Ashley Shade and Jo Handelsman, “what constitutes a core remains elusive.” Microbiologist Julian Marchesi opines that “when we drill down the taxonomic levels it seems that this concept becomes more sketchy and different studies and methods provide different answers.”

According to a study published in 2005, “the bacterial communities in the human gut vary tremendously from one individual to the next.” An article published in Nature in 2010 lists several studies that have shown “substantial diversity of the gut microbiome between healthy individuals.” Even identical twins present an amazing diversity in the microbiota that accompany them. A metagenomic analysis of fecal microbiota “revealed an estimated 800–900 bacterial species in each co-twin, less than half of which were shared by both individuals.”

The HMP announcement states, “Initially, 16S rRNA gene sequencing will be used to identify the microbiome community structure at each site.” This genetic sequence encodes part of the cellular machinery responsible for synthesizing proteins in bacteria and archaea and is fairly well conserved when microbes divide. So far, HMP “has produced a 2.3-terabyte 16S ribosomal RNA metagenomic data set of over 35 billion reads taken from 690 samples from 300 U.S. subjects, across 15 body sites,” according to a recent report in Nature Reviews Genetics. That a microbiome community (for example, the “fecal microbial community”) can be delimited and defined by this large-scale sequencing project cannot be assumed. Even if there is a community structure, it is not clear that 16S rRNA sequencing will identify it. In the genome of any given microbe, a dozen or more copies of the 16S gene may be found, with significant nucleotide differences among them. Microbes that differ genetically in dramatic ways, moreover, can contain 16S genes that are identical or nearly the same.

“The genus Bacillus is a good example of this,” microbiologists J. Michael Janda and Sharon L. Abbott explain. The 16S rRNA genes associated with strains of B. globisporus and B. psychrophilus are more than 99.5% the same, but at the genomic level these strains show little relatedness. Other researchers comment that because of the limited nucleotide variability in the 16S gene in bacteria, “taxonomic assignment of species present in a mixed microbial sample remains a computational challenge.”
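For readers unfamiliar with what a figure like “99.5% the same” means in practice, the minimal sketch below computes simple percent identity between two already-aligned sequences. The sequences are made-up toy strings, not real 16S rRNA data, and real pipelines rely on dedicated alignment and classification tools rather than the position-by-position comparison shown here.

```python
# Toy percent-identity calculation between two aligned sequences.
# The strings are hypothetical placeholders, not real 16S rRNA genes,
# which run to roughly 1,500 base pairs and are compared with
# dedicated alignment software.

def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions that match in two equal-length aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

a = "ACGTACGTACGTACGTACGT"
b = "ACGTACGTACGTACGAACGT"  # differs at one of 20 positions

print(f"{percent_identity(a, b):.1f}% identical")  # prints 95.0% identical
```

Two strains whose 16S genes score above 99.5% by this kind of measure can nonetheless, as Janda and Abbott note, show little relatedness across their full genomes, which is why species-level assignment from 16S data alone remains difficult.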

The assignment of species, however, may be incidental to the problem of understanding what is meant by concepts such as “community” and “structure” when applied to a fecal sample or to any collection of organic matter that may be subject to metagenomic analysis. More challenging may be the problem of determining how many discrete microbiomes occupy a bodily site. The mouth, for example, harbors any number of unique habitats, each with its own distinct populations. There does not seem to be a way to count how many microbiomes are there.

The HMP in its initial phase has succeeded in producing very large data sets, but it has yet to provide a conceptual framework beyond tagging sequences to the general locations where they are found. According to University of Michigan epidemiologist Betsy Foxman and coauthors, “As the HMP moves forward, it would benefit from the development of an overall conceptual framework for structuring the research agenda, analyzing the resulting data, and applying the results in order to improve human health.” Lita Proctor has described the same challenge. “Given that variation in the microbiome appears to be far greater than human genetic variability, repeated studies in each target population will be needed to identify keystone microbiome signatures against a complex and contextually dependent background.” The project, however, has yet to suggest meanings for concepts such as “keystone microbiome signatures” and to agree on ways to identify and re-identify microbiomes, if these exist, as entities through time and change.

Are we ecosystems?

In 2007, a team of microbiologists introduced the HMP in an inaugural article published in Nature magazine. “If humans are thought of as a composite of microbial and human cells,” they wrote, “then the picture that emerges is one of a human ‘supraorganism.’” A companion paper similarly presented the HMP in ecological terms. “Humans and their collective microbiota are segmented into many local communities, each comprising an individual human,” it stated. “This ecological pattern, characterized by strong interactions within distinct local communities and limited interactions or migration between them, is described as a metacommunity.” In her 2011 report on the progress of the HMP, Proctor likewise refers to the “human superorganism.”

The leaders of the HMP almost universally appeal to ecological concepts and metaphors to provide a conceptual framework for their research. The inaugural HMP article in Nature declared, “Questions about the human microbiome are new only in terms of the system to which they apply. Similar questions have inspired and confounded ecologists working on macro-scale ecosystems for decades.” The article continues, “It is expected that the HMP will uncover whether the principles of ecology, gleaned from studies of the macroscopic world, apply to the microscopic world that humans harbor.” Likewise, in a 2011 manifesto titled “Our Microbial Selves: What Ecology Can Teach Us,” a group of microbiologists working with the HMP proposed to “answer fundamental questions that were previously inaccessible” by using “well-tested ecological theories to gain insight into changes in the microbiome.”

The absence of a conceptual framework for interpreting HMP data becomes apparent when one asks which principles and well-tested theories in ecology can provide insight into the human microbiome. Ecologists do not have a settled idea of the “function” of an ecological community or system. They may point out that some apparently “functional” ecosystems, such as salt marshes, are monocultures. The microbiologists state, “Community ecologists are interested in what controls patterns in diversity and the dynamics of consortia in the same environment.” Ecologists have never found consensus that such patterns exist, however, nor identified any forces that control them. Environmental historian Donald Worster has written, “Nature should be regarded as a landscape of patches, big and little, patches of all textures and colors, a patchwork quilt of living things, changing continually through time and space, responding to an unceasing barrage of perturbations. The stitches in that quilt never hold for long.”

According to bioethicist Eric Juengst, HMP scientists believe “that the human body should be understood as an ecosystem with multiple ecological niches and habitats” and that “human beings should be understood as ‘superorganisms’ that incorporate multiple symbiotic cell species into a single individual with very blurry boundaries.” The architects of the HMP, Juengst has written, “describe the individual human body as itself an ecosystem.” Researchers “almost universally declare human beings to be ‘superorganisms’ rather than discrete biological individuals, rendering our personal boundaries fluid and flexible.” This fundamentally changes how the patient in a medical context is portrayed, not as an individual but as an ecosystem.

How well does the ecological analogy work in medicine? Microbiologists at the University of Colorado involved in microbiome research have opined, “Diversity might also have a crucial role in ecosystem health by contributing to stability.” Does microbial diversity have a role in human health? Are more microbes of more kinds better for you? The stability-diversity hypothesis and the idea of “ecosystem health” have been so roundly criticized by ecologists that they have largely abandoned these concepts. According to three prominent ecologists, Volker Grimm, Eric Schmidt, and Christian Wissel, “The term ‘stability’ has no practical meaning in ecology.” Indeed it cannot have any meaning because ecology lacks identity conditions for ecosystems—that is, criteria by which to determine when a site remains the same or becomes a different ecosystem through time and change. This suggests a difference between humans and ecosystems. A patient may die but the superorganism or metacommunity lives on, as would any ecosystem, even if perturbed. Death may increase biotic diversity and therefore ecosystem health in the microbiome of the cadaver. Change happens. It’s all good. As Juengst points out, “there are no bad guys in ecosystems.”

The HMP has a predecessor in ecology. As a commentary in Science points out, “For a 7-year period ending in 1974, the United States participated in the International Biological Program (IBP)—an ambitious effort that was supposed to revolutionize … ecology and usher in a new age of ‘Big Biology.’” The IBP, which received about $60 million from NSF and more funding from international agencies, attempted to survey or census the biota in “six biomes: the tundra, coniferous and deciduous forests, grassland, desert, and tropical vegetation,” according to botanist Paul Risser in 1970. “The data from all the sites is sent to Colorado State University where initial analysis and summary takes place,” Risser added. “Eventually we will translate these … statements into computer languages to permit simulation and optimization analysis.”

The purpose of the IBP was not to test a hypothesis or answer a specific question but to provide a resource for the scientific community. It intended to “determine the biological basis of productivity and human welfare.” This proved difficult because the 1,800 U.S. scientists who engaged in IBP research discovered in each “biome” a collection of thousands of patches, each with a different and transient assortment of species. The concept of a biome failed to provide a conceptual framework for organizing the plentiful data the IBP produced. There were no non-arbitrary bounds or biota to define a biome; sites changed from season to season and day to day. The data sets may still languish at Colorado where they were sent, or they may have disappeared.

Ethical questions

A salient and pressing ethical, legal, and social issue confronting the HMP has to do with the maintenance of the data sets it produces in view of the absence of a conceptual framework for interpreting them. Two biologists at NCBI have written, “An interesting, perhaps provocative question is whether a sufficient number of genomes have already been sequenced.” They speculate “that microbial genomics has already reached the stage of diminishing returns, such that each new genome yields information of progressively decreasing utility.” In the absence of a conceptual framework other than vague ecological metaphors such as “superorganism,” one may ask if society has an ethical duty to keep the tsunami of data the HMP produces.

Ewan Birney, a bioinformatician at the European Bioinformatics Institute, has said that because the cost of generating data falls much faster than the cost of storing it, “there will come a point when we will have to spend an exponential amount on data storage.” A recent estimate pegs the price for “cloud” storage at 14 cents per gigabyte per month. The National Library of Medicine (NLM) 2012 budget requested $116 million to support NCBI; funding is “specifically added … to meet the challenge of collecting, organizing, analyzing, and disseminating the deluge of data emanating from NIH-funded high-throughput genomic sequencing initiatives.” Should society pay to store and manage HMP data until a conceptual framework can be found for interpreting them?
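The arithmetic behind such concerns is simple enough to sketch. The back-of-the-envelope calculation below, written in Python, takes the cited rate of 14 cents per gigabyte per month at face value and applies it to two data volumes mentioned in this article (the 2.3-terabyte 16S data set and the roughly 10 terabytes per day reported for the Broad Institute, discussed below); it ignores tiered pricing, redundancy, and falling storage costs, so it is illustrative only.

```python
# Back-of-the-envelope storage costs at the cited rate of $0.14 per
# gigabyte per month. Data volumes are taken from figures quoted in
# this article; everything else (flat pricing, no redundancy) is a
# simplifying assumption for illustration.

RATE_PER_GB_MONTH = 0.14  # dollars per gigabyte per month

def monthly_cost(gigabytes: float) -> float:
    """Monthly storage cost in dollars at the cited rate."""
    return gigabytes * RATE_PER_GB_MONTH

# The 2.3-terabyte 16S rRNA data set (~2,300 GB).
print(f"16S data set: about ${monthly_cost(2_300):,.0f} per month")

# One year's output from a center producing ~10 TB per day,
# then held in storage for a further year.
year_of_output_gb = 10_000 * 365
print(f"A year of 10-TB/day output: about "
      f"${monthly_cost(year_of_output_gb) * 12:,.0f} per year to store")
```

At these rates the cost of keeping any single data set is modest; the problem Birney points to is that the volume of data grows much faster than the budget for storing it.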

Are data entitled to be preserved? Science writer Matthew Dublin has proposed that the data tsunami challenges scientists who believe data are a sacred responsibility. The public, especially when budgets are tight, may grow weary of the cost. Scientists who test a hypothesis or answer a question they care about have an incentive to keep their data. This may not be as true of data created primarily as a resource for the broad scientific community.

One may identify two extreme positions as bookends between which to locate an ethically defensible response in the context of the HMP to the data deluge or tsunami problem. On the one hand, NIH could write off the HMP as a sunk cost and leave it to the scientific community to decide which data it wants to keep as a resource. In other words, one could say that NCBI had it right when it threatened to close. To find the needle in the haystack, one does not add more hay. It is worse: Needles turn into hay when looked at differently, and hay turns into needles. Instead of sequencing first and looking for questions later, NIH may do better, in the words of Green and Guyer, to support “individual investigators to pursue more effective hypothesis-driven research.” Let the hypothesis decide what data it needs. There is a lot of sequencing capacity out there and a practically infinite number of microbes to sequence. If the point is just to keep the sequencers busy, then the HMP represents mission creep in the Human Genome Project.

On the other hand, NIH could double down on its investment by sequencing more and more microbes and metagenomes in the hope that large enough data sets will speak for themselves and yield insights in response to the principles of ecology and other algorithms. This approach calls not only for a conceptual framework that does not yet exist, but for a philosophy of science that abandons hypothesis-driven research. According to a group of biologists writing in BioScience, data-intensive science does not test theories, models, or hypotheses but “requires new synthetic analysis techniques to explore and identify … truly novel and surprising patterns that are ‘born from the data.’” An HMP website agrees: “The data sets produced by metagenomic sequencing and related components will be very large and complex, requiring novel analytical tools for distilling useful information from vast amounts of sequence data, functional genomic data and subject metadata.”

One may question the view of its founding advocates that the “HMP is a logical, conceptual, and experimental extension of the Human Genome Project.” Spatial contiguity, often transitory, as when your dog licks your hand, relates a human genome to microbiota, but this is not a logical, conceptual, or experimental extension. Individuals vary genetically in ways that show no correlation with the ways in which their microbiota vary. A group of biologists did observe, however, that the HMP “is following in the footsteps of the Human Genome Project … [in its] potential disappointment and resentment over the lack of medical applications.”

At the other extreme, one may respond that society has an obligation, to quote the NLM again, “to meet the challenge of collecting, organizing, analyzing, and disseminating the deluge of data emanating from NIH-funded high-throughput genomic sequencing initiatives.” The HMP Working Group has written, “Computational methods to process and analyze such data are in their infancy, and, in particular, objective measures and benchmarks of their effectiveness have been lacking.” The massive investment in producing data sets, according to this view, is not a sunk cost but a justification for more investment in computational methods, since without them the value of the data, which was produced in anticipation of these algorithms, will be lost.

The HMP may find itself in the position of the miller in the famous German fairy tale who, to gain influence with the king, said his daughter could spin straw into gold. The king provided a spinning wheel and plenty of straw. Fortunately, the girl in the fairy tale overheard an imp who while dancing chanted a novel, integrative, synthetic, computational algorithm for spinning straw into gold. When an analogous informatics becomes available for the HMP, we may call it the Rumpelstiltskin algorithm.

A possible way to hasten this transition might be to privatize as a nonprofit outfit the Data Analysis and Coordination Center (DACC), which is the central repository for all HMP data. The DACC would then charge a fee for the use of the data that represents some part of the cost of storing them and making them accessible. Researchers could then decide whether it is cheaper to do their own sequencing and annotate it in their own way or to download the data. If algorithms appear that turn Big Data into Big Biology, the DACC will support itself by the fees it charges. If interest in the data set is too little to meet the costs of maintaining it, however, one may wonder where to put it.

According to Nature Medicine, “Each day, approximately 10 terabytes of data stream out of more than 90 gene sequencers at The Broad Institute” at Harvard and MIT, one of four major sequencing centers funded by NIH to jumpstart the HMP. In a podcast associated with a talk she gave in September 2011, Toby Bloom, director of informatics at the Broad Genome Sequencing Platform, commented, “as the data get older and the technology gets older, … our data isn’t just two years old, it’s four years old or six years old. And what do we do with that older data? What of it do we have to keep, and what do we do about the costs?” Bloom considered the storage issues tractable. “Dealing with the size of the data is no longer the thing that keeps me up at night … What I want to address is, where do we want to go with all this data?”

The Small Business Innovation Research Program

The Small Business Innovation Development Act of 1982 created the Small Business Innovation Research (SBIR) program to stimulate technological innovation, to use small businesses of 500 or fewer employees to meet federal R&D needs, to foster and encourage participation of minority and disadvantaged persons in technological innovation, and to increase private sector commercialization of innovations derived from federal R&D. In 1983, this set-aside program totaled $45 million. The SBIR program grew to nearly $2 billion by 2008 when Congress failed to reauthorize it. For several years it survived through a series of multi-month extensions, and at the end of December 2011 it was finally reauthorized for another six years.

In 2000, Congress mandated that the National Research Council (NRC) conduct an evaluation of the economic benefits achieved by the SBIR program and make recommendations to Congress for improvements. Part of that evaluation exercise involved an extensive survey in 2005 of more than 6,000 Phase II projects funded between 1992 and 2001 by the Department of Defense (DoD), the National Institutes of Health (NIH), Department of Energy (DOE), the National Aeronautics and Space Administration (NASA), and the National Science Foundation (NSF). Typically, the Phase II studies lasted for not more than two years and were capped at $750,000. The accompanying figures track several performance characteristics of the SBIR program based on our analysis of the 1,878 randomly selected projects conducted at firms that responded to the NRC survey.

Who are the major SBIR funders?

Agencies allocate 2.5% of their extramural research budget to their SBIR program. DoD ($943 million) and NIH ($562 million) are the largest SBIR funding agencies, followed by DOE, NASA, and NSF.

SBIR awards by funding agency. Fiscal Year 2005

Source: Link, Albert N. and John T. Scott (2012). Employment Growth from Public Support of Innovation in Small Firms, Kalamazoo, MI: W.E. Upjohn Institute for Employment Research.

Does SBIR stimulate additional research?

The SBIR program aims to fund projects that firms would not have undertaken because of their risk and uncertainty and other barriers to innovation such as limited access to scientific equipment. When an awarded firm was asked if it would have undertaken the funded research project in the absence of its SBIR award, respondents generally said “no.” Table 2 shows that over 60% of those surveyed responded “probably not” or “definitely not,” and fewer than 20% responded “probably yes” or “definitely yes” to this counterfactual question.

Would the research be performed without SBIR support?

Source: Link and Scott (2012)

Does SBIR-funded research result in products?

Commercialization is an explicit objective of the SBIR program. The average predicted probability of commercialization of SBIR projects is determined from econometric models. Despite barriers to innovation faced by small firms, the probability of success is almost 50%.

Probability of commercialization

Source: Link, Albert N. and John T. Scott (2010). “Government as Entrepreneur: Evaluating the Commercialization Success of SBIR Projects,” Research Policy 39: 589-601.

Does SBIR stimulate near-term job creation?

Although job creation is not an explicit objective of the SBIR program, policy discussions related to the current reauthorization of the program focused on employment effects. The direct, short-term employment effects attributable to SBIR funding are small. That is, as shown in Table 4, the number of employees retained as a result of the technology developed during the funded project is on average less than 2.

Jobs created immediately at SBIR companies by technology developments

Source: Link and Scott (2012)

Does SBIR have a long-term effect on employment?

The longer-run impact of SBIR funding on the overall employment growth of award-recipient firms is substantial. The average long-run SBIR-induced employment gain is over 25, and the average employment gain per million dollars awarded is over 40. The data indicate that firms receiving the SBIR funding are able to overcome the initial technology-based hurdles that small, entrepreneurial firms frequently face and to achieve long-term employment growth.

Long-term employment effects from SBIR awards

Source: Link and Scott (2012), Table 5.4, p. 76.

Note: Employment gains were measured among NSF-funded projects, but those gains were less than the gains predicted in the absence of an SBIR award.

U.S. Competitiveness: The Mexican Connection

A “giant sucking sound” was presidential candidate Ross Perot’s memorable description, during the 1992 campaign, of the effect that the North American Free Trade Agreement (NAFTA) would have as businesses and jobs moved from the United States to Mexico. The reality is that economic cooperation with Mexico has been a boon for U.S. industry and has strengthened the country’s competitive position in ways that have produced broad economic benefits. Today, as China and other Asian countries have emerged as major economic challengers, expanding economic cooperation with Mexico is one of the best ways for the United States to improve its global competitiveness.

Regional integration between the United States and Mexico is already vast and deep. As the United States’ second largest export market and third largest trading partner, Mexico is clearly important to the U.S. economy. Merchandise trade has more than quintupled since NAFTA went into effect in 1994, and in 2011, bilateral goods and services trade reached approximately a half-trillion dollars for the first time. The U.S. Chamber of Commerce has calculated that the jobs of six million American workers depend on U.S.-Mexico trade. Many of those jobs are in border states, which have especially close ties to Mexico, but Mexico is also the top buyer of exports from states as far away as New Hampshire (mostly computers and electronics). In fact, 20 states, from Michigan to Florida, sell more than a billion dollars’ worth of goods to Mexico each year, and Mexico is the first or second most important export market for 21 states.

The United States and Mexico are also major investors in one another. In fact, combined foreign direct investment holdings now total more than $100 billion. According to the most recent count by the Department of Commerce, U.S.-owned companies operating in Mexico created $25 billion in value added and employed nearly a million workers. Mexican investment in the United States is less than U.S. investment in Mexico, but it has been growing rapidly in recent years. Several of Mexico’s top companies, which are increasingly global operations, have made significant investments in the United States. Mexico’s Cemex, for example, is North America’s largest maker of cement and concrete products. Grupo Bimbo, which owns well-known brands such as Entenmann’s, Thomas’ English Muffins, and Sara Lee, is the largest baked goods company in the Americas. Even Saks Fifth Avenue and the New York Times Company are supported by significant Mexican investment.

The massive volume of commerce and investment is important, but the depth of regional integration is the primary reason why Mexico contributes to U.S. competitiveness. Mexico and the United States do not just trade products, they build them together. In fact, to understand regional trade, it is necessary to view imports and exports in a different light. Whereas imports from most of the world are what they appear to be—foreign products—the same cannot be said of imports from Mexico. During production, materials and parts often cross the southwest border numerous times while U.S. and Mexican factories each perform the parts of the manufacturing process they can do most competitively. Because of the complementary nature of the two economies, close geographic proximity, and NAFTA, which eliminated most tariff barriers to regional trade, the U.S. and Mexican manufacturing sectors are deeply integrated.

Demonstrating this integration is the fact that 40% of the value of U.S. imports from Mexico comes from materials and parts produced in the United States. This means that 40 cents of every dollar the United States spends on Mexican goods actually supports U.S. firms. The only other major trading partner that comes close to this amount is Canada, the United States’ other NAFTA partner, with 25% U.S. content. Chinese imports, on the other hand, have an average of only 4% U.S. content, meaning that the purchase of imports from China does not have the same positive impact on U.S. manufacturers.

The regional auto industry is a good example of this production-sharing phenomenon. The United States, Mexico, and Canada each produce and assemble auto parts, sending them back and forth as they work together to build cars and trucks. Cars built in North America are said to have their parts cross U.S. borders eight times as they are being produced, and between 80 and 90% of the U.S. auto industry’s trade with its North American partners is intra-industry, both of which signal an extremely high level of vertical specialization. As a result, Detroit exports more goods to Mexico than any other U.S. city, and the North American auto industry has proven much more resilient than many expected. Although several of North America’s largest automakers nearly collapsed during the financial crisis in 2008 and 2009, a robust recovery is now under way. Mexico and the United States have experienced the two sharpest rises in vehicle production among the world’s top 10 auto producers, growing 51 and 72%, respectively, between 2009 and 2011.

From competitors to partners

The United States and Mexico once worked relatively independently to manufacture goods and export them, but now they work together to produce goods that are sold on the global market. With their economies so intimately linked, the United States and Mexico now experience the cycle of growth and recession together. If they ever were economic competitors, it is clear that they have now become partners that will largely sink or swim together. Because they are in the same boat, the United States and Mexico should develop a joint strategy to increase regional competitiveness vis-à-vis the rest of the world.

The groundwork is already laid, and several recent trends are in North America’s favor. To begin with, Mexico and the United States are among the most open economies in the world. Through their extensive networks of free trade agreements, the two countries have tariff-free access to more than 50 countries, including the large economies of the European Union and Japan. This presents a tremendous opportunity for jointly produced goods to be exported around the world, something that could create jobs and improve the trade balance of the United States. The key, of course, is getting costs sufficiently low and productivity sufficiently high that North American goods are competitive with their European and Asian competitors.

Labor costs in China are rising while oil prices are increasing transportation costs, and new advanced manufacturing techniques are making labor an ever-smaller portion of the total cost of making a product. These factors have led to what the Economist recently called the boomerang effect: Some companies that chased cheap wages in China in the previous two decades have reconsidered their decision to move production offshore. Some are now more interested in either increasing production in Mexico or moving it back to the United States.

What is amazing is that North America is recovering its competitiveness without much of a strategy. Imagine how much more could be done if policymakers fully understood and took advantage of this opportunity. Instead of simply enjoying the moderate recovery of the manufacturing sector, the United States, Mexico, and Canada should work as partners to develop policies that could lead to a real resurgence of the region.

Without a doubt, each country must address a number of domestic challenges. Many of the needed reforms, such as those in education and fiscal policy, apply to both Mexico and the United States. Mexico also needs to strengthen the rule of law, increase competition, and improve productivity in the energy sector, and the United States needs to revamp its immigration system so that it can continue to attract the most motivated and talented individuals to contribute to its economy. The regional policy options outlined below go hand in hand with these domestic efforts, and together they have the power to truly revitalize the regional economy.

Policy for a competitive region

The border. With an integrated regional manufacturing sector, the same goods cross the U.S.-Mexico border several times as they are being produced. Consequently, the effects of any barriers to trade, tariff or nontariff, are multiplied by the number of border crossings that take place during production. In the NAFTA region, tariffs are not a significant trade barrier, but the importance of having efficient border management and customs procedures is difficult to overstate.

After NAFTA took effect and trade barriers fell, bilateral trade skyrocketed, more than tripling by 2000. But after the terrorist attacks of 9/11, a new approach to homeland security led to a “thickening” of the border. Trade and passenger travel ground to a near halt. Although trade has been moving since then, the new security concerns have meant that there was never a return to the status quo. Between 2000 and 2010, legal entries of commercial trucks into the United States at the southern border dropped by 41%. Since then, several studies have attempted to estimate the cost of increased border wait times on the regional economy, particularly of border communities. The results are varied, but there is widespread agreement that border-related congestion has had a multibillion-dollar effect on the U.S. and Mexican economies.

Seeking to mitigate these costs, the U.S. and Mexican governments developed the 21st Century Border initiative, which is based largely on the idea that neither security nor efficiency has to be sacrificed to improve the other. By expediting the flow of safe and legal border crossers and cargo, officials can focus more of their attention on seeking dangerous people and goods. This is the concept behind the trusted traveler (SENTRI) and trusted shipper (FAST and C-TPAT) programs in place at the Mexican border. Frequent border crossers prove they are low risk by undergoing an extensive background check and interview process. In return, they get to use special lanes to quickly cross the border.

There is no silver bullet in border management, but these programs are the closest thing. They make the border safer while lessening the need to build more vehicle lanes at ports of entry and to hire additional border staff. They should be expanded and vigorously promoted. Where they are in place, the United States should work with Mexican officials to ensure that use of the dedicated express lanes significantly reduces waiting times, so that there is an incentive to join the programs.

Moderate infrastructure investments are also needed, because although trade has quintupled, relatively few entry ports have seen any major upgrades or expansions. Public/private partnerships are an important mechanism to bring needed funding to the border area, and the Department of Homeland Security should work with Congress to create secure and appropriate mechanisms to encourage their use, if it determines that the current legal environment excessively limits such use. Such partnerships have been successful in some areas, but many border communities and businesses would be willing to commit more resources to facilitate travel and commerce.

Transportation networks. Given the importance of U.S.-Mexico trade, the development of regional transportation networks to facilitate trade is too important to leave to chance and ad hoc processes. Local, state, and federal representatives should and do have a voice in the process of guiding the development of border infrastructure and the highway and rail lines that link the interior states of Mexico and the United States. What is lacking is a coherent and robust master planning process to ensure that strategic rather than political interests are the guiding force behind border and transportation infrastructure investments.

In 2006, California and Baja California took the initiative to begin developing a regional master plan, an award-winning project that many believe could be successfully replicated. Other regions of the U.S.-Mexico border have similar plans in various stages of development, but a true master plan spanning the entire border would best facilitate the competitiveness of the United States and Mexico.

Customs. In addition to the cost of long and unpredictable border wait times, importers and exporters must meet significant documentation requirements, especially in order to take advantage of the tariff preferences granted by NAFTA. The agreement’s rules of origin, for example, stipulate that only products from the United States, Canada, or Mexico should get preferential treatment. This means that firms must maintain records proving that their products, and sometimes the parts contained within them, were made or sufficiently altered within the NAFTA area. This paperwork burden can at times be substantial, especially for small- and medium-sized businesses.

In theory, the way to solve this issue is to create a customs union (like that of the European Union) with a set of common external tariffs for all nonmember countries. With a common tariff, the movement of goods within the region would be subject only to security checks, because customs requirements would all be addressed as goods enter or exit the perimeter of the customs union. In practice, this would be very difficult to achieve in North America, given the number of trade agreements each country is party to and the various industries each has sought to protect while negotiating those agreements.

As has been suggested by former U.S. Trade Representative Carla Hills, a more appropriate approach may be to take things product by product. For goods that already face similar external tariffs in each of the NAFTA countries, negotiations could be started to have tariffs lowered to the lowest of the three. When a common external tariff is reached for a product, it could then be exempted from most customs requirements at the United States’ southern and northern borders.

A regional partnership for global trade issues. In order to develop a North American export platform, the NAFTA countries should begin to see themselves as an economic alliance. The countries of North America should, whenever possible, engage the global community as partners on trade issues. It may often make sense for the United States, Canada, and Mexico to jointly approach third countries to resolve trade disputes, given the integrated nature of regional manufacturing.

Each of the three North American countries has made a strategic decision to strengthen its engagement with Asia. Given the dynamism of so many Asian economies, this is entirely appropriate. A strategic question, though, is whether they should each make this pivot individually or do so as a bloc. The United States is currently engaged in trade negotiations with a number of Pacific Rim countries to form the Trans-Pacific Partnership. Both Mexico and Canada have signaled their desire to join the negotiations, and finding a way to bring them in is the right strategic move for the United States. The Trans-Pacific Partnership has the potential to actually deepen North American integration, strengthening rules on topics such as intellectual property rights. With full regional participation, it would also open new markets for jointly produced goods.


Christopher Wilson () is an associate at the Mexico Institute of the Woodrow Wilson International Center for Scholars in Washington, DC, and the author of the 2011 Wilson Center report Working Together: Economic Ties between the United States and Mexico.

Getting the Most Out of Electric Vehicle Subsidies

The electrification of passenger vehicles has the potential to address three of the most critical challenges of our time: Plug-in vehicles may produce fewer greenhouse gas emissions when powered by electricity instead of gasoline, depending on the electricity source; reduce and displace tailpipe emissions, which affect people and the environment; and reduce gasoline consumption, helping to diminish dependence on imported oil and diversify transportation energy sources.

Several electrification technologies exist for helping to achieve these goals. Hybrid electric vehicles (HEVs), such as the Toyota Prius and the Ford Fusion Hybrid, don’t plug in. They still use gasoline for net propulsion energy, but they also use an electric motor and a small battery pack to improve fuel efficiency.

Plug-in hybrid electric vehicles (PHEVs), such as the GM Volt, charge an onboard battery via a wall outlet. They use electricity for propulsion when the battery is charged but also have a gasoline engine for use when the battery is depleted. Larger PHEV batteries enable longer electric travel between charges. The PHEV version of the Prius has an 11-mile battery pack; the GM Volt has a 35-mile battery pack.

Battery electric vehicles (BEVs), such as the Nissan Leaf, plug in to charge an onboard battery. They have no gasoline backup, so they require large battery packs to enable longer trips, and they require higher-power charging equipment to refill the battery overnight. The Nissan Leaf has a 73-mile battery pack; the Ford Focus Electric has a 76-mile battery pack.

Current federal policy intended to encourage the development and deployment of plug-in vehicles includes tax subsidies established in the 2009 American Recovery and Reinvestment Act of up to $7,500 per vehicle. Some members of Congress have proposed extending this tax credit, others have proposed eliminating it, and President Obama proposed increasing the credit to $10,000 to help meet his administration’s target of one million plug-in vehicles on the road by 2015. Both existing and proposed subsidies provide larger payments for vehicles with larger battery packs.

Larger battery packs enable vehicles to displace more gasoline, so at first glance one might think that subsidizing larger battery packs is better for the environment and for oil security. But large battery packs are also expensive; the added weight reduces efficiency; they are underused when the battery capacity is larger than needed for a typical trip; they have greater charging infrastructure requirements; and they produce more emissions during manufacturing. Whether larger battery packs offer more benefits on balance depends on their net impacts from cradle to grave.

Running the numbers

Using ranges of values from the academic literature and government studies, it is possible to quantify lifetime externality costs, including greenhouse gases, human health effects, agricultural losses, and infrastructure degradation, caused by air emissions from conventional and electrified vehicles. Many of these damages vary with the location of air-emission releases, so it is important to account for the existing and potential future locations of vehicle tailpipes, power plants, oil refineries, vehicle and battery production facilities, and upstream supply chain entities, such as mines for raw material extraction. It is also possible to estimate the extra U.S. costs of oil consumption beyond the market price paid, including increased vulnerability to oil supply disruptions, increases in world oil prices due to U.S. demand, and military spending related to oil security.

If we add up all of these costs, which we did in a study published in 2012 in the Proceedings of the National Academy of Sciences, we find thousands of dollars of damages per vehicle (gasoline or electric) that are paid by the overall population rather than only by those releasing the emissions and consuming the oil. These costs are substantial. But, importantly, the potential of plug-in vehicles to reduce these costs is modest: much lower than the $7,500 tax credit and small compared to ownership costs. This is because the damages caused over the life cycle of a vehicle are caused not only by gasoline consumption, which is reduced with plug-in vehicles, but also by emissions from battery and electricity production, which are increased with plug-in vehicles.
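The structure of such an accounting can be pictured as a sum over life-cycle stages plus a premium on each gallon of oil consumed. The sketch below is a deliberately simplified illustration of that structure, not the model used in the study; the stage names, emission quantities, damage valuations, and oil premium are hypothetical placeholders.

```python
# Simplified life-cycle externality accounting in the spirit of the
# approach described above. All quantities and valuations below are
# hypothetical placeholders, not values from the cited study.

def stage_damage(emissions_tons: float, damage_per_ton: float) -> float:
    """Externality cost of one life-cycle stage: quantity emitted times a
    damage valuation that in practice depends on the pollutant and on
    where it is released."""
    return emissions_tons * damage_per_ton

lifetime_damages = {
    "vehicle and battery production": stage_damage(8.0, 40.0),
    "electricity generation for charging": stage_damage(30.0, 40.0),
    "gasoline refining and tailpipe emissions": stage_damage(20.0, 40.0),
}

# Oil "security premium": societal cost per gallon beyond the pump price.
lifetime_gallons = 3_000        # hypothetical lifetime gasoline use
premium_per_gallon = 0.30       # hypothetical premium, in dollars

total = sum(lifetime_damages.values()) + lifetime_gallons * premium_per_gallon
print(f"Illustrative lifetime externality cost: ${total:,.0f} per vehicle")
```

The point of the exercise is not the total itself but the fact that a plug-in vehicle moves dollars between the terms: it shrinks the gasoline and oil-premium terms while enlarging the battery-production and electricity-generation terms, which is why the net benefit can be modest or even negative.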

Today’s policies provide larger subsidies for vehicles with larger battery packs, but those large battery packs do not generally offer more benefits, even in optimistic scenarios. For example, as a base case assume that the battery will last the life of the vehicle and take average U.S. estimates for electricity production, oil refining, vehicle and battery production, driving location, upstream supply chain emissions, and greenhouse gas emission costs. In this case, HEVs and PHEVs with small battery packs cause lower damages than conventional gasoline vehicles, but BEVs with large battery packs actually increase net damages. In an optimistic scenario where plug-in vehicles receive all of their charging electricity from zero-emission sources, the lifetime benefits of plug-in vehicles exceed the benefits of HEVs by about $1,000. In contrast, if plug-in vehicles are charged using coal-generated electricity, they could cause several thousand dollars more damage per vehicle.

HEVs, PHEVs, and BEVs are all expected to provide some benefits over conventional vehicles on average, but those benefits do not necessarily increase with battery size, and even in the most optimistic scenarios the large subsidies for vehicles with large battery packs are not justified by their air-emission and oil-displacement potential.

Policy adjustment

Under current federal policy, plug-in vehicles with battery packs at least as large as the Chevy Volt’s [16 kilowatt-hours (kWh), providing about 35 electric miles per charge] receive the full $7,500 tax credit, while vehicles with smaller battery packs, such as the Toyota Prius Plug-in Hybrid (4.4 kWh, providing about 11 electric miles per charge) receive only $2,500. At first glance, tripling the subsidy may seem justified because the electric range is tripled. But tripling the range does not mean tripling the amount of gasoline displaced or emissions reduced: Increasing battery size has diminishing returns. In fact, when we consider U.S. driving patterns (many short trips, where the larger battery is only dead weight), U.S. average emissions from battery and electricity production, and the other factors described above, the small 4.4-kWh battery actually has more net benefits than the larger 16-kWh battery. Even in the most optimistic scenarios where vehicles are charged with zero-emission electricity, the larger battery packs offer only comparable or slightly greater net benefits, not double or triple. Public funds are limited, and because today’s policy consumes more resources when subsidizing large-battery vehicles, fewer of them can be supported under a fixed budget. Allocating a fixed budget to a flat $2,500 subsidy for all plug-in vehicles would more than triple the potential air-emissions and oil-displacement benefits of the subsidized vehicles as compared to subsidizing one-third as many large-battery vehicles at $7,500 each.
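The fixed-budget arithmetic behind that claim is easy to make explicit. The sketch below uses only the two subsidy levels named in the text; the budget and per-vehicle benefit figures are hypothetical placeholders chosen to reflect the finding that a small-battery plug-in offers per-vehicle benefits at least comparable to a large-battery one.

```python
# Fixed-budget comparison of the two subsidy designs discussed above.
# The budget and the per-vehicle benefit figures are hypothetical
# placeholders; only the $2,500 and $7,500 subsidy levels come from
# current policy.

BUDGET = 75_000_000  # hypothetical fixed subsidy budget, in dollars

def total_benefit(subsidy_per_vehicle: int, benefit_per_vehicle: float) -> float:
    """Total benefit from spending the whole budget at a given subsidy level."""
    vehicles_supported = BUDGET // subsidy_per_vehicle
    return vehicles_supported * benefit_per_vehicle

# Assume, per the analysis above, comparable per-vehicle benefits at best.
benefit_small_battery = 1_000  # hypothetical, dollars per vehicle
benefit_large_battery = 1_000  # hypothetical, dollars per vehicle

flat = total_benefit(2_500, benefit_small_battery)
tiered = total_benefit(7_500, benefit_large_battery)
print(f"Flat $2,500 subsidy yields {flat / tiered:.1f}x the total benefit "
      f"of the $7,500 subsidy under the same budget")
```

With equal per-vehicle benefits the flat subsidy delivers exactly three times the total benefit; because small-battery vehicles in fact tend to deliver somewhat greater per-vehicle benefits under average U.S. conditions, the advantage is more than threefold, as stated above.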

It is important to note that in the future, plug-in vehicles with large battery packs might be able to offer the largest benefits at the lowest costs if all the right factors fall into place, including low-cost batteries, low-emissions electricity, long battery life, and high gasoline prices. Policies supporting R&D for battery improvements and large emissions reductions from electricity generation can help move the country in this direction. But such a future may take decades to realize and is not guaranteed because of uncertain technical, economic, and political factors. In the near term, HEVs and PHEVs with small battery packs are more robust, offering more air-emission and oil-displacement benefits per dollar spent. And although some characteristics of longer-range batteries are different, the production of small-battery vehicles in the near term will create demand for batteries that will help drive learning and innovation to lower the costs of all electrified vehicles.

There are myriad other arguments for supporting vehicle electrification beyond human health, environmental, and oil-displacement effects. This long list might include job creation, reducing the trade deficit by shifting from foreign to domestic fuel sources, enabling a distributed storage resource to support the integration of intermittent renewable electricity generation, reducing oil revenues to states hostile to U.S. interests, hedging against an anticipated oil-scarce or carbon-constrained future, improving regulatory control over emissions associated with poor vehicle maintenance, generating positive externalities by encouraging innovation, encouraging domestic development of strategic technical competency and intellectual property, reducing nonfinancial political and human suffering effects from war and political instability, and promoting international environmental justice. However, because HEVs and PHEVs with smaller battery packs provide more air-emissions reduction and oil displacement per dollar spent and offer lifetime costs competitive with conventional vehicles, it is not clear that directing near-term subsidies toward vehicles with large battery packs would produce superior results on any of these objectives.

We should not forget that the most efficient policies would target externalities directly, through mechanisms such as an economywide carbon price, cap-and-trade policies, and gasoline taxes. Such policies are generally understood to be far more efficient than technology-specific subsidies, and we should consider subsidies as an inferior substitute given the political difficulties of implementing efficient market-based policies that address the problem directly. In the absence of such policies, federal subsidies and policies designed to encourage electrified vehicle adoption would produce more benefit at lower cost for the foreseeable future by targeting the purchase of vehicles with small battery packs.


Jeremy J. Michalek () is an associate professor of engineering and public policy and of mechanical engineering at Carnegie Mellon University; Mikhail Chester is an assistant professor in civil, environmental, and sustainability engineering at Arizona State University; and Constantine Samaras is an engineer at the RAND Corporation.

Decisionmaking, Transitions, and Resilient Futures

In early 2010, two major earthquakes hit the Western Hemisphere: a magnitude-7.0 quake southwest of Port-au-Prince, Haiti (population 9.7 million), and a magnitude-8.8 temblor north of Concepción, Chile (population 17.1 million). The death toll in Haiti was over 220,000; that in Chile, fewer than 1,000. This two-orders-of-magnitude difference can be attributed in part to the distances between the epicenters of the quakes and the countries’ respective population centers. But the largest part of the difference is the result of the Chilean government’s consistent willingness to heed the advice provided by its world-renowned natural and social scientists and engineers—advice that minimized vulnerability with strict building codes and enabled robust emergency response through preparedness planning. Disasters may strike randomly, but the extent of the damage and the speed of and capability for response and rebuilding have nothing to do with luck and everything to do with science.

Not all eventualities for which governments need to plan are natural disasters such as earthquakes, or even very abrupt; many shifts occur so gradually that their consequences may not be felt for some time. Environmental and social changes are happening, and scientific data show that many, from floods and severe storms to droughts and wildfire, are having greater effects than they had in the 20th century. Government agencies are beginning to take action. The U.S. military is trying to anticipate, mitigate, and adapt to the consequences of the environmental changes that are happening right now, and some localities and major cities such as New York, Chicago, Seattle, and Los Angeles are also making preparations. Still, both the military and the cities have questions in the arena of interactions between human and environmental systems that research could help to answer. They and the broader civil society can benefit from existing and future research that identifies potential environmental changes and explores social factors influencing how well planning and responses can limit damages.

One of the biggest challenges confronting society is climate change. Most people consider additional natural science research on climate change to be a wise societal investment. This is because naturally occurring changes such as slight alterations in Earth’s orbit have over the eons caused both ice ages and warm periods, with profound implications for life on Earth. If human activities have the potential to interact with natural cycles and bring an end to the relative stability that the climate system has experienced over the past 10,000 years, the potential risks posed by climate change could be large and are thus worth understanding. But what are the risks, and how large might they be? Unfortunately, the amount of warming to which we are already committed because of past emissions and inertia in energy and economic infrastructure makes achievement of the low end of the range of climate change futures almost impossible. Thus, iterative risk management is now being framed as a combination of adaptation (preparing for and responding to changes to which the climate system is already committed over the next several decades) and mitigation (reducing human contributions to climate change), where efforts started now will have significant consequences for the magnitude and nature of climate change and associated impacts after mid-century. Research seeks to understand the risks of different combinations of these approaches, to identify many potential effects and societal consequences, and to clarify where, when, and how likely these consequences are to occur, given different levels and rates of climate change, and how they will interact with other societal and environmental changes.

Energy security is a parallel case involving both natural resources and societal risks. A country dependent on importing energy is vulnerable to supply disruption resulting from international politics or the domestic policies of the exporting countries; expenditures on imports can also negatively affect domestic economic growth. Focusing efforts on developing an apparently abundant domestic source of fuel for electricity generation may provide various forms of “security” in the short term—jobs, economic growth—but may engender unintended consequences that result from uncertain side effects of the new technology. Although these considerations have natural-science and engineering components, the greatest risks are in the social arena: economic ramifications, health effects, or quality-of-life changes, to name but a few. How great an impact might events in these risk categories have?

Risk is usually seen as a function of the consequences of an event, such as loss of life or economic damages, combined with the likelihood of its occurrence. The details of interactions between the natural and social systems of Earth in this time of transition are not predictable with our current level of understanding, but that there are risks involved is undeniable, and that there will be some drastic consequences if and when these events occur is likely. How a nation averts, prepares for, and/or manages risk forms the difference between human loss of life and destroyed infrastructure such as occurred in Haiti, and the minimized loss of life and recovering infrastructure of Chile.

Research in the social sciences, integrated with climate and environmental research, provides insights about consequences and likelihood and the potential effectiveness of different approaches to increase resilience or reduce human contributions to the drivers of change. Social science research contributes to global environmental risk management by projecting the effect of alternative human choices: not predicting the future, but providing “if/then” analyses of the potential consequences of acting or not acting, of alternative economic development pathways, future scenarios of population growth, different technologies, or the aggregate effects of billions of consumer choices made every day. It helps to anticipate vulnerabilities and damaging exposures of environmental and societal change and to identify and plan for potential opportunities that may arise. Drawing on these and other insights, it also contributes significantly to the development of decision-support mechanisms that help decisionmakers with the complex sets of choices that they face.

Informing decisions

The differences between the disasters in Haiti and Chile had to do with preparedness for any disaster or eventuality. Preparedness for environmental changes, either abrupt or gradual, involves one or more series of decisions, any one of which may also have a place in strategies for maximizing agricultural productivity or energy efficiency, for example, even under stationary environmental conditions.

Decisionmaking can be regarded as a process that results in the selection of a course of action among one or more alternative scenarios. Individuals often make decisions unconsciously, based on need, preference, and values; these may be rational or irrational, depending on the balance between emotion and reason engendered by the situation calling for the decision. Societal decisions about policies, public expenditures, and other issues involve additional influences, including complexity that arises from differences in need, perception, and values across individuals, and cultural and other differences across groups. Effective decisionmaking processes enable participants to explore these differences, incorporate information, and iterate to achieve a common understanding and basis for action.

Kelly Sims Gallagher and John C. Randell point out that important and expensive government programs apparently do not take advantage of what is already known about consumer behavior. The predictive capacity of this existing knowledge also indicates that there is a great deal of value to be gained from integrating research on creating new technologies with research to address the nontechnical barriers to the adoption of those technologies by society. These understandings of individuals’ choice behaviors could then be planning inputs for alternative scenarios of futures in which these technologies are deployed to increase national energy security and human well-being.

When decisions must be made that affect groups of people or large segments of society, the process should be conducted within a structured framework that helps the decisionmakers take into account multiple objectives, needs, preferences, and values as well as vulnerabilities, risks, and uncertainties. Such a framework for energy strategy is outlined by Joseph Arvai and his coauthors in this issue. Within this decision-support framework, the principles of scenario planning and systems thinking, developed through social science research, are applied to break very complex decisions into smaller, more tractable parts that are less prone to error and bias and are internally consistent. Such a decision-support process leads to more satisfied and better educated decisionmakers as well as a more transparent process in which affected parties place greater trust. A framework of this kind has been used by Michigan State University, is demonstrated in an interactive exhibit at the Marian Koshland Science Museum in Washington, DC, and is being developed for Canada’s national energy strategy.

The application of either method calls for some ability to compare values among different choices. In a pure market analysis, this may be possible because the choices can all be valued with a common monetary metric. But as Stephen Polasky and Seth Binder explain, environmental decisionmaking presents challenges because many of the inputs and outcomes are market externalities and cannot easily be measured monetarily. Most decisions also have broad impacts, because a single choice may simultaneously affect multiple environmental and social factors. As these authors note, making decisions about tradeoffs among multiple objectives that society cares about involves making value judgments. There are extant methods for comparing values of market internalities and externalities, and these can be used now in scenario planning. But there is research yet to be done on ways to collect information on the values of the alternatives and on methods for aggregating these values to estimate social net benefits.

Even with adequate metrics, decisions and policies must be made in the context of uncertainty. There will always be more than one possible result of a decision, uncontrollable and uncertain forces may affect the feasibility of implementing a decision, and some of the possibilities are likely to involve a loss, catastrophe, or other undesirable outcome. Uncertainty is not only unavoidable in the absence of prior knowledge of the actual future; it is also a tenet of the scientific process that eradicating uncertainty would itself be dangerous, because certainty can lead to complacency and a lack of the questioning that expands knowledge. When an expert is asked for advice by policymakers, it is important to understand which of these sources gives rise to the expert’s uncertainty, as well as to have some idea of the degree of uncertainty to which the advice is subject. As Baruch Fischhoff describes, however, it is possible for scientists to communicate more clearly with policymakers about the parameters of uncertainty inherent in the expert advice they are providing, and for policymakers to help experts understand the need for their advice to be couched in terms that allow judgments to be made about the degree of confidence that can be placed in it.

Science for transitions

Going forward, research in behavioral economics, risk communication, governance, decision science, and socioecological interactions will provide new data and information that can inform the processes and strategies described in this group of articles. Some examples of the kinds of questions about global environmental change and its implications that require social science research to answer include:

  • How can warning systems for droughts, floods, or severe weather be made more effective?
  • How are major environmental hazards and changes linked to humanitarian disasters, political instability, and other security threats? What makes some societies better able to cope than others?
  • How can diverse societies, comprising individuals with very different values and risk tolerances, agree on how to place a value on potential impacts, considering economic damages as well as less tangible factors such as cultural or environmental benefits?
  • How can limited existing knowledge be best used by decisionmakers? How can they weigh the risks of waiting for more information against the benefits of acting when knowledge is more complete?
  • How will different human choices regarding economic development, population, technology, and consumption contribute to different levels and rates of climate change?
  • How should risks from high-consequence, low-probability exposures and events be assessed and communicated by scientists to decisionmakers and the public?
  • What are the employment and economic effects of “green” stimulus and energy technology policies, including factors that contribute to their success or failure?
  • What factors contribute to the development and diffusion of technological innovations? How do technical, institutional, social, economic, and behavioral dynamics accelerate or slow improvements in energy efficiency and the deployment of emerging technologies such as “smart” electrical grids?
  • What are the economic, social, and other implications of environmental markets and other approaches for reducing emissions, adapting to change, or national security?

Two recent developments have the potential to advance this research. In the recently released National Global Change Research Plan 2012–2021: A Strategic Plan for the U.S. Global Change Research Program, increasing attention is devoted to the interactions of coupled human-environmental systems, understanding the societal consequences of environmental change, and providing information in a fashion that is useful to decisionmakers. This plan is built around four goals: advance science, inform decisions, conduct sustained assessments, and communicate and educate. It specifically calls for increased social science research to achieve these objectives.

The National Academy of Sciences has also taken steps to encourage the research needed to address these issues by establishing the Board on Environmental Change and Society (BECS). The board builds on two decades of work by a predecessor committee on the human dimensions of global environmental change. Its goal is to advance the scientific basis for understanding coupled human-environment systems and to inform transitions needed to improve human well-being in the face of environmental change. By making behavioral, social, economic, and decision sciences research accessible to environmental policy and by integrating social and environmental research, the board seeks to identify potential opportunities, anticipate vulnerabilities and damaging exposures, and inform policies and transitions that contribute to environmental sustainability. With its focus on coupled human-environment systems, innovation and technology deployment, risk and governance, vulnerability and adaptation to environmental change, resilience, and decision support, BECS’s scholarly work and publicly accessible products will in the coming years be important resources for policymakers at all levels of government, the private sector, education, and the public. The articles that follow provide examples of the kinds of contributions that BECS can make.

Eight Questions for Drug Policy Research

Drug abuse—of licit and illicit drugs alike—is a big medical and social problem and attracts a substantial amount of research attention. But the most attractive and most easily fundable research topics are not always those with the most to contribute to improved social outcomes. If the scientific effort paid more attention to the substantial opportunities for improved policies, its contribution to the public welfare might be greater.

The current research agenda around drug policy concentrates on the biology, psychology, and sociology of drug taking and on the existing repertoire of drug-control interventions. But that repertoire has only limited capacity to shrink the damage that drug users do to themselves and others or the harms associated with drug dealing, drug enforcement, and drug-related incarceration; and the current research effort pays little attention to some innovative policies with substantial apparent promise of providing improved results.

At the same time, public opinion on marijuana has shifted so much that legalization has moved from the dreams of enthusiasts to the realm of practical possibility. Yet voters looking to science for guidance on the practicalities of legalization in various forms find little direct help.

All of this suggests the potential of a research effort less focused on current approaches and more attentive to alternatives.

The standard set of drug policies largely consists of enforcing prohibitions against the illicit drugs, taxing and regulating the licit ones, and funding prevention and treatment programs.

With respect to alcohol and tobacco, there is great room for improvement even within the existing policy repertoire (for example, by raising taxes), even before more-innovative approaches are considered. With respect to the currently illicit drugs, it is much harder to see how increasing or slightly modifying standard-issue efforts will measurably shrink the size of the problems.

The costs—fiscal, personal, and social—of keeping half a million drug offenders (mostly dealers) behind bars are sufficiently great to raise the question of whether less comprehensive but more targeted drug enforcement might be the better course. Various forms of focused enforcement offer the promise of greatly reduced drug abuse, nondrug crime, and incarceration. These include testing and sanctions programs, interventions to shrink flagrant retail drug markets, collective deterrence directed at violent drug-dealing organizations, and drug-law enforcement aimed at deterring and incapacitating unusually violent individual dealers. Substantial increases in alcohol taxes might also greatly reduce abuse, as might developing more-effective treatments for stimulant abusers or improving the actual evidence base underlying the movement toward “evidence-based policies.”

These opportunities and changes ought to influence the research agenda. Surely what we try to find out should bear some relationship to the practical choices we face. Below we list eight research questions that we think would be worth answering. We have selected them primarily for policy relevance rather than for purely scientific interest.

1) How responsive is drug use to changes in price, risk, availability, and “normalcy”?

The fundamental policy question concerning any drug is whether to make it legal or prohibited. Although the choice is not merely binary, a fairly sharp line divides the spectrum of options. A substance is legal if a large segment of the population can purchase and possess it for unsupervised “recreational” use, and if there are no restrictions on who can produce and sell the drug beyond licensing and routine regulations.

Accepting that binary simplification, the choice becomes what kind of problem one prefers. Use and use-related problems will be more prevalent if the substance is legal. Prohibition will reduce, not eliminate, use and abuse, but with three principal costs: black markets that can be violent and corrupting, enforcement costs that exceed those of regulating a legal market, and increased damage per unit of consumption among those who use despite the ban. (Total use-related harm could go up or down depending on the extent to which the reduction in use offsets the increase in harmfulness per unit of use.)

The costs of prohibition are easier to observe than are its benefits in the form of averted use and use-related problems. In that sense, prohibition is like investments in prevention, such as improving roads; it’s easier to identify the costs than to identify lives saved in accidents that did not happen.

We would like to know the long-run effect on consumption of changes in both price and the nonprice aspects of availability, including legal risks and stigma. There is now a literature estimating the price elasticity of demand for illegal drugs, but the estimates vary widely from one study to the next and many studies are based on surveys that may not give adequate weight to the heavy users who dominate consumption. Moreover, legalization would probably involve price declines that go far beyond the support of historical data.
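A constant-elasticity demand curve shows why both the spread in published elasticity estimates and the prospect of post-legalization prices far below historical levels matter so much. The elasticities and the price decline in the sketch below are assumptions for illustration, not estimates drawn from the literature.

```python
# Illustrative projection of consumption after a large price decline, using a
# constant-elasticity demand curve. Elasticities and the price drop are assumed.

def quantity_ratio(price_ratio, elasticity):
    """Q1/Q0 implied by a constant-elasticity demand curve: (P1/P0) ** elasticity."""
    return price_ratio ** elasticity

price_ratio = 0.25  # suppose legalization cut the price to a quarter of its former level
for elasticity in (-0.3, -0.7, -1.2):  # a spread of the kind seen across studies
    q = quantity_ratio(price_ratio, elasticity)
    print(f"elasticity {elasticity:+.1f}: consumption rises by about {100 * (q - 1):.0f}%")
```

Under these assumed numbers the projected increase in consumption ranges from roughly 50% to more than 400%, which is the practical meaning of the uncertainty described above.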

Furthermore, as Mark Moore pointed out many years ago, the nonprice terms of availability, which he conceptualized as “search cost,” may match price effects in terms of their impact on consumption. Yet those effects have never been quantitatively estimated for a change as profound as that from illegality to legality. The decision not to enforce laws against small cannabis transactions in the Netherlands did not cause an explosion in use; whether and how much it increased consumption and whether the establishment of retail shops mattered remain controversial questions.

This ignorance about the effect on consumption hamstrings attempts to be objective and analytical when discussing the question of whether to legalize any of the currently illicit drugs, and if so, under what conditions.

2) How responsive is the use of drug Y to changes in policy toward drug X?

Polydrug use is the norm, particularly among frequent and compulsive users. (Most users do not fall in that category, but the minority who do account for the bulk of consumption and harms.) Therefore, “scoring” policy interventions by considering only effects on the target substance is potentially misleading.

For example, driving up the price of one drug, say cocaine, might reduce its use, but victory celebrations should be tempered if the reduction stemmed from users switching to methamphetamine or heroin. On the other hand, school-based drug-prevention efforts may generate greater benefits through effects on alcohol and tobacco abuse than via their effects on illegal drug use. Comparing them to other drug-control interventions, such as mandatory minimum sentences for drug dealers, in terms of ability to control illegal drugs alone is a mistake; those school-based prevention interventions are not (just) illicit-drug-control programs.

But policy is largely made one substance at a time. Drugs are added to schedules of prohibited substances based on their potential for abuse and for use as medicine. Reformers clamor for evidence-based policies that rank individual drugs’ harmfulness, as attempted recently by David Nutt, and ban only the most dangerous. Yet it makes little practical sense to allow powder cocaine while banning crack, because anyone with baking soda and a microwave oven can convert powder to crack.

Considerations of substitution or complementarity ought to arise in making policy toward some of the so-called designer drugs. Mephedrone looks relatively good if most of its users would otherwise have been abusing methamphetamine; it looks terrible if in fact it acts as a stepping stone to methamphetamine use. But no one knows which is the case.

Marijuana legalization is in play in a way it has not been since the 1970s. Various authors have produced social-welfare analyses of marijuana legalization, toting up the benefits of reduced enforcement costs and the costs of greater need for treatment, accounting for potential tax revenues and the like.

Yet the marijuana-specific gains and losses from legalization would be swamped by the uncertainties concerning its effects on alcohol consumption. The damage from alcohol is a large multiple of the damage from cannabis; thus a 10% change, up or down, in alcohol abuse could outweigh any changes in marijuana-related outcomes.

There is conflicting evidence as to whether marijuana and alcohol are complements or substitutes; no one can rule out even larger increases or decreases in alcohol use as a result of marijuana legalization, especially in the long run.

Marijuana legalization might also influence heavy use of cocaine or cigarette smoking. But again, no one knows whether that effect would be to drive cocaine or cigarette use up or down, let alone by how much. If doubling marijuana use led to even a 1% increase or decrease in tobacco use, it could produce 4,000 more or 4,000 fewer tobacco-related deaths per year, far more than the (quite small) number of deaths associated with marijuana.
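The arithmetic behind that comparison is worth making explicit. The baseline of roughly 400,000 tobacco-related deaths per year in the United States is an assumed round number, used only to reproduce the order of magnitude cited above.

```python
# Back-of-the-envelope: why a small cross-effect on tobacco use could dominate
# marijuana-specific harms. The baseline death toll is an assumed round number.

annual_tobacco_deaths = 400_000  # assumed U.S. baseline
cross_effect = 0.01              # a 1% change in tobacco use, in either direction

delta = annual_tobacco_deaths * cross_effect
print(f"A 1% change in tobacco use implies roughly {delta:,.0f} deaths per year, more or fewer.")
```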

This uncertainty makes it impossible to produce a solid benefit/cost analysis of marijuana legalization with existing data. That suggests both caution in drawing policy conclusions and aggressive efforts to learn more about cross-elasticities among drugs prone to abuse.

3) Can we stop large numbers of drug-involved criminal offenders from using illicit drugs?

Many county, state, and federal initiatives target drug use among criminal offenders. Yet most do little to curtail drug use or crime. An exception is the drug courts process; some implementations of that idea have been shown to reduce drug use and other illegal behavior. Unfortunately, the resource intensity of drug courts limits their potential scope. The requirement that every participant must appear regularly before a judge for a status hearing means that a drug court judge can oversee fewer than 100 offenders at any time.

The HOPE approach to enforcing conditions of probation and parole, named after Hawaii’s Opportunity Probation with Enforcement, offers the potential for reducing use among drug-involved offenders at a larger scale. Like drug courts, HOPE provides swift and certain sanctions for probation violations, including drug use. HOPE starts with a formal warning that any violation of probation conditions will lead to an immediate but brief stay in jail. Probationers are then subject to regular random drug testing: six times a month at first, diminishing in frequency with sustained compliance. A positive drug test leads to an immediate arrest and a brief jail stay (usually a few days but in some jurisdictions as little as a few hours in a holding cell). Probationers appear before the judge only if they have violated a rule; in contrast, a drug court judge participates in every status review. Thus HOPE sites can supervise large numbers of offenders; a single judge in Hawaii now supervises more than 2,000 HOPE probationers.

In a large randomized controlled trial (RCT), Hawaii’s HOPE program greatly outperformed standard probation in reducing drug use, new crimes, and incarceration among a population of mostly methamphetamine-using felony probationers. A similar program in Tarrant County, Texas (encompassing Arlington and Fort Worth), appears to produce similar results, although it has not yet been verified by an RCT; a smaller-scale program among parolees in Seattle has been verified by an RCT. Reductions in drug use of 80%, in new arrests of 30 to 50%, and in days behind bars of 50% appear to be achievable at scale. The last result is the most striking; get-tough automatic-incarceration policies can reduce incarceration rather than increasing it, if the emphasis is on certainty and celerity rather than severity.

The Department of Justice is funding four additional RCTs; those results should help clarify how generalizable the HOPE outcomes are. But to date there has been no systematic experimentation to test how variations in program parameters lead to variations in results.

Hawaii’s HOPE program uses two days in jail as its typical first sanction. Penalties escalate for repeated violations, and the 15% or so of participants who violate a fourth time face a choice between residential treatment and prison. No one is mandated to undergo treatment except after repeated failures. The results suggest that this is an effective design, but is it optimal? Would some sanction short of jail for the first violation—a curfew, home confinement, or community service—work as well? Are escalating penalties necessary and if so, what is the optimal pattern of escalation? Is there a subset of offenders who ought to be mandated to treatment immediately rather than waiting for failures to accumulate? Should cannabis be included in the list of drugs tested for, as it is in Hawaii, or excluded? How about synthetically produced cannabinoids (sold as “Spice”) and cathinones (sold as “bath salts”), which require more complex and costly screening? Would adding other services to the mix improve outcomes? How can HOPE be integrated with existing treatment-diversion programs and drug courts? How can HOPE principles best be applied to parole, pretrial release, and juvenile offenders?

Answering these questions would require measuring the results of systematic variation in program conditions. There is no strong reason to think that the optimal program design will be the same in every jurisdiction or for every offender population. But it’s time to move beyond the question “Does HOPE work?” to consider how to optimize the design of testing-and-sanctions programs.

4) Can we stop alcohol-abusing criminal offenders from getting drunk?

Under current law, state governments effectively give every adult a license to purchase and consume alcohol in unlimited quantities. Judges in some jurisdictions can temporarily revoke that license for those with an alcohol-related offense by prohibiting drinking and going to bars as conditions of bail or probation. However, because alcohol passes through the body quickly, a typical random-but-infrequent testing regimen would miss most violations, making the revocation toothless.

In 2005, South Dakota embraced an innovative approach to this problem, called 24/7 Sobriety. As a condition of bail, repeat drunk drivers who were ordered to abstain from alcohol were now subject to twice-a-day breathalyzer tests, every day. Those testing positive or missing the test were immediately subject to a short stay in jail, typically a night or two. What started as a five-county pilot program expanded throughout the state, and judges began applying the program to offenders with all types of alcohol-related criminal behavior, not just drunk driving. Some jurisdictions even started using continuous alcohol-monitoring bracelets, which can remotely test for alcohol consumption every 30 minutes. Approximately 20,000 South Dakotans have participated in 24/7—an astounding figure for a state with a population of 825,000.

The anecdotal evidence about the program is spectacular; fewer than 1% of the 4.8 million breathalyzer tests ordered since 2005 were failed or missed. That is not because the offenders have no interest in drinking. About half of the participants miss or fail at least one test, but very few do so more than once or twice. 24/7 is now up and running in other states, and will soon be operating in the United Kingdom. As of yet there are no peer-reviewed studies of 24/7, but preliminary results from a rigorous quasi-experimental evaluation suggest that the program did reduce repeat drunk driving in South Dakota. Furthermore, as with HOPE, there remains a need to better understand for whom the program works, how long the effects last, the mechanism(s) by which it works, and whether it can be effective in a more urban environment.

Programs such as HOPE and 24/7 can complement traditional treatment by providing “behavioral triage.” Identifying which subset of substance abusers cannot stop drinking on their own, even under the threat of sanctions, allows the system to direct scarce treatment resources specifically to that minority.

Another way to take away someone’s drinking license would be to require that bars and package stores card every buyer and to issue modified driver’s licenses with nondrinker markings on them to those convicted of alcohol-related crimes. This approach would probably face legal and political challenges, but that should not discourage serious analysis of the idea.

There is also strong evidence that increasing the excise tax on alcohol could reduce alcohol-related crime. Duke University economist Philip Cook estimates that doubling the federal tax, leading to a price increase of about 10%, would reduce violent crime and auto fatalities by about 3%, a striking saving in deaths for a relatively minor and easy-to-administer policy change. There is also evidence that formal treatment, both psychological and pharmacological, can yield improvements in outcomes for those who accept it.
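Cook’s figures imply an elasticity of these harms with respect to the price of alcohol of roughly -0.3 (a 3% reduction from a 10% price increase). The short calculation below simply restates that relationship; the baseline count is a placeholder, not a real statistic.

```python
# Restating the cited estimate as a simple elasticity calculation.
# The implied elasticity follows from the figures quoted in the text;
# the baseline count of deaths is a placeholder, not real data.

price_increase = 0.10    # roughly a 10% retail price rise from doubling the federal excise tax
harm_elasticity = -0.3   # implied by the cited ~3% reduction in violent crime and auto fatalities

harm_change = harm_elasticity * price_increase   # -0.03, i.e., a 3% reduction
baseline_deaths = 10_000                         # placeholder baseline
print(f"harm change: {harm_change:+.1%}; deaths averted on the placeholder baseline: "
      f"{-harm_change * baseline_deaths:,.0f}")
```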

5) How concentrated is hard-drug use among active criminals?

Literally hundreds of substances have been prohibited, but the big three expensive drugs (sometimes called the “hard” drugs)—cocaine, including crack; heroin; and methamphetamine—account for most of the societal harm. The serious criminal activity and other harms associated with those substances are further highly concentrated among a minority of their users. Many people commit a little bit of crime or use hard drugs a handful of times, but relatively few make a habit of either one. Despite their relatively small numbers, those frequent users and their suppliers account for a large share of all drug-related crime and violence.

The populations overlap; an astonishing proportion of those committing income-generating crimes, such as robbery, as opposed to arson, are drug-dependent and/or intoxicated at the time of their offense, and a large proportion of frequent users of expensive drugs commit income-generating crime. Moreover, the two sets of behaviors are causally linked. Among people with drug problems who are also criminally active, criminal activity tends to rise and fall with drug consumption. Reductions in crime constitute a major benefit of providing drug treatment for the offender population, or of imposing HOPE-style community supervision.

Reducing drug use among active offenders could also shrink illicit drug markets, producing benefits everywhere, from inner-city neighborhoods wracked by flagrant drug dealing to source and transit countries such as Colombia and Mexico.

A back-of-the-envelope calculation suggests the potential size of these effects. The National Survey on Drug Use and Health estimates the number of drug users in the household population. The Arrestee Drug Abuse Monitoring Program measures the rate of active substance use among active offenders (by self-report and urinalysis). Two decades ago, an author of this article (Kleiman) and Chris Putala, then on the Senate Judiciary Committee staff, used the predecessors of those surveys to estimate that about three-quarters of all heavy (more-than-weekly) cocaine users had been arrested for a nondrug felony in the previous year.

Applying the Pareto Law’s rule of thumb that 80% of the volume of any activity is likely to be accounted for by about 20% of those who engage in it—true, for example, about the distribution of alcohol consumption—suggests that something like three-fifths of all the cocaine is used by people who get arrested in the course of a typical year and who are therefore likely to be on probation, parole, or pretrial release if not behind bars.

Combining that calculation with the result from HOPE that frequent testing with swift and certain sanctions can shrink (in the Hawaii case) methamphetamine use among heavily drug-involved felony probationers by 80%, suggests that total hard-drug volume might be reduced by something like 50% if HOPE-style supervision were applied to all heavy users of hard drugs under criminal-justice supervision. No known drug-enforcement program has any comparable capability to shrink illicit-market volumes.

By the same token, HOPE seems to reduce criminal activity, as measured by felony arrests, by 30 to 50%. If frequent offenders commit 80% of income-generating crime, and half of those frequent offenders also have serious hard-drug problems, such a reduction in offending within that group could reduce total income-generating crime by approximately 15 to 20%, while also decreasing the number of jail and prison inmates.
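The chain of back-of-the-envelope estimates in the last few paragraphs can be laid out step by step. Every input below is one of the rough figures quoted above, so the outputs carry all of the same imprecision.

```python
# Reproducing the back-of-the-envelope arithmetic from the text.
# Inputs are the rough figures quoted above; results are order-of-magnitude only.

# Share of hard-drug volume consumed by people arrested in a typical year:
share_heavy_users_arrested = 0.75      # Kleiman & Putala estimate
share_volume_heavy_users = 0.80        # Pareto-style 80/20 rule of thumb
supervisable_volume = share_heavy_users_arrested * share_volume_heavy_users   # ~0.6

# Potential cut in total hard-drug volume if HOPE-style supervision reduces
# use in that group by about 80%:
use_reduction = 0.80
print(f"hard-drug volume down ~{supervisable_volume * use_reduction:.0%}")

# Potential cut in income-generating crime:
share_crime_frequent_offenders = 0.80
share_with_hard_drug_problems = 0.50
for offending_reduction in (0.30, 0.50):   # HOPE's observed range for felony arrests
    crime_cut = (share_crime_frequent_offenders
                 * share_with_hard_drug_problems
                 * offending_reduction)
    print(f"offending down {offending_reduction:.0%} -> income-generating crime down ~{crime_cut:.0%}")
```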

The Kleiman and Putala estimate was necessarily crude because it was based on studies that weren’t designed to measure the concentration of hard-drug use among offenders. Unfortunately, no one in the interim has attempted to refine that estimate with more precise methods (for example, stochastic-process modeling) or more recent data.

6) What is the evidence for evidence-based practices?

Many agencies now recommend (and some states and federal grant programs mandate) adoption of prevention and treatment programs that are evidence-based. But the move toward evidence-based practices has one serious limitation: the quality of the evidence base. It is important to ask what qualifies as evidence and who gets to produce it. Many programs are expanded and replicated on the basis of weak evidence. Study design matters. A review by George Mason University criminologist David Weisburd and colleagues showed that the effect size of offender programs is negatively related to study quality: The more rigorous the study, the smaller its reported effects.

Who does the evaluation can also make a difference. Texas A&M epidemiologist Dennis Gorman found that evaluations authored by program developers report much larger effect sizes than those authored by independent researchers. Yet Benjamin Wright and colleagues reported that more than half of the substance-abuse programs targeting criminal-justice populations that were designated as evidence-based on the Substance Abuse and Mental Health Services Administration’s (SAMHSA’s) National Registry of Evidence-Based Programs and Practices (NREPP) include the program developer as evaluator. Consequently, we may be spending large sums of money on ineffective programs. Many jurisdictions, secure in their illusory evidence base, could become complacent about searching for alternative programs that really do work.

We need to get better at identifying effective strategies and helping practitioners sort through the evidence. Requiring that publicly funded programs be evaluated and show improved outcomes using strong research designs—experimental designs where feasible, well-designed historical-control strategies where experiments can’t be done, and “intent-to-treat” analyses rather than cherry-picking success by studying program completers only—would cut the number of programs designated as promising or evidence-based by more than 75%. Not only would this relieve taxpayers of the burden of supporting ineffective programs, it would also help researchers identify more promising directions for future intervention research.

The potential for selection biases when studying drug-involved people is substantial and thus makes experimental designs much more valuable. Starting small is key: it avoids expense and, equally important, it avoids champions with bruised egos. It is difficult to scale back a program once an agency becomes invested in the project. Small pilot evaluations that do show positive outcomes can then be replicated and expanded if the replications show similarly positive results.

7) What treats stimulant abuse?

Science can alleviate social problems not only by guiding policy but also by inventing better tools. The holy grail of such innovations would be a technology that addresses stimulant dependence.

The ubiquitous “treatment works” mantra masks a sharp disparity in technologies available for treating opiates (heroin and oxycodone) as opposed to stimulants (notably cocaine, crack, and meth). A variety of so-called opiate-substitute therapies (OSTs) exist that essentially substitute supervised use of legal, pure, and cheap opiates for unsupervised use of street opiates. Methadone is the first and best-known OST, but there are others. A number of countries even use clinically supplied heroin to substitute for street heroin.

OST stabilizes dependent individuals’ chaotic lives, with positive effects on a wide range of life outcomes, such as increased employment and reduced criminality and rates of overdose. Sometimes stabilization is a first step toward abstinence, but for better and for worse the dominant thinking since the 1980s has been to view substitution therapy as an open-ended therapy, akin to insulin for diabetics. Either way, OST consistently fares very well in evaluations that quantify social benefits produced relative to program costs.

There is no comparable technology for treating stimulant dependence. This is not for lack of trying. The National Institute on Drug Abuse has invested hundreds of millions of dollars in the quest for pharmacotherapies for stimulants. Decades of work have produced many promising advances in basic science, but with comparatively little effect on clinical practice. The gap between opiate and stimulant treatment technologies matters more in the United States and the rest of the Western Hemisphere, where stimulants have a large market, than in the rest of the world, where opiates remain predominant.

There are two reactions to this zero-for-very-many batting average. One is to redouble efforts; after all, Edison tried a lot of filament materials before hitting on carbonized bamboo. The other is to give up on the quest for a chemical that can offset, undo, or modulate stimulants’ effects in the brain and to pursue other approaches. For example, immunotherapies are a fundamentally different technology inasmuch as the active introduced agent does not cross the blood-brain barrier. Rather, the antibodies act more like interdiction agents, intercepting drug molecules between ingestion and their crossing of the blood-brain barrier rather than interdicting at the nation’s border.

There is evidence from clinical trials showing that some cognitive-behavioral therapies can reduce stimulant consumption for some individuals. Contingency management also takes a behavioral rather than a chemical approach, essentially incentivizing dependent users to remain abstinent. The stunning finding is that, properly deployed, very small incentives (for example, vouchers for everyday items) can induce much greater behavioral change than can conventional treatment methods alone.

The ability of contingency management to reduce consumption, and the finding that even the heaviest users respond to price increases by consuming less, profoundly challenge conventional thinking about the meaning of addiction. They seem superficially at odds with the clear evidence that addiction is a brain disease with a physiological basis. Brain-imaging studies let us see literally how chronic use changes the brain in ways that are not reversed by mere withdrawal of the drugs. So just as light simultaneously displays characteristics of a particle and a wave, so too addiction simultaneously has characteristics of a physiological disease and a behavior over which the person has (at least limited) control.

8) What reduces drug-market violence?

Drug dealers can be very violent. Some use violence to settle disputes about territory or transactions; others use violence to climb the organizational ladder or intimidate witnesses or enforcement officials. Because many dealers have guns or have easy access to them, they also sometimes use these weapons to address conflicts that have nothing to do with drugs. Because the market tends to replace drug dealers who are incarcerated, there is little reason to think that routine drug-law enforcement can reduce violence; the opposite might even be true if greater enforcement pressure makes violence more advantageous to those most willing to use it.

That raises the question of whether drug-law enforcement can be designed specifically to reduce violence. One set of strategies toward this end is known as focused deterrence or pulling-levers policing. These approaches involve law-enforcement officials directly communicating a credible threat to violent individuals or groups, with the goal of reducing the violence level, even if the level of drug dealing or gang activity remains the same. Such interventions aim to tip situations from high-violence to low-violence equilibria by changing the actual and perceived probability of punishment; for example, by making violent drug dealing riskier, in enforcement terms, than less violent drug dealing.

The seminal effort was the Boston gun project Ceasefire, which focused on reducing juvenile homicides in the mid-1990s. Recognizing that many of the homicides stemmed from clashes between juvenile gangs, the strategy focused on telling members of each gang that if anyone in the gang shot someone (usually a member of a rival gang) police and prosecutors would pull every lever legally available against the entire gang, regardless of which individual had pulled the trigger. Instead of receiving praise from colleagues for increasing the group’s prestige, the potential shooter now had to deal with the fact that killing put the entire group at risk. Thus the social power of the gang was enlisted on the side of violence reduction. The results were dramatic: Youth gun homicides in Boston fell from two a month before the intervention to zero while the intervention lasted. Variants of Ceasefire have been implemented across the country, some with impressive results.

An alternative to the Ceasefire group-focused strategy is a focus on specific drug markets where flagrant dealing leads to violence and disorder. Police and prosecutors in High Point, North Carolina, adopted a focused-deterrence approach, which involved strong collaborations with community members. Their model, referred to as the Drug Market Intervention, involved identifying all of the dealers in the targeted market, making undercover buys from them (often on film), arresting the most violent dealers, and not arresting the others. Instead, the latter were invited to a community meeting where they were told that, although cases were made against them, they were going to get another chance as long as they stopped dealing. The flagrant drug market in that neighborhood, as David Kennedy reports, vanished literally overnight and has not reappeared for the subsequent seven years. The program has been replicated in dozens of jurisdictions, and there is a growing evidence base showing that it can reduce crime.

A third approach recognizes the heterogeneity in violence among individual drug dealers. By focusing enforcement on those identified as the most violent, police can create both Darwinian and incentive pressures to reduce the overall violence level. This technique has yet to be systematically evaluated; doing so would be an attractive research opportunity if a jurisdiction were willing to try such an approach.

An especially challenging problem is dealing-related violence in Mexico, now claiming more than 1,000 lives per month. It is worth considering whether a Ceasefire-style strategy might start a tipping process toward a less violent market. Such a strategy could exploit two features of the current situation: The Mexican groups make most of their money selling drugs for distribution in the United States, and the United States has much greater drug enforcement capacity than does Mexico. If the Mexican government were to select one of the major organizations and target it for destruction after a transparent process based on relative violence levels, U.S. drug-law enforcement might be able to put the target group out of business by focusing attention on the U.S. distributors that buy their drugs from the target Mexican organization, thereby pressuring them to find an alternative source. If that happened, the target organization would find itself without a market for its product.

If one organization could be destroyed in this fashion, the remaining groups might respond to an announcement that a second selection process was underway by competitively reducing their violence levels, each hoping that one of its rivals would be chosen as the second target. The result might be—with the emphasis on might—a dramatic reduction in bloodshed.

Whatever the technical details of violence-minimizing drug-law enforcement, its conceptual basis is the understanding that in established markets enforcement pressure can have a greater effect on how drugs are sold than on how much is sold. So violence reduction is potentially more feasible than is greatly reducing drug dealing generally.

Conclusion

Drug policy involves contested questions of value as well as of fact; that limits the proper role of science in policymaking. And many of the factual questions are too hard to be solved with the current state of the art: The mechanisms of price and quantity determination in illicit markets, for example, have remained largely impervious to investigation. Conversely, research on drug abuse can provide insight into a variety of scientifically interesting questions about the nature of human motivation and self-regulation, complicated by imperfect information, intoxication, and impairment, and engaging group dynamics and tipping phenomena; not every study needs to be justified in terms of its potential contribution to making better policy. However, good theory is often developed in response to practical challenges, and policymakers need the guidance of scientists. Broadening the current research agenda away from biomedical studies and evaluations of the existing policy repertoire could produce both more interesting science and more successful policies.

Valuing the Environment for Decisionmaking

Making thoughtful decisions about environmental challenges that involve wide-ranging and potentially irreversible consequences is of profound importance for current and future human well-being. How much and how fast should greenhouse gas emissions be reduced to minimize global climate change? What standards should be set for air and water quality? What should be done to protect biodiversity and to maintain ecological processes? Addressing such questions involves weighing benefits and costs in multiple dimensions. In spite of the high stakes, however, the nation—its government and society—often fails to take systematic account of the environmental consequences in its actual decisionmaking and instead follows standard operating procedures or existing legislative mandates, or simply muddles through.

Virtually all important environmental management and policy decisions have a wide range of effects. For example, zoning or development decisions about land use can have a variety of environmental impacts (for example, on local water and air quality, the potential for flooding downstream, carbon sequestration, and habitat for wildlife) as well as economic and social effects (on economic development, jobs, and income). Similarly, decisions on limits on emissions of air pollutants or greenhouse gases can affect a range of environmental, economic, and social concerns. These results affect multiple groups who often have very different views about desired outcomes (for example, developers versus environmentalists). Effects differ across geography (upstream versus downstream) and time (current versus future impacts). Choosing among management or policy options that differ in terms of environmental, economic, and social outcomes with spatial and temporal components may at first glance seem overwhelmingly complex, with dimensions that seem incomparable. Good environmental management and policy decisionmaking, however, necessitates systematic evaluation and consideration of the effects of management and policy on the affected public. Even though the quantitative valuation of these effects will never be perfect, the outcome of attempts to assess value provides important information to help guide decisionmaking.

Decisions, decisions

Management and policy decisions typically involve difficult tradeoffs that bring improvements in some dimensions and declines in others. Ultimately, deciding whether to choose management or policy alternative A or B requires an evaluation of whether A or B is “better,” where better is determined by the objectives of the decisionmaker. It is easy to conclude that one alternative is better than another if it is better in all dimensions. But making comparisons in which one alternative is better in some dimensions but worse in others requires making difficult value judgments. For example, clearing land for housing development may result in higher incomes and more jobs but reduce habitat for species and worsen local water quality. Whether land clearing is the right decision will depend on whether an increase in incomes and jobs is valued more highly than maintaining habitat and water quality. But how can one really compare income versus habitat for species or jobs versus water quality? Comparing across these different dimensions seems like comparing the proverbial apples and oranges. Reaching an environmental management or policy decision, though, requires the decisionmaker to compare apples and oranges, either explicitly or implicitly.

For an individual, deciding which college to attend, where to live, or what job to take is often a hard choice to make, in large part because it involves changes in multiple dimensions simultaneously. Moving to a new job in a new city may be a better professional opportunity and offer a new set of cultural amenities, but is it worth disrupting family life, moving away from friends, and making adjustments to a new community? Though it is difficult to compare such alternatives, people do make these decisions all the time. In choosing an option, taking account of all the factors, people make a determination that one option is better than the other available options.

As difficult as such choices can be for an individual, making environmental management and policy decisions adds yet another level of complexity. Such decisions affect many people simultaneously and thus require finding a way to aggregate values across different people to reach a decision. Management and policy decisions can make some groups better off while making others worse off, requiring a different sort of apples-and-oranges comparison.

Two methods used in such multidimensional, multiperson decisionmaking contexts are economic benefit/cost calculations and multicriteria decision analysis (MCDA). Each of these methods transforms a complex multidimensional problem involving multiple people into a single dimension that can be used to rank alternatives. These methods act like a blender that mixes apples and oranges to produce a fruit smoothie. Decisionmakers can then decide which fruit smoothie they like the best.

Economics reduces multidimensional problems to a single dimension by measuring the value of changes in each dimension with a common metric, which is typically, but not necessarily, a monetary metric. Economists tend to prefer a monetary metric because it is a pervasive, intuitive, and easily observable measure of the values that people attribute to an array of everyday goods and services. In well-functioning markets, the price of a good or service reflects its marginal value to the buyer measured in terms of the common monetary metric: what the buyer is willing to pay to have the good or service. This fact makes the marginal values of many very different goods and services commensurable. The concept extends even to environmental attributes that do not have a market value, such as clean air, as long as people are willing to make tradeoffs in their consumption of some market goods in order to obtain other nonmarket attributes.

The ability to measure values with a common monetary metric rests on two key premises. First, individual willingness to pay for an item is assumed to accurately represent the value of that item to the individual: that is, how much better off the individual is with the item than without the item, measured in monetary terms. Second, the aggregation of values to the societal level requires that the correspondence between willingness to pay and well-being be comparable across individuals, so that a measure of societal value is equal to the (appropriately weighted) sum of values across all individuals in society. This comparability is necessary in order to do benefit/cost analysis resulting in a single number that summarizes social net benefits.

With the ability to produce an aggregate social net benefit calculation for any policy option, the economic benefit/cost decision rule is simple: Choose the option that maximizes social net benefits. This simple rule can be extended to account for uncertainty by maximizing expected social net benefits, where net benefits for individuals can include risk aversion (that is, a willingness to pay to avoid being subjected to uncertain outcomes). The decision rule can also incorporate constraints that restrict outcomes, so that they do not violate minimum environmental standards or basic human rights. As noted, however, the social net benefit calculation requires that individuals evaluate multiple dimensions with a single monetary metric of value and that these values be comparable across individuals. Without such interpersonal comparability, management or policy changes resulting in both winners and losers cannot be evaluated. In this case, only alternatives in which everyone is better off are clearly superior, and such alternatives are extremely unlikely to emerge.

Benefit/cost calculations have been applied to a wide variety of environmental policies. All recent presidents, both Democratic and Republican, have required agencies to evaluate the benefits and costs of regulations, including environmental regulations. Executive Order 12866, signed by President Clinton in 1993, states that agencies “shall assess both the costs and the benefits of the intended regulation” and that “in choosing among alternative regulatory approaches, agencies should select those approaches that maximize net benefits.” The Environmental Protection Agency (EPA) has done extensive benefit/cost calculations of regulations, particularly regulations under the Clean Air Act. The EPA estimated that the 1990 Clean Air Act would provide benefits of $2 trillion between 1990 and 2020 while imposing costs of $65 billion, a benefit-to-cost ratio of approximately 30-to-1. A prior study of the benefits and costs of the Clean Air Act from 1970 to 1990 found a similarly large benefit-to-cost ratio.
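A minimal sketch of the decision rule described above is to rank options by expected social net benefits. The option names, probabilities, and dollar values below are invented for illustration; the only real figures are the Clean Air Act totals quoted above, used to reproduce the roughly 30-to-1 ratio.

```python
# Minimal sketch of the benefit/cost decision rule: choose the option with the
# largest expected social net benefits. Options and dollar values are invented.

options = {
    # option: list of (probability, benefits_usd, costs_usd) across possible outcomes
    "status quo":      [(1.0, 0.0, 0.0)],
    "moderate policy": [(0.7, 120e9, 40e9), (0.3, 60e9, 40e9)],
    "strict policy":   [(0.7, 200e9, 150e9), (0.3, 90e9, 150e9)],
}

def expected_net_benefits(outcomes):
    return sum(p * (benefits - costs) for p, benefits, costs in outcomes)

for name, outcomes in options.items():
    print(f"{name:16s} expected net benefits: ${expected_net_benefits(outcomes) / 1e9:6.1f} billion")
print("choose:", max(options, key=lambda name: expected_net_benefits(options[name])))

# The Clean Air Act figures quoted in the text imply a ratio of roughly 30 to 1:
print(f"Clean Air Act benefit-to-cost ratio: {2e12 / 65e9:.1f} to 1")
```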

The economic benefit/cost approach to maximizing social net benefits may be thought of as belonging to the broader class of MCDA methods, all of which require explicit or implicit weighting of various attributes of expected outcomes of management or policy decisions. Although some MCDA methods accommodate only quantitative attributes, others also permit qualitative attributes. Given attributes and weights, different MCDA methods take different approaches to evaluating alternatives. Some methods seek to identify the best alternative, similar to the economic approach of maximizing social net benefits, while others, such as goal programming, seek to identify alternatives that meet certain thresholds of performance. In goal programming, aspirational or minimally acceptable thresholds are set for each criterion, and alternatives are evaluated according to the priority-weighted distances by which criteria fall short of these thresholds. In general, MCDA methods seek to maximize a social welfare function of a particular, often implicit, form.
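To make the contrast concrete, the sketch below scores a set of invented alternatives two ways: a weighted sum that picks the highest-scoring option, and a goal-programming score that penalizes priority-weighted shortfalls from stated thresholds. The attributes, scores, weights, and thresholds are all hypothetical.

```python
# Illustrative MCDA scoring: weighted sum versus goal programming.
# Alternatives, attribute scores (0-10), weights, and thresholds are invented.

alternatives = {
    "develop": {"jobs": 8, "water quality": 3, "habitat": 2},
    "restore": {"jobs": 3, "water quality": 8, "habitat": 9},
    "mixed":   {"jobs": 6, "water quality": 6, "habitat": 5},
}
weights = {"jobs": 0.4, "water quality": 0.3, "habitat": 0.3}
thresholds = {"jobs": 5, "water quality": 5, "habitat": 4}  # minimally acceptable levels

def weighted_sum(scores):
    return sum(weights[a] * s for a, s in scores.items())

def goal_shortfall(scores):
    # Priority-weighted distance below each threshold; smaller is better.
    return sum(weights[a] * max(0, thresholds[a] - s) for a, s in scores.items())

print("weighted-sum choice:    ", max(alternatives, key=lambda k: weighted_sum(alternatives[k])))
print("goal-programming choice:", min(alternatives, key=lambda k: goal_shortfall(alternatives[k])))
```

In this contrived example the two methods select different alternatives, which is exactly the sense in which every MCDA method embeds a particular, often implicit, social welfare function.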

Setting relative values

To be operational, benefit/cost and MCDA methods require information on relative values (weights) for different dimensions of value affected by environmental management or policy. Economics and decision sciences tend to take different approaches to assembling information about values. In economics, the values of different management or policy options are derived from aggregating the net benefits to individuals in society for that option. In decision sciences, a variety of methods are used to assemble information on weights to assign to different dimensions.

The task of the economist in understanding relative values for an individual is far easier for marketed goods and services than for nonmarketed environmental attributes. For marketed goods and services, economists use observations on how much is purchased at a given price over a range of different prices to construct a demand function. The demand function summarizes information on the willingness to pay of the individual for the good or service. In competitive markets, the supply function reflects the marginal cost of producing the good or service. Demand and supply can be used to define economic surplus, which is the difference between the (marginal) willingness to pay given by demand and the marginal cost of production given by supply. Summing this difference over the entire quantity traded yields economic surplus: the value generated from the production and consumption of the good or service.

Some environmental changes directly affect marketed goods and services, and the value of these effects can be evaluated by assessing the net change in economic surplus in the affected markets. Take, for example, the potential effects of excess nutrients in a body of water that cause dead zones (areas of low oxygen), resulting in lowered fish and shellfish populations and reduced commercial harvests. With basic information about consumer demand and the costs of supply, economists can estimate the expected loss in economic surplus from the reduction in harvests. Adjustments to economic surplus calculations are necessary when market imperfections, such as monopoly pricing, taxes, or subsidies, result in price distortions so that prices are not a true reflection of the value of marketed goods and services.
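With linear demand and supply curves, the surplus calculation reduces to simple areas, which is enough to illustrate the dead-zone example. The curve parameters and the harvest limit below are hypothetical.

```python
# Illustrative loss of economic surplus when a dead zone limits the harvest.
# Linear demand and supply curves with hypothetical parameters.

def total_surplus(max_quantity=None):
    """Consumer plus producer surplus when at most max_quantity can be traded.
    Inverse demand: P = 20 - 0.5*Q.  Inverse supply (marginal cost): P = 2 + 0.1*Q."""
    q_star = (20 - 2) / (0.5 + 0.1)   # unconstrained equilibrium quantity = 30
    q = q_star if max_quantity is None else min(q_star, max_quantity)
    # Integral of (demand - supply) from 0 to q: 18*q - 0.3*q**2
    return 18 * q - 0.3 * q ** 2

baseline = total_surplus()
constrained = total_surplus(max_quantity=20)   # hypoxia limits the harvest to 20 units
print(f"surplus lost to the dead zone: {baseline - constrained:.1f} (price x quantity units)")
```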

The concept of economic surplus (value) also applies to environmental attributes, such as clean air or access to natural areas, for which there is no market. Valuing nonmarket goods and services is more difficult, because there is no readily observable signal of value that is comparable to a market price. Economists have devised a suite of nonmarket valuation tools that can be applied to value nonmarketed environmental attributes. Some nonmarket valuation methods use observable expenditure on a different marketed good or service to draw an inference about the value of the nonmarketed environmental attribute of interest. For example, housing prices may reflect the increased willingness to pay for housing in locations with better environmental amenities, such as access to lakes and parks or better air quality. The choice of where to recreate can reveal information about the relative value of environmental amenities that vary across recreation sites. Other methods of estimating value track changes in expenditures, such as the change in the cost of treating drinking water as water quality changes.
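
The housing-price example corresponds to a hedonic regression of roughly the following form. This is a stylized rendering, not a particular study's specification:

```latex
% Stylized hedonic price regression (illustrative): the log of house price p_i
% is related to an environmental amenity a_i (say, proximity to a lake or local
% air quality) and a vector of other housing characteristics x_i.
\[
  \ln p_i \;=\; \alpha \;+\; \beta\, a_i \;+\; \gamma' x_i \;+\; \varepsilon_i
\]
% The coefficient beta is read as the marginal willingness to pay (in percentage
% terms) for the amenity, holding the other characteristics constant.
```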

Economists cannot use observed expenditures to value all important changes to the environment. For example, if all of the lakes in a region are polluted and no one uses them for recreation, it will be difficult to assess the value of reducing pollution on recreational value, unless one is willing to make inferences from other regions. More fundamentally, for some environmental attributes, particularly non-use benefits such as knowing that species exist, there are limited or no directly observable expenditures or other behavioral clues. In the absence of observable behavior, economists use survey questions to ask people about values for changes in environmental attributes. Such “stated preference” methods include contingent valuation and conjoint analysis. The contingent valuation method presents survey respondents with a hypothetical change in the environment, such as a 10% increase in the size of humpback whale populations, and asks whether they would be willing to pay a specified amount for the change. Varying the specified amount and observing the proportion of people saying yes generates information analogous to a demand curve for marketed goods and services. In conjoint analysis, people are asked to rank a series of outcomes that differ in the quantities of various attributes. Conjoint analysis allows direct evaluation of how people trade off one attribute against another, such as an improvement in air quality versus greater access to open space. If one of the attributes is income or expenditure, then the analyst can also estimate willingness to pay.
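
The acceptance-rate logic behind contingent valuation can be sketched in a few lines of code. The survey responses below are invented, and the final step is only a crude lower-bound tabulation, not the parametric or Turnbull-style estimators used in practice:

```python
# Minimal sketch of how yes/no responses to varying bid amounts in a
# contingent valuation survey trace out a demand-like acceptance curve.

from collections import defaultdict

# (bid in dollars, answered_yes) pairs from a hypothetical survey.
responses = [(5, True), (5, True), (5, False),
             (10, True), (10, False), (10, True),
             (20, True), (20, False), (20, False),
             (40, False), (40, False), (40, True)]

yes, total = defaultdict(int), defaultdict(int)
for bid, answer in responses:
    total[bid] += 1
    yes[bid] += answer

# Share saying yes at each bid: the analogue of quantity demanded at a price.
curve = {bid: yes[bid] / total[bid] for bid in sorted(total)}
print("acceptance rates:", curve)

# Crude lower-bound mean willingness to pay: assume everyone who says yes at a
# bid is willing to pay at least that bid and nothing more.
lower_bound, prev_share, prev_bid = 0.0, 1.0, 0.0
for bid in sorted(curve):
    # mass of respondents whose WTP falls between the previous bid and this one
    lower_bound += prev_bid * max(0.0, prev_share - curve[bid])
    prev_share, prev_bid = curve[bid], bid
lower_bound += prev_bid * prev_share  # those still saying yes at the top bid
print(f"lower-bound mean WTP ≈ ${lower_bound:.2f}")
```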

Some actions, such as emissions of greenhouse gases, cause changes in multiple dimensions that occur over extended periods. For example, a change in carbon storage in ecosystems that reduces atmospheric concentrations causes changes in climate forcing and ocean acidification, which in turn affect myriad other environmental attributes, including precipitation patterns, with effects on agricultural production, the probability and severity of flooding, and the health of marine resources, among others. Summarizing the value of all these changes into a single estimate of the social cost of carbon (SCC) requires complex integrated assessment models that predict both environmental and economic outcomes and attach estimates of the value of those outcomes. Further complicating matters, SCC estimates depend on levels of emissions that can be affected by the very policy choice that SCC is meant to inform. For this reason and others, such as the choice of social discount rate, the estimates of the SCC range from near zero to hundreds of dollars per ton of carbon.
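
To see why the discount rate alone can move SCC estimates across such a wide range, consider a deliberately stylized calculation. The damage path below is invented, and nothing here stands in for an actual integrated assessment model:

```python
# Stylized illustration (not an integrated assessment model) of why the choice
# of discount rate moves social-cost-of-carbon estimates so much. Assume one
# extra ton emitted today causes a small, slowly growing stream of damages over
# 200 years; the damage path is purely hypothetical.

horizon = 200
damages = [0.10 * (1.02 ** t) for t in range(horizon)]  # $ per ton per year, assumed

def present_value(stream, rate):
    return sum(d / (1.0 + rate) ** t for t, d in enumerate(stream))

for rate in (0.01, 0.03, 0.05, 0.07):
    print(f"discount rate {rate:.0%}: SCC ≈ ${present_value(damages, rate):,.0f} per ton")
```

Even with an identical damage stream, moving the discount rate from 1% to 7% shrinks the present value by more than an order of magnitude, which is one reason published SCC estimates diverge so sharply.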

Instead of the often-complex process of economic valuation, MCDA typically relies on a set of alternative methods for establishing relative values or weights on different criteria, to be chosen by the decisionmakers. The identification of weights may be done by introspection, deliberation, or negotiation—or some combination of the three—among stakeholders. Setting relative weights may also be done as part of an iterative process in which alternatives are evaluated, weights reassessed in light of the evaluation, and new criteria weights applied.

One example of how relative weights for different criteria are set in MCDA is through application of the analytical hierarchy process. In this process, decisionmakers are asked to determine a set of top-level criteria, and within each of these to determine the subcomponent criteria. They are then asked to rank the relative importance of criteria at each level of the hierarchy. For example, suppose a decisionmaker is evaluating policies aimed at controlling non–point-source pollution from agriculture with two overarching criteria of water quality and economic effects. If these criteria are assigned equal importance, then each receives a weight of 0.5. At the next level of hierarchy, suppose that the water quality criteria include water clarity, dissolved oxygen content, and temperature, and that the economic criteria include farm income and jobs. If the decisionmaker believes that water clarity is twice as important as dissolved oxygen, and dissolved oxygen is twice as important as temperature, their weights at this level of hierarchy are 4/7, 2/7, and 1/7, respectively. Suppose that jobs are ranked as twice as important as farm income, then the weights would be 2/3 and 1/3. The overall weights in the analysis would then be 0.5 times these values: 2/7 for water clarity, 1/7 for dissolved oxygen content, 1/14 for water temperature, 1/3 for jobs, and 1/6 for farm income.
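
For readers who want to check the arithmetic, the hierarchy weights described above can be reproduced directly; the sketch below simply normalizes the stated importance ratios at each level and multiplies down the hierarchy:

```python
# Reproducing the hierarchy-weight arithmetic from the example in the text.

top_level = {"water quality": 0.5, "economics": 0.5}

# Within water quality: clarity is twice as important as dissolved oxygen,
# which is twice as important as temperature -> ratios 4:2:1.
wq_ratios = {"clarity": 4, "dissolved oxygen": 2, "temperature": 1}
# Within economics: jobs are twice as important as farm income -> ratios 2:1.
econ_ratios = {"jobs": 2, "farm income": 1}

def normalize(ratios):
    s = sum(ratios.values())
    return {k: v / s for k, v in ratios.items()}

overall = {}
for name, w in normalize(wq_ratios).items():
    overall[name] = top_level["water quality"] * w
for name, w in normalize(econ_ratios).items():
    overall[name] = top_level["economics"] * w

for name, w in overall.items():
    print(f"{name}: {w:.4f}")
# clarity 2/7 ≈ 0.2857, dissolved oxygen 1/7 ≈ 0.1429, temperature 1/14 ≈ 0.0714,
# jobs 1/3 ≈ 0.3333, farm income 1/6 ≈ 0.1667 — and the five weights sum to 1.
```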

A potentially important difference between economic and MCDA approaches to valuation is in whose values are incorporated. In principle, valuation in benefit/cost assessments includes the value of everyone affected by management or policy choices, though in practice there may be questions about whether economic valuation methods accurately reflect societal values. In MCDA, it is typically a smaller subset of people that is involved in setting relative weights. For local-scale problems, MCDA methods could include all affected parties in a deliberative process, but as the scale of the problem grows, this will not be possible. For larger-scale environmental problems, ranging up to global concerns such as climate change, there is the question of representation and whether those present adequately reflect the views of the wider public. In addition, relative weights in MCDA should not be treated as constant but should reflect changes in circumstances, something that is typically captured in economic valuation methods.

Weighty issues

Any environmental management or policy decision is likely to entail winners and losers. How should the distribution of benefits and costs across groups be treated in environmental management and policy decisions? Critics of benefit/cost analysis contend that reliance on economic valuation systematically disadvantages those with less money. Greater wealth means greater ability (and thus willingness) to pay, so benefit/cost analysis effectively gives more weight to those with more money (“voting with dollars”). One way to answer this criticism is to give a higher weight to the values of those with less wealth. Economists have found considerable evidence of diminishing marginal utility of income, meaning that the value of an additional dollar to a poor person is greater than to a rich person. This fact can be used to justify “equity weights” based on differences in wealth. For example, an equity weight argument would mean that otherwise equal damages from future climate change should be given greater weight in low-income countries than in high-income countries. In addition, if society is committed to protecting the interests of particular groups, it can constrain consideration of options to those that achieve specified distributional goals.

Since the effects of alternative environmental management and policy options will differ across generations, a fundamental challenge in valuing environmental management and policy decisions is how to aggregate benefits and costs that accrue to current and future generations (intergenerational distribution). For example, more aggressive climate change mitigation strategies impose costs on the current generation but generate benefits for future generations. Economists typically use discounting to aggregate benefits and costs over time. The standard economic rationale for discounting is that investments yield a positive expected real rate of return, so that having a dollar today is worth more than having a dollar in the future. Costs and benefits realized at different points in time are thus commensurable in present value terms after discounting.
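
In formula form (textbook notation, not tied to any particular study), discounting collapses a stream of benefits and costs in each year t over a horizon of T years, at discount rate r, into a single net present value:

```latex
% Standard present-value aggregation of annual benefits B_t and costs C_t
% over T years at discount rate r (textbook form).
\[
  \text{NPV} \;=\; \sum_{t=0}^{T} \frac{B_t - C_t}{(1+r)^{t}}
\]
```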

The standard discounting approach works well for near-term private investment decisions, but what about for long-term social decisions affecting the welfare of future generations? If one accepts the principle of equal moral standing of all generations, there would seem to be little ethical justification for discounting future welfare. Frank Ramsey, the father of economic approaches to discounting and growth theory, maintained that it was “ethically indefensible” to treat the welfare of current and future generations differently. However, to the extent that future generations are expected to be better off than the current generation, discounting can be justified as an intergenerational application of equity weights. By the same principle, if environmental conditions worsen significantly and future generations are expected to be less well off than the present generation, this would imply a negative discount rate; that is, discounting of present benefits relative to future benefits. As recent debates on climate change policy aptly illustrate, there is little agreement among economists, or between economists and others, on discounting.
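
The disagreement can be made concrete with the standard Ramsey discounting rule, which the article does not state but which underlies this debate:

```latex
% Ramsey discounting rule: the social discount rate r combines pure time
% preference (rho), aversion to consumption inequality across generations (eta),
% and the expected growth rate of per-capita consumption (g).
\[
  r \;=\; \rho \;+\; \eta\, g
\]
% With rho set near zero on ethical grounds, r is positive only if future
% generations are expected to be richer (g > 0); if g < 0, r can turn negative,
% which is the point made in the paragraph above.
```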

Uncertainty is a central issue in environmental management and policy. Uncertainty enters at various steps in the link between management and policy choices and eventual effects on the value of outcomes. There can be uncertainty about how changes in management or policy affect choices made by individuals and businesses (behavioral uncertainty), how changes in human actions affect the environment (scientific uncertainty), and how consequent changes in the environment will affect human well-being (value uncertainty). Recent work on the value of ecosystem services illustrates each of these uncertainties. For example, the Conservation Reserve Program, which pays landowners to take land out of production and restore perennial vegetation, can shift patterns of land use and, in turn, result in changes in carbon sequestration, water quality, and habitat provision. Program participation and the provision of services depend on the choices of individual landowners, which are uncertain. There are key gaps in the science linking land use to service provision, such as how changes in land use will affect changes in carbon storage in soil or populations of particular species, making provision uncertain even when behavioral uncertainty is ignored. There are also key gaps in information pertaining to the link between services and benefits, making value uncertain even if provision is known. The value of water quality improvement, for example, depends as much on who uses the water and for what purpose as on the water quality itself.

Economic approaches typically use an expected utility framework to deal with uncertainty, where the value of each potential outcome is weighted by its probability of occurrence. This approach summarizes expected social net benefits across dimensions, as discussed above, but also across all possible outcomes that could occur given a management or policy choice. Using the expected utility framework, however, requires information about probabilities as well as values under all potential outcomes. For environmental issues involving complex system dynamics, such as climate change or the provision of ecosystem services, the list of possible outcomes in the future may be unknown, much less how to specify probabilities or likely values for each of these outcomes. Beyond the challenge of scientific uncertainty, there may also be uncertainty about the preferences of future generations and how they will value various outcomes. Inability to objectively quantify probabilities or values requires modifying expected utility, such as by using subjective judgments to establish probabilities or values, or setting bounds on decisions thought to pose unacceptable risks (for example, safe minimum standards). A particular challenge to making decisions under uncertainty arises from consideration of catastrophic outcomes. It is difficult to set probabilities on such events because they are rare, but small changes in assumptions about these probabilities can lead to large changes in policy advice.
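
A bare-bones version of the expected-value calculation, with invented probabilities and payoffs, illustrates the last point: shifting one percentage point of probability onto a catastrophic outcome can reverse the sign of a policy's expected net benefits:

```python
# Sketch of a probability-weighted (expected value) calculation over uncertain
# outcomes, including a small-probability catastrophe. Outcomes, probabilities,
# and values are hypothetical.

def expected_value(outcomes):
    assert abs(sum(p for p, _ in outcomes) - 1.0) < 1e-9  # probabilities sum to 1
    return sum(p * v for p, v in outcomes)

# Net benefits (in $ billions) of a hypothetical policy under three scenarios.
base = [(0.60, 50.0),     # likely outcome
        (0.39, 10.0),     # disappointing outcome
        (0.01, -2000.0)]  # catastrophic outcome

print("expected net benefits:", expected_value(base))

# Moving one percentage point of probability from the likely case to the
# catastrophic case flips the expected value from positive to negative.
shifted = [(0.59, 50.0), (0.39, 10.0), (0.02, -2000.0)]
print("with a 2% tail probability:", expected_value(shifted))
```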

People make mistakes, often in systematic and predictable ways. They tend to be overly optimistic, biased toward the present, and averse to losses. They have trouble thinking through complex problems, especially those with uncertainty. Given these facts, some analysts question the validity of using valuation studies that rely on observed choices, survey responses, or even deliberative processes among affected parties as an important input for setting environmental policy. The alternative, however, would be to delegate judgments about the relative value of outcomes to political leaders or scientific experts. Elected leaders, at least in theory, should reflect public values. Environmental scientists, however, have no special claim to understanding public values. In either case, there is no guarantee that top-down decisions will reflect the underlying values of the public at large any better than an imperfect reflection of values gathered through valuation exercises.

In principle, economic valuation methods can estimate value for all environmental attributes, either through inferences from observable behavior or responses in stated preference surveys. In practice, however, it is generally not possible to get a complete economic assessment of all environmental values. Some values connected with the environment are notoriously difficult to assess in monetary terms. For example, what is the monetary value of conserving species with important spiritual or cultural value? Some critics contend that individuals are cognitively incapable of evaluating tradeoffs between utilitarian goods (such as commodities and ecosystem services) and moral goods (such as the existence of a species). There are sharp disagreements between psychologists and economists—and among economists themselves—on this point. Even when it is possible in principle to estimate monetary values, there may be insufficient data to do so. Nevertheless, economic methods can provide evidence about the value of many important environmental attributes.

The value of valuation

Though difficult, collecting information about the relative values of alternative potential outcomes, in all of their multiple dimensions, is vital to good environmental management and policy decisionmaking. Setting environmental policy is not simply a matter of applying the best science, as important as that is. Environmental management and policy typically involve making decisions about tradeoffs among multiple objectives about which society cares. Making decisions about such tradeoffs involves making value judgments. If these judgments are to improve human wellbeing, they should reflect the underlying values of individuals affected by the policy.

Economic valuation methods applied in the context of environmental management and policy seek to inform decisionmaking by collecting information about the value of alternatives to affected individuals and then aggregating these values to determine an estimate of social net benefits. In simple benefit/cost analysis, the management or policy option with the highest social net benefits should then be the preferred option. The great advantage of the simple benefit/cost approach is that it incorporates economic valuation methods to represent values of the affected public, summarizes this information into a single ranking, and uses this ranking to help guide policy. Valuation information can also be combined with other decision rules, such as those that minimize the risk of bad outcomes occurring.

Rather than trying to summarize everything in a single number, as in simple benefit/cost analysis, it may be better to disaggregate and report a broader set of results. Reporting a single number can hide important implicit value judgments. Though less tidy, reporting a set of results has the advantage of letting decisionmakers see important distributional consequences by reporting benefits and costs to different groups (such as income classes, geographic regions, and generations), as well as a range of possible outcomes under important sources of uncertainty. Additionally, results can be shown for different assumptions about important parameter values over which there may be disagreement (for example, the discount rate). Doing so makes clear the effect of different modeling and value judgments on the ranking of alternatives and lets decisionmakers better understand whether rankings are robust to changes in assumptions. For example, in reviewing efforts by economists to measure “inclusive wealth,” intended to value all natural, human, manufactured, and social capital in order to provide a summary measure of sustainability, Joseph Stiglitz and Amartya Sen, two Nobel laureate economists, and colleagues concluded that such attempts overreach. Instead, they recommended that a number of measures be used, including biophysical measures where the data or understanding are insufficient to provide trustworthy estimates of monetary value.

Regardless of whether a single number or a set of results is reported for each management or policy option, analysts working in support of environmental decisionmaking have a duty to make the analysis transparent and the results clear, so that the reasons behind decisions can be explained and defended. “Black box” models that only experts understand are rarely trusted by nonexperts and often fail to build support for decisions or trust in the process of decisionmaking.

Because there is no such thing as a perfect assessment of environmental effects or associated values, decisionmakers and others should view the results of benefit/cost analysis or MCDA as input into the decisionmaking process, rather than uncritically accepting the results and implementing the highest-ranked alternative. But done well, assessments that incorporate valuation information can inform and improve environmental decisionmaking.

Global Lessons for Improving U.S. Education

The middling performance of U.S. students on international achievement tests is by now familiar, so the overall results of the latest Program for International Student Assessment (PISA) study, released in late 2010, came as no surprise. Among the 34 developed democracies that are members of the Organization for Economic Cooperation and Development (OECD), 15-year-olds in the United States ranked 14th in reading, 17th in science, and no better than 25th in mathematics. The new wrinkle in the data was the participation of the Chinese municipality of Shanghai, whose students took top honors in all three subjects, besting U.S. students by the equivalent of multiple grade levels in each. Home to the nation’s wealthiest city and a magnet for its most ambitious and talented citizens, Shanghai is hardly representative of China as a whole. Yet its students’ eye-popping performance seemed to highlight new challenges facing the U.S. economy in an age of unprecedented global trade.

The notion that educational competition threatens the future prosperity of the United States had already been a recurrent theme in many quarters. The Obama administration, for example, cited a link between education and national economic competitiveness in making the case for the education funding allocated through the American Recovery and Reinvestment Act, for the state-level policy changes incentivized by the Race to the Top grant competition, and for increased federal support of early childhood education.

However, the relationship between education and international competitiveness remains “a subject rife with myth and misunderstanding,” as even Arne Duncan, secretary of the U.S. Department of Education, has noted. This confusion may stem from the fact that the concept of international competitiveness is notoriously difficult to pin down. Academic economists, for example, have long criticized the view that countries in a globalized economy are engaged in a zero-sum game in which only some can emerge as winners and others will inevitably lose out. All countries can in theory benefit from international trade by specializing in those activities in which they have a comparative advantage. In what sense, then, does it make sense to talk about national economies competing?

These general lessons seem doubly true for education, where the mechanisms by which gains abroad would undermine U.S. prosperity are altogether unclear. Educational improvements in other countries enhance the productivity of their workforces, which in turn should reduce the costs of imports to the United States, benefitting all U.S. residents except perhaps those who compete directly in producing the same goods. At the top end of the education spectrum, growth in the number of graduate degrees awarded in fields such as science and engineering fosters technological advances from which the United States can benefit regardless of where key discoveries are made. For these and other reasons, developments such as Shanghai’s performance on the PISA, although at first glance startling, may in fact represent good news.

This is not to say that the very real educational challenges facing the United States are irrelevant to its future economic performance. On the contrary, the evidence that the quality of a nation’s education system is a key determinant of the future growth of its economy is increasingly strong. Therefore, the United States may benefit by examining past and ongoing research on educational performance across countries and considering the actions that higher-performing nations have taken that have helped their students to succeed.

Hard comparisons

Launched in 2000 as a project of the OECD, the PISA is administered every three years to nationally representative samples of students in each OECD country and in a growing number of partner countries and subnational units such as Shanghai. The 74 education systems that participated in the latest PISA study, conducted during 2009, represented more than 85% of the global economy and included virtually all of the United States’ major trading partners, making it a particularly useful source of information on U.S. students’ relative standing.

U.S. students performed well below the OECD average in math and essentially matched the average in science. In math, the United States trailed 17 OECD countries by a statistically significant margin, its performance was indistinguishable from that of 11 countries, and it significantly outperformed only five countries. In science, the United States significantly trailed 12 countries and outperformed nine. Countries scoring at similar levels to the United States in both subjects include Austria, the Czech Republic, Hungary, Ireland, Poland, Portugal, and Sweden.

The gap in average math and science achievement between the United States and the top-performing national school systems is dramatic. In math, the average U.S. student by age 15 was at least a full year behind the average student in six countries, including Canada, Japan, and the Netherlands. Students in six additional countries, including Australia, Belgium, Estonia, and Germany, outperformed U.S. students by more than half a year.

The second-rate performance of U.S. students is particularly striking given the level of resources the nation devotes to elementary and secondary education. Data on cumulative expenditures per student in public and private schools between ages 6 and 15 confirm that the United States spends more than any other OECD country except Luxembourg. Most of the higher-performing countries spend between $60,000 and $80,000 per student, compared with nearly $105,000 in the United States.

Some observers have speculated that despite the modest performance of its average students, the U.S. education system is characterized by pockets of excellence that can be expected to meet the needs of the knowledge economy. However, there is no clear evidence that educating a subset of students to very high levels is more important for national economic success than raising average achievement levels. Moreover, the United States in fact fares no better in comparisons of the share of students performing at exceptionally high levels. For example, only 9.9% of U.S. students taking the PISA math test achieved at level 5 or 6, the two top performance categories, which, according to test administrators, indicate that students are capable of complex mathematical tasks requiring broad, well-developed thinking and reasoning skills. Twenty-four countries outranked the United States by this metric. The share of students achieving level 5 or 6 exceeded 20% in five countries and exceeded 15% in another 10. In Shanghai, 50.4% of students surpassed this benchmark, more than five times the level in the United States.

Another common response to the disappointing performance of U.S. students has been to emphasize the relative diversity of U.S. students and the wide variation in their socioeconomic status. Family background characteristics and other out-of-school factors clearly have a profound influence on students’ academic achievement. The available international assessments, all of which offer only a snapshot of what students have learned at a single point in time rather than evidence on how much progress they are making from one year to the next, are therefore best viewed as measuring the combined effects of differences in school quality and differences in these contextual factors. The latter are poorly measured across countries, making it difficult to pin down their relative import.

Even so, it is difficult to attribute the relative ranking of U.S. students to out-of-school factors alone. The share of U.S. students with college-educated parents, a key predictor of school success, actually ranks 8th among the OECD countries. The typical U.S. student is also well above the OECD average, according to PISA’s preferred measure of students’ socioeconomic status.

A record of poor results

U.S. students, however, have never fared well in various international comparisons of student achievement. The United States ranked 11th out of 12 countries participating in the first major international study of student achievement, conducted in 1964, and its math and science scores on the 2009 PISA actually reflected modest improvements from the previous test. The United States’ traditional reputation as the world’s educational leader stems instead from the fact that it experienced a far earlier spread of mass secondary education than did most other nations.

In the first half of the 20th century, demand for secondary schooling in the United States surged as technological changes increased the wages available to workers who could follow written instructions, decipher blueprints, and perform basic calculations. The nation’s highly decentralized school system, in which local communities could vote independently to support the creation of a high school, provided a uniquely favorable mechanism to drive increased public investment in schooling. As economic historian Claudia Goldin of Harvard University has documented, by 1955 almost 80% of 15- to 19-year-olds were enrolled full-time in general secondary schooling, more than double the share in any European country.

The United States’ historical advantage in terms of educational attainment has long since eroded, however. U.S. high-school graduation rates peaked in 1970 at roughly 80% and have declined slightly since, a trend often masked in official statistics by the growing number of students receiving alternative credentials, such as a General Educational Development (GED) certificate. Although the share of students enrolling in college has continued to climb, the share completing a college degree has hardly budged. As this pattern suggests, both the time students are taking to complete college degrees and dropout rates among students enrolling in college have increased sharply. This trend seems especially puzzling in light of the fact that the economic returns from completing a postsecondary degree—and the economic costs of dropping out of high school—have grown substantially over the same period.

Meanwhile, other developed countries have continued to see steady increases in educational attainment and, in many cases, now have postsecondary completion rates that exceed those in the United States. The U.S. high-school graduation rate now trails the average for European Union countries and ranks no better than 18th among the 26 OECD countries for which comparable data are available. On average across the OECD, postsecondary completion rates have increased steadily from one age cohort to the next. Although only 20% of those aged 55 to 64 have a postsecondary degree, the share among those aged 25 to 34 is up to 35%. The postsecondary completion rate of U.S. residents aged 25 to 34 remains above the OECD average at 42%, but this reflects a decline of one percentage point relative to those aged 35 to 44 and is only marginally higher than the rate registered by older cohorts.

To be sure, in many respects the U.S. higher education system remains the envy of the world. Despite recent concerns about rapidly increasing costs, declining degree completion rates, and the quality of instruction available to undergraduate students, U.S. universities continue to dominate world rankings of research productivity. The 2011 Academic Rankings of World Universities, an annual publication of the Shanghai Jiao Tong University, placed eight U.S. universities within the global top 10, 17 within the top 20, and 151 within the top 500. A 2008 RAND study commissioned by the U.S. Department of Defense found that 63% of the world’s most highly cited academic papers in science and technology were produced by researchers based in the United States. Moreover, the United States remains the top destination for graduate students studying outside of their own countries, attracting 19% of all foreign students in 2008. This rate is nine percentage points higher than the rate of the closest U.S. competitor, the United Kingdom.

Yet surely the most dramatic educational development in recent decades has been the rapid global expansion of higher education. Harvard economist Richard Freeman has estimated that the U.S. share of the total number of postsecondary students worldwide fell from 29% in 1970 to just 12% in 2006, a 60% decline. A portion of this decline reflects the progress of developed countries, but the more important factor by far has been the spectacular expansion of higher education in emerging economies, such as China and India. In China alone, postsecondary enrollments exploded from fewer than 100,000 students in 1970 to 23.4 million in 2006. The increase over the same period in India was from 2.5 million to 12.9 million students. In comparison, just 17.5 million U.S. students were enrolled in postsecondary degree programs in 2006.

Although these enrollment numbers reflect China and India’s sheer size and say nothing about the quality of instruction students receive, several recent reports have nonetheless concluded that the rapidly shifting landscape of higher education threatens the United States’ continued dominance in strategically important fields such as science and technology. Perhaps best known is the 2007 National Academies report Rising Above the Gathering Storm, which warned that “the scientific and technological building blocks critical to our economic leadership are eroding at a time when many other nations are gathering strength.” A follow-up report issued in 2010 by some of the authors of Gathering Storm warned that the storm was “approaching category five.” Although critics claim that the reports exaggerated the degree to which the research coming out of emerging economies is comparable to that produced by scholars based in the United States, it seems safe to conclude that in the future the nation will occupy a much smaller share of a rapidly expanding academic marketplace.

Costs of low-quality education

How concerned should the United States be about these developments? And is it the improvement in educational outcomes abroad that should motivate concern?

After all, until very recently the performance of the U.S. economy had far surpassed that of the industrialized world as a whole, despite the mediocre performance of U.S. students on international tests. Some observers have gone so far as to question the existence of a link between available measures of the performance of national education systems and economic success.

Such skepticism was not entirely misplaced. Economists as far back as Adam Smith have highlighted the theoretical importance of human capital as a source of national economic growth. For technologically advanced countries, highly educated workers are a source of innovations needed to further enhance labor productivity. For countries far from the frontier, education is necessary to enable workers to adopt new technologies developed elsewhere. Because a given country is likely to be both near and far from the technological frontier in various industries at any given point in time, both of these mechanisms are likely to operate simultaneously. Yet rigorous empirical evidence supporting these common-sense propositions has been notoriously difficult to produce.

One key limitation of early research examining the relationship between education and economic growth is that it was based on crude measures of school enrollment ratios or the average years of schooling completed by the adult population. Although studies taking this approach tend to find a modest positive relationship between schooling and economic growth across countries, years of schooling is an incomplete and potentially quite misleading indicator of the performance of national education systems. Measures of educational attainment implicitly assume that a year of schooling is equally valuable regardless of where it is completed, despite the clear evidence from international assessments that the skills achieved by students of the same age vary widely across countries.

Economists Eric Hanushek of Stanford University and Ludger Woessmann of the University of Munich have addressed this limitation in an important series of papers published since 2008. Their key innovation is the use of 12 international assessments of math and science achievement conducted between 1964 and 2003 to construct a comparable measure of the cognitive skills of secondary school students for a large sample of countries. They went on to analyze the relationship between this measure and economic growth rates between 1960 and 2000 across all 50 countries for which cognitive skills and growth data are available and separately across 24 members of the OECD.

Their work has yielded several notable results. First, after controlling for both a country’s initial gross domestic product (GDP) per capita and the average years of schooling completed in 1960, they found that a one standard deviation increase in test scores is associated with an increase in annual growth rates of nearly two percentage points. Taken at face value, this implies that raising the performance of U.S. students in math and science to the level of that of a top-performing nation would increase the U.S. growth rate by more than a full percentage point over the long run; that is, once students educated to that level of academic accomplishment make up the entire national workforce. Second, they found that both the share of a country’s students performing at a very high level and the share performing above a very low level appear to contribute to economic growth in roughly equal amounts, suggesting that there is no clear economic rationale for policymakers to focus exclusively on improving performance at the top or the bottom of the ability distribution. Third, after controlling for their test-based measure of students’ cognitive skills, they found that the number of years of schooling completed by the average student is no longer predictive of growth rates. This suggests that policies intended to increase the quantity of schooling that students receive will bear economic fruit only if they are accompanied by measurable improvements in students’ cognitive skills.
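
The analysis they describe amounts to a cross-country growth regression of roughly the following form. This is a stylized rendering, not the exact specification in their papers:

```latex
% Stylized cross-country growth regression of the type described above:
% g_i is average annual growth of GDP per capita, T_i the test-score measure of
% cognitive skills, y_{i,0} initial GDP per capita, S_{i,0} initial years of
% schooling, and eps_i an error term.
\[
  g_i \;=\; \beta_0 \;+\; \beta_1 T_i \;+\; \beta_2 \ln y_{i,0} \;+\; \beta_3 S_{i,0} \;+\; \varepsilon_i
\]
% The findings reported in the text correspond to a large positive beta_1 and,
% once T_i is included, a beta_3 statistically indistinguishable from zero.
```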

Although these studies have offered a clear improvement over previous evidence, skeptics may wonder whether the pattern identified linking education quality and economic growth in fact reflects a causal relationship. It is of course possible that there are unidentified factors that enhance both the quality of national education systems and economic growth. Hanushek and Woessmann have performed a series of analyses intended to rule out these concerns. Although none of these tests of causation is definitive on its own, together they strongly suggest that policies that increase education quality would in fact generate a meaningful economic return.

Moreover, the magnitude of the relationship observed is so large that it would remain important even if a substantial portion of it were driven by other factors. Consider the results of a simulation in which it is assumed that the math achievement of U.S. students improves by 0.25 standard deviation gradually over 20 years. This increase would raise U.S. performance to roughly that of some mid-level OECD countries, such as New Zealand and the Netherlands, but not to that of the highest-performing OECD countries. Assuming that the past relationship between test scores and economic growth holds true in the future, the net present value of the resulting increment to GDP over an 80-year horizon would amount to almost $44 trillion. A parallel simulation of the consequences of bringing U.S. students up to the level of the top-performing countries suggests that doing so would yield benefits with a net present value approaching $112 trillion.
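
The mechanics of such a simulation can be sketched in a few lines. The parameter values below are illustrative assumptions, not those used by Hanushek and Woessmann, so the output will not reproduce the dollar figures cited above, though it lands in the same broad range of tens of trillions of dollars:

```python
# Stylized version of the kind of simulation described above, NOT the authors'
# model: a test-score improvement phased in over 20 years raises the long-run
# growth rate, and the resulting GDP increments are discounted over 80 years.
# All parameter values are illustrative assumptions.

baseline_gdp = 15_000      # $ billions, assumed starting GDP
baseline_growth = 0.015    # assumed baseline annual growth rate
growth_per_sd = 0.018      # assumed growth-rate gain per 1 SD of test scores
reform_sd = 0.25           # size of the achievement gain being simulated
phase_in_years = 20        # reform reaches full effect after 20 years
horizon = 80               # evaluation horizon in years
discount_rate = 0.03       # assumed social discount rate

def gdp_path(extra_growth_at_full_effect):
    gdp, path = baseline_gdp, []
    for t in range(1, horizon + 1):
        ramp = min(t / phase_in_years, 1.0)  # gradual phase-in of the effect
        gdp *= 1.0 + baseline_growth + ramp * extra_growth_at_full_effect
        path.append(gdp)
    return path

no_reform = gdp_path(0.0)
reform = gdp_path(reform_sd * growth_per_sd)

npv_gain = sum((r - n) / (1.0 + discount_rate) ** t
               for t, (r, n) in enumerate(zip(reform, no_reform), start=1))
print(f"NPV of GDP increment over {horizon} years ≈ ${npv_gain:,.0f} billion")
```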

Yet despite ubiquitous rhetoric about education’s importance for countries competing in the global marketplace, there is no evidence that these potential gains would come at the expense of other nations. Put differently, there is no reason to suspect that U.S. residents are made worse off in absolute terms by the superior performance of students in places such as Finland, Korea, or even Shanghai. At the higher education level, U.S. universities clearly face growing competition in recruiting talented international students and faculty and will probably find it difficult to maintain their current dominance of world rankings. Yet as labor economist Richard Freeman of Harvard University has explained, “the globalization of higher education should benefit the U.S. and the rest of the world by accelerating the rate of technological advance associated with science and engineering and by speeding the adoption of best practices around the world, which will lower the costs of production and prices of goods.”

This is not to say that a continued decline in the relative standing of the U.S. education system would leave the nation’s economy entirely unaffected. As Caroline Hoxby, a labor and public economist at Stanford University, has noted, studies of the “factor content” of U.S. exports and economic growth have long documented their disproportionate reliance on human capital. This pattern suggests that the United States has traditionally had a comparative advantage in the production of goods that depend on skilled labor, which in turn reflects its historical edge in the efficient production of highly educated workers. In recent decades, U.S. companies have increasingly addressed labor shortages in technical fields by “importing” human capital in the form of highly educated immigrants, many of whom received their postsecondary training in the United States. This strategy cannot be a source of comparative advantage in the long run, however, because other countries are by definition able to import talented immigrants at the same cost. The decline in the relative performance of the U.S. educational system may therefore have adverse consequences for the high-tech sectors on which the nation has historically depended to generate overall growth. The ability of the nation’s economy as a whole to adapt in the face of such a disruption is, of course, an open question.

Policy lessons

In short, although there is little indication that education is an area in which countries are engaged in zero-sum global competition for scarce resources, education reform does provide a means to enhance economic growth and, in turn, the nation’s capacity to address its mounting fiscal challenges. Even if that were not the case, the moral argument for addressing the performance of the most dysfunctional U.S. school systems and the inequalities in social outcomes they produce would be overwhelming. What then are the lessons policymakers should draw from the growing body of evidence examining the performance of school systems across various countries?

The first and most straightforward lesson is simply that dramatic improvement is possible and that this is true even of the best-performing state school systems. Not only do many countries perform at markedly higher levels despite being at lower levels of economic development, but several of these countries have improved their performance substantially in the relatively short period since international tests were first widely administered. Nor do the international data suggest that countries face a stark tradeoff between excellence and equity when considering strategies to raise student achievement. In fact, the countries with the highest average test scores tend to exhibit less overall inequality in test scores and, in many cases, weaker dependence of achievement on family background characteristics.

A policy agenda centered on closing the global achievement gap between U.S. students and those in other developed countries would provide a complementary and arguably more encompassing rationale for education reform than one focused primarily on closing achievement gaps among subpopulations within the United States. The urgency of closing domestic achievement gaps is without question, but the current emphasis on this goal may well reinforce the perception among members of the middle class that their schools are performing at acceptable levels. The 2011 PDK-Gallup Poll shows that more than half of all U.S. residents currently assign the public schools in their local community an “A” or “B” grade, while only 17% assign one of those grades to public schools in the nation as a whole. This gap between local and national evaluations has widened considerably over the past decade, and similar data from the 2011 Education Next–PEPG Survey show that well-educated, affluent citizens are particularly likely to rate their local schools favorably. Reporting systems that make it possible to compare the performance of students in specific U.S. school districts to that in top-performing countries internationally could help to alter perceptions and broaden support for reform.

A second lesson is that reform efforts should aim to improve the quality of education available to U.S. students in elementary and secondary schools rather than merely increase the quantity of education they consume. The large economic return from the completion of college and especially graduate degrees suggests that there is considerable demand for workers who have been educated to those levels, and policymakers would be wise to address issues, such as the complexity of financial aid systems, that create obstacles to degree completion for academically prepared students. But increasing educational attainment should not be an end in and of itself. Doing so is unlikely to yield economic benefits without reforms to K-12 schooling that ensure that a growing number of students are equipped for the rigors of postsecondary work.

Another general lesson is that additional financial investment is neither necessary nor sufficient to improve the quality of elementary and secondary education. Current data clearly show that other developed countries have managed to achieve far greater productivity in their school systems, in many cases spending considerably less than the United States while achieving superior results. Nor have countries that have increased spending levels in recent decades experienced gains in their performance on international assessments, a pattern that is consistent with the mixed track record of spending increases in producing improved student outcomes in the United States.

If countries with high-performing elementary and secondary education systems have not spent their way to the top, how have they managed to get there? Unfortunately, using international evidence to draw more specific policy guidance for the United States remains a challenge. Although it is straightforward to document correlations between a given policy and performance across countries, it is much harder to rule out the existence of other factors that could explain the relationship. The vast cultural and contextual differences from one country to the next also imply that policies and practices that work well in one setting may not do so in another. Even so, there are three broad areas in which the consistency of findings across studies using different international tests and country samples bears attention.

Exit exams. Perhaps the best-documented factor is that students perform at higher levels in countries (and in regions within countries) with externally administered, curriculum-based exams at the completion of secondary schooling that carry significant consequences for students of all ability levels. Although many states in the United States now require students to pass an exam in order to receive a high-school diploma, these tests are typically designed to assess minimum competency in math and reading and are all but irrelevant to students elsewhere in the performance distribution. In contrast, exit exams in many European and Asian countries cover a broader swath of the curriculum, play a central role in determining students’ postsecondary options, and carry significant weight in the labor market. As a result, these systems provide strong incentives for student effort and valuable information to parents and other stakeholders about the relative performance of secondary schools. The most rigorous available evidence indicates that math and science achievement is a full grade-level equivalent higher in countries with such an exam system in the relevant subject.

Private-school competition. Countries vary widely in the extent to which they make use of the private sector to provide public education. In countries such as Belgium, the Netherlands, and (more recently) Sweden, for example, private schools receive government subsidies for each student enrolled equivalent to the level of funding received by state-run schools. Because private schools in these countries are more heavily regulated than those in the United States, they more closely resemble U.S. charter schools, although they typically have a distinctive religious character. In theory, government funding for private schools can provide families of all income levels with a broader range of options and subject the state-run school system to increased competition from alternative providers. Rigorous studies confirm that students in countries that for historical reasons have a larger share of students in private schools perform at higher levels on international assessments while spending less on primary and secondary education. Such evidence suggests that competition can spur school productivity. In addition, the achievement gap between socioeconomically disadvantaged and advantaged students is reduced in countries in which private schools receive more government funds.

High-ability teachers. Much attention has recently been devoted to the fact that several of the highest-performing countries internationally draw their teachers disproportionately from the top third of all students completing college degrees. This contrasts sharply with recruitment patterns in the United States. Given the strong evidence that teacher effectiveness is the most important school-based determinant of student achievement, this factor probably plays a decisive role in the success of the highest-performing countries. Unfortunately, as education economist Dan Goldhaber of the University of Washington has pointed out, the differences in teacher policies across countries that have been documented to date “do not point toward a consensus about the types of policies—or even sets of policies—that might ensure a high-quality teacher workforce.”

Although increasing average salaries provides one potential mechanism to attract a more capable teaching workforce, there is no clear relationship between teacher salary levels and student performance among developed countries. Especially given the current strains on district and state budgets, any funds devoted to increasing teacher salaries should be targeted at subjects such as math and science, in which qualified candidates have stronger earnings opportunities in other industries, and at teachers who demonstrate themselves to be effective in the classroom. Intriguingly, the only available study on the latter topic shows that countries that allow teacher salaries to be adjusted based on their performance in the classroom perform at higher levels.

Vital national priority

During the past two decades, state and federal efforts to improve U.S. education have centered on the development of test-based accountability systems that reward and sanction schools based on their students’ performance on state assessments. The evidence is clear that the federal No Child Left Behind Act and its state-level predecessors have improved student achievement, particularly for students at the bottom of the performance distribution. Yet the progress made under these policies falls well short of their ambitious goals. Equally important, the progress appears to have been limited to a one-time increment in performance rather than launching schools on a trajectory of continuous improvement.

International evidence may not yet be capable of providing definitive guidance for closing the global achievement gap between students in the United States and those in the top-performing countries abroad. It does, however, indicate that holding students accountable for their performance, creating competition from alternative providers of schooling, and developing strategies to recruit and retain more capable teachers all have important roles to play in addressing what should be a vital national priority.

The Tunnel at the End of the Light: The Future of the U.S. Semiconductor Industry

Today, as it was 25 years ago, U.S. leadership in the semiconductor industry appears to be in peril, with increasingly robust competition from companies in Europe and Asia that are often subsidized by national governments. Twenty-five years ago, the United States responded vigorously to a Japanese challenge to its leadership. U.S. industry convinced the government, largely for national security reasons, to make investments that helped preserve and sustain U.S. leadership. The main mechanism for this turnaround was an unprecedented industry/government consortium called SEMATECH, which today has attained a near-mythical status.

The world has changed in the past 25 years, however. Today, industry is not clamoring for government help. In a more globalized economy, companies appear to be more concerned with their overall international position, rather than the relative strength of the U.S.-based segment. Moreover, the United States continues to lead the world in semiconductor R&D. Companies can use breakthroughs derived from that research to develop and manufacture new products anywhere in the world.

Indeed, it appears increasingly likely that most semiconductor manufacturing will no longer be done in the United States. But if this is the case, what are the implications for the U.S. economy? Are the national security concerns that fueled SEMATECH’s creation no longer relevant? Unfortunately, today’s policymakers are not even focusing on these questions. They should be.

We believe that there could be significant ramifications to the end of cutting-edge semiconductor manufacturing in the United States and that government involvement that goes beyond R&D funding may be necessary. But the U.S. government has traditionally been averse to policies supporting commercialization, and the current ideological makeup of Congress militates against anything smacking of industrial policy.

But assuming that more government help is needed, and that Congress is even willing to provide it, what form should it take? In considering this question, we decided to reexamine the SEMATECH experience. We concluded that SEMATECH met the objectives of the U.S. semiconductor companies that established it but was only a partial answer to sustaining U.S. leadership in this critical technology. Moreover, as a consortium that received half of its funds over a decade from the U.S. Department of Defense (DOD) under the rationale of supporting national security, the SEMATECH experience raises some unaddressed policy questions as well as questions about how government should approach vexing issues about future technology leadership.

The origins of SEMATECH

In the late 1970s, U.S. semiconductor firms concluded that collectively they had a competitiveness problem. Japanese companies were aggressively targeting the dynamic random access memory (DRAM) business. U.S. companies believed that the Japanese firms were competing unfairly, aided by various government programs and subsidies. They contended that these arrangements allowed Japanese firms to develop and manufacture DRAMs and then dump them on the market at prices below cost. Initially, U.S. industry responded by forming the Semiconductor Industry Association as a forum for addressing key competitive issues.

In 1987, a Defense Science Board (DSB) Task Force issued a report articulating growing concerns about the competitiveness of the U.S. integrated circuit (IC) industry. The DSB study depicted semiconductor manufacturing as a national security problem and argued that the government should address it. A key recommendation was the creation of the entity that became SEMATECH.

The Reagan administration initially opposed an industry/ government consortium, considering it inappropriate industrial policy. But Congress, concerned with what it considered to be the real prospect that the United States would cede the IC manufacturing industry to Japan, approved a bill creating SEMATECH, and President Reagan signed it into law.

From the outset, there were some concerns about SEMATECH. One was the nature of a consortium itself, which is essentially a club with members who pay to join. SEMATECH was made up of about 80% of the leading U.S. semiconductor chip manufacturing firms. But some companies, for various reasons, declined to join, and were critical of SEMATECH for two reasons. First, SEMATECH was criticized for focusing on mainstream technology and thus defining the next generation of technology based on a limited view of the world. SEMATECH decided to focus on silicon complementary metal-oxide–semiconductor (CMOS) ICs. This technology is the basis for memory and other high-volume devices that were targets of Japanese competitors. Cypress Semiconductor, a chief critic that made application-specific integrated circuits (ASICs), believed that SEMATECH supported incumbent mass market technologies, not those of more specialized producers. Second, companies that had declined to participate in the consortium argued that because SEMATECH received half of its funding from the federal government, the results of its efforts should be equally available to all. But SEMATECH adamantly maintained its view that only those who had paid their fair share should reap preferential benefits.

Another concern about the creation of SEMATECH was the limited role given to the DOD despite the national security rationale for SEMATECH and the fact that the DOD would provide 50% of the funding for the consortium. In the enabling legislation for SEMATECH, Congress ensured that the DOD would have little direct input in the project planning and activities of the organization. Congress, following the position of SEMATECH’s commercial participants, concluded simply that industry knew best what to do and how to do it.

But from the start, it was clear that the interests of the government, and especially the DOD, were not the same as those of the commercial IC industry. This divergence was evident in an Institute for Defense Analyses (IDA) study done for the DOD on technology areas that needed attention from a defense perspective. One highlighted area was ASIC technology, because of the DOD’s need for affordable, low-volume specialty ICs. Although ASIC technology had great commercial potential, it was unlikely that SEMATECH would pursue that technology because of its business model.

The IDA study also emphasized the need to invest in manufacturing tools, especially lithography technology, for future generations of ICs. Although industry participants in SEMATECH did not object to government investment in advanced lithography, they saw this type of effort as separate from SEMATECH. However, this raised concern about how longer-term DOD-sponsored lithography R&D would integrate with the near-term SEMATECH focus on developing technology to improve processing yields.

A third area emphasized was the need to develop non-silicon IC technology, especially that based on materials such as gallium arsenide. These technologies were especially important to defense applications that require high-speed signal processing and, in the view of some, had great commercial potential. Indeed, gallium arsenide IC technology became the critical enabling technology for cellular phones.

Thus, it was clear early on that SEMATECH, with its narrow agenda and focus on the survival of its member companies, could not, from the DOD’s perspective, sufficiently serve national security interests in IC development. The DOD needed to develop new types of devices for future capabilities, the processes needed to fabricate them, and a U.S. industrial base with a first-mover advantage, so that the DOD could reap strategic and tactical advantages resulting from the development of the new technologies. Some saw the manufacturing of these other devices as vital to national security, perhaps even more so than the standard commercial CMOS ICs emphasized by SEMATECH.

This longer-term perspective led the DOD to sponsor research in areas SEMATECH was not emphasizing, including the Very High Speed IC program, which focused on analog-to-digital converter ICs. These were not standard CMOS ICs and required different fabrication processes. The DOD also funded research on non–silicon-based ICs, particularly those using gallium arsenide, through the Monolithic Microwave IC program. In addition, under the Defense Advanced Research Projects Agency (DARPA), the Very Large Scale Integration (VLSI) program supported research on advanced IC architecture, design tools, and manufacturing tools, especially lithography.

Diverging interests

The divergent interests of the government and SEMATECH came to the forefront in the 1990s over the issue of lithography technology. Lithography processing tools produce intricate circuit design patterns on semiconductor wafers. Their continued improvement has enabled IC manufacturers to shrink feature size and pack ever-increasing numbers of transistors and functionality into a single chip.

Lithography tools are extremely complex and increasingly expensive. The current leading-edge tools cost more than $50 million, and next-generation tools are projected to cost nearly $125 million. Manufacturers, however, can make only one or two leading-edge tools per month. The highest profit margins for IC products come immediately after an advance occurs in lithography technology. Once the improved lithography tools become more widely available, the ICs they produce become commodity items with thin margins. Thus, the order in which IC manufacturers get access to the most advanced tools is an important component of their profitability. Tool suppliers use this as leverage to reward their largest and most loyal customers.

The DOD, through DARPA, and industry, through SEMATECH, supported the development of advanced lithography tools by two U.S. suppliers: GCA and PerkinElmer (later Silicon Valley Group, or SVG). These companies once dominated the global lithography market but were displaced by the Japanese firms Nikon and Canon. With federal and other external funding, the U.S.-developed tools became competitive with the tools offered by Nikon and Canon. The DOD wanted U.S. IC firms to buy and use the U.S. tools, thereby supporting a U.S. semiconductor infrastructure. The leading U.S. IC firms, however, were reluctant and made it increasingly clear that they wanted the U.S.-developed technology to be available to their Japanese lithography tool suppliers. In essence, key U.S. IC firms were happy to have this technology developed but wanted to continue to use Nikon and Canon as their suppliers.

This crystallized the divergent interests of the DOD and some of the major U.S. IC firms. Some in the DOD saw it in the nation’s interest for commercial and defense purposes to have U.S.-based lithography technology capabilities. The industry leaders, on the other hand, emphasized business concerns about the ability of the U.S. lithography firms to scale production and deliver and support the tools. Further, because they had established special relationships with Nikon or Canon, they had good early access to key tools, providing them with a competitive advantage.

From the government standpoint, this raised the following question: Why did companies encourage the government to fund these U.S. tools if they were not going to buy them? SEMATECH members paid for some of the tool development, which gave them the right of first refusal but no obligation to buy. Yet U.S. lithography firms would not survive unless major U.S. companies bought their tools.

As the U.S. lithography toolmakers foundered, the U.S. government faced the question of what, if anything, it could do with the remnants of the U.S. industry. With government acquiescence, SVG acquired PerkinElmer’s lithography business in 1990 and formed Silicon Valley Group Lithography (SVGL). SVGL proceeded to develop PerkinElmer’s breakthrough step-and-scan technology but still struggled to attract a customer base. In 1993, it talked with Canon about sharing the underlying technology, but the U.S. government objected to any such transfer. In 2000, ASML, a Netherlands-based lithography company formed as a joint venture of Philips and ASM, announced its intent to acquire SVGL for $1.6 billion. After an initial objection by the U.S. Business and Industry Council, ASML completed the acquisition of SVGL in 2001, albeit with some very specific strictures to satisfy U.S. security concerns.

ASML, with strong support from the European Union (EU), industrial collaborations through the Belgium-based Interuniversity Microelectronics Center (IMEC), and the technology that it acquired from SVGL, developed a strong customer base among IC manufacturers in Europe, as well as those emerging in Korea and Taiwan that could not gain early access to leading-edge tools from the dominant Japanese providers. By addressing this underserved market, ASML in 2002 became the global leader, with 45% market share. Through a series of technical innovations that solved major problems for the IC manufacturers, it grew to 70% market share in 2011. Canon, meanwhile, lost most of its business.

Thus, it is arguable that U.S. industrial policy in the 1980s and 1990s, coupled with that of Europe, was successful. U.S. government–funded technology helped create a highly capable lithography competitor that offered an alternative to the then-dominant Japanese suppliers, reducing the prospect of an unacceptable concentration of a critical production tool. However, the firm that successfully implemented this technology was a Dutch one, which also received strong support from European firms and technology policies and programs.

Subsequently, another issue arose in the late 1990s between the U.S. government and U.S. IC firms over lithography. For years, IC manufacturers were concerned that optical lithography would reach its technological limits and the industry would no longer be able to continue shrinking features on IC chips. One promising, albeit extremely challenging, solution lay in the development of lithography based on extreme ultraviolet light (EUV), a technology that originated at U.S. Department of Energy (DOE) laboratories.

To access EUV technology, Intel in 1997 formed the EUV LLC, which entered into a cooperative R&D agreement (CRADA) with DOE. As part of this agreement, Intel and its partners would pay $250 million over three years to cover the direct salary costs of government researchers at the national labs and acquire equipment and materials for the labs, as well as cover the costs of its own researchers dedicated to the project. In return, the consortium would have exclusive rights to the technology in the EUV lithography field of use. At the time, it was the largest CRADA ever undertaken.

Once the EUV LLC executed the CRADA, Intel announced that it intended to bring in Nikon and ASML to help develop the technology. This unprecedented access by foreign corporations to U.S. national defense laboratories became an issue for DOE and Congress. In reviewing the available options, DOE rejected the Intel proposal to partner with Nikon but allowed it to set up a partnership with ASML. Among conditions for this access, ASML had to commit to use SVGL’s U.S. facilities for manufacturing.

Originally slated for production in the early 2000s, EUV tools are still not available. Today, ASML is the only company in the world with preproduction EUV tools undergoing evaluation by IC manufacturers. Analysts predict that if ASML successfully delivers commercially viable EUV lithography tools, it will expand its global market share to 80%.

U.S. policy clearly had an effect on the semiconductor industry, but drawing lessons from this experience is not a simple matter. Tensions between commercial and national security goals were never fully resolved. Although some U.S. companies benefited from federal efforts, several foreign companies also reaped benefits, and the overall gains for U.S. interests were not as broad or as long-lasting as hoped. Now the nation’s IC industry faces a new and somewhat different challenge in a different global economic environment. Developing an effective policy response is a challenge that can be informed but not guided by previous efforts.

The government’s role

The DOD’s investments in IC R&D typically aim far ahead of the trajectory of commercial R&D. In that sense, the DOD’s R&D can be considered disruptive because it often leads to technologies that are not embedded in current practice. However, the last thing that the mainstream semiconductor industry wants is anything disruptive. The existing commercial IC industry follows a highly controlled evolutionary roadmap to maintain the Moore’s Law pace of doubling the number of transistors on an IC every 18 to 24 months. Only when industry hits a wall and can no longer proceed along an evolutionary path will it consider radical change.

Today, the industry again faces such a wall. The cost per transistor on an IC, which has been declining at an exponential rate for more than four decades, has leveled off, and continuing progress is simply not economically possible with mainstream technology. Industry is counting on ASML’s EUV lithography tools to restore the pattern of reliable cost reduction, but it is not certain that these tools will be able to meet the technical and economic requirements. The development of these new tools poses huge engineering challenges as well as basic physics challenges. The lithography tool will have to register the successive exposure layers with an accuracy within four nanometers, or about 20 atoms, and at this scale even the photon intensity of the light becomes a concern. We are literally reaching the tunnel at the end of the light.
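As a rough back-of-the-envelope check on these figures (my own arithmetic; the symbols and the assumed silicon atomic diameter of roughly 0.2 nanometers are not from the article), the roadmap pace and the overlay requirement can be written as:

```latex
% Moore's Law pace assumed by the industry roadmap, with T the doubling period:
\[ N(t) = N_0 \cdot 2^{\,t/T}, \qquad T \approx 18\text{--}24\ \text{months} \]
% Registration (overlay) requirement, taking roughly 0.2 nm per silicon atom:
\[ \frac{4\ \text{nm}}{0.2\ \text{nm per atom}} \approx 20\ \text{atoms} \]
```

At a two-year doubling period, a decade of scaling implies roughly a 32-fold increase in transistor count per chip.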

DARPA has continued to fund R&D on next-generation lithography, albeit with an emphasis on technologies that support cost-effective fabrication of the low volumes of the specialty ICs needed by the military. It has funded technologies called nanoimprint lithography and reflected electron beam lithography. DARPA’s program, however, is limited to proving technical feasibility and does not address the investment needed to take the technology to a production-worthy tool.

Taking a tool from the technology feasibility demonstration stage to a production-worthy product is a bet-the-company proposition, similar to the stakes in developing a new commercial airliner. The penalty for missing the market can be devastating. In the past, U.S. lithography toolmakers had the best technology in the world, but failed to commercialize it. Today, the United States does not even have a firm that makes lithography tools.

The U.S. government has repeatedly invested in technology development but has deliberately avoided funding commercialization efforts. This aversion to policies supporting commercialization stands in stark contrast to the approach of other countries. The EU, for example, through IMEC, Germany’s Fraunhofer Institute, and other institutions, supports applied research and product development. This has helped companies such as ASML navigate the treacherous waters of commercialization. Japan, Korea, Taiwan, China, and others also have government-funded applied technology programs.

As the world of electronics moves into the realm of nanoscale electronics, or nanotronics, the United States has focused on proof-of-concept technology R&D programs at DARPA, including some novel but unproven nanoscale production approaches, such as nanoimprint, dip-pen, and electron beam lithography. Yet today there is no concerted, focused national R&D program addressing the nanotronic manufacturing infrastructure. Corporate consortia and regional centers, such as the Albany Nanotech Complex in New York and the associated IBM-led Semiconductor Research Alliance, have sprung up without federal funding. However, such efforts are corporation-dominated and international in focus. They do not aim at a national agenda of technology leadership.

Without a parallel effort focused on developing the high-volume manufacturing technology and infrastructure necessary to make actual products, the United States will probably not reap the rewards of its investments. U.S.-based companies that develop the technology will have to go elsewhere to manufacture. The President’s Council of Advisors on Science and Technology 2010 report on the National Nanotechnology Initiative emphasized the need to put greater emphasis on manufacturing and commercialization in a nanoelectronics research initiative, stating that “over the next five years, the federal government should double the funding devoted to nanomanufacturing.”

The United States has a long history of funding the underlying technology and manufacturing capability in areas where national security is the primary application. The situation becomes less clear in dual-use situations, in which the technology has both commercial and defense importance. IC technology is clearly dual-use, given its pervasiveness in commercial products and its criticality in most aspects of military strategic and tactical operations.

The question is, given the national security implications of IC technology, should there be a concerted U.S. policy to address nanotronics manufacturing? If the health of the U.S. semiconductor manufacturing industry was so important for national security 25 years ago that it was the rationale for creating SEMATECH, should the United States be doing something similar today? If not, what has changed? And if the nation does undertake such an activity, should it be structured in a way that gives the federal government an active role and voice?

The DOD has a strong interest in sustaining U.S. leadership in emerging technologies that will provide needed military capabilities. But the United States also needs to foster these technologies to maintain the health of an industry that has become a key component of a modern economy.

Who would fund this effort? Is this a job for the DOD, as it was in the past? It is certainly in the DOD’s interest to see key technologies mature and for the United States to be a leader in manufacturing. On the other hand, does this warrant a broader research funding agenda to include the Departments of Commerce and Energy? We contend that this is a much broader and more profound issue for national security, encompassing economic, energy, and even environmental security. Without the robust development of nanotronic-based industries, the United States faces the prospect of losing its leading position in the broader information technology sector, with cascading effects on other industries that depend on this technology to continue boosting their productivity.

Our concern is that there is inadequate focus on and discussion of these issues. The United States still has some tremendous advantages, including companies that are leaders in the most advanced IC technologies and their production, some robust tool and equipment firms, and a strong government-funded R&D system. However, other countries see the opportunity to claim the field and are funding national-level R&D programs in manufacturing at the nanoscale. Furthermore, the economic, geopolitical, and security landscape has changed fundamentally since 1985, which adds complexities to assessing the situation and determining potential approaches to address it. Given these dynamics, we conclude that it is time for concerted discussion to determine whether nanotronics manufacturing is an urgent national and economic security issue, and if so, what should be done about it.

Should the Science of Adolescent Brain Development Inform Public Policy?

The science of adolescent brain development is making its way into the national conversation. As an early researcher in the field, I regularly receive calls from journalists asking how the science of adolescent brain development should affect the way society treats teenagers. I have been asked whether this science justifies raising the driving age, outlawing the solitary confinement of incarcerated juveniles, excluding 18-year-olds from the military, or prohibiting 16-year-olds from serving as lifeguards on the Jersey Shore. Explicit reference to the neuroscience of adolescence is slowly creeping into legal and policy discussions as well as popular culture. The U.S. Supreme Court discussed adolescent brain science during oral arguments in Roper v. Simmons, which abolished the juvenile death penalty, and cited the field in its 2010 decision in Graham v. Florida, which prohibited sentencing juveniles convicted of crimes other than homicide to life without parole.

There is now incontrovertible evidence that adolescence is a period of significant changes in brain structure and function. Although most of this work has appeared just in the past 15 years, there is already strong consensus among developmental neuroscientists about the nature of this change. And the central conclusion to emerge from recent research is that important changes in brain anatomy and activity continue far longer into development than had previously been thought. Reasonable people may disagree about what these findings may mean as society decides how to treat young people, but there is little room for disagreement about the fact that adolescence is a period of substantial brain maturation with respect to both structure and function.

Brain changes

Four specific structural changes in the brain during adolescence are noteworthy. First, there is a decrease in gray matter in prefrontal regions of the brain, reflective of synaptic pruning, the process through which unused connections between neurons are eliminated. The elimination of these unused synapses occurs mainly during pre-adolescence and early adolescence, the period during which major improvements in basic cognitive abilities and logical reasoning are seen, in part due to these very anatomical changes.

Second, important changes in activity involving the neurotransmitter dopamine occur during early adolescence, especially around puberty. There are substantial changes in the density and distribution of dopamine receptors in pathways that connect the limbic system, which is where emotions are processed and rewards and punishments experienced, and the prefrontal cortex, which is the brain’s chief executive officer. There is more dopaminergic activity in these pathways during the first part of adolescence than at any other time in development. Because dopamine plays a critical role in how humans experience pleasure, these changes have important implications for sensation-seeking.

Third, there is an increase in white matter in the prefrontal cortex during adolescence. This is largely the result of myelination, the process through which nerve fibers become sheathed in myelin, a white, fatty substance that improves the efficiency of brain circuits. Unlike the synaptic pruning of the prefrontal areas, which is mainly finished by mid-adolescence, myelination continues well into late adolescence and early adulthood. More efficient neural connections within the prefrontal cortex are important for higher-order cognitive functions—planning ahead, weighing risks and rewards, and making complicated decisions, among others—that are regulated by multiple prefrontal areas working in concert.

Fourth, there is an increase in the strength of connections between the prefrontal cortex and the limbic system. This anatomical change is especially important for emotion regulation, which is facilitated by increased connectivity between regions important in the processing of emotional information and those important in self-control. These connections permit different brain systems to communicate with each other more effectively, and these gains also are ongoing well into late adolescence. If you were to compare a young teenager’s brain with that of a young adult, you would see a much more extensive network of myelinated cables connecting brain regions.

Adolescence is not just a time of tremendous change in the brain’s structure. It is also a time of important changes in how the brain works, as revealed in studies using functional magnetic resonance imaging, or fMRI. What do these imaging studies reveal about the adolescent brain? First, over the course of adolescence and into early adulthood, there is a strengthening of activity in brain systems involving self-regulation. During tasks that require self-control, adults employ a wider network of brain regions than do adolescents, and this broader recruitment may make self-control easier by distributing the work across multiple areas of the brain rather than overtaxing a smaller number of regions.

Second, there are important changes in the way the brain responds to rewards. In brain scans acquired while individuals who are about to play a game are shown rewarding stimuli, such as piles of coins or pictures of happy faces, adolescents’ reward centers are usually activated more strongly than those of children or adults. (Interestingly, these age differences are more consistently observed when individuals are anticipating rewards than when they are receiving them.) Heightened sensitivity to anticipated rewards motivates adolescents to engage in acts, even risky acts, when the potential for pleasure is high, such as unprotected sex, fast driving, or experimentation with drugs. In our laboratory, Jason Chein and I have shown that this hypersensitivity to reward is particularly pronounced when adolescents are with their friends, and we think this helps explain why adolescent risk-taking so often occurs in groups.

A third change in brain function over the course of adolescence is an increase in the simultaneous recruitment of multiple brain regions in response to arousing stimuli, such as pictures of angry or terrified faces. Before adulthood, there is less cross-talk between the brain systems that regulate rational decisionmaking and those that regulate emotional arousal. During adolescence, very strong feelings are less likely to be modulated by the involvement of brain regions involved in controlling impulses, planning ahead, and comparing the costs and benefits of alternative courses of action. This is one reason why susceptibility to peer pressure declines as adolescents grow into adulthood; as they mature, individuals become better able to put the brakes on an impulse that is aroused by their friends.

Importance of timing

These structural and functional changes do not all take place along one uniform timetable, and the differences in their timing raise two important points relevant to the use of neuroscience to guide public policy. First, there is no simple answer to the question of when an adolescent brain becomes an adult brain. Brain systems implicated in basic cognitive processes reach adult levels of maturity by mid-adolescence, whereas those that are active in self-regulation do not fully mature until late adolescence or even early adulthood. In other words, adolescents mature intellectually before they mature socially or emotionally, a fact that helps explain why teenagers who are so smart in some respects sometimes do surprisingly dumb things.

To the extent that society wishes to use developmental neuroscience to inform public policy decisions on where to draw age boundaries between adolescence and adulthood, it is therefore important to match the policy question with the right science. In his dissenting opinion in Roper, the juvenile death penalty case, Justice Antonin Scalia criticized the American Psychological Association, which submitted an amicus brief arguing that adolescents are not as mature as adults and therefore should not be eligible for the death penalty. As Scalia pointed out, the association had previously taken the stance that adolescents should be permitted to make decisions about abortion without involving their parents, because young people’s decision-making is just as competent as that of adults.

The association’s two positions may seem inconsistent at first glance, but it is entirely possible that an adolescent might be mature enough for some decisions but not others. After all, the circumstances under which individuals make medical decisions and commit crimes are very different and make different sorts of demands on individuals’ brains and abilities. State laws governing adolescent abortion require a waiting period before the procedure can be performed, as well as consultation with an adult—a parent, health care provider, or judge. These policies discourage impetuous and short-sighted acts and create circumstances under which adolescents’ decision-making has been shown to be just as competent as that demonstrated by adults. In contrast, violent crimes are usually committed by adolescents when they are emotionally aroused and with their friends—two conditions that increase the likelihood of impulsivity and sensation-seeking and that exacerbate adolescent immaturity. From a neuroscientific standpoint, it therefore makes perfect sense to have a lower age for autonomous medical decision-making than for eligibility for capital punishment, because certain brain systems mature earlier than others.

There is another kind of asynchrony in brain development during adolescence that is important for public policy. Middle adolescence is a period during which brain systems implicated in how a person responds to rewards are at their height of arousability but systems important for self-regulation are still immature. The different timetables followed by these different brain systems create a vulnerability to risky and reckless behavior that is greater in middle adolescence than before or after. It’s as if the brain’s accelerator is pressed to the floor before a good braking system is in place. Given this, it’s no surprise that the commission of crime peaks around age 17—as do first experimentation with alcohol and marijuana, automobile crashes, accidental drownings, and attempted suicide.

In sum, the consensus to emerge from recent research on the adolescent brain is that teenagers are not as mature in either brain structure or function as adults. This does not mean that adolescents’ brains are “defective,” just as no one would say that newborns’ muscular systems are defective because they are not capable of walking or their language systems are defective because they can’t yet carry on conversations. The fact that the adolescent brain is still developing, and in this regard is less mature than the adult brain, is normative, not pathological. Adolescence is a developmental stage, not a disease, mental illness, or defect. But it is a time when people are, on average, not as mature as they will be when they become adults.

I am frequently asked how to reconcile this view of adolescence with historical evidence that adolescents successfully performed adult roles in previous eras. Adolescents may indeed have taken on adult roles at younger ages in the past, but all societies in recorded history have recognized a period of development between childhood and adulthood, and writers as far back as Aristotle have characterized adolescents as less able to control themselves and more prone to risk-taking than adults. As Shakespeare wrote in The Winter’s Tale: “I would there were no age between ten and three-and-twenty, or that youth would sleep out the rest; for there is nothing in the between but getting wenches with child, wronging the ancientry, stealing, fighting.” That was in 1623, without the benefit of brain scans.

Science in the policy arena

Although there is a good degree of consensus among neuroscientists about many of the ways in which brain structure and function change during adolescence, it is less clear how much this work tells society about adolescent behavior that is useful for public policy. Because all behavior must have neurobiological underpinnings, it is hardly revelatory to say that adolescents behave the way they do because of “something in their brain.” Moreover, society hardly needs neuroscience to tell it that, relative to adults, adolescents are more likely to engage in sensation-seeking, less likely to control their impulses, or less likely to plan ahead. So how does neuroscience add to society’s understanding of adolescent behavior? What is the value, other than advances in basic neuroscience, of studies that provide neurobiological evidence that is consistent with what is already known about human behavior?

I’d like to consider five such possibilities, two that I think are valid, two that I think are mistaken, and one where my assessment is equivocal. Let me begin with two rationales that are widely believed but that are specious.

The first mistake is to interpret age differences in brain structure or function as conclusive evidence that the relevant behaviors must therefore be hard-wired. A correlation between brain development and behavioral development is just that: a correlation. It says nothing about the causes of the behavior or about the relative contributions of nature and nurture. In some cases, the behavior may indeed follow directly from biologically driven changes in brain structure or function. But in others, the reverse is true—that is, the observed brain change is the consequence of experience. Yes, adolescents may develop better impulse control as a result of changes within the prefrontal cortex, and it may be true that these anatomical changes are programmed to unfold along a predetermined timetable. But it is also plausible that the structural changes observed in the prefrontal cortex result from experiences that demand that adolescents exercise self-control, in much the same way that changes in muscle structure and function often follow from exercise.

A second mistake is assuming that the existence of a biological correlate of some behavior demonstrates that the behavior cannot be changed. It is surely the case that some of the changes in brain structure and function that take place during adolescence are relatively impervious to environmental influence. But it is also known that the brain is malleable, and there is a good deal of evidence that adolescence is, in fact, a period of especially heightened neuroplasticity. That’s one reason it is a period of such vulnerability to many forms of mental illness.

I suspect that the changes in reward sensitivity that I described earlier are largely determined by biology and, in particular, by puberty. I say this because the changes in reward seeking observed in young adolescents are also seen in other mammals when they go through puberty. This makes perfect sense from an evolutionary perspective, because adolescence is the period during which mammals become sexually active, a behavior that is motivated by the expectation of pleasure. An increase in reward sensitivity soon after puberty is added insurance that mammals will do what it takes to reproduce while they are at the peak of fertility, including engaging in a certain amount of risky behavior, such as leaving the nest or troop to venture out into the wild. In fact, the age at peak human fecundity (that is, the age at which an individual should begin having sex if he or she wants to have the most children possible) is about the same as the age at the peak of risk-taking—between 16 and 17 years of age.

Other brain changes that take place during adolescence are probably driven to a great extent by nurture and may therefore be modifiable by experience. There is growing evidence that the actual structure of prefrontal regions active in self-control can be influenced by training and practice. So in addition to assuming that biology causes behavior, and not the reverse, it is also mistaken to think that the biology of the brain can’t be changed.

How science can help

How, then, does neuroscience contribute to a better understanding of adolescent behavior? As I said, I think the neuroscience serves at least two important functions.

First, neuroscientific evidence can provide added support for behavioral evidence when the neuroscience and the behavioral science are conceptually and theoretically aligned. Notice that I used the word “support” here. Scientific evidence of any sort is more compelling when it is corroborated by independent lines of research, so when neuroscientific findings about adolescent brain development are consistent with findings from behavioral research, the neuroscience provides added confidence in the behavioral findings. But it is incorrect to privilege the neuroscientific evidence over the behavioral evidence, which is frequently done because laypersons often assume—incorrectly—that the neuroscientific evidence is more reliable, precise, or valid. Many nonscientists are more persuaded by neuroscience than by behavioral science because they often lack the training or expertise that would enable them to view the neuroscience through a critical lens. In science, familiarity breeds skepticism, and the lack of knowledge that most laypersons have about the workings of the brain, much less the nuances of neuroscientific methods, often leads them to be overly impressed by brain science and underwhelmed by behavioral research, even when the latter may be more relevant to policy decisions.

A second way in which neuroscience can be useful is that it may help generate new hypotheses about adolescent development that can then be tested in behavioral studies. This is especially important when behavioral methods are inherently unable to distinguish between alternative accounts of a phenomenon. Let me illustrate this point with an example from our ongoing research.

As I noted earlier, heightened risk-taking in adolescence is hypothesized to be the product of an easily aroused reward system and an immature self-regulatory system. The arousal of the reward system takes place early in adolescence and is closely tied to puberty, whereas the maturation of the self-regulatory system is independent of puberty and unfolds gradually, from pre-adolescence through young adulthood.

In our studies, we have shown that reward sensitivity, preference for immediate rewards, sensation-seeking, and a greater focus on the rewards of a risky choice all increase between pre-adolescence and mid-adolescence, peak between ages 15 and 17, and then decline. In contrast, controlling impulses, planning ahead, and resisting peer influence all increase gradually from pre-adolescence through late adolescence, and in some instances, into early adulthood.

Although one can show without the benefit of neuroscience that the inclination to take risks is generally higher in adolescence than before or after, having knowledge about the course of brain development provides insight into the underlying processes that might account for this pattern. We’ve shown in several experiments that adolescents take more risks when they are with their friends than when they are alone. But is this because the presence of peers interferes with self-control or because it affects the way in which adolescents experience the rewards of the risky decision? It isn’t possible to answer this question by asking teenagers why they take more risks when their friends are around; they admit that they do, but they say they don’t know why. But through neuroimaging, we discovered that the peer effect was specifically due to the impact that peers have on adolescents’ reward sensitivity. Why does this matter? Because if the chief reason that adolescents experiment with tobacco, alcohol, and other drugs is that they are at a point in life where everything rewarding feels especially so, trying to teach them to “Just Say No” is probably futile. I’ve argued elsewhere that raising the price of cigarettes and alcohol, thereby making these rewarding substances harder to obtain, is probably a more effective public policy than health education.

I’ve now described two valid reasons to use neuroscience to better understand adolescent behavior and two questionable ones. I want to add a fifth, which concerns the attributions we make about individuals’ behavior. This particular use of neuroscience is having a tremendous impact on criminal law.

I recently was asked to provide an expert opinion in a Michigan case involving a prison convict named Anthony, who as a 17-year-old was part of a group of teenagers who robbed a small store. During the robbery, one of the teenagers shot and killed the storekeeper. Although the teenagers had planned the robbery, they did not engage in the act with the intention of shooting, much less murdering, someone. But under the state’s criminal law, the crime qualified as felony murder, which in Michigan carries a mandatory sentence of life without the possibility of parole for all members of the group involved in the robbery—including Anthony, who had fled the store before the shooting took place.

At issue now is a challenge by Anthony—who has been in prison for 33 years—to vacate the sentence in light of the Supreme Court’s ruling in Graham v. Florida that life without parole is cruel and unusual punishment for juveniles because they are less mature than adults. The ruling in that case was limited to crimes other than homicide, however. The challenge to Michigan’s law is based on the argument that the logic behind the Graham decision applies to felony murder as well.

I was asked specifically whether a 17-year-old could have anticipated that someone might be killed during the robbery. It is quite clear from the trial transcript that Anthony didn’t anticipate this consequence, but didn’t is not the same as couldn’t. It is known from behavioral research that the average 17-year-old is less likely than the average adult to think ahead, control his impulses, and foresee the consequences of his actions; and clinical evaluations of Anthony revealed that he was a normal 17-year-old. But “less likely” means just that; it doesn’t mean unable, but neither does it mean unwilling. As I will explain, the distinction between didn’t and couldn’t is important under the law. And studies of adolescent brain development might be helpful in distinguishing between the two.

The issue before the Michigan Court is not whether Anthony is guilty. He freely admitted having participated in the robbery, and there was clear evidence that the victim was shot and killed by one of the robbers. So there is no doubt that Anthony is guilty of felony murder. But even when someone is found guilty, many factors can influence the sentence he receives. Individuals who are deemed less than fully responsible are punished less severely than those who are judged to be fully responsible, even if the consequences of the act are identical. Manslaughter is not punished as harshly as premeditated murder, even though both result in the death of another individual. So the question in Anthony’s case, as it was in the Roper and Graham Supreme Court cases, is whether 17-year-olds are fully responsible for their behavior. If they are not, they should not be punished as severely as individuals whose responsibility is not diminished.

In order for something to diminish criminal responsibility, it has to be something that was not the person’s fault—that was outside his control. If someone has an untreatable tumor on his frontal lobe that is thought to make him unable to control aggressive outbursts, he is less than fully responsible for his aggressive behavior as a result of something that isn’t his fault, and the presence of the tumor would be viewed as a mitigating factor if he were being sentenced for a violent crime. On the other hand, if someone with no neurobiological deficit goes into a bar, drinks himself into a state of rage, and commits a violent crime as a result, the fact that he was drunk does not diminish his responsibility for his act. It doesn’t matter whether the mitigating factor is biological, psychological, or environmental. The issue is whether the diminished responsibility is the person’s fault and whether the individual could have compensated for whatever it was that was uncontrollable.

Judgments about mitigation are often difficult to make because most of the time, factors that diminish responsibility fall somewhere between the extremes of things that are obviously beyond an individual’s control, such as brain tumors, and things that an individual could have controlled, such as self-inflicted inebriation. In these in-between cases, one must make a judgment call and look for evidence that tips the balance in one direction or the other. Profound mental retardation that compromises foresight is a mitigating condition. A lack of foresight as a result of stupidity that is within the normal range of intelligence is not. Being forced to commit a crime because a gun is pointed at one’s head mitigates criminal responsibility. Committing a crime in order to save face in front of friends who have made a dare does not. Many things can lead a person to act impulsively or without foresight but are not necessarily mitigating. A genetic inclination toward aggression is probably in this category, as is having been raised in a rotten neighborhood. Both are forces the individual did not choose, but society does not see them as so determinative that they automatically diminish personal responsibility.

As I have discussed, studies of adolescent brain anatomy clearly indicate that regions of the brain that regulate such things as foresight, impulse control, and resistance to peer pressure are still developing at age 17. And imaging studies show that immaturity in these regions is linked to adolescents’ poorer performance on tasks that require these capabilities. Evidence that the adolescent brain is less mature than the adult brain in ways that affect some of the behaviors that mitigate criminal responsibility suggests that at least some of adolescents’ irresponsible behavior is not entirely their fault.

The brain science, in and of itself, does not carry the day, but when the results of behavioral science are added to the mix, I think it tips the balance toward viewing adolescent impulsivity, short-sightedness, and susceptibility to peer pressure as developmentally normative phenomena that teenagers cannot fully control. This is why I have argued that adolescents should be viewed as inherently less responsible than adults, and should be punished less harshly than adults, even when the crimes they are convicted of are identical. I do not find persuasive the counterargument that some adolescents can exercise self-control or that some adults are just as impulsive and short-sighted as teenagers. Of course there is variability in brain and behavior among adolescents, and of course there is variability among adults. But the average differences between the age groups are significant, and that is what counts as society draws age boundaries under the law on the basis of science.

Age ranges for responsibility

Beyond criminal law, how should social policy involving young people take this into account? Society needs to distinguish between people who are ready for the rights and responsibilities of adulthood and those who are not. Science can help in deciding where best to draw the lines. Based on what is now known about brain development—and I say “now known” because new studies are appearing every month—it is reasonable to posit that there is an age range during which adult neurobiological maturity is reached. Framing this as an age range, rather than pinpointing a discrete chronological age, is useful, because doing so accommodates the fact that different brain systems mature along different timetables, and different individuals mature at different ages and different rates. The lower bound of this age range is probably somewhere around 15. By this I mean that if society had an agreed-upon measure of adult neurobiological maturity (which it doesn’t yet have, but may at some point in the future), it would be unlikely that many individuals would have attained this mark before turning 15. The upper bound of the age range is probably somewhere around 22. That is, it would be unlikely that there would be many normally developing individuals who have not reached adult neurobiological maturity by the time they have turned 22.

If society were to choose either of these endpoints as the age of majority, it would be forced to accept many errors of classification, because granting adult status at age 15 would result in treating many immature individuals as adults, which is dangerous, whereas waiting until age 22 would result in treating many mature individuals as children, which is unjust. So what is society to do? I think there are four possible options.

The first option is to pick the mid-point of this range. Yes, this would result in classifying some immature individuals as adults and some mature ones as children. But this would be true no matter what chronological age is picked, and assuming that the age of neurobiological maturity is normally distributed, fewer errors would be made by picking an age near the middle of the range than at either of the extremes. Doing so would place the dividing line somewhere around 18, which, it turns out, is the presumptive age of majority pretty much everywhere around the world. In the vast majority of countries, 18 is the age at which individuals are permitted to vote, drink, drive, and enjoy other adult rights. And just think—the international community arrived at this without the benefit of brain scans.
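To make this classification-error logic concrete, here is a minimal simulation sketch. It is my own illustration, not drawn from the studies discussed here: the 15-to-22 range and the assumption of a roughly normal distribution come from the text, while the mean of 18.5 and the spread are hypothetical choices.

```python
import random

# Hypothetical illustration: assume each person reaches adult neurobiological
# maturity at some age M, roughly normally distributed within the 15-to-22
# range discussed above (mean 18.5 and SD 1.2 are assumptions, not data).
random.seed(0)
maturation_ages = [random.gauss(18.5, 1.2) for _ in range(100_000)]

def average_misclassified_years(cutoff, ages):
    """Average number of years a person spends misclassified by a single
    age-of-majority cutoff: treated as an adult before maturing (M > cutoff)
    or as a minor after maturing (M < cutoff)."""
    return sum(abs(m - cutoff) for m in ages) / len(ages)

for cutoff in (15, 18, 22):
    print(cutoff, round(average_misclassified_years(cutoff, maturation_ages), 2))
# The average is smallest near the middle of the range (18) and grows toward
# either extreme, which is the point made in the paragraph above.
```

Of course, a single number like this treats the two kinds of error as equally costly; weighing treating the immature as adults against treating the mature as children is exactly the kind of judgment the simulation cannot make.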

A second possibility would be to decide, on an issue-by-issue basis, what it takes to be “mature enough.” Society does this regularly. Although the presumptive age of majority in the United States is 18, the nation deviates from this age more often than not. Consider, for a moment, the different ages mandated for determining when individuals can make independent medical decisions, drive, hold various types of employment, marry, view R-rated movies without an adult chaperone, vote, serve in the military, enter into contracts, buy cigarettes, and purchase alcohol. The age of majority with respect to these matters ranges from 15 to 21, which is surprisingly reasonable, given what science says about brain development. The only deviation I can think of that falls outside this range is the nation’s inexplicable willingness to try people younger than 15 as adults, but this policy, in part because of the influence of brain science, is now being questioned in many jurisdictions.

Although the aforementioned age range may be reasonable, society doesn’t rely on science to link specific ages to specific rights or responsibilities, and some of the nation’s laws are baffling, to say the least, when viewed through the lens of science or public health. How is it possible to rationalize permitting teenagers to drive before they are permitted to see R-rated movies on their own, sentencing juveniles to life without parole before they are old enough to serve on a jury, or sending young people into combat before they can buy beer? The answer is that policies that distinguish between adolescents and adults are made for all sorts of reasons, and science, including neuroscience, is only one of many proper considerations.

A third possibility would be to shift from a binary classification system, in which everyone is legally either a child or an adult, to a regime that uses three legal categories: one for children, one for adolescents, and one for adults. The nation does this for some purposes under the law now, although the age boundaries around the middle category aren’t necessarily scientifically derived. For example, many states have graduated drivers’ licensing, a system in which adolescents are permitted to drive, but are not granted full driving privileges until they reach a certain age. This model also is used in the construction of child labor laws, where adolescents are allowed to work once they’ve reached a certain age, but there are limits on the types of jobs they can hold and the numbers of hours they can work.

In our book Rethinking Juvenile Justice, Elizabeth Scott and I have argued that this is how the nation should structure the justice system, treating adolescent offenders as an intermediate category, neither as children, whose crimes society excuses, nor as adults, whom society holds fully responsible for their acts. I’ve heard the suggestion that society should apply this model to drinking as well, and permit individuals between 18 and 20 to purchase beer and wine, but not hard liquor, and to face especially stiff punishment for intoxication or wrongdoing under the influence of alcohol. There are some areas of the law, though, where a three-way system would be difficult to imagine, such as voting.

A final possibility is acknowledging that there is variability in brain and behavioral development among people of the same chronological age and making individualized decisions rather than drawing categorical age boundaries at all. This was the stance taken by many of the Supreme Court justices who dissented in the juvenile death penalty and life without parole cases. They argued that instead of treating adolescents as a class of individuals who are too immature to be held fully responsible for their behavior, the policy should be to assess each offender’s maturity to determine his criminal culpability. The justices did not specify what tools would be needed to do this, however, and reliably assessing psychological maturity is easier said than done. There is a big difference between using neuroscience to guide the formulation of policy and using it to determine how individual cases are adjudicated. Although it may be possible to say that, on average, people who are Johnny’s age are typically less mature than adults, we cannot say whether Johnny himself is.

Science may someday have the tools to image an adolescent’s brain and draw conclusions about that individual’s neurobiological maturity relative to established age norms for various aspects of brain structure and function, but such norms do not yet exist, and individualized assessments of neurobiological maturity would be prohibitively expensive. Moreover, it is not clear that society would end up making better decisions using neurobiological assessments than those it makes on the basis of chronological age or than those it might make using behavioral or psychological measures. It makes far more sense to rely on a driving test than a brain scan to determine whether someone is ready to drive. So don’t expect to see brain scanners any time soon at your local taverns or movie theaters.

Accepting the challenges

The study of adolescent brain development has made tremendous progress in the very short period that scientists have been studying the adolescent brain systematically. As the science moves ahead, the big challenge facing those of us who want to apply this research to policy will be understanding the complicated interplay of biological maturation and environmental influence as they jointly shape adolescent behavior. And this can be achieved only through collaboration between neuroscientists and scholars from other disciplines. Brain science should inform the nation’s policy discussions when it is relevant, but society should not make policy decisions on the basis of brain science alone.

Whether the revelation that the adolescent brain may be less mature than scientists had previously thought is ultimately a good thing, a bad thing, or a mixed blessing for young people remains to be seen. Some policymakers will use this evidence to argue in favor of restricting adolescents’ rights, and others will use it to advocate for policies that protect adolescents from harm. In either case, scientists should welcome the opportunity to inform policy discussions with the best available empirical evidence.