The Promise of Data-Driven Policymaking

The federal government is only scratching the surface in its use of information technologies to collect, analyze, and use data in innovative ways.

During the past decade, advances in information technology have ignited a revolution in decisionmaking, from business to sports to policing. Previously, decisions in these areas had been heavily influenced by factors other than empirical evidence, including personal experience or observation, instinct, hype, and dogma or belief. The ability to collect and analyze large amounts of data, however, has allowed decisionmakers to cut through these potential distortions to discover what really works.

In the corporate sector, a wide variety of data-driven approaches are now in place to boost profits, including systems to improve performance and reliability, evaluate the success of advertising campaigns, and determine optimal prices. Marriott International, for example, has created a program called Total Hotel Optimization that uses data to shape customer promotions and set prices on rooms, conference facilities, and catering.

In Major League Baseball, the scouting departments of some of the most successful teams are stocked with statistical experts who crunch numbers to determine which players to draft and sign. As described in Michael Lewis’s Moneyball, Oakland A’s General Manager Billy Beane relied on statistical analysis to build one of baseball’s winningest teams while maintaining one of the lowest payrolls.

Data-driven policing took hold in the mid-1990s when the New York City Police Department put in place a computerized system, called CompStat, to track and map crime by neighborhood, allowing the department to more effectively deploy its resources. Under this system, which has been replicated in dozens of cities, the city’s murder rate fell by almost 70%, far outpacing the national decline.

A similar revolution in government decisionmaking is waiting to be unleashed. Policymaking, as it currently stands, can be like driving through a dense fog in the middle of the night. Large data gaps make it difficult to see problems clearly and chart a course forward. In education, for example, we lack basic classroom data that could be used to deploy highly effective teachers where they are needed most. In health care, we are unable to systematically draw comparisons across providers to identify the most effective approaches and most needed investments. And in the environmental arena, basic data on air and water pollution as well as chemical exposures are often unavailable, impairing our ability to prevent public harm.

In a paper-based world, the requisite information was virtually impossible to generate. The costs and administrative burden associated with data collection and analysis were simply too steep. As the corporate sector is demonstrating, however, these barriers have now been substantially reduced. New information technologies make possible—and affordable—a series of monitoring opportunities, data exchanges, analytical inquiries, policy evaluations, and performance comparisons that would have been impossible even a few years ago.

By more effectively harnessing these technologies, government can begin to close data gaps that have long impeded effective policymaking. As problems are illuminated, policymaking can become more targeted, with attention appropriately and efficiently directed; more tailored, so that responses fit divergent needs; more nimble, able to adjust quickly to changing circumstances; and more experimental, with real-time testing of how problems respond to different strategies. Building such a data-driven government will require sustained leadership and investment, but it is now within our reach.

From the greenhouse gas emissions causing climate change to the particulates linked to rising childhood asthma, many of today’s most vexing environmental problems cannot be seen. Likewise, without good data, it is difficult to tease out the multiple elements that turn failing schools into successful ones or identify the factors that cause some hospitals to outperform others. New technologies for data collection, analysis, and dissemination provide the opportunity to make the invisible visible, the intangible tangible, and the complex manageable.

Previously, data had to be reported on paper to government and then entered by hand into a database. This slow and painstaking process severely constrained data collection and forced decisionmakers to diagnose problems based on an incomplete picture drawn from sometimes years-old and error-ridden data. Today, however, government no longer faces the same imperative to pick and choose what information to collect, thanks to breathtaking advances in information-gathering technologies.

Sensor and satellite technologies provide the ability to collect data remotely—24/7, with no data entry necessary—on almost anything in the physical environment, including air and water quality, the health of ecosystems, traffic flow, and the condition of critical infrastructure, such as roads and bridges. For other types of data, including health care records and student test scores, electronic reporting and management systems can seamlessly and instantaneously transfer and aggregate data and check for errors. These technologies are still underused, but if effectively harnessed they could form the foundation of a robust information infrastructure for more precise problem spotlighting.
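
To make the error-checking step concrete, here is a minimal sketch in Python of how an electronically reported record might be validated automatically before it enters a database. The field names, required fields, and valid ranges are invented for illustration and are not drawn from any actual reporting system.

    # Minimal sketch: automated error checking of an electronically reported
    # record before it enters a database (field names and ranges are invented).

    REQUIRED_FIELDS = {"facility_id", "report_date", "nox_tons", "so2_tons"}
    VALID_RANGES = {"nox_tons": (0, 50000), "so2_tons": (0, 50000)}

    def validate(record):
        """Return a list of data-quality errors; an empty list means accept."""
        errors = [f"missing field: {f}" for f in REQUIRED_FIELDS - record.keys()]
        for field, (low, high) in VALID_RANGES.items():
            value = record.get(field)
            if value is not None and not (low <= value <= high):
                errors.append(f"{field} out of range: {value}")
        return errors

    report = {"facility_id": "NC-0042", "report_date": "2007-06-30", "nox_tons": -3}
    print(validate(report))   # flags the negative emission value and the missing SO2 field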

The ability to quickly process information also enables more responsive government. Currently, government often responds only after public harm—illness, death, and other hardships or crises—is manifest. Real-time data collection, on the other hand, empowers government officials to spot problems in time to take preventive action. In a report on the possibility of a terrorist attack on drinking water supplies, for example, the Government Accountability Office noted that experts it consulted “most strongly supported developing real-time monitoring technologies to quickly detect contaminants in treated drinking water on its way to consumers.”
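
A rough sketch of how such real-time monitoring might be wired up in software: the Python fragment below checks a stream of hypothetical water-quality readings against contaminant thresholds and raises alerts immediately. The sensor names, threshold values, and readings are all assumptions made for illustration, not actual standards.

    # Minimal sketch: flag drinking-water sensor readings that fall outside
    # hypothetical safety thresholds (all values invented for illustration).

    THRESHOLDS = {"chlorine_residual_min_mg_l": 0.2, "turbidity_max_ntu": 1.0}

    def check_reading(sensor_id, reading):
        """Return alert messages for any measurement outside the thresholds."""
        alerts = []
        if reading["chlorine_mg_l"] < THRESHOLDS["chlorine_residual_min_mg_l"]:
            alerts.append(f"{sensor_id}: chlorine residual too low "
                          f"({reading['chlorine_mg_l']} mg/L)")
        if reading["turbidity_ntu"] > THRESHOLDS["turbidity_max_ntu"]:
            alerts.append(f"{sensor_id}: turbidity too high "
                          f"({reading['turbidity_ntu']} NTU)")
        return alerts

    # Simulated readings from two distribution-system sensors.
    stream = [
        ("pump_station_3", {"chlorine_mg_l": 0.45, "turbidity_ntu": 0.3}),
        ("reservoir_outlet", {"chlorine_mg_l": 0.10, "turbidity_ntu": 1.8}),
    ]

    for sensor_id, reading in stream:
        for alert in check_reading(sensor_id, reading):
            print("ALERT:", alert)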

Knowing that a problem exists is frequently not enough, of course. It may also be necessary to know the problem’s nature and shape to effectively develop solutions. What factors contribute to the problem, including how factors interact with each other and their relative importance? What people or communities are most affected? And what is the trend over time, including projections of future severity?

Answering these questions requires careful analysis, again with the help of new information technologies. Relational database and data-warehousing systems allow multiple data sets to be queried at once, providing the opportunity to break down the data silos that are now the rule in federal government. For example, we could fuse pollution data, such as annual toxic releases, with public health data, such as cancer-related deaths, and census data. Such integration would facilitate research to uncover what sort of pollution is causing what sort of health effects in what sort of population.
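
As a simplified illustration of this kind of data fusion, the Python sketch below joins three hypothetical county-level tables (toxic releases, cancer mortality, and census figures) and then runs a single query across the combined data. The column names and values are invented; a real integration effort would involve far larger and messier data sets.

    # Minimal sketch (assumed column names and toy values): fuse county-level
    # toxic-release, cancer-mortality, and census data so they can be queried
    # together rather than sitting in separate silos.
    import pandas as pd

    releases = pd.DataFrame({"county": ["A", "B"], "toxic_releases_lbs": [120000, 8000]})
    health   = pd.DataFrame({"county": ["A", "B"], "cancer_deaths_per_100k": [210.5, 165.2]})
    census   = pd.DataFrame({"county": ["A", "B"], "population": [480000, 95000],
                             "median_income": [42000, 58000]})

    merged = releases.merge(health, on="county").merge(census, on="county")

    # A simple cross-silo query: counties with heavy releases and elevated mortality.
    flagged = merged[(merged["toxic_releases_lbs"] > 100000) &
                     (merged["cancer_deaths_per_100k"] > 200)]
    print(flagged)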

There are also analytical tools that go beyond simple queries to generate deeper understanding. Geographic information systems (GISs) provide the ability to map and visually overlay multiple data sets. Data-mining systems apply automated algorithms to extract patterns, draw correlations, disentangle issues of causation, and predict future results. Within moments, these tools can generate new knowledge that might take years to uncover manually.
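
For instance, here is a minimal sketch of the kind of automated analysis such tools perform: computing the correlation between a pollutant measure and a health outcome across a set of hypothetical observations. A strong correlation of this sort flags a relationship worth investigating; it does not by itself establish causation.

    # Minimal sketch: once data sets are fused, automated analysis can surface
    # candidate relationships for analysts to investigate (values invented;
    # a correlation like this suggests, but does not prove, causation).
    import numpy as np

    pm25 = np.array([6.1, 9.4, 12.8, 15.2, 18.7])          # hypothetical air readings
    asthma_rate = np.array([41.0, 55.3, 63.9, 72.5, 80.1])  # hypothetical ER visits per 10k

    r = np.corrcoef(pm25, asthma_rate)[0, 1]
    print(f"Correlation between fine-particulate levels and asthma visits: r = {r:.2f}")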

As data are collected and analyzed, they can be shared with the public, opening up the policymaking process. Much more still needs to be done, but government Web sites are starting to provide searchable databases, GISs, and other analytical tools. The public can request databases on CD-ROM, so that data can be reconfigured, repackaged, or merged with other data. Data disseminated electronically empower a broad array of actors—including the press, political opponents of the governing party, academics, nongovernmental organizations, the private sector, and concerned citizens— to uncover problems, develop innovative solutions, and demand results.

Indeed, baseball’s move toward data-driven decisionmaking was initiated not by teams but by fans using their personal computers to crunch statistics and develop a deeper understanding of the game. Billy Beane latched on to and applied these fans’ ideas. Likewise, those outside government can be a huge asset for policymaking, if given the tools to conduct their own analyses.

The federal government is now only scratching the surface in its use of new technologies to collect, analyze, and disseminate data. Antiquated paper-based recordkeeping still pervades U.S. health care, for example, and industrial facilities still hand-report pollution data, often as estimates of pollution, not precise measurements. Moreover, data sets are almost never fused across federal agencies or even within agencies, and only sometimes are they made searchable through the Internet. As new technologies are put to greater use, a far clearer picture of our problems will emerge, opening the door to more targeted, tailored, and precise policymaking.

In the absence of good data, policymaking frequently relies on intuition, past experience, or expertise, all of which have serious drawbacks. A considerable body of research has demonstrated how emotion, issue framing, cascade effects, and other biases cloud policy judgments. Data allow for cool analysis that can help overcome these biases and achieve better policy results.

Of course, this is not to say that data can provide all the answers. Even as we close data gaps with new technologies, there will always be some issues that are difficult or even impossible to capture quantitatively. Thoughtful analysis and human judgment are required to interpret available data and take account of factors that may not be reflected in the numbers. In addition, values are essential to inform policy choices and will continue, appropriately, to be the subject of political debate.

As gaps in knowledge are closed, however, the zone in which political judgment plays out narrows, facilitating consensus and smarter policymaking. In particular, more refined data allow policymakers to develop responses that are targeted at the most important problems or causal factors, calibrated for disparate impacts, and tailored to meet individualized needs.

Policymaking begins with the setting of priorities. Policymakers may identify an array of problems that should be addressed, but because of resource constraints they may be forced to pick and choose. Often, these choices are made haphazardly. Government does a poor job of justifying and delineating priorities for both regulation and the budget. Why is a regulation being undertaken over other possibilities? Why is the budgetary pie divided the way it is? Data can be used to compare problems by relative severity to more efficiently and equitably allocate attention and resources.

Finding ways to package and unlock raw data is essential to drawing such comparisons. This might be as simple as providing quantitative tables that highlight key information. The National Highway Traffic Safety Administration (NHTSA) does a good job of organizing data on auto fatalities and injuries by state. But packaging can go further still. The city of Charlotte, North Carolina, for example, has developed neighborhood “quality of life” rankings, updated every two years, based on 20 indicators measuring conditions in 173 “neighborhood statistical areas.” These indicators are used to identify and target fragile neighborhoods for revitalization.
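
To illustrate how an indicator-based ranking of this kind can be computed, the sketch below scores a few hypothetical neighborhood areas on three invented indicators and sorts them from most to least fragile. The indicators, values, and equal-weight scoring rule are illustrative only and are not Charlotte’s actual methodology.

    # Minimal sketch of an indicator-based ranking, loosely in the spirit of
    # Charlotte's quality-of-life index (indicator names, values, and the
    # equal-weight scoring rule are illustrative, not the city's method).

    indicators = {                      # higher raw value = worse condition
        "Area 1": {"crime_rate": 38.0, "vacancy_pct": 12.0, "dropout_pct": 9.0},
        "Area 2": {"crime_rate": 12.0, "vacancy_pct": 3.0,  "dropout_pct": 4.0},
        "Area 3": {"crime_rate": 25.0, "vacancy_pct": 8.0,  "dropout_pct": 6.5},
    }

    names = list(next(iter(indicators.values())))
    worst = {n: max(area[n] for area in indicators.values()) for n in names}

    def fragility(area):
        """Average each indicator's share of the worst observed value (0 = best)."""
        return sum(indicators[area][n] / worst[n] for n in names) / len(names)

    for area in sorted(indicators, key=fragility, reverse=True):   # most fragile first
        print(f"{area}: fragility score {fragility(area):.2f}")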

Specific problems can be similarly dissected to enable targeted policymaking. A problem may have a number of different causes of varying importance, or factors may interact with each other to mitigate or aggravate a problem. The health consequences of one pollutant, for instance, may be aggravated by another pollutant. Knowing this information allows policymakers to focus efforts on key causal factors.

The shape of a problem and the response required also may shift according to a host of background variables, including differences in geography, local infrastructure, demographic makeup, and even individual people. With refined data and analysis, policies can be directed at those most at risk and tailored to fit individual needs or circumstances. The United Kingdom, for example, is moving to personalize learning by providing teachers with a data-rich picture of each student’s needs, strengths, and interests. This knowledge, assembled through new information technology, can be applied so that students are taught in ways that work best for them. Fine-grained data allow policymakers to manage diversity and respond to individualized needs rather than forcing conformity to a uniform approach or standard.

Even with the best data, policymaking is not an exact science and will rarely be done precisely right the first time. Just as leading companies follow the mantra of continuous improvement, good governance requires a process of ongoing trial and error. Once an initiative is implemented, we need to continuously monitor and measure how it is working and make adjustments for better results.

The federal government took a step in this direction with the Government Performance and Results Act (GPRA) of 1993, which requires each federal agency to regularly set goals by which performance is to be measured. Done well, such goal-setting can clarify choices about how to direct attention and resources, communicate expectations and instill a sense of purpose, and stimulate problem-solving and experimentation to find what works. Frequently, however, agencies focus on outputs (activities performed to achieve a goal) or, worse yet, inputs (such as money spent) rather than outcomes that measure actual real-world improvements.

Outputs and inputs are not unimportant, but it is vital to understand how they interact with outcomes to find the most effective and efficient approaches. Measuring outputs and inputs in isolation may cause government personnel to focus on performing tasks that have little to do with real-world results.

Even when there is commitment to develop outcome-focused goals and measures, there can be significant hurdles. Sometimes it is not clear which metrics to use or how to isolate the influence of a policy from the influence of other factors. If oversimplified or misdirected, performance measurement can create warped perceptions and distorted incentives. Some doctors have reportedly begun to turn away gravely ill patients, for example, to improve their personal mortality ratings provided by the federal government. Careful deliberation is required to ensure that metrics accurately reflect program performance and promote desired outcomes. As issues evolve, new metrics will need to be developed and indicators reconfigured.

In 2003, the Bush administration launched a new tool—the Program Assessment Rating Tool (PART)—to evaluate the performance of individual programs in all federal agencies, ostensibly to inform the president’s budget decisions. But PART reviews, conducted by the White House Office of Management and Budget, are open to a great deal of subjective interpretation and potentially political manipulation. The Federal Emergency Management Agency’s disaster response and recovery programs, for instance, were scored as “adequate” shortly after gross deficiencies were exposed in the response to Hurricane Katrina.

To be successful, performance evaluation must be transparent, free of political manipulation, and based on credible and easily understood data. With reliable performance data in hand, it is then possible to make necessary adjustments to government programs. Policies that are producing good results should be extended and expanded. Those that are not should be rethought, with resources redeployed.

Key to this is government’s ability to incorporate performance data into the decisionmaking process. Even after federal agencies issue their annual GPRA reports, policymakers seldom take notice or make use of the data. In contrast, under Baltimore’s successful CitiStat system (put in place by then-Mayor Martin O’Malley in 2000 and replicated by at least 11 other U.S. cities), heads of city departments report to City Hall every other week to present updated performance data and answer questions from high-level officials in the mayor’s office, sometimes including the mayor.

The frequency of review sessions keeps city leadership focused on the numbers, so that problems are quickly spotted and addressed. CitiStat is credited with saving Baltimore $350 million since its inception while dramatically improving city programs and services. (The city guarantees, for example, that a pothole will be repaired within 48 hours after receiving a public complaint.) As Maryland’s new governor, O’Malley is now implementing this approach on the state level, as is Washington Governor Christine Gregoire.

The ability to track and apply performance data could deliver enormous benefits at the federal level as well. Building this capacity would not only enhance government’s ability to refine policies and adjust to changing circumstances; it would also allow federal agencies to replace one-size-fits-all rules or standards with flexible approaches that encourage policy competition.

Those responsible for implementation, such as state and local governments, industrial facilities, and schools, could be empowered to develop their own solutions so long as real-world objectives are met. Focusing on results, rather than required tasks, encourages experimentation and innovation while allowing policies to be tailored to local circumstances.

Federal agencies can then promote collective learning by evaluating relative performance among peers and spotlighting the most effective strategies that should be expanded, as well as ineffective strategies that should be avoided. NHTSA, for example, has promoted collective learning among states as one of its primary strategies to increase seatbelt usage. In one case, NHTSA urged and worked with states to replicate North Carolina’s “Click It or Ticket” program, which had achieved significant gains by stepping up the enforcement of seatbelt laws, with particular attention aimed at teens and young adults.

Ranking performance against a relevant peer group provides a particularly strong incentive to address weaknesses and adopt top-performing solutions. No state wants to be identified as a laggard, and all desire recognition for outperforming peers. Performance benchmarking, now done only sporadically, can be used to jump-start a race to the top without any federal command and control.

The idea that government should base its decisions on data, evidence, and rational analysis is not new, of course. What’s new is the opportunity created by information technologies to crystallize problems and highlight effective solutions. This opportunity, however, is still waiting to be seized. Policymaking persists much as it always has, even as technology has raced ahead and decisionmaking has been transformed in the corporate sector and other realms.

A broader vision is needed to modernize and revolutionize the federal government. Too often, the various steps discussed above—technology deployment, data generation, policy development, and performance measurement—are pursued almost as separate enterprises, with little thought given to how they connect to and support each other. Bringing these components into a coherent whole is essential to implement data-driven policymaking.

The first order of business in this effort is building a robust information infrastructure. Government decisionmaking currently suffers from persistent data gaps, the lack of systematic analysis, and poor information management and dissemination. Accordingly, information needs must be methodically identified and then addressed through a government-wide strategy to procure and deploy new technologies.

This also should be accompanied by changes in the policymaking process, so that decisionmakers are positioned to capitalize on the information generated. In particular, this means creating systems, such as Baltimore’s CitiStat program, or enhancing existing systems, such as GPRA, to ensure that policymakers regularly consult data to guide decisions and drive real-world results.

Less tangible but equally important is the need to change the way we think about policymaking. Refined data permit more targeted, tailored, and experimental policymaking. Success depends on recognizing these opportunities and devising new approaches to take advantage of them.

Finally, a movement toward data-driven policymaking cannot happen without political leadership. At the federal level, the president and Congress must step up. Getting the dozens of different departments and agencies that make up the federal government to embrace this approach and harmonize efforts where responsibilities overlap will require significant planning, coordination, oversight, and, perhaps most crucially, investment, so that core agency functions are enhanced and not disrupted.

As we break down these barriers, however, we will begin to reap the benefits of a data-driven government that is more effective, efficient, open, and accountable. Let the revolution begin.
