Are New Accountability Rules Bad for Science?
If properly implemented, the new requirements could strengthen science and enhance its public support.
In 1993, the U.S. Congress quietly passed a good-government bill, with little fanfare and full bipartisan support. President Clinton happily signed it into law, and Vice President Gore incorporated its principles into his own initiatives to make government work better. The law required each part of government to set goals and measure progress toward them as part of the process of determining budgets. On its surface, this was not the type of legislation that one would expect to spur controversy. Guess again.
The first reaction of the federally supported scientific research community to this law was shock and disbelief. The devil was in the details. The law required strategic planning and quantitative annual performance targets–activities either unknown or unwanted among many researchers. Surely, many thought, the new requirements were not intended to apply to science. They were wrong. The law made it clear that only the Central Intelligence Agency might be exempted.
Since that time, the story of the relationship between research and the Government Performance and Results Act of 1993 (GPRA or the Results Act) has been one of accommodation–from both sides. Research agency officials who at first just hoped the law would go away were soon pointing out that it presented a wonderful opportunity for explaining the benefits of research to the public. A year later, each of the agencies was working on developing an implementation plan suited to its particular character. Recently, the National Academies’ Committee on Science, Engineering, and Public Policy issued an enthusiastic report, stressing the feasibility of the task. On the other side, the Office of Management and Budget (OMB) and Congress have gone from insisting on precise numbers to approving, if not embracing, retrospective, qualitative performance goals.
The Results Act story is more than the tale of yet another government regulation gone awry. The act in fact became a lightning rod for a number of issues swirling around U.S. research in the post-Cold War period. Was the law really the fine print in the new social contract between science and society? Its requirements derive from a management model. Does this mean that research can and should be managed? It calls for accountability. Does the research community consider itself accountable? How concretely? And finally, the requirements for strategic planning and stakeholder consultation highlighted the question: What and whom is federally sponsored research for? Neither GPRA nor the discussion around it has answered any of these questions directly. But they have focused our attention on issues that will be with us for a long time. In the process we have learned some lessons about how to think about the government role in science and technology.
The act
The theory behind results-oriented management is simple and appealing. In the old days, agency officials defined their success in terms of how much money they were spending. When asked, “How is your program doing?” they answered with their budget allocations. But once these officials are converted into results-oriented managers, they will focus first on the results they are producing for the U.S. public. They will use simple, objective measures to determine whether they are producing those results. Then they will turn their creative energy to exploring new, more effective ways of producing them that might even cost less than the current budget allocation.
To implement this theory, the Results Act requires that each agency prepare a strategic plan that covers a period of at least five years and is updated every three years. This plan must cover all of the agency’s main activities, although they can be aggregated in any way that the agency thinks is sensible, provided that its congressional committees agree. Strategic plans are required at the agency level but not necessarily for subunits. The National Institutes of Health (NIH) accordingly did not submit its own plan but was included in the strategic plan for the Department of Health and Human Services. Likewise, defense R&D occupies only a tiny spot in the overall Department of Defense strategic plan.
From the strategic plan goals, agencies derive their performance goals, which are passed on to Congress in a performance plan. Eventually, the performance plan is to be integrated into the annual budget submission. The ideal performance plan, at least from the viewpoint of the accountants and auditors, indicates the specific program funds and personnel numbers devoted to each performance goal. The plan must specify target levels of performance for a particular fiscal year. For example, if an agency wants to improve customer service, it might set a performance target for the percentage of telephone calls answered within three minutes or the percentage of customer problems resolved within two days. After the end of each fiscal year, agencies must report to Congress whether they met their performance goals. OMB has asked that the spring performance report required under GPRA be incorporated into an “accountability report,” which will also include a set of financial and auditing reviews required under other pieces of legislation. The first accountability reports were prepared this spring.
The Results Act is modeled on similar legislation at the state and local levels in the United States and in several other countries. Federal budget reformers had previously attempted to accomplish their goals by means of executive orders, but these were all withdrawn fairly quickly when it became apparent that the results would not be worth the burden of paperwork and administrative change. Some old-timers predicted the same fate for GPRA. In the first years after it was passed, a few agencies rushed to implement the framework, but others held back. Congressional staff also showed little interest in the law until 1996. Then a newly reelected Republican Congress faced a newly reelected Democratic president just as GPRA was due for implementation, and a ho-hum administrative exercise became a confrontation between the legislative and executive branches. A private organization briefed congressional staff on the GPRA requirements and gave them a checklist for evaluating the draft strategic plans due the following spring. Staff turned the checklist into an “examination,” which most draft plans from the agencies failed. Headlines in the Washington Post thus became an incentive for agencies to improve their plans, which they did.
GPRA in research
In the research agencies, the Results Act requirements did not enter a vacuum. Evaluation offices at the National Science Foundation (NSF) and NIH had explored various measures of research activity and impact in the 1970s, and NIH even experimented with monitoring its institutes using publication counts and impact measures. An Office of Science and Technology Policy (OSTP) report in 1996 referred to research evaluation measures as being “in their infancy,” but nothing could be further from the truth. In fact, they were geriatric. NIH had discontinued its publication data series because it was not producing enough useful management information, and many universities reported that publication counts were not just unhelpful but downright distorting as a means of assessing their research programs.
The method of choice in research evaluation around the world was the expert review panel. In the United States, the National Institute of Standards and Technology (NIST) had been reviewing its programs in this way since the 1950s. During the 1980s and 1990s, other mission agencies, including the Departments of Defense, Energy, and Agriculture, had been strengthening their program review processes. It was common practice to give external review panels compilations of data on program activities and results and to ask the reviewers to weigh them in their evaluation. In the early days of response to GPRA, it was not clear to agency evaluation staff how such review processes could be translated into annual performance goals with quantitative targets.
There was considerable debate in the early years over what was to count as an outcome. Because of the law’s demand for quantitative measures, there was a strong tendency to focus on what could be counted, to the neglect of what was important. One camp wanted to measure agency processes: Was the funding being given out effectively? OMB was initially in this camp, because this set of measures focused on efficient management. A number of grant-supported researchers also thought it was a good idea for the act to focus on whether their granting agencies were producing results for them, not on whether they were producing results for the public.
Another camp realized that since only a small part of the money given by Congress to granting agencies went into administration, accountability for the bulk of the money would also be needed. Fortunately for the public, many in Congress were in this camp, although some were over-enthusiastic about it. In the end, both management measures and research outcomes have been included in several performance plans, including those of NIH and NSF.
The discussion in the research community quickly converged on a set of inherent problems in applying the GPRA requirements to research. First, the most important outcomes of research, major breakthroughs that radically change knowledge and practice, are unpredictable in both direction and timing. Trying to plan them and set annual milestones is not only futile but possibly dangerous if it focuses researchers’ attention on the short term rather than on the innovative. As one observer has put it, “We can’t predict where a discovery is going to happen, let alone tell when we are halfway through one.” Second, the outputs of research supported by one agency intermingle with those of activities supported from many other sources to produce outcomes. Trying to line up spending and personnel figures in one agency with the outcomes of such intermingled processes does not make sense.
Third, there are no quantitative measures of research quality. Effective use of performance measures calls for a “balanced scorecard”: a set of measures that includes several of the most important aspects of the activity. If distortions in behavior appear as people orient toward one measure, they will show up in another, allowing the manager to take corrective action. In research, pressure to produce lots of publications can easily crowd out attention to quality. But without a measure of quality, research managers cannot balance their scorecards, except with descriptive information and human judgments, such as the information one receives from panels.
The risks of applying GPRA too mechanistically in research thus became clear. First, as we have seen, short-termism lurks around every corner in the GPRA world in the form of overemphasis on management processes, on research outputs (“conduct five intensive operations periods at this facility”) rather than outcomes (“improve approaches for preventing or delaying the onset or the progression of diseases and disabilities”), and on the predictable instead of the revolutionary. Short-termism is probably bad in any area of government operations but would be particularly damaging in research, which is an investment in future capabilities.
Contractualism is a second potential danger. If too much of the weight of accountability rests on individual investigators and projects, they will become risk-averse. Many investigators think that the new accountability requirements will hold them more closely to the specific objectives that they articulate in their proposals and that their individual projects will have to meet every goal in their funding agency’s strategic plan. Most observers feel that such a system would kill creativity. Although no U.S. agency is actually planning to implement the law in this way, the new project-reporting systems that agencies are designing under GPRA seem to send this message implicitly. Moreover, research councils in other countries have adopted approaches that place the accountability burden on individual projects rather than portfolios of projects. The fear is thus not completely unfounded.
Third, reporting requirements could place an undue burden on researchers. University-based investigators grasped instantly that every agency from which they received funds would, in the near future, begin asking for outcome reports on every piece of work it funded in order to respond to the new law. Because these are government agencies, they would eventually try to harmonize the reporting systems, but that would take quite some time. In the meantime, more time spent on paperwork means less time for research.
As GPRA implementation neared, ways to avoid these risks emerged. Most agencies made their strategic plan goals very broad and featured knowledge production prominently as an outcome. Some agencies modified the notion of the annual target level of performance to allow retrospectively applied qualitative performance goals. To keep the risk of contractualism under control, agencies planned to evaluate portfolios of projects, and even portfolios of programs, rather than individual ones. And the idea that expert panels will need to play an important role in the system has gradually come to be taken for granted. The majority of performance goals that appeared in the first set of performance plans specified outputs, and many of them took the form of annual milestones in a research plan. True outcome goals were mostly put in qualitative form.
Basic research
The research constituencies of NIH and NSF had historically not seen strategic planning as applicable to science, and in 1993, both agencies had had recent bad experiences with it. Bernadine Healy, director of NIH in the early 1990s, had developed a strategic plan that included the controversial claim that biomedical research should contribute to economic prosperity as well as personal health. Widely seen as a top-down effort, the plan was buried before it was released. Because NIH is only a part of a larger government department, it has never been required under the Results Act to produce a strategic plan, and it has received only scant coverage in the department-level plan. One departmental strategic plan goal was focused on the NIH mission: “Strengthen the nation’s health sciences research enterprise and enhance its productivity.”
Also in the early 1990s, NSF staff developed a strategic plan under Walter Massey’s directorship, but the National Science Board did not authorize it for distribution. Nonetheless, articulating the broad purposes of government-sponsored research was seen as an important task in the post-Cold War period. OSTP issued Science in the National Interest, articulating five broad goals. In its wake, a new NSF director, Neal Lane, began the strategic planning process again and won National Science Board approval for NSF in a Changing World. It also articulated very generic goals and strategies, such as “Enable the U.S. to uphold a position of world leadership in all aspects of science, mathematics, and engineering,” and “Develop intellectual capital.” This document formed the first framework for GPRA planning at NSF.
To prepare for annual performance planning, NSF volunteered four pilot projects under GPRA in the areas of computing, facilities, centers, and management. Initial rounds of target-setting in these areas taught two lessons: that it is wise to consult with grantees, and that it is easier to set targets than to gather the data on them. The pilot project performance plans were scaled up into draft performance plans for NSF’s major functions, but the effort then ran into a snag. Several of the plans included standard output performance indicators such as numbers of publications and students trained. But senior management did not think these measures conveyed enough about what NSF was actually trying to do and worried that they would skew behavior toward quantity rather than quality, thus undermining NSF’s mission. Eventually, instead of setting performance goals for output indicators, NSF proposed qualitative scaling: describing acceptable and unacceptable levels of performance in words rather than numbers, an approach sanctioned by the fine print of the law. For research, it had the advantages of allowing the formulation of longer-term objectives and of applying them retrospectively. Management goals for NSF have been put in quantitative form.
In support of its qualitative approach, however, NSF has also committed itself to building up much more information on project results. Final project reports, previously open-ended and gathered on paper, are now being collected through a Web-based system that immediately enters the information into a database maintained at NSF. Questions cover the same topics as the old form but are more detailed. NSF has further committed itself to shifting the focus of an existing review mechanism, the Committees of Visitors (COVs), which currently audits the peer review process, toward program results. COVs will receive information from the results database and rate the program in question using the qualitative scales from the performance plan. The process will thus end up closely resembling program reviews at applied research agencies, although different criteria for evaluation will be used.
NIH has followed NSF’s lead in setting qualitative targets for research goals and quantitative ones for “means” of various sorts, including program administration. But NIH leadership claims that examples of breakthroughs and advances are sufficient indicators of performance. Such “stories of success” are among the most widely used methods of communicating how research produces benefits for the public, but analysts generally agree that they provide no useful management information and do not help with the tradeoff issues that agencies, OMB, and Congress face regularly.
Applied research
In contrast to the slow movement at the basic research agencies, the National Oceanic and Atmospheric Administration (NOAA), part of the Department of Commerce, began putting a performance budgeting system in place before the passage of the Results Act. Research contributes to several of the strategic and performance goals of the agency and is judged by the effectiveness of that contribution. The goals of the NOAA strategic plan formed the structure for its 1995 budget submission to Congress. But the Senate Appropriations Committee sent the budget back and asked for it in traditional budget categories. NOAA has addressed this challenge by preparing a dual budget, one version in each form. Research goals are often put in milestone form.
The Department of Commerce, however, had not adopted any standard approach. Another agency of Commerce, NIST, was following an approach quite different from NOAA’s. For many years, NIST had been investing in careful program evaluation and developing outcome-oriented performance indicators to monitor the effectiveness of its extramural programs: the Manufacturing Extension Partnerships and the Advanced Technology Program. But NIST had never incorporated these into a performance budgeting system. The larger Department of Commerce performance plan struggled mightily to incorporate specific performance goals from NOAA and NIST into a complex matrix structure of goals and programs. Nevertheless, congressional staff gave it low marks (33 points out of 100).
The Department of Energy (DOE) also responded early to the call for results-oriented management. A strategic plan formed the framework for a “performance agreement” between the secretary of energy and the president, submitted for fiscal year 1997. This exercise gave DOE experience with performance planning, and its first official performance plan, submitted under GPRA for fiscal year 1999, was actually its third edition. Because DOE supports programs such as basic energy sciences and high-energy physics, with their many large facilities, efficient facilities management figured among the performance goals. Quantitative targets for technical improvement also appeared on the list, along with milestone-type goals.
The creative tension between adopting standard GPRA approaches and letting each agency develop its own approach appeared in the early attention paid to the Army Research Laboratory (ARL) as a model in results-oriented management. ARL is too small to have to respond directly to GPRA, but in the early 1990s, the ARL director began demanding performance information. A long list of indicators was compiled, made longer by the fact that each unit and stakeholder group wanted to add one that it felt reflected its performance particularly well. As the list grew unwieldy, ARL planning staff considered combining the indicators into a single index but rejected the idea because such an index would say so little in and of itself. ARL eventually decided to collect over 30 performance indicators, but its director focuses on a few that need work in a particular year. Among the indicators were customer evaluation scores, collected on a project-by-project basis on a simple mail-back form. ARL also established a high-level user panel to assess the overall success of its programs once a year, and it adopted a site visit review system like NIST’s, because the director considers detailed technical feedback on ARL programs to be worth the cost. The ARL approach illustrates the intelligent management use of performance information, but its targeted, customer-oriented research mission leads to some processes and indicators that are not appropriate in other agencies.
The Agricultural Research Service (ARS), a set of laboratories under the direct management of the Department of Agriculture, has developed a strategic plan that reflects the categories and priorities of the department’s plan. A first draft of an accompanying performance plan relied heavily on quantitative output indicators such as publications but was rejected by senior management after review. Instead, ARS fully embraced the milestone approach, selecting particular technical targets and mapping the steps toward them that will be taken in a particular fiscal year. This approach put the plan on a very short time horizon (milestones must be passed within two years of the plan’s writing), and staff admit that technical targets were set for only some of the agency’s activities.
The National Aeronautics and Space Administration also embraced the roadmap/milestone approach for much of its performance planning for research. In addition, it set quantitative targets for technical improvements in certain instruments and adopted the goal of producing four percent of the “most important science stories” in the annual review by Science News.
The questions researchers ask in applied research are often quite similar to those asked in basic scientific research, exploring natural phenomena at their deepest level. But it is generally agreed that the management styles for the two types of research cannot be the same. In applied research, the practical problems to be solved are better specified, and the customers for research results can be clearly identified. This allows applied research organizations to use methods such as customer feedback and road mapping effectively. Basic research, in contrast, requires more freedom at the detailed level: macro shaping with micro autonomy. The Results Act is flexible enough to allow either style.
Old wine in new bottles?
One breathes a sigh of relief to find that traditional research management practices are reappearing in slightly modified forms in GPRA performance plans. But then it is fair to ask whether GPRA actually represents anything new in the research world. My view is that although there is continuity, there is also change in three directions: pressure, packaging, and publics. Are the impacts of these changes likely to be good or bad for science?
There is no question that the pressure for accountability from research is rising around the world. In the 1980s, the common notion was that this was budget pressure: Decisionmakers facing tough budget tradeoffs wanted information to make better decisions. There must be some truth here. But the pressure also rose in places where research budgets were rising, indicating another force at work. I suggest that the other factor is the knowledge economy. Research is playing a bigger role in economic growth, and its ever-rising profile attracts more attention. This kind of pressure, then, should be welcomed; surely it is better than deserving no attention at all.
But what about the packaging? Is the Results Act a straitjacket or a comfortable new suit? The experience of other countries in incorporating similar frameworks is relevant here. Wherever such management tools have been adopted–for example, in Australia, New Zealand, and the United Kingdom–there have been complaints during a period of adjustment. But research and researchers have survived, and survived with better connections into the political sphere than they would have achieved without the framework. In the United Kingdom, for example, the new management processes have increased dialogue between university researchers and industrial research leaders. Most research councils in other countries have managed to report performance indicators to their treasury departments with less fanfare than in the United States, and no earthquakes have been reported as a result. After a GPRA-like reform initiative, research management in New Zealand is more transparent and consultative. The initial focus on short-term activity indicators has given way to a call for longer-term processes that develop a more strategic view.
Among the three key provisions of GPRA, strategic planning probably presents the most interesting, and so far underutilized, opportunities for the research community. The law requires congressional consultation in the strategic planning process, and results-oriented management calls for significant involvement of stakeholders in the process. Stakeholders are the groups outside the agency or activity that care whether it grows or shrinks, whether it is well managed or poorly managed. GPRA provides researchers an opportunity to identify stakeholders and to draw them into a process of long-term thinking about the usefulness of the research. For example, at the urging of the Institute of Medicine, NIH is beginning to respond to this opportunity by convening its new Director’s Council of Public Representatives.
Perhaps the most damaging aspect of GPRA implementation for research is the defensive reaction of some senior administrators and high-level groups to the notion of listening to the public in strategic planning and assessment. There is nothing in the Results Act that takes decisionmaking at the project level out of the hands of the most technically competent people available. But GPRA does provide an opportunity for each federal research program to demonstrate concretely who benefits from the research by involving knowledgeable potential users in its strategic planning and retrospective assessment processes. In this, GPRA reflects world trends in research management. Those who think that they are protecting themselves by not responding may actually be rendering themselves obsolete.
Next steps
As Presidential Science Advisor Neal Lane said recently when asked about GPRA, “It’s the law.” Like it or not, researchers and agencies are going to have to live with it. In this somewhat new world, the best advice to researchers is also what the law is intended to produce: Get strategic. Follow your best interests and talents in forming your research agenda, but also think about the routes through which it is going to benefit the public. Are you communicating with audiences other than your immediate colleagues about what you are doing? Usually research does not feed directly into societal problem-solving but is instead taken up by intermediaries such as corporate technology managers or health care professionals. Do you as a researcher know what problems those intermediaries are grappling with, what their priorities are? If not, you might want to get involved in the GPRA process at your funding agency and help increase your effectiveness.
Agencies are now largely out of their defensive stage and beginning to test the GPRA waters. Their challenges are clear: to stretch their capabilities just enough through the strategic planning process to move toward or stay at the cutting edge, to pare performance indicators to a minimum, to set performance goals that create movement without generating busywork, and finally, to listen carefully to the messages carried in assessment and reshape programs toward the public good.
The most important group at this time in GPRA implementation is Congress. Oversight of research activities is scattered across a number of congressional committees. Although staff from those committees consult with each other, they are not required to develop a common set of expectations for performance information. Appropriations committees face large-scale budget tradeoffs with regard to research, whereas authorizing committees have more direct management oversight responsibility. Authorizing committees for health research get direct public input regularly, whereas authorizing committees for NSF and Commerce hear more from universities and large firms. Indeed, these very different responsibilities and political contexts have so far led the various congressional committees to develop quite different GPRA expectations.
It is up to Congress whether GPRA remains law. Has it generated enough benefit in strategic information to offset the paperwork burden? Rumors of the imminent demise of the law are tempered by a recent report from the Congressional Research Service indicating that its principles have been incorporated into more than 40 other laws. Thus, even if GPRA disappears, results-oriented management probably will not.
Most important, the stated goal of the law itself is to increase the confidence of the U.S. public in government. Will it also increase public confidence in research? To achieve a positive answer to that question, it is crucial that Congress not waste its energy developing output indicators. Instead, it should ask, “Who are the stakeholders? Is this agency listening to them? What do they have to say about their involvement in setting directions and evaluating results?” Addressing these questions will benefit research by promoting increased public understanding and support.