Flirting with Disaster

Perspectives

JAMES R. PHIMISTER

VICKI M. BIER

HOWARD C. KUNREUTHER

In the aftermath of catastrophes, it is common to discover prior indicators, missed signals, and dismissed alerts that preceded the event. Indeed, the accident literature contains many notable examples in which such prior signals were observed but not recognized and understood for the threat that they posed. They include the 1979 Three Mile Island nuclear power plant accident (another U.S. nuclear plant had narrowly averted a similar accident two years before) and the 2000 Concorde crash (tires had burst and penetrated the fuel tanks on five previous flights). Probably the most famous examples are the two space shuttle disasters. After it was determined that an O-ring failure had doomed Challenger in 1986, it was recognized that on a number of other occasions, O-rings had partially failed. After the loss of Columbia on February 1, 2003, investigators found that insulating foam had become detached from the external tank and pierced the orbiter’s thermal protection system. Although managers at the National Aeronautics and Space Administration (NASA) had observed debris strikes on numerous prior missions and had recognized them as a potentially serious issue, the Columbia Accident Investigation Board concluded that over time, a degree of complacency about the importance of debris strikes had crept into NASA’s culture.

These so-called precursor events can in hindsight seem so conspicuous that it is hard to understand why they were not recognized and acted on. In practice, organizations and individuals face significant challenges in identifying and reporting precursors; filtering, prioritizing, and analyzing signals that represent significant threats; and tracking the implementation of corrective actions until completion. Multiple approaches have been developed across industries and between firms within an industry to use information about precursor events and economic incentives to improve the safety of technological systems.

Precursors occur more often than accidents, of course, and normally have minimal impacts compared to those of an actual event. As such, they are inexpensive learning opportunities for understanding what could go wrong. Rallying an organization to identify and report precursor events can unearth numerous instances of potentially serious safety gaps. In contrast, failure to solicit, capture, and benefit from precursor information simply wastes a valuable resource that can be used to improve safety.

In recent years, research has pinpointed a variety of approaches that should help inform organizations seeking to initiate precursor-reporting programs, as well as those seeking to revise existing ones. All of these approaches have advantages and drawbacks; there is no one-size-fits-all program. However, although effectively managing precursors is challenging, choosing not to use precursor information to improve safety is unacceptable in high-hazard industries. A precursor event is an opportunity from which to learn and improve safety; not actively trying to learn from these events borders on neglect.

Centralized versus decentralized management

Some industries have centralized bodies that collect, analyze, and disseminate information on precursor events, whereas others have more fragmented site-specific or company-run precursor programs that serve the same role, but for smaller organizational groups. In both cases, government agencies typically work with industry stakeholders to facilitate the establishment of suitable guidelines and institutional contexts within which these bodies can work.

An example of a centralized approach is the Accident Sequence Precursor program overseen by the U.S. Nuclear Regulatory Commission (NRC). The program screens and selects precursors, primarily from Licensee Event Reports that plant operators must submit to the NRC when certain defined precursors that could affect plant safety occur. Each event is analyzed to determine its severity and relevance to safety. The results are aggregated across plants and then shared with licensees.

The airline industry takes a decentralized approach to the issue. Its Aviation Safety Action Programs (ASAPs) encourage employees to voluntarily report safety information that may be critical to accident prevention. These programs are based on memoranda of understanding among the relevant airlines, the Federal Aviation Administration (FAA), and applicable third parties, such as labor organizations. Although carriers operate the programs, they must adhere to federal guidelines and share generated safety information with the FAA.

Centralized programs may prove better at capturing and detecting trends across all participating organizations, because a single body analyzes reported precursor information. Furthermore, centralized programs may have more impact than decentralized programs if they are recognized as industry watchdogs.

In contrast, decentralized programs may benefit from a greater sense of ownership among participants and can lead to closer interaction among those who report events, those who analyze them, and those who implement corrective actions. Since American Airlines initiated its ASAP in 1999, 43 other carriers have followed, which is evidence of the success of this approach. Each program is carrier-run and allows specific carrier-related safety issues to be addressed, although the FAA and labor unions are able to share lessons learned more broadly.

Command and control versus voluntary reporting

A key issue to be addressed in the design of precursor-reporting programs is whether reporting should be mandatory or voluntary. Government regulators can play a key role in deciding whether certain types of precursors must be reported by organizations under their jurisdiction. Regulatory bodies can also provide legal safeguards, such as protection from prosecution, for those who report precursor events voluntarily. Decisions on these key issues will often inform subsequent decisions about how to either enforce or encourage reporting. Systems with mandated reporting of precursor events normally involve penalties, such as fines, for failure to report certain types of events. Voluntary reporting systems often require the development of trust, goodwill, and an organizational climate that encourages individuals to report.

The Transportation Recall Enhancement, Accountability, and Documentation (TREAD) Act, passed by Congress in response to problems with Firestone tires on Ford vehicles, is an example of a mandated precursor-reporting system. The act requires automobile and automobile-part manufacturers to report a host of precursor data, such as consumer complaints and warranty claims and adjustments, along with more serious accident data. Failure to comply can result in up to $15 million in civil penalties, as well as potential criminal penalties. Precursor data are collected as Early Warning Reports. Although the system is relatively new, warranty data from the reports have been publicized by a number of news channels.

The Aviation Safety Reporting System (ASRS), a centralized system run by NASA for the FAA, is an example of a voluntary precursor-reporting program. The program accepts and analyzes reports voluntarily submitted by pilots, air traffic controllers, flight attendants, mechanics, ground personnel, and others in the airline industry. Individuals are encouraged to report incidents or situations that compromise aviation safety. (Reports about actual accidents or possible criminal activities, such as pilot inebriation or drug trafficking, are not accepted.) To encourage reporting, any identifying information is removed before reports are entered into the ASRS database. The FAA’s commitment not to use information from ASRS reports as a basis for enforcement actions has been a key factor in the submission of more than 600,000 reports since the program’s inception.

The choice of mandatory versus voluntary reporting requires careful consideration. Many precursors result in no obvious damage and are often observed by only one or two individuals. Therefore, to some extent, the reporting of precursors is a voluntary action even if a mandatory reporting system is in place. Voluntary reporting can create a more positive collaborative safety culture among people working on the front line, management, and other stakeholders, such as government agencies and labor unions. When voluntary reporting is implemented, managers must actively solicit reports, take them seriously, and act on them as appropriate. It is important that employees receive feedback on how the organization has addressed precursor reports. Failure to take these steps will often result in a dwindling of submitted reports because employees will believe that their reports are not truly valued by senior management. To further ensure a steady stream of reports, certain protections are often stipulated, such as reporter anonymity and immunity from disciplinary action. These protections can create a cooperative environment between the reporter and those analyzing precursor events and indicate that safety is valued as an organizational norm.

Nonetheless, in some cases mandatory reporting may be preferable, with corresponding punitive sanctions for noncompliance. More specifically, voluntary precursor programs may be difficult to enact after calamitous events, because public sentiment may push for stricter enforcement measures. In addition, if stakeholders routinely capture precursor information, there may be little to gain by asking for voluntary submission of precursor reports to a centralized agency. Both of these factors may have influenced the mandatory reporting requirements of the TREAD Act. After the 2000 congressional hearings into the Ford-Firestone problems, public sentiment may have encouraged mandated reporting. Furthermore, much of the warranty data that car manufacturers were asked to submit under the TREAD Act were already collected by stakeholders.

Broad versus specific definition of precursors

Defining what a precursor is can be surprisingly difficult because the attributes of a precursor can be subjective. Any definition involves specifying those events or conditions that are sufficiently unsafe to merit analysis. The number and quality of reports submitted also depend on how precursors are defined.

A definition that creates a specific and typically high threshold for reporting may result in fewer reports but potentially more significant findings than looser definitions that encourage reporting of events with a wider range of severities, some of which may involve only a minor loss of safety margin. If the threshold for reporting is set too high (an accident must be narrowly averted to merit reporting) or precursors are defined too precisely (to include only very specific events), some risk-significant events may not be reported.

Conversely, if the threshold for reporting is set too low, the system may be overwhelmed by false alarms or inconsequential events, especially if some corrective action or substantial analysis is required for all reported events. A low threshold can also lead to a perception that the reporting system is of little value. These competing tradeoffs can lead to type I (false positive) and type II (false negative) errors. These types of errors can result in too much investigation of issues that are not problematic and too little investigation of issues that are.
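
The tradeoff described above can be made concrete with a small sketch. It assumes, purely for illustration, that each event carries a numeric severity score and that its true risk significance is known in hindsight; the `Event` type, the scores, and the sample data are all invented and do not come from any real reporting program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A hypothetical observed event; both fields are illustrative."""
    severity: float          # assumed severity score in [0, 1]
    risk_significant: bool   # assumed ground truth, known only in hindsight


def count_errors(events, threshold):
    """Return (false_positives, false_negatives) for a reporting threshold.

    An event is reported when its severity is at or above the threshold.
    A false positive (type I) is a reported event that was not risk-significant.
    A false negative (type II) is an unreported event that was risk-significant.
    """
    false_positives = sum(
        1 for e in events if e.severity >= threshold and not e.risk_significant
    )
    false_negatives = sum(
        1 for e in events if e.severity < threshold and e.risk_significant
    )
    return false_positives, false_negatives


# Invented sample data, for illustration only.
EVENTS = [
    Event(0.10, False),
    Event(0.20, False),
    Event(0.35, True),
    Event(0.40, False),
    Event(0.60, True),
    Event(0.75, False),
    Event(0.90, True),
]

if __name__ == "__main__":
    for threshold in (0.0, 0.3, 0.5, 0.8):
        fp, fn = count_errors(EVENTS, threshold)
        print(f"threshold={threshold:.1f}  type I={fp}  type II={fn}")
```

With this invented data, raising the threshold from 0.0 to 0.8 cuts the type I count from 4 to 0 while the type II count rises from 0 to 2, which is the pattern the paragraph describes: no single threshold eliminates both kinds of error.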

The NRC has chosen to limit the use of the term precursor to events that could lead to the meltdown of a nuclear reactor’s core and that exceed a specified level of severity. For instance, a precursor in a nuclear plant could be a complete failure of one safety system or a partial failure of two safety systems. Events of lesser severity, with low conditional probabilities of core damage, are either not considered precursors or not singled out as deserving of further analysis. To compensate for this very specific definition, the NRC and plant licensees also use other avenues for reporting, such as site-run incident-reporting programs, thereby reducing their exposure to potential type II errors.
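
As a rough illustration of a severity cutoff of this kind, the sketch below screens events by an estimated conditional probability of core damage. The field names (`event_id`, `ccdp`), the sample values, and the cutoff are all hypothetical placeholders; this is not the NRC's actual screening procedure.

```python
def screen_events(events, ccdp_cutoff):
    """Split event IDs into (precursors, below_cutoff).

    An event counts as a precursor here when its estimated conditional
    core damage probability ("ccdp", a hypothetical field) is at or
    above the cutoff. Everything else falls below the cutoff.
    """
    precursors = []
    below_cutoff = []
    for event in events:
        if event["ccdp"] >= ccdp_cutoff:
            precursors.append(event["event_id"])
        else:
            below_cutoff.append(event["event_id"])
    return precursors, below_cutoff


# Invented sample data and cutoff, for illustration only.
SAMPLE_EVENTS = [
    {"event_id": "A", "ccdp": 2e-4},
    {"event_id": "B", "ccdp": 3e-7},
    {"event_id": "C", "ccdp": 5e-6},
]
HYPOTHETICAL_CUTOFF = 1e-6

if __name__ == "__main__":
    precursors, below = screen_events(SAMPLE_EVENTS, HYPOTHETICAL_CUTOFF)
    print("precursors:", precursors)
    print("left to other reporting avenues:", below)
```

Events that fall below the cutoff are not discarded in this sketch; they are returned separately, mirroring the idea that other avenues such as site-run incident-reporting programs can still capture them.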

In contrast, the Veterans Administration’s (VA’s) Patient Safety Reporting System allows the reporting of any safety-related event, including serious injuries and close calls, but also lessons learned and ideas for safety improvements. In implementing such a system, type I errors can prove problematic because precursor program managers may face a deluge of reports ranging from the very serious to quite benign.

Perceptions of precursor reporting

Is a large number of precursor reports indicative of a safe or unsafe system? There is no simple answer to this question. Organizations and government agencies must be particularly careful in making pronouncements that safety has increased or worsened based on a change in the number of precursor reports.

In some cases, an increase in precursor reports can indicate a greater concern with safety, because it suggests that employees are actively looking for flaws in the system. For example, during the inception of a voluntary reporting system, a rise in the number of reports suggests that employees are participating in the program. Such an increase was observed in the NASA-FAA ASRS program, which was established in 1976 and experienced a 10-fold increase in reporting between 1983 and 1991, even though accident rates remained relatively constant.

In other cases, a large number of reports may indicate an unsafe system, particularly in systems where precursors are detected automatically (for example, with an electronic surveillance system). In such systems, a decrease in reported precursor events is indicative of safer operations. For example, the rail industry automatically monitors “signals passed at danger,” such as when a train passes a red signal. In this system, reported events are a clear and unambiguous departure from safe operation. As such, any increase in signals passed at danger over time indicates a less safe system.

In general, vigilance should be maintained whether few or many precursors are observed. When few precursors are observed, organizations must question whether they are actively soliciting and identifying relevant signals of potential danger. If many precursors are observed, organizations must determine whether they are giving appropriate attention to these reported events. For instance, in some past accidents, including the losses of the two space shuttles, repeated precursors without an accident led to the perception that the system was more robust than it actually was. This phenomenon, termed “the normalization of deviance” by Diane Vaughan, professor of sociology at Boston College, can result in implicit acceptance of higher and higher risks over time.

In either situation, whether only a few or many precursors are observed, the events can create a compelling case for resolving significant safety threats. Jim Bagian, director of the VA’s National Center for Patient Safety, astutely summarized why organizations should not overly focus on reporting numbers but rather undertake follow-up studies on ways to reduce risks, when he noted that, “In the end, success is not about counting reports. It is about identifying vulnerabilities and precursors to problems and then formulating and implementing corrective actions. Analysis and actions are the keys, and success is manifested by changes in the culture and the workplace.”

Roles for government and industry

Engaging organizations in reporting and learning from accident precursors is a valuable aspect of safety management. Maintaining safety is an ongoing dynamic process that does not stop once a technology has been designed, built, or deployed. Indicators of future problems can and do arise despite the best engineering practices, strict adherence to standards, and ongoing maintenance. There is thus a strong need for mechanisms to capture and benefit from these indicators, with precursor programs being formal approaches to using signals successfully.

Government and industry must work to define new precursor programs and make ongoing improvements in existing ones. Government agencies, especially those overseeing high-hazard industries, must play a role in facilitating an informed dialogue with industry stakeholders to ensure that precursor information is successfully used and managed to maintain and improve system safety. Numerous issues, including reporter indemnity and the sharing of risk-related information between the private sector and the government, require government input. At the same time, the private sector must embrace precursor management as one vital approach in the ongoing pursuit of system safety. Indeed, it is the responsibility of the private sector to be an engaged partner with government to help ensure that precursor programs are defined, implemented, and successfully managed on an ongoing basis.


James Phimister was project director and Vicki M. Bier and Howard C. Kunreuther served as co-chairs of the National Academy of Engineering committee that produced the report Accident Precursor Analysis and Management: Reducing Technological Risk through Diligence. Bier is a professor in the Department of Industrial and Systems Engineering and director of the Center for Human Performance and Risk Analysis at the University of Wisconsin, Madison. Kunreuther is the Cecilia Yen Koo Professor and co-director of the Risk Management and Decision Processes Center of the Wharton School at the University of Pennsylvania.