Flaws In Forensic Science
The Limits of the Polygraph
The time has come to be truthful about its reliability and usefulness.
Developed almost a century ago, the polygraph is the most famous in a long line of techniques that have been used for detecting deception and determining truth. Indeed, for many in the U.S. law enforcement and intelligence communities, it has become the most valued method for identifying criminals, spies, and saboteurs when direct evidence is lacking. Advocates of its use can plausibly claim that the polygraph has a basis in modern science, because it relies on measures of physiological processes. Yet advocates have repeatedly failed to build any strong scientific justification for its use. Despite this, the polygraph is finding new forensic and quasi-forensic applications in areas where the scientific base is even weaker than it is for the traditional use in criminal trials. This is very troubling, because these new uses are based on overconfidence in the test’s accuracy.
In recent years, and especially since the 2001 terrorist attacks, the U.S. public seems to have become far more willing to believe that modern technology can detect evildoers with precision and before they can do damage. This belief is promulgated in numerous television dramas that portray polygraph tests and other detection technologies as accurately revealing hidden truths about everything from whether a suitor is lying to prospective parents-in-law to which of many possible suspects has committed some hideous crime. Unfortunately, the best available technologies do not perform nearly as well as people would like or as television programs suggest. This situation is unlikely to change any time soon.
Although there is growing pressure from some constituencies to expand the use of polygraph testing in forensic and other public contexts, it would be far wiser for law enforcement and security agencies to minimize use of the tests and to find strategies for reducing threats to public safety and national security that rely as little as possible on the polygraph. Courts that are skeptical about the validity of polygraph evidence are well justified in their attitude.
An unsuccessful attempt to introduce a polygraph test in a District of Columbia murder case in the 1920s led to a famous court decision. A trial judge’s refusal to allow the testimony of William Moulton Marston, who while a graduate student at Harvard had experimented with a method for detecting deception by measuring systolic blood pressure, was appealed. In the 1923 case of Frye v. United States, the circuit court affirmed the trial judge’s ruling, stating that, “while courts will go a long way in admitting expert testimony deduced from a well-organized scientific principle or discovery, the thing from which the deduction is made must be sufficiently established to have gained general acceptance in the particular field in which it belongs . . . We think the systolic blood pressure deception test has not yet gained such standing.”
The Frye “general acceptance” test became the dominant rule governing the admissibility of scientific expert testimony for the next 70 years. Most courts refused to admit testimony about polygraph evidence, often with reference to Frye. (Marston, by the way, became prominent not only as a polygraph advocate but as the creator, in 1940, of the first female comic book action hero: Wonder Woman, who was known for the special powers of her equipment, including a magic lasso that “was unbreakable, infinitely stretchable, and could make all who are encircled in it tell the truth.”)
In 1993, in the Daubert case, the Supreme Court outlined the current test for the admissibility of scientific evidence in the federal courts. The Daubert test, codified in the Federal Rules of Evidence in 2000, requires trial court judges to act as gatekeepers and evaluate whether the basis for proffered scientific, technical, or other specialized knowledge is reliable and valid. Although Daubert replaced the general acceptance test of Frye, many states, including New York, California, Illinois, and Florida, continue to use Frye. Increasingly, however, courts in Frye jurisdictions are applying a hybrid test that incorporates much of the Daubert thinking. This thinking is consistent with the belief of most scientists that hypotheses gain strength from having survived rigorous testing.
Despite the consistency in basic outlook that evidence such as polygraph tests must be evaluated on the basis of its scientific merit, actual court decisions regarding polygraph use vary widely. In general, courts look at the admissibility of polygraph test results in several ways. Many courts, especially state courts, maintain a per se rule excluding polygraph evidence. They do so for reasons ranging from doubt about its scientific merit to concerns that its use would usurp the traditional jury function of assessing credibility. However, a significant number of jurisdictions that otherwise exclude polygraph evidence under a per se rule nonetheless allow the parties to stipulate to the admissibility of the evidence before the test is administered. These courts typically set requirements on matters such as the qualifications of the polygraph examiners and the conditions under which the tests are to be given. It is presumed that the stipulation makes the examinee take the test more seriously and leads to the selection of more impartial polygraph examiners, both factors that produce more accurate results. These assumptions have some commonsense appeal, but they are unsupported by research and don’t address whether the accuracy and reliability of neutral polygraph examinations are sufficient to permit them as evidence.
There is a troubling aspect to the practice of permitting the parties to stipulate to polygraph admissibility. Ordinarily, judges determine the existence of preliminary facts that are necessary to the admission of proffered evidence. That the parties are willing to stipulate to the admissibility of polygraph results should not free the judge from making the preliminary determination of validity. To be sure, parties regularly stipulate to the admissibility of evidence. But polygraph evidence is unique in that the stipulation occurs before the evidence—the polygraph result—exists. Because of the error rates of polygraph tests, courts should be reluctant to endorse stipulations that amount to little more than a calculated gamble.
Since Daubert, the biggest change in form, if not substance, in regard to polygraphs is the increased number of federal courts that articulate a discretionary standard for determining admissibility. The Ninth Circuit Court held in United States v. Cordoba that Daubert requires trial courts to evaluate polygraph evidence with particularity in each case. This decision does not appear, however, to have substantially changed the practice of excluding polygraph evidence. Federal courts still invariably exclude such evidence under Cordoba, pointing to high error rates and the lack of standards for administering polygraphs. Rule 403 of the Federal Rules, which provides for the exclusion of otherwise admissible evidence when its probative value is substantially outweighed by unfair prejudice, plays a prominent part in leading courts to exclude polygraph evidence. Courts regularly cite Rule 403 when noting the danger that polygraphs will infringe on the jury’s role in making credibility judgments, confuse the jury, or waste the court’s time.
Many jurisdictions outside of the purview of the Federal Rules now employ discretionary admittance tests. Possibly the most permissive jurisdiction is New Mexico, with its law that “entrusts the admissibility of polygraph evidence to the sound discretion of the trial court.” In Massachusetts, a Daubert state, trial courts have similar discretion to admit polygraph evidence, although with a significant caveat. The Massachusetts Supreme Judicial Court held that polygraph evidence is admissible only after the proponent introduces results of proficiency exams that indicate the examiner’s reliability.
Two main constitutional issues have arisen in courts’ decisions about admitting polygraph test results as evidence: the claim that excluding exculpatory polygraph results violates a defendant’s Sixth Amendment right to present evidence, and the claim that admission of inculpatory polygraph results violates a defendant’s Fifth and Fourteenth Amendment rights to due process. In general, courts have steered clear of the minutiae of polygraph research and have treated reservations regarding polygraph accuracy as not rising to constitutional dimensions. For example, in United States v. Scheffer in 1998, the Supreme Court upheld a military court rule that per se excludes polygraph evidence. The court said that exclusionary rules “do not infringe the rights of the accused to present a defense as long as they are not arbitrary or disproportionate to the purposes they are designed to serve.” According to the court, the per se rule has the aim of keeping unreliable evidence from the jury: The government’s conclusion that polygraphs were not sufficiently reliable was supported by the fact that “to this day, the scientific community remains extremely polarized about reliability of polygraph techniques.”
Constitutional questions also arise when defendants claim that admission of inculpatory polygraph results violates due process principles. Once again, courts generally find that the evidentiary standards applicable to polygraphs meet constitutional requirements. Courts have held, however, that the Fifth Amendment privilege against self-incrimination applies to the taking of a polygraph, and thus a defendant’s refusal to do so cannot be used against him or her. Moreover, courts carefully evaluate the waiver of a defendant’s right to counsel or right to remain silent in regard to stipulation agreements concerning polygraph examinations.
The perils of ambiguity
In the wake of controversy over allegations of espionage by Wen Ho Lee, a nuclear scientist at the Department of Energy’s Los Alamos National Laboratory, the department ordered that polygraph tests be given to scientists working in similar positions. Soon thereafter, at the request of Congress, the department asked the National Research Council (NRC) to conduct a thorough study of polygraph testing’s ability to distinguish accurately between lying and truth-telling across a variety of settings and examinees, even in the face of countermeasures that may be employed to defeat the test. Although the NRC was asked to focus on uses of the polygraph for personnel security screening, it examined all available evidence on polygraph test validity, almost all of which comes from studies of specific-event investigations.
The validity of polygraph testing depends in part on the purpose for which it is used. When it is used for investigation of a specific event, such as after a crime, it is possible to ask questions that have little ambiguity, such as “Did you see the victim on Monday?” Thus it is clear what counts as a truthful answer. When used for screening, such as to detect spies or members of a terrorist cell, there is no known specific event being investigated, so the questions must be generic, such as “Did you ever reveal classified information to an unauthorized person?” It may not be clear to the examinee or the examiner whether a particular activity justifies a “yes” answer, so examinees may believe that they are lying when providing factually truthful responses, or vice versa. Such ambiguity necessarily reduces the test’s accuracy. Validity is further compromised when tests are used for what might be called prospective screening (for example, with people believed to be risks for future illegal activity), because such uses involve making inferences about future behavior on the basis of information about past behaviors that may be quite different. For example, does visiting a pornographic Web site or lying about such activity on a polygraph test predict future sex offending?
These concerns bear directly on the question the NRC study was asked to address: the validity of polygraph testing, that is, its ability to distinguish accurately between lying and truth-telling.
The NRC study, completed in 2003, examined the basic science underlying the physiological measures used in polygraph testing and the available evidence on polygraph accuracy in actual and simulated investigations. With respect to the basic science, the study concluded that although psychological states associated with deception, such as fear of being accurately judged as deceptive, do tend to affect the physiological responses that the polygraph measures, many other factors, such as anxiety about being tested, also affect those responses. Such phenomena make polygraph testing intrinsically susceptible to producing erroneous results.
To assess test accuracy, the committee sought all available published and unpublished studies that could provide relevant evidence. The quality of the studies was low, with few exceptions. Moreover, there are inherent limitations to the research methods. Laboratory studies suffer from lack of realism. In particular, the consequences associated with lying or being judged deceptive in the laboratory almost never mirrored the seriousness of these actions in the real-world settings in which the polygraph is used. Field studies are limited by the difficulty of identifying the truth against which test results should be judged and the lack of control of extraneous factors. Most of the research, in both the laboratory and in the field, does not fully address key potential threats to validity.
The study found that with examinees untrained in countermeasures designed to beat the test, specific-incident polygraph tests “can discriminate lying from truth-telling at rates well above chance, though well below perfection.” It was impossible to give a more precise estimate of polygraph accuracy, because accuracy levels varied widely across studies for reasons that could not be determined from the research reports.
For several reasons, however, estimates of accuracy from these studies are almost certainly higher than the actual polygraph accuracy of specific-incident testing in the field. Laboratory studies tend to overestimate accuracy, because laboratory conditions involve much less variation in test implementation, in the characteristics of examinees, and in the nature and context of investigations than arise in typical field applications. Field studies of polygraph testing are plagued by selection and measurement biases, such as the inclusion of tests carried out by examiners with knowledge of the evidence and of cases whose outcomes are affected by the examination. In addition, they frequently lack a clear and independent determination of truth. Because of these inherent biases, field studies are also highly likely to overestimate real-world polygraph accuracy.
To help inform policy discussions, the committee calculated the performance of polygraph tests with several possible accuracy indexes in hypothetical populations with known proportions of liars and truth-tellers. The committee’s conclusions were supported by beyond-the-best-case analyses that assumed a greater accuracy level than scientific theory or validation research suggested could be consistently achieved by field polygraph tests, even in specific-incident investigations.
The practical implications of any test accuracy level depend on the application for which the test is to be used. Tables 1 and 2 show beyond-the-best-case performance for polygraph tests in two hypothetical applications. In each case, the test is used in two ways. In “suspicious” mode, the test is interpreted strictly enough to correctly identify 80 percent of deceptive examinees; in “friendly” mode, it is interpreted to protect the innocent, so that fewer than half of 1 percent of the innocent examinees “fail.” In each case, we assume that 10,000 tests are given over a period of time.
In Table 1, a security screening application, we assume that only 10 of 10,000 examinees are guilty of a target offense, such as espionage. In the suspicious mode, the test identifies 8 of the 10 spies, but also falsely implicates about 1,598 innocent examinees. Further investigation of all 1,606 people would be needed to find the 8 spies. Someone who “failed” this test would have a 99.5 percent chance of being innocent. In the friendly mode, only about 39 innocent employees would fail the test, but 8 of the 10 spies would “pass” and be allowed to continue doing damage. The committee concluded that for practical security screening applications, polygraph testing is not accurate enough to rely on for detecting deception.
Table 2 summarizes criminal investigation applications in which only suspects are tested, and half of the suspects (5,000 of 10,000) are actually guilty. In the suspicious mode, the test correctly implicates about 4,000 of the guilty but falsely implicates about 800 of the innocent. Thus, almost 17 percent of those who “fail” the test are in fact innocent. In our judgment, “failing” such a polygraph test would leave reasonable doubt about guilt. In the friendly mode, only about 19 of 5,000 innocent people would “fail” the test, but about 4,000 of the 5,000 criminals would “pass.” Of those who “fail,” 98 percent would be guilty, but few criminals would fail.
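The arithmetic behind these two hypothetical applications can be sketched in a few lines of Python. This is only an illustration, not the committee's method: the function name and the rate values are assumptions. The 80 percent detection rate in suspicious mode comes from the text; the other rates (about 16 percent of innocent examinees failing in suspicious mode, about 0.39 percent failing and 20 percent of the guilty detected in friendly mode) are back-calculated from the counts quoted above and rounded.

```python
def polygraph_outcomes(n_tested, n_guilty, sensitivity, false_positive_rate):
    """Expected counts when a test with the given rates is applied to a population.

    sensitivity: fraction of guilty examinees who "fail" the test.
    false_positive_rate: fraction of innocent examinees who "fail" the test.
    """
    n_innocent = n_tested - n_guilty
    guilty_failed = sensitivity * n_guilty
    innocent_failed = false_positive_rate * n_innocent
    total_failed = guilty_failed + innocent_failed
    return {
        "guilty_failed": guilty_failed,
        "guilty_passed": n_guilty - guilty_failed,
        "innocent_failed": innocent_failed,
        "total_failed": total_failed,
        # Chance that someone who "failed" is actually innocent.
        "share_of_failures_innocent": innocent_failed / total_failed,
    }


# ASSUMED rates, back-calculated (and rounded) from the counts in the text.
MODES = {
    "suspicious": {"sensitivity": 0.80, "false_positive_rate": 0.16},
    "friendly": {"sensitivity": 0.20, "false_positive_rate": 0.0039},
}

# The two hypothetical populations described in the text.
POPULATIONS = {
    "security screening": {"n_tested": 10_000, "n_guilty": 10},
    "criminal investigation": {"n_tested": 10_000, "n_guilty": 5_000},
}

for pop_name, pop in POPULATIONS.items():
    for mode_name, rates in MODES.items():
        result = polygraph_outcomes(**pop, **rates)
        print(
            f"{pop_name}, {mode_name} mode: "
            f"{result['guilty_failed']:.0f} guilty fail, "
            f"{result['guilty_passed']:.0f} guilty pass, "
            f"{result['innocent_failed']:.1f} innocent fail, "
            f"{result['share_of_failures_innocent']:.1%} of failures innocent"
        )
```

Because the same error rates are applied to both populations, the only thing that changes between the two applications is the proportion of guilty examinees. That alone moves the share of innocent people among those who "fail" in suspicious mode from roughly one in six to more than 99 in 100. With these rounded rates the friendly-mode counts come out close to, but not exactly at, the figures quoted in the text.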
Reasonable people may disagree about whether a test with these properties is accurate enough to use in a particular law enforcement or national security application. We cannot overemphasize, however, that the scientific evidence is clear that polygraph testing is less accurate than these hypothetical results indicate, even for examinees untrained in countermeasures. In addition, it is impossible to tell from the research how much less accurate the testing is. Accuracy in any particular application depends on factors that remain unknown.
Two justifications are offered for using polygraph testing as an investigative tool. One is based on validity: the idea that test results accurately indicate whether an examinee is telling the truth in responding to particular questions. The other is based on utility: the idea that examinees, because they believe that deception may be revealed by the test, will be deterred from undesired actions that might later be investigated with the polygraph or induced to admit those actions during a polygraph examination. The two justifications are sometimes confused, as when success at eliciting admissions is used to support the claim that the polygraph is a valid scientific technique.
On the basis of field reports and indirect scientific evidence, we believe that polygraph testing is likely to have some utility for deterring security violations, increasing the frequency of admissions of such violations, deterring employment applications from potentially poor security risks, and increasing public confidence in national security organizations. Such utility derives from beliefs about the procedure’s validity, which are distinct from actual validity or accuracy. Polygraph screening programs that yield only a small percentage of positive test results, such as the programs used in the Departments of Energy and Defense, might be useful for deterrence, eliciting admissions, and related purposes. This does not mean that the test results can be relied on to discriminate between lying and truth-telling among people who do not admit to crimes. Most people who lie about committing major security violations would “pass” such a screening test.
Overconfidence in the polygraph—a belief in its accuracy that goes beyond what is justified by the evidence—presents a significant danger to achieving the objectives for which the polygraph is used. In national security applications, overconfidence in polygraph screening can create a false sense of security among policymakers, employees in sensitive positions, and the general public that may in turn lead to inappropriate relaxation of other methods of ensuring security, such as periodic security reinvestigation and vigilance about potential security violations in facilities that use the polygraph for screening. It can waste public resources by devoting to the polygraph funds and energy that would be better spent on alternative procedures. It can lead to unnecessary loss of competent or highly skilled individuals in security organizations because of suspicions cast on them by false positive polygraph exams or because of their fear of such prospects. And it can lead to credible claims that agencies using polygraphs are infringing civil liberties while producing insufficient benefits to national security.
It may be harmless if a television show fails to discriminate between science and science fiction, but it is dangerous when government does not know the difference. In our work conducting the NRC study, we found that many officials in intelligence, counterintelligence, and law enforcement agencies believe that if there are spies, saboteurs, or terrorists working in sensitive positions in the federal government, the polygraph tests currently used for counterintelligence purposes will find most of them. Many such officials also believe that experienced examiners can easily identify people who use countermeasures to try to beat the test. Scientific evidence does not support any of these beliefs; in fact, it goes contrary to all of them.
It can also be dangerous if courts or juries are overconfident about polygraph accuracy. If jurors share the misunderstandings that are common among counterintelligence experts and television writers, they are likely to give undue credence to any polygraph evidence that may be admitted. The dangers are even greater as polygraph testing expands into forensic applications that are not subject to strong challenge in adversarial processes.
New avenues of misuse
Polygraphs and polygraph-like tests are used for a variety of other purposes, ranging from identifying fraudulent insurance claims to verifying the winner of a fishing contest. The use of the polygraph for interrogating foreign nationals in terrorism investigations and to verify information from informants, which has been much in the news recently, often involves tests being given through translators. There is no scientific evidence supporting any of these uses, and the use of translators introduces additional questions about reliability.
Perhaps the most prevalent use of polygraphs that has emerged beyond those in criminal investigation and national security settings has been in post-conviction sex-offender maintenance programs, which are now required in more than 30 states. As part of their probation program in a typical jurisdiction, released sex offenders are required to submit to periodic polygraph examinations. This practice seems to have originated in the 1960s but became widespread only in the past decade or so.
Advocates for this use tout its accuracy in other settings, citing studies claiming between 96 and 98 percent accuracy in correctly identifying deception and suggesting that polygraph accuracy has improved with recent advances in technology. Both claims are inconsistent with the general evidence on polygraph accuracy. Regarding actual sex offending, we have been unable to locate a single controlled randomized trial or field trial in connection with polygraph testing with anything approaching credibility. Instead of developing serious validation efforts, the American Polygraph Association has acted as if all that is required is uniformity of process. The view, according to J. E. Consigli, who presented the association position in the Handbook of Polygraph Testing, is that “the utility use of polygraphs to elicit admissions and break through denial in sex offenders has demonstrated its necessity.”
Although the use of the polygraph to screen sex offenders may have utility for eliciting admissions, it is important to note the wide discretion given to polygraph examiners. There is a lack of uniformity in the types of polygraph examinations used in various jurisdictions, in terms both of questions used and of format. In terms of accuracy, there is no evidence that a “failed” polygraph test is an accurate indicator of concealed sex crimes. This is a particular concern because polygraph tests given in such settings often revolve around test questions that emphasize “sexually deviant” or “high-risk” behavior, such as the use of alcohol or drugs, sexual activity with a consenting adult, or “masturbation to deviant fantasy,” rather than on the detection of actual sex crimes or other violations of the terms of parole. Such testing is based on the presumption that legal but “undesirable” behaviors are indicators of illegal activity. Claims that polygraph testing is an effective and important management or treatment tool that lowers sexual and general criminal recidivism during supervision and treatment have no credible scientific basis.
Courts have been relatively permissive about the use of polygraphs in probation programs, viewing the evidentiary standards in such settings as different from those associated with the courts themselves. For example, in State v. Travis, the court found that, although the defendant’s agreement to a condition of probation requiring him to submit to a polygraph examination did not establish the admissibility of the results, it could still be used as grounds for the revocation of parole because he was uncooperative and resisted supervision.
The use of polygraph testing in a variety of settings has clearly proved to be extremely problematic. What, then, should be done? At a minimum, we need to continue to be wary about the claimed validity of the polygraph as a scientific tool, especially with regard to its current forensic uses. We believe that the courts have been justified in casting a skeptical eye on the relevance and suitability of polygraph test results as legal evidence. Generalizing from the available scientific evidence to the circumstances of a particular polygraph examination is fraught with difficulty. Further, the courts should extend their reluctance to rely on the polygraph to the many quasi-forensic uses that are emerging, such as the sex offender management programs. The courts and the legal system should not act as if there is a scientific basis for many, if any, of these uses.
John E. Consigli, “Post-Conviction Sex Offender Testing and the American Polygraph Association,” chapter 8 in The Handbook of Polygraph Testing, ed. M. Kleiner (San Diego, Calif.: Academic Press, 2001), 237–250.
Theodore P. Cross and Leonard Saxe, “A Critique of the Validity of Polygraph Testing in Child Sexual Abuse Cases,” Journal of Child Sexual Abuse 1 (1992): 19–33.
Theodore P. Cross and Leonard Saxe, “Polygraph Testing and Sexual Abuse: The Lure of the Magic Lasso,” Child Maltreatment 6 (2001): 195–206.
Kim English, Linda Jones, Diane Pasini-Hill, Diane Patrick, and Sydney Cooley-Towell, The Value of Polygraph Testing in Sex Offender Management (research report submitted to the National Institute of Justice, Washington, D.C., December 2000).
David L. Faigman, David H. Kaye, Michael J. Saks, and Joseph Sanders, eds., Modern Scientific Evidence: The Law and the Science of Expert Testimony (St. Paul: West Group, 2002).
William M. Marston, The Lie Detector Test (New York: Richard R. Smith, 1938).
David L. Faigman is a professor at the Hastings College of the Law, University of California, San Francisco. Stephen E. Fienberg is Maurice Falk University Professor of Statistics and Social Science at Carnegie Mellon University. Paul C. Stern is a principal staff officer at the National Research Council. Faigman was a committee member, Fienberg was committee chair, and Stern was study director for the National Research Council study The Polygraph and Lie Detection (2003).