The Limits of the Polygraph

Flaws In Forensic Science

DAVID L. FAIGMAN

STEPHEN E. FIENBERG

PAUL C. STERN

The time has come to be truthful about its reliability and usefulness.

Developed almost a century ago, the polygraph is the most famous in a long line of techniques that have been used for detecting deception and determining truth. Indeed, for many in the U.S. law enforcement and intelligence communities, it has become the most valued method for identifying criminals, spies, and saboteurs when direct evidence is lacking. Advocates of its use can plausibly claim that the polygraph has a basis in modern science, because it relies on measures of physiological processes. Yet advocates have repeatedly failed to build any strong scientific justification for its use. Despite this, the polygraph is finding new forensic and quasi-forensic applications in areas where the scientific base is even weaker than it is for the traditional use in criminal trials. This is very troubling, because these new uses are based on overconfidence in the test’s accuracy.

In recent years, and especially since the 2001 terrorist attacks, the U.S. public seems to have become far more willing to believe that modern technology can detect evildoers with precision and before they can do damage. This belief is promulgated in numerous television dramas that portray polygraph tests and other detection technologies as accurately revealing hidden truths about everything from whether a suitor is lying to prospective parents-in-law to which of many possible suspects has committed some hideous crime. Unfortunately, the best available technologies do not perform nearly as well as people would like or as television programs suggest. This situation is unlikely to change any time soon.

Although there is growing pressure from some constituencies to expand the use of polygraph testing in forensic and other public contexts, it would be far wiser for law enforcement and security agencies to minimize use of the tests and to find strategies for reducing threats to public safety and national security that rely as little as possible on the polygraph. Courts that are skeptical about the validity of polygraph evidence are well justified in their attitude.

Legal precedents

An unsuccessful attempt to introduce a polygraph test in a District of Columbia murder case in the 1920s led to a famous court decision. A trial judge’s refusal to allow the testimony of William Moulton Marston, who while a graduate student at Harvard had experimented with a method for detecting deception by measuring systolic blood pressure, was appealed. In the 1923 case of Frye v. United States, the circuit court affirmed the trial judge’s ruling, stating that, “while courts will go a long way in admitting expert testimony deduced from a well-organized scientific principle or discovery, the thing from which the deduction is made must be sufficiently established to have gained general acceptance in the particular field in which it belongs . . . We think the systolic blood pressure deception test has not yet gained such standing.”

The Frye “general acceptance” test became the dominant rule governing the admissibility of scientific expert testimony for the next 70 years. Most courts refused to admit testimony about polygraph evidence, often with reference to Frye. (Marston, by the way, became prominent not only as a polygraph advocate but as the creator, in 1940, of the first female comic book action hero: Wonder Woman, who was known for the special powers of her equipment, including a magic lasso that “was unbreakable, infinitely stretchable, and could make all who are encircled in it tell the truth.”)

In 1993, in the Daubert case, the Supreme Court outlined the current test for the admissibility of scientific evidence in the federal courts. The Daubert test, codified in the Federal Rules of Evidence in 2000, requires trial court judges to act as gatekeepers and evaluate whether the basis for proffered scientific, technical, or other specialized knowledge is reliable and valid. Although Daubert replaced the general acceptance test of Frye, many states, including New York, California, Illinois, and Florida, continue to use Frye. Increasingly, however, courts in Frye jurisdictions are applying a hybrid test that incorporates much of the Daubert thinking. This thinking is consistent with the belief of most scientists that hypotheses gain strength from having survived rigorous testing.

Despite the consistency in basic outlook that evidence such as polygraph tests must be evaluated on the basis of its scientific merit, actual court decisions regarding polygraph use vary widely. In general, courts look at the admissibility of polygraph test results in several ways. Many courts, especially state courts, maintain a per se rule excluding polygraph evidence. They do so for reasons ranging from doubt about its scientific merit to concerns that its use would usurp the traditional jury function of assessing credibility. However, a significant number of jurisdictions that otherwise exclude polygraph evidence under a per se rule nonetheless allow the parties to stipulate to the admissibility of the evidence before the test is administered. These courts typically set requirements on matters such as the qualifications of the polygraph examiners and the conditions under which the tests are to be given. It is presumed that the stipulation makes the examinee take the test more seriously and leads to the selection of more impartial polygraph examiners, both factors that produce more accurate results. These assumptions have some commonsense appeal, but they are unsupported by research and don’t address whether the accuracy and reliability of neutral polygraph examinations are sufficient to permit them as evidence.

There is a troubling aspect to the practice of permitting the parties to stipulate to polygraph admissibility. Ordinarily, judges determine the existence of preliminary facts that are necessary to the admission of proffered evidence. That the parties are willing to stipulate to the admissibility of polygraph results should not free the judge from making the preliminary determination of validity. To be sure, parties regularly stipulate to the admissibility of evidence. But polygraph evidence is unique in that the stipulation occurs before the evidence—the polygraph result—exists. Because of the error rates of polygraph tests, courts should be reluctant to endorse stipulations that amount to little more than a calculated gamble.

Since Daubert, the biggest change in form, if not substance, in regard to polygraphs is the increased number of federal courts that articulate a discretionary standard for determining admissibility. The Ninth Circuit Court held in United States v. Cordoba that Daubert requires trial courts to evaluate polygraph evidence with particularity in each case. This decision does not appear, however, to have substantially changed the practice of excluding polygraph evidence. Federal courts still invariably exclude such evidence under Cordoba, pointing to high error rates and the lack of standards for administering polygraphs. Rule 403 of the Federal Rules, which provides for the exclusion of otherwise admissible evidence when its probative value is substantially outweighed by unfair prejudice, plays a prominent part in leading courts to exclude polygraph evidence. Courts regularly cite Rule 403 when noting the danger that polygraphs will infringe on the jury’s role in making credibility judgments, confuse the jury, or waste the court’s time.

Many jurisdictions outside of the purview of the Federal Rules now employ discretionary admittance tests. Possibly the most permissive jurisdiction is New Mexico, with its law that “entrusts the admissibility of polygraph evidence to the sound discretion of the trial court.” In Massachusetts, a Daubert state, trial courts have similar discretion to admit polygraph evidence, although with a significant caveat. The Massachusetts Supreme Judicial Court held that polygraph evidence is admissible only after the proponent introduces results of proficiency exams that indicate the examiner’s reliability.

Two main constitutional issues have arisen in courts’ decisions about admitting polygraph test results as evidence: the claim that excluding exculpatory polygraph results violates a defendant’s Sixth Amendment right to present evidence, and the claim that admission of inculpatory polygraph results violates a defendant’s Fifth and Fourteenth Amendment rights to due process. In general, courts have steered clear of the minutiae of polygraph research and have treated reservations regarding polygraph accuracy as not rising to constitutional dimensions. For example, in United States v. Scheffer in 1998, the Supreme Court upheld a military court rule that per se excludes polygraph evidence. The court said that exclusionary rules “do not infringe the rights of the accused to present a defense as long as they are not arbitrary or disproportionate to the purposes they are designed to serve.” According to the court, the per se rule has the aim of keeping unreliable evidence from the jury: The government’s conclusion that polygraphs were not sufficiently reliable was supported by the fact that “to this day, the scientific community remains extremely polarized about reliability of polygraph techniques.”

Constitutional questions also arise when defendants claim that admission of inculpatory polygraph results violates due process principles. Once again, courts generally find that the evidentiary standards applicable to polygraphs meet constitutional requirements. Courts have held, however, that the Fifth Amendment privilege against self-incrimination applies to the taking of a polygraph, and thus a defendant’s refusal to do so cannot be used against him or her. Moreover, courts carefully evaluate the waiver of a defendant’s right to counsel or right to remain silent in regard to stipulation agreements concerning polygraph examinations.

The perils of ambiguity

In the wake of controversy over allegations of espionage by Wen Ho Lee, a nuclear scientist at the Department of Energy’s Los Alamos National Laboratory, the department ordered that polygraph tests be given to scientists working in similar positions. Soon thereafter, at the request of Congress, the department asked the National Research Council (NRC) to conduct a thorough study of polygraph testing’s ability to distinguish accurately between lying and truth-telling across a variety of settings and examinees, even in the face of countermeasures that may be employed to defeat the test. Although the NRC was asked to focus on uses of the polygraph for personnel security screening, it examined all available evidence on polygraph test validity, almost all of which comes from studies of specific-event investigations.

The validity of polygraph testing depends in part on the purpose for which it is used. When it is used for investigation of a specific event, such as after a crime, it is possible to ask questions that have little ambiguity, such as “Did you see the victim on Monday?” Thus it is clear what counts as a truthful answer. When used for screening, such as to detect spies or members of a terrorist cell, there is no known specific event being investigated, so the questions must be generic, such as “Did you ever reveal classified information to an unauthorized person?” It may not be clear to the examinee or the examiner whether a particular activity justifies a “yes” answer, so examinees may believe that they are lying when providing factually truthful responses, or vice versa. Such ambiguity necessarily reduces the test’s accuracy. Validity is further compromised when tests are used for what might be called prospective screening (for example, with people believed to be risks for future illegal activity), because such uses involve making inferences about future behavior on the basis of information about past behaviors that may be quite different. For example, does visiting a pornographic Web site or lying about such activity on a polygraph test predict future sex offending?

The NRC study, completed in 2003, examined the basic science underlying the physiological measures used in polygraph testing and the available evidence on polygraph accuracy in actual and simulated investigations. With respect to the basic science, the study concluded that although psychological states associated with deception, such as fear of being accurately judged as deceptive, do tend to affect the physiological responses that the polygraph measures, many other factors, such as anxiety about being tested, also affect those responses. Such phenomena make polygraph testing intrinsically susceptible to producing erroneous results.

To assess test accuracy, the committee sought all available published and unpublished studies that could provide relevant evidence. The quality of the studies was low, with few exceptions. Moreover, there are inherent limitations to the research methods. Laboratory studies suffer from lack of realism. In particular, the consequences associated with lying or being judged deceptive in the laboratory almost never mirrored the seriousness of these actions in the real-world settings in which the polygraph is used. Field studies are limited by the difficulty of identifying the truth against which test results should be judged and the lack of control of extraneous factors. Most of the research, both in the laboratory and in the field, does not fully address key potential threats to validity.

The study found that with examinees untrained in countermeasures designed to beat the test, specific-incident polygraph tests “can discriminate lying from truth-telling at rates well above chance, though well below perfection.” It was impossible to give a more precise estimate of polygraph accuracy, because accuracy levels varied widely across studies for reasons that could not be determined from the research reports.

For several reasons, however, estimates of accuracy from these studies are almost certainly higher than the actual polygraph accuracy of specific-incident testing in the field. Laboratory studies tend to overestimate accuracy, because laboratory conditions involve much less variation in test implementation, in the characteristics of examinees, and in the nature and context of investigations than arise in typical field applications. Field studies of polygraph testing are plagued by selection and measurement biases, such as the inclusion of tests carried out by examiners with knowledge of the evidence and of cases whose outcomes are affected by the examination. In addition, they frequently lack a clear and independent determination of truth. Because of these inherent biases, field studies are also highly likely to overestimate real-world polygraph accuracy.

To help inform policy discussions, the committee calculated the performance of polygraph tests with several possible accuracy indexes in hypothetical populations with known proportions of liars and truth-tellers. The committee’s conclusions were supported by beyond-the-best-case analyses that assumed a greater accuracy level than scientific theory or validation research suggested could be consistently achieved by field polygraph tests, even in specific-incident investigations.

The practical implications of any test accuracy level depend on the application for which the test is to be used. Table 1 shows beyond-the-best-case performance for polygraph tests in two hypothetical applications. In each case, the test is used in two ways. In “suspicious” mode, the test is interpreted strictly enough to correctly identify 80 percent of deceptive examinees; in “friendly” mode, it is interpreted to protect the innocent, so that fewer than half of 1 percent of the innocent examinees “fail.” In each case, we assume that 10,000 tests are given over a period of time.
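
The arithmetic behind a comparison of this kind can be sketched in a few lines of Python. The 10,000 tests, the 80 percent detection rate in suspicious mode, and the ceiling of half of 1 percent on innocent examinees who fail in friendly mode come from the text. The number of deceptive examinees, the false positive rate in suspicious mode, and the detection rate in friendly mode are placeholder assumptions chosen only for illustration; they are not the values in Table 1, and the function and class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScreeningOutcome:
    """Expected counts when a test is applied to a population of known makeup."""
    true_positives: float   # deceptive examinees who fail the test
    false_negatives: float  # deceptive examinees who pass the test
    false_positives: float  # truthful examinees who fail the test
    true_negatives: float   # truthful examinees who pass the test

    @property
    def deceptive_share_of_failures(self) -> float:
        """Fraction of all failed tests that belong to deceptive examinees."""
        failed = self.true_positives + self.false_positives
        return self.true_positives / failed if failed else 0.0


def screening_outcome(n_tests: int, n_deceptive: int,
                      sensitivity: float,
                      false_positive_rate: float) -> ScreeningOutcome:
    """Split a tested population into the four expected outcome counts."""
    n_truthful = n_tests - n_deceptive
    tp = sensitivity * n_deceptive
    fp = false_positive_rate * n_truthful
    return ScreeningOutcome(tp, n_deceptive - tp, fp, n_truthful - fp)


N_TESTS = 10_000        # from the text
N_DECEPTIVE = 10        # ASSUMPTION: illustrative base rate, not from the text

# "Suspicious" mode: 80 percent detection is from the text;
# the 15 percent false positive rate is an ASSUMPTION.
suspicious = screening_outcome(N_TESTS, N_DECEPTIVE,
                               sensitivity=0.80, false_positive_rate=0.15)

# "Friendly" mode: the 0.5 percent false positive ceiling is from the text;
# the 20 percent detection rate is an ASSUMPTION.
friendly = screening_outcome(N_TESTS, N_DECEPTIVE,
                             sensitivity=0.20, false_positive_rate=0.005)

for name, result in (("suspicious", suspicious), ("friendly", friendly)):
    print(f"{name}: caught {result.true_positives:.1f}, "
          f"missed {result.false_negatives:.1f}, "
          f"falsely accused {result.false_positives:.1f}, "
          f"deceptive share of failures "
          f"{result.deceptive_share_of_failures:.3f}")
```

Under these assumed inputs, the sketch illustrates the tradeoff the two modes represent: when deceptive examinees are rare, a strict interpretation flags many truthful people for each deceptive one it catches, whereas a lenient interpretation spares the truthful at the cost of missing most of the deceptive. Changing `N_DECEPTIVE` shows how strongly the result depends on the proportion of liars in the tested population.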