History Can Help Us Chart AI’s Future
Current technical approaches to preventing harm from artificial intelligence and machine learning largely focus on bias in training data and careless (even malicious) misuse. To be sure, these are crucial steps, but they are not sufficient solutions. Many risks from AI are not simply due to flawed executions of an otherwise sound strategy: AI’s penchant for enabling bias and misinformation is built into its “data-driven” modeling paradigm.
This paradigm forms the foundation of present-day machine learning. It relies on data-intensive pattern recognition techniques that generalize from past examples without direct reference to, or even knowledge about, what is being modeled. In other words, data-driven methods are designed to predict the probable output of processes that they can't describe or explain. That deliberate omission of explanatory models leaves these methods particularly susceptible to misdirection.
Today, this data-intensive, brute-force approach to machine learning has become largely synonymous with artificial intelligence and computational modeling as a whole. Yet history shows that the rise of data-driven machine learning was neither natural nor inevitable. Even machine learning itself was not always so data-centric. Today’s dominant paradigm of data-driven machine learning in key areas such as natural language processing represents what Alfred Spector, then Google’s vice president for research, lauded in 2010 as “almost a 180-degree turn in the established approaches to speech recognition.”
Through its early decades, AI research in the United States fixated on replicating human cognitive faculties, based on an assumption that, as historian Stephanie Dick puts it, "computers and minds were the same kind of thing." The devotion to this human analogy began to change in the 1970s with a highly unorthodox "statistical approach" to speech recognition at IBM. In a stark departure from the established "knowledge-based" approaches of the period, IBM researchers abandoned elaborate formal representations of linguistic knowledge and instead used statistical pattern recognition techniques to predict the most likely sequence of words, based on large quantities of sample data. Those very researchers described to me how this work owed much of its success to the unique computing resources available at IBM, where they had access to more computing power than anyone else. Even more importantly, they had access to more training data in a period when digitized text was vanishingly scarce by today's standards. During a federal antitrust case against the company from 1969 to 1982, IBM had digitized over 100,000 pages of witness testimony, employing a warehouse full of keypunch operators to manually encode the text onto Hollerith punched cards. This material was repurposed into a training corpus of unprecedented size for the period, at around 100 million words.
What resulted was an abandonment of knowledge-based approaches aimed at simulating human decision processes in favor of data-driven approaches aimed solely at predicting their output. This signaled a fundamental reimagining of the relation between human and machine intelligence. Director of IBM’s Continuous Speech Recognition group Fred Jelinek described their approach in 1987 as “the natural way for the machine,” quipping that “if a machine has to fly, it does so as an airplane does—not by flapping its wings.”
The success of this approach directly triggered a shift to data-driven approaches across natural language processing as well as machine vision, bioinformatics, and other domains. In 2009, top Google researchers pointed to the earlier success of the statistical approach to speech recognition as proof that "invariably, simple models and a lot of data trump more elaborate models based on less data."
Framing machine intelligence as something fundamentally distinct from, if not antithetical to, human understanding set a powerful precedent for replacing expert knowledge with data-driven approximation in computational modeling. Generative AI takes this logic a crucial step further, using data not only to model the world, but to actively remake it.
Large language models are both ignorant of and indifferent to the substance of the statements they generate; they gauge only how likely a sequence of text is to appear. If the results pushed to our social media feeds are decided by algorithms intentionally designed only to predict patterns, not to understand them, can the flourishing of misinformation really come as such a surprise?
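This likelihood-only logic can be made concrete with a toy sketch. The miniature "model" below (the corpus and function names are invented for illustration, and real language models are vastly more sophisticated) ranks a word's possible continuations purely by how often they appeared after it in training text, with no representation of meaning or truth at all:

```python
from collections import Counter, defaultdict

# A toy statistical language model: it ranks continuations purely by how
# often they followed the previous word in its (invented) training text.
corpus = (
    "the model predicts the next word the model predicts patterns "
    "the model never checks the facts"
).split()

# Count how often each word follows each preceding word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def most_likely_next(word):
    """Return the most frequent continuation seen in training --
    pure likelihood, with no notion of what the words mean."""
    return follows[word].most_common(1)[0][0]

print(most_likely_next("model"))  # -> "predicts"
print(most_likely_next("the"))    # -> "model"
```

The function never evaluates whether a continuation is accurate, only whether it is statistically typical, which is precisely the property that makes pattern prediction indifferent to misinformation.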
A failure to recognize how such problems may be intrinsic to the very logic of data-driven machine learning inspires often-misguided technical fixes, such as increased data collection and tracking, which can lead to harms such as predatory inclusion (in which outwardly democratizing schemes further exploit already marginalized groups). Such approaches are limited because they presume more machine learning to be the best recourse.
But the lens of history helps us break out of this circular thinking. The perpetual expansion of data-driven machine learning should not be seen as a foregone conclusion. Its rise to prominence was embedded in certain assumptions and priorities that became entrenched in its technical framework and normalized over time. Instead of defaulting to tactics that augment machine learning, we need to consider that in some circumstances the very logic of machine learning might be fundamentally unsuitable to our aims.