AI Lacks Ethics Checks for Human Experimentation
Following Nazi medical experiments in World War II and outrage over the US Public Health Service’s four-decade-long Tuskegee syphilis study, bioethicists laid out frameworks, such as the 1947 Nuremberg Code and the 1979 Belmont Report, to regulate medical experimentation on human subjects. Today social media—and, increasingly, generative artificial intelligence—are constantly experimenting on human subjects, but without institutional checks to prevent harm.
In fact, over the last two decades, individuals have become so used to being part of large-scale testing that society has essentially been configured to produce human laboratories for AI. Examples include experiments with biometric and payment systems in refugee camps (designed to investigate use cases for blockchain applications), urban living labs where families are offered rent-free housing in exchange for serving as human subjects in a permanent marketing and branding experiment, and a mobile money research and development program in which mobile providers offer their African consumers to firms looking to test new biometric and fintech applications. Originally put forward as a simpler way to test applications, the convention of software as "continual beta" rather than discrete releases has enabled business models that depend on the creation of laboratory populations whose use of the software is observed in real time.
This experimentation on human populations has become normalized, and forms of AI experimentation are touted as a route to economic development. The Digital Europe Programme launched AI testing and experimentation facilities in 2023 to support what the program calls "regulatory sandboxes," where populations will interact with AI deployments in order to produce information for regulators on harms and benefits. The goal is to allow some forms of real-world testing for smaller tech companies "without undue pressure from industry giants." It is unclear, however, what can pressure the giants and what constitutes a meaningful sandbox for generative AI; given that the technology is already being incorporated into the base layers of applications we would be hard-pressed to avoid, the boundary between the sandbox and the world is difficult to draw.
Generative AI is an extreme case of unregulated experimentation-as-innovation, with no formal mechanism for considering potential harms. These experiments are already producing unforeseen ruptures in professional practice and knowledge: students are using ChatGPT to cheat on exams, and lawyers are filing AI-drafted briefs with fabricated case citations. Generative AI also undermines the public’s grip on the notion of “ground truth” by hallucinating false information in subtle and unpredictable ways.
These two breakdowns constitute an abrupt removal of what philosopher Regina Rini has termed "the epistemic backstop," that is, the benchmark for considering something real. Generative AI subverts the information-seeking practices that professional domains such as law, policy, and medicine rely on; it also corrupts the ability to draw on common truth in public debates. Ironically, that disruption is being classed as success by the developers of such systems, emphasizing that this is not an experiment we are conducting but one that is being conducted upon us.
This is problematic from a governance point of view because much of current regulation places the responsibility for AI safety on individuals, whereas in reality they are the subjects of an experiment being conducted across society. The challenge this creates for researchers is to identify the kinds of rupture generative AI can cause and at what scales, and then to translate the problem into a regulatory one. Only then can authorities formalize and impose accountability, rather than creating diffuse and ill-defined forms of responsibility for individuals. Getting this right will guide how the technology develops and determine the risks AI poses in the medium and long term.
Much like what happened with biomedical experimentation in the twentieth century, the work of defining boundaries for AI experimentation goes beyond “AI safety” to AI legitimacy, and this is the next frontier of conceptual social scientific work. Sectors, disciplines, and regulatory authorities must work to update the definition of experimentation so that it includes digitally enabled and data-driven forms of testing. It can no longer be assumed that experimentation is a bounded activity with impacts only on a single, visible group of people. Experimentation at scale is frequently invisible to its subjects, but this does not render it any less problematic or absolve regulators from creating ways of scrutinizing and controlling it.