Generative AI Is a Crisis for Copyright Law

By Kate Crawford, Jason Schultz

Generative artificial intelligence is driving copyright into a crisis. More than a dozen copyright cases about AI were filed in the United States last year, up severalfold from all filings from 2020 to 2022. In early 2023, the US Copyright Office launched the most comprehensive review of the entire copyright system in 50 years, with a focus on generative AI. Simply put, the widespread use of AI is poised to force a substantial reworking of how, where, and to whom copyright should apply.

Starting with the 1710 British statute, “An Act for the Encouragement of Learning,” Anglo-American copyright law has provided a framework around creative production and ownership. Copyright is even embedded in the US Constitution as a tool “to promote the Progress of Science and useful Arts.” Now generative AI is destabilizing the foundational concepts of copyright law as it was originally conceived.

Typical copyright lawsuits focus on a single work and a single unauthorized copy, or “output,” to determine if infringement has occurred. When it comes to the capture of online data to train AI systems, the sheer scale and scope of these datasets overwhelm traditional analysis. The LAION-5B dataset, used to train the AI image generator Stable Diffusion, contains 5 billion images and text captions harvested from the internet, while CommonPool (a collection of datasets released by the nonprofit LAION in April to democratize machine learning) offers 12.8 billion images and captions. Generative AI systems have used datasets like these to produce billions of outputs.

For many artists and designers, this feels like an existential threat. Their work is being used to train AI systems, which can then create images and texts that replicate their artistic style. But to date, no court has considered AI training to be copyright infringement: following the Google Books case in 2015, which assessed scanning books to create a searchable index, US courts are likely to find that training AI systems on copyrighted works is acceptable under the fair use exemption, which allows for limited use of copyrighted works without permission in some cases when the use serves the public interest. It is also permitted in the European Union under the text and data mining exception of EU digital copyright law.

Copyright law has also struggled with authorship by AI systems. Anglo-American law presumes that a work has an “author” somewhere. To encourage human creativity, some authors need the economic incentive of a time-limited monopoly on making, selling, and showing their work. But algorithms don’t need incentives, so according to the US Copyright Office they aren’t entitled to copyright. The same reasoning has been applied in other cases involving nonhuman authors, including the case where a macaque took selfies using a nature photographer’s camera. Generative AI is the latest in a line of nonhumans deemed unfit to hold copyright.

Nor are human prompters likely to have copyrights in AI-generated work. The algorithms and neural net architectures behind generative AI systems produce outputs that are inherently unpredictable, and any human prompter has less control over a creation than the model does.

Where does this leave us? For the moment, in limbo. The billions of works produced by generative AI are unowned and can be used anywhere, by anyone, for any purpose. Whether a ChatGPT novella or a Stable Diffusion artwork, output now exists as unclaimable content in the commercial workings of copyright itself. This is a radical moment in creative production: a stream of works without any legally recognizable author.

There is an equivalent crisis in proving copyright infringement. Historically, this has been easy, but when a generative AI system produces infringing content, be it an image of Mickey Mouse or Pikachu, courts will struggle with the question of who is initiating the copying. The AI researchers who gathered the training dataset? The company that trained the model? The user who prompted the model? It’s unclear where agency and accountability lie, so how can courts order an appropriate remedy?

Copyright law was developed by eighteenth-century capitalists to intertwine art with commerce. In the twenty-first century, it is being used by technology companies to allow them to exploit all the works of human creativity that are digitized and online. But the destabilization around generative AI is also an opportunity for a more radical reassessment of the social, legal, and cultural frameworks underpinning creative production.

What expectations of consent, credit, or compensation should human creators have going forward, when their online work is routinely incorporated into training sets? What happens when humans make works using generative AI that cannot have copyright protection? And how does our understanding of the value of human creativity change when it is increasingly mediated by technology, be it the pen, paintbrush, Photoshop, or DALL-E?

It may be time to develop concepts of intellectual property with a stronger focus on equity and creativity as opposed to economic incentives for media corporations. We are seeing early prototypes emerge from the recent collective bargaining agreements for writers, actors, and directors, many of whom lack copyrights but are nonetheless at the creative core of filmmaking. The lessons we learn from them could set a powerful precedent for how to pluralize intellectual property. Making a better world will require a deeper philosophical engagement with what it is to create, who has a say in how creations can be used, and who should profit.
