Vondran Legal® AI & IP Law Firm!
The Battle Over AI Training Data: Is It Fair Use?
Artificial intelligence is transforming nearly every industry, from legal research and healthcare to entertainment, publishing, and software development. But beneath the rapid growth of generative AI lies a legal question that has sparked dozens of high-profile lawsuits:
Can AI companies legally use copyrighted works to train their models without obtaining permission from copyright owners?
The answer increasingly appears to be "yes," at least under certain circumstances.
Recent federal court decisions involving Anthropic and Meta suggest that courts are generally receptive to the argument that using copyrighted works to train AI models constitutes fair use under U.S. copyright law, while the decision against Ross Intelligence shows that the defense has limits. The issue remains far from settled, and the legal landscape continues to evolve.
This article examines the emerging case law, the four fair use factors, and what copyright owners, content creators, and AI developers should know moving forward.
Why AI Companies Use Copyrighted Material
Training large language models (LLMs) requires enormous quantities of data. AI systems learn patterns, language structures, and relationships by analyzing vast collections of text, images, audio, video, and other content.
The challenge is that much of the world's valuable data is protected by copyright.
Books, newspaper articles, photographs, music, software code, legal materials, and academic publications often provide the rich information needed to develop sophisticated AI systems.
Obtaining licenses from every copyright owner is frequently impractical, expensive, and time-consuming. As a result, many AI developers trained their models using unlicensed copyrighted materials, leading to a wave of copyright infringement lawsuits.
The central legal defense raised by AI companies has been fair use.
Understanding Fair Use in AI Copyright Cases
Fair use is an affirmative defense under Section 107 of the Copyright Act.
Courts evaluate four statutory factors:
- Purpose and character of the use
- Nature of the copyrighted work
- Amount and substantiality used
- Effect on the market
No single factor is dispositive. Courts balance all four factors together.
Recent AI decisions provide valuable insight into how these factors may be applied to AI training.
Factor One: Purpose and Character of the Use
The Transformative Use Inquiry
The first factor examines whether the use is transformative.
A use is considered transformative when it creates a new purpose or meaning that differs from the original copyrighted work.
Courts increasingly view AI training as transformative because AI systems generally do not reproduce copyrighted works verbatim. Instead, they learn patterns, relationships, and concepts from the data.
Bartz v. Anthropic
One of the most significant AI copyright decisions to date is Bartz v. Anthropic.
The case involved Anthropic's use of copyrighted books to train its AI model, Claude.
The court found that training Claude on copyrighted books constituted fair use because the resulting outputs were "spectacularly different" from the original works.
The court compared AI learning to human learning, explaining that people routinely read books and later create their own original works based on acquired knowledge.
According to the court, copyright law does not give authors the right to prevent others from learning from their works.
This reasoning strongly supports AI developers who use copyrighted materials to train models that generate new and distinct outputs.
When AI Training Is Not Transformative
Not every AI-related use qualifies as fair use.
Thomson Reuters v. Ross Intelligence
In this case, Ross Intelligence copied Westlaw headnotes to develop a competing legal research platform.
The court concluded that Ross was not using the materials for a transformative purpose.
Instead, Ross used the copyrighted content as a shortcut to build a commercial substitute that directly competed with Westlaw.
As a result, the first fair use factor favored Thomson Reuters.
The lesson is clear:
Courts are more likely to find fair use when an AI system learns from copyrighted works than when a developer simply copies those works to build a competing product.
Factor Two: Nature of the Copyrighted Work
The second factor examines the type of work being copied.
Copyright law provides stronger protection to highly creative works than to factual or functional works.
Examples of highly protected works include:
- Novels
- Songs
- Motion pictures
- Artwork
- Photography
Examples receiving less protection include:
- Databases
- Technical manuals
- Functional software elements
- Factual compilations
Creative Works May Weigh Against Fair Use
In Bartz, Anthropic acknowledged that the books used for training contained significant creative expression.
Because copyright law strongly protects expressive content, the court found that this factor weighed against fair use.
However, the court still ruled in favor of Anthropic on the training question after considering all four factors together.
This demonstrates an important point:
Even when one factor favors the copyright owner, the overall fair use analysis may still favor the AI developer.
Factor Three: Amount and Substantiality of the Use
The third factor examines how much copyrighted material is used.
Historically, copying an entire work has often weighed against fair use.
AI cases present a unique challenge because developers frequently copy entire works during the training process.
Yet courts increasingly recognize that copying an entire work may sometimes be necessary for transformative purposes.
The Google Books Analogy
A key precedent is Authors Guild v. Google.
Google scanned millions of copyrighted books and created a searchable database that displayed limited text snippets.
Although entire books were copied internally, users could not access complete copies.
The court held that Google's use was fair because:
- Only small portions were displayed
- Users could not reconstruct entire books
- The system did not serve as a substitute for purchasing books
This case has become an important reference point for AI litigation.
Courts in the early AI cases have reasoned along similar lines, because models generally learn from copyrighted works without reproducing them in their outputs.
Factor Four: Market Harm
The fourth factor often carries substantial weight.
Courts examine whether the defendant's conduct harms markets that copyright law is intended to protect.
The Three Market Harm Theories
In Kadrey v. Meta Platforms, the court identified three possible theories of market harm.
Theory One: Direct Substitution
The AI model outputs copies or near-copies of copyrighted works.
This could reduce sales of the original works.
Courts generally agree this type of harm is relevant.
Theory Two: Lost AI Licensing Revenue
Copyright owners argue they should be paid licensing fees whenever AI developers train models using copyrighted content.
However, the Kadrey court expressed skepticism toward this argument.
The court reasoned that copyright law does not automatically grant copyright owners a monopoly over AI training markets.
Simply claiming that licensing opportunities were lost may not be sufficient to defeat fair use.
Theory Three: Competitive AI-Generated Content
This may become the most important issue in future litigation.
Under this theory, AI-generated outputs may not be direct copies but could nevertheless compete with the original works.
For example:
- AI-generated novels competing with authors
- AI-generated illustrations competing with artists
- AI-generated music competing with musicians
- AI-generated journalism competing with publishers
Although the court found insufficient evidence of such harm in Kadrey, it indicated that future plaintiffs could potentially succeed if they provide stronger evidence.
This creates a significant area of legal uncertainty.
What These Cases Mean for AI Companies
The emerging trend favors AI developers.
Courts increasingly recognize that:
- Learning from copyrighted materials is different from reproducing them.
- AI training often serves transformative purposes.
- Copyright law protects expression, not ideas, facts, or knowledge.
- AI systems generally extract patterns rather than republish copyrighted works.
As a result, many AI training practices may qualify as fair use.
However, AI companies remain vulnerable when their models:
- Reproduce copyrighted works verbatim.
- Generate substantially similar outputs.
- Function as substitutes for copyrighted content.
- Cause measurable market harm.
What Copyright Owners Should Watch
Content creators, publishers, musicians, artists, photographers, and authors should closely monitor developments in AI litigation.
Future cases may focus heavily on:
- Evidence of market displacement.
- Competition from AI-generated content.
- Economic harm to creative industries.
- The growing market for AI licensing agreements.
The strongest future claims may involve demonstrating that AI-generated works directly compete with and diminish demand for human-created works.
The Future of AI Copyright Litigation
The first wave of AI copyright lawsuits suggests that courts are generally receptive to fair use defenses for AI training.
However, several important questions remain unresolved:
- How much similarity between AI outputs and copyrighted works is too much?
- What evidence is necessary to prove market harm?
- Can copyright owners successfully establish a protectable AI licensing market?
- Will Congress enact legislation specifically addressing AI training?
The answers to these questions will shape the future of artificial intelligence and intellectual property law.
Final Thoughts
The emerging trend in the early federal court decisions is that training AI models on copyrighted works often qualifies as fair use when the resulting outputs are transformative and do not serve as substitutes for the original works.
At the same time, courts continue to leave the door open for copyright owners to challenge AI systems that cause real economic harm or generate competing content.
As AI technology evolves, the tension between innovation and copyright protection will remain one of the most important legal battles of the digital age.
If you are an AI developer, publisher, artist, author, software company, or content creator facing copyright issues related to artificial intelligence, consult experienced intellectual property counsel to evaluate your rights and obligations under this rapidly developing area of law.

