AI Training and Copyright Fair Use: What Recent Court Decisions Mean for AI Developers and Copyright Owners

Posted by Steve Vondran | Jun 22, 2026

Vondran Legal® AI & IP Law Firm!

The Battle Over AI Training Data: Is It Fair Use?

Artificial intelligence is transforming nearly every industry, from legal research and healthcare to entertainment, publishing, and software development. But beneath the rapid growth of generative AI lies a legal question that has sparked dozens of high-profile lawsuits:

Can AI companies legally use copyrighted works to train their models without obtaining permission from copyright owners?

The answer increasingly appears to be "yes," at least under certain circumstances.

Recent federal court decisions involving companies such as Anthropic, Meta, and Ross Intelligence suggest that courts are generally receptive to the argument that using copyrighted works to train AI models constitutes fair use under U.S. copyright law. However, the issue remains far from settled, and the legal landscape continues to evolve.

This article examines the emerging case law, the four fair use factors, and what copyright owners, content creators, and AI developers should know moving forward.

Why AI Companies Use Copyrighted Material

Training large language models (LLMs) requires enormous quantities of data. AI systems learn patterns, language structures, and relationships by analyzing vast collections of text, images, audio, video, and other content.

The challenge is that much of the world's valuable data is protected by copyright.

Books, newspaper articles, photographs, music, software code, legal materials, and academic publications often provide the rich information needed to develop sophisticated AI systems.

Obtaining licenses from every copyright owner is frequently impractical, expensive, and time-consuming. As a result, many AI developers trained their models using unlicensed copyrighted materials, leading to a wave of copyright infringement lawsuits.

The central legal defense raised by AI companies has been fair use.

Understanding Fair Use in AI Copyright Cases

Fair use is an affirmative defense under Section 107 of the Copyright Act.

Courts evaluate four statutory factors:

Purpose and character of the use
Nature of the copyrighted work
Amount and substantiality of the portion used
Effect of the use on the potential market for or value of the copyrighted work

No single factor is dispositive. Courts balance all four factors together.

Recent AI decisions provide valuable insight into how these factors may be applied to AI training.

Factor One: Purpose and Character of the Use

The Transformative Use Inquiry

The first factor examines whether the use is transformative.

A use is considered transformative when it creates a new purpose or meaning that differs from the original copyrighted work.

Courts increasingly view AI training as transformative because AI systems generally do not reproduce copyrighted works verbatim. Instead, they learn patterns, relationships, and concepts from the data.

Bartz v. Anthropic

One of the most significant AI copyright decisions to date is Bartz v. Anthropic.

The case involved Anthropic's use of copyrighted books to train its AI model, Claude.

The court found that training Claude on copyrighted books constituted fair use, describing the training use as transformative, "spectacularly so," because the model was trained to generate new text rather than to replicate or supplant the original works.

The court compared AI learning to human learning, explaining that people routinely read books and later create their own original works based on acquired knowledge.

According to the court, copyright law does not give authors the right to prevent others from learning from their works.

This reasoning strongly supports AI developers who use copyrighted materials to train models that generate new and distinct outputs.

When AI Training Is Not Transformative

Not every AI-related use qualifies as fair use.

Thomson Reuters v. Ross Intelligence

In this case, Ross Intelligence copied Westlaw headnotes to develop a competing legal research platform.

The court concluded that Ross was not using the materials for a transformative purpose.

Instead, Ross used the copyrighted content as a shortcut to build a commercial substitute that directly competed with Westlaw.

As a result, the first fair use factor favored Thomson Reuters.

The lesson is clear:

Courts are more likely to find fair use when an AI system learns from copyrighted works than when it simply copies those works to create a competing product.

Factor Two: Nature of the Copyrighted Work

The second factor examines the type of work being copied.

Examples of highly protected works include:

Novels
Songs
Motion pictures
Artwork
Photography

Examples receiving less protection include:

Databases
Technical manuals
Functional software elements
Factual compilations

Creative Works May Weigh Against Fair Use

In Bartz, Anthropic acknowledged that the books used for training contained significant creative expression.

Because copyright law strongly protects expressive content, the court found that this factor weighed against fair use.

However, the court still ultimately ruled in favor of Anthropic on the training question after considering all four factors together.

This demonstrates an important point:

Even when one factor favors the copyright owner, the overall fair use analysis may still favor the AI developer.

Factor Three: Amount and Substantiality of the Use

The third factor examines how much copyrighted material is used.

Historically, copying an entire work has often weighed against fair use.

AI cases present a unique challenge because developers frequently copy entire works during the training process.

Yet courts increasingly recognize that copying an entire work may sometimes be necessary for transformative purposes.

The Google Books Analogy

A key precedent is Authors Guild v. Google.

Google scanned millions of copyrighted books and created a searchable database that displayed limited text snippets.

Although entire books were copied internally, users could not access complete copies.

The court held that Google's use was fair because:

Only small portions were displayed
Users could not reconstruct entire books
The system did not serve as a substitute for purchasing books

This case has become an important reference point for AI litigation.

Courts in the early AI cases have viewed AI training similarly because models generally learn from copyrighted works without reproducing them in their outputs.

Factor Four: Market Harm

The fourth factor often carries substantial weight.

Courts examine whether the defendant's conduct harms markets that copyright law is intended to protect.

The Three Market Harm Theories

In Kadrey v. Meta Platforms, the court identified three possible theories of market harm.

Theory One: Direct Substitution

The AI model outputs copies or near-copies of copyrighted works.

This could reduce sales of the original works.

Courts generally agree this type of harm is relevant.

Theory Two: Lost AI Licensing Revenue

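Under this theory, unlicensed training deprives copyright owners of revenue they could have earned by licensing their works for AI training.

The Kadrey court, however, expressed skepticism toward this argument.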

The court reasoned that copyright law does not automatically grant copyright owners a monopoly over AI training markets.

Simply claiming that licensing opportunities were lost may not be sufficient to defeat fair use.

Theory Three: Competitive AI-Generated Content

This may become the most important issue in future litigation.

Under this theory, AI-generated outputs may not be direct copies but could nevertheless compete with the original works.

For example:

AI-generated novels competing with authors
AI-generated illustrations competing with artists
AI-generated music competing with musicians
AI-generated journalism competing with publishers

Although the court found insufficient evidence of such harm in Kadrey, it indicated that future plaintiffs could potentially succeed if they provide stronger evidence.

This creates a significant area of legal uncertainty.

What These Cases Mean for AI Companies

The emerging trend favors AI developers.

Courts increasingly recognize that:

Learning from copyrighted materials is different from reproducing them.
AI training often serves transformative purposes.
Copyright law protects expression, not ideas, facts, or knowledge.
AI systems generally extract patterns rather than republish copyrighted works.

As a result, many AI training practices may qualify as fair use.

However, AI companies remain vulnerable when their models:

Reproduce copyrighted works verbatim.
Generate substantially similar outputs.
Function as substitutes for copyrighted content.
Cause measurable market harm.

What Copyright Owners Should Watch

Content creators, publishers, musicians, artists, photographers, and authors should closely monitor developments in AI litigation.

Future cases may focus heavily on:

Evidence of market displacement.
Competition from AI-generated content.
Economic harm to creative industries.
The growing market for AI licensing agreements.

The strongest future claims may involve demonstrating that AI-generated works directly compete with and diminish demand for human-created works.

The Future of AI Copyright Litigation

The first wave of AI copyright lawsuits suggests that courts are generally receptive to fair use defenses for AI training.

However, several important questions remain unresolved:

How much similarity between AI outputs and copyrighted works is too much?
What evidence is necessary to prove market harm?
Can copyright owners successfully establish a protectable AI licensing market?
Will Congress enact legislation specifically addressing AI training?

The answers to these questions will shape the future of artificial intelligence and intellectual property law.

Final Thoughts

The emerging view among the federal courts that have ruled so far is that training AI models on copyrighted works often qualifies as fair use when the resulting outputs are transformative and do not serve as substitutes for the original works.

At the same time, courts continue to leave the door open for copyright owners to challenge AI systems that cause real economic harm or generate competing content.

As AI technology evolves, the tension between innovation and copyright protection will remain one of the most important legal battles of the digital age.

If you are an AI developer, publisher, artist, author, software company, or content creator facing copyright issues related to artificial intelligence, consult experienced intellectual property counsel to evaluate your rights and obligations under this rapidly developing area of law.
