The Copyright Office Issues Its Long-Awaited Report on AI Training Material and Fair Use.

Will It Stymie the U.S. AI Industry?

May 23, 2025

Note: A much shorter, less detailed version of this column is available on the Leading-Edge Law Group website.

On May 9, the U.S. Copyright Office (USCO) finally released its report analyzing whether training an AI on the copyright property of others without their permission is fair use. The analysis makes some key calls in favor of content creators and against generative AI (“GenAI”) companies, such as OpenAI (ChatGPT), xAI (Grok), Meta (Llama), and Google (Gemini).

What does this portend for U.S. AI businesses? Will it handicap U.S. AI companies and allow China to race ahead? Will it protect content creators and enable them to earn AI-training licensing revenue? Indeed, what will it mean for the future of the U.S. economy?

The report is 108 pages long. I won’t summarize it all.

How GenAI Works

Before giving key takeaways, here’s some background:

Training a general-purpose, world-class AI (a “foundation model”) requires a lot of training data. In the training phase, a GenAI neural network is fed copious data so it can learn the relationships between the data’s tiniest components, such as words, punctuation, and numbers (these are called “tokens”). This training process adjusts the “weights” in the neural network. The GenAI doesn’t store a literal copy of specific training data. When you put a query into a GenAI, the settings of its weights determine the output.
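To make the idea of “weights” concrete, here is a toy sketch in Python. It is not how any real foundation model is built (those use very large neural networks), and every name in it (tokenize, train, predict_next, the sample sentence) is a hypothetical chosen for illustration. It only shows the basic pattern described above: text is split into tokens, training nudges a table of numbers, and a prediction is read from those numbers.

```python
import math
import re

def tokenize(text):
    # Split text into lowercase words, numbers, and punctuation marks ("tokens").
    return re.findall(r"\w+|[^\w\s]", text.lower())

def softmax(scores):
    # Turn raw scores into probabilities that sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(tokens, epochs=200, lr=0.5):
    # One weight per (previous token, next token) pair, all starting at zero.
    vocab = sorted(set(tokens))
    index = {tok: i for i, tok in enumerate(vocab)}
    n = len(vocab)
    weights = [[0.0] * n for _ in range(n)]
    pairs = [(index[a], index[b]) for a, b in zip(tokens, tokens[1:])]
    for _ in range(epochs):
        for prev, nxt in pairs:
            probs = softmax(weights[prev])
            for j in range(n):
                target = 1.0 if j == nxt else 0.0
                # Gradient step: raise the weight of the token that actually
                # came next and lower the weights of the others.
                weights[prev][j] -= lr * (probs[j] - target)
    return vocab, index, weights

def predict_next(token, vocab, index, weights):
    # The output depends only on the learned weights.
    row = weights[index[token]]
    return vocab[row.index(max(row))]

tokens = tokenize("the cat sat on the mat. the cat ate.")
vocab, index, weights = train(tokens)
print(predict_next("the", vocab, index, weights))  # prints "cat"
```

In this sketch, “the” is followed by “cat” twice and “mat” once in the sample sentence, so training pushes the weight for “cat” highest. What the trained toy holds is a grid of numbers rather than the sentence itself, although a model this tiny trained on a single sentence says little about how real systems behave at scale.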

While the makers of foundation models are tight-lipped about this, it’s widely understood that a lot of the training data was obtained by scraping material from the Internet without obtaining licenses from content owners. Almost all material on the Internet is someone’s copyright property. In some cases, the training data is believed to have been obtained from pirate sources, such as shadow libraries of material taken from behind paywalls.

AI Makers Claim Fair Use

Content owners are suing AI makers, claiming that using their material without permission is copyright infringement. AI makers claim fair use. Fair use is a defense to copyright infringement. Copying someone else’s copyright property without permission generally is copyright infringement unless the user proves fair use.

Statutory federal copyright law specifies four factors for evaluating fair use:

· the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;

· the nature of the copyrighted work;

· the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and

· the effect of the use upon the potential market for or value of the copyrighted work.

In practice, the biggest consideration in judging fair use is whether the defendant’s conduct substantially harms the market value of the plaintiff’s copyright property. For example, in a 2023 decision, the Supreme Court held that the Andy Warhol Foundation was not engaging in fair use when it licensed a piece of Warhol-created pop art to Condé Nast for use as magazine cover art. The piece was based, without permission, on a photograph of the musician Prince, and the cover use competed with the photographer’s licensing opportunities.

Except for one relatively minor case that doesn’t concern generative AI (which I analyzed here), no court has yet decided whether it’s fair use to train an AI using someone’s copyright-protected content without permission. Also, the fair use analysis for AI training is unlike that in any prior case, even those concerning computer technology.

The Copyright Office Took Two Controversial Positions

On two crucial issues, the USCO report controversially sided with content creators over AI companies.

First, it opined that when GenAI outputs material in the style of a content creator without reproducing any specific copyright-protected elements, that may not be fair use, particularly if the training material was obtained from a pirate source and if the type of content at issue could be licensed for training.

Second, it opined that the fair use analysis must consider AI training and its outputs holistically rather than judging training and outputs separately.

U.S. technology companies will push back hard on both points.

I realize that’s confusing. Let’s unpack it.

Controversial Position #1 – “In The Style Of” Outputs

Let’s start with outputs.

As the report admitted, creating content “in the style of” an author or artist is widely considered not to be copyright infringement provided you don’t use any of that person’s specific creative expression. (My earlier deeper analysis of that is here.) Copyright does not protect ideas or facts. It protects only the creative way in which content is expressed. For example, it’s not copyright infringement to produce a book written in the style of Tom Wolfe, provided you don’t take his character names, plot lines, or material passages of text.

But, surprisingly, the report said that when an AI generates output in the style of an author or artist, that might be copyright infringement because of AI’s power to analyze material in training far more quickly and deeply than a human being and to generate output easily and voluminously.

In short, the report opined that such style mimicry would be legal if done by a human but possibly illegal if done by AI. The USCO admitted it is in “uncharted territory” in reaching this conclusion.

Controversial Position #2 – Analyzing Training and Outputs Holistically Rather Than Individually

This relates to the USCO’s decision to combine AI training and output into a single fair use analysis rather than treating the two steps separately. That’s controversial because, if you consider each stage separately, AI providers would probably escape liability except in what are likely rare situations.

As the report admits, if you consider the training phase in isolation, while that involves using copies of other people’s copyright property without permission, that conduct almost certainly is fair use. Training simply teaches the model about the relationships in the data. That doesn’t harm the marketplace value of the training material.

And, if you consider the output phase in isolation, the output is copyright infringing only if it appropriates specific copyright-protected expression, such as verbatim copies of text (such as newspaper articles and song lyrics) or cartoon characters (such as Homer Simpson or SpongeBob), but that appears to be a small percentage of outputs.

The USCO opined that you can’t consider each step in isolation. It opined that the copying done in training is likely not fair use when the output is material that competes with or otherwise reduces the marketplace value of the property of copyright owners, even if that output would not be copyright-infringing if done by a human.

As precedent, the USCO cited some cases (principally Authors Guild v. Google, Inc.) that concerned multi-step technological processes and had a holistic fair use analysis, but the courts in those cases didn’t expressly address the appropriateness of doing a fair use analysis holistically rather than analyzing each step individually. It appears the parties in those cases didn’t litigate that issue. Also, there are cases (such as Sega Enterprises, Ltd. v. Accolade, Inc.) where courts addressed a sequence of technological processes individually in a fair use analysis, but, again, it appears the issue of whether to do so was not litigated.

Will the Trump Administration Withdraw the Report?

The Trump Administration might withdraw this report.

The Trump Administration is friendlier to the U.S. AI industry than the Biden Administration was. Shortly after taking office, it rescinded a restrictive and burdensome Biden Administration executive order on the development and use of AI.

The day before the report was released, the Trump Administration fired the head of the Library of Congress, which oversees the USCO. The day after the report was issued, it fired the head of the USCO. The administration didn’t comment on whether these firings were related to the report.

The USCO may have rushed out the report to prevent the Trump Administration from meddling with it. The version released was labeled a “pre-publication version.” It’s unusual to release a non-final version.

This report is not the law. Courts will decide this fair use issue. They’ll certainly consider this report, but they aren’t bound to follow it.

Following the Report Might Hamper United States Competitiveness vis-à-vis China but Would Aid Content Creators

In a May 19 article, the Wall Street Journal reported that the European technology sector is far behind the U.S. and China.

It noted that the total market capitalization of large U.S. technology companies (market cap $1 billion or more) is $2.53 trillion. In China, that total is $702 billion, and in the European Union, it’s only $333 billion. The article points out that the European technology sector is small in part because it’s so heavily regulated, although Europe’s employee-friendly laws and work culture also hamper this sector.

Many commentators flagged the report as a win for China, saying it could hamper the U.S. AI industry and allow China to race ahead.

On the other hand, content creators are rightfully fearful. For example, why pay to license a stock image for use in advertising when you can generate one with AI?

Ultimately, the Fight Is About Money

This fight ultimately comes down to money.

Content creators want to force AI companies to pay for licenses to use their material to train AI. They also want the right to withhold their material from training. Some may fear that license fees won’t be enough to offset the competitive threat posed by AI-generated material.

AI companies argue that it’s impractical to license all the training material they need to build foundation models, that collections of some types of training data don’t exist, and that licensing would be prohibitively expensive.

The result of this fight will have a profound impact on the fortunes of the U.S. tech and creative industries, and perhaps on the economic future of the U.S. itself.

Written on May 22, 2025

by John B. Farmer

Leading-Edge Law - John B. Farmer’s Substack
