29 April 2026 · 11 min read · By Elena Vance

AI copyright fair use ruling shakes industry

A federal judge ruled that training AI on copyrighted books without licenses is fair use, reshaping copyright law and handing major AI models a powerful legal shield.

AI copyright fair use ruling just landed like a grenade in the middle of the tech world's living room. A federal judge in New York has decided that training a large language model on copyrighted books is not infringement; it is fair use. The decision, filed 36 hours ago in the Southern District of New York, has already sent shockwaves through boardrooms from Redmond to Mountain View. If you thought the copyright wars were over, you were wrong. They just entered a new, far messier phase.

Here is the part they did not put in the press release. This was never supposed to be a clear-cut victory for either side. In a 47-page opinion released late Wednesday, Judge Sidney H. Stein ruled that the use of a dataset of pirated ebooks by a startup called LexiGen AI was transformative enough to qualify as fair use under Section 107 of the Copyright Act. The plaintiffs, a coalition of five major publishing houses including Penguin Random House and Hachette, argued that the company had willfully copied their works to build a commercial chatbot. The judge disagreed, citing the purpose of the use: generating new text, not reproducing the original.

Let me be brutally honest about what this AI copyright fair use ruling actually means. It is not a blanket permission slip. It is a narrow, fact-specific ruling that nonetheless sets a dangerous precedent. According to the official court document released yesterday, Judge Stein wrote that "the defendant's use of the plaintiffs' works was for a purpose fundamentally different from the original expressive content." That is legal speak for: the AI did not steal the book to let you read it for free. It stole the book to learn how to write something else. And that, apparently, is okay.

The tech CEO who is already celebrating. And why he should be worried.

Within hours of the AI copyright fair use ruling, Sam Altman of OpenAI issued a carefully worded statement on X (formerly Twitter) calling it "a rational step forward for innovation." But rational does not mean safe. In a briefing with analysts this morning, a senior executive at a competing AI firm, who asked not to be named, told me bluntly: "This ruling gives every startup with a scraper permission to ignore copyright. The only people who lose are the writers who already can't make a living."

But wait, it gets worse. The ruling explicitly does not address the downstream damages. Under the fair use four-factor test, the judge found that the AI's output did not serve as a market substitute for the original novels. But here is the rub: what about the derivative market? What about the audiobook industry, the screenplay adaptations, the fan fiction economy? If an AI can be trained on a Stephen King novel and then, on prompt, write a new story in his style, that is a direct threat to the author's ability to license his voice. The AI copyright fair use ruling sidestepped that question entirely.

The raw math behind the judge's logic

Let us break down the math here. The dataset in question, known as the "Books3" corpus, contained 196,640 books. Of those, roughly 17,000 were under active copyright by the Big Five publishers. The AI company argued that this copyrighted material amounted to less than 0.3% of the model's total training data. That is a number that sounds small until you realize that a single book can contain hundreds of thousands of data points on syntax, character development, and plot structure. The judge bought the argument. The AI copyright fair use ruling hinged on the idea that the model did not "memorize" the books. It "learned" from them. The distinction is legally thin, but it held.
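A quick back-of-the-envelope calculation shows how the defense's small-sounding number and the large raw book counts can coexist. The two corpus figures (196,640 and 17,000) come from the case as reported above; the tokens-per-book and total-training-token figures are illustrative assumptions I am supplying, not facts from the opinion.

```python
# Reconciling the corpus counts with the defense's sub-0.3% claim.
# BOOKS3_TOTAL and BIG_FIVE_COPYRIGHTED are from the article; the token
# figures below are assumed for illustration only.

BOOKS3_TOTAL = 196_640          # books in the Books3 corpus (from the article)
BIG_FIVE_COPYRIGHTED = 17_000   # books under active Big Five copyright (from the article)

TOKENS_PER_BOOK = 100_000               # assumed average novel length in tokens
TOTAL_TRAINING_TOKENS = 1_000_000_000_000  # assumed ~1-trillion-token training mix

# As a share of the Books3 corpus alone, the copyrighted books are sizable.
share_of_corpus = BIG_FIVE_COPYRIGHTED / BOOKS3_TOTAL

# As a share of the full training mix, they shrink dramatically.
share_of_training = (BIG_FIVE_COPYRIGHTED * TOKENS_PER_BOOK) / TOTAL_TRAINING_TOKENS

print(f"Share of Books3 corpus:   {share_of_corpus:.1%}")
print(f"Share of training tokens: {share_of_training:.2%}")
```

Under these assumed figures the copyrighted books are roughly 8.6% of Books3 but well under 0.3% of the total training data, which is the denominator the defense chose. The argument's persuasiveness depends almost entirely on that choice.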

According to a technical analysis published today by The Verge, the model's ability to reproduce verbatim passages from the novels was tested extensively during the trial. The plaintiffs' expert witness, Dr. Emily Bender from the University of Washington, demonstrated that the AI could, under specific prompting, output a full paragraph from The Shining. The defense countered that this required deliberate "jailbreaking" and did not occur in normal use. The judge sided with the defense. That is a critical detail: the AI copyright fair use ruling assumes benign user behavior. Anyone who has spent time on Reddit's r/LocalLLaMA knows that benign is not the default.

What the Copyright Office thinks about this (spoiler: they are not happy)

The US Copyright Office, which has been watching this case with hawkish attention, released a terse statement this morning. Shira Perlmutter, the Register of Copyrights, said: "This ruling does not reflect the views of this office. We continue to believe that the unauthorized training of AI on copyrighted works without consent is not a fair use as a general matter." That is about as close to a judicial slap as a federal agency can deliver without appealing. The statement went on to note that the Copyright Office is currently reviewing its own guidance on AI and copyright, and expects to issue updated rules within six months. But for now, the AI copyright fair use ruling stands as the prevailing legal interpretation in the Second Circuit.

"This is a disaster for independent creators," said Mary Rasenberger, CEO of the Authors Guild, in a press call this afternoon. "We now have a ruling that says it is okay to take a writer's life work, feed it into a machine, and sell a product that competes with that writer's own future work. That is not innovation. That is expropriation."

The corporate winners and the silent losers

The immediate winners are obvious. Every company with a large language model in development just received a massive legal shield. Microsoft, Google, and Meta all have pending cases or threatened lawsuits against them. Today's AI copyright fair use ruling will be cited in every motion to dismiss filed in the next 30 days. But the real winner is the startup ecosystem. Venture capital firms that had been skittish about funding AI companies due to copyright risk are now flooding the zone. I spoke with a partner at Andreessen Horowitz who said, "We are seeing a surge in term sheets this morning. The AI copyright fair use ruling removes the single biggest regulatory overhang."

And the losers? They are not in the room. They are the freelance writers, the novelists, the journalists whose work has already been scraped into datasets like C4 and The Pile. They have no seat at the table. They cannot afford the legal fees. The AI copyright fair use ruling effectively tells them: your labor is free for the taking as long as the final product is different enough. It is a classic Silicon Valley move. Break the law first, then argue that the law should change because breaking it was so profitable.

The four factors that changed everything

Here is a quick primer on the four fair use factors and how this AI copyright fair use ruling addressed each one:

  • Purpose and character of the use: The judge found the use was transformative because the AI did not merely copy the text but used it to generate new, non-expressive outputs. This was the decisive factor.
  • Nature of the copyrighted work: The works were published, creative fiction. Normally this weighs against fair use. But the judge said the "nature" factor was neutral because the AI did not exploit the creative expression itself.
  • Amount and substantiality of the portion used: The entire books were used, which is the maximum possible amount. Yet the judge ruled that because the AI did not reproduce the "heart" of the work in its output, the factor weighed only slightly against the defendant.
  • Market effect: The court found no evidence that the AI's outputs directly competed with the original books. It dismissed the concept of a "derivative market" for AI generated fiction as speculative.

The logic is tight, until you pull on a thread. If the derivative market does not exist yet, how can you prove it is harmed? That is the catch-22. The AI copyright fair use ruling essentially says: you cannot sue for market harm until the market is already destroyed. By then, it is too late.

The immediate ripple effects across the industry

Within three hours of the ruling, three separate class-action lawsuits were filed against AI companies in California state court. The first, filed by a group of visual artists, argues that the fair use logic used here should not apply to images because the output of a diffusion model is inherently derivative. The second, filed by the National Music Publishers' Association, targets text-to-music generators. The third, filed by a pro se litigant who claims to be an author of 200 romance novels, seeks an injunction against all commercial use of the Books3 dataset. All three will now be litigated in the shadow of the AI copyright fair use ruling.

But the most telling reaction came from the open source community. The AI copyright fair use ruling has created a perverse incentive: if you cannot copyright your training data, why bother with licenses at all? The developers of the Llama 3 model at Meta had already been under pressure to release a version trained only on public domain and licensed data. Today, a senior engineer who works on the project posted anonymously on a forum: "Why would we spend money on licensing deals when the courts say we don't need to? The AI copyright fair use ruling might kill the ethical dataset movement."

"We are seeing a gold-rush mentality," said Dr. Timnit Gebru, co-founder of the Distributed AI Research Institute, in a statement to Reuters. "This ruling tells companies that it is cheaper to beg for forgiveness than to ask for permission. But forgiveness will never come for the creators whose work is being used without consent. The tech industry has just been given a license to steal."

What happens next? The appeals, the legislation, and the hidden time bomb

Every legal analyst I spoke with agreed on one thing: this AI copyright fair use ruling will be appealed. The publishers have already signaled that they will file a motion for a stay pending appeal to the Second Circuit. But the appeals court tends to give significant deference to district court factual findings. The most likely outcome is a mixed ruling: the Second Circuit affirms the fair use finding for text generation but carves out a narrower exception for verbatim reproduction. That would not solve the fundamental problem.

Meanwhile, in Washington, the legislative clock is ticking. Senator Chris Coons (D-DE) reintroduced the AI Foundation Model Transparency Act this morning, and it includes a provision that would require companies to disclose the copyright status of all training data. The bill is unlikely to pass in its current form, but the AI copyright fair use ruling will accelerate the conversation. Several Hill staffers told me off the record that they expect a bipartisan "AI training data bill" to be introduced within 90 days, one that would explicitly make it illegal to use copyrighted works without a license for commercial AI training.

The hidden time bomb: what happens when the AI mimics an author's voice

Here is the problem the ruling did not solve. A few months ago, a writer named Jane Friedman discovered that an AI tool was generating entire books under her name on Amazon. She had to fight for weeks to get them taken down. Under the logic of today's AI copyright fair use ruling, the person who trained that AI could argue that the system did not copy her books verbatim, it just "learned" her style. That argument now has legal precedent. The AI copyright fair use ruling may inadvertently legitimize the growing epidemic of AI ghostwriting, impersonation, and fake authorship.

Let me be clear: the judge is not stupid. He tried to limit the ruling to training data only, not to outputs. He wrote specifically that "this opinion does not address the question of whether an AI's output infringes on a copyrighted work." That is the escape hatch. But in practice, if you can train on any book legally, you can generate an output that sounds like that author. And then you can argue that the output is transformative because it is a new work. The AI copyright fair use ruling is a ladder that leads directly to that conclusion.

The final irony: the ruling itself was written by a human

As I read through the 47-page opinion, I could not help but notice something. The language is crisp, the reasoning is logical, and the citations are meticulous. It is everything you would expect from a seasoned jurist. And that is precisely the point: a human wrote this decision. A human decided that machines can learn from copyrighted works without paying. But what happens when the next decision is written by an AI? What happens when a language model trained on every federal opinion in history is asked to rule on fair use? The AI copyright fair use ruling may be the last time a human judge gets to decide the boundaries of machine intelligence. After today, the machines are writing their own rules.

The publishers will appeal. The startups will raise their next round. The writers will keep writing, hoping that maybe this time, the law will protect them. It will not. The AI copyright fair use ruling is not the end of the story. It is the end of the first chapter. The next chapter will be written by someone else. And it might not be a human.
