Black Box Notes

On opacity, auditability, and the limits of trust in modern AI systems.

The $1.5 Billion Settlement — What Bartz v. Anthropic Means Going Forward

The largest publicly reported recovery in US copyright history settles a narrow legal question and opens a wider operational one. A working note on the Bartz ruling, the settlement structure, the unresolved fair-use line, and the precedent every AI lab is now operating under, whether it admits it or not.

The headline figure is the part that travels. $1.5 billion is the figure quoted in the trade press, the policy briefings, the procurement memos, and the slide decks that have followed the Bartz v. Anthropic settlement around the AI industry since it was announced in August 2025. The Authors Guild has called it “the largest publicly reported recovery in US copyright history.” NPR’s coverage of the settlement and the Authors Guild explainer place the number on the record. The judge granted preliminary approval in late September 2025, as CNBC reported at the time.

The headline figure also obscures the more interesting parts. The Bartz settlement is not the resolution of the question “is training generative AI on copyrighted books fair use.” It is the resolution of a much narrower question: what happens when a lab knowingly retains, in its corporate possession, several hundred thousand pirated copies of books it did not legally acquire. The fair-use ruling that preceded the settlement was a partial victory for Anthropic. The settlement was a complete defeat on the part of the case it did not win. The two facts together produce a precedent every AI lab now operates under, whether or not it has explicitly acknowledged it.

This is a note on what that precedent actually is, what it does not yet do, and the operational position it has set up for the next decade of training-data acquisition.

The split ruling — what Judge Alsup actually decided

Before the settlement, there was a ruling. In June 2025, Judge William Alsup of the Northern District of California issued a summary-judgment order that split the case along a line that is now load-bearing for the entire industry. The analysis by ArentFox Schiff is the most useful summary of what the order did.

The first part of the ruling held that training a large language model on copyrighted books was, in this case, fair use. The judge described the training as “quintessentially transformative.” The reasoning followed the familiar four-factor analysis, with the transformative-purpose finding doing the heaviest lifting. The opinion is careful — it does not announce a general rule that training on copyrighted material is always fair use — but its operative finding, for the question the industry was asking, was that the training itself, on books Anthropic had legally acquired, did not infringe.

The second part of the ruling reached the opposite conclusion on a second piece of conduct. Anthropic had also, separately, downloaded pirated copies of books from shadow-library sources (the cluster of repositories the opinion identifies as LibGen, PiLiMi, and similar archives) and retained them in its corporate document store. The judge held that downloading, copying, and storing those works was not fair use. The argument that the pirated copies were “intermediate” or “research” copies did not save the conduct. The court treated the existence of the pirated set, separate from any model trained on it, as actionable infringement.

The split was the entire story. Training a model on legitimately acquired material was protected. The corporate retention of pirated material was not. The case proceeded toward trial on the second piece, and the settlement is, in effect, a price tag on the second piece alone.

The settlement structure — $3,000 a work, on roughly half a million works

The arithmetic is the part that matters going forward, more than the headline figure. The settlement covers approximately 500,000 pirated works. The per-work recovery is approximately $3,000. The total comes to the $1.5 billion the press release announced.
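
The arithmetic can be checked in a few lines. The sketch below is illustrative only: it takes the two approximate figures reported above and multiplies them. The constant and function names are ours, not anything drawn from the settlement documents.

```python
# Both inputs are the approximate figures reported in the text above.
WORKS_COVERED = 500_000   # approximate number of pirated works covered
PER_WORK_USD = 3_000      # approximate recovery per work, in US dollars


def settlement_total(works: int, per_work_usd: int) -> int:
    """Return the gross fund implied by a flat per-work payment."""
    return works * per_work_usd


total = settlement_total(WORKS_COVERED, PER_WORK_USD)
print(f"${total:,}")  # prints $1,500,000,000
```

Integers are used throughout so the result is exact; the inputs themselves remain approximations.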

The structural facts of the settlement are recorded in the Authors Guild explainer and the Copyright Alliance’s working note on participation. The Wolters Kluwer Copyright Blog provides a procedural account of the path from preliminary to final approval, and the Susman Godfrey announcement — Susman Godfrey was lead counsel for the plaintiffs — confirms the operative figures from the plaintiffs’ side.

Two features of the settlement structure matter for what comes next.

The first feature is that the settlement releases past claims only. It does not license future training. An author whose work was on the pirated list is paid for the past infringement; they have not consented to the work being used to train any future Anthropic model. The Copyright Alliance is explicit about this. Anthropic is free to acquire the same titles through legitimate channels and train on those copies under the fair-use posture of Judge Alsup’s order, but the settlement does not retroactively launder the pirated set into a future-training licence. The lab still needs a clean acquisition path going forward for any continued use of that material.

The second feature is the per-work figure. $3,000 is now a number that exists in the world. It is not the figure a court has held to be the value of an infringed work in general; it is the figure two sophisticated parties, advised by sophisticated counsel, agreed to as the price of past infringement of a half-million-work corpus held by a well-capitalised AI lab. It is, however, the only such figure that exists. In the absence of a competing benchmark — and there is no competing benchmark — it is the figure every other lab now operates against.

The precedent the industry now lives under

The legal precedent of Bartz v. Anthropic, narrowly described, is exactly two things. The training of an LLM on legitimately acquired copyrighted books may be fair use under the facts of this case. The downloading and retention of pirated copies of those same books is not. Everything else the ruling has been cited for is extrapolation by readers, not a holding of the court.

The operational precedent — what the case has produced in the world, regardless of what it has held in law — is more consequential. Three features of that operational precedent are worth recording.

The first is that every AI lab now has a known liability per pirated work. The figure is $3,000, and the corpus size is, in Anthropic’s case, approximately 500,000 works. The arithmetic generalises. A lab that has, somewhere in its training pipeline, a similarly sized cluster of pirated material is now, on the public record, looking at a liability exposure of roughly the same order of magnitude as Anthropic’s settlement. The exposure is not theoretical. The case where the exposure is real has been litigated, settled, and approved. The fact that the figure exists changes the procurement question for every customer of every AI lab. It changes the audit question for every regulator. And it changes the disclosure question for every operator who has reason to know what the training corpus contains.
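
To make the generalisation concrete, here is a minimal sketch that applies the Bartz per-work figure to other corpus sizes. The corpus sizes in the loop are hypothetical, chosen only to show scale, and the function name is ours. The $3,000 figure is a negotiated settlement number, not a damages rule, so the output is a benchmark and not a prediction of what any court would award.

```python
# Settlement benchmark from the text above; a negotiated figure, not a legal rule.
BARTZ_PER_WORK_USD = 3_000


def benchmark_exposure(pirated_works: int,
                       per_work_usd: int = BARTZ_PER_WORK_USD) -> int:
    """Scale the Bartz per-work figure to a corpus of a given size."""
    if pirated_works < 0:
        raise ValueError("pirated_works must be non-negative")
    return pirated_works * per_work_usd


# Hypothetical corpus sizes, for illustration only.
for works in (50_000, 500_000, 5_000_000):
    print(f"{works:>9,} works -> ${benchmark_exposure(works):,}")
```

The middle row reproduces the Anthropic figure; the rows on either side show that the exposure moves linearly with the size of the pirated set.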

The second is that the fair-use defence has been narrowed by the line the ruling drew. Before Bartz, an AI lab could plausibly argue that the question of training on copyrighted material was a single question with a fair-use answer; the lab could treat acquisition and training as a single transaction whose lawfulness rose or fell together. Bartz drew the line between acquisition and use. A lab arguing fair use now has to argue it about training specifically, on lawfully acquired copies specifically, and the argument is harder to make if the corpus contains acknowledged pirated material. The defence is not gone. It is conditional in a way it was not previously.

The third is that the discovery process the case generated has set a template. The plaintiffs were able to develop, on the record, evidence that the pirated corpus existed, that it had been retained, and that it had been used. Future cases will follow that template. The discovery procedures that were good enough to surface the Anthropic pirated set are now good enough to surface the equivalent at any other lab whose practices in 2020-2022 followed industry norms. The industry norms in that period are now a litigation exposure.

Anthropic’s response, honestly read

Anthropic’s public response to the settlement has been characteristically restrained. The company has not, on the record, conceded that the ruling reflects badly on its prior practices. The company has, however, paid, and the payment is the substantive admission. A company that believed Judge Alsup’s order was wrongly decided on the pirated-corpus part of the case would have pressed on. A company that paid $1.5 billion paid because the alternative — proceeding to trial with the holding on the pirated set already against it — was worse.

That is the most honest reading of the corporate behaviour. It is also the reading the rest of the industry has, in practice, taken. The labs that have publicly described changes to their training-data acquisition processes since the Bartz settlement have, by and large, described the changes in language that takes the pirated-set problem seriously. The labs that have not described changes are, by inference, either in the same legal position Anthropic was in or in a worse one.

The publication’s view is that the honest characterisation of Anthropic’s response is this: the company won the part of the case it wanted to win, paid heavily for the part of the case it could not win, and did not pretend the second part had not happened. That is more candid than the equivalent corporate posture would have been in most large-cap industries. It is also, from the position of a regulator or a downstream customer, not enough. The candour does not make the underlying problem go away.

What this does not yet do

It is worth being precise about what the Bartz settlement has not resolved.

It has not produced a general rule about fair use for AI training. The ruling applies to a particular set of facts, in a particular jurisdiction, decided by a particular judge. Future cases — and there are many — may produce different holdings on different facts. The AI Lawsuit Tracker shows more than 160 AI copyright cases pending; some of them present materially different fact patterns, including the cases involving live-published news content, the cases involving image models, and the cases involving music. The fair-use posture in those cases will be litigated on its own terms.

It has not produced a licensing framework that an AI lab can use prospectively. The settlement is backward-looking. A lab that wants to train on copyrighted material going forward still needs to acquire the material through commercial channels, secure direct licences from the rightsholders, or rely on fair use as a forward-looking defence in litigation that may not yet have been filed. None of those options is, as of late spring 2026, a clean industry-standard solution.

It has not produced clarity on the question of model outputs that reproduce copyrighted material. The Bartz case was about training inputs. The downstream question — when does a generated output infringe the rightsholder’s work — is the question NYT v. OpenAI is now developing, and that case is at a substantively different procedural stage. The two questions are related but the Bartz holding does not answer the second.

And it has not produced a settled position for the open-weights labs. The settlement was between a closed-weights lab and a class of authors. An open-weights model, particularly one trained even in part on the same pirated-set sources, presents a different remediation problem. Whatever the model learned from the pirated set ships inside weights that anyone can download and that the lab cannot recall. The settlement does not resolve what the obligations of the lab that released those weights are with respect to that pirated material.

The procurement consequence

For a customer of an AI lab in 2026 — for the legal, compliance, or procurement team that is evaluating whether to deploy a frontier model into an enterprise workflow — the Bartz settlement is not an item of historical interest. It is the operative precedent for the question they have to ask their vendor, on the record, in writing: what is in the training corpus, how was it acquired, what would the same discovery process produce if applied to your stack, and what is your indemnity position with respect to a claim that follows the Bartz pattern.

The publication’s view is that the vendor responses to those questions are, in mid-2026, varied. Some labs have produced substantive written responses. Some have produced legal disclaimers. Some have refused to answer. The variance is itself informative. A vendor whose response to a Bartz-pattern question is a refusal to answer is communicating something specific about its own assessment of its exposure.

In the procurement conversations we have heard about, the buyer’s reading of those responses is now the part of the evaluation doing the most work. The capability gap between the major labs has narrowed; the litigation-exposure gap has not. The litigation exposure is the part of the comparison that decides the deal.

This is the precedent Bartz v. Anthropic actually set. The fair-use part of the ruling protects the labs in the limited sense the ruling applies. The pirated-set part of the ruling exposes them in the much broader sense the settlement priced. Every lab now operates under both parts of the precedent. The labs that have read the precedent that way are positioned to be candid with their customers about what their training corpus looks like. The labs that have not read it that way are operating, in the procurement conversations the publication has heard about, as if the figure $1.5 billion does not exist in the world. It does. It is on the docket, it has been approved by the court, and it has set the price.

Editorial note. This piece treats the Bartz v. Anthropic settlement as a matter of public record, and links to the published primary and secondary sources we have read. We have not attempted to characterise positions any party to the settlement has not put on the record. Where we have summarised a published source, we have linked to it; where we have inferred from corporate behaviour rather than corporate statement, we have said so. The publication will track future filings and rulings in the broader litigation landscape and will write follow-up pieces when the next material development on the record warrants one.
