> Does this even make sense? Are the copyright laws so bad that a statement like...

musicale · 2026-01-23T03:37:45 1769139465

> There's also precedent, in google scanning massive amounts of books,

Except that Google acquired the books legally, and first sale doctrine applies to physical books.

> but not reproducing them

See also: "Extracting books from production language models"

https://news.ycombinator.com/item?id=46569799

olejorgenb · 2026-01-19T16:10:00 1768839000

> I don't see how they get around "procuring them" from 3rd party dubious sources

Yeah, isn't this what Anthropic was found guilty of?

bulbar · 2026-01-20T06:34:29 1768890869

If they don't reproduce data of any kind, how could the LLM be of any use?

The whole/main intention of an LLM is to reproduce knowledge.

masfuerte · 2026-01-19T15:56:54 1768838214

Scanning books is literally reproducing them. Copying books from Anna's Archive is also literally reproducing them. The idea that it is only copyright infringement if you engage in further reproduction is just wrong.

As a consumer you are unlikely to be targeted for such "end-user" infringement, but that doesn't mean it's not infringement.

NitpickLawyer · 2026-01-19T17:41:51 1768844511

https://cases.justia.com/federal/appellate-courts/ca2/13-482...

This is the conclusion of the saga of Authors Guild v. Google. The opinion goes through a lot of factors, but in the end the conclusion is this:

> In sum, we conclude that: (1) Google’s unauthorized digitizing of copyright-protected works, creation of a search functionality, and display of snippets from those works are non-infringing fair uses. The purpose of the copying is highly transformative, the public display of text is limited, and the revelations do not provide a significant market substitute for the protected aspects of the originals. Google’s commercial nature and profit motivation do not justify denial of fair use. (2) Google’s provision of digitized copies to the libraries that supplied the books, on the understanding that the libraries will use the copies in a manner consistent with the copyright law, also does not constitute infringement. Nor, on this record, is Google a contributory infringer.

amanaplanacanal · 2026-01-19T16:40:54 1768840854

It seems like they pretty much don't care unless you distribute the copy. There is certainly precedent for it, going back to the Betamax case in the 1980s.

Ferret7446 · 2026-01-19T16:31:37 1768840297

Private reproductions are allowed (e.g. backups). Distributing them non-privately is not.

masfuerte · 2026-01-19T16:47:27 1768841247

Backups are permitted (and not for all media) when you legally acquired the source. Scanning a physical book is not a permitted backup, and neither is downloading a book from Anna's Archive.

fc417fc802 · 2026-01-19T18:43:22 1768848202

> Scanning a physical book is not a permitted backup

On what basis do you claim that?

You're also missing critical legal context. When a would-be consumer downloads pirated media in lieu of purchasing it, he damages the would-be seller. When my automated web scraper inadvertently archives some pirated content on my local disk, no one is financially harmed.

The question is where the boundary between those things lies.

gruez · 2026-01-19T19:35:28 1768851328

>Distributing them non-privately is not.

You can even distribute them, to some limits.

https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....