
> Does this even make sense? Are the copyright laws so bad that a statement like this would actually be in NVIDIA’s favor?

It makes some sense, yeah. There's also precedent, in Google scanning massive amounts of books but not reproducing them. Most of our current copyright laws deal with reproductions. That's a no-no. It gets murky on the rest. NVIDIA's argument here is that they're not reproducing the works and not providing the works to other people; they're "scanning the books and computing some statistics over the entire set". Kinda similar to Google. Kinda not.

I don't see how they get around "procuring them" from 3rd party dubious sources, but oh well. The only certain thing is that our current laws didn't cover this, and probably now it's too late.



> There's also precedent, in Google scanning massive amounts of books,

Except that Google acquired the books legally, and first sale doctrine applies to physical books.

> but not reproducing them

See also: "Extracting books from production language models"

https://news.ycombinator.com/item?id=46569799


> I don't see how they get around "procuring them" from 3rd party dubious sources

Yeah, isn't this what Anthropic was found guilty of?


If they don't reproduce data of any kind, how could the LLM be of any use?

The main purpose of an LLM is to reproduce knowledge.


Scanning books is literally reproducing them. Copying books from Anna's Archive is also literally reproducing them. The idea that it is only copyright infringement if you engage in further reproduction is just wrong.

As a consumer you are unlikely to be targeted for such "end-user" infringement, but that doesn't mean it's not infringement.


https://cases.justia.com/federal/appellate-courts/ca2/13-482...

This is the conclusion of the Authors Guild v. Google saga. The opinion goes through a lot of factors, but in the end the conclusion is this:

> In sum, we conclude that: (1) Google’s unauthorized digitizing of copyright-protected works, creation of a search functionality, and display of snippets from those works are non-infringing fair uses. The purpose of the copying is highly transformative, the public display of text is limited, and the revelations do not provide a significant market substitute for the protected aspects of the originals. Google’s commercial nature and profit motivation do not justify denial of fair use. (2) Google’s provision of digitized copies to the libraries that supplied the books, on the understanding that the libraries will use the copies in a manner consistent with the copyright law, also does not constitute infringement. Nor, on this record, is Google a contributory infringer.


It seems like they pretty much don't care unless you distribute the copy. There is certainly precedent for it, going back to the Betamax case in the 1980s.


Private reproductions are allowed (e.g. backups). Distributing them non-privately is not.


Backups are permitted (and not for all media) only when you legally acquired the source. Scanning a physical book is not a permitted backup, and neither is downloading a book from Anna's Archive.


> Scanning a physical book is not a permitted backup

On what basis do you claim that?

You're also missing critical legal context. When a would-be consumer downloads pirated media in lieu of purchasing it, he damages the would-be seller. When my automated web scraper inadvertently archives some pirated content on my local disk, no one is financially harmed.

The question is where the boundary between those things lies.


> Distributing them non-privately is not.

You can even distribute them, within some limits.

https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....



