
Two big issues with Archive.org are that 1. it's a single point of failure and they don't encourage mirror sites to emerge, and 2. they keep using the "brand" to fight unwinnable battles like hosting books they don't own online, risking the whole endeavor.

I still appreciate it, but just imagine if it goes down due to a lawsuit. Now that Google no longer shows cached results, an entire historical record would be gone.



It's surprising that archive.org is the only such outfit I have encountered. Just like we have had libraries since ancient times, why are there so few digital libraries? There must be others, but nowhere near the number (or awareness) that we should have.

Heck, existing paper-based libraries should probably each include a digital archiving department.

Maybe this is already happening or already exists, and is trivial to those studying library science or something. I can hope, anyway.


There are lots of web archiving projects out there:

https://en.wikipedia.org/wiki/List_of_Web_archiving_initiati...

But the web is large. And public sector or academic librarian teams tend to be small. The IA's the one that people have coalesced around.


Excellent question.

Local neighborhood libraries could have their own curated digital archive, as cache for fast local search, and archival backup for long-term resilience.


> I still appreciate it, but just imagine if it goes down due to a lawsuit. Now that Google no longer shows cached results, an entire historical record would be gone.

Or somebody accidentally `rm -rf`'s an empty variable. Or The Big One hits San Fran. Or somebody in crisis breaks in with a crowbar, matchbook, and jug of gasoline.

They're a rather old-school shop. Own their own servers, all in one location I think. Bare metal admin stuff, and data's only mirrored across two disks per file IIRC. Keeps costs down. It's what makes the whole operation possible. But I also wonder sometimes.



If you check the issues, you'll learn this is not a supported project anymore (and honestly, it hardly worked even back then).


> Now that Google no longer shows cached results,

That was also the "end of an era" of sorts right there.-

> they don't encourage mirror sites to emerge,

Something over BitTorrent or blockchain would work well here, methinks. As a baseline substrate.-
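
To make the "baseline substrate" idea a bit more concrete: what BitTorrent-style distribution mainly buys you is that anyone can check a mirror's copy against a published list of piece hashes, without having to trust the mirror. Below is a rough sketch of that check only, not the actual BitTorrent protocol or anything the IA is known to run. The names `piece_hashes`, `bad_pieces`, and the piece size are made up for illustration.

```python
import hashlib

# Hypothetical piece size; real systems use much larger pieces.
PIECE_SIZE = 8


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> list[str]:
    """Split data into fixed-size pieces and return the SHA-256 hex digest of each."""
    return [
        hashlib.sha256(data[i:i + piece_size]).hexdigest()
        for i in range(0, len(data), piece_size)
    ]


def bad_pieces(data: bytes, manifest: list[str], piece_size: int = PIECE_SIZE) -> list[int]:
    """Return indices of pieces that are missing, extra, or differ from the manifest."""
    got = piece_hashes(data, piece_size)
    total = max(len(got), len(manifest))
    return [
        i for i in range(total)
        if i >= len(got) or i >= len(manifest) or got[i] != manifest[i]
    ]


# The archive publishes the manifest once; any mirror's copy can be checked against it.
original = b"archived page snapshot"
manifest = piece_hashes(original)

mirror_copy = b"archived PAGE snapshot"   # one piece silently altered
print(bad_pieces(mirror_copy, manifest))  # indices a client would re-fetch elsewhere
```

A client that finds bad pieces can fetch just those from another mirror, which is the property that makes a swarm of untrusted mirrors workable.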



