
Does "all" mean all the URLs publicly known, or did they exhaustively iterate the entire URL namespace?


They iterated the entire URL namespace by having volunteers run a client so they didn't get IP banned.


Are we sure that the entire URL namespace has been mapped?

How would that even work? Did they loop through every single permutation and see the result, or what exactly?


> did they loop through every single permutation and see the result, or what exactly?

In short, yes. Since no one can make new links, it's a pre-defined space to search. They just requested every possible key, and recorded the answer, and then uploaded it to a shared database.
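For anyone wondering what "requested every possible key" looks like in practice, here is a minimal sketch, not the actual ArchiveTeam pipeline. It assumes a base62 key alphabet, and the `resolve` callback is a hypothetical stand-in for whatever HTTP lookup a real client would do; no network request is made here.

```python
import itertools
import string

# Assumption: keys are drawn from a base62 alphabet (0-9, a-z, A-Z).
ALPHABET = string.digits + string.ascii_letters


def keys(length, alphabet=ALPHABET):
    """Yield every possible key of the given length, in alphabet order."""
    for combo in itertools.product(alphabet, repeat=length):
        yield "".join(combo)


def keyspace_size(max_length, alphabet=ALPHABET):
    """Count all keys of length 1 through max_length."""
    return sum(len(alphabet) ** n for n in range(1, max_length + 1))


def scan(length, resolve, alphabet=ALPHABET):
    """Try every key of one length and record the ones that resolve.

    `resolve` is a hypothetical callback: it takes a key and returns the
    target URL, or None when the key is unused.
    """
    results = {}
    for key in keys(length, alphabet):
        target = resolve(key)
        if target is not None:
            results[key] = target
    return results
```

The keyspace grows by a factor of the alphabet size with every extra character, which is why the work gets split across many volunteer clients, each taking a slice of keys and uploading what it found.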


The pipeline code is available if you follow the ArchiveTeam wiki links, in case you want to review the mechanics of the HTTP requests it makes.


Beautiful. I wish I had seen this and could have helped.


They are still archiving other URL shorteners: https://tracker.archiveteam.org:1338/ You can participate in that.


The goo.gl URLs that are publicly known are already in the Internet Archive and Common Crawl crawls.



