
Does "all" mean all the URLs publicly known, or did they exhaustively iterate the entire URL namespace?




They iterated the entire URL namespace by having volunteers run a client so they didn't get IP banned.

Are we sure that the entire URL namespace has been mapped?

How would that even function? I mean, did they loop through every single permutation and see the result, or what exactly/ how would that work?


> did they loop through every single permutation and see the result, or what exactly/ how would that work?

In short, yes. Since no one can make new links, it's a fixed, pre-defined space to search. They requested every possible key, recorded each answer, and uploaded the results to a shared database.
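
For anyone who wants to picture it, here is a rough sketch of that kind of exhaustive enumeration. It is not ArchiveTeam's actual pipeline: the base-62 alphabet, the key lengths, the batch size, and the `resolve` callable are all assumptions made for illustration, and no real HTTP requests are made here.

```python
import itertools
import string

# Assumption: keys come from a base-62 alphabet. The thread does not state
# the real goo.gl alphabet or key lengths.
ALPHABET = string.digits + string.ascii_letters


def all_keys(max_len, alphabet=ALPHABET):
    """Yield every key of length 1..max_len, shortest keys first."""
    for length in range(1, max_len + 1):
        # product() gives every sequence with repetition allowed.
        for combo in itertools.product(alphabet, repeat=length):
            yield "".join(combo)


def keyspace_size(max_len, alphabet=ALPHABET):
    """Count the keys that all_keys() would produce."""
    return sum(len(alphabet) ** n for n in range(1, max_len + 1))


def work_items(max_len, chunk_size, alphabet=ALPHABET):
    """Split the key space into batches that could be handed to volunteer clients."""
    keys = all_keys(max_len, alphabet)
    while True:
        batch = list(itertools.islice(keys, chunk_size))
        if not batch:
            return
        yield batch


def crawl(batch, resolve):
    """Record the answer for each key in a batch.

    `resolve` is a hypothetical callable mapping a key to its target URL,
    or None when the key is unused.
    """
    return {key: resolve(key) for key in batch}
```

Each volunteer would take one batch from something like `work_items`, run `crawl` on it, and upload the resulting mapping. Spreading batches across many clients is what keeps any single IP from sending too many requests.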


If you want to review the mechanics of the HTTP requests that were made, the pipeline code is available via the ArchiveTeam wiki links.

Beautiful. I wish I had seen this and could have helped.

They are still archiving other URL shorteners at https://tracker.archiveteam.org:1338/ and you can participate in that.

The goo.gl URLs that are publicly known are already in the Internet Archive and Common Crawl crawls.


