
The sites come from the torrent and from the Archive Team as well. I had to write some code that went through all the sites and updated the links. At the same time, I tried extracting just the HTML body and using that for indexing.... Yahoo! must still have the original sites. Surely they could just put them online as a "Read Only" version. They would have nothing to lose.
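
For anyone curious what that kind of pass might look like, here is a rough sketch of the two steps mentioned above (updating links and pulling out the HTML body for indexing). It is not the actual code used: `OLD_HOST`, `NEW_PREFIX`, `rewrite_links`, and `extract_body_text` are hypothetical names and values chosen for illustration, and it uses only the Python standard library.

```python
import re
from html.parser import HTMLParser

# Assumptions for illustration only: the original host name and the
# local path prefix the mirrored pages are served from.
OLD_HOST = "old-host.example"
NEW_PREFIX = "/archive"

# Matches href="http://OLD_HOST/ or src='http://OLD_HOST/ (either quote style).
LINK_RE = re.compile(
    r'(href|src)=(["\'])http://' + re.escape(OLD_HOST) + r"/",
    re.IGNORECASE,
)


def rewrite_links(html):
    """Point absolute links to the old host at the local mirror prefix."""
    return LINK_RE.sub(
        lambda m: f"{m.group(1)}={m.group(2)}{NEW_PREFIX}/", html
    )


class BodyTextExtractor(HTMLParser):
    """Collects text found inside <body>, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is inside <body> and not in a skipped tag.
        if self.in_body and not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_body_text(html):
    """Return the body text of a page as one space-joined string."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

A regex is a blunt tool for rewriting links, and pages of that era often have malformed markup (for example, no `<body>` tag at all, in which case this sketch would return an empty string), so a real pass would likely need fallbacks.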



> Yahoo! must still have the original sites.

Are you sure? It costs money to maintain hardware and infrastructure.



