Hacker News

The trick is, of course, that it's nearly impossible to predict ahead of time what will be useful to someone. While you can probably sort out some of the spam, a comprehensive archiving project should err on the side of avoiding false positives when throwing things away.

Seems like a hard problem to solve. The low-hanging fruit would probably be detecting duplicates and combining them, which loses redundancy but handles all of those identical landing pages.
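
As a rough sketch of the duplicate-detection idea (not anything a particular archive is known to do), one could hash a normalized copy of each page body and group URLs by that hash. The function names, the whitespace/case normalization, and the choice of SHA-256 are all assumptions made for illustration:

```python
import hashlib
from collections import defaultdict


def fingerprint(body: str) -> str:
    """Hash a page body after collapsing whitespace and case (assumed normalization)."""
    normalized = " ".join(body.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_duplicates(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each content fingerprint to every URL that served that content."""
    groups = defaultdict(list)
    for url, body in pages.items():
        groups[fingerprint(body)].append(url)
    return dict(groups)


# Hypothetical example data: two parked-domain landing pages and one real page.
pages = {
    "http://a.example/": "This domain is for sale!",
    "http://b.example/": "this domain  is for\nsale!",
    "http://c.example/": "Notes on restoring a 1970s synthesizer.",
}
groups = group_duplicates(pages)
```

Storing one body per fingerprint while keeping the full URL list means nothing is actually thrown away: you lose the redundant copies but not the record that each URL existed. This only catches copies that are identical after normalization; landing pages that differ by a domain name or timestamp would need some near-duplicate technique instead.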


