Hacker News

While I agree that it amounts to whitewashing biased data, they may still get good results on new, unseen URLs that try to pass as a legitimate page by using the same tricks as the scam URLs in the corpus.
