Hacker News new | past | comments | ask | show | jobs | submit login

You limit the crawl time or number of requests per domain for all domains, and set the limit proportional to how important the domain is.

There's a ton of these types of of things online, you can't e.g. exhaustively crawl every wikipedia mirror someone's put online.






Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: