Hacker News
gcb0
on July 5, 2018
on:
How to crawl a quarter billion webpages in 40 hour...
Just do what browsers did with user-agent strings: call your bot "botx (google crawler compatible)" and crawl everything that allows Googlebot, without any weight on your conscience.
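For anyone wondering how a robots.txt parser would actually treat a name like that, here is a rough sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `example.com` URL, the `allowed` helper and the "botx (Googlebot compatible)" variant are all made up for illustration. Matching rules differ between parsers, so this only shows what this one library does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may crawl everything, everyone else nothing.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def allowed(user_agent, url="https://example.com/page"):
    """Return True if this robots.txt lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    candidates = [
        "Googlebot",
        "botx (Googlebot compatible)",       # assumed variant containing the token
        "botx (google crawler compatible)",  # the name as written in the comment
        "SomeOtherBot",
    ]
    for name in candidates:
        print(f"{name!r}: {allowed(name)}")
```

This parser does a case-insensitive substring check of each `User-agent` value against the bot's name, so a name only picks up the Googlebot rules if it literally contains "googlebot". The name exactly as written above does not, and falls through to the `*` rules.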