
I'd wager any startup that tries to crawl a few sites like Amazon, Yelp, LinkedIn, etc. will be blocked. Google, however, gets a pass because they're Google. So yes, I believe their huge index and their ability to crawl any site at will are a huge, huge advantage for them.



I built a search engine that was able to crawl Amazon and Yelp. The toughest sites were Reddit and Facebook.


At scale? Millions of pages a week? And now? I wrote a crawler that could crawl Amazon as recently as a year ago too, but now it doesn't work.


And Google sucks at those too.


Amazon lets anyone crawl them. Yelp has a whitelist, and no, you can't get on it. LinkedIn has a whitelist, and no, you can't get on it. Facebook has a whitelist, and no, you can't get on it.



