I can also comment here. I built and still run a petabyte-scale web crawler:


Common Crawl and other sources do in fact offer a ton of usable data, and it's very affordable.

The DATA itself stopped being a real competitive advantage probably around 2008-2010.

Google's major advantage now is its algorithms, which have proven that they work and are reliable.

Most importantly, it's the brand. Google MEANS search in the US, and that won't change anytime soon.

PS: if you need tons of web and social data, Datastreamer can hook you up too :)

