My beef with Hadoop and other big data tools is that for pretty much any task other than outlier detection, sampling works just as well, and it is cheaper and easier to manage and reason about.
Even Google, the king of big data, will sample your hits on Google Analytics if your site gets too much traffic.
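To make the sampling point concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the data is synthetic (pretend page-load times), and the names `sample_mean_estimate`, `population`, and the sizes are my own assumptions, not anything from Hadoop or Google Analytics. It compares the mean of the full dataset with the mean of a 5% uniform random sample.

```python
import random
import statistics


def sample_mean_estimate(values, sample_size, seed=0):
    """Estimate the mean of `values` from a uniform random sample (hypothetical helper)."""
    rng = random.Random(seed)
    # Sample without replacement, then average only the sampled values.
    sample = rng.sample(values, sample_size)
    return statistics.fmean(sample)


# Synthetic "full dataset": 200,000 made-up page-load times in milliseconds.
data_rng = random.Random(42)
population = [data_rng.gauss(200.0, 50.0) for _ in range(200_000)]

# The expensive answer: scan every record.
full_mean = statistics.fmean(population)

# The cheap answer: look at 10,000 records (5% of the data).
estimate = sample_mean_estimate(population, 10_000)

print(f"full mean:   {full_mean:.2f} ms")
print(f"sample mean: {estimate:.2f} ms")
```

With a standard deviation of 50 ms and 10,000 sampled rows, the standard error of the sample mean is about 0.5 ms, so the two numbers should agree closely. The same sample is a poor tool for finding the single slowest request, since a rare extreme value will usually not be in it, which is the outlier-detection exception.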