
[pdf] Twitter’s Real-Time Related Query Suggestion Architecture - Anon84
http://arxiv.org/pdf/1210.7350v1.pdf
======
bslatkin
Why didn't they just run Hadoop on a sample of the logs instead of the full
dataset? It seems like they were just looking for popular related queries
anyways. Any ideas?

