
Realtime Hadoop usage at Facebook: The Complete Story - LiveTheDream
http://hadoopblog.blogspot.com/2011/07/realtime-hadoop-usage-at-facebook.html?m=1
======
chuhnk
This is extremely interesting stuff. The posts continue to tease the problem
of having to process millions of events in real time. Here is the direct link
to the paper <http://borthakur.com/ftp/RealtimeHadoopSigmod2011.pdf> in case
anyone doesn't feel like clicking through. What I'm very drawn in by is
Facebook's continued and heavy use of MySQL. They didn't run off to some NoSQL
based datastore when things got tough; they learned to scale MySQL and
leveraged its strong points while trying to minimize exposure to its
weaknesses. And then they looked at alternatives that complement their
existing systems. This is scaling at its best, and as a system administrator
it's the kind of challenge I find both fascinating and exciting.

~~~
simonw
"They didn't run off to some nosql based datastore when things got tough" -
not exactly... they did create Cassandra, and these days they run quite a few
things on top of HBase.

~~~
chuhnk
Yes, however I imagine it wasn't the solution they went to in the
beginning. They continued to work with MySQL and then implemented caching
layers before getting to the point of needing another datastore.

------
seagaia
Interesting, as much as I don't really like Facebook.

I can only see Hadoop being used even more widely now, or spinoffs and
improvements of it as needed, unless Hadoop itself really can scale as far as
people need.

At least, that's what this article tells me.

<http://www.infoq.com/news/2011/06/hortonworks>

~~~
blumentopf
It sure is difficult to find companies that are NOT using Hadoop for somewhat
morally questionable stuff (like ad targeting).

It's impressive what the Facebook guys did with Hadoop/HBase, but it looks
like they patched here and there and everywhere to make it work for their
specific use cases. Makes me wonder if a from-scratch redesign is in order to
really get realtime processing to work properly with Hadoop.

------
sneak
Is Facebook contributing these changes upstream?

