
Facebook has the world's largest Hadoop cluster - DanielRibeiro
http://hadoopblog.blogspot.com/2010/05/facebook-has-worlds-largest-hadoop.html
======
miguelrios
The post is dated "Sunday, May 9, 2010", just saying... those numbers may be different now.

~~~
DanielRibeiro
It was mentioned in the _Apache Hadoop Goes Realtime at Facebook_ paper[1],
presented June 12-16, 2011.

At least the Facebook research guys are still advertising it.

[1] ref 16 at <http://borthakur.com/ftp/RealtimeHadoopSigmod2011.pdf>

------
klausa
Does anyone know what caused such a big spike in data size around August '09?
Did they simply move more (or all) of their data into Hadoop, or is there another
explanation?

The number of machines stayed relatively the same, so it seems really weird that
they increased their data size 10,000-fold without even doubling their machine
count.

~~~
zach
Hmm. FarmVille was released shortly before that time, wasn't it? Just a
thought.

~~~
RyanKearney
Except Zynga holds all that information.

------
dwhitney
Bah! Largest?!? I'm absolutely certain Hadoop clusters of that size are
spawned on Amazon Web Services several times a day. In fact, I'd say something
more along the lines of, "wow, look at how old skool Facebook is!"

~~~
jpitz
You're making the arguably extraordinary claim that _someone_ is spinning up
Facebook-sized workloads on AWS, multiple times a day. Until you can back that
up with a source, you're going to get downvoted.

~~~
0x12
There is this article:

[http://developer.yahoo.com/blogs/hadoop/posts/2008/09/scalin...](http://developer.yahoo.com/blogs/hadoop/posts/2008/09/scaling_hadoop_to_4000_nodes_a/)

It's from 2008, references a cluster twice the size of the
Facebook one, and mentions several production clusters the same size as the
Facebook one.

They are (obviously, this being Yahoo!) not on Amazon though.

~~~
DanielRibeiro
From the article:

 _That's a total of more than 21 PB of configured storage capacity! This is
larger than the previously known Yahoo!'s cluster[1] of 14 PB._

I guess they are comparing data volume. But good catch pointing out that they
don't have the largest cluster by machine count.

[1]
[http://developer.yahoo.com/blogs/hadoop/posts/2010/05/scalab...](http://developer.yahoo.com/blogs/hadoop/posts/2010/05/scalability_of_the_hadoop_dist/)

