
Facebook's software architecture - colbyaley
http://muratbuffalo.blogspot.com/2014/10/facebooks-software-architecture.html?spref=tw
======
23david
I found this part interesting: "F4 currently stores over 65PB of logical data
and saves over 53PB of storage."

It looks like most of the savings here come from tuning the storage
replication level to the usage patterns of the data.

Also found it interesting that it doesn't talk about CDN usage.
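
As a rough sanity check on the quote above: physical storage is logical data times the effective replication factor, so the savings are the logical size times the drop in that factor. This is a minimal sketch. The factors used (about 3.6 for Haystack, 2.8 or 2.1 for F4) are an assumption taken from the F4 paper rather than numbers from the post, and the helper names are hypothetical.

```python
# Sketch: physical storage = logical storage x effective replication factor.
# ASSUMPTION: the factors below (Haystack ~3.6, F4 ~2.8 or ~2.1) come from the
# F4 paper, not from the blog post; treat them as illustrative.

HAYSTACK_FACTOR = 3.6


def physical_pb(logical_pb: float, replication_factor: float) -> float:
    """Raw PB consumed by logical_pb petabytes at the given effective replication factor."""
    return logical_pb * replication_factor


def savings_pb(logical_pb: float, old_factor: float, new_factor: float) -> float:
    """PB saved by moving logical_pb petabytes from old_factor to new_factor."""
    return physical_pb(logical_pb, old_factor) - physical_pb(logical_pb, new_factor)


if __name__ == "__main__":
    logical = 65.0  # PB of logical data in F4, from the quote above
    for new_factor in (2.8, 2.1):
        saved = savings_pb(logical, HAYSTACK_FACTOR, new_factor)
        print(f"{HAYSTACK_FACTOR} -> {new_factor}: saves {saved:.1f} PB")
```

With those assumed factors, moving 65PB saves somewhere between about 52PB and 97.5PB, which puts the quoted 53PB in the expected range; the post doesn't say how the data is split between the two configurations.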

~~~
nbm
What would you have liked to know about CDN usage?

(I work on the team that builds and runs the Facebook CDN infrastructure.)

~~~
23david
If the replication factor for blobs is calculated from the 'hotness' of the
data (by looking at the age for example), I'm wondering how CDNs come into
play.

Increasing the replication factor would allow for faster reads, but there
usually wouldn't be a need for that if you offload reads of hot data to a CDN
service (could be an internal CDN...).
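
To illustrate the offloading point: a read-through cache in front of storage (a CDN or an internal edge tier) absorbs repeat reads, so only misses reach the disks. This is a minimal sketch that assumes a simple LRU eviction policy. `ReadThroughCache` and `fetch` are hypothetical names, not Facebook's CDN code.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Hypothetical LRU cache placed in front of blob storage (e.g. a CDN/edge tier)."""

    def __init__(self, capacity: int, fetch: Callable[[str], bytes]):
        self.capacity = capacity
        self.fetch = fetch  # called only on a cache miss, i.e. a real storage read
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.backend_reads = 0  # number of reads that actually reached storage

    def get(self, key: str) -> bytes:
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.backend_reads += 1
        value = self.fetch(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value


if __name__ == "__main__":
    cache = ReadThroughCache(capacity=2, fetch=lambda k: k.encode())
    for key in ["photo1"] * 100 + ["photo2"] * 10:
        cache.get(key)
    print(cache.backend_reads)  # 2: only the first read of each photo hits storage
```

In this toy run, 110 reads turn into 2 storage reads, which is the kind of offloading that would reduce the need for extra replicas just to serve hot reads.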

~~~
nbm
There is a just-before-storage cache (which CDN edge requests pass through)
that reduces requests per object, but it does not affect the number of hot
objects per disk. With Haystack, each object is fully available with sane
performance even in a degraded state. With three full copies, this means each
disk handles about 1/3 of an object's I/O, whereas F4 has only one easily
available full copy, which means 100% of the load lands on one disk per object
stored.

Hope that makes sense - let me know if not.
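
The per-disk arithmetic above can be sketched as follows, under the simplifying assumption that reads for an object are spread evenly over every disk holding a readable full copy (three for Haystack, one for F4). The function names and the request rate are hypothetical.

```python
def per_disk_share(readable_full_copies: int) -> float:
    """Fraction of one object's read I/O each disk serves, ASSUMING reads are
    spread evenly across all disks that hold a readable full copy."""
    if readable_full_copies < 1:
        raise ValueError("need at least one readable full copy")
    return 1.0 / readable_full_copies


def disk_load_rps(object_rps: float, readable_full_copies: int) -> float:
    """Requests per second landing on each of those disks for this object."""
    if readable_full_copies < 1:
        raise ValueError("need at least one readable full copy")
    return object_rps / readable_full_copies


if __name__ == "__main__":
    rps = 300.0  # hypothetical post-cache request rate for one hot object
    print("Haystack-like, 3 copies:", disk_load_rps(rps, 3))  # 100.0
    print("F4-like, 1 copy:", disk_load_rps(rps, 1))  # 300.0
```

The cache in front of storage lowers `object_rps`, but the divisor stays the same, so in F4 a hot object's remaining load is still concentrated on a single disk.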

------
shaydoc
Excellent article. I like the architecture explanation. I also like the
simplicity of the description; it makes the decision-making process really
easy to understand.

I think it is a very logical solution. Naturally, when I think of Facebook,
it's the freshness of the data that matters to me: as a user I like to know
what's going on now rather than in older timeframes. So it seems quite logical
that managing their data this way is much more efficient.

------
akurilin
On this topic, where do you guys learn more about the architecture of large
conglomerates of web services and the latest trends?

I read High Scalability and the occasional company blog. Is there anything
else out there that might be even better? Blogs, books, forums, doesn't
matter.

~~~
dantiberian
I've learned heaps from reading the papers that these companies publish.
Google, Facebook, and Amazon all publish well-written, readable papers. If
you're looking for somewhere to start, the Dynamo paper is a classic:
http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf

~~~
gansai
Just curious: "learned heaps" --> did it just improve your knowledge base, or
did it actually help you solve a practical problem you were facing? I ask
because I'm just starting to read these kinds of papers. Thanks.

~~~
dantiberian
A bit of both. Some, like the Dynamo paper, helped me understand Cassandra
better, while others, like Spanner, aren't directly applicable to my
day-to-day work but introduced me to new ideas.

------
Yadi
Great post. Thanks, this is an awesome blog post; I love it when someone
summarizes the good stuff :D! I wish someone would do the same for the AWS
cloud papers.

Wow, TIL what TAO is and what it can do.

This part is very sexy to me: Facebook's new architecture splits the media
into two categories: 1) hot/recently-added media, which is still stored in
Haystack, and 2) warm media (still not cold), which is now stored in F4
storage and not in Haystack.
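
The hot/warm split can be pictured as an age-based routing rule. This is a minimal sketch that assumes age is the only signal. The 90-day `WARM_AFTER` cutoff and the `Blob`/`storage_tier` names are hypothetical, not values or APIs from the post.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# HYPOTHETICAL cutoff: the post says hot/recently-added media stays in Haystack
# and warm media moves to F4, but it does not give this threshold.
WARM_AFTER = timedelta(days=90)


@dataclass
class Blob:
    key: str
    created_at: datetime


def storage_tier(blob: Blob, now: datetime, warm_after: timedelta = WARM_AFTER) -> str:
    """Return 'haystack' for hot/recently-added media, 'f4' for warm media."""
    age = now - blob.created_at
    return "f4" if age >= warm_after else "haystack"


if __name__ == "__main__":
    now = datetime(2014, 10, 1, tzinfo=timezone.utc)
    fresh = Blob("new-photo", now - timedelta(days=2))
    older = Blob("old-photo", now - timedelta(days=400))
    print(storage_tier(fresh, now), storage_tier(older, now))  # haystack f4
```

A real system would likely also weigh request rate, not just age, before migrating a blob; this sketch only shows the shape of the decision.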

~~~
kukla
Some AWS related links from that blog:
http://muratbuffalo.blogspot.com/2010/11/dynamo-amazons-highly-available-key.html

http://muratbuffalo.blogspot.com/2012/10/building-fault-tolerant-applications-on.html

http://muratbuffalo.blogspot.com/2013/04/aws-summit-nyc-day-2-keynote.html

http://muratbuffalo.blogspot.com/2013/04/aws-summit-nyc-rest-of-day-2.html

http://muratbuffalo.blogspot.com/2013/04/aws-summit-nyc-day-1.html

~~~
Yadi
Oh my! This is Christmas for me right here! Thanks again, this is awesome!

------
JSno
The title of his blog post, "Facebook's software architecture", is misleading.
These papers are actually about data storage and databases, not "software
architecture". Still, his blog is informative and consistent. Love it!

~~~
EpicEng
Digital data and information storage is not a component of a software
architecture? Of course it is, and anyone working on a similar project would
need to include these layers in their architectural specifications.

~~~
oebs
One could argue that software architecture is the inner structure of an
application, i.e. the libraries, modules, classes, functions and, most
importantly, their interfaces. The GOF design patterns are purely a matter of
software architecture, for example.

System architecture, on the other hand, describes the different
applications/services that comprise a system, including the ones that you
write yourself, plus the ones that you just "use", e.g. database, storage,
load balancer, etc.

I would argue that the article mostly talks about system architecture with
focus on storage and I guess the parent you replied to would too.

