Is it possible to host Facebook on AWS? (sqlizer.io)
198 points by vinnyglennon on June 28, 2017 | 154 comments

I don't understand why people think that companies can get special pricing from AWS but not drive feature development. Large customers like Netflix, GE & friends get to drive feature development. Heck, if you're the CIA and give AWS $600 MM they will build a private region for you!

This post has two contradictory quotes in my opinion:

"It’s worth noting that this AWS price wouldn’t be what Facebook paid in this hypothetical situation. Much like Snapchat and Netflix, Facebook would be a heavy and influential user"

"Facebook’s years of specialization for running Facebook are in contrast to AWS, whose storage is designed for multi-purpose (albeit-heavy) use." - Amazon already has several tiers of storage (S3, S3 - Infrequent Access, Glacier, CloudFront)... who is to say that they wouldn't be motivated to introduce more classes by a large customer like FB.

Sidenote: It's remarkable that Amazon happily provides server services to Netflix, who is a huge competitor!

It's not remarkable; it's Bezos' business strategy.

The way he thinks about things is it's not a threat to his business to have competitors in his space not just because he is prepared to undercut them on cost but because he knows that his true focus is on customer satisfaction. And competition, whether through providing customers with a choice or encouraging his own systems to provide better services is good for his business.

But in this case it's really just because Netflix is too insignificant to matter to Amazon overall. It's just a tiny fraction of their revenue and what they're trying to do.

And if you want to be Machiavellian, of course it's good because then Amazon has leverage over Netflix.

Seriously how would it look if Amazon turned down a company because they were competing? It would be an admission of defeat and weakness for AZ. That's not how an "apex predator" thinks.

Keep your friends close, and your competitors....

Keep your friends close, and your competitors hosted

I love that, seriously. Machiavelli's Guide to PaaS.

> Seriously how would it look if Amazon turned down a company because they were competing? It would be an admission of defeat and weakness for AZ. That's not how an "apex predator" thinks.

It looked shitty when it happened with Amazon and Apple

Could you elaborate on what happened? I don't know about this.

I think it's referring to them removing the AppleTV from amazon.com as well as not supporting playback of Amazon Video on the AppleTV https://www.wired.com/2015/10/amazon-apple-tv-chromecast/

The latter decision has been reversed now though https://techcrunch.com/2017/06/05/amazons-prime-video-app-is...

Also, it isn't really Apple's fault for not building an Amazon video app, is it? I mean, it's not like they denied them access to the App Store.

I found this to be really stupid behavior of Amazon. Seemed like they only wanted to push their competing product (Fire TV).

It's good they figured things out, though. A Prime Video app was announced at WWDC.

> Seriously how would it look if Amazon turned down a company because they were competing?

It would look bad: https://www.wired.com/2015/10/amazon-apple-tv-chromecast/

Hmm, sounds like it's the new "embrace, extend, extinguish".

That's ascribing emotions to a situation where they don't really matter.

Amazon's selling. Netflix is buying. It'd be stupid to refuse the money. Netflix wouldn't lose much sleep over it–at their size, building their own infrastructure can't be much worse than renting.

Plus you'd get the idea out there that Amazon isn't the sort of reliable utility company their marketing suggests. They have their fingers in so many markets, a third of their customers would start to fear being cut off.

AWS is also a separate division within Amazon, and probably not lacking self-confidence. The people are judged by the numbers of AWS, and they'd probably resist attempts to get them to forgo revenue just for the sake of some other division's feeling.

I can see their thought process—

"Netflix will most likely exist even if we don't provide them server services as there exist other cloud providers and / or Netflix is big enough to roll their own, so we might as well have a piece of our competitor's success by having their infra spend come to us."

Netflix absolutely should roll their own system. At the very least as a meta layer on top of the major cloud providers.

We have, and we open sourced it :)


Here is a blog post (2010) describing the decision to move out of our own data centers:


To me, the most important part of Netflix isn't the encoding or compute parts (AWS), but the delivery. And all of their delivery networks are already run by themselves through Open Connect and OP Server.

So they already have their own system. As some below have mentioned, they could simply move to Azure if they wanted without much trouble.

So Netflix now needs to hire and manage a huge infrastructure team that they don't have today. (Not to say they don't have ops folks but the incremental need would be enormous.)

That's not to say it might not be a worthwhile approach over the long run but it would be a huge strategic investment in something that has essentially nothing to do with what they view as their core differentiating service to customers.

Would Netflix have done things differently if they had known they'd reach this level of usage? Perhaps. But disentangling now would be very difficult and almost certainly disruptive--including a likely reduction in quality of service.

Their infrastructure team + infrastructure would be cheaper than their Amazon spend, guaranteed (based on known Amazon margins).

And disentangling would be disruptive? Stand up your new datacenter and shift your apps over with DNS when they're ready to serve production traffic. If you can move traffic between regions, you can move it between on prem and cloud.

You assume that Netflix pays retail price.

I don't. I assume a deep discount because Amazon uses them as a shining case study, despite only their control plane/front end hosted with AWS.

I'm pretty sure they do a lot of their own stuff, AFAIK, they run their own CDN, and then use AWS for everything else.

Why do you think they don't?

What's more remarkable is that Netflix uses them! If I were Netflix I would not fund a competitor unless it was truly the best, most cost-effective option by a significant margin. Amazon's margin isn't in retail, it's in infrastructure.

A lot of big companies that compete with Amazon in some vertical have a no-Amazon policy for infrastructure. Some even require their business associates (vendors, etc) commit to contracts to do the same.

I work at a company that sells ebooks and directly competes with Amazon in that space. All of our infrastructure is on AWS. We could move elsewhere easily if we had to, but we probably won't. A good friend who works over at AWS said that they were "more than happy to have [our] business". In terms of net revenue it's probably way in their favor.

It's certainly good PR for Amazon to treat even serious competitors fairly on AWS.

Amazon seems to want to be in every business sector there is, so almost any company could end up competing with them.

Making money from your competitors is a good thing. It also ensures their margin is worse than yours, all else being equal.

If Amazon didn't host Netflix, Netflix would have been hosted on a competitor's platform. Nothing would have changed about Netflix and Amazon, plus many others, competing in the production and delivery of entertainment content.

Serious question: Would it be legal for them not to?

IANAL but I don't see a judge deciding that AWS doesn't have a million competitors in every dedicated server host or collocation datacenter, even when AWS was the only cloud infrastructure provider. As long as antitrust laws don't apply, freedom of association is pretty unassailable.

See the recent Walmart / Amazon kerfuffle; it's like employee non-competes to me: http://fortune.com/2017/06/21/walmart-amazon-whole-foods/?__...

Yes, they don't have a monopoly or even close to one in VMs.

Or video streaming.

ianal but I was under the impression that in the US you need to provide equal service to any customer, or else be ready to tread very carefully to avoid legal consequences. This doesn't mean you have to give a competitor special pricing, but I think it does mean you have to sell them your catalog listed products and services at list price, or else be prepared to head to court eventually.

Where did this impression come from? I thought basically the opposite was true: that you could do business with whomever you want, as long as you're not discriminating against a protected class of people.

These aren't so much contradictions as indicators that the reasonable answer to the question posed is really a simple 'No'. The only sane way it becomes yes is 'Yes it's possible if Facebook rewrote a bunch of Facebook AND Amazon rewrote a bunch of AWS'. That kind of possible isn't very interesting, though. It's surely possible to host Facebook on AWS if Amazon built an exact replica of each FB data center, trucked FB's servers over and then called the result part of AWS.

I think a lot of the smaller details this article glosses over contribute to FB's ability to run on so few machines. For example, comparing my anecdotes of FB's internal network (they probably have the most organized/mature IPv6 network on the planet) with those of my AWS public cloud experience, I'd reckon that a lot of FB applications would require architectural redesigns to operate in that kind of environment.

The real question isn't "can FB be hosted on AWS?", it's "why isn't FB competing with AWS?" because what they've already got is much better for the range of applications that they deploy.

I suppose the reason Facebook doesn't resell their infrastructure is because their endgame is to _be_ the internet. From their point of view, any site you're on that isn't Facebook is their competition.

FB open sourced their infrastructure. Anyone can lift these designs and build their data centers using identical specs.


I think both points are true: FB does want to be the Internet, but they also realize that the path to that endgame lies in capturing (and driving) behavior, not in building infrastructure.

The latter is just a necessary tool. But, you're not going to threaten FB's mission by simply building out an identical (or even superior) infrastructure. So, there's virtually zero-risk to them in open-sourcing it.

...which gives you a moderately-expensive pile of laser-focused hardware and nothing to run on it. Far less than half the infrastructure story.

(Not that I'm criticizing Facebook. Open-sourcing code is hard -- see Borg.)

Nothing to run on it??

Correct. "Infrastructure" means a lot of things. You don't get Tupperware, their scheduling and resource management infrastructure, most of their logging infra, most of their monitoring infra, operating systems for the whitebox hardware, even power and facilities management for the OC stuff... I could go on. They are quite good about releasing open source, which is why I was careful to point out that I'm not criticizing Facebook; open sourcing this stuff is really hard, because internal software develops tendrils to other internal things that you often don't want to open source.

The point was that yes, Open Compute exists, but its existence is only partially relevant when one claims that Facebook has opened their "infrastructure." The two sentences in GP are basically what I'm responding to -- mainly the verbatim "FB open sourced their infrastructure" -- and I guess the implied context of Facebook infrastructure was lost when I said "nothing to run on it," sorry. I meant Facebook code. So this is clear: Open Compute is good, and I'm appreciative of Facebook, just want to make sure we understand what "open infrastructure" means and its limitations in this case.

Isn't there any software that would let you run your own social network like facebook on that hardware?

Sure, you could write a social network app (or any other server-side app) and run it on that hardware. It just wouldn't take advantage of any of the Facebook-specific features on that hardware. So you'd be paying for a lot of stuff you're not using. For Facebook, it's more valuable to use it as Facebook hardware than to rent it out as generic servers, because Facebook actually uses all its proprietary features.


Maybe, but I think Facebook understands that great products attract users, not scalable/speedy infrastructure.

That's not entirely accurate. Page-load time has a significant effect on user retention. Yes, it's important to have a great product, but being fast is part of being great.

That's exactly why only a fool will use reactjs in their web design, because FB will put you out of business and if you sue them for infringing your patents, your reactjs license is revoked

To be clear: reactjs uses the 3 Clause BSD license. https://github.com/facebook/react/blob/master/LICENSE

It's incredibly permissive (nearly the same as the MIT license), and does not allow for the concept of having it "revoked".

That's exactly why only a fool would build their software product around patent protection.

I see lots of down voting on this person, but not many replies. Is there truth to this comment? Is React licensing something companies should avoid?

Not a fan of FB here. That comment is nonsense. It is completely unthinkable that Facebook would do this.

> It is completely unthinkable that Facebook would do this.

Facebook is (among many other things) a software company, and software companies do warfare with patents. It might be their least favorite weapon, one they have yet to use - but "unthinkable" can only refer to your beliefs -- it is actually extremely thinkable, even if at this point in time improbable.

The lifetime of JS frameworks is so short that this is not a realistic concern, beyond that the license makes it impossible. Sorry but this is not a scenario that will play out in some way that you'd have to be worried about it when evaluating your choices today.

For react, you are probably right.

I'm really concerned about zstd working its way into some future de-facto standard file format, though - I think the name "zstd" is marketing genius (of the evil kind).

if you're a threat to FB, the unthinkable will happen.

If you're a threat to FB you:

(1) can afford to roll your own framework

(2) can afford the lawyers required to point out that this doesn't hold water

totally true if you're google; however, if you're a startup and build your platform on react, it would be really hard to switch to another platform once you have millions of lines of code.

I didn't know this, this needs to be more well known.

How does this affect libraries like preact?

It's not accurate. Facebook offers a blanket patent license with React, and the patent license (not the software license — just the patent one) terminates if you sue Facebook for patent infringement. And of course, if you sue them for any other reason, nothing happens at all. And even if the patent license is terminated, AFAIK React isn't known to be patent-encumbered.

Acting like this is some landmine that allows Facebook to just put anyone they dislike out of business is ridiculous. Other JavaScript frameworks like Angular and Ember don't even include patent clauses, so even if Facebook revokes the patent grant, that leaves you in the same position that you're in by default with Angular. The license even specifically allows you to counter-sue Facebook if they already have a lawsuit against you.

I think the cause of the complaint is that if they wanted these things in a license, they could've used Apache 2 and been done with it.

Any confusion that arises from deciding to depart from well-understood licenses is fairly laid at their door, in my view.

Apache 2 is somewhat less permissive than Facebook's choice of BSD + patent grant. I agree that people probably wouldn't have been able to spread the FUD so successfully if they'd gone for a more conventional setup, but the licenses are short (probably shorter than Apache combined) and pretty clear. It feels pretty unfair to say "Facebook's open source dev team deserve to have their names run down because they chose more permissive license terms than we're used to."

> Acting like this is some landmine that allows Facebook to just put anyone they dislike out of business is ridiculous.

That's true; however,

> React isn't known to be patent-encumbered.

That's a useless assertion. Many patents only become known once the lawsuits start. Facebook has amassed a patent portfolio, and patents are a legal instrument that can only be used offensively.

Sure, but none of that makes it a useless assertion. I'm not saying React is a magic shield from lawsuits — I'm saying that React doesn't appear to be any more dangerous in this regard than the majority of open-source software. You could validly express the same concerns about nearly anything.

Remember, the claim I was disputing was that "only a fool will use reactjs" because the patent license is conditional. All I'm saying is that I don't see any reason to be more concerned about React than, say, Angular.

It is rather well known but it's mostly FUD.

for those who downvoted me, can you explain why I'm wrong?

Would you believe that Apple today would use react in their products?

What would Steve Jobs say to someone who suggests using react in apple software/saas?

downvoting on HN is being abused here, just because someone is a big fan of reactjs and disagrees with me, they automatically downvote a valid opinion based on facts.

see for yourself, from react patent clause "The license granted hereunder will terminate, automatically and without notice,"


Yes, Apple is known to use React.

If you are in doubt, the license that could terminate is just the patent grant. The copyright license would remain in effect.

I explained how you're wrong here: https://news.ycombinator.com/item?id=14659675

The TL;DR is: "That is just a patent license, and most OSS doesn't even include a patent license at all, so it seems weird to be super-concerned in this case in particular."

Thanks for the reference, but I read your earlier comment.

here's the issue when picking reactjs vs others (like vuejs)

Choice A. by picking reactjs I'm in a clear violation of FB patents, but I'm covered as long as I don't sue FB.

Choice B. by picking others (like vuejs), I may or may not be violating FB patents, (virtualDOM has some prior art, as some people claim)

why is it so hard for people to understand that "may or may not" is a better choice than "definitely" violating FB patents, in case I need to sue FB if they are killing my business by copying my patents?

I know that suing for patents is not best business model. But, having that weapon (even defensively) is better than not having one.

still people downvote for gray areas?

You're being down voted because this core bit of your position is a load of crap:

> Choice A. by picking reactjs I'm in a clear violation of FB patents

Please tell us what Facebook patents one is "violating" by using React.

all tech companies file patents to protect their cutting edge technology, so it's common sense FB has patents on react.

your argument of where are the FB patents on react is an example of your naivety

my position all along has been, using other libs than react is LESS likely to violate FB patents, and thus gives more freedom to compete with FB. if that's your definition of load-of-crap, well, that's not my definition.

They went from $3.6 billion to $10.2 billion in net income in a year. The $3.6 billion is probably more net income than Amazon has generated combined in the last two decades. The $10 billion is more than Amazon will earn a decade from now.

Their focus should be solely on what is going to likely end up being one of the three or four greatest money producing machines in world history (of those owned by a traditional corporation that is).

Cloud computing profits are - and will remain - a joke compared to the $20 billion in net income they'll yield in just a few more years from what they're already doing. Their focus should remain fairly narrow, they're a mere 13 years in at this point. Diving into cloud computing would be making the same exact mistake that Microsoft made with Bing, and that Google made with all their laughable social attempts, and so on. Facebook would split the market further and likely still end up #2 or #3 at best. Their best engineering talent should be solely focused on the golden goose social monopoly, which nobody else has and nobody else can compete with.

> Cloud computing profits are - and will remain - a joke compared to the $20 billion in net income they'll yield in just a few more years from what they're already doing.

I think the market for cloud computing is very, very large.

Consider that it's a composite of "all server hardware vended or leased" and "a significant fraction of middleware and server software vended".

Microsoft are in it because they follow their existing rivers of gold as they shift. Google are in it because they desperately need a second river of gold to allow them to survive unexpected collapses in advertising revenue.

> Cloud computing profits are - and will remain - a joke compared to the $20 billion in net income

Businesses are different.

High margins also mean more vulnerability.

Amazon doesn't like margins. Andy Jassy routinely mocks "the old guard's" obsession with margins.

The probability that FB suddenly dies because of some new competitor is probably 10x that of Amazon. Just a random statement, but you get the idea.

As for significance, cloud computing is much bigger: it runs the actual computing that produces things that are more valuable than ads. This is in a general sense; I do not want to judge their social impact.

Cloud computing is indeed lucrative; look how AWS saves Amazon (AWS generates almost 1B in net income in a single quarter right now). But the initial commitment is absolutely massive, and before they can really become competitive in cloud computing, they are probably going to lose a substantial chunk of money. Compared to Google, Facebook is even more of an advertising and media company; wholesales might be what they want to dig in.

"Their focus should be solely on what is going to likely end up being one of the three or four greatest money producing machines in world"

Companies don't "produce money", that would just devalue currency.

I guess you had a good go at making it sound cool.

Is this your first time reading figurative language?

> The real question isn't "can FB be hosted on AWS?", it's "why isn't FB competing with AWS?" because what they've already got is much better for the range of applications that they deploy.

This. When they bought Parse I thought for sure they would get into the same business as AWS / Azure / etc but then they phased it out. They have so much network infrastructure I thought for sure they would want to offer such services and yet they still have not.

I'm starting to think they won't ever get into the same business.

Nah, see Yegge's platform rant: https://plus.google.com/+RipRowan/posts/eVeouesvaVX. Facebook already is a platform. They have no need to go into platform-of-a-platform territory. Their platform is what they've created already, so they've dedicated hardware/network designs strictly around that. Going "metaplatform" would just make their own platform slower.

Also to recognize: Amazon is a store, not an application. It doesn't need specialized hardware or whatever. Its customers care much more about package delivery speed / cost, etc than whatever hardware is running in the background. So it can box the same generic web servers it uses and sell them at scale. Facebook is entirely digital. It needs to specialize everything hardware-wise in order to stay ahead. The things it optimizes for, in terms of hardware, likely has no relevance to 99.9999% of basic CRUD apps out there. So they stick to what they do best. Their own platform.

Interestingly, Parse was hosted on AWS until the very end even after being bought by Facebook.

One of the big things it glosses over is core counts... counting "servers" ignores that AWS has current-gen instance families with as few as 32 VCPUs in their largest instance type versus 128 VCPUs in the x1 family. If you look at multiple generations of families the difference between "servers" becomes even more stark.

Please expand on this.

I think what he means is that not all servers are created equal, so counting "servers" is a useless metric. An EC2 32xlarge instance with 128 VCPUs is going to handle far more traffic than a large number of smaller servers.
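To make that concrete, here is a minimal sketch that compares two fleets by total vCPUs rather than by server count. The 32 and 128 vCPU sizes are the ones mentioned above; the fleet sizes and names are invented purely for illustration.

```python
# Hypothetical fleets: (instance count, vCPUs per instance).
# The 32 and 128 vCPU sizes come from the comments above; the counts are made up.
fleets = {
    "many_small": (400, 32),
    "few_large": (100, 128),
}


def total_vcpus(count: int, vcpus_each: int) -> int:
    """Total compute capacity of a fleet, measured in vCPUs."""
    return count * vcpus_each


for name, (count, vcpus_each) in fleets.items():
    print(f"{name}: {count} servers, {total_vcpus(count, vcpus_each)} vCPUs")
```

In this made-up case one fleet has four times as many "servers" as the other, yet both add up to the same number of vCPUs, which is why a bare server count says little about capacity.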

This is true, and something I've wondered too. They have an outstanding infrastructure team.

> The real question isn't "can FB be hosted on AWS?", it's "why isn't FB competing with AWS?"

A naive guess would be that providing services for users you don't control is harder than for users you do control.

> If we take the 2012 estimate to be true, Facebook’s server capacity outstrips Moore’s Law.

As well as inflation, and all manner of other totally irrelevant concepts.

It could be relevant if there is some fixed refresh cycle of servers. That is, newer servers can serve more end users because of the faster cpu, drives, etc. The author seems to have skipped the concept though.

Yeah, I looked through the article, but never does he mention that one rack of today's server vintage might have an extra order of magnitude more RAM, if not cores, than one from 5 years ago.

Ten years ago, four-core Barcelona Opterons were so hot and in high demand that, if you wanted to be on the latest servers, you had to show a performance boost worth the premium. Sometime this year you'll be able to buy dual processor Naples systems with 64 cores (128 threads).

I assume that their assumption is that the facebook application has grown in complexity to keep pace with better servers. There is certainly a lot more going on than there was several years ago.

is it really irrelevant? It's a well understood benchmark for tech growth rate is it not?

It's a benchmark for transistor density, nothing else. Of course a single company could grow faster than transistor density can scale. The only possible insight, if you can call it that, is that facebook owns more square centimeters of transistors now than they did in 2012.

That's the original inspiration, but I thought people adopted the exponential rate from the original context as a benchmark for all sorts of stuff. https://en.wikipedia.org/wiki/Moore%27s_law#Other_formulatio.... At best I can figure the problem with the author's usage is that it's not relative to a per unit performance but rather a per enterprise performance, ie not a faster server but more of them over a unit of time. I'm not entirely convinced the restriction to a per unit basis is valid, but maybe that's a more meaningful definition than just arbitrary exponential growth.

I'm a bit mystified by all the downvotes, am I mistaken in understanding that Moore's law can be understood in a generalized context with respect to the rate not the quantity the rate was originally measured from? Basically he observed a particular rate constant or exponential growth profile. The quoted wiki page states: "Several measures of digital technology are improving at exponential rates related to Moore's law, including the size, cost, density, and speed of components. Moore wrote only about the density of components, "a component being a transistor, resistor, diode or capacitor,"[102] at minimum cost."". So essentially when comparing the growth of one quantity vs another it's not the quantities that are being compared but the rates.

In 2012 Facebook had 1BN users and 180K servers. 1,000,000 users / 180,000 servers = 5,556 users per server. In 2017 Facebook has close to 2BN users. 2,000,000 users / 5556 users per server = 360,000 servers

Notice how the 1B has only 6 zeros. The awesome part is that this mistake evens itself out later, yielding a correct number for "The of servers".
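For anyone who wants to check the arithmetic, here is the same calculation with the user counts written out in full (1BN = 1,000,000,000). The function and variable names are just for illustration; the input figures are the ones quoted from the article.

```python
def users_per_server(users: int, servers: int) -> float:
    """Average number of users handled by each server."""
    return users / servers


users_2012 = 1_000_000_000   # 1BN, nine zeros
servers_2012 = 180_000
users_2017 = 2_000_000_000   # "close to 2BN"

ratio = users_per_server(users_2012, servers_2012)
servers_2017 = users_2017 / ratio  # servers needed at the same users-per-server ratio

print(round(ratio))         # users per server in 2012
print(round(servers_2017))  # servers needed in 2017
```

With the full user counts the ratio is about 5,556 users per server and the 2017 figure comes out to 360,000 servers, which is simply double the 2012 count because the user base doubled.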

Keep in mind, you can bet your ass that prices would be drastically lower than the public prices posted.

Large enterprises of any significance, and especially flagship/strategic customers, don't pay list prices.

Every cloud provider has private pricing and enterprise discount programs. That goes for all hardware vendors (e.g. Cisco, Palo Alto Networks, Oracle, etc selling to any company) as well.

I'm sure Netflix pays nowhere near public pricing on AWS and I'm sure AWS pays less than anyone in the world for an Intel CPU.

wholesale vs retail costs...

I'd have to assume Facebook is using well over 500TB/mo of outbound traffic. I'd assume on the order of 1-2Tbps or more on average.

Disclaimer: I work for AWS.

Well over. 500TB a month is only 250KB per active user.
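A quick sanity check of that per-user figure, as a sketch assuming decimal units (1 TB = 10^12 bytes) and roughly 2BN active users as stated in the article:

```python
TB = 10**12  # decimal terabyte, in bytes (assumption: decimal rather than binary units)
KB = 10**3   # decimal kilobyte, in bytes

monthly_outbound_bytes = 500 * TB
active_users = 2_000_000_000  # "close to 2BN" per the article

per_user_kb = monthly_outbound_bytes / active_users / KB
print(per_user_kb)  # 250.0
```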

Oh. I see this might've just been a typo. Agree with the 1Tbps+ figure for external bandwidth (and maybe 300%+ on that internally and between regions).

500TB/mo is nothing for FB or AWS.

Hope you don't work with anything to do with scale at AWS.

You're looking more towards 1 Exabyte a month...

The author of the article makes the assumption of 500TB/mo in the article. I'd agree, probably more like 1EB+ per mo. I guessed the 1-3Tbps figure, I haven't seen FB publish anything around that.
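To relate the Tbps guesses to the per-month figures, here is a rough conversion sketch. It assumes a 30-day month, decimal units, and a perfectly constant transfer rate, none of which come from the article; the helper names are made up.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month
PB, EB = 1e15, 1e18                 # decimal petabyte and exabyte, in bytes


def tbps_to_bytes_per_month(tbps: float) -> float:
    """Bytes transferred in a month at a sustained rate given in terabits per second."""
    bits_per_second = tbps * 1e12
    return bits_per_second / 8 * SECONDS_PER_MONTH


def bytes_per_month_to_tbps(nbytes: float) -> float:
    """Average rate in terabits per second needed to move nbytes in a month."""
    return nbytes * 8 / SECONDS_PER_MONTH / 1e12


print(tbps_to_bytes_per_month(1) / PB)  # petabytes per month at a sustained 1 Tbps
print(bytes_per_month_to_tbps(1 * EB))  # average Tbps for 1 EB per month
print(bytes_per_month_to_tbps(500e12))  # average Tbps implied by 500 TB per month
```

Under these assumptions a sustained 1 Tbps is roughly 324 PB per month, 1 EB per month averages a bit over 3 Tbps, and 500 TB per month works out to only about 1.5 Gbps.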

Reddit and Netflix are hosted on AWS so yes you could.

You wouldn't run it the same as if you had your own infrastructure though, and probably you would run it more horizontal and cached but it could be done much cheaper than this estimate. On your own environments you can use larger servers, dbs and sharding or clustering is closer. But the cloud is different and you'd be able to run it but differently. The cloud definitely influences architecture.

> Reddit and Netflix are hosted on AWS so yes you could.

Those are read-heavy properties (yes, even Reddit, a miracle of aggressive caching). Don't underestimate Facebook's write load, which is the bloody difficult thing to scale.

Not sure if I buy that Reddit is read-heavy compared to Facebook, got any data on that?

No comparison -- I wouldn't buy that either. Just saying they're an eyeball property, with a somewhat moderate write load all things considered. Many billions of page views versus their ~millions DAUs is my gut feeling there, since there are probably far more people reading threads off of Akamai than interacting.

Little bit (somewhat outdated) here: http://highscalability.com/blog/2013/8/26/reddit-lessons-lea...

Facebook is recording screen movement and a variety of other logging that reddit wouldn't be recording.

User actions like upvoting vs. liking, per user, probably fall in Reddit's favor, but the content being shared on Facebook is much larger in size.

Reddit doesn't let people upload videos and pictures to their site. You have to post pictures and videos on other people's sites and then link to them. Sort of the same thing with Netflix. Netflix doesn't have a bunch of people uploading stuff to their servers all the time.

Reddit allows (and encourages) people to upload images to their site for a year now: https://www.reddit.com/r/announcements/comments/4p5dm9/image...

Netflix is small since 99% of their traffic hits CDN.

Some big players with their own hardware host overflow capacity in cloud data centers, e.g. so that in DDOS they can dip into additional capacity. I don't know if Facebook does this, but I expect that many of the big players already peer with cloud data centers.

"We estimate that Facebook has 830,000 servers in 2017."

No way this can be true. Facebook scaled primarily using CDNs like Akamai. Akamai is the largest CDN in the world and has no more than 300k-350k servers all over the world. It's one of the largest distributed systems in the world. I worked there, so I know this for sure.

Facebook has recently been moving their data in house, but they do not have 830,000 servers. That's almost 3 times Akamai. At this point, Facebook would be more profitable letting people use their server infrastructure instead of using it for themselves.

Running Facebook over AWS is a CDN problem that was solved by Akamai!

<insert "when you're a hammer..." joke here>

Not to be a jerk, but, running Facebook over AWS is not a CDN problem. That's crazy talk. There are far more Facebook requirements that AWS can meet that Akamai cannot. You have to look beyond their content delivery reqs.

Also, they could absolutely have 800k servers. I don't think comparing their size to Akamai is fair. The requirements of each company are drastically different.

Do I think the 830k is accurate? No, but only because these estimates are usually wrong (unless informed by an inside source).

Akamai handles 15-30% of all web traffic on any given day with 200k-300k servers...does Facebook have that kind of load in a day?

The problem is mostly a CDN one; crunching data and writing to the servers are secondary to actually serving users fast.

For example, I don't necessarily need to see the latest posts, comments, etc. so you can queue up the writing and sync data later, but when I hit fb.com, I have to see stuff...
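For what it's worth, the "queue up the writing and sync data later" idea can be sketched as a write-behind buffer. This is only a toy illustration under my own assumptions, not how Facebook or Akamai actually do it; `WriteBehindStore` and all of its fields are hypothetical names.

```python
from collections import deque


class WriteBehindStore:
    """Toy store: reads come from memory, writes are queued and persisted later."""

    def __init__(self):
        self.durable = {}       # stands in for the slow backing database
        self.cache = {}         # stands in for the fast read path
        self.pending = deque()  # writes accepted but not yet persisted

    def write(self, key, value):
        # Acknowledge immediately: update the cache and defer the durable write.
        self.cache[key] = value
        self.pending.append((key, value))

    def read(self, key):
        # Reads never wait on the queue; fall back to the durable copy on a miss.
        if key in self.cache:
            return self.cache[key]
        return self.durable.get(key)

    def flush(self, max_items=None):
        # Drain queued writes in arrival order; return how many were persisted.
        flushed = 0
        while self.pending and (max_items is None or flushed < max_items):
            key, value = self.pending.popleft()
            self.durable[key] = value
            flushed += 1
        return flushed
```

The trade-off is the usual one: reads stay fast, but anything still sitting in `pending` is lost if the process dies before `flush()` runs.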

I think working for Akamai you're looking at things in a rather CDN-centric sort of way, but shuffling bits around is a relatively small percentage of what Facebook does. Actually doing things with the data beyond serving it up takes a lot more servers. The "CDN-equivalent" servers in PoPs etc for FB are a pretty small percentage of the total, and the bulk are things like web/cache/search/DB servers.

Servers might not be just for serving. Facebook likely has huge clusters for offline computation/analytics/machine learning stuff. To process logs at their scale, I wouldn't be surprised if they had a Hadoop cluster that scales to several thousand or even 10k nodes, or multiple clusters.

Yeah I thought that math sounded a little fuzzy myself.

Since we're guesstimating numbers a lot, it would be interesting to also guesstimate the human resources costs required to run the infrastructure: X electrical engineers, Y cooling engineers, Z SREs, W infrastructure devops ppls, etc.

Using the cloud you don't need X, Y, Z and presumably will need less of W. Assuming a 100+k/year salary, that's like 200+k/year total cost of employment. If X+Y+Z-ΔW ~= 1000, then the cloud provides a 200M savings on that front. The real number is probably 2x to 3x higher, but I'm not going to bother going back and updating the guesstimates.
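A quick sketch of that guesstimate, using only the figures from the comment (roughly 1000 net roles at about 200k/year fully loaded). The function name is made up for illustration, and the multipliers just replay the "2x to 3x" remark.

```python
def annual_staff_savings(headcount_removed, total_cost_per_head):
    """Yearly savings from roles the cloud provider would absorb."""
    return headcount_removed * total_cost_per_head


# Figures from the comment: ~1000 net roles, ~200k/year total cost each.
base = annual_staff_savings(1_000, 200_000)
print(f"{base:,}")      # 200,000,000

# The "2x to 3x higher" range mentioned in the comment.
print(f"{2 * base:,}")  # 400,000,000
print(f"{3 * base:,}")  # 600,000,000
```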

You are also thinking about this in a perfect vacuum. Global scale presents an entirely different --non-technical-- challenge.

Take Dropbox, for example: they stated they are moving off AWS -- except in Europe (and other non-US areas, I'm sure). Having boots on the ground abroad, operating 24/7, dedicated to the mission of the company, in places the company has not even visited ... near impossible.


Why? Hiring people around the globe who are going to remove that HDD 24/7 is pretty hard. Hiring lawyers to understand foreign law is hard. Keeping your company focused abroad and building a culture abroad is very hard.

How long does it take for a company to expand to India on their own? In AWS, you just change your CloudFormation template to another region.

Right; but it doesn't change the fact that any company capable of operating technical infrastructure at scale (aka any company with compute/storage needs above a half-dozen long rows of racks at a datacenter) can do it for about 60% of what it would cost on AWS.

AWS is still a huge boon to startups from a cash flow perspective - it's like leasing equipment rather than buying it, except you're also leasing a very small fraction of all the folks you need to build/operate the equipment. It's also very cheap compared to trying to build and run the infrastructure yourself at small scale.

And don't kid yourself -- you still need SREs in AWS. They're working on different issues, but AWS introduces as many issues in an SRE's scope as it removes.

Minor nitpick,

1,000,000 users / 180,000 servers = 5,556 users per server

Should be

1,000,000,000 users / 180,000 servers = 5,556 users per server
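Checking both versions of the division (the helper name is just for illustration):

```python
def users_per_server(users, servers):
    """Average number of users each server has to carry."""
    return users / servers


# One million users, as written in the article.
print(round(users_per_server(1_000_000, 180_000), 2))   # 5.56

# One billion users, which is what gives the article's figure.
print(round(users_per_server(1_000_000_000, 180_000)))  # 5556
```

So the 5,556 result only works with the billion-user numerator.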

> We need to take into account that Facebook not only has double the users but also more data created per person - photos, videos, live streams etc. Plus it now hosts Instagram. So let’s double the number.

... but apparently no efficiency gains in 5 years to reduce the number again? Faster processors, more cores, more RAM, better storage, more senior-developer-hours, bespoke datacentres... ?

I think you could easily build the NEXT facebook on AWS or any of the big cloud computing platforms. But obviously moving facebook to AWS at its current size wouldn't be practical, and no one would probably ever do something like that.

From what I have heard about Facebook's infrastructure, it would fit better on Google Cloud than on Amazon.

A mostly good thought experiment, IMO, if pointless since it's never going to happen.

The answer is no. This is because you can't compare commodity hosting like AWS with specialized hosting like Facebook's data centers and infrastructure software. This question of possibility has nothing to do with 'number of servers' or 'amount of money'. It's architecturally incompatible.

it's still possible, you just have to change the architecture

With that concept, everything is always possible. "Can you do X with Y and Z?" -> "Yes but change Y and Z and then X is possible".

Tech's own Fermi problems.

> Its application code is still developed using PHP

This is surprising since 1) Facebook is most likely using Hack and 2) it could save a lot of money by moving to Go.

> Facebook’s entire site runs on HHVM (desktop, API and mobile)

Isn't Facebook running mostly on React? I'm guessing HHVM only really powers the API, business logic, etc.

I imagine the lifetime savings of having a backend in Go is dwarfed by the one-time cost of such a migration.

That's why they invented Hack and HHVM in the first place -- it was a cheap-enough compromise that didn't require rewriting the whole code base.

I'm not even convinced there's much savings. HHVM supports an asynchronous pattern, and their Proxygen web server has no fixed number of FastCGI processes, so the typical PHP bottlenecks may not be an issue for FB.

Wouldn't most of the pressure be on whatever the data layer is?

IIRC, Facebook uses (their derivative of) PHP to collect data from a number of services in different languages over Thrift, which they developed.

PHP is the presentation layer. Much of the actual logic is in C++, Java, and maybe even Go by now.

(Former Facebook employee)

The VAST majority of online serving code at Facebook is written in Hack (PHP) and is part of a gigantic codebase called fbwww. Backend services that do heavy lifting (mostly search, feed ranking, ad matching, TAO, various caches, proxies, etc) are fairly light on business logic and written in C++.

It isn't a microservices architecture at all: they only move stuff to C++ for latency and efficiency reasons. For instance most of the ad stack is written in Hack, but certain key pieces use C++ for better performance. I'm not aware of any Java or Go code that is part of the online serving stack.

Do you know of any place actually using microservices at scale (400 million+ users)?

Your definition of "at scale" doesn't seem to leave room for Amazon, Netflix, or Uber (all in the tens to low hundreds of millions of monthly active users, according to a cursory Google search) but... those.

Guess my perspective of scale is different.

It's just an observation from having worked on systems at scale: I don't see microservices / containers used.

I am curious if anyone else has.

The more I am around containers and the network complexity/issues they create, the less I am convinced it is the optimal way. I am looking for opinions that differ from mine so I can learn.

In my experience, these issues are minor when your large company has an infrastructure team dedicated to the service substrate (CI, self-service incremental deploys, service discovery and routing, standard RPC layer, canonical repository of IDL files, standard messaging layer, persistent storage as a service, time series collection, dashboarding, alerting, distributed tracing, safe internal and external nginx configs, etc).

Microservices are probably a bad idea (for now) if you aren't large enough for such a team.

But you've constructed your question to exclude the companies that work this way, which I think any reasonable person would call "at scale."

Only about 3.2 billion people have the internet (as of 2015). Your definition of scale is 12.5% of the entire connected world or more?

This is like the guy in these threads saying $400k a year for a software engineer is "market rate" and/or "easily achievable."

Counting website users isn't a great metric for Amazon because a lot of the traffic is from AWS. But Amazon has been successfully running on a service oriented architecture for about 15 years now.

They mentioned exactly that as a talking point during a talk I saw in 2012, so I'd imagine it's even more accurate now. Thrift exists for a reason.

Another fun bit from that deck was a newsfeed-related PHP function with something like over 70 arguments across a dozen lines -- basically template data to render that had organically grown over time. It earned its own slide. There was an audible gasp from the room when it was revealed in all its glory, and they talked about the work it takes to refactor stuff like that (which is extensive).

It's not one-time; Go will always have an ongoing time cost in reading and writing tedious boilerplate to reimplement features Hack already had built in.

> 2) it could save a lot of money by moving to Go.

[citation needed]

Even if they reduced their server usage by 10% that would amount to a lot of money, and it's probable the performance benefit would be much higher than that.

Facebook spends more on people than they do on servers. I'm sure they have considered switching languages, but decided against it.

What is this assertion based on?

HHVM is a VM for executing Hack and PHP, used extensively on the backend at Facebook.

The majority (all?) of the front end and mobile apps are powered by React and React Native, respectively.

Yes, that was why I mentioned it. The article said PHP was powering their web apps, but in fact it's only powering the APIs since their frontends are SPAs.

> I'm guessing HHVM only really powers the API, business logic, etc.

So "only" everything except UI (which is run by the browser, not FB's servers), then. And that's only for browsers that use it at all, as they still need to render a lot of stuff from the servers both for speed and for people who turn off js or use older or limited browsers.

And this is a weird thing to say anyway: nothing runs "on" React; both the framework(s) and the business logic run on something else.

Explain point #2, please.

Let's imagine Facebook spends 2 billion on servers every year. If it could reduce the number of servers by 20% because of the better Go performance, it would save 400 million every year.

Go outperforms HHVM by a wider margin than 20%. I've seen benchmarks as high as 50%. Sure, benchmarks aren't real world applications, but it's safe to say that generally speaking Go is significantly faster than HHVM.

It would need a much higher return than your estimate of 20% to be worthwhile given the downsides (everything breaks, legions of engineers spend years figuring out how to do the migration). Also most systems end up being I/O or memory pressure bound, so the 20% number could easily be 0% or even negative.
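To put numbers on this subthread: a rough sketch of yearly savings against the hypothetical 2 billion server spend, for a few server-count reductions. The spend figure is the imagined one from upthread, not a real Facebook number, and the function name is made up.

```python
def annual_savings(server_spend, reduction_percent):
    """Yearly savings if the server count (and spend) drops by reduction_percent."""
    return server_spend * reduction_percent // 100


spend = 2_000_000_000  # hypothetical yearly server spend from upthread

for percent in (0, 10, 20, 50):
    print(f"{percent:>2}% fewer servers -> {annual_savings(spend, percent):,} saved per year")
```

At 20% that works out to 400 million a year, and at 0% (the I/O-bound case) there is nothing to set against the cost of the migration.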

PEOPLE ON THE INTERNET BE LIKE: "Let's rewrite everything in Go."

How is replacing their entire engineering staff cheaper?
