Hacker News
Why Is Facebook Not in the Cloud Business? (interconnected.blog)
309 points by ceohockey60 on March 23, 2020 | 325 comments

Used to work at fb in an infra team.

Their abstraction for job scheduling (Tupperware) is about 5 years behind something like Borg or EC2/EMR. That gap is a fundamental reason why fb can’t do cloud as it is right now.

Plus, the infra teams do operate like product - impact at all costs. Which to most management translates to short-term impact over technical quality.

It would be a true 180 in terms of eng culture if they could pull off a cloud platform. An example is how they bought Parse and killed it, while firebase at google is doing extremely well.

Having said all that, I think focusing on impact over technical quality is probably the right business decision for what they were trying to do at the time - drive engagement and revenue.

The killing of Parse definitely showed that Facebook didn't want to be in PaaS/IaaS. Now there are going to be severe trust issues with any PaaS/IaaS with Facebook branding.

Parse was ahead of its time: good engineering, nice documentation, and possibly the first platform to make building a backend/database system for an app seamless. Even when Facebook killed it, they did a great job of open-sourcing the platform.[1]

I had migrated my existing services to open-source Parse and even built a stateless application with >250,000 users on it a few years back. But if I had to build the same backend now, I'd probably do it with Go/pgSQL instead of NodeJS/MongoDB as in Parse.


I mean, yes, but also a reputation for killing things off hasn't entirely stopped people from using GCP.

I should have been clearer: FB's reputation loss is not just from killing things!

Honestly this was the first thing I thought when I read the headline. No one using enterprise cloud services is going to trust Facebook with their data and mission critical systems.

Well, Google has invested more than $30 billion in the cloud in the last 3 years. This waiting for Google to pull the plug is getting a bit annoying. I welcome the competition with AWS.

Yep, I have to agree. Facebook in some ways is incredibly ahead of the game, and in other ways, is shockingly behind - I was baffled when I started working there.

say more

From what I’ve seen, their internal dev tools are incredible. Facebook engineers are constantly saying that GitHub is crazy behind, and I’ve witnessed things they can do that would make that true. Other times... they still have pretty old-school hookups, and getting stuff provisioned can be a pain (but if you can make a compelling business case this can all change).

Does Facebook still use Phabricator?

Yes, but it has forked so much that it bears very little resemblance to the public OSS version.

this is true, but a lot of resources put into a new team that was completely walled off from the rest of the company, in terms of roadmaps and eng culture, could overcome this problem.

I think someone at Amazon in the mid-2000s would not have necessarily looked at their internal stack vs. other companies and picked them to become the breakout cloud winner, but when I was there (late 2000s), the AWS team was kept quite separate, and aside from things like S3 the retail side of the company wasn't really using much in the way of AWS infrastructure yet.

Amazon starting a cloud offering is quite different from anyone coming later. Even though a lot of open source infra exists, expectations from devs/companies in 2015+ are sky high. If FB really wanted to have an offering, it would need to specialize in something specific. Maybe Oculus game server hosting or something tied to FB/WhatsApp/Insta features.

> firebase at google is doing extremely well


It's certainly popular but everyone I know (me included) is running away from it asap.

Our managers keep saying GCP is the best, and a lot of developers hate it. But who cares. We also tried Firebase; luckily the idea was abandoned.

> a lot of developers hate it

Any particular complaints?

(I work on Compute Engine)

Not GP, but GCP's GUI is incredibly confusing and difficult to use. It is clearly a variety of products (poorly) stitched together. That has nothing to do with Compute Engine, and AWS is just as bad (if not worse).

My new projects will all be on Azure because Microsoft has pivoted to a company with reliable investment in UI, developer-friendliness, and long-term support. Serving businesses has also been a core of their company from the beginning, whereas Google seems unwilling to provide the human support required and AWS is clearly at odds with Amazon's primary culture.

It does seem like the underlying products in GCP are very good, but they mostly seem to replicate other offerings from Azure and AWS.

There are also those such as myself who instead feel that Azure's UI is by far the worst of the big three, and that GCP is not bad at all, and that AWS is big, complex, and ugly, yet practical. The long term support story of AWS is also great.

> The long term support story of AWS is also great

I agree in general, but specific services can be a problem. New server images have broken my apps, waiting for AWS to update support for languages or RDBMS can be painful, documentation is generally confusing (mostly because there are 10 ways to do every simple thing), and I bump into bugs regularly.

Thanks for the feedback -- I'd be curious to understand what about the Azure UI you find most striking. For obvious reasons, I don't have much experience with it.

I'm generally fond of our console, particularly the tools that show both the API and CLI invocations for most UI activities. On the other hand, I work on core virtualization, so the things I want to express tend to be very simple ("make me a giant VM", "make me a gianter VM", etc. :)

At my last work I was a developer occasionally doing things with their AWS account.

I now run a large application mostly built in GCP and also now replicated in Azure for various compliance reasons.

So although my experience with the 3 clouds is not equal, I've been exposed to them all, and actually massively prefer GCP for the UIs and CLI tools.

AWS still has better support by third party tools, and Azure is lagging behind everything.

Thanks for the feedback here -- as I mentioned in another comment I'm generally fond of our UIs, and it's nice to know that I'm not _entirely_ crazy.

A big weakness is that all services have generic names so it is IMPOSSIBLE to google them and the documentation is weak and hard to navigate.

Thanks for the comment -- I think (although I honestly don't know for certain) that the goal is for the names to convey something about what the service does. In particular, so that in the context of the catalog the general purposes of the various services are at least _somewhat_ obvious.

The docs are always a work in progress -- anything you can recall trying to do with GCP that was particularly inscrutable (can't fix everything in one go, but one fix at a time is better than nothing :)

Why are you running away from it? I have occasionally toyed with using it in a couple situations, just wondering what the dealbreakers are.

Not the person you responded to, but I just joined somewhere new and it's clear Firebase was the biggest mistake / is the biggest source of technical debt. It was helpful for bootstrapping 6 years ago, but now it stands out as the worst part of our tech stack. For our small shop, we will not be able to migrate any time soon.

- Most frustrating are the outages we have no control over. It seems like every other day, the server running our Firebase instance mysteriously disappears from the internet. We can ping `my-app.firebaseio.com` but receive no events from it. This lasts anywhere from 5 to 25 minutes. Usually there is no blip on the Firebase status page. The main devs have given up trying to debug it.

- We are nearing the resource limits of the paid plan for a single realtime database. We're working on splitting our one database into a master and sharding what we can.

- The particular eventual consistency and transaction semantics require workarounds everywhere. If you want an atomic transaction, you have to perform it on some common parent, but you are also encouraged to keep your data model flat and normalize as much as possible. Our integration test suite is tiny and horrendously slow because we cannot rely on Firebase to be timely, so there are huge timeouts everywhere that regularly take ages. Every past attempt at tuning the timeouts to be shorter eventually causes spurious failures a week later.

- Unless your data closely models the kind of chat application Firebase was built around, you end up needing a real database eventually. Not just to perform real queries with joins and complex logic, but also to essentially maintain indices that anemic mobile clients can use because the firebase filtered query semantics are limited to a single key. Now you need some kind of daemon to shuffle data in and out of your real database. Unfortunately, your real database is missing tons of foreign key constraints because that's the easiest way to handle firebase's eventual consistency.
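On the slow integration tests mentioned above: one common way to soften "huge timeouts everywhere" is to poll for the expected state with a deadline, so a test returns as soon as the write becomes visible and only pays the full timeout when something is actually wrong. A minimal sketch, assuming nothing about the Firebase SDK; `wait_until` and its parameters are hypothetical names, and `condition` stands in for whatever read the test performs.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               timeout_s: float = 30.0,
               interval_s: float = 0.25) -> bool:
    """Poll `condition` until it returns True or `timeout_s` elapses.

    Returns True as soon as the condition holds, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        # Hypothetical polling interval; tune to the backend's typical latency.
        time.sleep(interval_s)
```

This does not remove the spurious failures if propagation really takes longer than the deadline; it only lets the common case finish early, so the deadline can stay generous without making every run slow.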

That's just my take a month after joining somewhere intimately coupled to firebase.

>The main devs have given up trying to debug it.

Well, now you have my interest! Is this a talent gap or fatigue in the organization? Sounds like a meaty challenge.

It's a combination of a lack of developer bandwidth, fatigue, and typically difficult Google customer service. Previously it was just a sole backend developer who also did the server operations/SRE type stuff plus a couple front end web developers. To them, it quickly became just a fact of life after repeatedly failing to find a solution. Our customer service people don't like that, because these short down times always happen during standard business hours (the only time our service is ever used). Complaints about disrupted sales pitches or customer demos go ignored.

I'm joining as the second backend developer, so maybe while I'm still green and have the naive bravery, I will go give it another shot next time it happens. I don't know much about advanced networking, but we suspect there's some kind of partition happening frequently between our servers and our production firebase instance. Uncertain how to debug that and how to work around it. It usually affects all of our servers, which are located in a single data center. I suppose next time I will try to track down and inspect the socket being used by the firebase SDK, but I don't really know what to be looking for at that point.
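Before going down to the socket level, it may help to record exactly when events stop and resume, so the gaps can be lined up against server logs or a packet capture. A minimal sketch of that bookkeeping, assuming some process writes a heartbeat value on a fixed interval so a healthy listener sees regular events; `StallDetector` and its methods are hypothetical names, not part of any Firebase SDK.

```python
import time
from typing import Optional


class StallDetector:
    """Tracks the time since the last received event and flags long gaps."""

    def __init__(self, max_gap_s: float) -> None:
        self.max_gap_s = max_gap_s
        self.last_event_at: Optional[float] = None

    def record_event(self, now: Optional[float] = None) -> None:
        """Call from the event listener each time an event arrives."""
        self.last_event_at = time.monotonic() if now is None else now

    def gap(self, now: Optional[float] = None) -> float:
        """Seconds since the last event, or 0.0 if none has been seen yet."""
        if self.last_event_at is None:
            return 0.0
        current = time.monotonic() if now is None else now
        return current - self.last_event_at

    def is_stalled(self, now: Optional[float] = None) -> bool:
        """True when the gap exceeds the configured threshold."""
        return self.gap(now) > self.max_gap_s
```

The `now` argument exists only so the logic can be exercised without waiting; in a real process a timer would call `is_stalled()` periodically and log the transition in and out of the stalled state.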

just gonna throw out there - in the time it'll take you to do that socket level debugging, you could probably port the whole app back to sql.

Mostly database-related issues as far as I understand. What I’m wondering is: is it specific to Firebase, or would it apply more generally to any NoSQL database provider?

laughs in SQL DBA

aside from outages, these sound like self-inflicted problems.

So I've been using it in production for about 4 years. I still have a couple of projects there, unfortunately.

Firebase lets you grow super easily at first but the bigger your project the more of a problem it becomes in terms of development.

The biggest problem is that the databases are extremely inadequate for anything that goes beyond prototypes or very simple use cases. There are no relations, very basic querying / filtering, no ACID, etc. Your logic to interact with the databases becomes super tedious. Fauna is by far a much better serverless DB.
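To make "no relations" concrete, here is a rough sketch of the kind of client-side join you end up writing by hand. The `orders` and `users` collections and their fields are hypothetical, and plain dicts stand in for documents already fetched from the store; none of this is a Firebase API.

```python
from typing import Any, Dict, List


def join_orders_with_users(orders: Dict[str, Dict[str, Any]],
                           users: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Attach the user's name to each order, skipping dangling references."""
    joined = []
    for order_id, order in orders.items():
        user = users.get(order["user_id"])
        if user is None:
            # With no foreign key constraints, an order can point at a
            # user that no longer exists; the client has to handle that.
            continue
        joined.append({
            "order_id": order_id,
            "total": order["total"],
            "user_name": user["name"],
        })
    return joined
```

In a relational database the same result is a single `JOIN`, and the dangling-reference case is ruled out by a constraint instead of an `if`.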

There is huge vendor lock in and their client libraries are huge. The JS library is like 200kB minified and gzipped. You can in principle use REST to interact with the backend but you lose most interesting features (realtime, sync, etc).


Cloud Functions are extremely tedious to work with. Most of your functions cannot be tested locally and they take forever to upload. Sometimes even minutes. Sometimes a deploy gets stuck, and don't you dare cancel the upload, otherwise it will take even longer before you can upload again. With Cloudflare Workers you can do everything in local dev and a deploy takes seconds. With Zeit Now you can also do all dev locally.

The Firebase Console is a huge piece of bloated JS made in Angular 1 IIRC. It's also very basic as far as functionality goes.

Firebase hosting is good. Nothing to complain about, but all the other options are equally great (Netlify, Zeit, Cloudflare Workers Sites, etc).

Not positive if it's still an issue, but they had a Geopoint type in Firestore and I incorrectly assumed you'd be able to do geo searches because of that. Their proposed solution was to pipe the Firestore docs to Algolia indexes and use their service. I did that and I love Algolia, but don't assume anything just because it's Google.
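For small collections, a radius search can be done by brute force on the client once the documents are fetched. A minimal sketch using the haversine formula, assuming each document carries a (latitude, longitude) pair in degrees; the function names and the `docs` shape are hypothetical, and this is not a Firestore API or a replacement for a real geo index.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def within_radius(docs: Dict[str, Tuple[float, float]],
                  center: Tuple[float, float],
                  radius_km: float) -> List[str]:
    """Return ids of documents whose point lies within radius_km of center."""
    return [doc_id for doc_id, point in docs.items()
            if haversine_km(point, center) <= radius_km]
```

This scans every document, so it only makes sense when the candidate set is already small; past that point an external index of the kind described above is the more realistic route.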

I have the opposite experience. Both my current employer and my old one are migrating _to_ Firebase as we speak.

Kind of the same. As a somewhat advanced user of Firebase (2 projects with >25K MAU each), it works extremely well.

When you say that Tupperware is 5 years behind Borg, what do you mean? It would take 5 years to add current Borg features? It lacks features Borg had 5 years ago? Or it would take 5 years to reach parity with Borg as launched, i.e. it's actually 17+ years behind?

The last statement is factually wrong. Borg was launched in 2005, so 5 years behind would be 20 years.

Also, the majority of their resources are web machines, running HHVM. These machines are just serving monkeys, which are highly optimized for that workload.

I think 5 years behind may be an exaggeration. People can read https://engineering.fb.com/data-center-engineering/tupperwar... and make up their own mind.

5 years might actually be right on the mark? I think Borg and EC2 et al. had all of this functionality 5 years ago, but I'm not sure it was all there more than 5 years ago.

Killing Parse really surprised me at the time.

Is Parse dead?


They may still get into the business, and it would make sense for them to do so, and may even solve their privacy problem while they are at it.

Amazon did it a long time ago, and when they started, they had no sales team. They had to build that capability slowly over time, but they had the first mover advantage, so they had time to learn. Now they are number one.

Microsoft already had the enterprise sales team, they had to learn how to sell cloud. They had some trouble with the shift, and with the technology, but now they are knocking it out of the park, and are solidly in the number two spot behind Amazon.

Google is struggling with their cloud business. While what they offer is technically superior to all their competitors, they don’t have the sales team, nor the experience in building one. They are trying, but they haven’t gotten the hang of it yet, and in the meantime, Amazon and Microsoft just keep growing.

This is the position Facebook would be in. They probably have the technology, or can at least attract the necessary talent to build it (I know that I would at least listen if they said, “Come build a new cloud with us from the ground up”). But they don’t have the enterprise sales team, just like Google.

Now, in Google’s defense, they have realized they are coming from behind, and as such have focused on selling their higher level services. If you want to do AI and Machine Learning, Google’s cloud is the place to do it! And then maybe while you’re there, if your core business is AI, maybe you’ll use some of their other services as well, since your data is already there.

Facebook could make a similar play. They could build a cloud and then require any 3rd party apps that work with Facebook to use their cloud. This would jumpstart their adoption, and could potentially help them solve their privacy issues. They could require that you never send any personal Facebook data out of their cloud, and could closely monitor all the outbound traffic. Then they could theoretically allow even more access to Facebook data, if they knew they could control what happens to that data after the 3rd party gets hold of it.

In other words they could launch a cloud that lets you run your 3rd party Facebook apps in a controlled and audited way, and even give you building blocks to do it quickly and efficiently. It would boost their bottom line, because there are good margins in compute and storage, and at the same time give them more control of their own data.

Google has a real trust issue with me that's twofold.

One is that you can't trust them to stick with a product offering. They are driven by a throw things at the wall and see what sticks strategy rather than some deeper vision for the world. So I'm always suspicious that they will abandon anything they launch.

Two is that they don't believe in offering quality support. I had this issue with an Android app recently. It got flagged incorrectly for a compliance concern and I was not able to reach a knowledgeable human under any circumstances, including getting it escalated by internal Google staff. I would have easily paid for a support contract, $1k or more, to get access to a human. In the end, I had to guess at what their algorithm was flagging and just ended up tricking it through trial and error.

Google absolutely needs to fix their trust problem, but it is also completely wrong to compare support between their paid GCP customers vs their mostly free Android app developers. Fwiw, I recently called Google customer support to fix a billing problem and was pleasantly surprised that I talked to a real human quickly, AND they were able to resolve my problem and issue a refund. So maybe they are improving on that end too.

>vs their mostly free Android app developers.

But why? Per OP - they'd be willing to pay for an enterprise support contract. Why should a mom and pop shop expect that their experience being a small fish will be any different in GCP than literally EVERY OTHER GOOGLE SERVICE?

It's one thing to tell people using a free service that their only option is automated support - it's quite another to tell customers you just flat-out refuse to offer a paid support model. That tells me that you are organizationally deficient at providing customer support. I've yet to hear anyone using paid g suite speak praises about their support experience if they ever have issues. Quite the opposite.

> Why should a mom and pop shop expect that their experience being a small fish will be any different in GCP than literally EVERY OTHER GOOGLE SERVICE?

Well, for one thing, everyone you ask except apparently the HN comment section will tell you.

I've probably seen this exact conversation play out 10+ times now. Someone says that GCP has poor customer support by analogy to other, mostly free services. Someone who actually uses GCP customer support claims that this is not the case. Some third (or perhaps the original) person blows them off and insists that Google's behavior toward the statistically adversarial and free users of other services must be representative of Google's behavior toward an entirely different group of users.

It's baffling.

The point is not that it must be, the point is that if you've burned your bridges once before, you're going to have a hard time selling again. If I've been screwed once by Google (and I have been, multiple times), selling me GCP has to overcome those trust issues.

Not to mention that I have had rocky issues with GCP as well. There was documentation that lied about its caching behavior, cost me over $1000 of my personal money, they took over a year to fix the bug and offered no reimbursement. Maybe I'm a small fry and don't deserve support, but this is the kind of customer management that Google is absolutely terrible at.

I don't know why you find it baffling. I've tried to use GSuite customer support (as a paid-for customer) and found it terrible. Why would I roll the dice and give GCP a try given I have had concerns about support for other products in the past?

You go to a restaurant, try a dish and it gives you food poisoning. Do you have to go back and try every single item on the menu? The other ones might be great, but realistically you probably go to a different restaurant after that.

This is not what I'm talking about. A data point that GSuite has poor customer support is actually useful! GSuite and GCP are bundled together on Google's financial reports, and probably overlap a lot in customer support expectations since they both handle paying customers. So someone sharing their perspective that GSuite has poor customer support is contributing to the discussion. It's infinitely more valuable than the 9000th iteration of "Remember how Google killed Google Reader?"

To give you the depth of my misgiving here, I just can't see them sticking it out as the third place provider. Maybe they won't kill GCP, but I could easily see them gradually lowering their focus until I would regret having built anything on it.

It's easy to talk about Google apps that have been killed. But there's an entirely different category which just got abandoned or lost their ambition.

So given that, I can take in and acknowledge what you're saying about GCP having good or even great support. I just don't have any faith that it will last.

I saw their Android support through the lens of comparing them to Apple. I pay a small amount to Apple and have no problem getting my support rep on the phone and all support experiences I've had there have been amazing. So why doesn't the Android experience compare? I just don't see it in Google's DNA to fight to win. Android has given up and is trying to be the lowest quality second place they can be. And that's exactly what I expect to happen from GCP. Maybe GCP will fight hard at the beginning to establish itself, but after that, I expect them to do the minimal amount to maintain third place.

> I just can't see them sticking it out as the third place provider.

Maybe that's the case, but I can't see it. Cloud computing is such a huge business, it doesn't take more than a third-place position to make more money than say, Youtube. Some random article [1] has GCP's 2019 Q4 revenue at $2.6B compared to Youtube at $4.7B, and total revenue in the cloud market is definitely going up way faster than total revenue in the advertising market. Plus, the "Google sucks at products" narrative is based on a reputation that people at Google would much rather be building technology than products. Cloud computing seems like the perfect match for such a reputation. App Engine is almost as old as AWS, and looks very much like it started as an excuse to have something for Guido Van Rossum to do.

> I saw their Android support through the lens of comparing them to Apple.

> So why doesn't the Android experience compare?

It came out in the Google vs Oracle lawsuit that Android had only made Google something like $10B since it started. So GCP is already making something like 10x what Android makes, and given cellphone saturation this will likely only grow.

> But there's an entirely different category which just got abandoned or lost their ambition.

AWS has a whole pile of services that you can tell no longer get any attention, or in some cases, even have full time staff anymore. Services that still aren't integrated with CloudFormation after years and years because clearly nobody cares. A guarantee that the lights stay on and the service continues to handle requests is the most you can ask or get from any of the existing cloud providers.

[1] https://www.businessinsider.com/google-cloud-revenue-first-t...

> Google's behavior toward an entirely different group of users.

That’s also a wrong assumption. Android devs may very well be paid GCP customers if they had a great experience with the Android platform.

I am not talking about only full stack solo developers, I am also talking about companies with mobile apps with backend needs.

I use Fi, when it was Project Fi they had great support. Now it is Google Fi and my last interaction with their support was bad enough that I am ready to switch. I would never trust Google with my business.

I work at Lyft and have to work directly with Google on several of their various enterprise offerings.

Their support is downright AWFUL. Getting someone from Google to help is so challenging that they have decided that in order to work with them on their enterprise offerings you must go through a 3rd party vendor. Their third party vendors are all small companies in which my organization has very little trust.

I will never knowingly try to do anything with Google again after the hellish experience I have had dealing with them and their vendors so far, it just isn't worth it.

> Free Android app developers
Google takes 30% on android app sales. I do not see it as free.

I’ve never launched an app. If you integrate directly with stripe do you still have to pay 30% to google?

That's against the developer TOS for both major app stores, although enforcement is lax in the Google store.

Both the App Store and Play have restrictions on taking payment in-app if you don't use their payment API

Google Ads and GSuite, which are paid products, have terrible customer support in my experience. Maybe our account was not big enough? With AWS, just the opposite: knowledgeable and fast. Was even able to put in feature requests.

Google paid support for GCE is absolutely embarrassingly terrible. To get to someone who knows what they are talking about you have to go through many, many people who have zero clue, over the frustrating course of many days. Every.single.time.

We're not going to be the biggest fish, but our spend is approaching $1mio per year. If you spend more than that and have a TAM YMMV.

>> Fwiw, I recently called Google customer support to fix a billing problem and was pleasantly surprised that I talked to a real human quickly

You just described their problem. You pay them money, whatever they ask, and are shocked that a human provided support to you.

Neither of these concerns affect Google Cloud Platform. 1) Google has not cancelled any GCP product that I know of, and 2) GCP has excellent support with fast SLA-timed responses from support personnel. Additionally, Google, the org., recently decided to redouble efforts to grow GCP.

I've brought this up a few times, but the Prediction API was deprecated: https://web.archive.org/web/20200112103521/https://cloud.goo...

I understand Google heavily rewards new product launches with promotions. Is that part of company culture not at all within the GCP org? I don't know, I'm asking.

People aren't conflating GCP with the rest of Google. They're just unaware of any markedly different promotion incentives in that org.

It doesn't matter though. Until they fix their product trust issues for the rest of their business, people will always conflate them.

I understand what you're saying, but it can be frustrating at times.

"I got a counterfeit product on Amazon, therefore I won't trust AWS".

Yeah that's legitimate. "Amazon doesn't care enough to make sure counterfeits are struck down, continuing to put people at risk. Why wouldn't that corporate culture and lack of concern for the well being of the customer transfer over to AWS? What if I'm hosting with someone that'll leave big security holes or oversell capabilities just to drive sales at the expense of quality?"

Trust is trust, and if amazon can't be trusted to behave well in one context, over time, that erodes people's trust in the platform to behave well in other contexts. That's what's going on with Google - they killed reader, and it's the same company, so why are we magically expecting different behavior in cloud?

It might not be warranted, but that's irrelevant. Trust is earned, and Google isn't entitled to it, they have to earn it. Doesn't matter how they lost it, market forces are market forces, they need to get it back

I think that doesn't weigh as much in people's opinion because the counterfeit problem only came up in the last couple of years. AWS already had a good reputation by then.

Yep. Amazon search has built that trust of Amazon (although plenty of retail companies refuse to use AWS because of the retail competition). But there was a time when Google was universally the coolest company in the room - and I'd be interested to see how deep the trust well goes with amazon. The one thing with amazon you can trust is that they want to make money, and cloud makes them money, so they're at least not gonna pull the plug on you.

> "I got a counterfeit product on Amazon, therefore I won't trust AWS".

This is actually a good argument. I can easily imagine that this could become a problem for Amazon in the future.

It's not just "I bought a counterfeit item on Amazon".. But instead:

"Amazon continues to allow counterfeits to pervade. I cannot trust them with anything important."

This was an issue - SES email deliverability out of Amazon was poor for a while before they got their head straight around supporting spammers. Haven't kept up, but it may still be lower than folks paying more attention despite amazon's size and skill.

Sure, but when I got a counterfeit product on amazon I called them and got it fixed quickly and relatively painlessly.

It's not just about GCP and cancelations. By now I associate Google with high maintenance, constant and unnecessary change (great for Google, but nothing in it for me), no maintenance and bugfixes, and neverending re-writes.

I feel exploited. I feel like I am the product. This might work when providing a free service to the public but we, developers, are not the general public.

As condescending and cliche as it may sound, we don't like our time wasted. Google's free services should be a magnet to entice, a funnel to capture and channel our hearts and minds, not the developer repellant they have become.

The fact that someone has to spell this out is a testament to how out of touch Google has become with its traditional base - developers.

> Neither of these concerns affect Google Cloud Platform. 1) Google has not cancelled any GCP product that I know of,

Of course it affects GCP, because due to their existing reputation, people think “Google has not cancelled any GCP product yet”.

Not sure why you're being downvoted. It's absolutely valid. Trust is trust, and if you can't trust the organization to behave in the correct way in one situation, why would you expect them to behave in a correct way in a different situation if it's the same organization? Same culture, same compensation. They're not entitled to trust, they have to earn it, and it doesn't matter how they lost it, it's their job to get it back, not ours to see the good in their hearts or whatever.

"GCP has excellent support with fast SLA-timed responses from support personnel"

You have to be kidding me. Their paid support is terrible. We recently had a problem with occasional timeouts contacting the GCE container registry, which causes our autoscaling groups to sometimes fail to start new nodes.

I shit you not, this is one of the selected support answers (after much prior back and forth). Obviously, this issue is still ongoing after many days.

"Thank you for your information.

I have searched our internal documentation about any service outage and any network issues open recently, And so far I can’t find any explanation of your issue.

However, I found the following instruction [1] where describe how to run a local copy of the Google Docker Registry.

In addition, also attached the docs where described about how to set up a private Docker registry [2].

If above instruction does not work then please let me know and I will be happy to help you.

[1] https://stackoverflow.com/questions/27243294/unable-to-pull-... [2] https://www.digitalocean.com/community/tutorials/how-to-set-...

Seriously? Run a private registry? We don't run a private registry because we don't want to deal with that stuff.

People can't separate this out

These concerns do affect the platform since clearly on HN, a technical forum, the trust issues engineers have with Google are a serious consideration. Sure they haven't cancelled anything yet, the concern though is that very few people have faith that Google won't cancel things in the future.

They may have changed their policies since 2018, but they used to close accounts with no warning.


>They are driven by a throw things at the wall and see what sticks strategy rather than some deeper vision for the world.

You've really hit the nail on the head for me there. The people I really respect are the people who can look at idea a and say "that's failing because it's a bad idea" and look at idea b and say "that's a good idea, but we need to work harder at it". Google (from the outside) appears to say "These are all ideas, and they failed, kill them". There's no understanding or insight into the world, there's only 'experiments', which is as good as throwing a random number generator at an authentication system.

Thank you.

It's interesting that they've chosen to bring in someone from Oracle to fix these problems.

Where GP claims that Google "doesn't have the sales team nor the experience in building one", it seems like an Oracle exec would be a perfect fit, because for all the hate levied against Oracle, for all sorts of reasons, it seems like enterprise sales is something they're quite good at.

Based on my conversation with my FB friends, FB's major challenge to tapping into cloud business is their "moving fast and breaking things" culture, and their arguably cut-throat impact-oriented perf system. We have to admit that not all tasks in development are glorious. Someone has to be on call. Someone has to fix all the important bugs. Someone has to implement features that customers ask for, even though the features may not be glorious. I personally even enjoy bug fixing, as it is closest to scientific investigation. However, would I prioritize bug fix if I were in FB? Hell no. I would launch launch launch, and show impact impact and impact. Would that be good for cloud customers or the business? Of course not. Would that erode customer trust in the long run? Oh yeah. Sometime somewhere a nasty bug will show up in this type of culture.

Google has something similar and promotion is tied more to novelty than to keeping the lights on.

That tends to promote product churn.

Don't forget that Amazon is essentially a retailer, and in retail a certain amount of customer trust is essential.

And Amazon has “customer obsession” as a core value. I just don’t see that at Facebook or Google.

Because most of the time, a Facebook user or a Google user doesn't pay at all. Only advertisers do. And that's why these two companies generally don't have a customer-obsession culture.

There's a reason FB and Google call them users instead of customers. I remember an old joke... "there's only one other industry that calls its customers users."

How is that different than e.g. Amazon? They too have a vested interest in moving fast, innovating, etc... That's really how aws came to be anyways.

AWS rewards bug fixing. More specifically, they reward making customers happy with existing products, which often includes bug fixing.

I would say AWS is even biased against rewrites and would prefer incremental improvements on the existing code base. Pretty much the polar opposite of Google / FB.

>Google is struggling with their cloud business. While what they offer is technically superior to all their competitors, they don’t have the sales team, nor the experience in building one. They are trying, but they haven’t gotten the hang of it yet, and in the meantime, Amazon and Microsoft just keep growing.

Don't forget the support. I used to work at a Series B startup, and AWS provided excellent support. We probably tossed them a few million a year, so I can only imagine what kind of white-glove top-tier support a big enterprise spender would've received.

We got responses for anything ranging from general UI questions to highly-in-the-weeds Redshift technical questions within 1 business day. >90% of the time, the issue was resolved upon the first response.

Google on the other hand, has very poor support. Facebook is by no means a support paragon either.

For this reason alone, I see a Microsoft-Amazon cloud duopoly with anyone else being a minor player as the only outcome. When shit hits the fan (as it is now) you need enterprise support capabilities.

We're owned by one of AWS's biggest clients and we get to straight up request features every other month and have their teams build it for us. We have a few humans assigned to our account that will physically come into our office when we have problems worth discussing. We also get access to quite a lot of price/billing advantages.

Google we can't even get them to answer an email.

But you're not one of Google's biggest customers, so.... If you were, your experience might be very different.

We're smack on the frontpage of https://cloud.google.com/customers/

Google is more important to our parent org than Amazon; Google is a strategic partner, Amazon is simply another tech vendor. We might be part of an 11-12 figure megacorp now, but Amazon hasn't treated us too differently from when we were a tiny startup with not much spend, and neither has Google.

Yep - AWS even goes out of scope on support requests for smaller folks (they really shouldn't).

Google "support" is horrendous. You cannot PAY to get them to help you. We were on google apps (a long time ago now) and there was some state issue with admin transitions - so you'd get stuck. 100's of begging comments from plenty of PAYING users about the issue on their forums. Calls got you zilch. Crickets. Finally 2 years later - oh, we noticed blah blah and this might work now.

I would never trust anything critical to google. They literally will NOT take your money to help you - we'd have paid $10K to have someone press whatever damn button needed pressing.

I know GSuite is on a whole tier lower standard of support, but man do we have horror stories there. Here's a fun one:

One of our engineers needed to send out a couple hundred individual emails to other people inside our company. So, being an engineer he automated it, wrote a little script to send the emails via SMTP. I guess some system within Gmail flagged it as suspicious behavior and froze our account. Oh hey, now no one in the company can send or receive email. Great. IT tries to contact support. They told him to send an email from our account to open a ticket. We can't send emails. Took several hours of phone tag to eventually reach a human who wasn't on a helpdesk flow chart. His response? Google can't/won't do anything just wait for the automated systems to eventually release the freeze in the next few days.

So yeah, no company emails for the next day or so. Google thought that was a perfectly OK way to run an enterprise software service.

haha - this matches exactly my experience from above and a few other times! Glad I'm not the only one they crapped on despite being a paying customer. I'm playing with GCP but nobody I work with has actually deployed to it - "let's be safe" goes a long way toward keeping folks who make decisions off Google, I've found.

I like their search / email products though.

Could you provide examples of "very poor support" from Google Cloud?

AWS support tended to be very sharp, understand the situation and workflow holistically, and occasionally provide additional information (i.e. they could understand what, as the customer, I _needed to know to resolve my issue_ rather than only answering _exactly what I asked_). Like I said, 90% or more of AWS support asks required no follow ups. Not the case for GCP.

GCP support tended to give more generic or vague answers, or would simply "unblock you to the next blocker". You'd hope a support expert understood the workflows, but that didn't seem to be the case. Google searches seem to indicate this isn't too uncommon amongst cloud platform users. GCP is the most technically sophisticated product, but my experience as a user was that stability was far more comforting when fires broke out, as they will for any cloud vendor.

As an aside, I once experienced a G Suite "circular lockout" issue where I had to request permissions from myself. I spent hours agonizing over fixing it, and never actually heard back on that issue at all from the support team. I'm sure GCP and GSuite are independent support teams, though.

Note: I haven't used GCP in a bit over 12 months. Maybe things have changed since then, I don't know. But it certainly seemed that if you're a small company, they don't really care about you. I'd assume that GCP's large enterprise clients received excellent support.

You must have a really big contract with AWS to have such a great experience with AWS support, or are still at the level of manually requesting limit increase tickets, or something.

It seems like half the time AWS's tier-1 support sends me a link to the documentation that I linked to in my ticket, the other half the time I spend hours, sometimes days of engineering time to get the logs they asked for, only for them to come back with "I talked with the product team and oh yeah that's a known issue". If I'm really lucky, I don't have to prove to them it's their fault that something's broken before they admit that there's a problem.

Frustratingly, it's all covered under NDA, which is where things really get ugly, because you can't even really talk about it. I'm sure support is awesome if you're Netflix spending however many millions of dollars a month. Maybe at that level there's a secret site that straight up says product X has limitations Y and Z, but after having to prove to AWS that the problem is on their end, multiple times, their enterprise support is worthless. It's a good lever to push on in a contract though.

I've never used GCP support so can't say anything about it.

> Maybe at that level there's a secret site that straight up says product X has limitations Y and Z

I promise you that there is not. :)

Usually it went the other way. We (Netflix) would say "hey we think we found a limit in product X" and they would come back later and say, "huh you're right no one has ever seen that before". Then we'd get on the phone with the engineer who wrote it and we'd work through the bug together.

So in a sense, yes, we had amazing support, but only because we were their beta testers. It was a good relationship though, because it meant when we had problems we got a lot of help.

They were always willing to put the resources in to make sure we were insanely happy.

But talking to other customers, that part of the equation seems to be there no matter who you are.

> Google is struggling with their cloud business. While what they offer is technically superior to all their competitors, they don’t have the sales team

You sure about that? GCP has hired (or acquihired - Diane Greene for example) sales/management execs from all of the big boys in the industry (VMWare, SAP, Oracle, etc).

1) I see no indication it's "failing", just lagging behind the others.

2) My theory as to why it's been lagging behind AWS/Azure is because of trust. AWS was first and is now the "no one got fired for using AWS" of cloud computing. Microsoft is simply entrenched. Oh you want to migrate your on-premise Hyper-V Windows OS's to Azure? We'll gladly help you click this button. Google is notorious for lack of "can I call a person?" support and I think that's permeated itself into its enterprise sales.

GCP did $9 billion in revenue in 2019. Amazon and Microsoft may be ahead of them, but saying GCP is "struggling" is a weird way to put it.

GCP did not do $9B in revenue. Google Cloud did. That includes GApps.

And by struggling I mean struggling to keep up with Amazon and Microsoft, whose cloud computing units are significantly larger and growing faster in both absolute dollars and by percentage.

In other words they are behind and falling farther behind.

> larger and growing faster in both absolute dollars and by percentage.

Where are those specific numbers? AFAIK they don't split it out from all of Google Cloud.

They don't break them out publicly but they appear in private analyst reports, which are generally very well researched and pretty accurate.

Which are where? Source?

EDIT: I did some quick googling and no one seems to share anything. So where are you getting your guidance may I ask?


From private analyst reports. You basically have to know someone to get them (or pay a truckload of money). Sorry I can't give you a better source.

I'm well aware of their model. So do you personally have access to one? Can you provide a specific number? Or is this all conjecture?

Diane Greene was fired.

> Microsoft already had the enterprise sales team, they had to learn how to sell cloud.

They also had to grit their teeth and embrace Linux, which would have never happened if Ballmer was still there. There is just no demand/need for a Windows-only cloud offering. You want to be in cloud? You better embrace Linux.

> While what they offer is technically superior to all their competitors

I don't have exposure to their offerings but I've heard this and it is rarely qualified with specifics. What is it they have that is superior?

Namely their network and load balancer. The network is more stable and higher bandwidth and the load balancer is less pathological.

Edit: Amusingly I was downvoted for providing specifics. So I guess others don't agree with my assessment?

AWS offers hundreds of very mature, extremely reliable services, many of which have either no competition or half-assed clones with few of the same features.

Serverless is a great example of this. GCP and Azure both have serverless offerings, but neither of them has the equivalent of Lambda Layers, which has been groundbreaking.

Even if we grant that Google's network is better, how can you point to that single dimension and claim that GCP is a better cloud platform? For most businesses' use cases it would be professional malpractice to recommend GCP over AWS or Azure.

The products that Google does offer are technically superior to their equivalent products at AWS.

AWS has a much bigger breadth of offerings, and I 100% agree with you that it would be malpractice to recommend GCP over AWS, for myriad reasons.

But by mentioning Lambda layers, for example, you're not making an oranges to oranges comparison.

For the actual functions offering, for example, the Google one is cheaper, has more consistent network access to the data stores, and starts up faster. Technically superior in every way.

But I would never use GCP's serverless offering unless I had to.

Whatever the advantages of GCP, they don't amount to "technical superiority" imo.

The fact that Oracle's main relational database offerings are still the most technologically sophisticated does not mean that Oracle Cloud is technologically superior to AWS (or GCP for that matter). It just means that they beat the other cloud providers at one thing.

> They could build a cloud and then require any 3rd party apps that work with Facebook to use their cloud

That sounds like an antitrust investigation waiting to happen. (Or it should; that Google hasn't been murdered by prosecutors suggests that the system isn't working as it should)

Yea, this actually seems to fall pretty specifically into the Microsoft case.

AFAIK, Google does not require third parties working with Google to use GCP

> AFAIK, Google does not require third parties working with Google to use GCP

Oh no, I don't think so; I was thinking of their exploiting their search dominance to push Chrome.

That sounds like Firebase on Android.

Facebook may be too late to the IaaS game which is already a crowded field. I think they could easily do PaaS/SaaS based on their bread and butter domains. Between FB, Insta and WhatsApp they are probably the top mobile developer in the world, handle more user-generated content than anyone and messaging. These are super valuable services. Imagine spinning up a business and subscribing to Facebook's content moderation tools or app deployment pipeline? Those would be extremely valuable and don't have an outright market leader right now.

Google didn’t have a sales team? Uh, who do you think is selling all those ads?

Most ads are managed and sold by other agencies. Google redirects all but the largest accounts to its channel partners. And selling media is nothing like enterprise infrastructure.

Just want to say that I always see you post on ad-tech related stuff and your comments belong to the infinitesimal fraction of ad-tech stuff on this site that is informed and not just the result of pure speculation.

Thank you, I appreciate the note. It's based on 12 years and 4 adtech companies worth of experience in the industry, and now trying to change it from the inside.

Like it is and it isn't right?

Some media sales-people would be super comfortable selling enterprise stuff (because a bunch of them started at MS/Oracle/Sun etc).

The front-of-house salespeople would be fine, so you'd just need to hire the back-of-house sales-people (normally the more technical ones).

The real problem is that Marketing and IT are very different departments, and it's hard to change contacts in one area into opportunity in another (I think this applies to both Google and FB).

But mostly, to answer the titular question: it's about margins, nothing else. There's less money in cloud than in ads. So, like Google, they'll probably do something like this when investors start to worry about the ad market.

Sure but it's rare to successfully transition. The ad market is very political and depends on connections, not product. Google salespeople are also known as "order takers" since they don't really need to sell, companies always buy them by default.

It's the opposite in enterprise tech where Google is the underdog and needs to do quite a bit of selling, and base it on the strength of the product and functionality. GCP is hiring strongly for enterprise/software sales experience but struggling with the leadership to use them effectively.

There's plenty of margins in cloud though. AWS has proven it and GCP has the potential to be bigger than their ads business, but I agree that it's not currently as lucrative today.

> But they don’t have the enterprise sales team, just like Google.

Paraphrasing OP: Google knew how to sell ads, not how to sell cloud...

And msft sales knew how to sell cloud because they’ve been selling AD and office?

They had been selling infrastructure (Windows Servers). They already knew how to sell infrastructure. In fact, most of the first sales of Azure were just as an add on to their existing enterprise contracts.

- SQL Server

- The Dynamics suite

- Visual Studio / TFS

- Windows Server family products

All of those were "cloud-adjacent" even in their proto-forms as on-premises–only offerings. Even if they weren't "real cloud infrastructure" they were big Capital-E Enterprise products and a lot of selling cloud is still B2B Capital-E Enterprise work at the end of the day.

I remember in the late 90s (I think) articles about how MS just didn't get enterprise and would never break in.

Yeah these are on prem software licenses. Completely different from metered virtual infrastructure.

You're not thinking big enough. It's not about the specific thing you're selling, it's the skillset.

Microsoft sales has a skillset of going into a company, talking to the CIO/COO/CTO/CEO and selling them infrastructure to run their business.

Google has the skills of going to the CMO or the marketing manager and selling them advertising.

Which of those skill sets translates more directly to selling cloud infrastructure?

Sure, but the point remains that this was a sales engine that was very similar. It certainly was a faster path for Microsoft to bootstrap cloud sales teams than whatever Google's long, winding path from ad sales to whatever it is they think they are doing with GCP sales.

(Though it hasn't seemed to work effectively for IBM. That may just be IBM incompetence as old school commodity-priced Mainframe mentality should be exactly the same as cloud sales. There's probably an amusing alternate world where IBM Cloud is using something like 1950s presentations only slightly modified to sell "cloud" in 2020, just because they could.)

Doesn’t matter, it’s the relationships that count. People from Microsoft eat filet mignons with the top 50 IT execs at every major company. That’s why any turd of an MS product can immediately get tens of millions of paying users.

There's a difference between selling ads and selling IT to enterprises.

Google had experience in the former. Microsoft had endless experience in the latter.

I've thought they should break into the "social network services" market for a long time. You want to build the next Reddit or Facebook? Facebook is equipped to handle a lot of the heavy lifting. Facebook could profit off the competition by selling them engineering and compute as a service, without having to gobble up every threat.


"Build the next Reddit or Facebook, on Facebook Social Services"

That does sound good, even for someone like me who has spent considerable effort ridding my life of FB's influence.

The killer product they have is their users and the data mined from them. Allowing third-party apps to leverage their infrastructure as a service would be brilliant.

> While what they offer is technically superior to all their competitors

No it's not...

They were. Doesn't anyone remember Parse?


They still are in SaaS - with 'Workplace by Facebook'

The article is referring to IaaS from my initial read.


Parse doesn't seem to have been any kind of general-purpose cloud at all.

Rather, it appears a highly targeted backend specifically for mobile apps only. The article says "back-end services for data storage, notifications and user management".

They're some pieces of a cloud, for sure. But it's worlds away from the general-purpose offerings like AWS and Azure that the author is talking about. So not surprised the author didn't bother to mention it.

The author is specifically talking about higher-margin back-end infra beyond compute/storage, just like Parse was:

“... a hypothetical ‘Facebook cloud’ could offer very attractive, differentiating services beyond the basics of compute, storage, and network.”

I am kind of amazed the author didn't mention Parse once.

I was exposed to Parse, but was it ever widely used/popular?

It's still actively developed. https://github.com/parse-community/parse-server

Yes, it was a fairly popular BaaS offering during 2014-2016


It was definitely popular. It used to be the go-to solution on r/AndroidDev for people wanting a simple backend solution without having much experience. Everyone switched to Google Firebase after Facebook shut it down.

Why? The discussion is about IaaS not SaaS

The difference is that Parse ran on AWS. In my opinion, the real reason Facebook isn't in the cloud business, and why Parse was eventually shuttered, is that Facebook will never allow external code to run anywhere in their network.

I worked at FB, and I think they would probably do OK if they entered the space as they are very smart about physical infrastructure, but the most plausible explanation I heard from an executive (forget which one) when asked this was that they are already building out at a very fast clip, and feel that they can make a better ROI by using that infrastructure themselves rather than renting it out.

Zuck has said as much at all hands meetings.

If the growth knobs were turned off across all its products and spare capacity became a thing, then it might make sense, but at the rate they are building there is no purpose to being a middle man.

Make a code optimization and save 1-2% of CPU on the web tier and you will be a hero.

Matches my experience - was very much a "sure, we could do that, but it's not our core competency or the best use of our resources"

Right now Facebook has a couple of options when a new network emerges: buy the competitor, clone the competitor, ignore them, or compete with a not completely similar product. Why not another option, like "take a percentage of their profit in exchange for service"? I don't think it should be looked at as just a direct ROI, but also as a hedge against future obsolescence. It's a better position to be in, to be the ocean, if everybody wakes up one day and jumps ship.

Visa and Microsoft make a ton of money being a grease or tax on other business, a small shaving of money in exchange for massive efficiency, instead of trying to BE every business. Is anyone offering an out of the box Social Graph as a Service kit?

It would also be a huge growth opportunity for Facebook Ad Services, to be THE ad marketplace of all new social networks. And in a play right out of Adobe's playbook, buying some things like bigcommerce, ecommdash, shipstation would position them to sell underlying services that allow their customers to compete directly with Amazon, eBay, Adobe, Salesforce. There is surely a lot more money they could make selling SaaS and PaaS, vs becoming another generic IaaS provider.



You make money if you are the #1 or #2 player in a market. If you are more like #4, you will just lose money. If Facebook tried to enter the cloud market today, they would be way behind. Without a unique advantage that would induce customers to switch, why would anyone switch? And if they tried to compete by cutting prices they would just lose money.

I agree with your statement, but there's still a lot of room for improvement. AWS is dominant, but their software/tools are horrendous. Their web admin UI looks, feels, and has the general usability of some horrible Java enterprise tool designed by a committee of Windows developers in the mid-00s, because that's exactly what it is. I don't know how Azure and GCloud compare, but they cannot possibly be worse. AWS does what people need, but I don't know anyone who likes working with it. Except maybe Windows admins from the 00s.

Agreed, I feel like AWS has the worst UI of them all, followed by Azure's Metro/Visual Studio-style UI and then GCP is OK on the UI front, but lacks when it comes to monitoring and configuration options for things like CloudSQL for example.

AWS is all about the API, some services don't even have a UI. Once you start using the API AWS is the best option out there.

Yeah. My AWS experience is on small-scale deployments--up to a few dozen servers. It's painful to manage them through the Amazon UI. When I left that client, the devops team there was transitioning to Terraform and other deployment and configuration systems, and connecting to in-house monitoring systems. That's the right approach. It's out of reach for smaller teams who can't afford the extra engineering; they're better off with something like Digital Ocean. DO has a fantastic UI and a decent API, but doesn't support anywhere near the level of enterprise features that AWS does.

I did hear that, but sometimes it's nice to do things from the UI, especially when testing new services.

Omg.. yes. Their UI is awful. CloudWatch gives me ulcers. But! At the same time, infra engineers live on the command line using their CLI and what not.

Dev facing FB interfaces are hard to navigate and chaotic. Not a fan personally.

What would be interesting though is infrastructure that leans on what they know best in terms of architecture and development.

GraphQL APIs, React frontends, PHP, JS, MySQL... Stuff that works for them is widely adopted by the webdev world as well and webdevs want convenient, scalable infrastructure.

Oh and they also have people working with/on Haskell, right? I bet tons of Haskell devs would love to use FB infrastructure if Haskell support was provided. A niche market for sure, but I suspect it is one that longs for something like this.

I feel seen and heard. AWS' console, client libraries, documentation, naming, etc. never fail to disappoint. Do I think Facebook would do better? No clue.

I had a chunk of credit to spend on GCP, and it's been much better, and seems to be about the same price. (We went with AWS at work because, at the time, it was simply cheaper for us).

Google Cloud has an amazing interface, and their new developer experience is really nice. I normally don't have a lot of positive things to say about Google but they definitely did a good job there.

But who really cares? The web console is something that you only use once in a blue moon. What AWS did right was make a workable command line and API so that you don’t have to point and click all day.

Gcloud’s interface is significantly better

Is that really true? I can think of at least 4 brands of car, shoes, jam, and even hosting services.

Cloud infrastructure requires a massive, massive, massive amount of up-front investment. And below a certain % of the market, it becomes impossible to pay back that investment or even keep it going.

The industry is closer to large airplane manufacturers (e.g. only Airbus and Boeing), search engines (Google and a little bit of Bing), and similar.

Shoes, jam, and hosting services all require minimal investment. (Cars require a lot of investment too, but not at the level of building a cloud. Which is why there are more than 4 brands of cars, but not 4,000 like shoes or hosting services.)

EDIT: In response to comments below, I'm not talking about physical infrastructure here, which does scale rather than being largely up-front. I'm talking about the absolutely massive software investment in tooling and reliability. Not only are you building 1,000's of API endpoints across dozens of services, but you need to also guarantee 99.99% uptime and virtually zero data loss when required. And sure, a company like Facebook is the type of company that can engineer that stuff -- but it still costs $$$$$$$$$$$$ to build.
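To put the 99.99% figure in perspective, here is a quick back-of-the-envelope sketch of how little downtime that target leaves room for. This is plain arithmetic on the number quoted above, not any provider's actual SLA terms; the function name and the 30-day month are illustrative assumptions.

```python
def downtime_budget_minutes(availability: float, period_hours: float) -> float:
    """Minutes of downtime allowed by an availability target over a period."""
    return (1.0 - availability) * period_hours * 60.0


HOURS_PER_YEAR = 365 * 24        # assumption: non-leap year
HOURS_PER_MONTH = 30 * 24        # assumption: 30-day month

yearly = downtime_budget_minutes(0.9999, HOURS_PER_YEAR)
monthly = downtime_budget_minutes(0.9999, HOURS_PER_MONTH)

print(f"99.99% uptime allows about {yearly:.1f} min/year, {monthly:.1f} min/month")
```

That works out to roughly 52.6 minutes per year, or a bit over 4 minutes in a 30-day month, across every service covered by the target.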

There are dozens of providers outside of the top 3, like DigitalOcean and Packet. FB already has a massive investment for their own infrastructure so the marginal cost is greatly reduced.

I guess the point would be that they've already made that investment by building their various platforms.

> Cloud infrastructure requires a massive, massive, massive amount of up-front investment

Citation needed. If anything, it's the opposite, it scales up very linearly.

Digital Ocean is doing well. They're #4 as far as I remember. Maybe Oracle Cloud is suffering but that's because no one trusts Oracle - custom chipset or otherwise.

They could open source their container scheduler Tupperware and become the GKE in that space. FTE Facebook Tupperware Engine

I heard they are very passionate about aligning to their mission - empowering people to form community and bringing the world together.

How does entering into Cloud Business align to the company mission? Wouldn't it be a distraction?

I think you misspelled "spying on users and selling data/ads."

Well, both Google and FB have some sort of vague bullshit mission statements. What matters at the end of the day is growth and how Wall Street sees them. That's how they measure themselves. Facebook more so than its peers: "Growth at any cost".

(Former FB Eng)

The main reason is because it doesn't align with FB's values.

How is building a cloud business going to connect the world?

(As fluffy as this sounds, mission was taken seriously -- all the way up to Schrep)

How does the Libra crypto connect the world? No sarcasm here.

Having an easy way to send money around the world without having to worry about which currency it should be in would be a huge help for some communities. A lot of artistic/creative groups work on a kind of semi-gift-economy model where everyone buys everyone else's stuff, not quite as an arms-length market-value transaction but as a semi-social interaction where you buy things from the people you hang out with online at what's probably more than a "fair" price. Having something like Patreon but less US-specific and more closely integrated with social media (admittedly Patreon's integration with Discord is starting to fill that gap) would genuinely help.

It allows anyone around the world to send and receive money instantly and at low cost.

Without first-mover advantage and without an enterprise sales network, it would be just foolish for FB to waste their time and energy on such a highly competitive field without much in the way of natural advantages. You don't really synergize social media with cloud infra.

And they know it too, so they would rather compete in fields where they are good, namely social media and apps. Maybe it's part laziness, part lack of interest on their part too. But to me it would be wiser to focus on some promising new areas rather than start catching up two laps behind the competition. Yet it's not easy, as the VR/AR market shows. Maybe in ML services they could challenge Google a little more.

> You don't really synergize social media with cloud infra.

I think your first-mover and competitive industry points stand, but I don't see how retail and search synergize better with cloud infra than social. The cloud infra business model is (as far as I can tell) just building general purpose computing-at-scale features in a reusable, billable way and selling those to consumers. VM/container/database/serverless/etc orchestration isn't more aligned with retail than other businesses as far as I can tell.

Well with retail you are kind of already in the business of buying and selling hardware, so to me it doesn't seem that big of a stretch. In both you are handling big volumes with very high emphasis on cost-margins.

Search, yeah. Intrinsically maybe not, but Google has some very skilled engineers and researchers who have pioneered a lot of the modern cloud applications/algorithms, e.g. Borg/k8s, MapReduce, GFS and so on. So they have the know-how to build top-notch services, just maybe not the business savvy to sell them.

I'm not sure many people would be keen on deploying their entire cloud on Facebook either. Forget about privacy concerns and all that, it just doesn't make sense when they have the 'move fast and break things' philosophy.

The last thing you want in high availability cloud infra is a team of engineers who will sacrifice stability for innovation.

Even if they did do it, there's not a chance they could launch it under 'Facebook'.

Do they have a "move fast and break things" philosophy in 2020? I know they did a decade ago, but is that still true today? Further, does that philosophy apply to their core infrastructure, or just leaf-node feature development? Is there any indication that Facebook's cloud infra team sacrifices stability for innovation? When was the last time Facebook had a major outage that was traced to some core service? How does their infra track record stack up to AWS (I'm a happy AWS customer, but their infrastructure flakes on us in a major way several times per year)?

"move fast with stable infrastructure" is the current phrase

Here are a few reasons:

1) It's hard to do well and still be profitable.

2) It would require a massive internal shift. The internal security tools for doing things like IAM are far too primitive. It would take years to re-tool, stopping momentum on products that make money.

3) It's not where Zuckerberg sees Facebook in the future. That future is AR/VR.

Google stumbled with AR. Its structure means that any AR will eat into the mobile business.

Apple might make a splash, but it's unlikely that they will be first to market.

Which leaves a whole segment for Facebook to dominate, if it makes a decent product and tackles its image.

Apple just unveiled an iPad with a LiDAR sensor. I would say Apple is poised to make the biggest impact, since they can make the most phenomenal mass-market hardware and well-designed software to go with it.

AR/VR will still take a while to be mass market. We need a lot more compute, better algorithms and hardware.

> All of these clouds are spawned out of already-large tech companies, many of whom have built a massive technical infrastructure to support their original businesses, then turn those resources into services to rent out in the form of cloud

This is a myth that’s been disputed several times by high ranking Amazon officials. Amazon never turned their internal infrastructure into AWS. It was always a completely separate initiative. Amazon was not on AWS for years.

Google also only uses GCP for a few internal services.

Technical infrastructure = ability to build and run cost-efficient data centers at scale.

I'm not sure anybody's ever seriously claimed the software infrastructure was the same. The "origin myth" of AWS I've always heard is that many of their servers were going unused outside the Christmas season, so AWS was a way to monetize them the rest of the year. At a hardware level.

So all these tech companies have been able to leverage their technical infrastructure -- the hardware one.

There are many conflicting stories about the origin of AWS, but the whole "our infra is underutilized except during Christmas, let's rent it out!" is one that I never hear repeated by anyone with any claim to knowing what was actually happening. I think it's an "Omidyar Pez Dispenser"-tier story: fully bullshit.

It kind of doesn't even withstand a moment's scrutiny. What was Amazon going to do during Christmas? Like, if they lease all this spare capacity during non-peak, what happens to those customers during peak? There are spot instances now, but the first two AWS products, S3 and EC2, didn't come with some sort of, "Except during peak" contract. So Amazon needs exactly as much hardware to meet demand during Peak as it did before, and now it's got a bunch of customer workloads it also needs to serve during peak. I don't see how this would've solved any problems.
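A back-of-the-envelope sketch of that argument, in Python. The server counts and the leased fraction are made-up numbers, and the function name is hypothetical; the only assumption carried over from the comment is that customer workloads do not go away during peak.

```python
def peak_capacity_needed(retail_peak, retail_offpeak, leased_fraction):
    """Servers needed at peak if some off-peak idle capacity is leased out.

    Assumes (as the comment argues) that leased workloads keep running
    during peak, since early S3/EC2 had no "except during peak" clause.
    """
    idle_offpeak = retail_peak - retail_offpeak      # servers idle outside peak
    customer_load = leased_fraction * idle_offpeak   # idle capacity rented out
    return retail_peak + customer_load               # retail peak plus customers


# Hypothetical numbers, purely illustrative.
before = peak_capacity_needed(1000, 400, 0.0)  # nothing leased
after = peak_capacity_needed(1000, 400, 0.5)   # half the idle capacity leased

print(f"peak servers needed before leasing: {before:.0f}")
print(f"peak servers needed after leasing:  {after:.0f}")
```

Under these assumptions the peak requirement only goes up, which is the point being made: renting out idle servers does not reduce how much hardware is needed at Christmas.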

Most stories about the origin of AWS have a few things in common: (1) There was limited support for the idea of AWS at senior executive levels. (2) The early AWS products were not used for any significant Retail workloads within Amazon for quite a long time (as in, several years). (3) As recently as a few years ago, there were major organizational pushes to move legacy Retail workloads onto AWS. (4) As recently as last year, there were still so many teams on Oracle that everyone with an Oracle dependency was required to have an OKR to get off it. I include this because IIRC this is public: after Oracle was making hay with the "Amazon sells Dynamo/RDS but runs on Oracle" campaign, Amazon publicly released information about their Oracle Annihilation roadmap. So it supports the theory that it's taken Amazon quite a long time to shift certain Retail workloads to AWS, even in cases where an AWS product that's suited to the workload has existed for quite a long time.

I've never worked for Amazon, so I don't know, and it's impossible for all the stories I've heard from ex-Amazon people to all be true, but I think there's a pretty good argument against the "we should put this spare off-peak capacity to use" AWS origin myth.

I could totally imagine them being at capacity and buying more servers for Christmas in October, then after Christmas renting the excess out all year (Jan through Dec). Then in October buy more servers again, rinse, repeat. Sell 2019's Christmas burst servers starting in 2020... How does that not pass the sniff test?

Obviously they wouldn’t pull the rug out from under folks, and spot didn’t exist, but that doesn't mean it didn’t happen that way.

Not only was Retail on Oracle, so were large portions of AWS...

Many of GCP's services did actually grow out of Google's internal tools. For example, BigQuery is a public version of Google's internal Dremel. And Cloud Spanner is of course a public version of Spanner.

And Kubernetes came from Borg.

And Cloud Source Repositories is similar to the internal code search as well as https://cs.android.com and https://source.chromium.org.

Edit: and https://cs.opensource.google. It’s this product: https://developers.google.com/code-search

Kubernetes is an open-source project inspired by Borg, not a proprietary product like the examples given by GP.

Yes, Kubernetes's design was inspired by Borg, but the code is not Borg's code. BigQuery and Cloud Spanner use most of the code of the internal versions. That's the difference.

It supports the original claim that is being disputed:

> technical infrastructure to support their original businesses[0], then turn those resources into services[1] to rent out in the form of cloud[2]

[0]: i.e. Borg

[1]: i.e. Kubernetes

[2]: i.e. GKE

EMR was obviously inspired by Google. And arguably the whole idea of using huge numbers of cheap x86 machines, rather than more expensive servers was inspired by Google’s search architecture.

Imagine if Google suddenly decided to enforce the MapReduce patent.

But this is still backwards. Internal Google services are being cloned into watered-down versions for the public. Google isn’t moving to GCP internally.

Yes and no. They run on the same hardware and share a lot of the internal tools. Which is a very big lift.


Wrong; it's true. I work on Google Cloud Storage. There are multiple layers of sharing at varying depths. The externally visible APIs to cloud storage are the tip of the iceberg, and the iceberg is very, very deep.

The part about AWS is false.

That's the software level though. Further down the stack, things like hardware designs, hardware purchasing, and maintenance can be shared.

That assumes the basic assumptions hold. Google, for example, doesn’t really care much if individual searches are slightly less accurate, so low data center temperatures should be less important for them than for, say, a financial institution. https://www.cnet.com/news/google-computer-memory-flakier-tha...

Facebook’s hardware designs might generalize well, but perhaps not.

I...I don't think the article you're quoting supports your conclusions. It concludes that running (slightly) too hot doesn't meaningfully impact error rates, and that even if you run at the recommended temperature, memory errors will happen anyway.

So the conclusion is that, whether you're a financial institution or Google, you need ECC, at which point running too hot doesn't have any appreciable effect anyway.

“Previous research, such as some data from a 300-computer cluster, showed that memory modules had correctable error rates of 200 to 5,000 failures per billion hours of operation. Google, though, found the rate much higher: 25,000 to 75,000 failures per billion hours.”

First, it’s old data, but Google having observed higher error rates suggests different choices. Location, power supplies, motherboard design, etc. can all play a role. Further, they should be optimizing for slightly different things. Finally, they got data for temperatures from running their actual production system at these temperatures while assuming it would cause even more problems.
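For scale, here is a rough conversion of the quoted rates into expected failures per year, plus how far apart the two ranges are. The quote does not say per what unit of memory the rates apply, so the results are per whatever unit the quoted figures refer to; the helper name is made up for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760; leap years ignored


def failures_per_year(failures_per_billion_hours):
    """Convert a rate in failures per 10^9 hours to expected failures per year."""
    return failures_per_billion_hours * HOURS_PER_YEAR / 1e9


# Ranges exactly as quoted above (failures per billion hours of operation).
prior = (200, 5_000)
google = (25_000, 75_000)

for label, (low, high) in (("prior research", prior), ("Google", google)):
    print(f"{label}: {failures_per_year(low):.4f} to "
          f"{failures_per_year(high):.4f} failures per year")

# How many times higher Google's range is, at its closest and widest.
low_ratio = google[0] / prior[1]   # Google's low end vs. prior high end
high_ratio = google[1] / prior[0]  # Google's high end vs. prior low end
print(f"Google's rates are {low_ratio:.0f}x to {high_ratio:.0f}x higher")
```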

> but Google having observed higher error rates suggests different choices.

Alternatively, it suggests better data from a larger, real-er world study.

>Finally, they got data for temperatures from running their actual production system at these temperatures while assuming it would cause even more problems.

I'm not sure what you're saying here. To be able to detect the effect of temp, they ran at both higher and lower temps and compared.

Better data is unlikely to be the case, as these are simply reports of ECC faults. Every large data center can collect this information equally easily. The ranges are simply for location or time specific variations not some uncertainty metric. You can read other articles about their hardware choices and why that may be an issue.

> Every large data center can collect this information equally easily.

Right, but they aren't published in controlled studies, usually. The article mentions that prior to the Google study in question, the next best example used 300 machines.

So yes, compared to contemporary (and since then!) public studies of ECC faults, I'd say that the Google study is pretty darn authoritative. You're welcome to cite other recent examples to the contrary though (with DDR 1 and early DDR 2 RAM, of course. Modern sticks fault less).

> The ranges are simply for location or time specific variations not some uncertainty metric.

I'm not sure what you're talking about. Sections 5.2 and 5.3 of the paper are about the effect of temperature and utilization. They show that temperature has a negligible effect when controlling for utilization, while the reverse is untrue.

I disagree that 300 computers was simply not enough data to be a useful benchmark. But you can find studies using more RAM if you go looking.

Supercomputers of that era had far more than 300 nodes, and you can find studies of their memory error rates. The Roadrunner supercomputer, for example, had 19,440 compute nodes with 4 GB of RAM each.

PS: The architecture was odd with 6,480 Opteron processors and 12,960 Cell processors + 216 System x3755 I/O nodes, but it’s still using commodity RAM.

But, as mentioned, it didn't have long-term RAM reliability numbers published.

Just one example among many, for the Jaguar supercomputer with 18,866 nodes using DDR2: https://arch.cs.utah.edu/arch-rd-club/dram-errors.pdf. 250,000 correctable memory errors per month, so quite a bit of data to work with.

This is simply a low-effort type of paper on an important issue. So there are plenty of them out there.
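To put the Jaguar numbers above on a per-node basis, a quick calculation using only the two figures quoted in this comment. The 30-day month is my assumption, not something from the paper.

```python
# Figures quoted in the comment above.
NODES = 18_866
CORRECTABLE_ERRORS_PER_MONTH = 250_000

# Assumption for illustration: a 30-day month.
HOURS_PER_MONTH = 30 * 24

errors_per_node_month = CORRECTABLE_ERRORS_PER_MONTH / NODES
errors_per_node_hour = errors_per_node_month / HOURS_PER_MONTH

print(f"{errors_per_node_month:.1f} correctable errors per node per month")
print(f"{errors_per_node_hour:.4f} correctable errors per node per hour")
```

That works out to roughly 13 correctable errors per node per month on average, though an average like this says nothing about how the errors are distributed across nodes.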

PS: If you want to compare here is one for DDR3 https://www.cs.virginia.edu/~gurumurthi/papers/asplos15.pdf

I think it's a mix of:

1) being too late in the game, at this stage.

2) being unable to rebrand themselves as a trustworthy service for enterprises. They're stuck with the "social" and "fun" label.

While they have competent infrastructure, an IaaS offering needs a platform whose capabilities are more complex than one where you're running only first-party applications. And even if they spent the time and effort, they'd be entering a commoditized, crowded field. All the big players can run servers; the value-add is from SaaS and/or the surrounding PaaS, the latter of which they'd have to build from scratch.

Their previous forays into PaaS, done at a time when Twitter was also in this space, have faltered. And Facebook's pre-existing services are too far up the stack to be relevant to any IaaS ambition.

I doubt the 'trust issues' contribute significantly to Facebook's decision not to offer this, but I agree they'd be a factor for potential customers once the offering came to be. Facebook's branding is odd in that they've doubled down on the Facebook brand despite acquiring services that flourished in part for being "not Facebook". They seem resistant to rebranding into a holding company, and their choice supports the view that the company's tech mission is to remain tightly coupled with their flagship social network.

The big miss for Facebook isn’t cloud computing platforms, but mobile platforms. Mark has admitted this himself when asked what his mistakes/regrets are. It was easy to miss in the context of social’s massive growth. But the dust is settling and it is clear that mobile platforms hold tremendous power over consumers.

Facebook didn’t miss mobile. They completely repositioned themselves when it came to mobile and pivoted away from being an app platform.

Mobile was the best thing that happened to them.

It was the best thing that happened to them, but also caused them to cede control to platform vendors.

I think you give Facebook too much credit for repositioning themselves. Facebook and the news feed were a natural product for mobile. There was no pivot at all. They merely saw where user attention was going and built for that surface. I would argue they got off track by building for mobile "web" and thinking native APIs weren't important. Then they course-corrected. But I think the miss here is that social connectivity was a new thing happening to software and they could have made a strong play to influence the direction of the mobile OS. They don't have much leverage now.

Facebook did a huge company-wide refocus on mobile at a time when it was not at all clear smartphones were going to become the dominant platform. I really do give Zuck et al. a lot of credit for seeing that coming [source: I was an employee at the time].

React Native?

I think the article is talking about lower-level IaaS/PaaS.

Based on what would they expect to win business, from whom? Could they do it at lower cost? Probably not. Do they have any technology that leapfrogs the others? Not aware of any. Do they have good business relationships for this sort of thing? No. So what else is there?

I don't think the author understands why/how the cloud business works.

Software is just part of it. It is fundamentally a logistics problem. And Amazon is not only a software shop, it is also the most advanced logistics company in the US, both physically and virtually, and those two go hand in hand.

So no, I don't think FB can get into the cloud business anytime soon, and I also doubt they have enough cash to actually expand the business. It is going to be a huge investment, globally. The market right now is not going to be forgiving to newcomers.

Disclaimer: ex-AWS employee

Short answer: they’re not at all trustworthy.

So you're saying "move fast and break stuff" isn't good for a platform pitch?

Then why is Google? They have the same business model and disregard for privacy.

Google has never been implicated in a plot to hack the U.S. presidential election. Now, I don't actually believe that the ads displayed on Facebook had any real impact on how people voted. They were never on screen for very long, and they took up a small percentage of screen space compared to all the other user- and advertiser-generated content. But Zuckerberg did testify before Congress on live TV, and that's what people are going to remember.

Google and Twitter were targeted by the Russians as well. They just didn't go public when they found the evidence, unlike Facebook, until after Facebook did. Facebook bears the brunt in the press because it screwed over the press so much in recent years.

> Google has never been implicated in a plot to hack the U.S. presidential election.

Sure they have. The very same election as Facebook (and Twitter), in fact.


Not saying Google is an angel, but at least I don't recall them collecting phone numbers for 2FA while promising to not use them for ad targeting and then silently doing so.

This thread is filled with comments about how people don't trust Google enough to use GCP. It's hurting them too, and given how far behind AWS and the others they are, it's a serious hurdle to overcome.

Google doesn't get caught as often?

And how are they doing compared to AWS and Azure?

Though I doubt that's the reason GCP is where it is, it doesn't really help your point.

I’d like to know why they’re not in the search business. They have enough signal from their platforms to offer great personalized search results, and it would allow them to grow their ad revenue with intent-based ads.

The same reason why Apple is not in the search business even though search is a key part of the phone experience: someone else does the job really well.

Facebook would need a way to differentiate the search experience. I don't think personalization is enough since Google personalizes too.

I also don't think a company should build products for the _purpose_ of revenue. Revenue is the output of building a great product with a purpose for users. So I think it goes back to Facebook asking what unique purpose it would provide in its search experience.

Facebook should get into the search business for the same reason that Apple is getting into the software subscription business: it's an obvious way to grow their business.

Facebook already has the analytics in place from their share buttons, ad platform and user engagement on the feed to create a great ranking system. A lot of people already use Instagram to search for restaurants, fashion, travel and products, and the same goes for Facebook Marketplace and business pages. All they'd need to do is expand the search bars in their apps to include web results. Tracking search queries would give them true intent data that they could use to improve their ad targeting across their whole ad platform.

The only thing Google has on Facebook right now is the ability to let advertisers target search queries. (Just did a Google search and it looks like they're starting to add that option: https://www.facebook.com/business/news/reach-people-who-are-...)

Because why would you use it over Google? That's a lot of effort to build and launch a likely poorly differentiated service that they'd need to spend a lot of money on to steal users from Google. Why would a user prefer Facebook to Google for search?

Because a huge portion of the world already spends all day on Facebook, Instagram and WhatsApp. All they have to do is integrate web search into their apps.

I'm sure they already have most of the web indexed and search history would give them amazing information about their users.
