- You're too small for efficient economies of scale on your own equipment (i.e. AWS is cheaper when considering total cost of ownership).
- You need to scale rapidly to meet demand.
The second one is largely a data issue: if you have enough historical data on your customers, their habits, usage, and so on, then scaling becomes predictable, and even when it isn't, you could offload only part of your infrastructure to a cloud vendor.
What's interesting is that several companies that I know which rely on AWS/Azure/et al aren't on it for either of the two above stated "good" reasons.
They are large businesses and do almost no automated scaling. They're on it for what can only be described as internal political limitations: they use these services to remove the politics of technology infrastructure. That means one less manager at the top, a shorter chain of communications, an external party to blame when something does go wrong, and issues like HR/benefits/etc. for infrastructure employees falling outside their scope.
In effect they view themselves as "not a technology company", so they look to employ as few technology employees as they can, even in cases where technology is paramount to their success. It is very interesting to watch, and I'm not even claiming they're "wrong" to handle their infrastructure this way, just that it is hard to quantify the exact reasoning for it.
I'd argue that the core competency of Dropbox is its easy syncing. Dropbox wanted to get that to market quickly. If they had spent the time building out a data storage solution on their own, it would have meant months or years of work before they had a reliable product. Paying AWS means giving Amazon some premium, but it also means that you don't have to build out that item. It's not only about economies of scale and rapid demand. It's also about time to market.
I think it's a reasonable strategy to calculate out something along the lines of "we can pay Amazon $3N to store our data or store it ourselves for $N. However, it will take a year to build a reliable, distributed data store and we don't even know if customers want our product yet. So, let's build it on Amazon and if we get traction, we'll migrate."
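To put rough numbers on that "$3N versus $N" trade-off, here is a minimal sketch of the break-even arithmetic. The dollar figures and the `breakeven_months` helper are made up for illustration; nothing here reflects Dropbox's or Amazon's actual costs.

```python
def breakeven_months(in_house_monthly: float,
                     cloud_multiplier: float,
                     build_cost: float) -> float:
    """Months of operation after which a one-time build cost is repaid
    by the monthly saving of running in-house instead of renting."""
    cloud_monthly = in_house_monthly * cloud_multiplier
    monthly_saving = cloud_monthly - in_house_monthly
    if monthly_saving <= 0:
        raise ValueError("in-house is never cheaper with these inputs")
    return build_cost / monthly_saving


# Hypothetical inputs: N = $10k/month in-house, 3N on the cloud,
# and a $1.2M one-time cost to build a reliable distributed store.
months = breakeven_months(10_000, 3.0, 1_200_000)
print(months)  # 60.0
```

With those assumed inputs the build only pays for itself after five years of operation, which is a long time to wait when you don't yet know whether customers want the product.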
S3 is a value-added service and creating your own S3 means sinking time. Even though data storage is very very near to Dropbox's core competency, it's really the syncing that was the selling point of Dropbox. To get that syncing product in front of customers as fast as possible, leveraging S3 made a lot of sense. It gave them a much faster time to market.
As time went on, they had traction, and S3 costs mounted, it made sense for them to start investing in their own data storage.
It's about figuring out what's important (the syncing is the product) and figuring out what will help you go to market fast (S3) and figuring out how to lower costs after you have traction (transitioning to in-house storage).
Yes, a lot of companies use cloud services when they don't need them. However, Google Cloud's compute pricing is reasonably similar to DigitalOcean (with sustained usage discounts) and from what I hear these companies will often negotiate discounts. AWS can seem a bit pricy compared to alternatives, but I'm guessing that Amazon offers just enough discounts to large customers that they look at the cost of running their own stuff and the cost of migration and Amazon doesn't look so bad.
Still, when you're trying to go to market, you don't want to be distracted building pieces that customers don't care about when you can rent it from Amazon for reasonable rates. You haven't even proven that someone wants your product yet and your time is better spent on delivering what the customers want rather than infrastructure that saves costs. As you mature as a company, the calculus can change and Dropbox seems to have hit that transition quite well.
I imagine this could be the case for a lot of smaller tech startups, and perhaps even some larger companies that don’t have significant web traffic or ongoing real-time computer services.
Something like Gusto might be a good example. I would guess that each of their paying customers (employees of companies using them) leads to only a handful of initial or yearly setup tasks and maybe a handful of web requests per month, but represents solid revenue.
The most obvious counterexamples would be any company with persistent real-time services, like Dropbox or Mixpanel, or companies with a huge number of web requests with a very small rate of conversion to revenue, like an ad network or an ad-supported social network or media site.
Of course, the dollar amounts Dropbox saved are compared to those negotiated prices.
In my experience it isn't so much the storage price itself, but the network transfer that makes AWS absurdly expensive.
Most companies don't need something like S3. They can get by perfectly well with one server, maybe using RAID-1, or just using backups. Data corruption mostly happens through logical errors anyway and nothing in S3 will protect you from that.
S3 supports object versioning, which very much will protect you from anything other than writing the wrong data for the entire history of an object.
Anyway, there are plenty of filesystems that have versioning built in as well.
Besides, I've been tasked with recovering the specific state of a database where every version of every object was available, and it was only some 200k or so records. That took me about 2 weeks. (mostly for writing code that could find a consistent version of the whole thing)
Given Dropbox's storage use case, what would the percentage saving be if Dropbox had instead gone with Google or DigitalOcean?
In every one of those cases the group in charge has been the wrong group and it really makes you wonder who has been asleep at the wheel so long that this has occurred.
Maybe outsourcing to AWS for a couple of years is a good way to reboot the organization. Cheaper than slowly going out of business. When the fad dies down you start hiring people back who are a little more humble and cheaper than AWS.
It seems like this problem happens in organizations that equate headcount with power. Which I guess makes some sort of sense but doesn’t feel right. Plenty of companies do not have the majority of their employees working on producing the goods they sell. Especially if they’ve started automating.
But as I said above, this is an ‘asleep at the wheel’ situation. It seems like it’s often not the biggest problem these companies have with their vision.
And, it sometimes happens that actual tech companies start outsourcing their tech, which is a whole other troubling pattern.
Heh, at one previous company it would take 6-9 months to provision a new VM and 12-18 months for a physical. Those entrenched IT organisations absolutely deserve to get their lunches eaten.
Buuuut at my current workplace I am starting to see some slowdowns in doing AWS stuff as "departments" get more involved.
Like the "cloud team" does the account but networks must provision the VPC, and there's a "gateway review board" that gets involved if you define a new network egress or ingress etc etc.
I feel like many of the early advantages of cloud in enterprise are going to get eroded as the paper pushers catch on and "add value by defining process".
Yes, this is totally antithetical to modern working practices such as "DevOps".
At that place e.g. the VM team could provision you a VM (eventually) but they couldn't do you a login on it, that was some other team. But you couldn't raise the paperwork against that team until the VM was created, so everything proceeded serially, and each step at a glacial pace. Compared with them, Azure or AWS literally take under one minute for every month they took!
There was an old saying that 80% of outages were caused by humans. But there is more to account for than that. For example:
- Certifications that the org has obtained
- Contracts that the org has signed with customers
- Compliance with laws and regulations (including international ones)
- Insurance requirements
It can add up in a hurry and can slow an org down quickly.
It would be nice if an org could review all of its failures every few years and drop nearly all of its processes and procedures and start out fresh again - vowing to build around the failures of the past.
Indeed. That IT department is learning a lesson of what happens when your end users discover cloud providers and come into the office and tell their management how quick and easy it is. Managers don't know or care about the past - they only know that what should take minutes is taking months. If that department had been able to turn around a VM request in a day or two - not unreasonable - they would have been safe. Now they are being asked questions for which they have no good answers, such as, what exactly do all of you do all day?
Also, in these days of data breaches where the sensitive data was just lying around in open S3 buckets, more and more places require approval from above before adding anything to AWS.
While your reasons are valid you are missing an important one:
Resource scarcity: the engineers that I would need to allocate to infrastructure I would rather have working on user-facing features and improvements. Talent is scarce, and being able to outsource infrastructure frees up valuable engineering time.
This is one of the main reasons, for example, that Spotify (I'm not working for them) is moving to Google Cloud.
There can be advantages in that developers can often do the tasks passably well, so you can spread tasks around. But if that time isn't accounted for, people are often fooling themselves when it comes to the costs.
When it comes to large companies like Spotify, the situation changes substantially in that they're virtually guaranteed to pay a fraction of published prices (at least that's my experience with much smaller companies that have bothered to negotiate).
This has been my experience working with companies that use cloud services as well.
Another big waste of time is on application optimization especially around database usage. Cloud services tend to provide very low IOPS storage (and then charge exorbitant amounts for semi-decent performance) which forces spending a lot of wasted time on optimization which would never be an issue on dedicated hardware.
It's generally the case across large parts of IT. I confused the heck out of the first manager I started sending itemized weekly reports of the cost of each functional area and feature requests (based on average salary per job description), as he'd never seen it before. But it very quickly changed a lot of behaviors when they realized the value of the resources spent on various features.
It is cheaper than hiring a full DevOps team which is a better apples to apples comparison. By spreading the load across the dev team I can automatically get a high bus factor and 24/7 hour on-call rotation. If the load cannot be spread across the team but requires specialized DevOps engineers then I lose both those very important points. Obviously once your company is large enough it's different but for small teams/companies it is an important factor.
My experience is that it often is more expensive when you actually account for lost development time.
> By spreading the load across the dev team I can automatically get a high bus factor and 24/7 hour on-call rotation.
... and my experience of this is that I'll never do it, ever, because of how it affects retention.
> Obviously once your company is large enough it's different but for small teams/companies it is an important factor.
Most of my clients have been small to medium sized, and I see the cost tip in teams with as few as 3-4 people at times.
This assertion supports what vidarh wrote. What you wrote has nothing to do with DevOps or software or engineering - what you are really saying is that you are saving money by coercing your developers into working two jobs at the same time. I have been in this position as a developer at a company where we had on-call rotations. This is a false economy and a quick way to increase stress, alienate employees, and increase turnover. Infrastructure tasks get neglected and are performed poorly because those tasks are now just necessary distractions to the main workload of feature development, to be gotten over with as quickly as possible. A lot of things get overlooked because no one "owns" areas like backups and disaster recovery.
It doesn't take many people to do soup to nuts businesses.. think WhatsApp's 50 engineers, Netflix's 100 person OCA team (if you don't think OCA is a standalone product you don't know much about technology business) doing 40% of the Internet by volume.. the vast majority of people who work in technology just aren't very good. Business governance grossly underestimates the effects of mediocre performance.
So the real question is why governors aren't trying to encourage WhatsApp and OCA style businesses, since it's far more cost efficient. I understand why an organization itself empire builds: misaligned incentives.
Cloud services still need configuring and managing. You're saving on 2-3 days upfront on racking and cabling, on boxes that will last at least 3 years, probably longer. So if this is your only reason, you're making a false economy, eventually the costs of an undermanaged cloud will bite you (e.g. VM sprawl, networking rules that no-one understands, possibly-orphaned storage, etc).
"Infrastructure" is a little broader than just some cabling, much broader. You're also assuming that whoever will be in charge of DIY is a) more competent at scale than whatever will be scraped together for the cloud, and b) available with no resource cannibalisation.
The point the person you're replying to was trying to make was that for every "good" hire you're deciding where to allocate them, and sourcing plumbing from a cloud provider lets you allocate preferentially to product development (i.e. business growth). Even if you "pay more" for that setup, in theory the business growth you achieve more rapidly pays for it many times over (first mover advantage, market leader advantage, cost of money over time, etc).
The costs of pinching the wrong penny and making technical hiring more difficult, diluting your talent pool, can be the difference between huge success and too little too late. An undermanaged local setup that cost you 3 years on Time to Market will bite you long before 'eventually' comes, and you won't have oodles of cash to fix the problem.
For personal projects, I use AWS and Azure (though am likely to migrate everything to a single box at OVH because it turns out to be cheaper for better performance - go figure) and it's made a certain amount of sense (up to now). At work we use dedicated hardware, because the cloud can't deliver the bang per buck.
At growth companies using cloud makes complete sense because it's all about time to market and iterating on your business proposition. Requirements change all the time and having the flexibility of cloud offerings gives you the velocity. Whereas at scale or in maintenance mode it makes much more sense to cut corners and optimise spending.
Either way you want to focus on what brings the most business value.
Those two reasons (for needing AWS) are the technical problems that AWS can solve, but you can't solve in-house. That is, they are not even solvable on a boardroom whiteboard, where the board pretends everything is just a matter of resource (money) allocation.
But (imo) most of the things that companies fail on... it's not because it is impossible to do a good job. They fail for less inevitable reasons.
In any case, I actually like the strategy where you try to be good at the things that you're good at but minimize things you need to be good at.
Dropbox knew that AWS was expensive. If the numbers here are real, then in-housing would have been a big efficiency gain (on said boardroom whiteboards) for years. Makes sense when you consider what Dropbox does.
I assume they paid this price because it let them avoid being an infrastructure company. They would have had to be a very good infrastructure company. Why introduce this additional failure point, limiting factor or whatnot?
I (maybe you too) have seen the kinds of problems that AWS solves be the limiting factor in a bunch of companies. The fact that they're technically solvable is almost academic, at some point.
tldr, sometimes it's good to solve certain problems with money.
- you need to iterate rapidly with scale and reliability. if you have the right expertise this becomes very quick to set up. it lets you focus 100% on product iterations.
- you need (predictable) on-demand compute for crunching large amounts of data or running some batch jobs. it just doesn't make sense to do this on your own equipment.
- your median cpu utilization is low, so you want to save costs and you move to a serverless architecture, effectively moving the cpu utilization you pay for to 100%.
- But most importantly AWS isn't just compute and storage primitives: AWS has a vast array of abstractions on top of the cloud primitives: managed clusters, machine learning services, virtual desktops, app streaming, CI/CD pipelines, built-in IAM and role-based access control, to name a few!
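The low-utilization point in the list above is simple arithmetic: a machine you pay for around the clock costs its hourly price divided by its utilization per hour of useful work. A rough sketch follows, with hypothetical prices and helper names (these are not real AWS rates).

```python
def cost_per_useful_hour(hourly_price: float, utilization: float) -> float:
    """Effective price of one hour of actual work on an always-on server."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return hourly_price / utilization


def serverless_is_cheaper(server_hourly: float,
                          utilization: float,
                          serverless_busy_hourly: float) -> bool:
    """True when paying only for busy time beats the always-on server."""
    return serverless_busy_hourly < cost_per_useful_hour(server_hourly, utilization)


# Hypothetical: a $0.10/hour server versus serverless at $0.40 per busy hour.
print(cost_per_useful_hour(0.10, 0.10))         # 1.0
print(serverless_is_cheaper(0.10, 0.10, 0.40))  # True
print(serverless_is_cheaper(0.10, 0.50, 0.40))  # False
```

Under these assumed prices serverless wins at 10% utilization and loses at 50%, so the answer depends entirely on how idle your machines really are.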
Now more and more companies are locked in to Amazon.
Hard to find a good old Datacenter Admin.
I know perfectly well how to provision and scale a large infrastructure and can give you 99.999% availability in any application, BGP if needed.
Yet no one is interested in that. Sure I can write Ansible scripts and Terraform policies, but it's a minuscule part of my skillset and doing it on AWS is just boring compared to building the backend that powers it.
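For a sense of what a five-nines availability figure commits you to, here is a small sketch that converts an availability percentage into a yearly downtime budget. It assumes a 365-day year, and `downtime_minutes_per_year` is just an illustrative helper.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes of downtime per year allowed by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return (100 - availability_percent) / 100 * MINUTES_PER_YEAR


for target in (99.9, 99.99, 99.999):
    print(target, round(downtime_minutes_per_year(target), 2))
```

Five nines works out to a little over five minutes of downtime per year, which leaves essentially no room for a human to be paged, wake up, and fix anything by hand.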
The counter argument I have is that at different sizes of operations, completely new skills become important, so you and your staff are left behind.
Example: my previous employer became large enough in terms of hardware footprint (~>1M cores) that it started getting difficult to find commercial colo space. How good are software and systems engineers at electrical engineering? :)
Granted, if you need 1M+ cores, you're going to be dealing with humans most places (including AWS) to get the best deal possible, and that also means the cost differences can change fairly substantially (e.g. the instances I know of that are in "ask us" territory are not paying anywhere near published prices)
That said 1M cores is not that much. Depending on your needs it's as little as "just" 500 racks. Plenty of managed providers will be happy to provide customized service to e.g. design and/or manage a setup for something that size.
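The "500 racks" figure implies a density of 2,000 cores per rack. A quick sketch of that arithmetic follows; the split into 40 servers of 50 cores each is an assumption chosen only to match that density, not a claim about any real deployment.

```python
def racks_needed(total_cores: int, servers_per_rack: int, cores_per_server: int) -> int:
    """Number of racks required, rounding up to whole racks."""
    cores_per_rack = servers_per_rack * cores_per_server
    if cores_per_rack <= 0:
        raise ValueError("rack capacity must be positive")
    return -(-total_cores // cores_per_rack)  # ceiling division on integers


# Density implied by "1M cores in 500 racks":
print(1_000_000 // 500)  # 2000 cores per rack

# Hypothetical layout that reaches it: 40 servers per rack, 50 cores each.
print(racks_needed(1_000_000, 40, 50))  # 500
```

Halve the per-server core count and the same footprint needs 1,000 racks, so the "as little as" in that estimate is doing real work.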
You go to google cloud and order 10k preemptible instances per datacenter. That solves half of your problem. Then the same onto SoftLayer and AWS and revise monthly who is the cheapest one. It's not very difficult.
Unlike you, I think that most places are not ready to accept that kind of contract. I have a friend who evaluated moving one of his compute grids to DigitalOcean, and he was contacted and told to stop doing that: their servers are not meant to run at 100% CPU use all the time.
Regarding cloud being much higher cost: indeed. Anecdotal evidence; I know of a big reference customer who said they would consider the move to cloud a success if total cost didn't go up more than 2x. A few years in, that hadn't happened yet...
Twice as much would be a huge savings. In the cloud, I can give you 20k cores in one hour and I can give you 100k cores in two hours, after the initial investment to open multiple regions and network backbones. The price is known in advance and the investment can be cancelled anytime.
Obtaining the same 20k cores from a colo will be a hassle. Months of negotiations with nothing being done. I'd expect any sane provider to bail if trying to discuss 100k cores, it's too much upfront investment. It would take years to get half of it, the project is long abandoned before anything is delivered and everyone who worked on it has left.
As a tech enthusiast I love what's possible with AWS, Azure, GC, etc. As a salesperson I don't mind selling these services (although the margins stink compared to selling VPS or dedicated). But there is a lot of cloud-overkill going on out there.
The big reason for most people is CAPEX vs. OPEX - even if it doesn't make financial sense in a dollar amount, it does in an accounting sense. Investors don't like to see big CAPEX numbers but seem fine with large OPEX ones.
If things go pear shaped large OPEX numbers resolve themselves as OP-erations get slimmed and shut down. Large CAPEX numbers, in the same situation, resolve themselves through liquidation and tears...
More importantly, OPEX comes from next year's profits, yielding a business I can loan against. CAPEX comes from last year's profits, increasing the amount of loans I need to get it together.
It's the difference between thinking about short term profit margins and thinking about asset growth over time. Throwing a lot of optional cash today at a problem is better business than being forced to throw non-optional cash at a problem whenever the problem is feeling problematic. It's also quite freeing in terms of M&A.
That's a very illuminating way to highlight the accounting fears mentioned in the gp.
No one tries to build their own power station, or make their own laptops. They're better off using engineering resources on higher order stuff, unless, like Dropbox, the margins / TCO you are getting on your storage is an absolutely huge deal.
They might outsource most of the work, but then the gap between generating and using power is much more clearly defined.
Someone asked me by PM how to get your hands on that pile of virtual cash. I'm no expert on that, and likely YMMV, but for us, we basically called account managers at the big 3 (GCP, AWS, Azure) and went shopping. With some PR visibility (i.e. seed, some stupid "innovation reward", I guess any alternative empty proof of existence applies), we got offers (in the 50-500K range).
The way I see it, that's the de facto standard - launch a VC funded startup, and cloud providers will line up at your door to shower credits on your head.
Google also has a dedicated page where they partner with a bunch of VCs.