Dropbox saved almost $75M over two years by moving out of AWS (geekwire.com)
494 points by shaklee3 11 months ago | 210 comments

AWS sells optionality. If you build your own data center, you are vulnerable to uncertain needs in a lot of ways.

(1) your business scales at a different rate than you planned -- either faster or slower are problems!

(2) you have traffic spikes, so you have to over-provision. There is then a tradeoff doing it yourself: do you pay for infrastructure you barely ever use, or do you have reliability problems at peak traffic?

(3) your business plans shift or pivot

A big chunk of the Amazon price should be considered as providing flexibility in the future. It isn't fair to compare prices looking backward, when you already know what your actual needs were and can compare what it would have cost to meet them with AWS vs in-house.

The valid comparison is forward looking: what will it cost to meet needs over an uncertain variety of scenarios by AWS compared to in-house.

The corollary of this is, for a well-established business with predictable needs, going in-house will probably be cheaper. But for a growing or changing or inherently unpredictable business, the flexibility AWS sells makes more sense!

You are right with this analysis. The only thing to add is why Amazon still isn’t the only choice and why sometimes it is in fact better to not use it.

Your comment makes it sound like unless you know your growth pattern, can predict your spikes, don’t plan to pivot, and you know these things perfectly, you will lose. That’s not the case. The reason for that is that you have a quite significant cost difference between AWS and DIY. DIY is cheaper by a significant enough margin that you might be able to buy, say, double the capacity you need and still spend less. So if you made a mistake of up to 2x your growth and your spikes, you are still fine.
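To put rough numbers on that margin-of-error argument, here is a minimal sketch. The prices are made up for illustration (they are not quotes from any provider), and the function names are hypothetical; the only claim taken from the comment above is that DIY can be over-provisioned and still come out cheaper.

```python
def max_overprovision_factor(cloud_unit_cost, diy_unit_cost):
    """How many times the needed capacity you could buy DIY
    before spending as much as the cloud would have cost."""
    return cloud_unit_cost / diy_unit_cost


def diy_still_cheaper(cloud_unit_cost, diy_unit_cost, needed_units, overprovision_factor):
    """Compare cloud cost for exactly what you need against
    DIY cost for an over-provisioned fleet."""
    cloud_total = cloud_unit_cost * needed_units
    diy_total = diy_unit_cost * needed_units * overprovision_factor
    return diy_total < cloud_total


# Hypothetical monthly prices per unit of capacity.
CLOUD_PRICE = 300
DIY_PRICE = 100

print(max_overprovision_factor(CLOUD_PRICE, DIY_PRICE))
print(diy_still_cheaper(CLOUD_PRICE, DIY_PRICE, needed_units=10, overprovision_factor=2.0))
```

With these assumed prices, buying double the needed capacity still costs less than the cloud bill; the picture flips once the over-provisioning factor exceeds the price ratio.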

Even if you are a small operation, you still have the option to lease hardware. Then your response time to add new servers is usually hours, not days like if you were to go the full own-your-hardware route.

As an exercise, you can try to rent and benchmark a $300/month server from e.g. SoftLayer and then compare that against a $300/month EC2 instance. Chances are, you will be blown away by the performance difference.

I don't think anybody will argue that if you have a specialised workload (CPU heavy, storage heavy, etc) there's definitely cost savings at scale for DIY over cloud.

But the calculation is harder than that. People are terrible at estimating the ops personnel cost for DIY. Turns out it's hard running with minimal downtime, or engineering hardware + software to be fault-tolerant. It's hard to put a price on dev tools/APIs/doco/the whole eco-system.

Especially for that last reason, I have never been "blown away" by Softlayer, even when their instances were way beefier than anything AWS/GCP offered. YMMV.

> Turns out it's hard running with minimal downtime, or engineering hardware + software to be fault-tolerant.

It's hard to do that with cloud too. Your instances can go down at any time.

What you're trading off is a small number of people on-site at the data center (and it can in fact be small!) plus some slightly-old-school sysadminning versus new-school cloud-native development. Maybe the latter is easier to hire for (although I doubt that) or your current people find it more fun. But it's not like physical hardware dies all the time, or that you're immune from seeing the effects of physical hardware dying just because you're on the cloud.

You might save on hiring one person to replace disks when they fail and one person to write some PXE boot automation, but you'll need more than two people to run Kubernetes at the same scale of compute capacity as those two people could have run for you.

Well, you and the poster are about to be disappointed. I did the comparison recently for 512GB memory servers and Google Cloud is now the same price as SoftLayer, while AWS is a bit more.


If you are willing to pay a lot of money upfront, which you are since you're talking about building your own datacenter, AWS and Google will both give you a huge discount and be significantly cheaper than SoftLayer.

I don’t see any performance details in your write up. Is the SL server actually comparable to the GC one? I don’t know how to equate vCPUs to real world performance.

Memory is easy to compare. But how does it stack up CPU-wise?

Another question is whether lower tier servers compare favorably or not. How does a $200-500/month server compare?

Lastly, is it possible that SL is just not competitive anymore, but another provider is? I got out of that gig a few years ago so I honestly don’t know, but is it possible that Hetzner or someone similar is now the leader in leased hardware?

CPUs are in the same ballpark. They are all recent generation Intel Xeon with a lot of cores. The providers each have a few different builds, you'd need to provision the machine and check the model number if you want the details.

SoftLayer is a disaster on lower tier servers. You only go to them for higher tiers.

The market is very competitive and moving fast. Both Google Cloud and AWS have been rolling out high memory hardware recently at a reasonable rate.

Hetzner is amateur tier. It's cheap second-hand servers with no ECC. It's not comparable to enterprise infrastructure.

I think you misunderstand. I agree Softlayer is terrible - I was merely pointing out that even when their pricing wasn't bad, their tooling was always pants.

(But then they were bought by a company whose "cloud strategy" is to put the word "cloud" in as many DB2 and mainframe contracts as possible.)

Yes, I agree that the tooling is subpar but the price point and the servers with a terabyte of memory were decent arguments, when the competition didn't have them.

It's noteworthy that SoftLayer is now worse on all aspects.

> if you have a specialised workload (CPU heavy, storage heavy, etc) there's definitely cost savings at scale for DIY

I wasn't sure what your use of "heavy" means here -- is it "a lot of" or "disproportionate"? Years ago there was much less flexibility with IaaS machine shapes, but I was super impressed when Google launched "custom machine types". There's a little slider to scale up/down your RAM/CPU independently, and of course storage is already independent. In fairness, there is some correlation between CPU allocation and storage IOPS, but that's inherent to scheduling a portion of the host machine reasonably.


Yeah, if you're using loads of CPU, but not all the RAM or block storage. This usually happens if your problem isn't really parallelisable. Then scaling out isn't really something you can do easily or quickly, so the cloud loses its appeal a bit. In those cases, it might make sense to go DIY/in-house.

First I don’t mean SoftLayer’s cloud. I mean their physical boxes. A dual processor Xeon that is all your own is a lot.

Second, it depends on your expertise. And you don’t need scale. I ran a successful bit of infrastructure on SL with seven servers that was quite a bit cheaper than an equivalent AWS setup. I was pretty much solely responsible for the infrastructure and still my responsibilities included more than 80% coding and later management. Given my salary at the time, it was quite cost effective.

was the comparison based on on-demand instances or reserved instances? we reserve almost all of our instances within our company for 3 years and also get an extra discount on top of it. based on my calculations, i've seen AWS cheapest out of all. would like to know if i'm missing anything

This was a few years ago so prices have changed on both sides. What I advocate is doing exactly what you did: do a comparison with actual numbers and performance stats instead of hand waving that the cloud is always best.

another factor that gets overlooked some is on-hand skills and hireable skills.

if you have experience (and/or your team does) with dedicated, that should probably be a non-trivial factor in decision making. likewise if you have more aws/cloud experience (or have those skills available on staff), there's benefits to that.

cloud skills don't magically happen - someone with minimal AWS skills can provide you with a setup that is more expensive to run, and possibly more insecure than a locked down dedicated box.

Exactly. If you have expertise to run your own servers, and you don't have a spiky workload, maybe it's worth it. If you have cloud expertise, that may be the winner.

you make a good point, but my anecdotal story from experience says you really need 10-30x your average capacity for peak times instead of double. an ideal world would be you have your "cheap" datacenter that you use most of the time, and somehow extend into the cloud for the true rare event when you need 30x. I'm not sure how feasible that is, though

Can't you use a hybrid strategy and use the cloud only to handle the peak load?

Handling variable load is not free in the cloud, so if you are going to pay it anyway, you can keep your predictable cost down and use the cloud when you have that temporary extra load.
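The hybrid idea can be sketched with a toy cost model. Everything here is an assumption for illustration: the hourly load profile, the per-unit-hour prices, and the function names are hypothetical, and real pricing has many more terms (egress, reserved discounts, staff).

```python
def all_cloud_cost(hourly_load, cloud_cost_per_unit_hour):
    """Pay cloud prices for exactly the capacity used each hour."""
    return sum(hourly_load) * cloud_cost_per_unit_hour


def hybrid_cost(hourly_load, base_capacity, diy_cost_per_unit_hour, cloud_cost_per_unit_hour):
    """Own a fixed base capacity (paid for every hour, used or not)
    and rent cloud capacity only for load above that base."""
    base = base_capacity * diy_cost_per_unit_hour * len(hourly_load)
    overflow_units = sum(max(0, load - base_capacity) for load in hourly_load)
    return base + overflow_units * cloud_cost_per_unit_hour


# Hypothetical day: 23 quiet hours at 10 units, one 30x spike.
load = [10] * 23 + [300]

print(all_cloud_cost(load, cloud_cost_per_unit_hour=3))
print(hybrid_cost(load, base_capacity=10, diy_cost_per_unit_hour=1, cloud_cost_per_unit_hour=3))
# Sizing your own hardware for the peak instead:
print(hybrid_cost(load, base_capacity=300, diy_cost_per_unit_hour=1, cloud_cost_per_unit_hour=3))
```

Under these made-up numbers, owning the base and renting the spike beats both extremes, but the ranking depends entirely on how spiky the load is and on the price ratio.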

Depends on your workload. If you see this traffic pattern, then yes you absolutely are a candidate for the cloud. But if you are, say, an ecommerce shop and your spike is just double your regular traffic, then who knows what’s better.

Can you use AWS for extraordinary peaks and DIY solution for cooler periods of time?

Put another way: Would Dropbox have gotten off the ground and progressed as it did without AWS?

Probably not.

Outgrowing AWS is a great problem to have.

Not to mention they basically resold a single AWS feature. All they had to do was rebuild S3 (and ec2).

Ah, you sound like this person: https://news.ycombinator.com/item?id=9224

> For a Linux user, you can already build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem. From Windows or Mac, this FTP account could be accessed through built-in software.

Not only did BrandonM have a point in 2007, today it is even easier and much more advantageous to do file synchronization with Free Software tools. If you have multiple machines you have SSH on all of them already. There is no better solution out there than git together with ediff for versioning and conflict resolution. rsync and unison are great for syncing media files across many devices/USB drives/microSD cards with different filesystems on them. Hardlink or ZFS snapshots just work.

Dropbox is not even an option for me because there is no official support or reliable way to run it on OpenBSD.

Dropbox is a consumer product with limited features and use cases. Smugly dismissing the needs and technical capabilities of power users is not going to endear you to anyone here.

Then it’s good that my goal is not to endear myself to anyone here. Flippantly dismissing certain technologies and engaging in “just”isms needs to be called out because it’s not productive or helpful. Moreover, I think it’s funny that the same kinds of posts are being made in 2018 as were made in 2007 (i.e. “why don’t you just...”), considering they have nothing to do with Dropbox saving money by migrating away from AWS. Really, it’s not germane at all. Perhaps if the source were an article about open source alternatives to Dropbox.

> Flippantly dismissing certain technologies and engaging in “just”isms needs to be called out because it’s not productive or helpful.

I agree that hashkb's comment was flippant and unhelpful, but your reply was not any better and was the one that went off-topic.

That's all true, but you are not their customer here. Their customer is the person who has never heard half the words you just said - or at least the person who has to support a bunch of people who've never heard half the words you just said. :)

And he was totally right. Now it's even easier, and mature open source tools and libraries are available. https://syncthing.net/ is a good example.

I think that's a little disingenuous. Dropbox provides:

1. a nice web interface;

2. synchronisation;

3. an easy-to-understand pricing model;

4. tightly integrated desktop support (at least for OS X).

There is no dishonor in that. The “magic” they provide is still better than the competition.

S3 and EC2 are by design commodity, low level artifacts. Like any other commodity, you should always be shopping around.

The big issue with Amazon has always been their “roach motel” philosophy where data goes in but doesn’t come out. (ie they hose you with network costs)

If you have enough scale and enough dollars, it is often cheaper to build bespoke infrastructure vs renting somebody else’s at margin.

From the perspective of a business, building your own data center only makes sense if it is part of your business (like at Dropbox). Paying a margin for a service that is not part of your core model is not a bad thing per se; in most cases it makes a lot of sense. That is because if a company makes a huge investment, it makes this investment with its own capital. In theory (and that often applies to reality) a company can only invest money effectively in its core business.

This is why many companies with a lot of money still rent their offices – even if they intend to stay in the (often customized) buildings so they don't need flexibility. It just would not make sense to buy and tie up the capital in a building when it can be invested in their core business with higher returns.

An exception might be made in cases where a company suddenly has so much money that investing it all in its core business might diminish the capital/return rate, e.g. through the US tax break, where a company shifts the money stored overseas back to the US. This is why for Google it might make sense to buy property, even though it's not their business: they just have too much money.

Correct, but all of those are basically independent of the backend storage system. At one point you could almost consider Dropbox to be a VAR for S3 storage, with the emphasis on "Value Added."

Add that their flat rate pricing is actually a discount for moderately heavy users. If you use half or more of the 1 TB on the pro plan you’re already saving money compared to S3 pricing.[1] This is easy to do if you use something like Arq to back up your machines and take advantage of selective sync to archive rarely used files from old projects.

[1] S3 at $0.023/GB/month * 500 GB = $11.50/month
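The footnote's arithmetic, written out as a small sketch. The $0.023/GB/month rate is the one quoted above; the flat plan price is deliberately left as a parameter because the thread does not state it, and the function names are hypothetical.

```python
S3_PRICE_PER_GB_MONTH = 0.023  # storage rate quoted in the footnote above


def s3_monthly_storage_cost(gigabytes):
    """Storage-only S3 cost per month, rounded to cents (no requests, no egress)."""
    return round(gigabytes * S3_PRICE_PER_GB_MONTH, 2)


def break_even_gb(flat_plan_price_per_month):
    """Stored gigabytes above which a flat-rate plan is cheaper than S3 storage alone."""
    return flat_plan_price_per_month / S3_PRICE_PER_GB_MONTH


print(s3_monthly_storage_cost(500))
```

Calling `break_even_gb` with whatever the flat plan costs per month gives the usage level at which the plan starts to beat raw S3 storage pricing.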

and $11.50 doesn't include traffic, which can push the bill even higher.

While S3 was the bulk of it, I wouldn't be surprised if they had used DynamoDB for metadata storage, SQS for job queuing (pushing share data down to clients asynchronously), SNS for broadcasting pub/sub, Redshift for data analysis, ASG for deployments, etc.

cost for both DynamoDB and Amazon Redshift can get out of hand real quickly. I've seen it first hand. With Redshift, the problem is that data teams do not adjust the default configuration of 5 query slots. So when things slow down, instead of increasing concurrency for a given node, they add more nodes. I've seen monthly invoices of $100K+ for a workload that could have been done for less than $2,000 / month.

metadata on mysql on bare metal last i checked. it wasn’t on ec2

> A big chunk of the Amazon price should be considered as providing flexibility in the future.

A recent place I worked at didn't understand this. They were going to the cloud as a strategic move because they didn't want to run a data center any more. Their computing needs were stable and predictable - no "Black Friday" sales that needed triple the servers for a short part of the year. They were going to end up paying far more for the total computing cost than if they had just kept buying blade servers and VMWare licenses.

I've worked somewhere that didn't care. Business SaaS platform, that moved to the cloud because maintaining servers was dull and taking up too much developer and devops headspace. The entire yearly spend was about as much as the salary of an ops person.

I'd argue that companies where the hosting costs are the primary, or even a significant cost, are a small minority.

I worked somewhere where the resident DevOps/Sysadmin guy would rather repeatedly write me emails about using an extra 10GB of disk for a data science project than just buy more storage. And this was on an instance with like half as much memory as disk available. There are some people in this industry who just have zero business sense.

Then there are people on the opposite side of the spectrum who don't even look at server costs. I have a task sitting in our backlog to optimize our instance usage which could save us over $100,000 a year. It has been constantly deprioritized in order to work on feature development, which is stupid because we could hire another developer to do it in a few days and then keep him to increase our feature development, all while costing our company the same amount of money.

Granted, I think this is more of a case of the prisoner's dilemma [0] between lower management and upper management (lower management doesn't want to work on it because it doesn't produce features that make him look good but he also does not want to propose the additional developer for it because then upper management will just tell lower management to do it without the additional developer).

[0]: https://en.wikipedia.org/wiki/Prisoner%27s_dilemma

This is a pretty common trend in lots of places. These kinds of decisions are driven by things other than computing needs. Maybe to look cool/cloudy/nextgen ...

You can design for and get steady-state discounts on cloud too. It's not only about flexibility but also maintainability and ops overhead. The increased spend on cloud is still usually less than the cost of a sysadmin/IT team and the handling of inevitable hardware and DC issues.

This is what happens when directors and C-level folks get a reputation bonus from being able to talk about how they "led a migration to the cloud" in their previous company.

A lot of times companies can yield short term savings because of how they depreciate or are taxed on assets like server rooms and equipment.

Most of the times I saw that, they did it because in-house pricing was insane and in IT they wanted to cut costs.

Fire the team, come back in a few years with a new cheaper team and in-house pricing.

Job done

It's good to understand the different price dynamics and useful to have some rules of thumb to avoid long cost calculations.

For most startups I would actually advise to start with Heroku, which is even more expensive than AWS (it is built on top of AWS). But you save on devops and building the CI/CD pipeline. For a small team it can make a big difference in terms of productivity.

For worker workloads like CI, renting dedicated hardware (like Hetzner) is usually cheaper and produces more reliable results. Spot instances also work but are less reliable due to machines cycling. The main factor for keeping everything under AWS would be egress bandwidth pricing or if the workload spikes are bigger than 2x.

I am still holding my breath for the tools to mature to the point that people can run their own data centers again with less overhead. My lips are getting a little blue but I see some hopeful signs.

For number 2 especially, there have been some cool projects for efficiency gains when different parts of the organization experience different traffic spikes. Like Netflix, where transcoding spikes precede viewing spikes, and they pulled off a load balancing coup to reduce peak instances.

I think the right thing for a tech company to do is to run their own data center in at least one town where they operate, and use cloud services for geographic distribution and load shedding.

The temptation to reduce your truck numbers below long term sustainable rates is too great, and so is lock-in. The best thing I think you can do for the long term health of your companies these days is to hold cloud infrastructure at arm’s length. Participate, but strongly encourage choosing technologies that could be run in your own server rooms or another cloud provider. Like Kafka, Kube, and readily available database solutions. Make them avoid the siren song of the proprietary solutions. We have already forgotten the vendor lock-in lessons of the 90’s.

A good option can be to use your own data center for base load, and on top of that use AWS for traffic spikes. That way you still have the flexibility to adapt quickly but at a lower cost, once you reach a certain scale.

> use your own data center for base load, and on top of that use AWS for traffic spikes

Much easier to go hybrid on Azure, at least until AWS and vSphere integration is ready for prime time

AWS only provides optionality if you don't get deeply into the AWS services. If you do you find that optionality is absent.

A startup I talked to not too long ago has a revenue of around 1M a month - pretty great. Their Amazon bill is around 1M a month - not so good. Their entire architecture is one AWS service strung into another - no optionality at all.

I think you're talking about a different kind of optionality than the parent post. That was talking about option to expand/contract capacity, change infrastructure, or all the usual "elasticity" that they tout.

You're talking about vendor lock-in. Which is a totally valid point and something to be aware of, but basically orthogonal.

> (1) your business scales at a different rate than you planned -- either faster or slower are problems!

> (2) you have traffic spikes, so you have to over-provision. There is then a tradeoff doing it yourself: do you pay for infrastructure you barely ever use, or do you have reliability problems at peak traffic?

> (3) your business plans shift or pivot

Anyone at any real scale has a multi-datacenter setup and AWS is effectively just a very elastic datacenter you can tap into. You could just as easily tap into Google Cloud or Azure. You do not need to operate 90% of your business in AWS to use AWS.

> The corollary of this is, for a well-established business with predictable needs, going in-house will probably be cheaper. But for a growing or changing or inherently unpredictable business, the flexibility AWS sells makes more sense!

It's still cheaper to go in-house with a multi-DC setup than to use AWS, except for a single-DC setup in AWS with very few nodes.

Architecture for a decent sized business should be set up with 3 DCs in mind /anyway/. You have two primaries, you scale <insert cloud provider> as needed and only leave some DB servers there most of the time.

Sure, but isn't the REAL other selling point of AWS aside from an elastic virtual datacenter that it automates away sys admin tasks and ultimately people?

It’s deemphasized usually because you end up replacing sysadmins with new sysadmins and analysts who can understand billing optimization on AWS or pay way more for AWS.

Another commenter on this story mentioned $100k redshift workloads that were optimizable to $2k.

> Sure, but isn't the REAL other selling point of AWS aside from an elastic virtual datacenter that it automates away sys admin tasks and ultimately people?

Yes. AWS shines when you have a small team with no competent DevOps people and a desire to build a single datacenter setup. However, at that scale...we are talking a small business / startup with less than 50 employees. If you are still doing that when you've grown past that point, you've got architectural problems.

Once you scale past that point, you still need a competent sysadmin to build a reproducible process...at which point even leasing hardware from any provider works. You cannot build a multi-DC setup that integrates redundant providers without a competent sysadmin.

Even if you stick with how AWS has you do things it will eat you alive if you do not optimize your workloads. I'm not sure how you would do that effectively without an experienced sysadmin acting as an analyst.

So perhaps the best approach is to use your own servers for the guaranteed minimum load you will see and use more expensive AWS for everything else.

Over-provisioning will still be cheaper than the cloud.

Also, in my opinion the know-how and full control of the complete stack is paramount to maintain a quality service.

In addition, you can do both. Tying yourself to one provider is an unnecessary risk and tying yourself to the cloud is not ideal.

The obviously unanswerable question would be: would Dropbox have been able to succeed as a startup while building their own data centers in-house?

I assume that answer is a no.

I would bet they still burst-out to AWS.

For me the logic is more like: get cheaper machines (be it in-house or with cheaper alternatives), that run kubernetes for example, and monitor them with Prometheus.

If you run out of capacity, defined by whatever metric you fancy from Prometheus, start EC2 machines for the burst.

Every month, re-evaluate your base needs.
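The burst rule described above could be sketched roughly like this. It is only the decision logic: the load figure is assumed to come from whatever Prometheus metric you picked, and actually starting EC2 machines is left out. All names, units, and the 25% headroom default are hypothetical.

```python
import math


def burst_instances_needed(current_load, base_capacity, instance_capacity, headroom=0.25):
    """Number of extra cloud instances to start so that
    base + burst capacity covers the current load plus some headroom.

    current_load      -- load in whatever unit your metric uses (e.g. requests/s)
    base_capacity     -- what the cheap, always-on machines can handle
    instance_capacity -- what one burst instance can handle
    """
    target = current_load * (1 + headroom)
    shortfall = target - base_capacity
    if shortfall <= 0:
        return 0  # the base fleet is enough, nothing to start
    return math.ceil(shortfall / instance_capacity)


print(burst_instances_needed(current_load=800, base_capacity=1000, instance_capacity=100))
print(burst_instances_needed(current_load=1500, base_capacity=1000, instance_capacity=100))
```

The monthly re-evaluation then amounts to raising `base_capacity` whenever the burst count stays above zero most of the time.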

Amazon was a phase. They bridged the gap of DOTcom and new thinkers not interested in starting a devops team to begin.

Now DO, vultr, and other options exist to fill the gaps more.

AWS was never the solution to content distribution or infrastructure replacement. Just a company smart enough to notice big gaps and not afraid to fill them to get the ball rolling, maintain and progress, then move on.

I really like this comment. Amazon offers a lot of services (with ridiculous names) but you are too right that you're paying for the flexibility to pivot as your needs change. Dropbox did well to recognize its needs after a time using AWS.

There's two good reasons to use a service like AWS:

- You're too small for efficient economies of scale on your own equipment (i.e. AWS is cheaper when considering total cost of ownership).

- You need to scale rapidly to meet demand

The second one is largely a data issue, if you have enough historical data on your customers, their habits, usage, and so on then scaling becomes predictable and even when it isn't you could offload only part of your infrastructure to a cloud vendor.

What's interesting is that several companies that I know which rely on AWS/Azure/et al aren't on it for either of the two above stated "good" reasons.

They are large businesses and do almost no automated scaling. They're on it for what can only be described as internal political limitations, meaning that they are on these services to remove the politics of technology infrastructure: one less manager at the top, a shorter chain of communications, an external party to blame when something does go wrong, and issues like HR/benefits/etc for infrastructure employees are outside their scope.

In effect they view themselves as "not a technology company" so they look to employ as few technology employees as they can. Even in cases where technology is paramount to their success. It is very interesting to watch, and I'm not even claiming they're "wrong" to handle their infrastructure this way, just that it is hard to quantify the exact reasoning for it.

Those are definitely two good reasons, but I think there's another: you are starting up and your core competency isn't object storage.

I'd argue that the core competency of Dropbox is its easy syncing. Dropbox wanted to get that to market quickly. If they had spent the time building out a data storage solution on their own, it would have meant months or years of work before they had a reliable product. Paying AWS means giving Amazon some premium, but it also means that you don't have to build out that item. It's not only about economies of scale and rapid demand. It's also about time to market.

I think it's a reasonable strategy to calculate out something along the lines of "we can pay Amazon $3N to store our data or store it ourselves for $N. However, it will take a year to build a reliable, distributed data store and we don't even know if customers want our product yet. So, let's build it on Amazon and if we get traction, we'll migrate."
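That "$3N vs $N plus a year of building" trade-off reduces to a payback calculation. A minimal sketch, assuming a one-off build cost and flat monthly run costs; the figures and the function name are invented for illustration only.

```python
def months_to_recoup(build_cost, cloud_monthly, diy_monthly):
    """Months of running in-house before the up-front build cost
    is paid back by the lower monthly bill. Returns None if
    in-house is not actually cheaper per month."""
    monthly_savings = cloud_monthly - diy_monthly
    if monthly_savings <= 0:
        return None
    return build_cost / monthly_savings


# Hypothetical: N = 10,000/month in-house, 3N on the cloud,
# and 240,000 of engineering time to build the storage system.
print(months_to_recoup(build_cost=240_000, cloud_monthly=30_000, diy_monthly=10_000))
```

Before there is traction, N is small and the payback period is long, which is the argument for renting first and migrating later.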

S3 is a value-added service and creating your own S3 means sinking time. Even though data storage is very very near to Dropbox's core competency, it's really the syncing that was the selling point of Dropbox. To get that syncing product in front of customers as fast as possible, leveraging S3 made a lot of sense. It gave them a much faster time to market.

As time went on, they had traction, and S3 costs mounted, it made sense for them to start investing in their own data storage.

It's about figuring out what's important (the syncing is the product) and figuring out what will help you go to market fast (S3) and figuring out how to lower costs after you have traction (transitioning to in-house storage).

Yes, a lot of companies use cloud services when they don't need them. However, Google Cloud's compute pricing is reasonably similar to DigitalOcean (with sustained usage discounts) and from what I hear these companies will often negotiate discounts. AWS can seem a bit pricy compared to alternatives, but I'm guessing that Amazon offers just enough discounts to large customers that they look at the cost of running their own stuff and the cost of migration and Amazon doesn't look so bad.

Still, when you're trying to go to market, you don't want to be distracted building pieces that customers don't care about when you can rent it from Amazon for reasonable rates. You haven't even proven that someone wants your product yet and your time is better spent on delivering what the customers want rather than infrastructure that saves costs. As you mature as a company, the calculus can change and Dropbox seems to have hit that transition quite well.

A slightly similar other reason could be that your server costs are so low relative to the size of your business that the ROI of moving off of AWS doesn’t make sense for the conceivable future.

I imagine this could be the case for a lot of smaller tech startups, and perhaps even some larger companies that don’t have significant web traffic or ongoing real-time computer services.

Something like Gusto might be a good example. I would guess that each of their paying customers (employees of companies using them) leads to only a handful of initial or yearly setup tasks and maybe a handful of web requests per month, but represents solid revenue.

The most obvious counterexamples would be any company with persistent real-time services, like Dropbox or Mixpanel, or companies with a huge number of web requests with a very small rate of conversion to revenue, like an ad network or an ad-supported social network or media site.

> Yes, a lot of companies use cloud services when they don't need them. However, Google Cloud's compute pricing is reasonably similar to DigitalOcean (with sustained usage discounts) and from what I hear these companies will often negotiate discounts. AWS can seem a bit pricy compared to alternatives, but I'm guessing that Amazon offers just enough discounts to large customers that they look at the cost of running their own stuff and the cost of migration and Amazon doesn't look so bad.

Of course, the dollar amounts Dropbox saved are compared to those negotiated prices.

In my experience it isn't so much the storage price itself, but the network transfer that makes AWS absurdly expensive.

Most companies don't need something like S3. They can get by perfectly well with one server, maybe using RAID-1, or just using backups. Data corruption mostly happens through logical errors anyway and nothing in S3 will protect you from that.

> Data corruption mostly happens through logical errors anyway and nothing in S3 will protect you from that.

S3 supports object versioning, which very much will protect you from anything other than writing the wrong data for the entire history of an object.

Versioning is optional and just becomes part of the file data. Write garbage into that and it'll still take a lot to fix it.

Anyway, there's plenty of filesystems that have versioning built in as well.

Besides, I've been tasked with recovering the specific state of a database where every version of every object was available, and it was only some 200k or so records. That took me about 2 weeks. (mostly for writing code that could find a consistent version of the whole thing)

Agreed. De-risking non core competencies is very rational, especially while iterating business strategies.

>However, Google Cloud's compute pricing is reasonably similar to DigitalOcean (with sustained usage discounts) and from what I hear these companies will often negotiate discounts.

Given the storage use case of Dropbox, what would be the percent of saving if Dropbox indeed went with Google or Digital Ocean?

I might be misremembering, but I'm pretty sure that when they started AWS was the only remotely reliable game in town.

Google's network egress pricing vs Digital Ocean is much higher.

Because I’m in the industry I’ve seen software companies run by the legal department or the HR department. But I’ve also witnessed both software and non-tech companies where the IT department controls everything. (I know that sounds weird but a company selling on-premises software, for instance, should NOT be kowtowing to the IT dept).

In every one of those cases the group in charge has been the wrong group and it really makes you wonder who has been asleep at the wheel so long that this has occurred.

Maybe outsourcing to AWS for a couple of years is a good way to reboot the organization. Cheaper than slowly going out of business. When the fad dies down you start hiring people back who are a little more humble and cheaper than AWS.

Ultimately, technical limitations (i.e., problems that would exist regardless of people) are not always (or usually) the limiting factor for companies working. Human limitations like having the wrong group in charge are the limiting factor. Ways around that are good, potentially.

Trying to solve it with technology is naive. If the governance of a company can't regularly align and improve itself, no technology will solve that problem.

Well said, but I disagree that AWS is some magical solution. In your context it's just a scapegoat to catalyze change in general, or a mulligan with brownfield and greenfield. It'd be far more effective to simply hire competent people to fix the organization, but that requires competent people somewhere in the governance of an organization (i.e. a board or major stakeholders).

It’s at best a passive aggressive solution to the problem. The assertive solution is to tell the manager of the group that if they don’t find some humility quick then they’ll be finding a new job.

It seems like this problem happens in organizations that equate headcount with power. Which I guess makes some sort of sense but doesn’t feel right. Plenty of companies do not have the majority of their employees working on producing the goods they sell. Especially if they’ve started automating.

But as I said above, this is an ‘asleep at the wheel’ situation. It seems like it’s often not the biggest problem these companies have with their vision.

And, it sometimes happens that actual tech companies start outsourcing their tech, which is a whole other troubling pattern.

I often wonder about the value of outsourcing and a lot of the deals I see are related to generational change whether it is management, technological stack or lopsided age structure. Not all are acknowledging the realities injecting conflicting targets into the execution.

They're on it for what can only be described as internal political limitations, meaning that they are on these services to remove the politics of technology infrastructure

Heh, at one previous company it would take 6-9 months to provision a new VM and 12-18 months for a physical. Those entrenched IT organisations absolutely deserve to get their lunches eaten.

Was about to make this exact comment. Endless forms to fill out to move a network cable.

Buuuut at my current workplace I am starting to see some slowdowns in doing AWS stuff as "departments" get more involved.

Like the "cloud team" does the account but networks must provision the VPC, and there's a "gateway review board" that gets involved if you define a new network egress or ingress etc etc.

I feel like many of the early advantages of cloud in enterprise are going to get eroded as the paper pushers catch on and "add value by defining process".

> Like the "cloud team" does the account but networks must provision the VPC, and there's a "gateway review board" that gets involved

Yes, this is totally antithetical to modern working practices such as "DevOps"

At that place e.g. the VM team could provision you a VM (eventually) but they couldn't do you a login on it, that was some other team. But you couldn't raise the paperwork against that team until the VM was created, so everything proceeded serially, and each step at a glacial pace. The time ratio between them and Azure or AWS is literally under one minute for every month!

It is - but the company probably is trying to solve a problem of past mistakes. I can only speculate, but at one point, someone must have munged a network ACL and created an outage. Now all network ACLs need to be reviewed before being added. Maybe someone toasted VPC peering. Or there was a private VPC full of sensitive data that all of a sudden was publicly available.

There was an old saying that 80% of outages were caused by humans. But there is more to account for than that. For example:

- Certifications that the org has obtained

- Contracts that the org has signed with customers

- Compliance with laws and regulations (and international)

- Insurance requirements

It can add up in a hurry and can slow an org down quickly.

It would be nice if an org could review all of its failures every few years and drop nearly all of its processes and procedures and start out fresh again - vowing to build around the failures of the past.

> It would be nice if an org could review all of its failures every few years and drop nearly all of its processes and procedures and start out fresh again - vowing to build around the failures of the past

Indeed. That IT department is learning a lesson of what happens when your end users discover cloud providers and come into the office and tell their management how quick and easy it is. Managers don't know or care about the past - they only know that what should take minutes is taking months. If that department had been able to turn around a VM request in a day or two - not unreasonable - they would have been safe. Now they are being asked questions for which they have no good answers, such as, what exactly do all of you do all day?


And also, in these days of data breaches where the sensitive data was just lying around on open S3, more and more places require approval from above before adding anything to AWS.

This is my current life. Testing an idea costs $25M. It’s absurd, and most of my time is spent filling out forms or explaining the basics of the web to idiots.

Hiring manager here.

While your reasons are valid you are missing an important one:

Resource scarcity: the engineers that I would need to allocate to infrastructure I would rather have working on user-facing features and improvements. Talent is scarce; being able to outsource infrastructure frees up valuable engineering time.

This is one of the main reasons, for example, that Spotify (I’m not working for them) is moving to Google Cloud.

I do devops consulting, and typically I end up with more billable hours for AWS setups than bare metal or managed server setups. The idea that AWS takes less effort to manage is flawed. What instead tends to happen is that more of it gets diffused out through the dev team, who often doesn't know best practices, but often nobody tracks how much of their time gets eaten up by doing devops stuff when there's nobody explicitly allocated to do the devops tasks.

There can be advantages in that the developers more often can do the tasks passably well enough that you can spread tasks around, but if it's not accounted for a lot of the time people are fooling themselves when it comes to the costs.

When it comes to large companies like Spotify, the situation changes substantially in that they're virtually guaranteed to pay a fraction of published prices (at least that's my experience with much smaller companies that have bothered to negotiate).

> nobody tracks how much of their time gets eaten up by doing devops stuff

This has been my experience working with companies that use cloud services as well.

Another big waste of time is on application optimization especially around database usage. Cloud services tend to provide very low IOPS storage (and then charge exorbitant amounts for semi-decent performance) which forces spending a lot of wasted time on optimization which would never be an issue on dedicated hardware.

> This has been my experience working with companies that use cloud services as well.

It's generally the case across large parts of IT. I confused the heck out of the first manager I started sending itemized weekly reports of the cost of each functional area and feature requests (based on average salary per job description), as he'd never seen it before. But it very quickly changed a lot of behaviors when they realized the value of the resources spent on various features.
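For anyone wanting to try that kind of report, here is a minimal sketch of the idea. The salary figures, the 2,000 working hours per year, and the feature names are all made-up placeholders, and the time entries would have to come from whatever time tracking you already have.

```python
from collections import defaultdict

# Hypothetical average yearly salaries per job description (placeholders).
AVG_SALARY = {"developer": 120_000, "sysadmin": 100_000}
WORK_HOURS_PER_YEAR = 2_000  # assumed, adjust to your own contracts


def weekly_cost_report(time_entries):
    """time_entries: iterable of (area_or_feature, role, hours).

    Returns {area_or_feature: cost}, pricing each hour at the average
    salary for the role rather than at any individual's salary.
    """
    totals = defaultdict(float)
    for area, role, hours in time_entries:
        hourly_rate = AVG_SALARY[role] / WORK_HOURS_PER_YEAR
        totals[area] += hours * hourly_rate
    return dict(totals)


# Example week: note the developer hours quietly spent on devops work.
entries = [
    ("checkout redesign", "developer", 30),
    ("devops firefighting", "developer", 10),
    ("devops firefighting", "sysadmin", 20),
]
report = weekly_cost_report(entries)
for area, cost in sorted(report.items()):
    print(f"{area}: {cost:.0f}")
```

The point is only that the devops line shows up as its own row once developer hours are attributed to it, instead of disappearing into feature work.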

Only Amazon though. This is something that infuriates me about RDS. On Google Cloud, IOPS performance is not based on instance size.

>There can be advantages in that the developers more often can do the tasks passably well enough that you can spread tasks around, but if it's not accounted for a lot of the time people are fooling themselves when it comes to the costs.

It is cheaper than hiring a full DevOps team which is a better apples to apples comparison. By spreading the load across the dev team I can automatically get a high bus factor and 24/7 hour on-call rotation. If the load cannot be spread across the team but requires specialized DevOps engineers then I lose both those very important points. Obviously once your company is large enough it's different but for small teams/companies it is an important factor.

> It is cheaper than hiring a full DevOps team which is a better apples to apples comparison.

My experience is that it often is more expensive when you actually account for lost development time.

> By spreading the load across the dev team I can automatically get a high bus factor and 24/7 hour on-call rotation.

... and my experience of this is that I'll never do it, ever, because of how it affects retention.

> Obviously once your company is large enough it's different but for small teams/companies it is an important factor.

Most of my clients have been small to medium sized, and I see the cost tip in teams with as few as 3-4 people at times.

> By spreading the load across the dev team I can automatically get a high bus factor and 24/7 hour on-call rotation.

This assertion supports what vidarh wrote. What you wrote has nothing to do with DevOps or software or engineering - what you are really saying is that you are saving money by coercing your developers into working two jobs at the same time. I have been in this position as a developer at a company where we had on-call rotations. This is a false economy and a quick way to increase stress, alienate employees, and increase turnover. Infrastructure tasks get neglected and are performed poorly because those tasks are now just necessary distractions to the main workload of feature development, to be gotten over with as quickly as possible. A lot of things get overlooked because no one "owns" areas like backups and disaster recovery.

I would phrase it differently as competence scarcity.

It doesn't take many people to do soup to nuts businesses.. think WhatsApp's 50 engineers, Netflix's 100 person OCA team (if you don't think OCA is a standalone product you don't know much about technology business) doing 40% of the Internet by volume.. the vast majority of people just aren't very good that work in technology. Business governance grossly underestimates the effects of mediocre performance.

So the real question is why aren't governors trying to encourage WhatsApp and OCA style businesses, it's far more cost efficient. I understand why an organization itself empire builds, misaligned incentives.

> the engineers that I need allocate to infrastructure I rather have working on user facing features and improvements.

Cloud services still need configuring and managing. You're saving on 2-3 days upfront on racking and cabling, on boxes that will last at least 3 years, probably longer. So if this is your only reason, you're making a false economy, eventually the costs of an undermanaged cloud will bite you (e.g. VM sprawl, networking rules that no-one understands, possibly-orphaned storage, etc).

> You're saving on 2-3 days upfront on racking and cabling, on boxes that will last at least 3 years, probably longer.

"Infrastructure" is a little broader than just some cabling, much broader. You're also assuming that whoever will be in charge of DIY is a) more competent at scale than whatever will be scraped together for the cloud, and b) available with no resource cannibalisation.

The point the person you're replying to was trying to make was that for every "good" hire you're deciding where to allocate them, and sourcing plumbing from a cloud provider lets you allocate preferentially to product development (ie business growth). Even if you "pay more" for that setup, in theory business growth you will achieve more rapidly pays for it many times over (first mover advantage, market leader advantage, cost of money over time, etc).

The costs of pinching the wrong penny and making technical hiring more difficult, diluting your talent pool, can be the difference between huge success and too little too late. An undermanaged local setup that cost you 3 years on Time to Market will bite you long before 'eventually' comes, and you won't have oodles of cash to fix the problem.

How does that work out? In the situations I've worked in where AWS has been extensively used (granted, only a small handful), what ends up happening is that everyone ends up doing "devops". Whatever that might mean in a formal sense, the way I see it playing out in reality is that every engineer ends up having to spend time tinkering with the infrastructure, so does it really free up valuable engineering time?

For personal projects, I use AWS and Azure (though am likely to migrate everything to a single box at OVH because it turns out to be cheaper for better performance - go figure) and it's made a certain amount of sense (up to now). At work we use dedicated hardware, because the cloud can't deliver the bang per buck.

You still need to engineer devops, yes, so having an engineer allocated to that still makes sense. If "everyone ends up doing "devops"" you might not have a general infrastructure strategy (what your engineers need is a CI/CD pipeline), or you might be doing something like microservices (which might or might not make sense depending on how flexible you want to be and how many teams you have/need to have).

At growth companies using cloud makes complete sense because it's all about time to market and iterating on your business proposition. Requirements change all the time and having the flexibility of cloud offerings gives you the velocity. Whereas at scale or in maintenance mode it makes much more sense to cut corners and optimise spending.

Either way you want to focus on what brings the most business value.

Uber does the same. It's the good old "buy it or build in-house" question. If total cost of ownership is more, you want to change.

Well...first off, I don't think that "human factors" like wanting one less manager, are always that bad.

Those two reasons (for needing AWS) are the technical problems that AWS can solve, but you can't solve in-house. That is, they are not even solvable on a boardroom whiteboard, where the board pretends everything is just a matter of resource (money) allocation.

But (imo) most of the things that companies fail on... it's not because it is impossible to do a good job. They fail for less inevitable reasons.

In any case, I actually like the strategy where you try to be good at the things that you're good at but minimize things you need to be good at.

Dropbox knew that AWS was expensive. If the numbers here are real, then in-housing would have been a big efficiency gain (on said boardroom whiteboards) for years. Makes sense when you consider what Dropbox does.

I assume they paid this price because it let them avoid being an infrastructure company. They would have had to be a very good infrastructure company. Why introduce this additional failure point, limiting factor or whatnot?

I (maybe you too) have seen the kinds of problems that AWS solves be the limiting factor in a bunch of companies. The fact that they're technically solvable is almost academic, at some point.

tldr, sometimes it's good to solve certain problems with money.

Arguably the most popular reason is "everyone else is using it". This is what I hear all the time. It usually goes like: "Why aren't we using AWS?" "Well, because it's a few times more expensive than our current infrastructure" "How can it be? If it's true, why are all other companies using it?" "They fell for the hype, just like you". Then we sit down, we do some calculations, we check different scenarios, and it always turns out AWS is more expensive than dedicated servers. In the past people also used a very strange argument that with AWS you don't need to pay the IT staff anymore - but I no longer hear this argument, I think most companies already realized how ridiculous it is. The most recent fad is the "serverless revolution" with some people claiming this time for sure the IT staff is unnecessary since the app developer can take care of everything. Good luck with that fantasy.

IMO there are many many more reasons:

- you need to iterate rapidly with scale and reliability. if you have the right expertise this becomes very quick to set up. it lets you focus 100% on product iterations.

- you need (predictable) on-demand compute for crunching large amounts of data or running some batch jobs. it just doesn't make sense to do this on your own equipment.

- your median cpu utilization is low, so you want to save costs and you move to a serverless architecture, effectively moving the cpu utilization you pay for to 100%.

- But most importantly AWS isn't just compute and storage primitives: AWS has a vast array of abstractions on top of the cloud primitives: managed clusters, machine learning services, virtual desktops, app streaming, CI/CD pipelines, built-in IAM and role-based access control, to name a few!
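The low-utilization bullet above comes down to simple arithmetic, sketched here under loud assumptions: both prices are made-up placeholders for the same unit of compute, and real serverless billing (per request, per GB-second, free tiers) is more complicated than a single hourly number.

```python
def breakeven_utilization(server_price_per_hour, serverless_price_per_busy_hour):
    """Utilization (0..1) above which an always-on server becomes cheaper
    than paying only for the time you are actually busy."""
    return server_price_per_hour / serverless_price_per_busy_hour


def cheaper_option(utilization, server_price_per_hour, serverless_price_per_busy_hour):
    """Compare the hourly bill of the two models at a given utilization."""
    always_on = server_price_per_hour  # paid whether busy or idle
    pay_per_use = utilization * serverless_price_per_busy_hour
    return "serverless" if pay_per_use < always_on else "always-on"


# Placeholder prices: pay-per-use assumed 5x more expensive per busy hour.
SERVER, SERVERLESS = 0.10, 0.50

threshold = breakeven_utilization(SERVER, SERVERLESS)
print(f"break-even utilization: {threshold:.0%}")
print(cheaper_option(0.05, SERVER, SERVERLESS))  # mostly idle workload
print(cheaper_option(0.60, SERVER, SERVERLESS))  # busy workload
```

With these placeholder numbers a mostly idle workload favors pay-per-use and a busy one favors the always-on box; where the threshold actually sits depends entirely on the real price ratio.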

Don't forget all of the cloud providers offer a robust set of APIs and SDKs to automate provisioning of all these services. That is valuable apart from the actual service.

It's also easy to hire people who have AWS experience. It's getting harder and harder these days to find anyone who has actually seen the inside of a data centre.

The lock in is even worse than people think.

Now more and more companies are locked into Amazon.

Hard to find a good old Datacenter Admin.

As an "old-school" sysadmin I have the opposite view: it's difficult to find jobs that don't require AWS these days.

I know perfectly well how to provision and scale a large infrastructure and can give you 99,999% availability in any application, BGP if needed.

Yet no one is interested in that. Sure I can write Ansible scripts and Terraform policies, but it's a minuscule part of my skillset and doing it on AWS is just boring compared to building the backend that powers it.

I'd like to challenge your implicit assertion that if due to knowing customer behavior patterns, scaling is predictable, therefore (this is the part I'm assuming you're implying) scaling is doable (or even easy).

The counter argument I have is that at different sizes of operations, completely new skills become important, so you and your staff are left behind.

Example: my previous employer became large enough in terms of hardware footprint (~>1M cores) that it started getting difficult to find commercial colo space. How good are software and systems engineers at electrical engineering? :)

The alternatives aren't either AWS or host yourself. You can rent managed servers for a fraction of the price of AWS instances.

Granted, if you need 1M+ cores, you're going to be dealing with humans most places (including AWS) to get the best deal possible, and that also means the cost differences can change fairly substantially (e.g. the instances I know of that are in "ask us" territory are not paying anywhere near published prices)

That said 1M cores is not that much. Depending on your needs it's as little as "just" 500 racks. Plenty of managed providers will be happy to provide customized service to e.g. design and/or manage a setup for something that size.
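To make the rack figure concrete, a back-of-the-envelope sketch: 500 racks for 1M cores implies about 2,000 cores per rack. The servers-per-rack and cores-per-server values below are illustrative assumptions only, not a description of any particular hardware.

```python
import math


def racks_needed(total_cores, servers_per_rack, cores_per_server):
    """Number of racks required, rounding up to whole racks."""
    cores_per_rack = servers_per_rack * cores_per_server
    return math.ceil(total_cores / cores_per_rack)


TOTAL_CORES = 1_000_000

# Density implied by the "500 racks" figure.
implied_cores_per_rack = TOTAL_CORES // 500

# Two hypothetical rack layouts (assumed values).
dense = racks_needed(TOTAL_CORES, servers_per_rack=40, cores_per_server=50)
sparse = racks_needed(TOTAL_CORES, servers_per_rack=40, cores_per_server=32)

print(implied_cores_per_rack, dense, sparse)
```

So the "depending on your needs" caveat matters: the rack count moves a lot with how many cores you can fit, power, and cool per rack.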

For a contract that big, AWS will build you your own private datacenter. US intelligence bought one themselves back in 2013 [0] and AWS just built a second one for them (different security clearance for each)[1].

[0]: https://www.theatlantic.com/technology/archive/2014/07/the-d...

[1]: https://aws.amazon.com/blogs/publicsector/announcing-the-new...

I will take a guess that 1M cores is a big compute grid.

You go to google cloud and order 10k preemptible instances per datacenter. That solves half of your problem. Then the same onto SoftLayer and AWS and revise monthly who is the cheapest one. It's not very difficult.

Contrary to you, I think that most places are not ready to accept that kind of contract. I have a friend who evaluated moving one of his compute grids to DigitalOcean and he was contacted to stop doing that, their servers are not meant to run at 100% CPU use all the time.

I wasn't talking about cloud providers, as for that kind of scale you're paying 3-4 times easy vs. hiring someone to prepare a cage design for you and wire it up at a suitable colo provider. And finding colo providers that can take a sufficient number of racks to be worthwhile is easy.

We were already paying colo providers by the MW and had the people to build out the space. It's just that the upfront capex required for a colo provider to build capacity was getting to a point where that meant a significant bet on their part on our continued growth. Thus we started having to look at other options, one of which is to contract them to build bespoke infra.

Regarding cloud being much higher cost: indeed. Anecdotal evidence; I know of a big reference customer who said they would consider the move to cloud a success if total cost didn't go up more than 2x. A few years in, that hadn't happened yet...

Agreed. It's a hassle to negotiate a few racks and that's barely a few million dollars. I can't imagine doing that for a hundred racks at a time.

Twice as much would be a huge savings. In the cloud, I can give you 20k cores in one hour and I can give you 100k cores in two hours, after the initial investment to open multiple regions and network backbones. The price is known in advance and the investment can be cancelled anytime.

Obtaining the same 20k cores from a colo will be a hassle. Months of negotiations with nothing being done. I'd expect any sane provider to bail if trying to discuss 100k cores, it's too much upfront investment. It would take years to get half of it, the project is long abandoned before anything is delivered and everyone who worked on it has left.

Our use case was actually very mixed. Certainly throw in somewhere around 100PB of storage if you want to do semi-accurate napkin math.

I've sold and competed against AWS in the last 4 years and you hit the nail on the head. "Someone else to blame" drives a lot of these decisions. It's also important to note, additionally, that AWS has created an excellent marketing machine for their services. I've sold AWS instances to companies that would have been fine with a rack in the closet. But they've heard about the cloud and saw a press release where a competitor was going to "the AWS" and...the checks just write themselves.

As a tech enthusiast I love what's possible with AWS, Azure, GC, etc. As a salesperson I don't mind selling these services (although the margins stink compared to selling VPS or dedicated). But there is a lot of cloud-overkill going on out there.

When I did the calculations, the break-even point on AWS versus on-premises was about three servers - at that point it was cheaper to go with your own physical hardware.
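The shape of that calculation can be sketched as follows. The numbers are hypothetical and chosen only so the example lands on three servers; they are not the figures from the calculation above. The assumed model is that on-premises carries a fixed monthly overhead (space, power, someone's time) plus a lower per-server cost, while cloud cost is purely per server.

```python
import math


def breakeven_servers(fixed_monthly_overhead, onprem_per_server, cloud_per_server):
    """Smallest server count at which on-prem total cost <= cloud total cost.

    Returns None if cloud is never more expensive per server.
    """
    saving_per_server = cloud_per_server - onprem_per_server
    if saving_per_server <= 0:
        return None
    return max(1, math.ceil(fixed_monthly_overhead / saving_per_server))


# Hypothetical monthly figures in dollars (placeholders).
n = breakeven_servers(
    fixed_monthly_overhead=600,
    onprem_per_server=100,
    cloud_per_server=300,
)
print(n)
```

Plugging in your own quotes is the whole exercise; the fixed overhead term is usually the one people forget or underestimate.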

The big reason for most people is CAPEX vs. OPEX - even if it doesn't make financial sense in a dollar amount, it does in an accounting sense. Investors don't like to see big CAPEX numbers but seem fine with large OPEX ones.

> Investors don't like to see big CAPEX numbers but seem fine with large OPEX ones.

If things go pear shaped large OPEX numbers resolve themselves as OP-erations get slimmed and shut down. Large CAPEX numbers, in the same situation, resolve themselves through liquidation and tears...

More importantly, OPEX comes from next years profits yielding a business I can loan against. CAPEX comes from last years profits, increasing the amount of loans I need to get it together.

It's the difference between thinking about short term profit margins and thinking about asset growth over time. Throwing a lot of optional cash today at a problem is better business than being forced to throw non-optional cash at a problem whenever the problem is feeling problematic. It's also quite freeing in terms of M&A.

> More importantly, OPEX comes from next years profits yielding a business I can loan against. CAPEX comes from last years profits, increasing the amount of loans I need to get it together.

That's a very illuminating way to highlight the accounting fears mentioned in the gp.

+1 Throw all other rationale out the window if OPEX is what works. So here we are migrating to AWS..

They absolutely are a technology company, but compute (EC2) and storage (S3/Glacier) is a utility, like power supply. This wasn't the case back when you needed capacity planning, but today with dynamic provisioning and cheap storage, it is.

No one tries to build their own power station, or make their own laptops. They're better off using engineering resources on higher order stuff, unless, like Dropbox, the margins / TCO you are getting on your storage is an absolutely huge deal.

Even domestic premises have solar with grid top-up (and they can sell back to the grid often too). Like having baseline server resources on your own hardware and AWS - or whatever - for handling high demand.

But people do build their own power stations or hardware if they're big enough.

They might outsource most of the work, but then the gap between generating and using power is much more clearly defined.

There is an opportunity cost to switching too: you don’t know your gain before switching is realized (hence you don’t know over how long to amortize the project), and the switching project could fail altogether. So there is a conservative argument to be made here.

The first one is not a reason. If you're really small, you're much better off renting servers from a cheaper competitor, cloud, or even VPS.

There's another good reason: avoiding AWS sticker shock, which is real. Better to spin up an OpenShift instance and know what's going on with prices. God knows what gets spent on AWS resources that have been forgotten or, more likely, no one wants to run the risk of turning off. AWS has become very expensive for startups once you're past their short freebie intro phase.

Why would it be different with OpenShift? Either you know what you're running or you don't. Tagging on AWS gives you per dept/team/app cost split. If you don't use something like that you'll be lost in OpenShift as well...

Every startup I know got 100-500K$ in aws credits so your "short freebie phase" is dead long in practice.

[can't edit...]

Someone asked me via PM how to get your hands on that pile of virtual cash. I'm no expert on that, and likely YMMV, but for us, we basically called account managers at the big 3 (GCP, AWS, Azure) and went shopping. With some PR visibility (i.e. seed, some stupid "innovation reward", I guess any alternative empty proof of existence applies), we got offers (in the 50-500K range).

The way I see it, that's the de-facto standard - launch a VC funded startup, and cloud providers will line up at your door to shell credits on your head.

Y combinator startup benefits: "We have deals of varying degrees of formality with many companies in the technology world either to give free services or special access to YC-funded startups. Deals range from $500,000 in cloud hosting credits from Microsoft Azure to 60% off Optimizely’s A/B testing plans." [0]

Google also has a dedicated page where they partner with a bunch of VC's [1].

[0]: https://www.ycombinator.com/atyc/#connections

[1]: https://cloud.google.com/developers/startups/

This isn't surprising. At Blekko where I ran operations we did the math not once but twice (once when we got our series C to show that it made sense, and once when we got acquired by IBM to show why moving it into Softlayer was going to be too expensive). If you can build reasonable management tools and you need more than about 300 servers and a bunch of storage, cost-wise it's a win to do it on your own.

The best win is to build out in a data center with a 10G pipe into Amazon's network so that you can spin up AWS only on peaks or while you are waiting to bring your own stuff up. That gives you the best of both worlds.

So, rent the spike a la hybrid cloud?

Yes. The beauty of it is that building in the capability makes hosting your own even more cost competitive. Because while you might have to plan for 2x or even 3x your average to handle normal peaks if you don't have anywhere else to send traffic, if you can spin up cloud instances to handle spikes you can plan for far higher utilization rates for your own equipment.
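A rough sketch of why that works out, with loudly made-up numbers: the load shape and both prices are placeholders (cloud assumed to cost 3x owned hardware per server-hour), not real AWS or colo quotes.

```python
def total_cost(hourly_load, owned_servers, owned_price, cloud_price):
    """Cost of serving hourly_load (servers needed in each hour) with a fixed
    owned fleet, renting cloud instances for whatever exceeds it.

    Owned servers are paid for every hour, busy or not; cloud servers are
    paid only for the hours in which they are needed.
    """
    owned = owned_servers * owned_price * len(hourly_load)
    burst = sum(max(0, need - owned_servers) for need in hourly_load) * cloud_price
    return owned + burst


# Hypothetical day: 100 servers needed most hours, 300 during a 2-hour peak.
load = [100] * 22 + [300] * 2
OWNED, CLOUD = 1.0, 3.0  # placeholder prices per server-hour

own_for_peak = total_cost(load, 300, OWNED, CLOUD)  # provision 3x for the peak
hybrid = total_cost(load, 100, OWNED, CLOUD)        # own the baseline, rent the spike
all_cloud = total_cost(load, 0, OWNED, CLOUD)       # rent everything

print(own_for_peak, hybrid, all_cloud)
```

In this toy case the owned fleet in the hybrid setup runs at full utilization, and the expensive cloud hours are only paid during the peak, which is the effect described above. A longer or taller peak shifts the balance, so the load shape matters as much as the prices.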

Makes a lot of sense for the customer. I just wonder what Amazon could try to do to prevent this model becoming prevalent.

Contrary to what you think, they actively encourage this style of usage.

AWS has a product called Direct Connect [0] to reduce bandwidth costs between AWS and your infrastructure.

[0] https://aws.amazon.com/directconnect/

This is the local data center connection option I was referring to, the Coresite data centers on Stender Way (in Sunnyvale) had this option.

There is a latency spike between local and Amazon infrastructure so it would be critical to build your system so that putting this spike in the mix didn't impact your own flow path.

I had thought a bit about how we might do that at Blekko and I would probably shift crawling into the cloud (users don't see that compute resource) and move crawler hosts over to the frontend/index side. But I'm sure there would have been a bunch of ways to slice it.

I think it'd be false economy for them to try, as if the customer understands how expensive Amazon is, but still wants to use them for certain things, they're more likely to jump ship than switch fully to AWS if AWS starts pulling any stunts.

They certainly can and likely will respond by tweaking their cost models - bandwidth costs at AWS are completely out of whack - e.g. list prices at AWS per TB are tens of times higher than Hetzner for example; I presume that's based on looking at what customers are most sensitive to. E.g. if you retrieve only a small percentage of your objects every month, it won't matter much. Similarly, if most of your retrievals are from EC2 rather than from the public internet, the bandwidth prices won't be a big deal, and you may not pay attention to how much you actually pay per TB.

The high bandwidth prices hit some niches much more than others, and it may be more expedient for AWS to keep it straightforward for those affected to put caches in front than it is to rattle the cage of other customers. E.g. if you consume huge amounts of bandwidth you'll sooner or later speak to specialized CDNs or start looking at peering anyway.

From reviewing the thread's discussion, it looks like businesses turn to cloud-based infrastructure for a number of reasons:

1. To outsource non-core activities to experts and reduce risk, for firms that see IT as a cost center. A cost-cutting measure.

2. To provide dynamic capacity for mature businesses that experience anticipated workloads that are short-lived (seasonal or computational needs). A cost saving measure.

3. To provide dynamic capacity for new ventures growing in popularity i.e. fluctuating capacity requirements. This saves on large upfront infrastructure costs when long-term viability hasn't been established. A risk management measure.

4. To be described as "innovative" because peers are doing it, for firms that see IT as a revenue center (in industries that view such investments as a source of differentiation). A form of virtue signalling.

5. To make the accounts look good to investors by accounting for it as OPEX instead of CAPEX [0]. A seemingly irrational but valid reason. High OPEX numbers are easier to justify and more importantly, can be pared down with less friction than CAPEX, if things go south from intense competition for instance. Another risk management measure.

[0] https://news.ycombinator.com/item?id=16458863

Data centers aren't cheap, so unless you have the economies of scale to offer R&D investment and stock-based compensation to your employees to build a modern cloud DC, good luck with that... done right you can save operating expenses, but it'll take a huge investment that would not scale for others.

I strongly disagree. Datacenters are super cheap compared to EC2. (I'm not talking building your own: start by leasing space from existing datacenters). There are a surprising number of places where you can go and lease a rack or ten or a whole room and be up and running in a couple of months.

I make the case that colocating pays off at just about any scale, assuming you have $10k in the bank, have a use for at least 40 cores and are able to pay upfront to handle anticipated scale.

Hurricane Electric has prices online of $300/mo for a rack. On AWS, a single full c4 machine (36 threads) costs $1.591 per hour x 24 x 30 = $1,145/mo -- this is more than the cost of renting a whole rack that holds 40 machines. Decent internet can be gotten for hundreds per month.

Ok, so how about buying your own machines? E5-2630 with 20 threads is $700 x 2 = $1400 + motherboard + disk + ssd brings it to several thousand, so it will pay off in at most 6 months, and we're not even talking bandwidth or storage costs. Depending on the application you could be looking at a payoff after 2-3 months.
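As a back-of-the-envelope check on that payoff claim, here is a small sketch. The $1.591/hour and $300/mo figures are the ones quoted above; the $4,000 build cost is an assumption standing in for "several thousand", and power, bandwidth and staff time are deliberately left out.

```python
HOURS_PER_MONTH = 24 * 30


def ec2_monthly(hourly_rate):
    """Monthly cost of an on-demand instance that runs around the clock."""
    return hourly_rate * HOURS_PER_MONTH


def payoff_months(hardware_cost, ec2_month, colo_month_per_machine):
    """Months until buying a machine beats renting the equivalent instance."""
    return hardware_cost / (ec2_month - colo_month_per_machine)


c4 = ec2_monthly(1.591)            # hourly rate quoted above
rack_share = 300 / 40              # $300/mo rack split across 40 machines
months = payoff_months(4000, c4, rack_share)  # $4,000 build cost is an assumption

print(f"c4 on demand: ${c4:,.2f}/mo")
print(f"payoff after roughly {months:.1f} months")
```

Reserved-instance pricing would stretch the payoff period, so plug in whatever rate you actually pay.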

Worried about installing or remote management? IPMI, iDRAC, etc included with basically every server make this a piece of cake.

The only good cases for cloud are if you may suddenly scale 10x and can't predict it; don't have $10k in the bank; or don't have 1-2 months to order machines and sign a contract for rack space.

What you're missing in that comparison is an extra engineer (or two) who can deal with power needs, network hardware config, firmware updates, plans for rolling hardware, stock of replacement drives, seamless base system updates, dealing with the platform (virt or containers) itself and other things that the managed cloud gives you included in the price. Ah, and they need to be available on call. Hardware may be cheap - people aren't.

You need people to manage AWS instances too; I've never seen any evidence that it actually takes fewer people at scale.

Sure, for a few instances it's likely to be true, because there is a certain amount of fixed overhead for dedicated hardware, but that overhead remains a relatively low constant. In reality there isn't much difference in terms of labour between wrangling hundreds of AWS instances and hundreds of servers, and the servers will be many times cheaper to run.

This is a good comparison with EC2, but doesn't directly address the comparison with many other AWS products.

Nowadays, you don't need to do that yourself to run a cloud because you have companies like Joyent offering fully managed private clouds at a fraction of the cost of AWS. They will rack and stack, manage the cloud software, provide escape plans so you aren't locked into them as a vendor and provide architectural guidance to your app teams.

DISCLOSURE - I am a Joyent employee.

Even Facebook leverages colo facilities as some of their POPs. Still, I think it's common that somewhere near the mothership these big guys build a facility from the ground up.

FB uses colo facilities as POPs because that's where the eyeball networks connect. It's strictly a latency/connectivity play.

If the revenue is there, it’s a no brainer to build out your own data center or acquire prebuilt space. Cloud provider margins become your cost savings (clearly, as evidenced by this article). You’re going to need infrastructure people regardless, whether it’s on-prem expertise or AWS/Azure.

It’s disingenuous to inflate the compensation for datacenter employees as a bogeyman, or to wave away a “modern cloud DC” as a Herculean undertaking. There rarely is any R&D required, and stock compensation isn’t always necessary.

Fair points, appreciate the insight. I think living in a tech bubble city clouds my thinking a bit much.

The only real example I have good knowledge of (1 of 1 cases) is a tech friend who worked for a firm acquired by a company wanting to build out data centers, and in the end everyone left after their acquisition stock awards vested, before really adding value to the corporation and delivering on the promise, because the company did not have a "tech r&d" comp plan in place to keep the subject matter experts employed. In the end the project and the DC expansion fell short.

I can appreciate that living in tech hot spots can tilt the understanding of certain domains. “Unknown unknowns” if you will.

Most people I know who build datacenters don’t expect a tech R&D budget or outrageous compensation, as it’s general contracting, racking/stacking equipment, and infrastructure engineering (pxe booting, remote consoles, reimaging for OS and/or hypervisor, etc).

Here’s a non-comprehensive list I generated on the fly yesterday of tech orgs who run their own datacenters or equipment in colos:


So it only takes how many exabytes of storage for it to be cheaper than S3, by only < $40M?

For all this talk about how expensive S3 would be for a filesystem, and how poorly suited AWS is for this kind of thing, Dropbox has seemed to make it work just fine.

Well, we removed all our servers from AWS and replaced them with Lambda functions and DynamoDB tables, which resulted in a 4.5x reduction in cost and increased performance by multiple factors. I suppose it all depends on what you are building and how you are building it. If you run servers I think it is no secret that AWS is not the cheapest option around.

Did the switch to DynamoDB make a big cost difference? I've never really thought about the cost of Dynamo vs. RDS as being huge, but honestly I don't know.

For most people (including every commercial project I've ever worked on), the time-saving and safety benefits of relational schemas were far greater than any theoretical infrastructure savings.

> Dynamo vs. RDS

This is so dangerous even to type out :) In some, limited cases, Dynamo can replace tables in an RDS, and completely outperform it, too. I'm a big fan, but it's fundamentally different from an RDS, and you can get burned, badly.

Oversimplified, DynamoDB is a key-value store that supports somewhat complex values. If you have large values, use S3 instead [0] - I think that's a good way to think of it, a faster S3 for loads of small records (with a nicer interface for partially reading and updating those records).

If you need to look up on anything but the primary key, be careful: costs can get out of control by having to provision extra indexes. If your primary keys aren't basically random, you'll run into scaling issues because of the way DynamoDB partitions. If you need to look at all the data in the table, DynamoDB probably isn't the right technology (it can, but scans absolutely tank performance = $$$).

[0] https://docs.aws.amazon.com/amazondynamodb/latest/developerg...

I think looking at something like DynamoDb as a wholesale replacement for anything that needs a relational schema is... well begging the question a little.

DynamoDB as a hugely scalable, limited-scope data source in your app is likely where you'll find the optimal cost/scalability point. By using Dynamo for scalable read-heavy activity you let the rest of the app be barebones and 'KISS', preserve competencies & legacy code, and retain the benefits of relational schemas. Dynamo's scaling then becomes your cloud-cost tuning mechanism.

By way of example(s): if your editors all use RDS and you publish articles to DynamoDB you could be serving tens of millions of articles a day off a highly non-scalable CMS. If all your reporting functions pull from DynamoDB you could be serving a huge Enterprise post-merger while using the same payroll system as pre-merger. Shipping tracking posted/grabbed from Dynamo, purchasing logic on the best Perl code you could buy in 1999 ;)

The biggest part of the bill is the DynamoDB indexes. The Lambdas are a tiny fraction in comparison. That being said, you can avoid using as many indexes as we do. We did it because we wanted our Lambda functions to be as pure and as micro as possible.

If you are going to do joins and that sort of thing, forget about DynamoDB. It cannot do that, and for a good reason. That being said, our architecture is mostly SPA so the lack of joins is solved at the client - there are just more calls to services client-side, but the effect of that is still a cheaper and faster product to run and maintain.

This is the future. Server code should just be 'hooked' into an infrastructure. I am looking at Fargate by AWS. I think this would basically end all devops [puppets, servers, etc]. It is basically a simple automated hosted Kubernetes - and development is easier than Lambda since you just run a docker file. I avoid DynamoDB and use RDS (MySQL) though, since I can get reports out quicker.

You mean it's the future from 20 years ago? EJB was exactly this

Troll. No, 1998 Enterprise JavaBeans is not related to a Docker container manager.

1998 EJB and app servers were “PaaS” for Java. Modern Docker-based PaaS platforms are reimplementations of the same construct, with the main value-add being SDKs for a broader set of languages.

Do you have some solution for http triggers? Like your own gateway? Because I found the AWS Gateway expensive to trigger Lambda by http events.

While S3 storage costs are fairly reasonable, request costs are unrelated to file size, which makes it an absolutely terrible choice for filesystem-like activities. I say this from experience: S3 PUT requests make up a very significant portion of our AWS bill as we write a massive number of very small files to S3.

Does anyone know of a middle-layer solution that automatically concatenates files into larger chunks (similar to HDFS) prior to persisting to S3 and for retrieval utilizes HTTP Range requests to fetch the individual file? I'm going to build it if it doesn't exist. 100MB chunks would reduce our S3 PUT costs by ~1000x.
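A minimal sketch of the packing idea, assuming you keep the index somewhere cheap to query (the function names and the in-memory index are illustrative, not an existing tool). Files are concatenated into one blob, the index records each file's offset and length, and the Range header for a single file falls straight out of the index.

```python
import io


def pack(files):
    """Concatenate {name: bytes} into one blob plus an index of (offset, length)."""
    blob = io.BytesIO()
    index = {}
    for name, data in files.items():
        index[name] = (blob.tell(), len(data))
        blob.write(data)
    return blob.getvalue(), index


def range_header(index, name):
    """HTTP Range header value that fetches just one packed file.

    Byte ranges are inclusive on both ends; empty files would need special handling.
    """
    offset, length = index[name]
    return f"bytes={offset}-{offset + length - 1}"


def read_local(blob, index, name):
    """Local stand-in for the ranged GET, useful for checking the index."""
    offset, length = index[name]
    return blob[offset:offset + length]


files = {"a.txt": b"hello", "b.txt": b"world!"}
blob, index = pack(files)            # one PUT for the blob instead of two
print(range_header(index, "b.txt"))
print(read_local(blob, index, "b.txt"))
```

The hard parts this skips are buffering writes durably until a chunk is full and handling deletes or overwrites inside an already uploaded chunk.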

Couldn’t find anything from a google search, but seems you could buffer files for writes, tar them, PUT the tar, and then use the tar index to perform range reads against the tar file.

Edit: “dar”, “zip”, or “dump/restore” might be able to do this: https://serverfault.com/questions/59795/is-there-a-smarter-t... (will create an archive with an index file for random access, vs tar streaming requiring reading the entire file)

If you come up with another solution, I’d love to hear it!

Do yourself a favor and use a Hetzner Storage Box: https://www.hetzner.com/storage-box

Even if you don't include any extra costs, just keeping 10 TB on Hetzner will cost you 40€ max, whereas with S3 it will cost you at least $230.

A single Hetzner box is not really comparable to S3 in terms of durability and availability.

My experience is that they go down every 2-3 years; usually the problem is with the hard drive, occasionally with some other component. Under normal circumstances it takes them up to one hour to replace the hard drive, and then the array gets rebuilt for a few hours with the system being online. That means something like 2 hours of downtime every 2-3 years, which is quite acceptable for my usage scenarios. In terms of money spent during that time, the difference is enormous.
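For a sense of scale, the downtime figures above translate into availability like this. It is a rough sketch that only covers uptime; it says nothing about durability, i.e. the chance of actually losing data.

```python
HOURS_PER_YEAR = 365 * 24


def availability(downtime_hours, period_years):
    """Fraction of time the box is up, given total downtime over a period."""
    return 1 - downtime_hours / (period_years * HOURS_PER_YEAR)


worst = availability(2, 2)   # 2 hours down every 2 years
best = availability(2, 3)    # 2 hours down every 3 years

print(f"between {worst:.4%} and {best:.4%} available")
```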

Like most businesses on a cloud provider, we're fairly locked-in to AWS without major changes. Going offsite would replace S3 PUT costs with significant bandwidth costs.

My friend also got a real big bill for S3 PUT requests, they are just too expensive.

We built JuiceFS [1], a POSIX filesystem on top of S3, and there is a feature on the roadmap to combine multiple small files together before uploading into S3 (each file is a slice of chunks in S3). We will let you know when it's ready.

[1] https://juicefs.io/

Heh, when S3 first came out they didn't have per-request charges. Then someone built a FUSE fs on top of S3, and adding those charges was largely to convince people not to use S3 as a filesystem.

I POC'd DynamoDB (+ API Gateway + Lambda for reads) for a project that was tons of tiny files dominated by PUTs. It was a pretty trivial switch (writes were one extra line of code, reads were like 10 total lines), and the estimated cost savings were large (though we didn't fully calculate them). It didn't exist at the time, but I think Lambda@Edge might work really well for reads.

cperciva’s tarsnap server implements a log structured filesystem on S3. [1] This enables bundling of multiple writes into large objects, with the explicit goal of reducing per-request costs. It’s not open source, but maybe some of the high level ideas can be cribbed?

[1] http://www.daemonology.net/blog/2008-12-14-how-tarsnap-uses-...

Entirely untested and just spitballing... use Firehose to concat a bunch of small files and store based on size or time passage as a larger file to s3.

Then use Lambda transform in firehose to index the byte ranges for each record (file)? (Biggest q in the process would be here... you might have to index after it hits s3)

S3 also allows the Range request header out of the box as well.

Interesting problem, and I can totally relate to S3 PUT costs being astronomical.

Dunno if it would work in your case but ... I've had fun mounting a ZFS file system using an entire EFS volume as the "physical" back end. You have to give up on writes actually being on disk when you think they are, but the payback is that it's really kinda fast.

Depends on how you’re writing. You could send everything to a Kinesis or Firehose stream and use that to batch inputs, or you could write to an Elastic File System and batch there with a scheduled function.

HDFS does no such thing. If you store a million small files it'll store a million small files on disk as well.

I think the parent might mean HDF5.

Last year I wrote about the AWS spend of my SaaS business, Cronitor [1]. I couldn't imagine building a service like this without modern cloud providers, but it is no wonder to me why AWS generates all of Amazon's profit.

Essentially our migration over the years looks like:

1) Moving to EC2 from a VPS: the EC2 instance with the same specs is notably slower, and you need to add a second instance where one worked before.

2) Moving to a managed service like RDS after running the service on EC2: the managed service with the same specs is notably slower, and you need a second instance where one worked before.

In the end, it's worth it: in the RDS example you're getting millisecond replication times, point-in-time database recovery, hot failover, etc. But still, it would be great just once if you got the same performance from an RDS 2xl as you'd get running your own DB on a 2xl of your own.

[1] https://blog.cronitor.io/the-aws-spend-of-a-saas-side-busine...

> it is no wonder to me why AWS generates all of Amazon's profit.

Going out on a limb here but I'm guessing that Amazon makes a good amount of profit from their online retail store.

AWS (and other cloud provider) costs are non-uniform as you grow.

It goes something like this (from when you're small to when you're huge):

- very/quite cheap

- prohibitively expensive

- much cheaper than building and maintaining own infra

- prohibitively expensive/more expensive than building and maintaining own infra

So various companies will end up at various stages of this cycle. Dropbox is big enough to be in the last stage.

Sorry, what do you mean by non-uniform? The costs are degressive with volume. If you have non-linearly increasing costs with a linear workload, then you have a fundamental math problem with your architecture / business model.

Non-uniform with respect to what you're willing to spend on different parts of your business.

AWS is not a "once the sweet spot, always a sweet spot".

Tech and implementations rarely follow neat architecture and business model maths.

I feel that after getting to a certain scale, moving to your own data center is a very smart choice. It's all about how well you can run your data center thereafter. So, good for them for scoping it out and saving money.

If it becomes cheaper to pay Sysadmins then it is always a good choice to move to your own.

Using the cloud is a waste of money as long as you don't need to scale up and down, or change the infrastructure frequently.

Bandwidth has always been insanely(!!) overpriced with most cloud providers.

What kind of business doesn’t need to scale up or down?

Maybe if there will only ever be one client connected 24/7 and that never changes, but I can’t think of a real product that would avoid taking on more than one client.

For Dropbox, it probably doesn't even matter that there was a cost savings to moving from AWS. There is strategic value in not taking a critical dependency on a potential competitor, and Amazon is an obvious potential competitor in the market for consumer and small business online file storage.

Using AWS meant that Amazon had IP logs for Dropbox's users. It had detailed information about Dropbox's business velocity. It had the ability to shape the customer experience of Dropbox's customers. In short, it made sense for Dropbox to move off of AWS for the same reasons that it makes sense for obvious Amazon competitors like Home Depot and Target not to build on AWS. However, unlike Home Depot or Target, the other major cloud providers, Google & Microsoft are also potential competitors in the file storage service market.

> Using AWS meant that Amazon had IP logs for Dropbox's users. It had detailed information about Dropbox's business velocity.

That doesn't seem fair to say, without proof. Why would they risk all the other business they do, plus GovCloud, etc, to spy on a company? Are AWS and Dropbox even competitors? I don't think that assertion holds at all.

The difference between self hosting and running in "the cloud" is that when there's an error in "the cloud", you can't do anything about it ... But full stack debugging can even make a greybeard cry.

“The cloud” makes it easier to build an architecture with multiple failure domains. You don’t want to be debugging in an emergency, you want to be hitting the failover button.

I think hiring dedicated servers or VPS is a nice middle ground between self hosting and all out "cloud" cough lock-in. The cloud services do the failover and auto scaling for you, but often when something goes down it turns out they didn't even have any backups and all your data will be lost unless you actually backed it up yourself.

Full stack debugging + shaders has made this greybeard cry.

Do we know how much they were spending on AWS before? Not that $75m isn't a lot but it would be interesting as a %!

> which reduced spending on “our third-party datacenter service provider” by $92.5 million offset by increased expenses of $53 million for its own data centers.

They reduced spending on AWS by $92.5M, but still store 10% of data there so ~$103M?

Looking at the $53M cost for their data center vs the $92.5M decrease to the "third-party datacenter", they saved around 43% moving to their own data centers.
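Spelling out the arithmetic behind those two estimates, as a quick sketch. The $92.5M and $53M figures are the ones quoted above; the 10% share is the assumption from the previous comment, and treating cost as proportional to the share of data stored is a simplification.

```python
aws_reduction = 92.5   # $M, drop in third-party datacenter spend (quoted above)
own_dc_cost = 53.0     # $M, added spend on own data centers (quoted above)

net_savings = aws_reduction - own_dc_cost    # $M saved on net
savings_pct = net_savings / aws_reduction    # fraction of the moved spend saved

# If the 10% of data still on AWS costs proportionally as much as the rest did,
# the reduction represents 90% of the earlier bill.
implied_prior_spend = aws_reduction / 0.9

print(f"net savings: ${net_savings}M ({savings_pct:.0%})")
print(f"implied earlier AWS spend: ~${implied_prior_spend:.0f}M")
```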

This is actually great PR for AWS. Dropbox relied on AWS primarily for a decade, focusing on features and growth. Only when cost cutting became relevant to valuation, i.e. pre-IPO, did they decide to invest in their own infrastructure.

Put another way, use the cloud unless your valuation depends on cutting infrastructure operation costs

I wonder what Dropbox is using. Of course they mention 'custom' software and possibly hardware, but I'm pretty sure it'll be some sort of Intel-based setup, maybe OCP, and Linux. There might be some hypervisory things going on, possibly Xen, maybe KVM. Then there will be some sort of firmware/OOB management as well, and perhaps some non-standard connectivity.

Those are mostly given. What runs on top of it, that's what interests me. Some orchestration/private cloud software, some container platform, and perhaps a storage platform: which are those for Dropbox? Would it be standard OpenStack, Kubernetes and the true in-house stuff: Dropbox object storage?

I don't think there's much public info about the compute infrastructure, but here's a blog post about the object storage system: https://blogs.dropbox.com/tech/2016/05/inside-the-magic-pock...

Do they still use AWS for compute, with their own data center just for storage? Or do they have their own infrastructure for compute?

If I remember correctly, they noticed they could cut costs significantly in storage compared to S3 because the vast majority of files in your Dropbox folder just sit there and hardly need to be read or written again (photos, etc.).

If they’re not using AWS for compute, it’d be interesting to see what sort of similar reasoning they have for why the costs are cheaper in house.

> If I remember correctly, they noticed they could cut costs significantly in storage compared to S3 because the vast majority of files in your Dropbox folder just sit there and hardly need to be read or written again (photos, etc.).

Bandwidth costs alone for S3 are high enough that I've set up storage systems for clients that used S3 for durability but put it behind huge caching proxies that held a fairly substantial proportion of their total data, where the reduction in S3 storage paid for the cache setup several times over.

Storage costs are harder to cut, but yes, S3 is typically expensive there too, but it's also riskier to do your own because of the risk of data loss. At Dropbox's scale it'll be worth it, but letting Amazon worry about durability is one of the few cases where I'll happily recommend AWS despite the high costs.

Moving data out of AWS and using EC2 for baseline load are both more expensive than using physical servers co-located with the storage servers.

This doesn't surprise me at all, considering storage is what Dropbox does and AWS has not really been competing on storage costs lately (as much as it's great when instances get cheaper, EBS has been 10 cents a gb/month for awhile now which is really problematic considering most of the new instance types are EBS backed).

What would be the cost difference moving to Google Cloud or other cloud provider? The reason why I'm asking is that the move itself requires a re-consideration of all production configuration. The indicated savings assumes that Dropbox was using Amazon 100% efficiently to begin with.

Yes, it's cheaper, but that's because Dropbox needs lots and lots of infrastructure and they can hire people to run it. At some point, if you have the capacity, it's cheaper to build than to outsource. Dropbox found that point.

Smaller companies don't have the option to hire a whole IT team.

The article lacks a detailed analysis of how it saved so much money. There must be other stories like this where cloud hosting doesn't make sense for really high volume data operations. Do the cost savings come purely from large volume, or is there something else to it?

Is the cost of switching included? What’s the percentage of savings?

Didn't Zynga report the same savings prior to collapsing?

Is reporting massive savings on an infrastructure project a signal of softening demand and a weak product pipeline?

Wot Mate? You think when a company announces they are saving money on costs, lowering COGS and increasing margin by cutting costs by investing in a lower cost, higher performance solution, it's a signal of softening demand and weak product pipeline?

There's a kernel of truth in what you're saying, perhaps. Any growing company has to choose how to divide resources (engineers, cash, executive overhead) between different projects.

A lot of the time companies expand by capturing market share through adding features and expanding their TAM.

But it's foolish to suggest that lowering COGS and increasing margin means a company is in trouble.

Let's say your company is now making $1B/year (Dropbox) and you can choose between adding 3% more revenue or lowering your costs by 7%.

In most cases, for your IPO, you'll get a higher aggregate shareholder value by cutting the costs and increasing your margin.
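A quick sketch of why the cost cut tends to win. The $1B revenue and the 3% and 7% figures are from the comment above; the $900M cost base is purely an assumption for illustration, and the revenue case is given the benefit of the doubt by treating every extra dollar as profit.

```python
def profit_gain_from_revenue(revenue, growth, incremental_margin):
    """Extra profit from growing revenue, given the margin on the new revenue."""
    return revenue * growth * incremental_margin


def profit_gain_from_cost_cut(costs, cut):
    """Extra profit from cutting costs; every dollar saved is a dollar of profit."""
    return costs * cut


revenue = 1_000   # $M per year, from the comment above
costs = 900       # $M per year, assumed for illustration

gain_rev = profit_gain_from_revenue(revenue, 0.03, 1.0)  # best case: all of it is profit
gain_cut = profit_gain_from_cost_cut(costs, 0.07)

print(f"3% more revenue: at most +${gain_rev:.0f}M profit")
print(f"7% lower costs:  +${gain_cut:.0f}M profit")
```

With a much smaller cost base the comparison can flip, so the conclusion depends on how close to break-even the company is.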

Also, once a company is no longer being funded by venture capital and has to finance its own growth out of its profits, those millions it saves by cutting infrastructure costs can be used to hire more staff, rent more office space, and all the other things required for growth.

Notably, the cost of doing so was over $50 million. So it's perhaps not realistic for the majority of companies using AWS that are much smaller than Dropbox.

This is a success story for both aws and Dropbox.

It’s understandable that a lot of companies stick with AWS only because of inertia (read: laziness) when they would be better off leaving it.

I'd rather see a figure of how much more money they MADE than how much they saved by doing something that is not directly related to making more money.
