
AWS sells optionality. If you build your own data center, you are vulnerable to uncertain needs in a lot of ways.

(1) your business scales at a different rate than you planned -- either faster or slower are problems!

(2) you have traffic spikes, so you have to over-provision. There is then a tradeoff doing it yourself: do you pay for infrastructure you barely ever use, or do you have reliability problems at peak traffic?

(3) your business plans shift or pivot

A big chunk of the Amazon price should be considered as providing flexibility in the future. It isn't fair to compare prices backward-looking, where you already know what your actual needs were and can compare what it would have cost to meet them on AWS vs. in house.

The valid comparison is forward-looking: what will it cost to meet needs across an uncertain variety of scenarios on AWS compared to in-house.

The corollary of this is, for a well-established business with predictable needs, going in-house will probably be cheaper. But for a growing or changing or inherently unpredictable business, the flexibility AWS sells makes more sense!

You are right in this analysis. The only thing to add is why Amazon still isn't the only choice, and why it is sometimes in fact better not to use it.

Your comment makes it sound like unless you know your growth pattern, can predict your spikes, don't plan to pivot, and know all these things perfectly, you will lose. That's not the case. The reason is that there is a quite significant cost difference between AWS and DIY. DIY is cheaper by a large enough margin that you might be able to buy, say, double the capacity you need and still spend less. So if you misjudged your growth and your spikes by up to 2x, you are still fine.
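To make that argument concrete, here is a back-of-the-envelope sketch with made-up unit prices; the 40%-of-cloud DIY cost is purely an assumption for illustration, not a quoted rate:

```python
# Hypothetical monthly prices per unit of capacity -- illustrative only.
CLOUD_PER_UNIT = 100.0  # $/month on a cloud provider
DIY_PER_UNIT = 40.0     # $/month DIY or leased (assumed 40% of cloud)

needed = 10                    # capacity you actually end up using
diy_provisioned = 2 * needed   # over-provision 2x as a safety margin

cloud_bill = CLOUD_PER_UNIT * needed        # cloud: pay for what you use
diy_bill = DIY_PER_UNIT * diy_provisioned   # DIY: pay for what you bought

# Even at 2x over-provisioning, DIY comes out ahead under these assumptions.
print(cloud_bill, diy_bill)  # 1000.0 800.0
```

The break-even point moves with the actual price ratio, which is why the forward-looking comparison matters.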

Even if you are a small operation, you still have the option to lease hardware. Then your response time to add new servers is usually hours, not days as it would be if you went the full own-your-hardware route.

As an exercise, you can try to rent and benchmark a $300/month server from e.g. SoftLayer and then compare that against a $300/month EC2 instance. Chances are, you will be blown away by the performance difference.

I don't think anybody will dispute that if you have a specialised workload (CPU heavy, storage heavy, etc) there's definitely cost savings at scale for DIY over cloud.

But the calculation is harder than that. People are terrible at estimating the ops personnel cost for DIY. Turns out it's hard running with minimal downtime, or engineering hardware + software to be fault-tolerant. It's hard to put a price on dev tools/APIs/doco/the whole ecosystem.

Especially for that last reason, I have never been "blown away" by Softlayer, even when their instances were way beefier than anything AWS/GCP offered. YMMV.

> Turns out it's hard running with minimal downtime, or engineering hardware + software to be fault-tolerant.

It's hard to do that with cloud too. Your instances can go down at any time.

What you're trading off is a small number of people on-site at the data center (and it can in fact be small!) plus some slightly-old-school sysadminning versus new-school cloud-native development. Maybe the latter is easier to hire for (although I doubt that) or your current people find it more fun. But it's not like physical hardware dies all the time, or that you're immune from seeing the effects of physical hardware dying just because you're on the cloud.

You might save on hiring one person to replace disks when they fail and one person to write some PXE boot automation, but you'll need more than two people to run Kubernetes at the same scale of compute capacity as those two people could have run for you.

Well, you and the poster are about to be disappointed. I did the comparison recently for 512GB memory servers and Google Cloud is now the same price as SoftLayer, while AWS is a bit more.


If you are willing to pay a lot of money upfront, which you are since you're talking about building your own datacenter, AWS and Google will both give you huge discounts and be significantly cheaper than SoftLayer.

I don't see any performance details in your write-up. Is the SL server actually comparable to the GC one? I don't know how to equate vCPUs to real-world performance.

Memory is easy to compare. But how does it stack up CPU-wise?

Another question is whether lower tier servers compare favorably or not. How does a $200-500/month server compare?

Lastly, is it possible that SL is just not competitive anymore, but another provider is? I got out of that gig a few years ago so I honestly don't know, but is it possible that Hetzner or someone similar is now the leader in leased hardware?

CPUs are in the same ballpark. They are all recent generation Intel Xeon with a lot of cores. The providers each have a few different builds, you'd need to provision the machine and check the model number if you want the details.

SoftLayer is a disaster on lower tier servers. You only go to them for higher tiers.

The market is very competitive and moving fast. Both Google Cloud and AWS have been rolling out high memory hardware recently at a reasonable rate.

Hetzner is amateur tier. It's cheap second-hand servers with no ECC. It's not comparable to enterprise infrastructure.

I think you misunderstand. I agree SoftLayer is terrible - I was merely pointing out that even when their pricing wasn't bad, their tooling was always pants.

(But then they were bought by a company whose "cloud strategy" is to put the word "cloud" in as many DB2 and mainframe contracts as possible.)

Yes, I agree that the tooling is subpar, but the price point and the servers with a terabyte of memory were decent arguments when the competition didn't have them.

It's noteworthy that SoftLayer is now worse in all respects.

> if you have a specialised workload (CPU heavy, storage heavy, etc) there's definitely cost savings at scale for DIY

I wasn't sure what your use of "heavy" means here -- is it "a lot of" or "disproportionate"? Years ago there was much less flexibility with IaaS machine shapes, but I was super impressed when Google launched "custom machine types". There's a little slider to scale up/down your RAM/CPU independently, and of course storage is already independent. In fairness, there is some correlation between CPU allocation and storage IOPS, but that's inherent to scheduling a portion of the host machine reasonably.


Yeah, if you're using loads of CPU, but not all the RAM or block storage. This usually happens if your problem isn't really parallelisable. Then scaling out isn't really something you can do easily or quickly, so the cloud loses its appeal a bit. In those cases, it might make sense to go DIY/in-house.

First, I don't mean SoftLayer's cloud. I mean their physical boxes. A dual-processor Xeon that is all your own is a lot.

Second, it depends on your expertise. And you don't need scale. I ran a successful bit of infrastructure on SL with seven servers that was quite a bit cheaper than an equivalent AWS setup. I was pretty much solely responsible for the infrastructure, and still my responsibilities were more than 80% coding and later management. Given my salary at the time, it was quite cost effective.

Was the comparison based on on-demand instances or reserved instances? We reserve almost all of our instances within our company for 3 years and also get an extra discount on top of it. Based on my calculations, I've seen AWS come out cheapest of all. Would like to know if I'm missing anything.

This was a few years ago so prices have changed on both sides. What I advocate is doing exactly what you did: do a comparison with actual numbers and performance stats instead of hand waving that the cloud is always best.

Another factor that sometimes gets overlooked is on-hand skills and hireable skills.

If you (and/or your team) have experience with dedicated hardware, that should probably be a non-trivial factor in decision making. Likewise, if you have more AWS/cloud experience (or have those skills available on staff), there are benefits to that.

Cloud skills don't magically happen - someone with minimal AWS skills can give you a setup that is more expensive to run, and possibly less secure, than a locked-down dedicated box.

Exactly. If you have expertise to run your own servers, and you don't have a spiky workload, maybe it's worth it. If you have cloud expertise, that may be the winner.

You make a good point, but my anecdotal experience says you really need 10-30x your average capacity for peak times, not double. An ideal world would be: you have your "cheap" datacenter that you use most of the time, and somehow extend into the cloud for the truly rare event when you need 30x. I'm not sure how feasible that is, though.

Can't you use a hybrid strategy and use the cloud only to handle the peak load?

Handling variable load is not free in the cloud, so if you are going to pay it anyway, you can keep your predictable cost down and use the cloud when you have that temporary extra load.

Depends on your workload. If you see this traffic pattern, then yes, you absolutely are a candidate for the cloud. But if you are, say, an ecommerce shop and your spike is just double your regular traffic, then who knows what's better.

Can you use AWS for extraordinary peaks and DIY solution for cooler periods of time?

Put another way: Would Dropbox have gotten off the ground and progressed as it did without AWS?

Probably not.

Outgrowing AWS is a great problem to have.

Not to mention they basically resold a single AWS feature. All they had to do was rebuild S3 (and EC2).

Ah, you sound like this person: https://news.ycombinator.com/item?id=9224

> For a Linux user, you can already build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem. From Windows or Mac, this FTP account could be accessed through built-in software.

Not only did BrandonM have a point in 2007, today it is even easier and much more advantageous to do file synchronization with Free Software tools. If you have multiple machines you have SSH on all of them already. There is no better solution out there than git together with ediff for versioning and conflict resolution. rsync and unison are great for syncing media files across many devices/USB drives/microSD cards with different filesystems on them. Hardlink or ZFS snapshots just work.

Dropbox is not even an option for me because there is no official support or reliable way to run it on OpenBSD.

Dropbox is a consumer product with limited features and use cases. Smugly dismissing the needs and technical capabilities of power users is not going to endear you to anyone here.

Then it’s good that my goal is not to endear myself to anyone here. Flippantly dismissing certain technologies and engaging in “just”isms needs to be called out because it’s not productive or helpful. Moreover, I think it’s funny that the same kinds of posts are being made in 2018 as were made in 2007 (i.e. “why don’t you just...”), considering they have nothing to do with Dropbox saving money by migrating away from AWS. Really, it’s not germane at all. Perhaps it would be if the source were an article about open source alternatives to Dropbox.

> Flippantly dismissing certain technologies and engaging in “just”isms needs to be called out because it’s not productive or helpful.

I agree that hashkb's comment was flippant and unhelpful, but your reply was not any better and was the one that went off-topic.

That's all true, but you are not their customer here. Their customer is the person who has never heard half the words you just said - or at least the person who has to support a bunch of people who've never heard half the words you just said. :)

And he was totally right. Now it's even easier, and mature open source tools and libraries are available. https://syncthing.net/ is a good example.

I think that's a little disingenuous. Dropbox provides:

1. a nice web interface;

2. synchronisation;

3. an easy-to-understand pricing model;

4. tightly integrated desktop support (at least for OS X).

There is no dishonor in that. The “magic” they provide is still better than the competition.

S3 and EC2 are by design commodity, low level artifacts. Like any other commodity, you should always be shopping around.

The big issue with Amazon has always been their “roach motel” philosophy where data goes in but doesn’t come out. (ie they hose you with network costs)

If you have enough scale and enough dollars, it is often cheaper to build bespoke infrastructure vs renting somebody else’s at margin.

From the perspective of a business, building your own data center only makes sense if it is part of your business (like at Dropbox). Paying a margin for a service that is not part of your core model is not per se a bad thing; in most cases it makes a lot of sense. That is because when a company makes a huge investment, it makes this investment with its own capital. In theory (and that often applies to reality), a company can only invest money effectively in its core business.

This is why many companies with a lot of money still rent their offices – even if they intend to stay in the (often customized) buildings so they don't need flexibility. It just would not make sense to buy and to bind the capital in a building, it can be invested in their core business with higher returns.

An exception might be made in cases where a company suddenly has so much money that investing it all in its core business might diminish the capital/return rate, e.g. through the US tax break, where a company shifts the money stored overseas back to the US. This is why for Google it might make sense to buy property, even though it's not their business: they just have too much money.

Correct, but all of those are basically independent of the backend storage system. At one point you could almost consider Dropbox to be a VAR for S3 storage, with the emphasis on "Value Added."

Add that their flat-rate pricing is actually a discount for moderately heavy users. If you use half or more of the 1 TB on the Pro plan, you're already saving money compared to S3 pricing.[1] This is easy to do if you use something like Arq to back up your machines and take advantage of selective sync to archive rarely used files from old projects.

[1] S3 at $0.023/GB/month × 500 GB = $11.50
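Assuming the roughly $10/month Pro plan price of the era (an assumption; the footnote only gives the S3 rate), the break-even point works out to well under half the quota:

```python
S3_PER_GB = 0.023    # $/GB/month, the S3 rate cited in the footnote
DROPBOX_PRO = 9.99   # assumed flat monthly Pro price for 1 TB

# Storage level at which raw S3 storage alone costs as much as the flat rate:
breakeven_gb = DROPBOX_PRO / S3_PER_GB
print(round(breakeven_gb))           # ~434 GB, under half of the 1 TB quota

# The footnote's own figure, for 500 GB on S3:
print(round(S3_PER_GB * 500, 2))     # 11.5
```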

And $11.50 doesn't include traffic, which can push you even higher.

While S3 was the bulk of it, I wouldn't be surprised if they had used DynamoDB for metadata storage, SQS for job queuing (pushing share data down to clients asynchronously), SNS for broadcasting pub/sub, Redshift for data analysis, ASG for deployments, etc.

Costs for both DynamoDB and Amazon Redshift can get out of hand real quickly. I've seen it first hand. With Redshift, the problem is that data teams do not adjust the default configuration of 5 query slots. So when things slow down, instead of increasing concurrency on a given node, they add more nodes. I've seen monthly invoices of $100K+ for a workload that could have been done for less than $2,000/month.
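For reference, that concurrency setting lives in Redshift's workload management (WLM) configuration, exposed as the `wlm_json_configuration` cluster parameter. A hedged sketch of raising the slot count with boto3; the parameter group name here is hypothetical:

```python
import json

def wlm_concurrency_params(slots, group_name="my-wlm-params"):
    """Build a ModifyClusterParameterGroup payload that raises the default
    WLM queue's slot count (Redshift defaults to 5 concurrent queries)."""
    wlm_config = [{"query_concurrency": slots}]
    return {
        "ParameterGroupName": group_name,  # hypothetical group name
        "Parameters": [{
            "ParameterName": "wlm_json_configuration",
            "ParameterValue": json.dumps(wlm_config),
            "ApplyType": "dynamic",
        }],
    }

if __name__ == "__main__":
    import boto3  # requires AWS credentials and an existing parameter group
    boto3.client("redshift").modify_cluster_parameter_group(
        **wlm_concurrency_params(15))
```

Tripling the slots on the existing nodes is a configuration change; tripling the nodes is a tripled bill.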

Metadata was on MySQL on bare metal, last I checked. It wasn't on EC2.

> A big chunk of the Amazon price should be considered as providing flexibility in the future.

A recent place I worked at didn't understand this. They were going to the cloud as a strategic move because they didn't want to run a data center any more. Their computing needs were stable and predictable - no "Black Friday" sales that needed triple the servers for a short part of the year. They were going to end up paying far more for the total computing cost than if they had just kept buying blade servers and VMWare licenses.

I've worked somewhere that didn't care. Business SaaS platform, that moved to the cloud because maintaining servers was dull and taking up too much developer and devops headspace. The entire yearly spend was about as much as the salary of an ops person.

I'd argue that companies where the hosting costs are the primary, or even a significant cost, are a small minority.

I worked somewhere where the resident DevOps/Sysadmin guy would rather repeatedly write me emails about using an extra 10GB of disk for a data science project than just buy more storage. And this was on an instance with like half as much memory as disk available. There are some people in this industry who just have zero business sense.

Then there are people on the opposite side of the spectrum who don't even look at server costs. I have a task sitting in our backlog to optimize our instance usage which could save us over $100,000 a year. It has been constantly deprioritized in order to work on feature development, which is stupid because we could hire another developer to do it in a few days and then keep him to increase our feature development, all while costing our company the same amount of money.

Granted, I think this is more a case of the prisoner's dilemma [0] between lower management and upper management (lower management doesn't want to work on it because it doesn't produce features that make him look good, but he also does not want to propose the additional developer for it, because then upper management will just tell lower management to do it without the additional developer).

[0]: https://en.wikipedia.org/wiki/Prisoner%27s_dilemma

This is a pretty common trend in lots of places. These kinds of decisions are driven by things other than computing needs. Maybe to look cool/cloudy/next-gen ...

You can design for and get steady-state discounts on cloud too. It's not only about flexibility but also maintainability and ops overhead. The increased spend on cloud is still usually less than the cost of a sysadmin/IT team and the handling of inevitable hardware and DC issues.

This is what happens when directors and C-level folks get a reputation bonus from being able to talk about how they "led a migration to the cloud" in their previous company.

A lot of the time, companies can realize short-term savings because of how they depreciate or are taxed on assets like server rooms and equipment.

Most of the times I saw that, they did it because in-house pricing was insane and IT wanted to cut costs.

Fire the team, come back a few years later with a new, cheaper team and new in-house pricing.

Job done

It's good to understand the different price dynamics, and useful to have some rules of thumb to avoid long cost calculations.

For most startups I would actually advise to start with Heroku, which is even more expensive than AWS (it is built on top of AWS). But you save on devops and building the CI/CD pipeline. For a small team it can make a big difference in terms of productivity.

For worker workloads like CI, renting dedicated hardware (like Hetzner) is usually cheaper and produces more reliable results. Spot instances also work but have less reliability due to machines cycling. The main factors for keeping everything under AWS would be egress bandwidth pricing, or workload spikes bigger than 2x.

I am still holding my breath for the tools to mature to the point that people can run their own data centers again with less overhead. My lips are getting a little blue but I see some hopeful signs.

For number 2 especially, there have been some cool projects for efficiency gains when different parts of the organization experience different traffic spikes. Like Netflix, where transcoding spikes precede viewing spikes, and they pulled off a load balancing coup to reduce peak instances.

I think the right thing for a tech company to do is to run their own data center in at least one town where they operate, and use cloud services for geographic distribution and load shedding.

The temptation to reduce your truck numbers below long-term sustainable rates is too great, and so is lock-in. The best thing I think you can do for the long-term health of your company these days is to hold cloud infrastructure at arm's length. Participate, but strongly encourage choosing technologies that could be run in your own server rooms or on another cloud provider. Like Kafka, Kube, and readily available database solutions. Make them avoid the siren song of the proprietary solutions. We have already forgotten the vendor lock-in lessons of the '90s.

A good option can be to use your own data center for base load, and on top of that use AWS for traffic spikes. That way you still have the flexibility to adapt quickly but at a lower cost, once you reach a certain scale.

> use your own data center for base load, and on top of that use AWS for traffic spikes

Much easier to go hybrid on Azure, at least until AWS and vSphere integration is ready for prime time

AWS only provides optionality if you don't get deeply into the AWS services. If you do you find that optionality is absent.

A startup I talked to not too long ago has revenue of around $1M a month - pretty great. Their Amazon bill is around $1M a month - not so good. Their entire architecture is one AWS service strung into another - no optionality at all.

I think you're talking about a different kind of optionality than the parent post. That was talking about option to expand/contract capacity, change infrastructure, or all the usual "elasticity" that they tout.

You're talking about vendor lock-in. Which is a totally valid point and something to be aware of, but basically orthogonal.

> (1) your business scales at a different rate than you planned -- either faster or slower are problems!

> (2) you have traffic spikes, so you have to over-provision. There is then a tradeoff doing it yourself: do you pay for infrastructure you barely ever use, or do you have reliability problems at peak traffic?

> (3) your business plans shift or pivot

Anyone at any real scale has a multi-datacenter setup and AWS is effectively just a very elastic datacenter you can tap into. You could just as easily tap into Google Cloud or Azure. You do not need to operate 90% of your business in AWS to use AWS.

> The corollary of this is, for a well-established business with predictable needs, going in-house will probably be cheaper. But for a growing or changing or inherently unpredictable business, the flexibility AWS sells makes more sense!

It's still cheaper to go in-house with a multi-DC setup than anything but a single-DC setup in AWS with very few nodes.

Architecture for a decent-sized business should be set up with 3 DCs in mind /anyway/. You have two primaries, you scale <insert cloud provider> as needed, and only leave some DB servers there most of the time.

Sure, but isn't the REAL other selling point of AWS aside from an elastic virtual datacenter that it automates away sys admin tasks and ultimately people?

Usually it's deemphasized because you end up replacing sysadmins with new sysadmins and analysts who can understand billing optimization on AWS - or you pay way more for AWS.

Another commenter on this story mentioned $100k Redshift workloads that were optimizable to $2k.

> Sure, but isn't the REAL other selling point of AWS aside from an elastic virtual datacenter that it automates away sys admin tasks and ultimately people?

Yes. AWS shines when you have a small team with no competent DevOps people and a desire to build a single datacenter setup. However, at that scale...we are talking a small business / startup with less than 50 employees. If you are still doing that when you've grown past that point, you've got architectural problems.

Once you scale past that point, you still need a competent sysadmin to build a reproducible process...at which point even leasing hardware from any provider works. You cannot build a multi-DC setup that integrates redundant providers without a competent sysadmin.

Even if you stick with how AWS has you do things it will eat you alive if you do not optimize your workloads. I'm not sure how you would do that effectively without an experienced sysadmin acting as an analyst.

So perhaps the best approach is to use your own servers for the guaranteed minimum load you will see and use more expensive AWS for everything else.

Over-provisioning will still be cheaper than the cloud.

Also, in my opinion the know-how and full control of the complete stack is paramount to maintain a quality service.

In addition, you can do both. Tying yourself to one provider is an unnecessary risk and tying yourself to the cloud is not ideal.

The obviously unanswerable question would be: would Dropbox have been able to succeed as a startup while building their own data centers in-house?

I assume that answer is a no.

I would bet they still burst out to AWS.

For me the logic is more like: get cheaper machines (be it in-house or with cheaper alternatives) that run Kubernetes, for example, and monitor them with Prometheus.

If you run out of capacity, defined by whatever metric you fancy from Prometheus, start EC2 machines for the burst.

Every month, re-evaluate your base needs.
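As a rough sketch of that loop, assuming node_exporter metrics in Prometheus and boto3 for the burst; the Prometheus address, AMI, instance type, and thresholds are all hypothetical placeholders:

```python
PROM_URL = "http://prometheus.internal:9090"  # hypothetical address

def avg_cpu_utilization():
    """Fleet-wide average CPU busy fraction (0..1) from Prometheus."""
    import requests  # assumes a reachable Prometheus with node_exporter data
    resp = requests.get(PROM_URL + "/api/v1/query", params={
        "query": 'avg(1 - rate(node_cpu_seconds_total{mode="idle"}[5m]))'})
    return float(resp.json()["data"]["result"][0]["value"][1])

def burst_instances_needed(utilization, threshold=0.8, step=2):
    """Pure decision rule: above the threshold, ask for `step` extra machines."""
    return step if utilization > threshold else 0

def maybe_burst():
    """Check the metric and start EC2 capacity if the base fleet is saturated."""
    n = burst_instances_needed(avg_cpu_utilization())
    if n:
        import boto3  # requires AWS credentials
        boto3.client("ec2").run_instances(
            ImageId="ami-0123456789abcdef0",  # hypothetical AMI
            InstanceType="c5.xlarge",
            MinCount=n, MaxCount=n)
```

Keeping the decision rule a pure function makes it easy to test and to swap the metric or threshold when you re-evaluate base needs each month.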

Amazon was a phase. They bridged the gap for dot-com-era companies and new thinkers not interested in starting a devops team just to begin.

Now DO, Vultr, and other options exist to fill the gaps more.

AWS was never the solution to content distribution or infrastructure replacement. Just a company smart enough to notice big gaps and not afraid to fill them to get the ball rolling, maintain and progress, then move on.

I really like this comment. Amazon offers a lot of services (with ridiculous names), but you are right that you're paying for the flexibility to pivot as your needs change. Dropbox did well to recognize its needs after a time using AWS.
