Hacker News
Which is less expensive: Amazon or self-hosted? (gigaom.com)
70 points by oscar-the-horse 2107 days ago | 71 comments

Why does the author assume 1 EC2 extra large instance is equivalent to 1 dedicated server and then base the required dedicated servers on the peak required EC2 instances?

He also neglects to weigh ordinary dedicated-server hosting. Take 131 boxes (based on the alleged equality between VPS and dedicated) at $300/month, hosted by someone else, with 2x or more the CPU, 2x the required bandwidth and all that dedicated disk I/O: that's a full $30,000 per month less than AWS and $20,000 per month less than the calculated self-hosting cost (which includes the amortized cost of buying the servers), and it outsources the physical maintenance of those servers. Not to mention regular old big VPSs at xyzhostingcompany.
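A quick check of that arithmetic, using only the figures in the comment above. The AWS and self-hosting totals are back-calculated from the stated savings rather than quoted from the article, and the $300 is assumed to be a monthly rental price.

```python
# Figures from the comment above; the AWS and self-hosting totals are
# back-calculated from the stated monthly savings, not quoted from the article.
BOXES = 131
RENT_PER_BOX = 300               # USD per month per rented dedicated box (assumed monthly)
SAVINGS_VS_AWS = 30_000          # USD per month
SAVINGS_VS_SELF_HOSTED = 20_000  # USD per month

rented_total = BOXES * RENT_PER_BOX
implied_aws = rented_total + SAVINGS_VS_AWS
implied_self_hosted = rented_total + SAVINGS_VS_SELF_HOSTED

print(f"rented dedicated: ${rented_total:,}/month")
print(f"implied AWS:      ${implied_aws:,}/month")
print(f"implied self-hosted: ${implied_self_hosted:,}/month")
```

That works out to about $39k, $69k and $59k a month, which lines up with the "$60k vs $70k" figures mentioned further down the thread.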

After all that, it reaches the very predictable conclusion that AWS is worth it if your requirements are variable.

I don't understand why so many people are fixated on Amazon AWS. Someone please "explain" this to me.

It's overpriced and underpowered. Linode, Rackspace, and many other VPS providers perform better and are a much better value.

To me where it makes sense to go with a dedicated or self-hosted solution is when you start needing servers with lots of RAM, because all of the VPS providers will gouge you when you need RAM. They will charge much more per month for the server than the cost of the extra RAM chips and CPU and you will have paid for the server within a couple of months.

I think that VPS providers will have to start lowering the prices for their higher RAM instances pretty soon because RAM prices have gone down so far.

> I think that VPS providers will have to start lowering the prices for their higher RAM instances pretty soon because RAM prices have gone down so far.

Pricing of VPS and dedicated servers scales with RAM but is entirely unrelated to the cost of RAM chips. Hosting companies use RAM as a proxy for their real costs -- power/cooling, hardware wear/replacement, bandwidth and support. There's a strong correlation between the amount of RAM a customer purchases and the hardware utilization of that customer, that's why the industry has converged on that component as the main factor in pricing.

Thinking that the $25/GB/mo these companies charge for RAM has anything to do with the cost of physical RAM (which would be paid off in the first month) is a mistake.

I'm as frustrated as anyone with the difficulty of finding affordable high-RAM instances/servers without colocating, but complaining to the host that their pricing should change because of the price of physical RAM won't get us anywhere. That's not how they set their prices.

This is probably also why EC2's High Memory instance types don't actually overcharge you for RAM.

Consider a double extra large high memory EC2 instance: it costs you $6093/year for 4 cores and 34GB RAM.

On SoftLayer, a dedicated quad-core Xeon with 32GB RAM costs about $12,600/year. I imagine the CPU speed and disk I/O on SoftLayer are better than on the EC2 instance, but for many workloads (read: memcached/redis/giant in-memory calculations) that doesn't matter so much.
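Spreading each annual price over the RAM alone gives a rough per-GB-month figure for the two offers quoted above. This is only a sketch: it deliberately ignores CPU, disk and bandwidth (exactly the simplification the RAM-as-proxy comment warns about), and the prices are the ones quoted in this thread, not current ones.

```python
# Annual prices quoted in the thread: EC2 high-memory double extra large
# (m2.2xlarge, on-demand) vs. a SoftLayer dedicated quad-core Xeon.
OFFERS = {
    "ec2_m2_2xlarge": {"usd_per_year": 6093, "ram_gb": 34},
    "softlayer_quad_xeon": {"usd_per_year": 12600, "ram_gb": 32},
}

def usd_per_gb_month(offer):
    """Attribute the whole annual price to RAM and express it per GB per month."""
    return offer["usd_per_year"] / offer["ram_gb"] / 12

for name, offer in OFFERS.items():
    print(f"{name}: ${usd_per_gb_month(offer):.2f} per GB-month")
```

With these inputs it comes to roughly $15 vs $33 per GB-month, a bit over twice as much on SoftLayer for the RAM-bound workloads mentioned above.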

Their proxy is also a market distortion. By not pricing RAM at closer to its marginal cost, they encourage people to burn CPU instead of RAM in algorithm design, which in turn increases power usage and creates cooling problems.

You seem to be saying "In my imagination, their empirical observations are wrong." What you're saying could theoretically happen, but that doesn't mean it actually happens in real life.

You seem to be replying to a comment that exists in your imagination.

When targeting an environment where CPU usage is cheaper than a 30GB hash table, I'll choose the CPU usage. It's very simple. I am not actually commenting on anyone's empirical observations, theoretical, imagined, or otherwise.

Right, it's a proxy for all of their costs, but when the pricing and hardware stay the same while hardware and bandwidth costs decline, the value you get is declining.

Hardware and bandwidth aren't as much of a cost factor as datacenter power / cooling.

When systems can support more memory (and adequate cpu to utilize it) per watt, then pricing should decrease.

> "I don't understand why so many people are fixated on Amazon AWS. Someone please "explain" this to me."

I believe people feel more in control of the buying process if they can make autonomous decisions and pay right away. No sales chat, no nothing.

Whether the service is overpriced or underpowered is not a problem as long as the buyer doesn't recognize it as such.

Only when buyers get burnt badly do they start to consider other factors, like: truly ad-hoc configurations, support (if they aren't skilled enough at sysadmin tasks), the legislation under which the data resides (being Italian and hosting on AWS, I need to take this into account too), and also price.

1) Redundancy and backups are someone else's problem (Linode charges you extra for that... although you'd be crazy to rely on Amazon as your only backup, it does cut down on the tedium; if you're a small startup, you probably want to be writing code, not playing sysadmin).

2) Scaling is someone else's problem. For static objects, S3 can handle traffic spikes orders of magnitude beyond what would bring your Linode to its knees (and there's their CDN option on top of that). Spinning up a new server on AWS (or a hundred servers, or a thousand) to handle a sudden burst of traffic takes a minute or less, and you can turn them off in an hour if the traffic dies down, paying only for that hour. Spinning up new servers on Linode may or may not be possible at any given time, and you have to pay for them for a whole month.

Running your own servers or a VPS is cheaper for predictable, steady loads; AWS can be much cheaper and more reliable for unpredictable loads.
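A toy model of that claim: dedicated capacity is paid monthly and has to be sized for the peak, while cloud capacity is billed per instance-hour. Both prices and both load profiles are hypothetical, picked only to show the shape of the trade-off; plug in real quotes before drawing conclusions.

```python
HOURS_PER_MONTH = 730
DEDICATED_USD_PER_MONTH = 300.0  # hypothetical price per dedicated box
CLOUD_USD_PER_HOUR = 0.50        # hypothetical price per comparable cloud instance

def dedicated_cost(hourly_servers):
    # Dedicated capacity is paid monthly and must be sized for the peak hour.
    return max(hourly_servers) * DEDICATED_USD_PER_MONTH

def cloud_cost(hourly_servers):
    # Hourly billing: pay only for the instance-hours actually used.
    return sum(hourly_servers) * CLOUD_USD_PER_HOUR

steady = [10] * HOURS_PER_MONTH                      # flat 10-server load all month
spiky = [2] * (HOURS_PER_MONTH - 10) + [20] * 10     # quiet month, one 10-hour spike

for name, profile in (("steady", steady), ("spiky", spiky)):
    print(name, dedicated_cost(profile), cloud_cost(profile))
```

With these made-up numbers the steady profile costs $3,000 dedicated vs $3,650 in the cloud, while the spiky one costs $6,000 dedicated vs $820 in the cloud.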

It's worth having AWS in your toolkit in case you get that front-page post on HN, TechCrunch, Slashdot....

Having a cloud provider in your toolkit to handle traffic peaks, sure. That's very different from handling your base load with AWS, which a lot of people seem to end up doing.

Handling your base load with AWS is ludicrously expensive compared to the alternatives.

And the upside of having your system set up to use a cloud provider with instant instance spin-up is that it widens the cost gap even further:

If you go dedicated-only, the capacity you pay for on an ongoing basis has to absorb reasonable spikes. If you can spin up AWS or other cloud instances as needed, you can run your dedicated boxes far closer to max utilization than you otherwise could.
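The same idea in numbers: size a dedicated-only fleet for peak plus headroom, or run a smaller dedicated baseline near full utilization and send the overflow to hourly cloud instances. The 50% headroom, the prices and the load profile are all hypothetical.

```python
import math

DEDICATED_USD_PER_MONTH = 300.0  # hypothetical, per box
CLOUD_USD_PER_HOUR = 0.50        # hypothetical, per comparable cloud instance

def dedicated_only_cost(hourly_load, headroom=0.5):
    # Without overflow capacity, the fleet must cover peak load plus a safety margin.
    boxes = math.ceil(max(hourly_load) * (1 + headroom))
    return boxes * DEDICATED_USD_PER_MONTH

def hybrid_cost(hourly_load, baseline_boxes):
    # Dedicated boxes run close to full; any load above them goes to cloud instances.
    overflow_instance_hours = sum(
        math.ceil(max(0, load - baseline_boxes)) for load in hourly_load
    )
    return (baseline_boxes * DEDICATED_USD_PER_MONTH
            + overflow_instance_hours * CLOUD_USD_PER_HOUR)

# Load in "boxes' worth of work": 6 most of the month, 9 during a 30-hour burst.
load = [6] * 700 + [9] * 30
print(dedicated_only_cost(load))
print(hybrid_cost(load, baseline_boxes=7))
```

Here the dedicated-only fleet needs 14 boxes ($4,200/month), while a 7-box baseline (about 86% utilized at the normal load of 6) plus 60 cloud instance-hours comes to $2,130.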

> Handling your base load with AWS is ludicrously expensive compared to the alternatives.

Yes but the cost is tiny compared to paying some engineers to do all the hard work replicating what AWS offers on top of some alternative like Linode or whatever.

If your hosting costs are a huge part of your costs, you're doing it wrong (or are very, very successful, at Google/Facebook scale).

AWS is more expensive for long-running instances (although not so much if you reserve them).

However where it (and Rackspace + others) shine is using the API to spin up instances for jobs in scripts, or have your monitoring automatically spin up additional instances to scale up when load increases.

Apples and oranges here. Linode kind of sucks: their performance is good, but their pricing is whack and they're not really elastic; a 1/2-month credit when you power down an instance is pretty lame. I won't do business with those guys after their behavior at HostingCon this year.

As to Rackspace, I don't trust any tech company that has a 7:1 sales-rep-to-engineer ratio. They are overpriced in all of their services, especially their managed dedicated servers. (Seriously, why does it take a week to have a server installed? It takes a week because they literally have a guy go out and image servers by hand. Do you really want to wager your company's future on that sort of incompetence? I don't.)

What did Linode do at hostingcon?

I've been using them for a couple of years on small- to mid- sized projects and am very happy. My needs aren't elastic at all, though.

It's hearsay, but a friend of mine who was there last year told me a pretty frightening story about how one of their engineers harassed and stalked my friend's co-worker for the duration of the conference. (Word to the wise: women should never put their cell phone number on the business cards they hand out at conferences filled with drunk, horny nerds.)

If Linode's pricing is whack, could you please suggest a similar but cheaper alternative that provides at least as good a service?


I think it's pretty well known that for most use cases, cloud hosting is more expensive than dedicated hardware.

That said, we're currently moving from a dedicated server to AWS after we had a bit of nasty downtime. We have dedicated servers with 1and1, and the RAID in our server died and striped bad data all over one of the disks, trashing half the files with junk. 1and1 tech support refused to acknowledge the problem (and claimed we had a software RAID setup…), and it took us a few days to get back online from our weekly backups.

What I'm hoping for from Amazon as a cloud provider is better failure handling. With 1and1, a failed machine means a few days getting a new one, or paying double for a hot spare. With Amazon, even if instances die more often, killing a dead one and spinning up a new one is trivial and can even be automated. Backups can also be made much more often, non-intrusively, using snapshots.

For reliability's sake I also like the idea of having a few small instances behind Elastic Load Balancer instead of one beefy machine. I haven't seen anything like ELB with a dedicated hosting provider (aside from using an actual load balancer, which is a very expensive proposition).

Of course, not having to plan your capacity so far in advance and being able to start small and scale out at the drop of a hat if something on your server goes viral is a really exciting proposition as well.

What about the amount of dev time and sysadmin needed to fully use each option?

If you want to take advantage of AWS for spiky use, you need to automate the heck out of starting and stopping instances, redistributing requests to new machines, etc.

Horror stories about EBS make me think that you'd better reconsider storage if you're hosting everything with AWS too.

Of course, the flip side is that with a totally self-hosted system, you'll probably need more sysadmin work, and you may end up spending money on things like remote hands when a drive fails or a network card dies.

Then there's managed hosting. You don't have the super awesome scaling magic of AWS, but you don't have to deal with the physical bits much either. And you still get real physical hardware attached directly to each system when it comes to storage.

I think that really understanding the costs is a lot more complicated than this article suggests.

> If you want to take advantage of AWS for spiky use, you need to automate the heck out of starting and stopping instances, redistributing requests to new machines, etc.

The thing is, this is not difficult to do with AWS:

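    # Launch a web server; ec2-run-instances prints an "INSTANCE i-..." line.
    $ INSTANCE_ID=$(ec2-run-instances $AMI_ID --availability-zone $ZONE \
        --user-data-file spin-up-a-new-webserver.sh | awk '/^INSTANCE/ {print $2}')
    # Add the new instance to the load balancer.
    $ elb-register-instances-with-lb $LOAD_BALANCER --instances $INSTANCE_ID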
(In real life, use Boto (Python) or equivalent in your language.)

Once the new instance comes online and is legitimately serving up pages, the load balancer will begin redirecting requests to it.

This is why we use Amazon: handling stuff like this is just a matter of calling their utilities.
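For reference, roughly the same two steps with boto3 (the current successor to the Boto library mentioned above), using the Classic Load Balancer API. This is a sketch: the function name and parameters are hypothetical, it assumes AWS credentials and a default region are already configured, and it leaves out instance type, security groups and error handling. The clients can be injected, which also makes it testable without AWS.

```python
def launch_and_register(ami_id, zone, user_data_path, load_balancer,
                        ec2=None, elb=None):
    """Launch one web server and register it with a Classic ELB (hypothetical helper)."""
    if ec2 is None or elb is None:
        import boto3  # imported lazily so injected clients work without boto3
        ec2 = ec2 or boto3.client("ec2")
        elb = elb or boto3.client("elb")  # "elb" = Classic Load Balancer API

    with open(user_data_path) as f:
        user_data = f.read()  # boto3 base64-encodes UserData for run_instances

    # Equivalent of ec2-run-instances with --availability-zone / --user-data-file.
    resp = ec2.run_instances(
        ImageId=ami_id,
        MinCount=1,
        MaxCount=1,
        Placement={"AvailabilityZone": zone},
        UserData=user_data,
    )
    instance_id = resp["Instances"][0]["InstanceId"]

    # Block until EC2 reports the instance as running.
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])

    # Equivalent of elb-register-instances-with-lb; the ELB health check still
    # decides when requests actually start flowing to the new instance.
    elb.register_instances_with_load_balancer(
        LoadBalancerName=load_balancer,
        Instances=[{"InstanceId": instance_id}],
    )
    return instance_id
```

Waiting for the `instance_running` state is a design choice; the load balancer's health check is what ultimately gates traffic, as described above.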

I recently bought a server (Xeon 5606, 24GB RAM, 2TB storage with hardware RAID) for $2k, and I'm hosting it in a colo for $75/month. I find this ideal for small services: it gives you a lot of performance, letting you go a long way with a couple of machines before you need to scale. Adding more CPU/RAM/storage won't add to your monthly payment, since you own the machine. Of course, it's not as convenient as AWS, and if a machine goes down you are responsible for getting it up again.

Scaling in AWS is a piece of cake and gives you a lot of flexibility, but when it comes to performance, especially RDBMS performance, I find AWS far behind. Some will say you can scale the service out to get good RDBMS performance, but that won't get around the slow per-instance disk I/O.

If your network card fails or your CPU overheats, how long are you down for?

If you care whether your website stays up, there are other costs to a colo. You should have a spare for every component, which nearly doubles your hardware cost, and have someone on call for sysadmin work and hands-on fixes.
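To put rough numbers on the colo option from this exchange: straight-line amortization of the $2k server plus the $75/month colo fee, optionally doubling the hardware to cover spares. The 36-month lifetime is an assumption, and the on-call labor mentioned above is not included.

```python
def monthly_colo_cost(hardware_usd, colo_usd_per_month, months, spares_factor=1.0):
    """Straight-line amortization of the hardware (times a spares factor) plus colo rent.

    spares_factor=2.0 models "a spare for every component"; labor is not included.
    """
    return hardware_usd * spares_factor / months + colo_usd_per_month

# Figures from the thread: $2k server, $75/month colo. The 36-month life is assumed.
print(round(monthly_colo_cost(2000, 75, 36), 2))
print(round(monthly_colo_cost(2000, 75, 36, spares_factor=2.0), 2))
```

Over 36 months that is about $131/month without spares and about $186/month with a full set of spares.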

You can fall back to AWS in those cases.

If I were you, I'd just rent a server from ovh.co.uk / kimsufi.co.uk / hetzner.de/en and use a CDN. I'm assuming from your performance requirements that this is not a local gaming service. The advantage is that you can upgrade to new hardware every year, and the hardware is not your problem.

OVH is building one of the most advanced networks, with over 1 Tbps of capacity. Their 10 Gbps servers can actually push 10 Gbps, which just gets me really excited.

I want the machine to be located in the U.S. because of latency.

That a CDN can't solve?

Competitive providers exist in the US. I just don't think any of them is as advanced as OVH in terms of price efficiency, so I'm no expert on them.

CDN? I'm not looking for a CDN; I want to host a machine, not content.

Do you mind e-mailing me where you are hosting? I am in the market for a box like that myself.


Our new startup is all AWS, and honestly I don't think we could have pulled it off any other way. Key factors:

* Getting into a data centre is costly and difficult without venture funding

* When stuff breaks, I need someone to go fix it... just too expensive and time-consuming

* I want predictable expenses because we don't have a lot of money; not having to pay for repairs and being able to create new servers easily myself gives me this

I can see how we may need to move away from AWS down the road to reduce costs, but honestly I'm not convinced it's going to be the difference between success and failure.

Given what AWS provides in the short term, unless you're talking expenses of $40K/month, I wouldn't even waste your time with self-hosting. System administration (hardware) is very expensive...

Edit: I should also note that we need lots of geographical locations, so we're a little different in that regard. AWS again gives us an easy means of being in 7 locations without opening any additional accounts.

I've said this a few times before: the main benefit of cloud hosting is hourly billing. Unless you are using this feature extensively, you will almost always be better off in a dedicated/colo environment. You will get more power/bandwidth for less money.

That being said, there are some edge cases. If you make extreme use of Amazon's additional features (SDB, EIP, etc.), then the Amazon "bundle" could be the better deal for you.

I'm only writing this because I see so many HN'ers say "Well, I can spin up an extra large instance at any moment to handle extra traffic!" The fact is, most of you don't do that. And even if you do, you would likely save money just being on dedicated anyway. Cloud hosting is really for "overflow" and nothing more.

Again: hourly billing is cloud hosting's biggest advantage.

There are other significant benefits to cloud hosting. Specifically on AWS:

* Storage flexibility with EBS and S3

* instant migration abilities between instance sizes

* backup and restore

* provisioning of resources when needed

* staging, dev, test resources

SDB and ELB usage certainly aren't edge cases, they are core to many many deployments.

The provisioning flexibility alone is a tremendous benefit considering that most highly trafficked Web properties are not static environments. They constantly innovate and have demands for infrastructure that can really be challenging in a dedicated environment.

In our case, we use AWS extensively, but where we have been able to carve out a "static" set of resources, we do and host with 100TB.com - simply because they offer so much bandwidth for cheap and those static resources pump out a lot of bandwidth which isn't cost effective on AWS for day to day operations.

Edit: from a financial standpoint, if you choose to purchase reserved instances on AWS, hosting costs come far closer to those of dedicated environments.

Agreed. If you use the extra features of Amazon, like you mentioned, then it can be worth the cost.
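A minimal sketch of the reserved-instance break-even, assuming a pricing model with an upfront fee plus a lower hourly rate. The prices below are hypothetical, not AWS's actual rates.

```python
HOURS_PER_YEAR = 8760

def break_even_hours(upfront_usd, reserved_hourly, on_demand_hourly):
    """Hours of use after which a reserved instance is cheaper than on-demand."""
    if on_demand_hourly <= reserved_hourly:
        raise ValueError("reserved hourly rate must be below the on-demand rate")
    # Each hour of use saves (on_demand - reserved); the upfront fee is paid back
    # once those savings add up to it.
    return upfront_usd / (on_demand_hourly - reserved_hourly)

# Hypothetical prices for a one-year reservation.
hours = break_even_hours(upfront_usd=1000, reserved_hourly=0.125, on_demand_hourly=0.375)
print(hours, f"{hours / HOURS_PER_YEAR:.0%} of a one-year term")
```

With these numbers the reservation pays off after 4,000 hours, about 46% of a one-year term; an instance that runs 24/7 clears that easily.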

I was strictly speaking on a financial basis. Hourly billing only pays off if you use it to constantly scale. If you are running your servers 24/7 then chances are you are better off on dedicated for some portion of your infrastructure.

* This doesn't apply to tiny sites that can get by running on a tiny $20/month Linode. However, once you are paying $100/mo, every month, it holds much more strongly.

RDS also saves a lot of time if you're using MySQL with replication and snapshotting. It makes a nontrivial amount of work trivial.

At the risk of being an obnoxious pedant, "nontrivial amount of work" is, to me, a non sequitur. Trivial describes the complexity of work, not the amount. Work is either well understood (trivial) or not well understood (non-trivial). In either case the work effort itself can be variable.

I can't comment on the accuracy of your comment otherwise. Those who can, write content. Those who can't, nitpick semantics :)

If you want to be a hardline language pedant about it, "trivial" describes neither an amount nor complexity, but commonness.

But either way, yeah, you're being obnoxious because idiomatically "trivial" when applied to work fits fine and is easily understood for all of these scenarios (complexity, amount, and uncommon vs common).

I certainly wouldn't consider collecting all the trash in the LA metro area "trivial work" by the well-understood meaning of the phrase even though it is commonly done and it is easily broken down into non-complex steps.

Prescriptivist vs descriptivist, and entirely off-topic.

This is only the case if you're utilizing all your hardware. A huge advantage of cloud hosting is that you can buy a slice of a machine. I run all my new experimental stuff on $20-per-month Linode instances, and once they get big enough that they need more hardware, they can graduate to whatever. You probably won't need 8 cores and 32GB of RAM and a half TB of RAID-5'd 10k RPM SCSI hard drives for a long time, but you're still paying for it.

At large scale, you're probably better off with a baseline dedicated + cloud for overflow, absolutely.

I forgot this point. It's absolutely correct: you cannot beat the price point of a VPS if your stuff can run on a $20 plan. However, once you get over that hump, say $100/month, it becomes obvious that dedicated pays off in spades.

In other words, go ahead and spin up that extra instance for extra traffic- but put your baseline in dedicated/co-location.

Also, no contracts. Dedicated hardware usually comes with commitments.

It hasn't been this way for many, many years. None of the major hosts require you to sign contracts (The Planet, SoftLayer, etc.).

Rackspace does.

> Because labor is a mostly fixed cost for each alternative, it will tend not to impact the relative comparison of the two alternatives

I don't buy this. In my experience, the ops costs of a co-location facility are much higher than those of AWS. The ops cost function doesn't seem to be linear the way he describes, either.

People costs dominate early on and are a huge factor until you start to reach steady state and that's the variable that you need to optimize for.

Having said that, AWS is expensive. If dollars are worth more than hours to you, then yes, by all means host things yourself.

This is looking at one aspect of hosting: your hosting bill at the end of the month.

If you are taking advantage of the various services AWS offers, you can save development time. For example, if I can get up and running quickly using RDS and save a bunch of time compared to setting up replicated mysql, maintaining backups, etc, I'll gladly pay the extra cost.

The same goes for load balancers, memcached, etc. Sure, I can save money in the long run once I've established my app. Initially, I don't want to waste a bunch of time bringing these services up on my own.

I run SSD Nodes, Inc. (http://www.ssdnodes.com) and we have various business clients using our services for their peak offloading while maintaining their in-house infrastructure (I can't be more specific than that because of our privacy policies). I would recommend doing both, mainly because scaling is super easy.

Holy crap, you guys support FreeBSD, too. I've been looking for a good alternative to Linode for a while.

Do you guys have automatic provisioning/pro-rated billing? Do you do any shady stuff like requiring cancellations two months in advance?

We're using XenServer, so auto-provisioning is still on our plate since we have to build all of that in-house (solutions like SolusVM don't work with our unique infrastructure). For the most part getting a new account up and running takes 15-30 minutes, and you can request an OS reload at any time through a support ticket.

We're very upfront with our cancellation policies, which is 24 hours from when your bill is due (I can't imagine companies requiring a month or two in advance, that's absurd). Our reasoning is that if the service is easy and painless to cancel, people will be more than willing to order again.

Cool, thanks for the info. I will be checking you guys out.

Tilaa.nl is one company that does the shady two-month cancellation stuff. They suck.

It seems non-Linode VPS providers either have shady cancellation/retention policies OR lack auto-provisioning/pro-rated billing.

Sorry, I overlooked that last bit: we do offer pro-rated billing. The way it works is that when a new client orders, that becomes their monthly billing date. Any upgrades are pro-rated for the remaining days in the billing cycle. If you order another package as an existing client on a date other than your monthly billing date, open a ticket with the billing department and we'll pro-rate it for the remaining days in the cycle. Some clients like having different billing dates for their services; others prefer them all on the same day each month. We're flexible and will accommodate your preferences.

Cool, thanks for the replies.

Not to rain on your parade, but where's the advantage over getting a regular dedicated box? Your billing is monthly and your pricing doesn't look exactly competitive...

Dedicated boxes with enterprise Sandy Bridge processors (Intel E3-1270s), enterprise SLC SATA III SSDs, and segregated gigabit public (12 network providers, 8x Tier 1) and private networks (with dedicated SSL/PPTP VPN) cost considerably more than $8.99 per month :)

Our pricing model is based around scaling horizontally rather than vertically. Every project has different requirements, and that's why I'll never say that our solution will fit everyone's needs.

Well, since this is a thread about Amazon clouding their prices, I'll call you out on that.

It makes little sense to compare your $8.99 package to anything, but you surely know that. What is the use-case for a box with 128MB RAM in 2012? Even the cheapest VPS providers (ThrustVPS et al) start at 512MB for $5.95...

Since your packages are sliced so small, a meaningful comparison seems to start only at your "high-end" (graviton).

For $180 USD you offer 3GB Ram, 6 cores on a shared box, 8GB SSD + 400GB S-ATA disk.

I'm sorry, but for the same amount leaseweb, hetzner, OVH will happily rent me a full, unshared E5540 (passmark 9600 instead of your ~3500), 16+ GB RAM and SSD.

I don't mind constructive feedback, it helps us improve our services. I think we're both comparing two different things here. I mentioned in the previous post that our pricing model is scaling horizontally, rather than vertically.

I've seen many different uses for the 128MB node, either as backup servers, DNS servers, small web servers (nginx works amazingly well), private BSD development boxes, small-scale proxy servers, personal VPN servers, etc. I've also seen the 1GB of SSD storage used as cache for ZFS, which results in extremely low latency reads and writes. The 128MB node was initially offered because of demand and is actually one of our higher selling services.

Unfortunately, comparing the numbers from two different company offerings doesn't give the full picture on what either of them offer. There's a lot more that's included in our pricing, and you end up paying a bit more for new, under-utilized physical servers where you can actually use all of the resources provided with great disk I/O. There are a lot of providers out there who try to cram as many people as possible on a single box, and we're simply not one of them.

I appreciate you taking the time to write out your thoughts. If you have any more concerns feel free to put them here and I will try to address them the best I can. If you would prefer to discuss them in private, email me matt[at]ssdnodes.com

I noticed a few things in the calculation that bias it towards self-hosting:

* Using us-west, the more expensive option

* Not including labor, which is significantly higher if you have to rack and stack yourself.

That being said, I agree with the conclusions. If you're traffic isn't spiky or variable, then you might be better off self hosting.

"You're" instead of "your"? Really?

You're comment does not contribute to the thread at all. Really.

I don't give a shit.

I tend to agree with this analysis. I work in a datacenter and I noticed that people transition from shared cloud to self-hosting (or dedicated cloud) if they have nil or few traffic spikes.

Those who are already self hosted use AWS to absorb traffic/computation spikes.

As someone who has never written an application to be run in a cloud environment, I've been wondering for a while now: what sort of extra complexity is introduced in your code to handle the unreliability of any given server, the network, and latency in general? What techniques are used to deal with it?

I'm approaching this more from the viewpoint of a webapp that might only need ~10 dedicated servers to handle peak load with redundancy rather than ~150, so our colo solution doesn't have servers constantly breaking on us.

Simple example: for one of my projects, I needed to host 16+TB of satellite imagery and elevation data. I put together a machine with 24GB RAM, RAID 5, and 8 cores for around $3,800. I pay $200 a month for unlimited bandwidth (and I still have 2U of space that I'm not using). I got pricing from Rackspace, Amazon, GoGrid, etc. Nobody came remotely close to that price.

For most of my other projects, I always use Amazon. But for this use case, no other cloud/hybrid cloud service could compete.

I compared Amazon to rented servers (cancellable monthly) a year ago, and even if you have spiky traffic, you need to dig deep into the costs to find out whether cloud is cheaper for you.

It wasn't for me.


The difference between $60k and $70k isn't all that significant -- particularly if you find that AMZN saves you labor.

If you have a lot of storage needs then self-hosted is the way to go. We have 30TB storage mirrored across two data centers and pay less than $500 a month. There is a copy on the east coast, one on the west coast, and one in the office.

I wonder how many nines of reliability that gives us?
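A back-of-the-envelope answer, assuming each copy is independently available 99% of the time (a hypothetical figure): all three copies have to be down at once for the data to be unreachable.

```python
import math

def unavailability(per_copy_availability, copies):
    # All copies must be down at the same time; assumes independent failures.
    return (1 - per_copy_availability) ** copies

def nines(unavail):
    # "Number of nines": 0.001 unavailable -> 99.9% available -> 3 nines.
    return -math.log10(unavail)

for copies in (1, 2, 3):
    u = unavailability(0.99, copies)
    print(copies, f"{1 - u:.6%}", round(nines(u), 2))
```

That gives 2, 4 and 6 nines for one, two and three copies. Treat it as an upper bound: independence is optimistic, and failures like the RAID corruption described earlier in the thread can be mirrored faithfully to every copy.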

Why would it be so simple to use AWS compared to a hosted server like with Hetzner? On EC2 you still have to administrate a Linux OS and install patches, take care of incompatibilities, etc.

It depends on what you use them for. I use EC2 as a testing platform to experiment and gauge performance. From that viewpoint, it's cheaper and easier than a normal host.

To be honest, I'm not much of an ops guy. So, I'm going with Hetzner dedicated servers. AWS is more difficult for me opswise than dedicated, and with dedicated I get all of the "let the people who know how to run data centers host your servers" service that EC2 gives you.

AWS and traditional architectures are too much ops load, require too much specialized knowledge and have too many single points of failure. Plus, if Amazon has terms you don't like (I personally won't do business with them given their treatment of WikiLeaks), your reliance on their protocols and services makes it non-trivial to migrate elsewhere... and if you just use EC2 as bare machines, then there's no advantage to AWS over any other bare-machine host (and a big cost disadvantage).

I'm building a cluster of distributed, three-times-replicated data on top of Riak. Every node is a web server, DNS server and database node. Round-robin DNS distributes the load, and if a node goes down, I don't even have to get out of bed... it can wait till morning, because nothing should break. (Of course, this is what I'm building; I can't say it has performed in production yet, so this is theoretical...)

I call this Nirvana.
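One common way to place three-times-replicated data so that any single node can die is a consistent-hashing ring, which is the general idea Riak builds on. The sketch below is a simplified illustration of that technique, not Riak's implementation: each key goes to the first three distinct nodes clockwise from its hash, and the class and node names are hypothetical.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # SHA-1 gives a 160-bit position on the ring.
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes_per_node=16):
        # Each physical node claims several ring points ("virtual nodes") for balance.
        self._points = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._positions = [pos for pos, _ in self._points]
        self._nodes = set(nodes)

    def preference_list(self, key, n=3):
        """First n distinct nodes clockwise from the key's position: its replicas."""
        if n > len(self._nodes):
            raise ValueError("not enough nodes for the requested replica count")
        start = bisect.bisect(self._positions, _hash(key))
        replicas = []
        for offset in range(len(self._points)):
            node = self._points[(start + offset) % len(self._points)][1]
            if node not in replicas:
                replicas.append(node)
                if len(replicas) == n:
                    break
        return replicas

ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"])
print(ring.preference_list("user:42"))
```

Because a key's replicas depend only on the ring order, removing a node that isn't in a key's preference list leaves that key's replicas unchanged; only keys that had a replica on the removed node move.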

The only single point of failure (SPOF) I should have is the whole datacenter going offline. This is a legitimate risk, and once I'm large enough to handle it, I'll upgrade to Riak Enterprise and host in 2 data centers.

I'm not certain I'm not missing anything, but I don't understand why I'm seemingly cutting new ground here-- this seems like the way everything should run. (and if you agree and are interested in Nirvana, follow me on twitter, I'll be open sourcing it as soon as I possibly can.)

Really depends on what you're doing and what you can afford.

If you're Foursquare and you can afford to double the cost of your infrastructure because you want some of the benefits of the AWS platform (and there are plenty of benefits these days) - then it's tremendous to say the least. Amazon is doing really incredible things with AWS.

If you're doing less than a million uniques per day, you can get three tremendous machines and lasso them together as a web server + main DB + slave DB for between $800 and $1250 a month. You get 100TB of bandwidth on a 1Gbps port (Amazon includes none), dual Intel 5645 processors (or 2x16-core AMD), 48GB of RAM on the DB machines (96GB if you want to pay another $200/month), and 15k SAS drives in RAID 10. The equivalent setup on Amazon would cost you $5k to $10k depending on how you configure it. You can get this setup from reputable hosts like WebNX and SecuredServers; if you want to pay more for a better host and get a little less, you can go with Rackspace, Gigenet or SoftLayer.
