Did you try multiple instances on each provider? Performance between VPSes from the same provider varies wildly. If there are 64 VPSes on a server, you will probably get a lot more than 1/64th of the processing power of that server because most of the other VPSes are idle. But how much more than 1/64th you get will vary wildly from machine to machine.

This absolutely should not be true, and in my experience it isn't true on Amazon -- you will neither be starved for resources if others do intensive things, nor will you enjoy riches if they are quiet: You will get what you are paying for.

I have personal experience with Digital Ocean that is a bit different. First, let me say that I think they have a great service and compelling prices. I set up a test server (the 2GB/2CPU variant) to trial it as part of our platform mix, since a solution that spans hosting providers = awesomeness.

The IO performance I got was terrible, despite all of the talk about SSDs. Simple operations would stall, with the CPU endlessly stuck in iowait (the wa figure in top). I submitted a support ticket, they quickly toggled some priority flags, and I started getting performance more along the lines of expectations. Ultimately, though, it seems like a classic case where a single tenant can completely monopolize the platform, enjoying the entirety of the storage platform at the cost of everyone else. I'd rather they cap consumption and do appropriate IO quanta allocations than leave VMs starved.
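For anyone who wants to check whether their own box is stuck the same way, here is a minimal sketch, assuming a Linux guest, that samples /proc/stat twice and reports what share of CPU time went to iowait over the interval. The function names are my own, and it only looks at the first eight counters (user through steal) to avoid double-counting guest time.

```python
import time

def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate cpu line found")

def iowait_percent(before, after):
    """Percentage of elapsed CPU time spent in iowait between two samples."""
    # Counter order: user nice system idle iowait irq softirq steal ...
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total <= 0 or len(deltas) < 5:
        return 0.0
    return 100.0 * deltas[4] / total

def sample_iowait(interval=1.0, path="/proc/stat"):
    """Read the counters twice, `interval` seconds apart (Linux only)."""
    with open(path) as f:
        before = parse_cpu_line(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_line(f.read())
    return iowait_percent(before, after)
```

Calling sample_iowait(5.0) while the workload runs gives a rough number to put in a support ticket; a figure that stays high while your own disk activity is light would be consistent with a noisy neighbor, though it doesn't prove one.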

And it really makes me concerned for the future -- do I have to constantly do benchmarks and analysis, hopping VMs just to find one that isn't an abomination? That isn't how these things are supposed to run.


One thing to consider is that different instances of the same class may be deployed on widely varying hardware. For c1.xlarge instances we've seen very different cache sizes and CPU specs (cat /proc/cpuinfo), and we now always try to get on the machines with a 20 MB cache, since our benchmarks show these to be better for our needs.

I have heard anecdotally that it's not uncommon for large players to bulk-start instances and then kill all of the ones that don't have the latest hardware.

Update: I just looked at 4 random c1.xlarge instances we have running, and found 3 different types of underlying hardware:

1. Intel(R) Xeon(R) CPU E5410 @ 2.33GHz w/ 6144 KB cache

2. Intel(R) Xeon(R) CPU E5506 @ 2.13GHz w/ 4096 KB cache

3. Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz w/ 20480 KB cache

Of these CPUs, the E5410 is from 2007, the E5506 is from 2009 and the E5-2650 is a Sandy Bridge from last year.
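A rough sketch of how that check could be automated, assuming the x86 Linux /proc/cpuinfo layout ("model name" and "cache size : 20480 KB" lines). The function names and the 20480 KB threshold are illustrative, not anything Amazon provides.

```python
def cpu_summary(cpuinfo_text):
    """Return (model name, cache size) for the first processor listed."""
    info = {}
    for line in cpuinfo_text.splitlines():
        if ":" not in line:
            if info:  # a blank line ends the first processor's block
                break
            continue
        key, _, value = line.partition(":")
        info.setdefault(key.strip(), value.strip())
    return info.get("model name"), info.get("cache size")

def has_preferred_cache(cpuinfo_text, min_cache_kb=20480):
    """True if the first processor reports at least min_cache_kb of cache."""
    _, cache = cpu_summary(cpuinfo_text)
    if cache is None:
        return False
    # Assumed format: "<number> KB"
    return int(cache.split()[0]) >= min_cache_kb
```

On a fresh instance you would pass in open("/proc/cpuinfo").read() and decide whether to keep the instance or terminate it and try again.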


I would guess Amazon moves things around more, but most VPS providers, which mostly target the use case of running a VPS 24/7/365, put you on a server and leave you there rather than rebalancing for load (migrating a live guest transparently isn't that easy).

They should still set things up so you get reasonable baseline performance, even in the high-contention case, rather than overselling the resources. But you can end up with quite a bit of performance variance in the upwards "more than your fair share" direction, especially for I/O, if your neighbors are quiet. If you're on a 32-guest machine where everyone else is idling on IRC or doing nothing, you get a whole disk to yourself; if everyone is doing random alternating reads and writes, you get 1/32 of a thrashing disk's worst-case throughput. Usually you get something in between.
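To put rough numbers on that range, here is a tiny sketch. The throughput figures are made up for illustration, not measurements from any provider, and it assumes the host splits a contended disk evenly among guests.

```python
def io_share_bounds(idle_disk_mb_s, thrashing_disk_mb_s, guests):
    """Return (floor, ceiling) throughput in MB/s for one guest.

    floor:   every guest is hammering the disk, so you get an equal
             share of the disk's worst-case (thrashing) throughput.
    ceiling: every other guest is idle, so you get the whole disk.
    """
    floor = thrashing_disk_mb_s / guests
    ceiling = idle_disk_mb_s
    return floor, ceiling

# Hypothetical disk: 400 MB/s when left alone, 32 MB/s when thrashing.
floor, ceiling = io_share_bounds(400.0, 32.0, 32)
print(f"between {floor:.1f} and {ceiling:.1f} MB/s")
```

With those invented figures the gap between best and worst case is a factor of 400, which is why two instances of the same plan can benchmark so differently.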


This absolutely _is_ true, and in my experience it has been on AWS. Search for past benchmarks on HN for examples.

Regardless, we stick to AWS at work for the entire suite.

When it comes to my money, though, DO is making very large strides, imo.


Can you provide any more specific search criteria?

I've found Amazon instances to be quite consistent. They vary, of course, but quite contrary to your initial statement that they vary wildly, I find the variances quite small, and there isn't a need to constantly hunt for ripe instances. I have absolutely found what you said to be sadly true on quite a few other VM hosts.



tl;dr: Netflix kills AWS instances that perform slowly because of resource contention


I have found the $10 DO server to be about 4 times as fast as the $20 Linode on one of our disk heavy workloads. I'm not going to pretend that this is representative, though.


Is his simple benchmark even hitting the disks?
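For comparison, a minimal sketch of a write test that forces data out of the page cache with fsync before stopping the clock. This is my own illustration, not the benchmark being discussed; the function name and sizes are arbitrary, and the target directory has to live on the disk under test (a tmpfs /tmp would defeat the purpose).

```python
import os
import tempfile
import time

def write_throughput_mb_s(directory, size_mb=64, sync=True):
    """Write size_mb MiB to a temp file in `directory`; return MiB/s."""
    chunk = os.urandom(1024 * 1024)  # 1 MiB of incompressible data
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(size_mb):
                f.write(chunk)
            if sync:
                f.flush()
                os.fsync(f.fileno())  # block until the kernel has flushed the file
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return size_mb / elapsed
```

Running it once with sync=True and once with sync=False gives a feel for how much of a "disk" number is really just memory. Reads are harder to test honestly, because a file you just wrote will usually be served straight from the cache.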


