I call these "weekend benchmarks" -- what you'd typically do when you have a block of free time, then spend time optimizing for said benchmark. Roll in on Monday with some staggering results, only to find one (or many) of your variables were off.
Did the author try multiple instances on each provider? VM tenancy is a bitch. (Think of how annoyed you get at the noise levels when your neighbor in the next apartment throws a party.)
Is the author's source benchmarking machine a physical machine or a virtualized guest? Does it have power saving turned off, so that the process runs at 100% speed instead of on a variably down-clocked core?
Did the author enable or disable TCP TIME_WAIT recycling, so he doesn't bump into the ephemeral-port ceiling when running such tests back to back?
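(On Linux those are sysctl knobs -- a minimal sketch, assuming a stock-ish kernel; note that tcp_tw_recycle is known to break clients behind NAT:)

    # check the current settings and the ephemeral port range
    sysctl net.ipv4.tcp_tw_reuse net.ipv4.tcp_tw_recycle
    cat /proc/sys/net/ipv4/ip_local_port_range

    # allow reuse of TIME_WAIT sockets for new outbound connections
    sysctl -w net.ipv4.tcp_tw_reuse=1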
Did the author run the tests back to back, or leave a cool-down period between tests?
Where was the author's network upstream located when he tried said tests? Were there any network issues at the time of the test? Would the author even be aware of them?
The page being tested makes the same database call every time, which is presumably cached. Did he throw out the cached results? Can he even identify which results were cached?
Are we firing up ApacheBench with HTTP keepalives enabled? With parallel connections? How many parallel connections?
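To make that concrete, these two runs measure very different things (the URL is a placeholder):

    # 10,000 requests, 100 in parallel, new TCP connection per request
    ab -n 10000 -c 100 http://test.example.com/page.php

    # same load with HTTP keep-alive (-k), reusing connections
    ab -n 10000 -c 100 -k http://test.example.com/page.php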
How many Apache servers (StartServers, MinSpareServers, etc)? Which httpd modules were enabled? Which httpd modules were disabled? Which PHP modules were enabled?
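(Those are prefork-MPM directives in httpd.conf, for reference; the values here are illustrative, not a recommendation:)

    <IfModule prefork.c>
    StartServers          8
    MinSpareServers       5
    MaxSpareServers      20
    MaxClients          150
    MaxRequestsPerChild 4000
    </IfModule>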
You're trying to benchmark CPU and I/O horsepower across three different platforms but doing it through this narrow "straw" which consists of your "independent server", your upstream, your upstream's upstream, your upstream's upstream's peering connection with Amazon/Linode/DigitalOcean, your web server and its PHP module, your application, and MySQL.
If you're rolling your eyes at this, then you shouldn't be doing weekend benchmarks.
I'll leave you with this as well:
I'm particularly fond of the quote "lead, follow, or get the hell out of the way", which is a bit harsh in this case because a lot of your advice is good. It could be framed in a more constructive way, though - there's some Comic Book Guy tone in your comment.
My intended tone was not "don't try", but "try harder".
I've listed at least five ways to improve/normalize the testing, and linked to a document that does a pretty good job of explaining statistics (particularly how programmers do a bad job of statistics, baselines for benchmarks, etc.).
"At least he's out there trying" -- with this not-so-great benchmarking, the author has just effectively SHITTED on 2/3 companies that have gone to great lengths to build amazing infrastructure AND managed to spread his FUD around the web, to the point where it reached the HN front page -- and you want credit for trying?
Get the hell out of the way.
It's probably also true that your tone is more abrasive than it needs to be.
"At least we're doing something!" is a silly defense.
EDIT: I should probably mention I work at Rackspace, and thus everything I say on this subject should be taken with the appropriate grains of salt :)
Many argue that the AWS ecosystem (25 services at last count) and AWS's extensive feature set outweigh the bare-bones "fast" metrics of other providers.
I think, as the poster above mentioned, there is generally more to it than a simple metric or two sampled a few times from a single endpoint. But I guess it all rests on one's definition of what counts as valuable...
Another major flaw is taking results for a single instance type and implying that those apply to all instance sizes and each provider as a whole.
If you're going to do a benchmark, at least pick something realistic like the m3.* types.
At least the author had enough sense not to run the bench on a t1.micro.
If you are working towards "best practices" on AWS, you should be running multi-region (who wants to be the one left holding the bag when US-EAST goes down again?).
And if you've done all the heavy lifting to enable yourself to run in "pods" across multiple regions -- well, why not treat Linode/DO/Rackspace as just another region and deploy a "pod" of servers there?
At the end of a month you should have statistics that are directly applicable to your own app and your specific customers, as well as some real operational experience with the new provider.
For example, maybe one of the other providers has really fast machines and their major upstream provider has a great peering relationship with whatever test node you were using for these microbenchmarks, but perhaps those servers are really flaky and crash all the time, or perhaps the majority of your customers see really bad latency when hitting those servers? Maybe their API isn't just "immature", maybe it crashes a lot and they have bad customer service.
Those are the sorts of things you aren't going to figure out after simply running a few load tests. Anyhow, it just seems like something like this would be a lot more valuable than any amount of synthetic testing.
I'd much rather see two or three synthetic benchmarks around hard-drive throughput/latency, memory throughput/latency, and CPU.
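Something along these lines, say (sysbench's older --test= syntax; the flags are illustrative):

    # CPU: time a prime-number computation
    sysbench --test=cpu --cpu-max-prime=20000 run

    # memory throughput
    sysbench --test=memory --memory-total-size=4G run

    # raw sequential write, bypassing the page cache
    dd if=/dev/zero of=testfile bs=1M count=1024 oflag=direct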
Why choose the fastest one? That would be the least accurate way to give an indication of the performance of AWS instances. The mean, or perhaps the median depending on the skew, would be a better choice.
Unless you're saying that the benchmarks on the other servers are effectively cherry-picked best results.
AFAICT, Linode doesn't offer that, and they probably won't. Amazon's been ahead in this arena for a while, and will probably keep that lead for the foreseeable future. EDIT: Apparently they do have an API, which would cover a decent variety of use cases.
What's sad is the number of people that migrate over to Amazon because it's the done thing, without realizing what they're paying for (and that they're not utilizing the unique features of EC2).
I believe there may have been other incidents.
That said, I still have a number of services running happily on Linode. :)
And it's just a continued pattern of incompetence and cover-ups.
While perhaps not as mature and full-featured as AWS's offering, Linode does offer an API with which you can script the creation/initialization of test servers.
They also offer something called StackScripts, which to me looks like a kind of configuration-management script. https://www.linode.com/stackscripts/
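For example, a rough sketch against their HTTP API (from memory -- the key and IDs are placeholders):

    # create a Linode, then boot it
    curl "https://api.linode.com/?api_key=$API_KEY&api_action=linode.create&DatacenterID=2&PlanID=1&PaymentTerm=1"
    curl "https://api.linode.com/?api_key=$API_KEY&api_action=linode.boot&LinodeID=$LINODE_ID"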
But they do not make it very easy to spin up/down new instances. Setting aside the API, they bill you for a full month for a new instance as soon as you create it. True, you get a pro-rated credit if you delete it earlier, but this is awkward, and I think it only pro-rates by the day.
And StackScripts kinda sorta work, but they are hard to write/debug and are not portable. It's a pretty weak offering.
Another important aspect is the ability to easily transfer raw data into an analysis tool like Redshift. Google also excels here with Compute Engine and BigQuery.
While it is cool that AWS offers a ton of services, I do not like the vendor lock-in that comes with them. I think it is generally a better strategy to go with something OpenStack-based. That way you can use your own hardware and dynamically provision new nodes in "the cloud" with companies like HP, Rackspace, and others. With modern-day virtualization, running your own hardware can often be much cheaper and much more performant than AWS.
DigitalOcean does do metered billing, and their API is very sweet & simple.
Obviously the API isn't as mature - but it is heading in the right direction.
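For example, spinning up a droplet is a single GET (v1-style endpoint, from memory; the keys and IDs are placeholders):

    curl "https://api.digitalocean.com/droplets/new?client_id=$CLIENT_ID&api_key=$API_KEY&name=bench-01&size_id=$SIZE_ID&image_id=$IMAGE_ID&region_id=$REGION_ID"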
It is hard to beat AMZN's flexibility and easy provisioning. For instance, even my sister-in-law can walk up to AWS, spin up a Windows instance, log in with RDP, and then enjoy everything you can enjoy on Windows other than high-end gaming. (For the Mac users in the audience, this is a clean and economical way to make sure your landing page works for "the rest of us".)
But you don't have to choose, because you can work with Linux too. On top of that, AMZN layers on services such as Elastic MapReduce, which are compatible with industry standards but eliminate so much time you'd otherwise waste sysadmining.
With Provisioned IOPS you're actually buying provisioned I/O, and in some scenarios the throughput can actually be smaller than non-provisioned I/O -- provided you don't have neighbors, or they're not noisy. And boy, is that provisioned I/O expensive (we've bought quite a lot of EBS volumes).
It feels like the author doesn't know that EBS is network storage, not local storage, and therefore depends heavily on the size of the instance for performance (unless you buy Provisioned IOPS).
In Sphinx's config files, it's super easy to split an index up into 4 parts and then assign each partial index to one core.
Doing this, we find that all 4 cores are utilized nicely (>50%) almost all of the time. I feel like we're getting a really good value out of that machine.
AirBnB has a good writeup on this config: http://nerds.airbnb.com/how-we-improved-search-performance-b...
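Roughly, the trick looks like this in sphinx.conf (the source/path names are illustrative; dist_threads is what fans a single query out across the cores):

    # one of four local chunks of the index; part1..part3 look the same
    index part0
    {
        source = src_part0
        path   = /var/sphinx/part0
    }

    # a distributed index over the four parts; with dist_threads = 4,
    # searchd searches them in parallel, one per core
    index main
    {
        type  = distributed
        local = part0
        local = part1
        local = part2
        local = part3
    }

    searchd
    {
        dist_threads = 4
    }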
EBS performance is an issue - Yes it is, don't use EBS. Use instance store and push to S3 for persistence. If you need high-performance I/O for something, there is likely a separate service that pushes your I/O bottleneck further away (RDS, ElastiCache, SQS, etc.).
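(The persistence side can be as simple as a cron'd sync -- bucket and paths here are placeholders:)

    # periodically persist instance-store data to S3
    aws s3 sync /mnt/ephemeral/data s3://my-backup-bucket/data/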
AWS costs more for weaker CPU - Indeed, this can be the case. But it's often cheaper (though not by much) to put up an Elastic Load Balancer with an Auto Scaling Group and dynamically support your peak traffic than it is to pay for an enormous VPS that sits idle 60% of the time.
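A hedged sketch of that setup with aws-cli (the names, AMI, and instance type are all placeholders):

    # launch configuration + auto scaling group behind an ELB
    aws autoscaling create-launch-configuration \
        --launch-configuration-name web-lc \
        --image-id ami-12345678 \
        --instance-type m3.medium

    aws autoscaling create-auto-scaling-group \
        --auto-scaling-group-name web-asg \
        --launch-configuration-name web-lc \
        --load-balancer-names web-elb \
        --min-size 2 --max-size 10 \
        --availability-zones us-east-1a us-east-1b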
As these benchmarks suggest, I agree that if you're only using one EC2 instance (and you can't get away with a micro), you should probably be investigating other solutions. If you want to architect your app/project/service/whatever to be more distributed and fault-tolerant, AWS can probably make that easier (not necessarily cheaper).
On a side note - From a marketing perspective, Amazon probably should have launched PIOPS as a separate product from EBS. The idea that "EBS sucks" is pretty firmly entrenched, so EBS+PIOPS is fighting an uphill battle.
For us it is worth the price (>$40/mo for 1 GB RAM) for great stability and support. The OpenStack API is really nice, too.
Would be nice to see similar benchmarks using a framework that doesn't resemble a honeypot's tarpit.
You're more than welcome to help make Laravel look better in these benchmarks. But unless you're running the PHP through HipHop, I doubt you'll get anywhere near the statically-compiled platforms at the top.
Well, yeah -- if you use it as just commodity servers. At Radius, we migrated our index-build process over to Elastic MapReduce on spot-priced servers and it's been a huge cost savings.
Long story short: move to Amazon if you want elasticity and can design around their services to capture the savings. Otherwise, look elsewhere.
Way back when, Amazon gave me about $1200 of free use credits over a two year period, and I played, experimented, used it for most customer projects, etc. I also used AWS for almost all of my own projects.
In the last year or two, however, I have started going back to renting large VPSes by the month (I use RimuHosting, but there are a lot of good providers) because you get so much more capability for the same amount of money.
A little off topic, but another way I have found to save money is to wean myself off Heroku by taking the little bit of time to set up a git commit/push hook that automatically deploys my web apps. I was using a manual deployment scheme before that took a minute per deployment - not so good.
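The whole trick is a few lines -- a minimal sketch, with the paths and branch name as placeholders:

    #!/bin/sh
    # post-receive hook in the server's bare repo, e.g.
    # ~/myapp.git/hooks/post-receive (make it executable);
    # checks the pushed code out into the web root
    GIT_WORK_TREE=/var/www/myapp git checkout -f master

    # then locally:
    #   git remote add production user@host:myapp.git
    #   git push production master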
All that said, AWS is really awesome for some jobs like periodically crunching data with MapReduce, etc. I bought a very useful little book "Programming Amazon EC2" a few years ago, and I recommend that as a good reference for using the AWS APIs.
I won't go into it in this comment because it's been beaten to death and you can find information about deployments everywhere; even the commenter on your blog post makes a better suggestion than your VCS deployment strategy.
Continuous integration, source/binary distributions, automated provisioning, sandboxing, versioning, etc...; it's all out there.
Indeed the CPU is the limiting factor in your bench, and while you say you're comparing apples to apples, that's not true in my opinion.
Maybe I'm wrong, but I never see these actors as really competing in the same market: of course, if you're looking for the best CPU performance/price ratio, AWS is one of the worst choices. But that's not what you're paying for, IMO; as someone already mentioned, you're paying for the programmatic access, the ecosystem, the auto-scaling ability, monitoring, etc.
Move to AWS if you need the "Elastic" part of EC2 (Elastic Compute Cloud), or if you know what you're doing and what you're paying for.
Same for DO: what you're paying for is the SSD. They advertise it everywhere, so it's kind of obvious that a big part of what you pay for is the SSD.
Linode is just plain Xen, with domUs packed not so tightly, so it performs more or less quickly as long as the other instances are 100% idle.
The main assumption behind virtualization is that no two high-load domU instances are running in parallel -- the same scenario as with primitive Apache virtual-host based hosting: you could pack them tightly because each one averaged 100 requests per week.
So, virtualization works fine for almost-always-idle development servers, but everything falls apart in an I/O-intensive production environment.
The mantra is "I/O requests should be separated and data partitioned". With cheap virtualized "servers", storage is the first bottleneck, because you share it with other domUs. As long as they're idle, you're OK. Should the one next to you be running, say, a torrent tracker - you're screwed.
It doesn't matter what you're running under - Xen or just FreeBSD's jails (still love them). The problem is that an HDD can perform only one operation (a read or a write) at a time.
So your dom0 is deep in iowait, and your, say, MySQL on a domU has locked all your tables, waiting for an insert/update to complete.
I could write a brochure, but in short: virtualization in production is the same unnecessary complication as a Java Virtual Machine. They're nice toys, but in production everything is better without them.
This absolutely should not be true, and in my experience it isn't true on Amazon -- you will neither be starved for resources if others do intensive things, nor will you enjoy riches if they are quiet: You will get what you are paying for.
I have a personal experience with DigitalOcean that is a bit different. First, let me say that I think they have a great service and compelling prices, but I set up a test server (the 2GB/2CPU variant) to trial leveraging it in the platform mix - a solution that crosses hosting providers = awesomeness.
The I/O performance I got was terrible, despite all the talk about SSDs. Simple operations would stall, the CPU endlessly sitting in iowait. I submitted a support ticket, and they quickly toggled some priority flags and I started getting performance more along the lines of expectations - but ultimately it seems like a classic case where a single tenant can completely monopolize the platform, enjoying the entirety of the storage layer at the cost of everyone else. I'd rather they cap consumption and do appropriate I/O quanta allocations than leave VMs starved.
And it really makes me concerned for the future -- do I have to constantly do benchmarks and analysis, hopping VMs just to find one that isn't an abomination? That isn't how these things are supposed to run.
I have heard anecdotally that it's not uncommon for large players to bulk-start instances and then kill all of the ones that don't have the latest hardware.
Update: I just looked at 4 random c1.xlarge instances we have running, and found 3 different types of underlying hardware:
1. Intel(R) Xeon(R) CPU E5410 @ 2.33GHz w/ 6144kb cache
2. Intel(R) Xeon(R) CPU E5506 @ 2.13GHz w/ 4096kb cache
3. Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz w/ 20480kb cache
Of these cores, the E5410 is from 2007, E5506 is from 2009 and the E5-2650 is a Sandy Bridge from last year.
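Easy enough to check from inside the guest, by the way:

    # see which physical CPU your instance landed on
    grep -m1 'model name' /proc/cpuinfo
    grep -m1 'cache size' /proc/cpuinfo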
They should still set things up so you get reasonable baseline performance, even in the high-contention case, rather than overselling the resources. But you can end up with quite a bit of performance variance in the upwards "more than your fair share" direction, especially for I/O, if your neighbors are quiet. If you're on a 32-guest machine where everyone else is idling on IRC or doing nothing, you get a whole disk to yourself; if everyone is doing random alternating reads and writes, you get 1/32 of a thrashing disk's worst-case throughput. Usually you get something in between.
Regardless, we stick to AWS at work for the entire suite.
When it comes to my money, though, DO is making very large strides, IMO.
Can you provide any more specific search criteria?
I've found Amazon instances to be quite consistent. They vary, of course, but quite contrary to your initial statement that they vary wildly, I find the variances quite small, and there isn't a need to constantly hunt for ripe instances. I have absolutely found what you said to be sadly true on quite a few other VM hosts.
tl;dr: Netflix kills slow-performing AWS machines due to resource contention.
This discussion reminds me of PC customers' buying behavior in the '90s. What's better, AMD or Intel? ... Nowadays other features are key: What's the weight of this device? How thick is it, even? Apple has changed the way we look at these things today.
Convenience also matters a lot in hosting. How much time do I have to spend to get my app up and running? Do I really want to set up and maintain everything myself? How good is the support? Do I want just bare-metal computing resources, or a solution provider with an ecosystem?
What good is the fastest server ever when the queries are slow? The performance of any app/website relies heavily on the engineering skills of the developers. See caching, see I/O load, see frontend technologies, see #perfmatters.
Disclaimer: I am a co-founder of a PHP PaaS.
If you need what Amazon offers, all of the AWS services or most of them, then go for it. But you don't use AWS for raw power.
On a cost per hour basis, DO is the cheapest. Linode next. And Amazon most expensive.
Also, DO is only located in NY, while the others offer central and West Coast locations.
I would expect anyone who works with AWS to know this by now.
Secondly, if you wanted to get truly accurate results, doing this same test over an array of provisioned instances would be good, as quality can vary.
Still, interesting information nonetheless. Really surprised how much faster Linode was than DigitalOcean.
I'm about to deploy an app and I would like to get other impressions.
I like the workflow for creating stacks and I really like the polyglot nature of selecting which cloud to deploy to.
They did have a security incident a while back and I'd recommend reading up on that. They had a blog post about it but it appears they've since pulled that. I don't know what that's about.
I'll tell you that you'll be hard-pressed to find better customer support/service in another company. They're open to suggestions and will help you if you ever run into an issue deploying an app.
We get around 50TB of data transfer per month from Linode, from around $500/month worth of servers. The cost of that much transfer on EC2: around $5400 a month. It's not even close to being competitive.
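(Back-of-the-envelope, assuming EC2's current tiered egress pricing of roughly $0.12/GB for the first 10 TB and $0.09/GB for the next 40 TB: 10,240 GB x $0.12 + 40,960 GB x $0.09 comes to about $4,900, so $5400 is the right ballpark once you add the rest of the bill.)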
I'm surprised this isn't mentioned more often.
This will have a significant impact on the ab numbers (particularly as he's not using the -k option and therefore establishing a new TCP session for every request).
My first impression? Graphs are better :)
Well, Amazon just sucks.