

I call these "weekend benchmarks" -- what you'd typically do when you have a block of free time, then spend optimizing for said benchmark. Roll in on Monday with some staggering results, only to find one (or more) of your variables was off.

Did the author try multiple instances on each provider? VM tenancy is a bitch. (Think of how annoyed you get at the noise when your neighbor in the next apartment throws a party.)

Is the author's benchmarking machine a physical box or a virtualized guest? Does it have power saving turned off, so that the process runs at 100% speed instead of on a variably clocked-down core?
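On Linux this is easy to check before a run (a sketch; the sysfs path assumes the cpufreq driver is present, and cpupower ships with the kernel tools package):

```shell
# Show the frequency governor for each core; "performance" pins the
# clock at full speed, while "ondemand"/"powersave" let it scale down
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

# Pin the clocks for the duration of the benchmark run
cpupower frequency-set -g performance
```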

Did the author enable or disable TCP TIME_WAIT recycling, so he doesn't bump into the ephemeral-port ceiling when running such tests back to back?
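For reference, the Linux knobs involved look something like this (a sketch; sysctl names and defaults vary by kernel version):

```shell
# Inspect the settings that govern how quickly TIME_WAIT sockets
# and ephemeral ports are recycled between back-to-back runs
sysctl net.ipv4.tcp_fin_timeout
sysctl net.ipv4.tcp_tw_reuse
sysctl net.ipv4.ip_local_port_range

# Allow reuse of TIME_WAIT sockets for new outgoing connections and
# widen the ephemeral port range, so a test rig opening thousands of
# short-lived connections doesn't hit the local port ceiling
sysctl -w net.ipv4.tcp_tw_reuse=1
sysctl -w net.ipv4.ip_local_port_range="15000 65000"
```

Either way, whichever settings were in effect should be reported alongside the results.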

Did the author run the tests back to back, or have a cool down period between tests?

Where was the author's network upstream located when he tried said tests? Were there any network issues at the time of the test? Would the author even be aware of them?

The page being tested makes the same database call every time, which is presumably cached. Did he throw out the cached results? Can he even identify the cached results?

Are we firing up ApacheBench with HTTP keepalives enabled? With parallel connections? How many parallel connections?
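Those choices are all one-flag differences on the ab command line, and each changes the numbers (a hypothetical invocation, not the author's actual one):

```shell
# 10,000 requests, 50 in parallel, a fresh TCP connection per request
ab -n 10000 -c 50 http://target.example.com/test.php

# Same load with HTTP keepalives (-k): connections are reused, so
# connection-setup cost mostly disappears and throughput jumps --
# results from the two runs are not comparable with each other
ab -n 10000 -c 50 -k http://target.example.com/test.php
```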

How many Apache servers (StartServers, MinSpareServers, etc)? Which httpd modules were enabled? Which httpd modules were disabled? Which PHP modules were enabled?
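For the Apache side, those knobs live in the MPM prefork configuration; the values below are purely illustrative, not the author's:

```apache
# mpm_prefork tuning -- each directive changes how a burst of
# benchmark traffic is absorbed (pre-forked vs. fork-on-demand)
<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients          150
    MaxRequestsPerChild   0
</IfModule>
```

Unless both sides of the comparison publish this config, the numbers can't be reproduced.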

You're trying to benchmark CPU and I/O horsepower across three different platforms but doing it through this narrow "straw" which consists of your "independent server", your upstream, your upstream's upstream, your upstream's upstream's peering connection with Amazon/Linode/DigitalOcean, your web server and its PHP module, your application, and MySQL.

If you're rolling your eyes at this, then you shouldn't be doing weekend benchmarks.

I'll leave you with this as well:


Do you have something better to point to? It's easy to complain about stuff, but at least he's out there trying to do something. Presumably it can be improved.

I'm particularly fond of the quote "lead, follow, or get the hell out of the way", which is a bit harsh in this case because a lot of your advice is good. It could be framed in a more constructive way, though - there's some Comic Book Guy tone there in your comment.


My intended tone was not "don't try", but "try harder".

I've listed at least five ways to improve/normalize the testing, as well as linked to a document that does a pretty good job of explaining statistics (particularly how programmers do statistics badly, baselines for benchmarks, etc.).

"At least he's out there trying" -- with this not-so-great benchmarking, the author has effectively shat on two of the three companies that have gone to great lengths to build amazing infrastructure AND managed to spread his FUD around the web, to the point where it reached the HN front page -- and you want credit for trying?

Get the hell out of the way.

It's true: you don't say something like "Well, Amazon just sucks." without backing the statement up with something more credible. As someone a little less savvy on the topic I'm glad to know that the test wasn't even close to the final word and why. Thank you.

It's probably also true that your tone is more abrasive than it needs to be.

Part of the problem is that a benchmark like this, put together in plain ignorance and published on the web, will give people who don't know any better all kinds of false perceptions.

Well, if these benchmarks are misguided and useless, it is useful to recognize them as such so you don't go out and trade all your EC2 instances for Linodes.

"At least we're doing something!" is a silly defense.

They probably have some faults, but the general conclusions smell right to me: I don't think they're in the "really screwed up and wildly misleading" category, but in the "ok, interesting, could use some work though" category.

There is nothing you should be more wary of than a benchmark that matches your pre-existing intuition. It'll lead you to ignore serious methodological issues, without any sound scientific (or other epistemological) reason. https://speakerdeck.com/alex/benchmarking is a slide deck from a talk I gave at my office on how to do better benchmarking.

EDIT: I should probably mention I work at Rackspace, and thus everything I say on this subject should be taken with appropriate grains of sand :)

Well, AWS being slow is no surprise; there was an article on HN in the last couple of weeks that pitted Azure against AWS, and AWS lost hands down. So this is no surprise, at least to me.

This statement is exactly the problem he is describing. :) One metric for a specific use case or scenario is a terrible indicator of overall "quality". It is much more nuanced than that. I think the worst tickets I've gotten in my 10 years of sysadmining so far are when a customer just states their app is "slow".

There are other studies that have compared the offerings on multiple benchmarks as well. Here's the one I was referring to: https://zapier.com/engineering/quick-redis-benchmark-aws-vs-... -- see the comparison for yourself across multiple benchmarks.
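Worth noting: that comparison is built on redis-benchmark, which carries the same single-endpoint caveats discussed above. A hypothetical invocation (host name and numbers are made up):

```shell
# redis-benchmark ships with Redis; this drives SET/GET from one
# client box, so the network path between client and server is
# part of what's being measured, not just the server itself
redis-benchmark -h redis.example.com -p 6379 -t set,get -n 100000 -c 50
```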

Yes, in simplistic terms, for a specific metric I'm sure other providers have better hardware than AWS; if that is all one wants to base their notion of "better" on, then so be it, but that is pretty naive.

Many argue that the AWS ecosystem (25 services at last count) and the extensive featureset of AWS outweigh the bare bones "fast" metrics of other providers.

I think, as the poster above mentioned, there is generally more to it than a simple metric or two sampled a few times from a single endpoint. But I guess it all comes down to one's definition of what counts as valuable...

If performance is the primary concern you wouldn't pick such small server/instance types to benchmark. It's like organizing a drag race between a scooter, a moped and a kid on a bicycle.

Another major flaw is taking results for a single instance type and implying that those apply to all instance sizes and each provider as a whole.

If you're going to do a benchmark at least pick something realistic like the m3.* types: http://aws.amazon.com/ec2/instance-types/instance-details/

At least the author had enough sense not to do the bench on a t1.micro

Honest Question: Why not launch your app's infrastructure on both platforms and then round-robin your traffic between the two for a billing cycle and compare the results at the end?

If you are working towards "best practices" on AWS, you should be running multi-region (who wants to be the one left holding the bag when US-EAST goes down again?), which means you've presumably already done the heavy lifting to enable yourself to run in "pods" across multiple regions.

Well, if you can do that, why not treat Linode/DO/Rackspace as a separate region and deploy a "pod" of servers there?

At the end of a month you should have statistics that are directly applicable to your own app and your specific customers, as well as some real operational experience dealing with the new provider.

For example, maybe one of the other providers has really fast machines and their major upstream provider has a great peering relationship with whatever test node you were using for these microbenchmarks, but perhaps those servers are really flaky and crash all the time, or perhaps the majority of your customers see really bad latency when hitting those servers? Maybe their API isn't just "immature", maybe it crashes a lot and they have bad customer service.

Those are the sorts of things you aren't going to figure out after simply running a few load tests. Anyhow, it just seems like something like this would be a lot more valuable than any amount of synthetic testing.

edit: typos

Yeah, it's really surprising that a server with 8 cores is rocking a server with 1 or 2 cores running Apache. This could all be explained by latency.

I'd much rather see two or three synthetic benchmarks around hard-drive throughput/latency, memory throughput/latency, and CPU.

To be fair to Amazon the author should have spun up 100 instances, benchmarked them, and used the fastest one. This is a best practice for AWS.

To me, that's an infrastructure smell.

How would you ensure that an instance launched on hardware bought 6 months ago is identical to the hardware under an instance launched N years ago and still running? Buy old hardware on eBay to prevent newer hardware introducing variation?

I would start with a provisioning process that isn't completely opaque about such things.

> To be fair to Amazon the author should have spun up 100 instances, benchmarked them, and used the fastest one. This is a best practice for AWS.

Why choose the fastest one? That would be the least accurate way to give an indication of the performance of AWS instances. A mean, or perhaps a median depending on the skew, would be a better choice.
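To illustrate why the choice matters, a quick sketch with made-up latency samples (the file name and values are hypothetical):

```shell
# Five latency samples (ms); the 200 ms outlier is the kind of thing
# a noisy neighbor produces
printf '10\n12\n11\n200\n13\n' > samples.txt

# Mean vs. median: the single outlier drags the mean to 49.2 ms,
# while the median stays at 12.0 ms, close to typical performance
sort -n samples.txt | awk '
    { a[NR] = $1; sum += $1 }
    END {
        printf "mean: %.1f\n", sum / NR
        if (NR % 2) m = a[(NR + 1) / 2]
        else        m = (a[NR / 2] + a[NR / 2 + 1]) / 2
        printf "median: %.1f\n", m
    }'
```

Reporting best-of-N on one provider and single-shot numbers on another is exactly the cherry-picking being complained about.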

Unless you're saying that the benchmarks on the other servers are effectively cherry picked best results.

That was a nice article. Any more like it?
