All Your IOPS Are Belong to Us: Case Study in Performance Optimization (2015) [pdf] (percona.com)
67 points by znpy on Jan 28, 2018 | 16 comments

Hi. I'm the author of that presentation, and I just got a text message from a friend saying that I'm on the front page of Hacker News...

The one thing in that talk that I was never 100% sure of was whether it was block-mq that provided the performance improvement. It wasn't until about a year later that I came across some articles which confirmed that it was actually due to the development of, and subsequent fixes to, the Xen persistent-grants feature.

In other words, ignore slides 15 and 16. =/

Do you have updates on how the performance is on 4.x kernel versions?

Yes and no. I have all sorts of test results for the 4.x kernel, but they are for i3 instances rather than i2.*, so they wouldn't be directly comparable. Your question kind of makes me think I should put together an updated version of this talk; I've gathered enough material over the last couple of years that would probably be useful to somebody.

Yes, that would be useful. 4.x kernels have some block I/O improvements, and some recent Phoronix benchmarks show ext4 making huge strides.

This was my first thought. Kernel 3.x is more than a little dated now, and a huge number of I/O performance and latency-related changes have been incorporated since the 3.x days.

Warning: this is from a 2015 talk, but I still found it interesting as it has shown me a very good way to approach the problems described in the early slides.

I find this low-level optimization and performance tuning fascinating. Can anyone recommend any good resources to get started with solving these kinds of problems?

If only there were an AWS service that existed purely for solving these exact problems so that your most technically talented employees could spend time working on your product instead of dicking around with Linux kernel settings.

At the time of making this presentation, AWS did not have anything in its offering that could match tuned MySQL on i2 instances. Aurora was just getting started.

But nowadays? I'm all in for Aurora.

According to OP, 800 IOPS was the bottleneck and i2 compute capacity was overkill. RDS offers provisioned IOPS (aka PIOPS) - up to at least 30000 at the time (https://aws.amazon.com/about-aws/whats-new/2014/10/09/amazon...).
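For anyone who wants to check whether a volume is really sitting at an IOPS ceiling like the 800 figure mentioned above, here is a rough sketch in Python. It is not from the talk; it assumes a Linux host and the standard /proc/diskstats layout, where the fourth and eighth whitespace-separated fields are reads completed and writes completed. The function names (parse_diskstats, iops, sample_iops) are made up for illustration.

```python
import time


def parse_diskstats(text):
    """Return {device: (reads_completed, writes_completed)} from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or malformed lines
        # fields[2] is the device name, fields[3] reads completed,
        # fields[7] writes completed (cumulative counters since boot).
        stats[fields[2]] = (int(fields[3]), int(fields[7]))
    return stats


def iops(before, after, interval_s):
    """Compute completed I/O operations per second between two snapshots."""
    result = {}
    for dev, (reads_after, writes_after) in after.items():
        if dev not in before:
            continue  # device appeared between samples; no delta available
        reads_before, writes_before = before[dev]
        completed = (reads_after - reads_before) + (writes_after - writes_before)
        result[dev] = completed / interval_s
    return result


def sample_iops(interval_s=1.0, path="/proc/diskstats"):
    """Take two snapshots interval_s apart and return per-device IOPS (Linux only)."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return iops(before, after, interval_s)
```

Calling sample_iops() a few times under load gives a coarse per-device number to compare against whatever limit the volume is supposed to have. Tools like iostat report the same counters with more detail, so this is only meant to show where the number comes from.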

You mean migrate to GCE?

I mean RDS.

Ok, cool. However, switching to RDS would have been a no-brainer in the OP's particular scenario.

Not really. The migration of hundreds of terabytes of data from one storage solution to another is never a no-brainer, and it's not just because of technical concerns. RDS is a very good solution for a lot of people, but it's not the right fit for everyone.

OP spent multiple months playing around with kernel and MySQL parameters to solve the issue. What happens when they have to do the same thing for the next version of the kernel, or for patches like Meltdown? RDS has entire massive teams dedicated to solving exactly this problem for OP's exact use case.
