Azure Storage: How fast are disks? (grsplus.com)
82 points by ed_elliott_asc on July 10, 2018 | 25 comments

Perhaps this is separate, but Azure has a bit of a latency problem on their App Service Plans and the storage. I know that it has something to do with how they are able to offer instant scaling up, down and out, and the ability to move an app slot to another service plan instantly. However, it does have some major performance issues, and it's also very slow to unzip items. Unzipping from the terminal will end up just timing out (though the process will complete, eventually). This causes auto-update failures way too often. WordPress, Magento, etc. objects load really slowly, for example. I hope someone at Microsoft reads this and figures out a way to bring this latency down. It's a really great service otherwise, in that you don't have to worry about patching your server.

I've never had an issue with their blobs or CDN speeds, but the app service disk latency is an issue. Specifically, in my case, with PHP apps.

We’ve had similar issues and timeouts in Node and PHP apps.

There is a cache setting you can put in your slot that will fix it, but the app has to be restarted to clear the cache. It's not selective about what gets cached, and it doesn't work well with most apps. If you could selectively add directories to cache, it would work so much better, or if it worked in some asynchronous way when a file changed in the cache (for apps that auto-update). I don't know the solution, but it's an annoying problem.

If you Google/Bing for "Azure Website slow", this issue is the cause of 99% of all the complaints. The current solution, "Local Cache", is lacking. It does work, but not in my use cases.


If something works poorly without manually applying cache, it reminds me of WordPress. It seems it's nearly impossible to get Azure on par with the rest in terms of performance.

Is there any good reason to use Azure for those kinds of apps when you have plenty of competitors with much better performance?

Such as?

Azure disk speed for VMs is absolutely horrendous for the amount you have to pay. Our workloads are so bad because of this.

I completely agree. For some reason they're not 100% SSD, and that changes everything. That is the main reason I avoid Azure. Last time I tested (about 8 months ago) I got a new Windows Server instance with 4 GB RAM and ran Windows Update. On my laptop it takes 20 minutes, on AWS it takes 40 minutes or so, and on Azure it takes almost an entire day! Partly this is because MS doesn't keep their images as updated as AWS does, but mostly it is because the default is HDD instead of SSD. Not advocating anything, just my experience.

What is the point of using Azure if the performance is way worse than competitors?

I know what I see in my apps. Yes, Azure is worse than AWS. AWS has its flaws, but Azure is simply very slow in comparison.

What VM type do you have? You have to be careful not to hit the throttle limits, otherwise your performance is awful.

What matters is:

- chosen VM size
- disk striping (or apps that use multiple disks)
- whether you attach with caching on or off, so you can work with both throttles
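To make the interaction between those three factors concrete, here is a minimal sketch. It assumes that a striped set is limited by whichever is lower, the sum of the per-disk limits or the VM-level limit, and that cached and uncached attachments are throttled separately, as the comment implies. The function names and every number are hypothetical; check the published limits for your own VM size and disk tier.

```python
from typing import Sequence

def capped(vm_cap: float, disk_caps: Sequence[float]) -> float:
    """Aggregate limit of a striped disk set, bounded by the VM-level throttle."""
    return min(vm_cap, sum(disk_caps))

def combined(vm_cached_cap, cached_disks, vm_uncached_cap, uncached_disks):
    """Total limit when cached and uncached disks are throttled separately (assumption)."""
    return capped(vm_cached_cap, cached_disks) + capped(vm_uncached_cap, uncached_disks)

# All figures below are made up for illustration, in IOPS.
one_disk = capped(12_800, [5_000])          # bound by the single disk
four_disks = capped(12_800, [5_000] * 4)    # bound by the VM throttle
both = combined(16_000, [5_000] * 2, 12_800, [5_000] * 2)
print(one_disk, four_disks, both)
```

Under these assumptions, adding disks to a stripe only helps until the VM throttle is reached, which is why the VM size comes first in the list.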

It would be far more useful to see the upper percentiles of latency than an average, and to see the tests run over a sustained period of time.

As usual, the interesting information comes not during regular operation, but during incidents, where it is not uncommon for the latency to spike _massively_.
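A small sketch of why an average can hide this: with made-up latency samples in which only two operations out of a thousand stall, the mean stays low while the upper percentile exposes the stall. The `percentile` helper is a simple nearest-rank implementation written for this example, not something taken from the article.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the value at rank ceil(p% of n) in sorted order."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Made-up latencies in ms: 998 fast operations and two stalls.
latencies_ms = [1.0] * 998 + [1500.0, 2000.0]

mean = sum(latencies_ms) / len(latencies_ms)
p50 = percentile(latencies_ms, 50)
p999 = percentile(latencies_ms, 99.9)
print(f"mean={mean:.2f} ms  p50={p50} ms  p99.9={p999} ms")
```

Here the mean stays in single-digit milliseconds, while the 99.9th percentile lands on one of the stalls.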


The percentiles are in the images, I was being lazy and didn’t add them to the text. If enough (well a few) people want them I’ll add them.

Azure storage either works for days and weeks 100% consistently or they have an outage and are down for days, so even when I have run longer tests they don’t show any difference.

If you monitor and they have an outage, the latency basically goes to a day per IOP :)

Heh - I only did a scan read, but the 99.9th percentile would be much more useful than the average in the table.

I agree on the last part, but I think it's important to measure the effects during outages too. One we often see is fsync operations taking minutes with write caching disabled. This is more common than just during outages, however!
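One way to watch for slow fsync calls like this is to time each write-plus-fsync pair individually and keep every sample instead of an average. The sketch below is an assumption about how such a probe could look, not the method used in the article; `time_fsync_writes` and its parameters are hypothetical names.

```python
import os
import tempfile
import time

def time_fsync_writes(directory, count=20, size=4096):
    """Return per-operation latency in ms for `count` write+fsync pairs of `size` bytes."""
    payload = b"\0" * size
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the data to the device before stopping the clock
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies

if __name__ == "__main__":
    results = time_fsync_writes(tempfile.gettempdir())
    print(f"min {min(results):.2f} ms, max {max(results):.2f} ms")
```

Pointing `directory` at the disk under test and running this on a schedule would give the sustained view asked for above. The maximum matters more than the mean here.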

Leave it with me and I’ll get some longer runs

I ran some quick tests on Azure with an app which relies heavily on fast write ops. On a dedicated server with SSD this is very fast, in the range of <1ms (negligible latency in comparison). But on Azure there were random spikes of >1s writes. I haven’t tested any more but will do some more thorough tests.

> HDD uncached read latency: 11ms, SSD: 4.5, local SSD: 1ms

I am surprised by the poor performance of the remote SSD. I was under the impression that the IP latency within the same datacentre would be <1ms. Does anyone know what is causing this?

We even have two completely different datacenters within Frankfurt, connected through redundant waves that are most of the time < 0.6-0.8ms

This is a good point. I’ll replicate some of the tests across regions to compare. I expect there will be a difference, maybe within datacenters too.

A lot of FUD in this thread here.

If you're using the free tier of App Service plans, yes you're going to have perf limitations. And that's not Azure specific.

If you choose hard drives that are HDD and not SSD, you're going to have perf limitations. That's not Azure specific.

The cost of SSD is about the same across AWS/Azure/GCP.

I’ve pulled 40k IOPS across 3 disks in Azure running SSD. The new managed disks are fast enough for our workloads. Yes, they cost money, but looking at overall cost, disk is ~15-18% of total spend for us.

There are some bugs with the new archive tier and some weird behaviors for non-SSD disks when you start forcing high queues.

What sort of monthly spend?

We stripe two sets of 4 P30s to get up to 2,000 MB/sec on GS5s. Storage costs about £800 per VM per month for that.

