Perhaps this is a separate issue, but Azure has a bit of a latency problem with the storage behind their App Service plans. I know that it has something to do with how they are able to offer instant scaling up, down, and out, and the ability to move an app slot to another service plan instantly. However, it does cause some major performance issues. It's very slow to unzip files: unzipping from the terminal will end up just timing out (though the process will complete, eventually), and this causes auto-update failures way too often. WordPress, Magento, etc. load really slowly, for example. I hope someone at Microsoft reads this and figures out a way to bring this latency down. It's a really great service otherwise, in that you don't have to worry about patching your server.
I've never had an issue with their blobs or CDN speeds, but the app service disk latency is an issue. Specifically, in my case, with PHP apps.
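For anyone who wants to put a number on this, here is a rough sketch of a small-file probe in Python. It only approximates the unzip and PHP include pattern (lots of small files), the function name and defaults are made up for illustration, and the read pass may be served from the OS cache on a local disk, so treat the numbers as relative rather than absolute.

```python
import os
import tempfile
import time


def time_small_files(directory, count=200, size=4096):
    """Write, then read back, `count` small files in `directory`.

    Returns (write_seconds, read_seconds). The probe files are removed
    afterwards. Name and defaults are illustrative assumptions.
    """
    payload = os.urandom(size)
    paths = [os.path.join(directory, f"probe_{i}.bin") for i in range(count)]

    # Time creating many small files (roughly what an unzip does).
    start = time.perf_counter()
    for path in paths:
        with open(path, "wb") as fh:
            fh.write(payload)
    write_seconds = time.perf_counter() - start

    # Time opening and reading each file (roughly what PHP includes do).
    start = time.perf_counter()
    for path in paths:
        with open(path, "rb") as fh:
            fh.read()
    read_seconds = time.perf_counter() - start

    for path in paths:
        os.remove(path)
    return write_seconds, read_seconds


if __name__ == "__main__":
    # Assumption: swap the temporary directory for the path you care
    # about, e.g. somewhere under the app's content root.
    with tempfile.TemporaryDirectory() as target:
        w, r = time_small_files(target)
        print(f"write: {w:.3f}s  read: {r:.3f}s")
```

Running it once against the app's content directory and once against a local temp directory on the same instance should show whether the gap is really in the storage layer.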
There is a cache setting you can put in your slot that will fix it, but the app has to be restarted to clear the cache. It's not selective on what gets cached...and it doesn't work well with most apps. If you could selectively add directories that cache it would work so much better, or if it worked in some asynchronous way if a file changed in the cache (for apps that auto-update.) I don't know the solution, but it's an annoying problem.
If you Google/Bing for "Azure Website slow", this issue is the cause of 99% of all the complaints. The current solution, "Local Cache", is lacking. It does work, but not in my use cases.
If something works poorly without manually applying cache, it reminds me of WordPress. It seems it's nearly impossible to get Azure on par with the rest in terms of performance.
I completely agree. For some reason they're not 100% SSD, and that changes everything. That is the main reason I avoid Azure. Last time I tested (about 8 months ago), I spun up a new Windows Server instance with 4 GB RAM and ran Windows Update. On my laptop it takes 20 minutes, on AWS it takes 40 minutes or so, and on Azure it takes almost an entire day! Partly this is because MS doesn't keep their images as updated as AWS does, but mostly it is because the default is HDD instead of SSD. Not advocating anything, just my experience.
It would be far more useful to see the upper percentiles of latency than an average, and to see the tests run over a sustained period of time.
As usual, the interesting information comes not during regular operation, but during incidents, where it is not uncommon for the latency to spike _massively_.
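To illustrate why the average hides this, here is a minimal sketch using a nearest-rank percentile. The sample data is synthetic (not from the article's tests), and the helper names are my own.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples_ms):
    """Mean alongside the upper percentiles, all in milliseconds."""
    return {
        "mean": statistics.fmean(samples_ms),
        "p50": percentile(samples_ms, 50),
        "p99": percentile(samples_ms, 99),
        "p99.9": percentile(samples_ms, 99.9),
        "max": max(samples_ms),
    }


if __name__ == "__main__":
    # Synthetic data: 998 requests at 1 ms plus two 2-second stalls.
    samples = [1.0] * 998 + [2000.0] * 2
    for name, value in summarize(samples).items():
        print(f"{name:>6}: {value:.3f} ms")
```

With that synthetic data the mean is about 5 ms and p50 and p99 are both 1 ms, so only the 99.9th percentile and the max show the 2-second stalls.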
The percentiles are in the images, I was being lazy and didn’t add them to the text. If enough (well a few) people want them I’ll add them.
Azure storage either works 100% consistently for days and weeks, or they have an outage and are down for days, so even when I have run longer tests they don't show any difference.
If you monitor and they have an outage, the latency basically goes to a day per IOP :)
Heh - I only did a scan read, but the 99.9th percentile would be much more useful than the average in the table.
I agree on the last part - but I think it's important to measure the effects during outages too. One we often see is fsync operations taking minutes with write caching disabled - this happens more often than just during outages, however!
I ran some quick tests on Azure with an app which relies heavily on fast write ops. On a dedicated server with SSD this is very fast, in the range of <1ms (negligible latency in comparison). But on Azure there were random spikes of >1s writes. I haven’t tested any more but will do some more thorough tests.
I am surprised by the poor performance of the remote SSD. I was under the impression that the IP latency within the same datacentre would be <1ms. Does anyone know what is causing this?
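A quick way to reproduce this kind of test is to time each write plus fsync individually and look at the worst cases rather than the total. This is only a sketch under my own assumptions (4 KB writes, 100 iterations, a throwaway file in the directory being tested), not the exact test described above.

```python
import os
import tempfile
import time


def measure_fsync_latency(directory, writes=100, size=4096):
    """Time `writes` rounds of write() followed by fsync() on a
    temporary file in `directory`.

    Returns the per-round latencies in milliseconds. The sizes and
    counts are illustrative assumptions.
    """
    payload = os.urandom(size)
    latencies_ms = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the data out to the storage device
            latencies_ms.append((time.perf_counter() - start) * 1000)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies_ms


if __name__ == "__main__":
    # Assumption: replace the temp directory with the disk under test.
    with tempfile.TemporaryDirectory() as target:
        results = measure_fsync_latency(target)
        spikes = [ms for ms in results if ms > 1000]
        print(f"max: {max(results):.2f} ms, writes over 1s: {len(spikes)}")
```

Left running for a longer period, the count of writes over 1s is the number to watch, since a handful of spikes will barely move the mean.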
I’ve pulled 40k IOPS across 3 disks in Azure running SSD. The new managed disks are fast enough for our workloads. Yes, they cost money, but looking at overall cost, disk is ~15-18% of total spend for us.
There are some bugs with the new archive tier and some weird behaviors for non-SSD disks when you start forcing high queues.